[ceph-users] Adding Labels Section to Perf Counters Output

2023-01-31 Thread Ali Maredia
Hi Ceph Developers and Users,

Various upstream developers and I are working on adding labels to perf
counters (https://github.com/ceph/ceph/pull/48657).

We would like to understand how changing the format of the JSON dumped by the
`perf dump` command in the Reef release would affect users and other
components of Ceph.

As an example given in the PR, this is how an unlabeled counter is currently
dumped, compared with its new labeled counterpart:

"some unlabeled_counter": {
"put_b": 1048576,
},
"some labeled_counter": {
"labels": {
"Bucket: "bkt1",
"User: "user1",
},
"counters": {
"put_b": 1048576,
},
},

Here is an example given in the PR of the old style unlabeled counters
being dumped in the same format as the labeled counters:

"some unlabeled": {
"labels": {
},
"counters": {
"put_b": 1048576,
},
},
"some labeled": {
"labels": {
"Bucket: "bkt1",
"User: "user1",
},
"counters": {
"put_b": 1048576,
},
},
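
For anyone consuming this JSON directly, the change is essentially one extra
level of nesting. A rough sketch of what that means for a jq-based consumer
(the daemon, section, and counter names here are only illustrative, taken from
the example above):

# old format: the counter sits directly under the section name
ceph daemon osd.0 perf dump | jq '."some unlabeled_counter".put_b'

# new format: counters move under a "counters" sub-object, with a
# sibling "labels" sub-object (which may be empty)
ceph daemon osd.0 perf dump | jq '."some labeled_counter".counters.put_b'
ceph daemon osd.0 perf dump | jq '."some labeled_counter".labels'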

Would users/consumers of these counters be opposed to changing the format?
If so, why?

As far as I know, there are ceph-mgr modules related to Prometheus and
telemetry that consume the current unlabeled counters. This topic will also be
discussed at the upcoming Ceph Developer Monthly (EMEA).

Best,
Ali
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: January Ceph Science Virtual User Group

2023-01-31 Thread Mike Perez
And here's the recording in case you missed it:

https://www.youtube.com/playlist?list=PLrBUGiINAakM3d4bw6Rb7EZUcLd98iaWG

On Thu, Jan 26, 2023 at 6:15 AM Kevin Hrpcek wrote:

> Hey all,
>
> We will be having a Ceph science/research/big cluster call on Tuesday
> January 31st. If anyone wants to discuss something specific they can add
> it to the pad linked below. If you have questions or comments you can
> contact me.
>
> This is an informal open call of community members mostly from
> hpc/htc/research environments where we discuss whatever is on our minds
> regarding ceph. Updates, outages, features, maintenance, etc...there is
> no set presenter but I do attempt to keep the conversation lively.
>
> Pad URL:
> https://pad.ceph.com/p/Ceph_Science_User_Group_20230131
>
> Ceph calendar event details:
> January 31, 2023
> 15:00 UTC
> 4pm Central European
> 9am Central US
>
> Description: Main pad for discussions:
> https://pad.ceph.com/p/Ceph_Science_User_Group_Index
> Meetings will be recorded and posted to the Ceph Youtube channel.
> To join the meeting on a computer or mobile phone:
> https://bluejeans.com/908675367?src=calendarLink
> To join from a Red Hat Deskphone or Softphone, dial: 84336.
> Connecting directly from a room system?
>  1.) Dial: 199.48.152.152 or bjn.vc 
>  2.) Enter Meeting ID: 908675367
> Just want to dial in on your phone?
>  1.) Dial one of the following numbers: 408-915-6466 (US)
>  See all numbers: https://www.redhat.com/en/conference-numbers
>  2.) Enter Meeting ID: 908675367
>  3.) Press #
> Want to test your video connection? https://bluejeans.com/111
>
> Kevin
>
> --
> Kevin Hrpcek
> NASA VIIRS Atmosphere SIPS/TROPICS
> Space Science & Engineering Center
> University of Wisconsin-Madison
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Mike Perez
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to get RBD client log?

2023-01-31 Thread Jinhao Hu
Hi,

How can I collect the logs of the RBD client?
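
For a librbd-based client (e.g. QEMU/libvirt), one common approach is to raise
the client-side debug levels and point them at a log file, either in the
client host's ceph.conf or via centralized config. The values and path below
are only an example (the client process must be able to write to that path):

[client]
    debug rbd = 20
    debug rados = 20
    log file = /var/log/ceph/$cluster-$name.$pid.log

or, with centralized config:

ceph config set client debug_rbd 20
ceph config set client debug_rados 20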
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Replacing OSD with containerized deployment

2023-01-31 Thread Guillaume Abrioux
On Tue, 31 Jan 2023 at 22:31, mailing-lists  wrote:

> I am not sure. I didn't find it... It should be somewhere, right? I used
> the dashboard to create the osd service.
>

what does a `cephadm shell -- ceph orch ls osd --format yaml` say?
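
For reference, an OSD service created from the dashboard normally shows up
there as a drive-group spec along these lines (illustrative only; the
placement and device filters depend on what was selected in the dashboard, and
`ceph orch ls osd --export` prints the same spec without the status fields):

service_type: osd
service_id: dashboard-admin-1661788934732
service_name: osd.dashboard-admin-1661788934732
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
  filter_logic: AND
  objectstore: bluestore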

-- 

*Guillaume Abrioux*
Senior Software Engineer
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Replacing OSD with containerized deployment

2023-01-31 Thread mailing-lists

Did your db/wal device show as having free space prior to the OSD creation?

Yes.

root@ceph-a1-06:~# pvs
  PV           VG                                         Fmt  Attr PSize  PFree
  /dev/nvme0n1 ceph-3a336b8e-ed39-4532-a199-ac6a3730840b  lvm2 a--  5.82t  2.91t
  /dev/nvme1n1 ceph-b38117e8-8e50-48dd-95f2-b4226286bfde  lvm2 a--  5.82t  2.91t


Although:

ceph-a1-06  /dev/nvme0n1  ssd  Dell_Ent_NVMe_AGN_MU_AIC_6.4TB_S61MNE0R900788  6401G  102s ago  LVM detected, *locked*
ceph-a1-06  /dev/nvme1n1  ssd  Dell_Ent_NVMe_AGN_MU_AIC_6.4TB_S61MNE0R900777  6401G  102s ago  LVM detected, *locked*





What does your OSD service specification look like?


I am not sure. I didn't find it... It should be somewhere, right? I used 
the dashboard to create the osd service.



Best

Ken




On 31.01.23 12:35, David Orman wrote:

What does your OSD service specification look like? Did your db/wal device
show as having free space prior to the OSD creation?

On Tue, Jan 31, 2023, at 04:01, mailing-lists wrote:

OK, the OSD is filled again. In and Up, but it is not using the nvme
WAL/DB anymore.

And it looks like the LVM group of the old OSD is still on the NVMe drive. I
suspect this because the two NVMe drives still have 9 LVM groups each: 18
groups in total, but only 17 OSDs are using the NVMe (as shown in the
dashboard).


Do you have a hint on how to fix this?



Best

Ken



On 30.01.23 16:50, mailing-lists wrote:

oh wait,

i might have been too impatient:


1/30/23 4:43:07 PM[INF]Deploying daemon osd.232 on ceph-a1-06

1/30/23 4:42:26 PM[INF]Found osd claims for drivegroup
dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}

1/30/23 4:42:26 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}

1/30/23 4:42:19 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}

1/30/23 4:41:01 PM[INF]Found osd claims for drivegroup
dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}

1/30/23 4:41:01 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}

1/30/23 4:41:01 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}

1/30/23 4:41:00 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}

1/30/23 4:39:34 PM[INF]Found osd claims for drivegroup
dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}

1/30/23 4:39:34 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}

1/30/23 4:39:34 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}



It doesn't show the NVMe as WAL/DB yet, though; I will let it reach a clean
state before I do anything further.


On 30.01.23 16:42, mailing-lists wrote:

root@ceph-a2-01:/# ceph osd destroy 232 --yes-i-really-mean-it
destroyed osd.232


OSD 232 shows now as destroyed and out in the dashboard.


root@ceph-a1-06:/# ceph-volume lvm zap /dev/sdm
--> Zapping: /dev/sdm
--> --destroy was not specified, but zapping a whole device will
remove the partition table
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdm bs=1M count=10
conv=fsync
  stderr: 10+0 records in
10+0 records out
  stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0675647 s, 155 MB/s
--> Zapping successful for: 


root@ceph-a2-01:/# ceph orch device ls

ceph-a1-06  /dev/sdm  hdd   TOSHIBA_X_X 16.0T 21m ago *locked*


It shows locked and is not automatically added now, which is good i
think? otherwise it would probably be a new osd 307.


root@ceph-a2-01:/# ceph orch osd rm status
No OSD remove/replace operations reported

root@ceph-a2-01:/# ceph orch osd rm 232 --replace
Unable to find OSDs: ['232']


Unfortunately it is still not replacing.


It is so weird, i tried this procedure exactly in my virtual ceph
environment and it just worked. The real scenario is acting up now. -.-


Do you have more hints for me?

Thank you for your help so far!


Best

Ken


On 30.01.23 15:46, David Orman wrote:

The 'down' status is why it's not being replaced, vs. destroyed,
which would allow the replacement. I'm not sure why --replace led
to that scenario, but you will probably need to mark it destroyed
for it to be replaced.

https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#replacing-an-osd  
has instructions on the non-orch way of doing that. You only need 1/2.


You should look through your logs to see what happened that the OSD
was marked down and not destroyed. Obviously, make sure you
understand ramifications before running any commands. :)

David

On Mon, Jan 30, 2023, at 04:24, mailing-lists wrote:

# ceph orch osd rm status
No OSD remove/replace operations reported
# ceph orch osd rm 232 --replace
Unable to find OSDs: ['232']

It is not finding 232 anymore. It is still shown as down and out in the
Ceph-Dashboard.


       pgs: 3236 active+clean


This is the new disk shown as locked (because unzapped at the moment).

# ceph orch device ls

ceph-a1-06  /dev/sdm  hdd   TOSHIBA_X_X 16.0T  9m ago  locked


Best

Ken


On 29.01.23 18:19, David Orman wrote:

What does "ceph orch osd rm status" show before you try the zap? Is
your cluster still backfilling to the other OSDs for the PGs that
were
on the failed 

[ceph-users] Re: ceph/daemon stable tag

2023-01-31 Thread Jonas Nemeikšis
Cool, thank you!

On Tue, Jan 31, 2023 at 4:34 PM Guillaume Abrioux wrote:

> v6.0.10-stable-6.0-pacific-centos-stream8 (pacific 16.2.11) is now
> available on quay.io
>
> Thanks,
>
> On Tue, 31 Jan 2023 at 13:43, Guillaume Abrioux 
> wrote:
>
>> On Tue, 31 Jan 2023 at 11:14, Jonas Nemeikšis 
>> wrote:
>>
>>> Hello Guillaume,
>>>
>>> A little bit sad news about drop. :/
>>>
>>
>> Why? That shouldn't change a lot in the end. You will be able to use
>> ceph/ceph:vXX images instead.
>>
>>
>>> We've just future plans to migrate to cephadm, but not for now:)
>>>
>>> I would like to update and test Pacific's latest version.
>>>
>>
>> Let me check if I can get these tags pushed quickly, I'll update this
>> thread.
>>
>> Thanks,
>>
>> --
>>
>> *Guillaume Abrioux*Senior Software Engineer
>>
>
>
> --
>
> *Guillaume Abrioux*Senior Software Engineer
>


-- 
Jonas
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph/daemon stable tag

2023-01-31 Thread Guillaume Abrioux
v6.0.10-stable-6.0-pacific-centos-stream8 (pacific 16.2.11) is now
available on quay.io
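
For anyone who wants to try it, pulling that tag should look something like
this (assuming podman; docker works the same way):

podman pull quay.io/ceph/daemon:v6.0.10-stable-6.0-pacific-centos-stream8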

Thanks,

On Tue, 31 Jan 2023 at 13:43, Guillaume Abrioux  wrote:

> On Tue, 31 Jan 2023 at 11:14, Jonas Nemeikšis 
> wrote:
>
>> Hello Guillaume,
>>
>> A little bit sad news about drop. :/
>>
>
> Why? That shouldn't change a lot in the end. You will be able to use
> ceph/ceph:vXX images instead.
>
>
>> We've just future plans to migrate to cephadm, but not for now:)
>>
>> I would like to update and test Pacific's latest version.
>>
>
> Let me check if I can get these tags pushed quickly, I'll update this
> thread.
>
> Thanks,
>
> --
>
> *Guillaume Abrioux*Senior Software Engineer
>


-- 

*Guillaume Abrioux*
Senior Software Engineer
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Debian update to 16.2.11-1~bpo11+1 failing

2023-01-31 Thread Matthew Booth
Please excuse the request: I don't have the previous email in this
thread and the archives seem to be currently unavailable.

Do you have a link to the issue which was resolved? I am wondering if
it might be related to the recent issue I discovered on CoreOS 37.

Thanks,

Matt

On Mon, 30 Jan 2023 at 05:53,  wrote:
>
> This problem has been fixed by the Ceph team in the mean time, Pacific 
> upgrades and installations on Debian are now working as expected!
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Matthew Booth
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph/daemon stable tag

2023-01-31 Thread Guillaume Abrioux
On Tue, 31 Jan 2023 at 11:14, Jonas Nemeikšis  wrote:

> Hello Guillaume,
>
> A little bit sad news about drop. :/
>

Why? That shouldn't change a lot in the end. You will be able to use
ceph/ceph:vXX images instead.


> We've just future plans to migrate to cephadm, but not for now:)
>
> I would like to update and test Pacific's latest version.
>

Let me check if I can get these tags pushed quickly, I'll update this
thread.

Thanks,

-- 

*Guillaume Abrioux*
Senior Software Engineer
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Replacing OSD with containerized deployment

2023-01-31 Thread David Orman
What does your OSD service specification look like? Did your db/wal device
show as having free space prior to the OSD creation?

On Tue, Jan 31, 2023, at 04:01, mailing-lists wrote:
> OK, the OSD is filled again. In and Up, but it is not using the nvme 
> WAL/DB anymore.
>
> And it looks like the lvm group of the old osd is still on the nvme 
> drive. I come to this idea, because the two nvme drives still have 9 lvm 
> groups each. 18 groups but only 17 osd are using the nvme (shown in 
> dashboard).
>
>
> Do you have a hint on how to fix this?
>
>
>
> Best
>
> Ken
>
>
>
> On 30.01.23 16:50, mailing-lists wrote:
>> oph wait,
>>
>> i might have been too impatient:
>>
>>
>> 1/30/23 4:43:07 PM[INF]Deploying daemon osd.232 on ceph-a1-06
>>
>> 1/30/23 4:42:26 PM[INF]Found osd claims for drivegroup 
>> dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}
>>
>> 1/30/23 4:42:26 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
>>
>> 1/30/23 4:42:19 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
>>
>> 1/30/23 4:41:01 PM[INF]Found osd claims for drivegroup 
>> dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}
>>
>> 1/30/23 4:41:01 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
>>
>> 1/30/23 4:41:01 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
>>
>> 1/30/23 4:41:00 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
>>
>> 1/30/23 4:39:34 PM[INF]Found osd claims for drivegroup 
>> dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}
>>
>> 1/30/23 4:39:34 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
>>
>> 1/30/23 4:39:34 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
>>
>>
>>
>> Although, it doesnt show the NVME as wal/db yet, but i will let it 
>> proceed to a clear state until i do anything further.
>>
>>
>> On 30.01.23 16:42, mailing-lists wrote:
>>> root@ceph-a2-01:/# ceph osd destroy 232 --yes-i-really-mean-it
>>> destroyed osd.232
>>>
>>>
>>> OSD 232 shows now as destroyed and out in the dashboard.
>>>
>>>
>>> root@ceph-a1-06:/# ceph-volume lvm zap /dev/sdm
>>> --> Zapping: /dev/sdm
>>> --> --destroy was not specified, but zapping a whole device will 
>>> remove the partition table
>>> Running command: /usr/bin/dd if=/dev/zero of=/dev/sdm bs=1M count=10 
>>> conv=fsync
>>>  stderr: 10+0 records in
>>> 10+0 records out
>>>  stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0675647 s, 155 MB/s
>>> --> Zapping successful for: 
>>>
>>>
>>> root@ceph-a2-01:/# ceph orch device ls
>>>
>>> ceph-a1-06  /dev/sdm  hdd   TOSHIBA_X_X 16.0T 21m ago *locked*
>>>
>>>
>>> It shows locked and is not automatically added now, which is good i 
>>> think? otherwise it would probably be a new osd 307.
>>>
>>>
>>> root@ceph-a2-01:/# ceph orch osd rm status
>>> No OSD remove/replace operations reported
>>>
>>> root@ceph-a2-01:/# ceph orch osd rm 232 --replace
>>> Unable to find OSDs: ['232']
>>>
>>>
>>> Unfortunately it is still not replacing.
>>>
>>>
>>> It is so weird, i tried this procedure exactly in my virtual ceph 
>>> environment and it just worked. The real scenario is acting up now. -.-
>>>
>>>
>>> Do you have more hints for me?
>>>
>>> Thank you for your help so far!
>>>
>>>
>>> Best
>>>
>>> Ken
>>>
>>>
>>> On 30.01.23 15:46, David Orman wrote:
 The 'down' status is why it's not being replaced, vs. destroyed, 
 which would allow the replacement. I'm not sure why --replace lead 
 to that scenario, but you will probably need to mark it destroyed 
 for it to be replaced.

 https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#replacing-an-osd
  
 has instructions on the non-orch way of doing that. You only need 1/2.

 You should look through your logs to see what happened that the OSD 
 was marked down and not destroyed. Obviously, make sure you 
 understand ramifications before running any commands. :)

 David

 On Mon, Jan 30, 2023, at 04:24, mailing-lists wrote:
> # ceph orch osd rm status
> No OSD remove/replace operations reported
> # ceph orch osd rm 232 --replace
> Unable to find OSDs: ['232']
>
> It is not finding 232 anymore. It is still shown as down and out in 
> the
> Ceph-Dashboard.
>
>
>       pgs: 3236 active+clean
>
>
> This is the new disk shown as locked (because unzapped at the moment).
>
> # ceph orch device ls
>
> ceph-a1-06  /dev/sdm  hdd   TOSHIBA_X_X 16.0T 9m ago
> locked
>
>
> Best
>
> Ken
>
>
> On 29.01.23 18:19, David Orman wrote:
>> What does "ceph orch osd rm status" show before you try the zap? Is
>> your cluster still backfilling to the other OSDs for the PGs that 
>> were
>> on the failed disk?
>>
>> David
>>
>> On Fri, Jan 27, 2023, at 03:25, mailing-lists wrote:
>>> Dear Ceph-Users,
>>>
>>> i am struggling to replace a disk. My ceph-cluster is not 
>>> replacing the
>>> old OSD even though I did:
>>>
>>> 

[ceph-users] Re: ceph/daemon stable tag

2023-01-31 Thread Jonas Nemeikšis
Hello Guillaume,

A bit of sad news about the drop. :/

We do have plans to migrate to cephadm in the future, but not for now. :)

I would like to update and test Pacific's latest version.



Thanks!

On Tue, Jan 31, 2023 at 11:23 AM Guillaume Abrioux wrote:

> Hello Jonas,
>
> As far as I remember, these tags were pushed mostly for ceph-ansible / OSP.
> By the way, the plan is to drop the ceph/daemon image. See corresponding
> PRs in both ceph-ansible and ceph-container repositories [1] [2]
>
> What stable tag are you looking for? I can trigger new builds if it can
> help you until you migrate to cephadm.
>
> [1] https://github.com/ceph/ceph-ansible/pull/7326
> [2] https://github.com/ceph/ceph-container/pull/2083
>
> On Tue, 31 Jan 2023 at 08:49, Jonas Nemeikšis 
> wrote:
>
>> Hello,
>>
>> What's the status with the *-stable-* tags?
>> https://quay.io/repository/ceph/daemon?tab=tags
>>
>> No longer build/support?
>>
>> What should we use until we'll migrate from ceph-ansible to cephadm?
>>
>>
>> Thanks.
>>
>> --
>> Jonas
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
>
> --
>
> *Guillaume Abrioux*Senior Software Engineer
>


-- 
Jonas
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Replacing OSD with containerized deployment

2023-01-31 Thread mailing-lists
OK, the OSD is filled again. In and Up, but it is not using the nvme 
WAL/DB anymore.


And it looks like the LVM group of the old OSD is still on the NVMe drive. I
suspect this because the two NVMe drives still have 9 LVM groups each: 18
groups in total, but only 17 OSDs are using the NVMe (as shown in the
dashboard).



Do you have a hint on how to fix this?



Best

Ken



On 30.01.23 16:50, mailing-lists wrote:

oh wait,

i might have been too impatient:


1/30/23 4:43:07 PM[INF]Deploying daemon osd.232 on ceph-a1-06

1/30/23 4:42:26 PM[INF]Found osd claims for drivegroup 
dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}


1/30/23 4:42:26 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}

1/30/23 4:42:19 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}

1/30/23 4:41:01 PM[INF]Found osd claims for drivegroup 
dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}


1/30/23 4:41:01 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}

1/30/23 4:41:01 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}

1/30/23 4:41:00 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}

1/30/23 4:39:34 PM[INF]Found osd claims for drivegroup 
dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}


1/30/23 4:39:34 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}

1/30/23 4:39:34 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}



It doesn't show the NVMe as WAL/DB yet, though; I will let it reach a clean
state before I do anything further.



On 30.01.23 16:42, mailing-lists wrote:

root@ceph-a2-01:/# ceph osd destroy 232 --yes-i-really-mean-it
destroyed osd.232


OSD 232 shows now as destroyed and out in the dashboard.


root@ceph-a1-06:/# ceph-volume lvm zap /dev/sdm
--> Zapping: /dev/sdm
--> --destroy was not specified, but zapping a whole device will 
remove the partition table
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdm bs=1M count=10 
conv=fsync

 stderr: 10+0 records in
10+0 records out
 stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0675647 s, 155 MB/s
--> Zapping successful for: 


root@ceph-a2-01:/# ceph orch device ls

ceph-a1-06  /dev/sdm  hdd   TOSHIBA_X_X 16.0T 21m ago *locked*


It shows locked and is not automatically added now, which is good i 
think? otherwise it would probably be a new osd 307.



root@ceph-a2-01:/# ceph orch osd rm status
No OSD remove/replace operations reported

root@ceph-a2-01:/# ceph orch osd rm 232 --replace
Unable to find OSDs: ['232']


Unfortunately it is still not replacing.


It is so weird, i tried this procedure exactly in my virtual ceph 
environment and it just worked. The real scenario is acting up now. -.-



Do you have more hints for me?

Thank you for your help so far!


Best

Ken


On 30.01.23 15:46, David Orman wrote:
The 'down' status is why it's not being replaced, vs. destroyed, 
which would allow the replacement. I'm not sure why --replace led
to that scenario, but you will probably need to mark it destroyed 
for it to be replaced.


https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#replacing-an-osd 
has instructions on the non-orch way of doing that. You only need 1/2.


You should look through your logs to see what happened that the OSD 
was marked down and not destroyed. Obviously, make sure you 
understand ramifications before running any commands. :)


David

On Mon, Jan 30, 2023, at 04:24, mailing-lists wrote:

# ceph orch osd rm status
No OSD remove/replace operations reported
# ceph orch osd rm 232 --replace
Unable to find OSDs: ['232']

It is not finding 232 anymore. It is still shown as down and out in the
Ceph-Dashboard.


      pgs: 3236 active+clean


This is the new disk shown as locked (because unzapped at the moment).

# ceph orch device ls

ceph-a1-06  /dev/sdm  hdd   TOSHIBA_X_X 16.0T  9m ago  locked


Best

Ken


On 29.01.23 18:19, David Orman wrote:

What does "ceph orch osd rm status" show before you try the zap? Is
your cluster still backfilling to the other OSDs for the PGs that 
were

on the failed disk?

David

On Fri, Jan 27, 2023, at 03:25, mailing-lists wrote:

Dear Ceph-Users,

I am struggling to replace a disk. My ceph-cluster is not replacing the
old OSD even though I did:

ceph orch osd rm 232 --replace

The OSD 232 is still shown in the OSD list, but the new HDD will be placed as
a new OSD. This wouldn't bother me much if the new OSD were also placed on the
BlueStore DB / NVMe, but it isn't.


My steps:

"ceph orch osd rm 232 --replace"

remove the failed hdd.

add the new one.

Convert the disk within the server's BIOS, so that the node has direct access
to it.

It shows up as /dev/sdt,

enter maintenance mode

reboot server

drive is now /dev/sdm (which the old drive had)

"ceph orch device zap node-x /dev/sdm"

A new OSD is placed on the cluster.


Can you give me a hint where I took a wrong turn? Why is the disk not being
used as OSD 232?


Best

Ken

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe 
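
For reference, the replacement flow being attempted in this thread normally
looks roughly like the following with the orchestrator (a sketch only, reusing
the host, device, and OSD id from the thread; it is not a fix for the specific
locked-device issue discussed above):

# mark the OSD for replacement; this keeps the OSD id and CRUSH entry
ceph orch osd rm 232 --replace

# watch the drain/removal until it finishes
ceph orch osd rm status

# wipe the replacement device so the orchestrator stops reporting it
# as "locked" and can reuse it
ceph orch device zap ceph-a1-06 /dev/sdm --force

# the matching OSD service spec (drive group) should then recreate
# osd.232 on that device; check the spec with:
ceph orch ls osd --format yaml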

[ceph-users] Re: ceph/daemon stable tag

2023-01-31 Thread Guillaume Abrioux
Hello Jonas,

As far as I remember, these tags were pushed mostly for ceph-ansible / OSP.
By the way, the plan is to drop the ceph/daemon image. See corresponding
PRs in both ceph-ansible and ceph-container repositories [1] [2]

What stable tag are you looking for? I can trigger new builds if it can
help you until you migrate to cephadm.

[1] https://github.com/ceph/ceph-ansible/pull/7326
[2] https://github.com/ceph/ceph-container/pull/2083

On Tue, 31 Jan 2023 at 08:49, Jonas Nemeikšis  wrote:

> Hello,
>
> What's the status with the *-stable-* tags?
> https://quay.io/repository/ceph/daemon?tab=tags
>
> No longer build/support?
>
> What should we use until we'll migrate from ceph-ansible to cephadm?
>
>
> Thanks.
>
> --
> Jonas
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 

*Guillaume Abrioux*
Senior Software Engineer
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io