[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-30 Thread Mary Zhang
…childer >> AIT Risø Campus >> Bygning 109, rum S14 >> From: Mary Zhang >> Sent: Tuesday, April 30, 2024 5:00 PM >> To: Frank Schilder >> Cc: Eugen Block; ceph-users@ceph.io; Wesley Dillingham >> Su…

[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-30 Thread Mary Zhang
…M > To: Frank Schilder > Cc: Eugen Block; ceph-users@ceph.io; Wesley Dillingham > Subject: Re: [ceph-users] Re: Remove an OSD with hardware issue caused rgw 503 > > Thank you Frank for sharing such valuable experience! I really appreciate it. > We observe similar timelines…

[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-30 Thread Frank Schilder
From: Mary Zhang Sent: Tuesday, April 30, 2024 5:00 PM To: Frank Schilder Cc: Eugen Block; ceph-users@ceph.io; Wesley Dillingham Subject: Re: [ceph-users] Re: Remove an OSD with hardware issue caused rgw 503 Thank you Frank for sharing such valuable experience! I really…

[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-30 Thread Mary Zhang
From: Eugen Block > Sent: Saturday, April 27, 2024 10:29 AM > To: Mary Zhang > Cc: ceph-users@ceph.io; Wesley Dillingham > Subject: [ceph-users] Re: Remove an OSD with hardware issue caused rgw 503 > > If the rest of the cluster is healthy and your resiliency is > configur…

[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-30 Thread Frank Schilder
…4 10:29 AM To: Mary Zhang Cc: ceph-users@ceph.io; Wesley Dillingham Subject: [ceph-users] Re: Remove an OSD with hardware issue caused rgw 503 If the rest of the cluster is healthy and your resiliency is configured properly, for example to sustain the loss of one or more hosts at a time, you don’t nee…

[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-27 Thread Mary Zhang
Thank you Eugen so much for your insights! We will definitely apply this method next time. :-) Best Regards, Mary On Sat, Apr 27, 2024 at 1:29 AM Eugen Block wrote: > If the rest of the cluster is healthy and your resiliency is > configured properly, for example to sustain the loss of one or…

[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-27 Thread Eugen Block
If the rest of the cluster is healthy and your resiliency is configured properly, for example to sustain the loss of one or more hosts at a time, you don’t need to worry about a single disk. Just take it out and remove it (forcefully) so it doesn’t have any clients anymore. Ceph will…

[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-26 Thread Mary Zhang
Thank you Wesley for the clear explanation between the 2 methods! The tracker issue you mentioned https://tracker.ceph.com/issues/44400 talks about primary-affinity. Could primary-affinity help remove an OSD with hardware issue from the cluster gracefully? Thanks, Mary On Fri, Apr 26, 2024 at…
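For context on the question above: primary affinity only biases which replica is elected primary (and so serves reads and coordinates writes); it does not move data off the disk. A minimal sketch of how it is set, assuming a hypothetical failing OSD with id 12:

```shell
# Stop osd.12 from being elected primary for its PGs.
# It still stores its replicas and still participates in recovery,
# so this alone does not remove a failing disk from the cluster.
ceph osd primary-affinity osd.12 0
```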

[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-26 Thread Wesley Dillingham
What you want to do is to stop the OSD (and all its copies of data it contains) by stopping the OSD service immediately. The downside of this approach is it causes the PGs on that OSD to be degraded. But the upside is the OSD which has bad hardware is immediately no longer participating in any…
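The stop-immediately approach described above might look like this in practice (a sketch only; `12` is a placeholder OSD id, and the unit name assumes a non-cephadm deployment):

```shell
# Stop the OSD daemon at once; its PGs become degraded, but the bad
# disk immediately stops participating in client I/O and recovery.
systemctl stop ceph-osd@12

# Watch the degraded PGs recover from the surviving replicas.
ceph -s
ceph pg stat
```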

[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-26 Thread Mary Zhang
Thank you Eugen for your warm help! I'm trying to understand the difference between 2 methods. For method 1, or "ceph orch osd rm osd_id", OSD Service — Ceph Documentation says it involves 2 steps: 1. evacuating all…
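The drain-based removal being discussed (method 1) can be sketched as follows on a cephadm-managed cluster; `12` is a placeholder OSD id:

```shell
# Method 1: schedule removal with draining. Ceph evacuates the OSD's
# PGs first, which means reading data back off the (failing) disk.
ceph orch osd rm 12

# Monitor the drain; the OSD is removed once evacuation completes.
ceph orch osd rm status
```

The thread's point is that this path reads from the bad hardware, which is what can stall requests badly enough to surface as rgw 503s.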

[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-26 Thread Eugen Block
Hi, if you remove the OSD this way, it will be drained, which means it will try to recover PGs from this OSD, and in case of hardware failure that might lead to slow requests. It might make sense to forcefully remove the OSD without draining: - stop the osd daemon - mark it as out - …
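The forced-removal steps listed above can be sketched as the following sequence; `12` is a placeholder OSD id, and `ceph orch daemon stop` assumes a cephadm deployment:

```shell
# Stop the failing OSD daemon so it no longer serves any clients.
ceph orch daemon stop osd.12

# Mark it out so CRUSH remaps its PGs and recovery proceeds from the
# surviving replicas rather than from the bad disk.
ceph osd out 12

# Once recovery has finished, purge the OSD: this removes it from the
# CRUSH map and deletes its auth key and OSD id in one step.
ceph osd purge 12 --yes-i-really-mean-it
```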