[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-30 Thread Mary Zhang
Sorry Frank, I typed the wrong name.


[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-30 Thread Mary Zhang
Sounds good. Thank you Kevin and have a nice day!

Best Regards,
Mary


[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-30 Thread Frank Schilder
I think you are panicking way too much. Chances are that you will never
need that command, so don't get worked up over an old post.

Just follow what I wrote and, in the extremely rare case that recovery does not 
complete due to missing information, send an e-mail to this list and state that 
you still have the disk of the down OSD. Someone will send you the 
export/import commands within a short time.

So stop worrying and just administer your cluster with common storage
admin sense.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14



[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-30 Thread Mary Zhang
Thank you Frank for sharing such valuable experience! I really appreciate
it.
We observed a similar timeline: it took more than 1 week to drain our OSD.
Regarding exporting PGs from a failed disk and injecting them back into the
cluster, do you have any documentation? I found this online: Ceph.io —
Incomplete PGs -- OH MY! <https://ceph.io/en/news/blog/2015/incomplete-pgs-oh-my/>,
but I'm not sure whether it's the standard process.

Thanks,
Mary


[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-30 Thread Frank Schilder
Hi all,

I second Eugen's recommendation. We have a cluster with large HDD OSDs where 
the following timings are found:

- drain an OSD: 2 weeks.
- down an OSD and let cluster recover: 6 hours.

The drain-OSD procedure is - in my experience - a complete waste of time. It
actually puts your cluster at higher risk of a second failure (it's not
guaranteed that the bad PG(s) is/are drained first) and also screws up all
sorts of internal operations like scrub for an unnecessarily long time. The
recovery procedure is much faster, because it uses all-to-all recovery while a
drain is limited to no more than max_backfills PGs at a time and your broken
disk sits in the cluster much longer.

On SSDs the "down OSD"-method shows a similar speed-up factor.
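
As an aside, if you want to inspect or temporarily raise that backfill limit
while recovery runs, something along these lines should work (untested here;
on recent releases the mClock scheduler may ignore the value unless you
explicitly allow overriding recovery settings):

  ceph config get osd osd_max_backfills
  ceph config set osd osd_max_backfills 3   # revert later with: ceph config rm osd osd_max_backfills
  ceph config set osd osd_mclock_override_recovery_settings true   # only if mClock ignores the change (Quincy and later)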

As a safety measure, don't destroy the OSD right away; wait for recovery to
complete and only then destroy the OSD and throw away the disk. In case an
error occurs during recovery, you can almost always still export PGs from a
failed disk and inject them back into the cluster. This, however, requires
taking disks out as soon as they show problems and before they fail hard.
Leave a little bit of lifetime so you have a chance to recover data. See the
ddrescue manual for why it is important to stop IO from a failing disk as soon
as possible.
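
If it ever comes to that, the export/import is done with ceph-objectstore-tool
against stopped OSDs. A rough sketch (untested as written - adjust the data
path, OSD ids and PG id; on cephadm/containerized clusters run it inside the
OSD's container, e.g. via "cephadm shell --name osd.<ID>"):

  # on the host with the failing disk, with that OSD stopped:
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<ID> --op list-pgs
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<ID> --pgid <PGID> --op export --file /tmp/<PGID>.export
  # on a healthy OSD chosen to receive the PG, also stopped while importing:
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<OTHERID> --pgid <PGID> --op import --file /tmp/<PGID>.export

Then start the target OSD again and let the cluster peer and recover.
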

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Eugen Block 
Sent: Saturday, April 27, 2024 10:29 AM
To: Mary Zhang
Cc: ceph-users@ceph.io; Wesley Dillingham
Subject: [ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

If the rest of the cluster is healthy and your resiliency is
configured properly, for example to sustain the loss of one or more
hosts at a time, you don’t need to worry about a single disk. Just
take it out and remove it (forcefully) so it doesn’t have any clients
anymore. Ceph will immediately assign different primary OSDs and your
clients will be happy again. ;-)

Zitat von Mary Zhang :

> Thank you Wesley for the clear explanation between the 2 methods!
> The tracker issue you mentioned https://tracker.ceph.com/issues/44400 talks
> about primary-affinity. Could primary-affinity help remove an OSD with
> hardware issue from the cluster gracefully?
>
> Thanks,
> Mary
>
>
> On Fri, Apr 26, 2024 at 8:43 AM Wesley Dillingham 
> wrote:
>
>> What you want to do is to stop the OSD (and all its copies of data it
>> contains) by stopping the OSD service immediately. The downside of this
>> approach is it causes the PGs on that OSD to be degraded. But the upside is
>> the OSD which has bad hardware is immediately no  longer participating in
>> any client IO (the source of your RGW 503s). In this situation the PGs go
>> into degraded+backfilling
>>
>> The alternative method is to keep the failing OSD up and in the cluster
>> but slowly migrate the data off of it, this would be a long drawn out
>> period of time in which the failing disk would continue to serve client
>> reads and also facilitate backfill but you wouldnt take a copy of the data
>> out of the cluster and cause degraded PGs. In this scenario the PGs would
>> be remapped+backfilling
>>
>> I tried to find a way to have your cake and eat it to in relation to this
>> "predicament" in this tracker issue: https://tracker.ceph.com/issues/44400
>> but it was deemed "wont fix".
>>
>> Respectfully,
>>
>> *Wes Dillingham*
>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>> w...@wesdillingham.com
>>
>>
>>
>>
>> On Fri, Apr 26, 2024 at 11:25 AM Mary Zhang 
>> wrote:
>>
>>> Thank you Eugen for your warm help!
>>>
>>> I'm trying to understand the difference between 2 methods.
>>> For method 1, or "ceph orch osd rm osd_id", OSD Service — Ceph
>>> Documentation
>>> <https://docs.ceph.com/en/latest/cephadm/services/osd/#remove-an-osd>
>>> says
>>> it involves 2 steps:
>>>
>>>1.
>>>
>>>evacuating all placement groups (PGs) from the OSD
>>>2.
>>>
>>>removing the PG-free OSD from the cluster
>>>
>>> For method 2, or the procedure you recommended, Adding/Removing OSDs —
>>> Ceph
>>> Documentation
>>> <
>>> https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-osds-manual
>>> >
>>> says
>>> "After the OSD has been taken out of the cluster, Ceph begins rebalancing
>>> the cluster by migrating placement groups out of the OSD that was removed.
>>> "
>>>
>>> What's the difference between "evacuatin

[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-27 Thread Mary Zhang
Thank you Eugen so much for your insights! We will definitely apply this
method next time. :-)

Best Regards,
Mary


[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-27 Thread Eugen Block
If the rest of the cluster is healthy and your resiliency is  
configured properly, for example to sustain the loss of one or more  
hosts at a time, you don’t need to worry about a single disk. Just  
take it out and remove it (forcefully) so it doesn’t have any clients  
anymore. Ceph will immediately assign different primary OSDs and your  
clients will be happy again. ;-)
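
If you want to double-check first, the monitors can tell you whether stopping
or destroying the OSD would put any data at risk; roughly (the exact output
wording differs between releases):

  ceph osd ok-to-stop <id>         # would stopping this OSD leave PGs unable to serve IO?
  ceph osd safe-to-destroy <id>    # are all of its PGs fully recovered elsewhere?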




[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-26 Thread Mary Zhang
Thank you Wesley for the clear explanation between the 2 methods!
The tracker issue you mentioned https://tracker.ceph.com/issues/44400 talks
about primary-affinity. Could primary-affinity help remove an OSD with a
hardware issue from the cluster gracefully?

Thanks,
Mary




[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-26 Thread Wesley Dillingham
What you want to do is stop the OSD (and all the copies of data it contains)
by stopping the OSD service immediately. The downside of this approach is that
it causes the PGs on that OSD to be degraded. But the upside is that the OSD
with the bad hardware immediately stops participating in any client IO (the
source of your RGW 503s). In this situation the PGs go into
degraded+backfilling.

The alternative method is to keep the failing OSD up and in the cluster but
slowly migrate the data off of it. This would be a long, drawn-out period of
time in which the failing disk would continue to serve client reads and also
facilitate backfill, but you wouldn't take a copy of the data out of the
cluster and cause degraded PGs. In this scenario the PGs would be
remapped+backfilling.
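
Either way, you can watch which of the two states you end up in while it runs;
something like this should work (state filters can be checked with
"ceph pg ls -h" on your release):

  ceph -s                        # overall recovery/backfill progress
  ceph pg ls degraded            # PGs missing a copy (the stop-the-OSD path)
  ceph pg ls remapped            # PGs being moved off a still-running OSD (the drain path)
  ceph pg ls-by-osd osd.<ID>     # PGs that still map to a given OSD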

I tried to find a way to have your cake and eat it too in relation to this
"predicament" in this tracker issue: https://tracker.ceph.com/issues/44400,
but it was deemed "won't fix".

Respectfully,

*Wes Dillingham*
LinkedIn <http://www.linkedin.com/in/wesleydillingham>
w...@wesdillingham.com






[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-26 Thread Mary Zhang
Thank you Eugen for your warm help!

I'm trying to understand the difference between 2 methods.
For method 1, or "ceph orch osd rm osd_id", OSD Service — Ceph Documentation
<https://docs.ceph.com/en/latest/cephadm/services/osd/#remove-an-osd> says
it involves 2 steps:

   1. evacuating all placement groups (PGs) from the OSD
   2. removing the PG-free OSD from the cluster

For method 2, or the procedure you recommended, Adding/Removing OSDs — Ceph
Documentation
<https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-osds-manual>
says "After the OSD has been taken out of the cluster, Ceph begins rebalancing
the cluster by migrating placement groups out of the OSD that was removed."

What's the difference between "evacuating PGs" in method 1 and "migrating
PGs" in method 2? I think method 1 must read the OSD that is being removed;
otherwise, we would not see the slow ops warning. Does method 2 not involve
reading this OSD?
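
(Side note, in case it helps anyone searching the archives later: while a
method-1 removal is draining, its progress can apparently be followed with

  ceph orch osd rm status

according to the same cephadm documentation page.)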

Thanks,
Mary



[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-26 Thread Eugen Block

Hi,

if you remove the OSD this way, it will be drained, which means that
it will try to recover PGs from this OSD, and in case of a hardware
failure that might lead to slow requests. It might make sense to
forcefully remove the OSD without draining:

- stop the osd daemon
- mark it as out
- osd purge <id> [--force] [--yes-i-really-mean-it]
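
On a cephadm-managed cluster that would be roughly the following (untested as
written; adjust the OSD id):

  ceph orch daemon stop osd.<id>    # or: systemctl stop ceph-osd@<id> on non-cephadm hosts
  ceph osd out <id>
  ceph osd purge <id> --yes-i-really-mean-it    # add --force if the cluster refuses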

Regards,
Eugen

Zitat von Mary Zhang :


Hi,

We recently removed an osd from our Ceph cluster. Its underlying disk has
a hardware issue.

We used the command: ceph orch osd rm osd_id --zap

During the process, the ceph cluster sometimes enters a warning state with
slow ops on this osd. Our rgw also failed to respond to requests and
returned 503.

We restarted the rgw daemon to make it work again, but the same failure
occurred from time to time. Eventually we noticed that the rgw 503 errors
are a result of osd slow ops.

Our cluster has 18 hosts and 210 OSDs. We expect that removing an osd with a
hardware issue won't impact cluster performance & rgw availability. Is our
expectation reasonable? What's the best way to handle OSDs with hardware
failures?

Thank you in advance for any comments or suggestions.

Best Regards,
Mary Zhang



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io