[ceph-users] Re: 17.2.7: Backfilling deadlock / stall / stuck / standstill

2024-01-28 Thread Kai Stian Olstad
On 26.01.2024 23:09, Mark Nelson wrote: For what it's worth, we saw this last week at Clyso on two separate customer clusters on 17.2.7 and also solved it by moving back to wpq. We've been traveling this week so haven't created an upstream tracker for it yet, but we're back to recommending

[ceph-users] Re: 17.2.7: Backfilling deadlock / stall / stuck / standstill

2024-01-28 Thread Kai Stian Olstad
On 26.01.2024 22:08, Wesley Dillingham wrote: I faced a similar issue. The PG just would never finish recovery. Changing all OSDs in the PG to "osd_op_queue wpq" and then restarting them serially ultimately allowed the PG to recover. Seemed to be some issue with mclock. Thank you Wes,

[ceph-users] Re: 17.2.7: Backfilling deadlock / stall / stuck / standstill

2024-01-26 Thread Mark Nelson
For what it's worth, we saw this last week at Clyso on two separate customer clusters on 17.2.7 and also solved it by moving back to wpq. We've been traveling this week so haven't created an upstream tracker for it yet, but we're back to recommending wpq to our customers for all production

[ceph-users] Re: 17.2.7: Backfilling deadlock / stall / stuck / standstill

2024-01-26 Thread Wesley Dillingham
I faced a similar issue. The PG just would never finish recovery. Changing all OSDs in the PG to "osd_op_queue wpq" and then restarting them serially ultimately allowed the PG to recover. Seemed to be some issue with mclock.
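For anyone wanting to script the workaround described above, here is a rough sketch that only builds the list of commands and does not run anything. The helper name and the example OSD ids are hypothetical. It assumes the OSDs are restarted either through cephadm (`ceph orch daemon restart`) or, on package-based installs, with `systemctl restart ceph-osd@<id>` run on the host that carries the OSD. The thread does not say which restart method was used, so adjust to your deployment.

```python
def wpq_workaround_commands(osd_ids, use_cephadm=True):
    """Build (not run) the commands for moving a PG's OSDs back to wpq.

    osd_ids: ids of the OSDs in the stuck PG's up/acting set
             (for example as reported by `ceph pg map <pgid>`).
    use_cephadm: assumption about the deployment type, see the note above.
    """
    ids = [int(i) for i in osd_ids]  # reject anything that is not an integer id

    commands = []
    # Step 1: set the scheduler for every OSD in the PG.
    for osd_id in ids:
        commands.append(f"ceph config set osd.{osd_id} osd_op_queue wpq")

    # Step 2: restart the OSDs one at a time, since osd_op_queue only
    # takes effect after a restart. Wait for the PG to peer again
    # before issuing the next restart.
    for osd_id in ids:
        if use_cephadm:
            commands.append(f"ceph orch daemon restart osd.{osd_id}")
        else:
            commands.append(f"systemctl restart ceph-osd@{osd_id}")
    return commands


if __name__ == "__main__":
    # Hypothetical OSD ids, for illustration only.
    for cmd in wpq_workaround_commands([3, 17, 42]):
        print(cmd)
```

Printing the commands first and running them by hand keeps the restarts serial and lets you check `ceph -s` between each one.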