Hi,

I am currently evaluating Ceph and stumbled across an odd issue when an OSD 
comes back online.
The OSD is taken offline but is still "in", and it is brought back online 
before it is marked "out".
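
The down/up cycle itself is nothing special; it is roughly just stopping and 
starting the daemon on one node, along these lines (osd.3 is only an example 
ID):

    systemctl stop ceph-osd@3     # OSD reported down, still "in"
    ... wait while client I/O and recovery are running ...
    systemctl start ceph-osd@3    # brought back before it is marked "out"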

As a test I run a fio job with 4k random I/O on a 10G RBD volume during the 
OSD down/up procedure.
The OSDs are 8x 960GB SAS SSDs spread over 4 nodes, interconnected with 2x10GE 
each. Neither the network nor the SSDs appear to be congested at any time.
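
The fio job looks roughly like this (rbd ioengine; pool/image names and the 
read/write mix are only placeholders):

    [global]
    # pool, rbdname and rw are placeholders, not the exact values I used
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=fio-test
    direct=1
    bs=4k
    rw=randwrite
    iodepth=32
    runtime=300
    time_based=1

    [rand-4k]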

From time to time I see a complete stall in the fio benchmark for approx. 10 
seconds while recovery is ongoing.
None of the recovery parameters (max_recovery, max_backfill, sleep, etc.) seem 
to influence it.
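
Concretely, I varied settings along these lines (the values shown here are 
only illustrative):

    ceph config set osd osd_max_backfills 1
    ceph config set osd osd_recovery_max_active 1
    ceph config set osd osd_recovery_sleep 0.1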

While digging deeper I found that the requests are hanging in the "waiting for 
readable" state.

Any hints on how to debug this further would be great. Might it be linked to 
the new feature in Octopus to read from all OSDs and not just the primary?

Thank you,
Peter