[ceph-users] Re: The last 15 'degraded' items take as many hours as the first 15K?

Anthony D'Atri Wed, 11 May 2022 15:08:33 -0700

Small objects recover faster than large ones.

More importantly, early in the process many OSDs / PGs are recovering in
parallel. Toward the end there’s a long tail where parallelism is limited by
osd_max_backfills: if, say, the remaining PGs to recover all sit on a single
OSD, they will execute serially, one after another.
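A toy model makes the long tail concrete. This is only an illustrative sketch, not how Ceph actually schedules recovery. It assumes each PG needs one slot on every OSD it touches, that each OSD grants at most osd_max_backfills slots at once, and that each PG's recovery time is known up front (larger objects mean more hours). The PG names, OSD ids and the simulate() helper are all hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class PG:
    name: str
    osds: tuple  # OSD ids that must each grant a slot (toy assumption)
    hours: float  # recovery time; larger objects would mean more hours


def simulate(pgs, max_backfills=1):
    """Event-driven toy scheduler; returns [(finish_time, pg_name), ...].

    Hypothetical model: a PG may start only when every OSD it touches
    has fewer than max_backfills recoveries in flight.
    """
    in_use = {}    # osd id -> slots currently held
    pending = list(pgs)
    running = []   # min-heap of (finish_time, seq, pg)
    finished = []
    now, seq = 0.0, 0
    while pending or running:
        # Start every waiting PG whose OSDs all have a free slot.
        still_waiting = []
        for pg in pending:
            if all(in_use.get(o, 0) < max_backfills for o in pg.osds):
                for o in pg.osds:
                    in_use[o] = in_use.get(o, 0) + 1
                heapq.heappush(running, (now + pg.hours, seq, pg))
                seq += 1
            else:
                still_waiting.append(pg)
        pending = still_waiting
        # Jump to the next completion and release that PG's slots.
        now, _, done = heapq.heappop(running)
        for o in done.osds:
            in_use[o] -= 1
        finished.append((now, done.name))
    return finished


# Twelve PGs on disjoint OSD pairs recover side by side in the first hour...
spread = [PG(f"1.{i}", (i, i + 12), 1.0) for i in range(12)]
# ...but four PGs that all need a slot on OSD 0 queue behind one another.
tail = [PG(f"2.{i}", (0, 100 + i), 1.0) for i in range(4)]

for t, name in simulate(spread + tail, max_backfills=1):
    print(f"{t:4.1f}h  {name}")
```

With max_backfills=1 the first twelve PGs all finish at 1.0h, while the last four finish at 2.0h, 3.0h, 4.0h and 5.0h. The final handful takes four times as long as the first dozen. Raising the limit to 2 in this model shortens the tail to 3.0h, because OSD 0 can then work on two PGs at once.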


> 
> Might someone explain why the count of degraded items can drop by thousands,
> sometimes tens of thousands, in the same number of hours it takes to go from
> 10 to 0? For example, when an OSD, or a host with a few OSDs, goes offline
> for a while, such as during a reboot.
> 
> It has been sitting at one complete and entire degraded object out of millions
> for longer than it took to write this post.
> 
> Seems the fewer the number of degraded objects, the less interested the 
> cluster is in fixing it!
> 
> HC
> 
> 
> 
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
