Hi,

On 9/19/25 13:23, Niklas Hambüchen wrote:
I noticed that for my clusters, even a short 5-minute network outage or 
single-host reboot can cause

     pgs:     5586988/366684639 objects misplaced (1.524%)

which at the speed of

     recovery: 2.2 GiB/s, 676 objects/s

can take hours to recover.

I don't understand how this can be. If it's down for so short a time, how can rebalancing take this long?
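For reference, the quoted figures alone already imply a multi-hour backfill. A quick back-of-the-envelope check, using only the numbers above:

```shell
# Recovery-time estimate from the figures quoted above:
# 5586988 misplaced objects at 676 objects/s.
objects=5586988
rate=676
secs=$(( objects / rate ))
echo "$secs seconds (~$(( secs / 3600 )) hours)"   # prints "8264 seconds (~2 hours)"
```

So even before any prioritization effects, the object rate alone puts the backfill in the hours range.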

If the objects are only misplaced, not degraded, then full data availability and consistency according to the chosen CRUSH rules is still guaranteed. The objects are "only" on the wrong OSDs.
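You can check which case you are in from the cluster status. A sketch of the usual commands (output wording varies by release):

```shell
# Overall state: look for "misplaced" vs "degraded" in the pgs/health lines.
ceph -s

# More detail on which PGs are affected and why:
ceph health detail
```

If the health output only mentions misplaced objects and remapped PGs, no redundancy has been lost.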

In my experience, backfill operations for misplaced objects have a lower priority than other operations. Things are different if objects are _degraded_: in that case Ceph pushes the backfill operations harder.


You also have to consider how many backfill operations can run in parallel. Each OSD has a limited number of backfill slots, and all OSDs handling a PG have to provide a free slot for the operation. For replicated pools this means three OSDs are involved, probably more for EC pools. And finally there's Murphy: the last objects often have to be handled by the same small set of OSDs, so backfill operations queue up and progress slows.
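The slot count is controlled by osd_max_backfills. A sketch of inspecting and cautiously raising it (defaults and interactions differ by release; under the mClock scheduler this value may be ignored unless overrides are enabled):

```shell
# Current per-OSD backfill slot count:
ceph config get osd osd_max_backfills

# Temporarily allow more concurrent backfills per OSD:
ceph config set osd osd_max_backfills 2

# Revert to the default once the cluster has caught up:
ceph config rm osd osd_max_backfills
```

Raising this speeds up backfill at the cost of more load on client I/O, so change it in small steps and watch the cluster.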

There are a number of settings for backfilling, but which ones apply differs, e.g. depending on the selected osd_op_queue.
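A sketch of checking which scheduler is active and, for mClock, shifting capacity toward recovery (profile names are from recent releases; on wpq the classic knobs apply instead):

```shell
# Which op scheduler the OSDs use determines which knobs matter:
ceph config get osd osd_op_queue        # e.g. "wpq" or "mclock_scheduler"

# With mclock_scheduler, a profile can favor recovery/backfill traffic:
ceph config set osd osd_mclock_profile high_recovery_ops

# With wpq, tune the classic options instead, e.g.:
#   osd_max_backfills, osd_recovery_max_active, osd_recovery_sleep
```

Remember to switch the profile back (e.g. to balanced) once the misplaced objects are gone, or client I/O stays deprioritized.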

Best regards,

Burkhard

_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]