On 9/22/25 14:42, Niklas Hambüchen wrote:
Well, if they were away long enough to get "out", then it is somewhat 
reasonable even for ~5m downtimes.

Right, but what I'm saying is that this is not what happens.

My reboot or disconnect takes < 5 minutes, and no OSD is `out` afterwards.

When I say "down for 5 minutes", I literally mean that the node goes down,
comes back up, and I'm sitting in front of its terminal observing that all
OSDs are `up` and `in`.

Of course your explanation of what happens if it's `out` makes sense, but that
isn't my scenario; if Ceph had 10 hours to move data off, of course I would
have to expect at least 10 hours to move data back on. But here it has at most
5 minutes to move data off.

Are we talking about a replicated pool or EC here?
And what is your failure domain?
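
If you're not sure, something along these lines should show it (adjust to your
pool names; just a quick sketch):

ceph osd pool ls detail      # replicated size or EC profile per pool
ceph osd crush rule dump     # failure domain used by each crush rule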

What might give insight is the following command:

watch -n 5 ceph pg ls remapped

In the first columns you can see how many objects / how much data are missing on each PG. Maybe note what the status is for a specific set of OSDs before a reboot; that might give a clue about what happens.
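
For example (just a sketch; osd.12 stands in for one of the OSDs on the node
you are going to reboot):

ceph pg ls-by-osd osd.12 > pgs_before.txt
ceph pg ls remapped > remapped_before.txt
# ... reboot the node, wait until all its OSDs are up/in again ...
ceph pg ls remapped > remapped_after.txt
diff remapped_before.txt remapped_after.txt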

There is a difference in recovery strategy between replicated and EC pools [1]. Not sure about backfill, but this might be handled in a similar manner. In a replicated pool I would expect the time to backfill the misplaced objects to be roughly similar to the downtime. A bit longer, actually, as there might be a few OSDs that have multiple PGs to backfill and not enough backfill slots to do this in parallel (as was already mentioned in this thread).
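
Regarding the backfill slots: if you want to check or temporarily raise them,
something like this should work on a reasonably recent release with the
central config database (note that with the mClock scheduler the value may be
capped unless osd_mclock_override_recovery_settings is enabled):

ceph config get osd osd_max_backfills
ceph config set osd osd_max_backfills 3   # more parallel backfills, at the cost of extra load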

Gr. Stefan

[1]: https://docs.ceph.com/en/latest/dev/osd_internals/log_based_pg/