On 9/22/25 14:42, Niklas Hambüchen wrote:
Well, if they were away long enough to get "out", then it is somewhat 
reasonable even for ~5m downtimes.

Right, but what I'm saying is that this is not what happens.

My reboot or disconnect takes < 5 minutes, and no OSD is `out` afterwards.

When I say "down for 5 minutes", I literally mean that the node goes down,
comes back up, and I'm sitting in front of its terminal observing that all
OSDs are `up` and `in`.

Of course your explanation of what happens if it's `out` makes sense, but that
isn't my scenario; if Ceph had 10 hours to move data off, of course I would
have to expect at least 10 hours to move data back on. But here it has at most
5 minutes to move data off.

Are we talking about a replicated pool or EC here?
And what is your failure domain?
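
If you're not sure, something along these lines should show it (adjust to your
pool names; just a quick sketch):

ceph osd pool ls detail      # replicated size or EC profile per pool
ceph osd crush rule dump     # failure domain used by each crush rule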

What might give insight is the following command:

watch -n 5 ceph pg ls remapped

In the first columns you can see how many objects / how much data are missing on each PG. Maybe note what the status is for a specific set of OSDs before a reboot; that might give a clue about what happens.
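
For example (just a sketch; osd.12 stands in for one of the OSDs on the node
you are going to reboot):

ceph pg ls-by-osd osd.12 > pgs_before.txt
ceph pg ls remapped > remapped_before.txt
# ... reboot the node, wait until all its OSDs are up/in again ...
ceph pg ls remapped > remapped_after.txt
diff remapped_before.txt remapped_after.txt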

There is a difference in recovery strategy between replicated and EC pools [1]. Not sure about backfill, but this might be handled in a similar manner. In a replicated pool I would expect the time to backfill the misplaced objects to be roughly similar to the downtime. A bit longer, actually, as there might be a few OSDs that have multiple PGs to backfill and not enough backfill slots to do this in parallel (as was already mentioned in this thread).
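
Regarding the backfill slots: if you want to check or temporarily raise them,
something like this should work on a reasonably recent release with the
central config database (note that with the mClock scheduler the value may be
capped unless osd_mclock_override_recovery_settings is enabled):

ceph config get osd osd_max_backfills
ceph config set osd osd_max_backfills 3   # more parallel backfills, at the cost of extra load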

Gr. Stefan

[1]: https://docs.ceph.com/en/latest/dev/osd_internals/log_based_pg/