On 10/6/22 16:12, Frank Schilder wrote:
Hi Igor and Stefan.

Not sure why you're talking about a replicated(!) 4(2) pool.

It's because in the production cluster it's the 4(2) pool that has that problem. On the
test cluster it was an EC pool. It seems to affect all sorts of pools.

I have to take this one back. It is indeed an EC pool, also located on these SSD
OSDs, that is affected. The meta-data pool stayed active the whole time until we
lost the 3rd host. So, the reported bug is confirmed to affect EC pools.

If not - does every failed OSD unconditionally mean its underlying disk is
no longer available?

Fortunately not. After losing disks on the 3rd host, we had to start taking
somewhat more desperate measures. We set the file system offline to stop
client IO and started rebooting hosts in reverse order of failing. This brought
back the OSDs on the still unconverted hosts. We rebooted the converted host
with the original OSD failures last. Unfortunately, here it seems we lost a
drive for good. It looks like the OSDs crashed while the conversion was still
running. They don't boot up and I need to look into that in more detail.

Maybe not do the online conversion, but opt for an offline one? That way you can inspect whether it works or not. Time-wise it hardly matters (online conversions used to be much slower, but that is no longer the case). If an already upgraded OSD restarts (because it crashed, for example), it will immediately do the conversion. It might be better to have a bit more control over it and do it manually.
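
For reference, a rough sketch of how that manual, offline conversion can be run, assuming the omap conversion triggered via bluestore_fsck_quick_fix_on_mount and non-containerized OSDs; <ID> is a placeholder for the OSD id:

# ceph config set osd bluestore_fsck_quick_fix_on_mount false   # don't convert automatically on OSD start
# systemctl stop ceph-osd@<ID>                                  # per OSD, with the daemon stopped
# ceph-bluestore-tool quick-fix --path /var/lib/ceph/osd/ceph-<ID>
# systemctl start ceph-osd@<ID>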

We recently observed that restarted OSDs might take some time to do their standard RocksDB compactions. We therefore set the "noup" flag to give them time for that housekeeping, and only unset the flag after it finishes. This prevented a lot of slow ops we would have had otherwise. It might help in this case as well.
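
In practice that amounts to something like the following (<ID> is a placeholder; wait for the housekeeping to finish, e.g. by watching the OSD log, before unsetting the flag):

# ceph osd set noup                     # restarted OSDs stay down while they compact
# systemctl restart ceph-osd@<ID>
# ... wait for the compaction/housekeeping to finish ...
# ceph osd unset noup                   # now let the OSDs come up and peer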


We are currently trying to encourage fs clients to reconnect to the file 
system. Unfortunately, on many we get

# ls /shares/nfs/ait_pnora01 # this *is* a ceph-fs mount point
ls: cannot access '/shares/nfs/ait_pnora01': Stale file handle

Is there a server-side way to encourage the FS clients to reconnect to the
cluster? What is a clean way to get them back onto the file system? I tried
remounts without success.

Not that I know of. You probably need to reboot those hosts.
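
For completeness, the kind of forced remount that was tried would look roughly like this on a kernel client; the monitor address and credentials below are placeholders:

# umount -f -l /shares/nfs/ait_pnora01          # force a lazy unmount of the stale mount
# mount -t ceph <mon-host>:6789:/ /shares/nfs/ait_pnora01 -o name=admin,secretfile=/etc/ceph/admin.secret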


Before executing the next conversion, I will compact the RocksDB on all SSD
OSDs. The HDDs seem to be entirely unaffected. The SSDs have a very high number
of objects per PG, which is potentially the main reason for our observations.

Yup, pretty much certain that's the reason. Nowadays one of our standard maintenance routines before upgrades, conversions, etc. is an offline compaction of all OSDs.
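
A minimal sketch of such an offline compaction, done per OSD with the daemon stopped (<ID> is a placeholder for the OSD id):

# systemctl stop ceph-osd@<ID>
# ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<ID> compact   # offline RocksDB compaction
# systemctl start ceph-osd@<ID>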

I hope it helps.


Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io