Hi Folks
We've noticed that in a cluster of 21 nodes (5 mgrs & mons, and 504 OSDs at
24 per node) the mgrs drop out of the cluster after an unspecified period
of time. The logs only show the following:
debug 2020-12-10T02:02:50.409+ 7f1005840700 0 log_channel(cluster) log
[DBG]
The release notes do include it; however, it's listed under different PR and
issue numbers because it was backported to Octopus:
mgr/ActivePyModules.cc: always release GIL before attempting to acquire
a lock (pr#38801, Cory Snyder) [https://github.com/ceph/ceph/pull/38801,
https://tracker.ceph.com/issues/48714]
Hi Igor
We'll take a look at disabling swap on the nodes and see if that improves
the situation.
Having checked across all OSDs, we're not seeing
bluestore_reads_with_retries at anything other than zero. We see the
error anywhere from 3 to 10 times a week, but it's
usuall