Hi,

We are running two MDS servers in an active/standby-replay setup. Recently we had to disconnect the active MDS server, and the failover to the standby itself worked as expected.


However, the filesystem currently contains over 5 million files, so reading all the metadata from the data pool took too long, because the information was not in the OSD page caches. The mons timed out the MDS and failed over to the former active MDS (which was available as a standby again). That MDS in turn had to read the metadata, ran into the same timeout, triggered another failover, and so on. I resolved the situation by disabling one of the MDS servers, which kept the mons from failing the one remaining MDS.
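
To make the loop concrete, here is a toy Python model of the behaviour described above. It is only a sketch and not Ceph code: the function name, the parameter names (rejoin_seconds, grace_seconds) and the numbers are made up for illustration. It assumes the MDS looks unresponsive to the mons for the whole rejoin, and that the mons only fail an MDS when a standby is available to replace it.

```python
def simulate_failover(rejoin_seconds, grace_seconds, standby_available, max_rounds=5):
    """Toy model of MDS failover flapping (hypothetical, not Ceph code).

    Returns the number of failovers that happen before an MDS finishes
    rejoin, or None if no MDS finishes within max_rounds attempts.
    """
    failovers = 0
    for _ in range(max_rounds):
        # Rejoin completes if it fits within the grace period, or if the
        # mons have no standby to switch to (assumption from the text).
        if rejoin_seconds <= grace_seconds or not standby_available:
            return failovers
        # Otherwise the mons time the MDS out and the other MDS starts over.
        failovers += 1
    return None


# Illustrative numbers only: a slow rejoin with a standby present never settles.
print(simulate_failover(600, 15, standby_available=True))
# With one MDS disabled, the remaining one is left alone to finish.
print(simulate_failover(600, 15, standby_available=False))
```

In this model the flapping stops either when the rejoin fits within the timeout or when no standby is left, which matches the workaround I used.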


So given a large filesystem, how do I prevent failover flapping between MDS instances that are in the rejoin state and reading the inode information?

Regards,
Burkhard
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com