[ceph-users] MDS flapping: how to increase MDS timeouts?

Burkhard Linke Thu, 26 Jan 2017 00:20:07 -0800

Hi,

We are running two MDS servers in an active/standby-replay setup. Recently we had to disconnect the active MDS server, and the failover to the standby worked as expected.

The filesystem currently contains over 5 million files, so reading all the metadata information from the data pool took too long, since the information was not available in the OSD page caches. The MDS was timed out by the mons, and a failover to the former active MDS (which was available as standby again) happened. This MDS in turn had to read the metadata, again running into a timeout, another failover, and so on. I resolved the situation by disabling one of the MDS daemons, which kept the mons from failing the only remaining MDS.

So given a large filesystem, how do I prevent failover flapping between MDS instances that are in the rejoin state and reading the inode information?


Regards,
Burkhard
