We are using Luminous; we have seven Ceph nodes and set them all up as MDS. Recently the MDS daemons have been failing very frequently, and once there is only one MDS left, the CephFS degrades to the point of being unusable.
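For reference, the MDS map and daemon states can be inspected roughly as follows while this is happening (a minimal sketch assuming the standard Luminous CLI; the MDS daemon name below is a placeholder):

# overall cluster health, including degraded-filesystem warnings
ceph status
ceph health detail

# current MDS map: which ranks are active, which daemons are standby or laggy
ceph fs dump
ceph mds stat

# per-daemon state, queried via the admin socket on the node running that MDS
ceph daemon mds.<name> status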
Checking the MDS log on one Ceph node, I found the following:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
/build/ceph-12.2.8/src/mds/Locker.cc: 5076: FAILED assert(lock->get_state() == LOCK_PRE_SCAN)

 ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x564400e50e42]
 2: (Locker::file_recover(ScatterLock*)+0x208) [0x564400c6ae18]
 3: (MDCache::start_files_to_recover()+0xb3) [0x564400b98af3]
 4: (MDSRank::clientreplay_start()+0x1f7) [0x564400ae04c7]
 5: (MDSRankDispatcher::handle_mds_map(MMDSMap*, MDSMap*)+0x25c0) [0x564400aefd40]
 6: (MDSDaemon::handle_mds_map(MMDSMap*)+0x154d) [0x564400ace3bd]
 7: (MDSDaemon::handle_core_message(Message*)+0x7f3) [0x564400ad1273]
 8: (MDSDaemon::ms_dispatch(Message*)+0x1c3) [0x564400ad15a3]
 9: (DispatchQueue::entry()+0xeda) [0x5644011a547a]
 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x564400ee3fcd]
 11: (()+0x7494) [0x7f7a2b106494]
 12: (clone()+0x3f) [0x7f7a2a17eaff]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

The full log is also attached. Could you please help us?

Thanks!

BR
Oliver
<<attachment: ceph-mds.lkp-ceph-node1.zip>>