Hi all,

we have a serious problem with CephFS. A few days ago, our CephFS file systems became inaccessible, with the message:

MDS_DAMAGE: 1 mds daemon damaged

cephfs-journal-tool tells us: "Overall journal integrity: OK"

The usual redeploy attempts were unfortunately not successful. After many attempts to get anywhere with the orchestrator, we set the MDS to “failed” and forced the creation of a new MDS with “ceph fs reset”. But this new MDS crashes:

ceph-17.2.7/src/mds/MDCache.cc: In function 'void MDCache::rejoin_send_rejoins()'
ceph-17.2.7/src/mds/MDCache.cc: 4086: FAILED ceph_assert(auth >= 0)

(The full trace is attached.)

What can we do now? We are grateful for any help!

May 05 22:42:43 ceph06 bash[707251]: debug -1> 2024-05-05T20:42:43.006+0000 7f6892752700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/mds/MDCache.cc: In function 'void MDCache::rejoin_send_rejoins()' thread 7f6892752700 time 2024-05-05T20:42:43.008448+0000
May 05 22:42:43 ceph06 bash[707251]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/mds/MDCache.cc: 4086: FAILED ceph_assert(auth >= 0)
May 05 22:42:43 ceph06 bash[707251]: ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
May 05 22:42:43 ceph06 bash[707251]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7f689fb974a3]
May 05 22:42:43 ceph06 bash[707251]: 2: /usr/lib64/ceph/libceph-common.so.2(+0x269669) [0x7f689fb97669]
May 05 22:42:43 ceph06 bash[707251]: 3: (MDCache::rejoin_send_rejoins()+0x216b) [0x5605d03da7eb]
May 05 22:42:43 ceph06 bash[707251]: 4: (MDCache::process_imported_caps()+0x1993) [0x5605d03d8353]
May 05 22:42:43 ceph06 bash[707251]: 5: (MDCache::rejoin_open_ino_finish(inodeno_t, int)+0x217) [0x5605d03e5837]
May 05 22:42:43 ceph06 bash[707251]: 6: (MDSContext::complete(int)+0x5f) [0x5605d05a7f4f]
May 05 22:42:43 ceph06 bash[707251]: 7: (void finish_contexts<std::vector<MDSContext*, std::allocator<MDSContext*> > >(ceph::common::CephContext*, std::vector<MDSContext*, std::allocator<MDSContext*> >&, int)+0x8d) [0x5605d024cf5d]
May 05 22:42:43 ceph06 bash[707251]: 8: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x138) [0x5605d03cd168]
May 05 22:42:43 ceph06 bash[707251]: 9: (MDCache::_open_ino_traverse_dir(inodeno_t, MDCache::open_ino_info_t&, int)+0xbb) [0x5605d03cd4bb]
May 05 22:42:43 ceph06 bash[707251]: 10: (MDSContext::complete(int)+0x5f) [0x5605d05a7f4f]
May 05 22:42:43 ceph06 bash[707251]: 11: (MDSRank::_advance_queues()+0xaa) [0x5605d025b34a]
May 05 22:42:43 ceph06 bash[707251]: 12: (MDSRank::ProgressThread::entry()+0xb8) [0x5605d025b918]
May 05 22:42:43 ceph06 bash[707251]: 13: /lib64/libpthread.so.0(+0x81ca) [0x7f689eb861ca]
May 05 22:42:43 ceph06 bash[707251]: 14: clone()
May 05 22:42:43 ceph06 bash[707251]: debug 0> 2024-05-05T20:42:43.010+0000 7f6892752700 -1 *** Caught signal (Aborted) **
May 05 22:42:43 ceph06 bash[707251]: in thread 7f6892752700 thread_name:mds_rank_progr
May 05 22:42:43 ceph06 bash[707251]: ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
May 05 22:42:43 ceph06 bash[707251]: 1: /lib64/libpthread.so.0(+0x12cf0) [0x7f689eb90cf0]
May 05 22:42:43 ceph06 bash[707251]: 2: gsignal()
May 05 22:42:43 ceph06 bash[707251]: 3: abort()
May 05 22:42:43 ceph06 bash[707251]: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f689fb974fd]
May 05 22:42:43 ceph06 bash[707251]: 5: /usr/lib64/ceph/libceph-common.so.2(+0x269669) [0x7f689fb97669]
May 05 22:42:43 ceph06 bash[707251]: 6: (MDCache::rejoin_send_rejoins()+0x216b) [0x5605d03da7eb]
May 05 22:42:43 ceph06 bash[707251]: 7: (MDCache::process_imported_caps()+0x1993) [0x5605d03d8353]
May 05 22:42:43 ceph06 bash[707251]: 8: (MDCache::rejoin_open_ino_finish(inodeno_t, int)+0x217) [0x5605d03e5837]
May 05 22:42:43 ceph06 bash[707251]: 9: (MDSContext::complete(int)+0x5f) [0x5605d05a7f4f]
May 05 22:42:43 ceph06 bash[707251]: 10: (void finish_contexts<std::vector<MDSContext*, std::allocator<MDSContext*> > >(ceph::common::CephContext*, std::vector<MDSContext*, std::allocator<MDSContext*> >&, int)+0x8d) [0x5605d024cf5d]
May 05 22:42:43 ceph06 bash[707251]: 11: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x138) [0x5605d03cd168]
May 05 22:42:43 ceph06 bash[707251]: 12: (MDCache::_open_ino_traverse_dir(inodeno_t, MDCache::open_ino_info_t&, int)+0xbb) [0x5605d03cd4bb]
May 05 22:42:43 ceph06 bash[707251]: 13: (MDSContext::complete(int)+0x5f) [0x5605d05a7f4f]
May 05 22:42:43 ceph06 bash[707251]: 14: (MDSRank::_advance_queues()+0xaa) [0x5605d025b34a]
May 05 22:42:43 ceph06 bash[707251]: 15: (MDSRank::ProgressThread::entry()+0xb8) [0x5605d025b918]
May 05 22:42:43 ceph06 bash[707251]: 16: /lib64/libpthread.so.0(+0x81ca) [0x7f689eb861ca]
May 05 22:42:43 ceph06 bash[707251]: 17: clone()
May 05 22:42:43 ceph06 bash[707251]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
May 05 22:42:43 ceph06 bash[707251]: --- logging levels ---
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 none
May 05 22:42:43 ceph06 bash[707251]: 0/ 1 lockdep
May 05 22:42:43 ceph06 bash[707251]: 0/ 1 context
May 05 22:42:43 ceph06 bash[707251]: 1/ 1 crush
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 mds
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 mds_balancer
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 mds_locker
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 mds_log
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 mds_log_expire
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 mds_migrator
May 05 22:42:43 ceph06 bash[707251]: 0/ 1 buffer
May 05 22:42:43 ceph06 bash[707251]: 0/ 1 timer
May 05 22:42:43 ceph06 bash[707251]: 0/ 1 filer
May 05 22:42:43 ceph06 bash[707251]: 0/ 1 striper
May 05 22:42:43 ceph06 bash[707251]: 0/ 1 objecter
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 rados
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 rbd
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 rbd_mirror
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 rbd_replay
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 rbd_pwl
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 journaler
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 objectcacher
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 immutable_obj_cache
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 client
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 osd
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 optracker
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 objclass
May 05 22:42:43 ceph06 bash[707251]: 1/ 3 filestore
May 05 22:42:43 ceph06 bash[707251]: 1/ 3 journal
May 05 22:42:43 ceph06 bash[707251]: 0/ 0 ms
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 mon
May 05 22:42:43 ceph06 bash[707251]: 0/10 monc
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 paxos
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 tp
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 auth
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 crypto
May 05 22:42:43 ceph06 bash[707251]: 1/ 1 finisher
May 05 22:42:43 ceph06 bash[707251]: 1/ 1 reserver
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 heartbeatmap
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 perfcounter
May 05 22:42:43 ceph06 bash[707251]: 1/ 2 rgw
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 rgw_sync
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 rgw_datacache
May 05 22:42:43 ceph06 bash[707251]: 1/10 civetweb
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 javaclient
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 asok
May 05 22:42:43 ceph06 bash[707251]: 1/ 1 throttle
May 05 22:42:43 ceph06 bash[707251]: 0/ 0 refs
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 compressor
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 bluestore
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 bluefs
May 05 22:42:43 ceph06 bash[707251]: 1/ 3 bdev
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 kstore
May 05 22:42:43 ceph06 bash[707251]: 4/ 5 rocksdb
May 05 22:42:43 ceph06 bash[707251]: 4/ 5 leveldb
May 05 22:42:43 ceph06 bash[707251]: 4/ 5 memdb
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 fuse
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 mgr
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 mgrc
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 dpdk
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 eventtrace
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 prioritycache
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 test
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 cephfs_mirror
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 cephsqlite
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 seastore
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 seastore_onode
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 seastore_odata
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 seastore_omap
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 seastore_tm
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 seastore_cleaner
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 seastore_lba
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 seastore_cache
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 seastore_journal
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 seastore_device
May 05 22:42:43 ceph06 bash[707251]: 0/ 5 alienstore
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 mclock
May 05 22:42:43 ceph06 bash[707251]: 1/ 5 ceph_exporter
May 05 22:42:43 ceph06 bash[707251]: -2/-2 (syslog threshold)
May 05 22:42:43 ceph06 bash[707251]: 99/99 (stderr threshold)
May 05 22:42:43 ceph06 bash[707251]: --- pthread ID / name mapping for recent threads ---
May 05 22:42:43 ceph06 bash[707251]: 7f688f74c700 /
May 05 22:42:43 ceph06 bash[707251]: 7f689074e700 /
May 05 22:42:43 ceph06 bash[707251]: 7f6890f4f700 / MR_Finisher
May 05 22:42:43 ceph06 bash[707251]: 7f6891f51700 / PQ_Finisher
May 05 22:42:43 ceph06 bash[707251]: 7f6892752700 / mds_rank_progr
May 05 22:42:43 ceph06 bash[707251]: 7f6892f53700 / ms_dispatch
May 05 22:42:43 ceph06 bash[707251]: 7f6894f57700 / ceph-mds
May 05 22:42:43 ceph06 bash[707251]: 7f6895758700 / safe_timer
May 05 22:42:43 ceph06 bash[707251]: 7f6895f59700 / safe_timer
May 05 22:42:43 ceph06 bash[707251]: 7f6896f5b700 / ms_dispatch
May 05 22:42:43 ceph06 bash[707251]: 7f6897f5d700 / io_context_pool
May 05 22:42:43 ceph06 bash[707251]: 7f6898f5f700 / admin_socket
May 05 22:42:43 ceph06 bash[707251]: 7f6899760700 / msgr-worker-2
May 05 22:42:43 ceph06 bash[707251]: 7f6899f61700 / msgr-worker-1
May 05 22:42:43 ceph06 bash[707251]: 7f689a762700 / msgr-worker-0
May 05 22:42:43 ceph06 bash[707251]: 7f68a0cd4ac0 / ceph-mds
May 05 22:42:43 ceph06 bash[707251]: max_recent 10000
May 05 22:42:43 ceph06 bash[707251]: max_new 10000
May 05 22:42:43 ceph06 bash[707251]: log_file /var/lib/ceph/crash/2024-05-05T20:42:43.014159Z_b6ad6bc1-0faa-4a78-8cb0-c004f051c7d6/log
May 05 22:42:43 ceph06 bash[707251]: --- end dump of recent events ---