On Thu, Jan 19, 2023 at 9:07 PM Lo Re Giuseppe <giuseppe.l...@cscs.ch> wrote:
>
> Dear all,
>
> We have started to use more intensively cephfs for some wlcg related workload.
> We have 3 active mds instances spread on 3 servers,
> mds_cache_memory_limit=12G, most of the other configs are default ones.
> One of them has crashed this night leaving the log below.
> Do you have any hint on what could be the cause and how to avoid it?

Not at the moment. Telemetry has reported similar crashes:

https://tracker.ceph.com/issues/54959 (cephfs)
https://tracker.ceph.com/issues/54685 (mgr)

The backtrace indicates tcmalloc involvement, but I'm not sure what's going on.

>
> Regards,
>
> Giuseppe
>
> [root@naret-monitor03 ~]# journalctl -u ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service
> ...
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific >
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 1: /lib64/libpthread.so.0(+0x12ce0) [0x7fe291e4fce0]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 2: abort()
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 3: /lib64/libstdc++.so.6(+0x987ba) [0x7fe2912567ba]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 4: /lib64/libstdc++.so.6(+0x9653c) [0x7fe29125453c]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 5: /lib64/libstdc++.so.6(+0x95559) [0x7fe291253559]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 6: __gxx_personality_v0()
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 7: /lib64/libgcc_s.so.1(+0x10b03) [0x7fe290c34b03]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 8: _Unwind_Resume()
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 9: /usr/bin/ceph-mds(+0x18c104) [0x5638351e7104]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 10: /lib64/libpthread.so.0(+0x12ce0) [0x7fe291e4fce0]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 11: gsignal()
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 12: abort()
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 13: /lib64/libstdc++.so.6(+0x9009b) [0x7fe29124e09b]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 14: /lib64/libstdc++.so.6(+0x9653c) [0x7fe29125453c]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 15: /lib64/libstdc++.so.6(+0x96597) [0x7fe291254597]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 16: /lib64/libstdc++.so.6(+0x967f8) [0x7fe2912547f8]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 17: /lib64/libtcmalloc.so.4(+0x19fa4) [0x7fe29bae6fa4]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 18: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, vo>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 19: (std::shared_ptr<inode_t<mempool::mds_co::pool_allocator> > InodeSt>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 20: (CInode::_decode_base(ceph::buffer::v15_2_0::list::iterator_impl<tr>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 21: (CInode::decode_import(ceph::buffer::v15_2_0::list::iterator_impl<t>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 22: (Migrator::decode_import_inode(CDentry*, ceph::buffer::v15_2_0::lis>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 23: (Migrator::decode_import_dir(ceph::buffer::v15_2_0::list::iterator_>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 24: (Migrator::handle_export_dir(boost::intrusive_ptr<MExportDir const>>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 25: (Migrator::dispatch(boost::intrusive_ptr<Message const> const&)+0x1>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 26: (MDSRank::handle_message(boost::intrusive_ptr<Message const> const&>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 27: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, boo>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 28: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const>>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 29: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x10>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 30: (DispatchQueue::entry()+0x126a) [0x7fe2930a5aba]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 31: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fe2931575d1]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 32: /lib64/libpthread.so.0(+0x81cf) [0x7fe291e451cf]
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 33: clone()
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is neede>
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: --- begin dump of recent events ---
> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: terminate called recursively
> Jan 19 04:49:43 naret-monitor03 systemd[1]: ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service: Main process exited, code=exited, status=127/n/a
> Jan 19 04:49:43 naret-monitor03 systemd[1]: ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service: Failed with result 'exit-code'.
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

--
Cheers,
Venky
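
For anyone comparing their own crash against the tracker issues above, here is a minimal Python sketch that pulls the numbered frames out of journalctl output like the log quoted in this thread and reports the first frame mentioning a given library. It is not part of Ceph. The helper names `parse_frames` and `first_frame_mentioning` are made up for illustration, and it assumes every frame line has the shape `...[pid]: N: text`, as in the quoted log. Truncated frames are left truncated.

```python
import re

# Assumed frame shape: "<journal prefix>[pid]: <N>: <frame text>"
FRAME_RE = re.compile(r"\]:\s+(\d+):\s+(.*)$")


def parse_frames(log_text):
    """Return (frame_number, frame_text) tuples for every backtrace line found."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((int(match.group(1)), match.group(2).strip()))
    return frames


def first_frame_mentioning(frames, needle):
    """Return the number of the first frame whose text contains needle, else None."""
    for number, text in frames:
        if needle in text:
            return number
    return None


# Journal prefix and frames 16-19 copied from the quoted log.
PREFIX = (
    "Jan 19 04:49:40 naret-monitor03 "
    "ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: "
)
SAMPLE = "\n".join(
    PREFIX + frame
    for frame in [
        "16: /lib64/libstdc++.so.6(+0x967f8) [0x7fe2912547f8]",
        "17: /lib64/libtcmalloc.so.4(+0x19fa4) [0x7fe29bae6fa4]",
        "18: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, vo>",
        "19: (std::shared_ptr<inode_t<mempool::mds_co::pool_allocator> > InodeSt>",
    ]
)

frames = parse_frames(SAMPLE)
print(first_frame_mentioning(frames, "tcmalloc"))  # prints 17
```

To run it on a real log, feed the saved journalctl output to `parse_frames` in place of `SAMPLE`. Lines without a frame number (the version banner, the NOTE line, the systemd messages) are simply skipped.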