On Sat, Oct 12, 2019 at 1:10 AM Kenneth Waegeman <kenneth.waege...@ugent.be> wrote:
> Hi all,
>
> After solving some pg inconsistency problems, my fs is still in
> trouble. My MDSs are crashing with this error:
>
> > -5> 2019-10-11 19:02:55.375 7f2d39f10700  1 mds.1.564276 rejoin_start
> > -4> 2019-10-11 19:02:55.385 7f2d3d717700  5 mds.beacon.mds01 received beacon reply up:rejoin seq 5 rtt 1.01
> > -3> 2019-10-11 19:02:55.495 7f2d39f10700  1 mds.1.564276 rejoin_joint_start
> > -2> 2019-10-11 19:02:55.505 7f2d39f10700  5 mds.mds01 handle_mds_map old map epoch 564279 <= 564279, discarding
> > -1> 2019-10-11 19:02:55.695 7f2d33f04700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/mdstypes.h: In function 'static void dentry_key_t::decode_helper(std::string_view, std::string&, snapid_t&)' thread 7f2d33f04700 time 2019-10-11 19:02:55.703343
> > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/mdstypes.h: 1229: FAILED ceph_assert(i != string::npos)
> >
> >  ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f2d43393046]
> >  2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f2d43393214]
> >  3: (CDir::_omap_fetched(ceph::buffer::v14_2_0::list&, std::map<std::string, ceph::buffer::v14_2_0::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::v14_2_0::list> > >&, bool, int)+0xa68) [0x556a17ecbaa8]
> >  4: (C_IO_Dir_OMAP_Fetched::finish(int)+0x54) [0x556a17ee0034]
> >  5: (MDSContext::complete(int)+0x70) [0x556a17f5e710]
> >  6: (MDSIOContextBase::complete(int)+0x16b) [0x556a17f5e9ab]
> >  7: (Finisher::finisher_thread_entry()+0x156) [0x7f2d433d8386]
> >  8: (()+0x7dd5) [0x7f2d41262dd5]
> >  9: (clone()+0x6d) [0x7f2d3ff1302d]
> >
> >  0> 2019-10-11 19:02:55.695 7f2d33f04700 -1 *** Caught signal (Aborted) **
> >  in thread 7f2d33f04700 thread_name:fn_anonymous
> >
> >  ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
> >  1: (()+0xf5d0) [0x7f2d4126a5d0]
> >  2: (gsignal()+0x37) [0x7f2d3fe4b2c7]
> >  3: (abort()+0x148) [0x7f2d3fe4c9b8]
> >  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x7f2d43393095]
> >  5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f2d43393214]
> >  6: (CDir::_omap_fetched(ceph::buffer::v14_2_0::list&, std::map<std::string, ceph::buffer::v14_2_0::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::v14_2_0::list> > >&, bool, int)+0xa68) [0x556a17ecbaa8]
> >  7: (C_IO_Dir_OMAP_Fetched::finish(int)+0x54) [0x556a17ee0034]
> >  8: (MDSContext::complete(int)+0x70) [0x556a17f5e710]
> >  9: (MDSIOContextBase::complete(int)+0x16b) [0x556a17f5e9ab]
> >  10: (Finisher::finisher_thread_entry()+0x156) [0x7f2d433d8386]
> >  11: (()+0x7dd5) [0x7f2d41262dd5]
> >  12: (clone()+0x6d) [0x7f2d3ff1302d]
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> > [root@mds02 ~]# ceph -s
> >   cluster:
> >     id:     92bfcf0a-1d39-43b3-b60f-44f01b630e47
> >     health: HEALTH_WARN
> >             1 filesystem is degraded
> >             insufficient standby MDS daemons available
> >             1 MDSs behind on trimming
> >             1 large omap objects
> >
> >   services:
> >     mon: 3 daemons, quorum mds01,mds02,mds03 (age 4d)
> >     mgr: mds02(active, since 3w), standbys: mds01, mds03
> >     mds: ceph_fs:2/2 {0=mds02=up:rejoin,1=mds01=up:rejoin(laggy or crashed)}
> >     osd: 535 osds: 533 up, 529 in
> >
> >   data:
> >     pools:   3 pools, 3328 pgs
> >     objects: 376.32M objects, 673 TiB
> >     usage:   1.0 PiB used, 2.2 PiB / 3.2 PiB avail
> >     pgs:     3315 active+clean
> >              12   active+clean+scrubbing+deep
> >              1    active+clean+scrubbing
>
> Does anyone have an idea where to go from here? ☺

It looks like the omap of a dirfrag is corrupted. Please check the MDS log (with debug_mds = 10) to find which omap is corrupted. Basically, all omap keys of a dirfrag should be in the format xxxx_xxxx.

> Thanks!
>
> K
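For reference, once the debug log points at a suspect dirfrag you can repeat that key-format check by hand. A minimal sketch — the daemon name, pool name (`cephfs_metadata`), and dirfrag object id below are placeholders, not taken from this cluster; substitute the ones your log reports:

```shell
# Raise MDS log verbosity so the dirfrag fetch shows up in the log
# (daemon name is a placeholder):
#   ceph tell mds.mds01 injectargs '--debug_mds=10'
#
# Then dump the omap keys of the suspect dirfrag object
# (pool and object id are placeholders):
#   rados -p cephfs_metadata listomapkeys 10000000000.00000000 > keys.txt
#
# Each dentry key should look like "name_head" or "name_<snapid>";
# dentry_key_t::decode_helper asserts exactly when a key contains no '_'.
# Demo of the check itself against sample keys:
printf '%s\n' 'foo_head' 'bar_3a' 'corruptkey' > keys.txt
grep -v '_' keys.txt   # any line printed here is a malformed key
rm -f keys.txt
```

Any key that the `grep` prints is one the MDS cannot decode; that tells you which entry (and which dirfrag object) to investigate further.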
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com