Hi Sebastian,

You can find some more discussion and fixes for this type of fs
corruption here:
https://www.spinics.net/lists/ceph-users/msg76952.html
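In case it helps, here is a minimal sketch of how the damage table can be walked programmatically, assuming the JSON shape you posted (the values below are placeholders copied from your example). The `ceph tell mds.<fs>:<rank> damage rm <id>` step comes from the linked thread and should only be issued after the affected file has been verified or restored from backup:

```python
import json

# Placeholder `ceph tell mds.cephfs:0 damage ls` output, shaped like
# the entry from the original report (values are the example ones).
damage_ls_output = """
[
  {"damage_type": "dentry", "id": 123456, "ino": 1234567890,
   "frag": "*", "dname": "some-filename.ext", "snap_id": "head",
   "path": "/full/path/to/file"}
]
"""

def dentry_damage(entries):
    """Return (id, path) pairs for dentry-type damage entries."""
    return [(e["id"], e["path"]) for e in entries
            if e["damage_type"] == "dentry"]

entries = json.loads(damage_ls_output)
for damage_id, path in dentry_damage(entries):
    # After restoring/verifying `path` from backup, the linked thread
    # clears the corresponding damage-table entry with `damage rm`:
    print(f"ceph tell mds.cephfs:0 damage rm {damage_id}")
```

This only prints the commands rather than running them, so the output can be reviewed before anything is removed from the damage table.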
--
Dan van der Ster
CTO, Clyso GmbH
p: +49 89 215252722 | a: Vancouver, Canada
w: https://clyso.com | e: dan.vanders...@clyso.com
We are hiring: https://www.clyso.com/jobs/

On Fri, Nov 24, 2023 at 5:48 AM Sebastian Knust
<skn...@physik.uni-bielefeld.de> wrote:
>
> Hi,
>
> After updating from 17.2.6 to 17.2.7 with cephadm, our cluster went into
> MDS_DAMAGE state. We had some prior issues with faulty kernel clients
> not releasing capabilities, so the update might just be a coincidence.
>
> `ceph tell mds.cephfs:0 damage ls` lists 56 affected files, all with
> these general details:
>
> {
>     "damage_type": "dentry",
>     "id": 123456,
>     "ino": 1234567890,
>     "frag": "*",
>     "dname": "some-filename.ext",
>     "snap_id": "head",
>     "path": "/full/path/to/file"
> }
>
> The behaviour when accessing file information in the (kernel-mounted)
> filesystem is somewhat inconsistent. Generally, the first `stat` call
> fails with "Input/output error", while the next call returns all `stat`
> data as expected for an undamaged file. Once a `stat` call has
> succeeded, the file can be read with `cat` with full and correct
> content (verified against backup).
>
> Scrubbing the affected subdirectories with `ceph tell mds.cephfs:0 scrub
> start /path/to/dir/ recursive,repair,force` does not fix the issue.
>
> Trying to delete the file results in an "Input/output error".
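The first-stat-fails-then-succeeds pattern described above can be sketched as follows. `probe` and `flaky_stat` are hypothetical names for illustration; the stand-in stat function simply reproduces the reported behaviour (first call EIO, retry succeeds) so the classification logic can be exercised without a damaged mount:

```python
import errno
import os

def probe(path, stat_fn=os.stat):
    """Classify a stat attempt: 'EIO' for Input/output error,
    'other-error' for any other OSError, 'ok' on success."""
    try:
        stat_fn(path)
    except OSError as e:
        return "EIO" if e.errno == errno.EIO else "other-error"
    return "ok"

# Hypothetical stand-in for a damaged CephFS path: the first stat()
# raises EIO, the retry succeeds -- the pattern from the report above.
calls = {"n": 0}
def flaky_stat(path):
    calls["n"] += 1
    if calls["n"] == 1:
        raise OSError(errno.EIO, "Input/output error", path)
    return None  # stand-in for a successful os.stat_result

print(probe("/mnt/cephfs/some/file", flaky_stat))  # first call: EIO
print(probe("/mnt/cephfs/some/file", flaky_stat))  # retry: ok
```

Running `probe` twice over each path from `damage ls` would show which entries exhibit this transient-EIO behaviour versus a hard failure.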
> If the `stat` calls beforehand succeeded, this also crashes the active
> MDS, with these messages in the system journal:
>
> > Nov 24 14:21:15 iceph-18.servernet ceph-mds[1946861]: mds.0.cache.den(0x10012271195 DisplaySettings.json) newly corrupt dentry to be committed: [dentry #0x1/homes/huser/d3data/transfer/hortkrass/FLIMSIM/2023-04-12-irf-characterization/2-qwp-no-extra-filter-pc-off-tirf-94-tirf-cursor/DisplaySettings.json [1000275c4a0,head] auth (dversion lock) pv=0 v=225 ino=0x10012271197 state=1073741824 | inodepin=1 0x56413e1e2780]
> > Nov 24 14:21:15 iceph-18.servernet ceph-mds[1946861]: log_channel(cluster) log [ERR] : MDS abort because newly corrupt dentry to be committed: [dentry #0x1/homes/huser/d3data/transfer/hortkrass/FLIMSIM/2023-04-12-irf-characterization/2-qwp-no-extra-filter-pc-off-tirf-94-tirf-cursor/DisplaySettings.json [1000275c4a0,head] auth (dversion lock) pv=0 v=225 ino=0x10012271197 state=1073741824 | inodepin=1 0x56413e1e2780]
> > Nov 24 14:21:15 iceph-18.servernet ceph-eafd0514-3644-11eb-bc6a-3cecef2330fa-mds-cephfs-iceph-18-ujfqnd[1946838]: 2023-11-24T13:21:15.654+0000 7f3fdcde0700 -1 mds.0.cache.den(0x10012271195 DisplaySettings.json) newly corrupt dentry to be committed: [dentry #0x1/homes/huser/d3data/transfer/hortkrass/FLIMSIM/2023-04-12-irf-characterization/2-qwp-no-extra-filter-pc-off-tirf-94-tirf-cursor/DisplaySettings.json [1000275c4a0,head] auth (dversion lock) pv=0 v=225 ino=0x1001>
> > Nov 24 14:21:15 iceph-18.servernet ceph-eafd0514-3644-11eb-bc6a-3cecef2330fa-mds-cephfs-iceph-18-ujfqnd[1946838]: 2023-11-24T13:21:15.654+0000 7f3fdcde0700 -1 log_channel(cluster) log [ERR] : MDS abort because newly corrupt dentry to be committed: [dentry #0x1/homes/huser/d3data/transfer/hortkrass/FLIMSIM/2023-04-12-irf-characterization/2-qwp-no-extra-filter-pc-off-tirf-94-tirf-cursor/DisplaySettings.json [1000275c4a0,head] auth (dversion lock) pv=0 v=225 ino=0x10012>
> > Nov 24 14:21:15 iceph-18.servernet ceph-eafd0514-3644-11eb-bc6a-3cecef2330fa-mds-cephfs-iceph-18-ujfqnd[1946838]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/mds/MDSRank.cc: In function 'void MDSRank::abort(std::string_view)' thread 7f3fdcde0700 time 2023-11-24T13:21:15.655088+0000
> > Nov 24 14:21:15 iceph-18.servernet ceph-mds[1946861]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/mds/MDSRank.cc: In function 'void MDSRank::abort(std::string_view)' thread 7f3fdcde0700 time 2023-11-24T13:21:15.655088+0000
> > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/mds/MDSRank.cc: 937: ceph_abort_msg("abort() called")
> > ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
> > 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xd7) [0x7f3fe5a1cb03]
> > 2: (MDSRank::abort(std::basic_string_view<char, std::char_traits<char> >)+0x7d) [0x5640f2e6fa2d]
> > 3: (CDentry::check_corruption(bool)+0x740) [0x5640f30e4820]
> > 4: (EMetaBlob::add_primary_dentry(EMetaBlob::dirlump&, CDentry*, CInode*, unsigned char)+0x47) [0x5640f2f41877]
> > 5: (EOpen::add_clean_inode(CInode*)+0x121) [0x5640f2f49fc1]
> > 6: (Locker::adjust_cap_wanted(Capability*, int, int)+0x426) [0x5640f305e036]
> > 7: (Locker::process_request_cap_release(boost::intrusive_ptr<MDRequestImpl>&, client_t, ceph_mds_request_release const&, std::basic_string_view<char, std::char_traits<char> >)+0x599) [0x5640f307f7e9]
> > 8: (Server::handle_client_request(boost::intrusive_ptr<MClientRequest const> const&)+0xc06) [0x5640f2f2a7c6]
> > 9: (Server::dispatch(boost::intrusive_ptr<Message const> const&)+0x13c) [0x5640f2f2ef6c]
> > 10: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x5db) [0x5640f2e7727b]
> > 11: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x5c) [0x5640f2e778bc]
> > 12: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x1bf) [0x5640f2e60c2f]
> > 13: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x478) [0x7f3fe5c97ed8]
> > 14: (DispatchQueue::entry()+0x50f) [0x7f3fe5c9531f]
> > 15: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f3fe5d5f381]
> > 16: /lib64/libpthread.so.0(+0x81ca) [0x7f3fe4a0b1ca]
> > 17: clone()
>
> Deleting the file with cephfs-shell also gives Input/output error (5).
>
> Does anyone have an idea of how to proceed here? I am perfectly fine
> with losing the affected files; they can all easily be restored from
> backup.
>
> Cheers
> Sebastian
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io