Dear Xiubo, thanks for your reply.

> BTW, did you enable the async dirop? Currently this is disabled by
> default in 4.18.0-486.el8.x86_64.

I have never heard about that option until now. How do I check it and how do I disable it if necessary? I'm in meetings pretty much all day and will try to send some more info later.
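If the async dirop feature corresponds to the kernel client's wsync/nowsync mount options, which is just my guess from a quick look (please correct me if that's wrong), then I assume I could check it on the file server with something like

# grep ceph /proc/mounts

and look for "nowsync" in the option list of the cephfs mounts. To force it off I would then remount the affected share with "wsync", for example

# mount -o remount,wsync /shares/samba/rit-oil

I haven't tried this yet, so I don't know whether a remount is enough or whether a full umount/mount cycle is needed.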
> Could you reproduce this by enabling the mds debug logs?

Not right now. Our users are annoyed enough already.
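Just so I get it right once I find a quiet window: I assume cranking up the MDS logs would be something like

# ceph tell mds.ceph-23 config set debug_mds 20
# ceph tell mds.ceph-23 config set debug_ms 1

on the MDS that crashes (runtime override only, so it gets reverted on restart), and then triggering the dump inode command again. Is that what you need, or do you want other debug levels as well?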
I first need to figure out how to move the troublesome inode somewhere else where I might be able to do something. The boot message shows up on this one file server every time. Is there any information about what dir/inode might be causing the issue? How could I reproduce this without affecting the users, say, by re-creating the same condition somewhere else? Any hints are appreciated.
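One idea I had for finding the path behind ino 0x20011d3e5cb without touching the crashing MDS (untested on my side, so please tell me if this is a bad idea): read the backtrace xattr of the inode's first object directly from RADOS and decode it, roughly

# rados -p <cephfs data pool> getxattr 20011d3e5cb.00000000 parent > parent.bin
# ceph-dencoder type inode_backtrace_t import parent.bin decode dump_json

where the pool would have to be the metadata pool instead if the inode is a directory. The pool placeholder and the exact object name are my assumptions derived from the inode number in the log, so take this with a grain of salt.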
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Xiubo Li <xiu...@redhat.com>
Sent: Thursday, May 11, 2023 3:45 AM
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: mds dump inode crashes file system

Hey Frank,

On 5/10/23 21:44, Frank Schilder wrote:
> The kernel message that shows up on boot on the file server in text format:
>
> May 10 13:56:59 rit-pfile01 kernel: WARNING: CPU: 3 PID: 34 at fs/ceph/caps.c:689 ceph_add_cap+0x53e/0x550 [ceph]
> May 10 13:56:59 rit-pfile01 kernel: Modules linked in: ceph libceph dns_resolver nls_utf8 isofs cirrus drm_shmem_helper intel_rapl_msr iTCO_wdt intel_rapl_common iTCO_vendor_support drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops pcspkr joydev virtio_net drm i2c_i801 net_failover virtio_balloon failover lpc_ich nfsd nfs_acl lockd auth_rpcgss grace sunrpc sr_mod cdrom sg xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci ghash_clmulni_intel libata serio_raw virtio_blk virtio_console virtio_scsi dm_mirror dm_region_hash dm_log dm_mod fuse
> May 10 13:56:59 rit-pfile01 kernel: CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 4.18.0-486.el8.x86_64 #1
> May 10 13:56:59 rit-pfile01 kernel: Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module_el8.7.0+3346+68867adb 04/01/2014
> May 10 13:56:59 rit-pfile01 kernel: Workqueue: ceph-msgr ceph_con_workfn [libceph]
> May 10 13:56:59 rit-pfile01 kernel: RIP: 0010:ceph_add_cap+0x53e/0x550 [ceph]
> May 10 13:56:59 rit-pfile01 kernel: Code: c0 48 c7 c7 c0 69 7f c0 e8 6c 4c 72 c3 0f 0b 44 89 7c 24 04 e9 7e fc ff ff 44 8b 7c 24 04 e9 68 fe ff ff 0f 0b e9 c9 fc ff ff <0f> 0b e9 0a fe ff ff 0f 0b e9 12 fe ff ff 0f 0b 66 90 0f 1f 44 00
> May 10 13:56:59 rit-pfile01 kernel: RSP: 0018:ffffa4d000d87b48 EFLAGS: 00010217
> May 10 13:56:59 rit-pfile01 kernel: RAX: 0000000000000000 RBX: 0000000000000005 RCX: dead000000000200
> May 10 13:56:59 rit-pfile01 kernel: RDX: ffff92d7d7f6e7d0 RSI: ffff92d7d7f6e7d0 RDI: ffff92d7d7f6e7c8
> May 10 13:56:59 rit-pfile01 kernel: RBP: ffff92d7c5588970 R08: ffff92d7d7f6e7d0 R09: 0000000000000001
> May 10 13:56:59 rit-pfile01 kernel: R10: ffff92d80078cbb8 R11: 00000000ffff92c0 R12: 0000000000000155
> May 10 13:56:59 rit-pfile01 kernel: R13: ffff92d80078cbb8 R14: ffff92d80078cbc0 R15: 0000000000000001
> May 10 13:56:59 rit-pfile01 kernel: FS: 0000000000000000(0000) GS:ffff92d937d80000(0000) knlGS:0000000000000000
> May 10 13:56:59 rit-pfile01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> May 10 13:56:59 rit-pfile01 kernel: CR2: 00007f74435b9008 CR3: 00000001099fa000 CR4: 00000000003506e0
> May 10 13:56:59 rit-pfile01 kernel: Call Trace:
> May 10 13:56:59 rit-pfile01 kernel: ceph_handle_caps+0xdf2/0x1780 [ceph]
> May 10 13:56:59 rit-pfile01 kernel: mds_dispatch+0x13a/0x670 [ceph]
> May 10 13:56:59 rit-pfile01 kernel: ceph_con_process_message+0x79/0x140 [libceph]
> May 10 13:56:59 rit-pfile01 kernel: ? calc_signature+0xdf/0x110 [libceph]
> May 10 13:56:59 rit-pfile01 kernel: ceph_con_v1_try_read+0x5d7/0xf30 [libceph]
> May 10 13:56:59 rit-pfile01 kernel: ceph_con_workfn+0x329/0x680 [libceph]
> May 10 13:56:59 rit-pfile01 kernel: process_one_work+0x1a7/0x360
> May 10 13:56:59 rit-pfile01 kernel: worker_thread+0x30/0x390
> May 10 13:56:59 rit-pfile01 kernel: ? create_worker+0x1a0/0x1a0
> May 10 13:56:59 rit-pfile01 kernel: kthread+0x134/0x150
> May 10 13:56:59 rit-pfile01 kernel: ? set_kthread_struct+0x50/0x50
> May 10 13:56:59 rit-pfile01 kernel: ret_from_fork+0x35/0x40
> May 10 13:56:59 rit-pfile01 kernel: ---[ end trace 84e4b3694bbe9fde ]---

BTW, did you enable the async dirop? Currently this is disabled by default in 4.18.0-486.el8.x86_64.

The async dirop is buggy and we have hit a very similar bug to the one above, please see https://tracker.ceph.com/issues/55857. That was a race between client requests and dir migration in the MDS, and it was fixed a long time ago.

If you didn't enable the async dirop then this should be a different issue, but I guess it is also a race between client requests and dir migration in the non-async dirop case. And from the kernel call trace, I guess the MDS was doing dir splitting and migration, and the MDS daemon's crash was possibly caused by: "dirfragtree.dump(f);"

Could you reproduce this by enabling the mds debug logs?

Thanks

- Xiubo

> I can't interpret it, some help is appreciated.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Frank Schilder <fr...@dtu.dk>
> Sent: Wednesday, May 10, 2023 3:36 PM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: mds dump inode crashes file system
>
> For the "mds dump inode" command I could find the crash in the log; see below. Most of the log content is the past OPS dump from the 3 MDS restarts that happened. It contains the last 10000 OPS before the crash and I can upload the log if someone can use it. The crash stack trace is somewhat truncated for readability:
>
> 2023-05-10T12:54:53.142+0200 7fe971ca6700 1 mds.ceph-23 Updating MDS map to version 892464 from mon.4
> 2023-05-10T13:39:50.962+0200 7fe96fca2700 0 log_channel(cluster) log [WRN] : client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb pending pAsLsXsFscr issued pAsLsXsFscr, sent 61.705410 seconds ago
> 2023-05-10T13:39:52.550+0200 7fe971ca6700 1 mds.ceph-23 Updating MDS map to version 892465 from mon.4
> 2023-05-10T13:40:50.963+0200 7fe96fca2700 0 log_channel(cluster) log [WRN] : client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb pending pAsLsXsFscr issued pAsLsXsFscr, sent 121.706193 seconds ago
> 2023-05-10T13:42:50.966+0200 7fe96fca2700 0 log_channel(cluster) log [WRN] : client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb pending pAsLsXsFscr issued pAsLsXsFscr, sent 241.709072 seconds ago
> 2023-05-10T13:44:00.506+0200 7fe972ca8700 1 mds.ceph-23 asok_command: dump inode {number=2199322355147,prefix=dump inode} (starting...)
> 2023-05-10T13:44:00.520+0200 7fe972ca8700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/common/buffer.cc: In function 'const char* ceph::buffer::v15_2_0::ptr::c_str() const' thread 7fe972ca8700 time 2023-05-10T13:44:00.507652+0200
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.17/rpm/el8/BUILD/ceph-15.2.17/src/common/buffer.cc: 501: FAILED ceph_assert(_raw)
>
> ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7fe979ae9b92]
> 2: (()+0x27ddac) [0x7fe979ae9dac]
> 3: (()+0x5ce831) [0x7fe979e3a831]
> 4: (InodeStoreBase::dump(ceph::Formatter*) const+0x153) [0x55c08c59b543]
> 5: (CInode::dump(ceph::Formatter*, int) const+0x144) [0x55c08c59b8d4]
> 6: (MDCache::dump_inode(ceph::Formatter*, unsigned long)+0x7c) [0x55c08c41e00c]
> 7: (MDSRank::command_dump_inode(ceph::Formatter*, ..., std::ostream&)+0xb5) [0x55c08c353e75]
> 8: (MDSRankDispatcher::handle_asok_command(std::basic_string_view<char, std::char_traits<char> >, ..., ceph::buffer::v15_2_0::list&)>)+0x2296) [0x55c08c36c5f6]
> 9: (MDSDaemon::asok_command(std::basic_string_view<char, ..., ceph::buffer::v15_2_0::list&)>)+0x75b) [0x55c08c340eab]
> 10: (MDSSocketHook::call_async(std::basic_string_view<char, std::char_traits<char> >, ..., ceph::buffer::v15_2_0::list&)>)+0x6a) [0x55c08c34f9ca]
> 11: (AdminSocket::execute_command(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, ..., ceph::buffer::v15_2_0::list&)>)+0x6f9) [0x7fe979bece59]
> 12: (AdminSocket::do_tell_queue()+0x289) [0x7fe979bed809]
> 13: (AdminSocket::entry()+0x4d3) [0x7fe979beefd3]
> 14: (()+0xc2ba3) [0x7fe977afaba3]
> 15: (()+0x81ca) [0x7fe9786bf1ca]
> 16: (clone()+0x43) [0x7fe977111dd3]
>
> 2023-05-10T13:44:00.522+0200 7fe972ca8700 -1 *** Caught signal (Aborted) ** in thread 7fe972ca8700 thread_name:admin_socket
>
> ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
> 1: (()+0x12ce0) [0x7fe9786c9ce0]
> 2: (gsignal()+0x10f) [0x7fe977126a9f]
> 3: (abort()+0x127) [0x7fe9770f9e05]
> 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7fe979ae9be3]
> 5: (()+0x27ddac) [0x7fe979ae9dac]
> 6: (()+0x5ce831) [0x7fe979e3a831]
> 7: (InodeStoreBase::dump(ceph::Formatter*) const+0x153) [0x55c08c59b543]
> 8: (CInode::dump(ceph::Formatter*, int) const+0x144) [0x55c08c59b8d4]
> 9: (MDCache::dump_inode(ceph::Formatter*, unsigned long)+0x7c) [0x55c08c41e00c]
> 10: (MDSRank::command_dump_inode(ceph::Formatter*, ..., std::ostream&)+0xb5) [0x55c08c353e75]
> 11: (MDSRankDispatcher::handle_asok_command(std::basic_string_view<char, std::char_traits<char> >, ..., ceph::buffer::v15_2_0::list&)>)+0x2296) [0x55c08c36c5f6]
> 12: (MDSDaemon::asok_command(std::basic_string_view<char, std::char_traits<char> >, ..., ceph::buffer::v15_2_0::list&)>)+0x75b) [0x55c08c340eab]
> 13: (MDSSocketHook::call_async(std::basic_string_view<char, std::char_traits<char> >, ..., ceph::buffer::v15_2_0::list&)>)+0x6a) [0x55c08c34f9ca]
> 14: (AdminSocket::execute_command(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, ..., ceph::buffer::v15_2_0::list&)>)+0x6f9) [0x7fe979bece59]
> 15: (AdminSocket::do_tell_queue()+0x289) [0x7fe979bed809]
> 16: (AdminSocket::entry()+0x4d3) [0x7fe979beefd3]
> 17: (()+0xc2ba3) [0x7fe977afaba3]
> 18: (()+0x81ca) [0x7fe9786bf1ca]
> 19: (clone()+0x43) [0x7fe977111dd3]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Frank Schilder <fr...@dtu.dk>
> Sent: Wednesday, May 10, 2023 2:33 PM
> To: ceph-users@ceph.io
> Subject: [ceph-users] mds dump inode crashes file system
>
> Hi all,
>
> I have an annoying problem with a specific ceph fs client. We have a file server on which we re-export kernel mounts via samba (all mounts with the noshare option). On one of these re-exports we have recurring problems. Today I caught it with
>
> 2023-05-10T13:39:50.963685+0200 mds.ceph-23 (mds.1) 1761 : cluster [WRN] client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb pending pAsLsXsFscr issued pAsLsXsFscr, sent 61.705410 seconds ago
>
> and I wanted to look up what path the inode 0x20011d3e5cb points to. Unfortunately, the command
>
> ceph tell "mds.*" dump inode 0x20011d3e5cb
>
> crashes an MDS in a way that it restarts itself, but doesn't seem to come back clean (it does not fail over to a stand-by). If I repeat the command above, it crashes the MDS again. Execution on other MDS daemons succeeds, for example:
>
> # ceph tell "mds.ceph-24" dump inode 0x20011d3e5cb
> 2023-05-10T14:14:37.091+0200 7fa47ffff700 0 client.210149523 ms_handle_reset on v2:192.168.32.88:6800/3216233914
> 2023-05-10T14:14:37.124+0200 7fa4857fa700 0 client.210374440 ms_handle_reset on v2:192.168.32.88:6800/3216233914
> dump inode failed, wrong inode number or the inode is not cached
>
> The caps recall gets the client evicted at some point, but it doesn't manage to come back clean. On a single ceph fs mount point I see this
>
> # ls /shares/samba/rit-oil
> ls: cannot access '/shares/samba/rit-oil': Stale file handle
>
> All other mount points are fine, just this one acts up. A "mount -o remount /shares/samba/rit-oil" crashed the entire server and I had to do a cold reboot. On reboot I see this message: https://imgur.com/a/bOSLxBb , which only occurs on this one file server (we are running a few of those). Does this point to a more serious problem, like file system corruption? Should I try an fs scrub on the corresponding path?
>
> Some info about the system:
>
> The file server's kernel version is quite recent, updated two weeks ago:
>
> $ uname -r
> 4.18.0-486.el8.x86_64
> # cat /etc/redhat-release
> CentOS Stream release 8
>
> Our ceph cluster is on the latest Octopus and we use the packages from the octopus el8 repo on this server.
>
> We have several such shares and they all work fine. It is only on one share where we have persistent problems with the mount point hanging or the server freezing and crashing.
>
> After working hours I will try a proper fail of the "broken" MDS to see if I can execute the dump inode command without it crashing everything.
>
> In the meantime, any hints would be appreciated. I see that we have an exceptionally large MDS log for the problematic one. Any hint on what to look for would be appreciated; it contains a lot from the recovery operations:
>
> # pdsh -w ceph-[08-17,23-24] ls -lh "/var/log/ceph/ceph-mds.ceph-??.log"
>
> ceph-23: -rw-r--r--. 1 ceph ceph 15M May 10 14:28 /var/log/ceph/ceph-mds.ceph-23.log *** huge ***
> ceph-24: -rw-r--r--. 1 ceph ceph 14K May 10 14:28 /var/log/ceph/ceph-mds.ceph-24.log
> ceph-10: -rw-r--r--. 1 ceph ceph 394 May 10 14:02 /var/log/ceph/ceph-mds.ceph-10.log
> ceph-13: -rw-r--r--. 1 ceph ceph 394 May 10 14:02 /var/log/ceph/ceph-mds.ceph-13.log
> ceph-08: -rw-r--r--. 1 ceph ceph 394 May 10 14:02 /var/log/ceph/ceph-mds.ceph-08.log
> ceph-15: -rw-r--r--. 1 ceph ceph 14K May 10 14:28 /var/log/ceph/ceph-mds.ceph-15.log
> ceph-17: -rw-r--r--. 1 ceph ceph 14K May 10 14:28 /var/log/ceph/ceph-mds.ceph-17.log
> ceph-14: -rw-r--r--. 1 ceph ceph 16K May 10 14:28 /var/log/ceph/ceph-mds.ceph-14.log
> ceph-09: -rw-r--r--. 1 ceph ceph 16K May 10 14:28 /var/log/ceph/ceph-mds.ceph-09.log
> ceph-16: -rw-r--r--. 1 ceph ceph 15K May 10 14:28 /var/log/ceph/ceph-mds.ceph-16.log
> ceph-11: -rw-r--r--. 1 ceph ceph 14K May 10 14:28 /var/log/ceph/ceph-mds.ceph-11.log
> ceph-12: -rw-r--r--. 1 ceph ceph 394 May 10 14:02 /var/log/ceph/ceph-mds.ceph-12.log
>
> Thanks and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io