Re: [ceph-users] cephfs, kernel(4.12.8) client version hung(D status), ceph version 0.94.9
> On 25 Aug 2017, at 18:57, donglifec...@gmail.com wrote:
>
> ZhengYan,
>
> Yes, after shutting down osd.1 the D-status processes disappear. What is the reason for this? When this problem (D status) comes up, ceph reports HEALTH_OK. How should I deal with this problem?
>
> Thanks a lot.
>
> donglifec...@gmail.com

Maybe the disk underneath osd.1 is about to die.
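Yan's suggestion above, that the disk behind osd.1 may be on its way out, can sometimes be corroborated from the OSD host's kernel log. A minimal sketch; the dmesg line below is made up for illustration and is not from this thread (on a real host you would pipe `dmesg` itself instead of the sample):

```shell
# Scan kernel-log text for block-layer I/O errors that often precede disk death.
# The sample line is illustrative only; replace the printf with `dmesg`.
dmesg_sample='[12345.678901] blk_update_request: I/O error, dev sdb, sector 123456'

printf '%s\n' "$dmesg_sample" |
  grep -E 'I/O error|medium error|critical target error' \
  && echo "possible failing disk"
```

If this matches on the host carrying osd.1's data device, a SMART check of that device would be the next step.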
Re: [ceph-users] cephfs, kernel(4.12.8) client version hung(D status), ceph version 0.94.9
ZhengYan,

Yes, after shutting down osd.1 the D-status processes disappear. What is the reason for this? When this problem (D status) comes up, ceph reports HEALTH_OK. How should I deal with this problem?

Thanks a lot.

donglifec...@gmail.com

From: Yan, Zheng
Date: 2017-08-25 17:17
To: donglifec...@gmail.com
CC: ceph-users
Subject: Re: [ceph-users] cephfs, kernel(4.12.8) client version hung(D status), ceph version 0.94.9

It seems the kernel was unable to write data to osd1. Please try shutting down osd1.
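The two commands used repeatedly in this thread, `ps aux | grep D` to find D-state processes and `cat /proc/<pid>/stack` for each hit, can be combined into one loop. A sketch assuming a procps-style `ps` and root access to `/proc/<pid>/stack`:

```shell
# Print the kernel stack of every process in uninterruptible sleep (D state),
# the same data gathered by hand in this thread.
d_pids=$(ps -eo pid=,stat= | awk '$2 ~ /^D/ {print $1}')

for pid in $d_pids; do
  echo "=== pid $pid ==="
  cat "/proc/$pid/stack" 2>/dev/null || echo "(stack unreadable; run as root)"
done
```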
Re: [ceph-users] cephfs, kernel(4.12.8) client version hung(D status), ceph version 0.94.9
ZhengYan,

[root@ceph-radosgw-lb-backup cephfs]# ps aux | grep D
USER       PID %CPU %MEM    VSZ   RSS TTY   STAT START TIME COMMAND
root       578  0.0  0.0 203360  3248 ?     Ssl  Aug24 0:00 /usr/sbin/gssproxy -D
root       865  0.0  0.0  82552  6104 ?     Ss   Aug24 0:00 /usr/sbin/sshd -D
root      2997  0.0  0.0      0     0 ?     D    Aug24 0:11 [kworker/2:1]
root      3996  0.0  0.0 115384   452 ?     D    Aug24 0:00 -bash
root      4479  0.0  0.0 112024   652 ?     D    Aug24 0:00 cat /mnt/cephfs/a/test-test1
root     18143  0.0  0.0 112656  2244 pts/2 S+   08:19 0:00 grep --color=auto D

[root@ceph-radosgw-lb-backup cephfs]# cat /sys/kernel/debug/ceph/b6ea8682-c90d-495b-80f2-bc5bef1da9d1.client270186/osdc
REQUESTS 25 homeless 0
6225 osd1 21.cd699f5e [1,2,3]/1 [1,2,3]/1 1000ab2. 0x400014 1 read
6229 osd1 21.ba1e03ba [1,3,2]/1 [1,3,2]/1 1008f05.0002 0x400024 1 write
6234 osd1 21.29c5e8d [1,0,2]/1 [1,0,2]/1 1008f05.0007 0x400024 1 write
6238 osd1 21.9fe9de73 [1,2,0]/1 [1,2,0]/1 1008f05.000b 0x400024 1 write
6240 osd1 21.a05b2ba8 [1,0,3]/1 [1,0,3]/1 1008f05.000d 0x400024 1 write
6241 osd1 21.b7c85b45 [1,2,3]/1 [1,2,3]/1 1008f05.000e 0x400024 1 write
6242 osd1 21.1bca917f [1,2,3]/1 [1,2,3]/1 1008f05.000f 0x400024 1 write
6243 osd1 21.cdc3143b [1,2,3]/1 [1,2,3]/1 1008f05.0010 0x400024 1 write
6244 osd1 21.8e65566e [1,3,2]/1 [1,3,2]/1 1008f05.0011 0x400024 1 write
6246 osd1 21.5395ea66 [1,2,3]/1 [1,2,3]/1 1008f05.0013 0x400024 1 write
6251 osd1 21.2acb9e9c [1,0,2]/1 [1,0,2]/1 1008f05.0018 0x400024 1 write
6252 osd1 21.b22077e3 [1,0,3]/1 [1,0,3]/1 1008f05.0019 0x400024 1 write
6254 osd1 21.a9a1c2b1 [1,2,3]/1 [1,2,3]/1 1008f05.001b 0x400024 1 write
6256 osd1 21.64cfc57f [1,2,3]/1 [1,2,3]/1 1008f05.001d 0x400024 1 write
6259 osd1 21.629f77ff [1,2,3]/1 [1,2,3]/1 1008f05.0020 0x400024 1 write
6263 osd1 21.dde8c63 [1,0,3]/1 [1,0,3]/1 1008f05.0024 0x400024 1 write
6265 osd1 21.909ba5f8 [1,0,3]/1 [1,0,3]/1 1008f05.0026 0x400024 1 write
6266 osd1 21.5496fef5 [1,3,2]/1 [1,3,2]/1 1008f05.0027 0x400024 1 write
6269 osd1 21.1f26a27b [1,2,3]/1 [1,2,3]/1 1008f05.002a 0x400024 1 write
6270 osd1 21.c9021b4e [1,0,2]/1 [1,0,2]/1 1008f05.002b 0x400024 1 write
6276 osd1 21.e6df28ed [1,2,0]/1 [1,2,0]/1 1008f05.0031 0x400024 1 write
6286 osd1 21.1a38cab8 [1,0,3]/1 [1,0,3]/1 1008f05.003b 0x400024 1 write
6287 osd1 21.425da4ba [1,3,2]/1 [1,3,2]/1 1008f05.003c 0x400024 1 write
6291 osd1 21.bb5fa23f [1,2,3]/1 [1,2,3]/1 1008f05.0040 0x400024 1 write
6298 osd1 21.b9255c8e [1,0,2]/1 [1,0,2]/1 1008f05.0047 0x400024 1 write
LINGER REQUESTS

[root@ceph-radosgw-lb-backup cephfs]# cat /proc/2997/stack
[] io_schedule+0x16/0x40
[] __lock_page+0x119/0x170
[] truncate_inode_pages_range+0x421/0x790
[] truncate_pagecache+0x47/0x60
[] __ceph_do_pending_vmtruncate+0xc2/0x1c0 [ceph]
[] ceph_vmtruncate_work+0x1b/0x40 [ceph]
[] process_one_work+0x149/0x360
[] worker_thread+0x4d/0x3c0
[] kthread+0x109/0x140
[] ret_from_fork+0x25/0x30

[root@ceph-radosgw-lb-backup cephfs]# cat /proc/3996/stack
[] __ceph_do_pending_vmtruncate+0x44/0x1c0 [ceph]
[] __ceph_setattr+0x79a/0x8b0 [ceph]
[] ceph_setattr+0x3c/0x60 [ceph]
[] notify_change+0x266/0x440
[] do_truncate+0x75/0xc0
[] path_openat+0xaba/0x13b0
[] do_filp_open+0x91/0x100
[] do_sys_open+0x124/0x210
[] SyS_open+0x1e/0x20
[] do_syscall_64+0x67/0x150
[] entry_SYSCALL64_slow_path+0x25/0x25

[root@ceph-radosgw-lb-backup cephfs]# cat /proc/4479/stack
[] __ceph_do_pending_vmtruncate+0x44/0x1c0 [ceph]
[] try_get_cap_refs+0xb5/0x5a0 [ceph]
[] ceph_get_caps+0x119/0x390 [ceph]
[] ceph_read_iter+0xc5/0x820 [ceph]
[] __vfs_read+0xdf/0x130
[] vfs_read+0x8c/0x130
[] SyS_read+0x55/0xc0
[] do_syscall_64+0x67/0x150
[] entry_SYSCALL64_slow_path+0x25/0x25

donglifec...@gmail.com
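All 25 stuck requests in the dump above target osd1, which is what points the finger at that OSD. With many clients or many OSDs it helps to tally the osdc dump mechanically. A sketch using an excerpt of the output above as sample input (on a live client you would feed it `/sys/kernel/debug/ceph/*/osdc` instead):

```shell
# Count in-flight requests per OSD and per op type from an osdc dump.
# Field 2 is the target OSD, the last field is the op (read/write).
# Sample lines are an excerpt of the debugfs output quoted above.
osdc_sample='6225 osd1 21.cd699f5e [1,2,3]/1 [1,2,3]/1 1000ab2. 0x400014 1 read
6229 osd1 21.ba1e03ba [1,3,2]/1 [1,3,2]/1 1008f05.0002 0x400024 1 write
6234 osd1 21.29c5e8d [1,0,2]/1 [1,0,2]/1 1008f05.0007 0x400024 1 write'

printf '%s\n' "$osdc_sample" |
  awk '{count[$2 " " $NF]++} END {for (k in count) print k, count[k]}' |
  sort
```

A single OSD dominating the counts, as here, suggests that OSD (or its disk) rather than the client is the bottleneck.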
Re: [ceph-users] cephfs, kernel(4.12.8) client version hung(D status), ceph version 0.94.9
ZhengYan,

I will test this problem again.

Thanks a lot.

donglifec...@gmail.com
Re: [ceph-users] cephfs, kernel(4.12.8) client version hung(D status), ceph version 0.94.9
> On 24 Aug 2017, at 17:40, donglifec...@gmail.com wrote:
>
> ZhengYan,
>
> I met a problem; the steps to reproduce are outlined below.
> [...]

Please check whether there are hung requests in /sys/kernel/debug/ceph/*/osdc. It's likely that the kernel was unable to flush dirty pages.

Regards
Yan, Zheng

> Thanks a lot.
>
> donglifec...@gmail.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
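Yan's osdc check can be scripted so it covers every kernel ceph client mounted on the host. A sketch, assuming debugfs is mounted at /sys/kernel/debug; the directory argument exists only so the helper can be pointed at a different path:

```shell
# Dump the in-flight OSD request list of every kernel ceph client found
# under the given debugfs directory (default: /sys/kernel/debug/ceph).
check_osdc() {
  dir=${1:-/sys/kernel/debug/ceph}
  for f in "$dir"/*/osdc; do
    [ -e "$f" ] || continue   # glob stayed literal: no clients mounted
    echo "== $f =="
    cat "$f"
  done
}

check_osdc
```

A non-empty REQUESTS list that does not drain over repeated runs is the "hung request" condition Yan describes.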
[ceph-users] cephfs, kernel(4.12.8) client version hung(D status), ceph version 0.94.9
ZhengYan,

I met a problem; the steps to reproduce are outlined below:

1. Create a 30G file, test823.

2. On the host1 client (kernel 4.12.8):
cat /mnt/cephfs/a/test823 > /mnt/cephfs/a/test823-backup
ls -al /mnt/cephfs/a/*

3. On the host2 client (kernel 4.12.8):
while true; do cp /home/scripts/512k.file /mnt/cephfs/a/512k.file$i ; done   # loop copying files
cat /mnt/cephfs/a/test823-backup > /mnt/cephfs/a/newtestfile
ls -al /mnt/cephfs/a/*

4. The host2 client hangs; the stack is:
[ 9462.754853] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9462.756838] bash D 0 32738 14988 0x0084
[ 9462.758568] Call Trace:
[ 9462.759945] __schedule+0x28a/0x880
[ 9462.761414] schedule+0x36/0x80
[ 9462.762835] rwsem_down_write_failed+0x20d/0x380
[ 9462.764433] call_rwsem_down_write_failed+0x17/0x30
[ 9462.766075] ? __ceph_getxattr+0x340/0x340 [ceph]
[ 9462.767693] down_write+0x2d/0x40
[ 9462.769175] do_truncate+0x67/0xc0
[ 9462.770642] path_openat+0xaba/0x13b0
[ 9462.772136] do_filp_open+0x91/0x100
[ 9462.773616] ? __check_object_size+0x159/0x190
[ 9462.775156] ? __alloc_fd+0x46/0x170
[ 9462.776574] do_sys_open+0x124/0x210
[ 9462.777972] SyS_open+0x1e/0x20
[ 9462.779320] do_syscall_64+0x67/0x150
[ 9462.780736] entry_SYSCALL64_slow_path+0x25/0x25

[root@cephtest ~]# cat /proc/29541/stack
[] ceph_mdsc_do_request+0x183/0x240 [ceph]
[] __ceph_setattr+0x3fc/0x8b0 [ceph]
[] ceph_setattr+0x3c/0x60 [ceph]
[] notify_change+0x266/0x440
[] do_truncate+0x75/0xc0
[] path_openat+0xaba/0x13b0
[] do_filp_open+0x91/0x100
[] do_sys_open+0x124/0x210
[] SyS_open+0x1e/0x20
[] do_syscall_64+0x67/0x150
[] entry_SYSCALL64_slow_path+0x25/0x25

[root@cephtest ~]# cat /proc/32738/stack
[] call_rwsem_down_write_failed+0x17/0x30
[] do_truncate+0x67/0xc0
[] path_openat+0xaba/0x13b0
[] do_filp_open+0x91/0x100
[] do_sys_open+0x124/0x210
[] SyS_open+0x1e/0x20
[] do_syscall_64+0x67/0x150
[] entry_SYSCALL64_slow_path+0x25/0x25

The ceph log is:
f pending pAsLsXs issued pAsLsXsFcb, sent 1921.069365 seconds ago
2017-08-24 17:16:00.219523 7f746db8f700 0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 1000424 pending pAsLsXs issued pAsLsXsFcb, sent 1921.063079 seconds ago
2017-08-24 17:16:00.219534 7f746db8f700 0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 1000521 pending pAsLsXs issued pAsLsXsFcb, sent 1921.026983 seconds ago
2017-08-24 17:16:00.219545 7f746db8f700 0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 1000523 pending pAsLsXs issued pAsLsXsFcb, sent 1920.985596 seconds ago
2017-08-24 17:16:00.219574 7f746db8f700 0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 1000528 pending pAsLsXs issued pAsLsXsFcb, sent 1920.866863 seconds ago
2017-08-24 17:16:00.219592 7f746db8f700 0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 100052a pending pAsLsXs issued pAsLsXsFcb, sent 1920.788282 seconds ago
2017-08-24 17:16:00.219606 7f746db8f700 0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 100052c pending pAsLsXs issued pAsLsXsFcb, sent 1920.712564 seconds ago
2017-08-24 17:16:00.219618 7f746db8f700 0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 100052f pending pAsLsXs issued pAsLsXsFcb, sent 1920.563784 seconds ago
2017-08-24 17:16:00.219630 7f746db8f700 0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 100040b pending pAsLsXsFsc issued pAsLsXsFscb, sent 1920.506752 seconds ago
2017-08-24 17:16:00.219741 7f746db8f700 0 log_channel(cluster) log [WRN] : 4 slow requests, 1 included below; oldest blocked for > 1941.487238 secs
2017-08-24 17:16:00.219753 7f746db8f700 0 log_channel(cluster) log [WRN] : slow request 1920.507384 seconds old, received at 2017-08-24 16:43:59.712319: client_request(client.268101:1122217 getattr pAsLsXsFs #100040b 2017-08-24 16:44:00.463827) currently failed to rdlock, waiting

Thanks a lot.

donglifec...@gmail.com
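The MDS warnings above encode which inodes have an outstanding cap revoke and how long it has been pending, so they can be pulled out mechanically instead of read by eye. A sketch, run here against one warning line copied from the log above:

```shell
# Extract inode and stall time from an MDS "isn't responding to
# mclientcaps(revoke)" warning. The sample line is copied from the report.
mds_log="2017-08-24 17:16:00.219523 7f746db8f700 0 log_channel(cluster) log [WRN] : client.268113 isn't responding to mclientcaps(revoke), ino 1000424 pending pAsLsXs issued pAsLsXsFcb, sent 1921.063079 seconds ago"

printf '%s\n' "$mds_log" |
  sed -n 's/.*mclientcaps(revoke), ino \([0-9a-f]*\).*sent \([0-9.]*\) seconds ago.*/ino \1 revoke pending for \2 s/p'
# prints: ino 1000424 revoke pending for 1921.063079 s
```

Piping the full cluster log through the same sed, then sorting, gives a quick list of the inodes (and thus files) wedged behind the unresponsive client.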