Re: [ceph-users] cephfs, kernel(4.12.8) client version hung(D status), ceph version 0.94.9

2017-08-25 Thread Yan, Zheng

> On 25 Aug 2017, at 18:57, donglifec...@gmail.com wrote:
> 
> ZhengYan,
> 
> Yes, after I shut down osd.1 the processes in D state disappeared. What is
> the reason for this? When this problem (D state) comes up, ceph health is
> OK. How should I deal with this problem?
> 

Maybe the disk underneath osd.1 is about to die.
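
One rough way to look for a slow or failing OSD disk while the cluster still reports healthy is to compare per-OSD latencies. The sketch below filters `ceph osd perf` output. The three-column layout (OSD id, commit latency, apply latency, both in ms) and the 100 ms default threshold are assumptions, not something stated in this thread; adjust them for your release.

```shell
# slow_osds: read `ceph osd perf`-style text on stdin and print every OSD whose
# commit or apply latency (ms) is above the threshold given as $1.
# Assumed layout: column 1 = OSD id, column 2 = commit ms, column 3 = apply ms.
slow_osds() {
    threshold=${1:-100}   # hypothetical default, pick what fits your cluster
    awk -v t="$threshold" '
        # Skip the header: only lines whose first field is a number are OSD rows.
        $1 ~ /^[0-9]+$/ && (($2 + 0) > t || ($3 + 0) > t) {
            printf "osd.%s commit=%sms apply=%sms\n", $1, $2, $3
        }'
}
```

Usage would be something like `ceph osd perf | slow_osds 100`. If osd.1 stands out, checking `dmesg` and `smartctl -a` for the underlying device on that OSD's host is a reasonable next step.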


> Thanks a lot.
> 
> donglifec...@gmail.com
>  
> From: Yan, Zheng
> Date: 2017-08-25 17:17
> To: donglifec...@gmail.com
> CC: ceph-users
> Subject: Re: [ceph-users]cephfs, kernel(4.12.8) client version hung(D 
> status), ceph version 0.94.9
>  
> > On 25 Aug 2017, at 16:23, donglifec...@gmail.com wrote:
> >
> > ZhengYan,
> >
> > [root@ceph-radosgw-lb-backup cephfs]# ps aux | grep D
> > USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> > root       578  0.0  0.0 203360  3248 ?        Ssl  Aug24   0:00 /usr/sbin/gssproxy -D
> > root       865  0.0  0.0  82552  6104 ?        Ss   Aug24   0:00 /usr/sbin/sshd -D
> > root      2997  0.0  0.0      0     0 ?        D    Aug24   0:11 [kworker/2:1]
> > root      3996  0.0  0.0 115384   452 ?        D    Aug24   0:00 -bash
> > root      4479  0.0  0.0 112024   652 ?        D    Aug24   0:00 cat /mnt/cephfs/a/test-test1
> > root     18143  0.0  0.0 112656  2244 pts/2    S+   08:19   0:00 grep --color=auto D
> >
> > [root@ceph-radosgw-lb-backup cephfs]# cat  
> > /sys/kernel/debug/ceph/b6ea8682-c90d-495b-80f2-bc5bef1da9d1.client270186/osdc
> >  
> > REQUESTS 25 homeless 0
> > 6225 osd1 21.cd699f5e [1,2,3]/1 [1,2,3]/1 1000ab2. 0x400014 1 
> > read
> > 6229 osd1 21.ba1e03ba [1,3,2]/1 [1,3,2]/1 1008f05.0002 0x400024 1 
> > write
> > 6234 osd1 21.29c5e8d [1,0,2]/1 [1,0,2]/1 1008f05.0007 0x400024 1 
> > write
> > 6238 osd1 21.9fe9de73 [1,2,0]/1 [1,2,0]/1 1008f05.000b 0x400024 1 
> > write
> > 6240 osd1 21.a05b2ba8 [1,0,3]/1 [1,0,3]/1 1008f05.000d 0x400024 1 
> > write
> > 6241 osd1 21.b7c85b45 [1,2,3]/1 [1,2,3]/1 1008f05.000e 0x400024 1 
> > write
> > 6242 osd1 21.1bca917f [1,2,3]/1 [1,2,3]/1 1008f05.000f 0x400024 1 
> > write
> > 6243 osd1 21.cdc3143b [1,2,3]/1 [1,2,3]/1 1008f05.0010 0x400024 1 
> > write
> > 6244 osd1 21.8e65566e [1,3,2]/1 [1,3,2]/1 1008f05.0011 0x400024 1 
> > write
> > 6246 osd1 21.5395ea66 [1,2,3]/1 [1,2,3]/1 1008f05.0013 0x400024 1 
> > write
> > 6251 osd1 21.2acb9e9c [1,0,2]/1 [1,0,2]/1 1008f05.0018 0x400024 1 
> > write
> > 6252 osd1 21.b22077e3 [1,0,3]/1 [1,0,3]/1 1008f05.0019 0x400024 1 
> > write
> > 6254 osd1 21.a9a1c2b1 [1,2,3]/1 [1,2,3]/1 1008f05.001b 0x400024 1 
> > write
> > 6256 osd1 21.64cfc57f [1,2,3]/1 [1,2,3]/1 1008f05.001d 0x400024 1 
> > write
> > 6259 osd1 21.629f77ff [1,2,3]/1 [1,2,3]/1 1008f05.0020 0x400024 1 
> > write
> > 6263 osd1 21.dde8c63 [1,0,3]/1 [1,0,3]/1 1008f05.0024 0x400024 1 
> > write
> > 6265 osd1 21.909ba5f8 [1,0,3]/1 [1,0,3]/1 1008f05.0026 0x400024 1 
> > write
> > 6266 osd1 21.5496fef5 [1,3,2]/1 [1,3,2]/1 1008f05.0027 0x400024 1 
> > write
> > 6269 osd1 21.1f26a27b [1,2,3]/1 [1,2,3]/1 1008f05.002a 0x400024 1 
> > write
> > 6270 osd1 21.c9021b4e [1,0,2]/1 [1,0,2]/1 1008f05.002b 0x400024 1 
> > write
> > 6276 osd1 21.e6df28ed [1,2,0]/1 [1,2,0]/1 1008f05.0031 0x400024 1 
> > write
> > 6286 osd1 21.1a38cab8 [1,0,3]/1 [1,0,3]/1 1008f05.003b 0x400024 1 
> > write
> > 6287 osd1 21.425da4ba [1,3,2]/1 [1,3,2]/1 1008f05.003c 0x400024 1 
> > write
> > 6291 osd1 21.bb5fa23f [1,2,3]/1 [1,2,3]/1 1008f05.0040 0x400024 1 
> > write
> > 6298 osd1 21.b9255c8e [1,0,2]/1 [1,0,2]/1 1008f05.0047 0x400024 1 
> > write
> > LINGER REQUESTS
> >
>  
> It seems the kernel was unable to write data to osd1. Please try shutting down osd1.
>  
>  
>  
>  
> >
> > [root@ceph-radosgw-lb-backup cephfs]# cat /proc/2997/stack
> > [] io_schedule+0x16/0x40
> > [] __lock_page+0x119/0x170
> > [] truncate_inode_pages_range+0x421/0x790
> > [] truncate_pagecache+0x47/0x60
> > [] __ceph_do_pending_vmtruncate+0xc2/0x1c0 [ceph]
> > [] ceph_vmtruncate_work+0x1b/0x40 [ceph]
> > [] process_one_work+0x149/0x360
> > [] worker_thread+0x4d/0x3c0
> > [] kthread+0x109/0x140
> > [] ret_from_fork+0x25/0x30
> > [] 0x
> >
> > [root@ceph-radosgw-lb-backup cephfs]# cat /proc/3996/stack
> >

Re: [ceph-users] cephfs, kernel(4.12.8) client version hung(D status), ceph version 0.94.9

2017-08-25 Thread donglifec...@gmail.com
ZhengYan,

Yes, after I shut down osd.1 the processes in D state disappeared. What is the
reason for this? When this problem (D state) comes up, ceph health is OK. How
should I deal with this problem?

Thanks a lot.



donglifec...@gmail.com
 
From: Yan, Zheng
Date: 2017-08-25 17:17
To: donglifec...@gmail.com
CC: ceph-users
Subject: Re: [ceph-users]cephfs, kernel(4.12.8) client version hung(D status), 
ceph version 0.94.9
 
> On 25 Aug 2017, at 16:23, donglifec...@gmail.com wrote:
> 
> ZhengYan,
> 
> [root@ceph-radosgw-lb-backup cephfs]# ps aux | grep D
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> root       578  0.0  0.0 203360  3248 ?        Ssl  Aug24   0:00 /usr/sbin/gssproxy -D
> root       865  0.0  0.0  82552  6104 ?        Ss   Aug24   0:00 /usr/sbin/sshd -D
> root      2997  0.0  0.0      0     0 ?        D    Aug24   0:11 [kworker/2:1]
> root      3996  0.0  0.0 115384   452 ?        D    Aug24   0:00 -bash
> root      4479  0.0  0.0 112024   652 ?        D    Aug24   0:00 cat /mnt/cephfs/a/test-test1
> root     18143  0.0  0.0 112656  2244 pts/2    S+   08:19   0:00 grep --color=auto D
> 
> [root@ceph-radosgw-lb-backup cephfs]# cat  
> /sys/kernel/debug/ceph/b6ea8682-c90d-495b-80f2-bc5bef1da9d1.client270186/osdc 
> REQUESTS 25 homeless 0
> 6225 osd1 21.cd699f5e [1,2,3]/1 [1,2,3]/1 1000ab2. 0x400014 1 read
> 6229 osd1 21.ba1e03ba [1,3,2]/1 [1,3,2]/1 1008f05.0002 0x400024 1 
> write
> 6234 osd1 21.29c5e8d [1,0,2]/1 [1,0,2]/1 1008f05.0007 0x400024 1 write
> 6238 osd1 21.9fe9de73 [1,2,0]/1 [1,2,0]/1 1008f05.000b 0x400024 1 
> write
> 6240 osd1 21.a05b2ba8 [1,0,3]/1 [1,0,3]/1 1008f05.000d 0x400024 1 
> write
> 6241 osd1 21.b7c85b45 [1,2,3]/1 [1,2,3]/1 1008f05.000e 0x400024 1 
> write
> 6242 osd1 21.1bca917f [1,2,3]/1 [1,2,3]/1 1008f05.000f 0x400024 1 
> write
> 6243 osd1 21.cdc3143b [1,2,3]/1 [1,2,3]/1 1008f05.0010 0x400024 1 
> write
> 6244 osd1 21.8e65566e [1,3,2]/1 [1,3,2]/1 1008f05.0011 0x400024 1 
> write
> 6246 osd1 21.5395ea66 [1,2,3]/1 [1,2,3]/1 1008f05.0013 0x400024 1 
> write
> 6251 osd1 21.2acb9e9c [1,0,2]/1 [1,0,2]/1 1008f05.0018 0x400024 1 
> write
> 6252 osd1 21.b22077e3 [1,0,3]/1 [1,0,3]/1 1008f05.0019 0x400024 1 
> write
> 6254 osd1 21.a9a1c2b1 [1,2,3]/1 [1,2,3]/1 1008f05.001b 0x400024 1 
> write
> 6256 osd1 21.64cfc57f [1,2,3]/1 [1,2,3]/1 1008f05.001d 0x400024 1 
> write
> 6259 osd1 21.629f77ff [1,2,3]/1 [1,2,3]/1 1008f05.0020 0x400024 1 
> write
> 6263 osd1 21.dde8c63 [1,0,3]/1 [1,0,3]/1 1008f05.0024 0x400024 1 write
> 6265 osd1 21.909ba5f8 [1,0,3]/1 [1,0,3]/1 1008f05.0026 0x400024 1 
> write
> 6266 osd1 21.5496fef5 [1,3,2]/1 [1,3,2]/1 1008f05.0027 0x400024 1 
> write
> 6269 osd1 21.1f26a27b [1,2,3]/1 [1,2,3]/1 1008f05.002a 0x400024 1 
> write
> 6270 osd1 21.c9021b4e [1,0,2]/1 [1,0,2]/1 1008f05.002b 0x400024 1 
> write
> 6276 osd1 21.e6df28ed [1,2,0]/1 [1,2,0]/1 1008f05.0031 0x400024 1 
> write
> 6286 osd1 21.1a38cab8 [1,0,3]/1 [1,0,3]/1 1008f05.003b 0x400024 1 
> write
> 6287 osd1 21.425da4ba [1,3,2]/1 [1,3,2]/1 1008f05.003c 0x400024 1 
> write
> 6291 osd1 21.bb5fa23f [1,2,3]/1 [1,2,3]/1 1008f05.0040 0x400024 1 
> write
> 6298 osd1 21.b9255c8e [1,0,2]/1 [1,0,2]/1 1008f05.0047 0x400024 1 
> write
> LINGER REQUESTS
> 
 
It seems the kernel was unable to write data to osd1. Please try shutting down osd1.
 
 
 
 
> 
> [root@ceph-radosgw-lb-backup cephfs]# cat /proc/2997/stack
> [] io_schedule+0x16/0x40
> [] __lock_page+0x119/0x170
> [] truncate_inode_pages_range+0x421/0x790
> [] truncate_pagecache+0x47/0x60
> [] __ceph_do_pending_vmtruncate+0xc2/0x1c0 [ceph]
> [] ceph_vmtruncate_work+0x1b/0x40 [ceph]
> [] process_one_work+0x149/0x360
> [] worker_thread+0x4d/0x3c0
> [] kthread+0x109/0x140
> [] ret_from_fork+0x25/0x30
> [] 0x
> 
> [root@ceph-radosgw-lb-backup cephfs]# cat /proc/3996/stack
> [] __ceph_do_pending_vmtruncate+0x44/0x1c0 [ceph]
> [] __ceph_setattr+0x79a/0x8b0 [ceph]
> [] ceph_setattr+0x3c/0x60 [ceph]
> [] notify_change+0x266/0x440
> [] do_truncate+0x75/0xc0
> [] path_openat+0xaba/0x13b0
> [] do_filp_open+0x91/0x100
> [] do_sys_open+0x124/0x210
> [] SyS_open+0x1e/0x20
> [] do_syscall_64+0x67/0x150
> [] entry_SYSCALL64_slow_path+0x25/0x25
> [] 0x
> 
> [root@ceph-radosgw-lb-backup cephfs]# cat /proc/4479/stack
> [] __ceph_do_pending_vmtruncate+0x44/0x1c0 [ceph]
> [] try_get_cap_refs+0xb5/0x5a0 [ceph]
> [] ceph_get_caps+0x119/0x390 [ceph]
> [] ceph_read_iter+0xc5/0x820 [ceph]
> [] __vfs_read+0xdf/0x130
> [] vfs_read+

Re: [ceph-users] cephfs, kernel(4.12.8) client version hung(D status), ceph version 0.94.9

2017-08-25 Thread Yan, Zheng
/0xc0
> [] path_openat+0xaba/0x13b0
> [] do_filp_open+0x91/0x100
> [] do_sys_open+0x124/0x210
> [] SyS_open+0x1e/0x20
> [] do_syscall_64+0x67/0x150
> [] entry_SYSCALL64_slow_path+0x25/0x25
> [] 0x
> 
> [root@ceph-radosgw-lb-backup cephfs]# cat /proc/4479/stack
> [] __ceph_do_pending_vmtruncate+0x44/0x1c0 [ceph]
> [] try_get_cap_refs+0xb5/0x5a0 [ceph]
> [] ceph_get_caps+0x119/0x390 [ceph]
> [] ceph_read_iter+0xc5/0x820 [ceph]
> [] __vfs_read+0xdf/0x130
> [] vfs_read+0x8c/0x130
> [] SyS_read+0x55/0xc0
> [] do_syscall_64+0x67/0x150
> [] entry_SYSCALL64_slow_path+0x25/0x25
> [] 0x
> 
> 
> 
> donglifec...@gmail.com
>  
> From: donglifec...@gmail.com
> Date: 2017-08-25 16:14
> To: zyan
> CC: ceph-users
> Subject: Re: Re: [ceph-users]cephfs, kernel(4.12.8) client version hung(D 
> status), ceph version 0.94.9
> ZhengYan,
> 
> I will test this problem again. 
> 
> Thanks a lot.
> 
> 
> donglifec...@gmail.com
>  
> From: Yan, Zheng
> Date: 2017-08-25 16:12
> To: donglifecomm
> CC: ceph-users
> Subject: Re: [ceph-users]cephfs, kernel(4.12.8) client version hung(D 
> status), ceph version 0.94.9
>  
>  
> > On 24 Aug 2017, at 17:40, donglifec...@gmail.com wrote:
> >
> > ZhengYan,
> >
> > I ran into a problem. The steps to reproduce it are outlined below:
> >
> > 1.  create 30G file test823
> > 2.  host1 client(kernel 4.12.8)
> >   cat /mnt/cephfs/a/test823 > /mnt/cephfs/a/test823-backup
> >   ls -al /mnt/cephfs/a/* 
> >
> > 3. host2 client(kernel 4.12.8)
> >   while true; do cp /home/scripts/512k.file /mnt/cephfs/a/512k.file$i ; 
> > done   //  loop copy file 
> >   cat /mnt/cephfs/a/test823-backup > /mnt/cephfs/a/newtestfile
> >   ls -al /mnt/cephfs/a/*
> >   
> > 4. The host2 client hung. The stack is:
> > [ 9462.754853] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> > this message.
> > [ 9462.756838] bash            D    0 32738  14988 0x0084
> > [ 9462.758568] Call Trace:
> > [ 9462.759945]  __schedule+0x28a/0x880
> > [ 9462.761414]  schedule+0x36/0x80
> > [ 9462.762835]  rwsem_down_write_failed+0x20d/0x380
> > [ 9462.764433]  call_rwsem_down_write_failed+0x17/0x30
> > [ 9462.766075]  ? __ceph_getxattr+0x340/0x340 [ceph]
> > [ 9462.767693]  down_write+0x2d/0x40
> > [ 9462.769175]  do_truncate+0x67/0xc0
> > [ 9462.770642]  path_openat+0xaba/0x13b0
> > [ 9462.772136]  do_filp_open+0x91/0x100
> > [ 9462.773616]  ? __check_object_size+0x159/0x190
> > [ 9462.775156]  ? __alloc_fd+0x46/0x170
> > [ 9462.776574]  do_sys_open+0x124/0x210
> > [ 9462.777972]  SyS_open+0x1e/0x20
> > [ 9462.779320]  do_syscall_64+0x67/0x150
> > [ 9462.780736]  entry_SYSCALL64_slow_path+0x25/0x25
> >
> > [root@cephtest ~]# cat /proc/29541/stack
> > [] ceph_mdsc_do_request+0x183/0x240 [ceph]
> > [] __ceph_setattr+0x3fc/0x8b0 [ceph]
> > [] ceph_setattr+0x3c/0x60 [ceph]
> > [] notify_change+0x266/0x440
> > [] do_truncate+0x75/0xc0
> > [] path_openat+0xaba/0x13b0
> > [] do_filp_open+0x91/0x100
> > [] do_sys_open+0x124/0x210
> > [] SyS_open+0x1e/0x20
> > [] do_syscall_64+0x67/0x150
> > [] entry_SYSCALL64_slow_path+0x25/0x25
> > [] 0x
> >
> > [root@cephtest ~]# cat /proc/32738/stack
> > [] call_rwsem_down_write_failed+0x17/0x30
> > [] do_truncate+0x67/0xc0
> > [] path_openat+0xaba/0x13b0
> > [] do_filp_open+0x91/0x100
> > [] do_sys_open+0x124/0x210
> > [] SyS_open+0x1e/0x20
> > [] do_syscall_64+0x67/0x150
> > [] entry_SYSCALL64_slow_path+0x25/0x25
> > [] 0x
> >
> > ceph log is:
> > f pending pAsLsXs issued pAsLsXsFcb, sent 1921.069365 seconds ago
> > 2017-08-24 17:16:00.219523 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> > client.268113 isn't responding to mclientcaps(revoke), ino 1000424 
> > pending pAsLsXs issued pAsLsXsFcb, sent 1921.063079 seconds ago
> > 2017-08-24 17:16:00.219534 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> > client.268113 isn't responding to mclientcaps(revoke), ino 1000521 
> > pending pAsLsXs issued pAsLsXsFcb, sent 1921.026983 seconds ago
> > 2017-08-24 17:16:00.219545 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> > client.268113 isn't responding to mclientcaps(revoke), ino 1000523 
> > pending pAsLsXs issued pAsLsXsFcb, sent 1920.985596 seconds ago
> > 2017-08-24 17:16:00.219574 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> > client.268113 isn't responding to mclientcaps(revoke), ino 1000528 
> > 

Re: [ceph-users] cephfs, kernel(4.12.8) client version hung(D status), ceph version 0.94.9

2017-08-25 Thread donglifec...@gmail.com
ZhengYan,

[root@ceph-radosgw-lb-backup cephfs]# ps aux | grep D
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       578  0.0  0.0 203360  3248 ?        Ssl  Aug24   0:00 /usr/sbin/gssproxy -D
root       865  0.0  0.0  82552  6104 ?        Ss   Aug24   0:00 /usr/sbin/sshd -D
root      2997  0.0  0.0      0     0 ?        D    Aug24   0:11 [kworker/2:1]
root      3996  0.0  0.0 115384   452 ?        D    Aug24   0:00 -bash
root      4479  0.0  0.0 112024   652 ?        D    Aug24   0:00 cat /mnt/cephfs/a/test-test1
root     18143  0.0  0.0 112656  2244 pts/2    S+   08:19   0:00 grep --color=auto D
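
Note that `ps aux | grep D` also matches any command line containing a capital D (gssproxy -D, sshd -D, the grep itself). A narrower filter, as a sketch: select on the STAT column only. The function name and the choice of `ps` columns are illustrative, not something used elsewhere in this thread.

```shell
# d_state_only: keep the header line plus every process whose STAT column
# (second field) starts with D, i.e. uninterruptible sleep.
# Expects output in the shape of `ps -eo pid,stat,wchan:32,cmd` on stdin.
d_state_only() {
    awk 'NR == 1 || $2 ~ /^D/'
}
```

It could be run as `ps -eo pid,stat,wchan:32,cmd | d_state_only`; the WCHAN column then gives a first hint of where each task is blocked.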

[root@ceph-radosgw-lb-backup cephfs]# cat  
/sys/kernel/debug/ceph/b6ea8682-c90d-495b-80f2-bc5bef1da9d1.client270186/osdc 
REQUESTS 25 homeless 0
6225 osd1 21.cd699f5e [1,2,3]/1 [1,2,3]/1 1000ab2. 0x400014 1 read
6229 osd1 21.ba1e03ba [1,3,2]/1 [1,3,2]/1 1008f05.0002 0x400024 1 write
6234 osd1 21.29c5e8d [1,0,2]/1 [1,0,2]/1 1008f05.0007 0x400024 1 write
6238 osd1 21.9fe9de73 [1,2,0]/1 [1,2,0]/1 1008f05.000b 0x400024 1 write
6240 osd1 21.a05b2ba8 [1,0,3]/1 [1,0,3]/1 1008f05.000d 0x400024 1 write
6241 osd1 21.b7c85b45 [1,2,3]/1 [1,2,3]/1 1008f05.000e 0x400024 1 write
6242 osd1 21.1bca917f [1,2,3]/1 [1,2,3]/1 1008f05.000f 0x400024 1 write
6243 osd1 21.cdc3143b [1,2,3]/1 [1,2,3]/1 1008f05.0010 0x400024 1 write
6244 osd1 21.8e65566e [1,3,2]/1 [1,3,2]/1 1008f05.0011 0x400024 1 write
6246 osd1 21.5395ea66 [1,2,3]/1 [1,2,3]/1 1008f05.0013 0x400024 1 write
6251 osd1 21.2acb9e9c [1,0,2]/1 [1,0,2]/1 1008f05.0018 0x400024 1 write
6252 osd1 21.b22077e3 [1,0,3]/1 [1,0,3]/1 1008f05.0019 0x400024 1 write
6254 osd1 21.a9a1c2b1 [1,2,3]/1 [1,2,3]/1 1008f05.001b 0x400024 1 write
6256 osd1 21.64cfc57f [1,2,3]/1 [1,2,3]/1 1008f05.001d 0x400024 1 write
6259 osd1 21.629f77ff [1,2,3]/1 [1,2,3]/1 1008f05.0020 0x400024 1 write
6263 osd1 21.dde8c63 [1,0,3]/1 [1,0,3]/1 1008f05.0024 0x400024 1 write
6265 osd1 21.909ba5f8 [1,0,3]/1 [1,0,3]/1 1008f05.0026 0x400024 1 write
6266 osd1 21.5496fef5 [1,3,2]/1 [1,3,2]/1 1008f05.0027 0x400024 1 write
6269 osd1 21.1f26a27b [1,2,3]/1 [1,2,3]/1 1008f05.002a 0x400024 1 write
6270 osd1 21.c9021b4e [1,0,2]/1 [1,0,2]/1 1008f05.002b 0x400024 1 write
6276 osd1 21.e6df28ed [1,2,0]/1 [1,2,0]/1 1008f05.0031 0x400024 1 write
6286 osd1 21.1a38cab8 [1,0,3]/1 [1,0,3]/1 1008f05.003b 0x400024 1 write
6287 osd1 21.425da4ba [1,3,2]/1 [1,3,2]/1 1008f05.003c 0x400024 1 write
6291 osd1 21.bb5fa23f [1,2,3]/1 [1,2,3]/1 1008f05.0040 0x400024 1 write
6298 osd1 21.b9255c8e [1,0,2]/1 [1,0,2]/1 1008f05.0047 0x400024 1 write
LINGER REQUESTS


[root@ceph-radosgw-lb-backup cephfs]# cat /proc/2997/stack
[] io_schedule+0x16/0x40
[] __lock_page+0x119/0x170
[] truncate_inode_pages_range+0x421/0x790
[] truncate_pagecache+0x47/0x60
[] __ceph_do_pending_vmtruncate+0xc2/0x1c0 [ceph]
[] ceph_vmtruncate_work+0x1b/0x40 [ceph]
[] process_one_work+0x149/0x360
[] worker_thread+0x4d/0x3c0
[] kthread+0x109/0x140
[] ret_from_fork+0x25/0x30
[] 0x

[root@ceph-radosgw-lb-backup cephfs]# cat /proc/3996/stack
[] __ceph_do_pending_vmtruncate+0x44/0x1c0 [ceph]
[] __ceph_setattr+0x79a/0x8b0 [ceph]
[] ceph_setattr+0x3c/0x60 [ceph]
[] notify_change+0x266/0x440
[] do_truncate+0x75/0xc0
[] path_openat+0xaba/0x13b0
[] do_filp_open+0x91/0x100
[] do_sys_open+0x124/0x210
[] SyS_open+0x1e/0x20
[] do_syscall_64+0x67/0x150
[] entry_SYSCALL64_slow_path+0x25/0x25
[] 0x

[root@ceph-radosgw-lb-backup cephfs]# cat /proc/4479/stack
[] __ceph_do_pending_vmtruncate+0x44/0x1c0 [ceph]
[] try_get_cap_refs+0xb5/0x5a0 [ceph]
[] ceph_get_caps+0x119/0x390 [ceph]
[] ceph_read_iter+0xc5/0x820 [ceph]
[] __vfs_read+0xdf/0x130
[] vfs_read+0x8c/0x130
[] SyS_read+0x55/0xc0
[] do_syscall_64+0x67/0x150
[] entry_SYSCALL64_slow_path+0x25/0x25
[] 0x
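
The stacks above were collected one PID at a time. A small helper along these lines can gather them in one pass. It is only a sketch: the function name is made up, and the proc root is a parameter purely so the function can be tried against a scratch directory.

```shell
# dump_stacks: print the kernel stack of each given PID.
# $1 is the proc root (normally /proc); the remaining arguments are PIDs.
# Reading /proc/<pid>/stack normally requires root.
dump_stacks() {
    proc_root=$1
    shift
    for pid in "$@"; do
        echo "== pid $pid =="
        # Fall back to a marker if the task is gone or the file is unreadable.
        cat "$proc_root/$pid/stack" 2>/dev/null || echo "(stack not readable)"
    done
}
```

For the processes listed here that would be `dump_stacks /proc 2997 3996 4479`.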





donglifec...@gmail.com
 
From: donglifec...@gmail.com
Date: 2017-08-25 16:14
To: zyan
CC: ceph-users
Subject: Re: Re: [ceph-users]cephfs, kernel(4.12.8) client version hung(D 
status), ceph version 0.94.9
ZhengYan,

I will test this problem again. 

Thanks a lot.




donglifec...@gmail.com
 
From: Yan, Zheng
Date: 2017-08-25 16:12
To: donglifecomm
CC: ceph-users
Subject: Re: [ceph-users]cephfs, kernel(4.12.8) client version hung(D status), 
ceph version 0.94.9
 
 
> On 24 Aug 2017, at 17:40, donglifec...@gmail.com wrote:
> 
> ZhengYan,
> 
> I ran into a problem. The steps to reproduce it are outlined below:
> 
> 1.  create 30G file test823
> 2.  host1 client(kernel 4.12.8)
>   cat /mnt/cephfs/a/test823 > /mnt/cephfs/a/test823-backup
>   ls -al /mnt/cephfs/a/* 
> 
> 3. host2 client(kernel 4.12.8)
>   while true; do cp /home/scripts/512k.file /mnt/cephfs/a/512k.file$i ; 
> done   //  loop copy file 
>   cat /mnt/cep

Re: [ceph-users] cephfs, kernel(4.12.8) client version hung(D status), ceph version 0.94.9

2017-08-25 Thread donglifec...@gmail.com
ZhengYan,

I will test this problem again. 

Thanks a lot.




donglifec...@gmail.com
 
From: Yan, Zheng
Date: 2017-08-25 16:12
To: donglifecomm
CC: ceph-users
Subject: Re: [ceph-users]cephfs, kernel(4.12.8) client version hung(D status), 
ceph version 0.94.9
 
 
> On 24 Aug 2017, at 17:40, donglifec...@gmail.com wrote:
> 
> ZhengYan,
> 
> I ran into a problem. The steps to reproduce it are outlined below:
> 
> 1.  create 30G file test823
> 2.  host1 client(kernel 4.12.8)
>   cat /mnt/cephfs/a/test823 > /mnt/cephfs/a/test823-backup
>   ls -al /mnt/cephfs/a/* 
> 
> 3. host2 client(kernel 4.12.8)
>   while true; do cp /home/scripts/512k.file /mnt/cephfs/a/512k.file$i ; 
> done   //  loop copy file 
>   cat /mnt/cephfs/a/test823-backup > /mnt/cephfs/a/newtestfile
>   ls -al /mnt/cephfs/a/*
>   
> 4. The host2 client hung. The stack is:
> [ 9462.754853] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [ 9462.756838] bash            D    0 32738  14988 0x0084
> [ 9462.758568] Call Trace:
> [ 9462.759945]  __schedule+0x28a/0x880
> [ 9462.761414]  schedule+0x36/0x80
> [ 9462.762835]  rwsem_down_write_failed+0x20d/0x380
> [ 9462.764433]  call_rwsem_down_write_failed+0x17/0x30
> [ 9462.766075]  ? __ceph_getxattr+0x340/0x340 [ceph]
> [ 9462.767693]  down_write+0x2d/0x40
> [ 9462.769175]  do_truncate+0x67/0xc0
> [ 9462.770642]  path_openat+0xaba/0x13b0
> [ 9462.772136]  do_filp_open+0x91/0x100
> [ 9462.773616]  ? __check_object_size+0x159/0x190
> [ 9462.775156]  ? __alloc_fd+0x46/0x170
> [ 9462.776574]  do_sys_open+0x124/0x210
> [ 9462.777972]  SyS_open+0x1e/0x20
> [ 9462.779320]  do_syscall_64+0x67/0x150
> [ 9462.780736]  entry_SYSCALL64_slow_path+0x25/0x25
> 
> [root@cephtest ~]# cat /proc/29541/stack
> [] ceph_mdsc_do_request+0x183/0x240 [ceph]
> [] __ceph_setattr+0x3fc/0x8b0 [ceph]
> [] ceph_setattr+0x3c/0x60 [ceph]
> [] notify_change+0x266/0x440
> [] do_truncate+0x75/0xc0
> [] path_openat+0xaba/0x13b0
> [] do_filp_open+0x91/0x100
> [] do_sys_open+0x124/0x210
> [] SyS_open+0x1e/0x20
> [] do_syscall_64+0x67/0x150
> [] entry_SYSCALL64_slow_path+0x25/0x25
> [] 0x
> 
> [root@cephtest ~]# cat /proc/32738/stack
> [] call_rwsem_down_write_failed+0x17/0x30
> [] do_truncate+0x67/0xc0
> [] path_openat+0xaba/0x13b0
> [] do_filp_open+0x91/0x100
> [] do_sys_open+0x124/0x210
> [] SyS_open+0x1e/0x20
> [] do_syscall_64+0x67/0x150
> [] entry_SYSCALL64_slow_path+0x25/0x25
> [] 0x
> 
> ceph log is:
> f pending pAsLsXs issued pAsLsXsFcb, sent 1921.069365 seconds ago
> 2017-08-24 17:16:00.219523 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> client.268113 isn't responding to mclientcaps(revoke), ino 1000424 
> pending pAsLsXs issued pAsLsXsFcb, sent 1921.063079 seconds ago
> 2017-08-24 17:16:00.219534 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> client.268113 isn't responding to mclientcaps(revoke), ino 1000521 
> pending pAsLsXs issued pAsLsXsFcb, sent 1921.026983 seconds ago
> 2017-08-24 17:16:00.219545 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> client.268113 isn't responding to mclientcaps(revoke), ino 1000523 
> pending pAsLsXs issued pAsLsXsFcb, sent 1920.985596 seconds ago
> 2017-08-24 17:16:00.219574 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> client.268113 isn't responding to mclientcaps(revoke), ino 1000528 
> pending pAsLsXs issued pAsLsXsFcb, sent 1920.866863 seconds ago
> 2017-08-24 17:16:00.219592 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> client.268113 isn't responding to mclientcaps(revoke), ino 100052a 
> pending pAsLsXs issued pAsLsXsFcb, sent 1920.788282 seconds ago
> 2017-08-24 17:16:00.219606 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> client.268113 isn't responding to mclientcaps(revoke), ino 100052c 
> pending pAsLsXs issued pAsLsXsFcb, sent 1920.712564 seconds ago
> 2017-08-24 17:16:00.219618 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> client.268113 isn't responding to mclientcaps(revoke), ino 100052f 
> pending pAsLsXs issued pAsLsXsFcb, sent 1920.563784 seconds ago
> 2017-08-24 17:16:00.219630 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> client.268113 isn't responding to mclientcaps(revoke), ino 100040b 
> pending pAsLsXsFsc issued pAsLsXsFscb, sent 1920.506752 seconds ago
> 2017-08-24 17:16:00.219741 7f746db8f700  0 log_channel(cluster) log [WRN] : 4 
> slow requests, 1 included below; oldest blocked for > 1941.487238 secs
> 2017-08-24 17:16:00.219753 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> slow request 1920.507384 seconds old, received at 2017-08-24 16:43:59.712319: 
> client_request(c

Re: [ceph-users] cephfs, kernel(4.12.8) client version hung(D status), ceph version 0.94.9

2017-08-25 Thread Yan, Zheng


> On 24 Aug 2017, at 17:40, donglifec...@gmail.com wrote:
> 
> ZhengYan,
> 
> I ran into a problem. The steps to reproduce it are outlined below:
> 
> 1.  create 30G file test823
> 2.  host1 client(kernel 4.12.8)
>   cat /mnt/cephfs/a/test823 > /mnt/cephfs/a/test823-backup
>   ls -al /mnt/cephfs/a/* 
> 
> 3. host2 client(kernel 4.12.8)
>   while true; do cp /home/scripts/512k.file /mnt/cephfs/a/512k.file$i ; 
> done   //  loop copy file 
>   cat /mnt/cephfs/a/test823-backup > /mnt/cephfs/a/newtestfile
>   ls -al /mnt/cephfs/a/*
>   
> 4. The host2 client hung. The stack is:
> [ 9462.754853] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [ 9462.756838] bash            D    0 32738  14988 0x0084
> [ 9462.758568] Call Trace:
> [ 9462.759945]  __schedule+0x28a/0x880
> [ 9462.761414]  schedule+0x36/0x80
> [ 9462.762835]  rwsem_down_write_failed+0x20d/0x380
> [ 9462.764433]  call_rwsem_down_write_failed+0x17/0x30
> [ 9462.766075]  ? __ceph_getxattr+0x340/0x340 [ceph]
> [ 9462.767693]  down_write+0x2d/0x40
> [ 9462.769175]  do_truncate+0x67/0xc0
> [ 9462.770642]  path_openat+0xaba/0x13b0
> [ 9462.772136]  do_filp_open+0x91/0x100
> [ 9462.773616]  ? __check_object_size+0x159/0x190
> [ 9462.775156]  ? __alloc_fd+0x46/0x170
> [ 9462.776574]  do_sys_open+0x124/0x210
> [ 9462.777972]  SyS_open+0x1e/0x20
> [ 9462.779320]  do_syscall_64+0x67/0x150
> [ 9462.780736]  entry_SYSCALL64_slow_path+0x25/0x25
> 
> [root@cephtest ~]# cat /proc/29541/stack
> [] ceph_mdsc_do_request+0x183/0x240 [ceph]
> [] __ceph_setattr+0x3fc/0x8b0 [ceph]
> [] ceph_setattr+0x3c/0x60 [ceph]
> [] notify_change+0x266/0x440
> [] do_truncate+0x75/0xc0
> [] path_openat+0xaba/0x13b0
> [] do_filp_open+0x91/0x100
> [] do_sys_open+0x124/0x210
> [] SyS_open+0x1e/0x20
> [] do_syscall_64+0x67/0x150
> [] entry_SYSCALL64_slow_path+0x25/0x25
> [] 0x
> 
> [root@cephtest ~]# cat /proc/32738/stack
> [] call_rwsem_down_write_failed+0x17/0x30
> [] do_truncate+0x67/0xc0
> [] path_openat+0xaba/0x13b0
> [] do_filp_open+0x91/0x100
> [] do_sys_open+0x124/0x210
> [] SyS_open+0x1e/0x20
> [] do_syscall_64+0x67/0x150
> [] entry_SYSCALL64_slow_path+0x25/0x25
> [] 0x
> 
> ceph log is:
> f pending pAsLsXs issued pAsLsXsFcb, sent 1921.069365 seconds ago
> 2017-08-24 17:16:00.219523 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> client.268113 isn't responding to mclientcaps(revoke), ino 1000424 
> pending pAsLsXs issued pAsLsXsFcb, sent 1921.063079 seconds ago
> 2017-08-24 17:16:00.219534 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> client.268113 isn't responding to mclientcaps(revoke), ino 1000521 
> pending pAsLsXs issued pAsLsXsFcb, sent 1921.026983 seconds ago
> 2017-08-24 17:16:00.219545 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> client.268113 isn't responding to mclientcaps(revoke), ino 1000523 
> pending pAsLsXs issued pAsLsXsFcb, sent 1920.985596 seconds ago
> 2017-08-24 17:16:00.219574 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> client.268113 isn't responding to mclientcaps(revoke), ino 1000528 
> pending pAsLsXs issued pAsLsXsFcb, sent 1920.866863 seconds ago
> 2017-08-24 17:16:00.219592 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> client.268113 isn't responding to mclientcaps(revoke), ino 100052a 
> pending pAsLsXs issued pAsLsXsFcb, sent 1920.788282 seconds ago
> 2017-08-24 17:16:00.219606 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> client.268113 isn't responding to mclientcaps(revoke), ino 100052c 
> pending pAsLsXs issued pAsLsXsFcb, sent 1920.712564 seconds ago
> 2017-08-24 17:16:00.219618 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> client.268113 isn't responding to mclientcaps(revoke), ino 100052f 
> pending pAsLsXs issued pAsLsXsFcb, sent 1920.563784 seconds ago
> 2017-08-24 17:16:00.219630 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> client.268113 isn't responding to mclientcaps(revoke), ino 100040b 
> pending pAsLsXsFsc issued pAsLsXsFscb, sent 1920.506752 seconds ago
> 2017-08-24 17:16:00.219741 7f746db8f700  0 log_channel(cluster) log [WRN] : 4 
> slow requests, 1 included below; oldest blocked for > 1941.487238 secs
> 2017-08-24 17:16:00.219753 7f746db8f700  0 log_channel(cluster) log [WRN] : 
> slow request 1920.507384 seconds old, received at 2017-08-24 16:43:59.712319: 
> client_request(client.268101:1122217 getattr pAsLsXsFs #100040b 
> 2017-08-24 16:44:00.463827) currently failed to rdlock, waiting


Please check whether there are hung requests in /sys/kernel/debug/ceph/*/osdc.
It's likely that the kernel was unable to flush dirty pages.
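
As a sketch of what to look for: each in-flight request in the osdc file is one line whose second field is the target OSD. The helper below counts them per OSD and operation. It assumes the operation name (read/write) is the last field on the line, as in the osdc output posted elsewhere in this thread; the function name is made up.

```shell
# osdc_summary: read the contents of an osdc debugfs file on stdin and print
# "<osd> <op> <count>" for the in-flight requests it lists.
osdc_summary() {
    awk '
        # Request lines have "osdN" as their second field; the
        # "REQUESTS ..." and "LINGER REQUESTS" headers do not.
        $2 ~ /^osd[0-9]+$/ { n[$2 " " $NF]++ }
        END { for (k in n) print k, n[k] }' | sort
}
```

Run it as root with debugfs mounted, for example `cat /sys/kernel/debug/ceph/*/osdc | osdc_summary`. If the same requests are still listed a few seconds later and they all point at one OSD, that OSD is the likely culprit.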

Regards
Yan, Zheng



> 
> Thanks a lot.
> 
>  
> 
> 
> 
> donglifec...@gmail.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cephfs, kernel(4.12.8) client version hung(D status), ceph version 0.94.9

2017-08-24 Thread donglifec...@gmail.com
ZhengYan,

I ran into a problem. The steps to reproduce it are outlined below:

1.  create 30G file test823
2.  host1 client(kernel 4.12.8)
  cat /mnt/cephfs/a/test823 > /mnt/cephfs/a/test823-backup
  ls -al /mnt/cephfs/a/* 

3. host2 client(kernel 4.12.8)
  while true; do cp /home/scripts/512k.file /mnt/cephfs/a/512k.file$i ; 
done // loop copy file 
  cat /mnt/cephfs/a/test823-backup > /mnt/cephfs/a/newtestfile
  ls -al /mnt/cephfs/a/*
  
4. The host2 client hung. The stack is:
[ 9462.754853] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9462.756838] bash            D    0 32738  14988 0x0084
[ 9462.758568] Call Trace:
[ 9462.759945]  __schedule+0x28a/0x880
[ 9462.761414]  schedule+0x36/0x80
[ 9462.762835]  rwsem_down_write_failed+0x20d/0x380
[ 9462.764433]  call_rwsem_down_write_failed+0x17/0x30
[ 9462.766075]  ? __ceph_getxattr+0x340/0x340 [ceph]
[ 9462.767693]  down_write+0x2d/0x40
[ 9462.769175]  do_truncate+0x67/0xc0
[ 9462.770642]  path_openat+0xaba/0x13b0
[ 9462.772136]  do_filp_open+0x91/0x100
[ 9462.773616]  ? __check_object_size+0x159/0x190
[ 9462.775156]  ? __alloc_fd+0x46/0x170
[ 9462.776574]  do_sys_open+0x124/0x210
[ 9462.777972]  SyS_open+0x1e/0x20
[ 9462.779320]  do_syscall_64+0x67/0x150
[ 9462.780736]  entry_SYSCALL64_slow_path+0x25/0x25

[root@cephtest ~]# cat /proc/29541/stack
[] ceph_mdsc_do_request+0x183/0x240 [ceph]
[] __ceph_setattr+0x3fc/0x8b0 [ceph]
[] ceph_setattr+0x3c/0x60 [ceph]
[] notify_change+0x266/0x440
[] do_truncate+0x75/0xc0
[] path_openat+0xaba/0x13b0
[] do_filp_open+0x91/0x100
[] do_sys_open+0x124/0x210
[] SyS_open+0x1e/0x20
[] do_syscall_64+0x67/0x150
[] entry_SYSCALL64_slow_path+0x25/0x25
[] 0x

[root@cephtest ~]# cat /proc/32738/stack
[] call_rwsem_down_write_failed+0x17/0x30
[] do_truncate+0x67/0xc0
[] path_openat+0xaba/0x13b0
[] do_filp_open+0x91/0x100
[] do_sys_open+0x124/0x210
[] SyS_open+0x1e/0x20
[] do_syscall_64+0x67/0x150
[] entry_SYSCALL64_slow_path+0x25/0x25
[] 0x

ceph log is:
f pending pAsLsXs issued pAsLsXsFcb, sent 1921.069365 seconds ago
2017-08-24 17:16:00.219523 7f746db8f700  0 log_channel(cluster) log [WRN] : 
client.268113 isn't responding to mclientcaps(revoke), ino 1000424 pending 
pAsLsXs issued pAsLsXsFcb, sent 1921.063079 seconds ago
2017-08-24 17:16:00.219534 7f746db8f700  0 log_channel(cluster) log [WRN] : 
client.268113 isn't responding to mclientcaps(revoke), ino 1000521 pending 
pAsLsXs issued pAsLsXsFcb, sent 1921.026983 seconds ago
2017-08-24 17:16:00.219545 7f746db8f700  0 log_channel(cluster) log [WRN] : 
client.268113 isn't responding to mclientcaps(revoke), ino 1000523 pending 
pAsLsXs issued pAsLsXsFcb, sent 1920.985596 seconds ago
2017-08-24 17:16:00.219574 7f746db8f700  0 log_channel(cluster) log [WRN] : 
client.268113 isn't responding to mclientcaps(revoke), ino 1000528 pending 
pAsLsXs issued pAsLsXsFcb, sent 1920.866863 seconds ago
2017-08-24 17:16:00.219592 7f746db8f700  0 log_channel(cluster) log [WRN] : 
client.268113 isn't responding to mclientcaps(revoke), ino 100052a pending 
pAsLsXs issued pAsLsXsFcb, sent 1920.788282 seconds ago
2017-08-24 17:16:00.219606 7f746db8f700  0 log_channel(cluster) log [WRN] : 
client.268113 isn't responding to mclientcaps(revoke), ino 100052c pending 
pAsLsXs issued pAsLsXsFcb, sent 1920.712564 seconds ago
2017-08-24 17:16:00.219618 7f746db8f700  0 log_channel(cluster) log [WRN] : 
client.268113 isn't responding to mclientcaps(revoke), ino 100052f pending 
pAsLsXs issued pAsLsXsFcb, sent 1920.563784 seconds ago
2017-08-24 17:16:00.219630 7f746db8f700  0 log_channel(cluster) log [WRN] : 
client.268113 isn't responding to mclientcaps(revoke), ino 100040b pending 
pAsLsXsFsc issued pAsLsXsFscb, sent 1920.506752 seconds ago
2017-08-24 17:16:00.219741 7f746db8f700  0 log_channel(cluster) log [WRN] : 4 
slow requests, 1 included below; oldest blocked for > 1941.487238 secs
2017-08-24 17:16:00.219753 7f746db8f700  0 log_channel(cluster) log [WRN] : 
slow request 1920.507384 seconds old, received at 2017-08-24 16:43:59.712319: 
client_request(client.268101:1122217 getattr pAsLsXsFs #100040b 2017-08-24 
16:44:00.463827) currently failed to rdlock, waiting
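
To see at a glance which client sessions the MDS is complaining about, the warnings above can be tallied per client. This is only a sketch: the function name is made up, it matches the exact warning wording shown in this log, and it relies on `grep -o`, which GNU and BSD grep provide.

```shell
# unresponsive_clients: read MDS log text on stdin and print
# "<client> <count>" for each client named in an
# "isn't responding to mclientcaps(revoke)" warning.
unresponsive_clients() {
    grep -oE "client\.[0-9]+ isn't responding to mclientcaps\(revoke\)" |
        awk '{ n[$1]++ } END { for (c in n) print c, n[c] }' | sort
}
```

Feed it the MDS log on stdin, e.g. `unresponsive_clients < mds.log`, where `mds.log` stands in for wherever your MDS log lives.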

Thanks a lot.

 





donglifec...@gmail.com