The appearance of these socket closed messages seems to coincide with
the slowdown symptoms. What is the cause?
2015-04-23T14:08:47.111838+00:00 i-65062482 kernel: [ 4229.485489] libceph: osd1 192.168.160.4:6800 socket closed (con state OPEN)
2015-04-23T14:09:06.961823+00:00 i-65062482 kernel: [ 4249.332547] libceph: osd2 192.168.96.4:6800 socket closed (con state OPEN)
2015-04-23T14:09:09.701819+00:00 i-65062482 kernel: [ 4252.070594] libceph: osd4 192.168.64.4:6800 socket closed (con state OPEN)
2015-04-23T14:09:10.381817+00:00 i-65062482 kernel: [ 4252.755400] libceph: osd5 192.168.128.4:6800 socket closed (con state OPEN)
2015-04-23T14:09:14.831817+00:00 i-65062482 kernel: [ 4257.200257] libceph: osd5 192.168.128.4:6800 socket closed (con state OPEN)
2015-04-23T14:13:57.061877+00:00 i-65062482 kernel: [ 4539.431624] libceph: osd4 192.168.64.4:6800 socket closed (con state OPEN)
2015-04-23T14:13:57.541842+00:00 i-65062482 kernel: [ 4539.913284] libceph: osd5 192.168.128.4:6800 socket closed (con state OPEN)
2015-04-23T14:13:59.801822+00:00 i-65062482 kernel: [ 4542.177187] libceph: osd3 192.168.0.4:6800 socket closed (con state OPEN)
2015-04-23T14:14:11.361819+00:00 i-65062482 kernel: [ 4553.733566] libceph: osd4 192.168.64.4:6800 socket closed (con state OPEN)
2015-04-23T14:14:47.871829+00:00 i-65062482 kernel: [ 4590.242136] libceph: osd5 192.168.128.4:6800 socket closed (con state OPEN)
2015-04-23T14:14:47.991826+00:00 i-65062482 kernel: [ 4590.364078] libceph: osd2 192.168.96.4:6800 socket closed (con state OPEN)
2015-04-23T14:15:00.081817+00:00 i-65062482 kernel: [ 4602.452980] libceph: osd5 192.168.128.4:6800 socket closed (con state OPEN)
2015-04-23T14:16:21.301820+00:00 i-65062482 kernel: [ 4683.671614] libceph: osd5 192.168.128.4:6800 socket closed (con state OPEN)
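One way to see whether the messages cluster on a particular OSD is to count them per OSD from the syslog. A minimal sketch (the function name is mine, and the log path in the usage comment is an assumption; adjust to wherever your kernel messages land):

```shell
# Count libceph "socket closed" events per OSD from syslog lines on stdin.
count_socket_closed() {
  awk '/libceph/ && /socket closed/ {
    # Find the "osdN" field on the line and tally it.
    for (i = 1; i <= NF; i++)
      if ($i ~ /^osd[0-9]+$/) count[$i]++
  }
  END { for (o in count) print o, count[o] }'
}
# Usage (log path is an assumption): count_socket_closed < /var/log/syslog
```

Comparing the per-minute distribution of these counts against the times when writes stall would show whether the correlation holds.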
Jeff
On 04/23/2015 12:26 AM, Jeff Epstein wrote:
Do you have some idea how I can diagnose this problem?
I'd look at ceph -s output while you have these stuck processes, to see
if there's any unusual activity (scrub/deep
scrub/recovery/backfills/...). Is it correlated in any way with rbd
removal (i.e., write blocking doesn't appear unless you removed at least
one rbd in, say, the hour before the write performance problems)?
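To capture that correlation, it may help to log timestamped cluster status snapshots while reproducing the stall. A minimal sketch (the function name, default log path, and 30-second interval are my own choices, not anything from the thread):

```shell
# Append one timestamped status snapshot to a log file.
# Defaults ("ceph -s", /tmp/ceph-status.log) are assumptions; override as needed.
snapshot_once() {
  cmd=${1:-'ceph -s'} log=${2:-/tmp/ceph-status.log}
  { date -u +'%Y-%m-%dT%H:%M:%SZ'; sh -c "$cmd"; } >> "$log" 2>&1
}
# During a stall window: while true; do snapshot_once; sleep 30; done
```

Lining the snapshot timestamps up against the libceph messages and the stuck-process appearances should show whether scrub/recovery activity coincides with the slowdowns.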
I'm not familiar with Amazon VMs. If you map the rbds to local block
devices using the kernel driver, do you have control over the kernel
you run? (I've seen reports of various problems with older kernels,
and you probably want the latest possible.)
ceph status shows nothing unusual. However, on the problematic node,
we typically see entries in ps like this:
1468 12329 root D 0.0 mkfs.ext4 wait_on_page_bit
1468 12332 root D 0.0 mkfs.ext4 wait_on_buffer
Notice the "D" (uninterruptible sleep) state. Here, mkfs is blocked in
wait functions for long periods of time. (Also, we format the RBDs as
ext4 even though the OSDs use XFS; I assume this shouldn't be a
problem?)
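A quick way to spot all such blocked tasks, not just mkfs, is to filter ps output on the D state and show the kernel wait channel. A minimal sketch (the function name is mine; the format specifiers are standard procps ones):

```shell
# List uninterruptible-sleep (D-state) tasks and the kernel function
# they are currently waiting in (wchan).
list_d_state() {
  ps -eo pid,stat,wchan:32,comm | awk 'NR > 1 && $2 ~ /^D/'
}
```

Repeated samples showing the same tasks stuck in wait_on_page_bit/wait_on_buffer would point at writeback to the rbd device stalling, rather than mkfs itself misbehaving.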
We're on kernel 3.18.4pl2, which is pretty recent. Still, an outdated
kernel driver isn't out of the question; if anyone has any concrete
information, I'd be grateful.
Jeff
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com