Hi,

We found a problem while running traffic tests. We created a replicated
volume across two storage nodes (CentOS 6.5), with a single FUSE client
(CentOS 6.7) doing multi-threaded reads and writes; most of the I/O was
reads of big files. All machines had 10GbE NICs, and the typical read
throughput was 4-6Gbps (0.5-1.5GB/s).
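
For reference, the workload was similar in spirit to the sketch below.
This is not our actual test program; the mount point /mnt/gvol, the
file names, and the thread counts are placeholders:

import os, threading

MOUNT = "/mnt/gvol"        # placeholder mount point
NREADERS = 7               # placeholder: mostly readers, one writer
CHUNK = 1 << 20            # 1 MiB per I/O

def reader(path):
    # Loop sequential 1 MiB reads over a pre-created big file.
    while True:
        with open(path, "rb") as f:
            while f.read(CHUNK):
                pass

def writer(path):
    # Keep appending 1 MiB blocks of zeros.
    with open(path, "ab") as f:
        while True:
            f.write(b"\0" * CHUNK)

threads = [threading.Thread(target=reader,
                            args=(os.path.join(MOUNT, "big%d" % i),))
           for i in range(NREADERS)]
threads.append(threading.Thread(target=writer,
                                args=(os.path.join(MOUNT, "out"),)))
for t in threads:
    t.start()
for t in threads:
    t.join()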


After the test had run for several minutes, the test program hung. The
throughput suddenly dropped to zero, and there was no traffic afterwards.
If we then ran df on the client, df hung too. However, we could still
read from and write to the volume from other clients.
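
To confirm the mount itself was wedged rather than just our test
program, a check like the one below can be used (only a sketch; the
mount point /mnt/gvol is again a placeholder). It issues the same
statfs(2) call that df makes, but from a child process so the parent
can time out instead of hanging:

import multiprocessing, os, sys

MOUNT = "/mnt/gvol"        # placeholder mount point

def probe(path):
    # os.statvfs() ends up in the same statfs(2) syscall that df
    # makes; on a wedged FUSE mount it blocks indefinitely.
    os.statvfs(path)

if __name__ == "__main__":
    p = multiprocessing.Process(target=probe, args=(MOUNT,))
    p.start()
    p.join(timeout=10)     # give the syscall 10 seconds
    if p.is_alive():
        print("statfs hung: mount looks wedged")
        p.terminate()
        sys.exit(1)
    print("statfs returned: mount is responsive")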


We tried several GlusterFS versions from 3.7.5 to 3.8.0, and every
version had this problem. We also tried resetting the GlusterFS volume
options to their defaults, but the problem persisted.


The stacks below were captured with GlusterFS 3.7.11.


This was the kernel stack of dd while it was hung:
[ffffffffa046d211] wait_answer_interruptible+0x81/0xc0 [fuse]
[ffffffffa046d42b] __fuse_request_send+0x1db/0x2b0 [fuse]
[ffffffffa046d512] fuse_request_send+0x12/0x20 [fuse]
[ffffffffa0477d4a] fuse_statfs+0xda/0x150 [fuse]
[ffffffff811c2b64] statfs_by_dentry+0x74/0xa0
[ffffffff811c2c9b] vfs_statfs+0x1b/0xb0
[ffffffff811c2e97] user_statfs+0x47/0xb0
[ffffffff811c2f9a] sys_statfs+0x2a/0x50
[ffffffff8100b072] system_call_fastpath+0x16/0x1b
[ffffffffffffffff] 0xffffffffffffffff
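
(Stacks like these can be read from /proc/<pid>/stack as root. The
helper below is only a sketch, not part of our setup; it dumps the
kernel stack of every thread of a given PID:)

import glob, sys

def dump_kernel_stacks(pid):
    # Each thread has its own kernel stack under
    # /proc/<pid>/task/<tid>/stack (readable by root).
    for path in sorted(glob.glob("/proc/%d/task/*/stack" % pid)):
        tid = path.split("/")[4]
        print("--- thread %s ---" % tid)
        with open(path) as f:
            print(f.read())

if __name__ == "__main__":
    dump_kernel_stacks(int(sys.argv[1]))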


This was the kernel stack of the glusterfs client process:
[ffffffff810b226a] futex_wait_queue_me+0xba/0xf0
[ffffffff810b33a0] futex_wait+0x1c0/0x310
[ffffffff810b4c91] do_futex+0x121/0xae0
[ffffffff810b56cb] sys_futex+0x7b/0x170
[ffffffff8100b072] system_call_fastpath+0x16/0x1b
[ffffffffffffffff] 0xffffffffffffffff
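
This futex_wait stack only shows that the process was blocked on a
userspace lock; the kernel side says nothing about where inside
glusterfs it was waiting. Next time it reproduces we could also grab a
userspace backtrace of all threads, e.g. with a small wrapper like this
sketch (assuming gdb is installed on the client):

import subprocess, sys

# Attach gdb briefly and dump a backtrace of every thread.
# "thread apply all bt" stops the process while attached, which is
# harmless here because it is already hung.
pid = sys.argv[1]
subprocess.call(["gdb", "-p", pid, "-batch",
                 "-ex", "thread apply all bt"])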


This was the kernel stack of the test program:
[ffffffff810a3f74] hrtimer_nanosleep+0xc4/0x180
[ffffffff810a409e] sys_nanosleep+0x6e/0x80
[ffffffff8100b072] system_call_fastpath+0x16/0x1b
[ffffffffffffffff] 0xffffffffffffffff


Any clue?


Thanks,
Paul