Re: [Gluster-devel] Fuse client hangs on doing multithreading IO tests

2016-06-25 Thread len...@storswift.com
Hi,

Sorry, the client OS version is CentOS 6.6, not 6.7.

And I have found that it is not a GlusterFS problem; it is a kernel problem: 
futex_wait() may deadlock.

https://groups.google.com/forum/#!msg/mechanical-sympathy/QbmpZxp6C64/nMhNjQPTeLEJ
 

It only happens on CentOS 6.5. The bug was fixed in kernel 2.6.32-504.16.2. 
Since I upgraded the kernel, I have not seen this problem. I will continue the 
test to confirm it is solved.
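
For anyone hitting the same hang, below is a minimal sketch of checking whether 
the running kernel already carries the futex_wait fix. It assumes a 
RHEL/CentOS-style release string and takes 2.6.32-504.16.2, mentioned above, as 
the first fixed release.

import platform
import re

# First kernel release that carries the futex_wait fix, per the thread above.
FIXED = (2, 6, 32, 504, 16, 2)

def release_tuple(release):
    # Keep the leading numeric fields of a RHEL/CentOS-style release string,
    # e.g. "2.6.32-504.16.2.el6.x86_64" -> (2, 6, 32, 504, 16, 2); shorter
    # strings are padded with zeros so the tuples compare cleanly.
    nums = [int(n) for n in re.findall(r"\d+", release)[:6]]
    return tuple(nums + [0] * (6 - len(nums)))

running = platform.release()
if release_tuple(running) < FIXED:
    print("kernel %s predates the futex_wait fix, consider upgrading" % running)
else:
    print("kernel %s already carries the futex_wait fix" % running)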

Thanks,
Paul
 
From: paul
Date: 2016-06-25 01:44
To: FNU Raghavendra Manjunath
CC: gluster-devel@gluster.org; gluster-us...@gluster.org
Subject: Re: [Gluster-devel] Fuse client hangs on doing multithreading IO tests
Hi, Raghavendra,

The typical file size is 2 GB. I remember the problem also existed when I used 
a 10 MB file size.

I have attached the logs. The attachment includes the logs of one client and 
the two servers.

This time the client hung at 16:48 on June 24.  

Thanks,
Paul

 Original Message  
Sender: FNU Raghavendra Manjunath<rab...@redhat.com>
Recipient: 冷波<len...@storswift.com>
Cc: gluster-devel@gluster.org<gluster-devel@gluster.org>; 
gluster-us...@gluster.org<gluster-us...@gluster.org>
Date: Friday, Jun 24, 2016 23:18
Subject: Re: [Gluster-devel] Fuse client hangs on doing multithreading IO tests


Hi,

Do you have any idea how big the files being read were?

Can you please attach the logs from all the gluster server and client nodes? 
(the logs can be found in /var/log/glusterfs)

Also please provide the /var/log/messages from all the server and client nodes.

Regards,
Raghavendra


On Fri, Jun 24, 2016 at 10:32 AM, 冷波 <len...@storswift.com> wrote:
Hi,

We found a problem while running traffic tests. We created a replicated volume 
with two storage nodes (CentOS 6.5). There was one FUSE client (CentOS 6.7) 
which performed multithreaded reads and writes. Most of the I/O was reads of 
big files. All machines used 10GbE NICs, and the typical read throughput was 
4-6 Gbps (0.5-1.5 GB/s). 

After the test ran for several minutes, the test program hung. The throughput 
suddenly dropped to zero, and then there was no more traffic. If we ran df, it 
would hang too. But we could still read and write the volume from other clients.

We tried several GlusterFS versions, from 3.7.5 to 3.8.0, and each had this 
problem. We also tried restoring the default GlusterFS options, but the 
problem persisted.

The GlusterFS version was 3.7.11 for the following stacks.

This was the stack of dd when hanging:
[] wait_answer_interruptible+0x81/0xc0 [fuse]
[] __fuse_request_send+0x1db/0x2b0 [fuse]
[] fuse_request_send+0x12/0x20 [fuse]
[] fuse_statfs+0xda/0x150 [fuse]
[] statfs_by_dentry+0x74/0xa0
[] vfs_statfs+0x1b/0xb0
[] user_statfs+0x47/0xb0
[] sys_statfs+0x2a/0x50
[] system_call_fastpath+0x16/0x1b
[] 0x

This was the stack of gluster:
[] futex_wait_queue_me+0xba/0xf0
[] futex_wait+0x1c0/0x310
[] do_futex+0x121/0xae0
[] sys_futex+0x7b/0x170
[] system_call_fastpath+0x16/0x1b
[] 0x

This was the stack of the test program:
[] hrtimer_nanosleep+0xc4/0x180
[] sys_nanosleep+0x6e/0x80
[] system_call_fastpath+0x16/0x1b
[] 0x

Any clue?

Thanks,
Paul

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Fuse client hangs on doing multithreading IO tests

2016-06-24 Thread FNU Raghavendra Manjunath
Hi,

Do you have any idea how big the files being read were?

Can you please attach the logs from all the gluster server and client
nodes? (the logs can be found in /var/log/glusterfs)

Also please provide the /var/log/messages from all the server and client
nodes.
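
If it helps, here is a rough sketch of bundling those logs on each node. The 
paths follow the defaults mentioned above; the output file name is only an 
example.

import os
import socket
import tarfile

# Bundle the GlusterFS logs and /var/log/messages on this node into a single
# archive that can be attached to the thread.
paths = ["/var/log/glusterfs", "/var/log/messages"]
out = "gluster-logs-%s.tar.gz" % socket.gethostname()

with tarfile.open(out, "w:gz") as tar:
    for path in paths:
        if os.path.exists(path):
            # arcname keeps the archive layout short, e.g. "glusterfs/..."
            tar.add(path, arcname=os.path.basename(path))

print("wrote %s" % out)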

Regards,
Raghavendra


On Fri, Jun 24, 2016 at 10:32 AM, 冷波  wrote:

> Hi,
>
>
> We found a problem while running traffic tests. We created a replicated
> volume with two storage nodes (CentOS 6.5). There was one FUSE client
> (CentOS 6.7) which performed multithreaded reads and writes. Most of the
> I/O was reads of big files. All machines used 10GbE NICs, and the typical
> read throughput was 4-6 Gbps (0.5-1.5 GB/s).
>
>
> After the test ran for several minutes, the test program hung. The throughput
> suddenly dropped to zero, and then there was no more traffic. If we ran df,
> it would hang too. But we could still read and write the volume from other
> clients.
>
>
> We tried several GlusterFS versions, from 3.7.5 to 3.8.0, and each had
> this problem. We also tried restoring the default GlusterFS options, but
> the problem persisted.
>
>
> The GlusterFS version was 3.7.11 for the following stacks.
>
>
> This was the stack of dd when hanging:
>
> [] wait_answer_interruptible+0x81/0xc0 [fuse]
>
> [] __fuse_request_send+0x1db/0x2b0 [fuse]
>
> [] fuse_request_send+0x12/0x20 [fuse]
>
> [] fuse_statfs+0xda/0x150 [fuse]
>
> [] statfs_by_dentry+0x74/0xa0
>
> [] vfs_statfs+0x1b/0xb0
>
> [] user_statfs+0x47/0xb0
>
> [] sys_statfs+0x2a/0x50
>
> [] system_call_fastpath+0x16/0x1b
>
> [] 0x
>
>
> This was the stack of gluster:
>
> [] futex_wait_queue_me+0xba/0xf0
>
> [] futex_wait+0x1c0/0x310
>
> [] do_futex+0x121/0xae0
>
> [] sys_futex+0x7b/0x170
>
> [] system_call_fastpath+0x16/0x1b
>
> [] 0x
>
>
> This was the stack of the test program:
>
> [] hrtimer_nanosleep+0xc4/0x180
>
> [] sys_nanosleep+0x6e/0x80
>
> [] system_call_fastpath+0x16/0x1b
>
> [] 0x
>
>
> Any clue?
>
> Thanks,
> Paul
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Fuse client hangs on doing multithreading IO tests

2016-06-24 Thread 冷波
Hi,


We found a problem while running traffic tests. We created a replicated volume 
with two storage nodes (CentOS 6.5). There was one FUSE client (CentOS 6.7) 
which performed multithreaded reads and writes. Most of the I/O was reads of 
big files. All machines used 10GbE NICs, and the typical read throughput was 
4-6 Gbps (0.5-1.5 GB/s).
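
For reference, the load has roughly the following shape. This is only an 
illustrative sketch, not our actual test program; the mount point /mnt/gluster, 
the file count, the sizes, and the thread counts are placeholders.

import os
import threading

MOUNT = "/mnt/gluster"        # placeholder for the FUSE mount point
FILE_SIZE = 2 * 1024 ** 3     # 2 GB files
BLOCK = 1 << 20               # 1 MB per read/write call
READ_FILES = 4                # the counts below are placeholders
READ_THREADS = 8
WRITE_THREADS = 2

def write_file(path):
    block = b"\0" * BLOCK
    with open(path, "wb") as f:
        for _ in range(FILE_SIZE // BLOCK):
            f.write(block)

def read_file(path):
    with open(path, "rb") as f:
        while f.read(BLOCK):
            pass

# Pre-create the files that the readers will hammer.
for i in range(READ_FILES):
    write_file(os.path.join(MOUNT, "read-%d" % i))

# Mostly reads of big files, plus a few concurrent writers.
threads = [threading.Thread(target=read_file,
                            args=(os.path.join(MOUNT, "read-%d" % (i % READ_FILES)),))
           for i in range(READ_THREADS)]
threads += [threading.Thread(target=write_file,
                             args=(os.path.join(MOUNT, "write-%d" % i),))
            for i in range(WRITE_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()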


After the test ran for several minutes, the test program hung. The throughput 
suddenly dropped to zero, and then there was no more traffic. If we ran df, it 
would hang too. But we could still read and write the volume from other clients.


We tried several GlusterFS versions, from 3.7.5 to 3.8.0, and each had this 
problem. We also tried restoring the default GlusterFS options, but the 
problem persisted.


The GlusterFS version was 3.7.11 for the following stacks.


This was the stack of dd when hanging:
[a046d211] wait_answer_interruptible+0x81/0xc0 [fuse]
[a046d42b] __fuse_request_send+0x1db/0x2b0 [fuse]
[a046d512] fuse_request_send+0x12/0x20 [fuse]
[a0477d4a] fuse_statfs+0xda/0x150 [fuse]
[811c2b64] statfs_by_dentry+0x74/0xa0
[811c2c9b] vfs_statfs+0x1b/0xb0
[811c2e97] user_statfs+0x47/0xb0
[811c2f9a] sys_statfs+0x2a/0x50
[8100b072] system_call_fastpath+0x16/0x1b
[] 0x


This was the stack of gluster:
[810b226a] futex_wait_queue_me+0xba/0xf0
[810b33a0] futex_wait+0x1c0/0x310
[810b4c91] do_futex+0x121/0xae0
[810b56cb] sys_futex+0x7b/0x170
[8100b072] system_call_fastpath+0x16/0x1b
[] 0x


This was the stack of the test program:
[810a3f74] hrtimer_nanosleep+0xc4/0x180
[810a409e] sys_nanosleep+0x6e/0x80
[8100b072] system_call_fastpath+0x16/0x1b
[] 0x
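
In case anyone wants to reproduce these traces: kernel stacks like the ones 
above can be read from /proc/<pid>/task/<tid>/stack (as root). Below is a small 
sketch that dumps them for each thread of a process; the process names at the 
bottom are only examples.

import os

def proc_name(pid):
    # The first line of /proc/<pid>/status looks like "Name:\tglusterfs".
    with open("/proc/%s/status" % pid) as f:
        return f.readline().split(None, 1)[1].strip()

def dump_kernel_stacks(name):
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            if proc_name(pid) != name:
                continue
            # One kernel stack per thread of the matching process.
            for tid in sorted(os.listdir("/proc/%s/task" % pid)):
                with open("/proc/%s/task/%s/stack" % (pid, tid)) as f:
                    print("=== %s pid %s tid %s ===" % (name, pid, tid))
                    print(f.read())
        except (IOError, OSError):
            continue  # the process or thread exited while we were scanning

for name in ("dd", "glusterfs"):  # example process names
    dump_kernel_stacks(name)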


Any clue?


Thanks,
Paul
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel