Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-20 Thread Nithya Balachandran
Thank you. In the meantime, turning off parallel readdir should prevent the
first crash.
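
For anyone following along, the workaround can be applied with the standard gluster CLI; a sketch using the volume name that appears in the logs below (atlasglust), to be run on a server node:

```shell
# Disable the option Nithya refers to, then confirm its current value.
# "atlasglust" is taken from the "0-atlasglust-client-4" log prefix below.
gluster volume set atlasglust performance.parallel-readdir off
gluster volume get atlasglust performance.parallel-readdir
```

These commands need a running gluster management daemon, so they are shown here only as a command sketch.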



Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-20 Thread mohammad kashif
Hi Nithya

Thanks for the bug report. This new crash has happened only once, and only at
one client, in the last 6 days. I will let you know if it happens again or
more frequently.

Cheers

Kashif


Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-20 Thread Nithya Balachandran
Hi Mohammad,

This is a different crash. How often does it happen?


We have managed to reproduce the first crash you reported and a bug has
been filed at [1].
We will work on a fix for this.


Regards,
Nithya

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1593199



Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-18 Thread mohammad kashif
Hi

The problem appeared again after a few days. This time, the client
is glusterfs-3.10.12-1.el6.x86_64 and performance.parallel-readdir is off.
The log level was set to ERROR, and I got this log at the time of the crash:

[2018-06-14 08:45:43.551384] E [rpc-clnt.c:365:saved_frames_unwind] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x7fac2e66ce03] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fac2e434867] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fac2e43497e] (-->
/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x7fac2e434a45]
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x7fac2e434d68] )
0-atlasglust-client-4: forced unwinding frame type(GlusterFS 3.3)
op(READDIRP(40)) called at 2018-06-14 08:45:43.483303 (xid=0x7553c7

Core dumps were enabled on the client, so it created a dump. It is here:

http://www-pnp.physics.ox.ac.uk/~mohammad/core.1002074
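
For reference, core dumps like the one above are typically enabled along these lines; this is a hedged sketch, and the crash directory and core pattern are illustrative assumptions, not taken from this thread:

```shell
# Lift the per-process core size limit so a crashing glusterfs client
# can leave a core file behind.
ulimit -c unlimited
ulimit -c        # verify: prints "unlimited"

# Optionally direct cores to a known location (requires root; the
# pattern below is an assumption for illustration):
# mkdir -p /var/crash
# echo '/var/crash/core.%p' > /proc/sys/kernel/core_pattern
```

The `ulimit` setting applies per shell/session; making it permanent usually goes through `/etc/security/limits.conf` or the service's init configuration.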

I produced a gdb backtrace using this command:

gdb /usr/sbin/glusterfs core.1002074 -ex bt -ex quit |& tee backtrace.log_18_16_1

The result is here:

http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log_18_16_1


I haven't used gdb much, so let me know if you want me to run it in a
different manner.

Thanks

Kashif


On Mon, Jun 18, 2018 at 6:27 AM, Raghavendra Gowdappa 
wrote:

>
>
> On Mon, Jun 18, 2018 at 9:39 AM, Raghavendra Gowdappa  > wrote:
>
>>
>>
>> On Mon, Jun 18, 2018 at 8:11 AM, Raghavendra Gowdappa <
>> rgowd...@redhat.com> wrote:
>>
>>> From the bt:
>>>
>>> #8  0x7f6ef977e6de in rda_readdirp (frame=0x7f6eec862320,
>>> this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=357, off=2,
>>> xdata=0x7f6eec0085a0) at readdir-ahead.c:266
>>> #9  0x7f6ef952db4c in dht_readdirp_cbk (frame=,
>>> cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0,
>>> orig_entries=, xdata=0x7f6eec0085a0) at
>>> dht-common.c:5388
>>> #10 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec862210,
>>> this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2,
>>> xdata=0x7f6eec0085a0) at readdir-ahead.c:266
>>> #11 0x7f6ef952db4c in dht_readdirp_cbk (frame=,
>>> cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0,
>>> orig_entries=, xdata=0x7f6eec0085a0) at
>>> dht-common.c:5388
>>> #12 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec862100,
>>> this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2,
>>> xdata=0x7f6eec0085a0) at readdir-ahead.c:266
>>> #13 0x7f6ef952db4c in dht_readdirp_cbk (frame=,
>>> cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0,
>>> orig_entries=, xdata=0x7f6eec0085a0) at
>>> dht-common.c:5388
>>> #14 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec861ff0,
>>> this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2,
>>> xdata=0x7f6eec0085a0) at readdir-ahead.c:266
>>> #15 0x7f6ef952db4c in dht_readdirp_cbk (frame=,
>>> cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0,
>>> orig_entries=, xdata=0x7f6eec0085a0) at
>>> dht-common.c:5388
>>> #16 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec861ee0,
>>> this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2,
>>> xdata=0x7f6eec0085a0) at readdir-ahead.c:266
>>> #17 0x7f6ef952db4c in dht_readdirp_cbk (frame=,
>>> cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0,
>>> orig_entries=, xdata=0x7f6eec0085a0) at
>>> dht-common.c:5388
>>> #18 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec861dd0,
>>> this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2,
>>> xdata=0x7f6eec0085a0) at readdir-ahead.c:266
>>> #19 0x7f6ef952db4c in dht_readdirp_cbk (frame=,
>>> cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0,
>>> orig_entries=, xdata=0x7f6eec0085a0) at
>>> dht-common.c:5388
>>> #20 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec861cc0,
>>> this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2,
>>> xdata=0x7f6eec0085a0) at readdir-ahead.c:266
>>> #21 0x7f6ef952db4c in dht_readdirp_cbk (frame=,
>>> cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0,
>>> orig_entries=, xdata=0x7f6eec0085a0) at
>>> dht-common.c:5388
>>> #22 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec861bb0,
>>> this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2,
>>> xdata=0x7f6eec0085a0) at readdir-ahead.c:266
>>> #23 0x7f6ef952db4c in dht_readdirp_cbk (frame=,
>>> cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0,
>>> orig_entries=, xdata=0x7f6eec0085a0) at
>>> dht-common.c:5388
>>>
>>> It looks like an infinite recursion. Note that readdirp is wound to the
>>> same subvol (value of "this" is same in all calls to rda_readdirp) at the
>>> same offset (of value 2). This may be a bug in DHT (winding down readdirp
>>> with wrong offset) or in readdir-ahead (populating incorrect offset values
>>> in dentries it returns as readdirp response).
>>>
>>
>> It looks to be a 

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-17 Thread Raghavendra Gowdappa


I spoke too early. It could be a negative value, and hence it may not be a
corruption. Is it possible to upload the core somewhere? Or, better still,
access to a gdb session with this core would be even more helpful.



Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-17 Thread Raghavendra Gowdappa

It looks to be a corruption. The value of the size argument in rda_readdirp is
too big (around 127 TB) to be sane. If you have a reproducer, please run it
under valgrind or ASAN.

To make it explicit: ATM it's not clear whether the bug is in readdir-ahead
or DHT, as it looks to be a memory corruption. Till I get a reproducer or
valgrind/ASAN output of the client process when the issue occurs, I won't be
working on this problem.
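
For completeness, one way to get valgrind output from the client is to start the fuse client in the foreground by hand. This is a hedged sketch: the server name and mount point are placeholders, and the flags should be checked against the installed glusterfs version:

```shell
# Placeholders: server1 and /mnt/atlasglust. -N keeps glusterfs in the
# foreground so valgrind can follow it; output goes to /tmp/glusterfs.vg.
valgrind --leak-check=full --track-origins=yes --log-file=/tmp/glusterfs.vg \
    /usr/sbin/glusterfs -N --volfile-server=server1 \
    --volfile-id=atlasglust /mnt/atlasglust
```

Running under valgrind slows the client considerably, so this is practical only on a test mount or one that can tolerate the overhead until the crash reproduces.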



Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-17 Thread Raghavendra Gowdappa
On Mon, Jun 18, 2018 at 8:11 AM, Raghavendra Gowdappa 
wrote:

> From the bt:
>
> #8  0x7f6ef977e6de in rda_readdirp (frame=0x7f6eec862320, this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=357, off=2, xdata=0x7f6eec0085a0) at readdir-ahead.c:266
> #9  0x7f6ef952db4c in dht_readdirp_cbk (frame=<optimized out>, cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0, orig_entries=<optimized out>, xdata=0x7f6eec0085a0) at dht-common.c:5388
> #10 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec862210, this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2, xdata=0x7f6eec0085a0) at readdir-ahead.c:266
> #11 0x7f6ef952db4c in dht_readdirp_cbk (frame=<optimized out>, cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0, orig_entries=<optimized out>, xdata=0x7f6eec0085a0) at dht-common.c:5388
> #12 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec862100, this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2, xdata=0x7f6eec0085a0) at readdir-ahead.c:266
> #13 0x7f6ef952db4c in dht_readdirp_cbk (frame=<optimized out>, cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0, orig_entries=<optimized out>, xdata=0x7f6eec0085a0) at dht-common.c:5388
> #14 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec861ff0, this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2, xdata=0x7f6eec0085a0) at readdir-ahead.c:266
> #15 0x7f6ef952db4c in dht_readdirp_cbk (frame=<optimized out>, cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0, orig_entries=<optimized out>, xdata=0x7f6eec0085a0) at dht-common.c:5388
> #16 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec861ee0, this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2, xdata=0x7f6eec0085a0) at readdir-ahead.c:266
> #17 0x7f6ef952db4c in dht_readdirp_cbk (frame=<optimized out>, cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0, orig_entries=<optimized out>, xdata=0x7f6eec0085a0) at dht-common.c:5388
> #18 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec861dd0, this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2, xdata=0x7f6eec0085a0) at readdir-ahead.c:266
> #19 0x7f6ef952db4c in dht_readdirp_cbk (frame=<optimized out>, cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0, orig_entries=<optimized out>, xdata=0x7f6eec0085a0) at dht-common.c:5388
> #20 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec861cc0, this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2, xdata=0x7f6eec0085a0) at readdir-ahead.c:266
> #21 0x7f6ef952db4c in dht_readdirp_cbk (frame=<optimized out>, cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0, orig_entries=<optimized out>, xdata=0x7f6eec0085a0) at dht-common.c:5388
> #22 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec861bb0, this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2, xdata=0x7f6eec0085a0) at readdir-ahead.c:266
> #23 0x7f6ef952db4c in dht_readdirp_cbk (frame=<optimized out>, cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0, orig_entries=<optimized out>, xdata=0x7f6eec0085a0) at
> dht-common.c:5388
>
> It looks like an infinite recursion. Note that readdirp is wound to the
> same subvol (value of "this" is same in all calls to rda_readdirp) at the
> same offset (of value 2). This may be a bug in DHT (winding down readdirp
> with wrong offset) or in readdir-ahead (populating incorrect offset values
> in dentries it returns as readdirp response).
>

There has been quite a bit of code change in readdir-ahead and dht-readdirp
between the good and bad release. @Poornima, can you check for anything
relevant in readdir-ahead, while I check for anything interesting in
dht-readdirp?


>
> On Wed, Jun 13, 2018 at 4:29 PM, mohammad kashif 
> wrote:
>
>> Hi Milind
>>
>> Thanks a lot, I managed to run gdb and produced a backtrace as well. It's here
>>
>> http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log
>>
>>
>> I am trying to understand but still not able to make sense out of it.
>>
>> Thanks
>>
>> Kashif
>>
>> On Wed, Jun 13, 2018 at 11:34 AM, Milind Changire 
>> wrote:
>>
>>> Kashif,
>>> FYI: http://debuginfo.centos.org/centos/6/storage/x86_64/
>>>
>>>
>>> On Wed, Jun 13, 2018 at 3:21 PM, mohammad kashif 
>>> wrote:
>>>
 Hi Milind

 There is no glusterfs-debuginfo available for gluster-3.12 from
 http://mirror.centos.org/centos/6/storage/x86_64/gluster-3.12/ repo.
 Do you know from where I can get it?
 Also when I run gdb, it says

 Missing separate debuginfos, use: debuginfo-install
 glusterfs-fuse-3.12.9-1.el6.x86_64

 I can't find debug package for glusterfs-fuse either

 Thanks from the pit of despair ;)

 Kashif


 On Tue, Jun 12, 2018 at 5:01 PM, mohammad kashif >>> > wrote:

> Hi Milind
>
> I will send you links for logs.
>
> I collected these core dumps at client and there is no glusterd
> process running on client.
>
> Kashif
>
>
>
> On Tue, Jun 12, 2018 at 4:14 PM, Milind Changire 
> wrote:
>
>> Kashif,
>> Could you also 

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-17 Thread Raghavendra Gowdappa
From the bt:

#8  0x7f6ef977e6de in rda_readdirp (frame=0x7f6eec862320,
this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=357, off=2,
xdata=0x7f6eec0085a0) at readdir-ahead.c:266
#9  0x7f6ef952db4c in dht_readdirp_cbk (frame=,
cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0,
orig_entries=, xdata=0x7f6eec0085a0) at
dht-common.c:5388
#10 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec862210,
this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2,
xdata=0x7f6eec0085a0) at readdir-ahead.c:266
#11 0x7f6ef952db4c in dht_readdirp_cbk (frame=,
cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0,
orig_entries=, xdata=0x7f6eec0085a0) at
dht-common.c:5388
#12 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec862100,
this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2,
xdata=0x7f6eec0085a0) at readdir-ahead.c:266
#13 0x7f6ef952db4c in dht_readdirp_cbk (frame=,
cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0,
orig_entries=, xdata=0x7f6eec0085a0) at
dht-common.c:5388
#14 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec861ff0,
this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2,
xdata=0x7f6eec0085a0) at readdir-ahead.c:266
#15 0x7f6ef952db4c in dht_readdirp_cbk (frame=,
cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0,
orig_entries=, xdata=0x7f6eec0085a0) at
dht-common.c:5388
#16 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec861ee0,
this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2,
xdata=0x7f6eec0085a0) at readdir-ahead.c:266
#17 0x7f6ef952db4c in dht_readdirp_cbk (frame=,
cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0,
orig_entries=, xdata=0x7f6eec0085a0) at
dht-common.c:5388
#18 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec861dd0,
this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2,
xdata=0x7f6eec0085a0) at readdir-ahead.c:266
#19 0x7f6ef952db4c in dht_readdirp_cbk (frame=,
cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0,
orig_entries=, xdata=0x7f6eec0085a0) at
dht-common.c:5388
#20 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec861cc0,
this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2,
xdata=0x7f6eec0085a0) at readdir-ahead.c:266
#21 0x7f6ef952db4c in dht_readdirp_cbk (frame=,
cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0,
orig_entries=, xdata=0x7f6eec0085a0) at
dht-common.c:5388
#22 0x7f6ef977e7d7 in rda_readdirp (frame=0x7f6eec861bb0,
this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=140114606084288, off=2,
xdata=0x7f6eec0085a0) at readdir-ahead.c:266
#23 0x7f6ef952db4c in dht_readdirp_cbk (frame=,
cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0,
orig_entries=, xdata=0x7f6eec0085a0) at
dht-common.c:5388

It looks like an infinite recursion. Note that readdirp is wound to the
same subvol (value of "this" is same in all calls to rda_readdirp) at the
same offset (of value 2). This may be a bug in DHT (winding down readdirp
with wrong offset) or in readdir-ahead (populating incorrect offset values
in dentries it returns as readdirp response).


On Wed, Jun 13, 2018 at 4:29 PM, mohammad kashif 
wrote:

> Hi Milind
>
> Thanks a lot, I managed to run gdb and produced a backtrace as well. It's here
>
> http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log
>
>
> I am trying to understand but still not able to make sense out of it.
>
> Thanks
>
> Kashif
>
> On Wed, Jun 13, 2018 at 11:34 AM, Milind Changire 
> wrote:
>
>> Kashif,
>> FYI: http://debuginfo.centos.org/centos/6/storage/x86_64/
>>
>>
>> On Wed, Jun 13, 2018 at 3:21 PM, mohammad kashif 
>> wrote:
>>
>>> Hi Milind
>>>
>>> There is no glusterfs-debuginfo available for gluster-3.12 from
>>> http://mirror.centos.org/centos/6/storage/x86_64/gluster-3.12/ repo. Do
>>> you know from where I can get it?
>>> Also when I run gdb, it says
>>>
>>> Missing separate debuginfos, use: debuginfo-install
>>> glusterfs-fuse-3.12.9-1.el6.x86_64
>>>
>>> I can't find debug package for glusterfs-fuse either
>>>
>>> Thanks from the pit of despair ;)
>>>
>>> Kashif
>>>
>>>
>>> On Tue, Jun 12, 2018 at 5:01 PM, mohammad kashif 
>>> wrote:
>>>
 Hi Milind

 I will send you links for logs.

 I collected these core dumps at client and there is no glusterd process
 running on client.

 Kashif



 On Tue, Jun 12, 2018 at 4:14 PM, Milind Changire 
 wrote:

> Kashif,
> Could you also send over the client/mount log file as Vijay suggested ?
> Or maybe the lines with the crash backtrace lines
>
> Also, you've mentioned that you straced glusterd, but when you ran
> gdb, you ran it over /usr/sbin/glusterfs
>
>
> On Tue, Jun 12, 2018 at 8:19 PM, Vijay Bellur 
> wrote:
>
>>
>>
>> On Tue, Jun 12, 2018 at 7:40 AM, mohammad kashif <
>> kashif.a...@gmail.com> wrote:
>>
>>> Hi Milind
>>>
>>> The operating 

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-17 Thread mohammad kashif
Hi Nithya

The FUSE volfiles (after disabling parallel-readdir) are here:
http://www-pnp.physics.ox.ac.uk/~mohammad/atlasglust.tcp-fuse.vol

Unfortunately I can't take the risk of re-enabling parallel-readdir, as the
cluster is in heavy use and the clients unmounting again would likely kill
many jobs.

There is one thing which I haven't mentioned earlier, to keep things simple.
I have another 300 TB gluster cluster which is only 70% full and has far
fewer files. It has parallel-readdir enabled, and some of its clients are
shared with atlasglust, but I have had no problems with that cluster.

I suspect the problem was triggered when atlasglust became more than 98%
full or crossed a certain number of files. But upgrading the clients to
3.12.9 was definitely a factor, as this particular problem started after
that. Rolling back some clients to 3.10 while keeping parallel-readdir
enabled also fixed the problem.

Thanks

Kashif

On Fri, Jun 15, 2018 at 9:19 AM, Nithya Balachandran 
wrote:

>
>
> On 15 June 2018 at 13:45, Nithya Balachandran  wrote:
>
>> Hi Mohammad,
>>
>> I was unable to reproduce this on a volume created on a system running
>> 3.12.9.
>>
>> Can you send me the FUSE volfiles for the volume atlasglust? They will
>> be in   /var/lib/glusterd/vols/atlasglust/ on any of the gluster servers
>> hosting the volume and called *.tcp-fuse.vol.
>>
>
> Can you also send the same files after enabling parallel-readdir?
>
>>
>>
>> Thanks,
>> Nithya
>>
>> On 14 June 2018 at 16:42, mohammad kashif  wrote:
>>
>>> Hi Nithya
>>>
>>> It seems the problem can be solved either by turning parallel-readdir off
>>> or by downgrading the client to 3.10.12-1. Yesterday I downgraded some
>>> clients to 3.10.12-1 and that seems to have fixed the problem. Today, after
>>> seeing your email, I disabled parallel-readdir and the current 3.12.9-1
>>> client started to work. I upgraded the servers and clients to 3.12.9-1 last
>>> month, and since then clients had been unmounting intermittently, about
>>> once a week. But during the last three days they started unmounting every
>>> few minutes. I don't know what triggered this sudden panic, except that the
>>> file system was quite full: around 98%. It is a 480 TB file system with
>>> almost 80 million files.
>>>
>>> Servers have 64GB RAM and clients have 64GB to 192GB RAM. I tested with
>>> 192GB RAM client and it still had the same issue.
>>>
>>>
>>> Volume Name: atlasglust
>>> Type: Distribute
>>> Volume ID: fbf0ebb8-deab-4388-9d8a-f722618a624b
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 7
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: pplxgluster01.X.Y.Z:/glusteratlas/brick001/gv0
>>> Brick2: pplxgluster02.X.Y.Z:/glusteratlas/brick002/gv0
>>> Brick3: pplxgluster03.X.Y.Z:/glusteratlas/brick003/gv0
>>> Brick4: pplxgluster04.X.Y.Z:/glusteratlas/brick004/gv0
>>> Brick5: pplxgluster05.X.Y.Z:/glusteratlas/brick005/gv0
>>> Brick6: pplxgluster06.X.Y.Z:/glusteratlas/brick006/gv0
>>> Brick7: pplxgluster07.X.Y.Z:/glusteratlas/brick007/gv0
>>> Options Reconfigured:
>>> diagnostics.client-log-level: ERROR
>>> diagnostics.brick-log-level: ERROR
>>> performance.cache-invalidation: on
>>> server.event-threads: 4
>>> client.event-threads: 4
>>> cluster.lookup-optimize: on
>>> performance.client-io-threads: on
>>> performance.cache-size: 1GB
>>> performance.parallel-readdir: off
>>> performance.md-cache-timeout: 600
>>> performance.stat-prefetch: on
>>> features.cache-invalidation-timeout: 600
>>> features.cache-invalidation: on
>>> auth.allow: X.Y.Z.*
>>> transport.address-family: inet
>>> performance.readdir-ahead: on
>>> nfs.disable: on
>>>
>>>
>>> Thanks
>>>
>>> Kashif
>>>
>>> On Thu, Jun 14, 2018 at 5:39 AM, Nithya Balachandran <
>>> nbala...@redhat.com> wrote:
>>>
 +Poornima who works on parallel-readdir.

 @Poornima, Have you seen anything like this before?

 On 14 June 2018 at 10:07, Nithya Balachandran 
 wrote:

> This is not the same issue as the one you are referring to: that one was
> in the RPC layer and caused the bricks to crash. This one is different, as
> it seems to be in the dht and rda layers. It does look like a stack
> overflow, though.
>
> @Mohammad,
>
> Please send the following information:
>
> 1. gluster volume info
> 2. The number of entries in the directory being listed
> 3. System memory
>
> Does this still happen if you turn off parallel-readdir?
>
> Regards,
> Nithya
>
>
>
>
> On 13 June 2018 at 16:40, Milind Changire  wrote:
>
>> +Nithya
>>
>> Nithya,
>> Do these logs [1]  look similar to the recursive readdir() issue that
>> you encountered just a while back ?
>> i.e. recursive readdir() response definition in the XDR
>>
>> [1] http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log
>>
>>
>> On Wed, Jun 13, 2018 at 4:29 PM, mohammad kashif <
>> kashif.a...@gmail.com> wrote:
>>
>>> Hi Milind
>>>
>>> 

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-15 Thread Nithya Balachandran
On 15 June 2018 at 13:45, Nithya Balachandran  wrote:

> Hi Mohammad,
>
> I was unable to reproduce this on a volume created on a system running
> 3.12.9.
>
> Can you send me the FUSE volfiles for the volume atlasglust? They will be
> in   /var/lib/glusterd/vols/atlasglust/ on any of the gluster servers
> hosting the volume and called *.tcp-fuse.vol.
>

Can you also send the same files after enabling parallel-readdir?

>
>
> Thanks,
> Nithya
>
> On 14 June 2018 at 16:42, mohammad kashif  wrote:
>
>> Hi Nithya
>>
>> It seems the problem can be solved either by turning parallel-readdir off
>> or by downgrading the client to 3.10.12-1. Yesterday I downgraded some
>> clients to 3.10.12-1 and that seems to have fixed the problem. Today, after
>> seeing your email, I disabled parallel-readdir and the current 3.12.9-1
>> client started to work. I upgraded the servers and clients to 3.12.9-1 last
>> month, and since then clients had been unmounting intermittently, about once
>> a week. But during the last three days they started unmounting every few
>> minutes. I don't know what triggered this sudden panic, except that the
>> file system was quite full: around 98%. It is a 480 TB file system with
>> almost 80 million files.
>>
>> Servers have 64GB RAM and clients have 64GB to 192GB RAM. I tested with
>> 192GB RAM client and it still had the same issue.
>>
>>
>> Volume Name: atlasglust
>> Type: Distribute
>> Volume ID: fbf0ebb8-deab-4388-9d8a-f722618a624b
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 7
>> Transport-type: tcp
>> Bricks:
>> Brick1: pplxgluster01.X.Y.Z:/glusteratlas/brick001/gv0
>> Brick2: pplxgluster02.X.Y.Z:/glusteratlas/brick002/gv0
>> Brick3: pplxgluster03.X.Y.Z:/glusteratlas/brick003/gv0
>> Brick4: pplxgluster04.X.Y.Z:/glusteratlas/brick004/gv0
>> Brick5: pplxgluster05.X.Y.Z:/glusteratlas/brick005/gv0
>> Brick6: pplxgluster06.X.Y.Z:/glusteratlas/brick006/gv0
>> Brick7: pplxgluster07.X.Y.Z:/glusteratlas/brick007/gv0
>> Options Reconfigured:
>> diagnostics.client-log-level: ERROR
>> diagnostics.brick-log-level: ERROR
>> performance.cache-invalidation: on
>> server.event-threads: 4
>> client.event-threads: 4
>> cluster.lookup-optimize: on
>> performance.client-io-threads: on
>> performance.cache-size: 1GB
>> performance.parallel-readdir: off
>> performance.md-cache-timeout: 600
>> performance.stat-prefetch: on
>> features.cache-invalidation-timeout: 600
>> features.cache-invalidation: on
>> auth.allow: X.Y.Z.*
>> transport.address-family: inet
>> performance.readdir-ahead: on
>> nfs.disable: on
>>
>>
>> Thanks
>>
>> Kashif
>>
>> On Thu, Jun 14, 2018 at 5:39 AM, Nithya Balachandran > > wrote:
>>
>>> +Poornima who works on parallel-readdir.
>>>
>>> @Poornima, Have you seen anything like this before?
>>>
>>> On 14 June 2018 at 10:07, Nithya Balachandran 
>>> wrote:
>>>
 This is not the same issue as the one you are referring to: that one was
 in the RPC layer and caused the bricks to crash. This one is different, as
 it seems to be in the dht and rda layers. It does look like a stack
 overflow, though.

 @Mohammad,

 Please send the following information:

 1. gluster volume info
 2. The number of entries in the directory being listed
 3. System memory

 Does this still happen if you turn off parallel-readdir?

 Regards,
 Nithya




 On 13 June 2018 at 16:40, Milind Changire  wrote:

> +Nithya
>
> Nithya,
> Do these logs [1]  look similar to the recursive readdir() issue that
> you encountered just a while back ?
> i.e. recursive readdir() response definition in the XDR
>
> [1] http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log
>
>
> On Wed, Jun 13, 2018 at 4:29 PM, mohammad kashif <
> kashif.a...@gmail.com> wrote:
>
>> Hi Milind
>>
>> Thanks a lot, I managed to run gdb and produced a backtrace as well. It's
>> here
>>
>> http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log
>>
>>
>> I am trying to understand but still not able to make sense out of it.
>>
>> Thanks
>>
>> Kashif
>>
>> On Wed, Jun 13, 2018 at 11:34 AM, Milind Changire <
>> mchan...@redhat.com> wrote:
>>
>>> Kashif,
>>> FYI: http://debuginfo.centos.org/centos/6/storage/x86_64/
>>>
>>>
>>> On Wed, Jun 13, 2018 at 3:21 PM, mohammad kashif <
>>> kashif.a...@gmail.com> wrote:
>>>
 Hi Milind

 There is no glusterfs-debuginfo available for gluster-3.12 from
 http://mirror.centos.org/centos/6/storage/x86_64/gluster-3.12/
 repo. Do you know from where I can get it?
 Also when I run gdb, it says

 Missing separate debuginfos, use: debuginfo-install
 glusterfs-fuse-3.12.9-1.el6.x86_64

 I can't find debug package for glusterfs-fuse either

 Thanks from the pit of despair ;)

 Kashif


Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-15 Thread Nithya Balachandran
Hi Mohammad,

I was unable to reproduce this on a volume created on a system running
3.12.9.

Can you send me the FUSE volfiles for the volume atlasglust? They will be
in   /var/lib/glusterd/vols/atlasglust/ on any of the gluster servers
hosting the volume and called *.tcp-fuse.vol.


Thanks,
Nithya

On 14 June 2018 at 16:42, mohammad kashif  wrote:

> Hi Nithya
>
> It seems the problem can be solved either by turning parallel-readdir off
> or by downgrading the client to 3.10.12-1. Yesterday I downgraded some
> clients to 3.10.12-1 and that seems to have fixed the problem. Today, after
> seeing your email, I disabled parallel-readdir and the current 3.12.9-1
> client started to work. I upgraded the servers and clients to 3.12.9-1 last
> month, and since then clients had been unmounting intermittently, about once
> a week. But during the last three days they started unmounting every few
> minutes. I don't know what triggered this sudden panic, except that the
> file system was quite full: around 98%. It is a 480 TB file system with
> almost 80 million files.
>
> Servers have 64GB RAM and clients have 64GB to 192GB RAM. I tested with
> 192GB RAM client and it still had the same issue.
>
>
> Volume Name: atlasglust
> Type: Distribute
> Volume ID: fbf0ebb8-deab-4388-9d8a-f722618a624b
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 7
> Transport-type: tcp
> Bricks:
> Brick1: pplxgluster01.X.Y.Z:/glusteratlas/brick001/gv0
> Brick2: pplxgluster02.X.Y.Z:/glusteratlas/brick002/gv0
> Brick3: pplxgluster03.X.Y.Z:/glusteratlas/brick003/gv0
> Brick4: pplxgluster04.X.Y.Z:/glusteratlas/brick004/gv0
> Brick5: pplxgluster05.X.Y.Z:/glusteratlas/brick005/gv0
> Brick6: pplxgluster06.X.Y.Z:/glusteratlas/brick006/gv0
> Brick7: pplxgluster07.X.Y.Z:/glusteratlas/brick007/gv0
> Options Reconfigured:
> diagnostics.client-log-level: ERROR
> diagnostics.brick-log-level: ERROR
> performance.cache-invalidation: on
> server.event-threads: 4
> client.event-threads: 4
> cluster.lookup-optimize: on
> performance.client-io-threads: on
> performance.cache-size: 1GB
> performance.parallel-readdir: off
> performance.md-cache-timeout: 600
> performance.stat-prefetch: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> auth.allow: X.Y.Z.*
> transport.address-family: inet
> performance.readdir-ahead: on
> nfs.disable: on
>
>
> Thanks
>
> Kashif
>
> On Thu, Jun 14, 2018 at 5:39 AM, Nithya Balachandran 
> wrote:
>
>> +Poornima who works on parallel-readdir.
>>
>> @Poornima, Have you seen anything like this before?
>>
>> On 14 June 2018 at 10:07, Nithya Balachandran 
>> wrote:
>>
>>> This is not the same issue as the one you are referring to: that one was
>>> in the RPC layer and caused the bricks to crash. This one is different, as
>>> it seems to be in the dht and rda layers. It does look like a stack
>>> overflow, though.
>>>
>>> @Mohammad,
>>>
>>> Please send the following information:
>>>
>>> 1. gluster volume info
>>> 2. The number of entries in the directory being listed
>>> 3. System memory
>>>
>>> Does this still happen if you turn off parallel-readdir?
>>>
>>> Regards,
>>> Nithya
>>>
>>>
>>>
>>>
>>> On 13 June 2018 at 16:40, Milind Changire  wrote:
>>>
 +Nithya

 Nithya,
 Do these logs [1]  look similar to the recursive readdir() issue that
 you encountered just a while back ?
 i.e. recursive readdir() response definition in the XDR

 [1] http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log


 On Wed, Jun 13, 2018 at 4:29 PM, mohammad kashif >>> > wrote:

> Hi Milind
>
> Thanks a lot, I managed to run gdb and produced a backtrace as well. It's
> here
>
> http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log
>
>
> I am trying to understand but still not able to make sense out of it.
>
> Thanks
>
> Kashif
>
> On Wed, Jun 13, 2018 at 11:34 AM, Milind Changire  > wrote:
>
>> Kashif,
>> FYI: http://debuginfo.centos.org/centos/6/storage/x86_64/
>>
>>
>> On Wed, Jun 13, 2018 at 3:21 PM, mohammad kashif <
>> kashif.a...@gmail.com> wrote:
>>
>>> Hi Milind
>>>
>>> There is no glusterfs-debuginfo available for gluster-3.12 from
>>> http://mirror.centos.org/centos/6/storage/x86_64/gluster-3.12/
>>> repo. Do you know from where I can get it?
>>> Also when I run gdb, it says
>>>
>>> Missing separate debuginfos, use: debuginfo-install
>>> glusterfs-fuse-3.12.9-1.el6.x86_64
>>>
>>> I can't find debug package for glusterfs-fuse either
>>>
>>> Thanks from the pit of despair ;)
>>>
>>> Kashif
>>>
>>>
>>> On Tue, Jun 12, 2018 at 5:01 PM, mohammad kashif <
>>> kashif.a...@gmail.com> wrote:
>>>
 Hi Milind

 I will send you links for logs.

 I collected these core dumps at client and there is no glusterd
 process running on client.

 Kashif


Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-14 Thread mohammad kashif
Hi Nithya

It seems the problem can be solved either by turning parallel-readdir off
or by downgrading the client to 3.10.12-1. Yesterday I downgraded some
clients to 3.10.12-1 and that seems to have fixed the problem. Today, after
seeing your email, I disabled parallel-readdir and the current 3.12.9-1
client started to work. I upgraded the servers and clients to 3.12.9-1 last
month, and since then clients had been unmounting intermittently, about once
a week. But during the last three days they started unmounting every few
minutes. I don't know what triggered this sudden panic, except that the
file system was quite full: around 98%. It is a 480 TB file system with
almost 80 million files.

Servers have 64GB RAM and clients have 64GB to 192GB RAM. I tested with
192GB RAM client and it still had the same issue.


Volume Name: atlasglust
Type: Distribute
Volume ID: fbf0ebb8-deab-4388-9d8a-f722618a624b
Status: Started
Snapshot Count: 0
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: pplxgluster01.X.Y.Z:/glusteratlas/brick001/gv0
Brick2: pplxgluster02.X.Y.Z:/glusteratlas/brick002/gv0
Brick3: pplxgluster03.X.Y.Z:/glusteratlas/brick003/gv0
Brick4: pplxgluster04.X.Y.Z:/glusteratlas/brick004/gv0
Brick5: pplxgluster05.X.Y.Z:/glusteratlas/brick005/gv0
Brick6: pplxgluster06.X.Y.Z:/glusteratlas/brick006/gv0
Brick7: pplxgluster07.X.Y.Z:/glusteratlas/brick007/gv0
Options Reconfigured:
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
performance.cache-invalidation: on
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
performance.client-io-threads: on
performance.cache-size: 1GB
performance.parallel-readdir: off
performance.md-cache-timeout: 600
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
auth.allow: X.Y.Z.*
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on


Thanks

Kashif

On Thu, Jun 14, 2018 at 5:39 AM, Nithya Balachandran 
wrote:

> +Poornima who works on parallel-readdir.
>
> @Poornima, Have you seen anything like this before?
>
> On 14 June 2018 at 10:07, Nithya Balachandran  wrote:
>
>> This is not the same issue as the one you are referring to: that one was in
>> the RPC layer and caused the bricks to crash. This one is different, as it
>> seems to be in the dht and rda layers. It does look like a stack overflow,
>> though.
>>
>> @Mohammad,
>>
>> Please send the following information:
>>
>> 1. gluster volume info
>> 2. The number of entries in the directory being listed
>> 3. System memory
>>
>> Does this still happen if you turn off parallel-readdir?
>>
>> Regards,
>> Nithya
>>
>>
>>
>>
>> On 13 June 2018 at 16:40, Milind Changire  wrote:
>>
>>> +Nithya
>>>
>>> Nithya,
>>> Do these logs [1]  look similar to the recursive readdir() issue that
>>> you encountered just a while back ?
>>> i.e. recursive readdir() response definition in the XDR
>>>
>>> [1] http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log
>>>
>>>
>>> On Wed, Jun 13, 2018 at 4:29 PM, mohammad kashif 
>>> wrote:
>>>
 Hi Milind

 Thanks a lot, I managed to run gdb and produced a backtrace as well. It's
 here

 http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log


 I am trying to understand but still not able to make sense out of it.

 Thanks

 Kashif

 On Wed, Jun 13, 2018 at 11:34 AM, Milind Changire 
 wrote:

> Kashif,
> FYI: http://debuginfo.centos.org/centos/6/storage/x86_64/
>
>
> On Wed, Jun 13, 2018 at 3:21 PM, mohammad kashif <
> kashif.a...@gmail.com> wrote:
>
>> Hi Milind
>>
>> There is no glusterfs-debuginfo available for gluster-3.12 from
>> http://mirror.centos.org/centos/6/storage/x86_64/gluster-3.12/ repo.
>> Do you know from where I can get it?
>> Also when I run gdb, it says
>>
>> Missing separate debuginfos, use: debuginfo-install
>> glusterfs-fuse-3.12.9-1.el6.x86_64
>>
>> I can't find debug package for glusterfs-fuse either
>>
>> Thanks from the pit of despair ;)
>>
>> Kashif
>>
>>
>> On Tue, Jun 12, 2018 at 5:01 PM, mohammad kashif <
>> kashif.a...@gmail.com> wrote:
>>
>>> Hi Milind
>>>
>>> I will send you links for logs.
>>>
>>> I collected these core dumps at client and there is no glusterd
>>> process running on client.
>>>
>>> Kashif
>>>
>>>
>>>
>>> On Tue, Jun 12, 2018 at 4:14 PM, Milind Changire <
>>> mchan...@redhat.com> wrote:
>>>
 Kashif,
 Could you also send over the client/mount log file as Vijay
 suggested ?
 Or maybe the lines with the crash backtrace lines

 Also, you've mentioned that you straced glusterd, but when you ran
 gdb, you ran it over /usr/sbin/glusterfs


 On Tue, Jun 12, 2018 at 8:19 PM, Vijay Bellur 
 wrote:

>
>
> On Tue, Jun 12, 2018 

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-13 Thread Nithya Balachandran
+Poornima who works on parallel-readdir.

@Poornima, Have you seen anything like this before?

On 14 June 2018 at 10:07, Nithya Balachandran  wrote:

> This is not the same issue as the one you are referring to: that one was in
> the RPC layer and caused the bricks to crash. This one is different, as it
> seems to be in the dht and rda layers. It does look like a stack overflow,
> though.
>
> @Mohammad,
>
> Please send the following information:
>
> 1. gluster volume info
> 2. The number of entries in the directory being listed
> 3. System memory
>
> Does this still happen if you turn off parallel-readdir?
>
> Regards,
> Nithya
>
>
>
>
> On 13 June 2018 at 16:40, Milind Changire  wrote:
>
>> +Nithya
>>
>> Nithya,
>> Do these logs [1]  look similar to the recursive readdir() issue that you
>> encountered just a while back ?
>> i.e. recursive readdir() response definition in the XDR
>>
>> [1] http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log
>>
>>
>> On Wed, Jun 13, 2018 at 4:29 PM, mohammad kashif 
>> wrote:
>>
>>> Hi Milind
>>>
>>> Thanks a lot, I managed to run gdb and produced a backtrace as well. It's
>>> here
>>>
>>> http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log
>>>
>>>
>>> I am trying to understand but still not able to make sense out of it.
>>>
>>> Thanks
>>>
>>> Kashif
>>>
>>> On Wed, Jun 13, 2018 at 11:34 AM, Milind Changire 
>>> wrote:
>>>
 Kashif,
 FYI: http://debuginfo.centos.org/centos/6/storage/x86_64/


 On Wed, Jun 13, 2018 at 3:21 PM, mohammad kashif >>> > wrote:

> Hi Milind
>
> There is no glusterfs-debuginfo available for gluster-3.12 from
> http://mirror.centos.org/centos/6/storage/x86_64/gluster-3.12/ repo.
> Do you know from where I can get it?
> Also when I run gdb, it says
>
> Missing separate debuginfos, use: debuginfo-install
> glusterfs-fuse-3.12.9-1.el6.x86_64
>
> I can't find debug package for glusterfs-fuse either
>
> Thanks from the pit of despair ;)
>
> Kashif
>
>
> On Tue, Jun 12, 2018 at 5:01 PM, mohammad kashif <
> kashif.a...@gmail.com> wrote:
>
>> Hi Milind
>>
>> I will send you links for logs.
>>
>> I collected these core dumps at client and there is no glusterd
>> process running on client.
>>
>> Kashif
>>
>>
>>
>> On Tue, Jun 12, 2018 at 4:14 PM, Milind Changire > > wrote:
>>
>>> Kashif,
>>> Could you also send over the client/mount log file as Vijay
>>> suggested ?
>>> Or maybe the lines with the crash backtrace lines
>>>
>>> Also, you've mentioned that you straced glusterd, but when you ran
>>> gdb, you ran it over /usr/sbin/glusterfs
>>>
>>>
>>> On Tue, Jun 12, 2018 at 8:19 PM, Vijay Bellur 
>>> wrote:
>>>


 On Tue, Jun 12, 2018 at 7:40 AM, mohammad kashif <
 kashif.a...@gmail.com> wrote:

> Hi Milind
>
> The operating system is Scientific Linux 6 which is based on
> RHEL6. The cpu arch is Intel x86_64.
>
> I will send you a separate email with link to core dump.
>


 You could also grep for crash in the client log file and the lines
 following crash would have a backtrace in most cases.

 HTH,
 Vijay


>
> Thanks for your help.
>
> Kashif
>
>
> On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire <
> mchan...@redhat.com> wrote:
>
>> Kashif,
>> Could you share the core dump via Google Drive or something
>> similar
>>
>> Also, let me know the CPU arch and OS Distribution on which you
>> are running gluster.
>>
>> If you've installed the glusterfs-debuginfo package, you'll also
>> get the source lines in the backtrace via gdb
>>
>>
>>
>> On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif <
>> kashif.a...@gmail.com> wrote:
>>
>>> Hi Milind, Vijay
>>>
>>> Thanks, I have some more information now as I straced glusterd
>>> on client
>>>
>>> 138544  0.000131 mprotect(0x7f2f70785000, 4096,
>>> PROT_READ|PROT_WRITE) = 0 <0.26>
>>> 138544  0.000128 mprotect(0x7f2f70786000, 4096,
>>> PROT_READ|PROT_WRITE) = 0 <0.27>
>>> 138544  0.000126 mprotect(0x7f2f70787000, 4096,
>>> PROT_READ|PROT_WRITE) = 0 <0.27>
>>> 138544  0.000124 --- SIGSEGV {si_signo=SIGSEGV,
>>> si_code=SEGV_ACCERR, si_addr=0x7f2f7c60ef88} ---
>>> 138544  0.51 --- SIGSEGV {si_signo=SIGSEGV,
>>> si_code=SI_KERNEL, si_addr=0} ---
>>> 138551  0.105048 +++ killed by SIGSEGV (core dumped) +++
>>> 138550  0.41 +++ killed by SIGSEGV (core dumped) +++
>>> 138547  0.08 

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-13 Thread Nithya Balachandran
This is not the same issue as the one you are referring to: that one was in
the RPC layer and caused the bricks to crash. This one is different, as it
seems to be in the dht and rda layers. It does look like a stack overflow,
though.

@Mohammad,

Please send the following information:

1. gluster volume info
2. The number of entries in the directory being listed
3. System memory

Does this still happen if you turn off parallel-readdir?

Regards,
Nithya




On 13 June 2018 at 16:40, Milind Changire  wrote:

> +Nithya
>
> Nithya,
> Do these logs [1]  look similar to the recursive readdir() issue that you
> encountered just a while back ?
> i.e. recursive readdir() response definition in the XDR
>
> [1] http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log
>
>
> On Wed, Jun 13, 2018 at 4:29 PM, mohammad kashif 
> wrote:
>
>> Hi Milind
>>
>> Thanks a lot, I managed to run gdb and produced a backtrace as well. It's here
>>
>> http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log
>>
>>
>> I am trying to understand but still not able to make sense out of it.
>>
>> Thanks
>>
>> Kashif
>>
>> On Wed, Jun 13, 2018 at 11:34 AM, Milind Changire 
>> wrote:
>>
>>> Kashif,
>>> FYI: http://debuginfo.centos.org/centos/6/storage/x86_64/
>>>
>>>
>>> On Wed, Jun 13, 2018 at 3:21 PM, mohammad kashif 
>>> wrote:
>>>
 Hi Milind

 There is no glusterfs-debuginfo available for gluster-3.12 from
 http://mirror.centos.org/centos/6/storage/x86_64/gluster-3.12/ repo.
 Do you know from where I can get it?
 Also when I run gdb, it says

 Missing separate debuginfos, use: debuginfo-install
 glusterfs-fuse-3.12.9-1.el6.x86_64

 I can't find debug package for glusterfs-fuse either

 Thanks from the pit of despair ;)

 Kashif


 On Tue, Jun 12, 2018 at 5:01 PM, mohammad kashif >>> > wrote:

> Hi Milind
>
> I will send you links for logs.
>
> I collected these core dumps at client and there is no glusterd
> process running on client.
>
> Kashif
>
>
>
> On Tue, Jun 12, 2018 at 4:14 PM, Milind Changire 
> wrote:
>
>> Kashif,
>> Could you also send over the client/mount log file as Vijay suggested
>> ?
>> Or maybe the lines with the crash backtrace lines
>>
>> Also, you've mentioned that you straced glusterd, but when you ran
>> gdb, you ran it over /usr/sbin/glusterfs
>>
>>
>> On Tue, Jun 12, 2018 at 8:19 PM, Vijay Bellur 
>> wrote:
>>
>>>
>>>
>>> On Tue, Jun 12, 2018 at 7:40 AM, mohammad kashif <
>>> kashif.a...@gmail.com> wrote:
>>>
 Hi Milind

 The operating system is Scientific Linux 6 which is based on RHEL6.
 The cpu arch is Intel x86_64.

 I will send you a separate email with link to core dump.

>>>
>>>
>>> You could also grep for crash in the client log file and the lines
>>> following crash would have a backtrace in most cases.
>>>
>>> HTH,
>>> Vijay
>>>
>>>

 Thanks for your help.

 Kashif


 On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire <
 mchan...@redhat.com> wrote:

> Kashif,
> Could you share the core dump via Google Drive or something similar
>
> Also, let me know the CPU arch and OS Distribution on which you
> are running gluster.
>
> If you've installed the glusterfs-debuginfo package, you'll also
> get the source lines in the backtrace via gdb
>
>
>
> On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif <
> kashif.a...@gmail.com> wrote:
>
>> Hi Milind, Vijay
>>
>> Thanks, I have some more information now as I straced glusterd on
>> client
>>
>> 138544  0.000131 mprotect(0x7f2f70785000, 4096,
>> PROT_READ|PROT_WRITE) = 0 <0.26>
>> 138544  0.000128 mprotect(0x7f2f70786000, 4096,
>> PROT_READ|PROT_WRITE) = 0 <0.27>
>> 138544  0.000126 mprotect(0x7f2f70787000, 4096,
>> PROT_READ|PROT_WRITE) = 0 <0.27>
>> 138544  0.000124 --- SIGSEGV {si_signo=SIGSEGV,
>> si_code=SEGV_ACCERR, si_addr=0x7f2f7c60ef88} ---
>> 138544  0.51 --- SIGSEGV {si_signo=SIGSEGV,
>> si_code=SI_KERNEL, si_addr=0} ---
>> 138551  0.105048 +++ killed by SIGSEGV (core dumped) +++
>> 138550  0.41 +++ killed by SIGSEGV (core dumped) +++
>> 138547  0.08 +++ killed by SIGSEGV (core dumped) +++
>> 138546  0.07 +++ killed by SIGSEGV (core dumped) +++
>> 138545  0.07 +++ killed by SIGSEGV (core dumped) +++
>> 138544  0.08 +++ killed by SIGSEGV (core dumped) +++
>> 138543  0.07 +++ killed by SIGSEGV (core dumped) +++
>>
>> As for I 

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-13 Thread Milind Changire
+Nithya

Nithya,
Do these logs [1] look similar to the recursive readdir() issue that you
encountered just a while back?
i.e. recursive readdir() response definition in the XDR

[1] http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log


On Wed, Jun 13, 2018 at 4:29 PM, mohammad kashif 
wrote:

> Hi Milind
>
> Thanks a lot, I managed to run gdb and produced a backtrace as well. It's here
>
> http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log
>
>
> I am trying to understand but still not able to make sense out of it.
>
> Thanks
>
> Kashif
>
> On Wed, Jun 13, 2018 at 11:34 AM, Milind Changire 
> wrote:
>
>> Kashif,
>> FYI: http://debuginfo.centos.org/centos/6/storage/x86_64/
>>
>>
>> On Wed, Jun 13, 2018 at 3:21 PM, mohammad kashif 
>> wrote:
>>
>>> Hi Milind
>>>
>>> There is no glusterfs-debuginfo available for gluster-3.12 from
>>> http://mirror.centos.org/centos/6/storage/x86_64/gluster-3.12/ repo. Do
>>> you know from where I can get it?
>>> Also when I run gdb, it says
>>>
>>> Missing separate debuginfos, use: debuginfo-install
>>> glusterfs-fuse-3.12.9-1.el6.x86_64
>>>
>>> I can't find debug package for glusterfs-fuse either
>>>
>>> Thanks from the pit of despair ;)
>>>
>>> Kashif
>>>
>>>
>>> On Tue, Jun 12, 2018 at 5:01 PM, mohammad kashif 
>>> wrote:
>>>
 Hi Milind

 I will send you links for logs.

 I collected these core dumps at client and there is no glusterd process
 running on client.

 Kashif



 On Tue, Jun 12, 2018 at 4:14 PM, Milind Changire 
 wrote:

> Kashif,
> Could you also send over the client/mount log file as Vijay suggested ?
> Or maybe the lines with the crash backtrace lines
>
> Also, you've mentioned that you straced glusterd, but when you ran
> gdb, you ran it over /usr/sbin/glusterfs
>
>
> On Tue, Jun 12, 2018 at 8:19 PM, Vijay Bellur 
> wrote:
>
>>
>>
>> On Tue, Jun 12, 2018 at 7:40 AM, mohammad kashif <
>> kashif.a...@gmail.com> wrote:
>>
>>> Hi Milind
>>>
>>> The operating system is Scientific Linux 6 which is based on RHEL6.
>>> The cpu arch is Intel x86_64.
>>>
>>> I will send you a separate email with link to core dump.
>>>
>>
>>
>> You could also grep for crash in the client log file and the lines
>> following crash would have a backtrace in most cases.
>>
>> HTH,
>> Vijay
>>
>>
>>>
>>> Thanks for your help.
>>>
>>> Kashif
>>>
>>>
>>> On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire <
>>> mchan...@redhat.com> wrote:
>>>
 Kashif,
 Could you share the core dump via Google Drive or something similar

 Also, let me know the CPU arch and OS Distribution on which you are
 running gluster.

 If you've installed the glusterfs-debuginfo package, you'll also
 get the source lines in the backtrace via gdb



 On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif <
 kashif.a...@gmail.com> wrote:

> Hi Milind, Vijay
>
> Thanks, I have some more information now as I straced glusterd on
> client
>
> 138544  0.000131 mprotect(0x7f2f70785000, 4096,
> PROT_READ|PROT_WRITE) = 0 <0.26>
> 138544  0.000128 mprotect(0x7f2f70786000, 4096,
> PROT_READ|PROT_WRITE) = 0 <0.27>
> 138544  0.000126 mprotect(0x7f2f70787000, 4096,
> PROT_READ|PROT_WRITE) = 0 <0.27>
> 138544  0.000124 --- SIGSEGV {si_signo=SIGSEGV,
> si_code=SEGV_ACCERR, si_addr=0x7f2f7c60ef88} ---
> 138544  0.51 --- SIGSEGV {si_signo=SIGSEGV,
> si_code=SI_KERNEL, si_addr=0} ---
> 138551  0.105048 +++ killed by SIGSEGV (core dumped) +++
> 138550  0.41 +++ killed by SIGSEGV (core dumped) +++
> 138547  0.08 +++ killed by SIGSEGV (core dumped) +++
> 138546  0.07 +++ killed by SIGSEGV (core dumped) +++
> 138545  0.07 +++ killed by SIGSEGV (core dumped) +++
> 138544  0.08 +++ killed by SIGSEGV (core dumped) +++
> 138543  0.07 +++ killed by SIGSEGV (core dumped) +++
>
> As far as I understand, gluster is somehow trying to access
> memory in an inappropriate manner and the kernel sends SIGSEGV
>
> I also got the core dump. I am trying gdb first time so I am not
> sure whether I am using it correctly
>
> gdb /usr/sbin/glusterfs core.138536
>
> It just tells me that the program terminated with signal 11,
> segmentation fault.
>
> The problem is not limited to one client but happening to many
> clients.
>
> I would really appreciate any help, as the whole file system has become
> unusable
>
> Thanks
>
> Kashif

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-13 Thread mohammad kashif
Hi Milind

Thanks a lot, I managed to run gdb and produced a backtrace as well. It's here

http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log


I am trying to understand but still not able to make sense out of it.
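For what it's worth, the backtrace can be produced non-interactively in one command; a sketch based on the `-ex bt -ex quit` invocation used elsewhere in this thread, with `thread apply all bt full` added as an optional extra to capture every thread:

```shell
# Print a full backtrace from the core dump without an interactive
# gdb session; core.138536 is the dump named earlier in the thread.
gdb /usr/sbin/glusterfs core.138536 \
    -ex "set pagination off" \
    -ex "thread apply all bt full" \
    -ex quit |& tee backtrace.log
```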

Thanks

Kashif

On Wed, Jun 13, 2018 at 11:34 AM, Milind Changire 
wrote:

> Kashif,
> FYI: http://debuginfo.centos.org/centos/6/storage/x86_64/
>
>
> On Wed, Jun 13, 2018 at 3:21 PM, mohammad kashif 
> wrote:
>
>> Hi Milind
>>
>> There is no glusterfs-debuginfo available for gluster-3.12 from
>> http://mirror.centos.org/centos/6/storage/x86_64/gluster-3.12/ repo. Do
>> you know from where I can get it?
>> Also when I run gdb, it says
>>
>> Missing separate debuginfos, use: debuginfo-install
>> glusterfs-fuse-3.12.9-1.el6.x86_64
>>
>> I can't find debug package for glusterfs-fuse either
>>
>> Thanks from the pit of despair ;)
>>
>> Kashif
>>
>>
>> On Tue, Jun 12, 2018 at 5:01 PM, mohammad kashif 
>> wrote:
>>
>>> Hi Milind
>>>
>>> I will send you links for logs.
>>>
>>> I collected these core dumps at client and there is no glusterd process
>>> running on client.
>>>
>>> Kashif
>>>
>>>
>>>
>>> On Tue, Jun 12, 2018 at 4:14 PM, Milind Changire 
>>> wrote:
>>>
 Kashif,
 Could you also send over the client/mount log file as Vijay suggested ?
 Or maybe the lines with the crash backtrace lines

 Also, you've mentioned that you straced glusterd, but when you ran gdb,
 you ran it over /usr/sbin/glusterfs


 On Tue, Jun 12, 2018 at 8:19 PM, Vijay Bellur 
 wrote:

>
>
> On Tue, Jun 12, 2018 at 7:40 AM, mohammad kashif <
> kashif.a...@gmail.com> wrote:
>
>> Hi Milind
>>
>> The operating system is Scientific Linux 6 which is based on RHEL6.
>> The cpu arch is Intel x86_64.
>>
>> I will send you a separate email with link to core dump.
>>
>
>
> You could also grep for crash in the client log file and the lines
> following crash would have a backtrace in most cases.
>
> HTH,
> Vijay
>
>
>>
>> Thanks for your help.
>>
>> Kashif
>>
>>
>> On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire > > wrote:
>>
>>> Kashif,
>>> Could you share the core dump via Google Drive or something similar
>>>
>>> Also, let me know the CPU arch and OS Distribution on which you are
>>> running gluster.
>>>
>>> If you've installed the glusterfs-debuginfo package, you'll also get
>>> the source lines in the backtrace via gdb
>>>
>>>
>>>
>>> On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif <
>>> kashif.a...@gmail.com> wrote:
>>>
 Hi Milind, Vijay

 Thanks, I have some more information now as I straced glusterd on
 client

 138544  0.000131 mprotect(0x7f2f70785000, 4096,
 PROT_READ|PROT_WRITE) = 0 <0.26>
 138544  0.000128 mprotect(0x7f2f70786000, 4096,
 PROT_READ|PROT_WRITE) = 0 <0.27>
 138544  0.000126 mprotect(0x7f2f70787000, 4096,
 PROT_READ|PROT_WRITE) = 0 <0.27>
 138544  0.000124 --- SIGSEGV {si_signo=SIGSEGV,
 si_code=SEGV_ACCERR, si_addr=0x7f2f7c60ef88} ---
 138544  0.51 --- SIGSEGV {si_signo=SIGSEGV,
 si_code=SI_KERNEL, si_addr=0} ---
 138551  0.105048 +++ killed by SIGSEGV (core dumped) +++
 138550  0.41 +++ killed by SIGSEGV (core dumped) +++
 138547  0.08 +++ killed by SIGSEGV (core dumped) +++
 138546  0.07 +++ killed by SIGSEGV (core dumped) +++
 138545  0.07 +++ killed by SIGSEGV (core dumped) +++
 138544  0.08 +++ killed by SIGSEGV (core dumped) +++
 138543  0.07 +++ killed by SIGSEGV (core dumped) +++

 As far as I understand, gluster is somehow trying to access memory
 in an inappropriate manner and the kernel sends SIGSEGV

 I also got the core dump. I am trying gdb first time so I am not
 sure whether I am using it correctly

 gdb /usr/sbin/glusterfs core.138536

 It just tells me that the program terminated with signal 11,
 segmentation fault.

 The problem is not limited to one client but happening to many
 clients.

 I would really appreciate any help, as the whole file system has become
 unusable

 Thanks

 Kashif




 On Tue, Jun 12, 2018 at 12:26 PM, Milind Changire <
 mchan...@redhat.com> wrote:

> Kashif,
> You can change the log level by:
> $ gluster volume set <volname> diagnostics.brick-log-level TRACE
> $ gluster volume set <volname> diagnostics.client-log-level TRACE
>
> and see how things fare
>
> If you want fewer logs you can change the log-level to DEBUG
> instead of TRACE.

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-13 Thread Milind Changire
Kashif,
FYI: http://debuginfo.centos.org/centos/6/storage/x86_64/


On Wed, Jun 13, 2018 at 3:21 PM, mohammad kashif 
wrote:

> Hi Milind
>
> There is no glusterfs-debuginfo available for gluster-3.12 from
> http://mirror.centos.org/centos/6/storage/x86_64/gluster-3.12/ repo. Do
> you know from where I can get it?
> Also when I run gdb, it says
>
> Missing separate debuginfos, use: debuginfo-install
> glusterfs-fuse-3.12.9-1.el6.x86_64
>
> I can't find debug package for glusterfs-fuse either
>
> Thanks from the pit of despair ;)
>
> Kashif
>
>
> On Tue, Jun 12, 2018 at 5:01 PM, mohammad kashif 
> wrote:
>
>> Hi Milind
>>
>> I will send you links for logs.
>>
>> I collected these core dumps at client and there is no glusterd process
>> running on client.
>>
>> Kashif
>>
>>
>>
>> On Tue, Jun 12, 2018 at 4:14 PM, Milind Changire 
>> wrote:
>>
>>> Kashif,
>>> Could you also send over the client/mount log file as Vijay suggested ?
>>> Or maybe the lines with the crash backtrace lines
>>>
>>> Also, you've mentioned that you straced glusterd, but when you ran gdb,
>>> you ran it over /usr/sbin/glusterfs
>>>
>>>
>>> On Tue, Jun 12, 2018 at 8:19 PM, Vijay Bellur 
>>> wrote:
>>>


 On Tue, Jun 12, 2018 at 7:40 AM, mohammad kashif >>> > wrote:

> Hi Milind
>
> The operating system is Scientific Linux 6 which is based on RHEL6.
> The cpu arch is Intel x86_64.
>
> I will send you a separate email with link to core dump.
>


 You could also grep for crash in the client log file and the lines
 following crash would have a backtrace in most cases.

 HTH,
 Vijay


>
> Thanks for your help.
>
> Kashif
>
>
> On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire 
> wrote:
>
>> Kashif,
>> Could you share the core dump via Google Drive or something similar
>>
>> Also, let me know the CPU arch and OS Distribution on which you are
>> running gluster.
>>
>> If you've installed the glusterfs-debuginfo package, you'll also get
>> the source lines in the backtrace via gdb
>>
>>
>>
>> On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif <
>> kashif.a...@gmail.com> wrote:
>>
>>> Hi Milind, Vijay
>>>
>>> Thanks, I have some more information now as I straced glusterd on
>>> client
>>>
>>> 138544  0.000131 mprotect(0x7f2f70785000, 4096,
>>> PROT_READ|PROT_WRITE) = 0 <0.26>
>>> 138544  0.000128 mprotect(0x7f2f70786000, 4096,
>>> PROT_READ|PROT_WRITE) = 0 <0.27>
>>> 138544  0.000126 mprotect(0x7f2f70787000, 4096,
>>> PROT_READ|PROT_WRITE) = 0 <0.27>
>>> 138544  0.000124 --- SIGSEGV {si_signo=SIGSEGV,
>>> si_code=SEGV_ACCERR, si_addr=0x7f2f7c60ef88} ---
>>> 138544  0.51 --- SIGSEGV {si_signo=SIGSEGV,
>>> si_code=SI_KERNEL, si_addr=0} ---
>>> 138551  0.105048 +++ killed by SIGSEGV (core dumped) +++
>>> 138550  0.41 +++ killed by SIGSEGV (core dumped) +++
>>> 138547  0.08 +++ killed by SIGSEGV (core dumped) +++
>>> 138546  0.07 +++ killed by SIGSEGV (core dumped) +++
>>> 138545  0.07 +++ killed by SIGSEGV (core dumped) +++
>>> 138544  0.08 +++ killed by SIGSEGV (core dumped) +++
>>> 138543  0.07 +++ killed by SIGSEGV (core dumped) +++
>>>
>>> As far as I understand, gluster is somehow trying to access memory
>>> in an inappropriate manner and the kernel sends SIGSEGV
>>>
>>> I also got the core dump. I am trying gdb first time so I am not
>>> sure whether I am using it correctly
>>>
>>> gdb /usr/sbin/glusterfs core.138536
>>>
>>> It just tells me that the program terminated with signal 11, segmentation
>>> fault.
>>>
>>> The problem is not limited to one client but happening to many
>>> clients.
>>>
>>> I would really appreciate any help, as the whole file system has become
>>> unusable
>>>
>>> Thanks
>>>
>>> Kashif
>>>
>>>
>>>
>>>
>>> On Tue, Jun 12, 2018 at 12:26 PM, Milind Changire <
>>> mchan...@redhat.com> wrote:
>>>
 Kashif,
 You can change the log level by:
 $ gluster volume set <volname> diagnostics.brick-log-level TRACE
 $ gluster volume set <volname> diagnostics.client-log-level TRACE

 and see how things fare

 If you want fewer logs you can change the log-level to DEBUG
 instead of TRACE.



 On Tue, Jun 12, 2018 at 3:37 PM, mohammad kashif <
 kashif.a...@gmail.com> wrote:

> Hi Vijay
>
> Now it is unmounting every 30 mins !
>
> The server log at 
> /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log
> have this line only
>
> [2018-06-12 09:53:19.303102] I [MSGID: 115013]
> [server-helpers.c:289:do_fd_cleanup] 

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-13 Thread mohammad kashif
Hi Milind

There is no glusterfs-debuginfo available for gluster-3.12 in the
http://mirror.centos.org/centos/6/storage/x86_64/gluster-3.12/ repo. Do you
know where I can get it?
Also, when I run gdb, it says:

Missing separate debuginfos, use: debuginfo-install
glusterfs-fuse-3.12.9-1.el6.x86_64

I can't find a debug package for glusterfs-fuse either.
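The debuginfo packages for the Storage SIG are published on debuginfo.centos.org (the link Milind shares in this thread) rather than in the SIG repo itself. A sketch of a yum repo file that would let `debuginfo-install` find them — the repo file name, section name, and `gpgcheck=0` are assumptions:

```shell
# Point yum at the CentOS debuginfo server (file/section names assumed)
cat > /etc/yum.repos.d/storage-sig-debuginfo.repo <<'EOF'
[storage-sig-debuginfo]
name=CentOS Storage SIG - debuginfo (assumed layout)
baseurl=http://debuginfo.centos.org/centos/6/storage/x86_64/
enabled=1
gpgcheck=0
EOF

# Then install the debuginfo gdb asked for
debuginfo-install glusterfs-fuse-3.12.9-1.el6.x86_64
```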

Thanks from the pit of despair ;)

Kashif


On Tue, Jun 12, 2018 at 5:01 PM, mohammad kashif 
wrote:

> Hi Milind
>
> I will send you links for logs.
>
> I collected these core dumps at client and there is no glusterd process
> running on client.
>
> Kashif
>
>
>
> On Tue, Jun 12, 2018 at 4:14 PM, Milind Changire 
> wrote:
>
>> Kashif,
>> Could you also send over the client/mount log file as Vijay suggested ?
>> Or maybe the lines with the crash backtrace lines
>>
>> Also, you've mentioned that you straced glusterd, but when you ran gdb,
>> you ran it over /usr/sbin/glusterfs
>>
>>
>> On Tue, Jun 12, 2018 at 8:19 PM, Vijay Bellur  wrote:
>>
>>>
>>>
>>> On Tue, Jun 12, 2018 at 7:40 AM, mohammad kashif 
>>> wrote:
>>>
 Hi Milind

 The operating system is Scientific Linux 6 which is based on RHEL6. The
 cpu arch is Intel x86_64.

 I will send you a separate email with link to core dump.

>>>
>>>
>>> You could also grep for crash in the client log file and the lines
>>> following crash would have a backtrace in most cases.
>>>
>>> HTH,
>>> Vijay
>>>
>>>

 Thanks for your help.

 Kashif


 On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire 
 wrote:

> Kashif,
> Could you share the core dump via Google Drive or something similar
>
> Also, let me know the CPU arch and OS Distribution on which you are
> running gluster.
>
> If you've installed the glusterfs-debuginfo package, you'll also get
> the source lines in the backtrace via gdb
>
>
>
> On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif <
> kashif.a...@gmail.com> wrote:
>
>> Hi Milind, Vijay
>>
>> Thanks, I have some more information now as I straced glusterd on
>> client
>>
>> 138544  0.000131 mprotect(0x7f2f70785000, 4096,
>> PROT_READ|PROT_WRITE) = 0 <0.26>
>> 138544  0.000128 mprotect(0x7f2f70786000, 4096,
>> PROT_READ|PROT_WRITE) = 0 <0.27>
>> 138544  0.000126 mprotect(0x7f2f70787000, 4096,
>> PROT_READ|PROT_WRITE) = 0 <0.27>
>> 138544  0.000124 --- SIGSEGV {si_signo=SIGSEGV,
>> si_code=SEGV_ACCERR, si_addr=0x7f2f7c60ef88} ---
>> 138544  0.51 --- SIGSEGV {si_signo=SIGSEGV,
>> si_code=SI_KERNEL, si_addr=0} ---
>> 138551  0.105048 +++ killed by SIGSEGV (core dumped) +++
>> 138550  0.41 +++ killed by SIGSEGV (core dumped) +++
>> 138547  0.08 +++ killed by SIGSEGV (core dumped) +++
>> 138546  0.07 +++ killed by SIGSEGV (core dumped) +++
>> 138545  0.07 +++ killed by SIGSEGV (core dumped) +++
>> 138544  0.08 +++ killed by SIGSEGV (core dumped) +++
>> 138543  0.07 +++ killed by SIGSEGV (core dumped) +++
>>
>> As far as I understand, gluster is somehow trying to access memory
>> in an inappropriate manner and the kernel sends SIGSEGV
>>
>> I also got the core dump. I am trying gdb first time so I am not sure
>> whether I am using it correctly
>>
>> gdb /usr/sbin/glusterfs core.138536
>>
>> It just tells me that the program terminated with signal 11, segmentation
>> fault.
>>
>> The problem is not limited to one client but happening to many
>> clients.
>>
>> I would really appreciate any help, as the whole file system has become
>> unusable
>>
>> Thanks
>>
>> Kashif
>>
>>
>>
>>
>> On Tue, Jun 12, 2018 at 12:26 PM, Milind Changire <
>> mchan...@redhat.com> wrote:
>>
>>> Kashif,
>>> You can change the log level by:
>>> $ gluster volume set <volname> diagnostics.brick-log-level TRACE
>>> $ gluster volume set <volname> diagnostics.client-log-level TRACE
>>>
>>> and see how things fare
>>>
>>> If you want fewer logs you can change the log-level to DEBUG instead
>>> of TRACE.
>>>
>>>
>>>
>>> On Tue, Jun 12, 2018 at 3:37 PM, mohammad kashif <
>>> kashif.a...@gmail.com> wrote:
>>>
 Hi Vijay

 Now it is unmounting every 30 mins !

 The server log at 
 /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log
 have this line only

 [2018-06-12 09:53:19.303102] I [MSGID: 115013]
 [server-helpers.c:289:do_fd_cleanup] 0-atlasglust-server: fd
 cleanup on /atlas/atlasdata/zgubic/hmumu/histograms/v14.3/Signal
 [2018-06-12 09:53:19.306190] I [MSGID: 101055]
 [client_t.c:443:gf_client_unref] 0-atlasglust-server: Shutting
 down connection  -2224879-2018/06/12-09:51:01:4
 

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-12 Thread mohammad kashif
Hi Milind

I will send you links to the logs.

I collected these core dumps on the client and there is no glusterd process
running on the client.

Kashif



On Tue, Jun 12, 2018 at 4:14 PM, Milind Changire 
wrote:

> Kashif,
> Could you also send over the client/mount log file as Vijay suggested ?
> Or maybe the lines with the crash backtrace lines
>
> Also, you've mentioned that you straced glusterd, but when you ran gdb,
> you ran it over /usr/sbin/glusterfs
>
>
> On Tue, Jun 12, 2018 at 8:19 PM, Vijay Bellur  wrote:
>
>>
>>
>> On Tue, Jun 12, 2018 at 7:40 AM, mohammad kashif 
>> wrote:
>>
>>> Hi Milind
>>>
>>> The operating system is Scientific Linux 6 which is based on RHEL6. The
>>> cpu arch is Intel x86_64.
>>>
>>> I will send you a separate email with link to core dump.
>>>
>>
>>
>> You could also grep for crash in the client log file and the lines
>> following crash would have a backtrace in most cases.
>>
>> HTH,
>> Vijay
>>
>>
>>>
>>> Thanks for your help.
>>>
>>> Kashif
>>>
>>>
>>> On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire 
>>> wrote:
>>>
 Kashif,
 Could you share the core dump via Google Drive or something similar

 Also, let me know the CPU arch and OS Distribution on which you are
 running gluster.

 If you've installed the glusterfs-debuginfo package, you'll also get
 the source lines in the backtrace via gdb



 On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif >>> > wrote:

> Hi Milind, Vijay
>
> Thanks, I have some more information now as I straced glusterd on
> client
>
> 138544  0.000131 mprotect(0x7f2f70785000, 4096,
> PROT_READ|PROT_WRITE) = 0 <0.26>
> 138544  0.000128 mprotect(0x7f2f70786000, 4096,
> PROT_READ|PROT_WRITE) = 0 <0.27>
> 138544  0.000126 mprotect(0x7f2f70787000, 4096,
> PROT_READ|PROT_WRITE) = 0 <0.27>
> 138544  0.000124 --- SIGSEGV {si_signo=SIGSEGV,
> si_code=SEGV_ACCERR, si_addr=0x7f2f7c60ef88} ---
> 138544  0.51 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL,
> si_addr=0} ---
> 138551  0.105048 +++ killed by SIGSEGV (core dumped) +++
> 138550  0.41 +++ killed by SIGSEGV (core dumped) +++
> 138547  0.08 +++ killed by SIGSEGV (core dumped) +++
> 138546  0.07 +++ killed by SIGSEGV (core dumped) +++
> 138545  0.07 +++ killed by SIGSEGV (core dumped) +++
> 138544  0.08 +++ killed by SIGSEGV (core dumped) +++
> 138543  0.07 +++ killed by SIGSEGV (core dumped) +++
>
> As far as I understand, gluster is somehow trying to access memory in
> an inappropriate manner and the kernel sends SIGSEGV
>
> I also got the core dump. I am trying gdb first time so I am not sure
> whether I am using it correctly
>
> gdb /usr/sbin/glusterfs core.138536
>
> It just tells me that the program terminated with signal 11, segmentation
> fault.
>
> The problem is not limited to one client but happening to many
> clients.
>
> I would really appreciate any help, as the whole file system has become
> unusable
>
> Thanks
>
> Kashif
>
>
>
>
> On Tue, Jun 12, 2018 at 12:26 PM, Milind Changire  > wrote:
>
>> Kashif,
>> You can change the log level by:
>> $ gluster volume set <volname> diagnostics.brick-log-level TRACE
>> $ gluster volume set <volname> diagnostics.client-log-level TRACE
>>
>> and see how things fare
>>
>> If you want fewer logs you can change the log-level to DEBUG instead
>> of TRACE.
>>
>>
>>
>> On Tue, Jun 12, 2018 at 3:37 PM, mohammad kashif <
>> kashif.a...@gmail.com> wrote:
>>
>>> Hi Vijay
>>>
>>> Now it is unmounting every 30 mins !
>>>
>>> The server log at 
>>> /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log
>>> have this line only
>>>
>>> [2018-06-12 09:53:19.303102] I [MSGID: 115013]
>>> [server-helpers.c:289:do_fd_cleanup] 0-atlasglust-server: fd
>>> cleanup on /atlas/atlasdata/zgubic/hmumu/histograms/v14.3/Signal
>>> [2018-06-12 09:53:19.306190] I [MSGID: 101055]
>>> [client_t.c:443:gf_client_unref] 0-atlasglust-server: Shutting down
>>> connection  -2224879-2018/06/12-09:51:01:4
>>> 60889-atlasglust-client-0-0-0
>>>
>>> There is no other information. Is there any way to increase log
>>> verbosity?
>>>
>>> on the client
>>>
>>> [2018-06-12 09:51:01.744980] I [MSGID: 114057]
>>> [client-handshake.c:1478:select_server_supported_programs]
>>> 0-atlasglust-client-5: Using Program GlusterFS 3.3, Num (1298437), 
>>> Version
>>> (330)
>>> [2018-06-12 09:51:01.746508] I [MSGID: 114046]
>>> [client-handshake.c:1231:client_setvolume_cbk]
>>> 0-atlasglust-client-5: Connected to atlasglust-client-5, attached to 
>>> remote
>>> volume '/glusteratlas/brick006/gv0'.
>>> [2018-06-12 09:51:01.746543] I 

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-12 Thread mohammad kashif
Hi Vijay

I have enabled TRACE for the client and there are lots of trace messages in
the log, but no 'crash'.

The only error I can see is about the inode context being NULL:

[io-cache.c:564:ioc_open_cbk] 0-atlasglust-io-cache: inode context is NULL
(748157d2-274f-4595-9bb6-afb1fb5a0642) [Invalid argument]
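Following Vijay's earlier suggestion, the crash banner and the backtrace that usually follows it can be pulled out of the client log directly; a sketch, where the log file name is an assumption (the FUSE client log under /var/log/glusterfs/ is named after the mount point, with slashes replaced by dashes):

```shell
# Show the crash banner plus ~25 lines of backtrace that follow it.
# 'mnt-atlasglust.log' is a placeholder for the actual mount log name;
# check /var/log/glusterfs/ for the file matching your mount point.
grep -A 25 "crash" /var/log/glusterfs/mnt-atlasglust.log
```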

Kashif

On Tue, Jun 12, 2018 at 3:49 PM, Vijay Bellur  wrote:

>
>
> On Tue, Jun 12, 2018 at 7:40 AM, mohammad kashif 
> wrote:
>
>> Hi Milind
>>
>> The operating system is Scientific Linux 6 which is based on RHEL6. The
>> cpu arch is Intel x86_64.
>>
>> I will send you a separate email with link to core dump.
>>
>
>
> You could also grep for crash in the client log file and the lines
> following crash would have a backtrace in most cases.
>
> HTH,
> Vijay
>
>
>>
>> Thanks for your help.
>>
>> Kashif
>>
>>
>> On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire 
>> wrote:
>>
>>> Kashif,
>>> Could you share the core dump via Google Drive or something similar
>>>
>>> Also, let me know the CPU arch and OS Distribution on which you are
>>> running gluster.
>>>
>>> If you've installed the glusterfs-debuginfo package, you'll also get the
>>> source lines in the backtrace via gdb
>>>
>>>
>>>
>>> On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif 
>>> wrote:
>>>
 Hi Milind, Vijay

 Thanks, I have some more information now as I straced glusterd on client

 138544  0.000131 mprotect(0x7f2f70785000, 4096,
 PROT_READ|PROT_WRITE) = 0 <0.26>
 138544  0.000128 mprotect(0x7f2f70786000, 4096,
 PROT_READ|PROT_WRITE) = 0 <0.27>
 138544  0.000126 mprotect(0x7f2f70787000, 4096,
 PROT_READ|PROT_WRITE) = 0 <0.27>
 138544  0.000124 --- SIGSEGV {si_signo=SIGSEGV,
 si_code=SEGV_ACCERR, si_addr=0x7f2f7c60ef88} ---
 138544  0.51 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL,
 si_addr=0} ---
 138551  0.105048 +++ killed by SIGSEGV (core dumped) +++
 138550  0.41 +++ killed by SIGSEGV (core dumped) +++
 138547  0.08 +++ killed by SIGSEGV (core dumped) +++
 138546  0.07 +++ killed by SIGSEGV (core dumped) +++
 138545  0.07 +++ killed by SIGSEGV (core dumped) +++
 138544  0.08 +++ killed by SIGSEGV (core dumped) +++
 138543  0.07 +++ killed by SIGSEGV (core dumped) +++

 As far as I understand, gluster is somehow trying to access memory in
 an inappropriate manner and the kernel sends SIGSEGV

 I also got the core dump. I am trying gdb first time so I am not sure
 whether I am using it correctly

 gdb /usr/sbin/glusterfs core.138536

 It just tells me that the program terminated with signal 11, segmentation
 fault.

 The problem is not limited to one client but happening to many clients.

 I would really appreciate any help, as the whole file system has become
 unusable

 Thanks

 Kashif




 On Tue, Jun 12, 2018 at 12:26 PM, Milind Changire 
 wrote:

> Kashif,
> You can change the log level by:
> $ gluster volume set <volname> diagnostics.brick-log-level TRACE
> $ gluster volume set <volname> diagnostics.client-log-level TRACE
>
> and see how things fare
>
> If you want fewer logs you can change the log-level to DEBUG instead
> of TRACE.
>
>
>
> On Tue, Jun 12, 2018 at 3:37 PM, mohammad kashif <
> kashif.a...@gmail.com> wrote:
>
>> Hi Vijay
>>
>> Now it is unmounting every 30 mins !
>>
>> The server log at /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log
>> have this line only
>>
>> [2018-06-12 09:53:19.303102] I [MSGID: 115013]
>> [server-helpers.c:289:do_fd_cleanup] 0-atlasglust-server: fd cleanup
>> on /atlas/atlasdata/zgubic/hmumu/histograms/v14.3/Signal
>> [2018-06-12 09:53:19.306190] I [MSGID: 101055]
>> [client_t.c:443:gf_client_unref] 0-atlasglust-server: Shutting down
>> connection  -2224879-2018/06/12-09:51:01:4
>> 60889-atlasglust-client-0-0-0
>>
>> There is no other information. Is there any way to increase log
>> verbosity?
>>
>> on the client
>>
>> [2018-06-12 09:51:01.744980] I [MSGID: 114057]
>> [client-handshake.c:1478:select_server_supported_programs]
>> 0-atlasglust-client-5: Using Program GlusterFS 3.3, Num (1298437), 
>> Version
>> (330)
>> [2018-06-12 09:51:01.746508] I [MSGID: 114046]
>> [client-handshake.c:1231:client_setvolume_cbk]
>> 0-atlasglust-client-5: Connected to atlasglust-client-5, attached to 
>> remote
>> volume '/glusteratlas/brick006/gv0'.
>> [2018-06-12 09:51:01.746543] I [MSGID: 114047]
>> [client-handshake.c:1242:client_setvolume_cbk]
>> 0-atlasglust-client-5: Server and Client lk-version numbers are not same,
>> reopening the fds
>> [2018-06-12 09:51:01.746814] I [MSGID: 114035]
>> [client-handshake.c:202:client_set_lk_version_cbk]
>> 0-atlasglust-client-5: 

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-12 Thread Milind Changire
Kashif,
Could you also send over the client/mount log file, as Vijay suggested?
Or maybe just the lines around the crash backtrace.

Also, you've mentioned that you straced glusterd, but when you ran gdb, you
ran it over /usr/sbin/glusterfs
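A quick way to see which gluster binaries are actually running on the client (and hence which process strace and gdb should target) — nothing gluster-specific is assumed here beyond the process names:

```shell
# List gluster processes; on a pure client there is normally no
# glusterd (the management daemon runs on the servers), only the
# glusterfs FUSE client process serving the mount.
ps -ef | grep '[g]luster'
```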


On Tue, Jun 12, 2018 at 8:19 PM, Vijay Bellur  wrote:

>
>
> On Tue, Jun 12, 2018 at 7:40 AM, mohammad kashif 
> wrote:
>
>> Hi Milind
>>
>> The operating system is Scientific Linux 6 which is based on RHEL6. The
>> cpu arch is Intel x86_64.
>>
>> I will send you a separate email with link to core dump.
>>
>
>
> You could also grep for crash in the client log file and the lines
> following crash would have a backtrace in most cases.
>
> HTH,
> Vijay
>
>
>>
>> Thanks for your help.
>>
>> Kashif
>>
>>
>> On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire 
>> wrote:
>>
>>> Kashif,
>>> Could you share the core dump via Google Drive or something similar
>>>
>>> Also, let me know the CPU arch and OS Distribution on which you are
>>> running gluster.
>>>
>>> If you've installed the glusterfs-debuginfo package, you'll also get the
>>> source lines in the backtrace via gdb
>>>
>>>
>>>
>>> On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif 
>>> wrote:
>>>
 Hi Milind, Vijay

 Thanks, I have some more information now as I straced glusterd on client

 138544  0.000131 mprotect(0x7f2f70785000, 4096,
 PROT_READ|PROT_WRITE) = 0 <0.26>
 138544  0.000128 mprotect(0x7f2f70786000, 4096,
 PROT_READ|PROT_WRITE) = 0 <0.27>
 138544  0.000126 mprotect(0x7f2f70787000, 4096,
 PROT_READ|PROT_WRITE) = 0 <0.27>
 138544  0.000124 --- SIGSEGV {si_signo=SIGSEGV,
 si_code=SEGV_ACCERR, si_addr=0x7f2f7c60ef88} ---
 138544  0.51 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL,
 si_addr=0} ---
 138551  0.105048 +++ killed by SIGSEGV (core dumped) +++
 138550  0.41 +++ killed by SIGSEGV (core dumped) +++
 138547  0.08 +++ killed by SIGSEGV (core dumped) +++
 138546  0.07 +++ killed by SIGSEGV (core dumped) +++
 138545  0.07 +++ killed by SIGSEGV (core dumped) +++
 138544  0.08 +++ killed by SIGSEGV (core dumped) +++
 138543  0.07 +++ killed by SIGSEGV (core dumped) +++

 As far as I understand, gluster is somehow trying to access memory in
 an inappropriate manner and the kernel sends SIGSEGV

 I also got the core dump. I am trying gdb first time so I am not sure
 whether I am using it correctly

 gdb /usr/sbin/glusterfs core.138536

 It just tell me that program terminated with signal 11, segmentation
 fault .

 The problem is not limited to one client but happening to many clients.

 I will really appreciate any help as whole file system has become
 unusable

 Thanks

 Kashif




 On Tue, Jun 12, 2018 at 12:26 PM, Milind Changire 
 wrote:

> Kashif,
> You can change the log level by:
> $ gluster volume set VOLNAME diagnostics.brick-log-level TRACE
> $ gluster volume set VOLNAME diagnostics.client-log-level TRACE
>
> and see how things fare
>
> If you want fewer logs you can change the log-level to DEBUG instead
> of TRACE.
>
>
>
> On Tue, Jun 12, 2018 at 3:37 PM, mohammad kashif <
> kashif.a...@gmail.com> wrote:
>
>> Hi Vijay
>>
>> Now it is unmounting every 30 mins !
>>
>> The server log at /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log
>> has only these lines:
>>
>> 2018-06-12 09:53:19.303102] I [MSGID: 115013]
>> [server-helpers.c:289:do_fd_cleanup] 0-atlasglust-server: fd cleanup
>> on /atlas/atlasdata/zgubic/hmumu/histograms/v14.3/Signal
>> [2018-06-12 09:53:19.306190] I [MSGID: 101055]
>> [client_t.c:443:gf_client_unref] 0-atlasglust-server: Shutting down
>> connection  -2224879-2018/06/12-09:51:01:4
>> 60889-atlasglust-client-0-0-0
>>
>> There is no other information. Is there any way to increase log
>> verbosity?
>>
>> on the client
>>
>> 2018-06-12 09:51:01.744980] I [MSGID: 114057]
>> [client-handshake.c:1478:select_server_supported_programs]
>> 0-atlasglust-client-5: Using Program GlusterFS 3.3, Num (1298437), 
>> Version
>> (330)
>> [2018-06-12 09:51:01.746508] I [MSGID: 114046]
>> [client-handshake.c:1231:client_setvolume_cbk]
>> 0-atlasglust-client-5: Connected to atlasglust-client-5, attached to 
>> remote
>> volume '/glusteratlas/brick006/gv0'.
>> [2018-06-12 09:51:01.746543] I [MSGID: 114047]
>> [client-handshake.c:1242:client_setvolume_cbk]
>> 0-atlasglust-client-5: Server and Client lk-version numbers are not same,
>> reopening the fds
>> [2018-06-12 09:51:01.746814] I [MSGID: 114035]
>> [client-handshake.c:202:client_set_lk_version_cbk]
>> 0-atlasglust-client-5: Server lk version = 1
>> [2018-06-12 09:51:01.748449] I 

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-12 Thread Vijay Bellur
On Tue, Jun 12, 2018 at 7:40 AM, mohammad kashif 
wrote:

> Hi Milind
>
> The operating system is Scientific Linux 6 which is based on RHEL6. The
> cpu arch is Intel x86_64.
>
> I will send you a separate email with link to core dump.
>


You could also grep for 'crash' in the client log file; the lines
following it would have a backtrace in most cases.
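A minimal sketch of that grep (the log path here is an assumption -- the FUSE client names its log after the mount point, e.g. /var/log/glusterfs/mnt-atlas.log):

```shell
# Print each crash marker plus the lines that follow it, which usually
# include "signal received: 11" and the frames of the backtrace.
# Adjust the log path to match your mount point.
grep -A30 -i 'crash' /var/log/glusterfs/mnt-atlas.log
```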

HTH,
Vijay


>
> Thanks for your help.
>
> Kashif
>
>
> On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire 
> wrote:
>
>> Kashif,
>> Could you share the core dump via Google Drive or something similar?
>>
>> Also, let me know the CPU arch and OS Distribution on which you are
>> running gluster.
>>
>> If you've installed the glusterfs-debuginfo package, you'll also get the
>> source lines in the backtrace via gdb
>>
>>
>>
>> On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif 
>> wrote:
>>
>>> Hi Milind, Vijay
>>>
>>> Thanks. I have some more information now, as I straced the glusterfs client:
>>>
>>> 138544  0.000131 mprotect(0x7f2f70785000, 4096,
>>> PROT_READ|PROT_WRITE) = 0 <0.26>
>>> 138544  0.000128 mprotect(0x7f2f70786000, 4096,
>>> PROT_READ|PROT_WRITE) = 0 <0.27>
>>> 138544  0.000126 mprotect(0x7f2f70787000, 4096,
>>> PROT_READ|PROT_WRITE) = 0 <0.27>
>>> 138544  0.000124 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR,
>>> si_addr=0x7f2f7c60ef88} ---
>>> 138544  0.51 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL,
>>> si_addr=0} ---
>>> 138551  0.105048 +++ killed by SIGSEGV (core dumped) +++
>>> 138550  0.41 +++ killed by SIGSEGV (core dumped) +++
>>> 138547  0.08 +++ killed by SIGSEGV (core dumped) +++
>>> 138546  0.07 +++ killed by SIGSEGV (core dumped) +++
>>> 138545  0.07 +++ killed by SIGSEGV (core dumped) +++
>>> 138544  0.08 +++ killed by SIGSEGV (core dumped) +++
>>> 138543  0.07 +++ killed by SIGSEGV (core dumped) +++
>>>
>>> As far as I understand, gluster is somehow trying to access memory in an
>>> inappropriate manner and the kernel sends SIGSEGV.
>>>
>>> I also got the core dump. I am trying gdb for the first time, so I am not
>>> sure whether I am using it correctly:
>>>
>>> gdb /usr/sbin/glusterfs core.138536
>>>
>>> It just tells me that the program terminated with signal 11 (segmentation
>>> fault).
>>>
>>> The problem is not limited to one client but is happening on many clients.
>>>
>>> I would really appreciate any help, as the whole file system has become
>>> unusable.
>>>
>>> Thanks
>>>
>>> Kashif
>>>
>>>
>>>
>>>
>>> On Tue, Jun 12, 2018 at 12:26 PM, Milind Changire 
>>> wrote:
>>>
 Kashif,
 You can change the log level by:
 $ gluster volume set VOLNAME diagnostics.brick-log-level TRACE
 $ gluster volume set VOLNAME diagnostics.client-log-level TRACE

 and see how things fare

 If you want fewer logs you can change the log-level to DEBUG instead of
 TRACE.



 On Tue, Jun 12, 2018 at 3:37 PM, mohammad kashif >>> > wrote:

> Hi Vijay
>
> Now it is unmounting every 30 mins !
>
> The server log at /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log
> has only these lines:
>
> 2018-06-12 09:53:19.303102] I [MSGID: 115013]
> [server-helpers.c:289:do_fd_cleanup] 0-atlasglust-server: fd cleanup
> on /atlas/atlasdata/zgubic/hmumu/histograms/v14.3/Signal
> [2018-06-12 09:53:19.306190] I [MSGID: 101055]
> [client_t.c:443:gf_client_unref] 0-atlasglust-server: Shutting down
> connection  -2224879-2018/06/12-09:51:01:4
> 60889-atlasglust-client-0-0-0
>
> There is no other information. Is there any way to increase log
> verbosity?
>
> on the client
>
> 2018-06-12 09:51:01.744980] I [MSGID: 114057]
> [client-handshake.c:1478:select_server_supported_programs]
> 0-atlasglust-client-5: Using Program GlusterFS 3.3, Num (1298437), Version
> (330)
> [2018-06-12 09:51:01.746508] I [MSGID: 114046]
> [client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-5:
> Connected to atlasglust-client-5, attached to remote volume
> '/glusteratlas/brick006/gv0'.
> [2018-06-12 09:51:01.746543] I [MSGID: 114047]
> [client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-5:
> Server and Client lk-version numbers are not same, reopening the fds
> [2018-06-12 09:51:01.746814] I [MSGID: 114035]
> [client-handshake.c:202:client_set_lk_version_cbk]
> 0-atlasglust-client-5: Server lk version = 1
> [2018-06-12 09:51:01.748449] I [MSGID: 114057]
> [client-handshake.c:1478:select_server_supported_programs]
> 0-atlasglust-client-6: Using Program GlusterFS 3.3, Num (1298437), Version
> (330)
> [2018-06-12 09:51:01.750219] I [MSGID: 114046]
> [client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-6:
> Connected to atlasglust-client-6, attached to remote volume
> '/glusteratlas/brick007/gv0'.
> [2018-06-12 09:51:01.750261] I [MSGID: 114047]
> 

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-12 Thread mohammad kashif
Hi Milind

The operating system is Scientific Linux 6 which is based on RHEL6. The cpu
arch is Intel x86_64.

I will send you a separate email with link to core dump.

Thanks for your help.

Kashif


On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire 
wrote:

> Kashif,
> Could you share the core dump via Google Drive or something similar?
>
> Also, let me know the CPU arch and OS Distribution on which you are
> running gluster.
>
> If you've installed the glusterfs-debuginfo package, you'll also get the
> source lines in the backtrace via gdb
>
>
>
> On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif 
> wrote:
>
>> Hi Milind, Vijay
>>
>> Thanks. I have some more information now, as I straced the glusterfs client:
>>
>> 138544  0.000131 mprotect(0x7f2f70785000, 4096, PROT_READ|PROT_WRITE)
>> = 0 <0.26>
>> 138544  0.000128 mprotect(0x7f2f70786000, 4096, PROT_READ|PROT_WRITE)
>> = 0 <0.27>
>> 138544  0.000126 mprotect(0x7f2f70787000, 4096, PROT_READ|PROT_WRITE)
>> = 0 <0.27>
>> 138544  0.000124 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR,
>> si_addr=0x7f2f7c60ef88} ---
>> 138544  0.51 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL,
>> si_addr=0} ---
>> 138551  0.105048 +++ killed by SIGSEGV (core dumped) +++
>> 138550  0.41 +++ killed by SIGSEGV (core dumped) +++
>> 138547  0.08 +++ killed by SIGSEGV (core dumped) +++
>> 138546  0.07 +++ killed by SIGSEGV (core dumped) +++
>> 138545  0.07 +++ killed by SIGSEGV (core dumped) +++
>> 138544  0.08 +++ killed by SIGSEGV (core dumped) +++
>> 138543  0.07 +++ killed by SIGSEGV (core dumped) +++
>>
>> As far as I understand, gluster is somehow trying to access memory in an
>> inappropriate manner and the kernel sends SIGSEGV.
>>
>> I also got the core dump. I am trying gdb for the first time, so I am not
>> sure whether I am using it correctly:
>>
>> gdb /usr/sbin/glusterfs core.138536
>>
>> It just tells me that the program terminated with signal 11 (segmentation
>> fault).
>>
>> The problem is not limited to one client but is happening on many clients.
>>
>> I would really appreciate any help, as the whole file system has become
>> unusable.
>>
>> Thanks
>>
>> Kashif
>>
>>
>>
>>
>> On Tue, Jun 12, 2018 at 12:26 PM, Milind Changire 
>> wrote:
>>
>>> Kashif,
>>> You can change the log level by:
>>> $ gluster volume set VOLNAME diagnostics.brick-log-level TRACE
>>> $ gluster volume set VOLNAME diagnostics.client-log-level TRACE
>>>
>>> and see how things fare
>>>
>>> If you want fewer logs you can change the log-level to DEBUG instead of
>>> TRACE.
>>>
>>>
>>>
>>> On Tue, Jun 12, 2018 at 3:37 PM, mohammad kashif 
>>> wrote:
>>>
 Hi Vijay

 Now it is unmounting every 30 mins !

 The server log at /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log
 has only these lines:

 2018-06-12 09:53:19.303102] I [MSGID: 115013]
 [server-helpers.c:289:do_fd_cleanup] 0-atlasglust-server: fd cleanup
 on /atlas/atlasdata/zgubic/hmumu/histograms/v14.3/Signal
 [2018-06-12 09:53:19.306190] I [MSGID: 101055]
 [client_t.c:443:gf_client_unref] 0-atlasglust-server: Shutting down
 connection  -2224879-2018/06/12-09:51:01:4
 60889-atlasglust-client-0-0-0

 There is no other information. Is there any way to increase log
 verbosity?

 on the client

 2018-06-12 09:51:01.744980] I [MSGID: 114057]
 [client-handshake.c:1478:select_server_supported_programs]
 0-atlasglust-client-5: Using Program GlusterFS 3.3, Num (1298437), Version
 (330)
 [2018-06-12 09:51:01.746508] I [MSGID: 114046]
 [client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-5:
 Connected to atlasglust-client-5, attached to remote volume
 '/glusteratlas/brick006/gv0'.
 [2018-06-12 09:51:01.746543] I [MSGID: 114047]
 [client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-5:
 Server and Client lk-version numbers are not same, reopening the fds
 [2018-06-12 09:51:01.746814] I [MSGID: 114035]
 [client-handshake.c:202:client_set_lk_version_cbk]
 0-atlasglust-client-5: Server lk version = 1
 [2018-06-12 09:51:01.748449] I [MSGID: 114057]
 [client-handshake.c:1478:select_server_supported_programs]
 0-atlasglust-client-6: Using Program GlusterFS 3.3, Num (1298437), Version
 (330)
 [2018-06-12 09:51:01.750219] I [MSGID: 114046]
 [client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-6:
 Connected to atlasglust-client-6, attached to remote volume
 '/glusteratlas/brick007/gv0'.
 [2018-06-12 09:51:01.750261] I [MSGID: 114047]
 [client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-6:
 Server and Client lk-version numbers are not same, reopening the fds
 [2018-06-12 09:51:01.750503] I [MSGID: 114035]
 [client-handshake.c:202:client_set_lk_version_cbk]
 0-atlasglust-client-6: Server lk version = 1
 [2018-06-12 09:51:01.752207] I [fuse-bridge.c:4205:fuse_init]
 

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-12 Thread Milind Changire
Kashif,
Could you share the core dump via Google Drive or something similar?

Also, let me know the CPU arch and OS distribution on which you are running
gluster.

If you've installed the glusterfs-debuginfo package, you'll also get the
source lines in the backtrace via gdb.
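A sketch of that setup (repo and package names are assumptions for an SL6/RHEL6 box; `debuginfo-install` ships with yum-utils):

```shell
# Install the matching debuginfo so gdb can show file:line in backtraces.
# debuginfo-install resolves the right debuginfo repo automatically.
yum install -y yum-utils
debuginfo-install -y glusterfs
```

After that, re-running gdb on the same core should show source lines instead of bare addresses.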



On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif 
wrote:

> Hi Milind, Vijay
>
> Thanks. I have some more information now, as I straced the glusterfs client:
>
> 138544  0.000131 mprotect(0x7f2f70785000, 4096, PROT_READ|PROT_WRITE)
> = 0 <0.26>
> 138544  0.000128 mprotect(0x7f2f70786000, 4096, PROT_READ|PROT_WRITE)
> = 0 <0.27>
> 138544  0.000126 mprotect(0x7f2f70787000, 4096, PROT_READ|PROT_WRITE)
> = 0 <0.27>
> 138544  0.000124 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR,
> si_addr=0x7f2f7c60ef88} ---
> 138544  0.51 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL,
> si_addr=0} ---
> 138551  0.105048 +++ killed by SIGSEGV (core dumped) +++
> 138550  0.41 +++ killed by SIGSEGV (core dumped) +++
> 138547  0.08 +++ killed by SIGSEGV (core dumped) +++
> 138546  0.07 +++ killed by SIGSEGV (core dumped) +++
> 138545  0.07 +++ killed by SIGSEGV (core dumped) +++
> 138544  0.08 +++ killed by SIGSEGV (core dumped) +++
> 138543  0.07 +++ killed by SIGSEGV (core dumped) +++
>
> As far as I understand, gluster is somehow trying to access memory in an
> inappropriate manner and the kernel sends SIGSEGV.
>
> I also got the core dump. I am trying gdb for the first time, so I am not
> sure whether I am using it correctly:
>
> gdb /usr/sbin/glusterfs core.138536
>
> It just tells me that the program terminated with signal 11 (segmentation
> fault).
>
> The problem is not limited to one client but is happening on many clients.
>
> I would really appreciate any help, as the whole file system has become
> unusable.
>
> Thanks
>
> Kashif
>
>
>
>
> On Tue, Jun 12, 2018 at 12:26 PM, Milind Changire 
> wrote:
>
>> Kashif,
>> You can change the log level by:
>> $ gluster volume set VOLNAME diagnostics.brick-log-level TRACE
>> $ gluster volume set VOLNAME diagnostics.client-log-level TRACE
>>
>> and see how things fare
>>
>> If you want fewer logs you can change the log-level to DEBUG instead of
>> TRACE.
>>
>>
>>
>> On Tue, Jun 12, 2018 at 3:37 PM, mohammad kashif 
>> wrote:
>>
>>> Hi Vijay
>>>
>>> Now it is unmounting every 30 mins !
>>>
>>> The server log at /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log
>>> has only these lines:
>>>
>>> 2018-06-12 09:53:19.303102] I [MSGID: 115013]
>>> [server-helpers.c:289:do_fd_cleanup] 0-atlasglust-server: fd cleanup on
>>> /atlas/atlasdata/zgubic/hmumu/histograms/v14.3/Signal
>>> [2018-06-12 09:53:19.306190] I [MSGID: 101055]
>>> [client_t.c:443:gf_client_unref] 0-atlasglust-server: Shutting down
>>> connection  -2224879-2018/06/12-09:51:01:4
>>> 60889-atlasglust-client-0-0-0
>>>
>>> There is no other information. Is there any way to increase log
>>> verbosity?
>>>
>>> on the client
>>>
>>> 2018-06-12 09:51:01.744980] I [MSGID: 114057]
>>> [client-handshake.c:1478:select_server_supported_programs]
>>> 0-atlasglust-client-5: Using Program GlusterFS 3.3, Num (1298437), Version
>>> (330)
>>> [2018-06-12 09:51:01.746508] I [MSGID: 114046]
>>> [client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-5:
>>> Connected to atlasglust-client-5, attached to remote volume
>>> '/glusteratlas/brick006/gv0'.
>>> [2018-06-12 09:51:01.746543] I [MSGID: 114047]
>>> [client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-5:
>>> Server and Client lk-version numbers are not same, reopening the fds
>>> [2018-06-12 09:51:01.746814] I [MSGID: 114035]
>>> [client-handshake.c:202:client_set_lk_version_cbk]
>>> 0-atlasglust-client-5: Server lk version = 1
>>> [2018-06-12 09:51:01.748449] I [MSGID: 114057]
>>> [client-handshake.c:1478:select_server_supported_programs]
>>> 0-atlasglust-client-6: Using Program GlusterFS 3.3, Num (1298437), Version
>>> (330)
>>> [2018-06-12 09:51:01.750219] I [MSGID: 114046]
>>> [client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-6:
>>> Connected to atlasglust-client-6, attached to remote volume
>>> '/glusteratlas/brick007/gv0'.
>>> [2018-06-12 09:51:01.750261] I [MSGID: 114047]
>>> [client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-6:
>>> Server and Client lk-version numbers are not same, reopening the fds
>>> [2018-06-12 09:51:01.750503] I [MSGID: 114035]
>>> [client-handshake.c:202:client_set_lk_version_cbk]
>>> 0-atlasglust-client-6: Server lk version = 1
>>> [2018-06-12 09:51:01.752207] I [fuse-bridge.c:4205:fuse_init]
>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel
>>> 7.14
>>> [2018-06-12 09:51:01.752261] I [fuse-bridge.c:4835:fuse_graph_sync]
>>> 0-fuse: switched to graph 0
>>>
>>>
>>> Is there a problem with the server and client lk-version?
>>>
>>> Thanks for your help.
>>>
>>> Kashif
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Jun 11, 2018 at 11:52 PM, Vijay Bellur 
>>> wrote:
>>>


Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-12 Thread mohammad kashif
Hi Milind, Vijay

Thanks. I have some more information now, as I straced the glusterfs client:

138544  0.000131 mprotect(0x7f2f70785000, 4096, PROT_READ|PROT_WRITE) =
0 <0.26>
138544  0.000128 mprotect(0x7f2f70786000, 4096, PROT_READ|PROT_WRITE) =
0 <0.27>
138544  0.000126 mprotect(0x7f2f70787000, 4096, PROT_READ|PROT_WRITE) =
0 <0.27>
138544  0.000124 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR,
si_addr=0x7f2f7c60ef88} ---
138544  0.51 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL,
si_addr=0} ---
138551  0.105048 +++ killed by SIGSEGV (core dumped) +++
138550  0.41 +++ killed by SIGSEGV (core dumped) +++
138547  0.08 +++ killed by SIGSEGV (core dumped) +++
138546  0.07 +++ killed by SIGSEGV (core dumped) +++
138545  0.07 +++ killed by SIGSEGV (core dumped) +++
138544  0.08 +++ killed by SIGSEGV (core dumped) +++
138543  0.07 +++ killed by SIGSEGV (core dumped) +++
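For reference, a trace like the one above can be captured with something along these lines (a sketch; the pgrep pattern is an assumption -- substitute the actual PID of the glusterfs client process):

```shell
# -f follows forked threads, -r prints relative timestamps as seen
# above, -o writes the trace to a file instead of stderr.
strace -f -r -o /tmp/glusterfs.strace -p "$(pgrep -xo glusterfs)"
```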

As far as I understand, gluster is somehow trying to access memory in an
inappropriate manner and the kernel sends SIGSEGV.

I also got the core dump. I am trying gdb for the first time, so I am not
sure whether I am using it correctly:

gdb /usr/sbin/glusterfs core.138536

It just tells me that the program terminated with signal 11 (segmentation
fault).

The problem is not limited to one client but is happening on many clients.

I would really appreciate any help, as the whole file system has become unusable.

Thanks

Kashif




On Tue, Jun 12, 2018 at 12:26 PM, Milind Changire 
wrote:

> Kashif,
> You can change the log level by:
> $ gluster volume set VOLNAME diagnostics.brick-log-level TRACE
> $ gluster volume set VOLNAME diagnostics.client-log-level TRACE
>
> and see how things fare
>
> If you want fewer logs you can change the log-level to DEBUG instead of
> TRACE.
>
>
>
> On Tue, Jun 12, 2018 at 3:37 PM, mohammad kashif 
> wrote:
>
>> Hi Vijay
>>
>> Now it is unmounting every 30 mins !
>>
>> The server log at /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log
>> has only these lines:
>>
>> 2018-06-12 09:53:19.303102] I [MSGID: 115013]
>> [server-helpers.c:289:do_fd_cleanup] 0-atlasglust-server: fd cleanup on
>> /atlas/atlasdata/zgubic/hmumu/histograms/v14.3/Signal
>> [2018-06-12 09:53:19.306190] I [MSGID: 101055]
>> [client_t.c:443:gf_client_unref] 0-atlasglust-server: Shutting down
>> connection  -2224879-2018/06/12-09:51:01:4
>> 60889-atlasglust-client-0-0-0
>>
>> There is no other information. Is there any way to increase log verbosity?
>>
>> on the client
>>
>> 2018-06-12 09:51:01.744980] I [MSGID: 114057]
>> [client-handshake.c:1478:select_server_supported_programs]
>> 0-atlasglust-client-5: Using Program GlusterFS 3.3, Num (1298437), Version
>> (330)
>> [2018-06-12 09:51:01.746508] I [MSGID: 114046]
>> [client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-5:
>> Connected to atlasglust-client-5, attached to remote volume
>> '/glusteratlas/brick006/gv0'.
>> [2018-06-12 09:51:01.746543] I [MSGID: 114047]
>> [client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-5:
>> Server and Client lk-version numbers are not same, reopening the fds
>> [2018-06-12 09:51:01.746814] I [MSGID: 114035]
>> [client-handshake.c:202:client_set_lk_version_cbk]
>> 0-atlasglust-client-5: Server lk version = 1
>> [2018-06-12 09:51:01.748449] I [MSGID: 114057]
>> [client-handshake.c:1478:select_server_supported_programs]
>> 0-atlasglust-client-6: Using Program GlusterFS 3.3, Num (1298437), Version
>> (330)
>> [2018-06-12 09:51:01.750219] I [MSGID: 114046]
>> [client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-6:
>> Connected to atlasglust-client-6, attached to remote volume
>> '/glusteratlas/brick007/gv0'.
>> [2018-06-12 09:51:01.750261] I [MSGID: 114047]
>> [client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-6:
>> Server and Client lk-version numbers are not same, reopening the fds
>> [2018-06-12 09:51:01.750503] I [MSGID: 114035]
>> [client-handshake.c:202:client_set_lk_version_cbk]
>> 0-atlasglust-client-6: Server lk version = 1
>> [2018-06-12 09:51:01.752207] I [fuse-bridge.c:4205:fuse_init]
>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel
>> 7.14
>> [2018-06-12 09:51:01.752261] I [fuse-bridge.c:4835:fuse_graph_sync]
>> 0-fuse: switched to graph 0
>>
>>
>> Is there a problem with the server and client lk-version?
>>
>> Thanks for your help.
>>
>> Kashif
>>
>>
>>
>>
>>
>> On Mon, Jun 11, 2018 at 11:52 PM, Vijay Bellur 
>> wrote:
>>
>>>
>>>
>>> On Mon, Jun 11, 2018 at 8:50 AM, mohammad kashif 
>>> wrote:
>>>
 Hi

 Since I updated our gluster servers and clients to the latest version,
 3.12.9-1, gluster has been getting unmounted from clients very regularly.
 It was not a problem before the update.

 It's a distributed file system with no replication. We have seven servers
 totaling around 480 TB of data. It's 97% full.

 I am using following config on server


 gluster volume 

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-12 Thread Milind Changire
Kashif,
You can change the log level by:
$ gluster volume set VOLNAME diagnostics.brick-log-level TRACE
$ gluster volume set VOLNAME diagnostics.client-log-level TRACE

and see how things fare

If you want fewer logs you can change the log-level to DEBUG instead of
TRACE.
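A sketch of that workflow against the volume discussed in this thread (the volume name `atlasglust` comes from the thread; `volume reset` restores the option's default):

```shell
# Turn up client-side logging, reproduce the unmount, then revert --
# TRACE is extremely verbose, so don't leave it on.
gluster volume set atlasglust diagnostics.client-log-level TRACE
# ...reproduce the problem, collect /var/log/glusterfs/<mountpoint>.log...
gluster volume reset atlasglust diagnostics.client-log-level
```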



On Tue, Jun 12, 2018 at 3:37 PM, mohammad kashif 
wrote:

> Hi Vijay
>
> Now it is unmounting every 30 mins !
>
> The server log at /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log
> has only these lines:
>
> 2018-06-12 09:53:19.303102] I [MSGID: 115013] 
> [server-helpers.c:289:do_fd_cleanup]
> 0-atlasglust-server: fd cleanup on /atlas/atlasdata/zgubic/hmumu/
> histograms/v14.3/Signal
> [2018-06-12 09:53:19.306190] I [MSGID: 101055] 
> [client_t.c:443:gf_client_unref]
> 0-atlasglust-server: Shutting down connection 
> -2224879-2018/06/12-09:51:01:460889-atlasglust-client-0-0-0
>
> There is no other information. Is there any way to increase log verbosity?
>
> on the client
>
> 2018-06-12 09:51:01.744980] I [MSGID: 114057] [client-handshake.c:1478:
> select_server_supported_programs] 0-atlasglust-client-5: Using Program
> GlusterFS 3.3, Num (1298437), Version (330)
> [2018-06-12 09:51:01.746508] I [MSGID: 114046] 
> [client-handshake.c:1231:client_setvolume_cbk]
> 0-atlasglust-client-5: Connected to atlasglust-client-5, attached to remote
> volume '/glusteratlas/brick006/gv0'.
> [2018-06-12 09:51:01.746543] I [MSGID: 114047] 
> [client-handshake.c:1242:client_setvolume_cbk]
> 0-atlasglust-client-5: Server and Client lk-version numbers are not same,
> reopening the fds
> [2018-06-12 09:51:01.746814] I [MSGID: 114035] 
> [client-handshake.c:202:client_set_lk_version_cbk]
> 0-atlasglust-client-5: Server lk version = 1
> [2018-06-12 09:51:01.748449] I [MSGID: 114057] [client-handshake.c:1478:
> select_server_supported_programs] 0-atlasglust-client-6: Using Program
> GlusterFS 3.3, Num (1298437), Version (330)
> [2018-06-12 09:51:01.750219] I [MSGID: 114046] 
> [client-handshake.c:1231:client_setvolume_cbk]
> 0-atlasglust-client-6: Connected to atlasglust-client-6, attached to remote
> volume '/glusteratlas/brick007/gv0'.
> [2018-06-12 09:51:01.750261] I [MSGID: 114047] 
> [client-handshake.c:1242:client_setvolume_cbk]
> 0-atlasglust-client-6: Server and Client lk-version numbers are not same,
> reopening the fds
> [2018-06-12 09:51:01.750503] I [MSGID: 114035] 
> [client-handshake.c:202:client_set_lk_version_cbk]
> 0-atlasglust-client-6: Server lk version = 1
> [2018-06-12 09:51:01.752207] I [fuse-bridge.c:4205:fuse_init]
> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel
> 7.14
> [2018-06-12 09:51:01.752261] I [fuse-bridge.c:4835:fuse_graph_sync]
> 0-fuse: switched to graph 0
>
>
> Is there a problem with the server and client lk-version?
>
> Thanks for your help.
>
> Kashif
>
>
>
>
>
> On Mon, Jun 11, 2018 at 11:52 PM, Vijay Bellur  wrote:
>
>>
>>
>> On Mon, Jun 11, 2018 at 8:50 AM, mohammad kashif 
>> wrote:
>>
>>> Hi
>>>
>>> Since I updated our gluster servers and clients to the latest version,
>>> 3.12.9-1, gluster has been getting unmounted from clients very regularly.
>>> It was not a problem before the update.
>>>
>>> It's a distributed file system with no replication. We have seven servers
>>> totaling around 480 TB of data. It's 97% full.
>>>
>>> I am using following config on server
>>>
>>>
>>> gluster volume set atlasglust features.cache-invalidation on
>>> gluster volume set atlasglust features.cache-invalidation-timeout 600
>>> gluster volume set atlasglust performance.stat-prefetch on
>>> gluster volume set atlasglust performance.cache-invalidation on
>>> gluster volume set atlasglust performance.md-cache-timeout 600
>>> gluster volume set atlasglust performance.parallel-readdir on
>>> gluster volume set atlasglust performance.cache-size 1GB
>>> gluster volume set atlasglust performance.client-io-threads on
>>> gluster volume set atlasglust cluster.lookup-optimize on
>>> gluster volume set atlasglust performance.stat-prefetch on
>>> gluster volume set atlasglust client.event-threads 4
>>> gluster volume set atlasglust server.event-threads 4
>>>
>>> clients are mounted with this option
>>>
>>> defaults,direct-io-mode=disable,attribute-timeout=600,entry-timeout=600,negative-timeout=600,fopen-keep-cache,rw,_netdev
>>>
>>> I can't see anything in the log file. Can someone suggest how to
>>> troubleshoot this issue?
>>>
>>>
>>>
>>
>> Can you please share the log file? Checking for messages related to
>> disconnections/crashes in the log file would be a good way to start
>> troubleshooting the problem.
>>
>> Thanks,
>> Vijay
>>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Milind

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-12 Thread mohammad kashif
Hi Vijay

Now it is unmounting every 30 mins !

The server log at /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log
has only these lines:

2018-06-12 09:53:19.303102] I [MSGID: 115013]
[server-helpers.c:289:do_fd_cleanup] 0-atlasglust-server: fd cleanup on
/atlas/atlasdata/zgubic/hmumu/histograms/v14.3/Signal
[2018-06-12 09:53:19.306190] I [MSGID: 101055]
[client_t.c:443:gf_client_unref] 0-atlasglust-server: Shutting down
connection 
-2224879-2018/06/12-09:51:01:460889-atlasglust-client-0-0-0

There is no other information. Is there any way to increase log verbosity?

on the client

2018-06-12 09:51:01.744980] I [MSGID: 114057]
[client-handshake.c:1478:select_server_supported_programs]
0-atlasglust-client-5: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2018-06-12 09:51:01.746508] I [MSGID: 114046]
[client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-5:
Connected to atlasglust-client-5, attached to remote volume
'/glusteratlas/brick006/gv0'.
[2018-06-12 09:51:01.746543] I [MSGID: 114047]
[client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-5:
Server and Client lk-version numbers are not same, reopening the fds
[2018-06-12 09:51:01.746814] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-atlasglust-client-5:
Server lk version = 1
[2018-06-12 09:51:01.748449] I [MSGID: 114057]
[client-handshake.c:1478:select_server_supported_programs]
0-atlasglust-client-6: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2018-06-12 09:51:01.750219] I [MSGID: 114046]
[client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-6:
Connected to atlasglust-client-6, attached to remote volume
'/glusteratlas/brick007/gv0'.
[2018-06-12 09:51:01.750261] I [MSGID: 114047]
[client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-6:
Server and Client lk-version numbers are not same, reopening the fds
[2018-06-12 09:51:01.750503] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-atlasglust-client-6:
Server lk version = 1
[2018-06-12 09:51:01.752207] I [fuse-bridge.c:4205:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel
7.14
[2018-06-12 09:51:01.752261] I [fuse-bridge.c:4835:fuse_graph_sync] 0-fuse:
switched to graph 0


Is there a problem with the server and client lk-version?

Thanks for your help.

Kashif





On Mon, Jun 11, 2018 at 11:52 PM, Vijay Bellur  wrote:

>
>
> On Mon, Jun 11, 2018 at 8:50 AM, mohammad kashif 
> wrote:
>
>> Hi
>>
>> Since I updated our gluster servers and clients to the latest version,
>> 3.12.9-1, gluster has been getting unmounted from clients very regularly.
>> It was not a problem before the update.
>>
>> It's a distributed file system with no replication. We have seven servers
>> totaling around 480 TB of data. It's 97% full.
>>
>> I am using following config on server
>>
>>
>> gluster volume set atlasglust features.cache-invalidation on
>> gluster volume set atlasglust features.cache-invalidation-timeout 600
>> gluster volume set atlasglust performance.stat-prefetch on
>> gluster volume set atlasglust performance.cache-invalidation on
>> gluster volume set atlasglust performance.md-cache-timeout 600
>> gluster volume set atlasglust performance.parallel-readdir on
>> gluster volume set atlasglust performance.cache-size 1GB
>> gluster volume set atlasglust performance.client-io-threads on
>> gluster volume set atlasglust cluster.lookup-optimize on
>> gluster volume set atlasglust performance.stat-prefetch on
>> gluster volume set atlasglust client.event-threads 4
>> gluster volume set atlasglust server.event-threads 4
>>
>> clients are mounted with this option
>>
>> defaults,direct-io-mode=disable,attribute-timeout=600,entry-timeout=600,negative-timeout=600,fopen-keep-cache,rw,_netdev
>>
>> I can't see anything in the log file. Can someone suggest how to
>> troubleshoot this issue?
>>
>>
>>
>
> Can you please share the log file? Checking for messages related to
> disconnections/crashes in the log file would be a good way to start
> troubleshooting the problem.
>
> Thanks,
> Vijay
>

Re: [Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

2018-06-11 Thread Vijay Bellur
On Mon, Jun 11, 2018 at 8:50 AM, mohammad kashif 
wrote:

> Hi
>
> Since I updated our gluster servers and clients to the latest version,
> 3.12.9-1, gluster has been getting unmounted from clients very regularly.
> It was not a problem before the update.
>
> It's a distributed file system with no replication. We have seven servers
> totaling around 480 TB of data. It's 97% full.
>
> I am using following config on server
>
>
> gluster volume set atlasglust features.cache-invalidation on
> gluster volume set atlasglust features.cache-invalidation-timeout 600
> gluster volume set atlasglust performance.stat-prefetch on
> gluster volume set atlasglust performance.cache-invalidation on
> gluster volume set atlasglust performance.md-cache-timeout 600
> gluster volume set atlasglust performance.parallel-readdir on
> gluster volume set atlasglust performance.cache-size 1GB
> gluster volume set atlasglust performance.client-io-threads on
> gluster volume set atlasglust cluster.lookup-optimize on
> gluster volume set atlasglust performance.stat-prefetch on
> gluster volume set atlasglust client.event-threads 4
> gluster volume set atlasglust server.event-threads 4
>
> clients are mounted with this option
>
> defaults,direct-io-mode=disable,attribute-timeout=600,
> entry-timeout=600,negative-timeout=600,fopen-keep-cache,rw,_netdev
>
> I can't see anything in the log file. Can someone suggest how to
> troubleshoot this issue?
>
>
>

Can you please share the log file? Checking for messages related to
disconnections/crashes in the log file would be a good way to start
troubleshooting the problem.

Thanks,
Vijay