Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-20 Thread Vijaikumar M
From the log: 
http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a17%3a10%3a51.tgz 
it looks like glusterd was hung:


Glusterd log:
 5305 [2014-05-20 20:08:55.040665] E 
[glusterd-snapshot.c:3805:glusterd_add_brick_to_snap_volume] 
0-management: Unable to fetch snap device (vol1.brick_snapdevice0). 
Leaving empty
 5306 [2014-05-20 20:08:55.649146] I 
[rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting 
frame-timeout to 600
 5307 [2014-05-20 20:08:55.663181] I 
[rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting 
frame-timeout to 600
 5308 [2014-05-20 20:16:55.541197] W 
[glusterfsd.c:1182:cleanup_and_exit] (--> 0-: received signum (15), 
shutting down


Glusterd was hung when executing the testcase ./tests/bugs/bug-1090042.t.

Cli log:
 72649 [2014-05-20 20:12:51.960765] T 
[rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
 72650 [2014-05-20 20:12:51.960850] T [socket.c:2689:socket_connect] 
(-->/build/install/lib/libglusterfs.so.0(gf_timer_proc+0x1a2) 
[0x7ff8b6609994] 
(-->/build/install/lib/libgfrpc.so.0(rpc_clnt_reconnect+0x137) 
[0x7ff8b5d3305b] (- 
->/build/install/lib/libgfrpc.so.0(rpc_transport_connect+0x74) 
[0x7ff8b5d30071]))) 0-glusterfs: connect () called on transport already 
connected
 72651 [2014-05-20 20:12:52.960943] T 
[rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
 72652 [2014-05-20 20:12:52.960999] T [socket.c:2697:socket_connect] 
0-glusterfs: connecting 0x1e0fcc0, state=0 gen=0 sock=-1
 72653 [2014-05-20 20:12:52.961038] W [dict.c:1059:data_to_str] 
(-->/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(+0xb5f3) 
[0x7ff8ad9e95f3] 
(-->/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(socket_clien 
t_get_remote_sockaddr+0x10a) [0x7ff8ad9ed568] 
(-->/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(client_fill_address_family+0xf1) 
[0x7ff8ad9ec7d0]))) 0-dict: data is NULL
 72654 [2014-05-20 20:12:52.961070] W [dict.c:1059:data_to_str] 
(-->/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(+0xb5f3) 
[0x7ff8ad9e95f3] 
(-->/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(socket_clien 
t_get_remote_sockaddr+0x10a) [0x7ff8ad9ed568] 
(-->/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(client_fill_address_family+0x100) 
[0x7ff8ad9ec7df]))) 0-dict: data is NULL
 72655 [2014-05-20 20:12:52.961079] E 
[name.c:140:client_fill_address_family] 0-glusterfs: 
transport.address-family not specified. Could not guess default value 
from (remote-host:(null) or transport.unix.connect-path:(null)) 
options
 72656 [2014-05-20 20:12:54.961273] T 
[rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
 72657 [2014-05-20 20:12:54.961404] T [socket.c:2689:socket_connect] 
(-->/build/install/lib/libglusterfs.so.0(gf_timer_proc+0x1a2) 
[0x7ff8b6609994] 
(-->/build/install/lib/libgfrpc.so.0(rpc_clnt_reconnect+0x137) 
[0x7ff8b5d3305b] (- 
->/build/install/lib/libgfrpc.so.0(rpc_transport_connect+0x74) 
[0x7ff8b5d30071]))) 0-glusterfs: connect () called on transport already 
connected
 72658 [2014-05-20 20:12:55.120645] D [cli-cmd.c:384:cli_cmd_submit] 
0-cli: Returning 110
 72659 [2014-05-20 20:12:55.120723] D 
[cli-rpc-ops.c:8716:gf_cli_snapshot] 0-cli: Returning 110
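
(For context: the 110 returned by the cli above is ETIMEDOUT on Linux, i.e. the 
snapshot request timed out waiting on glusterd. A stand-alone check, independent 
of the gluster sources:)

#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
        /* Prints "110: Connection timed out" on Linux. */
        printf("%d: %s\n", ETIMEDOUT, strerror(ETIMEDOUT));
        return 0;
}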



Now we need to find out why glusterd was hung.


Thanks,
Vijay



On Wednesday 21 May 2014 06:46 AM, Pranith Kumar Karampuri wrote:

Hey,
 Seems like even after this fix is merged, the regression tests are failing 
for the same script. You can check the logs at 
http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a14%3a06%3a46.tgz

Relevant logs:
[2014-05-20 20:17:07.026045]  : volume create patchy 
build.gluster.org:/d/backends/patchy1 build.gluster.org:/d/backends/patchy2 : 
SUCCESS
[2014-05-20 20:17:08.030673]  : volume start patchy : SUCCESS
[2014-05-20 20:17:08.279148]  : volume barrier patchy enable : SUCCESS
[2014-05-20 20:17:08.476785]  : volume barrier patchy enable : FAILED : Failed 
to reconfigure barrier.
[2014-05-20 20:17:08.727429]  : volume barrier patchy disable : SUCCESS
[2014-05-20 20:17:08.926995]  : volume barrier patchy disable : FAILED : Failed 
to reconfigure barrier.

Pranith

- Original Message -

From: "Pranith Kumar Karampuri" 
To: "Gluster Devel" 
Cc: "Joseph Fernandes" , "Vijaikumar M" 

Sent: Tuesday, May 20, 2014 3:41:11 PM
Subject: Re: Spurious failures because of nfs and snapshots

hi,
 Please resubmit the patches on top of http://review.gluster.com/#/c/7753
 to prevent frequent regression failures.

Pranith
- Original Message -

From: "Vijaikumar M" 
To: "Pranith Kumar Karampuri" 
Cc: "Joseph Fernandes" , "Gluster Devel"

Sent: Monday, May 19, 2014 2:40:47 PM
Subject: Re: Spurious failures because of nfs and snapshots

Brick disconnected with ping-timeout:

Here is the log message
[2014-05-19 04:29:38.13

Re: [Gluster-devel] Fwd: Re: Spurious failures because of nfs and snapshots

2014-05-20 Thread Atin Mukherjee


On 05/21/2014 10:54 AM, SATHEESARAN wrote:
> Guys,
> 
> This is the issue pointed out by Pranith with regard to Barrier.
> I was reading through it.
> 
> But I wanted to bring it to your attention.
> 
> -- S
> 
> 
>  Original Message 
> Subject:  Re: [Gluster-devel] Spurious failures because of nfs and
> snapshots
> Date: Tue, 20 May 2014 21:16:57 -0400 (EDT)
> From: Pranith Kumar Karampuri 
> To:   Vijaikumar M , Joseph Fernandes
> 
> CC:   Gluster Devel 
> 
> 
> 
> Hey,
> Seems like even after this fix is merged, the regression tests are 
> failing for the same script. You can check the logs at 
> http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a14%3a06%3a46.tgz
Pranith,

Is this the correct link? I don't see any log having this sequence there.
Also, looking at the log in this mail, this is expected as per the
barrier functionality: an enable request followed by another enable
should always fail, and the same holds for disable.

Can you please confirm the link and which particular regression test is
causing this issue? Is it bug-1090042.t?

--Atin
> 
> Relevant logs:
> [2014-05-20 20:17:07.026045]  : volume create patchy 
> build.gluster.org:/d/backends/patchy1 build.gluster.org:/d/backends/patchy2 : 
> SUCCESS
> [2014-05-20 20:17:08.030673]  : volume start patchy : SUCCESS
> [2014-05-20 20:17:08.279148]  : volume barrier patchy enable : SUCCESS
> [2014-05-20 20:17:08.476785]  : volume barrier patchy enable : FAILED : 
> Failed to reconfigure barrier.
> [2014-05-20 20:17:08.727429]  : volume barrier patchy disable : SUCCESS
> [2014-05-20 20:17:08.926995]  : volume barrier patchy disable : FAILED : 
> Failed to reconfigure barrier.
> 
> Pranith
> 
> - Original Message -
>> From: "Pranith Kumar Karampuri" 
>> To: "Gluster Devel" 
>> Cc: "Joseph Fernandes" , "Vijaikumar M" 
>> 
>> Sent: Tuesday, May 20, 2014 3:41:11 PM
>> Subject: Re: Spurious failures because of nfs and snapshots
>> 
>> hi,
>> Please resubmit the patches on top of http://review.gluster.com/#/c/7753
>> to prevent frequent regression failures.
>> 
>> Pranith
>> - Original Message -
>> > From: "Vijaikumar M" 
>> > To: "Pranith Kumar Karampuri" 
>> > Cc: "Joseph Fernandes" , "Gluster Devel"
>> > 
>> > Sent: Monday, May 19, 2014 2:40:47 PM
>> > Subject: Re: Spurious failures because of nfs and snapshots
>> > 
>> > Brick disconnected with ping-timeout:
>> > 
>> > Here is the log message
>> > [2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main]
>> > 0-/build/install/sbin/glusterfsd: Started running /build/install/sbi
>> > n/glusterfsd version 3.5qa2 (args: /build/install/sbin/glusterfsd -s
>> > build.gluster.org --volfile-id /snaps/patchy_snap1/3f2ae3fbb4a74587b1a9
>> > 1013f07d327f.build.gluster.org.var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3
>> > -p /var/lib/glusterd/snaps/patchy_snap1/3f2ae3f
>> > bb4a74587b1a91013f07d327f/run/build.gluster.org-var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.pid
>> > -S /var/run/51fe50a6faf0aae006c815da946caf3a.socket --brick-name
>> > /var/run/gluster/snaps/3f2ae3fbb4a74587b1a91013f07d327f/brick3 -l
>> > /build/install/var/log/glusterfs/br
>> > icks/var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.log
>> > --xlator-option *-posix.glusterd-uuid=494ef3cd-15fc-4c8c-8751-2d441ba
>> > 7b4b0 --brick-port 49164 --xlator-option
>> > 3f2ae3fbb4a74587b1a91013f07d327f-server.listen-port=49164)
>> >2 [2014-05-19 04:29:38.141118] I
>> > [rpc-clnt.c:988:rpc_clnt_connection_init] 0-glusterfs: defaulting
>> > ping-timeout to 30secs
>> >3 [2014-05-19 04:30:09.139521] C
>> > [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-glusterfs: server
>> > 10.3.129.13:24007 has not responded in the last 30 seconds, disconnecting.
>> > 
>> > 
>> > 
>> > Patch 'http://review.gluster.org/#/c/7753/' will fix the problem, where
>> > ping-timer will be disabled by default for all the rpc connection except
>> > for glusterd-glusterd (set to 30sec) and client-glusterd (set to 42sec).
>> > 
>> > 
>> > Thanks,
>> > Vijay
>> > 
>> > 
>> > On Monday 19 May 2014 11:56 AM, Pranith Kumar Karampuri wrote:
>> > > The latest build failure also has the same issue:
>> > > Download it from here:
>> > > 

Re: [Gluster-devel] spurious failures in tests/encryption/crypt.t

2014-05-20 Thread Pranith Kumar Karampuri


- Original Message -
> From: "Anand Avati" 
> To: "Pranith Kumar Karampuri" 
> Cc: "Edward Shishkin" , "Gluster Devel" 
> 
> Sent: Wednesday, May 21, 2014 10:53:54 AM
> Subject: Re: [Gluster-devel] spurious failures in tests/encryption/crypt.t
> 
> There are a few suspicious things going on here..
> 
> On Tue, May 20, 2014 at 10:07 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
> 
> >
> > > > hi,
> > > >  crypt.t is failing regression builds once in a while and most of
> > > > the time it is because of the failures just after the remount in the
> > > > script.
> > > >
> > > > TEST rm -f $M0/testfile-symlink
> > > > TEST rm -f $M0/testfile-link
> > > >
> > > > Both of these are failing with ENOTCONN. I got a chance to look at
> > > > the logs. According to the brick logs, this is what I see:
> > > > [2014-05-17 05:43:43.363979] E [posix.c:2272:posix_open]
> > > > 0-patchy-posix: open on /d/backends/patchy1/testfile-symlink:
> > > > Transport endpoint is not connected
> >
> 
> posix_open() happening on a symlink? This should NEVER happen. glusterfs
> itself should NEVER EVER be triggering symlink resolution on the server. In
> this case, for whatever reason an open() is attempted on a symlink, and it
> is getting followed back onto gluster's own mount point (test case is
> creating an absolute link).
> 
> So first find out: who is triggering fop->open() on a symlink. Fix the
> caller.
> 
> Next: add a check in posix_open() to fail with ELOOP or EINVAL if the inode
> is a symlink.

I think I understood what you are saying. An open call for a symlink on the fuse 
mount leads to another open call for the target on the same fuse mount, which 
leads to a deadlock :). Is that why we disallow opens on symlinks in gluster?

Pranith
> 
> 
> > > >
> > > > This is the very first time I saw posix failing with ENOTCONN. Do we
> > > > have these bricks on some other network mounts? I wonder why it fails
> > > > with ENOTCONN.
> > > >
> > > > I also see that it happens right after a call_bail on the mount.
> > > >
> > > > Pranith
> > >
> > > Hello.
> > > OK, I'll try to reproduce it.
> >
> > I tried re-creating the issue on my fedora VM and it happened just now.
> > When this issue happens I am not able to attach the process to gdb. From
> > /proc/ the threads are in the following state for a while now:
> > root@pranith-vm1 - /proc/4053/task
> > 10:20:50 :) ⚡ for i in `ls`; do cat $i/stack; echo
> > "-"; done
> > [] ep_poll+0x21e/0x330
> > [] SyS_epoll_wait+0xd5/0x100
> > [] system_call_fastpath+0x16/0x1b
> > [] 0x
> > -
> > [] hrtimer_nanosleep+0xad/0x170
> > [] SyS_nanosleep+0x66/0x80
> > [] system_call_fastpath+0x16/0x1b
> > [] 0x
> > -
> > [] do_sigtimedwait+0x161/0x200
> > [] SYSC_rt_sigtimedwait+0x76/0xd0
> > [] SyS_rt_sigtimedwait+0xe/0x10
> > [] system_call_fastpath+0x16/0x1b
> > [] 0x
> > -
> > [] futex_wait_queue_me+0xda/0x140
> > [] futex_wait+0x17e/0x290
> > [] do_futex+0xe6/0xc30
> > [] SyS_futex+0x71/0x150
> > [] system_call_fastpath+0x16/0x1b
> > [] 0x
> > -
> > [] futex_wait_queue_me+0xda/0x140
> > [] futex_wait+0x17e/0x290
> > [] do_futex+0xe6/0xc30
> > [] SyS_futex+0x71/0x150
> > [] system_call_fastpath+0x16/0x1b
> > [] 0x
> > -
> > [] futex_wait_queue_me+0xda/0x140
> > [] futex_wait+0x17e/0x290
> > [] do_futex+0xe6/0xc30
> > [] SyS_futex+0x71/0x150
> > [] system_call_fastpath+0x16/0x1b
> > [] 0x
> > -
> > [] wait_answer_interruptible+0x89/0xd0 [fuse]
> >  <<--- This is the important thing I think
> > [] __fuse_request_send+0x232/0x290 [fuse]
> > [] fuse_request_send+0x12/0x20 [fuse]
> > [] fuse_do_open+0xca/0x170 [fuse]
> > [] fuse_open_common+0x56/0x80 [fuse]
> > [] fuse_open+0x10/0x20 [fuse]
> > [] do_dentry_open+0x1eb/0x280
> > [] finish_open+0x31/0x40
> > [] do_last+0x4ca/0xe00
> > [] path_openat+0x420/0x690
> > [] do_filp_open+0x3a/0x90
> > [] do_sys_open+0x12e/0x210
> > [] SyS_open+0x1e/0x20
> > [] system_call_fastpath+0x16/0x1b
> > [] 0x
> > -
> > [] futex_wait_queue_me+0xda/0x140
> > [] futex_wait+0x17e/0x290
> > [] do_futex+0xe6/0xc30
> > [] SyS_futex+0x71/0x150
> > [] system_call_fastpath+0x16/0x1b
> > [] 0x
> > -
> > [] futex_wait_queue_me+0xda/0x140
> > [] futex_wait+0x17e/0x290
> > [] do_futex+0xe6/0xc30
> > [] SyS_futex+0x71/0x150
> > [] system_call_fastpath+0x16/0x1b
> > [] 0x
> > -
> > [] hrtimer_nanosleep+0xad/0x170
> > [] SyS_nanosleep+0x66/0x80
> > [] system_call_fastpath+0x16/0x1b
> > [] 0x
> > -
> >
> > I don't know how to debug further but it seems like the s

Re: [Gluster-devel] spurious failures in tests/encryption/crypt.t

2014-05-20 Thread Anand Avati
There are a few suspicious things going on here..

On Tue, May 20, 2014 at 10:07 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

>
> > > hi,
> > >  crypt.t is failing regression builds once in a while and most of
> > > the time it is because of the failures just after the remount in the
> > > script.
> > >
> > > TEST rm -f $M0/testfile-symlink
> > > TEST rm -f $M0/testfile-link
> > >
> > > Both of these are failing with ENOTCONN. I got a chance to look at
> > > the logs. According to the brick logs, this is what I see:
> > > [2014-05-17 05:43:43.363979] E [posix.c:2272:posix_open]
> > > 0-patchy-posix: open on /d/backends/patchy1/testfile-symlink:
> > > Transport endpoint is not connected
>

posix_open() happening on a symlink? This should NEVER happen. glusterfs
itself should NEVER EVER be triggering symlink resolution on the server. In
this case, for whatever reason an open() is attempted on a symlink, and it
is getting followed back onto gluster's own mount point (test case is
creating an absolute link).

So first find out: who is triggering fop->open() on a symlink. Fix the
caller.

Next: add a check in posix_open() to fail with ELOOP or EINVAL if the inode
is a symlink.
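
To illustrate that second point with plain POSIX calls (this is not the actual
posix xlator code; open_no_follow is a hypothetical helper):

#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

/* Refuse to follow symlinks on the server side instead of resolving them. */
static int open_no_follow(const char *real_path, int flags)
{
        struct stat st;

        if (lstat(real_path, &st) == 0 && S_ISLNK(st.st_mode)) {
                errno = ELOOP;          /* or EINVAL, as suggested above */
                return -1;
        }

        /* O_NOFOLLOW closes the race between the lstat() and the open(). */
        return open(real_path, flags | O_NOFOLLOW);
}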


> > >
> > > This is the very first time I saw posix failing with ENOTCONN. Do we
> > > have these bricks on some other network mounts? I wonder why it fails
> > > with ENOTCONN.
> > >
> > > I also see that it happens right after a call_bail on the mount.
> > >
> > > Pranith
> >
> > Hello.
> > OK, I'll try to reproduce it.
>
> I tried re-creating the issue on my fedora VM and it happened just now.
> When this issue happens I am not able to attach the process to gdb. From
> /proc/ the threads are in the following state for a while now:
> root@pranith-vm1 - /proc/4053/task
> 10:20:50 :) ⚡ for i in `ls`; do cat $i/stack; echo
> "-"; done
> [] ep_poll+0x21e/0x330
> [] SyS_epoll_wait+0xd5/0x100
> [] system_call_fastpath+0x16/0x1b
> [] 0x
> -
> [] hrtimer_nanosleep+0xad/0x170
> [] SyS_nanosleep+0x66/0x80
> [] system_call_fastpath+0x16/0x1b
> [] 0x
> -
> [] do_sigtimedwait+0x161/0x200
> [] SYSC_rt_sigtimedwait+0x76/0xd0
> [] SyS_rt_sigtimedwait+0xe/0x10
> [] system_call_fastpath+0x16/0x1b
> [] 0x
> -
> [] futex_wait_queue_me+0xda/0x140
> [] futex_wait+0x17e/0x290
> [] do_futex+0xe6/0xc30
> [] SyS_futex+0x71/0x150
> [] system_call_fastpath+0x16/0x1b
> [] 0x
> -
> [] futex_wait_queue_me+0xda/0x140
> [] futex_wait+0x17e/0x290
> [] do_futex+0xe6/0xc30
> [] SyS_futex+0x71/0x150
> [] system_call_fastpath+0x16/0x1b
> [] 0x
> -
> [] futex_wait_queue_me+0xda/0x140
> [] futex_wait+0x17e/0x290
> [] do_futex+0xe6/0xc30
> [] SyS_futex+0x71/0x150
> [] system_call_fastpath+0x16/0x1b
> [] 0x
> -
> [] wait_answer_interruptible+0x89/0xd0 [fuse]
>  <<--- This is the important thing I think
> [] __fuse_request_send+0x232/0x290 [fuse]
> [] fuse_request_send+0x12/0x20 [fuse]
> [] fuse_do_open+0xca/0x170 [fuse]
> [] fuse_open_common+0x56/0x80 [fuse]
> [] fuse_open+0x10/0x20 [fuse]
> [] do_dentry_open+0x1eb/0x280
> [] finish_open+0x31/0x40
> [] do_last+0x4ca/0xe00
> [] path_openat+0x420/0x690
> [] do_filp_open+0x3a/0x90
> [] do_sys_open+0x12e/0x210
> [] SyS_open+0x1e/0x20
> [] system_call_fastpath+0x16/0x1b
> [] 0x
> -
> [] futex_wait_queue_me+0xda/0x140
> [] futex_wait+0x17e/0x290
> [] do_futex+0xe6/0xc30
> [] SyS_futex+0x71/0x150
> [] system_call_fastpath+0x16/0x1b
> [] 0x
> -
> [] futex_wait_queue_me+0xda/0x140
> [] futex_wait+0x17e/0x290
> [] do_futex+0xe6/0xc30
> [] SyS_futex+0x71/0x150
> [] system_call_fastpath+0x16/0x1b
> [] 0x
> -
> [] hrtimer_nanosleep+0xad/0x170
> [] SyS_nanosleep+0x66/0x80
> [] system_call_fastpath+0x16/0x1b
> [] 0x
> -
>
> I don't know how to debug further but it seems like the system call hung
>

The threads in the above process belong to glusterfsd, and glusterfsd is
ending up making an open() attempt on a FUSE (its own) mount. Pretty obvious that
it is deadlocking. Find the open()er of the symlink and you have your fix.

Avati


Re: [Gluster-devel] spurious failures in tests/encryption/crypt.t

2014-05-20 Thread Pranith Kumar Karampuri


- Original Message -
> From: "Edward Shishkin" 
> To: "Pranith Kumar Karampuri" 
> Cc: "Vijay Bellur" , "Anand Avati" , 
> "Gluster Devel"
> 
> Sent: Monday, May 19, 2014 6:05:02 PM
> Subject: Re: spurious failures in tests/encryption/crypt.t
> 
> On Sat, 17 May 2014 04:28:45 -0400 (EDT)
> Pranith Kumar Karampuri  wrote:
> 
> > hi,
> >  crypt.t is failing regression builds once in a while and most of
> > the time it is because of the failures just after the remount in the
> > script.
> > 
> > TEST rm -f $M0/testfile-symlink
> > TEST rm -f $M0/testfile-link
> > 
> > Both of these are failing with ENOTCONN. I got a chance to look at
> > the logs. According to the brick logs, this is what I see:
> > [2014-05-17 05:43:43.363979] E [posix.c:2272:posix_open]
> > 0-patchy-posix: open on /d/backends/patchy1/testfile-symlink:
> > Transport endpoint is not connected
> > 
> > This is the very first time I saw posix failing with ENOTCONN. Do we
> > have these bricks on some other network mounts? I wonder why it fails
> > with ENOTCONN.
> > 
> > I also see that it happens right after a call_bail on the mount.
> > 
> > Pranith
> 
> Hello.
> OK, I'll try to reproduce it.

I tried re-creating the issue on my fedora VM and it happened just now. When 
this issue happens I am not able to attach the process to gdb. From /proc/ the 
threads are in the following state for a while now:
root@pranith-vm1 - /proc/4053/task 
10:20:50 :) ⚡ for i in `ls`; do cat $i/stack; echo 
"-"; done
[] ep_poll+0x21e/0x330
[] SyS_epoll_wait+0xd5/0x100
[] system_call_fastpath+0x16/0x1b
[] 0x
-
[] hrtimer_nanosleep+0xad/0x170
[] SyS_nanosleep+0x66/0x80
[] system_call_fastpath+0x16/0x1b
[] 0x
-
[] do_sigtimedwait+0x161/0x200
[] SYSC_rt_sigtimedwait+0x76/0xd0
[] SyS_rt_sigtimedwait+0xe/0x10
[] system_call_fastpath+0x16/0x1b
[] 0x
-
[] futex_wait_queue_me+0xda/0x140
[] futex_wait+0x17e/0x290
[] do_futex+0xe6/0xc30
[] SyS_futex+0x71/0x150
[] system_call_fastpath+0x16/0x1b
[] 0x
-
[] futex_wait_queue_me+0xda/0x140
[] futex_wait+0x17e/0x290
[] do_futex+0xe6/0xc30
[] SyS_futex+0x71/0x150
[] system_call_fastpath+0x16/0x1b
[] 0x
-
[] futex_wait_queue_me+0xda/0x140
[] futex_wait+0x17e/0x290
[] do_futex+0xe6/0xc30
[] SyS_futex+0x71/0x150
[] system_call_fastpath+0x16/0x1b
[] 0x
-
[] wait_answer_interruptible+0x89/0xd0 [fuse]  <<--- 
This is the important thing I think
[] __fuse_request_send+0x232/0x290 [fuse]
[] fuse_request_send+0x12/0x20 [fuse]
[] fuse_do_open+0xca/0x170 [fuse]
[] fuse_open_common+0x56/0x80 [fuse]
[] fuse_open+0x10/0x20 [fuse]
[] do_dentry_open+0x1eb/0x280
[] finish_open+0x31/0x40
[] do_last+0x4ca/0xe00
[] path_openat+0x420/0x690
[] do_filp_open+0x3a/0x90
[] do_sys_open+0x12e/0x210
[] SyS_open+0x1e/0x20
[] system_call_fastpath+0x16/0x1b
[] 0x
-
[] futex_wait_queue_me+0xda/0x140
[] futex_wait+0x17e/0x290
[] do_futex+0xe6/0xc30
[] SyS_futex+0x71/0x150
[] system_call_fastpath+0x16/0x1b
[] 0x
-
[] futex_wait_queue_me+0xda/0x140
[] futex_wait+0x17e/0x290
[] do_futex+0xe6/0xc30
[] SyS_futex+0x71/0x150
[] system_call_fastpath+0x16/0x1b
[] 0x
-
[] hrtimer_nanosleep+0xad/0x170
[] SyS_nanosleep+0x66/0x80
[] system_call_fastpath+0x16/0x1b
[] 0x
-

I don't know how to debug further but it seems like the system call hung

CC Brian Foster.

Pranith
> 
> Thanks for the report!
> Edward.
> 


Re: [Gluster-devel] Need sensible default value for detecting unclean client disconnects

2014-05-20 Thread Anand Avati
Niels,
This is a good addition. While gluster clients do a reasonably good job at
detecting dead/hung servers with ping-timeout, the server side detection
has been rather weak. TCP_KEEPALIVE has helped to some extent, for cases
where an idling client (which holds a lock) goes dead. However, if an active
client with pending data in the server's socket buffer dies, we have been left
waiting for the long TCP retransmission cycle to finish and give up.

The way I see it, this option is complementary to TCP_KEEPALIVE (keepalive
works for idle and only idle connections, user_timeout works only when
there are pending acknowledgements, thus covering the full spectrum). To
that end, it might make sense to present the admin a single timeout
configuration value rather than two. It would be very frustrating for the
admin to configure one of them to, say, 30 seconds, and then find that the
server does not clean up after 30 seconds of a hung client only because the
connection was idle (or not idle). Configuring a second timeout for the
other case can be very unintuitive.

In fact, I would suggest having a single network timeout configuration,
which gets applied to all three: ping-timeout on the client,
user_timeout on the server, and keepalive on both. I think that is what a user
would be expecting anyway. Each is for a slightly different technical
situation, but all of them are just internal details as far as a user is concerned.
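
For reference, a minimal sketch of how one timeout value could drive both knobs
on a connected socket (plain sockets, not the glusterfs rpc-transport code;
set_network_timeout is a hypothetical helper):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_USER_TIMEOUT
#define TCP_USER_TIMEOUT 18     /* from linux/tcp.h, kernel >= 2.6.37 */
#endif

/* Apply one admin-visible timeout (in seconds) to a socket. */
static int set_network_timeout(int sock, int timeout_sec)
{
        int on = 1;
        unsigned int user_timeout_ms = timeout_sec * 1000;
        int idle = timeout_sec;     /* start keepalive probes after this much idle time */
        int intvl = 1, cnt = 5;     /* then probe every second, give up after 5 misses */

        /* Covers the "pending unacknowledged data" case. */
        if (setsockopt(sock, IPPROTO_TCP, TCP_USER_TIMEOUT,
                       &user_timeout_ms, sizeof(user_timeout_ms)) < 0)
                return -1;

        /* Covers the idle-connection case in roughly the same window. */
        if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0 ||
            setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0 ||
            setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0 ||
            setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
                return -1;

        return 0;
}

The keepalive side gives up after roughly idle + intvl * cnt seconds, so both
mechanisms end up in the same ballpark as the single configured value.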

Thoughts?


On Tue, May 20, 2014 at 4:30 AM, Niels de Vos  wrote:

> Hi all,
>
> the last few days I've been looking at a problem [1] where a client
> locks a file over a FUSE-mount, and a 2nd client tries to grab that lock
> too.  It is expected that the 2nd client gets blocked until the 1st
> client releases the lock. This all work as long as the 1st client
> cleanly releases the lock.
>
> Whenever the 1st client crashes (like a kernel panic) or the network is
> split and the 1st client is unreachable, the 2nd client may not get the
> lock until the bricks detect that the connection to the 1st client is
> dead. If there are pending Replies, the bricks may need 15-20 minutes
> until the re-transmissions of the replies have timed-out.
>
> The current default of 15-20 minutes is quite long for a fail-over
> scenario. Relatively recently [2], the Linux kernel got
> a TCP_USER_TIMEOUT socket option (similar to TCP_KEEPALIVE). This option
> can be used to configure a per-socket timeout, instead of a system-wide
> configuration through the net.ipv4.tcp_retries2 sysctl.
>
> The default network.ping-timeout is set to 42 seconds. I'd like to
> propose a network.tcp-timeout option that can be set per volume. This
> option should then set TCP_USER_TIMEOUT for the socket, which causes
> re-transmission failures to be fatal after the timeout has passed.
>
> Now the remaining question, what shall be the default timeout in seconds
> for this new network.tcp-timeout option? I'm currently thinking of
> making it high enough (like 5 minutes) to prevent false positives.
>
> Thoughts and comments welcome,
> Niels
>
>
> 1 https://bugzilla.redhat.com/show_bug.cgi?id=1099460
> 2
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dca43c7


Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-20 Thread Pranith Kumar Karampuri
Hey,
Seems like even after this fix is merged, the regression tests are failing 
for the same script. You can check the logs at 
http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a14%3a06%3a46.tgz

Relevant logs:
[2014-05-20 20:17:07.026045]  : volume create patchy 
build.gluster.org:/d/backends/patchy1 build.gluster.org:/d/backends/patchy2 : 
SUCCESS
[2014-05-20 20:17:08.030673]  : volume start patchy : SUCCESS
[2014-05-20 20:17:08.279148]  : volume barrier patchy enable : SUCCESS
[2014-05-20 20:17:08.476785]  : volume barrier patchy enable : FAILED : Failed 
to reconfigure barrier.
[2014-05-20 20:17:08.727429]  : volume barrier patchy disable : SUCCESS
[2014-05-20 20:17:08.926995]  : volume barrier patchy disable : FAILED : Failed 
to reconfigure barrier.

Pranith

- Original Message -
> From: "Pranith Kumar Karampuri" 
> To: "Gluster Devel" 
> Cc: "Joseph Fernandes" , "Vijaikumar M" 
> 
> Sent: Tuesday, May 20, 2014 3:41:11 PM
> Subject: Re: Spurious failures because of nfs and snapshots
> 
> hi,
> Please resubmit the patches on top of http://review.gluster.com/#/c/7753
> to prevent frequent regression failures.
> 
> Pranith
> - Original Message -
> > From: "Vijaikumar M" 
> > To: "Pranith Kumar Karampuri" 
> > Cc: "Joseph Fernandes" , "Gluster Devel"
> > 
> > Sent: Monday, May 19, 2014 2:40:47 PM
> > Subject: Re: Spurious failures because of nfs and snapshots
> > 
> > Brick disconnected with ping-timeout:
> > 
> > Here is the log message
> > [2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main]
> > 0-/build/install/sbin/glusterfsd: Started running /build/install/sbi
> > n/glusterfsd version 3.5qa2 (args: /build/install/sbin/glusterfsd -s
> > build.gluster.org --volfile-id /snaps/patchy_snap1/3f2ae3fbb4a74587b1a9
> > 1013f07d327f.build.gluster.org.var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3
> > -p /var/lib/glusterd/snaps/patchy_snap1/3f2ae3f
> > bb4a74587b1a91013f07d327f/run/build.gluster.org-var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.pid
> > -S /var/run/51fe50a6faf0aae006c815da946caf3a.socket --brick-name
> > /var/run/gluster/snaps/3f2ae3fbb4a74587b1a91013f07d327f/brick3 -l
> > /build/install/var/log/glusterfs/br
> > icks/var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.log
> > --xlator-option *-posix.glusterd-uuid=494ef3cd-15fc-4c8c-8751-2d441ba
> > 7b4b0 --brick-port 49164 --xlator-option
> > 3f2ae3fbb4a74587b1a91013f07d327f-server.listen-port=49164)
> >2 [2014-05-19 04:29:38.141118] I
> > [rpc-clnt.c:988:rpc_clnt_connection_init] 0-glusterfs: defaulting
> > ping-timeout to 30secs
> >3 [2014-05-19 04:30:09.139521] C
> > [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-glusterfs: server
> > 10.3.129.13:24007 has not responded in the last 30 seconds, disconnecting.
> > 
> > 
> > 
> > Patch 'http://review.gluster.org/#/c/7753/' will fix the problem, where
> > ping-timer will be disabled by default for all the rpc connection except
> > for glusterd-glusterd (set to 30sec) and client-glusterd (set to 42sec).
> > 
> > 
> > Thanks,
> > Vijay
> > 
> > 
> > On Monday 19 May 2014 11:56 AM, Pranith Kumar Karampuri wrote:
> > > The latest build failure also has the same issue:
> > > Download it from here:
> > > http://build.gluster.org:443/logs/glusterfs-logs-20140518%3a22%3a27%3a31.tgz
> > >
> > > Pranith
> > >
> > > - Original Message -
> > >> From: "Vijaikumar M" 
> > >> To: "Joseph Fernandes" 
> > >> Cc: "Pranith Kumar Karampuri" , "Gluster Devel"
> > >> 
> > >> Sent: Monday, 19 May, 2014 11:41:28 AM
> > >> Subject: Re: Spurious failures because of nfs and snapshots
> > >>
> > >> Hi Joseph,
> > >>
> > >> In the log mentioned below, it says ping-timeout is set to the default
> > >> value of 30sec. I think the issue is different.
> > >> Can you please point me to the logs where you were able to re-create
> > >> the problem.
> > >>
> > >> Thanks,
> > >> Vijay
> > >>
> > >>
> > >>
> > >> On Monday 19 May 2014 09:39 AM, Pranith Kumar Karampuri wrote:
> > >>> hi Vijai, Joseph,
> > >>>   In 2 of the last 3 build failures,
> > >>>   http://build.gluster.org/job/regression/4479/console,
> > >>>   http://build.gluste

Re: [Gluster-devel] Split-brain present and future in afr

2014-05-20 Thread Jeff Darcy
> 1. Better protection for split-brain over time.
> 2. Policy based split-brain resolution.
> 3. Provide better availability with client quorum and replica 2.

I would add the following:

(4) Quorum enforcement - any kind - on by default.

(5) Fix the problem of volumes losing quorum because unrelated nodes
went down (i.e. implement volume-level quorum).

(6) Better tools for users to resolve split brain themselves.

> For 3, we are planning to introduce arbiter bricks that can be used to
> determine quorum. The arbiter bricks will be dummy bricks that host only
> files that will be updated from multiple clients. This will be achieved by
> bringing about variable replication count for configurable class of files
> within a volume.
>  In the case of a replicated volume with one arbiter brick per replica group,
>  certain files that are prone to split-brain will be in 3 bricks (2 data
>  bricks + 1 arbiter brick).  All other files will be present in the regular
>  data bricks. For example, when oVirt VM disks are hosted on a replica 2
>  volume, sanlock is used by oVirt for arbitration. sanlock lease files will
>  be written by all clients and VM disks are written by only a single client
>  at any given point of time. In this scenario, we can place sanlock lease
>  files on 2 data + 1 arbiter bricks. The VM disk files will only be present
>  on the 2 data bricks. Client quorum is now determined by looking at 3
>  bricks instead of 2 and we have better protection when network split-brains
>  happen.

Constantly filtering requests to use either N or N+1 bricks is going to be
complicated and hard to debug.  Every data-structure allocation or loop
based on replica count will have to be examined, and many will have to be
modified.  That's a *lot* of places.  This also overlaps significantly
with functionality that can be achieved with data classification (i.e.
supporting multiple replica levels within the same volume).  What use case
requires that it be implemented within AFR instead of more generally and
flexibly?



Re: [Gluster-devel] Need sensible default value for detecting unclean client disconnects

2014-05-20 Thread Niels de Vos
On Tue, May 20, 2014 at 01:30:24PM +0200, Niels de Vos wrote:
> Hi all,
> 
> the last few days I've been looking at a problem [1] where a client 
> locks a file over a FUSE-mount, and a 2nd client tries to grab that lock 
> too.  It is expected that the 2nd client gets blocked until the 1st 
> client releases the lock. This all work as long as the 1st client 
> cleanly releases the lock.
> 
> Whenever the 1st client crashes (like a kernel panic) or the network is 
> split and the 1st client is unreachable, the 2nd client may not get the 
> lock until the bricks detect that the connection to the 1st client is 
> dead. If there are pending Replies, the bricks may need 15-20 minutes 
> until the re-transmissions of the replies have timed-out.
> 
> The current default of 15-20 minutes is quite long for a fail-over 
> scenario. Relatively recently [2], the Linux kernel got 
> a TCP_USER_TIMEOUT socket option (similar to TCP_KEEPALIVE). This option 
> can be used to configure a per-socket timeout, instead of a system-wide 
> configuration through the net.ipv4.tcp_retries2 sysctl.
> 
> The default network.ping-timeout is set to 42 seconds. I'd like to 
> propose a network.tcp-timeout option that can be set per volume. This 
> option should then set TCP_USER_TIMEOUT for the socket, which causes 
> re-transmission failures to be fatal after the timeout has passed.
> 
> Now the remaining question, what shall be the default timeout in seconds 
> for this new network.tcp-timeout option? I'm currently thinking of 
> making it high enough (like 5 minutes) to prevent false positives.
> 
> Thoughts and comments welcome,
> Niels
> 
> 
> 1 https://bugzilla.redhat.com/show_bug.cgi?id=1099460
> 2 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dca43c7

Posted a patch for review: http://review.gluster.org/7814


[Gluster-devel] Split-brain present and future in afr

2014-05-20 Thread Pranith Kumar Karampuri
hi,

Thanks to Vijay Bellur for helping with the re-write of the draft I sent him 
:-).

Present:
Split-brains of files happen in afr today due to 2 primary reasons:

1. Split-brains due to network partition or network split-brains

2. Split-brains due to servers in a replicated group being offline at different 
points in time without self-heal happening in the common period of time when 
the servers were online. For further discussion, this is referred to as 
split-brain over time.

To prevent the occurrence of split-brains, we have the following quorum 
implementations in place:

a> Client quorum - Driven by afr (client) and writes are allowed when majority 
of bricks in a replica group are online. Majority is by default N/2 + 1, where 
N is the replication factor for files in a volume.

b> Server quorum - Driven by glusterd (server) and writes are allowed when 
majority of peers are online. Majority by default is N/2 + 1, where N is the 
number of peers in a trusted storage pool.

Both a> and b> primarily safeguard against network split-brains. The protection 
these quorum implementations offer for split-brain over time scenarios is not 
very high.
Let us consider how replica 3 and replica 2 can be protected against 
split-brains.

Replica 3:
Client quorum is quite effective in this case, as writes are only allowed when 
at least 2 of the 3 bricks that form a replica group are seen by afr/client. A 
recent fix for a corner-case race in client quorum 
(http://review.gluster.org/7600) makes it very robust. This patch is now part 
of master and release-3.5. We plan to backport it to release-3.4 too.

Replica 2:
Majority for client quorum in a deployment with 2 bricks per replica group is 
2.  Hence availability becomes a problem with replica 2 when either of the 
bricks is offline. To provide better availability for replica-2, the first 
brick in a replica set is provided higher weight and quorum is met as long as 
the first brick is online. If the first brick is offline, then quorum is lost. 

Let us consider the following cases with B1 and B2 forming a replicated set:
B1        B2        Quorum
Online    Online    Met
Online    Offline   Met
Offline   Online    Not Met
Offline   Offline   Not Met

Though better availability is provided by client quorum in replica 2 scenarios, 
it is not very optimal and hence an improvement in behavior seems desirable.
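
To make the current behaviour concrete, here is a condensed sketch of the rule
described above (illustration only, not afr's actual quorum code;
client_quorum_met is a hypothetical helper):

#include <stdbool.h>

/* up[i] tells whether brick i of the replica group is reachable. */
static bool client_quorum_met(const bool *up, int replica_count)
{
        int i, alive = 0;

        for (i = 0; i < replica_count; i++)
                if (up[i])
                        alive++;

        if (replica_count == 2)
                /* Replica 2: the first brick carries the extra weight,
                 * so quorum holds only while B1 is online. */
                return up[0];

        /* General case: strict majority, N/2 + 1 bricks must be up. */
        return alive >= replica_count / 2 + 1;
}

For replica 3 this reduces to "at least 2 of 3 bricks up"; for replica 2 it
reproduces the table above.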
Future:

Our focus in afr going forward will be to solve three problems, to provide 
better protection against split-brains and to make resolving them easier:

1. Better protection for split-brain over time.
2. Policy based split-brain resolution.
3. Provide better availability with client quorum and replica 2.

For 1, implementation of outcasting logic will address the problem:
   - An outcast is a copy of a file on which writes have been performed only 
when quorum is met.
   - When a brick goes down and comes back up, the self-heal daemon will mark 
the affected files on the brick that just came back up as outcasts. The outcast 
marking can be implemented even before the brick is declared available to 
regular clients. Once a copy of a file is marked as needing self-heal (or as an 
outcast), writes from clients will not land on that copy till self-heal is 
completed and the outcast tag is removed.

For 2, we plan to provide commands that can heal based on user-configurable 
policies. Examples of policies would be:
 - Pick up the largest file as the winner for resolving a self-heal
 - Choose brick foo as the winner for resolving split-brains
 - Pick up the file with the latest version as the winner (when versioning for 
files is available).

For 3, we are planning to introduce arbiter bricks that can be used to 
determine quorum. The arbiter bricks will be dummy bricks that host only files 
that will be updated from multiple clients. This will be achieved by bringing 
about variable replication count for configurable class of files within a 
volume.
 In the case of a replicated volume with one arbiter brick per replica group, 
certain files that are prone to split-brain will be in 3 bricks (2 data bricks 
+ 1 arbiter brick).  All other files will be present in the regular data 
bricks. For example, when oVirt VM disks are hosted on a replica 2 volume, 
sanlock is used by oVirt for arbitration. sanlock lease files will be written 
by all clients and VM disks are written by only a single client at any given 
point of time. In this scenario, we can place sanlock lease files on 2 data + 1 
arbiter bricks. The VM disk files will only be present on the 2 data bricks. 
Client quorum is now determined by looking at 3 bricks instead of 2 and we have 
better protection when network split-brains happen.
 
 A combination of 1. and 3. does s

[Gluster-devel] Test, pls ignore

2014-05-20 Thread Justin Clift
Ignore this, just testing mailing list archiving...

+ Justin


Re: [Gluster-devel] Changes to Regression script

2014-05-20 Thread Pranith Kumar Karampuri


- Original Message -
> From: "Kaushal M" 
> To: "Pranith Kumar Karampuri" 
> Cc: "Vijay Bellur" , "Gluster Devel" 
> , "gluster-infra"
> 
> Sent: Tuesday, May 20, 2014 4:42:25 PM
> Subject: Re: [Gluster-devel] Changes to Regression script
> 
> The build.gluster.org machine had the PDT timezone set. So the timestamps
> should be UTC-7 or UTC-8 depending on daylight savings. It's currently
> UTC-7.
> 
> Would having the archive timestamps also in UTC help?

Hey,
interesting. I guess we can just do 'export TZ=UTC' in run-tests.sh? prove 
will then print the start time in UTC against each test. Let me send out that 
patch. That should help narrow the search space when trying to figure out the 
relevant logs. The archive timestamp is only there to make sure the files have 
a unique name, isn't it? I am not sure that would help much.
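
Just to illustrate the mechanism (a stand-alone C snippet, not part of prove or
run-tests.sh itself): anything that formats timestamps via localtime() follows
TZ, which is all the export relies on.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
        char buf[64];
        time_t now = time(NULL);

        setenv("TZ", "UTC", 1);   /* same idea as 'export TZ=UTC' in run-tests.sh */
        tzset();

        strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S %Z", localtime(&now));
        printf("%s\n", buf);      /* printed in UTC, matching the glusterfs logs */
        return 0;
}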

Pranith.

> 
> ~kaushal
> 
> 
> On Mon, May 19, 2014 at 10:32 AM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
> 
> >
> >
> > - Original Message -
> > > From: "Vijay Bellur" 
> > > To: "Pranith Kumar Karampuri" 
> > > Cc: "gluster-infra" ,
> > gluster-devel@gluster.org
> > > Sent: Monday, 19 May, 2014 10:03:41 AM
> > > Subject: Re: [Gluster-devel] Changes to Regression script
> > >
> > > On 05/19/2014 09:41 AM, Pranith Kumar Karampuri wrote:
> > > >
> > > >
> > > > - Original Message -
> > > >> From: "Vijay Bellur" 
> > > >> To: "Pranith Kumar Karampuri" 
> > > >> Cc: "gluster-infra" ,
> > gluster-devel@gluster.org
> > > >> Sent: Saturday, 17 May, 2014 2:52:03 PM
> > > >> Subject: Re: [Gluster-devel] Changes to Regression script
> > > >>
> > > >> On 05/17/2014 02:10 PM, Pranith Kumar Karampuri wrote:
> > > >>>
> > > >>>
> > > >>> - Original Message -
> > >  From: "Vijay Bellur" 
> > >  To: "gluster-infra" 
> > >  Cc: gluster-devel@gluster.org
> > >  Sent: Tuesday, May 13, 2014 4:13:02 PM
> > >  Subject: [Gluster-devel] Changes to Regression script
> > > 
> > >  Hi All,
> > > 
> > >  Me and Kaushal have effected the following changes on regression.sh
> > in
> > >  build.gluster.org:
> > > 
> > >  1. If a regression run results in a core and all tests pass, that
> > >  particular run will be flagged as a failure. Previously a core that
> > >  would cause test failures only would get marked as a failure.
> > > 
> > >  2. Cores from a particular test run are now archived and are
> > available
> > >  at /d/archived_builds/. This will also prevent manual intervention
> > for
> > >  managing cores.
> > > 
> > >  3. Logs from failed regression runs are now archived and are
> > available
> > >  at /d/logs/glusterfs-.tgz
> > > 
> > >  Do let us know if you have any comments on these changes.
> > > >>>
> > > >>> This is already proving to be useful :-). I was able to debug one of
> > the
> > > >>> spurious failures for crypt.t. But the only problem is I was not able
> > > >>> copy
> > > >>> out the logs. Had to take avati's help to get the log files. Will it
> > be
> > > >>> possible to give access to these files so that anyone can download
> > them?
> > > >>>
> > > >>
> > > >> Good to know!
> > > >>
> > > >> You can access the .tgz files from:
> > > >>
> > > >> http://build.gluster.org:443/logs/
> > > >
> > > > I was able to access these yesterday. But now it gives 404.
> >
> > It's working now. But how do we convert the timestamp to the logs' timestamp? I
> > want to know the time difference.
> >
> > Pranith.
> >
> > > >
> > >
> > > Fixed.
> > >
> > > -Vijay
> > >
> > >


[Gluster-devel] Need sensible default value for detecting unclean client disconnects

2014-05-20 Thread Niels de Vos
Hi all,

the last few days I've been looking at a problem [1] where a client 
locks a file over a FUSE-mount, and a 2nd client tries to grab that lock 
too.  It is expected that the 2nd client gets blocked until the 1st 
client releases the lock. This all works as long as the 1st client 
cleanly releases the lock.

Whenever the 1st client crashes (like a kernel panic) or the network is 
split and the 1st client is unreachable, the 2nd client may not get the 
lock until the bricks detect that the connection to the 1st client is 
dead. If there are pending Replies, the bricks may need 15-20 minutes 
until the re-transmissions of the replies have timed-out.

The current default of 15-20 minutes is quite long for a fail-over 
scenario. Relatively recently [2], the Linux kernel got 
a TCP_USER_TIMEOUT socket option (similar to TCP_KEEPALIVE). This option 
can be used to configure a per-socket timeout, instead of a system-wide 
configuration through the net.ipv4.tcp_retries2 sysctl.

The default network.ping-timeout is set to 42 seconds. I'd like to 
propose a network.tcp-timeout option that can be set per volume. This 
option should then set TCP_USER_TIMEOUT for the socket, which causes 
re-transmission failures to be fatal after the timeout has passed.

Now the remaining question, what shall be the default timeout in seconds 
for this new network.tcp-timeout option? I'm currently thinking of 
making it high enough (like 5 minutes) to prevent false positives.

Thoughts and comments welcome,
Niels


1 https://bugzilla.redhat.com/show_bug.cgi?id=1099460
2 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dca43c7


Re: [Gluster-devel] Changes to Regression script

2014-05-20 Thread Kaushal M
The build.gluster.org machine had the PDT timezone set. So the timestamps
should be UTC-7 or UTC-8 depending on daylight savings. It's currently
UTC-7.

Would having the archive timestamps also in UTC help?

~kaushal


On Mon, May 19, 2014 at 10:32 AM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

>
>
> - Original Message -
> > From: "Vijay Bellur" 
> > To: "Pranith Kumar Karampuri" 
> > Cc: "gluster-infra" ,
> gluster-devel@gluster.org
> > Sent: Monday, 19 May, 2014 10:03:41 AM
> > Subject: Re: [Gluster-devel] Changes to Regression script
> >
> > On 05/19/2014 09:41 AM, Pranith Kumar Karampuri wrote:
> > >
> > >
> > > - Original Message -
> > >> From: "Vijay Bellur" 
> > >> To: "Pranith Kumar Karampuri" 
> > >> Cc: "gluster-infra" ,
> gluster-devel@gluster.org
> > >> Sent: Saturday, 17 May, 2014 2:52:03 PM
> > >> Subject: Re: [Gluster-devel] Changes to Regression script
> > >>
> > >> On 05/17/2014 02:10 PM, Pranith Kumar Karampuri wrote:
> > >>>
> > >>>
> > >>> - Original Message -
> >  From: "Vijay Bellur" 
> >  To: "gluster-infra" 
> >  Cc: gluster-devel@gluster.org
> >  Sent: Tuesday, May 13, 2014 4:13:02 PM
> >  Subject: [Gluster-devel] Changes to Regression script
> > 
> >  Hi All,
> > 
> >  Me and Kaushal have effected the following changes on regression.sh
> in
> >  build.gluster.org:
> > 
> >  1. If a regression run results in a core and all tests pass, that
> >  particular run will be flagged as a failure. Previously a core that
> >  would cause test failures only would get marked as a failure.
> > 
> >  2. Cores from a particular test run are now archived and are
> available
> >  at /d/archived_builds/. This will also prevent manual intervention
> for
> >  managing cores.
> > 
> >  3. Logs from failed regression runs are now archived and are
> available
> >  at /d/logs/glusterfs-.tgz
> > 
> >  Do let us know if you have any comments on these changes.
> > >>>
> > >>> This is already proving to be useful :-). I was able to debug one of
> the
> > >>> spurious failures for crypt.t. But the only problem is I was not able
> > >>> copy
> > >>> out the logs. Had to take avati's help to get the log files. Will it
> be
> > >>> possible to give access to these files so that anyone can download
> them?
> > >>>
> > >>
> > >> Good to know!
> > >>
> > >> You can access the .tgz files from:
> > >>
> > >> http://build.gluster.org:443/logs/
> > >
> > > I was able to access these yesterday. But now it gives 404.
>
> It's working now. But how do we convert the timestamp to the logs' timestamp? I
> want to know the time difference.
>
> Pranith.
>
> > >
> >
> > Fixed.
> >
> > -Vijay
> >
> >


Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-20 Thread Pranith Kumar Karampuri
hi,
Please resubmit the patches on top of http://review.gluster.com/#/c/7753 to 
prevent frequent regression failures.

Pranith
- Original Message -
> From: "Vijaikumar M" 
> To: "Pranith Kumar Karampuri" 
> Cc: "Joseph Fernandes" , "Gluster Devel" 
> 
> Sent: Monday, May 19, 2014 2:40:47 PM
> Subject: Re: Spurious failures because of nfs and snapshots
> 
> Brick disconnected with ping-timeout:
> 
> Here is the log message
> [2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main]
> 0-/build/install/sbin/glusterfsd: Started running /build/install/sbi
> n/glusterfsd version 3.5qa2 (args: /build/install/sbin/glusterfsd -s
> build.gluster.org --volfile-id /snaps/patchy_snap1/3f2ae3fbb4a74587b1a9
> 1013f07d327f.build.gluster.org.var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3
> -p /var/lib/glusterd/snaps/patchy_snap1/3f2ae3f
> bb4a74587b1a91013f07d327f/run/build.gluster.org-var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.pid
> -S /var/run/51fe50a6faf0aae006c815da946caf3a.socket --brick-name
> /var/run/gluster/snaps/3f2ae3fbb4a74587b1a91013f07d327f/brick3 -l
> /build/install/var/log/glusterfs/br
> icks/var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.log
> --xlator-option *-posix.glusterd-uuid=494ef3cd-15fc-4c8c-8751-2d441ba
> 7b4b0 --brick-port 49164 --xlator-option
> 3f2ae3fbb4a74587b1a91013f07d327f-server.listen-port=49164)
>2 [2014-05-19 04:29:38.141118] I
> [rpc-clnt.c:988:rpc_clnt_connection_init] 0-glusterfs: defaulting
> ping-timeout to 30secs
>3 [2014-05-19 04:30:09.139521] C
> [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-glusterfs: server
> 10.3.129.13:24007 has not responded in the last 30 seconds, disconnecting.
> 
> 
> 
> Patch 'http://review.gluster.org/#/c/7753/' will fix the problem, where
> ping-timer will be disabled by default for all the rpc connection except
> for glusterd-glusterd (set to 30sec) and client-glusterd (set to 42sec).
> 
> 
> Thanks,
> Vijay
> 
> 
> On Monday 19 May 2014 11:56 AM, Pranith Kumar Karampuri wrote:
> > The latest build failure also has the same issue:
> > Download it from here:
> > http://build.gluster.org:443/logs/glusterfs-logs-20140518%3a22%3a27%3a31.tgz
> >
> > Pranith
> >
> > - Original Message -
> >> From: "Vijaikumar M" 
> >> To: "Joseph Fernandes" 
> >> Cc: "Pranith Kumar Karampuri" , "Gluster Devel"
> >> 
> >> Sent: Monday, 19 May, 2014 11:41:28 AM
> >> Subject: Re: Spurious failures because of nfs and snapshots
> >>
> >> Hi Joseph,
> >>
> >> In the log mentioned below, it says ping-timeout is set to the default
> >> value of 30sec. I think the issue is different.
> >> Can you please point me to the logs where you were able to re-create
> >> the problem.
> >>
> >> Thanks,
> >> Vijay
> >>
> >>
> >>
> >> On Monday 19 May 2014 09:39 AM, Pranith Kumar Karampuri wrote:
> >>> hi Vijai, Joseph,
> >>>   In 2 of the last 3 build failures,
> >>>   http://build.gluster.org/job/regression/4479/console,
> >>>   http://build.gluster.org/job/regression/4478/console this
> >>>   test(tests/bugs/bug-1090042.t) failed. Do you guys think it is
> >>>   better
> >>>   to revert this test until the fix is available? Please send a patch
> >>>   to revert the test case if you guys feel so. You can re-submit it
> >>>   along with the fix to the bug mentioned by Joseph.
> >>>
> >>> Pranith.
> >>>
> >>> - Original Message -
>  From: "Joseph Fernandes" 
>  To: "Pranith Kumar Karampuri" 
>  Cc: "Gluster Devel" 
>  Sent: Friday, 16 May, 2014 5:13:57 PM
>  Subject: Re: Spurious failures because of nfs and snapshots
> 
> 
>  Hi All,
> 
>  tests/bugs/bug-1090042.t :
> 
>  I was able to reproduce the issue i.e when this test is done in a loop
> 
>  for i in {1..135} ; do  ./bugs/bug-1090042.t
> 
>  When checked the logs
>  [2014-05-16 10:49:49.003978] I [rpc-clnt.c:973:rpc_clnt_connection_init]
>  0-management: setting frame-timeout to 600
>  [2014-05-16 10:49:49.004035] I [rpc-clnt.c:988:rpc_clnt_connection_init]
>  0-management: defaulting ping-timeout to 30secs
>  [2014-05-16 10:49:49.004303] I [rpc-clnt.c:973:rpc_clnt_connection_init]
>  0-management: setting frame-timeout to 600
>  [2014-05-16 10:49:49.004340] I [rpc-clnt.c:988:rpc_clnt_connection_init]
>  0-management: defaulting ping-timeout to 30secs
> 
>  The issue is with ping-timeout and is tracked under the bug
> 
>  https://bugzilla.redhat.com/show_bug.cgi?id=1096729
> 
> 
>  The workaround is mentioned in
>  https://bugzilla.redhat.com/show_bug.cgi?id=1096729#c8
> 
> 
>  Regards,
>  Joe
> 
>  - Original Message -
>  From: "Pranith Kumar Karampuri" 
>  To: "Gluster Devel" 
>  Cc: "Joseph Fernandes" 
>  Sent: Friday, May 16, 2014 6:19:54 AM
>  Subject: Spurious failures because of nfs and snapshots
> 
>  hi,
>    In

Re: [Gluster-devel] Regression tests: Should we test non-XFS too?

2014-05-20 Thread Vijay Bellur

On 05/19/2014 06:56 AM, Dan Mons wrote:

On 15 May 2014 14:35, Ric Wheeler  wrote:


it is up to those developers and users to test their preferred combination.



Not sure if this was quoting me or someone else.  BtrFS is in-tree for
most distros these days, and RHEL is putting it in as a "technology
preview" in 7, which likely means it'll be supported in a point
release down the road somewhere.  My question was merely if that's
going to be a bigger emphasis for Gluster.org folks to test into the
future, or if XFS is going to remain the default/recommended for a lot
longer yet.

If the answer is "it depends on our customers' needs", then put me
down as one who needs something better than XFS.  I'll happily put in
the hard yards to test BtrFS with GlusterFS, but at the same time I'm
keen to know if that's a wise use of my time or a complete waste of my
time if I'm deviating too far from what RedHat/Gluster.org is planning
on blessing in the future.


From a gluster.org perspective, btrfs is certainly very interesting. 
Integrating with btrfs and exposing its capabilities like bitrot detection, 
snapshots etc. through glusterfs is on the cards.


There have been a few reports of using glusterfs over btrfs in the 
community. I would definitely be interested in hearing more feedback and 
addressing issues in this combination by collaborating with the btrfs 
community.


Regards,
Vijay
