Re: [Gluster-devel] Need sensible default value for detecting unclean client disconnects

2014-05-20 Thread Niels de Vos
On Tue, May 20, 2014 at 01:30:24PM +0200, Niels de Vos wrote:
 Hi all,
 
 the last few days I've been looking at a problem [1] where a client 
 locks a file over a FUSE-mount, and a 2nd client tries to grab that lock 
 too.  It is expected that the 2nd client gets blocked until the 1st 
 client releases the lock. This all works as long as the 1st client 
 cleanly releases the lock.
 
 Whenever the 1st client crashes (like a kernel panic) or the network is 
 split and the 1st client is unreachable, the 2nd client may not get the 
 lock until the bricks detect that the connection to the 1st client is 
 dead. If there are pending replies, the bricks may need 15-20 minutes 
 until the re-transmissions of the replies have timed out.
 
 The current default of 15-20 minutes is quite long for a fail-over 
 scenario. Relatively recently [2], the Linux kernel got 
 a TCP_USER_TIMEOUT socket option (similar to TCP_KEEPALIVE). This option 
 can be used to configure a per-socket timeout, instead of a system-wide 
 configuration through the net.ipv4.tcp_retries2 sysctl.
 
 The default network.ping-timeout is set to 42 seconds. I'd like to 
 propose a network.tcp-timeout option that can be set per volume. This 
 option should then set TCP_USER_TIMEOUT for the socket, which causes 
 re-transmission failures to be fatal after the timeout has passed.
 
 Now the remaining question: what should the default timeout in seconds 
 be for this new network.tcp-timeout option? I'm currently thinking of 
 making it high enough (like 5 minutes) to prevent false positives.
 
 Thoughts and comments welcome,
 Niels
 
 
 [1] https://bugzilla.redhat.com/show_bug.cgi?id=1099460
 [2] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dca43c7

Posted a patch for review: http://review.gluster.org/7814
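As a rough illustration of the mechanics involved (this is a minimal Python sketch, not GlusterFS code): TCP_USER_TIMEOUT takes a per-socket value in milliseconds, after which unacknowledged transmitted data makes the kernel drop the connection instead of retrying for the system-wide tcp_retries2 span. The 300000 ms value mirrors the proposed 5 minute default; Linux only.

```python
import socket

# TCP_USER_TIMEOUT is Linux-specific; fall back to its numeric value (18)
# on Python builds whose socket module does not expose the constant.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Value is in milliseconds: data left unacknowledged for longer than this
# causes the kernel to drop the connection and report an error, rather
# than re-transmitting for the default 15-20 minutes.
s.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, 300 * 1000)

# Read the value back to confirm the kernel accepted it.
readback = s.getsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT)
print(readback)
s.close()
```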
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Need sensible default value for detecting unclean client disconnects

2014-05-20 Thread Anand Avati
Niels,
This is a good addition. While gluster clients do a reasonably good job at
detecting dead/hung servers with ping-timeout, the server side detection
has been rather weak. TCP_KEEPALIVE has helped to some extent, for cases
where an idling client (which holds a lock) goes dead. However, if an active
client with pending data in the server's socket buffer dies, we have had to
wait for the long TCP retransmission cycle to finish and give up.
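For the idle-connection side, the per-socket keepalive knobs look roughly like this (a Python sketch with illustrative values, not GlusterFS defaults; TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT are Linux-specific):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Enable keepalive and tune it per socket instead of via sysctls.
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 30)   # idle seconds before first probe
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # seconds between probes
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # unanswered probes before drop

# Worst case, a dead *idle* peer is detected after 30 + 3 * 10 = 60 seconds;
# a peer with unacknowledged data in flight is not covered by keepalive.
keepidle = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE)
print(keepidle)
s.close()
```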

The way I see it, this option is complementary to TCP_KEEPALIVE (keepalive
works for idle, and only idle, connections; user_timeout works only when
there are pending acknowledgements, so together they cover the full
spectrum). To that end, it might make sense to present the admin with a
single timeout configuration value rather than two. It would be very
frustrating for the admin to configure one of them to, say, 30 seconds, and
then find that the server does not clean up after 30 seconds of a hung
client merely because the connection happened to be idle (or not idle, as
the case may be). Configuring a second timeout for the other case would be
very unintuitive.

In fact, I would suggest having a single network timeout configuration
which gets applied to all three: ping-timeout on the client, user_timeout
on the server, and keepalive on both. I think that is what a user would
expect anyway. Each covers a slightly different technical situation, but
they are all internal details as far as the user is concerned.
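To make the idea concrete, a unified option could derive all the per-socket settings from one admin-facing value. The apply_network_timeout helper below is hypothetical, and the way it splits the budget between keepalive idle time and probes is purely an assumption for illustration; the 42 second argument reuses the ping-timeout default mentioned earlier.

```python
import socket

def apply_network_timeout(sock, seconds):
    """Hypothetical helper: derive both user-timeout and keepalive
    settings from a single admin-facing timeout, as suggested above."""
    TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)

    # Covers connections with unacknowledged data in flight (milliseconds).
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, seconds * 1000)

    # Covers idle connections: first probe after a third of the budget,
    # then enough probes to consume the remaining two thirds.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    idle = max(seconds // 3, 1)
    intvl = max(seconds // 9, 1)
    cnt = max((seconds - idle) // intvl, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, intvl)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, cnt)
    return idle, intvl, cnt

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
settings = apply_network_timeout(s, 42)
print(settings)
s.close()
```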

Thoughts?


On Tue, May 20, 2014 at 4:30 AM, Niels de Vos nde...@redhat.com wrote:

 [...]