[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565272#comment-14565272
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1748:
---------------------------------------------------

Sorry for the slowness [~gdrot]! Would you mind rebasing the patch on trunk 
since it doesn't apply cleanly anymore?

Also, a few nits:

In:

{code}
     private void setSockOpts(Socket sock) throws SocketException {
         sock.setTcpNoDelay(true);
+        LOG.info("Settings keepAlive to " + self.tcpKeepAlive +  " on socket");
+        sock.setKeepAlive(self.tcpKeepAlive);
{code}

you can probably drop that LOG.info call, since it's logged from 
QuorumPeer#setTcpKeepAlive already. And for the log call in 
QuorumPeer#setTcpKeepAlive it would be nice to use extrapolation instead of 
string concatenation, i.e.:

{code}
+    public void setTcpKeepAlive(boolean tcpKeepAlive) {
+        LOG.info("tcpKeepAlive set to {}", tcpKeepAlive);
+        this.tcpKeepAlive = tcpKeepAlive;
+    }
{code}

Other than that, +1 lgtm. Btw, we are using this patch internally so I can 
attest it works great too. 

Also, it would be nice to document this new configuration property, i.e.:

{code}
--- a/docs/zookeeperAdmin.html
+++ b/docs/zookeeperAdmin.html
@@ -1358,6 +1358,14 @@ server.3=zoo3:2888:3888</pre>
 </div>
 </div>
 </dd>
+<term>tcpKeepAliveEnabled</term>
+</dt>
+<dd>
+<p>(No Java system property)</p>
+<p><Strong>New in 3.5.0:</strong> Sets the keepAlive flag on the sockets used 
by quorum
+   members to perform elections. Defaults to <strong>false</strong>.
+</p>
+</dd>
         
{code}

Thanks!


> TCP keepalive for leader election connections
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1748
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1748
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: leaderElection
>    Affects Versions: 3.4.5, 3.5.0
>         Environment: Linux, Java 1.7
>            Reporter: Antal Sasvári
>            Assignee: Daniel Peon
>            Priority: Minor
>             Fix For: 3.5.2, 3.6.0
>
>         Attachments: Zookeeper-1748-add_tcp_keepalive.patch
>
>
> In our system we encountered the following problem:
> If the system is stable, and there is no leader election, the leader election 
> port connections are open for very long time without any packets being sent 
> on them.
> Some network elements silently drop the established TCP connection after a 
> timeout if there are no packets being sent on it. In this case the ZK servers 
> will not notice the connection loss. This causes additional delay later when 
> the next leader election is started, as the TCP connections are not alive any 
> more.
> We would like to be able to enable TCP keepalive on the leader election 
> sockets in order to prevent the connection timeout in some network elements 
> due to connection inactivity.
> This could be controlled by adding a new config parameter called tcpKeepAlive 
> in the ZooKeeper configuration file. It would be only applicable in case of 
> algorithm 3 (TCP based fast leader election), having the default value false.
> If tcpKeepAlive is set to true, the TCP keepalive flag should be enabled for 
> the leader election sockets in QuorumCnxManager.setSockOpts() by calling 
> sock.setKeepAlive(true).
> We have tested this change successfully in our environment.
> Please comment whether you see any problem with this. If not, I am going to 
> submit a patch.
> I've been told that e.g. Apache ActiveMQ also has a config option for similar 
> purpose called transport.keepalive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to