Sometimes one process node owns 2 tasks
---------------------------------------

                 Key: S4-3
                 URL: https://issues.apache.org/jira/browse/S4-3
             Project: Apache S4
          Issue Type: Bug
            Reporter: Gavin Li


When using S4, we found that a single process node sometimes ends up owning 2 
tasks. After some investigation, it seems that the handling of 
ConnectionLossException when creating the ephemeral node is problematic. 
When the response from the ZooKeeper server times out, zookeeper.create() 
fails with ConnectionLossException even though the creation request may already 
have been sent to the server (see 
http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java,
line 830). Our logs show that this is the case we ran into.

Maybe we should handle it the way HBase does 
(http://svn.apache.org/viewvc/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java?view=markup):
simply exit the process when that exception occurs, so that the whole process 
restarts.
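
A minimal sketch of that fail-fast idea follows. It is not the HBase or S4 
code: ConnectionLost, TaskClaimer and FailFastAcquirer are hypothetical 
stand-ins invented for illustration, and the exit action is passed in (it 
would be System::exit in a real process) so the behaviour can be exercised 
without killing the JVM.

```java
import java.util.function.IntConsumer;

// Hypothetical stand-in for ZooKeeper's ConnectionLossException.
class ConnectionLost extends Exception {
    ConnectionLost(String message) {
        super(message);
    }
}

// Hypothetical abstraction over the ephemeral-node create call.
// Returns true if this process now owns the task, false if it was already taken.
interface TaskClaimer {
    boolean tryClaim(String taskPath) throws ConnectionLost;
}

class FailFastAcquirer {
    private final TaskClaimer claimer;
    private final IntConsumer exit; // assumed to be System::exit in production

    FailFastAcquirer(TaskClaimer claimer, IntConsumer exit) {
        this.claimer = claimer;
        this.exit = exit;
    }

    // Returns the claimed task path, or null if nothing was claimed.
    String acquire(Iterable<String> taskPaths) {
        for (String path : taskPaths) {
            try {
                if (claimer.tryClaim(path)) {
                    return path;
                }
            } catch (ConnectionLost e) {
                // The outcome of the create is unknown, so do not try another
                // task. Terminate and let the process be restarted instead.
                exit.accept(1);
                return null;
            }
        }
        return null;
    }
}
```

The reasoning behind exiting is that an ephemeral node is removed when the 
session that created it ends, so a task that was claimed without the client 
knowing about it becomes available again once the old session is gone.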

To be clearer, what happened was: a process node called zookeeper.create() 
to acquire a task, and the request was successfully sent to the ZooKeeper 
server, but the ZooKeeper IO loop timed out before the response arrived, so 
zookeeper.create() failed with ConnectionLossException. The process node then 
ignored this exception and tried to acquire another task, ending up with 2 
tasks.
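
The sequence above can be reproduced with a small in-memory simulation. This 
is only a sketch: FakeCoordinator, LostConnectionException and NaiveNode are 
hypothetical names that model the described behaviour, not the real ZooKeeper 
or S4 classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ZooKeeper's ConnectionLossException.
class LostConnectionException extends Exception {
    LostConnectionException(String message) {
        super(message);
    }
}

// Hypothetical in-memory stand-in for the ZooKeeper server.
class FakeCoordinator {
    private final Map<String, String> owners = new HashMap<>();
    private boolean dropNextResponse;

    // Makes the next successful create lose its response.
    void dropNextResponse() {
        dropNextResponse = true;
    }

    // Returns true if the task node was created for this owner, false if it
    // was already taken. It may throw AFTER the node has been created, which
    // models a response that times out on its way back to the client.
    boolean create(String task, String owner) throws LostConnectionException {
        if (owners.containsKey(task)) {
            return false;
        }
        owners.put(task, owner);
        if (dropNextResponse) {
            dropNextResponse = false;
            throw new LostConnectionException("response lost for " + task);
        }
        return true;
    }

    // Lists the tasks recorded on the "server" for the given owner, sorted.
    List<String> tasksOwnedBy(String owner) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : owners.entrySet()) {
            if (entry.getValue().equals(owner)) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}

class NaiveNode {
    // Mirrors the reported behaviour: ignore the exception and try the next task.
    static String acquire(FakeCoordinator zk, String owner, List<String> tasks) {
        for (String task : tasks) {
            try {
                if (zk.create(task, owner)) {
                    return task;
                }
            } catch (LostConnectionException e) {
                // Ignored: the node wrongly assumes the create did not happen.
            }
        }
        return null;
    }
}
```

If dropNextResponse() is called before NaiveNode.acquire() runs over two free 
tasks, the node believes it owns only the second task while the coordinator 
records it as the owner of both.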

