[ https://issues.apache.org/jira/browse/ACCUMULO-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner updated ACCUMULO-1277:
-----------------------------------

    Assignee: Keith Turner  (was: Eric Newton)

I had some code in place that delayed deleting empty tserver nodes, but it 
looks like I just dropped it.  Oops.  I'll take a look at this.  Nice 
write-up.
                
> Race condition between master and tserver when acquiring tserver lock
> ---------------------------------------------------------------------
>
>                 Key: ACCUMULO-1277
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1277
>             Project: Accumulo
>          Issue Type: Bug
>          Components: master, tserver
>    Affects Versions: 1.5.0, 1.4.3
>            Reporter: Daniel P Truitt
>            Assignee: Keith Turner
>
> When restarting a stopped tserver, the following happens:
> The tserver (in TabletServer.announceExistence()) creates an entry in 
> ZooKeeper at /accumulo/instance-id/tserver/host:port.
> This in turn triggers the master to execute the call chain:
>   LiveTServerSet.process(WatchedEvent)
>   LiveTServerSet.scanServers()
>   LiveTServerSet.checkServer(Set<TServerInstance>, Set<TServerInstance>, 
>     String, String)
> The checkServer() method checks whether the ZooLock data has been created 
> yet.  If the tserver loses the race, the data does not exist yet, and the 
> master deletes the tserver node.
> When the tserver attempts to create the ZooLock, the parent path no longer 
> exists and creating the lock fails.  Eventually the tserver will time out 
> waiting to create the lock, and fail to start.
> This problem is easier to reproduce in a smallish cluster using a single 
> ZooKeeper node, where there is more latency between the tserver and ZooKeeper 
> than there is between the master and ZooKeeper.
> This behavior was introduced in the fix for ACCUMULO-1049.
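
The race described above, and the idea of delaying deletion of empty tserver nodes, can be sketched with a small in-memory model. This is only an illustration under stated assumptions: it does not use the real ZooKeeper or Accumulo APIs, the class TserverLockRaceSketch, the simplified path /tserver/host:port, the child name zlock-0, the timestamps, and the graceMs parameter are all hypothetical, and checkServer() here only borrows the name of the method in the call chain.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a ZooKeeper tree; NOT the real ZooKeeper or Accumulo API. */
class TserverLockRaceSketch {

    // path -> creation time in ms (the clock is supplied by the caller; hypothetical)
    private final Map<String, Long> nodes = new HashMap<>();

    /** Creates a node; like ZooKeeper, fails if the parent node does not exist. */
    boolean create(String path, long now) {
        String parent = path.substring(0, path.lastIndexOf('/'));
        if (!parent.isEmpty() && !nodes.containsKey(parent)) {
            return false;
        }
        nodes.put(path, now);
        return true;
    }

    /** Returns true if any node lives directly or indirectly under the given path. */
    boolean hasChildren(String path) {
        String prefix = path + "/";
        for (String p : nodes.keySet()) {
            if (p.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Master-side check (hypothetical): deletes a tserver node that has no lock
     * child, but only once the node is at least graceMs old. Returns true if
     * the node was deleted.
     */
    boolean checkServer(String path, long now, long graceMs) {
        Long created = nodes.get(path);
        if (created == null || hasChildren(path)) {
            return false;
        }
        if (now - created < graceMs) {
            return false; // still within the grace period: leave the empty node alone
        }
        nodes.remove(path);
        return true;
    }

    /**
     * Replays the interleaving from the report: the tserver creates its node at
     * t=0, the master checks it at t=5, and the tserver tries to create its lock
     * child at t=10. Returns whether the lock child could be created.
     */
    static boolean tserverStarts(long graceMs) {
        TserverLockRaceSketch zk = new TserverLockRaceSketch();
        zk.create("/tserver", 0);
        String node = "/tserver/host:port";
        zk.create(node, 0);
        zk.checkServer(node, 5, graceMs);
        return zk.create(node + "/zlock-0", 10);
    }

    public static void main(String[] args) {
        System.out.println("grace 0 ms:    lock created = " + tserverStarts(0));
        System.out.println("grace 1000 ms: lock created = " + tserverStarts(1000));
    }
}
```

With a grace period of 0 the master removes the empty node before the lock child exists, so the later create fails because the parent is gone. With a grace period longer than the gap between the two tserver steps, the node survives and the lock child is created. How long such a delay should be in a real cluster is not something this sketch can answer.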

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
