ivakegg opened a new issue, #6044:
URL: https://github.com/apache/accumulo/issues/6044

   The problem scenario we came across is one where a node became inaccessible, 
yet the tservers on that node somehow kept their ZooKeeper locks alive.  
The master kept trying to contact that node for tablet assignments and 
various FATE transactions (bulk loads, table deletes, compactions).  All of 
these communications timed out because sockets could not be established.  
Eventually the master began attempting to shut down the tservers, but of 
course that failed as well.  Even after we removed the node from cluster.yaml 
and failed all of the FATE transactions, the master still would not get past 
the issue.  We finally had to issue admin stop -f <tserver> to force the lock 
to be removed and get past the issue.
   
   I would like the master to be able to forcibly remove the ZooKeeper lock 
after a configurable number of failed attempts to stop the same tserver.
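A minimal sketch of the requested behavior, for illustration only: it counts failed stop attempts per tserver and, once a configurable threshold is reached, calls a lock-removal hook (the automated equivalent of running admin stop -f by hand). The class `ForcedLockRemovalPolicy`, the `LockRemover` interface, the `maxStopAttempts` parameter, and the `tserver1:9997` address are all hypothetical names, not part of Accumulo's API; a real implementation would live in the master's tserver-shutdown path and delete the server's lock node in ZooKeeper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not Accumulo code: force lock removal after N failed stops. */
public class ForcedLockRemovalPolicy {

  /** Hypothetical hook; a real one would delete the tserver's ZooKeeper lock. */
  public interface LockRemover {
    void removeLock(String tserver);
  }

  private final int maxStopAttempts;
  private final LockRemover lockRemover;
  private final Map<String, Integer> failedAttempts = new HashMap<>();

  public ForcedLockRemovalPolicy(int maxStopAttempts, LockRemover lockRemover) {
    if (maxStopAttempts < 1) {
      throw new IllegalArgumentException("maxStopAttempts must be >= 1");
    }
    this.maxStopAttempts = maxStopAttempts;
    this.lockRemover = lockRemover;
  }

  /** Records one failed stop; returns true if the lock was force-removed. */
  public synchronized boolean recordFailedStop(String tserver) {
    int attempts = failedAttempts.merge(tserver, 1, Integer::sum);
    if (attempts >= maxStopAttempts) {
      lockRemover.removeLock(tserver);
      failedAttempts.remove(tserver); // start fresh if the server comes back
      return true;
    }
    return false;
  }

  /** Clears the counter when a stop succeeds or the server responds again. */
  public synchronized void recordSuccess(String tserver) {
    failedAttempts.remove(tserver);
  }

  /** Current number of consecutive failed stop attempts for a server. */
  public synchronized int attempts(String tserver) {
    return failedAttempts.getOrDefault(tserver, 0);
  }

  public static void main(String[] args) {
    List<String> removed = new ArrayList<>();
    ForcedLockRemovalPolicy policy = new ForcedLockRemovalPolicy(3, removed::add);
    String ts = "tserver1:9997"; // hypothetical tserver address
    System.out.println(policy.recordFailedStop(ts)); // false
    System.out.println(policy.recordFailedStop(ts)); // false
    System.out.println(policy.recordFailedStop(ts)); // true
    System.out.println(removed); // [tserver1:9997]
  }
}
```

Resetting the counter on success keeps a server that is merely slow from being evicted; only repeated, uninterrupted failures trigger the forced removal.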
   
   This was in Accumulo 2.1.4.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
