[ 
https://issues.apache.org/jira/browse/ACCUMULO-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915886#comment-13915886
 ] 

Billie Rinaldi commented on ACCUMULO-2422:
------------------------------------------

Looks like good detective work, [~bhavanki].  ZooLock.lockAsync seems to 
check for a NodeDeleted event or an expired session, but it doesn't reset the 
watch if it gets any other type of event.  If you can verify that this is the 
problem, then it was introduced in 1.5.1.
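
A minimal sketch of the suspected failure mode, not the actual ZooLock code. 
It assumes the standard ZooKeeper behavior that a watch is a one-time trigger, 
and that the unhandled event consumes the watch. ToyStore, Candidate and 
noticesDeletion are hypothetical names made up for this illustration; the toy 
store stands in for ZooKeeper so the example runs without a cluster.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class OneShotWatchSketch {
    // Only the two event types needed for the illustration.
    enum EventType { NodeDataChanged, NodeDeleted }

    /** Hypothetical stand-in for ZooKeeper: each registered watch fires at most once. */
    static class ToyStore {
        private final Map<String, List<Consumer<EventType>>> watches = new HashMap<>();

        void watch(String path, Consumer<EventType> watcher) {
            watches.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
        }

        void fire(String path, EventType type) {
            // Remove the watches before invoking them, which makes them one-shot.
            List<Consumer<EventType>> fired = watches.remove(path);
            if (fired == null) {
                return; // nobody is watching this path any more
            }
            for (Consumer<EventType> w : fired) {
                w.accept(type);
            }
        }
    }

    /** Hypothetical backup process waiting for the primary's lock node to go away. */
    static class Candidate {
        private final ToyStore store;
        private final String path;
        private final boolean rearm;
        boolean lockAttempted = false;

        Candidate(ToyStore store, String path, boolean rearm) {
            this.store = store;
            this.path = path;
            this.rearm = rearm;
        }

        void waitForDeletion() {
            store.watch(path, this::process);
        }

        void process(EventType type) {
            if (type == EventType.NodeDeleted) {
                lockAttempted = true; // this is where the lock would be retried
            } else if (rearm) {
                waitForDeletion(); // reset the watch after an unrelated event
            }
            // Without rearm, the watch is gone and a later deletion is never seen.
        }
    }

    /** Fires an unrelated event, then the deletion; reports whether it was noticed. */
    static boolean noticesDeletion(boolean rearm) {
        ToyStore store = new ToyStore();
        Candidate candidate = new Candidate(store, "/lock/primary", rearm);
        candidate.waitForDeletion();
        store.fire("/lock/primary", EventType.NodeDataChanged);
        store.fire("/lock/primary", EventType.NodeDeleted);
        return candidate.lockAttempted;
    }

    public static void main(String[] args) {
        System.out.println("without re-arming: " + noticesDeletion(false));
        System.out.println("with re-arming:    " + noticesDeletion(true));
    }
}
```

If the real problem matches this sketch, the candidate without re-arming 
would keep waiting after the primary's node is deleted, which is consistent 
with the symptom in the report. Whether that is what happens in lockAsync 
still needs to be verified.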

> Backup master can miss acquiring lock when primary exits
> --------------------------------------------------------
>
>                 Key: ACCUMULO-2422
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2422
>             Project: Accumulo
>          Issue Type: Bug
>          Components: fate, master
>    Affects Versions: 1.5.0
>            Reporter: Bill Havanki
>            Assignee: Bill Havanki
>            Priority: Critical
>              Labels: failover, locking
>
> While running randomwalk tests with agitation for the 1.5.1 release, I've 
> seen situations where a backup master that is eligible to grab the master 
> lock continues to wait. When this condition arises and the other master 
> restarts, both wait for the lock without success.
> I cannot reproduce the problem reliably, and I think more investigation is 
> needed to see what circumstances could be causing the problem.



