[ https://issues.apache.org/jira/browse/ACCUMULO-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915886#comment-13915886 ]
Billie Rinaldi commented on ACCUMULO-2422:
------------------------------------------

Looks like good detective work, [~bhavanki]. In ZooLock.lockAsync it seems to check for a NodeDeleted event or expired session, but doesn't reset the watch if it gets another type of event. If you verify this is the problem, it was introduced in 1.5.1.

> Backup master can miss acquiring lock when primary exits
> --------------------------------------------------------
>
>                 Key: ACCUMULO-2422
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2422
>             Project: Accumulo
>          Issue Type: Bug
>          Components: fate, master
>    Affects Versions: 1.5.0
>            Reporter: Bill Havanki
>            Assignee: Bill Havanki
>            Priority: Critical
>              Labels: failover, locking
>
> While running randomwalk tests with agitation for the 1.5.1 release, I've
> seen situations where a backup master that is eligible to grab the master
> lock continues to wait. When this condition arises and the other master
> restarts, both wait for the lock without success.
> I cannot reproduce the problem reliably, and I think more investigation is
> needed to see what circumstances could be causing the problem.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)