[ https://issues.apache.org/jira/browse/QPID-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098130#comment-15098130 ]

ASF subversion and git services commented on QPID-6972:
-------------------------------------------------------

Commit 1724616 from oru...@apache.org in branch 'java/branches/6.0.x'
[ https://svn.apache.org/r1724616 ]

QPID-6972: Delegate exception handling decisions on flushLog failures to EnvironmentFacade

merged from trunk

svn merge -c 1724582 https://svn.apache.org/repos/asf/qpid/java/trunk

> BDB HA: Node may remain detached from group following loss of quorum
> --------------------------------------------------------------------
>
>                 Key: QPID-6972
>                 URL: https://issues.apache.org/jira/browse/QPID-6972
>             Project: Qpid
>          Issue Type: Bug
>          Components: Java Broker
>    Affects Versions: 0.30, 0.32, qpid-java-6.0
>            Reporter: Keith Wall
>              Labels: bdbstore, high-availability
>
> If a master detects that it has lost quorum (which may occur when a
> user-generated transaction, or an internally generated 'ping' transaction,
> fails to receive the required number of replica acknowledgements), the
> underlying JE environment {{ReplicatedEnvironment}} is automatically
> restarted (the old one is closed and a new one is created to replace it).
> This approach ensures that clients reconnect to a new master in a timely way.
>
> There is a coding error in the CoalescingCommitter that means the JE
> environment restart may not complete properly. If quorum disappears whilst
> there are jobs on the CoalescingCommitter's job queue, the
> CoalescingCommitter's error handling will cause the BDB EnvironmentFacade
> to be closed. This is acceptable in the BDB non-HA case, where such an
> exception is always fatal, but for HA, calling
> {{ReplicatedEnvironmentFacade#close()}} prevents the environment from being
> recreated.
>
> The effect of this defect is that a node may disappear from the group every
> time quorum is temporarily lost. This will keep occurring until too few
> nodes remain to form a quorum, at which point the service stops.
> Restarting the affected brokers (or restarting the virtual host nodes,
> VHNs) will restore the service, without message loss.
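
The commit summary ("Delegate exception handling decisions on flushLog failures to EnvironmentFacade") can be illustrated with a minimal sketch. This is not the Qpid source: Facade, HaFacade, Committer, handleFailure and processQueue are hypothetical names invented for this example, and the flush is simulated rather than calling Berkeley DB JE. It only shows the shape of the fix described above: on a flush failure the committer asks the facade what to do and never calls close() itself, so an HA facade remains free to restart its environment.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for the broker's EnvironmentFacade. */
interface Facade {
    void flushLog();
    RuntimeException handleFailure(String context, RuntimeException cause);
    void close();
}

/** HA-style facade: a flush failure triggers a restart, not a permanent close. */
final class HaFacade implements Facade {
    int restarts = 0;
    boolean closed = false;
    boolean quorumLost = true;

    @Override
    public void flushLog() {
        if (quorumLost) {
            throw new IllegalStateException("insufficient replica acknowledgements");
        }
    }

    @Override
    public RuntimeException handleFailure(String context, RuntimeException cause) {
        restarts++;          // stands in for closing the old environment and creating a new one
        quorumLost = false;  // assumption: quorum has returned once the restart completes
        return new RuntimeException(context + ": environment restarting", cause);
    }

    @Override
    public void close() {
        closed = true;       // in the HA case this would prevent any later restart
    }
}

/** Simplified committer that flushes once on behalf of all queued jobs. */
final class Committer {
    private final Facade facade;
    private final List<String> jobs = new ArrayList<>();
    RuntimeException lastFailure;  // the exception each failed job would be completed with

    Committer(Facade facade) {
        this.facade = facade;
    }

    void enqueue(String job) {
        jobs.add(job);
    }

    /** Returns the number of queued jobs that were failed by this flush attempt. */
    int processQueue() {
        int failed = 0;
        try {
            facade.flushLog();
        } catch (RuntimeException e) {
            // Delegate the decision to the facade; do NOT call facade.close() here.
            lastFailure = facade.handleFailure("flushLog failed", e);
            failed = jobs.size();
        }
        jobs.clear();
        return failed;
    }
}
```

In this sketch a non-HA facade could implement handleFailure by closing itself, which matches the issue's remark that the exception is always fatal in the non-HA case; the committer does not need to know which kind of facade it holds.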