[ https://issues.apache.org/jira/browse/QPID-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098130#comment-15098130 ]

ASF subversion and git services commented on QPID-6972:
-------------------------------------------------------

Commit 1724616 from oru...@apache.org in branch 'java/branches/6.0.x'
[ https://svn.apache.org/r1724616 ]

QPID-6972: Delegate exception handling decisions on flushLog failures to 
EnvironmentFacade

           merged from trunk
           svn merge -c 1724582 https://svn.apache.org/repos/asf/qpid/java/trunk
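The commit message describes the fix only in outline. The following is a minimal sketch of the idea, and every name in it (`Facade`, `handleFlushFailure`, `Committer`, `LogFlusher`, `NonHaFacade`, `HaFacade`) is hypothetical rather than taken from the Qpid source. When the single coalesced flushLog call fails, the committer fails every waiting commit. It then hands the exception to the facade, which decides whether the failure is fatal (non-HA: close) or recoverable (HA: restart the environment).

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of the QPID-6972 fix shape; class and method names are
// illustrative and do not reproduce Qpid's actual CoalescingCommitter code.
public class CoalescingCommitterSketch {

    /** The facade, not the committer, decides what a flushLog failure means. */
    public interface Facade {
        void handleFlushFailure(RuntimeException e);
    }

    /** Non-HA: a flush failure is always fatal, so the facade closes itself. */
    public static class NonHaFacade implements Facade {
        public boolean closed;

        @Override
        public void handleFlushFailure(RuntimeException e) {
            closed = true;
        }
    }

    /** HA: the facade restarts the environment and remains usable. */
    public static class HaFacade implements Facade {
        public int restarts;

        @Override
        public void handleFlushFailure(RuntimeException e) {
            restarts++; // stands in for closing and recreating the ReplicatedEnvironment
        }
    }

    /** Stand-in for the call that flushes the JE log. */
    public interface LogFlusher {
        void flushLog();
    }

    public static class Committer {
        private final Queue<CompletableFuture<Void>> jobs = new ArrayDeque<>();
        private final Facade facade;
        private final LogFlusher flusher;

        public Committer(Facade facade, LogFlusher flusher) {
            this.facade = facade;
            this.flusher = flusher;
        }

        /** Queues a commit; the returned future completes when the next flush does. */
        public CompletableFuture<Void> enqueue() {
            CompletableFuture<Void> job = new CompletableFuture<>();
            jobs.add(job);
            return job;
        }

        /** One flush covers every queued job (the "coalescing" part). */
        public void processJobs() {
            try {
                flusher.flushLog();
                while (!jobs.isEmpty()) {
                    jobs.poll().complete(null);
                }
            } catch (RuntimeException e) {
                // Fail the waiting commits, then delegate the decision to the
                // facade instead of unconditionally calling close() on it.
                while (!jobs.isEmpty()) {
                    jobs.poll().completeExceptionally(e);
                }
                facade.handleFlushFailure(e);
            }
        }
    }

    public static void main(String[] args) {
        HaFacade ha = new HaFacade();
        Committer committer = new Committer(ha, () -> {
            throw new IllegalStateException("insufficient replica acks");
        });
        CompletableFuture<Void> job = committer.enqueue();
        committer.processJobs();
        System.out.println(job.isCompletedExceptionally() + " " + ha.restarts);
    }
}
```

Running `main` prints `true 1`: the queued commit fails, and the HA facade records a restart instead of being closed. The committer stays ignorant of HA versus non-HA, which is the point of moving the decision into the facade.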

> BDB HA: Node may remain detached from group following loss of quorum
> --------------------------------------------------------------------
>
>                 Key: QPID-6972
>                 URL: https://issues.apache.org/jira/browse/QPID-6972
>             Project: Qpid
>          Issue Type: Bug
>          Components: Java Broker
>    Affects Versions: 0.30, 0.32, qpid-java-6.0
>            Reporter: Keith Wall
>              Labels: bdbstore, high-availability
>
> If a master detects that it has lost quorum (which may occur when a
> user-generated transaction, or an internally generated 'ping' transaction, fails
> to see the required number of replica acknowledgements), the underlying JE
> environment {{ReplicatedEnvironment}} is automatically restarted (the old one 
> closed and a new one created to replace it).   This approach ensures that 
> clients reconnect to a new master in a timely way.
> There is a coding error in the CoalescingCommitter that means that the JE 
> environment restart may not complete properly.  If quorum disappears whilst 
> there are jobs on the CoalescingCommitter's job queue, the  
> CoalescingCommitter's error handling will cause the BDB EnvironmentFacade to 
> be closed.   This is okay for the BDB non-HA case as such an exception is 
> always fatal, but for HA, calling {{ReplicatedEnvironmentFacade#close()}} 
> prevents the environment from being recreated.
> The effect of this defect is that a node may disappear from the group every
> time quorum is temporarily lost. This will keep occurring until quorum can no
> longer be achieved, at which point service will stop. Bouncing the affected
> brokers (or restarting the virtual host nodes, VHNs) will restore service
> without message loss.
>   
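The interaction described above can be modelled as a small state machine. This sketch assumes a hypothetical `ReplicatedFacadeModel` in which a restart is split into two explicit steps (`onQuorumLost` and `completeRestart`) to expose the window in which the committer's old error handling ran. It is not the real ReplicatedEnvironmentFacade, whose restart logic is asynchronous and more involved.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the QPID-6972 defect; not Qpid's ReplicatedEnvironmentFacade.
public class RestartSketch {

    public enum State { OPEN, RESTARTING, CLOSED }

    public static class ReplicatedFacadeModel {
        private final AtomicReference<State> state = new AtomicReference<>(State.OPEN);
        private int generation = 1; // counts environment instances created so far

        /** Quorum lost: the old environment is closed and a restart begins. */
        public void onQuorumLost() {
            state.compareAndSet(State.OPEN, State.RESTARTING);
        }

        /** Finishes the restart unless close() was called in the meantime. */
        public boolean completeRestart() {
            if (state.compareAndSet(State.RESTARTING, State.OPEN)) {
                generation++; // a new environment replaces the old one
                return true;
            }
            return false; // CLOSED is terminal: a closed facade never reopens
        }

        /** Permanent shutdown, which the committer's old error handling invoked. */
        public void close() {
            state.set(State.CLOSED);
        }

        public State state() {
            return state.get();
        }

        public int generation() {
            return generation;
        }
    }

    public static void main(String[] args) {
        // Healthy path: quorum is lost, then the environment is recreated.
        ReplicatedFacadeModel healthy = new ReplicatedFacadeModel();
        healthy.onQuorumLost();
        System.out.println(healthy.completeRestart() + " " + healthy.state());

        // Defect path: queued commits fail and the committer closes the facade
        // while the restart is in flight, so the node never rejoins the group.
        ReplicatedFacadeModel defective = new ReplicatedFacadeModel();
        defective.onQuorumLost();
        defective.close();
        System.out.println(defective.completeRestart() + " " + defective.state());
    }
}
```

Running `main` prints `true OPEN` followed by `false CLOSED`. Treating CLOSED as terminal is what makes close() correct for a genuine shutdown, and it is also why calling it during a restart leaves the node detached until the broker or VHN is restarted.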



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
