[https://issues.apache.org/jira/browse/AMQ-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309092#comment-14309092]
Torsten Mielke edited comment on AMQ-5568 at 2/6/15 1:05 PM:
-------------------------------------------------------------
The keepAlive() check is needed because of
[AMQ-4705|https://issues.apache.org/jira/browse/AMQ-4705]; otherwise you may
end up with two master broker instances.
was (Author: tmielke):
The keepAlive ping is needed because of
[AMQ-4705|https://issues.apache.org/jira/browse/AMQ-4705]; otherwise you may
end up with two master broker instances.
> deleting lock file on broker shut down can take a master broker down
> --------------------------------------------------------------------
>
> Key: AMQ-5568
> URL: https://issues.apache.org/jira/browse/AMQ-5568
> Project: ActiveMQ
> Issue Type: Bug
> Components: Broker, Message Store
> Affects Versions: 5.11.0
> Reporter: Torsten Mielke
> Labels: persistence
>
> This problem may only occur in a shared file system master/slave setup.
> I can reproduce it reliably on an NFSv4 mount using a persistence adapter
> configuration like
> {code}
> <levelDB directory="/nfs/activemq/data/leveldb" lockKeepAlivePeriod="5000">
>   <locker>
>     <shared-file-locker lockAcquireSleepInterval="10000"/>
>   </locker>
> </levelDB>
> {code}
> However, the problem is also reproducible using KahaDB.
> Two broker instances compete for the lock on the shared storage (e.g.
> LevelDB or KahaDB). Let's say broker A becomes master and broker B slave.
> If broker A loses access to the NFS share, it will shut down. As part of
> shutting down, it tries to delete the lock file of the persistence adapter.
> Since the NFS share is gone, all file I/O calls hang for a good while before
> returning errors, so the broker shutdown is delayed.
> In the meantime the slave broker B (not affected by the NFS problem) grabs
> the lock and becomes master.
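The slave side of this failover can be pictured as a retry loop around a non-blocking file lock. The following is a minimal Java sketch under that assumption, not ActiveMQ's actual shared-file-locker code: the class LockAcquireSketch and its acquire() helper are hypothetical, and the sleep argument merely mirrors the lockAcquireSleepInterval attribute from the configuration above. It relies only on java.nio's FileChannel.tryLock(), which returns null while another process holds the lock.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LockAcquireSketch {

    /**
     * Hypothetical helper: try to take an exclusive lock on lockFile,
     * sleeping lockAcquireSleepInterval ms between attempts, the way a
     * waiting slave broker would. Returns the lock, or null once
     * maxAttempts attempts have failed.
     */
    static FileLock acquire(Path lockFile, long lockAcquireSleepInterval, int maxAttempts)
            throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // CREATE re-creates the lock file if it does not exist yet.
            FileChannel channel = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock lock;
            try {
                // null if another process currently holds the lock
                lock = channel.tryLock();
            } catch (OverlappingFileLockException e) {
                // the lock is already held inside this JVM
                lock = null;
            } catch (IOException e) {
                channel.close();
                throw e;
            }
            if (lock != null) {
                // keep the channel open: closing it would release the lock
                return lock;
            }
            channel.close();
            if (attempt < maxAttempts) {
                Thread.sleep(lockAcquireSleepInterval);
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Path lockFile = Files.createTempDirectory("leveldb").resolve("lock");
        FileLock lock = acquire(lockFile, 100, 3);
        System.out.println("acquired: " + (lock != null && lock.isValid()));
        if (lock != null) {
            lock.channel().close(); // releases the lock
        }
    }
}
```

The channel is deliberately left open on success, because closing it releases the lock. Within a single JVM a second attempt on the same file raises OverlappingFileLockException, which the sketch treats as just another failed attempt.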
> If the NFS mount is restored while broker A (the previous master) is still
> hanging on the file I/O operations (as part of its shutdown routine), its
> attempt to delete the persistence adapter lock file finally succeeds and
> broker A shuts down.
> However, deleting the lock file also affects the new master broker B, which
> periodically runs a keepAlive() check on the lock. That check verifies that
> the file still exists and that the FileLock is still valid. Because the lock
> file was deleted, keepAlive() fails on broker B and that broker shuts down
> as well.
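A minimal Java sketch of why the deletion is fatal, assuming keepAlive() amounts to exactly the two checks described above (the lock file exists and the FileLock is valid). KeepAliveSketch and its keepAlive() helper are hypothetical stand-ins, not the broker's implementation. On a POSIX file system such as Linux, deleting a file that another process holds open and locked succeeds, so the FileLock object stays valid while the existence check fails.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class KeepAliveSketch {

    /**
     * Hypothetical keepAlive() modelled on the issue description: the lock
     * only counts as held if the FileLock is still valid AND the lock file
     * still exists on disk.
     */
    static boolean keepAlive(Path lockFile, FileLock lock) {
        return lock != null && lock.isValid() && Files.exists(lockFile);
    }

    public static void main(String[] args) throws IOException {
        Path lockFile = Files.createTempDirectory("kahadb").resolve("lock");

        // Broker B (the new master) opens the lock file and takes the lock.
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = channel.lock();
        System.out.println("before delete: " + keepAlive(lockFile, lock));

        // Broker A's delayed shutdown finally deletes the lock file. On POSIX
        // systems this succeeds even though B still holds the file open and
        // locked, so lock.isValid() stays true but the file is gone.
        Files.delete(lockFile);
        System.out.println("after delete: " + keepAlive(lockFile, lock));

        channel.close();
    }
}
```

Running main() takes the lock, evaluates the check, deletes the file the way broker A's delayed shutdown would, and evaluates the check again; on Linux the second check reports false even though the lock itself was never released.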
> The overall result is that both broker instances have shut down despite an
> initially successful failover.
> Using restartAllowed=true is not an option either, as this can cause other
> problems in an NFS-based master/slave setup.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)