[ https://issues.apache.org/jira/browse/AMQ-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gary Tully updated AMQ-5568:
----------------------------
Fix Version/s: 5.12.0
> Deleting lock file on broker shut down can take a master broker down
> --------------------------------------------------------------------
>
> Key: AMQ-5568
> URL: https://issues.apache.org/jira/browse/AMQ-5568
> Project: ActiveMQ
> Issue Type: Bug
> Components: Broker, Message Store
> Affects Versions: 5.11.0
> Reporter: Torsten Mielke
> Assignee: Gary Tully
> Labels: persistence
> Fix For: 5.12.0
>
>
> This problem may only occur in a shared file system master/slave setup.
> I can reproduce it reliably on an NFSv4 mount using a persistence adapter
> configuration like:
> {code}
> <levelDB directory="/nfs/activemq/data/leveldb" lockKeepAlivePeriod="5000">
> <locker>
> <shared-file-locker lockAcquireSleepInterval="10000"/>
> </locker>
> </levelDB>
> {code}
> However, the problem is also reproducible using KahaDB.
> Two broker instances compete for the lock on the shared storage (e.g.
> LevelDB or KahaDB). Let's say broker A becomes master and broker B slave.
> If broker A loses access to the NFS share, it will shut down. As part of
> shutting down, it tries to delete the lock file of the persistence adapter.
> Since the NFS share is gone, all file I/O calls hang for a good while before
> returning errors. As a result, the broker shutdown is delayed.
> In the meantime the slave broker B (not affected by the NFS problem) grabs
> the lock and becomes master.
> If the NFS mount is restored while broker A (the previous master) still hangs
> on the file I/O operations (as part of its shutdown routine), the attempt to
> delete the persistence adapter lock file will finally succeed and broker A
> shuts down.
> Deleting the lock file, however, also affects the new master, broker B, which
> periodically runs a keepAlive() check on the lock. That check verifies that
> the file still exists and that the FileLock is still valid. As the lock file
> got deleted, keepAlive() fails on broker B and that broker shuts down as well.
> The overall result is that both broker instances have shut down despite an
> initially successful failover.
> Using restartAllowed=true is not an option either, as this can cause other
> problems in an NFS-based master/slave setup.
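
The keepAlive() failure described above can be illustrated with a minimal, self-contained Java sketch. This is not ActiveMQ's actual locker code: the class LockKeepAliveSketch and its methods are hypothetical names, and the check simply mirrors the two conditions named in the description (the lock file still exists and the FileLock is still valid). It assumes a POSIX-style file system where a file can be deleted while another handle holds it open and locked.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not the ActiveMQ implementation.
public class LockKeepAliveSketch {
    private final Path lockPath;
    private RandomAccessFile raf;
    private FileLock lock;

    public LockKeepAliveSketch(Path lockPath) {
        this.lockPath = lockPath;
    }

    // Try to take an exclusive lock on the lock file (creating it if needed).
    public boolean tryAcquire() throws IOException {
        raf = new RandomAccessFile(lockPath.toFile(), "rw");
        lock = raf.getChannel().tryLock();
        if (lock == null) {
            raf.close();
            raf = null;
            return false;
        }
        return true;
    }

    // Mirrors the check from the description: the FileLock must still be
    // valid AND the lock file must still exist on disk.
    public boolean keepAlive() {
        return lock != null && lock.isValid() && Files.exists(lockPath);
    }

    // Release the lock and close the file without deleting it.
    public void release() throws IOException {
        if (lock != null) {
            lock.release();
            lock = null;
        }
        if (raf != null) {
            raf.close();
            raf = null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lock-sketch");
        Path lockFile = dir.resolve("lock");

        // "Broker B" becomes the new master and holds the lock.
        LockKeepAliveSketch brokerB = new LockKeepAliveSketch(lockFile);
        System.out.println("acquired: " + brokerB.tryAcquire());
        System.out.println("keepAlive before delete: " + brokerB.keepAlive());

        // Simulates the old master's delayed shutdown deleting the lock file.
        Files.deleteIfExists(lockFile);
        System.out.println("keepAlive after delete: " + brokerB.keepAlive());

        brokerB.release();
        Files.deleteIfExists(dir);
    }
}
```

In this sketch the FileLock held by "broker B" stays valid after the file is removed, so it is the existence check alone that makes keepAlive() fail. That matches the sequence in the description, where a late delete by the former master takes down the new master.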
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)