[ https://issues.apache.org/jira/browse/AMQ-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298361#comment-14298361 ]
Torsten Mielke edited comment on AMQ-5549 at 1/30/15 8:28 AM:
--------------------------------------------------------------
Some NFS mount options can prevent a quick broker failover from master to
slave.
The mount options we finally got the best results with were:
{code}
timeo=100,retrans=1,soft,noac
{code}
We reduced the timeout to 10 seconds (timeo is given in tenths of a second)
and reduced the number of retransmissions to just 1.
In addition, according to the nfs man page, a hard mount retries NFS requests
indefinitely, whereas on a soft mount a request fails after retrans
retransmissions. That is most likely what you want in order to ensure a quick
failover.
Finally, the noac option also seemed to have a big effect on how quickly the
master broker detects the NFS failure. Besides disabling attribute caching, it
forces writes to NFS to be synchronous, which seems to propagate exceptions
more quickly. It most likely has a negative impact on performance, though.
I can't provide any scientific support for these arguments beyond the
observations above, but with these settings the master broker shut down much
more quickly upon an NFS failure.
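To illustrate why these options matter, here is a minimal Java sketch. It is hypothetical and not ActiveMQ's actual code: a master holding the lock file can only notice that NFS is gone when an I/O call on that file returns an error. With soft,timeo=100,retrans=1 such a call should fail after a bounded time, whereas on a hard mount it may block indefinitely. The class and method names (LockHealthCheckSketch, lockFileWritable) are made up for illustration, and closing the channel merely simulates the failure locally.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical sketch (not ActiveMQ code): a master holding the lock file
 * can only react to an NFS outage once an I/O call on that file fails.
 */
public class LockHealthCheckSketch {

    /**
     * Rewrites one byte of the lock file and forces it to storage.
     * Returns false if the write or force throws an IOException. On a hard
     * NFS mount these calls may instead block until the server returns.
     */
    static boolean lockFileWritable(FileChannel channel) {
        try {
            channel.write(ByteBuffer.wrap(new byte[] {1}), 0);
            channel.force(true);
            return true;
        } catch (IOException e) {
            // A real broker would stop here so that the slave can take over.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path lockFile = Files.createTempFile("lock", ".tmp");
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        System.out.println("healthy: " + lockFileWritable(channel));
        // Simulate losing the storage by closing the channel; further I/O
        // then fails with ClosedChannelException, a subclass of IOException.
        channel.close();
        System.out.println("after failure: " + lockFileWritable(channel));
        Files.delete(lockFile);
    }
}
{code}

Run as is, it should print "healthy: true" followed by "after failure: false".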
> Shared Filesystem Master/Slave using NFSv4 allows both brokers become active
> at the same time
> ---------------------------------------------------------------------------------------------
>
> Key: AMQ-5549
> URL: https://issues.apache.org/jira/browse/AMQ-5549
> Project: ActiveMQ
> Issue Type: Bug
> Components: Broker, Message Store
> Affects Versions: 5.10.1
> Environment: - CentOS Linux 6
> - OpenJDK 1.7
> - ActiveMQ 5.10.1
> Reporter: Heikki Manninen
> Priority: Critical
>
> Identical ActiveMQ master and slave brokers are installed on CentOS Linux 6
> virtual machines. There is a third virtual machine (also CentOS 6) providing
> an NFSv4 share for the brokers' KahaDB.
> Both brokers are started; the master broker acquires a file lock on the lock
> file and the slave broker sits in a loop waiting for the lock, as expected.
> Failover between the brokers also works as expected.
> Once the NFS server's network connection is disconnected, both the master and
> slave NFS mounts block and the slave broker stops logging file lock retries.
> Shortly after the network connection is restored, the mounts come back and the
> slave broker is able to acquire the lock while the master still holds it.
> Both brokers then accept client connections.
> In this situation it is also possible to stop and start either broker many
> times, and each is always able to acquire the lock even if the other one is
> already running. Only after stopping both brokers and starting them again
> does the situation return to normal.
> * NFS server:
> ** CentOS Linux 6
> ** NFS v4 export options: rw,sync
> ** NFS v4 grace time 45 seconds
> ** NFS v4 lease time 10 seconds
> * NFS client:
> ** CentOS Linux 6
> ** NFS mount options: nfsvers=4,proto=tcp,hard,wsize=65536,rsize=65536
> * ActiveMQ configuration (otherwise default):
> {code:xml}
> <persistenceAdapter>
>   <kahaDB directory="${activemq.data}/kahadb">
>     <locker>
>       <shared-file-locker lockAcquireSleepInterval="1000"/>
>     </locker>
>   </kahaDB>
> </persistenceAdapter>
> {code}
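For reference, the polling behavior described in the issue (the slave retrying the lock every lockAcquireSleepInterval milliseconds) can be sketched with java.nio file locks. This is a hypothetical illustration, not ActiveMQ's SharedFileLocker; the class and method names (LockPollingSketch, pollForLock) are invented, and both "brokers" run in one JVM, where a lock already held surfaces as OverlappingFileLockException rather than a null from tryLock(). Whether a second NFS client can obtain the lock after an outage depends on the NFS server's lock and lease handling, which this sketch does not model.

{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical sketch (not ActiveMQ's SharedFileLocker): poll for an
 * exclusive lock on a shared lock file, sleeping between attempts.
 */
public class LockPollingSketch {

    /** Returns the lock, or null if it was not obtained within maxAttempts. */
    static FileLock pollForLock(FileChannel channel, long sleepMillis, int maxAttempts)
            throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // tryLock() returns null if another program holds the lock.
                FileLock lock = channel.tryLock();
                if (lock != null) {
                    return lock;
                }
            } catch (OverlappingFileLockException e) {
                // Thrown instead when this same JVM already holds the lock.
            }
            if (attempt < maxAttempts) {
                Thread.sleep(sleepMillis); // cf. lockAcquireSleepInterval="1000"
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Path lockFile = Files.createTempFile("kahadb", ".lock");
        try (FileChannel master = FileChannel.open(lockFile, StandardOpenOption.WRITE);
             FileChannel slave = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            FileLock masterLock = pollForLock(master, 1000, 1);
            System.out.println("master acquired: " + (masterLock != null));
            // The "slave" keeps polling while the master holds the lock.
            FileLock slaveLock = pollForLock(slave, 100, 3);
            System.out.println("slave acquired while master holds it: " + (slaveLock != null));
            masterLock.release();
            slaveLock = pollForLock(slave, 100, 3);
            System.out.println("slave acquired after release: " + (slaveLock != null));
        }
        Files.delete(lockFile);
    }
}
{code}

Run locally, it should print "master acquired: true", "slave acquired while master holds it: false" and "slave acquired after release: true". The reported bug is precisely the case where the middle line would print true across two machines.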
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)