[ https://issues.apache.org/jira/browse/AMQ-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298361#comment-14298361 ]
Torsten Mielke edited comment on AMQ-5549 at 1/30/15 8:28 AM:
--------------------------------------------------------------
Some NFS mount options may not support a quick broker failover from master to slave. The options we finally got the best results with were

{code}
timeo=100,retrans=1,soft,noac
{code}

We reduced the timeout to 10 seconds (timeo is specified in tenths of a second) and also reduced the retry count to just 1. In addition, a hard mount seems to retry NFS operations forever (according to the man page), whereas a soft mount fails an operation after retrans retransmission attempts. This is most likely what you want in order to ensure a quick failover. Finally, the noac option also seemed to have a big effect on how quickly the master broker detects the NFS failure: besides disabling attribute caching, it also causes synchronous writes to NFS, which seems to propagate exceptions more quickly. It most likely has a negative impact on performance, though.

I can't provide any scientific support for these arguments beyond the above, but with these settings the master broker would shut down much more quickly upon an NFS failure.
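To put the timeo/retrans values in perspective, the following Java sketch estimates how long a soft mount keeps retrying before an I/O error reaches the broker. It assumes the linear backoff described in the nfs(5) man page for NFS over TCP (each retransmission increases the timeout by timeo, capped at 600 seconds) and deliberately ignores TCP reconnects, NFSv4 lease recovery and kernel-specific behavior, so treat the numbers as rough. The class and method names are hypothetical. With a hard mount, as in the original report, there is no such bound at all.

```java
// Rough model (assumption): linear backoff as described in nfs(5) for NFS over TCP.
// Attempt k (0-based) waits timeo * (k + 1) deciseconds, capped at 600 s, and a
// soft mount gives up after `retrans` retransmissions, i.e. retrans + 1 attempts.
// Hypothetical helper for illustration only; real kernels may behave differently.
public final class NfsSoftTimeout {
    private static final long MAX_TIMEOUT_DECISECONDS = 6000; // 600 s cap

    /** Estimated seconds before a soft mount returns an I/O error to the caller. */
    public static double estimateSeconds(int timeoDeciseconds, int retrans) {
        if (timeoDeciseconds <= 0 || retrans < 0) {
            throw new IllegalArgumentException("timeo must be > 0 and retrans >= 0");
        }
        long totalDeciseconds = 0;
        for (int attempt = 0; attempt <= retrans; attempt++) {
            long wait = (long) timeoDeciseconds * (attempt + 1);
            totalDeciseconds += Math.min(wait, MAX_TIMEOUT_DECISECONDS);
        }
        return totalDeciseconds / 10.0; // deciseconds -> seconds
    }

    public static void main(String[] args) {
        // Options from the comment: timeo=100,retrans=1,soft
        System.out.println("timeo=100,retrans=1: ~" + estimateSeconds(100, 1) + " s");
        // For comparison: timeo=600 (the documented TCP default) with retrans=2
        System.out.println("timeo=600,retrans=2: ~" + estimateSeconds(600, 2) + " s");
    }
}
```

Under this model, timeo=100,retrans=1 gives roughly 30 seconds (10 s + 20 s) before the error surfaces, compared with roughly 360 seconds for timeo=600,retrans=2 on a soft mount.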
> Shared Filesystem Master/Slave using NFSv4 allows both brokers become active at the same time
> ---------------------------------------------------------------------------------------------
>
>                 Key: AMQ-5549
>                 URL: https://issues.apache.org/jira/browse/AMQ-5549
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker, Message Store
>    Affects Versions: 5.10.1
>         Environment: - CentOS Linux 6
> - OpenJDK 1.7
> - ActiveMQ 5.10.1
>            Reporter: Heikki Manninen
>            Priority: Critical
>
> Identical ActiveMQ master and slave brokers are installed on CentOS Linux 6
> virtual machines. A third virtual machine (also CentOS 6) provides an NFSv4
> share for the brokers' KahaDB.
> Both brokers are started; the master broker acquires the file lock on the lock
> file, and the slave broker sits in a loop waiting for the lock, as expected.
> Switching between brokers also works as expected.
> Once the NFS server's network connection is disconnected, both the master and
> slave NFS mounts block, and the slave broker stops logging file lock retries.
> Shortly after the network connection is restored, the mounts come back and the
> slave broker is also able to acquire the lock, so both brokers hold it
> simultaneously. Both brokers accept client connections.
> In this situation it is also possible to stop and start either broker
> repeatedly, and each one is always able to acquire the lock even if the other
> one is already running. Only after stopping both brokers and starting them
> again does the situation return to normal.
>
> * NFS server:
> ** CentOS Linux 6
> ** NFS v4 export options: rw,sync
> ** NFS v4 grace time 45 seconds
> ** NFS v4 lease time 10 seconds
> * NFS client:
> ** CentOS Linux 6
> ** NFS mount options: nfsvers=4,proto=tcp,hard,wsize=65536,rsize=65536
> * ActiveMQ configuration (otherwise default):
> {code:xml}
> <persistenceAdapter>
>     <kahaDB directory="${activemq.data}/kahadb">
>         <locker>
>             <shared-file-locker lockAcquireSleepInterval="1000"/>
>         </locker>
>     </kahaDB>
> </persistenceAdapter>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
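The issue description relies on the slave broker polling for an exclusive lock on a shared lock file, with lockAcquireSleepInterval="1000" controlling the pause between attempts. The following Java sketch shows that general pattern using java.nio FileChannel.tryLock() and a fixed sleep. It is a hypothetical illustration, not ActiveMQ's SharedFileLocker implementation; the names LockPoller and acquire are made up. Note that a loop like this only checks for the lock before acquiring it and never re-validates it afterwards, so mutual exclusion depends entirely on the NFS server continuing to honor the lock across an outage like the one described in the issue.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical lock-polling loop in the spirit of <shared-file-locker>.
// NOT ActiveMQ's code; names and structure are illustrative assumptions.
public final class LockPoller {

    /**
     * Polls for an exclusive lock on lockFile, sleeping sleepMillis between
     * attempts (the role lockAcquireSleepInterval plays in the quoted config).
     * Returns the held lock, or null if maxAttempts is exhausted.
     */
    public static FileLock acquire(Path lockFile, long sleepMillis, int maxAttempts)
            throws IOException, InterruptedException {
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
        boolean acquired = false;
        try {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                // tryLock() returns null when another process holds the lock.
                FileLock lock = channel.tryLock();
                if (lock != null) {
                    acquired = true;
                    return lock; // the channel must stay open while the lock is held
                }
                Thread.sleep(sleepMillis);
            }
            return null;
        } finally {
            if (!acquired) {
                channel.close(); // do not leak the channel on failure
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path lockFile = Files.createTempFile("kahadb", ".lock");
        FileLock lock = acquire(lockFile, 1000, 3);
        System.out.println("acquired: " + (lock != null));
        if (lock != null) {
            lock.channel().close(); // closing the channel releases the lock
        }
        Files.deleteIfExists(lockFile);
    }
}
```

A second broker process running the same loop against the same file would keep getting null from tryLock() and sleeping, which is the "slave sits in a loop" behavior the report describes as expected.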