[jira] [Comment Edited] (AMQ-5549) Shared Filesystem Master/Slave using NFSv4 allows both brokers become active at the same time

Torsten Mielke (JIRA) Fri, 30 Jan 2015 00:30:08 -0800

    [ 
https://issues.apache.org/jira/browse/AMQ-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298361#comment-14298361
 ]


Torsten Mielke edited comment on AMQ-5549 at 1/30/15 8:28 AM:
--------------------------------------------------------------

Some of the NFS mount options may not support a quick broker failover from 
master to slave. 
The options we finally got the best results with were:

{code}
timeo=100,retrans=1,soft,noac
{code}

We reduced the timeout to 10 seconds (timeo is given in tenths of a second) 
and also reduced the number of retries to just 1. 
In addition, a hard mount seems to retry NFS operations forever (according to 
the man page), whereas with a soft mount operations will fail after retrans 
retransmission attempts. That is most likely what you want in order to ensure 
a quick failover.
And finally, the noac option also seemed to have a big effect on the speed at 
which the master broker detects the NFS failure, as it also causes synchronous 
writes to NFS, which seems to propagate exceptions more quickly. It most 
likely has a negative impact on performance though.

I can't provide any scientific support for these arguments other than the 
above, but with these settings the master broker would shut down much quicker 
upon an NFS failure. 
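
As a rough illustration of how the option string above reads, here is a small Python sketch. It is not part of the testing described here, and the function names are made up for this example. It only relies on what the nfs(5) man page states: timeo is expressed in tenths of a second, and a mount is hard unless soft is given.

```python
def parse_nfs_options(option_string):
    """Split a comma-separated NFS option string into a dict.

    Options of the form key=value map to the value as a string;
    bare flags such as 'soft' or 'noac' map to True.
    """
    options = {}
    for item in option_string.split(","):
        key, sep, value = item.strip().partition("=")
        if key:
            options[key] = value if sep else True
    return options


def first_timeout_seconds(options):
    """Return the initial request timeout in seconds, or None if timeo is unset.

    timeo is given in tenths of a second, so timeo=100 means 10 seconds.
    """
    if "timeo" not in options:
        return None
    return int(options["timeo"]) / 10.0


def is_soft_mount(options):
    """True only if 'soft' was requested; otherwise the mount is hard."""
    return options.get("soft", False) is True


tuned = parse_nfs_options("timeo=100,retrans=1,soft,noac")
print(first_timeout_seconds(tuned))  # 10.0
print(is_soft_mount(tuned))          # True
```

Running the same functions on the reporter's original options (nfsvers=4,proto=tcp,hard,wsize=65536,rsize=65536) shows a hard mount with no explicit timeo, which matches the blocking behaviour described in the issue.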



was (Author: tmielke):
Some of the NFS mount options may not support a quick broker failover from 
master to slave. 
The options we finally got the best results with were:

{code}
timeo=100,retrans=1,soft,noac
{code}

We reduced the timeout to 10 seconds (timeo is given in tenths of a second) 
and also reduced the number of retries to just 1. 
In addition, a hard mount seems to retry NFS operations forever (according to 
the man page), whereas with a soft mount operations will fail after retrans 
retransmission attempts.
And finally, the noac option also seemed to have a big effect on the speed at 
which the master broker detects the NFS failure, as it also causes synchronous 
writes to NFS, which seems to propagate exceptions more quickly. It most 
likely has a negative impact on performance though.

I can't provide any scientific support for these arguments other than the 
above, but with these settings the master broker would shut down much quicker 
upon an NFS failure. 


> Shared Filesystem Master/Slave using NFSv4 allows both brokers become active 
> at the same time
> ---------------------------------------------------------------------------------------------
>
>                 Key: AMQ-5549
>                 URL: https://issues.apache.org/jira/browse/AMQ-5549
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker, Message Store
>    Affects Versions: 5.10.1
>         Environment: - CentOS Linux 6
> - OpenJDK 1.7
> - ActiveMQ 5.10.1
>            Reporter: Heikki Manninen
>            Priority: Critical
>
> Identical ActiveMQ master and slave brokers are installed on CentOS Linux 6 
> virtual machines. There is a third virtual machine (also CentOS 6) providing 
> an NFSv4 share for the brokers' KahaDB.
> Both brokers are started; the master broker acquires the file lock on the 
> lock file, and the slave broker sits in a loop and waits for the lock as 
> expected. Switching between brokers also works as expected.
> Once the network connection of the NFS server is disconnected, both the 
> master and slave NFS mounts block and the slave broker stops logging file 
> lock retries. Shortly after the network connection is brought back, the 
> mounts come back and the slave broker is able to acquire the lock while the 
> master still holds it. Both brokers accept client connections.
> In this situation it is also possible to stop and start each broker 
> individually many times, and each is always able to acquire the lock even if 
> the other one is already running. Only after stopping both brokers and 
> starting them again is the situation back to normal.
> * NFS server:
> ** CentOS Linux 6
> ** NFS v4 export options: rw,sync
> ** NFS v4 grace time 45 seconds
> ** NFS v4 lease time 10 seconds
> * NFS client:
> ** CentOS Linux 6
> ** NFS mount options: nfsvers=4,proto=tcp,hard,wsize=65536,rsize=65536
> * ActiveMQ configuration (otherwise default):
> {code:xml}
>         <persistenceAdapter>
>             <kahaDB directory="${activemq.data}/kahadb">
>               <locker>
>                 <shared-file-locker lockAcquireSleepInterval="1000"/>
>               </locker>
>             </kahaDB>
>         </persistenceAdapter>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
