Fencing, yes.  I have pcmk-redirect for each node in cluster.conf.

I run with the default cman settings for corosync, with no totem clause.  That
gives the roughly 20s detection.  I'm not sure what the defaults really are.
I added <totem token="1000" token_retransmits_before_loss_const="5" /> to
cluster.conf and now get about 5s detection.
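
For anyone who wants to double-check which totem values are actually present in a config file, here is a minimal Python sketch.  The surrounding <cluster> element, its name and its config_version are hypothetical placeholders; only the <totem> line comes from this message, and placing <totem> directly under <cluster> is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down cluster.conf. Only the <totem> line is taken
# from this thread; the <cluster> wrapper and its attributes are placeholders.
CLUSTER_CONF = """\
<cluster name="example" config_version="1">
  <totem token="1000" token_retransmits_before_loss_const="5" />
</cluster>
"""

def totem_settings(xml_text):
    """Return the attributes of the first <totem> child, or {} if there is none."""
    root = ET.fromstring(xml_text)
    totem = root.find("totem")
    return dict(totem.attrib) if totem is not None else {}

print(totem_settings(CLUSTER_CONF))
```

An empty result would mean there is no totem clause, so whatever defaults cman applies are in effect.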

The corosync man page says:
       token  This timeout specifies in milliseconds until a token loss is
              declared after not receiving a token.  This is the time spent
              detecting a failure of a processor in the current
              configuration.  Reforming a new configuration takes about 50
              milliseconds in addition to this timeout.

              The default is 1000 milliseconds.

       token_retransmit
              This timeout specifies in milliseconds after how long before
              receiving a token the token is retransmitted.  This will be
              automatically calculated if token is modified.  It is not
              recommended to alter this value without guidance from the
              corosync community.

              The default is 238 milliseconds.

       hold   This timeout specifies in milliseconds how long the token
              should be held by the representative when the protocol is
              under low utilization.  It is not recommended to alter this
              value without guidance from the corosync community.

              The default is 180 milliseconds.

       token_retransmits_before_loss_const
              This value identifies how many token retransmits should be
              attempted before forming a new configuration.  If this value
              is set, retransmit and hold will be automatically calculated
              from retransmits_before_loss and token.

              The default is 4 retransmissions.

But I don't know what cman sets these to.  They aren't these values, and
they aren't the values in the cman man page either, which says this:
              Cman uses different defaults for some of the corosync
              parameters listed in corosync.conf(5).  If you wish to use a
              non-default setting, they can be configured in cluster.conf as
              shown above.  Cman uses the following default values:

                <totem
                  vsftype="none"
                  token="10000"
                  token_retransmits_before_loss_const="20"
                  join="60"
                  consensus="4800"
                  rrp_mode="none"
                  <!-- or rrp_mode="active" if altnames are present >
                />
               
So, it looks like setting the corosync parameters in cluster.conf has some 
effect.  Cman seems to pass them to corosync.
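
One way to picture what the two man pages describe is as layered defaults.  The sketch below is only a model, assuming that values in cluster.conf override the cman defaults, which in turn override the corosync defaults.  That precedence order is an assumption read from the man page text, not something verified against cman itself, and the measured detection times above suggest the real defaults may differ from the documented ones.  The dictionary and function names are made up for illustration.

```python
from collections import ChainMap

# Defaults quoted from the corosync man page (milliseconds and counts).
COROSYNC_DEFAULTS = {"token": 1000, "token_retransmits_before_loss_const": 4}

# Numeric defaults quoted from the cman man page.
CMAN_DEFAULTS = {
    "token": 10000,
    "token_retransmits_before_loss_const": 20,
    "join": 60,
    "consensus": 4800,
}

# The <totem> override added to cluster.conf in this message.
CLUSTER_CONF_TOTEM = {"token": 1000, "token_retransmits_before_loss_const": 5}

def effective(*layers):
    """Merge settings; earlier layers take precedence over later ones (assumed order)."""
    return dict(ChainMap(*layers))

# With the totem clause present.
print(effective(CLUSTER_CONF_TOTEM, CMAN_DEFAULTS, COROSYNC_DEFAULTS))
# With no totem clause, per the documented cman defaults.
print(effective(CMAN_DEFAULTS, COROSYNC_DEFAULTS))
```

Comparing the two printed dictionaries with measured detection times is a quick way to see whether the documented defaults match what the cluster is really doing.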

Onward!


Regards.
Mark K Vallevand   mark.vallev...@unisys.com
Never try and teach a pig to sing: it's a waste of time, and it annoys the pig.


-----Original Message-----
From: Digimer [mailto:li...@alteeve.ca] 
Sent: Friday, October 16, 2015 11:18 AM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Cluster node loss detection.

On 16/10/15 11:40 AM, Vallevand, Mark K wrote:
> Thanks.  I wasn't completely aware of corosync's role in this.  I see new 
> things in the docs every time I read them.
> 
> I looked up the corosync settings at one time and did it again:
>       token loss 3000ms
>       retransmits 10
> So 30s.  Redid my simple testing and got detection times of 22s, 26s, and 25s 
> using very crude methods.
> Any warnings about setting these values to something else?
> We require our customers to use an isolated, private network for cluster 
> communications.  All taken care of in our instructions and cluster 
> configuration scripts.  Network traffic will not be a factor.  So, I'm 
> thinking 1000ms and 5 retransmits as an experiment.

That is very high. I think the default is something like 236ms x 4 losses.

You do have fencing, right?

> I was pretty sure that DLM was just being informed by clustering, but I 
> needed to ask.
> 
> Again, thanks.


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
