Hi,
I'm performing downtime measurement tests using corosync 2.3.0 and
pacemaker 1.1.12 under RHEL 6.5 MRG. Although this is not recommended, I
tuned the corosync timer settings to the following insane values:
# Timeout for token (ms)
token: 60
token_retransmits_before_loss_const: 1
# How long to wait for join messages in the membership protocol (ms)
join: 35
consensus: 70
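For context, all of these live in the totem section of corosync.conf; a
minimal sketch of that section (the interface addresses below are
placeholders, not my real network) looks like this:

totem {
        version: 2
        # Timeout for token (ms); the default is 1000
        token: 60
        # Declare the token lost after this many retransmits
        token_retransmits_before_loss_const: 1
        # How long to wait for join messages in the membership protocol (ms)
        join: 35
        # Must be larger than token (ms)
        consensus: 70
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 239.255.1.1
                mcastport: 5405
        }
}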
My two-node cluster consists of a kamailio clone resource, which
replicates the so-called userlocation state at the application level via
DMQ (see [1]). The switchover migrates an ocf:heartbeat:IPaddr2 resource.
With these settings, the service downtime is below 100ms in the case of a
controlled cluster switchover, i.e. when "/etc/init.d/pacemaker stop" and
"/etc/init.d/corosync stop" are executed.
The service downtime is about 400ms when a power loss is simulated on the
active node while that node does not hold the DC role. When I simulate a
power loss on the active node while it also holds the DC role, the service
downtime increases to about 1500ms. As the timestamps in the logs have
only one-second resolution, it is hard to provide more precise numbers,
but apparently the DC election procedure takes more than 1000ms.
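For what it's worth, sub-second numbers can also be obtained by probing
the virtual IP from a third host instead of relying on the logs, e.g. with
something like the following (address, interval and threshold are just
examples; the short interval needs root):

ping -D -i 0.01 192.0.2.10 \
  | awk -F'[][]' '/bytes from/ {if (p && $2-p > 0.05) printf "gap %.3f s\n", $2-p; p=$2}'

This prints the length of any gap longer than 50ms between two consecutive
ICMP replies, which approximates the downtime of the address.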
Are there any possibilities to tune the DC election process? Is there any
documentation available that describes what happens in this situation?
Tests with more nodes in the cluster showed that the service downtime
increases with the number of online cluster nodes, even if the DC runs on
one of the nodes that remain active.
I'm using one ring only. It looks as if using two rings does not change
the test results much.
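For completeness, by two rings I mean the usual rrp setup, roughly
(networks and addresses are placeholders):

totem {
        # redundant ring protocol, passive mode
        rrp_mode: passive
        ...
        # second ring, in addition to the existing ringnumber 0 interface
        interface {
                ringnumber: 1
                bindnetaddr: 192.168.2.0
                mcastaddr: 239.255.2.1
                mcastport: 5405
        }
}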
Thank you,
Stefan
[1] http://kamailio.org/docs/modules/devel/modules/dmq.html