Hi, I have set up an ACTIVE/PASSIVE HA cluster.
*Issue 1)*

My *corosync.conf* file is:

# Please read the openais.conf.5 manual page
totem {
        version: 2

        # How long before declaring a token lost (ms)
        token: 10000

        # How many token retransmits before forming a new configuration
        token_retransmits_before_loss_const: 20

        # How long to wait for join messages in the membership protocol (ms)
        join: 10000

        # How long to wait for consensus to be achieved before starting a new
        # round of membership configuration (ms)
        consensus: 12000

        # Turn off the virtual synchrony filter
        vsftype: none

        # Number of messages that may be sent by one processor on receipt of the token
        max_messages: 20

        # Limit generated nodeids to 31-bits (positive signed integers)
        clear_node_high_bit: yes

        # Disable encryption
        secauth: off

        # How many threads to use for encryption/decryption
        threads: 0

        # Optionally assign a fixed node id (integer)
        # nodeid: 1234

        # This specifies the mode of redundant ring, which may be none, active, or passive.
        rrp_mode: none

        interface {
                # The following values need to be set based on your environment
                ringnumber: 0
                bindnetaddr: 192.168.101.0
                mcastport: 5405
        }

        transport: udpu
}

amf {
        mode: disabled
}

quorum {
        # Quorum for the Pacemaker Cluster Resource Manager
        provider: corosync_votequorum
        expected_votes: 1
}

nodelist {
        node {
                ring0_addr: 192.168.101.73
        }
        node {
                ring0_addr: 192.168.101.74
        }
}

aisexec {
        user: root
        group: root
}

logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        to_syslog: yes
        syslog_facility: daemon
        logfile: /var/log/corosync/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}

I have added 5 resources: 1 is a VIP and 4 are upstart jobs. The node names are configured as sc-node-1 (ACTIVE) and sc-node-2 (PASSIVE), and the resources are running on the ACTIVE node.

Default cluster properties:

<cluster_property_set id="cib-bootstrap-options">
  <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.10-42f2063"/>
  <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
  <nvpair name="no-quorum-policy" value="ignore" id="cib-bootstrap-options-no-quorum-policy"/>
  <nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
  <nvpair name="cluster-recheck-interval" value="3min" id="cib-bootstrap-options-cluster-recheck-interval"/>
  <nvpair name="default-action-timeout" value="120s" id="cib-bootstrap-options-default-action-timeout"/>
</cluster_property_set>

But sometimes, after 2-3 migrations from ACTIVE to STANDBY and then from STANDBY back to ACTIVE, both nodes become OFFLINE and the Current DC becomes None. I have disabled the stonith property, and quorum is ignored as well:

root@sc-node-2:/usr/lib/python2.7/dist-packages/sc# crm status
Last updated: Sat Oct  3 00:01:40 2015
Last change: Fri Oct  2 23:38:28 2015 via crm_resource on sc-node-1
Stack: corosync
Current DC: NONE
2 Nodes configured
5 Resources configured

OFFLINE: [ sc-node-1 sc-node-2 ]

What is going wrong here? Why does the Current DC suddenly become None? Is corosync.conf okay? Are the default cluster properties fine? Help will be appreciated.

*Issue 2)*

The command used to add an upstart job is:

crm configure primitive service upstart:service \
    meta allow-migrate=true migration-threshold=5 failure-timeout=30s \
    op monitor interval=15s timeout=60s

But still I sometimes see the fail count going to INFINITY. Why? How can we avoid it? The resource should have migrated as soon as it reached the migration threshold.
* Node sc-node-2:
   service: migration-threshold=5 fail-count=1000000 last-failure='Fri Oct  2 23:38:53 2015'
   service1: migration-threshold=5 fail-count=1000000 last-failure='Fri Oct  2 23:38:53 2015'

Failed actions:
    service_start_0 (node=sc-node-2, call=-1, rc=1, status=Timed Out, last-rc-change=Fri Oct  2 23:38:53 2015, queued=0ms, exec=0ms): unknown error
    service1_start_0 (node=sc-node-2, call=-1, rc=1, status=Timed Out, last-rc-change=Fri Oct  2 23:38:53 2015, queued=0ms, exec=0ms): unknown error

--
Thanks and Regards,
Pritam Kharat.
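P.S. For anyone willing to help debug: below is a sketch of the diagnostics I plan to run the next time the nodes go OFFLINE, using the standard corosync/crmsh tools and the resource/node names from above. The start-failure-is-fatal note at the end is only my assumption about why fail-count jumps straight to INFINITY; I have not verified that changing it is the right fix here.

```shell
# When both nodes show OFFLINE, check corosync membership first,
# on each node, to see whether the ring itself broke:
corosync-cfgtool -s       # ring status for the local node
corosync-quorumtool -s    # quorum state and vote counts

# Inspect and clear the INFINITY fail count for one resource
# (crmsh syntax; repeat per resource):
crm resource failcount service show sc-node-2
crm resource failcount service set sc-node-2 0
crm resource cleanup service sc-node-2   # also clears the failed-action entry

# Assumption: a failed *start* (as in the "service_start_0 ... Timed Out"
# entries above) sets fail-count to INFINITY regardless of
# migration-threshold while start-failure-is-fatal is at its default (true).
# Relaxing it so start failures count against migration-threshold instead:
# crm configure property start-failure-is-fatal=false
```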
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org