Re: [Pacemaker] query ?

2014-09-28 Thread Andrew Beekhof

On 29 Sep 2014, at 12:26 pm, Alex Samad - Yieldbroker 
 wrote:

> Cool, thanks
> 
> Thought it might have been a normal check.

If there is a problem, we'll normally log it as 'error' or 'crit'.

> 
> A
> 
>> -----Original Message-----
>> From: renayama19661...@ybb.ne.jp
>> [mailto:renayama19661...@ybb.ne.jp]
>> Sent: Monday, 29 September 2014 12:20 PM
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] query ?
>> 
>> Hi Alex,
>> 
>> Because the recheck timer fires every 15 minutes by default, the
>> pengine recalculates the state transition.
>> 
>> 
>> -
>> { XML_CONFIG_ATTR_RECHECK, "cluster_recheck_interval", "time",
>>   "Zero disables polling.  Positive values are an interval in seconds (unless other SI units are specified. eg. 5min)", "15min", &check_timer,
>>   "Polling interval for time based changes to options, resource parameters and constraints.",
>>   "The Cluster is primarily event driven, however the configuration can have elements that change based on time."
>>   "  To ensure these changes take effect, we can optionally poll the cluster's status for changes." },
>> { "load-threshold", NULL, "percentage", NULL, "80%", &check_utilization,
>>   "The maximum amount of system resources that should be used by nodes in the cluster",
>>   "The cluster will slow down its recovery process when the amount of system resources used"
>>   " (currently CPU) approaches this limit", },
>> -
>> 
>> Best Regards,
>> Hideo Yamauchi.
>> 
>> 
>> 
>> 
>> ----- Original Message -----
>>> From: Alex Samad - Yieldbroker 
>>> To: "pacemaker@oss.clusterlabs.org" 
>>> Cc:
>>> Date: 2014/9/29, Mon 10:56
>>> Subject: [Pacemaker] query ?
>>> 
>>> Hi
>>> 
>>> Is this normal logging?
>>> 
>>> Not sure if I need to investigate anything.
>>> 
>>> Sep 29 11:35:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
>>> Sep 29 11:35:15 gsdmz1 pengine[2480]:   notice: unpack_config: On loss of CCM Quorum: Ignore
>>> Sep 29 11:35:15 gsdmz1 pengine[2480]:   notice: process_pe_message: Calculated Transition 196: /var/lib/pacer/pengine/pe-input-247.bz2
>>> Sep 29 11:35:15 gsdmz1 crmd[2481]:   notice: run_graph: Transition 196 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacer/pengine/pe-input-247.bz2): Complete
>>> Sep 29 11:35:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>>> Sep 29 11:50:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
>>> Sep 29 11:50:15 gsdmz1 pengine[2480]:   notice: unpack_config: On loss of CCM Quorum: Ignore
>>> Sep 29 11:50:15 gsdmz1 pengine[2480]:   notice: process_pe_message: Calculated Transition 197: /var/lib/pacer/pengine/pe-input-247.bz2
>>> Sep 29 11:50:15 gsdmz1 crmd[2481]:   notice: run_graph: Transition 197 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacer/pengine/pe-input-247.bz2): Complete
>>> Sep 29 11:50:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>>> 
>>> ___
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org Getting started:
>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>> 
>> 
> 





Re: [Pacemaker] query ?

2014-09-28 Thread Alex Samad - Yieldbroker
Cool, thanks

Thought it might have been a normal check.

A

> -----Original Message-----
> From: renayama19661...@ybb.ne.jp
> [mailto:renayama19661...@ybb.ne.jp]
> Sent: Monday, 29 September 2014 12:20 PM
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] query ?
> 
> Hi Alex,
> 
> Because the recheck timer fires every 15 minutes by default, the
> pengine recalculates the state transition.
> 
> 
> -
> { XML_CONFIG_ATTR_RECHECK, "cluster_recheck_interval", "time",
>   "Zero disables polling.  Positive values are an interval in seconds (unless other SI units are specified. eg. 5min)", "15min", &check_timer,
>   "Polling interval for time based changes to options, resource parameters and constraints.",
>   "The Cluster is primarily event driven, however the configuration can have elements that change based on time."
>   "  To ensure these changes take effect, we can optionally poll the cluster's status for changes." },
> { "load-threshold", NULL, "percentage", NULL, "80%", &check_utilization,
>   "The maximum amount of system resources that should be used by nodes in the cluster",
>   "The cluster will slow down its recovery process when the amount of system resources used"
>   " (currently CPU) approaches this limit", },
> -
> 
> Best Regards,
> Hideo Yamauchi.
> 
> 
> 
> 
> ----- Original Message -----
> > From: Alex Samad - Yieldbroker 
> > To: "pacemaker@oss.clusterlabs.org" 
> > Cc:
> > Date: 2014/9/29, Mon 10:56
> > Subject: [Pacemaker] query ?
> >
> > Hi
> >
> > Is this normal logging?
> >
> > Not sure if I need to investigate anything.
> >
> > Sep 29 11:35:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> > Sep 29 11:35:15 gsdmz1 pengine[2480]:   notice: unpack_config: On loss of CCM Quorum: Ignore
> > Sep 29 11:35:15 gsdmz1 pengine[2480]:   notice: process_pe_message: Calculated Transition 196: /var/lib/pacer/pengine/pe-input-247.bz2
> > Sep 29 11:35:15 gsdmz1 crmd[2481]:   notice: run_graph: Transition 196 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacer/pengine/pe-input-247.bz2): Complete
> > Sep 29 11:35:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > Sep 29 11:50:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> > Sep 29 11:50:15 gsdmz1 pengine[2480]:   notice: unpack_config: On loss of CCM Quorum: Ignore
> > Sep 29 11:50:15 gsdmz1 pengine[2480]:   notice: process_pe_message: Calculated Transition 197: /var/lib/pacer/pengine/pe-input-247.bz2
> > Sep 29 11:50:15 gsdmz1 crmd[2481]:   notice: run_graph: Transition 197 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacer/pengine/pe-input-247.bz2): Complete
> > Sep 29 11:50:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> >
> >
> 



Re: [Pacemaker] query ?

2014-09-28 Thread renayama19661014
Hi Alex,

Because the recheck timer fires every 15 minutes by default, the pengine
recalculates the state transition.


-
{ XML_CONFIG_ATTR_RECHECK, "cluster_recheck_interval", "time",
  "Zero disables polling.  Positive values are an interval in seconds (unless 
other SI units are specified. eg. 5min)", "15min", &check_timer,
  "Polling interval for time based changes to options, resource parameters and 
constraints.",
  "The Cluster is primarily event driven, however the configuration can have 
elements that change based on time."
  "  To ensure these changes take effect, we can optionally poll the cluster's 
status for changes." },
{ "load-threshold", NULL, "percentage", NULL, "80%", &check_utilization,
  "The maximum amount of system resources that should be used by nodes in the 
cluster",
  "The cluster will slow down its recovery process when the amount of system 
resources used"
          " (currently CPU) approaches this limit", },
-
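In practice that interval is a cluster property and can be tuned or disabled. A minimal sketch, assuming crmsh or pcs is installed (in a deployed CIB the option is usually spelled cluster-recheck-interval):

```shell
# Poll every 5 minutes instead of the default 15 (crmsh):
crm configure property cluster-recheck-interval="5min"

# Equivalent with pcs:
pcs property set cluster-recheck-interval=5min

# A value of 0 disables the periodic recheck entirely.
```

Shortening the interval just makes the pengine recalculate more often; the default is normally fine unless time-based rules need to take effect sooner.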

Best Regards,
Hideo Yamauchi.




----- Original Message -----
> From: Alex Samad - Yieldbroker 
> To: "pacemaker@oss.clusterlabs.org" 
> Cc: 
> Date: 2014/9/29, Mon 10:56
> Subject: [Pacemaker] query ?
> 
> Hi
> 
> Is this normal logging?
> 
> Not sure if I need to investigate anything.
> 
> Sep 29 11:35:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State 
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED 
> origin=crm_timer_popped ]
> Sep 29 11:35:15 gsdmz1 pengine[2480]:   notice: unpack_config: On loss of CCM 
> Quorum: Ignore
> Sep 29 11:35:15 gsdmz1 pengine[2480]:   notice: process_pe_message: 
> Calculated 
> Transition 196: /var/lib/pacer/pengine/pe-input-247.bz2
> Sep 29 11:35:15 gsdmz1 crmd[2481]:   notice: run_graph: Transition 196 
> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
> Source=/var/lib/pacer/pengine/pe-input-247.bz2): Complete
> Sep 29 11:35:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State 
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Sep 29 11:50:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State 
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED 
> origin=crm_timer_popped ]
> Sep 29 11:50:15 gsdmz1 pengine[2480]:   notice: unpack_config: On loss of CCM 
> Quorum: Ignore
> Sep 29 11:50:15 gsdmz1 pengine[2480]:   notice: process_pe_message: 
> Calculated 
> Transition 197: /var/lib/pacer/pengine/pe-input-247.bz2
> Sep 29 11:50:15 gsdmz1 crmd[2481]:   notice: run_graph: Transition 197 
> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
> Source=/var/lib/pacer/pengine/pe-input-247.bz2): Complete
> Sep 29 11:50:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State 
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> 
> 



[Pacemaker] query ?

2014-09-28 Thread Alex Samad - Yieldbroker
Hi

Is this normal logging?

Not sure if I need to investigate anything.

Sep 29 11:35:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED 
origin=crm_timer_popped ]
Sep 29 11:35:15 gsdmz1 pengine[2480]:   notice: unpack_config: On loss of CCM 
Quorum: Ignore
Sep 29 11:35:15 gsdmz1 pengine[2480]:   notice: process_pe_message: Calculated 
Transition 196: /var/lib/pacer/pengine/pe-input-247.bz2
Sep 29 11:35:15 gsdmz1 crmd[2481]:   notice: run_graph: Transition 196 
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacer/pengine/pe-input-247.bz2): Complete
Sep 29 11:35:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Sep 29 11:50:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED 
origin=crm_timer_popped ]
Sep 29 11:50:15 gsdmz1 pengine[2480]:   notice: unpack_config: On loss of CCM 
Quorum: Ignore
Sep 29 11:50:15 gsdmz1 pengine[2480]:   notice: process_pe_message: Calculated 
Transition 197: /var/lib/pacer/pengine/pe-input-247.bz2
Sep 29 11:50:15 gsdmz1 crmd[2481]:   notice: run_graph: Transition 197 
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacer/pengine/pe-input-247.bz2): Complete
Sep 29 11:50:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]



Re: [Pacemaker] Master is restarted when other node comes online

2014-09-28 Thread Andrei Borzenkov
On Sun, 28 Sep 2014 13:03:08 +0200,
emmanuel segura  wrote:

> Try using the interleave meta attribute in your clone definition.
> 
> http://www.hastexo.com/resources/hints-and-kinks/interleaving-pacemaker-clones
> 

I would appreciate it if you could elaborate a bit more here. The above
link explains dependencies between two cloned resources. I have a single
cloned resource, without any external dependency. I'm afraid I don't see
how interleave helps here.

Thank you!

> 2014-09-28 9:56 GMT+02:00 Andrei Borzenkov :
> > I have a two-node cluster with a single master/slave resource (a
> > replicated database) using pacemaker+openais on SLES11 SP3 (pacemaker
> > 1.1.11-3ca8c3b). I hit a weird situation that I had not seen before,
> > and I cannot really understand it. Assume the master runs on node A
> > and the slave runs on node B. If I stop the cluster stack on B
> > (rcopenais stop) and start it again (rcopenais start), the master is
> > restarted. Of course this means a service interruption. The same
> > happens if I reboot node B. I have a crm_report and can provide
> > whatever logs are required, but I first wanted to quickly make sure
> > this is not expected behavior.
> >
> > I have not seen it before, but now that I recall what I tested, it
> > was always a simulation of node failure. I never really tried the
> > above scenario.
> >
> > Assuming this is the correct behavior, what is the correct procedure
> > to shut down a single node then? It makes it impossible to do any
> > maintenance on the slave node.
> >
> > Configuration below:
> >
> > node msksaphana1 \
> > attributes hana_hdb_vhost="msksaphana1" hana_hdb_site="SITE1"
> > hana_hdb_remoteHost="msksaphana2" hana_hdb_srmode="sync"
> > lpa_hdb_lpt="1411732740"
> > node msksaphana2 \
> > attributes hana_hdb_vhost="msksaphana2" hana_hdb_site="SITE2"
> > hana_hdb_srmode="sync" hana_hdb_remoteHost="msksaphana1"
> > lpa_hdb_lpt="30"
> > primitive rsc_SAPHanaTopology_HDB_HDB00 ocf:suse:SAPHanaTopology \
> > params SID="HDB" InstanceNumber="00" \
> > op monitor interval="10" timeout="600" \
> > op start interval="0" timeout="600" \
> > op stop interval="0" timeout="300"
> > primitive rsc_SAPHana_HDB_HDB00 ocf:suse:SAPHana \
> > params SID="HDB" InstanceNumber="00" PREFER_SITE_TAKEOVER="true"
> > AUTOMATED_REGISTER="true" DUPLICATE_PRIMARY_TIMEOUT="7200" \
> > op start timeout="3600" interval="0" \
> > op stop timeout="3600" interval="0" \
> > op promote timeout="3600" interval="0" \
> > op monitor timeout="700" role="Master" interval="60" \
> > op monitor timeout="700" role="Slave" interval="61"
> > primitive rsc_ip_HDB_HDB00 ocf:heartbeat:IPaddr2 \
> > params ip="10.72.10.64" \
> > op start timeout="20" interval="0" \
> > op stop timeout="20" interval="0" \
> > op monitor interval="10" timeout="20"
> > primitive stonith_IPMI_msksaphana1 stonith:external/ipmi \
> > params ipmitool="/usr/bin/ipmitool" hostname="msksaphana1"
> > passwd="P@ssw0rd" userid="hacluster" ipaddr="10.72.5.47" \
> > op stop timeout="15" interval="0" \
> > op monitor timeout="20" interval="3600" \
> > op start timeout="20" interval="0" \
> > meta target-role="Started"
> > primitive stonith_IPMI_msksaphana2 stonith:external/ipmi \
> > params ipmitool="/usr/bin/ipmitool" hostname="msksaphana2"
> > passwd="P@ssw0rd" userid="hacluster" ipaddr="10.72.5.48" \
> > op stop timeout="15" interval="0" \
> > op monitor timeout="20" interval="3600" \
> > op start timeout="20" interval="0" \
> > meta target-role="Started"
> > ms msl_SAPHana_HDB_HDB00 rsc_SAPHana_HDB_HDB00 \
> > meta clone-max="2" clone-node-max="1" target-role="Started"
> > clone cln_SAPHanaTopology_HDB_HDB00 rsc_SAPHanaTopology_HDB_HDB00 \
> > meta is-managed="true" clone-node-max="1" target-role="Started"
> > location stonoth_IPMI_msksaphana1_on_msksaphana2
> > stonith_IPMI_msksaphana1 -inf: msksaphana1
> > location stonoth_IPMI_msksaphana2_on_msksaphana1
> > stonith_IPMI_msksaphana2 -inf: msksaphana2
> > colocation col_saphana_ip_HDB_HDB00 2000: rsc_ip_HDB_HDB00:Started
> > msl_SAPHana_HDB_HDB00:Master
> > order ord_SAPHana_HDB_HDB00 2000: cln_SAPHanaTopology_HDB_HDB00
> > msl_SAPHana_HDB_HDB00
> > property $id="cib-bootstrap-options" \
> > stonith-enabled="true" \
> > placement-strategy="balanced" \
> > dc-version="1.1.11-3ca8c3b" \
> > cluster-infrastructure="classic openais (with plugin)" \
> > expected-quorum-votes="2" \
> > stonith-action="reboot" \
> > no-quorum-policy="ignore" \
> > last-lrm-refresh="1411730405"
> > rsc_defaults $id="rsc-options" \
> > resource-stickiness="1" \
> > migration-threshold="3"
> > op_defaults $id="op-options" \
> > timeout="600" \
> > record-pending="true"
> >
> > Thank you!
> >
> > -andrei
> >

Re: [Pacemaker] Master is restarted when other node comes online

2014-09-28 Thread emmanuel segura
Try using the interleave meta attribute in your clone definition.

http://www.hastexo.com/resources/hints-and-kinks/interleaving-pacemaker-clones
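Applied to the configuration quoted below, that would mean adding interleave="true" to the clone/ms meta attributes; a rough, untested sketch in crm shell syntax:

```
ms msl_SAPHana_HDB_HDB00 rsc_SAPHana_HDB_HDB00 \
        meta clone-max="2" clone-node-max="1" interleave="true"
clone cln_SAPHanaTopology_HDB_HDB00 rsc_SAPHanaTopology_HDB_HDB00 \
        meta is-managed="true" clone-node-max="1" interleave="true"
```

With interleave set, ordering constraints between the clones are satisfied per node rather than across the whole instance set, so restarting one node's instances need not ripple to the peer.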

2014-09-28 9:56 GMT+02:00 Andrei Borzenkov :
> I have a two-node cluster with a single master/slave resource (a
> replicated database) using pacemaker+openais on SLES11 SP3 (pacemaker
> 1.1.11-3ca8c3b). I hit a weird situation that I had not seen before,
> and I cannot really understand it. Assume the master runs on node A and
> the slave runs on node B. If I stop the cluster stack on B (rcopenais
> stop) and start it again (rcopenais start), the master is restarted. Of
> course this means a service interruption. The same happens if I reboot
> node B. I have a crm_report and can provide whatever logs are required,
> but I first wanted to quickly make sure this is not expected behavior.
>
> I have not seen it before, but now that I recall what I tested, it was
> always a simulation of node failure. I never really tried the above
> scenario.
>
> Assuming this is the correct behavior, what is the correct procedure to
> shut down a single node then? It makes it impossible to do any
> maintenance on the slave node.
>
> Configuration below:
>
> node msksaphana1 \
> attributes hana_hdb_vhost="msksaphana1" hana_hdb_site="SITE1"
> hana_hdb_remoteHost="msksaphana2" hana_hdb_srmode="sync"
> lpa_hdb_lpt="1411732740"
> node msksaphana2 \
> attributes hana_hdb_vhost="msksaphana2" hana_hdb_site="SITE2"
> hana_hdb_srmode="sync" hana_hdb_remoteHost="msksaphana1"
> lpa_hdb_lpt="30"
> primitive rsc_SAPHanaTopology_HDB_HDB00 ocf:suse:SAPHanaTopology \
> params SID="HDB" InstanceNumber="00" \
> op monitor interval="10" timeout="600" \
> op start interval="0" timeout="600" \
> op stop interval="0" timeout="300"
> primitive rsc_SAPHana_HDB_HDB00 ocf:suse:SAPHana \
> params SID="HDB" InstanceNumber="00" PREFER_SITE_TAKEOVER="true"
> AUTOMATED_REGISTER="true" DUPLICATE_PRIMARY_TIMEOUT="7200" \
> op start timeout="3600" interval="0" \
> op stop timeout="3600" interval="0" \
> op promote timeout="3600" interval="0" \
> op monitor timeout="700" role="Master" interval="60" \
> op monitor timeout="700" role="Slave" interval="61"
> primitive rsc_ip_HDB_HDB00 ocf:heartbeat:IPaddr2 \
> params ip="10.72.10.64" \
> op start timeout="20" interval="0" \
> op stop timeout="20" interval="0" \
> op monitor interval="10" timeout="20"
> primitive stonith_IPMI_msksaphana1 stonith:external/ipmi \
> params ipmitool="/usr/bin/ipmitool" hostname="msksaphana1"
> passwd="P@ssw0rd" userid="hacluster" ipaddr="10.72.5.47" \
> op stop timeout="15" interval="0" \
> op monitor timeout="20" interval="3600" \
> op start timeout="20" interval="0" \
> meta target-role="Started"
> primitive stonith_IPMI_msksaphana2 stonith:external/ipmi \
> params ipmitool="/usr/bin/ipmitool" hostname="msksaphana2"
> passwd="P@ssw0rd" userid="hacluster" ipaddr="10.72.5.48" \
> op stop timeout="15" interval="0" \
> op monitor timeout="20" interval="3600" \
> op start timeout="20" interval="0" \
> meta target-role="Started"
> ms msl_SAPHana_HDB_HDB00 rsc_SAPHana_HDB_HDB00 \
> meta clone-max="2" clone-node-max="1" target-role="Started"
> clone cln_SAPHanaTopology_HDB_HDB00 rsc_SAPHanaTopology_HDB_HDB00 \
> meta is-managed="true" clone-node-max="1" target-role="Started"
> location stonoth_IPMI_msksaphana1_on_msksaphana2
> stonith_IPMI_msksaphana1 -inf: msksaphana1
> location stonoth_IPMI_msksaphana2_on_msksaphana1
> stonith_IPMI_msksaphana2 -inf: msksaphana2
> colocation col_saphana_ip_HDB_HDB00 2000: rsc_ip_HDB_HDB00:Started
> msl_SAPHana_HDB_HDB00:Master
> order ord_SAPHana_HDB_HDB00 2000: cln_SAPHanaTopology_HDB_HDB00
> msl_SAPHana_HDB_HDB00
> property $id="cib-bootstrap-options" \
> stonith-enabled="true" \
> placement-strategy="balanced" \
> dc-version="1.1.11-3ca8c3b" \
> cluster-infrastructure="classic openais (with plugin)" \
> expected-quorum-votes="2" \
> stonith-action="reboot" \
> no-quorum-policy="ignore" \
> last-lrm-refresh="1411730405"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="1" \
> migration-threshold="3"
> op_defaults $id="op-options" \
> timeout="600" \
> record-pending="true"
>
> Thank you!
>
> -andrei
>



-- 
this is my life and I live it for as long as God wills



[Pacemaker] Master is restarted when other node comes online

2014-09-28 Thread Andrei Borzenkov
I have a two-node cluster with a single master/slave resource (a
replicated database) using pacemaker+openais on SLES11 SP3 (pacemaker
1.1.11-3ca8c3b). I hit a weird situation that I had not seen before, and
I cannot really understand it. Assume the master runs on node A and the
slave runs on node B. If I stop the cluster stack on B (rcopenais stop)
and start it again (rcopenais start), the master is restarted. Of course
this means a service interruption. The same happens if I reboot node B.
I have a crm_report and can provide whatever logs are required, but I
first wanted to quickly make sure this is not expected behavior.

I have not seen it before, but now that I recall what I tested, it was
always a simulation of node failure. I never really tried the above
scenario.

Assuming this is the correct behavior, what is the correct procedure to
shut down a single node then? It makes it impossible to do any
maintenance on the slave node.
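(For reference, one common way to take a single node down for maintenance is to put it into standby first so its resources migrate cleanly; a sketch, assuming crmsh, using a node name from the configuration below:)

```shell
# Move resources off the node while it stays a cluster member:
crm node standby msksaphana2

# ... stop the stack, reboot, or do other maintenance ...

# Bring the node back into service afterwards:
crm node online msksaphana2
```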

Configuration below:

node msksaphana1 \
attributes hana_hdb_vhost="msksaphana1" hana_hdb_site="SITE1"
hana_hdb_remoteHost="msksaphana2" hana_hdb_srmode="sync"
lpa_hdb_lpt="1411732740"
node msksaphana2 \
attributes hana_hdb_vhost="msksaphana2" hana_hdb_site="SITE2"
hana_hdb_srmode="sync" hana_hdb_remoteHost="msksaphana1"
lpa_hdb_lpt="30"
primitive rsc_SAPHanaTopology_HDB_HDB00 ocf:suse:SAPHanaTopology \
params SID="HDB" InstanceNumber="00" \
op monitor interval="10" timeout="600" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="300"
primitive rsc_SAPHana_HDB_HDB00 ocf:suse:SAPHana \
params SID="HDB" InstanceNumber="00" PREFER_SITE_TAKEOVER="true"
AUTOMATED_REGISTER="true" DUPLICATE_PRIMARY_TIMEOUT="7200" \
op start timeout="3600" interval="0" \
op stop timeout="3600" interval="0" \
op promote timeout="3600" interval="0" \
op monitor timeout="700" role="Master" interval="60" \
op monitor timeout="700" role="Slave" interval="61"
primitive rsc_ip_HDB_HDB00 ocf:heartbeat:IPaddr2 \
params ip="10.72.10.64" \
op start timeout="20" interval="0" \
op stop timeout="20" interval="0" \
op monitor interval="10" timeout="20"
primitive stonith_IPMI_msksaphana1 stonith:external/ipmi \
params ipmitool="/usr/bin/ipmitool" hostname="msksaphana1"
passwd="P@ssw0rd" userid="hacluster" ipaddr="10.72.5.47" \
op stop timeout="15" interval="0" \
op monitor timeout="20" interval="3600" \
op start timeout="20" interval="0" \
meta target-role="Started"
primitive stonith_IPMI_msksaphana2 stonith:external/ipmi \
params ipmitool="/usr/bin/ipmitool" hostname="msksaphana2"
passwd="P@ssw0rd" userid="hacluster" ipaddr="10.72.5.48" \
op stop timeout="15" interval="0" \
op monitor timeout="20" interval="3600" \
op start timeout="20" interval="0" \
meta target-role="Started"
ms msl_SAPHana_HDB_HDB00 rsc_SAPHana_HDB_HDB00 \
meta clone-max="2" clone-node-max="1" target-role="Started"
clone cln_SAPHanaTopology_HDB_HDB00 rsc_SAPHanaTopology_HDB_HDB00 \
meta is-managed="true" clone-node-max="1" target-role="Started"
location stonoth_IPMI_msksaphana1_on_msksaphana2
stonith_IPMI_msksaphana1 -inf: msksaphana1
location stonoth_IPMI_msksaphana2_on_msksaphana1
stonith_IPMI_msksaphana2 -inf: msksaphana2
colocation col_saphana_ip_HDB_HDB00 2000: rsc_ip_HDB_HDB00:Started
msl_SAPHana_HDB_HDB00:Master
order ord_SAPHana_HDB_HDB00 2000: cln_SAPHanaTopology_HDB_HDB00
msl_SAPHana_HDB_HDB00
property $id="cib-bootstrap-options" \
stonith-enabled="true" \
placement-strategy="balanced" \
dc-version="1.1.11-3ca8c3b" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
stonith-action="reboot" \
no-quorum-policy="ignore" \
last-lrm-refresh="1411730405"
rsc_defaults $id="rsc-options" \
resource-stickiness="1" \
migration-threshold="3"
op_defaults $id="op-options" \
timeout="600" \
record-pending="true"

Thank you!

-andrei
