Re: [Pacemaker] fencing question

2014-03-17 Thread Andrew Beekhof

On 14 Mar 2014, at 1:18 am, Karl Rößmann  wrote:

> Hi,
> 
> I changed the running resource by
> crm / configure / edit / commit. It seemed to work.
> 
> I stopped the resource and changed some details.
> Whenever I commit again, I get this warning:
> warning: do_log: FSA: Input I_ELECTION_DC from do_election_check() received 
> in state S_INTEGRATION
> 
> see below
> 
> Mar 13 15:02:04 ha1infra crm_verify[24991]:   notice: crm_log_args: Invoked: 
> crm_verify -V -p
> Mar 13 15:02:04 ha1infra cibadmin[24992]:   notice: crm_log_args: Invoked: 
> cibadmin -p -R
> Mar 13 15:02:04 ha1infra crmd[21812]:   notice: do_state_transition: State 
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
> origin=abort_transition_graph ]
> Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: Diff: --- 0.1057.3
> Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: Diff: +++ 0.1058.1 
> a460a945dcf52bbb4ffb39e7963ee925
> Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: -- <cib admin_epoch="0" epoch="1057" num_updates="3"/>
> Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <primitive id="vmdv03" class="ocf" provider="heartbeat" type="Xen">
> Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <meta_attributes id="vmdv03-meta_attributes">
> Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <nvpair name="target-role" value="Stopped" id="vmdv03-meta_attributes-target-role"/>
> Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <nvpair name="allow-migrate" value="true" id="vmdv03-meta_attributes-allow-migrate"/>
> Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ </meta_attributes>
> Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <operations>
> Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <op name="monitor" interval="10" timeout="30" id="vmdv03-monitor-10"/>
> Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <op name="migrate_from" interval="0" timeout="600" id="vmdv03-migrate_from-0"/>
> Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <op name="migrate_to" interval="0" timeout="600" id="vmdv03-migrate_to-0"/>
> Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ </operations>
> Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <instance_attributes id="vmdv03-instance_attributes">
> Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <nvpair name="xmfile" value="/etc/xen/vm/vmdv03" id="vmdv03-instance_attributes-xmfile"/>
> Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <nvpair name="shutdown_timeout" value="120" id="vmdv03-instance_attributes-shutdown_timeout"/>
> Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ </instance_attributes>
> Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ </primitive>
> Mar 13 15:02:04 ha1infra crmd[21812]:   notice: do_state_transition: State 
> transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC 
> cause=C_TIMER_POPPED origin=election_timeout_popped ]
> Mar 13 15:02:04 ha1infra crmd[21812]:  warning: do_log: FSA: Input 
> I_ELECTION_DC from do_election_check() received in state S_INTEGRATION  
> <-- what does this mean ?

It means that something not completely normal is going on.
Possibly the nodes can't talk to each other, but I'm betting on a bug of some 
kind.

> Mar 13 15:02:04 ha1infra crmd[21812]:   notice: do_state_transition: State 
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
> origin=abort_transition_graph ]
> Mar 13 15:02:04 ha1infra crmd[21812]:   notice: do_state_transition: State 
> transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC 
> cause=C_TIMER_POPPED origin=election_timeout_popped ]

Both transitions are logged in the same second (15:02:04), so there is not
enough time for an election timer to have really expired.
It is probably a good idea to contact SUSE support, and to configure a log
file as well: it will contain more information than syslog.

> Mar 13 15:02:04 ha1infra attrd[21810]:   notice: attrd_local_callback: 
> Sending full refresh (origin=crmd)
> Mar 13 15:02:04 ha1infra attrd[21810]:   notice: attrd_trigger_update: 
> Sending flush op to all hosts for: shutdown (0)
> Mar 13 15:02:04 ha1infra crmd[21812]:   notice: crm_update_quorum: Updating 
> quorum status to true (call=457)
> Mar 13 15:02:04 ha1infra attrd[21810]:   notice: attrd_trigger_update: 
> Sending flush op to all hosts for: probe_complete (true)
> 
> 
> 
> Karl
> 
> 
>> On 2014-03-12T16:16:54, Karl Rößmann  wrote:
>> 
>>> >>primitive fkflmw ocf:heartbeat:Xen \
>>> >>meta target-role="Started" is-managed="true" allow-migrate="true" \
>>> >>op monitor interval="10" timeout="30" \
>>> >>op migrate_from interval="0" timeout="600" \
>>> >>op migrate_to interval="0" timeout="600" \
>>> >>params xmfile="/etc/xen/vm/fkflmw" shutdown_timeout="120"
>>> >
>>> >You need to set a >120s timeout for the stop operation too:
>>> >   op stop timeout="150"
>>> >
>>> >>default-action-timeout="60s"
>>> >
>>> >Or set this to, say, 150s.
>>> can I do this while the resource (the xen VM) is running ?

Re: [Pacemaker] fencing question

2014-03-13 Thread Karl Rößmann

Hi,

I changed the running resource by
crm / configure / edit / commit. It seemed to work.

I stopped the resource and changed some details.
Whenever I commit again, I get this warning:
warning: do_log: FSA: Input I_ELECTION_DC from do_election_check()  
received in state S_INTEGRATION


see below

Mar 13 15:02:04 ha1infra crm_verify[24991]:   notice: crm_log_args:  
Invoked: crm_verify -V -p
Mar 13 15:02:04 ha1infra cibadmin[24992]:   notice: crm_log_args:  
Invoked: cibadmin -p -R
Mar 13 15:02:04 ha1infra crmd[21812]:   notice: do_state_transition:  
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC  
cause=C_FSA_INTERNAL origin=abort_transition_graph ]

Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: Diff: --- 0.1057.3
Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: Diff: +++  
0.1058.1 a460a945dcf52bbb4ffb39e7963ee925
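Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: -- <cib admin_epoch="0" epoch="1057" num_updates="3"/>
Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <primitive id="vmdv03" class="ocf" provider="heartbeat" type="Xen">
Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <meta_attributes id="vmdv03-meta_attributes">
Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <nvpair name="target-role" value="Stopped" id="vmdv03-meta_attributes-target-role"/>
Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <nvpair name="allow-migrate" value="true" id="vmdv03-meta_attributes-allow-migrate"/>
Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ </meta_attributes>
Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <operations>
Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <op name="monitor" interval="10" timeout="30" id="vmdv03-monitor-10"/>
Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <op name="migrate_from" interval="0" timeout="600" id="vmdv03-migrate_from-0"/>
Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <op name="migrate_to" interval="0" timeout="600" id="vmdv03-migrate_to-0"/>
Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ </operations>
Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <instance_attributes id="vmdv03-instance_attributes">
Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <nvpair name="xmfile" value="/etc/xen/vm/vmdv03" id="vmdv03-instance_attributes-xmfile"/>
Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ <nvpair name="shutdown_timeout" value="120" id="vmdv03-instance_attributes-shutdown_timeout"/>
Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ </instance_attributes>
Mar 13 15:02:04 ha1infra cib[21807]:   notice: cib:diff: ++ </primitive>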
Mar 13 15:02:04 ha1infra crmd[21812]:   notice: do_state_transition:  
State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC  
cause=C_TIMER_POPPED origin=election_timeout_popped ]
Mar 13 15:02:04 ha1infra crmd[21812]:  warning: do_log: FSA: Input  
I_ELECTION_DC from do_election_check() received in state S_INTEGRATION  
 <-- what does this mean ?
Mar 13 15:02:04 ha1infra attrd[21810]:   notice: attrd_local_callback:  
Sending full refresh (origin=crmd)
Mar 13 15:02:04 ha1infra attrd[21810]:   notice: attrd_trigger_update:  
Sending flush op to all hosts for: shutdown (0)
Mar 13 15:02:04 ha1infra crmd[21812]:   notice: crm_update_quorum:  
Updating quorum status to true (call=457)
Mar 13 15:02:04 ha1infra attrd[21810]:   notice: attrd_trigger_update:  
Sending flush op to all hosts for: probe_complete (true)




Karl



> On 2014-03-12T16:16:54, Karl Rößmann wrote:
>
>> >>primitive fkflmw ocf:heartbeat:Xen \
>> >>meta target-role="Started" is-managed="true" allow-migrate="true" \
>> >>op monitor interval="10" timeout="30" \
>> >>op migrate_from interval="0" timeout="600" \
>> >>op migrate_to interval="0" timeout="600" \
>> >>params xmfile="/etc/xen/vm/fkflmw" shutdown_timeout="120"
>> >
>> >You need to set a >120s timeout for the stop operation too:
>> >op stop timeout="150"
>> >
>> >>default-action-timeout="60s"
>> >
>> >Or set this to, say, 150s.
>> can I do this while the resource (the xen VM) is running ?
>
> Yes, changing the stop timeout should not have a negative impact on your
> resource.
>
> You can also check how the cluster would react:
>
> # crm configure
> crm(live)configure# edit
> (Make all changes you want here)
> crm(live)configure# simulate actions nograph
>
> before you type "commit".
>
> Regards,
> Lars
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
> HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org





--
Karl Rößmann                 Tel. +49-711-689-1657
Max-Planck-Institut FKF      Fax. +49-711-689-1632
Postfach 800 665
70506 Stuttgart              email k.roessm...@fkf.mpg.de


Re: [Pacemaker] fencing question

2014-03-12 Thread Lars Marowsky-Bree
On 2014-03-12T16:16:54, Karl Rößmann  wrote:

> >>primitive fkflmw ocf:heartbeat:Xen \
> >>meta target-role="Started" is-managed="true" allow-migrate="true" \
> >>op monitor interval="10" timeout="30" \
> >>op migrate_from interval="0" timeout="600" \
> >>op migrate_to interval="0" timeout="600" \
> >>params xmfile="/etc/xen/vm/fkflmw" shutdown_timeout="120"
> >
> >You need to set a >120s timeout for the stop operation too:
> > op stop timeout="150"
> >
> >>default-action-timeout="60s"
> >
> >Or set this to, say, 150s.
> can I do this while the resource (the xen VM) is running ?

Yes, changing the stop timeout should not have a negative impact on your
resource.

You can also check how the cluster would react:

# crm configure
crm(live)configure# edit
(Make all changes you want here)
crm(live)configure# simulate actions nograph

before you type "commit".
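
Spelled out, such a session might look like the sketch below. The resource
name is taken from your configuration; passing it to "edit" and running
"show" and "verify" are optional extras that are not part of the steps above,
so treat them as assumptions about how you like to work rather than as
required steps.

```
# Sketch of a dry run before committing; nothing is applied until "commit".
# crm configure
crm(live)configure# edit fkflmw
crm(live)configure# show fkflmw
crm(live)configure# verify
crm(live)configure# simulate actions nograph
crm(live)configure# commit
```

If the simulation shows actions you did not expect (a restart of the VM, for
example), you can leave the configure level without committing and nothing
changes on the live cluster.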

Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




Re: [Pacemaker] fencing question

2014-03-12 Thread Karl Rößmann

Hi.


>>primitive fkflmw ocf:heartbeat:Xen \
>>meta target-role="Started" is-managed="true" allow-migrate="true" \
>>op monitor interval="10" timeout="30" \
>>op migrate_from interval="0" timeout="600" \
>>op migrate_to interval="0" timeout="600" \
>>params xmfile="/etc/xen/vm/fkflmw" shutdown_timeout="120"
>
>You need to set a >120s timeout for the stop operation too:
> op stop timeout="150"
>
>>default-action-timeout="60s"
>
>Or set this to, say, 150s.

Can I do this while the resource (the Xen VM) is running?



Karl



--
Karl Rößmann                 Tel. +49-711-689-1657
Max-Planck-Institut FKF      Fax. +49-711-689-1632
Postfach 800 665
70506 Stuttgart              email k.roessm...@fkf.mpg.de



Re: [Pacemaker] fencing question

2014-03-12 Thread Lars Marowsky-Bree
On 2014-03-12T15:17:13, Karl Rößmann  wrote:

> Hi,
> 
> we have a two-node HA cluster running SUSE SLES 11 SP3 with the HA
> Extension, latest release.
> A resource (Xen) was stopped manually. Its shutdown_timeout is 120s,
> but after 60s the node was fenced and shut down by the other node.
> 
> Should I change some timeout value?
> 
> This is a part of our configuration:
> ...
> primitive fkflmw ocf:heartbeat:Xen \
> meta target-role="Started" is-managed="true" allow-migrate="true" \
> op monitor interval="10" timeout="30" \
> op migrate_from interval="0" timeout="600" \
> op migrate_to interval="0" timeout="600" \
> params xmfile="/etc/xen/vm/fkflmw" shutdown_timeout="120"

You need to set a >120s timeout for the stop operation too:
op stop timeout="150"
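
As a rough sketch, your primitive with the stop operation added would look
something like the block below. Only the "op stop" line is new; the
interval="0" on it is an assumption that mirrors your migrate operations, and
150 is just an example value above shutdown_timeout. Without an explicit stop
timeout the 60s default applies, so the stop is declared failed before Xen has
had its 120s to shut the guest down, and a failed stop is escalated to
fencing.

```
# Sketch only: the "op stop" line is the addition; its values are examples.
primitive fkflmw ocf:heartbeat:Xen \
    meta target-role="Started" is-managed="true" allow-migrate="true" \
    op monitor interval="10" timeout="30" \
    op stop interval="0" timeout="150" \
    op migrate_from interval="0" timeout="600" \
    op migrate_to interval="0" timeout="600" \
    params xmfile="/etc/xen/vm/fkflmw" shutdown_timeout="120"
```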

> default-action-timeout="60s"

Or set this to, say, 150s.
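
A sketch of that alternative, assuming the option sits with your other cluster
properties (a "property" statement in the crm shell) and taking 150s purely as
an example value:

```
# Sketch only: raises the cluster-wide default operation timeout.
property default-action-timeout="150s"
```

Note that this raises the default for every operation that has no timeout of
its own, not just for stopping this VM, so the per-resource stop timeout is
the narrower of the two changes.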


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

