Re: [Pacemaker] If 256 resources are load(ed), crmd will reboot.

2014-05-28 Thread Andrew Beekhof

On 29 May 2014, at 3:40 pm, Yusuke Iida  wrote:

> Hi, Andrew
> 
> 2014-05-29 14:00 GMT+09:00 Andrew Beekhof :
>> 
>> On 29 May 2014, at 12:28 pm, Yusuke Iida  wrote:
>> 
>>> Hi, Andrew
>>> 
>>> I'm sorry.
>>> It seems that the notation of the node name became another by syslog.
>>> In order to dispel misunderstanding, the report was newly acquired.
>>> I think that the signs are appearing in vm02/ha-log.
>> 
>> Got it :)
>> 
>> Ok, step 1 - stop logging debug.
>> Debug is accounting for 30% of the logs and all that writing to disk would 
>> be adding significantly to the cluster's workload.
> I understand.
> 
>> 
>> Question:  How have you got logging configured? Anything in 
>> /etc/sysconfig/pacemaker ?
>> 
>> I ask because pacemaker.log appears to have a jumble of syslog and regular 
>> file output:
>> 
>> May 29 10:45:26 vm02 cib[25603]: info: cib_perform_op: +  /cib:  
>> @num_updates=1295
>> May 29 10:45:26 [25603] vm02 cib: info: cib_perform_op:  +  
>> /cib:  @num_updates=1295
> The position of pid is different although seldom cared.
> I attach the /etc/sysconfig/pacemaker of my environment.

The format isn't a problem, it just indicates that there are two mechanisms 
logging to the same place.
So it's redundant.

The question is... how? Your configs look fine to me :-/
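
The two sample lines quoted above differ only in where the PID sits: the syslog form puts it after the daemon name (cib[25603]:), while the log-file form puts it in brackets right after the timestamp. A minimal sketch for checking how much of each format ends up in one file, assuming those two layouts are the only ones involved; classify_line and count_formats are hypothetical helper names, not Pacemaker tools:

```shell
# Hypothetical helpers (not part of Pacemaker). Both patterns start with
# a timestamp such as "May 29 10:45:26 ".
TS='^[A-Z][a-z]{2} +[0-9]{1,2} [0-9]{2}:[0-9]{2}:[0-9]{2} '
SYSLOG_RE="${TS}[^ ]+ [^ ]+\[[0-9]+\]:"   # host daemon[pid]:
FILE_RE="${TS}\[[0-9]+\] "                # [pid] host daemon:

# Print "syslog", "logfile" or "other" for a single line.
classify_line() {
    if printf '%s\n' "$1" | grep -Eq "$SYSLOG_RE"; then
        echo syslog
    elif printf '%s\n' "$1" | grep -Eq "$FILE_RE"; then
        echo logfile
    else
        echo other   # for example a wrapped continuation line
    fi
}

# Count how many lines of each format a log file contains.
count_formats() {
    printf 'syslog=%s logfile=%s\n' \
        "$(grep -Ec "$SYSLOG_RE" "$1")" "$(grep -Ec "$FILE_RE" "$1")"
}
```

Running count_formats on the pacemaker.log in question would show whether both writers really target the same file; two non-zero counts would support the "two mechanisms" reading.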

> 
>> 
>> 
>> Step 2 - can you try this patch:
>> 
>> diff --git a/crmd/te_callbacks.c b/crmd/te_callbacks.c
>> index 4d330a6..eba5f11 100644
>> --- a/crmd/te_callbacks.c
>> +++ b/crmd/te_callbacks.c
>> @@ -381,12 +381,15 @@ te_update_diff(const char *event, xmlNode * msg)
>> 
>> } else if(strstr(xpath, "/cib/configuration")) {
>> abort_transition(INFINITY, tg_restart, "Non-status change", 
>> change);
>> +break; /* Wont be packaged with any resource operations we may 
>> be waiting for */
>> 
>> } else if(strstr(xpath, "/"XML_CIB_TAG_TICKETS) || safe_str_eq(name, 
>> XML_CIB_TAG_TICKETS)) {
>> abort_transition(INFINITY, tg_restart, "Ticket attribute 
>> change", change);
>> +break; /* Wont be packaged with any resource operations we may 
>> be waiting for */
>> 
>> } else if(strstr(xpath, "/"XML_TAG_TRANSIENT_NODEATTRS"[") || 
>> safe_str_eq(name, XML_TAG_TRANSIENT_NODEATTRS)) {
>> abort_transition(INFINITY, tg_restart, "Transient attribute 
>> change", change);
>> +break; /* Wont be packaged with any resource operations we may 
>> be waiting for */
>> 
>> } else if(strstr(xpath, "/"XML_LRM_TAG_RSC_OP"[") && safe_str_eq(op, 
>> "delete")) {
>> crm_action_t *cancel = NULL;
> 
> Thank you for the patch.
> It replies by checking a motion.

Do you mean it works now?

> 
> Regards,
> Yusuke
>> 
>> 
>>> 
>>> May 29 10:43:37 vm02 crmd[25608]: error: config_query_callback:
>>> Local CIB query resulted in an error: Timer expired
>>> May 29 10:43:37 vm02 crmd[25608]: info: register_fsa_error_adv:
>>> Resetting the current action list
>>> May 29 10:43:37 vm02 crmd[25608]: error: do_log: FSA: Input I_ERROR
>>> from config_query_callback() received in state S_POLICY_ENGINE
>>> May 29 10:43:37 vm02 crmd[25608]:  warning: do_state_transition: State
>>> transition S_POLICY_ENGINE -> S_RECOVERY [ input=I_ERROR
>>> cause=C_FSA_INTERNAL origin=config_query_callback ]
>>> May 29 10:43:37 vm02 crmd[25608]:  warning: do_recover: Fast-tracking
>>> shutdown in response to errors
>>> May 29 10:43:37 vm02 crmd[25608]:  warning: do_election_vote: Not
>>> voting in election, we're in state S_RECOVERY
>>> 
>>> https://drive.google.com/file/d/0BwMFJItoO-fVSEd2MkRiOGxkelk/edit?usp=sharing
>>> 
>>> Regards,
>>> Yusuke
>>> 
>>> 2014-05-29 10:26 GMT+09:00 Andrew Beekhof :
 
>>>> On 28 May 2014, at 6:42 pm, Yusuke Iida  wrote:
>>>> 
>>>>> Hi, Andrew
>>>>> 
>>>>> I made the cluster load a setup to which 256 resources are started using 
>>>>> crmsh.
>>>>> At this time, crmd changed into the S_RECOVERY state and rebooted.
>>>>> 
>>>>> May 28 17:08:00 [14194] vm02   crmd: error:
>>>>> config_query_callback: Local CIB query resulted in an error: Timer
>>>>> expired
>>>>> May 28 17:08:00 [14194] vm02   crmd: info:
>>>>> register_fsa_error_adv: Resetting the current action list
>>>>> May 28 17:08:00 [14194] vm02   crmd: error: do_log: FSA: Input
>>>>> I_ERROR from config_query_callback() received in state S_POLICY_ENGINE
>>>>> May 28 17:08:00 [14194] vm02   crmd:  warning:
>>>>> do_state_transition: State transition S_POLICY_ENGINE -> S_RECOVERY [
>>>>> input=I_ERROR cause=C_FSA_INTERNAL origin=config_query_callback ]
>>>>> May 28 17:08:00 [14194] vm02   crmd:  warning: do_recover:
>>>>> Fast-tracking shutdown in response to errors
>>>>> May 28 17:08:00 [14194] vm02   crmd:  warning: do_election_vote:
>>>>> Not voting in election, we're in state S_RECOVERY
>>>>> 
>>>>> I think that query performed in large quantities cannot be processed.
>>>>> Before implementing cib_performance, abort_trans

Re: [Pacemaker] If 256 resources are load(ed), crmd will reboot.

2014-05-28 Thread Yusuke Iida
Hi, Andrew

2014-05-29 14:00 GMT+09:00 Andrew Beekhof :
>
> On 29 May 2014, at 12:28 pm, Yusuke Iida  wrote:
>
>> Hi, Andrew
>>
>> I'm sorry.
>> It seems that the notation of the node name became another by syslog.
>> In order to dispel misunderstanding, the report was newly acquired.
>> I think that the signs are appearing in vm02/ha-log.
>
> Got it :)
>
> Ok, step 1 - stop logging debug.
> Debug is accounting for 30% of the logs and all that writing to disk would be 
> adding significantly to the cluster's workload.
I understand.

>
> Question:  How have you got logging configured? Anything in 
> /etc/sysconfig/pacemaker ?
>
> I ask because pacemaker.log appears to have a jumble of syslog and regular 
> file output:
>
> May 29 10:45:26 vm02 cib[25603]: info: cib_perform_op: +  /cib:  
> @num_updates=1295
> May 29 10:45:26 [25603] vm02 cib: info: cib_perform_op:  +  
> /cib:  @num_updates=1295
The position of the PID is different, although I had not paid attention to it.
I have attached the /etc/sysconfig/pacemaker from my environment.

>
>
> Step 2 - can you try this patch:
>
> diff --git a/crmd/te_callbacks.c b/crmd/te_callbacks.c
> index 4d330a6..eba5f11 100644
> --- a/crmd/te_callbacks.c
> +++ b/crmd/te_callbacks.c
> @@ -381,12 +381,15 @@ te_update_diff(const char *event, xmlNode * msg)
>
>  } else if(strstr(xpath, "/cib/configuration")) {
>  abort_transition(INFINITY, tg_restart, "Non-status change", 
> change);
> +break; /* Wont be packaged with any resource operations we may 
> be waiting for */
>
>  } else if(strstr(xpath, "/"XML_CIB_TAG_TICKETS) || safe_str_eq(name, 
> XML_CIB_TAG_TICKETS)) {
>  abort_transition(INFINITY, tg_restart, "Ticket attribute 
> change", change);
> +break; /* Wont be packaged with any resource operations we may 
> be waiting for */
>
>  } else if(strstr(xpath, "/"XML_TAG_TRANSIENT_NODEATTRS"[") || 
> safe_str_eq(name, XML_TAG_TRANSIENT_NODEATTRS)) {
>  abort_transition(INFINITY, tg_restart, "Transient attribute 
> change", change);
> +break; /* Wont be packaged with any resource operations we may 
> be waiting for */
>
>  } else if(strstr(xpath, "/"XML_LRM_TAG_RSC_OP"[") && safe_str_eq(op, 
> "delete")) {
>  crm_action_t *cancel = NULL;

Thank you for the patch.
I will check how it behaves and then reply.

Regards,
Yusuke
>
>
>>
>> May 29 10:43:37 vm02 crmd[25608]: error: config_query_callback:
>> Local CIB query resulted in an error: Timer expired
>> May 29 10:43:37 vm02 crmd[25608]: info: register_fsa_error_adv:
>> Resetting the current action list
>> May 29 10:43:37 vm02 crmd[25608]: error: do_log: FSA: Input I_ERROR
>> from config_query_callback() received in state S_POLICY_ENGINE
>> May 29 10:43:37 vm02 crmd[25608]:  warning: do_state_transition: State
>> transition S_POLICY_ENGINE -> S_RECOVERY [ input=I_ERROR
>> cause=C_FSA_INTERNAL origin=config_query_callback ]
>> May 29 10:43:37 vm02 crmd[25608]:  warning: do_recover: Fast-tracking
>> shutdown in response to errors
>> May 29 10:43:37 vm02 crmd[25608]:  warning: do_election_vote: Not
>> voting in election, we're in state S_RECOVERY
>>
>> https://drive.google.com/file/d/0BwMFJItoO-fVSEd2MkRiOGxkelk/edit?usp=sharing
>>
>> Regards,
>> Yusuke
>>
>> 2014-05-29 10:26 GMT+09:00 Andrew Beekhof :
>>>
>>> On 28 May 2014, at 6:42 pm, Yusuke Iida  wrote:
>>>
>>>> Hi, Andrew
>>>>
>>>> I made the cluster load a setup to which 256 resources are started using 
>>>> crmsh.
>>>> At this time, crmd changed into the S_RECOVERY state and rebooted.
>>>>
>>>> May 28 17:08:00 [14194] vm02   crmd: error:
>>>> config_query_callback: Local CIB query resulted in an error: Timer
>>>> expired
>>>> May 28 17:08:00 [14194] vm02   crmd: info:
>>>> register_fsa_error_adv: Resetting the current action list
>>>> May 28 17:08:00 [14194] vm02   crmd: error: do_log: FSA: Input
>>>> I_ERROR from config_query_callback() received in state S_POLICY_ENGINE
>>>> May 28 17:08:00 [14194] vm02   crmd:  warning:
>>>> do_state_transition: State transition S_POLICY_ENGINE -> S_RECOVERY [
>>>> input=I_ERROR cause=C_FSA_INTERNAL origin=config_query_callback ]
>>>> May 28 17:08:00 [14194] vm02   crmd:  warning: do_recover:
>>>> Fast-tracking shutdown in response to errors
>>>> May 28 17:08:00 [14194] vm02   crmd:  warning: do_election_vote:
>>>> Not voting in election, we're in state S_RECOVERY
>>>>
>>>> I think that query performed in large quantities cannot be processed.
>>>> Before implementing cib_performance, abort_transition() was called only 
>>>> once.
>>>>
>>>> Is this corrected?
>>>>
>>>> report when a problem occurs is attached.
>>>> https://drive.google.com/file/d/0BwMFJItoO-fVX0gxM1ptcE52WWs/edit?usp=sharing
>>>
>>> That doesn't appear to match the symptoms above.
>>>
>>>>
>>>> Regards,
>>>> Yusuke
>>>> --
>>>> 
>>>> METRO SYSTEMS CO., LTD
>>>>
>>>> Yusuke Iida
>>>> Mail: yusk.i.
Re: [Pacemaker] If 256 resources are load(ed), crmd will reboot.

2014-05-28 Thread Andrew Beekhof

On 29 May 2014, at 12:28 pm, Yusuke Iida  wrote:

> Hi, Andrew
> 
> I'm sorry.
> It seems that the notation of the node name became another by syslog.
> In order to dispel misunderstanding, the report was newly acquired.
> I think that the signs are appearing in vm02/ha-log.

Got it :)

Ok, step 1 - stop logging debug.
Debug is accounting for 30% of the logs and all that writing to disk would be 
adding significantly to the cluster's workload.

Question:  How have you got logging configured? Anything in 
/etc/sysconfig/pacemaker ?

I ask because pacemaker.log appears to have a jumble of syslog and regular file 
output:

May 29 10:45:26 vm02 cib[25603]: info: cib_perform_op: +  /cib:  
@num_updates=1295
May 29 10:45:26 [25603] vm02 cib: info: cib_perform_op:  +  
/cib:  @num_updates=1295


Step 2 - can you try this patch:

diff --git a/crmd/te_callbacks.c b/crmd/te_callbacks.c
index 4d330a6..eba5f11 100644
--- a/crmd/te_callbacks.c
+++ b/crmd/te_callbacks.c
@@ -381,12 +381,15 @@ te_update_diff(const char *event, xmlNode * msg)
 
         } else if(strstr(xpath, "/cib/configuration")) {
             abort_transition(INFINITY, tg_restart, "Non-status change", change);
+            break; /* Wont be packaged with any resource operations we may be waiting for */
 
         } else if(strstr(xpath, "/"XML_CIB_TAG_TICKETS) || safe_str_eq(name, XML_CIB_TAG_TICKETS)) {
             abort_transition(INFINITY, tg_restart, "Ticket attribute change", change);
+            break; /* Wont be packaged with any resource operations we may be waiting for */
 
         } else if(strstr(xpath, "/"XML_TAG_TRANSIENT_NODEATTRS"[") || safe_str_eq(name, XML_TAG_TRANSIENT_NODEATTRS)) {
             abort_transition(INFINITY, tg_restart, "Transient attribute change", change);
+            break; /* Wont be packaged with any resource operations we may be waiting for */
 
         } else if(strstr(xpath, "/"XML_LRM_TAG_RSC_OP"[") && safe_str_eq(op, "delete")) {
             crm_action_t *cancel = NULL;
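
As far as the hunk shows, each added break leaves the loop over the changes in one CIB diff as soon as a configuration, ticket or transient-attribute change has aborted the transition. The effect on the number of abort_transition() calls can be sketched in shell under heavy assumptions: the function below is a counting stand-in, not the crmd implementation, only the "/cib/configuration" branch is modelled, and process_changes and its MODE argument are hypothetical.

```shell
# Hypothetical stand-in for abort_transition(): it only counts calls.
ABORTS=0
abort_transition() { ABORTS=$((ABORTS + 1)); }

# process_changes MODE XPATH...
#   MODE=break    stop at the first configuration change (patched behaviour)
#   MODE=nobreak  keep scanning every change (unpatched behaviour)
# Prints how many times abort_transition was called.
process_changes() {
    mode=$1
    shift
    ABORTS=0
    for xpath in "$@"; do
        case $xpath in
            */cib/configuration*)
                abort_transition
                if [ "$mode" = break ]; then
                    break
                fi
                ;;
        esac
    done
    echo "$ABORTS"
}
```

With 256 configuration changes in one diff, the nobreak mode would report 256 calls and the break mode a single one, which is the difference the patch is aiming at.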


> 
> May 29 10:43:37 vm02 crmd[25608]: error: config_query_callback:
> Local CIB query resulted in an error: Timer expired
> May 29 10:43:37 vm02 crmd[25608]: info: register_fsa_error_adv:
> Resetting the current action list
> May 29 10:43:37 vm02 crmd[25608]: error: do_log: FSA: Input I_ERROR
> from config_query_callback() received in state S_POLICY_ENGINE
> May 29 10:43:37 vm02 crmd[25608]:  warning: do_state_transition: State
> transition S_POLICY_ENGINE -> S_RECOVERY [ input=I_ERROR
> cause=C_FSA_INTERNAL origin=config_query_callback ]
> May 29 10:43:37 vm02 crmd[25608]:  warning: do_recover: Fast-tracking
> shutdown in response to errors
> May 29 10:43:37 vm02 crmd[25608]:  warning: do_election_vote: Not
> voting in election, we're in state S_RECOVERY
> 
> https://drive.google.com/file/d/0BwMFJItoO-fVSEd2MkRiOGxkelk/edit?usp=sharing
> 
> Regards,
> Yusuke
> 
> 2014-05-29 10:26 GMT+09:00 Andrew Beekhof :
>> 
>> On 28 May 2014, at 6:42 pm, Yusuke Iida  wrote:
>> 
>>> Hi, Andrew
>>> 
>>> I made the cluster load a setup to which 256 resources are started using 
>>> crmsh.
>>> At this time, crmd changed into the S_RECOVERY state and rebooted.
>>> 
>>> May 28 17:08:00 [14194] vm02   crmd: error:
>>> config_query_callback: Local CIB query resulted in an error: Timer
>>> expired
>>> May 28 17:08:00 [14194] vm02   crmd: info:
>>> register_fsa_error_adv: Resetting the current action list
>>> May 28 17:08:00 [14194] vm02   crmd: error: do_log: FSA: Input
>>> I_ERROR from config_query_callback() received in state S_POLICY_ENGINE
>>> May 28 17:08:00 [14194] vm02   crmd:  warning:
>>> do_state_transition: State transition S_POLICY_ENGINE -> S_RECOVERY [
>>> input=I_ERROR cause=C_FSA_INTERNAL origin=config_query_callback ]
>>> May 28 17:08:00 [14194] vm02   crmd:  warning: do_recover:
>>> Fast-tracking shutdown in response to errors
>>> May 28 17:08:00 [14194] vm02   crmd:  warning: do_election_vote:
>>> Not voting in election, we're in state S_RECOVERY
>>> 
>>> I think that query performed in large quantities cannot be processed.
>>> Before implementing cib_performance, abort_transition() was called only 
>>> once.
>>> 
>>> Is this corrected?
>>> 
>>> report when a problem occurs is attached.
>>> https://drive.google.com/file/d/0BwMFJItoO-fVX0gxM1ptcE52WWs/edit?usp=sharing
>> 
>> That doesn't appear to match the symptoms above.
>> 
>>> 
>>> Regards,
>>> Yusuke
>>> --
>>> 
>>> METRO SYSTEMS CO., LTD
>>> 
>>> Yusuke Iida
>>> Mail: yusk.i...@gmail.com
>>> 
>>> 
>>> ___
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>> 
>> 
>>

Re: [Pacemaker] unexpected demote request on master

2014-05-28 Thread K Mehta
In which pcs version is this issue fixed?

On Wednesday, May 28, 2014, K Mehta  wrote:
> Chris,
> Here is the required information
> [root@vsanqa11 ~]# rpm -qa | grep pcs ; rpm -qa | grep pacemaker ; uname -a ; cat /etc/redhat-release
> pcs-0.9.90-2.el6.centos.2.noarch
> pacemaker-cli-1.1.10-14.el6_5.3.x86_64
> pacemaker-libs-1.1.10-14.el6_5.3.x86_64
> pacemaker-1.1.10-14.el6_5.3.x86_64
> pacemaker-cluster-libs-1.1.10-14.el6_5.3.x86_64
> Linux vsanqa11 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
> CentOS release 6.3 (Final)
>
> Regards,
>  Kiran
>
> On Wed, May 28, 2014 at 2:47 AM, Chris Feist  wrote:
>
> On 05/27/14 05:38, K Mehta wrote:
>
> One more question.
> With crmsh, it was easy to add a constraint that keeps a resource off every
> node except a chosen subset (say vsanqa11 and vsanqa12), using the following
> command:
>
> crm configure location ms-${uuid}-nodes ms-$uuid rule -inf: \#uname ne vsanqa11 and \#uname ne vsanqa12
> [root@vsanqa11 ~]# pcs constraint show --full
> Location Constraints:
>Resource: ms-c6933988-9e5c-419e-8fdf-744100d76ad6
>  Constraint: ms-c6933988-9e5c-419e-8fdf-744100d76ad6-nodes
>Rule: score=-INFINITY
>   (id:ms-c6933988-9e5c-419e-8fdf-744100d76ad6-nodes-rule)
>  Expression: #uname ne vsanqa11
>   (id:ms-c6933988-9e5c-419e-8fdf-744100d76ad6-nodes-expression)
>  Expression: #uname ne vsanqa12
>   (id:ms-c6933988-9e5c-419e-8fdf-744100d76ad6-nodes-expression-0)
> Ordering Constraints:
> Colocation Constraints:
>
> So both expressions are part of the same rule, as expected.
>
>
>
> With pcs, I am not sure how to write an avoid constraint when I need a
> resource to run on vsanqa11 and vsanqa12 and not on any other node.
> So I tried adding a location constraint as follows:
> pcs -f $CLUSTER_CREATE_LOG constraint location vha-$uuid rule score=-INFINITY \#uname ne vsanqa11 and \#uname ne vsanqa12
> Even though no error is thrown, the condition after "and" is silently
> dropped, as shown below:
>
> [root@vsanqa11 ~]# pcs constraint show --full
> Location Constraints:
>Resource: ms-c6933988-9e5c-419e-8fdf-744100d76ad6
>  Constraint: location-vha-c6933988-9e5c-419e-8fdf-744100d76ad6
>Rule: score=-INFINITY
>   (id:location-vha-c6933988-9e5c-419e-8fdf-744100d76ad6-rule)
>  Expression: #uname ne vsanqa11
>   (id:location-vha-c6933988-9e5c-419e-8fdf-744100d76ad6-rule-expr)
> Ordering Constraints:
> Colocation Constraints:
>
>
> Then I tried the following:
> pcs -f $CLUSTER_CREATE_LOG constraint location vha-$uuid rule score=-INFINITY \#uname ne vsanqa11
> pcs -f $CLUSTER_CREATE_LOG constraint location vha-$uuid rule score=-INFINITY \#uname ne vsanqa12
>
> but running these two commands did not help either. The expressions were
> added to separate rules.
>
> [root@vsanqa11 ~]# pcs constraint show --full
> Location Constraints:
>Resource: ms-c6933988-9e5c-419e-8fdf-744100d76ad6
>  Constraint: location-vha-c6933988-9e5c-419e-8fdf-744100d76ad6-1
>Rule: score=-INFINITY
>   (id:location-vha-c6933988-9e5c-419e-8fdf-744100d76ad6-1-rule)
>  Expression: #uname ne vsanqa12
>   (id:location-vha-c6933988-9e5c-419e-8fdf-744100d76ad6-1-rule-expr)
>  Constraint: location-vha-c6933988-9e5c-419e-8fdf-744100d76ad6
>Rule: score=-INFINITY
>   (id:location-vha-c6933988-9e5c-419e-8fdf-744100d76ad6-rule)
>  Expression: #uname ne vsanqa11
>   (id:location-vha-c6933988-9e5c-419e-8fdf-744100d76ad6-rule-expr)
> Ordering Constraints:
> Colocation Constraints:
>
>
> I also tried using the multistate resource name:
> [root@vsanqa11 ~]# pcs constraint location ms-c6933988-9e5c-419e-8fdf-744100d76ad6 rule score=-INFINITY \#uname ne vsanqa11
> Error: 'ms-c6933988-9e5c-419e-8fdf-744100d76ad6' is not a resource
>
>
> Can anyone tell me the correct command for this?
>
> Which version of pcs are you using (and what distribution)?  This has
> been fixed upstream.  (Below is a test from my system using the upstream
> pcs.)
>
> [root@rh7-1 pcs]# pcs constraint location D1 rule score=-INFINITY \#uname ne vsanqa11 and \#uname ne vsanqa12
> [root@rh7-1 pcs]# pcs constraint
> Location Constraints:
>   Resource: D1
> Constraint: location-D1
>   Rule: score=-INFINITY boolean-op=and
> Expression: #uname ne vsanqa11
> Expression: #uname ne vsanqa12
>
> Thanks,
> Chris
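
The rule used in both the crmsh and the pcs commands above has one "#uname ne NODE" term per permitted node, joined with "and". A small sketch of building that expression for any list of nodes; avoid_all_but is a hypothetical helper name and the function only prints text, it does not call pcs:

```shell
# Hypothetical helper: print the rule expression that excludes every node
# except the ones given, e.g. "#uname ne n1 and #uname ne n2".
avoid_all_but() {
    expr_str=
    for node in "$@"; do
        if [ -n "$expr_str" ]; then
            expr_str="$expr_str and "
        fi
        expr_str="${expr_str}#uname ne $node"
    done
    printf '%s\n' "$expr_str"
}
```

Assuming a pcs version that keeps the whole expression, the output could be passed on the same way as in the test above: pcs constraint location D1 rule score=-INFINITY $(avoid_all_but vsanqa11 vsanqa12).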
>
>
>
>
>
>
> On Tue, May 27, 2014 at 11:01 AM, Andrew Beekhof


Re: [Pacemaker] If 256 resources are load(ed), crmd will reboot.

2014-05-28 Thread Yusuke Iida
Hi, Andrew

I'm sorry.
It seems that syslog recorded the node name in a different notation.
To avoid any misunderstanding, I have collected a new report.
I think the symptoms appear in vm02/ha-log.

May 29 10:43:37 vm02 crmd[25608]: error: config_query_callback:
Local CIB query resulted in an error: Timer expired
May 29 10:43:37 vm02 crmd[25608]: info: register_fsa_error_adv:
Resetting the current action list
May 29 10:43:37 vm02 crmd[25608]: error: do_log: FSA: Input I_ERROR
from config_query_callback() received in state S_POLICY_ENGINE
May 29 10:43:37 vm02 crmd[25608]:  warning: do_state_transition: State
transition S_POLICY_ENGINE -> S_RECOVERY [ input=I_ERROR
cause=C_FSA_INTERNAL origin=config_query_callback ]
May 29 10:43:37 vm02 crmd[25608]:  warning: do_recover: Fast-tracking
shutdown in response to errors
May 29 10:43:37 vm02 crmd[25608]:  warning: do_election_vote: Not
voting in election, we're in state S_RECOVERY

https://drive.google.com/file/d/0BwMFJItoO-fVSEd2MkRiOGxkelk/edit?usp=sharing

Regards,
Yusuke

2014-05-29 10:26 GMT+09:00 Andrew Beekhof :
>
> On 28 May 2014, at 6:42 pm, Yusuke Iida  wrote:
>
>> Hi, Andrew
>>
>> I made the cluster load a setup to which 256 resources are started using 
>> crmsh.
>> At this time, crmd changed into the S_RECOVERY state and rebooted.
>>
>> May 28 17:08:00 [14194] vm02   crmd: error:
>> config_query_callback: Local CIB query resulted in an error: Timer
>> expired
>> May 28 17:08:00 [14194] vm02   crmd: info:
>> register_fsa_error_adv: Resetting the current action list
>> May 28 17:08:00 [14194] vm02   crmd: error: do_log: FSA: Input
>> I_ERROR from config_query_callback() received in state S_POLICY_ENGINE
>> May 28 17:08:00 [14194] vm02   crmd:  warning:
>> do_state_transition: State transition S_POLICY_ENGINE -> S_RECOVERY [
>> input=I_ERROR cause=C_FSA_INTERNAL origin=config_query_callback ]
>> May 28 17:08:00 [14194] vm02   crmd:  warning: do_recover:
>> Fast-tracking shutdown in response to errors
>> May 28 17:08:00 [14194] vm02   crmd:  warning: do_election_vote:
>> Not voting in election, we're in state S_RECOVERY
>>
>> I think that query performed in large quantities cannot be processed.
>> Before implementing cib_performance, abort_transition() was called only once.
>>
>> Is this corrected?
>>
>> report when a problem occurs is attached.
>> https://drive.google.com/file/d/0BwMFJItoO-fVX0gxM1ptcE52WWs/edit?usp=sharing
>
> That doesn't appear to match the symptoms above.
>
>>
>> Regards,
>> Yusuke
>> --
>> 
>> METRO SYSTEMS CO., LTD
>>
>> Yusuke Iida
>> Mail: yusk.i...@gmail.com
>> 
>>
>
>
>



-- 

METRO SYSTEMS CO., LTD

Yusuke Iida
Mail: yusk.i...@gmail.com




Re: [Pacemaker] If 256 resources are load(ed), crmd will reboot.

2014-05-28 Thread Andrew Beekhof

On 28 May 2014, at 6:42 pm, Yusuke Iida  wrote:

> Hi, Andrew
> 
> I made the cluster load a setup to which 256 resources are started using 
> crmsh.
> At this time, crmd changed into the S_RECOVERY state and rebooted.
> 
> May 28 17:08:00 [14194] vm02   crmd: error:
> config_query_callback: Local CIB query resulted in an error: Timer
> expired
> May 28 17:08:00 [14194] vm02   crmd: info:
> register_fsa_error_adv: Resetting the current action list
> May 28 17:08:00 [14194] vm02   crmd: error: do_log: FSA: Input
> I_ERROR from config_query_callback() received in state S_POLICY_ENGINE
> May 28 17:08:00 [14194] vm02   crmd:  warning:
> do_state_transition: State transition S_POLICY_ENGINE -> S_RECOVERY [
> input=I_ERROR cause=C_FSA_INTERNAL origin=config_query_callback ]
> May 28 17:08:00 [14194] vm02   crmd:  warning: do_recover:
> Fast-tracking shutdown in response to errors
> May 28 17:08:00 [14194] vm02   crmd:  warning: do_election_vote:
> Not voting in election, we're in state S_RECOVERY
> 
> I think that query performed in large quantities cannot be processed.
> Before implementing cib_performance, abort_transition() was called only once.
> 
> Is this corrected?
> 
> report when a problem occurs is attached.
> https://drive.google.com/file/d/0BwMFJItoO-fVX0gxM1ptcE52WWs/edit?usp=sharing

That doesn't appear to match the symptoms above.

> 
> Regards,
> Yusuke
> -- 
> 
> METRO SYSTEMS CO., LTD
> 
> Yusuke Iida
> Mail: yusk.i...@gmail.com
> 
> 





Re: [Pacemaker] Logging options: how to change logpriority only for cib

2014-05-28 Thread Andrew Beekhof

On 28 May 2014, at 8:25 pm, Bernardo Cabezas Serra  wrote:

> Hello,
> 
> On 27/05/14 23:05, Andrew Beekhof wrote:
>>> But now, we get this line each 5 seconds on corosync syslog:
>>> pacemakerd[23624]:   notice: crm_add_logfile: Additional logging available 
>>> in /var/log/pacemaker.log
>> 
>> That message should only pop up when pacemakerd has just been started... 
>> someone/thing is calling pacemakerd over and over.
> 
> Mmmm is strange. I have no other process than pacemakerd and its forked 
> processes, and pids are not changing.

We run on corosync stack on top of ubuntu.

^^^ /me puts bets on upstart being the culprit :-)

> 
> But it could be related to pgsql RA. When I stop pgsql resource, crm_log_init 
> stops happening.

That is very odd... I wonder if it's calling pacemakerd somehow.
Or maybe upstart is checking if pacemaker is still alive when the core files 
below are produced and it's something related to pgsql that is causing them to 
be produced.

> 
> With pgsql RA started, this logs repeats:
> 
> May 28 12:13:47 lohap2 pacemakerd[25581]: info: crm_log_init: Changed 
> active directory to /opt/ha/var/lib/heartbeat/cores/root
> May 28 12:13:47 lohap2 pacemakerd[25581]: info: crm_xml_cleanup: Cleaning 
> up memory from libxml2
> May 28 12:13:47 lohap2 pgsql(pgsql)[25571]: DEBUG: PostgreSQL is running as a 
> hot standby.
> 
> 
> Could it be originated by pgsql RA trying to log with ocf_log from 
> ocf-shellfuncs?

I would be surprised.

> Can ocf_log shellfunc originate a call to crm_log_init on pacemakerd?

I don't believe so.

> 
> 
> Also, I see ha_logd is not running. Is this normal?

yes

> 
> 
> And this also could be related (repeats each 60 seconds):
> 
> 
> May 28 11:28:48 lohap1 stonith-ng[23321]: error: crm_abort: 
> crm_glib_handler: Forked child 13192 to record non-fatal assert at 
> logging.c:73 : Source ID 2493 was not found when attempting to remove it
> May 28 11:28:48 lohap1 stonith-ng[23321]: crit: crm_glib_handler: GLib: 
> Source ID 2493 was not found when attempting to remove it
> May 28 11:28:48 lohap1 stonith-ng[23321]: error: crm_abort: 
> crm_glib_handler: Forked child 13194 to record non-fatal assert at 
> logging.c:73 : Source ID 2494 was not found when attempting to remove it
> May 28 11:28:48 lohap1 stonith-ng[23321]: crit: crm_glib_handler: GLib: 
> Source ID 2494 was not found when attempting to remove it
> ---

I think someone else mentioned this but I don't have a crm_report for it.
Could you send one through (after installing the debug symbols) please?

> 
> Our glib version: 2.40.0 from ubuntu 14.04LTS
> 
> 
>>> And /var/log/pacemaker.log is at info level, but cib outputs LOTS of cib 
>>> messages like these (nearly each second):
>> I'll make the following change for rc2.  There is really no need for query 
>> operations to be logged at info.
>> With those removed, are the cib logs more acceptable?
>> 
>> diff --git a/cib/callbacks.c b/cib/callbacks.c
> > [...]
> 
> Thank you, tested patch, and log is now more clean ;)
> 
>>> But with this priority we have no info about lrmd or pacemakerd actions.
>> 
>> The default value for PCMK_logpriority, when sending logs to syslog, is 
>> 'notice'... that should contain enough information to be useful without 
>> being completely overwhelming.
>> its only the log file that is logged at level 'info' but you've already 
>> disabled that with PCMK_debugfile=/dev/null.
> 
> Ok, thanks for the info.
> Now I have logging as I wanted: Only log to syslog (corosync default), 
> priority INFO.
> I have PCMK_debugfile=none, which seems better than /dev/null (seen from 
> sources).
> 
> Thank you
> Bernardo
> 
> -- 
> APSL
> *Bernardo Cabezas Serra*
> *Responsable Sistemas*
> Camí Vell de Bunyola 37, esc. A, local 7
> 07009 Polígono de Son Castelló, Palma
> Mail: bcabe...@apsl.net
> Skype: bernat.cabezas
> Tel: 971439771
> 
> 





Re: [Pacemaker] Logging options: how to change logpriority only for cib

2014-05-28 Thread Bernardo Cabezas Serra

Hello,

On 27/05/14 23:05, Andrew Beekhof wrote:

>> But now, we get this line each 5 seconds on corosync syslog:
>> pacemakerd[23624]:   notice: crm_add_logfile: Additional logging available in 
>> /var/log/pacemaker.log
>
> That message should only pop up when pacemakerd has just been started... 
> someone/thing is calling pacemakerd over and over.

Hmm, that is strange. I have no processes other than pacemakerd and its forked 
processes, and the PIDs are not changing.

But it could be related to the pgsql RA. When I stop the pgsql resource, 
the crm_log_init messages stop appearing.

With the pgsql RA started, these log lines repeat:

May 28 12:13:47 lohap2 pacemakerd[25581]: info: crm_log_init: 
Changed active directory to /opt/ha/var/lib/heartbeat/cores/root
May 28 12:13:47 lohap2 pacemakerd[25581]: info: crm_xml_cleanup: 
Cleaning up memory from libxml2
May 28 12:13:47 lohap2 pgsql(pgsql)[25571]: DEBUG: PostgreSQL is running 
as a hot standby.



Could it be caused by the pgsql RA trying to log with ocf_log from 
ocf-shellfuncs?

Can the ocf_log shell function trigger a call to crm_log_init in pacemakerd?


Also, I see that ha_logd is not running. Is this normal?


This could also be related (it repeats every 60 seconds):


May 28 11:28:48 lohap1 stonith-ng[23321]: error: crm_abort: 
crm_glib_handler: Forked child 13192 to record non-fatal assert at 
logging.c:73 : Source ID 2493 was not found when attempting to remove it
May 28 11:28:48 lohap1 stonith-ng[23321]: crit: crm_glib_handler: 
GLib: Source ID 2493 was not found when attempting to remove it
May 28 11:28:48 lohap1 stonith-ng[23321]: error: crm_abort: 
crm_glib_handler: Forked child 13194 to record non-fatal assert at 
logging.c:73 : Source ID 2494 was not found when attempting to remove it
May 28 11:28:48 lohap1 stonith-ng[23321]: crit: crm_glib_handler: 
GLib: Source ID 2494 was not found when attempting to remove it

---

Our glib version: 2.40.0 from ubuntu 14.04LTS



And /var/log/pacemaker.log is at info level, but cib outputs LOTS of cib 
messages like these (nearly each second):

I'll make the following change for rc2.  There is really no need for query 
operations to be logged at info.
With those removed, are the cib logs more acceptable?

diff --git a/cib/callbacks.c b/cib/callbacks.c

> [...]

Thank you, tested patch, and log is now more clean ;)


But with this priority we have no info about lrmd or pacemakerd actions.


The default value for PCMK_logpriority, when sending logs to syslog, is 
'notice'... that should contain enough information to be useful without being 
completely overwhelming.
It's only the log file that is logged at level 'info', but you've already 
disabled that with PCMK_debugfile=/dev/null.


Ok, thanks for the info.
Now I have logging as I wanted: only log to syslog (corosync default), 
priority INFO.
I have PCMK_debugfile=none, which seems better than /dev/null (judging from 
the sources).
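
For reference, the setup described above might look roughly like this in /etc/sysconfig/pacemaker. This is only a sketch: the two variable names are the ones quoted in this thread, the value "info" is assumed from "priority INFO", and the file may live at a different path on other distributions.

```shell
# Sketch of a syslog-only logging setup (values are assumptions, see above).
PCMK_debugfile=none      # no separate log file; "none" rather than /dev/null
PCMK_logpriority=info    # syslog priority; the default mentioned above is 'notice'
```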


Thank you
Bernardo

--
APSL
*Bernardo Cabezas Serra*
*Responsable Sistemas*
Camí Vell de Bunyola 37, esc. A, local 7
07009 Polígono de Son Castelló, Palma
Mail: bcabe...@apsl.net
Skype: bernat.cabezas
Tel: 971439771


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] If 256 resources are load(ed), crmd will reboot.

2014-05-28 Thread Yusuke Iida
Hi, Andrew

Using crmsh, I loaded a configuration into the cluster that starts 256 resources.
At that point, crmd entered the S_RECOVERY state and restarted.

May 28 17:08:00 [14194] vm02   crmd:error:
config_query_callback: Local CIB query resulted in an error: Timer
expired
May 28 17:08:00 [14194] vm02   crmd: info:
register_fsa_error_adv: Resetting the current action list
May 28 17:08:00 [14194] vm02   crmd:error: do_log: FSA: Input
I_ERROR from config_query_callback() received in state S_POLICY_ENGINE
May 28 17:08:00 [14194] vm02   crmd:  warning:
do_state_transition: State transition S_POLICY_ENGINE -> S_RECOVERY [
input=I_ERROR cause=C_FSA_INTERNAL origin=config_query_callback ]
May 28 17:08:00 [14194] vm02   crmd:  warning: do_recover:
Fast-tracking shutdown in response to errors
May 28 17:08:00 [14194] vm02   crmd:  warning: do_election_vote:
Not voting in election, we're in state S_RECOVERY
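
To check how often crmd ends up in S_RECOVERY, transitions like the one in the excerpt above can be counted from the log. What follows is a minimal sketch, assuming a grep that supports -o (GNU or BSD) and a log file with one message per line (not wrapped as in this mail). count_transitions is a hypothetical helper name, not a Pacemaker tool.

```shell
# Hypothetical helper (not part of Pacemaker): summarise the crmd state
# transitions found in a log file, most frequent first.
count_transitions() {
    # $1: path to a log file with one message per line
    grep -o 'State transition S_[A-Z_]* -> S_[A-Z_]*' "$1" | sort | uniq -c | sort -rn
}
```

It could be run as `count_transitions /var/log/pacemaker.log`, substituting whichever log file crmd writes to on the affected node.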

I think the large number of queries being issued cannot be processed in time.
Before cib_performance was implemented, abort_transition() was called only once.

Is this corrected?

A report taken when the problem occurred is attached:
https://drive.google.com/file/d/0BwMFJItoO-fVX0gxM1ptcE52WWs/edit?usp=sharing

Regards,
Yusuke
-- 

METRO SYSTEMS CO., LTD

Yusuke Iida
Mail: yusk.i...@gmail.com




Re: [Pacemaker] no-quorum-policy = demote?

2014-05-28 Thread Christian Ciach
Done:

http://bugs.clusterlabs.org/show_bug.cgi?id=5216

Best regards,
Christian


2014-05-27 22:51 GMT+02:00 Andrew Beekhof :

>
> On 27 May 2014, at 7:20 pm, Christian Ciach  wrote:
>
> >
> >
> >
> > 2014-05-27 7:34 GMT+02:00 Andrew Beekhof :
> >
> > On 27 May 2014, at 3:12 pm, Gao,Yan  wrote:
> >
> > > On 05/27/14 08:07, Andrew Beekhof wrote:
> > >>
> > >> On 26 May 2014, at 10:47 pm, Christian Ciach wrote:
> > >>
> > >>> I am sorry to get back to this topic, but I'm genuinely curious:
> > >>>
> > >>> Why is "demote" an option for the ticket "loss-policy" for
> multi-site-clusters but not for the normal "no-quorum-policy" of local
> clusters? This seems like a missing feature to me.
> > >>
> > >> Or one feature too many.
> > >> Perhaps Yan can explain why he wanted demote as an option for the
> loss-policy.
> > > Loss-policy="demote" is a kind of natural default if the "Master" mode
> > > of a resource requires a ticket like:
> > > 
> > >
> > > The idea is for running stateful resource instances across clusters.
> And
> > > loss-policy="demote" provides the possibility if there's the need to
> > > still run the resource in slave mode for any reason when losing the
> > > ticket, rather than stopping it or fencing the node hosting it.
> >
> > I guess the same logic applies to the single cluster use-case too and we
> should allow no-quorum-policy=demote.
> >
> >
> > Thank you for mentioning this. This was my thought as well.
> >
> > At the moment we "simulate" this behaviour by using a primitive resource
> where "started" means "master" and "stopped" means "slave". This way we can
> use "no-quorum-policy=stop" to actually switch the resource to slave on
> quorum loss. This seems hacky, so I would appreciate if this could be done
> in a proper way some time in the future.
>
> Could you file a bug for that in bugs.clusterlabs.org so we don't lose
> track of it?
>
> >
> > One question though... do we still stop non-master/slave resources for
> loss-policy=demote?
> >
> > >
> > > Regards,
> > >  Yan
> > >
> > >>
> > >>>
> > >>> Best regards
> > >>> Christian
> > >>>
> > >>>
> > >>> 2014-04-07 9:54 GMT+02:00 Christian Ciach :
> > >>> Hello,
> > >>>
> > >>> I am using Corosync 2.0 with Pacemaker 1.1 on Ubuntu Server 14.04
> (daily builds until final release).
> > >>>
> > >>> My problem is as follows: I have a 2-node (plus a quorum-node)
> cluster to manage a multistate-resource. One node should be the master and
> the other one the slave. It is absolutely not allowed to have two masters
> at the same time. To prevent a split-brain situation, I am also using a
> third node as a quorum-only node (set to standby). There is no redundant
> connection because the nodes are connected over the internet.
> > >>>
> > >>> If one of the two nodes managing the resource becomes disconnected,
> it loses quorum. In this case, I want this resource to become a slave, but
> the resource should never be stopped completely! This leaves me with a
> problem: "no-quorum-policy=stop" will stop the resource, while
> "no-quorum-policy=ignore" will keep this resource in a master-state. I
> already tried to demote the resource manually inside the monitor-action of
> the OCF-agent, but pacemaker will promote the resource immediately again.
> > >>>
> > >>> I am aware that I am trying to manage a multi-site-cluster and
> there is something like the booth-daemon, which sounds like the solution to
> my problem. But unfortunately I need the location-constraints of pacemaker
> based on the score of the OCF-agent. As far as I know location-constraints
> are not possible when using booth, because the 2-node-cluster is
> essentially split into two 1-node-clusters. Is this correct?
> > >>>
> > >>> To conclude: Is it possible to demote a resource on quorum loss
> instead of stopping it? Is booth an option if I need to manage the location
> of the master based on the score returned by the OCF-agent?
> > >>>
> > >>>
> > >>
> > >
> > > --
> > > Gao,Yan 
> > > Software Engineer
> > > China Server Team, SUSE.
> > >
> >
> >
> > ___
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started