Re: [Pacemaker] [Question] About the stop order at the time of the Probe error.

2012-09-04 Thread renayama19661014
Hi Andrew, 

Thank you for comments.

  Question 1) Is there right setting in cib.xml to evade this problem?
 
 No.
 
 
  Question 2) In Pacemaker1.1, does this problem occur?
 
 Yes.  I'll see what I can do.
 
 
  Question 3) I added following order.
 
 
  <rsc_order id="order-2" first="resource1" then="resource3"/>
  <rsc_order id="order-3" first="resource1" then="resource4"/>
  <rsc_order id="order-5" first="resource2" then="resource4"/>
 
  And the addition of these orders seems to solve the problem.
  Is the addition of these orders right as one method of the solution, too?
 
 Really the PE should handle this implicitly, without need for
 additional constraints.

All right.

I hope this problem will be solved.
I registered a bug report about this problem in Bugzilla.
 * http://bugs.clusterlabs.org/show_bug.cgi?id=5101
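A note on why these order constraints help: ordering constraints are symmetrical by default, so a start order also implies the reverse stop order. The key constraint written out in full XML (a sketch; symmetrical="true" is the default and is shown here only for clarity):

 <rsc_order id="order-5" first="resource2" then="resource4" symmetrical="true"/>
 <!-- symmetrical="true" is the default; with it, resource4 stops before resource2 -->

With this in place, resource2 must start before resource4, and therefore resource4 must stop before resource2 (resource4_stop -> resource2_stop), which is the order the user wants.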

Best Regards,
Hideo Yamauchi.



--- On Wed, 2012/9/5, Andrew Beekhof and...@beekhof.net wrote:

 On Wed, Aug 22, 2012 at 4:44 PM,  renayama19661...@ybb.ne.jp wrote:
  Hi All,
 
  We found a problem at the time of a Probe error.
 
  We use the following simple resource configuration.
 
  
  Last updated: Wed Aug 22 15:19:50 2012
  Stack: Heartbeat
  Current DC: drbd1 (6081ac99-d941-40b9-a4a3-9f996ff291c0) - partition with 
  quorum
  Version: 1.0.12-c6770b8
  1 Nodes configured, unknown expected votes
  1 Resources configured.
  
 
  Online: [ drbd1 ]
 
   Resource Group: grpTest
       resource1  (ocf::pacemaker:Dummy): Started drbd1
       resource2  (ocf::pacemaker:Dummy): Started drbd1
       resource3  (ocf::pacemaker:Dummy): Started drbd1
       resource4  (ocf::pacemaker:Dummy): Started drbd1
 
  Node Attributes:
  * Node drbd1:
 
  Migration summary:
  * Node drbd1:
 
 
  Depending on which resource the Probe error occurs on, the resources are not 
  stopped in reverse order.
 
  I confirmed it with the following procedure.
 
  Step 1) Put resource2 and resource4 into a started state.
 
  [root@drbd1 ~]# touch /var/run/Dummy-resource2.state
  [root@drbd1 ~]# touch /var/run/Dummy-resource4.state
 
  Step 2) Start a node and send cib.
 
  Step 3) resource2 and resource4 are stopped, but not in reverse order.
 
  (snip)
  Aug 22 15:19:47 drbd1 pengine: [32722]: notice: group_print:  Resource 
  Group: grpTest
  Aug 22 15:19:47 drbd1 pengine: [32722]: notice: native_print:      
  resource1#011(ocf::pacemaker:Dummy):#011Stopped
  Aug 22 15:19:47 drbd1 pengine: [32722]: notice: native_print:      
  resource2#011(ocf::pacemaker:Dummy):#011Started drbd1
  Aug 22 15:19:47 drbd1 pengine: [32722]: notice: native_print:      
  resource3#011(ocf::pacemaker:Dummy):#011Stopped
  Aug 22 15:19:47 drbd1 pengine: [32722]: notice: native_print:      
  resource4#011(ocf::pacemaker:Dummy):#011Started drbd1
  (snip)
  Aug 22 15:19:47 drbd1 crmd: [32719]: info: te_rsc_command: Initiating 
  action 6: stop resource2_stop_0 on drbd1 (local)
  Aug 22 15:19:47 drbd1 crmd: [32719]: info: do_lrm_rsc_op: Performing 
  key=6:2:0:5c924067-0d20-48fd-9772-88e530661270 op=resource2_stop_0 )
  Aug 22 15:19:47 drbd1 lrmd: [32716]: info: rsc:resource2 stop[6] (pid 32745)
  Aug 22 15:19:47 drbd1 crmd: [32719]: info: te_rsc_command: Initiating 
  action 11: stop resource4_stop_0 on drbd1 (local)
  Aug 22 15:19:47 drbd1 crmd: [32719]: info: do_lrm_rsc_op: Performing 
  key=11:2:0:5c924067-0d20-48fd-9772-88e530661270 op=resource4_stop_0 )
  Aug 22 15:19:47 drbd1 lrmd: [32716]: info: rsc:resource4 stop[7] (pid 32746)
  Aug 22 15:19:47 drbd1 lrmd: [32716]: info: operation stop[6] on resource2 
  for client 32719: pid 32745 exited with return code 0
  (snip)
 
 Hmmm. That's not good.
 
 
  I understand that the cause of this stop order lies in the ordering within the group.
 
  In this case our user definitely wants the resources stopped in reverse order.
 
   * resource4_stop -> resource2_stop
 
  The stop order is important for our user's resources.
 
 
  I will ask the following questions.
 
  Question 1) Is there right setting in cib.xml to evade this problem?
 
 No.
 
 
  Question 2) In Pacemaker1.1, does this problem occur?
 
 Yes.  I'll see what I can do.
 
 
  Question 3) I added following order.
 
 
          <rsc_order id="order-2" first="resource1" then="resource3"/>
          <rsc_order id="order-3" first="resource1" then="resource4"/>
          <rsc_order id="order-5" first="resource2" then="resource4"/>
 
              And the addition of these orders seems to solve the problem.
              Is the addition of these orders right as one method of the solution, too?
 
 Really the PE should handle this implicitly, without need for
 additional constraints.
 



Re: [Pacemaker] fencing best practices for virtual environments

2012-09-09 Thread renayama19661014
Hi Alberto,

I think you should configure multiple external/vcenter devices if your concern is 
how stonith behaves when vCenter itself fails and is not usable.

Please refer to the following email and patch.
 * http://www.gossamer-threads.com/lists/linuxha/dev/78702
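As a rough sketch (not taken from this thread): an sbd-based stonith resource could be defined alongside external/vcenter, assuming the external/sbd plugin and its sbd_device parameter are available; the ids and device path below are hypothetical:

 <primitive id="prmStonithSbd" class="stonith" type="external/sbd">
   <instance_attributes id="prmStonithSbd-ia">
     <nvpair id="prmStonithSbd-dev" name="sbd_device" value="/dev/disk/by-id/..."/>
   </instance_attributes>
 </primitive>
 <!-- hypothetical ids and device path; assumes the external/sbd plugin is installed -->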

Best Regards,
Hideo Yamauchi.

--- On Sun, 2012/9/9, Alberto Menichetti albmeniche...@tai.it wrote:

 Hi all,
 
 I'm setting up a two-node pacemaker cluster (SLES-HA Extension) on vmware 
 vsphere 5.
 I've successfully configured and tested the stonith plugin 
 external/vcenter; but this plugin introduces a single point of failure in 
 my cluster infrastructure because it depends on the availability of the 
 virtual center (which is, in the customer environment, a virtual machine).
 I was thinking of introducing an additional fencing device, to be used when the 
 virtual center is unavailable; is this a suggested deployment?
 The fencing device I'd like to use is sbd.
 
 Are there some best practices or validated configurations for a deploy like 
 this?
 
 Thank you.
 Alberto
 


[Pacemaker] [Problem] About the replacement of the master/slave resource.

2012-09-10 Thread renayama19661014
Hi All,

We examined the behavior when a clone resource combined with a Master/Slave 
resource fails.

The Master/Slave resources swap roles under the influence of the clone resource 
failure.

We confirmed it with the following procedure.


Step 1) We start the cluster and send the cib.


Last updated: Mon Sep 10 15:26:25 2012
Stack: Heartbeat
Current DC: drbd2 (08607c71-da7b-4abf-b6d5-39ee39552e89) - partition with quorum
Version: 1.0.12-c6770b8
2 Nodes configured, unknown expected votes
6 Resources configured.


Online: [ drbd1 drbd2 ]

 Resource Group: grpPostgreSQLDB
     prmApPostgreSQLDB  (ocf::pacemaker:Dummy): Started drbd1
 Resource Group: grpStonith1
     prmStonith1-2  (stonith:external/ssh): Started drbd2
     prmStonith1-3  (stonith:meatware): Started drbd2
 Resource Group: grpStonith2
     prmStonith2-2  (stonith:external/ssh): Started drbd1
     prmStonith2-3  (stonith:meatware): Started drbd1
 Master/Slave Set: msDrPostgreSQLDB
     Masters: [ drbd1 ]
     Slaves: [ drbd2 ]
 Clone Set: clnDiskd1
     Started: [ drbd1 drbd2 ]
 Clone Set: clnPingd
     Started: [ drbd1 drbd2 ]

Step 2) We cause a monitor error in pingd.

[root@drbd1 ~]# rm -rf /var/run/pingd-default_ping_set 

Step 3) Failover is finished.


Last updated: Mon Sep 10 15:27:08 2012
Stack: Heartbeat
Current DC: drbd2 (08607c71-da7b-4abf-b6d5-39ee39552e89) - partition with quorum
Version: 1.0.12-c6770b8
2 Nodes configured, unknown expected votes
6 Resources configured.


Online: [ drbd1 drbd2 ]

 Resource Group: grpPostgreSQLDB
     prmApPostgreSQLDB  (ocf::pacemaker:Dummy): Started drbd2
 Resource Group: grpStonith1
     prmStonith1-2  (stonith:external/ssh): Started drbd2
     prmStonith1-3  (stonith:meatware): Started drbd2
 Resource Group: grpStonith2
     prmStonith2-2  (stonith:external/ssh): Started drbd1
     prmStonith2-3  (stonith:meatware): Started drbd1
 Master/Slave Set: msDrPostgreSQLDB
     Masters: [ drbd2 ]
     Stopped: [ prmDrPostgreSQLDB:1 ]
 Clone Set: clnDiskd1
     Started: [ drbd1 drbd2 ]
 Clone Set: clnPingd
     Started: [ drbd2 ]
     Stopped: [ prmPingd:0 ]

Failed actions:
prmPingd:0_monitor_1 (node=drbd1, call=14, rc=7, status=complete): not 
running



However, the log shows that the Master/Slave resource instances were swapped.

Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Move    resource 
prmApPostgreSQLDB#011(Started drbd1 -> drbd2)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave   resource 
prmStonith1-2#011(Started drbd2)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave   resource 
prmStonith1-3#011(Started drbd2)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave   resource 
prmStonith2-2#011(Started drbd1)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave   resource 
prmStonith2-3#011(Started drbd1)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Move    resource 
prmDrPostgreSQLDB:0#011(Master drbd1 -> drbd2)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Stop    resource 
prmDrPostgreSQLDB:1#011(drbd2)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave   resource 
prmDiskd1:0#011(Started drbd1)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave   resource 
prmDiskd1:1#011(Started drbd2)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Stop    resource 
prmPingd:0#011(drbd1)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave   resource 
prmPingd:1#011(Started drbd2)

The swap is unnecessary: the Slave should simply become Master, and the failed 
Master instance should only be stopped.
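For context, the dependency that ties placement to pingd is presumably a location rule of roughly this shape; the attribute name default_ping_set comes from the log above, while the ids, score and threshold are hypothetical:

 <rsc_location id="loc-msDrPostgreSQLDB" rsc="msDrPostgreSQLDB">
   <rule id="loc-msDrPostgreSQLDB-rule" role="Master" score="-INFINITY" boolean-op="or">
     <expression id="loc-expr-1" attribute="default_ping_set" operation="not_defined"/>
     <expression id="loc-expr-2" attribute="default_ping_set" operation="lt" value="100"/>
   </rule>
 </rsc_location>
 <!-- hypothetical ids/threshold; attribute name taken from the failed pingd monitor -->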

However, this problem seems to be solved in Pacemaker1.1.

Will a fix be possible for Pacemaker1.0?
Because the placement processing differs greatly from Pacemaker1.1, I think that 
a fix for Pacemaker1.0 may be difficult.

 * This problem may have been reported as a known problem.
 * I registered this problem with Bugzilla.
  * http://bugs.clusterlabs.org/show_bug.cgi?id=5103

Best Regards,
Hideo Yamauchi.




Re: [Pacemaker] [Patch] The log when I lost Quorum is never output.

2012-10-17 Thread renayama19661014
Hi Andrew,

Thank you for comments.

Your correction is good, but one more change is necessary.

 #if SUPPORT_HEARTBEAT
-static gboolean fsa_have_quorum = FALSE;
 
 gboolean ccm_dispatch(int fd, gpointer user_data)
 {
@@ -575,14 +574,14 @@
 
if(update_quorum) {
crm_have_quorum = ccm_have_quorum(event);
-   crm_update_quorum(crm_have_quorum, FALSE);
 
if(crm_have_quorum == FALSE) {
/* did we just loose quorum? */
-   if(fsa_have_quorum) {
+   if(fsa_has_quorum) {
                crm_info("Quorum lost: %s", ccm_event_name(event));
}
}
+   crm_update_quorum(crm_have_quorum, FALSE);
}

if(update_cache) {


And I think fsa_have_quorum should be deleted because it is no longer necessary.


Best Regards,
Hideo Yamauchi.


--- On Thu, 2012/10/18, Andrew Beekhof and...@beekhof.net wrote:

 What about this instead?
 
 diff --git a/crmd/heartbeat.c b/crmd/heartbeat.c
 index cae143b..3a7f31d 100644
 --- a/crmd/heartbeat.c
 +++ b/crmd/heartbeat.c
 @@ -354,14 +354,13 @@ crmd_ccm_msg_callback(oc_ed_t event, void
 *cookie, size_t size, const void *data
 
      if (update_quorum) {
          crm_have_quorum = ccm_have_quorum(event);
 -        crm_update_quorum(crm_have_quorum, FALSE);
 -
          if (crm_have_quorum == FALSE) {
              /* did we just loose quorum? */
              if (fsa_have_quorum) {
                  crm_info("Quorum lost: %s", ccm_event_name(event));
              }
          }
 +        crm_update_quorum(crm_have_quorum, FALSE);
      }
 
      if (update_cache) {
 
 
 On Wed, Oct 17, 2012 at 6:41 PM,  renayama19661...@ybb.ne.jp wrote:
  Hi All,
 
  I looked at the crmd source of 
  ClusterLabs-pacemaker-1.0-Pacemaker-1.0.12-116-gf372204.zip.
 
  I found a log message ("Quorum lost: ...") that is never output.
  The cause is fsa_have_quorum: this value is always FALSE.
 
  (snip)
  * crmd/callback.c
  #if SUPPORT_HEARTBEAT
  static gboolean fsa_have_quorum = FALSE;
  (snip)
          if(update_quorum) {
              crm_have_quorum = ccm_have_quorum(event);
              crm_update_quorum(crm_have_quorum, FALSE);
 
              if(crm_have_quorum == FALSE) {
                          /* did we just loose quorum? */
                          if(fsa_have_quorum) {
                          crm_info("Quorum lost: %s", ccm_event_name(event));
                          }
              }
          }
  (snip)
 
  I made a patch to output this log.
  Please apply it to the repository if there is no problem with the patch.
 
  Best Regards,
  Hideo Yamauchi.


Re: [Pacemaker] The strange behavior of Master/Slave when it failed to demote

2013-01-22 Thread renayama19661014
Hi All,

I registered the problem in Bugzilla on behalf of Ms. Ikeda.
 * http://bugs.clusterlabs.org/show_bug.cgi?id=5133

Best Regards,
Hideo Yamauchi.


--- On Thu, 2013/1/10, Junko IKEDA tsukishima...@gmail.com wrote:

 
 
 Hi,
 
 I'm running the Stateful RA with Pacemaker 1.0.12, and found that something is 
 wrong with its demote behavior.
 
 This is my configuration;
 There is no stonith devices, and demote/stop are set as on-fail=block.
 
 # crm configure show
 node $id="21c624bd-c426-43dc-9665-bbfb92054bcd" dl380g5c
 node $id="3f6ec88d-ee47-4f63-bfeb-652b8dd96027" dl380g5d
 primitive dummy ocf:pacemaker:Stateful \
         op start interval="0s" timeout="100s" on-fail="restart" \
         op monitor interval="10s" role="Master" timeout="100s" on-fail="restart" \
         op monitor interval="20s" role="Slave" timeout="100s" on-fail="restart" \
         op promote interval="0s" timeout="100s" on-fail="restart" \
         op demote interval="0s" timeout="100s" on-fail="block" \
         op stop interval="0s" timeout="100s" on-fail="block"
 ms stateful dummy
 property $id="cib-bootstrap-options" \
         dc-version="1.0.12-066152e" \
         cluster-infrastructure="Heartbeat" \
         no-quorum-policy="ignore" \
         stonith-enabled="false" \
         startup-fencing="false" \
         crmd-transition-delay="2s"
 rsc_defaults $id="rsc-options" \
         resource-stickiness="INFINITY" \
         migration-threshold="1"
 
 
 
 1) Initial status (dl380g5c=Master/dl380g5d=Slave)
 # crm_mon -1 -n
 
 
 Last updated: Thu Jan 10 18:25:17 2013
 Stack: Heartbeat
 Current DC: dl380g5d (3f6ec88d-ee47-4f63-bfeb-652b8dd96027) - partition with 
 quorum
 Version: 1.0.12-066152e
 2 Nodes configured, unknown expected votes
 1 Resources configured.
 
 
 Node dl380g5c (21c624bd-c426-43dc-9665-bbfb92054bcd): online
         dummy:0 (ocf::pacemaker:Stateful) Master
 Node dl380g5d (3f6ec88d-ee47-4f63-bfeb-652b8dd96027): online
         dummy:1 (ocf::pacemaker:Stateful) Started
 
 
 
 2) Modify the Stateful RA to reproduce a demote failure, and put the Master node into 
 standby mode.
 
 # vim /usr/lib/ocf/resource.d/pacemaker/Stateful
 stateful_demote() {
     return $OCF_ERR_GENERIC    # injected so that every demote fails
 
     stateful_check_state
     if [ $? = 0 ]; then
         # CRM Error - Should never happen
         return $OCF_NOT_RUNNING
 
 ...
 
 
 # crm node standby dl380g5c
 # crm_mon -1 -n
 
 Last updated: Thu Jan 10 18:27:04 2013
 Stack: Heartbeat
 Current DC: dl380g5d (3f6ec88d-ee47-4f63-bfeb-652b8dd96027) - partition with 
 quorum
 Version: 1.0.12-066152e
 2 Nodes configured, unknown expected votes
 1 Resources configured.
 
 
 Node dl380g5c (21c624bd-c426-43dc-9665-bbfb92054bcd): standby
         dummy:0 (ocf::pacemaker:Stateful) Slave  (unmanaged) FAILED
 Node dl380g5d (3f6ec88d-ee47-4f63-bfeb-652b8dd96027): online
         dummy:1 (ocf::pacemaker:Stateful) Master
 
 Failed actions:
     dummy:0_demote_0 (node=dl380g5c, call=4, rc=1, status=complete): unknown 
 error
 
 
 In the above crm_mon output, dl380g5c's status is Slave, but it might still be 
 Master because the demote failed.
 So dl380g5d should be prohibited from promoting, to prevent multiple Masters.
 It seems that Pacemaker 1.1 shows the same behavior as 1.0.12.
 I'm not sure, but Pacemaker 1.0.11's behavior is correct (dl380g5d cannot 
 promote).
 Please see the attached hb_report.
 
 
 Jan 10 18:27:01 dl380g5d pengine: [4297]: info: determine_online_status: Node 
 dl380g5c is standby
 Jan 10 18:27:01 dl380g5d pengine: [4297]: info: determine_online_status: Node 
 dl380g5d is online
 Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: unpack_rsc_op: Operation 
 dummy:0_monitor_0 found resource dummy:0 active in master mode on dl380g5c
 Jan 10 18:27:01 dl380g5d pengine: [4297]: WARN: unpack_rsc_op: Processing 
 failed op dummy:0_demote_0 on dl380g5c: unknown error (1)
 Jan 10 18:27:01 dl380g5d pengine: [4297]: WARN: unpack_rsc_op: Forcing 
 dummy:0 to stop after a failed demote action
 Jan 10 18:27:01 dl380g5d pengine: [4297]: info: native_add_running: resource 
 dummy:0 isnt managed
 Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: clone_print:  Master/Slave 
 Set: stateful
 Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: native_print:      dummy:0  
 (ocf::pacemaker:Stateful):  Slave dl380g5c (unmanaged) FAILED
 Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: short_print:      Slaves: [ 
 dl380g5d ]
 Jan 10 18:27:01 dl380g5d pengine: [4297]: info: get_failcount: stateful has 
 failed 1 times on dl380g5c
 Jan 10 18:27:01 dl380g5d pengine: [4297]: WARN: common_apply_stickiness: 
 Forcing stateful away from dl380g5c after 1 failures (max=1)
 Jan 10 18:27:01 dl380g5d pengine: [4297]: info: get_failcount: stateful has 
 failed 1 times on dl380g5c
 Jan 10 18:27:01 dl380g5d pengine: [4297]: WARN: common_apply_stickiness: 
 Forcing stateful away from dl380g5c after 1 failures (max=1)
 Jan 10 18:27:01 dl380g5d pengine: [4297]: info: native_color: Unmanaged 
 resource dummy:0 allocated to 

[Pacemaker] [Problem][crmsh]The designation of the 'ordered' attribute becomes the error.

2013-03-05 Thread renayama19661014
Hi Dejan,
Hi Andrew,

In the crm shell, the check of meta attributes was revised by the following 
patch.

 * http://hg.savannah.gnu.org/hgweb/crmsh/rev/d1174f42f4b3

This patch was backported to Pacemaker1.0.13.

 * 
https://github.com/ClusterLabs/pacemaker-1.0/commit/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc#shell/modules/cibconfig.py

However, the ordered/colocated attributes of a group resource are treated as an 
error when I use a crm shell that includes this patch.

--
(snip)
### Group Configuration ###
group master-group \
vip-master \
vip-rep \
meta \
ordered=false
(snip)

[root@rh63-heartbeat1 ~]# crm configure load update test2339.crm 
INFO: building help index
crm_verify[20028]: 2013/03/06_17:57:18 WARN: unpack_nodes: Blind faith: not 
fencing unseen nodes
WARNING: vip-master: specified timeout 60s for start is smaller than the 
advised 90
WARNING: vip-master: specified timeout 60s for stop is smaller than the advised 
100
WARNING: vip-rep: specified timeout 60s for start is smaller than the advised 90
WARNING: vip-rep: specified timeout 60s for stop is smaller than the advised 100
ERROR: master-group: attribute ordered does not exist  - WHY?
Do you still want to commit? y
--

If I answer `yes` to the confirmation message, the configuration is applied, but it 
is a problem that the error message is displayed at all.
 * The same error occurs when I specify the colocated attribute.
And I noticed that there is no explanation of the group resource's 
ordered/colocated attributes in the Pacemaker online help.

I think that specifying the ordered/colocated attributes should not be an error 
for a group resource.
In addition, I think that ordered/colocated should be added to the online help.
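For reference, the group above corresponds to cib XML of roughly this shape (the meta_attributes ids here are hypothetical), and it is this nvpair that the shell's meta-attribute check rejects:

 <group id="master-group">
   <meta_attributes id="master-group-meta">
     <nvpair id="master-group-meta-ordered" name="ordered" value="false"/>
   </meta_attributes>
   <!-- vip-master and vip-rep primitive definitions follow here -->
 </group>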

Best Regards,
Hideo Yamauchi.




Re: [Pacemaker] [Problem][crmsh]The designation of the 'ordered' attribute becomes the error.

2013-03-06 Thread renayama19661014
Hi Dejan,

The problem was solved by your patch.

However, I have a question.
I want to use the resource_set that Mr. Andrew proposed, but I do not understand 
how to use it with the crm shell.

I loaded the following two cib.xml snippets and checked them with the crm shell.

Case 1) sequential=false. 
(snip)
<constraints>
  <rsc_order id="test-order">
    <resource_set sequential="false" id="test-order-resource_set">
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
  </rsc_order>
</constraints>
(snip)
 * When I confirm it with crm shell ...
(snip)
group master-group vip-master vip-rep
order test-order : _rsc_set_ ( vip-master vip-rep )
(snip)

Case 2) sequential=true
(snip)
<constraints>
  <rsc_order id="test-order">
    <resource_set sequential="true" id="test-order-resource_set">
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
  </rsc_order>
</constraints>
(snip)
 * When I confirm it with crm shell ...
(snip)
   group master-group vip-master vip-rep
   xml <rsc_order id="test-order"> \
         <resource_set id="test-order-resource_set" sequential="true"> \
           <resource_ref id="vip-master"/> \
           <resource_ref id="vip-rep"/> \
         </resource_set> \
       </rsc_order>
(snip)

Does specifying sequential=true mean that the constraint has to be described in xml?
Is there a proper way to set an attribute of a resource_set with the crm shell?
Or is resource_set perhaps not usable with the crm shell of Pacemaker1.0.13?

Best Regards,
Hideo Yamauchi.

--- On Thu, 2013/3/7, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp 
wrote:

 Hi Dejan,
 Hi Andrew,
 
 Thank you for comment.
 I confirm the movement of the patch and report it.
 
 Best Regards,
 Hideo Yamauchi.
 
 --- On Wed, 2013/3/6, Dejan Muhamedagic deja...@fastmail.fm wrote:
 
  Hi Hideo-san,
  
  On Wed, Mar 06, 2013 at 10:37:44AM +0900, renayama19661...@ybb.ne.jp wrote:
   Hi Dejan,
   Hi Andrew,
   
   As for the crm shell, the check of the meta attribute was revised with 
   the next patch.
   
    * http://hg.savannah.gnu.org/hgweb/crmsh/rev/d1174f42f4b3
   
   This patch was backported in Pacemaker1.0.13.
   
    * 
  https://github.com/ClusterLabs/pacemaker-1.0/commit/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc#shell/modules/cibconfig.py
   
   However, the ordered,colocated attribute of the group resource is treated 
   as an error when I use crm Shell which adopted this patch.
   
   --
   (snip)
   ### Group Configuration ###
   group master-group \
           vip-master \
           vip-rep \
           meta \
                   ordered=false
   (snip)
   
   [root@rh63-heartbeat1 ~]# crm configure load update test2339.crm 
   INFO: building help index
   crm_verify[20028]: 2013/03/06_17:57:18 WARN: unpack_nodes: Blind faith: 
   not fencing unseen nodes
   WARNING: vip-master: specified timeout 60s for start is smaller than the 
   advised 90
   WARNING: vip-master: specified timeout 60s for stop is smaller than the 
   advised 100
   WARNING: vip-rep: specified timeout 60s for start is smaller than the 
   advised 90
   WARNING: vip-rep: specified timeout 60s for stop is smaller than the 
   advised 100
   ERROR: master-group: attribute ordered does not exist  - WHY?
   Do you still want to commit? y
   --
   
   If it chooses `yes` by a confirmation message, it is reflected, but it is 
   a problem that error message is displayed.
    * The error occurs in the same way when I appoint colocated attribute.
   And I noticed that there is no explanation of ordered/colocated of 
   the group resource in the Pacemaker online help.
   
   I think that the designation of the ordered,colocated attribute should 
   not become the error in group resource.
   In addition, I think that ordered,colocated should be added to online 
   help.
  
  These attributes are not listed in crmsh. Does the attached patch
  help?
  
  Thanks,
  
  Dejan
   
   Best Regards,
   Hideo Yamauchi.
   
   

[Pacemaker] [Question]About sequential designation of resource_set.

2013-03-06 Thread renayama19661014
Hi Andrew,

I tried the sequential designation of resource_set.
 *  http://www.gossamer-threads.com/lists/linuxha/pacemaker/84578

I caused an error in the start of the vip-master resource and observed the behavior.

(snip)
  <group id="master-group">
    <primitive class="ocf" id="vip-master" provider="pacemaker" type="Dummy2">
      <operations>
        <op id="vip-master-start-0s" interval="0s" name="start" on-fail="restart" timeout="60s"/>
        <op id="vip-master-monitor-10s" interval="10s" name="monitor" on-fail="restart" timeout="60s"/>
        <op id="vip-master-stop-0s" interval="0s" name="stop" on-fail="block" timeout="60s"/>
      </operations>
    </primitive>
    <primitive class="ocf" id="vip-rep" provider="pacemaker" type="Dummy">
      <operations>
        <op id="vip-rep-start-0s" interval="0s" name="start" on-fail="stop" timeout="60s"/>
        <op id="vip-rep-monitor-10s" interval="10s" name="monitor" on-fail="restart" timeout="60s"/>
        <op id="vip-rep-stop-0s" interval="0s" name="stop" on-fail="block" timeout="60s"/>
      </operations>
    </primitive>
  </group>
(snip)

With the ordered designation on the group resource, the difference I expected 
appeared (Case 1 and Case 2).
However, with the sequential designation, the difference I expected did not 
appear (Case 3 and Case 4).

(snip)
<constraints>
  <rsc_order id="test-order">
    <resource_set sequential="true" id="test-order-resource_set">  <!-- or "false" -->
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
  </rsc_order>
</constraints>
(snip)


Case 1) group meta_attribute ordered=false 
 * The start of vip-rep is issued without waiting for the start of vip-master.

[root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
Mar  7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 2: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar  7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local)
Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 6: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar  7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar  7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 5: start vip-master_start_0 on rh63-heartbeat1
Mar  7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 7: start vip-rep_start_0 on rh63-heartbeat1 
Mar  7 19:41:26 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 8: monitor vip-rep_monitor_1 on rh63-heartbeat1
Mar  7 19:41:27 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 2: stop vip-master_stop_0 on rh63-heartbeat1
Mar  7 19:41:28 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 6: stop vip-rep_stop_0 on rh63-heartbeat1


Case 2) group meta_attribute ordered=true
 * The start of vip-rep waits for the start of vip-master before being issued.

[root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
Mar  7 19:34:37 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating 
action 2: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar  7 19:34:37 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar  7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating 
action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
Mar  7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating 
action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local)
Mar  7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating 
action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
Mar  7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating 
action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
Mar  7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating 
action 6: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar  7 19:35:43 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on 

Re: [Pacemaker] [Question]About sequential designation of resource_set.

2013-03-06 Thread renayama19661014
Hi Andrew,

Thank you for comment.

It was colocation.
I will make the modifications and confirm the behavior.

Many Thanks!
Hideo Yamauchi.

--- On Thu, 2013/3/7, Andrew Beekhof and...@beekhof.net wrote:

 Oh!
 
 You use the resource sets _instead_ of a group.
 If you want group.ordered=false, then use a colocation set (with
 sequential=true).
 If you want group.colocated=false, then use an ordering set (with
 sequential=true).
 
 Hope that helps :)
 
 On Thu, Mar 7, 2013 at 3:16 PM,  renayama19661...@ybb.ne.jp wrote:
  Hi Andrew,
 
  Thank you for comments.
 
   Case 3) group resource_set sequential=false
    * Start of vip-rep waits for start of vip-master and is published.
    * I expected the same result as the first case.
 
  Me too. Have you got the relevant PE file?
 
  I attached the data just collected with hb_report.
 
  Best Regards,
  Hideo Yamauchi.
 
 
 
  --- On Thu, 2013/3/7, Andrew Beekhof and...@beekhof.net wrote:
 
  On Thu, Mar 7, 2013 at 1:27 PM,  renayama19661...@ybb.ne.jp wrote:
   Hi Andrew,
  
   I tried resource_set  sequential designation.
    *  http://www.gossamer-threads.com/lists/linuxha/pacemaker/84578
  
   I caused an error in start of the vip-master resource and confirmed 
   movement.
  
   (snip)
      <group id="master-group">
        <primitive class="ocf" id="vip-master" provider="pacemaker" type="Dummy2">
          <operations>
            <op id="vip-master-start-0s" interval="0s" name="start" on-fail="restart" timeout="60s"/>
            <op id="vip-master-monitor-10s" interval="10s" name="monitor" on-fail="restart" timeout="60s"/>
            <op id="vip-master-stop-0s" interval="0s" name="stop" on-fail="block" timeout="60s"/>
          </operations>
        </primitive>
        <primitive class="ocf" id="vip-rep" provider="pacemaker" type="Dummy">
          <operations>
            <op id="vip-rep-start-0s" interval="0s" name="start" on-fail="stop" timeout="60s"/>
            <op id="vip-rep-monitor-10s" interval="10s" name="monitor" on-fail="restart" timeout="60s"/>
            <op id="vip-rep-stop-0s" interval="0s" name="stop" on-fail="block" timeout="60s"/>
          </operations>
        </primitive>
      </group>
   (snip)
  
   By the ordered designation of the group resource, the difference that I 
   expected appeared.( Case 1 and Case 2)
   However, by the sequential designation, the difference that I expected 
   did not appear.(Case 3 and Case 4)
  
   (snip)
    <constraints>
      <rsc_order id="test-order">
        <resource_set sequential="true" id="test-order-resource_set">  <!-- or "false" -->
          <resource_ref id="vip-master"/>
          <resource_ref id="vip-rep"/>
        </resource_set>
      </rsc_order>
    </constraints>
   (snip)
  
  
   Case 1) group meta_attribute ordered=false
    * Start of vip-rep is published without waiting for start of vip-master.
  
    [root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
   Mar  7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
   Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - 
   no waiting
   Mar  7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
   Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 
   (local) - no waiting
   Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
   Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
   Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
   Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 
   (local)
   Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
   Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
   Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
   Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
   Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
   Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 
   (local) - no waiting
   Mar  7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
   Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - 
   no waiting
   Mar  7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
   Initiating action 5: start vip-master_start_0 on rh63-heartbeat1
   Mar  7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
   Initiating action 7: start vip-rep_start_0 on rh63-heartbeat1
   Mar  7 19:41:26 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
   Initiating action 8: monitor vip-rep_monitor_1 on rh63-heartbeat1
   Mar  7 19:41:27 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
   Initiating action 2: stop vip-master_stop_0 on rh63-heartbeat1
   Mar  7 19:41:28 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
   Initiating action 6: stop vip-rep_stop_0 on rh63-heartbeat1
  
  
   Case 2) group meta_attribute ordered=true
    * Start of vip-rep waits 

Re: [Pacemaker] [Question]About sequential designation of resource_set.

2013-03-06 Thread renayama19661014
Hi Andrew,

  You use the resource sets _instead_ of a group.
  If you want group.ordered=false, then use a colocation set (with
  sequential=true).

I used a resource_set in colocation.
However, the result did not change.

Is this result due to a mistake in my settings?
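One reading of Andrew's earlier advice (an assumption on my part, not confirmed in this thread) is that the set should replace the group entirely: the master-group definition would be removed, the two primitives kept standalone, and only the set-based constraint retained, e.g.:

 <rsc_colocation id="test-colocation" score="INFINITY">
   <resource_set sequential="true" id="test-colocation-resource_set">
     <resource_ref id="vip-master"/>
     <resource_ref id="vip-rep"/>
   </resource_set>
 </rsc_colocation>
 <!-- same set as tested below, but with the master-group definition removed -->

With the group still defined, the group's own internal ordering and colocation continue to apply, which could explain why no difference is visible.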

Case 1) sequential=false
(snip)
<constraints>
  <rsc_colocation id="test-colocation" score="INFINITY">
    <resource_set sequential="false" id="test-colocation-resource_set">
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
  </rsc_colocation>
</constraints>
(snip)
[root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
Mar  8 00:20:52 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 2: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar  8 00:20:52 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local)
Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 6: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar  8 00:20:56 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar  8 00:20:56 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 5: start vip-master_start_0 on rh63-heartbeat1
Mar  8 00:20:58 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 1: stop vip-master_stop_0 on rh63-heartbeat1


Case 2) sequential=true
(snip)
<constraints>
  <rsc_colocation id="test-colocation" score="INFINITY">
    <resource_set sequential="true" id="test-colocation-resource_set">
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
  </rsc_colocation>
</constraints>
(snip)
[root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
Mar  7 23:54:44 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 2: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar  7 23:54:44 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local)
Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 6: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar  7 23:54:49 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar  7 23:54:49 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 5: start vip-master_start_0 on rh63-heartbeat1
Mar  7 23:54:51 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 1: stop vip-master_stop_0 on rh63-heartbeat1


Best Regards,
Hideo Yamauchi.




Re: [Pacemaker] [Problem][crmsh]The designation of the 'ordered' attribute becomes the error.

2013-03-10 Thread renayama19661014
Hi Dejan,

Thank you for comment.

 sequential=true is the default. In that case it's not possible to
 have an unequivocal representation for the same construct and, in
 this particular case, the conversion XML->CLI->XML yields a
 different XML. There's a later commit which helps here, I think
 that it should be possible to backport it to 1.0:
 
 changeset:   789:916d1b15edc3
 user:        Dejan Muhamedagic de...@hello-penguin.com
 date:        Thu Aug 16 17:01:24 2012 +0200
 summary:     Medium: cibconfig: drop attributes set to default on cib import

I will apply the backport that you mentioned and confirm the behavior.
I will talk with you again if I have a problem.

  Is there a right method to appoint an attribute of resource_set with crm 
  shell?
  Possibly is not resource_set usable with crm shell of Pacemaker1.0.13?
 
 Should work. It's just that using it with two resources, well,
 it's sort of an unusual use case.

All right!

Many Thanks!
Hideo Yamauchi.

--- On Fri, 2013/3/8, Dejan Muhamedagic deja...@fastmail.fm wrote:

 Hi Hideo-san,
 
 On Thu, Mar 07, 2013 at 10:18:09AM +0900, renayama19661...@ybb.ne.jp wrote:
  Hi Dejan,
  
  The problem was settled with your patch.
  
  However, I have a question.
  I want to use resource_set which Mr. Andrew proposed, but do not 
  understand a method to use with crm shell.
  
  I read two next cib.xml and confirmed it with crm shell.
  
  Case 1) sequential=false. 
  (snip)
     <constraints>
       <rsc_order id="test-order">
         <resource_set sequential="false" id="test-order-resource_set">
           <resource_ref id="vip-master"/>
           <resource_ref id="vip-rep"/>
         </resource_set>
       </rsc_order>
     </constraints>
  (snip)
   * When I confirm it with crm shell ...
  (snip)
      group master-group vip-master vip-rep
      order test-order : _rsc_set_ ( vip-master vip-rep )
  (snip)
 
 Yes. All size two resource sets get the _rsc_set_ keyword,
 otherwise it's not possible to distinguish them from normal
 constraints. Resource sets are supposed to help cases when it is
 necessary to express relation between three or more resources.
 Perhaps this case should be an exception.
 
  Case 2) sequential=true
  (snip)
     <constraints>
       <rsc_order id="test-order">
         <resource_set sequential="true" id="test-order-resource_set">
           <resource_ref id="vip-master"/>
           <resource_ref id="vip-rep"/>
         </resource_set>
       </rsc_order>
     </constraints>
  (snip)
   * When I confirm it with crm shell ...
  (snip)
    group master-group vip-master vip-rep
    xml <rsc_order id="test-order"> \
         <resource_set id="test-order-resource_set" sequential="true"> \
           <resource_ref id="vip-master"/> \
           <resource_ref id="vip-rep"/> \
         </resource_set> \
    </rsc_order>
  (snip)
  
  Does the designation of sequential=true have to describe it in xml?
 
 sequential=true is the default. In that case it's not possible to
 have an unequivocal representation for the same construct and, in
 this particular case, the conversion XML->CLI->XML yields a
 different XML. There's a later commit which helps here, I think
 that it should be possible to backport it to 1.0:
 
 changeset:   789:916d1b15edc3
 user:        Dejan Muhamedagic de...@hello-penguin.com
 date:        Thu Aug 16 17:01:24 2012 +0200
 summary:     Medium: cibconfig: drop attributes set to default on cib import
 
  Is there a right method to appoint an attribute of resource_set with crm 
  shell?
  Possibly is not resource_set usable with crm shell of Pacemaker1.0.13?
 
 Should work. It's just that using it with two resources, well,
 it's sort of unusual use case.
 
 Cheers,
 
 Dejan
 
  Best Regards,
  Hideo Yamauchi.
  
  --- On Thu, 2013/3/7, renayama19661...@ybb.ne.jp 
  renayama19661...@ybb.ne.jp wrote:
  
   Hi Dejan,
   Hi Andrew,
   
   Thank you for comment.
   I confirm the movement of the patch and report it.
   
   Best Regards,
   Hideo Yamauchi.
   
   --- On Wed, 2013/3/6, Dejan Muhamedagic deja...@fastmail.fm wrote:
   
Hi Hideo-san,

On Wed, Mar 06, 2013 at 10:37:44AM +0900, renayama19661...@ybb.ne.jp 
wrote:
 Hi Dejan,
 Hi Andrew,
 
 As for the crm shell, the check of the meta attribute was revised 
 with the next patch.
 
  * http://hg.savannah.gnu.org/hgweb/crmsh/rev/d1174f42f4b3
 
 This patch was backported in Pacemaker1.0.13.
 
  * 
https://github.com/ClusterLabs/pacemaker-1.0/commit/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc#shell/modules/cibconfig.py
 
 However, the ordered,colocated attribute of the group resource is 
 treated as an error when I use crm Shell which adopted this patch.
 
 --
 (snip)
 ### Group Configuration ###
 group master-group \
         vip-master \
         vip-rep \
         meta \
                 ordered=false
 (snip)
 
 

Re: [Pacemaker] [Question]About sequential designation of resource_set.

2013-03-13 Thread renayama19661014
Hi Andrew,

 I used a resource_set in colocation.
 However, the result did not change.

Please give me your comments about the result that I tried.

Best Regards,
Hideo Yamauchi.

--- On Thu, 2013/3/7, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp 
wrote:

 Hi Andrew,
 
   You use the resource sets _instead_ of a group.
   If you want group.ordered=false, then use a colocation set (with
   sequential=true).
 
 In colocation, I used resource_set.
 However, a result did not include the change.
 
 Will this result be a mistake of my setting?
 
 Case 1) sequential=false
 (snip)
     <constraints>
       <rsc_colocation id="test-colocation" score="INFINITY">
         <resource_set sequential="false" id="test-colocation-resource_set">
           <resource_ref id="vip-master"/>
           <resource_ref id="vip-rep"/>
         </resource_set>
       </rsc_colocation>
     </constraints>
 (snip)
 [root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
 Mar  8 00:20:52 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no 
 waiting
 Mar  8 00:20:52 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 (local) 
 - no waiting
 Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
 Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local)
 Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
 Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
 Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 (local) 
 - no waiting
 Mar  8 00:20:56 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - no 
 waiting
 Mar  8 00:20:56 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 5: start vip-master_start_0 on rh63-heartbeat1
 Mar  8 00:20:58 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 1: stop vip-master_stop_0 on rh63-heartbeat1
 
 
 Case 2) sequential=true
 (snip)
     <constraints>
       <rsc_colocation id="test-colocation" score="INFINITY">
         <resource_set sequential="true" id="test-colocation-resource_set">
           <resource_ref id="vip-master"/>
           <resource_ref id="vip-rep"/>
         </resource_set>
       </rsc_colocation>
     </constraints>
 (snip)
 [root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
 Mar  7 23:54:44 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
 Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no 
 waiting
 Mar  7 23:54:44 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
 Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 (local) 
 - no waiting
 Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
 Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
 Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
 Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local)
 Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
 Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
 Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
 Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
 Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
 Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 (local) 
 - no waiting
 Mar  7 23:54:49 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
 Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - no 
 waiting
 Mar  7 23:54:49 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
 Initiating action 5: start vip-master_start_0 on rh63-heartbeat1
 Mar  7 23:54:51 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
 Initiating action 1: stop vip-master_stop_0 on rh63-heartbeat1
 
 
 Best Regards,
 Hideo Yamauchi.
 
 


Re: [Pacemaker] [Problem][crmsh]The designation of the 'ordered' attribute becomes the error.

2013-03-18 Thread renayama19661014
Hi Dejan,

  changeset:   789:916d1b15edc3
  user:        Dejan Muhamedagic de...@hello-penguin.com
  date:        Thu Aug 16 17:01:24 2012 +0200
  summary: Medium: cibconfig: drop attributes set to default on cib import

I confirmed that, with the modification you suggested applied, the constraint is 
set properly without falling back to raw xml.

* When I set true in the cib.xml file (sequential=true):
(snip)
<constraints>
  <rsc_order id="test-order">
    <resource_set sequential="true" id="test-order-resource_set">
      <resource_ref id="Dummy01"/>
      <resource_ref id="Dummy02"/>
    </resource_set>
  </rsc_order>
</constraints>
(snip)

[root@rh64-heartbeat1 ~]# crm 
crm(live)# configure
crm(live)configure# show
(snip)
group testGroup01 Dummy01 Dummy02
order test-order : _rsc_set_ Dummy01 Dummy02
(snip)

Many Thanks!
Hideo Yamauchi.


--- On Mon, 2013/3/11, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp 
wrote:

 Hi Dejan,
 
 Thank you for comment.
 
  sequential=true is the default. In that case it's not possible to
  have an unequivocal representation for the same construct and, in
   this particular case, the conversion XML->CLI->XML yields a
  different XML. There's a later commit which helps here, I think
  that it should be possible to backport it to 1.0:
  
  changeset:   789:916d1b15edc3
  user:        Dejan Muhamedagic de...@hello-penguin.com
  date:        Thu Aug 16 17:01:24 2012 +0200
  summary:     Medium: cibconfig: drop attributes set to default on cib import
 
 I apply the backporting that you taught and confirm movement.
 I talk with you again if I have a problem.
 
   Is there a right method to appoint an attribute of resource_set with 
   crm shell?
   Possibly is not resource_set usable with crm shell of Pacemaker1.0.13?
  
  Should work. It's just that using it with two resources, well,
  it's sort of unusual use case.
 
 All right!
 
 Many Thanks!
 Hideo Yamauchi.
 
 --- On Fri, 2013/3/8, Dejan Muhamedagic deja...@fastmail.fm wrote:
 
  Hi Hideo-san,
  
  On Thu, Mar 07, 2013 at 10:18:09AM +0900, renayama19661...@ybb.ne.jp wrote:
   Hi Dejan,
   
   The problem was settled with your patch.
   
   However, I have a question.
   I want to use resource_set which Mr. Andrew proposed, but do not 
   understand a method to use with crm shell.
   
   I read two next cib.xml and confirmed it with crm shell.
   
   Case 1) sequential=false. 
   (snip)
    <constraints>
      <rsc_order id="test-order">
        <resource_set sequential="false" id="test-order-resource_set">
          <resource_ref id="vip-master"/>
          <resource_ref id="vip-rep"/>
        </resource_set>
      </rsc_order>
    </constraints>
   (snip)
    * When I confirm it with crm shell ...
   (snip)
       group master-group vip-master vip-rep
       order test-order : _rsc_set_ ( vip-master vip-rep )
   (snip)
  
  Yes. All size two resource sets get the _rsc_set_ keyword,
  otherwise it's not possible to distinguish them from normal
  constraints. Resource sets are supposed to help cases when it is
  necessary to express relation between three or more resources.
  Perhaps this case should be an exception.
  
   Case 2) sequential=true
   (snip)
    <constraints>
      <rsc_order id="test-order">
        <resource_set sequential="true" id="test-order-resource_set">
          <resource_ref id="vip-master"/>
          <resource_ref id="vip-rep"/>
        </resource_set>
      </rsc_order>
    </constraints>
   (snip)
    * When I confirm it with crm shell ...
   (snip)
    group master-group vip-master vip-rep
    xml <rsc_order id="test-order"> \
         <resource_set id="test-order-resource_set" sequential="true"> \
           <resource_ref id="vip-master"/> \
           <resource_ref id="vip-rep"/> \
         </resource_set> \
    </rsc_order>
   (snip)
   
   Does the designation of sequential=true have to describe it in xml?
  
  sequential=true is the default. In that case it's not possible to
  have an unequivocal representation for the same construct and, in
  this particular case, the conversion XML->CLI->XML yields a
  different XML. There's a later commit which helps here, I think
  that it should be possible to backport it to 1.0:
  
  changeset:   789:916d1b15edc3
  user:        Dejan Muhamedagic de...@hello-penguin.com
  date:        Thu Aug 16 17:01:24 2012 +0200
  summary:     Medium: cibconfig: drop attributes set to default on cib import
  
   Is there a right method to appoint an attribute of resource_set with 
   crm shell?
   Possibly is not resource_set usable with crm shell of Pacemaker1.0.13?
  
  Should work. It's just that using it with two resources, well,
  it's sort of unusual use case.
  
  Cheers,
  
  Dejan
  
   Best Regards,
   Hideo Yamauchi.
   
   --- On Thu, 2013/3/7, renayama19661...@ybb.ne.jp 
   renayama19661...@ybb.ne.jp wrote:
   
Hi Dejan,
Hi Andrew,

Thank you for 

Re: [Pacemaker] [Question]About sequential designation of resource_set.

2013-03-21 Thread renayama19661014
Hi Andrew,

I registered this question with Bugzilla.

 * http://bugs.clusterlabs.org/show_bug.cgi?id=5147

Best Regards,
Hideo Yamauchi.

--- On Thu, 2013/3/14, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp 
wrote:

 Hi Andrew,
 
  In colocation, I used resource_set.
  However, a result did not include the change.
 
 Please, about the result that I tried, give me comment.
 
 Best Regards,
 Hideo Yamauchi.
 
 --- On Thu, 2013/3/7, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp 
 wrote:
 
  Hi Andrew,
  
You use the resource sets _instead_ of a group.
If you want group.ordered=false, then use a colocation set (with
sequential=true).
  
  In colocation, I used resource_set.
  However, a result did not include the change.
  
  Will this result be a mistake of my setting?
  
  Case 1) sequential=false
  (snip)
      <constraints>
        <rsc_colocation id="test-colocation" score="INFINITY">
          <resource_set sequential="false" id="test-colocation-resource_set">
            <resource_ref id="vip-master"/>
            <resource_ref id="vip-rep"/>
          </resource_set>
        </rsc_colocation>
      </constraints>
  (snip)
  [root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
  Mar  8 00:20:52 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no 
  waiting
  Mar  8 00:20:52 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 
  (local) - no waiting
  Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
  Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local)
  Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
  Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
  Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 
  (local) - no waiting
  Mar  8 00:20:56 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - no 
  waiting
  Mar  8 00:20:56 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 5: start vip-master_start_0 on rh63-heartbeat1
  Mar  8 00:20:58 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 1: stop vip-master_stop_0 on rh63-heartbeat1
  
  
  Case 2) sequential=true
  (snip)
      <constraints>
        <rsc_colocation id="test-colocation" score="INFINITY">
          <resource_set sequential="true" id="test-colocation-resource_set">
            <resource_ref id="vip-master"/>
            <resource_ref id="vip-rep"/>
          </resource_set>
        </rsc_colocation>
      </constraints>
  (snip)
  [root@rh63-heartbeat2 ~]# grep Initiating action /var/log/ha-log
  Mar  7 23:54:44 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
  Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no 
  waiting
  Mar  7 23:54:44 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
  Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 
  (local) - no waiting
  Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
  Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
  Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
  Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local)
  Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
  Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
  Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
  Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
  Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
  Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 
  (local) - no waiting
  Mar  7 23:54:49 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
  Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - no 
  waiting
  Mar  7 23:54:49 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
  Initiating action 5: start vip-master_start_0 on rh63-heartbeat1
  Mar  7 23:54:51 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
  Initiating action 1: stop vip-master_stop_0 on rh63-heartbeat1
  
  
  Best Regards,
  Hideo Yamauchi.
  
  

Re: [Pacemaker] [Question]About sequential designation of resource_set.

2013-03-21 Thread renayama19661014
Hi Andrew,

Thank you for your comments.
 
   You use the resource sets _instead_ of a group.
   If you want group.ordered=false, then use a colocation set (with
   sequential=true).
 
 In the colocation, I used a resource_set.
 However, the result did not change.
 Is this result due to a mistake in my settings?
 
 Case 1) sequential=false
 (snip)
 <constraints>
   <rsc_colocation id="test-colocation" score="INFINITY">
     <resource_set sequential="false" id="test-colocation-resource_set">
       <resource_ref id="vip-master"/>
       <resource_ref id="vip-rep"/>
     </resource_set>
   </rsc_colocation>
 </constraints>
 
 What are you trying to achieve with this?  It doesn't do anything because there 
 is nothing to collocate master or rep with.
 The only value here is to show that rep would not be stopped when master is. 

However, you previously gave the following reply, so I used a colocation
set as a substitute for ordered=false.

You use the resource sets _instead_ of a group. 
If you want group.ordered=false, then use a colocation set (with 
sequential=true). 
If you want group.colocated=false, then use an ordering set (with 
sequential=true). 

So, after all, is an ordering set the right substitute for ordered=false
on a group?

Best Regards,
Hideo Yamauchi.



--- On Fri, 2013/3/22, Andrew Beekhof and...@beekhof.net wrote:

 
 
 On Thursday, March 7, 2013,   wrote:
 Hi Andrew,
 
   You use the resource sets _instead_ of a group.
   If you want group.ordered=false, then use a colocation set (with
   sequential=true).
 
  In the colocation, I used a resource_set.
  However, the result did not change.
  Is this result due to a mistake in my settings?
 
 Case 1) sequential=false
 (snip)
      <constraints>
        <rsc_colocation id="test-colocation" score="INFINITY">
          <resource_set sequential="false" id="test-colocation-resource_set">
            <resource_ref id="vip-master"/>
            <resource_ref id="vip-rep"/>
          </resource_set>
        </rsc_colocation>
      </constraints>
 
  What are you trying to achieve with this?  It doesn't do anything because there 
 is nothing to collocate master or rep with.
 The only value here is to show that rep would not be stopped when master is. 
   (snip)
 [root@rh63-heartbeat2 ~]# grep Initiating action /var/log/ha-log
 Mar  8 00:20:52 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no 
 waiting
 Mar  8 00:20:52 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 (local) 
 - no waiting
 Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
 Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local)
 Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
 Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
 Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 (local) 
 - no waiting
 Mar  8 00:20:56 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - no 
 waiting
 Mar  8 00:20:56 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 5: start vip-master_start_0 on rh63-heartbeat1
 Mar  8 00:20:58 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
 Initiating action 1: stop vip-master_stop_0 on rh63-heartbeat1
 
 
 Case 2) sequential=true
 (snip)
      <constraints>
        <rsc_colocation id="test-colocation" score="INFINITY">
          <resource_set sequential="true" id="test-colocation-resource_set">
            <resource_ref id="vip-master"/>
            <resource_ref id="vip-rep"/>
          </resource_set>
        </rsc_colocation>
      </constraints>
 (snip)
 [root@rh63-heartbeat2 ~]# grep Initiating action /var/log/ha-log
 Mar  7 23:54:44 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
 Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no 
 waiting
 Mar  7 23:54:44 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
 Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 (local) 
 - no waiting
 Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
 Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
 Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
 Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local)
 Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
 Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
 Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
 Initiating action 8: monitor 

Re: [Pacemaker] [Question]About sequential designation of resource_set.

2013-03-21 Thread renayama19661014
Hi Andrew,

Thank you for your comment.

 Sorry, I'm not sure I understand the question.

Sorry

When we use a resource_set as a substitute for the ordered attribute of a
group resource, do we use a colocation set?
Or do we use an ordering set?

If an ordering set is the right choice, it does not seem to work the same
way as group ordered=false.

Best Regards,
Hideo Yamauchi.

--- On Fri, 2013/3/22, Andrew Beekhof and...@beekhof.net wrote:

 On Fri, Mar 22, 2013 at 12:34 PM,  renayama19661...@ybb.ne.jp wrote:
  Hi Andrew,
 
   Thank you for your comments.
 
You use the resource sets _instead_ of a group.
If you want group.ordered=false, then use a colocation set (with
sequential=true).
 
   In the colocation, I used a resource_set.
   However, the result did not change.
   Is this result due to a mistake in my settings?
 
  Case 1) sequential=false
  (snip)
       <constraints>
         <rsc_colocation id="test-colocation" score="INFINITY">
           <resource_set sequential="false" id="test-colocation-resource_set">
             <resource_ref id="vip-master"/>
             <resource_ref id="vip-rep"/>
           </resource_set>
         </rsc_colocation>
       </constraints>
 
   What are you trying to achieve with this?  It doesn't do anything because 
  there is nothing to collocate master or rep with.
  The only value here is to show that rep would not be stopped when master 
  is.
 
   However, you previously gave the following reply, so I used a colocation
   set as a substitute for ordered=false.
 
 You use the resource sets _instead_ of a group.
 If you want group.ordered=false, then use a colocation set (with
 sequential=true).
 If you want group.colocated=false, then use an ordering set (with
 sequential=true).
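 
 (As an illustration - a hedged sketch using two generic resources rscA
 and rscB, which are not from this thread - the two substitutions would
 look like:
 
   <!-- instead of group ordered=false: colocation set only, no rsc_order -->
   <rsc_colocation id="col-AB" score="INFINITY">
     <resource_set id="col-AB-set" sequential="true">
       <resource_ref id="rscA"/>
       <resource_ref id="rscB"/>
     </resource_set>
   </rsc_colocation>
 
   <!-- instead of group colocated=false: ordering set only, no rsc_colocation -->
   <rsc_order id="ord-AB">
     <resource_set id="ord-AB-set" sequential="true">
       <resource_ref id="rscA"/>
       <resource_ref id="rscB"/>
     </resource_set>
   </rsc_order>
 )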
 
   So, after all, is an ordering set the right substitute for ordered=false
   on a group?
 
 Sorry, I'm not sure I understand the question.
 
 
  Best Regards,
  Hideo Yamauchi.
 
 
 
  --- On Fri, 2013/3/22, Andrew Beekhof and...@beekhof.net wrote:
 
 
 
  On Thursday, March 7, 2013,   wrote:
  Hi Andrew,
 
You use the resource sets _instead_ of a group.
If you want group.ordered=false, then use a colocation set (with
sequential=true).
 
   In the colocation, I used a resource_set.
   However, the result did not change.
   Is this result due to a mistake in my settings?
 
  Case 1) sequential=false
  (snip)
       <constraints>
         <rsc_colocation id="test-colocation" score="INFINITY">
           <resource_set sequential="false" id="test-colocation-resource_set">
             <resource_ref id="vip-master"/>
             <resource_ref id="vip-rep"/>
           </resource_set>
         </rsc_colocation>
       </constraints>
 
   What are you trying to achieve with this?  It doesn't do anything because 
  there is nothing to collocate master or rep with.
  The only value here is to show that rep would not be stopped when master 
  is.
    (snip)
  [root@rh63-heartbeat2 ~]# grep Initiating action /var/log/ha-log
  Mar  8 00:20:52 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no 
  waiting
  Mar  8 00:20:52 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 
  (local) - no waiting
  Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
  Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 
  (local)
  Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
  Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
  Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 
  (local) - no waiting
  Mar  8 00:20:56 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - no 
  waiting
  Mar  8 00:20:56 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 5: start vip-master_start_0 on rh63-heartbeat1
  Mar  8 00:20:58 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: 
  Initiating action 1: stop vip-master_stop_0 on rh63-heartbeat1
 
 
  Case 2) sequential=true
  (snip)
       <constraints>
         <rsc_colocation id="test-colocation" score="INFINITY">
           <resource_set sequential="true" id="test-colocation-resource_set">
             <resource_ref id="vip-master"/>
             <resource_ref id="vip-rep"/>
           </resource_set>
         </rsc_colocation>
       </constraints>
  (snip)
  [root@rh63-heartbeat2 ~]# grep Initiating action /var/log/ha-log
  Mar  7 23:54:44 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: 
  Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no 
  waiting
  Mar  7 23:54:44 rh63-heartbeat2 crmd: 

Re: [Pacemaker] [Question]About sequential designation of resource_set.

2013-04-07 Thread renayama19661014
Hi Andrew,

Thank you for comments.

  Using an ordering set and a colocation set, is it impossible to get the
  same behavior as ordered=false on the group resource?
 
 Yes, because they're not the same thing.
 
 Setting sequential=false is not at all like setting ordered=false.
 Setting ordered=false is the equivalent of _removing_ rsc_order 
 id=test-order completely.

Which of the following cases does your answer correspond to?

Case 1) ordered=false of a group cannot be replaced with sequential.
Therefore, the ordered attribute of the group resource will continue to be
supported from now on.

Case 2) It works the same way if I remove rsc_order id=test-order as in the
next setting. (rsc_colocation sequential=false, remove rsc_order
id=test-order.)

(snip)
<resources>
  <group id="testGroup01">
    <primitive class="ocf" type="Dummy" provider="heartbeat" id="vip-master">
      <operations>
(snip)
    <primitive class="ocf" type="Dummy" provider="heartbeat" id="vip-rep">
      <operations>
  </group>
(snip)
<constraints>
  <rsc_colocation id="test-colocation">
    <resource_set sequential="false" id="test-colocation-resource_set">
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
  </rsc_colocation>
</constraints>
(snip)

Case 3) There is some other way to get the effect of ordered=false for a group.

Best Regards,
Hideo Yamauchi.



--- On Mon, 2013/4/8, Andrew Beekhof and...@beekhof.net wrote:

 
 On 22/03/2013, at 3:17 PM, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
  
  Thank you for comments.
  
  We want the same behavior as when ordered=false is specified on the
  group resource.
  
  * Case 0 - group : ordered=false
   * With ordered=false, the start of vip-rep is carried out.  
 This is the behavior we want!!
  {{{
  (snip)
     <resources>
       <group id="testGroup01">
         <meta_attributes id="master-group-meta_attributes">
           <nvpair id="master-group-meta_attributes-ordered" name="ordered" value="false"/>
         </meta_attributes>
         <primitive class="ocf" type="Dummy1" provider="heartbeat" id="vip-master">
           <operations>
             <op id="op-Dummy01-start" interval="0" name="start" timeout="60s" on-fail="restart"/>
             <op id="op-Dummy01-monitor" interval="10" name="monitor" timeout="60s" on-fail="restart"/>
             <op id="op-Dummy01-stop" interval="0" name="stop" timeout="60s" on-fail="block"/>
           </operations>
         </primitive>
         <primitive class="ocf" type="Dummy2" provider="heartbeat" id="vip-rep">
           <operations>
             <op id="op-Dummy02-start" interval="0" name="start" timeout="60s" on-fail="restart"/>
             <op id="op-Dummy02-monitor" interval="10" name="monitor" timeout="60s" on-fail="restart"/>
             <op id="op-Dummy02-stop" interval="0" name="stop" timeout="60s" on-fail="block"/>
           </operations>
         </primitive>
       </group>
      </resources>
  (snip)
  [root@rh64-heartbeat1 ~]# grep Initiating action /var/log/ha-log
  Mar 22 21:45:01 rh64-heartbeat1 crmd: [2625]: info: te_rsc_command: 
  Initiating action 2: probe_complete probe_complete on rh64-heartbeat1 
  (local) - no waiting
  Mar 22 21:46:36 rh64-heartbeat1 crmd: [2625]: info: te_rsc_command: 
  Initiating action 4: monitor vip-master_monitor_0 on rh64-heartbeat1 (local)
  Mar 22 21:46:36 rh64-heartbeat1 crmd: [2625]: info: te_rsc_command: 
  Initiating action 5: monitor vip-rep_monitor_0 on rh64-heartbeat1 (local)
  Mar 22 21:46:36 rh64-heartbeat1 crmd: [2625]: info: te_rsc_command: 
  Initiating action 3: probe_complete probe_complete on rh64-heartbeat1 
  (local) - no waiting
  Mar 22 21:46:36 rh64-heartbeat1 crmd: [2625]: info: te_rsc_command: 
  Initiating action 6: start vip-master_start_0 on rh64-heartbeat1 (local)
  Mar 22 21:46:36 rh64-heartbeat1 crmd: [2625]: info: te_rsc_command: 
  Initiating action 8: start vip-rep_start_0 on rh64-heartbeat1 (local)
  Mar 22 21:46:37 rh64-heartbeat1 crmd: [2625]: info: te_rsc_command: 
  Initiating action 1: stop vip-master_stop_0 on rh64-heartbeat1 (local)
  Mar 22 21:46:37 rh64-heartbeat1 crmd: [2625]: info: te_rsc_command: 
  Initiating action 5: stop vip-rep_stop_0 on rh64-heartbeat1 (local)
  }}}
  
  
  I tried all combinations of ordering set and colocation set.
  However, the start of vip-rep was not carried out in any combination.
  * I did not specify ordered=false on the group resource.
  
  * Case 1 : true/true
  {{{
  (snip)
     <resources>
       <group id="testGroup01">
         <primitive class="ocf" type="Dummy1" provider="heartbeat" id="vip-master">
           <operations>
  (snip)
     <constraints>
         <rsc_colocation id="test-colocation">
                 <resource_set sequential="true" id="test-colocation-resource_set">
                         <resource_ref id="vip-master"/>
                         <resource_ref id="vip-rep"/>
                 </resource_set>
         </rsc_colocation>
         <rsc_order id="test-order">
                 <resource_set sequential="true" id="test-order-resource_set">
                         

Re: [Pacemaker] [Question]About sequential designation of resource_set.

2013-04-08 Thread renayama19661014
Hi Andrew,

Thank you for comments.

   Using an ordering set and a colocation set, is it impossible to get the
   same behavior as ordered=false on the group resource?
  
  Yes, because they're not the same thing.
  
  Setting sequential=false is not at all like setting ordered=false.
  Setting ordered=false is the equivalent of _removing_ rsc_order 
  id=test-order completely.
  
  Which of the cases does your answer correspond to?
 
 I was answering Case 4 as that's where I saw the '?'.
 However it equally applies to all cases.
 
 If you do not want ordering, do not define an ordering constraint.
 

Okay!

I changed case 4 and ran it. (Removed rsc_order id=test-order.)

(snip)
  <group id="testGroup01">
    <primitive class="ocf" type="Dummy" provider="heartbeat" id="vip-master">
(snip)
    <primitive class="ocf" type="Dummy" provider="heartbeat" id="vip-rep">
  </group>
(snip)
<constraints>
  <rsc_colocation id="test-colocation">
    <resource_set sequential="false" id="test-colocation-resource_set">
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
  </rsc_colocation>
</constraints>

(snip)
[root@rh64-heartbeat1 ~]# grep Initiating action /var/log/ha-log
Apr  8 23:46:32 rh64-heartbeat1 crmd: [3171]: info: te_rsc_command: Initiating 
action 2: probe_complete probe_complete on rh64-heartbeat1 (local) - no waiting
Apr  8 23:47:59 rh64-heartbeat1 crmd: [3171]: info: te_rsc_command: Initiating 
action 4: monitor vip-master_monitor_0 on rh64-heartbeat1 (local)
Apr  8 23:47:59 rh64-heartbeat1 crmd: [3171]: info: te_rsc_command: Initiating 
action 5: monitor vip-rep_monitor_0 on rh64-heartbeat1 (local)
Apr  8 23:47:59 rh64-heartbeat1 crmd: [3171]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on rh64-heartbeat1 (local) - no waiting
Apr  8 23:47:59 rh64-heartbeat1 crmd: [3171]: info: te_rsc_command: Initiating 
action 6: start vip-master_start_0 on rh64-heartbeat1 (local)
Apr  8 23:47:59 rh64-heartbeat1 crmd: [3171]: info: te_rsc_command: Initiating 
action 1: stop vip-master_stop_0 on rh64-heartbeat1 (local)

(snip)

Last updated: Mon Apr  8 23:48:04 2013
Stack: Heartbeat
Current DC: rh64-heartbeat1 (d2016b22-145f-4e6a-87a4-a05f7c5a9c29) - partition 
with quorum
Version: 1.0.13-30bb726
1 Nodes configured, unknown expected votes
1 Resources configured.


Online: [ rh64-heartbeat1 ]


Node Attributes:
* Node rh64-heartbeat1:

Migration summary:
* Node rh64-heartbeat1: 
   vip-master: migration-threshold=1 fail-count=100

Failed actions:
vip-master_start_0 (node=rh64-heartbeat1, call=4, rc=1, status=complete): 
unknown error

However, the result was the same.

When the start of vip-master fails, vip-rep is not started.
The start order of the resources still seems to be enforced.

Could this be a problem of Pacemaker 1.0?
Does it work correctly in Pacemaker 1.1?

Best Regards,
Hideo Yamauchi.




Re: [Pacemaker] [Question]About sequential designation of resource_set.

2013-04-08 Thread renayama19661014
Hi Andrew,

Thank you for comments.

 Oh!
 I somehow failed to recognise that you were using 1.0
 There is a reasonable chance that 1.1 behaves better in this regard.
 
 I also notice, now, that the resources are still in a group - deleting the 
 ordering constraint achieves nothing if the resources are still in a group.  
 Just define the resources and the colocation set, no group.
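 
 (A minimal sketch of that layout, reusing the names from this thread;
 the score and the elided operations are illustrative only:
 
   <resources>
     <primitive class="ocf" type="Dummy" provider="heartbeat" id="vip-master"/>
     <primitive class="ocf" type="Dummy" provider="heartbeat" id="vip-rep"/>
   </resources>
   <constraints>
     <rsc_colocation id="test-colocation" score="INFINITY">
       <resource_set sequential="false" id="test-colocation-resource_set">
         <resource_ref id="vip-master"/>
         <resource_ref id="vip-rep"/>
       </resource_set>
     </rsc_colocation>
   </constraints>
 )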

All right!

We use the ordered attribute of group in Pacemaker 1.0.
For Pacemaker 1.1, I hope that resource_set will eventually work as a
substitute for ordered on a group.

Many Thanks!
Hideo Yamauchi.




[Pacemaker] [Patch] An error may occur to be behind with a stop of pingd.

2013-04-10 Thread renayama19661014
Hi All,

We confirmed a phenomenon where the stop of pingd is delayed and an error
is logged.

The problem seems to be that pingd does not receive SIGTERM until the
stand_alone_ping processing is completed.


Apr 11 00:48:33 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
192.168.40.1 is unreachable (read)
Apr 11 00:48:36 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
192.168.40.1 is unreachable (read)
Apr 11 00:48:39 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
192.168.40.1 is unreachable (read)
Apr 11 00:48:42 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
192.168.40.1 is unreachable (read)
Apr 11 00:48:45 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
192.168.40.1 is unreachable (read)
Apr 11 00:48:48 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
192.168.40.1 is unreachable (read)
(snip)
Apr 11 00:48:50 rh64-heartbeat1 heartbeat: [2413]: info: killing 
/usr/lib64/heartbeat/crmd process group 2427 with signal 15
Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: crm_signal_dispatch: 
Invoking handler for signal 15: Terminated
Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: crm_shutdown: Requesting 
shutdown
(snip)
Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: te_rsc_command: Initiating 
action 9: stop prmPingd:0_stop_0 on rh64-heartbeat1 (local)
Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: cancel_op: operation 
monitor[5] on prmPingd:0 for client 2427, its parameters: CRM_meta_clone=[0] 
host_list=[192.168.40.1] name=[default_ping_set] attempts=[2] 
CRM_meta_clone_node_max=[1] CRM_meta_clone_max=[1] CRM_meta_notify=[false] 
CRM_meta_globally_unique=[false] crm_feature_set=[3.0.1] interval=[1] 
timeout=[2] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] multiplier=[100] 
CRM_meta_interval=[1] CRM_meta_timeout=[6]  cancelled
Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: do_lrm_rsc_op: Performing 
key=9:4:0:948901c2-4e97-4715-9f6b-1611810f8ef7 op=prmPingd:0_stop_0 )
Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: rsc:prmPingd:0 stop[9] (pid 
2570)
Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: process_lrm_event: LRM 
operation prmPingd:0_monitor_1 (call=5, status=1, cib-update=0, 
confirmed=true) Cancelled
Apr 11 00:48:50 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
192.168.40.1 is unreachable (read)
Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: operation stop[9] on 
prmPingd:0 for client 2427: pid 2570 exited with return code 0
Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: process_lrm_event: LRM 
operation prmPingd:0_stop_0 (call=9, rc=0, cib-update=59, confirmed=true) ok
Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: match_graph_event: Action 
prmPingd:0_stop_0 (9) confirmed on rh64-heartbeat1 (rc=0)
(snip)
Apr 11 00:48:50 rh64-heartbeat1 heartbeat: [2413]: info: killing 
/usr/lib64/heartbeat/ccm process group 2422 with signal 15
Apr 11 00:48:50 rh64-heartbeat1 ccm: [2422]: info: received SIGTERM, going to 
shut down
Apr 11 00:48:51 rh64-heartbeat1 pingd: [2505]: ERROR: send_ipc_message: IPC 
Channel to 2426 is not connected   --- ERROR
Apr 11 00:48:51 rh64-heartbeat1 pingd: [2505]: info: attrd_update: Could not 
send update: default_ping_set=0 for localhost
Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBWRITE 
process 2418 with signal 15
Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBREAD process 
2419 with signal 15
Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBFIFO process 
2417 with signal 15
Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2417 
exited. 3 remaining
Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2418 
exited. 2 remaining
Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2419 
exited. 1 remaining
Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: rh64-heartbeat1 
Heartbeat shutdown complete.
Apr 11 00:48:53 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
Connecting to cluster... 4 retries remaining   --- pingd has not
stopped yet
Apr 11 00:48:55 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
Connecting to cluster... 3 retries remaining
Apr 11 00:48:57 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
Connecting to cluster... 2 retries remaining
Apr 11 00:48:59 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
Connecting to cluster... 1 retries remaining
Apr 11 00:49:01 rh64-heartbeat1 pingd: [2505]: info: crm_signal_dispatch: 
Invoking handler for signal 15: Terminated
Apr 11 00:49:01 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
Connecting to cluster... 5 retries remaining
Apr 11 00:49:03 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
Connecting to cluster... 4 retries 

Re: [Pacemaker] [Patch] An error may occur to be behind with a stop of pingd.

2013-04-17 Thread renayama19661014
Hi All,

I sent the pull request of this patch.

 * https://github.com/ClusterLabs/pacemaker-1.0/pull/13

Best Regards,
Hideo Yamauchi.

--- On Wed, 2013/4/10, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp 
wrote:

 Hi All,
 
 We confirmed a phenomenon where the stop of pingd is delayed and an error
 is logged.
 
 The problem seems to be that pingd does not receive SIGTERM until the
 stand_alone_ping processing is completed.
 
 
 Apr 11 00:48:33 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
 192.168.40.1 is unreachable (read)
 Apr 11 00:48:36 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
 192.168.40.1 is unreachable (read)
 Apr 11 00:48:39 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
 192.168.40.1 is unreachable (read)
 Apr 11 00:48:42 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
 192.168.40.1 is unreachable (read)
 Apr 11 00:48:45 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
 192.168.40.1 is unreachable (read)
 Apr 11 00:48:48 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
 192.168.40.1 is unreachable (read)
 (snip)
 Apr 11 00:48:50 rh64-heartbeat1 heartbeat: [2413]: info: killing 
 /usr/lib64/heartbeat/crmd process group 2427 with signal 15
 Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: crm_signal_dispatch: 
 Invoking handler for signal 15: Terminated
 Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: crm_shutdown: Requesting 
 shutdown
 (snip)
 Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: te_rsc_command: 
 Initiating action 9: stop prmPingd:0_stop_0 on rh64-heartbeat1 (local)
 Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: cancel_op: operation 
 monitor[5] on prmPingd:0 for client 2427, its parameters: CRM_meta_clone=[0] 
 host_list=[192.168.40.1] name=[default_ping_set] attempts=[2] 
 CRM_meta_clone_node_max=[1] CRM_meta_clone_max=[1] CRM_meta_notify=[false] 
 CRM_meta_globally_unique=[false] crm_feature_set=[3.0.1] interval=[1] 
 timeout=[2] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] 
 multiplier=[100] CRM_meta_interval=[1] CRM_meta_timeout=[6]  cancelled
 Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: do_lrm_rsc_op: Performing 
 key=9:4:0:948901c2-4e97-4715-9f6b-1611810f8ef7 op=prmPingd:0_stop_0 )
 Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: rsc:prmPingd:0 stop[9] 
 (pid 2570)
 Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: process_lrm_event: LRM 
 operation prmPingd:0_monitor_1 (call=5, status=1, cib-update=0, 
 confirmed=true) Cancelled
 Apr 11 00:48:50 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
 192.168.40.1 is unreachable (read)
 Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: operation stop[9] on 
 prmPingd:0 for client 2427: pid 2570 exited with return code 0
 Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: process_lrm_event: LRM 
 operation prmPingd:0_stop_0 (call=9, rc=0, cib-update=59, confirmed=true) ok
 Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: match_graph_event: Action 
 prmPingd:0_stop_0 (9) confirmed on rh64-heartbeat1 (rc=0)
 (snip)
 Apr 11 00:48:50 rh64-heartbeat1 heartbeat: [2413]: info: killing 
 /usr/lib64/heartbeat/ccm process group 2422 with signal 15
 Apr 11 00:48:50 rh64-heartbeat1 ccm: [2422]: info: received SIGTERM, going to 
 shut down
 Apr 11 00:48:51 rh64-heartbeat1 pingd: [2505]: ERROR: send_ipc_message: IPC 
 Channel to 2426 is not connected                        --- ERROR
 Apr 11 00:48:51 rh64-heartbeat1 pingd: [2505]: info: attrd_update: Could not 
 send update: default_ping_set=0 for localhost
 Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBWRITE 
 process 2418 with signal 15
 Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBREAD 
 process 2419 with signal 15
 Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBFIFO 
 process 2417 with signal 15
 Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2417 
 exited. 3 remaining
 Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2418 
 exited. 2 remaining
 Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2419 
 exited. 1 remaining
 Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: rh64-heartbeat1 
 Heartbeat shutdown complete.
 Apr 11 00:48:53 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
 Connecting to cluster... 4 retries remaining                 --- pingd
 has not stopped yet
 Apr 11 00:48:55 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
 Connecting to cluster... 3 retries remaining
 Apr 11 00:48:57 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
 Connecting to cluster... 2 retries remaining
 Apr 11 00:48:59 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
 Connecting to cluster... 1 retries remaining
 Apr 11 00:49:01 rh64-heartbeat1 

Re: [Pacemaker] [Problem][crmsh]The designation of the 'ordered' attribute becomes the error.

2013-04-17 Thread renayama19661014
Hi Dejan,
Hi Andreas,

 The shell in pacemaker v1.0.x is in maintenance mode and shipped
 along with the pacemaker code. The v1.1.x doesn't have the
 ordered and collocated meta attributes.

I sent a pull request for the patch which Mr. Dejan provided.
 * https://github.com/ClusterLabs/pacemaker-1.0/pull/14

Many Thanks!
Hideo Yamauchi.
--- On Tue, 2013/4/2, Dejan Muhamedagic deja...@fastmail.fm wrote:

 Hi,
 
 On Mon, Apr 01, 2013 at 09:19:51PM +0200, Andreas Kurz wrote:
  Hi Dejan,
  
  On 2013-03-06 11:59, Dejan Muhamedagic wrote:
   Hi Hideo-san,
   
   On Wed, Mar 06, 2013 at 10:37:44AM +0900, renayama19661...@ybb.ne.jp 
   wrote:
   Hi Dejan,
   Hi Andrew,
  
   As for the crm shell, the check of the meta attribute was revised with 
   the next patch.
  
    * http://hg.savannah.gnu.org/hgweb/crmsh/rev/d1174f42f4b3
  
   This patch was backported in Pacemaker1.0.13.
  
    * 
  https://github.com/ClusterLabs/pacemaker-1.0/commit/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc#shell/modules/cibconfig.py
  
    However, the ordered/colocated attributes of the group resource are
    treated as an error when I use a crm shell that includes this patch.
  
   --
   (snip)
   ### Group Configuration ###
   group master-group \
           vip-master \
           vip-rep \
           meta \
                   ordered=false
   (snip)
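   
    (For reference, a hedged sketch of the CIB XML that this crm snippet
    corresponds to - the nvpair form matches the group configuration shown
    earlier in this digest:
   
      <group id="master-group">
        <meta_attributes id="master-group-meta_attributes">
          <nvpair id="master-group-meta_attributes-ordered"
                  name="ordered" value="false"/>
        </meta_attributes>
        ...
      </group>
    )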
  
   [root@rh63-heartbeat1 ~]# crm configure load update test2339.crm 
   INFO: building help index
   crm_verify[20028]: 2013/03/06_17:57:18 WARN: unpack_nodes: Blind faith: 
   not fencing unseen nodes
   WARNING: vip-master: specified timeout 60s for start is smaller than the 
   advised 90
   WARNING: vip-master: specified timeout 60s for stop is smaller than the 
   advised 100
   WARNING: vip-rep: specified timeout 60s for start is smaller than the 
   advised 90
   WARNING: vip-rep: specified timeout 60s for stop is smaller than the 
   advised 100
   ERROR: master-group: attribute ordered does not exist  - WHY?
   Do you still want to commit? y
   --
  
    If I answer `yes` to the confirmation message, the change is applied,
    but it is a problem that an error message is displayed at all.
     * The same error occurs when I specify the colocated attribute.
    And I noticed that there is no explanation of ordered/colocated for
    group resources in the Pacemaker online help.
   
    I think that specifying the ordered/colocated attributes should not
    be an error for a group resource.
    In addition, I think that ordered/colocated should be added to the
    online help.
   
   These attributes are not listed in crmsh. Does the attached patch
   help?
  
  Dejan, will this patch for the missing ordered and collocated group
  meta-attribute be included in the next crmsh release? ... can't see the
  patch in the current tip.
 
 The shell in pacemaker v1.0.x is in maintenance mode and shipped
 along with the pacemaker code. The v1.1.x doesn't have the
 ordered and collocated meta attributes.
 
 Thanks,
 
 Dejan
 
 
  Thanks  Regards,
  Andreas
  
   
   Thanks,
   
   Dejan
  
   Best Regards,
   Hideo Yamauchi.
  
  


Re: [Pacemaker] [Problem][crmsh]The designation of the 'ordered' attribute becomes the error.

2013-04-30 Thread renayama19661014
Hi Mori san,

 The patch for crmsh is now included in the 1.0.x repository:
 
   
 https://github.com/ClusterLabs/pacemaker-1.0/commit/9227e89fb748cd52d330f5fca80d56fbd9d3efbf
 
 
 It will appear in the 1.0.14 maintenance release, which is not scheduled
 yet though.

All right.

Many Thanks!
Hideo Yamauchi.

--- On Tue, 2013/4/30, Keisuke MORI keisuke.mori...@gmail.com wrote:

 
 Hi Dejan, Andreas, Yamauchi-san
 
 
 
 
 
 2013/4/18  renayama19661...@ybb.ne.jp
 Hi Dejan,
 Hi Andreas,
 
 
  The shell in pacemaker v1.0.x is in maintenance mode and shipped
  along with the pacemaker code. The v1.1.x doesn't have the
  ordered and collocated meta attributes.
 
 I sent a pull request for the patch which Mr. Dejan provided.
  * https://github.com/ClusterLabs/pacemaker-1.0/pull/14
 
 
 
 The patch for crmsh is now included in the 1.0.x repository:
 
   
 https://github.com/ClusterLabs/pacemaker-1.0/commit/9227e89fb748cd52d330f5fca80d56fbd9d3efbf
 
 
 It will appear in the 1.0.14 maintenance release, which is not scheduled
 yet though.
 
 
 Thanks,
 
 
 Keisuke MORI
 
  Many Thanks!
 Hideo Yamauchi.
 
 
 --- On Tue, 2013/4/2, Dejan Muhamedagic deja...@fastmail.fm wrote:
 
  Hi,
 
  On Mon, Apr 01, 2013 at 09:19:51PM +0200, Andreas Kurz wrote:
   Hi Dejan,
  
   On 2013-03-06 11:59, Dejan Muhamedagic wrote:
Hi Hideo-san,
   
On Wed, Mar 06, 2013 at 10:37:44AM +0900, renayama19661...@ybb.ne.jp 
wrote:
Hi Dejan,
Hi Andrew,
   
As for the crm shell, the check of the meta attribute was revised with 
the next patch.
   
     * http://hg.savannah.gnu.org/hgweb/crmsh/rev/d1174f42f4b3
   
This patch was backported in Pacemaker1.0.13.
   
     * 
   https://github.com/ClusterLabs/pacemaker-1.0/commit/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc#shell/modules/cibconfig.py
   
However, the ordered/colocated attributes of the group resource are
treated as an error when I use a crm shell that includes this patch.
   
--
(snip)
### Group Configuration ###
group master-group \
            vip-master \
            vip-rep \
            meta \
                    ordered=false
(snip)
   
[root@rh63-heartbeat1 ~]# crm configure load update test2339.crm
INFO: building help index
crm_verify[20028]: 2013/03/06_17:57:18 WARN: unpack_nodes: Blind 
faith: not fencing unseen nodes
WARNING: vip-master: specified timeout 60s for start is smaller than 
the advised 90
WARNING: vip-master: specified timeout 60s for stop is smaller than 
the advised 100
WARNING: vip-rep: specified timeout 60s for start is smaller than the 
advised 90
WARNING: vip-rep: specified timeout 60s for stop is smaller than the 
advised 100
ERROR: master-group: attribute ordered does not exist  - WHY?
Do you still want to commit? y
--
   
If I answer `yes` to the confirmation message, the change is applied,
but it is a problem that an error message is displayed at all.
     * The same error occurs when I specify the colocated attribute.
And I noticed that there is no explanation of ordered/colocated
for group resources in the Pacemaker online help.
   
I think that specifying the ordered/colocated attributes should not
be an error for a group resource.
In addition, I think that ordered/colocated should be added to the
online help.
   
These attributes are not listed in crmsh. Does the attached patch
help?
  
   Dejan, will this patch for the missing ordered and collocated group
   meta-attribute be included in the next crmsh release? ... can't see the
   patch in the current tip.
 
  The shell in pacemaker v1.0.x is in maintenance mode and shipped
  along with the pacemaker code. The v1.1.x doesn't have the
  ordered and collocated meta attributes.
 
  Thanks,
 
  Dejan
 
 
   Thanks  Regards,
   Andreas
  
   
Thanks,
   
Dejan
   
Best Regards,
Hideo Yamauchi.
   
   
[Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-13 Thread renayama19661014
Hi All,

We constituted a simple cluster in a vSphere 5.1 environment.

It is composed of two ESXi servers and a shared disk.

The guest is located on the shared disk.


Step 1) Construct a cluster. (The DC node is the active node.)


Last updated: Mon May 13 14:16:09 2013
Stack: Heartbeat
Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with 
quorum
Version: 1.0.13-30bb726
2 Nodes configured, unknown expected votes
2 Resources configured.


Online: [ pgsr01 pgsr02 ]

 Resource Group: test-group
 Dummy1 (ocf::pacemaker:Dummy): Started pgsr01
 Dummy2 (ocf::pacemaker:Dummy): Started pgsr01
 Clone Set: clnPingd
 Started: [ pgsr01 pgsr02 ]

Node Attributes:
* Node pgsr01:
+ default_ping_set  : 100   
* Node pgsr02:
+ default_ping_set  : 100   

Migration summary:
* Node pgsr01: 
* Node pgsr02: 


Step 2) Attach strace to the pengine process of the DC node.

[root@pgsr01 ~]# ps -ef |grep heartbeat
root  2072 1  0 13:56 ?00:00:00 heartbeat: master control 
process
root  2075  2072  0 13:56 ?00:00:00 heartbeat: FIFO reader
root  2076  2072  0 13:56 ?00:00:00 heartbeat: write: bcast eth1  
root  2077  2072  0 13:56 ?00:00:00 heartbeat: read: bcast eth1   
root  2078  2072  0 13:56 ?00:00:00 heartbeat: write: bcast eth2  
root  2079  2072  0 13:56 ?00:00:00 heartbeat: read: bcast eth2   
496   2082  2072  0 13:57 ?00:00:00 /usr/lib64/heartbeat/ccm
496   2083  2072  0 13:57 ?00:00:00 /usr/lib64/heartbeat/cib
root  2084  2072  0 13:57 ?00:00:00 /usr/lib64/heartbeat/lrmd -r
root  2085  2072  0 13:57 ?00:00:00 /usr/lib64/heartbeat/stonithd
496   2086  2072  0 13:57 ?00:00:00 /usr/lib64/heartbeat/attrd
496   2087  2072  0 13:57 ?00:00:00 /usr/lib64/heartbeat/crmd
496   2089  2087  0 13:57 ?00:00:00 /usr/lib64/heartbeat/pengine
root  2182 1  0 14:15 ?00:00:00 /usr/lib64/heartbeat/pingd -D 
-p /var/run//pingd-default_ping_set -a default_ping_set -d 5s -m 100 -i 1 -h 
192.168.101.254
root  2287  1973  0 14:16 pts/000:00:00 grep heartbea

[root@pgsr01 ~]# strace -p 2089
Process 2089 attached - interrupt to quit
restart_syscall(... resuming interrupted call ...) = 0
times({tms_utime=5, tms_stime=6, tms_cutime=0, tms_cstime=0}) = 429527557
recvfrom(5, 0xa93ff7, 953, 64, 0, 0)= -1 EAGAIN (Resource temporarily 
unavailable)
poll([{fd=5, events=0}], 1, 0)  = 0 (Timeout)
recvfrom(5, 0xa93ff7, 953, 64, 0, 0)= -1 EAGAIN (Resource temporarily 
unavailable)
poll([{fd=5, events=0}], 1, 0)  = 0 (Timeout)
(snip)


Step 3) Disconnect the shared disk on which the active node is placed.

Step 4) Cut off the pingd network of the standby node.
The pingd score is reflected correctly, but the pengine processing
blocks.

~ # esxcfg-vswitch -N vmnic1 -p ap-db vSwitch1
~ # esxcfg-vswitch -N vmnic2 -p ap-db vSwitch1


(snip)
brk(0xd05000)   = 0xd05000
brk(0xeed000)   = 0xeed000
brk(0xf2d000)   = 0xf2d000
fstat(6, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f86a255a000
write(6, BZh51AYSY\327\373\370\203\0\t(_\200UPX\3\377\377%cT 
\277\377\377..., 2243) = 2243
brk(0xb1d000)   = 0xb1d000
fsync(6   --- BLOCKED
(snip)



Last updated: Mon May 13 14:19:15 2013
Stack: Heartbeat
Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with 
quorum
Version: 1.0.13-30bb726
2 Nodes configured, unknown expected votes
2 Resources configured.


Online: [ pgsr01 pgsr02 ]

 Resource Group: test-group
 Dummy1 (ocf::pacemaker:Dummy): Started pgsr01
 Dummy2 (ocf::pacemaker:Dummy): Started pgsr01
 Clone Set: clnPingd
 Started: [ pgsr01 pgsr02 ]

Node Attributes:
* Node pgsr01:
+ default_ping_set  : 100   
* Node pgsr02:
+ default_ping_set  : 0 : Connectivity is lost

Migration summary:
* Node pgsr01: 
* Node pgsr02: 


Step 5) Reconnect the pingd communication of the standby node.
The pingd score is reflected correctly, but the pengine processing
still blocks.


~ # esxcfg-vswitch -M vmnic1 -p ap-db vSwitch1
~ # esxcfg-vswitch -M vmnic2 -p ap-db vSwitch1


Last updated: Mon May 13 14:19:40 2013
Stack: Heartbeat
Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition with 
quorum
Version: 1.0.13-30bb726
2 Nodes configured, unknown expected votes
2 Resources configured.


Online: [ pgsr01 pgsr02 ]

 Resource Group: test-group
 Dummy1 (ocf::pacemaker:Dummy): Started pgsr01
 Dummy2 (ocf::pacemaker:Dummy): Started pgsr01
 Clone Set: clnPingd
 Started: [ 

Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-14 Thread renayama19661014
Hi Andrew,

  Thank you for comments.
  
  The guest is located on the shared disk.
  
  What is on the shared disk?  The whole OS or app-specific data (i.e. 
  nothing pacemaker needs directly)?
  
  The shared disk has the whole OS and all the data.
 
 Oh. I can imagine that being problematic.
 Pacemaker really isn't designed to function without disk access.

I think so, too.

With that in mind, I made the following suggestion.

  For example...
  1. crmd watches its requests to pengine with a timer...
  2. pengine writes its files with a timer and watches the processing
  ...etc.

But, there may be a better method.

 
 You might be able to get away with it if you turn off saving PE files to disk 
 though.
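 
 (A hedged sketch of what that might look like in the CIB - the
 pe-*-series-max cluster options control how many PE files are kept,
 and 0 should keep none; please verify against your version's
 documentation:
 
   <crm_config>
     <cluster_property_set id="cib-bootstrap-options">
       <nvpair id="opt-pe-input-max" name="pe-input-series-max" value="0"/>
       <nvpair id="opt-pe-warn-max" name="pe-warn-series-max" value="0"/>
       <nvpair id="opt-pe-error-max" name="pe-error-series-max" value="0"/>
     </cluster_property_set>
   </crm_config>
 )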
 
  The placement of the shared disk is similar in KVM, where the problem
  does not occur.
 
 That it works in KVM in this situation is kind of surprising.
 Or perhaps I misunderstand.

I will check the details of the behavior on KVM once again.
However, the behavior on KVM is clearly different from the behavior on
vSphere 5.1.

Best Regards,
Hideo Yamauchi.

 
  
  * We understand that the behavior differs depending on the hypervisor.
  * However, it seems necessary to work around this problem in order to
  use Pacemaker in a vSphere 5.1 environment.
  
  Best Regards,
  Hideo Yamauchi.
  
  
  --- On Wed, 2013/5/15, Andrew Beekhof and...@beekhof.net wrote:
  
  
  On 13/05/2013, at 4:14 PM, renayama19661...@ybb.ne.jp wrote:
  
  Hi All,
  
  We constituted a simple cluster in environment of vSphere5.1.
  
  We composed it of two ESXi servers and shared disk.
  
  The guest located it to the shared disk.
  
  What is on the shared disk?  The whole OS or app-specific data (i.e. 
  nothing pacemaker needs directly)?
  
  
  
  Step 1) Constitute a cluster.(A DC node is an active node.)
  
  
  Last updated: Mon May 13 14:16:09 2013
  Stack: Heartbeat
  Current DC: pgsr01 (85a81130-4fed-4932-ab4c-21ac2320186f) - partition 
  with quorum
  Version: 1.0.13-30bb726
  2 Nodes configured, unknown expected votes
  2 Resources configured.
  
  
  Online: [ pgsr01 pgsr02 ]
  
  Resource Group: test-group
       Dummy1     (ocf::pacemaker:Dummy): Started pgsr01
       Dummy2     (ocf::pacemaker:Dummy): Started pgsr01
  Clone Set: clnPingd
       Started: [ pgsr01 pgsr02 ]
  
  Node Attributes:
  * Node pgsr01:
      + default_ping_set                  : 100       
  * Node pgsr02:
      + default_ping_set                  : 100       
  
  Migration summary:
  * Node pgsr01: 
  * Node pgsr02: 
  
  
  Step 2) Strace does the pengine process of the DC node.
  
  [root@pgsr01 ~]# ps -ef |grep heartbeat
  root      2072     1  0 13:56 ?        00:00:00 heartbeat: master control 
  process
  root      2075  2072  0 13:56 ?        00:00:00 heartbeat: FIFO reader    
      
  root      2076  2072  0 13:56 ?        00:00:00 heartbeat: write: bcast 
  eth1  
  root      2077  2072  0 13:56 ?        00:00:00 heartbeat: read: bcast 
  eth1   
  root      2078  2072  0 13:56 ?        00:00:00 heartbeat: write: bcast 
  eth2  
  root      2079  2072  0 13:56 ?        00:00:00 heartbeat: read: bcast 
  eth2   
  496       2082  2072  0 13:57 ?        00:00:00 /usr/lib64/heartbeat/ccm
  496       2083  2072  0 13:57 ?        00:00:00 /usr/lib64/heartbeat/cib
  root      2084  2072  0 13:57 ?        00:00:00 /usr/lib64/heartbeat/lrmd 
  -r
  root      2085  2072  0 13:57 ?        00:00:00 
  /usr/lib64/heartbeat/stonithd
  496       2086  2072  0 13:57 ?        00:00:00 /usr/lib64/heartbeat/attrd
  496       2087  2072  0 13:57 ?        00:00:00 /usr/lib64/heartbeat/crmd
  496       2089  2087  0 13:57 ?        00:00:00 
  /usr/lib64/heartbeat/pengine
  root      2182     1  0 14:15 ?        00:00:00 
  /usr/lib64/heartbeat/pingd -D -p /var/run//pingd-default_ping_set -a 
  default_ping_set -d 5s -m 100 -i 1 -h 192.168.101.254
  root      2287  1973  0 14:16 pts/0    00:00:00 grep heartbea
  
  [root@pgsr01 ~]# strace -p 2089
  Process 2089 attached - interrupt to quit
  restart_syscall(... resuming interrupted call ...) = 0
  times({tms_utime=5, tms_stime=6, tms_cutime=0, tms_cstime=0}) = 429527557
  recvfrom(5, 0xa93ff7, 953, 64, 0, 0)    = -1 EAGAIN (Resource temporarily 
  unavailable)
  poll([{fd=5, events=0}], 1, 0)          = 0 (Timeout)
  recvfrom(5, 0xa93ff7, 953, 64, 0, 0)    = -1 EAGAIN (Resource temporarily 
  unavailable)
  poll([{fd=5, events=0}], 1, 0)          = 0 (Timeout)
  (snip)
  
  
  Step 3) Disconnect the shared disk which an active node was placed.
  
  Step 4) Cut off pingd of the standby node. 
          The score of pingd is reflected definitely, but handling of 
 pengine blocks it.
  
  ~ # esxcfg-vswitch -N vmnic1 -p ap-db vSwitch1
  ~ # esxcfg-vswitch -N vmnic2 -p ap-db vSwitch1
  
  
  (snip)
  brk(0xd05000)                           = 0xd05000
  brk(0xeed000)                           = 0xeed000
  brk(0xf2d000)                           = 0xf2d000
  fstat(6, 

Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-16 Thread renayama19661014
Hi Andrew,
Hi Vladislav,

I will try whether this fix is effective for this problem.
 * 
https://github.com/beekhof/pacemaker/commit/eb6264bf2db395779e65dadf1c626e050a388c59

Best Regards,
Hideo Yamauchi.

--- On Thu, 2013/5/16, Andrew Beekhof and...@beekhof.net wrote:

 
 On 16/05/2013, at 3:49 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
  16.05.2013 02:46, Andrew Beekhof wrote:
  
  On 15/05/2013, at 6:44 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
  
  15.05.2013 11:18, Andrew Beekhof wrote:
  
  On 15/05/2013, at 5:31 PM, Vladislav Bogdanov bub...@hoster-ok.com 
  wrote:
  
  15.05.2013 10:25, Andrew Beekhof wrote:
  
  On 15/05/2013, at 3:50 PM, Vladislav Bogdanov bub...@hoster-ok.com 
  wrote:
  
  15.05.2013 08:23, Andrew Beekhof wrote:
  
  On 15/05/2013, at 3:11 PM, renayama19661...@ybb.ne.jp wrote:
  
  Hi Andrew,
  
  Thank you for comments.
  
  The guest located it to the shared disk.
  
  What is on the shared disk?  The whole OS or app-specific data 
  (i.e. nothing pacemaker needs directly)?
  
  Shared disk has all the OS and the all data.
  
  Oh. I can imagine that being problematic.
  Pacemaker really isn't designed to function without disk access.
  
  You might be able to get away with it if you turn off saving PE 
  files to disk though.
  
  I store CIB and PE files to tmpfs, and sync them to remote storage
  (CIFS) with lsyncd level 1 config (I may share it on request). It 
  copies
  critical data like cib.xml, and moves everything else, symlinking it 
  to
  original place. The same technique may apply here, but with local fs
  instead of cifs.
  
  Btw, the following patch is needed for that, otherwise pacemaker
  overwrites remote files instead of creating new ones on tmpfs:
  
  --- a/lib/common/xml.c  2011-02-11 11:42:37.0 +0100
  +++ b/lib/common/xml.c  2011-02-24 15:07:48.541870829 +0100
  @@ -529,6 +529,8 @@ write_file(const char *string, const char 
  *filename)
       return -1;
   }
  
  +    unlink(filename);
  
  Seems like it should be safe to include for normal operation.
  
  Exactly.
  
  Small flaw in that logic... write_file() is not used anywhere.
  
  Heh, thanks for spotting this.
  
  I recall write_file() was used for pengine, but some other function for
  CIB. You probably optimized that but forgot to remove unused function,
  that's why I was sure patch is still valid. And I did tests (CIFS
  storage outage simulation) only after initial patch, but not last years,
  that's why I didn't notice the regression - storage uses pacemaker too ;) 
  .
  
  This should go to write_xml_file() (And probably to other places just
  before fopen(..., w), f.e. series).
  
  I've consolidated the code, however adding the unlink() would break things 
  for anyone intentionally symlinking cib.xml from somewhere else (like a 
  git repo).
  So I'm not so sure I should make the unlink() change :(
  
  Agree.
  I originally made it specific to pengine files.
  What do you prefer, simple wrapper in xml.c (f.e.
  unlink_and_write_xml_file()) or just add unlink() call to pengine before
  it calls write_xml_file()?
 
 The last one :)


Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-17 Thread renayama19661014
Hi Vladislav,

Thank you for advice.

I will try the patch which you showed.

We use Pacemaker 1.0, but will apply the patch there because the code is
similar.

If any question comes up while setting it up, I will ask you by email.
 * At first I will use only tmpfs, and I intend to test that.

 P.S. Andrew, is this patch ok to apply?

To Andrew...
  Does the patch related to the write_xml processing in your repository
have to be applied before verifying Vladislav's patch?

Many Thanks!
Hideo Yamauchi.




--- On Fri, 2013/5/17, Vladislav Bogdanov bub...@hoster-ok.com wrote:

 Hi Hideo-san,
 
 You may try the following patch (with the trick described below):
 
 From 2c4418d11c491658e33c149f63e6a2f2316ef310 Mon Sep 17 00:00:00 2001
 From: Vladislav Bogdanov bub...@hoster-ok.com
 Date: Fri, 17 May 2013 05:58:34 +
 Subject: [PATCH] Feature: PE: Unlink pengine output files before writing.
  This should help guys who store them to tmpfs and then copy to a stable 
 storage
  on (inotify) events with symlink creation in the original place to survive 
 when
  stable storage is not accessible.
 
 ---
  pengine/pengine.c |    1 +
  1 files changed, 1 insertions(+), 0 deletions(-)
 
 diff --git a/pengine/pengine.c b/pengine/pengine.c
 index c7e1c68..99a81c6 100644
 --- a/pengine/pengine.c
 +++ b/pengine/pengine.c
 @@ -184,6 +184,7 @@ process_pe_message(xmlNode * msg, xmlNode * xml_data, 
 crm_client_t * sender)
          }
  
          if (is_repoke == FALSE  series_wrap != 0) {
 +            unlink(filename);
              write_xml_file(xml_data, filename, HAVE_BZLIB_H);
              write_last_sequence(PE_STATE_DIR, series[series_id].name, seq + 
 1, series_wrap);
          } else {
 -- 
 1.7.1
 
 You just need to ensure that /var/lib/pacemaker is on tmpfs. Then you can
 watch the directories there with inotify or similar and move (copy) the
 files to stable storage (RAM is not of infinite size). In my case that is
 CIFS, and I use lsyncd to synchronize those directories. If you are
 interested, I can provide you with the relevant lsyncd configuration.
 Frankly speaking, there is no big need to create symlinks in tmpfs to
 stable storage, as pacemaker does not use existing pengine files (except
 sequences). The sequence files and cib.xml are the only exceptions that you
 may want to exist in both places (and you may want to copy them from stable
 storage to tmpfs before pacemaker starts); you can just move everything
 else away from tmpfs once it is written. In that case you do not need this
 patch.
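 
 As a minimal, self-contained illustration of the watch-and-move idea above
 (a sketch under assumptions, not Vladislav's lsyncd setup: the directory
 paths are invented, it watches only the pengine directory since cib.xml
 should be copied rather than moved, and a real tool would need more error
 handling):
 
 /* Copy each finished pengine file from tmpfs to stable storage, then
  * unlink the tmpfs copy to free RAM.  rename(2) is not usable here
  * because it fails with EXDEV across filesystems. */
 #include <limits.h>
 #include <stdio.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <sys/inotify.h>
 
 #define SRC_DIR "/var/lib/pacemaker/pengine"  /* tmpfs (assumed path) */
 #define DST_DIR "/srv/stable/pengine"         /* stable storage (assumed) */
 
 static int copy_file(const char *src, const char *dst)
 {
     char buf[8192];
     ssize_t n = -1;
     int in = open(src, O_RDONLY);
     int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0640);
 
     if (in >= 0 && out >= 0) {
         while ((n = read(in, buf, sizeof(buf))) > 0) {
             if (write(out, buf, n) != n) {
                 n = -1;
                 break;
             }
         }
     }
     if (in >= 0) close(in);
     if (out >= 0) close(out);
     return (n == 0) ? 0 : -1;
 }
 
 int main(void)
 {
     /* Event buffer aligned for struct inotify_event access. */
     char events[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
     int fd = inotify_init();
 
     if (fd < 0 || inotify_add_watch(fd, SRC_DIR, IN_CLOSE_WRITE) < 0) {
         perror("inotify");
         return 1;
     }
     for (;;) {
         ssize_t len = read(fd, events, sizeof(events));
         ssize_t i = 0;
 
         while (len > 0 && i < len) {
             struct inotify_event *ev = (struct inotify_event *) &events[i];
 
             if (ev->len > 0) {
                 char src[PATH_MAX], dst[PATH_MAX];
 
                 snprintf(src, sizeof(src), "%s/%s", SRC_DIR, ev->name);
                 snprintf(dst, sizeof(dst), "%s/%s", DST_DIR, ev->name);
                 if (copy_file(src, dst) == 0) {
                     unlink(src);   /* move semantics: free the tmpfs copy */
                 }
             }
             i += sizeof(struct inotify_event) + ev->len;
         }
     }
 }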
 
 Best,
 Vladislav
 
 P.S. Andrew, is this patch ok to apply?
 
 17.05.2013 03:27, renayama19661...@ybb.ne.jp wrote:
  Hi Andrew,
  Hi Vladislav,
  
  I try whether this correction is effective for this problem.
   * 
 https://github.com/beekhof/pacemaker/commit/eb6264bf2db395779e65dadf1c626e050a388c59
  
  Best Regards,
  Hideo Yamauchi.
  
  --- On Thu, 2013/5/16, Andrew Beekhof and...@beekhof.net wrote:
  
 
  On 16/05/2013, at 3:49 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
  16.05.2013 02:46, Andrew Beekhof wrote:
 
  On 15/05/2013, at 6:44 PM, Vladislav Bogdanov bub...@hoster-ok.com 
  wrote:
 
  15.05.2013 11:18, Andrew Beekhof wrote:
 
  On 15/05/2013, at 5:31 PM, Vladislav Bogdanov bub...@hoster-ok.com 
  wrote:
 
  15.05.2013 10:25, Andrew Beekhof wrote:
 
  On 15/05/2013, at 3:50 PM, Vladislav Bogdanov bub...@hoster-ok.com 
  wrote:
 
  15.05.2013 08:23, Andrew Beekhof wrote:
 
  On 15/05/2013, at 3:11 PM, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
 
  Thank you for comments.
 
  The guest located it to the shared disk.
 
  What is on the shared disk?  The whole OS or app-specific data 
  (i.e. nothing pacemaker needs directly)?
 
   The shared disk holds the whole OS and all the data.
 
  Oh. I can imagine that being problematic.
  Pacemaker really isn't designed to function without disk access.
 
  You might be able to get away with it if you turn off saving PE 
  files to disk though.
 
   I store CIB and PE files on tmpfs, and sync them to remote storage
   (CIFS) with an lsyncd level-1 config (I can share it on request). It
   copies critical data like cib.xml, and moves everything else, symlinking
   it back to the original place. The same technique may apply here, but
   with a local fs instead of CIFS.
  
   Btw, the following patch is needed for that; otherwise pacemaker
   overwrites the remote files instead of creating new ones on tmpfs:
 
  --- a/lib/common/xml.c  2011-02-11 11:42:37.0 +0100
  +++ b/lib/common/xml.c  2011-02-24 15:07:48.541870829 +0100
  @@ -529,6 +529,8 @@ write_file(const char *string, const char 
  *filename)
        return -1;
    }
 
  +    unlink(filename);
 
  Seems like it should be safe to include for normal operation.
 
  Exactly.
 
  Small flaw in that logic... write_file() is not used anywhere.
 
  Heh, thanks for spotting this.
 
  I recall write_file() was used for pengine, but some other function for
  CIB. You probably optimized that but forgot to remove 

Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-19 Thread renayama19661014
Hi Vladislav,

 For just this, the patch is unneeded. It only comes into play when you have
 the pengine files symlinked from stable storage into tmpfs. Without the
 patch, pengine would try to rewrite the file where the symlink points -
 directly on stable storage. With the patch, pengine will remove the symlink
 (and just the symlink) and open a new file on tmpfs for writing. Thus, it
 will not block if stable storage is inaccessible (in my case because of
 connectivity problems, in yours because of a backing storage outage).
 
 If you decide to go with tmpfs *and* use the same synchronization method
 as I do, then you'd need to bake a similar patch for 1.0: just add
 unlink() before pengine writes its data (I suspect that code differs
 between 1.0 and 1.1.10; even in 1.1.6 it was different from current master).
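 
 A tiny standalone sketch of the symlink behavior described above (purely
 illustrative; the file name is invented): fopen(..., "w") follows a symlink
 and truncates its target, while unlink() removes just the symlink, so the
 subsequent open creates a fresh local file.
 
 #include <stdio.h>
 #include <unistd.h>
 
 int main(void)
 {
     /* Assume pe-input-0.bz2 is a symlink pointing into remote storage. */
     const char *path = "pe-input-0.bz2";
 
     unlink(path);                /* removes only the symlink itself */
     FILE *f = fopen(path, "w");  /* creates a new regular file locally */
 
     if (f != NULL) {
         fputs("fresh local content\n", f);
         fclose(f);
     }
     return 0;
 }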

Thank you for the detailed explanation.
First I will check the behavior with tmpfs only.

Many Thanks!
Hideo Yamauchi.

--- On Fri, 2013/5/17, Vladislav Bogdanov bub...@hoster-ok.com wrote:

 Hi Hideo-san,
 
 17.05.2013 10:29, renayama19661...@ybb.ne.jp wrote:
  Hi Vladislav,
  
   Thank you for the advice.
   
   I will try the patch you showed.
   
   We use Pacemaker 1.0, but we will apply the patch there because the code
   is similar.
   
   If any questions come up while setting this up, I will ask you by email.
    * At first I will use only tmpfs, and I intend to test that.
 
  For just this, the patch is unneeded. It only comes into play when you
  have the pengine files symlinked from stable storage into tmpfs. Without
  the patch, pengine would try to rewrite the file where the symlink points
  - directly on stable storage. With the patch, pengine will remove the
  symlink (and just the symlink) and open a new file on tmpfs for writing.
  Thus, it will not block if stable storage is inaccessible (in my case
  because of connectivity problems, in yours because of a backing storage
  outage).
  
  If you decide to go with tmpfs *and* use the same synchronization method
  as I do, then you'd need to bake a similar patch for 1.0: just add
  unlink() before pengine writes its data (I suspect that code differs
  between 1.0 and 1.1.10; even in 1.1.6 it was different from current
  master).
 
  
  P.S. Andrew, is this patch ok to apply?
  
  To Andrew...
    Does the patch related to the write_xml processing in your repository
  need to be applied before confirming Vladislav's patch?
  
  Many Thanks!
  Hideo Yamauchi.
  
  
  
  
  --- On Fri, 2013/5/17, Vladislav Bogdanov bub...@hoster-ok.com wrote:
  
  Hi Hideo-san,
 
  You may try the following patch (with trick below)
 
  From 2c4418d11c491658e33c149f63e6a2f2316ef310 Mon Sep 17 00:00:00 2001
  From: Vladislav Bogdanov bub...@hoster-ok.com
  Date: Fri, 17 May 2013 05:58:34 +
  Subject: [PATCH] Feature: PE: Unlink pengine output files before writing.
   This should help guys who store them to tmpfs and then copy to a stable 
 storage
   on (inotify) events with symlink creation in the original place to 
 survive when
   stable storage is not accessible.
 
  ---
   pengine/pengine.c |    1 +
   1 files changed, 1 insertions(+), 0 deletions(-)
 
  diff --git a/pengine/pengine.c b/pengine/pengine.c
  index c7e1c68..99a81c6 100644
  --- a/pengine/pengine.c
  +++ b/pengine/pengine.c
  @@ -184,6 +184,7 @@ process_pe_message(xmlNode * msg, xmlNode * xml_data, 
  crm_client_t * sender)
           }
   
           if (is_repoke == FALSE  series_wrap != 0) {
  +            unlink(filename);
               write_xml_file(xml_data, filename, HAVE_BZLIB_H);
               write_last_sequence(PE_STATE_DIR, series[series_id].name, seq 
 + 1, series_wrap);
           } else {
  -- 
  1.7.1
 
  You just need to ensure that /var/lib/pacemaker is on tmpfs. Then you can
  watch the directories there with inotify or similar and move (copy) the
  files to stable storage (RAM is not of infinite size). In my case that is
  CIFS, and I use lsyncd to synchronize those directories. If you are
  interested, I can provide you with the relevant lsyncd configuration.
  Frankly speaking, there is no big need to create symlinks in tmpfs to
  stable storage, as pacemaker does not use existing pengine files (except
  sequences). The sequence files and cib.xml are the only exceptions that
  you may want to exist in both places (and you may want to copy them from
  stable storage to tmpfs before pacemaker starts); you can just move
  everything else away from tmpfs once it is written. In that case you do
  not need this patch.
 
  Best,
  Vladislav
 
  P.S. Andrew, is this patch ok to apply?
 
  17.05.2013 03:27, renayama19661...@ybb.ne.jp wrote:
  Hi Andrew,
  Hi Vladislav,
 
  I try whether this correction is effective for this problem.
    * 
 https://github.com/beekhof/pacemaker/commit/eb6264bf2db395779e65dadf1c626e050a388c59
 
  Best Regards,
  Hideo Yamauchi.
 
  --- On Thu, 2013/5/16, Andrew Beekhof and...@beekhof.net wrote:
 
 
  On 16/05/2013, at 3:49 PM, Vladislav Bogdanov bub...@hoster-ok.com 
  wrote:
 
  

Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-22 Thread renayama19661014
Hi Andrew,
Hi Vladislav,

We have been testing with the pe files located in tmpfs.
It seems to work well so far.

I will check the behavior a little more, and then we are going to try the
synchronization method that Mr. Vladislav described.

Best Regards,
Hideo Yamauchi.

--- On Wed, 2013/5/22, Andrew Beekhof and...@beekhof.net wrote:

 
 On 17/05/2013, at 4:17 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
  P.S. Andrew, is this patch ok to apply?
 
 https://github.com/beekhof/pacemaker/commit/c7e10c6 :)
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-23 Thread renayama19661014
Hi Andrew,
Hi Vladislav,

 We have been testing with the pe files located in tmpfs.
 It seems to work well so far.

Just by adopting tmpfs, the I/O blocking of pengine was improved.
I still need to verify synchronization to fixed storage, but I don't think
there will be a problem from now on.
Adopting tmpfs is one way to solve this problem.

 Pacemaker really isn't designed to function without disk access. 
 You might be able to get away with it if you turn off saving PE files to disk 
 though. 

To Andrew : 
 If you make a patch that removes the blocking from pengine's file handling,
I will verify its behavior.
 If the problem can be avoided without using tmpfs, many users would welcome
it.

Best Regards,
Hideo Yamauchi.


--- On Wed, 2013/5/22, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp 
wrote:

 Hi Andrew,
 Hi Vladislav,
 
 We have been testing with the pe files located in tmpfs.
 It seems to work well so far.
 
 I will check the behavior a little more, and then we are going to try the
 synchronization method that Mr. Vladislav described.
 
 Best Regards,
 Hideo Yamauchi.
 
 --- On Wed, 2013/5/22, Andrew Beekhof and...@beekhof.net wrote:
 
  
  On 17/05/2013, at 4:17 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
  
   P.S. Andrew, is this patch ok to apply?
  
  https://github.com/beekhof/pacemaker/commit/c7e10c6 :)
  
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-24 Thread renayama19661014
Hi Andrew,

  To Andrew : 
  If you make a patch that removes the blocking from pengine's file
  handling, I will verify its behavior.
  If the problem can be avoided without using tmpfs, many users would
  welcome it.
  
 
 You mean this patch? https://github.com/beekhof/pacemaker/commit/c7e10c6
 Or another one?

It is another patch that I need.
What is needed is a mechanism that avoids the problem without using tmpfs.
But this is only a request on our part, because ultimately it is an
environmental problem of vSphere5.

Best Regards,
Hideo Yamauchi.

--- On Fri, 2013/5/24, Andrew Beekhof and...@beekhof.net wrote:

 
 On 24/05/2013, at 2:58 PM, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
  Hi Vladislav,
  
   We have been testing with the pe files located in tmpfs.
   It seems to work well so far.
   
   Just by adopting tmpfs, the I/O blocking of pengine was improved.
   I still need to verify synchronization to fixed storage, but I don't
   think there will be a problem from now on.
   Adopting tmpfs is one way to solve this problem.
  
  Pacemaker really isn't designed to function without disk access. 
  You might be able to get away with it if you turn off saving PE files to 
  disk though. 
  
   To Andrew : 
   If you make a patch that removes the blocking from pengine's file
   handling, I will verify its behavior.
   If the problem can be avoided without using tmpfs, many users would
   welcome it.
  
 
 You mean this patch? https://github.com/beekhof/pacemaker/commit/c7e10c6
 Or another one?
 
  Best Regards,
  Hideo Yamauchi.
  
  
  --- On Wed, 2013/5/22, renayama19661...@ybb.ne.jp 
  renayama19661...@ybb.ne.jp wrote:
  
  Hi Andrew,
  Hi Vladislav,
  
   We have been testing with the pe files located in tmpfs.
   It seems to work well so far.
   
   I will check the behavior a little more, and then we are going to try
   the synchronization method that Mr. Vladislav described.
  
  Best Regards,
  Hideo Yamauchi.
  
  --- On Wed, 2013/5/22, Andrew Beekhof and...@beekhof.net wrote:
  
  
  On 17/05/2013, at 4:17 PM, Vladislav Bogdanov bub...@hoster-ok.com 
  wrote:
  
  P.S. Andrew, is this patch ok to apply?
  
  https://github.com/beekhof/pacemaker/commit/c7e10c6 :)
  
 
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-26 Thread renayama19661014
Hi Andrew,

I registered a request in Bugzilla.
 * http://bugs.clusterlabs.org/show_bug.cgi?id=5158

Many Thanks!
Hideo Yamauchi.

--- On Fri, 2013/5/24, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp 
wrote:

 Hi Andrew,
 
    To Andrew : 
    If you make a patch that removes the blocking from pengine's file
    handling, I will verify its behavior.
    If the problem can be avoided without using tmpfs, many users would
    welcome it.
   
  
  You mean this patch? https://github.com/beekhof/pacemaker/commit/c7e10c6
  Or another one?
 
  It is another patch that I need.
  What is needed is a mechanism that avoids the problem without using tmpfs.
  But this is only a request on our part, because ultimately it is an
  environmental problem of vSphere5.
 
 Best Regards,
 Hideo Yamauchi.
 
 --- On Fri, 2013/5/24, Andrew Beekhof and...@beekhof.net wrote:
 
  
  On 24/05/2013, at 2:58 PM, renayama19661...@ybb.ne.jp wrote:
  
   Hi Andrew,
   Hi Vladislav,
   
    We have been testing with the pe files located in tmpfs.
    It seems to work well so far.
    
    Just by adopting tmpfs, the I/O blocking of pengine was improved.
    I still need to verify synchronization to fixed storage, but I don't
    think there will be a problem from now on.
    Adopting tmpfs is one way to solve this problem.
   
   Pacemaker really isn't designed to function without disk access. 
   You might be able to get away with it if you turn off saving PE files to 
   disk though. 
   
    To Andrew : 
    If you make a patch that removes the blocking from pengine's file
    handling, I will verify its behavior.
    If the problem can be avoided without using tmpfs, many users would
    welcome it.
   
  
  You mean this patch? https://github.com/beekhof/pacemaker/commit/c7e10c6
  Or another one?
  
   Best Regards,
   Hideo Yamauchi.
   
   
   --- On Wed, 2013/5/22, renayama19661...@ybb.ne.jp 
   renayama19661...@ybb.ne.jp wrote:
   
   Hi Andrew,
   Hi Vladislav,
   
    We have been testing with the pe files located in tmpfs.
    It seems to work well so far.
    
    I will check the behavior a little more, and then we are going to try
    the synchronization method that Mr. Vladislav described.
   
   Best Regards,
   Hideo Yamauchi.
   
   --- On Wed, 2013/5/22, Andrew Beekhof and...@beekhof.net wrote:
   
   
   On 17/05/2013, at 4:17 PM, Vladislav Bogdanov bub...@hoster-ok.com 
   wrote:
   
   P.S. Andrew, is this patch ok to apply?
   
   https://github.com/beekhof/pacemaker/commit/c7e10c6 :)
   
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] [Problem] The state of a node cut with the node that rebooted by a cluster is not recognized.

2013-06-03 Thread renayama19661014
Hi All,

We checked the cluster's recognition of node states with the procedure below.
We tested with the following combination (RHEL6.4 guests):
 * corosync-2.3.0
 * pacemaker-Pacemaker-1.1.10-rc3

-

Step 1) Start all nodes and constitute a cluster.

[root@rh64-coro1 ~]# crm_mon -1 -Af
Last updated: Tue Jun  4 22:30:25 2013
Last change: Tue Jun  4 22:22:54 2013 via crmd on rh64-coro1
Stack: corosync
Current DC: rh64-coro3 (4231178432) - partition with quorum
Version: 1.1.9-db294e1
3 Nodes configured, unknown expected votes
0 Resources configured.


Online: [ rh64-coro1 rh64-coro2 rh64-coro3 ]


Node Attributes:
* Node rh64-coro1:
* Node rh64-coro2:
* Node rh64-coro3:

Migration summary:
* Node rh64-coro1: 
* Node rh64-coro3: 
* Node rh64-coro2: 


Step 2) Stop the first node.

[root@rh64-coro2 ~]# crm_mon -1 -Af
Last updated: Tue Jun  4 22:30:55 2013
Last change: Tue Jun  4 22:22:54 2013 via crmd on rh64-coro1
Stack: corosync
Current DC: rh64-coro3 (4231178432) - partition with quorum
Version: 1.1.9-db294e1
3 Nodes configured, unknown expected votes
0 Resources configured.


Online: [ rh64-coro2 rh64-coro3 ]
OFFLINE: [ rh64-coro1 ]


Node Attributes:
* Node rh64-coro2:
* Node rh64-coro3:

Migration summary:
* Node rh64-coro3: 
* Node rh64-coro2: 


Step 3) Restart the first node.

[root@rh64-coro1 ~]# crm_mon -1 -Af
Last updated: Tue Jun  4 22:31:29 2013
Last change: Tue Jun  4 22:22:54 2013 via crmd on rh64-coro1
Stack: corosync
Current DC: rh64-coro3 (4231178432) - partition with quorum
Version: 1.1.9-db294e1
3 Nodes configured, unknown expected votes
0 Resources configured.


Online: [ rh64-coro1 rh64-coro2 rh64-coro3 ]


Node Attributes:
* Node rh64-coro1:
* Node rh64-coro2:
* Node rh64-coro3:

Migration summary:
* Node rh64-coro1: 
* Node rh64-coro3: 
* Node rh64-coro2: 


Step 4) Interrupt the interconnect of all nodes.

[root@kvm-host ~]# brctl delif virbr2 vnet1;brctl delif virbr2 vnet4;brctl 
delif virbr2 vnet7;brctl delif virbr3 vnet2;brctl delif virbr3 vnet5;brctl 
delif virbr3 vnet8

-


The two nodes that did not reboot then recognize the other nodes correctly.

[root@rh64-coro2 ~]# crm_mon -1 -Af
Last updated: Tue Jun  4 22:32:06 2013
Last change: Tue Jun  4 22:22:54 2013 via crmd on rh64-coro1
Stack: corosync
Current DC: rh64-coro2 (4214401216) - partition WITHOUT quorum
Version: 1.1.9-db294e1
3 Nodes configured, unknown expected votes
0 Resources configured.


Node rh64-coro1 (4197624000): UNCLEAN (offline)
Node rh64-coro3 (4231178432): UNCLEAN (offline)
Online: [ rh64-coro2 ]


Node Attributes:
* Node rh64-coro2:

Migration summary:
* Node rh64-coro2: 

[root@rh64-coro3 ~]# crm_mon -1 -Af
Last updated: Tue Jun  4 22:33:17 2013
Last change: Tue Jun  4 22:22:54 2013 via crmd on rh64-coro1
Stack: corosync
Current DC: rh64-coro3 (4231178432) - partition WITHOUT quorum
Version: 1.1.9-db294e1
3 Nodes configured, unknown expected votes
0 Resources configured.


Node rh64-coro1 (4197624000): UNCLEAN (offline)
Node rh64-coro2 (4214401216): UNCLEAN (offline)
Online: [ rh64-coro3 ]


Node Attributes:
* Node rh64-coro3:

Migration summary:
* Node rh64-coro3: 


However, the node that rebooted does not correctly recognize the state of one
node.

[root@rh64-coro1 ~]# crm_mon -1 -Af
Last updated: Tue Jun  4 22:33:31 2013
Last change: Tue Jun  4 22:22:54 2013 via crmd on rh64-coro1
Stack: corosync
Current DC: rh64-coro1 (4197624000) - partition WITHOUT quorum
Version: 1.1.9-db294e1
3 Nodes configured, unknown expected votes
0 Resources configured.


Node rh64-coro3 (4231178432): UNCLEAN (offline)  -- OK.
Online: [ rh64-coro1 rh64-coro2 ]  -- rh64-coro2 NG.


Node Attributes:
* Node rh64-coro1:
* Node rh64-coro2:

Migration summary:
* Node rh64-coro1: 
* Node rh64-coro2: 


The correct behavior would be for the rebooted node to recognize the other
nodes as UNCLEAN, but it seems to recognize one of them incorrectly.

This looks like a problem in Pacemaker itself.
 * There seems to be a problem with the crm_peer_cache hash table.

Best Regards,
Hideo Yamauchi.


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem] The state of a node cut with the node that rebooted by a cluster is not recognized.

2013-06-03 Thread renayama19661014
Hi All,

I registered this problem with Bugzilla.

 * http://bugs.clusterlabs.org/show_bug.cgi?id=5160

Best Regards,
Hideo Yamauchi.

--- On Tue, 2013/6/4, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp 
wrote:

 Hi All,
 
 We checked the cluster's recognition of node states with the procedure below.
 We tested with the following combination (RHEL6.4 guests):
  * corosync-2.3.0
  * pacemaker-Pacemaker-1.1.10-rc3
 
 -
 
 Step 1) Start all nodes and constitute a cluster.
 
 [root@rh64-coro1 ~]# crm_mon -1 -Af
 Last updated: Tue Jun  4 22:30:25 2013
 Last change: Tue Jun  4 22:22:54 2013 via crmd on rh64-coro1
 Stack: corosync
 Current DC: rh64-coro3 (4231178432) - partition with quorum
 Version: 1.1.9-db294e1
 3 Nodes configured, unknown expected votes
 0 Resources configured.
 
 
 Online: [ rh64-coro1 rh64-coro2 rh64-coro3 ]
 
 
 Node Attributes:
 * Node rh64-coro1:
 * Node rh64-coro2:
 * Node rh64-coro3:
 
 Migration summary:
 * Node rh64-coro1: 
 * Node rh64-coro3: 
 * Node rh64-coro2: 
 
 
 Step 2) Stop the first node.
 
 [root@rh64-coro2 ~]# crm_mon -1 -Af
 Last updated: Tue Jun  4 22:30:55 2013
 Last change: Tue Jun  4 22:22:54 2013 via crmd on rh64-coro1
 Stack: corosync
 Current DC: rh64-coro3 (4231178432) - partition with quorum
 Version: 1.1.9-db294e1
 3 Nodes configured, unknown expected votes
 0 Resources configured.
 
 
 Online: [ rh64-coro2 rh64-coro3 ]
 OFFLINE: [ rh64-coro1 ]
 
 
 Node Attributes:
 * Node rh64-coro2:
 * Node rh64-coro3:
 
 Migration summary:
 * Node rh64-coro3: 
 * Node rh64-coro2: 
 
 
 Step 3) Restart the first node.
 
 [root@rh64-coro1 ~]# crm_mon -1 -Af
 Last updated: Tue Jun  4 22:31:29 2013
 Last change: Tue Jun  4 22:22:54 2013 via crmd on rh64-coro1
 Stack: corosync
 Current DC: rh64-coro3 (4231178432) - partition with quorum
 Version: 1.1.9-db294e1
 3 Nodes configured, unknown expected votes
 0 Resources configured.
 
 
 Online: [ rh64-coro1 rh64-coro2 rh64-coro3 ]
 
 
 Node Attributes:
 * Node rh64-coro1:
 * Node rh64-coro2:
 * Node rh64-coro3:
 
 Migration summary:
 * Node rh64-coro1: 
 * Node rh64-coro3: 
 * Node rh64-coro2: 
 
 
 Step 4) Interrupt the interconnect of all nodes.
 
 [root@kvm-host ~]# brctl delif virbr2 vnet1;brctl delif virbr2 vnet4;brctl 
 delif virbr2 vnet7;brctl delif virbr3 vnet2;brctl delif virbr3 vnet5;brctl 
 delif virbr3 vnet8
 
 -
 
 
 The two nodes that did not reboot then recognize the other nodes correctly.
 
 [root@rh64-coro2 ~]# crm_mon -1 -Af
 Last updated: Tue Jun  4 22:32:06 2013
 Last change: Tue Jun  4 22:22:54 2013 via crmd on rh64-coro1
 Stack: corosync
 Current DC: rh64-coro2 (4214401216) - partition WITHOUT quorum
 Version: 1.1.9-db294e1
 3 Nodes configured, unknown expected votes
 0 Resources configured.
 
 
 Node rh64-coro1 (4197624000): UNCLEAN (offline)
 Node rh64-coro3 (4231178432): UNCLEAN (offline)
 Online: [ rh64-coro2 ]
 
 
 Node Attributes:
 * Node rh64-coro2:
 
 Migration summary:
 * Node rh64-coro2: 
 
 [root@rh64-coro3 ~]# crm_mon -1 -Af
 Last updated: Tue Jun  4 22:33:17 2013
 Last change: Tue Jun  4 22:22:54 2013 via crmd on rh64-coro1
 Stack: corosync
 Current DC: rh64-coro3 (4231178432) - partition WITHOUT quorum
 Version: 1.1.9-db294e1
 3 Nodes configured, unknown expected votes
 0 Resources configured.
 
 
 Node rh64-coro1 (4197624000): UNCLEAN (offline)
 Node rh64-coro2 (4214401216): UNCLEAN (offline)
 Online: [ rh64-coro3 ]
 
 
 Node Attributes:
 * Node rh64-coro3:
 
 Migration summary:
 * Node rh64-coro3: 
 
 
 However, the node that rebooted does not correctly recognize the state of
 one node.
 
 [root@rh64-coro1 ~]# crm_mon -1 -Af
 Last updated: Tue Jun  4 22:33:31 2013
 Last change: Tue Jun  4 22:22:54 2013 via crmd on rh64-coro1
 Stack: corosync
 Current DC: rh64-coro1 (4197624000) - partition WITHOUT quorum
 Version: 1.1.9-db294e1
 3 Nodes configured, unknown expected votes
 0 Resources configured.
 
 
 Node rh64-coro3 (4231178432): UNCLEAN (offline)  -- OK.
 Online: [ rh64-coro1 rh64-coro2 ]  -- rh64-coro2 NG.
 
 
 Node Attributes:
 * Node rh64-coro1:
 * Node rh64-coro2:
 
 Migration summary:
 * Node rh64-coro1: 
 * Node rh64-coro2: 
 
 
 The correct behavior would be for the rebooted node to recognize the other
 nodes as UNCLEAN, but it seems to recognize one of them incorrectly.
 
 This looks like a problem in Pacemaker itself.
  * There seems to be a problem with the crm_peer_cache hash table.
 
 Best Regards,
 Hideo Yamauchi.
 
 
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project 

Re: [Pacemaker] [Problem] The state of a node cut with the node that rebooted by a cluster is not recognized.

2013-06-04 Thread renayama19661014
Hi Andrew,

 Yep, sounds like a problem.
 I'll follow up on bugzilla

All right!

Many Thanks!
Hideo Yamauchi.

--- On Tue, 2013/6/4, Andrew Beekhof and...@beekhof.net wrote:

 
 On 04/06/2013, at 3:00 PM, renayama19661...@ybb.ne.jp wrote:
 
  
   The correct behavior would be for the rebooted node to recognize the
   other nodes as UNCLEAN, but it seems to recognize one of them
   incorrectly.
   
   This looks like a problem in Pacemaker itself.
   * There seems to be a problem with the crm_peer_cache hash table.
 
 Yep, sounds like a problem.
 I'll follow up on bugzilla

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] [Problem]Two error information is displayed.

2013-08-28 Thread renayama19661014
Hi All,

Although the failure occurred only once, two error entries are displayed in crm_mon.

-

[root@rh64-coro2 ~]# crm_mon -1 -Af
Last updated: Thu Aug 29 18:11:00 2013
Last change: Thu Aug 29 18:10:45 2013 via cibadmin on rh64-coro2
Stack: corosync
Current DC: NONE
1 Nodes configured
1 Resources configured


Online: [ rh64-coro2 ]


Node Attributes:
* Node rh64-coro2:

Migration summary:
* Node rh64-coro2: 
   dummy: migration-threshold=1 fail-count=1 last-failure='Thu Aug 29 18:10:57 
2013'

Failed actions:
dummy_monitor_3000 on (null) 'not running' (7): call=11, status=complete, 
last-rc-change='Thu Aug 29 18:10:57 2013', queued=0ms, exec=0ms
dummy_monitor_3000 on rh64-coro2 'not running' (7): call=11, 
status=complete, last-rc-change='Thu Aug 29 18:10:57 2013', queued=0ms, exec=0ms

-

There seems to be a problem in the logic that decides when to add the error
information.

-
static void
unpack_rsc_op_failure(resource_t *rsc, node_t *node, int rc, xmlNode *xml_op,
                      enum action_fail_response *on_fail, pe_working_set_t *data_set)
{
    int interval = 0;
    bool is_probe = FALSE;
    action_t *action = NULL;
(snip)
    if (rc != PCMK_OCF_NOT_INSTALLED || is_set(data_set->flags, pe_flag_symmetric_cluster)) {
        if ((node->details->shutdown == FALSE) || (node->details->online == TRUE)) {
            add_node_copy(data_set->failed, xml_op);
        }
    }

    crm_xml_add(xml_op, XML_ATTR_UNAME, node->details->uname);
    if ((node->details->shutdown == FALSE) || (node->details->online == TRUE)) {
        add_node_copy(data_set->failed, xml_op);
    }
(snip)
-


Please revise this handling so that the failure is recorded only once.
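
For illustration, a hedged sketch of one possible consolidation (it restates
the report above and is not necessarily the actual upstream fix; the
not-installed/symmetric-cluster guard is omitted for brevity):

/* Sketch only: attach the node name first, then record the failed op
 * exactly once.  Types and helpers are the pacemaker internals shown in
 * the snippet above. */
static void
record_failed_op(xmlNode *xml_op, node_t *node, pe_working_set_t *data_set)
{
    crm_xml_add(xml_op, XML_ATTR_UNAME, node->details->uname);

    if ((node->details->shutdown == FALSE) || (node->details->online == TRUE)) {
        add_node_copy(data_set->failed, xml_op);  /* single copy */
    }
}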

Best Regards,
Hideo Yamauchi.


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem]Two error information is displayed.

2013-08-29 Thread renayama19661014
Hi Andres,

Thank you for comment.

 But to be seriously: I see this phaenomena, too.
 (pacemaker 1.1.11-1.el6-4f672bc)

If the version you tested is the same as the following, it is probably the
same problem.
There is similar code there:
(https://github.com/ClusterLabs/pacemaker/blob/4f672bc85eefd33e2fb09b601bb8ec1510645468/lib/pengine/unpack.c)

Best Regards,
Hideo Yamauchi.

--- On Thu, 2013/8/29, Andreas Mock andreas.m...@web.de wrote:

 Hi Hideo-san,
 
 the two lines shall emphasize that you do not merely have trouble
 but real trouble...  ;-)
 
 But to be seriously: I see this phaenomena, too.
 (pacemaker 1.1.11-1.el6-4f672bc)
 
 Best regards
 Andreas Mock
 
 -----Original Message-----
 From: renayama19661...@ybb.ne.jp [mailto:renayama19661...@ybb.ne.jp] 
 Sent: Thursday, 29 August 2013 02:38
 To: PaceMaker-ML
 Subject: [Pacemaker] [Problem]Two error information is displayed.
 
 Hi All,
 
 Although the failure occurred only once, two error entries are displayed in
 crm_mon.
 
 -
 
 [root@rh64-coro2 ~]# crm_mon -1 -Af
 Last updated: Thu Aug 29 18:11:00 2013
 Last change: Thu Aug 29 18:10:45 2013 via cibadmin on rh64-coro2
 Stack: corosync
 Current DC: NONE
 1 Nodes configured
 1 Resources configured
 
 
 Online: [ rh64-coro2 ]
 
 
 Node Attributes:
 * Node rh64-coro2:
 
 Migration summary:
 * Node rh64-coro2: 
    dummy: migration-threshold=1 fail-count=1 last-failure='Thu Aug 29
 18:10:57 2013'
 
 Failed actions:
     dummy_monitor_3000 on (null) 'not running' (7): call=11,
 status=complete, last-rc-change='Thu Aug 29 18:10:57 2013', queued=0ms,
 exec=0ms
     dummy_monitor_3000 on rh64-coro2 'not running' (7): call=11,
 status=complete, last-rc-change='Thu Aug 29 18:10:57 2013', queued=0ms,
 exec=0ms
 
 -
 
 There seems to be a problem in the logic that decides when to add the
 error information.
 
 -
 static void
 unpack_rsc_op_failure(resource_t *rsc, node_t *node, int rc, xmlNode *xml_op,
                       enum action_fail_response *on_fail, pe_working_set_t *data_set)
 {
     int interval = 0;
     bool is_probe = FALSE;
     action_t *action = NULL;
 (snip)
     if (rc != PCMK_OCF_NOT_INSTALLED || is_set(data_set->flags,
 pe_flag_symmetric_cluster)) {
         if ((node->details->shutdown == FALSE) || (node->details->online ==
 TRUE)) {
             add_node_copy(data_set->failed, xml_op);
         }
     }
 
     crm_xml_add(xml_op, XML_ATTR_UNAME, node->details->uname);
     if ((node->details->shutdown == FALSE) || (node->details->online ==
 TRUE)) {
         add_node_copy(data_set->failed, xml_op);
     }
 (snip)
 -
 
 
 Please revise this handling so that the failure is recorded only once.
 
 Best Regards,
 Hideo Yamauchi.
 
 
 
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem]Two error information is displayed.

2013-09-03 Thread renayama19661014
Hi Andrew,

  Hi All,
  
  Although the failure occurred only once, two error entries are displayed
  in crm_mon.
 
 Have you got the full cib for when crm_mon is showing this?

No.
I will reproduce the problem once more and capture the cib.

Best Regards,
Hideo Yamauchi.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem]Two error information is displayed.

2013-09-03 Thread renayama19661014
Hi Andrew,

   Although the failure occurred only once, two error entries are displayed
   in crm_mon.
  
  Have you got the full cib for when crm_mon is showing this?
 
 No.
 I will reproduce the problem once more and capture the cib.

I am sending the output captured with the cibadmin -Q command.

Best Regards,
Hideo Yamauchi.

--- On Tue, 2013/9/3, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp 
wrote:

 Hi Andrew,
 
   Hi All,
   
   Although the failure occurred only once, two error entries are displayed
   in crm_mon.
  
  Have you got the full cib for when crm_mon is showing this?
 
 No.
 I will reproduce the problem once more and capture the cib.
 
 Best Regards,
 Hideo Yamauchi.
 
<cib epoch="7" num_updates="8" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.7" cib-last-written="Wed Sep  4 18:26:09 2013" update-origin="rh64-coro2" update-client="cibadmin" have-quorum="1" dc-uuid="1084752244">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.11-0.1.e6a6a48.git.el6-e6a6a48"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
        <nvpair name="no-quorum-policy" value="ignore" id="cib-bootstrap-options-no-quorum-policy"/>
        <nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="3232248571" uname="rh64-coro2"/>
    </nodes>
    <resources>
      <primitive id="dummy" class="ocf" provider="pacemaker" type="Dummy">
        <operations>
          <op name="start" timeout="60s" interval="0s" on-fail="restart" id="dummy-start-0s"/>
          <op name="monitor" timeout="60s" interval="3s" on-fail="restart" id="dummy-monitor-3s"/>
          <op name="monitor" timeout="60s" interval="2s" on-fail="restart" role="Master" id="dummy-monitor-2s"/>
          <op name="stop" timeout="60s" interval="0s" on-fail="block" id="dummy-stop-0s"/>
        </operations>
      </primitive>
    </resources>
    <constraints/>
    <rsc_defaults>
      <meta_attributes id="rsc-options">
        <nvpair name="resource-stickiness" value="INFINITY" id="rsc-options-resource-stickiness"/>
        <nvpair name="migration-threshold" value="1" id="rsc-options-migration-threshold"/>
      </meta_attributes>
    </rsc_defaults>
  </configuration>
  <status>
    <node_state id="3232248571" uname="rh64-coro2" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
      <lrm id="3232248571">
        <lrm_resources>
          <lrm_resource id="dummy" type="Dummy" class="ocf" provider="pacemaker">
            <lrm_rsc_op id="dummy_last_0" operation_key="dummy_stop_0" operation="stop" crm-debug-origin="do_update_resource" crm_feature_set="3.0.7" transition-key="2:4:0:0c6dc5a5-ddf5-4755-b4dd-af090a180786" transition-magic="0:0;2:4:0:0c6dc5a5-ddf5-4755-b4dd-af090a180786" call-id="16" rc-code="0" op-status="0" interval="0" last-run="1378286890" last-rc-change="1378286890" exec-time="15" queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8" op-force-restart=" state  op_sleep " op-restart-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
            <lrm_rsc_op id="dummy_monitor_3000" operation_key="dummy_monitor_3000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.7" transition-key="6:1:0:0c6dc5a5-ddf5-4755-b4dd-af090a180786" transition-magic="0:0;6:1:0:0c6dc5a5-ddf5-4755-b4dd-af090a180786" call-id="11" rc-code="0" op-status="0" interval="3000" last-rc-change="1378286769" exec-time="16" queue-time="0" op-digest="873ed4f07792aa8ff18f3254244675ea"/>
            <lrm_rsc_op id="dummy_last_failure_0" operation_key="dummy_monitor_3000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.7" transition-key="6:1:0:0c6dc5a5-ddf5-4755-b4dd-af090a180786" transition-magic="0:7;6:1:0:0c6dc5a5-ddf5-4755-b4dd-af090a180786" call-id="11" rc-code="7" op-status="0" interval="3000" last-rc-change="1378286890" exec-time="0" queue-time="0" op-digest="873ed4f07792aa8ff18f3254244675ea"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
      <transient_attributes id="3232248571">
        <instance_attributes id="status-3232248571">
          <nvpair id="status-3232248571-probe_complete" name="probe_complete" value="true"/>
          <nvpair id="status-3232248571-fail-count-dummy" name="fail-count-dummy" value="1"/>
          <nvpair id="status-3232248571-last-failure-dummy" name="last-failure-dummy" value="1378286890"/>
        </instance_attributes>
      </transient_attributes>
    </node_state>
  </status>
</cib>
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: 

Re: [Pacemaker] [Problem]Two error information is displayed.

2013-09-03 Thread renayama19661014
Hi Andrew,

I confirmed that the problem is solved by the fix.

Thanks!
Hideo Yamauchi.

--- On Wed, 2013/9/4, Andrew Beekhof and...@beekhof.net wrote:

 Thanks (also to Andreas for sending me an example too)!
 
 Fixed:
    https://github.com/beekhof/pacemaker/commit/a32474b
 
 On 04/09/2013, at 11:02 AM, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
  
   Although the failure occurred only once, two error entries are displayed
   in crm_mon.
   
   Have you got the full cib for when crm_mon is showing this?
   
   No.
   I will reproduce the problem once more and capture the cib.
   
   I am sending the output captured with the cibadmin -Q command.
  
  Best Regards,
  Hideo Yamauchi.
  
  --- On Tue, 2013/9/3, renayama19661...@ybb.ne.jp 
  renayama19661...@ybb.ne.jp wrote:
  
  Hi Andrew,
  
  Hi All,
  
   Although the failure occurred only once, two error entries are displayed
   in crm_mon.
   
   Have you got the full cib for when crm_mon is showing this?
   
   No.
   I will reproduce the problem once more and capture the cib.
  
  Best Regards,
  Hideo Yamauchi.
  
   cib-Q.xml
 
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] A resource starts with a standby node.(Latest attrd does not serve as the crmd-transition-delay parameter)

2014-01-13 Thread renayama19661014
Hi All,

I previously filed the following Bugzilla entry about a problem caused by
differences in the timing of attribute updates by attrd.
 * https://developerbugs.linuxfoundation.org/show_bug.cgi?id=2528

We can currently work around this problem by using the crmd-transition-delay
parameter.

I recently checked whether the renewed attrd avoids this problem.
 * In the latest attrd, one instance becomes the leader and appears to take
over updating the attributes.

However, the latest attrd does not seem to substitute for
crmd-transition-delay.
 * I will contribute detailed logs later.

We are unhappy about having to keep using crmd-transition-delay.
Is there a plan for attrd to handle this problem properly in the future?

Best Regards,
Hideo Yamauchi.


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] A resource starts with a standby node.(Latest attrd does not serve as the crmd-transition-delay parameter)

2014-01-13 Thread renayama19661014
Hi Andrew,

Thank you for comments.

 Are you using the new attrd code or the legacy stuff?

I use the new attrd.

 
 If you're not using corosync 2.x or see:
 
     crm_notice(Starting mainloop...);
 
 then its the old code.  The new code could also be used with CMAN but isn't 
 configured to build for in that situation.
 
 Only the new code makes (or at least should do) crmd-transition-delay 
 redundant.

To me, the new attrd did not seem to make crmd-transition-delay unnecessary.
I will report the details separately.
# It will probably go to Bugzilla...

Best Regards,
Hideo Yamauchi.

--- On Tue, 2014/1/14, Andrew Beekhof and...@beekhof.net wrote:

 
 On 14 Jan 2014, at 3:52 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi All,
  
  I previously filed the following Bugzilla entry about a problem caused by
  differences in the timing of attribute updates by attrd.
  * https://developerbugs.linuxfoundation.org/show_bug.cgi?id=2528
  
  We can currently work around this problem by using the
  crmd-transition-delay parameter.
  
  I recently checked whether the renewed attrd avoids this problem.
  * In the latest attrd, one instance becomes the leader and appears to take
  over updating the attributes.
  
  However, the latest attrd does not seem to substitute for
  crmd-transition-delay.
  * I will contribute detailed logs later.
  
  We are unhappy about having to keep using crmd-transition-delay.
  Is there a plan for attrd to handle this problem properly in the future?
 
 Are you using the new attrd code or the legacy stuff?
 
 If you're not using corosync 2.x or see:
 
     crm_notice(Starting mainloop...);
 
 then its the old code.  The new code could also be used with CMAN but isn't 
 configured to build for in that situation.
 
 Only the new code makes (or at least should do) crmd-transition-delay 
 redundant.
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] A resource starts with a standby node.(Latest attrd does not serve as the crmd-transition-delay parameter)

2014-01-13 Thread renayama19661014
Hi Andrew,

  Are you using the new attrd code or the legacy stuff?
  
   I use the new attrd.
 
 And the values are not being sent to the cib at the same time? 

As far as I could see...
When one node's attrd was late transmitting its attribute, the attrd leader
seemed to send the attributes to the cib without waiting for it.

  Only the new code makes (or at least should do) crmd-transition-delay 
  redundant.
  
  To me, the new attrd did not seem to make crmd-transition-delay
  unnecessary.
  I will report the details separately.
  # It will probably go to Bugzilla...
 
 Sounds good

All right!

Many Thanks!
Hideo Yamauch.

--- On Tue, 2014/1/14, Andrew Beekhof and...@beekhof.net wrote:

 
 On 14 Jan 2014, at 4:13 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
  
  Thank you for comments.
  
  Are you using the new attrd code or the legacy stuff?
  
   I use the new attrd.
 
 And the values are not being sent to the cib at the same time? 
 
  
  
  If you're not using corosync 2.x or see:
  
      crm_notice(Starting mainloop...);
  
  then its the old code.  The new code could also be used with CMAN but 
  isn't configured to build for in that situation.
  
  Only the new code makes (or at least should do) crmd-transition-delay 
  redundant.
  
   To me, the new attrd did not seem to make crmd-transition-delay
   unnecessary.
   I will report the details separately.
   # It will probably go to Bugzilla...
 
 Sounds good
 
  
  Best Regards,
  Hideo Yamauchi.
  
  --- On Tue, 2014/1/14, Andrew Beekhof and...@beekhof.net wrote:
  
  
  On 14 Jan 2014, at 3:52 pm, renayama19661...@ybb.ne.jp wrote:
  
  Hi All,
  
   I previously filed the following Bugzilla entry about a problem caused by
   differences in the timing of attribute updates by attrd.
   * https://developerbugs.linuxfoundation.org/show_bug.cgi?id=2528
   
   We can currently work around this problem by using the
   crmd-transition-delay parameter.
   
   I recently checked whether the renewed attrd avoids this problem.
   * In the latest attrd, one instance becomes the leader and appears to
   take over updating the attributes.
   
   However, the latest attrd does not seem to substitute for
   crmd-transition-delay.
   * I will contribute detailed logs later.
   
   We are unhappy about having to keep using crmd-transition-delay.
   Is there a plan for attrd to handle this problem properly in the future?
  
  Are you using the new attrd code or the legacy stuff?
  
  If you're not using corosync 2.x or see:
  
      crm_notice(Starting mainloop...);
  
  then its the old code.  The new code could also be used with CMAN but 
  isn't configured to build for in that situation.
  
  Only the new code makes (or at least should do) crmd-transition-delay 
  redundant.
  
 
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] [Enhancement] Change of the globally-unique attribute of the resource.

2014-01-14 Thread renayama19661014
Hi All,

When a user changes a resource's globally-unique attribute, a problem
occurs.

This happens for resources managed via a PID file, because the PID file name
depends on the globally-unique attribute.

(snip)
# The pidfile name depends on the globally-unique meta attribute: with
# globally-unique=false it is derived from the 'name' parameter, with
# globally-unique=true from the instance name (e.g. prmPingd:0).
if [ "${OCF_RESKEY_CRM_meta_globally_unique}" = "false" ]; then
    : ${OCF_RESKEY_pidfile:=$HA_VARRUN/ping-${OCF_RESKEY_name}}
else
    : ${OCF_RESKEY_pidfile:=$HA_VARRUN/ping-${OCF_RESOURCE_INSTANCE}}
fi
(snip)


The problem can be reproduced with the following procedure.

* Step1: Started a resource.
(snip)
primitive prmPingd ocf:pacemaker:pingd \
params name=default_ping_set host_list=192.168.0.1 multiplier=200 
\
op start interval=0s timeout=60s on-fail=restart \
op monitor interval=10s timeout=60s on-fail=restart \
op stop interval=0s timeout=60s on-fail=ignore
clone clnPingd prmPingd
(snip)

* Step2: Change globally-unique attribute.

[root]# crm configure edit
(snip)
clone clnPingd prmPingd \
meta clone-max=2 clone-node-max=2 globally-unique=true
(snip)

* Step3: Stop Pacemaker

But the resource does not stop, because the PID file name changed for the
resource whose globally-unique attribute was changed.

I think that this is a known problem.

I hope this problem will be solved in the future.

Best Regards,
Hideo Yamauchi.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Enhancement] Change of the globally-unique attribute of the resource.

2014-01-14 Thread renayama19661014
Hi Andrew,

Thank you for the comment.

  But the resource does not stop, because the PID file name changed for the
  resource whose globally-unique attribute was changed.
 
 I'd have expected the stop action to be performed with the old attributes.
 crm_report tarball?

Okay.

I will register this topic in Bugzilla.
I will attach the logs to Bugzilla.

Best Regards,
Hideo Yamauchi.
--- On Wed, 2014/1/15, Andrew Beekhof and...@beekhof.net wrote:

 
 On 14 Jan 2014, at 7:26 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi All,
  
  When a user changes a resource's globally-unique attribute, a problem
  occurs.
  
  This happens for resources managed via a PID file, because the PID file
  name depends on the globally-unique attribute.
  
  (snip)
  if [ "${OCF_RESKEY_CRM_meta_globally_unique}" = "false" ]; then
     : ${OCF_RESKEY_pidfile:=$HA_VARRUN/ping-${OCF_RESKEY_name}}
  else
     : ${OCF_RESKEY_pidfile:=$HA_VARRUN/ping-${OCF_RESOURCE_INSTANCE}}
  fi
  (snip)
 
 This is correct.  The pid file cannot include the instance number when 
 globally-unique is false and must do so when it is true.
 
  
  
  The problem can reappear in the following procedure.
  
  * Step1: Started a resource.
  (snip)
  primitive prmPingd ocf:pacemaker:pingd \
         params name=default_ping_set host_list=192.168.0.1 
 multiplier=200 \
         op start interval=0s timeout=60s on-fail=restart \
         op monitor interval=10s timeout=60s on-fail=restart \
         op stop interval=0s timeout=60s on-fail=ignore
  clone clnPingd prmPingd
  (snip)
  
  * Step2: Change globally-unique attribute.
  
  [root]# crm configure edit
  (snip)
  clone clnPingd prmPingd \
     meta clone-max=2 clone-node-max=2 globally-unique=true
  (snip)
  
  * Step3: Stop Pacemaker
  
   But the resource does not stop, because the PID file name changed for
   the resource whose globally-unique attribute was changed.
 
 I'd have expected the stop action to be performed with the old attributes.
 crm_report tarball?
 
 
  
  I think that this is a known problem.
 
 It wasn't until now.
 
  
   I hope this problem will be solved in the future.
  
  Best Regards,
  Hideo Yamauchi.
  
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
 
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] [Question] About replacing in resource_set of the order limitation.

2014-01-16 Thread renayama19661014
Hi All,

We are verifying the resource_set functionality.

We have a group resource and a clone resource.

(snip)
Stack: corosync
Current DC: srv01 (3232238180) - partition WITHOUT quorum
Version: 1.1.10-f2d0cbc
1 Nodes configured
7 Resources configured


Online: [ srv01 ]

 Resource Group: grpPg
 A  (ocf::heartbeat:Dummy): Started srv01 
 B  (ocf::heartbeat:Dummy): Started srv01 
 C  (ocf::heartbeat:Dummy): Started srv01 
 D  (ocf::heartbeat:Dummy): Started srv01 
 E  (ocf::heartbeat:Dummy): Started srv01 
 F  (ocf::heartbeat:Dummy): Started srv01 
 Clone Set: clnPing [prmPing]
 Started: [ srv01 ]

Node Attributes:
* Node srv01:
+ default_ping_set  : 100   

Migration summary:
* Node srv01: 

(snip)

They have the following constraints.

(snip)
      <rsc_colocation id="rsc_colocation-grpPg-clnPing" score="INFINITY"
                      rsc="grpPg" with-rsc="clnPing"/>
      <rsc_order id="rsc_order-clnPing-grpPg" score="0" first="clnPing"
                 then="grpPg" symmetrical="false"/>
(snip)


We tried rewriting the group as a resource_set.
I think the colocation constraint can be rewritten as follows.

(snip)
      <rsc_colocation id="rsc_colocation-grpPg-clnPing" score="INFINITY">
        <resource_set id="rsc_colocation-grpPg-clnPing-0">
          <resource_ref id="clnPing"/>
          <resource_ref id="A"/>
          ...
          <resource_ref id="F"/>
        </resource_set>
      </rsc_colocation>
(snip)

How should I rewrite the order constraint using resource_set?

I thought it was necessary to express the following two points, but I could
not find a good way to write them:

 * symmetrical=true is needed between the resources that formed the group (A
to F).
 * symmetrical=false is needed between the former group resources (A to F)
and the clone resource.

I wrote it as follows.
However, I think symmetrical=false is then applied to every order constraint
in this expression.
(snip)
      <rsc_order id="rsc_order-clnPing-grpPg1" score="0" symmetrical="false">
        <resource_set id="rsc_order-clnPing-grpPg1-0">
          <resource_ref id="clnPing"/>
        </resource_set>
        <resource_set id="rsc_order-clnPing-grpPg1-1">
          <resource_ref id="A"/>
          ...
          <resource_ref id="F"/>
        </resource_set>
      </rsc_order>
(snip)

Best Regards,
Hideo Yamauchi.



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Question] About replacing in resource_set of the order limitation.

2014-01-21 Thread renayama19661014
Hi All,

My test seems to have included a mistake.
It looks like it can be expressed with the following two constraints.

 However, I think symmetrical=false is then applied to every order
 constraint in this expression.
 (snip)
       <rsc_order id="rsc_order-clnPing-grpPg1" score="0" symmetrical="false">
         <resource_set id="rsc_order-clnPing-grpPg1-0">
           <resource_ref id="clnPing"/>
         </resource_set>
         <resource_set id="rsc_order-clnPing-grpPg1-1">
           <resource_ref id="A"/>
           ...
           <resource_ref id="F"/>
         </resource_set>
       </rsc_order>
 (snip)


  <rsc_order id="rsc_order-clnPing-grpPg1" score="0" first="clnPing"
             then="prmEx" symmetrical="false"/>
  <rsc_order id="rsc_order-clnPing-grpPg2" score="0" symmetrical="true">
    <resource_set id="rsc_order-clnPing-grpPg2-0" require-all="false">
      <resource_ref id="prmEx"/>
      <resource_ref id="prmFs1"/>
      <resource_ref id="prmFs2"/>
      <resource_ref id="prmFs3"/>
      <resource_ref id="prmIp"/>
      <resource_ref id="prmPg"/>
    </resource_set>
  </rsc_order>

If my understanding is mistaken, please point it out.

Best Reagards,
Hideo Yamauchi.

--- On Fri, 2014/1/17, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp 
wrote:

 [snipped]

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] A resource starts with a standby node.(Latest attrd does not serve as the crmd-transition-delay parameter)

2014-01-30 Thread renayama19661014
Hi Andrew,

Sorry for the delay.
I registered this problem in Bugzilla.
The report file is attached there, too.

 * http://bugs.clusterlabs.org/show_bug.cgi?id=5194
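
(For reference -- not part of the original message -- the workaround discussed
in this thread is the crmd-transition-delay cluster property. A minimal crmsh
sketch; the 2s value is illustrative only:)

property crmd-transition-delay="2s"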

Best Regards,
Hideo Yamauchi.

--- On Tue, 2014/1/14, Andrew Beekhof and...@beekhof.net wrote:

 
 On 14 Jan 2014, at 4:33 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
  
  Are you using the new attrd code or the legacy stuff?
  
  I use new attrd.
  
  And the values are not being sent to the cib at the same time? 
  
  As far as I looked. . .
   When a node's attrd was slow to transmit its attribute, the attrd leader 
   seemed to send the attributes to the cib without waiting for it.
 
 And you have a delay configured?  And this value was set prior to that delay 
 expiring?
 
  
  Only the new code makes (or at least should do) crmd-transition-delay 
  redundant.
  
   To me, the new attrd did not seem to make crmd-transition-delay 
   unnecessary.
  I report the details again.
  # Probably it will be Bugzilla. . .
  
  Sounds good
  
  All right!
  
  Many Thanks!
  Hideo Yamauch.
  
  --- On Tue, 2014/1/14, Andrew Beekhof and...@beekhof.net wrote:
  
  
  On 14 Jan 2014, at 4:13 pm, renayama19661...@ybb.ne.jp wrote:
  
  Hi Andrew,
  
  Thank you for comments.
  
  Are you using the new attrd code or the legacy stuff?
  
  I use new attrd.
  
  And the values are not being sent to the cib at the same time? 
  
  
  
  If you're not using corosync 2.x or see:
  
        crm_notice("Starting mainloop...");
  
   then it's the old code.  The new code could also be used with CMAN but 
   isn't configured to be built in that situation.
  
  Only the new code makes (or at least should do) crmd-transition-delay 
  redundant.
  
   To me, the new attrd did not seem to make crmd-transition-delay 
   unnecessary.
  I report the details again.
  # Probably it will be Bugzilla. . .
  
  Sounds good
  
  
  Best Regards,
  Hideo Yamauchi.
  
  --- On Tue, 2014/1/14, Andrew Beekhof and...@beekhof.net wrote:
  
  
  On 14 Jan 2014, at 3:52 pm, renayama19661...@ybb.ne.jp wrote:
  
   Hi All,
   
   I previously filed the following Bugzilla entry for a problem caused by 
   differences in the timing of attribute updates by attrd.
   * https://developerbugs.linuxfoundation.org/show_bug.cgi?id=2528
   
   We can currently avoid this problem by using the crmd-transition-delay 
   parameter.
   
   I recently checked whether the renewed attrd avoids this problem.
   * In the latest attrd, one instance becomes the leader and appears to 
   handle the attribute updates.
   
   However, the latest attrd does not seem to replace crmd-transition-delay.
   * I will contribute a detailed log later.
   
   We would prefer not to keep relying on crmd-transition-delay.
   Are there plans for attrd to handle this problem properly in the future?
  
  Are you using the new attrd code or the legacy stuff?
  
  If you're not using corosync 2.x or see:
  
        crm_notice("Starting mainloop...");
  
   then it's the old code.  The new code could also be used with CMAN but 
   isn't configured to be built in that situation.
  
  Only the new code makes (or at least should do) crmd-transition-delay 
  redundant.
  
  
  
 
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] [Question:crmsh] About a setting method of sequential=true in crmsh.

2014-02-04 Thread renayama19661014
Hi All,

We tried to set the sequential attribute of a colocation resource_set to true 
in crmsh.

We tried the following methods, but could not get it set to true.

-
[pengine]# crm --version
2.0 (Build 7cd5688c164d2949009accc7f172ce559cadbc4b)

- Pattern 1 - 
colocation rsc_colocation-grpPg-clnPing INFINITY: msPostgresql:Master 
vip-master vip-rep sequential=true 

  <rsc_colocation id="rsc_colocation-grpPg-clnPing" score="INFINITY">
    <resource_set id="rsc_colocation-grpPg-clnPing-0" role="Master">
      <resource_ref id="msPostgresql"/>
    </resource_set>
    <resource_set id="rsc_colocation-grpPg-clnPing-1">
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
  </rsc_colocation>

- Pattern 2 - 
colocation rsc_colocation-grpPg-clnPing INFINITY: msPostgresql:Master 
vip-master vip-rep sequential=false

  <rsc_colocation id="rsc_colocation-grpPg-clnPing" score="INFINITY">
    <resource_set id="rsc_colocation-grpPg-clnPing-0" role="Master">
      <resource_ref id="msPostgresql"/>
    </resource_set>
    <resource_set id="rsc_colocation-grpPg-clnPing-1" sequential="false">
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
  </rsc_colocation>

- Pattern 3 - 
colocation rsc_colocation-grpPg-clnPing INFINITY: msPostgresql:Master 
sequential=true vip-master vip-rep sequential=true

  <rsc_colocation id="rsc_colocation-grpPg-clnPing" score="INFINITY">
    <resource_set id="rsc_colocation-grpPg-clnPing-0" role="Master">
      <resource_ref id="msPostgresql"/>
    </resource_set>
    <resource_set id="rsc_colocation-grpPg-clnPing-1">
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
  </rsc_colocation>

- Pattern 4 - 
colocation rsc_colocation-grpPg-clnPing INFINITY: msPostgresql:Master 
sequential=false vip-master vip-rep sequential=false

  <rsc_colocation id="rsc_colocation-grpPg-clnPing" score="INFINITY">
    <!--#colocation rsc_colocation-grpPg-clnPing INFINITY: [ msPostgresql:Master sequential=true ]-->
    <resource_set id="rsc_colocation-grpPg-clnPing-0" sequential="false"
                  role="Master">
      <resource_ref id="msPostgresql"/>
    </resource_set>
    <resource_set id="rsc_colocation-grpPg-clnPing-1" sequential="false">
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
  </rsc_colocation>

- Pattern 5 - 
colocation rsc_colocation-grpPg-clnPing INFINITY: [ msPostgresql:Master ] [ 
vip-master vip-rep ]

  <rsc_colocation id="rsc_colocation-grpPg-clnPing" score="INFINITY">
    <resource_set id="rsc_colocation-grpPg-clnPing-0" require-all="false"
                  sequential="false" role="Master">
      <resource_ref id="msPostgresql"/>
    </resource_set>
    <resource_set id="rsc_colocation-grpPg-clnPing-1" require-all="false"
                  sequential="false">
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
  </rsc_colocation>

- Pattern 6 - 
colocation rsc_colocation-grpPg-clnPing INFINITY: ( msPostgresql:Master ) ( 
vip-master vip-rep )

  <rsc_colocation id="rsc_colocation-grpPg-clnPing" score="INFINITY">
    <resource_set id="rsc_colocation-grpPg-clnPing-0" sequential="false"
                  role="Master">
      <resource_ref id="msPostgresql"/>
    </resource_set>
    <resource_set id="rsc_colocation-grpPg-clnPing-1" sequential="false">
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
  </rsc_colocation>

- Pattern 7 - 
colocation rsc_colocation-grpPg-clnPing INFINITY: [ msPostgresql:Master 
sequential=true ] [ vip-master vip-rep sequential=true ]

  <rsc_colocation id="rsc_colocation-grpPg-clnPing" score="INFINITY">
    <resource_set id="rsc_colocation-grpPg-clnPing-0" require-all="false"
                  role="Master">
      <resource_ref id="msPostgresql"/>
    </resource_set>
    <resource_set id="rsc_colocation-grpPg-clnPing-1" require-all="false">
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
  </rsc_colocation>

- Pattern 8 - 
colocation rsc_colocation-grpPg-clnPing INFINITY: ( msPostgresql:Master 
sequential=true ) ( vip-master vip-rep sequential=true )

  <rsc_colocation id="rsc_colocation-grpPg-clnPing" score="INFINITY">
    <resource_set id="rsc_colocation-grpPg-clnPing-0" role="Master">
      <resource_ref id="msPostgresql"/>
    </resource_set>
    <resource_set id="rsc_colocation-grpPg-clnPing-1">
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
  </rsc_colocation>

-

How can I set the sequential attribute to true in crmsh?

Best Regards,
Hideo Yamauchi.
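
(As the follow-ups below show, the bracketed set syntax -- quoted here from a
later message in this thread -- is the form that was eventually fixed in crmsh
to emit sequential=true explicitly:)

colocation rsc_colocation-grpPg-clnPing INFINITY: \
    [ msPostgresql:Master sequential=true ] \
    [ vip-master vip-rep sequential=true ]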



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Question:crmsh] About a setting method of sequential=true in crmsh.

2014-02-05 Thread renayama19661014
Hi Kristoffer.

Thank you for comments.
We wait for a correction.

Many Thanks!
Hideo Yamauchi.


--- On Wed, 2014/2/5, Kristoffer Grönlund kgronl...@suse.com wrote:

 On Wed, 5 Feb 2014 15:55:42 +0900 (JST)
 renayama19661...@ybb.ne.jp wrote:
 
  Hi All,
  
  We tried to set sequential attribute of resource_set of colocation in
  true in crmsh.
  
  We tried the next method, but true was not able to set it well.
  
  [snipped]
  
  How can true set sequantial attribute if I operate it in crmsh?
  
 
 Hello,
 
 Unfortunately, the parsing of resource sets in crmsh is incorrect in
 this case. crmsh will never explicitly set sequential to true, only to
 false. This is a bug in both the 2.0 development branch and in all
 previous versions.
 
 A fix is on the way.
 
 Thank you,
 Kristoffer
 
  Best Regards,
  Hideo Yamauchi.
  
  
  
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
  Project Home: http://www.clusterlabs.org
  Getting started:
  http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs:
  http://bugs.clusterlabs.org
  
 
 
 
 -- 
 // Kristoffer Grönlund
 // kgronl...@suse.com
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Question:crmsh] About a setting method of sequential=true in crmsh.

2014-02-06 Thread renayama19661014
Hi Kristoffer,

On RHEL 6.4, crmsh-c8f214020b2c fails to install with the following error.

Is there a problem with my installation procedure?

---

[root@srv01 crmsh-c8f214020b2c]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.4 (Santiago)

[root@srv01 crmsh-c8f214020b2c]# ./autogen.sh
(snip)
[root@srv01 crmsh-c8f214020b2c]# ./configure --sysconfdir=/etc 
--localstatedir=/var
(snip)
[root@srv01 crmsh-c8f214020b2c]# make install
Making install in doc
make[1]: Entering directory `/opt/crmsh-c8f214020b2c/doc'
a2x -f manpage crm.8.txt
WARNING: crm.8.txt: line 621: missing [[cmdhelp_._status] section
WARNING: crm.8.txt: line 3935: missing [[cmdhelp_._report] section
./crm.8.xml:3137: element refsect1: validity error : Element refsect1 content 
does not follow the DTD, expecting (refsect1info? , (title , subtitle? , 
titleabbrev?) , (((calloutlist | glosslist | bibliolist | itemizedlist | 
orderedlist | segmentedlist | simplelist | variablelist | caution | important | 
note | tip | warning | literallayout | programlisting | programlistingco | 
screen | screenco | screenshot | synopsis | cmdsynopsis | funcsynopsis | 
classsynopsis | fieldsynopsis | constructorsynopsis | destructorsynopsis | 
methodsynopsis | formalpara | para | simpara | address | blockquote | graphic | 
graphicco | mediaobject | mediaobjectco | informalequation | informalexample | 
informalfigure | informaltable | equation | example | figure | table | msgset | 
procedure | sidebar | qandaset | task | anchor | bridgehead | remark | 
highlights | abstract | authorblurb | epigraph | indexterm | beginpage)+ , 
refsect2*) | refsect2+)), got (title simpara simpara
 simpara simpara literallayout refsect2 refsect2 refsect2 refsect2 refsect2 
refsect2 refsect2 refsect2 refsect2 refsect2 refsect2 refsect2 refsect2 
refsect2 refsect2 refsect2 simpara simpara simpara literallayout simpara 
literallayout refsect2 refsect2 refsect2 )
/refsect1
   ^
a2x: failed: xmllint --nonet --noout --valid ./crm.8.xml
make[1]: *** [crm.8] Error 1
make[1]: Leaving directory `/opt/crmsh-c8f214020b2c/doc'
make: *** [install-recursive] Error 1

---


The same error occurs with the latest crmsh-cc52dc69ceb1.

Best Regards,
Hideo Yamauchi.

--- On Thu, 2014/2/6, Kristoffer Grönlund kgronl...@suse.com wrote:

 On Wed, 5 Feb 2014 23:17:36 +0900 (JST)
 renayama19661...@ybb.ne.jp wrote:
 
  Hi Kristoffer.
  
  Thank you for comments.
  We wait for a correction.
  
  Many Thanks!
  Hideo Yamauchi.
  
 
 Hello,
 
 A fix for this issue has now been committed as changeset c8f214020b2c,
 please let me know if it solves the problem for you.
 
 This construction should now generate the correct XML:
 
 colocation rsc_colocation-grpPg-clnPing INFINITY: \
     [ msPostgresql:Master sequential=true ] \
     [ vip-master vip-rep sequential=true ]
 
 Thank you,
 
 -- 
 // Kristoffer Grönlund
 // kgronl...@suse.com
 

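(For reference, the XML this construction is expected to generate; the set ids
are assumed to follow crmsh's usual naming, and the output matches what is
confirmed later in the thread:)

<resource_set id="rsc_colocation-grpPg-clnPing-0" require-all="false"
              sequential="true" role="Master">
  <resource_ref id="msPostgresql"/>
</resource_set>
<resource_set id="rsc_colocation-grpPg-clnPing-1" require-all="false"
              sequential="true">
  <resource_ref id="vip-master"/>
  <resource_ref id="vip-rep"/>
</resource_set>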
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Question:crmsh] About a setting method of sequential=true in crmsh.

2014-02-11 Thread renayama19661014
Hi Kristoffer,

Thank you for comments.

I tested it.
However, the problem seems to still occur.

---
[root@srv01 crmsh-8d984b138fc4]# pwd
/opt/crmsh-8d984b138fc4

[root@srv01 crmsh-8d984b138fc4]# ./autogen.sh 
autoconf:   autoconf (GNU Autoconf) 2.63
automake:   automake (GNU automake) 1.11.1
(snip)

[root@srv01 crmsh-8d984b138fc4]# ./configure --sysconfdir=/etc 
--localstatedir=/var
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
(snip)
  Prefix   = /usr
  Executables  = /usr/sbin
  Man pages= /usr/share/man
  Libraries= /usr/lib64
  Header files = ${prefix}/include
  Arch-independent files   = /usr/share
  State information= /var
  System configuration = /etc

[root@srv01 crmsh-8d984b138fc4]# make install
Making install in doc
make[1]: Entering directory `/opt/crmsh-8d984b138fc4/doc'
a2x -f manpage crm.8.txt
WARNING: crm.8.txt: line 621: missing [[cmdhelp_._status] section
WARNING: crm.8.txt: line 3936: missing [[cmdhelp_._report] section
./crm.8.xml:3137: element refsect1: validity error : Element refsect1 content 
does not follow the DTD, expecting (refsect1info? , (title , subtitle? , 
titleabbrev?) , (((calloutlist | glosslist | bibliolist | itemizedlist | 
orderedlist | segmentedlist | simplelist | variablelist | caution | important | 
note | tip | warning | literallayout | programlisting | programlistingco | 
screen | screenco | screenshot | synopsis | cmdsynopsis | funcsynopsis | 
classsynopsis | fieldsynopsis | constructorsynopsis | destructorsynopsis | 
methodsynopsis | formalpara | para | simpara | address | blockquote | graphic | 
graphicco | mediaobject | mediaobjectco | informalequation | informalexample | 
informalfigure | informaltable | equation | example | figure | table | msgset | 
procedure | sidebar | qandaset | task | anchor | bridgehead | remark | 
highlights | abstract | authorblurb | epigraph | indexterm | beginpage)+ , 
refsect2*) | refsect2+)), got (title simpara simpara
 simpara simpara literallayout refsect2 refsect2 refsect2 refsect2 refsect2 
refsect2 refsect2 refsect2 refsect2 refsect2 refsect2 refsect2 refsect2 
refsect2 refsect2 refsect2 simpara simpara simpara literallayout simpara 
literallayout refsect2 refsect2 refsect2 )
/refsect1
   ^
a2x: failed: xmllint --nonet --noout --valid ./crm.8.xml
make[1]: *** [crm.8] Error 1
make[1]: Leaving directory `/opt/crmsh-8d984b138fc4/doc'
make: *** [install-recursive] Error 1

---


Best Regards,
Hideo Yamauchi.

--- On Mon, 2014/2/10, Kristoffer Grönlund kgronl...@suse.com wrote:

 On Fri, 7 Feb 2014 09:21:12 +0900 (JST)
 renayama19661...@ybb.ne.jp wrote:
 
  Hi Kristoffer,
  
  In RHEL6.4, crmsh-c8f214020b2c gives the next error and cannot
  install it.
  
  Does a procedure of any installation have a problem?
 
 Hello,
 
 It seems that docbook validation is stricter on RHEL 6.4 than on other
 systems I use to test. I have pushed a fix for this problem, please
 test again with changeset 8d984b138fc4.
 
 Thank you,
 
 -- 
 // Kristoffer Grönlund
 // kgronl...@suse.com
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] About the difference in handling of sequential.

2014-02-11 Thread renayama19661014
Hi All,

There is a difference between two functions in how they handle the sequential 
attribute of a colocation resource_set.

Is one of them a mistake?


static gboolean
unpack_colocation_set(xmlNode * set, int score, pe_working_set_t * data_set)
{
    xmlNode *xml_rsc = NULL;
    resource_t *with = NULL;
    resource_t *resource = NULL;
    const char *set_id = ID(set);
    const char *role = crm_element_value(set, "role");
    const char *sequential = crm_element_value(set, "sequential");
    int local_score = score;

    const char *score_s = crm_element_value(set, XML_RULE_ATTR_SCORE);

    if (score_s) {
        local_score = char2score(score_s);
    }

/* When sequential is not set, sequential is treated as TRUE. */

    if (sequential != NULL && crm_is_true(sequential) == FALSE) {
        return TRUE;
(snip)
static gboolean
colocate_rsc_sets(const char *id, xmlNode * set1, xmlNode * set2, int score,
                  pe_working_set_t * data_set)
{
    xmlNode *xml_rsc = NULL;
    resource_t *rsc_1 = NULL;
    resource_t *rsc_2 = NULL;

    const char *role_1 = crm_element_value(set1, "role");
    const char *role_2 = crm_element_value(set2, "role");

    const char *sequential_1 = crm_element_value(set1, "sequential");
    const char *sequential_2 = crm_element_value(set2, "sequential");

/* When sequential is not set, sequential is treated as FALSE. */

    if (crm_is_true(sequential_1)) {
        /* get the first one */
        for (xml_rsc = __xml_first_child(set1); xml_rsc != NULL;
             xml_rsc = __xml_next(xml_rsc)) {
            if (crm_str_eq((const char *)xml_rsc->name, XML_TAG_RESOURCE_REF, TRUE)) {
                EXPAND_CONSTRAINT_IDREF(id, rsc_1, ID(xml_rsc));
                break;
            }
        }
    }

    if (crm_is_true(sequential_2)) {
        /* get the last one */
(snip)



Best Regards,
Hideo Yamauchi.
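
(An illustration of the inconsistency under discussion -- the ids here are
hypothetical. With sequential omitted as below, unpack_colocation_set()
behaves as if sequential=true inside each set, while colocate_rsc_sets()
behaves as if sequential=false when linking the two sets:)

<rsc_colocation id="colo-example" score="INFINITY">
  <resource_set id="colo-example-0">   <!-- sequential not set -->
    <resource_ref id="A"/>
    <resource_ref id="B"/>
  </resource_set>
  <resource_set id="colo-example-1">   <!-- sequential not set -->
    <resource_ref id="C"/>
  </resource_set>
</rsc_colocation>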


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Question:crmsh] About a setting method of sequential=true in crmsh.

2014-02-12 Thread renayama19661014
Hi Kristoffer,

Thank you for comments.

 Could you try with the latest changeset 337654e0cdc4?

However, the problem seems to still occur.

[root@srv01 crmsh-337654e0cdc4]# make install
Making install in doc
make[1]: Entering directory `/opt/crmsh-337654e0cdc4/doc'
a2x -f manpage crm.8.txt
WARNING: crm.8.txt: line 621: missing [[cmdhelp_._status] section
WARNING: crm.8.txt: line 3936: missing [[cmdhelp_._report] section
./crm.8.xml:3137: element refsect1: validity error : Element refsect1 content 
does not follow the DTD, expecting (refsect1info? , (title , subtitle? , 
titleabbrev?) , (((calloutlist | glosslist | bibliolist | itemizedlist | 
orderedlist | segmentedlist | simplelist | variablelist | caution | important | 
note | tip | warning | literallayout | programlisting | programlistingco | 
screen | screenco | screenshot | synopsis | cmdsynopsis | funcsynopsis | 
classsynopsis | fieldsynopsis | constructorsynopsis | destructorsynopsis | 
methodsynopsis | formalpara | para | simpara | address | blockquote | graphic | 
graphicco | mediaobject | mediaobjectco | informalequation | informalexample | 
informalfigure | informaltable | equation | example | figure | table | msgset | 
procedure | sidebar | qandaset | task | anchor | bridgehead | remark | 
highlights | abstract | authorblurb | epigraph | indexterm | beginpage)+ , 
refsect2*) | refsect2+)), got (title simpara simpara
 simpara simpara literallayout refsect2 refsect2 refsect2 refsect2 refsect2 
refsect2 refsect2 refsect2 refsect2 refsect2 refsect2 refsect2 refsect2 
refsect2 refsect2 refsect2 simpara simpara simpara literallayout simpara 
literallayout refsect2 refsect2 refsect2 )
/refsect1
   ^
a2x: failed: xmllint --nonet --noout --valid ./crm.8.xml
make[1]: *** [crm.8] Error 1
make[1]: Leaving directory `/opt/crmsh-337654e0cdc4/doc'
make: *** [install-recursive] Error 1

Best Regards,
--- On Wed, 2014/2/12, Kristoffer Grönlund kgronl...@suse.com wrote:

 On Wed, 12 Feb 2014 09:12:08 +0900 (JST)
 renayama19661...@ybb.ne.jp wrote:
 
  Hi Kristoffer,
  
  Thank you for comments.
  
  I tested it.
  However, the problem seems to still occur.
  
 
 Hello,
 
 Could you try with the latest changeset 337654e0cdc4?
 
 Thank you,
 
 -- 
 // Kristoffer Grönlund
 // kgronl...@suse.com
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Question:crmsh] About a setting method of sequential=true in crmsh.

2014-02-12 Thread renayama19661014
Hi Kristoffer,

Thank you for comments.

With crmsh-7f620e736895.tar.gz, make install completed successfully.

I can now set the sequential attribute correctly.
The sequential attribute does become true.

---
(snip)
colocation rsc_colocation-master INFINITY: [ vip-master vip-rep sequential=true ] [ msPostgresql:Master sequential=true ]
(snip)
    <resource_set id="rsc_colocation-master-0" require-all="false"
                  sequential="true">
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
    <resource_set id="rsc_colocation-master-1" require-all="false"
                  sequential="true" role="Master">
      <resource_ref id="msPostgresql"/>
    </resource_set>
(snip)
---

But the following messages appeared when I loaded the file with crm.
Is the last message a problem?

---
[root@srv01 ~]# crm configure load update db2-resource_set_0207.crm 
 
WARNING: pgsql: action monitor not advertised in meta-data, it may not be 
supported by the RA
WARNING: pgsql: action notify not advertised in meta-data, it may not be 
supported by the RA
WARNING: pgsql: action demote not advertised in meta-data, it may not be 
supported by the RA
WARNING: pgsql: action promote not advertised in meta-data, it may not be 
supported by the RA
INFO: object rsc_colocation-master cannot be represented in the CLI notation
---

In addition, when I check with the crm configure show command, the colocation 
is shown as raw XML and sequential has disappeared.
Is this result also a problem?

---
 crm configure show
(snip)
xml <rsc_colocation id="rsc_colocation-master" score="INFINITY"> \
      <resource_set id="rsc_colocation-master-0" require-all="false"> \
        <resource_ref id="vip-master"/> \
        <resource_ref id="vip-rep"/> \
      </resource_set> \
      <resource_set id="rsc_colocation-master-1" require-all="false" role="Master"> \
        <resource_ref id="msPostgresql"/> \
      </resource_set> \
    </rsc_colocation>
(snip)
---

In addition, specifying false for the symmetrical attribute of an order 
constraint produces the following error.

---
(snip)
### Resource Order ###
order rsc_order-clnPingd-msPostgresql-1 0: clnPingd msPostgresql 
symmetrical=false
order test-order-1 0: ( vip-master vip-rep ) symmetrical=false
order test-order-2 INFINITY: msPostgresql:promote vip-master:start 
symmetrical=false
order test-order-3 INFINITY: msPostgresql:promote vip-rep:start 
symmetrical=false
order test-order-4 0: msPostgresql:demote vip-master:stop symmetrical=false
order test-order-5 0: msPostgresql:demote vip-rep:stop symmetrical=false
(snip)
[root@srv01 ~]# crm configure load update db2-resource_set_0207.crm 
Traceback (most recent call last):
  File "/usr/sbin/crm", line 56, in <module>
    rc = main.run()
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 433, in run
    return do_work(context, user_args)
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 272, in do_work
    if context.run(' '.join(l)):
  File "/usr/lib64/python2.6/site-packages/crmsh/ui_context.py", line 87, in run
    rv = self.execute_command() is not False
  File "/usr/lib64/python2.6/site-packages/crmsh/ui_context.py", line 244, in execute_command
    rv = self.command_info.function(*arglist)
  File "/usr/lib64/python2.6/site-packages/crmsh/ui_configure.py", line 432, in do_load
    return set_obj.import_file(method, url)
  File "/usr/lib64/python2.6/site-packages/crmsh/cibconfig.py", line 314, in import_file
    return self.save(s, no_remove=True, method=method)
  File "/usr/lib64/python2.6/site-packages/crmsh/cibconfig.py", line 529, in save
    upd_type="cli", method=method)
  File "/usr/lib64/python2.6/site-packages/crmsh/cibconfig.py", line 3178, in set_update
    if not self._set_update(edit_d, mk_set, upd_set, del_set, upd_type, method):
  File "/usr/lib64/python2.6/site-packages/crmsh/cibconfig.py", line 3170, in _set_update
    return self._cli_set_update(edit_d, mk_set, upd_set, del_set, method)
  File "/usr/lib64/python2.6/site-packages/crmsh/cibconfig.py", line 3118, in _cli_set_update
    obj = self.create_from_cli(cli)
  File "/usr/lib64/python2.6/site-packages/crmsh/cibconfig.py", line 3045, in create_from_cli
    node = obj.cli2node(cli_list)
  File "/usr/lib64/python2.6/site-packages/crmsh/cibconfig.py", line 1014, in cli2node
    node = self._cli_list2node(cli_list, oldnode)
  File "/usr/lib64/python2.6/site-packages/crmsh/cibconfig.py", line 1832, in _cli_list2node
    headnode = mkxmlsimple(head, oldnode, '')
  File "/usr/lib64/python2.6/site-packages/crmsh/cibconfig.py", line 662, in mkxmlsimple
    node.set(n, v)
  File "lxml.etree.pyx", line 634, in lxml.etree._Element.set (src/lxml/lxml.etree.c:31548)
  File "apihelpers.pxi", line 487, in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:13896)
  File "apihelpers.pxi", line 1240, in lxml.etree._utf8 (src/lxml/lxml.etree.c:19826)
TypeError: Argument must be 

Re: [Pacemaker] [Question:crmsh] About a setting method of sequential=true in crmsh.

2014-02-13 Thread renayama19661014
Hi Kristoffer,

Thank you for comments.

  But the next information appeared when I put crm.
  Does this last message not have any problem?
  
  ---
  [root@srv01 ~]# crm configure load update
  db2-resource_set_0207.crm WARNING: pgsql: action monitor not
  advertised in meta-data, it may not be supported by the RA WARNING:
  pgsql: action notify not advertised in meta-data, it may not be
  supported by the RA WARNING: pgsql: action demote not advertised in
  meta-data, it may not be supported by the RA WARNING: pgsql: action
  promote not advertised in meta-data, it may not be supported by the
  RA INFO: object rsc_colocation-master cannot be represented in the
  CLI notation ---
 
 It seems that there is some problem parsing the pgsql RA, which
 I suspect is the underlying cause that makes crmsh display the
 constraint as XML. 
 
 Which version of resource-agents is installed? On my test system, I
 have version resource-agents-3.9.5-70.2.x86_64. However, the version
 installed by centOS 6.5 seems to be a very old version,
 resource-agents-3.9.2-40.el6_5.6.x86_64.

I use resource-agents 3.9.5, too.
I will file it at savannah.nongnu.org if the problem persists.

 It would be very helpful if you could file an issue for this problem at
 
 https://savannah.nongnu.org/bugs/?group=crmsh
 
 and also attach your configuration and an hb_report or crm_report,
 thank you.
 
 I have also implemented a fix for the problem discovered by Mr. Inoue.
 His original email was unfortunately missed at the time.
 The fix is in changeset 71841e4559cf.
 
 Thank you,

I told Mr. Inoue that the patch had been taken in.

I verified the behavior with the latest crmsh-364c59ee0612.

Most of the problems have been resolved.
However, the problem of the colocation being displayed as XML still remains 
after loading the crm file.


(snip)
colocation rsc_colocation-master INFINITY: [ vip-master vip-rep sequential=true ] [ msPostgresql:Master sequential=true ]
(snip)
#crm(live)configure# show
(snip)
xml <rsc_colocation id="rsc_colocation-master" score="INFINITY"> \
      <resource_set id="rsc_colocation-master-0" require-all="false"> \
        <resource_ref id="vip-master"/> \
        <resource_ref id="vip-rep"/> \
      </resource_set> \
      <resource_set id="rsc_colocation-master-1" require-all="false" role="Master"> \
        <resource_ref id="msPostgresql"/> \
      </resource_set> \
    </rsc_colocation>
(snip)


I will report this problem at savannah.nongnu.org.

Many Thanks!
Hideo Yamauchi.


 
 -- 
 // Kristoffer Grönlund
 // kgronl...@suse.com
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Question] About replacing in resource_set of the order limitation.

2014-02-16 Thread renayama19661014
Hi Andrew,

Thank you for comments.

 Is this related to your email about symmetrical not being defaulted 
 consistently between colocate_rsc_sets() and unpack_colocation_set()?

Yes.
I think the default is not handled well.
There is no problem as long as the sequential attribute is always set 
explicitly in the cib.

I think the processing for when the sequential attribute is not set should be 
revised.

Best Regards,
Hideo Yamauchi.
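
(Until the defaults are made consistent, the safe practice implied here is to
set sequential explicitly on every set -- a hypothetical example, with made-up
ids and resource names:)

<resource_set id="example-set" sequential="true" require-all="false">
  <resource_ref id="rscA"/>
  <resource_ref id="rscB"/>
</resource_set>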

 
 On 22 Jan 2014, at 3:05 pm, renayama19661...@ybb.ne.jp wrote:
 
  [snipped]
 


Re: [Pacemaker] About the difference in handling of sequential.

2014-02-16 Thread renayama19661014
Hi Andrew,

I found your correction.

https://github.com/beekhof/pacemaker/commit/37ff51a0edba208e6240e812936717fffc941a41

Many Thanks!
Hideo Yamauchi.

--- On Wed, 2014/2/12, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp 
wrote:

 [snipped]
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Question] About replacing in resource_set of the order limitation.

2014-02-16 Thread renayama19661014
Hi Andrew,

  Is this related to your email about symmetrical not being defaulted 
  consistently between colocate_rsc_sets() and unpack_colocation_set()?
  
  Yes.
  I think the default is not handled well.
  There is no problem as long as the sequential attribute is always set 
  explicitly in the cib.
  
  I think the processing for when the sequential attribute is not set 
  should be revised.
 
 agreed. I've changed some occurrences locally but there may be more.

All right!

Many Thanks!
Hideo Yamauchi.


--- On Mon, 2014/2/17, Andrew Beekhof and...@beekhof.net wrote:

 
 On 17 Feb 2014, at 12:47 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
  
  Thank you for comments.
  
  Is this related to your email about symmetrical not being defaulted 
  consistently between colocate_rsc_sets() and unpack_colocation_set()?
  
   Yes.
   I think the default is not handled well.
   There is no problem as long as the sequential attribute is always set 
   explicitly in the cib.
   
   I think the processing for when the sequential attribute is not set 
   should be revised.
 
 agreed. I've changed some occurrences locally but there may be more.
 
  
  Best Regards,
  Hideo Yamauchi.
  
  
  On 22 Jan 2014, at 3:05 pm, renayama19661...@ybb.ne.jp wrote:
  
   [snipped]

[Pacemaker] [Patch]Information of Connectivity is lost is not displayed

2014-02-16 Thread renayama19661014
Hi All,

The crm_mon tool shipped with Pacemaker 1.1 seems to have a problem.
I am sending a patch.

Best Regards,
Hideo Yamauchi.

trac2781.patch
Description: Binary data
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Patch]Information of Connectivity is lost is not displayed

2014-02-16 Thread renayama19661014
Hi All,

The following change was made by Mr. Lars.

 
https://github.com/ClusterLabs/pacemaker/commit/6a17c003b0167de9fe51d5330fb6e4f1b4ffe64c

There may be other places that need correcting which my patch does not cover.

Best Regards,
Hideo Yamauchi.

--- On Mon, 2014/2/17, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp 
wrote:

 Hi All,
 
 The crm_mon tool shipped with Pacemaker 1.1 seems to have a problem.
 I am sending a patch.
 
 Best Regards,
 Hideo Yamauchi.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] [Problem] Fail-over is delayed.(State transition is not calculated.)

2014-02-17 Thread renayama19661014
Hi All,

I examined the behavior when a failure occurs on one side of a Master/Slave 
configuration in Pacemaker 1.1.11.

-

Step 1) Build the cluster.

[root@srv01 ~]# crm_mon -1 -Af
Last updated: Tue Feb 18 18:07:24 2014
Last change: Tue Feb 18 18:05:46 2014 via crmd on srv01
Stack: corosync
Current DC: srv01 (3232238180) - partition with quorum
Version: 1.1.10-9d39a6b
2 Nodes configured
6 Resources configured


Online: [ srv01 srv02 ]

 vip-master (ocf::heartbeat:Dummy): Started srv01 
 vip-rep(ocf::heartbeat:Dummy): Started srv01 
 Master/Slave Set: msPostgresql [pgsql]
 Masters: [ srv01 ]
 Slaves: [ srv02 ]
 Clone Set: clnPingd [prmPingd]
 Started: [ srv01 srv02 ]

Node Attributes:
* Node srv01:
+ default_ping_set  : 100   
+ master-pgsql  : 10
* Node srv02:
+ default_ping_set  : 100   
+ master-pgsql  : 5 

Migration summary:
* Node srv01: 
* Node srv02: 

Step 2) Cause a monitor error on vip-master.

[root@srv01 ~]# rm -rf /var/run/resource-agents/Dummy-vip-master.state 

[root@srv01 ~]# crm_mon -1 -Af  
Last updated: Tue Feb 18 18:07:58 2014
Last change: Tue Feb 18 18:05:46 2014 via crmd on srv01
Stack: corosync
Current DC: srv01 (3232238180) - partition with quorum
Version: 1.1.10-9d39a6b
2 Nodes configured
6 Resources configured


Online: [ srv01 srv02 ]

 Master/Slave Set: msPostgresql [pgsql]
 Masters: [ srv01 ]
 Slaves: [ srv02 ]
 Clone Set: clnPingd [prmPingd]
 Started: [ srv01 srv02 ]

Node Attributes:
* Node srv01:
+ default_ping_set  : 100   
+ master-pgsql  : 10
* Node srv02:
+ default_ping_set  : 100   
+ master-pgsql  : 5 

Migration summary:
* Node srv01: 
   vip-master: migration-threshold=1 fail-count=1 last-failure='Tue Feb 18 
18:07:50 2014'
* Node srv02: 

Failed actions:
vip-master_monitor_1 on srv01 'not running' (7): call=30, 
status=complete, last-rc-change='Tue Feb 18 18:07:50 2014', queued=0ms, exec=0ms
-

However, the resources do not fail over.

But when I check the cib with crm_simulate at this point, the fail-over is 
calculated.

-
[root@srv01 ~]# crm_simulate -L -s

Current cluster status:
Online: [ srv01 srv02 ]

 vip-master (ocf::heartbeat:Dummy): Stopped 
 vip-rep(ocf::heartbeat:Dummy): Stopped 
 Master/Slave Set: msPostgresql [pgsql]
 Masters: [ srv01 ]
 Slaves: [ srv02 ]
 Clone Set: clnPingd [prmPingd]
 Started: [ srv01 srv02 ]

Allocation scores:
clone_color: clnPingd allocation score on srv01: 0
clone_color: clnPingd allocation score on srv02: 0
clone_color: prmPingd:0 allocation score on srv01: INFINITY
clone_color: prmPingd:0 allocation score on srv02: 0
clone_color: prmPingd:1 allocation score on srv01: 0
clone_color: prmPingd:1 allocation score on srv02: INFINITY
native_color: prmPingd:0 allocation score on srv01: INFINITY
native_color: prmPingd:0 allocation score on srv02: 0
native_color: prmPingd:1 allocation score on srv01: -INFINITY
native_color: prmPingd:1 allocation score on srv02: INFINITY
clone_color: msPostgresql allocation score on srv01: 0
clone_color: msPostgresql allocation score on srv02: 0
clone_color: pgsql:0 allocation score on srv01: INFINITY
clone_color: pgsql:0 allocation score on srv02: 0
clone_color: pgsql:1 allocation score on srv01: 0
clone_color: pgsql:1 allocation score on srv02: INFINITY
native_color: pgsql:0 allocation score on srv01: INFINITY
native_color: pgsql:0 allocation score on srv02: 0
native_color: pgsql:1 allocation score on srv01: -INFINITY
native_color: pgsql:1 allocation score on srv02: INFINITY
pgsql:1 promotion score on srv02: 5
pgsql:0 promotion score on srv01: 1
native_color: vip-master allocation score on srv01: -INFINITY
native_color: vip-master allocation score on srv02: INFINITY
native_color: vip-rep allocation score on srv01: -INFINITY
native_color: vip-rep allocation score on srv02: INFINITY

Transition Summary:
 * Start   vip-master   (srv02)
 * Start   vip-rep  (srv02)
 * Demote  pgsql:0  (Master - Slave srv01)
 * Promote pgsql:1  (Slave - Master srv02)

-

In addition, the fail-over is calculated when cluster-recheck-interval 
expires.

The fail-over is also carried out if I run cibadmin -B.
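
(For reference, the recheck timer referred to above is the
cluster-recheck-interval cluster property; a crmsh sketch -- the value shown
is the usual default, given only for illustration:)

property cluster-recheck-interval="15min"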

-
[root@srv01 ~]# cibadmin -B

[root@srv01 ~]# crm_mon -1 -Af
Last updated: Tue Feb 18 18:21:15 2014
Last change: Tue Feb 18 18:21:00 2014 via cibadmin on srv01
Stack: corosync
Current DC: srv01 (3232238180) - partition with quorum
Version: 1.1.10-9d39a6b
2 Nodes configured
6 Resources configured


Online: [ srv01 srv02 ]

 vip-master (ocf::heartbeat:Dummy): Started srv02 
 vip-rep(ocf::heartbeat:Dummy): Started srv02 
 Master/Slave Set: 

Re: [Pacemaker] [Patch]Information of Connectivity is lost is not displayed

2014-02-17 Thread renayama19661014
Hi Andrew,

 I'm confused... that patch seems to be the reverse of yours.
 Are you saying that we need to undo Lars' one?

No, I do not understand the intent of Mr. Lars' correction.

However, as things stand, crm_mon does not display the attribute correctly.
Did you perhaps discuss with Mr. Lars, or with Mr. David, the correction that 
treats these settings as resource parameters rather than meta attributes?

Best Regards,
Hideo Yamauchi.

--- On Tue, 2014/2/18, Andrew Beekhof and...@beekhof.net wrote:

 
 On 17 Feb 2014, at 5:43 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi All,
  
  The next change was accomplished by Mr. Lars.
  
  https://github.com/ClusterLabs/pacemaker/commit/6a17c003b0167de9fe51d5330fb6e4f1b4ffe64c
 
 I'm confused... that patch seems to be the reverse of yours.
 Are you saying that we need to undo Lars' one?
 
  
  I may lack the correction of other parts which are not the patch which I 
  sent.
  
  Best Regards,
  Hideo Yamauchi.
  
  --- On Mon, 2014/2/17, renayama19661...@ybb.ne.jp 
  renayama19661...@ybb.ne.jp wrote:
  
  Hi All,
  
  The crm_mon tool which is attached to Pacemaker1.1 seems to have a problem.
  I send a patch.
  
  Best Regards,
  Hideo Yamauchi.
  
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
 
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Patch]Information of Connectivity is lost is not displayed

2014-02-17 Thread renayama19661014
Hi Andrew,

Thank you for comments.

 can I see the config of yours that crm_mon is not displaying correctly?

It is displayed as follows.
-
[root@srv01 tmp]# crm_mon -1 -Af   
Last updated: Tue Feb 18 19:51:04 2014
Last change: Tue Feb 18 19:48:55 2014 via cibadmin on srv01
Stack: corosync
Current DC: srv01 (3232238180) - partition WITHOUT quorum
Version: 1.1.10-9d39a6b
1 Nodes configured
5 Resources configured


Online: [ srv01 ]

Clone Set: clnPingd [prmPingd]
 Started: [ srv01 ]

Node Attributes:
* Node srv01:
+ default_ping_set  : 0 

Migration summary:
* Node srv01: 

-

I uploaded the log to the following place (trac2781.zip):

 * https://skydrive.live.com/?cid=3A14D57622C66876id=3A14D57622C66876%21117

Best Regards,
Hideo Yamauchi.
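
(For context, a minimal sketch of the kind of ping configuration under
discussion, assuming the ocf:pacemaker:ping agent; the resource and attribute
names follow the output above, and the host_list value and intervals are
illustrative:)

primitive prmPingd ocf:pacemaker:ping \
    params name="default_ping_set" host_list="192.168.0.1" multiplier="100" \
    op monitor interval="10s"
clone clnPingd prmPingd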


--- On Tue, 2014/2/18, Andrew Beekhof and...@beekhof.net wrote:

 
 On 18 Feb 2014, at 12:19 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
  
  I'm confused... that patch seems to be the reverse of yours.
  Are you saying that we need to undo Lars' one?
  
  No, I do not understand the meaning of the correction of Mr. Lars.
 
 
 name, multiplier and host_list are all resource parameters, not meta 
 attributes.
 so lars' patch should be correct.
 
 can I see the config of yours that crm_mon is not displaying correctly?
 
  
  However, as now, crm_mon does not display a right attribute.
  Possibly did you not discuss the correction to put meta data in 
  rsc-parameters with Mr. Lars? Or Mr. David?
  
  Best Regards,
  Hideo Yamauchi.
  
  --- On Tue, 2014/2/18, Andrew Beekhof and...@beekhof.net wrote:
  
  
  On 17 Feb 2014, at 5:43 pm, renayama19661...@ybb.ne.jp wrote:
  
  Hi All,
  
  The next change was accomplished by Mr. Lars.
  
  https://github.com/ClusterLabs/pacemaker/commit/6a17c003b0167de9fe51d5330fb6e4f1b4ffe64c
  
  I'm confused... that patch seems to be the reverse of yours.
  Are you saying that we need to undo Lars' one?
  
  
  I may lack the correction of other parts which are not the patch which I 
  sent.
  
  Best Regards,
  Hideo Yamauchi.
  
  --- On Mon, 2014/2/17, renayama19661...@ybb.ne.jp 
  renayama19661...@ybb.ne.jp wrote:
  
  Hi All,
  
  The crm_mon tool which is attached to Pacemaker1.1 seems to have a 
  problem.
  I send a patch.
  
  Best Regards,
  Hideo Yamauchi.
  
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
  
  
  
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
 
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem] Fail-over is delayed.(State transition is not calculated.)

2014-02-18 Thread renayama19661014
Hi David,

Thank you for comments.

 You have resource-stickiness=INFINITY, this is what is preventing the 
 failover from occurring. Set resource-stickiness=1 or 0 and the failover 
 should occur.
 

However, the resource does move on the next state-transition calculation.
Why can it not be moved by the calculation triggered by the first failure?

In addition, the resource moves if the following colocation is deleted.

colocation rsc_colocation-master-3 INFINITY: vip-rep msPostgresql:Master

Is there a problem with Pacemaker's handling of this colocation?

Best Regards,
Hideo Yamauchi.
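
(A sketch of the change Mr. Vossel suggests, in crmsh notation, assuming
stickiness is set cluster-wide via rsc_defaults; the value 1 is taken from
his comment:)

rsc_defaults resource-stickiness="1"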

--- On Wed, 2014/2/19, David Vossel dvos...@redhat.com wrote:

 
 - Original Message -
  From: renayama19661...@ybb.ne.jp
  To: PaceMaker-ML pacemaker@oss.clusterlabs.org
  Sent: Monday, February 17, 2014 7:06:53 PM
  Subject: [Pacemaker] [Problem] Fail-over is delayed.(State transition is 
  not    calculated.)
  
  Hi All,
  
  I confirmed movement at the time of the trouble in one of Master/Slave in
  Pacemaker1.1.11.
  
  -
  
  Step1) Constitute a cluster.
  
  [root@srv01 ~]# crm_mon -1 -Af
  Last updated: Tue Feb 18 18:07:24 2014
  Last change: Tue Feb 18 18:05:46 2014 via crmd on srv01
  Stack: corosync
  Current DC: srv01 (3232238180) - partition with quorum
  Version: 1.1.10-9d39a6b
  2 Nodes configured
  6 Resources configured
  
  
  Online: [ srv01 srv02 ]
  
   vip-master     (ocf::heartbeat:Dummy): Started srv01
   vip-rep        (ocf::heartbeat:Dummy): Started srv01
   Master/Slave Set: msPostgresql [pgsql]
       Masters: [ srv01 ]
       Slaves: [ srv02 ]
   Clone Set: clnPingd [prmPingd]
       Started: [ srv01 srv02 ]
  
  Node Attributes:
  * Node srv01:
      + default_ping_set                  : 100
      + master-pgsql                      : 10
  * Node srv02:
      + default_ping_set                  : 100
      + master-pgsql                      : 5
  
  Migration summary:
  * Node srv01:
  * Node srv02:
  
  Step2) Monitor error in vip-master.
  
  [root@srv01 ~]# rm -rf /var/run/resource-agents/Dummy-vip-master.state
  
  [root@srv01 ~]# crm_mon -1 -Af
  Last updated: Tue Feb 18 18:07:58 2014
  Last change: Tue Feb 18 18:05:46 2014 via crmd on srv01
  Stack: corosync
  Current DC: srv01 (3232238180) - partition with quorum
  Version: 1.1.10-9d39a6b
  2 Nodes configured
  6 Resources configured
  
  
  Online: [ srv01 srv02 ]
  
   Master/Slave Set: msPostgresql [pgsql]
       Masters: [ srv01 ]
       Slaves: [ srv02 ]
   Clone Set: clnPingd [prmPingd]
       Started: [ srv01 srv02 ]
  
  Node Attributes:
  * Node srv01:
      + default_ping_set                  : 100
      + master-pgsql                      : 10
  * Node srv02:
      + default_ping_set                  : 100
      + master-pgsql                      : 5
  
  Migration summary:
  * Node srv01:
     vip-master: migration-threshold=1 fail-count=1 last-failure='Tue Feb 18
     18:07:50 2014'
  * Node srv02:
  
  Failed actions:
      vip-master_monitor_1 on srv01 'not running' (7): call=30,
      status=complete, last-rc-change='Tue Feb 18 18:07:50 2014', queued=0ms,
      exec=0ms
  -
  
  However, the resource does not fail-over.
  
  But, fail-over is calculated when I check cib in crm_simulate at this point
  in time.
  
  -
  [root@srv01 ~]# crm_simulate -L -s
  
  Current cluster status:
  Online: [ srv01 srv02 ]
  
   vip-master     (ocf::heartbeat:Dummy): Stopped
   vip-rep        (ocf::heartbeat:Dummy): Stopped
   Master/Slave Set: msPostgresql [pgsql]
       Masters: [ srv01 ]
       Slaves: [ srv02 ]
   Clone Set: clnPingd [prmPingd]
       Started: [ srv01 srv02 ]
  
  Allocation scores:
  clone_color: clnPingd allocation score on srv01: 0
  clone_color: clnPingd allocation score on srv02: 0
  clone_color: prmPingd:0 allocation score on srv01: INFINITY
  clone_color: prmPingd:0 allocation score on srv02: 0
  clone_color: prmPingd:1 allocation score on srv01: 0
  clone_color: prmPingd:1 allocation score on srv02: INFINITY
  native_color: prmPingd:0 allocation score on srv01: INFINITY
  native_color: prmPingd:0 allocation score on srv02: 0
  native_color: prmPingd:1 allocation score on srv01: -INFINITY
  native_color: prmPingd:1 allocation score on srv02: INFINITY
  clone_color: msPostgresql allocation score on srv01: 0
  clone_color: msPostgresql allocation score on srv02: 0
  clone_color: pgsql:0 allocation score on srv01: INFINITY
  clone_color: pgsql:0 allocation score on srv02: 0
  clone_color: pgsql:1 allocation score on srv01: 0
  clone_color: pgsql:1 allocation score on srv02: INFINITY
  native_color: pgsql:0 allocation score on srv01: INFINITY
  native_color: pgsql:0 allocation score on srv02: 0
  native_color: pgsql:1 allocation score on srv01: -INFINITY
  native_color: pgsql:1 allocation score on srv02: INFINITY
  pgsql:1 promotion score on srv02: 5
  pgsql:0 promotion score on srv01: 1
 

Re: [Pacemaker] [Problem] Fail-over is delayed.(State transition is not calculated.)

2014-02-18 Thread renayama19661014
Hi Andrew,

 I'll follow up on the bug.

Thanks!

Hideo Yamauchi.

--- On Wed, 2014/2/19, Andrew Beekhof and...@beekhof.net wrote:

 I'll follow up on the bug.
 
 On 19 Feb 2014, at 10:55 am, renayama19661...@ybb.ne.jp wrote:
 
  Hi David,
  
  Thank you for comments.
  
  You have resource-stickiness=INFINITY, this is what is preventing the 
  failover from occurring. Set resource-stickiness=1 or 0 and the failover 
  should occur.
  
  
  However, the resource moves by a calculation of the next state transition.
  By a calculation of the first trouble, can it not travel the resource?
  
  In addition, the resource moves when the resource deletes next colocation.
  
  colocation rsc_colocation-master-3 INFINITY: vip-rep msPostgresql:Master
  
  There is the problem with handling of colocation of some Pacemaker?
  
  Best Regards,
  Hideo Yamauchi.
  
  --- On Wed, 2014/2/19, David Vossel dvos...@redhat.com wrote:
  
  
  - Original Message -
  From: renayama19661...@ybb.ne.jp
  To: PaceMaker-ML pacemaker@oss.clusterlabs.org
  Sent: Monday, February 17, 2014 7:06:53 PM
  Subject: [Pacemaker] [Problem] Fail-over is delayed.(State transition is 
  not    calculated.)
  
  Hi All,
  
  I confirmed movement at the time of the trouble in one of Master/Slave in
  Pacemaker1.1.11.
  
  -
  
  Step1) Constitute a cluster.
  
  [root@srv01 ~]# crm_mon -1 -Af
  Last updated: Tue Feb 18 18:07:24 2014
  Last change: Tue Feb 18 18:05:46 2014 via crmd on srv01
  Stack: corosync
  Current DC: srv01 (3232238180) - partition with quorum
  Version: 1.1.10-9d39a6b
  2 Nodes configured
  6 Resources configured
  
  
  Online: [ srv01 srv02 ]
  
    vip-master     (ocf::heartbeat:Dummy): Started srv01
    vip-rep        (ocf::heartbeat:Dummy): Started srv01
    Master/Slave Set: msPostgresql [pgsql]
        Masters: [ srv01 ]
        Slaves: [ srv02 ]
    Clone Set: clnPingd [prmPingd]
        Started: [ srv01 srv02 ]
  
  Node Attributes:
  * Node srv01:
       + default_ping_set                  : 100
       + master-pgsql                      : 10
  * Node srv02:
       + default_ping_set                  : 100
       + master-pgsql                      : 5
  
  Migration summary:
  * Node srv01:
  * Node srv02:
  
  Step2) Monitor error in vip-master.
  
  [root@srv01 ~]# rm -rf /var/run/resource-agents/Dummy-vip-master.state
  
  [root@srv01 ~]# crm_mon -1 -Af
  Last updated: Tue Feb 18 18:07:58 2014
  Last change: Tue Feb 18 18:05:46 2014 via crmd on srv01
  Stack: corosync
  Current DC: srv01 (3232238180) - partition with quorum
  Version: 1.1.10-9d39a6b
  2 Nodes configured
  6 Resources configured
  
  
  Online: [ srv01 srv02 ]
  
    Master/Slave Set: msPostgresql [pgsql]
        Masters: [ srv01 ]
        Slaves: [ srv02 ]
    Clone Set: clnPingd [prmPingd]
        Started: [ srv01 srv02 ]
  
  Node Attributes:
  * Node srv01:
       + default_ping_set                  : 100
       + master-pgsql                      : 10
  * Node srv02:
       + default_ping_set                  : 100
       + master-pgsql                      : 5
  
  Migration summary:
  * Node srv01:
      vip-master: migration-threshold=1 fail-count=1 last-failure='Tue Feb 
 18
      18:07:50 2014'
  * Node srv02:
  
  Failed actions:
       vip-master_monitor_1 on srv01 'not running' (7): call=30,
       status=complete, last-rc-change='Tue Feb 18 18:07:50 2014', 
 queued=0ms,
       exec=0ms
  -
  
  However, the resource does not fail-over.
  
  But, fail-over is calculated when I check cib in crm_simulate at this 
  point
  in time.
  
  -
  [root@srv01 ~]# crm_simulate -L -s
  
  Current cluster status:
  Online: [ srv01 srv02 ]
  
    vip-master     (ocf::heartbeat:Dummy): Stopped
    vip-rep        (ocf::heartbeat:Dummy): Stopped
    Master/Slave Set: msPostgresql [pgsql]
        Masters: [ srv01 ]
        Slaves: [ srv02 ]
    Clone Set: clnPingd [prmPingd]
        Started: [ srv01 srv02 ]
  
  Allocation scores:
  clone_color: clnPingd allocation score on srv01: 0
  clone_color: clnPingd allocation score on srv02: 0
  clone_color: prmPingd:0 allocation score on srv01: INFINITY
  clone_color: prmPingd:0 allocation score on srv02: 0
  clone_color: prmPingd:1 allocation score on srv01: 0
  clone_color: prmPingd:1 allocation score on srv02: INFINITY
  native_color: prmPingd:0 allocation score on srv01: INFINITY
  native_color: prmPingd:0 allocation score on srv02: 0
  native_color: prmPingd:1 allocation score on srv01: -INFINITY
  native_color: prmPingd:1 allocation score on srv02: INFINITY
  clone_color: msPostgresql allocation score on srv01: 0
  clone_color: msPostgresql allocation score on srv02: 0
  clone_color: pgsql:0 allocation score on srv01: INFINITY
  clone_color: pgsql:0 allocation score on srv02: 0
  clone_color: pgsql:1 allocation score on srv01: 0
  clone_color: pgsql:1 allocation score on srv02: INFINITY
 

Re: [Pacemaker] [Patch] Information of "Connectivity is lost" is not displayed

2014-02-18 Thread renayama19661014
Hi Andrew,

Thank you for comments.

 So I'm confused as to what the problem is.
 What are you expecting crm_mon to show?

I would like it to be displayed as follows.


* Node srv01:
+ default_ping_set  : 0 : Connectivity is lost
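
For context, a crm-shell sketch of the pingd resource behind this attribute, rewritten from the XML quoted further below (attempts/timeout omitted):

  primitive prmPingd ocf:pacemaker:ping \
      params name=default_ping_set host_list=192.168.40.1 multiplier=100
  clone clnPingd prmPingd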

Best Regards,
Hideo Yamauchi.
--- On Wed, 2014/2/19, Andrew Beekhof and...@beekhof.net wrote:

 
 On 18 Feb 2014, at 2:38 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
  
  I attach the result of the cibadmin -Q command.
 
 
 So I see:
 
         <primitive id="prmPingd" class="ocf" provider="pacemaker" type="ping">
           <instance_attributes id="prmPingd-instance_attributes">
             <nvpair name="name" value="default_ping_set" id="prmPingd-instance_attributes-name"/>
             <nvpair name="host_list" value="192.168.40.1" id="prmPingd-instance_attributes-host_list"/>
             <nvpair name="multiplier" value="100" id="prmPingd-instance_attributes-multiplier"/>
             <nvpair name="attempts" value="2" id="prmPingd-instance_attributes-attempts"/>
             <nvpair name="timeout" value="2" id="prmPingd-instance_attributes-timeout"/>
           </instance_attributes>
 
 The correct way to query those is as parameters, not as meta attributes.
 Which is what lars' patch achieves.
 
 In your email I see:
 
  * Node srv01:
      + default_ping_set                  : 0         
 
 
 Which looks correct based on:
 
           <nvpair id="status-3232238180-default_ping_set" name="default_ping_set" value="0"/>
 
 So I'm confused as to what the problem is.
 What are you expecting crm_mon to show?
 
  
  Best Regards,
  Hideo Yamauchi.
  
  
  --- On Tue, 2014/2/18, Andrew Beekhof and...@beekhof.net wrote:
  
  
  On 18 Feb 2014, at 1:45 pm, renayama19661...@ybb.ne.jp wrote:
  
  Hi Andrew,
  
  Thank you for comments.
  
  can I see the config of yours that crm_mon is not displaying correctly?
  
  It is displayed as follows.
  
  I mean the raw xml. Can you attach it?
  
  -
  [root@srv01 tmp]# crm_mon -1 -Af                   
  Last updated: Tue Feb 18 19:51:04 2014
  Last change: Tue Feb 18 19:48:55 2014 via cibadmin on srv01
  Stack: corosync
  Current DC: srv01 (3232238180) - partition WITHOUT quorum
  Version: 1.1.10-9d39a6b
  1 Nodes configured
  5 Resources configured
  
  
  Online: [ srv01 ]
  
  Clone Set: clnPingd [prmPingd]
       Started: [ srv01 ]
  
  Node Attributes:
  * Node srv01:
      + default_ping_set                  : 0         
  
  Migration summary:
  * Node srv01: 
  
  -
  
  I uploaded log in the next place.(trac2781.zip)
  
  * 
  https://skydrive.live.com/?cid=3A14D57622C66876id=3A14D57622C66876%21117
  
  Best Regards,
  Hideo Yamauchi.
  
  
  --- On Tue, 2014/2/18, Andrew Beekhof and...@beekhof.net wrote:
  
  
  On 18 Feb 2014, at 12:19 pm, renayama19661...@ybb.ne.jp wrote:
  
  Hi Andrew,
  
  I'm confused... that patch seems to be the reverse of yours.
  Are you saying that we need to undo Lars' one?
  
  No, I do not understand the meaning of the correction of Mr. Lars.
  
  
  name, multiplier and host_list are all resource parameters, not meta 
  attributes.
  so lars' patch should be correct.
  
  can I see the config of yours that crm_mon is not displaying correctly?
  
  
  However, as now, crm_mon does not display a right attribute.
  Possibly did you not discuss the correction to put meta data in 
  rsc-parameters with Mr. Lars? Or Mr. David?
  
  Best Regards,
  Hideo Yamauchi.
  
  --- On Tue, 2014/2/18, Andrew Beekhof and...@beekhof.net wrote:
  
  
  On 17 Feb 2014, at 5:43 pm, renayama19661...@ybb.ne.jp wrote:
  
  Hi All,
  
  The next change was accomplished by Mr. Lars.
  
  https://github.com/ClusterLabs/pacemaker/commit/6a17c003b0167de9fe51d5330fb6e4f1b4ffe64c
  
  I'm confused... that patch seems to be the reverse of yours.
  Are you saying that we need to undo Lars' one?
  
  
  I may lack the correction of other parts which are not the patch 
  which I sent.
  
  Best Regards,
  Hideo Yamauchi.
  
  --- On Mon, 2014/2/17, renayama19661...@ybb.ne.jp 
  renayama19661...@ybb.ne.jp wrote:
  
  Hi All,
  
  The crm_mon tool which is attached to Pacemaker1.1 seems to have a 
  problem.
  I send a patch.
  
  Best Regards,
  Hideo Yamauchi.
  

Re: [Pacemaker] [Patch] Information of "Connectivity is lost" is not displayed

2014-02-18 Thread renayama19661014
Hi Andrew,

  I wish it is displayed as follows.
  
  
  * Node srv01:
     + default_ping_set                  : 0             : Connectivity is 
 lost
 
 Ah!   https://github.com/beekhof/pacemaker/commit/5d51930

Now it is displayed correctly.

Many Thanks!
Hideo Yamauchi.

--- On Wed, 2014/2/19, Andrew Beekhof and...@beekhof.net wrote:

 
 On 19 Feb 2014, at 2:55 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
  
  Thank you for comments.
  
  So I'm confused as to what the problem is.
  What are you expecting crm_mon to show?
  
  I wish it is displayed as follows.
  
  
  * Node srv01:
     + default_ping_set                  : 0             : Connectivity is 
 lost
 
 Ah!   https://github.com/beekhof/pacemaker/commit/5d51930
 
  
  Best Regards,
  Hideo Yamauchi.
  --- On Wed, 2014/2/19, Andrew Beekhof and...@beekhof.net wrote:
  
  
  On 18 Feb 2014, at 2:38 pm, renayama19661...@ybb.ne.jp wrote:
  
  Hi Andrew,
  
  I attach the result of the cibadmin -Q command.
  
  
  So I see:
  
          <primitive id="prmPingd" class="ocf" provider="pacemaker" type="ping">
            <instance_attributes id="prmPingd-instance_attributes">
              <nvpair name="name" value="default_ping_set" id="prmPingd-instance_attributes-name"/>
              <nvpair name="host_list" value="192.168.40.1" id="prmPingd-instance_attributes-host_list"/>
              <nvpair name="multiplier" value="100" id="prmPingd-instance_attributes-multiplier"/>
              <nvpair name="attempts" value="2" id="prmPingd-instance_attributes-attempts"/>
              <nvpair name="timeout" value="2" id="prmPingd-instance_attributes-timeout"/>
            </instance_attributes>
  
  The correct way to query those is as parameters, not as meta attributes.
  Which is what lars' patch achieves.
  
  In your email I see:
  
  * Node srv01:
       + default_ping_set                  : 0         
  
  
  Which looks correct based on:
  
            <nvpair id="status-3232238180-default_ping_set" name="default_ping_set" value="0"/>
  
  So I'm confused as to what the problem is.
  What are you expecting crm_mon to show?
  
  
  Best Regards,
  Hideo Yamauchi.
  
  
  --- On Tue, 2014/2/18, Andrew Beekhof and...@beekhof.net wrote:
  
  
  On 18 Feb 2014, at 1:45 pm, renayama19661...@ybb.ne.jp wrote:
  
  Hi Andrew,
  
  Thank you for comments.
  
  can I see the config of yours that crm_mon is not displaying correctly?
  
  It is displayed as follows.
  
  I mean the raw xml. Can you attach it?
  
  -
  [root@srv01 tmp]# crm_mon -1 -Af                   
  Last updated: Tue Feb 18 19:51:04 2014
  Last change: Tue Feb 18 19:48:55 2014 via cibadmin on srv01
  Stack: corosync
  Current DC: srv01 (3232238180) - partition WITHOUT quorum
  Version: 1.1.10-9d39a6b
  1 Nodes configured
  5 Resources configured
  
  
  Online: [ srv01 ]
  
  Clone Set: clnPingd [prmPingd]
        Started: [ srv01 ]
  
  Node Attributes:
  * Node srv01:
       + default_ping_set                  : 0         
  
  Migration summary:
  * Node srv01: 
  
  -
  
  I uploaded log in the next place.(trac2781.zip)
  
  * 
  https://skydrive.live.com/?cid=3A14D57622C66876id=3A14D57622C66876%21117
  
  Best Regards,
  Hideo Yamauchi.
  
  
  --- On Tue, 2014/2/18, Andrew Beekhof and...@beekhof.net wrote:
  
  
  On 18 Feb 2014, at 12:19 pm, renayama19661...@ybb.ne.jp wrote:
  
  Hi Andrew,
  
  I'm confused... that patch seems to be the reverse of yours.
  Are you saying that we need to undo Lars' one?
  
  No, I do not understand the meaning of the correction of Mr. Lars.
  
  
  name, multiplier and host_list are all resource parameters, not meta 
  attributes.
  so lars' patch should be correct.
  
  can I see the config of yours that crm_mon is not displaying correctly?
  
  
  However, as now, crm_mon does not display a right attribute.
  Possibly did you not discuss the correction to put meta data in 
  rsc-parameters with Mr. Lars? Or Mr. David?
  
  Best Regards,
  Hideo Yamauchi.
  
  --- On Tue, 2014/2/18, Andrew Beekhof and...@beekhof.net wrote:
  
  
  On 17 Feb 2014, at 5:43 pm, renayama19661...@ybb.ne.jp wrote:
  
  Hi All,
  
  The next change was accomplished by Mr. Lars.
  
  https://github.com/ClusterLabs/pacemaker/commit/6a17c003b0167de9fe51d5330fb6e4f1b4ffe64c
  
  I'm confused... that patch seems to be the reverse of yours.
  Are you saying that we need to undo Lars' one?
  
  
  I may lack the correction of other parts which are not the patch 
  which I sent.
  
  Best Regards,
  Hideo Yamauchi.
  
  --- On Mon, 2014/2/17, renayama19661...@ybb.ne.jp 
  renayama19661...@ybb.ne.jp wrote:
  
  Hi All,
  
  The crm_mon tool which is attached to Pacemaker1.1 seems to have a 
  problem.
  I send a patch.
  
  Best Regards,
  Hideo Yamauchi.
  

[Pacemaker] [Problem] The timer which does not stop is discarded.

2014-02-19 Thread renayama19661014
Hi All,

When the monitor operation of a master/slave resource on the local node is cancelled, its timer is not stopped and keeps running.
As a result, crmd outputs a warning about cancelling that timer when it processes the next state transition.

I confirmed this with the following procedure.

Step1) Constitute a cluster.

[root@srv01 ~]# crm_mon -1 -Af
Last updated: Thu Feb 20 22:57:09 2014
Last change: Thu Feb 20 22:56:32 2014 via cibadmin on srv01
Stack: corosync
Current DC: srv01 (3232238180) - partition with quorum
Version: 1.1.10-c1a326d
2 Nodes configured
6 Resources configured


Online: [ srv01 srv02 ]

 vip-master     (ocf::heartbeat:Dummy2):        Started srv01
 vip-rep        (ocf::heartbeat:Dummy): Started srv01
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ srv01 ]
     Slaves: [ srv02 ]
 Clone Set: clnPingd [prmPingd]
     Started: [ srv01 srv02 ]

Node Attributes:
* Node srv01:
    + default_ping_set                  : 100
    + master-pgsql                      : 10
* Node srv02:
    + default_ping_set                  : 100
    + master-pgsql                      : 5

Migration summary:
* Node srv01:
* Node srv02:

Step2) Cause a failure.
[root@srv01 ~]# rm -rf /var/run/resource-agents/Dummy-vip-master.state 

Step3) A warning appears in the log.
(snip)
Feb 20 22:57:46 srv01 crmd[12107]:   notice: te_rsc_command: Initiating action 
5: cancel pgsql_cancel_9000 on srv01 (local)
Feb 20 22:57:46 srv01 lrmd[12104]: info: cancel_recurring_action: 
Cancelling operation pgsql_monitor_9000
Feb 20 22:57:46 srv01 crmd[12107]: info: match_graph_event: Action 
pgsql_monitor_9000 (5) confirmed on srv01 (rc=0)
(snip)
Feb 20 22:57:46 srv01 pengine[12106]: info: LogActions: Leave   prmPingd:1#011(Started srv02)
Feb 20 22:57:46 srv01 crmd[12107]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Feb 20 22:57:46 srv01 crmd[12107]:  warning: destroy_action: Cancelling timer 
for action 5 (src=139)
(snip)

I think the timeout-monitoring timer is unnecessary once the monitor of a master/slave resource on the local node has been cancelled.

I registered this issue in Bugzilla.

 * http://bugs.clusterlabs.org/show_bug.cgi?id=5199

Best Regards,
Hideo Yamauchi.




[Pacemaker] [Question] About quorum-policy=freeze and promote.

2014-05-07 Thread renayama19661014
Hi All,

I configured a Master/Slave resource across three nodes with quorum-policy=freeze.
(I use Stateful in Master/Slave resource.)
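
For reference, a minimal crm-shell sketch of this setup (resource names follow the status output below; the meta attributes are assumptions):

  property no-quorum-policy=freeze
  primitive pgsql ocf:pacemaker:Stateful
  ms msPostgresql pgsql \
      meta master-max=1 clone-max=3 notify=true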

-
Current DC: srv01 (3232238280) - partition with quorum
Version: 1.1.11-830af67
3 Nodes configured
9 Resources configured


Online: [ srv01 srv02 srv03 ]

 Resource Group: grpStonith1
 prmStonith1-1  (stonith:external/ssh): Started srv02 
 Resource Group: grpStonith2
 prmStonith2-1  (stonith:external/ssh): Started srv01 
 Resource Group: grpStonith3
 prmStonith3-1  (stonith:external/ssh): Started srv01 
 Master/Slave Set: msPostgresql [pgsql]
 Masters: [ srv01 ]
 Slaves: [ srv02 srv03 ]
 Clone Set: clnPingd [prmPingd]
 Started: [ srv01 srv02 srv03 ]
-


The Master resource gets promoted on every node when I interrupt the interconnect communication between all of the nodes.
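
One common way to cut the interconnect in such a test is, for example, the following sketch (it assumes corosync on its default UDP port 5405 and is not necessarily the exact method I used):

  iptables -A INPUT  -p udp --dport 5405 -j DROP
  iptables -A OUTPUT -p udp --dport 5405 -j DROP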

-
Node srv02 (3232238290): UNCLEAN (offline)
Node srv03 (3232238300): UNCLEAN (offline)
Online: [ srv01 ]

 Resource Group: grpStonith1
 prmStonith1-1  (stonith:external/ssh): Started srv02 
 Resource Group: grpStonith2
 prmStonith2-1  (stonith:external/ssh): Started srv01 
 Resource Group: grpStonith3
 prmStonith3-1  (stonith:external/ssh): Started srv01 
 Master/Slave Set: msPostgresql [pgsql]
 Masters: [ srv01 ]
 Slaves: [ srv02 srv03 ]
 Clone Set: clnPingd [prmPingd]
 Started: [ srv01 srv02 srv03 ]
(snip)
Node srv01 (3232238280): UNCLEAN (offline)
Node srv03 (3232238300): UNCLEAN (offline)
Online: [ srv02 ]

 Resource Group: grpStonith1
 prmStonith1-1  (stonith:external/ssh): Started srv02 
 Resource Group: grpStonith2
 prmStonith2-1  (stonith:external/ssh): Started srv01 
 Resource Group: grpStonith3
 prmStonith3-1  (stonith:external/ssh): Started srv01 
 Master/Slave Set: msPostgresql [pgsql]
 Masters: [ srv01 srv02 ]
 Slaves: [ srv03 ]
 Clone Set: clnPingd [prmPingd]
 Started: [ srv01 srv02 srv03 ]
(snip)
Node srv01 (3232238280): UNCLEAN (offline)
Node srv02 (3232238290): UNCLEAN (offline)
Online: [ srv03 ]

 Resource Group: grpStonith1
 prmStonith1-1  (stonith:external/ssh): Started srv02 
 Resource Group: grpStonith2
 prmStonith2-1  (stonith:external/ssh): Started srv01 
 Resource Group: grpStonith3
 prmStonith3-1  (stonith:external/ssh): Started srv01 
 Master/Slave Set: msPostgresql [pgsql]
 Masters: [ srv01 srv03 ]
 Slaves: [ srv02 ]
 Clone Set: clnPingd [prmPingd]
 Started: [ srv01 srv02 srv03 ]
-

I think that promoting the Master/Slave resource even when the cluster loses quorum is part of Pacemaker's specified behavior.

Is it the responsibility of the resource agent to prevent this multiple-Master state?
 * I think the drbd RA has such a safeguard.
 * But the Stateful RA has no such function.
 * So, as an example, I think a drbd-like mechanism is always necessary when writing a new Master/Slave resource agent.

Is my understanding wrong?

Best Regards,
Hideo Yamauchi.




Re: [Pacemaker] [Question] About quorum-policy=freeze and promote.

2014-05-08 Thread renayama19661014
Hi Emmanuel,

 Why are you using ssh as stonith? i don't think the fencing is working 
 because your nodes are in unclean state

No. STONITH is not carried out because all of the nodes have lost quorum.
This is the correct behavior of Pacemaker.

ssh is used here only as an example of a STONITH agent.
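
For reference, a sketch of such a test-only stonith resource, assuming the crm shell (external/ssh should never be used in production):

  primitive prmStonith1-1 stonith:external/ssh \
      params hostlist="srv01"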

Best Regards,
Hideo Yamauchi.
--- On Thu, 2014/5/8, emmanuel segura emi2f...@gmail.com wrote:

 
 Why are you using ssh as stonith? i don't think the fencing is working 
 because your nodes are in unclean state
 
 
 
 
 2014-05-08 5:37 GMT+02:00  renayama19661...@ybb.ne.jp:
 Hi All,
 
 I composed Master/Slave resource of three nodes that set 
 quorum-policy=freeze.
 (I use Stateful in Master/Slave resource.)
 
 -
 Current DC: srv01 (3232238280) - partition with quorum
 Version: 1.1.11-830af67
 3 Nodes configured
 9 Resources configured
 
 
 Online: [ srv01 srv02 srv03 ]
 
  Resource Group: grpStonith1
      prmStonith1-1      (stonith:external/ssh): Started srv02
  Resource Group: grpStonith2
      prmStonith2-1      (stonith:external/ssh): Started srv01
  Resource Group: grpStonith3
      prmStonith3-1      (stonith:external/ssh): Started srv01
  Master/Slave Set: msPostgresql [pgsql]
      Masters: [ srv01 ]
      Slaves: [ srv02 srv03 ]
  Clone Set: clnPingd [prmPingd]
      Started: [ srv01 srv02 srv03 ]
 -
 
 
 Master resource starts in all nodes when I interrupt the internal 
 communication of all nodes.
 
 -
 Node srv02 (3232238290): UNCLEAN (offline)
 Node srv03 (3232238300): UNCLEAN (offline)
 Online: [ srv01 ]
 
  Resource Group: grpStonith1
      prmStonith1-1      (stonith:external/ssh): Started srv02
  Resource Group: grpStonith2
      prmStonith2-1      (stonith:external/ssh): Started srv01
  Resource Group: grpStonith3
      prmStonith3-1      (stonith:external/ssh): Started srv01
  Master/Slave Set: msPostgresql [pgsql]
      Masters: [ srv01 ]
      Slaves: [ srv02 srv03 ]
  Clone Set: clnPingd [prmPingd]
      Started: [ srv01 srv02 srv03 ]
 (snip)
 Node srv01 (3232238280): UNCLEAN (offline)
 Node srv03 (3232238300): UNCLEAN (offline)
 Online: [ srv02 ]
 
  Resource Group: grpStonith1
      prmStonith1-1      (stonith:external/ssh): Started srv02
  Resource Group: grpStonith2
      prmStonith2-1      (stonith:external/ssh): Started srv01
  Resource Group: grpStonith3
      prmStonith3-1      (stonith:external/ssh): Started srv01
  Master/Slave Set: msPostgresql [pgsql]
      Masters: [ srv01 srv02 ]
      Slaves: [ srv03 ]
  Clone Set: clnPingd [prmPingd]
      Started: [ srv01 srv02 srv03 ]
 (snip)
 Node srv01 (3232238280): UNCLEAN (offline)
 Node srv02 (3232238290): UNCLEAN (offline)
 Online: [ srv03 ]
 
  Resource Group: grpStonith1
      prmStonith1-1      (stonith:external/ssh): Started srv02
  Resource Group: grpStonith2
      prmStonith2-1      (stonith:external/ssh): Started srv01
  Resource Group: grpStonith3
      prmStonith3-1      (stonith:external/ssh): Started srv01
  Master/Slave Set: msPostgresql [pgsql]
      Masters: [ srv01 srv03 ]
      Slaves: [ srv02 ]
  Clone Set: clnPingd [prmPingd]
      Started: [ srv01 srv02 srv03 ]
 -
 
 I think even if the cluster loses Quorum, being promote the Master / Slave 
 resource that's specification of Pacemaker.
 
 Is it responsibility of the resource agent side to prevent a state of these 
 plural Master?
  * I think that drbd-RA has those functions.
  * But, there is no function in Stateful-RA.
  * As an example, I think that the mechanism such as drbd is necessary by all 
 means when I make a resource of Master/Slave newly.
 
 Will my understanding be wrong?
 
 Best Regards,
 Hideo Yamauchi.
 
 
 
 
 
 -- 
 esta es mi vida e me la vivo hasta que dios quiera



[Pacemaker] [Problem][pacemaker1.0] The probe may not be carried out by difference in cib information of probe.

2014-05-08 Thread renayama19661014
Hi All,

We confirmed a problem when performing cleanup of a Master/Slave resource in Pacemaker 1.0.
When this problem occurs, probe processing is not carried out.

I registered the problem with Bugzilla.
 * http://bugs.clusterlabs.org/show_bug.cgi?id=5211

In addition, I described in Bugzilla a cleanup procedure that avoids the problem.
But that procedure may not be usable depending on the combination of resources.

I would ask the community for an improvement if this problem can be fixed in Pacemaker 1.0.

 * This problem appears to be already fixed in Pacemaker 1.1 and does not occur there.

Best Regards,
Hideo Yamauchi.




Re: [Pacemaker] [Question] About quorum-policy=freeze and promote.

2014-05-09 Thread renayama19661014
Hi Andrew,

  Okay.
  I wish this problem is revised by the next release.
 
 crm_report?

I confirmed the problem again in PM1.2-rc1 and registered it in Bugzilla.
 * http://bugs.clusterlabs.org/show_bug.cgi?id=5212

I attached the crm_report file to the Bugzilla entry.

Best Regards,
Hideo Yamauchi.

--- On Fri, 2014/5/9, Andrew Beekhof and...@beekhof.net wrote:

 
 On 9 May 2014, at 2:05 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
  
  Thank you for comment.
  
  Is it responsibility of the resource agent side to prevent a state of 
  these plural Master?
  
  No.
  
  In this scenario, no nodes have quorum and therefor no additional 
  instances should have been promoted.  Thats the definition of freeze :)
  Even if one partition DID have quorum, no instances should have been 
  promoted without fencing occurring first.
  
  Okay.
  I wish this problem is revised by the next release.
 
 crm_report?
 
  
  Many Thanks!
  Hideo Yamauchi.
  
  --- On Fri, 2014/5/9, Andrew Beekhof and...@beekhof.net wrote:
  
  
  On 8 May 2014, at 1:37 pm, renayama19661...@ybb.ne.jp wrote:
  
  Hi All,
  
  I composed Master/Slave resource of three nodes that set 
  quorum-policy=freeze.
  (I use Stateful in Master/Slave resource.)
  
  -
  Current DC: srv01 (3232238280) - partition with quorum
  Version: 1.1.11-830af67
  3 Nodes configured
  9 Resources configured
  
  
  Online: [ srv01 srv02 srv03 ]
  
  Resource Group: grpStonith1
       prmStonith1-1      (stonith:external/ssh): Started srv02 
  Resource Group: grpStonith2
       prmStonith2-1      (stonith:external/ssh): Started srv01 
  Resource Group: grpStonith3
       prmStonith3-1      (stonith:external/ssh): Started srv01 
  Master/Slave Set: msPostgresql [pgsql]
       Masters: [ srv01 ]
       Slaves: [ srv02 srv03 ]
  Clone Set: clnPingd [prmPingd]
       Started: [ srv01 srv02 srv03 ]
  -
  
  
  Master resource starts in all nodes when I interrupt the internal 
  communication of all nodes.
  
  -
  Node srv02 (3232238290): UNCLEAN (offline)
  Node srv03 (3232238300): UNCLEAN (offline)
  Online: [ srv01 ]
  
  Resource Group: grpStonith1
       prmStonith1-1      (stonith:external/ssh): Started srv02 
  Resource Group: grpStonith2
       prmStonith2-1      (stonith:external/ssh): Started srv01 
  Resource Group: grpStonith3
       prmStonith3-1      (stonith:external/ssh): Started srv01 
  Master/Slave Set: msPostgresql [pgsql]
       Masters: [ srv01 ]
       Slaves: [ srv02 srv03 ]
  Clone Set: clnPingd [prmPingd]
       Started: [ srv01 srv02 srv03 ]
  (snip)
  Node srv01 (3232238280): UNCLEAN (offline)
  Node srv03 (3232238300): UNCLEAN (offline)
  Online: [ srv02 ]
  
  Resource Group: grpStonith1
       prmStonith1-1      (stonith:external/ssh): Started srv02 
  Resource Group: grpStonith2
       prmStonith2-1      (stonith:external/ssh): Started srv01 
  Resource Group: grpStonith3
       prmStonith3-1      (stonith:external/ssh): Started srv01 
  Master/Slave Set: msPostgresql [pgsql]
       Masters: [ srv01 srv02 ]
       Slaves: [ srv03 ]
  Clone Set: clnPingd [prmPingd]
       Started: [ srv01 srv02 srv03 ]
  (snip)
  Node srv01 (3232238280): UNCLEAN (offline)
  Node srv02 (3232238290): UNCLEAN (offline)
  Online: [ srv03 ]
  
  Resource Group: grpStonith1
       prmStonith1-1      (stonith:external/ssh): Started srv02 
  Resource Group: grpStonith2
       prmStonith2-1      (stonith:external/ssh): Started srv01 
  Resource Group: grpStonith3
       prmStonith3-1      (stonith:external/ssh): Started srv01 
  Master/Slave Set: msPostgresql [pgsql]
       Masters: [ srv01 srv03 ]
       Slaves: [ srv02 ]
  Clone Set: clnPingd [prmPingd]
       Started: [ srv01 srv02 srv03 ]
  -
  
  I think even if the cluster loses Quorum, being promote the Master / 
  Slave resource that's specification of Pacemaker.
  
  Is it responsibility of the resource agent side to prevent a state of 
  these plural Master?
  
  No.
  
  In this scenario, no nodes have quorum and therefor no additional 
  instances should have been promoted.  Thats the definition of freeze :)
  Even if one partition DID have quorum, no instances should have been 
  promoted without fencing occurring first.
  
  * I think that drbd-RA has those functions.
  * But, there is no function in Stateful-RA.
  * As an example, I think that the mechanism such as drbd is necessary by 
  all means when I make a resource of Master/Slave newly.
  
  Will my understanding be wrong?
  
  Best Regards,
  Hideo Yamauchi.
  
  
  
  
 
 


[Pacemaker] [Question] About control of colocation.(master-slave with primitive)

2014-05-12 Thread renayama19661014
Hi All,

We assume a special resource configuration in which the Master role of a master/slave resource depends on a primitive resource.

As an experiment, we added a location rule so that the resource is not promoted to Master on the slave node.


   location rsc_location-msStateful-1 msPostgresql \
       rule $role=master 200: #uname eq srv01 \
       rule $role=master -INFINITY: #uname eq srv02

The Master resource depends on the primitive resource.

   colocation rsc_colocation-master-1 INFINITY: msPostgresql:Master A-master


Step1) Start Slave node.
---
[root@srv02 ~]# crm_mon -1 -Af
Last updated: Tue May 13 22:28:12 2014
Last change: Tue May 13 22:28:07 2014
Stack: corosync
Current DC: srv02 (3232238190) - partition WITHOUT quorum
Version: 1.1.11-f0f09b8
1 Nodes configured
3 Resources configured


Online: [ srv02 ]

 A-master       (ocf::heartbeat:Dummy): Started srv02
 Master/Slave Set: msPostgresql [pgsql]
     Slaves: [ srv02 ]

Node Attributes:
* Node srv02:
    + master-pgsql                      : 5

Migration summary:
* Node srv02:
---

Step2) Start Master node.
---
[root@srv02 ~]# crm_mon -1 -Af
Last updated: Tue May 13 22:33:39 2014
Last change: Tue May 13 22:28:07 2014
Stack: corosync
Current DC: srv02 (3232238190) - partition with quorum
Version: 1.1.11-f0f09b8
2 Nodes configured
3 Resources configured


Online: [ srv01 srv02 ]

 A-master       (ocf::heartbeat:Dummy): Started srv02
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ srv01 ]
     Slaves: [ srv02 ]

Node Attributes:
* Node srv01:
    + master-pgsql                      : 10
* Node srv02:
    + master-pgsql                      : 5

Migration summary:
* Node srv02:
* Node srv01:
---

 * The node on which the primitive resource is not running still becomes Master.


We do not want the resource to be promoted to Master on a node where the primitive resource is not running.
Is there a colocation/order configuration that prevents promotion to Master on such a node?

 I think one possible method is the following (see the sketch after this list):
  * Update a node attribute when the primitive resource starts.
  * Use that attribute as a condition for promotion to Master.
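
A minimal sketch of that idea (the attribute name A-master-started and the update hook are hypothetical):

  # e.g. run from the primitive's start action, or a wrapper around it:
  attrd_updater -n A-master-started -U 1

  # and block promotion while the attribute is absent or zero:
  location rsc_location-promote-guard msPostgresql \
      rule $role=master -INFINITY: not_defined A-master-started or A-master-started lt 1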


In addition, we are often confused about the behavior of colocation and order constraints,
in particular between primitive/group resources and clone/master-slave resources.
Could you describe the details in the documentation?


Best Regards,
Hideo Yamauchi.



Re: [Pacemaker] [Problem][pacemaker1.0] The probe may not be carried out by difference in cib information of probe.

2014-05-13 Thread renayama19661014
Hi Andrew,

Thank you for comments.

 Do you guys have any timeframe for moving away from 1.0.x?
 The 1.1 series is over 4 years old now and quite usable :-)
 
 There is really a (low) limit to how much effort I can put into support for 
 it.

We, too, are gradually moving from Pacemaker 1.0 to Pacemaker 1.1.

I thought that I should put on record that this problem exists in Pacemaker 1.0, so I registered and reported it.
(Users who are behind in migrating to Pacemaker 1.1 may encounter the same problem.)

It is not necessary to fix it in Pacemaker 1.0.

Best Regards,
Hideo Yamauchi.




Re: [Pacemaker] [Problem][pacemaker1.0] The probe may not be carried out by difference in cib information of probe.

2014-05-14 Thread renayama19661014
Hi Andrew,

  It is not necessary at all to revise it for Pacemaker1.0.
 
 Maybe we need to add KnownIssues.md to the repo for anyone thats slow to 
 update.
 Are there any 1.0 bugs that really really need fixing or shall we move them 
 all to the KnownIssues file?

That's a good idea.
It will be a big help for users who are behind in migrating to PM1.1.

Best Regards,
Hideo Yamauchi.





Re: [Pacemaker] [Question] About control of colocation.(master-slave with primitive)

2014-05-14 Thread renayama19661014
Hi Andrew,

Thank you for comments.

  We do not want to be promoted to Master in the node that primitive resource 
  does not start.
  Is there the setting of colocation and order which are not promoted to 
  Master of the Master node?
 
 Your config looks reasonable... almost certainly a bug in the PE.
 Do you happen to have the relevant pengine input file available?

Really?
As far as I could confirm in the PM1.1 source code, it looked like intended handling by the PE.
I will register this problem in Bugzilla and contact you.

Best Regards,
Hideo Yamauchi.

--- On Wed, 2014/5/14, Andrew Beekhof and...@beekhof.net wrote:

 
 On 13 May 2014, at 3:14 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi All,
  
  We assume special resource constitution.
  Master of master-slave depends on primitive resource for the constitution.
  
  We performed the setting that Master stopped becoming it in Slave node 
  experimentally.
  
  
    location rsc_location-msStateful-1 msPostgresql \
         rule $role=master 200: #uname eq srv01 \
         rule $role=master -INFINITY: #uname eq srv02
  
  The Master resource depends on the primitive resource.
  
    colocation rsc_colocation-master-1 INFINITY: msPostgresql:Master A-master
  
  
  Step1) Start Slave node.
  ---
  [root@srv02 ~]# crm_mon -1 -Af
  Last updated: Tue May 13 22:28:12 2014
  Last change: Tue May 13 22:28:07 2014
  Stack: corosync
  Current DC: srv02 (3232238190) - partition WITHOUT quorum
  Version: 1.1.11-f0f09b8
  1 Nodes configured
  3 Resources configured
  
  
  Online: [ srv02 ]
  
  A-master     (ocf::heartbeat:Dummy): Started srv02 
  Master/Slave Set: msPostgresql [pgsql]
      Slaves: [ srv02 ]
  
  Node Attributes:
  * Node srv02:
     + master-pgsql                      : 5         
  
  Migration summary:
  * Node srv02: 
  ---
  
  Step2) Start Master node.
  ---
  [root@srv02 ~]# crm_mon -1 -Af
  Last updated: Tue May 13 22:33:39 2014
  Last change: Tue May 13 22:28:07 2014
  Stack: corosync
  Current DC: srv02 (3232238190) - partition with quorum
  Version: 1.1.11-f0f09b8
  2 Nodes configured
  3 Resources configured
  
  
  Online: [ srv01 srv02 ]
  
  A-master     (ocf::heartbeat:Dummy): Started srv02 
  Master/Slave Set: msPostgresql [pgsql]
      Masters: [ srv01 ]
      Slaves: [ srv02 ]
  
  Node Attributes:
  * Node srv01:
     + master-pgsql                      : 10        
  * Node srv02:
     + master-pgsql                      : 5         
  
  Migration summary:
  * Node srv02: 
  * Node srv01: 
  ---
  
  * The Master node that primitive node does not start becomes Master.
  
  
  We do not want to be promoted to Master in the node that primitive resource 
  does not start.
  Is there the setting of colocation and order which are not promoted to 
  Master of the Master node?
 
 Your config looks reasonable... almost certainly a bug in the PE.
 Do you happen to have the relevant pengine input file available?
 
  
  I think that one method includes the next method.
   * I handle it to update an attribute when primitive resource starts.
   * I write an attribute in the condition to be promoted to Master.
  
  
  In addition, we are often confused about control of colotaion and order.
  It is in particular the control between primitive/group resource and 
  clone/master-slave resources.
  Will you describe detailed contents in a document?
  
  
  Best Regards,
  Hideo Yamauchi.
  
 
 



Re: [Pacemaker] [Question] About control of colocation.(master-slave with primitive)

2014-05-14 Thread renayama19661014
Hi Andrew,

  Your config looks reasonable... almost certainly a bug in the PE.
  Do you happen to have the relevant pengine input file available?
  
  Really?
 
 I would expect that:
 
   colocation rsc_colocation-master-1 INFINITY: msPostgresql:Master A-master
 
 would only promote msPostgresql on a node where A-master was running.
 
 Is that not what you were wanting?

Yes, that is what I wanted.

However, this colocation is not applied in the PE's handling.
This is because the role of msPostgresql has not yet been decided when the placement of A-master is calculated.
 * In this case the colocation seems to affect only the promotion priority of the Master/Slave resource.

I think this problem would disappear if this PE calculation were revised.

Best Regards,
Hideo Yamauchi.

--- On Thu, 2014/5/15, Andrew Beekhof and...@beekhof.net wrote:

 
 On 15 May 2014, at 9:57 am, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
  
  Thank you for comments.
  
  We do not want to be promoted to Master in the node that primitive 
  resource does not start.
  Is there the setting of colocation and order which are not promoted to 
  Master of the Master node?
  
  Your config looks reasonable... almost certainly a bug in the PE.
  Do you happen to have the relevant pengine input file available?
  
  Really?
 
 I would expect that:
 
   colocation rsc_colocation-master-1 INFINITY: msPostgresql:Master A-master
 
 would only promote msPostgresql on a node where A-master was running.
 
 Is that not what you were wanting?
 
 
  It was like right handling of PE as far as I confirmed a source code of 
  PM1.1.
  I register this problem with Bugzilla and contact you.
  
  Best Regards,
  Hideo Yamauchi.
  
  --- On Wed, 2014/5/14, Andrew Beekhof and...@beekhof.net wrote:
  
  
  On 13 May 2014, at 3:14 pm, renayama19661...@ybb.ne.jp wrote:
  
  Hi All,
  
  We assume special resource constitution.
  Master of master-slave depends on primitive resource for the constitution.
  
  We performed the setting that Master stopped becoming it in Slave node 
  experimentally.
  
  
     location rsc_location-msStateful-1 msPostgresql \
          rule $role=master 200: #uname eq srv01 \
          rule $role=master -INFINITY: #uname eq srv02
  
  The Master resource depends on the primitive resource.
  
     colocation rsc_colocation-master-1 INFINITY: msPostgresql:Master 
 A-master
  
  
  Step1) Start Slave node.
  ---
  [root@srv02 ~]# crm_mon -1 -Af
  Last updated: Tue May 13 22:28:12 2014
  Last change: Tue May 13 22:28:07 2014
  Stack: corosync
  Current DC: srv02 (3232238190) - partition WITHOUT quorum
  Version: 1.1.11-f0f09b8
  1 Nodes configured
  3 Resources configured
  
  
  Online: [ srv02 ]
  
  A-master     (ocf::heartbeat:Dummy): Started srv02 
  Master/Slave Set: msPostgresql [pgsql]
       Slaves: [ srv02 ]
  
  Node Attributes:
  * Node srv02:
      + master-pgsql                      : 5         
  
  Migration summary:
  * Node srv02: 
  ---
  
  Step2) Start Master node.
  ---
  [root@srv02 ~]# crm_mon -1 -Af
  Last updated: Tue May 13 22:33:39 2014
  Last change: Tue May 13 22:28:07 2014
  Stack: corosync
  Current DC: srv02 (3232238190) - partition with quorum
  Version: 1.1.11-f0f09b8
  2 Nodes configured
  3 Resources configured
  
  
  Online: [ srv01 srv02 ]
  
  A-master     (ocf::heartbeat:Dummy): Started srv02 
  Master/Slave Set: msPostgresql [pgsql]
       Masters: [ srv01 ]
       Slaves: [ srv02 ]
  
  Node Attributes:
  * Node srv01:
      + master-pgsql                      : 10        
  * Node srv02:
      + master-pgsql                      : 5         
  
  Migration summary:
  * Node srv02: 
  * Node srv01: 
  ---
  
  * The Master node that primitive node does not start becomes Master.
  
  
  We do not want to be promoted to Master in the node that primitive 
  resource does not start.
  Is there the setting of colocation and order which are not promoted to 
  Master of the Master node?
  
  Your config looks reasonable... almost certainly a bug in the PE.
  Do you happen to have the relevant pengine input file available?
  
  
  I think that one method includes the next method.
    * I handle it to update an attribute when primitive resource starts.
    * I write an attribute in the condition to be promoted to Master.
  
  
  In addition, we are often confused about control of colotaion and order.
  It is in particular the control between primitive/group resource and 
  clone/master-slave resources.
  Will you describe detailed contents in a document?
  
  
  Best Regards,
  Hideo Yamauchi.
  

Re: [Pacemaker] [Problem][pacemaker1.0] The probe may not be carried out by difference in cib information of probe.

2014-05-14 Thread renayama19661014
Hi Andrew,

 Here we go:
 
    https://github.com/ClusterLabs/pacemaker-1.0/blob/master/README.md
 
 If any additional bugs are found in 1.0, we should create a new entry at 
 bugs.clusterlabs.org, add it to the above README and as long as 1.1 is 
 unaffected: close the bug as WONTFIX. 

All right!

Many Thanks!
Hideo Yamauchi.


--- On Thu, 2014/5/15, Andrew Beekhof and...@beekhof.net wrote:

 
 On 15 May 2014, at 9:54 am, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
  
  It is not necessary at all to revise it for Pacemaker1.0.
  
  Maybe we need to add KnownIssues.md to the repo for anyone thats slow to 
  update.
  Are there any 1.0 bugs that really really need fixing or shall we move 
  them all to the KnownIssues file?
  
  That's a good idea.
  In the user who is behind with a shift to PM1.1, it will help big.
 
 Here we go:
 
    https://github.com/ClusterLabs/pacemaker-1.0/blob/master/README.md
 
 If any additional bugs are found in 1.0, we should create a new entry at 
 bugs.clusterlabs.org, add it to the above README and as long as 1.1 is 
 unaffected: close the bug as WONTFIX. 
 



Re: [Pacemaker] [Question] About control of colocation.(master-slave with primitive)

2014-05-14 Thread renayama19661014
Hi Andrew,

I registered the problem in Bugzilla and attached a crm_report file.

 * http://bugs.clusterlabs.org/show_bug.cgi?id=5213

Best Regards,
Hideo Yamauchi.

--- On Thu, 2014/5/15, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp 
wrote:

 Hi Andrew,
 
   Your config looks reasonable... almost certainly a bug in the PE.
   Do you happen to have the relevant pengine input file available?
   
   Really?
  
  I would expect that:
  
    colocation rsc_colocation-master-1 INFINITY: msPostgresql:Master A-master
  
  would only promote msPostgresql on a node where A-master was running.
  
  Is that not what you were wanting?
 
 Yes. I wanted it.
 
 However, this colocation does not come to be applied by handling of PE.
 This is because role of msPostgresql is not decided when it calculates 
 placement of A-MASTER.
  * In this case colocation seems to affect only the priority of the 
 Master/Slave resource.
 
 I think that this problem disappears if this calculation of the PE is revised.
 
 Best Regards,
 Hideo Yamauchi.
 
 --- On Thu, 2014/5/15, Andrew Beekhof and...@beekhof.net wrote:
 
  
  On 15 May 2014, at 9:57 am, renayama19661...@ybb.ne.jp wrote:
  
   Hi Andrew,
   
   Thank you for comments.
   
   We do not want to be promoted to Master in the node that primitive 
   resource does not start.
   Is there the setting of colocation and order which are not promoted to 
   Master of the Master node?
   
   Your config looks reasonable... almost certainly a bug in the PE.
   Do you happen to have the relevant pengine input file available?
   
   Really?
  
  I would expect that:
  
    colocation rsc_colocation-master-1 INFINITY: msPostgresql:Master A-master
  
  would only promote msPostgresql on a node where A-master was running.
  
  Is that not what you were wanting?
  
  
   It was like right handling of PE as far as I confirmed a source code of 
   PM1.1.
   I register this problem with Bugzilla and contact you.
   
   Best Regards,
   Hideo Yamauchi.
   
   --- On Wed, 2014/5/14, Andrew Beekhof and...@beekhof.net wrote:
   
   
   On 13 May 2014, at 3:14 pm, renayama19661...@ybb.ne.jp wrote:
   
   Hi All,
   
   We assume special resource constitution.
   Master of master-slave depends on primitive resource for the 
   constitution.
   
   We performed the setting that Master stopped becoming it in Slave node 
   experimentally.
   
   
      location rsc_location-msStateful-1 msPostgresql \
           rule $role=master 200: #uname eq srv01 \
           rule $role=master -INFINITY: #uname eq srv02
   
   The Master resource depends on the primitive resource.
   
      colocation rsc_colocation-master-1 INFINITY: msPostgresql:Master 
  A-master
   
   
   Step1) Start Slave node.
   ---
   [root@srv02 ~]# crm_mon -1 -Af
   Last updated: Tue May 13 22:28:12 2014
   Last change: Tue May 13 22:28:07 2014
   Stack: corosync
   Current DC: srv02 (3232238190) - partition WITHOUT quorum
   Version: 1.1.11-f0f09b8
   1 Nodes configured
   3 Resources configured
   
   
   Online: [ srv02 ]
   
   A-master     (ocf::heartbeat:Dummy): Started srv02 
   Master/Slave Set: msPostgresql [pgsql]
        Slaves: [ srv02 ]
   
   Node Attributes:
   * Node srv02:
       + master-pgsql                      : 5         
   
   Migration summary:
   * Node srv02: 
   ---
   
   Step2) Start Master node.
   ---
   [root@srv02 ~]# crm_mon -1 -Af
   Last updated: Tue May 13 22:33:39 2014
   Last change: Tue May 13 22:28:07 2014
   Stack: corosync
   Current DC: srv02 (3232238190) - partition with quorum
   Version: 1.1.11-f0f09b8
   2 Nodes configured
   3 Resources configured
   
   
   Online: [ srv01 srv02 ]
   
   A-master     (ocf::heartbeat:Dummy): Started srv02 
   Master/Slave Set: msPostgresql [pgsql]
        Masters: [ srv01 ]
        Slaves: [ srv02 ]
   
   Node Attributes:
   * Node srv01:
       + master-pgsql                      : 10        
   * Node srv02:
       + master-pgsql                      : 5         
   
   Migration summary:
   * Node srv02: 
   * Node srv01: 
   ---
   
   * The Master node that primitive node does not start becomes Master.
   
   
   We do not want to be promoted to Master in the node that primitive 
   resource does not start.
   Is there the setting of colocation and order which are not promoted to 
   Master of the Master node?
   
   Your config looks reasonable... almost certainly a bug in the PE.
   Do you happen to have the relevant pengine input file available?
   
   
   I think that one method includes the next method.
     * I handle it to update an attribute when primitive resource starts.
     * I write an attribute in the condition to be promoted to Master.
   
   
   In addition, we are often confused about control of 

[Pacemaker] [Problem] The dampen parameter of the attrd_updater command is ignored, and an attribute is updated.

2014-05-26 Thread renayama19661014
Hi All,

The attrd_updater command ignores the dampen parameter and updates the attribute immediately.

Step1) Start one node.
[root@srv01 ~]# crm_mon -1 -Af
Last updated: Tue May 27 19:36:35 2014
Last change: Tue May 27 19:34:59 2014
Stack: corosync
Current DC: srv01 (3232238180) - partition WITHOUT quorum
Version: 1.1.11-f0f09b8
1 Nodes configured
0 Resources configured


Online: [ srv01 ]


Node Attributes:
* Node srv01:

Migration summary:
* Node srv01: 

Step2) Update an attribute by attrd_updater command.
[root@srv01 ~]# attrd_updater -n default_ping_set -U 500 -d 3000   

Step3) The attribute is updated without waiting for the dampen interval.
[root@srv01 ~]# cibadmin -Q | grep ping_set
  <nvpair id="status-3232238180-default_ping_set" name="default_ping_set" value="500"/>
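
For comparison, a sketch of what I expect when the dampen interval is honored (timings approximate):

  attrd_updater -n default_ping_set -U 500 -d 3000
  sleep 1; cibadmin -Q | grep default_ping_set   # expected: no update yet
  sleep 3; cibadmin -Q | grep default_ping_set   # expected: value="500" written now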

The following code seems to be the source of the problem.

--- attrd/commands.c -
(snip)
    /* this only involves cluster nodes. */
    if(v->nodeid == 0 && (v->is_remote == FALSE)) {
        if(crm_element_value_int(xml, F_ATTRD_HOST_ID, (int*)&v->nodeid) == 0) {
            /* Create the name/id association */
            crm_node_t *peer = crm_get_peer(v->nodeid, host);
            crm_trace("We know %s's node id now: %s", peer->uname, peer->uuid);
            if(election_state(writer) == election_won) {
                write_attributes(FALSE, TRUE);
                return;
            }
        }
    }

Best Regards,
Hideo Yamauchi.




Re: [Pacemaker] [Problem] The dampen parameter of the attrd_updater command is ignored, and an attribute is updated.

2014-05-27 Thread renayama19661014
Hi Andrew,

Thank you for comment.

  --- attrd/commands.c -
  (snip)
     /* this only involves cluster nodes. */
     if(v->nodeid == 0 && (v->is_remote == FALSE)) {
         if(crm_element_value_int(xml, F_ATTRD_HOST_ID, (int*)&v->nodeid) == 0) {
             /* Create the name/id association */
             crm_node_t *peer = crm_get_peer(v->nodeid, host);
             crm_trace("We know %s's node id now: %s", peer->uname, peer->uuid);
             if(election_state(writer) == election_won) {
                 write_attributes(FALSE, TRUE);
                 return;
             }
         }
     }
 
 This is for 5194 right?

No.
I mentioned the same code in 5194, but this does not seem to be the root of 
the 5194 problem.

I have tried to reproduce the 5194 problem, but have not yet been able to 
make it reappear.
Possibly the 5194 problem does not occur in PM1.1.12-rc1.
 * As for the 5194 issue, please give me a little more time.

 
 I'd expect that block to hit this clause though:
 
      } else if(mainloop_timer_running(a->timer)) {
         crm_info("Write out of '%s' delayed: timer is running", a->id);
         return;

Which part of the source code does the clause you mention above refer to?
(Which line of the source code is it?)

Best Regards,
Hideo Yamauchi.

--- On Wed, 2014/5/28, Andrew Beekhof and...@beekhof.net wrote:

 
 On 27 May 2014, at 12:13 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi All,
  
  The attrd_updater command ignores the dampen parameter and updates an 
  attribute.
  
  Step1) Start one node.
  [root@srv01 ~]# crm_mon -1 -Af
  Last updated: Tue May 27 19:36:35 2014
  Last change: Tue May 27 19:34:59 2014
  Stack: corosync
  Current DC: srv01 (3232238180) - partition WITHOUT quorum
  Version: 1.1.11-f0f09b8
  1 Nodes configured
  0 Resources configured
  
  
  Online: [ srv01 ]
  
  
  Node Attributes:
  * Node srv01:
  
  Migration summary:
  * Node srv01: 
  
  Step2) Update an attribute by attrd_updater command.
  [root@srv01 ~]# attrd_updater -n default_ping_set -U 500 -d 3000       
  
  Step3) The attribute is updated without waiting for the time of the 
  dampen parameter.
  [root@srv01 ~]# cibadmin -Q | grep ping_set
           <nvpair id="status-3232238180-default_ping_set" name="default_ping_set" value="500"/>
  
  The following code seems to have a problem.
  
  --- attrd/command.c -
  (snip)
     /* this only involves cluster nodes. */
     if(v->nodeid == 0 && (v->is_remote == FALSE)) {
         if(crm_element_value_int(xml, F_ATTRD_HOST_ID, (int*)&v->nodeid) == 0) {
             /* Create the name/id association */
             crm_node_t *peer = crm_get_peer(v->nodeid, host);
             crm_trace("We know %s's node id now: %s", peer->uname, peer->uuid);
             if(election_state(writer) == election_won) {
                 write_attributes(FALSE, TRUE);
                 return;
             }
         }
     }
 
 This is for 5194 right?
 
 I'd expect that block to hit this clause though:
 
       } else if(mainloop_timer_running(a->timer)) {
          crm_info("Write out of '%s' delayed: timer is running", a->id);
          return;
 
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem] The dampen parameter of the attrd_updater command is ignored, and an attribute is updated.

2014-05-27 Thread renayama19661014
Hi Andrew,

  I'd expect that block to hit this clause though:
  
        } else if(mainloop_timer_running(a->timer)) {
           crm_info("Write out of '%s' delayed: timer is running", a->id);
           return;
 
 Which part of the source code does the clause you mention above refer to?
 (Which line of the source code is it?)

Is it the following code that you pointed to?

void
write_attribute(attribute_t *a)
{
    int updates = 0;
(snip)
    } else if(mainloop_timer_running(a->timer)) {
        crm_info("Write out of '%s' delayed: timer is running", a->id);
        return;
    }
(snip)

When the problem occurs, the timer has not been started yet, so this check 
does not block the write.

Best Regards,
Hideo Yamauchi.

--- On Wed, 2014/5/28, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp 
wrote:

 Hi Andrew,
 
 Thank you for your comment.
 
   --- attrd/command.c -
   (snip)
      /* this only involves cluster nodes. */
      if(v->nodeid == 0 && (v->is_remote == FALSE)) {
          if(crm_element_value_int(xml, F_ATTRD_HOST_ID, (int*)&v->nodeid) == 0) {
              /* Create the name/id association */
              crm_node_t *peer = crm_get_peer(v->nodeid, host);
              crm_trace("We know %s's node id now: %s", peer->uname, peer->uuid);
              if(election_state(writer) == election_won) {
                  write_attributes(FALSE, TRUE);
                  return;
              }
          }
      }
  
  This is for 5194 right?
 
 No.
 I mentioned the same code in 5194, but this does not seem to be the root of 
 the 5194 problem.
 
 I have tried to reproduce the 5194 problem, but have not yet been able to 
 make it reappear.
 Possibly the 5194 problem does not occur in PM1.1.12-rc1.
  * As for the 5194 issue, please give me a little more time.
 
  
  I'd expect that block to hit this clause though:
  
       } else if(mainloop_timer_running(a->timer)) {
          crm_info("Write out of '%s' delayed: timer is running", a->id);
          return;
 
 Which part of the source code does the clause you mention above refer to?
 (Which line of the source code is it?)
 
 Best Regards,
 Hideo Yamauchi.
 
 --- On Wed, 2014/5/28, Andrew Beekhof and...@beekhof.net wrote:
 
  
  On 27 May 2014, at 12:13 pm, renayama19661...@ybb.ne.jp wrote:
  
   Hi All,
   
   The attrd_updater command ignores the dampen parameter and updates an 
   attribute.
   
   Step1) Start one node.
   [root@srv01 ~]# crm_mon -1 -Af
   Last updated: Tue May 27 19:36:35 2014
   Last change: Tue May 27 19:34:59 2014
   Stack: corosync
   Current DC: srv01 (3232238180) - partition WITHOUT quorum
   Version: 1.1.11-f0f09b8
   1 Nodes configured
   0 Resources configured
   
   
   Online: [ srv01 ]
   
   
   Node Attributes:
   * Node srv01:
   
   Migration summary:
   * Node srv01: 
   
   Step2) Update an attribute by attrd_updater command.
   [root@srv01 ~]# attrd_updater -n default_ping_set -U 500 -d 3000       
   
   Step3) The attribute is updated without waiting for the time of the 
   dampen parameter.
   [root@srv01 ~]# cibadmin -Q | grep ping_set
            <nvpair id="status-3232238180-default_ping_set" name="default_ping_set" value="500"/>
   
   The following code seems to have a problem.
   
   --- attrd/command.c -
   (snip)
      /* this only involves cluster nodes. */
      if(v->nodeid == 0 && (v->is_remote == FALSE)) {
          if(crm_element_value_int(xml, F_ATTRD_HOST_ID, (int*)&v->nodeid) == 0) {
              /* Create the name/id association */
              crm_node_t *peer = crm_get_peer(v->nodeid, host);
              crm_trace("We know %s's node id now: %s", peer->uname, peer->uuid);
              if(election_state(writer) == election_won) {
                  write_attributes(FALSE, TRUE);
                  return;
              }
          }
      }
  
  This is for 5194 right?
  
  I'd expect that block to hit this clause though:
  
       } else if(mainloop_timer_running(a->timer)) {
          crm_info("Write out of '%s' delayed: timer is running", a->id);
          return;
  
  
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem] The dampen parameter of the attrd_updater command is ignored, and an attribute is updated.

2014-05-28 Thread renayama19661014
Hi Andrew,

 Perhaps try:
 
 diff --git a/attrd/commands.c b/attrd/commands.c
 index 7f1b4b0..7342e23 100644
 --- a/attrd/commands.c
 +++ b/attrd/commands.c
 @@ -464,6 +464,15 @@ attrd_peer_update(crm_node_t *peer, xmlNode *xml, bool filter)
  
      a->changed |= changed;
  
 +    if(changed) {
 +        if(a->timer) {
 +            crm_trace("Delayed write out (%dms) for %s", a->timeout_ms, a->id);
 +            mainloop_timer_start(a->timer);
 +        } else {
 +            write_or_elect_attribute(a);
 +        }
 +    }
 +
      /* this only involves cluster nodes. */
      if(v->nodeid == 0 && (v->is_remote == FALSE)) {
          if(crm_element_value_int(xml, F_ATTRD_HOST_ID, (int*)&v->nodeid) == 0) {
 @@ -476,15 +485,6 @@ attrd_peer_update(crm_node_t *peer, xmlNode *xml, bool filter)
              }
          }
      }
 -
 -    if(changed) {
 -        if(a->timer) {
 -            crm_trace("Delayed write out (%dms) for %s", a->timeout_ms, a->id);
 -            mainloop_timer_start(a->timer);
 -        } else {
 -            write_or_elect_attribute(a);
 -        }
 -    }
  }
  
  void

Okay!
I will verify the behavior.

Many Thanks!
Hideo Yamauchi.

--- On Wed, 2014/5/28, Andrew Beekhof and...@beekhof.net wrote:

 
 On 28 May 2014, at 4:10 pm, Andrew Beekhof and...@beekhof.net wrote:
 
  
  On 28 May 2014, at 3:04 pm, renayama19661...@ybb.ne.jp wrote:
  
  Hi Andrew,
  
  I'd expect that block to hit this clause though:
  
       } else if(mainloop_timer_running(a->timer)) {
          crm_info("Write out of '%s' delayed: timer is running", a->id);
          return;
  
  Which part of the source code does the clause you mention above refer 
  to?
  (Which line of the source code is it?)
  
  Is it the following code that you pointed to?
  
  right
  
  
  void
  write_attribute(attribute_t *a)
  {
    int updates = 0;
  (snip)
    } else if(mainloop_timer_running(a->timer)) {
        crm_info("Write out of '%s' delayed: timer is running", a->id);
        return;
    }
  (snip)
  
  When the problem occurs, the timer has not been started yet, so this 
  check does not block the write.
  
  That's the curious part
 
 Perhaps try:
 
 diff --git a/attrd/commands.c b/attrd/commands.c
 index 7f1b4b0..7342e23 100644
 --- a/attrd/commands.c
 +++ b/attrd/commands.c
 @@ -464,6 +464,15 @@ attrd_peer_update(crm_node_t *peer, xmlNode *xml, bool filter)
  
      a->changed |= changed;
  
 +    if(changed) {
 +        if(a->timer) {
 +            crm_trace("Delayed write out (%dms) for %s", a->timeout_ms, a->id);
 +            mainloop_timer_start(a->timer);
 +        } else {
 +            write_or_elect_attribute(a);
 +        }
 +    }
 +
      /* this only involves cluster nodes. */
      if(v->nodeid == 0 && (v->is_remote == FALSE)) {
          if(crm_element_value_int(xml, F_ATTRD_HOST_ID, (int*)&v->nodeid) == 0) {
 @@ -476,15 +485,6 @@ attrd_peer_update(crm_node_t *peer, xmlNode *xml, bool filter)
              }
          }
      }
 -
 -    if(changed) {
 -        if(a->timer) {
 -            crm_trace("Delayed write out (%dms) for %s", a->timeout_ms, a->id);
 -            mainloop_timer_start(a->timer);
 -        } else {
 -            write_or_elect_attribute(a);
 -        }
 -    }
  }
  
  void
 
 
 
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem] The dampen parameter of the attrd_updater command is ignored, and an attribute is updated.

2014-05-28 Thread renayama19661014
Hi Andrew,

I verified the behavior right away.
Your patch solves the problem.

Many Thanks!
Hideo Yamauchi.

--- On Wed, 2014/5/28, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp 
wrote:

 Hi Andrew,
 
  Perhaps try:
  
  diff --git a/attrd/commands.c b/attrd/commands.c
  index 7f1b4b0..7342e23 100644
  --- a/attrd/commands.c
  +++ b/attrd/commands.c
  @@ -464,6 +464,15 @@ attrd_peer_update(crm_node_t *peer, xmlNode *xml, bool filter)
   
       a->changed |= changed;
   
  +    if(changed) {
  +        if(a->timer) {
  +            crm_trace("Delayed write out (%dms) for %s", a->timeout_ms, a->id);
  +            mainloop_timer_start(a->timer);
  +        } else {
  +            write_or_elect_attribute(a);
  +        }
  +    }
  +
       /* this only involves cluster nodes. */
       if(v->nodeid == 0 && (v->is_remote == FALSE)) {
           if(crm_element_value_int(xml, F_ATTRD_HOST_ID, (int*)&v->nodeid) == 0) {
  @@ -476,15 +485,6 @@ attrd_peer_update(crm_node_t *peer, xmlNode *xml, bool filter)
               }
           }
       }
  -
  -    if(changed) {
  -        if(a->timer) {
  -            crm_trace("Delayed write out (%dms) for %s", a->timeout_ms, a->id);
  -            mainloop_timer_start(a->timer);
  -        } else {
  -            write_or_elect_attribute(a);
  -        }
  -    }
   }
   
   void
 
 Okay!
 I will verify the behavior.
 
 Many Thanks!
 Hideo Yamauchi.
 
 --- On Wed, 2014/5/28, Andrew Beekhof and...@beekhof.net wrote:
 
  
  On 28 May 2014, at 4:10 pm, Andrew Beekhof and...@beekhof.net wrote:
  
   
   On 28 May 2014, at 3:04 pm, renayama19661...@ybb.ne.jp wrote:
   
   Hi Andrew,
   
   I'd expect that block to hit this clause though:
   
        } else if(mainloop_timer_running(a->timer)) {
           crm_info("Write out of '%s' delayed: timer is running", a->id);
           return;
   
   Which part of the source code does the clause you mention above refer 
   to?
   (Which line of the source code is it?)
   
   Is it the following code that you pointed to?
   
   right
   
   
   void
   write_attribute(attribute_t *a)
   {
     int updates = 0;
   (snip)
     } else if(mainloop_timer_running(a->timer)) {
         crm_info("Write out of '%s' delayed: timer is running", a->id);
         return;
     }
   (snip)
   
   When the problem occurs, the timer has not been started yet, so this 
   check does not block the write.
   
   That's the curious part
  
  Perhaps try:
  
  diff --git a/attrd/commands.c b/attrd/commands.c
  index 7f1b4b0..7342e23 100644
  --- a/attrd/commands.c
  +++ b/attrd/commands.c
  @@ -464,6 +464,15 @@ attrd_peer_update(crm_node_t *peer, xmlNode *xml, bool filter)
   
       a->changed |= changed;
   
  +    if(changed) {
  +        if(a->timer) {
  +            crm_trace("Delayed write out (%dms) for %s", a->timeout_ms, a->id);
  +            mainloop_timer_start(a->timer);
  +        } else {
  +            write_or_elect_attribute(a);
  +        }
  +    }
  +
       /* this only involves cluster nodes. */
       if(v->nodeid == 0 && (v->is_remote == FALSE)) {
           if(crm_element_value_int(xml, F_ATTRD_HOST_ID, (int*)&v->nodeid) == 0) {
  @@ -476,15 +485,6 @@ attrd_peer_update(crm_node_t *peer, xmlNode *xml, bool filter)
               }
           }
       }
  -
  -    if(changed) {
  -        if(a->timer) {
  -            crm_trace("Delayed write out (%dms) for %s", a->timeout_ms, a->id);
  -            mainloop_timer_start(a->timer);
  -        } else {
  -            write_or_elect_attribute(a);
  -        }
  -    }
   }
   
   void
  
  
  
  
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] [Enhancement] When attrd reboots, the attribute disappears.

2014-06-08 Thread renayama19661014
Hi All,

I submitted this problem to the following Bugzilla in the past.
 * https://developerbugs.linuxfoundation.org/show_bug.cgi?id=2501

A similar phenomenon occurs in attrd of the latest Pacemaker.

Step 1) Configure the cluster as follows.
 export PCMK_fail_fast=no

Step 2) Start a cluster.

Step 3) Cause a failure in a resource and raise the failure count (fail-count).

[root@srv01 ~]# crm_mon -1 -Af
(snip)
Online: [ srv01 ]

 before-dummy   (ocf::heartbeat:Dummy): Started srv01 
 vip-master (ocf::heartbeat:Dummy2):Started srv01 


Migration summary:
* Node srv01: 
   before-dummy: migration-threshold=10 fail-count=1 last-failure='Mon Jun  9 
19:21:07 2014'

Failed actions:
before-dummy_monitor_1 on srv01 'not running' (7): call=11, 
status=complete, last-rc-change='Mon Jun  9 19:21:07 2014', queued=0ms, exec=0ms


Step 4) Restart attrd with kill. (I assume that attrd broke down and was restarted.)

Step 5) Cause a failure in the resource again, the same as step 3.
 * The failure count (fail-count) returns to 1.


[root@srv01 ~]# crm_mon -1 -Af 
(snip)
Online: [ srv01 ]

 before-dummy   (ocf::heartbeat:Dummy): Started srv01 
 vip-master (ocf::heartbeat:Dummy2):Started srv01 

Migration summary:
* Node srv01: 
   before-dummy: migration-threshold=10 fail-count=1 last-failure='Mon Jun  9 
19:22:47 2014'

Failed actions:
before-dummy_monitor_1 on srv01 'not running' (7): call=17, 
status=complete, last-rc-change='Mon Jun  9 19:22:47 2014', queued=0ms, exec=0ms


Even if attrd restarts, I think it is necessary to improve attrd so that 
attributes are reliably preserved.

Best Regards,
Hideo Yamauch.


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Enhancement] When attrd reboots, the attribute disappears.

2014-06-09 Thread renayama19661014
Hi Andrew,

Thank you for comments.

 Please use bugs.clusterlabs.org in future.
 I'll follow up in bugzilla

Okay!

Best Regards,
Hideo Yamauchi.

--- On Tue, 2014/6/10, Andrew Beekhof and...@beekhof.net wrote:

 
 On 9 Jun 2014, at 12:01 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi All,
  
  I submitted this problem to the following Bugzilla in the past.
  * https://developerbugs.linuxfoundation.org/show_bug.cgi?id=2501
 
 Please use bugs.clusterlabs.org in future.
 I'll follow up in bugzilla
 
  
  A similar phenomenon occurs in attrd of the latest Pacemaker.
  
  Step 1) Configure the cluster as follows.
  export PCMK_fail_fast=no
  
  Step 2) Start a cluster.
  
  Step 3) Cause a failure in a resource and raise the failure count (fail-count).
  
  [root@srv01 ~]# crm_mon -1 -Af
  (snip)
  Online: [ srv01 ]
  
  before-dummy   (ocf::heartbeat:Dummy): Started srv01 
  vip-master     (ocf::heartbeat:Dummy2):        Started srv01 
  
  
  Migration summary:
  * Node srv01: 
    before-dummy: migration-threshold=10 fail-count=1 last-failure='Mon Jun  
 9 19:21:07 2014'
  
  Failed actions:
     before-dummy_monitor_1 on srv01 'not running' (7): call=11, 
 status=complete, last-rc-change='Mon Jun  9 19:21:07 2014', queued=0ms, 
 exec=0ms
  
  
  Step 4) Restart attrd with kill. (I assume that attrd broke down and was restarted.)
  
  Step 5) Cause a failure in the resource again, the same as step 3.
  * The failure count (fail-count) returns to 1.
  
  
  [root@srv01 ~]# crm_mon -1 -Af         
  (snip)
  Online: [ srv01 ]
  
  before-dummy   (ocf::heartbeat:Dummy): Started srv01 
  vip-master     (ocf::heartbeat:Dummy2):        Started srv01 
  
  Migration summary:
  * Node srv01: 
    before-dummy: migration-threshold=10 fail-count=1 last-failure='Mon Jun  
 9 19:22:47 2014'
  
  Failed actions:
     before-dummy_monitor_1 on srv01 'not running' (7): call=17, 
 status=complete, last-rc-change='Mon Jun  9 19:22:47 2014', queued=0ms, 
 exec=0ms
  
  
  Even if attrd restarts, I think it is necessary to improve attrd so 
  that attributes are reliably preserved.
  
  Best Regards,
  Hideo Yamauch.
  
  
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
 
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] [Question] About snmp trap of crm_mon.

2014-07-23 Thread renayama19661014
Hi All,

We were verifying the snmptrap function of crm_mon in Pacemaker 1.1.12.
However, crm_mon does not seem to handle update messages in the new CIB diff format.


void
crm_diff_update(const char *event, xmlNode * msg)
{
    int rc = -1;
    long now = time(NULL);
(snip)
    if (crm_mail_to || snmp_target || external_agent) {
        /* Process operation updates */
        xmlXPathObject *xpathObj = xpath_search(msg,
                                                "//" F_CIB_UPDATE_RESULT "//" XML_TAG_DIFF_ADDED
                                                "//" XML_LRM_TAG_RSC_OP);
        int lpc = 0, max = numXpathResults(xpathObj);
(snip)

Best Regards,
Hideo Yamauch.


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Question] About snmp trap of crm_mon.

2014-07-24 Thread renayama19661014
Hi Andrew,

 Perhaps someone feels like testing this:
   https://github.com/beekhof/pacemaker/commit/3df6aff
 
 Otherwise I'll do it on monday


Thank you for the immediate correction.
I will verify the snmp behavior by the end of Monday.

Many Thanks!
Hideo Yamauchi.



- Original Message -
 From: Andrew Beekhof and...@beekhof.net
 To: renayama19661...@ybb.ne.jp; The Pacemaker cluster resource manager 
 pacemaker@oss.clusterlabs.org
 Cc: 
 Date: 2014/7/25, Fri 14:02
 Subject: Re: [Pacemaker] [Question] About snmp trap of crm_mon.
 
 
 On 24 Jul 2014, at 6:32 pm, Andrew Beekhof and...@beekhof.net wrote:
 
 
  On 24 Jul 2014, at 11:54 am, renayama19661...@ybb.ne.jp wrote:
 
  Hi All,
 
  We were verifying the snmptrap function of crm_mon in Pacemaker 1.1.12.
  However, crm_mon does not seem to handle update messages in the new CIB 
  diff format.
 
  dammit :(
 
 Perhaps someone feels like testing this:
   https://github.com/beekhof/pacemaker/commit/3df6aff
 
 Otherwise I'll do it on monday
 
 
 
 
  void
  crm_diff_update(const char *event, xmlNode * msg)
  {
     int rc = -1;
     long now = time(NULL);
  (snip)
     if (crm_mail_to || snmp_target || external_agent) {
         /* Process operation updates */
         xmlXPathObject *xpathObj = xpath_search(msg,
                                                 "//" F_CIB_UPDATE_RESULT "//" XML_TAG_DIFF_ADDED
                                                 "//" XML_LRM_TAG_RSC_OP);
         int lpc = 0, max = numXpathResults(xpathObj);
  (snip)
 
  Best Regards,
  Hideo Yamauch.
 
 
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
  Project Home: http://www.clusterlabs.org
  Getting started: 
 http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
 
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Question] About snmp trap of crm_mon.

2014-07-27 Thread renayama19661014

Hi Andrew,

 Perhaps someone feels like testing this:
   https://github.com/beekhof/pacemaker/commit/3df6aff
 
 Otherwise I'll do it on monday


I verified the SNMP trap output for resources and the SNMP trap for STONITH.
With your correction, the crm_mon command now sends traps.

Please merge the correction into the master repository.

Best Regards,
Hideo Yamauchi.



- Original Message -
From: renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp
To: Andrew Beekhof and...@beekhof.net; The Pacemaker cluster resource 
manager pacemaker@oss.clusterlabs.org 
Date: 2014/7/25, Fri 14:21
Subject: Re: [Pacemaker] [Question] About snmp trap of crm_mon.
 
Hi Andrew,

 Perhaps someone feels like testing this:
   https://github.com/beekhof/pacemaker/commit/3df6aff
 
 Otherwise I'll do it on monday


Thank you for the immediate correction.
I will verify the snmp behavior by the end of Monday.

Many Thanks!
Hideo Yamauchi.



- Original Message -
 From: Andrew Beekhof and...@beekhof.net
 To: renayama19661...@ybb.ne.jp; The Pacemaker cluster resource manager 
 pacemaker@oss.clusterlabs.org
 Cc: 
 Date: 2014/7/25, Fri 14:02
 Subject: Re: [Pacemaker] [Question] About snmp trap of crm_mon.
 
 
 On 24 Jul 2014, at 6:32 pm, Andrew Beekhof and...@beekhof.net wrote:
 
 
  On 24 Jul 2014, at 11:54 am, renayama19661...@ybb.ne.jp wrote:
 
  Hi All,
 
  We were verifying the snmptrap function of crm_mon in Pacemaker 1.1.12.
  However, crm_mon does not seem to handle update messages in the new CIB 
  diff format.
 
  dammit :(
 
 Perhaps someone feels like testing this:
   https://github.com/beekhof/pacemaker/commit/3df6aff
 
 Otherwise I'll do it on monday
 
 
 
 
  void
  crm_diff_update(const char *event, xmlNode * msg)
  {
     int rc = -1;
     long now = time(NULL);
  (snip)
     if (crm_mail_to || snmp_target || external_agent) {
         /* Process operation updates */
         xmlXPathObject *xpathObj = xpath_search(msg,
                                                 "//" F_CIB_UPDATE_RESULT "//" XML_TAG_DIFF_ADDED
                                                 "//" XML_LRM_TAG_RSC_OP);
         int lpc = 0, max = numXpathResults(xpathObj);
  (snip)
 
  Best Regards,
  Hideo Yamauch.
 
 
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
  Project Home: http://www.clusterlabs.org
  Getting started: 
 http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
 
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] [Problem] lrmd detects monitor time-out by revision of the system time.

2014-09-04 Thread renayama19661014
Hi All,

We confirmed that lrmd causes monitor timeouts when the system time is 
changed.
Since time adjustments are expected on systems running ntpd, this is a 
serious problem.

We can confirm this problem in the next procedure.

Step1) Start Pacemaker in a single node.
[root@snmp1 ~]# start pacemaker.combined
pacemaker.combined start/running, process 11382

Step2) Send simple crm.

trac2915-3.crm
primitive prmDummyA ocf:pacemaker:Dummy1 \
    op start interval="0s" timeout="60s" on-fail="restart" \
    op monitor interval="10s" timeout="30s" on-fail="restart" \
    op stop interval="0s" timeout="60s" on-fail="block"
group grpA prmDummyA
location rsc_location-grpA-1 grpA \
    rule $id="rsc_location-grpA-1-rule"   200: #uname eq snmp1 \
    rule $id="rsc_location-grpA-1-rule-0" 100: #uname eq snmp2

property $id="cib-bootstrap-options" \
    no-quorum-policy="ignore" \
    stonith-enabled="false" \
    crmd-transition-delay="2s"
rsc_defaults $id="rsc-options" \
    resource-stickiness="INFINITY" \
    migration-threshold="1"
--

[root@snmp1 ~]# crm configure load update trac2915-3.crm 
WARNING: rsc_location-grpA-1: referenced node snmp2 does not exist

[root@snmp1 ~]# crm_mon -1 -Af
Last updated: Fri Sep  5 13:09:45 2014
Last change: Fri Sep  5 13:09:13 2014
Stack: corosync
Current DC: snmp1 (3232238180) - partition WITHOUT quorum
Version: 1.1.12-561c4cf
1 Nodes configured
1 Resources configured


Online: [ snmp1 ]

 Resource Group: grpA
     prmDummyA  (ocf::pacemaker:Dummy1):        Started snmp1 

Node Attributes:
* Node snmp1:

Migration summary:
* Node snmp1: 

Step3) Just after the resource monitor has started, advance the system time 
by more than the monitor timeout (timeout=30s).
[root@snmp1 ~]#  date -s +40sec
Fri Sep  5 13:11:04 JST 2014

Step4) The time-out of the monitor occurs.

[root@snmp1 ~]# crm_mon -1 -Af
Last updated: Fri Sep  5 13:11:24 2014
Last change: Fri Sep  5 13:09:13 2014
Stack: corosync
Current DC: snmp1 (3232238180) - partition WITHOUT quorum
Version: 1.1.12-561c4cf
1 Nodes configured
1 Resources configured


Online: [ snmp1 ]


Node Attributes:
* Node snmp1:

Migration summary:
* Node snmp1: 
   prmDummyA: migration-threshold=1 fail-count=1 last-failure='Fri Sep  5 
13:11:04 2014'

Failed actions:
    prmDummyA_monitor_1 on snmp1 'unknown error' (1): call=7, status=Timed 
Out, last-rc-change='Fri Sep  5 13:11:04 2014', queued=0ms, exec=0ms


I confirmed some problems; they seem to be caused by timer events in 
lrmd's g_main_loop somehow firing after a shorter period than the monitor 
interval.

This problem does not seem to happen in the lrmd of PM1.0.

Best Regards,
Hideo Yamauchi.


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem] lrmd detects monitor time-out by revision of the system time.

2014-09-07 Thread renayama19661014
Hi Andrew,

Thank you for comments.

  I confirmed some problems; they seem to be caused by timer events in 
 lrmd's g_main_loop somehow firing after a shorter period than the monitor 
 interval.
 
 So if you create a trivial program with g_main_loop and a timer, and then 
 change 
 the system time, does the timer expire early?

Yes.

 
  This problem does not seem to happen in the lrmd of PM1.0.
 
 cluster-glue was probably using custom timeout code.


I looked at the implementation in glue, too.
The timeout handling of the new lrmd apparently needs an implementation 
similar to glue's.


Best Regards,
Hideo Yamauchi.


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem] lrmd detects monitor time-out by revision of the system time.

2014-09-08 Thread renayama19661014
Hi Andrew,

  I confirmed some problems; they seem to be caused by timer events in 
  lrmd's g_main_loop somehow firing after a shorter period than the 
  monitor interval.
 
  So if you create a trivial program with g_main_loop and a timer, and 
 then change 
  the system time, does the timer expire early?
 
  Yes.
 
 That sounds like a glib bug. Ideally we'd get it fixed there rather than 
 work-around it in pacemaker.
 Have you spoken to them at all?
 


No.
I will investigate the glib library a little more,
and then talk with the glib community.

I may follow up again afterwards.

Many Thanks,
Hideo Yamauchi.


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem] lrmd detects monitor time-out by revision of the system time.

2014-09-09 Thread renayama19661014
Hi Andrew,

I confirmed it in various ways.

The conclusion is that the behavior varies with the version of glib.
 * The problem occurs on RHEL6.x.
 * The problem does not occur on RHEL7.0.

And this problem is solved in newer versions of glib.

The following glib change seems to be what solves the problem in the newer versions:
 * 
https://github.com/GNOME/glib/commit/91113a8aeea40cc2d7dda65b09537980bb602a06#diff-fc9b4bb280a13f8e51c51b434e7d26fd

Many users expect correct behavior with the old glib.
 * Until they migrate to RHEL7...

Would you consider making modifications in Pacemaker to support the old versions?
 * Modeled on the old G_() function.
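
One possible direction for such a modification (a sketch of my own in plain 
glib, assuming g_get_monotonic_time() from glib >= 2.28; this is not code 
from Pacemaker or glue) is to re-check the elapsed monotonic time before 
acting on a timer callback, and to re-arm the timer when it fired early:

-
#include <glib.h>
#include <stdio.h>

typedef struct {
    gint64 armed_at;    /* g_get_monotonic_time(), in microseconds */
    guint  timeout_ms;
} op_timeout_t;

static gboolean timeout_cb(gpointer data)
{
    op_timeout_t *t = data;
    gint64 elapsed_ms = (g_get_monotonic_time() - t->armed_at) / 1000;

    if (elapsed_ms < t->timeout_ms) {
        /* The timer fired early (for example because the wall clock
         * jumped forward): re-arm for the remaining interval instead
         * of declaring a timeout. */
        g_timeout_add((guint)(t->timeout_ms - elapsed_ms), timeout_cb, t);
        return FALSE;
    }

    printf("operation really timed out after %lldms\n", (long long)elapsed_ms);
    return FALSE;
}

int main(void)
{
    GMainLoop *loop = g_main_loop_new(NULL, FALSE);
    op_timeout_t t = { g_get_monotonic_time(), 30000 };  /* 30s, like the monitor */

    g_timeout_add(t.timeout_ms, timeout_cb, &t);
    g_main_loop_run(loop);  /* only a sketch: stop it with Ctrl-C */
    return 0;
}
-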

Best Regards,
Hideo Yamauchi.



- Original Message -
 From: Andrew Beekhof and...@beekhof.net
 To: renayama19661...@ybb.ne.jp
 Cc: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
 Date: 2014/9/8, Mon 19:55
 Subject: Re: [Pacemaker] [Problem] lrmd detects monitor time-out by revision 
 of the system time.
 
 
 On 8 Sep 2014, at 7:12 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
 
  I confirmed some problems; they seem to be caused by timer events in 
  lrmd's g_main_loop somehow firing after a shorter period than the 
  monitor interval.
 
  So if you create a trivial program with g_main_loop and a timer, and 
  then change the system time, does the timer expire early?
 
  Yes.
 
  That sounds like a glib bug. Ideally we'd get it fixed there rather 
  than work-around it in pacemaker.
 
 
 
  No.
  I will investigate the glib library a little more,
  and then talk with the glib community.
 
  I may follow up again afterwards.
 
 Cool. I somewhat expect them to say "working as designed".
 Which would be unfortunate, but it shouldn't be too hard to work around.
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem] lrmd detects monitor time-out by revision of the system time.

2014-09-09 Thread renayama19661014
Hi Andrew,

Thank you for comments.

 I'll file a bug against glib on RHEL6 so that it gets fixed there.
 Can you send me your simple reproducer program?




I change the system time while timer_func2() is running.
When timer_func2() runs, the timeout of timer_func() completes before the 
planned time.

-
#include <stdio.h>
#include <stdlib.h>
#include <glib.h>
#include <sys/times.h>
gboolean timer_func(gpointer data){
        printf("TIMER EXPIRE!\n");
        fflush(stdout);
        exit(1);
//      return FALSE;
}
gboolean timer_func2(gpointer data){
        clock_t         ret;
        struct tms buff;

        ret = times(&buff);

        printf("TIMER2 EXPIRE! %d\n", (int)ret);
        fflush(stdout);
        return TRUE;
}
int main(int argc, char** argv){
        GMainLoop *m;
        clock_t         ret;
        struct tms buff;
        gint64 t;

//      t = g_get_monotonic_time();
        m = g_main_new(FALSE);
        g_timeout_add(5000, timer_func2, NULL);
        g_timeout_add(60000, timer_func, NULL);  /* long timer: expires early when the clock jumps */
        ret = times(&buff);
        printf("START! %d\n", (int)ret);
        g_main_run(m);
}
-



Many Thanks,
Hideo Yamauchi.


- Original Message -
 From: Andrew Beekhof and...@beekhof.net
 To: renayama19661...@ybb.ne.jp
 Cc: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
 Date: 2014/9/10, Wed 13:56
 Subject: Re: [Pacemaker] [Problem] lrmd detects monitor time-out by revision 
 of the system time.
 
 
 On 10 Sep 2014, at 2:48 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
 
  I confirmed it in various ways.
 
  The conclusion is that the behavior varies with the version of glib.
   * The problem occurs on RHEL6.x.
   * The problem does not occur on RHEL7.0.
 
  And this problem is solved in newer versions of glib.
 
  The following glib change seems to be what solves the problem in the newer versions:
   * 
 https://github.com/GNOME/glib/commit/91113a8aeea40cc2d7dda65b09537980bb602a06#diff-fc9b4bb280a13f8e51c51b434e7d26fd
 
  Many users expect correct behavior with the old glib.
   * Until they migrate to RHEL7...
 
  Would you consider making modifications in Pacemaker to support the old versions?
   * Modeled on the old G_() function.
 
 I'll file a bug against glib on RHEL6 so that it gets fixed there.
 Can you send me your simple reproducer program?
 
 
  Best Regards,
  Hideo Yamauchi.
 
 
 
  - Original Message -
  From: Andrew Beekhof and...@beekhof.net
  To: renayama19661...@ybb.ne.jp
  Cc: The Pacemaker cluster resource manager 
 pacemaker@oss.clusterlabs.org
  Date: 2014/9/8, Mon 19:55
  Subject: Re: [Pacemaker] [Problem] lrmd detects monitor time-out by 
 revision of the system time.
 
 
  On 8 Sep 2014, at 7:12 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
 
   I confirmed some problems; they seem to be caused by timer events in 
   lrmd's g_main_loop somehow firing after a shorter period than the 
   monitor interval.
 
   So if you create a trivial program with g_main_loop and a timer, and 
   then change the system time, does the timer expire early?
 
   Yes.
 
   That sounds like a glib bug. Ideally we'd get it fixed there rather 
   than work-around it in pacemaker.
   Have you spoken to them at all?
 
 
 
   No.
   I will investigate the glib library a little more,
   and then talk with the glib community.
 
   I may follow up again afterwards.
 
   Cool. I somewhat expect them to say "working as designed".
   Which would be unfortunate, but it shouldn't be too hard to work 
   around.
 
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] About a process name to output in log.

2014-09-19 Thread renayama19661014
Hi All,

In the logs of the latest Pacemaker, the lrmd process is logged under the 
name of the pacemaker_remoted process.
We would like the logs to use the name lrmd by default.
These names are switched by a macro.
However, there does not seem to be a configure option that changes this 
macro.

 * lrmd/Makefile.am
(snip)
pacemaker_remoted_CFLAGS= -DSUPPORT_REMOTE
(snip)
 * lrmd/main.c
(snip)
#if defined(HAVE_GNUTLS_GNUTLS_H) && defined(SUPPORT_REMOTE)
#  define ENABLE_PCMK_REMOTE
#endif
(snip)
#ifndef ENABLE_PCMK_REMOTE
    crm_log_preinit("lrmd", argc, argv);
    crm_set_options(NULL, "[options]", long_options,
                    "Daemon for controlling services confirming to different standards");
#else
    crm_log_preinit("pacemaker_remoted", argc, argv);
    crm_set_options(NULL, "[options]", long_options,
                    "Pacemaker Remote daemon for extending pacemaker functionality to remote nodes.");
#endif
(snip)


Please consider adding a configure option that sets this macro.
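
To illustrate the kind of switch I mean (a minimal sketch of my own, not 
the real lrmd/main.c; the file name name.c is only for this example), the 
logged name changes with a macro given on the compiler command line:

-
/* compile with:  cc -o lrmd_name name.c
 *           or:  cc -DSUPPORT_REMOTE -o remoted_name name.c   */
#include <stdio.h>

#if defined(SUPPORT_REMOTE)
#  define DAEMON_NAME "pacemaker_remoted"
#else
#  define DAEMON_NAME "lrmd"
#endif

int main(void)
{
    /* in lrmd, this name is what crm_log_preinit() receives */
    printf("logging as: %s\n", DAEMON_NAME);
    return 0;
}
-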

Best Regards,
Hideo Yamauchi.


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] About a process name to output in log.

2014-09-21 Thread renayama19661014
Hi Andrew,

Thank you for comments.

 In the logs of the latest Pacemaker, the lrmd process is logged under the 
 name of the pacemaker_remoted process.
 We would like the logs to use the name lrmd by default.
 
 I think you just need: https://github.com/beekhof/pacemaker/commit/ad083a8


But I did not understand the meaning of your answer.


It is the SUPPORT_REMOTE macro that turns on the ENABLE_PCMK_REMOTE macro.
If I cannot set SUPPORT_REMOTE from the configure command, it is necessary 
to change Makefile.am every time.

Is my understanding wrong?


Best Regards,
Hideo Yamauchi.



- Original Message -
 From: Andrew Beekhof and...@beekhof.net
 To: renayama19661...@ybb.ne.jp; The Pacemaker cluster resource manager 
 pacemaker@oss.clusterlabs.org
 Cc: 
 Date: 2014/9/19, Fri 20:50
 Subject: Re: [Pacemaker] About a process name to output in log.
 
 
 On 19 Sep 2014, at 5:04 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi All,
 
  In the log of latest Pacemaker, the name of the lrmd process is output by 
 the name of the pacemaker_remoted process.
  We like that log is output by default as lrmd.
 
 I think you just need: https://github.com/beekhof/pacemaker/commit/ad083a8
 
  These names are switched by a macro.
  However, there does not seem to be a configure option that changes this 
  macro.
 
   * lrmd/Makefile.am
  (snip)
  pacemaker_remoted_CFLAGS= -DSUPPORT_REMOTE
  (snip)
   * lrmd/main.c
  (snip)
  #if defined(HAVE_GNUTLS_GNUTLS_H) && defined(SUPPORT_REMOTE)
  #  define ENABLE_PCMK_REMOTE
  #endif
  (snip)
  #ifndef ENABLE_PCMK_REMOTE
      crm_log_preinit("lrmd", argc, argv);
      crm_set_options(NULL, "[options]", long_options,
                      "Daemon for controlling services confirming to different standards");
  #else
      crm_log_preinit("pacemaker_remoted", argc, argv);
      crm_set_options(NULL, "[options]", long_options,
                      "Pacemaker Remote daemon for extending pacemaker functionality to remote nodes.");
  #endif
  (snip)
 
 
  Please consider adding a configure option that sets this macro.
 
  Best Regards,
  Hideo Yamauchi.
 
 
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] About a process name to output in log.

2014-09-21 Thread renayama19661014
Hi Andrew,

 Is my understanding wrong?
 
 Without the above commit, the lrmd logs as 'pacemaker_remoted' and 
 pacemaker_remoted logs as 'lrmd'.
 We just needed to swap the two cases.  Which is what the commit achieves.


Okay!

For the time being, we will use a build with the commit mentioned above 
applied.


Many Thanks,
Hideo Yamauchi.


- Original Message -
 From: Andrew Beekhof and...@beekhof.net
 To: renayama19661...@ybb.ne.jp
 Cc: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
 Date: 2014/9/22, Mon 10:05
 Subject: Re: [Pacemaker] About a process name to output in log.
 
 
 On 22 Sep 2014, at 10:54 am, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
 
  Thank you for comments.
 
  In the logs of the latest Pacemaker, the lrmd process is logged under 
  the name of the pacemaker_remoted process.
  We would like the logs to use the name lrmd by default.
   
  I think you just need: 
 https://github.com/beekhof/pacemaker/commit/ad083a8
 
 
  But, I did not understand the meaning of your answer.
 
 
  It is the SUPPORT_REMOTE macro that turns on the ENABLE_PCMK_REMOTE macro.
  If I cannot set SUPPORT_REMOTE from the configure command, it is 
  necessary to change Makefile.am every time.
 
  Is my understanding wrong?
 
  Without the above commit, the lrmd logs as 'pacemaker_remoted' and 
 pacemaker_remoted logs as 'lrmd'.
 We just needed to swap the two cases.  Which is what the commit achieves.
 
 
 
  Best Regards,
  Hideo Yamauchi.
 
 
 
  - Original Message -
  From: Andrew Beekhof and...@beekhof.net
  To: renayama19661...@ybb.ne.jp; The Pacemaker cluster resource manager 
 pacemaker@oss.clusterlabs.org
  Cc: 
  Date: 2014/9/19, Fri 20:50
  Subject: Re: [Pacemaker] About a process name to output in log.
 
 
  On 19 Sep 2014, at 5:04 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi All,
 
   In the logs of the latest Pacemaker, the lrmd process is logged under 
   the name of the pacemaker_remoted process.
   We would like the logs to use the name lrmd by default.
 
  I think you just need: 
 https://github.com/beekhof/pacemaker/commit/ad083a8
 
   These names are switched by a macro.
   However, there does not seem to be a configure option that changes 
   this macro.
 
     * lrmd/Makefile.am
   (snip)
   pacemaker_remoted_CFLAGS= -DSUPPORT_REMOTE
   (snip)
     * lrmd/main.c
   (snip)
   #if defined(HAVE_GNUTLS_GNUTLS_H) && defined(SUPPORT_REMOTE)
   #  define ENABLE_PCMK_REMOTE
   #endif
   (snip)
   #ifndef ENABLE_PCMK_REMOTE
        crm_log_preinit("lrmd", argc, argv);
        crm_set_options(NULL, "[options]", long_options,
                        "Daemon for controlling services confirming to different standards");
   #else
        crm_log_preinit("pacemaker_remoted", argc, argv);
        crm_set_options(NULL, "[options]", long_options,
                        "Pacemaker Remote daemon for extending pacemaker functionality to remote nodes.");
   #endif
   (snip)
 
 
   Please consider adding a configure option that sets this macro.
 
  Best Regards,
  Hideo Yamauchi.
 
 
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
  Project Home: http://www.clusterlabs.org
  Getting started: 
 http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
 
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] query ?

2014-09-28 Thread renayama19661014
Hi Alex,

Because the recheck timer fires every 15 minutes by default, a state 
transition is recalculated by pengine.


-
{ XML_CONFIG_ATTR_RECHECK, "cluster_recheck_interval", "time",
  "Zero disables polling.  Positive values are an interval in seconds (unless other SI units are specified. eg. 5min)", "15min", &check_timer,
  "Polling interval for time based changes to options, resource parameters and constraints.",
  "The Cluster is primarily event driven, however the configuration can have elements that change based on time."
  "  To ensure these changes take effect, we can optionally poll the cluster's status for changes." },
{ "load-threshold", NULL, "percentage", NULL, "80%", &check_utilization,
  "The maximum amount of system resources that should be used by nodes in the cluster",
  "The cluster will slow down its recovery process when the amount of system resources used"
  "  (currently CPU) approaches this limit", },
-
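
For illustration, the polling side is essentially a periodic timer that 
forces a recalculation, roughly like the following sketch of my own in 
plain glib (a simplification, not the actual crmd code):

-
#include <glib.h>
#include <stdio.h>

/* Corresponds to crm_timer_popped producing an I_PE_CALC input, i.e. the
 * S_IDLE -> S_POLICY_ENGINE transitions visible in the log above. */
static gboolean recheck_popped(gpointer data)
{
    printf("cluster-recheck-interval popped: recalculating transition\n");
    return TRUE;   /* keep firing every interval */
}

int main(void)
{
    GMainLoop *loop = g_main_loop_new(NULL, FALSE);

    g_timeout_add_seconds(15 * 60, recheck_popped, NULL); /* 15min default */
    g_main_loop_run(loop);
    return 0;
}
-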

Best Regards,
Hideo Yamauchi.




- Original Message -
 From: Alex Samad - Yieldbroker alex.sa...@yieldbroker.com
 To: pacemaker@oss.clusterlabs.org pacemaker@oss.clusterlabs.org
 Cc: 
 Date: 2014/9/29, Mon 10:56
 Subject: [Pacemaker] query ?
 
 Hi
 
 Is this normal logging ?
 
 Not sure if I need to investigate any thing
 
 Sep 29 11:35:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State 
 transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED 
 origin=crm_timer_popped ]
 Sep 29 11:35:15 gsdmz1 pengine[2480]:   notice: unpack_config: On loss of CCM 
 Quorum: Ignore
 Sep 29 11:35:15 gsdmz1 pengine[2480]:   notice: process_pe_message: 
 Calculated 
 Transition 196: /var/lib/pacer/pengine/pe-input-247.bz2
 Sep 29 11:35:15 gsdmz1 crmd[2481]:   notice: run_graph: Transition 196 
 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
 Source=/var/lib/pacer/pengine/pe-input-247.bz2): Complete
 Sep 29 11:35:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State 
 transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
 cause=C_FSA_INTERNAL origin=notify_crmd ]
 Sep 29 11:50:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State 
 transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED 
 origin=crm_timer_popped ]
 Sep 29 11:50:15 gsdmz1 pengine[2480]:   notice: unpack_config: On loss of CCM 
 Quorum: Ignore
 Sep 29 11:50:15 gsdmz1 pengine[2480]:   notice: process_pe_message: 
 Calculated 
 Transition 197: /var/lib/pacer/pengine/pe-input-247.bz2
 Sep 29 11:50:15 gsdmz1 crmd[2481]:   notice: run_graph: Transition 197 
 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
 Source=/var/lib/pacer/pengine/pe-input-247.bz2): Complete
 Sep 29 11:50:15 gsdmz1 crmd[2481]:   notice: do_state_transition: State 
 transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
 cause=C_FSA_INTERNAL origin=notify_crmd ]
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Lot of errors after update

2014-10-05 Thread renayama19661014
Hi Andrew,

 lrmd[1632]:    error: crm_abort: crm_glib_handler: Forked child 1840 to 
 record non-fatal assert at logging.c:73 : Source ID 51 was not found when 
 attempting to remove it
 lrmd[1632]:    crit: crm_glib_handler: GLib: Source ID 51 was not found 
 when attempting to remove it
 
 stack trace of child 1840?


No. I was not able to get one.

But, I have a simple method to confirm the glib problem.
I will register the problem in Bugzilla by the end of today and contact you.

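The simple method is along these lines (a minimal sketch I wrote for 
illustration, not pacemaker code): when a timeout callback returns FALSE, 
glib destroys the source by itself, so a later g_source_remove() on the 
stored id produces exactly this "Source ID ... was not found" assert on 
glib 2.40:

-
#include <glib.h>

static GMainLoop *loop = NULL;
static guint timer_id = 0;

static gboolean cb(gpointer data)
{
    /* The safe pattern would be to zero timer_id here, because
     * returning FALSE already destroys the source. */
    g_main_loop_quit(loop);
    return FALSE;
}

int main(void)
{
    loop = g_main_loop_new(NULL, FALSE);
    timer_id = g_timeout_add(100, cb, NULL);
    g_main_loop_run(loop);

    /* Buggy pattern: the source is already gone, so on glib >= 2.40
     * this triggers "Source ID ... was not found when attempting to
     * remove it". */
    g_source_remove(timer_id);
    return 0;
}
-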

Best Regards,
Hideo Yamauchi.



- Original Message -
 From: Andrew Beekhof and...@beekhof.net
 To: renayama19661...@ybb.ne.jp; The Pacemaker cluster resource manager 
 pacemaker@oss.clusterlabs.org
 Cc: 
 Date: 2014/10/6, Mon 10:40
 Subject: Re: [Pacemaker] Lot of errors after update
 
 
 On 3 Oct 2014, at 11:18 am, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
 
  About a similar problem, we confirmed it in Pacemaker1.1.12.
  The problem occurs in (glib2.40.0) in Ubuntu14.04.
 
  lrmd[1632]:    error: crm_abort: crm_glib_handler: Forked child 1840 to 
 record non-fatal assert at logging.c:73 : Source ID 51 was not found when 
 attempting to remove it
  lrmd[1632]:     crit: crm_glib_handler: GLib: Source ID 51 was not found 
 when attempting to remove it
 
 stack trace of child 1840?
 
 
 
  This problem does not happen in RHEL6.
 
 
  The cause seems to be the difference in glib versions.
 
 
  When g_source_remove() is called for a timer whose callback returned 
 FALSE, it becomes an error in glib 2.40.0 (and probably in subsequent 
 versions too).
 
  It seems to be necessary to revise Pacemaker to solve the problem.
 
  Best Regards,
  Hideo Yamauchi.
 
 
  - Original Message -
  From: Andrew Beekhof and...@beekhof.net
  To: The Pacemaker cluster resource manager 
 pacemaker@oss.clusterlabs.org
  Cc: 
  Date: 2014/10/3, Fri 08:06
  Subject: Re: [Pacemaker] Lot of errors after update
 
 
  On 3 Oct 2014, at 12:10 am, Riccardo Bicelli 
 r.bice...@gmail.com wrote:
 
  I'm running  pacemaker-1.0.10
 
  well and truly time to get off the 1.0.x series
 
  and  glib-2.40.0-r1:2 on gentoo
 
   On 30/09/2014 23:23, Andrew Beekhof wrote:
  On 30 Sep 2014, at 11:36 pm, Riccardo Bicelli 
  r.bice...@gmail.com
    wrote:
 
 
  Hello,
  I've just updated my cluster nodes and now I see lot of 
 these 
  errors in syslog:
 
  Sep 30 15:32:43 localhost cib: [2870]: ERROR: crm_abort: 
  crm_glib_handler: Forked child 28573 to record non-fatal assert at 
 utils.c:449 : 
  Source ID 128394 was not found when attempting to remove it
  Sep 30 15:32:55 localhost cib: [2870]: ERROR: crm_abort: 
  crm_glib_handler: Forked child 28753 to record non-fatal assert at 
 utils.c:449 : 
  Source ID 128395 was not found when attempting to remove it
  Sep 30 15:32:55 localhost attrd: [2872]: ERROR: crm_abort: 
  crm_glib_handler: Forked child 28756 to record non-fatal assert at 
 utils.c:449 : 
  Source ID 58434 was not found when attempting to remove it
  Sep 30 15:32:55 localhost cib: [2870]: ERROR: crm_abort: 
  crm_glib_handler: Forked child 28757 to record non-fatal assert at 
 utils.c:449 : 
  Source ID 128396 was not found when attempting to remove it
  Sep 30 15:33:04 localhost cib: [2870]: ERROR: crm_abort: 
  crm_glib_handler: Forked child 28876 to record non-fatal assert at 
 utils.c:449 : 
  Source ID 128397 was not found when attempting to remove it
  Sep 30 15:33:04 localhost attrd: [2872]: ERROR: crm_abort: 
  crm_glib_handler: Forked child 28877 to record non-fatal assert at 
 utils.c:449 : 
  Source ID 58435 was not found when attempting to remove it
  Sep 30 15:33:04 localhost cib: [2870]: ERROR: crm_abort: 
  crm_glib_handler: Forked child 28878 to record non-fatal assert at 
 utils.c:449 : 
  Source ID 128398 was not found when attempting to remove it
   Sep 30 15:33:11 localhost cib: [2870]: ERROR: crm_abort: 
   crm_glib_handler: Forked child 29010 to record non-fatal assert at 
  utils.c:449 : Source ID 128399 was not found when attempting to remove it
  Sep 30 15:33:11 localhost attrd: [2872]: ERROR: crm_abort: 
  crm_glib_handler: Forked child 29011 to record non-fatal assert at 
 utils.c:449 : 
  Source ID 58436 was not found when attempting to remove it
  Sep 30 15:33:11 localhost cib: [2870]: ERROR: crm_abort: 
  crm_glib_handler: Forked child 29012 to record non-fatal assert at 
 utils.c:449 : 
  Source ID 128400 was not found when attempting to remove it
  Sep 30 15:33:14 localhost cib: [2870]: ERROR: crm_abort: 
  crm_glib_handler: Forked child 29060 to record non-fatal assert at 
 utils.c:449 : 
  Source ID 128401 was not found when attempting to remove it
  Sep 30 15:33:14 localhost attrd: [2872]: ERROR: crm_abort: 
  crm_glib_handler: Forked child 29061 to record non-fatal assert at 
 utils.c:449 : 
  Source ID 58437 was not found when attempting to remove it
 
   I don't understand what it means.
 
  It means glib is bitching about something it didn't used 
 to.
 
  What version of pacemaker did you update to?  I'm 
 reasonably 
  confident they're fixed in 1.1.12
 
 
 
