Re: [Pacemaker] pacemaker processes RSS growth

2012-09-14 Thread Vladislav Bogdanov
13.09.2012 15:18, Vladislav Bogdanov wrote:

...

 and now it runs on my testing cluster.
 
 IPC-related memory problems seem to be completely fixed now; the
 processes' own memory (RES-SHR in terms of htop) no longer grows (after 40
 minutes), although I see that both the RES and SHR counters sometimes
 increase synchronously. lrmd does not grow at all. Will look again after a
 few hours.


So, lrmd is OK. I see only 4 kB of RES-SHR growth on one node (the current
DC). The other instances have stayed constant in size for almost a day.

I see RES-SHR growth in pacemakerd (~100 kB per day), so I expect some
leakage here. Should I run it under valgrind?
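
(For reference, one way to do that - a sketch, untested here; the exact
valgrind options are a matter of taste, and some Pacemaker builds can also
honour PCMK_valgrind_enabled in /etc/sysconfig/pacemaker:)

  # stop the cluster first, then run the daemon under valgrind
  valgrind --leak-check=full --show-reachable=yes \
           --log-file=/tmp/pacemakerd-valgrind.%p.log pacemakerd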

And I see that both RES and SHR grow synchronously in crmd (600-700 kB
per day on member nodes, 6 MB on the DC), while RES-SHR is reduced by 24 kB
on the DC.

And I see cib growth in both RES and SHR in the range of 12-340 kB, and
4 kB of RES-SHR growth on all nodes except the DC.
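
(These numbers can also be sampled without htop - a sketch; fields 2 and 3
of /proc/<pid>/statm are resident and shared pages respectively:)

  pid=$(pidof pacemakerd)
  awk -v ps=$(getconf PAGESIZE) \
      '{ printf "RES-SHR: %d kB\n", ($2 - $3) * ps / 1024 }' /proc/$pid/statm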

I can't say for sure what causes the growth of shared pages. Maybe it is
/dev/shm; there are a lot of files there. I'll check whether it grows.
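
(A crude watch loop - just a sketch, the interval is arbitrary - should
show whether the usage below keeps growing:)

  while true; do date; du -sk /dev/shm; sleep 600; done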

# ls -l /dev/shm
total 75492
-rw------- 1 hacluster root      24 Sep 13 10:49 qb-attrd-control-1732-1734-6
-rw------- 1 root      root 1048576 Sep 13 10:49 qb-cfg-event-1634-1727-29-data
-rw------- 1 root      root    8248 Sep 13 10:49 qb-cfg-event-1634-1727-29-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49 qb-cfg-event-1634-1734-37-data
-rw------- 1 hacluster root    8248 Sep 13 10:49 qb-cfg-event-1634-1734-37-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49 qb-cfg-event-1634-1734-38-data
-rw------- 1 hacluster root    8248 Sep 13 10:49 qb-cfg-event-1634-1734-38-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49 qb-cfg-event-1634-1734-39-data
-rw------- 1 hacluster root    8248 Sep 13 10:49 qb-cfg-event-1634-1734-39-header
-rw------- 1 root      root 1048576 Sep 13 10:50 qb-cfg-event-1634-2440-36-data
-rw------- 1 root      root    8248 Sep 13 10:50 qb-cfg-event-1634-2440-36-header
-rw------- 1 root      root 1048576 Sep 13 10:49 qb-cfg-request-1634-1727-29-data
-rw------- 1 root      root    8252 Sep 13 10:49 qb-cfg-request-1634-1727-29-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49 qb-cfg-request-1634-1734-37-data
-rw------- 1 hacluster root    8252 Sep 13 10:49 qb-cfg-request-1634-1734-37-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49 qb-cfg-request-1634-1734-38-data
-rw------- 1 hacluster root    8252 Sep 13 10:49 qb-cfg-request-1634-1734-38-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49 qb-cfg-request-1634-1734-39-data
-rw------- 1 hacluster root    8252 Sep 13 10:49 qb-cfg-request-1634-1734-39-header
-rw------- 1 root      root 1048576 Sep 13 10:50 qb-cfg-request-1634-2440-36-data
-rw------- 1 root      root    8252 Sep 13 10:50 qb-cfg-request-1634-2440-36-header
-rw------- 1 root      root 1048576 Sep 13 10:49 qb-cfg-response-1634-1727-29-data
-rw------- 1 root      root    8248 Sep 13 10:49 qb-cfg-response-1634-1727-29-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49 qb-cfg-response-1634-1734-37-data
-rw------- 1 hacluster root    8248 Sep 13 10:49 qb-cfg-response-1634-1734-37-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49 qb-cfg-response-1634-1734-38-data
-rw------- 1 hacluster root    8248 Sep 13 10:49 qb-cfg-response-1634-1734-38-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49 qb-cfg-response-1634-1734-39-data
-rw------- 1 hacluster root    8248 Sep 13 10:49 qb-cfg-response-1634-1734-39-header
-rw------- 1 root      root 1048576 Sep 13 10:50 qb-cfg-response-1634-2440-36-data
-rw------- 1 root      root    8248 Sep 13 10:50 qb-cfg-response-1634-2440-36-header
-rw------- 1 hacluster root      24 Sep 13 10:49 qb-cib_rw-control-1729-1730-10
-rw------- 1 hacluster root      24 Sep 13 10:49 qb-cib_rw-control-1729-1732-12
-rw------- 1 hacluster root  524288 Sep 13 11:11 qb-cib_shm-event-1729-1734-8-data
-rw------- 1 hacluster root    8248 Sep 13 10:49 qb-cib_shm-event-1729-1734-8-header
-rw------- 1 hacluster root  524288 Sep 13 21:55 qb-cib_shm-request-1729-1734-8-data
-rw------- 1 hacluster root    8252 Sep 13 10:49 qb-cib_shm-request-1729-1734-8-header
-rw------- 1 hacluster root  524288 Sep 13 10:49 qb-cib_shm-response-1729-1734-8-data
-rw------- 1 hacluster root    8248 Sep 13 10:49 qb-cib_shm-response-1729-1734-8-header
-rw------- 1 root      root 8388608 Sep 14 06:34 qb-corosync-blackbox-data
-rw------- 1 root      root    8248 Sep 13 10:49 qb-corosync-blackbox-header
-rw------- 1 root      root 1048576 Sep 13 10:49 qb-cpg-event-1634-1727-30-data
-rw------- 1 root      root    8248 Sep 13 10:49 qb-cpg-event-1634-1727-30-header
-rw------- 1 hacluster root 1048576 Sep 13 11:11 qb-cpg-event-1634-1729-33-data
-rw------- 1 hacluster root    8248 Sep 13 10:49 qb-cpg-event-1634-1729-33-header
-rw------- 1 root      root 1048576 Sep 13 10:49 qb-cpg-event-1634-1730-32-data
-rw------- 1 root      root    8248 Sep 13 10:49 qb-cpg-event-1634-1730-32-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49

Re: [Pacemaker] master/slave resource does not stop (tries start repeatedly)

2012-09-14 Thread Kazunori INOUE

Hi Andrew,

I confirmed that this problem had been resolved.
- ClusterLabs/pacemaker : 7a9bf21cfc

However, I found two problems.

(1) Orphans are shown in the crm_mon output.

  # crm_mon -rf1
   :
  Full list of resources:

   Master/Slave Set: msAP [prmAP]
   Stopped: [ prmAP:0 prmAP:1 ]

  Migration summary:
  * Node vm5:
     prmAP: orphan
  * Node vm6:
     prmAP: orphan

  Failed actions:
  prmAP_monitor_1 (node=vm5, call=15, rc=1, status=complete): unknown error
  prmAP_monitor_1 (node=vm6, call=21, rc=1, status=complete): unknown error

(2) And the failure status cannot be cleared.

  The CIB is not updated even if I execute 'crm_resource -C'.

  # crm_resource -C -r msAP
  Cleaning up prmAP:0 on vm5
  Cleaning up prmAP:0 on vm6
  Cleaning up prmAP:1 on vm5
  Cleaning up prmAP:1 on vm6
  Waiting for 1 replies from the CRMd. OK

  # cibadmin -Q -o status
  <status>
    <node_state id="2439358656" uname="vm5" in_ccm="true" crmd="online"
        join="member" expected="member" crm-debug-origin="do_update_resource">
      <transient_attributes id="2439358656">
        <instance_attributes id="status-2439358656">
          <nvpair id="status-2439358656-probe_complete" name="probe_complete"
              value="true"/>
          <nvpair id="status-2439358656-fail-count-prmAP" name="fail-count-prmAP"
              value="1"/>
          <nvpair id="status-2439358656-last-failure-prmAP" name="last-failure-prmAP"
              value="1347598951"/>
        </instance_attributes>
      </transient_attributes>
      <lrm id="2439358656">
        <lrm_resources>
          <lrm_resource id="prmAP" type="Stateful" class="ocf" provider="pacemaker">
            <lrm_rsc_op id="prmAP_last_0" operation_key="prmAP_stop_0"
                operation="stop" crm-debug-origin="do_update_resource"
                crm_feature_set="3.0.6"
                transition-key="1:5:0:2935833e-7e6f-4931-9da8-f13f7de7aafc"
                transition-magic="0:0;1:5:0:2935833e-7e6f-4931-9da8-f13f7de7aafc"
                call-id="24" rc-code="0" op-status="0" interval="0"
                last-run="1347598936" last-rc-change="0" exec-time="205"
                queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
            <lrm_rsc_op id="prmAP_monitor_1" operation_key="prmAP_monitor_1"
                operation="monitor" crm-debug-origin="do_update_resource"
                crm_feature_set="3.0.6"
                transition-key="10:3:8:2935833e-7e6f-4931-9da8-f13f7de7aafc"
                transition-magic="0:8;10:3:8:2935833e-7e6f-4931-9da8-f13f7de7aafc"
                call-id="15" rc-code="8" op-status="0" interval="1"
                last-rc-change="1347598916" exec-time="40" queue-time="0"
                op-digest="4811cef7f7f94e3a35a70be7916cb2fd"/>
            <lrm_rsc_op id="prmAP_last_failure_0" operation_key="prmAP_monitor_1"
                operation="monitor" crm-debug-origin="do_update_resource"
                crm_feature_set="3.0.6"
                transition-key="10:3:8:2935833e-7e6f-4931-9da8-f13f7de7aafc"
                transition-magic="0:1;10:3:8:2935833e-7e6f-4931-9da8-f13f7de7aafc"
                call-id="15" rc-code="1" op-status="0" interval="1"
                last-rc-change="1347598936" exec-time="0" queue-time="0"
                op-digest="4811cef7f7f94e3a35a70be7916cb2fd"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
    </node_state>
    <node_state id="2456135872" uname="vm6" in_ccm="true" crmd="online"
        join="member" expected="member" crm-debug-origin="do_update_resource">
      <transient_attributes id="2456135872">
        <instance_attributes id="status-2456135872">
          <nvpair id="status-2456135872-probe_complete" name="probe_complete"
              value="true"/>
          <nvpair id="status-2456135872-fail-count-prmAP" name="fail-count-prmAP"
              value="1"/>
          <nvpair id="status-2456135872-last-failure-prmAP" name="last-failure-prmAP"
              value="1347598962"/>
        </instance_attributes>
      </transient_attributes>
      <lrm id="2456135872">
        <lrm_resources>
          <lrm_resource id="prmAP" type="Stateful" class="ocf" provider="pacemaker">
            <lrm_rsc_op id="prmAP_last_0" operation_key="prmAP_stop_0"
                operation="stop" crm-debug-origin="do_update_resource"
                crm_feature_set="3.0.6"
                transition-key="1:9:0:2935833e-7e6f-4931-9da8-f13f7de7aafc"
                transition-magic="0:0;1:9:0:2935833e-7e6f-4931-9da8-f13f7de7aafc"
                call-id="30" rc-code="0" op-status="0" interval="0"
                last-run="1347598962" last-rc-change="0" exec-time="230"
                queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
            <lrm_rsc_op id="prmAP_monitor_1" operation_key="prmAP_monitor_1"
                operation="monitor" crm-debug-origin="do_update_resource"
                crm_feature_set="3.0.6"
                transition-key="9:7:8:2935833e-7e6f-4931-9da8-f13f7de7aafc"
                transition-magic="0:8;9:7:8:2935833e-7e6f-4931-9da8-f13f7de7aafc"
                call-id="21" rc-code="8" op-status="0" interval="1"
                last-rc-change="1347598952" exec-time="43" queue-time="0"
                op-digest="4811cef7f7f94e3a35a70be7916cb2fd"/>
            <lrm_rsc_op id="prmAP_last_failure_0" operation_key="prmAP_monitor_1"
                operation="monitor" crm-debug-origin="do_update_resource"
                crm_feature_set="3.0.6"
                transition-key="9:7:8:2935833e-7e6f-4931-9da8-f13f7de7aafc"
                transition-magic="0:1;9:7:8:2935833e-7e6f-4931-9da8-f13f7de7aafc"
                call-id="21" rc-code="1" op-status="0" interval="1"
                last-rc-change="1347598962" exec-time="0" queue-time="0"
                op-digest="4811cef7f7f94e3a35a70be7916cb2fd"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
    </node_state>
  </status>

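(Until this is fixed, the leftover attributes can at least be spotted
quickly - a trivial sketch reusing the same query:)

  cibadmin -Q -o status | grep -E 'fail-count|last-failure'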

I wrote a patch for crm_mon and crm_resource.
(I am not checking 

[Pacemaker] pacemakerd does not daemonize

2012-09-14 Thread Borislav Borisov
Hello all, Andrew,


I am performing tests against pacemaker from commit
7a9bf21cfc993530812ee43bde8c5af2653c1fa6 and for some reason it does not
want to daemonize:

root@Cluster-Server-1:~/crmsh# pacemakerd -V
Could not establish pacemakerd connection: Connection refused (111)
info: crm_ipc_connect:  Could not establish pacemakerd connection:
Connection refused (111)
info: config_find_next: Processing additional service options...
info: get_config_opt:   Found 'corosync_quorum' for option: name
info: config_find_next: Processing additional service options...
info: get_config_opt:   Found 'corosync_cman' for option: name
info: config_find_next: Processing additional service options...
info: get_config_opt:   Found 'openais_clm' for option: name
info: config_find_next: Processing additional service options...
info: get_config_opt:   Found 'openais_evt' for option: name
info: config_find_next: Processing additional service options...
info: get_config_opt:   Found 'openais_ckpt' for option: name
info: config_find_next: Processing additional service options...
info: get_config_opt:   Found 'openais_msg' for option: name
info: config_find_next: Processing additional service options...
info: get_config_opt:   Found 'openais_lck' for option: name
info: config_find_next: Processing additional service options...
info: get_config_opt:   Found 'openais_tmr' for option: name
info: config_find_next: No additional configuration supplied for:
service
info: config_find_next: Processing additional quorum options...
info: get_config_opt:   Found 'quorum_cman' for option: provider
info: get_cluster_type: Detected an active 'cman' cluster
info: read_config:  Reading configure for stack: cman
info: config_find_next: Processing additional logging options...
info: get_config_opt:   Defaulting to 'off' for option: debug
info: get_config_opt:   Found '/var/log/cluster/corosync.log' for
option: logfile
info: get_config_opt:   Found 'yes' for option: to_logfile
info: get_config_opt:   Found 'no' for option: to_syslog
info: get_config_opt:   Found 'daemon' for option: syslog_facility
  notice: crm_add_logfile:  Additional logging available in
/var/log/cluster/corosync.log
info: read_config:  User configured file based logging and explicitly
disabled syslog.
  notice: main: Starting Pacemaker 1.1.7 (Build: 7a9bf21):  ncurses
libqb-logging libqb-ipc lha-fencing  corosync-plugin cman
info: main: Maximum core file size is: 18446744073709551615
info: qb_ipcs_us_publish:   server name: pacemakerd
info: get_local_node_name:  Using CMAN node name: Cluster-Server-1
  notice: update_node_processes:0xfe76d0 Node 1 now known as
Cluster-Server-1, was:
info: start_child:  Forked child 33591 for process cib
info: start_child:  Forked child 33592 for process stonith-ng
info: start_child:  Forked child 33593 for process lrmd
info: start_child:  Forked child 33594 for process attrd
info: start_child:  Forked child 33595 for process pengine
info: start_child:  Forked child 33596 for process crmd
info: main: Starting mainloop
  notice: update_node_processes:0xfe8370 Node 2 now known as
Cluster-Server-2, was:

It continues to work without any problem, just never goes into background.
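
(For what it's worth, a quick way to confirm it stayed in the foreground -
a sketch:)

  # a daemonized process should have been reparented to PID 1
  ps -o pid,ppid,stat,args -C pacemakerd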

Cheers,

Borislav


Re: [Pacemaker] pacemakerd does not daemonize

2012-09-14 Thread Borislav Borisov
Mysteriously the problem is gone, so I guess I screwed up the installation
at some point.

On Fri, Sep 14, 2012 at 12:32 PM, Borislav Borisov 
borislav.v.bori...@gmail.com wrote:

 Hello all, Andrew,


 I am performing tests against pacemaker from commit
 7a9bf21cfc993530812ee43bde8c5af2653c1fa6 and for some reason it does not
 want to daemonize:

 root@Cluster-Server-1:~/crmsh# pacemakerd -V
 Could not establish pacemakerd connection: Connection refused (111)
 info: crm_ipc_connect:  Could not establish pacemakerd connection:
 Connection refused (111)
 info: config_find_next: Processing additional service options...
 info: get_config_opt:   Found 'corosync_quorum' for option: name
 info: config_find_next: Processing additional service options...
 info: get_config_opt:   Found 'corosync_cman' for option: name
 info: config_find_next: Processing additional service options...
 info: get_config_opt:   Found 'openais_clm' for option: name
 info: config_find_next: Processing additional service options...
 info: get_config_opt:   Found 'openais_evt' for option: name
 info: config_find_next: Processing additional service options...
 info: get_config_opt:   Found 'openais_ckpt' for option: name
 info: config_find_next: Processing additional service options...
 info: get_config_opt:   Found 'openais_msg' for option: name
 info: config_find_next: Processing additional service options...
 info: get_config_opt:   Found 'openais_lck' for option: name
 info: config_find_next: Processing additional service options...
 info: get_config_opt:   Found 'openais_tmr' for option: name
 info: config_find_next: No additional configuration supplied for:
 service
 info: config_find_next: Processing additional quorum options...
 info: get_config_opt:   Found 'quorum_cman' for option: provider
 info: get_cluster_type: Detected an active 'cman' cluster
 info: read_config:  Reading configure for stack: cman
 info: config_find_next: Processing additional logging options...
 info: get_config_opt:   Defaulting to 'off' for option: debug
 info: get_config_opt:   Found '/var/log/cluster/corosync.log' for
 option: logfile
 info: get_config_opt:   Found 'yes' for option: to_logfile
 info: get_config_opt:   Found 'no' for option: to_syslog
 info: get_config_opt:   Found 'daemon' for option: syslog_facility
   notice: crm_add_logfile:  Additional logging available in
 /var/log/cluster/corosync.log
 info: read_config:  User configured file based logging and explicitly
 disabled syslog.
   notice: main: Starting Pacemaker 1.1.7 (Build: 7a9bf21):
 ncurses libqb-logging libqb-ipc lha-fencing  corosync-plugin cman
 info: main: Maximum core file size is: 18446744073709551615
 info: qb_ipcs_us_publish:   server name: pacemakerd
 info: get_local_node_name:  Using CMAN node name: Cluster-Server-1
   notice: update_node_processes:0xfe76d0 Node 1 now known as
 Cluster-Server-1, was:
 info: start_child:  Forked child 33591 for process cib
 info: start_child:  Forked child 33592 for process stonith-ng
 info: start_child:  Forked child 33593 for process lrmd
 info: start_child:  Forked child 33594 for process attrd
 info: start_child:  Forked child 33595 for process pengine
 info: start_child:  Forked child 33596 for process crmd
 info: main: Starting mainloop
   notice: update_node_processes:0xfe8370 Node 2 now known as
 Cluster-Server-2, was:

 It continues to work without any problem, just never goes into background.

 Cheers,

 Borislav



Re: [Pacemaker] [COLOCATION] constraints

2012-09-14 Thread Kashif Jawed Siddiqui
Hi,



colocation myset-1 inf: app2 app1
The above indicates that app1 is the dominant resource: if app1 is stopped,
app2 also stops.



The chain is app1 -> app2



Next,

colocation myset inf: app1 app2 app3

The above indicates that app1 is the dominant resource.

If app1 is stopped, app2 and app3 also stop;

if app2 stops, then only app3 stops.



The chain is app1 -> app2 -> app3



The above is equivalent to:

colocation myset-1 inf: app2 app1
colocation myset-2 inf: app3 app2
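
(For reference - a sketch, untested: in the underlying CIB XML the
two-resource form makes the direction explicit, with rsc being the
dependent resource and with-rsc the dominant one:)

  cibadmin --create -o constraints -X \
    '<rsc_colocation id="myset-1" score="INFINITY" rsc="app2" with-rsc="app1"/>'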

The question:

Why can't the ordering of resource allocation follow one format? Why the
difference in configuration?



Why not implement it as below?

colocation myset-1 inf: app1 app2

The chain can be app1 -> app2



colocation myset-1 inf: app1 app2 app3

The chain is app1 -> app2 -> app3



And its equivalent (for easier understanding) could be:

 colocation myset-1 inf: app1 app2

 colocation myset-2 inf: app2 app3



And so on...



Why is there a difference in configuration for 2 resources only? More than 2
resources follow the same pattern.



Please help explain.

Regards,
Kashif Jawed Siddiqui



Re: [Pacemaker] [COLOCATION] constraints

2012-09-14 Thread Lars Marowsky-Bree
On 2012-09-14T10:26:05, Kashif Jawed Siddiqui kashi...@huawei.com wrote:

 Why can't the ordering of resource allocation follow one format? Why the
 difference in configuration?

Because it was a mistake made at one point and then it became impossible
to fix because existing configurations and scripts would have to be
changed - and we can't break compatibility like that.


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




[Pacemaker] crm shell issues

2012-09-14 Thread Borislav Borisov
Hi all, Dejan,


I am struggling to get the latest crmsh version (812:b58a3398bf11) to work
with the latest pacemaker version () and so far I've encountered a couple
of issues.

The first one, which was already discussed on the list, is "INFO: object
Cluster-Server-1 cannot be represented in the CLI notation". Because you
never replied to what Vladislav Bogdanov reported in his last reply, I
just added the type=normal parameter using crm edit xml to fix the issue.

The next thing I encountered was, I believe, discussed earlier this
year:

 crm(live)configure# primitive dummy ocf:heartbeat:Dummy
 ERROR: pengine:metadata: could not parse meta-data:


Which was fixed with the following patch:

 diff -r b58a3398bf11 configure.ac
 --- a/configure.ac      Thu Sep 13 12:19:56 2012 +0200
 +++ b/configure.ac      Fri Sep 14 14:35:17 2012 +0300
 @@ -190,11 +190,9 @@
  AC_DEFINE_UNQUOTED(CRM_DTD_DIRECTORY,"$CRM_DTD_DIRECTORY", Where to keep CIB configuration files)
  AC_SUBST(CRM_DTD_DIRECTORY)

 -dnl Eventually move out of the heartbeat dir tree and create compatability code
 -dnl CRM_DAEMON_DIR=$libdir/pacemaker
 -GLUE_DAEMON_DIR=`extract_header_define $GLUE_HEADER GLUE_DAEMON_DIR`
 -AC_DEFINE_UNQUOTED(GLUE_DAEMON_DIR,"$GLUE_DAEMON_DIR", Location for Pacemaker daemons)
 -AC_SUBST(GLUE_DAEMON_DIR)
 +CRM_DAEMON_DIR=`$PKGCONFIG pcmk --variable=daemondir`
 +AC_DEFINE_UNQUOTED(CRM_DAEMON_DIR,"$CRM_DAEMON_DIR", Location for the Pacemaker daemons)
 +AC_SUBST(CRM_DAEMON_DIR)

  CRM_CACHE_DIR=${localstatedir}/cache/crm
  AC_DEFINE_UNQUOTED(CRM_CACHE_DIR,"$CRM_CACHE_DIR", Where crm shell keeps the cache)
 diff -r b58a3398bf11 modules/vars.py.in
 --- a/modules/vars.py.in        Thu Sep 13 12:19:56 2012 +0200
 +++ b/modules/vars.py.in        Fri Sep 14 14:35:17 2012 +0300
 @@ -200,7 +200,7 @@
  crm_schema_dir = "@CRM_DTD_DIRECTORY@"
  pe_dir = "@PE_STATE_DIR@"
  crm_conf_dir = "@CRM_CONFIG_DIR@"
 -crm_daemon_dir = "@GLUE_DAEMON_DIR@"
 +crm_daemon_dir = "@CRM_DAEMON_DIR@"
  crm_daemon_user = "@CRM_DAEMON_USER@"
  crm_version = "@VERSION@ (Build @BUILD_VERSION@)"


 What came next was:

 ERROR: running cibadmin -Ql -o rsc_defaults: Call cib_query failed (-6):
 No such device or address

Configuring any of the rsc_defaults parameters solves that problem.
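
For example (a sketch - any rsc_defaults entry will do, and the value here
is arbitrary):

  crm configure rsc_defaults resource-stickiness=0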

The last thing I encountered was the inability to add an LSB resource.

 crm(live)# ra
 crm(live)ra# list lsb
 acpid                   apache2                 apcupsd
 atd                     bootlogd                bootlogs
 bootmisc.sh             checkfs.sh              checkroot.sh
 clamav-freshclam        cman                    console-setup
 corosync                corosync-notifyd        cron
 ctdb                    dbus                    drbd
 halt                    hdparm                  hostname.sh
 hwclock.sh              hwclockfirst.sh         ifupdown
 ifupdown-clean          iptables                iscsi-scst
 kbd                     keyboard-setup          killprocs
 ldirectord              logd                    lvm2
 mdadm                   mdadm-raid              minidlna
 module-init-tools       mountall-bootclean.sh   mountall.sh
 mountdevsubfs.sh        mountkernfs.sh          mountnfs-bootclean.sh
 mountnfs.sh             mountoverflowtmp        mpt-statusd
 mrmonitor               mrmonitor.dpkg-old      msm_profile
 mtab.sh                 netatalk                networking
 nfs-common              nfs-kernel-server       ntp
 openais                 openhpid                pacemaker
 procps                  proftpd                 quota
 quotarpc                rc                      rc.local
 rcS                     reboot                  rmnologin
 rpcbind                 rsync                   rsyslog
 samba                   screen-cleanup          scst
 sendsigs                single                  smartd
 smartmontools           snmpd                   ssh
 stop-bootlogd           stop-bootlogd-single    stor_agent
 sudo                    sysstat                 tdm2
 udev                    udev-mtab               umountfs
 umountnfs.sh            umountroot              ups-monitor
 urandom                 vivaldiframeworkd       winbind
 x11-common              xinetd
 crm(live)ra# end
 crm(live)# configure
 crm(live)configure# primitive testlsb lsb:nfs-kernel-server
 ERROR: lsb:nfs-kernel-server: could not parse meta-data:
 ERROR: lsb:nfs-kernel-server: no such resource agent
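
(A first sanity check - a sketch; crmsh lists LSB agents from /etc/init.d,
so the script itself should at least exist and be executable:)

  test -x /etc/init.d/nfs-kernel-server && echo present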


Since I need this for my testing, I stopped here. I do not know how
adequate my patch for the daemon dir is, but it did the job. The LSB issue
I just couldn't tackle.

Cheers,

Borislav


[Pacemaker] correct way to deploy a CLVM configuration with pacemaker

2012-09-14 Thread Alberto Menichetti

Hi all,

I'm trying to deploy a CLVM configuration; my VGs will be active on only
one node at a time, and I will use ext3 rather than a clustered filesystem.


I configured clvmd and dlm in this way:

primitive cluster-dlm ocf:pacemaker:controld op monitor interval=60 \ 
timeout=60 meta is-managed=true


primitive cluster-lvm ocf:lvm2:clvmd params daemon_timeout=30 \
meta is-managed=true

group cluster-base cluster-dlm cluster-lvm meta is-managed=true

clone cluster-infra cluster-base meta \
interleave=true is-managed=true


Suppose now that I want to configure a resource to manage my VG, 
something like this:


primitive wfq-lv-rs ocf:heartbeat:LVM params \
volgrpname=WFQ_vg exclusive=yes op start interval=0 \
op monitor interval=120s timeout=60s op stop \
interval=0 timeout=30s meta is-managed=true



I think that my LVM resource should somehow depend on cluster-infra; in my
opinion, the following dependencies should be honored:


1. the resource that manages the VG, wfq-lv-rs, must be started only after
the resource that manages CLVM;


2. because the resource that manages CLVM is inside a clone resource and
will be started on all nodes, wfq-lv-rs must be started only on a node
where the clone containing the CLVM resource is running.



If the above assumptions are correct, how is it possible to manage this 
in pacemaker?
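
(If I understand the constraint syntax correctly, something like the
following order/colocation pair should encode both - just a sketch with
made-up constraint names, untested:)

  crm configure order lv-after-clvm inf: cluster-infra wfq-lv-rs
  crm configure colocation lv-with-clvm inf: wfq-lv-rs cluster-infra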


Thank you,
Alberto




[Pacemaker] Percona MySQL RA on MySQL 5.5 problem

2012-09-14 Thread Nathan Bird
I tried searching a bit and it seems like this one hasn't been reported
yet; my apologies if it has.

The RA currently issues a RESET SLAVE command, but the meaning of this
changed in 5.5. https://dev.mysql.com/doc/refman/5.5/en/reset-slave.html

It now needs to do a RESET SLAVE ALL.
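
(A version-aware sketch of the fix - not the RA's actual code, and the
helper name is made up:)

  reset_slave() {
      # from 5.5 on, plain RESET SLAVE keeps the connection parameters,
      # so RESET SLAVE ALL is needed to really forget the master
      local ver
      ver=$(mysql -N -e 'SELECT VERSION();' | cut -d. -f1-2)
      if [ "$(printf '%s\n' "$ver" 5.5 | sort -V | head -n1)" = "5.5" ]; then
          mysql -e 'RESET SLAVE ALL;'
      else
          mysql -e 'RESET SLAVE;'
      fi
  }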

Before I changed this, the resource agent got into a weird state because
'show slave status' wasn't producing the expected blank output on a master.

The resource agent was pulled from
https://github.com/y-trudeau/resource-agents-prm/raw/master/heartbeat/mysql
on 8/21/12 (though it doesn't appear to have changed since).


Cheers,
Nathan Bird




Re: [Pacemaker] pacemaker processes RSS growth

2012-09-14 Thread Vladislav Bogdanov
14.09.2012 09:54, Vladislav Bogdanov wrote:
 13.09.2012 15:18, Vladislav Bogdanov wrote:
 
 ...
 
 and now it runs on my testing cluster.

 IPC-related memory problems seem to be completely fixed now; the
 processes' own memory (RES-SHR in terms of htop) no longer grows (after 40
 minutes), although I see that both the RES and SHR counters sometimes
 increase synchronously. lrmd does not grow at all. Will look again after a
 few hours.
 
 
 So, lrmd is OK. I see only 4 kB of RES-SHR growth on one node (the current
 DC). The other instances have stayed constant in size for almost a day.
 
 I see RES-SHR growth in pacemakerd (~100 kB per day), so I expect some
 leakage here. Should I run it under valgrind?

Valgrind doesn't find anything of note here (1-hour and 9-hour runs).

==23851== LEAK SUMMARY:
==23851==    definitely lost: 528 bytes in 3 blocks
==23851==    indirectly lost: 17,361 bytes in 36 blocks
==23851==      possibly lost: 234 bytes in 8 blocks
==23851==    still reachable: 17,458 bytes in 163 blocks
==23851==         suppressed: 0 bytes in 0 blocks

 
 And I see that both RES and SHR grow synchronously in crmd (600-700 kB
 per day on member nodes, 6 MB on the DC), while RES-SHR is reduced by 24 kB
 on the DC.
 
 And I see cib growth in both RES and SHR in the range of 12-340 kB, and
 4 kB of RES-SHR growth on all nodes except the DC.
 
 I can't say for sure what causes the growth of shared pages. Maybe it is
 /dev/shm; there are a lot of files there. I'll check whether it grows.
 




Re: [Pacemaker] When are you going to release the next version of Booth?

2012-09-14 Thread Jiaju Zhang
On Thu, 2012-09-13 at 19:16 +0900, Yuichi SEINO wrote:
 Hi Jiaju,
 
 If the schedule is determined, I would like to know when you are going
 to release the next version of booth.

Sorry, it has not been decided yet. Currently I myself am busy with
other work and have not found time yet ... 

And I have another question. Currently, how many people operate booth
on their systems?

I do not have an exact number so far. However, if you have specific
use cases, we can discuss them.

Thanks,
Jiaju




