Re: [Pacemaker] pacemaker processes RSS growth
13.09.2012 15:18, Vladislav Bogdanov wrote:
> ... and now it runs on my testing cluster. Ipc-related memory problems
> seem to be completely fixed now; processes' own memory (RES-SHR in terms
> of htop) no longer grows (after 40 minutes), although I see that both the
> RES and SHR counters sometimes increase synchronously. lrmd does not grow
> at all. Will look again after a few hours.

So, lrmd is ok. I see only 4kb growth in RES-SHR on one node (the current
DC); the other instances have stayed at a constant size for almost a day.

I see RES-SHR growth in pacemakerd (100kb per day), so I expect some
leakage there. Should I run it under valgrind?

And I see that both RES and SHR grow synchronously in crmd (600-700kb per
day on member nodes, 6Mb on the DC), while RES-SHR is reduced by 24kb on
the DC. And I see cib growth in both RES and SHR in the range of 12-340kb,
and 4kb growth in RES-SHR on nodes except the DC.

I can't say for sure what causes the growth of shared pages. It may be
/dev/shm - there are a lot of files there. I'll check whether it grows.

# ls -l /dev/shm
total 75492
-rw------- 1 hacluster root      24 Sep 13 10:49 qb-attrd-control-1732-1734-6
-rw------- 1 root      root 1048576 Sep 13 10:49 qb-cfg-event-1634-1727-29-data
-rw------- 1 root      root    8248 Sep 13 10:49 qb-cfg-event-1634-1727-29-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49 qb-cfg-event-1634-1734-37-data
-rw------- 1 hacluster root    8248 Sep 13 10:49 qb-cfg-event-1634-1734-37-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49 qb-cfg-event-1634-1734-38-data
-rw------- 1 hacluster root    8248 Sep 13 10:49 qb-cfg-event-1634-1734-38-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49 qb-cfg-event-1634-1734-39-data
-rw------- 1 hacluster root    8248 Sep 13 10:49 qb-cfg-event-1634-1734-39-header
-rw------- 1 root      root 1048576 Sep 13 10:50 qb-cfg-event-1634-2440-36-data
-rw------- 1 root      root    8248 Sep 13 10:50 qb-cfg-event-1634-2440-36-header
-rw------- 1 root      root 1048576 Sep 13 10:49 qb-cfg-request-1634-1727-29-data
-rw------- 1 root      root    8252 Sep 13 10:49 qb-cfg-request-1634-1727-29-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49 qb-cfg-request-1634-1734-37-data
-rw------- 1 hacluster root    8252 Sep 13 10:49 qb-cfg-request-1634-1734-37-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49 qb-cfg-request-1634-1734-38-data
-rw------- 1 hacluster root    8252 Sep 13 10:49 qb-cfg-request-1634-1734-38-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49 qb-cfg-request-1634-1734-39-data
-rw------- 1 hacluster root    8252 Sep 13 10:49 qb-cfg-request-1634-1734-39-header
-rw------- 1 root      root 1048576 Sep 13 10:50 qb-cfg-request-1634-2440-36-data
-rw------- 1 root      root    8252 Sep 13 10:50 qb-cfg-request-1634-2440-36-header
-rw------- 1 root      root 1048576 Sep 13 10:49 qb-cfg-response-1634-1727-29-data
-rw------- 1 root      root    8248 Sep 13 10:49 qb-cfg-response-1634-1727-29-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49 qb-cfg-response-1634-1734-37-data
-rw------- 1 hacluster root    8248 Sep 13 10:49 qb-cfg-response-1634-1734-37-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49 qb-cfg-response-1634-1734-38-data
-rw------- 1 hacluster root    8248 Sep 13 10:49 qb-cfg-response-1634-1734-38-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49 qb-cfg-response-1634-1734-39-data
-rw------- 1 hacluster root    8248 Sep 13 10:49 qb-cfg-response-1634-1734-39-header
-rw------- 1 root      root 1048576 Sep 13 10:50 qb-cfg-response-1634-2440-36-data
-rw------- 1 root      root    8248 Sep 13 10:50 qb-cfg-response-1634-2440-36-header
-rw------- 1 hacluster root      24 Sep 13 10:49 qb-cib_rw-control-1729-1730-10
-rw------- 1 hacluster root      24 Sep 13 10:49 qb-cib_rw-control-1729-1732-12
-rw------- 1 hacluster root  524288 Sep 13 11:11 qb-cib_shm-event-1729-1734-8-data
-rw------- 1 hacluster root    8248 Sep 13 10:49 qb-cib_shm-event-1729-1734-8-header
-rw------- 1 hacluster root  524288 Sep 13 21:55 qb-cib_shm-request-1729-1734-8-data
-rw------- 1 hacluster root    8252 Sep 13 10:49 qb-cib_shm-request-1729-1734-8-header
-rw------- 1 hacluster root  524288 Sep 13 10:49 qb-cib_shm-response-1729-1734-8-data
-rw------- 1 hacluster root    8248 Sep 13 10:49 qb-cib_shm-response-1729-1734-8-header
-rw------- 1 root      root 8388608 Sep 14 06:34 qb-corosync-blackbox-data
-rw------- 1 root      root    8248 Sep 13 10:49 qb-corosync-blackbox-header
-rw------- 1 root      root 1048576 Sep 13 10:49 qb-cpg-event-1634-1727-30-data
-rw------- 1 root      root    8248 Sep 13 10:49 qb-cpg-event-1634-1727-30-header
-rw------- 1 hacluster root 1048576 Sep 13 11:11 qb-cpg-event-1634-1729-33-data
-rw------- 1 hacluster root    8248 Sep 13 10:49 qb-cpg-event-1634-1729-33-header
-rw------- 1 root      root 1048576 Sep 13 10:49 qb-cpg-event-1634-1730-32-data
-rw------- 1 root      root    8248 Sep 13 10:49 qb-cpg-event-1634-1730-32-header
-rw------- 1 hacluster root 1048576 Sep 13 10:49
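For anyone wanting to track the same numbers, the RES-SHR figure htop
shows can be sampled straight from /proc/<pid>/statm (fields, in pages:
size, resident, shared, ...). A minimal sketch in shell - the daemon list
is only an assumption from a default Pacemaker install, and it assumes one
instance of each:

  # Print private memory (resident minus shared), in kB, per daemon.
  for d in pacemakerd cib crmd lrmd attrd pengine; do
    pid=$(pidof "$d") || continue
    read _ res shr _ < "/proc/$pid/statm"
    echo "$d: $(( (res - shr) * $(getconf PAGESIZE) / 1024 )) kB private"
  done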
Re: [Pacemaker] master/slave resource does not stop (tries start repeatedly)
Hi Andrew,

I confirmed that this problem has been resolved.
- ClusterLabs/pacemaker : 7a9bf21cfc

However, I found two problems.

(1) The resource is shown as an orphan in crm_mon.

# crm_mon -rf1
:
Full list of resources:

 Master/Slave Set: msAP [prmAP]
     Stopped: [ prmAP:0 prmAP:1 ]

Migration summary:
* Node vm5: prmAP: orphan
* Node vm6: prmAP: orphan

Failed actions:
    prmAP_monitor_1 (node=vm5, call=15, rc=1, status=complete): unknown error
    prmAP_monitor_1 (node=vm6, call=21, rc=1, status=complete): unknown error

(2) And the failure status cannot be cleared: the CIB is not updated even
if I execute 'crm_resource -C'.

# crm_resource -C -r msAP
Cleaning up prmAP:0 on vm5
Cleaning up prmAP:0 on vm6
Cleaning up prmAP:1 on vm5
Cleaning up prmAP:1 on vm6
Waiting for 1 replies from the CRMd. OK

# cibadmin -Q -o status
<status>
  <node_state id="2439358656" uname="vm5" in_ccm="true" crmd="online"
      join="member" expected="member" crm-debug-origin="do_update_resource">
    <transient_attributes id="2439358656">
      <instance_attributes id="status-2439358656">
        <nvpair id="status-2439358656-probe_complete" name="probe_complete" value="true"/>
        <nvpair id="status-2439358656-fail-count-prmAP" name="fail-count-prmAP" value="1"/>
        <nvpair id="status-2439358656-last-failure-prmAP" name="last-failure-prmAP" value="1347598951"/>
      </instance_attributes>
    </transient_attributes>
    <lrm id="2439358656">
      <lrm_resources>
        <lrm_resource id="prmAP" type="Stateful" class="ocf" provider="pacemaker">
          <lrm_rsc_op id="prmAP_last_0" operation_key="prmAP_stop_0" operation="stop"
              crm-debug-origin="do_update_resource" crm_feature_set="3.0.6"
              transition-key="1:5:0:2935833e-7e6f-4931-9da8-f13f7de7aafc"
              transition-magic="0:0;1:5:0:2935833e-7e6f-4931-9da8-f13f7de7aafc"
              call-id="24" rc-code="0" op-status="0" interval="0"
              last-run="1347598936" last-rc-change="0" exec-time="205" queue-time="0"
              op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
          <lrm_rsc_op id="prmAP_monitor_1" operation_key="prmAP_monitor_1" operation="monitor"
              crm-debug-origin="do_update_resource" crm_feature_set="3.0.6"
              transition-key="10:3:8:2935833e-7e6f-4931-9da8-f13f7de7aafc"
              transition-magic="0:8;10:3:8:2935833e-7e6f-4931-9da8-f13f7de7aafc"
              call-id="15" rc-code="8" op-status="0" interval="1"
              last-rc-change="1347598916" exec-time="40" queue-time="0"
              op-digest="4811cef7f7f94e3a35a70be7916cb2fd"/>
          <lrm_rsc_op id="prmAP_last_failure_0" operation_key="prmAP_monitor_1" operation="monitor"
              crm-debug-origin="do_update_resource" crm_feature_set="3.0.6"
              transition-key="10:3:8:2935833e-7e6f-4931-9da8-f13f7de7aafc"
              transition-magic="0:1;10:3:8:2935833e-7e6f-4931-9da8-f13f7de7aafc"
              call-id="15" rc-code="1" op-status="0" interval="1"
              last-rc-change="1347598936" exec-time="0" queue-time="0"
              op-digest="4811cef7f7f94e3a35a70be7916cb2fd"/>
        </lrm_resource>
      </lrm_resources>
    </lrm>
  </node_state>
  <node_state id="2456135872" uname="vm6" in_ccm="true" crmd="online"
      join="member" expected="member" crm-debug-origin="do_update_resource">
    <transient_attributes id="2456135872">
      <instance_attributes id="status-2456135872">
        <nvpair id="status-2456135872-probe_complete" name="probe_complete" value="true"/>
        <nvpair id="status-2456135872-fail-count-prmAP" name="fail-count-prmAP" value="1"/>
        <nvpair id="status-2456135872-last-failure-prmAP" name="last-failure-prmAP" value="1347598962"/>
      </instance_attributes>
    </transient_attributes>
    <lrm id="2456135872">
      <lrm_resources>
        <lrm_resource id="prmAP" type="Stateful" class="ocf" provider="pacemaker">
          <lrm_rsc_op id="prmAP_last_0" operation_key="prmAP_stop_0" operation="stop"
              crm-debug-origin="do_update_resource" crm_feature_set="3.0.6"
              transition-key="1:9:0:2935833e-7e6f-4931-9da8-f13f7de7aafc"
              transition-magic="0:0;1:9:0:2935833e-7e6f-4931-9da8-f13f7de7aafc"
              call-id="30" rc-code="0" op-status="0" interval="0"
              last-run="1347598962" last-rc-change="0" exec-time="230" queue-time="0"
              op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
          <lrm_rsc_op id="prmAP_monitor_1" operation_key="prmAP_monitor_1" operation="monitor"
              crm-debug-origin="do_update_resource" crm_feature_set="3.0.6"
              transition-key="9:7:8:2935833e-7e6f-4931-9da8-f13f7de7aafc"
              transition-magic="0:8;9:7:8:2935833e-7e6f-4931-9da8-f13f7de7aafc"
              call-id="21" rc-code="8" op-status="0" interval="1"
              last-rc-change="1347598952" exec-time="43" queue-time="0"
              op-digest="4811cef7f7f94e3a35a70be7916cb2fd"/>
          <lrm_rsc_op id="prmAP_last_failure_0" operation_key="prmAP_monitor_1" operation="monitor"
              crm-debug-origin="do_update_resource" crm_feature_set="3.0.6"
              transition-key="9:7:8:2935833e-7e6f-4931-9da8-f13f7de7aafc"
              transition-magic="0:1;9:7:8:2935833e-7e6f-4931-9da8-f13f7de7aafc"
              call-id="21" rc-code="1" op-status="0" interval="1"
              last-rc-change="1347598962" exec-time="0" queue-time="0"
              op-digest="4811cef7f7f94e3a35a70be7916cb2fd"/>
        </lrm_resource>
      </lrm_resources>
    </lrm>
  </node_state>
</status>

I wrote a patch for crm_mon and crm_resource. (I am not checking
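Until a proper fix is in, the leftover fail-count can usually be removed by
deleting the transient attribute directly. A hedged sketch using the node
and resource names from the report above (crm_failcount -D should be
equivalent where available):

  # Delete the stale fail-count attribute from each node's status section.
  crm_attribute -t status -N vm5 -n fail-count-prmAP -D
  crm_attribute -t status -N vm6 -n fail-count-prmAP -D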
[Pacemaker] pacemakerd does not daemonize
Hello all, Andrew,

I am performing tests against pacemaker from commit
7a9bf21cfc993530812ee43bde8c5af2653c1fa6, and for some reason it does not
want to daemonize:

root@Cluster-Server-1:~/crmsh# pacemakerd -V
Could not establish pacemakerd connection: Connection refused (111)
info: crm_ipc_connect: Could not establish pacemakerd connection: Connection refused (111)
info: config_find_next: Processing additional service options...
info: get_config_opt: Found 'corosync_quorum' for option: name
info: config_find_next: Processing additional service options...
info: get_config_opt: Found 'corosync_cman' for option: name
info: config_find_next: Processing additional service options...
info: get_config_opt: Found 'openais_clm' for option: name
info: config_find_next: Processing additional service options...
info: get_config_opt: Found 'openais_evt' for option: name
info: config_find_next: Processing additional service options...
info: get_config_opt: Found 'openais_ckpt' for option: name
info: config_find_next: Processing additional service options...
info: get_config_opt: Found 'openais_msg' for option: name
info: config_find_next: Processing additional service options...
info: get_config_opt: Found 'openais_lck' for option: name
info: config_find_next: Processing additional service options...
info: get_config_opt: Found 'openais_tmr' for option: name
info: config_find_next: No additional configuration supplied for: service
info: config_find_next: Processing additional quorum options...
info: get_config_opt: Found 'quorum_cman' for option: provider
info: get_cluster_type: Detected an active 'cman' cluster
info: read_config: Reading configure for stack: cman
info: config_find_next: Processing additional logging options...
info: get_config_opt: Defaulting to 'off' for option: debug
info: get_config_opt: Found '/var/log/cluster/corosync.log' for option: logfile
info: get_config_opt: Found 'yes' for option: to_logfile
info: get_config_opt: Found 'no' for option: to_syslog
info: get_config_opt: Found 'daemon' for option: syslog_facility
notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
info: read_config: User configured file based logging and explicitly disabled syslog.
notice: main: Starting Pacemaker 1.1.7 (Build: 7a9bf21): ncurses libqb-logging libqb-ipc lha-fencing corosync-plugin cman
info: main: Maximum core file size is: 18446744073709551615
info: qb_ipcs_us_publish: server name: pacemakerd
info: get_local_node_name: Using CMAN node name: Cluster-Server-1
notice: update_node_processes: 0xfe76d0 Node 1 now known as Cluster-Server-1, was:
info: start_child: Forked child 33591 for process cib
info: start_child: Forked child 33592 for process stonith-ng
info: start_child: Forked child 33593 for process lrmd
info: start_child: Forked child 33594 for process attrd
info: start_child: Forked child 33595 for process pengine
info: start_child: Forked child 33596 for process crmd
info: main: Starting mainloop
notice: update_node_processes: 0xfe8370 Node 2 now known as Cluster-Server-2, was:

It continues to work without any problem, it just never goes into the
background.

Cheers,
Borislav
Re: [Pacemaker] pacemakerd does not daemonize
Mysteriously the problem is gone, so I guess I screwed up the installation
at some point.

On Fri, Sep 14, 2012 at 12:32 PM, Borislav Borisov
borislav.v.bori...@gmail.com wrote:
> Hello all, Andrew, I am performing tests against pacemaker from commit
> 7a9bf21cfc993530812ee43bde8c5af2653c1fa6, and for some reason it does
> not want to daemonize: [full log quoted in the previous message]
>
> It continues to work without any problem, it just never goes into the
> background.

Cheers,
Borislav
Re: [Pacemaker] [COLOCATION] constraints
Hi,

colocation myset-1 inf: app2 app1

The above indicates that app1 is the dominant resource: if app1 is stopped,
app2 also stops. The chain is app1 -> app2.

Next,

colocation myset inf: app1 app2 app3

The above indicates that app1 is the dominant resource: if app1 is stopped,
app2 and app3 also stop; if app2 stops, then only app3 stops. The chain is
app1 -> app2 -> app3. The above is equivalent to:

colocation myset-1 inf: app2 app1
colocation myset-2 inf: app3 app2

The question: why can't the ordering of resource allocation follow one
format? Why the difference in configuration? Why not implement it like
this instead?

colocation myset-1 inf: app1 app2

The chain would be app1 -> app2.

colocation myset-1 inf: app1 app2 app3

The chain would be app1 -> app2 -> app3, with the equivalent (for easy
understanding) being:

colocation myset-1 inf: app1 app2
colocation myset-2 inf: app2 app3

And so on. Why is there a difference in configuration for exactly two
resources, while more than two resources follow the same pattern? Please
help explain.

Regards,
Kashif Jawed Siddiqui
Re: [Pacemaker] [COLOCATION] constraints
On 2012-09-14T10:26:05, Kashif Jawed Siddiqui kashi...@huawei.com wrote:

> Why can't the ordering of resource allocation follow one format? Why
> the difference in configuration?

Because it was a mistake made at one point, and then it became impossible
to fix, because existing configurations and scripts would have to be
changed - and we can't break compatibility like that.

Regards,
    Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
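A quick way to see which resource's placement is actually driving which,
given such a chain, is to inspect the allocation scores. A sketch, where
crm_simulate is available; the grep pattern is just the example resource
names from this thread:

  # Show allocation scores computed from the live cluster state.
  crm_simulate -L -s | grep -E 'app1|app2|app3'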
[Pacemaker] crm shell issues
Hi all, Dejan,

I am struggling to get the latest crmsh version (812:b58a3398bf11) to work
with the latest pacemaker version () and so far I've encountered a couple
of issues.

The first one, which was already discussed on the list:

INFO: object Cluster-Server-1 cannot be represented in the CLI notation.

Because you never replied to what Vladislav Bogdanov reported in his last
reply, I just added the type=normal parameter using 'crm edit xml' to fix
the issue.

The next thing I encountered was, I believe, discussed earlier this year:

crm(live)configure# primitive dummy ocf:heartbeat:Dummy
ERROR: pengine:metadata: could not parse meta-data:

Which was fixed with the following patch:

diff -r b58a3398bf11 configure.ac
--- a/configure.ac      Thu Sep 13 12:19:56 2012 +0200
+++ b/configure.ac      Fri Sep 14 14:35:17 2012 +0300
@@ -190,11 +190,9 @@
 AC_DEFINE_UNQUOTED(CRM_DTD_DIRECTORY,$CRM_DTD_DIRECTORY, Where to keep CIB configuration files)
 AC_SUBST(CRM_DTD_DIRECTORY)

-dnl Eventually move out of the heartbeat dir tree and create compatability code
-dnl CRM_DAEMON_DIR=$libdir/pacemaker
-GLUE_DAEMON_DIR=`extract_header_define $GLUE_HEADER GLUE_DAEMON_DIR`
-AC_DEFINE_UNQUOTED(GLUE_DAEMON_DIR,$GLUE_DAEMON_DIR, Location for Pacemaker daemons)
-AC_SUBST(GLUE_DAEMON_DIR)
+CRM_DAEMON_DIR=`$PKGCONFIG pcmk --variable=daemondir`
+AC_DEFINE_UNQUOTED(CRM_DAEMON_DIR,$CRM_DAEMON_DIR, Location for the Pacemaker daemons)
+AC_SUBST(CRM_DAEMON_DIR)

 CRM_CACHE_DIR=${localstatedir}/cache/crm
 AC_DEFINE_UNQUOTED(CRM_CACHE_DIR,$CRM_CACHE_DIR, Where crm shell keeps the cache)

diff -r b58a3398bf11 modules/vars.py.in
--- a/modules/vars.py.in        Thu Sep 13 12:19:56 2012 +0200
+++ b/modules/vars.py.in        Fri Sep 14 14:35:17 2012 +0300
@@ -200,7 +200,7 @@
 crm_schema_dir = @CRM_DTD_DIRECTORY@
 pe_dir = @PE_STATE_DIR@
 crm_conf_dir = @CRM_CONFIG_DIR@
-crm_daemon_dir = @GLUE_DAEMON_DIR@
+crm_daemon_dir = @CRM_DAEMON_DIR@
 crm_daemon_user = @CRM_DAEMON_USER@
 crm_version = @VERSION@ (Build @BUILD_VERSION@)

What came next was:

ERROR: running cibadmin -Ql -o rsc_defaults: Call cib_query failed (-6):
No such device or address

Configuring any of the rsc_defaults parameters solves that problem.

The last thing encountered was the inability to add an LSB resource:

crm(live)# ra
crm(live)ra# list lsb
acpid                  apache2                apcupsd
atd                    bootlogd               bootlogs
bootmisc.sh            checkfs.sh             checkroot.sh
clamav-freshclam       cman                   console-setup
corosync               corosync-notifyd       cron
ctdb                   dbus                   drbd
halt                   hdparm                 hostname.sh
hwclock.sh             hwclockfirst.sh        ifupdown
ifupdown-clean         iptables               iscsi-scst
kbd                    keyboard-setup         killprocs
ldirectord             logd                   lvm2
mdadm                  mdadm-raid             minidlna
module-init-tools      mountall-bootclean.sh  mountall.sh
mountdevsubfs.sh       mountkernfs.sh         mountnfs-bootclean.sh
mountnfs.sh            mountoverflowtmp       mpt-statusd
mrmonitor              mrmonitor.dpkg-old     msm_profile
mtab.sh                netatalk               networking
nfs-common             nfs-kernel-server      ntp
openais                openhpid               pacemaker
procps                 proftpd                quota
quotarpc               rc                     rc.local
rcS                    reboot                 rmnologin
rpcbind                rsync                  rsyslog
samba                  screen-cleanup         scst
sendsigs               single                 smartd
smartmontools          snmpd                  ssh
stop-bootlogd          stop-bootlogd-single   stor_agent
sudo                   sysstat                tdm2
udev                   udev-mtab              umountfs
umountnfs.sh           umountroot             ups-monitor
urandom                vivaldiframeworkd      winbind
x11-common             xinetd
crm(live)ra# end
crm(live)# configure
crm(live)configure# primitive testlsb lsb:nfs-kernel-server
ERROR: lsb:nfs-kernel-server: could not parse meta-data:
ERROR: lsb:nfs-kernel-server: no such resource agent

Since I need this for my testing, I stopped here. I do not know how
adequate my patch for the daemon dir is, but it did the job. The lsb issue
I just couldn't tackle.

Cheers,
Borislav
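For what it's worth, the value the patched configure.ac picks up can be
checked by hand before building. A sketch, assuming Pacemaker's pcmk
pkg-config file is installed:

  # Print the daemon directory Pacemaker advertises (path varies by distro).
  pkg-config pcmk --variable=daemondir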
[Pacemaker] correct way to deploy a CLVM configuration with pacemaker
Hi all,

I'm trying to deploy a CLVM configuration; my VGs will be active on only
one node at a time, and I won't use a clustered fs but ext3. I configured
clvmd and dlm this way:

primitive cluster-dlm ocf:pacemaker:controld \
        op monitor interval=60 timeout=60 \
        meta is-managed=true
primitive cluster-lvm ocf:lvm2:clvmd \
        params daemon_timeout=30 \
        meta is-managed=true
group cluster-base cluster-dlm cluster-lvm \
        meta is-managed=true
clone cluster-infra cluster-base \
        meta interleave=true is-managed=true

Suppose now that I want to configure a resource to manage my VG, something
like this:

primitive wfq-lv-rs ocf:heartbeat:LVM \
        params volgrpname=WFQ_vg exclusive=yes \
        op start interval=0 \
        op monitor interval=120s timeout=60s \
        op stop interval=0 timeout=30s \
        meta is-managed=true

I think that my LVM resource should somehow depend on cluster-infra; in my
opinion the following dependencies should be honored:

1. the resource that manages the VG, wfq-lv-rs, must be started only after
   the resource that manages CLVM;
2. because the resource that manages CLVM is inside a clone resource and
   will be started on all nodes, wfq-lv-rs must be started only on a node
   where the clone containing the CLVM resource is running.

If the above assumptions are correct, how is it possible to manage this in
pacemaker?

Thank you,
Alberto
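In case a sketch helps: both dependencies described above are usually
expressed with one order constraint and one colocation constraint against
the clone. The constraint names below are illustrative, not from the
original configuration:

  # Start the VG resource only after the clone, and only where the clone runs.
  crm configure order o-clvm-before-vg inf: cluster-infra wfq-lv-rs
  crm configure colocation c-vg-with-clvm inf: wfq-lv-rs cluster-infra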
[Pacemaker] Percona MySQL RA on MySQL 5.5 problem
I tried searching a bit and it seems like this one hasn't been reported
yet; my apologies if it has.

The RA currently issues a RESET SLAVE command, but the meaning of this
changed in MySQL 5.5:
https://dev.mysql.com/doc/refman/5.5/en/reset-slave.html

It now needs to do a RESET SLAVE ALL. Before I changed this, the resource
agent got into a weird state because SHOW SLAVE STATUS wasn't producing
the expected blank output on a master.

Resource agent pulled from
https://github.com/y-trudeau/resource-agents-prm/raw/master/heartbeat/mysql
on 8/21/12 (though it doesn't appear to have changed since).

Cheers,
Nathan Bird
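For anyone patching a local copy by hand, the distinction looks roughly
like this - a sketch, not the RA's actual code (RESET SLAVE ALL requires
MySQL 5.5.16 or later):

  # On 5.5, RESET SLAVE keeps the master connection parameters, so
  # SHOW SLAVE STATUS still reports a row on the promoted master;
  # RESET SLAVE ALL clears them as well.
  mysql -e 'STOP SLAVE; RESET SLAVE ALL;'
  mysql -e 'SHOW SLAVE STATUS\G'    # should now print nothing on a master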
Re: [Pacemaker] pacemaker processes RSS growth
14.09.2012 09:54, Vladislav Bogdanov wrote:
> 13.09.2012 15:18, Vladislav Bogdanov wrote:
>> ... and now it runs on my testing cluster. [...]
> So, lrmd is ok. I see only 4kb growth in RES-SHR on one node (current
> DC). Other instances have stayed at a constant size for almost a day.
>
> I see RES-SHR growth in pacemakerd (100kb per day), so I expect some
> leakage here. Should I run it under valgrind?

Valgrind doesn't find anything valuable here (1- and 9-hour runs):

==23851== LEAK SUMMARY:
==23851==    definitely lost: 528 bytes in 3 blocks
==23851==    indirectly lost: 17,361 bytes in 36 blocks
==23851==      possibly lost: 234 bytes in 8 blocks
==23851==    still reachable: 17,458 bytes in 163 blocks
==23851==         suppressed: 0 bytes in 0 blocks

> And I see that both RES and SHR grow synchronously in crmd (600-700kb
> per day on member nodes, 6Mb on the DC), while RES-SHR is reduced by
> 24kb on the DC. And I see cib growth in both RES and SHR in the range
> of 12-340kb, and 4kb growth in RES-SHR on nodes except the DC. I can't
> say for sure what causes the growth of shared pages. It may be /dev/shm
> - there are a lot of files there. I'll check whether it grows.
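For reference, a leak summary like the one above can be produced with
stock valgrind options. A sketch - nothing Pacemaker-specific is assumed:

  # Run the daemon under valgrind and collect a full leak report per PID.
  valgrind --leak-check=full --show-reachable=yes \
           --log-file=/var/log/pacemakerd-vg.%p pacemakerd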
Re: [Pacemaker] When are you going to release the next version of Booth?
On Thu, 2012-09-13 at 19:16 +0900, Yuichi SEINO wrote:
> Hi Jiaju,
> If the schedule is determined, I would like to know when you are going
> to release the next version of booth.

Sorry, it has not been decided yet. Currently I am busy with other work
and have not found time for it yet...

> And I have another question. Currently, how many people operate booth
> on their systems?

I don't have an exact number so far. However, if you have specific use
cases, we can discuss them.

Thanks,
Jiaju