Re: [Pacemaker] Bug in compiling pacemaker-pygui: CRM_DAEMON_DIR not defined
On 05/31/11 04:13, Andrew Beekhof wrote:
On Mon, May 30, 2011 at 2:23 PM, Gao,Yan y...@novell.com wrote:
On 05/30/11 17:31, Andrew Beekhof wrote:
It used to be in crm_config.h, but I had to remove it because it interfered with multilib (not that I've ever seen anyone make use of that particular feature).
Mgmtd uses it to invoke pengine/crmd to retrieve the metadata. Is there a replacement for it? I can't seem to find one.
You can now use pkg-config. We supply the .pc files.
Fine! How about installing the .pc files as in the attached patch?

Regards,
  Yan
--
Gao,Yan y...@novell.com
Software Engineer
China Server Team, SUSE.

# HG changeset patch
# User Gao,Yan y...@novell.com
# Date 1306831076 -28800
# Node ID c86cb93c5a57c1f507a21be69d24fd28dee85397
# Parent e872eeb39a5f6e1fdb57c3108551a5353648c4f4
Low: Build: Install pkg-config files

diff -r e872eeb39a5f -r c86cb93c5a57 lib/Makefile.am
--- a/lib/Makefile.am	Fri May 06 11:01:44 2011 +0200
+++ b/lib/Makefile.am	Tue May 31 16:37:56 2011 +0800
@@ -17,6 +17,27 @@
 #
 MAINTAINERCLEANFILES= Makefile.in
 
+EXTRA_DIST = pcmk.pc.in pcmk-cib.pc.in pcmk-pe.pc.in
+
+LIBS = cib pe
+
+target_LIBS = $(LIBS:%=pcmk-%.pc)
+
+target_PACKAGE = pcmk.pc
+
+all-local: $(target_LIBS) $(target_PACKAGE)
+
+install-exec-local: $(target_LIBS) $(target_PACKAGE)
+	$(INSTALL) -d $(DESTDIR)/$(libdir)/pkgconfig
+	$(INSTALL) -m 644 $(target_LIBS) $(target_PACKAGE) $(DESTDIR)/$(libdir)/pkgconfig
+
+uninstall-local:
+	cd $(DESTDIR)/$(libdir)/pkgconfig && rm -f $(target_LIBS) $(target_PACKAGE)
+	rmdir $(DESTDIR)/$(libdir)/pkgconfig 2> /dev/null || :
+
+clean-local:
+	rm -f *.pc
+
 ## Subdirectories...
 SUBDIRS = common pengine transition cib fencing plugins
 DIST_SUBDIRS = $(SUBDIRS) ais
diff -r e872eeb39a5f -r c86cb93c5a57 pacemaker.spec.in
--- a/pacemaker.spec.in	Fri May 06 11:01:44 2011 +0200
+++ b/pacemaker.spec.in	Tue May 31 16:37:56 2011 +0800
@@ -158,6 +158,7 @@ License: GPLv2+ and LGPLv2+
 Summary: Pacemaker development package
 Group: Development/Libraries
 Requires: pacemaker-libs = %{version}-%{release}
+Requires: pkgconfig
 Requires: cluster-glue-libs-devel
 Requires: libxml2-devel libxslt-devel bzip2-devel glib2-devel
 %if %{with ais}
@@ -374,6 +375,7 @@ fi
 %dir %{_var}/lib/pacemaker
 %{_var}/lib/pacemaker
 %endif
+%{_libdir}/pkgconfig/*.pc
 
 %doc COPYING.LIB
 %doc AUTHORS

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
Re: [Pacemaker] Node removal in corosync-based cluster
On Mon, May 30, 2011 at 05:27:41PM +0300, Vladislav Bogdanov wrote:
Hi,
30.05.2011 17:12, Dejan Muhamedagic wrote:
Hi,
On Sun, May 29, 2011 at 11:58:17PM +0300, Vladislav Bogdanov wrote:
Hi all.
I've been tasked with removing some nodes from a cluster to save power, and found that it is not sufficient to follow http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-node-delete.html#s-del-ais
Did you try just crm node delete ...? That should do whatever's necessary to remove a node (pacemaker must be down on that node).
What extra commands does it run compared to Pacemaker Explained?
I must say that I don't know; it's easy to check in the code, but it should do whatever's necessary. If it doesn't, then we need to fix it. The idea is that users don't need to know about cibadmin, crm_node, and such stuff.
Thanks,
Dejan
I can vaguely recall this case, but can't remember anymore what the outcome was.
Thanks,
Dejan
After pacemaker is then restarted on any other remaining node, the removed node comes back in the CIB (with OFFLINE status). Even removing that node from corosync's objdb with corosync-objctl -d runtime.totem.pg.mrp.srp.members.COROSYNC_ID does not help. The only way I found to remove the node completely is to stop pacemaker on all cluster nodes and then start it again. So there must be some 'ghost' data in a running pacemaker instance which is not deleted when the node's data and status are removed from the CIB. I hope this can be fixed easily; this is just a note that it does not fully work now (1.1.5 with some patches from both 1.1 and devel).
Best,
Vladislav
Re: [Pacemaker] Automating Pacemaker Setup
On Mon, May 30, 2011 at 10:16:12PM +0200, Andrew Beekhof wrote:
On Mon, May 30, 2011 at 3:59 PM, Dejan Muhamedagic deja...@fastmail.fm wrote:
Hi,
On Fri, May 27, 2011 at 08:21:08PM +, veghead wrote:
veghead sean@... writes:
Todd Nine todd@... writes:
Wow. The example pacemaker config and the trick of starting heartbeat before
Bah. So close. But I still don't have it completely automated. If I start heartbeat on the first node and then run: crm configure myconfigure.txt
That fails. If I start heartbeat on the second node and wait for the two nodes to connect to each other (so that we have quorum), then I can run crm configure and it works. So that leaves me with a couple of questions:
1) Is there a way to force crm to accept my configuration request ~before~ starting the second node?
Not before the DC is elected.
Isn't there a force option? I thought that set the cib_quorum_override flag.
Actually no, the force option only overrides the shell. Of course, we could add that.
Thanks,
Dejan
There are two settings, dc-deadtime and startup-fencing, which can reduce the time for DC election. Note that disabling startup fencing is not recommended. But I don't know what your use case is. YMMV.
2) Is there a way to tell Pacemaker to ignore quorum requirements ~before~ starting additional nodes?
3) Is there an alternate way to configure Pacemaker?
Yes, you can modify the CIB _before_ starting pacemaker. Something like:
CIB_file=/var/lib/heartbeat/crm/cib.xml crm configure ...
But in that case you need to remove cib.xml.sig. Then you have to make sure that pacemaker starts first on this node. Consider this only if everything else fails.
Thanks,
Dejan
-Sean
Re: [Pacemaker] Node removal in corosync-based cluster
31.05.2011 12:32, Dejan Muhamedagic wrote:
On Mon, May 30, 2011 at 05:27:41PM +0300, Vladislav Bogdanov wrote:
Hi,
30.05.2011 17:12, Dejan Muhamedagic wrote:
Hi,
On Sun, May 29, 2011 at 11:58:17PM +0300, Vladislav Bogdanov wrote:
Hi all.
I've been tasked with removing some nodes from a cluster to save power, and found that it is not sufficient to follow http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-node-delete.html#s-del-ais
Did you try just crm node delete ...? That should do whatever's necessary to remove a node (pacemaker must be down on that node).
What extra commands does it run compared to Pacemaker Explained?
I must say that I don't know; it's easy to check in the code, but it should do whatever's necessary. If it doesn't, then we need to fix it. The idea is that users don't need to know about cibadmin, crm_node, and such stuff.
The problem is not how to delete a node; that works with cibadmin, and should work with the crm shell too. The node is deleted and everything is fine until pacemaker is restarted on any remaining node. But after such a restart the node re-appears in the CIB. That must be some artifact in pacemaker (e.g. not all node-specific data is deleted from some list, and that list is then consulted for the list of nodes). quorum-votes are not touched, by the way; they show the correct number. Hm. Not sure if clone instances are allocated. Probably not. The problem is quite minor; I wrote this just to note that it exists.
Best,
Vladislav
Thanks,
Dejan
[...]
[Pacemaker] Group not started/stopped in correct order with -INF collocation
Hello,
In a 2-node cluster, two groups (prod, test) are defined with a colocation score=-INFINITY. When the node running the group prod failed, the test group on the other node was stopped and the prod group was started. After the failed node was online again, the prod group was moved to the other node. BUT the test group was started first, and only then was the prod group stopped and started on the other node. There was a very short time when both groups were active on the same node. We use a RA to set the home directory for some users, so with this sequence the path first gets set and then unset. There is no order defined between these groups, to allow parallel starts on different nodes.
Is this a configuration problem or a bug? Either way, this behavior looks strange to me!
Kind regards,
Christian

Mit freundlichen Grüßen / with best regards
Christian Kulovits
AUSTRIAN AIRLINES
Christian Kulovits
ITSOC Central System Database Services
Senior IT System Engineer
Head Office
Office Park 2, P.O. Box 100
1300 Vienna-Airport, Austria
Phone: +43 (0)5 1766 11557
Fax: +43 (0)5 1766 511557
Mobile: +43 (0)664 80111 11557
email: christian.kulov...@austrian.com
www: www.austrian.com

Austrian Airlines AG, Office Park 2, P.O. Box 100, 1300 Vienna-Airport, Austria, registered office: Vienna, registered with Vienna Commercial Court under FN 111000k, DVR 0091740. This e-mail is confidential and is subject to disclaimers. Details can be found at: http://www.austrian.com/disclaimer.
Re: [Pacemaker] Group not started/stopped in correct order with -INF collocation
On 2011-05-31 14:05, Kulovits Christian - OS ITSC wrote:
[...]
Version information would be tremendously helpful, as would an hb_report tarball uploaded to Dropbox or any other publicly accessible location.
Cheers,
Florian
Re: [Pacemaker] Automating Pacemaker Setup
Dejan Muhamedagic dejanmm@... writes:
On Fri, May 27, 2011 at 08:21:08PM +, veghead wrote:
1) Is there a way to force crm to accept my configuration request ~before~ starting the second node?
Not before the DC is elected. There are two settings, dc-deadtime and startup-fencing, which can reduce the time for DC election. Note that disabling startup fencing is not recommended. But I don't know what your use case is. YMMV.
Well, I'm probably not quite the typical use case. We're using Amazon EC2 to set up and tear down testing environments. I have automated the entire process except for setting up Pacemaker. Beyond testing environments, I'd like to automate Pacemaker setup to cover the scenario where all nodes in a Pacemaker cluster crash and the entire configuration is lost. Obviously, once one node is running, setting up additional nodes becomes easy. It's just the bootstrap phase that's a challenge to automate.
2) Is there a way to tell Pacemaker to ignore quorum requirements ~before~ starting additional nodes?
3) Is there an alternate way to configure Pacemaker?
Yes, you can modify the CIB _before_ starting pacemaker. Something like:
CIB_file=/var/lib/heartbeat/crm/cib.xml crm configure ...
But in that case you need to remove cib.xml.sig. Then you have to make sure that pacemaker starts first on this node. Consider this only if everything else fails.
I'll give that a shot. Thanks.
-S
Re: [Pacemaker] Group not started/stopped in correct order with -INF collocation
Sorry!! Yes, of course:
Corosync Cluster Engine, version '1.2.7' SVN revision '3008'
CRM Version: 1.0.9 (da7075976b5ff0bee71074385f8fd02f296ec8a3)
pacemaker 1.0.10-1.2.el5 - (none) x86_64
corosync 1.2.7-1.1.el5 - (none) x86_64
heartbeat 3.0.3-2.3.el5 - (none) x86_64
Platform: Linux
Kernel release: 2.6.18-164.el5
Architecture: x86_64
Distribution: Description: Red Hat Enterprise Linux Server release 5.4 (Tikanga)
You can find the report here: http://dl.dropbox.com/u/30915505/AMOSreport.tar.bz2
And it is a 4-node cluster...
Regards,
Christian

-----Original Message-----
From: Florian Haas [mailto:florian.h...@linbit.com]
Sent: Tuesday, 31 May 2011 14:41
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Group not started/stopped in correct order with -INF collocation

On 2011-05-31 14:05, Kulovits Christian - OS ITSC wrote:
[...]
Version information would be tremendously helpful, as would an hb_report tarball uploaded to Dropbox or any other publicly accessible location.
Cheers,
Florian
Re: [Pacemaker] [Openais] Linux HA on debian sparc
Try running Pacemaker using the MCP. The plugin mode of Pacemaker never really worked very well because of the complexities of POSIX mmap and fork. Not having SPARC hardware personally, YMMV. With corosync 1.3.1 we recently went through an alignment-fixing process for ARM architectures; I hope that solves your alignment problems on SPARC as well.
Regards
-steve

On 05/31/2011 08:38 AM, william felipe_welter wrote:
I'm trying to set up HA with corosync and pacemaker using the Debian packages on the SPARC architecture. With the Debian packages, the corosync process dies after initializing the pacemaker process. I ran some tests with ltrace and strace, and these tools tell me that corosync died because of a segmentation fault. I tried a lot of things to solve this problem, but nothing made corosync work.
My second try was to compile from scratch (using these docs: http://www.clusterlabs.org/wiki/Install#From_Source). This way the corosync process starts up perfectly! But some pacemaker processes don't start. Analyzing the log, I see the probable reason:
attrd: [2283]: info: init_ais_connection_once: Connection to our AIS plugin (9) failed: Library error (2)
stonithd: [2280]: info: init_ais_connection_once: Connection to our AIS plugin (9) failed: Library error (2)
.
cib: [2281]: info: init_ais_connection_once: Connection to our AIS plugin (9) failed: Library error (2)
.
crmd: [3320]: debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /usr/var/run/crm/cib_rw
crmd: [3320]: debug: init_client_ipc_comms_nodispatch: Could not init comms on: /usr/var/run/crm/cib_rw
crmd: [3320]: debug: cib_native_signon_raw: Connection to command channel failed
crmd: [3320]: debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /usr/var/run/crm/cib_callback
crmd: [3320]: debug: init_client_ipc_comms_nodispatch: Could not init comms on: /usr/var/run/crm/cib_callback
crmd: [3320]: debug: cib_native_signon_raw: Connection to callback channel failed
crmd: [3320]: debug: cib_native_signon_raw: Connection to CIB failed: connection failed
crmd: [3320]: debug: cib_native_signoff: Signing out of the CIB Service
crmd: [3320]: debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /usr/var/run/crm/cib_rw
crmd: [3320]: debug: init_client_ipc_comms_nodispatch: Could not init comms on: /usr/var/run/crm/cib_rw
crmd: [3320]: debug: cib_native_signon_raw: Connection to command channel failed
crmd: [3320]: debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /usr/var/run/crm/cib_callback
crmd: [3320]: debug: init_client_ipc_comms_nodispatch: Could not init comms on: /usr/var/run/crm/cib_callback
crmd: [3320]: debug: cib_native_signon_raw: Connection to callback channel failed
crmd: [3320]: debug: cib_native_signon_raw: Connection to CIB failed: connection failed
crmd: [3320]: debug: cib_native_signoff: Signing out of the CIB Service
crmd: [3320]: info: do_cib_control: Could not connect to the CIB service: connection failed
My conf:
# Please read the corosync.conf.5 manual page
compatibility: whitetank

totem {
	version: 2
	join: 60
	token: 3000
	token_retransmits_before_loss_const: 10
	secauth: off
	threads: 0
	consensus: 8601
	vsftype: none
	threads: 0
	rrp_mode: none
	clear_node_high_bit: yes
	max_messages: 20
	interface {
		ringnumber: 0
		bindnetaddr: 10.10.23.0
		mcastaddr: 226.94.1.1
		mcastport: 5405
	}
}

logging {
	fileline: off
	to_stderr: no
	to_logfile: yes
	to_syslog: yes
	logfile: /var/log/cluster/corosync.log
	debug: on
	timestamp: on
	logger_subsys {
		subsys: AMF
		debug: on
	}
}

amf {
	mode: disabled
}

service {
	# Load the Pacemaker Cluster Resource Manager
	ver: 0
	name: pacemaker
}

aisexec {
	user: root
	group: root
}
My question is: why can't attrd, cib ... connect to the AIS plugin? What could be the reasons for the connection failure? (Yes, my /dev/shm is a tmpfs.)
--
William Felipe Welter
--
Consultant in Free Technologies
william.wel...@4linux.com.br
www.4linux.com.br
___
Openais mailing list
open...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais
Re: [Pacemaker] Linux HA on debian sparc
Note: there are three signals you could possibly see that generate a core file:
SIGABRT (assert() called in the codebase)
SIGSEGV (segmentation violation)
SIGBUS (alignment error)
Make sure you don't have a SIGBUS. Opening the core file with gdb will tell you which signal triggered the fault.
Regards
-steve

On 05/31/2011 08:34 AM, william felipe_welter wrote:
[...]
Re: [Pacemaker] [Openais] Linux HA on debian sparc
Thanks Steven,
Now I'm trying to run it with the MCP:
- Uninstalled Pacemaker 1.0
- Compiled and installed 1.1
But now I have problems initializing pacemakerd: Could not initialize Cluster Configuration Database API instance error 2
Debugging with gdb, I see that the error is in confdb; more specifically, the errors start in coroipcc.c at these lines:
448  if (addr != addr_orig) {
449      goto error_close_unlink;   <- enters here
450  }
Any idea what could cause this?

2011/5/31 Steven Dake sd...@redhat.com
[...]
Re: [Pacemaker] [Openais] Linux HA on debian sparc
On Tue, May 31, 2011 at 06:25:56PM -0300, william felipe_welter wrote:
> Thanks Steven,
>
> Now I'm trying to run it on the MCP:
> - Uninstall Pacemaker 1.0
> - Compile and install 1.1
>
> But now I have problems initializing pacemakerd: "Could not initialize Cluster Configuration Database API instance error 2".
> Debugging with gdb, I see that the error is in confdb; more specifically, the errors start in coreipcc.c at these lines:
>
>     448    if (addr != addr_orig) {
>     449        goto error_close_unlink;   <- enters here
>     450    }
>
> Any idea what can cause this?

I tried porting a ringbuffer (www.libqb.org) to sparc and had the same failure. There are 3 mmap() calls, and on sparc the third one keeps failing.

This is a common way of creating a ring buffer, see:
http://en.wikipedia.org/wiki/Circular_buffer#Exemplary_POSIX_Implementation

I couldn't get it working in the short time I tried. It's probably worth looking at the clib implementation to see why it's failing (I didn't get to that).

-Angus

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
[Pacemaker] High: PE: Bug lf#2554 - target-role alone is not sufficient to promote resources
In version 1.1.5, I found "High: PE: Bug lf#2554 - target-role alone is not sufficient to promote resources" in the ChangeLog. What does it mean? Which configurations are sufficient to promote resources? Could someone give me an example cib.xml? For days I have been troubled by resources no longer being promoted as they used to be, and I really need help. Thank you very much!
Re: [Pacemaker] [Openais] Linux HA on debian sparc
Angus, I wrote a test program (based on the code in coreipcc.c) and now I'm sure there are problems with the mmap system call on sparc.

Source code of my test program:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    #define PATH_MAX 36

    int main()
    {
        int32_t fd;
        void *addr_orig;
        void *addr;
        char path[PATH_MAX];
        const char *file = "teste123XX";
        size_t bytes = 10024;

        snprintf(path, PATH_MAX, "/dev/shm/%s", file);
        printf("PATH=%s\n", path);
        /* note: mkstemp() requires the template's last six characters to be "XXXXXX" */
        fd = mkstemp(path);
        printf("fd=%d \n", fd);

        /* reserve an address range with no access rights */
        addr_orig = mmap(NULL, bytes, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        /* map the file over that range, at exactly the same address */
        addr = mmap(addr_orig, bytes, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0);
        printf("ADDR_ORIG:%p ADDR:%p\n", addr_orig, addr);
        if (addr != addr_orig) {
            printf("Erro");
        }
    }

Results on x86:

    PATH=/dev/shm/teste123XX
    fd=3
    ADDR_ORIG:0x7f867d8e6000 ADDR:0x7f867d8e6000

Results on sparc:

    PATH=/dev/shm/teste123XX
    fd=3
    ADDR_ORIG:0xf7f72000 ADDR:0x

But I'm wondering whether it is really necessary to call mmap twice. What is the reason for calling mmap twice, the second time using the address returned by the first?

--
William Felipe Welter
--
Free Technologies Consultant
william.wel...@4linux.com.br
www.4linux.com.br
Re: [Pacemaker] [Openais] Linux HA on debian sparc
On Tue, May 31, 2011 at 11:52:48PM -0300, william felipe_welter wrote:
> Angus, I wrote a test program (based on the code in coreipcc.c) and now I'm sure there are problems with the mmap system call on sparc.
> [...]
> Results on x86:
> PATH=/dev/shm/teste123XX
> fd=3
> ADDR_ORIG:0x7f867d8e6000 ADDR:0x7f867d8e6000
>
> Results on sparc:
> PATH=/dev/shm/teste123XX
> fd=3
> ADDR_ORIG:0xf7f72000 ADDR:0x

Note: 0x == MAP_FAILED (from man mmap):

    RETURN VALUE
        On success, mmap() returns a pointer to the mapped area. On error,
        the value MAP_FAILED (that is, (void *) -1) is returned, and errno
        is set appropriately.

> But I'm wondering whether it is really necessary to call mmap twice. What is the reason for calling mmap twice, the second time using the address returned by the first?

Well, there are 3 calls to mmap():
1) one to allocate 2 * what you need (in pages)
2) one that maps the first half of the memory to a real file
3) one that maps the second half of the memory to the same file

The point is that when you write to an address past the end of the first half of the memory, the third mmap takes care of it by mapping that address back to the top of the file for you. This means you don't have to worry about ring buffer wrapping, which can be a headache.
-Angus

___
Openais mailing list
open...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais