Re: [Pacemaker] votequorum for 2 node cluster
On 06/11/14 16:35, Kostiantyn Ponomarenko wrote:
> And that is like roulette, in case we lose the lowest nodeid we lose all.
> So I can lose only the node which doesn't have the lowest nodeid?
> And it's not useful in 2 node cluster.
> Am i correct?

It may be useful. If you define roles for the nodes, like this:

– node 2: 'master' node
– node 1: 'backup' node

and monitor node availability to replace the backup when it fails, then you can get quite high availability. It won't work in Active/Active or peer-to-peer (all nodes equal) setups, though.

Greets,
Jacek

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Howto check if the current node is active?
On 2014-01-07 13:33, Bauer, Stefan (IZLBW Extern) wrote:
> How can i check if the current node i'm connected to is the active?
>
> It should be parseable because i want to use it in a script.

Pacemaker is not limited to Active-Passive setups; in fact it has no notion of an 'active' node – every node in the cluster is active (unless in standby). Active/Passive node status may make sense in many Pacemaker deployments, but that is specific to the configuration. Sometimes the 'active node' will be the one running the DRBD master, other times it will be the one where a specific resource is running. Generally you can test that by parsing 'cibadmin' output or by using some higher-level Pacemaker shell (crmsh or pcs).

Greets,
Jacek
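As a minimal sketch of the parsing approach suggested above: the resource name `cluster_ip` and the captured `crm_mon -1` line below are hypothetical samples, which a real script would replace with live `crm_mon -1` output.

```shell
# Hypothetical sample line, in the shape 'crm_mon -1' prints for a
# primitive resource. A real script would capture it with:
#   crm_mon -1 | grep cluster_ip
sample=' cluster_ip (ocf::heartbeat:IPaddr2): Started node1'

# The node currently running the resource is the last field.
node=$(printf '%s\n' "$sample" | awk '/cluster_ip/ { print $NF }')

# Compare against this host to decide whether "we" are the active node.
if [ "$node" = "$(uname -n)" ]; then
    echo "active"
else
    echo "passive (resource on $node)"
fi
```

Which resource defines "active" is, as noted above, entirely configuration-specific.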
Re: [Pacemaker] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file
On Wed, 26 Jun 2013 18:38:37 +1000 Andrew Beekhof wrote:
> >> trace Jun 25 13:40:10 gio_read_socket(366):0: 0xa6c140.4 1 (ref=1)
> >> trace Jun 25 13:40:10 lrmd_ipc_accept(89):0: Connection 0xa6d110
> >> info Jun 25 13:40:10 crm_client_new(276):0: Connecting 0xa6d110 for uid=17 gid=0 pid=25212 id=d771e06b-47e7-43a6-a447-63343870396e
> >> debug Jun 25 13:40:10 handle_new_connection(735):2147483648: IPC credentials authenticated (25209-25212-6)
> >> debug Jun 25 13:40:10 qb_ipcs_shm_connect(282):2147483648: connecting to client [25212]
> >>
> >> Slightly more helpful...
> >>
> >> What group(s) does uid=17 have?
> >
> > # id hacluster
> > uid=17(hacluster) gid=60(haclient) groups=60(haclient)
> > # ps -u hacluster
> >   PID TTY      TIME     CMD
> >  2335 ?        00:00:00 cib
> >  2339 ?        00:00:00 attrd
> >  2340 ?        00:00:00 pengine
> >  2355 ?        00:00:00 crmd
> > # grep -E '^(Uid|Gid|Groups)' /proc/2355/status
> > Uid:    17      17      17      17
> > Gid:    0       0       0       0
> > Groups:
> >
> > So, crmd runs with the 'hacluster' uid, but no associated gid/groups. Can this be the problem?
>
> Definitely.
>
> Urgh. Now that I look closer, I see the commits I was thinking of came _after_ 1.1.9, not before :-(
>
> So basically you need an rc of 1.1.10

1.1.10-rc5 works. Thanks a lot for the debugging help! :)

Jacek
Re: [Pacemaker] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file
On Wed, 26 Jun 2013 14:35:03 +1000 Andrew Beekhof wrote:
> Urgh:
>
> info Jun 25 13:40:10 lrmd_ipc_connect(913):0: Connecting to lrmd
> trace Jun 25 13:40:10 pick_ipc_buffer(670):0: Using max message size of 51200
> error Jun 25 13:40:10 qb_sys_mmap_file_open(92):2147483648: couldn't open file /dev/shm/qb-lrmd-request-25209-25212-6-header: Permission denied (13)
>
> useless :-( :-(
>
> trace Jun 25 13:40:10 gio_read_socket(366):0: 0xa6c140.4 1 (ref=1)
> trace Jun 25 13:40:10 lrmd_ipc_accept(89):0: Connection 0xa6d110
> info Jun 25 13:40:10 crm_client_new(276):0: Connecting 0xa6d110 for uid=17 gid=0 pid=25212 id=d771e06b-47e7-43a6-a447-63343870396e
> debug Jun 25 13:40:10 handle_new_connection(735):2147483648: IPC credentials authenticated (25209-25212-6)
> debug Jun 25 13:40:10 qb_ipcs_shm_connect(282):2147483648: connecting to client [25212]
>
> Slightly more helpful...
>
> What group(s) does uid=17 have?

# id hacluster
uid=17(hacluster) gid=60(haclient) groups=60(haclient)
# ps -u hacluster
  PID TTY      TIME     CMD
 2335 ?        00:00:00 cib
 2339 ?        00:00:00 attrd
 2340 ?        00:00:00 pengine
 2355 ?        00:00:00 crmd
# grep -E '^(Uid|Gid|Groups)' /proc/2355/status
Uid:    17      17      17      17
Gid:    0       0       0       0
Groups:

So, crmd runs with the 'hacluster' uid, but no associated gid/groups. Can this be the problem?

Jacek
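The `/proc/<pid>/status` check used above works for any Linux process; as a sketch, the same three fields can be read for the current shell itself:

```shell
# Inspect the credentials of the current process, the same way the
# thread inspects crmd via /proc/<pid>/status. An empty "Groups:" line
# (as seen for crmd above) means the process has no supplementary
# groups, e.g. because initgroups()/setgroups() was never called.
grep -E '^(Uid|Gid|Groups)' /proc/self/status
```

For a daemon, substitute its PID for `self`, as done with `/proc/2355/status` in the thread.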
Re: [Pacemaker] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file
On Tue, 25 Jun 2013 20:24:00 +1000 Andrew Beekhof wrote:
> On 25/06/2013, at 5:56 PM, Jacek Konieczny wrote:
> > On Tue, 25 Jun 2013 10:50:14 +0300 Vladislav Bogdanov wrote:
> >> I would recommend qb 1.4.4. 1.4.3 had at least one nasty bug which affects pacemaker.
> >
> > Just tried that. It didn't help.
>
> Can you turn on the blackbox please?

Sure.

> Details at http://blog.clusterlabs.org/blog/2013/pacemaker-logging/
>
> That should produce a mountain of logs when the error occurs.

I have sent the logs to Andrew only, not to pollute the mailing list (I'm not even sure the list accepts MBs of attachments). Myself, I was not able to find anything suspicious in the logs.

Jacek
Re: [Pacemaker] [corosync] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file
On Tue, 25 Jun 2013 10:50:14 +0300 Vladislav Bogdanov wrote:
> I would recommend qb 1.4.4. 1.4.3 had at least one nasty bug which affects pacemaker.

Just tried that. It didn't help.

Jacek
Re: [Pacemaker] [corosync] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file
On Tue, 25 Jun 2013 08:59:19 +0200 Jacek Konieczny wrote:
> On Tue, 25 Jun 2013 16:43:54 +1000 Andrew Beekhof wrote:
> >
> > Ok, I was just checking Pacemaker was built for the running version of libqb.
>
> Yes it was. corosync 2.2.0 and libqb 0.14.0 both on the build system and on the cluster systems.
>
> Hmm… I forgot libqb is a separate package… I guess I should try upgrading libqb now…

I have upgraded libqb to 0.14.4 and rebuilt both corosync and pacemaker with it. No change:

Jun 25 09:52:32 dev1n2 crmd[22714]: error: qb_sys_mmap_file_open: couldn't open file /dev/shm/qb-lrmd-request-22711-22714-5-header: Permission denied (13)
Jun 25 09:52:32 dev1n2 crmd[22714]: error: qb_sys_mmap_file_open: couldn't open file /var/run/qb-lrmd-request-22711-22714-5-header: No such file or directory (2)
Jun 25 09:52:32 dev1n2 crmd[22714]: error: qb_rb_open: couldn't create file for mmap
Jun 25 09:52:32 dev1n2 crmd[22714]: error: qb_ipcc_shm_connect: qb_rb_open:REQUEST: No such file or directory (2)
Jun 25 09:52:32 dev1n2 crmd[22714]: error: qb_ipcc_shm_connect: connection failed: No such file or directory (2)
Jun 25 09:52:32 dev1n2 crmd[22714]: warning: do_lrm_control: Failed to sign on to the LRM 11 (30 max) times

Greets,
Jacek
Re: [Pacemaker] [corosync] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file
On Tue, 25 Jun 2013 16:43:54 +1000 Andrew Beekhof wrote:
>
> Ok, I was just checking Pacemaker was built for the running version of libqb.

Yes it was. corosync 2.2.0 and libqb 0.14.0 both on the build system and on the cluster systems.

Hmm… I forgot libqb is a separate package… I guess I should try upgrading libqb now…

> What is the permissions on /dev/shm/ itself?

[root@dev1n2 ~]# ls -ld /dev/shm
drwxrwxrwt 2 root root 800 Jun 24 13:31 /dev/shm

Jacek
Re: [Pacemaker] [corosync] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file
On Tue, 25 Jun 2013 10:10:13 +1000 Andrew Beekhof wrote:
> On 24/06/2013, at 9:31 PM, Jacek Konieczny wrote:
> >
> > After I have upgraded Pacemaker from 1.1.8 to 1.1.9 on a node I get the following errors in my syslog and Pacemaker doesn't seem to be able to start services on this node.
>
> What else did you upgrade? libqb too?

Only the Pacemaker.

> > Any ideas what is going wrong here?
> >
> > crmd is running with uid 17 ('hacluster'). I have tried to add it to the 'uidgid' section of corosync conf or set uidgid.uid.17 with corosync-cmapctl, but it didn't help.
>
> Which distro is this?

PLD-Linux. And I am the packager of Corosync and Pacemaker in PLD-Linux, so you can assume it is a custom build from sources.

Jacek
[Pacemaker] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file
After I upgraded Pacemaker from 1.1.8 to 1.1.9 on a node, I get the following errors in my syslog and Pacemaker doesn't seem to be able to start services on this node:

Jun 24 13:19:44 dev1n2 crmd[5994]: error: qb_sys_mmap_file_open: couldn't open file /dev/shm/qb-lrmd-request-5991-5994-5-header: Permission denied (13)
Jun 24 13:19:44 dev1n2 crmd[5994]: error: qb_sys_mmap_file_open: couldn't open file /var/run/qb-lrmd-request-5991-5994-5-header: No such file or directory (2)
Jun 24 13:19:44 dev1n2 crmd[5994]: error: qb_rb_open: couldn't create file for mmap
Jun 24 13:19:44 dev1n2 crmd[5994]: error: qb_ipcc_shm_connect: qb_rb_open:REQUEST: No such file or directory (2)
Jun 24 13:19:44 dev1n2 crmd[5994]: error: qb_ipcc_shm_connect: connection failed: No such file or directory (2)
Jun 24 13:19:44 dev1n2 crmd[5994]: warning: do_lrm_control: Failed to sign on to the LRM 18 (30 max) times

I have googled for such messages and found nothing relevant. Any ideas what is going wrong here?

crmd is running with uid 17 ('hacluster'). I have tried to add it to the 'uidgid' section of corosync conf or set uidgid.uid.17 with corosync-cmapctl, but it didn't help.
Also:

# ls -l /dev/shm/qb-lrmd*
ls: cannot access /dev/shm/qb-lrmd*: No such file or directory

While on a working, Pacemaker 1.1.8 node:

# ls -l /dev/shm/qb-lrmd*
-rw------- 1 hacluster root 20480 Jun 21 10:36 /dev/shm/qb-lrmd-event-1182-1185-6-data
-rw------- 1 hacluster root  8248 Jun 21 10:36 /dev/shm/qb-lrmd-event-1182-1185-6-header
-rw------- 1 hacluster root 20480 Jun 21 10:36 /dev/shm/qb-lrmd-request-1182-1185-6-data
-rw------- 1 hacluster root  8252 Jun 21 10:36 /dev/shm/qb-lrmd-request-1182-1185-6-header
-rw------- 1 hacluster root 20480 Jun 21 10:36 /dev/shm/qb-lrmd-response-1182-1185-6-data
-rw------- 1 hacluster root  8248 Jun 21 10:36 /dev/shm/qb-lrmd-response-1182-1185-6-header

Greets,
Jacek
Re: [Pacemaker] Two resource nodes + one quorum node
On Thu, 13 Jun 2013 15:50:26 +0400 Andrey Groshev wrote:
> 11.06.2013, 22:52, "Michael Schwartzkopff":
> > On Tuesday, 11 June 2013, 22:33:32 Andrey Groshev wrote:
> > > Hi,
> > > I want to make a Postgres cluster.
> > > As far as I understand, for the proper functioning of the cluster one must use a quorum (i.e., at least three nodes).
> >
> > No. Two nodes are enough. See: no-quorum-policy="ignore".
>
> Very big thanks, but I know it. :)

And what is wrong with the 2-node quorum provided by the corosync 'two node' mode? AFAIK it is better than no-quorum-policy="ignore", as it prevents a 'fence loop': a node won't fence the other node or do anything dangerous just after booting up after being fenced, because it cannot get quorum without the other; but when the cluster is booted properly with both nodes, either of the two can fail and the cluster will continue in degraded mode.

I agree that using three nodes when two would be ok is overkill. When we sell a 'high availability' solution, the customer expects two devices: one 'normal' and one 'backup'. It would be hard to sell them a third machine (the same high-end server as the other two? or some 'stupid little box' just for quorum-keeping?) that 'does nothing'. And it is not only the purchase cost, but also power and rack space.

Even in a three-node cluster there are still many things that may go wrong (with very little probability), so even if it is slightly more reliable, it is probably not worth it. There is no fail-proof solution to any IT problem. If a 2-node cluster gracefully handles 95% of possible failures and a 3-node cluster would handle 98%, is it worth installing the third machine?

Greets,
Jacek
Re: [Pacemaker] issues when installing on pxe booted environment
On Fri, 29 Mar 2013 11:37:37 +1100 Andrew Beekhof wrote:
> On Thu, Mar 28, 2013 at 10:43 PM, Rainer Brestan wrote:
> > Hi John,
> > to get Corosync/Pacemaker running during anaconda installation, i have created a configuration RPM package which does a few actions before starting Corosync and Pacemaker.
> >
> > An excerpt of the post install of this RPM:
> >
> > # mount /dev/shm if not already existing, otherwise openais cannot work
> > if [ ! -d /dev/shm ]; then
> >     mkdir /dev/shm
> >     mount /dev/shm
> > fi
>
> Perhaps mention this to the corosync guys, it should probably go into their init script.

I don't think so. It is just a part of the modern Linux system environment. corosync is not supposed to mount the root filesystem or /proc – mounting /dev/shm is not its responsibility either.

BTW, the excerpt above assumes there is a /dev/shm entry in /etc/fstab. Should this be added there by the corosync init script too?

Greets,
Jacek
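A testable restatement of that %post excerpt; the `ensure_mountpoint` function and its directory parameter are illustrative only, since the original RPM hardcodes /dev/shm and relies on an /etc/fstab entry for the actual mount:

```shell
# Sketch of the package's %post logic: create the mount point only when
# it is missing, then let the caller perform the mount. Parameterized
# here so the logic can be exercised on a scratch directory.
ensure_mountpoint() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        mkdir -p "$dir"   # caller would then run: mount "$dir"
    fi
}
```

As discussed above, on a modern system this belongs to early boot setup rather than to the corosync init script.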
Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster
On Mon, 25 Mar 2013 20:01:28 +0100 "Angel L. Mateo" wrote:
> > quorum {
> >     provider: corosync_votequorum
> >     expected_votes: 2
> >     two_node: 1
> > }
> >
> > Corosync will then manage quorum for the two-node cluster and Pacemaker
>
> I'm using corosync 1.1, which is the one provided with my distribution (ubuntu 12.04). I could also use cman.

I don't think corosync 1.1 can do that, but I guess in this case cman should be able to provide this functionality.

> > can use that. You still need proper fencing to enforce the quorum (both for pacemaker and the storage layer – dlm in case you use clvmd), but no extra quorum node is needed.
>
> I have configured a dlm resource used with clvm.
>
> One doubt... With this configuration, how is the split brain problem handled?

The first node to notice that the other is unreachable will fence (kill) the other, making sure it is the only one operating on the shared data. Even though it is only half of the nodes, the cluster is considered quorate, as the other node is known not to be running any cluster resources.

When the fenced node reboots, its cluster stack starts, but with no quorum until it communicates with the surviving node again. So no cluster services are started there until both nodes communicate properly and proper quorum is recovered.

Greets,
Jacek
Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster
On Mon, 25 Mar 2013 13:54:22 +0100:
> My problem is how to avoid a split brain situation with this configuration, without configuring a 3rd node. I have read about quorum disks, the external/sbd stonith plugin and other references, but I'm too confused with all this.
>
> For example, [1] mentions techniques to improve quorum with scsi reserve or a quorum daemon, but it didn't point to how to do this in pacemaker. Or [2] talks about external/sbd.
>
> Any help?

With corosync 2.2 (2.1 too, I guess) you can use, in corosync.conf:

quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}

Corosync will then manage quorum for the two-node cluster and Pacemaker can use that. You still need proper fencing to enforce the quorum (both for pacemaker and the storage layer – dlm in case you use clvmd), but no extra quorum node is needed.

There is one more thing, though: you need two nodes active to boot the cluster, but then when one fails (and is fenced) the other may continue, keeping quorum.

Greets,
Jacek
Re: [Pacemaker] Does LVM resouce agent conflict with UDEV rules?
On Wed, 06 Mar 2013 22:41:51 +0100 Sven Arnold wrote:
> In fact, disabling the udev rule
>
> SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="lvm*|LVM*",\
>     RUN+="watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange -a y'"
>
> seems to resolve the problem for me.

This rule looks like asking for problems in a clustered LVM environment. It activates all volumes, non-exclusively, as soon as the PV becomes available; it leaves no place for any management by the cluster software. I don't think this came from LVM upstream; rather it looks like some Ubuntu invention.

Removing this rule and activating the volumes in a controlled manner (e.g. via the LVM resource agent) seems the right way.

Greets,
Jacek
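For illustration, controlled activation via the LVM resource agent might look like this in crmsh syntax; the resource and volume group names and the timings are hypothetical, not taken from the thread:

```
# Sketch: let the cluster, not udev, activate the volume group.
# 'exclusive=true' asks the agent for exclusive activation, so only
# one node holds the VG active at a time.
primitive vg_shared ocf:heartbeat:LVM \
    params volgrpname="vg_shared" exclusive="true" \
    op monitor interval="30s" timeout="30s"
```

With such a resource defined, the udev auto-activation rule quoted above would fight the cluster manager, which is exactly the conflict described.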
[Pacemaker] Accessing CIB by user not 'root' and not 'hacluster'
Hi,

It used to be possible to access Pacemaker's CIB from any user in the 'haclient' group, but after one of the upgrades it stopped working (I didn't care much about this issue back then, so I cannot recall the exact point). Now I would like to restore the cluster-state overview functionality in the UI of my system, so I would like to fix it. Currently I use Pacemaker 1.1.8 and Corosync 2.2.0.

The problem is:

$ id
uid=993(sipgwui) gid=993(sipgwui) groups=993(sipgwui),60(haclient),109(lighttpd)
$ cibadmin -Q
Could not establish cib_rw connection: Permission denied (13)
Signon to CIB failed: Transport endpoint is not connected
Init failed, could not perform requested operations

Strace shows this fails on:

open("/dev/shm/qb-cib_rw-control-12542-19960-19", O_RDWR) = -1 EACCES (Permission denied)

and:

$ ls -l /dev/shm/qb-cib_rw-control-12542-19960-19
-rw------- 1 hacluster root 24 Jan 25 10:31 /dev/shm/qb-cib_rw-control-12542-19960-19

I have googled around and found that the qb_ipcs_connection_auth_set() function could be used to set the permissions right on the SHM file. I found the right call in the Pacemaker sources (cib/callbacks.c), enclosed in an '#if ENABLE_ACL' clause. My build was not compiled with ACL support, so I have re-built it with ACL on. Now the behaviour is the same, with one exception:

$ ls -l /dev/shm/qb-cib_rw-control-1488-5008-17
-rw-rw---- 1 hacluster root 24 Jan 25 10:19 /dev/shm/qb-cib_rw-control-1488-5008-17

The file is now group-accessible, but the group is still 'root' and not 'haclient', although confdefs.h contained:

#define CRM_DAEMON_GROUP "haclient"

The docs at http://clusterlabs.org/doc/acls.html state:

> The various tools for administering Pacemaker clusters (crm_mon, crm shell, cibadmin and friends, Python GUI, Hawk) can be used by the root user, or any user in the haclient group. By default, these users have full read/write access.

This clearly is not the case. Any ideas?
Greets,
Jacek
Re: [Pacemaker] CIB verification failure with any change via crmsh
On Thu, 24 Jan 2013 09:04:14 +0100 Jacek Konieczny wrote:
> I should probably upgrade my CIB somehow

Indeed. 'cibadmin --upgrade --force' solved my problem.

Thanks for all the hints.

Greets,
Jacek
Re: [Pacemaker] CIB verification failure with any change via crmsh
Hi,

On Wed, 23 Jan 2013 18:52:20 +0100 Dejan Muhamedagic wrote:
> > > Not sure if id can start with a digit.

Corosync node id's are always digits-only.

> This should really work with versions >= v1.2.4

Yeah… I have looked into the crmsh code and it has explicit support for the node 'type' attribute in Pacemaker 1.1.8. For some reason this does not work for me on this cluster (no such problems on another cluster, which was not upgraded but set up on Pacemaker 1.1 from the beginning).

> Which schema do you validate against? Look for the validate-with attribute of the cib element.

validate-with="pacemaker-1.0" normal member ping

So no, it is not optional here. But it is optional in the pacemaker-1.1 schema. So the problem is crmsh uses the wrong schema for the XML it generates…

# cibadmin -Q | grep validate-with

So, the 'validate-with="pacemaker-1.0"' comes from the current CIB. crmsh keeps that, but generates Pacemaker 1.1 XML, so the verification fails. I should probably upgrade my CIB somehow, but still it seems there is a bug in crmsh.

Greets,
Jacek
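A minimal sketch of the `cibadmin -Q | grep validate-with` check above, with a hypothetical sample line standing in for live cibadmin output:

```shell
# Hypothetical fragment in the shape of the cib element printed by
# 'cibadmin -Q'; a live check would pipe real cibadmin output here.
cib_sample='<cib validate-with="pacemaker-1.0" admin_epoch="0">'

# Extract the schema the CIB claims to be validated against.
schema=$(printf '%s\n' "$cib_sample" \
    | sed -n 's/.*validate-with="\([^"]*\)".*/\1/p')
echo "$schema"
```

If this prints an old schema such as pacemaker-1.0, the mismatch described in this thread can occur; the fix used later in the thread was `cibadmin --upgrade --force`.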
Re: [Pacemaker] CIB verification failure with any change via crmsh
On Wed, 23 Jan 2013 16:44:45 +0100 Lars Marowsky-Bree wrote:
> On 2013-01-23T16:31:20, Jacek Konieczny wrote:
> > I have recently upgraded Pacemaker on one of my clusters from 1.0.something to 1.1.8 and installed crmsh to manage it as I used to.
>
> It'd be helpful if you mentioned which crmsh version you installed. The errors you get suggest you need to update it.

You are right, I missed that information. It was crmsh 1.2.1, and the first thing I tried was an upgrade to 1.2.4, but this did not change a thing. So it is the same with crmsh 1.2.1 and crmsh 1.2.4.

Greets,
Jacek
[Pacemaker] CIB verification failure with any change via crmsh
Hi,

I have recently upgraded Pacemaker on one of my clusters from 1.0.something to 1.1.8 and installed crmsh to manage it as I used to. crmsh mostly works for me, until I try to change the configuration with 'crm configure'. Any, even trivial, change shows verification errors and fails to commit:

> crm(live)configure# commit
> element instance_attributes: Relax-NG validity error : Expecting an element nvpair, got nothing
> element node: Relax-NG validity error : Expecting an element instance_attributes, got nothing
> element node: Relax-NG validity error : Element nodes has extra content: node
> element configuration: Relax-NG validity error : Invalid sequence in interleave
> element instance_attributes: Relax-NG validity error : Element node failed to validate attributes
> element cib: Relax-NG validity error : Element cib failed to validate content
> error: main: CIB did not pass DTD/schema validation
> Errors found during check: config not valid
> -V may provide more details
> Do you still want to commit?
no

It seems crmsh fails to parse the current configuration properly, as:

crm configure save xml /tmp/saved.xml ; crm_verify -V --xml-file /tmp/saved.xml

fails the same way:

> /tmp/saved.xml:19: element instance_attributes: Relax-NG validity error : Expecting an element nvpair, got nothing
> /tmp/saved.xml:18: element node: Relax-NG validity error : Expecting an element instance_attributes, got nothing
> /tmp/saved.xml:18: element node: Relax-NG validity error : Element nodes has extra content: node
> /tmp/saved.xml:3: element configuration: Relax-NG validity error : Invalid sequence in interleave
> /tmp/saved.xml:19: element instance_attributes: Relax-NG validity error : Element node failed to validate attributes
> /tmp/saved.xml:2: element cib: Relax-NG validity error : Element cib failed to validate content
> error: main: CIB did not pass DTD/schema validation
> Errors found during check: config not valid
> -V may provide more details

But:

cibadmin -Q > /tmp/good.xml ; crm_verify --xml-file /tmp/good.xml

shows no error. Any ideas? Looking into the 'invalid' XML file gives me no hints, as line 18 is the first of the node definitions, which look quite right to me.

Oh… now I see the difference from the current CIB. The node elements miss the type="normal" attribute. After adding those to the crmsh-generated XML, everything works. Then it is a crmsh bug, right? And the errors reported by crm_verify are very misleading.

Greets,
Jacek
Re: [Pacemaker] stonithd crash on exit
On Thu, Nov 01, 2012 at 11:05:04AM +1100, Andrew Beekhof wrote:
> On Thu, Nov 1, 2012 at 7:40 AM, Jacek Konieczny wrote:
> > On Wed, Oct 31, 2012 at 05:33:03PM +1100, Andrew Beekhof wrote:
> >> I havent seen that before. What version?
> >
> > Pacemaker 1.1.8, corosync 2.1.0, cluster-glue 1.0.11
>
> I think you want these two patches:
>
> https://github.com/beekhof/pacemaker/commit/7282066
> https://github.com/beekhof/pacemaker/commit/280926a
>
> They came after the official release

These help, indeed. Thanks!
Re: [Pacemaker] stonithd crash on exit
On Wed, Oct 31, 2012 at 05:33:03PM +1100, Andrew Beekhof wrote:
> I havent seen that before. What version?

Pacemaker 1.1.8, corosync 2.1.0, cluster-glue 1.0.11

> On Wed, Oct 31, 2012 at 12:42 AM, Jacek Konieczny wrote:
> > Hello,
> >
> > Probably this is not a critical problem, but it became annoying during my cluster setup/testing time:
> >
> > Whenever I restart corosync with 'systemctl restart corosync.service' I get a message about stonithd crashing with SIGSEGV:
> >
> >> stonithd[3179]: segfault at 10 ip 00403144 sp 7fffe83d6370 error 4 in stonithd (deleted)[40+13000]
> >> stonithd/3179: potentially unexpected fatal signal 11.
> >
> > GDB shows this:
> >
> >> Program received signal SIGTERM, Terminated.
> >> 0x7fd6ec319c18 in poll () from /lib64/libc.so.6
> >> (gdb) signal SIGTERM
> >> Continuing with signal SIGTERM.
> >>
> >> Program received signal SIGSEGV, Segmentation fault.
> >> 0x00403144 in main (argc=<optimized out>, argv=0x7fff4648f318) at main.c:933
> >> 933 cluster.hb_conn->llc_ops->delete(cluster.hb_conn);
> >> (gdb) bt
> >> #0 0x00403144 in main (argc=<optimized out>, argv=0x7fff4648f318) at main.c:933
[Pacemaker] stonithd crash on exit
Hello,

Probably this is not a critical problem, but it became annoying during my cluster setup/testing time:

Whenever I restart corosync with 'systemctl restart corosync.service' I get a message about stonithd crashing with SIGSEGV:

> stonithd[3179]: segfault at 10 ip 00403144 sp 7fffe83d6370 error 4 in stonithd (deleted)[40+13000]
> stonithd/3179: potentially unexpected fatal signal 11.

GDB shows this:

> Program received signal SIGTERM, Terminated.
> 0x7fd6ec319c18 in poll () from /lib64/libc.so.6
> (gdb) signal SIGTERM
> Continuing with signal SIGTERM.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00403144 in main (argc=<optimized out>, argv=0x7fff4648f318) at main.c:933
> 933 cluster.hb_conn->llc_ops->delete(cluster.hb_conn);
> (gdb) bt
> #0 0x00403144 in main (argc=<optimized out>, argv=0x7fff4648f318) at main.c:933

Greets,
Jacek