Re: [Pacemaker] Upgrading from 1.0 to 1.1
29 July 2011, 04:34, from Andrew Beekhof:

> On Wed, Jul 27, 2011 at 5:02 PM, Proskurin Kirill <k.prosku...@corp.mail.ru> wrote:
>> On 27.07.2011 5:56, Andrew Beekhof wrote:
>>> On Tue, Jul 19, 2011 at 5:40 PM, Proskurin Kirill <k.prosku...@corp.mail.ru> wrote:
>>>> On 07/19/2011 03:22 AM, Andrew Beekhof wrote:
>>>>> On Fri, Jul 15, 2011 at 10:33 PM, Proskurin Kirill <k.prosku...@corp.mail.ru> wrote:
>>>>>> Hello all.
>>>>>>
>>>>>> I found that I am running corosync with pacemaker "ver: 0" even though
>>>>>> pacemaker 1.1.5 is installed - i.e. without starting pacemakerd.
>>>>>>
>>>>>> Sounds wrong. :-)
>>>>>> So I tried to upgrade.
>>>>>> I shut down one node, changed 0 to 1 in service.d/pcmk,
>>>>>> started corosync and then started pacemakerd via the init script.
>>>>>>
>>>>>> But this node stays online, and on the cluster's DC I see:
>>>>>> cib: [18392]: WARN: cib_peer_callback: Discarding cib_sync_one message (255)
>>>>>> from mysender10.example.com: not in our membership
>>>>>
>>>>> That's odd. The only thing you changed was ver: 0 to ver: 1?
>>>>
>>>> Yes, only this. To make it more clear - I have 4 nodes with ver 0, tried to
>>>> add one with ver 1, and got this.
>>>>
>>>> Well, I shut down all nodes, changed them all to 1 and started them up, and
>>>> all was OK. Not a really good way to upgrade, but I don't have time.
>>>
>>> Do you still have the logs for the failure case?
>>> I'd really like to see them.
>>
>> No, I don't. But some time ago I got the same error in the vice-versa
>> situation - when I tried to add a node with "ver: 0" to a cluster where all
>> nodes are "ver: 1".
>>
>> Anyway, my cluster is down now, so I can do some tests. I will send logs to
>> the mailing list if I reproduce this situation again.
>
> excellent

I already tried it with a 3-node test cluster. I tried 2 nodes with ver: 1 and
one with ver: 0 - the ver: 0 node stays offline. And I tried it vice-versa -
2 with ver: 0 and 1 with ver: 1 - the ver: 1 node stays offline.
I posted this on the #linux-cluster IRC channel and checked it with danfrincu.

Hm, OK, I have some time before production starts, so I will run the tests
again and send you the logs.

--
Best regards,
Proskurin Kirill
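For readers following the thread: the "ver" setting being changed here lives in the corosync service stanza that loads the Pacemaker plugin. A minimal sketch of that file, assuming the stock /etc/corosync/service.d/pcmk layout (the path and exact contents vary by distribution):

  # /etc/corosync/service.d/pcmk  (illustrative)
  service {
      # Load the Pacemaker cluster resource manager plugin
      name: pacemaker

      # ver: 0 - corosync itself spawns the Pacemaker daemons (plugin mode)
      # ver: 1 - corosync only loads the plugin; pacemakerd must be started
      #          separately, e.g. via its init script
      ver:  1
  }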
Re: [Pacemaker] Upgrading from 1.0 to 1.1
On Wed, Jul 27, 2011 at 5:02 PM, Proskurin Kirill wrote:
> On 27.07.2011 5:56, Andrew Beekhof wrote:
>> On Tue, Jul 19, 2011 at 5:40 PM, Proskurin Kirill wrote:
>>> On 07/19/2011 03:22 AM, Andrew Beekhof wrote:
>>>> On Fri, Jul 15, 2011 at 10:33 PM, Proskurin Kirill wrote:
>>>>> Hello all.
>>>>>
>>>>> I found that I am running corosync with pacemaker "ver: 0" even though
>>>>> pacemaker 1.1.5 is installed - i.e. without starting pacemakerd.
>>>>>
>>>>> Sounds wrong. :-)
>>>>> So I tried to upgrade.
>>>>> I shut down one node, changed 0 to 1 in service.d/pcmk,
>>>>> started corosync and then started pacemakerd via the init script.
>>>>>
>>>>> But this node stays online, and on the cluster's DC I see:
>>>>> cib: [18392]: WARN: cib_peer_callback: Discarding cib_sync_one message (255)
>>>>> from mysender10.example.com: not in our membership
>>>>
>>>> That's odd. The only thing you changed was ver: 0 to ver: 1?
>>>
>>> Yes, only this. To make it more clear - I have 4 nodes with ver 0, tried to
>>> add one with ver 1, and got this.
>>>
>>> Well, I shut down all nodes, changed them all to 1 and started them up, and
>>> all was OK. Not a really good way to upgrade, but I don't have time.
>>
>> Do you still have the logs for the failure case?
>> I'd really like to see them.
>
> No, I don't. But some time ago I got the same error in the vice-versa
> situation - when I tried to add a node with "ver: 0" to a cluster where all
> nodes are "ver: 1".
>
> Anyway, my cluster is down now, so I can do some tests. I will send logs to
> the mailing list if I reproduce this situation again.

excellent
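The rolling change being attempted above amounts to the following per-node sequence. This is only a sketch; the init/service script names are assumptions and differ between distributions:

  # Illustrative per-node steps for switching from ver: 0 to ver: 1
  /etc/init.d/corosync stop          # take the node out of the cluster
  vi /etc/corosync/service.d/pcmk    # change "ver: 0" to "ver: 1"
  /etc/init.d/corosync start         # corosync no longer spawns Pacemaker
  /etc/init.d/pacemaker start        # start pacemakerd explicitly
  crm_mon -1                         # check whether the node rejoined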
Re: [Pacemaker] Resources are not restarted on definition change after f59d7460bdde (devel)
On Wed, Jul 27, 2011 at 6:12 PM, Florian Haas wrote:
> On 2011-07-27 03:46, Andrew Beekhof wrote:
>> On Fri, Jul 1, 2011 at 4:59 PM, Andrew Beekhof wrote:
>>> Hmm. Interesting. I will investigate.
>>
>> This is an unfortunate side-effect of my history compression patch.
>>
>> Since we only store the last successful and last failed operation, we
>> don't have the md5 of the start operation around to check when a
>> resource's definition is changed.
>>
>> Solutions appear to be either:
>> a) give up the space savings and revert the history compression patch
>> b) always restart a resource if a non-matching md5 is detected - even
>>    if the operation was a recurring monitor
>>
>> I'd favor b) along with dropping the per-operation parameters.
>> The only valid use-case I've heard for those is setting OCF_LEVEL or
>> depth or whatever it was called - and I think we're in basic agreement
>> that we need a better solution for that anyway.
>
> We are, and you know my opinion that OCF_CHECK_LEVEL is hideous
> (although lmb, for one, seems to disagree).

His disagreement (IIUC) is with changing it in the RA API - not with how it's
exposed to users in the config.

> But dropping it now does clearly count as a regression, and I'd really hate
> to see that happen unless
>
> a) there is a replacement method for tuning the thoroughness of checking
>    the resource state during monitor, _and_

That would be the part about promoting it to the 'op' level, albeit with a
different name (it would still appear as OCF_CHECK_LEVEL in the RA's
environment).

> b) there is an automated or semi-automated ("cibadmin --upgrade"?) means
>    of transitioning off OCF_CHECK_LEVEL and replacing it with its successor
>    feature.

That's what our XSLTs do.

In any case, dropping the per-op params is not a prerequisite for the
"restart the resource on any op change" part. That OCF_CHECK_LEVEL is the
only param that is normally set this way just makes option b) palatable.
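For context, OCF_CHECK_LEVEL is the per-operation parameter under discussion: it tells a resource agent how thorough a recurring monitor should be. A sketch of how it is typically set in the crm shell today, using a hypothetical Filesystem resource (names, devices and intervals are made up for illustration; for ocf:heartbeat:Filesystem, level 10 usually means a read test and 20 a read/write test):

  primitive exampleFS ocf:heartbeat:Filesystem \
      params device="/dev/vg0/data" directory="/mnt/data" fstype="ext3" \
      op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="10" \
      op monitor interval="120s" timeout="60s" OCF_CHECK_LEVEL="20"

Under option b) above, an edit to such an operation definition would trigger a restart of the resource rather than being applied only against the recurring monitor's digest.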
Re: [Pacemaker] Not seeing VIF/VIP on pacemaker system
Lars,

Thank you, that was it.

Len

On Thu, Jul 28, 2011 at 3:19 PM, Lars Ellenberg wrote:
> On Thu, Jul 28, 2011 at 02:09:46PM -0400, Leonard Smith wrote:
>> I have a very simple cluster configuration where I have a virtual IP
>> that is shared between two hosts. It is working fine, except that I
>> cannot go to the hosts, issue an ifconfig command, and see a virtual
>> IP address or any indication that the IP address is bound to the host.
>>
>> I would expect to see a VIF, or at least that the IP address is bound
>> to the eth0 interface.
>>
>> CentOS 5.6
>> pacemaker-1.0.11-1.2.el5
>> pacemaker-libs-1.0.11-1.2.el5
>>
>> node $id="xx" bos-vs002.foo.bar
>> node $id="xx" bos-vs001.foo.bar
>>
>> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>>     params ip="10.1.0.22" cidr_netmask="255.255.252.0" nic="eth0" \
>>     op monitor interval="10s"
>>
>> property $id="cib-bootstrap-options" \
>>     dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
>>     cluster-infrastructure="Heartbeat" \
>>     stonith-enabled="false" \
>>     no-quorum-policy="ignore" \
>>     default-resource-stickiness="1000"
>>
>> [root@bos-vs001 ~]# ifconfig -a
>> eth0      Link encap:Ethernet  HWaddr 00:16:36:41:D3:6D
>>           inet addr:10.1.1.1  Bcast:10.1.3.255  Mask:255.255.252.0
>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>           RX packets:454721 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:90795 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:257195727 (245.2 MiB)  TX bytes:160400169 (152.9 MiB)
>>
>> lo        Link encap:Local Loopback
>>           inet addr:127.0.0.1  Mask:255.0.0.0
>>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>           RX packets:146 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:146 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:0
>>           RX bytes:13592 (13.2 KiB)  TX bytes:13592 (13.2 KiB)
>
> IPaddr != IPaddr2,
> ifconfig != ip (from the iproute package)
>
> # this will list the addresses:
> ip addr show
> # also try:
> ip -o -f inet a s
> man ip
>
> If you want/need ifconfig to see those aliases as well, you need to
> label them, i.e. add the parameter iflabel to your primitive.
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Re: [Pacemaker] Not seeing VIF/VIP on pacemaker system
On Thu, Jul 28, 2011 at 02:09:46PM -0400, Leonard Smith wrote:
> I have a very simple cluster configuration where I have a virtual IP
> that is shared between two hosts. It is working fine, except that I
> cannot go to the hosts, issue an ifconfig command, and see a virtual
> IP address or any indication that the IP address is bound to the host.
>
> I would expect to see a VIF, or at least that the IP address is bound
> to the eth0 interface.
>
> CentOS 5.6
> pacemaker-1.0.11-1.2.el5
> pacemaker-libs-1.0.11-1.2.el5
>
> node $id="xx" bos-vs002.foo.bar
> node $id="xx" bos-vs001.foo.bar
>
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>     params ip="10.1.0.22" cidr_netmask="255.255.252.0" nic="eth0" \
>     op monitor interval="10s"
>
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
>     cluster-infrastructure="Heartbeat" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore" \
>     default-resource-stickiness="1000"
>
> [root@bos-vs001 ~]# ifconfig -a
> eth0      Link encap:Ethernet  HWaddr 00:16:36:41:D3:6D
>           inet addr:10.1.1.1  Bcast:10.1.3.255  Mask:255.255.252.0
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:454721 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:90795 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:257195727 (245.2 MiB)  TX bytes:160400169 (152.9 MiB)
>
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>           RX packets:146 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:146 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:13592 (13.2 KiB)  TX bytes:13592 (13.2 KiB)

IPaddr != IPaddr2,
ifconfig != ip (from the iproute package)

# this will list the addresses:
ip addr show
# also try:
ip -o -f inet a s
man ip

If you want/need ifconfig to see those aliases as well, you need to
label them, i.e. add the parameter iflabel to your primitive.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
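To make the suggestion concrete: based on the configuration posted above, the only change needed would be an iflabel parameter on the IPaddr2 primitive (the label name itself is arbitrary); a sketch:

  primitive ClusterIP ocf:heartbeat:IPaddr2 \
      params ip="10.1.0.22" cidr_netmask="255.255.252.0" nic="eth0" iflabel="VIP" \
      op monitor interval="10s"

With the label in place the address shows up in ifconfig output as an eth0:VIP alias; without it, "ip addr show eth0" (or "ip -o -f inet a s") remains the way to see it.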
[Pacemaker] Not seeing VIF/VIP on pacemaker system
I have a very simple cluster configuration where I have a virtual IP that is
shared between two hosts. It is working fine, except that I cannot go to the
hosts, issue an ifconfig command, and see a virtual IP address or any
indication that the IP address is bound to the host.

I would expect to see a VIF, or at least that the IP address is bound to the
eth0 interface.

CentOS 5.6
pacemaker-1.0.11-1.2.el5
pacemaker-libs-1.0.11-1.2.el5

node $id="xx" bos-vs002.foo.bar
node $id="xx" bos-vs001.foo.bar

primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip="10.1.0.22" cidr_netmask="255.255.252.0" nic="eth0" \
    op monitor interval="10s"

property $id="cib-bootstrap-options" \
    dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
    cluster-infrastructure="Heartbeat" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    default-resource-stickiness="1000"

[root@bos-vs001 ~]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:16:36:41:D3:6D
          inet addr:10.1.1.1  Bcast:10.1.3.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:454721 errors:0 dropped:0 overruns:0 frame:0
          TX packets:90795 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:257195727 (245.2 MiB)  TX bytes:160400169 (152.9 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:146 errors:0 dropped:0 overruns:0 frame:0
          TX packets:146 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:13592 (13.2 KiB)  TX bytes:13592 (13.2 KiB)
Re: [Pacemaker] ping RA question
Here's my fping RA, for anyone who's interested. Note that some of the
parameters are different from ping/pingd, since fping works differently.

The major advantages of fping over the system ping are that multiple hosts
can be pinged with a single fping command, and that fping returns as soon as
all hosts succeed (the Linux system ping will not return until it has
exhausted either its count or its timeout, regardless of success).

--
Dan Urist
dur...@ucar.edu
303-497-2459

[Attachment: fping - Description: Binary data]
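The behavioural difference Dan describes is easy to see from the command line. A rough illustration, using placeholder addresses (flag values are only examples):

  # fping probes all targets in parallel and exits once every target has
  # answered (or exhausted its retries); exit status 0 means all were alive.
  fping -q -c 1 -t 500 192.0.2.1 192.0.2.2 192.0.2.3
  echo "fping exit status: $?"

  # The stock Linux ping handles one target at a time and keeps sending until
  # its count or deadline is used up, even if the first reply already arrived.
  ping -c 3 -W 1 192.0.2.1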
Re: [Pacemaker] Resource Group Questions - Start/Stop Order
On Tue, Jul 26, 2011 at 9:52 PM, Andrew Beekhof wrote:
> On Thu, Jul 21, 2011 at 2:36 AM, Bobbie Lind wrote:
>> Hi group,
>>
>> I am running a 6-node system, 4 of which mount the LUNs for my Lustre file
>> system. I currently have 29 LUNs per server set up in 4 resource groups. I
>> understand the default startup/shutdown order of the resources, but I was
>> wondering if there is a way to override that and have all the resources in
>> a group start up or shut down at the same time. Ideally what I am looking
>> for is for all the resources in the group OSS1group to start up and shut
>> down at the same time, since none of them are dependent on each other; they
>> just belong on the same server.
>
> I'd suggest just not using a group in this case.
> If all you want is colocation, use a colocation set.

Thank you. That did exactly what I was looking for. Changing the groups to a
colocation and adding a Dummy resource to map to the specific servers fixed
the issue. I have attached an updated crm configuration for reference.

Bobbie Lind
Systems Engineer
Solutions Made Simple, Inc (SMSi)

node s02ns030.dsd.net \
    attributes standby="off"
node s02ns040
node s02ns050
node s02ns060
node s02ns070 \
    attributes standby="off"
node s02ns090 \
    attributes standby="off"
primitive resMDT ocf:heartbeat:Filesystem \
    operations $id="resMDT-operations" \
    op monitor interval="120" timeout="60" \
    op start interval="0" timeout="300" \
    op stop interval="0" timeout="300" \
    params device="/dev/mapper/dsdw_mdt_vg-dsdw_mdt_vol" directory="/lustre/dsdw-MDT" fstype="lustre" \
    meta target-role="Started"
primitive resMDTLVM ocf:heartbeat:LVM \
    params volgrpname="dsdw_mdt_vg"
primitive resOST ocf:heartbeat:Filesystem \
    operations $id="resOST-operations" \
    op monitor interval="120" timeout="60" \
    op start interval="0" timeout="300" \
    op stop interval="0" timeout="300" \
    params device="/dev/mapper/dsdw-OST" directory="/lustre/dsdw-OST" fstype="lustre"
primitive resOST0001 ocf:heartbeat:Filesystem \
    operations $id="resOST0001-operations" \
    op monitor interval="120" timeout="60" \
    op start interval="0" timeout="300" \
    op stop interval="0" timeout="300" \
    params device="/dev/mapper/dsdw-OST0001" directory="/lustre/dsdw-OST0001" fstype="lustre"
primitive resOST0002 ocf:heartbeat:Filesystem \
    operations $id="resOST0002-operations" \
    op monitor interval="120" timeout="60" \
    op start interval="0" timeout="300" \
    op stop interval="0" timeout="300" \
    params device="/dev/mapper/dsdw-OST0002" directory="/lustre/dsdw-OST0002" fstype="lustre"
primitive resOST0003 ocf:heartbeat:Filesystem \
    operations $id="resOST0003-operations" \
    op monitor interval="120" timeout="60" \
    op start interval="0" timeout="300" \
    op stop interval="0" timeout="300" \
    params device="/dev/mapper/dsdw-OST0003" directory="/lustre/dsdw-OST0003" fstype="lustre"
primitive resOST0004 ocf:heartbeat:Filesystem \
    operations $id="resOST0004-operations" \
    op monitor interval="120" timeout="60" \
    op start interval="0" timeout="300" \
    op stop interval="0" timeout="300" \
    params device="/dev/mapper/dsdw-OST0004" directory="/lustre/dsdw-OST0004" fstype="lustre"
primitive resOST0005 ocf:heartbeat:Filesystem \
    operations $id="resOST0005-operations" \
    op monitor interval="120" timeout="60" \
    op start interval="0" timeout="300" \
    op stop interval="0" timeout="300" \
    params device="/dev/mapper/dsdw-OST0005" directory="/lustre/dsdw-OST0005" fstype="lustre"
primitive resOST0006 ocf:heartbeat:Filesystem \
    operations $id="resOST0006-operations" \
    op monitor interval="120" timeout="60" \
    op start interval="0" timeout="300" \
    op stop interval="0" timeout="300" \
    params device="/dev/mapper/dsdw-OST0006" directory="/lustre/dsdw-OST0006" fstype="lustre"
primitive resOST0007 ocf:heartbeat:Filesystem \
    operations $id="resOST0007-operations" \
    op monitor interval="120" timeout="60" \
    op start interval="0" timeout="300" \
    op stop interval="0" timeout="300" \
    params device="/dev/mapper/dsdw-OST0007" directory="/lustre/dsdw-OST0007" fstype="lustre"
primitive resOST0008 ocf:heartbeat:Filesystem \
    operations $id="resOST0008-operations" \
    op monitor interval="120" timeout="60" \
    op start interval="0" timeout="300" \
    op stop interval="0" timeout="300" \
    params device="/dev/mapper/dsdw-OST0008" directory="/lustre/dsdw-OST0008" fstype="lustre"
primitive resOST0009 ocf:heartbeat:Filesystem \
    operations $id="resOST0009-operations" \
    op monitor interval="120" timeout="60" \
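For readers who want the shape of the change without reading the whole attached configuration: the pattern Bobbie describes amounts to pinning a Dummy resource to a server and colocating the Filesystem resources with it in a set, so they share a node but start and stop independently. A compressed, hypothetical sketch - resource and constraint names, node choice and scores are illustrative, not taken from the attachment:

  primitive dummyOSS1 ocf:pacemaker:Dummy
  location loc-OSS1 dummyOSS1 1000: s02ns040

  # Colocation set: the parenthesised resources are kept with dummyOSS1 but
  # carry no ordering or dependency among themselves (crm shell set syntax;
  # older shells may need one pairwise colocation per resource instead).
  colocation col-OSS1 inf: ( resOST0001 resOST0002 resOST0003 ) dummyOSS1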