Re: [Pacemaker] Upgrading from 1.0 to 1.1

2011-07-28 Thread Kiril Proskurin
29 July 2011, 04:34, from Andrew Beekhof:

On Wed, Jul 27, 2011 at 5:02 PM, Proskurin Kirill
<k.prosku...@corp.mail.ru> wrote:
> On 27.07.2011 5:56, Andrew Beekhof wrote:
>>
>> On Tue, Jul 19, 2011 at 5:40 PM, Proskurin Kirill
>> <k.prosku...@corp.mail.ru> wrote:
>>>
>>> On 07/19/2011 03:22 AM, Andrew Beekhof wrote:

 On Fri, Jul 15, 2011 at 10:33 PM, Proskurin Kirill
 <k.prosku...@corp.mail.ru> wrote:
>
> Hello all.
>
> I found that I am running corosync with pacemaker "ver: 0" while having
> pacemaker 1.1.5 installed - i.e. without starting pacemakerd.
>
> Sounds wrong. :-)
> So I tried to upgrade.
> I shut down one node and changed 0 to 1 in service.d/pcmk,
> then started corosync and started pacemakerd via the init script.
>
> But this node stays online, and on the cluster's DC I see:
> cib: [18392]: WARN: cib_peer_callback: Discarding cib_sync_one message
> (255)
> from mysender10.example.com: not in our membership

 That's odd.  The only thing you changed was ver: 0 to ver: 1?
>>>
>>> Yes, only this. To make it more clear: I have 4 nodes with ver: 0, tried to
>>> add one with ver: 1, and got this.
>>>
>>> Well, I shut down all nodes, changed all to 1, started them up, and all was
>>> ok.
>>> Not a really good way to upgrade, but I didn't have time.
>>
>> Do you still have the logs for the failure case?
>> I'd really like to see them.
>
> No, I don't. But some time ago I got the same error in the vice-versa
> situation - when I tried to add a node with "ver: 0" to a cluster where all
> nodes are "ver: 1".
>
> Anyway, my cluster is down now, so I can do some tests. I will send the logs
> to the mailing list if I reproduce this situation again.

excellent

I already tried it with a 3-node test cluster.

I tried 2 nodes with ver: 1 and one with ver: 0 - the ver: 0 node stays offline.
And I tried it vice versa - 2 with ver: 0 and 1 with ver: 1 - the ver: 1 node
stays offline.
I posted this on the #linux-cluster IRC channel and checked it with danfrincu.
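
For reference, the stanza I am changing is the pacemaker service definition
under corosync's service.d - roughly like this (a sketch; comments and
whitespace may differ on your install):

service {
        # with ver: 1, corosync no longer spawns the pacemaker daemons
        # itself; pacemakerd must be started separately, e.g. via its
        # init script, once corosync is up
        name: pacemaker
        ver:  1
}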



Hm, ok - I have some time before production starts, so I will run the test
again and send you the logs.

-- 
Best regards,
Proskurin Kirill

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Upgrading from 1.0 to 1.1

2011-07-28 Thread Andrew Beekhof
On Wed, Jul 27, 2011 at 5:02 PM, Proskurin Kirill
 wrote:
> On 27.07.2011 5:56, Andrew Beekhof wrote:
>>
>> On Tue, Jul 19, 2011 at 5:40 PM, Proskurin Kirill
>>   wrote:
>>>
>>> On 07/19/2011 03:22 AM, Andrew Beekhof wrote:

 On Fri, Jul 15, 2011 at 10:33 PM, Proskurin Kirill
     wrote:
>
> Hello all.
>
> I found that I am running corosync with pacemaker "ver: 0" while having
> pacemaker 1.1.5 installed - i.e. without starting pacemakerd.
>
> Sounds wrong. :-)
> So I tried to upgrade.
> I shut down one node and changed 0 to 1 in service.d/pcmk,
> then started corosync and started pacemakerd via the init script.
>
> But this node stays online, and on the cluster's DC I see:
> cib: [18392]: WARN: cib_peer_callback: Discarding cib_sync_one message
> (255)
> from mysender10.example.com: not in our membership

 That's odd.  The only thing you changed was ver: 0 to ver: 1?
>>>
>>> Yes, only this. To make it more clear: I have 4 nodes with ver: 0, tried to
>>> add one with ver: 1, and got this.
>>>
>>> Well, I shut down all nodes, changed all to 1, started them up, and all was
>>> ok.
>>> Not a really good way to upgrade, but I didn't have time.
>>
>> Do you still have the logs for the failure case?
>> I'd really like to see them.
>
> No, I don't. But some time ago I got the same error in the vice-versa
> situation - when I tried to add a node with "ver: 0" to a cluster where all
> nodes are "ver: 1".
>
> Anyway, my cluster is down now, so I can do some tests. I will send the logs
> to the mailing list if I reproduce this situation again.

excellent



Re: [Pacemaker] Resources are not restarted on definition change after f59d7460bdde (devel)

2011-07-28 Thread Andrew Beekhof
On Wed, Jul 27, 2011 at 6:12 PM, Florian Haas  wrote:
> On 2011-07-27 03:46, Andrew Beekhof wrote:
>> On Fri, Jul 1, 2011 at 4:59 PM, Andrew Beekhof  wrote:
>>> Hmm.  Interesting. I will investigate.
>>
>> This is an unfortunate side-effect of my history compression patch.
>>
>> Since we only store the last successful and last failed operation, we
>> don't have the md5 of the start operation around to check when a
>> resource's definition is changed.
>>
>> Solutions appear to be either:
>> a) give up the space savings and revert the history compression patch
>> b) always restart a resource if a non-matching md5 is detected - even
>> if the operation was a recurring monitor
>>
>> I'd favor b) along with dropping the per-operation parameters.
>> The only valid use-case I've heard for those is setting OCF_LEVEL or
>> depth or whatever it was called - and I think we're in basic agreement
>> that we need a better solution for that anyway.
>
> We are, and you know my opinion that OCF_CHECK_LEVEL is hideous
> (although lmb, for one, seems to disagree).

His disagreement (IIUC) is with changing it in the RA API - not how
it's exposed to users in the config.

> But dropping it now does
> clearly count as a regression and I'd really hate to see that happen unless
>
> a) there is a replacement method for tuning the thoroughness of checking
> the resource state during monitor, _and_

That would be the part about promoting it to the 'op' level albeit
with a different name (it would still appear as OCF_CHECK_LEVEL in the
RA's environment).

> b) there is an automated or semi-automated ("cibadmin --upgrade"?) means
> of transitioning off OCF_CHECK_LEVEL and replacing it with its successor
> feature.

That's what our XSLTs do.

In any case, the dropping of per-op params is not a pre-req for the
"restart the resource on any op change" part.
That OCF_CHECK_LEVEL is the only param that is normally set this way
just makes option b) palatable.
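
For reference, the way OCF_CHECK_LEVEL is normally set today is as a
per-operation instance attribute, which in the crm shell looks roughly like
this (a sketch - the resource name, interval, and depth value here are
illustrative):

primitive resFS ocf:heartbeat:Filesystem \
        params device="/dev/vg0/lv0" directory="/mnt/data" fstype="ext3" \
        op monitor interval="20s" timeout="60s" OCF_CHECK_LEVEL="10"

The extra key/value pair becomes an instance attribute of that op, and the RA
sees OCF_CHECK_LEVEL=10 in its environment for that monitor.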



Re: [Pacemaker] Not seeing VIF/VIP on pacemaker system

2011-07-28 Thread Leonard Smith
Lars,

Thank you, that was it.

Len


On Thu, Jul 28, 2011 at 3:19 PM, Lars Ellenberg
 wrote:
> On Thu, Jul 28, 2011 at 02:09:46PM -0400, Leonard Smith wrote:
>> I have a very simple cluster configuration where I have a Virtual IP
>> that is shared between two hosts. It is working fine, except that I
>> cannot go to the hosts, issue an ifconfig command, and see a virtual
>> IP address or the fact that the IP address is bound to the host.
>>
>> I would expect to see a VIF or at least the fact that the ip address
>> is bound to the eth0 interface.
>>
>> Centos 5.6
>> pacemaker-1.0.11-1.2.el5
>> pacemaker-libs-1.0.11-1.2.el5
>>
>>
>>
>> node $id="xx" bos-vs002.foo.bar
>> node $id="xx" bos-vs001.foo.bar
>>
>> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>>       params ip="10.1.0.22" cidr_netmask="255.255.252.0" nic="eth0" \
>>       op monitor interval="10s"
>>
>> property $id="cib-bootstrap-options" \
>>       dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
>>       cluster-infrastructure="Heartbeat" \
>>       stonith-enabled="false" \
>>       no-quorum-policy="ignore" \
>>       default-resource-stickiness="1000"
>>
>> [root@bos-vs001 ~]# ifconfig -a
>> eth0      Link encap:Ethernet  HWaddr 00:16:36:41:D3:6D
>>           inet addr:10.1.1.1  Bcast:10.1.3.255  Mask:255.255.252.0
>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>           RX packets:454721 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:90795 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:257195727 (245.2 MiB)  TX bytes:160400169 (152.9 MiB)
>>
>> lo        Link encap:Local Loopback
>>           inet addr:127.0.0.1  Mask:255.0.0.0
>>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>           RX packets:146 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:146 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:0
>>           RX bytes:13592 (13.2 KiB)  TX bytes:13592 (13.2 KiB)
>
> IPaddr != IPaddr2,
> ifconfig != ip (from the iproute package)
>
> # this will list the addresses:
> ip addr show
> # also try:
> ip -o -f inet a s
> man ip
>
> If you want/need ifconfig to see those aliases as well, you need to
> label them, i.e. add the parameter iflabel to your primitive.
>
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
>



Re: [Pacemaker] Not seeing VIF/VIP on pacemaker system

2011-07-28 Thread Lars Ellenberg
On Thu, Jul 28, 2011 at 02:09:46PM -0400, Leonard Smith wrote:
> I have a very simple cluster configuration where I have a Virtual IP
> that is shared between two hosts. It is working fine, except that I
> cannot go to the hosts, issue an ifconfig command, and see a virtual
> IP address or the fact that the IP address is bound to the host.
> 
> I would expect to see a VIF or at least the fact that the ip address
> is bound to the eth0 interface.
> 
> Centos 5.6
> pacemaker-1.0.11-1.2.el5
> pacemaker-libs-1.0.11-1.2.el5
> 
> 
> 
> node $id="xx" bos-vs002.foo.bar
> node $id="xx" bos-vs001.foo.bar
> 
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>   params ip="10.1.0.22" cidr_netmask="255.255.252.0" nic="eth0" \
>   op monitor interval="10s"
> 
> property $id="cib-bootstrap-options" \
>   dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
>   cluster-infrastructure="Heartbeat" \
>   stonith-enabled="false" \
>   no-quorum-policy="ignore" \
>   default-resource-stickiness="1000"
> 
> [root@bos-vs001 ~]# ifconfig -a
> eth0  Link encap:Ethernet  HWaddr 00:16:36:41:D3:6D
>   inet addr:10.1.1.1  Bcast:10.1.3.255  Mask:255.255.252.0
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:454721 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:90795 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:257195727 (245.2 MiB)  TX bytes:160400169 (152.9 MiB)
> 
> lo        Link encap:Local Loopback
>   inet addr:127.0.0.1  Mask:255.0.0.0
>   UP LOOPBACK RUNNING  MTU:16436  Metric:1
>   RX packets:146 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:146 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:0
>   RX bytes:13592 (13.2 KiB)  TX bytes:13592 (13.2 KiB)

IPaddr != IPaddr2,
ifconfig != ip (from the iproute package)

# this will list the addresses:
ip addr show 
# also try:
ip -o -f inet a s
man ip

If you want/need ifconfig to see those aliases as well, you need to
label them, i.e. add the parameter iflabel to your primitive.
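
For example, your primitive with a (freely chosen) label "vip" would look
roughly like this:

primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="10.1.0.22" cidr_netmask="255.255.252.0" nic="eth0" iflabel="vip" \
        op monitor interval="10s"

ifconfig then lists the address under an eth0:vip alias; ip addr show shows it
either way.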


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



[Pacemaker] Not seeing VIF/VIP on pacemaker system

2011-07-28 Thread Leonard Smith
I have a very simple cluster configuration where I have a Virtual IP
that is shared between two hosts. It is working fine, except that I
cannot go to the hosts, issue an ifconfig command, and see a virtual
IP address or the fact that the IP address is bound to the host.

I would expect to see a VIF or at least the fact that the ip address
is bound to the eth0 interface.

Centos 5.6
pacemaker-1.0.11-1.2.el5
pacemaker-libs-1.0.11-1.2.el5



node $id="xx" bos-vs002.foo.bar
node $id="xx" bos-vs001.foo.bar

primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="10.1.0.22" cidr_netmask="255.255.252.0" nic="eth0" \
op monitor interval="10s"

property $id="cib-bootstrap-options" \
dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
cluster-infrastructure="Heartbeat" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
default-resource-stickiness="1000"

[root@bos-vs001 ~]# ifconfig -a
eth0  Link encap:Ethernet  HWaddr 00:16:36:41:D3:6D
  inet addr:10.1.1.1  Bcast:10.1.3.255  Mask:255.255.252.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:454721 errors:0 dropped:0 overruns:0 frame:0
  TX packets:90795 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:257195727 (245.2 MiB)  TX bytes:160400169 (152.9 MiB)

lo        Link encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:146 errors:0 dropped:0 overruns:0 frame:0
  TX packets:146 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:13592 (13.2 KiB)  TX bytes:13592 (13.2 KiB)



Re: [Pacemaker] ping RA question

2011-07-28 Thread Dan Urist
Here's my fping RA, for anyone who's interested. Note that some of the
parameters are different from those of ping/pingd, since fping works
differently.

The major advantages of fping over the system ping are that multiple
hosts can be pinged with a single fping command, and fping will return
as soon as all hosts succeed (the Linux system ping will not return
until it has exhausted either its count or the timeout, regardless of
success).
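
To illustrate (from memory, untested; the host names are placeholders):

# fping probes all hosts in parallel and exits 0 only if every host replied
fping -q gateway1.example.com gateway2.example.com gateway3.example.com \
        && echo "all ping targets reachable"

# the system ping takes one host at a time and, per the above, sits out its
# count/deadline before returning
ping -q -c 3 -w 5 gateway1.example.com
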
-- 
Dan Urist
dur...@ucar.edu
303-497-2459


fping
Description: Binary data


Re: [Pacemaker] Resource Group Questions - Start/Stop Order

2011-07-28 Thread Bobbie Lind
On Tue, Jul 26, 2011 at 9:52 PM, Andrew Beekhof  wrote:

> On Thu, Jul 21, 2011 at 2:36 AM, Bobbie Lind  wrote:
> > Hi group,
> >
> > I am running a 6 node system, 4 of which mount the LUNs for my Lustre
> > file system.  I currently have 29 LUNs per server set up in 4 Resource
> > Groups.  I understand the default startup/shutdown order of the resources,
> > but I was wondering if there is a way to override that and have all the
> > resources in the group start up or shut down at the same time.  Ideally
> > what I am looking for is for all the resources in the group OSS1group to
> > start up and shut down at the same time, since none of them are dependent
> > on each other; they just belong on the same server.
>
> I'd suggest just not using a group in this case.
> If all you want is colocation, use a colocation set.
>

Thank you.  That did exactly what I was looking for.
Changing the groups to colocation sets and adding a Dummy resource to map to
the specific servers fixed the issue.

I have attached an updated crm configuration for reference.
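
In outline, the change looks something like this (a simplified sketch - the
resource and node names here are illustrative, and the set syntax is worth
double-checking against "crm configure help colocation" for your version;
the attached configuration is the real thing):

# a Dummy resource pinned to the server the old group lived on
primitive resOSS1anchor ocf:pacemaker:Dummy
location locOSS1 resOSS1anchor inf: s02ns040

# a non-sequential colocation set: the OSTs follow the anchor's node but
# have no ordering or dependencies among themselves
colocation colOSS1 inf: ( resOST0001 resOST0002 resOST0003 ) resOSS1anchor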

Bobbie Lind
Systems Engineer
*Solutions Made Simple, Inc (SMSi)*
node s02ns030.dsd.net \
attributes standby="off"
node s02ns040
node s02ns050
node s02ns060
node s02ns070 \
attributes standby="off"
node s02ns090 \
attributes standby="off"
primitive resMDT ocf:heartbeat:Filesystem \
operations $id="resMDT-operations" \
op monitor interval="120" timeout="60" \
op start interval="0" timeout="300" \
op stop interval="0" timeout="300" \
params device="/dev/mapper/dsdw_mdt_vg-dsdw_mdt_vol" directory="/lustre/dsdw-MDT" fstype="lustre" \
meta target-role="Started"
primitive resMDTLVM ocf:heartbeat:LVM \
params volgrpname="dsdw_mdt_vg"
primitive resOST ocf:heartbeat:Filesystem \
operations $id="resOST-operations" \
op monitor interval="120" timeout="60" \
op start interval="0" timeout="300" \
op stop interval="0" timeout="300" \
params device="/dev/mapper/dsdw-OST" directory="/lustre/dsdw-OST" fstype="lustre"
primitive resOST0001 ocf:heartbeat:Filesystem \
operations $id="resOST0001-operations" \
op monitor interval="120" timeout="60" \
op start interval="0" timeout="300" \
op stop interval="0" timeout="300" \
params device="/dev/mapper/dsdw-OST0001" directory="/lustre/dsdw-OST0001" fstype="lustre"
primitive resOST0002 ocf:heartbeat:Filesystem \
operations $id="resOST0002-operations" \
op monitor interval="120" timeout="60" \
op start interval="0" timeout="300" \
op stop interval="0" timeout="300" \
params device="/dev/mapper/dsdw-OST0002" directory="/lustre/dsdw-OST0002" fstype="lustre"
primitive resOST0003 ocf:heartbeat:Filesystem \
operations $id="resOST0003-operations" \
op monitor interval="120" timeout="60" \
op start interval="0" timeout="300" \
op stop interval="0" timeout="300" \
params device="/dev/mapper/dsdw-OST0003" directory="/lustre/dsdw-OST0003" fstype="lustre"
primitive resOST0004 ocf:heartbeat:Filesystem \
operations $id="resOST0004-operations" \
op monitor interval="120" timeout="60" \
op start interval="0" timeout="300" \
op stop interval="0" timeout="300" \
params device="/dev/mapper/dsdw-OST0004" directory="/lustre/dsdw-OST0004" fstype="lustre"
primitive resOST0005 ocf:heartbeat:Filesystem \
operations $id="resOST0005-operations" \
op monitor interval="120" timeout="60" \
op start interval="0" timeout="300" \
op stop interval="0" timeout="300" \
params device="/dev/mapper/dsdw-OST0005" directory="/lustre/dsdw-OST0005" fstype="lustre"
primitive resOST0006 ocf:heartbeat:Filesystem \
operations $id="resOST0006-operations" \
op monitor interval="120" timeout="60" \
op start interval="0" timeout="300" \
op stop interval="0" timeout="300" \
params device="/dev/mapper/dsdw-OST0006" directory="/lustre/dsdw-OST0006" fstype="lustre"
primitive resOST0007 ocf:heartbeat:Filesystem \
operations $id="resOST0007-operations" \
op monitor interval="120" timeout="60" \
op start interval="0" timeout="300" \
op stop interval="0" timeout="300" \
params device="/dev/mapper/dsdw-OST0007" directory="/lustre/dsdw-OST0007" fstype="lustre"
primitive resOST0008 ocf:heartbeat:Filesystem \
operations $id="resOST0008-operations" \
op monitor interval="120" timeout="60" \
op start interval="0" timeout="300" \
op stop interval="0" timeout="300" \
params device="/dev/mapper/dsdw-OST0008" directory="/lustre/dsdw-OST0008" fstype="lustre"
primitive resOST0009 ocf:heartbeat:Filesystem \
operations $id="resOST0009-operations" \
op monitor interval="120" timeout="60" \
op start interval="0" timeout="300" \