Hi,

Regarding the discussion on the pacemaker ML below,
I would like to suggest the attached patch.

The patch includes:
1) Fix IPaddr to return the correct OCF value (it returned 255 when
delete_interface failed).
2) Add a description of the assumption to the IPaddr / IPaddr2 meta-data.
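
To illustrate item 1, here is a minimal sketch of the idea and not the
patch itself. The function bodies are hypothetical stand-ins; the only
assumptions taken from the OCF resource agent API are the numeric values
OCF_SUCCESS=0 and OCF_ERR_GENERIC=1. The point is that a failure of the
helper is translated into an OCF code instead of the raw status 255
being passed through.

```shell
#!/bin/sh
# OCF return codes as defined by the OCF resource agent API.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Hypothetical stand-in for the agent's real helper. Here it always
# fails with a non-OCF status to imitate the reported behaviour.
delete_interface() {
    return 255
}

# Stop action: map any failure of the helper to an OCF return code
# rather than returning the helper's raw exit status.
ip_stop() {
    if delete_interface "$@"; then
        return $OCF_SUCCESS
    fi
    return $OCF_ERR_GENERIC
}
```

With this stand-in, calling ip_stop yields OCF_ERR_GENERIC rather than 255.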

Regards,

Keisuke MORI

2010/4/14 Lars Ellenberg <lars.ellenb...@linbit.com>:
> On Tue, Apr 13, 2010 at 08:28:09PM +0200, Lars Ellenberg wrote:
>> On Tue, Apr 13, 2010 at 12:10:18PM +0200, Dejan Muhamedagic wrote:
>> > Hi,
>> >
>> > On Mon, Apr 12, 2010 at 05:26:19PM +0200, Markus M. wrote:
>> > > Markus M. wrote:
>> > > >is there a known problem with IPaddr(2) when defining many (in my
>> > > >case: 11) ip resources which are started/stopped concurrently?
>> >
>> > Don't remember any problems.
>> >
>> > > Well... some further investigation revealed that it seems to be a
>> > > problem with the way the ip addresses are assigned.
>> > >
>> > > When looking at the output of "ip addr", the first ip address added
>> > > to the interface gets the scope "global", all further aliases get
>> > > the scope "global secondary".
>> > >
>> > > If afterwards the first ip address is removed before the secondaries
>> > > (because the scripts run concurrently), ALL secondaries are
>> > > removed at the same time by the "ip" command, leading to an error
>> > > for all subsequent attempts to remove the other ip addresses because
>> > > they are already gone.
>> > >
>> > > I am not sure how "ip" decides on the "secondary" scope, maybe
>> > > because the other ip addresses are in the same subnet as the first
>> > > one.
>> >
>> > That sounds bad. Instances should be independent of each other.
>> > Can you please open a bugzilla and attach a hb_report.
>>
>> Oh, that is perfectly expected the way he describes it.
>> The assumption has always been that there is at least one
>> "normal", not managed by crm, address on the interface,
>> so no one will have noticed before.
>>
>> I suggest the following patch,
>> basically doing one retry.
>>
>> For the described scenario,
>> the second try will find the IP already "non-existent",
>> and exit $OCF_SUCCESS.
>
> Though that obviously won't make instances independent.
>
> The typical way to achieve that is to have them all as "secondary" IPs.
> Which implies that for successful use of independent IPaddr2 resources
> on the same device, you need at least one "system" IP (as opposed to
> "managed by cluster") on that device.
>
> The first IP assigned will get "primary" status.
> Usually, if you delete a "primary" IP, the kernel will also
> delete all secondary IP addresses.
>
> If using a "system" IP is not an option, here is the alternative:
> "Recent" kernels (a quick check revealed that this setting has been
> around since at least 2.6.12) can do "alias promotion", which can be enabled
> using
>        sysctl -w net.ipv4.conf.all.promote_secondaries=1
> (or per device)
>
> In both cases the previously suggested "retry on ip_stop" patch is unnecessary.
> But won't do any harm, either. Most likely ;-)
>
> Glad that helped ;-)
>
> Somebody please add that to the man page and/or the agent meta data...
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
>
> _______________________________________________
> Pacemaker mailing list: pacema...@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>
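
As a small aid for the "alias promotion" alternative quoted above, the
following sketch checks whether the setting is currently enabled. It
assumes the usual mapping of net.ipv4.conf.all.promote_secondaries to a
file under /proc/sys; the function name and its optional path argument
are hypothetical and only there for illustration.

```shell
#!/bin/sh
# Report whether secondary-address promotion is enabled.
# $1: path to the sysctl file (defaults to the "all" setting under /proc).
# Returns 0 if enabled, 1 if disabled, 2 if the setting cannot be read.
promote_secondaries_enabled() {
    file=${1:-/proc/sys/net/ipv4/conf/all/promote_secondaries}
    [ -r "$file" ] || return 2
    [ "$(cat "$file")" = "1" ]
}
```

Called without arguments it inspects the "all" setting; a per-device file
can be passed as the first argument instead.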



-- 
Keisuke MORI

Attachment: agents-ipaddr-retval.patch
Description: Binary data

_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/