On Fri, Feb 17, 2012 at 12:13:49PM +1100, Andrew Beekhof wrote:
> On Fri, Feb 17, 2012 at 5:05 AM, Dejan Muhamedagic <deja...@fastmail.fm> 
> wrote:
> > Hi,
> >
> > On Wed, Feb 15, 2012 at 04:24:15PM -0500, William Seligman wrote:
> >> On 2/10/12 4:53 PM, William Seligman wrote:
> >> > I'm trying to set up an Active/Active cluster (yes, I hear the sounds
> >> > of kittens dying). Versions:
> >> >
> >> > Scientific Linux 6.2
> >> > pacemaker-1.1.6
> >> > resource-agents-3.9.2
> >> >
> >> > I'm using cloned IPaddr2 resources:
> >> >
> >> > primitive ClusterIP ocf:heartbeat:IPaddr2 \
> >> >         params ip="129.236.252.13" cidr_netmask="32" \
> >> >         op monitor interval="30s"
> >> > primitive ClusterIPLocal ocf:heartbeat:IPaddr2 \
> >> >         params ip="10.44.7.13" cidr_netmask="32" \
> >> >         op monitor interval="31s"
> >> > primitive ClusterIPSandbox ocf:heartbeat:IPaddr2 \
> >> >         params ip="10.43.7.13" cidr_netmask="32" \
> >> >         op monitor interval="32s"
> >> > group ClusterIPGroup ClusterIP ClusterIPLocal ClusterIPSandbox
> >> > clone ClusterIPClone ClusterIPGroup
> >> >
> >> > When both nodes of my two-node cluster are running, everything looks and
> >> > functions OK. From "service iptables status" on node 1 (hypatia-tb):
> >> >
> >> > 5    CLUSTERIP  all  --  0.0.0.0/0            10.43.7.13           CLUSTERIP hashmode=sourceip-sourceport clustermac=F1:87:E1:64:60:A5 total_nodes=2 local_node=1 hash_init=0
> >> > 6    CLUSTERIP  all  --  0.0.0.0/0            10.44.7.13           CLUSTERIP hashmode=sourceip-sourceport clustermac=11:8F:23:B9:CA:09 total_nodes=2 local_node=1 hash_init=0
> >> > 7    CLUSTERIP  all  --  0.0.0.0/0            129.236.252.13       CLUSTERIP hashmode=sourceip-sourceport clustermac=B1:95:5A:B5:16:79 total_nodes=2 local_node=1 hash_init=0
> >> >
> >> > On node 2 (orestes-tb):
> >> >
> >> > 5    CLUSTERIP  all  --  0.0.0.0/0            10.43.7.13           CLUSTERIP hashmode=sourceip-sourceport clustermac=F1:87:E1:64:60:A5 total_nodes=2 local_node=2 hash_init=0
> >> > 6    CLUSTERIP  all  --  0.0.0.0/0            10.44.7.13           CLUSTERIP hashmode=sourceip-sourceport clustermac=11:8F:23:B9:CA:09 total_nodes=2 local_node=2 hash_init=0
> >> > 7    CLUSTERIP  all  --  0.0.0.0/0            129.236.252.13       CLUSTERIP hashmode=sourceip-sourceport clustermac=B1:95:5A:B5:16:79 total_nodes=2 local_node=2 hash_init=0
> >> >
> >> > If I do a simple test of ssh'ing into 129.236.252.13, I see that I
> >> > alternately log in to hypatia-tb and orestes-tb. All is good.
> >> >
> >> > Now take orestes-tb offline. The iptables rules on hypatia-tb are 
> >> > unchanged:
> >> >
> >> > 5    CLUSTERIP  all  --  0.0.0.0/0            10.43.7.13           CLUSTERIP hashmode=sourceip-sourceport clustermac=F1:87:E1:64:60:A5 total_nodes=2 local_node=1 hash_init=0
> >> > 6    CLUSTERIP  all  --  0.0.0.0/0            10.44.7.13           CLUSTERIP hashmode=sourceip-sourceport clustermac=11:8F:23:B9:CA:09 total_nodes=2 local_node=1 hash_init=0
> >> > 7    CLUSTERIP  all  --  0.0.0.0/0            129.236.252.13       CLUSTERIP hashmode=sourceip-sourceport clustermac=B1:95:5A:B5:16:79 total_nodes=2 local_node=1 hash_init=0
> >> >
> >> > If I attempt to ssh to 129.236.252.13, whether or not I get in seems
> >> > to be machine-dependent. From one machine I get in; from another I
> >> > get a time-out. Both machines show the same MAC address for
> >> > 129.236.252.13:
> >> >
> >> > arp 129.236.252.13
> >> > Address                  HWtype  HWaddress           Flags Mask            Iface
> >> > hamilton-tb.nevis.colum  ether   B1:95:5A:B5:16:79   C                     eth0
> >> >
> >> > Is this the way the cloned IPaddr2 resource is supposed to behave in
> >> > the event of a node failure, or have I set things up incorrectly?
> >>
> >> I spent some time looking over the IPaddr2 script. As far as I can
> >> tell, the script has no mechanism for reconfiguring iptables when the
> >> number of active clones changes.
> >>
> >> I might be stupid -- er -- dedicated enough to make this change on my
> >> own, then share the code with the appropriate group. The change seems
> >> to be relatively simple. It would go in the monitor operation. In
> >> pseudo-code:
> >>
> >> if ( <IPaddr2 resource is already started> ) then
> >>   if ( OCF_RESKEY_CRM_meta_clone_max != OCF_RESKEY_CRM_meta_clone_max last time
> >>     || OCF_RESKEY_CRM_meta_clone     != OCF_RESKEY_CRM_meta_clone last time )
> >>     ip_stop
> >>     ip_start
> >
> > Just changing the iptables entries should suffice, right?
> > Besides, doing stop/start in the monitor is sort of unexpected.
> > Another option is to add the missing node to one of the nodes
> > which are still running (echo "+<n>" >>
> > /proc/net/ipt_CLUSTERIP/<ip>). But any of that would be extremely
> > tricky to implement properly (if not impossible).
> >
> >>   fi
> >> fi
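
Just to illustrate the kernel interface I mentioned above: for the
address from your configuration it would look roughly like this (node
number 2 is only an example here):

# show which CLUSTERIP node numbers this machine currently answers for
cat /proc/net/ipt_CLUSTERIP/129.236.252.13

# take over node 2's share of the traffic ...
echo "+2" >>/proc/net/ipt_CLUSTERIP/129.236.252.13

# ... and hand it back again
echo "-2" >>/proc/net/ipt_CLUSTERIP/129.236.252.13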
> >>
> >> If this would work, then I'd have two questions for the experts:
> >>
> >> - Would the values of OCF_RESKEY_CRM_meta_clone_max and/or
> >> OCF_RESKEY_CRM_meta_clone change if the number of cloned copies of a
> >> resource changed?
> >
> > OCF_RESKEY_CRM_meta_clone_max definitely not.
> > OCF_RESKEY_CRM_meta_clone may change, but probably not either; it's
> > just a clone sequence number. In short, there's no way to figure
> > out the number of currently active clones by examining the
> > environment. Information such as membership changes doesn't
> > trickle down to the resource instances.
> 
> What about notifications?  That would be the right point to
> re-configure things, I'd have thought.

Sounds like the right way. Still, it may be hard to coordinate
between different instances unless we figure out how to map nodes
to the numbers used by CLUSTERIP. For instance, the notify
operation gets:

OCF_RESKEY_CRM_meta_notify_stop_resource="ip_lb:2 "
OCF_RESKEY_CRM_meta_notify_stop_uname="xen-f "

But the instance number may not match the node number in
/proc/net/ipt_CLUSTERIP/<ip>, which is where we would have to add
the node. It should be something like:

notify() {
        if node_down; then
                # a peer left: take over its share of the cluster IP
                echo "+node_num" >> /proc/net/ipt_CLUSTERIP/<ip>
        elif node_up; then
                # the peer is back: hand its share back
                echo "-node_num" >> /proc/net/ipt_CLUSTERIP/<ip>
        fi
}

Another issue is that the above code should be executed on
_exactly_ one node.
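
To make that a bit more concrete, here is a very rough sketch of how
such a notify operation could be wired up. Nothing like this exists in
IPaddr2 today: it assumes the usual OCF shell includes are sourced (for
$OCF_SUCCESS), that a node_number_of helper mapping a uname to a
CLUSTERIP node number can be written at all, and that letting the first
name in the active_uname list do the work is an acceptable way to run
it on only one node:

ip_clusterip_notify() {
        local ip="$OCF_RESKEY_ip"
        local n_type="$OCF_RESKEY_CRM_meta_notify_type"         # pre/post
        local n_op="$OCF_RESKEY_CRM_meta_notify_operation"      # start/stop

        # Act on exactly one node: let the first name in the list of
        # still-active instances do the work (an assumption, not
        # something pacemaker guarantees to be race-free).
        first_active=`echo $OCF_RESKEY_CRM_meta_notify_active_uname | awk '{print $1}'`
        [ "`uname -n`" = "$first_active" ] || return $OCF_SUCCESS

        if [ "$n_type" = "post" -a "$n_op" = "stop" ]; then
                # a peer's instance stopped: take over its node number
                for peer in $OCF_RESKEY_CRM_meta_notify_stop_uname; do
                        n=`node_number_of $peer`        # hypothetical mapping
                        echo "+$n" >>/proc/net/ipt_CLUSTERIP/$ip
                done
        elif [ "$n_type" = "post" -a "$n_op" = "start" ]; then
                # a peer's instance started: hand its node number back
                for peer in $OCF_RESKEY_CRM_meta_notify_start_uname; do
                        n=`node_number_of $peer`        # hypothetical mapping
                        echo "-$n" >>/proc/net/ipt_CLUSTERIP/$ip
                done
        fi
        return $OCF_SUCCESS
}

Whether that "first active node" guard is actually safe while
membership is changing is another question entirely.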

Cheers,

Dejan
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
