On Fri, Feb 17, 2012 at 5:05 AM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:
> Hi,
>
> On Wed, Feb 15, 2012 at 04:24:15PM -0500, William Seligman wrote:
>> On 2/10/12 4:53 PM, William Seligman wrote:
>> > I'm trying to set up an Active/Active cluster (yes, I hear the sounds of
>> > kittens dying). Versions:
>> >
>> > Scientific Linux 6.2
>> > pacemaker-1.1.6
>> > resource-agents-3.9.2
>> >
>> > I'm using cloned IPaddr2 resources:
>> >
>> > primitive ClusterIP ocf:heartbeat:IPaddr2 \
>> >         params ip="129.236.252.13" cidr_netmask="32" \
>> >         op monitor interval="30s"
>> > primitive ClusterIPLocal ocf:heartbeat:IPaddr2 \
>> >         params ip="10.44.7.13" cidr_netmask="32" \
>> >         op monitor interval="31s"
>> > primitive ClusterIPSandbox ocf:heartbeat:IPaddr2 \
>> >         params ip="10.43.7.13" cidr_netmask="32" \
>> >         op monitor interval="32s"
>> > group ClusterIPGroup ClusterIP ClusterIPLocal ClusterIPSandbox
>> > clone ClusterIPClone ClusterIPGroup
>> >
>> > When both nodes of my two-node cluster are running, everything looks and
>> > functions OK. From "service iptables status" on node 1 (hypatia-tb):
>> >
>> > 5    CLUSTERIP  all  --  0.0.0.0/0            10.43.7.13          CLUSTERIP
>> > hashmode=sourceip-sourceport clustermac=F1:87:E1:64:60:A5 total_nodes=2
>> > local_node=1 hash_init=0
>> > 6    CLUSTERIP  all  --  0.0.0.0/0            10.44.7.13          CLUSTERIP
>> > hashmode=sourceip-sourceport clustermac=11:8F:23:B9:CA:09 total_nodes=2
>> > local_node=1 hash_init=0
>> > 7    CLUSTERIP  all  --  0.0.0.0/0            129.236.252.13      CLUSTERIP
>> > hashmode=sourceip-sourceport clustermac=B1:95:5A:B5:16:79 total_nodes=2
>> > local_node=1 hash_init=0
>> >
>> > On node 2 (orestes-tb):
>> >
>> > 5    CLUSTERIP  all  --  0.0.0.0/0            10.43.7.13          CLUSTERIP
>> > hashmode=sourceip-sourceport clustermac=F1:87:E1:64:60:A5 total_nodes=2
>> > local_node=2 hash_init=0
>> > 6    CLUSTERIP  all  --  0.0.0.0/0            10.44.7.13          CLUSTERIP
>> > hashmode=sourceip-sourceport clustermac=11:8F:23:B9:CA:09 total_nodes=2
>> > local_node=2 hash_init=0
>> > 7    CLUSTERIP  all  --  0.0.0.0/0            129.236.252.13      CLUSTERIP
>> > hashmode=sourceip-sourceport clustermac=B1:95:5A:B5:16:79 total_nodes=2
>> > local_node=2 hash_init=0
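(An aside: the CLUSTERIP target also exposes its state under /proc, which is
a quick way to confirm which node numbers each machine is currently answering
for. A minimal check, assuming the stock ipt_CLUSTERIP proc interface:

  # list the node numbers this machine currently serves for the clustered IP
  cat /proc/net/ipt_CLUSTERIP/129.236.252.13

On a healthy two-node setup you'd expect to see "1" on one node and "2" on
the other.)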
>> >
>> > If I do a simple test of ssh'ing into 129.236.252.13, I see that I
>> > alternately log into hypatia-tb and orestes-tb. All is good.
>> >
>> > Now take orestes-tb offline. The iptables rules on hypatia-tb are unchanged:
>> >
>> > 5    CLUSTERIP  all  --  0.0.0.0/0            10.43.7.13          CLUSTERIP
>> > hashmode=sourceip-sourceport clustermac=F1:87:E1:64:60:A5 total_nodes=2
>> > local_node=1 hash_init=0
>> > 6    CLUSTERIP  all  --  0.0.0.0/0            10.44.7.13          CLUSTERIP
>> > hashmode=sourceip-sourceport clustermac=11:8F:23:B9:CA:09 total_nodes=2
>> > local_node=1 hash_init=0
>> > 7    CLUSTERIP  all  --  0.0.0.0/0            129.236.252.13      CLUSTERIP
>> > hashmode=sourceip-sourceport clustermac=B1:95:5A:B5:16:79 total_nodes=2
>> > local_node=1 hash_init=0
>> >
>> > If I attempt to ssh to 129.236.252.13, whether or not I get in seems to be
>> > machine-dependent. On one machine I get in, from another I get a time-out.
>> > Both machines show the same MAC address for 129.236.252.13:
>> >
>> > arp 129.236.252.13
>> > Address                  HWtype  HWaddress           Flags Mask            Iface
>> > hamilton-tb.nevis.colum  ether   B1:95:5A:B5:16:79   C                     eth0
>> >
>> > Is this the way the cloned IPaddr2 resource is supposed to behave in the
>> > event of a node failure, or have I set things up incorrectly?
>>
>> I spent some time looking over the IPaddr2 script. As far as I can tell, the
>> script has no mechanism for reconfiguring iptables in the event of a change
>> of state in the number of clones.
>>
>> I might be stupid -- er -- dedicated enough to make this change on my own,
>> then share the code with the appropriate group. The change seems to be
>> relatively simple. It would be in the monitor operation. In pseudo-code:
>>
>> if ( <IPaddr2 resource is already started> ) then
>>   if ( OCF_RESKEY_CRM_meta_clone_max != OCF_RESKEY_CRM_meta_clone_max last time
>>     || OCF_RESKEY_CRM_meta_clone     != OCF_RESKEY_CRM_meta_clone last time ) then
>>     ip_stop
>>     ip_start
>
> Just changing the iptables entries should suffice, right?
> Besides, doing stop/start in the monitor is sort of unexpected.
> Another option is to add the missing node to one of the nodes
> which are still running (echo "+<n>" >>
> /proc/net/ipt_CLUSTERIP/<ip>). But any of that would be extremely
> tricky to implement properly (if not impossible).
>
>>   fi
>> fi
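Following up on the /proc idea: on a surviving node the takeover would
presumably look something like this (untested sketch, node 2 being the one
that disappeared):

  # adopt the failed node's hash bucket on the survivor
  echo "+2" > /proc/net/ipt_CLUSTERIP/129.236.252.13
  # and hand it back once the node rejoins
  echo "-2" > /proc/net/ipt_CLUSTERIP/129.236.252.13

The hard part, as Dejan says, is deciding which surviving instance does this
and when.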
>>
>> If this would work, then I'd have two questions for the experts:
>>
>> - Would the values of OCF_RESKEY_CRM_meta_clone_max and/or
>> OCF_RESKEY_CRM_meta_clone change if the number of cloned copies of a resource
>> changed?
>
> OCF_RESKEY_CRM_meta_clone_max definitely not.
> OCF_RESKEY_CRM_meta_clone may change but also probably not; it's
> just a clone sequence number. In short, there's no way to figure
> out the total number of clones by examining the environment.
> Information such as membership changes doesn't trickle down to
> the resource instances.

What about notifications?  That would be the right point to
re-configure things, I'd have thought.
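Rough idea of what that would involve (a sketch only -- the clone needs
notify enabled and the agent would have to grow a notify action; variable
names quoted from memory, so please double-check them):

  clone ClusterIPClone ClusterIPGroup \
          meta notify="true"

  # in the agent's notify action, Pacemaker passes the membership change in
  # the environment, e.g.:
  #   OCF_RESKEY_CRM_meta_notify_type         pre / post
  #   OCF_RESKEY_CRM_meta_notify_operation    start / stop
  #   OCF_RESKEY_CRM_meta_notify_stop_uname   nodes whose instances stopped
  #   OCF_RESKEY_CRM_meta_notify_active_uname nodes with active instances

Each surviving instance could then recount the active clones and rewrite its
CLUSTERIP rule (or poke /proc) accordingly.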

> Of course, it's possible to find that out
> using say crm_node, but then actions need to be coordinated
> between the remaining nodes.
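For the counting part, something as simple as this might do inside the agent,
assuming crm_node -p still lists the members of the current partition:

  # number of nodes in the current cluster partition
  active_nodes=$(crm_node -p | wc -w)

but, as you say, that still leaves the remaining instances to agree on who
reconfigures what.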
>
>> - Is there some standard mechanism by which RA scripts can maintain
>> persistent information between successive calls?
>
> No. One needs to keep the information in a local file.
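For what it's worth, the usual pattern in the shipped agents is a small state
file under the agents' tmp directory; a rough sketch (the file name and the
$current_count variable are placeholders, not anything IPaddr2 has today):

  # remember the clone count we last configured CLUSTERIP with
  statefile="${HA_RSCTMP}/IPaddr2-${OCF_RESKEY_ip}.last_clone_count"
  last_count=$(cat "$statefile" 2>/dev/null)
  echo "$current_count" > "$statefile"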
>
>> I realize there's a flaw in the logic: it risks breaking an ongoing IP
>> connection. But as it stands, IPaddr2 is a clonable resource but not a
>> highly-available one. If one of N cloned copies goes down, then one out of
>> N new network connections to the IP address will fail.
>
> Interesting. Has anybody run into this before? I believe that
> there are some running similar setups. Does anybody have a
> solution for this?
>
> Thanks,
>
> Dejan
>
>> --
>> Bill Seligman             | Phone: (914) 591-2823
>> Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
>> PO Box 137                |
>> Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
>>
>
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
