Re: [Openstack] pacemaker would be wrong when both nodes have the same hostname

2014-05-16 Thread walterxj

Hi Marica,
    I have tested your RA script for two days. After many attempts it finally works well :) I changed several settings (resource-stickiness, the RA script, the neutron l3-agent settings, etc.) in my environment, so I can't tell which change was the key one. I have attached my RA script here, hoping it helps anybody else with the same problem, and I have commented all of my changes.
    In my RA script I made one change to your "restore old hostname" section. The original line is:

        hostname Network01

    I changed it to:

        hostname $(cat /etc/sysconfig/network | grep HOSTNAME | awk -F = '{print $2}')

    so both nodes can use the same RA script.
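    (A slightly more defensive form of that lookup is sketched below; it is an illustration, not part of the attached script. It ignores commented-out lines and strips optional quotes around the HOSTNAME= value.)

        # Sketch: restore this node's own hostname from /etc/sysconfig/network
        # (RHEL/CentOS 6 layout). sed prints only the value after HOSTNAME=,
        # and tr drops optional surrounding quotes.
        hostname "$(sed -n 's/^HOSTNAME=//p' /etc/sysconfig/network | tr -d '"')"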
    Aleita's method is good because whichever node the l3-agent starts on, the l3-agent id hosting the router stays the same, since the node's hostname is the same; in this example script it is network-controller. We can use neutron l3-agent-list-hosting-router $ext-router-id to check it. It will always look like:
    +--------------------------------------+--------------------+----------------+-------+
    | id                                   | host               | admin_state_up | alive |
    +--------------------------------------+--------------------+----------------+-------+
    | -xx-x-x-                             | network-controller | True           | :-)   |
    +--------------------------------------+--------------------+----------------+-------+
    So when one node goes down, the other node's l3-agent can take over the same l3-agent id as its own.
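    (For a quick check after a failover, something like the following can be used; the router name ext-router here is an assumption:)

        # Assumed router name "ext-router": look up its id, then list the
        # agent hosting it -- the host column should stay network-controller.
        EXT_ROUTER_ID=$(/usr/bin/neutron router-list | awk '/ ext-router /{print $2}')
        /usr/bin/neutron l3-agent-list-hosting-router $EXT_ROUTER_ID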
    And my thought before your mail was: remove the down node's l3-agent from the hosting router and add the backup l3-agent to it, something like:

    #=============================================================
    # id of the l3 agent on the failed node ($7 of agent-list is the host)
    down_l3_agent_ID=$(/usr/bin/neutron agent-list | grep 'L3 agent' | \
        awk -v h="$(hostname)" '$7 != h {print $2}')
    # id of the l3 agent on this (surviving) node
    back_l3_agent_ID=$(/usr/bin/neutron agent-list | grep 'L3 agent' | \
        awk -v h="$(hostname)" '$7 == h {print $2}')
    # re-bind every router hosted by the failed agent to the backup agent;
    # NR>3 && NF>1 skips the table header and border lines
    for r in $(/usr/bin/neutron router-list-on-l3-agent $down_l3_agent_ID | \
        awk 'NR>3 && NF>1 {print $2}'); do
        /usr/bin/neutron l3-agent-router-remove $down_l3_agent_ID $r && \
        /usr/bin/neutron l3-agent-router-add $back_l3_agent_ID $r
    done
    #=============================================================
    I think it would work as well, but your method is better :) So thank you very much!
    btw: I have changed OCF_RESKEY_agent_config_default to OCF_RESKEY_plugin_config_default, otherwise we can't configure pacemaker the way the high-availability-guide shows:

        primitive p_neutron-l3-agent ocf:openstack:neutron-agent-l3 \
            params config=/etc/neutron/neutron.conf \
            plugin_config=/etc/neutron/l3_agent.ini \
            op monitor interval=30s timeout=30s

    These changes are based on:
    https://bugs.launchpad.net/openstack-manuals/+bug/1252131
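    (For context, the rename comes down to the default-parameter block at the top of the RA looking roughly like this; apart from plugin_config itself, the exact names here are assumptions:)

        # Sketch of the RA's parameter defaults after the rename. The OCF
        # idiom ": ${VAR=default}" only sets VAR if pacemaker did not pass
        # it, so the primitive above can override plugin_config=...
        OCF_RESKEY_config_default="/etc/neutron/neutron.conf"
        OCF_RESKEY_plugin_config_default="/etc/neutron/l3_agent.ini"
        : ${OCF_RESKEY_config=${OCF_RESKEY_config_default}}
        : ${OCF_RESKEY_plugin_config=${OCF_RESKEY_plugin_config_default}}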
 
Walter Xu
From: walterxj
Date: 2014-05-15 09:51
To: Marica Antonacci
Subject: Re: Re: [Openstack] pacemaker would be wrong when both nodes have the same hostname
Hi Marica,
   When I use crm node standby it seems to work, but I think with that method the virtual router still resides on the former node, because when I power off that node the VM instance cannot access the external net.
   I'll test again carefully; after testing I'll report back to you.
   Thank you again for your help.

walterxj

From: Marica Antonacci
Date: 2014-05-14 22:21
To: xu Walter
Subject: Re: [Openstack] pacemaker would be wrong when both nodes have the same hostname

Hi Walter,
we are using it in our production havana environment. We have tested it both using “crm node standby” and “crm resource migrate g_network node2”, as well as by turning off the node's network interfaces, shutting down the node, etc.
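(Spelled out, the failover tests were of this form; the online/unmigrate counterparts are the usual crm shell commands to undo each step:)

    # graceful failover and recovery
    crm node standby node1
    crm node online node1
    # targeted migration of the whole network group, then removal of the
    # location constraint that migrate creates
    crm resource migrate g_network node2
    crm resource unmigrate g_network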
Have you modified our script using the correct hostnames for the two different 
nodes?
Cheers,
Marica
On 14 May 2014, at 16:08, xu Walter walte...@gmail.com wrote:

Hi Marica,
   Thanks for your script, but it does not seem to work for me. I would like to know how you tested it: just with crm node standby, or by shutting the node down physically?


2014-05-14 19:49 GMT+08:00 Marica Antonacci marica.antona...@gmail.com:

Hi,
in attachment you can find our modified resource agent…we have noticed that the 
network namespaces (router and dhcp) are automatically re-created on the new 
node when the resource manager migrates the network controller on the other 
physical node (we have grouped all the services related to the network node).
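(The grouping itself is plain pacemaker configuration; a sketch, with resource names assumed except g_network, which is the group used in the migration test above:)

    # crm configure sketch: one group so all network-node services move
    # together; a pacemaker group also implies ordering and colocation.
    group g_network p_neutron-plugin-openvswitch-agent \
        p_neutron-dhcp-agent p_neutron-metadata-agent p_neutron-l3-agent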

Please note that the attached script also contains other patches with respect to the RA available at
https://raw.githubusercontent.com/madkiss/openstack-resource-agents/master/ocf/neutron-agent-l3
because we found some issues with the resource agent parameters and with the port used to check the established connection to the server; moreover, we have added start/stop handling for the neutron-plugin-openvswitch-agent, since there is no RA available for it at the moment.
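(The connection check in the monitor is essentially of this shape; the port number and the match pattern are assumptions for the sketch, 5672 being the usual AMQP port:)

    # Sketch: verify the agent process ($pid) holds an ESTABLISHED
    # connection to the server port (assumed 5672 / AMQP here).
    netstat -punt 2>/dev/null | grep ":5672 " | grep "$pid" | grep -q ESTABLISHED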

Re: [Openstack] pacemaker would be wrong when both nodes have the same hostname

2014-05-14 Thread Marica Antonacci
Hi all,

we are currently using pacemaker to manage 2 network nodes (node1, node2) and 
we have modified the neutron L3 agent RA in order to dynamically change the 
hostname of the active network node: start() function sets the hostname 
“network-controller to be used by the scheduler; the stop() function restores 
the old hostname (“node1” or “node2”). It seems to work, yet it’s a rude patch 
:) A more general solution that exploits neutron functionalities would be very 
appreciated!
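(In outline, the hostname swap inside the RA amounts to the following; the function and file names are illustrative, not the exact attached script:)

    # Illustrative sketch of the RA's hostname handling.
    set_scheduler_hostname() {    # called from start()
        hostname > /var/run/neutron-l3-real-hostname   # remember real name
        hostname network-controller                    # name the scheduler sees
    }
    restore_real_hostname() {     # called from stop()
        hostname "$(cat /var/run/neutron-l3-real-hostname)"
    }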

Best,
Marica  

On 14 May 2014, at 12:34, walterxj walte...@gmail.com wrote:

 hi:
   the high-availability-guide
 (http://docs.openstack.org/high-availability-guide/content/ch-network.html)
 says that “Both nodes should have the same hostname since the Networking
 scheduler will be aware of one node, for example a virtual router attached to
 a single L3 node.”
 
  But when I tested it on two servers with the same hostname, after installing
 the corosync and pacemaker services on them (with no resources configured),
 the crm_mon output went into an endless loop, and the corosync log filled
 with messages like:

     May 09 22:25:40 [2149] TEST crmd: warning: crm_get_peer:
     Node 'TEST' and 'TEST' share the same cluster nodeid: 1678901258

 After this I set a different nodeid in /etc/corosync/corosync.conf on each
 test node, but it didn't help.
 So I set a different hostname for each server and then configured pacemaker
 just like the manual, except for the hostname. The neutron-dhcp-agent and
 neutron-metadata-agent work well, but the neutron-l3-agent does not (the VM
 instance cannot access the external net; furthermore, the gateway of the VM
 instance can't be reached either).
 After two days of checking, I finally found that we can use neutron
 l3-agent-router-remove network1_l3_agentid external-router-id and neutron
 l3-agent-router-add network2_l3_agentid external-router-id to let the backup
 l3-agent take over when the former network node is down (assuming the two
 nodes' names are network1 and network2). Alternatively, we can update the
 MySQL table routerl3agentbindings in the neutron database directly. If it
 makes sense, I think we can change the script neutron-agent-l3: in its
 neutron_l3_agent_start() function, only a few lines are needed to make it
 work well.
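 (The direct-database variant would be a single UPDATE issued through the
 mysql client; the column names l3_agent_id and router_id are assumptions
 about the binding table's schema, and the angle-bracket values are
 placeholders:)

     # Sketch: re-point the binding row at the surviving agent directly.
     mysql neutron -e "UPDATE routerl3agentbindings \
         SET l3_agent_id='<network2_l3_agent_id>' \
         WHERE router_id='<external_router_id>';"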
 
 Walter Xu

___
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack

