Hi all, I posted the question below on Launchpad (quantum), but I didn't get any response from the team; maybe it's not as active as the OpenStack mailing list.
I am facing the issue described in this question: https://answers.launchpad.net/quantum/+question/221283. I did some analysis and shared it on the same question; you can also find it in the forwarded mail below. I am looking for suggestions from the OpenStack networking experts on how to debug this issue further. My deployment is stuck because of it, so I really appreciate your help.

Thanks
Anil

---------- Forwarded message ----------
From: Anil Vishnoi <question221...@answers.launchpad.net>
Date: Fri, Feb 8, 2013 at 3:41 AM
Subject: Re: [Question #221283]: VM instance is not able to get IP address
To: vishnoia...@gmail.com

Your question #221283 on quantum changed:
https://answers.launchpad.net/quantum/+question/221283

You gave more information on the question:

Hi Team,

I debugged this issue further and found a workaround. I hesitate to call it a "workaround"; it is really more of a hack. As I mentioned in the description above, because of the action=drop flows, DHCP packets were being dropped and never reached the DHCP agent, so the agent could not respond with a DHCPOFFER.

First I resolved this error:

[Feb 07 17:32:40|00001|netdev_linux|WARN|/sys/class/net/tap9fdb5c15-26/carrier: open failed: ]

with the following steps:

1. Disable network namespaces for the L3 agent and the DHCP agent by setting use_namespaces = False in their respective configuration files.
2. Delete the port (tap9fdb5c15-26) from the br-int bridge and add it back as a tagged internal port. Quick instructions:

   root@management:~# ovs-vsctl del-port tap9fdb5c15-26
   root@management:~# ovs-vsctl add-port br-int tap9fdb5c15-26
   root@management:~# ovs-vsctl set port tap9fdb5c15-26 tag=1
   root@management:~# ovs-vsctl set Interface tap9fdb5c15-26 type=internal

3. Restart both services; they will then create the tap devices outside the network namespace.
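Step 2 can also be scripted. Below is a minimal Python sketch, assuming the bridge, port name and VLAN tag used above; the helper names readd_internal_port_cmds and run are hypothetical. It deletes the port (with --if-exists, so a missing port is not an error) and then issues the add-port and the two set commands as a single ovs-vsctl transaction chained with "--", so the port is never left on br-int untagged. By default it only prints the commands (dry run).

```python
import shlex
import subprocess


def readd_internal_port_cmds(bridge, port, tag):
    """Return ovs-vsctl argv lists that recreate `port` on `bridge`
    as an internal port carrying VLAN `tag`."""
    return [
        # --if-exists turns the delete into a no-op if the port is already gone.
        ["ovs-vsctl", "--if-exists", "del-port", port],
        # One transaction: add the port, set its tag and its interface type.
        ["ovs-vsctl", "add-port", bridge, port,
         "--", "set", "Port", port, f"tag={tag}",
         "--", "set", "Interface", port, "type=internal"],
    ]


def run(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in cmds:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Values from the controller node in this thread (dry run only).
    run(readd_internal_port_cmds("br-int", "tap9fdb5c15-26", 1))
```

Stop the DHCP agent before running it with dry_run=False, as described above, so the agent does not touch the port while it is being recreated.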
If network namespaces are enabled, ifconfig will not show this tap device in its output, but 'ip netns exec dhcpnsXXXX ip -d link' will show it. In my setup I followed the steps above, but if you don't want to disable namespaces, you can instead stop the DHCP agent, delete the port from br-int and restart the service. That may resolve this error (it did in my setup).

So in my setup, namespaces are disabled. The following is the output of ovs-dpctl:

root@management:~# ovs-dpctl show
system@br-eth1:
    lookups: hit:151651 missed:37759 lost:0
    flows: 3
    port 0: br-eth1 (internal)
    port 1: eth1
    port 3: phy-br-eth1
system@br-int:
    lookups: hit:1183 missed:23283 lost:0
    flows: 1
    port 0: br-int (internal)
    port 6: tap9fdb5c15-26 (internal)
    port 7: int-br-eth1
system@br-ex:
    lookups: hit:96895 missed:67156 lost:0
    flows: 16
    port 0: br-ex (internal)
    port 1: eth0

A DHCP request is a broadcast packet, and it takes the following path to reach br-int:

port 1: eth1 (br-eth1) --> port 7: int-br-eth1 (br-int)

The packet is dropped there because of the following rule installed on the br-int bridge:

cookie=0x0, duration=11422.615s, table=0, n_packets=16711, n_bytes=1178562, priority=2,in_port=7 actions=drop

Ideally it should be forwarded to port 6: tap9fdb5c15-26 (internal) on br-int, so that it can reach the DHCP agent. So I modified the above flow to the following:

cookie=0x0, duration=3169.501s, table=0, n_packets=2562, n_bytes=228241, priority=2,in_port=7 actions=output:6

and also installed the following rule to route the DHCPOFFER packet back:

cookie=0x0, duration=4536.551s, table=0, n_packets=233, n_bytes=28896, priority=2,in_port=6 actions=output:7

After installing these two flow rules, the DHCP agent received the request and responded with a DHCPOFFER.
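Finding the offending drop rules and rewriting them by hand is error-prone, so here is a minimal Python sketch of the same hack. The helper names are hypothetical; it only parses lines in the format of `ovs-ofctl dump-flows <bridge>` output shown above and builds an `ovs-ofctl add-flow` command. It relies on OpenFlow semantics: adding a flow with the same match and priority as an existing one replaces it (its counters and duration reset). One caution, assumed to matter here: flows match OpenFlow port numbers, which `ovs-ofctl show <bridge>` reports, and these are not guaranteed to equal the datapath numbers printed by `ovs-dpctl show`.

```python
import re

# Fields printed by dump-flows that are counters or metadata, not match fields.
NON_MATCH_FIELDS = {"cookie", "duration", "n_packets", "n_bytes",
                    "idle_age", "hard_age"}


def parse_flow(line):
    """Split one dump-flows line into ({field: value}, actions)."""
    match_part, _, actions = line.strip().partition(" actions=")
    fields = {}
    for item in re.split(r"[,\s]+", match_part):
        if item:
            key, sep, value = item.partition("=")
            # Bare keywords such as "arp" or "udp" carry no value.
            fields[key] = value if sep else True
    return fields, actions


def find_drop_flows(dump, in_port):
    """Return dump-flows lines that drop everything arriving on in_port."""
    hits = []
    for line in dump.splitlines():
        if "actions=" not in line:
            continue  # skip the reply header and blank lines
        fields, actions = parse_flow(line)
        if fields.get("in_port") == str(in_port) and actions == "drop":
            hits.append(line.strip())
    return hits


def replace_actions_cmd(bridge, line, new_actions):
    """Build an add-flow command with the same match and priority as
    `line`, so the new flow replaces it with `new_actions`."""
    fields, _ = parse_flow(line)
    match = ",".join(
        key if value is True else f"{key}={value}"
        for key, value in fields.items()
        if key not in NON_MATCH_FIELDS
    )
    return ["ovs-ofctl", "add-flow", bridge, f"{match},actions={new_actions}"]


if __name__ == "__main__":
    drop = ("cookie=0x0, duration=11422.615s, table=0, n_packets=16711, "
            "n_bytes=1178562, priority=2,in_port=7 actions=drop")
    print(replace_actions_cmd("br-int", drop, "output:6"))
```

For the br-int rule above this yields `ovs-ofctl add-flow br-int "table=0,priority=2,in_port=7,actions=output:6"`; the return flow is the same call with in_port=6 and output:7.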
root@management:~# tail -f /var/log/syslog
Feb 8 03:26:16 management dnsmasq-dhcp[25811]: DHCPREQUEST(tap9fdb5c15-26) 192.168.0.3 fa:16:3e:93:74:73
Feb 8 03:26:16 management dnsmasq-dhcp[25811]: DHCPACK(tap9fdb5c15-26) 192.168.0.3 fa:16:3e:93:74:73 192-168-0-3

The DHCP response packet takes the following path:

port 6: tap9fdb5c15-26 (internal) (br-int) ---> port 7: int-br-eth1 (br-int) ---> port 3: phy-br-eth1 (br-eth1) ---> port 1: eth1 (br-eth1)

and that way it leaves the controller node. But another rule installed on the br-eth1 bridge was dropping the response:

cookie=0x0, duration=2669.22s, table=0, n_packets=173, n_bytes=18144, priority=2,in_port=3 actions=drop

I changed this flow to

cookie=0x0, duration=2669.22s, table=0, n_packets=173, n_bytes=18144, priority=2,in_port=3 actions=output:1

so now the packet can leave the controller machine.

Now for the compute node side. The following is the ovs-dpctl output on my compute node:

system@br-eth1:
    lookups: hit:404442 missed:110048 lost:0
    flows: 1
    port 0: br-eth1 (internal)
    port 1: eth1
    port 3: phy-br-eth1
system@br-int:
    lookups: hit:1884 missed:71022 lost:0
    flows: 0
    port 0: br-int (internal)
    port 3: int-br-eth1
    port 4: qvo819abf08-ca
    port 6: tap718d359b-d1   <<VM connected to this tap device

The response packet should take the following path:

port 1: eth1 (br-eth1) ---> port 3: int-br-eth1 (br-int) ---> port 6: tap718d359b-d1 (br-int)

but the following flow rule installed on the br-int bridge was dropping it:

cookie=0x0, duration=1671.356s, table=0, n_packets=1068, n_bytes=99127, priority=2,in_port=3 actions=drop

so I modified this flow to

cookie=0x0, duration=1671.356s, table=0, n_packets=1068, n_bytes=99127, priority=2,in_port=3 actions=output:6

With that, the packet was forwarded to my VM, and I can now see that the IP address 192.168.0.3 is assigned to it.

Ideally this is the job of the quantum plugin, but I'm not sure why it is dropping all the packets on both sides.
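The packet paths above were worked out by reading port numbers off `ovs-dpctl show`. A minimal Python sketch of that lookup follows; the helper names parse_dpctl_show and describe_path are hypothetical, and it assumes the per-bridge `system@<bridge>:` layout seen in this thread (newer Open vSwitch releases may list a single shared datapath instead). The numbers it returns are datapath port numbers.

```python
import re


def parse_dpctl_show(text):
    """Map each datapath in `ovs-dpctl show` output to {port_no: name}."""
    datapaths = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"^system@(\S+):$", line)
        if header:
            current = datapaths.setdefault(header.group(1), {})
            continue
        port = re.match(r"^port (\d+): (\S+)", line)
        if port and current is not None:
            # Keep only the device name; drop suffixes like "(internal)".
            current[int(port.group(1))] = port.group(2)
    return datapaths


def describe_path(datapaths, hops):
    """Render (datapath, port_no) hops in the notation used in the mail."""
    return " ---> ".join(
        f"port {no}: {datapaths[dp][no]} ({dp})" for dp, no in hops
    )
```

Feeding it the compute node output and the hops ("br-eth1", 1), ("br-int", 3), ("br-int", 6) reproduces the expected response path written out above.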
The above exercise establishes that the DHCP agent is working fine here; it is the packet forwarding that is causing the issue, and as far as I understand, specifically the Open vSwitch plugin. I am seeking suggestions from the networking experts on the list: what could cause this issue? Does the Open vSwitch plugin depend on the Linux bridge or brcompat module to work properly? On the controller node, neither the bridge module nor the brcompat module is loaded. Obviously this hack won't work in all other cases, so we need to resolve the issue at the plugin level. Please suggest!

Thanks
Anil

-- 
Thanks & Regards
--Anil Kumar Vishnoi
_______________________________________________ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp