Steps to debug.

  1.  Understand where exactly the problem lies
     *   Are you not able to reach the floating IP of instances?
        *   First start a continuous ping from a machine outside OpenStack to 
the floating IP.
        *   Go to the network node. Find the interface of the router that 
attaches your external network to br-ex (the external bridge; you should see 
it in bridge_mappings, as the one with no VLAN ID range in its corresponding 
network_vlan_ranges entry).
        *   Note: This interface might not be in the network node host's 
default namespace. It would exist inside the namespace that was created for 
your router. The namespace for your router would normally be named something 
like 'qrouter-<router_id>', and you can list it using the 'ip netns' command.
        *   Run 'tcpdump -lennvi <interface_name>'. You have to execute 
tcpdump inside the namespace mentioned above, which you can do with 'ip 
netns exec <namespace_name> tcpdump -lennvi <interface_name>'.
        *   In your tcpdump, do you see the ping requests arriving?
           *   No?
              *   If you do not see them, it might be that the physical 
network interface (say eth3) attached to br-ex is not in promiscuous mode or 
is not up.
              *   In that case run 'ip link set <physical_interface> up' and 
'ip link set <physical_interface> promisc on'.
           *   Yes?
              *   Go on to the next step. Find the network interface attaching 
your router (the external router) to your instance's network. Again, it will 
be inside the same network namespace, so run tcpdump there.
              *   Here you should see the same ping requests, except that the 
destination IP should be the instance's private IP and not the floating IP. 
If this is not happening, the problem lies in your neutron L3 agent and/or 
firewall driver.
                 *   If this is happening too, move on to the next question 
below.
     *   Are the instances not able to reach each other through their private 
IPs?
        *   This could mean that your instance is also unable to reach its 
gateway router, the router that is responsible for floating IP mapping and 
inter-subnet connectivity.
        *   To check this, start a continuous ping from one of the instances 
in OpenStack to the gateway router interface for that subnet.
        *   Trace where your packets are dropped using tcpdump. Below is the 
list of interfaces to check, in order from instance to router.
           *   The tap device attached to the instance. You can find this on 
the OpenStack dashboard page of the network.
           *   'int-br-eth1'
           *   'phy-br-eth1'. At this interface the ping packets should carry 
a VLAN tag (if you are using VLAN mode).
           *   eth1 (I am assuming that your physnet is bridged to br-eth1 and 
eth1 is attached to br-eth1). Here the packets should carry the VLAN ID that 
was assigned to the OpenStack network when you created it.
           *   eth1 of the network node, then 'phy-br-eth1' and 'int-br-eth1' 
of the network node, and then the interface of the router in the instance's 
network.
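
As a rough illustration of the namespace steps above, here is a minimal shell 
sketch. The helper names router_ns and capture_cmd are hypothetical, the 
trailing 'icmp' capture filter is an addition of mine to keep the output 
limited to ping traffic, and the router ID and interface name must come from 
your own deployment ('ip netns' and 'ip addr' inside the namespace will show 
them). The helper only prints the command so you can review it before running 
it as root on the network node.

```shell
#!/bin/sh
# Sketch only: router_ns and capture_cmd are hypothetical helper names.

# Namespace name for a router, following the 'qrouter-<router_id>' pattern.
router_ns() {
    printf 'qrouter-%s\n' "$1"
}

# Print (not run) the tcpdump command for one interface in that namespace.
# The 'icmp' filter is an assumption added here to show only ping traffic.
# Usage: capture_cmd <router_id> <interface_name>
capture_cmd() {
    printf 'ip netns exec %s tcpdump -lennvi %s icmp\n' "$(router_ns "$1")" "$2"
}
```

To look for the router's interfaces first, something like 
'ip netns exec "$(router_ns <router_id>)" ip addr' should list them; the same 
capture_cmd call can then be repeated for the external-facing and the 
instance-facing interface in turn.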


I agree it's too cryptic and would not make sense at first look, but if you 
study the way the neutron openvswitch agent works, you will see the flow I 
have mentioned above. If you could tell me where exactly your packets go 
missing, I could find a possible reason and a solution to prevent outages.


There is, however, another way to debug: using ovs-ofctl dump-flows on br-int 
and br-eth1 on both the compute and network nodes. But this assumes that all 
flows are correctly programmed.
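
For the dump-flows approach, a small filter can make the output easier to 
read. The sketch below assumes VLAN mode and assumes the flows use the usual 
'dl_vlan=' match and 'mod_vlan_vid:' action text in ovs-ofctl output; the 
helper names flows_for_vlan and dropped_flows are hypothetical. It only 
narrows down what ovs-ofctl prints and does not tell you whether the flows 
are the right ones.

```shell
#!/bin/sh
# Sketch only: flows_for_vlan and dropped_flows are hypothetical helper names.

# Keep only flows that match on, or rewrite to, the given VLAN ID.
# Usage: ovs-ofctl dump-flows br-eth1 | flows_for_vlan <vlan_id>
flows_for_vlan() {
    grep -E "(dl_vlan=|mod_vlan_vid:)$1([^0-9]|\$)"
}

# Keep only flows whose action is drop.
# Usage: ovs-ofctl dump-flows br-int | dropped_flows
dropped_flows() {
    grep 'actions=drop'
}
```

Running the same pipeline on br-int and br-eth1, on both the compute and the 
network node, and comparing the n_packets counters while the ping is running 
may show which flow the packets are hitting.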


Thank you,

Ageeleshwar K





________________________________
From: Akshat Kansal [akshatk...@gmail.com]
Sent: Thursday, April 10, 2014 1:26 PM
To: Robert van Leeuwen
Cc: openstack@lists.openstack.org
Subject: Re: [Openstack] quantum openvswitch agent on compute nodes stops 
working.

Thanks Robert,

Yes, other components still work; openvswitch works fine, as no flows are 
dropped.
I do not even see any errors in the logs, but it still stops working.

Also, after a restart it works fine again, so I don't suspect the space in 
the rabbit message queue to be the problem.

Regards
Akshat



On Thu, Apr 10, 2014 at 11:23 AM, Robert van Leeuwen 
<robert.vanleeu...@spilgames.com> wrote:
> I am facing an issue where, all of a sudden, the quantum openvswitch agent 
> stops working, all the VMs lose
> connectivity, and even provisioning fails.
>
>I also want to understand what the role of the quantum openvswitch agent is.
>
>Any pointer will be helpful.

The agent sets up the Openvswitch flows (ovs-ofctl dump-flows).
I think it also creates the interfaces to be patched into the VMs.

What do the openvswitch logs say? Do other components still work?

I think I saw something similar when rabbitmq did not have enough space (it 
needs at least 1GB free space).
You would be able to connect to rabbitmq (so no errors in the logs) but it 
stopped processing messages.

Cheers,
Robert van Leeuwen

