[openstack-dev] [Openstack-dev][Neutron] Handling of ovs command errors

2013-11-25 Thread Salvatore Orlando
Hi,

I've recently been debugging some issues with the OVS agent, and I
found that in many cases (possibly every case) the code just logs
errors from ovs-vsctl and ovs-ofctl without taking any action in the
control flow.

For instance, the routine which should do the wiring for a port, port_bound
[1], does not react in any way if it fails to configure the local vlan,
which I guess means the port would not be able to send/receive any data.

I'm pretty sure there's a good reason for this which I'm missing at the
moment. I am asking because I see a pretty large number of ALARM_CLOCK
errors returned by OVS commands in gate logs (see bug [2]), and I'm not
sure whether it's ok to handle them as the OVS agent is doing nowadays.

Regards,
Salvatore

[1]
https://github.com/openstack/neutron/blob/master/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py#L599
[2] https://bugs.launchpad.net/neutron/+bug/1254520
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-dev][Neutron] Handling of ovs command errors

2013-11-25 Thread Kyle Mestery (kmestery)
Thanks for bringing this up, Salvatore. It looks like the underlying run_vsctl 
[1] provides an ability to raise exceptions on errors, but this is not used by 
most of the callers of run_vsctl. Do you think we should be returning the 
exceptions back up the stack to callers to handle? I think that may be a good 
first step.
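
To make the pattern under discussion concrete, here is a minimal sketch of a command wrapper that always logs a failure and only propagates it when the caller asks. It is not the ovs_lib code: run_command, OvsCommandError and the check_error flag name are assumptions chosen for illustration.

```python
import logging
import subprocess

LOG = logging.getLogger(__name__)


class OvsCommandError(Exception):
    """Hypothetical exception type; not part of Neutron."""


def run_command(args, check_error=False):
    """Run a command; log failures and optionally propagate them.

    Returns the command's stdout on success, or None if the command
    failed and check_error is False.
    """
    try:
        result = subprocess.run(args, capture_output=True, text=True,
                                check=True)
        return result.stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        # The failure is always logged, whatever the caller asked for.
        LOG.error("Unable to execute %s: %s", args, exc)
        if check_error:
            # Only callers that opt in ever see the failure.
            raise OvsCommandError(str(exc)) from exc
        return None
```

With the default check_error=False a caller cannot tell a failed command from one that printed nothing useful, which is roughly the situation described above.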

Thanks,
Kyle

[1] 
https://github.com/openstack/neutron/blob/master/neutron/agent/linux/ovs_lib.py#L52



Re: [openstack-dev] [Openstack-dev][Neutron] Handling of ovs command errors

2013-11-25 Thread Salvatore Orlando
Thanks Kyle,

More comments inline.

Salvatore


On 25 November 2013 16:03, Kyle Mestery (kmestery) kmest...@cisco.com wrote:

 Thanks for bringing this up, Salvatore. It looks like the underlying
 run_vsctl [1] provides an ability to raise exceptions on errors, but this
 is not used by most of the callers of run_vsctl. Do you think we should be
 returning the exceptions back up the stack to callers to handle? I think
 that may be a good first step.


I think it makes sense to start handling errors; as they often happen in
the agent's rpc loop, simply raising will probably just cause the agent to
crash.
I looked at the code again and it really does seem to be silently ignoring
errors from ovs commands.
This actually makes sense in some cases. For instance the l3 agent might
remove a qr-xxx or qg-xxx port while the l2 agent is in the middle of its
iteration.

There are however cases in which the exception must be handled.
In cases like the ALARM_CLOCK error, either a retry mechanism or marking
the port for re-syncing at the next iteration might make sense.
Other error cases might be unrecoverable; for instance when a port
disappears. In that case it seems reasonable to put the relevant neutron
port in ERROR state, so that the user is aware that the port is no longer
usable.
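
A rough sketch of that idea, assuming errors can be sorted into "transient" and "unrecoverable" classes. The exception names, process_ports and the wire_port callback are all hypothetical and do not exist in the agent; the point is only that failures are caught inside the loop, so the loop keeps running, and are turned into either a resync set or an ERROR set.

```python
class TransientOvsError(Exception):
    """Hypothetical: a command timed out (e.g. ALARM_CLOCK) and may work later."""


class PortGoneError(Exception):
    """Hypothetical: the port no longer exists on the bridge."""


def process_ports(port_ids, wire_port, max_attempts=2):
    """Wire each port and sort the failures by how they should be handled.

    Returns (resync, errored): ports to revisit at the next iteration and
    ports whose status should be reported as ERROR.
    """
    resync, errored = set(), set()
    for port_id in port_ids:
        for attempt in range(1, max_attempts + 1):
            try:
                wire_port(port_id)
                break  # success, move on to the next port
            except TransientOvsError:
                # Retry immediately; after the last attempt defer to resync.
                if attempt == max_attempts:
                    resync.add(port_id)
            except PortGoneError:
                # Retrying cannot help; report the port as ERROR instead.
                errored.add(port_id)
                break
    return resync, errored
```

The caller would then feed resync back into the next iteration and update the status of the ports in errored.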




Re: [openstack-dev] [Openstack-dev][Neutron] Handling of ovs command errors

2013-11-25 Thread Irena Berezovsky
Salvatore,
Very good questions.
You raised your concerns about the OVS agent, but I think they apply to any 
other neutron agent that requires an additional service to perform actions. 
At least, I was dealing with similar issues for the Mellanox L2 agent. It makes 
sense to me that if you fail to bind the port, this should be indicated by the 
neutron port status.
Another issue I had, which I am trying to solve with the patch at 
https://review.openstack.org/#/c/48842/ 
is the situation where the agent fails to communicate with the external daemon 
that is responsible for the actual programming. After a number of retries with 
an increasing back-off interval between them, the agent will be terminated if 
it still fails to communicate. Does it make sense?
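
As a minimal sketch of the retry scheme described above (not the code from the patch under review): call_with_backoff and its parameters are hypothetical names, and ConnectionError stands in for whatever error the daemon client actually raises.

```python
import time


def call_with_backoff(func, max_retries=5, initial_delay=1.0, factor=2.0,
                      sleep=time.sleep):
    """Call func until it succeeds, waiting longer after each failure.

    Re-raises the last exception once max_retries attempts have failed, so
    the caller can decide to terminate the agent.
    """
    delay = initial_delay
    for attempt in range(1, max_retries + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == max_retries:
                raise  # out of retries: let the caller shut down
            sleep(delay)
            delay *= factor  # increase the back-off interval
```

The sleep argument is injectable only so the behaviour can be exercised without real waiting.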

Regards,
Irena 

-Original Message-
From: Kyle Mestery (kmestery) [mailto:kmest...@cisco.com] 
Sent: Monday, November 25, 2013 11:16 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Openstack-dev][Neutron] Handling of ovs command 
errors

I think it makes sense to address these things. Want me to file a bug?
