Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent start? why and how avoid?

2014-12-09 Thread James Polley
On Fri, Oct 31, 2014 at 3:28 PM, Ben Nemec openst...@nemebean.com wrote:

 On 10/29/2014 10:17 AM, Kyle Mestery wrote:
  On Wed, Oct 29, 2014 at 7:25 AM, Hly henry4...@gmail.com wrote:
 
 
  Sent from my iPad
 
  On 2014-10-29, at 8:01 PM, Robert van Leeuwen 
 robert.vanleeu...@spilgames.com wrote:
 
  I find our current design is to remove all flows and then add them back
  entry by entry; this causes every network node to break all tunnels to
  the other network nodes and all compute nodes.
  Perhaps a way around this would be to add a flag on agent startup
  which would have it skip reprogramming flows. This could be used for
  the upgrade case.
 
  I hit the same issue last week and filed a bug here:
  https://bugs.launchpad.net/neutron/+bug/1383674
 
  From an operator's perspective this is VERY annoying, since you also
  cannot push any config change that requires/triggers a restart of the
  agent.
  E.g. something simple like changing a log setting becomes a hassle.
  I would prefer the default behaviour to be to not clear the flows, or
  at the least a config option to disable it.
 
 
  +1, we also suffered from this even when applying a very small patch
 
  I'd really like to get some input from the tripleo folks, because they
  were the ones who filed the original bug here and were hit by the
  agent NOT reprogramming flows on agent restart. It does seem fairly
  obvious that adding an option around this would be a good way forward,
  however.

 Since nobody else has commented, I'll put in my two cents (though I
 might be overcharging you ;-).  I've also added the TripleO tag to the
 subject, although with Summit coming up I don't know if that will help.


Summit did lead to some delays - I started this response and then got
distracted, and only just found the draft again


 Anyway, if the bug you're referring to is the one I think, then our
 issue was just with the flows not existing.  I don't think we care
 whether they get reprogrammed on agent restart or not as long as they
 somehow come into existence at some point.


Is https://bugs.launchpad.net/bugs/1290486 the bug you're thinking of?

That seems to have been solved with https://review.openstack.org/#/c/96919/

My memory of that problem is that prior to 96919, when the daemon was
restarted, existing flows were thrown away. We'd end up with just a NORMAL
flow, which didn't route the traffic where we need it.

The fix implemented there seems to have been to implement a canary rule to
detect when this happens - ie, detect that all the existing flows had been
thrown away. Once we know they've been thrown away, we know we need to
recreate the flows that were thrown away when the daemon restarted.
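
To make the canary idea concrete, here is a minimal sketch, not the code from 96919. It assumes flows are plain dicts and that one table id is reserved for the canary; CANARY_TABLE, canary_present and flows_after_check are names I made up for illustration.

```python
# Hypothetical sketch of a canary check; names are illustrative, not neutron's API.
CANARY_TABLE = 23  # assumed table id reserved for the canary flow


def canary_present(flows):
    """Return True if any flow dict sits in the canary table."""
    return any(flow.get("table") == CANARY_TABLE for flow in flows)


def flows_after_check(current_flows, desired_flows):
    """If the canary is gone, assume the flows were lost and reinstall them.

    If the canary is still there, leave the switch untouched.
    """
    if canary_present(current_flows):
        return list(current_flows)
    return list(desired_flows) + [{"table": CANARY_TABLE, "actions": "drop"}]
```

The point of the design is that the check is cheap: one lookup tells the agent whether it has to reprogram anything at all.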

If my memory is correct (and it may not be, I'm not 100% sure I fully
understood the problem at the time), the root cause here is not the change
added in 96919 - by the time that code is triggered and the flows are
reprogrammed, they've already been lost.



 It's possible I'm wrong about that, and probably the best person to talk
 to would be Robert Collins since I think he's the one who actually
 tracked down the problem in the first place.


I think (if I'm looking at the right bug) that you're referring to his
comment:

we're trying to do things before ovs-db is up and running and neutron-
openvswitch-agent is not handling ovs-db being down properly - it should
back off and retry, or alternatively, do a full sync once the db is
available.


As far as I can tell, everything after that point (ie, once I got involved)
focused on the latter, which is why we ended up with the canary and the
reprogramming. Assuming he's right about the race condition, it sounds as
though fixing that might be preferable. Later discussion on this thread has
centered around a full flow-synchronization approach: it sounds to me as
though handling the db being unavailable will need to be part of that
approach (we don't want to synchronize towards no rules just because we
can't get a canonical list of rules from the DB)
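
A rough sketch of the "back off and retry" behaviour, under the assumption that the agent has some fetch function that raises when ovs-db is down. DbUnavailable, fetch_with_backoff and sync are hypothetical names, not anything in neutron.

```python
import time


class DbUnavailable(Exception):
    """Raised by the (hypothetical) fetch function when ovs-db is down."""


def fetch_with_backoff(fetch, retries=5, base_delay=0.5, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the delay after each failure."""
    delay = base_delay
    for attempt in range(retries):
        try:
            return fetch()
        except DbUnavailable:
            if attempt == retries - 1:
                raise
            sleep(delay)
            delay *= 2


def sync(fetch, apply_rules, **kwargs):
    """Apply rules only once a canonical list has actually been obtained."""
    try:
        rules = fetch_with_backoff(fetch, **kwargs)
    except DbUnavailable:
        return False  # keep existing flows; do not sync towards "no rules"
    apply_rules(rules)
    return True
```

When the DB never comes back, sync() returns False and touches nothing, so whatever flows are already on the switch stay in place.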


 -Ben


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent start? why and how avoid?

2014-11-28 Thread Erik Moe

Hi,

What is the status of this?

It looks like the simplistic approach might not be that far from flow 
synchronization. Both methods need to reinitialize internal structures so that 
they match the deployed configuration. For example, provision_local_vlan picks a 
free VLAN. This has to be the same one after restart.
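
A small sketch of what I mean, assuming the agent can read back each port's network id and VLAN tag after restart. The dict keys "net_id" and "tag" and the helper recover_vlan_map are hypothetical, and this provision_local_vlan is a stand-in, not the agent's real method.

```python
def recover_vlan_map(ports):
    """Rebuild {network_id: local_vlan} from port records read back from OVS.

    Each port is a dict with hypothetical keys "net_id" and "tag".
    """
    vlan_map = {}
    for port in ports:
        net_id, tag = port.get("net_id"), port.get("tag")
        if net_id is not None and tag is not None:
            vlan_map.setdefault(net_id, tag)
    return vlan_map


def provision_local_vlan(net_id, vlan_map, vlan_range=range(1, 4095)):
    """Reuse the recovered VLAN for net_id, else pick the lowest free one."""
    if net_id in vlan_map:
        return vlan_map[net_id]
    used = set(vlan_map.values())
    for vlan in vlan_range:
        if vlan not in used:
            vlan_map[net_id] = vlan
            return vlan
    raise RuntimeError("no free local VLAN")
```

With the map recovered first, a network that was on local VLAN 5 before the restart gets 5 again, so existing flows that match on that tag stay valid.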

Are you trying to also support an upgrade use case, not only agent restart?

/Erik


From: Damon Wang [mailto:damon.dev...@gmail.com]
Sent: 7 November 2014 11:27
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent 
start? why and how avoid?

Hi all,
Let me introduce the results of our experiment.
First we wrote a patch, https://review.openstack.org/#/c/131791/, and tried it 
in an experimental environment.
Bad things happened:
1. These are the old flows (the network node's br-tun; the previous version is 
around Icehouse):
cookie=0x0, duration=238379.566s, table=1, n_packets=373521, n_bytes=26981817, idle_age=0, hard_age=65534, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,21)
cookie=0x0, duration=238379.575s, table=1, n_packets=30101, n_bytes=3603857, idle_age=198, hard_age=65534, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
cookie=0x0, duration=238379.530s, table=20, n_packets=4957, n_bytes=631543, idle_age=198, hard_age=65534, priority=0 actions=resubmit(,21)
A broadcast or multicast packet is resubmitted to table 21. A unicast packet is 
resubmitted to table 20, whose default flow does nothing but resubmit to table 21.
The full sequence is:
from vxlan ports: table 0 -> table 3 -> table 10 (learn flows and insert them 
into table 20)
from br-int: table 0 -> table 1 -> (table 20) -> table 21

In the new version (around Juno), we discard table 1 and use table 2 instead:
cookie=0x0, duration=142084.354s, table=2, n_packets=175823, n_bytes=12323286, idle_age=0, hard_age=65534, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22)
cookie=0x0, duration=142084.364s, table=2, n_packets=861601, n_bytes=107499857, idle_age=0, hard_age=65534, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
But if we haven't removed all the old flows, table 1 will still exist; it will 
intercept packets and try to submit them to tables 21 and 20, whereas the 
correct tables are 22 and 20.
The full sequence is:
from vxlan ports: table 0 -> table 4 -> table 10
from br-int: table 0 -> table 2 -> (table 20, maybe output then!) -> table 22
Let's imagine we mix these up: because the flows in table 0 all have priority 1, 
we can't be sure packets will go to the right flow, so some packets may be 
submitted to table 21, which is quite unacceptable!
2. What's more, imagine that we use both vxlan and vlan as providers:
[ASCII diagram garbled in the archive. Recoverable labels: two namespaces (one 
with qg- and qr- ports, one with a tap port) attached to br-int; br-int 
connected to "ovs-br vlan" and to br-tun(vxlan); eth0 (ethernet card) below. 
The message is cut off at this point.]
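
To illustrate point 1 above, here is a small sketch that flags leftover flows by table id. It assumes the flows are available as dump-flows style text lines like the ones quoted in the message; table_of, stale_flows and the set of tables in use are illustrative assumptions, not agent code.

```python
import re

TABLE_RE = re.compile(r"\btable=(\d+)")


def table_of(flow_line):
    """Extract the table id from one flow line.

    Assumption: a line without a table= field is treated as table 0.
    """
    match = TABLE_RE.search(flow_line)
    return int(match.group(1)) if match else 0


def stale_flows(flow_lines, tables_in_use):
    """Return the lines whose table id the current agent version never uses."""
    return [line for line in flow_lines if table_of(line) not in tables_in_use]
```

Called with the new layout's tables (0, 2, 4, 10, 20, 22 going by the sequences above), this would pick out the old table=1 flows. It would not catch an old flow that sits in a table both versions use, such as the table 20 flow that still resubmits to 21, so it is only a partial check.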

Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent start? why and how avoid?

2014-11-07 Thread Damon Wang
On 5 November 2014 11:57, Erik Moe erik@ericsson.com wrote:



 Hi,



 I also agree, IMHO we need flow synchronization method so we can avoid
 network downtime and stray flows.



 Regards,

 Erik





 *From:* Germy Lure [mailto:germy.l...@gmail.com]
 *Sent:* 5 November 2014 10:46
 *To:* OpenStack Development Mailing List (not for usage questions)
 *Subject:* Re: [openstack-dev] [neutron][TripleO] Clear all flows when
 ovs agent start? why and how avoid?



 Hi Salvatore,

 A startup flag is really a simpler approach. But in what situation we
 should set this flag to remove all flows? upgrade? restart manually?
 internal fault?



 Indeed, only at the time that there are inconsistent(incorrect,
 unwanted, stable and so on) flows between agent and the ovs related, we
 need refresh flows. But the problem is how we know this? I think a startup
 flag is too rough, unless we can tolerate the inconsistent situation.



 Of course, I believe that turn off startup reset flows action can
 resolve most problem. The flows are correct most time after all. But
 considering NFV 5 9s, I still recommend flow synchronization approach.



 BR,

 Germy



 On Wed, Nov 5, 2014 at 3:36 PM, Salvatore Orlando sorla...@nicira.com
 wrote:

 From what I gather from this thread and related bug report, the change
 introduced in the OVS agent is causing a data plane outage upon agent
 restart, which is not desirable in most cases.



 The rationale for the change that introduced this bug was, I believe,
 cleaning up stale flows on the OVS agent, which also makes some sense.



 Unless I'm missing something, I reckon the best way forward is actually
 quite straightforward; we might add a startup flag to reset all flows and
 not reset them by default.

 While I agree the flow synchronisation process proposed in the
 previous post is valuable too, I hope we might be able to fix this with a
 simpler approach.



 Salvatore



 On 5 November 2014 04:43, Germy Lure germy.l...@gmail.com wrote:

 Hi,



 Consider what triggers an agent restart; I think it's nothing but:

 1). only the agent is restarted

 2). the host that the agent is deployed on is rebooted



 When the agent starts, the ovs may:

 a. have all correct flows

 b. have nothing at all

 c. have partly correct flows; the others may need to be reprogrammed,
 deleted or added



 In any case, I think both users and developers would be happy to see the
 system recover ASAP after an agent restart. Best would be for the agent to
 push only the incorrect flows and keep the correct ones. This ensures that
 traffic relying on correct flows keeps working while the agent starts.



 So, I suggest two solutions:

 1. The agent gets all flows from ovs and compares them with its local flows
 after restarting, and corrects only the ones that differ.

 2. Adapt ovs and the agent. The agent just pushes all flows (without
 removing any) every time, and ovs prepares two tables for switching flows
 (like an RCU lock).



 1 is recommended because of the 3rd-party vendors.



 BR,

 Germy






Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent start? why and how avoid?

2014-11-06 Thread Germy Lure
Hi Armando,
Static configuration really does introduce an unnecessary burden on the
operator. But I don't understand the approach you suggest exploring, although
it sounds interesting. Can you explain it in detail? Thank you.

BTW, as Sudhakar wrote, [1] attempted to implement flow synchronization,
but without any progress/updates. So how can we remind the registrant? Or,
if I want to participate in it, or even work on it alone, what do I need to
do? Register another BP?

[1]
https://blueprints.launchpad.net/neutron/+spec/neutron-agent-soft-restart

BR,
Germy


On Thu, Nov 6, 2014 at 2:59 AM, Armando M. arma...@gmail.com wrote:

 I would be open to making this toggle switch available, however I feel
 that doing it via static configuration can introduce unnecessary burden to
 the operator. Perhaps we could explore a way where the agent can figure
 which state it's supposed to be in based on its reported status?

 Armando

 On 5 November 2014 12:09, Salvatore Orlando sorla...@nicira.com wrote:

 I have no opposition to that, and I will be happy to assist reviewing the
 code that will enable flow synchronisation  (or to say it in an easier way,
 punctual removal of flows unknown to the l2 agent).

 In the meanwhile, I hope you won't mind if we go ahead and start making
 flow reset optional - so that we stop causing downtime upon agent restart.

 Salvatore


Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent start? why and how avoid?

2014-11-05 Thread Germy Lure
Hi Salvatore,
A startup flag is indeed a simpler approach. But in what situations should
we set this flag to remove all flows? Upgrade? Manual restart? Internal
fault?

Indeed, we only need to refresh flows when the flows held by the agent and
those in ovs are inconsistent (incorrect, unwanted, stale and so on). But the
problem is how we know this. I think a startup flag is too coarse, unless we
can tolerate the inconsistent situation.

Of course, I believe that turning off the flow reset at startup can resolve
most problems. The flows are correct most of the time, after all. But
considering NFV's five nines, I still recommend the flow synchronization
approach.

BR,
Germy

On Wed, Nov 5, 2014 at 3:36 PM, Salvatore Orlando sorla...@nicira.com
wrote:

 From what I gather from this thread and related bug report, the change
 introduced in the OVS agent is causing a data plane outage upon agent
 restart, which is not desirable in most cases.

 The rationale for the change that introduced this bug was, I believe,
 cleaning up stale flows on the OVS agent, which also makes some sense.

 Unless I'm missing something, I reckon the best way forward is actually
 quite straightforward; we might add a startup flag to reset all flows and
 not reset them by default.
 While I agree the flow synchronisation process proposed in the previous
 post is valuable too, I hope we might be able to fix this with a simpler
 approach.

 Salvatore


Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent start? why and how avoid?

2014-11-05 Thread Erik Moe

Hi,

I also agree. IMHO we need a flow synchronization method so we can avoid network 
downtime and stray flows.
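
As a rough illustration of what such a synchronization step could compute, assuming flows can be represented as hashable values (for example tuples of match and action strings); diff_flows is a hypothetical helper, not existing agent code.

```python
def diff_flows(installed, desired):
    """Compare two collections of hashable flow descriptions.

    Returns (to_add, to_remove); flows present in both are left alone.
    """
    installed, desired = set(installed), set(desired)
    return desired - installed, installed - desired
```

Flows that appear in both sets are never touched, which is what avoids the downtime; the to_remove set is exactly the stray flows.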

Regards,
Erik


From: Germy Lure [mailto:germy.l...@gmail.com]
Sent: 5 November 2014 10:46
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent 
start? why and how avoid?

Hi Salvatore,
A startup flag is really a simpler approach. But in what situation we should 
set this flag to remove all flows? upgrade? restart manually? internal fault?

Indeed, only at the time that there are inconsistent(incorrect, unwanted, 
stable and so on) flows between agent and the ovs related, we need refresh 
flows. But the problem is how we know this? I think a startup flag is too 
rough, unless we can tolerate the inconsistent situation.

Of course, I believe that turn off startup reset flows action can resolve most 
problem. The flows are correct most time after all. But considering NFV 5 9s, I 
still recommend flow synchronization approach.

BR,
Germy


Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent start? why and how avoid?

2014-11-05 Thread Salvatore Orlando
I have no objection to that, and I will be happy to assist in reviewing the
code that will enable flow synchronisation (or, to put it more simply,
targeted removal of flows unknown to the l2 agent).

In the meantime, I hope you won't mind if we go ahead and start making
flow reset optional, so that we stop causing downtime upon agent restart.
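
Roughly what I have in mind, as a sketch only. The flag name --reset-flows and both helpers are hypothetical; the real agent reads its options through its own configuration machinery, not argparse.

```python
import argparse


def parse_args(argv):
    """Parse agent options; --reset-flows is a hypothetical flag name."""
    parser = argparse.ArgumentParser(prog="ovs-agent-sketch")
    parser.add_argument(
        "--reset-flows",
        action="store_true",
        help="delete all existing flows before reprogramming (default: keep)",
    )
    return parser.parse_args(argv)


def startup_flows(existing, desired, reset):
    """Return the flows left on the bridge after startup.

    With reset=True everything is wiped first; otherwise existing flows stay
    and only missing ones are added, so traffic keeps flowing.
    """
    if reset:
        return list(desired)
    return list(existing) + [f for f in desired if f not in existing]
```

Note that with the default (no reset) a stale flow survives the restart; that is the trade-off the synchronisation work would later remove.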

Salvatore

On 5 November 2014 11:57, Erik Moe erik@ericsson.com wrote:



 Hi,



 I also agree, IMHO we need flow synchronization method so we can avoid
 network downtime and stray flows.



 Regards,

 Erik






Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent start? why and how avoid?

2014-11-05 Thread Gariganti, Sudhakar Babu
I guess this blueprint [1] attempted to address the flow synchronization issue
during agent restart.
But I see no progress or updates. It would be helpful to know how far it has
got.

[1] https://blueprints.launchpad.net/neutron/+spec/neutron-agent-soft-restart

On a different note, I agree with Salvatore on starting with the simple
approach and improving it further.

Regards,
Sudhakar.

From: Salvatore Orlando [mailto:sorla...@nicira.com]
Sent: Wednesday, November 05, 2014 4:39 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent 
start? why and how avoid?

I have no opposition to that, and I will be happy to assist reviewing the code
that will enable flow synchronisation (or to say it in an easier way, punctual
removal of flows unknown to the l2 agent).
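
One possible way to picture "removal of flows unknown to the l2 agent" is to tag
every flow with a per-run identifier, similar in spirit to an OpenFlow cookie,
and to delete only the flows that still carry an older tag once the agent has
re-installed everything it knows about. The sketch below is only an
illustration under that assumption: FlowTable, install, remove_unknown and
new_run_cookie are hypothetical names, and the table is an in-memory stand-in,
not an OVS or Neutron API.

```python
import uuid
from typing import Dict, Tuple

# Hypothetical simplification: a flow is identified by (table, match string).
Match = Tuple[int, str]


class FlowTable:
    """In-memory stand-in for a bridge's flow table (not a real OVS API)."""

    def __init__(self) -> None:
        # match -> (actions, cookie of the agent run that installed it)
        self.flows: Dict[Match, Tuple[str, int]] = {}

    def install(self, match: Match, actions: str, cookie: int) -> None:
        # An existing entry with the same match is overwritten in place,
        # so there is no moment at which the match has no flow at all.
        self.flows[match] = (actions, cookie)

    def remove_unknown(self, current_cookie: int) -> int:
        """Delete flows not re-installed by the current run; return the count."""
        stale = [m for m, (_, c) in self.flows.items() if c != current_cookie]
        for m in stale:
            del self.flows[m]
        return len(stale)


def new_run_cookie() -> int:
    """Pick a random 64-bit tag for this agent run."""
    return uuid.uuid4().int >> 64
```

With this shape, a restart never empties the table: known flows are refreshed
first, and only leftovers from earlier runs are removed afterwards.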

In the meantime, I hope you won't mind if we go ahead and start making flow
reset optional - so that we stop causing downtime upon agent restart.

Salvatore

On 5 November 2014 11:57, Erik Moe erik@ericsson.com wrote:

Hi,

I also agree, IMHO we need a flow synchronization method so we can avoid
network downtime and stray flows.

Regards,
Erik


From: Germy Lure [mailto:germy.l...@gmail.com]
Sent: den 5 november 2014 10:46
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent 
start? why and how avoid?

Hi Salvatore,
A startup flag is certainly the simpler approach. But in what situations should
we set this flag to remove all flows? On upgrade? On a manual restart? On an
internal fault?

Indeed, we only need to refresh flows when the agent and OVS hold inconsistent
(incorrect, unwanted, stale and so on) flows. But the problem is: how do we
know that? I think a startup flag is too coarse, unless we can tolerate the
inconsistency.

Of course, I believe that turning off the flow reset at startup can resolve
most problems. The flows are correct most of the time, after all. But
considering the five-nines availability that NFV requires, I still recommend
the flow synchronization approach.

BR,
Germy


Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent start? why and how avoid?

2014-11-05 Thread Erik Moe

Ok, I don’t mind starting with the simplistic approach.

Regards,
Erik



Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent start? why and how avoid?

2014-11-05 Thread Armando M.
I would be open to making this toggle switch available, however I feel that
doing it via static configuration could put an unnecessary burden on the
operator. Perhaps we could explore a way for the agent to figure out which
state it's supposed to be in based on its reported status?

Armando


Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent start? why and how avoid?

2014-11-04 Thread Manish Godara

Clearing all flows upon agent restart is a major issue, imho.  We should really
look at this with higher priority than the modular L2 agent, as the timeline of
that refactor isn't clear.  Whatever the original issue was, I think we ought
to be able to find a better solution that doesn't disrupt the network.  I agree
that reconciling data after a restart is not straightforward in all scenarios,
but there should be an option to just do basic sanity checks and not interrupt
existing flows.  I'd like to help out on this (if needed).  There is a
blueprint [1] that was suggested, but I'm not sure who the owner is or what the
status is.  If anyone is working on this and is at the summit this week, please
let me know.  We can meet on one of the days here at the summit.





thanks,

manish





[1] Adding an option of Soft Restart in neutron agent along with o... : 
Blueprints : neutron



  


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent start? why and how avoid?

2014-11-04 Thread Germy Lure
Hi,

Considering what triggers an agent restart, I think there are only two cases:
1). only the agent is restarted
2). the host that the agent is deployed on is rebooted

When the agent starts, OVS may:
a. have all the correct flows
b. have no flows at all
c. have partly correct flows; the others may need to be reprogrammed,
deleted or added

In any case, I think both users and developers would be happy to see the
system recover ASAP after an agent restart. Ideally the agent pushes only
the incorrect flows and keeps the correct ones. This ensures that traffic
relying on correct flows keeps working while the agent starts.

So, I suggest two solutions:
1. After restarting, the agent gets all flows from OVS and compares them
with its local flows, then corrects only the ones that differ.
2. Adapt both OVS and the agent. The agent just pushes all flows (without
removing any) every time, and OVS prepares two tables and switches between
them (like an RCU lock).

Solution 1 is recommended because of third-party vendors.
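
To make solution 1 a bit more concrete, here is a rough sketch of the
comparison step: given the flows currently installed and the flows the agent
wants, work out which to add, which to modify and which to delete, leaving
everything that already matches untouched. The Flow type, its fields and the
diff_flows helper are assumptions made up for this illustration; real OVS
flows carry more fields and this is not agent code.

```python
from typing import List, NamedTuple, Tuple


class Flow(NamedTuple):
    """Hypothetical, simplified flow entry."""
    table: int
    priority: int
    match: str    # e.g. "in_port=1,dl_vlan=10"
    actions: str  # e.g. "output:2"


def flow_key(flow: Flow) -> Tuple[int, int, str]:
    # Two flows describe the same entry if table, priority and match agree.
    return (flow.table, flow.priority, flow.match)


def diff_flows(
    installed: List[Flow], desired: List[Flow]
) -> Tuple[List[Flow], List[Flow], List[Flow]]:
    """Return (to_add, to_modify, to_delete); matching flows are left alone."""
    inst = {flow_key(f): f for f in installed}
    want = {flow_key(f): f for f in desired}
    # Desired entries that are not on the switch yet.
    to_add = [f for k, f in want.items() if k not in inst]
    # Entries present on both sides whose actions differ.
    to_modify = [
        f for k, f in want.items()
        if k in inst and inst[k].actions != f.actions
    ]
    # Entries on the switch that the agent no longer wants.
    to_delete = [f for k, f in inst.items() if k not in want]
    return to_add, to_modify, to_delete
```

In case (a) above all three lists come back empty, so nothing is touched; in
case (b) everything lands in to_add; case (c) yields a mix of the three.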

BR,
Germy




Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent start? why and how avoid?

2014-11-04 Thread Salvatore Orlando
From what I gather from this thread and related bug report, the change
introduced in the OVS agent is causing a data plane outage upon agent
restart, which is not desirable in most cases.

The rationale for the change that introduced this bug was, I believe,
cleaning up stale flows on the OVS agent, which also makes some sense.

Unless I'm missing something, I reckon the best way forward is actually
quite straightforward; we might add a startup flag to reset all flows and
not reset them by default.
While I agree the flow synchronisation process proposed in the previous
post is valuable too, I hope we might be able to fix this with a simpler
approach.
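
As a rough illustration of the simpler approach, the sketch below shows a
startup flag that defaults to keeping existing flows and only wipes them when
explicitly asked. The flag name --reset-flows-on-start and the two callbacks
are hypothetical and chosen for this example; they are not an existing agent
option or API.

```python
import argparse
from typing import Callable, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sketch of an agent startup flag")
    # Hypothetical flag; off by default so a plain restart keeps the flows.
    parser.add_argument(
        "--reset-flows-on-start",
        action="store_true",
        default=False,
        help="delete all existing flows before programming new ones",
    )
    return parser.parse_args(argv)


def start_agent(
    args: argparse.Namespace,
    delete_all_flows: Callable[[], None],
    install_flows: Callable[[], None],
) -> List[str]:
    """Run the startup sequence and return the steps that were taken."""
    steps: List[str] = []
    if args.reset_flows_on_start:
        # Only this branch interrupts the data plane.
        delete_all_flows()
        steps.append("reset")
    # Flows are (re)programmed either way; existing ones stay in place
    # until they are overwritten.
    install_flows()
    steps.append("install")
    return steps
```

With no flag given, start_agent only installs flows; passing
--reset-flows-on-start restores the clear-then-install behaviour for the cases
where an operator wants stale flows removed.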

Salvatore



Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent start? why and how avoid?

2014-10-31 Thread Ben Nemec
On 10/29/2014 10:17 AM, Kyle Mestery wrote:
 On Wed, Oct 29, 2014 at 7:25 AM, Hly henry4...@gmail.com wrote:


 Sent from my iPad

 On 2014-10-29, at 8:01 PM, Robert van Leeuwen 
 robert.vanleeu...@spilgames.com wrote:

 I find that our current design removes all flows and then adds them back
 entry by entry; this will cause every network node to break all tunnels
 to the other network nodes and all compute nodes.
 Perhaps a way around this would be to add a flag on agent startup
 which would have it skip reprogramming flows. This could be used for
 the upgrade case.

 I hit the same issue last week and filed a bug here:
 https://bugs.launchpad.net/neutron/+bug/1383674

 From an operator's perspective this is VERY annoying since you also cannot
 push any config changes that require/trigger a restart of the agent.
 e.g. something simple like changing a log setting becomes a hassle.
 I would prefer the default behaviour to be to not clear the flows, or at the
 least a config option to disable it.


 +1, we also suffered from this even when applying a very small patch

 I'd really like to get some input from the tripleo folks, because they
 were the ones who filed the original bug here and were hit by the
 agent NOT reprogramming flows on agent restart. It does seem fairly
 obvious that adding an option around this would be a good way forward,
 however.

Since nobody else has commented, I'll put in my two cents (though I
might be overcharging you ;-).  I've also added the TripleO tag to the
subject, although with Summit coming up I don't know if that will help.

Anyway, if the bug you're referring to is the one I think, then our
issue was just with the flows not existing.  I don't think we care
whether they get reprogrammed on agent restart or not as long as they
somehow come into existence at some point.

It's possible I'm wrong about that, and probably the best person to talk
to would be Robert Collins since I think he's the one who actually
tracked down the problem in the first place.

-Ben

