Re: [ovs-discuss] Error ovs recursion limit reached on datapath

2019-06-27 Thread Heim, Dennis
I have out of band management configured with:

ovs-vsctl set controller  connection-mode=out-of-band
ovs-vsctl set bridge  other-config:disable-in-band=true

The out-of-band NIC is not added to an OVS bridge. The recursion errors show
up when I connect it to my OpenDaylight controller (v0.8.4). With OVS 2.5.5,
it works perfectly.

f0fc01b4-904d-407b-a3bb-03fd3e004926
    Manager "tcp:10.246.49.188:6640"
        is_connected: true
    Bridge data
        Controller "tcp:10.246.49.188:6633"
            is_connected: true
        fail_mode: secure
        Port "vlan1"
            tag: 1
            Interface "vlan1"
                type: internal
        Port "ens256"
            tag: 10
            Interface "ens256"
        Port "ens224"
            tag: 1
            Interface "ens224"
        Port "vxlan3"
            Interface "vxlan3"
                type: vxlan
                options: {key=flow, remote_ip="10.246.48.149"}
        Port data
            Interface data
                type: inte

Dennis Heim | Domain Architect (Collaboration Labs)
World Wide Technology, Inc. | +1 314-212-1814


-Original Message-
From: Ben Pfaff  
Sent: Thursday, June 27, 2019 10:01 AM
To: Heim, Dennis 
Cc: ovs-discuss@openvswitch.org
Subject: Re: [ovs-discuss] Error ovs recursion limit reached on datapath

On Thu, Jun 27, 2019 at 04:48:45AM +, Heim, Dennis wrote:
> Any idea what causes the error message "ovs recursion limit reached on 
> datapath"? I have the configuration working on 2.5.5, but if I run 2.9 
> or 2.11, I get that error message.

It could be a bug in OVS or it could be an OpenFlow flow table that does 
something odd, for example, recursively executing connection tracking or 
tunneling from a system to itself.  You might be able to track it down by 
looking at the kernel flows with "ovs-dpctl dump-flows" or by tracing 
microflows or packets with "ovs-appctl ofproto/trace".
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [HELP]Question about fdb entry size

2019-06-27 Thread Ben Pfaff
This is a pretty extreme situation.  OVS isn't optimized for it.  You
might need to adjust the code in ofproto-dpif-upcall.c by hand to tune
for it.  If you come up with some changes that improve performance and
are unlikely to substantially negatively affect performance in more
common situations, then we'd be grateful to have the patches.
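One possible explanation for the behaviour described in the quoted message
below is that the revalidators do not treat the configured flow-limit as a
fixed target: they can lower the effective limit when dumping the kernel flows
takes too long. The following is only a sketch of that idea.
`adjust_flow_limit` is a hypothetical name, and the thresholds (2000 ms,
1300 ms, 1000 ms, steps of 1000 flows) are assumptions for illustration that
should be checked against ofproto-dpif-upcall.c before drawing conclusions.

```c
#include <assert.h>

/* Hypothetical sketch, not OVS code: recompute the effective flow limit
 * after one revalidator dump that took 'duration_ms' milliseconds.
 * All thresholds below are assumptions for illustration. */
static unsigned int
adjust_flow_limit(unsigned int flow_limit, unsigned int configured_max,
                  unsigned int n_flows, unsigned int duration_ms)
{
    assert(configured_max >= 1000);

    if (duration_ms < 1) {
        duration_ms = 1;                     /* Avoid dividing by zero. */
    }

    if (duration_ms > 2000) {
        /* Very slow dump: cut the limit in proportion to the delay. */
        flow_limit /= duration_ms / 1000;
    } else if (duration_ms > 1300) {
        /* Somewhat slow dump: back off by a quarter. */
        flow_limit = flow_limit * 3 / 4;
    } else if (duration_ms < 1000
               && flow_limit < n_flows * 1000ULL / duration_ms) {
        /* Fast dump with headroom: grow slowly. */
        flow_limit += 1000;
    }

    /* Keep the result between 1000 and the configured maximum. */
    if (flow_limit < 1000) {
        flow_limit = 1000;
    }
    if (flow_limit > configured_max) {
        flow_limit = configured_max;
    }
    return flow_limit;
}
```

Under these assumptions, a single 3-second dump cuts a 500k limit to about a
third, while recovery proceeds only 1000 flows at a time, so a heavily loaded
system could settle well below the configured value. If that is what happens
here, the dump duration is the quantity to tune for.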

On Thu, Jun 27, 2019 at 10:38:20AM +0800, txfh2...@aliyun.com wrote:
> Dear Ben:
> Sorry for my mistake; yes, the fdb entry max limit is 1000k.
> I have found in my test that when the number of test packet flows goes
> beyond 200k, throughput declines, as the kernel flow limit is 200k. The
> revalidator threads will delete kernel flow entries to keep the flow count
> below 200k, am I right?
> But even though I have set the flow-limit to 500k, I have found that the
> kernel flow count still declines to around 200k after a few minutes. I do
> not know the reason. I have read the "revalidatorwhat" slides (2014 OVS
> conf) but still cannot find a clue.
> Thanks for your reply.
> 
> TIMO
> 
> ---Original---
> From: "Ben Pfaff"
> Date: Wed, Jun 26, 2019 23:11 PM
> To: "txfh2007";
> Cc: "ovs-discuss";
> Subject: Re: [ovs-discuss] [HELP]Question about fdb entry size
> 
> On Wed, Jun 26, 2019 at 09:18:12PM +0800, txfh2007 via discuss wrote:
> > I have a question about OVS fdb entry size and aging time. I have found
> > that the max fdb entry size is hard-coded in mac_learning.c: max_entries
> > is 100k, and the longest aging time is 3600s.
> > 
> > In my test environment, packet forwarding is based on the OVS normal
> > action, and my test center can generate about 200k flows simultaneously.
> > So performance is affected by the max entry size (presumably fdb entries
> > are being evicted by new packets). Can we enlarge the max_entries limit,
> > and what are the side effects?
> 
> It looks to me like the maximum is 1 million:
> 
> /* Sets the maximum number of entries in 'ml' to 'max_entries', adjusting it
>  * to be within a reasonable range. */
> void
> mac_learning_set_max_entries(struct mac_learning *ml, size_t max_entries)
> {
>     ml->max_entries = (max_entries < 10 ? 10
>                        : max_entries > 1000 * 1000 ? 1000 * 1000
>                        : max_entries);
> }



Re: [ovs-discuss] Error ovs recursion limit reached on datapath

2019-06-27 Thread Ben Pfaff
On Thu, Jun 27, 2019 at 04:48:45AM +, Heim, Dennis wrote:
> Any idea what causes the error message "ovs recursion limit reached on
> datapath"? I have the configuration working on 2.5.5, but if I run 2.9
> or 2.11, I get that error message.

It could be a bug in OVS or it could be an OpenFlow flow table that does
something odd, for example, recursively executing connection tracking or
tunneling from a system to itself.  You might be able to track it down
by looking at the kernel flows with "ovs-dpctl dump-flows" or by tracing
microflows or packets with "ovs-appctl ofproto/trace".
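As a rough illustration of the kind of loop described above: a datapath can
guard re-entrant action execution with a depth counter, so that a packet
tunneled back into the same system is dropped rather than recursing forever.
The sketch below is not OVS code. `RECURSION_LIMIT`, `struct packet`, and
`execute_actions` are hypothetical names, and the limit value is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RECURSION_LIMIT 5   /* Assumed value, for illustration only. */

/* Hypothetical packet: 'loops_back' models a flow whose actions send the
 * packet back into the same datapath (e.g. a tunnel to itself). */
struct packet {
    bool loops_back;
};

/* Returns 0 if the packet leaves the datapath normally, or -1 if the
 * recursion limit was reached and the packet had to be dropped. */
static int
execute_actions(const struct packet *pkt, int depth)
{
    assert(pkt != NULL);

    if (depth > RECURSION_LIMIT) {
        return -1;              /* Drop: probable configuration error. */
    }
    if (pkt->loops_back) {
        /* Re-entering the datapath costs one level of nesting. */
        return execute_actions(pkt, depth + 1);
    }
    return 0;
}
```

In this model a flow table that only forwards never hits the limit, while one
that feeds its own output back into itself always does, which is why tracing
a single packet through the tables is a reasonable first diagnostic.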


[ovs-discuss] Re: [HELP]Question about fdb entry size

2019-06-27 Thread txfh2007 via discuss
Dear Ben
   Sorry for my mistake; yes, the max limit of fdb entries is 1000k.
   I have found that when the number of test packet flows goes beyond 200k,
throughput declines. I guess the reason is that the kernel flow limit is
200k: the revalidator threads will delete kernel flow entries to keep the
flow count below 200k, am I right?
   But even though I have set the flow-limit to 500k, I have found that the
kernel flow count still declines to around 200k after a few minutes. I don't
know the reason. I have read the "revalidatorwhat" slides (2014 OVS conf) but
still cannot find a clue.
Thanks for your reply.

TIMO

On Wed, Jun 26, 2019 at 09:18:12PM +0800, txfh2007 via discuss wrote:
> I have a question about OVS fdb entry size and aging time. I have found
> that the max fdb entry size is hard-coded in mac_learning.c: max_entries
> is 100k, and the longest aging time is 3600s.
> 
> In my test environment, packet forwarding is based on the OVS normal
> action, and my test center can generate about 200k flows simultaneously.
> So performance is affected by the max entry size (presumably fdb entries
> are being evicted by new packets). Can we enlarge the max_entries limit,
> and what are the side effects?

It looks to me like the maximum is 1 million:

/* Sets the maximum number of entries in 'ml' to 'max_entries', adjusting it
 * to be within a reasonable range. */
void
mac_learning_set_max_entries(struct mac_learning *ml, size_t max_entries)
{
    ml->max_entries = (max_entries < 10 ? 10
                       : max_entries > 1000 * 1000 ? 1000 * 1000
                       : max_entries);
}
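
To make the clamping concrete, here is a minimal standalone sketch of the same
expression. `clamp_max_entries` is a hypothetical helper name, not an OVS
function; it only mirrors the ternary quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of OVS): mirrors the clamping expression
 * in mac_learning_set_max_entries() quoted above. */
static size_t
clamp_max_entries(size_t max_entries)
{
    size_t clamped = (max_entries < 10 ? 10
                      : max_entries > 1000 * 1000 ? 1000 * 1000
                      : max_entries);

    /* The result always lies in the range [10, 1000000]. */
    assert(clamped >= 10 && clamped <= 1000 * 1000);
    return clamped;
}
```

With this sketch, clamp_max_entries(200000) returns 200000, so a request for
200k entries is within range, while clamp_max_entries(2000000) is cut down to
1000000 and clamp_max_entries(5) is raised to 10.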


Re: [ovs-discuss] [OVN] Aging mechanism for MAC_Binding table

2019-06-27 Thread Ben Pfaff
On Tue, Jun 25, 2019 at 01:05:21PM +0200, Daniel Alvarez Sanchez wrote:
> Lately we've been trying to solve certain issues related to stale
> entries in the MAC_Binding table (e.g. [0]). On the other hand, for
> the OpenStack + Octavia (Load Balancing service) use case, we see that
> a reused VIP can be as well affected by stale entries in this table
> due to the fact that it's never bound to a VIF so ovn-controller won't
> claim it and send the GARPs to update the neighbors.
> 
> I'm not sure if other scenarios may suffer from this issue but seems
> reasonable to have an aging mechanism (as we discussed at some point
> in the past) that makes unused/old entries to expire. After talking to
> Numan on IRC, since a new pinctrl thread has been introduced recently
> [1], it'd be nice to implement this aging mechanism there.
> At the same time we'd be also reducing the amount of entries for long
> lived systems as it'd grow indefinitely.
> 
> Any thoughts?
> 
> Thanks!
> Daniel
> 
> PS. With regards to the 'unused' vs 'old' entries I think it has to be
> 'old' rather than 'unused' as I don't see a way to reset the TTL of a
> MAC_Binding entry when we see packets coming. The implication is that
> we'll be seeing ARPs sent out more often when perhaps they're not
> needed. This also leads to the discussion of making the cache timeout
> configurable.

I've always considered the MAC_Binding implementation incomplete because
of this issue and others.  ovn/TODO.rst says:

* Dynamic IP to MAC binding enhancements.

  OVN has basic support for establishing IP to MAC bindings dynamically, using
  ARP.

  * Ratelimiting.

    From casual observation, Linux appears to generate at most one ARP per
    second per destination.

    This might be supported by adding a new OVN logical action for
    rate-limiting.

  * Tracking queries

    It's probably best to only record in the database responses to queries
    actually issued by an L3 logical router, so somehow they have to be
    tracked, probably by putting a tentative binding without a MAC address
    into the database.

  * Renewal and expiration.

    Something needs to make sure that bindings remain valid and expire those
    that become stale.

    One way to do this might be to add some support for time to the database
    server itself.

  * Table size limiting.

    The table of MAC bindings must not be allowed to grow unreasonably large.

  * MTU handling (fragmentation on output)

So, what do we do about it?  First, I think that adding support for time
to the database server is a terrible idea (even though I think I wrote
the above originally).  Let's not do that.  The following is some
"thinking out loud" on the subject.

I think there's a challenge around which ovn-controller should take care
of a given MAC_Binding.  We don't want every ovn-controller expiring
every binding.  Ideally, we want exactly one ovn-controller expiring a
binding.  One way would be to add an owner column (but it would be
better if we don't need it).

If we want to keep track of "unused" bindings, I can imagine a
statistical mechanism to do that.  Any user of a binding occasionally
and probabilistically changes a serial number column that we'd introduce
into the MAC_Binding table (this could be optimized to not bother if it
has changed recently).  The owner checks the serial number every so
often and if it hasn't changed then it deletes the row.
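
The statistical mechanism described above could be sketched roughly as
follows. This is an in-memory model only, not OVN code: `struct binding`,
`binding_use`, and `owner_check` are hypothetical names, the serial number
column does not exist in the MAC_Binding table today, and a real
implementation would go through the southbound database instead of a C
struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical in-memory stand-in for a MAC_Binding row. */
struct binding {
    unsigned int serial;        /* Bumped now and then by users. */
    unsigned int owner_seen;    /* Serial the owner saw at its last check. */
    bool deleted;
};

/* Called by any user of the binding.  Bumps the serial with probability
 * 'p', so that most uses cause no database write. */
static void
binding_use(struct binding *b, double p)
{
    assert(b != NULL && p >= 0.0 && p <= 1.0);

    /* rand() / (RAND_MAX + 1.0) is uniform in [0, 1). */
    if (!b->deleted && rand() / ((double) RAND_MAX + 1.0) < p) {
        b->serial++;
    }
}

/* Called periodically by the owner.  Deletes the binding if nobody bumped
 * the serial since the previous check.  Returns true once it is deleted. */
static bool
owner_check(struct binding *b)
{
    assert(b != NULL);

    if (!b->deleted) {
        if (b->serial == b->owner_seen) {
            b->deleted = true;          /* Unused for a whole interval. */
        } else {
            b->owner_seen = b->serial;  /* Still in use; remember serial. */
        }
    }
    return b->deleted;
}
```

The trade-off in this sketch is between write load and false expiry: with n
uses per check interval, the chance that none of them bumps the serial is
(1 - p)^n, so a small p is only safe for bindings that are used often or
checked rarely.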

The owner could also occasionally revalidate the binding.

Any thoughts?

Thanks,

Ben.