Re: [ovs-discuss] TCP TLV option population by OVS?

2018-10-31 Thread benli ye
Thanks Ben for replying. It seems worth having, since users may need to change
or add TCP options.
For example, in Load Balancer Fullnat mode, if we want to use OVS to implement
the LB Fullnat function, we could add the client IP address in a TCP option.

Thanks,
Daniel.
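OVS does not support this today. Purely to illustrate what adding the client IP address in a TCP option involves, here is a minimal sketch that appends such an option to the options area of a TCP header. The option kind (254, one of the experimental kinds reserved by RFC 4727), its 8-byte layout (kind, length, client port, client IPv4 address) and append_client_addr_opt() are all assumptions made up for this example, not an existing OVS or kernel API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical option: kind 254 is reserved for experiments (RFC 4727).
 * Assumed layout (8 bytes): kind, length, client port (2), client IPv4 (4). */
#define CLIENT_ADDR_OPT_KIND  254
#define CLIENT_ADDR_OPT_LEN   8
#define TCP_MAX_OPT_SPACE     40   /* 60-byte max header - 20-byte base */
#define TCP_OPT_NOP           1

/* Appends the client-address option to 'opts', which already holds 'len'
 * bytes of options and has room for TCP_MAX_OPT_SPACE bytes.  'port' and
 * 'ipv4' are in host byte order and are written big-endian.  Returns the
 * new options length, NOP-padded to a multiple of 4 bytes (the TCP data
 * offset counts 32-bit words), or -1 if the option does not fit. */
static int
append_client_addr_opt(uint8_t *opts, size_t len, uint16_t port, uint32_t ipv4)
{
    size_t padded;

    if (len + CLIENT_ADDR_OPT_LEN > TCP_MAX_OPT_SPACE) {
        return -1;
    }
    opts[len++] = CLIENT_ADDR_OPT_KIND;
    opts[len++] = CLIENT_ADDR_OPT_LEN;
    opts[len++] = (uint8_t) (port >> 8);
    opts[len++] = (uint8_t) port;
    opts[len++] = (uint8_t) (ipv4 >> 24);
    opts[len++] = (uint8_t) (ipv4 >> 16);
    opts[len++] = (uint8_t) (ipv4 >> 8);
    opts[len++] = (uint8_t) ipv4;

    /* Pad with NOPs; cannot exceed 40 because 40 is a multiple of 4. */
    padded = (len + 3) & ~(size_t) 3;
    memset(&opts[len], TCP_OPT_NOP, padded - len);
    return (int) padded;
}
```

A real implementation would also have to raise the TCP data offset, adjust the IP total length and recompute checksums, and it only works while the options area still has 8 free bytes.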

> On Nov 1, 2018, at 1:37 AM, Ben Pfaff  wrote:
> 
> On Wed, Oct 31, 2018 at 01:23:15PM +0800, benli ye wrote:
>> Does anyone know whether OVS currently supports adding a TLV option to the TCP header?
> 
> No, it doesn't.

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN SB DB server overload when restarted at large scale environment

2018-10-31 Thread Ben Pfaff
On Tue, Oct 30, 2018 at 11:51:05PM -0700, Han Zhou wrote:
> On Tue, Oct 30, 2018 at 11:15 AM Ben Pfaff  wrote:
> >
> > On Wed, Oct 24, 2018 at 05:42:15PM -0700, Han Zhou wrote:
> > > On Tue, Sep 25, 2018 at 10:18 AM Han Zhou  wrote:
> > > >
> > > >
> > > >
> > > > On Thu, Sep 20, 2018 at 4:43 PM Ben Pfaff  wrote:
> > > > >
> > > > > On Thu, Sep 13, 2018 at 12:28:27PM -0700, Han Zhou wrote:
> > > > > > In scalability tests with ovn-scale-test, the SB ovsdb-server load
> > > > > > is not a problem, at least with 1k HVs. However, if we restart
> > > > > > ovsdb-server, the SB ovsdb-server becomes an obvious bottleneck,
> > > > > > depending on the number of HVs and the scale of logical objects,
> > > > > > e.g. the number of logical ports.
> > > > > >
> > > > > > In our test with 1k HVs and 20k logical ports (200 lports * 100
> > > > > > lswitches connected by a single logical router), restarting the SB
> > > > > > ovsdb-server resulted in 100% CPU usage of ovsdb-server for more
> > > > > > than 1 hour. All HVs (and northd) reconnect and resync the large
> > > > > > amount of data at the same time. Considering the amount of data and
> > > > > > the JSON-RPC cost, this is not surprising.
> > > > > >
> > > > > > At this scale, the SB ovsdb-server process has RES 303848KB before
> > > > > > restart. Likely a big proportion of this is SB DB data that is going
> > > > > > to be transferred to all 1,001 clients, which is about 300GB in
> > > > > > total. With a 10Gbps NIC, even the pure network transmission would
> > > > > > take ~5 minutes. Considering that the actual size of the JSON-RPC
> > > > > > messages would be much bigger than the raw data, and the processing
> > > > > > cost of the single-threaded ovsdb-server, 1 hour is reasonable.
> > > > > >
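The transfer estimate above can be checked with a quick back-of-the-envelope calculation. This is a minimal sketch that assumes the whole RES (303848 KB, taken here as KiB) is snapshot data sent to each of the 1,001 clients, and ignores JSON-RPC framing and server CPU cost; resync_seconds() is a made-up helper. 303848 KiB times 1001 is about 311 GB, which at 10 Gbit/s is roughly 250 seconds, a little over 4 minutes of raw transfer.

```c
/* Rough lower bound on SB DB resync time after a restart: every client
 * downloads a full snapshot over one shared NIC.  Ignores JSON-RPC
 * encoding overhead and server CPU cost, so real times are longer. */
static double
resync_seconds(double snapshot_bytes, unsigned int clients, double link_bps)
{
    return snapshot_bytes * clients * 8.0 / link_bps;
}

/* Example: resync_seconds(303848.0 * 1024, 1001, 10e9) is about 249 s. */
```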
> > > > > > In addition to the CPU cost of ovsdb-server, memory consumption
> > > > > > could also be a problem. Since all clients are syncing data from it,
> > > > > > RES increases quickly, probably due to buffering, and spiked to 10G
> > > > > > at some point. After all the syncing finished, RES went back to a
> > > > > > similar size as before the restart. The client side (ovn-controller,
> > > > > > northd) also saw a memory spike: the new snapshot of the whole DB is
> > > > > > downloaded as one huge JSON-RPC message, which is buffered until it
> > > > > > has been completely received. RES peaked at double its original
> > > > > > size, and then went back to the original size after the first round
> > > > > > of processing of the new snapshot. This means that when deploying
> > > > > > OVN, this memory spike should be taken into account for the SB DB
> > > > > > restart scenario, especially on the central node.
> > > > > >
> > > > > > Here is some of my brainstorming on how we could improve this (very
> > > > > > rough ideas at this stage).
> > > > > > There are two directions: 1) reducing the size of data to be
> > > > > > transferred, and 2) scaling out ovsdb-server.
> > > > > >
> > > > > > 1) Reducing the size of data to be transferred.
> > > > > >
> > > > > > 1.1) Using BSON instead of JSON. It could reduce the size of data,
> > > > > > but I am not sure yet how much it would help, since most of the data
> > > > > > are strings. It might even be worse, since the bottleneck is not yet
> > > > > > the network bandwidth but the processing power of ovsdb-server.
> > > > > >
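To make the doubt about string-heavy data concrete: per the BSON specification, a string element is a type byte (0x02), the key as a NUL-terminated cstring, an int32 length, and the value followed by a NUL. The minimal sketch below compares that with a plain JSON "key":"value" member, ignoring escaping and separators; the function names are illustrative only. For every string member, BSON comes out 2 bytes larger, so string-dominated SB rows would not shrink.

```c
#include <stddef.h>
#include <string.h>

/* Size of one "key":"value" member in JSON: the two strings, four
 * quote characters and a colon.  Escaping and commas are ignored. */
static size_t
json_string_member_size(const char *key, const char *value)
{
    return strlen(key) + strlen(value) + 5;
}

/* Size of the same member as a BSON string element (type 0x02):
 * type byte + key + NUL + int32 length + value + NUL. */
static size_t
bson_string_element_size(const char *key, const char *value)
{
    return 1 + (strlen(key) + 1) + 4 + (strlen(value) + 1);
}
```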
> > > > > > 1.2) Move northd processing to the HVs: only the relevant NB data
> > > > > > needs to be transferred, which is much smaller than the SB DB
> > > > > > because there are no logical flows. However, this would put more
> > > > > > processing load on ovn-controller on the HVs. It is also a huge
> > > > > > architecture change.
> > > > > >
> > > > > > 1.3) Incremental data transfer. The IDL works like a cache. Today,
> > > > > > when the connection is reset, the cache has to be rebuilt. But if
> > > > > > the client knows the version of its current snapshot, then even
> > > > > > when the connection is reset it can tell the newly started server
> > > > > > which version it has, so that only the delta between the current
> > > > > > data and the new data is transferred, as if the server had not been
> > > > > > restarted at all.
> > > > > >
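The version-based resync in 1.3 can be sketched as a lookup in a transaction history: the client reports the id of the last transaction it applied, and the server either returns the transactions after it or signals that a full snapshot is needed. This is a hypothetical illustration: struct txn, struct history and delta_since() are invented for this sketch, and it assumes the restarted server still has the recent history (for example, recovered from its on-disk log). An id the server no longer knows forces the same full resync that happens today.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transaction history kept by the server, oldest first. */
struct txn {
    uint64_t id;          /* monotonically increasing transaction id */
    const char *change;   /* opaque description of the row changes */
};

struct history {
    const struct txn *txns;
    size_t n;
};

/* Given the id of the last transaction the client has applied, stores in
 * '*first' the index of the first transaction it still needs and returns
 * how many transactions it must receive.  Returns -1 if 'last_id' is not
 * in the history (too old, or history lost), in which case the client
 * has to fall back to downloading a full snapshot. */
static long
delta_since(const struct history *h, uint64_t last_id, size_t *first)
{
    for (size_t i = 0; i < h->n; i++) {
        if (h->txns[i].id == last_id) {
            *first = i + 1;
            return (long) (h->n - *first);
        }
    }
    return -1;
}
```

A linear scan keeps the sketch short; since the ids are increasing, a binary search would work equally well.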
> > > > > > 2) Scaling out the ovsdb-server.
> > > > > >
> > > > > > 2.1) Currently ovsdb-server is single-threaded, so that single
> > > > > > thread has to handle transmission to all clients at 100% CPU. If it
> > > > > > were multi-threaded, more cores could be used to make this much
> > > > > > faster.
> > > > > >
> > > > > > 2.2) Using an ovsdb cluster. This feature is already supported, but
> > > > > > I haven't tested it in this scenario yet. If everything works as
> > > > > > expected, there can be 3 - 5 servers sharing the load, so the
> > > > > > transfer should be completed 3 - 5 times faster than it is right

Re: [ovs-discuss] OVS local port overloaded

2018-10-31 Thread Flavio Leitner
On Tue, Oct 30, 2018 at 10:29:37AM +0700, Soe Ye Htet wrote:
> Dear OvS Team,
> 
> I have one problem with Open vSwitch. Let me describe my simple test topology:
> OVS1(RYU)---OVS2. Instead of using OVS in-band mode, I configure my own
> predefined rules in OvS1 & 2 to implement the in-band scenario my work
> requires. The RYU controller can connect successfully to OvS1 and 2. Then an
> iperf3 connection was established between OvS1 and OvS2. OvS1 is the server
> and OvS2 is the receiver. After some time, the iperf3 connection breaks and
> the local port of OVS2 cannot transmit packets.

See if there is any relevant message in the ovs-vswitchd.log or
journal.

-- 
Flavio



Re: [ovs-discuss] [ovs-dev] Geneve remote_ip as flow for OVN hosts

2018-10-31 Thread Ben Pfaff
Honestly the best thing to do is probably to propose a design or, if
it's simple enough, to send a patch.  That will probably be more
effective at sparking a discussion.

On Wed, Oct 31, 2018 at 03:33:48PM +, venugopal iyer wrote:
> Hi:
> Just wanted to check if folks had any thoughts on the use case Girish
> outlined below. We do have a real use case for this and are interested in
> looking at options for supporting more than one VTEP IP. It is currently a
> limitation for us; we wanted to know if there are similar use cases folks
> are looking at or interested in addressing.
> 
> thanks,
> -venu
> 
> On Thursday, September 6, 2018, 9:19:01 AM PDT, venugopal iyer via dev wrote:
>  
> Would it be possible for the association  to be made when the logical
> port is instantiated on a node, and relayed to the SB by the controller?
> E.g. assuming a mechanism to specify/determine a physical port mapping for
> a logical port for a VM. The  mappings can be specified as configuration
> on the chassis. In the absence of physical port information for a logical
> port/VM, I suppose we could default to an encap-ip.
> 
> 
> just a thought,
> -venu
> On Wednesday, September 5, 2018, 2:03:35 PM PDT, Ben Pfaff wrote:
>
> How would OVN know which IP to use for a given logical port on a chassis?
> 
> I think that the "multiple tunnel encapsulations" is meant to cover,
> say, Geneve vs. STT vs. VXLAN, not the case you have in mind.
> 
> On Wed, Sep 05, 2018 at 09:50:32AM -0700, Girish Moodalbail wrote:
> > Hello all,
> > 
> > I would like to add more context here. In the diagram below
> > 
> > +------------------------------------+
> > |ovn-host                            |
> > |                                    |
> > |                                    |
> > |      +------------------------+    |
> > |      |         br-int         |    |
> > |      +-----+------------+-----+    |
> > |            |            |          |
> > |        +---v----+   +---v----+     |
> > |        | geneve |   | geneve |     |
> > |        +---+----+   +---+----+     |
> > |            |            |          |
> > |         +--v---+     +--v---+      |
> > |         | IP0  |     | IP1  |      |
> > |         +------+     +------+      |
> > +---------+ eth0 +-----+ eth1 +------+
> >           +------+     +------+
> > 
> > eth0 and eth1 are, say, each in its own physical segment. The VMs that are
> > instantiated on the above ovn-host will have multiple interfaces, and each
> > of those interfaces needs to be on a different Geneve VTEP.
> > 
> > I think the following entry in OVN TODOs (
> > https://github.com/openvswitch/ovs/blob/master/ovn/TODO.rst)
> > 
> > ---8<--8<---
> > Support multiple tunnel encapsulations in Chassis.
> > 
> > So far, both ovn-controller and ovn-controller-vtep only allow chassis to
> > have one tunnel encapsulation entry. We should extend the implementation to
> > support multiple tunnel encapsulations
> > ---8<--8<---
> > 
> > captures the above requirement. Is that the case?
> > 
> > Thanks again.
> > 
> > Regards,
> > ~Girish
> > 
> > 
> > 
> > 
> > On Tue, Sep 4, 2018 at 3:00 PM Girish Moodalbail wrote:
> > 
> > > Hello all,
> > >
> > > Is it possible to configure remote_ip as a 'flow' instead of an IP address
> > > (i.e., setting ovn-encap-ip to a single IP address)?
> > >
> > > Today, we have one VTEP endpoint per OVN host, and all the VMs that
> > > connect to br-int on that OVN host are reachable behind this VTEP
> > > endpoint. Is it possible to have multiple VTEP endpoints for a br-int
> > > bridge and use OpenFlow flows to select one of the VTEP endpoints?
> > >
> > >
> > > +------------------------------------+
> > > |ovn-host                            |
> > > |                                    |
> > > |                                    |
> > > |      +------------------------+    |
> > > |      |         br-int         |    |
> > > |      +-----+------------+-----+    |
> > > |            |            |          |
> > > |        +---v----+   +---v----+     |
> > > |        | geneve |   | geneve |     |
> > > |        +---+----+   +---+----+     |
> > > |            |            |          |
> > > |         +--v---+     +--v---+      |
> > > |         | IP0  |     | IP1  |      |
> > > |         +------+     +------+      |
> > > +---------+ eth0 +-----+ eth1 +------+
> > >           +------+     +------+
> > >
> > > Also, we don't want to bond eth0 and eth1 into a bond interface and then
> > > use bond's IP as VTEP endpoint.
> > >
> > > Thanks in advance,
> > > ~Girish
> > >
> > >
> > >
> > >
> 
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-discuss] TCP TLV option population by OVS?

2018-10-31 Thread Ben Pfaff
On Wed, Oct 31, 2018 at 01:23:15PM +0800, benli ye wrote:
> Does anyone know whether OVS currently supports adding a TLV option to the TCP header?

No, it doesn't.


Re: [ovs-discuss] OvS using newer DPDK

2018-10-31 Thread Ophir Munk
Guys,
Any comments on the OVS upgrade to DPDK 18.08?
https://patchwork.ozlabs.org/project/openvswitch/list/?series=72606

Regards,
Ophir

> -Original Message-
> From: Stokes, Ian [mailto:ian.sto...@intel.com]
> Sent: Wednesday, October 31, 2018 5:52 PM
> To: Andrzej Ostruszka ; ovs-discuss@openvswitch.org
> Cc: Ophir Munk 
> Subject: RE: [ovs-discuss] OvS using newer DPDK
> 
> > Hello all,
> >
> > I remember some time ago a topic was raised here about the new LTS
> > release. I'd like to ask a related question: which version of DPDK will
> > it be based on? 18.11 (which is going to be the new LTS release of DPDK)?
> >
> 
> Yes, the plan would be ideally to move to DPDK 18.11.
> 
> > If it is then is there anybody already working on that?
> 
> Yes, the dpdk_latest branch was set up for this purpose.
> 
> There are patches submitted to move OVS to use DPDK 18.08 first. From
> there a new set of patches will be created to move to DPDK 18.11. Once
> there is agreement and sign off from the OVS DPDK community we would
> look to apply those to the OVS master branch in time for the OVS 2.11
> release.
> 
> >
> > I'm asking these questions since I've nailed down the cause of the OvS
> > crashes on the Marvell Armada 8K board. They happen while attempting to
> > set the MTU, and there are some patches affecting MTU/MRU calculations
> > that might help.
> 
> Are these patches targeted at OVS project or the DPDK project?
> 
> > So basically I might attempt to backport them or try to get OvS
> > working with newer DPDK.
> 
> OVS is moving towards using DPDK LTS releases only for OVS releases and
> the master branch.
> 
> If the patches target DPDK then they could be backported to the relevant
> DPDK LTS releases. Once in place there you could also backport support to
> OVS 2.9 and OVS 2.10 which use DPDK 17.11.
> 
> > Since I prefer the latter I would like to join somebody doing this
> > update (I don't feel comfortable enough with OvS to do that on my
> > own).
> 
> Ok sure, there is not a patch to make OVS use DPDK 18.11 yet. That's in
> progress. I've cc'd Ophir who has been looking at this to date. Once there
> is a patch for 18.11, it would be a great help if you could test it with the
> Marvell device.
> 
> Thanks
> Ian
> >
> > Best regards
> > Andrzej


Re: [ovs-discuss] [ovs-dev] Geneve remote_ip as flow for OVN hosts

2018-10-31 Thread venugopal iyer via discuss
Hi:
Just wanted to check if folks had any thoughts on the use case Girish outlined
below. We do have a real use case for this and are interested in looking at
options for supporting more than one VTEP IP. It is currently a limitation for
us; we wanted to know if there are similar use cases folks are looking at or
interested in addressing.

thanks,
-venu

On Thursday, September 6, 2018, 9:19:01 AM PDT, venugopal iyer via dev wrote:
 
Would it be possible for the association  to be made when the logical port is
instantiated on a node, and relayed to the SB by the controller? E.g. assuming
a mechanism to specify/determine a physical port mapping for a logical port for
a VM. The  mappings can be specified as configuration on the chassis. In the
absence of physical port information for a logical port/VM, I suppose we could
default to an encap-ip.


just a thought,
-venu
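The per-port association suggested above, with a fallback to a single chassis-wide encap IP, could look roughly like the lookup below. This is a hypothetical sketch, not OVN code: struct port_encap, encap_ip_for_port() and the port names and addresses in the test are invented for illustration; only the idea of defaulting to the existing single ovn-encap-ip comes from the thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical chassis-local mapping from a logical port to the encap IP
 * of the physical NIC it should be tunnelled over. */
struct port_encap {
    const char *logical_port;
    const char *encap_ip;
};

/* Returns the encap IP configured for 'lport', or 'default_ip' (the
 * chassis-wide ovn-encap-ip) when the port has no explicit mapping. */
static const char *
encap_ip_for_port(const struct port_encap *map, size_t n,
                  const char *lport, const char *default_ip)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(map[i].logical_port, lport) == 0) {
            return map[i].encap_ip;
        }
    }
    return default_ip;
}
```

A real design would also have to publish the per-port choice to the SB DB so that remote chassis know which tunnel endpoint to send to, which is exactly the question Ben raises below.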
On Wednesday, September 5, 2018, 2:03:35 PM PDT, Ben Pfaff wrote:

How would OVN know which IP to use for a given logical port on a chassis?

I think that the "multiple tunnel encapsulations" is meant to cover,
say, Geneve vs. STT vs. VXLAN, not the case you have in mind.

On Wed, Sep 05, 2018 at 09:50:32AM -0700, Girish Moodalbail wrote:
> Hello all,
> 
> I would like to add more context here. In the diagram below
> 
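> +------------------------------------+
> |ovn-host                            |
> |                                    |
> |                                    |
> |      +------------------------+    |
> |      |         br-int         |    |
> |      +-----+------------+-----+    |
> |            |            |          |
> |        +---v----+   +---v----+     |
> |        | geneve |   | geneve |     |
> |        +---+----+   +---+----+     |
> |            |            |          |
> |         +--v---+     +--v---+      |
> |         | IP0  |     | IP1  |      |
> |         +------+     +------+      |
> +---------+ eth0 +-----+ eth1 +------+
>           +------+     +------+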
> 
> eth0 and eth1 are, say, each in its own physical segment. The VMs that are
> instantiated on the above ovn-host will have multiple interfaces, and each
> of those interfaces needs to be on a different Geneve VTEP.
> 
> I think the following entry in OVN TODOs (
> https://github.com/openvswitch/ovs/blob/master/ovn/TODO.rst)
> 
> ---8<--8<---
> Support multiple tunnel encapsulations in Chassis.
> 
> So far, both ovn-controller and ovn-controller-vtep only allow chassis to
> have one tunnel encapsulation entry. We should extend the implementation to
> support multiple tunnel encapsulations
> ---8<--8<---
> 
> captures the above requirement. Is that the case?
> 
> Thanks again.
> 
> Regards,
> ~Girish
> 
> 
> 
> 
> On Tue, Sep 4, 2018 at 3:00 PM Girish Moodalbail wrote:
> 
> > Hello all,
> >
> > Is it possible to configure remote_ip as a 'flow' instead of an IP address
> > (i.e., setting ovn-encap-ip to a single IP address)?
> >
> > Today, we have one VTEP endpoint per OVN host, and all the VMs that
> > connect to br-int on that OVN host are reachable behind this VTEP
> > endpoint. Is it possible to have multiple VTEP endpoints for a br-int
> > bridge and use OpenFlow flows to select one of the VTEP endpoints?
> >
> >
> > +------------------------------------+
> > |ovn-host                            |
> > |                                    |
> > |                                    |
> > |      +------------------------+    |
> > |      |         br-int         |    |
> > |      +-----+------------+-----+    |
> > |            |            |          |
> > |        +---v----+   +---v----+     |
> > |        | geneve |   | geneve |     |
> > |        +---+----+   +---+----+     |
> > |            |            |          |
> > |         +--v---+     +--v---+      |
> > |         | IP0  |     | IP1  |      |
> > |         +------+     +------+      |
> > +---------+ eth0 +-----+ eth1 +------+
> >           +------+     +------+
> >
> > Also, we don't want to bond eth0 and eth1 into a bond interface and then
> > use bond's IP as VTEP endpoint.
> >
> > Thanks in advance,
> > ~Girish
> >
> >
> >
> >



Re: [ovs-discuss] OvS using newer DPDK

2018-10-31 Thread Stokes, Ian
> Hello all,
> 
> I remember some time ago a topic was raised here about the new LTS
> release. I'd like to ask a related question: which version of DPDK will it
> be based on? 18.11 (which is going to be the new LTS release of DPDK)?
> 

Yes, the plan would be ideally to move to DPDK 18.11.

> If it is then is there anybody already working on that?

Yes, the dpdk_latest branch was set up for this purpose.

There are patches submitted to move OVS to use DPDK 18.08 first. From there a 
new set of patches will be created to move to DPDK 18.11. Once there is 
agreement and sign off from the OVS DPDK community we would look to apply those 
to the OVS master branch in time for the OVS 2.11 release.

> 
> I'm asking these questions since I've nailed down the cause of the OvS
> crashes on the Marvell Armada 8K board. They happen while attempting to set
> the MTU, and there are some patches affecting MTU/MRU calculations that might
> help.

Are these patches targeted at OVS project or the DPDK project?

> So basically I might attempt to backport them or try to get OvS working
> with newer DPDK.

OVS is moving towards using DPDK LTS releases only for OVS releases and the 
master branch.

If the patches target DPDK then they could be backported to the relevant DPDK 
LTS releases. Once in place there you could also backport support to OVS 2.9 
and OVS 2.10 which use DPDK 17.11.

> Since I prefer the latter I would like to join somebody
> doing this update (I don't feel comfortable enough with OvS to do that on
> my own).

Ok sure, there is not a patch to make OVS use DPDK 18.11 yet. That's in
progress. I've cc'd Ophir who has been looking at this to date. Once there is a
patch for 18.11, it would be a great help if you could test it with the Marvell
device.

Thanks
Ian
> 
> Best regards
> Andrzej


[ovs-discuss] OvS using newer DPDK

2018-10-31 Thread Andrzej Ostruszka
Hello all,

I remember some time ago a topic was raised here about the new LTS
release. I'd like to ask a related question: which version of DPDK will
it be based on? 18.11 (which is going to be the new LTS release of DPDK)?

If it is then is there anybody already working on that?

I'm asking these questions since I've nailed down the cause of the OvS
crashes on the Marvell Armada 8K board. They happen while attempting to set
the MTU, and there are some patches affecting MTU/MRU calculations that might
help. So basically I might attempt to backport them or try to get OvS
working with a newer DPDK. Since I prefer the latter, I would like to join
somebody doing this update (I don't feel comfortable enough with OvS to
do that on my own).

Best regards
Andrzej


Re: [ovs-discuss] Problems executing decap(eth)+encap(eth) actions

2018-10-31 Thread Gregory Rose


On 10/31/2018 5:40 AM, Jaime Caamaño Ruiz wrote:

Greg, I submitted this patch [1], let me know if anything looks bad.

[1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-October/353410.html


I'll have a look and comment there.

Thanks!



Thanks
Jaime.


-Original Message-
From: Jaime Caamaño Ruiz 
Reply-to: jcaam...@suse.com
To: Gregory Rose , ovs-discuss@openvswitch.org
Subject: Re: [ovs-discuss] Problems executing decap(eth)+encap(eth)
actions
Date: Wed, 31 Oct 2018 12:07:59 +0100

Let me give it a try.

Aside from the fix on master, who takes care of mapping the fix to
bugfix releases?

BR
Jaime.

-Original Message-
From: Gregory Rose 
To: ovs-discuss@openvswitch.org, jcaam...@suse.de
Subject: Re: [ovs-discuss] Problems executing decap(eth)+encap(eth)
actions
Date: Tue, 30 Oct 2018 14:42:15 -0700

On 10/29/2018 3:38 AM, Jaime Caamaño Ruiz wrote:

Hey Greg. Thanks for helping out. I did build OVS with the fix and it
got my problem sorted without causing any additional ones on my
environment. Let me know if I can help with anything else.

BR
Jaime.

Jaime,

you seem to have identified a bug!

Using printks with a simple rule to just decap and then encap an Ethernet
header, we see this with the code as it is right now:

[13568.973807] __ovs_nla_copy_actions:3007 <- decap
[13568.973812] __ovs_nla_copy_actions:3012 <- decap succeeds but sets mac_proto = MAC_PROTO_ETHERNET
[13568.973815] __ovs_nla_copy_actions:2999 <- encap
[13568.973818] openvswitch: netlink: Flow actions may not be safe on all matching packets. <- returns -EINVAL

Note that the decap happens at lines 3007-3012 and is successful. However,
the very next encap action, starting at line 2999, does not finish and returns
-EINVAL, so a printk at line 3002 does not execute. If I change the code as
you suggested, the decap/encap flow works without complaint and without
returning -EINVAL:

[13838.435051] __ovs_nla_copy_actions:3007 <- decap
[13838.435054] __ovs_nla_copy_actions:3012 <- decap succeeds and sets mac_proto = MAC_PROTO_NONE
[13838.435055] __ovs_nla_copy_actions:2999 <- encap
[13838.435056] __ovs_nla_copy_actions:3002 <- encap succeeds and sets mac_proto = MAC_PROTO_ETHERNET

Thank you for finding this bug. Do you wish to send the patch to fix it
or would you prefer me to do it?

Regards,

- Greg



-Original Message-
From: Gregory Rose 
To: ovs-discuss@openvswitch.org, jcaam...@suse.de
Subject: Re: [ovs-discuss] Problems executing decap(eth)+encap(eth)
actions
Date: Fri, 26 Oct 2018 15:42:51 -0700

On 10/19/2018 1:39 AM, Jaime Caamaño Ruiz wrote:

Hello

When using NSH encapsulation, it's useful to normalize your pipeline to
packet_type=nsh, popping an Ethernet header on input if necessary and
pushing an Ethernet header again if required before output.

But it seems to be problematic:

---
2018-10-18T13:10:59.196Z|00010|dpif(handler3)|WARN|system@ovs-system: execute pop_eth,push_eth(src=fe:16:3e:c1:9e:87,dst=fa:16:3e:c1:9e:87),5 failed (Invalid argument) on packet vlan_tci=0x,dl_src=fa:16:3e:c2:e6:68,dl_dst=fe:16:3e:c2:e6:68,dl_type=0x894f,nsh_flags=0,nsh_ttl=63,nsh_mdtype=1,nsh_np=3,nsh_spi=0x1a,nsh_si=254,nsh_c1=0xc0a82a01,nsh_c2=0x3,nsh_c3=0x0,nsh_c4=0x9100,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
with metadata skb_priority(0),tunnel(tun_id=0x0,src=192.168.42.1,dst=192.168.42.3,ttl=64,tp_src=47656,tp_dst=4789,flags(key)),skb_mark(0),in_port(4) mtu 0
---

Looking at the code in datapath/flow_netlink.c @ __ovs_nla_copy_actions:

    case OVS_ACTION_ATTR_PUSH_ETH:
        /* Disallow pushing an Ethernet header if one
         * is already present */
        if (mac_proto != MAC_PROTO_NONE)
            return -EINVAL;
        mac_proto = MAC_PROTO_NONE;
        break;

    case OVS_ACTION_ATTR_POP_ETH:
        if (mac_proto != MAC_PROTO_ETHERNET)
            return -EINVAL;
        if (vlan_tci & htons(VLAN_TAG_PRESENT))
            return -EINVAL;
        mac_proto = MAC_PROTO_ETHERNET;
        break;


Isn't the mac_proto assignment inverted here? Shouldn't it look like this?


    case OVS_ACTION_ATTR_PUSH_ETH:
        /* Disallow pushing an Ethernet header if one
         * is already present */
        if (mac_proto != MAC_PROTO_NONE)
            return -EINVAL;
        mac_proto = MAC_PROTO_ETHERNET;
        break;

    case OVS_ACTION_ATTR_POP_ETH:
        if (mac_proto != MAC_PROTO_ETHERNET)
            return -EINVAL;
        if (vlan_tci & htons(VLAN_TAG_PRESENT))
            return -EINVAL;

Re: [ovs-discuss] Problems executing decap(eth)+encap(eth) actions

2018-10-31 Thread Gregory Rose

On 10/31/2018 4:07 AM, Jaime Caamaño Ruiz wrote:

Let me give it a try.

Aside from the fix on master, who takes care of mapping the fix to
bugfix releases?


Jaime,

The maintainers will take care of backporting to previous release 
branches where the fix is appropriate.


Thanks,

- Greg



BR
Jaime.

-Original Message-
From: Gregory Rose 
To: ovs-discuss@openvswitch.org, jcaam...@suse.de
Subject: Re: [ovs-discuss] Problems executing decap(eth)+encap(eth)
actions
Date: Tue, 30 Oct 2018 14:42:15 -0700

On 10/29/2018 3:38 AM, Jaime Caamaño Ruiz wrote:

Hey Greg. Thanks for helping out. I did build OVS with the fix and it
got my problem sorted without causing any additional ones on my
environment. Let me know if I can help with anything else.

BR
Jaime.

Jaime,

you seem to have identified a bug!

Using printks with a simple rule to just decap and then encap an Ethernet
header, we see this with the code as it is right now:

[13568.973807] __ovs_nla_copy_actions:3007 <- decap
[13568.973812] __ovs_nla_copy_actions:3012 <- decap succeeds but sets mac_proto = MAC_PROTO_ETHERNET
[13568.973815] __ovs_nla_copy_actions:2999 <- encap
[13568.973818] openvswitch: netlink: Flow actions may not be safe on all matching packets. <- returns -EINVAL

Note that the decap happens at lines 3007-3012 and is successful. However,
the very next encap action, starting at line 2999, does not finish and returns
-EINVAL, so a printk at line 3002 does not execute. If I change the code as
you suggested, the decap/encap flow works without complaint and without
returning -EINVAL:

[13838.435051] __ovs_nla_copy_actions:3007 <- decap
[13838.435054] __ovs_nla_copy_actions:3012 <- decap succeeds and sets mac_proto = MAC_PROTO_NONE
[13838.435055] __ovs_nla_copy_actions:2999 <- encap
[13838.435056] __ovs_nla_copy_actions:3002 <- encap succeeds and sets mac_proto = MAC_PROTO_ETHERNET

Thank you for finding this bug. Do you wish to send the patch to fix it
or would you prefer me to do it?

Regards,

- Greg



-Original Message-
From: Gregory Rose 
To: ovs-discuss@openvswitch.org, jcaam...@suse.de
Subject: Re: [ovs-discuss] Problems executing decap(eth)+encap(eth)
actions
Date: Fri, 26 Oct 2018 15:42:51 -0700

On 10/19/2018 1:39 AM, Jaime Caamaño Ruiz wrote:

Hello

When using NSH encapsulation, it's useful to normalize your pipeline to
packet_type=nsh, popping an Ethernet header on input if necessary and
pushing an Ethernet header again if required before output.

But it seems to be problematic:

---
2018-10-18T13:10:59.196Z|00010|dpif(handler3)|WARN|system@ovs-system: execute pop_eth,push_eth(src=fe:16:3e:c1:9e:87,dst=fa:16:3e:c1:9e:87),5 failed (Invalid argument) on packet vlan_tci=0x,dl_src=fa:16:3e:c2:e6:68,dl_dst=fe:16:3e:c2:e6:68,dl_type=0x894f,nsh_flags=0,nsh_ttl=63,nsh_mdtype=1,nsh_np=3,nsh_spi=0x1a,nsh_si=254,nsh_c1=0xc0a82a01,nsh_c2=0x3,nsh_c3=0x0,nsh_c4=0x9100,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
with metadata skb_priority(0),tunnel(tun_id=0x0,src=192.168.42.1,dst=192.168.42.3,ttl=64,tp_src=47656,tp_dst=4789,flags(key)),skb_mark(0),in_port(4) mtu 0
---

Looking at the code in datapath/flow_netlink.c @ __ovs_nla_copy_actions:

    case OVS_ACTION_ATTR_PUSH_ETH:
        /* Disallow pushing an Ethernet header if one
         * is already present */
        if (mac_proto != MAC_PROTO_NONE)
            return -EINVAL;
        mac_proto = MAC_PROTO_NONE;
        break;

    case OVS_ACTION_ATTR_POP_ETH:
        if (mac_proto != MAC_PROTO_ETHERNET)
            return -EINVAL;
        if (vlan_tci & htons(VLAN_TAG_PRESENT))
            return -EINVAL;
        mac_proto = MAC_PROTO_ETHERNET;
        break;


Isn't the mac_proto assignment inverted here? Shouldn't it look like this?


    case OVS_ACTION_ATTR_PUSH_ETH:
        /* Disallow pushing an Ethernet header if one
         * is already present */
        if (mac_proto != MAC_PROTO_NONE)
            return -EINVAL;
        mac_proto = MAC_PROTO_ETHERNET;
        break;

    case OVS_ACTION_ATTR_POP_ETH:
        if (mac_proto != MAC_PROTO_ETHERNET)
            return -EINVAL;
        if (vlan_tci & htons(VLAN_TAG_PRESENT))
            return -EINVAL;
        mac_proto = MAC_PROTO_NONE;
        break;

Jaime,

I am looking into this, and at first sight this does look inverted, but we
have no other reported bugs in this area, so I want to be careful that we
don't break anything else while fixing this. Have you tried building OVS
with

Re: [ovs-discuss] Problems executing decap(eth)+encap(eth) actions

2018-10-31 Thread Jaime Caamaño Ruiz
Greg, I submitted this patch [1], let me know if anything looks bad.

[1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-October/353410.html

Thanks
Jaime.
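The fix discussed in this thread can be illustrated with a small standalone model of how __ovs_nla_copy_actions() tracks mac_proto while validating actions. This is a simplified sketch, not the kernel code: only the push_eth/pop_eth cases are modelled, the VLAN check is left out, and the enum names and validate_eth_actions() are invented for illustration. With the corrected assignments, pop_eth followed by push_eth on an Ethernet packet validates; with the original swapped assignments, push_eth would still see MAC_PROTO_ETHERNET and return -EINVAL, which matches the "Invalid argument" in the log.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified, standalone model of the mac_proto bookkeeping in
 * __ovs_nla_copy_actions() with the corrected assignments.  Only the
 * push_eth/pop_eth cases are modelled; the VLAN check is omitted. */
enum mac_proto { MAC_PROTO_NONE, MAC_PROTO_ETHERNET };
enum eth_action { ACT_PUSH_ETH, ACT_POP_ETH };

/* Returns 0 if the action list is valid for a packet whose L2 header
 * state is 'mac_proto' on entry, or -EINVAL otherwise. */
static int
validate_eth_actions(enum mac_proto mac_proto,
                     const enum eth_action *acts, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        switch (acts[i]) {
        case ACT_PUSH_ETH:
            /* Disallow pushing an Ethernet header if one is present. */
            if (mac_proto != MAC_PROTO_NONE)
                return -EINVAL;
            mac_proto = MAC_PROTO_ETHERNET;  /* header now present */
            break;
        case ACT_POP_ETH:
            if (mac_proto != MAC_PROTO_ETHERNET)
                return -EINVAL;
            mac_proto = MAC_PROTO_NONE;      /* header removed */
            break;
        default:
            return -EINVAL;
        }
    }
    return 0;
}
```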


-Original Message-
From: Jaime Caamaño Ruiz 
Reply-to: jcaam...@suse.com
To: Gregory Rose , ovs-discuss@openvswitch.org
Subject: Re: [ovs-discuss] Problems executing decap(eth)+encap(eth)
actions
Date: Wed, 31 Oct 2018 12:07:59 +0100

Let me give it a try.

Aside from the fix on master, who takes care of mapping the fix to
bugfix releases?

BR
Jaime.

-Original Message-
From: Gregory Rose 
To: ovs-discuss@openvswitch.org, jcaam...@suse.de
Subject: Re: [ovs-discuss] Problems executing decap(eth)+encap(eth)
actions
Date: Tue, 30 Oct 2018 14:42:15 -0700

On 10/29/2018 3:38 AM, Jaime Caamaño Ruiz wrote:
> Hey Greg. Thanks for helping out. I did build OVS with the fix and it
> got my problem sorted without causing any additional ones on my
> environment. Let me know if I can help with anything else.
> 
> BR
> Jaime.

Jaime,

you seem to have identified a bug!

Using printks with a simple rule to just decap and then encap an Ethernet
header, we see this with the code as it is right now:

[13568.973807] __ovs_nla_copy_actions:3007 <- decap
[13568.973812] __ovs_nla_copy_actions:3012 <- decap succeeds but sets mac_proto = MAC_PROTO_ETHERNET
[13568.973815] __ovs_nla_copy_actions:2999 <- encap
[13568.973818] openvswitch: netlink: Flow actions may not be safe on all matching packets. <- returns -EINVAL

Note that the decap happens at lines 3007-3012 and is successful.
However, the very next encap action, starting at line 2999, does not
finish and returns -EINVAL, so a printk at line 3002 does not execute.
If I change the code as you suggested, the decap/encap flow works
without complaint and without returning -EINVAL:

[13838.435051] __ovs_nla_copy_actions:3007 <- decap
[13838.435054] __ovs_nla_copy_actions:3012 <- decap succeeds and sets mac_proto = MAC_PROTO_NONE
[13838.435055] __ovs_nla_copy_actions:2999 <- encap
[13838.435056] __ovs_nla_copy_actions:3002 <- encap succeeds and sets mac_proto = MAC_PROTO_ETHERNET

Thank you for finding this bug.  Do you wish to send the patch to fix it,
or would you prefer me to do it?

Regards,

- Greg

> 
> 
> -Original Message-
> From: Gregory Rose 
> To: ovs-discuss@openvswitch.org, jcaam...@suse.de
> Subject: Re: [ovs-discuss] Problems executing decap(eth)+encap(eth) actions
> Date: Fri, 26 Oct 2018 15:42:51 -0700
> 
> On 10/19/2018 1:39 AM, Jaime Caamaño Ruiz wrote:
> > Hello
> > 
> > When using NSH encapsulation, it's useful to normalize your pipeline
> > to packet_type=nsh, popping an Ethernet header on input if necessary
> > and pushing an Ethernet header again if required before output.
> > 
> > But it seems to be problematic:
> > 
> > ---
> > 2018-10-18T13:10:59.196Z|00010|dpif(handler3)|WARN|system@ovs-system: execute pop_eth,push_eth(src=fe:16:3e:c1:9e:87,dst=fa:16:3e:c1:9e:87),5 failed (Invalid argument) on packet vlan_tci=0x,dl_src=fa:16:3e:c2:e6:68,dl_dst=fe:16:3e:c2:e6:68,dl_type=0x894f,nsh_flags=0,nsh_ttl=63,nsh_mdtype=1,nsh_np=3,nsh_spi=0x1a,nsh_si=254,nsh_c1=0xc0a82a01,nsh_c2=0x3,nsh_c3=0x0,nsh_c4=0x9100,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
> >  with metadata skb_priority(0),tunnel(tun_id=0x0,src=192.168.42.1,dst=192.168.42.3,ttl=64,tp_src=47656,tp_dst=4789,flags(key)),skb_mark(0),in_port(4) mtu 0
> > ---
> > 
> > Looking at the code in datapath/flow_netlink.c @ __ovs_nla_copy_actions:
> > 
> >     case OVS_ACTION_ATTR_PUSH_ETH:
> >         /* Disallow pushing an Ethernet header if one
> >          * is already present */
> >         if (mac_proto != MAC_PROTO_NONE)
> >             return -EINVAL;
> >         mac_proto = MAC_PROTO_NONE;
> >         break;
> > 
> >     case OVS_ACTION_ATTR_POP_ETH:
> >         if (mac_proto != MAC_PROTO_ETHERNET)
> >             return -EINVAL;
> >         if (vlan_tci & htons(VLAN_TAG_PRESENT))
> >             return -EINVAL;
> >         mac_proto = MAC_PROTO_ETHERNET;
> >         break;
> > 
> > 
> > Isn't the mac_proto assignment inverted here? Shouldn't it look like this?
> > 
> > 
> >     case OVS_ACTION_ATTR_PUSH_ETH:
> >         /* Disallow pushing an Ethernet header if one
> >          * is already present */
> >         if (mac_proto != MAC_PROTO_NONE)
> >             return -EINVAL;
> >         mac_proto = MAC_PROTO_ETHERNET;
> >         break;
> > 
> >     case OVS_ACTION_ATTR_POP_ETH:
> >         if (mac_proto 

Re: [ovs-discuss] Problems executing decap(eth)+encap(eth) actions

2018-10-31 Thread Jaime Caamaño Ruiz
Let me give it a try.

Aside from the fix on master, who takes care of backporting the fix to
the bugfix releases?

BR
Jaime.

Re: [ovs-discuss] OVS bridges in docker containers segfault when dpdkvhostuser port is added.

2018-10-31 Thread Stokes, Ian
> On Thu, Oct 25, 2018 at 09:51:38PM +0200, Alan Kayahan wrote:
> > Hello,
> >
> > I have 3 OVS bridges on the same host, connected to each other as
> > br1<->br2<->br3. br1 and br3 are connected to the docker container cA
> > via dpdkvhostuser port type (I know it is deprecated, the app works
> > this way only). The DPDK app running in cA generate packets, which
> > traverse bridges br1->br2->br3, then ends up back at the DPDK app.
> > This setup works fine.
> >
> > Now I am trying to put each OVS bridge into its respective docker
> > container. I connect the containers with veth pairs, then add the veth
> > ports to the bridges. Next, I add a dpdkvhostuser port named SRC to
> > br1, so far so good. The moment I add a dpdkvhostuser port named SNK
> > to br3, ovs-vswitchd services in br1's and br3's containers segfault.
> > Following are the backtraces from each,

What version of OVS and DPDK are you using?

> >
> > --br1's container---
> >
> > [Thread debugging using libthread_db enabled] Using host libthread_db
> > library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> > Core was generated by `ovs-vswitchd
> > unix:/usr/local/var/run/openvswitch/db.sock -vconsole:emer -vsyslo'.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  0x5608fa0f321b in netdev_rxq_recv (rx=0x7ff13c34ee80,
> > batch=batch@entry=0x7ff1bbb4d890) at lib/netdev.c:702
> > 702         retval = rx->netdev->netdev_class->rxq_recv(rx, batch);
> > [Current thread is 1 (Thread 0x7ff1bbb4e700 (LWP 376))]
> > (gdb) bt
> > #0  0x5608fa0f321b in netdev_rxq_recv (rx=0x7ff13c34ee80,
> > batch=batch@entry=0x7ff1bbb4d890) at lib/netdev.c:702
> > #1  0x5608fa0cce65 in dp_netdev_process_rxq_port (
> > pmd=pmd@entry=0x7ff1bbb4f010, rxq=0x5608fb651be0, port_no=1)
> > at lib/dpif-netdev.c:3279
> > #2  0x5608fa0cd296 in pmd_thread_main (f_=)
> > at lib/dpif-netdev.c:4145
> > #3  0x5608fa14a836 in ovsthread_wrapper (aux_=)
> > at lib/ovs-thread.c:348
> > #4  0x7ff1c52517fc in start_thread (arg=0x7ff1bbb4e700)
> > at pthread_create.c:465
> > #5  0x7ff1c4815b5f in clone ()
> > at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> >
> > --br3's container---
> >
> > [Thread debugging using libthread_db enabled] Using host libthread_db
> > library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> > Core was generated by `ovs-vswitchd
> > unix:/usr/local/var/run/openvswitch/db.sock -vconsole:emer -vsyslo'.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  0x55c517e3abcb in rte_mempool_free_memchunks ()
> > [Current thread is 1 (Thread 0x7f202351f300 (LWP 647))]
> > (gdb) bt
> > #0  0x55c517e3abcb in rte_mempool_free_memchunks ()
> > #1  0x55c517e3ad46 in rte_mempool_free.part ()
> > #2  0x55c518218b78 in dpdk_mp_free (mp=0x7f603fe66a00)
> > at lib/netdev-dpdk.c:599
> > #3  0x55c518218ff0 in dpdk_mp_free (mp=)
> > at lib/netdev-dpdk.c:593
> > #4  netdev_dpdk_mempool_configure (dev=0x7f1f7ffeac00) at
> > lib/netdev-dpdk.c:629
> > #5  0x55c51821a98d in dpdk_vhost_reconfigure_helper (dev=0x7f1f7ffeac00)
> > at lib/netdev-dpdk.c:3599
> > #6  0x55c51821ac8b in netdev_dpdk_vhost_reconfigure (netdev=0x7f1f7ffebcc0)
> > at lib/netdev-dpdk.c:3624
> > #7  0x55c51813fe6b in port_reconfigure (port=0x55c51a4522a0)
> > at lib/dpif-netdev.c:3341
> > #8  reconfigure_datapath (dp=dp@entry=0x55c51a46efc0) at
> > lib/dpif-netdev.c:3822
> > #9  0x55c5181403e8 in do_add_port (dp=dp@entry=0x55c51a46efc0,
> > devname=devname@entry=0x55c51a456520 "SNK",
> > type=0x55c51834f7bd "dpdkvhostuser", port_no=port_no@entry=1)
> > at lib/dpif-netdev.c:1584
> > #10 0x55c51814059b in dpif_netdev_port_add (dpif=,
> > netdev=0x7f1f7ffebcc0, port_nop=0x7fffb4eef68c) at
> > lib/dpif-netdev.c:1610
> > #11 0x55c5181469be in dpif_port_add (dpif=0x55c51a469350,
> > netdev=netdev@entry=0x7f1f7ffebcc0, port_nop=port_nop@entry=0x7fffb4eef6ec)
> > at lib/dpif.c:579
> > #12 0x55c5180f9f28 in port_add (ofproto_=0x55c51a464ee0,
> > netdev=0x7f1f7ffebcc0) at ofproto/ofproto-dpif.c:3645
> > #13 0x55c5180ecafe in ofproto_port_add (ofproto=0x55c51a464ee0,
> > netdev=0x7f1f7ffebcc0, ofp_portp=ofp_portp@entry=0x7fffb4eef7e8) at
> > ofproto/ofproto.c:1999
> > #14 0x55c5180d97e6 in iface_do_create (errp=0x7fffb4eef7f8,
> > netdevp=0x7fffb4eef7f0, ofp_portp=0x7fffb4eef7e8,
> > iface_cfg=0x55c51a46d590, br=0x55c51a4415b0)
> > at vswitchd/bridge.c:1799
> > #15 iface_create (port_cfg=0x55c51a46e210, iface_cfg=0x55c51a46d590,
> > br=0x55c51a4415b0) at vswitchd/bridge.c:1837
> > #16 bridge_add_ports__ (br=br@entry=0x55c51a4415b0,
> > wanted_ports=wanted_ports@entry=0x55c51a441690,
> > with_requested_port=with_requested_port@entry=true) at
> > vswitchd/bridge.c:931
> > #17 0x55c5180db87a in bridge_add_ports
> > 

Re: [ovs-discuss] OVN SB DB server overload when restarted at large scale environment

2018-10-31 Thread Han Zhou
On Tue, Oct 30, 2018 at 11:15 AM Ben Pfaff  wrote:
>
> On Wed, Oct 24, 2018 at 05:42:15PM -0700, Han Zhou wrote:
> > On Tue, Sep 25, 2018 at 10:18 AM Han Zhou  wrote:
> > >
> > >
> > >
> > > On Thu, Sep 20, 2018 at 4:43 PM Ben Pfaff  wrote:
> > > >
> > > > On Thu, Sep 13, 2018 at 12:28:27PM -0700, Han Zhou wrote:
> > > > > In a scalability test with ovn-scale-test, the SB ovsdb-server load
> > > > > is not a problem, at least with 1k HVs. However, if we restart the
> > > > > ovsdb-server, then depending on the number of HVs and the scale of
> > > > > logical objects (e.g. the number of logical ports), the SB
> > > > > ovsdb-server becomes an obvious bottleneck.
> > > > >
> > > > > In our test with 1k HVs and 20k logical ports (200 lports * 100
> > > > > lswitches connected by a single logical router), restarting the SB
> > > > > ovsdb-server resulted in 100% CPU on ovsdb-server for more than 1
> > > > > hour. All HVs (and northd) were reconnecting and resyncing a large
> > > > > amount of data at the same time. Considering the amount of data and
> > > > > the JSON-RPC cost, this is not surprising.
> > > > >
> > > > > At this scale, the SB ovsdb-server process has RES 303848KB before
> > > > > restart. A large proportion of this is likely SB DB data that will
> > > > > be transferred to each of the 1,001 clients, about 300GB in total.
> > > > > With a 10Gbps NIC, even the pure network transmission would take ~5
> > > > > minutes. Considering that the actual JSON-RPC payload is much bigger
> > > > > than the raw data, and the processing cost of the single-threaded
> > > > > ovsdb-server, 1 hour is reasonable.
> > > > >
> > > > > In addition to the CPU cost of ovsdb-server, memory consumption
> > > > > could also be a problem. Since all clients are syncing data from it,
> > > > > RES increases quickly (probably due to buffering) and spiked to 10G
> > > > > at one point. After all the syncing finished, RES went back to about
> > > > > the same size as before the restart. The client side (ovn-controller,
> > > > > northd) also saw a memory spike: the new snapshot of the whole DB
> > > > > arrives as one huge JSON-RPC message, which is buffered until it is
> > > > > fully received, so RES peaked at double its original size and then
> > > > > went back to the original size after the first round of processing
> > > > > of the new snapshot. This means that when deploying OVN, this memory
> > > > > spike should be taken into account for the SB DB restart scenario,
> > > > > especially on the central node.
> > > > >
> > > > > Here is some brainstorming on how we could improve this (very rough
> > > > > ideas at this stage). There are two directions: 1) reducing the size
> > > > > of the data to be transferred, and 2) scaling out ovsdb-server.
> > > > > 1) Reducing the size of data to be transferred.
> > > > >
> > > > > 1.1) Using BSON instead of JSON. It could reduce the size of the
> > > > > data, but I'm not sure yet how much it would help, since most of the
> > > > > data are strings. It might even be worse, since the bottleneck is not
> > > > > yet the network bandwidth but the processing power of ovsdb-server.
> > > > >
> > > > > 1.2) Move northd processing to the HVs. Only the relevant NB data
> > > > > would need to be transferred, which is much smaller than the SB DB
> > > > > because there are no logical flows. However, this would put more
> > > > > processing load on ovn-controller on the HVs. Also, it is a huge
> > > > > architectural change.
> > > > >
> > > > > 1.3) Incremental data transfer. The IDL works like a cache.
> > > > > Currently, when the connection is reset, the cache has to be
> > > > > rebuilt. But if we know the version of the current snapshot, then
> > > > > even when the connection is reset, the client can still communicate
> > > > > with the newly started server to work out the difference between its
> > > > > current data and the new data, so that only the delta is transferred,
> > > > > as if the server had not restarted at all.
> > > > >
> > > > > 2) Scaling out the ovsdb-server.
> > > > >
> > > > > 2.1) Currently ovsdb-server is single-threaded, so that one thread
> > > > > has to handle transmission to all clients at 100% CPU. If it were
> > > > > multi-threaded, more cores could be used to make this much faster.
> > > > >
> > > > > 2.2) Using an ovsdb cluster. This feature is already supported, but
> > > > > I haven't tested it in this scenario yet. If everything works as
> > > > > expected, 3-5 servers can share the load, so the transfer should
> > > > > complete 3-5 times faster than it does now. However, there is a
> > > > > limit to how many nodes a cluster can have, so the problem can be
> > > > > alleviated but may still remain as the data size grows.
> > > > >
> > > > > 2.3) Using read-only copies of ovsdb replications. If ovn-controller
> > > > > connects to read-only copies,