Re: [ovs-dev] [PATCH v1] ovn: Extend Address_Set to Macros to support define port name sets

2016-08-09 Thread Zong Kai Li
On Thu, Aug 4, 2016 at 7:46 PM, Russell Bryant  wrote:
>
>
> On Wed, Aug 3, 2016 at 6:31 PM, Ben Pfaff  wrote:
>>
>> On Wed, Aug 03, 2016 at 11:14:20PM +0800, Zong Kai LI wrote:
>> > This patch aims to extend Address_Set to Macros, making it general enough
>> > to accept sets of any variable, not only addresses. With that, we can slim
>> > down ACLs by using Macros to define port name sets for ACLs to use, since
>> > many ACL entries are similar and differ only in their "inport" and
>> > "outport" fields.
>> >
>> > Address_Set improves the performance of the OVN control plane. More
>> > importantly, its implementation introduced a "$macros -- element_set"
>> > mechanism, which is a good way to do batch jobs for ACLs, and even lflows.
>>
>> I'd prefer to defer this feature past the Open vSwitch 2.6 release.
>
>
> On an earlier patch, I suggested the idea that we might want to at least
> rename the table if we see that coming, even if we don't apply the rest of
> the changes.  That way, we get the non-backwards compatible part out of the
> way before 2.6.  What do you think?
>
> --
> Russell Bryant

Hi, Russell and Ben.
I submitted another patch at http://patchwork.ozlabs.org/patch/657589/
that only renames the Address_Set table. Since address sets are the
only kind of set supported for now, I didn't rename or change the
variables and methods used for address sets.

I hope this can get merged soon, and that I haven't misunderstood
Russell's suggestion.

Thanks.
Zongkai, LI
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v1] ovn: rename Address_Set to Set to reflect a more broad purpose

2016-08-09 Thread Zong Kai LI
This patch renames table Address_Set to Set, and Address_Set.addresses to
Set.members, to reflect a broader purpose: sets of types other than
addresses can be defined.

Per the discussion around [1] and [2], this patch only renames table
Address_Set. Future uses of table Set, such as creating port sets to
slim down ACLs, are left to following patches.

Since only address sets are supported so far, this patch doesn't rename
the variables and methods used for address sets, such as [3].

[1] http://openvswitch.org/pipermail/dev/2016-August/077121.html
[2] http://openvswitch.org/pipermail/dev/2016-August/077224.html
[3]
https://github.com/openvswitch/ovs/blob/master/ovn/controller/lflow.c#L41-L42
https://github.com/openvswitch/ovs/blob/master/ovn/controller/lflow.c#L62-L205
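
For readers without the tree at hand, the order-insensitive comparison that
address_sets_match() performs in the diff below (copy both arrays, sort,
compare element by element) can be sketched standalone. This is only an
illustration: string_sets_match() and str_cmp() are hypothetical names, plain
malloc() stands in for the OVS xmemdup() helper, and the sketch assumes the
sets contain no duplicates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings. */
static int
str_cmp(const void *p1, const void *p2)
{
    const char *const *s1 = p1;
    const char *const *s2 = p2;
    return strcmp(*s1, *s2);
}

/* Returns true if 'a' and 'b' hold the same strings, ignoring order.
 * Hypothetical helper for illustration; not part of OVS. */
static bool
string_sets_match(const char *const *a, size_t n_a,
                  const char *const *b, size_t n_b)
{
    if (n_a != n_b) {
        return false;
    }
    if (!n_a) {
        return true;
    }

    /* Sort copies so that the callers' arrays are left untouched. */
    const char **c1 = malloc(n_a * sizeof *c1);
    const char **c2 = malloc(n_a * sizeof *c2);
    assert(c1 && c2);
    memcpy(c1, a, n_a * sizeof *c1);
    memcpy(c2, b, n_a * sizeof *c2);
    qsort(c1, n_a, sizeof *c1, str_cmp);
    qsort(c2, n_a, sizeof *c2, str_cmp);

    bool match = true;
    for (size_t i = 0; i < n_a; i++) {
        if (strcmp(c1[i], c2[i])) {
            match = false;
            break;
        }
    }

    free(c1);
    free(c2);
    return match;
}
```

Because the comparison works on plain strings, it does not care whether the
members are addresses or, later, port names.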

Signed-off-by: Zong Kai LI 
---
 ovn/controller/lflow.c| 22 +++---
 ovn/northd/ovn-northd.c   | 26 +-
 ovn/ovn-nb.ovsschema  | 12 ++--
 ovn/ovn-nb.xml| 13 +++--
 ovn/ovn-sb.ovsschema  | 12 ++--
 ovn/ovn-sb.xml|  6 +++---
 ovn/utilities/ovn-nbctl.c |  4 ++--
 ovn/utilities/ovn-sbctl.c |  4 ++--
 tests/ovn.at  |  2 +-
 9 files changed, 51 insertions(+), 50 deletions(-)

diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index ece464b..9faada4 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -82,20 +82,20 @@ addr_cmp(const void *p1, const void *p2)
 /* Return true if the address sets match, false otherwise. */
 static bool
 address_sets_match(const struct address_set *addr_set,
-   const struct sbrec_address_set *addr_set_rec)
+   const struct sbrec_set *addr_set_rec)
 {
 char **addrs1;
 char **addrs2;
 
-if (addr_set->n_addresses != addr_set_rec->n_addresses) {
+if (addr_set->n_addresses != addr_set_rec->n_members) {
 return false;
 }
 size_t n_addresses = addr_set->n_addresses;
 
 addrs1 = xmemdup(addr_set->addresses,
  n_addresses * sizeof addr_set->addresses[0]);
-addrs2 = xmemdup(addr_set_rec->addresses,
- n_addresses * sizeof addr_set_rec->addresses[0]);
+addrs2 = xmemdup(addr_set_rec->members,
+ n_addresses * sizeof addr_set_rec->members[0]);
 
 qsort(addrs1, n_addresses, sizeof *addrs1, addr_cmp);
 qsort(addrs2, n_addresses, sizeof *addrs2, addr_cmp);
@@ -142,8 +142,8 @@ update_address_sets(struct controller_ctx *ctx)
 
 /* Iterate address sets in the southbound database.  Create and update the
  * corresponding symtab entries as necessary. */
-const struct sbrec_address_set *addr_set_rec;
-SBREC_ADDRESS_SET_FOR_EACH (addr_set_rec, ctx->ovnsb_idl) {
+const struct sbrec_set *addr_set_rec;
+SBREC_SET_FOR_EACH (addr_set_rec, ctx->ovnsb_idl) {
 struct address_set *addr_set =
 shash_find_data(_address_sets, addr_set_rec->name);
 
@@ -169,13 +169,13 @@ update_address_sets(struct controller_ctx *ctx)
  * that resolves to the full set of addresses.  Store it in
  * address_sets to remember that we created this symbol. */
 addr_set = xzalloc(sizeof *addr_set);
-addr_set->n_addresses = addr_set_rec->n_addresses;
-if (addr_set_rec->n_addresses) {
-addr_set->addresses = xmalloc(addr_set_rec->n_addresses
+addr_set->n_addresses = addr_set_rec->n_members;
+if (addr_set_rec->n_members) {
+addr_set->addresses = xmalloc(addr_set_rec->n_members
   * sizeof addr_set->addresses[0]);
 size_t i;
-for (i = 0; i < addr_set_rec->n_addresses; i++) {
-addr_set->addresses[i] = xstrdup(addr_set_rec->addresses[i]);
+for (i = 0; i < addr_set_rec->n_members; i++) {
+addr_set->addresses[i] = xstrdup(addr_set_rec->members[i]);
 }
 }
 shash_add(_address_sets, addr_set_rec->name, addr_set);
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 861f872..acd9726 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -3787,29 +3787,29 @@ sync_address_sets(struct northd_context *ctx)
 {
 struct shash sb_address_sets = SHASH_INITIALIZER(&sb_address_sets);
 
-const struct sbrec_address_set *sb_address_set;
-SBREC_ADDRESS_SET_FOR_EACH (sb_address_set, ctx->ovnsb_idl) {
+const struct sbrec_set *sb_address_set;
+SBREC_SET_FOR_EACH (sb_address_set, ctx->ovnsb_idl) {
 shash_add(&sb_address_sets, sb_address_set->name, sb_address_set);
 }
 
-const struct nbrec_address_set *nb_address_set;
-NBREC_ADDRESS_SET_FOR_EACH (nb_address_set, ctx->ovnnb_idl) {
+const struct nbrec_set *nb_address_set;
+



Re: [ovs-dev] [v2] ovs-vsctl: simply vsctl_parent_process_info()

2016-08-09 Thread William Tu
Thanks for making this code much cleaner. I've tested it and found no problems.

Acked-by: William Tu 
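
As a point of reference, the hand-rolled loop that the quoted patch removes
reads /proc/<pid>/cmdline byte by byte and stops at the first NUL or at EOF,
which yields the command name without its arguments. A standalone sketch of
that loop follows. It assumes a fixed-size buffer in place of the dynamic
string used in ovs-vsctl.c, read_first_arg() is a hypothetical name, and it
does not model how ds_get_line() itself treats embedded NUL bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Copies bytes from 'f' into 'buf' until the first NUL byte, EOF, or until
 * 'buf' (of 'size' bytes) is full.  Always NUL-terminates 'buf' and returns
 * the number of bytes stored, not counting the terminator.
 * Hypothetical helper for illustration; not part of OVS. */
static size_t
read_first_arg(FILE *f, char *buf, size_t size)
{
    assert(f && buf && size > 0);

    size_t n = 0;
    while (n + 1 < size) {
        int c = getc(f);
        if (!c || c == EOF) {
            break;
        }
        buf[n++] = (char) c;
    }
    buf[n] = '\0';
    return n;
}
```

On Linux the function could be pointed at a stream opened on
/proc/<pid>/cmdline; any FILE * with NUL-separated fields behaves the same way.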



On Tue, Aug 9, 2016 at 12:50 PM, Andy Zhou  wrote:
> Use ds_get_line() instead of hand rolling it. Rearrange the logic
> to remove some duplication.
>
> Signed-off-by: Andy Zhou 
>
> ---
> v1->v2:  rebase to current master.
> ---
>  utilities/ovs-vsctl.c | 16 +++-
>  1 file changed, 3 insertions(+), 13 deletions(-)
>
> diff --git a/utilities/ovs-vsctl.c b/utilities/ovs-vsctl.c
> index efdfb86..e710095 100644
> --- a/utilities/ovs-vsctl.c
> +++ b/utilities/ovs-vsctl.c
> @@ -2488,26 +2488,16 @@ vsctl_parent_process_info(void)
>  procfile = xasprintf("/proc/%d/cmdline", parent_pid);
>
>  f = fopen(procfile, "r");
> -if (!f) {
> -free(procfile);
> -return NULL;
> -}
>  free(procfile);
> -
> -for (;;) {
> -int c = getc(f);
> -if (!c || c == EOF) {
> -break;
> -}
> -ds_put_char(, c);
> +if (f) {
> +ds_get_line(, f);
> +fclose(f);
>  }
> -fclose(f);
>  } else {
>  ds_put_cstr(, "init");
>  }
>
>  ds_put_format(, " (pid %d)", parent_pid);
> -
>  return ds_steal_cstr();
>  #else
>  return NULL;
> --
> 1.9.1
>


Re: [ovs-dev] [PATCH v2 3/3] netdev-dpdk: vHost client mode and reconnect

2016-08-09 Thread Daniele Di Proietto
2016-08-08 7:18 GMT-07:00 Loftus, Ciara :

> >
> > The patch mostly looks good to me, thanks.
> > I'm not 100% sure about the interface.  Can we make the flag interface
> > specific?
>
> I'm not 100% sure about making the flag interface specific :) Do you think
> there's a use case for both client and server mode ports to be used in
> conjunction with each other?
>

Well, I don't have any specific use case in mind :-).  I just think it's
cleaner making it per interface for two reasons:

* I'd like to provide the user with the maximum flexibility that the API
allows.  I don't like adding artificial limitations, especially in user
interfaces, since those are supposed to be stable.
* The behavior of an interface depends on the status of the switch.  It's
like having a global variable that influences the behavior of all the
functions.


>
> > If I'm not mistaken we currently limit vhost-sock-dir to be under OVS
> > rundir.  With client mode this is not necessary anymore.
>
> Correct, I've fixed this in the next version. Thanks.
>
> > I hope that client will be made the default mode at some point, I think we
> > should keep that in mind when considering the interface.
>
> I agree. I think we should wait until at least the QEMU v2.7.0 release
> though.
>
> > Since we're planning to break compatibility with the dpdk phy naming
> > change, maybe we can break compatibility also with vhost ports and add a
> > path option.
>
> Ok. So something like this?
>
> ovs-vsctl add-port vhost0
> ovs-vsctl set Interface vhost0 options:vhost-path=/tmp/v0.sock


Maybe we can rely on the presence of this attribute to discern between
client and server ports (I would call it vhost-server-path).
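
One way to read this suggestion, sketched in C: the mode is not stored
anywhere, it is derived from whether the interface carries a server socket
path. Everything below is hypothetical (the struct, the function names and the
enum are made up for illustration, and 'vhost-server-path' is only the name
proposed above, not an existing option).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum vhost_mode {
    VHOST_MODE_SERVER,          /* OVS creates and owns the socket. */
    VHOST_MODE_CLIENT           /* OVS connects to a socket owned by QEMU. */
};

/* One key-value pair from an interface's 'options' column. */
struct iface_option {
    const char *key;
    const char *value;
};

/* Returns the value stored under 'key', or NULL if 'key' is absent. */
static const char *
iface_option_get(const struct iface_option *opts, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(opts[i].key, key)) {
            return opts[i].value;
        }
    }
    return NULL;
}

/* Derives the mode from the presence of 'vhost-server-path'.  Stores the
 * path (or NULL) in '*server_path'. */
static enum vhost_mode
vhost_mode_from_options(const struct iface_option *opts, size_t n,
                        const char **server_path)
{
    assert(server_path);
    *server_path = iface_option_get(opts, n, "vhost-server-path");
    return *server_path ? VHOST_MODE_CLIENT : VHOST_MODE_SERVER;
}
```

With this shape there is no global switch to keep in sync: two ports on the
same bridge can be in different modes simply because one has the option set.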


> Maybe something for a separate standalone patch?
>

If we're going with the per-interface configuration I think this should be
done at the same time.

Thanks,

Daniele


> Thanks,
> Ciara
>
> >
> > Thoughts?
> > Daniele
> >
> > 2016-08-04 7:09 GMT-07:00 Ciara Loftus :
> > A new other_config DB option has been added called 'vhost-driver-mode'.
> > By default this is set to 'server' which is the mode of operation OVS
> > with DPDK has used up until this point - whereby OVS creates and manages
> > vHost user sockets.
> >
> > If set to 'client', OVS will act as the vHost client and connect to
> > sockets created and managed by QEMU which acts as the server. This mode
> > allows for reconnect capability, which allows vHost ports to resume
> > normal connectivity in event of switch reset.
> >
> > QEMU v2.7.0+ is required when using OVS in client mode and QEMU in
> > server mode.
> >
> > Signed-off-by: Ciara Loftus 
> > ---
> > v2
> > - Updated comments in vhost construct & destruct
> > - Add check for server-mode before printing error when destruct is called
> >   on a running VM
> > - Fixed coding style/standards issues
> > - Use strcmp instead of strncmp when processing 'vhost-driver-mode'
> >
> >  INSTALL.DPDK-ADVANCED.md | 27 +++
> >  NEWS |  1 +
> >  lib/netdev-dpdk.c| 31 +++
> >  vswitchd/vswitch.xml | 13 +
> >  4 files changed, 64 insertions(+), 8 deletions(-)
> >
> > diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> > index f9587b5..a773533 100755
> > --- a/INSTALL.DPDK-ADVANCED.md
> > +++ b/INSTALL.DPDK-ADVANCED.md
> > @@ -483,6 +483,33 @@ For users wanting to do packet forwarding using kernel stack below are the steps
> > where `-L`: Changes the numbers of channels of the specified network device
> > and `combined`: Changes the number of multi-purpose channels.
> >
> > +4. Enable OVS vHost client-mode & vHost reconnect (OPTIONAL)
> > +
> > +   By default, OVS DPDK acts as the vHost socket server and QEMU the
> > +   client. In QEMU v2.7 the option is available for QEMU to act as the
> > +   server. In order for this to work, OVS DPDK must be switched to 'client'
> > +   mode. This is possible by setting the 'vhost-driver-mode' DB entry to
> > +   'client' like so:
> > +
> > +   ```
> > +   ovs-vsctl set Open_vSwitch . other_config:vhost-driver-mode="client"
> > +   ```
> > +
> > +   This must be done before the switch is launched. It cannot successfully
> > +   be changed after switch has launched.
> > +
> > +   One must also append ',server' to the 'chardev' arguments on the QEMU
> > +   command line, to instruct QEMU to use vHost server mode, like so:
> > +
> > +   
> > +   -chardev socket,id=char0,path=/usr/local/var/run/openvswitch/vhost0,server
> > +   
> > +
> > +   One benefit of using this mode is the ability for vHost ports to
> > +   'reconnect' in event of the switch crashing or being brought down. Once
> > +   it is brought back up, the vHost ports will reconnect automatically and
> > +   normal service 

Re: [ovs-dev] [PATCH v2] netdev-dpdk: Avoid reconfiguration on reconnection of same vhost device.

2016-08-09 Thread Daniele Di Proietto
2016-08-08 4:19 GMT-07:00 Ilya Maximets :

> Binding/unbinding of virtio driver inside VM leads to reconfiguration
> of PMD threads. This behaviour may be abused by executing bind/unbind
> in an infinite loop to break normal networking on all ports attached
> to the same instance of Open vSwitch.
>
> Fix that by avoiding reconfiguration if it's not necessary.
> Number of queues will not be decreased to 1 on device disconnection but
> it's not very important in comparison with possible DOS attack from the
> inside of guest OS.
>
>
Makes sense to me

Applied to master, thanks
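
For anyone skimming the thread, the core of the change quoted below is a
"request reconfiguration only when something actually changed" check. A
minimal standalone sketch, assuming a stripped-down device struct:
on_new_device() is a hypothetical stand-in for new_device(), and the
n_reconfigure_requests counter is a hypothetical stand-in for
netdev_request_reconfigure().

```c
#include <assert.h>
#include <stdbool.h>

/* Stripped-down stand-in for the vhost device state. */
struct vhost_dev {
    int requested_socket_id;
    int requested_n_rxq;
    int requested_n_txq;
    bool vhost_reconfigured;
    unsigned int n_reconfigure_requests;    /* Hypothetical counter. */
};

/* Called when a guest (re)connects with 'qp_num' queue pairs on NUMA node
 * 'newnode'.  Returns true if a reconfiguration was requested. */
static bool
on_new_device(struct vhost_dev *dev, int qp_num, int newnode)
{
    assert(dev);

    if (dev->requested_n_txq != qp_num
        || dev->requested_n_rxq != qp_num
        || dev->requested_socket_id != newnode) {
        dev->requested_socket_id = newnode;
        dev->requested_n_rxq = qp_num;
        dev->requested_n_txq = qp_num;
        dev->n_reconfigure_requests++;
        return true;
    }

    /* Same parameters as last time: nothing to reconfigure. */
    dev->vhost_reconfigured = true;
    return false;
}
```

A guest that binds and unbinds its virtio driver in a loop keeps reconnecting
with the same parameters, so after the first connection it no longer triggers
any reconfiguration.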


> Fixes: 81acebdaaf27 ("netdev-dpdk: Obtain number of queues for vhost
>   ports from attached virtio.")
> Reported-by: Ciara Loftus 
> Signed-off-by: Ilya Maximets 
> ---
>
> Version 2:
> * Set 'vhost_reconfigured' flag if reconfiguration not
>   required.
> * Rebased on current master.
>
>  lib/netdev-dpdk.c | 19 +++
>  1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index b671601..ea0e16e 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2299,10 +2299,17 @@ new_device(int vid)
>  newnode = dev->socket_id;
>  }
>
> -dev->requested_socket_id = newnode;
> -dev->requested_n_rxq = qp_num;
> -dev->requested_n_txq = qp_num;
> -netdev_request_reconfigure(&dev->up);
> +if (dev->requested_n_txq != qp_num
> +|| dev->requested_n_rxq != qp_num
> +|| dev->requested_socket_id != newnode) {
> +dev->requested_socket_id = newnode;
> +dev->requested_n_rxq = qp_num;
> +dev->requested_n_txq = qp_num;
> +netdev_request_reconfigure(&dev->up);
> +} else {
> +/* Reconfiguration not required. */
> +dev->vhost_reconfigured = true;
> +}
>
>  ovsrcu_index_set(&dev->vid, vid);
>  exists = true;
> @@ -2362,11 +2369,7 @@ destroy_device(int vid)
>  ovs_mutex_lock(&dev->mutex);
>  dev->vhost_reconfigured = false;
>  ovsrcu_index_set(&dev->vid, -1);
> -/* Clear tx/rx queue settings. */
>  netdev_dpdk_txq_map_clear(dev);
> -dev->requested_n_rxq = NR_QUEUE;
> -dev->requested_n_txq = NR_QUEUE;
> -netdev_request_reconfigure(&dev->up);
>
>  netdev_change_seq_changed(&dev->up);
>  ovs_mutex_unlock(&dev->mutex);
> --
> 2.7.4
>


Re: [ovs-dev] [PATCH] ovn-controller: Reset flow processing after (re)connection to switch

2016-08-09 Thread Numan Siddique
On Aug 9, 2016 8:28 PM, "Ryan Moats"  wrote:
>
> Numan Siddique  wrote on 08/09/2016 09:39:21 AM:
>
> > From: Numan Siddique 
> > To: Ryan Moats/Omaha/IBM@IBMUS
> > Cc: ovs dev 
> > Date: 08/09/2016 09:39 AM
> > Subject: Re: [ovs-dev] [PATCH] ovn-controller: Reset flow processing
> > after (re)connection to switch
>
> >
> > On Tue, Aug 9, 2016 at 7:15 PM, Ryan Moats  wrote:
> > "dev"  wrote on 08/09/2016 07:19:27 AM:
> >
> > > From: Numan Siddique 
> > > To: ovs dev 
> > > Date: 08/09/2016 07:19 AM
> > > Subject: [ovs-dev] [PATCH] ovn-controller: Reset flow processing
> > > after (re)connection to switch
> > > Sent by: "dev" 
> > >
> > > When ovn-controller reconnects to the ovs-vswitchd, it deletes all the
> > > OF flows in the switch. It doesn't install the flows again, leaving
> > > the datapath broken unless ovn-controller is restarted or ovn-northd
> > > updates the SB DB.
> > >
> > > The reason for this is
> > >   - lflow_reset_processing() is not called after the reconnection
> > >   - the hmap "installed_flows" is not cleared, because of which
> > > ofctrl_put skips adding the flows to the switch.
> > >
> > > This patch fixes the issue and also adds a test case to test
> > > this scenario.
> > >
> > > Signed-off-by: Numan Siddique 
> > > ---
> >
> > I'm going to pick a nit on this one - is the behavior you are aiming
> > for delete and re-add or just recalculate and leave alone?
>
> >
> > In my testing I am seeing that all the OF flows are getting deleted
> > when I restart ovs-vswitchd.
> > I am testing with the latest master of OVS. I am able to see this on
> > 2 different machines and also in sandbox.
> >
> > I thought that ovn-controller is deleting the flows in the switch
> > when it restarts
> > (https://github.com/openvswitch/ovs/blob/master/ovn/controller/ofctrl.c#L355)
> >
> > Now I tested again and before restarting ovs-vswitchd, I killed
> > ovn-controller. Looks like ovs-vswitchd is clearing the old flows when
> > it restarts. I am not sure if this is the intended behavior. Looks
> > like it is. Please correct me if I am wrong here.
> >
> > You can run the commands below to reproduce the issue in the sandbox
> >
> > -
> >  $make sandbox SANDBOXFLAGS="--ovn"
> >  $ovn/env1/setup.sh
> >  $ovs-ofctl dump-flows br-int
> >  $ovs-appctl -t ovn-controller exit
> >  $ovs-appctl -t ovs-vswitchd exit
> >  $ovs-vswitchd --detach --no-chdir --pidfile -vconsole:off --log-file --enable-dummy=override -vvconn -vnetdev_dummy
> >  $ ovs-ofctl dump-flows br-int
> > NXST_FLOW reply (xid=0x4):
> > 
> >
> > You will see that the flows are deleted even if you don't run -
> > "ovs-appctl -t ovn-controller exit".
> >
> > I ask because if it is "delete and re-add" aren't you still disrupting
> > the datapath even if only momentarily?
> >
>
> Ok, so we'll assume that your code is valid for when ovs-vswitchd purges
> the old flows and that's good.
>
> IIRC there is a way to restart ovs-vswitchd via ovs-ctl so that it doesn't
> purge the old flows.  Since I'll argue (as an operator) that is the more
> important case, can you add a unit test for this and verify that your
> patch doesn't leave that path broken?
>
>
Sure. I will do that.

Thanks
Numan
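
A toy model of the failure mode described in the commit message quoted above,
for readers new to the thread: the controller remembers which flows it has
installed and skips those on the next push, so if the switch loses its flows
and the cache is not cleared, nothing is re-sent. All names here are
hypothetical; put_flows() and on_reconnect() only loosely play the roles of
ofctrl_put() and of clearing "installed_flows" on reconnection, and flows are
reduced to small integer ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_FLOWS 8

/* Flows currently present in the switch. */
struct switch_state {
    bool present[MAX_FLOWS];
};

/* The controller's cache of flows it believes are installed. */
struct controller_state {
    bool installed[MAX_FLOWS];
};

/* Pushes the 'n' flow ids in 'desired' to the switch, skipping any the
 * controller believes are already installed.  Returns the number sent. */
static size_t
put_flows(struct controller_state *c, struct switch_state *sw,
          const int *desired, size_t n)
{
    assert(c && sw);

    size_t sent = 0;
    for (size_t i = 0; i < n; i++) {
        int id = desired[i];
        if (id < 0 || id >= MAX_FLOWS || c->installed[id]) {
            continue;
        }
        sw->present[id] = true;
        c->installed[id] = true;
        sent++;
    }
    return sent;
}

/* Forgets everything the controller believed was installed.  Intended to be
 * called when the connection to the switch is (re)established. */
static void
on_reconnect(struct controller_state *c)
{
    assert(c);
    memset(c->installed, 0, sizeof c->installed);
}
```

The case Ryan raises, where the switch keeps its flows across a restart, is
not covered by this model: after on_reconnect() every flow would be pushed
again even though it is still present.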


Re: [ovs-dev] [PATCH] netdev-dpdk: add DPDK pdump capability

2016-08-09 Thread Daniele Di Proietto
This is interesting, thanks for working on this.

The patch looks pretty simple, most of the magic happens in DPDK, so I will
comment only on the OvS side of things.

Why is 'other_config:dpdk-pdump' required?  Can't we always enable the
feature?  I tried running with it and I didn't notice any slowdown, unless
there's actually a listener (I'm not sure if this is true for every driver,
though).  Having a feature switch that requires a restart makes it really
hard to debug production systems, which I'd say is one of the most
interesting use cases of such a facility.

Should we perhaps call rte_pdump_init() with a path in ovs_rundir()?  I'm
not sure what the best practice is for DPDK apps in this regard.

Thanks,

Daniele

2016-08-04 3:47 GMT-07:00 Ciara Loftus :

> This commit provides the ability to 'listen' on DPDK ports and save
> packets to a pcap file with a DPDK app that uses the librte_pdump
> library. One such app is the 'pdump' app that can be found in the DPDK
> 'app' directory. Instructions on how to use this can be found in
> INSTALL.DPDK-ADVANCED.md
>
> The pdump feature is optional. Should you wish to use it, pcap libraries
> must to be installed on the system and the CONFIG_RTE_LIBRTE_PMD_PCAP=y
>

Extra 'to'


> and CONFIG_RTE_LIBRTE_PDUMP=y options set in DPDK. Additionally you must
> set the 'dpdk-pdump' ovs other_config DB value to 'true'.
>
> Signed-off-by: Ciara Loftus 
> ---
>  INSTALL.DPDK-ADVANCED.md | 30 --
>  NEWS |  1 +
>  acinclude.m4 | 23 +++
>  lib/netdev-dpdk.c| 19 +++
>  vswitchd/vswitch.xml | 12 
>  5 files changed, 83 insertions(+), 2 deletions(-)
>
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index c8d69ae..877824b 100755
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
>  7. [QOS](#qos)
>  8. [Rate Limiting](#rl)
>  9. [Flow Control](#fc)
> -10. [Vsperf](#vsperf)
> +10. [Pdump](#pdump)
> +11. [Vsperf](#vsperf)
>
>  ##  1. Overview
>
> @@ -862,7 +863,32 @@ respective parameter. To disable the flow control at tx side,
>
>  `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
>
> -##  10. Vsperf
> +##  10. Pdump
> +
> +Pdump allows you to listen on DPDK ports and view the traffic that is
> +passing on them. To use this utility, one must have libpcap installed
> +on the system. Furthermore, DPDK must be built with CONFIG_RTE_LIBRTE_PDUMP=y
> +and CONFIG_RTE_LIBRTE_PMD_PCAP=y. And finally, the following database
> +value must be set before launching the switch, like so:
> +
> +`ovs-vsctl set Open_vSwitch . other_config:dpdk-pdump=true`
> +
> +To use pdump, simply launch OVS as usual. Then, navigate to the 'app/pdump'
> +directory in DPDK, 'make' the application and run like so:
> +
> +`sudo ./build/app/dpdk_pdump -- --pdump 'port=0,queue=0,rx-dev=/tmp/rx.pcap'`
> +
> +The above command captures traffic received on queue 0 of port 0 and stores
> +it in /tmp/rx.pcap. Other combinations of port numbers, queue numbers and
> +pcap locations are of course also available to use. More information on the
> +pdump app and its usage can be found in the below link.
> +
> +http://dpdk.org/doc/guides/sample_app_ug/pdump.html
> +
> +A performance decrease is expected when using a monitoring application like
> +the DPDK pdump app.
> +
> +##  11. Vsperf
>
>  Vsperf project goal is to develop vSwitch test framework that can be used to
>  validate the suitability of different vSwitch implementations in a Telco deployment
> diff --git a/NEWS b/NEWS
> index c2ed71d..3f40e23 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -69,6 +69,7 @@ Post-v2.5.0
>   * Basic connection tracking for the userspace datapath (no ALG,
> fragmentation or NAT support yet)
>   * Support for DPDK 16.07
> + * Optional support for DPDK pdump enabled.
> - Increase number of registers to 16.
> - ovs-benchmark: This utility has been removed due to lack of use and
>   bitrot.
> diff --git a/acinclude.m4 b/acinclude.m4
> index f02166d..b8f1850 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -211,6 +211,29 @@ AC_DEFUN([OVS_CHECK_DPDK], [
>
>  AC_SEARCH_LIBS([get_mempolicy],[numa],[],[AC_MSG_ERROR([unable to find libnuma, install the dependency package])])
>
> +AC_COMPILE_IFELSE([
> +  AC_LANG_PROGRAM(
> +[
> +  #include 
> +#if RTE_LIBRTE_PMD_PCAP
> +#error
> +#endif
> +], [])
> +  ], [],
> +  [AC_SEARCH_LIBS([pcap_dump],[pcap],[],[AC_MSG_ERROR([unable to find libpcap, install the dependency package])])
> +   DPDK_EXTRA_LIB="-lpcap"
> +   AC_COMPILE_IFELSE([
> + AC_LANG_PROGRAM(
> +   [
> + #include 
> +#if RTE_LIBRTE_PDUMP
> +#error
> +#endif
> + ], [])
> +   ], [],
> +   [AC_DEFINE([DPDK_PDUMP], [1], [DPDK pdump enabled 

[ovs-dev] [PATCH] sandbox: launch SB backup server when running in OVN mode

2016-08-09 Thread Andy Zhou
Automatically launch backup server for OVN SB database that replicates
all transactions of the active server. This can be handy for
experimenting with the newly added replication feature.

Signed-off-by: Andy Zhou 
---
 tutorial/ovs-sandbox | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/tutorial/ovs-sandbox b/tutorial/ovs-sandbox
index 69c2c68..6f03ede 100755
--- a/tutorial/ovs-sandbox
+++ b/tutorial/ovs-sandbox
@@ -316,10 +316,12 @@ if $ovn; then
 touch "$sandbox"/.ovnsb.db.~lock~
 touch "$sandbox"/.ovnnb.db.~lock~
 run ovsdb-tool create ovnsb.db "$ovnsb_schema"
+run ovsdb-tool create ovnsb2.db "$ovnsb_schema"
 run ovsdb-tool create ovnnb.db "$ovnnb_schema"
 run ovsdb-tool create vtep.db "$vtep_schema"
 ovsdb_server_args="vtep.db conf.db"
 ovsdb_sb_server_args="ovnsb.db"
+ovsdb_sb_backup_server_args="ovnsb2.db"
 ovsdb_nb_server_args="ovnnb.db"
 fi
 rungdb $gdb_ovsdb $gdb_ovsdb_ex ovsdb-server --detach --no-chdir --pidfile -vconsole:off --log-file \
@@ -333,6 +335,13 @@ if $ovn; then
 --pidfile="$sandbox"/ovnsb_db.pid -vconsole:off \
 --log-file="$sandbox"/ovnsb_db.log \
 --remote=punix:"$sandbox"/ovnsb_db.sock $ovsdb_sb_server_args
+# Start SB back up server
+rungdb $gdb_ovsdb $gdb_ovsdb_ex ovsdb-server --detach --no-chdir \
+--pidfile="$sandbox"/ovnsb_db2.pid  -vconsole:off \
+--log-file="$sandbox"/ovnsb_db2.log \
+--remote=punix:"$sandbox"/ovnsb_db2.sock \
+--unixctl="$sandbox"/sb_backup_unixctl \
+--sync-from=unix:"$sandbox"/ovnsb_db.sock $ovsdb_sb_backup_server_args
 fi
 
 #Add a small delay to allow ovsdb-server to launch.
@@ -389,6 +398,12 @@ if $ovn; then cat << EOF
 This environment also has the OVN daemons and databases enabled.
 You can use ovn-nbctl and ovn-sbctl to interact with the OVN databases.
 
+The backup server of OVN SB can be accessed by:
+* ovn-sbctl --db=unix:`pwd`/sandbox/ovnsb_db2.sock
+* ovs-appctl -t `pwd`/sandbox/sb_backup_unixctl
+The backup database file is "sandbox"/ovnsb2.db
+
+
 EOF
 fi
 cat 



Re: [ovs-dev] [PATCH v3 2/3] datapath-windows: Add multiple switch internal ports

2016-08-09 Thread Sairam Venugopal
Hi Alin,

I took a preliminary look at the patch and have added some review
comments. 

I was mainly concerned about the use of switchContext->internalPortId when
we no longer have switchContext->internalVport and we can have multiple
internal ports. 

Thanks,
Sairam

On 8/2/16, 12:47 PM, "Alin Serdean" 
wrote:

>This patch adds multiple internal ports support to the windows datapath.
>All tunnel types have been updated to accommodate this new functionality.
>
>Signed-off-by: Alin Gabriel Serdean 
>Co-authored-by: Sorin Vinturis 
>Acked-by: Paul Boca 
>---
>v3: Add acked
>v2: Rebase
>---
> datapath-windows/ovsext/Actions.c  |  47 +-
> datapath-windows/ovsext/Geneve.c   |   5 +-
> datapath-windows/ovsext/Geneve.h   |   6 +-
> datapath-windows/ovsext/Gre.c  |   5 +-
> datapath-windows/ovsext/Gre.h  |   8 +-
> datapath-windows/ovsext/IpHelper.c | 917
>-
> datapath-windows/ovsext/IpHelper.h |  20 +-
> datapath-windows/ovsext/Stt.c  |   5 +-
> datapath-windows/ovsext/Stt.h  |   7 +-
> datapath-windows/ovsext/Switch.h   |   8 +-
> datapath-windows/ovsext/Vport.c   | 135 +++---
> datapath-windows/ovsext/Vport.h| 104 ++---
> datapath-windows/ovsext/Vxlan.c|   8 +-
> datapath-windows/ovsext/Vxlan.h|  10 +-
> 14 files changed, 882 insertions(+), 403 deletions(-)
>
>diff --git a/datapath-windows/ovsext/Actions.c
>b/datapath-windows/ovsext/Actions.c
>index 722a2a8..803b2b7 100644
>--- a/datapath-windows/ovsext/Actions.c
>+++ b/datapath-windows/ovsext/Actions.c
>@@ -301,7 +301,6 @@ OvsDetectTunnelPkt(OvsForwardingContext *ovsFwdCtx,
> return TRUE;
> }
> } else if (OvsIsTunnelVportType(dstVport->ovsType)) {
>-ASSERT(ovsFwdCtx->tunnelTxNic == NULL);
> ASSERT(ovsFwdCtx->tunnelRxNic == NULL);
> 
> /*
>@@ -322,7 +321,7 @@ OvsDetectTunnelPkt(OvsForwardingContext *ovsFwdCtx,
> 
> if (!vport ||
> (vport->ovsType != OVS_VPORT_TYPE_NETDEV &&
>- !OvsIsBridgeInternalVport(vport))) {
>+ vport->ovsType != OVS_VPORT_TYPE_INTERNAL)) {
> ovsFwdCtx->tunKey.dst = 0;
> }
> }
>@@ -402,10 +401,6 @@ OvsAddPorts(OvsForwardingContext *ovsFwdCtx,
> vport->stats.txBytes +=
> 
>NET_BUFFER_DATA_LENGTH(NET_BUFFER_LIST_FIRST_NB(ovsFwdCtx->curNbl));
> 
>-if (OvsIsBridgeInternalVport(vport)) {
>-return NDIS_STATUS_SUCCESS;
>-}
>-
> if (OvsDetectTunnelPkt(ovsFwdCtx, vport, flowKey)) {
> return NDIS_STATUS_SUCCESS;
> }
>@@ -667,41 +662,36 @@ OvsTunnelPortTx(OvsForwardingContext *ovsFwdCtx)
>  * Setup the source port to be the internal port to as to facilitate the
>  * second OvsLookupFlow.
>  */
>-if (ovsFwdCtx->switchContext->internalVport == NULL ||
>+if (ovsFwdCtx->switchContext->countInternalVports <= 0 ||
> ovsFwdCtx->switchContext->virtualExternalVport == NULL) {
> OvsClearTunTxCtx(ovsFwdCtx);
> OvsCompleteNBLForwardingCtx(ovsFwdCtx,
> L"OVS-Dropped since either internal or external port is
>absent");
> return NDIS_STATUS_FAILURE;
> }
>-ovsFwdCtx->srcVportNo =
>-
>((POVS_VPORT_ENTRY)ovsFwdCtx->switchContext->internalVport)->portNo;
> 
>-ovsFwdCtx->fwdDetail->SourcePortId =
>ovsFwdCtx->switchContext->internalPortId;
>-ovsFwdCtx->fwdDetail->SourceNicIndex =
>-
>((POVS_VPORT_ENTRY)ovsFwdCtx->switchContext->internalVport)->nicIndex;
>-
>-/* Do the encap. Encap function does not consume the NBL. */
>+OVS_FWD_INFO switchFwdInfo = { 0 };
>+/* Do the encapsulation. The encapsulation will not consume the NBL.
>*/
> switch(ovsFwdCtx->tunnelTxNic->ovsType) {
> case OVS_VPORT_TYPE_GRE:
> status = OvsEncapGre(ovsFwdCtx->tunnelTxNic, ovsFwdCtx->curNbl,
>  >tunKey,
>ovsFwdCtx->switchContext,
>- >layers, );
>+ >layers, ,
>);
> break;
> case OVS_VPORT_TYPE_VXLAN:
> status = OvsEncapVxlan(ovsFwdCtx->tunnelTxNic, ovsFwdCtx->curNbl,
>>tunKey,
>ovsFwdCtx->switchContext,
>-   >layers, );
>+   >layers, ,
>);
> break;
> case OVS_VPORT_TYPE_STT:
> status = OvsEncapStt(ovsFwdCtx->tunnelTxNic, ovsFwdCtx->curNbl,
>  >tunKey,
>ovsFwdCtx->switchContext,
>- >layers, );
>+ >layers, ,
>);
> break;
> case OVS_VPORT_TYPE_GENEVE:
> status = OvsEncapGeneve(ovsFwdCtx->tunnelTxNic,
>ovsFwdCtx->curNbl,
> >tunKey,
>ovsFwdCtx->switchContext,
>->layers, );
>+>layers, ,
>);
> 

Re: [ovs-dev] [PATCH 1/2] netdev-dpdk: Fix dead initialization reported by clang.

2016-08-09 Thread Daniele Di Proietto
Applied to master, thanks
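
The pattern behind the fix quoted below, as a standalone sketch: when the
first clause of a for loop assigns the token pointer, an initializer at the
declaration is a dead store. count_tokens() is a hypothetical function written
for this illustration, and plain strdup() stands in for the OVS xstrdup()
helper.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns the number of space-separated tokens in 'config'.
 * Hypothetical helper for illustration; not part of OVS. */
static size_t
count_tokens(const char *config)
{
    assert(config);

    /* strtok_r() modifies its input, so work on a private copy. */
    char *copy = strdup(config);
    assert(copy);

    /* 'tok' needs no initializer: the for loop's first clause assigns it
     * before it is ever read. */
    char *tok, *saveptr = NULL;
    size_t n = 0;

    for (tok = strtok_r(copy, " ", &saveptr); tok != NULL;
         tok = strtok_r(NULL, " ", &saveptr)) {
        n++;
    }

    free(copy);
    return n;
}
```

The original pointer to the copy is kept separately from 'tok' so that it can
still be freed after the loop, which is the role 'release_tok' plays in the
patch.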

2016-08-07 14:06 GMT-07:00 Bhanuprakash Bodireddy <
bhanuprakash.bodire...@intel.com>:

> Clang reports that value stored to 'tok' during initialization is never
> read.
>
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  lib/netdev-dpdk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index f37ec1c..dd79e4b 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -3063,7 +3063,7 @@ extra_dpdk_args(const char *ovs_extra_config, char ***argv, int argc)
>  {
>  int ret = argc;
>  char *release_tok = xstrdup(ovs_extra_config);
> -char *tok = release_tok, *endptr = NULL;
> +char *tok, *endptr = NULL;
>
>  for (tok = strtok_r(release_tok, " ", &endptr); tok != NULL;
>   tok = strtok_r(NULL, " ", &endptr)) {
> --
> 2.4.11
>


Re: [ovs-dev] [PATCH] INSTALL.DPDK: Update documentation for DPDK 16.07 support

2016-08-09 Thread Daniele Di Proietto
Applied to master, thanks

2016-08-08 0:55 GMT-07:00 Loftus, Ciara :

> >
> > Replace 'dpdk_nic_bind.py' references with 'dpdk-devbind.py'. The script
> > name is changed in DPDK 16.07 as the script can be used also on crypto
> > devices along with NICs.
> >
> > Update the command for setting packet forwarding mode in 'testpmd' app
> > from 'set fwd mac_retry' to 'set fwd mac retry'.
> >
> > Signed-off-by: Bhanuprakash Bodireddy
> > 
> > ---
> >  INSTALL.DPDK.md | 16 
> >  1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> > index 253d022..c0686ce 100644
> > --- a/INSTALL.DPDK.md
> > +++ b/INSTALL.DPDK.md
> > @@ -153,8 +153,8 @@ advanced install guide [INSTALL.DPDK-
> > ADVANCED.md]
> >  modprobe vfio-pci
> >  sudo /usr/bin/chmod a+x /dev/vfio
> >  sudo /usr/bin/chmod 0666 /dev/vfio/*
> > -$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1
> > -$DPDK_DIR/tools/dpdk_nic_bind.py --status
> > +$DPDK_DIR/tools/dpdk-devbind.py --bind=vfio-pci eth1
> > +$DPDK_DIR/tools/dpdk-devbind.py --status
> >  ```
> >
> >Note: If running kernels < 3.6 UIO drivers to be used,
> > @@ -398,8 +398,8 @@ can be found in [Vhost Walkthrough].
> >mount -t hugetlbfs hugetlbfs /dev/hugepages (only if not already
> > mounted)
> >modprobe uio
> >insmod $DPDK_BUILD/kmod/igb_uio.ko
> > -  $DPDK_DIR/tools/dpdk_nic_bind.py --status
> > -  $DPDK_DIR/tools/dpdk_nic_bind.py -b igb_uio 00:03.0 00:04.0
> > +  $DPDK_DIR/tools/dpdk-devbind.py --status
> > +  $DPDK_DIR/tools/dpdk-devbind.py -b igb_uio 00:03.0 00:04.0
> >```
> >
> >vhost ports pci ids can be retrieved using `lspci | grep Ethernet`
> cmd.
> > @@ -570,18 +570,18 @@ can be found in [Vhost Walkthrough].
> > ```
> > cd $DPDK_DIR/app/test-pmd;
> > ./testpmd -c 0x3 -n 4 --socket-mem 1024 -- --burst=64 -i
> --txqflags=0xf00
> > --disable-hw-vlan
> > -   set fwd mac_retry
> > +   set fwd mac retry
> > start
> > ```
> >
> > * Bind vNIC back to kernel once the test is completed.
> >
> > ```
> > -   $DPDK_DIR/tools/dpdk_nic_bind.py --bind=virtio-pci :00:03.0
> > -   $DPDK_DIR/tools/dpdk_nic_bind.py --bind=virtio-pci :00:04.0
> > +   $DPDK_DIR/tools/dpdk-devbind.py --bind=virtio-pci :00:03.0
> > +   $DPDK_DIR/tools/dpdk-devbind.py --bind=virtio-pci :00:04.0
> > ```
> > Note: Appropriate PCI IDs to be passed in above example. The PCI
> IDs
> > can be
> > -   retrieved using '$DPDK_DIR/tools/dpdk_nic_bind.py --status' cmd.
> > +   retrieved using '$DPDK_DIR/tools/dpdk-devbind.py --status' cmd.
> >
> >  ### 5.3 PHY-VM-PHY [IVSHMEM]
> >
> > --
> > 2.4.11
>
> Acked-by: Ciara Loftus 
>


Re: [ovs-dev] [PATCH v2 1/1] netdev-dpdk: Fix egress policer error detection bug.

2016-08-09 Thread Daniele Di Proietto
2016-08-09 10:20 GMT-07:00 Ian Stokes :

> When egress policer is set as a QoS type for a port, an error may occur
> during
> setup if incorrect parameters are used for the rte_meter. If this occurs
> the egress policer construct and set functions should free any allocated
> memory relevant to the policer and set the QoS configuration pointer to
> null. The netdev_dpdk_set_qos function should check the error value
> returned
> for any QoS construct/set calls with an assertion to avoid segfault.
> Also this commit modifies egress_policer_qos_set() to correctly lock the
> QoS
> spinlock while the egress policer configuration is updated to avoid
> segfault.
>
> Signed-off-by: Ian Stokes 
> ---
> v2
> * netdev-dpdk.c
> - Simplify assertion in netdev_dpdk_set_qos() to check that no error
>   has been returned and that a QoS configuration exists before checking
>   and logging an error.
> - Use rte_strerror  in netdev_dpdk_set_qos() when logging error for a
>   textual representation.
> - Align VLOG message for correct formatting in netdev_dpdk_set_qos().
> - egress_policer_qos_construct() now returns positive error.
> - egress_policer_qos_set() now return positive error.
> - Document addition of spinlock in egress_policer_qos_set() in commit
>   message.
> ---
>  lib/netdev-dpdk.c |   30 --
>  1 files changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index bf3a898..f37130e 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2731,11 +2731,16 @@ netdev_dpdk_set_qos(struct netdev *netdev,
>
>  /* Install new QoS configuration. */
>  error = new_ops->qos_construct(netdev, details);
> -ovs_assert((error == 0) == (dev->qos_conf != NULL));
>  }
>  } else {
>  error = new_ops->qos_construct(netdev, details);
> -ovs_assert((error == 0) == (dev->qos_conf != NULL));
> +}
> +
> +ovs_assert((error == 0) == (dev->qos_conf != NULL));
> +if (error) {
> +VLOG_ERR("Failed to set QoS type %s on port %s, returned error:
> %s",
> + type, netdev->name, rte_strerror(-error));
> +ovs_assert(dev->qos_conf == NULL);
>

This assert should be unnecessary, given the assert above the if.

I removed it and I pushed this to master, thanks
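
The reasoning behind dropping the second assert can be sketched in isolation. This is an illustration under assumptions, not the netdev-dpdk code: qos_invariant_holds() is a hypothetical stand-in for the expression inside the first ovs_assert(). Once that invariant has been checked, a nonzero error already implies a NULL configuration pointer, so asserting it again inside the "if (error)" branch adds nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the checked expression:
 * success (error == 0) if and only if a QoS configuration exists. */
static bool
qos_invariant_holds(int error, const void *qos_conf)
{
    return (error == 0) == (qos_conf != NULL);
}

/* Given that the invariant holds, reports whether the configuration
 * pointer must be NULL.  It must be whenever 'error' is nonzero. */
static bool
conf_must_be_null(int error, const void *qos_conf)
{
    assert(qos_invariant_holds(error, qos_conf));
    return error != 0;
}
```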


Re: [ovs-dev] [PATCH] INSTALL.DPDK-ADVANCED: Add vhost multiqueue loopback testcase.

2016-08-09 Thread Daniele Di Proietto
Applied to master, thanks

2016-07-28 5:48 GMT-07:00 Bhanuprakash Bodireddy <
bhanuprakash.bodire...@intel.com>:

> Add steps for loopback test using vhost-user configured with multiqueue
> doing packet forwarding in kernel.
>
> Signed-off-by: Bhanuprakash Bodireddy 
> ---
>  INSTALL.DPDK-ADVANCED.md | 86 ++
> ++
>  1 file changed, 86 insertions(+)
>
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index 9ae536d..63440d0 100644
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -372,6 +372,91 @@ For users wanting to do packet forwarding using
> kernel stack below are the steps
> where "-n 0" refers to ring '0' i.e dpdkr0
> ```
>
> +### 5.3 PHY-VM-PHY [VHOST MULTIQUEUE]
> +
> +  The steps (1-5) in 3.3 section of [INSTALL DPDK] guide will create &
> initialize DB,
> +  start vswitchd and add dpdk devices to bridge br0.
> +
> +  1. Configure PMD and RXQs. For example set no. of dpdk port rx queues
> to atleast 2.
> + The number of rx queues at vhost-user interface gets automatically
> configured after
> + virtio device connection and doesn't need manual configuration.
> +
> + ```
> + ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=c
> + ovs-vsctl set Interface dpdk0 options:n_rxq=2
> + ovs-vsctl set Interface dpdk1 options:n_rxq=2
> + ```
> +
> +  2. Instantiate Guest VM using Qemu cmdline
> +
> +   Guest Configuration
> +
> +   ```
> +   | configuration| values | comments
> +   |--||-
> +   | qemu version | 2.5.0  |
> +   | qemu thread affinity |2 cores | taskset 0x30
> +   | memory   | 4GB| -
> +   | cores| 2  | -
> +   | Qcow2 image  |Fedora22| -
> +   | multiqueue   |   on   | -
> +   ```
> +
> +   Instantiate Guest
> +
> +   ```
> +   export VM_NAME=vhost-vm
> +   export GUEST_MEM=4096M
> +   export QCOW2_IMAGE=/root/Fedora22_x86_64.qcow2
> +   export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch
> +
> +   taskset 0x30 qemu-system-x86_64 -cpu host -smp 2,cores=2 -drive
> file=$QCOW2_IMAGE -m 4096M --enable-kvm -name $VM_NAME -nographic -object
> memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on
> -numa node,memdev=mem -mem-prealloc -chardev 
> socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser0
> -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=2
> -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=6
> -chardev socket,id=char2,path=$VHOST_SOCK_DIR/dpdkvhostuser1 -netdev
> type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=2 -device
> virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=6
> +   ```
> +
> +   Note: Queue value above should match the queues configured in OVS,
> The vector value
> +   should be set to 'no. of queues x 2 + 2'.
> +
> +  3. Guest interface configuration
> +
> + Assuming there are 2 interfaces in the guest named eth0, eth1 check
> the channel
> + configuration and set the number of combined channels to 2 for
> virtio devices.
> + More information can be found in [Vhost walkthrough] section.
> +
> +   ```
> +   ethtool -l eth0
> +   ethtool -L eth0 combined 2
> +   ethtool -L eth1 combined 2
> +   ```
> +
> +  4. Kernel Packet forwarding
> +
> + Configure IP and enable interfaces
> +
> + ```
> + ifconfig eth0 5.5.5.1/24 up
> + ifconfig eth1 90.90.90.1/24 up
> + ```
> +
> + Configure IP forwarding and add route entries
> +
> + ```
> + sysctl -w net.ipv4.ip_forward=1
> + sysctl -w net.ipv4.conf.all.rp_filter=0
> + sysctl -w net.ipv4.conf.eth0.rp_filter=0
> + sysctl -w net.ipv4.conf.eth1.rp_filter=0
> + ip route add 2.1.1.0/24 dev eth1
> + route add default gw 2.1.1.2 eth1
> + route add default gw 90.90.90.90 eth1
> + arp -s 90.90.90.90 DE:AD:BE:EF:CA:FE
> + arp -s 2.1.1.2 DE:AD:BE:EF:CA:FA
> + ```
> +
> + Check traffic on multiple queues
> +
> + ```
> + cat /proc/interrupts | grep virtio
> + ```
> +
>  ##  6. Vhost Walkthrough
>
>  DPDK 16.04 supports two types of vhost:
> @@ -848,5 +933,6 @@ Please report problems to b...@openvswitch.org.
>  [DPDK Docs]: http://dpdk.org/doc
>  [libvirt]: http://libvirt.org/formatdomain.html
>  [Guest VM using libvirt]: INSTALL.DPDK.md#ovstc
> +[Vhost walkthrough]: INSTALL.DPDK.md#vhost
>  [INSTALL DPDK]: INSTALL.DPDK.md#build
>  [INSTALL OVS]: INSTALL.DPDK.md#build
> --
> 2.4.11
>


Re: [ovs-dev] [PATCH] Revert "pvector: Expose non-concurrent priority vector."

2016-08-09 Thread Daniele Di Proietto
Simple revert, looks good to me, thanks

Acked-by: Daniele Di Proietto 


2016-08-09 13:59 GMT-07:00 Jarno Rajahalme :

> This reverts commit 8bdfe1313894047d44349fa4cf4402970865950f.
>
> I failed to see that lib/dpif-netdev.c actually needs the concurrency
> provided by pvector prior to this change.  More specifically, when a
> subtable is removed, concurrent lookups may skip over another subtable
> swapped in to the place of the removed subtable in the vector.
>
> Since this was the only use of the non-concurrent pvector, it is
> cleaner to revert the whole patch.
>
> Reported-by: Jan Scheurich 
> Signed-off-by: Jarno Rajahalme 
> ---
>  lib/classifier.c|  30 
>  lib/classifier.h|   6 +-
>  lib/dpif-netdev.c   |  14 ++--
>  lib/pvector.c   | 190 +++---
> --
>  lib/pvector.h   | 187 +++---
> -
>  tests/test-classifier.c |  12 +--
>  6 files changed, 182 insertions(+), 257 deletions(-)
>
> diff --git a/lib/classifier.c b/lib/classifier.c
> index 8f195d5..0551146 100644
> --- a/lib/classifier.c
> +++ b/lib/classifier.c
> @@ -325,7 +325,7 @@ classifier_init(struct classifier *cls, const uint8_t
> *flow_segments)
>  {
>  cls->n_rules = 0;
>  cmap_init(>subtables_map);
> -cpvector_init(>subtables);
> +pvector_init(>subtables);
>  cls->n_flow_segments = 0;
>  if (flow_segments) {
>  while (cls->n_flow_segments < CLS_MAX_INDICES
> @@ -359,7 +359,7 @@ classifier_destroy(struct classifier *cls)
>  }
>  cmap_destroy(>subtables_map);
>
> -cpvector_destroy(>subtables);
> +pvector_destroy(>subtables);
>  }
>  }
>
> @@ -658,20 +658,20 @@ classifier_replace(struct classifier *cls, const
> struct cls_rule *rule,
>  if (n_rules == 1) {
>  subtable->max_priority = rule->priority;
>  subtable->max_count = 1;
> -cpvector_insert(>subtables, subtable, rule->priority);
> +pvector_insert(>subtables, subtable, rule->priority);
>  } else if (rule->priority == subtable->max_priority) {
>  ++subtable->max_count;
>  } else if (rule->priority > subtable->max_priority) {
>  subtable->max_priority = rule->priority;
>  subtable->max_count = 1;
> -cpvector_change_priority(>subtables, subtable,
> rule->priority);
> +pvector_change_priority(>subtables, subtable,
> rule->priority);
>  }
>
>  /* Nothing was replaced. */
>  cls->n_rules++;
>
>  if (cls->publish) {
> -cpvector_publish(>subtables);
> +pvector_publish(>subtables);
>  }
>
>  return NULL;
> @@ -803,12 +803,12 @@ check_priority:
>  }
>  }
>  subtable->max_priority = max_priority;
> -cpvector_change_priority(>subtables, subtable,
> max_priority);
> +pvector_change_priority(>subtables, subtable,
> max_priority);
>  }
>  }
>
>  if (cls->publish) {
> -cpvector_publish(>subtables);
> +pvector_publish(>subtables);
>  }
>
>  /* free the rule. */
> @@ -959,8 +959,8 @@ classifier_lookup__(const struct classifier *cls,
> ovs_version_t version,
>
>  /* Main loop. */
>  struct cls_subtable *subtable;
> -CPVECTOR_FOR_EACH_PRIORITY (subtable, hard_pri + 1, 2, sizeof
> *subtable,
> ->subtables) {
> +PVECTOR_FOR_EACH_PRIORITY (subtable, hard_pri + 1, 2, sizeof
> *subtable,
> +   >subtables) {
>  struct cls_conjunction_set *conj_set;
>
>  /* Skip subtables with no match, or where the match is
> lower-priority
> @@ -1231,8 +1231,8 @@ classifier_rule_overlaps(const struct classifier
> *cls,
>  struct cls_subtable *subtable;
>
>  /* Iterate subtables in the descending max priority order. */
> -CPVECTOR_FOR_EACH_PRIORITY (subtable, target->priority, 2,
> -sizeof(struct cls_subtable),
> >subtables) {
> +PVECTOR_FOR_EACH_PRIORITY (subtable, target->priority, 2,
> +   sizeof(struct cls_subtable),
> >subtables) {
>  struct {
>  struct minimask mask;
>  uint64_t storage[FLOW_U64S];
> @@ -1350,8 +1350,8 @@ cls_cursor_start(const struct classifier *cls, const
> struct cls_rule *target,
>  cursor.rule = NULL;
>
>  /* Find first rule. */
> -CPVECTOR_CURSOR_FOR_EACH (subtable, ,
> -  >subtables) {
> +PVECTOR_CURSOR_FOR_EACH (subtable, ,
> + >subtables) {
>  const struct cls_rule *rule = search_subtable(subtable, );
>
>  if (rule) {
> @@ -1378,7 +1378,7 @@ cls_cursor_next(struct cls_cursor *cursor)
>  }
>  }
>
> -CPVECTOR_CURSOR_FOR_EACH_CONTINUE (subtable, >subtables) {
> +PVECTOR_CURSOR_FOR_EACH_CONTINUE (subtable, 

Re: [ovs-dev] [PATCH RFC v2 3/3] ovn: add SLAAC support for IPv6

2016-08-09 Thread Dustin Lundquist
On Mon, Aug 1, 2016 at 7:16 PM, Zong Kai LI  wrote:

> This patch tries to implement Router Advertisement (RA) responder for SLAAC
> on ovn-northd side.
>
> It tries to build lflows per each Logical Router Port, who have IPv6
> networks
> and set their 'slaac' column to true.
>
> The lflows will look like:
>  match=(inport == "lrp-32a71e0b-8b19-4c52-8cde-058325e4df5d" &&
> ip6.dst == ff02::2 && nd_rs)
>  action=(nd_ra{slaac(fd80:a123:b345::/64,1450,fa:16:3e:62:f1:e6);
>  outport = inport; flags.loopback = 1; output;};)
> while:
>  - nd_rs is a new symbol stands for
>"icmp6.type == 133 && icmp6.code == 0 && ttl == 255"
>  - slaac is a new action which accepts ordered parameter list:
>  - one or more IPv6 prefixes: such as fd80:a123:b345::/64.
>  - MTU: logical switch MTU, such as 1450.
>  - MAC address: router port mac address, such as fa:16:3e:62:f1:e6.
>  - nd_ra is a new action which stands for RA responder, it will compose a
> RA
>packet per parameters in slaac, and eth.src and ip6.src from packet
> being
>processed.
>
This would be a router solicitation responder, since it responds to router
solicitation messages by sending router advertisement messages.
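
To make the naming point concrete, here is a minimal sketch of the request/response pairing, using the nd_rs match quoted above (ICMPv6 type 133, code 0, hop limit 255) and ICMPv6 type 134 for the Router Advertisement. The function and constant names are hypothetical and are not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    ND_TYPE_ROUTER_SOLICIT = 133,   /* Incoming request (RS). */
    ND_TYPE_ROUTER_ADVERT = 134     /* Outgoing reply (RA). */
};

/* Mirrors the quoted nd_rs prerequisite. */
static bool
is_router_solicitation(uint8_t type, uint8_t code, uint8_t hop_limit)
{
    return type == ND_TYPE_ROUTER_SOLICIT && code == 0 && hop_limit == 255;
}

/* Returns the ICMPv6 type of the reply, or -1 if the packet is not an RS
 * and therefore gets no reply from this responder. */
static int
responder_reply_type(uint8_t type, uint8_t code, uint8_t hop_limit)
{
    return is_router_solicitation(type, code, hop_limit)
           ? ND_TYPE_ROUTER_ADVERT : -1;
}
```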

>
> Logical_Router_Port.slaac column will only tell whether ovn should reply a
> RA
> packet for Router solicitation packet received from the lrp port. To
> respond
> a RA packet for other scenario will be a future work.
> ---
>  ovn/northd/ovn-northd.c |  94 
>  ovn/ovn-nb.ovsschema|   6 ++-
>  ovn/ovn-nb.xml  |  11 +
>  tests/ovn.at| 111 ++
> ++
>  4 files changed, 213 insertions(+), 9 deletions(-)
>
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index d6c14cf..98db819 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -126,9 +126,10 @@ enum ovn_stage {
>  PIPELINE_STAGE(ROUTER, IN,  IP_INPUT,1, "lr_in_ip_input") \
>  PIPELINE_STAGE(ROUTER, IN,  UNSNAT,  2, "lr_in_unsnat")   \
>  PIPELINE_STAGE(ROUTER, IN,  DNAT,3, "lr_in_dnat") \
> -PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  4, "lr_in_ip_routing")   \
> -PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 5, "lr_in_arp_resolve")  \
> -PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 6, "lr_in_arp_request")  \
> +PIPELINE_STAGE(ROUTER, IN,  RA_RSP,  4, "lr_in_ra_rsp")  \

Since this is responding to router solicitation messages, should this be
RS_RSP?

> +PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  5, "lr_in_ip_routing")   \
> +PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 6, "lr_in_arp_resolve")  \
> +PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 7, "lr_in_arp_request")  \
>\
>  /* Logical router egress stages. */   \
>  PIPELINE_STAGE(ROUTER, OUT, SNAT,  0, "lr_out_snat")  \
> @@ -3409,7 +3443,51 @@ build_lrouter_flows(struct hmap *datapaths, struct
> hmap *ports,
>"ip", "flags.loopback = 1; ct_dnat;");
>  }
>
> -/* Logical router ingress table 4: IP Routing.
> +/* Logical router ingress table 5: RA responder, by default goto next.
>
Isn't this table 4 now?

> + * (priority 0)*/
> +HMAP_FOR_EACH (od, key_node, datapaths) {
> +if (!od->nbr) {
> +continue;
> +}
> +
> +ovn_lflow_add(lflows, od, S_ROUTER_IN_RA_RSP, 0, "1", "next;");
> +}
> +
> +/* Logical router ingress table 5: RA responder, reply for 'slaac'
> enabled
> + * router port. (priority 50)*/
> +HMAP_FOR_EACH (op, key_node, ports) {
> +if (!op->nbrp || op->nbrp->peer
> +|| !op->peer
> +|| !op->nbrp->slaac
> +|| !*op->nbrp->slaac) {
> +continue;
> +}
> +
> +ds_clear();
> +ds_put_format(, "inport == %s", op->json_key);
> +ds_put_cstr(,  " && ip6.dst == ff02::2 && nd_rs");
> +ds_clear();
> +ds_put_format(, "nd_ra{slaac(");
> +size_t actions_len = actions.length;
> +for (size_t i = 0; i != op->lrp_networks.n_ipv6_addrs; i++) {
> +if (in6_is_lla(>lrp_networks.ipv6_addrs[i].network)) {
> +continue;
> +}
> +ds_put_format(, "%s/%u,",
> +  op->lrp_networks.ipv6_addrs[i].network_s,
> +  op->lrp_networks.ipv6_addrs[i].plen);
> +}
> +if (actions.length != actions_len) {
> +ds_put_format(, "%ld,", op->peer->od->nbs->mtu);
> +ds_put_cstr(, op->lrp_networks.ea_s);
> +ds_put_cstr(, "); outport = inport; flags.loopback =
> 1;"
> +  " output;};");
> +ovn_lflow_add(lflows, op->od, S_ROUTER_IN_RA_RSP, 50,
> +  ds_cstr(), ds_cstr());
> +}
> +}

Re: [ovs-dev] [PATCH v3] dpif-netdev: dpcls per in_port with sorted subtables

2016-08-09 Thread Jan Scheurich
From your second mail I figure I should base on your revert patch and post a v5 
instead where cpvector is replaced by the old pvector again.

What is the best procedure for this? I guess I should wait until the revert 
patch is merged. But that might delay the review and reduce chances of making 
it for 2.6. Please advise.

Thanks, Jan


From: Jarno Rajahalme [mailto:ja...@ovn.org]
Sent: Tuesday, 09 August, 2016 22:04
To: Jan Scheurich
Cc: dev@openvswitch.org
Subject: Re: [ovs-dev] [PATCH v3] dpif-netdev: dpcls per in_port with sorted 
subtables


On Aug 9, 2016, at 7:08 AM, Jan Scheurich wrote:

- Adapted to renamed cpvector API
 Reverted dpcls to using cpvector due to threading issue during flow removal

Would you be so kind as to make this a separate patch, with a more detailed 
commit message?

Thanks,

  Jarno



[ovs-dev] [PATCH] Revert "pvector: Expose non-concurrent priority vector."

2016-08-09 Thread Jarno Rajahalme
This reverts commit 8bdfe1313894047d44349fa4cf4402970865950f.

I failed to see that lib/dpif-netdev.c actually needs the concurrency
provided by pvector prior to this change.  More specifically, when a
subtable is removed, concurrent lookups may skip over another subtable
swapped in to the place of the removed subtable in the vector.

Since this was the only use of the non-concurrent pvector, it is
cleaner to revert the whole patch.

Reported-by: Jan Scheurich 
Signed-off-by: Jarno Rajahalme 
---
 lib/classifier.c|  30 
 lib/classifier.h|   6 +-
 lib/dpif-netdev.c   |  14 ++--
 lib/pvector.c   | 190 +++-
 lib/pvector.h   | 187 +++
 tests/test-classifier.c |  12 +--
 6 files changed, 182 insertions(+), 257 deletions(-)

diff --git a/lib/classifier.c b/lib/classifier.c
index 8f195d5..0551146 100644
--- a/lib/classifier.c
+++ b/lib/classifier.c
@@ -325,7 +325,7 @@ classifier_init(struct classifier *cls, const uint8_t 
*flow_segments)
 {
 cls->n_rules = 0;
 cmap_init(&cls->subtables_map);
-cpvector_init(&cls->subtables);
+pvector_init(&cls->subtables);
 cls->n_flow_segments = 0;
 if (flow_segments) {
 while (cls->n_flow_segments < CLS_MAX_INDICES
@@ -359,7 +359,7 @@ classifier_destroy(struct classifier *cls)
 }
 cmap_destroy(&cls->subtables_map);
 
-cpvector_destroy(&cls->subtables);
+pvector_destroy(&cls->subtables);
 }
 }
 
@@ -658,20 +658,20 @@ classifier_replace(struct classifier *cls, const struct 
cls_rule *rule,
 if (n_rules == 1) {
 subtable->max_priority = rule->priority;
 subtable->max_count = 1;
-cpvector_insert(&cls->subtables, subtable, rule->priority);
+pvector_insert(&cls->subtables, subtable, rule->priority);
 } else if (rule->priority == subtable->max_priority) {
 ++subtable->max_count;
 } else if (rule->priority > subtable->max_priority) {
 subtable->max_priority = rule->priority;
 subtable->max_count = 1;
-cpvector_change_priority(&cls->subtables, subtable, rule->priority);
+pvector_change_priority(&cls->subtables, subtable, rule->priority);
 }
 
 /* Nothing was replaced. */
 cls->n_rules++;
 
 if (cls->publish) {
-cpvector_publish(&cls->subtables);
+pvector_publish(&cls->subtables);
 }
 
 return NULL;
@@ -803,12 +803,12 @@ check_priority:
 }
 }
 subtable->max_priority = max_priority;
-cpvector_change_priority(&cls->subtables, subtable, max_priority);
+pvector_change_priority(&cls->subtables, subtable, max_priority);
 }
 }
 
 if (cls->publish) {
-cpvector_publish(>subtables);
+pvector_publish(>subtables);
 }
 
 /* free the rule. */
@@ -959,8 +959,8 @@ classifier_lookup__(const struct classifier *cls, 
ovs_version_t version,
 
 /* Main loop. */
 struct cls_subtable *subtable;
-CPVECTOR_FOR_EACH_PRIORITY (subtable, hard_pri + 1, 2, sizeof *subtable,
-&cls->subtables) {
+PVECTOR_FOR_EACH_PRIORITY (subtable, hard_pri + 1, 2, sizeof *subtable,
+   &cls->subtables) {
 struct cls_conjunction_set *conj_set;
 
 /* Skip subtables with no match, or where the match is lower-priority
@@ -1231,8 +1231,8 @@ classifier_rule_overlaps(const struct classifier *cls,
 struct cls_subtable *subtable;
 
 /* Iterate subtables in the descending max priority order. */
-CPVECTOR_FOR_EACH_PRIORITY (subtable, target->priority, 2,
-sizeof(struct cls_subtable), &cls->subtables) {
+PVECTOR_FOR_EACH_PRIORITY (subtable, target->priority, 2,
+   sizeof(struct cls_subtable), &cls->subtables) {
 struct {
 struct minimask mask;
 uint64_t storage[FLOW_U64S];
@@ -1350,8 +1350,8 @@ cls_cursor_start(const struct classifier *cls, const 
struct cls_rule *target,
 cursor.rule = NULL;
 
 /* Find first rule. */
-CPVECTOR_CURSOR_FOR_EACH (subtable, &cursor.subtables,
-  &cls->subtables) {
+PVECTOR_CURSOR_FOR_EACH (subtable, &cursor.subtables,
+ &cls->subtables) {
 const struct cls_rule *rule = search_subtable(subtable, &cursor);
 
 if (rule) {
@@ -1378,7 +1378,7 @@ cls_cursor_next(struct cls_cursor *cursor)
 }
 }
 
-CPVECTOR_CURSOR_FOR_EACH_CONTINUE (subtable, &cursor->subtables) {
+PVECTOR_CURSOR_FOR_EACH_CONTINUE (subtable, &cursor->subtables) {
 rule = search_subtable(subtable, cursor);
 if (rule) {
 cursor->subtable = subtable;
@@ -1510,7 +1510,7 @@ destroy_subtable(struct classifier *cls, struct 
cls_subtable *subtable)
 {
 int i;
 
-cpvector_remove(&cls->subtables, subtable);
+pvector_remove(&cls->subtables, subtable);
 cmap_remove(&cls->subtables_map, &subtable->cmap_node,
 

Re: [ovs-dev] [PATCH RFC v2 2/3] ovn: add SLAAC support for IPv6

2016-08-09 Thread Dustin Lundquist
On Mon, Aug 1, 2016 at 7:19 PM, Zong Kai LI  wrote:

> This patch tries to implement Router Advertisement (RA) responder for SLAAC
> on ovn-controller side.
>
> It parses lflows which have:
>  - match: inport == LRP_NAME && ip6.dst == ff02::2 && nd_rs
>(nd_rs: icmp6.type == 133 && icmp6.code == 0 && ttl == 255)
>  - action: nd_ra{slaac(prefix,...,MTU, mac_address);
>outport = inport; flags.loopback = 1; output;};
>(nd_ra is a new action which stands for RA responder;
> slaac is a new action which has the following parameters to tell
> ovn-controller to compose RA packet with SLAAC flags, with these
> parameters:
>  - IPv6 prefix(es): such as fd80:a0f9:a012::/64.
>  - MTU: logical switch MTU, such as 1450.
>  - MAC address: router port mac address, such as fa:16:3e:12:34:56.
> Beside the parameters list above, nd_ra action will use eth.src and
> ip6.src
> from packet being processed as RA packet eth.src and ip6.src.
> After a RA packet is composed, the left nested actions will make RA packet
> transmitted back to the inport, where Router Solicitation (RS) packet
> comes.
>
> For inner action 'slaac', as a prototype, it doesn't try to expose all RA
> relevant flags and fields. As most flags are constant for in SLAAC
> scenario,
> so it only exposed few fields user may care, such as prefixes, MTU and LLA.
> It uses an ordered parameters list to hold all those values for now, but in
> future, when we try to implement more feature on RA, the unexposed ones
> should
> be exposed, and key-pair style parameter list should be used.
>
> For outer action 'nd_ra', as a prototype, it's just a RA responder now,
> not a
> real implement for RA, like it wont send periodic RA broadcast. This will
> be
> harmful when routing relevant stuff changes, like user changes lrp mac,
> disattach a switch from a router. However, a periodic version seems not so
> necessary, but a broadcast should be sent when things change.
>
> For lflow, this patch add prerequisites for Router Solicitation (RS) and RA
> message. But it doesn't fix details of nd.target, nd.sll and nd.tll for RS
> and RA, since these fields are not supported to be modified via ovs native
> actions for now.
> ---
>  include/ovn/actions.h|  15 +
>  include/ovn/expr.h   |  11 
>  ovn/controller/lflow.c   |   6 +-
>  ovn/controller/pinctrl.c | 168 ++
> +
>  ovn/lib/actions.c|  69 +++
>  ovn/lib/expr.c   |  11 +---
>  ovn/ovn-sb.xml   |  32 -
>  tests/ovn.at |   6 +-
>  tests/test-ovn.c |   6 +-
>  9 files changed, 310 insertions(+), 14 deletions(-)
>
> diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
>
index bd685fe..fa1e7d7 100644
> --- a/ovn/controller/pinctrl.c
> +++ b/ovn/controller/pinctrl.c
>
@@ -1019,3 +1026,164 @@ exit:
>  dp_packet_uninit();
>  ofpbuf_uninit();
>  }
> +
> +static bool pinctrl_handle_slaac(const struct flow *ip_flow,
> + struct ofpbuf *userdata,
> + struct dp_packet *packet,
> + int *left_bytes)
> +{
> +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
> +bool ret_val = false;
> +struct ipv6_netaddr *prefixes = NULL;
> +/* Get number of prefixes. */
> +uint8_t *n_prefixes = ofpbuf_try_pull(userdata, 1);
> +if (!n_prefixes || *n_prefixes == 0) {
> +VLOG_WARN_RL(&rl, "Failed to parse prefixes number.");
> +goto exit;
> +}
> +(*left_bytes)--;
> +
> +/* Get prefixes. */
> +size_t prefixes_size = sizeof(struct ipv6_netaddr) * (*n_prefixes);
> +prefixes = xmalloc(prefixes_size);
> +struct in6_addr *prefix;
> +uint8_t *prefix_len = NULL;
> +for (size_t i = 0; i < *n_prefixes; i++) {
> +prefix = ofpbuf_try_pull(userdata, sizeof(struct in6_addr));
> +if (!prefix) {
> +VLOG_WARN_RL(&rl, "Failed to parse ipv6 prefix.");
> +goto exit;
> +}
> +prefix_len = ofpbuf_try_pull(userdata, 1);
> +if (!prefix_len || *prefix_len == 0) {
> +VLOG_WARN_RL(&rl, "Failed to parse prefix len.");
> +goto exit;
> +}
> +memcpy(&prefixes[i].addr, prefix, sizeof(struct in6_addr));
> +prefixes[i].plen = *prefix_len;
> +}
> +(*left_bytes) -= *n_prefixes * (sizeof(struct in6_addr) + 1);
> +
> +/* Get MTU. */
> +ovs_be32 *mtu = ofpbuf_try_pull(userdata, sizeof(ovs_be32));
> +if (!mtu || *mtu == 0) {
> +VLOG_WARN_RL(&rl, "Failed to parse mtu.");
> +goto exit;
> +}
> +(*left_bytes) -= sizeof(ovs_be32);
> +
> +/* Get SLL. */
> +struct eth_addr *sll = ofpbuf_try_pull(userdata,
> +   sizeof(struct eth_addr));
> +if (!sll) {
> +VLOG_WARN_RL(&rl, "Failed to parse link-layer address.");
> +goto exit;

Re: [ovs-dev] [PATCH V14] Function tracer to trace all functioncalls

2016-08-09 Thread Ryan Moats
"dev"  wrote on 08/08/2016 11:41:21 PM:

> >
> > There is a python file [generate_ft_report.py] with the patch,
> > that may be used to convert this trace output to a human readable
> > format with symbol names instead of address and their execution
> > times. This tool uses addr2line that expects the executable to
> > be built with -g flag.

After further review, I'm going to call this paragraph into question.
The compile flags already include "-g", so I can have symbols
already (at least enough to use gdb).  Therefore, I'm going to ask
what happens if the following diff:

> > diff --git a/utilities/automake.mk b/utilities/automake.mk
> > index 9d5b425..c12d279 100644
> > --- a/utilities/automake.mk
> > +++ b/utilities/automake.mk
> > @@ -1,3 +1,11 @@
> > +if ENABLE_FT
> > +CFLAGS += -g -finstrument-functions \
> > +  -ldl -export-dynamic -lrt -DENABLE_FT \
> > +
> -f"instrument-functions-exclude-function-list=fprintf,time_init,\
> > +
> xclock_gettime,time_timespec__,timespec_to_msec,timespec_to_msec,\
> > +  time_msec__,time_msec,gettimeofday"
> > +endif
> > +
> >  bin_PROGRAMS += \
> > utilities/ovs-appctl \
> > utilities/ovs-testcontroller \

only defines "-finstrument-functions -DENABLE_FT \" from the first
two lines of CFLAGS.

Note: I've tried this out in a small cloud and removing "-g" and
"-ldl -export-dynamic -lrt" significantly improves rally-measured
performance, both for a single network containing 200 ports and
for 60 networks, each with 10 ports.

Ryan
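
For context on which flag does the instrumenting: with GCC, -finstrument-functions by itself makes the compiler emit calls to the two hooks below around each function, while -g only adds debug information for tools such as addr2line. The following is a minimal sketch of such hooks, not the tracer from the patch; the counters n_enter and n_exit are hypothetical names used for illustration.

```c
/* Call counters updated by the hooks (illustrative only). */
static unsigned long n_enter;
static unsigned long n_exit;

/* Called on entry to every instrumented function when the program is
 * built with -finstrument-functions.  The attribute keeps the hook
 * itself from being instrumented, which would recurse. */
__attribute__((no_instrument_function))
void
__cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void) this_fn;      /* Address of the function being entered. */
    (void) call_site;    /* Address of the call site. */
    n_enter++;
}

/* Called just before every instrumented function returns. */
__attribute__((no_instrument_function))
void
__cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void) this_fn;
    (void) call_site;
    n_exit++;
}
```

A real tracer would record this_fn and a timestamp instead of counting; the raw addresses can be resolved to symbol names afterwards, which is where debug information matters.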


Re: [ovs-dev] [PATCH v2 3/3] pvector: Expose non-concurrent priority vector.

2016-08-09 Thread Jarno Rajahalme

> On Aug 8, 2016, at 3:10 PM, Jan Scheurich  wrote:
> 
> Hi Jarno,
> 
> While trying to rebase my "dpcls per in_port" patch to your updated 
> pvector/cpvector implementation, I have stumbled over a threading issue in 
> your patch.
> 
> I believe that dpcls_destroy_subtable(), which may be invoked from 
> revalidater threads at flow removal, should not simply call the new 
> non-concurrent pvector_remove() function as this may destroy the priority 
> order of the vector when it swaps in the last element.
> 
> In this particular case the damage done to the concurrently iterating PMD 
> thread might not be fatal, but I believe using the non-concurrent API from 
> two different threads is not OK. Seizing the pmd->flow_mutex for flow removal 
> does not suffice as it does not prevent the PMD thread from iterating the 
> pvector at lookup.
> 
> In my view we should revert dpcls to use the concurrent cpvector API. I would 
> also use that approach for v3 of my "dpcls per in_port with sorted subtables" 
> patch.
> 

Thanks for figuring this out. I think it is cleaner to just revert that patch 
that exposed the non-concurrent pvector, as it has no other users. I'll post a 
patch to that effect shortly.

Regards,

  Jarno
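
The failure mode discussed in this thread can be sketched with a plain array. This is a single-threaded simulation under assumptions, not the pvector implementation: struct vec, vec_remove_swap() and visit_with_removal() are hypothetical names, and the "concurrent" removal is simply interleaved at a fixed point of the reader's loop. It shows how removing an element by swapping the last one into its slot lets an in-progress iteration miss the swapped element.

```c
#include <stddef.h>

struct vec {
    int items[8];    /* Priorities, kept in descending order. */
    size_t n;
};

/* Non-concurrent removal: move the last element into the freed slot. */
static void
vec_remove_swap(struct vec *v, size_t idx)
{
    v->items[idx] = v->items[v->n - 1];
    v->n--;
}

/* Simulated reader: iterates 'v' and, just before visiting index
 * 'pause_at', a writer removes the element at 'remove_idx'.
 * Stores the visited values in 'seen' and returns how many there were. */
static size_t
visit_with_removal(struct vec *v, size_t pause_at, size_t remove_idx,
                   int *seen)
{
    size_t n_seen = 0;

    for (size_t i = 0; i < v->n; i++) {
        if (i == pause_at) {
            vec_remove_swap(v, remove_idx);
            if (i >= v->n) {
                break;
            }
        }
        seen[n_seen++] = v->items[i];
    }
    return n_seen;
}
```

With priorities {30, 20, 10, 5}, a reader paused before index 2 while the element at index 0 is removed ends up visiting 30, 20 and 10: the element 5 has been moved to index 0, behind the reader, and is never seen.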



Re: [ovs-dev] [PATCH v3] dpif-netdev: dpcls per in_port with sorted subtables

2016-08-09 Thread Jarno Rajahalme

> On Aug 9, 2016, at 7:08 AM, Jan Scheurich  wrote:
> 
> - Adapted to renamed cpvector API
>  Reverted dpcls to using cpvector due to threading issue during flow removal

Would you be so kind as to make this a separate patch, with a more detailed 
commit message?

Thanks,

  Jarno



[ovs-dev] [v2] ovs-vsctl: simplify vsctl_parent_process_info()

2016-08-09 Thread Andy Zhou
Use ds_get_line() instead of hand-rolling the read loop. Rearrange the logic
to remove some duplication.

Signed-off-by: Andy Zhou 

---
v1->v2:  rebase to current master.
---
 utilities/ovs-vsctl.c | 16 +++-
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/utilities/ovs-vsctl.c b/utilities/ovs-vsctl.c
index efdfb86..e710095 100644
--- a/utilities/ovs-vsctl.c
+++ b/utilities/ovs-vsctl.c
@@ -2488,26 +2488,16 @@ vsctl_parent_process_info(void)
 procfile = xasprintf("/proc/%d/cmdline", parent_pid);
 
 f = fopen(procfile, "r");
-if (!f) {
-free(procfile);
-return NULL;
-}
 free(procfile);
-
-for (;;) {
-int c = getc(f);
-if (!c || c == EOF) {
-break;
-}
-ds_put_char(, c);
+if (f) {
+ds_get_line(, f);
+fclose(f);
 }
-fclose(f);
 } else {
 ds_put_cstr(, "init");
 }
 
 ds_put_format(, " (pid %d)", parent_pid);
-
 return ds_steal_cstr();
 #else
 return NULL;
-- 
1.9.1



Re: [ovs-dev] [PATCH 2/2] datapath: backport: net: vxlan: lwt: Use source ip address during route lookup.

2016-08-09 Thread pravin shelar
On Tue, Aug 9, 2016 at 10:51 AM, Jesse Gross  wrote:
> On Mon, Aug 8, 2016 at 2:54 PM, Pravin B Shelar  wrote:
>> Upstream commit:
>> commit 272d96a5ab10662691b4ec90c4a66fdbf30ea7ba
>> Author: pravin shelar 
>> Date:   Fri Aug 5 17:45:36 2016 -0700
>>
>> net: vxlan: lwt: Use source ip address during route lookup.
>>
>> LWT user can specify destination as well as source ip address
>> for given tunnel endpoint. But vxlan is ignoring given source
>> ip address. Following patch uses both ip address to route the
>> tunnel packet. This consistent with other LWT implementations,
>> like GENEVE and GRE.
>>
>> Fixes: ee122c79d42 ("vxlan: Flow based tunneling").
>> Signed-off-by: Pravin B Shelar 
>> Acked-by: Jiri Benc 
>> Signed-off-by: David S. Miller 
>>
>> Signed-off-by: Pravin B Shelar 
>
> Acked-by: Jesse Gross 

I pushed both patches to master and branch-2.5.

Thanks.


Re: [ovs-dev] [RFC PATCH v2 00/13] Add Network Service Header Support

2016-08-09 Thread Jesse Gross
On Tue, Aug 9, 2016 at 7:52 AM, Simon Horman  wrote:
> Hi Jesse,
>
> On Wed, Jul 13, 2016 at 07:35:59AM -0700, Jesse Gross wrote:
>> On Wed, Jul 13, 2016 at 4:04 AM, Brady Allen Johnson
>>  wrote:
>> > I wanted to mention though, currently the type 2 metadata (MD2) isn't a top
>> > priority for us. It looks like it's already been investigated how to use some
>> > existing OVS TLV code to implement this, so it should be easy to add MD2 in
>> > the future. Can we consider first merging the core NSH functionality without
>> > MD2, and then submit MD2 in a subsequent patch?
>>
>> I think history tells us how this will end - similar to IPv4 options,
>> implementations that don't implement TLVs will become deployed and
>> then when there is a use for them it's no longer possible. Since I
>> don't want OVS to have a half implementation or contribute to this
>> issue, I'd like to see the whole protocol implemented before I apply
>> anything.
>
> I see your point but I also see value in incrementally implementing a full
> solution. If anything I feel that the current patch-set is already on the
> large side though of course the design could be fleshed out to some extent
> independently of the implementation and submitted patches.

I think this debate has pretty much been overcome by events at this
point. I believe that there is already work underway to implement MD
type 2. It seems like it would be most productive at this point to
focus on the code rather than continuing to go back and forth on this.


Re: [ovs-dev] [PATCH RFC v2 1/3] ovn: add SLAAC support for IPv6

2016-08-09 Thread Dustin Lundquist
On Mon, Aug 1, 2016 at 7:16 PM, Zong Kai LI  wrote:

> This patch introduces methods to compose a Router Advertisement (RA) packet,
> introduces flags for RA, and renames ovs_nd_opt to ovs_nd_lla_opt to make clear
> that it is the Source/Target Link-layer Address option.
>
> Signed-off-by: Zong Kai LI 
> ---
>  lib/flow.c|  26 -
>  lib/odp-execute.c |  20 +++
>  lib/packets.c | 168 ++
> +++-
>  lib/packets.h |  86 
>  4 files changed, 239 insertions(+), 61 deletions(-)
>

Reviewed packet structures against specification in RFC4861.

Acked-by: Dustin Lundquist 


Re: [ovs-dev] netdev-dpdk: Fix deadlock in destroy_device().

2016-08-09 Thread Daniele Di Proietto





On 08/08/2016 01:02, "Kavanagh, Mark B"  wrote:

>>
>>Minor comment inline.
>>
>>Acked-by: Ilya Maximets 
>>
>
>Other than the comment mentioned by Ilya, this LGTM also - thanks again for 
>resolving, Daniele.
>
>Acked-by: mark.b.kavan...@intel.com

Thanks for the reviews, I applied this to master

>
>>On 05.08.2016 23:57, Daniele Di Proietto wrote:
>>> netdev_dpdk_vhost_destruct() calls rte_vhost_driver_unregister(), which
>>> can trigger the destroy_device() callback.  destroy_device() will try to
>>> take two mutexes already held by netdev_dpdk_vhost_destruct(), causing a
>>> deadlock.
>>>
>>> This problem can be solved by dropping the mutexes before calling
>>> rte_vhost_driver_unregister().  The netdev_dpdk_vhost_destruct() and
>>> construct() call are already serialized by netdev_mutex.
>>>
>>> This commit also makes clear that dev->vhost_id is constant and can be
>>> accessed without taking any mutexes in the lifetime of the devices.
>>>
>>> Fixes: 8d38823bdf8b("netdev-dpdk: fix memory leak")
>>> Reported-by: Ilya Maximets 
>>> Signed-off-by: Daniele Di Proietto 
>>> ---
>>>  lib/netdev-dpdk.c | 34 --
>>>  1 file changed, 24 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index f37ec1c..98bff62 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -355,8 +355,10 @@ struct netdev_dpdk {
>>>  /* True if vHost device is 'up' and has been reconfigured at least 
>>> once */
>>>  bool vhost_reconfigured;
>>>
>>> -/* Identifier used to distinguish vhost devices from each other */
>>> -char vhost_id[PATH_MAX];
>>> +/* Identifier used to distinguish vhost devices from each other.  It 
>>> does
>>> + * not change during the lifetime of a struct netdev_dpdk.  It can be 
>>> read
>>> + * without holding any mutex. */
>>> +const char vhost_id[PATH_MAX];
>>>
>>>  /* In dpdk_list. */
>>>  struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
>>> @@ -846,7 +848,8 @@ netdev_dpdk_vhost_cuse_construct(struct netdev *netdev)
>>>  }
>>>
>>>  ovs_mutex_lock(&dpdk_mutex);
>>> -strncpy(dev->vhost_id, netdev->name, sizeof(dev->vhost_id));
>>> +strncpy(CONST_CAST(char *, dev->vhost_id), netdev->name,
>>> +sizeof dev->vhost_id);
>>>  err = vhost_construct_helper(netdev);
>>>  ovs_mutex_unlock(&dpdk_mutex);
>>>  return err;
>>> @@ -878,7 +881,7 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>>>  /* Take the name of the vhost-user port and append it to the location 
>>> where
>>>   * the socket is to be created, then register the socket.
>>>   */
>>> -snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",
>>> +snprintf(CONST_CAST(char *,dev->vhost_id), sizeof(dev->vhost_id), 
>>> "%s/%s",
>>
>>Space between arguments of 'CONST_CAST()' and parenthesized operand of 
>>'sizeof'.

Fixed, thanks

>>
>>>   vhost_sock_dir, name);
>>>
>>>  err = rte_vhost_driver_register(dev->vhost_id, flags);
>>> @@ -938,6 +941,17 @@ netdev_dpdk_destruct(struct netdev *netdev)
>>>  ovs_mutex_unlock(&dpdk_mutex);
>>>  }
>>>
>>> +/* rte_vhost_driver_unregister() can call back destroy_device(), which will
>>> + * try to acquire 'dpdk_mutex' and possibly 'dev->mutex'.  To avoid a
>>> + * deadlock, none of the mutexes must be held while calling this function. 
>>> */
>>> +static int
>>> +dpdk_vhost_driver_unregister(struct netdev_dpdk *dev)
>>> +OVS_EXCLUDED(dpdk_mutex)
>>> +OVS_EXCLUDED(dev->mutex)
>>> +{
>>> +return rte_vhost_driver_unregister(dev->vhost_id);
>>> +}
>>> +
>>>  static void
>>>  netdev_dpdk_vhost_destruct(struct netdev *netdev)
>>>  {
>>> @@ -955,12 +969,6 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
>>>   dev->vhost_id);
>>>  }
>>>
>>> -if (rte_vhost_driver_unregister(dev->vhost_id)) {
>>> -VLOG_ERR("Unable to remove vhost-user socket %s", dev->vhost_id);
>>> -} else {
>>> -fatal_signal_remove_file_to_unlink(dev->vhost_id);
>>> -}
>>> -
>>>  free(ovsrcu_get_protected(struct ingress_policer *,
>>>    &dev->ingress_policer));
>>>
>>> @@ -970,6 +978,12 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
>>>
>>>  ovs_mutex_unlock(&dev->mutex);
>>>  ovs_mutex_unlock(&dpdk_mutex);
>>> +
>>> +if (dpdk_vhost_driver_unregister(dev)) {
>>> +VLOG_ERR("Unable to remove vhost-user socket %s", dev->vhost_id);
>>> +} else {
>>> +fatal_signal_remove_file_to_unlink(dev->vhost_id);
>>> +}
>>>  }
>>>
>>>  static void
>>>
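
The lock ordering fixed by the patch quoted above can be sketched with plain
pthreads. This is only an illustration under stated assumptions: list_mutex
plays the role of 'dpdk_mutex', on_destroy() stands in for destroy_device(),
and driver_unregister() stands in for rte_vhost_driver_unregister(). None of
these names come from OVS or DPDK.

```c
#include <assert.h>
#include <pthread.h>

/* Illustrative global lock, playing the role of 'dpdk_mutex'. */
static pthread_mutex_t list_mutex = PTHREAD_MUTEX_INITIALIZER;
static int devices_destroyed;

/* Stand-in for the destroy_device() callback: it takes the lock itself. */
static void
on_destroy(void)
{
    int rc = pthread_mutex_lock(&list_mutex);
    assert(rc == 0);
    (void) rc;
    devices_destroyed++;
    pthread_mutex_unlock(&list_mutex);
}

/* Stand-in for the unregister call, which may invoke the callback
 * synchronously before it returns. */
static int
driver_unregister(void (*callback)(void))
{
    callback();
    return 0;
}

/* Teardown in the order the patch adopts: finish the work that needs the
 * lock, release it, and only then unregister. */
static int
device_destruct(void)
{
    pthread_mutex_lock(&list_mutex);
    /* ... tear down state guarded by list_mutex ... */
    pthread_mutex_unlock(&list_mutex);

    /* The lock is no longer held here, so on_destroy() can take it. */
    return driver_unregister(on_destroy);
}
```

Had driver_unregister() been called while list_mutex was still held, the
callback would try to relock a mutex its own thread already owns, which with
a default pthread mutex typically blocks forever.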


Re: [ovs-dev] [PATCH 2/2] datapath: backport: net: vxlan: lwt: Use source ip address during route lookup.

2016-08-09 Thread Jesse Gross
On Mon, Aug 8, 2016 at 2:54 PM, Pravin B Shelar  wrote:
> Upstream commit:
> commit 272d96a5ab10662691b4ec90c4a66fdbf30ea7ba
> Author: pravin shelar 
> Date:   Fri Aug 5 17:45:36 2016 -0700
>
> net: vxlan: lwt: Use source ip address during route lookup.
>
> LWT user can specify destination as well as source ip address
> for given tunnel endpoint. But vxlan is ignoring given source
> ip address. Following patch uses both ip address to route the
> tunnel packet. This consistent with other LWT implementations,
> like GENEVE and GRE.
>
> Fixes: ee122c79d42 ("vxlan: Flow based tunneling").
> Signed-off-by: Pravin B Shelar 
> Acked-by: Jiri Benc 
> Signed-off-by: David S. Miller 
>
> Signed-off-by: Pravin B Shelar 

Acked-by: Jesse Gross 


Re: [ovs-dev] [PATCH 1/2] datapath: backport: net: vxlan: lwt: Fix vxlan local traffic.

2016-08-09 Thread Jesse Gross
On Mon, Aug 8, 2016 at 2:54 PM, Pravin B Shelar  wrote:
> Upstream commit:
> commit bbec7802c6948c8626b71a4fe31283cb4691c358
> Author: pravin shelar 
> Date:   Fri Aug 5 17:45:37 2016 -0700
>
> net: vxlan: lwt: Fix vxlan local traffic.
>
> vxlan driver has bypass for local vxlan traffic, but that
> depends on information about all VNIs on local system in
> vxlan driver. This is not available in case of LWT.
> Therefore following patch disable encap bypass for LWT
> vxlan traffic.
>
> Fixes: ee122c79d42 ("vxlan: Flow based tunneling").
> Reported-by: Jakub Libosvar 
> Signed-off-by: Pravin B Shelar 
> Acked-by: Jiri Benc 
> Signed-off-by: David S. Miller 
>
> Signed-off-by: Pravin B Shelar 

Acked-by: Jesse Gross 


Re: [ovs-dev] [PATCH v1 1/1] netdev-dpdk: Fix egress policer error detection bug.

2016-08-09 Thread Stokes, Ian
From: Daniele Di Proietto [mailto:diproiet...@ovn.org]
Sent: Friday, August 05, 2016 2:15 AM
To: Stokes, Ian
Cc: dev@openvswitch.org
Subject: Re: [ovs-dev] [PATCH v1 1/1] netdev-dpdk: Fix egress policer error 
detection bug.

Thanks for the patch, comments inline

2016-08-02 9:37 GMT-07:00 Ian Stokes:
When egress policer is set as a QoS type for a port, an error may occur during
setup if incorrect parameters are used for the rte_meter. If this occurs
the egress policer construct and set functions should free any allocated
memory relevant to the policer and set the QoS configuration pointer to
null. The netdev_dpdk_set_qos function should check the error value returned
for any QoS construct/set calls with an assertion to avoid segfault.

Signed-off-by: Ian Stokes
---
 lib/netdev-dpdk.c |   29 -
 1 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index c208f32..c382270 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2679,12 +2679,19 @@ netdev_dpdk_set_qos(struct netdev *netdev,

 /* Install new QoS configuration. */
 error = new_ops->qos_construct(netdev, details);
-ovs_assert((error == 0) == (dev->qos_conf != NULL));
 }
 } else {
 error = new_ops->qos_construct(netdev, details);
+}
+
+if (!error) {
 ovs_assert((error == 0) == (dev->qos_conf != NULL));
 }
+else {
+VLOG_ERR("Failed to set QoS type %s on port %s, returned error %d",
+type, netdev->name, error);
+ovs_assert(dev->qos_conf == NULL);
+}

I think we can replace this with:

ovs_assert((error == 0) == (dev->qos_conf != NULL));
if (error) {
    VLOG(...)
}

type should be aligned with " on the above line
Can we use rte_strerror to print a textual representation?


 ovs_mutex_unlock(&dev->mutex);
 return error;
@@ -2726,6 +2733,15 @@ egress_policer_qos_construct(struct netdev *netdev,
 policer->app_srtcm_params.ebs = 0;
 err = rte_meter_srtcm_config(&policer->egress_meter,
 &policer->app_srtcm_params);
+
+if (err < 0) {
+/* Error occurred during rte_meter creation, destroy the policer
+ * and set the qos configuration for the netdev dpdk to NULL
+ */
+free(policer);
+dev->qos_conf = NULL;
+}
+

Can we return a positive error number instead of a negative one? This is more
in line with the rest of OVS

 rte_spinlock_unlock(&dev->qos_lock);

 return err;
@@ -2756,11 +2772,13 @@ static int
 egress_policer_qos_set(struct netdev *netdev, const struct smap *details)
 {
 struct egress_policer *policer;
+struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
 const char *cir_s;
 const char *cbs_s;
 int err = 0;

 policer = egress_policer_get__(netdev);
+rte_spinlock_lock(&dev->qos_lock);
 cir_s = smap_get(details, "cir");
 cbs_s = smap_get(details, "cbs");
 policer->app_srtcm_params.cir = cir_s ? strtoull(cir_s, NULL, 10) : 0;
@@ -2769,6 +2787,15 @@ egress_policer_qos_set(struct netdev *netdev, const struct smap *details)
 err = rte_meter_srtcm_config(&policer->egress_meter,
 &policer->app_srtcm_params);

+if (err < 0) {
+/* Error occurred during rte_meter creation, destroy the policer
+ * and set the qos configuration for the netdev dpdk to NULL
+ */
+free(policer);
+dev->qos_conf = NULL;
+}
+rte_spinlock_unlock(&dev->qos_lock);
+

Can we return a positive error number instead of a negative one? This is more
in line with the rest of OVS
I guess we forgot to lock the spinlock here on the original patch and this 
commit fixes it. Can you document this in the commit message?
In the long term I'd like this to use RCU, as we wouldn't need so many critical 
sections, but it's fine to avoid it for now

 return err;
 }
Thanks for the review Daniele, I agree with the comments above and have sent a 
v2 patch
http://openvswitch.org/pipermail/dev/2016-August/077592.html
As regards the RCU, I had started work on this but didn’t have a chance to 
finish it off; it’s something I’ll look at again in future as it would be 
better than what we have now.

Thanks
Ian
Thanks,
Daniele
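
The review request above (return a positive error number, and log a textual
description) can be sketched in plain C. This is an illustration, not the
netdev-dpdk code: meter_config() is a hypothetical stand-in that only mimics
a negative-errno return convention, and strerror() is used here in place of
rte_strerror().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical library call that reports failure as a negative errno value.
 * It only mimics the sign convention; it is not rte_meter_srtcm_config(). */
static int
meter_config(unsigned long long cir, unsigned long long cbs)
{
    return (cir == 0 || cbs == 0) ? -EINVAL : 0;
}

/* Returns 0 on success or a positive errno value on failure. */
static int
policer_configure(unsigned long long cir, unsigned long long cbs)
{
    int err = meter_config(cir, cbs);

    if (err < 0) {
        err = -err;    /* Flip the sign once, at the library boundary. */
        fprintf(stderr, "policer config failed: %s\n", strerror(err));
    }
    assert(err >= 0);
    return err;
}
```

Flipping the sign in one place means every caller further up can pass the
value straight to a strerror-style function without negating it again.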



[ovs-dev] [PATCH v2 1/1] netdev-dpdk: Fix egress policer error detection bug.

2016-08-09 Thread Ian Stokes
When egress policer is set as a QoS type for a port, an error may occur during
setup if incorrect parameters are used for the rte_meter. If this occurs
the egress policer construct and set functions should free any allocated
memory relevant to the policer and set the QoS configuration pointer to
null. The netdev_dpdk_set_qos function should check the error value returned
for any QoS construct/set calls with an assertion to avoid segfault.
Also this commit modifies egress_policer_qos_set() to correctly lock the QoS
spinlock while the egress policer configuration is updated to avoid
segfault.

Signed-off-by: Ian Stokes 
---
v2
* netdev-dpdk.c
- Simplify assertion in netdev_dpdk_set_qos() to check that no error
  has been returned and that a QoS configuration exists before checking
  and logging an error.
- Use rte_strerror() in netdev_dpdk_set_qos() when logging error for a
  textual representation.
- Align VLOG message for correct formatting in netdev_dpdk_set_qos().
- egress_policer_qos_construct() now returns a positive error.
- egress_policer_qos_set() now returns a positive error.
- Document addition of spinlock in egress_policer_qos_set() in commit
  message.
---
 lib/netdev-dpdk.c |   30 --
 1 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index bf3a898..f37130e 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2731,11 +2731,16 @@ netdev_dpdk_set_qos(struct netdev *netdev,
 
 /* Install new QoS configuration. */
 error = new_ops->qos_construct(netdev, details);
-ovs_assert((error == 0) == (dev->qos_conf != NULL));
 }
 } else {
 error = new_ops->qos_construct(netdev, details);
-ovs_assert((error == 0) == (dev->qos_conf != NULL));
+}
+
+ovs_assert((error == 0) == (dev->qos_conf != NULL));
+if (error) {
+VLOG_ERR("Failed to set QoS type %s on port %s, returned error: %s",
+ type, netdev->name, rte_strerror(-error));
+ovs_assert(dev->qos_conf == NULL);
 }
 
 ovs_mutex_unlock(&dev->mutex);
@@ -2774,6 +2779,15 @@ egress_policer_qos_construct(struct netdev *netdev,
 policer->app_srtcm_params.ebs = 0;
 err = rte_meter_srtcm_config(&policer->egress_meter,
 &policer->app_srtcm_params);
+
+if (err < 0) {
+/* Error occurred during rte_meter creation, destroy the policer
+ * and set the qos configuration for the netdev dpdk to NULL
+ */
+free(policer);
+dev->qos_conf = NULL;
+err = -err;
+}
 rte_spinlock_unlock(&dev->qos_lock);
 
 return err;
@@ -2804,15 +2818,27 @@ static int
 egress_policer_qos_set(struct netdev *netdev, const struct smap *details)
 {
 struct egress_policer *policer;
+struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
 int err = 0;
 
 policer = egress_policer_get__(netdev);
+rte_spinlock_lock(&dev->qos_lock);
 policer->app_srtcm_params.cir = smap_get_ullong(details, "cir", 0);
 policer->app_srtcm_params.cbs = smap_get_ullong(details, "cbs", 0);
 policer->app_srtcm_params.ebs = 0;
 err = rte_meter_srtcm_config(&policer->egress_meter,
 &policer->app_srtcm_params);
 
+if (err < 0) {
+/* Error occurred during rte_meter creation, destroy the policer
+ * and set the qos configuration for the netdev dpdk to NULL
+ */
+free(policer);
+dev->qos_conf = NULL;
+err = -err;
+}
+rte_spinlock_unlock(&dev->qos_lock);
+
 return err;
 }
 
-- 
1.7.4.1



Re: [ovs-dev] [patch_v3] ovn: Add datapaths of interest filtering.

2016-08-09 Thread Darrell Ball
On Thu, Aug 4, 2016 at 3:26 AM, Liran Schour  wrote:

> "dev"  wrote on 03/08/2016 09:09:48 AM:
>
> > From: Darrell Ball 
> > To: dlu...@gmail.com, d...@openvswitch.com, b...@ovn.org
> > Date: 03/08/2016 09:10 AM
> > Subject: [ovs-dev] [patch_v3] ovn: Add datapaths of interest filtering.
> > Sent by: "dev" 
> >
> > This patch adds datapaths of interest support where only datapaths of
> > local interest are monitored by the ovn-controller ovsdb client.  The
> > idea is to do a flood fill in ovn-controller of datapath associations
> > calculated by northd. A new column is added to the SB database
> > datapath_binding table - related_datapaths to facilitate this so all
> > datapaths associations are known quickly in ovn-controller.  This
> > allows monitoring to adapt quickly with a single new monitor setting
> > for all datapaths of interest locally.
> >
> > Signed-off-by: Darrell Ball 
> > ---
> >
>
> I still think this work is mainly based on top of the conditional monitor
> work. However, it introduces a flood fill in the ovn-controller using the
> newly added column, related_datapaths, in the datapath_binding table, which
> is an important optimization to the original work.
>
> The original patch has gone through 12 iterations and is much more
> stable and mature
> (http://openvswitch.org/pipermail/dev/2016-August/077008.html), so
> I propose to combine the 2 patches.
>
> What do you think?



I just got back from 5 days vacation yesterday.

The patches differ significantly in design and implementation.
They both call some of the conditional monitoring APIs.
However, I would like to find some way to share between the two.


Re: [ovs-dev] [PATCH V7 7/7] netdev-dpdk: add support for jumbo frames

2016-08-09 Thread Daniele Di Proietto
Thanks for all the series and the reviews, I will push this when the
dependencies (patch 2 and patch 6) are reviewed.


Daniele
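
As a small aside on the 1518B figure in the commit message quoted below, here
is a sketch of the MTU-to-frame-length arithmetic for an untagged Ethernet
frame. The macro and function names are illustrative only and are not taken
from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_HEADER_LEN 14   /* Destination MAC + source MAC + EtherType. */
#define ETH_CRC_LEN 4       /* Frame check sequence. */

/* Returns the untagged Ethernet frame length needed to carry 'mtu' bytes
 * of payload. */
static uint32_t
mtu_to_frame_len(uint32_t mtu)
{
    assert(mtu <= UINT32_MAX - ETH_HEADER_LEN - ETH_CRC_LEN);
    return mtu + ETH_HEADER_LEN + ETH_CRC_LEN;
}
```

With these constants an MTU of 1500 gives a 1518B frame, and an MTU of 9710
gives the 9728B frame size mentioned in the documentation hunk.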

2016-08-09 9:01 GMT-07:00 Mark Kavanagh :

> Add support for Jumbo Frames to DPDK-enabled port types,
> using single-segment-mbufs.
>
> Using this approach, the amount of memory allocated to each mbuf
> to store frame data is increased to a value greater than 1518B
> (typical Ethernet maximum frame length). The increased space
> available in the mbuf means that an entire Jumbo Frame of a specific
> size can be carried in a single mbuf, as opposed to partitioning
> it across multiple mbuf segments.
>
> The amount of space allocated to each mbuf to hold frame data is
> defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
> parameter.
>
> Signed-off-by: Mark Kavanagh 
> Signed-off-by: Ilya Maximets 
> [diproiet...@vmware.com rebased]
> Signed-off-by: Daniele Di Proietto 
> ---
>
> v7:
> - add 'Signed-off-by' for Ilya Maximets (i.maxim...@samsung.com)
>
> v6:
> - include device name in netdev_dpdk_set_mtu error log
> - resolve minor coding standards infractions
>
> v5:
> - rename dpdk_mp_configure to netdev_dpdk_mempool_configure
> - consolidate socket_id and mtu changes within
>   netdev_dpdk_mempool_configure
> - add lower bounds check for user-supplied MTU
> - add socket_id and mtu fields to mempool configure error report
> - minor cosmetic changes
>
> v4:
> - restore error reporting in *_reconfigure functions (for
>   non-mtu-configuration based errors)
> - remove 'goto' in the event of dpdk_mp_configure failure
> - remove superfluous error variables
>
>  v3:
> - replace netdev_dpdk.last_mtu with local variable
> - add comment for dpdk_mp_configure
>
>  v2:
>  - rebase to HEAD of master
>  - fall back to previous 'good' MTU if reconfigure fails
>  - introduce new field 'last_mtu' in struct netdev_dpdk to facilitate
>fall-back
>  - rename 'mtu_request' to 'requested_mtu' in struct netdev_dpdk
>  - remove rebasing artifact in INSTALL.DPDK-Advanced.md
>  - remove superfluous variable in dpdk_mp_configure
>  - fix minor coding style infraction
>
>
>  INSTALL.DPDK-ADVANCED.md |  58 ++-
>  INSTALL.DPDK.md  |   1 -
>  NEWS |   1 +
>  lib/netdev-dpdk.c| 145 ++
> +
>  4 files changed, 176 insertions(+), 29 deletions(-)
>
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index 0ab43d4..5e758ce 100755
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -1,5 +1,5 @@
>  OVS DPDK ADVANCED INSTALL GUIDE
> -=
> +===
>
>  ## Contents
>
> @@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
>  7. [QOS](#qos)
>  8. [Rate Limiting](#rl)
>  9. [Flow Control](#fc)
> -10. [Vsperf](#vsperf)
> +10. [Jumbo Frames](#jumbo)
> +11. [Vsperf](#vsperf)
>
>  ##  1. Overview
>
> @@ -862,7 +863,58 @@ respective parameter. To disable the flow control at
> tx side,
>
>  `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
>
> -##  10. Vsperf
> +##  10. Jumbo Frames
> +
> +By default, DPDK ports are configured with standard Ethernet MTU (1500B). To
> +enable Jumbo Frames support for a DPDK port, change the Interface's `mtu_request`
> +attribute to a sufficiently large value.
> +
> +e.g. Add a DPDK Phy port with MTU of 9000:
> +
> +`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set Interface dpdk0 mtu_request=9000`
> +
> +e.g. Change the MTU of an existing port to 6200:
> +
> +`ovs-vsctl set Interface dpdk0 mtu_request=6200`
> +
> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments is
> +increased, such that a full Jumbo Frame of a specific size may be accommodated
> +within a single mbuf segment.
> +
> +Jumbo frame support has been validated against 9728B frames (largest frame size
> +supported by Fortville NIC), using the DPDK `i40e` driver, but larger frames
> +(particularly in use cases involving East-West traffic only), and other DPDK NIC
> +drivers may be supported.
> +
> +### 10.1 vHost Ports and Jumbo Frames
> +
> +Some additional configuration is needed to take advantage of jumbo frames
> with
> +vhost ports:
> +
> +1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
> +the QEMU command line snippet below:
> +
> +```
> +'-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
> +'-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
> +```
> +
> +2. Where virtio devices are bound to the Linux kernel driver in a guest
> +   environment (i.e. interfaces are not bound to an in-guest DPDK driver),
> +   the MTU of those logical network interfaces 

Re: [ovs-dev] [PATCH V7 1/7] ofproto: Consider datapath_type when looking for internal ports.

2016-08-09 Thread Daniele Di Proietto




On 09/08/2016 09:08, "Thadeu Lima de Souza Cascardo" wrote:

>On Tue, Aug 09, 2016 at 05:01:14PM +0100, Mark Kavanagh wrote:
>> From: Daniele Di Proietto 
>> 
>> Interfaces with type "internal" end up having a netdev with type "tap"
>> in the dpif-netdev datapath, so a strcmp will fail to match internal
>> interfaces.
>> 
>> We can translate the types with ofproto_port_open_type() before calling
>> strcmp to fix this.
>> 
>> This fixes a minor issue where internal interfaces are considered
>> non-internal in the userspace datapath for the purpose of adjusting the
>> MTU.
>> 
>> Signed-off-by: Daniele Di Proietto 
>
>Acked-by: Thadeu Lima de Souza Cascardo 

Thanks guys, I pushed this to master

>
>Hi, Mark.
>
>Can you keep my Ack in further submissions in case there are no changes to this
>patch?
>
>Thanks.
>Cascardo.
>
>> ---
>>  ofproto/ofproto.c | 16 +---
>>  1 file changed, 9 insertions(+), 7 deletions(-)
>> 
>> diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
>> index 8e59c69..088f91a 100644
>> --- a/ofproto/ofproto.c
>> +++ b/ofproto/ofproto.c
>> @@ -220,7 +220,8 @@ static void learned_cookies_flush(struct ofproto *, 
>> struct ovs_list *dead_cookie
>>  /* ofport. */
>>  static void ofport_destroy__(struct ofport *) OVS_EXCLUDED(ofproto_mutex);
>>  static void ofport_destroy(struct ofport *, bool del);
>> -static inline bool ofport_is_internal(const struct ofport *);
>> +static inline bool ofport_is_internal(const struct ofproto *,
>> +  const struct ofport *);
>>  
>>  static int update_port(struct ofproto *, const char *devname);
>>  static int init_ports(struct ofproto *);
>> @@ -2465,7 +2466,7 @@ static void
>>  ofport_remove(struct ofport *ofport)
>>  {
>>  struct ofproto *p = ofport->ofproto;
>> -bool is_internal = ofport_is_internal(ofport);
>> +bool is_internal = ofport_is_internal(p, ofport);
>>  
>>  connmgr_send_port_status(ofport->ofproto->connmgr, NULL, &ofport->pp,
>>   OFPPR_DELETE);
>> @@ -2751,9 +2752,10 @@ init_ports(struct ofproto *p)
>>  }
>>  
>>  static inline bool
>> -ofport_is_internal(const struct ofport *port)
>> +ofport_is_internal(const struct ofproto *p, const struct ofport *port)
>>  {
>> -return !strcmp(netdev_get_type(port->netdev), "internal");
>> +return !strcmp(netdev_get_type(port->netdev),
>> +   ofproto_port_open_type(p->type, "internal"));
>>  }
>>  
>>  /* Find the minimum MTU of all non-datapath devices attached to 'p'.
>> @@ -2770,7 +2772,7 @@ find_min_mtu(struct ofproto *p)
>>  
>>  /* Skip any internal ports, since that's what we're trying to
>>   * set. */
>> -if (ofport_is_internal(ofport)) {
>> +if (ofport_is_internal(p, ofport)) {
>>  continue;
>>  }
>>  
>> @@ -2797,7 +2799,7 @@ update_mtu(struct ofproto *p, struct ofport *port)
>>  port->mtu = 0;
>>  return;
>>  }
>> -if (ofport_is_internal(port)) {
>> +if (ofport_is_internal(p, port)) {
>>  if (dev_mtu > p->min_mtu) {
>> if (!netdev_set_mtu(port->netdev, p->min_mtu)) {
>> dev_mtu = p->min_mtu;
>> @@ -2827,7 +2829,7 @@ update_mtu_ofproto(struct ofproto *p)
>>  HMAP_FOR_EACH (ofport, hmap_node, &p->ports) {
>>  struct netdev *netdev = ofport->netdev;
>>  
>> -if (ofport_is_internal(ofport)) {
>> +if (ofport_is_internal(p, ofport)) {
>>  if (!netdev_set_mtu(netdev, p->min_mtu)) {
>>  ofport->mtu = p->min_mtu;
>>  }
>> -- 
>> 1.9.3
>> 


Re: [ovs-dev] Documents Requested

2016-08-09 Thread Owen
Dear dev,

Please find attached documents as requested.

Best Regards,
Owen


Re: [ovs-dev] [PATCH V7 1/7] ofproto: Consider datapath_type when looking for internal ports.

2016-08-09 Thread Thadeu Lima de Souza Cascardo
On Tue, Aug 09, 2016 at 05:01:14PM +0100, Mark Kavanagh wrote:
> From: Daniele Di Proietto 
> 
> Interfaces with type "internal" end up having a netdev with type "tap"
> in the dpif-netdev datapath, so a strcmp will fail to match internal
> interfaces.
> 
> We can translate the types with ofproto_port_open_type() before calling
> strcmp to fix this.
> 
> This fixes a minor issue where internal interfaces are considered
> non-internal in the userspace datapath for the purpose of adjusting the
> MTU.
> 
> Signed-off-by: Daniele Di Proietto 

Acked-by: Thadeu Lima de Souza Cascardo 

Hi, Mark.

Can you keep my Ack in further submissions in case there are no changes to this
patch?

Thanks.
Cascardo.

> ---
>  ofproto/ofproto.c | 16 +---
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
> index 8e59c69..088f91a 100644
> --- a/ofproto/ofproto.c
> +++ b/ofproto/ofproto.c
> @@ -220,7 +220,8 @@ static void learned_cookies_flush(struct ofproto *, 
> struct ovs_list *dead_cookie
>  /* ofport. */
>  static void ofport_destroy__(struct ofport *) OVS_EXCLUDED(ofproto_mutex);
>  static void ofport_destroy(struct ofport *, bool del);
> -static inline bool ofport_is_internal(const struct ofport *);
> +static inline bool ofport_is_internal(const struct ofproto *,
> +  const struct ofport *);
>  
>  static int update_port(struct ofproto *, const char *devname);
>  static int init_ports(struct ofproto *);
> @@ -2465,7 +2466,7 @@ static void
>  ofport_remove(struct ofport *ofport)
>  {
>  struct ofproto *p = ofport->ofproto;
> -bool is_internal = ofport_is_internal(ofport);
> +bool is_internal = ofport_is_internal(p, ofport);
>  
>  connmgr_send_port_status(ofport->ofproto->connmgr, NULL, &ofport->pp,
>   OFPPR_DELETE);
> @@ -2751,9 +2752,10 @@ init_ports(struct ofproto *p)
>  }
>  
>  static inline bool
> -ofport_is_internal(const struct ofport *port)
> +ofport_is_internal(const struct ofproto *p, const struct ofport *port)
>  {
> -return !strcmp(netdev_get_type(port->netdev), "internal");
> +return !strcmp(netdev_get_type(port->netdev),
> +   ofproto_port_open_type(p->type, "internal"));
>  }
>  
>  /* Find the minimum MTU of all non-datapath devices attached to 'p'.
> @@ -2770,7 +2772,7 @@ find_min_mtu(struct ofproto *p)
>  
>  /* Skip any internal ports, since that's what we're trying to
>   * set. */
> -if (ofport_is_internal(ofport)) {
> +if (ofport_is_internal(p, ofport)) {
>  continue;
>  }
>  
> @@ -2797,7 +2799,7 @@ update_mtu(struct ofproto *p, struct ofport *port)
>  port->mtu = 0;
>  return;
>  }
> -if (ofport_is_internal(port)) {
> +if (ofport_is_internal(p, port)) {
>  if (dev_mtu > p->min_mtu) {
> if (!netdev_set_mtu(port->netdev, p->min_mtu)) {
> dev_mtu = p->min_mtu;
> @@ -2827,7 +2829,7 @@ update_mtu_ofproto(struct ofproto *p)
>  HMAP_FOR_EACH (ofport, hmap_node, &p->ports) {
>  struct netdev *netdev = ofport->netdev;
>  
> -if (ofport_is_internal(ofport)) {
> +if (ofport_is_internal(p, ofport)) {
>  if (!netdev_set_mtu(netdev, p->min_mtu)) {
>  ofport->mtu = p->min_mtu;
>  }
> -- 
> 1.9.3
> 
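
The fix in the patch quoted above boils down to comparing against the
translated port type instead of the literal string "internal". A minimal
sketch, assuming the userspace datapath type is named "netdev" and that
"internal" maps to "tap" there as the commit message describes:
port_open_type() is a hypothetical stand-in for ofproto_port_open_type() and
models only that one mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for ofproto_port_open_type(): maps a port type to
 * the netdev type that the given datapath actually opens. Only the one
 * mapping described in the commit message is modelled. */
static const char *
port_open_type(const char *datapath_type, const char *port_type)
{
    assert(datapath_type && port_type);
    if (!strcmp(datapath_type, "netdev") && !strcmp(port_type, "internal")) {
        return "tap";
    }
    return port_type;
}

/* Compares against the translated type rather than the literal "internal". */
static bool
is_internal(const char *datapath_type, const char *netdev_type)
{
    return !strcmp(netdev_type, port_open_type(datapath_type, "internal"));
}
```

A plain strcmp(netdev_type, "internal") would miss the "tap" netdev in the
userspace datapath, which is the case the patch corrects.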


Re: [ovs-dev] [PATCH V6 7/7] netdev-dpdk: add support for jumbo frames

2016-08-09 Thread Kavanagh, Mark B
>
>On 09.08.2016 18:02, Mark Kavanagh wrote:
>> Add support for Jumbo Frames to DPDK-enabled port types,
>> using single-segment-mbufs.
>>
>> Using this approach, the amount of memory allocated to each mbuf
>> to store frame data is increased to a value greater than 1518B
>> (typical Ethernet maximum frame length). The increased space
>> available in the mbuf means that an entire Jumbo Frame of a specific
>> size can be carried in a single mbuf, as opposed to partitioning
>> it across multiple mbuf segments.
>>
>> The amount of space allocated to each mbuf to hold frame data is
>> defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
>> parameter.
>>
>> Signed-off-by: Mark Kavanagh 
>> [diproiet...@vmware.com rebased]
>> Signed-off-by: Daniele Di Proietto 
>> ---
>>
>> v6:
>> - include device name in netdev_dpdk_set_mtu error log
>> - resolve minor coding standards infractions
>>
>> v5:
>> - rename dpdk_mp_configure to netdev_dpdk_mempool_configure
>> - consolidate socket_id and mtu changes within
>>   netdev_dpdk_mempool_configure
>> - add lower bounds check for user-supplied MTU
>> - add socket_id and mtu fields to mempool configure error report
>> - minor cosmetic changes
>>
>> v4:
>> - restore error reporting in *_reconfigure functions (for
>>   non-mtu-configuration based errors)
>> - remove 'goto' in the event of dpdk_mp_configure failure
>> - remove superfluous error variables
>>
>>  v3:
>> - replace netdev_dpdk.last_mtu with local variable
>> - add comment for dpdk_mp_configure
>>
>>  v2:
>>  - rebase to HEAD of master
>>  - fall back to previous 'good' MTU if reconfigure fails
>>  - introduce new field 'last_mtu' in struct netdev_dpdk to facilitate
>>fall-back
>>  - rename 'mtu_request' to 'requested_mtu' in struct netdev_dpdk
>>  - remove rebasing artifact in INSTALL.DPDK-Advanced.md
>>  - remove superfluous variable in dpdk_mp_configure
>>  - fix minor coding style infraction
>>
>>
>>  INSTALL.DPDK-ADVANCED.md |  58 ++-
>>  INSTALL.DPDK.md  |   1 -
>>  NEWS |   1 +
>>  lib/netdev-dpdk.c| 145 +++
>>  4 files changed, 176 insertions(+), 29 deletions(-)
>
>Looks good to me.
>You may add one of these tags:
>
>Signed-off-by: Ilya Maximets 
>Acked-by: Ilya Maximets 
>
>Choose which of them is more suitable.

I added you as 'Signed-off-by' - thanks for all of your review comments and 
contributions to this patch!

Cheers,
Mark

>
>Best regards, Ilya Maximets.


[ovs-dev] [PATCH V7 7/7] netdev-dpdk: add support for jumbo frames

2016-08-09 Thread Mark Kavanagh
Add support for Jumbo Frames to DPDK-enabled port types,
using single-segment-mbufs.

Using this approach, the amount of memory allocated to each mbuf
to store frame data is increased to a value greater than 1518B
(typical Ethernet maximum frame length). The increased space
available in the mbuf means that an entire Jumbo Frame of a specific
size can be carried in a single mbuf, as opposed to partitioning
it across multiple mbuf segments.

The amount of space allocated to each mbuf to hold frame data is
defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
parameter.

Signed-off-by: Mark Kavanagh 
Signed-off-by: Ilya Maximets 
[diproiet...@vmware.com rebased]
Signed-off-by: Daniele Di Proietto 
---
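Note: the MTU/frame-length arithmetic described in the documentation hunk
below can be sketched as follows. This is a minimal illustration, assuming an
untagged Ethernet frame with a 14B L2 header and a 4B CRC (18B in total). The
names `EXAMPLE_L2_OVERHEAD`, `example_frame_len_to_mtu()` and
`example_mtu_to_frame_len()` are made up for this note and are not taken from
lib/netdev-dpdk.c.

```c
/* Assumed overhead of an untagged Ethernet frame: 14B header + 4B CRC. */
enum { EXAMPLE_L2_OVERHEAD = 14 + 4 };

/* MTU that corresponds to a given maximum frame length. */
static int
example_frame_len_to_mtu(int frame_len)
{
    return frame_len - EXAMPLE_L2_OVERHEAD;
}

/* Frame length needed to carry a packet of the given MTU. */
static int
example_mtu_to_frame_len(int mtu)
{
    return mtu + EXAMPLE_L2_OVERHEAD;
}
```

Under these assumptions a 9018B frame corresponds to an MTU of 9000, and an
MTU of 1500 corresponds to the 1518B frame length mentioned in the commit
message.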

v7:
- add 'Signed-off-by' for Ilya Maximets (i.maxim...@samsung.com)

v6:
- include device name in netdev_dpdk_set_mtu error log
- resolve minor coding standards infractions

v5:
- rename dpdk_mp_configure to netdev_dpdk_mempool_configure
- consolidate socket_id and mtu changes within
  netdev_dpdk_mempool_configure
- add lower bounds check for user-supplied MTU
- add socket_id and mtu fields to mempool configure error report
- minor cosmetic changes

v4:
- restore error reporting in *_reconfigure functions (for
  non-mtu-configuration based errors)
- remove 'goto' in the event of dpdk_mp_configure failure
- remove superfluous error variables

 v3:
- replace netdev_dpdk.last_mtu with local variable
- add comment for dpdk_mp_configure

 v2:
 - rebase to HEAD of master
 - fall back to previous 'good' MTU if reconfigure fails
 - introduce new field 'last_mtu' in struct netdev_dpdk to facilitate
   fall-back
 - rename 'mtu_request' to 'requested_mtu' in struct netdev_dpdk
 - remove rebasing artifact in INSTALL.DPDK-Advanced.md
 - remove superfluous variable in dpdk_mp_configure
 - fix minor coding style infraction


 INSTALL.DPDK-ADVANCED.md |  58 ++-
 INSTALL.DPDK.md  |   1 -
 NEWS |   1 +
 lib/netdev-dpdk.c| 145 +++
 4 files changed, 176 insertions(+), 29 deletions(-)

diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
index 0ab43d4..5e758ce 100755
--- a/INSTALL.DPDK-ADVANCED.md
+++ b/INSTALL.DPDK-ADVANCED.md
@@ -1,5 +1,5 @@
 OVS DPDK ADVANCED INSTALL GUIDE
-=
+===
 
 ## Contents
 
@@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
 7. [QOS](#qos)
 8. [Rate Limiting](#rl)
 9. [Flow Control](#fc)
-10. [Vsperf](#vsperf)
+10. [Jumbo Frames](#jumbo)
+11. [Vsperf](#vsperf)
 
 ##  1. Overview
 
@@ -862,7 +863,58 @@ respective parameter. To disable the flow control at tx side,
 
 `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
 
-##  10. Vsperf
+##  10. Jumbo Frames
+
+By default, DPDK ports are configured with standard Ethernet MTU (1500B). To
+enable Jumbo Frames support for a DPDK port, change the Interface's `mtu_request`
+attribute to a sufficiently large value.
+
+e.g. Add a DPDK Phy port with MTU of 9000:
+
+`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set Interface dpdk0 mtu_request=9000`
+
+e.g. Change the MTU of an existing port to 6200:
+
+`ovs-vsctl set Interface dpdk0 mtu_request=6200`
+
+When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments is
+increased, such that a full Jumbo Frame of a specific size may be accommodated
+within a single mbuf segment.
+
+Jumbo frame support has been validated against 9728B frames (largest frame size
+supported by Fortville NIC), using the DPDK `i40e` driver, but larger frames
+(particularly in use cases involving East-West traffic only), and other DPDK NIC
+drivers may be supported.
+
+### 10.1 vHost Ports and Jumbo Frames
+
+Some additional configuration is needed to take advantage of jumbo frames with
+vhost ports:
+
+1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
+the QEMU command line snippet below:
+
+```
+'-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
+'-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
+```
+
+2. Where virtio devices are bound to the Linux kernel driver in a guest
+   environment (i.e. interfaces are not bound to an in-guest DPDK driver),
+   the MTU of those logical network interfaces must also be increased to a
+   sufficiently large value. This avoids segmentation of Jumbo Frames
+   received in the guest. Note that 'MTU' refers to the length of the IP
+   packet only, and not that of the entire frame.
+
+   To calculate the exact MTU of a standard IPv4 frame, subtract the L2
+   header and CRC lengths (i.e. 18B) from the max supported frame size.
+   So, to set the MTU for a 9018B Jumbo Frame:
+
+   

[ovs-dev] [PATCH V7 6/7] netdev: Make netdev_set_mtu() netdev parameter non-const.

2016-08-09 Thread Mark Kavanagh
From: Daniele Di Proietto 

Every provider silently drops the const attribute when converting the
parameter to the appropriate subclass.  Might as well drop the const
attribute from the parameter, since this is a "set" function.

Signed-off-by: Daniele Di Proietto 
---
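Note: a minimal sketch of the pattern the commit message refers to. The types
and functions below (`example_netdev`, `example_netdev_sub`,
`example_netdev_sub_cast()`, `example_set_mtu()`) are hypothetical stand-ins,
not the real OVS definitions; they only show how a base-to-subclass conversion
can silently discard 'const'.

```c
#include <stddef.h>

/* Hypothetical stand-ins for 'struct netdev' and a provider subclass. */
struct example_netdev {
    const char *name;
};

struct example_netdev_sub {
    struct example_netdev up;   /* Embedded base object. */
    int mtu;
};

/* Converts a base pointer to the subclass.  The cast discards 'const', which
 * is how a "const" parameter can end up being modified anyway. */
static struct example_netdev_sub *
example_netdev_sub_cast(const struct example_netdev *netdev)
{
    return (struct example_netdev_sub *)
        ((char *) netdev - offsetof(struct example_netdev_sub, up));
}

/* A "set" function: a non-const parameter states the intent honestly. */
static int
example_set_mtu(struct example_netdev *netdev, int mtu)
{
    struct example_netdev_sub *dev = example_netdev_sub_cast(netdev);

    dev->mtu = mtu;
    return 0;
}
```

With a const parameter the compiler would accept the same body, so the
qualifier gave callers a guarantee that the function never honoured.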
 lib/netdev-dummy.c| 2 +-
 lib/netdev-linux.c| 2 +-
 lib/netdev-provider.h | 2 +-
 lib/netdev.c  | 2 +-
 lib/netdev.h  | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index c8f82b7..dec1a8e 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -1150,7 +1150,7 @@ netdev_dummy_get_mtu(const struct netdev *netdev, int *mtup)
 }
 
 static int
-netdev_dummy_set_mtu(const struct netdev *netdev, int mtu)
+netdev_dummy_set_mtu(struct netdev *netdev, int mtu)
 {
 struct netdev_dummy *dev = netdev_dummy_cast(netdev);
 
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 1b5f7c1..20b5cc7 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -1382,7 +1382,7 @@ netdev_linux_get_mtu(const struct netdev *netdev_, int *mtup)
  * networking ioctl interface.
  */
 static int
-netdev_linux_set_mtu(const struct netdev *netdev_, int mtu)
+netdev_linux_set_mtu(struct netdev *netdev_, int mtu)
 {
 struct netdev_linux *netdev = netdev_linux_cast(netdev_);
 struct ifreq ifr;
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index 5bcfeba..cd04ae9 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -389,7 +389,7 @@ struct netdev_class {
  * If 'netdev' does not have an MTU (e.g. as some tunnels do not), then
  * this function should return EOPNOTSUPP.  This function may be set to
  * null if it would always return EOPNOTSUPP. */
-int (*set_mtu)(const struct netdev *netdev, int mtu);
+int (*set_mtu)(struct netdev *netdev, int mtu);
 
 /* Returns the ifindex of 'netdev', if successful, as a positive number.
  * On failure, returns a negative errno value.
diff --git a/lib/netdev.c b/lib/netdev.c
index 589d37c..5cf8bbb 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -869,7 +869,7 @@ netdev_get_mtu(const struct netdev *netdev, int *mtup)
  * MTU (as e.g. some tunnels do not).  On other failure, returns a positive
  * errno value. */
 int
-netdev_set_mtu(const struct netdev *netdev, int mtu)
+netdev_set_mtu(struct netdev *netdev, int mtu)
 {
 const struct netdev_class *class = netdev->netdev_class;
 int error;
diff --git a/lib/netdev.h b/lib/netdev.h
index dc7ede8..d8ec627 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -132,7 +132,7 @@ const char *netdev_get_name(const struct netdev *);
 const char *netdev_get_type(const struct netdev *);
 const char *netdev_get_type_from_name(const char *);
 int netdev_get_mtu(const struct netdev *, int *mtup);
-int netdev_set_mtu(const struct netdev *, int mtu);
+int netdev_set_mtu(struct netdev *, int mtu);
 int netdev_get_ifindex(const struct netdev *);
 int netdev_set_tx_multiq(struct netdev *, unsigned int n_txq);
 
-- 
1.9.3



[ovs-dev] [PATCH V7 5/7] tests: Add a new MTU test.

2016-08-09 Thread Mark Kavanagh
From: Daniele Di Proietto 

Also, netdev-dummy needs to call netdev_change_seq_changed() in
set_mtu().

Signed-off-by: Daniele Di Proietto 
---
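Note: a minimal sketch of the "only signal a change when the value actually
changed" logic added to set_mtu() below. `struct example_dev` and
`example_dev_set_mtu()` are hypothetical; the counter increment merely stands
in for netdev_change_seq_changed().

```c
/* Hypothetical device with a change sequence number. */
struct example_dev {
    int mtu;
    unsigned long long change_seq;
};

static int
example_dev_set_mtu(struct example_dev *dev, int mtu)
{
    if (dev->mtu != mtu) {
        dev->mtu = mtu;
        dev->change_seq++;      /* Stands in for netdev_change_seq_changed(). */
    }
    return 0;
}
```

Setting the same MTU twice bumps the sequence number only once, so observers
are not woken up for a request that changes nothing.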
 lib/netdev-dummy.c|  5 -
 tests/ofproto-dpif.at | 30 ++
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index 92af15f..c8f82b7 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -1155,7 +1155,10 @@ netdev_dummy_set_mtu(const struct netdev *netdev, int mtu)
 struct netdev_dummy *dev = netdev_dummy_cast(netdev);
 
 ovs_mutex_lock(&dev->mutex);
-dev->mtu = mtu;
+if (dev->mtu != mtu) {
+dev->mtu = mtu;
+netdev_change_seq_changed(netdev);
+}
 ovs_mutex_unlock(&dev->mutex);
 
 return 0;
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index a46fc81..3638063 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -8859,3 +8859,33 @@ n_packets=0
 
 OVS_VSWITCHD_STOP
 AT_CLEANUP
+
+AT_SETUP([ofproto - set mtu])
+OVS_VSWITCHD_START
+
+add_of_ports br0 1
+
+# Check that initial MTU is 1500 for 'br0' and 'p1'.
+AT_CHECK([ovs-vsctl get Interface br0 mtu], [0], [dnl
+1500
+])
+AT_CHECK([ovs-vsctl get Interface p1 mtu], [0], [dnl
+1500
+])
+
+# Request new MTU for 'p1'
+AT_CHECK([ovs-vsctl set Interface p1 mtu_request=1600])
+
+# Check that the new MTU is applied
+AT_CHECK([ovs-vsctl --timeout=10 wait-until Interface p1 mtu=1600])
+# The internal port 'br0' should have the same MTU value as p1, because it's
+# the new bridge minimum.
+AT_CHECK([ovs-vsctl --timeout=10 wait-until Interface br0 mtu=1600])
+
+AT_CHECK([ovs-vsctl del-port br0 p1])
+
+# When 'p1' is deleted, the internal port should return to the default MTU
+AT_CHECK([ovs-vsctl --timeout=10 wait-until Interface br0 mtu=1500])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
-- 
1.9.3



[ovs-dev] [PATCH V7 3/7] netdev: Pass 'netdev_class' to ->run() and ->wait().

2016-08-09 Thread Mark Kavanagh
From: Daniele Di Proietto 

This will allow run() and wait() methods to be shared between different
classes and still perform class-specific work.

Signed-off-by: Daniele Di Proietto 
---
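Note: a minimal sketch of why passing the class to run() is useful. All names
here (`example_class`, `example_dev`, `example_shared_run()`, `example_devs`)
are hypothetical and much simpler than the real netdev provider interface; the
point is only that one function can back several classes and still restrict
its work to the class it was called for.

```c
#include <stddef.h>

struct example_class {
    const char *type;
    void (*run)(const struct example_class *);
};

struct example_dev {
    const struct example_class *class;
    int runs;                   /* Number of times periodic work was done. */
};

static void example_shared_run(const struct example_class *);

/* Two classes that share the same run() implementation. */
static const struct example_class example_class_a = { "a", example_shared_run };
static const struct example_class example_class_b = { "b", example_shared_run };

static struct example_dev example_devs[] = {
    { &example_class_a, 0 },
    { &example_class_b, 0 },
    { &example_class_a, 0 },
};

static void
example_shared_run(const struct example_class *class)
{
    size_t i;

    for (i = 0; i < sizeof example_devs / sizeof example_devs[0]; i++) {
        if (example_devs[i].class != class) {
            continue;           /* Device belongs to another class. */
        }
        example_devs[i].runs++;
    }
}
```

Without the parameter, the shared function would have no way to tell on whose
behalf it was invoked and would process every device once per class.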
 lib/netdev-bsd.c  |  6 +++---
 lib/netdev-dummy.c|  4 ++--
 lib/netdev-linux.c|  6 +++---
 lib/netdev-provider.h | 14 ++
 lib/netdev-vport.c|  4 ++--
 lib/netdev.c  |  4 ++--
 6 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/lib/netdev-bsd.c b/lib/netdev-bsd.c
index 2bba0ed..75a330b 100644
--- a/lib/netdev-bsd.c
+++ b/lib/netdev-bsd.c
@@ -146,7 +146,7 @@ static void ifr_set_flags(struct ifreq *, int flags);
 static int af_link_ioctl(unsigned long command, const void *arg);
 #endif
 
-static void netdev_bsd_run(void);
+static void netdev_bsd_run(const struct netdev_class *);
 static int netdev_bsd_get_mtu(const struct netdev *netdev_, int *mtup);
 
 static bool
@@ -180,7 +180,7 @@ netdev_get_kernel_name(const struct netdev *netdev)
  * interface status changes, and eventually calls all the user callbacks.
  */
 static void
-netdev_bsd_run(void)
+netdev_bsd_run(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 rtbsd_notifier_run();
 }
@@ -190,7 +190,7 @@ netdev_bsd_run(void)
  * be called.
  */
 static void
-netdev_bsd_wait(void)
+netdev_bsd_wait(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 rtbsd_notifier_wait();
 }
diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index a950409..2a6aa56 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -622,7 +622,7 @@ dummy_netdev_get_conn_state(struct dummy_packet_conn *conn)
 }
 
 static void
-netdev_dummy_run(void)
+netdev_dummy_run(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 struct netdev_dummy *dev;
 
@@ -636,7 +636,7 @@ netdev_dummy_run(void)
 }
 
 static void
-netdev_dummy_wait(void)
+netdev_dummy_wait(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 struct netdev_dummy *dev;
 
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index fa37bcf..1b5f7c1 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -526,7 +526,7 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
  * changes in the device miimon status, so we can use atomic_count. */
 static atomic_count miimon_cnt = ATOMIC_COUNT_INIT(0);
 
-static void netdev_linux_run(void);
+static void netdev_linux_run(const struct netdev_class *);
 
 static int netdev_linux_do_ethtool(const char *name, struct ethtool_cmd *,
int cmd, const char *cmd_name);
@@ -623,7 +623,7 @@ netdev_linux_miimon_enabled(void)
 }
 
 static void
-netdev_linux_run(void)
+netdev_linux_run(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 struct nl_sock *sock;
 int error;
@@ -697,7 +697,7 @@ netdev_linux_run(void)
 }
 
 static void
-netdev_linux_wait(void)
+netdev_linux_wait(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 struct nl_sock *sock;
 
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index ae390cb..5bcfeba 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -236,15 +236,21 @@ struct netdev_class {
 int (*init)(void);
 
 /* Performs periodic work needed by netdevs of this class.  May be null if
- * no periodic work is necessary. */
-void (*run)(void);
+ * no periodic work is necessary.
+ *
+ * 'netdev_class' points to the class.  It is useful in case the same
+ * function is used to implement different classes. */
+void (*run)(const struct netdev_class *netdev_class);
 
 /* Arranges for poll_block() to wake up if the "run" member function needs
  * to be called.  Implementations are additionally required to wake
  * whenever something changes in any of its netdevs which would cause their
  * ->change_seq() function to change its result.  May be null if nothing is
- * needed here. */
-void (*wait)(void);
+ * needed here.
+ *
+ * 'netdev_class' points to the class.  It is useful in case the same
+ * function is used to implement different classes. */
+void (*wait)(const struct netdev_class *netdev_class);
 
 /* ##  ## */
 /* ## netdev Functions ## */
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 87a30f8..7eabd2c 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -321,7 +321,7 @@ netdev_vport_update_flags(struct netdev *netdev OVS_UNUSED,
 }
 
 static void
-netdev_vport_run(void)
+netdev_vport_run(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 uint64_t seq;
 
@@ -334,7 +334,7 @@ netdev_vport_run(void)
 }
 
 static void
-netdev_vport_wait(void)
+netdev_vport_wait(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 uint64_t seq;
 
diff --git a/lib/netdev.c b/lib/netdev.c
index 75bf1cb..589d37c 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -160,7 +160,7 @@ netdev_run(void)
 struct netdev_registered_class *rc;
 CMAP_FOR_EACH (rc, cmap_node, 

[ovs-dev] [PATCH V7 4/7] netdev-dummy: Add dummy-internal class.

2016-08-09 Thread Mark Kavanagh
From: Daniele Di Proietto 

"internal" netdevs are treated specially in OVS (e.g. for MTU), but
the dummy datapath remaps both "system" and "internal" devices to the
same "dummy" netdev class, so there's no way to discern those in tests.

This commit adds a new "dummy-internal" netdev type, which will be used
by the dummy datapath for internal ports, so that other parts of the
code can understand which ports are internal just by looking at the
netdev object.

The alternative solution, using the original interface type ("internal")
instead of the translated netdev type ("dummy"), is harder to implement,
because in so many places only the netdev object is available.

Signed-off-by: Daniele Di Proietto 
---
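Note: a minimal sketch of the type translation changed in lib/dpif-netdev.c
below, with the dummy-datapath check reduced to a boolean flag.
`example_port_open_type()` is a hypothetical stand-in, not the real
dpif_netdev_port_open_type().

```c
#include <stdbool.h>
#include <string.h>

/* Maps an interface type to the netdev type used to open it.  Only
 * "internal" is translated; every other type is returned unchanged. */
static const char *
example_port_open_type(bool datapath_is_dummy, const char *type)
{
    return strcmp(type, "internal") ? type
           : datapath_is_dummy ? "dummy-internal"
           : "tap";
}
```

Because internal ports now get a netdev type of their own on the dummy
datapath, code that only sees the netdev can tell them apart from other dummy
ports.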
 lib/dpif-netdev.c |  2 +-
 lib/netdev-dummy.c| 14 --
 tests/bridge.at   |  6 +++---
 tests/dpctl.at| 12 ++--
 tests/mpls-xlate.at   |  4 ++--
 tests/netdev-type.at  |  2 +-
 tests/ofproto-dpif.at | 18 +-
 tests/ovs-vswitchd.at |  6 +++---
 tests/pmd.at  |  8 
 tests/tunnel-push-pop-ipv6.at |  4 ++--
 tests/tunnel-push-pop.at  |  4 ++--
 tests/tunnel.at   | 28 ++--
 12 files changed, 59 insertions(+), 49 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e39362e..6f2e07d 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -888,7 +888,7 @@ static const char *
 dpif_netdev_port_open_type(const struct dpif_class *class, const char *type)
 {
 return strcmp(type, "internal") ? type
-  : dpif_netdev_class_is_dummy(class) ? "dummy"
+  : dpif_netdev_class_is_dummy(class) ? "dummy-internal"
   : "tap";
 }
 
diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index 2a6aa56..92af15f 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -622,12 +622,15 @@ dummy_netdev_get_conn_state(struct dummy_packet_conn *conn)
 }
 
 static void
-netdev_dummy_run(const struct netdev_class *netdev_class OVS_UNUSED)
+netdev_dummy_run(const struct netdev_class *netdev_class)
 {
 struct netdev_dummy *dev;
 
 ovs_mutex_lock(&dummy_list_mutex);
 LIST_FOR_EACH (dev, list_node, &dummy_list) {
+if (netdev_get_class(&dev->up) != netdev_class) {
+continue;
+}
 ovs_mutex_lock(&dev->mutex);
 dummy_packet_conn_run(dev);
 ovs_mutex_unlock(&dev->mutex);
@@ -636,12 +639,15 @@ netdev_dummy_run(const struct netdev_class *netdev_class OVS_UNUSED)
 }
 
 static void
-netdev_dummy_wait(const struct netdev_class *netdev_class OVS_UNUSED)
+netdev_dummy_wait(const struct netdev_class *netdev_class)
 {
 struct netdev_dummy *dev;
 
 ovs_mutex_lock(&dummy_list_mutex);
 LIST_FOR_EACH (dev, list_node, &dummy_list) {
+if (netdev_get_class(&dev->up) != netdev_class) {
+continue;
+}
 ovs_mutex_lock(&dev->mutex);
 dummy_packet_conn_wait(&dev->conn);
 ovs_mutex_unlock(&dev->mutex);
@@ -1380,6 +1386,9 @@ netdev_dummy_update_flags(struct netdev *netdev_,
 static const struct netdev_class dummy_class =
 NETDEV_DUMMY_CLASS("dummy", false, NULL);
 
+static const struct netdev_class dummy_internal_class =
+NETDEV_DUMMY_CLASS("dummy-internal", false, NULL);
+
 static const struct netdev_class dummy_pmd_class =
 NETDEV_DUMMY_CLASS("dummy-pmd", true,
netdev_dummy_reconfigure);
@@ -1751,6 +1760,7 @@ netdev_dummy_register(enum dummy_level level)
 netdev_dummy_override("system");
 }
 netdev_register_provider(&dummy_class);
+netdev_register_provider(&dummy_internal_class);
 netdev_register_provider(&dummy_pmd_class);
 
 netdev_vport_tunnel_register();
diff --git a/tests/bridge.at b/tests/bridge.at
index 37c55ba..3dbabe5 100644
--- a/tests/bridge.at
+++ b/tests/bridge.at
@@ -12,7 +12,7 @@ add_of_ports br0 1 2
 AT_CHECK([ovs-appctl dpif/show], [0], [dnl
 dummy@ovs-dummy: hit:0 missed:0
br0:
-   br0 65534/100: (dummy)
+   br0 65534/100: (dummy-internal)
p1 1/1: (dummy)
p2 2/2: (dummy)
 ])
@@ -23,7 +23,7 @@ AT_CHECK([ovs-appctl dpctl/del-if dummy@ovs-dummy p1])
 AT_CHECK([ovs-appctl dpif/show], [0], [dnl
 dummy@ovs-dummy: hit:0 missed:0
br0:
-   br0 65534/100: (dummy)
+   br0 65534/100: (dummy-internal)
p2 2/2: (dummy)
 ])
 
@@ -32,7 +32,7 @@ AT_CHECK([ovs-vsctl del-port p2])
 AT_CHECK([ovs-appctl dpif/show], [0], [dnl
 dummy@ovs-dummy: hit:0 missed:0
br0:
-   br0 65534/100: (dummy)
+   br0 65534/100: (dummy-internal)
p1 1/1: (dummy)
 ])
 OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
diff --git a/tests/dpctl.at b/tests/dpctl.at
index b6d5dd6..8c761c8 100644
--- a/tests/dpctl.at
+++ b/tests/dpctl.at
@@ -23,14 +23,14 @@ AT_CHECK([ovs-appctl dpctl/show dummy@br0], [0], [dnl
 dummy@br0:
lookups: hit:0 

[ovs-dev] [PATCH V7 2/7] vswitchd: Introduce 'mtu_request' column in Interface.

2016-08-09 Thread Mark Kavanagh
From: Daniele Di Proietto 

The 'mtu_request' column can be used to set the MTU of a specific
interface.

This column is useful because it will allow changing the MTU of DPDK
devices (implemented in a future commit), which are not accessible
outside the ovs-vswitchd process, but it can be used for kernel
interfaces as well.

The current implementation of set_mtu() in netdev-dpdk is removed
because it's broken.  It will be reintroduced by a subsequent commit in
this series.

Signed-off-by: Daniele Di Proietto 
---
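Note: a minimal sketch of the decision made in vswitchd/bridge.c below. The
struct and function names (`example_iface_cfg`, `example_mtu_to_apply()`) are
hypothetical; the sketch assumes that an optional integer column is exposed as
a pointer plus an element count of 0 or 1, as the 'n_mtu_request' check in the
hunk suggests.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of an Interface row with an optional 'mtu_request'. */
struct example_iface_cfg {
    const int64_t *mtu_request;
    size_t n_mtu_request;
};

/* Returns true and stores the MTU to apply in '*mtu' if a request should be
 * honoured.  Internal interfaces are skipped, since their MTU is derived
 * from the other ports in the bridge. */
static bool
example_mtu_to_apply(const struct example_iface_cfg *cfg, bool is_internal,
                     int *mtu)
{
    if (cfg->n_mtu_request == 1 && !is_internal) {
        *mtu = (int) *cfg->mtu_request;
        return true;
    }
    return false;
}
```

An unset column and an internal interface both leave the MTU untouched; only
an explicit request on a non-internal interface results in a set_mtu() call.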
 NEWS   |  2 ++
 lib/netdev-dpdk.c  | 53 +-
 vswitchd/bridge.c  |  9 
 vswitchd/vswitch.ovsschema | 10 +++--
 vswitchd/vswitch.xml   | 52 +
 5 files changed, 58 insertions(+), 68 deletions(-)

diff --git a/NEWS b/NEWS
index c2ed71d..ce10982 100644
--- a/NEWS
+++ b/NEWS
@@ -101,6 +101,8 @@ Post-v2.5.0
- ovs-pki: Changed message digest algorithm from SHA-1 to SHA-512 because
  SHA-1 is no longer secure and some operating systems have started to
  disable it in OpenSSL.
+   - Add 'mtu_request' column to the Interface table. It can be used to
+ configure the MTU of non-internal ports.
 
 
 v2.5.0 - 26 Feb 2016
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index f37ec1c..60db568 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1639,57 +1639,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup)
 }
 
 static int
-netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
-{
-struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-int old_mtu, err, dpdk_mtu;
-struct dpdk_mp *old_mp;
-struct dpdk_mp *mp;
-uint32_t buf_size;
-
-ovs_mutex_lock(&dpdk_mutex);
-ovs_mutex_lock(&dev->mutex);
-if (dev->mtu == mtu) {
-err = 0;
-goto out;
-}
-
-buf_size = dpdk_buf_size(mtu);
-dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
-
-mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
-if (!mp) {
-err = ENOMEM;
-goto out;
-}
-
-rte_eth_dev_stop(dev->port_id);
-
-old_mtu = dev->mtu;
-old_mp = dev->dpdk_mp;
-dev->dpdk_mp = mp;
-dev->mtu = mtu;
-dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
-
-err = dpdk_eth_dev_init(dev);
-if (err) {
-dpdk_mp_put(mp);
-dev->mtu = old_mtu;
-dev->dpdk_mp = old_mp;
-dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
-dpdk_eth_dev_init(dev);
-goto out;
-}
-
-dpdk_mp_put(old_mp);
-netdev_change_seq_changed(netdev);
-out:
-ovs_mutex_unlock(&dev->mutex);
-ovs_mutex_unlock(&dpdk_mutex);
-return err;
-}
-
-static int
 netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier);
 
 static int
@@ -2964,7 +2913,7 @@ netdev_dpdk_vhost_cuse_reconfigure(struct netdev *netdev)
 netdev_dpdk_set_etheraddr,\
 netdev_dpdk_get_etheraddr,\
 netdev_dpdk_get_mtu,  \
-netdev_dpdk_set_mtu,  \
+NULL,   /* set_mtu */ \
 netdev_dpdk_get_ifindex,  \
 GET_CARRIER,  \
 netdev_dpdk_get_carrier_resets,   \
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index ddf1fe5..397be70 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -775,6 +775,15 @@ bridge_delete_or_reconfigure_ports(struct bridge *br)
 goto delete;
 }
 
+if (iface->cfg->n_mtu_request == 1
+&& strcmp(iface->type,
+  ofproto_port_open_type(br->type, "internal"))) {
+/* Try to set the MTU to the requested value.  This is not done
+ * for internal interfaces, since their MTU is decided by the
+ * ofproto module, based on other ports in the bridge. */
+netdev_set_mtu(iface->netdev, *iface->cfg->mtu_request);
+}
+
 /* If the requested OpenFlow port for 'iface' changed, and it's not
  * already the correct port, then we might want to temporarily delete
  * this interface, so we can add it back again with the new OpenFlow
diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
index 32fdf28..8966803 100644
--- a/vswitchd/vswitch.ovsschema
+++ b/vswitchd/vswitch.ovsschema
@@ -1,6 +1,6 @@
 {"name": "Open_vSwitch",
- "version": "7.13.0",
- "cksum": "889248633 22774",
+ "version": "7.14.0",
+ "cksum": "3974332717 22936",
  "tables": {
"Open_vSwitch": {
  "columns": {
@@ -321,6 +321,12 @@
"mtu": {
  "type": {"key": "integer", "min": 0, "max": 1},
  "ephemeral": true},
+   "mtu_request": {
+ "type": {
+   "key": {"type": "integer",
+   "minInteger": 1},
+   

[ovs-dev] [PATCH V7 1/7] ofproto: Consider datapath_type when looking for internal ports.

2016-08-09 Thread Mark Kavanagh
From: Daniele Di Proietto 

Interfaces with type "internal" end up having a netdev with type "tap"
in the dpif-netdev datapath, so a strcmp will fail to match internal
interfaces.

We can translate the types with ofproto_port_open_type() before calling
strcmp to fix this.

This fixes a minor issue where internal interfaces are considered
non-internal in the userspace datapath for the purpose of adjusting the
MTU.

Signed-off-by: Daniele Di Proietto 
---
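Note: a minimal sketch of the comparison problem described above. Both
functions are hypothetical simplifications of ofport_is_internal(); the
translated type name is passed in directly instead of being looked up with
ofproto_port_open_type().

```c
#include <stdbool.h>
#include <string.h>

/* Old check: compares the netdev type with the literal "internal". */
static bool
example_is_internal_naive(const char *netdev_type)
{
    return !strcmp(netdev_type, "internal");
}

/* New check: compares against whatever "internal" was translated to when the
 * port was opened ('internal_open_type'). */
static bool
example_is_internal_translated(const char *netdev_type,
                               const char *internal_open_type)
{
    return !strcmp(netdev_type, internal_open_type);
}
```

For a datapath that opens internal ports as "tap", the old check never matches
while the new one does; for a datapath that keeps the "internal" type, the two
checks agree.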
 ofproto/ofproto.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index 8e59c69..088f91a 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -220,7 +220,8 @@ static void learned_cookies_flush(struct ofproto *, struct ovs_list *dead_cookie
 /* ofport. */
 static void ofport_destroy__(struct ofport *) OVS_EXCLUDED(ofproto_mutex);
 static void ofport_destroy(struct ofport *, bool del);
-static inline bool ofport_is_internal(const struct ofport *);
+static inline bool ofport_is_internal(const struct ofproto *,
+  const struct ofport *);
 
 static int update_port(struct ofproto *, const char *devname);
 static int init_ports(struct ofproto *);
@@ -2465,7 +2466,7 @@ static void
 ofport_remove(struct ofport *ofport)
 {
 struct ofproto *p = ofport->ofproto;
-bool is_internal = ofport_is_internal(ofport);
+bool is_internal = ofport_is_internal(p, ofport);
 
 connmgr_send_port_status(ofport->ofproto->connmgr, NULL, &ofport->pp,
  OFPPR_DELETE);
@@ -2751,9 +2752,10 @@ init_ports(struct ofproto *p)
 }
 
 static inline bool
-ofport_is_internal(const struct ofport *port)
+ofport_is_internal(const struct ofproto *p, const struct ofport *port)
 {
-return !strcmp(netdev_get_type(port->netdev), "internal");
+return !strcmp(netdev_get_type(port->netdev),
+   ofproto_port_open_type(p->type, "internal"));
 }
 
 /* Find the minimum MTU of all non-datapath devices attached to 'p'.
@@ -2770,7 +2772,7 @@ find_min_mtu(struct ofproto *p)
 
 /* Skip any internal ports, since that's what we're trying to
  * set. */
-if (ofport_is_internal(ofport)) {
+if (ofport_is_internal(p, ofport)) {
 continue;
 }
 
@@ -2797,7 +2799,7 @@ update_mtu(struct ofproto *p, struct ofport *port)
 port->mtu = 0;
 return;
 }
-if (ofport_is_internal(port)) {
+if (ofport_is_internal(p, port)) {
 if (dev_mtu > p->min_mtu) {
if (!netdev_set_mtu(port->netdev, p->min_mtu)) {
dev_mtu = p->min_mtu;
@@ -2827,7 +2829,7 @@ update_mtu_ofproto(struct ofproto *p)
 HMAP_FOR_EACH (ofport, hmap_node, &p->ports) {
 struct netdev *netdev = ofport->netdev;
 
-if (ofport_is_internal(ofport)) {
+if (ofport_is_internal(p, ofport)) {
 if (!netdev_set_mtu(netdev, p->min_mtu)) {
 ofport->mtu = p->min_mtu;
 }
-- 
1.9.3



Re: [ovs-dev] the host will be soft lookup when some illeagal packets attack host

2016-08-09 Thread pravin shelar
On Mon, Aug 8, 2016 at 8:32 PM, Zhangkun (K)  wrote:
> diff --git a/datapath/linux/compat/flow_dissector.c b/datapath/linux/compat/flow_dissector.c
> index 3f42dba..4c5d023 100644
> --- a/datapath/linux/compat/flow_dissector.c
> +++ b/datapath/linux/compat/flow_dissector.c
> @@ -77,7 +77,7 @@ again:
> struct iphdr _iph;
> ip:
> iph = skb_header_pointer(skb, nhoff, sizeof(_iph), &_iph);
> -   if (!iph)
> +   if (!iph || iph->ihl < 5)
> return false;
> if (ip_is_fragment(iph))
>
>
Good catch. Can you send a formal patch?

Thanks.


Re: [ovs-dev] [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support

2016-08-09 Thread pravin shelar
On Mon, Aug 8, 2016 at 8:17 AM, Simon Horman  wrote:
> On Wed, Jul 20, 2016 at 11:06:37AM -0700, pravin shelar wrote:
>> On Tue, Jul 19, 2016 at 5:02 PM, Simon Horman
>>  wrote:
>> > On Mon, Jul 18, 2016 at 03:34:52PM -0700, pravin shelar wrote:
>> >> On Sun, Jul 17, 2016 at 9:50 PM, Simon Horman
>> >>  wrote:
>> >> > [CC Jiri Benc for portion regarding GRE]
>> >> >
>> >> > Hi Pravin,
>> >> >
>> >> > On Fri, Jul 15, 2016 at 02:07:37PM -0700, pravin shelar wrote:
>> >> >> On Wed, Jul 13, 2016 at 12:31 AM, Simon Horman
>> >> >>  wrote:
>> >> >> > Hi Pravin,
>> >> >> >
>> >> >> > On Thu, Jul 07, 2016 at 01:54:15PM -0700, pravin shelar wrote:
>> >> >> >> On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
>> >> >> >>  wrote:
>> >> >> >
>> >> >> > ...
>> >> >>
>> >> >> >
>> >> >> >> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
>> >> >> >> > index 0ea128eeeab2..86f2cfb19de3 100644
>> >> >> >> > --- a/net/openvswitch/flow.c
>> >> >> >> > +++ b/net/openvswitch/flow.c
>> >> >> >> ...
>> >> >> >>
>> >> >> >> > @@ -723,9 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>> >> >> >> > key->phy.skb_mark = skb->mark;
>> >> >> >> > ovs_ct_fill_key(skb, key);
>> >> >> >> > key->ovs_flow_hash = 0;
>> >> >> >> > +   key->phy.is_layer3 = skb->mac_len == 0;
>> >> >> >>
>> >> >> >> I do not think mac_len can be used. mac_header needs to be checked.
>> >> >> >> ...
>> >> >> >
>> >> >> > Yes, indeed. The update to use skb_mac_header_was_set() here
>> >> >> > accidentally slipped into the following patch, sorry about that.
>> >> >> >
>> >> >> > With that change in place I believe that this patch is internally
>> >> >> > consistent because mac_header and mac_len are set correctly by the
>> >> >> > call to key_extract() which is called by ovs_flow_key_extract() just
>> >> >> > after where the excerpt above ends.
>> >> >> >
>> >> >> > That said, I do think that it is possible to rely on
>> >> >> > skb_mac_header_was_set throughout the datapath, including action
>> >> >> > processing etc... I have provided an incremental patch - which I
>> >> >> > created on top of this entire series - at the end of this email. If
>> >> >> > you prefer that approach I am happy to take it, though I do feel that
>> >> >> > using mac_len leads to slightly cleaner code. Let me know what you
>> >> >> > think.
>> >> >> >
>> >> >>
>> >> >>
>> >> >> I am not sure if you can use only mac_len to detect L3 packet. This
>> >> >> does not work with MPLS packets, mac_len is used to account MPLS
>> >> >> headers pushed on skb. Therefore in case of a MPLS header on L3
>> >> >> packet, mac_len would be non zero and we have to look at either
>> >> >> mac_header or some other metadata like is_layer3 flag from key to
>> >> >> check for L3 packet.
>> >> >
>> >> > At least within OvS mac_len does not include the length of the MPLS
>> >> > label stack. Rather, the MPLS label stack length is the difference
>> >> > between the end of (mac_header + mac_len) and network_header.
>> >> >
>> >> > So I think that the scheme does work as mac_len is 0 if there is no L2
>> >> > header regardless of if an MPLS label stack is present or not.
>> >> >
>> >>
>> >> I was thinking of the overall networking stack rather than just the ovs
>> >> datapath. I think we should have a consistent method of detecting L3
>> >> packets. As commented in the previous mail, it could be achieved using
>> >> skb->protocol and device type.
>> >
>> > This is somewhat of a surprise to me. As far as I recall when MPLS support
>> > was added to OvS it and the accompanying support for MPLS GSO was the only
>> > MPLS support present in the kernel. And at the time the scheme developed by
>> > Jesse Gross, myself and others was as I describe above.
>> >
>> > Internally OvS relies on this scheme and in particular it is used
>> > by skb_mpls_header() to calculate the beginning of the MPLS label stack
>> > accurately in the presence of VLAN tags.
>> >
>> > Is it mpls_gso_segment() that you are concerned about?
>> > If so, perhaps the problem could be addressed there.
>>
>> Yes.
>> Can you read the comment I made in the previous mail in the context of
>> function skb_mpls_header()? I have given the rationale for the requested
>> change.
>
> Hi Pravin,
>
> I have made an attempt to implement your suggestion to the extent that
> I understand it. The following is an incremental change on top
> of this patch-set. Does it move things closer to what you have in mind?
>
The following approach looks good to me. I have posted a couple of comments.

> Light testing seems to indicate that it works for GSO skbs
> received over both L3 and L2 GRE tunnels by OvS with both
> IP-in-MPLS and IP (without MPLS) payloads.
>

Thanks for testing it. Can you also add those tests to OVS kmod test 
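
For readers following the mac_len discussion quoted above, here is a minimal sketch of the offset arithmetic Simon describes for OvS: mac_len is 0 when there is no L2 header, and the MPLS label stack is whatever lies between the end of the L2 header and network_header. The struct and function names are hypothetical stand-ins for illustration only; they are not the kernel's struct sk_buff or its helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few offsets discussed in the thread;
 * this is NOT the kernel's struct sk_buff. */
struct pkt_offsets {
    size_t mac_header;      /* offset of the L2 header from the buffer start */
    size_t mac_len;         /* L2 header length; 0 when there is no L2 header */
    size_t network_header;  /* offset of the network (e.g. IP) header */
};

/* Under the OvS convention described above, an L3-only packet has
 * mac_len == 0, whether or not an MPLS label stack is present. */
static bool pkt_is_layer3(const struct pkt_offsets *p)
{
    return p->mac_len == 0;
}

/* The MPLS label stack starts right after the L2 header. */
static size_t pkt_mpls_offset(const struct pkt_offsets *p)
{
    return p->mac_header + p->mac_len;
}

/* Its length is the gap between the end of L2 and network_header. */
static size_t pkt_mpls_stack_len(const struct pkt_offsets *p)
{
    return p->network_header - pkt_mpls_offset(p);
}
```

With a 14-byte Ethernet header and two 4-byte labels, network_header would sit at offset 22 and the stack length comes out as 8; with no L2 header and one label, mac_len stays 0 and the stack length is 4. Pravin's objection is that code outside OvS may account for mac_len differently, which this sketch does not model.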

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH V6 7/7] netdev-dpdk: add support for jumbo frames

2016-08-09 Thread Ilya Maximets
On 09.08.2016 18:02, Mark Kavanagh wrote:
> Add support for Jumbo Frames to DPDK-enabled port types,
> using single-segment-mbufs.
> 
> Using this approach, the amount of memory allocated to each mbuf
> to store frame data is increased to a value greater than 1518B
> (typical Ethernet maximum frame length). The increased space
> available in the mbuf means that an entire Jumbo Frame of a specific
> size can be carried in a single mbuf, as opposed to partitioning
> it across multiple mbuf segments.
> 
> The amount of space allocated to each mbuf to hold frame data is
> defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
> parameter.
> 
> Signed-off-by: Mark Kavanagh 
> [diproiet...@vmware.com rebased]
> Signed-off-by: Daniele Di Proietto 
> ---
> 
> v6:
> - include device name in netdev_dpdk_set_mtu error log
> - resolve minor coding standards infractions
> 
> v5:
> - rename dpdk_mp_configure to netdev_dpdk_mempool_configure
> - consolidate socket_id and mtu changes within
>   netdev_dpdk_mempool_configure
> - add lower bounds check for user-supplied MTU
> - add socket_id and mtu fields to mempool configure error report
> - minor cosmetic changes
> 
> v4:
> - restore error reporting in *_reconfigure functions (for
>   non-mtu-configuration based errors)
> - remove 'goto' in the event of dpdk_mp_configure failure
> - remove superfluous error variables
> 
>  v3:
> - replace netdev_dpdk.last_mtu with local variable
> - add comment for dpdk_mp_configure
> 
>  v2:
>  - rebase to HEAD of master
>  - fall back to previous 'good' MTU if reconfigure fails
>  - introduce new field 'last_mtu' in struct netdev_dpdk to facilitate
>fall-back
>  - rename 'mtu_request' to 'requested_mtu' in struct netdev_dpdk
>  - remove rebasing artifact in INSTALL.DPDK-Advanced.md
>  - remove superfluous variable in dpdk_mp_configure
>  - fix minor coding style infraction
> 
> 
>  INSTALL.DPDK-ADVANCED.md |  58 ++-
>  INSTALL.DPDK.md  |   1 -
>  NEWS |   1 +
>  lib/netdev-dpdk.c| 145 +++
>  4 files changed, 176 insertions(+), 29 deletions(-)

Looks good to me.
You may add one of these tags:

Signed-off-by: Ilya Maximets 
Acked-by: Ilya Maximets 

Choose which of them is more suitable.

Best regards, Ilya Maximets.


[ovs-dev] [PATCH] openvswitch: do not ignore netdev errors when creating tunnel vports

2016-08-09 Thread Martynas Pumputis
The creation of a tunnel vport (geneve, gre, vxlan) brings up a
corresponding netdev, a multi-step operation which can fail.

For example, changing a vxlan vport's netdev state to 'up' binds the
vport's socket to a UDP port - if the binding fails (e.g. due to the
port being in use), the error is currently ignored giving the
appearance that the tunnel vport creation completed successfully.

Signed-off-by: Martynas Pumputis 
---
 net/openvswitch/vport-geneve.c |  9 -
 net/openvswitch/vport-gre.c| 11 +--
 net/openvswitch/vport-vxlan.c  |  9 -
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/net/openvswitch/vport-geneve.c b/net/openvswitch/vport-geneve.c
index 1a1fcec..5aaf3ba 100644
--- a/net/openvswitch/vport-geneve.c
+++ b/net/openvswitch/vport-geneve.c
@@ -93,7 +93,14 @@ static struct vport *geneve_tnl_create(const struct vport_parms *parms)
return ERR_CAST(dev);
}
 
-   dev_change_flags(dev, dev->flags | IFF_UP);
+   err = dev_change_flags(dev, dev->flags | IFF_UP);
+   if (err < 0) {
+   rtnl_delete_link(dev);
+   rtnl_unlock();
+   ovs_vport_free(vport);
+   goto error;
+   }
+
rtnl_unlock();
return vport;
 error:
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index 7f8897f..0e72d95 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -54,6 +54,7 @@ static struct vport *gre_tnl_create(const struct vport_parms *parms)
struct net *net = ovs_dp_get_net(parms->dp);
struct net_device *dev;
struct vport *vport;
+   int err;
 
 vport = ovs_vport_alloc(0, &ovs_gre_vport_ops, parms);
if (IS_ERR(vport))
@@ -67,9 +68,15 @@ static struct vport *gre_tnl_create(const struct vport_parms *parms)
return ERR_CAST(dev);
}
 
-   dev_change_flags(dev, dev->flags | IFF_UP);
-   rtnl_unlock();
+   err = dev_change_flags(dev, dev->flags | IFF_UP);
+   if (err < 0) {
+   rtnl_delete_link(dev);
+   rtnl_unlock();
+   ovs_vport_free(vport);
+   return ERR_PTR(err);
+   }
 
+   rtnl_unlock();
return vport;
 }
 
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index 5eb7694..7eb955e 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -130,7 +130,14 @@ static struct vport *vxlan_tnl_create(const struct vport_parms *parms)
return ERR_CAST(dev);
}
 
-   dev_change_flags(dev, dev->flags | IFF_UP);
+   err = dev_change_flags(dev, dev->flags | IFF_UP);
+   if (err < 0) {
+   rtnl_delete_link(dev);
+   rtnl_unlock();
+   ovs_vport_free(vport);
+   goto error;
+   }
+
rtnl_unlock();
return vport;
 error:
-- 
2.9.0
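
The pattern the patch applies in all three vport files (check the result of bringing the device up, and undo the creation if it fails) can be sketched in isolation. This is a toy model under stated assumptions: `toy_dev`, `toy_bring_up` and `toy_tunnel_create` are hypothetical names, and the busy UDP port is simulated with a flag rather than a real socket bind.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state; not a kernel structure. */
struct toy_dev {
    bool created;
    bool up;
};

/* Hypothetical bring-up step that can fail, e.g. when the UDP port
 * the tunnel needs is already in use. */
static int toy_bring_up(struct toy_dev *dev, bool port_in_use)
{
    if (port_in_use) {
        return -EADDRINUSE;
    }
    dev->up = true;
    return 0;
}

/* Create the device, then bring it up; on failure, undo the creation
 * and report the error instead of pretending everything worked. */
static int toy_tunnel_create(struct toy_dev *dev, bool port_in_use)
{
    int err;

    dev->created = true;
    dev->up = false;

    err = toy_bring_up(dev, port_in_use);
    if (err < 0) {
        dev->created = false;   /* roll back the half-built device */
        return err;
    }
    return 0;
}
```

Before the patch, the equivalent of `toy_bring_up()` had its return value discarded, so the caller saw success even though the device never came up.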



Re: [ovs-dev] [PATCH] ovn-controller: Reset flow processing after (re)connection to switch

2016-08-09 Thread Ryan Moats
Numan Siddique  wrote on 08/09/2016 09:39:21 AM:

> From: Numan Siddique 
> To: Ryan Moats/Omaha/IBM@IBMUS
> Cc: ovs dev 
> Date: 08/09/2016 09:39 AM
> Subject: Re: [ovs-dev] [PATCH] ovn-controller: Reset flow processing
> after (re)connection to switch
>
> On Tue, Aug 9, 2016 at 7:15 PM, Ryan Moats  wrote:
> "dev"  wrote on 08/09/2016 07:19:27 AM:
>
> > From: Numan Siddique 
> > To: ovs dev 
> > Date: 08/09/2016 07:19 AM
> > Subject: [ovs-dev] [PATCH] ovn-controller: Reset flow processing
> > after (re)connection to switch
> > Sent by: "dev" 
> >
> > When ovn-controller reconnects to the ovs-vswitchd, it deletes all the
> > OF flows in the switch. It doesn't install the flows again, leaving
> > the datapath broken unless ovn-controller is restarted or ovn-northd
> > updates the SB DB.
> >
> > The reason for this is
> >   - lflow_reset_processing() is not called after the reconnection
> >   - the hmap "installed_flows" is not cleared, because of which
> >     ofctrl_put skips adding the flows to the switch.
> >
> > This patch fixes the issue and also adds a test case to test
> > this scenario.
> >
> > Signed-off-by: Numan Siddique 
> > ---
>
> I'm going to pick a nit on this one - is the behavior you are aiming
> for delete and re-add or just recalculate and leave alone?

>
> In my testing I am seeing that all the OF flows are getting deleted
> when I restart ovs-vswitchd.
> I am testing with the latest master of OVS. I am able to see this on
> 2 different machines and also in sandbox.
>
> I thought that ovn-controller is deleting the flows in the switch
> when it restarts
> (https://github.com/openvswitch/ovs/blob/master/ovn/controller/ofctrl.c#L355)
>
> Now I tested again and before restarting ovs-vswitchd, I killed
> ovn-controller. Looks like ovs-vswitchd is clearing the old flows when
> it restarts. I am not sure if this is the intended behavior. Looks
> like it is. Please correct me if I am wrong here.
>
> You can run the commands below to reproduce the issue in sandbox:
>
> -
>  $ make sandbox SANDBOXFLAGS="--ovn"
>  $ ovn/env1/setup.sh
>  $ ovs-ofctl dump-flows br-int
>  $ ovs-appctl -t ovn-controller exit
>  $ ovs-appctl -t ovs-vswitchd exit
>  $ ovs-vswitchd --detach --no-chdir --pidfile -vconsole:off --log-file --enable-dummy=override -vvconn -vnetdev_dummy
>  $ ovs-ofctl dump-flows br-int
> NXST_FLOW reply (xid=0x4):
> 
>
> You will see that the flows are deleted even if you don't run -
> "ovs-appctl -t ovn-controller exit".
>
> I ask because if it is "delete and re-add" aren't you still disrupting
> the datapath even if only momentarily?
>

Ok, so we'll assume that your code is valid for when ovs-vswitchd purges
the old flows and that's good.

IIRC there is a way to restart ovs-vswitchd via ovs-ctl so that it doesn't
purge the old flows.  Since I'll argue (as an operator) that is the more
important case, can you add a unit test for this and verify that your
patch doesn't leave that path broken?
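
The failure mode described in the quoted commit message (the cache of installed flows survives a reconnect, so nothing is re-sent to a switch that has lost its flows) can be shown with a toy model. Everything here is a hypothetical sketch: `toy_ctrl`, `toy_put` and `toy_on_reconnect` are made-up names and do not reflect the actual ofctrl.c data structures.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_FLOWS 8

/* Hypothetical controller state: which desired flows the controller
 * believes are already installed in the switch. */
struct toy_ctrl {
    bool installed[TOY_MAX_FLOWS];
};

/* Send only the flows not yet marked installed; return how many
 * flow-add messages were sent. */
static size_t toy_put(struct toy_ctrl *c, size_t n_desired)
{
    size_t sent = 0;

    for (size_t i = 0; i < n_desired && i < TOY_MAX_FLOWS; i++) {
        if (!c->installed[i]) {
            c->installed[i] = true;
            sent++;
        }
    }
    return sent;
}

/* On (re)connection, forget what was installed so that the next
 * toy_put() pushes every desired flow again. */
static void toy_on_reconnect(struct toy_ctrl *c)
{
    for (size_t i = 0; i < TOY_MAX_FLOWS; i++) {
        c->installed[i] = false;
    }
}
```

Calling `toy_put()` twice in a row sends nothing the second time, which is the desired steady-state behavior but is also exactly what leaves the datapath empty if the switch dropped its flows in between. Ryan's question about a restart that preserves flows corresponds to the case where clearing the cache causes a redundant re-add; this sketch does not model that path.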



[ovs-dev] [PATCH V6 7/7] netdev-dpdk: add support for jumbo frames

2016-08-09 Thread Mark Kavanagh
Add support for Jumbo Frames to DPDK-enabled port types,
using single-segment-mbufs.

Using this approach, the amount of memory allocated to each mbuf
to store frame data is increased to a value greater than 1518B
(typical Ethernet maximum frame length). The increased space
available in the mbuf means that an entire Jumbo Frame of a specific
size can be carried in a single mbuf, as opposed to partitioning
it across multiple mbuf segments.

The amount of space allocated to each mbuf to hold frame data is
defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
parameter.

Signed-off-by: Mark Kavanagh 
[diproiet...@vmware.com rebased]
Signed-off-by: Daniele Di Proietto 
---

v6:
- include device name in netdev_dpdk_set_mtu error log
- resolve minor coding standards infractions

v5:
- rename dpdk_mp_configure to netdev_dpdk_mempool_configure
- consolidate socket_id and mtu changes within
  netdev_dpdk_mempool_configure
- add lower bounds check for user-supplied MTU
- add socket_id and mtu fields to mempool configure error report
- minor cosmetic changes

v4:
- restore error reporting in *_reconfigure functions (for
  non-mtu-configuration based errors)
- remove 'goto' in the event of dpdk_mp_configure failure
- remove superfluous error variables

 v3:
- replace netdev_dpdk.last_mtu with local variable
- add comment for dpdk_mp_configure

 v2:
 - rebase to HEAD of master
 - fall back to previous 'good' MTU if reconfigure fails
 - introduce new field 'last_mtu' in struct netdev_dpdk to facilitate
   fall-back
 - rename 'mtu_request' to 'requested_mtu' in struct netdev_dpdk
 - remove rebasing artifact in INSTALL.DPDK-Advanced.md
 - remove superfluous variable in dpdk_mp_configure
 - fix minor coding style infraction


 INSTALL.DPDK-ADVANCED.md |  58 ++-
 INSTALL.DPDK.md  |   1 -
 NEWS |   1 +
 lib/netdev-dpdk.c| 145 +++
 4 files changed, 176 insertions(+), 29 deletions(-)

diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
index 0ab43d4..5e758ce 100755
--- a/INSTALL.DPDK-ADVANCED.md
+++ b/INSTALL.DPDK-ADVANCED.md
@@ -1,5 +1,5 @@
 OVS DPDK ADVANCED INSTALL GUIDE
-=
+===
 
 ## Contents
 
@@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
 7. [QOS](#qos)
 8. [Rate Limiting](#rl)
 9. [Flow Control](#fc)
-10. [Vsperf](#vsperf)
+10. [Jumbo Frames](#jumbo)
+11. [Vsperf](#vsperf)
 
 ##  1. Overview
 
@@ -862,7 +863,58 @@ respective parameter. To disable the flow control at tx side,
 
 `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
 
-##  10. Vsperf
+##  10. Jumbo Frames
+
+By default, DPDK ports are configured with standard Ethernet MTU (1500B). To
+enable Jumbo Frames support for a DPDK port, change the Interface's `mtu_request`
+attribute to a sufficiently large value.
+
+e.g. Add a DPDK Phy port with MTU of 9000:
+
+`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set Interface dpdk0 mtu_request=9000`
+
+e.g. Change the MTU of an existing port to 6200:
+
+`ovs-vsctl set Interface dpdk0 mtu_request=6200`
+
+When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments is
+increased, such that a full Jumbo Frame of a specific size may be accommodated
+within a single mbuf segment.
+
+Jumbo frame support has been validated against 9728B frames (largest frame size
+supported by Fortville NIC), using the DPDK `i40e` driver, but larger frames
+(particularly in use cases involving East-West traffic only), and other DPDK NIC
+drivers may be supported.
+
+### 10.1 vHost Ports and Jumbo Frames
+
+Some additional configuration is needed to take advantage of jumbo frames with
+vhost ports:
+
+1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
+the QEMU command line snippet below:
+
+```
+'-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
+'-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
+```
+
+2. Where virtio devices are bound to the Linux kernel driver in a guest
+   environment (i.e. interfaces are not bound to an in-guest DPDK driver),
+   the MTU of those logical network interfaces must also be increased to a
+   sufficiently large value. This avoids segmentation of Jumbo Frames
+   received in the guest. Note that 'MTU' refers to the length of the IP
+   packet only, and not that of the entire frame.
+
+   To calculate the exact MTU of a standard IPv4 frame, subtract the L2
+   header and CRC lengths (i.e. 18B) from the max supported frame size.
+   So, to set the MTU for a 9018B Jumbo Frame:
+
+   ```
+   ifconfig eth1 mtu 9000
+   ```
+
+##  11. Vsperf
 
 Vsperf project goal is to develop vSwitch test framework 
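
The MTU arithmetic from the vHost section of the patch above (subtract the 18B of L2 header and CRC from the frame size) can be written out as a small sketch. The constant and function names are assumptions made for illustration; they are not the macros used in lib/netdev-dpdk.c, and the sketch covers only a standard untagged Ethernet frame.

```c
#include <stdint.h>

/* Assumed sizes for a standard, untagged Ethernet frame. */
#define TOY_L2_HDR_LEN 14u  /* destination MAC + source MAC + EtherType */
#define TOY_CRC_LEN     4u  /* frame check sequence */

/* MTU covers the IP packet only, so strip the L2 header and CRC. */
static uint32_t toy_frame_len_to_mtu(uint32_t frame_len)
{
    return frame_len - (TOY_L2_HDR_LEN + TOY_CRC_LEN);
}

/* Inverse: the on-wire frame length needed to carry a given MTU. */
static uint32_t toy_mtu_to_frame_len(uint32_t mtu)
{
    return mtu + TOY_L2_HDR_LEN + TOY_CRC_LEN;
}
```

For the 9018B Jumbo Frame in the text this gives an MTU of 9000, matching the `ifconfig eth1 mtu 9000` example, and a standard 1500B MTU maps back to the 1518B maximum frame length mentioned in the commit message.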

[ovs-dev] [PATCH V6 1/7] ofproto: Consider datapath_type when looking for internal ports.

2016-08-09 Thread Mark Kavanagh
From: Daniele Di Proietto 

Interfaces with type "internal" end up having a netdev with type "tap"
in the dpif-netdev datapath, so a strcmp will fail to match internal
interfaces.

We can translate the types with ofproto_port_open_type() before calling
strcmp to fix this.

This fixes a minor issue where internal interfaces are considered
non-internal in the userspace datapath for the purpose of adjusting the
MTU.

Signed-off-by: Daniele Di Proietto 
---
 ofproto/ofproto.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index 8e59c69..088f91a 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -220,7 +220,8 @@ static void learned_cookies_flush(struct ofproto *, struct ovs_list *dead_cookie
 /* ofport. */
 static void ofport_destroy__(struct ofport *) OVS_EXCLUDED(ofproto_mutex);
 static void ofport_destroy(struct ofport *, bool del);
-static inline bool ofport_is_internal(const struct ofport *);
+static inline bool ofport_is_internal(const struct ofproto *,
+  const struct ofport *);
 
 static int update_port(struct ofproto *, const char *devname);
 static int init_ports(struct ofproto *);
@@ -2465,7 +2466,7 @@ static void
 ofport_remove(struct ofport *ofport)
 {
 struct ofproto *p = ofport->ofproto;
-bool is_internal = ofport_is_internal(ofport);
+bool is_internal = ofport_is_internal(p, ofport);
 
 connmgr_send_port_status(ofport->ofproto->connmgr, NULL, &ofport->pp,
  OFPPR_DELETE);
@@ -2751,9 +2752,10 @@ init_ports(struct ofproto *p)
 }
 
 static inline bool
-ofport_is_internal(const struct ofport *port)
+ofport_is_internal(const struct ofproto *p, const struct ofport *port)
 {
-return !strcmp(netdev_get_type(port->netdev), "internal");
+return !strcmp(netdev_get_type(port->netdev),
+   ofproto_port_open_type(p->type, "internal"));
 }
 
 /* Find the minimum MTU of all non-datapath devices attached to 'p'.
@@ -2770,7 +2772,7 @@ find_min_mtu(struct ofproto *p)
 
 /* Skip any internal ports, since that's what we're trying to
  * set. */
-if (ofport_is_internal(ofport)) {
+if (ofport_is_internal(p, ofport)) {
 continue;
 }
 
@@ -2797,7 +2799,7 @@ update_mtu(struct ofproto *p, struct ofport *port)
 port->mtu = 0;
 return;
 }
-if (ofport_is_internal(port)) {
+if (ofport_is_internal(p, port)) {
 if (dev_mtu > p->min_mtu) {
if (!netdev_set_mtu(port->netdev, p->min_mtu)) {
dev_mtu = p->min_mtu;
@@ -2827,7 +2829,7 @@ update_mtu_ofproto(struct ofproto *p)
 HMAP_FOR_EACH (ofport, hmap_node, &p->ports) {
 struct netdev *netdev = ofport->netdev;
 
-if (ofport_is_internal(ofport)) {
+if (ofport_is_internal(p, ofport)) {
 if (!netdev_set_mtu(netdev, p->min_mtu)) {
 ofport->mtu = p->min_mtu;
 }
-- 
1.9.3
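
To make the reasoning in the commit message concrete, here is a minimal sketch of why a plain strcmp against "internal" misses internal ports in the userspace datapath, and how translating the type first fixes it. The `toy_open_type()` helper is a simplified, hypothetical model of the translation (only the "internal" to "tap" case from the commit message); it is not ofproto_port_open_type() itself.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified port-type translation: per the commit message,
 * the userspace datapath opens "internal" interfaces as "tap" netdevs.
 * All other types pass through unchanged. */
static const char *toy_open_type(bool userspace_datapath, const char *type)
{
    if (userspace_datapath && !strcmp(type, "internal")) {
        return "tap";
    }
    return type;
}

/* Old check: compare the netdev type with the literal "internal". */
static bool toy_is_internal_naive(const char *netdev_type)
{
    return !strcmp(netdev_type, "internal");
}

/* New check: translate "internal" for this datapath, then compare. */
static bool toy_is_internal_translated(bool userspace_datapath,
                                       const char *netdev_type)
{
    return !strcmp(netdev_type, toy_open_type(userspace_datapath, "internal"));
}
```

In the userspace case the netdev type of an internal port is "tap", so the naive check reports false while the translated check reports true; in the kernel-datapath case both agree.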



[ovs-dev] [PATCH V6 6/7] netdev: Make netdev_set_mtu() netdev parameter non-const.

2016-08-09 Thread Mark Kavanagh
From: Daniele Di Proietto 

Every provider silently drops the const attribute when converting the
parameter to the appropriate subclass.  Might as well drop the const
attribute from the parameter, since this is a "set" function.

Signed-off-by: Daniele Di Proietto 
---
 lib/netdev-dummy.c| 2 +-
 lib/netdev-linux.c| 2 +-
 lib/netdev-provider.h | 2 +-
 lib/netdev.c  | 2 +-
 lib/netdev.h  | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index c8f82b7..dec1a8e 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -1150,7 +1150,7 @@ netdev_dummy_get_mtu(const struct netdev *netdev, int *mtup)
 }
 
 static int
-netdev_dummy_set_mtu(const struct netdev *netdev, int mtu)
+netdev_dummy_set_mtu(struct netdev *netdev, int mtu)
 {
 struct netdev_dummy *dev = netdev_dummy_cast(netdev);
 
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 1b5f7c1..20b5cc7 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -1382,7 +1382,7 @@ netdev_linux_get_mtu(const struct netdev *netdev_, int *mtup)
  * networking ioctl interface.
  */
 static int
-netdev_linux_set_mtu(const struct netdev *netdev_, int mtu)
+netdev_linux_set_mtu(struct netdev *netdev_, int mtu)
 {
 struct netdev_linux *netdev = netdev_linux_cast(netdev_);
 struct ifreq ifr;
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index 5bcfeba..cd04ae9 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -389,7 +389,7 @@ struct netdev_class {
  * If 'netdev' does not have an MTU (e.g. as some tunnels do not), then
  * this function should return EOPNOTSUPP.  This function may be set to
  * null if it would always return EOPNOTSUPP. */
-int (*set_mtu)(const struct netdev *netdev, int mtu);
+int (*set_mtu)(struct netdev *netdev, int mtu);
 
 /* Returns the ifindex of 'netdev', if successful, as a positive number.
  * On failure, returns a negative errno value.
diff --git a/lib/netdev.c b/lib/netdev.c
index 589d37c..5cf8bbb 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -869,7 +869,7 @@ netdev_get_mtu(const struct netdev *netdev, int *mtup)
  * MTU (as e.g. some tunnels do not).  On other failure, returns a positive
  * errno value. */
 int
-netdev_set_mtu(const struct netdev *netdev, int mtu)
+netdev_set_mtu(struct netdev *netdev, int mtu)
 {
 const struct netdev_class *class = netdev->netdev_class;
 int error;
diff --git a/lib/netdev.h b/lib/netdev.h
index dc7ede8..d8ec627 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -132,7 +132,7 @@ const char *netdev_get_name(const struct netdev *);
 const char *netdev_get_type(const struct netdev *);
 const char *netdev_get_type_from_name(const char *);
 int netdev_get_mtu(const struct netdev *, int *mtup);
-int netdev_set_mtu(const struct netdev *, int mtu);
+int netdev_set_mtu(struct netdev *, int mtu);
 int netdev_get_ifindex(const struct netdev *);
 int netdev_set_tx_multiq(struct netdev *, unsigned int n_txq);
 
-- 
1.9.3



[ovs-dev] [PATCH V6 5/7] tests: Add a new MTU test.

2016-08-09 Thread Mark Kavanagh
From: Daniele Di Proietto 

Also, netdev-dummy needs to call netdev_change_seq_changed() in
set_mtu().

Signed-off-by: Daniele Di Proietto 
---
 lib/netdev-dummy.c|  5 -
 tests/ofproto-dpif.at | 30 ++
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index 92af15f..c8f82b7 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -1155,7 +1155,10 @@ netdev_dummy_set_mtu(const struct netdev *netdev, int mtu)
 struct netdev_dummy *dev = netdev_dummy_cast(netdev);
 
 ovs_mutex_lock(&dev->mutex);
-dev->mtu = mtu;
+if (dev->mtu != mtu) {
+dev->mtu = mtu;
+netdev_change_seq_changed(netdev);
+}
 ovs_mutex_unlock(&dev->mutex);
 
 return 0;
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index a46fc81..3638063 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -8859,3 +8859,33 @@ n_packets=0
 
 OVS_VSWITCHD_STOP
 AT_CLEANUP
+
+AT_SETUP([ofproto - set mtu])
+OVS_VSWITCHD_START
+
+add_of_ports br0 1
+
+# Check that initial MTU is 1500 for 'br0' and 'p1'.
+AT_CHECK([ovs-vsctl get Interface br0 mtu], [0], [dnl
+1500
+])
+AT_CHECK([ovs-vsctl get Interface p1 mtu], [0], [dnl
+1500
+])
+
+# Request new MTU for 'p1'
+AT_CHECK([ovs-vsctl set Interface p1 mtu_request=1600])
+
+# Check that the new MTU is applied
+AT_CHECK([ovs-vsctl --timeout=10 wait-until Interface p1 mtu=1600])
+# The internal port 'br0' should have the same MTU value as p1, because it's
+# the new bridge minimum.
+AT_CHECK([ovs-vsctl --timeout=10 wait-until Interface br0 mtu=1600])
+
+AT_CHECK([ovs-vsctl del-port br0 p1])
+
+# When 'p1' is deleted, the internal port should return to the default MTU
+AT_CHECK([ovs-vsctl --timeout=10 wait-until Interface br0 mtu=1500])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
-- 
1.9.3
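
The netdev-dummy change in this patch (update the MTU and signal a change only when the value actually differs) can be sketched on its own. The `toy_dev` struct and its plain counter are hypothetical stand-ins for the dummy netdev and the netdev change sequence; locking is omitted.

```c
#include <stdint.h>

/* Hypothetical device with a change counter; stands in for netdev-dummy
 * plus the change sequence number that observers poll. */
struct toy_dev {
    int mtu;
    uint64_t change_seq;
};

/* Set the MTU; bump the change counter only if the value really changed,
 * so observers are not woken up for no-op updates. */
static int toy_set_mtu(struct toy_dev *dev, int mtu)
{
    if (dev->mtu != mtu) {
        dev->mtu = mtu;
        dev->change_seq++;
    }
    return 0;
}
```

Without the counter bump, a caller waiting for the device to change (as the new test does with `wait-until Interface p1 mtu=1600`) would have no signal that the MTU column needs refreshing.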



[ovs-dev] [PATCH V6 4/7] netdev-dummy: Add dummy-internal class.

2016-08-09 Thread Mark Kavanagh
From: Daniele Di Proietto 

"internal" netdevs are treated specially in OVS (e.g. for MTU), but
the dummy datapath remaps both "system" and "internal" devices to the
same "dummy" netdev class, so there's no way to discern those in tests.

This commit adds a new "dummy-internal" netdev type, which will be used
by the dummy datapath for internal ports, so that other parts of the
code can understand which ports are internal just by looking at the
netdev object.

The alternative solution, using the original interface type ("internal")
instead of the translated netdev type ("dummy"), is harder to implement,
because in so many places only the netdev object is available.

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c |  2 +-
 lib/netdev-dummy.c| 14 --
 tests/bridge.at   |  6 +++---
 tests/dpctl.at| 12 ++--
 tests/mpls-xlate.at   |  4 ++--
 tests/netdev-type.at  |  2 +-
 tests/ofproto-dpif.at | 18 +-
 tests/ovs-vswitchd.at |  6 +++---
 tests/pmd.at  |  8 
 tests/tunnel-push-pop-ipv6.at |  4 ++--
 tests/tunnel-push-pop.at  |  4 ++--
 tests/tunnel.at   | 28 ++--
 12 files changed, 59 insertions(+), 49 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e39362e..6f2e07d 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -888,7 +888,7 @@ static const char *
 dpif_netdev_port_open_type(const struct dpif_class *class, const char *type)
 {
 return strcmp(type, "internal") ? type
-  : dpif_netdev_class_is_dummy(class) ? "dummy"
+  : dpif_netdev_class_is_dummy(class) ? "dummy-internal"
   : "tap";
 }
 
diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index 2a6aa56..92af15f 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -622,12 +622,15 @@ dummy_netdev_get_conn_state(struct dummy_packet_conn *conn)
 }
 
 static void
-netdev_dummy_run(const struct netdev_class *netdev_class OVS_UNUSED)
+netdev_dummy_run(const struct netdev_class *netdev_class)
 {
 struct netdev_dummy *dev;
 
 ovs_mutex_lock(&dummy_list_mutex);
 LIST_FOR_EACH (dev, list_node, &dummy_list) {
+if (netdev_get_class(&dev->up) != netdev_class) {
+continue;
+}
 ovs_mutex_lock(&dev->mutex);
 dummy_packet_conn_run(dev);
 ovs_mutex_unlock(&dev->mutex);
@@ -636,12 +639,15 @@ netdev_dummy_run(const struct netdev_class *netdev_class OVS_UNUSED)
 }
 
 static void
-netdev_dummy_wait(const struct netdev_class *netdev_class OVS_UNUSED)
+netdev_dummy_wait(const struct netdev_class *netdev_class)
 {
 struct netdev_dummy *dev;
 
 ovs_mutex_lock(&dummy_list_mutex);
 LIST_FOR_EACH (dev, list_node, &dummy_list) {
+if (netdev_get_class(&dev->up) != netdev_class) {
+continue;
+}
 ovs_mutex_lock(&dev->mutex);
 dummy_packet_conn_wait(&dev->conn);
 ovs_mutex_unlock(&dev->mutex);
@@ -1380,6 +1386,9 @@ netdev_dummy_update_flags(struct netdev *netdev_,
 static const struct netdev_class dummy_class =
 NETDEV_DUMMY_CLASS("dummy", false, NULL);
 
+static const struct netdev_class dummy_internal_class =
+NETDEV_DUMMY_CLASS("dummy-internal", false, NULL);
+
 static const struct netdev_class dummy_pmd_class =
 NETDEV_DUMMY_CLASS("dummy-pmd", true,
netdev_dummy_reconfigure);
@@ -1751,6 +1760,7 @@ netdev_dummy_register(enum dummy_level level)
 netdev_dummy_override("system");
 }
 netdev_register_provider(&dummy_class);
+netdev_register_provider(&dummy_internal_class);
 netdev_register_provider(&dummy_pmd_class);
 
 netdev_vport_tunnel_register();
diff --git a/tests/bridge.at b/tests/bridge.at
index 37c55ba..3dbabe5 100644
--- a/tests/bridge.at
+++ b/tests/bridge.at
@@ -12,7 +12,7 @@ add_of_ports br0 1 2
 AT_CHECK([ovs-appctl dpif/show], [0], [dnl
 dummy@ovs-dummy: hit:0 missed:0
br0:
-   br0 65534/100: (dummy)
+   br0 65534/100: (dummy-internal)
p1 1/1: (dummy)
p2 2/2: (dummy)
 ])
@@ -23,7 +23,7 @@ AT_CHECK([ovs-appctl dpctl/del-if dummy@ovs-dummy p1])
 AT_CHECK([ovs-appctl dpif/show], [0], [dnl
 dummy@ovs-dummy: hit:0 missed:0
br0:
-   br0 65534/100: (dummy)
+   br0 65534/100: (dummy-internal)
p2 2/2: (dummy)
 ])
 
@@ -32,7 +32,7 @@ AT_CHECK([ovs-vsctl del-port p2])
 AT_CHECK([ovs-appctl dpif/show], [0], [dnl
 dummy@ovs-dummy: hit:0 missed:0
br0:
-   br0 65534/100: (dummy)
+   br0 65534/100: (dummy-internal)
p1 1/1: (dummy)
 ])
 OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
diff --git a/tests/dpctl.at b/tests/dpctl.at
index b6d5dd6..8c761c8 100644
--- a/tests/dpctl.at
+++ b/tests/dpctl.at
@@ -23,14 +23,14 @@ AT_CHECK([ovs-appctl dpctl/show dummy@br0], [0], [dnl
 dummy@br0:
lookups: hit:0 

[ovs-dev] [PATCH V6 2/7] vswitchd: Introduce 'mtu_request' column in Interface.

2016-08-09 Thread Mark Kavanagh
From: Daniele Di Proietto 

The 'mtu_request' column can be used to set the MTU of a specific
interface.

This column is useful because it will allow changing the MTU of DPDK
devices (implemented in a future commit), which are not accessible
outside the ovs-vswitchd process, but it can be used for kernel
interfaces as well.

The current implementation of set_mtu() in netdev-dpdk is removed
because it's broken.  It will be reintroduced by a subsequent commit on
this series.

Signed-off-by: Daniele Di Proietto 
---
 NEWS   |  2 ++
 lib/netdev-dpdk.c  | 53 +-
 vswitchd/bridge.c  |  9 
 vswitchd/vswitch.ovsschema | 10 +++--
 vswitchd/vswitch.xml   | 52 +
 5 files changed, 58 insertions(+), 68 deletions(-)

diff --git a/NEWS b/NEWS
index c2ed71d..ce10982 100644
--- a/NEWS
+++ b/NEWS
@@ -101,6 +101,8 @@ Post-v2.5.0
- ovs-pki: Changed message digest algorithm from SHA-1 to SHA-512 because
  SHA-1 is no longer secure and some operating systems have started to
  disable it in OpenSSL.
+   - Add 'mtu_request' column to the Interface table. It can be used to
+ configure the MTU of non-internal ports.
 
 
 v2.5.0 - 26 Feb 2016
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index f37ec1c..60db568 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1639,57 +1639,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup)
 }
 
 static int
-netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
-{
-struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-int old_mtu, err, dpdk_mtu;
-struct dpdk_mp *old_mp;
-struct dpdk_mp *mp;
-uint32_t buf_size;
-
-ovs_mutex_lock(&dpdk_mutex);
-ovs_mutex_lock(&dev->mutex);
-if (dev->mtu == mtu) {
-err = 0;
-goto out;
-}
-
-buf_size = dpdk_buf_size(mtu);
-dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
-
-mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
-if (!mp) {
-err = ENOMEM;
-goto out;
-}
-
-rte_eth_dev_stop(dev->port_id);
-
-old_mtu = dev->mtu;
-old_mp = dev->dpdk_mp;
-dev->dpdk_mp = mp;
-dev->mtu = mtu;
-dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
-
-err = dpdk_eth_dev_init(dev);
-if (err) {
-dpdk_mp_put(mp);
-dev->mtu = old_mtu;
-dev->dpdk_mp = old_mp;
-dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
-dpdk_eth_dev_init(dev);
-goto out;
-}
-
-dpdk_mp_put(old_mp);
-netdev_change_seq_changed(netdev);
-out:
-ovs_mutex_unlock(&dev->mutex);
-ovs_mutex_unlock(&dpdk_mutex);
-return err;
-}
-
-static int
 netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier);
 
 static int
@@ -2964,7 +2913,7 @@ netdev_dpdk_vhost_cuse_reconfigure(struct netdev *netdev)
 netdev_dpdk_set_etheraddr,\
 netdev_dpdk_get_etheraddr,\
 netdev_dpdk_get_mtu,  \
-netdev_dpdk_set_mtu,  \
+NULL,   /* set_mtu */ \
 netdev_dpdk_get_ifindex,  \
 GET_CARRIER,  \
 netdev_dpdk_get_carrier_resets,   \
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index ddf1fe5..397be70 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -775,6 +775,15 @@ bridge_delete_or_reconfigure_ports(struct bridge *br)
 goto delete;
 }
 
+if (iface->cfg->n_mtu_request == 1
+&& strcmp(iface->type,
+  ofproto_port_open_type(br->type, "internal"))) {
+/* Try to set the MTU to the requested value.  This is not done
+ * for internal interfaces, since their MTU is decided by the
+ * ofproto module, based on other ports in the bridge. */
+netdev_set_mtu(iface->netdev, *iface->cfg->mtu_request);
+}
+
 /* If the requested OpenFlow port for 'iface' changed, and it's not
  * already the correct port, then we might want to temporarily delete
  * this interface, so we can add it back again with the new OpenFlow
diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
index 32fdf28..8966803 100644
--- a/vswitchd/vswitch.ovsschema
+++ b/vswitchd/vswitch.ovsschema
@@ -1,6 +1,6 @@
 {"name": "Open_vSwitch",
- "version": "7.13.0",
- "cksum": "889248633 22774",
+ "version": "7.14.0",
+ "cksum": "3974332717 22936",
  "tables": {
"Open_vSwitch": {
  "columns": {
@@ -321,6 +321,12 @@
"mtu": {
  "type": {"key": "integer", "min": 0, "max": 1},
  "ephemeral": true},
+   "mtu_request": {
+ "type": {
+   "key": {"type": "integer",
+   "minInteger": 1},
+   

[ovs-dev] [PATCH V6 3/7] netdev: Pass 'netdev_class' to ->run() and ->wait().

2016-08-09 Thread Mark Kavanagh
From: Daniele Di Proietto 

This will allow run() and wait() methods to be shared between different
classes and still perform class-specific work.
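
As a rough illustration only (not code from this series), a shared run()
callback could branch on the class it is handed. The struct below is a
cut-down stand-in for 'struct netdev_class', and the "foo"/"bar" classes,
the counters and run_class() are hypothetical names made up for this sketch:

```c
#include <string.h>

/* Simplified stand-in for 'struct netdev_class'; the real structure has many
 * more members. */
struct netdev_class {
    const char *type;
    void (*run)(const struct netdev_class *netdev_class);
};

/* Hypothetical counters, here only to make the effect observable. */
static int n_foo_runs;
static int n_bar_runs;

/* One run() implementation shared by two classes.  The class pointer lets it
 * perform class-specific work. */
static void
shared_run(const struct netdev_class *netdev_class)
{
    if (!strcmp(netdev_class->type, "foo")) {
        n_foo_runs++;
    } else if (!strcmp(netdev_class->type, "bar")) {
        n_bar_runs++;
    }
}

static const struct netdev_class foo_class = { "foo", shared_run };
static const struct netdev_class bar_class = { "bar", shared_run };

/* What a caller would do with the new signature: hand the class itself to
 * its run() callback, if it has one. */
static void
run_class(const struct netdev_class *nc)
{
    if (nc->run) {
        nc->run(nc);
    }
}
```

Calling run_class(&foo_class) and run_class(&bar_class) goes through the
same function but updates different counters.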

Signed-off-by: Daniele Di Proietto 
---
 lib/netdev-bsd.c  |  6 +++---
 lib/netdev-dummy.c|  4 ++--
 lib/netdev-linux.c|  6 +++---
 lib/netdev-provider.h | 14 ++
 lib/netdev-vport.c|  4 ++--
 lib/netdev.c  |  4 ++--
 6 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/lib/netdev-bsd.c b/lib/netdev-bsd.c
index 2bba0ed..75a330b 100644
--- a/lib/netdev-bsd.c
+++ b/lib/netdev-bsd.c
@@ -146,7 +146,7 @@ static void ifr_set_flags(struct ifreq *, int flags);
 static int af_link_ioctl(unsigned long command, const void *arg);
 #endif
 
-static void netdev_bsd_run(void);
+static void netdev_bsd_run(const struct netdev_class *);
 static int netdev_bsd_get_mtu(const struct netdev *netdev_, int *mtup);
 
 static bool
@@ -180,7 +180,7 @@ netdev_get_kernel_name(const struct netdev *netdev)
  * interface status changes, and eventually calls all the user callbacks.
  */
 static void
-netdev_bsd_run(void)
+netdev_bsd_run(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 rtbsd_notifier_run();
 }
@@ -190,7 +190,7 @@ netdev_bsd_run(void)
  * be called.
  */
 static void
-netdev_bsd_wait(void)
+netdev_bsd_wait(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 rtbsd_notifier_wait();
 }
diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index a950409..2a6aa56 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -622,7 +622,7 @@ dummy_netdev_get_conn_state(struct dummy_packet_conn *conn)
 }
 
 static void
-netdev_dummy_run(void)
+netdev_dummy_run(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 struct netdev_dummy *dev;
 
@@ -636,7 +636,7 @@ netdev_dummy_run(void)
 }
 
 static void
-netdev_dummy_wait(void)
+netdev_dummy_wait(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 struct netdev_dummy *dev;
 
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index fa37bcf..1b5f7c1 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -526,7 +526,7 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
  * changes in the device miimon status, so we can use atomic_count. */
 static atomic_count miimon_cnt = ATOMIC_COUNT_INIT(0);
 
-static void netdev_linux_run(void);
+static void netdev_linux_run(const struct netdev_class *);
 
 static int netdev_linux_do_ethtool(const char *name, struct ethtool_cmd *,
int cmd, const char *cmd_name);
@@ -623,7 +623,7 @@ netdev_linux_miimon_enabled(void)
 }
 
 static void
-netdev_linux_run(void)
+netdev_linux_run(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 struct nl_sock *sock;
 int error;
@@ -697,7 +697,7 @@ netdev_linux_run(void)
 }
 
 static void
-netdev_linux_wait(void)
+netdev_linux_wait(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 struct nl_sock *sock;
 
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index ae390cb..5bcfeba 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -236,15 +236,21 @@ struct netdev_class {
 int (*init)(void);
 
 /* Performs periodic work needed by netdevs of this class.  May be null if
- * no periodic work is necessary. */
-void (*run)(void);
+ * no periodic work is necessary.
+ *
+ * 'netdev_class' points to the class.  It is useful in case the same
+ * function is used to implement different classes. */
+void (*run)(const struct netdev_class *netdev_class);
 
 /* Arranges for poll_block() to wake up if the "run" member function needs
  * to be called.  Implementations are additionally required to wake
  * whenever something changes in any of its netdevs which would cause their
  * ->change_seq() function to change its result.  May be null if nothing is
- * needed here. */
-void (*wait)(void);
+ * needed here.
+ *
+ * 'netdev_class' points to the class.  It is useful in case the same
+ * function is used to implement different classes. */
+void (*wait)(const struct netdev_class *netdev_class);
 
 /* ##  ## */
 /* ## netdev Functions ## */
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 87a30f8..7eabd2c 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -321,7 +321,7 @@ netdev_vport_update_flags(struct netdev *netdev OVS_UNUSED,
 }
 
 static void
-netdev_vport_run(void)
+netdev_vport_run(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 uint64_t seq;
 
@@ -334,7 +334,7 @@ netdev_vport_run(void)
 }
 
 static void
-netdev_vport_wait(void)
+netdev_vport_wait(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 uint64_t seq;
 
diff --git a/lib/netdev.c b/lib/netdev.c
index 75bf1cb..589d37c 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -160,7 +160,7 @@ netdev_run(void)
 struct netdev_registered_class *rc;
 CMAP_FOR_EACH (rc, cmap_node, 

Re: [ovs-dev] [PATCH v2] dpif-netdev: dpcls per in_port with sorted subtables

2016-08-09 Thread Jan Scheurich
I just submitted a v3 version of the patch. No need to review this one.

Jan

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Jan Scheurich
> Sent: Friday, 15 July, 2016 18:35
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH v2] dpif-netdev: dpcls per in_port with sorted
> subtables
> 
> This turns the previous RFC PATCH dpif-netdev: dpcls per in_port with sorted
> subtables into a non-RFC patch v2.
> 
> The user-space datapath (dpif-netdev) consists of a first level "exact match
> cache" (EMC) matching on 5-tuples and the normal megaflow classifier. With
> many parallel packet flows (e.g. TCP connections) the EMC becomes inefficient
> and the OVS forwarding performance is determined by the megaflow classifier.
> 
> The megaflow classifier (dpcls) consists of a variable number of hash tables
> (aka subtables), each containing megaflow entries with the same mask of
> packet header and metadata fields to match upon. A dpcls lookup matches a
> given packet against all subtables in sequence until it hits a match. As
> megaflow cache entries are by construction non-overlapping, the first match is
> the only match.
> 
> Today the order of the subtables in the dpcls is essentially random so that on
> average a dpcls lookup has to visit N/2 subtables for a hit, when N is the
> total number of subtables. Even though every single hash-table lookup is fast,
> the performance of the current dpcls degrades when there are many subtables.
> 
> How does the patch address this issue:
> 
> In reality there is often a strong correlation between the ingress port and a
> small subset of subtables that have hits. The entire megaflow cache typically
> decomposes nicely into partitions that are hit only by packets entering from a
> range of similar ports (e.g. traffic from Phy  -> VM vs. traffic from VM -> 
> Phy).
> 
> Therefore, maintaining a separate dpcls instance per ingress port with its
> subtable vector sorted by frequency of hits reduces the average number of
> subtables lookups in the dpcls to a minimum, even if the total number of
> subtables gets large. This is possible because megaflows always have an exact
> match on in_port, so every megaflow belongs to unique dpcls instance.
> 
> For thread safety, the PMD thread needs to block out revalidators during the
> periodic optimization. We use ovs_mutex_trylock() to avoid blocking the PMD.
> 
> To monitor the effectiveness of the patch we have enhanced the ovs-appctl
> dpif-netdev/pmd-stats-show command with an extra line "avg. subtable lookups
> per hit" to report the average number of subtable lookups needed for a
> megaflow match. Ideally, this should be close to 1 and in almost all cases much
> smaller than N/2.
> 
> I have benchmarked a cloud L3 overlay pipeline with a VXLAN overlay mesh.
> With pure L3 tenant traffic between VMs on different nodes the resulting
> netdev dpcls contains N=4 subtables.
> 
> Disabling the EMC, I have measured a baseline performance (in+out) of ~1.32
> Mpps (64 bytes, 1000 L4 flows). The average number of subtable lookups per
> dpcls match is 2.5.
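
A small sketch of where the 2.5 figure can come from, assuming the matching
subtable is equally likely to sit at any position of an unsorted vector (my
assumption, not something measured here). Under that assumption the expected
number of subtables visited per hit is (N + 1) / 2, which is 2.5 for N=4.
The function names are hypothetical:

```c
#include <stddef.h>

/* Expected number of subtables visited per hit when the matching subtable is
 * equally likely to be at any of the 'n' positions:
 * (1 + 2 + ... + n) / n, which equals (n + 1) / 2. */
static double
avg_lookups_unsorted(int n)
{
    int sum = 0;

    for (int pos = 1; pos <= n; pos++) {
        sum += pos;
    }
    return (double) sum / n;
}

/* General form: 'hit_prob[i]' is the probability that a hit lands in the
 * subtable at position i (0-based), so the lookup visits i + 1 subtables. */
static double
avg_lookups(const double *hit_prob, size_t n)
{
    double avg = 0.0;

    for (size_t i = 0; i < n; i++) {
        avg += (double) (i + 1) * hit_prob[i];
    }
    return avg;
}
```

With all hits concentrated in the first subtable (hit_prob = {1, 0, 0, 0}),
avg_lookups() gives 1, which is the ideal case a sorted per-port vector aims
for.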
> 
> With the patch the average number of subtable lookups per dpcls match is
> reduced to 1 and the forwarding performance grows by ~30% to 1.72 Mpps.
> 
> As the actual number of subtables will often be higher in reality, we can
> assume that this is at the lower end of the speed-up one can expect from this
> optimization. Just running a parallel ping between the VXLAN tunnel endpoints
> increases the number of subtables and hence the average number of subtable
> lookups from 2.5 to 3.5 with a corresponding decrease of throughput to 1.14
> Mpps. With the patch the parallel ping has no impact on average number of
> subtable lookups and performance. The performance gain is then ~50%.
> 
> The main change to the previous patch is that instead of having a subtable
> vector per in_port in a single dpcls instance, we now have one dpcls instance
> with a single subtable per ingress port. This is better aligned with the 
> design
> base code and also improves the number of subtable lookups in a miss case.
> 
> The PMD tests have been adjusted to the additional line in pmd-stats-show.
> 
> Signed-off-by: Jan Scheurich 
> 
> 
> Changes in v2:
> - Rebased to master (commit 3041e1fc9638)
> - Take the pmd->flow_mutex during optimization to block out revalidators
>   Use trylock in order to not block the PMD thread
> - Made in_port an explicit input parameter to fast_path_processing()
> - Fixed coding style issues
> 
> 


Re: [ovs-dev] [PATCH V5 7/7] netdev-dpdk: add support for Jumbo Frames

2016-08-09 Thread Kavanagh, Mark B
>Few minor comments inline.

Thanks Ilya - responses inline.

>
>On 09.08.2016 15:03, Mark Kavanagh wrote:
>> Add support for Jumbo Frames to DPDK-enabled port types,
>> using single-segment-mbufs.
>>
>> Using this approach, the amount of memory allocated to each mbuf
>> to store frame data is increased to a value greater than 1518B
>> (typical Ethernet maximum frame length). The increased space
>> available in the mbuf means that an entire Jumbo Frame of a specific
>> size can be carried in a single mbuf, as opposed to partitioning
>> it across multiple mbuf segments.
>>
>> The amount of space allocated to each mbuf to hold frame data is
>> defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
>> parameter.
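
For reference, a minimal sketch of how an MTU relates to the frame lengths
quoted above, assuming an untagged Ethernet frame (14-byte header plus 4-byte
CRC). The helper names are made up for this example and are not the macros
used in lib/netdev-dpdk.c:

```c
#define TOY_ETHER_HDR_LEN 14  /* Destination MAC, source MAC, EtherType. */
#define TOY_ETHER_CRC_LEN 4   /* Frame check sequence. */

/* Frame length needed to carry an IP packet of 'mtu' bytes, assuming no
 * VLAN tag. */
static unsigned int
toy_mtu_to_frame_len(unsigned int mtu)
{
    return mtu + TOY_ETHER_HDR_LEN + TOY_ETHER_CRC_LEN;
}

/* Largest MTU that fits in a frame of 'frame_len' bytes, same assumption. */
static unsigned int
toy_frame_len_to_mtu(unsigned int frame_len)
{
    return frame_len - TOY_ETHER_HDR_LEN - TOY_ETHER_CRC_LEN;
}
```

Under these assumptions the default MTU of 1500 gives the 1518B frame
mentioned in the commit message, and mtu_request=9000 needs room for a 9018B
frame in each mbuf.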
>>
>> Signed-off-by: Mark Kavanagh 
>> [diproiet...@vmware.com rebased]
>> Signed-off-by: Daniele Di Proietto 
>> ---
>>
>> v5:
>> - rename dpdk_mp_configure to netdev_dpdk_mempool_configure
>> - consolidate socket_id and mtu changes within
>>   netdev_dpdk_mempool_configure
>> - add lower bounds check for user-supplied MTU
>> - add socket_id and mtu fields to mempool configure error report
>> - minor cosmetic changes
>>
>> v4:
>> - restore error reporting in *_reconfigure functions (for
>>   non-mtu-configuration based errors)
>> - remove 'goto' in the event of dpdk_mp_configure failure
>> - remove superfluous error variables
>>
>>  v3:
>> - replace netdev_dpdk.last_mtu with local variable
>> - add comment for dpdk_mp_configure
>>
>>  v2:
>>  - rebase to HEAD of master
>>  - fall back to previous 'good' MTU if reconfigure fails
>>  - introduce new field 'last_mtu' in struct netdev_dpdk to facilitate
>>fall-back
>>  - rename 'mtu_request' to 'requested_mtu' in struct netdev_dpdk
>>  - remove rebasing artifact in INSTALL.DPDK-Advanced.md
>>  - remove superfluous variable in dpdk_mp_configure
>>  - fix minor coding style infraction
>>
>>  INSTALL.DPDK-ADVANCED.md |  58 ++-
>>  INSTALL.DPDK.md  |   1 -
>>  NEWS |   1 +
>>  lib/netdev-dpdk.c| 145 
>> +++
>>  4 files changed, 176 insertions(+), 29 deletions(-)
>>
>> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
>> index 0ab43d4..5e758ce 100755
>> --- a/INSTALL.DPDK-ADVANCED.md
>> +++ b/INSTALL.DPDK-ADVANCED.md
>> @@ -1,5 +1,5 @@
>>  OVS DPDK ADVANCED INSTALL GUIDE
>> -=
>> +===
>>
>>  ## Contents
>>
>> @@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
>>  7. [QOS](#qos)
>>  8. [Rate Limiting](#rl)
>>  9. [Flow Control](#fc)
>> -10. [Vsperf](#vsperf)
>> +10. [Jumbo Frames](#jumbo)
>> +11. [Vsperf](#vsperf)
>>
>>  ##  1. Overview
>>
>> @@ -862,7 +863,58 @@ respective parameter. To disable the flow control at tx 
>> side,
>>
>>  `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
>>
>> -##  10. Vsperf
>> +##  10. Jumbo Frames
>> +
>> +By default, DPDK ports are configured with standard Ethernet MTU (1500B). To
>> +enable Jumbo Frames support for a DPDK port, change the Interface's 
>> `mtu_request`
>> +attribute to a sufficiently large value.
>> +
>> +e.g. Add a DPDK Phy port with MTU of 9000:
>> +
>> +`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set Interface dpdk0 mtu_request=9000`
>> +
>> +e.g. Change the MTU of an existing port to 6200:
>> +
>> +`ovs-vsctl set Interface dpdk0 mtu_request=6200`
>> +
>> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are
>> +increased, such that a full Jumbo Frame of a specific size may be 
>> accommodated
>> +within a single mbuf segment.
>> +
>> +Jumbo frame support has been validated against 9728B frames (largest frame 
>> size
>> +supported by Fortville NIC), using the DPDK `i40e` driver, but larger frames
>> +(particularly in use cases involving East-West traffic only), and other 
>> DPDK NIC
>> +drivers may be supported.
>> +
>> +### 9.1 vHost Ports and Jumbo Frames
>> +
>> +Some additional configuration is needed to take advantage of jumbo frames 
>> with
>> +vhost ports:
>> +
>> +1. `mergeable buffers` must be enabled for vHost ports, as demonstrated 
>> in
>> +the QEMU command line snippet below:
>> +
>> +```
>> +'-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
>> +'-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
>> +```
>> +
>> +2. Where virtio devices are bound to the Linux kernel driver in a guest
>> +   environment (i.e. interfaces are not bound to an in-guest DPDK 
>> driver),
>> +   the MTU of those logical network interfaces must also be increased 
>> to a
>> +   sufficiently large value. This avoids segmentation of Jumbo Frames
>> +   received in the guest. Note that 'MTU' refers to the length of the IP
>> +   packet only, 

Re: [ovs-dev] [PATCH v3] dpif-netdev: dpcls per in_port with sorted subtables

2016-08-09 Thread Jan Scheurich
This change improves the performance of the DPDK netdev datapath in real-world 
NFV scenarios by 30% or more.

Given that the patch has been reviewed and conceptually agreed upon earlier 
(e.g. http://openvswitch.org/pipermail/dev/2016-July/074455.html) it would be 
great if it could still make it into OVS 2.6.

Thanks, Jan

> -Original Message-
> From: Jan Scheurich [mailto:jan.scheur...@web.de]
> Sent: Tuesday, 09 August, 2016 16:08
> To: dev@openvswitch.org
> Cc: jan.scheur...@web.de
> Subject: [PATCH v3] dpif-netdev: dpcls per in_port with sorted subtables
> 
> The user-space datapath (dpif-netdev) consists of a first level "exact match
> cache" (EMC) matching on 5-tuples and the normal megaflow classifier. With
> many parallel packet flows (e.g. TCP connections) the EMC becomes inefficient
> and the OVS forwarding performance is determined by the megaflow classifier.
> 
> The megaflow classifier (dpcls) consists of a variable number of hash tables
> (aka subtables), each containing megaflow entries with the same mask of
> packet header and metadata fields to match upon. A dpcls lookup matches a
> given packet against all subtables in sequence until it hits a match. As
> megaflow cache entries are by construction non-overlapping, the first match is
> the only match.
> 
> Today the order of the subtables in the dpcls is essentially random so that on
> average a dpcls lookup has to visit N/2 subtables for a hit, when N is the 
> total
> number of subtables. Even though every single hash-table lookup is fast, the
> performance of the current dpcls degrades when there are many subtables.
> 
> How does the patch address this issue:
> 
> In reality there is often a strong correlation between the ingress port and a
> small subset of subtables that have hits. The entire megaflow cache typically
> decomposes nicely into partitions that are hit only by packets entering from a
> range of similar ports (e.g. traffic from Phy  -> VM vs. traffic from VM -> 
> Phy).
> 
> Therefore, maintaining a separate dpcls instance per ingress port with its
> subtable vector sorted by frequency of hits reduces the average number of
> subtable lookups in the dpcls to a minimum, even if the total number of
> subtables gets large. This is possible because megaflows always have an exact
> match on in_port, so every megaflow belongs to a unique dpcls instance.
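
To make the idea concrete, here is a toy model of one per-port subtable
vector that is periodically re-sorted by hit count. It is only a sketch:
the real dpcls uses hash-table subtables and a concurrent vector, while this
uses a plain array and qsort(), and all 'toy_' names are hypothetical.

```c
#include <stdlib.h>

/* Toy subtable: an id standing in for a mask, plus a hit counter. */
struct toy_subtable {
    int id;
    unsigned int hits;
};

/* qsort() comparator that puts subtables with more hits first. */
static int
compare_hits_desc(const void *a_, const void *b_)
{
    const struct toy_subtable *a = a_;
    const struct toy_subtable *b = b_;

    return (a->hits < b->hits) - (a->hits > b->hits);
}

/* Periodic optimization step: re-sort the vector by hit frequency. */
static void
toy_sort_subtables(struct toy_subtable *tables, size_t n)
{
    qsort(tables, n, sizeof *tables, compare_hits_desc);
}

/* Visits subtables in vector order until the one identified by 'match_id'
 * is found.  Returns the number of subtables visited, or 0 on a miss. */
static size_t
toy_lookup(struct toy_subtable *tables, size_t n, int match_id)
{
    for (size_t i = 0; i < n; i++) {
        if (tables[i].id == match_id) {
            tables[i].hits++;
            return i + 1;
        }
    }
    return 0;
}
```

If all traffic on a port hits the subtable that starts out last, each lookup
costs n visits until the next sort, and one visit afterwards.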
> 
> For thread safety, the PMD thread needs to block out revalidators during the
> periodic optimization. We use ovs_mutex_trylock() to avoid blocking the PMD.
> 
> To monitor the effectiveness of the patch we have enhanced the ovs-appctl
> dpif-netdev/pmd-stats-show command with an extra line "avg. subtable lookups
> per hit" to report the average number of subtable lookups needed for a
> megaflow match. Ideally, this should be close to 1 and in almost all cases much
> smaller than N/2.
> 
> The PMD tests have been adjusted to the additional line in pmd-stats-show.
> 
> We have benchmarked a L3-VPN pipeline on top of a VXLAN overlay mesh.
> With pure L3 tenant traffic between VMs on different nodes the resulting
> netdev dpcls contains N=4 subtables. Each packet traversing the OVS datapath
> is subject to dpcls lookup twice due to the tunnel termination.
> 
> Disabling the EMC, we have measured a baseline performance (in+out) of
> ~1.45 Mpps (64 bytes, 10K L4 packet flows). The average number of subtable
> lookups per dpcls match is 2.5. With the patch the average number of subtable
> lookups per dpcls match is reduced to 1 and the forwarding performance grows
> by ~50% to 2.13 Mpps.
> 
> Even with EMC enabled, the patch improves the performance by 9% (for 1000 L4
> flows) and 34% (for 50K+ L4 flows).
> 
> As the actual number of subtables will often be higher in reality, we can
> assume that this is at the lower end of the speed-up one can expect from this
> optimization. Just running a parallel ping between the VXLAN tunnel endpoints
> increases the number of subtables and hence the average number of subtable
> lookups from 2.5 to 3.5 on master with a corresponding decrease of throughput
> to 1.2 Mpps. With the patch the parallel ping has no impact on average number
> of subtable lookups and performance. The performance gain is then ~75%.
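
The percentages above follow from the quoted throughput figures. A trivial
helper (hypothetical name) that reproduces the arithmetic:

```c
/* Relative throughput gain in percent when going from 'baseline_mpps' to
 * 'patched_mpps'. */
static double
percent_gain(double baseline_mpps, double patched_mpps)
{
    return (patched_mpps - baseline_mpps) / baseline_mpps * 100.0;
}
```

percent_gain(1.45, 2.13) is about 47, rounded to ~50% above, and
percent_gain(1.2, 2.13) is 77.5, rounded to ~75%.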
> 
> 
> Signed-off-by: Jan Scheurich 
> 
> Changes in v3:
> - Rebased to master (commit 6ef5fa92eb70)
> - Updated performance benchmark figures
> - Adapted to renamed cpvector API
>   Reverted dpcls to using cpvector due to threading issue during flow removal
> - Implemented v2 comments by Antonio Fischetti
> 
> Changes in v2:
> - Rebased to master (commit 3041e1fc9638)
> - Take the pmd->flow_mutex during optimization to block out revalidators
>   Use trylock in order to not block the PMD thread
> - Made in_port an explicit input parameter to fast_path_processing()
> - Fixed coding style issues
> 
> 
> ---
> 
>  lib/dpif-netdev.c | 215
> 
> 

Re: [ovs-dev] [RFC PATCH v2 1/5] Add NSH fields for Openvswitch flow key

2016-08-09 Thread Simon Horman
On Wed, Jul 13, 2016 at 01:36:14AM +0800, Johnson Li wrote:
> Openvswitch could use the fields of the Network Service Header (NSH)
> as a key to steer traffic to the Virtual Network Functions (VNF).
> The key will contain fields for the NSH base header, service path
> header and context header for MD type 1. MD type 2 will
> reuse the field definition tun_opts.
> 
> Signed-off-by: Johnson Li 
> 
> diff --git a/datapath/flow.c b/datapath/flow.c
> index c97c9c9..fd09cec 100644
> --- a/datapath/flow.c
> +++ b/datapath/flow.c
> @@ -489,6 +489,9 @@ static int key_extract(struct sk_buff *skb, struct 
> sw_flow_key *key)
>   skb_reset_mac_len(skb);
>   __skb_push(skb, skb->data - skb_mac_header(skb));
>  
> + /* Network Service Header */
> + memset(&key->nsh, 0, sizeof(key->nsh));

This seems to be an expensive per-packet cost.

I wonder if clearing the nsh key could be avoided in the general case.
Or if an abbreviated mechanism could be used - e.g. setting the nsp field
to 0.

> +
>   /* Network layer. */
>   if (key->eth.type == htons(ETH_P_IP)) {
>   struct iphdr *nh;
> diff --git a/datapath/flow.h b/datapath/flow.h
> index c0b628a..6ac96c3 100644
> --- a/datapath/flow.h
> +++ b/datapath/flow.h
> @@ -54,10 +54,25 @@ struct ovs_tunnel_info {
>   (offsetof(struct sw_flow_key, recirc_id) +  \
>   FIELD_SIZEOF(struct sw_flow_key, recirc_id))
>  
> +/* Network Service Header.
> + */
> +struct ovs_nsh_key {
> + u8  flags;
> + u8  md_type;/* NSH metadata type */
> + u8  next_proto; /* NSH next protocol */
> + u8  nsi;/* NSH index */
> + u32 nsp;/* NSH path id */

My understanding is that the nsp is only 24-bits wide.
Would it make sense for the nsp and nsi to share a single u32?
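
Something along these lines is what I have in mind. This is only a sketch in
host byte order with made-up helper names, assuming the path id occupies the
upper 24 bits and the index the lower 8 bits, as in the NSH service path
header:

```c
#include <stdint.h>

#define TOY_NSH_NSP_SHIFT 8
#define TOY_NSH_NSP_MASK  0xffffffu  /* 24-bit service path id. */
#define TOY_NSH_NSI_MASK  0xffu      /* 8-bit service index. */

/* Packs a 24-bit path id and an 8-bit index into one 32-bit word. */
static inline uint32_t
toy_nsh_path_pack(uint32_t nsp, uint8_t nsi)
{
    return ((nsp & TOY_NSH_NSP_MASK) << TOY_NSH_NSP_SHIFT) | nsi;
}

/* Extracts the 24-bit path id from the packed word. */
static inline uint32_t
toy_nsh_path_nsp(uint32_t path)
{
    return path >> TOY_NSH_NSP_SHIFT;
}

/* Extracts the 8-bit index from the packed word. */
static inline uint8_t
toy_nsh_path_nsi(uint32_t path)
{
    return path & TOY_NSH_NSI_MASK;
}
```

That would shrink the key by four bytes and allow both fields to be compared
or cleared with a single 32-bit operation.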

[...]


Re: [ovs-dev] [RFC PATCH v2 06/13] Parse NSH header in function flow_extract in user space

2016-08-09 Thread Simon Horman
On Wed, Jul 13, 2016 at 01:27:53AM +0800, Johnson Li wrote:
> Signed-off-by: Johnson Li 
> 
> diff --git a/lib/flow.c b/lib/flow.c
> index a4c1215..cdeccfc 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -439,6 +439,46 @@ invalid:
>  arp_buf[1] = eth_addr_zero;
>  }
>  
> +static int
> +parse_nsh(const void **datap, size_t *sizep,
> +  struct flow_nsh *key, struct tun_metadata *md OVS_UNUSED)
> +{
> +const struct nsh_header *nsh = (const struct nsh_header *) *datap;
> +uint16_t length = 0;
> +
> +memset(key, 0, sizeof(struct flow_nsh));
> +
> +length = nsh->base.length << 2;
> +if (length > NSH_LEN_MAX)
> +return -EINVAL;
> +
> +key->flags = nsh->base.flags;
> +key->md_type = nsh->base.md_type;
> +key->next_proto = nsh->base.next_proto;
> +key->nsi = nsh->base.nsi;
> +key->nsp = nsh->base.sfp << 8;
> +
> +if (nsh->base.md_type == NSH_MD_TYPE1) {
> +const struct nsh_md1_ctx *ctx = (struct nsh_md1_ctx *)(nsh + 1);
> +key->nshc1 = ctx->nshc1;
> +key->nshc2 = ctx->nshc2;
> +key->nshc3 = ctx->nshc3;
> +key->nshc4 = ctx->nshc4;
> +#if 0
> +} else if (nsh->base.md_type == NSH_MD_TYPE2) {
> +const struct nsh_md2_ctx *ctx = (struct nsh_md2_ctx *)(nsh + 1);
> +
> +/* Prototype with TUN_METADATA APIs. */
> +tun_metadata_from_nsh_ctx((struct geneve_opt *)ctx,
> +   md, length - sizeof *nsh);
> +#endif

Please use or remove unused code.

> +}
> +
> +data_pull(datap, sizep, length);
> +
> +return 0;
> +}
> +
>  /* Initializes 'flow' members from 'packet' and 'md'
>   *
>   * Initializes 'packet' header l2 pointer to the start of the Ethernet
> @@ -563,6 +603,27 @@ miniflow_extract(struct dp_packet *packet, struct 
> miniflow *dst)
>  /* Network layer. */
>  packet->l3_ofs = (char *)data - l2;
>  
> +/* Network Service Header */
> +if (dl_type == htons(ETH_TYPE_NSH)) {
> +struct flow_nsh nsh;
> +struct tun_metadata metadata;
> +
> +if (OVS_LIKELY(!parse_nsh(&data, &size, &nsh, &metadata))) {
> +miniflow_push_words(mf, nsh, &nsh, sizeof(struct flow_nsh) /
> +sizeof(uint64_t));
> +#if 0
> +if (nsh.md_type == NSH_MD_TYPE2) {
> +/* MD type 2 is not fully implemented yet. */
> +if (metadata.present.map) {
> +miniflow_push_words(mf, tunnel.metadata, &metadata,
> +sizeof(metadata) / sizeof(uint64_t));
> +}
> +}
> +#endif

Ditto.

> +}
> +goto out;
> +}
> +
>  nw_frag = 0;
>  if (OVS_LIKELY(dl_type == htons(ETH_TYPE_IP))) {
>  const struct ip_header *nh = data;
> @@ -1293,6 +1354,18 @@ void flow_wildcards_init_for_packet(struct 
> flow_wildcards *wc,
>  WC_MASK_FIELD(wc, dp_hash);
>  WC_MASK_FIELD(wc, in_port);
>  
> +if (flow->nsh.nsp) {
> +WC_MASK_FIELD(wc, nsh.flags);
> +WC_MASK_FIELD(wc, nsh.md_type);
> +WC_MASK_FIELD(wc, nsh.next_proto);
> +WC_MASK_FIELD(wc, nsh.nsi);
> +WC_MASK_FIELD(wc, nsh.nsp);
> +WC_MASK_FIELD(wc, nsh.nshc1);
> +WC_MASK_FIELD(wc, nsh.nshc2);
> +WC_MASK_FIELD(wc, nsh.nshc3);
> +WC_MASK_FIELD(wc, nsh.nshc4);
> +}
> +

The above probably works, but as I understand things it is encoding a
TYPE 1 NSH header, so I wonder if (flow->nsh.md_type == NSH_MD_TYPE1) would
be a more appropriate condition.

>  /* actset_output wildcarded. */
>  
>  WC_MASK_FIELD(wc, dl_dst);
> @@ -1397,6 +1470,18 @@ flow_wc_map(const struct flow *flow, struct flowmap 
> *map)
>  FLOWMAP_SET(map, ct_mark);
>  FLOWMAP_SET(map, ct_label);
>  
> +if (flow->nsh.nsp) {
> +FLOWMAP_SET(map, nsh.flags);
> +FLOWMAP_SET(map, nsh.md_type);
> +FLOWMAP_SET(map, nsh.next_proto);
> +FLOWMAP_SET(map, nsh.nsi);
> +FLOWMAP_SET(map, nsh.nsp);
> +FLOWMAP_SET(map, nsh.nshc1);
> +FLOWMAP_SET(map, nsh.nshc2);
> +FLOWMAP_SET(map, nsh.nshc3);
> +FLOWMAP_SET(map, nsh.nshc4);
> +}
> +

Ditto.

[...]


Re: [ovs-dev] [RFC PATCH v2 04/13] Add APIs to set NSH keys for match fields

2016-08-09 Thread Simon Horman
On Wed, Jul 13, 2016 at 01:27:19AM +0800, Johnson Li wrote:
> Signed-off-by: Johnson Li 
> 
> diff --git a/include/openvswitch/match.h b/include/openvswitch/match.h
> index c955753..4c79da3 100644
> --- a/include/openvswitch/match.h
> +++ b/include/openvswitch/match.h
> @@ -40,6 +40,18 @@ struct match {
>  /* Initializer for a "struct match" that matches every packet. */
>  #define MATCH_CATCHALL_INITIALIZER { .flow = { .dl_type = 0 } }
>  
> +#define MATCH_SET_FIELD_MASKED(match, field, value, msk)  \
> +do {  \
> +(match)->wc.masks.field = (msk);  \
> +(match)->flow.field = (value) & (msk);\
> +} while (0)
> +
> +#define MATCH_SET_FIELD_UINT8(match, field, value)\
> +MATCH_SET_FIELD_MASKED(match, field, value, UINT8_MAX)
> +
> +#define MATCH_SET_FIELD_BE32(match, field, value) \
> +MATCH_SET_FIELD_MASKED(match, field, value, OVS_BE32_MAX)
> +
>  void match_init(struct match *,
>  const struct flow *, const struct flow_wildcards *);
>  void match_wc_init(struct match *match, const struct flow *flow);

This patch seems more generic than the changelog indicates.

And if this approach is acceptable it seems that as a follow-up
it could be used extensively in lib/match.c.
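
For example, typical setters could collapse to one-liners. The sketch below
uses toy stand-ins for 'struct match' and friends with only two fields, and
the setter names are hypothetical; the macros are the ones from the patch:

```c
#include <stdint.h>

/* Toy stand-ins for 'struct flow', 'struct flow_wildcards' and
 * 'struct match', reduced to two fields for illustration. */
struct toy_flow {
    uint8_t nw_tos;
    uint32_t nsp;
};

struct toy_wc {
    struct toy_flow masks;
};

struct toy_match {
    struct toy_flow flow;
    struct toy_wc wc;
};

#define MATCH_SET_FIELD_MASKED(match, field, value, msk)  \
    do {                                                  \
        (match)->wc.masks.field = (msk);                  \
        (match)->flow.field = (value) & (msk);            \
    } while (0)

#define MATCH_SET_FIELD_UINT8(match, field, value)        \
    MATCH_SET_FIELD_MASKED(match, field, value, UINT8_MAX)

/* Exact-match setter for an 8-bit field. */
static void
toy_match_set_nw_tos(struct toy_match *m, uint8_t tos)
{
    MATCH_SET_FIELD_UINT8(m, nw_tos, tos);
}

/* Masked setter: bits outside 'mask' are cleared in the flow value. */
static void
toy_match_set_nsp_masked(struct toy_match *m, uint32_t nsp, uint32_t mask)
{
    MATCH_SET_FIELD_MASKED(m, nsp, nsp, mask);
}
```

Each setter records the mask in wc.masks and stores the value with the
wildcarded bits cleared, which is the pattern repeated throughout
lib/match.c.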


Re: [ovs-dev] [RFC PATCH v2 3/5] parse NSH key in key_extract of openvswitch

2016-08-09 Thread Simon Horman
On Wed, Jul 13, 2016 at 01:36:30AM +0800, Johnson Li wrote:
> Parse the Network Service Header to fill in the fields of
> struct sw_flow_key.
> 
> Signed-off-by: Johnson Li 
> 
> diff --git a/datapath/flow.c b/datapath/flow.c
> index fd09cec..debac6f 100644
> --- a/datapath/flow.c
> +++ b/datapath/flow.c
> @@ -44,6 +44,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "datapath.h"
>  #include "conntrack.h"
> @@ -296,6 +297,45 @@ static bool icmp6hdr_ok(struct sk_buff *skb)
> sizeof(struct icmp6hdr));
>  }
>  
> +static int parse_nsh(struct sk_buff *skb, struct sw_flow_key *key)
> +{
> + struct nsh_hdr *nsh_hdr = (struct nsh_hdr *)skb_mac_header(skb);

I'm a little surprised to see the NSH header be accessed via
skb_mac_header() here. I would have expected the skb_mac_header()
to point to any Ethernet header that is present.

> + uint16_t retval = -1;

I don't think there is any need to initialise retval above
as it is initialised before use below.

> + // uint16_t length = 0; /* For MD type 2 support */

Please remove or use this and other unused code.

> +
> + retval = nsh_hdr->base.length << 2;
> + if (retval > NSH_LEN_MAX)
> + return -EINVAL;

Perhaps the local retval variable could be removed, as it is only
set and then compared above.

[...]


Re: [ovs-dev] [PATCH] ovn-controller: Reset flow processing after (re)connection to switch

2016-08-09 Thread Numan Siddique
On Tue, Aug 9, 2016 at 7:15 PM, Ryan Moats  wrote:

> "dev"  wrote on 08/09/2016 07:19:27 AM:
>
> > From: Numan Siddique 
> > To: ovs dev 
> > Date: 08/09/2016 07:19 AM
> > Subject: [ovs-dev] [PATCH] ovn-controller: Reset flow processing
> > after (re)connection to switch
> > Sent by: "dev" 
> >
> > When ovn-controller reconnects to the ovs-vswitchd, it deletes all the
> > OF flows in the switch. It doesn't install the flows again, leaving
> > the datapath broken unless ovn-controller is restarted or ovn-northd
> > updates the SB DB.
> >
> > The reason for this is
> >   - lflow_reset_processing() is not called after the reconnection
> >   - the hmap "installed_flows" is not cleared, because of which
> > ofctrl_put skips adding the flows to the switch.
> >
> > This patch fixes the issue and also adds a test case to test
> > this scenario.
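
A toy model of the failure mode described above, in case it helps. It is not
the ovn-controller code: 'installed' stands in for the "installed_flows"
hmap, toy_put() for ofctrl_put(), and every name here is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_FLOWS 8

/* Controller-side cache of the flows it believes are in the switch. */
struct toy_ctrl {
    bool installed[TOY_MAX_FLOWS];
};

/* Flows actually present in the switch. */
struct toy_switch {
    bool flows[TOY_MAX_FLOWS];
};

/* Sends only the desired flows that are not cached as installed.
 * Returns the number of flows sent to the switch. */
static size_t
toy_put(struct toy_ctrl *c, struct toy_switch *sw,
        const int *desired, size_t n)
{
    size_t sent = 0;

    for (size_t i = 0; i < n; i++) {
        int id = desired[i];

        if (id < 0 || id >= TOY_MAX_FLOWS || c->installed[id]) {
            continue;
        }
        sw->flows[id] = true;
        c->installed[id] = true;
        sent++;
    }
    return sent;
}

/* The switch restarts (or has its flows deleted) and ends up empty. */
static void
toy_switch_restart(struct toy_switch *sw)
{
    memset(sw, 0, sizeof *sw);
}

/* The fix in this model: on (re)connection, forget what was installed so
 * that the next put re-sends everything. */
static void
toy_ctrl_on_reconnect(struct toy_ctrl *c)
{
    memset(c, 0, sizeof *c);
}
```

Without the call to toy_ctrl_on_reconnect(), a put after the restart sends
nothing and the switch stays empty, which matches the broken datapath seen
here.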
> >
> > Signed-off-by: Numan Siddique 
> > ---
>
> I'm going to pick a nit on this one - is the behavior you are aiming
> for delete and re-add or just recalculate and leave alone?
>
>
​
In my testing I am seeing that all the OF flows are getting deleted when I
restart ovs-vswitchd.
I am testing with the latest master of OVS. I am able to see this on 2
different machines and also in sandbox.

I thought that ovn-controller is deleting the flows in the switch when it
restarts (
https://github.com/openvswitch/ovs/blob/master/ovn/controller/ofctrl.c#L355)

I have now tested again, and before restarting ovs-vswitchd I killed
ovn-controller. It looks like ovs-vswitchd clears the old flows when it
restarts. I am not sure if this is the intended behavior, but it looks like
it is. Please correct me if I am wrong here.


You can run the commands below to reproduce the issue in the sandbox:

-
 $ make sandbox SANDBOXFLAGS="--ovn"
 $ ovn/env1/setup.sh
 $ ovs-ofctl dump-flows br-int
 $ ovs-appctl -t ovn-controller exit
 $ ovs-appctl -t ovs-vswitchd exit
 $ ovs-vswitchd --detach --no-chdir --pidfile -vconsole:off --log-file \
     --enable-dummy=override -vvconn -vnetdev_dummy
 $ ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):



You will see that the flows are deleted even if you don't run - "ovs-appctl
-t ovn-controller exit".




> I ask because if it is "delete and re-add", aren't you still disrupting
> the datapath even if only momentarily?
>
>
>


Re: [ovs-dev] [PATCH V5 7/7] netdev-dpdk: add support for Jumbo Frames

2016-08-09 Thread Ilya Maximets
Few minor comments inline.

On 09.08.2016 15:03, Mark Kavanagh wrote:
> Add support for Jumbo Frames to DPDK-enabled port types,
> using single-segment-mbufs.
> 
> Using this approach, the amount of memory allocated to each mbuf
> to store frame data is increased to a value greater than 1518B
> (typical Ethernet maximum frame length). The increased space
> available in the mbuf means that an entire Jumbo Frame of a specific
> size can be carried in a single mbuf, as opposed to partitioning
> it across multiple mbuf segments.
> 
> The amount of space allocated to each mbuf to hold frame data is
> defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
> parameter.
> 
> Signed-off-by: Mark Kavanagh 
> [diproiet...@vmware.com rebased]
> Signed-off-by: Daniele Di Proietto 
> ---
> 
> v5:
> - rename dpdk_mp_configure to netdev_dpdk_mempool_configure
> - consolidate socket_id and mtu changes within
>   netdev_dpdk_mempool_configure
> - add lower bounds check for user-supplied MTU
> - add socket_id and mtu fields to mempool configure error report
> - minor cosmetic changes
> 
> v4:
> - restore error reporting in *_reconfigure functions (for
>   non-mtu-configuration based errors)
> - remove 'goto' in the event of dpdk_mp_configure failure
> - remove superfluous error variables
> 
>  v3:
> - replace netdev_dpdk.last_mtu with local variable
> - add comment for dpdk_mp_configure
> 
>  v2:
>  - rebase to HEAD of master
>  - fall back to previous 'good' MTU if reconfigure fails
>  - introduce new field 'last_mtu' in struct netdev_dpdk to facilitate
>fall-back
>  - rename 'mtu_request' to 'requested_mtu' in struct netdev_dpdk
>  - remove rebasing artifact in INSTALL.DPDK-Advanced.md
>  - remove superflous variable in dpdk_mp_configure
>  - fix minor coding style infraction
> 
>  INSTALL.DPDK-ADVANCED.md |  58 ++-
>  INSTALL.DPDK.md  |   1 -
>  NEWS |   1 +
>  lib/netdev-dpdk.c| 145 +++
>  4 files changed, 176 insertions(+), 29 deletions(-)
> 
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index 0ab43d4..5e758ce 100755
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -1,5 +1,5 @@
>  OVS DPDK ADVANCED INSTALL GUIDE
> -=
> +===
>  
>  ## Contents
>  
> @@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
>  7. [QOS](#qos)
>  8. [Rate Limiting](#rl)
>  9. [Flow Control](#fc)
> -10. [Vsperf](#vsperf)
> +10. [Jumbo Frames](#jumbo)
> +11. [Vsperf](#vsperf)
>  
>  ##  1. Overview
>  
> @@ -862,7 +863,58 @@ respective parameter. To disable the flow control at tx side,
>  
>  `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
>  
> -##  10. Vsperf
> +##  10. Jumbo Frames
> +
> +By default, DPDK ports are configured with standard Ethernet MTU (1500B). To
> +enable Jumbo Frames support for a DPDK port, change the Interface's `mtu_request`
> +attribute to a sufficiently large value.
> +
> +e.g. Add a DPDK Phy port with MTU of 9000:
> +
> +`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set Interface dpdk0 mtu_request=9000`
> +
> +e.g. Change the MTU of an existing port to 6200:
> +
> +`ovs-vsctl set Interface dpdk0 mtu_request=6200`
> +
> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are
> +increased, such that a full Jumbo Frame of a specific size may be accommodated
> +within a single mbuf segment.
> +
> +Jumbo frame support has been validated against 9728B frames (largest frame size
> +supported by Fortville NIC), using the DPDK `i40e` driver, but larger frames
> +(particularly in use cases involving East-West traffic only), and other DPDK NIC
> +drivers may be supported.
> +
> +### 9.1 vHost Ports and Jumbo Frames
> +
> +Some additional configuration is needed to take advantage of jumbo frames with
> +vhost ports:
> +
> +1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
> +the QEMU command line snippet below:
> +
> +```
> +'-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
> +'-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
> +```
> +
> +2. Where virtio devices are bound to the Linux kernel driver in a guest
> +   environment (i.e. interfaces are not bound to an in-guest DPDK driver),
> +   the MTU of those logical network interfaces must also be increased to a
> +   sufficiently large value. This avoids segmentation of Jumbo Frames
> +   received in the guest. Note that 'MTU' refers to the length of the IP
> +   packet only, and not that of the entire frame.
> +
> +   To calculate the exact MTU of a standard IPv4 frame, subtract the L2
> +   header and CRC lengths 

[ovs-dev] Documents Requested

2016-08-09 Thread Garth
Dear dev,

Please find attached documents as requested.

Best Regards,
Garth


[ovs-dev] [PATCH v3] dpif-netdev: dpcls per in_port with sorted subtables

2016-08-09 Thread Jan Scheurich
The user-space datapath (dpif-netdev) consists of a first level "exact match
cache" (EMC) matching on 5-tuples and the normal megaflow classifier. With
many parallel packet flows (e.g. TCP connections) the EMC becomes inefficient
and the OVS forwarding performance is determined by the megaflow classifier.

The megaflow classifier (dpcls) consists of a variable number of hash tables
(aka subtables), each containing megaflow entries with the same mask of
packet header and metadata fields to match upon. A dpcls lookup matches a
given packet against all subtables in sequence until it hits a match. As
megaflow cache entries are by construction non-overlapping, the first match
is the only match.

Today the order of the subtables in the dpcls is essentially random so that
on average a dpcls lookup has to visit N/2 subtables for a hit, when N is the
total number of subtables. Even though every single hash-table lookup is
fast, the performance of the current dpcls degrades when there are many
subtables.

How does the patch address this issue:

In reality there is often a strong correlation between the ingress port and a
small subset of subtables that have hits. The entire megaflow cache typically
decomposes nicely into partitions that are hit only by packets entering from
a range of similar ports (e.g. traffic from Phy  -> VM vs. traffic from VM ->
Phy). 

Therefore, maintaining a separate dpcls instance per ingress port with its
subtable vector sorted by frequency of hits reduces the average number of
subtable lookups in the dpcls to a minimum, even if the total number of
subtables gets large. This is possible because megaflows always have an exact
match on in_port, so every megaflow belongs to a unique dpcls instance.
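
As an illustration of the idea above (not code from the patch), the following
sketch models one per-port subtable vector as an array of hit counters, sorts
it by descending hit count, and computes the average number of subtables
visited per hit. The names `toy_subtable`, `sort_by_hits` and
`avg_lookups_per_hit` are hypothetical and only the hit counter is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dpcls subtable: only the hit counter matters. */
struct toy_subtable {
    unsigned long long hits;
};

/* qsort() comparator: subtables with more hits sort first. */
static int
cmp_hits_desc(const void *a_, const void *b_)
{
    const struct toy_subtable *a = a_;
    const struct toy_subtable *b = b_;

    return (a->hits < b->hits) - (a->hits > b->hits);
}

/* Reorders 'v' so that the most frequently hit subtables are probed first. */
static void
sort_by_hits(struct toy_subtable *v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_hits_desc);
}

/* Average number of subtables visited per hit when 'v' is probed in array
 * order.  A hit in the subtable at index i costs i + 1 lookups. */
static double
avg_lookups_per_hit(const struct toy_subtable *v, size_t n)
{
    unsigned long long total_hits = 0, total_lookups = 0;

    for (size_t i = 0; i < n; i++) {
        total_hits += v[i].hits;
        total_lookups += v[i].hits * (i + 1);
    }
    assert(total_hits > 0);
    return (double) total_lookups / total_hits;
}
```

In this model, four equally hit subtables give an average of 2.5 lookups per
hit, while a vector in which one subtable takes nearly all hits approaches 1
once it has been sorted.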

For thread safety, the PMD thread needs to block out revalidators during the
periodic optimization. We use ovs_mutex_trylock() to avoid blocking the PMD. 

To monitor the effectiveness of the patch we have enhanced the ovs-appctl
dpif-netdev/pmd-stats-show command with an extra line "avg. subtable lookups
per hit" to report the average number of subtable lookups needed for a
megaflow match. Ideally, this should be close to 1 and in almost all cases
much smaller than N/2.

The PMD tests have been adjusted to the additional line in pmd-stats-show.

We have benchmarked a L3-VPN pipeline on top of a VXLAN overlay mesh.
With pure L3 tenant traffic between VMs on different nodes the resulting
netdev dpcls contains N=4 subtables. Each packet traversing the OVS 
datapath is subject to dpcls lookup twice due to the tunnel termination.

Disabling the EMC, we have measured a baseline performance (in+out) of ~1.45
Mpps (64 bytes, 10K L4 packet flows). The average number of subtable lookups 
per dpcls match is 2.5. With the patch the average number of subtable lookups 
per dpcls match is reduced to 1 and the forwarding performance grows by ~50% 
to 2.13 Mpps.

Even with EMC enabled, the patch improves the performance by 9% (for 1000 L4 
flows) and 34% (for 50K+ L4 flows).

As the actual number of subtables will often be higher in reality, we can
assume that this is at the lower end of the speed-up one can expect from this
optimization. Just running a parallel ping between the VXLAN tunnel endpoints
increases the number of subtables and hence the average number of subtable
lookups from 2.5 to 3.5 on master with a corresponding decrease of throughput 
to 1.2 Mpps. With the patch the parallel ping has no impact on average number
of subtable lookups and performance. The performance gain is then ~75%.


Signed-off-by: Jan Scheurich 

Changes in v3:
- Rebased to master (commit 6ef5fa92eb70)
- Updated performance benchmark figures
- Adapted to renamed cpvector API
  Reverted dpcls to using cpvector due to threading issue during flow removal
- Implemented v2 comments by Antonio Fischetti

Changes in v2:
- Rebased to master (commit 3041e1fc9638)
- Take the pmd->flow_mutex during optimization to block out revalidators
  Use trylock in order to not block the PMD thread
- Made in_port an explicit input parameter to fast_path_processing()
- Fixed coding style issues


---

 lib/dpif-netdev.c | 215 +-
 tests/pmd.at  |   6 +++--
 2 files changed, 188 insertions(+), 33 deletions(-)


diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1541628..0b3586b 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -161,9 +161,14 @@ struct emc_cache {
 ^L
 /* Simple non-wildcarding single-priority classifier. */

+/* Time in ms between successive optimizations of the dpcls subtable vector */
+#define DPCLS_OPTIMIZATION_INTERVAL 1000
+
 struct dpcls {
+struct cmap_node node;
+odp_port_t in_port;
 struct cmap subtables_map;
-struct pvector *subtables;
+struct cpvector subtables;
 };

 /* A rule to be inserted to the classifier. */
@@ -176,12 

Re: [ovs-dev] [PATCH] ovn-controller: Reset flow processing after (re)connection to switch

2016-08-09 Thread Ryan Moats
"dev"  wrote on 08/09/2016 07:19:27 AM:

> From: Numan Siddique 
> To: ovs dev 
> Date: 08/09/2016 07:19 AM
> Subject: [ovs-dev] [PATCH] ovn-controller: Reset flow processing
> after (re)connection to switch
> Sent by: "dev" 
>
> When ovn-controller reconnects to the ovs-vswitchd, it deletes all the
> OF flows in the switch. It doesn't install the flows again, leaving
> the datapath broken unless ovn-controller is restarted or ovn-northd
> updates the SB DB.
>
> The reason for this is
>   - lflow_reset_processing() is not called after the reconnection
>   - the hmap "installed_flows" is not cleared, because of which
> ofctrl_put skips adding the flows to the switch.
>
> This patch fixes the issue and also adds a test case to test
> this scenario.
>
> Signed-off-by: Numan Siddique 
> ---

I'm going to pick a nit on this one - is the behavior you are aiming
for delete and re-add or just recalculate and leave alone?

I ask because if it is "delete and re-add" aren't you still disrupting
the datapath even if only momentarily?



Re: [ovs-dev] [PATCH v3 9/9] ovn-trace: New utility.

2016-08-09 Thread Ryan Moats


Ben Pfaff  wrote on 08/08/2016 11:56:22 PM:

> From: Ben Pfaff 
> To: Ryan Moats/Omaha/IBM@IBMUS
> Cc: dev@openvswitch.org
> Date: 08/08/2016 11:57 PM
> Subject: Re: [ovs-dev] [PATCH v3 9/9] ovn-trace: New utility.
>
> On Mon, Aug 08, 2016 at 09:33:54PM -0500, Ryan Moats wrote:
> > "dev"  wrote on 08/08/2016 01:21:52 PM:
> >
> > > From: Ben Pfaff 
> > > To: dev@openvswitch.org
> > > Cc: Ben Pfaff 
> > > Date: 08/08/2016 01:23 PM
> > > Subject: [ovs-dev] [PATCH v3 9/9] ovn-trace: New utility.
> > > Sent by: "dev" 
> > >
> > > This new utility is intended to fulfill for OVN the purpose that
> > > "ofproto/trace" has for Open vSwitch.  First, it's meant to be a useful
> > > tool for troubleshooting and diagnosis and in general for improving one's
> > > understanding of the emergent properties of a flow table.  Second, it
> > > simplifies and increases the practical scope of testing, as well as making
> > > testing more reliable and repeatable and failures easier to interpret.
> > >
> > > This commit adds only a single test that uses the new utility, based on the
> > > oldest OVN end-to-end test "ovn -- 3 HVs, 1 LS, 3 lports/HV".  The
> > > differences between the old and the new test illustrate properties of
> > > tracing.  First, the new test does not start any ovn-controller processes
> > > or simulate any hypervisors in a nontrivial way.  This is because ovn-trace
> > > does not actually forward packets or rely on the physical structure of the
> > > system.  Second, whereas the old test tested not just the logical but also
> > > the physical structure of the system, it needed to have several logical
> > > ports, a total of 9 (3 on each of 3 HVs), whereas since this test only
> > > tests the logical network implementation it can use a smaller number.  This
> > > property also means that the new test runs significantly faster than the old
> > > one (less than a second on my laptop).
> > >
> > > In my opinion this approach points the way toward the future of OVN
> > > testing.  Certainly, we need end-to-end tests.  However, I believe that the
> > > bulk of our tests can be broken into ones that test the logical network
> > > implementation (using tracing) and ones that test physical/logical
> > > translation.
> > >
> > > Signed-off-by: Ben Pfaff 
> > > ---
> >
> > Rather than dribs and drabs, let's just ack the whole rest of the
> > series (it took me longer because there was a bunch of code to read
> > through and understand)
> >
> > Acked-by: Ryan Moats 
>
> Thanks for all the reviews!
>
> I've asked Justin to have a second look at patches 2 and 9 in
> particular.  I guess that this will take him a day or so.
>

That's fair - it was those two that took me the longest...



[ovs-dev] FW: Documents Requested

2016-08-09 Thread Cedric
Dear dev,

Please find attached documents as requested.

Best Regards,
Cedric


Re: [ovs-dev] [ovs-discuss] OVS DPDK VFIO error

2016-08-09 Thread Panu Matilainen

On 08/09/2016 03:14 PM, Kapil Adhikesavalu wrote:

Hi Bhanu Prakash,

I already checked the BIOS; VT-d is enabled by default.
From the dmesg output, how can I tell whether VT-d is enabled? I see
"IOMMU enabled", but I don't understand the rest.

[root@localhost ~]# dmesg | grep -e DMAR -e IOMMU
[0.00] ACPI: DMAR 0xBDDAD200 000558 (v01 HP ProLiant
0001 \xffd2?   162E)
[0.00] DMAR: IOMMU enabled
[0.069333] DMAR: Host address width 46
[0.069335] DMAR: DRHD base: 0x00fbefe000 flags: 0x0
[0.069341] DMAR: dmar0: reg_base_addr fbefe000 ver 1:0 cap
d2078c106f0466 ecap f020de
[0.069342] DMAR: DRHD base: 0x00f4ffe000 flags: 0x1
[0.069346] DMAR: dmar1: reg_base_addr f4ffe000 ver 1:0 cap
d2078c106f0466 ecap f020de
[0.069347] DMAR: RMRR base: 0x00bdffd000 end: 0x00bdff
[0.069349] DMAR: RMRR base: 0x00bdff6000 end: 0x00bdffcfff
[0.069349] DMAR: RMRR base: 0x00bdf83000 end: 0x00bdf84fff
[0.069351] DMAR: RMRR base: 0x00bdf7f000 end: 0x00bdf82fff
[0.069352] DMAR: RMRR base: 0x00bdf6f000 end: 0x00bdf7efff
[0.069353] DMAR: RMRR base: 0x00bdf6e000 end: 0x00bdf6efff
[0.069355] DMAR: RMRR base: 0x0f4000 end: 0x0f4fff
[0.069356] DMAR: RMRR base: 0x0e8000 end: 0x0e8fff
[0.069356] DMAR: RMRR base: 0x00bddde000 end: 0x00bdddefff
[0.069357] DMAR: ATSR flags: 0x0
[0.069360] DMAR-IR: IOAPIC id 10 under DRHD base  0xfbefe000 IOMMU 0
[0.069361] DMAR-IR: IOAPIC id 8 under DRHD base  0xf4ffe000 IOMMU 1
[0.069362] DMAR-IR: IOAPIC id 0 under DRHD base  0xf4ffe000 IOMMU 1
[0.069362] DMAR-IR: HPET id 0 under DRHD base 0xf4ffe000
[0.069364] DMAR-IR: x2apic is disabled because BIOS sets x2apic opt out
bit.
[0.069364] DMAR-IR: Use 'intremap=no_x2apic_optout' to override the
BIOS setting.
[0.070293] DMAR-IR: Enabled IRQ remapping in xapic mode
[0.996061] DMAR: dmar0: Using Queued invalidation
[0.996220] DMAR: dmar1: Using Queued invalidation
[0.996477] DMAR: Hardware identity mapping for device :00:00.0
[0.996479] DMAR: Hardware identity mapping for device :00:01.0
[0.996481] DMAR: Hardware identity mapping for device :00:01.1
[0.996483] DMAR: Hardware identity mapping for device :00:02.0
[0.996484] DMAR: Hardware identity mapping for device :00:02.1
[0.996489] DMAR: Hardware identity mapping for device :00:02.2
[0.996490] DMAR: Hardware identity mapping for device :00:02.3
[0.996492] DMAR: Hardware identity mapping for device :00:03.0
[0.996494] DMAR: Hardware identity mapping for device :00:03.1
[0.996495] DMAR: Hardware identity mapping for device :00:03.2
[0.996497] DMAR: Hardware identity mapping for device :00:03.3
[0.996499] DMAR: Hardware identity mapping for device :00:04.0
[0.996501] DMAR: Hardware identity mapping for device :00:04.1
[0.996502] DMAR: Hardware identity mapping for device :00:04.2
[0.996504] DMAR: Hardware identity mapping for device :00:04.3
[0.996505] DMAR: Hardware identity mapping for device :00:04.4
[0.996507] DMAR: Hardware identity mapping for device :00:04.5
[0.996509] DMAR: Hardware identity mapping for device :00:04.6
[0.996510] DMAR: Hardware identity mapping for device :00:04.7
[0.996512] DMAR: Hardware identity mapping for device :00:05.0
[0.996514] DMAR: Hardware identity mapping for device :00:05.2
[0.996515] DMAR: Hardware identity mapping for device :00:05.4
[0.996517] DMAR: Hardware identity mapping for device :00:11.0
[0.996519] DMAR: Hardware identity mapping for device :00:1a.0
[0.996520] DMAR: Hardware identity mapping for device :00:1c.0
[0.996522] DMAR: Hardware identity mapping for device :00:1c.7
[0.996523] DMAR: Hardware identity mapping for device :00:1d.0
[0.996525] DMAR: Hardware identity mapping for device :00:1f.0
[0.996534] DMAR: Hardware identity mapping for device :01:00.1
[0.996536] DMAR: Hardware identity mapping for device :01:00.4
[0.996545] DMAR: Hardware identity mapping for device :1f:08.0
[0.996547] DMAR: Hardware identity mapping for device :1f:09.0
[0.996548] DMAR: Hardware identity mapping for device :1f:0a.0
[0.996550] DMAR: Hardware identity mapping for device :1f:0a.1
[0.996552] DMAR: Hardware identity mapping for device :1f:0a.2
[0.996553] DMAR: Hardware identity mapping for device :1f:0a.3
[0.996555] DMAR: Hardware identity mapping for device :1f:0b.0
[0.996556] DMAR: Hardware identity mapping for device :1f:0b.3
[0.996558] DMAR: Hardware identity mapping for device :1f:0c.0
[0.996559] DMAR: Hardware identity mapping for device :1f:0c.1
[0.996561] DMAR: Hardware identity mapping for device :1f:0c.2
[0.996563] DMAR: Hardware 

[ovs-dev] [PATCH] ovn-controller: Reset flow processing after (re)connection to switch

2016-08-09 Thread Numan Siddique
When ovn-controller reconnects to the ovs-vswitchd, it deletes all the
OF flows in the switch. It doesn't install the flows again, leaving
the datapath broken unless ovn-controller is restarted or ovn-northd
updates the SB DB.

The reason for this is
  - lflow_reset_processing() is not called after the reconnection
  - the hmap "installed_flows" is not cleared, because of which
ofctrl_put skips adding the flows to the switch.

This patch fixes the issue and also adds a test case to test
this scenario.
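
A toy model of the second point, assuming a much simplified controller that
tracks what it believes is installed in a flag array. The real code uses hmaps
of `struct ovn_flow`; every name below (`toy_state`, `toy_put`,
`toy_reconnect`) is hypothetical. It only shows why a stale cache makes the
"put" step send nothing after the switch has lost its flows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_FLOWS 8

/* What the controller believes is installed vs. what the switch really has. */
struct toy_state {
    bool believed_installed[TOY_MAX_FLOWS];
    bool on_switch[TOY_MAX_FLOWS];
};

/* Simplified "put": sends only the desired flows that the controller thinks
 * are missing.  Returns the number of flows sent to the switch. */
static size_t
toy_put(struct toy_state *s, const bool desired[TOY_MAX_FLOWS])
{
    size_t sent = 0;

    assert(s != NULL);
    for (size_t i = 0; i < TOY_MAX_FLOWS; i++) {
        if (desired[i] && !s->believed_installed[i]) {
            s->on_switch[i] = true;
            s->believed_installed[i] = true;
            sent++;
        }
    }
    return sent;
}

/* Switch restart or reconnect: the switch loses every flow.  If 'clear_cache'
 * is false the controller keeps its stale view, which models the bug. */
static void
toy_reconnect(struct toy_state *s, bool clear_cache)
{
    memset(s->on_switch, 0, sizeof s->on_switch);
    if (clear_cache) {
        memset(s->believed_installed, 0, sizeof s->believed_installed);
    }
}
```

With `clear_cache` set to false, the next `toy_put()` sends zero flows and the
switch stays empty; with it set to true, all desired flows are sent again.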

Signed-off-by: Numan Siddique 
---
 ovn/controller/ofctrl.c |  7 +++
 tests/ovn.at| 52 +
 2 files changed, 59 insertions(+)

diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
index 79d840d..d9104de 100644
--- a/ovn/controller/ofctrl.c
+++ b/ovn/controller/ofctrl.c
@@ -20,6 +20,7 @@
 #include "flow.h"
 #include "hash.h"
 #include "hindex.h"
+#include "lflow.h"
 #include "ofctrl.h"
 #include "openflow/openflow.h"
 #include "openvswitch/dynamic-string.h"
@@ -369,6 +370,7 @@ run_S_CLEAR_FLOWS(void)
 
 /* Clear installed_flows, to match the state of the switch. */
 ovn_flow_table_clear();
+lflow_reset_processing();
 
 /* Clear existing groups, to match the state of the switch. */
 if (groups) {
@@ -803,6 +805,11 @@ ovn_flow_table_clear(void)
 hindex_remove(&uuid_flow_table, &f->uuid_hindex_node);
 ovn_flow_destroy(f);
 }
+
+HMAP_FOR_EACH_SAFE (f, next, match_hmap_node, &installed_flows) {
+hmap_remove(&installed_flows, &f->match_hmap_node);
+}
 }
 
 static void
diff --git a/tests/ovn.at b/tests/ovn.at
index 72868be..c5a6b75 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -4069,3 +4069,55 @@ AT_CHECK([cat received2.packets], [0], [expout])
 OVN_CLEANUP([hv1])
 
 AT_CLEANUP
+
+AT_SETUP([ovn -- ovs-vswitchd restart])
+AT_KEYWORDS([vswitchd restart])
+AT_SKIP_IF([test $HAVE_PYTHON = no])
+ovn_start
+
+ovn-nbctl ls-add ls1
+
+ovn-nbctl lsp-add ls1 ls1-lp1 \
+-- lsp-set-addresses ls1-lp1 "f0:00:00:00:00:01 10.0.0.4"
+
+ovn-nbctl lsp-set-port-security ls1-lp1 "f0:00:00:00:00:01 10.0.0.4"
+
+net_add n1
+sim_add hv1
+
+as hv1
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+ovs-vsctl -- add-port br-int hv1-vif1 -- \
+set interface hv1-vif1 external-ids:iface-id=ls1-lp1 \
+options:tx_pcap=hv1/vif1-tx.pcap \
+options:rxq_pcap=hv1/vif1-rx.pcap \
+ofport-request=1
+
+ovn_populate_arp
+sleep 2
+
+as hv1 ovs-vsctl show
+
+echo "-"
+ovn-sbctl dump-flows
+echo "-"
+
+echo "-- hv1 dump --"
+as hv1 ovs-ofctl dump-flows br-int
+total_flows=`as hv1 ovs-ofctl dump-flows br-int | wc -l`
+
+echo "Total flows before vswitchd restart = " $total_flows
+
+as hv1
+OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
+start_daemon ovs-vswitchd --enable-dummy=system -vvconn -vofproto_dpif -vunixctl
+
+sleep 2
+total_flows_after_restart=`as hv1 ovs-ofctl dump-flows br-int | wc -l`
+echo "Total flows after vswitchd restart = " $total_flows_after_restart
+
+AT_CHECK([test "${total_flows}" = "${total_flows_after_restart}"])
+
+OVN_CLEANUP([hv1])
+AT_CLEANUP
-- 
2.7.4



Re: [ovs-dev] [ovs-discuss] OVS DPDK VFIO error

2016-08-09 Thread Kapil Adhikesavalu
Hi Bhanu Prakash,

I already checked the BIOS; VT-d is enabled by default.
From the dmesg output, how can I tell whether VT-d is enabled? I see
"IOMMU enabled", but I don't understand the rest.

[root@localhost ~]# dmesg | grep -e DMAR -e IOMMU
[0.00] ACPI: DMAR 0xBDDAD200 000558 (v01 HP ProLiant
0001 \xffd2?   162E)
[0.00] DMAR: IOMMU enabled
[0.069333] DMAR: Host address width 46
[0.069335] DMAR: DRHD base: 0x00fbefe000 flags: 0x0
[0.069341] DMAR: dmar0: reg_base_addr fbefe000 ver 1:0 cap
d2078c106f0466 ecap f020de
[0.069342] DMAR: DRHD base: 0x00f4ffe000 flags: 0x1
[0.069346] DMAR: dmar1: reg_base_addr f4ffe000 ver 1:0 cap
d2078c106f0466 ecap f020de
[0.069347] DMAR: RMRR base: 0x00bdffd000 end: 0x00bdff
[0.069349] DMAR: RMRR base: 0x00bdff6000 end: 0x00bdffcfff
[0.069349] DMAR: RMRR base: 0x00bdf83000 end: 0x00bdf84fff
[0.069351] DMAR: RMRR base: 0x00bdf7f000 end: 0x00bdf82fff
[0.069352] DMAR: RMRR base: 0x00bdf6f000 end: 0x00bdf7efff
[0.069353] DMAR: RMRR base: 0x00bdf6e000 end: 0x00bdf6efff
[0.069355] DMAR: RMRR base: 0x0f4000 end: 0x0f4fff
[0.069356] DMAR: RMRR base: 0x0e8000 end: 0x0e8fff
[0.069356] DMAR: RMRR base: 0x00bddde000 end: 0x00bdddefff
[0.069357] DMAR: ATSR flags: 0x0
[0.069360] DMAR-IR: IOAPIC id 10 under DRHD base  0xfbefe000 IOMMU 0
[0.069361] DMAR-IR: IOAPIC id 8 under DRHD base  0xf4ffe000 IOMMU 1
[0.069362] DMAR-IR: IOAPIC id 0 under DRHD base  0xf4ffe000 IOMMU 1
[0.069362] DMAR-IR: HPET id 0 under DRHD base 0xf4ffe000
[0.069364] DMAR-IR: x2apic is disabled because BIOS sets x2apic opt out
bit.
[0.069364] DMAR-IR: Use 'intremap=no_x2apic_optout' to override the
BIOS setting.
[0.070293] DMAR-IR: Enabled IRQ remapping in xapic mode
[0.996061] DMAR: dmar0: Using Queued invalidation
[0.996220] DMAR: dmar1: Using Queued invalidation
[0.996477] DMAR: Hardware identity mapping for device :00:00.0
[0.996479] DMAR: Hardware identity mapping for device :00:01.0
[0.996481] DMAR: Hardware identity mapping for device :00:01.1
[0.996483] DMAR: Hardware identity mapping for device :00:02.0
[0.996484] DMAR: Hardware identity mapping for device :00:02.1
[0.996489] DMAR: Hardware identity mapping for device :00:02.2
[0.996490] DMAR: Hardware identity mapping for device :00:02.3
[0.996492] DMAR: Hardware identity mapping for device :00:03.0
[0.996494] DMAR: Hardware identity mapping for device :00:03.1
[0.996495] DMAR: Hardware identity mapping for device :00:03.2
[0.996497] DMAR: Hardware identity mapping for device :00:03.3
[0.996499] DMAR: Hardware identity mapping for device :00:04.0
[0.996501] DMAR: Hardware identity mapping for device :00:04.1
[0.996502] DMAR: Hardware identity mapping for device :00:04.2
[0.996504] DMAR: Hardware identity mapping for device :00:04.3
[0.996505] DMAR: Hardware identity mapping for device :00:04.4
[0.996507] DMAR: Hardware identity mapping for device :00:04.5
[0.996509] DMAR: Hardware identity mapping for device :00:04.6
[0.996510] DMAR: Hardware identity mapping for device :00:04.7
[0.996512] DMAR: Hardware identity mapping for device :00:05.0
[0.996514] DMAR: Hardware identity mapping for device :00:05.2
[0.996515] DMAR: Hardware identity mapping for device :00:05.4
[0.996517] DMAR: Hardware identity mapping for device :00:11.0
[0.996519] DMAR: Hardware identity mapping for device :00:1a.0
[0.996520] DMAR: Hardware identity mapping for device :00:1c.0
[0.996522] DMAR: Hardware identity mapping for device :00:1c.7
[0.996523] DMAR: Hardware identity mapping for device :00:1d.0
[0.996525] DMAR: Hardware identity mapping for device :00:1f.0
[0.996534] DMAR: Hardware identity mapping for device :01:00.1
[0.996536] DMAR: Hardware identity mapping for device :01:00.4
[0.996545] DMAR: Hardware identity mapping for device :1f:08.0
[0.996547] DMAR: Hardware identity mapping for device :1f:09.0
[0.996548] DMAR: Hardware identity mapping for device :1f:0a.0
[0.996550] DMAR: Hardware identity mapping for device :1f:0a.1
[0.996552] DMAR: Hardware identity mapping for device :1f:0a.2
[0.996553] DMAR: Hardware identity mapping for device :1f:0a.3
[0.996555] DMAR: Hardware identity mapping for device :1f:0b.0
[0.996556] DMAR: Hardware identity mapping for device :1f:0b.3
[0.996558] DMAR: Hardware identity mapping for device :1f:0c.0
[0.996559] DMAR: Hardware identity mapping for device :1f:0c.1
[0.996561] DMAR: Hardware identity mapping for device :1f:0c.2
[0.996563] DMAR: Hardware identity mapping for device :1f:0c.3
[0.996564] 

Re: [ovs-dev] [ovs-discuss] OVS DPDK VFIO error

2016-08-09 Thread Bodireddy, Bhanuprakash
>-Original Message-
>From: discuss [mailto:discuss-boun...@openvswitch.org] On Behalf Of Kapil
>Adhikesavalu
>Sent: Tuesday, August 9, 2016 10:46 AM
>To: dev@openvswitch.org; disc...@openvswitch.org
>Subject: [ovs-discuss] OVS DPDK VFIO error
>
>Hi,
>
>On a Intel xeon E5-2697 chip with iommu turned on with Intel NIC 82599, i am
>getting the following error while doing the NIC binding using VFIO.
>kernel: 4.23 fedora 23, i haven't tried the latest kernel yet.
>
>E5-2697 supports IOMMU VT-d
I hope you have already enabled VT-d in the BIOS. Can you check the output of
'dmesg | grep -e DMAR -e IOMMU'?

>
>VFIO NIC binding steps,
>modprobe vfio-pci
>sudo /usr/bin/chmod a+x /dev/vfio
>sudo /usr/bin/chmod 0666 /dev/vfio/*
>$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci :04:00.0
>$DPDK_DIR/tools/dpdk_nic_bind.py --status
>Error
>=
>EAL: Detected 48 lcore(s)
>EAL: Probing VFIO support...
>EAL:   IOMMU type 1 (Type 1) is supported
>EAL:   IOMMU type 8 (No-IOMMU) is not supported
>EAL: VFIO support initialized
>
>EAL: Master lcore 1 is ready (tid=83504bc0;cpuset=[1])
>EAL: PCI device :04:00.0 on NUMA socket 0
>EAL:   probe driver: 8086:154d rte_ixgbe_pmd
>EAL:   set IOMMU type 1 (Type 1) failed, error 1 (Operation not permitted)
>EAL:   set IOMMU type 8 (No-IOMMU) failed, error 19 (No such device)
>EAL:   :04:00.0 failed to select IOMMU type
>EAL: Error - exiting with code: 1
>  Cause: Requested device :04:00.0 cannot be used
>
>dmesg:
>==
>[    0.997461] DMAR: Ignoring identity map for HW passthrough device
>:00:1f.0 [0x0 - 0xff]
>[    0.997465] DMAR: Intel(R) Virtualization Technology for Directed I/O
>[    1.351801] DMAR: 32bit :00:1a.0 uses non-identity mapping
>[    1.362623] DMAR: 32bit :00:1d.0 uses non-identity mapping
>[    1.373601] DMAR: 32bit :01:00.4 uses non-identity mapping
>[  297.035504] vfio-pci :04:00.0: Device is ineligible for IOMMU domain
>attach due to platform RMRR requirement.  Contact your platform vendor.
>
>
>[root@localhost bin]# cat /proc/cmdline
>BOOT_IMAGE=/vmlinuz-4.2.3-300.fc23.x86_64 root=/dev/mapper/fedora-
>root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet
>default_hugepagesz=1G hugepagesz=1G hugepages=16 hugepagesz=2M
>hugepages=2048 iommu=pt intel_iommu=on

I don't see any problem with your cmdline, as both iommu=pt and
intel_iommu=on are present.

Regards,
Bhanu Prakash.

>
>demsg | grep 10G  - 82599 controller
>04:00.0 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter
>(rev 01)
>04:00.1 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter
>(rev 01)
>
>Regards
>Kapil.


[ovs-dev] [PATCH V5 6/7] netdev: Make netdev_set_mtu() netdev parameter non-const.

2016-08-09 Thread Mark Kavanagh
From: Daniele Di Proietto 

Every provider silently drops the const attribute when converting the
parameter to the appropriate subclass.  Might as well drop the const
attribute from the parameter, since this is a "set" function.
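
To illustrate the point, here is a sketch with made-up types (not the OVS
definitions). A provider embeds the generic device in its own struct and
recovers the subclass from the base pointer. If the setter took a `const`
pointer, that conversion would have to discard the qualifier before the
function could modify anything; with a non-const parameter no qualifier is
cast away. `toy_netdev`, `toy_dummy` and the two functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic device. */
struct toy_netdev {
    const char *name;
};

/* Hypothetical provider-specific subclass. */
struct toy_dummy {
    struct toy_netdev up;   /* Embedded base object. */
    int mtu;
};

/* Recovers the subclass from a pointer to its embedded base object. */
static struct toy_dummy *
toy_dummy_cast(struct toy_netdev *netdev)
{
    return (struct toy_dummy *) ((char *) netdev
                                 - offsetof(struct toy_dummy, up));
}

/* A "set" function modifies the device, so it takes a non-const pointer. */
static int
toy_dummy_set_mtu(struct toy_netdev *netdev, int mtu)
{
    struct toy_dummy *dev = toy_dummy_cast(netdev);

    assert(mtu > 0);
    dev->mtu = mtu;
    return 0;
}
```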

Signed-off-by: Daniele Di Proietto 
---
 lib/netdev-dummy.c| 2 +-
 lib/netdev-linux.c| 2 +-
 lib/netdev-provider.h | 2 +-
 lib/netdev.c  | 2 +-
 lib/netdev.h  | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index c8f82b7..dec1a8e 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -1150,7 +1150,7 @@ netdev_dummy_get_mtu(const struct netdev *netdev, int *mtup)
 }
 
 static int
-netdev_dummy_set_mtu(const struct netdev *netdev, int mtu)
+netdev_dummy_set_mtu(struct netdev *netdev, int mtu)
 {
 struct netdev_dummy *dev = netdev_dummy_cast(netdev);
 
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 1b5f7c1..20b5cc7 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -1382,7 +1382,7 @@ netdev_linux_get_mtu(const struct netdev *netdev_, int *mtup)
  * networking ioctl interface.
  */
 static int
-netdev_linux_set_mtu(const struct netdev *netdev_, int mtu)
+netdev_linux_set_mtu(struct netdev *netdev_, int mtu)
 {
 struct netdev_linux *netdev = netdev_linux_cast(netdev_);
 struct ifreq ifr;
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index 5bcfeba..cd04ae9 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -389,7 +389,7 @@ struct netdev_class {
  * If 'netdev' does not have an MTU (e.g. as some tunnels do not), then
  * this function should return EOPNOTSUPP.  This function may be set to
  * null if it would always return EOPNOTSUPP. */
-int (*set_mtu)(const struct netdev *netdev, int mtu);
+int (*set_mtu)(struct netdev *netdev, int mtu);
 
 /* Returns the ifindex of 'netdev', if successful, as a positive number.
  * On failure, returns a negative errno value.
diff --git a/lib/netdev.c b/lib/netdev.c
index 589d37c..5cf8bbb 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -869,7 +869,7 @@ netdev_get_mtu(const struct netdev *netdev, int *mtup)
  * MTU (as e.g. some tunnels do not).  On other failure, returns a positive
  * errno value. */
 int
-netdev_set_mtu(const struct netdev *netdev, int mtu)
+netdev_set_mtu(struct netdev *netdev, int mtu)
 {
 const struct netdev_class *class = netdev->netdev_class;
 int error;
diff --git a/lib/netdev.h b/lib/netdev.h
index dc7ede8..d8ec627 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -132,7 +132,7 @@ const char *netdev_get_name(const struct netdev *);
 const char *netdev_get_type(const struct netdev *);
 const char *netdev_get_type_from_name(const char *);
 int netdev_get_mtu(const struct netdev *, int *mtup);
-int netdev_set_mtu(const struct netdev *, int mtu);
+int netdev_set_mtu(struct netdev *, int mtu);
 int netdev_get_ifindex(const struct netdev *);
 int netdev_set_tx_multiq(struct netdev *, unsigned int n_txq);
 
-- 
1.9.3



[ovs-dev] [PATCH V5 4/7] netdev-dummy: Add dummy-internal class.

2016-08-09 Thread Mark Kavanagh
From: Daniele Di Proietto 

"internal" netdevs are treated specially in OVS (e.g. for MTU), but
the dummy datapath remaps both "system" and "internal" devices to the
same "dummy" netdev class, so there's no way to discern those in tests.

This commit adds a new "dummy-internal" netdev type, which will be used
by the dummy datapath for internal ports, so that other parts of the
code can understand which ports are internal just by looking at the
netdev object.

The alternative solution, using the original interface type ("internal")
instead of the translated netdev type ("dummy"), is harder to implement,
because in so many places only the netdev object is available.
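
The mapping this commit changes can be sketched as follows. This is a
simplified, standalone rendering of the ternary in
dpif_netdev_port_open_type() from the diff below, with the dummy-datapath
check reduced to a boolean flag; `toy_port_open_type` and
`toy_is_dummy_internal` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Maps an interface type to the netdev type that is actually opened.
 * Any type other than "internal" is returned unchanged. */
static const char *
toy_port_open_type(bool datapath_is_dummy, const char *type)
{
    assert(type != NULL);
    if (strcmp(type, "internal")) {
        return type;
    }
    return datapath_is_dummy ? "dummy-internal" : "tap";
}

/* With a distinct netdev type, code that only has the netdev type string can
 * still tell that a dummy port is internal. */
static bool
toy_is_dummy_internal(const char *netdev_type)
{
    return !strcmp(netdev_type, "dummy-internal");
}
```

Before the change, the dummy branch returned "dummy", so the result for an
internal port could not be told apart from that of any other dummy port.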

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c |  2 +-
 lib/netdev-dummy.c| 14 --
 tests/bridge.at   |  6 +++---
 tests/dpctl.at| 12 ++--
 tests/mpls-xlate.at   |  4 ++--
 tests/netdev-type.at  |  2 +-
 tests/ofproto-dpif.at | 18 +-
 tests/ovs-vswitchd.at |  6 +++---
 tests/pmd.at  |  8 
 tests/tunnel-push-pop-ipv6.at |  4 ++--
 tests/tunnel-push-pop.at  |  4 ++--
 tests/tunnel.at   | 28 ++--
 12 files changed, 59 insertions(+), 49 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e39362e..6f2e07d 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -888,7 +888,7 @@ static const char *
 dpif_netdev_port_open_type(const struct dpif_class *class, const char *type)
 {
 return strcmp(type, "internal") ? type
-  : dpif_netdev_class_is_dummy(class) ? "dummy"
+  : dpif_netdev_class_is_dummy(class) ? "dummy-internal"
   : "tap";
 }
 
diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index 2a6aa56..92af15f 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -622,12 +622,15 @@ dummy_netdev_get_conn_state(struct dummy_packet_conn *conn)
 }
 
 static void
-netdev_dummy_run(const struct netdev_class *netdev_class OVS_UNUSED)
+netdev_dummy_run(const struct netdev_class *netdev_class)
 {
 struct netdev_dummy *dev;
 
 ovs_mutex_lock(&dummy_list_mutex);
 LIST_FOR_EACH (dev, list_node, &dummy_list) {
+if (netdev_get_class(&dev->up) != netdev_class) {
+continue;
+}
 ovs_mutex_lock(&dev->mutex);
 dummy_packet_conn_run(dev);
 ovs_mutex_unlock(&dev->mutex);
@@ -636,12 +639,15 @@ netdev_dummy_run(const struct netdev_class *netdev_class OVS_UNUSED)
 }
 
 static void
-netdev_dummy_wait(const struct netdev_class *netdev_class OVS_UNUSED)
+netdev_dummy_wait(const struct netdev_class *netdev_class)
 {
 struct netdev_dummy *dev;
 
 ovs_mutex_lock(&dummy_list_mutex);
 LIST_FOR_EACH (dev, list_node, &dummy_list) {
+if (netdev_get_class(&dev->up) != netdev_class) {
+continue;
+}
 ovs_mutex_lock(&dev->mutex);
 dummy_packet_conn_wait(&dev->conn);
 ovs_mutex_unlock(&dev->mutex);
@@ -1380,6 +1386,9 @@ netdev_dummy_update_flags(struct netdev *netdev_,
 static const struct netdev_class dummy_class =
 NETDEV_DUMMY_CLASS("dummy", false, NULL);
 
+static const struct netdev_class dummy_internal_class =
+NETDEV_DUMMY_CLASS("dummy-internal", false, NULL);
+
 static const struct netdev_class dummy_pmd_class =
 NETDEV_DUMMY_CLASS("dummy-pmd", true,
netdev_dummy_reconfigure);
@@ -1751,6 +1760,7 @@ netdev_dummy_register(enum dummy_level level)
 netdev_dummy_override("system");
 }
 netdev_register_provider(&dummy_class);
+netdev_register_provider(&dummy_internal_class);
 netdev_register_provider(&dummy_pmd_class);
 
 netdev_vport_tunnel_register();
diff --git a/tests/bridge.at b/tests/bridge.at
index 37c55ba..3dbabe5 100644
--- a/tests/bridge.at
+++ b/tests/bridge.at
@@ -12,7 +12,7 @@ add_of_ports br0 1 2
 AT_CHECK([ovs-appctl dpif/show], [0], [dnl
 dummy@ovs-dummy: hit:0 missed:0
br0:
-   br0 65534/100: (dummy)
+   br0 65534/100: (dummy-internal)
p1 1/1: (dummy)
p2 2/2: (dummy)
 ])
@@ -23,7 +23,7 @@ AT_CHECK([ovs-appctl dpctl/del-if dummy@ovs-dummy p1])
 AT_CHECK([ovs-appctl dpif/show], [0], [dnl
 dummy@ovs-dummy: hit:0 missed:0
br0:
-   br0 65534/100: (dummy)
+   br0 65534/100: (dummy-internal)
p2 2/2: (dummy)
 ])
 
@@ -32,7 +32,7 @@ AT_CHECK([ovs-vsctl del-port p2])
 AT_CHECK([ovs-appctl dpif/show], [0], [dnl
 dummy@ovs-dummy: hit:0 missed:0
br0:
-   br0 65534/100: (dummy)
+   br0 65534/100: (dummy-internal)
p1 1/1: (dummy)
 ])
 OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
diff --git a/tests/dpctl.at b/tests/dpctl.at
index b6d5dd6..8c761c8 100644
--- a/tests/dpctl.at
+++ b/tests/dpctl.at
@@ -23,14 +23,14 @@ AT_CHECK([ovs-appctl dpctl/show dummy@br0], [0], [dnl
 dummy@br0:
lookups: hit:0 

[ovs-dev] [PATCH V5 7/7] netdev-dpdk: add support for Jumbo Frames

2016-08-09 Thread Mark Kavanagh
Add support for Jumbo Frames to DPDK-enabled port types,
using single-segment-mbufs.

Using this approach, the amount of memory allocated to each mbuf
to store frame data is increased to a value greater than 1518B
(typical Ethernet maximum frame length). The increased space
available in the mbuf means that an entire Jumbo Frame of a specific
size can be carried in a single mbuf, as opposed to partitioning
it across multiple mbuf segments.

The amount of space allocated to each mbuf to hold frame data is
defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
parameter.

Signed-off-by: Mark Kavanagh 
[diproiet...@vmware.com rebased]
Signed-off-by: Daniele Di Proietto 
---

v5:
- rename dpdk_mp_configure to netdev_dpdk_mempool_configure
- consolidate socket_id and mtu changes within
  netdev_dpdk_mempool_configure
- add lower bounds check for user-supplied MTU
- add socket_id and mtu fields to mempool configure error report
- minor cosmetic changes

v4:
- restore error reporting in *_reconfigure functions (for
  non-mtu-configuration based errors)
- remove 'goto' in the event of dpdk_mp_configure failure
- remove superfluous error variables

 v3:
- replace netdev_dpdk.last_mtu with local variable
- add comment for dpdk_mp_configure

 v2:
 - rebase to HEAD of master
 - fall back to previous 'good' MTU if reconfigure fails
 - introduce new field 'last_mtu' in struct netdev_dpdk to facilitate
   fall-back
 - rename 'mtu_request' to 'requested_mtu' in struct netdev_dpdk
 - remove rebasing artifact in INSTALL.DPDK-Advanced.md
 - remove superfluous variable in dpdk_mp_configure
 - fix minor coding style infraction
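
As a quick check of the frame-size arithmetic used in the documentation hunk of this patch (MTU = frame length minus 18B of L2 header and CRC), here is a minimal sketch. The helper and macro names are hypothetical and are not the macros used in lib/netdev-dpdk.c; the 14B header and 4B CRC assume an untagged Ethernet frame.

```c
/* Hypothetical names, for illustration only.  Assumes an untagged
 * Ethernet frame: 14B header plus 4B CRC, i.e. 18B of L2 overhead. */
#define TOY_ETH_HEADER_LEN 14
#define TOY_ETH_CRC_LEN 4
#define TOY_L2_OVERHEAD (TOY_ETH_HEADER_LEN + TOY_ETH_CRC_LEN)

/* Largest frame that carries an IP packet of 'mtu' bytes. */
static int
toy_mtu_to_frame_len(int mtu)
{
    return mtu + TOY_L2_OVERHEAD;
}

/* MTU to configure so that frames of 'frame_len' bytes fit. */
static int
toy_frame_len_to_mtu(int frame_len)
{
    return frame_len - TOY_L2_OVERHEAD;
}
```

So a standard 1500B MTU corresponds to the 1518B frame mentioned in the commit message, and a 9018B Jumbo Frame corresponds to an MTU of 9000.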

 INSTALL.DPDK-ADVANCED.md |  58 ++-
 INSTALL.DPDK.md  |   1 -
 NEWS |   1 +
 lib/netdev-dpdk.c| 145 +++
 4 files changed, 176 insertions(+), 29 deletions(-)

diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
index 0ab43d4..5e758ce 100755
--- a/INSTALL.DPDK-ADVANCED.md
+++ b/INSTALL.DPDK-ADVANCED.md
@@ -1,5 +1,5 @@
 OVS DPDK ADVANCED INSTALL GUIDE
-=
+===
 
 ## Contents
 
@@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
 7. [QOS](#qos)
 8. [Rate Limiting](#rl)
 9. [Flow Control](#fc)
-10. [Vsperf](#vsperf)
+10. [Jumbo Frames](#jumbo)
+11. [Vsperf](#vsperf)
 
 ##  1. Overview
 
@@ -862,7 +863,58 @@ respective parameter. To disable the flow control at tx side,
 
 `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
 
-##  10. Vsperf
+##  10. Jumbo Frames
+
+By default, DPDK ports are configured with standard Ethernet MTU (1500B). To
+enable Jumbo Frames support for a DPDK port, change the Interface's `mtu_request`
+attribute to a sufficiently large value.
+
+e.g. Add a DPDK Phy port with MTU of 9000:
+
+`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set Interface dpdk0 mtu_request=9000`
+
+e.g. Change the MTU of an existing port to 6200:
+
+`ovs-vsctl set Interface dpdk0 mtu_request=6200`
+
+When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments is
+increased, such that a full Jumbo Frame of a specific size may be accommodated
+within a single mbuf segment.
+
+Jumbo frame support has been validated against 9728B frames (largest frame size
+supported by Fortville NIC), using the DPDK `i40e` driver, but larger frames
+(particularly in use cases involving East-West traffic only), and other DPDK NIC
+drivers may be supported.
+
+### 10.1 vHost Ports and Jumbo Frames
+
+Some additional configuration is needed to take advantage of jumbo frames with
+vhost ports:
+
+1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
+the QEMU command line snippet below:
+
+```
+'-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
+'-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
+```
+
+2. Where virtio devices are bound to the Linux kernel driver in a guest
+   environment (i.e. interfaces are not bound to an in-guest DPDK driver),
+   the MTU of those logical network interfaces must also be increased to a
+   sufficiently large value. This avoids segmentation of Jumbo Frames
+   received in the guest. Note that 'MTU' refers to the length of the IP
+   packet only, and not that of the entire frame.
+
+   To calculate the exact MTU of a standard IPv4 frame, subtract the L2
+   header and CRC lengths (i.e. 18B) from the max supported frame size.
+   So, to set the MTU for a 9018B Jumbo Frame:
+
+   ```
+   ifconfig eth1 mtu 9000
+   ```
+
+##  11. Vsperf
 
 Vsperf project goal is to develop vSwitch test framework that can be used to
 validate the suitability of different vSwitch implementations in a Telco deployment
diff --git 

[ovs-dev] [PATCH V5 2/7] vswitchd: Introduce 'mtu_request' column in Interface.

2016-08-09 Thread Mark Kavanagh
From: Daniele Di Proietto 

The 'mtu_request' column can be used to set the MTU of a specific
interface.

This column is useful because it will allow changing the MTU of DPDK
devices (implemented in a future commit), which are not accessible
outside the ovs-vswitchd process, but it can be used for kernel
interfaces as well.

The current implementation of set_mtu() in netdev-dpdk is removed
because it's broken.  It will be reintroduced by a subsequent commit on
this series.

Signed-off-by: Daniele Di Proietto 
---
 NEWS   |  2 ++
 lib/netdev-dpdk.c  | 53 +-
 vswitchd/bridge.c  |  9 
 vswitchd/vswitch.ovsschema | 10 +++--
 vswitchd/vswitch.xml   | 52 +
 5 files changed, 58 insertions(+), 68 deletions(-)

diff --git a/NEWS b/NEWS
index c2ed71d..ce10982 100644
--- a/NEWS
+++ b/NEWS
@@ -101,6 +101,8 @@ Post-v2.5.0
- ovs-pki: Changed message digest algorithm from SHA-1 to SHA-512 because
  SHA-1 is no longer secure and some operating systems have started to
  disable it in OpenSSL.
+   - Add 'mtu_request' column to the Interface table. It can be used to
+ configure the MTU of non-internal ports.
 
 
 v2.5.0 - 26 Feb 2016
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index f37ec1c..60db568 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1639,57 +1639,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup)
 }
 
 static int
-netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
-{
-struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-int old_mtu, err, dpdk_mtu;
-struct dpdk_mp *old_mp;
-struct dpdk_mp *mp;
-uint32_t buf_size;
-
-ovs_mutex_lock(&dpdk_mutex);
-ovs_mutex_lock(&dev->mutex);
-if (dev->mtu == mtu) {
-err = 0;
-goto out;
-}
-
-buf_size = dpdk_buf_size(mtu);
-dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
-
-mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
-if (!mp) {
-err = ENOMEM;
-goto out;
-}
-
-rte_eth_dev_stop(dev->port_id);
-
-old_mtu = dev->mtu;
-old_mp = dev->dpdk_mp;
-dev->dpdk_mp = mp;
-dev->mtu = mtu;
-dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
-
-err = dpdk_eth_dev_init(dev);
-if (err) {
-dpdk_mp_put(mp);
-dev->mtu = old_mtu;
-dev->dpdk_mp = old_mp;
-dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
-dpdk_eth_dev_init(dev);
-goto out;
-}
-
-dpdk_mp_put(old_mp);
-netdev_change_seq_changed(netdev);
-out:
-ovs_mutex_unlock(&dev->mutex);
-ovs_mutex_unlock(&dpdk_mutex);
-return err;
-}
-
-static int
 netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier);
 
 static int
@@ -2964,7 +2913,7 @@ netdev_dpdk_vhost_cuse_reconfigure(struct netdev *netdev)
 netdev_dpdk_set_etheraddr,\
 netdev_dpdk_get_etheraddr,\
 netdev_dpdk_get_mtu,  \
-netdev_dpdk_set_mtu,  \
+NULL,   /* set_mtu */ \
 netdev_dpdk_get_ifindex,  \
 GET_CARRIER,  \
 netdev_dpdk_get_carrier_resets,   \
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index ddf1fe5..397be70 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -775,6 +775,15 @@ bridge_delete_or_reconfigure_ports(struct bridge *br)
 goto delete;
 }
 
+if (iface->cfg->n_mtu_request == 1
+&& strcmp(iface->type,
+  ofproto_port_open_type(br->type, "internal"))) {
+/* Try to set the MTU to the requested value.  This is not done
+ * for internal interfaces, since their MTU is decided by the
+ * ofproto module, based on other ports in the bridge. */
+netdev_set_mtu(iface->netdev, *iface->cfg->mtu_request);
+}
+
 /* If the requested OpenFlow port for 'iface' changed, and it's not
  * already the correct port, then we might want to temporarily delete
  * this interface, so we can add it back again with the new OpenFlow
diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
index 32fdf28..8966803 100644
--- a/vswitchd/vswitch.ovsschema
+++ b/vswitchd/vswitch.ovsschema
@@ -1,6 +1,6 @@
 {"name": "Open_vSwitch",
- "version": "7.13.0",
- "cksum": "889248633 22774",
+ "version": "7.14.0",
+ "cksum": "3974332717 22936",
  "tables": {
"Open_vSwitch": {
  "columns": {
@@ -321,6 +321,12 @@
"mtu": {
  "type": {"key": "integer", "min": 0, "max": 1},
  "ephemeral": true},
+   "mtu_request": {
+ "type": {
+   "key": {"type": "integer",
+   "minInteger": 1},
+   

[ovs-dev] [PATCH V5 3/7] netdev: Pass 'netdev_class' to ->run() and ->wait().

2016-08-09 Thread Mark Kavanagh
From: Daniele Di Proietto 

This will allow run() and wait() methods to be shared between different
classes and still perform class-specific work.

Signed-off-by: Daniele Di Proietto 
---
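To illustrate why passing the class pointer is useful, here is a minimal sketch of one run() implementation shared by two classes. All names (toy_class, toy_dev, toy_shared_run) are hypothetical and only mimic the shape of 'struct netdev_class'; this is not code from the tree.

```c
#include <stddef.h>

/* Hypothetical, much reduced stand-in for 'struct netdev_class'. */
struct toy_class {
    const char *type;
    void (*run)(const struct toy_class *);
};

/* Hypothetical device: remembers its class and how often it was run. */
struct toy_dev {
    const struct toy_class *cls;
    int n_runs;
};

static void toy_shared_run(const struct toy_class *);

/* Two classes that share the same run() implementation. */
static const struct toy_class toy_plain = { "toy", toy_shared_run };
static const struct toy_class toy_internal = { "toy-internal",
                                               toy_shared_run };

static struct toy_dev toy_devs[] = {
    { &toy_plain, 0 },
    { &toy_internal, 0 },
    { &toy_plain, 0 },
};

/* Services only the devices that belong to 'cls', so calling run() once
 * per registered class does not process any device twice. */
static void
toy_shared_run(const struct toy_class *cls)
{
    for (size_t i = 0; i < sizeof toy_devs / sizeof toy_devs[0]; i++) {
        if (toy_devs[i].cls != cls) {
            continue;
        }
        toy_devs[i].n_runs++;
    }
}
```

Without the argument, the shared function would have no way to tell on whose behalf it was called and would have to walk every device on each call.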
 lib/netdev-bsd.c  |  6 +++---
 lib/netdev-dummy.c|  4 ++--
 lib/netdev-linux.c|  6 +++---
 lib/netdev-provider.h | 14 ++
 lib/netdev-vport.c|  4 ++--
 lib/netdev.c  |  4 ++--
 6 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/lib/netdev-bsd.c b/lib/netdev-bsd.c
index 2bba0ed..75a330b 100644
--- a/lib/netdev-bsd.c
+++ b/lib/netdev-bsd.c
@@ -146,7 +146,7 @@ static void ifr_set_flags(struct ifreq *, int flags);
 static int af_link_ioctl(unsigned long command, const void *arg);
 #endif
 
-static void netdev_bsd_run(void);
+static void netdev_bsd_run(const struct netdev_class *);
 static int netdev_bsd_get_mtu(const struct netdev *netdev_, int *mtup);
 
 static bool
@@ -180,7 +180,7 @@ netdev_get_kernel_name(const struct netdev *netdev)
  * interface status changes, and eventually calls all the user callbacks.
  */
 static void
-netdev_bsd_run(void)
+netdev_bsd_run(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 rtbsd_notifier_run();
 }
@@ -190,7 +190,7 @@ netdev_bsd_run(void)
  * be called.
  */
 static void
-netdev_bsd_wait(void)
+netdev_bsd_wait(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 rtbsd_notifier_wait();
 }
diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index a950409..2a6aa56 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -622,7 +622,7 @@ dummy_netdev_get_conn_state(struct dummy_packet_conn *conn)
 }
 
 static void
-netdev_dummy_run(void)
+netdev_dummy_run(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 struct netdev_dummy *dev;
 
@@ -636,7 +636,7 @@ netdev_dummy_run(void)
 }
 
 static void
-netdev_dummy_wait(void)
+netdev_dummy_wait(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 struct netdev_dummy *dev;
 
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index fa37bcf..1b5f7c1 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -526,7 +526,7 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
  * changes in the device miimon status, so we can use atomic_count. */
 static atomic_count miimon_cnt = ATOMIC_COUNT_INIT(0);
 
-static void netdev_linux_run(void);
+static void netdev_linux_run(const struct netdev_class *);
 
 static int netdev_linux_do_ethtool(const char *name, struct ethtool_cmd *,
int cmd, const char *cmd_name);
@@ -623,7 +623,7 @@ netdev_linux_miimon_enabled(void)
 }
 
 static void
-netdev_linux_run(void)
+netdev_linux_run(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 struct nl_sock *sock;
 int error;
@@ -697,7 +697,7 @@ netdev_linux_run(void)
 }
 
 static void
-netdev_linux_wait(void)
+netdev_linux_wait(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 struct nl_sock *sock;
 
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index ae390cb..5bcfeba 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -236,15 +236,21 @@ struct netdev_class {
 int (*init)(void);
 
 /* Performs periodic work needed by netdevs of this class.  May be null if
- * no periodic work is necessary. */
-void (*run)(void);
+ * no periodic work is necessary.
+ *
+ * 'netdev_class' points to the class.  It is useful in case the same
+ * function is used to implement different classes. */
+void (*run)(const struct netdev_class *netdev_class);
 
 /* Arranges for poll_block() to wake up if the "run" member function needs
  * to be called.  Implementations are additionally required to wake
  * whenever something changes in any of its netdevs which would cause their
  * ->change_seq() function to change its result.  May be null if nothing is
- * needed here. */
-void (*wait)(void);
+ * needed here.
+ *
+ * 'netdev_class' points to the class.  It is useful in case the same
+ * function is used to implement different classes. */
+void (*wait)(const struct netdev_class *netdev_class);
 
 /* ##  ## */
 /* ## netdev Functions ## */
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 87a30f8..7eabd2c 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -321,7 +321,7 @@ netdev_vport_update_flags(struct netdev *netdev OVS_UNUSED,
 }
 
 static void
-netdev_vport_run(void)
+netdev_vport_run(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 uint64_t seq;
 
@@ -334,7 +334,7 @@ netdev_vport_run(void)
 }
 
 static void
-netdev_vport_wait(void)
+netdev_vport_wait(const struct netdev_class *netdev_class OVS_UNUSED)
 {
 uint64_t seq;
 
diff --git a/lib/netdev.c b/lib/netdev.c
index 75bf1cb..589d37c 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -160,7 +160,7 @@ netdev_run(void)
 struct netdev_registered_class *rc;
 CMAP_FOR_EACH (rc, cmap_node, 

[ovs-dev] [PATCH V5 5/7] tests: Add a new MTU test.

2016-08-09 Thread Mark Kavanagh
From: Daniele Di Proietto 

Also, netdev-dummy needs to call netdev_change_seq_changed() in
set_mtu().

Signed-off-by: Daniele Di Proietto 
---
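The set_mtu() change in this patch follows a common pattern: bump the change sequence only when the value really changes, so that watchers are not woken for no-op updates. A minimal sketch, with a hypothetical 'struct toy_netdev' and a plain counter standing in for netdev_change_seq_changed():

```c
#include <stdint.h>

/* Hypothetical, reduced device state; not the real netdev_dummy. */
struct toy_netdev {
    int mtu;
    uint64_t change_seq;    /* Stand-in for the netdev change sequence. */
};

/* Sets the MTU and records a change only if the value differs.  Always
 * returns 0, like the dummy implementation in the patch. */
static int
toy_set_mtu(struct toy_netdev *dev, int mtu)
{
    if (dev->mtu != mtu) {
        dev->mtu = mtu;
        dev->change_seq++;
    }
    return 0;
}
```

The new test depends on the notification half of this: 'wait-until Interface p1 mtu=1600' can only succeed if the MTU change is observed after mtu_request is set.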
 lib/netdev-dummy.c|  5 -
 tests/ofproto-dpif.at | 30 ++
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index 92af15f..c8f82b7 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -1155,7 +1155,10 @@ netdev_dummy_set_mtu(const struct netdev *netdev, int mtu)
 struct netdev_dummy *dev = netdev_dummy_cast(netdev);
 
 ovs_mutex_lock(&dev->mutex);
-dev->mtu = mtu;
+if (dev->mtu != mtu) {
+dev->mtu = mtu;
+netdev_change_seq_changed(netdev);
+}
 ovs_mutex_unlock(&dev->mutex);
 
 return 0;
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index a46fc81..3638063 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -8859,3 +8859,33 @@ n_packets=0
 
 OVS_VSWITCHD_STOP
 AT_CLEANUP
+
+AT_SETUP([ofproto - set mtu])
+OVS_VSWITCHD_START
+
+add_of_ports br0 1
+
+# Check that initial MTU is 1500 for 'br0' and 'p1'.
+AT_CHECK([ovs-vsctl get Interface br0 mtu], [0], [dnl
+1500
+])
+AT_CHECK([ovs-vsctl get Interface p1 mtu], [0], [dnl
+1500
+])
+
+# Request new MTU for 'p1'
+AT_CHECK([ovs-vsctl set Interface p1 mtu_request=1600])
+
+# Check that the new MTU is applied
+AT_CHECK([ovs-vsctl --timeout=10 wait-until Interface p1 mtu=1600])
+# The internal port 'br0' should have the same MTU value as p1, because it's
+# the new bridge minimum.
+AT_CHECK([ovs-vsctl --timeout=10 wait-until Interface br0 mtu=1600])
+
+AT_CHECK([ovs-vsctl del-port br0 p1])
+
+# When 'p1' is deleted, the internal port should return to the default MTU
+AT_CHECK([ovs-vsctl --timeout=10 wait-until Interface br0 mtu=1500])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
-- 
1.9.3



[ovs-dev] [PATCH V5 1/7] ofproto: Consider datapath_type when looking for internal ports.

2016-08-09 Thread Mark Kavanagh
From: Daniele Di Proietto 

Interfaces with type "internal" end up having a netdev with type "tap"
in the dpif-netdev datapath, so a strcmp will fail to match internal
interfaces.

We can translate the types with ofproto_port_open_type() before calling
strcmp to fix this.

This fixes a minor issue where internal interfaces are considered
non-internal in the userspace datapath for the purpose of adjusting the
MTU.

Signed-off-by: Daniele Di Proietto 
---
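The mismatch described above can be reproduced with a small sketch. Everything here is hypothetical: toy_open_type() only imitates ofproto_port_open_type(), and it assumes that the userspace datapath type is named "netdev" and opens "internal" ports as "tap", while other datapaths leave the type unchanged.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for ofproto_port_open_type().  Assumption: only
 * the "netdev" datapath remaps "internal", and it remaps it to "tap". */
static const char *
toy_open_type(const char *datapath_type, const char *port_type)
{
    if (!strcmp(datapath_type, "netdev") && !strcmp(port_type, "internal")) {
        return "tap";
    }
    return port_type;
}

/* Old check: compares the netdev type with the untranslated name. */
static bool
toy_is_internal_old(const char *netdev_type)
{
    return !strcmp(netdev_type, "internal");
}

/* New check: translates "internal" for the datapath before comparing. */
static bool
toy_is_internal_new(const char *datapath_type, const char *netdev_type)
{
    return !strcmp(netdev_type, toy_open_type(datapath_type, "internal"));
}
```

With the old check, an internal port on the userspace datapath (netdev type "tap") is reported as non-internal; the new check matches it and still matches "internal" where no translation takes place.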
 ofproto/ofproto.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index 8e59c69..088f91a 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -220,7 +220,8 @@ static void learned_cookies_flush(struct ofproto *, struct ovs_list *dead_cookie
 /* ofport. */
 static void ofport_destroy__(struct ofport *) OVS_EXCLUDED(ofproto_mutex);
 static void ofport_destroy(struct ofport *, bool del);
-static inline bool ofport_is_internal(const struct ofport *);
+static inline bool ofport_is_internal(const struct ofproto *,
+  const struct ofport *);
 
 static int update_port(struct ofproto *, const char *devname);
 static int init_ports(struct ofproto *);
@@ -2465,7 +2466,7 @@ static void
 ofport_remove(struct ofport *ofport)
 {
 struct ofproto *p = ofport->ofproto;
-bool is_internal = ofport_is_internal(ofport);
+bool is_internal = ofport_is_internal(p, ofport);
 
 connmgr_send_port_status(ofport->ofproto->connmgr, NULL, &ofport->pp,
  OFPPR_DELETE);
@@ -2751,9 +2752,10 @@ init_ports(struct ofproto *p)
 }
 
 static inline bool
-ofport_is_internal(const struct ofport *port)
+ofport_is_internal(const struct ofproto *p, const struct ofport *port)
 {
-return !strcmp(netdev_get_type(port->netdev), "internal");
+return !strcmp(netdev_get_type(port->netdev),
+   ofproto_port_open_type(p->type, "internal"));
 }
 
 /* Find the minimum MTU of all non-datapath devices attached to 'p'.
@@ -2770,7 +2772,7 @@ find_min_mtu(struct ofproto *p)
 
 /* Skip any internal ports, since that's what we're trying to
  * set. */
-if (ofport_is_internal(ofport)) {
+if (ofport_is_internal(p, ofport)) {
 continue;
 }
 
@@ -2797,7 +2799,7 @@ update_mtu(struct ofproto *p, struct ofport *port)
 port->mtu = 0;
 return;
 }
-if (ofport_is_internal(port)) {
+if (ofport_is_internal(p, port)) {
 if (dev_mtu > p->min_mtu) {
if (!netdev_set_mtu(port->netdev, p->min_mtu)) {
dev_mtu = p->min_mtu;
@@ -2827,7 +2829,7 @@ update_mtu_ofproto(struct ofproto *p)
 HMAP_FOR_EACH (ofport, hmap_node, &p->ports) {
 struct netdev *netdev = ofport->netdev;
 
-if (ofport_is_internal(ofport)) {
+if (ofport_is_internal(p, ofport)) {
 if (!netdev_set_mtu(netdev, p->min_mtu)) {
 ofport->mtu = p->min_mtu;
 }
-- 
1.9.3



Re: [ovs-dev] [PATCH V3 7/7] netdev-dpdk: add support for Jumbo Frames

2016-08-09 Thread Kavanagh, Mark B
>On 08.08.2016 18:50, Mark Kavanagh wrote:
>> Add support for Jumbo Frames to DPDK-enabled port types,
>> using single-segment-mbufs.
>> 
>> Using this approach, the amount of memory allocated to each mbuf
>> to store frame data is increased to a value greater than 1518B
>> (typical Ethernet maximum frame length). The increased space
>> available in the mbuf means that an entire Jumbo Frame of a specific
>> size can be carried in a single mbuf, as opposed to partitioning
>> it across multiple mbuf segments.
>> 
>> The amount of space allocated to each mbuf to hold frame data is
>> defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
>> parameter.
>> 
>> Signed-off-by: Mark Kavanagh 
>> [diproiet...@vmware.com rebased]
>> Signed-off-by: Daniele Di Proietto 
>> ---
>> 
>> v4:
>> - restore error reporting in *_reconfigure functions (for
>>   non-mtu-configuration based errors)
>> - remove 'goto' in the event of dpdk_mp_configure failure
>> - remove superfluous error variables
>> 
>>  v3:
>> - replace netdev_dpdk.last_mtu with local variable
>> - add comment for dpdk_mp_configure
>> 
>>  v2:
>>  - rebase to HEAD of master
>>  - fall back to previous 'good' MTU if reconfigure fails
>>  - introduce new field 'last_mtu' in struct netdev_dpdk to facilitate
>>fall-back
>>  - rename 'mtu_request' to 'requested_mtu' in struct netdev_dpdk
>>  - remove rebasing artifact in INSTALL.DPDK-Advanced.md
>>  - remove superfluous variable in dpdk_mp_configure
>>  - fix minor coding style infraction
>> 
>> 
>>  INSTALL.DPDK-ADVANCED.md |  58 -
>>  INSTALL.DPDK.md  |   1 -
>>  NEWS |   1 +
>>  lib/netdev-dpdk.c| 162 
>> ---
>>  4 files changed, 194 insertions(+), 28 deletions(-)
>> 
>> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
>> index 0ab43d4..5e758ce 100755
>> --- a/INSTALL.DPDK-ADVANCED.md
>> +++ b/INSTALL.DPDK-ADVANCED.md
>> @@ -1,5 +1,5 @@
>>  OVS DPDK ADVANCED INSTALL GUIDE
>> -=
>> +===
>>  
>>  ## Contents
>>  
>> @@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
>>  7. [QOS](#qos)
>>  8. [Rate Limiting](#rl)
>>  9. [Flow Control](#fc)
>> -10. [Vsperf](#vsperf)
>> +10. [Jumbo Frames](#jumbo)
>> +11. [Vsperf](#vsperf)
>>  
>>  ##  1. Overview
>>  
>> @@ -862,7 +863,58 @@ respective parameter. To disable the flow control at tx 
>> side,
>>  
>>  `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
>>  
>> -##  10. Vsperf
>> +##  10. Jumbo Frames
>> +
>> +By default, DPDK ports are configured with standard Ethernet MTU (1500B). To
>> +enable Jumbo Frames support for a DPDK port, change the Interface's 
>> `mtu_request`
>> +attribute to a sufficiently large value.
>> +
>> +e.g. Add a DPDK Phy port with MTU of 9000:
>> +
>> +`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set 
>> Interface dpdk0 mtu_request=9000`
>> +
>> +e.g. Change the MTU of an existing port to 6200:
>> +
>> +`ovs-vsctl set Interface dpdk0 mtu_request=6200`
>> +
>> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are
>> +increased, such that a full Jumbo Frame of a specific size may be 
>> accommodated
>> +within a single mbuf segment.
>> +
>> +Jumbo frame support has been validated against 9728B frames (largest frame 
>> size
>> +supported by Fortville NIC), using the DPDK `i40e` driver, but larger frames
>> +(particularly in use cases involving East-West traffic only), and other 
>> DPDK NIC
>> +drivers may be supported.
>> +
>> +### 9.1 vHost Ports and Jumbo Frames
>> +
>> +Some additional configuration is needed to take advantage of jumbo frames 
>> with
>> +vhost ports:
>> +
>> +1. `mergeable buffers` must be enabled for vHost ports, as demonstrated 
>> in
>> +the QEMU command line snippet below:
>> +
>> +```
>> +'-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
>> +'-device 
>> virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
>> +```
>> +
>> +2. Where virtio devices are bound to the Linux kernel driver in a guest
>> +   environment (i.e. interfaces are not bound to an in-guest DPDK 
>> driver),
>> +   the MTU of those logical network interfaces must also be increased 
>> to a
>> +   sufficiently large value. This avoids segmentation of Jumbo Frames
>> +   received in the guest. Note that 'MTU' refers to the length of the IP
>> +   packet only, and not that of the entire frame.
>> +
>> +   To calculate the exact MTU of a standard IPv4 frame, subtract the L2
>> +   header and CRC lengths (i.e. 18B) from the max supported frame size.
>> +   So, to set the MTU for a 9018B Jumbo Frame:
>> +
>> +   ```
>> +   ifconfig eth1 mtu 9000
>> +   ```
>> +
>> +##  11. Vsperf
>>  
>>  Vsperf project 

[ovs-dev] OVS DPDK VFIO error

2016-08-09 Thread Kapil Adhikesavalu
Hi,

On an Intel Xeon E5-2697 with the IOMMU turned on and an Intel 82599 NIC, I
am getting the following error while binding the NIC using VFIO.
Kernel: 4.2.3 (Fedora 23); I haven't tried the latest kernel yet.

E5-2697 supports IOMMU VT-d, does

VFIO NIC binding steps,

modprobe vfio-pci
sudo /usr/bin/chmod a+x /dev/vfio
sudo /usr/bin/chmod 0666 /dev/vfio/*
$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci :04:00.0
$DPDK_DIR/tools/dpdk_nic_bind.py --status

Error
=

EAL: Detected 48 lcore(s)

EAL: Probing VFIO support...

EAL:   IOMMU type 1 (Type 1) is supported

EAL:   IOMMU type 8 (No-IOMMU) is not supported

EAL: VFIO support initialized


EAL: Master lcore 1 is ready (tid=83504bc0;cpuset=[1])

EAL: PCI device :04:00.0 on NUMA socket 0

EAL:   probe driver: 8086:154d rte_ixgbe_pmd

EAL:   set IOMMU type 1 (Type 1) failed, error 1 (Operation not permitted)

EAL:   set IOMMU type 8 (No-IOMMU) failed, error 19 (No such device)

EAL:   :04:00.0 failed to select IOMMU type

EAL: Error - exiting with code: 1

  Cause: Requested device :04:00.0 cannot be used


dmesg:
==

[0.997461] DMAR: Ignoring identity map for HW passthrough device
:00:1f.0 [0x0 - 0xff]

[0.997465] DMAR: Intel(R) Virtualization Technology for Directed I/O

[1.351801] DMAR: 32bit :00:1a.0 uses non-identity mapping

[1.362623] DMAR: 32bit :00:1d.0 uses non-identity mapping

[1.373601] DMAR: 32bit :01:00.4 uses non-identity mapping

[  297.035504] vfio-pci :04:00.0: Device is ineligible for IOMMU domain
attach due to platform RMRR requirement.  Contact your platform vendor.



[root@localhost bin]# cat /proc/cmdline

BOOT_IMAGE=/vmlinuz-4.2.3-300.fc23.x86_64 root=/dev/mapper/fedora-root ro
rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet
default_hugepagesz=1G hugepagesz=1G hugepages=16 hugepagesz=2M
hugepages=2048 iommu=pt intel_iommu=on


demsg | grep 10G  - 82599 controller

04:00.0 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter
(rev 01)

04:00.1 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter
(rev 01)

Regards
Kapil.


Re: [ovs-dev] [PATCH V4 7/7] netdev-dpdk: add support for Jumbo Frames

2016-08-09 Thread Ilya Maximets
On 08.08.2016 18:50, Mark Kavanagh wrote:
> Add support for Jumbo Frames to DPDK-enabled port types,
> using single-segment-mbufs.
> 
> Using this approach, the amount of memory allocated to each mbuf
> to store frame data is increased to a value greater than 1518B
> (typical Ethernet maximum frame length). The increased space
> available in the mbuf means that an entire Jumbo Frame of a specific
> size can be carried in a single mbuf, as opposed to partitioning
> it across multiple mbuf segments.
> 
> The amount of space allocated to each mbuf to hold frame data is
> defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
> parameter.
> 
> Signed-off-by: Mark Kavanagh 
> [diproiet...@vmware.com rebased]
> Signed-off-by: Daniele Di Proietto 
> ---
> 
> v4:
> - restore error reporting in *_reconfigure functions (for
>   non-mtu-configuration based errors)
> - remove 'goto' in the event of dpdk_mp_configure failure
> - remove superfluous error variables
> 
>  v3:
> - replace netdev_dpdk.last_mtu with local variable
> - add comment for dpdk_mp_configure
> 
>  v2:
>  - rebase to HEAD of master
>  - fall back to previous 'good' MTU if reconfigure fails
>  - introduce new field 'last_mtu' in struct netdev_dpdk to facilitate
>fall-back
>  - rename 'mtu_request' to 'requested_mtu' in struct netdev_dpdk
>  - remove rebasing artifact in INSTALL.DPDK-Advanced.md
>  - remove superfluous variable in dpdk_mp_configure
>  - fix minor coding style infraction
> 
> 
>  INSTALL.DPDK-ADVANCED.md |  58 -
>  INSTALL.DPDK.md  |   1 -
>  NEWS |   1 +
>  lib/netdev-dpdk.c| 162 
> ---
>  4 files changed, 194 insertions(+), 28 deletions(-)
> 
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index 0ab43d4..5e758ce 100755
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -1,5 +1,5 @@
>  OVS DPDK ADVANCED INSTALL GUIDE
> -=
> +===
>  
>  ## Contents
>  
> @@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
>  7. [QOS](#qos)
>  8. [Rate Limiting](#rl)
>  9. [Flow Control](#fc)
> -10. [Vsperf](#vsperf)
> +10. [Jumbo Frames](#jumbo)
> +11. [Vsperf](#vsperf)
>  
>  ##  1. Overview
>  
> @@ -862,7 +863,58 @@ respective parameter. To disable the flow control at tx side,
>  
>  `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
>  
> -##  10. Vsperf
> +##  10. Jumbo Frames
> +
> +By default, DPDK ports are configured with standard Ethernet MTU (1500B). To
> +enable Jumbo Frames support for a DPDK port, change the Interface's `mtu_request`
> +attribute to a sufficiently large value.
> +
> +e.g. Add a DPDK Phy port with MTU of 9000:
> +
> +`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set Interface dpdk0 mtu_request=9000`
> +
> +e.g. Change the MTU of an existing port to 6200:
> +
> +`ovs-vsctl set Interface dpdk0 mtu_request=6200`
> +
> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments is
> +increased, such that a full Jumbo Frame of a specific size may be accommodated
> +within a single mbuf segment.
> +
> +Jumbo frame support has been validated against 9728B frames (largest frame size
> +supported by Fortville NIC), using the DPDK `i40e` driver, but larger frames
> +(particularly in use cases involving East-West traffic only), and other DPDK NIC
> +drivers may be supported.
> +
> +### 10.1 vHost Ports and Jumbo Frames
> +
> +Some additional configuration is needed to take advantage of jumbo frames with
> +vhost ports:
> +
> +1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
> +the QEMU command line snippet below:
> +
> +```
> +'-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
> +'-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
> +```
> +
> +2. Where virtio devices are bound to the Linux kernel driver in a guest
> +   environment (i.e. interfaces are not bound to an in-guest DPDK driver),
> +   the MTU of those logical network interfaces must also be increased to a
> +   sufficiently large value. This avoids segmentation of Jumbo Frames
> +   received in the guest. Note that 'MTU' refers to the length of the IP
> +   packet only, and not that of the entire frame.
> +
> +   To calculate the exact MTU of a standard IPv4 frame, subtract the L2
> +   header and CRC lengths (i.e. 18B) from the max supported frame size.
> +   So, to set the MTU for a 9018B Jumbo Frame:
> +
> +   ```
> +   ifconfig eth1 mtu 9000
> +   ```
> +
> +##  11. Vsperf
>  
>  Vsperf project goal is to develop vSwitch test framework that can be used to
>  validate the suitability of different vSwitch implementations in a Telco 
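
The MTU arithmetic in the quoted section 10 text can be sketched as follows. This is only an illustration, assuming untagged Ethernet frames (14B L2 header plus 4B CRC, i.e. the 18B from the text); the constant and function names are hypothetical and are not taken from netdev-dpdk.c.

```python
# Assumption: untagged Ethernet, so the L2 overhead is header + CRC = 18 bytes.
ETHER_HDR_LEN = 14  # destination MAC + source MAC + EtherType
ETHER_CRC_LEN = 4   # frame check sequence
L2_OVERHEAD = ETHER_HDR_LEN + ETHER_CRC_LEN


def mtu_for_frame(max_frame_len):
    """Return the MTU to configure for a given maximum frame length."""
    if max_frame_len <= L2_OVERHEAD:
        raise ValueError("frame length must exceed the L2 overhead")
    return max_frame_len - L2_OVERHEAD


def frame_for_mtu(mtu):
    """Return the on-wire frame length implied by a given MTU."""
    return mtu + L2_OVERHEAD


print(mtu_for_frame(9018))  # the 9018B example from the quoted text
```

Under the same assumption, the 9728B frame size mentioned in the quoted text corresponds to `mtu_for_frame(9728)`, i.e. an MTU of 9710.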

Re: [ovs-dev] [RFC/PATCH v2] Make the PID part of socket path configurable

2016-08-09 Thread Christian Svensson
On Mon, Aug 8, 2016 at 11:59 PM, Ben Pfaff  wrote:
>
> Including the PID allows multiple daemons of a single type to run.

While technically true, in practice I would argue this doesn't hold, and it is
indeed one of the reasons I made this change.

The documented way to connect to a daemon is "ovs-appctl -t my-daemon".
That reads the PID file "my-daemon.pid" and uses "my-daemon.$PID.ctl".
This requires the PID file name given to the daemon to match its internal
"program_name" for the socket resolution code to work.
I.e., I cannot pass --pidfile=my-daemon2.pid and then expect ovs-appctl -t
my-daemon2 to work.
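
To make the mismatch concrete, here is a minimal sketch of the two path constructions described above. It is not the OVS code: the function names, the run directory, and the PID are assumptions made up for illustration.

```python
import posixpath

RUNDIR = "/var/run/openvswitch"  # assumed run directory, for illustration only


def daemon_socket_path(program_name, pid, rundir=RUNDIR):
    """Socket the daemon creates: named after its internal program name."""
    return posixpath.join(rundir, "%s.%d.ctl" % (program_name, pid))


def client_socket_path(target, pid_from_pidfile, rundir=RUNDIR):
    """Socket 'ovs-appctl -t TARGET' looks for after reading TARGET.pid."""
    return posixpath.join(rundir, "%s.%d.ctl" % (target, pid_from_pidfile))


pid = 4242  # hypothetical PID

# Pidfile name matches the program name: both sides agree on the path.
same_name_ok = (client_socket_path("my-daemon", pid)
                == daemon_socket_path("my-daemon", pid))

# Daemon started with --pidfile=my-daemon2.pid but program name "my-daemon":
# the client asks for my-daemon2.4242.ctl, which the daemon never created.
renamed_pidfile_ok = (client_socket_path("my-daemon2", pid)
                      == daemon_socket_path("my-daemon", pid))

print(same_name_ok, renamed_pidfile_ok)
```

The first lookup succeeds because the pidfile name and the program name agree; the second fails because the client derives the socket name from the pidfile name.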

In order to call the second one you would have to do:
ovs-appctl -t $PWD/my-daemon.$( pidfile isn't difficult.
>
> I don't understand why you want to change this so badly.  It's not hard
> to read a pidfile.


Just because it isn't difficult doesn't mean it's the right way to do
things.

> I definitely don't want to fork OVS behavior here based on a
> configuration flag, as I already explained.
>
But you already do!
The configuration flag today is called WIN32. With this patch it's called
WITH_PID_SOCKET_PATH and defaults to YES/NO depending on the platform.
With the patch, much less code sits behind #ifdef $OS and more sits behind
#ifdef $FEATURE. That should make more code testable on both
platforms, so less of the code is dead on any given platform.


[ovs-dev] ovn: conditional monitor feature

2016-08-09 Thread Liran Schour
Hi,

Do we plan to include the ovn conditional monitoring usage in 2.6.0?
Performance evaluation shows up to 75% reduction of computation at the
ovn-controller and also a reduction of computation at the server side,
depending on the spread of the logical networks over the physical hosts
(numbers are included in the commit message).

The client side (ovn) of the patch series is still under review: 
http://openvswitch.org/pipermail/dev/2016-August/077007.html

There was also additional work using conditional monitoring that
introduced a flood-fill optimization in the ovn-controller to reduce
the number of hops required to reveal all datapaths:
http://openvswitch.org/pipermail/dev/2016-August/077113.html
This optimization can be added on top of the original patch.

Thanks,
- Liran
