Re: [ovs-dev] [PATCH] Add VxLAN-GBP support for user space data path

2016-04-19 Thread Li, Johnson
> -----Original Message-----
> From: Jesse Gross [mailto:je...@kernel.org]
> Sent: Tuesday, April 19, 2016 11:22 PM
> To: Li, Johnson 
> Cc: ovs dev 
> Subject: Re: [ovs-dev] [PATCH] Add VxLAN-GBP support for user space data
> path
> 
> On Tue, Apr 19, 2016 at 3:20 AM, Johnson.Li  wrote:
> > diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
> > index e398562..e6b35ed 100644
> > --- a/lib/netdev-vport.c
> > +++ b/lib/netdev-vport.c
> > @@ -1297,10 +1298,16 @@ netdev_vxlan_pop_header(struct dp_packet *packet)
> >  return EINVAL;
> >  }
> >
> > -if (get_16aligned_be32(&vxh->vx_flags) != htonl(VXLAN_FLAGS) ||
> > -   (get_16aligned_be32(&vxh->vx_vni) & htonl(0xff))) {
> > -VLOG_WARN_RL(&err_rl, "invalid vxlan flags=%#x vni=%#x\n",
> > - ntohl(get_16aligned_be32(&vxh->vx_flags)),
> > +vxh_flags = get_16aligned_be32(&vxh->vx_flags);
> > +if ((vxh_flags & htonl(VXLAN_GBP_FLAGS)) ==
> > + htonl(VXLAN_GBP_FLAGS)) {
> 
> There are many overlapping extensions to VXLAN. We should only try to
> parse these flags when the GBP extension is actually turned on.
> 
> > +tnl->gbp_id = htons(ntohl(vxh_flags) & 0xffff); /*
> > + VXLAN_GBP_ID_MASK */
> 
> There is a field called tnl->gbp_flags which should presumably also be
> populated here (and on transmit).
> 
Ok, I will try to check each flag individually and store them into 
tnl->gbp_flags.
> > diff --git a/lib/packets.h b/lib/packets.h
> > index 8139a6b..d9a149d 100644
> > --- a/lib/packets.h
> > +++ b/lib/packets.h
> > @@ -1002,6 +1002,7 @@ struct vxlanhdr {
> >  };
> >
> >  #define VXLAN_FLAGS 0x08000000  /* struct vxlanhdr.vx_flags required
> > value. */
> > +#define VXLAN_GBP_FLAGS 0x88000000  /* Group Based Policy */
> 
> Can you please break these down into individual flags so that it is easier to
> tell what is going on (and you could use them in parsing as well)?
> 
> Can you also please add some test cases?
I will. Thanks for your advice.
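FWIW, breaking the combined VXLAN_GBP_FLAGS value down into individual bits could look roughly like this. This is only a sketch: the bit positions follow the GBP draft and the Linux kernel's definitions, but the struct and function names are illustrative, not the final OVS patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Individual VXLAN header flag bits (host byte order). */
#define VXLAN_HF_VNI             (1u << 27)  /* I bit: VNI is valid. */
#define VXLAN_HF_GBP             (1u << 31)  /* G bit: GBP extension. */
#define VXLAN_GBP_DONT_LEARN     (1u << 22)  /* D bit. */
#define VXLAN_GBP_POLICY_APPLIED (1u << 19)  /* A bit. */
#define VXLAN_GBP_ID_MASK        0xffffu     /* 16-bit Group Policy ID. */

struct gbp_fields {
    uint16_t id;            /* Group Policy ID. */
    bool dont_learn;        /* Don't learn the source address. */
    bool policy_applied;    /* Policy was already enforced upstream. */
};

/* Parses the first 32-bit word of a VXLAN header (already converted to
 * host byte order with ntohl()).  Fails unless both G and I are set. */
static bool vxlan_parse_gbp(uint32_t flags_word, struct gbp_fields *gbp)
{
    if ((flags_word & (VXLAN_HF_GBP | VXLAN_HF_VNI))
        != (VXLAN_HF_GBP | VXLAN_HF_VNI)) {
        return false;
    }
    gbp->id = flags_word & VXLAN_GBP_ID_MASK;
    gbp->dont_learn = (flags_word & VXLAN_GBP_DONT_LEARN) != 0;
    gbp->policy_applied = (flags_word & VXLAN_GBP_POLICY_APPLIED) != 0;
    return true;
}
```

With individual bits, the parser can both validate the required G+I combination and populate tnl->gbp_flags from the D and A bits, as the review asks.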
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] Reply: Re: ovs + dpdk vhost-user match flows but cannot execute actions

2016-04-19 Thread lifuqiong
Hi Mauricio Vasquez:

 

root@host52:~# ovs-vsctl show

f2664a74-7523-4240-812a-4f022051346a

Bridge "ovsbr0"

Port "vhost-user-1"

Interface "vhost-user-1"

type: dpdkvhostuser

Port "dpdk0"

Interface "dpdk0"

type: dpdk

Port "ovsbr0"

Interface "ovsbr0"

type: internal

Port "vhost-user-0"

Interface "vhost-user-0"

type: dpdkvhostuser

Port "dpdk1"

Interface "dpdk1"

type: dpdk

 

root@host52:~# ovs-ofctl dump-flows ovsbr0

NXST_FLOW reply (xid=0x4):

cookie=0x0, duration=291278.976s, table=0, n_packets=4414, n_bytes=392730, 
idle_age=65534, hard_age=65534, priority=0 actions=NORMAL

 

 

root@host52:~# ovs-ofctl dump-ports ovsbr0

OFPST_PORT reply (xid=0x2): 5 ports

  port  4: rx pkts=2472, bytes=207008, drop=?, errs=0, frame=?, over=?, crc=?

   tx pkts=1893, bytes=181572, drop=49, errs=?, coll=?

  port LOCAL: rx pkts=23, bytes=1278, drop=0, errs=0, frame=0, over=0, crc=0

   tx pkts=631, bytes=28750, drop=0, errs=0, coll=0

  port  1: rx pkts=272, bytes=26283, drop=127396, errs=0, frame=?, over=?, crc=?

   tx pkts=0, bytes=0, drop=0, errs=0, coll=?

  port  2: rx pkts=0, bytes=0, drop=0, errs=0, frame=?, over=?, crc=?

   tx pkts=0, bytes=0, drop=0, errs=0, coll=?

  port  3: rx pkts=1919, bytes=184174, drop=?, errs=0, frame=?, over=?, crc=?

   tx pkts=2462, bytes=205658, drop=0, errs=?, coll=?

 

Thank you.

From: Mauricio Vásquez [mailto:mauricio.vasquezber...@studenti.polito.it] 
Sent: April 19, 2016 21:49
To: lifuqiong
Cc: dev@openvswitch.org
Subject: Re: Reply: [ovs-dev] ovs + dpdk vhost-user match flows but cannot execute 
actions

 

Hi lifuqiong, 

Could you provide the output of the following commands in your setup?
ovs-vsctl show
ovs-ofctl dump-flows
ovs-ofctl dump-ports

Mauricio Vasquez, 

 

On Sat, Apr 16, 2016 at 10:23 AM, lifuqiong  wrote:

Hi Mauricio:

 I changed my qemu version from 2.2.1 to 2.5.0 and the VMs can 
communicate with each other. But the VM cannot ping a PC on the outside 
network, even though they are in the same subnet.

 

PC (192.168.0.103/24) pings VM (192.168.0.90); the VM host's DPDK NIC and the PC 
are connected to the same L2 switch, with the DPDK NIC on switch port Eth1/0/8.

Investigation:

 I bound the VM's IP and MAC in the PC's ARP table, and bound the VM's 
MAC on port Eth1/0/8. While the PC is pinging the VM, I can see the output 
packets of Eth1/0/8 increasing regularly, which means the ping requests have 
been sent to the DPDK NIC.

 

But the VM does not receive the packets.

   

 My switch has no VLAN configured; the default VLAN is 1. When I ping 
from vm1 to vm2, the MAC table in OVS shows the following:

port  VLAN  MAC                Age
   4     0  00:00:00:00:02:12    1
   3     0  00:00:00:00:00:04    1

 

 

What could be the reason that the VM cannot communicate with the PC? Thank you

 

From: lifuqiong [mailto:lifuqi...@cncloudsec.com] 
Sent: April 15, 2016 9:22
To: 'Mauricio Vásquez'
Cc: 'dev@openvswitch.org'
Subject: Reply: [ovs-dev] ovs + dpdk vhost-user match flows but cannot execute actions

 

Hello Mauricio Vasquez:

It works. Thank you very much.

 

 

From: Mauricio Vásquez [mailto:mauricio.vasquezber...@studenti.polito.it] 
Sent: April 14, 2016 14:55
To: lifuqiong
Cc: dev@openvswitch.org
Subject: Re: [ovs-dev] ovs + dpdk vhost-user match flows but cannot execute actions

 

Hello lifuqiong,

I faced the same problem some days ago 
(http://openvswitch.org/pipermail/dev/2016-March/068282.html), the bug is 
already fixed. 

Where are you downloading OVS from? It appears that the bug is still present 
in the version at http://openvswitch.org/releases/openvswitch-2.5.0.tar.gz, 
so please download OVS from git and switch to branch-2.5.

Mauricio Vasquez, 

 

On Thu, Apr 14, 2016 at 4:28 AM, lifuqiong  wrote:

I want to test the DPDK vhost-user port on OVS, following 
https://software.intel.com/en-us/blogs/2015/06/09/building-vhost-user-for-ovs-today-using-dpdk-200.
I created an OVS+DPDK environment following INSTALL.DPDK.md and created 2 VMs, 
then tried to ping from one to the other, but got "Destination Host Unreachable".
Dump-flows shows that packets matched the flow, but they are not output to 
port 4. Why? I can't get any useful error or warning info from ovs-vswitchd.log.
While pinging from vm1 to vm2, statistics on vm1 show that eth1 RX packets stay 
at zero while TX packets keep increasing.
1.
OVS: 2.5.0
Dpdk: 2.2.0
Qemu: 2.2.1

2. ovs-ofctl dump-flows ovsbr0
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=836.946s, table=0, n_packets=628, n_bytes=26376, 
idle_age=0, in_port=3 actions=output:4
cookie=0x0, duration=831.458s, table=0, n_packets=36, n_bytes=1512, 
idle_age=770, in_port=4 actions=output:3

3. root@host152:/usr/local/var/run/openvswitch# ovs-vsctl show
03ae6f7d-3b71-45e3-beb0-09fa11292eaa
Bridge "ovsbr0"

Re: [ovs-dev] [PATCH v2 15/15] system-tests: Run conntrack tests with userspace

2016-04-19 Thread Joe Stringer
On 19 April 2016 at 16:53, Joe Stringer  wrote:
> On 15 April 2016 at 17:02, Daniele Di Proietto  wrote:
>> The userspace connection tracker doesn't support ALGs, frag reassembly
>> or NAT yet, so skip those tests.
>>
>> Also, connection tracking state input from a local port is not possible
>> in userspace.
>>
>> Finally, the userspace datapath pads all frames with 0, to make them at
>> least 64 bytes.
>>
>> Signed-off-by: Daniele Di Proietto 
>
> Having three prongs to your commit message like this usually indicates
> that there are three separate patches folded inside ;-)
>
> FWIW the LOCAL tests should work whether the local stack provides
> conntrack state input or not - they just exercise slightly different
> codepaths in the kernel, once the packet leaves OVS, by going out a
> different netdev type (with a different destination netns). Probably
> the class of bugs these tests pick up are not applicable for userspace
> datapath, so it seems fine to skip the conntrack-related ones. I
> wouldn't mind seeing a basic sanity test that sends packets between
> the local stack and another namespace, though - which would run for
> both testsuites, and both with IPv4 and IPv6 traffic.

I was mistaken here. I looked back over and apparently some of the
tests do rely on this behaviour, and they're passing on the latest
net-next kernels. They can be skipped for userspace.

> A few comments on the comments below, but LGTM.
>
> Acked-by: Joe Stringer 
>
>> ---
>>  tests/system-kmod-macros.at  | 28 +++
>>  tests/system-traffic.at  | 49 
>> ++--
>>  tests/system-userspace-macros.at | 45 +---
>>  3 files changed, 107 insertions(+), 15 deletions(-)
>>
>> diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
>> index 8e60929..4cecc23 100644
>> --- a/tests/system-kmod-macros.at
>> +++ b/tests/system-kmod-macros.at
>> @@ -65,3 +65,31 @@ m4_define([CHECK_CONNTRACK],
>>   on_exit 'ovstest test-netlink-conntrack flush'
>>  ]
>>  )
>> +
>> +# CHECK_CONNTRACK_ALG()
>> +#
>> +# Perform requirements checks for running conntrack ALG tests. The kernel
>> +# always supports ALG, so no check is needed.
>> +#
>> +m4_define([CHECK_CONNTRACK_ALG])
>
> Saying the kernel "always" supports X is a little misleading, as these
> features can all be turned off in the CONFIG.. of course, all the
> major distros provide kernels with them turned on so it's not such a
> big deal. Maybe drop the 'always'.
>
>> +# CHECK_CONNTRACK_FRAG()
>> +#
>> +# Perform requirements checks for running conntrack fragmentation tests.
>> +# The kernel always supports fragmentation, so no check is needed.
>> +m4_define([CHECK_CONNTRACK_FRAG])
>> +
>> +# CHECK_CONNTRACK_LOCAL_STACK()
>> +#
>> +# Perform requirements checks for running conntrack tests with local stack.
>> +# The kernel always supports reading the connection state of an skb coming
>> +# from an internal port, without an explicit ct() action, so no check is
>> +# needed.
>> +m4_define([CHECK_CONNTRACK_LOCAL_STACK])
>
> While the kernel module supports this, the skb is not guaranteed to
> have conntrack state when coming from an internal port, depending on
> the kernel version and whether iptables rules are installed in the
> host network namespace.

Looks like I got a little mixed up - netfilter hooks are not installed
in all namespaces by default on the latest kernels, but I'd guess that
they're still set up in the root netns so it should be fairly reliable
there (at least for the moment).

> 
>
>> +# CHECK_CONNTRACK_LOCAL_STACK()
>> +#
>> +# Perform requirements checks for running conntrack tests with local stack.
>> +# While the kernel connection tracker automatically passes all the 
>> connection
>> +# tracking state from an internal port to the OpenvSwitch kernel module, 
>> there
>> +# is simply no way of doing that with the userspace, so skip the tests.
>> +m4_define([CHECK_CONNTRACK_LOCAL_STACK],
>> +[
>> +AT_SKIP_IF([:])
>> +])
>
> Same comment here, packets may not have conntrack state associated
> when arriving on internal port.
>
>> +# CHECK_CONNTRACK_NAT()
>> +#
>> +# Perform requirements checks for running conntrack NAT tests. The userspace
>> +# doesn't support NATs yet, so skip the tests
>> +#
>> +m4_define([CHECK_CONNTRACK_NAT],
>> +[
>> +AT_SKIP_IF([:])
>> +])
>> --
>> 2.1.4
>>


Re: [ovs-dev] [PATCH 2/4] ovsdb: Introduce OVSDB replication feature

2016-04-19 Thread Cabrera Vega, Mario Alberto
Hi,

As part of this patch the following scenario takes place:
 
An ovsdb-server acts as a client of another ovsdb-server: Server A is 
monitoring Server B's tables, so Server B constantly sends update 
notifications to Server A. When an update notification arrives at Server A, 
I'd like to process that notification right away, as if it were a client 
request.

In order to wake up Server A from poll block when an update notification 
arrives, how can I add the file descriptor corresponding to the jsonrpc 
connection receiving the notifications to the set of file descriptors used to 
wake up the server during a poll block?
 
Currently, Server A processes the incoming update notifications only when the 
status_timer in the server main loop elapses.

Could someone guide me on how to implement that?

Thanks,

-Mario
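For reference, the underlying mechanism is just adding the fd to the polled set; in ovsdb-server the idiomatic way is to call jsonrpc_session_wait() (which registers the socket fd via the poll-loop module's poll_fd_wait()) before poll_block(). The sketch below illustrates the concept with plain poll(2) only; `wait_for_notification` is an illustrative name, not an OVS API.

```c
#include <poll.h>
#include <stdbool.h>

/* Returns true as soon as data is readable on 'fd', without waiting for
 * any status timer to elapse.  A pipe or socket carrying jsonrpc update
 * notifications would be polled exactly like this. */
static bool wait_for_notification(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    return poll(&pfd, 1, timeout_ms) == 1 && (pfd.revents & POLLIN) != 0;
}
```

Once the fd is part of the set passed to poll(), a notification arriving on it wakes the server immediately instead of at the next timer expiry.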

-Original Message-
From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Cabrera Vega, Mario 
Alberto
Sent: Tuesday, March 29, 2016 3:30 PM
To: dev@openvswitch.org
Subject: [ovs-dev] [PATCH 2/4] ovsdb: Introduce OVSDB replication feature

Replication is enabled by using the following option when starting the database 
server:

--sync-from=server

Where 'server' can take any form described in the ovsdb-client(1) manpage as an 
active connection. If this option is specified, the replication process is 
immediately started.

Signed-off-by: Mario Cabrera 
---
ovsdb/automake.mk |   6 +-
ovsdb/ovsdb-server.1.in   |   3 +
ovsdb/ovsdb-server.c  |  46 ++--
ovsdb/replication-syn.man |   2 +
ovsdb/replication.c   | 597 ++
ovsdb/replication.h   |  39 +++
ovsdb/replication.man |   8 +
tests/ovsdb-server.at |  51 
8 files changed, 725 insertions(+), 27 deletions(-)
 create mode 100644 ovsdb/replication-syn.man
 create mode 100644 ovsdb/replication.c
 create mode 100644 ovsdb/replication.h
 create mode 100644 ovsdb/replication.man

diff --git a/ovsdb/automake.mk b/ovsdb/automake.mk
index 7db6fea..099ed3c 100644
--- a/ovsdb/automake.mk
+++ b/ovsdb/automake.mk
@@ -24,6 +24,8 @@ ovsdb_libovsdb_la_SOURCES = \
  ovsdb/monitor.h \
  ovsdb/query.c \
  ovsdb/query.h \
+ ovsdb/replication.c \
+ ovsdb/replication.h \
  ovsdb/row.c \
  ovsdb/row.h \
  ovsdb/server.c \
@@ -42,7 +44,9 @@ pkgconfig_DATA += \
 MAN_FRAGMENTS += \
  ovsdb/remote-active.man \
-  ovsdb/remote-passive.man
+ ovsdb/remote-passive.man \
+ ovsdb/replication.man \
+ ovsdb/replication-syn.man
 # ovsdb-tool
bin_PROGRAMS += ovsdb/ovsdb-tool
diff --git a/ovsdb/ovsdb-server.1.in b/ovsdb/ovsdb-server.1.in
index 6c85729..1025ade 100644
--- a/ovsdb/ovsdb-server.1.in
+++ b/ovsdb/ovsdb-server.1.in
@@ -19,6 +19,7 @@ ovsdb\-server \- Open vSwitch database server
.so lib/daemon-syn.man
.so lib/service-syn.man
.so lib/vlog-syn.man
+.so ovsdb/replication-syn.man
+.so ovsdb/replication-syn.man
.so lib/ssl-syn.man
.so lib/ssl-bootstrap-syn.man
.so lib/ssl-peer-ca-cert-syn.man
@@ -100,6 +101,8 @@ configured remotes.
.so lib/service.man
.SS "Logging Options"
.so lib/vlog.man
+.SS "Syncing Options"
+.so ovsdb/replication.man
.SS "Public Key Infrastructure Options"
The options described below for configuring the SSL public key infrastructure 
accept a special syntax for obtaining their
diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
index fa662b1..63dd209 100644
--- a/ovsdb/ovsdb-server.c
+++ b/ovsdb/ovsdb-server.c
@@ -42,6 +42,7 @@
#include "ovsdb-error.h"
#include "poll-loop.h"
#include "process.h"
+#include "replication.h"
#include "row.h"
#include "simap.h"
#include "shash.h"
@@ -59,15 +60,7 @@
 VLOG_DEFINE_THIS_MODULE(ovsdb_server);
-struct db {
-/* Initialized in main(). */
-char *filename;
-struct ovsdb_file *file;
-struct ovsdb *db;
-
-/* Only used by update_remote_status(). */
-struct ovsdb_txn *txn;
-};
+struct db;
 /* SSL configuration. */
static char *private_key_file;
@@ -75,6 +68,9 @@ static char *certificate_file;
static char *ca_cert_file;
static bool bootstrap_ca_cert;
+/* Replication configuration. */
+static bool connect_to_remote_server;
+
static unixctl_cb_func ovsdb_server_exit;
static unixctl_cb_func ovsdb_server_compact;
static unixctl_cb_func ovsdb_server_reconnect;
@@ -159,6 +155,10 @@ main_loop(struct ovsdb_jsonrpc_server *jsonrpc, struct shash *all_dbs,
 report_error_if_changed(reconfigure_ssl(all_dbs), &ssl_error);
 ovsdb_jsonrpc_server_run(jsonrpc);
+if (connect_to_remote_server) {
+ replication_run(all_dbs);
+}
+
 SHASH_FOR_EACH(node, all_dbs) {
 struct db *db = node->data;
 ovsdb_trigger_run(db->db, time_msec());
@@ -170,9 +170,9 @@ main_loop(struct ovsdb_jsonrpc_server *jsonrpc, struct shash *all_dbs,
 }
 }
-/* update Manager status(es) every 5 seconds */
+/* update Manager status(es) 

Re: [ovs-dev] [PATCH v2 15/15] system-tests: Run conntrack tests with userspace

2016-04-19 Thread Joe Stringer
On 15 April 2016 at 17:02, Daniele Di Proietto  wrote:
> The userspace connection tracker doesn't support ALGs, frag reassembly
> or NAT yet, so skip those tests.
>
> Also, connection tracking state input from a local port is not possible
> in userspace.
>
> Finally, the userspace datapath pads all frames with 0, to make them at
> least 64 bytes.
>
> Signed-off-by: Daniele Di Proietto 

Having three prongs to your commit message like this usually indicates
that there are three separate patches folded inside ;-)

FWIW the LOCAL tests should work whether the local stack provides
conntrack state input or not - they just exercise slightly different
codepaths in the kernel, once the packet leaves OVS, by going out a
different netdev type (with a different destination netns). Probably
the class of bugs these tests pick up are not applicable for userspace
datapath, so it seems fine to skip the conntrack-related ones. I
wouldn't mind seeing a basic sanity test that sends packets between
the local stack and another namespace, though - which would run for
both testsuites, and both with IPv4 and IPv6 traffic.

A few comments on the comments below, but LGTM.

Acked-by: Joe Stringer 

> ---
>  tests/system-kmod-macros.at  | 28 +++
>  tests/system-traffic.at  | 49 
> ++--
>  tests/system-userspace-macros.at | 45 +---
>  3 files changed, 107 insertions(+), 15 deletions(-)
>
> diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
> index 8e60929..4cecc23 100644
> --- a/tests/system-kmod-macros.at
> +++ b/tests/system-kmod-macros.at
> @@ -65,3 +65,31 @@ m4_define([CHECK_CONNTRACK],
>   on_exit 'ovstest test-netlink-conntrack flush'
>  ]
>  )
> +
> +# CHECK_CONNTRACK_ALG()
> +#
> +# Perform requirements checks for running conntrack ALG tests. The kernel
> +# always supports ALG, so no check is needed.
> +#
> +m4_define([CHECK_CONNTRACK_ALG])

Saying the kernel "always" supports X is a little misleading, as these
features can all be turned off in the CONFIG.. of course, all the
major distros provide kernels with them turned on so it's not such a
big deal. Maybe drop the 'always'.

> +# CHECK_CONNTRACK_FRAG()
> +#
> +# Perform requirements checks for running conntrack fragmentation tests.
> +# The kernel always supports fragmentation, so no check is needed.
> +m4_define([CHECK_CONNTRACK_FRAG])
> +
> +# CHECK_CONNTRACK_LOCAL_STACK()
> +#
> +# Perform requirements checks for running conntrack tests with local stack.
> +# The kernel always supports reading the connection state of an skb coming
> +# from an internal port, without an explicit ct() action, so no check is
> +# needed.
> +m4_define([CHECK_CONNTRACK_LOCAL_STACK])

While the kernel module supports this, the skb is not guaranteed to
have conntrack state when coming from an internal port, depending on
the kernel version and whether iptables rules are installed in the
host network namespace.



> +# CHECK_CONNTRACK_LOCAL_STACK()
> +#
> +# Perform requirements checks for running conntrack tests with local stack.
> +# While the kernel connection tracker automatically passes all the connection
> +# tracking state from an internal port to the OpenvSwitch kernel module, 
> there
> +# is simply no way of doing that with the userspace, so skip the tests.
> +m4_define([CHECK_CONNTRACK_LOCAL_STACK],
> +[
> +AT_SKIP_IF([:])
> +])

Same comment here, packets may not have conntrack state associated
when arriving on internal port.

> +# CHECK_CONNTRACK_NAT()
> +#
> +# Perform requirements checks for running conntrack NAT tests. The userspace
> +# doesn't support NATs yet, so skip the tests
> +#
> +m4_define([CHECK_CONNTRACK_NAT],
> +[
> +AT_SKIP_IF([:])
> +])
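For readers unfamiliar with the macro setup: a testsuite entry would consume these macros roughly as below (AT_SETUP/AT_CLEANUP are standard autotest; the test body here is elided and purely illustrative). On the kmod testsuite CHECK_CONNTRACK_NAT() expands to nothing, so the test runs; on the userspace testsuite it expands to AT_SKIP_IF([:]), so the test is skipped.

```m4
AT_SETUP([conntrack - NAT sanity (illustrative)])
CHECK_CONNTRACK()
CHECK_CONNTRACK_NAT()
dnl ... set up namespaces, install ct(nat) flows, exercise traffic ...
AT_CLEANUP
```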
> --
> 2.1.4
>




Re: [ovs-dev] [RFC PATCH] create vxlan device using rtnetlink interface

2016-04-19 Thread Jesse Gross
On Mon, Apr 18, 2016 at 2:57 AM, Thadeu Lima de Souza Cascardo
 wrote:
> On Fri, Apr 15, 2016 at 08:36:51PM -0700, Jesse Gross wrote:
>> On Fri, Apr 15, 2016 at 2:30 PM, Thadeu Lima de Souza Cascardo
>>  wrote:
>> > Hi, this is an RFC patch (could probably be 2 patches - more below), that
>> > creates VXLAN devices in userspace and adds them as netdev vports, instead 
>> > of
>> > using the vxlan vport code.
>> >
>> > The reason for this change is that it has been mentioned in the past that 
>> > some
>> > of the vport code should be considered compat code, that is, it won't 
>> > receive
>> > new features or flags.
>>
>> Thanks for looking at this. I agree that this is definitely a
>> necessary direction.
>>
>> > First is the creation of the device under a switch/case. I tried to make 
>> > this a
>> > netdev_class function, overriding the netdev class by dpif_port_open_type. 
>> > That
>> > required exporting a lot of the vport functions. I can still do it that 
>> > way, if
>> > that's preferable, but this seems simpler.
>>
>> Can you give some examples of what would need to be exported? It would
>> be nice to just set the open function for each type but if it is
>> really horrible we can live with the switch statement.
>>
>
> netdev_vport_{alloc, construct, destruct, dealloc}, get_tunnel_config,
> set_tunnel_config, get_netdev_tunnel_config.
>
> We could also do it the other way around, write this code in netdev-vport.c 
> and
> export one or two functions from netdev-linux or dpif-netlink, and the create
> function per tunnel type.

OK, thanks, I understand now. I think what you have currently is
probably the best solution for the time being.

>> > The other problem is during port_del, that since we don't have a netdev, 
>> > the
>> > dpif_port type would not be vxlan, and deciding whether to remove it or not
>> > could not be made. In fact, this is one of the main reasons why this is an 
>> > RFC.
>> >
>> > The solution here is to decide on the type by the device's name, and there 
>> > is
>> > even a function for this, though it looks up for the netdev's already 
>> > created,
>> > those that probably come from the database. However, when OVS starts, it 
>> > removes
>> > ports from the datapath, and that information is not available, and may 
>> > not even
>> > be. In fact, since the dpif_port has a different name because it's a 
>> > vport, it
>> > won't match any netdev at all. So, I did the opposite of getting the type 
>> > from
>> > the dpif_port name, ie, if it contains vxlan_sys, it's of vxlan type. Even 
>> > if we
>> > make this more strict and use the port, we could still be in conflict with 
>> > a
>> > vxlan device created by the user with such name. This should have been one 
>> > patch
>> > by itself.
>>
>> What about using the driver name exposed through ethtool? I believe
>> that all of the tunnel and internal devices expose this in a
>> consistent way - i.e. a VXLAN device can be queried and get back the
>> string "vxlan". Any driver other than the ones that we recognize can
>> be assumed to be OVS_VPORT_TYPE_NETDEV.
>>
>
> I don't see how this addresses the case when the user adds a vxlan interface
> created by the system. Like:
>
> ip link add vxlan_sys_4789 type vxlan id 10 remote 192.168.123.1 dstport 4789
> ovs-vsctl add-port br0 vxlan_sys_4789
>
> Its driver would be vxlan. That goes to vxlan0 too.

I think we can check the combination of device type and the options
that are set on it (the same as what we need to do after we create the
device). If all of those match then it doesn't matter whether we added
it previously or the user added it - the device will work exactly the
same either way. If there isn't a match then we should bail out
without adding the port. This is pretty similar to the behavior of the
current compat code in the kernel (in that case if a port already
exists with the same name, it just aborts).

Two unresolved issues in my mind:
 * How do we handle cleaning up ports (including in cases where
userspace crashes)? In the kernel case, the port will stick around
until either the module is removed or the port is explicitly deleted.
 * How do we handle upgrade where the new version of OVS needs options
that are different from the previous version? This is basically a
special version of the port being created by someone else but we
should be able to handle the differences. Depending on how good a job
we do at cleaning up the port, this might not be an issue.


Re: [ovs-dev] [PATCH v7 05/16] dpif-netdev: Fix race condition in pmd thread initialization.

2016-04-19 Thread Daniele Di Proietto


On 19/04/2016 02:48, "Ilya Maximets"  wrote:

>On 19.04.2016 10:18, Ilya Maximets wrote:
>> There was a reason for the 2 calls to dp_netdev_pmd_reload_done() inside
>> pmd_thread_main(): we must wait until the PMD thread is completely done
>> with reloading. This patch introduces a race condition on
>> pmd->exit_latch. While removing the last port on a NUMA node,
>> dp_netdev_reload_pmd__(pmd) will be called twice: first to remove the
>> port and second to destroy the PMD thread. pmd->exit_latch is set
>> between these two calls. This can lead to a situation where the PMD
>> thread exits while processing the first reload; the main thread will
>> then wait forever on cond_wait in the second reload. The situation is
>> easily reproducible by adding/deleting the last port (possibly after a
>> few iterations in a cycle).
>> 
>> Best regards, Ilya Maximets.
>
>This incremental should help:
>--
>diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>index 588d56f..2235297 100644
>--- a/lib/dpif-netdev.c
>+++ b/lib/dpif-netdev.c
>@@ -2785,6 +2785,7 @@ pmd_thread_main(void *f_)
> unsigned int port_seq = PMD_INITIAL_SEQ;
> int poll_cnt;
> int i;
>+bool exiting;
> 
> poll_cnt = 0;
> poll_list = NULL;
>@@ -2825,14 +2826,15 @@ reload:
> }
> }
> 
>+emc_cache_uninit(&pmd->flow_cache);
>+
> poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
>+exiting = latch_is_set(&pmd->exit_latch);
> /* Signal here to make sure the pmd finishes
>  * reloading the updated configuration. */
> dp_netdev_pmd_reload_done(pmd);
> 
>-emc_cache_uninit(&pmd->flow_cache);
>-
>-if (!latch_is_set(&pmd->exit_latch)){
>+if (!exiting) {
> goto reload;
> }
> 
>--
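The essence of the race and of the fix can be modeled with a self-contained condition-variable handshake. This is a simplified illustration only (the names echo the discussion, not the real dpif-netdev structures): the worker must snapshot the exit latch *before* signalling reload completion, because as soon as it signals, the main thread may set the latch for the next reload.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t c = PTHREAD_COND_INITIALIZER;
static bool reload_requested, reload_done, exit_latch;
static int reloads_handled;

/* Body of the "pmd" thread: handle reload requests until told to exit. */
static void *pmd_main(void *arg)
{
    (void) arg;
    for (;;) {
        pthread_mutex_lock(&m);
        while (!reload_requested) {
            pthread_cond_wait(&c, &m);
        }
        reload_requested = false;
        reloads_handled++;
        /* Snapshot the latch BEFORE signalling completion: testing it
         * after the signal could miss an exit request set for the next
         * reload, leaving the main thread waiting forever. */
        bool exiting = exit_latch;
        reload_done = true;              /* "reload_done" handshake */
        pthread_cond_broadcast(&c);
        pthread_mutex_unlock(&m);
        if (exiting) {
            return NULL;
        }
    }
}

/* Main thread: request a reload and wait until the pmd has finished it. */
static void request_reload_and_wait(void)
{
    pthread_mutex_lock(&m);
    reload_requested = true;
    pthread_cond_broadcast(&c);
    while (!reload_done) {
        pthread_cond_wait(&c, &m);
    }
    reload_done = false;
    pthread_mutex_unlock(&m);
}
```

Driving two reloads with exit_latch set between them (the last-port removal scenario) now terminates cleanly, because the snapshot and the latch update are serialized by the same mutex.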

You're right, thanks for the detailed analysis and the suggested fix.

I applied the suggested incremental, but kept emc_cache_uninit()
where it is right now.

I sent an updated version here:

http://openvswitch.org/pipermail/dev/2016-April/069835.html

Thanks,

Daniele

>
> 
>> On 08.04.2016 06:13, Daniele Di Proietto wrote:
>>> The pmds and the main threads are synchronized using a condition
>>> variable.  The main thread writes a new configuration, then it waits on
>>> the condition variable.  A pmd thread reads the new configuration, then
>>> it calls signal() on the condition variable. To make sure that the pmds
>>> and the main thread have a consistent view, each signal() should be
>>> backed by a wait().
>>>
>>> Currently the first signal() doesn't have a corresponding wait().  If
>>> the pmd thread takes a long time to start and the signal() is received
>>> by a later wait, the threads will have an inconsistent view.
>>>
>>> The commit fixes the problem by removing the first signal() from the
>>> pmd thread.
>>>
>>> This is hardly a problem on current master, because the main thread
>>> will call the first wait() a long time after the creation of a pmd
>>> thread.  It becomes a problem with the next commits.
>>>
>>> Signed-off-by: Daniele Di Proietto 
>>> ---
>>>  lib/dpif-netdev.c | 21 +
>>>  1 file changed, 9 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index 9c32c64..2424d3e 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -2652,21 +2652,22 @@ dpif_netdev_wait(struct dpif *dpif)
>>>  
>>>  static int
>>>  pmd_load_queues(struct dp_netdev_pmd_thread *pmd, struct rxq_poll
>>>**ppoll_list)
>>> -OVS_REQUIRES(pmd->poll_mutex)
>>>  {
>>>  struct rxq_poll *poll_list = *ppoll_list;
>>>  struct rxq_poll *poll;
>>>  int i;
>>>  
>>> +ovs_mutex_lock(&pmd->poll_mutex);
>>>  poll_list = xrealloc(poll_list, pmd->poll_cnt * sizeof
>>>*poll_list);
>>>  
>>>  i = 0;
>>>  LIST_FOR_EACH (poll, node, &pmd->poll_list) {
>>>  poll_list[i++] = *poll;
>>>  }
>>> +ovs_mutex_unlock(&pmd->poll_mutex);
>>>  
>>>  *ppoll_list = poll_list;
>>> -return pmd->poll_cnt;
>>> +return i;
>>>  }
>>>  
>>>  static void *
>>> @@ -2685,13 +2686,10 @@ pmd_thread_main(void *f_)
>>>  /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
>>>  ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
>>>  pmd_thread_setaffinity_cpu(pmd->core_id);
>>> +poll_cnt = pmd_load_queues(pmd, &poll_list);
>>>  reload:
>>>  emc_cache_init(&pmd->flow_cache);
>>>  
>>> -ovs_mutex_lock(&pmd->poll_mutex);
>>> -poll_cnt = pmd_load_queues(pmd, &poll_list);
>>> -ovs_mutex_unlock(&pmd->poll_mutex);
>>> -
>>>  /* List port/core affinity */
>>>  for (i = 0; i < poll_cnt; i++) {
>>> VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
>>> @@ -2699,10 +2697,6 @@ reload:
>>>  netdev_rxq_get_queue_id(poll_list[i].rx));
>>>  }
>>>  
>>> -/* Signal here to make sure the pmd finishes
>>> - * reloading the updated configuration. */
>>> -

[ovs-dev] [PATCH v8 16/16] netdev-dpdk: Use ->reconfigure() call to change rx/tx queues.

2016-04-19 Thread Daniele Di Proietto
This introduces in dpif-netdev and netdev-dpdk the first use of the
newly introduced reconfigure netdev call.

When a request to change the number of queues comes, netdev-dpdk will
remember this and notify the upper layer via
netdev_request_reconfigure().

The datapath, instead of periodically calling netdev_set_multiq(), can
detect this and call reconfigure().

This mechanism can also be used to:
* Automatically match the number of rxq with the one provided by qemu
  via the new_device callback.
* Provide a way to change the MTU of dpdk devices at runtime.
* Move a DPDK vhost device to the proper NUMA socket.
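The request/detect/apply cycle described above can be sketched as follows. This is a toy model with illustrative netdev_sketch_* names; the real calls in the patch are netdev_request_reconfigure(), netdev_is_reconf_required() and netdev_reconfigure().

```c
#include <stdbool.h>

struct netdev_sketch {
    int n_rxq;             /* Currently applied number of rx queues. */
    int requested_n_rxq;   /* Pending request, e.g. from the vhost backend. */
    bool reconf_required;
};

/* Device layer: remember the request and flag the upper layer. */
static void netdev_sketch_request_rxq(struct netdev_sketch *dev, int n)
{
    dev->requested_n_rxq = n;
    dev->reconf_required = true;
}

static bool netdev_sketch_is_reconf_required(const struct netdev_sketch *dev)
{
    return dev->reconf_required;
}

/* Datapath: apply pending changes once it notices the flag, instead of
 * periodically comparing requested vs. applied queue counts. */
static int netdev_sketch_reconfigure(struct netdev_sketch *dev)
{
    dev->n_rxq = dev->requested_n_rxq;
    dev->reconf_required = false;
    return 0;   /* 0 on success, like the netdev provider convention. */
}
```

The same flag-then-apply pattern generalizes to the other use cases listed above (MTU changes, NUMA migration), since the device only records the request and the datapath applies it at a safe point.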

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c |  69 +-
 lib/netdev-bsd.c  |   2 +-
 lib/netdev-dpdk.c | 195 ++
 lib/netdev-dummy.c|   2 +-
 lib/netdev-linux.c|   2 +-
 lib/netdev-provider.h |  23 +++---
 lib/netdev-vport.c|   2 +-
 lib/netdev.c  |  36 +++---
 lib/netdev.h  |   3 +-
 9 files changed, 160 insertions(+), 174 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 3a250fd..cce3bf1 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -257,8 +257,6 @@ struct dp_netdev_port {
 unsigned n_rxq; /* Number of elements in 'rxq' */
 struct netdev_rxq **rxq;
 char *type; /* Port type as requested by user. */
-int latest_requested_n_rxq; /* Latest requested from netdev number
-   of rx queues. */
 };
 
 /* Contained by struct dp_netdev_flow's 'stats' member.  */
@@ -1161,20 +1159,26 @@ port_create(const char *devname, const char *open_type, 
const char *type,
 /* There can only be ovs_numa_get_n_cores() pmd threads,
  * so creates a txq for each, and one extra for the non
  * pmd threads. */
-error = netdev_set_multiq(netdev, n_cores + 1,
-  netdev_requested_n_rxq(netdev));
+error = netdev_set_tx_multiq(netdev, n_cores + 1);
 if (error && (error != EOPNOTSUPP)) {
 VLOG_ERR("%s, cannot set multiq", devname);
 goto out;
 }
 }
+
+if (netdev_is_reconf_required(netdev)) {
+error = netdev_reconfigure(netdev);
+if (error) {
+goto out;
+}
+}
+
 port = xzalloc(sizeof *port);
 port->port_no = port_no;
 port->netdev = netdev;
 port->n_rxq = netdev_n_rxq(netdev);
 port->rxq = xcalloc(port->n_rxq, sizeof *port->rxq);
 port->type = xstrdup(type);
-port->latest_requested_n_rxq = netdev_requested_n_rxq(netdev);
 
 for (i = 0; i < port->n_rxq; i++) {
 error = netdev_rxq_open(netdev, &port->rxq[i], i);
@@ -2455,27 +2459,6 @@ dpif_netdev_operate(struct dpif *dpif, struct dpif_op **ops, size_t n_ops)
 }
 }
 
-/* Returns true if the configuration for rx queues is changed. */
-static bool
-pmd_n_rxq_changed(const struct dp_netdev *dp)
-{
-struct dp_netdev_port *port;
-
-ovs_mutex_lock(&dp->port_mutex);
-HMAP_FOR_EACH (port, node, &dp->ports) {
-int requested_n_rxq = netdev_requested_n_rxq(port->netdev);
-
-if (netdev_is_pmd(port->netdev)
-&& port->latest_requested_n_rxq != requested_n_rxq) {
-ovs_mutex_unlock(&dp->port_mutex);
-return true;
-}
-}
-ovs_mutex_unlock(&dp->port_mutex);
-
-return false;
-}
-
 static bool
 cmask_equals(const char *a, const char *b)
 {
@@ -2599,11 +2582,9 @@ static int
 port_reconfigure(struct dp_netdev_port *port)
 {
 struct netdev *netdev = port->netdev;
-int requested_n_rxq = netdev_requested_n_rxq(netdev);
 int i, err;
 
-if (!netdev_is_pmd(port->netdev)
-|| port->latest_requested_n_rxq != requested_n_rxq) {
+if (!netdev_is_reconf_required(netdev)) {
 return 0;
 }
 
@@ -2614,15 +2595,14 @@ port_reconfigure(struct dp_netdev_port *port)
 }
 port->n_rxq = 0;
 
-/* Sets the new rx queue config. */
-err = netdev_set_multiq(port->netdev, ovs_numa_get_n_cores() + 1,
-requested_n_rxq);
+/* Allows 'netdev' to apply the pending configuration changes. */
+err = netdev_reconfigure(netdev);
 if (err && (err != EOPNOTSUPP)) {
-VLOG_ERR("Failed to set dpdk interface %s rx_queue to: %u",
- netdev_get_name(port->netdev), requested_n_rxq);
+VLOG_ERR("Failed to set interface %s new configuration",
+ netdev_get_name(netdev));
 return err;
 }
-/* If the set_multiq() above succeeds, reopens the 'rxq's. */
+/* If the netdev_reconfigure() above succeeds, reopens the 'rxq's. */
 port->rxq = xrealloc(port->rxq, sizeof *port->rxq * netdev_n_rxq(netdev));
 for (i = 0; i < netdev_n_rxq(netdev); i++) {
 err = netdev_rxq_open(netdev, &port->rxq[i], i);
@@ -2666,6 +2646,22 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
 

[ovs-dev] [PATCH v8 12/16] ofproto-dpif: Call dpif_poll_threads_set() before dpif_run().

2016-04-19 Thread Daniele Di Proietto
An upcoming commit will make dpif_poll_threads_set() record the
requested configuration and dpif_run() apply it, so it makes sense to
change the order.

Signed-off-by: Daniele Di Proietto 
Tested-by: Ilya Maximets 
Acked-by: Ilya Maximets 
Acked-by: Mark Kavanagh 
---
 ofproto/ofproto-dpif.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 747482c..2aa12b7 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -534,6 +534,8 @@ type_run(const char *type)
 return 0;
 }
 
+/* This must be called before dpif_run() */
+dpif_poll_threads_set(backer->dpif, pmd_cpu_mask);
 
 if (dpif_run(backer->dpif)) {
 backer->need_revalidate = REV_RECONFIGURE;
@@ -562,8 +564,6 @@ type_run(const char *type)
 udpif_set_threads(backer->udpif, n_handlers, n_revalidators);
 }
 
-dpif_poll_threads_set(backer->dpif, pmd_cpu_mask);
-
 if (backer->need_revalidate) {
 struct ofproto_dpif *ofproto;
 struct simap_node *node;
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v8 13/16] dpif-netdev: Change pmd thread configuration in dpif_netdev_run().

2016-04-19 Thread Daniele Di Proietto
Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c   | 144 ++--
 lib/dpif-provider.h |   3 +-
 2 files changed, 84 insertions(+), 63 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 8cc37e2..c905d1d 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -224,7 +224,9 @@ struct dp_netdev {
 ovsthread_key_t per_pmd_key;
 
 /* Cpu mask for pin of pmd threads. */
+char *requested_pmd_cmask;
 char *pmd_cmask;
+
 uint64_t last_tnl_conf_seq;
 };
 
@@ -2453,18 +2455,17 @@ dpif_netdev_operate(struct dpif *dpif, struct dpif_op **ops, size_t n_ops)
 }
 }
 
-/* Returns true if the configuration for rx queues or cpu mask
- * is changed. */
+/* Returns true if the configuration for rx queues is changed. */
 static bool
-pmd_config_changed(const struct dp_netdev *dp, const char *cmask)
+pmd_n_rxq_changed(const struct dp_netdev *dp)
 {
 struct dp_netdev_port *port;
 
 ovs_mutex_lock(&dp->port_mutex);
 HMAP_FOR_EACH (port, node, &dp->ports) {
-struct netdev *netdev = port->netdev;
-int requested_n_rxq = netdev_requested_n_rxq(netdev);
-if (netdev_is_pmd(netdev)
+int requested_n_rxq = netdev_requested_n_rxq(port->netdev);
+
+if (netdev_is_pmd(port->netdev)
 && port->latest_requested_n_rxq != requested_n_rxq) {
 ovs_mutex_unlock(&dp->port_mutex);
 return true;
@@ -2472,69 +2473,29 @@ pmd_config_changed(const struct dp_netdev *dp, const char *cmask)
 }
 ovs_mutex_unlock(&dp->port_mutex);
 
-if (dp->pmd_cmask != NULL && cmask != NULL) {
-return strcmp(dp->pmd_cmask, cmask);
-} else {
-return (dp->pmd_cmask != NULL || cmask != NULL);
+return false;
+}
+
+static bool
+cmask_equals(const char *a, const char *b)
+{
+if (a && b) {
+return !strcmp(a, b);
 }
+
+return a == NULL && b == NULL;
 }
 
-/* Resets pmd threads if the configuration for 'rxq's or cpu mask changes. */
+/* Changes the number or the affinity of pmd threads.  The changes are actually
+ * applied in dpif_netdev_run(). */
 static int
 dpif_netdev_pmd_set(struct dpif *dpif, const char *cmask)
 {
 struct dp_netdev *dp = get_dp_netdev(dpif);
 
-if (pmd_config_changed(dp, cmask)) {
-struct dp_netdev_port *port;
-
-dp_netdev_destroy_all_pmds(dp);
-
-ovs_mutex_lock(&dp->port_mutex);
-HMAP_FOR_EACH (port, node, &dp->ports) {
-struct netdev *netdev = port->netdev;
-int requested_n_rxq = netdev_requested_n_rxq(netdev);
-if (netdev_is_pmd(port->netdev)
-&& port->latest_requested_n_rxq != requested_n_rxq) {
-int i, err;
-
-/* Closes the existing 'rxq's. */
-for (i = 0; i < netdev_n_rxq(port->netdev); i++) {
-netdev_rxq_close(port->rxq[i]);
-port->rxq[i] = NULL;
-}
-port->n_rxq = 0;
-
-/* Sets the new rx queue config.  */
-err = netdev_set_multiq(port->netdev,
-ovs_numa_get_n_cores() + 1,
-requested_n_rxq);
-if (err && (err != EOPNOTSUPP)) {
-VLOG_ERR("Failed to set dpdk interface %s rx_queue to:"
- " %u", netdev_get_name(port->netdev),
- requested_n_rxq);
-ovs_mutex_unlock(&dp->port_mutex);
-return err;
-}
-port->latest_requested_n_rxq = requested_n_rxq;
-/* If the set_multiq() above succeeds, reopens the 'rxq's. */
-port->n_rxq = netdev_n_rxq(port->netdev);
-port->rxq = xrealloc(port->rxq, sizeof *port->rxq * port->n_rxq);
-for (i = 0; i < port->n_rxq; i++) {
-netdev_rxq_open(port->netdev, &port->rxq[i], i);
-}
-}
-}
-/* Reconfigures the cpu mask. */
-ovs_numa_set_cpu_mask(cmask);
-free(dp->pmd_cmask);
-dp->pmd_cmask = cmask ? xstrdup(cmask) : NULL;
-
-/* Restores the non-pmd. */
-dp_netdev_set_nonpmd(dp);
-/* Restores all pmd threads. */
-dp_netdev_reset_pmd_threads(dp);
-ovs_mutex_unlock(&dp->port_mutex);
+if (!cmask_equals(dp->requested_pmd_cmask, cmask)) {
+free(dp->requested_pmd_cmask);
+dp->requested_pmd_cmask = cmask ? xstrdup(cmask) : NULL;
 }
 
 return 0;
@@ -2634,6 +2595,59 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
 }
 }
 
+static void
+reconfigure_pmd_threads(struct dp_netdev *dp)
+OVS_REQUIRES(dp->port_mutex)
+{
+struct dp_netdev_port *port;
+
+dp_netdev_destroy_all_pmds(dp);
+
+HMAP_FOR_EACH (port, node, &dp->ports) {
+struct netdev *netdev = port->netdev;
+int requested_n_rxq = 

[ovs-dev] [PATCH v8 15/16] netdev: Add reconfigure request mechanism.

2016-04-19 Thread Daniele Di Proietto
A netdev provider, especially a PMD provider (like netdev DPDK) might
not be able to change some of its parameters (such as MTU, or number of
queues) without stopping everything and restarting.

This commit introduces a mechanism that allows a netdev provider to
request a restart (netdev_request_reconfigure()).  The upper layer can
be notified via netdev_wait_reconf_required() and
netdev_is_reconf_required().  After closing all the rxqs the upper layer
can finally call netdev_reconfigure(), to make sure that the new
configuration is in place.

This will be used by next commit to reconfigure rx and tx queues in
netdev-dpdk.

Signed-off-by: Daniele Di Proietto 
Tested-by: Ilya Maximets 
Acked-by: Ilya Maximets 
Acked-by: Mark Kavanagh 
---
 lib/netdev-bsd.c  |  1 +
 lib/netdev-dpdk.c |  1 +
 lib/netdev-dummy.c|  1 +
 lib/netdev-linux.c|  1 +
 lib/netdev-provider.h | 27 ++-
 lib/netdev-vport.c|  1 +
 lib/netdev.c  | 39 +++
 lib/netdev.h  |  4 
 8 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-bsd.c b/lib/netdev-bsd.c
index 49c05f4..32e8f74 100644
--- a/lib/netdev-bsd.c
+++ b/lib/netdev-bsd.c
@@ -1536,6 +1536,7 @@ netdev_bsd_update_flags(struct netdev *netdev_, enum netdev_flags off,
 netdev_bsd_arp_lookup, /* arp_lookup */  \
  \
 netdev_bsd_update_flags, \
+NULL, /* reconfigure */  \
  \
 netdev_bsd_rxq_alloc,\
 netdev_bsd_rxq_construct,\
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 208c5f5..c4ff039 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2720,6 +2720,7 @@ static const struct dpdk_qos_ops egress_policer_ops = {
 NULL,   /* arp_lookup */  \
   \
 netdev_dpdk_update_flags, \
+NULL,   /* reconfigure */ \
   \
 netdev_dpdk_rxq_alloc,\
 netdev_dpdk_rxq_construct,\
diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index 5acb4e1..a001322 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -1275,6 +1275,7 @@ static const struct netdev_class dummy_class = {
 NULL,   /* arp_lookup */
 
 netdev_dummy_update_flags,
+NULL,   /* reconfigure */
 
 netdev_dummy_rxq_alloc,
 netdev_dummy_rxq_construct,
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 2c1ffec..1af08f3 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -2806,6 +2806,7 @@ netdev_linux_update_flags(struct netdev *netdev_, enum netdev_flags off,
 netdev_linux_arp_lookup,\
 \
 netdev_linux_update_flags,  \
+NULL,   /* reconfigure */   \
 \
 netdev_linux_rxq_alloc, \
 netdev_linux_rxq_construct, \
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index cda25eb..853fc44 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -52,6 +52,16 @@ struct netdev {
  * 'netdev''s flags, features, ethernet address, or carrier changes. */
 uint64_t change_seq;
 
+/* A netdev provider might be unable to change some of the device's
+ * parameters (n_rxq, mtu) when the device is in use.  In this case
+ * the provider can notify the upper layer by calling
+ * netdev_request_reconfigure().  The upper layer will react by stopping
+ * the operations on the device and calling netdev_reconfigure() to allow
+ * the configuration changes.  'last_reconfigure_seq' remembers the value
+ * of 'reconfigure_seq' when the last reconfiguration happened. */
+struct seq *reconfigure_seq;
+uint64_t last_reconfigure_seq;
+
 /* The core netdev code initializes these at netdev construction and only
  * provide read-only access to its client.  Netdev implementations may
  * modify them. */
@@ -64,7 +74,7 @@ struct netdev {
 struct ovs_list saved_flags_list; /* Contains "struct netdev_saved_flags". */
 };
 
-static void
+static inline void
 netdev_change_seq_changed(const struct netdev *netdev_)
 {
 struct netdev *netdev = CONST_CAST(struct netdev *, netdev_);
@@ -75,6 +85,12 @@ netdev_change_seq_changed(const struct netdev *netdev_)
 }
 }
 

[ovs-dev] [PATCH v8 14/16] dpif-netdev: Handle errors in reconfigure_pmd_threads().

2016-04-19 Thread Daniele Di Proietto
Errors returned by netdev_set_multiq() and netdev_rxq_open() weren't
handled properly in reconfigure_pmd_threads().  In case of error, we now
remove the port from the datapath.

Also, part of the code is moved into a new function, port_reconfigure().

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 78 ++-
 1 file changed, 48 insertions(+), 30 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c905d1d..3a250fd 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2595,44 +2595,62 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
 }
 }
 
+static int
+port_reconfigure(struct dp_netdev_port *port)
+{
+struct netdev *netdev = port->netdev;
+int requested_n_rxq = netdev_requested_n_rxq(netdev);
+int i, err;
+
+if (!netdev_is_pmd(port->netdev)
+|| port->latest_requested_n_rxq != requested_n_rxq) {
+return 0;
+}
+
+/* Closes the existing 'rxq's. */
+for (i = 0; i < port->n_rxq; i++) {
+netdev_rxq_close(port->rxq[i]);
+port->rxq[i] = NULL;
+}
+port->n_rxq = 0;
+
+/* Sets the new rx queue config. */
+err = netdev_set_multiq(port->netdev, ovs_numa_get_n_cores() + 1,
+requested_n_rxq);
+if (err && (err != EOPNOTSUPP)) {
+VLOG_ERR("Failed to set dpdk interface %s rx_queue to: %u",
+ netdev_get_name(port->netdev), requested_n_rxq);
+return err;
+}
+/* If the set_multiq() above succeeds, reopens the 'rxq's. */
+port->rxq = xrealloc(port->rxq, sizeof *port->rxq * netdev_n_rxq(netdev));
+for (i = 0; i < netdev_n_rxq(netdev); i++) {
+err = netdev_rxq_open(netdev, &port->rxq[i], i);
+if (err) {
+return err;
+}
+port->n_rxq++;
+}
+
+return 0;
+}
+
 static void
 reconfigure_pmd_threads(struct dp_netdev *dp)
 OVS_REQUIRES(dp->port_mutex)
 {
-struct dp_netdev_port *port;
+struct dp_netdev_port *port, *next;
 
 dp_netdev_destroy_all_pmds(dp);
 
-HMAP_FOR_EACH (port, node, &dp->ports) {
-struct netdev *netdev = port->netdev;
-int requested_n_rxq = netdev_requested_n_rxq(netdev);
-if (netdev_is_pmd(port->netdev)
-&& port->latest_requested_n_rxq != requested_n_rxq) {
-int i, err;
+HMAP_FOR_EACH_SAFE (port, next, node, &dp->ports) {
+int err;
 
-/* Closes the existing 'rxq's. */
-for (i = 0; i < port->n_rxq; i++) {
-netdev_rxq_close(port->rxq[i]);
-port->rxq[i] = NULL;
-}
-port->n_rxq = 0;
-
-/* Sets the new rx queue config. */
-err = netdev_set_multiq(port->netdev, ovs_numa_get_n_cores() + 1,
-requested_n_rxq);
-if (err && (err != EOPNOTSUPP)) {
-VLOG_ERR("Failed to set dpdk interface %s rx_queue to: %u",
- netdev_get_name(port->netdev),
- requested_n_rxq);
-return;
-}
-port->latest_requested_n_rxq = requested_n_rxq;
-/* If the set_multiq() above succeeds, reopens the 'rxq's. */
-port->n_rxq = netdev_n_rxq(port->netdev);
-port->rxq = xrealloc(port->rxq, sizeof *port->rxq * port->n_rxq);
-for (i = 0; i < port->n_rxq; i++) {
-netdev_rxq_open(port->netdev, >rxq[i], i);
-}
+err = port_reconfigure(port);
+if (err) {
+hmap_remove(&dp->ports, &port->node);
+seq_change(dp->port_seq);
+port_destroy(port);
 }
 }
 /* Reconfigures the cpu mask. */
-- 
2.1.4



[ovs-dev] [PATCH v8 11/16] ovs-thread: Do not quiesce in ovs_mutex_cond_wait().

2016-04-19 Thread Daniele Di Proietto
ovs_mutex_cond_wait() is used in many functions in dpif-netdev to
synchronize with pmd threads, but we can't guarantee that the callers do
not hold RCU references, so it's better to avoid quiescing.

In system_stats_thread_func() the code relied on ovs_mutex_cond_wait()
to introduce a quiescent state, so explicit calls to
ovsrcu_quiesce_start() and ovsrcu_quiesce_end() are added there.

Signed-off-by: Daniele Di Proietto 
---
 lib/ovs-rcu.h   | 3 +--
 lib/ovs-thread.c| 2 --
 vswitchd/system-stats.c | 6 ++
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/lib/ovs-rcu.h b/lib/ovs-rcu.h
index 600be0b..8750ead 100644
--- a/lib/ovs-rcu.h
+++ b/lib/ovs-rcu.h
@@ -42,8 +42,7 @@
  * A "quiescent state" is a time at which a thread holds no pointers to memory
  * that is managed by RCU; that is, when the thread is known not to reference
  * memory that might be an old version of some object freed via RCU.  For
- * example, poll_block() includes a quiescent state, as does
- * ovs_mutex_cond_wait().
+ * example, poll_block() includes a quiescent state.
  *
  * The following functions manage the recognition of quiescent states:
  *
diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c
index 3c065cf..26dd928 100644
--- a/lib/ovs-thread.c
+++ b/lib/ovs-thread.c
@@ -253,9 +253,7 @@ ovs_mutex_cond_wait(pthread_cond_t *cond, const struct ovs_mutex *mutex_)
 struct ovs_mutex *mutex = CONST_CAST(struct ovs_mutex *, mutex_);
 int error;
 
-ovsrcu_quiesce_start();
-error = pthread_cond_wait(cond, &mutex->lock);
-ovsrcu_quiesce_end();
 
 if (OVS_UNLIKELY(error)) {
 ovs_abort(error, "pthread_cond_wait failed");
diff --git a/vswitchd/system-stats.c b/vswitchd/system-stats.c
index df4971e..129f0cf 100644
--- a/vswitchd/system-stats.c
+++ b/vswitchd/system-stats.c
@@ -37,6 +37,7 @@
 #include "json.h"
 #include "latch.h"
 #include "openvswitch/ofpbuf.h"
+#include "ovs-rcu.h"
 #include "ovs-thread.h"
 #include "poll-loop.h"
 #include "shash.h"
@@ -615,7 +616,12 @@ system_stats_thread_func(void *arg OVS_UNUSED)
 
 ovs_mutex_lock(&mutex);
 while (!enabled) {
+/* The thread is sleeping, potentially for a long time, and it's
+ * not holding RCU protected references, so it makes sense to
+ * quiesce */
+ovsrcu_quiesce_start();
 ovs_mutex_cond_wait(&cond, &mutex);
+ovsrcu_quiesce_end();
 }
 ovs_mutex_unlock(&mutex);
 
-- 
2.1.4



[ovs-dev] [PATCH v8 10/16] dpif-netdev: Use hmap for ports.

2016-04-19 Thread Daniele Di Proietto
netdev objects are hard to use with RCU, because it's not possible to
split removal and reclamation.  Postponing the removal means that the
port is not removed and cannot be readded immediately.  Waiting for
reclamation means introducing a quiescent state, and that may introduce
subtle bugs, due to the RCU model we use in userspace.

This commit changes the port container from cmap to hmap.  'port_mutex'
must be held by readers and writers.  This shouldn't have performance
impact, as readers in the fast path use a thread local cache.

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 96 +--
 1 file changed, 57 insertions(+), 39 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index bd2249e..8cc37e2 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -195,9 +195,10 @@ struct dp_netdev {
 
 /* Ports.
  *
- * Protected by RCU.  Take the mutex to add or remove ports. */
+ * Any lookup into 'ports' or any access to the dp_netdev_ports found
+ * through 'ports' requires taking 'port_mutex'. */
 struct ovs_mutex port_mutex;
-struct cmap ports;
+struct hmap ports;
 struct seq *port_seq;   /* Incremented whenever a port changes. */
 
 /* Protects access to ofproto-dpif-upcall interface during revalidator
@@ -228,7 +229,8 @@ struct dp_netdev {
 };
 
 static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp,
-odp_port_t);
+odp_port_t)
+OVS_REQUIRES(dp->port_mutex);
 
 enum dp_stat_type {
 DP_STAT_EXACT_HIT,  /* Packets that had an exact match (emc). */
@@ -248,7 +250,7 @@ enum pmd_cycles_counter_type {
 struct dp_netdev_port {
 odp_port_t port_no;
 struct netdev *netdev;
-struct cmap_node node;  /* Node in dp_netdev's 'ports'. */
+struct hmap_node node;  /* Node in dp_netdev's 'ports'. */
 struct netdev_saved_flags *sf;
 unsigned n_rxq; /* Number of elements in 'rxq' */
 struct netdev_rxq **rxq;
@@ -476,9 +478,11 @@ struct dpif_netdev {
 };
 
 static int get_port_by_number(struct dp_netdev *dp, odp_port_t port_no,
-  struct dp_netdev_port **portp);
+  struct dp_netdev_port **portp)
+OVS_REQUIRES(dp->port_mutex);
 static int get_port_by_name(struct dp_netdev *dp, const char *devname,
-struct dp_netdev_port **portp);
+struct dp_netdev_port **portp)
+OVS_REQUIRES(dp->port_mutex);
 static void dp_netdev_free(struct dp_netdev *)
 OVS_REQUIRES(dp_netdev_mutex);
 static int do_add_port(struct dp_netdev *dp, const char *devname,
@@ -522,7 +526,8 @@ dp_netdev_add_rxq_to_pmd(struct dp_netdev_pmd_thread *pmd,
  struct dp_netdev_port *port, struct netdev_rxq *rx);
 static struct dp_netdev_pmd_thread *
 dp_netdev_less_loaded_pmd_on_numa(struct dp_netdev *dp, int numa_id);
-static void dp_netdev_reset_pmd_threads(struct dp_netdev *dp);
+static void dp_netdev_reset_pmd_threads(struct dp_netdev *dp)
+OVS_REQUIRES(dp->port_mutex);
 static bool dp_netdev_pmd_try_ref(struct dp_netdev_pmd_thread *pmd);
 static void dp_netdev_pmd_unref(struct dp_netdev_pmd_thread *pmd);
 static void dp_netdev_pmd_flow_flush(struct dp_netdev_pmd_thread *pmd);
@@ -913,7 +918,7 @@ create_dp_netdev(const char *name, const struct dpif_class *class,
 atomic_flag_clear(>destroyed);
 
 ovs_mutex_init(&dp->port_mutex);
-cmap_init(&dp->ports);
+hmap_init(&dp->ports);
 dp->port_seq = seq_create();
 fat_rwlock_init(&dp->upcall_rwlock);
 
@@ -984,7 +989,7 @@ static void
 dp_netdev_free(struct dp_netdev *dp)
 OVS_REQUIRES(dp_netdev_mutex)
 {
-struct dp_netdev_port *port;
+struct dp_netdev_port *port, *next;
 
 shash_find_and_delete(&dp_netdevs, dp->name);
 
@@ -993,15 +998,14 @@ dp_netdev_free(struct dp_netdev *dp)
 ovsthread_key_delete(dp->per_pmd_key);
 
 ovs_mutex_lock(&dp->port_mutex);
-CMAP_FOR_EACH (port, node, &dp->ports) {
-/* PMD threads are destroyed here. do_del_port() cannot quiesce */
+HMAP_FOR_EACH_SAFE (port, next, node, &dp->ports) {
 do_del_port(dp, port);
 }
 ovs_mutex_unlock(&dp->port_mutex);
 cmap_destroy(&dp->poll_threads);
 
 seq_destroy(dp->port_seq);
-cmap_destroy(&dp->ports);
+hmap_destroy(&dp->ports);
 ovs_mutex_destroy(&dp->port_mutex);
 
 /* Upcalls must be disabled at this point */
@@ -1222,7 +1226,7 @@ do_add_port(struct dp_netdev *dp, const char *devname, const char *type,
 return error;
 }
 
-cmap_insert(&dp->ports, &port->node, hash_port_no(port_no));
+hmap_insert(&dp->ports, &port->node, hash_port_no(port_no));
 
 dp_netdev_add_port_to_pmds(dp, port);
 seq_change(dp->port_seq);
@@ -1288,10 +1292,11 @@ is_valid_port_number(odp_port_t port_no)
 
 static struct dp_netdev_port *
 

[ovs-dev] [PATCH v8 07/16] hmap: Add HMAP_FOR_EACH_POP.

2016-04-19 Thread Daniele Di Proietto
Makes popping each member of the hmap a bit easier.

Signed-off-by: Daniele Di Proietto 
---
 lib/cfm.c|  5 ++---
 lib/hmap.h   |  4 
 lib/id-pool.c|  5 ++---
 lib/learning-switch.c|  5 ++---
 lib/netdev-linux.c   |  5 ++---
 lib/odp-util.c   |  7 +++
 ofproto/bond.c   | 10 --
 ofproto/in-band.c|  5 ++---
 ofproto/ofproto-dpif-ipfix.c |  5 ++---
 ofproto/ofproto-dpif-xlate.c |  5 ++---
 ofproto/ofproto.c|  5 ++---
 ofproto/pinsched.c   |  5 ++---
 ovn/controller-vtep/vtep.c   |  5 ++---
 ovn/controller/encaps.c  |  5 ++---
 ovn/controller/lport.c   |  5 ++---
 ovn/controller/ofctrl.c  |  5 ++---
 ovn/controller/physical.c|  4 +---
 ovn/controller/pinctrl.c |  5 ++---
 ovn/lib/expr.c   |  5 ++---
 ovn/northd/ovn-northd.c  | 10 --
 ovsdb/monitor.c  |  5 ++---
 ovsdb/row.c  |  5 ++---
 tests/library.at |  2 +-
 tests/test-hmap.c| 42 ++
 24 files changed, 93 insertions(+), 71 deletions(-)

diff --git a/lib/cfm.c b/lib/cfm.c
index cf1f725..fb077de 100644
--- a/lib/cfm.c
+++ b/lib/cfm.c
@@ -374,7 +374,7 @@ cfm_create(const struct netdev *netdev) OVS_EXCLUDED(mutex)
 void
 cfm_unref(struct cfm *cfm) OVS_EXCLUDED(mutex)
 {
-struct remote_mp *rmp, *rmp_next;
+struct remote_mp *rmp;
 
 if (!cfm) {
 return;
@@ -389,8 +389,7 @@ cfm_unref(struct cfm *cfm) OVS_EXCLUDED(mutex)
 hmap_remove(all_cfms, &cfm->hmap_node);
 ovs_mutex_unlock();
 
-HMAP_FOR_EACH_SAFE (rmp, rmp_next, node, &cfm->remote_mps) {
-hmap_remove(&cfm->remote_mps, &rmp->node);
+HMAP_FOR_EACH_POP (rmp, node, &cfm->remote_mps) {
 free(rmp);
 }
 
diff --git a/lib/hmap.h b/lib/hmap.h
index 53e75cc..08c4719 100644
--- a/lib/hmap.h
+++ b/lib/hmap.h
@@ -192,6 +192,10 @@ bool hmap_contains(const struct hmap *, const struct hmap_node *);
  __VA_ARGS__;   \
  (NODE != OBJECT_CONTAINING(NULL, NODE, MEMBER)) || (NODE = NULL); \
  ASSIGN_CONTAINER(NODE, hmap_next(HMAP, &(NODE)->MEMBER), MEMBER))
+#define HMAP_FOR_EACH_POP(NODE, MEMBER, HMAP)  \
+while (!hmap_is_empty(HMAP)\
+   && (INIT_CONTAINER(NODE, hmap_first(HMAP), MEMBER), 1)  \
+   && (hmap_remove(HMAP, &(NODE)->MEMBER), 1))
 
 static inline struct hmap_node *hmap_first(const struct hmap *);
 static inline struct hmap_node *hmap_next(const struct hmap *,
diff --git a/lib/id-pool.c b/lib/id-pool.c
index 6b93d37..f32c008 100644
--- a/lib/id-pool.c
+++ b/lib/id-pool.c
@@ -69,10 +69,9 @@ id_pool_init(struct id_pool *pool, uint32_t base, uint32_t n_ids)
 static void
 id_pool_uninit(struct id_pool *pool)
 {
-struct id_node *id_node, *next;
+struct id_node *id_node;
 
-HMAP_FOR_EACH_SAFE(id_node, next, node, &pool->map) {
-hmap_remove(&pool->map, &id_node->node);
+HMAP_FOR_EACH_POP(id_node, node, &pool->map) {
 free(id_node);
 }
 
diff --git a/lib/learning-switch.c b/lib/learning-switch.c
index c69ca4c..b420fe5 100644
--- a/lib/learning-switch.c
+++ b/lib/learning-switch.c
@@ -269,11 +269,10 @@ void
 lswitch_destroy(struct lswitch *sw)
 {
 if (sw) {
-struct lswitch_port *node, *next;
+struct lswitch_port *node;
 
 rconn_destroy(sw->rconn);
-HMAP_FOR_EACH_SAFE (node, next, hmap_node, &sw->queue_numbers) {
-hmap_remove(&sw->queue_numbers, &node->hmap_node);
+HMAP_FOR_EACH_POP (node, hmap_node, &sw->queue_numbers) {
 free(node);
 }
 shash_destroy(&sw->queue_names);
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index a7d7ac7..2c1ffec 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -3851,10 +3851,9 @@ static void
 htb_tc_destroy(struct tc *tc)
 {
 struct htb *htb = CONTAINER_OF(tc, struct htb, tc);
-struct htb_class *hc, *next;
+struct htb_class *hc;
 
-HMAP_FOR_EACH_SAFE (hc, next, tc_queue.hmap_node, &htb->tc.queues) {
-hmap_remove(&htb->tc.queues, &hc->tc_queue.hmap_node);
+HMAP_FOR_EACH_POP (hc, tc_queue.hmap_node, &htb->tc.queues) {
 free(hc);
 }
 tc_destroy(tc);
diff --git a/lib/odp-util.c b/lib/odp-util.c
index b4689cc..3c75379 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -2081,10 +2081,9 @@ odp_portno_names_get(const struct hmap *portno_names, odp_port_t port_no)
 void
 odp_portno_names_destroy(struct hmap *portno_names)
 {
-struct odp_portno_names *odp_portno_names, *odp_portno_names_next;
-HMAP_FOR_EACH_SAFE (odp_portno_names, odp_portno_names_next,
-hmap_node, portno_names) {
-hmap_remove(portno_names, &odp_portno_names->hmap_node);
+struct odp_portno_names *odp_portno_names;
+
+HMAP_FOR_EACH_POP (odp_portno_names, hmap_node, portno_names) {
 

[ovs-dev] [PATCH v8 09/16] hmap: Use struct for hmap_at_position().

2016-04-19 Thread Daniele Di Proietto
The interface will be more similar to the cmap.

Signed-off-by: Daniele Di Proietto 
---
 lib/hmap.c | 26 --
 lib/hmap.h |  7 ++-
 lib/sset.c | 12 +---
 lib/sset.h |  7 ++-
 ofproto/ofproto-dpif.c |  8 +++-
 5 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/lib/hmap.c b/lib/hmap.c
index b70ce51..9462c5e 100644
--- a/lib/hmap.c
+++ b/lib/hmap.c
@@ -236,24 +236,22 @@ hmap_random_node(const struct hmap *hmap)
 }
 
 /* Returns the next node in 'hmap' in hash order, or NULL if no nodes remain in
- * 'hmap'.  Uses '*bucketp' and '*offsetp' to determine where to begin
- * iteration, and stores new values to pass on the next iteration into them
- * before returning.
+ * 'hmap'.  Uses '*pos' to determine where to begin iteration, and updates
+ * '*pos' so it can be passed to the next iteration before returning.
  *
  * It's better to use plain HMAP_FOR_EACH and related functions, since they are
  * faster and better at dealing with hmaps that change during iteration.
  *
- * Before beginning iteration, store 0 into '*bucketp' and '*offsetp'.
- */
+ * Before beginning iteration, set '*pos' to all zeros. */
 struct hmap_node *
 hmap_at_position(const struct hmap *hmap,
- uint32_t *bucketp, uint32_t *offsetp)
+ struct hmap_position *pos)
 {
 size_t offset;
 size_t b_idx;
 
-offset = *offsetp;
-for (b_idx = *bucketp; b_idx <= hmap->mask; b_idx++) {
+offset = pos->offset;
+for (b_idx = pos->bucket; b_idx <= hmap->mask; b_idx++) {
 struct hmap_node *node;
 size_t n_idx;
 
@@ -261,11 +259,11 @@ hmap_at_position(const struct hmap *hmap,
  n_idx++, node = node->next) {
 if (n_idx == offset) {
 if (node->next) {
-*bucketp = node->hash & hmap->mask;
-*offsetp = offset + 1;
+pos->bucket = node->hash & hmap->mask;
+pos->offset = offset + 1;
 } else {
-*bucketp = (node->hash & hmap->mask) + 1;
-*offsetp = 0;
+pos->bucket = (node->hash & hmap->mask) + 1;
+pos->offset = 0;
 }
 return node;
 }
@@ -273,8 +271,8 @@ hmap_at_position(const struct hmap *hmap,
 offset = 0;
 }
 
-*bucketp = 0;
-*offsetp = 0;
+pos->bucket = 0;
+pos->offset = 0;
 return NULL;
 }
 
diff --git a/lib/hmap.h b/lib/hmap.h
index 08c4719..9a96c5f 100644
--- a/lib/hmap.h
+++ b/lib/hmap.h
@@ -201,8 +201,13 @@ static inline struct hmap_node *hmap_first(const struct hmap *);
 static inline struct hmap_node *hmap_next(const struct hmap *,
   const struct hmap_node *);
 
+struct hmap_position {
+unsigned int bucket;
+unsigned int offset;
+};
+
 struct hmap_node *hmap_at_position(const struct hmap *,
-   uint32_t *bucket, uint32_t *offset);
+   struct hmap_position *);
 
 /* Returns the number of nodes currently in 'hmap'. */
 static inline size_t
diff --git a/lib/sset.c b/lib/sset.c
index f9d4fc0..4fd3fae 100644
--- a/lib/sset.c
+++ b/lib/sset.c
@@ -251,21 +251,19 @@ sset_equals(const struct sset *a, const struct sset *b)
 }
 
 /* Returns the next node in 'set' in hash order, or NULL if no nodes remain in
- * 'set'.  Uses '*bucketp' and '*offsetp' to determine where to begin
- * iteration, and stores new values to pass on the next iteration into them
- * before returning.
+ * 'set'.  Uses '*pos' to determine where to begin iteration, and updates
+ * '*pos' so it can be passed to the next iteration before returning.
  *
  * It's better to use plain SSET_FOR_EACH and related functions, since they are
  * faster and better at dealing with ssets that change during iteration.
  *
- * Before beginning iteration, store 0 into '*bucketp' and '*offsetp'.
- */
+ * Before beginning iteration, set '*pos' to all zeros. */
 struct sset_node *
-sset_at_position(const struct sset *set, uint32_t *bucketp, uint32_t *offsetp)
+sset_at_position(const struct sset *set, struct sset_position *pos)
 {
 struct hmap_node *hmap_node;
 
-hmap_node = hmap_at_position(&set->map, bucketp, offsetp);
+hmap_node = hmap_at_position(&set->map, &pos->pos);
 return SSET_NODE_FROM_HMAP_NODE(hmap_node);
 }
 
diff --git a/lib/sset.h b/lib/sset.h
index 7d1d496..9c2f703 100644
--- a/lib/sset.h
+++ b/lib/sset.h
@@ -64,8 +64,13 @@ char *sset_pop(struct sset *);
 struct sset_node *sset_find(const struct sset *, const char *);
 bool sset_contains(const struct sset *, const char *);
 bool sset_equals(const struct sset *, const struct sset *);
+
+struct sset_position {
+struct hmap_position pos;
+};
+
 struct sset_node *sset_at_position(const struct sset *,
-   uint32_t *bucketp, uint32_t 

[ovs-dev] [PATCH v8 08/16] dpif-netdev: Add pmd thread local port cache for transmission.

2016-04-19 Thread Daniele Di Proietto
A future commit will stop using RCU for 'dp->ports' and use a mutex for
reading/writing them.  To avoid taking a mutex in dp_execute_cb(), which
is called in the fast path, this commit introduces a pmd thread local
cache of ports.

The downside is that every port add/remove now needs to synchronize with
every pmd thread.

Among the advantages, keeping a per thread port mapping could allow
greater control over the txq assignment.

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 249 +++---
 1 file changed, 179 insertions(+), 70 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index cedaf39..bd2249e 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -184,6 +184,7 @@ static bool dpcls_lookup(const struct dpcls *cls,
  *
  *dp_netdev_mutex (global)
  *port_mutex
+ *non_pmd_mutex
  */
 struct dp_netdev {
 const struct dpif_class *const class;
@@ -379,6 +380,13 @@ struct rxq_poll {
 struct ovs_list node;
 };
 
+/* Contained by struct dp_netdev_pmd_thread's 'port_cache' or 'tx_ports'. */
+struct tx_port {
+odp_port_t port_no;
+struct netdev *netdev;
+struct hmap_node node;
+};
+
 /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
  * the performance overhead of interrupt processing.  Therefore netdev can
  * not implement rx-wait for these devices.  dpif-netdev needs to poll
@@ -405,8 +413,8 @@ struct dp_netdev_pmd_thread {
 
 /* Per thread exact-match cache.  Note, the instance for cpu core
  * NON_PMD_CORE_ID can be accessed by multiple threads, and thusly
- * need to be protected (e.g. by 'dp_netdev_mutex').  All other
- * instances will only be accessed by its own pmd thread. */
+ * need to be protected by 'non_pmd_mutex'.  Every other instance
+ * will only be accessed by its own pmd thread. */
 struct emc_cache flow_cache;
 
 /* Classifier and Flow-Table.
@@ -435,10 +443,20 @@ struct dp_netdev_pmd_thread {
 atomic_int tx_qid;  /* Queue id used by this pmd thread to
  * send packets on all netdevs */
 
-struct ovs_mutex poll_mutex;/* Mutex for poll_list. */
+struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
 /* List of rx queues to poll. */
 struct ovs_list poll_list OVS_GUARDED;
-int poll_cnt;   /* Number of elemints in poll_list. */
+/* Number of elements in 'poll_list' */
+int poll_cnt;
+/* Map of 'tx_port's used for transmission.  Written by the main thread,
+ * read by the pmd thread. */
+struct hmap tx_ports OVS_GUARDED;
+
+/* Map of 'tx_port' used in the fast path. This is a thread-local copy of
+ * 'tx_ports'. The instance for cpu core NON_PMD_CORE_ID can be accessed
+ * by multiple threads, and thusly need to be protected by 'non_pmd_mutex'.
+ * Every other instance will only be accessed by its own pmd thread. */
+struct hmap port_cache;
 
 /* Only a pmd thread can write on its own 'cycles' and 'stats'.
  * The main thread keeps 'stats_zero' and 'cycles_zero' as base
@@ -494,7 +512,7 @@ dp_netdev_pmd_get_next(struct dp_netdev *dp, struct 
cmap_position *pos);
 static void dp_netdev_destroy_all_pmds(struct dp_netdev *dp);
 static void dp_netdev_del_pmds_on_numa(struct dp_netdev *dp, int numa_id);
 static void dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int numa_id);
-static void dp_netdev_pmd_clear_poll_list(struct dp_netdev_pmd_thread *pmd);
+static void dp_netdev_pmd_clear_ports(struct dp_netdev_pmd_thread *pmd);
 static void dp_netdev_del_port_from_all_pmds(struct dp_netdev *dp,
  struct dp_netdev_port *port);
 static void
@@ -508,6 +526,8 @@ static void dp_netdev_reset_pmd_threads(struct dp_netdev 
*dp);
 static bool dp_netdev_pmd_try_ref(struct dp_netdev_pmd_thread *pmd);
 static void dp_netdev_pmd_unref(struct dp_netdev_pmd_thread *pmd);
 static void dp_netdev_pmd_flow_flush(struct dp_netdev_pmd_thread *pmd);
+static void pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd)
+OVS_REQUIRES(pmd->port_mutex);
 
 static inline bool emc_entry_alive(struct emc_entry *ce);
 static void emc_clear_entry(struct emc_entry *ce);
@@ -690,7 +710,7 @@ pmd_info_show_rxq(struct ds *reply, struct 
dp_netdev_pmd_thread *pmd)
 ds_put_format(reply, "pmd thread numa_id %d core_id %u:\n",
   pmd->numa_id, pmd->core_id);
 
-ovs_mutex_lock(&pmd->poll_mutex);
+ovs_mutex_lock(>port_mutex);
 LIST_FOR_EACH (poll, node, &pmd->poll_list) {
 const char *name = netdev_get_name(poll->port->netdev);
 
@@ -704,7 +724,7 @@ pmd_info_show_rxq(struct ds *reply, struct 
dp_netdev_pmd_thread *pmd)
 ds_put_format(reply, " %d", netdev_rxq_get_queue_id(poll->rx));
 prev_name = name;
 }
-ovs_mutex_unlock(&pmd->poll_mutex);
+

[ovs-dev] [PATCH v8 06/16] dpif-netdev: Remove duplicate code in dp_netdev_set_pmds_on_numa().

2016-04-19 Thread Daniele Di Proietto
Instead of duplicating code to add ports in
dp_netdev_set_pmds_on_numa(), we can always use
dp_netdev_add_port_to_pmds__().

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 58 +--
 1 file changed, 22 insertions(+), 36 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index fbd23cf..cedaf39 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3113,13 +3113,12 @@ dp_netdev_add_port_to_pmds__(struct dp_netdev *dp, 
struct dp_netdev_port *port,
 
 /* Cannot create pmd threads for invalid numa node. */
 ovs_assert(ovs_numa_numa_id_is_valid(numa_id));
+dp_netdev_set_pmds_on_numa(dp, numa_id);
 
 for (i = 0; i < port->n_rxq; i++) {
 pmd = dp_netdev_less_loaded_pmd_on_numa(dp, numa_id);
 if (!pmd) {
-/* There is no pmd threads on this numa node. */
-dp_netdev_set_pmds_on_numa(dp, numa_id);
-/* Assigning of rx queues done. */
+VLOG_WARN("There's no pmd thread on numa node %d", numa_id);
 break;
 }
 
@@ -3158,9 +3157,9 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int 
numa_id)
 int n_pmds;
 
 if (!ovs_numa_numa_id_is_valid(numa_id)) {
-VLOG_ERR("Cannot create pmd threads due to numa id (%d)"
- "invalid", numa_id);
-return ;
+VLOG_WARN("Cannot create pmd threads due to numa id (%d) invalid",
+  numa_id);
+return;
 }
 
 n_pmds = get_n_pmd_threads_on_numa(dp, numa_id);
@@ -3169,46 +3168,25 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int 
numa_id)
  * in which 'netdev' is on, do nothing.  Else, creates the
  * pmd threads for the numa node. */
 if (!n_pmds) {
-int can_have, n_unpinned, i, index = 0;
-struct dp_netdev_pmd_thread **pmds;
-struct dp_netdev_port *port;
+int can_have, n_unpinned, i;
 
 n_unpinned = ovs_numa_get_n_unpinned_cores_on_numa(numa_id);
 if (!n_unpinned) {
-VLOG_ERR("Cannot create pmd threads due to out of unpinned "
- "cores on numa node %d", numa_id);
+VLOG_WARN("Cannot create pmd threads due to out of unpinned "
+  "cores on numa node %d", numa_id);
 return;
 }
 
 /* If cpu mask is specified, uses all unpinned cores, otherwise
  * tries creating NR_PMD_THREADS pmd threads. */
 can_have = dp->pmd_cmask ? n_unpinned : MIN(n_unpinned, 
NR_PMD_THREADS);
-pmds = xzalloc(can_have * sizeof *pmds);
 for (i = 0; i < can_have; i++) {
 unsigned core_id = ovs_numa_get_unpinned_core_on_numa(numa_id);
-pmds[i] = xzalloc(sizeof **pmds);
-dp_netdev_configure_pmd(pmds[i], dp, core_id, numa_id);
-}
+struct dp_netdev_pmd_thread *pmd = xzalloc(sizeof *pmd);
 
-/* Distributes rx queues of this numa node between new pmd threads. */
-CMAP_FOR_EACH (port, node, &dp->ports) {
-if (netdev_is_pmd(port->netdev)
-&& netdev_get_numa_id(port->netdev) == numa_id) {
-for (i = 0; i < port->n_rxq; i++) {
-/* Make thread-safety analyser happy. */
-ovs_mutex_lock(&pmds[index]->poll_mutex);
-dp_netdev_add_rxq_to_pmd(pmds[index], port, port->rxq[i]);
-ovs_mutex_unlock(&pmds[index]->poll_mutex);
-index = (index + 1) % can_have;
-}
-}
-}
-
-/* Actual start of pmd threads. */
-for (i = 0; i < can_have; i++) {
-pmds[i]->thread = ovs_thread_create("pmd", pmd_thread_main, 
pmds[i]);
+dp_netdev_configure_pmd(pmd, dp, core_id, numa_id);
+pmd->thread = ovs_thread_create("pmd", pmd_thread_main, pmd);
 }
-free(pmds);
 VLOG_INFO("Created %d pmd threads on numa node %d", can_have, numa_id);
 }
 }
@@ -3220,14 +3198,22 @@ static void
 dp_netdev_reset_pmd_threads(struct dp_netdev *dp)
 {
 struct dp_netdev_port *port;
+struct hmapx to_reload = HMAPX_INITIALIZER(&to_reload);
+struct hmapx_node *node;
 
 CMAP_FOR_EACH (port, node, &dp->ports) {
 if (netdev_is_pmd(port->netdev)) {
-int numa_id = netdev_get_numa_id(port->netdev);
-
-dp_netdev_set_pmds_on_numa(dp, numa_id);
+dp_netdev_add_port_to_pmds__(dp, port, &to_reload);
 }
 }
+
+HMAPX_FOR_EACH (node, &to_reload) {
+struct dp_netdev_pmd_thread *pmd;
+pmd = (struct dp_netdev_pmd_thread *) node->data;
+dp_netdev_reload_pmd__(pmd);
+}
+
+hmapx_destroy(&to_reload);
 }
 
 static char *
-- 
2.1.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v8 05/16] dpif-netdev: Fix race condition in pmd thread initialization.

2016-04-19 Thread Daniele Di Proietto
The pmds and the main threads are synchronized using a condition
variable.  The main thread writes a new configuration, then it waits on
the condition variable.  A pmd thread reads the new configuration, then
it calls signal() on the condition variable. To make sure that the pmds
and the main thread have a consistent view, each signal() should be
backed by a wait().

Currently the first signal() doesn't have a corresponding wait().  If
the pmd thread takes a long time to start and the signal() is received
by a later wait, the threads will have an inconsistent view.

The commit fixes the problem by removing the first signal() from the
pmd thread.

This is hardly a problem on current master, because the main thread
will call the first wait() a long time after the creation of a pmd
thread.  It becomes a problem with the following commits.
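The pairing invariant described above can be sketched with a plain pthreads condition variable (an illustrative model only; the struct and function names are simplified and do not match dpif-netdev's actual fields). The pmd only signals after consuming a configuration the main thread requested, so every `signal()` has a matching `wait()`:

```c
#include <pthread.h>

/* Model of the main/pmd reload handshake. */
struct handshake {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int requested_cfg;      /* Written by the main thread. */
    int applied_cfg;        /* Written by the pmd thread. */
};

static void *
pmd_main(void *h_)
{
    struct handshake *h = h_;

    pthread_mutex_lock(&h->mutex);
    while (h->applied_cfg < 2) {        /* Run until two reloads are done. */
        while (h->requested_cfg == h->applied_cfg) {
            pthread_cond_wait(&h->cond, &h->mutex);
        }
        h->applied_cfg = h->requested_cfg;  /* "Reload" the configuration. */
        pthread_cond_signal(&h->cond);      /* Paired with main's wait(). */
    }
    pthread_mutex_unlock(&h->mutex);
    return NULL;
}

static void
request_reload(struct handshake *h, int cfg)
{
    pthread_mutex_lock(&h->mutex);
    h->requested_cfg = cfg;
    pthread_cond_signal(&h->cond);
    while (h->applied_cfg != cfg) {     /* Wait for the pmd's signal. */
        pthread_cond_wait(&h->cond, &h->mutex);
    }
    pthread_mutex_unlock(&h->mutex);
}
```

An unsolicited first signal from the pmd, by contrast, would have no wait to pair with: if it arrived late it could satisfy a later `request_reload()` before the pmd had actually consumed that configuration, which is the race this patch removes.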

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c3be4eb..fbd23cf 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2651,21 +2651,22 @@ dpif_netdev_wait(struct dpif *dpif)
 
 static int
 pmd_load_queues(struct dp_netdev_pmd_thread *pmd, struct rxq_poll **ppoll_list)
-OVS_REQUIRES(pmd->poll_mutex)
 {
 struct rxq_poll *poll_list = *ppoll_list;
 struct rxq_poll *poll;
 int i;
 
+ovs_mutex_lock(&pmd->poll_mutex);
 poll_list = xrealloc(poll_list, pmd->poll_cnt * sizeof *poll_list);
 
 i = 0;
 LIST_FOR_EACH (poll, node, &pmd->poll_list) {
 poll_list[i++] = *poll;
 }
+ovs_mutex_unlock(&pmd->poll_mutex);
 
 *ppoll_list = poll_list;
-return pmd->poll_cnt;
+return i;
 }
 
 static void *
@@ -2675,6 +2676,7 @@ pmd_thread_main(void *f_)
 unsigned int lc = 0;
 struct rxq_poll *poll_list;
 unsigned int port_seq = PMD_INITIAL_SEQ;
+bool exiting;
 int poll_cnt;
 int i;
 
@@ -2684,13 +2686,10 @@ pmd_thread_main(void *f_)
 /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
 ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
 pmd_thread_setaffinity_cpu(pmd->core_id);
+poll_cnt = pmd_load_queues(pmd, &poll_list);
 reload:
 emc_cache_init(&pmd->flow_cache);
 
-ovs_mutex_lock(&pmd->poll_mutex);
-poll_cnt = pmd_load_queues(pmd, &poll_list);
-ovs_mutex_unlock(&pmd->poll_mutex);
-
 /* List port/core affinity */
 for (i = 0; i < poll_cnt; i++) {
VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
@@ -2698,10 +2697,6 @@ reload:
 netdev_rxq_get_queue_id(poll_list[i].rx));
 }
 
-/* Signal here to make sure the pmd finishes
- * reloading the updated configuration. */
-dp_netdev_pmd_reload_done(pmd);
-
 for (;;) {
 for (i = 0; i < poll_cnt; i++) {
 dp_netdev_process_rxq_port(pmd, poll_list[i].port, 
poll_list[i].rx);
@@ -2724,14 +2719,18 @@ reload:
 }
 }
 
+poll_cnt = pmd_load_queues(pmd, &poll_list);
+exiting = latch_is_set(&pmd->exit_latch);
+/* Signal here to make sure the pmd finishes
+ * reloading the updated configuration. */
+dp_netdev_pmd_reload_done(pmd);
+
 emc_cache_uninit(&pmd->flow_cache);
 
-if (!latch_is_set(&pmd->exit_latch)){
+if (!exiting) {
 goto reload;
 }
 
-dp_netdev_pmd_reload_done(pmd);
-
 free(poll_list);
 return NULL;
 }
-- 
2.1.4



[ovs-dev] [PATCH v8 03/16] dpif-netdev: Factor out port_create() from do_add_port().

2016-04-19 Thread Daniele Di Proietto
Instead of performing every operation inside do_add_port() it seems
clearer to introduce port_create(), since we already have
port_destroy().

No functional change.

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 69 ++-
 1 file changed, 43 insertions(+), 26 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 060f5e0..a224b43 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -1095,29 +1095,22 @@ hash_port_no(odp_port_t port_no)
 }
 
 static int
-do_add_port(struct dp_netdev *dp, const char *devname, const char *type,
-odp_port_t port_no)
-OVS_REQUIRES(dp->port_mutex)
+port_create(const char *devname, const char *open_type, const char *type,
+odp_port_t port_no, struct dp_netdev_port **portp)
 {
 struct netdev_saved_flags *sf;
 struct dp_netdev_port *port;
-struct netdev *netdev;
 enum netdev_flags flags;
-const char *open_type;
-int error = 0;
-int i, n_open_rxqs = 0;
+struct netdev *netdev;
+int n_open_rxqs = 0;
+int i, error;
 
-/* Reject devices already in 'dp'. */
-if (!get_port_by_name(dp, devname, &port)) {
-error = EEXIST;
-goto out;
-}
+*portp = NULL;
 
 /* Open and validate network device. */
-open_type = dpif_netdev_port_open_type(dp->class, type);
 error = netdev_open(devname, open_type, &netdev);
 if (error) {
-goto out;
+return error;
 }
 /* XXX reject non-Ethernet devices */
 
@@ -1125,7 +1118,7 @@ do_add_port(struct dp_netdev *dp, const char *devname, 
const char *type,
 if (flags & NETDEV_LOOPBACK) {
 VLOG_ERR("%s: cannot add a loopback device", devname);
 error = EINVAL;
-goto out_close;
+goto out;
 }
 
 if (netdev_is_pmd(netdev)) {
@@ -1134,7 +1127,7 @@ do_add_port(struct dp_netdev *dp, const char *devname, 
const char *type,
 if (n_cores == OVS_CORE_UNSPEC) {
 VLOG_ERR("%s, cannot get cpu core info", devname);
 error = ENOENT;
-goto out_close;
+goto out;
 }
 /* There can only be ovs_numa_get_n_cores() pmd threads,
  * so creates a txq for each, and one extra for the non
@@ -1143,14 +1136,14 @@ do_add_port(struct dp_netdev *dp, const char *devname, 
const char *type,
   netdev_requested_n_rxq(netdev));
 if (error && (error != EOPNOTSUPP)) {
 VLOG_ERR("%s, cannot set multiq", devname);
-goto out_close;
+goto out;
 }
 }
 port = xzalloc(sizeof *port);
 port->port_no = port_no;
 port->netdev = netdev;
 port->n_rxq = netdev_n_rxq(netdev);
-port->rxq = xmalloc(sizeof *port->rxq * port->n_rxq);
+port->rxq = xcalloc(port->n_rxq, sizeof *port->rxq);
 port->type = xstrdup(type);
 port->latest_requested_n_rxq = netdev_requested_n_rxq(netdev);
 
@@ -1170,12 +1163,7 @@ do_add_port(struct dp_netdev *dp, const char *devname, 
const char *type,
 }
 port->sf = sf;
 
-cmap_insert(&dp->ports, &port->node, hash_port_no(port_no));
-
-if (netdev_is_pmd(netdev)) {
-dp_netdev_add_port_to_pmds(dp, port);
-}
-seq_change(dp->port_seq);
+*portp = port;
 
 return 0;
 
@@ -1186,13 +1174,42 @@ out_rxq_close:
 free(port->type);
 free(port->rxq);
 free(port);
-out_close:
-netdev_close(netdev);
+
 out:
+netdev_close(netdev);
 return error;
 }
 
 static int
+do_add_port(struct dp_netdev *dp, const char *devname, const char *type,
+odp_port_t port_no)
+OVS_REQUIRES(dp->port_mutex)
+{
+struct dp_netdev_port *port;
+int error;
+
+/* Reject devices already in 'dp'. */
+if (!get_port_by_name(dp, devname, &port)) {
+return EEXIST;
+}
+
+error = port_create(devname, dpif_netdev_port_open_type(dp->class, type),
+type, port_no, &port);
+if (error) {
+return error;
+}
+
+cmap_insert(&dp->ports, &port->node, hash_port_no(port_no));
+
+if (netdev_is_pmd(port->netdev)) {
+dp_netdev_add_port_to_pmds(dp, port);
+}
+seq_change(dp->port_seq);
+
+return 0;
+}
+
+static int
 dpif_netdev_port_add(struct dpif *dpif, struct netdev *netdev,
  odp_port_t *port_nop)
 {
-- 
2.1.4



[ovs-dev] [PATCH v8 04/16] dpif-netdev: Add functions to modify rxq without reloading pmd threads.

2016-04-19 Thread Daniele Di Proietto
This commit introduces some functions to add/remove rxqs from pmd
threads without reloading them.  They will be used by next commits.

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 77 ---
 1 file changed, 56 insertions(+), 21 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index a224b43..c3be4eb 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -495,8 +495,6 @@ static void dp_netdev_destroy_all_pmds(struct dp_netdev 
*dp);
 static void dp_netdev_del_pmds_on_numa(struct dp_netdev *dp, int numa_id);
 static void dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int numa_id);
 static void dp_netdev_pmd_clear_poll_list(struct dp_netdev_pmd_thread *pmd);
-static void dp_netdev_del_port_from_pmd(struct dp_netdev_port *port,
-struct dp_netdev_pmd_thread *pmd);
 static void dp_netdev_del_port_from_all_pmds(struct dp_netdev *dp,
  struct dp_netdev_port *port);
 static void
@@ -3002,11 +3000,11 @@ dp_netdev_pmd_clear_poll_list(struct 
dp_netdev_pmd_thread *pmd)
 ovs_mutex_unlock(&pmd->poll_mutex);
 }
 
-/* Deletes all rx queues of 'port' from poll_list of pmd thread and
- * reloads it if poll_list was changed. */
-static void
-dp_netdev_del_port_from_pmd(struct dp_netdev_port *port,
-struct dp_netdev_pmd_thread *pmd)
+/* Deletes all rx queues of 'port' from poll_list of pmd thread.  Returns true
+ * if 'port' was found in 'pmd' (therefore a restart is required). */
+static bool
+dp_netdev_del_port_from_pmd__(struct dp_netdev_port *port,
+  struct dp_netdev_pmd_thread *pmd)
 {
 struct rxq_poll *poll, *next;
 bool found = false;
@@ -3021,8 +3019,30 @@ dp_netdev_del_port_from_pmd(struct dp_netdev_port *port,
 }
 }
 ovs_mutex_unlock(&pmd->poll_mutex);
-if (found) {
-dp_netdev_reload_pmd__(pmd);
+
+return found;
+}
+
+/* Deletes all rx queues of 'port' from all pmd threads.  The pmd threads that
+ * need to be restarted are inserted in 'to_reload'. */
+static void
+dp_netdev_del_port_from_all_pmds__(struct dp_netdev *dp,
+   struct dp_netdev_port *port,
+   struct hmapx *to_reload)
+{
+int numa_id = netdev_get_numa_id(port->netdev);
+struct dp_netdev_pmd_thread *pmd;
+
+CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
+if (pmd->numa_id == numa_id) {
+bool found;
+
+found = dp_netdev_del_port_from_pmd__(port, pmd);
+
+if (found) {
+hmapx_add(to_reload, pmd);
+}
+   }
 }
 }
 
@@ -3032,16 +3052,21 @@ static void
 dp_netdev_del_port_from_all_pmds(struct dp_netdev *dp,
  struct dp_netdev_port *port)
 {
-int numa_id = netdev_get_numa_id(port->netdev);
 struct dp_netdev_pmd_thread *pmd;
+struct hmapx to_reload = HMAPX_INITIALIZER(&to_reload);
+struct hmapx_node *node;
 
-CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
-if (pmd->numa_id == numa_id) {
-dp_netdev_del_port_from_pmd(port, pmd);
-   }
+dp_netdev_del_port_from_all_pmds__(dp, port, &to_reload);
+
+HMAPX_FOR_EACH (node, &to_reload) {
+pmd = (struct dp_netdev_pmd_thread *) node->data;
+dp_netdev_reload_pmd__(pmd);
 }
+
+hmapx_destroy(&to_reload);
 }
 
+
 /* Returns PMD thread from this numa node with fewer rx queues to poll.
  * Returns NULL if there is no PMD threads on this numa node.
  * Can be called safely only by main thread. */
@@ -3077,18 +3102,16 @@ dp_netdev_add_rxq_to_pmd(struct dp_netdev_pmd_thread 
*pmd,
 pmd->poll_cnt++;
 }
 
-/* Distributes all rx queues of 'port' between all PMD threads and reloads
- * them if needed. */
+/* Distributes all rx queues of 'port' between all PMD threads in 'dp'. The
+ * pmd threads that need to be restarted are inserted in 'to_reload'. */
 static void
-dp_netdev_add_port_to_pmds(struct dp_netdev *dp, struct dp_netdev_port *port)
+dp_netdev_add_port_to_pmds__(struct dp_netdev *dp, struct dp_netdev_port *port,
+ struct hmapx *to_reload)
 {
 int numa_id = netdev_get_numa_id(port->netdev);
 struct dp_netdev_pmd_thread *pmd;
-struct hmapx to_reload;
-struct hmapx_node *node;
 int i;
 
-hmapx_init(&to_reload);
 /* Cannot create pmd threads for invalid numa node. */
 ovs_assert(ovs_numa_numa_id_is_valid(numa_id));
 
@@ -3105,8 +3128,20 @@ dp_netdev_add_port_to_pmds(struct dp_netdev *dp, struct 
dp_netdev_port *port)
 dp_netdev_add_rxq_to_pmd(pmd, port, port->rxq[i]);
 ovs_mutex_unlock(&pmd->poll_mutex);
 
-hmapx_add(&to_reload, pmd);
+hmapx_add(to_reload, pmd);
 }
+}
+
+/* Distributes all rx queues of 'port' between all PMD threads in 'dp' and
+ * reloads them, if needed. */
+static void
+dp_netdev_add_port_to_pmds(struct 

[ovs-dev] [PATCH v8 01/16] dpif-netdev: Destroy 'port_mutex' in dp_netdev_free().

2016-04-19 Thread Daniele Di Proietto
Found by inspection.

Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1e8a37c..24717cc 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -986,6 +986,7 @@ dp_netdev_free(struct dp_netdev *dp)
 
 seq_destroy(dp->port_seq);
 cmap_destroy(&dp->ports);
+ovs_mutex_destroy(&dp->port_mutex);
 
 /* Upcalls must be disabled at this point */
 dp_netdev_destroy_upcall_lock(dp);
-- 
2.1.4



[ovs-dev] [PATCH v8 00/16] Reconfigure netdev at runtime

2016-04-19 Thread Daniele Di Proietto
Currently we treat set_multiq() calls specially in netdev and dpif-netdev:
every pmd thread must be stopped and set_multiq() is allowed to destroy and
recreate the device.

I think we can improve this by:
* Generalizing the mechanism to allow changing other parameters at runtime
  (such as MTU).
* Involving the above layer (dpif-netdev) less.  The request for changes
  often comes from below (netdev_dpdk_set_config(), or the vhost new_device()
  callback).  There's no need for dpif-netdev to remember the requested value,
  all that it needs to know is that a configuration change is requested.

This series implements exactly this: a mechanism to allow a netdev provider
to request configuration changes, to which dpif-netdev will respond by
stopping rx/tx and calling a netdev function to apply the new configuration.
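The request mechanism can be sketched as follows (illustrative only; `fake_netdev` and the helper names below are invented for the sketch, not the actual netdev provider API). A provider bumps a sequence number to request a change; the datapath notices the mismatch, quiesces rx/tx for the device, and applies it:

```c
#include <stdbool.h>

/* Toy model of a reconfiguration request protocol. */
struct fake_netdev {
    int n_txq;                  /* Applied configuration. */
    int requested_n_txq;        /* Set by the provider at any time. */
    unsigned int change_seq;    /* Bumped on every request. */
    unsigned int applied_seq;   /* Last sequence the datapath applied. */
};

/* Called from below (provider code), e.g. on a vhost new_device() event. */
static void
netdev_request_reconfigure(struct fake_netdev *dev, int n_txq)
{
    dev->requested_n_txq = n_txq;
    dev->change_seq++;          /* The datapath will spot the mismatch. */
}

/* Polled by the datapath in its main loop. */
static bool
netdev_is_reconf_required(const struct fake_netdev *dev)
{
    return dev->change_seq != dev->applied_seq;
}

/* Called by the datapath after stopping rx/tx for this device. */
static void
netdev_reconfigure(struct fake_netdev *dev)
{
    /* Caller guarantees no send/recv is in flight on 'dev'. */
    dev->n_txq = dev->requested_n_txq;
    dev->applied_seq = dev->change_seq;
}
```

The point of the design is visible in the split: dpif-netdev never stores the requested value itself, it only learns that *some* change is pending and hands control back to the provider to apply it.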

The new mechanism is used in this series to replace the set_multiq() call,
but the idea is to use it also at least for:

* Changing the MTU at runtime
* Automatically detecting the number of rx queues for a vhost-user device
* Move a DPDK vhost device to the proper NUMA socket

The first commits refactor some code in dpif-netdev and, most importantly,
avoid using RCU for ports.  Each thread will have its local copy of all the
ports in the datapath.

The series is also available here:

https://github.com/ddiproietto/ovs/tree/configchangesv8

v8:
* Update comment in rcu.h: ovs_mutex_cond_wait doesn't quiesce.
* Change 'set_multiq' to 'set_tx_multiq'.
* Added documentation in comments and commit messages explaining thread local
  port cache.
* Fixed style issues reported by checkpatch.py.
* Fixed race condition when deleting pmd thread.

v7:
* Dropped already applied patches.
* Stop using RCU for ports.
* Rebased against master.

v6:
* Rebased against master.
* Check return value of netdev_rxq_open().
* Fix comment.

v5:
* Style fixes.
* Fixed a bug in dp_netdev_free() in patch 6.

v4:
* Added another patch to uniform names of variables in netdev-dpdk (no
  functional change)
* Update some netdev comments to document the relation between
  netdev_set_multiq() and netdev_reconfigure()
* Clarify that when netdev_reconfigure() is called no call to netdev_send()
  or netdev_rxq_recv() must be issued.
* Move check to skip reconfiguration in netdev_dpdk_reconfigure() before
  rte_eth_dev_stop().

v3:
* Fixed another outdated comment about rx queue configuration, as pointed out
  by Mark
* Removed unnecessary and buggy initialization of requested_n_rxq in
  reconfigure_pmd_threads().
* Removed unused 'err' variable in netdev_dpdk_set_multiq().
* Changed comparison in netdev_set_multiq() to use previous
  'netdev->requested_n_txq' instead of 'netdev->up.n_txq'
* Return immediately in netdev_dpdk_reconfigure() if configuration didn't
  change anything.

v2:
* Fixed do_add_port(): we have to call netdev_reconfigure() before opening
  the rxqs.  This prevents memory leaks, and makes sure that the datapath
  polls the appropriate number of queues
* Fixed netdev_dpdk_vhost_set_multiq(): it must call
  netdev_request_reconfigure(). Since it is now equal to
  netdev_dpdk_set_multiq(), the two functions have been merged.
* Fixed netdev_dpdk_set_config(): dev->requested_n_rxq is now accessed
  while holding the appropriate mutex.
* Fixed some outdated comments about rx queue configuration.


Daniele Di Proietto (16):
  dpif-netdev: Destroy 'port_mutex' in dp_netdev_free().
  dpif-netdev: Remove unused 'index' in dp_netdev_pmd_thread.
  dpif-netdev: Factor out port_create() from do_add_port().
  dpif-netdev: Add functions to modify rxq without reloading pmd
threads.
  dpif-netdev: Fix race condition in pmd thread initialization.
  dpif-netdev: Remove duplicate code in dp_netdev_set_pmds_on_numa().
  hmap: Add HMAP_FOR_EACH_POP.
  dpif-netdev: Add pmd thread local port cache for transmission.
  hmap: Use struct for hmap_at_position().
  dpif-netdev: Use hmap for ports.
  ovs-thread: Do not quiesce in ovs_mutex_cond_wait().
  ofproto-dpif: Call dpif_poll_threads_set() before dpif_run().
  dpif-netdev: Change pmd thread configuration in dpif_netdev_run().
  dpif-netdev: Handle errors in reconfigure_pmd_threads().
  netdev: Add reconfigure request mechanism.
  netdev-dpdk: Use ->reconfigure() call to change rx/tx queues.

 lib/cfm.c|   5 +-
 lib/dpif-netdev.c| 702 +++
 lib/dpif-provider.h  |   3 +-
 lib/hmap.c   |  26 +-
 lib/hmap.h   |  11 +-
 lib/id-pool.c|   5 +-
 lib/learning-switch.c|   5 +-
 lib/netdev-bsd.c |   3 +-
 lib/netdev-dpdk.c| 194 ++--
 lib/netdev-dummy.c   |   3 +-
 lib/netdev-linux.c   |   8 +-
 lib/netdev-provider.h|  50 ++-
 lib/netdev-vport.c   |   3 +-
 lib/netdev.c |  75 +++--
 lib/netdev.h |   7 +-
 lib/odp-util.c   |   7 +-
 lib/ovs-rcu.h| 

[ovs-dev] [PATCH v8 02/16] dpif-netdev: Remove unused 'index' in dp_netdev_pmd_thread.

2016-04-19 Thread Daniele Di Proietto
Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 24717cc..060f5e0 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -430,8 +430,6 @@ struct dp_netdev_pmd_thread {
 struct latch exit_latch;/* For terminating the pmd thread. */
 atomic_uint change_seq; /* For reloading pmd ports. */
 pthread_t thread;
-int index;  /* Idx of this pmd thread among pmd*/
-/* threads on same numa node. */
 unsigned core_id;   /* CPU core id of this pmd thread. */
 int numa_id;/* numa node id of this pmd thread. */
 atomic_int tx_qid;  /* Queue id used by this pmd thread to
@@ -485,8 +483,8 @@ static void dp_netdev_recirculate(struct 
dp_netdev_pmd_thread *,
 static void dp_netdev_disable_upcall(struct dp_netdev *);
 static void dp_netdev_pmd_reload_done(struct dp_netdev_pmd_thread *pmd);
 static void dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd,
-struct dp_netdev *dp, int index,
-unsigned core_id, int numa_id);
+struct dp_netdev *dp, unsigned core_id,
+int numa_id);
 static void dp_netdev_destroy_pmd(struct dp_netdev_pmd_thread *pmd);
 static void dp_netdev_set_nonpmd(struct dp_netdev *dp);
 static struct dp_netdev_pmd_thread *dp_netdev_get_pmd(struct dp_netdev *dp,
@@ -2787,8 +2785,7 @@ dp_netdev_set_nonpmd(struct dp_netdev *dp)
 struct dp_netdev_pmd_thread *non_pmd;
 
 non_pmd = xzalloc(sizeof *non_pmd);
-dp_netdev_configure_pmd(non_pmd, dp, 0, NON_PMD_CORE_ID,
-OVS_NUMA_UNSPEC);
+dp_netdev_configure_pmd(non_pmd, dp, NON_PMD_CORE_ID, OVS_NUMA_UNSPEC);
 }
 
 /* Caller must have valid pointer to 'pmd'. */
@@ -2829,10 +2826,9 @@ dp_netdev_pmd_get_next(struct dp_netdev *dp, struct 
cmap_position *pos)
 /* Configures the 'pmd' based on the input argument. */
 static void
 dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
-int index, unsigned core_id, int numa_id)
+unsigned core_id, int numa_id)
 {
 pmd->dp = dp;
-pmd->index = index;
 pmd->core_id = core_id;
 pmd->numa_id = numa_id;
 pmd->poll_cnt = 0;
@@ -3140,7 +3136,7 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int 
numa_id)
 for (i = 0; i < can_have; i++) {
 unsigned core_id = ovs_numa_get_unpinned_core_on_numa(numa_id);
 pmds[i] = xzalloc(sizeof **pmds);
-dp_netdev_configure_pmd(pmds[i], dp, i, core_id, numa_id);
+dp_netdev_configure_pmd(pmds[i], dp, core_id, numa_id);
 }
 
 /* Distributes rx queues of this numa node between new pmd threads. */
-- 
2.1.4



Re: [ovs-dev] [PATCH v2 14/15] system-tests: Add tcp simple test.

2016-04-19 Thread Joe Stringer
On 15 April 2016 at 17:02, Daniele Di Proietto  wrote:
> Useful to test the datapath ability to forward tcp packets without the
> complexity of connection tracking.
>
> Signed-off-by: Daniele Di Proietto 

I think we could do with a few more of these incremental steps on the
way up to the more complex configurations in the existing tests. This
is a step in the right direction.

Acked-by: Joe Stringer 


Re: [ovs-dev] [PATCH v2 13/15] system-tests: Disable offloads in userspace tests.

2016-04-19 Thread Joe Stringer
On 15 April 2016 at 17:02, Daniele Di Proietto  wrote:
> The system userspace testsuite uses the userspace datapath with
> netdev-linux devices, connected to veth pairs with the AF_PACKET socket:
>
>  (veth pair) (AF_PACKET)
> TCP stack -> p0 ---> ovs-p0  -> netdev-linux (userspace OVS)
>
> Unfortunately this configuration has some problems with offloads: a
> packet generated by the TCP stack may be sent to p0 without being
> checksummed or segmented. The AF_PACKET socket, by default, ignores the
> offloads and just transmits the data of the packets to userspace, but:
>
> 1. The packet may need GSO, so the data will be too big to be received
>by the userspace datapath
> 2. The packet might have incomplete checksums, so it will likely be
>discarded by the receiver.
>
> Problem 1 causes TCP connections to see a congestion window smaller than
> the MTU, which hurts performance but doesn't prevent communication.
>
> Problem 2 was hidden in the testsuite by a Linux kernel bug, fixed by
> commit ce8c839b74e3("veth: don’t modify ip_summed; doing so treats
> packets with bad checksums as good").  In the kernels that include the
> fix, the userspace datapath is able to process pings, but not tcp or udp
> data.
>
> Unfortunately I couldn't find a way to ask the AF_PACKET to perform
> offloads in kernel.  A possible fix would be to use the PACKET_VNET_HDR
> sockopt and perform the offloads in userspace.
>
> Until a proper fix is worked out for netdev-linux, this commit disables
> offloads on the non-OVS side of the veth pair, as a workaround.
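The workaround amounts to a configuration fragment along these lines, using the `p0` interface name from the diagram above (the exact set of offload flags supported varies by kernel and ethtool version, so this is a sketch rather than the literal commands in the patch):

```shell
# Disable tx checksum offload and segmentation offloads on the non-OVS end
# of the veth pair, so packets reach the AF_PACKET socket already
# checksummed and no larger than the MTU.
ethtool -K p0 tx off tso off gso off gro off
```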
>
> Signed-off-by: Daniele Di Proietto 

Thanks for the fix. As I understand this simply fixes the userspace
testsuite when running on newer kernels today, so we could apply this
independently of the rest of the series.

Acked-by: Joe Stringer 


Re: [ovs-dev] [PATCH v2 10/15] dpif-netdev: Implement conntrack dump functions.

2016-04-19 Thread Joe Stringer
On 15 April 2016 at 17:02, Daniele Di Proietto  wrote:
> Signed-off-by: Daniele Di Proietto 

Could be combined with the earlier patch that provides the actual
implementation?


Re: [ovs-dev] [PATCH v2 09/15] dpif-netdev: Execute conntrack action.

2016-04-19 Thread Joe Stringer
On 15 April 2016 at 17:02, Daniele Di Proietto  wrote:
> This commit implements the OVS_ACTION_ATTR_CT action in dpif-netdev.
>
> To allow ofproto-dpif to detect the conntrack feature, flow_put will not
> discard anymore flows with ct_* fields set. We still shouldn't allow
> flows with NAT bits set, since there is no support for NAT.
>
> Signed-off-by: Daniele Di Proietto 
> ---

...

> @@ -2604,6 +2617,9 @@ dpif_netdev_run(struct dpif *dpif)
>  ovs_mutex_unlock(&dp->non_pmd_mutex);
>  dp_netdev_pmd_unref(non_pmd);
>
> +/* XXX: If workload is too heavy we could add a separate thread. */
> +conntrack_run(&dp->conntrack);
> +

Have you tried, eg, portscanning with several threads against the
implementation to see how bad it gets?


> @@ -3850,12 +3866,48 @@ dp_execute_cb(void *aux_, struct dp_packet **packets, 
> int cnt,
>  VLOG_WARN("Packet dropped. Max recirculation depth exceeded.");
>  break;
>
> -case OVS_ACTION_ATTR_CT:
> -/* If a flow with this action is slow-pathed, datapath assistance is
> - * required to implement it. However, we don't support this action
> - * in the userspace datapath. */
> -VLOG_WARN("Cannot execute conntrack action in userspace.");
> +case OVS_ACTION_ATTR_CT: {
> +const struct nlattr *b;
> +bool commit = false;
> +unsigned int left;
> +uint16_t zone = 0;
> +const char *helper = NULL;
> +const uint32_t *setmark = NULL;
> +const struct ovs_key_ct_labels *setlabel = NULL;
> +
> +
> +/* XXX parsing this every time is expensive.  We should do as the
> + * kernel does and create a structure. */

Seems reasonable. You say it's expensive (how expensive?), but it also
seems a little cleaner to store it in a more accessible manner.


Re: [ovs-dev] [PATCH v2 08/15] conntrack: Implement dumping to ct_entry.

2016-04-19 Thread Joe Stringer
On 15 April 2016 at 17:02, Daniele Di Proietto  wrote:
> Signed-off-by: Daniele Di Proietto 



> +static void
> +conn_key_to_tuple(const struct conn_key *key, struct ct_dpif_tuple *tuple)
> +{
> +if (key->dl_type == htons(ETH_TYPE_IP)) {
> +tuple->l3_type = AF_INET;
> +} else if (key->dl_type == htons(ETH_TYPE_IPV6)) {
> +tuple->l3_type = AF_INET6;
> +}
> +tuple->ip_proto = key->nw_proto;
> +ct_endpoint_to_ct_dpif_inet_addr(&key->src.addr, &tuple->src,
> + key->dl_type);
> +ct_endpoint_to_ct_dpif_inet_addr(&key->dst.addr, &tuple->dst,
> + key->dl_type);
> +
> +if (key->nw_proto == IPPROTO_ICMP) {
> +tuple->icmp_id = key->src.port;
> +/* ICMP type and code are not tracked */
> +tuple->icmp_type = 0;
> +tuple->icmp_code = 0;
> +} else {
> +tuple->src_port = key->src.port;
> +tuple->dst_port = key->dst.port;
> +}

I think in the Linux implementation, the original ICMP message's icmp
type/code are stored/provided here. Might be a behaviour parity
question.

> +}
> +
> +static void
> +conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry,
> +  long long now)
> +{
> +struct ct_l4_proto *class;
> +long long expiration;
> +memset(entry, 0, sizeof *entry);
> +conn_key_to_tuple(&conn->key, &entry->tuple_orig);
> +conn_key_to_tuple(&conn->rev_key, &entry->tuple_reply);
> +
> +entry->zone = conn->key.zone;
> +entry->mark = conn->mark;
> +
> +memcpy(&entry->labels, &conn->label, sizeof(entry->labels));
> +/* Not implemented yet */
> +entry->timestamp.start = 0;
> +entry->timestamp.stop = 0;

Is it better to reflect that timestamps are unsupported up to the user
or simply report 0 when it's unsupported? (I don't have a particular
preference, just asking the question)


Re: [ovs-dev] [PATCH v2 07/15] conntrack: Implement flush function.

2016-04-19 Thread Joe Stringer
On 15 April 2016 at 17:02, Daniele Di Proietto  wrote:
> Signed-off-by: Daniele Di Proietto 

Looks like this can be combined with patch #11 (to plumb it up through appctl).

Acked-by: Joe Stringer 


Re: [ovs-dev] [PATCH v2 04/15] conntrack: New userspace connection tracker.

2016-04-19 Thread Joe Stringer
On 15 April 2016 at 17:02, Daniele Di Proietto  wrote:
> This commit adds the conntrack module.
>
> It is a connection tracker that resides entirely in userspace.  Its
> primary user will be the dpif-netdev datapath.
>
> The module's main goal is to provide conntrack_execute(), which offers a
> convenient interface to implement the datapath ct() action.
>
> The conntrack module uses two submodules to deal with the l4 protocol
> details (conntrack-other for UDP and ICMP, conntrack-tcp for TCP).
>
> The conntrack-tcp submodule implementation is adapted from FreeBSD's pf
> subsystem, therefore it's BSD licensed.  It has been slightly altered to
> match the OVS coding style and to allow the pickup of already
> established connections.
>
> Signed-off-by: Daniele Di Proietto 

Thanks for submitting this, I know there's a few people excited about
this feature.

I notice that there is no NEWS item about this, were you intending to
hold off on announcing it until there is feature parity with the
kernel, eg support for IPv[46] fragments; ALGs; NAT?


> diff --git a/lib/conntrack-other.c b/lib/conntrack-other.c
> new file mode 100644
> index 000..65d02a9
> --- /dev/null
> +++ b/lib/conntrack-other.c
> @@ -0,0 +1,91 @@
> +/*
> + * Copyright (c) 2015, 2016 Nicira, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include 
> +
> +#include "conntrack-private.h"
> +#include "dp-packet.h"
> +
> +enum other_state {
> +OTHERS_FIRST,
> +OTHERS_MULTIPLE,
> +OTHERS_BIDIR,
> +};
> +
> +struct conn_other {
> +struct conn up;
> +enum other_state state;
> +};
> +
> +static const long long other_timeouts[] = {
> +[OTHERS_FIRST] = 60 * 1000,
> +[OTHERS_MULTIPLE] = 60 * 1000,
> +[OTHERS_BIDIR] = 30 * 1000,
> +};

Are these timeouts in milliseconds? (a comment or #define might help with that)
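The values do appear to be seconds converted to milliseconds (60 s, 60 s, 30 s). One hypothetical way to make the unit self-documenting, along the lines suggested above (the macro and identifier names here are illustrative, not from the patch):

```c
/* Illustrative sketch: expressing the timeouts through a named
 * conversion macro makes the millisecond unit explicit. */
#define SEC_TO_MSEC(SEC) ((SEC) * 1000LL)

enum other_state_example {
    EX_OTHERS_FIRST,
    EX_OTHERS_MULTIPLE,
    EX_OTHERS_BIDIR,
};

/* Timeouts in milliseconds, indexed by connection state. */
static const long long other_timeouts_example[] = {
    [EX_OTHERS_FIRST]    = SEC_TO_MSEC(60),   /* 60 s */
    [EX_OTHERS_MULTIPLE] = SEC_TO_MSEC(60),   /* 60 s */
    [EX_OTHERS_BIDIR]    = SEC_TO_MSEC(30),   /* 30 s */
};
```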



> diff --git a/lib/conntrack.c b/lib/conntrack.c
> new file mode 100644
> index 000..840335b
> --- /dev/null
> +++ b/lib/conntrack.c
...
> +static struct ct_l4_proto *l4_protos[] = {
> +[IPPROTO_TCP] = &ct_proto_tcp,
> +[IPPROTO_UDP] = &ct_proto_other,
> +[IPPROTO_ICMP] = &ct_proto_other,
> +};

I see that conntrack-other is shared by UDP and ICMP. In the Linux
datapath, the UDP handler also checks UDP length and UDP checksum - I
wonder if we can share most of the code here for these protocols but
add these additional checks for the UDP case?
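For reference, a minimal sketch of the kind of UDP length sanity check the Linux datapath performs before accepting a packet into tracking; the function name and exact checks are illustrative assumptions, not OVS code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UDP_HEADER_LEN_EX 8   /* Fixed size of the UDP header. */

/* A UDP-specific valid_new() could run a check like this before
 * falling through to the shared "other" state machine.
 * 'udp_len_host' is the UDP header's length field in host byte order;
 * 'l4_size' is the number of L4 bytes actually present in the packet. */
static bool
udp_length_ok(uint16_t udp_len_host, size_t l4_size)
{
    /* Must cover at least the header, and must not claim more bytes
     * than the packet actually carries. */
    return udp_len_host >= UDP_HEADER_LEN_EX && udp_len_host <= l4_size;
}
```

A checksum verification step would follow the same shape: shared code for the state machine, with the extra validation gated on the protocol being UDP.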

> +static struct conn *
> +conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
> +   struct conn_lookup_ctx *ctx, uint8_t *state, bool commit,
> +   long long now)
> +{
> +unsigned bucket = hash_to_bucket(ctx->hash);
> +struct conn *nc = NULL;
> +
> +if (!valid_new(pkt, >key)) {
> +*state |= CS_INVALID;
> +return nc;
> +}
> +
> +*state |= CS_NEW;
> +
> +if (commit) {
> +nc = new_conn(pkt, >key, now);
> +
> +memcpy(&nc->rev_key, &ctx->key, sizeof nc->rev_key);
> +
> +conn_key_reverse(>rev_key);
> +hmap_insert(&ct->connections[bucket], &nc->node, ctx->hash);
> +}
> +
> +return nc;
> +}

This function can return NULL

> +static struct conn *
> +process_one(struct conntrack *ct, struct dp_packet *pkt,
> +struct conn_lookup_ctx *ctx, uint16_t zone,
> +bool commit, long long now)
> +{
> +unsigned bucket = hash_to_bucket(ctx->hash);
> +struct conn *conn = ctx->conn;
> +uint8_t state = 0;
> +
> +if (conn) {
> +if (ctx->related) {
> +state |= CS_RELATED;
> +if (ctx->reply) {
> +state |= CS_REPLY_DIR;
> +}
> +} else {
> +enum ct_update_res res;
> +
> +res = conn_update(conn, pkt, ctx->reply, now);
> +
> +switch (res) {
> +case CT_UPDATE_VALID:
> +state |= CS_ESTABLISHED;
> +if (ctx->reply) {
> +state |= CS_REPLY_DIR;
> +}
> +break;
> +case CT_UPDATE_INVALID:
> +state |= CS_INVALID;
> +break;
> +case CT_UPDATE_NEW:
> +hmap_remove(&ct->connections[bucket], &conn->node);
> +delete_conn(conn);
> +conn = conn_not_found(ct, pkt, ctx, &state, commit, now);

...which is then stored here...

> +break;
> +}
> +}
> +
> +

Re: [ovs-dev] [PATCH v2 03/15] flow: Introduce parse_dl_type().

2016-04-19 Thread Joe Stringer
On 15 April 2016 at 17:02, Daniele Di Proietto  wrote:
> The function simply returns the ethernet type of the packet (after
> skipping the VLAN tag, if any).  It will be used by a following
> commit.
>
> Signed-off-by: Daniele Di Proietto 
> ---
>  lib/flow.c | 14 --
>  lib/flow.h |  1 +
>  2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/lib/flow.c b/lib/flow.c
> index 972a996..0250a7c 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -327,7 +327,7 @@ parse_mpls(const void **datap, size_t *sizep)
>  return MIN(count, FLOW_MAX_MPLS_LABELS);
>  }
>
> -static inline ovs_be16
> +static inline ALWAYS_INLINE ovs_be16

Usually we try to avoid telling the compiler how to optimise things
like inlining - is there a reasoning around these changes?

>  parse_vlan(const void **datap, size_t *sizep)
>  {
>  const struct eth_header *eth = *datap;
> @@ -349,7 +349,7 @@ parse_vlan(const void **datap, size_t *sizep)
>  return 0;
>  }
>
> -static inline ovs_be16
> +static inline ALWAYS_INLINE ovs_be16
>  parse_ethertype(const void **datap, size_t *sizep)
>  {
>  const struct llc_snap_header *llc;
> @@ -826,6 +826,16 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
>  dst->map = mf.map;
>  }
>
> +ovs_be16
> +parse_dl_type(const struct eth_header *data_, size_t size)
> +{
> +const void *data = data_;
> +
> +parse_vlan(&data, &size);
> +
> +return parse_ethertype(&data, &size);
> +}
> +
>  /* For every bit of a field that is wildcarded in 'wildcards', sets the
>   * corresponding bit in 'flow' to zero. */
>  void
> diff --git a/lib/flow.h b/lib/flow.h
> index 6771232..0406198 100644
> --- a/lib/flow.h
> +++ b/lib/flow.h
> @@ -107,6 +107,7 @@ void flow_compose(struct dp_packet *, const struct flow *);
>
>  bool parse_ipv6_ext_hdrs(const void **datap, size_t *sizep, uint8_t *nw_proto,
>   uint8_t *nw_frag);
> +ovs_be16 parse_dl_type(const struct eth_header *data_, size_t size);
>
>  static inline uint64_t
>  flow_get_xreg(const struct flow *flow, int idx)
> --
> 2.1.4
>


Re: [ovs-dev] [PATCH v2 02/15] flow: Export parse_ipv6_ext_hdrs().

2016-04-19 Thread Joe Stringer
On 15 April 2016 at 17:02, Daniele Di Proietto  wrote:
> This will be used by a future commit.
>
> Signed-off-by: Daniele Di Proietto 
> ---
>  lib/flow.c | 140 
> ++---
>  lib/flow.h |   3 ++
>  2 files changed, 81 insertions(+), 62 deletions(-)
>
> diff --git a/lib/flow.c b/lib/flow.c
> index 560a90f..972a996 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -439,6 +439,82 @@ invalid:
>  arp_buf[1] = eth_addr_zero;
>  }
>
> +static inline bool
> +parse_ipv6_ext_hdrs__(const void **datap, size_t *sizep, uint8_t *nw_proto,
> +  uint8_t *nw_frag)
> +{
> +while (1) {
> +if (OVS_LIKELY((*nw_proto != IPPROTO_HOPOPTS)
> +   && (*nw_proto != IPPROTO_ROUTING)
> +   && (*nw_proto != IPPROTO_DSTOPTS)
> +   && (*nw_proto != IPPROTO_AH)
> +   && (*nw_proto != IPPROTO_FRAGMENT))) {
> +/* It's either a terminal header (e.g., TCP, UDP) or one we
> + * don't understand.  In either case, we're done with the
> + * packet, so use it to fill in 'nw_proto'. */
> +return true;
> +}
> +
> +/* We only verify that at least 8 bytes of the next header are
> + * available, but many of these headers are longer.  Ensure that
> + * accesses within the extension header are within those first 8
> + * bytes. All extension headers are required to be at least 8
> + * bytes. */
> +if (OVS_UNLIKELY(*sizep < 8)) {
> +return false;
> +}
> +
> +if ((*nw_proto == IPPROTO_HOPOPTS)
> +|| (*nw_proto == IPPROTO_ROUTING)
> +|| (*nw_proto == IPPROTO_DSTOPTS)) {
> +/* These headers, while different, have the fields we care
> + * about in the same location and with the same
> + * interpretation. */
> +const struct ip6_ext *ext_hdr = *datap;
> +*nw_proto = ext_hdr->ip6e_nxt;
> +if (OVS_UNLIKELY(!data_try_pull(datap, sizep,
> +(ext_hdr->ip6e_len + 1) * 8))) {
> +return false;
> +}
> +} else if (*nw_proto == IPPROTO_AH) {
> +/* A standard AH definition isn't available, but the fields
> + * we care about are in the same location as the generic
> + * option header--only the header length is calculated
> + * differently. */
> +const struct ip6_ext *ext_hdr = *datap;
> +*nw_proto = ext_hdr->ip6e_nxt;
> +if (OVS_UNLIKELY(!data_try_pull(datap, sizep,
> +(ext_hdr->ip6e_len + 2) * 4))) {
> +return false;
> +}
> +} else if (*nw_proto == IPPROTO_FRAGMENT) {
> +const struct ovs_16aligned_ip6_frag *frag_hdr = *datap;
> +
> +*nw_proto = frag_hdr->ip6f_nxt;
> +if (!data_try_pull(datap, sizep, sizeof *frag_hdr)) {
> +return false;
> +}
> +
> +/* We only process the first fragment. */
> +if (frag_hdr->ip6f_offlg != htons(0)) {
> +*nw_frag = FLOW_NW_FRAG_ANY;
> +if ((frag_hdr->ip6f_offlg & IP6F_OFF_MASK) != htons(0)) {
> +*nw_frag |= FLOW_NW_FRAG_LATER;
> +*nw_proto = IPPROTO_FRAGMENT;
> +return true;
> +}
> +}
> +}
> +}
> +}
> +
> +bool
> +parse_ipv6_ext_hdrs(const void **datap, size_t *sizep, uint8_t *nw_proto,
> +uint8_t *nw_frag)
> +{
> +return parse_ipv6_ext_hdrs__(datap, sizep, nw_proto, nw_frag);
> +}
> +

I couldn't tell any difference between parse_ipv6_ext_hdrs() and
parse_ipv6_ext_hdrs__(); are they both necessary?
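One plausible reason for the pair (an assumption; the thread does not say): the `__`-suffixed version is `static inline`, so hot paths in the same file can inline it, while the exported wrapper gives external callers an ordinary out-of-line symbol. A sketch of the pattern with illustrative names:

```c
/* Internal version: static inline, so callers in this translation
 * unit can have it inlined into their hot paths. */
static inline int
clamp__(int x, int lo, int hi)
{
    return x < lo ? lo : x > hi ? hi : x;
}

/* Exported wrapper: a normal out-of-line function that external
 * callers can link against. */
int
clamp(int x, int lo, int hi)
{
    return clamp__(x, lo, hi);
}
```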

Acked-by: Joe Stringer 


Re: [ovs-dev] [PATCH v2 01/15] packets: Define ICMP types.

2016-04-19 Thread Joe Stringer
On 15 April 2016 at 17:02, Daniele Di Proietto  wrote:
> Linux and FreeBSD have slightly different names for these constants.
> Windows doesn't define them.  It is simpler to redefine them from
> scratch for OVS.  The new names are different than those used in Linux
> and FreeBSD.
>
> These definitions will be used by a future commit.
>
> Signed-off-by: Daniele Di Proietto 

Acked-by: Joe Stringer 


Re: [ovs-dev] [PATCH RFC] ovn: distributed logical port for VM metadata access

2016-04-19 Thread Aaron Rosen
Hi Ramu,


On Tue, Apr 19, 2016 at 8:12 AM, Ramu Ramamurthy 
wrote:

> On Mon, Apr 18, 2016 at 2:55 PM, Aaron Rosen 
> wrote:
> > I like this idea as well.
> >
> > The one question I have with this is how we should determine which ip
> > address to select for the 'distributed' port?
>
> Aaron, Thanks for your review, and feedback.
>
> We can use the dhcp-port (and its IP address) for the distributed-port.
>
> The current native-dhcp proposal
> (http://openvswitch.org/pipermail/dev/2016-April/069787.html)
> assumes that a dhcp-server ip-address "server_id" is defined on the subnet.
>
> action=(dhcp_offer(offerip = 10.0.0.2, router = 10.0.0.1,
> server_id = 10.0.0.2, mtu = 1300, lease_time = 3600,
>
> For openstack, This means that a DHCP port has been created on that
> subnet in neutron.
> In the absence of a dhcp-agent, the neutron-ovn-plugin would have to
> auto-create the
> dhcp-port in neutron upon creation of a subnet, and then use that
> port's IP address as
> the "server_id" when it programs the "dhcp-options"
> column of the Logical_Switch table.
>
> The pros of the distributed-port approach is that a) HA is not needed,
> b) it runs the existing
> neutron-metadata-proxy/neutron-metadata-agent as-is, c) In the future,
> we could remove the
> neutron-metadata-agent also, by (nova-compute) configuring the
> instance-id and tenant-id as external-ids
> of the VM's ovs interface - hence not need to run any neutron-agents at
> all.
> The drawbacks include creation of namespaces and metadata-proxy
> processes on each hypervisor.
>
>
> Cool, makes sense to me.


> > If we want to avoid creating the network name spaces and
> > running the http proxy on each hypervisor is if we took a similar
> approach
> > that openstack uses for handling dhcp/metadata requests.
> > When a subnet is created we could have the neutron-ovn-plugin notify a
> > metadata agent which would create a port on the given subnet for the
> logical
> > network. Then, to get instances to route its metadata traffic to this
> > logical port we could have ovn distribute an additional host-route via
> dhcp
> > (using option 121).  Similar to what you are proposing.
> >
> >
> > I.e:  So for example if someone created a network/subnet.
> >
> > In the ovn plugin we can signal the metadata agent to create a port on
> that
> > network.  Then, for every port that is created on this network we would
> > distribute a hostroute of 169.254.169.254/32 via ; Then,
> > we'd have the metadata agent just run there which would answer these meta
> > data requests and route them back.
> >
> > One down side to this solution is that this metadata agent would need to
> be
> > made HA in some way. In your current solution if the metadata agent
> crashes
> > or something the failure is isolated to the hypervisor. That said, this
> type
> > of HA seems like it can be implemented in at least an active passive
> > solution easily enough.
> >
> > Thoughts?
> >
>
> Your proposal is an alternative solution - which involves changes only to
> the
> neutron components  (and no changes in ovn ?). Would there be only one
> modified neutron-metadata-agent in an active-passive configuration serving
> all
> the VMs ? If there are multiple agents, would you need agent-scheduling to
> assign networks to agents ?
>
> Could you share more details of the approach ?
>


Right, with this approach I don't believe we would need any additional
changes in ovn besides the ability to specify host routes via dhcp.

For this solution we'd need to modify the neutron-metadata-agent to work in
this way. Currently, it has this functionality though it's coupled together
with the dhcp-agent (which we wouldn't want). In order to support HA you
would need to run multiple agents and the HA would use a similar
agent-scheduling method that neutron currently uses to map dhcp-agents to
networks. Also, the HA for this would need to be implemented as
active-passive I believe.

Personally, I do prefer your solution as the HA solution is more elegant as
it runs on each HV. That said, if having the namespaces on the hypervisor
nodes is a deal breaker, then this would be an alternative solution.


[ovs-dev] [PATCH] Display correct ovnsb sock location in the ovnsb-ctl help message

2016-04-19 Thread Hui Kang
Signed-off-by: Hui Kang 
---
 ovn/utilities/ovn-sbctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ovn/utilities/ovn-sbctl.c b/ovn/utilities/ovn-sbctl.c
index a257729..a888333 100644
--- a/ovn/utilities/ovn-sbctl.c
+++ b/ovn/utilities/ovn-sbctl.c
@@ -329,7 +329,7 @@ Options:\n\
   -t, --timeout=SECS  wait at most SECS seconds\n\
   --dry-run   do not commit changes to database\n\
   --oneline   print exactly one line of output per command\n",
-   program_name, program_name, ctl_get_db_cmd_usage(), ctl_default_db());
+   program_name, program_name, ctl_get_db_cmd_usage(), sbctl_default_db());
 vlog_usage();
 printf("\
   --no-syslog equivalent to --verbose=sbctl:syslog:warn\n");
-- 
1.9.1



Re: [ovs-dev] [RFC PATCH] create vxlan device using rtnetlink interface

2016-04-19 Thread Ben Pfaff
On Tue, Apr 19, 2016 at 07:00:00PM +0200, Jiri Benc wrote:
> On Mon, 18 Apr 2016 11:30:43 -0700, Jesse Gross wrote:
> > (Even on new kernels I don't know how this would interact
> > with OVS tunnel ports that are not of type 'flow'. Would they also use
> > the compat code?)
> 
> This is a very good point. Although, on the other hand, how much common
> are tunnel ports with fixed parameters nowadays?

OVN and NSX use them.


Re: [ovs-dev] [RFC PATCH] create vxlan device using rtnetlink interface

2016-04-19 Thread Jiri Benc
On Mon, 18 Apr 2016 11:30:43 -0700, Jesse Gross wrote:
> (Even on new kernels I don't know how this would interact
> with OVS tunnel ports that are not of type 'flow'. Would they also use
> the compat code?)

This is a very good point. Although, on the other hand, how much common
are tunnel ports with fixed parameters nowadays?

> Even if we didn't have to worry about any legacy code, I'm not
> convinced that taking tunnel configuration out of OVSDB is the right
> thing from the user's perspective. Much of the value of OVS is around
> programmatic and central control so we should allow that where
> possible. For example, if a controller starts using a different
> encapsulation type in a new version or for a different situation, it
> seems undesirable to require the user to change things on each
> machine.

But tunnels can't be completely set up from a controller even now. At
minimum, an (outer) IP address has to be assigned on each machine. Of
course, there are cases in which an existing, already set up network is
used as the underlay. If the main target is such use cases, then indeed,
this could be perceived as usability regression.

> With regards to veths, in many cases they are used to stitch
> together different combinations of OVS and bridging, which is
> something that is itself a pain point and something that I hope will
> be less necessary in the future.

Fair enough.

 Jiri


Re: [ovs-dev] [PATCH RFC] ovn: distributed logical port for VM metadata access

2016-04-19 Thread Numan Siddique
>
> We can use the dhcp-port (and its IP address) for the distributed-port.
>
> The current native-dhcp proposal
> (http://openvswitch.org/pipermail/dev/2016-April/069787.html)
> assumes that a dhcp-server ip-address "server_id" is defined on the subnet.
>
> action=(dhcp_offer(offerip = 10.0.0.2, router = 10.0.0.1,
> server_id = 10.0.0.2, mtu = 1300, lease_time = 3600,
>
> For openstack, This means that a DHCP port has been created on that
> subnet in neutron.
>

​The networking-ovn patch for native dhcp support presently doesn't create
a
dhcp port in the create subnet method [1]

Either this patch [1] can create the dhcp port or it can be created when
the metadata changes
are ready. Please let me know if you have any comments on the gerrit review

[1] - https://review.openstack.org/#/c/243174/10/networking_ovn/plugin.py


In the absence of a dhcp-agent, the neutron-ovn-plugin would have to
> auto-create the
> dhcp-port in neutron upon creation of a subnet, and then use that
> port's IP address as
> the "server_id" when it programs the "dhcp-options"
> column of the Logical_Switch table.
>
>


Re: [ovs-dev] [PATCH] ovn-northd: Add support for static_routes.

2016-04-19 Thread Guru Shetty
Mickey, can you give a first shot at the review? You had some issues with
my code structure and it looks like Steve decided to use something
similar.

On 19 April 2016 at 02:36, steve.ruan  wrote:

> From: Guru Shetty 
>
Steve,
 You should be the author (when you respin after Mickey's review).  You can
just add me as "Co-authored-by"  along with my signed-off-by. You can
search in "git log" for other examples.

>
> static routes are useful when connecting multiple
> routers with each other.
>
> Signed-off-by: Gurucharan Shetty 
> Signed-off-by: steve.ruan 
> ---
>  ovn/northd/ovn-northd.8.xml   |   5 +-
>  ovn/northd/ovn-northd.c   | 101 +
>  ovn/ovn-nb.ovsschema  |  15 +++-
>  ovn/ovn-nb.xml|  31 +++
>  ovn/utilities/ovn-nbctl.8.xml |   5 ++
>  ovn/utilities/ovn-nbctl.c |   5 ++
>  tests/ovn.at  | 204
> ++
>  7 files changed, 362 insertions(+), 4 deletions(-)
>
> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> index da776e1..978853c 100644
> --- a/ovn/northd/ovn-northd.8.xml
> +++ b/ovn/northd/ovn-northd.8.xml
> @@ -682,8 +682,9 @@ next;
>  
>
>  
> -  If the route has a gateway, G is the gateway IP
> address,
> -  otherwise it is ip4.dst.
> +  If the route has a gateway, G is the gateway IP
> address.
> +  Instead, if the route is from a configured static route,
> G
> +  is the next hop IP address.  Else it is ip4.dst.
>  
>
>
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index 260c02f..71c9b29 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -234,6 +234,18 @@ allocate_tnlid(struct hmap *set, const char *name,
> uint32_t max,
>  return 0;
>  }
>
> +/* Holds the next hop ip address and the logical router port via which
> + * a static route is reachable. */
> +struct route_to_port {
> +ovs_be32 ip;/* network address of the route. */
> +ovs_be32 mask;  /* network mask of the route. */
> +ovs_be32 next_hop;  /* next_hop ip address for the above
> route. */
> +struct uuid rport;  /* output port specified by CMS, or null
> if not specified */
> +struct ovn_port *ovn_port;  /* The logical router port via which the
> packet
> + * needs to exit to reach the next hop. */
> +};
> +
> +
>  /* The 'key' comes from nbs->header_.uuid or nbr->header_.uuid or
>   * sb->external_ids:logical-switch. */
>  struct ovn_datapath {
> @@ -249,6 +261,9 @@ struct ovn_datapath {
>  /* Logical router data (digested from nbr). */
>  const struct ovn_port *gateway_port;
>  ovs_be32 gateway;
> +/* Maps a static route to a ovn logical router port via which packet
> + * needs to exit. */
> +struct shash routes_map;
>
>  /* Logical switch data. */
>  struct ovn_port **router_ports;
> @@ -272,6 +287,7 @@ ovn_datapath_create(struct hmap *datapaths, const
> struct uuid *key,
>  od->nbs = nbs;
>  od->nbr = nbr;
>  hmap_init(&od->port_tnlids);
> +shash_init(&od->routes_map);
>  od->port_key_hint = 0;
>  hmap_insert(datapaths, &od->key_node, uuid_hash(&od->key));
>  return od;
> @@ -286,6 +302,7 @@ ovn_datapath_destroy(struct hmap *datapaths, struct
> ovn_datapath *od)
>   * use it. */
>  hmap_remove(datapaths, &od->key_node);
>  destroy_tnlids(&od->port_tnlids);
> +shash_destroy_free_data(&od->routes_map);
>  free(od->router_ports);
>  free(od);
>  }
> @@ -318,6 +335,50 @@ ovn_datapath_from_sbrec(struct hmap *datapaths,
>  }
>
>  static void
> +build_static_route(struct ovn_datapath *od,
> + const struct nbrec_logical_router_static_route *route)
> +{
> +ovs_be32 prefix, next_hop, mask;
> +
> +/* verify nexthop */
> +char *error = ip_parse_masked(route->nexthop, &next_hop, &mask);
> +if (error || mask != OVS_BE32_MAX) {
> +static struct vlog_rate_limit rl
> += VLOG_RATE_LIMIT_INIT(5, 1);
> +VLOG_WARN_RL(&rl, "bad next hop ip address %s",
> +route->nexthop);
> +free(error);
> +return;
> +}
> +
> +/* verify prefix */
> +error = ip_parse_masked(route->prefix, &prefix, &mask);
> +if (error || !ip_is_cidr(mask)) {
> +static struct vlog_rate_limit rl
> += VLOG_RATE_LIMIT_INIT(5, 1);
> +VLOG_WARN_RL(&rl, "bad 'network' in static routes %s",
> +  route->prefix);
> +free(error);
> +return;
> +}
> +
> +struct uuid lrp_uuid;
> +struct route_to_port *route_port = xmalloc(sizeof *route_port);
> +route_port->ip = prefix;
> +route_port->mask = mask;
> +route_port->next_hop = next_hop;
> +/* The correct ovn_port will be filled while traversing
> + * logical_router_ports. 

Re: [ovs-dev] [PATCH] Add VxLAN-GBP support for user space data path

2016-04-19 Thread Jesse Gross
On Tue, Apr 19, 2016 at 3:20 AM, Johnson.Li  wrote:
> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
> index e398562..e6b35ed 100644
> --- a/lib/netdev-vport.c
> +++ b/lib/netdev-vport.c
> @@ -1297,10 +1298,16 @@ netdev_vxlan_pop_header(struct dp_packet *packet)
>  return EINVAL;
>  }
>
> -if (get_16aligned_be32(&vxh->vx_flags) != htonl(VXLAN_FLAGS) ||
> -   (get_16aligned_be32(&vxh->vx_vni) & htonl(0xff))) {
> -VLOG_WARN_RL(&err_rl, "invalid vxlan flags=%#x vni=%#x\n",
> - ntohl(get_16aligned_be32(&vxh->vx_flags)),
> +vxh_flags = get_16aligned_be32(&vxh->vx_flags);
> +if ((vxh_flags & htonl(VXLAN_GBP_FLAGS)) == htonl(VXLAN_GBP_FLAGS)) {

There are many overlapping extensions to VXLAN. We should only try to
parse these flags when the GBP extensions is actually turned on.

> +tnl->gbp_id = htons(ntohl(vxh_flags) & 0xffff); /* VXLAN_GBP_ID_MASK */

There is a field called tnl->gbp_flags which should presumably also be
populated here (and on transmit).

> diff --git a/lib/packets.h b/lib/packets.h
> index 8139a6b..d9a149d 100644
> --- a/lib/packets.h
> +++ b/lib/packets.h
> @@ -1002,6 +1002,7 @@ struct vxlanhdr {
>  };
>
>  #define VXLAN_FLAGS 0x08000000  /* struct vxlanhdr.vx_flags required value. */
> +#define VXLAN_GBP_FLAGS 0x88000000  /* Group Based Policy */

Can you please break these down into individual flags so that it is
easier to tell what is going on (and you could use them in parsing as
well)?

Can you also please add some test cases?
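For illustration, breaking the combined constant into individual flags might look like the following, modeled on the bit layout the Linux kernel uses for VXLAN-GBP; the macro names and the parsing helper are hypothetical, not part of the patch:

```c
#include <stdint.h>

/* Bits of the 32-bit vx_flags word, in host byte order. */
#define VXLAN_HF_VNI             (1u << 27)  /* I: VNI field is valid. */
#define VXLAN_HF_GBP             (1u << 31)  /* G: GBP extension present. */
#define VXLAN_GBP_DONT_LEARN     (1u << 22)  /* D bit. */
#define VXLAN_GBP_POLICY_APPLIED (1u << 19)  /* A bit. */
#define VXLAN_GBP_ID_MASK        0xffffu     /* 16-bit group policy ID. */

/* With individual flags, parsing becomes self-describing.  Returns -1
 * for a malformed header (VNI-valid bit missing). */
static int
parse_gbp(uint32_t flags_host, uint16_t *gbp_id, uint8_t *gbp_flags)
{
    if (!(flags_host & VXLAN_HF_VNI)) {
        return -1;
    }
    if (flags_host & VXLAN_HF_GBP) {
        *gbp_id = (uint16_t) (flags_host & VXLAN_GBP_ID_MASK);
        /* Fold the D and A bits down into a one-byte flags field. */
        *gbp_flags = (uint8_t) ((flags_host >> 16)
                                & ((VXLAN_GBP_DONT_LEARN
                                    | VXLAN_GBP_POLICY_APPLIED) >> 16));
    }
    return 0;
}
```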


Re: [ovs-dev] [PATCH RFC] ovn: distributed logical port for VM metadata access

2016-04-19 Thread Ramu Ramamurthy
On Mon, Apr 18, 2016 at 2:55 PM, Aaron Rosen  wrote:
> I like this idea as well.
>
> The one question I have with this is how we should determine which ip address
> to select for the 'distributed' port?

Aaron, Thanks for your review, and feedback.

We can use the dhcp-port (and its IP address) for the distributed-port.

The current native-dhcp proposal
(http://openvswitch.org/pipermail/dev/2016-April/069787.html)
assumes that a dhcp-server ip-address "server_id" is defined on the subnet.

action=(dhcp_offer(offerip = 10.0.0.2, router = 10.0.0.1,
server_id = 10.0.0.2, mtu = 1300, lease_time = 3600,

For openstack, This means that a DHCP port has been created on that
subnet in neutron.
In the absence of a dhcp-agent, the neutron-ovn-plugin would have to
auto-create the
dhcp-port in neutron upon creation of a subnet, and then use that
port's IP address as
the "server_id" when it programs the "dhcp-options"
column of the Logical_Switch table.

The pros of the distributed-port approach is that a) HA is not needed,
b) it runs the existing
neutron-metadata-proxy/neutron-metadata-agent as-is, c) In the future,
we could remove the
neutron-metadata-agent also, by (nova-compute) configuring the
instance-id and tenant-id as external-ids
of the VM's ovs interface - hence not need to run any neutron-agents at all.
The drawbacks include creation of namespaces and metadata-proxy
processes on each hypervisor.


> If we want to avoid creating the network name spaces and
> running the http proxy on each hypervisor is if we took a similar approach
> that openstack uses for handling dhcp/metadata requests.
> When a subnet is created we could have the neutron-ovn-plugin notify a
> metadata agent which would create a port on the given subnet for the logical
> network. Then, to get instances to route its metadata traffic to this
> logical port we could have ovn distribute an additional host-route via dhcp
> (using option 121).  Similar to what you are proposing.
>
>
> I.e:  So for example if someone created a network/subnet.
>
> In the ovn plugin we can signal the metadata agent to create a port on that
> network.  Then, for every port that is created on this network we would
> distribute a hostroute of 169.254.169.254/32 via ; Then,
> we'd have the metadata agent just run there which would answer these meta
> data requests and route them back.
>
> One down side to this solution is that this metadata agent would need to be
> made HA in some way. In your current solution if the metadata agent crashes
> or something the failure is isolated to the hypervisor. That said, this type
> of HA seems like it can be implemented in at least an active passive
> solution easily enough.
>
> Thoughts?
>

Your proposal is an alternative solution - which involves changes only to the
neutron components  (and no changes in ovn ?). Would there be only one
modified neutron-metadata-agent in an active-passive configuration serving all
the VMs ? If there are multiple agents, would you need agent-scheduling to
assign networks to agents ?

Could you share more details of the approach ?
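For reference, the classless static route option mentioned above (DHCP option 121, RFC 3442) encodes each route as a prefix length, the significant octets of the destination prefix, and the router address. A minimal sketch of the encoding, with the helper name and addresses chosen for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one RFC 3442 classless static route entry into 'out', which
 * must have room for up to 9 bytes.  Returns the number of bytes
 * written: 1 (prefix length) + significant prefix octets + 4 (router). */
static size_t
encode_option_121(uint8_t prefix_len, const uint8_t dest[4],
                  const uint8_t router[4], uint8_t *out)
{
    size_t n = 0;
    size_t sig = (prefix_len + 7) / 8;   /* Significant octets of dest. */
    size_t i;

    out[n++] = prefix_len;
    for (i = 0; i < sig; i++) {
        out[n++] = dest[i];
    }
    for (i = 0; i < 4; i++) {
        out[n++] = router[i];
    }
    return n;
}
```

For the host route discussed here, 169.254.169.254/32 via the distributed port's IP, the prefix contributes all four octets, so the entry is nine bytes.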
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] Reply: ovs + dpdk vhost-user match flows but cannot execute actions

2016-04-19 Thread Mauricio Vásquez
Hi lifuqiong,

Could you provide the output of the following commands in your setup?:
ovs-vsctl show
ovs-ofctl dump-flows
ovs-ofctl dump-ports

Mauricio Vasquez,

On Sat, Apr 16, 2016 at 10:23 AM, lifuqiong 
wrote:

> Hi Mauricio:
>
>  I changed my qemu version from 2.2.1 to 2.5.0 and the VMs can now
> communicate with each other. But the VMs cannot ping a PC in the outside
> network, even though they are in the same subnet.
>
>
>
> PC(192.168.0.103/24) pings vm(192.168.0.90); the VM host's DPDK NIC and the
> PC are on the same L2 switch, and the DPDK NIC is connected to the switch's
> port Eth1/0/8.
>
> Investigation:
>
>  I bound the VM's IP and MAC in the PC's ARP table, and bound the
> VM's MAC to port Eth1/0/8. While the PC is pinging the VM, I can see the
> output packet count of Eth1/0/8 increasing regularly, which means the ping
> requests have been sent to the DPDK NIC.
>
>
>
> But the VM does not receive the packets - why?
>
>
>
>  My switch has no VLANs configured; the default VLAN is 1. When I
> ping from vm1 to vm2, the MAC table in OVS shows the following:
>
> port  VLAN  MACAge
>
> 4 0  00:00:00:00:02:121
>
> 3 0  00:00:00:00:00:041
>
>
>
>
>
> What could be the reason the VM cannot communicate with the PC? Thank you.
>
>
>
> *From:* lifuqiong [mailto:lifuqi...@cncloudsec.com]
> *Sent:* April 15, 2016, 9:22
> *To:* 'Mauricio Vásquez'
> *Cc:* 'dev@openvswitch.org'
> *Subject:* Re: [ovs-dev] ovs + dpdk vhost-user match flows but cannot execute
> actions
>
>
>
> Hello Mauricio Vasquez:
>
> It works. Thank you very much.
>
>
>
>
>
> *From:* Mauricio Vásquez [mailto:mauricio.vasquezber...@studenti.polito.it
> ]
> *Sent:* April 14, 2016, 14:55
> *To:* lifuqiong
> *Cc:* dev@openvswitch.org
> *Subject:* Re: [ovs-dev] ovs + dpdk vhost-user match flows but cannot execute
> actions
>
>
>
> Hello lifuqiong,
>
> I faced the same problem some days ago (
> http://openvswitch.org/pipermail/dev/2016-March/068282.html), the bug is
> already fixed.
>
> Where are you downloading OVS from? It appears that the bug is still
> present in the version at
> http://openvswitch.org/releases/openvswitch-2.5.0.tar.gz; please download
> OVS from git and switch to branch-2.5.
>
> Mauricio Vasquez,
>
>
>
> On Thu, Apr 14, 2016 at 4:28 AM, lifuqiong 
> wrote:
>
> I want to test a dpdk vhost-user port on OVS, following
> https://software.intel.com/en-us/blogs/2015/06/09/building-vhost-user-for-ovs-today-using-dpdk-200
> ;
> I created the OVS+DPDK environment following INSTALL.DPDK.md and created
> 2 VMs; trying to ping each other only shows "Destination Host Unreachable".
> Dump-flows shows that packets matched the flow, but they can't be output to
> port 4 - why? I can't get any useful error or warning info from
> ovs-vswitchd.log. While pinging from vm1 to vm2, statistics on vm1 show
> that eth1 RX_packets stays at zero while TX_packets keeps increasing.
> 1.
> OVS: 2.5.0
> Dpdk: 2.2.0
> Qemu: 2.2.1
>
> 2. ovs-ofctl dump-flows ovsbr0
> NXST_FLOW reply (xid=0x4):
> cookie=0x0, duration=836.946s, table=0, n_packets=628, n_bytes=26376,
> idle_age=0, in_port=3 actions=output:4
> cookie=0x0, duration=831.458s, table=0, n_packets=36, n_bytes=1512,
> idle_age=770, in_port=4 actions=output:3
>
> 3. root@host152:/usr/local/var/run/openvswitch# ovs-vsctl show
> 03ae6f7d-3b71-45e3-beb0-09fa11292eaa
> Bridge "ovsbr0"
> Port "vhost-user-1"
> Interface "vhost-user-1"
> type: dpdkvhostuser
> Port "ovsbr0"
> Interface "ovsbr0"
> type: internal
> Port "dpdk1"
> Interface "dpdk1"
> type: dpdk
> Port "vhost-user-0"
> Interface "vhost-user-0"
> type: dpdkvhostuser
> Port "dpdk0"
> Interface "dpdk0"
> type: dpdk
>
> 4. Start VM info:
> qemu-system-x86_64 -m 1024 -smp 2 -hda /root/vm11.qcow2 -boot c
> -enable-kvm -vnc 0.0.0.0:1 -chardev
> socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-0 -netdev
> type=vhost-user,id=mynet1,chardev=char1,vhostforce -device
> virtio-net-pci,mac=00:00:00:00:01:12,netdev=mynet1 -object
> memory-backend-file,id=mem,size=1024M,mem-path=/dev/hugepages,share=on
> -numa node,memdev=mem -mem-prealloc -d exec
>
> qemu-system-x86_64: -netdev
> type=vhost-user,id=mynet1,chardev=char1,vhostforce: chardev "char1" went up
>
>
>
> 5. My build command as follows:
> #!/bin/bash
>
> ### config and compile dpdk ###
> # cd dpdk
> # make config T=x86_64-native-linuxapp-gcc
> # make install T=x86_64-native-linuxapp-gcc
>
> ### config and compile ovs ###
> # cd ovs
> # ./boot.sh
> # ./configure --localstatedir=/var \
>     --with-dpdk=/root/workplane/dpdk/x86_64-native-linuxapp-gcc
> # make
> # make install
>
> ### config and compile qemu ###



[ovs-dev] [PATCH] odp-util: Fix build warning on flags_mask.

2016-04-19 Thread antonio . fischetti
Fix build warning: 'flags_mask' may be used uninitialized.

Signed-off-by: Antonio Fischetti 
---
 lib/odp-util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/odp-util.c b/lib/odp-util.c
index b4689cc..10fb6c2 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -2366,7 +2366,7 @@ format_odp_tun_vxlan_opt(const struct nlattr *attr,
 case OVS_VXLAN_EXT_GBP: {
 uint32_t key = nl_attr_get_u32(a);
 ovs_be16 id, id_mask;
-uint8_t flags, flags_mask;
+uint8_t flags, flags_mask = 0;
 
 id = htons(key & 0xFFFF);
 flags = (key >> 16) & 0xFF;
-- 
2.4.11



[ovs-dev] [PATCH] FAQ: Add entry for OVS/DPDK version dependencies.

2016-04-19 Thread Kevin Traynor
For a given release this is listed in the INSTALL.DPDK.md
but it gets asked quite a bit on the mailing list, so create
a table in the FAQ.

Signed-off-by: Kevin Traynor 
---
 FAQ.md |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/FAQ.md b/FAQ.md
index 0fee992..5777471 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -233,6 +233,19 @@ Validate flow actions |  YES   |   YES  |   N/A|   NO|
 Multiple datapaths|  YES   |   YES  |YES|   NO|
 Tunnel TSO - STT  |  N/A   |   YES  |NO |   YES   |
 
+### Q: What DPDK version does each Open vSwitch release work with?
+
+A: The following table lists the DPDK version against which the
+   given versions of Open vSwitch will successfully build.
+
+| Open vSwitch | DPDK
+|::|:-:
+|2.2.x | 1.6
+|2.3.x | 1.6
+|2.4.x | 2.0
+|2.5.x | 2.2
+|2.6.x | 16.04
+
 ### Q: I get an error like this when I configure Open vSwitch:
 
configure: error: Linux kernel in  is version , but
-- 
1.7.4.1



[ovs-dev] [PATCH] Add VxLAN-GBP support for user space data path

2016-04-19 Thread Johnson.Li
From: Johnson Li 

In user space, only standard VxLAN was supported. This patch adds
VxLAN-GBP support to the user space data path.

How to use:
1> Create VxLAN port with GBP extension
  $ovs-vsctl add-port br-int vxlan0 -- set interface vxlan0 \
   type=vxlan options:dst_port=6633 \
   options:remote_ip=192.168.60.22 \
   options:key=1000 options:exts=gbp
2> Add flow for transmitting
  $ovs-ofctl add-flow br-int "table=0, priority=260, \
 in_port=LOCAL actions=load:0x100->NXM_NX_TUN_GBP_ID[], \
 output:1"
3> Add flow for receiving
  $ovs-ofctl add-flow br-int "table=0, priority=260, \
 in_port=1,tun_gbp_id=0x100 actions=output:LOCAL"

Signed-off-by: Johnson Li 

diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index e398562..e6b35ed 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -1286,6 +1286,7 @@ netdev_vxlan_pop_header(struct dp_packet *packet)
 struct flow_tnl *tnl = &md->tunnel;
 struct vxlanhdr *vxh;
 unsigned int hlen;
+uint32_t vxh_flags;
 
 pkt_metadata_init_tnl(md);
 if (VXLAN_HLEN > dp_packet_l4_size(packet)) {
@@ -1297,10 +1298,16 @@ netdev_vxlan_pop_header(struct dp_packet *packet)
 return EINVAL;
 }
 
-if (get_16aligned_be32(&vxh->vx_flags) != htonl(VXLAN_FLAGS) ||
-   (get_16aligned_be32(&vxh->vx_vni) & htonl(0xff))) {
-VLOG_WARN_RL(&err_rl, "invalid vxlan flags=%#x vni=%#x\n",
- ntohl(get_16aligned_be32(&vxh->vx_flags)),
+vxh_flags = get_16aligned_be32(&vxh->vx_flags);
+if ((vxh_flags & htonl(VXLAN_GBP_FLAGS)) == htonl(VXLAN_GBP_FLAGS)) {
+tnl->gbp_id = htons(ntohl(vxh_flags) & 0xffff); /* VXLAN_GBP_ID_MASK */
+} else if (vxh_flags != htonl(VXLAN_FLAGS)) {
+VLOG_WARN_RL(&err_rl, "invalid vxlan flags=%#x\n",
+ ntohl(get_16aligned_be32(&vxh->vx_flags)));
+return EINVAL;
+}
+if (get_16aligned_be32(&vxh->vx_vni) & htonl(0xff)) {
+VLOG_WARN_RL(&err_rl, "invalid vxlan vni=%#x\n",
  ntohl(get_16aligned_be32(&vxh->vx_vni)));
 return EINVAL;
 }
@@ -1328,7 +1335,12 @@ netdev_vxlan_build_header(const struct netdev *netdev,
 
 vxh = udp_build_header(tnl_cfg, tnl_flow, data, );
 
-put_16aligned_be32(&vxh->vx_flags, htonl(VXLAN_FLAGS));
+if (tnl_cfg->exts & (1 << OVS_VXLAN_EXT_GBP)) {
+put_16aligned_be32(&vxh->vx_flags,
+htonl(VXLAN_GBP_FLAGS | ntohs(tnl_flow->tunnel.gbp_id)));
+} else {
+put_16aligned_be32(&vxh->vx_flags, htonl(VXLAN_FLAGS));
+}
 put_16aligned_be32(&vxh->vx_vni, htonl(ntohll(tnl_flow->tunnel.tun_id) << 8));
 
 ovs_mutex_unlock(&dev->mutex);
diff --git a/lib/packets.h b/lib/packets.h
index 8139a6b..d9a149d 100644
--- a/lib/packets.h
+++ b/lib/packets.h
@@ -1002,6 +1002,7 @@ struct vxlanhdr {
 };
 
 #define VXLAN_FLAGS 0x08000000  /* struct vxlanhdr.vx_flags required value. */
+#define VXLAN_GBP_FLAGS 0x88000000  /* Group Based Policy */
 
 void ipv6_format_addr(const struct in6_addr *addr, struct ds *);
 void ipv6_format_addr_bracket(const struct in6_addr *addr, struct ds *,
-- 
1.8.4.2



Re: [ovs-dev] [PATCH v7 05/16] dpif-netdev: Fix race condition in pmd thread initialization.

2016-04-19 Thread Ilya Maximets
On 19.04.2016 10:18, Ilya Maximets wrote:
> There was a reason for the 2 calls to dp_netdev_pmd_reload_done() inside
> pmd_thread_main(): we must wait until the PMD thread is completely done
> with reloading. This patch introduces a race condition on pmd->exit_latch.
> While removing the last port on a NUMA node, dp_netdev_reload_pmd__(pmd)
> will be called twice: first to remove the port and second to destroy the
> PMD thread. pmd->exit_latch is set between these two calls. This leads to
> a probable situation where the PMD thread exits while processing the first
> reload; the main thread will then wait forever on cond_wait in the second
> reload. The situation is easily reproducible by addition/deletion of the
> last port (maybe after a few iterations in a cycle).
> 
> Best regards, Ilya Maximets.

This incremental should help:
--
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 588d56f..2235297 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2785,6 +2785,7 @@ pmd_thread_main(void *f_)
 unsigned int port_seq = PMD_INITIAL_SEQ;
 int poll_cnt;
 int i;
+bool exiting;
 
 poll_cnt = 0;
 poll_list = NULL;
@@ -2825,14 +2826,15 @@ reload:
 }
 }
 
+emc_cache_uninit(&pmd->flow_cache);
+
 poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
+exiting = latch_is_set(&pmd->exit_latch);
 /* Signal here to make sure the pmd finishes
  * reloading the updated configuration. */
 dp_netdev_pmd_reload_done(pmd);
 
-emc_cache_uninit(&pmd->flow_cache);
-
-if (!latch_is_set(&pmd->exit_latch)){
+if (!exiting) {
 goto reload;
 }
 
--

 
> On 08.04.2016 06:13, Daniele Di Proietto wrote:
>> The pmds and the main threads are synchronized using a condition
>> variable.  The main thread writes a new configuration, then it waits on
>> the condition variable.  A pmd thread reads the new configuration, then
>> it calls signal() on the condition variable. To make sure that the pmds
>> and the main thread have a consistent view, each signal() should be
>> backed by a wait().
>>
>> Currently the first signal() doesn't have a corresponding wait().  If
>> the pmd thread takes a long time to start and the signal() is received
>> by a later wait, the threads will have an inconsistent view.
>>
>> The commit fixes the problem by removing the first signal() from the
>> pmd thread.
>>
>> This is hardly a problem on current master, because the main thread
>> will call the first wait() a long time after the creation of a pmd
>> thread.  It becomes a problem with the next commits.
>>
>> Signed-off-by: Daniele Di Proietto 
>> ---
>>  lib/dpif-netdev.c | 21 +
>>  1 file changed, 9 insertions(+), 12 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 9c32c64..2424d3e 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -2652,21 +2652,22 @@ dpif_netdev_wait(struct dpif *dpif)
>>  
>>  static int
>>  pmd_load_queues(struct dp_netdev_pmd_thread *pmd, struct rxq_poll 
>> **ppoll_list)
>> -OVS_REQUIRES(pmd->poll_mutex)
>>  {
>>  struct rxq_poll *poll_list = *ppoll_list;
>>  struct rxq_poll *poll;
>>  int i;
>>  
>> +ovs_mutex_lock(&pmd->poll_mutex);
>>  poll_list = xrealloc(poll_list, pmd->poll_cnt * sizeof *poll_list);
>>  
>>  i = 0;
>>  LIST_FOR_EACH (poll, node, &pmd->poll_list) {
>>  poll_list[i++] = *poll;
>>  }
>> +ovs_mutex_unlock(&pmd->poll_mutex);
>>  
>>  *ppoll_list = poll_list;
>> -return pmd->poll_cnt;
>> +return i;
>>  }
>>  
>>  static void *
>> @@ -2685,13 +2686,10 @@ pmd_thread_main(void *f_)
>>  /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
>>  ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
>>  pmd_thread_setaffinity_cpu(pmd->core_id);
>> +poll_cnt = pmd_load_queues(pmd, &poll_list);
>>  reload:
>>  emc_cache_init(&pmd->flow_cache);
>>  
>> -ovs_mutex_lock(&pmd->poll_mutex);
>> -poll_cnt = pmd_load_queues(pmd, &poll_list);
>> -ovs_mutex_unlock(&pmd->poll_mutex);
>> -
>>  /* List port/core affinity */
>>  for (i = 0; i < poll_cnt; i++) {
>> VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
>> @@ -2699,10 +2697,6 @@ reload:
>>  netdev_rxq_get_queue_id(poll_list[i].rx));
>>  }
>>  
>> -/* Signal here to make sure the pmd finishes
>> - * reloading the updated configuration. */
>> -dp_netdev_pmd_reload_done(pmd);
>> -
>>  for (;;) {
>>  for (i = 0; i < poll_cnt; i++) {
>>  dp_netdev_process_rxq_port(pmd, poll_list[i].port, 
>> poll_list[i].rx);
>> @@ -2725,14 +2719,17 @@ reload:
>>  }
>>  }
>>  
>> +poll_cnt = pmd_load_queues(pmd, &poll_list);
>> +/* Signal here to make sure the pmd finishes
>> + * reloading the updated configuration. */
>> +dp_netdev_pmd_reload_done(pmd);
>> +
>>  

[ovs-dev] Adding a new field in the flow

2016-04-19 Thread Amrane Ait Zeouay
I want to add a new field to the flow. I checked the FAQ, but my problem is
that the field is not related to the packet. When I use "ovs-ofctl add-flow
br0 idle_timeout=180,priority=33000,new_field=1547,actions=output:2", OVS
extracts the field, but when I run "ovs-ofctl dump-flows br0" it shows a
different value, and if I execute the command again I get yet another value.
When I print the ofpbuf, I find that the value of the new field (in hex) is
not in the flow packet. Can anyone help me with this? Thank you.

Here is what I got from add-flow:

root@root:~# ovs-ofctl add-flow br0
idle_timeout=180,priority=33000,new_field=1547,actions=output:2
2016-04-15T04:40:15Z|1|ofpparse|WARN|new_field ofp-parse 1547
2016-04-15T04:40:15Z|2|ofpparse|WARN|new_field 1547
2016-04-15T04:40:15Z|3|ofpparse|WARN|priority  ofp-parse 33000
2016-04-15T04:40:15Z|4|ofctl|WARN|new_field 1547
2016-04-15T04:40:15Z|5|ofp_msgs|WARN|ofp-msgs msg  size=8,
allocated=8, head=0, tail=0
  04 00 00 08 00 00 00 03-
2016-04-15T04:40:15Z|6|ofp_util|WARN|ofm->match_nf is 1547 is 1547
2016-04-15T04:40:15Z|7|ofctl|WARN|ovs_ofctl  debut   size=88,
allocated=88, head=0, tail=0
  01 0e 00 58 00 00 00 02-00 38 20 ff 00 00 00 00
0010  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
0020  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
0030  00 00 00 00 00 00 00 00-00 00 00 b4 00 00 80 e8
0040  ff ff ff ff ff ff 00 00-06 0b 00 00 00 00 00 00
0050  00 00 00 08 00 02 00 00-

Here is what I got from dump-flows:

root@root:~# ovs-ofctl dump-flows br0
2016-04-15T04:40:16Z|1|ofpparse|WARN| new_field is 35390
2016-04-15T04:40:16Z|2|ofpparse|WARN|priority  ofp-parse 32768
2016-04-15T04:40:16Z|3|ofp_msgs|WARN|ofp-msgs msg  debut   size=8,
allocated=8, head=0, tail=0
  04 00 00 08 00 00 00 04-
2016-04-15T04:40:16Z|4|ofp_util|WARN|ovs_ofctl msg  debut
size=40, allocated=104, head=0, tail=64
  01 10 00 18 00 00 00 04-ff ff 00 00 00 00 23 20
0010  00 00 00 00 00 00 00 00-ff ff 00 00 ff 00 00 00
0020  00 00 00 00 00 00 00 00-
2016-04-15T04:40:16Z|5|ofp_msgs|WARN|ofp-msgs msg  debut
size=88, allocated=88, head=0, tail=0
  01 11 00 58 00 00 00 04-ff ff 00 00 00 00 23 20
0010  00 00 00 00 00 00 00 00-00 40 00 00 00 00 00 01
0020  0c 56 91 c0 80 e8 00 b4-00 00 00 00 00 02 00 02
0030  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
0040  00 00 00 00 00 00 00 00-20 20 73 69 7a 65 3d 34
0050  00 00 00 08 00 02 00 00-
2016-04-15T04:40:16Z|7|ofp_util|WARN|match nf in NX is 0 in other is 8224
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1.207s, table=0, n_packets=0, n_bytes=0,
idle_timeout=180, idle_age=1, new_field=8224, priority=33000
actions=output:2


-- 

Amrane Ait Zeouay

Engineer Student in The Development of Software and Systems

University of Western Brittany

Tel: +33 7 62 25 56 03


[ovs-dev] [PATCH] ovn-northd: Add support for static_routes.

2016-04-19 Thread steve.ruan
From: Guru Shetty 

Static routes are useful when connecting multiple
routers to each other.

Signed-off-by: Gurucharan Shetty 
Signed-off-by: steve.ruan 
---
 ovn/northd/ovn-northd.8.xml   |   5 +-
 ovn/northd/ovn-northd.c   | 101 +
 ovn/ovn-nb.ovsschema  |  15 +++-
 ovn/ovn-nb.xml|  31 +++
 ovn/utilities/ovn-nbctl.8.xml |   5 ++
 ovn/utilities/ovn-nbctl.c |   5 ++
 tests/ovn.at  | 204 ++
 7 files changed, 362 insertions(+), 4 deletions(-)

diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index da776e1..978853c 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -682,8 +682,9 @@ next;
 
 
 
-  If the route has a gateway, G is the gateway IP address,
-  otherwise it is ip4.dst.
+  If the route has a gateway, G is the gateway IP address.
+  Otherwise, if the route comes from a configured static route, G
+  is the next hop IP address.  Else it is ip4.dst.
 
   
 
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 260c02f..71c9b29 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -234,6 +234,18 @@ allocate_tnlid(struct hmap *set, const char *name, uint32_t max,
 return 0;
 }
 
+/* Holds the next hop ip address and the logical router port via which
+ * a static route is reachable. */
+struct route_to_port {
+ovs_be32 ip;/* network address of the route. */
+ovs_be32 mask;  /* network mask of the route. */
+ovs_be32 next_hop;  /* next_hop ip address for the above route. */
+struct uuid rport;  /* output port specified by CMS, or null if
+ * not specified */
+struct ovn_port *ovn_port;  /* The logical router port via which the packet
+ * needs to exit to reach the next hop. */
+};
+
+
 /* The 'key' comes from nbs->header_.uuid or nbr->header_.uuid or
  * sb->external_ids:logical-switch. */
 struct ovn_datapath {
@@ -249,6 +261,9 @@ struct ovn_datapath {
 /* Logical router data (digested from nbr). */
 const struct ovn_port *gateway_port;
 ovs_be32 gateway;
+/* Maps a static route to an OVN logical router port via which the packet
+ * needs to exit. */
+struct shash routes_map;
 
 /* Logical switch data. */
 struct ovn_port **router_ports;
@@ -272,6 +287,7 @@ ovn_datapath_create(struct hmap *datapaths, const struct uuid *key,
 od->nbs = nbs;
 od->nbr = nbr;
 hmap_init(&od->port_tnlids);
+shash_init(&od->routes_map);
 od->port_key_hint = 0;
 hmap_insert(datapaths, &od->key_node, uuid_hash(&od->key));
 return od;
@@ -286,6 +302,7 @@ ovn_datapath_destroy(struct hmap *datapaths, struct ovn_datapath *od)
  * use it. */
 hmap_remove(datapaths, &od->key_node);
 destroy_tnlids(&od->port_tnlids);
+shash_destroy_free_data(&od->routes_map);
 free(od->router_ports);
 free(od);
 }
@@ -318,6 +335,50 @@ ovn_datapath_from_sbrec(struct hmap *datapaths,
 }
 
 static void
+build_static_route(struct ovn_datapath *od,
+ const struct nbrec_logical_router_static_route *route)
+{
+ovs_be32 prefix, next_hop, mask;
+
+/* verify nexthop */
+char *error = ip_parse_masked(route->nexthop, &next_hop, &mask);
+if (error || mask != OVS_BE32_MAX) {
+static struct vlog_rate_limit rl
+= VLOG_RATE_LIMIT_INIT(5, 1);
+VLOG_WARN_RL(&rl, "bad next hop ip address %s",
+route->nexthop);
+free(error);
+return;
+}
+
+/* verify prefix */
+error = ip_parse_masked(route->prefix, &prefix, &mask);
+if (error || !ip_is_cidr(mask)) {
+static struct vlog_rate_limit rl
+= VLOG_RATE_LIMIT_INIT(5, 1);
+VLOG_WARN_RL(&rl, "bad 'network' in static routes %s",
+  route->prefix);
+free(error);
+return;
+}
+
+struct uuid lrp_uuid;
+struct route_to_port *route_port = xmalloc(sizeof *route_port);
+route_port->ip = prefix;
+route_port->mask = mask;
+route_port->next_hop = next_hop;
+/* The correct ovn_port will be filled while traversing
+ * logical_router_ports. */
+route_port->ovn_port = NULL;
+memset(&route_port->rport, 0, sizeof (struct uuid));
+if (route->output_port
+&& uuid_from_string(&lrp_uuid, route->output_port)){
+memcpy(&route_port->rport, &lrp_uuid, sizeof (struct uuid));
+}
+shash_add(&od->routes_map, route->prefix, route_port);
+}
+
+static void
 join_datapaths(struct northd_context *ctx, struct hmap *datapaths,
struct ovs_list *sb_only, struct ovs_list *nb_only,
struct ovs_list *both)
@@ -409,6 +470,14 @@ join_datapaths(struct northd_context *ctx, struct hmap *datapaths,
 /* Set the gateway port to NULL.  If there is a gateway, it will 

[ovs-dev] [PATCH v1] Support port level IPFIX

2016-04-19 Thread Daniel Benli Ye
From: Benli Ye 

This patch enables port level IPFIX. Before this patch, OVS supported
per bridge IPFIX and per flow IPFIX, and exporting packet tunnel headers
was only supported by bridge IPFIX. This patch adds port level IPFIX
for easy configuration; port level IPFIX also supports exporting
packet tunnel headers, just the same as bridge level IPFIX.
Three main things are done in this patch:
  1) Add a column ipfix in the Port table to reference the IPFIX table
  2) Each interface in the port should use the port IPFIX configuration
  3) A hash map is used to manage the ports on which IPFIX is configured

CLI to configure Port IPFIX:
  1) Configure
 ovs-vsctl -- set Port port0 ipfix=@i -- --id=@i create IPFIX \
 targets=\"10.24.122.72:4739\" sampling=1 obs_domain_id=123 \
 obs_point_id=456 cache_active_timeout=1 cache_max_flows=128 \
 other_config:enable-tunnel-sampling=true
  2) Clear
 ovs-vsctl clear Port port0 ipfix
---
 lib/odp-util.c|  29 ++-
 lib/odp-util.h|  19 +-
 ofproto/ofproto-dpif-ipfix.c  | 403 +++---
 ofproto/ofproto-dpif-ipfix.h  |  17 ++
 ofproto/ofproto-dpif-upcall.c |  39 +++-
 ofproto/ofproto-dpif-xlate.c  | 117 
 ofproto/ofproto-dpif-xlate.h  |   3 +-
 ofproto/ofproto-dpif.c|  19 +-
 ofproto/ofproto-provider.h|   7 +-
 ofproto/ofproto.c |   7 +-
 ofproto/ofproto.h |  23 +++
 vswitchd/bridge.c | 123 +++--
 vswitchd/vswitch.ovsschema|   6 +-
 vswitchd/vswitch.xml  |  34 +++-
 14 files changed, 741 insertions(+), 105 deletions(-)

diff --git a/lib/odp-util.c b/lib/odp-util.c
index b4689cc..453ae4f 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -316,10 +316,16 @@ format_odp_userspace_action(struct ds *ds, const struct nlattr *attr)
   cookie.flow_sample.collector_set_id,
   cookie.flow_sample.obs_domain_id,
   cookie.flow_sample.obs_point_id);
-} else if (userdata_len >= sizeof cookie.ipfix
-   && cookie.type == USER_ACTION_COOKIE_IPFIX) {
-ds_put_format(ds, ",ipfix(output_port=%"PRIu32")",
-  cookie.ipfix.output_odp_port);
+} else if (userdata_len >= sizeof cookie.bridge_ipfix
+   && cookie.type == USER_ACTION_COOKIE_BRIDGE_IPFIX) {
+ds_put_format(ds, ",bridge_ipfix(output_port=%"PRIu32")",
+  cookie.bridge_ipfix.output_odp_port);
+} else if (userdata_len >= sizeof cookie.port_ipfix
+   && cookie.type == USER_ACTION_COOKIE_PORT_IPFIX) {
+ds_put_format(ds, ",port_ipfix(ofp_port=%"PRIu16
+  ",output_port=%"PRIu32")",
+  cookie.port_ipfix.ofp_port,
+  cookie.port_ipfix.output_odp_port);
 } else {
 userdata_unspec = true;
 }
@@ -963,13 +969,20 @@ parse_odp_userspace_action(const char *s, struct ofpbuf *actions)
 cookie.flow_sample.obs_point_id = obs_point_id;
 user_data = &cookie;
 user_data_size = sizeof cookie.flow_sample;
-} else if (ovs_scan(&s[n], ",ipfix(output_port=%"SCNi32")%n",
+} else if (ovs_scan(&s[n], ",bridge_ipfix(output_port=%"SCNi32")%n",
 &output, &n1) ) {
 n += n1;
-cookie.type = USER_ACTION_COOKIE_IPFIX;
-cookie.ipfix.output_odp_port = u32_to_odp(output);
+cookie.type = USER_ACTION_COOKIE_BRIDGE_IPFIX;
+cookie.bridge_ipfix.output_odp_port = u32_to_odp(output);
 user_data = &cookie;
-user_data_size = sizeof cookie.ipfix;
+user_data_size = sizeof cookie.bridge_ipfix;
+} else if (ovs_scan(&s[n], ",port_ipfix(output_port=%"SCNi32")%n",
+&output, &n1) ) {
+n += n1;
+cookie.type = USER_ACTION_COOKIE_PORT_IPFIX;
+cookie.port_ipfix.output_odp_port = u32_to_odp(output);
+user_data = &cookie;
+user_data_size = sizeof cookie.port_ipfix;
 } else if (ovs_scan(&s[n], ",userdata(%n",
 &n1)) {
 char *end;
 char *end;
diff --git a/lib/odp-util.h b/lib/odp-util.h
index 51cf5c3..4c9f271 100644
--- a/lib/odp-util.h
+++ b/lib/odp-util.h
@@ -274,10 +274,11 @@ enum slow_path_reason commit_odp_actions(const struct flow *,
 
 enum user_action_cookie_type {
 USER_ACTION_COOKIE_UNSPEC,
-USER_ACTION_COOKIE_SFLOW,/* Packet for per-bridge sFlow sampling. */
-USER_ACTION_COOKIE_SLOW_PATH,/* Userspace must process this flow. */
-USER_ACTION_COOKIE_FLOW_SAMPLE,  /* Packet for per-flow sampling. */
-USER_ACTION_COOKIE_IPFIX,/* Packet for per-bridge IPFIX sampling. */
+USER_ACTION_COOKIE_SFLOW, /* Packet for 

Re: [ovs-dev] [PATCH v7 08/16] dpif-netdev: Add pmd thread local port cache for transmission.

2016-04-19 Thread Ilya Maximets
On 19.04.2016 00:19, Daniele Di Proietto wrote:
> 
> 
> On 18/04/2016 07:50, "Ilya Maximets"  wrote:
> 
>> On 08.04.2016 06:13, Daniele Di Proietto wrote:
>>> Signed-off-by: Daniele Di Proietto 
>>> ---
>>>  lib/dpif-netdev.c | 243
>>> +++---
>>>  1 file changed, 175 insertions(+), 68 deletions(-)
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index 8c5893d..5d1cc43 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -185,6 +185,7 @@ static bool dpcls_lookup(const struct dpcls *cls,
>>>   *
>>>   *dp_netdev_mutex (global)
>>>   *port_mutex
>>> + *non_pmd_mutex
>>>   */
>>>  struct dp_netdev {
>>>  const struct dpif_class *const class;
>>> @@ -380,6 +381,13 @@ struct rxq_poll {
>>>  struct ovs_list node;
>>>  };
>>>  
>>> +/* Contained by struct dp_netdev_pmd_thread's 'port_cache' or
>>> 'tx_ports'. */
>>> +struct tx_port {
>>> +odp_port_t port_no;
>>> +struct netdev *netdev;
>>> +struct hmap_node node;
>>> +};
>>> +
>>>  /* PMD: Poll modes drivers.  PMD accesses devices via polling to
>>> eliminate
>>>   * the performance overhead of interrupt processing.  Therefore netdev
>>> can
>>>   * not implement rx-wait for these devices.  dpif-netdev needs to poll
>>> @@ -436,10 +444,18 @@ struct dp_netdev_pmd_thread {
>>>  atomic_int tx_qid;  /* Queue id used by this pmd
>>> thread to
>>>   * send packets on all netdevs */
>>>  
>>> -struct ovs_mutex poll_mutex;/* Mutex for poll_list. */
>>> +struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and
>>> 'tx_ports'. */
>>>  /* List of rx queues to poll. */
>>>  struct ovs_list poll_list OVS_GUARDED;
>>> -int poll_cnt;   /* Number of elemints in
>>> poll_list. */
>>> +/* Number of elements in 'poll_list' */
>>> +int poll_cnt;
>>> +/* Map of 'tx_port's used for transmission.  Written by the main
>>> thread,
>>> + * read by the pmd thread. */
>>> +struct hmap tx_ports OVS_GUARDED;
>>> +
>>> +/* Map of 'tx_port' used in the fast path. This is a thread-local
>>> copy
>>> + * 'tx_ports'. */
>>> +struct hmap port_cache;
>>>  
>>>  /* Only a pmd thread can write on its own 'cycles' and 'stats'.
>>>   * The main thread keeps 'stats_zero' and 'cycles_zero' as base
>>> @@ -495,7 +511,7 @@ dp_netdev_pmd_get_next(struct dp_netdev *dp, struct
>>> cmap_position *pos);
>>>  static void dp_netdev_destroy_all_pmds(struct dp_netdev *dp);
>>>  static void dp_netdev_del_pmds_on_numa(struct dp_netdev *dp, int
>>> numa_id);
>>>  static void dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int
>>> numa_id);
>>> -static void dp_netdev_pmd_clear_poll_list(struct dp_netdev_pmd_thread
>>> *pmd);
>>> +static void dp_netdev_pmd_clear_ports(struct dp_netdev_pmd_thread
>>> *pmd);
>>>  static void dp_netdev_del_port_from_all_pmds(struct dp_netdev *dp,
>>>   struct dp_netdev_port
>>> *port);
>>>  static void
>>> @@ -509,6 +525,8 @@ static void dp_netdev_reset_pmd_threads(struct
>>> dp_netdev *dp);
>>>  static bool dp_netdev_pmd_try_ref(struct dp_netdev_pmd_thread *pmd);
>>>  static void dp_netdev_pmd_unref(struct dp_netdev_pmd_thread *pmd);
>>>  static void dp_netdev_pmd_flow_flush(struct dp_netdev_pmd_thread *pmd);
>>> +static void pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd)
>>> +OVS_REQUIRES(pmd->port_mutex);
>>>  
>>>  static inline bool emc_entry_alive(struct emc_entry *ce);
>>>  static void emc_clear_entry(struct emc_entry *ce);
>>> @@ -691,7 +709,7 @@ pmd_info_show_rxq(struct ds *reply, struct
>>> dp_netdev_pmd_thread *pmd)
>>>  ds_put_format(reply, "pmd thread numa_id %d core_id %u:\n",
>>>pmd->numa_id, pmd->core_id);
>>>  
>>> -ovs_mutex_lock(&pmd->poll_mutex);
>>> +ovs_mutex_lock(&pmd->port_mutex);
>>>  LIST_FOR_EACH (poll, node, &pmd->poll_list) {
>>>  const char *name = netdev_get_name(poll->port->netdev);
>>>  
>>> @@ -705,7 +723,7 @@ pmd_info_show_rxq(struct ds *reply, struct
>>> dp_netdev_pmd_thread *pmd)
>>>  ds_put_format(reply, " %d",
>>> netdev_rxq_get_queue_id(poll->rx));
>>>  prev_name = name;
>>>  }
>>> -ovs_mutex_unlock(&pmd->poll_mutex);
>>> +ovs_mutex_unlock(&pmd->port_mutex);
>>>  ds_put_cstr(reply, "\n");
>>>  }
>>>  }
>>> @@ -1078,6 +1096,11 @@ dp_netdev_reload_pmd__(struct
>>> dp_netdev_pmd_thread *pmd)
>>>  int old_seq;
>>>  
>>>  if (pmd->core_id == NON_PMD_CORE_ID) {
>>> +ovs_mutex_lock(&pmd->dp->non_pmd_mutex);
>>> +ovs_mutex_lock(&pmd->port_mutex);
>>> +pmd_load_cached_ports(pmd);
>>> +ovs_mutex_unlock(&pmd->port_mutex);
>>> +ovs_mutex_unlock(&pmd->dp->non_pmd_mutex);
>>>  return;
>>>  }
>>>  
>>> @@ -1200,9 +1223,7 @@ do_add_port(struct dp_netdev *dp, const char
>>> 

Re: [ovs-dev] [PATCH v7 05/16] dpif-netdev: Fix race condition in pmd thread initialization.

2016-04-19 Thread Ilya Maximets
There was a reason for the 2 calls to dp_netdev_pmd_reload_done() inside
pmd_thread_main(): we must wait until the PMD thread is completely done
with reloading. This patch introduces a race condition on pmd->exit_latch.
While removing the last port on a NUMA node, dp_netdev_reload_pmd__(pmd)
will be called twice: first to remove the port and second to destroy the
PMD thread. pmd->exit_latch is set between these two calls. This leads to
a probable situation where the PMD thread exits while processing the first
reload; the main thread will then wait forever on cond_wait in the second
reload. The situation is easily reproducible by addition/deletion of the
last port (maybe after a few iterations in a cycle).

Best regards, Ilya Maximets.

On 08.04.2016 06:13, Daniele Di Proietto wrote:
> The pmds and the main threads are synchronized using a condition
> variable.  The main thread writes a new configuration, then it waits on
> the condition variable.  A pmd thread reads the new configuration, then
> it calls signal() on the condition variable. To make sure that the pmds
> and the main thread have a consistent view, each signal() should be
> backed by a wait().
> 
> Currently the first signal() doesn't have a corresponding wait().  If
> the pmd thread takes a long time to start and the signal() is received
> by a later wait, the threads will have an inconsistent view.
> 
> The commit fixes the problem by removing the first signal() from the
> pmd thread.
> 
> This is hardly a problem on current master, because the main thread
> will call the first wait() a long time after the creation of a pmd
> thread.  It becomes a problem with the next commits.
> 
> Signed-off-by: Daniele Di Proietto 
> ---
>  lib/dpif-netdev.c | 21 +
>  1 file changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 9c32c64..2424d3e 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -2652,21 +2652,22 @@ dpif_netdev_wait(struct dpif *dpif)
>  
>  static int
>  pmd_load_queues(struct dp_netdev_pmd_thread *pmd, struct rxq_poll 
> **ppoll_list)
> -OVS_REQUIRES(pmd->poll_mutex)
>  {
>  struct rxq_poll *poll_list = *ppoll_list;
>  struct rxq_poll *poll;
>  int i;
>  
> +ovs_mutex_lock(&pmd->poll_mutex);
>  poll_list = xrealloc(poll_list, pmd->poll_cnt * sizeof *poll_list);
>  
>  i = 0;
>  LIST_FOR_EACH (poll, node, &pmd->poll_list) {
>  poll_list[i++] = *poll;
>  }
> +ovs_mutex_unlock(&pmd->poll_mutex);
>  
>  *ppoll_list = poll_list;
> -return pmd->poll_cnt;
> +return i;
>  }
>  
>  static void *
> @@ -2685,13 +2686,10 @@ pmd_thread_main(void *f_)
>  /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
>  ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
>  pmd_thread_setaffinity_cpu(pmd->core_id);
> +poll_cnt = pmd_load_queues(pmd, &poll_list);
>  reload:
>  emc_cache_init(&pmd->flow_cache);
>  
> -ovs_mutex_lock(&pmd->poll_mutex);
> -poll_cnt = pmd_load_queues(pmd, &poll_list);
> -ovs_mutex_unlock(&pmd->poll_mutex);
> -
>  /* List port/core affinity */
>  for (i = 0; i < poll_cnt; i++) {
> VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
> @@ -2699,10 +2697,6 @@ reload:
>  netdev_rxq_get_queue_id(poll_list[i].rx));
>  }
>  
> -/* Signal here to make sure the pmd finishes
> - * reloading the updated configuration. */
> -dp_netdev_pmd_reload_done(pmd);
> -
>  for (;;) {
>  for (i = 0; i < poll_cnt; i++) {
>  dp_netdev_process_rxq_port(pmd, poll_list[i].port, 
> poll_list[i].rx);
> @@ -2725,14 +2719,17 @@ reload:
>  }
>  }
>  
> +poll_cnt = pmd_load_queues(pmd, &poll_list);
> +/* Signal here to make sure the pmd finishes
> + * reloading the updated configuration. */
> +dp_netdev_pmd_reload_done(pmd);
> +
>  emc_cache_uninit(&pmd->flow_cache);
>  
>  if (!latch_is_set(&pmd->exit_latch)) {
>  goto reload;
>  }
>  
> -dp_netdev_pmd_reload_done(pmd);
> -
>  free(poll_list);
>  return NULL;
>  }
> 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev