Re: [ovs-dev] [PATCH] Add VxLAN-GBP support for user space data path
> -----Original Message-----
> From: Jesse Gross [mailto:je...@kernel.org]
> Sent: Tuesday, April 19, 2016 11:22 PM
> To: Li, Johnson
> Cc: ovs dev
> Subject: Re: [ovs-dev] [PATCH] Add VxLAN-GBP support for user space data path
>
> On Tue, Apr 19, 2016 at 3:20 AM, Johnson.Li wrote:
> > diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
> > index e398562..e6b35ed 100644
> > --- a/lib/netdev-vport.c
> > +++ b/lib/netdev-vport.c
> > @@ -1297,10 +1298,16 @@ netdev_vxlan_pop_header(struct dp_packet *packet)
> >          return EINVAL;
> >      }
> >
> > -    if (get_16aligned_be32(&vxh->vx_flags) != htonl(VXLAN_FLAGS) ||
> > -        (get_16aligned_be32(&vxh->vx_vni) & htonl(0xff))) {
> > -        VLOG_WARN_RL(&err_rl, "invalid vxlan flags=%#x vni=%#x\n",
> > -                     ntohl(get_16aligned_be32(&vxh->vx_flags)),
> > +    vxh_flags = get_16aligned_be32(&vxh->vx_flags);
> > +    if ((vxh_flags & htonl(VXLAN_GBP_FLAGS)) ==
> > +        htonl(VXLAN_GBP_FLAGS)) {
>
> There are many overlapping extensions to VXLAN. We should only try to
> parse these flags when the GBP extension is actually turned on.
>
> > +        tnl->gbp_id = htons(ntohl(vxh_flags) & 0xffff); /* VXLAN_GBP_ID_MASK */
>
> There is a field called tnl->gbp_flags which should presumably also be
> populated here (and on transmit).

Ok, I will try to check each flag individually and store them into tnl->gbp_flags.

> > diff --git a/lib/packets.h b/lib/packets.h
> > index 8139a6b..d9a149d 100644
> > --- a/lib/packets.h
> > +++ b/lib/packets.h
> > @@ -1002,6 +1002,7 @@ struct vxlanhdr {
> > };
> >
> >  #define VXLAN_FLAGS 0x08000000  /* struct vxlanhdr.vx_flags required value. */
> > +#define VXLAN_GBP_FLAGS 0x88000000  /* Group Based Policy */
>
> Can you please break these down into individual flags so that it is easier to
> tell what is going on (and you could use them in parsing as well)?
>
> Can you also please add some test cases?

I will, thanks for your advice.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] Reply: Re: ovs + dpdk vhost-user match flows but cannot execute actions
Hi Mauricio Vasquez:

root@host52:~# ovs-vsctl show
f2664a74-7523-4240-812a-4f022051346a
    Bridge "ovsbr0"
        Port "vhost-user-1"
            Interface "vhost-user-1"
                type: dpdkvhostuser
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "vhost-user-0"
            Interface "vhost-user-0"
                type: dpdkvhostuser
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk

root@host52:~# ovs-ofctl dump-flows ovsbr0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=291278.976s, table=0, n_packets=4414, n_bytes=392730, idle_age=65534, hard_age=65534, priority=0 actions=NORMAL

root@host52:~# ovs-ofctl dump-ports ovsbr0
OFPST_PORT reply (xid=0x2): 5 ports
  port 4: rx pkts=2472, bytes=207008, drop=?, errs=0, frame=?, over=?, crc=?
          tx pkts=1893, bytes=181572, drop=49, errs=?, coll=?
  port LOCAL: rx pkts=23, bytes=1278, drop=0, errs=0, frame=0, over=0, crc=0
          tx pkts=631, bytes=28750, drop=0, errs=0, coll=0
  port 1: rx pkts=272, bytes=26283, drop=127396, errs=0, frame=?, over=?, crc=?
          tx pkts=0, bytes=0, drop=0, errs=0, coll=?
  port 2: rx pkts=0, bytes=0, drop=0, errs=0, frame=?, over=?, crc=?
          tx pkts=0, bytes=0, drop=0, errs=0, coll=?
  port 3: rx pkts=1919, bytes=184174, drop=?, errs=0, frame=?, over=?, crc=?
          tx pkts=2462, bytes=205658, drop=0, errs=?, coll=?

Thank you.

From: Mauricio Vásquez [mailto:mauricio.vasquezber...@studenti.polito.it]
Sent: April 19, 2016 21:49
To: lifuqiong
Cc: dev@openvswitch.org
Subject: Re: Reply: [ovs-dev] ovs + dpdk vhost-user match flows but cannot execute actions

Hi lifuqiong,

Could you provide the output of the following commands in your setup?

    ovs-vsctl show
    ovs-ofctl dump-flows
    ovs-ofctl dump-ports

Mauricio Vasquez,

On Sat, Apr 16, 2016 at 10:23 AM, lifuqiong wrote:

    Hi Mauricio:

    I changed my qemu version from 2.2.1 to 2.5.0 and now the VMs can
    communicate with each other. But a VM cannot ping a PC on the outside
    network, even though they are in the same subnet: the PC
    (192.168.0.103/24) pings the VM (192.168.0.90); the VM host's DPDK NIC
    and the PC are attached to the same L2 switch.

    The DPDK NIC is connected to the switch's port Eth1/0/8.

    Investigation: I bound the VM's IP and MAC in the PC's ARP table, and
    bound the VM's MAC to port Eth1/0/8. While the PC is pinging the VM, I
    can see the output packet counter of Eth1/0/8 increasing regularly,
    which means the ping requests have been sent to the DPDK NIC. But the
    VM does not receive them.

    My switch has no VLANs configured; the default VLAN is 1. When I ping
    from vm1 to vm2, the MAC table in OVS shows:

        port  VLAN  MAC                Age
        4     0     00:00:00:00:02:12  1
        3     0     00:00:00:00:00:04  1

    What could be the reason the VM cannot communicate with the PC?

    Thank you

From: lifuqiong [mailto:lifuqi...@cncloudsec.com]
Sent: April 15, 2016 9:22
To: 'Mauricio Vásquez'
Cc: 'dev@openvswitch.org'
Subject: Reply: [ovs-dev] ovs + dpdk vhost-user match flows but cannot execute actions

Hello Mauricio Vasquez:

It works. Thank you very much.

From: Mauricio Vásquez [mailto:mauricio.vasquezber...@studenti.polito.it]
Sent: April 14, 2016 14:55
To: lifuqiong
Cc: dev@openvswitch.org
Subject: Re: [ovs-dev] ovs + dpdk vhost-user match flows but cannot execute actions

Hello lifuqiong,

I faced the same problem some days ago
(http://openvswitch.org/pipermail/dev/2016-March/068282.html); the bug is
already fixed. Where are you downloading OVS from? It appears that the bug
is still present in the version at
http://openvswitch.org/releases/openvswitch-2.5.0.tar.gz, so please
download OVS from git and switch to branch-2.5.

Mauricio Vasquez,

On Thu, Apr 14, 2016 at 4:28 AM, lifuqiong wrote:

    I want to test the dpdk vhost-user port on OVS, following
    https://software.intel.com/en-us/blogs/2015/06/09/building-vhost-user-for-ovs-today-using-dpdk-200.
    I created the OVS+DPDK environment following INSTALL.DPDK.md, created
    two VMs, and tried to ping from one to the other, but got "Destination
    Host Unreachable". Dump-flows shows that packets match the flow, but
    they can't be output to port 4 -- why? I can't get any useful error or
    warning info from ovs-vswitchd.log.

    While pinging from vm1 to vm2, statistics on vm1 show that eth1's RX
    packet count stays at zero while its TX packet count keeps increasing.

    1. Versions:
       OVS: 2.5.0
       DPDK: 2.2.0
       QEMU: 2.2.1

    2. ovs-ofctl dump-flows ovsbr0
       NXST_FLOW reply (xid=0x4):
        cookie=0x0, duration=836.946s, table=0, n_packets=628, n_bytes=26376, idle_age=0, in_port=3 actions=output:4
        cookie=0x0, duration=831.458s, table=0, n_packets=36, n_bytes=1512, idle_age=770, in_port=4 actions=output:3

    3. root@host152:/usr/local/var/run/openvswitch# ovs-vsctl show
       03ae6f7d-3b71-45e3-beb0-09fa11292eaa
           Bridge "ovsbr0"
Re: [ovs-dev] [PATCH v2 15/15] system-tests: Run conntrack tests with userspace
On 19 April 2016 at 16:53, Joe Stringer wrote:
> On 15 April 2016 at 17:02, Daniele Di Proietto wrote:
>> The userspace connection tracker doesn't support ALGs, frag reassembly
>> or NAT yet, so skip those tests.
>>
>> Also, connection tracking state input from a local port is not possible
>> in userspace.
>>
>> Finally, the userspace datapath pads all frames with 0, to make them at
>> least 64 bytes.
>>
>> Signed-off-by: Daniele Di Proietto
>
> Having three prongs to your commit message like this usually indicates
> that there are three separate patches folded inside ;-)
>
> FWIW the LOCAL tests should work whether the local stack provides
> conntrack state input or not - they just exercise slightly different
> codepaths in the kernel, once the packet leaves OVS, by going out a
> different netdev type (with a different destination netns). Probably
> the class of bugs these tests pick up are not applicable for the
> userspace datapath, so it seems fine to skip the conntrack-related
> ones. I wouldn't mind seeing a basic sanity test that sends packets
> between the local stack and another namespace, though - which would run
> for both testsuites, and both with IPv4 and IPv6 traffic.

I was mistaken here. I looked back over and apparently some of the tests do
rely on this behaviour, and they're passing on the latest net-next kernels.
They can be skipped for userspace.

> A few comments on the comments below, but LGTM.
>
> Acked-by: Joe Stringer
>
>> ---
>>  tests/system-kmod-macros.at      | 28 +++
>>  tests/system-traffic.at          | 49 ++--
>>  tests/system-userspace-macros.at | 45 +---
>>  3 files changed, 107 insertions(+), 15 deletions(-)
>>
>> diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
>> index 8e60929..4cecc23 100644
>> --- a/tests/system-kmod-macros.at
>> +++ b/tests/system-kmod-macros.at
>> @@ -65,3 +65,31 @@ m4_define([CHECK_CONNTRACK],
>>      on_exit 'ovstest test-netlink-conntrack flush'
>>     ]
>> )
>> +
>> +# CHECK_CONNTRACK_ALG()
>> +#
>> +# Perform requirements checks for running conntrack ALG tests. The kernel
>> +# always supports ALG, so no check is needed.
>> +#
>> +m4_define([CHECK_CONNTRACK_ALG])
>
> Saying the kernel "always" supports X is a little misleading, as these
> features can all be turned off in the CONFIG.. of course, all the
> major distros provide kernels with them turned on so it's not such a
> big deal. Maybe drop the 'always'.
>
>> +# CHECK_CONNTRACK_FRAG()
>> +#
>> +# Perform requirements checks for running conntrack fragmentation tests.
>> +# The kernel always supports fragmentation, so no check is needed.
>> +m4_define([CHECK_CONNTRACK_FRAG])
>> +
>> +# CHECK_CONNTRACK_LOCAL_STACK()
>> +#
>> +# Perform requirements checks for running conntrack tests with local stack.
>> +# The kernel always supports reading the connection state of an skb coming
>> +# from an internal port, without an explicit ct() action, so no check is
>> +# needed.
>> +m4_define([CHECK_CONNTRACK_LOCAL_STACK])
>
> While the kernel module supports this, the skb is not guaranteed to
> have conntrack state when coming from an internal port, depending on
> the kernel version and whether iptables rules are installed in the
> host network namespace.

Looks like I got a little mixed up - netfilter hooks are not installed in
all namespaces by default on the latest kernels, but I'd guess that they're
still set up in the root netns, so it should be fairly reliable there (at
least for the moment).

>> +# CHECK_CONNTRACK_LOCAL_STACK()
>> +#
>> +# Perform requirements checks for running conntrack tests with local stack.
>> +# While the kernel connection tracker automatically passes all the connection
>> +# tracking state from an internal port to the OpenvSwitch kernel module, there
>> +# is simply no way of doing that with the userspace, so skip the tests.
>> +m4_define([CHECK_CONNTRACK_LOCAL_STACK],
>> +[
>> +    AT_SKIP_IF([:])
>> +])
>
> Same comment here, packets may not have conntrack state associated
> when arriving on an internal port.
>
>> +# CHECK_CONNTRACK_NAT()
>> +#
>> +# Perform requirements checks for running conntrack NAT tests. The userspace
>> +# doesn't support NAT yet, so skip the tests.
>> +#
>> +m4_define([CHECK_CONNTRACK_NAT],
>> +[
>> +    AT_SKIP_IF([:])
>> +])
>> --
>> 2.1.4
Re: [ovs-dev] [PATCH 2/4] ovsdb: Introduce OVSDB replication feature
Hi,

As part of this patch the following scenario takes place: an ovsdb-server
acts as a client of another ovsdb-server. Server A is monitoring Server B's
tables, so Server B constantly sends update notifications to Server A.

When an update notification arrives at Server A, I'd like to process that
notification right away, as if it were a client request. In order to wake
up Server A from poll_block() when an update notification arrives, how can
I add the file descriptor corresponding to the jsonrpc connection receiving
notifications to the set of file descriptors used to wake up the server
during a poll block?

Currently, Server A only processes the incoming update notifications when
status_timer in the server main loop elapses. Could someone guide me on how
to implement that?

Thanks,
-Mario

-----Original Message-----
From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Cabrera Vega, Mario Alberto
Sent: Tuesday, March 29, 2016 3:30 PM
To: dev@openvswitch.org
Subject: [ovs-dev] [PATCH 2/4] ovsdb: Introduce OVSDB replication feature

Replication is enabled by using the following option when starting the
database server:

    --sync-from=server

Where 'server' can take any form described in the ovsdb-client(1) manpage
as an active connection. If this option is specified, the replication
process is immediately started.

Signed-off-by: Mario Cabrera
---
 ovsdb/automake.mk         |   6 +-
 ovsdb/ovsdb-server.1.in   |   3 +
 ovsdb/ovsdb-server.c      |  46 ++--
 ovsdb/replication-syn.man |   2 +
 ovsdb/replication.c       | 597 ++
 ovsdb/replication.h       |  39 +++
 ovsdb/replication.man     |   8 +
 tests/ovsdb-server.at     |  51 
 8 files changed, 725 insertions(+), 27 deletions(-)
 create mode 100644 ovsdb/replication-syn.man
 create mode 100644 ovsdb/replication.c
 create mode 100644 ovsdb/replication.h
 create mode 100644 ovsdb/replication.man

diff --git a/ovsdb/automake.mk b/ovsdb/automake.mk
index 7db6fea..099ed3c 100644
--- a/ovsdb/automake.mk
+++ b/ovsdb/automake.mk
@@ -24,6 +24,8 @@ ovsdb_libovsdb_la_SOURCES = \
 	ovsdb/monitor.h \
 	ovsdb/query.c \
 	ovsdb/query.h \
+	ovsdb/replication.c \
+	ovsdb/replication.h \
 	ovsdb/row.c \
 	ovsdb/row.h \
 	ovsdb/server.c \
@@ -42,7 +44,9 @@ pkgconfig_DATA += \
 MAN_FRAGMENTS += \
 	ovsdb/remote-active.man \
-	ovsdb/remote-passive.man
+	ovsdb/remote-passive.man \
+	ovsdb/replication.man \
+	ovsdb/replication-syn.man

 # ovsdb-tool
 bin_PROGRAMS += ovsdb/ovsdb-tool

diff --git a/ovsdb/ovsdb-server.1.in b/ovsdb/ovsdb-server.1.in
index 6c85729..1025ade 100644
--- a/ovsdb/ovsdb-server.1.in
+++ b/ovsdb/ovsdb-server.1.in
@@ -19,6 +19,7 @@ ovsdb\-server \- Open vSwitch database server
 .so lib/daemon-syn.man
 .so lib/service-syn.man
 .so lib/vlog-syn.man
+.so ovsdb/replication-syn.man
 .so lib/ssl-syn.man
 .so lib/ssl-bootstrap-syn.man
 .so lib/ssl-peer-ca-cert-syn.man
@@ -100,6 +101,8 @@ configured remotes.
 .so lib/service.man
 .SS "Logging Options"
 .so lib/vlog.man
+.SS "Syncing Options"
+.so ovsdb/replication.man
 .SS "Public Key Infrastructure Options"
 The options described below for configuring the SSL public key
 infrastructure accept a special syntax for obtaining their

diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
index fa662b1..63dd209 100644
--- a/ovsdb/ovsdb-server.c
+++ b/ovsdb/ovsdb-server.c
@@ -42,6 +42,7 @@
 #include "ovsdb-error.h"
 #include "poll-loop.h"
 #include "process.h"
+#include "replication.h"
 #include "row.h"
 #include "simap.h"
 #include "shash.h"
@@ -59,15 +60,7 @@
 VLOG_DEFINE_THIS_MODULE(ovsdb_server);

-struct db {
-    /* Initialized in main(). */
-    char *filename;
-    struct ovsdb_file *file;
-    struct ovsdb *db;
-
-    /* Only used by update_remote_status(). */
-    struct ovsdb_txn *txn;
-};
+struct db;

 /* SSL configuration. */
 static char *private_key_file;
@@ -75,6 +68,9 @@
 static char *certificate_file;
 static char *ca_cert_file;
 static bool bootstrap_ca_cert;

+/* Replication configuration. */
+static bool connect_to_remote_server;
+
 static unixctl_cb_func ovsdb_server_exit;
 static unixctl_cb_func ovsdb_server_compact;
 static unixctl_cb_func ovsdb_server_reconnect;
@@ -159,6 +155,10 @@ main_loop(struct ovsdb_jsonrpc_server *jsonrpc, struct shash *all_dbs,
     report_error_if_changed(reconfigure_ssl(all_dbs), &ssl_error);
     ovsdb_jsonrpc_server_run(jsonrpc);

+    if (connect_to_remote_server) {
+        replication_run(all_dbs);
+    }
+
     SHASH_FOR_EACH(node, all_dbs) {
         struct db *db = node->data;
         ovsdb_trigger_run(db->db, time_msec());
@@ -170,9 +170,9 @@ main_loop(struct ovsdb_jsonrpc_server *jsonrpc, struct shash *all_dbs,
     }
 }

-/* update Manager status(es) every 5 seconds */
+/* update Manager status(es)
Re: [ovs-dev] [PATCH v2 15/15] system-tests: Run conntrack tests with userspace
On 15 April 2016 at 17:02, Daniele Di Proietto wrote:
> The userspace connection tracker doesn't support ALGs, frag reassembly
> or NAT yet, so skip those tests.
>
> Also, connection tracking state input from a local port is not possible
> in userspace.
>
> Finally, the userspace datapath pads all frames with 0, to make them at
> least 64 bytes.
>
> Signed-off-by: Daniele Di Proietto

Having three prongs to your commit message like this usually indicates that
there are three separate patches folded inside ;-)

FWIW the LOCAL tests should work whether the local stack provides conntrack
state input or not - they just exercise slightly different codepaths in the
kernel, once the packet leaves OVS, by going out a different netdev type
(with a different destination netns). Probably the class of bugs these
tests pick up are not applicable for the userspace datapath, so it seems
fine to skip the conntrack-related ones. I wouldn't mind seeing a basic
sanity test that sends packets between the local stack and another
namespace, though - which would run for both testsuites, and both with IPv4
and IPv6 traffic.

A few comments on the comments below, but LGTM.

Acked-by: Joe Stringer

> ---
>  tests/system-kmod-macros.at      | 28 +++
>  tests/system-traffic.at          | 49 ++--
>  tests/system-userspace-macros.at | 45 +---
>  3 files changed, 107 insertions(+), 15 deletions(-)
>
> diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
> index 8e60929..4cecc23 100644
> --- a/tests/system-kmod-macros.at
> +++ b/tests/system-kmod-macros.at
> @@ -65,3 +65,31 @@ m4_define([CHECK_CONNTRACK],
>      on_exit 'ovstest test-netlink-conntrack flush'
>     ]
> )
> +
> +# CHECK_CONNTRACK_ALG()
> +#
> +# Perform requirements checks for running conntrack ALG tests. The kernel
> +# always supports ALG, so no check is needed.
> +#
> +m4_define([CHECK_CONNTRACK_ALG])

Saying the kernel "always" supports X is a little misleading, as these
features can all be turned off in the CONFIG.. of course, all the major
distros provide kernels with them turned on so it's not such a big deal.
Maybe drop the 'always'.

> +# CHECK_CONNTRACK_FRAG()
> +#
> +# Perform requirements checks for running conntrack fragmentation tests.
> +# The kernel always supports fragmentation, so no check is needed.
> +m4_define([CHECK_CONNTRACK_FRAG])
> +
> +# CHECK_CONNTRACK_LOCAL_STACK()
> +#
> +# Perform requirements checks for running conntrack tests with local stack.
> +# The kernel always supports reading the connection state of an skb coming
> +# from an internal port, without an explicit ct() action, so no check is
> +# needed.
> +m4_define([CHECK_CONNTRACK_LOCAL_STACK])

While the kernel module supports this, the skb is not guaranteed to have
conntrack state when coming from an internal port, depending on the kernel
version and whether iptables rules are installed in the host network
namespace.

> +# CHECK_CONNTRACK_LOCAL_STACK()
> +#
> +# Perform requirements checks for running conntrack tests with local stack.
> +# While the kernel connection tracker automatically passes all the connection
> +# tracking state from an internal port to the OpenvSwitch kernel module, there
> +# is simply no way of doing that with the userspace, so skip the tests.
> +m4_define([CHECK_CONNTRACK_LOCAL_STACK],
> +[
> +    AT_SKIP_IF([:])
> +])

Same comment here, packets may not have conntrack state associated when
arriving on an internal port.

> +# CHECK_CONNTRACK_NAT()
> +#
> +# Perform requirements checks for running conntrack NAT tests. The userspace
> +# doesn't support NAT yet, so skip the tests.
> +#
> +m4_define([CHECK_CONNTRACK_NAT],
> +[
> +    AT_SKIP_IF([:])
> +])
> --
> 2.1.4
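Following Joe's point that these kernel features can be compiled out, one way to make the kmod macros probe instead of assume would be something like the sketch below. This is only an illustration of the autotest idiom (the probed path is an assumption about where helper support is visible on a given kernel, not what OVS ended up doing):

```m4
# CHECK_CONNTRACK_ALG()
#
# Skip conntrack ALG tests if the running kernel does not expose
# connection-tracking helper support (hypothetical probe; the exact
# feature file varies by kernel version and config).
m4_define([CHECK_CONNTRACK_ALG],
[
    AT_SKIP_IF([! test -e /proc/sys/net/netfilter/nf_conntrack_helper])
])
```

A probe like this keeps the testsuite honest on minimal kernels while remaining a no-op on the distro kernels Joe mentions.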
Re: [ovs-dev] [RFC PATCH] create vxlan device using rtnetlink interface
On Mon, Apr 18, 2016 at 2:57 AM, Thadeu Lima de Souza Cascardo wrote:
> On Fri, Apr 15, 2016 at 08:36:51PM -0700, Jesse Gross wrote:
>> On Fri, Apr 15, 2016 at 2:30 PM, Thadeu Lima de Souza Cascardo wrote:
>> > Hi, this is an RFC patch (could probably be 2 patches - more below)
>> > that creates VXLAN devices in userspace and adds them as netdev
>> > vports, instead of using the vxlan vport code.
>> >
>> > The reason for this change is that it has been mentioned in the past
>> > that some of the vport code should be considered compat code, that
>> > is, it won't receive new features or flags.
>>
>> Thanks for looking at this. I agree that this is definitely a
>> necessary direction.
>>
>> > First is the creation of the device under a switch/case. I tried to
>> > make this a netdev_class function, overriding the netdev class by
>> > dpif_port_open_type. That required exporting a lot of the vport
>> > functions. I can still do it that way, if that's preferable, but this
>> > seems simpler.
>>
>> Can you give some examples of what would need to be exported? It would
>> be nice to just set the open function for each type but if it is
>> really horrible we can live with the switch statement.
>
> netdev_vport_{alloc,construct,destruct,dealloc}, get_tunnel_config,
> set_tunnel_config, get_netdev_tunnel_config.
>
> We could also do it the other way around, write this code in
> netdev-vport.c and export one or two functions from netdev-linux or
> dpif-netlink, and the create function per tunnel type.

OK, thanks, I understand now. I think what you have currently is probably
the best solution for the time being.

>> > The other problem is during port_del: since we don't have a netdev,
>> > the dpif_port type would not be vxlan, and deciding whether to remove
>> > it or not could not be made. In fact, this is one of the main reasons
>> > why this is an RFC.
>> >
>> > The solution here is to decide on the type by the device's name, and
>> > there is even a function for this, though it looks up the netdevs
>> > already created, those that probably come from the database. However,
>> > when OVS starts, it removes ports from the datapath, and that
>> > information is not available, and may not even be. In fact, since the
>> > dpif_port has a different name because it's a vport, it won't match
>> > any netdev at all. So, I did the opposite of getting the type from
>> > the dpif_port name, i.e. if it contains vxlan_sys, it's of vxlan
>> > type. Even if we make this more strict and use the port, we could
>> > still be in conflict with a vxlan device created by the user with
>> > such a name. This should have been one patch by itself.
>>
>> What about using the driver name exposed through ethtool? I believe
>> that all of the tunnel and internal devices expose this in a
>> consistent way - i.e. a VXLAN device can be queried and get back the
>> string "vxlan". Any driver other than the ones that we recognize can
>> be assumed to be OVS_VPORT_TYPE_NETDEV.
>
> I don't see how this addresses the case when the user adds a vxlan
> interface created by the system. Like:
>
> ip link add vxlan_sys_4789 type vxlan id 10 remote 192.168.123.1 dstport 4789
> ovs-vsctl add-port br0 vxlan_sys_4789
>
> Its driver would be vxlan. That goes to vxlan0 too.

I think we can check the combination of device type and the options that
are set on it (the same as what we need to do after we create the device).
If all of those match, then it doesn't matter whether we added it
previously or the user added it - the device will work exactly the same
either way. If there isn't a match then we should bail out without adding
the port. This is pretty similar to the behavior of the current compat code
in the kernel (in that case, if a port already exists with the same name,
it just aborts).

Two unresolved issues in my mind:

* How do we handle cleaning up ports (including in cases where userspace
  crashes)? In the kernel case, the port will stick around until either the
  module is removed or the port is explicitly deleted.

* How do we handle upgrade where the new version of OVS needs options that
  are different from the previous version? This is basically a special
  version of the port being created by someone else, but we should be able
  to handle the differences. Depending on how good a job we do at cleaning
  up the port, this might not be an issue.
Re: [ovs-dev] [PATCH v7 05/16] dpif-netdev: Fix race condition in pmd thread initialization.
On 19/04/2016 02:48, "Ilya Maximets" wrote:

>On 19.04.2016 10:18, Ilya Maximets wrote:
>> There was a reason for the two calls to dp_netdev_pmd_reload_done()
>> inside pmd_thread_main(): we must wait until the PMD thread is
>> completely done with reloading. This patch introduces a race condition
>> on pmd->exit_latch. While removing the last port on a NUMA node,
>> dp_netdev_reload_pmd__(pmd) will be called twice for each port: first to
>> remove the port and second to destroy the PMD thread. pmd->exit_latch is
>> set between these two calls. This leads to a probable situation where
>> the PMD thread exits while processing the first reload. The main thread
>> will then wait forever on cond_wait in the second reload. The situation
>> is easily reproducible by addition/deletion of the last port (maybe
>> after a few iterations in a cycle).
>>
>> Best regards, Ilya Maximets.
>
>This incremental should help:
>--
>diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>index 588d56f..2235297 100644
>--- a/lib/dpif-netdev.c
>+++ b/lib/dpif-netdev.c
>@@ -2785,6 +2785,7 @@ pmd_thread_main(void *f_)
>     unsigned int port_seq = PMD_INITIAL_SEQ;
>     int poll_cnt;
>     int i;
>+    bool exiting;
>
>     poll_cnt = 0;
>     poll_list = NULL;
>@@ -2825,14 +2826,15 @@ reload:
>         }
>     }
>
>+    emc_cache_uninit(&pmd->flow_cache);
>+
>     poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
>+    exiting = latch_is_set(&pmd->exit_latch);
>     /* Signal here to make sure the pmd finishes
>      * reloading the updated configuration. */
>     dp_netdev_pmd_reload_done(pmd);
>
>-    emc_cache_uninit(&pmd->flow_cache);
>-
>-    if (!latch_is_set(&pmd->exit_latch)){
>+    if (!exiting) {
>         goto reload;
>     }
>
>--

You're right, thanks for the detailed analysis and the suggested fix.
I applied the suggested incremental, but kept emc_cache_uninit() where it
is right now.

I sent an updated version here:
http://openvswitch.org/pipermail/dev/2016-April/069835.html

Thanks,
Daniele

>> On 08.04.2016 06:13, Daniele Di Proietto wrote:
>>> The pmds and the main threads are synchronized using a condition
>>> variable. The main thread writes a new configuration, then it waits on
>>> the condition variable. A pmd thread reads the new configuration, then
>>> it calls signal() on the condition variable. To make sure that the pmds
>>> and the main thread have a consistent view, each signal() should be
>>> backed by a wait().
>>>
>>> Currently the first signal() doesn't have a corresponding wait(). If
>>> the pmd thread takes a long time to start and the signal() is received
>>> by a later wait, the threads will have an inconsistent view.
>>>
>>> The commit fixes the problem by removing the first signal() from the
>>> pmd thread.
>>>
>>> This is hardly a problem on current master, because the main thread
>>> will call the first wait() a long time after the creation of a pmd
>>> thread. It becomes a problem with the next commits.
>>>
>>> Signed-off-by: Daniele Di Proietto
>>> ---
>>>  lib/dpif-netdev.c | 21 +
>>>  1 file changed, 9 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index 9c32c64..2424d3e 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -2652,21 +2652,22 @@ dpif_netdev_wait(struct dpif *dpif)
>>>
>>>  static int
>>>  pmd_load_queues(struct dp_netdev_pmd_thread *pmd, struct rxq_poll **ppoll_list)
>>> -    OVS_REQUIRES(pmd->poll_mutex)
>>>  {
>>>      struct rxq_poll *poll_list = *ppoll_list;
>>>      struct rxq_poll *poll;
>>>      int i;
>>>
>>> +    ovs_mutex_lock(&pmd->poll_mutex);
>>>      poll_list = xrealloc(poll_list, pmd->poll_cnt * sizeof *poll_list);
>>>
>>>      i = 0;
>>>      LIST_FOR_EACH (poll, node, &pmd->poll_list) {
>>>          poll_list[i++] = *poll;
>>>      }
>>> +    ovs_mutex_unlock(&pmd->poll_mutex);
>>>
>>>      *ppoll_list = poll_list;
>>> -    return pmd->poll_cnt;
>>> +    return i;
>>>  }
>>>
>>>  static void *
>>> @@ -2685,13 +2686,10 @@ pmd_thread_main(void *f_)
>>>      /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
>>>      ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
>>>      pmd_thread_setaffinity_cpu(pmd->core_id);
>>> +    poll_cnt = pmd_load_queues(pmd, &poll_list);
>>>  reload:
>>>      emc_cache_init(&pmd->flow_cache);
>>>
>>> -    ovs_mutex_lock(&pmd->poll_mutex);
>>> -    poll_cnt = pmd_load_queues(pmd, &poll_list);
>>> -    ovs_mutex_unlock(&pmd->poll_mutex);
>>> -
>>>      /* List port/core affinity */
>>>      for (i = 0; i < poll_cnt; i++) {
>>>          VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
>>> @@ -2699,10 +2697,6 @@ reload:
>>>                   netdev_rxq_get_queue_id(poll_list[i].rx));
>>>      }
>>>
>>> -    /* Signal here to make sure the pmd finishes
>>> -     * reloading the updated configuration. */
>>> -
[ovs-dev] [PATCH v8 16/16] netdev-dpdk: Use ->reconfigure() call to change rx/tx queues.
This introduces in dpif-netdev and netdev-dpdk the first use for the newly introduce reconfigure netdev call. When a request to change the number of queues comes, netdev-dpdk will remember this and notify the upper layer via netdev_request_reconfigure(). The datapath, instead of periodically calling netdev_set_multiq(), can detect this and call reconfigure(). This mechanism can also be used to: * Automatically match the number of rxq with the one provided by qemu via the new_device callback. * Provide a way to change the MTU of dpdk devices at runtime. * Move a DPDK vhost device to the proper NUMA socket. Signed-off-by: Daniele Di Proietto--- lib/dpif-netdev.c | 69 +- lib/netdev-bsd.c | 2 +- lib/netdev-dpdk.c | 195 ++ lib/netdev-dummy.c| 2 +- lib/netdev-linux.c| 2 +- lib/netdev-provider.h | 23 +++--- lib/netdev-vport.c| 2 +- lib/netdev.c | 36 +++--- lib/netdev.h | 3 +- 9 files changed, 160 insertions(+), 174 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 3a250fd..cce3bf1 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -257,8 +257,6 @@ struct dp_netdev_port { unsigned n_rxq; /* Number of elements in 'rxq' */ struct netdev_rxq **rxq; char *type; /* Port type as requested by user. */ -int latest_requested_n_rxq; /* Latest requested from netdev number - of rx queues. */ }; /* Contained by struct dp_netdev_flow's 'stats' member. */ @@ -1161,20 +1159,26 @@ port_create(const char *devname, const char *open_type, const char *type, /* There can only be ovs_numa_get_n_cores() pmd threads, * so creates a txq for each, and one extra for the non * pmd threads. 
*/ -error = netdev_set_multiq(netdev, n_cores + 1, - netdev_requested_n_rxq(netdev)); +error = netdev_set_tx_multiq(netdev, n_cores + 1); if (error && (error != EOPNOTSUPP)) { VLOG_ERR("%s, cannot set multiq", devname); goto out; } } + +if (netdev_is_reconf_required(netdev)) { +error = netdev_reconfigure(netdev); +if (error) { +goto out; +} +} + port = xzalloc(sizeof *port); port->port_no = port_no; port->netdev = netdev; port->n_rxq = netdev_n_rxq(netdev); port->rxq = xcalloc(port->n_rxq, sizeof *port->rxq); port->type = xstrdup(type); -port->latest_requested_n_rxq = netdev_requested_n_rxq(netdev); for (i = 0; i < port->n_rxq; i++) { error = netdev_rxq_open(netdev, >rxq[i], i); @@ -2455,27 +2459,6 @@ dpif_netdev_operate(struct dpif *dpif, struct dpif_op **ops, size_t n_ops) } } -/* Returns true if the configuration for rx queues is changed. */ -static bool -pmd_n_rxq_changed(const struct dp_netdev *dp) -{ -struct dp_netdev_port *port; - -ovs_mutex_lock(>port_mutex); -HMAP_FOR_EACH (port, node, >ports) { -int requested_n_rxq = netdev_requested_n_rxq(port->netdev); - -if (netdev_is_pmd(port->netdev) -&& port->latest_requested_n_rxq != requested_n_rxq) { -ovs_mutex_unlock(>port_mutex); -return true; -} -} -ovs_mutex_unlock(>port_mutex); - -return false; -} - static bool cmask_equals(const char *a, const char *b) { @@ -2599,11 +2582,9 @@ static int port_reconfigure(struct dp_netdev_port *port) { struct netdev *netdev = port->netdev; -int requested_n_rxq = netdev_requested_n_rxq(netdev); int i, err; -if (!netdev_is_pmd(port->netdev) -|| port->latest_requested_n_rxq != requested_n_rxq) { +if (!netdev_is_reconf_required(netdev)) { return 0; } @@ -2614,15 +2595,14 @@ port_reconfigure(struct dp_netdev_port *port) } port->n_rxq = 0; -/* Sets the new rx queue config. */ -err = netdev_set_multiq(port->netdev, ovs_numa_get_n_cores() + 1, -requested_n_rxq); +/* Allows 'netdev' to apply the pending configuration changes. 
*/ +err = netdev_reconfigure(netdev); if (err && (err != EOPNOTSUPP)) { -VLOG_ERR("Failed to set dpdk interface %s rx_queue to: %u", - netdev_get_name(port->netdev), requested_n_rxq); +VLOG_ERR("Failed to set interface %s new configuration", + netdev_get_name(netdev)); return err; } -/* If the set_multiq() above succeeds, reopens the 'rxq's. */ +/* If the netdev_reconfigure( above succeeds, reopens the 'rxq's. */ port->rxq = xrealloc(port->rxq, sizeof *port->rxq * netdev_n_rxq(netdev)); for (i = 0; i < netdev_n_rxq(netdev); i++) { err = netdev_rxq_open(netdev, >rxq[i], i); @@ -2666,6 +2646,22 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
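The datapath-side flow introduced here (close rxqs, let the netdev apply pending changes, reopen rxqs) can be modeled in a few lines. This is a stand-alone sketch with simplified stand-in types; `fake_netdev`, `fake_port`, and both functions are illustrative only, not the real OVS API:

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal stand-alone model of port_reconfigure() in this patch: the
 * datapath "closes" its rx queues, lets the netdev apply its pending
 * configuration, then "reopens" one rxq per (possibly new) queue.
 * All names and types here are simplified stand-ins. */
struct fake_netdev {
    int n_rxq;           /* current number of rx queues */
    int requested_n_rxq; /* pending request from the user */
};

struct fake_port {
    struct fake_netdev *netdev;
    int n_rxq;           /* rxqs currently open by the datapath */
};

static int
fake_netdev_reconfigure(struct fake_netdev *dev)
{
    dev->n_rxq = dev->requested_n_rxq;  /* apply the pending change */
    return 0;
}

static int
fake_port_reconfigure(struct fake_port *port)
{
    int err;

    port->n_rxq = 0;                    /* "close" the existing rxqs */
    err = fake_netdev_reconfigure(port->netdev);
    if (err) {
        return err;                     /* caller removes the port */
    }
    port->n_rxq = port->netdev->n_rxq;  /* "reopen" the rxqs */
    return 0;
}
```

The real code additionally tolerates `EOPNOTSUPP` and opens each rxq individually, bailing out on the first failure.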
[ovs-dev] [PATCH v8 12/16] ofproto-dpif: Call dpif_poll_threads_set() before dpif_run().
An upcoming commit will make dpif_poll_threads_set() record the requested configuration and dpif_run() apply it, so it makes sense to change the order. Signed-off-by: Daniele Di ProiettoTested-by: Ilya Maximets Acked-by: Ilya Maximets Acked-by: Mark Kavanagh --- ofproto/ofproto-dpif.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c index 747482c..2aa12b7 100644 --- a/ofproto/ofproto-dpif.c +++ b/ofproto/ofproto-dpif.c @@ -534,6 +534,8 @@ type_run(const char *type) return 0; } +/* This must be called before dpif_run() */ +dpif_poll_threads_set(backer->dpif, pmd_cpu_mask); if (dpif_run(backer->dpif)) { backer->need_revalidate = REV_RECONFIGURE; @@ -562,8 +564,6 @@ type_run(const char *type) udpif_set_threads(backer->udpif, n_handlers, n_revalidators); } -dpif_poll_threads_set(backer->dpif, pmd_cpu_mask); - if (backer->need_revalidate) { struct ofproto_dpif *ofproto; struct simap_node *node; -- 2.1.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v8 13/16] dpif-netdev: Change pmd thread configuration in dpif_netdev_run().
Signed-off-by: Daniele Di Proietto--- lib/dpif-netdev.c | 144 ++-- lib/dpif-provider.h | 3 +- 2 files changed, 84 insertions(+), 63 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 8cc37e2..c905d1d 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -224,7 +224,9 @@ struct dp_netdev { ovsthread_key_t per_pmd_key; /* Cpu mask for pin of pmd threads. */ +char *requested_pmd_cmask; char *pmd_cmask; + uint64_t last_tnl_conf_seq; }; @@ -2453,18 +2455,17 @@ dpif_netdev_operate(struct dpif *dpif, struct dpif_op **ops, size_t n_ops) } } -/* Returns true if the configuration for rx queues or cpu mask - * is changed. */ +/* Returns true if the configuration for rx queues is changed. */ static bool -pmd_config_changed(const struct dp_netdev *dp, const char *cmask) +pmd_n_rxq_changed(const struct dp_netdev *dp) { struct dp_netdev_port *port; ovs_mutex_lock(>port_mutex); HMAP_FOR_EACH (port, node, >ports) { -struct netdev *netdev = port->netdev; -int requested_n_rxq = netdev_requested_n_rxq(netdev); -if (netdev_is_pmd(netdev) +int requested_n_rxq = netdev_requested_n_rxq(port->netdev); + +if (netdev_is_pmd(port->netdev) && port->latest_requested_n_rxq != requested_n_rxq) { ovs_mutex_unlock(>port_mutex); return true; @@ -2472,69 +2473,29 @@ pmd_config_changed(const struct dp_netdev *dp, const char *cmask) } ovs_mutex_unlock(>port_mutex); -if (dp->pmd_cmask != NULL && cmask != NULL) { -return strcmp(dp->pmd_cmask, cmask); -} else { -return (dp->pmd_cmask != NULL || cmask != NULL); +return false; +} + +static bool +cmask_equals(const char *a, const char *b) +{ +if (a && b) { +return !strcmp(a, b); } + +return a == NULL && b == NULL; } -/* Resets pmd threads if the configuration for 'rxq's or cpu mask changes. */ +/* Changes the number or the affinity of pmd threads. The changes are actually + * applied in dpif_netdev_run(). 
*/ static int dpif_netdev_pmd_set(struct dpif *dpif, const char *cmask) { struct dp_netdev *dp = get_dp_netdev(dpif); -if (pmd_config_changed(dp, cmask)) { -struct dp_netdev_port *port; - -dp_netdev_destroy_all_pmds(dp); - -ovs_mutex_lock(>port_mutex); -HMAP_FOR_EACH (port, node, >ports) { -struct netdev *netdev = port->netdev; -int requested_n_rxq = netdev_requested_n_rxq(netdev); -if (netdev_is_pmd(port->netdev) -&& port->latest_requested_n_rxq != requested_n_rxq) { -int i, err; - -/* Closes the existing 'rxq's. */ -for (i = 0; i < netdev_n_rxq(port->netdev); i++) { -netdev_rxq_close(port->rxq[i]); -port->rxq[i] = NULL; -} -port->n_rxq = 0; - -/* Sets the new rx queue config. */ -err = netdev_set_multiq(port->netdev, -ovs_numa_get_n_cores() + 1, -requested_n_rxq); -if (err && (err != EOPNOTSUPP)) { -VLOG_ERR("Failed to set dpdk interface %s rx_queue to:" - " %u", netdev_get_name(port->netdev), - requested_n_rxq); -ovs_mutex_unlock(>port_mutex); -return err; -} -port->latest_requested_n_rxq = requested_n_rxq; -/* If the set_multiq() above succeeds, reopens the 'rxq's. */ -port->n_rxq = netdev_n_rxq(port->netdev); -port->rxq = xrealloc(port->rxq, sizeof *port->rxq * port->n_rxq); -for (i = 0; i < port->n_rxq; i++) { -netdev_rxq_open(port->netdev, >rxq[i], i); -} -} -} -/* Reconfigures the cpu mask. */ -ovs_numa_set_cpu_mask(cmask); -free(dp->pmd_cmask); -dp->pmd_cmask = cmask ? xstrdup(cmask) : NULL; - -/* Restores the non-pmd. */ -dp_netdev_set_nonpmd(dp); -/* Restores all pmd threads. */ -dp_netdev_reset_pmd_threads(dp); -ovs_mutex_unlock(>port_mutex); +if (!cmask_equals(dp->requested_pmd_cmask, cmask)) { +free(dp->requested_pmd_cmask); +dp->requested_pmd_cmask = cmask ? 
xstrdup(cmask) : NULL; } return 0; @@ -2634,6 +2595,59 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, } } +static void +reconfigure_pmd_threads(struct dp_netdev *dp) +OVS_REQUIRES(dp->port_mutex) +{ +struct dp_netdev_port *port; + +dp_netdev_destroy_all_pmds(dp); + +HMAP_FOR_EACH (port, node, >ports) { +struct netdev *netdev = port->netdev; +int requested_n_rxq =
[ovs-dev] [PATCH v8 15/16] netdev: Add reconfigure request mechanism.
A netdev provider, especially a PMD provider (like netdev DPDK) might not be able to change some of its parameters (such as MTU, or number of queues) without stopping everything and restarting. This commit introduces a mechanism that allows a netdev provider to request a restart (netdev_request_reconfigure()). The upper layer can be notified via netdev_wait_reconf_required() and netdev_is_reconf_required(). After closing all the rxqs the upper layer can finally call netdev_reconfigure(), to make sure that the new configuration is in place. This will be used by next commit to reconfigure rx and tx queues in netdev-dpdk. Signed-off-by: Daniele Di ProiettoTested-by: Ilya Maximets Acked-by: Ilya Maximets Acked-by: Mark Kavanagh --- lib/netdev-bsd.c | 1 + lib/netdev-dpdk.c | 1 + lib/netdev-dummy.c| 1 + lib/netdev-linux.c| 1 + lib/netdev-provider.h | 27 ++- lib/netdev-vport.c| 1 + lib/netdev.c | 39 +++ lib/netdev.h | 4 8 files changed, 74 insertions(+), 1 deletion(-) diff --git a/lib/netdev-bsd.c b/lib/netdev-bsd.c index 49c05f4..32e8f74 100644 --- a/lib/netdev-bsd.c +++ b/lib/netdev-bsd.c @@ -1536,6 +1536,7 @@ netdev_bsd_update_flags(struct netdev *netdev_, enum netdev_flags off, netdev_bsd_arp_lookup, /* arp_lookup */ \ \ netdev_bsd_update_flags, \ +NULL, /* reconfigure */ \ \ netdev_bsd_rxq_alloc,\ netdev_bsd_rxq_construct,\ diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 208c5f5..c4ff039 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2720,6 +2720,7 @@ static const struct dpdk_qos_ops egress_policer_ops = { NULL, /* arp_lookup */ \ \ netdev_dpdk_update_flags, \ +NULL, /* reconfigure */ \ \ netdev_dpdk_rxq_alloc,\ netdev_dpdk_rxq_construct,\ diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c index 5acb4e1..a001322 100644 --- a/lib/netdev-dummy.c +++ b/lib/netdev-dummy.c @@ -1275,6 +1275,7 @@ static const struct netdev_class dummy_class = { NULL, /* arp_lookup */ netdev_dummy_update_flags, +NULL, /* reconfigure */ netdev_dummy_rxq_alloc, 
netdev_dummy_rxq_construct, diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 2c1ffec..1af08f3 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -2806,6 +2806,7 @@ netdev_linux_update_flags(struct netdev *netdev_, enum netdev_flags off, netdev_linux_arp_lookup,\ \ netdev_linux_update_flags, \ +NULL, /* reconfigure */ \ \ netdev_linux_rxq_alloc, \ netdev_linux_rxq_construct, \ diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h index cda25eb..853fc44 100644 --- a/lib/netdev-provider.h +++ b/lib/netdev-provider.h @@ -52,6 +52,16 @@ struct netdev { * 'netdev''s flags, features, ethernet address, or carrier changes. */ uint64_t change_seq; +/* A netdev provider might be unable to change some of the device's + * parameter (n_rxq, mtu) when the device is in use. In this case + * the provider can notify the upper layer by calling + * netdev_request_reconfigure(). The upper layer will react by stopping + * the operations on the device and calling netdev_reconfigure() to allow + * the configuration changes. 'last_reconfigure_seq' remembers the value + * of 'reconfigure_seq' when the last reconfiguration happened. */ +struct seq *reconfigure_seq; +uint64_t last_reconfigure_seq; + /* The core netdev code initializes these at netdev construction and only * provide read-only access to its client. Netdev implementations may * modify them. */ @@ -64,7 +74,7 @@ struct netdev { struct ovs_list saved_flags_list; /* Contains "struct netdev_saved_flags". */ }; -static void +static inline void netdev_change_seq_changed(const struct netdev *netdev_) { struct netdev *netdev = CONST_CAST(struct netdev *, netdev_); @@ -75,6 +85,12 @@ netdev_change_seq_changed(const struct netdev *netdev_) } }
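The request mechanism this patch adds boils down to comparing two sequence numbers. Below is a stand-alone sketch with a plain `uint64_t` counter in place of OVS's `struct seq` (which also supports waiting); the `fake_netdev` type and these function bodies are illustrative stand-ins for the real `netdev_request_reconfigure()` / `netdev_is_reconf_required()` / `netdev_reconfigure()`:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the reconfigure-request mechanism: the provider bumps
 * 'reconfigure_seq'; the upper layer compares it against
 * 'last_reconfigure_seq' to detect a pending reconfiguration, and
 * synchronizes the two once the change has been applied. */
struct fake_netdev {
    uint64_t reconfigure_seq;
    uint64_t last_reconfigure_seq;
};

static void
netdev_request_reconfigure_model(struct fake_netdev *dev)
{
    dev->reconfigure_seq++;    /* seq_change() in the real code */
}

static int
netdev_is_reconf_required_model(const struct fake_netdev *dev)
{
    return dev->reconfigure_seq != dev->last_reconfigure_seq;
}

static void
netdev_reconfigure_model(struct fake_netdev *dev)
{
    /* The provider's ->reconfigure() callback would run here. */
    dev->last_reconfigure_seq = dev->reconfigure_seq;
}
```

Using a `seq` rather than a bare counter lets the upper layer also sleep until a request arrives, which is what `netdev_wait_reconf_required()` provides.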
[ovs-dev] [PATCH v8 14/16] dpif-netdev: Handle errors in reconfigure_pmd_threads().
Errors returned by netdev_set_multiq() and netdev_rxq_open() weren't handled properly in reconfigure_pmd_threads(). In case of error now we remove the port from the datapath. Also, part of the code is moved in a new function, port_reconfigure(). Signed-off-by: Daniele Di Proietto--- lib/dpif-netdev.c | 78 ++- 1 file changed, 48 insertions(+), 30 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index c905d1d..3a250fd 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -2595,44 +2595,62 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, } } +static int +port_reconfigure(struct dp_netdev_port *port) +{ +struct netdev *netdev = port->netdev; +int requested_n_rxq = netdev_requested_n_rxq(netdev); +int i, err; + +if (!netdev_is_pmd(port->netdev) +|| port->latest_requested_n_rxq != requested_n_rxq) { +return 0; +} + +/* Closes the existing 'rxq's. */ +for (i = 0; i < port->n_rxq; i++) { +netdev_rxq_close(port->rxq[i]); +port->rxq[i] = NULL; +} +port->n_rxq = 0; + +/* Sets the new rx queue config. */ +err = netdev_set_multiq(port->netdev, ovs_numa_get_n_cores() + 1, +requested_n_rxq); +if (err && (err != EOPNOTSUPP)) { +VLOG_ERR("Failed to set dpdk interface %s rx_queue to: %u", + netdev_get_name(port->netdev), requested_n_rxq); +return err; +} +/* If the set_multiq() above succeeds, reopens the 'rxq's. 
*/ +port->rxq = xrealloc(port->rxq, sizeof *port->rxq * netdev_n_rxq(netdev)); +for (i = 0; i < netdev_n_rxq(netdev); i++) { +err = netdev_rxq_open(netdev, >rxq[i], i); +if (err) { +return err; +} +port->n_rxq++; +} + +return 0; +} + static void reconfigure_pmd_threads(struct dp_netdev *dp) OVS_REQUIRES(dp->port_mutex) { -struct dp_netdev_port *port; +struct dp_netdev_port *port, *next; dp_netdev_destroy_all_pmds(dp); -HMAP_FOR_EACH (port, node, >ports) { -struct netdev *netdev = port->netdev; -int requested_n_rxq = netdev_requested_n_rxq(netdev); -if (netdev_is_pmd(port->netdev) -&& port->latest_requested_n_rxq != requested_n_rxq) { -int i, err; +HMAP_FOR_EACH_SAFE (port, next, node, >ports) { +int err; -/* Closes the existing 'rxq's. */ -for (i = 0; i < port->n_rxq; i++) { -netdev_rxq_close(port->rxq[i]); -port->rxq[i] = NULL; -} -port->n_rxq = 0; - -/* Sets the new rx queue config. */ -err = netdev_set_multiq(port->netdev, ovs_numa_get_n_cores() + 1, -requested_n_rxq); -if (err && (err != EOPNOTSUPP)) { -VLOG_ERR("Failed to set dpdk interface %s rx_queue to: %u", - netdev_get_name(port->netdev), - requested_n_rxq); -return; -} -port->latest_requested_n_rxq = requested_n_rxq; -/* If the set_multiq() above succeeds, reopens the 'rxq's. */ -port->n_rxq = netdev_n_rxq(port->netdev); -port->rxq = xrealloc(port->rxq, sizeof *port->rxq * port->n_rxq); -for (i = 0; i < port->n_rxq; i++) { -netdev_rxq_open(port->netdev, >rxq[i], i); -} +err = port_reconfigure(port); +if (err) { +hmap_remove(>ports, >node); +seq_change(dp->port_seq); +port_destroy(port); } } /* Reconfigures the cpu mask. */ -- 2.1.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
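The reason the loop above switches from `HMAP_FOR_EACH` to `HMAP_FOR_EACH_SAFE` is that the error path now removes the current element mid-iteration. The underlying pattern, saving the next pointer before the current node may be freed, can be shown on a plain singly linked list; `struct node` and `reconfigure_all()` are illustrative stand-ins, not OVS code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-alone illustration of "safe" iteration with removal: capture
 * 'next' before the body may unlink and free the current element. */
struct node {
    int fails;          /* pretend result of reconfiguring this port */
    struct node *next;
};

static struct node *
reconfigure_all(struct node *head)
{
    struct node **prevp = &head;
    struct node *n, *next;

    for (n = head; n; n = next) {
        next = n->next;          /* saved before 'n' may be freed */
        if (n->fails) {
            *prevp = next;       /* unlink the failing "port"... */
            free(n);             /* ...and destroy it */
        } else {
            prevp = &n->next;
        }
    }
    return head;
}
```

In the patch, the equivalent of the `fails` branch is `hmap_remove()` plus `port_destroy()`, with a `seq_change()` so other threads notice the port set changed.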
[ovs-dev] [PATCH v8 11/16] ovs-thread: Do not quiesce in ovs_mutex_cond_wait().
ovs_mutex_cond_wait() is used in many functions in dpif-netdev to synchronize with pmd threads, but we can't guarantee that the callers do not hold RCU references, so it's better to avoid quiescing. In system_stats_thread_func() the code relied on ovs_mutex_cond_wait() to introduce a quiescent state, so explicit calls to ovsrcu_quiesce_start() and ovsrcu_quiesce_end() are added there. Signed-off-by: Daniele Di Proietto--- lib/ovs-rcu.h | 3 +-- lib/ovs-thread.c| 2 -- vswitchd/system-stats.c | 6 ++ 3 files changed, 7 insertions(+), 4 deletions(-) diff --git a/lib/ovs-rcu.h b/lib/ovs-rcu.h index 600be0b..8750ead 100644 --- a/lib/ovs-rcu.h +++ b/lib/ovs-rcu.h @@ -42,8 +42,7 @@ * A "quiescent state" is a time at which a thread holds no pointers to memory * that is managed by RCU; that is, when the thread is known not to reference * memory that might be an old version of some object freed via RCU. For - * example, poll_block() includes a quiescent state, as does - * ovs_mutex_cond_wait(). + * example, poll_block() includes a quiescent state. 
* * The following functions manage the recognition of quiescent states: * diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c index 3c065cf..26dd928 100644 --- a/lib/ovs-thread.c +++ b/lib/ovs-thread.c @@ -253,9 +253,7 @@ ovs_mutex_cond_wait(pthread_cond_t *cond, const struct ovs_mutex *mutex_) struct ovs_mutex *mutex = CONST_CAST(struct ovs_mutex *, mutex_); int error; -ovsrcu_quiesce_start(); error = pthread_cond_wait(cond, >lock); -ovsrcu_quiesce_end(); if (OVS_UNLIKELY(error)) { ovs_abort(error, "pthread_cond_wait failed"); diff --git a/vswitchd/system-stats.c b/vswitchd/system-stats.c index df4971e..129f0cf 100644 --- a/vswitchd/system-stats.c +++ b/vswitchd/system-stats.c @@ -37,6 +37,7 @@ #include "json.h" #include "latch.h" #include "openvswitch/ofpbuf.h" +#include "ovs-rcu.h" #include "ovs-thread.h" #include "poll-loop.h" #include "shash.h" @@ -615,7 +616,12 @@ system_stats_thread_func(void *arg OVS_UNUSED) ovs_mutex_lock(); while (!enabled) { +/* The thread is sleeping, potentially for a long time, and it's + * not holding RCU protected references, so it makes sense to + * quiesce */ +ovsrcu_quiesce_start(); ovs_mutex_cond_wait(, ); +ovsrcu_quiesce_end(); } ovs_mutex_unlock(); -- 2.1.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
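Since `ovs_mutex_cond_wait()` no longer quiesces implicitly, a thread that sleeps for a long time must bracket the wait explicitly, as the `system-stats` hunk above does. A stand-alone sketch of that bracketing, with a plain flag standing in for the per-thread RCU state and the wait collapsed to a single step (`quiescing` and all three functions are illustrative stand-ins):

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the explicit quiescent state around a long sleep: the
 * thread declares it holds no RCU-protected pointers before blocking,
 * and leaves the quiescent state before touching them again. */
static bool quiescing;

static void ovsrcu_quiesce_start_model(void) { quiescing = true; }
static void ovsrcu_quiesce_end_model(void)   { quiescing = false; }

static void
wait_for_enable_model(bool *enabled)
{
    while (!*enabled) {
        ovsrcu_quiesce_start_model();  /* safe: no RCU pointers held */
        *enabled = true;               /* stands in for cond_wait() */
        ovsrcu_quiesce_end_model();
    }
}
```

The key invariant is that no RCU-protected pointer obtained before `quiesce_start` is dereferenced after it, since the grace period may elapse while the thread sleeps.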
[ovs-dev] [PATCH v8 10/16] dpif-netdev: Use hmap for ports.
netdev objects are hard to use with RCU, because it's not possible to split removal and reclamation. Postponing the removal means that the port is not removed and cannot be readded immediately. Waiting for reclamation means introducing a quiescent state, and that may introduce subtle bugs, due to the RCU model we use in userspace. This commit changes the port container from cmap to hmap. 'port_mutex' must be held by readers and writers. This shouldn't have performace impact, as readers in the fast path use a thread local cache. Signed-off-by: Daniele Di Proietto--- lib/dpif-netdev.c | 96 +-- 1 file changed, 57 insertions(+), 39 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index bd2249e..8cc37e2 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -195,9 +195,10 @@ struct dp_netdev { /* Ports. * - * Protected by RCU. Take the mutex to add or remove ports. */ + * Any lookup into 'ports' or any access to the dp_netdev_ports found + * through 'ports' requires taking 'port_mutex'. */ struct ovs_mutex port_mutex; -struct cmap ports; +struct hmap ports; struct seq *port_seq; /* Incremented whenever a port changes. */ /* Protects access to ofproto-dpif-upcall interface during revalidator @@ -228,7 +229,8 @@ struct dp_netdev { }; static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp, -odp_port_t); +odp_port_t) +OVS_REQUIRES(>port_mutex); enum dp_stat_type { DP_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */ @@ -248,7 +250,7 @@ enum pmd_cycles_counter_type { struct dp_netdev_port { odp_port_t port_no; struct netdev *netdev; -struct cmap_node node; /* Node in dp_netdev's 'ports'. */ +struct hmap_node node; /* Node in dp_netdev's 'ports'. 
*/ struct netdev_saved_flags *sf; unsigned n_rxq; /* Number of elements in 'rxq' */ struct netdev_rxq **rxq; @@ -476,9 +478,11 @@ struct dpif_netdev { }; static int get_port_by_number(struct dp_netdev *dp, odp_port_t port_no, - struct dp_netdev_port **portp); + struct dp_netdev_port **portp) +OVS_REQUIRES(dp->port_mutex); static int get_port_by_name(struct dp_netdev *dp, const char *devname, -struct dp_netdev_port **portp); +struct dp_netdev_port **portp) +OVS_REQUIRES(dp->port_mutex); static void dp_netdev_free(struct dp_netdev *) OVS_REQUIRES(dp_netdev_mutex); static int do_add_port(struct dp_netdev *dp, const char *devname, @@ -522,7 +526,8 @@ dp_netdev_add_rxq_to_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev_port *port, struct netdev_rxq *rx); static struct dp_netdev_pmd_thread * dp_netdev_less_loaded_pmd_on_numa(struct dp_netdev *dp, int numa_id); -static void dp_netdev_reset_pmd_threads(struct dp_netdev *dp); +static void dp_netdev_reset_pmd_threads(struct dp_netdev *dp) +OVS_REQUIRES(dp->port_mutex); static bool dp_netdev_pmd_try_ref(struct dp_netdev_pmd_thread *pmd); static void dp_netdev_pmd_unref(struct dp_netdev_pmd_thread *pmd); static void dp_netdev_pmd_flow_flush(struct dp_netdev_pmd_thread *pmd); @@ -913,7 +918,7 @@ create_dp_netdev(const char *name, const struct dpif_class *class, atomic_flag_clear(>destroyed); ovs_mutex_init(>port_mutex); -cmap_init(>ports); +hmap_init(>ports); dp->port_seq = seq_create(); fat_rwlock_init(>upcall_rwlock); @@ -984,7 +989,7 @@ static void dp_netdev_free(struct dp_netdev *dp) OVS_REQUIRES(dp_netdev_mutex) { -struct dp_netdev_port *port; +struct dp_netdev_port *port, *next; shash_find_and_delete(_netdevs, dp->name); @@ -993,15 +998,14 @@ dp_netdev_free(struct dp_netdev *dp) ovsthread_key_delete(dp->per_pmd_key); ovs_mutex_lock(>port_mutex); -CMAP_FOR_EACH (port, node, >ports) { -/* PMD threads are destroyed here. 
do_del_port() cannot quiesce */ +HMAP_FOR_EACH_SAFE (port, next, node, >ports) { do_del_port(dp, port); } ovs_mutex_unlock(>port_mutex); cmap_destroy(>poll_threads); seq_destroy(dp->port_seq); -cmap_destroy(>ports); +hmap_destroy(>ports); ovs_mutex_destroy(>port_mutex); /* Upcalls must be disabled at this point */ @@ -1222,7 +1226,7 @@ do_add_port(struct dp_netdev *dp, const char *devname, const char *type, return error; } -cmap_insert(>ports, >node, hash_port_no(port_no)); +hmap_insert(>ports, >node, hash_port_no(port_no)); dp_netdev_add_port_to_pmds(dp, port); seq_change(dp->port_seq); @@ -1288,10 +1292,11 @@ is_valid_port_number(odp_port_t port_no) static struct dp_netdev_port *
[ovs-dev] [PATCH v8 07/16] hmap: Add HMAP_FOR_EACH_POP.
Makes popping each member of the hmap a bit easier. Signed-off-by: Daniele Di Proietto--- lib/cfm.c| 5 ++--- lib/hmap.h | 4 lib/id-pool.c| 5 ++--- lib/learning-switch.c| 5 ++--- lib/netdev-linux.c | 5 ++--- lib/odp-util.c | 7 +++ ofproto/bond.c | 10 -- ofproto/in-band.c| 5 ++--- ofproto/ofproto-dpif-ipfix.c | 5 ++--- ofproto/ofproto-dpif-xlate.c | 5 ++--- ofproto/ofproto.c| 5 ++--- ofproto/pinsched.c | 5 ++--- ovn/controller-vtep/vtep.c | 5 ++--- ovn/controller/encaps.c | 5 ++--- ovn/controller/lport.c | 5 ++--- ovn/controller/ofctrl.c | 5 ++--- ovn/controller/physical.c| 4 +--- ovn/controller/pinctrl.c | 5 ++--- ovn/lib/expr.c | 5 ++--- ovn/northd/ovn-northd.c | 10 -- ovsdb/monitor.c | 5 ++--- ovsdb/row.c | 5 ++--- tests/library.at | 2 +- tests/test-hmap.c| 42 ++ 24 files changed, 93 insertions(+), 71 deletions(-) diff --git a/lib/cfm.c b/lib/cfm.c index cf1f725..fb077de 100644 --- a/lib/cfm.c +++ b/lib/cfm.c @@ -374,7 +374,7 @@ cfm_create(const struct netdev *netdev) OVS_EXCLUDED(mutex) void cfm_unref(struct cfm *cfm) OVS_EXCLUDED(mutex) { -struct remote_mp *rmp, *rmp_next; +struct remote_mp *rmp; if (!cfm) { return; @@ -389,8 +389,7 @@ cfm_unref(struct cfm *cfm) OVS_EXCLUDED(mutex) hmap_remove(all_cfms, >hmap_node); ovs_mutex_unlock(); -HMAP_FOR_EACH_SAFE (rmp, rmp_next, node, >remote_mps) { -hmap_remove(>remote_mps, >node); +HMAP_FOR_EACH_POP (rmp, node, >remote_mps) { free(rmp); } diff --git a/lib/hmap.h b/lib/hmap.h index 53e75cc..08c4719 100644 --- a/lib/hmap.h +++ b/lib/hmap.h @@ -192,6 +192,10 @@ bool hmap_contains(const struct hmap *, const struct hmap_node *); __VA_ARGS__; \ (NODE != OBJECT_CONTAINING(NULL, NODE, MEMBER)) || (NODE = NULL); \ ASSIGN_CONTAINER(NODE, hmap_next(HMAP, &(NODE)->MEMBER), MEMBER)) +#define HMAP_FOR_EACH_POP(NODE, MEMBER, HMAP) \ +while (!hmap_is_empty(HMAP)\ + && (INIT_CONTAINER(NODE, hmap_first(HMAP), MEMBER), 1) \ + && (hmap_remove(HMAP, &(NODE)->MEMBER), 1)) static inline struct hmap_node *hmap_first(const struct hmap *); 
static inline struct hmap_node *hmap_next(const struct hmap *, diff --git a/lib/id-pool.c b/lib/id-pool.c index 6b93d37..f32c008 100644 --- a/lib/id-pool.c +++ b/lib/id-pool.c @@ -69,10 +69,9 @@ id_pool_init(struct id_pool *pool, uint32_t base, uint32_t n_ids) static void id_pool_uninit(struct id_pool *pool) { -struct id_node *id_node, *next; +struct id_node *id_node; -HMAP_FOR_EACH_SAFE(id_node, next, node, >map) { -hmap_remove(>map, _node->node); +HMAP_FOR_EACH_POP(id_node, node, >map) { free(id_node); } diff --git a/lib/learning-switch.c b/lib/learning-switch.c index c69ca4c..b420fe5 100644 --- a/lib/learning-switch.c +++ b/lib/learning-switch.c @@ -269,11 +269,10 @@ void lswitch_destroy(struct lswitch *sw) { if (sw) { -struct lswitch_port *node, *next; +struct lswitch_port *node; rconn_destroy(sw->rconn); -HMAP_FOR_EACH_SAFE (node, next, hmap_node, >queue_numbers) { -hmap_remove(>queue_numbers, >hmap_node); +HMAP_FOR_EACH_POP (node, hmap_node, >queue_numbers) { free(node); } shash_destroy(>queue_names); diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index a7d7ac7..2c1ffec 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -3851,10 +3851,9 @@ static void htb_tc_destroy(struct tc *tc) { struct htb *htb = CONTAINER_OF(tc, struct htb, tc); -struct htb_class *hc, *next; +struct htb_class *hc; -HMAP_FOR_EACH_SAFE (hc, next, tc_queue.hmap_node, >tc.queues) { -hmap_remove(>tc.queues, >tc_queue.hmap_node); +HMAP_FOR_EACH_POP (hc, tc_queue.hmap_node, >tc.queues) { free(hc); } tc_destroy(tc); diff --git a/lib/odp-util.c b/lib/odp-util.c index b4689cc..3c75379 100644 --- a/lib/odp-util.c +++ b/lib/odp-util.c @@ -2081,10 +2081,9 @@ odp_portno_names_get(const struct hmap *portno_names, odp_port_t port_no) void odp_portno_names_destroy(struct hmap *portno_names) { -struct odp_portno_names *odp_portno_names, *odp_portno_names_next; -HMAP_FOR_EACH_SAFE (odp_portno_names, odp_portno_names_next, -hmap_node, portno_names) { -hmap_remove(portno_names, 
_portno_names->hmap_node); +struct odp_portno_names *odp_portno_names; + +HMAP_FOR_EACH_POP (odp_portno_names, hmap_node, portno_names) {
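The pattern behind `HMAP_FOR_EACH_POP` is to detach the first element on every iteration, so the loop body may `free()` it without a `_SAFE` companion variable. A stand-alone analog over a singly linked list (the `LIST_FOR_EACH_POP` macro, `struct item`, and `drain()` are illustrative, not OVS code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Analog of HMAP_FOR_EACH_POP on a singly linked list: each iteration
 * detaches the head, so the body owns NODE outright and may free it. */
struct item {
    int value;
    struct item *next;
};

#define LIST_FOR_EACH_POP(NODE, HEAD)          \
    while ((NODE = (HEAD)) != NULL             \
           && ((HEAD) = (NODE)->next, 1))

static int
drain(struct item **head)
{
    struct item *it;
    int n = 0;

    LIST_FOR_EACH_POP (it, *head) {
        n++;
        free(it);   /* safe: 'it' is already detached from the list */
    }
    return n;
}
```

This is why the patch can replace every `HMAP_FOR_EACH_SAFE` + `hmap_remove()` + `free()` triple with a two-line loop.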
[ovs-dev] [PATCH v8 09/16] hmap: Use struct for hmap_at_position().
The interface will be more similar to the cmap. Signed-off-by: Daniele Di Proietto--- lib/hmap.c | 26 -- lib/hmap.h | 7 ++- lib/sset.c | 12 +--- lib/sset.h | 7 ++- ofproto/ofproto-dpif.c | 8 +++- 5 files changed, 32 insertions(+), 28 deletions(-) diff --git a/lib/hmap.c b/lib/hmap.c index b70ce51..9462c5e 100644 --- a/lib/hmap.c +++ b/lib/hmap.c @@ -236,24 +236,22 @@ hmap_random_node(const struct hmap *hmap) } /* Returns the next node in 'hmap' in hash order, or NULL if no nodes remain in - * 'hmap'. Uses '*bucketp' and '*offsetp' to determine where to begin - * iteration, and stores new values to pass on the next iteration into them - * before returning. + * 'hmap'. Uses '*pos' to determine where to begin iteration, and updates + * '*pos' to pass on the next iteration into them before returning. * * It's better to use plain HMAP_FOR_EACH and related functions, since they are * faster and better at dealing with hmaps that change during iteration. * - * Before beginning iteration, store 0 into '*bucketp' and '*offsetp'. - */ + * Before beginning iteration, set '*pos' to all zeros. 
*/ struct hmap_node * hmap_at_position(const struct hmap *hmap, - uint32_t *bucketp, uint32_t *offsetp) + struct hmap_position *pos) { size_t offset; size_t b_idx; -offset = *offsetp; -for (b_idx = *bucketp; b_idx <= hmap->mask; b_idx++) { +offset = pos->offset; +for (b_idx = pos->bucket; b_idx <= hmap->mask; b_idx++) { struct hmap_node *node; size_t n_idx; @@ -261,11 +259,11 @@ hmap_at_position(const struct hmap *hmap, n_idx++, node = node->next) { if (n_idx == offset) { if (node->next) { -*bucketp = node->hash & hmap->mask; -*offsetp = offset + 1; +pos->bucket = node->hash & hmap->mask; +pos->offset = offset + 1; } else { -*bucketp = (node->hash & hmap->mask) + 1; -*offsetp = 0; +pos->bucket = (node->hash & hmap->mask) + 1; +pos->offset = 0; } return node; } @@ -273,8 +271,8 @@ hmap_at_position(const struct hmap *hmap, offset = 0; } -*bucketp = 0; -*offsetp = 0; +pos->bucket = 0; +pos->offset = 0; return NULL; } diff --git a/lib/hmap.h b/lib/hmap.h index 08c4719..9a96c5f 100644 --- a/lib/hmap.h +++ b/lib/hmap.h @@ -201,8 +201,13 @@ static inline struct hmap_node *hmap_first(const struct hmap *); static inline struct hmap_node *hmap_next(const struct hmap *, const struct hmap_node *); +struct hmap_position { +unsigned int bucket; +unsigned int offset; +}; + struct hmap_node *hmap_at_position(const struct hmap *, - uint32_t *bucket, uint32_t *offset); + struct hmap_position *); /* Returns the number of nodes currently in 'hmap'. */ static inline size_t diff --git a/lib/sset.c b/lib/sset.c index f9d4fc0..4fd3fae 100644 --- a/lib/sset.c +++ b/lib/sset.c @@ -251,21 +251,19 @@ sset_equals(const struct sset *a, const struct sset *b) } /* Returns the next node in 'set' in hash order, or NULL if no nodes remain in - * 'set'. Uses '*bucketp' and '*offsetp' to determine where to begin - * iteration, and stores new values to pass on the next iteration into them - * before returning. + * 'set'. 
Uses '*pos' to determine where to begin iteration, and updates + * '*pos' to pass on the next iteration into them before returning. * * It's better to use plain SSET_FOR_EACH and related functions, since they are * faster and better at dealing with ssets that change during iteration. * - * Before beginning iteration, store 0 into '*bucketp' and '*offsetp'. - */ + * Before beginning iteration, set '*pos' to all zeros. */ struct sset_node * -sset_at_position(const struct sset *set, uint32_t *bucketp, uint32_t *offsetp) +sset_at_position(const struct sset *set, struct sset_position *pos) { struct hmap_node *hmap_node; -hmap_node = hmap_at_position(>map, bucketp, offsetp); +hmap_node = hmap_at_position(>map, >pos); return SSET_NODE_FROM_HMAP_NODE(hmap_node); } diff --git a/lib/sset.h b/lib/sset.h index 7d1d496..9c2f703 100644 --- a/lib/sset.h +++ b/lib/sset.h @@ -64,8 +64,13 @@ char *sset_pop(struct sset *); struct sset_node *sset_find(const struct sset *, const char *); bool sset_contains(const struct sset *, const char *); bool sset_equals(const struct sset *, const struct sset *); + +struct sset_position { +struct hmap_position pos; +}; + struct sset_node *sset_at_position(const struct sset *, - uint32_t *bucketp, uint32_t
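The interface change, bundling the bucket/offset pair into a `struct hmap_position` so callers can resume iteration with one opaque value (as `cmap_position` already does), can be sketched on a plain array; `struct array_position` and `array_at_position()` are illustrative stand-ins:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of position-struct iteration: the caller zero-initializes
 * the position, calls repeatedly until NULL, and the position resets
 * itself at the end, just like hmap_at_position(). */
struct array_position {
    size_t offset;    /* next index to visit */
};

static const int *
array_at_position(const int *arr, size_t n, struct array_position *pos)
{
    if (pos->offset >= n) {
        pos->offset = 0;    /* wrap to the start, like the real API */
        return NULL;
    }
    return &arr[pos->offset++];
}
```

A real hash map needs two fields (bucket and offset within the bucket's chain); hiding them in a struct means callers no longer pass two bare `uint32_t *` arguments in the right order.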
[ovs-dev] [PATCH v8 08/16] dpif-netdev: Add pmd thread local port cache for transmission.
A future commit will stop using RCU for 'dp->ports' and use a mutex for reading/writing them. To avoid taking a mutex in dp_execute_cb(), which is called in the fast path, this commit introduces a pmd thread local cache of ports. The downside is that every port add/remove now needs to synchronize with every pmd thread. Among the advantages, keeping a per thread port mapping could allow greater control over the txq assigment. Signed-off-by: Daniele Di Proietto--- lib/dpif-netdev.c | 249 +++--- 1 file changed, 179 insertions(+), 70 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index cedaf39..bd2249e 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -184,6 +184,7 @@ static bool dpcls_lookup(const struct dpcls *cls, * *dp_netdev_mutex (global) *port_mutex + *non_pmd_mutex */ struct dp_netdev { const struct dpif_class *const class; @@ -379,6 +380,13 @@ struct rxq_poll { struct ovs_list node; }; +/* Contained by struct dp_netdev_pmd_thread's 'port_cache' or 'tx_ports'. */ +struct tx_port { +odp_port_t port_no; +struct netdev *netdev; +struct hmap_node node; +}; + /* PMD: Poll modes drivers. PMD accesses devices via polling to eliminate * the performance overhead of interrupt processing. Therefore netdev can * not implement rx-wait for these devices. dpif-netdev needs to poll @@ -405,8 +413,8 @@ struct dp_netdev_pmd_thread { /* Per thread exact-match cache. Note, the instance for cpu core * NON_PMD_CORE_ID can be accessed by multiple threads, and thusly - * need to be protected (e.g. by 'dp_netdev_mutex'). All other - * instances will only be accessed by its own pmd thread. */ + * need to be protected by 'non_pmd_mutex'. Every other instance + * will only be accessed by its own pmd thread. */ struct emc_cache flow_cache; /* Classifier and Flow-Table. @@ -435,10 +443,20 @@ struct dp_netdev_pmd_thread { atomic_int tx_qid; /* Queue id used by this pmd thread to * send packets on all netdevs */ -struct ovs_mutex poll_mutex;/* Mutex for poll_list. 
*/ +struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */ /* List of rx queues to poll. */ struct ovs_list poll_list OVS_GUARDED; -int poll_cnt; /* Number of elemints in poll_list. */ +/* Number of elements in 'poll_list' */ +int poll_cnt; +/* Map of 'tx_port's used for transmission. Written by the main thread, + * read by the pmd thread. */ +struct hmap tx_ports OVS_GUARDED; + +/* Map of 'tx_port' used in the fast path. This is a thread-local copy of + * 'tx_ports'. The instance for cpu core NON_PMD_CORE_ID can be accessed + * by multiple threads, and thusly need to be protected by 'non_pmd_mutex'. + * Every other instance will only be accessed by its own pmd thread. */ +struct hmap port_cache; /* Only a pmd thread can write on its own 'cycles' and 'stats'. * The main thread keeps 'stats_zero' and 'cycles_zero' as base @@ -494,7 +512,7 @@ dp_netdev_pmd_get_next(struct dp_netdev *dp, struct cmap_position *pos); static void dp_netdev_destroy_all_pmds(struct dp_netdev *dp); static void dp_netdev_del_pmds_on_numa(struct dp_netdev *dp, int numa_id); static void dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int numa_id); -static void dp_netdev_pmd_clear_poll_list(struct dp_netdev_pmd_thread *pmd); +static void dp_netdev_pmd_clear_ports(struct dp_netdev_pmd_thread *pmd); static void dp_netdev_del_port_from_all_pmds(struct dp_netdev *dp, struct dp_netdev_port *port); static void @@ -508,6 +526,8 @@ static void dp_netdev_reset_pmd_threads(struct dp_netdev *dp); static bool dp_netdev_pmd_try_ref(struct dp_netdev_pmd_thread *pmd); static void dp_netdev_pmd_unref(struct dp_netdev_pmd_thread *pmd); static void dp_netdev_pmd_flow_flush(struct dp_netdev_pmd_thread *pmd); +static void pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd) +OVS_REQUIRES(pmd->port_mutex); static inline bool emc_entry_alive(struct emc_entry *ce); static void emc_clear_entry(struct emc_entry *ce); @@ -690,7 +710,7 @@ pmd_info_show_rxq(struct ds *reply, struct 
dp_netdev_pmd_thread *pmd) ds_put_format(reply, "pmd thread numa_id %d core_id %u:\n", pmd->numa_id, pmd->core_id); -ovs_mutex_lock(>poll_mutex); +ovs_mutex_lock(>port_mutex); LIST_FOR_EACH (poll, node, >poll_list) { const char *name = netdev_get_name(poll->port->netdev); @@ -704,7 +724,7 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd) ds_put_format(reply, " %d", netdev_rxq_get_queue_id(poll->rx)); prev_name = name; } -ovs_mutex_unlock(>poll_mutex); +
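The core idea of this patch — an authoritative `tx_ports` map written by the main thread under a mutex, mirrored into a private `port_cache` that the pmd thread reads lock-free in the fast path — can be sketched with plain arrays standing in for OVS's hmap (simplified types, not the real structures):

```c
#include <assert.h>

#define MAX_PORTS 8

struct port_map { int port_no[MAX_PORTS]; int n; };

struct pmd {
    struct port_map tx_ports;    /* Written by the main thread, under mutex. */
    struct port_map port_cache;  /* Private fast-path copy, read lock-free.  */
};

/* Main thread: publish a new port.  The pmd thread must be asked to
 * reload before the change is visible in its fast path. */
void add_tx_port(struct pmd *pmd, int port_no)
{
    pmd->tx_ports.port_no[pmd->tx_ports.n++] = port_no;
}

/* Pmd thread, on reload: refresh the private copy from 'tx_ports'
 * (the role of pmd_load_cached_ports() in the patch). */
void pmd_load_cached_ports(struct pmd *pmd)
{
    pmd->port_cache = pmd->tx_ports;
}

/* Fast path: lookups touch only the thread-local cache, so no mutex is
 * taken in dp_execute_cb(). */
int cached_port_exists(const struct pmd *pmd, int port_no)
{
    for (int i = 0; i < pmd->port_cache.n; i++) {
        if (pmd->port_cache.port_no[i] == port_no) {
            return 1;
        }
    }
    return 0;
}
```

This shows the stated trade-off: port add/remove now needs a synchronization step (the reload) with every pmd thread, in exchange for a lock-free transmit path.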
[ovs-dev] [PATCH v8 06/16] dpif-netdev: Remove duplicate code in dp_netdev_set_pmds_on_numa().
Instead of duplicating code to add ports in dp_netdev_set_pmds_on_numa(), we can always use dp_netdev_add_port_to_pmds__(). Signed-off-by: Daniele Di Proietto--- lib/dpif-netdev.c | 58 +-- 1 file changed, 22 insertions(+), 36 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index fbd23cf..cedaf39 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -3113,13 +3113,12 @@ dp_netdev_add_port_to_pmds__(struct dp_netdev *dp, struct dp_netdev_port *port, /* Cannot create pmd threads for invalid numa node. */ ovs_assert(ovs_numa_numa_id_is_valid(numa_id)); +dp_netdev_set_pmds_on_numa(dp, numa_id); for (i = 0; i < port->n_rxq; i++) { pmd = dp_netdev_less_loaded_pmd_on_numa(dp, numa_id); if (!pmd) { -/* There is no pmd threads on this numa node. */ -dp_netdev_set_pmds_on_numa(dp, numa_id); -/* Assigning of rx queues done. */ +VLOG_WARN("There's no pmd thread on numa node %d", numa_id); break; } @@ -3158,9 +3157,9 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int numa_id) int n_pmds; if (!ovs_numa_numa_id_is_valid(numa_id)) { -VLOG_ERR("Cannot create pmd threads due to numa id (%d)" - "invalid", numa_id); -return ; +VLOG_WARN("Cannot create pmd threads due to numa id (%d) invalid", + numa_id); +return; } n_pmds = get_n_pmd_threads_on_numa(dp, numa_id); @@ -3169,46 +3168,25 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int numa_id) * in which 'netdev' is on, do nothing. Else, creates the * pmd threads for the numa node. 
*/ if (!n_pmds) { -int can_have, n_unpinned, i, index = 0; -struct dp_netdev_pmd_thread **pmds; -struct dp_netdev_port *port; +int can_have, n_unpinned, i; n_unpinned = ovs_numa_get_n_unpinned_cores_on_numa(numa_id); if (!n_unpinned) { -VLOG_ERR("Cannot create pmd threads due to out of unpinned " - "cores on numa node %d", numa_id); +VLOG_WARN("Cannot create pmd threads due to out of unpinned " + "cores on numa node %d", numa_id); return; } /* If cpu mask is specified, uses all unpinned cores, otherwise * tries creating NR_PMD_THREADS pmd threads. */ can_have = dp->pmd_cmask ? n_unpinned : MIN(n_unpinned, NR_PMD_THREADS); -pmds = xzalloc(can_have * sizeof *pmds); for (i = 0; i < can_have; i++) { unsigned core_id = ovs_numa_get_unpinned_core_on_numa(numa_id); -pmds[i] = xzalloc(sizeof **pmds); -dp_netdev_configure_pmd(pmds[i], dp, core_id, numa_id); -} +struct dp_netdev_pmd_thread *pmd = xzalloc(sizeof *pmd); -/* Distributes rx queues of this numa node between new pmd threads. */ -CMAP_FOR_EACH (port, node, >ports) { -if (netdev_is_pmd(port->netdev) -&& netdev_get_numa_id(port->netdev) == numa_id) { -for (i = 0; i < port->n_rxq; i++) { -/* Make thread-safety analyser happy. */ -ovs_mutex_lock([index]->poll_mutex); -dp_netdev_add_rxq_to_pmd(pmds[index], port, port->rxq[i]); -ovs_mutex_unlock([index]->poll_mutex); -index = (index + 1) % can_have; -} -} -} - -/* Actual start of pmd threads. 
*/ -for (i = 0; i < can_have; i++) { -pmds[i]->thread = ovs_thread_create("pmd", pmd_thread_main, pmds[i]); +dp_netdev_configure_pmd(pmd, dp, core_id, numa_id); +pmd->thread = ovs_thread_create("pmd", pmd_thread_main, pmd); } -free(pmds); VLOG_INFO("Created %d pmd threads on numa node %d", can_have, numa_id); } } @@ -3220,14 +3198,22 @@ static void dp_netdev_reset_pmd_threads(struct dp_netdev *dp) { struct dp_netdev_port *port; +struct hmapx to_reload = HMAPX_INITIALIZER(_reload); +struct hmapx_node *node; CMAP_FOR_EACH (port, node, >ports) { if (netdev_is_pmd(port->netdev)) { -int numa_id = netdev_get_numa_id(port->netdev); - -dp_netdev_set_pmds_on_numa(dp, numa_id); +dp_netdev_add_port_to_pmds__(dp, port, _reload); } } + +HMAPX_FOR_EACH (node, _reload) { +struct dp_netdev_pmd_thread *pmd; +pmd = (struct dp_netdev_pmd_thread *) node->data; +dp_netdev_reload_pmd__(pmd); +} + +hmapx_destroy(_reload); } static char * -- 2.1.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
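The rewritten `dp_netdev_reset_pmd_threads()` above collects every pmd thread that needs a restart in an `hmapx` set and reloads each exactly once, even when one thread polls queues from several ports. A minimal stand-alone sketch of that deduplication (a tiny pointer set stands in for OVS's `hmapx`; not OVS code):

```c
#include <assert.h>
#include <stddef.h>

struct pmd { int reloads; };     /* 'reloads' counts reload calls. */

#define MAX_SET 8
struct pmd_set { struct pmd *items[MAX_SET]; size_t n; };

void pmd_set_add(struct pmd_set *set, struct pmd *pmd)
{
    for (size_t i = 0; i < set->n; i++) {
        if (set->items[i] == pmd) {
            return;              /* Already queued: no duplicate reload. */
        }
    }
    set->items[set->n++] = pmd;
}

/* Collect the pmds touched by each port, then reload each member once. */
int demo(void)
{
    struct pmd a = { 0 }, b = { 0 };
    struct pmd_set to_reload = { 0 };

    pmd_set_add(&to_reload, &a);     /* Port 1's queues went to 'a'. */
    pmd_set_add(&to_reload, &b);     /* Port 2's queues went to 'b'. */
    pmd_set_add(&to_reload, &a);     /* Port 3 also uses 'a'.        */

    for (size_t i = 0; i < to_reload.n; i++) {
        to_reload.items[i]->reloads++;   /* dp_netdev_reload_pmd__() here. */
    }
    return a.reloads == 1 && b.reloads == 1;
}
```

Reloading a pmd thread is expensive (it stops polling and resynchronizes with the main thread), so batching the reloads after all ports are processed is the point of the `to_reload` set.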
[ovs-dev] [PATCH v8 05/16] dpif-netdev: Fix race condition in pmd thread initialization.
The pmds and the main threads are synchronized using a condition variable. The main thread writes a new configuration, then it waits on the condition variable. A pmd thread reads the new configuration, then it calls signal() on the condition variable. To make sure that the pmds and the main thread have a consistent view, each signal() should be backed by a wait(). Currently the first signal() doesn't have a corresponding wait(). If the pmd thread takes a long time to start and the signal() is received by a later wait, the threads will have an inconsistent view. The commit fixes the problem by removing the first signal() from the pmd thread. This is hardly a problem on current master, because the main thread will call the first wait() a long time after the creation of a pmd thread. It becomes a problem with the next commits. Signed-off-by: Daniele Di Proietto--- lib/dpif-netdev.c | 25 - 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index c3be4eb..fbd23cf 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -2651,21 +2651,22 @@ dpif_netdev_wait(struct dpif *dpif) static int pmd_load_queues(struct dp_netdev_pmd_thread *pmd, struct rxq_poll **ppoll_list) -OVS_REQUIRES(pmd->poll_mutex) { struct rxq_poll *poll_list = *ppoll_list; struct rxq_poll *poll; int i; +ovs_mutex_lock(>poll_mutex); poll_list = xrealloc(poll_list, pmd->poll_cnt * sizeof *poll_list); i = 0; LIST_FOR_EACH (poll, node, >poll_list) { poll_list[i++] = *poll; } +ovs_mutex_unlock(>poll_mutex); *ppoll_list = poll_list; -return pmd->poll_cnt; +return i; } static void * @@ -2675,6 +2676,7 @@ pmd_thread_main(void *f_) unsigned int lc = 0; struct rxq_poll *poll_list; unsigned int port_seq = PMD_INITIAL_SEQ; +bool exiting; int poll_cnt; int i; @@ -2684,13 +2686,10 @@ pmd_thread_main(void *f_) /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. 
*/ ovsthread_setspecific(pmd->dp->per_pmd_key, pmd); pmd_thread_setaffinity_cpu(pmd->core_id); +poll_cnt = pmd_load_queues(pmd, _list); reload: emc_cache_init(>flow_cache); -ovs_mutex_lock(>poll_mutex); -poll_cnt = pmd_load_queues(pmd, _list); -ovs_mutex_unlock(>poll_mutex); - /* List port/core affinity */ for (i = 0; i < poll_cnt; i++) { VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n", @@ -2698,10 +2697,6 @@ reload: netdev_rxq_get_queue_id(poll_list[i].rx)); } -/* Signal here to make sure the pmd finishes - * reloading the updated configuration. */ -dp_netdev_pmd_reload_done(pmd); - for (;;) { for (i = 0; i < poll_cnt; i++) { dp_netdev_process_rxq_port(pmd, poll_list[i].port, poll_list[i].rx); @@ -2724,14 +2719,18 @@ reload: } } +poll_cnt = pmd_load_queues(pmd, _list); +exiting = latch_is_set(>exit_latch); +/* Signal here to make sure the pmd finishes + * reloading the updated configuration. */ +dp_netdev_pmd_reload_done(pmd); + emc_cache_uninit(>flow_cache); -if (!latch_is_set(>exit_latch)){ +if (!exiting) { goto reload; } -dp_netdev_pmd_reload_done(pmd); - free(poll_list); return NULL; } -- 2.1.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
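The invariant this commit restores — every `signal()` on the condition variable must correspond to a `wait()` by the main thread, so the pmd never acknowledges a reload nobody requested — can be shown with a minimal pthread handshake. This is not OVS code, just a sketch of the pairing:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct handshake {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool reload_requested;
    bool reload_done;
};

static void *worker_main(void *hs_)
{
    struct handshake *hs = hs_;

    for (;;) {
        pthread_mutex_lock(&hs->mutex);
        if (hs->reload_requested) {
            /* Acknowledge only a requested reload -- never an
             * unsolicited signal() like the one the patch removes. */
            hs->reload_requested = false;
            hs->reload_done = true;
            pthread_cond_signal(&hs->cond);
            pthread_mutex_unlock(&hs->mutex);
            return NULL;
        }
        pthread_mutex_unlock(&hs->mutex);
    }
}

int demo(void)
{
    struct handshake hs = {
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, false
    };
    pthread_t worker;

    pthread_create(&worker, NULL, worker_main, &hs);

    pthread_mutex_lock(&hs.mutex);
    hs.reload_requested = true;       /* Publish the new configuration.  */
    while (!hs.reload_done) {         /* Exactly one wait() per signal(). */
        pthread_cond_wait(&hs.cond, &hs.mutex);
    }
    pthread_mutex_unlock(&hs.mutex);

    pthread_join(worker, NULL);
    return hs.reload_done;
}
```

With an unpaired first `signal()`, a slow-starting worker could satisfy a *later* `wait()`, leaving the main thread believing a reload completed that never happened — the inconsistency described in the commit message.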
[ovs-dev] [PATCH v8 03/16] dpif-netdev: Factor out port_create() from do_add_port().
Instead of performing every operation inside do_port_add() it seems clearer to introduce port_create(), since we already have port_destroy(). No functional change. Signed-off-by: Daniele Di Proietto--- lib/dpif-netdev.c | 69 ++- 1 file changed, 43 insertions(+), 26 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 060f5e0..a224b43 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -1095,29 +1095,22 @@ hash_port_no(odp_port_t port_no) } static int -do_add_port(struct dp_netdev *dp, const char *devname, const char *type, -odp_port_t port_no) -OVS_REQUIRES(dp->port_mutex) +port_create(const char *devname, const char *open_type, const char *type, +odp_port_t port_no, struct dp_netdev_port **portp) { struct netdev_saved_flags *sf; struct dp_netdev_port *port; -struct netdev *netdev; enum netdev_flags flags; -const char *open_type; -int error = 0; -int i, n_open_rxqs = 0; +struct netdev *netdev; +int n_open_rxqs = 0; +int i, error; -/* Reject devices already in 'dp'. */ -if (!get_port_by_name(dp, devname, )) { -error = EEXIST; -goto out; -} +*portp = NULL; /* Open and validate network device. 
*/ -open_type = dpif_netdev_port_open_type(dp->class, type); error = netdev_open(devname, open_type, ); if (error) { -goto out; +return error; } /* XXX reject non-Ethernet devices */ @@ -1125,7 +1118,7 @@ do_add_port(struct dp_netdev *dp, const char *devname, const char *type, if (flags & NETDEV_LOOPBACK) { VLOG_ERR("%s: cannot add a loopback device", devname); error = EINVAL; -goto out_close; +goto out; } if (netdev_is_pmd(netdev)) { @@ -1134,7 +1127,7 @@ do_add_port(struct dp_netdev *dp, const char *devname, const char *type, if (n_cores == OVS_CORE_UNSPEC) { VLOG_ERR("%s, cannot get cpu core info", devname); error = ENOENT; -goto out_close; +goto out; } /* There can only be ovs_numa_get_n_cores() pmd threads, * so creates a txq for each, and one extra for the non @@ -1143,14 +1136,14 @@ do_add_port(struct dp_netdev *dp, const char *devname, const char *type, netdev_requested_n_rxq(netdev)); if (error && (error != EOPNOTSUPP)) { VLOG_ERR("%s, cannot set multiq", devname); -goto out_close; +goto out; } } port = xzalloc(sizeof *port); port->port_no = port_no; port->netdev = netdev; port->n_rxq = netdev_n_rxq(netdev); -port->rxq = xmalloc(sizeof *port->rxq * port->n_rxq); +port->rxq = xcalloc(port->n_rxq, sizeof *port->rxq); port->type = xstrdup(type); port->latest_requested_n_rxq = netdev_requested_n_rxq(netdev); @@ -1170,12 +1163,7 @@ do_add_port(struct dp_netdev *dp, const char *devname, const char *type, } port->sf = sf; -cmap_insert(>ports, >node, hash_port_no(port_no)); - -if (netdev_is_pmd(netdev)) { -dp_netdev_add_port_to_pmds(dp, port); -} -seq_change(dp->port_seq); +*portp = port; return 0; @@ -1186,13 +1174,42 @@ out_rxq_close: free(port->type); free(port->rxq); free(port); -out_close: -netdev_close(netdev); + out: +netdev_close(netdev); return error; } static int +do_add_port(struct dp_netdev *dp, const char *devname, const char *type, +odp_port_t port_no) +OVS_REQUIRES(dp->port_mutex) +{ +struct dp_netdev_port *port; +int error; + +/* Reject devices 
already in 'dp'. */ +if (!get_port_by_name(dp, devname, )) { +return EEXIST; +} + +error = port_create(devname, dpif_netdev_port_open_type(dp->class, type), +type, port_no, ); +if (error) { +return error; +} + +cmap_insert(>ports, >node, hash_port_no(port_no)); + +if (netdev_is_pmd(port->netdev)) { +dp_netdev_add_port_to_pmds(dp, port); +} +seq_change(dp->port_seq); + +return 0; +} + +static int dpif_netdev_port_add(struct dpif *dpif, struct netdev *netdev, odp_port_t *port_nop) { -- 2.1.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
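The shape of the refactoring — `port_create()` only builds and validates the object, while `do_add_port()` checks for duplicates and publishes it — is a common C pattern. A hedged stand-alone sketch with simplified stand-in types (the real code deals with netdevs, rxqs, and saved flags):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 16
#define MAX_PORTS 4

struct port { char name[NAME_MAX_LEN]; };
struct dp { struct port *ports[MAX_PORTS]; };

/* Like port_create(): build and validate only; on failure the caller
 * sees NULL and nothing has been published anywhere. */
int port_create(const char *name, struct port **portp)
{
    *portp = NULL;
    if (strlen(name) >= NAME_MAX_LEN) {
        return EINVAL;              /* Validation failure. */
    }
    struct port *port = calloc(1, sizeof *port);
    strcpy(port->name, name);
    *portp = port;
    return 0;
}

/* Like do_add_port(): duplicate check, then create, then publish. */
int do_add_port(struct dp *dp, const char *name)
{
    for (int i = 0; i < MAX_PORTS; i++) {
        if (dp->ports[i] && !strcmp(dp->ports[i]->name, name)) {
            return EEXIST;          /* Reject devices already in 'dp'. */
        }
    }

    struct port *port;
    int error = port_create(name, &port);
    if (error) {
        return error;
    }

    for (int i = 0; i < MAX_PORTS; i++) {
        if (!dp->ports[i]) {
            dp->ports[i] = port;    /* Publish only after success. */
            return 0;
        }
    }
    free(port);
    return ENOSPC;
}
```

Keeping creation separate from publication mirrors the existing `port_destroy()` and makes the error paths symmetric: `port_create()` cleans up everything it allocated, so `do_add_port()` has nothing to unwind.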
[ovs-dev] [PATCH v8 04/16] dpif-netdev: Add functions to modify rxq without reloading pmd threads.
This commit introduces some functions to add/remove rxqs from pmd threads without reloading them. They will be used by next commits. Signed-off-by: Daniele Di Proietto--- lib/dpif-netdev.c | 77 --- 1 file changed, 56 insertions(+), 21 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index a224b43..c3be4eb 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -495,8 +495,6 @@ static void dp_netdev_destroy_all_pmds(struct dp_netdev *dp); static void dp_netdev_del_pmds_on_numa(struct dp_netdev *dp, int numa_id); static void dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int numa_id); static void dp_netdev_pmd_clear_poll_list(struct dp_netdev_pmd_thread *pmd); -static void dp_netdev_del_port_from_pmd(struct dp_netdev_port *port, -struct dp_netdev_pmd_thread *pmd); static void dp_netdev_del_port_from_all_pmds(struct dp_netdev *dp, struct dp_netdev_port *port); static void @@ -3002,11 +3000,11 @@ dp_netdev_pmd_clear_poll_list(struct dp_netdev_pmd_thread *pmd) ovs_mutex_unlock(>poll_mutex); } -/* Deletes all rx queues of 'port' from poll_list of pmd thread and - * reloads it if poll_list was changed. */ -static void -dp_netdev_del_port_from_pmd(struct dp_netdev_port *port, -struct dp_netdev_pmd_thread *pmd) +/* Deletes all rx queues of 'port' from poll_list of pmd thread. Returns true + * if 'port' was found in 'pmd' (therefore a restart is required). */ +static bool +dp_netdev_del_port_from_pmd__(struct dp_netdev_port *port, + struct dp_netdev_pmd_thread *pmd) { struct rxq_poll *poll, *next; bool found = false; @@ -3021,8 +3019,30 @@ dp_netdev_del_port_from_pmd(struct dp_netdev_port *port, } } ovs_mutex_unlock(>poll_mutex); -if (found) { -dp_netdev_reload_pmd__(pmd); + +return found; +} + +/* Deletes all rx queues of 'port' from all pmd threads. The pmd threads that + * need to be restarted are inserted in 'to_reload'. 
*/ +static void +dp_netdev_del_port_from_all_pmds__(struct dp_netdev *dp, + struct dp_netdev_port *port, + struct hmapx *to_reload) +{ +int numa_id = netdev_get_numa_id(port->netdev); +struct dp_netdev_pmd_thread *pmd; + +CMAP_FOR_EACH (pmd, node, >poll_threads) { +if (pmd->numa_id == numa_id) { +bool found; + +found = dp_netdev_del_port_from_pmd__(port, pmd); + +if (found) { +hmapx_add(to_reload, pmd); +} + } } } @@ -3032,16 +3052,21 @@ static void dp_netdev_del_port_from_all_pmds(struct dp_netdev *dp, struct dp_netdev_port *port) { -int numa_id = netdev_get_numa_id(port->netdev); struct dp_netdev_pmd_thread *pmd; +struct hmapx to_reload = HMAPX_INITIALIZER(_reload); +struct hmapx_node *node; -CMAP_FOR_EACH (pmd, node, >poll_threads) { -if (pmd->numa_id == numa_id) { -dp_netdev_del_port_from_pmd(port, pmd); - } +dp_netdev_del_port_from_all_pmds__(dp, port, _reload); + +HMAPX_FOR_EACH (node, _reload) { +pmd = (struct dp_netdev_pmd_thread *) node->data; +dp_netdev_reload_pmd__(pmd); } + +hmapx_destroy(_reload); } + /* Returns PMD thread from this numa node with fewer rx queues to poll. * Returns NULL if there is no PMD threads on this numa node. * Can be called safely only by main thread. */ @@ -3077,18 +3102,16 @@ dp_netdev_add_rxq_to_pmd(struct dp_netdev_pmd_thread *pmd, pmd->poll_cnt++; } -/* Distributes all rx queues of 'port' between all PMD threads and reloads - * them if needed. */ +/* Distributes all rx queues of 'port' between all PMD threads in 'dp'. The + * pmd threads that need to be restarted are inserted in 'to_reload'. */ static void -dp_netdev_add_port_to_pmds(struct dp_netdev *dp, struct dp_netdev_port *port) +dp_netdev_add_port_to_pmds__(struct dp_netdev *dp, struct dp_netdev_port *port, + struct hmapx *to_reload) { int numa_id = netdev_get_numa_id(port->netdev); struct dp_netdev_pmd_thread *pmd; -struct hmapx to_reload; -struct hmapx_node *node; int i; -hmapx_init(_reload); /* Cannot create pmd threads for invalid numa node. 
*/ ovs_assert(ovs_numa_numa_id_is_valid(numa_id)); @@ -3105,8 +3128,20 @@ dp_netdev_add_port_to_pmds(struct dp_netdev *dp, struct dp_netdev_port *port) dp_netdev_add_rxq_to_pmd(pmd, port, port->rxq[i]); ovs_mutex_unlock(>poll_mutex); -hmapx_add(_reload, pmd); +hmapx_add(to_reload, pmd); } +} + +/* Distributes all rx queues of 'port' between all PMD threads in 'dp' and + * reloads them, if needed. */ +static void +dp_netdev_add_port_to_pmds(struct
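`dp_netdev_add_port_to_pmds__()` hands each rx queue of the port to `dp_netdev_less_loaded_pmd_on_numa()`, i.e. to the thread currently polling the fewest queues. A toy model of that placement, with plain counters standing in for the real cmap of pmd threads:

```c
#include <assert.h>

#define N_PMDS 3

/* Index of the pmd thread currently polling the fewest queues
 * (the role of dp_netdev_less_loaded_pmd_on_numa()). */
int least_loaded(const int poll_cnt[N_PMDS])
{
    int best = 0;
    for (int i = 1; i < N_PMDS; i++) {
        if (poll_cnt[i] < poll_cnt[best]) {
            best = i;
        }
    }
    return best;
}

/* Give each of the port's 'n_rxq' queues to the least loaded thread,
 * one queue at a time, so the load stays balanced as it grows. */
void assign_rxqs(int poll_cnt[N_PMDS], int n_rxq)
{
    for (int q = 0; q < n_rxq; q++) {
        poll_cnt[least_loaded(poll_cnt)]++;
    }
}
```

Re-evaluating the load after every queue (rather than round-robining blindly) is what lets a port with many queues fill in the gaps left by previously assigned ports.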
[ovs-dev] [PATCH v8 01/16] dpif-netdev: Destroy 'port_mutex' in dp_netdev_free().
Found by inspection. Signed-off-by: Daniele Di Proietto--- lib/dpif-netdev.c | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 1e8a37c..24717cc 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -986,6 +986,7 @@ dp_netdev_free(struct dp_netdev *dp) seq_destroy(dp->port_seq); cmap_destroy(>ports); +ovs_mutex_destroy(>port_mutex); /* Upcalls must be disabled at this point */ dp_netdev_destroy_upcall_lock(dp); -- 2.1.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v8 00/16] Reconfigure netdev at runtime
Currently we treat set_multiq() calls specially in netdev and dpif-netdev: every pmd thread must be stopped and set_multiq() is allowed to destroy and recreate the device. I think we can improve this by: * Generalizing the mechanism to allow changing other parameters at runtime (such as MTU). * Involving less the above layer (dpif-netdev). The request for changes often comes from below (netdev_dpdk_set_config(), or the vhost new_device() callback). There's no need for dpif-netdev to remember the requested value, all that it needs to know is that a configuration change is requested. This series implements exactly this: a mechanism to allow a netdev provider to request configuration changes, to which dpif-netdev will respond by stopping rx/tx and calling a netdev function to appy the new configuration. The new mechanism is used in this series to replace the set_multiq() call, but the idea is to use it also at least for: * Changing the MTU at runtime * Automatically detecting the number of rx queues for a vhost-user device * Move a DPDK vhost device to the proper NUMA socket The first commits refactor some code in dpif-netdev and, most importantly avoid using RCU for ports. Each thread will have its local copy of all the ports in the datapath. The series is also available here: https://github.com/ddiproietto/ovs/tree/configchangesv8 v8: * Update comment in rcu.h: ovs_mutex_cond_wait doesn't quiesce. * Change 'set_multiq' to 'set_tx_multiq'. * Added documentation in comments and commit messages explaining thread local port cache. * Fixed style issues reported by checkpatch.py. * Fixed race condition when deleting pmd thread. v7: * Dropped already applied patches. * Stop using RCU for ports. * Rebased against master. v6: * Rebased against master. * Check return value of netdev_rxq_open(). * Fix comment. v5: * Style fixes. * Fixed a bug in dp_netdev_free() in patch 6. 
v4: * Added another patch to uniform names of variables in netdev-dpdk (no functional change) * Update some netdev comments to document the relation between netdev_set_multiq() and netdev_reconfigure() * Clarify that when netdev_reconfigure() is called no call to netdev_send() or netdev_rxq_recv() must be issued. * Move check to skip reconfiguration in netdev_dpdk_reconfigure() before rte_eth_dev_stop(). v3: * Fixed another outdated comment about rx queue configuration, as pointed out by Mark * Removed unnecessary and buggy initialization of requested_n_rxq in reconfigure_pmd_threads(). * Removed unused 'err' variable in netdev_dpdk_set_multiq(). * Changed comparison in netdev_set_multiq() to use previous 'netdev->requested_n_txq' instead of 'netdev->up.n_txq' * Return immediately in netdev_dpdk_reconfigure() if configuration didn't change anything. v2: * Fixed do_add_port(): we have to call netdev_reconfigure() before opening the rxqs. This prevents memory leaks, and makes sure that the datapath polls the appropriate number of queues * Fixed netdev_dpdk_vhost_set_multiq(): it must call netdev_request_reconfigure(). Since it is now equal to netdev_dpdk_set_multiq(), the two function have been merged. * Fixed netdev_dpdk_set_config(): dev->requested_n_rxq is now accessed while holding the appropriate mutex. * Fixed some outdated comments about rx queue configuration. Daniele Di Proietto (16): dpif-netdev: Destroy 'port_mutex' in dp_netdev_free(). dpif-netdev: Remove unused 'index' in dp_netdev_pmd_thread. dpif-netdev: Factor out port_create() from do_add_port(). dpif-netdev: Add functions to modify rxq without reloading pmd threads. dpif-netdev: Fix race condition in pmd thread initialization. dpif-netdev: Remove duplicate code in dp_netdev_set_pmds_on_numa(). hmap: Add HMAP_FOR_EACH_POP. dpif-netdev: Add pmd thread local port cache for transmission. hmap: Use struct for hmap_at_position(). dpif-netdev: Use hmap for ports. 
ovs-thread: Do not quiesce in ovs_mutex_cond_wait(). ofproto-dpif: Call dpif_poll_threads_set() before dpif_run(). dpif-netdev: Change pmd thread configuration in dpif_netdev_run(). dpif-netdev: Handle errors in reconfigure_pmd_threads(). netdev: Add reconfigure request mechanism. netdev-dpdk: Use ->reconfigure() call to change rx/tx queues. lib/cfm.c| 5 +- lib/dpif-netdev.c| 702 +++ lib/dpif-provider.h | 3 +- lib/hmap.c | 26 +- lib/hmap.h | 11 +- lib/id-pool.c| 5 +- lib/learning-switch.c| 5 +- lib/netdev-bsd.c | 3 +- lib/netdev-dpdk.c| 194 ++-- lib/netdev-dummy.c | 3 +- lib/netdev-linux.c | 8 +- lib/netdev-provider.h| 50 ++- lib/netdev-vport.c | 3 +- lib/netdev.c | 75 +++-- lib/netdev.h | 7 +- lib/odp-util.c | 7 +- lib/ovs-rcu.h|
[ovs-dev] [PATCH v8 02/16] dpif-netdev: Remove unused 'index' in dp_netdev_pmd_thread.
Signed-off-by: Daniele Di Proietto--- lib/dpif-netdev.c | 14 +- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 24717cc..060f5e0 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -430,8 +430,6 @@ struct dp_netdev_pmd_thread { struct latch exit_latch;/* For terminating the pmd thread. */ atomic_uint change_seq; /* For reloading pmd ports. */ pthread_t thread; -int index; /* Idx of this pmd thread among pmd*/ -/* threads on same numa node. */ unsigned core_id; /* CPU core id of this pmd thread. */ int numa_id;/* numa node id of this pmd thread. */ atomic_int tx_qid; /* Queue id used by this pmd thread to @@ -485,8 +483,8 @@ static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *, static void dp_netdev_disable_upcall(struct dp_netdev *); static void dp_netdev_pmd_reload_done(struct dp_netdev_pmd_thread *pmd); static void dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, -struct dp_netdev *dp, int index, -unsigned core_id, int numa_id); +struct dp_netdev *dp, unsigned core_id, +int numa_id); static void dp_netdev_destroy_pmd(struct dp_netdev_pmd_thread *pmd); static void dp_netdev_set_nonpmd(struct dp_netdev *dp); static struct dp_netdev_pmd_thread *dp_netdev_get_pmd(struct dp_netdev *dp, @@ -2787,8 +2785,7 @@ dp_netdev_set_nonpmd(struct dp_netdev *dp) struct dp_netdev_pmd_thread *non_pmd; non_pmd = xzalloc(sizeof *non_pmd); -dp_netdev_configure_pmd(non_pmd, dp, 0, NON_PMD_CORE_ID, -OVS_NUMA_UNSPEC); +dp_netdev_configure_pmd(non_pmd, dp, NON_PMD_CORE_ID, OVS_NUMA_UNSPEC); } /* Caller must have valid pointer to 'pmd'. */ @@ -2829,10 +2826,9 @@ dp_netdev_pmd_get_next(struct dp_netdev *dp, struct cmap_position *pos) /* Configures the 'pmd' based on the input argument. 
*/ static void dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp, -int index, unsigned core_id, int numa_id) +unsigned core_id, int numa_id) { pmd->dp = dp; -pmd->index = index; pmd->core_id = core_id; pmd->numa_id = numa_id; pmd->poll_cnt = 0; @@ -3140,7 +3136,7 @@ dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int numa_id) for (i = 0; i < can_have; i++) { unsigned core_id = ovs_numa_get_unpinned_core_on_numa(numa_id); pmds[i] = xzalloc(sizeof **pmds); -dp_netdev_configure_pmd(pmds[i], dp, i, core_id, numa_id); +dp_netdev_configure_pmd(pmds[i], dp, core_id, numa_id); } /* Distributes rx queues of this numa node between new pmd threads. */ -- 2.1.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 14/15] system-tests: Add tcp simple test.
On 15 April 2016 at 17:02, Daniele Di Proietto wrote:
> Useful to test the datapath ability to forward tcp packets without the
> complexity of connection tracking.
>
> Signed-off-by: Daniele Di Proietto

I think we could do with a few more of these incremental steps on the way
up to the more complex configurations in the existing tests. This is a
step in the right direction.

Acked-by: Joe Stringer
Re: [ovs-dev] [PATCH v2 13/15] system-tests: Disable offloads in userspace tests.
On 15 April 2016 at 17:02, Daniele Di Proietto wrote:
> The system userspace testsuite uses the userspace datapath with
> netdev-linux devices, connected to veth pairs with the AF_PACKET socket:
>
>      (veth pair)   (AF_PACKET)
> TCP stack -> p0 ---> ovs-p0 -> netdev-linux (userspace OVS)
>
> Unfortunately this configuration has some problems with offloads: a
> packet generated by the TCP stack may be sent to p0 without being
> checksummed or segmented. The AF_PACKET socket, by default, ignores the
> offloads and just transmits the data of the packets to userspace, but:
>
> 1. The packet may need GSO, so the data will be too big to be received
>    by the userspace datapath
> 2. The packet might have incomplete checksums, so it will likely be
>    discarded by the receiver.
>
> Problem 1 causes TCP connections to see a congestion window smaller than
> the MTU, which hurts performance but doesn't prevent communication.
>
> Problem 2 was hidden in the testsuite by a Linux kernel bug, fixed by
> commit ce8c839b74e3 ("veth: don't modify ip_summed; doing so treats
> packets with bad checksums as good"). In the kernels that include the
> fix, the userspace datapath is able to process pings, but not tcp or udp
> data.
>
> Unfortunately I couldn't find a way to ask the AF_PACKET socket to
> perform offloads in kernel. A possible fix would be to use the
> PACKET_VNET_HDR sockopt and perform the offloads in userspace.
>
> Until a proper fix is worked out for netdev-linux, this commit disables
> offloads on the non-OVS side of the veth pair, as a workaround.
>
> Signed-off-by: Daniele Di Proietto

Thanks for the fix. As I understand this simply fixes the userspace
testsuite when running on newer kernels today, so we could apply this
independently of the rest of the series.

Acked-by: Joe Stringer
Re: [ovs-dev] [PATCH v2 10/15] dpif-netdev: Implement conntrack dump functions.
On 15 April 2016 at 17:02, Daniele Di Proietto wrote:
> Signed-off-by: Daniele Di Proietto

Could be combined with the earlier patch that provides the actual
implementation?
Re: [ovs-dev] [PATCH v2 09/15] dpif-netdev: Execute conntrack action.
On 15 April 2016 at 17:02, Daniele Di Proiettowrote: > This commit implements the OVS_ACTION_ATTR_CT action in dpif-netdev. > > To allow ofproto-dpif to detect the conntrack feature, flow_put will not > discard anymore flows with ct_* fields set. We still shouldn't allow > flows with NAT bits set, since there is no support for NAT. > > Signed-off-by: Daniele Di Proietto > --- ... > @@ -2604,6 +2617,9 @@ dpif_netdev_run(struct dpif *dpif) > ovs_mutex_unlock(>non_pmd_mutex); > dp_netdev_pmd_unref(non_pmd); > > +/* XXX: If workload is too heavy we could add a separate thread. */ > +conntrack_run(>conntrack); > + Have you tried, eg, portscanning with several threads against the implementation to see how bad it gets? > @@ -3850,12 +3866,48 @@ dp_execute_cb(void *aux_, struct dp_packet **packets, > int cnt, > VLOG_WARN("Packet dropped. Max recirculation depth exceeded."); > break; > > -case OVS_ACTION_ATTR_CT: > -/* If a flow with this action is slow-pathed, datapath assistance is > - * required to implement it. However, we don't support this action > - * in the userspace datapath. */ > -VLOG_WARN("Cannot execute conntrack action in userspace."); > +case OVS_ACTION_ATTR_CT: { > +const struct nlattr *b; > +bool commit = false; > +unsigned int left; > +uint16_t zone = 0; > +const char *helper = NULL; > +const uint32_t *setmark = NULL; > +const struct ovs_key_ct_labels *setlabel = NULL; > + > + > +/* XXX parsing this everytime is expensive. We should do like kernel > + * does and create a structure. */ Seems reasonable. You say it's expensive (how expensive?), but it also seems a little cleaner to store it in a more accessible manner. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
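The "parse once into a struct" alternative suggested in the review — decoding the nested `OVS_ACTION_ATTR_CT` attributes a single time at flow-install instead of on every packet — can be sketched as follows. The attribute encoding here is a flat, simplified stand-in, not the real netlink layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum ct_attr { CT_ATTR_COMMIT, CT_ATTR_ZONE, CT_ATTR_MARK };

struct attr { enum ct_attr type; uint32_t value; };

/* Pre-digested form of the action: filled once, then reused for every
 * packet hitting the flow, instead of re-walking the attribute list. */
struct ct_action {
    bool commit;
    uint16_t zone;
    bool has_mark;
    uint32_t mark;
};

void parse_ct_action(const struct attr *attrs, size_t n, struct ct_action *ct)
{
    *ct = (struct ct_action) { 0 };
    for (size_t i = 0; i < n; i++) {
        switch (attrs[i].type) {
        case CT_ATTR_COMMIT:
            ct->commit = true;
            break;
        case CT_ATTR_ZONE:
            ct->zone = (uint16_t) attrs[i].value;
            break;
        case CT_ATTR_MARK:
            ct->has_mark = true;
            ct->mark = attrs[i].value;
            break;
        }
    }
}
```

This is essentially what the reviewer points at: the per-packet cost of attribute parsing is a linear walk plus branches for every field, while a cached struct reduces the fast path to plain member reads.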
Re: [ovs-dev] [PATCH v2 08/15] conntrack: Implement dumping to ct_entry.
On 15 April 2016 at 17:02, Daniele Di Proiettowrote: > Signed-off-by: Daniele Di Proietto > +static void > +conn_key_to_tuple(const struct conn_key *key, struct ct_dpif_tuple *tuple) > +{ > +if (key->dl_type == htons(ETH_TYPE_IP)) { > +tuple->l3_type = AF_INET; > +} else if (key->dl_type == htons(ETH_TYPE_IPV6)) { > +tuple->l3_type = AF_INET6; > +} > +tuple->ip_proto = key->nw_proto; > +ct_endpoint_to_ct_dpif_inet_addr(>src.addr, >src, > + key->dl_type); > +ct_endpoint_to_ct_dpif_inet_addr(>dst.addr, >dst, > + key->dl_type); > + > +if (key->nw_proto == IPPROTO_ICMP) { > +tuple->icmp_id = key->src.port; > +/* ICMP type and code are not tracked */ > +tuple->icmp_type = 0; > +tuple->icmp_code = 0; > +} else { > +tuple->src_port = key->src.port; > +tuple->dst_port = key->dst.port; > +} I think in the Linux implementation, the original ICMP message's icmp type/code are stored/provided here. Might be a behaviour parity question. > +} > + > +static void > +conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry, > + long long now) > +{ > +struct ct_l4_proto *class; > +long long expiration; > +memset(entry, 0, sizeof *entry); > +conn_key_to_tuple(>key, >tuple_orig); > +conn_key_to_tuple(>rev_key, >tuple_reply); > + > +entry->zone = conn->key.zone; > +entry->mark = conn->mark; > + > +memcpy(>labels, >label, sizeof(entry->labels)); > +/* Not implemented yet */ > +entry->timestamp.start = 0; > +entry->timestamp.stop = 0; Is it better to reflect that timestamps are unsupported up to the user or simply report 0 when it's unsupported? (I don't have a particular preference, just asking the question) ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 07/15] conntrack: Implement flush function.
On 15 April 2016 at 17:02, Daniele Di Proietto wrote:
> Signed-off-by: Daniele Di Proietto

Looks like this can be combined with patch #11 (to plumb it up through
appctl).

Acked-by: Joe Stringer
Re: [ovs-dev] [PATCH v2 04/15] conntrack: New userspace connection tracker.
On 15 April 2016 at 17:02, Daniele Di Proietto wrote:
> This commit adds the conntrack module.
>
> It is a connection tracker that resides entirely in userspace. Its
> primary user will be the dpif-netdev datapath.
>
> The module main goal is to provide conntrack_execute(), which offers a
> convenient interface to implement the datapath ct() action.
>
> The conntrack module uses two submodules to deal with the l4 protocol
> details (conntrack-other for UDP and ICMP, conntrack-tcp for TCP).
>
> The conntrack-tcp submodule implementation is adapted from FreeBSD's pf
> subsystem, therefore it's BSD licensed. It has been slightly altered to
> match the OVS coding style and to allow the pickup of already
> established connections.
>
> Signed-off-by: Daniele Di Proietto

Thanks for submitting this, I know there's a few people excited about this feature.

I notice that there is no NEWS item about this. Were you intending to hold off on announcing it until there is feature parity with the kernel, e.g. support for IPv[46] fragments, ALGs and NAT?

> diff --git a/lib/conntrack-other.c b/lib/conntrack-other.c
> new file mode 100644
> index 000..65d02a9
> --- /dev/null
> +++ b/lib/conntrack-other.c
> @@ -0,0 +1,91 @@
> +/*
> + * Copyright (c) 2015, 2016 Nicira, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include <config.h>
> +
> +#include "conntrack-private.h"
> +#include "dp-packet.h"
> +
> +enum other_state {
> +    OTHERS_FIRST,
> +    OTHERS_MULTIPLE,
> +    OTHERS_BIDIR,
> +};
> +
> +struct conn_other {
> +    struct conn up;
> +    enum other_state state;
> +};
> +
> +static const long long other_timeouts[] = {
> +    [OTHERS_FIRST] = 60 * 1000,
> +    [OTHERS_MULTIPLE] = 60 * 1000,
> +    [OTHERS_BIDIR] = 30 * 1000,
> +};

Are these timeouts in milliseconds? (a comment or #define might help with that)

> diff --git a/lib/conntrack.c b/lib/conntrack.c
> new file mode 100644
> index 000..840335b
> --- /dev/null
> +++ b/lib/conntrack.c
...
> +static struct ct_l4_proto *l4_protos[] = {
> +    [IPPROTO_TCP] = &ct_proto_tcp,
> +    [IPPROTO_UDP] = &ct_proto_other,
> +    [IPPROTO_ICMP] = &ct_proto_other,
> +};

I see that conntrack-other is shared by UDP and ICMP. In the Linux datapath, the UDP handler also checks UDP length and UDP checksum - I wonder if we can share most of the code here for these protocols but add these additional checks for the UDP case?

> +static struct conn *
> +conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
> +               struct conn_lookup_ctx *ctx, uint8_t *state, bool commit,
> +               long long now)
> +{
> +    unsigned bucket = hash_to_bucket(ctx->hash);
> +    struct conn *nc = NULL;
> +
> +    if (!valid_new(pkt, &ctx->key)) {
> +        *state |= CS_INVALID;
> +        return nc;
> +    }
> +
> +    *state |= CS_NEW;
> +
> +    if (commit) {
> +        nc = new_conn(pkt, &ctx->key, now);
> +
> +        memcpy(&nc->rev_key, &nc->key, sizeof nc->rev_key);
> +
> +        conn_key_reverse(&nc->rev_key);
> +        hmap_insert(&ct->connections[bucket], &nc->node, ctx->hash);
> +    }
> +
> +    return nc;
> +}

This function can return NULL...

> +static struct conn *
> +process_one(struct conntrack *ct, struct dp_packet *pkt,
> +            struct conn_lookup_ctx *ctx, uint16_t zone,
> +            bool commit, long long now)
> +{
> +    unsigned bucket = hash_to_bucket(ctx->hash);
> +    struct conn *conn = ctx->conn;
> +    uint8_t state = 0;
> +
> +    if (conn) {
> +        if (ctx->related) {
> +            state |= CS_RELATED;
> +            if (ctx->reply) {
> +                state |= CS_REPLY_DIR;
> +            }
> +        } else {
> +            enum ct_update_res res;
> +
> +            res = conn_update(conn, pkt, ctx->reply, now);
> +
> +            switch (res) {
> +            case CT_UPDATE_VALID:
> +                state |= CS_ESTABLISHED;
> +                if (ctx->reply) {
> +                    state |= CS_REPLY_DIR;
> +                }
> +                break;
> +            case CT_UPDATE_INVALID:
> +                state |= CS_INVALID;
> +                break;
> +            case CT_UPDATE_NEW:
> +                hmap_remove(&ct->connections[bucket], &conn->node);
> +                delete_conn(conn);
> +                conn = conn_not_found(ct, pkt, ctx, &state, commit, now);

...which is then stored here...

> +                break;
> +            }
> +        }
> +
Re: [ovs-dev] [PATCH v2 03/15] flow: Introduce parse_dl_type().
On 15 April 2016 at 17:02, Daniele Di Proietto wrote:
> The function simply returns the ethernet type of the packet (after
> eventually discarding the VLAN tag). It will be used by a following
> commit.
>
> Signed-off-by: Daniele Di Proietto
> ---
>  lib/flow.c | 14 --
>  lib/flow.h | 1 +
>  2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/lib/flow.c b/lib/flow.c
> index 972a996..0250a7c 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -327,7 +327,7 @@ parse_mpls(const void **datap, size_t *sizep)
>      return MIN(count, FLOW_MAX_MPLS_LABELS);
>  }
>
> -static inline ovs_be16
> +static inline ALWAYS_INLINE ovs_be16

Usually we try to avoid telling the compiler how to optimise things like inlining - is there a reasoning around these changes?

>  parse_vlan(const void **datap, size_t *sizep)
>  {
>      const struct eth_header *eth = *datap;
> @@ -349,7 +349,7 @@ parse_vlan(const void **datap, size_t *sizep)
>      return 0;
>  }
>
> -static inline ovs_be16
> +static inline ALWAYS_INLINE ovs_be16
>  parse_ethertype(const void **datap, size_t *sizep)
>  {
>      const struct llc_snap_header *llc;
> @@ -826,6 +826,16 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
>      dst->map = mf.map;
>  }
>
> +ovs_be16
> +parse_dl_type(const struct eth_header *data_, size_t size)
> +{
> +    const void *data = data_;
> +
> +    parse_vlan(&data, &size);
> +
> +    return parse_ethertype(&data, &size);
> +}
> +
>  /* For every bit of a field that is wildcarded in 'wildcards', sets the
>   * corresponding bit in 'flow' to zero. */
>  void
> diff --git a/lib/flow.h b/lib/flow.h
> index 6771232..0406198 100644
> --- a/lib/flow.h
> +++ b/lib/flow.h
> @@ -107,6 +107,7 @@ void flow_compose(struct dp_packet *, const struct flow *);
>
>  bool parse_ipv6_ext_hdrs(const void **datap, size_t *sizep, uint8_t *nw_proto,
>                           uint8_t *nw_frag);
> +ovs_be16 parse_dl_type(const struct eth_header *data_, size_t size);
>
>  static inline uint64_t
>  flow_get_xreg(const struct flow *flow, int idx)
> --
> 2.1.4
Re: [ovs-dev] [PATCH v2 02/15] flow: Export parse_ipv6_ext_hdrs().
On 15 April 2016 at 17:02, Daniele Di Proietto wrote:
> This will be used by a future commit.
>
> Signed-off-by: Daniele Di Proietto
> ---
>  lib/flow.c | 140 ++---
>  lib/flow.h | 3 ++
>  2 files changed, 81 insertions(+), 62 deletions(-)
>
> diff --git a/lib/flow.c b/lib/flow.c
> index 560a90f..972a996 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -439,6 +439,82 @@ invalid:
>      arp_buf[1] = eth_addr_zero;
>  }
>
> +static inline bool
> +parse_ipv6_ext_hdrs__(const void **datap, size_t *sizep, uint8_t *nw_proto,
> +                      uint8_t *nw_frag)
> +{
> +    while (1) {
> +        if (OVS_LIKELY((*nw_proto != IPPROTO_HOPOPTS)
> +                       && (*nw_proto != IPPROTO_ROUTING)
> +                       && (*nw_proto != IPPROTO_DSTOPTS)
> +                       && (*nw_proto != IPPROTO_AH)
> +                       && (*nw_proto != IPPROTO_FRAGMENT))) {
> +            /* It's either a terminal header (e.g., TCP, UDP) or one we
> +             * don't understand.  In either case, we're done with the
> +             * packet, so use it to fill in 'nw_proto'. */
> +            return true;
> +        }
> +
> +        /* We only verify that at least 8 bytes of the next header are
> +         * available, but many of these headers are longer.  Ensure that
> +         * accesses within the extension header are within those first 8
> +         * bytes. All extension headers are required to be at least 8
> +         * bytes. */
> +        if (OVS_UNLIKELY(*sizep < 8)) {
> +            return false;
> +        }
> +
> +        if ((*nw_proto == IPPROTO_HOPOPTS)
> +            || (*nw_proto == IPPROTO_ROUTING)
> +            || (*nw_proto == IPPROTO_DSTOPTS)) {
> +            /* These headers, while different, have the fields we care
> +             * about in the same location and with the same
> +             * interpretation. */
> +            const struct ip6_ext *ext_hdr = *datap;
> +            *nw_proto = ext_hdr->ip6e_nxt;
> +            if (OVS_UNLIKELY(!data_try_pull(datap, sizep,
> +                                            (ext_hdr->ip6e_len + 1) * 8))) {
> +                return false;
> +            }
> +        } else if (*nw_proto == IPPROTO_AH) {
> +            /* A standard AH definition isn't available, but the fields
> +             * we care about are in the same location as the generic
> +             * option header--only the header length is calculated
> +             * differently. */
> +            const struct ip6_ext *ext_hdr = *datap;
> +            *nw_proto = ext_hdr->ip6e_nxt;
> +            if (OVS_UNLIKELY(!data_try_pull(datap, sizep,
> +                                            (ext_hdr->ip6e_len + 2) * 4))) {
> +                return false;
> +            }
> +        } else if (*nw_proto == IPPROTO_FRAGMENT) {
> +            const struct ovs_16aligned_ip6_frag *frag_hdr = *datap;
> +
> +            *nw_proto = frag_hdr->ip6f_nxt;
> +            if (!data_try_pull(datap, sizep, sizeof *frag_hdr)) {
> +                return false;
> +            }
> +
> +            /* We only process the first fragment. */
> +            if (frag_hdr->ip6f_offlg != htons(0)) {
> +                *nw_frag = FLOW_NW_FRAG_ANY;
> +                if ((frag_hdr->ip6f_offlg & IP6F_OFF_MASK) != htons(0)) {
> +                    *nw_frag |= FLOW_NW_FRAG_LATER;
> +                    *nw_proto = IPPROTO_FRAGMENT;
> +                    return true;
> +                }
> +            }
> +        }
> +    }
> +}
> +
> +bool
> +parse_ipv6_ext_hdrs(const void **datap, size_t *sizep, uint8_t *nw_proto,
> +                    uint8_t *nw_frag)
> +{
> +    return parse_ipv6_ext_hdrs__(datap, sizep, nw_proto, nw_frag);
> +}
> +

I couldn't tell any difference between parse_ipv6_ext_hdrs() and parse_ipv6_ext_hdrs__(); are they both necessary?

Acked-by: Joe Stringer
Re: [ovs-dev] [PATCH v2 01/15] packets: Define ICMP types.
On 15 April 2016 at 17:02, Daniele Di Proietto wrote:
> Linux and FreeBSD have slightly different names for these constants.
> Windows doesn't define them. It is simpler to redefine them from
> scratch for OVS. The new names are different than those used in Linux
> and FreeBSD.
>
> These definitions will be used by a future commit.
>
> Signed-off-by: Daniele Di Proietto

Acked-by: Joe Stringer
Re: [ovs-dev] [PATCH RFC] ovn: distributed logical port for VM metadata access
Hi Ramu,

On Tue, Apr 19, 2016 at 8:12 AM, Ramu Ramamurthy wrote:
> On Mon, Apr 18, 2016 at 2:55 PM, Aaron Rosen wrote:
> > I like this idea as well.
> >
> > The one question I have with this is how we should determine which ip
> > address to select for the 'distributed' port?
>
> Aaron, Thanks for your review, and feedback.
>
> We can use the dhcp-port (and its IP address) for the distributed-port.
>
> The current native-dhcp proposal
> (http://openvswitch.org/pipermail/dev/2016-April/069787.html)
> assumes that a dhcp-server ip-address "server_id" is defined on the subnet.
>
> action=(dhcp_offer(offerip = 10.0.0.2, router = 10.0.0.1,
>         server_id = 10.0.0.2, mtu = 1300, lease_time = 3600,
>
> For openstack, This means that a DHCP port has been created on that
> subnet in neutron.
> In the absence of a dhcp-agent, the neutron-ovn-plugin would have to
> auto-create the dhcp-port in neutron upon creation of a subnet, and then
> use that port's IP address as the "server_id" when it programs the
> "dhcp-options" column of the Logical_Switch table.
>
> The pros of the distributed-port approach is that a) HA is not needed,
> b) it runs the existing neutron-metadata-proxy/neutron-metadata-agent
> as-is, c) In the future, we could remove the neutron-metadata-agent also,
> by (nova-compute) configuring the instance-id and tenant-id as
> external-ids of the VM's ovs interface - hence not need to run any
> neutron-agents at all.
> The drawbacks include creation of namespaces and metadata-proxy
> processes on each hypervisor.

Cool, makes sense to me.

> > If we want to avoid creating the network name spaces and
> > running the http proxy on each hypervisor is if we took a similar approach
> > that openstack uses for handling dhcp/metadata requests.
> > When a subnet is created we could have the neutron-ovn-plugin notify a
> > metadata agent which would create a port on the given subnet for the
> > logical network. Then, to get instances to route its metadata traffic to
> > this logical port we could have ovn distribute an additional host-route
> > via dhcp (using option 121). Similar to what you are proposing.
> >
> > I.e: So for example if someone created a network/subnet.
> >
> > In the ovn plugin we can signal the metadata agent to create a port on
> > that network. Then, for every port that is created on this network we
> > would distribute a hostroute of 169.254.169.254/32 via ; Then,
> > we'd have the metadata agent just run there which would answer these meta
> > data requests and route them back.
> >
> > One down side to this solution is that this metadata agent would need to
> > be made HA in some way. In your current solution if the metadata agent
> > crashes or something the failure is isolated to the hypervisor. That
> > said, this type of HA seems like it can be implemented in at least an
> > active passive solution easily enough.
> >
> > Thoughts?
>
> Your proposal is an alternative solution - which involves changes only to
> the neutron components (and no changes in ovn ?). Would there be only one
> modified neutron-metadata-agent in an active-passive configuration serving
> all the VMs ? If there are multiple agents, would you need agent-scheduling
> to assign networks to agents ?
>
> Could you share more details of the approach ?

Right, with this approach I don't believe we would need any additional changes in ovn besides the ability to specify host routes via dhcp. For this solution we'd need to modify the neutron-metadata-agent to work in this way. Currently, it has this functionality though it's coupled together with the dhcp-agent (which we wouldn't want). In order to support HA you would need to run multiple agents and the HA would use a similar agent-scheduling method that neutron currently uses to map dhcp-agents to networks. Also, the HA for this would need to be implemented as active-passive I believe.

Personally, I do prefer your solution as the HA solution is more elegant as it runs on each HV. That said, if having the namespaces on the hypervisor nodes is a deal breaker then this would be an alternative solution to that.
[ovs-dev] [PATCH] Display correct ovnsb sock location in the ovnsb-ctl help message
Signed-off-by: Hui Kang
---
 ovn/utilities/ovn-sbctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ovn/utilities/ovn-sbctl.c b/ovn/utilities/ovn-sbctl.c
index a257729..a888333 100644
--- a/ovn/utilities/ovn-sbctl.c
+++ b/ovn/utilities/ovn-sbctl.c
@@ -329,7 +329,7 @@ Options:\n\
   -t, --timeout=SECS          wait at most SECS seconds\n\
   --dry-run                   do not commit changes to database\n\
   --oneline                   print exactly one line of output per command\n",
-           program_name, program_name, ctl_get_db_cmd_usage(), ctl_default_db());
+           program_name, program_name, ctl_get_db_cmd_usage(), sbctl_default_db());
     vlog_usage();
     printf("\
 --no-syslog             equivalent to --verbose=sbctl:syslog:warn\n");
--
1.9.1
Re: [ovs-dev] [RFC PATCH] create vxlan device using rtnetlink interface
On Tue, Apr 19, 2016 at 07:00:00PM +0200, Jiri Benc wrote:
> On Mon, 18 Apr 2016 11:30:43 -0700, Jesse Gross wrote:
> > (Even on new kernels I don't know how this would interact
> > with OVS tunnel ports that are not of type 'flow'. Would they also use
> > the compat code?)
>
> This is a very good point. Although, on the other hand, how common are
> tunnel ports with fixed parameters nowadays?

OVN and NSX use them.
Re: [ovs-dev] [RFC PATCH] create vxlan device using rtnetlink interface
On Mon, 18 Apr 2016 11:30:43 -0700, Jesse Gross wrote:
> (Even on new kernels I don't know how this would interact
> with OVS tunnel ports that are not of type 'flow'. Would they also use
> the compat code?)

This is a very good point. Although, on the other hand, how common are tunnel ports with fixed parameters nowadays?

> Even if we didn't have to worry about any legacy code, I'm not
> convinced that taking tunnel configuration out of OVSDB is the right
> thing from the user's perspective. Much of the value of OVS is around
> programmatic and central control so we should allow that where
> possible. For example, if a controller starts using a different
> encapsulation type in a new version or for a different situation, it
> seems undesirable to require the user to change things on each
> machine.

But tunnels can't be completely set up from a controller even now. At minimum, an (outer) IP address has to be assigned on each machine. Of course, there are cases in which an existing, already set up network is used as the underlay. If the main target is such use cases, then indeed, this could be perceived as a usability regression.

> With regards to veths, in many cases they are used to stitch
> together different combinations of OVS and bridging, which is
> something that is itself a pain point and something that I hope will
> be less necessary in the future.

Fair enough.

 Jiri
Re: [ovs-dev] [PATCH RFC] ovn: distributed logical port for VM metadata access
> We can use the dhcp-port (and its IP address) for the distributed-port.
>
> The current native-dhcp proposal
> (http://openvswitch.org/pipermail/dev/2016-April/069787.html)
> assumes that a dhcp-server ip-address "server_id" is defined on the subnet.
>
> action=(dhcp_offer(offerip = 10.0.0.2, router = 10.0.0.1,
>         server_id = 10.0.0.2, mtu = 1300, lease_time = 3600,
>
> For openstack, This means that a DHCP port has been created on that
> subnet in neutron.

The networking-ovn patch for native dhcp support presently doesn't create a dhcp port in the create subnet method [1]. Either this patch [1] can create the dhcp port or it can be created when the metadata changes are ready. Please let me know if you have any comments on the gerrit review.

[1] - https://review.openstack.org/#/c/243174/10/networking_ovn/plugin.py

> In the absence of a dhcp-agent, the neutron-ovn-plugin would have to
> auto-create the dhcp-port in neutron upon creation of a subnet, and then
> use that port's IP address as the "server_id" when it programs the
> "dhcp-options" column of the Logical_Switch table.
Re: [ovs-dev] [PATCH] ovn-northd: Add support for static_routes.
Mickey,
Can you give a first shot at the review. You had some issues with my code structure and it looks like Steve decided to use something similar.

On 19 April 2016 at 02:36, steve.ruan wrote:
> From: Guru Shetty

Steve, You should be the author (when you respin after Mickey's review). You can just add me as "Co-authored-by" along with my signed-off-by. You can search in "git log" for other examples.

>
> static routes are useful when connecting multiple
> routers with each other.
>
> Signed-off-by: Gurucharan Shetty
> Signed-off-by: steve.ruan
> ---
>  ovn/northd/ovn-northd.8.xml   |   5 +-
>  ovn/northd/ovn-northd.c       | 101 +
>  ovn/ovn-nb.ovsschema          |  15 +++-
>  ovn/ovn-nb.xml                |  31 +++
>  ovn/utilities/ovn-nbctl.8.xml |   5 ++
>  ovn/utilities/ovn-nbctl.c     |   5 ++
>  tests/ovn.at                  | 204 ++
>  7 files changed, 362 insertions(+), 4 deletions(-)
>
> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> index da776e1..978853c 100644
> --- a/ovn/northd/ovn-northd.8.xml
> +++ b/ovn/northd/ovn-northd.8.xml
> @@ -682,8 +682,9 @@ next;
>
> -        If the route has a gateway, G is the gateway IP address,
> -        otherwise it is ip4.dst.
> +        If the route has a gateway, G is the gateway IP address.
> +        Instead, if the route is from a configured static route, G
> +        is the next hop IP address. Else it is ip4.dst.
>
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index 260c02f..71c9b29 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -234,6 +234,18 @@ allocate_tnlid(struct hmap *set, const char *name, uint32_t max,
>      return 0;
>  }
>
> +/* Holds the next hop ip address and the logical router port via which
> + * a static route is reachable. */
> +struct route_to_port {
> +    ovs_be32 ip;               /* network address of the route. */
> +    ovs_be32 mask;             /* network mask of the route. */
> +    ovs_be32 next_hop;         /* next_hop ip address for the above route. */
> +    struct uuid rport;         /* output port specified by CMS, or null
> +                                * if not specified */
> +    struct ovn_port *ovn_port; /* The logical router port via which the packet
> +                                * needs to exit to reach the next hop. */
> +};
> +
>
>  /* The 'key' comes from nbs->header_.uuid or nbr->header_.uuid or
>   * sb->external_ids:logical-switch. */
>  struct ovn_datapath {
> @@ -249,6 +261,9 @@ struct ovn_datapath {
>      /* Logical router data (digested from nbr). */
>      const struct ovn_port *gateway_port;
>      ovs_be32 gateway;
> +    /* Maps a static route to a ovn logical router port via which packet
> +     * needs to exit. */
> +    struct shash routes_map;
>
>      /* Logical switch data. */
>      struct ovn_port **router_ports;
> @@ -272,6 +287,7 @@ ovn_datapath_create(struct hmap *datapaths, const struct uuid *key,
>      od->nbs = nbs;
>      od->nbr = nbr;
>      hmap_init(&od->port_tnlids);
> +    shash_init(&od->routes_map);
>      od->port_key_hint = 0;
>      hmap_insert(datapaths, &od->key_node, uuid_hash(&od->key));
>      return od;
> @@ -286,6 +302,7 @@ ovn_datapath_destroy(struct hmap *datapaths, struct ovn_datapath *od)
>       * use it. */
>      hmap_remove(datapaths, &od->key_node);
>      destroy_tnlids(&od->port_tnlids);
> +    shash_destroy_free_data(&od->routes_map);
>      free(od->router_ports);
>      free(od);
>  }
> @@ -318,6 +335,50 @@ ovn_datapath_from_sbrec(struct hmap *datapaths,
>  }
>
>  static void
> +build_static_route(struct ovn_datapath *od,
> +                   const struct nbrec_logical_router_static_route *route)
> +{
> +    ovs_be32 prefix, next_hop, mask;
> +
> +    /* verify nexthop */
> +    char *error = ip_parse_masked(route->nexthop, &next_hop, &mask);
> +    if (error || mask != OVS_BE32_MAX) {
> +        static struct vlog_rate_limit rl
> +            = VLOG_RATE_LIMIT_INIT(5, 1);
> +        VLOG_WARN_RL(&rl, "bad next hop ip address %s",
> +                     route->nexthop);
> +        free(error);
> +        return;
> +    }
> +
> +    /* verify prefix */
> +    error = ip_parse_masked(route->prefix, &prefix, &mask);
> +    if (error || !ip_is_cidr(mask)) {
> +        static struct vlog_rate_limit rl
> +            = VLOG_RATE_LIMIT_INIT(5, 1);
> +        VLOG_WARN_RL(&rl, "bad 'network' in static routes %s",
> +                     route->prefix);
> +        free(error);
> +        return;
> +    }
> +
> +    struct uuid lrp_uuid;
> +    struct route_to_port *route_port = xmalloc(sizeof *route_port);
> +    route_port->ip = prefix;
> +    route_port->mask = mask;
> +    route_port->next_hop = next_hop;
> +    /* The correct ovn_port will be filled while traversing
> +     * logical_router_ports.
Re: [ovs-dev] [PATCH] Add VxLAN-GBP support for user space data path
On Tue, Apr 19, 2016 at 3:20 AM, Johnson.Li wrote:
> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
> index e398562..e6b35ed 100644
> --- a/lib/netdev-vport.c
> +++ b/lib/netdev-vport.c
> @@ -1297,10 +1298,16 @@ netdev_vxlan_pop_header(struct dp_packet *packet)
>          return EINVAL;
>      }
>
> -    if (get_16aligned_be32(&vxh->vx_flags) != htonl(VXLAN_FLAGS) ||
> -       (get_16aligned_be32(&vxh->vx_vni) & htonl(0xff))) {
> -        VLOG_WARN_RL(&err_rl, "invalid vxlan flags=%#x vni=%#x\n",
> -                     ntohl(get_16aligned_be32(&vxh->vx_flags)),
> +    vxh_flags = get_16aligned_be32(&vxh->vx_flags);
> +    if ((vxh_flags & htonl(VXLAN_GBP_FLAGS)) == htonl(VXLAN_GBP_FLAGS)) {

There are many overlapping extensions to VXLAN. We should only try to parse these flags when the GBP extension is actually turned on.

> +        tnl->gbp_id = htons(ntohl(vxh_flags) & 0xFFFF); /* VXLAN_GBP_ID_MASK */

There is a field called tnl->gbp_flags which should presumably also be populated here (and on transmit).

> diff --git a/lib/packets.h b/lib/packets.h
> index 8139a6b..d9a149d 100644
> --- a/lib/packets.h
> +++ b/lib/packets.h
> @@ -1002,6 +1002,7 @@ struct vxlanhdr {
>  };
>
>  #define VXLAN_FLAGS 0x08000000  /* struct vxlanhdr.vx_flags required value. */
> +#define VXLAN_GBP_FLAGS 0x88000000  /* Group Based Policy */

Can you please break these down into individual flags so that it is easier to tell what is going on (and you could use them in parsing as well)?

Can you also please add some test cases?
Re: [ovs-dev] [PATCH RFC] ovn: distributed logical port for VM metadata access
On Mon, Apr 18, 2016 at 2:55 PM, Aaron Rosen wrote:
> I like this idea as well.
>
> The one question I have with this is how we should determine which ip
> address to select for the 'distributed' port?

Aaron, Thanks for your review, and feedback.

We can use the dhcp-port (and its IP address) for the distributed-port.

The current native-dhcp proposal
(http://openvswitch.org/pipermail/dev/2016-April/069787.html)
assumes that a dhcp-server ip-address "server_id" is defined on the subnet.

action=(dhcp_offer(offerip = 10.0.0.2, router = 10.0.0.1,
        server_id = 10.0.0.2, mtu = 1300, lease_time = 3600,

For openstack, This means that a DHCP port has been created on that subnet in neutron. In the absence of a dhcp-agent, the neutron-ovn-plugin would have to auto-create the dhcp-port in neutron upon creation of a subnet, and then use that port's IP address as the "server_id" when it programs the "dhcp-options" column of the Logical_Switch table.

The pros of the distributed-port approach is that a) HA is not needed, b) it runs the existing neutron-metadata-proxy/neutron-metadata-agent as-is, c) In the future, we could remove the neutron-metadata-agent also, by (nova-compute) configuring the instance-id and tenant-id as external-ids of the VM's ovs interface - hence not need to run any neutron-agents at all. The drawbacks include creation of namespaces and metadata-proxy processes on each hypervisor.

> If we want to avoid creating the network name spaces and
> running the http proxy on each hypervisor is if we took a similar approach
> that openstack uses for handling dhcp/metadata requests.
> When a subnet is created we could have the neutron-ovn-plugin notify a
> metadata agent which would create a port on the given subnet for the
> logical network. Then, to get instances to route its metadata traffic to
> this logical port we could have ovn distribute an additional host-route
> via dhcp (using option 121). Similar to what you are proposing.
>
> I.e: So for example if someone created a network/subnet.
>
> In the ovn plugin we can signal the metadata agent to create a port on
> that network. Then, for every port that is created on this network we
> would distribute a hostroute of 169.254.169.254/32 via ; Then,
> we'd have the metadata agent just run there which would answer these meta
> data requests and route them back.
>
> One down side to this solution is that this metadata agent would need to
> be made HA in some way. In your current solution if the metadata agent
> crashes or something the failure is isolated to the hypervisor. That
> said, this type of HA seems like it can be implemented in at least an
> active passive solution easily enough.
>
> Thoughts?

Your proposal is an alternative solution - which involves changes only to the neutron components (and no changes in ovn ?). Would there be only one modified neutron-metadata-agent in an active-passive configuration serving all the VMs ? If there are multiple agents, would you need agent-scheduling to assign networks to agents ?

Could you share more details of the approach ?
Re: [ovs-dev] Reply: ovs + dpdk vhost-user match flows but cannot execute actions
Hi lifuqiong,

Could you provide the output of the following commands in your setup?:

ovs-vsctl show
ovs-ofctl dump-flows
ovs-ofctl dump-ports

Mauricio Vasquez,

On Sat, Apr 16, 2016 at 10:23 AM, lifuqiong wrote:
> Hi Mauricio:
>
> I changed my qemu version from 2.2.1 to 2.5.0 and the vms can
> communicate with each other. But the VM cannot ping a PC in the outside
> network, even though they are in the same subnet.
>
> The PC (192.168.0.103/24) pings the vm (192.168.0.90); the vm's host's
> dpdk NIC and the PC are on the same L2 switch, and the DPDK NIC is
> connected to switch port Eth1/0/8.
>
> Investigation:
> I bound the VM's IP and MAC in the PC's ARP table, and bound the VM's
> MAC on port Eth1/0/8. While the PC is pinging the vm, I can see the
> output packet count of Eth1/0/8 increasing regularly, which means the
> ping requests have been sent to the DPDK NIC.
>
> But the VM does not receive the packets?
>
> My switch has NO vlan configured; the default VLAN is 1. But when I
> ping from vm1 to vm2, the MAC address table in OVS shows as follows:
>
> port VLAN MAC                Age
> 4    0    00:00:00:00:02:12  1
> 3    0    00:00:00:00:00:04  1
>
> What is the reason the VM cannot communicate with the PC? Thank you
>
> From: lifuqiong [mailto:lifuqi...@cncloudsec.com]
> Sent: 15 April 2016 9:22
> To: 'Mauricio Vásquez'
> Cc: 'dev@openvswitch.org'
> Subject: Re: [ovs-dev] ovs + dpdk vhost-user match flows but cannot execute actions
>
> Hello Mauricio Vasquez:
> It works. Thank you very much.
>
> From: Mauricio Vásquez [mailto:mauricio.vasquezber...@studenti.polito.it]
> Sent: 14 April 2016 14:55
> To: lifuqiong
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] ovs + dpdk vhost-user match flows but cannot execute actions
>
> Hello lifuqiong,
>
> I faced the same problem some days ago
> (http://openvswitch.org/pipermail/dev/2016-March/068282.html), the bug
> is already fixed.
>
> Where are you downloading OVS from? It appears that the bug is still
> present in the version on
> http://openvswitch.org/releases/openvswitch-2.5.0.tar.gz, please
> download ovs from git and switch to branch-2.5.
>
> Mauricio Vasquez,
>
> On Thu, Apr 14, 2016 at 4:28 AM, lifuqiong wrote:
> > I want to test the dpdk vhost-user port on ovs following
> > https://software.intel.com/en-us/blogs/2015/06/09/building-vhost-user-for-ovs-today-using-dpdk-200 ;
> > I created an ovs+dpdk environment following INSTALL.DPDK.md, created
> > 2 VMs and tried to ping each other, but got "Destination Host
> > Unreachable".
> > Dump-flows shows packets matched the flow, but they can't be output to
> > port 4 - why? I can't get any useful error or warning info from
> > ovs-vswitchd.log. While pinging from vm1 to vm2, statistics on vm1
> > show that eth1 RX_packet stays zero while TX_packet keeps increasing.
> >
> > 1. OVS: 2.5.0
> >    Dpdk: 2.2.0
> >    Qemu: 2.2.1
> >
> > 2. ovs-ofctl dump-flows ovsbr0
> > NXST_FLOW reply (xid=0x4):
> > cookie=0x0, duration=836.946s, table=0, n_packets=628, n_bytes=26376, idle_age=0, in_port=3 actions=output:4
> > cookie=0x0, duration=831.458s, table=0, n_packets=36, n_bytes=1512, idle_age=770, in_port=4 actions=output:3
> >
> > 3. root@host152:/usr/local/var/run/openvswitch# ovs-vsctl show
> > 03ae6f7d-3b71-45e3-beb0-09fa11292eaa
> >     Bridge "ovsbr0"
> >         Port "vhost-user-1"
> >             Interface "vhost-user-1"
> >                 type: dpdkvhostuser
> >         Port "ovsbr0"
> >             Interface "ovsbr0"
> >                 type: internal
> >         Port "dpdk1"
> >             Interface "dpdk1"
> >                 type: dpdk
> >         Port "vhost-user-0"
> >             Interface "vhost-user-0"
> >                 type: dpdkvhostuser
> >         Port "dpdk0"
> >             Interface "dpdk0"
> >                 type: dpdk
> >
> > 4. Start VM info:
> > qemu-system-x86_64 -m 1024 -smp 2 -hda /root/vm11.qcow2 -boot c
> > -enable-kvm -vnc 0.0.0.0:1 -chardev
> > socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-0 -netdev
> > type=vhost-user,id=mynet1,chardev=char1,vhostforce -device
> > virtio-net-pci,mac=00:00:00:00:01:12,netdev=mynet1 -object
> > memory-backend-file,id=mem,size=1024M,mem-path=/dev/hugepages,share=on
> > -numa node,memdev=mem -mem-prealloc -d exec
> >
> > qemu-system-x86_64: -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce: chardev "char1" went up
> >
> > 5. My build commands are as follows:
> > #!/bin/bash
> > # config and compile dpdk
> > cd dpdk
> > make config T=x86_64-native-linuxapp-gcc
> > make install T=x86_64-native-linuxapp-gcc
> > # config and compile ovs
> > cd ovs
> > ./boot.sh
> > ./configure --localstatedir=/var --with-dpdk=/root/workplane/dpdk/x86_64-native-linuxapp-gcc
> > make
> > make install
> > # config and compile qemu
[ovs-dev] [PATCH] odp-util: Fix build warning on flags_mask.
Fix build warning: 'flags_mask' may be used uninitialized.

Signed-off-by: Antonio Fischetti
---
 lib/odp-util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/odp-util.c b/lib/odp-util.c
index b4689cc..10fb6c2 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -2366,7 +2366,7 @@ format_odp_tun_vxlan_opt(const struct nlattr *attr,
         case OVS_VXLAN_EXT_GBP: {
             uint32_t key = nl_attr_get_u32(a);
             ovs_be16 id, id_mask;
-            uint8_t flags, flags_mask;
+            uint8_t flags, flags_mask = 0;

             id = htons(key & 0xffff);
             flags = (key >> 16) & 0xFF;
--
2.4.11
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
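The warning arises because 'flags_mask' is only assigned on the branch where a mask attribute is present, so GCC's -Wmaybe-uninitialized flags the later read. Here is a minimal standalone sketch of the pattern (not OVS code; the function name, arguments, and mask layout are invented for illustration, though the `(key >> 16) & 0xFF` extraction mirrors the quoted hunk):

```c
#include <assert.h>
#include <stdint.h>

/* Reduced model of format_odp_tun_vxlan_opt(): 'flags_mask' is only
 * assigned when a mask is present, so the compiler may warn that it is
 * read uninitialized.  Initializing it to 0 silences the warning without
 * changing behavior, because the value only matters when 'has_mask' is
 * nonzero. */
static uint8_t
decode_gbp_flags(uint32_t key, int has_mask, uint32_t mask_key)
{
    uint8_t flags, flags_mask = 0;      /* = 0 quiets -Wmaybe-uninitialized. */

    flags = (key >> 16) & 0xFF;         /* GBP flags live in bits 16..23. */
    if (has_mask) {
        flags_mask = (mask_key >> 16) & 0xFF;
    }
    /* Apply the mask only when one was supplied. */
    return flags & (has_mask ? flags_mask : 0xFF);
}
```

The alternative fix would be an `OVS_UNUSED`-style annotation or restructuring the switch, but a zero initializer is the smallest change.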
[ovs-dev] [PATCH] FAQ: Add entry for OVS/DPDK version dependencies.
For a given release this is listed in INSTALL.DPDK.md, but it gets asked
quite a bit on the mailing list, so create a table in the FAQ.

Signed-off-by: Kevin Traynor
---
 FAQ.md | 13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/FAQ.md b/FAQ.md
index 0fee992..5777471 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -233,6 +233,19 @@
 Validate flow actions  | YES | YES | N/A | NO
 Multiple datapaths     | YES | YES | YES | NO
 Tunnel TSO - STT       | N/A | YES | NO  | YES

+### Q: What DPDK version does each Open vSwitch release work with?
+
+A: The following table lists the DPDK version against which the
+   given versions of Open vSwitch will successfully build.
+
+| Open vSwitch | DPDK  |
+|:------------:|:-----:|
+| 2.2.x        | 1.6   |
+| 2.3.x        | 1.6   |
+| 2.4.x        | 2.0   |
+| 2.5.x        | 2.2   |
+| 2.6.x        | 16.04 |
+
 ### Q: I get an error like this when I configure Open vSwitch:

     configure: error: Linux kernel in is version , but
--
1.7.4.1
[ovs-dev] [PATCH] Add VxLAN-GBP support for user space data path
From: Johnson Li

In user space, only standard VxLAN is supported. This patch adds
VxLAN-GBP support to the user space data path.

How to use:
1> Create a VxLAN port with the GBP extension:
   $ovs-vsctl add-port br-int vxlan0 -- set interface vxlan0 \
         type=vxlan options:dst_port=6633 \
         options:remote_ip=192.168.60.22 \
         options:key=1000 options:exts=gbp
2> Add a flow for transmitting:
   $ovs-ofctl add-flow br-int "table=0, priority=260, \
         in_port=LOCAL actions=load:0x100->NXM_NX_TUN_GBP_ID[], \
         output:1"
3> Add a flow for receiving:
   $ovs-ofctl add-flow br-int "table=0, priority=260, \
         in_port=1,tun_gbp_id=0x100 actions=output:LOCAL"

Signed-off-by: Johnson Li

diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index e398562..e6b35ed 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -1286,6 +1286,7 @@ netdev_vxlan_pop_header(struct dp_packet *packet)
     struct flow_tnl *tnl = &md->tunnel;
     struct vxlanhdr *vxh;
     unsigned int hlen;
+    uint32_t vxh_flags;

     pkt_metadata_init_tnl(md);
     if (VXLAN_HLEN > dp_packet_l4_size(packet)) {
@@ -1297,10 +1298,16 @@ netdev_vxlan_pop_header(struct dp_packet *packet)
         return EINVAL;
     }

-    if (get_16aligned_be32(&vxh->vx_flags) != htonl(VXLAN_FLAGS) ||
-        (get_16aligned_be32(&vxh->vx_vni) & htonl(0xff))) {
-        VLOG_WARN_RL(&err_rl, "invalid vxlan flags=%#x vni=%#x\n",
-                     ntohl(get_16aligned_be32(&vxh->vx_flags)),
+    vxh_flags = get_16aligned_be32(&vxh->vx_flags);
+    if ((vxh_flags & htonl(VXLAN_GBP_FLAGS)) == htonl(VXLAN_GBP_FLAGS)) {
+        tnl->gbp_id = htons(ntohl(vxh_flags) & 0xffff); /* VXLAN_GBP_ID_MASK */
+    } else if (vxh_flags != htonl(VXLAN_FLAGS)) {
+        VLOG_WARN_RL(&err_rl, "invalid vxlan flags=%#x\n",
+                     ntohl(get_16aligned_be32(&vxh->vx_flags)));
+        return EINVAL;
+    }
+    if (get_16aligned_be32(&vxh->vx_vni) & htonl(0xff)) {
+        VLOG_WARN_RL(&err_rl, "invalid vxlan vni=%#x\n",
                      ntohl(get_16aligned_be32(&vxh->vx_vni)));
         return EINVAL;
     }
@@ -1328,7 +1335,12 @@ netdev_vxlan_build_header(const struct netdev *netdev,

     vxh = udp_build_header(tnl_cfg, tnl_flow, data, &hlen);

-    put_16aligned_be32(&vxh->vx_flags, htonl(VXLAN_FLAGS));
+    if (tnl_cfg->exts & (1 << OVS_VXLAN_EXT_GBP)) {
+        put_16aligned_be32(&vxh->vx_flags,
+                htonl(VXLAN_GBP_FLAGS | ntohs(tnl_flow->tunnel.gbp_id)));
+    } else {
+        put_16aligned_be32(&vxh->vx_flags, htonl(VXLAN_FLAGS));
+    }
     put_16aligned_be32(&vxh->vx_vni, htonl(ntohll(tnl_flow->tunnel.tun_id) << 8));

     ovs_mutex_unlock(&dev->mutex);
diff --git a/lib/packets.h b/lib/packets.h
index 8139a6b..d9a149d 100644
--- a/lib/packets.h
+++ b/lib/packets.h
@@ -1002,6 +1002,7 @@ struct vxlanhdr {
 };

 #define VXLAN_FLAGS 0x08000000  /* struct vxlanhdr.vx_flags required value. */
+#define VXLAN_GBP_FLAGS 0x88000000  /* Group Based Policy */

 void ipv6_format_addr(const struct in6_addr *addr, struct ds *);
 void ipv6_format_addr_bracket(const struct in6_addr *addr, struct ds *,
--
1.8.4.2
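The flag-word handling above can be modeled with plain host-order integers. This is a sketch, not the patch itself: the helper names are invented, and it assumes the usual VXLAN bit layout (0x08000000 is the VNI-valid "I" bit, 0x80000000 the GBP extension bit, and the low 16 bits of the first word carry the GBP policy ID):

```c
#include <assert.h>
#include <stdint.h>

#define VXLAN_FLAGS     0x08000000u
#define VXLAN_GBP_FLAGS (VXLAN_FLAGS | 0x80000000u)

/* Parse the first 32-bit word of a VXLAN header (host byte order here,
 * for simplicity).  Returns 0 on success and stores the GBP policy ID,
 * or 0 when the GBP extension is absent; returns -1 on invalid flags. */
static int
vxlan_parse_flags(uint32_t vx_flags, uint16_t *gbp_id)
{
    if ((vx_flags & VXLAN_GBP_FLAGS) == VXLAN_GBP_FLAGS) {
        *gbp_id = vx_flags & 0xFFFF;    /* VXLAN_GBP_ID_MASK */
        return 0;
    } else if (vx_flags != VXLAN_FLAGS) {
        return -1;                      /* Neither plain VXLAN nor GBP. */
    }
    *gbp_id = 0;
    return 0;
}

/* Build the flag word for transmission, mirroring the build_header hunk:
 * OR the policy ID into the word only when the GBP extension is enabled. */
static uint32_t
vxlan_build_flags(int gbp_enabled, uint16_t gbp_id)
{
    return gbp_enabled ? (VXLAN_GBP_FLAGS | gbp_id) : VXLAN_FLAGS;
}
```

Note that, as the review in this thread points out, a fuller version would also check and carry the individual GBP flag bits (tnl->gbp_flags), not just the policy ID.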
Re: [ovs-dev] [PATCH v7 05/16] dpif-netdev: Fix race condition in pmd thread initialization.
On 19.04.2016 10:18, Ilya Maximets wrote:
> There was a reason for the two calls to dp_netdev_pmd_reload_done()
> inside pmd_thread_main(): we must wait until the PMD thread is
> completely done with reloading. This patch introduces a race condition
> on pmd->exit_latch. While removing the last port on a NUMA node,
> dp_netdev_reload_pmd__(pmd) will be called twice for each port: first
> to remove the port, and second to destroy the PMD thread.
> pmd->exit_latch is set between these two calls. This can lead to a
> situation where the PMD thread exits while processing the first reload,
> so the main thread will wait forever on cond_wait in the second reload.
> The situation is easily reproducible by addition/deletion of the last
> port (possibly after a few iterations in a cycle).
>
> Best regards, Ilya Maximets.

This incremental should help:
--
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 588d56f..2235297 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2785,6 +2785,7 @@ pmd_thread_main(void *f_)
     unsigned int port_seq = PMD_INITIAL_SEQ;
     int poll_cnt;
     int i;
+    bool exiting;

     poll_cnt = 0;
     poll_list = NULL;
@@ -2825,14 +2826,15 @@ reload:
         }
     }

+    emc_cache_uninit(&pmd->flow_cache);
+
     poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
+    exiting = latch_is_set(&pmd->exit_latch);
     /* Signal here to make sure the pmd finishes
      * reloading the updated configuration. */
     dp_netdev_pmd_reload_done(pmd);

-    emc_cache_uninit(&pmd->flow_cache);
-
-    if (!latch_is_set(&pmd->exit_latch)){
+    if (!exiting) {
         goto reload;
     }
--

> On 08.04.2016 06:13, Daniele Di Proietto wrote:
>> The pmds and the main threads are synchronized using a condition
>> variable. The main thread writes a new configuration, then it waits on
>> the condition variable. A pmd thread reads the new configuration, then
>> it calls signal() on the condition variable. To make sure that the pmds
>> and the main thread have a consistent view, each signal() should be
>> backed by a wait().
>>
>> Currently the first signal() doesn't have a corresponding wait(). If
>> the pmd thread takes a long time to start and the signal() is received
>> by a later wait, the threads will have an inconsistent view.
>>
>> The commit fixes the problem by removing the first signal() from the
>> pmd thread.
>>
>> This is hardly a problem on current master, because the main thread
>> will call the first wait() a long time after the creation of a pmd
>> thread. It becomes a problem with the next commits.
>>
>> Signed-off-by: Daniele Di Proietto
>> ---
>>  lib/dpif-netdev.c | 21 +
>>  1 file changed, 9 insertions(+), 12 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 9c32c64..2424d3e 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -2652,21 +2652,22 @@ dpif_netdev_wait(struct dpif *dpif)
>>
>>  static int
>>  pmd_load_queues(struct dp_netdev_pmd_thread *pmd, struct rxq_poll **ppoll_list)
>> -    OVS_REQUIRES(pmd->poll_mutex)
>>  {
>>      struct rxq_poll *poll_list = *ppoll_list;
>>      struct rxq_poll *poll;
>>      int i;
>>
>> +    ovs_mutex_lock(&pmd->poll_mutex);
>>      poll_list = xrealloc(poll_list, pmd->poll_cnt * sizeof *poll_list);
>>
>>      i = 0;
>>      LIST_FOR_EACH (poll, node, &pmd->poll_list) {
>>          poll_list[i++] = *poll;
>>      }
>> +    ovs_mutex_unlock(&pmd->poll_mutex);
>>
>>      *ppoll_list = poll_list;
>> -    return pmd->poll_cnt;
>> +    return i;
>>  }
>>
>>  static void *
>> @@ -2685,13 +2686,10 @@ pmd_thread_main(void *f_)
>>      /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
>>      ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
>>      pmd_thread_setaffinity_cpu(pmd->core_id);
>> +    poll_cnt = pmd_load_queues(pmd, &poll_list);
>>  reload:
>>      emc_cache_init(&pmd->flow_cache);
>>
>> -    ovs_mutex_lock(&pmd->poll_mutex);
>> -    poll_cnt = pmd_load_queues(pmd, &poll_list);
>> -    ovs_mutex_unlock(&pmd->poll_mutex);
>> -
>>      /* List port/core affinity */
>>      for (i = 0; i < poll_cnt; i++) {
>>          VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
>> @@ -2699,10 +2697,6 @@ reload:
>>                   netdev_rxq_get_queue_id(poll_list[i].rx));
>>      }
>>
>> -    /* Signal here to make sure the pmd finishes
>> -     * reloading the updated configuration. */
>> -    dp_netdev_pmd_reload_done(pmd);
>> -
>>      for (;;) {
>>          for (i = 0; i < poll_cnt; i++) {
>>              dp_netdev_process_rxq_port(pmd, poll_list[i].port, poll_list[i].rx);
>> @@ -2725,14 +2719,17 @@ reload:
>>          }
>>      }
>>
>> +    poll_cnt = pmd_load_queues(pmd, &poll_list);
>> +    /* Signal here to make sure the pmd finishes
>> +     * reloading the updated configuration. */
>> +    dp_netdev_pmd_reload_done(pmd);
>> +
>>
[ovs-dev] Adding a new field in the flow
I want to add a new field to the flow. I checked the FAQ, but my problem
is that the field is not related to the packet. When I run
"ovs-ofctl add-flow br0 idle_timeout=180,priority=33000,new_field=1547,actions=output:2",
OVS extracts the field, but when I run "ovs-ofctl dump-flows br0" it
shows a different value, and if I execute the command again I see yet
another value. When I print the ofpbuf, the value of the new field (in
hex) is not in the flow packet. Can anyone help me with this? Thank you.

Here is what I got from add-flow:

root@root:~# ovs-ofctl add-flow br0 idle_timeout=180,priority=33000,new_field=1547,actions=output:2
2016-04-15T04:40:15Z|1|ofpparse|WARN|new_field ofp-parse 1547
2016-04-15T04:40:15Z|2|ofpparse|WARN|new_field 1547
2016-04-15T04:40:15Z|3|ofpparse|WARN|priority ofp-parse 33000
2016-04-15T04:40:15Z|4|ofctl|WARN|new_field 1547
2016-04-15T04:40:15Z|5|ofp_msgs|WARN|ofp-msgs msg size=8, allocated=8, head=0, tail=0
      04 00 00 08 00 00 00 03-
2016-04-15T04:40:15Z|6|ofp_util|WARN|ofm->match_nf is 1547 is 1547
2016-04-15T04:40:15Z|7|ofctl|WARN|ovs_ofctl debut size=88, allocated=88, head=0, tail=0
      01 0e 00 58 00 00 00 02-00 38 20 ff 00 00 00 00
0010  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
0020  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
0030  00 00 00 00 00 00 00 00-00 00 00 b4 00 00 80 e8
0040  ff ff ff ff ff ff 00 00-06 0b 00 00 00 00 00 00
0050  00 00 00 08 00 02 00 00-

Here is what I got from dump-flows:

root@root:~# ovs-ofctl dump-flows br0
2016-04-15T04:40:16Z|1|ofpparse|WARN| new_field is 35390
2016-04-15T04:40:16Z|2|ofpparse|WARN|priority ofp-parse 32768
2016-04-15T04:40:16Z|3|ofp_msgs|WARN|ofp-msgs msg debut size=8, allocated=8, head=0, tail=0
      04 00 00 08 00 00 00 04-
2016-04-15T04:40:16Z|4|ofp_util|WARN|ovs_ofctl msg debut size=40, allocated=104, head=0, tail=64
      01 10 00 18 00 00 00 04-ff ff 00 00 00 00 23 20
0010  00 00 00 00 00 00 00 00-ff ff 00 00 ff 00 00 00
0020  00 00 00 00 00 00 00 00-
2016-04-15T04:40:16Z|5|ofp_msgs|WARN|ofp-msgs msg debut size=88, allocated=88, head=0, tail=0
      01 11 00 58 00 00 00 04-ff ff 00 00 00 00 23 20
0010  00 00 00 00 00 00 00 00-00 40 00 00 00 00 00 01
0020  0c 56 91 c0 80 e8 00 b4-00 00 00 00 00 02 00 02
0030  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
0040  00 00 00 00 00 00 00 00-20 20 73 69 7a 65 3d 34
0050  00 00 00 08 00 02 00 00-
2016-04-15T04:40:16Z|7|ofp_util|WARN|match nf in NX is 0 in other is 8224

NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1.207s, table=0, n_packets=0, n_bytes=0, idle_timeout=180, idle_age=1, new_field=8224, priority=33000 actions=output:2

--
Amrane Ait Zeouay
Engineer Student in The Development of Software and Systems
University of Western Brittany
Tel: +33 7 62 25 56 03
[ovs-dev] [PATCH] ovn-northd: Add support for static_routes.
From: Guru Shetty

Static routes are useful when connecting multiple routers with each
other.

Signed-off-by: Gurucharan Shetty
Signed-off-by: steve.ruan
---
 ovn/northd/ovn-northd.8.xml   |   5 +-
 ovn/northd/ovn-northd.c       | 101 +
 ovn/ovn-nb.ovsschema          |  15 +++-
 ovn/ovn-nb.xml                |  31 +++
 ovn/utilities/ovn-nbctl.8.xml |   5 ++
 ovn/utilities/ovn-nbctl.c     |   5 ++
 tests/ovn.at                  | 204 ++
 7 files changed, 362 insertions(+), 4 deletions(-)

diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index da776e1..978853c 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -682,8 +682,9 @@
         next;

-        If the route has a gateway, G is the gateway IP address,
-        otherwise it is ip4.dst.
+        If the route has a gateway, G is the gateway IP address.
+        Instead, if the route is from a configured static route, G
+        is the next hop IP address. Else it is ip4.dst.

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 260c02f..71c9b29 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -234,6 +234,18 @@ allocate_tnlid(struct hmap *set, const char *name, uint32_t max,
     return 0;
 }

+/* Holds the next hop ip address and the logical router port via which
+ * a static route is reachable. */
+struct route_to_port {
+    ovs_be32 ip;        /* network address of the route. */
+    ovs_be32 mask;      /* network mask of the route. */
+    ovs_be32 next_hop;  /* next_hop ip address for the above route. */
+    struct uuid rport;  /* output port specified by CMS, or null if not
+                         * specified. */
+    struct ovn_port *ovn_port; /* The logical router port via which the packet
+                                * needs to exit to reach the next hop. */
+};
+
 /* The 'key' comes from nbs->header_.uuid or nbr->header_.uuid or
  * sb->external_ids:logical-switch. */
 struct ovn_datapath {
@@ -249,6 +261,9 @@ struct ovn_datapath {
     /* Logical router data (digested from nbr). */
     const struct ovn_port *gateway_port;
     ovs_be32 gateway;
+    /* Maps a static route to a ovn logical router port via which packet
+     * needs to exit. */
+    struct shash routes_map;

     /* Logical switch data. */
     struct ovn_port **router_ports;
@@ -272,6 +287,7 @@ ovn_datapath_create(struct hmap *datapaths, const struct uuid *key,
     od->nbs = nbs;
     od->nbr = nbr;
     hmap_init(&od->port_tnlids);
+    shash_init(&od->routes_map);
     od->port_key_hint = 0;
     hmap_insert(datapaths, &od->key_node, uuid_hash(&od->key));
     return od;
@@ -286,6 +302,7 @@ ovn_datapath_destroy(struct hmap *datapaths, struct ovn_datapath *od)
          * use it. */
         hmap_remove(datapaths, &od->key_node);
         destroy_tnlids(&od->port_tnlids);
+        shash_destroy_free_data(&od->routes_map);
         free(od->router_ports);
         free(od);
     }
@@ -318,6 +335,50 @@ ovn_datapath_from_sbrec(struct hmap *datapaths,
 }

 static void
+build_static_route(struct ovn_datapath *od,
+                   const struct nbrec_logical_router_static_route *route)
+{
+    ovs_be32 prefix, next_hop, mask;
+
+    /* verify nexthop */
+    char *error = ip_parse_masked(route->nexthop, &next_hop, &mask);
+    if (error || mask != OVS_BE32_MAX) {
+        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+        VLOG_WARN_RL(&rl, "bad next hop ip address %s", route->nexthop);
+        free(error);
+        return;
+    }
+
+    /* verify prefix */
+    error = ip_parse_masked(route->prefix, &prefix, &mask);
+    if (error || !ip_is_cidr(mask)) {
+        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+        VLOG_WARN_RL(&rl, "bad 'network' in static routes %s", route->prefix);
+        free(error);
+        return;
+    }
+
+    struct uuid lrp_uuid;
+    struct route_to_port *route_port = xmalloc(sizeof *route_port);
+    route_port->ip = prefix;
+    route_port->mask = mask;
+    route_port->next_hop = next_hop;
+    /* The correct ovn_port will be filled while traversing
+     * logical_router_ports. */
+    route_port->ovn_port = NULL;
+    memset(&route_port->rport, 0, sizeof(struct uuid));
+    if (route->output_port
+        && uuid_from_string(&lrp_uuid, route->output_port)) {
+        memcpy(&route_port->rport, &lrp_uuid, sizeof(struct uuid));
+    }
+    shash_add(&od->routes_map, route->prefix, route_port);
+}
+
+static void
 join_datapaths(struct northd_context *ctx, struct hmap *datapaths,
                struct ovs_list *sb_only, struct ovs_list *nb_only,
                struct ovs_list *both)
@@ -409,6 +470,14 @@ join_datapaths(struct northd_context *ctx, struct hmap *datapaths,
             /* Set the gateway port to NULL. If there is a gateway, it will
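The validation in build_static_route() boils down to two checks: the next hop must parse as a host address (a /32 mask), and the route prefix must carry a contiguous CIDR mask. A minimal sketch of that logic on plain host-order IPv4 masks (the helper names `mask_is_cidr` and `static_route_valid` are invented here; OVS's real helpers are `ip_parse_masked()` and `ip_is_cidr()` on `ovs_be32`):

```c
#include <assert.h>
#include <stdint.h>

/* True if 'mask' has all of its 1-bits contiguous from the MSB down,
 * i.e. it is a valid CIDR netmask (mirrors the intent of ip_is_cidr()).
 * ~mask must then be of the form 0...01...1, i.e. one less than a power
 * of two, which the (x & (x + 1)) == 0 test detects. */
static int
mask_is_cidr(uint32_t mask)
{
    uint32_t inv = ~mask;
    return (inv & (inv + 1)) == 0;
}

/* Models the two early-return checks in build_static_route(). */
static int
static_route_valid(uint32_t prefix_mask, uint32_t nexthop_mask)
{
    if (nexthop_mask != UINT32_MAX) {
        return 0;                       /* "bad next hop ip address" */
    }
    return mask_is_cidr(prefix_mask);   /* else "bad 'network'" */
}
```

A rejected route is simply skipped with a rate-limited warning rather than failing the whole northd run, which matches the patch's behavior.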
[ovs-dev] [PATCH v1] Support port level IPFIX
From: Benli Ye

This patch enables port-level IPFIX. Before this patch, OVS supported
per-bridge IPFIX and per-flow IPFIX, and exporting packet tunnel headers
was only supported by bridge IPFIX. This patch adds port-level IPFIX for
easy configuration, and port-level IPFIX also supports exporting packet
tunnel headers, just the same as bridge-level IPFIX. Three main things
are done in this patch:
1) Add a column ipfix in the Port table to ref the IPFIX table
2) Each interface in the port uses the port's IPFIX configuration
3) A hash map is used to manage the ports on which IPFIX is configured

CLI to configure port IPFIX:
1) Configure
   ovs-vsctl -- set Port port0 ipfix=@i -- --id=@i create IPFIX \
       targets=\"10.24.122.72:4739\" sampling=1 obs_domain_id=123 \
       obs_point_id=456 cache_active_timeout=1 cache_max_flows=128 \
       other_config:enable-tunnel-sampling=true
2) Clear
   ovs-vsctl clear Port port0 ipfix
---
 lib/odp-util.c                |  29 ++-
 lib/odp-util.h                |  19 +-
 ofproto/ofproto-dpif-ipfix.c  | 403 +++---
 ofproto/ofproto-dpif-ipfix.h  |  17 ++
 ofproto/ofproto-dpif-upcall.c |  39 +++-
 ofproto/ofproto-dpif-xlate.c  | 117 
 ofproto/ofproto-dpif-xlate.h  |   3 +-
 ofproto/ofproto-dpif.c        |  19 +-
 ofproto/ofproto-provider.h    |   7 +-
 ofproto/ofproto.c             |   7 +-
 ofproto/ofproto.h             |  23 +++
 vswitchd/bridge.c             | 123 +++--
 vswitchd/vswitch.ovsschema    |   6 +-
 vswitchd/vswitch.xml          |  34 +++-
 14 files changed, 741 insertions(+), 105 deletions(-)

diff --git a/lib/odp-util.c b/lib/odp-util.c
index b4689cc..453ae4f 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -316,10 +316,16 @@ format_odp_userspace_action(struct ds *ds, const struct nlattr *attr)
                           cookie.flow_sample.collector_set_id,
                           cookie.flow_sample.obs_domain_id,
                           cookie.flow_sample.obs_point_id);
-        } else if (userdata_len >= sizeof cookie.ipfix
-                   && cookie.type == USER_ACTION_COOKIE_IPFIX) {
-            ds_put_format(ds, ",ipfix(output_port=%"PRIu32")",
-                          cookie.ipfix.output_odp_port);
+        } else if (userdata_len >= sizeof cookie.bridge_ipfix
+                   && cookie.type == USER_ACTION_COOKIE_BRIDGE_IPFIX) {
+            ds_put_format(ds, ",bridge_ipfix(output_port=%"PRIu32")",
+                          cookie.bridge_ipfix.output_odp_port);
+        } else if (userdata_len >= sizeof cookie.port_ipfix
+                   && cookie.type == USER_ACTION_COOKIE_PORT_IPFIX) {
+            ds_put_format(ds, ",port_ipfix(ofp_port=%"PRIu16
+                          ",output_port=%"PRIu32")",
+                          cookie.port_ipfix.ofp_port,
+                          cookie.port_ipfix.output_odp_port);
         } else {
             userdata_unspec = true;
         }
@@ -963,13 +969,20 @@ parse_odp_userspace_action(const char *s, struct ofpbuf *actions)
             cookie.flow_sample.obs_point_id = obs_point_id;
             user_data = &cookie;
             user_data_size = sizeof cookie.flow_sample;
-        } else if (ovs_scan(&s[n], ",ipfix(output_port=%"SCNi32")%n",
-                            &output, &n1)) {
+        } else if (ovs_scan(&s[n], ",bridge_ipfix(output_port=%"SCNi32")%n",
+                            &output, &n1)) {
             n += n1;
-            cookie.type = USER_ACTION_COOKIE_IPFIX;
-            cookie.ipfix.output_odp_port = u32_to_odp(output);
+            cookie.type = USER_ACTION_COOKIE_BRIDGE_IPFIX;
+            cookie.bridge_ipfix.output_odp_port = u32_to_odp(output);
             user_data = &cookie;
-            user_data_size = sizeof cookie.ipfix;
+            user_data_size = sizeof cookie.bridge_ipfix;
+        } else if (ovs_scan(&s[n], ",port_ipfix(output_port=%"SCNi32")%n",
+                            &output, &n1)) {
+            n += n1;
+            cookie.type = USER_ACTION_COOKIE_PORT_IPFIX;
+            cookie.port_ipfix.output_odp_port = u32_to_odp(output);
+            user_data = &cookie;
+            user_data_size = sizeof cookie.port_ipfix;
         } else if (ovs_scan(&s[n], ",userdata(%n", &n1)) {
             char *end;
diff --git a/lib/odp-util.h b/lib/odp-util.h
index 51cf5c3..4c9f271 100644
--- a/lib/odp-util.h
+++ b/lib/odp-util.h
@@ -274,10 +274,11 @@ enum slow_path_reason commit_odp_actions(const struct flow *,
 enum user_action_cookie_type {
     USER_ACTION_COOKIE_UNSPEC,
-    USER_ACTION_COOKIE_SFLOW,        /* Packet for per-bridge sFlow sampling. */
-    USER_ACTION_COOKIE_SLOW_PATH,    /* Userspace must process this flow. */
-    USER_ACTION_COOKIE_FLOW_SAMPLE,  /* Packet for per-flow sampling. */
-    USER_ACTION_COOKIE_IPFIX,        /* Packet for per-bridge IPFIX sampling. */
+    USER_ACTION_COOKIE_SFLOW,        /* Packet for
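The core of the odp-util change is splitting the single IPFIX cookie type into bridge and port variants and dispatching on the type when formatting. A self-contained sketch of that dispatch (the enum names follow the patch, but the struct layout and function below are illustrative, not OVS's actual `union user_action_cookie`):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* The patch replaces USER_ACTION_COOKIE_IPFIX with two variants. */
enum user_action_cookie_type {
    USER_ACTION_COOKIE_UNSPEC,
    USER_ACTION_COOKIE_SFLOW,
    USER_ACTION_COOKIE_SLOW_PATH,
    USER_ACTION_COOKIE_FLOW_SAMPLE,
    USER_ACTION_COOKIE_BRIDGE_IPFIX,   /* Packet for per-bridge IPFIX. */
    USER_ACTION_COOKIE_PORT_IPFIX,     /* Packet for per-port IPFIX. */
};

/* Flattened stand-in for the cookie union's IPFIX members. */
struct ipfix_cookie {
    enum user_action_cookie_type type;
    uint16_t ofp_port;          /* Only meaningful for PORT_IPFIX. */
    uint32_t output_odp_port;
};

/* Formats the cookie the way format_odp_userspace_action() does after
 * the patch; returns the number of characters written. */
static int
format_ipfix_cookie(const struct ipfix_cookie *c, char *buf, size_t n)
{
    switch (c->type) {
    case USER_ACTION_COOKIE_BRIDGE_IPFIX:
        return snprintf(buf, n, "bridge_ipfix(output_port=%u)",
                        (unsigned) c->output_odp_port);
    case USER_ACTION_COOKIE_PORT_IPFIX:
        return snprintf(buf, n, "port_ipfix(ofp_port=%u,output_port=%u)",
                        (unsigned) c->ofp_port,
                        (unsigned) c->output_odp_port);
    default:
        return snprintf(buf, n, "userdata(unspec)");
    }
}
```

Note the asymmetry in the quoted patch: the formatter prints `ofp_port` for port IPFIX, but the parser only scans `output_port`, so the two directions do not round-trip as written.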
Re: [ovs-dev] [PATCH v7 08/16] dpif-netdev: Add pmd thread local port cache for transmission.
On 19.04.2016 00:19, Daniele Di Proietto wrote:
>
> On 18/04/2016 07:50, "Ilya Maximets" wrote:
>
>> On 08.04.2016 06:13, Daniele Di Proietto wrote:
>>> Signed-off-by: Daniele Di Proietto
>>> ---
>>>  lib/dpif-netdev.c | 243 +++---
>>>  1 file changed, 175 insertions(+), 68 deletions(-)
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index 8c5893d..5d1cc43 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -185,6 +185,7 @@ static bool dpcls_lookup(const struct dpcls *cls,
>>>  *
>>>  *    dp_netdev_mutex (global)
>>>  *    port_mutex
>>> + *    non_pmd_mutex
>>>  */
>>>  struct dp_netdev {
>>>      const struct dpif_class *const class;
>>> @@ -380,6 +381,13 @@ struct rxq_poll {
>>>      struct ovs_list node;
>>>  };
>>>
>>> +/* Contained by struct dp_netdev_pmd_thread's 'port_cache' or 'tx_ports'. */
>>> +struct tx_port {
>>> +    odp_port_t port_no;
>>> +    struct netdev *netdev;
>>> +    struct hmap_node node;
>>> +};
>>> +
>>>  /* PMD: Poll modes drivers. PMD accesses devices via polling to eliminate
>>>   * the performance overhead of interrupt processing. Therefore netdev can
>>>   * not implement rx-wait for these devices. dpif-netdev needs to poll
>>> @@ -436,10 +444,18 @@ struct dp_netdev_pmd_thread {
>>>      atomic_int tx_qid;          /* Queue id used by this pmd thread to
>>>                                   * send packets on all netdevs */
>>>
>>> -    struct ovs_mutex poll_mutex;    /* Mutex for poll_list. */
>>> +    struct ovs_mutex port_mutex;    /* Mutex for 'poll_list' and 'tx_ports'. */
>>>      /* List of rx queues to poll. */
>>>      struct ovs_list poll_list OVS_GUARDED;
>>> -    int poll_cnt;                   /* Number of elemints in poll_list. */
>>> +    /* Number of elements in 'poll_list' */
>>> +    int poll_cnt;
>>> +    /* Map of 'tx_port's used for transmission.  Written by the main thread,
>>> +     * read by the pmd thread. */
>>> +    struct hmap tx_ports OVS_GUARDED;
>>> +
>>> +    /* Map of 'tx_port' used in the fast path. This is a thread-local copy
>>> +     * of 'tx_ports'. */
>>> +    struct hmap port_cache;
>>>
>>>      /* Only a pmd thread can write on its own 'cycles' and 'stats'.
>>>       * The main thread keeps 'stats_zero' and 'cycles_zero' as base
>>> @@ -495,7 +511,7 @@ dp_netdev_pmd_get_next(struct dp_netdev *dp, struct cmap_position *pos);
>>>  static void dp_netdev_destroy_all_pmds(struct dp_netdev *dp);
>>>  static void dp_netdev_del_pmds_on_numa(struct dp_netdev *dp, int numa_id);
>>>  static void dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int numa_id);
>>> -static void dp_netdev_pmd_clear_poll_list(struct dp_netdev_pmd_thread *pmd);
>>> +static void dp_netdev_pmd_clear_ports(struct dp_netdev_pmd_thread *pmd);
>>>  static void dp_netdev_del_port_from_all_pmds(struct dp_netdev *dp,
>>>                                               struct dp_netdev_port *port);
>>>  static void
>>> @@ -509,6 +525,8 @@ static void dp_netdev_reset_pmd_threads(struct dp_netdev *dp);
>>>  static bool dp_netdev_pmd_try_ref(struct dp_netdev_pmd_thread *pmd);
>>>  static void dp_netdev_pmd_unref(struct dp_netdev_pmd_thread *pmd);
>>>  static void dp_netdev_pmd_flow_flush(struct dp_netdev_pmd_thread *pmd);
>>> +static void pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd)
>>> +    OVS_REQUIRES(pmd->port_mutex);
>>>
>>>  static inline bool emc_entry_alive(struct emc_entry *ce);
>>>  static void emc_clear_entry(struct emc_entry *ce);
>>> @@ -691,7 +709,7 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
>>>      ds_put_format(reply, "pmd thread numa_id %d core_id %u:\n",
>>>                    pmd->numa_id, pmd->core_id);
>>>
>>> -    ovs_mutex_lock(&pmd->poll_mutex);
>>> +    ovs_mutex_lock(&pmd->port_mutex);
>>>      LIST_FOR_EACH (poll, node, &pmd->poll_list) {
>>>          const char *name = netdev_get_name(poll->port->netdev);
>>>
>>> @@ -705,7 +723,7 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
>>>              ds_put_format(reply, " %d", netdev_rxq_get_queue_id(poll->rx));
>>>              prev_name = name;
>>>          }
>>> -    ovs_mutex_unlock(&pmd->poll_mutex);
>>> +    ovs_mutex_unlock(&pmd->port_mutex);
>>>          ds_put_cstr(reply, "\n");
>>>      }
>>>  }
>>> @@ -1078,6 +1096,11 @@ dp_netdev_reload_pmd__(struct dp_netdev_pmd_thread *pmd)
>>>      int old_seq;
>>>
>>>      if (pmd->core_id == NON_PMD_CORE_ID) {
>>> +        ovs_mutex_lock(&pmd->dp->non_pmd_mutex);
>>> +        ovs_mutex_lock(&pmd->port_mutex);
>>> +        pmd_load_cached_ports(pmd);
>>> +        ovs_mutex_unlock(&pmd->port_mutex);
>>> +        ovs_mutex_unlock(&pmd->dp->non_pmd_mutex);
>>>          return;
>>>      }
>>>
>>> @@ -1200,9 +1223,7 @@ do_add_port(struct dp_netdev *dp, const char
Re: [ovs-dev] [PATCH v7 05/16] dpif-netdev: Fix race condition in pmd thread initialization.
There was a reason for the two calls to dp_netdev_pmd_reload_done() inside
pmd_thread_main(): we must wait until the PMD thread is completely done
with reloading. This patch introduces a race condition on pmd->exit_latch.
While removing the last port on a NUMA node, dp_netdev_reload_pmd__(pmd)
will be called twice for each port: first to remove the port, and second
to destroy the PMD thread. pmd->exit_latch is set between these two calls.
This can lead to a situation where the PMD thread exits while processing
the first reload, so the main thread will wait forever on cond_wait in the
second reload. The situation is easily reproducible by addition/deletion
of the last port (possibly after a few iterations in a cycle).

Best regards, Ilya Maximets.

On 08.04.2016 06:13, Daniele Di Proietto wrote:
> The pmds and the main threads are synchronized using a condition
> variable. The main thread writes a new configuration, then it waits on
> the condition variable. A pmd thread reads the new configuration, then
> it calls signal() on the condition variable. To make sure that the pmds
> and the main thread have a consistent view, each signal() should be
> backed by a wait().
>
> Currently the first signal() doesn't have a corresponding wait(). If
> the pmd thread takes a long time to start and the signal() is received
> by a later wait, the threads will have an inconsistent view.
>
> The commit fixes the problem by removing the first signal() from the
> pmd thread.
>
> This is hardly a problem on current master, because the main thread
> will call the first wait() a long time after the creation of a pmd
> thread. It becomes a problem with the next commits.
>
> Signed-off-by: Daniele Di Proietto
> ---
>  lib/dpif-netdev.c | 21 +
>  1 file changed, 9 insertions(+), 12 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 9c32c64..2424d3e 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -2652,21 +2652,22 @@ dpif_netdev_wait(struct dpif *dpif)
>
>  static int
>  pmd_load_queues(struct dp_netdev_pmd_thread *pmd, struct rxq_poll **ppoll_list)
> -    OVS_REQUIRES(pmd->poll_mutex)
>  {
>      struct rxq_poll *poll_list = *ppoll_list;
>      struct rxq_poll *poll;
>      int i;
>
> +    ovs_mutex_lock(&pmd->poll_mutex);
>      poll_list = xrealloc(poll_list, pmd->poll_cnt * sizeof *poll_list);
>
>      i = 0;
>      LIST_FOR_EACH (poll, node, &pmd->poll_list) {
>          poll_list[i++] = *poll;
>      }
> +    ovs_mutex_unlock(&pmd->poll_mutex);
>
>      *ppoll_list = poll_list;
> -    return pmd->poll_cnt;
> +    return i;
>  }
>
>  static void *
> @@ -2685,13 +2686,10 @@ pmd_thread_main(void *f_)
>      /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
>      ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
>      pmd_thread_setaffinity_cpu(pmd->core_id);
> +    poll_cnt = pmd_load_queues(pmd, &poll_list);
>  reload:
>      emc_cache_init(&pmd->flow_cache);
>
> -    ovs_mutex_lock(&pmd->poll_mutex);
> -    poll_cnt = pmd_load_queues(pmd, &poll_list);
> -    ovs_mutex_unlock(&pmd->poll_mutex);
> -
>      /* List port/core affinity */
>      for (i = 0; i < poll_cnt; i++) {
>          VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
> @@ -2699,10 +2697,6 @@ reload:
>                   netdev_rxq_get_queue_id(poll_list[i].rx));
>      }
>
> -    /* Signal here to make sure the pmd finishes
> -     * reloading the updated configuration. */
> -    dp_netdev_pmd_reload_done(pmd);
> -
>      for (;;) {
>          for (i = 0; i < poll_cnt; i++) {
>              dp_netdev_process_rxq_port(pmd, poll_list[i].port, poll_list[i].rx);
> @@ -2725,14 +2719,17 @@ reload:
>          }
>      }
>
> +    poll_cnt = pmd_load_queues(pmd, &poll_list);
> +    /* Signal here to make sure the pmd finishes
> +     * reloading the updated configuration. */
> +    dp_netdev_pmd_reload_done(pmd);
> +
>      emc_cache_uninit(&pmd->flow_cache);
>
>      if (!latch_is_set(&pmd->exit_latch)){
>          goto reload;
>      }
>
> -    dp_netdev_pmd_reload_done(pmd);
> -
>      free(poll_list);
>      return NULL;
>  }
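The fix discussed in this thread hinges on one ordering rule: the worker must snapshot the exit flag *before* acknowledging a reload, because the main thread may set the flag and request a second reload as soon as the ack is observed. A minimal pthread model of that handshake (this is not the dpif-netdev code; all names are invented, and a plain mutex/condvar predicate replaces OVS's seq/latch machinery):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct reload_sync {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int reload_pending;         /* Set by main, cleared by the worker's ack. */
    atomic_int exit_requested;  /* Models pmd->exit_latch. */
    atomic_int reloads_done;
};

static void *
worker_main(void *arg)
{
    struct reload_sync *s = arg;
    for (;;) {
        pthread_mutex_lock(&s->mutex);
        while (!s->reload_pending) {            /* Predicate loop: no lost */
            pthread_cond_wait(&s->cond, &s->mutex); /* or spurious wakeups. */
        }
        /* Snapshot BEFORE acking, as in the incremental fix: once we ack,
         * the main thread may set exit_requested and wait for a further
         * reload that we must not miss. */
        int exiting = atomic_load(&s->exit_requested);
        s->reload_pending = 0;
        atomic_fetch_add(&s->reloads_done, 1);
        pthread_cond_signal(&s->cond);          /* Ack: reload finished. */
        pthread_mutex_unlock(&s->mutex);
        if (exiting) {
            return NULL;
        }
    }
}

/* Main-thread side: every signal() is backed by a wait() for the ack,
 * which is the invariant the original commit message describes. */
static void
request_reload(struct reload_sync *s)
{
    pthread_mutex_lock(&s->mutex);
    s->reload_pending = 1;
    pthread_cond_signal(&s->cond);
    while (s->reload_pending) {
        pthread_cond_wait(&s->cond, &s->mutex); /* Wait for the ack. */
    }
    pthread_mutex_unlock(&s->mutex);
}
```

With the snapshot taken after the ack instead, the worker could observe `exit_requested` set during the first reload and exit early, leaving the main thread blocked forever in its second `request_reload()` — exactly the hang described above.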