[ovs-dev] DPDK Merge Repo
Hi All As mentioned before, I am using a repo for DPDK patch merging. The repo is here: https://github.com/darball/ovs/ There are still some outstanding patches from Bhanu that have not completed review yet: util: Add PADDED_MEMBERS_CACHELINE_MARKER macro to mark cachelines.- Bhanu packets: Reorganize the pkt_metadata structure. - Bhanu and a series we would like to get into 2.8 netdev-dpdk: Use intermediate queue during packet transmission. Bhanu Jun 29/V3 netdev: Add netdev_txq_flush function. netdev-dpdk: Add netdev_dpdk_txq_flush function. netdev-dpdk: Add netdev_dpdk_vhost_txq_flush function. netdev-dpdk: Add intermediate queue support. netdev-dpdk: Enable intermediate queue for vHost User port. dpif-netdev: Flush the packets in intermediate queue. Please let me know if something else is approved but missed ? Anything else ? Thanks Darrell ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v4] Update relevant artifacts to add support for DPDK 17.05.1.
> -Original Message- > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- > boun...@openvswitch.org] On Behalf Of Timothy M. Redaelli > Sent: Tuesday, August 1, 2017 11:37 AM > To: Weglicki, MichalX ; d...@openvswitch.org > Subject: Re: [ovs-dev] [PATCH v4] Update relevant artifacts to add support for > DPDK 17.05.1. > > On 07/25/2017 02:34 PM, Michal Weglicki wrote: > > Upgrading to DPDK 17.05.1 stable release adds new > > significant features relevant to OVS, including, > > but not limited to: > > - tun/tap PMD, > > - VFIO hotplug support, > > - Generic flow API. > > > > Following changes are applied: > > - netdev-dpdk: Changes required by DPDK API modifications. > > - doc: Because of DPDK API changes, backward compatibility > >with previous DPDK releases will be broken, thus all > >relevant documentation entries are updated. > > - .travis: DPDK version change from 16.11.1 to 17.05.1. > > - rhel/openvswitch-fedora.spec.in: DPDK version change > >from 16.11 to 17.05.1 > > > > v1->v2: Patch rebase. > > v2->v3: Fixed wrong formating after v2 patch rebase. > > v3->v4: Minor documentation changes. > > > > Signed-off-by: Michal Weglicki > > Reviewed-by: Aaron Conole > > --- > > .travis/linux-build.sh | 2 +- > > Documentation/faq/releases.rst | 1 + > > Documentation/howto/dpdk.rst | 6 +- > > Documentation/intro/install/dpdk.rst | 14 +-- > > Documentation/topics/dpdk/vhost-user.rst | 12 +-- > > NEWS | 1 + > > lib/netdev-dpdk.c| 144 > > +++ > > rhel/openvswitch-fedora.spec.in | 2 +- > > tests/dpdk/ring_client.c | 6 +- > > 9 files changed, 114 insertions(+), 74 deletions(-) > > > [...] > > diff --git a/rhel/openvswitch-fedora.spec.in > > b/rhel/openvswitch-fedora.spec.in > > index 3a045d3..2bb7102 100644 > > --- a/rhel/openvswitch-fedora.spec.in > > +++ b/rhel/openvswitch-fedora.spec.in > > @@ -84,7 +84,7 @@ BuildRequires: libcap-ng libcap-ng-devel > > %endif > > %if %{with dpdk} > > BuildRequires: libpcap-devel numactl-devel > > -BuildRequires: dpdk-devel >= 16.11 > > +BuildRequires: dpdk-devel >= 17.05.1 > > Provides: %{name}-dpdk = %{version}-%{release} > > %endif > > > > > Hi, > on Fedora there is no dpdk-devel 17.05.1 package, the last rawhide > version is dpdk-devel 17.05 (and usually Fedora doesn't package minor > releases of dpdk). > > I tried to build openvswitch with the dpdk 17.05 package from rawhide. > It build and it works (tested with dpdkvhostuserclient) so, unless you > need a specific fix from 17.05.1, I suggest to change the BuildRequires > to 17.05 instead of 17.05.1 or the package cannot be built on Fedora, > unless you are using an additional repository (copr). Hi, If I'm not mistaken we moved to 17.05.1 version, because it includes fix for 'vhost: fix crash on NUMA', which was causing ovs-vswitchd to terminate. More details here: http://dpdk.org/dev/patchwork/patch/25012/ Regards, Przemek -- Intel Research and Development Ireland Limited Registered in Ireland Registered Office: Collinstown Industrial Park, Leixlip, County Kildare Registered Number: 308263 This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v5] dpctl: Add new 'ct-bkts' command.
Hi Antonio I noticed a problem in final testing and I made a small incremental change. Pls let me know if this is ok ? Thanks Darrell diff --git a/lib/dpctl.c b/lib/dpctl.c index 8eb0671..59f09b7 100644 --- a/lib/dpctl.c +++ b/lib/dpctl.c @@ -1518,8 +1518,7 @@ dpctl_ct_bkts(int argc, const char *argv[], dpctl_print(dpctl_p, "Total Buckets: %d\n", tot_bkts); int tot_conn = 0; -uint32_t conn_per_bkts[tot_bkts]; -memset(conn_per_bkts, 0, sizeof(uint32_t) * tot_bkts); +uint32_t *conn_per_bkts = xzalloc(tot_bkts * sizeof(uint32_t)); while (!ct_dpif_dump_next(dump, &cte)) { ct_dpif_entry_uninit(&cte); @@ -1560,6 +1559,7 @@ dpctl_ct_bkts(int argc, const char *argv[], ct_dpif_dump_done(dump); dpif_close(dpif); +free(conn_per_bkts); return error; } -Original Message- From: on behalf of "antonio.fische...@intel.com" Date: Monday, July 24, 2017 at 3:01 AM To: "d...@openvswitch.org" Subject: [ovs-dev] [PATCH v5] dpctl: Add new 'ct-bkts' command. With the command: ovs-appctl dpctl/ct-bkts shows the number of connections per bucket. By using a threshold: ovs-appctl dpctl/ct-bkts gt=N for each bucket shows the number of connections when they are greater than N. Signed-off-by: Antonio Fischetti Signed-off-by: Bhanuprakash Bodireddy Co-authored-by: Bhanuprakash Bodireddy --- lib/conntrack.c| 9 ++-- lib/conntrack.h| 2 +- lib/ct-dpif.c | 4 +- lib/ct-dpif.h | 3 +- lib/dpctl.c| 103 - lib/dpctl.man | 7 +++ lib/dpif-netdev.c | 4 +- lib/dpif-netlink.c | 4 +- lib/dpif-provider.h| 2 +- lib/netlink-conntrack.c| 6 ++- lib/netlink-conntrack.h| 3 +- tests/test-netlink-conntrack.c | 3 +- utilities/ovs-dpctl.c | 1 + 13 files changed, 134 insertions(+), 17 deletions(-) diff --git a/lib/conntrack.c b/lib/conntrack.c index de46a6b..e290b20 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -1931,7 +1931,7 @@ conn_key_to_tuple(const struct conn_key *key, struct ct_dpif_tuple *tuple) static void conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry, - long long now) + long long now, int bkt) { struct ct_l4_proto *class; long long expiration; @@ -1954,11 +1954,12 @@ conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry, if (class->conn_get_protoinfo) { class->conn_get_protoinfo(conn, &entry->protoinfo); } +entry->bkt = bkt; } int conntrack_dump_start(struct conntrack *ct, struct conntrack_dump *dump, - const uint16_t *pzone) + const uint16_t *pzone, int *ptot_bkts) { memset(dump, 0, sizeof(*dump)); if (pzone) { @@ -1967,6 +1968,8 @@ conntrack_dump_start(struct conntrack *ct, struct conntrack_dump *dump, } dump->ct = ct; +*ptot_bkts = CONNTRACK_BUCKETS; + return 0; } @@ -1991,7 +1994,7 @@ conntrack_dump_next(struct conntrack_dump *dump, struct ct_dpif_entry *entry) INIT_CONTAINER(conn, node, node); if ((!dump->filter_zone || conn->key.zone == dump->zone) && (conn->conn_type != CT_CONN_TYPE_UN_NAT)) { -conn_to_ct_dpif_entry(conn, entry, now); +conn_to_ct_dpif_entry(conn, entry, now, dump->bucket); break; } /* Else continue, until we find an entry in the appropriate zone diff --git a/lib/conntrack.h b/lib/conntrack.h index defde4c..3f48444 100644 --- a/lib/conntrack.h +++ b/lib/conntrack.h @@ -108,7 +108,7 @@ struct conntrack_dump { struct ct_dpif_entry; int conntrack_dump_start(struct conntrack *, struct conntrack_dump *, - const uint16_t *pzone); + const uint16_t *pzone, int *); int conntrack_dump_next(struct conntrack_dump *, struct ct_dpif_entry *); int conntrack_dump_done(struct conntrack_dump *); diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c index f8d2cf1..c79e69e 100644 --- a/lib/ct-dpif.c +++ b/lib/ct-dpif.c @@ -65,12 +65,12 @@ static const struct flags ct_dpif_status_flags[] = { * that represents the error. Otherwise it returns zero. */ int ct_dpif_dump_start(struct dpif *dpif, struct ct_dpif_dump_state **dump, - const uint16_t *zone) + const uint16_t *zone, int *ptot_bkts) { int err; err =
Re: [ovs-dev] [PATCH] Prepare for 2.8.0.
> On Aug 1, 2017, at 12:09 PM, Ben Pfaff wrote: > > Signed-off-by: Ben Pfaff I usually hold off on putting the changes in "debian/changelog", since NEWS often continues to grow post-branch, and it's easy to miss updates. However, I don't know how often people look at that file, so it may not matter. Acked-by: Justin Pettit --Justin ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] RFC: Let's branch for 2.8 today
Hi, Ben Ericsson, Redhat and Intel has thought OVS 2.8 would include NSH support, there are some projects depending on this, for example, Opendaylight (Openflowplugin, SFC and NetVirt), OPNFV and Openstack (networking-sfc). This email is long and easy to make people misunderstand :-) my readout is OVS 2.8 will include NSH support although it is branched now, can you double confirm this? My stakeholders need an affirmative answer. -Original Message- From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Ben Pfaff Sent: Wednesday, August 2, 2017 1:24 AM To: d...@openvswitch.org Subject: [ovs-dev] RFC: Let's branch for 2.8 today I'd like to create branch-2.8 today in preparation for releasing OVS 2.8 later in August. I sent the patch that would lead off the branch a few minutes ago: https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/336544.html According to our published schedule, we should have create branch-2.8 at the beginning of July, but we delayed to allow some more features to enter the 2.8 release. There continue to be reasonable arguments along these lines in favor of delaying branching further: * Many DPDK patches have been posted but are still under review or awaiting application. * Multiple series of patches leading up to NSH support have been posted but are still iterating through reviews. The primary argument in favor of branching quickly is to keep to a regular release cadence. My understanding is that multiple downstream projects rely on prompt Open vSwitch releases. (This is mainly based on speaking to Russell Bryant at Red Hat; he can fill in the details if need be.) I'd prefer to stick to our schedule for this reason. I propose that we branch today, with the following further details: * Continue to apply patches that add features, that as of now have already been posted and ready for review or iterating on reviews, until approximately this Friday. These patches will need to be committed both to master and branch-2.8. * As always, bug fix patches are acceptable on any branch at any time. * Plan to release, given acceptable quality, during the last week of August. Thoughts? Thanks, Ben. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ovn-northd: Add native active-standby HA.
On Wed, Aug 2, 2017 at 1:18 AM, Russell Bryant wrote: > On Tue, Aug 1, 2017 at 3:26 PM, Han Zhou wrote: > > > > > > On Tue, Aug 1, 2017 at 9:19 AM, Russell Bryant wrote: > >> > >> Add native support for active-standby HA in ovn-northd by having each > >> instance attempt to acquire an OVSDB lock. Only the instance of > >> ovn-northd that currently holds the lock will make active changes to > >> the OVN databases. > >> > >> Signed-off-by: Russell Bryant > >> --- > >> NEWS| 1 + > >> ovn/northd/ovn-northd.8.xml | 9 + > >> ovn/northd/ovn-northd.c | 40 ++ > +- > >> 3 files changed, 41 insertions(+), 9 deletions(-) > >> > >> diff --git a/NEWS b/NEWS > >> index facea0228..f3cdd2443 100644 > >> --- a/NEWS > >> +++ b/NEWS > >> @@ -49,6 +49,7 @@ Post-v2.7.0 > >> one chassis is specified, OVN will manage high availability for > >> that > >> gateway. > >> * Add support for ACL logging. > >> + * ovn-northd now has native support for active-standby high > >> availability. > >> - Tracing with ofproto/trace now traces through recirculation. > >> - OVSDB: > >> * New support for role-based access control (see ovsdb-server(1)). > >> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml > >> index 1527e8a60..0d85ec0d2 100644 > >> --- a/ovn/northd/ovn-northd.8.xml > >> +++ b/ovn/northd/ovn-northd.8.xml > >> @@ -72,6 +72,15 @@ > >> > >> > >> > >> +Active-Standby for High Availability > >> + > >> + You may run ovn-northd more than once in an OVN > >> deployment. > >> + OVN will automatically ensure that only one of them is active at > a > >> time. > >> + If multiple instances of ovn-northd are running and > >> the > >> + active ovn-northd fails, one of the hot standby > >> instances > >> + of ovn-northd will automatically take over. > >> + > >> + > >> Logical Flow Table Structure > >> > >> > >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c > >> index 10e0c7ce0..3d2be4267 100644 > >> --- a/ovn/northd/ovn-northd.c > >> +++ b/ovn/northd/ovn-northd.c > >> @@ -6531,6 +6531,12 @@ main(int argc, char *argv[]) > >> ovsdb_idl_add_column(ovnsb_idl_loop.idl, > &sbrec_chassis_col_nb_cfg); > >> ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name); > >> > >> +/* Ensure that only a single ovn-northd is active in the deployment > >> by > >> + * acquiring a lock called "ovn_northd" on the southbound database > >> + * and then only performing DB transactions if the lock is held. */ > >> +ovsdb_idl_set_lock(ovnsb_idl_loop.idl, "ovn_northd"); > >> +bool had_lock = false; > >> + > >> /* Main loop. */ > >> exiting = false; > >> while (!exiting) { > >> @@ -6541,15 +6547,29 @@ main(int argc, char *argv[]) > >> .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop), > >> }; > >> > >> -struct chassis_index chassis_index; > >> -chassis_index_init(&chassis_index, ctx.ovnsb_idl); > >> +if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) { > >> +VLOG_INFO("ovn-northd lock acquired. " > >> + "This ovn-northd instance is now active."); > >> +had_lock = true; > >> +} else if (had_lock && !ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) > { > >> +VLOG_INFO("ovn-northd lock lost. " > >> + "This ovn-northd instance is now on standby."); > > > > Should it try lock again, if we want it to be standby? Otherwise, this > > instance won't have a chance to be active any more. > > Good question ... I was assuming this scenario was due to a lost > connection, and that the IDL would automatically try to re-acquire the > lock. > > I tested to make sure I saw a second ovn-northd go from standby to > active, but I have not tested active -> standby -> active again. > > I'll take a closer look at this before applying the patch. > > I tested it and it works fine. active -> standby -> active scenario also works fine. I also tested by restarting southbound ovsdb-server. Once ovsdb-server is up again, the idl clients try to acquire the lock and one of the ovn-northd instance becomes active again. I don't think it is required to try lock again as idl client takes care of it. How about starting another instance of ovn-northd in the sandbox/test environment so that active/standby scenario gets tested for all the ovn tests ? Acked-by: Numan Siddique Thanks Numan > > >> +had_lock = false; > >> +} > >> > >> -ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop); > >> -ovnsb_db_run(&ctx, &ovnsb_idl_loop); > >> -if (ctx.ovnsb_txn) { > >> -check_and_add_supported_dhcp_opts_to_sb_db(&ctx); > >> -check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx); > >> -check_and_update_rbac(&ctx); > >> +struct chassis_index c
Re: [ovs-dev] [PATCH v4 1/2] OF support and translation of generic encap and decap
Maybe encap(nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210)) is better, the below are possible formats, I think they are all ok, what do you think about them? encap(nsh) encap(nsh()) encap(nsh(md_type=1)) encap(nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210)) -Original Message- From: Ben Pfaff [mailto:b...@ovn.org] Sent: Wednesday, August 2, 2017 7:17 AM To: Yang, Yi Y Cc: d...@openvswitch.org; Jan Scheurich ; Zoltan Balogh Subject: Re: [PATCH v4 1/2] OF support and translation of generic encap and decap encap(hdr=nsh,nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210)) On Tue, Aug 01, 2017 at 10:53:27PM +, Yang, Yi Y wrote: > Ben, because we're considering to cover NSH md type 2 case, for NSH TLV, now > we provide it by the below way. > > encap(hdr=nsh,prop(class=nsh,type=md_type,val=2),prop(class=nsh,type=t > lv,val(0x1000,10,0x12345678)),prop(class=nsh,type=tlv,val(0x2000,20,0x > fedcba9876543210))) > > Can you help provide a format you prefer to handle this use case? > > -Original Message- > From: Ben Pfaff [mailto:b...@ovn.org] > Sent: Tuesday, August 1, 2017 11:56 PM > To: Yang, Yi Y > Cc: d...@openvswitch.org; Jan Scheurich ; > Zoltan Balogh > Subject: Re: [PATCH v4 1/2] OF support and translation of generic > encap and decap > > On Tue, Aug 01, 2017 at 08:06:59PM +0800, Yi Yang wrote: > > From: Jan Scheurich > > > > This commit adds support for the OpenFlow actions generic encap and > > decap (as specified in ONF EXT-382) to the OVS control plane. > > > > CLI syntax for encap action with properties: > > encap(hdr=) > > encap(hdr=, > > prop(class=,type=,val=), > > prop(class=,type=,val())) > > > > CLI syntax for decap action: > > decap() > > decap(packet_type(ns=,type=)) > > > > The first header supported for encap and decap is "ethernet" to > > convert packets between packet_type (1,Ethertype) and (0,0). > > > > This commit also implements a skeleton for the translation of > > generic encap and decap actions in ofproto-dpif and adds support to > > encap and decap an Ethernet header. > > > > In general translation of encap commits pending actions and then > > rewrites struct flow in accordance with the new packet type and > > header. In the case of encap(ethernet) it suffices to change the > > packet type from (1, Ethertype) to (0,0) and set the dl_type > > accordingly. A new pending_encap flag in xlate ctx is set to mark > > that an corresponding datapath encap action must be triggered at the > > next commit. In the case of encap(ethernet) ofproto generetas a push_eth > > action. > > > > The general case for translation of decap() is to emit a datapath > > action to decap the current outermost header and then recirculate > > the packet to reparse the inner headers. In the special case of an > > Ethernet packet, > > decap() just changes the packet type from (0,0) to (1, dl_type) > > without a need to recirculate. The emission of the pop_eth action > > for the datapath is postponed to the next commit. > > > > Hence encap(ethernet) and decap() on an Ethernet packet are OF > > octions that only incur a cost in the dataplane when a modifed > > packet is actually committed, e.g. because it is sent out. They can > > freely be used for normalizing the packet type in the OF pipeline > > without degrading performance. > > > > Signed-off-by: Jan Scheurich > > Signed-off-by: Yi Yang > > Signed-off-by: Zoltan Balogh > > Co-authored-by: Zoltan Balogh > > Thanks for iterating so quickly! > > Besides the syntax for properties, which I still think should be simplified > to e.g. nsh(md_type=1), I have only a few remaining comments. > > I don't think there's any value in the n_props member in nx_action_encap. > (This is about the OpenFlow wire format now, not the internal format.) The > properties are the whole tail of the structure and having a count doesn't > make anything easier. Can you remove it? It will allow us to drop 8 bytes > from the action structure due to padding. > (In case it can't be removed, I'm providing a spelling fix.) > > decode_ed_prop() still doesn't check the length properly, so I'm providing a > fix. > > I'm appending my suggested incremental. > > Thanks again! > > --8<--cut here-->8-- > > diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index > d0437f20922a..c7d47eb5dd71 100644 > --- a/lib/ofp-actions.c > +++ b/lib/ofp-actions.c > @@ -4036,7 +4036,7 @@ struct nx_action_encap { > ovs_be16 subtype; /* NXAST_ENCAP. */ > ovs_be16 hdr_size; /* Header size in bytes, 0 = 'not specified'.*/ > ovs_be32 new_pkt_type; /* Header type to add and PACKET_TYPE of result. > */ > -ovs_be16 n_props; /* Number of the following endecap properties. */ > +ovs_be16 n_props; /* Number of the following encap properties. */ > uint8_
Re: [ovs-dev] RFC: publishing per-release roadmaps
On Tue, Aug 01, 2017 at 07:53:18PM -0400, Russell Bryant wrote: > On Tue, Aug 1, 2017 at 6:47 PM, Ben Pfaff wrote: > > Hello everyone. Last week I spent an afternoon with the OVS-DPDK folks > > at Intel in Shannon, Ireland. One of the ideas that came out of it was > > a proposal for the contributors at each company to publish, at the > > beginning of each release cycle, a list of what features and other > > contributions they were planning to target for the upcoming release. > > This sounds useful to me as a way for each of the major contributors to > > get an idea of what to expect and how to coordinate over a release > > cycle. > > > > I believe that Intel is volunteering to kick this off in the 2.9 release > > cycle for OVS. I'm going to try to get us here at VMware to write a > > list as well, and I encourage other contributors to give it a try as > > well. > > > > If this works out then I'd consider adding a item for it to > > release-process.rst in the tree. > > Sure - the more communication the better. > > My only concern is continuing to set clear expecations that the > release cycle is time based. It should be clear that the list is not > a committed roadmap, but a living document communicating intent that > is subject to change if features miss the release schedule. Good point, I agree that we should be clear that we're sticking to a time-based release plan. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] RFC: publishing per-release roadmaps
On Tue, Aug 1, 2017 at 6:47 PM, Ben Pfaff wrote: > Hello everyone. Last week I spent an afternoon with the OVS-DPDK folks > at Intel in Shannon, Ireland. One of the ideas that came out of it was > a proposal for the contributors at each company to publish, at the > beginning of each release cycle, a list of what features and other > contributions they were planning to target for the upcoming release. > This sounds useful to me as a way for each of the major contributors to > get an idea of what to expect and how to coordinate over a release > cycle. > > I believe that Intel is volunteering to kick this off in the 2.9 release > cycle for OVS. I'm going to try to get us here at VMware to write a > list as well, and I encourage other contributors to give it a try as > well. > > If this works out then I'd consider adding a item for it to > release-process.rst in the tree. Sure - the more communication the better. My only concern is continuing to set clear expecations that the release cycle is time based. It should be clear that the list is not a committed roadmap, but a living document communicating intent that is subject to change if features miss the release schedule. -- Russell Bryant ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2] dpif-netlink: Fix log level for error message
On 1 August 2017 at 15:49, Eric Garver wrote: > On Tue, Aug 01, 2017 at 01:56:44PM -0700, Joe Stringer wrote: >> On 1 August 2017 at 13:56, Joe Stringer wrote: >> > On 29 July 2017 at 21:58, Roi Dayan wrote: >> >> Since it's an error but also will always occur in older kernels >> >> log the message with level warning instead of info. >> >> >> >> Signed-off-by: Roi Dayan >> >> --- >> > >> > When I run this patch on a fedora system with kernel 4.8 and >> > out-of-tree modules, when running the "make check-kmod" tests, every >> > vxlan test now fails because the last step of the test checks the logs >> > for any log message with WARN or higher level, and this message shows >> > up. Looking at the other platforms affected, it seems like this will >> > always triggers on newer kernels in my test environment. Here's the >> > log: >> > >> > 2017-08-01T20:43:59.387Z|00053|dpif_netlink|WARN|Failed to create >> > at_vxlan0 with rtnetlink: Invalid argument >> > >> > I wonder if the new rtnetlink infrastructure is not always working for >> > VXLAN, and this error message is just highlighting that? Eric, could >> > this be related to VXLAN-GPE? >> >> Correction: The non-GPE VXLAN tests fail in this case. > > Yup, sorry. This broke with VXLAN-GPE, but we didn't notice because it > was falling back to the compat interface. A good example of why this log > needs to be higher. :) > > I posted a fix. Indeed, I think that this shows that the log message should be a bit higher level, and I tested this with your fix and no longer see such messages. I applied this patch to master, thanks folks. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] dpif-netlink-rtnl: Fix false errors on interfaces without tunnel config
On 1 August 2017 at 13:57, Joe Stringer wrote: > On 28 July 2017 at 05:43, Simon Horman wrote: >> On Thu, Jul 27, 2017 at 02:40:02PM +0300, Roi Dayan wrote: >>> When we skip adding a port using rtnetlink and not because of an error we >>> need to return EOPNOTSUPP to avoid logging an error message. >>> >>> Fixes: 2fd3d5eda508 ("dpif-netlink-rtnl: Support layer3 GRE") >>> Signed-off-by: Roi Dayan >>> Reviewed-by: Paul Blakey >> >> Thanks, this looks good to me. I would be happy to apply if someone >> provided an Ack. I'd also be happy if someone else applied it with: >> >> Acked-by: Simon Horman > > Acked-by: Joe Stringer I needed this to test Eric's recent VXLAN non-GPE fix, so I applied this to master. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] dpif-netlink-rtnl: Fix VXLAN port create for regular VXLAN
On 1 August 2017 at 15:47, Eric Garver wrote: > When VXLAN-GPE was introduced we added IFLA_VXLAN_GPE to the policy > parsing, but did not mark it as optional. The kernel only returns this > netlink attribute if it's actually configured. > > This also adds a missing entry for IFLA_VXLAN_GBP. Apparently we have no > system-traffic test coverage there. > > Fixes: c33c412fb139 ("dpif-netlink-rtnl: Support VXLAN-GPE") > Fixes: 825e45e0109f ("dpif-netlink-rtnl: Support VXLAN creation") > Reported-by: Joe Stringer > Signed-off-by: Eric Garver Thanks for the quick fix, I applied this to master. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] Salut
Bonjour cherJ'ai une proposition d'affaires qui sera avantageuse pour nous deux, et je vous indemniserai avec le pourcentage à la conclusion finale. Si vous êtes intéressé s'il vous plaît me répondre en arrière afin de vous fournir plus de détails sur la façon dont nous pouvons procéder plus loin. C'est confidentiel et légal.Concerne. Jeff. Hello dearI have a business Proposal that will be benefit to both of us, and I shall compensate you with percentage at the final conclusion. If you are interested please reply me back in order to provide you more detail on how we can proceed further .This is confidential and legal .Regards. Jeff. Name: Jeff Chris.Email:jeff.ch...@linuxmail.org ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v4 1/2] OF support and translation of generic encap and decap
encap(hdr=nsh,nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210)) On Tue, Aug 01, 2017 at 10:53:27PM +, Yang, Yi Y wrote: > Ben, because we're considering to cover NSH md type 2 case, for NSH TLV, now > we provide it by the below way. > > encap(hdr=nsh,prop(class=nsh,type=md_type,val=2),prop(class=nsh,type=tlv,val(0x1000,10,0x12345678)),prop(class=nsh,type=tlv,val(0x2000,20,0xfedcba9876543210))) > > Can you help provide a format you prefer to handle this use case? > > -Original Message- > From: Ben Pfaff [mailto:b...@ovn.org] > Sent: Tuesday, August 1, 2017 11:56 PM > To: Yang, Yi Y > Cc: d...@openvswitch.org; Jan Scheurich ; Zoltan > Balogh > Subject: Re: [PATCH v4 1/2] OF support and translation of generic encap and > decap > > On Tue, Aug 01, 2017 at 08:06:59PM +0800, Yi Yang wrote: > > From: Jan Scheurich > > > > This commit adds support for the OpenFlow actions generic encap and > > decap (as specified in ONF EXT-382) to the OVS control plane. > > > > CLI syntax for encap action with properties: > > encap(hdr=) > > encap(hdr=, > > prop(class=,type=,val=), > > prop(class=,type=,val())) > > > > CLI syntax for decap action: > > decap() > > decap(packet_type(ns=,type=)) > > > > The first header supported for encap and decap is "ethernet" to > > convert packets between packet_type (1,Ethertype) and (0,0). > > > > This commit also implements a skeleton for the translation of generic > > encap and decap actions in ofproto-dpif and adds support to encap and > > decap an Ethernet header. > > > > In general translation of encap commits pending actions and then > > rewrites struct flow in accordance with the new packet type and > > header. In the case of encap(ethernet) it suffices to change the > > packet type from (1, Ethertype) to (0,0) and set the dl_type > > accordingly. A new pending_encap flag in xlate ctx is set to mark that > > an corresponding datapath encap action must be triggered at the next > > commit. In the case of encap(ethernet) ofproto generetas a push_eth action. > > > > The general case for translation of decap() is to emit a datapath > > action to decap the current outermost header and then recirculate the > > packet to reparse the inner headers. In the special case of an > > Ethernet packet, > > decap() just changes the packet type from (0,0) to (1, dl_type) > > without a need to recirculate. The emission of the pop_eth action for > > the datapath is postponed to the next commit. > > > > Hence encap(ethernet) and decap() on an Ethernet packet are OF octions > > that only incur a cost in the dataplane when a modifed packet is > > actually committed, e.g. because it is sent out. They can freely be > > used for normalizing the packet type in the OF pipeline without > > degrading performance. > > > > Signed-off-by: Jan Scheurich > > Signed-off-by: Yi Yang > > Signed-off-by: Zoltan Balogh > > Co-authored-by: Zoltan Balogh > > Thanks for iterating so quickly! > > Besides the syntax for properties, which I still think should be simplified > to e.g. nsh(md_type=1), I have only a few remaining comments. > > I don't think there's any value in the n_props member in nx_action_encap. > (This is about the OpenFlow wire format now, not the internal format.) The > properties are the whole tail of the structure and having a count doesn't > make anything easier. Can you remove it? It will allow us to drop 8 bytes > from the action structure due to padding. > (In case it can't be removed, I'm providing a spelling fix.) > > decode_ed_prop() still doesn't check the length properly, so I'm providing a > fix. > > I'm appending my suggested incremental. > > Thanks again! > > --8<--cut here-->8-- > > diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index > d0437f20922a..c7d47eb5dd71 100644 > --- a/lib/ofp-actions.c > +++ b/lib/ofp-actions.c > @@ -4036,7 +4036,7 @@ struct nx_action_encap { > ovs_be16 subtype; /* NXAST_ENCAP. */ > ovs_be16 hdr_size; /* Header size in bytes, 0 = 'not specified'.*/ > ovs_be32 new_pkt_type; /* Header type to add and PACKET_TYPE of result. > */ > -ovs_be16 n_props; /* Number of the following endecap properties. */ > +ovs_be16 n_props; /* Number of the following encap properties. */ > uint8_t pad[6];/* Align to 8 bytes boundary */ > struct ofp_ed_prop_header props[]; /* Encap TLV properties. */ }; @@ > -4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap *nad, > enum ofp_version ofp_version OVS_UNUSED, > struct ofpbuf *ofpacts) { > +if (ntohs(nad->len) > sizeof *nad) { > +/* No properties supported yet. */ > +return OFPERR_OFPBPC_BAD_TYPE; > +} > + > struct ofpact_decap * decap; > > decap = ofpact_put_DECAP(ofpacts); > diff --git a/lib/ofp-ed-props.c b/l
Re: [ovs-dev] [PATCH v4 1/2] OF support and translation of generic encap and decap
An action has an embedded length. For nx_action_encap, if the embedded length ->len is longer than sizeof(struct nx_action_encap), then properties follow struct nx_action_encap until the length has been exhausted. On Tue, Aug 01, 2017 at 11:06:21PM +, Yang, Yi Y wrote: > About why we need n_props in nx_action_encap, I added this for Opendaylight > to deserialize the wire format from OVS, n_props can clearly tell > Opendaylight if there is any property. Otherwise how do we check if there is > a property following? > > -Original Message- > From: Ben Pfaff [mailto:b...@ovn.org] > Sent: Tuesday, August 1, 2017 11:56 PM > To: Yang, Yi Y > Cc: d...@openvswitch.org; Jan Scheurich ; Zoltan > Balogh > Subject: Re: [PATCH v4 1/2] OF support and translation of generic encap and > decap > > On Tue, Aug 01, 2017 at 08:06:59PM +0800, Yi Yang wrote: > > From: Jan Scheurich > > > > This commit adds support for the OpenFlow actions generic encap and > > decap (as specified in ONF EXT-382) to the OVS control plane. > > > > CLI syntax for encap action with properties: > > encap(hdr=) > > encap(hdr=, > > prop(class=,type=,val=), > > prop(class=,type=,val())) > > > > CLI syntax for decap action: > > decap() > > decap(packet_type(ns=,type=)) > > > > The first header supported for encap and decap is "ethernet" to > > convert packets between packet_type (1,Ethertype) and (0,0). > > > > This commit also implements a skeleton for the translation of generic > > encap and decap actions in ofproto-dpif and adds support to encap and > > decap an Ethernet header. > > > > In general translation of encap commits pending actions and then > > rewrites struct flow in accordance with the new packet type and > > header. In the case of encap(ethernet) it suffices to change the > > packet type from (1, Ethertype) to (0,0) and set the dl_type > > accordingly. A new pending_encap flag in xlate ctx is set to mark that > > an corresponding datapath encap action must be triggered at the next > > commit. In the case of encap(ethernet) ofproto generetas a push_eth action. > > > > The general case for translation of decap() is to emit a datapath > > action to decap the current outermost header and then recirculate the > > packet to reparse the inner headers. In the special case of an > > Ethernet packet, > > decap() just changes the packet type from (0,0) to (1, dl_type) > > without a need to recirculate. The emission of the pop_eth action for > > the datapath is postponed to the next commit. > > > > Hence encap(ethernet) and decap() on an Ethernet packet are OF octions > > that only incur a cost in the dataplane when a modifed packet is > > actually committed, e.g. because it is sent out. They can freely be > > used for normalizing the packet type in the OF pipeline without > > degrading performance. > > > > Signed-off-by: Jan Scheurich > > Signed-off-by: Yi Yang > > Signed-off-by: Zoltan Balogh > > Co-authored-by: Zoltan Balogh > > Thanks for iterating so quickly! > > Besides the syntax for properties, which I still think should be simplified > to e.g. nsh(md_type=1), I have only a few remaining comments. > > I don't think there's any value in the n_props member in nx_action_encap. > (This is about the OpenFlow wire format now, not the internal format.) The > properties are the whole tail of the structure and having a count doesn't > make anything easier. Can you remove it? It will allow us to drop 8 bytes > from the action structure due to padding. > (In case it can't be removed, I'm providing a spelling fix.) > > decode_ed_prop() still doesn't check the length properly, so I'm providing a > fix. > > I'm appending my suggested incremental. > > Thanks again! > > --8<--cut here-->8-- > > diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index > d0437f20922a..c7d47eb5dd71 100644 > --- a/lib/ofp-actions.c > +++ b/lib/ofp-actions.c > @@ -4036,7 +4036,7 @@ struct nx_action_encap { > ovs_be16 subtype; /* NXAST_ENCAP. */ > ovs_be16 hdr_size; /* Header size in bytes, 0 = 'not specified'.*/ > ovs_be32 new_pkt_type; /* Header type to add and PACKET_TYPE of result. > */ > -ovs_be16 n_props; /* Number of the following endecap properties. */ > +ovs_be16 n_props; /* Number of the following encap properties. */ > uint8_t pad[6];/* Align to 8 bytes boundary */ > struct ofp_ed_prop_header props[]; /* Encap TLV properties. */ }; @@ > -4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap *nad, > enum ofp_version ofp_version OVS_UNUSED, > struct ofpbuf *ofpacts) { > +if (ntohs(nad->len) > sizeof *nad) { > +/* No properties supported yet. */ > +return OFPERR_OFPBPC_BAD_TYPE; > +} > + > struct ofpact_decap * decap; > > decap = ofpact_put_DECAP(ofpacts); > diff --
[ovs-dev] Pólizas del mes de Agosto
Le recordamos que nuestras Pólizas de Capacitación son una excelente opción para capacitarse desde su oficina, sin gastos de traslado o viáticos y con expertos de primer nivel que le darán a sus colaboradores la actualización que necesitan para ejercer sus puestos de la manera más competitiva. Pólizas de Capacitación disponibles: • Las relaciones humanas y el éxito en la empresa • Servicio al cliente estilo Disney • Auditoría y Control Interno • Compras • Ventas • Recursos Humanos • Contabilidad y Finanzas Para mayores detalles responda este correo con la póliza de su Interés y llene sus datos en la parte de abajo. Nombre: Teléfono: Correo: Y le enviaremos la información completa. centro telefónico: 018002129393 Coordinador de Evento ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v4 1/2] OF support and translation of generic encap and decap
About why we need n_props in nx_action_encap, I added this for Opendaylight to deserialize the wire format from OVS, n_props can clearly tell Opendaylight if there is any property. Otherwise how do we check if there is a property following? -Original Message- From: Ben Pfaff [mailto:b...@ovn.org] Sent: Tuesday, August 1, 2017 11:56 PM To: Yang, Yi Y Cc: d...@openvswitch.org; Jan Scheurich ; Zoltan Balogh Subject: Re: [PATCH v4 1/2] OF support and translation of generic encap and decap On Tue, Aug 01, 2017 at 08:06:59PM +0800, Yi Yang wrote: > From: Jan Scheurich > > This commit adds support for the OpenFlow actions generic encap and > decap (as specified in ONF EXT-382) to the OVS control plane. > > CLI syntax for encap action with properties: > encap(hdr=) > encap(hdr=, > prop(class=,type=,val=), > prop(class=,type=,val())) > > CLI syntax for decap action: > decap() > decap(packet_type(ns=,type=)) > > The first header supported for encap and decap is "ethernet" to > convert packets between packet_type (1,Ethertype) and (0,0). > > This commit also implements a skeleton for the translation of generic > encap and decap actions in ofproto-dpif and adds support to encap and > decap an Ethernet header. > > In general translation of encap commits pending actions and then > rewrites struct flow in accordance with the new packet type and > header. In the case of encap(ethernet) it suffices to change the > packet type from (1, Ethertype) to (0,0) and set the dl_type > accordingly. A new pending_encap flag in xlate ctx is set to mark that > an corresponding datapath encap action must be triggered at the next > commit. In the case of encap(ethernet) ofproto generetas a push_eth action. > > The general case for translation of decap() is to emit a datapath > action to decap the current outermost header and then recirculate the > packet to reparse the inner headers. In the special case of an > Ethernet packet, > decap() just changes the packet type from (0,0) to (1, dl_type) > without a need to recirculate. The emission of the pop_eth action for > the datapath is postponed to the next commit. > > Hence encap(ethernet) and decap() on an Ethernet packet are OF octions > that only incur a cost in the dataplane when a modifed packet is > actually committed, e.g. because it is sent out. They can freely be > used for normalizing the packet type in the OF pipeline without > degrading performance. > > Signed-off-by: Jan Scheurich > Signed-off-by: Yi Yang > Signed-off-by: Zoltan Balogh > Co-authored-by: Zoltan Balogh Thanks for iterating so quickly! Besides the syntax for properties, which I still think should be simplified to e.g. nsh(md_type=1), I have only a few remaining comments. I don't think there's any value in the n_props member in nx_action_encap. (This is about the OpenFlow wire format now, not the internal format.) The properties are the whole tail of the structure and having a count doesn't make anything easier. Can you remove it? It will allow us to drop 8 bytes from the action structure due to padding. (In case it can't be removed, I'm providing a spelling fix.) decode_ed_prop() still doesn't check the length properly, so I'm providing a fix. I'm appending my suggested incremental. Thanks again! --8<--cut here-->8-- diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index d0437f20922a..c7d47eb5dd71 100644 --- a/lib/ofp-actions.c +++ b/lib/ofp-actions.c @@ -4036,7 +4036,7 @@ struct nx_action_encap { ovs_be16 subtype; /* NXAST_ENCAP. */ ovs_be16 hdr_size; /* Header size in bytes, 0 = 'not specified'.*/ ovs_be32 new_pkt_type; /* Header type to add and PACKET_TYPE of result. */ -ovs_be16 n_props; /* Number of the following endecap properties. */ +ovs_be16 n_props; /* Number of the following encap properties. */ uint8_t pad[6];/* Align to 8 bytes boundary */ struct ofp_ed_prop_header props[]; /* Encap TLV properties. */ }; @@ -4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap *nad, enum ofp_version ofp_version OVS_UNUSED, struct ofpbuf *ofpacts) { +if (ntohs(nad->len) > sizeof *nad) { +/* No properties supported yet. */ +return OFPERR_OFPBPC_BAD_TYPE; +} + struct ofpact_decap * decap; decap = ofpact_put_DECAP(ofpacts); diff --git a/lib/ofp-ed-props.c b/lib/ofp-ed-props.c index 260adc30acf0..6bba3ac7d982 100644 --- a/lib/ofp-ed-props.c +++ b/lib/ofp-ed-props.c @@ -32,6 +32,9 @@ decode_ed_prop(const struct ofp_ed_prop_header **ofp_prop, uint16_t prop_class = ntohs((*ofp_prop)->prop_class); size_t len = (*ofp_prop)->len; size_t pad_len = ROUND_UP(len, 8); +if (pad_len > *remaining) { +return OFPERR_OFPBAC_BAD_LEN; +} switch (prop_class) { default: __
Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic encap and decap
Got it, thanks a lot. -Original Message- From: Ben Pfaff [mailto:b...@ovn.org] Sent: Wednesday, August 2, 2017 6:51 AM To: Yang, Yi Y Cc: Zoltán Balogh ; d...@openvswitch.org Subject: Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic encap and decap I don't understand your explanation. decode_NXAST_RAW_DECAP() converts an OpenFlow action in wire format into OVS internal format. The OpenFlow action that it contains might have been generated by anything (for example, a controller). parse_ENCAP() doesn't have anything to do with it as far as I can tell. On Tue, Aug 01, 2017 at 10:45:50PM +, Yang, Yi Y wrote: > Ben, thank you for your patch, but from my understanding, parse_ENCAP has > ensured it is impossible to have any property for decap, so I'm not sure in > what case this will happened. > > -Original Message- > From: Ben Pfaff [mailto:b...@ovn.org] > Sent: Tuesday, August 1, 2017 11:26 PM > To: Yang, Yi Y > Cc: Zoltán Balogh ; d...@openvswitch.org > Subject: Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of > generic encap and decap > > On Tue, Aug 01, 2017 at 12:32:20PM +, Yang, Yi Y wrote: > > #2. > > [Ben] I suspect that decode_NXAST_RAW_DECAP() should report an error if > > properties are present, since it doesn't support properties. > > > > [Yi] It is impossible. > > What is impossible? It is easy to detect that properties are present and > report an error: > > diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index > d0437f20922a..7be302a4005d 100644 > --- a/lib/ofp-actions.c > +++ b/lib/ofp-actions.c > @@ -4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap > *nad, > enum ofp_version ofp_version OVS_UNUSED, > struct ofpbuf *ofpacts) { > +if (ntohs(nad->len) > sizeof *nad) { > +/* No properties supported yet. */ > +return OFPERR_OFPBPC_BAD_TYPE; > +} > + > struct ofpact_decap * decap; > > decap = ofpact_put_DECAP(ofpacts); ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v4 1/2] OF support and translation of generic encap and decap
Ben, because we're considering to cover NSH md type 2 case, for NSH TLV, now we provide it by the below way. encap(hdr=nsh,prop(class=nsh,type=md_type,val=2),prop(class=nsh,type=tlv,val(0x1000,10,0x12345678)),prop(class=nsh,type=tlv,val(0x2000,20,0xfedcba9876543210))) Can you help provide a format you prefer to handle this use case? -Original Message- From: Ben Pfaff [mailto:b...@ovn.org] Sent: Tuesday, August 1, 2017 11:56 PM To: Yang, Yi Y Cc: d...@openvswitch.org; Jan Scheurich ; Zoltan Balogh Subject: Re: [PATCH v4 1/2] OF support and translation of generic encap and decap On Tue, Aug 01, 2017 at 08:06:59PM +0800, Yi Yang wrote: > From: Jan Scheurich > > This commit adds support for the OpenFlow actions generic encap and > decap (as specified in ONF EXT-382) to the OVS control plane. > > CLI syntax for encap action with properties: > encap(hdr=) > encap(hdr=, > prop(class=,type=,val=), > prop(class=,type=,val())) > > CLI syntax for decap action: > decap() > decap(packet_type(ns=,type=)) > > The first header supported for encap and decap is "ethernet" to > convert packets between packet_type (1,Ethertype) and (0,0). > > This commit also implements a skeleton for the translation of generic > encap and decap actions in ofproto-dpif and adds support to encap and > decap an Ethernet header. > > In general translation of encap commits pending actions and then > rewrites struct flow in accordance with the new packet type and > header. In the case of encap(ethernet) it suffices to change the > packet type from (1, Ethertype) to (0,0) and set the dl_type > accordingly. A new pending_encap flag in xlate ctx is set to mark that > an corresponding datapath encap action must be triggered at the next > commit. In the case of encap(ethernet) ofproto generetas a push_eth action. > > The general case for translation of decap() is to emit a datapath > action to decap the current outermost header and then recirculate the > packet to reparse the inner headers. In the special case of an > Ethernet packet, > decap() just changes the packet type from (0,0) to (1, dl_type) > without a need to recirculate. The emission of the pop_eth action for > the datapath is postponed to the next commit. > > Hence encap(ethernet) and decap() on an Ethernet packet are OF octions > that only incur a cost in the dataplane when a modifed packet is > actually committed, e.g. because it is sent out. They can freely be > used for normalizing the packet type in the OF pipeline without > degrading performance. > > Signed-off-by: Jan Scheurich > Signed-off-by: Yi Yang > Signed-off-by: Zoltan Balogh > Co-authored-by: Zoltan Balogh Thanks for iterating so quickly! Besides the syntax for properties, which I still think should be simplified to e.g. nsh(md_type=1), I have only a few remaining comments. I don't think there's any value in the n_props member in nx_action_encap. (This is about the OpenFlow wire format now, not the internal format.) The properties are the whole tail of the structure and having a count doesn't make anything easier. Can you remove it? It will allow us to drop 8 bytes from the action structure due to padding. (In case it can't be removed, I'm providing a spelling fix.) decode_ed_prop() still doesn't check the length properly, so I'm providing a fix. I'm appending my suggested incremental. Thanks again! --8<--cut here-->8-- diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index d0437f20922a..c7d47eb5dd71 100644 --- a/lib/ofp-actions.c +++ b/lib/ofp-actions.c @@ -4036,7 +4036,7 @@ struct nx_action_encap { ovs_be16 subtype; /* NXAST_ENCAP. */ ovs_be16 hdr_size; /* Header size in bytes, 0 = 'not specified'.*/ ovs_be32 new_pkt_type; /* Header type to add and PACKET_TYPE of result. */ -ovs_be16 n_props; /* Number of the following endecap properties. */ +ovs_be16 n_props; /* Number of the following encap properties. */ uint8_t pad[6];/* Align to 8 bytes boundary */ struct ofp_ed_prop_header props[]; /* Encap TLV properties. */ }; @@ -4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap *nad, enum ofp_version ofp_version OVS_UNUSED, struct ofpbuf *ofpacts) { +if (ntohs(nad->len) > sizeof *nad) { +/* No properties supported yet. */ +return OFPERR_OFPBPC_BAD_TYPE; +} + struct ofpact_decap * decap; decap = ofpact_put_DECAP(ofpacts); diff --git a/lib/ofp-ed-props.c b/lib/ofp-ed-props.c index 260adc30acf0..6bba3ac7d982 100644 --- a/lib/ofp-ed-props.c +++ b/lib/ofp-ed-props.c @@ -32,6 +32,9 @@ decode_ed_prop(const struct ofp_ed_prop_header **ofp_prop, uint16_t prop_class = ntohs((*ofp_prop)->prop_class); size_t len = (*ofp_prop)->len; size_t pad_len = ROUND_UP(len, 8); +if (pad_len > *remaining) { +return OFPER
Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic encap and decap
I don't understand your explanation. decode_NXAST_RAW_DECAP() converts an OpenFlow action in wire format into OVS internal format. The OpenFlow action that it contains might have been generated by anything (for example, a controller). parse_ENCAP() doesn't have anything to do with it as far as I can tell. On Tue, Aug 01, 2017 at 10:45:50PM +, Yang, Yi Y wrote: > Ben, thank you for your patch, but from my understanding, parse_ENCAP has > ensured it is impossible to have any property for decap, so I'm not sure in > what case this will happened. > > -Original Message- > From: Ben Pfaff [mailto:b...@ovn.org] > Sent: Tuesday, August 1, 2017 11:26 PM > To: Yang, Yi Y > Cc: Zoltán Balogh ; d...@openvswitch.org > Subject: Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic > encap and decap > > On Tue, Aug 01, 2017 at 12:32:20PM +, Yang, Yi Y wrote: > > #2. > > [Ben] I suspect that decode_NXAST_RAW_DECAP() should report an error if > > properties are present, since it doesn't support properties. > > > > [Yi] It is impossible. > > What is impossible? It is easy to detect that properties are present and > report an error: > > diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index > d0437f20922a..7be302a4005d 100644 > --- a/lib/ofp-actions.c > +++ b/lib/ofp-actions.c > @@ -4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap > *nad, > enum ofp_version ofp_version OVS_UNUSED, > struct ofpbuf *ofpacts) { > +if (ntohs(nad->len) > sizeof *nad) { > +/* No properties supported yet. */ > +return OFPERR_OFPBPC_BAD_TYPE; > +} > + > struct ofpact_decap * decap; > > decap = ofpact_put_DECAP(ofpacts); ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2] dpif-netlink: Fix log level for error message
On Tue, Aug 01, 2017 at 01:56:44PM -0700, Joe Stringer wrote: > On 1 August 2017 at 13:56, Joe Stringer wrote: > > On 29 July 2017 at 21:58, Roi Dayan wrote: > >> Since it's an error but also will always occur in older kernels > >> log the message with level warning instead of info. > >> > >> Signed-off-by: Roi Dayan > >> --- > > > > When I run this patch on a fedora system with kernel 4.8 and > > out-of-tree modules, when running the "make check-kmod" tests, every > > vxlan test now fails because the last step of the test checks the logs > > for any log message with WARN or higher level, and this message shows > > up. Looking at the other platforms affected, it seems like this will > > always triggers on newer kernels in my test environment. Here's the > > log: > > > > 2017-08-01T20:43:59.387Z|00053|dpif_netlink|WARN|Failed to create > > at_vxlan0 with rtnetlink: Invalid argument > > > > I wonder if the new rtnetlink infrastructure is not always working for > > VXLAN, and this error message is just highlighting that? Eric, could > > this be related to VXLAN-GPE? > > Correction: The non-GPE VXLAN tests fail in this case. Yup, sorry. This broke with VXLAN-GPE, but we didn't notice because it was falling back to the compat interface. A good example of why this log needs to be higher. :) I posted a fix. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] RFC: publishing per-release roadmaps
Hello everyone. Last week I spent an afternoon with the OVS-DPDK folks at Intel in Shannon, Ireland. One of the ideas that came out of it was a proposal for the contributors at each company to publish, at the beginning of each release cycle, a list of what features and other contributions they were planning to target for the upcoming release. This sounds useful to me as a way for each of the major contributors to get an idea of what to expect and how to coordinate over a release cycle. I believe that Intel is volunteering to kick this off in the 2.9 release cycle for OVS. I'm going to try to get us here at VMware to write a list as well, and I encourage other contributors to give it a try as well. If this works out then I'd consider adding a item for it to release-process.rst in the tree. Ben ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] dpif-netlink-rtnl: Fix VXLAN port create for regular VXLAN
When VXLAN-GPE was introduced we added IFLA_VXLAN_GPE to the policy parsing, but did not mark it as optional. The kernel only returns this netlink attribute if it's actually configured. This also adds a missing entry for IFLA_VXLAN_GBP. Apparently we have no system-traffic test coverage there. Fixes: c33c412fb139 ("dpif-netlink-rtnl: Support VXLAN-GPE") Fixes: 825e45e0109f ("dpif-netlink-rtnl: Support VXLAN creation") Reported-by: Joe Stringer Signed-off-by: Eric Garver --- lib/dpif-netlink-rtnl.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c index 98a2f2bddae2..83f51ac3f3b3 100644 --- a/lib/dpif-netlink-rtnl.c +++ b/lib/dpif-netlink-rtnl.c @@ -70,7 +70,8 @@ static const struct nl_policy vxlan_policy[] = { [IFLA_VXLAN_LEARNING] = { .type = NL_A_U8 }, [IFLA_VXLAN_UDP_ZERO_CSUM6_RX] = { .type = NL_A_U8 }, [IFLA_VXLAN_PORT] = { .type = NL_A_U16 }, -[IFLA_VXLAN_GPE] = { .type = NL_A_FLAG }, +[IFLA_VXLAN_GBP] = { .type = NL_A_FLAG, .optional = true }, +[IFLA_VXLAN_GPE] = { .type = NL_A_FLAG, .optional = true }, }; static const struct nl_policy gre_policy[] = { [IFLA_GRE_COLLECT_METADATA] = { .type = NL_A_FLAG }, -- 2.12.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic encap and decap
Ben, thank you for your patch, but from my understanding, parse_ENCAP has ensured it is impossible to have any property for decap, so I'm not sure in what case this will happened. -Original Message- From: Ben Pfaff [mailto:b...@ovn.org] Sent: Tuesday, August 1, 2017 11:26 PM To: Yang, Yi Y Cc: Zoltán Balogh ; d...@openvswitch.org Subject: Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic encap and decap On Tue, Aug 01, 2017 at 12:32:20PM +, Yang, Yi Y wrote: > #2. > [Ben] I suspect that decode_NXAST_RAW_DECAP() should report an error if > properties are present, since it doesn't support properties. > > [Yi] It is impossible. What is impossible? It is easy to detect that properties are present and report an error: diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index d0437f20922a..7be302a4005d 100644 --- a/lib/ofp-actions.c +++ b/lib/ofp-actions.c @@ -4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap *nad, enum ofp_version ofp_version OVS_UNUSED, struct ofpbuf *ofpacts) { +if (ntohs(nad->len) > sizeof *nad) { +/* No properties supported yet. */ +return OFPERR_OFPBPC_BAD_TYPE; +} + struct ofpact_decap * decap; decap = ofpact_put_DECAP(ofpacts); ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] datapath-windows: Refactor OvsCreateNewNBLsFromMultipleNBs
Guru, could you please apply this patch. Thanks, Shashank From: Anand Kumar Sent: Thursday, July 27, 2017 4:48:02 PM To: Shashank Ram; d...@openvswitch.org Subject: Re: [ovs-dev] [PATCH v2] datapath-windows: Refactor OvsCreateNewNBLsFromMultipleNBs Thanks for refactoring it. Acked-by: Anand Kumar Thanks, Anand Kumar On 7/24/17, 3:31 PM, "ovs-dev-boun...@openvswitch.org on behalf of Shashank Ram" wrote: Previously, the function would take the curNbl and nextNbl as inputs, and modify the linked list, and merge the input linked list with the newly generated newNbl list. This is confusing for the caller, and the function has unnecessary logic for merging linked lists that instead the caller should take care of. This is because the OvsCreateNewNBLsFromMultipleNBs() is a generic API that can be used by other functions as well, and its natural for different callers to have different needs. This patch refactors the behavior of OvsCreateNewNBLsFromMultipleNBs to take in the curNbl and lastNbl, and it returns a linked list of NBLs and sets the HEAD and TAIL of the new list obtained from the curNbl. If the caller wants to chain a new linked list at the HEAD or TAIL, it can make use of the curNbl and lastNbl to do so. Signed-off-by: Shashank Ram --- datapath-windows/ovsext/PacketIO.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/datapath-windows/ovsext/PacketIO.c b/datapath-windows/ovsext/PacketIO.c index a90b556..81c574e 100644 --- a/datapath-windows/ovsext/PacketIO.c +++ b/datapath-windows/ovsext/PacketIO.c @@ -49,7 +49,7 @@ static VOID OvsCompleteNBLIngress(POVS_SWITCH_CONTEXT switchContext, static NTSTATUS OvsCreateNewNBLsFromMultipleNBs( POVS_SWITCH_CONTEXT switchContext, PNET_BUFFER_LIST *curNbl, -PNET_BUFFER_LIST *nextNbl); +PNET_BUFFER_LIST *lastNbl); VOID OvsInitCompletionList(OvsCompletionList *completionList, @@ -212,7 +212,7 @@ OvsStartNBLIngress(POVS_SWITCH_CONTEXT switchContext, NDIS_SWITCH_PORT_ID sourcePort = 0; NDIS_SWITCH_NIC_INDEX sourceIndex = 0; PNDIS_SWITCH_FORWARDING_DETAIL_NET_BUFFER_LIST_INFO fwdDetail; -PNET_BUFFER_LIST curNbl = NULL, nextNbl = NULL; +PNET_BUFFER_LIST curNbl = NULL, nextNbl = NULL, lastNbl = NULL; ULONG sendCompleteFlags; UCHAR dispatch; LOCK_STATE_EX lockState, dpLockState; @@ -282,7 +282,7 @@ OvsStartNBLIngress(POVS_SWITCH_CONTEXT switchContext, /* Create a NET_BUFFER_LIST for each NET_BUFFER. */ status = OvsCreateNewNBLsFromMultipleNBs(switchContext, &curNbl, - &nextNbl); + &lastNbl); if (!NT_SUCCESS(status)) { RtlInitUnicodeString(&filterReason, L"Cannot allocate NBLs with single NB."); @@ -292,6 +292,10 @@ OvsStartNBLIngress(POVS_SWITCH_CONTEXT switchContext, NDIS_STATUS_RESOURCES); continue; } + +lastNbl->Next = nextNbl; +nextNbl = curNbl->Next; +curNbl->Next = NULL; } { OvsFlow *flow; @@ -500,11 +504,10 @@ OvsExtCancelSendNBL(NDIS_HANDLE filterModuleContext, static NTSTATUS OvsCreateNewNBLsFromMultipleNBs(POVS_SWITCH_CONTEXT switchContext, PNET_BUFFER_LIST *curNbl, -PNET_BUFFER_LIST *nextNbl) +PNET_BUFFER_LIST *lastNbl) { NTSTATUS status = STATUS_SUCCESS; PNET_BUFFER_LIST newNbls = NULL; -PNET_BUFFER_LIST lastNbl = NULL; PNET_BUFFER_LIST nbl = NULL; BOOLEAN error = TRUE; @@ -520,16 +523,15 @@ OvsCreateNewNBLsFromMultipleNBs(POVS_SWITCH_CONTEXT switchContext, nbl = newNbls; while (nbl) { -lastNbl = nbl; +*lastNbl = nbl; nbl = NET_BUFFER_LIST_NEXT_NBL(nbl); } -lastNbl->Next = *nextNbl; -*nextNbl = newNbls->Next; + +(*curNbl)->Next = NULL; OvsCompleteNBL(switchContext, *curNbl, TRUE); *curNbl = newNbls; -(*curNbl)->Next = NULL; error = FALSE; } while (error); -- 2.9.3.windows.2 ___ dev mailing list d...@openvswitch.org https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman
Re: [ovs-dev] [PATCH] datapath-windows: Fix conntrack lookups for reversed keys
From: ovs-dev-boun...@openvswitch.org on behalf of Anand Kumar Sent: Tuesday, August 1, 2017 3:01 PM To: d...@openvswitch.org Subject: [ovs-dev] [PATCH] datapath-windows: Fix conntrack lookups for reversed keys From: Sairam Venugopal The conntrack table needs to be queried for entries in either directions to determine if the packet is in forward direction or reply direction. The current behavior ends up reversing the incoming packet's 5-Tuple for every entry in the loop instead of doing it only once. Testing Done: - Verified that ICMP requests are no longer treated as replies in Conntrack. Change-Id: I826a164cfb9137e2167c404ff5c9bfd9dfaa33ad Co-authored-by: Sairam Venugopal Signed-off-by: Anand Kumar --- datapath-windows/ovsext/Conntrack.c | 19 ++- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/datapath-windows/ovsext/Conntrack.c b/datapath-windows/ovsext/Conntrack.c index 8ea1e65..917ebee 100644 --- a/datapath-windows/ovsext/Conntrack.c +++ b/datapath-windows/ovsext/Conntrack.c @@ -401,7 +401,14 @@ OvsCtLookup(OvsConntrackKeyLookupCtx *ctx) POVS_CT_ENTRY entry; BOOLEAN reply = FALSE; POVS_CT_ENTRY found = NULL; -OVS_CT_KEY key = ctx->key; + +/* Reverse NAT must be performed before OvsCtLookup, so here + * we simply need to flip the src and dst in key and compare + * they are equal. Note that flipped key is not equal to + * rev_key due to NAT effect. + */ +OVS_CT_KEY revCtxKey = ctx->key; +OvsCtKeyReverse(&revCtxKey); if (!ctTotalEntries) { return found; @@ -410,19 +417,13 @@ OvsCtLookup(OvsConntrackKeyLookupCtx *ctx) LIST_FORALL(&ovsConntrackTable[ctx->hash & CT_HASH_TABLE_MASK], link) { entry = CONTAINING_RECORD(link, OVS_CT_ENTRY, link); -if (OvsCtKeyAreSame(key,entry->key)) { +if (OvsCtKeyAreSame(ctx->key, entry->key)) { found = entry; reply = FALSE; break; } -/* Reverse NAT must be performed before OvsCtLookup, so here - * we simply need to flip the src and dst in key and compare - * they are equal. Note that flipped key is not equal to - * rev_key due to NAT effect. - */ -OvsCtKeyReverse(&key); -if (OvsCtKeyAreSame(key, entry->key)) { +if (OvsCtKeyAreSame(revCtxKey, entry->key)) { found = entry; reply = TRUE; break; -- 2.9.3.windows.1 ___ Acked-by: Shashank Ram ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 6/6] redhat: allow dpdk to also run as non-root user
After this commit, users may start a dpdk-enabled ovs setup as a non-root user. This is accomplished by exporting the $HOME directory, which dpdk uses to fill in it's semi-persistent RTE configuration. This change may be a bit controversial since it modifies /dev/hugepages as part of starting the ovs-vswitchd to set a hugetlbfs group ownership. This is used to enable writing to /dev/hugepages so that the dpdk_init will successfully complete. There is an alternate way of accomplishing this - namely to initialize DPDK before dropping privileges. However, this would mean that if DPDK ever grows an uninit / reinit function, non-root ovs likely could never use it. This does not change OvS+DPDK's SELinux requirements. It still must be disabled. Signed-off-by: Aaron Conole --- Documentation/intro/install/dpdk.rst| 7 +++ NEWS| 1 + rhel/README.RHEL.rst| 11 +++ rhel/openvswitch-fedora.spec.in | 13 + rhel/usr_lib_systemd_system_ovs-vswitchd.service.in | 5 + 5 files changed, 37 insertions(+) diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst index a05aa1a..0585c6a 100644 --- a/Documentation/intro/install/dpdk.rst +++ b/Documentation/intro/install/dpdk.rst @@ -134,6 +134,13 @@ has to be configured with DPDK support (``--with-dpdk``). Additional information can be found in :doc:`general`. +.. note:: + If you are running using the Fedora or Red Hat package, the Open vSwitch + daemon will run as a non-root user. This implies that you must have a + working IOMMU. Visit the `RHEL README`__ for additional information. + +__ https://github.com/openvswitch/ovs/blob/master/rhel/README.RHEL.rst + Setup - diff --git a/NEWS b/NEWS index facea02..095272a 100644 --- a/NEWS +++ b/NEWS @@ -64,6 +64,7 @@ Post-v2.7.0 * OpenFlow 1.5 packet-out is now supported. - Fedora Packaging: * OVN services are no longer restarted automatically after upgrade. + * ovs-vswitchd and ovsdb-server run as non-root users by default. - Add --cleanup option to command 'ovs-appctl exit' (see ovs-vswitchd(8)). - L3 tunneling: * Use new tunnel port option "packet_type" to configure L2 vs. L3. diff --git a/rhel/README.RHEL.rst b/rhel/README.RHEL.rst index 1845e8f..5f7a99a 100644 --- a/rhel/README.RHEL.rst +++ b/rhel/README.RHEL.rst @@ -337,6 +337,17 @@ running. All other commands where executed when Open vSwitch was successfully running. +Non-root User Support +--- +Fedora and RHEL support running the Open vSwitch daemons as a non-root user. +By default, a fresh installation will create an *openvswitch* user, along +with any additional support groups needed (such as *hugetlbfs* for DPDK +support). + +This is controlled by modifying the ``OVS_USER_ID`` option. Setting this +to 'root:root', or commenting the variable out will revert this behavior. + + Reporting Bugs -- diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in index 959aa2e..ccf6ea0 100644 --- a/rhel/openvswitch-fedora.spec.in +++ b/rhel/openvswitch-fedora.spec.in @@ -95,6 +95,10 @@ Requires: openssl hostname iproute module-init-tools Requires(post): /usr/bin/getent Requires(post): /usr/sbin/useradd Requires(post): /usr/bin/sed +%if %{with dpdk} +Requires(post): /usr/sbin/usermod +Requires(post): /usr/sbin/groupadd +%endif Requires(post): systemd-units Requires(preun): systemd-units Requires(postun): systemd-units @@ -370,6 +374,15 @@ if [ $1 -eq 1 ]; then sed -i 's:^#OVS_USER_ID=:OVS_USER_ID=:' /etc/sysconfig/openvswitch +%if %{with dpdk} +getent group hugetlbfs >/dev/null || \ +groupadd hugetlbfs +usermod -a -G hugetlbfs openvswitch +sed -i \ + 's@OVS_USER_ID="openvswitch:openvswitch"@OVS_USER_ID="openvswitch:hugetlbfs"@'\ +/etc/sysconfig/openvswitch +%endif + # In the case of upgrade, this is not needed. chown -R openvswitch:openvswitch /etc/openvswitch fi diff --git a/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in b/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in index 9aff70b..bf0f058 100644 --- a/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in +++ b/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in @@ -10,8 +10,13 @@ PartOf=openvswitch.service [Service] Type=forking Restart=on-failure +Environment=HOME=/var/run/openvswitch EnvironmentFile=/etc/openvswitch/default.conf EnvironmentFile=-/etc/sysconfig/openvswitch +@begin_dpdk@ +ExecStartPre=/usr/bin/chown :hugetlbfs /dev/hugepages +ExecStartPre=/usr/bin/chmod 0775 /dev/hugepages +@end_dpdk@ ExecStart=/usr/share/openvswitch/scripts/ovs-ctl \ --no-ovsdb-server --no-monitor --system-id=random \ --ovs-user=${OVS_USER_ID} \ -- 2.9.4 ___ dev mailing list d...@openvswitch.org https://mail.o
[ovs-dev] [PATCH v3 5/6] dpdkstrip: add a preprocessor tool for stripping dpdk blocks
Normally, in C code, pre-processing macros can be used to enable/disable specific functionality based on switches passed to configure. This works for DPDK using the --with-dpdk flag, which sets the DPDK_NETDEV define to the appropriate value. However, not all files are processed with the C pre-processor. For those files which are not, this commit adds a new pre-processor tool for .in files to either include or exclude those stanzas as appropriate. Signed-off-by: Aaron Conole --- Makefile.am| 5 + build-aux/dpdkstrip.pl | 35 +++ 2 files changed, 40 insertions(+) create mode 100644 build-aux/dpdkstrip.pl diff --git a/Makefile.am b/Makefile.am index 30794ff..433a7f9 100644 --- a/Makefile.am +++ b/Makefile.am @@ -35,6 +35,9 @@ AM_CFLAGS += $(OVS_CFLAGS) if DPDK_NETDEV AM_CFLAGS += -D_FILE_OFFSET_BITS=64 +DPDKSTRIP_FLAGS = "--dpdk" +else +DPDKSTRIP_FLAGS = "--nodpdk" endif if NDEBUG @@ -82,6 +85,7 @@ EXTRA_DIST = \ build-aux/cksum-schema-check \ build-aux/calculate-schema-cksum \ build-aux/dist-docs \ + build-aux/dpdkstrip.pl \ build-aux/sodepends.pl \ build-aux/soexpand.pl \ build-aux/xml2nroff \ @@ -141,6 +145,7 @@ SUFFIXES += .in .in: $(AM_V_GEN)$(MKDIR_P) $(dir $@) $(AM_V_GEN)$(PERL) $(srcdir)/build-aux/soexpand.pl -I$(srcdir) < $< | \ + $(PERL) $(srcdir)/build-aux/dpdkstrip.pl $(DPDKSTRIP_FLAGS) | \ sed \ -e 's,[@]PKIDIR[@],$(PKIDIR),g' \ -e 's,[@]LOGDIR[@],$(LOGDIR),g' \ diff --git a/build-aux/dpdkstrip.pl b/build-aux/dpdkstrip.pl new file mode 100644 index 000..98539cc --- /dev/null +++ b/build-aux/dpdkstrip.pl @@ -0,0 +1,35 @@ +# Copyright (c) 2017 Red Hat, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at: +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +use strict; +use warnings; +use Getopt::Long; + +my ($check_dpdk) = 0; +my ($disabled_print) = 0; + +Getopt::Long::Configure ("bundling"); +GetOptions("dpdk!" => \$check_dpdk) or exit(1); + +OUTER: while () { +if (/@(begin|end)_dpdk@/) { +if (!$check_dpdk) { +$disabled_print = ! $disabled_print; +} +next; +} + +print $_ unless $disabled_print; +} +exit 0; -- 2.9.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 4/6] redhat: dynamic service file for vswitchd
This commit changes the service file from static configuration to an autogenerated file, produced during the build. This will be relevant in a future commit. Signed-off-by: Aaron Conole --- rhel/.gitignore | 1 + rhel/automake.mk | 5 + ...witchd.service => usr_lib_systemd_system_ovs-vswitchd.service.in} | 0 3 files changed, 6 insertions(+) rename rhel/{usr_lib_systemd_system_ovs-vswitchd.service => usr_lib_systemd_system_ovs-vswitchd.service.in} (100%) diff --git a/rhel/.gitignore b/rhel/.gitignore index 164bb66..e584a1e 100644 --- a/rhel/.gitignore +++ b/rhel/.gitignore @@ -4,3 +4,4 @@ openvswitch-kmod-rhel6.spec openvswitch-kmod-fedora.spec openvswitch.spec openvswitch-fedora.spec +usr_lib_systemd_system_ovs-vswitchd.service diff --git a/rhel/automake.mk b/rhel/automake.mk index 2d9443f..56d377b 100644 --- a/rhel/automake.mk +++ b/rhel/automake.mk @@ -29,12 +29,15 @@ EXTRA_DIST += \ rhel/usr_lib_systemd_system_openvswitch.service \ rhel/usr_lib_systemd_system_ovsdb-server.service \ rhel/usr_lib_systemd_system_ovs-vswitchd.service \ + rhel/usr_lib_systemd_system_ovs-vswitchd.service.in \ rhel/usr_lib_systemd_system_ovn-controller.service \ rhel/usr_lib_systemd_system_ovn-controller-vtep.service \ rhel/usr_lib_systemd_system_ovn-northd.service \ rhel/usr_lib_firewalld_services_ovn-central-firewall-service.xml \ rhel/usr_lib_firewalld_services_ovn-host-firewall-service.xml +DISTCLEANFILES += rhel/usr_lib_systemd_system_ovs-vswitchd.service + update_rhel_spec = \ $(AM_V_GEN)($(ro_shell) && sed -e 's,[@]VERSION[@],$(VERSION),g') \ < $(srcdir)/rhel/$(@F).in > $(@F).tmp || exit 1; \ @@ -46,6 +49,8 @@ $(srcdir)/rhel/openvswitch-dkms.spec: rhel/openvswitch-dkms.spec.in $(top_buildd $(srcdir)/rhel/openvswitch-kmod-rhel6.spec: rhel/openvswitch-kmod-rhel6.spec.in $(top_builddir)/config.status $(update_rhel_spec) +rhel/usr_lib_systemd_system_ovs-vswitchd.service: rhel/usr_lib_systemd_system_ovs-vswitchd.service.in $(top_builddir)/config.status + $(srcdir)/rhel/openvswitch-kmod-fedora.spec: rhel/openvswitch-kmod-fedora.spec.in $(top_builddir)/config.status $(update_rhel_spec) diff --git a/rhel/usr_lib_systemd_system_ovs-vswitchd.service b/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in similarity index 100% rename from rhel/usr_lib_systemd_system_ovs-vswitchd.service rename to rhel/usr_lib_systemd_system_ovs-vswitchd.service.in -- 2.9.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 3/6] makefile: create destdir when generating files
When intermediary files are generated, the destination directory is assumed to exist. This has worked so far because most files are built prior to the dist-packaging step. However, any files which require rebuild after the packaging step may end up in failure if the output directory is not available. This commit adds a 'mkdir -p' before generating the output files. Signed-off-by: Aaron Conole --- Makefile.am | 1 + 1 file changed, 1 insertion(+) diff --git a/Makefile.am b/Makefile.am index 373ef6e..30794ff 100644 --- a/Makefile.am +++ b/Makefile.am @@ -139,6 +139,7 @@ ro_shell = printf '\043 Generated automatically -- do not modify!-*- buffer- SUFFIXES += .in .in: + $(AM_V_GEN)$(MKDIR_P) $(dir $@) $(AM_V_GEN)$(PERL) $(srcdir)/build-aux/soexpand.pl -I$(srcdir) < $< | \ sed \ -e 's,[@]PKIDIR[@],$(PKIDIR),g' \ -- 2.9.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 2/6] redhat: dynamically allocate and reference ovs user
After this commit, the fedora RPM will create the openvswitch user, from the non-static pool, for use as an Open vSwitch daemon user. This only happens on install - not upgrade. This will be the default user:group combination for the openvswitch daemons. To do this in a way that doesn't impact existing installations, the /etc/openvswitch directory will be created during the installation, rather than being provided as part of the rpm. Acked-by: Markos Chandras Signed-off-by: Aaron Conole --- rhel/openvswitch-fedora.spec.in | 13 + rhel/usr_lib_systemd_system_ovsdb-server.service | 1 + 2 files changed, 14 insertions(+) diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in index 3179568..959aa2e 100644 --- a/rhel/openvswitch-fedora.spec.in +++ b/rhel/openvswitch-fedora.spec.in @@ -92,6 +92,9 @@ Requires: openssl hostname iproute module-init-tools #Upstream kernel commit 4f647e0a3c37b8d5086214128614a136064110c3 #Requires: kernel >= 3.15.0-0 +Requires(post): /usr/bin/getent +Requires(post): /usr/sbin/useradd +Requires(post): /usr/bin/sed Requires(post): systemd-units Requires(preun): systemd-units Requires(postun): systemd-units @@ -361,6 +364,16 @@ rm -rf $RPM_BUILD_ROOT %endif %post +if [ $1 -eq 1 ]; then +getent passwd openvswitch >/dev/null || \ +useradd -r -d / -s /sbin/nologin -c "Open vSwitch Daemons" openvswitch + +sed -i 's:^#OVS_USER_ID=:OVS_USER_ID=:' /etc/sysconfig/openvswitch + +# In the case of upgrade, this is not needed. +chown -R openvswitch:openvswitch /etc/openvswitch +fi + %if 0%{?systemd_post:1} %systemd_post %{name}.service %else diff --git a/rhel/usr_lib_systemd_system_ovsdb-server.service b/rhel/usr_lib_systemd_system_ovsdb-server.service index b82cb33..7acd25f 100644 --- a/rhel/usr_lib_systemd_system_ovsdb-server.service +++ b/rhel/usr_lib_systemd_system_ovsdb-server.service @@ -10,6 +10,7 @@ Type=forking Restart=on-failure EnvironmentFile=/etc/openvswitch/default.conf EnvironmentFile=-/etc/sysconfig/openvswitch +ExecStartPre=/usr/bin/chown ${OVS_USER_ID} /var/run/openvswitch ExecStart=/usr/share/openvswitch/scripts/ovs-ctl \ --no-ovs-vswitchd --no-monitor --system-id=random \ --ovs-user=${OVS_USER_ID} \ -- 2.9.4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 1/6] redhat: allow arbitrary user:group
Under rpm based distributions, the only user:group that the rhel daemons run as is 'root:root'. This is fine as a default, but as part of a security procedure, users may want to run as an alternate uid/gid. This commit adds an OVS_USER_ID environment variable for systemd, which defaults to root:root, but can be overridden by changing the /etc/sysconfig/openvswitch environment file. Acked-by: Markos Chandras Signed-off-by: Aaron Conole --- rhel/automake.mk | 1 + rhel/etc_openvswitch_default.conf | 5 + rhel/openvswitch-fedora.spec.in | 4 rhel/usr_lib_systemd_system_ovs-vswitchd.service | 3 +++ rhel/usr_lib_systemd_system_ovsdb-server.service | 3 +++ rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template | 3 +++ 6 files changed, 19 insertions(+) create mode 100644 rhel/etc_openvswitch_default.conf diff --git a/rhel/automake.mk b/rhel/automake.mk index 1265fa7..2d9443f 100644 --- a/rhel/automake.mk +++ b/rhel/automake.mk @@ -10,6 +10,7 @@ EXTRA_DIST += \ rhel/automake.mk \ rhel/etc_init.d_openvswitch \ rhel/etc_logrotate.d_openvswitch \ + rhel/etc_openvswitch_default.conf \ rhel/etc_sysconfig_network-scripts_ifdown-ovs \ rhel/etc_sysconfig_network-scripts_ifup-ovs \ rhel/openvswitch-dkms.spec \ diff --git a/rhel/etc_openvswitch_default.conf b/rhel/etc_openvswitch_default.conf new file mode 100644 index 000..c74417d --- /dev/null +++ b/rhel/etc_openvswitch_default.conf @@ -0,0 +1,5 @@ +# DO NOT EDIT THIS FILE + +# The following is the *default* configuration for the openvswitch user ID. +# This is for backward compatibility. +OVS_USER_ID="root:root" diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in index 367207b..3179568 100644 --- a/rhel/openvswitch-fedora.spec.in +++ b/rhel/openvswitch-fedora.spec.in @@ -246,6 +246,9 @@ done install -m 0755 rhel/etc_init.d_openvswitch \ $RPM_BUILD_ROOT%{_datadir}/openvswitch/scripts/openvswitch.init +install -p -D -m 0644 rhel/etc_openvswitch_default.conf \ +$RPM_BUILD_ROOT/%{_sysconfdir}/openvswitch/default.conf + install -p -D -m 0644 rhel/etc_logrotate.d_openvswitch \ $RPM_BUILD_ROOT/%{_sysconfdir}/logrotate.d/openvswitch @@ -475,6 +478,7 @@ fi %{_sysconfdir}/bash_completion.d/ovs-appctl-bashcomp.bash %{_sysconfdir}/bash_completion.d/ovs-vsctl-bashcomp.bash %dir %{_sysconfdir}/openvswitch +%{_sysconfdir}/openvswitch/default.conf %config %ghost %{_sysconfdir}/openvswitch/conf.db %ghost %{_sysconfdir}/openvswitch/.conf.db.~lock~ %config %ghost %{_sysconfdir}/openvswitch/system-id.conf diff --git a/rhel/usr_lib_systemd_system_ovs-vswitchd.service b/rhel/usr_lib_systemd_system_ovs-vswitchd.service index 886b68a..9aff70b 100644 --- a/rhel/usr_lib_systemd_system_ovs-vswitchd.service +++ b/rhel/usr_lib_systemd_system_ovs-vswitchd.service @@ -10,12 +10,15 @@ PartOf=openvswitch.service [Service] Type=forking Restart=on-failure +EnvironmentFile=/etc/openvswitch/default.conf EnvironmentFile=-/etc/sysconfig/openvswitch ExecStart=/usr/share/openvswitch/scripts/ovs-ctl \ --no-ovsdb-server --no-monitor --system-id=random \ + --ovs-user=${OVS_USER_ID} \ start $OPTIONS ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop ExecReload=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server \ --no-monitor --system-id=random \ + --ovs-user=${OVS_USER_ID} \ restart $OPTIONS TimeoutSec=300 diff --git a/rhel/usr_lib_systemd_system_ovsdb-server.service b/rhel/usr_lib_systemd_system_ovsdb-server.service index 68deace..b82cb33 100644 --- a/rhel/usr_lib_systemd_system_ovsdb-server.service +++ b/rhel/usr_lib_systemd_system_ovsdb-server.service @@ -8,12 +8,15 @@ PartOf=openvswitch.service [Service] Type=forking Restart=on-failure +EnvironmentFile=/etc/openvswitch/default.conf EnvironmentFile=-/etc/sysconfig/openvswitch ExecStart=/usr/share/openvswitch/scripts/ovs-ctl \ --no-ovs-vswitchd --no-monitor --system-id=random \ + --ovs-user=${OVS_USER_ID} \ start $OPTIONS ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd stop ExecReload=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd \ + --ovs-user=${OVS_USER_ID} \ --no-monitor restart $OPTIONS RuntimeDirectory=openvswitch RuntimeDirectoryMode=0755 diff --git a/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template b/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template index 3050a07..fdaee00 100644 --- a/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template +++ b/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template @@ -21,3 +21,6 @@ # --ovsdb-server-wrapper=valgrind # OPTIONS="" + +# Uncomment and set the OVS User/Group value +#OVS_USER_ID="openvswitch:openvswitch" --
[ovs-dev] [PATCH v3 0/6] rhel/fedora: non-root OvS out of the box
This series attempts to introduce the ability to start and use Open vSwitch 'out of the box' as a non-root user. It does this by modifying the service files to pass the recently introduced --ovs-user argument around, and by making some minor tweaks to the passwd, group, and filesystem information. I prefixed the packaging work with 'redhat', but if rpm or packaging is a preferred prefx for that work, I can respin. The more controversial changes are: * This modifies the /etc/sysconfig/ file on install. * The dpdk support directly modifies /dev/hugepages with a call to chmod * A new user 'openvswitch', and up to two new groups 'openvswitch', and 'hugetlbfs' are created * A change to soexpand.pl to allow conditional inclusion of dpdk-related options It is important to note that for the DPDK section to function properly, the update to 17.05.1 patch needs to be accepted. A workaround is documented in the RHEL guide (instructing users to revert the functionality). After this series: > [root at wsfd-netdev60 ~]# yum install openvswitch-2.7.90-1.fc25.x86_64.rpm > Loaded plugins: product-id, search-disabled-repos, subscription-manager > This system is not registered to Red Hat Subscription Management. You can use > subscription-manager to register. > Examining openvswitch-2.7.90-1.fc25.x86_64.rpm: > openvswitch-2.7.90-1.fc25.x86_64 > Marking openvswitch-2.7.90-1.fc25.x86_64.rpm to be installed > Resolving Dependencies > --> Running transaction check > ---> Package openvswitch.x86_64 0:2.7.90-1.fc25 will be installed > --> Finished Dependency Resolution > > Dependencies Resolved > > > Package ArchVersion Repository > Size > > Installing: > openvswitch x86_64 2.7.90-1.fc25/openvswitch-2.7.90-1.fc25.x86_64 11 > M > > Transaction Summary > > Install 1 Package > > Total size: 11 M > Installed size: 11 M > Is this ok [y/d/N]: y > Downloading packages: > Running transaction check > Running transaction test > Transaction test succeeded > Running transaction > Installing : openvswitch-2.7.90-1.fc25.x86_64 > 1/1 > Verifying : openvswitch-2.7.90-1.fc25.x86_64 > 1/1 > > Installed: > openvswitch.x86_64 0:2.7.90-1.fc25 > > > Complete! > [root at wsfd-netdev60 ~]# systemctl start openvswitch > [root at wsfd-netdev60 ~]# ps aux | grep ovs > openvsw+ 12642 0.0 0.0 47864 2296 ?S ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info > --remote=punix:/var/run/openvswitch/db.sock > --private-key=db:Open_vSwitch,SSL,private_key > --certificate=db:Open_vSwitch,SSL,certificate > --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user > openvswitch:openvswitch --no-chdir > --log-file=/var/log/openvswitch/ovsdb-server.log > --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach > openvsw+ 12688 0.0 0.0 49588 10600 ?S ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err > -vfile:info --mlockall --user openvswitch:openvswitch --no-chdir > --log-file=/var/log/openvswitch/ovs-vswitchd.log > --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach v1->v2: https://lists.linux-foundation.org/pipermail/ovs-dev/2017-June/333417.html The previous method used 3 different locations of configuration from environment variables: 1. The systemd file. 2. A new /etc/sysconfig/openvswitch-pre 3. The existing /etc/sysconfig/openvswitch Now, configuration is from two areas: 1. A new /etc/openvswitch/default.conf 2. The existing /etc/sysconfig/openvswitch As part of the install, we set the OVS_USER_ID to the new values. v2->v3: https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/334946.html Refactor for the dpdk non-root user portion due to an issue discovered where the generated service file didn't honor new configuration when re-running ./configure. Also, converted the "Reviewed-by" to "Acked-by". This is because there is no such thing as Reviewed-by in the OVS contributing guide. Finally, included some documentation updates. Aaron Conole (6): redhat: allow arbitrary user:group redhat: dynamically allocate and reference ovs user makefile: create destdir when generating files redhat: dynamic service file for vswitchd dpdkstrip: add a preprocessor tool for stripping dpdk blocks redhat: allow dpdk to also run as non-root user Documentation/intro/install/dpdk.rst | 7 + Makefile.am| 6 NEWS | 1 + build-aux/dpdkstrip.pl | 35 ++ rhel/.gitignore
[ovs-dev] [PATCH] datapath-windows: Fix conntrack lookups for reversed keys
From: Sairam Venugopal The conntrack table needs to be queried for entries in either directions to determine if the packet is in forward direction or reply direction. The current behavior ends up reversing the incoming packet's 5-Tuple for every entry in the loop instead of doing it only once. Testing Done: - Verified that ICMP requests are no longer treated as replies in Conntrack. Change-Id: I826a164cfb9137e2167c404ff5c9bfd9dfaa33ad Co-authored-by: Sairam Venugopal Signed-off-by: Anand Kumar --- datapath-windows/ovsext/Conntrack.c | 19 ++- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/datapath-windows/ovsext/Conntrack.c b/datapath-windows/ovsext/Conntrack.c index 8ea1e65..917ebee 100644 --- a/datapath-windows/ovsext/Conntrack.c +++ b/datapath-windows/ovsext/Conntrack.c @@ -401,7 +401,14 @@ OvsCtLookup(OvsConntrackKeyLookupCtx *ctx) POVS_CT_ENTRY entry; BOOLEAN reply = FALSE; POVS_CT_ENTRY found = NULL; -OVS_CT_KEY key = ctx->key; + +/* Reverse NAT must be performed before OvsCtLookup, so here + * we simply need to flip the src and dst in key and compare + * they are equal. Note that flipped key is not equal to + * rev_key due to NAT effect. + */ +OVS_CT_KEY revCtxKey = ctx->key; +OvsCtKeyReverse(&revCtxKey); if (!ctTotalEntries) { return found; @@ -410,19 +417,13 @@ OvsCtLookup(OvsConntrackKeyLookupCtx *ctx) LIST_FORALL(&ovsConntrackTable[ctx->hash & CT_HASH_TABLE_MASK], link) { entry = CONTAINING_RECORD(link, OVS_CT_ENTRY, link); -if (OvsCtKeyAreSame(key,entry->key)) { +if (OvsCtKeyAreSame(ctx->key, entry->key)) { found = entry; reply = FALSE; break; } -/* Reverse NAT must be performed before OvsCtLookup, so here - * we simply need to flip the src and dst in key and compare - * they are equal. Note that flipped key is not equal to - * rev_key due to NAT effect. - */ -OvsCtKeyReverse(&key); -if (OvsCtKeyAreSame(key, entry->key)) { +if (OvsCtKeyAreSame(revCtxKey, entry->key)) { found = entry; reply = TRUE; break; -- 2.9.3.windows.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] dpif-netlink-rtnl: Fix false errors on interfaces without tunnel config
On 28 July 2017 at 05:43, Simon Horman wrote: > On Thu, Jul 27, 2017 at 02:40:02PM +0300, Roi Dayan wrote: >> When we skip adding a port using rtnetlink and not because of an error we >> need to return EOPNOTSUPP to avoid logging an error message. >> >> Fixes: 2fd3d5eda508 ("dpif-netlink-rtnl: Support layer3 GRE") >> Signed-off-by: Roi Dayan >> Reviewed-by: Paul Blakey > > Thanks, this looks good to me. I would be happy to apply if someone > provided an Ack. I'd also be happy if someone else applied it with: > > Acked-by: Simon Horman Acked-by: Joe Stringer ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2] tc: Correct convert ticks to msecs on parsing tc TM
On 29 July 2017 at 22:01, Roi Dayan wrote: > From: Paul Blakey > > Use sysconf(_SC_CLK_TCK) to read run time "number of clock ticks per > second" and use that to convert ticks to msecs. > This is how iproute does the conversion when parsing tc filters. > The system call is done only once. > > Signed-off-by: Paul Blakey > Reviewed-by: Roi Dayan > --- Acked-by: Joe Stringer Looks like this is missing Simon's ack from the previous round. In future, please consider propagating these from earlier patch revisions if the change is minor. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2] dpif-netlink: Fix log level for error message
On 1 August 2017 at 13:56, Joe Stringer wrote: > On 29 July 2017 at 21:58, Roi Dayan wrote: >> Since it's an error but also will always occur in older kernels >> log the message with level warning instead of info. >> >> Signed-off-by: Roi Dayan >> --- > > When I run this patch on a fedora system with kernel 4.8 and > out-of-tree modules, when running the "make check-kmod" tests, every > vxlan test now fails because the last step of the test checks the logs > for any log message with WARN or higher level, and this message shows > up. Looking at the other platforms affected, it seems like this will > always triggers on newer kernels in my test environment. Here's the > log: > > 2017-08-01T20:43:59.387Z|00053|dpif_netlink|WARN|Failed to create > at_vxlan0 with rtnetlink: Invalid argument > > I wonder if the new rtnetlink infrastructure is not always working for > VXLAN, and this error message is just highlighting that? Eric, could > this be related to VXLAN-GPE? Correction: The non-GPE VXLAN tests fail in this case. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH V2] dpif-netlink: Fix log level for error message
On 29 July 2017 at 21:58, Roi Dayan wrote: > Since it's an error but also will always occur in older kernels > log the message with level warning instead of info. > > Signed-off-by: Roi Dayan > --- When I run this patch on a fedora system with kernel 4.8 and out-of-tree modules, when running the "make check-kmod" tests, every vxlan test now fails because the last step of the test checks the logs for any log message with WARN or higher level, and this message shows up. Looking at the other platforms affected, it seems like this will always triggers on newer kernels in my test environment. Here's the log: 2017-08-01T20:43:59.387Z|00053|dpif_netlink|WARN|Failed to create at_vxlan0 with rtnetlink: Invalid argument I wonder if the new rtnetlink infrastructure is not always working for VXLAN, and this error message is just highlighting that? Eric, could this be related to VXLAN-GPE? ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] RFC: Let's branch for 2.8 today
On Tue, Aug 01, 2017 at 06:49:53PM +, Darrell Ball wrote: > > -Original Message- > From: on behalf of Ben Pfaff > Date: Tuesday, August 1, 2017 at 10:23 AM > To: "d...@openvswitch.org" > Subject: [ovs-dev] RFC: Let's branch for 2.8 today > > I'd like to create branch-2.8 today in preparation for releasing OVS 2.8 > later in August. I sent the patch that would lead off the branch a few > minutes ago: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2017-2DAugust_336544.html&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-uZnsw&m=EHRmZog9ZtlKH5MXuo4OklgP6dlm_pfMwfokiw2bXGk&s=udsL-1fyk8cB7_qe39p2Ua8qSMMWmIcBcWDmsjECPTg&e= > > > According to our published schedule, we should have create branch-2.8 at > the beginning of July, but we delayed to allow some more features to > enter the 2.8 release. There continue to be reasonable arguments along > these lines in favor of delaying branching further: > > * Many DPDK patches have been posted but are still under review > or awaiting application. > > I don’t see much problem for double commit as the baselines should be > identical. However, most DPDK folks are offline until late tonight > PST; can we give them a chance to be back online before branching ? Waiting until Wednesday morning (US Pacific) isn't a problem for me. > I propose that we branch today, with the following further details: > > * Continue to apply patches that add features, that as of now > have already been posted and ready for review or iterating on > reviews, until approximately this Friday. > > For limited scope features, would be ok to allow a little longer, > maybe early next week ? OK. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] RFC: Let's branch for 2.8 today
On Tue, Aug 1, 2017 at 1:23 PM, Ben Pfaff wrote: > I'd like to create branch-2.8 today in preparation for releasing OVS 2.8 > later in August. I sent the patch that would lead off the branch a few > minutes ago: > https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/336544.html > > According to our published schedule, we should have create branch-2.8 at > the beginning of July, but we delayed to allow some more features to > enter the 2.8 release. There continue to be reasonable arguments along > these lines in favor of delaying branching further: > > * Many DPDK patches have been posted but are still under review > or awaiting application. > > * Multiple series of patches leading up to NSH support have been > posted but are still iterating through reviews. > > The primary argument in favor of branching quickly is to keep to a > regular release cadence. My understanding is that multiple downstream > projects rely on prompt Open vSwitch releases. (This is mainly based on > speaking to Russell Bryant at Red Hat; he can fill in the details if > need be.) I'd prefer to stick to our schedule for this reason. Yes, my reasoning is that a time based release schedule is only valuable if we stick to it as closely as possible. When dates are reliable, more downstream advance planning can be done. If we stray too far from published dates, downstream consumers will lose confidence in any published dates. The net result will be a longer window of time between an upstream OVS release and downstream integration. Here is a concrete example. We published a date for OVS 2.8 of mid-August. OpenStack Pike is scheduled to be released at the end of August. What version of OVS should a downstream aim to ship with this version of OpenStack? The ideal case in this scenario would be to provide OVS 2.8 with OpenStack Pike. Some new OVN features for OpenStack depend on support in OVN as well. The safer fallback is to plan on shipping OVS 2.7 instead. That fallback is much less sensitive to OVS release timing, but introduces another ~6 month delay before features are finally shipped to users via normal distribution mechanisms. A consistent time based release model can be very helpful to downstream consumers, but only if the dates can be relied upon. > > I propose that we branch today, with the following further details: > > * Continue to apply patches that add features, that as of now > have already been posted and ready for review or iterating on > reviews, until approximately this Friday. These patches will > need to be committed both to master and branch-2.8. > > * As always, bug fix patches are acceptable on any branch at any > time. > > * Plan to release, given acceptable quality, during the last > week of August. > > Thoughts? > > Thanks, > > Ben. > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev -- Russell Bryant ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ovn-northd: Add native active-standby HA.
On Tue, Aug 1, 2017 at 3:26 PM, Han Zhou wrote: > > > On Tue, Aug 1, 2017 at 9:19 AM, Russell Bryant wrote: >> >> Add native support for active-standby HA in ovn-northd by having each >> instance attempt to acquire an OVSDB lock. Only the instance of >> ovn-northd that currently holds the lock will make active changes to >> the OVN databases. >> >> Signed-off-by: Russell Bryant >> --- >> NEWS| 1 + >> ovn/northd/ovn-northd.8.xml | 9 + >> ovn/northd/ovn-northd.c | 40 +++- >> 3 files changed, 41 insertions(+), 9 deletions(-) >> >> diff --git a/NEWS b/NEWS >> index facea0228..f3cdd2443 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -49,6 +49,7 @@ Post-v2.7.0 >> one chassis is specified, OVN will manage high availability for >> that >> gateway. >> * Add support for ACL logging. >> + * ovn-northd now has native support for active-standby high >> availability. >> - Tracing with ofproto/trace now traces through recirculation. >> - OVSDB: >> * New support for role-based access control (see ovsdb-server(1)). >> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml >> index 1527e8a60..0d85ec0d2 100644 >> --- a/ovn/northd/ovn-northd.8.xml >> +++ b/ovn/northd/ovn-northd.8.xml >> @@ -72,6 +72,15 @@ >> >> >> >> +Active-Standby for High Availability >> + >> + You may run ovn-northd more than once in an OVN >> deployment. >> + OVN will automatically ensure that only one of them is active at a >> time. >> + If multiple instances of ovn-northd are running and >> the >> + active ovn-northd fails, one of the hot standby >> instances >> + of ovn-northd will automatically take over. >> + >> + >> Logical Flow Table Structure >> >> >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c >> index 10e0c7ce0..3d2be4267 100644 >> --- a/ovn/northd/ovn-northd.c >> +++ b/ovn/northd/ovn-northd.c >> @@ -6531,6 +6531,12 @@ main(int argc, char *argv[]) >> ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg); >> ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name); >> >> +/* Ensure that only a single ovn-northd is active in the deployment >> by >> + * acquiring a lock called "ovn_northd" on the southbound database >> + * and then only performing DB transactions if the lock is held. */ >> +ovsdb_idl_set_lock(ovnsb_idl_loop.idl, "ovn_northd"); >> +bool had_lock = false; >> + >> /* Main loop. */ >> exiting = false; >> while (!exiting) { >> @@ -6541,15 +6547,29 @@ main(int argc, char *argv[]) >> .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop), >> }; >> >> -struct chassis_index chassis_index; >> -chassis_index_init(&chassis_index, ctx.ovnsb_idl); >> +if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) { >> +VLOG_INFO("ovn-northd lock acquired. " >> + "This ovn-northd instance is now active."); >> +had_lock = true; >> +} else if (had_lock && !ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) { >> +VLOG_INFO("ovn-northd lock lost. " >> + "This ovn-northd instance is now on standby."); > > Should it try lock again, if we want it to be standby? Otherwise, this > instance won't have a chance to be active any more. Good question ... I was assuming this scenario was due to a lost connection, and that the IDL would automatically try to re-acquire the lock. I tested to make sure I saw a second ovn-northd go from standby to active, but I have not tested active -> standby -> active again. I'll take a closer look at this before applying the patch. > >> +had_lock = false; >> +} >> >> -ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop); >> -ovnsb_db_run(&ctx, &ovnsb_idl_loop); >> -if (ctx.ovnsb_txn) { >> -check_and_add_supported_dhcp_opts_to_sb_db(&ctx); >> -check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx); >> -check_and_update_rbac(&ctx); >> +struct chassis_index chassis_index; >> +bool destroy_chassis_index = false; >> +if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) { >> +chassis_index_init(&chassis_index, ctx.ovnsb_idl); >> +destroy_chassis_index = true; >> + >> +ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop); >> +ovnsb_db_run(&ctx, &ovnsb_idl_loop); >> +if (ctx.ovnsb_txn) { >> +check_and_add_supported_dhcp_opts_to_sb_db(&ctx); >> +check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx); >> +check_and_update_rbac(&ctx); >> +} >> } >> >> unixctl_server_run(unixctl); >> @@ -6565,7 +6585,9 @@ main(int argc, char *argv[]) >> exiting = true; >> } >> >> -chassis_index_destroy(&chassis_index); >>
Re: [ovs-dev] [PATCH] ovn-northd: Add native active-standby HA.
On Tue, Aug 1, 2017 at 9:19 AM, Russell Bryant wrote: > > Add native support for active-standby HA in ovn-northd by having each > instance attempt to acquire an OVSDB lock. Only the instance of > ovn-northd that currently holds the lock will make active changes to > the OVN databases. > > Signed-off-by: Russell Bryant > --- > NEWS| 1 + > ovn/northd/ovn-northd.8.xml | 9 + > ovn/northd/ovn-northd.c | 40 +++- > 3 files changed, 41 insertions(+), 9 deletions(-) > > diff --git a/NEWS b/NEWS > index facea0228..f3cdd2443 100644 > --- a/NEWS > +++ b/NEWS > @@ -49,6 +49,7 @@ Post-v2.7.0 > one chassis is specified, OVN will manage high availability for that > gateway. > * Add support for ACL logging. > + * ovn-northd now has native support for active-standby high availability. > - Tracing with ofproto/trace now traces through recirculation. > - OVSDB: > * New support for role-based access control (see ovsdb-server(1)). > diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml > index 1527e8a60..0d85ec0d2 100644 > --- a/ovn/northd/ovn-northd.8.xml > +++ b/ovn/northd/ovn-northd.8.xml > @@ -72,6 +72,15 @@ > > > > +Active-Standby for High Availability > + > + You may run ovn-northd more than once in an OVN deployment. > + OVN will automatically ensure that only one of them is active at a time. > + If multiple instances of ovn-northd are running and the > + active ovn-northd fails, one of the hot standby instances > + of ovn-northd will automatically take over. > + > + > Logical Flow Table Structure > > > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c > index 10e0c7ce0..3d2be4267 100644 > --- a/ovn/northd/ovn-northd.c > +++ b/ovn/northd/ovn-northd.c > @@ -6531,6 +6531,12 @@ main(int argc, char *argv[]) > ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg); > ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name); > > +/* Ensure that only a single ovn-northd is active in the deployment by > + * acquiring a lock called "ovn_northd" on the southbound database > + * and then only performing DB transactions if the lock is held. */ > +ovsdb_idl_set_lock(ovnsb_idl_loop.idl, "ovn_northd"); > +bool had_lock = false; > + > /* Main loop. */ > exiting = false; > while (!exiting) { > @@ -6541,15 +6547,29 @@ main(int argc, char *argv[]) > .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop), > }; > > -struct chassis_index chassis_index; > -chassis_index_init(&chassis_index, ctx.ovnsb_idl); > +if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) { > +VLOG_INFO("ovn-northd lock acquired. " > + "This ovn-northd instance is now active."); > +had_lock = true; > +} else if (had_lock && !ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) { > +VLOG_INFO("ovn-northd lock lost. " > + "This ovn-northd instance is now on standby."); Should it try lock again, if we want it to be standby? Otherwise, this instance won't have a chance to be active any more. > +had_lock = false; > +} > > -ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop); > -ovnsb_db_run(&ctx, &ovnsb_idl_loop); > -if (ctx.ovnsb_txn) { > -check_and_add_supported_dhcp_opts_to_sb_db(&ctx); > -check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx); > -check_and_update_rbac(&ctx); > +struct chassis_index chassis_index; > +bool destroy_chassis_index = false; > +if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) { > +chassis_index_init(&chassis_index, ctx.ovnsb_idl); > +destroy_chassis_index = true; > + > +ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop); > +ovnsb_db_run(&ctx, &ovnsb_idl_loop); > +if (ctx.ovnsb_txn) { > +check_and_add_supported_dhcp_opts_to_sb_db(&ctx); > +check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx); > +check_and_update_rbac(&ctx); > +} > } > > unixctl_server_run(unixctl); > @@ -6565,7 +6585,9 @@ main(int argc, char *argv[]) > exiting = true; > } > > -chassis_index_destroy(&chassis_index); > +if (destroy_chassis_index) { > +chassis_index_destroy(&chassis_index); > +} > } > > unixctl_server_destroy(unixctl); > -- > 2.13.3 > Acked-by: Han Zhou ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] RFC: Let's branch for 2.8 today
-Original Message- From: on behalf of Ben Pfaff Date: Tuesday, August 1, 2017 at 10:23 AM To: "d...@openvswitch.org" Subject: [ovs-dev] RFC: Let's branch for 2.8 today I'd like to create branch-2.8 today in preparation for releasing OVS 2.8 later in August. I sent the patch that would lead off the branch a few minutes ago: https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2017-2DAugust_336544.html&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-uZnsw&m=EHRmZog9ZtlKH5MXuo4OklgP6dlm_pfMwfokiw2bXGk&s=udsL-1fyk8cB7_qe39p2Ua8qSMMWmIcBcWDmsjECPTg&e= According to our published schedule, we should have create branch-2.8 at the beginning of July, but we delayed to allow some more features to enter the 2.8 release. There continue to be reasonable arguments along these lines in favor of delaying branching further: * Many DPDK patches have been posted but are still under review or awaiting application. I don’t see much problem for double commit as the baselines should be identical. However, most DPDK folks are offline until late tonight PST; can we give them a chance to be back online before branching ? * Multiple series of patches leading up to NSH support have been posted but are still iterating through reviews. The primary argument in favor of branching quickly is to keep to a regular release cadence. My understanding is that multiple downstream projects rely on prompt Open vSwitch releases. (This is mainly based on speaking to Russell Bryant at Red Hat; he can fill in the details if need be.) I'd prefer to stick to our schedule for this reason. I propose that we branch today, with the following further details: * Continue to apply patches that add features, that as of now have already been posted and ready for review or iterating on reviews, until approximately this Friday. For limited scope features, would be ok to allow a little longer, maybe early next week ? These patches will need to be committed both to master and branch-2.8. * As always, bug fix patches are acceptable on any branch at any time. * Plan to release, given acceptable quality, during the last week of August. Thoughts? Thanks, Ben. ___ dev mailing list d...@openvswitch.org https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-uZnsw&m=EHRmZog9ZtlKH5MXuo4OklgP6dlm_pfMwfokiw2bXGk&s=9VEWp89QUGpFZt15Xw2A3I7u7X3Le-4H_EGM7Vo7eiU&e= ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ofpbuf: Fix parameter for const initializer.
On Tue, Aug 01, 2017 at 10:24:47AM -0700, Joe Stringer wrote: > On 1 August 2017 at 07:55, Ben Pfaff wrote: > > On Mon, Jul 31, 2017 at 05:16:11PM -0700, Joe Stringer wrote: > >> Clang complains: > >> > >> In file included from include/openvswitch/cxxtest.cc:11:0: > >> ../include/openvswitch/ofpbuf.h: In function ‘ofpbuf > >> ofpbuf_const_initializer(const void*, size_t)’: > >> ../include/openvswitch/ofpbuf.h:107:5: warning: narrowing conversion of > >> ‘size’ from ‘size_t {aka long unsigned int}’ to ‘uint32_t {aka unsigned > >> int}’ inside { } [-Wnarrowing] > >> }; > >> ^ > >> ../include/openvswitch/ofpbuf.h:107:5: warning: narrowing conversion of > >> ‘size’ from ‘size_t {aka long unsigned int}’ to ‘uint32_t {aka unsigned > >> int}’ inside { } [-Wnarrowing] > >> > >> This is because the ofpbuf struct's "size" parameter is a uint32_t, > >> while ofpbuf_const_initializer() takes a size_t for the size. Fix this > >> function to take a uint32_t instead. > >> > >> Signed-off-by: Joe Stringer > > > > Thanks. (We must have different clang versions.0 > > > > Acked-by: Ben Pfaff > > Perhaps - Clang-4.0? > > Thanks, applied to master. Probably, I have clang-3.8 here. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ofpbuf: Fix parameter for const initializer.
On 1 August 2017 at 07:55, Ben Pfaff wrote: > On Mon, Jul 31, 2017 at 05:16:11PM -0700, Joe Stringer wrote: >> Clang complains: >> >> In file included from include/openvswitch/cxxtest.cc:11:0: >> ../include/openvswitch/ofpbuf.h: In function ‘ofpbuf >> ofpbuf_const_initializer(const void*, size_t)’: >> ../include/openvswitch/ofpbuf.h:107:5: warning: narrowing conversion of >> ‘size’ from ‘size_t {aka long unsigned int}’ to ‘uint32_t {aka unsigned >> int}’ inside { } [-Wnarrowing] >> }; >> ^ >> ../include/openvswitch/ofpbuf.h:107:5: warning: narrowing conversion of >> ‘size’ from ‘size_t {aka long unsigned int}’ to ‘uint32_t {aka unsigned >> int}’ inside { } [-Wnarrowing] >> >> This is because the ofpbuf struct's "size" parameter is a uint32_t, >> while ofpbuf_const_initializer() takes a size_t for the size. Fix this >> function to take a uint32_t instead. >> >> Signed-off-by: Joe Stringer > > Thanks. (We must have different clang versions.0 > > Acked-by: Ben Pfaff Perhaps - Clang-4.0? Thanks, applied to master. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] RFC: Let's branch for 2.8 today
I'd like to create branch-2.8 today in preparation for releasing OVS 2.8 later in August. I sent the patch that would lead off the branch a few minutes ago: https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/336544.html According to our published schedule, we should have create branch-2.8 at the beginning of July, but we delayed to allow some more features to enter the 2.8 release. There continue to be reasonable arguments along these lines in favor of delaying branching further: * Many DPDK patches have been posted but are still under review or awaiting application. * Multiple series of patches leading up to NSH support have been posted but are still iterating through reviews. The primary argument in favor of branching quickly is to keep to a regular release cadence. My understanding is that multiple downstream projects rely on prompt Open vSwitch releases. (This is mainly based on speaking to Russell Bryant at Red Hat; he can fill in the details if need be.) I'd prefer to stick to our schedule for this reason. I propose that we branch today, with the following further details: * Continue to apply patches that add features, that as of now have already been posted and ready for review or iterating on reviews, until approximately this Friday. These patches will need to be committed both to master and branch-2.8. * As always, bug fix patches are acceptable on any branch at any time. * Plan to release, given acceptable quality, during the last week of August. Thoughts? Thanks, Ben. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] Prepare for 2.8.0.
Signed-off-by: Ben Pfaff --- This is intended to be the first commit on a newly created branch-2.8. NEWS | 2 +- configure.ac | 2 +- debian/changelog | 88 ++-- 3 files changed, 81 insertions(+), 11 deletions(-) diff --git a/NEWS b/NEWS index facea0228d3a..a9e4d253b918 100644 --- a/NEWS +++ b/NEWS @@ -1,4 +1,4 @@ -Post-v2.7.0 +v2.8.0 - xx xxx - - ovs-ofctl: * ovs-ofctl can now accept and display port names in place of numbers. By diff --git a/configure.ac b/configure.ac index 093b6ffed50e..aefe475bcac8 100644 --- a/configure.ac +++ b/configure.ac @@ -13,7 +13,7 @@ # limitations under the License. AC_PREREQ(2.63) -AC_INIT(openvswitch, 2.7.90, b...@openvswitch.org) +AC_INIT(openvswitch, 2.8.0, b...@openvswitch.org) AC_CONFIG_SRCDIR([datapath/datapath.c]) AC_CONFIG_MACRO_DIR([m4]) AC_CONFIG_AUX_DIR([build-aux]) diff --git a/debian/changelog b/debian/changelog index 45d566a872ad..e216bd4cabab 100644 --- a/debian/changelog +++ b/debian/changelog @@ -1,13 +1,83 @@ -openvswitch (2.7.90-1) unstable; urgency=low - +openvswitch (2.8.0-1) unstable; urgency=low [ Open vSwitch team ] - * Use debhelper for DEB_VERSION_UPSTREAM. Thanks to Clint Byrum - for this improvement. - * New upstream version - - Nothing yet! Try NEWS... - - -- Open vSwitch team Tue, 31 Jan 2017 17:51:32 -0700 - + - ovs-ofctl: + * ovs-ofctl can now accept and display port names in place of numbers. By + default it always accepts names and in interactive use it displays them; + use --names or --no-names to override. See ovs-ofctl(8) for details. + * "ovs-ofctl dump-flows" now accepts --no-stats to omit flow statistics. + - New ovs-dpctl command "ct-stats-show" to show connection tracking stats. + - Tunnels: + * Added support to set packet mark for tunnel endpoint using + `egress_pkt_mark` OVSDB option. + * When using Linux kernel datapath tunnels may be created using rtnetlink. + This will allow us to take advantage of new tunnel features without + having to make changes to the vport modules. + - EMC insertion probability is reduced to 1% and is configurable via + the new 'other_config:emc-insert-inv-prob' option. + - DPDK: + * DPDK log messages redirected to OVS logging subsystem. + Log level can be changed in a usual OVS way using + 'ovs-appctl vlog' commands for 'dpdk' module. Lower bound + still can be configured via extra arguments for DPDK EAL. + * dpdkvhostuser ports are marked as deprecated. They will be removed + in an upcoming release. + - IPFIX now provides additional counters: + * Total counters since metering process startup. + * Per-flow TCP flag counters. + * Multicast, broadcast, and unicast counters. + - New support for multiple VLANs (802.1ad or "QinQ"), including a new + "dot1q-tunnel" port VLAN mode. + - In ovn-vsctl and vtep-ctl, record UUIDs in commands may now be + abbreviated to 4 hex digits. + - Userspace Datapath: + * Added NAT support for userspace datapath. + - OVN: + * New built-in DNS support. + * IPAM for IPv4 can now exclude user-defined addresses from assignment. + * IPAM can now assign IPv6 addresses. + * Make the DHCPv4 router setting optional. + * Gratuitous ARP for NAT addresses on a distributed logical router. + * Allow ovn-controller SSL configuration to be obtained from vswitchd + database. + * ovn-trace now has basic support for tracing distributed firewalls. + * In ovn-nbctl and ovn-sbctl, record UUIDs in commands may now be + abbreviated to 4 hex digits. + * "ovn-sbctl lflow-list" can now print OpenFlow flows that correspond + to logical flows. + * Now uses OVSDB RBAC support to reduce impact of compromised hypervisors. + * Multiple chassis may now be specified for L3 gateways. When more than + one chassis is specified, OVN will manage high availability for that + gateway. + * Add support for ACL logging. + - Tracing with ofproto/trace now traces through recirculation. + - OVSDB: + * New support for role-based access control (see ovsdb-server(1)). + - New commands 'stp/show' and 'rstp/show' (see ovs-vswitchd(8)). + - OpenFlow: + * All features required by OpenFlow 1.4 are now implemented, so + ovs-vswitchd now enables OpenFlow 1.4 by default (in addition to + OpenFlow 1.0 to 1.3). + * Increased support for OpenFlow 1.6 (draft). + * Bundles now support hashing by just nw_src or nw_dst. + * The "learn" action now supports a "limit" option (see ovs-ofctl(8)). + * The port status bit OFPPS_LIVE now reflects link aliveness. + * OpenFlow 1.5 packet-out is now supported. + - Fedora Packaging: + * OVN services are no longer restarted automatically after upgrade. + - Add --cleanup option to command 'ovs-appctl exit' (see
Re: [ovs-dev] FW: [patch_v1] docs/dpdk: Consolidate pmd-cpu-mask references.
On 07/31/2017 04:56 PM, Darrell Ball wrote: Hi Greg Since you would be looking at the DPDK documentation these days; could you help review this patch ? Thanks Darrell The patch looks good to me Darrell. Replying to your forward since my email client seems to have dumped the original patch on the ovs-dev list. Reviewed-by: Greg rose -Original Message- From: on behalf of Darrell Ball Date: Thursday, July 27, 2017 at 4:50 PM To: "d...@openvswitch.org" Subject: [ovs-dev] [patch_v1] docs/dpdk: Consolidate pmd-cpu-mask references. The DPDK introductory documentation has various references to pmd-cpu-mask, including a section devoted to it. These parts of the documentation seeemed to have been written at different times and look like they were individually ported from other sources. They all include an example command which gets repeated several times. Here, we consolidate those referenes to make the documentation easier to maintain. At the same time, create linkages to the pmd-cpu-mask section from other sections to provide some level of coherence. Signed-off-by: Darrell Ball --- Documentation/intro/install/dpdk.rst | 141 --- 1 file changed, 65 insertions(+), 76 deletions(-) diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst index a05aa1a..2e9cf8d 100644 --- a/Documentation/intro/install/dpdk.rst +++ b/Documentation/intro/install/dpdk.rst @@ -237,20 +237,12 @@ or:: $ ovs-vsctl --no-wait set Open_vSwitch . \ other_config:dpdk-socket-mem="1024" -Similarly, if you wish to better scale the workloads across cores, then -multiple pmd threads can be created and pinned to CPU cores by explicity -specifying ``pmd-cpu-mask``. Cores are numbered from 0, so to spawn two pmd -threads and pin them to cores 1,2, run:: - -$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6 - -Refer to ovs-vswitchd.conf.db(5) for additional information on configuration -options. - .. note:: Changing any of these options requires restarting the ovs-vswitchd application +See the section ``Performance Tuning`` for important DPDK customizations. + Validating -- @@ -373,32 +365,6 @@ Now mount the huge pages, if not already done so:: $ mount -t hugetlbfs -o pagesize=1G none /dev/hugepages -Enable HyperThreading -~ - -With HyperThreading, or SMT, enabled, a physical core appears as two logical -cores. SMT can be utilized to spawn worker threads on logical cores of the same -physical core there by saving additional cores. - -With DPDK, when pinning pmd threads to logical cores, care must be taken to set -the correct bits of the ``pmd-cpu-mask`` to ensure that the pmd threads are -pinned to SMT siblings. - -Take a sample system configuration, with 2 sockets, 2 * 10 core processors, HT -enabled. This gives us a total of 40 logical cores. To identify the physical -core shared by two logical cores, run:: - -$ cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list - -where ``N`` is the logical core number. - -In this example, it would show that cores ``1`` and ``21`` share the same -physical core. As cores are counted from 0, the ``pmd-cpu-mask`` can be used -to enable these two pmd threads running on these two logical cores (one -physical core) is:: - -$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x22 - Isolate Cores ~ @@ -413,23 +379,6 @@ cmdline. It has been verified that core isolation has minimal advantage due to mature Linux scheduler in some circumstances. -NUMA/Cluster-on-Die -~~~ - -Ideally inter-NUMA datapaths should be avoided where possible as packets will -go across QPI and there may be a slight performance penalty when compared with -intra NUMA datapaths. On Intel Xeon Processor E5 v3, Cluster On Die is -introduced on models that have 10 cores or more. This makes it possible to -logically split a socket into two NUMA regions and again it is preferred where -possible to keep critical datapaths within the one cluster. - -It is good practice to ensure that threads that are in the datapath are pinned -to cores in the same NUMA area. e.g. pmd threads and QEMU vCPUs responsible for -forwarding. If DPDK is built with ``CONFIG_RTE_LIBRTE_VHOST_NUMA=y``, vHost -User ports automatically detect the NUMA socket of the QEMU vCPUs and will be -serviced by a PMD from the same node provided a core on this node is enabled in -the ``pmd-cpu-mask``. ``libnuma`` packages are req
Re: [ovs-dev] [PATCH] rhel: Fix typo in README.RHEL.rst
On Fri, Jul 28, 2017 at 3:02 PM, Timothy Redaelli wrote: > Replace systemctk with systemctl > > Signed-off-by: Timothy Redaelli > --- > rhel/README.RHEL.rst | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Thanks, applied to master. -- Russell Bryant ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] ovn-northd: Add native active-standby HA.
Add native support for active-standby HA in ovn-northd by having each instance attempt to acquire an OVSDB lock. Only the instance of ovn-northd that currently holds the lock will make active changes to the OVN databases. Signed-off-by: Russell Bryant --- NEWS| 1 + ovn/northd/ovn-northd.8.xml | 9 + ovn/northd/ovn-northd.c | 40 +++- 3 files changed, 41 insertions(+), 9 deletions(-) diff --git a/NEWS b/NEWS index facea0228..f3cdd2443 100644 --- a/NEWS +++ b/NEWS @@ -49,6 +49,7 @@ Post-v2.7.0 one chassis is specified, OVN will manage high availability for that gateway. * Add support for ACL logging. + * ovn-northd now has native support for active-standby high availability. - Tracing with ofproto/trace now traces through recirculation. - OVSDB: * New support for role-based access control (see ovsdb-server(1)). diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml index 1527e8a60..0d85ec0d2 100644 --- a/ovn/northd/ovn-northd.8.xml +++ b/ovn/northd/ovn-northd.8.xml @@ -72,6 +72,15 @@ +Active-Standby for High Availability + + You may run ovn-northd more than once in an OVN deployment. + OVN will automatically ensure that only one of them is active at a time. + If multiple instances of ovn-northd are running and the + active ovn-northd fails, one of the hot standby instances + of ovn-northd will automatically take over. + + Logical Flow Table Structure diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index 10e0c7ce0..3d2be4267 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -6531,6 +6531,12 @@ main(int argc, char *argv[]) ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg); ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name); +/* Ensure that only a single ovn-northd is active in the deployment by + * acquiring a lock called "ovn_northd" on the southbound database + * and then only performing DB transactions if the lock is held. */ +ovsdb_idl_set_lock(ovnsb_idl_loop.idl, "ovn_northd"); +bool had_lock = false; + /* Main loop. */ exiting = false; while (!exiting) { @@ -6541,15 +6547,29 @@ main(int argc, char *argv[]) .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop), }; -struct chassis_index chassis_index; -chassis_index_init(&chassis_index, ctx.ovnsb_idl); +if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) { +VLOG_INFO("ovn-northd lock acquired. " + "This ovn-northd instance is now active."); +had_lock = true; +} else if (had_lock && !ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) { +VLOG_INFO("ovn-northd lock lost. " + "This ovn-northd instance is now on standby."); +had_lock = false; +} -ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop); -ovnsb_db_run(&ctx, &ovnsb_idl_loop); -if (ctx.ovnsb_txn) { -check_and_add_supported_dhcp_opts_to_sb_db(&ctx); -check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx); -check_and_update_rbac(&ctx); +struct chassis_index chassis_index; +bool destroy_chassis_index = false; +if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) { +chassis_index_init(&chassis_index, ctx.ovnsb_idl); +destroy_chassis_index = true; + +ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop); +ovnsb_db_run(&ctx, &ovnsb_idl_loop); +if (ctx.ovnsb_txn) { +check_and_add_supported_dhcp_opts_to_sb_db(&ctx); +check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx); +check_and_update_rbac(&ctx); +} } unixctl_server_run(unixctl); @@ -6565,7 +6585,9 @@ main(int argc, char *argv[]) exiting = true; } -chassis_index_destroy(&chassis_index); +if (destroy_chassis_index) { +chassis_index_destroy(&chassis_index); +} } unixctl_server_destroy(unixctl); -- 2.13.3 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 2/5] dpif-netdev: Avoid reading RSS hash when EMC is disabled.
> -Original Message- > From: O Mahony, Billy > Sent: Monday, July 31, 2017 5:22 PM > To: Fischetti, Antonio ; d...@openvswitch.org > Subject: RE: [ovs-dev] [PATCH v2 2/5] dpif-netdev: Avoid reading RSS hash when > EMC is disabled. > > Hi Antonio, > > This is patch is definitely simpler than the original. > > However on the original patch I suggested: > > "If so it would be less disturbing to the existing code to just add a bool arg > to dpif_netdev_packet_get_rss_hash() called do_not_check_recirc_depth and use > that to return early (before the if (recirc_depth) check). Also in that case > the patch would require none of the conditional logic changes (neither the > original or that suggested in this email) and should be able to just set the > proposed do_not_check_recirc_depth based on md_is_valid." > > I know you checked this and reported the performance gain was lower than with > the v1 patch. We surmised that it was related to introducing a branch in the > dpif_netdev_packet_get_rss_hash(). However there are many branches in this > patch also. > > Can you give details of how you are testing? > * What is the traffic > * the flows/rules and > * how are you measuring the performance difference (ie. cycles per packet or > packet throughput or some other measure). [Antonio] I'm using a port-to-port setup, sending 1 UDP flow, 64B packets. 2 PMDs with 3 Tx queues. The biggest difference is with case A where the 5-tuple hash is computed in software. Case A) RSS Hash is disabled. I see for each side the Rx pkt rate: Orig OvS + previous patch#1: 1.281.29 =2.57 Mpps Orig OvS + previous patch#1 + this patch: 1.471.49 =2.96 Mpps Case B) In case RSS Hash is enabled I see more or less the same performance: Orig OvS + previous patch#1: 11.59 11.72 =23.31 Mpps Orig OvS + previous patch#1 + this patch: 11.45 11.84 =23.29 Mpps Case C) RSS Hash is enabled, EMC is disabled. Orig OvS + previous patch#1 + No EMC: 7.687.65 =15.33 Mpps Orig OvS + previous patch#1 + this patch: 7.627.73 =15.35 Mpps > > Apologies for going on about this but if we can't get the same effect with a > two or three line change than a 20line change I think it'll be worth it. > > One other comment below > > Thanks, > Billy. > > > > -Original Message- > > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- > > boun...@openvswitch.org] On Behalf Of antonio.fische...@intel.com > > Sent: Wednesday, July 19, 2017 5:05 PM > > To: d...@openvswitch.org > > Subject: [ovs-dev] [PATCH v2 2/5] dpif-netdev: Avoid reading RSS hash when > > EMC is disabled. > > > > When EMC is disabled the reading of RSS hash is skipped. > > [[BO'M]] I think this is already the case with the existing code? Just > addition of OVS_UNLIKELY on the check. [Antonio] No, in the existing code when EMC is disabled it just skips emc_lookup. miniflow_extract(packet, &key->mf); key->len = 0; /* Not computed yet. */ key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf); <--- Read Hash /* If EMC is disabled skip emc_lookup */ flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key); <--- Skip emc_lookup > > > For packets that are not recirculated it retrieves the hash value without > > considering the recirc id. > > > > This is mostly a preliminary change for the next patch in this series. > > > > Signed-off-by: Antonio Fischetti > > --- > > lib/dpif-netdev.c | 42 ++ > > 1 file changed, 34 insertions(+), 8 deletions(-) > > > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 123e04a..9562827 > > 100644 > > --- a/lib/dpif-netdev.c > > +++ b/lib/dpif-netdev.c > > @@ -4472,6 +4472,22 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread > > *pmd, struct dp_packet *packet_, } > > > > static inline uint32_t > > +dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet, > > +const struct miniflow *mf) { > > +uint32_t hash; > > + > > +if (OVS_LIKELY(dp_packet_rss_valid(packet))) { > > +hash = dp_packet_get_rss_hash(packet); > > +} else { > > +hash = miniflow_hash_5tuple(mf, 0); > > +dp_packet_set_rss_hash(packet, hash); > > +} > > + > > +return hash; > > +} > > + > > +static inline uint32_t > > dpif_netdev_packet_get_rss_hash(struct dp_packet *packet, > > const struct miniflow *mf) { @@ -4572,7 > +4588,8 @@ static > > inline size_t emc_processing(struct dp_netdev_pmd_thread *pmd, > > struct dp_packet_batch *packets_, > > struct netdev_flow_key *keys, > > - struct packet_batch_per_flow batches[], size_t *n_batches) > > + struct packet_batch_per_flow batches[], size_t *n_batches, > > + bool md_is_valid) > > { > > struct emc_cache *flow_cac
[ovs-dev] [PATCH v3 6/6] dpif-netdev: Add ovs-appctl dpif-netdev/pmd-rxq-rebalance.
Rxqs consumed processing cycles are used to improve the balance of how rxqs are assigned to pmds. Currently some reconfiguration is needed to perform a reassignment. Add an ovs-appctl command to perform a new assignment in order to balance based on the latest rxq processing cycle information. Note: Jan requested this for testing purposes. Suggested-by: Jan Scheurich Signed-off-by: Kevin Traynor --- Documentation/howto/dpdk.rst | 5 - lib/dpif-netdev.c| 35 +++ vswitchd/ovs-vswitchd.8.in | 2 ++ 3 files changed, 41 insertions(+), 1 deletion(-) diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst index c7bac4e..4642849 100644 --- a/Documentation/howto/dpdk.rst +++ b/Documentation/howto/dpdk.rst @@ -140,5 +140,8 @@ Core 7: Q4 (70%) | Q5 (10%) core 8: Q0 (60%) | Q0 (30%) -Rxq to pmds assignment takes place whenever there are configuration changes. +Rxq to pmds assignment takes place whenever there are configuration changes +or can be triggered by using:: + +$ ovs-appctl dpif-netdev/pmd-rxq-rebalance QoS diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 3f174c8..7c97c93 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -702,4 +702,6 @@ static inline bool emc_entry_alive(struct emc_entry *ce); static void emc_clear_entry(struct emc_entry *ce); +static void dp_netdev_request_reconfigure(struct dp_netdev *dp); + static void emc_cache_init(struct emc_cache *flow_cache) @@ -995,4 +997,34 @@ sorted_poll_thread_list(struct dp_netdev *dp, static void +dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, int argc, + const char *argv[], void *aux OVS_UNUSED) +{ +struct ds reply = DS_EMPTY_INITIALIZER; +struct dp_netdev *dp = NULL; + +ovs_mutex_lock(&dp_netdev_mutex); + +if (argc == 2) { +dp = shash_find_data(&dp_netdevs, argv[1]); +} else if (shash_count(&dp_netdevs) == 1) { +/* There's only one datapath */ +dp = shash_first(&dp_netdevs)->data; +} + +if (!dp) { +ovs_mutex_unlock(&dp_netdev_mutex); +unixctl_command_reply_error(conn, +"please specify an existing datapath"); +return; +} + +dp_netdev_request_reconfigure(dp); +ovs_mutex_unlock(&dp_netdev_mutex); +ds_put_cstr(&reply, "pmd rxq rebalance requested.\n"); +unixctl_command_reply(conn, ds_cstr(&reply)); +ds_destroy(&reply); +} + +static void dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], void *aux) @@ -1073,4 +1105,7 @@ dpif_netdev_init(void) 0, 1, dpif_netdev_pmd_info, (void *)&poll_aux); +unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]", + 0, 1, dpif_netdev_pmd_rebalance, + NULL); return 0; } diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in index dfd209e..c18baf6 100644 --- a/vswitchd/ovs-vswitchd.8.in +++ b/vswitchd/ovs-vswitchd.8.in @@ -281,4 +281,6 @@ bridge statistics, only the values shown by the above command. For each pmd thread of the datapath \fIdp\fR shows list of queue-ids with port names, which this thread polls. +.IP "\fBdpif-netdev/pmd-rxq-rebalance\fR [\fIdp\fR]" +Reassigns rxqs to pmds in the datapath \fIdp\fR based on their current usage. . .so ofproto/ofproto-dpif-unixctl.man -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 5/6] dpif-netdev: Change pmd selection order.
Up to his point rxqs are sorted by processing cycles they consumed and assigned to pmds in a round robin manner. Ian pointed out that on wrap around the most loaded pmd will be the next one to be assigned an additional rxq and that it would be better to reverse the pmd order when wraparound occurs. In other words, change from assigning by rr to assigning in a forward and reverse cycle through pmds. Also, now that the algothim has finalised, document an example. Suggested-by: Ian Stokes Signed-off-by: Kevin Traynor --- Documentation/howto/dpdk.rst | 16 lib/dpif-netdev.c| 21 - tests/pmd.at | 2 +- 3 files changed, 37 insertions(+), 2 deletions(-) diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst index a969285..c7bac4e 100644 --- a/Documentation/howto/dpdk.rst +++ b/Documentation/howto/dpdk.rst @@ -124,4 +124,20 @@ will be used where known to assign rxqs with the highest consumption of processing cycles to different pmds. +For example, in the case where here there are 5 rxqs and 3 cores (e.g. 3,7,8) +available, and the measured usage of core cycles per rxq over the last +interval is seen to be: + +- Queue #0: 30% +- Queue #1: 80% +- Queue #3: 60% +- Queue #4: 70% +- Queue #5: 10% + +The rxqs will be assigned to cores 3,7,8 in the following order: + +Core 3: Q1 (80%) | +Core 7: Q4 (70%) | Q5 (10%) +core 8: Q0 (60%) | Q0 (30%) + Rxq to pmds assignment takes place whenever there are configuration changes. diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index a05e586..3f174c8 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -3236,4 +3236,5 @@ struct rr_numa { int cur_index; +bool idx_inc; }; @@ -3274,4 +3275,7 @@ rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr) numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa->pmds); numa->pmds[numa->n_pmds - 1] = pmd; +/* At least one pmd so initialise curr_idx and idx_inc. */ +numa->cur_index = 0; +numa->idx_inc = true; } } @@ -3280,5 +3284,20 @@ static struct dp_netdev_pmd_thread * rr_numa_get_pmd(struct rr_numa *numa) { -return numa->pmds[numa->cur_index++ % numa->n_pmds]; +int numa_idx = numa->cur_index; + +if (numa->idx_inc == true) { +if (numa->cur_index == numa->n_pmds-1) { +numa->idx_inc = false; +} else { +numa->cur_index++; +} +} else { +if (numa->cur_index == 0) { +numa->idx_inc = true; +} else { +numa->cur_index--; +} +} +return numa->pmds[numa_idx]; } diff --git a/tests/pmd.at b/tests/pmd.at index f95a016..73a8be1 100644 --- a/tests/pmd.at +++ b/tests/pmd.at @@ -54,5 +54,5 @@ m4_define([CHECK_PMD_THREADS_CREATED], [ m4_define([SED_NUMA_CORE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id \)[[0-9]]*:/\1\2:/"]) -m4_define([SED_NUMA_CORE_QUEUE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id \)[[0-9]]*:/\1\2:/;s/\(queue-id: \)0 2 4 6/\1/;s/\(queue-id: \)1 3 5 7/\1/"]) +m4_define([SED_NUMA_CORE_QUEUE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id \)[[0-9]]*:/\1\2:/;s/\(queue-id: \)1 2 5 6/\1/;s/\(queue-id: \)0 3 4 7/\1/"]) m4_define([DUMMY_NUMA], [--dummy-numa="0,0,0,0"]) -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 4/6] dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
Previously rxqs were assigned to pmds by round robin in port/queue order. Now that we have the processing cycles used for existing rxqs, use that information to try and produced a better balanced distribution of rxqs across pmds. i.e. given multiple pmds, the rxqs which have consumed the largest amount of processing cycles will be placed on different pmds. The rxqs are sorted by their processing cycles and assigned (in sorted order) round robin across pmds. Signed-off-by: Kevin Traynor --- Documentation/howto/dpdk.rst | 7 + lib/dpif-netdev.c| 73 +++- 2 files changed, 66 insertions(+), 14 deletions(-) diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst index af01d3e..a969285 100644 --- a/Documentation/howto/dpdk.rst +++ b/Documentation/howto/dpdk.rst @@ -119,4 +119,11 @@ After that PMD threads on cores where RX queues was pinned will become thread. +If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores) +automatically. The processing cycles that have been required for each rxq +will be used where known to assign rxqs with the highest consumption of +processing cycles to different pmds. + +Rxq to pmds assignment takes place whenever there are configuration changes. + QoS --- diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 25a521a..a05e586 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -3295,8 +3295,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr) } +/* Sort Rx Queues by the processing cycles they are consuming. */ +static int +rxq_cycle_sort(const void *a, const void *b) +{ +struct dp_netdev_rxq * qa; +struct dp_netdev_rxq * qb; + +qa = *(struct dp_netdev_rxq **) a; +qb = *(struct dp_netdev_rxq **) b; + +if (dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_LAST) >= +dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_LAST)) { +return -1; +} + +return 1; +} + /* Assign pmds to queues. If 'pinned' is true, assign pmds to pinned * queues and marks the pmds as isolated. Otherwise, assign non isolated * pmds to unpinned queues. * + * If 'pinned' is false queues will be sorted by processing cycles they are + * consuming and then assigned to pmds in round robin order. + * * The function doesn't touch the pmd threads, it just stores the assignment * in the 'pmd' member of each rxq. */ @@ -3306,18 +3327,14 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex) struct dp_netdev_port *port; struct rr_numa_list rr; - -rr_numa_list_populate(dp, &rr); +struct dp_netdev_rxq ** rxqs = NULL; +int i, n_rxqs = 0; +struct rr_numa *numa = NULL; +int numa_id; HMAP_FOR_EACH (port, node, &dp->ports) { -struct rr_numa *numa; -int numa_id; - if (!netdev_is_pmd(port->netdev)) { continue; } -numa_id = netdev_get_numa_id(port->netdev); -numa = rr_numa_list_lookup(&rr, numa_id); - for (int qid = 0; qid < port->n_rxq; qid++) { struct dp_netdev_rxq *q = &port->rxqs[qid]; @@ -3337,17 +3354,45 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex) } } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) { -if (!numa) { -VLOG_WARN("There's no available (non isolated) pmd thread " - "on numa node %d. Queue %d on port \'%s\' will " - "not be polled.", - numa_id, qid, netdev_get_name(port->netdev)); +if (n_rxqs == 0) { +rxqs = xmalloc(sizeof *rxqs); } else { -q->pmd = rr_numa_get_pmd(numa); +rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1)); } +/* Store the queue. */ +rxqs[n_rxqs++] = q; } } } +if (n_rxqs > 1) { +/* Sort the queues in order of the processing cycles + * they consumed during their last pmd interval. */ +qsort(rxqs, n_rxqs, sizeof *rxqs, rxq_cycle_sort); +} + +rr_numa_list_populate(dp, &rr); +/* Assign the sorted queues to pmds in round robin. */ +for (i = 0; i < n_rxqs; i++) { +numa_id = netdev_get_numa_id(rxqs[i]->port->netdev); +numa = rr_numa_list_lookup(&rr, numa_id); +if (!numa) { +VLOG_WARN("There's no available (non isolated) pmd thread " + "on numa node %d. Queue %d on port \'%s\' will " + "not be polled.", + numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx), + netdev_get_name(rxqs[i]->port->netdev)); +continue; +} +rxqs[i]->pmd = rr_numa_get_pmd(numa); +VLOG_INFO("Core %d on numa node %d assigned port \'%s\' " + "rx queue %d (mea
[ovs-dev] [PATCH v3 3/6] dpif-netdev: Count the rxq processing cycles for an rxq.
Count the cycles used for processing an rxq during the pmd rxq interval. As this is an in flight counter and pmds run independently, also store the total cycles used during the last full interval. Signed-off-by: Kevin Traynor --- lib/dpif-netdev.c | 65 ++- 1 file changed, 60 insertions(+), 5 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 0a4daf9..25a521a 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -181,4 +181,8 @@ struct emc_cache { #define DPCLS_OPTIMIZATION_INTERVAL 1000 +/* Time in ms of the interval in which rxq processing cycles used in + * rxq to pmd assignments is measured and stored. */ +#define PMD_RXQ_INTERVAL 1000 + struct dpcls { struct cmap_node node; /* Within dp_netdev_pmd_thread.classifiers */ @@ -554,4 +558,6 @@ struct dp_netdev_pmd_thread { /* Periodically sort subtable vectors according to hit frequencies */ long long int next_optimization; +/* Periodically store the processing cycles used for each rxq. */ +long long int rxq_interval; /* Statistics. */ @@ -678,5 +684,13 @@ static void pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd) OVS_REQUIRES(pmd->port_mutex); static inline void -dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd); +dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd, + struct polled_queue *poll_list, int poll_cnt); +static void +dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx, + enum rxq_cycles_counter_type type, + unsigned long long cycles); +static uint64_t +dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx, + enum rxq_cycles_counter_type type); static void dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd, @@ -3099,4 +3113,21 @@ cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd, } +static void +dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx, + enum rxq_cycles_counter_type type, + unsigned long long cycles) +{ + atomic_store_relaxed(&rx->cycles[type], cycles); +} + +static uint64_t +dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx, + enum rxq_cycles_counter_type type) +{ +unsigned long long tmp; +atomic_read_relaxed(&rx->cycles[type], &tmp); +return tmp; +} + static int dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, @@ -3143,5 +3174,5 @@ port_reconfigure(struct dp_netdev_port *port) { struct netdev *netdev = port->netdev; -int i, err; +int i, err, last_nrxq; port->need_reconfigure = false; @@ -3152,4 +3183,5 @@ port_reconfigure(struct dp_netdev_port *port) port->rxqs[i].rx = NULL; } +last_nrxq = port->n_rxq; port->n_rxq = 0; @@ -3172,4 +3204,9 @@ port_reconfigure(struct dp_netdev_port *port) for (i = 0; i < netdev_n_rxq(netdev); i++) { port->rxqs[i].port = port; +if (i >= last_nrxq) { +/* Only reset cycle stats for new queues */ +dp_netdev_rxq_set_cycles(&port->rxqs[i], RXQ_CYCLES_PROC_CURR, 0); +dp_netdev_rxq_set_cycles(&port->rxqs[i], RXQ_CYCLES_PROC_LAST, 0); +} err = netdev_rxq_open(netdev, &port->rxqs[i].rx, i); if (err) { @@ -3765,5 +3802,5 @@ reload: dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx, poll_list[i].port_no); -cycles_count_intermediate(pmd, NULL, +cycles_count_intermediate(pmd, poll_list[i].rxq, process_packets ? PMD_CYCLES_PROCESSING : PMD_CYCLES_IDLE); @@ -3776,5 +3813,5 @@ reload: coverage_try_clear(); -dp_netdev_pmd_try_optimize(pmd); +dp_netdev_pmd_try_optimize(pmd, poll_list, poll_cnt); if (!ovsrcu_try_quiesce()) { emc_cache_slow_sweep(&pmd->flow_cache); @@ -4221,4 +4258,5 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp, cmap_init(&pmd->classifiers); pmd->next_optimization = time_msec() + DPCLS_OPTIMIZATION_INTERVAL; +pmd->rxq_interval = time_msec() + PMD_RXQ_INTERVAL; hmap_init(&pmd->poll_list); hmap_init(&pmd->tx_ports); @@ -5654,8 +5692,25 @@ dpcls_sort_subtable_vector(struct dpcls *cls) static inline void -dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd) +dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd, + struct polled_queue *poll_list, int poll_cnt) { struct dpcls *cls; long long int now = time_msec(); +int i; +uint64_t rxq_cyc_curr; + +if (now > pmd->rxq_interval) { +/* Get the cycles that were used to process each queue and store. */ +for (i = 0; i < poll_cnt; i++) { +rxq_cyc_curr = dp_netdev_rxq_get_
[ovs-dev] [PATCH v3 2/6] dpif-netdev: Add rxq processing cycle counters.
Add two counters to dp_netdev_rxq which will be used for storing the processing cycles of an rxq. Processing cycles will be stored in reference to a defined interval. One counter is used for storing cycles during the current in progress interval, while the other is used to store the cycles of the last fully complete interval. cycles_count_intermediate was used to count cycles for a pmd. With some small additions we can also use it to count the cycles used for processing an rxq. Signed-off-by: Kevin Traynor --- lib/dpif-netdev.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 62a3bce..0a4daf9 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -336,4 +336,11 @@ enum pmd_cycles_counter_type { }; +enum rxq_cycles_counter_type { +RXQ_CYCLES_PROC_CURR, /* Cycles spent successfully polling and + processing polled packets */ +RXQ_CYCLES_PROC_LAST, +RXQ_N_CYCLES +}; + #define XPS_TIMEOUT_MS 500LL @@ -347,4 +354,5 @@ struct dp_netdev_rxq { particular core. */ struct dp_netdev_pmd_thread *pmd; /* pmd thread that polls this queue. */ +atomic_ullong cycles[RXQ_N_CYCLES]; }; @@ -671,5 +679,4 @@ static void pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd) static inline void dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd); - static void dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd, @@ -3077,4 +3084,5 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd, static inline void cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd, + struct dp_netdev_rxq *rxq, enum pmd_cycles_counter_type type) OVS_NO_THREAD_SAFETY_ANALYSIS @@ -3085,4 +3093,8 @@ cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd, non_atomic_ullong_add(&pmd->cycles.n[type], interval); +if (rxq && (type == PMD_CYCLES_PROCESSING)) { +/* Add to the amount of current processing cycles. */ +non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval); +} } @@ -3589,5 +3601,5 @@ dpif_netdev_run(struct dpif *dpif) port->rxqs[i].rx, port->port_no); -cycles_count_intermediate(non_pmd, process_packets ? +cycles_count_intermediate(non_pmd, NULL, process_packets ? PMD_CYCLES_PROCESSING : PMD_CYCLES_IDLE); @@ -3753,5 +3765,5 @@ reload: dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx, poll_list[i].port_no); -cycles_count_intermediate(pmd, +cycles_count_intermediate(pmd, NULL, process_packets ? PMD_CYCLES_PROCESSING : PMD_CYCLES_IDLE); -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 1/6] dpif-netdev: Change polled_queue to use dp_netdev_rxq.
Soon we will want to store processing cycle counts in the dp_netdev_rxq, so use that as a basis for the polled_queue that pmd_thread_main uses. Signed-off-by: Kevin Traynor --- lib/dpif-netdev.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 47a9fa0..62a3bce 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -482,5 +482,5 @@ struct dp_netdev_pmd_cycles { struct polled_queue { -struct netdev_rxq *rx; +struct dp_netdev_rxq *rxq; odp_port_t port_no; }; @@ -3698,5 +3698,5 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread *pmd, i = 0; HMAP_FOR_EACH (poll, node, &pmd->poll_list) { -poll_list[i].rx = poll->rxq->rx; +poll_list[i].rxq = poll->rxq; poll_list[i].port_no = poll->rxq->port->port_no; i++; @@ -3735,6 +3735,6 @@ reload: for (i = 0; i < poll_cnt; i++) { VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n", -pmd->core_id, netdev_rxq_get_name(poll_list[i].rx), -netdev_rxq_get_queue_id(poll_list[i].rx)); +pmd->core_id, netdev_rxq_get_name(poll_list[i].rxq->rx), +netdev_rxq_get_queue_id(poll_list[i].rxq->rx)); } @@ -3751,5 +3751,5 @@ reload: for (i = 0; i < poll_cnt; i++) { process_packets = -dp_netdev_process_rxq_port(pmd, poll_list[i].rx, +dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx, poll_list[i].port_no); cycles_count_intermediate(pmd, -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 0/6] OVS-DPDK rxq to pmd assignment improvements.
For the DPDK datapath, by default rxqs are assigned to available pmds in round robin order with no weight or priority. It can happen that some very busy queues are handled by one pmd which does not have enough cycles to prevent packets being dropped on them. While at the same time another pmd which handles queues with no traffic on them is essentially idle. Rxq to pmd assignment happens as a result of a number of events and when it does, the same unweighted round robin approach is applied each time. This patchset proposes to improve the round robin nature of rxq to pmd assignment by counting the processing cycles used by the rxqs during their operation and incorporating that data into assignment. Before assigning in a round robin manner, the rxqs will be sorted in order of the processing cycles they have been consuming. Assuming multiple pmds, this ensures that the rxqs measured to be using the most processing cycles will be assigned to different cores. In some cases the measured cycles for an rxq may be not available as the rxq is new or may not be useful for assignment as traffic patterns may change. In those cases the code will essentially fallback to being round round similar to what currently exists. However, in the case where data is available and a reliable indication of future rxq cycles consumption, rxq to pmd distribution will be much improved. V2 -> V3 Dropped v2 1/7 as not reusing dpcls optimisation interval anymore 2/6 Moved unused functions to 3/6 to avoid compiler warning 3/6 Made pmd rxq interval independent from dpcls opt interval 4/6 Moved docs about rebalance command to when it is available in 6/6 Added logging info for pmd to rxq assignment 5/6 Added an example to docs 6/6 Noted in commit msg that Jan requested this for testing purposes V1 -> V2 Dropped Ciara's patch to change how pmd cycles are counted as it merged. 6/7: Rebased unit tests. RFC -> PATCH Changed order of patches, squashed two and added some new patches. 1/8 Added Ciara's patch into this series so it can be applied in one go. It's a standalone patch but has not merged yet. 2/8 No change. 3/8 No change. 4/8 Merged patch that adds counters with patch that updates them through cycle_count_intermediate to remove unsed function warnings. Consolidated counter functions by using enum as suggested by Ian Stokes. Small changes to use new counter functions. 5/8 Minor changes to use new counter functions. 6/8 Minor changes to use new counter functions. Added docs. 7/8 Kevin Traynor (6): dpif-netdev: Change polled_queue to use dp_netdev_rxq. dpif-netdev: Add rxq processing cycle counters. dpif-netdev: Count the rxq processing cycles for an rxq. dpif-netdev: Change rxq_scheduling to use rxq processing cycles. dpif-netdev: Change pmd selection order. dpif-netdev: Add ovs-appctl dpif-netdev/pmd-rxq-rebalance. Documentation/howto/dpdk.rst | 26 + lib/dpif-netdev.c| 220 +-- tests/pmd.at | 2 +- vswitchd/ovs-vswitchd.8.in | 2 + 4 files changed, 222 insertions(+), 28 deletions(-) -- 1.8.3.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v4 1/2] OF support and translation of generic encap and decap
On Tue, Aug 01, 2017 at 08:06:59PM +0800, Yi Yang wrote: > From: Jan Scheurich > > This commit adds support for the OpenFlow actions generic encap > and decap (as specified in ONF EXT-382) to the OVS control plane. > > CLI syntax for encap action with properties: > encap(hdr=) > encap(hdr=, > prop(class=,type=,val=), > prop(class=,type=,val())) > > CLI syntax for decap action: > decap() > decap(packet_type(ns=,type=)) > > The first header supported for encap and decap is "ethernet" to convert > packets between packet_type (1,Ethertype) and (0,0). > > This commit also implements a skeleton for the translation of generic > encap and decap actions in ofproto-dpif and adds support to encap and > decap an Ethernet header. > > In general translation of encap commits pending actions and then rewrites > struct flow in accordance with the new packet type and header. In the > case of encap(ethernet) it suffices to change the packet type from > (1, Ethertype) to (0,0) and set the dl_type accordingly. A new > pending_encap flag in xlate ctx is set to mark that an corresponding > datapath encap action must be triggered at the next commit. In the > case of encap(ethernet) ofproto generetas a push_eth action. > > The general case for translation of decap() is to emit a datapath action > to decap the current outermost header and then recirculate the packet > to reparse the inner headers. In the special case of an Ethernet packet, > decap() just changes the packet type from (0,0) to (1, dl_type) without > a need to recirculate. The emission of the pop_eth action for the > datapath is postponed to the next commit. > > Hence encap(ethernet) and decap() on an Ethernet packet are OF octions > that only incur a cost in the dataplane when a modifed packet is > actually committed, e.g. because it is sent out. They can freely be > used for normalizing the packet type in the OF pipeline without > degrading performance. > > Signed-off-by: Jan Scheurich > Signed-off-by: Yi Yang > Signed-off-by: Zoltan Balogh > Co-authored-by: Zoltan Balogh Thanks for iterating so quickly! Besides the syntax for properties, which I still think should be simplified to e.g. nsh(md_type=1), I have only a few remaining comments. I don't think there's any value in the n_props member in nx_action_encap. (This is about the OpenFlow wire format now, not the internal format.) The properties are the whole tail of the structure and having a count doesn't make anything easier. Can you remove it? It will allow us to drop 8 bytes from the action structure due to padding. (In case it can't be removed, I'm providing a spelling fix.) decode_ed_prop() still doesn't check the length properly, so I'm providing a fix. I'm appending my suggested incremental. Thanks again! --8<--cut here-->8-- diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index d0437f20922a..c7d47eb5dd71 100644 --- a/lib/ofp-actions.c +++ b/lib/ofp-actions.c @@ -4036,7 +4036,7 @@ struct nx_action_encap { ovs_be16 subtype; /* NXAST_ENCAP. */ ovs_be16 hdr_size; /* Header size in bytes, 0 = 'not specified'.*/ ovs_be32 new_pkt_type; /* Header type to add and PACKET_TYPE of result. */ -ovs_be16 n_props; /* Number of the following endecap properties. */ +ovs_be16 n_props; /* Number of the following encap properties. */ uint8_t pad[6];/* Align to 8 bytes boundary */ struct ofp_ed_prop_header props[]; /* Encap TLV properties. */ }; @@ -4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap *nad, enum ofp_version ofp_version OVS_UNUSED, struct ofpbuf *ofpacts) { +if (ntohs(nad->len) > sizeof *nad) { +/* No properties supported yet. */ +return OFPERR_OFPBPC_BAD_TYPE; +} + struct ofpact_decap * decap; decap = ofpact_put_DECAP(ofpacts); diff --git a/lib/ofp-ed-props.c b/lib/ofp-ed-props.c index 260adc30acf0..6bba3ac7d982 100644 --- a/lib/ofp-ed-props.c +++ b/lib/ofp-ed-props.c @@ -32,6 +32,9 @@ decode_ed_prop(const struct ofp_ed_prop_header **ofp_prop, uint16_t prop_class = ntohs((*ofp_prop)->prop_class); size_t len = (*ofp_prop)->len; size_t pad_len = ROUND_UP(len, 8); +if (pad_len > *remaining) { +return OFPERR_OFPBAC_BAD_LEN; +} switch (prop_class) { default: ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 6/7] dpif-netdev: Change pmd selection order.
On 07/22/2017 03:52 PM, Stokes, Ian wrote: >> Up to his point rxqs are sorted by processing cycles they consumed and >> assigned to pmds in a round robin manner. >> >> Ian pointed out that on wrap around the most loaded pmd will be the next >> one to be assigned an additional rxq and that it would be better to >> reverse the pmd order when wraparound occurs. >> >> In other words, change from assigning by rr to assigning in a forward and >> reverse cycle through pmds. >> >> Suggested-by: Ian Stokes >> Signed-off-by: Kevin Traynor >> --- >> lib/dpif-netdev.c | 21 - >> tests/pmd.at | 2 +- >> 2 files changed, 21 insertions(+), 2 deletions(-) >> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 7663dba..31f0ed4 >> 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -3230,4 +3230,5 @@ struct rr_numa { >> >> int cur_index; >> +bool idx_inc; >> }; >> >> @@ -3268,4 +3269,7 @@ rr_numa_list_populate(struct dp_netdev *dp, struct >> rr_numa_list *rr) >> numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa- >>> pmds); >> numa->pmds[numa->n_pmds - 1] = pmd; >> +/* At least one pmd so initialise curr_idx and idx_inc. */ >> +numa->cur_index = 0; >> +numa->idx_inc = true; >> } >> } >> @@ -3274,5 +3278,20 @@ static struct dp_netdev_pmd_thread * >> rr_numa_get_pmd(struct rr_numa *numa) { >> -return numa->pmds[numa->cur_index++ % numa->n_pmds]; >> +int numa_idx = numa->cur_index; >> + >> +if (numa->idx_inc == true) { >> +if (numa->cur_index == numa->n_pmds-1) { >> +numa->idx_inc = false; >> +} else { >> +numa->cur_index++; >> +} >> +} else { >> +if (numa->cur_index == 0) { >> +numa->idx_inc = true; >> +} else { >> +numa->cur_index--; >> +} >> +} >> +return numa->pmds[numa_idx]; >> } >> > I like the above approach, as there's a bit more complexity introduced into > the numa selection process, I'm wondering does it make sense to add a > VLOG_DBG message here regarding the current index and index > increments/decrements? > > I'm just thinking that this could be an area that has an impact on > performance and could be useful for someone to help debug the assignments. > Good idea. I didn't add it here as this function just knows about the pmd index. I've added it to the rxq_scheduling function as part of 4/6 where all the information about the pmds, queues, cycles etc. is available and put it at info level. >> diff --git a/tests/pmd.at b/tests/pmd.at index f95a016..73a8be1 100644 >> --- a/tests/pmd.at >> +++ b/tests/pmd.at >> @@ -54,5 +54,5 @@ m4_define([CHECK_PMD_THREADS_CREATED], [ >> >> m4_define([SED_NUMA_CORE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id >> \)[[0-9]]*:/\1\2:/"]) >> -m4_define([SED_NUMA_CORE_QUEUE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( >> core_id \)[[0-9]]*:/\1\2:/;s/\(queue-id: \)0 2 4 >> 6/\1/;s/\(queue-id: \)1 3 5 7/\1/"]) >> +m4_define([SED_NUMA_CORE_QUEUE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( >> +core_id \)[[0-9]]*:/\1\2:/;s/\(queue-id: \)1 2 5 >> +6/\1/;s/\(queue-id: \)0 3 4 7/\1/"]) >> m4_define([DUMMY_NUMA], [--dummy-numa="0,0,0,0"]) >> >> -- >> 1.8.3.1 > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 5/7] dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
On 07/22/2017 03:52 PM, Stokes, Ian wrote: >> Previously rxqs were assigned to pmds by round robin in port/queue order. >> >> Now that we have the processing cycles used for existing rxqs, use that >> information to try and produced a better balanced distribution of rxqs >> across pmds. i.e. given multiple pmds, the rxqs which have consumed the >> largest amount of processing cycles will be placed on different pmds. >> >> The rxqs are sorted by their processing cycles and assigned (in sorted >> order) round robin across pmds. >> >> Signed-off-by: Kevin Traynor >> --- >> Documentation/howto/dpdk.rst | 10 +++ >> lib/dpif-netdev.c| 67 +++ >> - >> 2 files changed, 63 insertions(+), 14 deletions(-) >> >> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst >> index af01d3e..d9ac8d3 100644 >> --- a/Documentation/howto/dpdk.rst >> +++ b/Documentation/howto/dpdk.rst >> @@ -119,4 +119,14 @@ After that PMD threads on cores where RX queues was >> pinned will become >>thread. >> >> +If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds >> +automatically. The processing cycles that have been required for each >> +rxq will be used where known to assign rxqs with the highest >> +consumption of processing cycles to different pmds. >> + >> +Rxq to pmds assignment takes place whenever there are configuration >> +changes or can be triggered by using:: >> + >> +$ ovs-appctl dpif-netdev/pmd-rxq-rebalance >> + > I think an illustrated example of the expected assignment behavior would be > beneficial here to give users a feel for what's happening under the hood. > > Something simple like how 4 queues would be distributed over 3 pmds, although > this change might make more sense to be rolled in with patch 6 when the pmd > selection process is modified. > Sure. Yeah, I agree it makes more sense when the algorithm is finalized, so I added it there. >> QoS >> --- >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 185de9b..7663dba >> 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -3289,8 +3289,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr) } >> >> +/* Sort Rx Queues by the processing cycles they are consuming. */ >> +static int rxq_cycle_sort(const void *a, const void *b) { >> +struct dp_netdev_rxq * qa; >> +struct dp_netdev_rxq * qb; >> + >> +qa = *(struct dp_netdev_rxq **) a; >> +qb = *(struct dp_netdev_rxq **) b; >> + >> +if (dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_LAST) >= >> +dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_LAST)) { >> +return -1; >> +} >> + >> +return 1; >> +} >> + >> /* Assign pmds to queues. If 'pinned' is true, assign pmds to pinned >> * queues and marks the pmds as isolated. Otherwise, assign non isolated >> * pmds to unpinned queues. >> * >> + * If 'pinned' is false queues will be sorted by processing cycles they >> + are >> + * consuming and then assigned to pmds in round robin order. >> + * >> * The function doesn't touch the pmd threads, it just stores the >> assignment >> * in the 'pmd' member of each rxq. */ >> @@ -3300,18 +3321,14 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) >> OVS_REQUIRES(dp->port_mutex) >> struct dp_netdev_port *port; >> struct rr_numa_list rr; >> - >> -rr_numa_list_populate(dp, &rr); >> +struct dp_netdev_rxq ** rxqs = NULL; >> +int i, n_rxqs = 0; >> +struct rr_numa *numa = NULL; >> +int numa_id; >> >> HMAP_FOR_EACH (port, node, &dp->ports) { >> -struct rr_numa *numa; >> -int numa_id; >> - >> if (!netdev_is_pmd(port->netdev)) { >> continue; >> } >> >> -numa_id = netdev_get_numa_id(port->netdev); >> -numa = rr_numa_list_lookup(&rr, numa_id); >> - >> for (int qid = 0; qid < port->n_rxq; qid++) { >> struct dp_netdev_rxq *q = &port->rxqs[qid]; @@ -3331,17 >> +3348,39 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) >> OVS_REQUIRES(dp->port_mutex) >> } >> } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) { >> -if (!numa) { >> -VLOG_WARN("There's no available (non isolated) pmd >> thread " >> - "on numa node %d. Queue %d on port \'%s\' >> will " >> - "not be polled.", >> - numa_id, qid, netdev_get_name(port- >>> netdev)); >> +if (n_rxqs == 0) { >> +rxqs = xmalloc(sizeof *rxqs); >> } else { >> -q->pmd = rr_numa_get_pmd(numa); >> +rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1)); >> } >> +/* Store the queue. */ >> +rxqs[n_rxqs++] = q; >> } >> } >> } >> >> +if (n_rxqs > 1) { >> +/* Sort the queues in order of the process
Re: [ovs-dev] [PATCH v2 3/7] dpif-netdev: Add rxq processing cycle counters.
On 07/22/2017 03:51 PM, Stokes, Ian wrote: >> Add two counters to dp_netdev_rxq which will be used for storing the >> processing cycles of an rxq. Processing cycles will be stored in reference >> to a defined interval. One counter is used for storing cycles during the >> current in progress interval, while the other is used to store the cycles >> of the last fully complete interval. >> >> cycles_count_intermediate was used to count cycles for a pmd. With some >> small additions we can also use it to count the cycles used for processing >> an rxq. >> >> Signed-off-by: Kevin Traynor > > Think I flagged this before but OVS doesn't compile cleanly after applying > this patch. > > lib/dpif-netdev.c:3117:1: error: 'dp_netdev_rxq_get_cycles' defined but not > used [-Werror=unused-function] > dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx, > ^~~~ > lib/dpif-netdev.c:3109:1: error: 'dp_netdev_rxq_set_cycles' defined but not > used [-Werror=unused-function] > dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx, > > Consider rolling it into patch 4 of the series to avoid the issue. Hmm, I thought I had fixed this but obviously not :( As you suggested, I moved them to the next patch where they are first used. >> --- >> lib/dpif-netdev.c | 42 +++--- >> 1 file changed, 39 insertions(+), 3 deletions(-) >> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 0b8a0c8..273db38 >> 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -336,4 +336,11 @@ enum pmd_cycles_counter_type { }; >> >> +enum rxq_cycles_counter_type { >> +RXQ_CYCLES_PROC_CURR, /* Cycles spent successfully polling and >> + processing polled packets */ >> +RXQ_CYCLES_PROC_LAST, >> +RXQ_N_CYCLES >> +}; >> + >> #define XPS_TIMEOUT_MS 500LL >> >> @@ -347,4 +354,5 @@ struct dp_netdev_rxq { >>particular core. */ >> struct dp_netdev_pmd_thread *pmd; /* pmd thread that polls this >> queue. */ >> +atomic_ullong cycles[RXQ_N_CYCLES]; /* Processing cycles. */ >> }; > Very minor nit but not crazy about the alignment of the comments within the > struct above. > I just deleted the comment as it's not really necessary and it won't line up with the rest correctly. >> >> @@ -671,5 +679,11 @@ static void pmd_load_cached_ports(struct >> dp_netdev_pmd_thread *pmd) static inline void >> dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd); >> - >> +static void >> +dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx, >> + enum rxq_cycles_counter_type type, >> + unsigned long long cycles); static uint64_t >> +dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx, >> + enum rxq_cycles_counter_type type); >> static void >> dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd, @@ >> -3077,4 +3091,5 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd, >> static inline void cycles_count_intermediate(struct dp_netdev_pmd_thread >> *pmd, >> + struct dp_netdev_rxq *rxq, >>enum pmd_cycles_counter_type type) >> OVS_NO_THREAD_SAFETY_ANALYSIS >> @@ -3085,4 +3100,25 @@ cycles_count_intermediate(struct >> dp_netdev_pmd_thread *pmd, >> >> non_atomic_ullong_add(&pmd->cycles.n[type], interval); >> +if (rxq && (type == PMD_CYCLES_PROCESSING)) { >> +/* Add to the amount of current processing cycles. */ >> +non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], >> interval); >> +} >> +} >> + >> +static void >> +dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx, >> + enum rxq_cycles_counter_type type, >> + unsigned long long cycles) { >> + atomic_store_relaxed(&rx->cycles[type], cycles); } >> + >> +static uint64_t >> +dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx, >> + enum rxq_cycles_counter_type type) { >> +unsigned long long tmp; >> +atomic_read_relaxed(&rx->cycles[type], &tmp); >> +return tmp; >> } >> >> @@ -3589,5 +3625,5 @@ dpif_netdev_run(struct dpif *dpif) >> port->rxqs[i].rx, >> port->port_no); >> -cycles_count_intermediate(non_pmd, process_packets ? >> +cycles_count_intermediate(non_pmd, NULL, >> process_packets ? >> >> PMD_CYCLES_PROCESSING >> : PMD_CYCLES_IDLE); >> @@ -3753,5 +3789,5 @@ reload: >> dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx, >> poll_list[i].port_no); >> -cycles_count_intermediate(pmd, >> +cycles_count_intermediate(pmd, NULL, >>process_packets ? >> PMD_CYCLES_PROCESSING
Re: [ovs-dev] [PATCH v2 1/7] dpif-netdev: Make dpcls optimization interval more generic.
On 07/22/2017 03:50 PM, Stokes, Ian wrote: >> So far the interval was only used for dpcls optimization. >> Soon, we will use it for storing rxq cycles so make the names more >> generic. Also, set the interval regardless of whether dpcls optimization >> has occurred, as the optimization interval will need to be consistent >> across pmds. >> >> Signed-off-by: Kevin Traynor >> --- >> lib/dpif-netdev.c | 10 +- >> 1 file changed, 5 insertions(+), 5 deletions(-) >> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 47a9fa0..aa8c05d >> 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -178,6 +178,6 @@ struct emc_cache { >> /* Simple non-wildcarding single-priority classifier. */ >> >> -/* Time in ms between successive optimizations of the dpcls subtable >> vector */ -#define DPCLS_OPTIMIZATION_INTERVAL 1000 >> +/* Time in ms between successive optimizations of the pmd */ #define >> +PMD_OPTIMIZATION_INTERVAL 1000 >> >> struct dpcls { >> @@ -4208,5 +4208,5 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread >> *pmd, struct dp_netdev *dp, >> cmap_init(&pmd->flow_table); >> cmap_init(&pmd->classifiers); >> -pmd->next_optimization = time_msec() + DPCLS_OPTIMIZATION_INTERVAL; >> +pmd->next_optimization = time_msec() + PMD_OPTIMIZATION_INTERVAL; > > I'm not sure if we should be removing the dpcls optimization interval and > combining it with the pmd optimization for the purpose rxq balancing. > Is there a situation when you will want the dpcls optimized at a different > interval from the rxq rebalance? > > Maybe you have thought of this already in this solution, but for instance, in > the case of bursty non uniform traffic, would it be an idea to allow the > pmd_optmization period to be configurable by the user so that enough time > passes that the rxq rebalance occurs on statistic's that have been gathered > over a longer period to negate the bursty aspect of the traffic? And in that > case you may still want the dpcls optimization to occur at the usual period > of 1000. > It's a good question. On the other hand a longer interval can mean using older data. I'm reluctant to add another user config unless people *really* think it's needed. For the time being I've made the intervals independent. That way it avoids a slight change in the dpcls optimization behaviour (when it couldn't get the mutex), they can use different intervals and a user config can easily be added in the future if needed. As I don't need to make the dpcls interval generic anymore, I've dropped this patch from v3. >> hmap_init(&pmd->poll_list); >> hmap_init(&pmd->tx_ports); >> @@ -5656,7 +5656,7 @@ dp_netdev_pmd_try_optimize(struct >> dp_netdev_pmd_thread *pmd) >> } >> ovs_mutex_unlock(&pmd->flow_mutex); >> -/* Start new measuring interval */ >> -pmd->next_optimization = now + DPCLS_OPTIMIZATION_INTERVAL; >> } >> +/* Start new measuring interval */ >> +pmd->next_optimization = now + PMD_OPTIMIZATION_INTERVAL; >> } >> } >> -- >> 1.8.3.1 > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH RFC v2 4/4] dpif-netdev: Time based output batching.
>On 28.07.2017 10:20, Darrell Ball wrote: >> I have not tested yet >> >> However, I would have expected something max latency config. to be >specific to netdev-dpdk port types > >IMHO, if we can make it generic, we must make it generic. > >[Darrell] >The first question I ask myself is - is this functionality intrinsically >generic or is >it not ? >It is clearly not and trying to make it artificially so would do the following: > >1) We end up designing something the wrong way where it partially works. >2) Breaks other features present and future that really do intersect. > > > Making of this >functionality netdev-dpdk specific will brake ability to test it using >unit tests. As the change is complex and has a lot of pitfalls like >possible packet stucks and possible latency issues, this code should be >covered by unit tests to simplify the support and modifications. >(And it's already partly covered because it is generic. And I fixed many >minor issues while developing through unit test failures.) > >[Darrell] >Most of dpdk is not tested by our unit tests because it cannot be simulated >well at the moment. This is orthogonal to the basic question however. Darrell is right and the unit tests we have currently don't test DPDK datapath well. So having this changes in netdev layer shouldn't impact the unit tests much. While I share your other concern that changes in netdev layer will be little complex and slightly painful for future code changes, this max latency config introduced in dpif layer may not hold good to different port types and users may potentially introduce conflicting changes in netdev layer in future to suit their use cases. > > >In the future this can be used also to improve performance of netdev-linux >by replacing sendmsg() with batched sendmmsg(). This should significantly >increase performance of flood actions while MACs are not learned yet in >action NORMAL. > >> This type of code also seems to intersect with present and future QoS >considerations in netdev-dpdk > >Maybe, but there are also some related features in mail-list like rx queue >prioritization which are implemented in generic way on dpif-netdev layer. If you are referring to rxq prioritization work by Billy (https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336001.html), this feature is more implemented in netdev layer with very minimal updates to dpif layer. BTW, dp_execute_cb() is getting cluttered with this patch. - Bhanuprakash. > >> >> -Original Message- >> From: Ilya Maximets >> Date: Wednesday, July 26, 2017 at 8:21 AM >> To: "ovs-dev@openvswitch.org" , >Bhanuprakash Bodireddy >> Cc: Heetae Ahn , Ben Pfaff >, Antonio Fischetti , Eelco >Chaudron , Ciara Loftus , >Kevin Traynor , Darrell Ball , >Ilya Maximets >> Subject: [PATCH RFC v2 4/4] dpif-netdev: Time based output batching. >> >> This allows to collect packets from more than one RX burst >> and send them together with a configurable maximum latency. >> >> 'other_config:output-max-latency' can be used to configure >> time that a packet can wait in output batch for sending. >> >> Signed-off-by: Ilya Maximets >> --- >> >> millisecon granularity is used for now. Can be easily switched to use >> microseconds instead. >> >> lib/dpif-netdev.c| 97 >+++- >> vswitchd/vswitch.xml | 15 >> 2 files changed, 95 insertions(+), 17 deletions(-) >> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >> index 07c7dad..e5f8a3d 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -84,6 +84,9 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev); >> #define MAX_RECIRC_DEPTH 5 >> DEFINE_STATIC_PER_THREAD_DATA(uint32_t, recirc_depth, 0) >> >> +/* Use instant packet send by default. */ >> +#define DEFAULT_OUTPUT_MAX_LATENCY 0 >> + >> /* Configuration parameters. */ >> enum { MAX_FLOWS = 65536 }; /* Maximum number of flows in flow >table. */ >> enum { MAX_METERS = 65536 };/* Maximum number of meters. */ >> @@ -261,6 +264,9 @@ struct dp_netdev { >> struct hmap ports; >> struct seq *port_seq; /* Incremented whenever a port > changes. >*/ >> >> +/* The time that a packet can wait in output batch for sending. > */ >> +atomic_uint32_t output_max_latency; >> + >> /* Meters. */ >> struct ovs_mutex meter_locks[N_METER_LOCKS]; >> struct dp_meter *meters[MAX_METERS]; /* Meter bands. */ >> @@ -498,6 +504,7 @@ struct tx_port { >> int qid; >> long long last_used; >> struct hmap_node node; >> +long lo
Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic encap and decap
On Tue, Aug 01, 2017 at 08:26:07AM -0700, Ben Pfaff wrote: > On Tue, Aug 01, 2017 at 12:32:20PM +, Yang, Yi Y wrote: > > #2. > > [Ben] I suspect that decode_NXAST_RAW_DECAP() should report an error if > > properties are present, since it doesn't support properties. > > > > [Yi] It is impossible. > > What is impossible? It is easy to detect that properties are present > and report an error: > > diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c > index d0437f20922a..7be302a4005d 100644 > --- a/lib/ofp-actions.c > +++ b/lib/ofp-actions.c > @@ -4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap > *nad, > enum ofp_version ofp_version OVS_UNUSED, > struct ofpbuf *ofpacts) > { > +if (ntohs(nad->len) > sizeof *nad) { > +/* No properties supported yet. */ > +return OFPERR_OFPBPC_BAD_TYPE; > +} Looking again, I guess that OFPERR_NXBAC_UNKNOWN_ED_PROP is better. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 0/4] Output packet batching.
>This patch-set inspired by [1] from Bhanuprakash Bodireddy. >Implementation of [1] looks very complex and introduces many pitfalls for >later code modifications like possible packet stucks. > >This version targeted to make simple and flexible output packet batching on >higher level without introducing and even simplifying netdev layer. > >Patch set consists of 3 patches. All the functionality introduced in the first >patch. Two others are just cleanups of netdevs to not do unnecessary things. > >Basic testing of 'PVP with OVS bonding on phy ports' scenario shows >significant performance improvement. >More accurate and intensive testing required. > >[1] [PATCH 0/6] netdev-dpdk: Use intermediate queue during packet >transmission. >https://mail.openvswitch.org/pipermail/ovs-dev/2017-June/334762.html > >Version 2: > > * Rebased on current master. > * Added time based batching RFC patch. > * Fixed mixing packets with different sources in same batch. > Applied this series along with other patches[1] and gave initial try. With this series, approximately half a million throughput drop is observed in simple test case (P2P - 1stream - udp) vs master + [1]. The performance improvement is observed with multiple flows (which this series is meant to address). At this stage no latency settings were used. Yet to review and do more testing. [1] Improves performance. https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335359.html https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336186.html https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336187.html https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336290.html - Bhanuprakash. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic encap and decap
On Tue, Aug 01, 2017 at 12:32:20PM +, Yang, Yi Y wrote: > #2. > [Ben] I suspect that decode_NXAST_RAW_DECAP() should report an error if > properties are present, since it doesn't support properties. > > [Yi] It is impossible. What is impossible? It is easy to detect that properties are present and report an error: diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index d0437f20922a..7be302a4005d 100644 --- a/lib/ofp-actions.c +++ b/lib/ofp-actions.c @@ -4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap *nad, enum ofp_version ofp_version OVS_UNUSED, struct ofpbuf *ofpacts) { +if (ntohs(nad->len) > sizeof *nad) { +/* No properties supported yet. */ +return OFPERR_OFPBPC_BAD_TYPE; +} + struct ofpact_decap * decap; decap = ofpact_put_DECAP(ofpacts); ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic encap and decap
On Tue, Aug 01, 2017 at 12:04:55AM +, Yang, Yi Y wrote: > Hi, Ben > > Thank you so much for your great review and great comments, I'll do changes > per your comments and post next version because Zoltan and Jan are taking > vacation. I explained your concerns > > [Ben] The string format of the encap actions seems rather user unfriendly. > Is there a good reason why it should be the very generic > prop(class=,type=,val=) rather than something more tailored > to the actual properties that users will want to set? It is hard to tell > without any actual examples. > > [Yi] because encap is very generic action, you can add a new header/encap > type and some new properties for it very easily by this kind of very generic > abstraction and mustn't change generic encap implementation. I make an > example, for NSH we have > > prop(class=nsh,type=md_type,val=1) > > If we add new encap/header type gtpu, we can have the below property for GTP-u > > prop(class=gtpu,type=teid,val=1234) > > Generic encap action can accept this kind of new prop without any change > against generic encap action implementation, this is the magic of very > generic " prop(class=,type=,val=)". To me, this argument would make sense if it meant that we could add gtpu without having to modify OVS. But we'll still need to do that. At the very least, OVS isn't going to know class "gtpu" or type "teid" means without modifications. Here's a more user friendly syntax that seems just as generic to me: class(type=value) e.g.: nsh(md_type=1) gtpu(teid=1234) > [Ben] In decode_NXAST_RAW_ENCAP() and parse_ENCAP(), shouldn't there be > validation of the header size value? Most header types will only have a few > acceptable header sizes. > > [Yi] Actually header_size isn't used at all now, it is reserved for later > enhancement. For NSH MD type 2, NSH header is length-variable, ranged from 8 > to 264, number of props and size of each prop are also very flexible (like > Geneve tunnel metadata) OK. > [Ben] The internal representation of ofpact_ed_prop, has no examples so far. > How confident are you that it actually needs to be variable-length? If it > does, then using a member 'n_props' in struct ofpact_encap to count the > number of properties seems risky: it implicitly encourages programmers to try > to index the props[] array from 0 to n_props-1. I'd encourage, in that case, > switching to a length member, or just eliminating the member if > ofpact_ed_props can be padded to 8-byte multiples. But it's easier if we can > just make ofpact_ed_prop fixed-length for now; then we can just index props[] > as an array. > > However, I question whether the properties actually need to be internally > represented as properties at all. Presumably, OVS is only going to support a > specific set of properties. Probably, it's easy to just add specific members > to struct ofpact_encap that represent the values of those properties. Did > you consider that? That approach usually simplifies code greatly; properties > are usually needed only for external representations. > > [Yi] As I said above, prop is length-variable and unpredictable, especially > for NSH MD type 2, so a length-fixed ofpact_ed_prop impracticable and will > waste too much space, n_props can indicate number of props, actually props is > a pointer, we can use > > uint8 props[] > > to replace > > struct ofp_ed_prop_header props[]; > > But I think it doesn't bring any substantial benefit from my perspective. > > ofpact_ed_prop is header of each peoperty, it just helps parse properties > better. > > In addition, code is completely controllable, abuse of props[index] won't > happen because you guys check every commit very carefully :-) OK. I still think this seems to add more complexity than is probably needed, but I'll leave it alone for now. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] How to tell which code is for userspace
Dear Ben, Thanks. This make me clear. Ted Sent from Mail for Windows 10 From: Ben Pfaff Sent: Tuesday, August 1, 2017 10:53 PM To: ted.y.li...@gmail.com Cc: ovs-dev@openvswitch.org Subject: Re: [ovs-dev] How to tell which code is for userspace On Tue, Aug 01, 2017 at 10:01:31PM +0800, ted.y.li...@gmail.com wrote: > I am reading ovs 2.7.90 code, do you have a quick way to tell which code is > for userspace and which for kernel? Linux kernel code is in the datapath/ directory. Windows kernel code is in the datapath-windows/ directory. The rest of the code is userspace. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ofpbuf: Fix parameter for const initializer.
On Mon, Jul 31, 2017 at 05:16:11PM -0700, Joe Stringer wrote: > Clang complains: > > In file included from include/openvswitch/cxxtest.cc:11:0: > ../include/openvswitch/ofpbuf.h: In function ‘ofpbuf > ofpbuf_const_initializer(const void*, size_t)’: > ../include/openvswitch/ofpbuf.h:107:5: warning: narrowing conversion of > ‘size’ from ‘size_t {aka long unsigned int}’ to ‘uint32_t {aka unsigned int}’ > inside { } [-Wnarrowing] > }; > ^ > ../include/openvswitch/ofpbuf.h:107:5: warning: narrowing conversion of > ‘size’ from ‘size_t {aka long unsigned int}’ to ‘uint32_t {aka unsigned int}’ > inside { } [-Wnarrowing] > > This is because the ofpbuf struct's "size" parameter is a uint32_t, > while ofpbuf_const_initializer() takes a size_t for the size. Fix this > function to take a uint32_t instead. > > Signed-off-by: Joe Stringer Thanks. (We must have different clang versions.0 Acked-by: Ben Pfaff ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] How to tell which code is for userspace
On Tue, Aug 01, 2017 at 10:01:31PM +0800, ted.y.li...@gmail.com wrote: > I am reading ovs 2.7.90 code, do you have a quick way to tell which code is > for userspace and which for kernel? Linux kernel code is in the datapath/ directory. Windows kernel code is in the datapath-windows/ directory. The rest of the code is userspace. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] How to tell which code is for userspace
Thanks. But there is a function in datapath/vport.c, as below: struct vport *ovs_vport_alloc(int priv_size, const struct vport_ops *ops, const struct vport_parms *parms) { struct vport *vport; size_t alloc_size; alloc_size = sizeof(struct vport); if (priv_size) { alloc_size = ALIGN(alloc_size, VPORT_ALIGN); alloc_size += priv_size; } vport = kzalloc(alloc_size, GFP_KERNEL); if (!vport) return ERR_PTR(-ENOMEM); vport->dp = parms->dp; vport->port_no = parms->port_no; vport->ops = ops; INIT_HLIST_NODE(&vport->dp_hash_node); if (ovs_vport_set_upcall_portids(vport, parms->upcall_portids)) { kfree(vport); return ERR_PTR(-EINVAL); } return vport; } It uses kzalloc. Does not that mean it is part of the kernel code? Br, Ted From: Russell Bryant Sent: Tuesday, August 1, 2017 10:12 PM To: ted.y.li...@gmail.com Cc: ovs-dev@openvswitch.org Subject: Re: [ovs-dev] How to tell which code is for userspace On Tue, Aug 1, 2017 at 10:01 AM, wrote: > Dear all, > > I am reading ovs 2.7.90 code, do you have a quick way to tell which code is > for userspace and which for kernel? The ovs/datapath/linux/ directory is the Linux kernel datapath code. The rest is userspace (with the exception of the Windows datapath code). -- Russell Bryant ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] Openvswitch crash when bringing down the dpdk bond port using "ovs-ofctl mod-port br-prv dpdk1 down"
On 01.08.2017 17:19, Ilya Maximets wrote: > On 01.08.2017 15:46, Keshav Gupta wrote: >> Hi Ben/Ilya >> Mainly patch do the following >> 1) Call the dpdk rx api(rte_eth_rx_burst) only when ovs netdev state is UP >> 2) Now the rte_eth_dev_stop/ rte_eth_dev_start calls are removed in >> updating the flags >> >> >> This fixes the core dump issue but I see below side effect with this patch: >> With this change now only ports admin state(ovs netdev level) is changed >> (dpdk/nic level state is not changed) so NIC will continue to receive the >> packet until its Rxq ring is full >> Later if somebody bring up the port then he will get the burst of old >> packets which are there in the Rxq (typically 4k packet (dpdk Rxq size)). >> >> >> Thanks >> Keshav >> > > Hi Keshav, > > Yes, you're right. I think it's the same situation right now can be observed > on > current master. We can try to fix this for the HW NICs in the similar way as > we > handle destroy_device event for vhost-user ports. I prepared quick fix (not > even > compile tested) for the reference how this can be fixed. Please check the code > below. One thing is that we actually can't fix such behaviour for the > vhost-user > because there is no such command to stop the vhost device. But we may try to > fix it for HW NICs if it's important. > > --8<->8-- > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > index ea17b97..202678f 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -2300,20 +2300,48 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev, > enum netdev_flags *old_flagsp) > OVS_REQUIRES(dev->mutex) > { > +enum netdev_flags new_flags = dev->flags; > + > if ((off | on) & ~(NETDEV_UP | NETDEV_PROMISC)) { > return EINVAL; > } > > *old_flagsp = dev->flags; > -dev->flags |= on; > -dev->flags &= ~off; > +new_flags |= on; > +new_flags &= ~off; > > -if (dev->flags == *old_flagsp) { > +if (new_flags == dev->flags) { > return 0; > } > > if (dev->type == DPDK_DEV_ETH) { > -if (dev->flags & NETDEV_PROMISC) { > +if (NETDEV_UP & ((*old_flagsp ^ on) | (*old_flagsp ^ off))) { > +if (NETDEV_UP & off) { > +bool quiescent = ovsrcu_is_quiescent(); > + > +dev->flags = new_flags; > +/* Wait for other threads to quiesce after turning off the > + * 'NETDEV_UP' before actual stopping the device to be sure > + * that all threads finished their rx/tx operations. */ > +ovsrcu_synchronize(); > +if (quiescent) { > +/* As call to ovsrcu_synchronize() will end the quiescent > + * state, put thread back into quiescent state. */ > +ovsrcu_quiesce_start(); > +} > +rte_eth_dev_stop(dev->port_id); > +} else { > +int retval = rte_eth_dev_start(dev->port_id); > + > +if (retval) { > +VLOG_ERR("Interface %s start error: %s", dev->up.name, > + rte_strerror(-diag)); > +return -retval; > +} > +} > +} > + > +if (new_flags & NETDEV_PROMISC) { > rte_eth_promiscuous_enable(dev->port_id); > } > > @@ -2336,6 +2364,7 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev, > } > } > > +dev->flags = new_flags; > return 0; > } > > --8<->8-- > > Also, the diff above made on top of current master. So, you possibly will > need to port it a little. > To make it work on current master at least following change also required: --8<->8-- diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index ea17b97..7145266 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -761,11 +761,13 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) return -diag; } -diag = rte_eth_dev_start(dev->port_id); -if (diag) { -VLOG_ERR("Interface %s start error: %s", dev->up.name, - rte_strerror(-diag)); -return -diag; +if (dev->flags & NETDEV_UP) { +diag = rte_eth_dev_start(dev->port_id); +if (diag) { +VLOG_ERR("Interface %s start error: %s", dev->up.name, + rte_strerror(-diag)); +return -diag; +} } rte_eth_promiscuous_enable(dev->port_id); --8<->8-- It is to prevent stopped port starting on each reconfiguration. > Best regards, Ilya Maximets. > >> >> -Original Message- >> From: Ben Pfaff [mailto:b...@ovn.org] >> Sent: Monday, July
Re: [ovs-dev] [PATCH 4/5] Automatically verify that OVS header files work OK in C++ also.
Ben Pfaff writes: > On Mon, Jul 31, 2017 at 04:44:10PM -0400, Russell Bryant wrote: >> On Mon, Jul 31, 2017 at 4:33 PM, Ben Pfaff wrote: >> > On Mon, Jul 31, 2017 at 01:05:32PM -0700, Ben Pfaff wrote: >> >> On Mon, Jul 31, 2017 at 03:50:59PM -0400, Russell Bryant wrote: >> >> > On Sun, Jul 30, 2017 at 10:54 PM, Ben Pfaff wrote: >> >> > > This should help address a recurring problem. >> >> > > >> >> > > Signed-off-by: Ben Pfaff >> >> > > --- >> >> > > .travis.yml | 1 + >> >> > > Makefile.am | 1 + >> >> > > configure.ac| 2 ++ >> >> > > include/openvswitch/automake.mk | 14 ++ >> >> > > m4/openvswitch.m4 | 21 + >> >> > > 5 files changed, 39 insertions(+) >> >> > >> >> > Why does this patch depend on libboost? It looks like it's only used >> >> > when building a test C++ program in configure. We could build a test >> >> > C++ program without the dependency, right? >> >> >> >> It's because of include/openvswitch/compiler.h, which has: >> >> >> >> #elif defined(__cplusplus) >> >> #include >> >> #define BUILD_ASSERT BOOST_STATIC_ASSERT >> >> #define BUILD_ASSERT_DECL BOOST_STATIC_ASSERT >> >> >> >> Ah, sorry ... >> >> >> We could probably define our own C++-compatible static assert without >> >> boost. It looks like C++11 and later has a built-in static_assert: >> >> http://en.cppreference.com/w/cpp/language/static_assert >> >> >> >> Any idea whether it's reasonable to assume C++11 support these days? >> >> I don't know, but that seems pretty reasonable to me. >> >> I suppose if it becomes an issue, we could re-add the boost version as >> an optional dependency that would get used only if C++11 wasn't >> available? > > Sure. It's actually quite reasonable to assume support for static_assert, it was added to the working group during 2007/2008 timeframe, and every compiler at the time which had c++-0x support (which I know includes GCC[1], and VS[2]) have support for it. Since this commit was applied I guess it doesn't matter, but it's possibly useful information for the future. >> Acked-by: Russell Bryant > > Thanks, I applied this whole series to master, using the revised version > of this patch. > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev [1]: https://gcc.gnu.org/gcc-4.3/cxx0x_status.html [2]: https://blogs.msdn.microsoft.com/vcblog/2008/10/28/lambdas-auto-and-static_assert-c0x-features-in-vc10-part-1/ ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX cksum in ovs-dpdk side
Currently, the dpdk-vhost side in ovs doesn't support tcp/udp tx cksum. So L4 packets's cksum were calculated in VM side but performance is not good. Implementing tcp/udp tx cksum in ovs-dpdk side improves throughput and makes virtio-net frontend-driver support NETIF_F_SG as well Signed-off-by: Zhenyu Gao --- Here is some performance number: Setup: qperf client +-+ | VM| +-+ | | qperf server +--+ ++ | vswitch+dpdk | | bare-metal | +--+ ++ || || pNic-PhysicalSwitch do cksum in ovs-dpdk: Applied this patch and execute 'ethtool -K eth0 tx on' in VM side. It offload cksum job to ovs-dpdk side. do cksum in VM: Applied this patch and execute 'ethtool -K eth0 tx off' in VM side. VM calculate cksum for tcp/udp packets. We can see huge improvment in TCP throughput if we leverage ovs-dpdk cksum. [root@localhost ~]# qperf -t 10 -oo msg_size:1:64K:*2 host-qperf-server01 tcp_bw tcp_lat udp_bw udp_lat do cksum in ovs-dpdk do cksum in VM without this patch tcp_bw: bw = 2.05 MB/secbw = 1.92 MB/secbw = 1.95 MB/sec tcp_bw: bw = 3.9 MB/sec bw = 3.99 MB/secbw = 3.98 MB/sec tcp_bw: bw = 8.09 MB/secbw = 7.82 MB/secbw = 8.19 MB/sec tcp_bw: bw = 14.9 MB/secbw = 14.8 MB/secbw = 15.7 MB/sec tcp_bw: bw = 27.7 MB/secbw = 28 MB/sec bw = 29.7 MB/sec tcp_bw: bw = 51.2 MB/secbw = 50.9 MB/secbw = 54.9 MB/sec tcp_bw: bw = 86.7 MB/secbw = 86.8 MB/secbw = 95.1 MB/sec tcp_bw: bw = 149 MB/sec bw = 160 MB/sec bw = 149 MB/sec tcp_bw: bw = 211 MB/sec bw = 205 MB/sec bw = 216 MB/sec tcp_bw: bw = 271 MB/sec bw = 254 MB/sec bw = 275 MB/sec tcp_bw: bw = 326 MB/sec bw = 303 MB/sec bw = 321 MB/sec tcp_bw: bw = 407 MB/sec bw = 359 MB/sec bw = 361 MB/sec tcp_bw: bw = 816 MB/sec bw = 512 MB/sec bw = 419 MB/sec tcp_bw: bw = 840 MB/sec bw = 756 MB/sec bw = 457 MB/sec tcp_bw: bw = 1.07 GB/secbw = 880 MB/sec bw = 480 MB/sec tcp_bw: bw = 1.17 GB/secbw = 1.01 GB/secbw = 488 MB/sec tcp_bw: bw = 1.17 GB/secbw = 1.11 GB/secbw = 483 MB/sec tcp_lat: latency = 29 us latency = 29.2 us latency = 29.6 us tcp_lat: latency = 28.9 us latency = 29.3 us latency = 29.5 us tcp_lat: latency = 29 us latency = 29.3 us latency = 29.6 us tcp_lat: latency = 29 us latency = 29.4 us latency = 29.5 us tcp_lat: latency = 29 us latency = 29.2 us latency = 29.6 us tcp_lat: latency = 29.1 us latency = 29.3 us latency = 29.7 us tcp_lat: latency = 29.4 us latency = 29.6 us latency = 30 us tcp_lat: latency = 29.8 us latency = 30.1 us latency = 30.2 us tcp_lat: latency = 30.9 us latency = 30.9 us latency = 31 us tcp_lat: latency = 46.9 us latency = 46.2 us latency = 32.2 us tcp_lat: latency = 51.5 us latency = 52.6 us latency = 34.5 us tcp_lat: latency = 43.9 us latency = 43.8 us latency = 43.6 us tcp_lat: latency = 47.6 us latency = 48 us latency = 48.1 us tcp_lat: latency = 77.7 us latency = 78.8 us latency = 78.8 us tcp_lat: latency = 82.8 us latency = 82.3 us latency = 116 us tcp_lat: latency = 94.8 us latency = 94.2 us latency = 134 us tcp_lat: latency = 167 uslatency = 197 uslatency = 172 us udp_bw: send_bw = 418 KB/secsend_bw = 413 KB/secsend_bw = 403 KB/sec recv_bw = 410 KB/secrecv_bw = 412 KB/secrecv_bw = 400 KB/sec udp_bw: send_bw = 831 KB/secsend_bw = 825 KB/secsend_bw = 810 KB/sec recv_bw = 828 KB/secrecv_bw = 816 KB/secrecv_bw = 807 KB/sec udp_bw: send_bw = 1.67 MB/sec send_bw = 1.65 MB/sec send_bw = 1.63 MB/sec recv_bw = 1.64 MB/sec recv_bw = 1.62 MB/sec recv_bw = 1.63 MB/sec udp_bw: send_bw = 3.36 MB/sec send_bw = 3.29 MB/sec send_bw = 3.26 MB/sec recv_bw = 3.29 MB/sec recv_bw = 3.25 MB/sec recv_bw = 2.82 MB/sec udp_bw: send_bw = 6.72 MB/sec send_bw = 6.61 MB/sec send_bw = 6.45 MB/sec recv_bw = 6.54 MB/sec recv_bw = 6.59 MB/sec recv_bw = 6.45 MB/sec udp_bw: send_bw = 13.4 MB/sec send_bw = 13.2 MB/s
Re: [ovs-dev] Openvswitch crash when bringing down the dpdk bond port using "ovs-ofctl mod-port br-prv dpdk1 down"
On 01.08.2017 15:46, Keshav Gupta wrote: > Hi Ben/Ilya > Mainly patch do the following > 1) Call the dpdk rx api(rte_eth_rx_burst) only when ovs netdev state is UP > 2) Now the rte_eth_dev_stop/ rte_eth_dev_start calls are removed in updating > the flags > > > This fixes the core dump issue but I see below side effect with this patch: > With this change now only ports admin state(ovs netdev level) is changed > (dpdk/nic level state is not changed) so NIC will continue to receive the > packet until its Rxq ring is full > Later if somebody bring up the port then he will get the burst of old packets > which are there in the Rxq (typically 4k packet (dpdk Rxq size)). > > > Thanks > Keshav > Hi Keshav, Yes, you're right. I think it's the same situation right now can be observed on current master. We can try to fix this for the HW NICs in the similar way as we handle destroy_device event for vhost-user ports. I prepared quick fix (not even compile tested) for the reference how this can be fixed. Please check the code below. One thing is that we actually can't fix such behaviour for the vhost-user because there is no such command to stop the vhost device. But we may try to fix it for HW NICs if it's important. --8<->8-- diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index ea17b97..202678f 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2300,20 +2300,48 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev, enum netdev_flags *old_flagsp) OVS_REQUIRES(dev->mutex) { +enum netdev_flags new_flags = dev->flags; + if ((off | on) & ~(NETDEV_UP | NETDEV_PROMISC)) { return EINVAL; } *old_flagsp = dev->flags; -dev->flags |= on; -dev->flags &= ~off; +new_flags |= on; +new_flags &= ~off; -if (dev->flags == *old_flagsp) { +if (new_flags == dev->flags) { return 0; } if (dev->type == DPDK_DEV_ETH) { -if (dev->flags & NETDEV_PROMISC) { +if (NETDEV_UP & ((*old_flagsp ^ on) | (*old_flagsp ^ off))) { +if (NETDEV_UP & off) { +bool quiescent = ovsrcu_is_quiescent(); + +dev->flags = new_flags; +/* Wait for other threads to quiesce after turning off the + * 'NETDEV_UP' before actual stopping the device to be sure + * that all threads finished their rx/tx operations. */ +ovsrcu_synchronize(); +if (quiescent) { +/* As call to ovsrcu_synchronize() will end the quiescent + * state, put thread back into quiescent state. */ +ovsrcu_quiesce_start(); +} +rte_eth_dev_stop(dev->port_id); +} else { +int retval = rte_eth_dev_start(dev->port_id); + +if (retval) { +VLOG_ERR("Interface %s start error: %s", dev->up.name, + rte_strerror(-diag)); +return -retval; +} +} +} + +if (new_flags & NETDEV_PROMISC) { rte_eth_promiscuous_enable(dev->port_id); } @@ -2336,6 +2364,7 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev, } } +dev->flags = new_flags; return 0; } --8<->8-- Also, the diff above made on top of current master. So, you possibly will need to port it a little. Best regards, Ilya Maximets. > > -Original Message- > From: Ben Pfaff [mailto:b...@ovn.org] > Sent: Monday, July 31, 2017 9:26 PM > To: Ilya Maximets > Cc: ovs-dev@openvswitch.org; Keshav Gupta > Subject: Re: [ovs-dev] Openvswitch crash when bringing down the dpdk bond > port using "ovs-ofctl mod-port br-prv dpdk1 down" > > Thanks. > > It doesn't cherry-pick cleanly to branch-2.6, so I think I'll leave it for > now since Darrell made the reasonable point that DPDK was experimental in 2.6. > > On Mon, Jul 31, 2017 at 06:20:28PM +0300, Ilya Maximets wrote: >> On 31.07.2017 16:05, Ben Pfaff wrote: >>> Ilya, should we apply this patch to branch-2.6? Are there other >>> patches that should be backported? >> >> It's definitely a bug and should be backported if someone wants to use >> barnch-2.6 with DPDK datapath. I traced only this exact case, so it's >> hard to say if something else should be backported, but this patch >> should fix described issue without any additional changes. >> >>> On Wed, Jul 26, 2017 at 03:28:12PM +0300, Ilya Maximets wrote: Hi. You need to backport at least following patch: commit 3b1fb0779b87788968c1a6a9ff295a9883547485 Author: Daniele Di Proietto Date: Tue Nov 15 15:40:49 2016 -0800 netdev-dpdk: Don't call rte_dev_stop() in update_flags(). Calling rte_
Re: [ovs-dev] [PATCH v2 5/5] dp-packet: Use memcpy on dp_packet elements.
Hi Antonio, > -Original Message- > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- > boun...@openvswitch.org] On Behalf Of antonio.fische...@intel.com > Sent: Wednesday, July 19, 2017 5:05 PM > To: d...@openvswitch.org > Subject: [ovs-dev] [PATCH v2 5/5] dp-packet: Use memcpy on dp_packet > elements. > > memcpy replaces the several single copies inside > dp_packet_clone_with_headroom(). > > Signed-off-by: Antonio Fischetti > --- > lib/dp-packet.c | 18 +- > 1 file changed, 9 insertions(+), 9 deletions(-) > > diff --git a/lib/dp-packet.c b/lib/dp-packet.c index 67aa406..f4dbcb7 100644 > --- a/lib/dp-packet.c > +++ b/lib/dp-packet.c > @@ -157,8 +157,9 @@ dp_packet_clone(const struct dp_packet *buffer) > return dp_packet_clone_with_headroom(buffer, 0); } > > -/* Creates and returns a new dp_packet whose data are copied from > 'buffer'. The > - * returned dp_packet will additionally have 'headroom' bytes of headroom. > */ > +/* Creates and returns a new dp_packet whose data are copied from > 'buffer'. > + * The returned dp_packet will additionally have 'headroom' bytes of > + * headroom. */ > struct dp_packet * > dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t > headroom) { @@ -167,13 +168,12 @@ > dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t > headroom) > new_buffer = > dp_packet_clone_data_with_headroom(dp_packet_data(buffer), > dp_packet_size(buffer), > headroom); > -new_buffer->l2_pad_size = buffer->l2_pad_size; > -new_buffer->l2_5_ofs = buffer->l2_5_ofs; > -new_buffer->l3_ofs = buffer->l3_ofs; > -new_buffer->l4_ofs = buffer->l4_ofs; > -new_buffer->md = buffer->md; > -new_buffer->cutlen = buffer->cutlen; > -new_buffer->packet_type = buffer->packet_type; > +/* Copy the following fields into the returned buffer: l2_pad_size, > + * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */ > +memcpy(&new_buffer->l2_pad_size, &buffer->l2_pad_size, > +sizeof(struct dp_packet) - > +offsetof(struct dp_packet, l2_pad_size)); > + [[BO'M]] Does this change in itself have a perf improvement? A reference/warning in the dp_packet declaration would be a good idea - anyone changing fields from l2_pad_size to the end of the structure needs to be aware of this implementation of dp_packet_clone_data_with_headroom(). > #ifdef DPDK_NETDEV > new_buffer->mbuf.ol_flags = buffer->mbuf.ol_flags; #else > -- > 2.4.11 > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] How to tell which code is for userspace
On Tue, Aug 1, 2017 at 10:01 AM, wrote: > Dear all, > > I am reading ovs 2.7.90 code, do you have a quick way to tell which code is > for userspace and which for kernel? The ovs/datapath/linux/ directory is the Linux kernel datapath code. The rest is userspace (with the exception of the Windows datapath code). -- Russell Bryant ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] How to tell which code is for userspace
Dear all, I am reading ovs 2.7.90 code, do you have a quick way to tell which code is for userspace and which for kernel? Br, Ted Sent from Mail for Windows 10 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [RFC] Basic active-active HA for ovn-northd
On Tue, Aug 1, 2017 at 8:06 AM, Numan Siddique wrote: > > > On Tue, Aug 1, 2017 at 4:25 PM, Numan Siddique wrote: >> >> >> >> On Tue, Aug 1, 2017 at 5:00 AM, Han Zhou wrote: >>> >>> On Mon, Jul 31, 2017 at 1:53 PM, Russell Bryant wrote: >>> > >>> > I wanted to share the idea before I code it to see if it makes sense. >>> > I imagine the patch would be small, though. >>> > >>> > We currently provide HA for ovn-northd by using Pacemaker to ensure >>> > that ovn-northd is running only one time somewhere in a cluster. >>> > >> >> >> In the case of Pacemaker, the pacemaker OCF resource script >> (ovndb-servers.ocf) starts the ovn-northd on the master node if >> manage_northd is set to true and this is the approach tripleo has taken. >> ovn-northd uses the unix sockets to communicate to the NB and SB db servers. >> With the suggested approach, ovn-northd would be using tcp connection to >> communicate to the db servers ? > > > Given that SB DB ovsdb-server would assign the lock, it has to be a tcp > connection. Sorry for the confusion. Right. If we're still using Pacemaker for the databases anyway, we could keep managing ovn-northd as well to ensure they stay local so we can keep using the local unix socket. On the other hand, separating them will spread load a bit more across controllers. It's not obvious to me which is better. > > >> >> Thanks >> Numan >> >> >> >> >>> > What if we made ovn-northd acquire an OVSDB lock on the southbound >>> > database before it did any real work? That way we could start >>> > multiple copies of ovn-northd in a cluster, but only one would be >>> > active at a time. >>> > >>> > This is crude, and obviously we would want to distribute work among >>> > ovn-northd instances eventually, but does this sound like an >>> > improvement over requiring something like Pacemaker? >>> > >>> Russell, this sounds very good. I think it is better than Pacemaker. I >>> would still call it active-standby though. The standby is hot-standby, >>> not >>> cold-standby. Thanks, Han. Yes, your "hot-standby" terminology sounds better. :-) >>> >>> It seems the change would be having the active one updating >> northd hostname> periodically to southbound DB, so that other northd will >>> know if it is still alive, and otherwise taking over by updating the >>> . Is this correct? This wouldn't be required. If the active ovn-northd goes down, the OVSDB lock will automatically be released when the database detects that the connection is gone. We would be relying on ovsdb-server for how quickly the failover happens. I think this change would be an improvement. At a minimum, it ensures ovn-northd behaves in a sane way if you accidentally run it more than once in a deployment. It also allows intentionally running it more than once as one or more hot-standby instances. Maybe in the next release we can revisit proposals for fully active-active ovn-northd. -- Russell Bryant ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] Proposal: EMC load-shedding
Hi All, This proposal is an attempt to make a more general solution to the same issue of EMC thrashing addressed by https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335940.html. That patch proposes that when the EMC is overloaded that recirculated packets are neither inserted into the EMC nor are they looked up in the EMC. However, while it is a cheap decision, picking 'recirculated' as the category of packet to drop from the EMC is not very granular - dropping them from the EMC may result in an EMC that is now under-utilized or still overloaded - it all depends on what proportion of packets were in the recirculated category. Once there are too many entries contending for any cache it's going to become a liability as the lookup_cost + (miss_rate * cost_of_miss) grows to be greater than cost_of_miss. And in that overloaded case any scheme where a very cheap decision can be made to not insert a certain category of packets and to also not check the EMC for that same category will reduce the cache miss rate. Looking at a packet's RSS hash can also provide a "very cheap decision that can be made to not insert a certain category of packets and to also not check the EMC for that same category" I don't want to propose an actual patch right now but if the general principle is agreed then I would be happy to write at least a first iteration of the patch or maybe the authors of the patch above would like to do so? So I suggest that when the EMC becomes "overloaded" load is shed based on RSS /5-tuple hash. If the hash is above a certain threshold it becomes eligible to be inserted into the EMC. Packets with a hash below the threshold are non inserted nor are they looked up in the EMC. The threshold can be changed in an ongoing and granular way to adapt to current traffic conditions and maintain an optimum EMC utilization. So in simple, non-optimized (and non-compiling) terms: diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 47a9fa0..df5b9db 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -4599,11 +4599,12 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, miniflow_extract(packet, &key->mf); key->len = 0; /* Not computed yet. */ key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf); /* If EMC is disabled skip emc_lookup */ -flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key); +if (key->hash > flow_cache.shed_threshold) { +flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key); +} if (OVS_LIKELY(flow)) { dp_netdev_queue_batches(packet, flow, &key->mf, batches, n_batches); } else { /* Exact match cache missed. Group missed packets together at @@ -4777,11 +4778,12 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, continue; } flow = dp_netdev_flow_cast(rules[i]); -emc_probabilistic_insert(pmd, &keys[i], flow); +if (keys[i].hash > pmd->flow_cache->shed_threshold) { +emc_probabilistic_insert(pmd, &keys[i], flow); +} dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches); } dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt); dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt); The other question is setting the shed_threshold value. Generally: sched_threshold = some_function_of(emc_load) Some suggestions for calculating emc_load: * the num_alive_entries/num_entries * the num_evictions/num_insertions (where num_evictions is the number of insertions that overwrote and existing alive entry). * something more adaptable: emc_load = some_function_of (cost_of_emc_miss, cost_of_emc_lookup, probability_of_emc_miss) I'd be interested to know what people think of doing something like this and any more details that can be fleshed out, corner cases and so on. Regards, Billy ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] Openvswitch crash when bringing down the dpdk bond port using "ovs-ofctl mod-port br-prv dpdk1 down"
Hi Ben/Ilya Mainly patch do the following 1) Call the dpdk rx api(rte_eth_rx_burst) only when ovs netdev state is UP 2) Now the rte_eth_dev_stop/ rte_eth_dev_start calls are removed in updating the flags This fixes the core dump issue but I see below side effect with this patch: With this change now only ports admin state(ovs netdev level) is changed (dpdk/nic level state is not changed) so NIC will continue to receive the packet until its Rxq ring is full Later if somebody bring up the port then he will get the burst of old packets which are there in the Rxq (typically 4k packet (dpdk Rxq size)). Thanks Keshav -Original Message- From: Ben Pfaff [mailto:b...@ovn.org] Sent: Monday, July 31, 2017 9:26 PM To: Ilya Maximets Cc: ovs-dev@openvswitch.org; Keshav Gupta Subject: Re: [ovs-dev] Openvswitch crash when bringing down the dpdk bond port using "ovs-ofctl mod-port br-prv dpdk1 down" Thanks. It doesn't cherry-pick cleanly to branch-2.6, so I think I'll leave it for now since Darrell made the reasonable point that DPDK was experimental in 2.6. On Mon, Jul 31, 2017 at 06:20:28PM +0300, Ilya Maximets wrote: > On 31.07.2017 16:05, Ben Pfaff wrote: > > Ilya, should we apply this patch to branch-2.6? Are there other > > patches that should be backported? > > It's definitely a bug and should be backported if someone wants to use > barnch-2.6 with DPDK datapath. I traced only this exact case, so it's > hard to say if something else should be backported, but this patch > should fix described issue without any additional changes. > > > On Wed, Jul 26, 2017 at 03:28:12PM +0300, Ilya Maximets wrote: > >> Hi. > >> > >> You need to backport at least following patch: > >> > >> commit 3b1fb0779b87788968c1a6a9ff295a9883547485 > >> Author: Daniele Di Proietto > >> Date: Tue Nov 15 15:40:49 2016 -0800 > >> > >> netdev-dpdk: Don't call rte_dev_stop() in update_flags(). > >> > >> Calling rte_eth_dev_stop() while the device is running causes a crash. > >> > >> We could use rte_eth_dev_set_link_down(), but not every PMD implements > >> that, and I found one NIC where that has no effect. > >> > >> Instead, this commit checks if the device has the NETDEV_UP flag when > >> transmitting or receiving (similarly to what we do for vhostuser). I > >> didn't notice any performance difference with this check in case the > >> device is up. > >> > >> An alternative would be to remove the device queues from the pmd > >> threads > >> tx and receive cache, but that requires reconfiguration and I'd prefer > >> to avoid it, because the change can come from OpenFlow. > >> > >> Signed-off-by: Daniele Di Proietto > >> Acked-by: Ilya Maximets > >> > >> This should fix your issue. > >> In general, I'm suggesting to use stable 2.7 OVS, there was too > >> many DPDK related changes including stability fixes since 2.6. > >> > >> Best regards, Ilya Maximets. > >> > >>> Hi > >>> We are experiencing a openvswitch crash when bringing down the dpdk > >>> bond port using "ovs-ofctl mod-port br-prv dpdk1 down". > >>> > >>> backtrace of core is like below. Is there any issue reported earlier for > >>> this type of crash in openvswitch community. > >>> > >>> (gdb) bt > >>> #0 ixgbe_rxq_rearm (rxq=0x7fa45061f800) at > >>> /home/sdn/new_cloud_sdn_switch_2/cloud-sdn-switch/dpdk/drivers/net > >>> /ixgbe/ixgbe_rxtx_vec_sse.c:98 > >>> #1 _recv_raw_pkts_vec (split_packet=0x0, nb_pkts=32, rx_pkts= >>> out>, rxq=0x7fa45061f800) > >>> at > >>> /home/sdn/new_cloud_sdn_switch_2/cloud-sdn-switch/dpdk/drivers/net > >>> /ixgbe/ixgbe_rxtx_vec_sse.c:290 > >>> #2 ixgbe_recv_pkts_vec (rx_queue=0x7fa45061f800, rx_pkts= >>> out>, nb_pkts=) > >>> at > >>> /home/sdn/new_cloud_sdn_switch_2/cloud-sdn-switch/dpdk/drivers/net > >>> /ixgbe/ixgbe_rxtx_vec_sse.c:474 > >>> #3 0x00e500e4 in ?? () > >>> #4 0x004600e6 in ?? () > >>> #5 0x006a0069 in ?? () > >>> #6 0x006c006b in ?? () > >>> #7 0x00ec006d in ?? () > >>> #8 0x00ee00ed in ?? () > >>> #9 0x0001537f5780 in ?? () > >>> #10 0x in ?? () > >>> (gdb) > >>> > >>> > >>> I have analyzed the core and it seems it is a result of device > >>> stop and packet receive from the port happening at same time by > >>> two thread OVS main thread(device stop) and PMD thread(pkt receive). More > >>> precisely main thread cleaning the packet buffer from rxq sw_ring to > >>> avoid the packet buffer leak while in parallel PMD thread is filling the > >>> packet buffer in sw_ring/descriptor ring as part of ixgbe_recv_pkts_vec. > >>> > >>> version used is: openvswitch (2.6.1) with dpdk (16.11). > >>> > >>> This crash is not every time reproducible but frequency seems to be high. > >>> > >>> I am new to openvswitch community and this is first time I am posting a > >>> query. let me know if anything you require from my side. > >>> > >>
Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic encap and decap
Hi, Ben I have posted v4, they are: https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/336504.html https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/336505.html https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/336506.html I have fixed most of issues you commented, I need to clarify the below issues, actually they are not issues. I have explained the other issues in last email. #1. [Ben] I don't see anything in decode_ofp_prop() or its caller that checks that the property fits in the available space. It looks like an integer overflow error. [Yi] Currently it is a stub function, it doesn't handle any property, so it doesn't put any property into buf, so we mustn't check space at this point. Our NSH patch series will add more pieces for this. Let us check enum ofperr decode_ed_prop(const struct ofp_ed_prop_header **ofp_prop, struct ofpbuf *out OVS_UNUSED, size_t *remaining) We used ofpbuf_put_uninit(out, ...) to put the property to buffer, so we needn't care the available space, ofpbuf_put_uninit will allocate on demand. #2. [Ben] I suspect that decode_NXAST_RAW_DECAP() should report an error if properties are present, since it doesn't support properties. [Yi] It is impossible. static enum ofperr decode_NXAST_RAW_DECAP(const struct nx_action_decap *nad, enum ofp_version ofp_version OVS_UNUSED, struct ofpbuf *ofpacts) { struct ofpact_decap * decap; decap = ofpact_put_DECAP(ofpacts); decap->ofpact.raw = NXAST_RAW_DECAP; decap->new_pkt_type = nad->new_pkt_type; return 0; } struct ofpact_decap { struct ofpact ofpact; /* New packet type. * * The special value (0,0xFFFE) "Use next proto" is used to request OVS to * automatically set the new packet type based on the decap'ed header's * next protocol. */ ovs_be32 new_pkt_type; }; struct nx_action_decap { ovs_be16 type; /* OFPAT_VENDOR. */ ovs_be16 len; /* Total size including any property TLVs. */ ovs_be32 vendor; /* NX_VENDOR_ID. */ ovs_be16 subtype; /* NXAST_DECAP. */ uint8_t pad[2];/* 2 bytes padding */ /* Packet type or result. * * The special value (0,0xFFFE) "Use next proto" * is used to request OVS to automatically set the new packet type based * on the decap'ed header's next protocol. */ ovs_be32 new_pkt_type; }; -Original Message- From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Ben Pfaff Sent: Monday, July 31, 2017 11:43 PM To: Zoltán Balogh Cc: d...@openvswitch.org Subject: Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic encap and decap On Fri, Jul 21, 2017 at 07:08:25PM +0200, Zoltán Balogh wrote: > From: Jan Scheurich > > This commit adds support for the OpenFlow actions generic encap and > decap (as specified in ONF EXT-382) to the OVS control plane. > > CLI syntax for encap action with properties: > encap(hdr=) > encap(hdr=, > prop(class=,type=,val=), > prop(class=,type=,val())) > > CLI syntax for decap action: > decap() > decap(packet_type(ns=,type=)) > > The first header supported for encap and decap is "ethernet" to > convert packets between packet_type (1,Ethertype) and (0,0). > > This commit also implements a skeleton for the translation of generic > encap and decap actions in ofproto-dpif and adds support to encap and > decap an Ethernet header. > > In general translation of encap commits pending actions and then > rewrites struct flow in accordance with the new packet type and > header. In the case of encap(ethernet) it suffices to change the > packet type from (1, Ethertype) to (0,0) and set the dl_type > accordingly. A new pending_encap flag in xlate ctx is set to mark that > an corresponding datapath encap action must be triggered at the next > commit. In the case of encap(ethernet) ofproto generetas a push_eth action. > > The general case for translation of decap() is to emit a datapath > action to decap the current outermost header and then recirculate the > packet to reparse the inner headers. In the special case of an > Ethernet packet, > decap() just changes the packet type from (0,0) to (1, dl_type) > without a need to recirculate. The emission of the pop_eth action for > the datapath is postponed to the next commit. > > Hence encap(ethernet) and decap() on an Ethernet packet are OF octions > that only incur a cost in the dataplane when a modifed packet is > actually committed, e.g. because it is sent out. They can freely be > used for normalizing the packet type in the OF pipeline without > degrading performance. > > Signed-off-by: Jan Scheurich > Signed-off-by: Yi Yang > Signed-off-by: Zoltan Balogh > Co-authored-by: Zoltan Balogh Thanks for working on this. I think that it will be a valuable feature. First, I'm having a really hard
[ovs-dev] [PATCH v4 2/2] tests: Extend PTAP unit tests with decap action
From: Zoltan Balogh - Checking decap() prerequisits. - Encap/decap VLAN tagged Ethernet frames. - Send L3 packet over patch port. - Output L2/L3 packet to ports with different packet_type properties. Signed-off-by: Zoltan Balogh Suggested-by: Jan Scheurich --- tests/packet-type-aware.at | 405 + 1 file changed, 405 insertions(+) diff --git a/tests/packet-type-aware.at b/tests/packet-type-aware.at index 883d9a0..8034e80 100644 --- a/tests/packet-type-aware.at +++ b/tests/packet-type-aware.at @@ -510,3 +510,408 @@ tunnel(src=20.0.0.3,dst=20.0.0.2,flags(-df-csum)),recirc_id(0),in_port(gre_sys), OVS_VSWITCHD_STOP(["/The Open vSwitch kernel module is probably not loaded/d"]) AT_CLEANUP + + +AT_SETUP([ptap - check decap() prerequisits]) +OVS_VSWITCHD_START + +# Decap IP header, then set IP destination address. This should fail. +AT_CHECK([ +ovs-ofctl add-flow br0 "in_port=1,packet_type=(1,0x800),actions=decap(),set_field:1.1.1.1->nw_dst" +], [1], [stdout], [stderr]) + +AT_CHECK([ +cat stderr | cut -d '|' -f 3- +], [0], [dnl +ofp_actions|WARN|set_field ip_dst lacks correct prerequisites +ovs-ofctl: actions are invalid with specified match (OFPBAC_MATCH_INCONSISTENT) +]) + +# Decap Ethernet header, then set IP destination address. This should work. +AT_CHECK([ +ovs-ofctl add-flow br0 -OOpenFlow13 "in_port=1,ip,actions=decap(),set_field:1.1.1.1->nw_dst" +], [0]) + +# Decap IP header, then set metadata. This should work. +AT_CHECK([ +ovs-ofctl add-flow br0 -OOpenFlow13 "in_port=1,packet_type=(1,0x800),actions=decap(),set_field:1234->metadata" +], [0]) + +OVS_VSWITCHD_STOP +AT_CLEANUP + + +AT_SETUP([ptap - check encap/decap VLAN tagged Ethernet frame]) +OVS_VSWITCHD_START([dnl +-- add-port br0 n1 -- set interface n1 type=dummy ofport_request=1 \ +-- add-port br0 n2 -- set interface n2 type=dummy ofport_request=2 \ +-- add-port br0 p1 -- set interface p1 type=patch options:peer=p2 ofport_request=3 \ +-- add-port br0 p2 -- set interface p2 type=patch options:peer=p1 ofport_request=4 +]) + +# Decap VLAN tagged Ethernet frames -> should be dropped +AT_CHECK([ +ovs-ofctl add-flow br0 -OOpenFlow13 "in_port=1,actions=push_vlan:0x8100,mod_vlan_vid:100,decap(),3" +ovs-ofctl add-flow br0 -OOpenFlow13 "in_port=4,actions=encap(hdr=ethernet),2" +], [0]) + +AT_CHECK([ +ovs-appctl netdev-dummy/receive n1 1e2ce92a669e3a6dd2099cab080045548a53400040011addc0a80a0ac0a80a1e08006f200a4d0001fc509a5827150200101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637 +ovs-appctl netdev-dummy/receive n1 1e2ce92a669e3a6dd2099cab080045548a83400040011aadc0a80a0ac0a80a1e0800b7170a4d0002fd509a58de1c0200101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637 +], [0], [ignore]) + +ovs-appctl time/warp 1000 + +AT_CHECK([ +ovs-appctl dpctl/dump-flows | strip_used | grep -v ipv6 | sort +], [0], [flow-dump from non-dpdk interfaces: +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:1, bytes:98, used:0.0s, actions:drop +]) + +# Encap(hdr=ethernet) on Ethernet frame -> should be droped +AT_CHECK([ +ovs-ofctl del-flows br0 +ovs-ofctl add-flow br0 -OOpenFlow13 "in_port=1,actions=3" +ovs-ofctl add-flow br0 -OOpenFlow13 "in_port=4,actions=encap(hdr=ethernet),2" +]) + +ovs-appctl time/warp 11000 + +AT_CHECK([ +ovs-appctl netdev-dummy/receive n1 1e2ce92a669e3a6dd2099cab080045548a53400040011addc0a80a0ac0a80a1e08006f200a4d0001fc509a5827150200101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637 +ovs-appctl netdev-dummy/receive n1 1e2ce92a669e3a6dd2099cab080045548a83400040011aadc0a80a0ac0a80a1e0800b7170a4d0002fd509a58de1c0200101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637 +], [0], [ignore]) + +ovs-appctl time/warp 1000 + +AT_CHECK([ +ovs-appctl dpctl/dump-flows | strip_used | grep -v ipv6 | sort +], [0], [flow-dump from non-dpdk interfaces: +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:1, bytes:98, used:0.0s, actions:drop +]) + +# Encap(hdr=ethernet) on VLAN tagged Ethernet frame -> should be droped +AT_CHECK([ +ovs-ofctl del-flows br0 +ovs-ofctl add-flow br0 -OOpenFlow13 "in_port=1,actions=push_vlan:0x8100,mod_vlan_vid:100,encap(hdr=ethernet),3" +ovs-ofctl add-flow br0 -OOpenFlow13 "in_port=4,actions=2" +]) + +ovs-appctl time/warp 11000 + +AT_CHECK([ +ovs-appctl netdev-dummy/receive n1 1e2ce92a669e3a6dd2099cab080045548a53400040011addc0a80a0ac0a80a1e08006f200a4d0001fc509a5827150200101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637 +ovs-appctl netdev-dummy/receive n1 1e2ce92a669e3a6dd2099cab080045548a83400040011aadc0a80a0ac0a80a1e0800b7170a4d0002fd509a58de1c02001
[ovs-dev] [PATCH v4 1/2] OF support and translation of generic encap and decap
From: Jan Scheurich This commit adds support for the OpenFlow actions generic encap and decap (as specified in ONF EXT-382) to the OVS control plane. CLI syntax for encap action with properties: encap(hdr=) encap(hdr=, prop(class=,type=,val=), prop(class=,type=,val())) CLI syntax for decap action: decap() decap(packet_type(ns=,type=)) The first header supported for encap and decap is "ethernet" to convert packets between packet_type (1,Ethertype) and (0,0). This commit also implements a skeleton for the translation of generic encap and decap actions in ofproto-dpif and adds support to encap and decap an Ethernet header. In general translation of encap commits pending actions and then rewrites struct flow in accordance with the new packet type and header. In the case of encap(ethernet) it suffices to change the packet type from (1, Ethertype) to (0,0) and set the dl_type accordingly. A new pending_encap flag in xlate ctx is set to mark that an corresponding datapath encap action must be triggered at the next commit. In the case of encap(ethernet) ofproto generetas a push_eth action. The general case for translation of decap() is to emit a datapath action to decap the current outermost header and then recirculate the packet to reparse the inner headers. In the special case of an Ethernet packet, decap() just changes the packet type from (0,0) to (1, dl_type) without a need to recirculate. The emission of the pop_eth action for the datapath is postponed to the next commit. Hence encap(ethernet) and decap() on an Ethernet packet are OF octions that only incur a cost in the dataplane when a modifed packet is actually committed, e.g. because it is sent out. They can freely be used for normalizing the packet type in the OF pipeline without degrading performance. Signed-off-by: Jan Scheurich Signed-off-by: Yi Yang Signed-off-by: Zoltan Balogh Co-authored-by: Zoltan Balogh --- NEWS | 6 + include/openflow/openflow-common.h | 1 + include/openvswitch/automake.mk| 1 + include/openvswitch/ofp-actions.h | 32 include/openvswitch/ofp-ed-props.h | 77 include/openvswitch/ofp-errors.h | 9 + lib/automake.mk| 1 + lib/odp-util.c | 84 +--- lib/odp-util.h | 3 +- lib/ofp-actions.c | 380 - lib/ofp-ed-props.c | 151 +++ lib/packets.h | 3 +- ofproto/ofproto-dpif-xlate.c | 116 ++- utilities/ovs-ofctl.8.in | 73 ++- 14 files changed, 890 insertions(+), 47 deletions(-) create mode 100644 include/openvswitch/ofp-ed-props.h create mode 100644 lib/ofp-ed-props.c diff --git a/NEWS b/NEWS index facea02..ca02ca7 100644 --- a/NEWS +++ b/NEWS @@ -62,11 +62,17 @@ Post-v2.7.0 * The "learn" action now supports a "limit" option (see ovs-ofctl(8)). * The port status bit OFPPS_LIVE now reflects link aliveness. * OpenFlow 1.5 packet-out is now supported. + * Support for OpenFlow 1.5 field packet_type and packet-type-aware + pipeline (PTAP). + * Added generic encap and decap actions (EXT-382). + First supported use case is encap/decap for Ethernet. - Fedora Packaging: * OVN services are no longer restarted automatically after upgrade. - Add --cleanup option to command 'ovs-appctl exit' (see ovs-vswitchd(8)). - L3 tunneling: * Use new tunnel port option "packet_type" to configure L2 vs. L3. + * In conjunction with PTAP tunnel ports can handle a mix of L2 and L3 + payload. * New vxlan tunnel extension "gpe" to support VXLAN-GPE tunnels. * New support for non-Ethernet (L3) payloads in GRE and VXLAN-GPE. - The BFD detection multiplier is now user-configurable. diff --git a/include/openflow/openflow-common.h b/include/openflow/openflow-common.h index 5f1e225..d665519 100644 --- a/include/openflow/openflow-common.h +++ b/include/openflow/openflow-common.h @@ -466,6 +466,7 @@ enum ofp_header_type_namespaces { OFPHTN_IP_PROTO = 2,/* ns_type is a IP protocol number. */ OFPHTN_UDP_TCP_PORT = 3,/* ns_type is a TCP or UDP port. */ OFPHTN_IPV4_OPTION = 4, /* ns_type is an IPv4 option number. */ +OFPHTN_N_TYPES }; #endif /* openflow/openflow-common.h */ diff --git a/include/openvswitch/automake.mk b/include/openvswitch/automake.mk index 6bace61..0a139d0 100644 --- a/include/openvswitch/automake.mk +++ b/include/openvswitch/automake.mk @@ -12,6 +12,7 @@ openvswitchinclude_HEADERS = \ include/openvswitch/meta-flow.h \ include/openvswitch/ofpbuf.h \ include/openvswitch/ofp-actions.h \ + include/openvswitch/ofp-ed-props.h \ include/openvswitch/ofp-errors.h \ include/openvswitch/ofp-msgs.h \ include/openvswitch/ofp-parse.h \ diff --git a/include/openvswitch/ofp-actions.h b/include/
[ovs-dev] [PATCH v4 0/2] basic encap/decap
This series is a continuation of other patch series initiated by Jan Scheurich before. The main purpose of this series is to add support for the OpenFlow actions generic encap and decap (ONF EXT-382) to the OVS control plane. It implements a skeleton for translation of generic encap and decap actions in ofproto-dpif and provides support to encap and decap an Ethernet header. v3->v4 - Add encap and decap actions man page in ovs-ofctl.8 - Fix some alignment errors by ALIGNED_CAST - Fix some comments style issues - Fix various trivial issues per Ben Pfaff's comments - Rebase to current master branch v2->v3 - NEWS updated. - fix: drop VLAN tagged packet trying to decap it. - New unit tests for the fix. - Some tests were updated due to change in recirculation on master branch. v1->v2 - Squash 1/4 and 2/4 commits of v1. - Put unit tests in a separate commit. - Use aligned cast. - Nicira extension numbers for encap/decap action numbers and error codes. - Small fixes according to comments. Jan Scheurich (1): OF support and translation of generic encap and decap Zoltan Balogh (1): tests: Extend PTAP unit tests with decap action NEWS | 6 + include/openflow/openflow-common.h | 1 + include/openvswitch/automake.mk| 1 + include/openvswitch/ofp-actions.h | 32 +++ include/openvswitch/ofp-ed-props.h | 77 +++ include/openvswitch/ofp-errors.h | 9 + lib/automake.mk| 1 + lib/odp-util.c | 84 +--- lib/odp-util.h | 3 +- lib/ofp-actions.c | 380 +- lib/ofp-ed-props.c | 151 ++ lib/packets.h | 3 +- ofproto/ofproto-dpif-xlate.c | 116 ++- tests/packet-type-aware.at | 405 + utilities/ovs-ofctl.8.in | 73 ++- 15 files changed, 1295 insertions(+), 47 deletions(-) create mode 100644 include/openvswitch/ofp-ed-props.h create mode 100644 lib/ofp-ed-props.c -- 2.1.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [RFC] Basic active-active HA for ovn-northd
On Tue, Aug 1, 2017 at 4:25 PM, Numan Siddique wrote: > > > On Tue, Aug 1, 2017 at 5:00 AM, Han Zhou wrote: > >> On Mon, Jul 31, 2017 at 1:53 PM, Russell Bryant wrote: >> > >> > I wanted to share the idea before I code it to see if it makes sense. >> > I imagine the patch would be small, though. >> > >> > We currently provide HA for ovn-northd by using Pacemaker to ensure >> > that ovn-northd is running only one time somewhere in a cluster. >> > >> > > In the case of Pacemaker, the pacemaker OCF resource script > (ovndb-servers.ocf) starts the ovn-northd on the master node if > manage_northd is set to true and this is the approach tripleo has taken. > ovn-northd uses the unix sockets to communicate to the NB and SB db servers. > With the suggested approach, ovn-northd would be using tcp connection to > communicate to the db servers ? > Given that SB DB ovsdb-server would assign the lock, it has to be a tcp connection. Sorry for the confusion. > Thanks > Numan > > > > > > What if we made ovn-northd acquire an OVSDB lock on the southbound >> > database before it did any real work? That way we could start >> > multiple copies of ovn-northd in a cluster, but only one would be >> > active at a time. >> > >> > This is crude, and obviously we would want to distribute work among >> > ovn-northd instances eventually, but does this sound like an >> > improvement over requiring something like Pacemaker? >> > >> Russell, this sounds very good. I think it is better than Pacemaker. I >> would still call it active-standby though. The standby is hot-standby, not >> cold-standby. >> >> It seems the change would be having the active one updating > northd hostname> periodically to southbound DB, so that other northd will >> know if it is still alive, and otherwise taking over by updating the >> . Is this correct? >> > > > >> ___ >> dev mailing list >> d...@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 1/5] dpif-netdev: move pkt metadata init out of emc_processing.
> -Original Message- > From: O Mahony, Billy > Sent: Monday, July 31, 2017 4:33 PM > To: Fischetti, Antonio ; d...@openvswitch.org > Subject: RE: [ovs-dev] [PATCH v2 1/5] dpif-netdev: move pkt metadata init out > of emc_processing. > > There is also a reference to md_is_valid is the comments of emc_processing > that > needs to be removed. [Antonio] thanks will fix. > > > -Original Message- > > From: O Mahony, Billy > > Sent: Monday, July 31, 2017 4:04 PM > > To: 'antonio.fische...@intel.com' ; > > d...@openvswitch.org > > Subject: RE: [ovs-dev] [PATCH v2 1/5] dpif-netdev: move pkt metadata init > > out of emc_processing. > > > > Hi Antionio, > > > > This looks like a reasonable change to me. > > > > Can you add some performance statistics for when dealing with re-circulated > > packets? [Antonio] I used the ConnectionTracker setup for a firewall like table=0, priority=100,ct_state=-trk,ip actions=ct(table=1) table=0, priority=10,arp actions=NORMAL table=0, priority=1 actions=drop table=1, ct_state=+new+trk,ip,in_port=dpdk0 actions=ct(commit),output:dpdk1 table=1, ct_state=+new+trk,ip,in_port=dpdk1 actions=drop table=1, ct_state=+est+trk,ip,in_port=dpdk0 actions=output:dpdk1 table=1, ct_state=+est+trk,ip,in_port=dpdk1 actions=output:dpdk0 I was sending 10 different UDP streams from the traffic generator, 2 PMDs with 3 Tx queues, 64 B packets at the line rate. I got the following performance figures when I measured the received pkt rate [Mpps] on the 2 sides, regardless of packet loss. Original OvS-DPDK 2.582.60 + this patch 2.672.71 I didn't test with more UDP streams because - as the performance drops - the relative improvement becomes so small that is less visible and measurable. > > > > /Billy. > > > > > -Original Message- > > > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- > > > boun...@openvswitch.org] On Behalf Of antonio.fische...@intel.com > > > Sent: Wednesday, July 19, 2017 5:05 PM > > > To: d...@openvswitch.org > > > Subject: [ovs-dev] [PATCH v2 1/5] dpif-netdev: move pkt metadata init > > > out of emc_processing. > > > > > > Packet metadata initialization is moved into dp_netdev_input to > > > improve performance. > > > > > > Signed-off-by: Antonio Fischetti > > > --- > > > In my testbench with the following port to port flow setup: > > > in_port=1,action=output:2 > > > in_port=2,action=output:1 > > > > > > I measured packet Rx rate (regardless of packet loss) in a > > > Bidirectional test with 64B UDP packets. > > > > > > I saw the following performance improvement > > > > > > Orig: 11.30, 11.54 Mpps > > > Orig + patch: 11.70, 11.76 Mpps > > > > > > lib/dpif-netdev.c | 21 ++--- > > > 1 file changed, 10 insertions(+), 11 deletions(-) > > > > > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index > > > 98e7765..123e04a > > > 100644 > > > --- a/lib/dpif-netdev.c > > > +++ b/lib/dpif-netdev.c > > > @@ -4572,8 +4572,7 @@ static inline size_t emc_processing(struct > > > dp_netdev_pmd_thread *pmd, > > > struct dp_packet_batch *packets_, > > > struct netdev_flow_key *keys, > > > - struct packet_batch_per_flow batches[], size_t *n_batches, > > > - bool md_is_valid, odp_port_t port_no) > > > + struct packet_batch_per_flow batches[], size_t > > > + *n_batches) > > > { > > > struct emc_cache *flow_cache = &pmd->flow_cache; > > > struct netdev_flow_key *key = &keys[0]; @@ -4601,9 +4600,6 @@ > > > emc_processing(struct dp_netdev_pmd_thread *pmd, > > > pkt_metadata_prefetch_init(&packets[i+1]->md); > > > } > > > > > > -if (!md_is_valid) { > > > -pkt_metadata_init(&packet->md, port_no); > > > -} > > > miniflow_extract(packet, &key->mf); > > > key->len = 0; /* Not computed yet. */ > > > key->hash = dpif_netdev_packet_get_rss_hash(packet, > > > &key->mf); @@ -4805,8 +4801,7 @@ fast_path_processing(struct > > > dp_netdev_pmd_thread *pmd, > > > * valid, 'md_is_valid' must be true and 'port_no' will be ignored. > > > */ static void dp_netdev_input__(struct dp_netdev_pmd_thread *pmd, > > > - struct dp_packet_batch *packets, > > > - bool md_is_valid, odp_port_t port_no) > > > + struct dp_packet_batch *packets) > > > { > > > int cnt = packets->count; > > > #if !defined(__CHECKER__) && !defined(_WIN32) @@ -4823,8 +4818,7 > > @@ > > > dp_netdev_input__(struct dp_netdev_pmd_thread *pmd, > > > odp_port_t in_port; > > > > > > n_batches = 0; > > > -emc_processing(pmd, packets, keys, batches, &n_batches, > > > -md_is_valid, port_no); > > > +emc_processing(pmd, packets, keys, batches, &n_batches); > > > if (!dp_packet_batch_is_empty(packets)) { > > > /* Get ingress port from first packet's metadata. */ > > > in_port = packets->packet
Re: [ovs-dev] [RFC] Basic active-active HA for ovn-northd
On Tue, Aug 1, 2017 at 5:00 AM, Han Zhou wrote: > On Mon, Jul 31, 2017 at 1:53 PM, Russell Bryant wrote: > > > > I wanted to share the idea before I code it to see if it makes sense. > > I imagine the patch would be small, though. > > > > We currently provide HA for ovn-northd by using Pacemaker to ensure > > that ovn-northd is running only one time somewhere in a cluster. > > > In the case of Pacemaker, the pacemaker OCF resource script (ovndb-servers.ocf) starts the ovn-northd on the master node if manage_northd is set to true and this is the approach tripleo has taken. ovn-northd uses the unix sockets to communicate to the NB and SB db servers. With the suggested approach, ovn-northd would be using tcp connection to communicate to the db servers ? Thanks Numan > What if we made ovn-northd acquire an OVSDB lock on the southbound > > database before it did any real work? That way we could start > > multiple copies of ovn-northd in a cluster, but only one would be > > active at a time. > > > > This is crude, and obviously we would want to distribute work among > > ovn-northd instances eventually, but does this sound like an > > improvement over requiring something like Pacemaker? > > > Russell, this sounds very good. I think it is better than Pacemaker. I > would still call it active-standby though. The standby is hot-standby, not > cold-standby. > > It seems the change would be having the active one updating northd hostname> periodically to southbound DB, so that other northd will > know if it is still alive, and otherwise taking over by updating the > . Is this correct? > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 3/5] dpif-netdev: Skip EMC lookup/insert for recirc packets.
Hi Antonio, Unfortunately I think the performance deltas of this here probably need to be re-worked given the bug discovered & fixed in EMC Insertion algorithm here which according to the patch notes will significantly reduce EMC contention for a given number of flows. https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336452.html However, before you commit more effort I would like to post a proposal to the list on a more generalized EMC load-shedding mechanism which I think could be more effective as it would be more granular than shedding just re-circulated traffic. I hope to post that today. Regards, /Billy > -Original Message- > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev- > boun...@openvswitch.org] On Behalf Of antonio.fische...@intel.com > Sent: Wednesday, July 19, 2017 5:05 PM > To: d...@openvswitch.org > Subject: [ovs-dev] [PATCH v2 3/5] dpif-netdev: Skip EMC lookup/insert for > recirc packets. > > When OVS is configured as a firewall, with thousands of active concurrent > connections, the EMC gets quicly saturated and may come under heavy > thrashing for the reason that original and recirculated packets keep overwrite > existing active EMC entries due to its limited size (8k). > > This thrashing causes the EMC to be less efficient than the dcpls in terms of > lookups and insertions. > > This patch allows to use the EMC efficiently by allowing only the 'original' > packets to hit EMC. All recirculated packets are sent to the classifier > directly. > An empirical threshold (EMC_RECIRCT_NO_INSERT_THRESHOLD - of 50%) for > EMC occupancy is set to trigger this logic. By doing so when EMC utilization > exceeds > EMC_RECIRCT_NO_INSERT_THRESHOLD: > - EMC Insertions are allowed just for original packets. EMC insertion >and look up is skipped for recirculated packets. > - Recirculated packets are sent to the classifier. > > This patch is based on patch > "dpif-netdev: add EMC entry count and %full figure to pmd-stats-show" at: > https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327570.html > Also, this patch depends on the previous one in this series. > > Signed-off-by: Antonio Fischetti > Signed-off-by: Bhanuprakash Bodireddy > > Co-authored-by: Bhanuprakash Bodireddy > > --- > In our Connection Tracker testbench set up with > > table=0, priority=1 actions=drop > table=0, priority=10,arp actions=NORMAL table=0, priority=100,ct_state=- > trk,ip actions=ct(table=1) table=1, ct_state=+new+trk,ip,in_port=1 > actions=ct(commit),output:2 table=1, ct_state=+est+trk,ip,in_port=1 > actions=output:2 table=1, ct_state=+new+trk,ip,in_port=2 actions=drop > table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1 > > we saw the following performance improvement. > > We measured packet Rx rate (regardless of packet loss). Bidirectional test > with 64B UDP packets. > Each row is a test with a different number of traffic streams. The traffic > generator is set so that each stream establishes one UDP connection. > Mpps columns reports the Rx rates on the 2 sides. > > +--+---+ > | Original OvS-DPDK |Previous case | > | + patches #1,2 |+ this patch | > -++-++--+ > Traffic | Rx | EMC | Rx | EMC| > Streams | [Mpps] | entries | [Mpps] | entries | > -++-++--+ > 10 | 2.60, 2.67 |20 | 2.60, 2.64 |20| > 100 | 2.53, 2.58 | 200 | 2.59, 2.61 | 201| >1,000 | 2.02, 2.03 | 1929 | 2.15, 2.15 | 1997| >2,000 | 1.94, 1.96 | 3661 | 1.97, 1.98 | 3668| >3,000 | 1.87, 1.90 | 5086 | 1.96, 1.98 | 4736| >4,000 | 1.82, 1.82 | 6173 | 1.95, 1.94 | 5280| > 10,000 | 1.68, 1.69 | 7826 | 1.84, 1.84 | 7102| > 30,000 | 1.57, 1.58 | 8192 | 1.68, 1.70 | 8192| > -++-++--+ > > This test setup implies 1 recirculation on each received packet. > We didn't check this patch in a test scenario where more than 1 recirculation > is occurring per packet. > > lib/dpif-netdev.c | 63 > ++- > 1 file changed, 58 insertions(+), 5 deletions(-) > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 9562827..79efce6 > 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -4573,6 +4573,9 @@ dp_netdev_queue_batches(struct dp_packet *pkt, > packet_batch_per_flow_update(batch, pkt, mf); } > > +/* Threshold to skip EMC for recirculated packets. */ #define > +EMC_RECIRCT_NO_INSERT_THRESHOLD 0xF000 > + > /* Try to process all ('cnt') the 'packets' using only the exact match cache > * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the > * miniflow is copied into 'keys' and the packet pointer is moved at the @@ - > 4620,15
Re: [ovs-dev] [PATCH v4] Update relevant artifacts to add support for DPDK 17.05.1.
On 07/25/2017 02:34 PM, Michal Weglicki wrote: Upgrading to DPDK 17.05.1 stable release adds new significant features relevant to OVS, including, but not limited to: - tun/tap PMD, - VFIO hotplug support, - Generic flow API. Following changes are applied: - netdev-dpdk: Changes required by DPDK API modifications. - doc: Because of DPDK API changes, backward compatibility with previous DPDK releases will be broken, thus all relevant documentation entries are updated. - .travis: DPDK version change from 16.11.1 to 17.05.1. - rhel/openvswitch-fedora.spec.in: DPDK version change from 16.11 to 17.05.1 v1->v2: Patch rebase. v2->v3: Fixed wrong formating after v2 patch rebase. v3->v4: Minor documentation changes. Signed-off-by: Michal Weglicki Reviewed-by: Aaron Conole --- .travis/linux-build.sh | 2 +- Documentation/faq/releases.rst | 1 + Documentation/howto/dpdk.rst | 6 +- Documentation/intro/install/dpdk.rst | 14 +-- Documentation/topics/dpdk/vhost-user.rst | 12 +-- NEWS | 1 + lib/netdev-dpdk.c| 144 +++ rhel/openvswitch-fedora.spec.in | 2 +- tests/dpdk/ring_client.c | 6 +- 9 files changed, 114 insertions(+), 74 deletions(-) [...] diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in index 3a045d3..2bb7102 100644 --- a/rhel/openvswitch-fedora.spec.in +++ b/rhel/openvswitch-fedora.spec.in @@ -84,7 +84,7 @@ BuildRequires: libcap-ng libcap-ng-devel %endif %if %{with dpdk} BuildRequires: libpcap-devel numactl-devel -BuildRequires: dpdk-devel >= 16.11 +BuildRequires: dpdk-devel >= 17.05.1 Provides: %{name}-dpdk = %{version}-%{release} %endif Hi, on Fedora there is no dpdk-devel 17.05.1 package, the last rawhide version is dpdk-devel 17.05 (and usually Fedora doesn't package minor releases of dpdk). I tried to build openvswitch with the dpdk 17.05 package from rawhide. It build and it works (tested with dpdkvhostuserclient) so, unless you need a specific fix from 17.05.1, I suggest to change the BuildRequires to 17.05 instead of 17.05.1 or the package cannot be built on Fedora, unless you are using an additional repository (copr). ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev