date:20170801

[ovs-dev] DPDK Merge Repo

2017-08-01 Thread Darrell Ball

Hi All

As mentioned before, I am using a repo for DPDK patch merging.
The repo is here:
https://github.com/darball/ovs/

There are still some outstanding patches from Bhanu that have not completed 
review yet:

util: Add PADDED_MEMBERS_CACHELINE_MARKER macro to mark cachelines.- Bhanu
packets: Reorganize the pkt_metadata structure. - Bhanu

and a series we would like to get into 2.8

netdev-dpdk: Use intermediate queue during packet transmission.  Bhanu Jun 29/V3
netdev: Add netdev_txq_flush function.
netdev-dpdk: Add netdev_dpdk_txq_flush function.
netdev-dpdk: Add netdev_dpdk_vhost_txq_flush function.
netdev-dpdk: Add intermediate queue support.
netdev-dpdk: Enable intermediate queue for vHost User port.
dpif-netdev: Flush the packets in intermediate queue.

Please let me know if something else is approved but missed ?
Anything else ?

Thanks Darrell


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v4] Update relevant artifacts to add support for DPDK 17.05.1.

2017-08-01 Thread Szczerbik, PrzemyslawX

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Timothy M. Redaelli
> Sent: Tuesday, August 1, 2017 11:37 AM
> To: Weglicki, MichalX ; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v4] Update relevant artifacts to add support for
> DPDK 17.05.1.
> 
> On 07/25/2017 02:34 PM, Michal Weglicki wrote:
> > Upgrading to DPDK 17.05.1 stable release adds new
> > significant features relevant to OVS, including,
> > but not limited to:
> > - tun/tap PMD,
> > - VFIO hotplug support,
> > - Generic flow API.
> >
> > Following changes are applied:
> > - netdev-dpdk: Changes required by DPDK API modifications.
> > - doc: Because of DPDK API changes, backward compatibility
> >with previous DPDK releases will be broken, thus all
> >relevant documentation entries are updated.
> > - .travis: DPDK version change from 16.11.1 to 17.05.1.
> > - rhel/openvswitch-fedora.spec.in: DPDK version change
> >from 16.11 to 17.05.1
> >
> > v1->v2: Patch rebase.
> > v2->v3: Fixed wrong formating after v2 patch rebase.
> > v3->v4: Minor documentation changes.
> >
> > Signed-off-by: Michal Weglicki 
> > Reviewed-by: Aaron Conole 
> > ---
> >   .travis/linux-build.sh   |   2 +-
> >   Documentation/faq/releases.rst   |   1 +
> >   Documentation/howto/dpdk.rst |   6 +-
> >   Documentation/intro/install/dpdk.rst |  14 +--
> >   Documentation/topics/dpdk/vhost-user.rst |  12 +--
> >   NEWS |   1 +
> >   lib/netdev-dpdk.c| 144 
> > +++
> >   rhel/openvswitch-fedora.spec.in  |   2 +-
> >   tests/dpdk/ring_client.c |   6 +-
> >   9 files changed, 114 insertions(+), 74 deletions(-)
> >
> [...]
> > diff --git a/rhel/openvswitch-fedora.spec.in 
> > b/rhel/openvswitch-fedora.spec.in
> > index 3a045d3..2bb7102 100644
> > --- a/rhel/openvswitch-fedora.spec.in
> > +++ b/rhel/openvswitch-fedora.spec.in
> > @@ -84,7 +84,7 @@ BuildRequires: libcap-ng libcap-ng-devel
> >   %endif
> >   %if %{with dpdk}
> >   BuildRequires: libpcap-devel numactl-devel
> > -BuildRequires: dpdk-devel >= 16.11
> > +BuildRequires: dpdk-devel >= 17.05.1
> >   Provides: %{name}-dpdk = %{version}-%{release}
> >   %endif
> >
> 
> 
> Hi,
> on Fedora there is no dpdk-devel 17.05.1 package, the last rawhide
> version is dpdk-devel 17.05 (and usually Fedora doesn't package minor
> releases of dpdk).
> 
> I tried to build openvswitch with the dpdk 17.05 package from rawhide.
> It build and it works (tested with dpdkvhostuserclient) so, unless you
> need a specific fix from 17.05.1, I suggest to change the BuildRequires
> to 17.05 instead of 17.05.1 or the package cannot be built on Fedora,
> unless you are using an additional repository (copr).
Hi,

If I'm not mistaken we moved to 17.05.1 version, because it includes fix for 
'vhost: fix crash on NUMA', which was causing ovs-vswitchd to terminate.
More details here: http://dpdk.org/dev/patchwork/patch/25012/

Regards,
Przemek
--
Intel Research and Development Ireland Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263


This e-mail and any attachments may contain confidential material for the sole
use of the intended recipient(s). Any review or distribution by others is
strictly prohibited. If you are not the intended recipient, please contact the
sender and delete all copies.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v5] dpctl: Add new 'ct-bkts' command.

2017-08-01 Thread Darrell Ball

Hi Antonio

I noticed a problem in final testing and I made a small incremental change.
Pls let me know if this is ok ?

Thanks Darrell

diff --git a/lib/dpctl.c b/lib/dpctl.c
index 8eb0671..59f09b7 100644
--- a/lib/dpctl.c
+++ b/lib/dpctl.c
@@ -1518,8 +1518,7 @@ dpctl_ct_bkts(int argc, const char *argv[],
 dpctl_print(dpctl_p, "Total Buckets: %d\n", tot_bkts);
 
 int tot_conn = 0;
-uint32_t conn_per_bkts[tot_bkts];
-memset(conn_per_bkts, 0, sizeof(uint32_t) * tot_bkts);
+uint32_t *conn_per_bkts = xzalloc(tot_bkts * sizeof(uint32_t));
 
 while (!ct_dpif_dump_next(dump, &cte)) {
 ct_dpif_entry_uninit(&cte);
@@ -1560,6 +1559,7 @@ dpctl_ct_bkts(int argc, const char *argv[],
 
 ct_dpif_dump_done(dump);
 dpif_close(dpif);
+free(conn_per_bkts);
 return error;
 }



-Original Message-
From:  on behalf of 
"antonio.fische...@intel.com" 
Date: Monday, July 24, 2017 at 3:01 AM
To: "d...@openvswitch.org" 
Subject: [ovs-dev] [PATCH v5] dpctl: Add new 'ct-bkts' command.

With the command:
 ovs-appctl dpctl/ct-bkts
shows the number of connections per bucket.

By using a threshold:
 ovs-appctl dpctl/ct-bkts gt=N
for each bucket shows the number of connections when they
are greater than N.

Signed-off-by: Antonio Fischetti 
Signed-off-by: Bhanuprakash Bodireddy 
Co-authored-by: Bhanuprakash Bodireddy 
---
 lib/conntrack.c|   9 ++--
 lib/conntrack.h|   2 +-
 lib/ct-dpif.c  |   4 +-
 lib/ct-dpif.h  |   3 +-
 lib/dpctl.c| 103 
-
 lib/dpctl.man  |   7 +++
 lib/dpif-netdev.c  |   4 +-
 lib/dpif-netlink.c |   4 +-
 lib/dpif-provider.h|   2 +-
 lib/netlink-conntrack.c|   6 ++-
 lib/netlink-conntrack.h|   3 +-
 tests/test-netlink-conntrack.c |   3 +-
 utilities/ovs-dpctl.c  |   1 +
 13 files changed, 134 insertions(+), 17 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index de46a6b..e290b20 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1931,7 +1931,7 @@ conn_key_to_tuple(const struct conn_key *key, struct 
ct_dpif_tuple *tuple)
 
 static void
 conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry,
-  long long now)
+  long long now, int bkt)
 {
 struct ct_l4_proto *class;
 long long expiration;
@@ -1954,11 +1954,12 @@ conn_to_ct_dpif_entry(const struct conn *conn, 
struct ct_dpif_entry *entry,
 if (class->conn_get_protoinfo) {
 class->conn_get_protoinfo(conn, &entry->protoinfo);
 }
+entry->bkt = bkt;
 }
 
 int
 conntrack_dump_start(struct conntrack *ct, struct conntrack_dump *dump,
- const uint16_t *pzone)
+ const uint16_t *pzone, int *ptot_bkts)
 {
 memset(dump, 0, sizeof(*dump));
 if (pzone) {
@@ -1967,6 +1968,8 @@ conntrack_dump_start(struct conntrack *ct, struct 
conntrack_dump *dump,
 }
 dump->ct = ct;
 
+*ptot_bkts = CONNTRACK_BUCKETS;
+
 return 0;
 }
 
@@ -1991,7 +1994,7 @@ conntrack_dump_next(struct conntrack_dump *dump, 
struct ct_dpif_entry *entry)
 INIT_CONTAINER(conn, node, node);
 if ((!dump->filter_zone || conn->key.zone == dump->zone) &&
  (conn->conn_type != CT_CONN_TYPE_UN_NAT)) {
-conn_to_ct_dpif_entry(conn, entry, now);
+conn_to_ct_dpif_entry(conn, entry, now, dump->bucket);
 break;
 }
 /* Else continue, until we find an entry in the appropriate 
zone
diff --git a/lib/conntrack.h b/lib/conntrack.h
index defde4c..3f48444 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -108,7 +108,7 @@ struct conntrack_dump {
 struct ct_dpif_entry;
 
 int conntrack_dump_start(struct conntrack *, struct conntrack_dump *,
- const uint16_t *pzone);
+ const uint16_t *pzone, int *);
 int conntrack_dump_next(struct conntrack_dump *, struct ct_dpif_entry *);
 int conntrack_dump_done(struct conntrack_dump *);
 
diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
index f8d2cf1..c79e69e 100644
--- a/lib/ct-dpif.c
+++ b/lib/ct-dpif.c
@@ -65,12 +65,12 @@ static const struct flags ct_dpif_status_flags[] = {
  * that represents the error.  Otherwise it returns zero. */
 int
 ct_dpif_dump_start(struct dpif *dpif, struct ct_dpif_dump_state **dump,
-   const uint16_t *zone)
+   const uint16_t *zone, int *ptot_bkts)
 {
 int err;
 
 err =

Re: [ovs-dev] [PATCH] Prepare for 2.8.0.

2017-08-01 Thread Justin Pettit

> On Aug 1, 2017, at 12:09 PM, Ben Pfaff  wrote:
> 
> Signed-off-by: Ben Pfaff 

I usually hold off on putting the changes in "debian/changelog", since NEWS 
often continues to grow post-branch, and it's easy to miss updates.  However, I 
don't know how often people look at that file, so it may not matter.

Acked-by: Justin Pettit 

--Justin

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] RFC: Let's branch for 2.8 today

2017-08-01 Thread Yang, Yi Y

Hi, Ben

Ericsson, Redhat and Intel has thought OVS 2.8 would include NSH support, there 
are some projects depending on this, for example, Opendaylight (Openflowplugin, 
SFC and NetVirt), OPNFV and Openstack (networking-sfc).

This email is long and easy to make people misunderstand :-) my readout is OVS 
2.8 will include NSH support although it is branched now, can you double 
confirm this? My stakeholders need an affirmative answer.

-Original Message-
From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org] 
On Behalf Of Ben Pfaff
Sent: Wednesday, August 2, 2017 1:24 AM
To: d...@openvswitch.org
Subject: [ovs-dev] RFC: Let's branch for 2.8 today

I'd like to create branch-2.8 today in preparation for releasing OVS 2.8 later 
in August.  I sent the patch that would lead off the branch a few minutes ago:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/336544.html

According to our published schedule, we should have create branch-2.8 at the 
beginning of July, but we delayed to allow some more features to enter the 2.8 
release.  There continue to be reasonable arguments along these lines in favor 
of delaying branching further:

* Many DPDK patches have been posted but are still under review
  or awaiting application.

* Multiple series of patches leading up to NSH support have been
  posted but are still iterating through reviews.

The primary argument in favor of branching quickly is to keep to a regular 
release cadence.  My understanding is that multiple downstream projects rely on 
prompt Open vSwitch releases.  (This is mainly based on speaking to Russell 
Bryant at Red Hat; he can fill in the details if need be.)  I'd prefer to stick 
to our schedule for this reason.

I propose that we branch today, with the following further details:

* Continue to apply patches that add features, that as of now
  have already been posted and ready for review or iterating on
  reviews, until approximately this Friday.  These patches will
  need to be committed both to master and branch-2.8.

* As always, bug fix patches are acceptable on any branch at any
  time.

* Plan to release, given acceptable quality, during the last
  week of August.

Thoughts?

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] ovn-northd: Add native active-standby HA.

2017-08-01 Thread Numan Siddique

On Wed, Aug 2, 2017 at 1:18 AM, Russell Bryant  wrote:

> On Tue, Aug 1, 2017 at 3:26 PM, Han Zhou  wrote:
> >
> >
> > On Tue, Aug 1, 2017 at 9:19 AM, Russell Bryant  wrote:
> >>
> >> Add native support for active-standby HA in ovn-northd by having each
> >> instance attempt to acquire an OVSDB lock.  Only the instance of
> >> ovn-northd that currently holds the lock will make active changes to
> >> the OVN databases.
> >>
> >> Signed-off-by: Russell Bryant 
> >> ---
> >>  NEWS|  1 +
> >>  ovn/northd/ovn-northd.8.xml |  9 +
> >>  ovn/northd/ovn-northd.c | 40 ++
> +-
> >>  3 files changed, 41 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/NEWS b/NEWS
> >> index facea0228..f3cdd2443 100644
> >> --- a/NEWS
> >> +++ b/NEWS
> >> @@ -49,6 +49,7 @@ Post-v2.7.0
> >> one chassis is specified, OVN will manage high availability for
> >> that
> >> gateway.
> >>   * Add support for ACL logging.
> >> + * ovn-northd now has native support for active-standby high
> >> availability.
> >> - Tracing with ofproto/trace now traces through recirculation.
> >> - OVSDB:
> >>   * New support for role-based access control (see ovsdb-server(1)).
> >> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> >> index 1527e8a60..0d85ec0d2 100644
> >> --- a/ovn/northd/ovn-northd.8.xml
> >> +++ b/ovn/northd/ovn-northd.8.xml
> >> @@ -72,6 +72,15 @@
> >>
> >>  
> >>
> >> +Active-Standby for High Availability
> >> +
> >> +  You may run ovn-northd more than once in an OVN
> >> deployment.
> >> +  OVN will automatically ensure that only one of them is active at
> a
> >> time.
> >> +  If multiple instances of ovn-northd are running and
> >> the
> >> +  active ovn-northd fails, one of the hot standby
> >> instances
> >> +  of ovn-northd will automatically take over.
> >> +
> >> +
> >>  Logical Flow Table Structure
> >>
> >>  
> >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> >> index 10e0c7ce0..3d2be4267 100644
> >> --- a/ovn/northd/ovn-northd.c
> >> +++ b/ovn/northd/ovn-northd.c
> >> @@ -6531,6 +6531,12 @@ main(int argc, char *argv[])
> >>  ovsdb_idl_add_column(ovnsb_idl_loop.idl,
> &sbrec_chassis_col_nb_cfg);
> >>  ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
> >>
> >> +/* Ensure that only a single ovn-northd is active in the deployment
> >> by
> >> + * acquiring a lock called "ovn_northd" on the southbound database
> >> + * and then only performing DB transactions if the lock is held. */
> >> +ovsdb_idl_set_lock(ovnsb_idl_loop.idl, "ovn_northd");
> >> +bool had_lock = false;
> >> +
> >>  /* Main loop. */
> >>  exiting = false;
> >>  while (!exiting) {
> >> @@ -6541,15 +6547,29 @@ main(int argc, char *argv[])
> >>  .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
> >>  };
> >>
> >> -struct chassis_index chassis_index;
> >> -chassis_index_init(&chassis_index, ctx.ovnsb_idl);
> >> +if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
> >> +VLOG_INFO("ovn-northd lock acquired. "
> >> +  "This ovn-northd instance is now active.");
> >> +had_lock = true;
> >> +} else if (had_lock && !ovsdb_idl_has_lock(ovnsb_idl_loop.idl))
> {
> >> +VLOG_INFO("ovn-northd lock lost. "
> >> +  "This ovn-northd instance is now on standby.");
> >
> > Should it try lock again, if we want it to be standby? Otherwise, this
> > instance won't have a chance to be active any more.
>
> Good question ... I was assuming this scenario was due to a lost
> connection, and that the IDL would automatically try to re-acquire the
> lock.
>
> I tested to make sure I saw a second ovn-northd go from standby to
> active, but I have not tested active -> standby -> active again.
>
> I'll take a closer look at this before applying the patch.
>
>
I tested it and it works fine. active -> standby -> active scenario also
works fine.
I also tested by restarting southbound ovsdb-server. Once ovsdb-server is
up again, the idl clients  try to
acquire the lock and one of the ovn-northd instance becomes active again.
I don't think it is required to try lock again as idl client takes care of
it.

How about starting another instance of ovn-northd in the sandbox/test
environment so that active/standby
scenario gets tested for all the ovn tests ?

Acked-by: Numan Siddique 

Thanks
Numan

>
> >> +had_lock = false;
> >> +}
> >>
> >> -ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop);
> >> -ovnsb_db_run(&ctx, &ovnsb_idl_loop);
> >> -if (ctx.ovnsb_txn) {
> >> -check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
> >> -check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
> >> -check_and_update_rbac(&ctx);
> >> +struct chassis_index c

Re: [ovs-dev] [PATCH v4 1/2] OF support and translation of generic encap and decap

2017-08-01 Thread Yang, Yi Y

Maybe 
encap(nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210))
 is better, the below are possible formats, I think they are all ok, what do 
you think about them?


encap(nsh)
encap(nsh())
encap(nsh(md_type=1))
encap(nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210))

-Original Message-
From: Ben Pfaff [mailto:b...@ovn.org] 
Sent: Wednesday, August 2, 2017 7:17 AM
To: Yang, Yi Y 
Cc: d...@openvswitch.org; Jan Scheurich ; Zoltan 
Balogh 
Subject: Re: [PATCH v4 1/2] OF support and translation of generic encap and 
decap

encap(hdr=nsh,nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210))

On Tue, Aug 01, 2017 at 10:53:27PM +, Yang, Yi Y wrote:
> Ben, because we're considering to cover NSH md type 2 case, for NSH TLV, now 
> we provide it by the below way.
> 
> encap(hdr=nsh,prop(class=nsh,type=md_type,val=2),prop(class=nsh,type=t
> lv,val(0x1000,10,0x12345678)),prop(class=nsh,type=tlv,val(0x2000,20,0x
> fedcba9876543210)))
> 
> Can you help provide a format you prefer to handle this use case?
> 
> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org]
> Sent: Tuesday, August 1, 2017 11:56 PM
> To: Yang, Yi Y 
> Cc: d...@openvswitch.org; Jan Scheurich ; 
> Zoltan Balogh 
> Subject: Re: [PATCH v4 1/2] OF support and translation of generic 
> encap and decap
> 
> On Tue, Aug 01, 2017 at 08:06:59PM +0800, Yi Yang wrote:
> > From: Jan Scheurich 
> > 
> > This commit adds support for the OpenFlow actions generic encap and 
> > decap (as specified in ONF EXT-382) to the OVS control plane.
> > 
> > CLI syntax for encap action with properties:
> >   encap(hdr=)
> >   encap(hdr=,
> > prop(class=,type=,val=),
> > prop(class=,type=,val()))
> > 
> > CLI syntax for decap action:
> >   decap()
> >   decap(packet_type(ns=,type=))
> > 
> > The first header supported for encap and decap is "ethernet" to 
> > convert packets between packet_type (1,Ethertype) and (0,0).
> > 
> > This commit also implements a skeleton for the translation of 
> > generic encap and decap actions in ofproto-dpif and adds support to 
> > encap and decap an Ethernet header.
> > 
> > In general translation of encap commits pending actions and then 
> > rewrites struct flow in accordance with the new packet type and 
> > header. In the case of encap(ethernet) it suffices to change the 
> > packet type from (1, Ethertype) to (0,0) and set the dl_type 
> > accordingly. A new pending_encap flag in xlate ctx is set to mark 
> > that an corresponding datapath encap action must be triggered at the 
> > next commit. In the case of encap(ethernet) ofproto generetas a push_eth 
> > action.
> > 
> > The general case for translation of decap() is to emit a datapath 
> > action to decap the current outermost header and then recirculate 
> > the packet to reparse the inner headers. In the special case of an 
> > Ethernet packet,
> > decap() just changes the packet type from (0,0) to (1, dl_type) 
> > without a need to recirculate. The emission of the pop_eth action 
> > for the datapath is postponed to the next commit.
> > 
> > Hence encap(ethernet) and decap() on an Ethernet packet are OF 
> > octions that only incur a cost in the dataplane when a modifed 
> > packet is actually committed, e.g. because it is sent out. They can 
> > freely be used for normalizing the packet type in the OF pipeline 
> > without degrading performance.
> > 
> > Signed-off-by: Jan Scheurich 
> > Signed-off-by: Yi Yang 
> > Signed-off-by: Zoltan Balogh 
> > Co-authored-by: Zoltan Balogh 
> 
> Thanks for iterating so quickly!
> 
> Besides the syntax for properties, which I still think should be simplified 
> to e.g. nsh(md_type=1), I have only a few remaining comments.
> 
> I don't think there's any value in the n_props member in nx_action_encap.  
> (This is about the OpenFlow wire format now, not the internal format.)  The 
> properties are the whole tail of the structure and having a count doesn't 
> make anything easier.  Can you remove it?  It will allow us to drop 8 bytes 
> from the action structure due to padding.
> (In case it can't be removed, I'm providing a spelling fix.)
> 
> decode_ed_prop() still doesn't check the length properly, so I'm providing a 
> fix.
> 
> I'm appending my suggested incremental.
> 
> Thanks again!
> 
> --8<--cut here-->8--
> 
> diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index 
> d0437f20922a..c7d47eb5dd71 100644
> --- a/lib/ofp-actions.c
> +++ b/lib/ofp-actions.c
> @@ -4036,7 +4036,7 @@ struct nx_action_encap {
>  ovs_be16 subtype;  /* NXAST_ENCAP. */
>  ovs_be16 hdr_size; /* Header size in bytes, 0 = 'not specified'.*/
>  ovs_be32 new_pkt_type; /* Header type to add and PACKET_TYPE of result. 
> */
> -ovs_be16 n_props;  /* Number of the following endecap properties. */
> +ovs_be16 n_props;  /* Number of the following encap properties. */
>  uint8_

Re: [ovs-dev] RFC: publishing per-release roadmaps

2017-08-01 Thread Ben Pfaff

On Tue, Aug 01, 2017 at 07:53:18PM -0400, Russell Bryant wrote:
> On Tue, Aug 1, 2017 at 6:47 PM, Ben Pfaff  wrote:
> > Hello everyone.  Last week I spent an afternoon with the OVS-DPDK folks
> > at Intel in Shannon, Ireland.  One of the ideas that came out of it was
> > a proposal for the contributors at each company to publish, at the
> > beginning of each release cycle, a list of what features and other
> > contributions they were planning to target for the upcoming release.
> > This sounds useful to me as a way for each of the major contributors to
> > get an idea of what to expect and how to coordinate over a release
> > cycle.
> >
> > I believe that Intel is volunteering to kick this off in the 2.9 release
> > cycle for OVS.  I'm going to try to get us here at VMware to write a
> > list as well, and I encourage other contributors to give it a try as
> > well.
> >
> > If this works out then I'd consider adding a item for it to
> > release-process.rst in the tree.
> 
> Sure - the more communication the better.
> 
> My only concern is continuing to set clear expecations that the
> release cycle is time based.  It should be clear that the list is not
> a committed roadmap, but a living document communicating intent that
> is subject to change if features miss the release schedule.

Good point, I agree that we should be clear that we're sticking to a
time-based release plan.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] RFC: publishing per-release roadmaps

2017-08-01 Thread Russell Bryant

On Tue, Aug 1, 2017 at 6:47 PM, Ben Pfaff  wrote:
> Hello everyone.  Last week I spent an afternoon with the OVS-DPDK folks
> at Intel in Shannon, Ireland.  One of the ideas that came out of it was
> a proposal for the contributors at each company to publish, at the
> beginning of each release cycle, a list of what features and other
> contributions they were planning to target for the upcoming release.
> This sounds useful to me as a way for each of the major contributors to
> get an idea of what to expect and how to coordinate over a release
> cycle.
>
> I believe that Intel is volunteering to kick this off in the 2.9 release
> cycle for OVS.  I'm going to try to get us here at VMware to write a
> list as well, and I encourage other contributors to give it a try as
> well.
>
> If this works out then I'd consider adding a item for it to
> release-process.rst in the tree.

Sure - the more communication the better.

My only concern is continuing to set clear expecations that the
release cycle is time based.  It should be clear that the list is not
a committed roadmap, but a living document communicating intent that
is subject to change if features miss the release schedule.

-- 
Russell Bryant
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH V2] dpif-netlink: Fix log level for error message

2017-08-01 Thread Joe Stringer

On 1 August 2017 at 15:49, Eric Garver  wrote:
> On Tue, Aug 01, 2017 at 01:56:44PM -0700, Joe Stringer wrote:
>> On 1 August 2017 at 13:56, Joe Stringer  wrote:
>> > On 29 July 2017 at 21:58, Roi Dayan  wrote:
>> >> Since it's an error but also will always occur in older kernels
>> >> log the message with level warning instead of info.
>> >>
>> >> Signed-off-by: Roi Dayan 
>> >> ---
>> >
>> > When I run this patch on a fedora system with kernel 4.8 and
>> > out-of-tree modules, when running the "make check-kmod" tests, every
>> > vxlan test now fails because the last step of the test checks the logs
>> > for any log message with WARN or higher level, and this message shows
>> > up. Looking at the other platforms affected, it seems like this will
>> > always triggers on newer kernels in my test environment. Here's the
>> > log:
>> >
>> > 2017-08-01T20:43:59.387Z|00053|dpif_netlink|WARN|Failed to create
>> > at_vxlan0 with rtnetlink: Invalid argument
>> >
>> > I wonder if the new rtnetlink infrastructure is not always working for
>> > VXLAN, and this error message is just highlighting that? Eric, could
>> > this be related to VXLAN-GPE?
>>
>> Correction: The non-GPE VXLAN tests fail in this case.
>
> Yup, sorry. This broke with VXLAN-GPE, but we didn't notice because it
> was falling back to the compat interface. A good example of why this log
> needs to be higher. :)
>
> I posted a fix.

Indeed, I think that this shows that the log message should be a bit
higher level, and I tested this with your fix and no longer see such
messages.

I applied this patch to master, thanks folks.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] dpif-netlink-rtnl: Fix false errors on interfaces without tunnel config

2017-08-01 Thread Joe Stringer

On 1 August 2017 at 13:57, Joe Stringer  wrote:
> On 28 July 2017 at 05:43, Simon Horman  wrote:
>> On Thu, Jul 27, 2017 at 02:40:02PM +0300, Roi Dayan wrote:
>>> When we skip adding a port using rtnetlink and not because of an error we
>>> need to return EOPNOTSUPP to avoid logging an error message.
>>>
>>> Fixes: 2fd3d5eda508 ("dpif-netlink-rtnl: Support layer3 GRE")
>>> Signed-off-by: Roi Dayan 
>>> Reviewed-by: Paul Blakey 
>>
>> Thanks, this looks good to me. I would be happy to apply if someone
>> provided an Ack. I'd also be happy if someone else applied it with:
>>
>> Acked-by: Simon Horman 
>
> Acked-by: Joe Stringer 

I needed this to test Eric's recent VXLAN non-GPE fix, so I applied
this to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] dpif-netlink-rtnl: Fix VXLAN port create for regular VXLAN

2017-08-01 Thread Joe Stringer

On 1 August 2017 at 15:47, Eric Garver  wrote:
> When VXLAN-GPE was introduced we added IFLA_VXLAN_GPE to the policy
> parsing, but did not mark it as optional. The kernel only returns this
> netlink attribute if it's actually configured.
>
> This also adds a missing entry for IFLA_VXLAN_GBP. Apparently we have no
> system-traffic test coverage there.
>
> Fixes: c33c412fb139 ("dpif-netlink-rtnl: Support VXLAN-GPE")
> Fixes: 825e45e0109f ("dpif-netlink-rtnl: Support VXLAN creation")
> Reported-by: Joe Stringer 
> Signed-off-by: Eric Garver 

Thanks for the quick fix, I applied this to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] Salut

2017-08-01 Thread jeff Chris via dev

Bonjour cherJ'ai une proposition d'affaires qui sera avantageuse pour nous 
deux, et je vous indemniserai avec le pourcentage à la conclusion finale. Si 
vous êtes intéressé s'il vous plaît me répondre en arrière afin de vous fournir 
plus de détails sur la façon dont nous pouvons procéder plus loin. C'est 
confidentiel et légal.Concerne.
Jeff.

Hello dearI have a business Proposal that will be  benefit to both of us, and I 
shall  compensate you with percentage at the final conclusion. If you are 
interested please reply me back in order to provide you more detail on how we 
can proceed further .This is confidential and legal .Regards.
Jeff. 





Name: Jeff Chris.Email:jeff.ch...@linuxmail.org
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v4 1/2] OF support and translation of generic encap and decap

2017-08-01 Thread Ben Pfaff

encap(hdr=nsh,nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210))

On Tue, Aug 01, 2017 at 10:53:27PM +, Yang, Yi Y wrote:
> Ben, because we're considering to cover NSH md type 2 case, for NSH TLV, now 
> we provide it by the below way.
> 
> encap(hdr=nsh,prop(class=nsh,type=md_type,val=2),prop(class=nsh,type=tlv,val(0x1000,10,0x12345678)),prop(class=nsh,type=tlv,val(0x2000,20,0xfedcba9876543210)))
> 
> Can you help provide a format you prefer to handle this use case?
> 
> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org] 
> Sent: Tuesday, August 1, 2017 11:56 PM
> To: Yang, Yi Y 
> Cc: d...@openvswitch.org; Jan Scheurich ; Zoltan 
> Balogh 
> Subject: Re: [PATCH v4 1/2] OF support and translation of generic encap and 
> decap
> 
> On Tue, Aug 01, 2017 at 08:06:59PM +0800, Yi Yang wrote:
> > From: Jan Scheurich 
> > 
> > This commit adds support for the OpenFlow actions generic encap and 
> > decap (as specified in ONF EXT-382) to the OVS control plane.
> > 
> > CLI syntax for encap action with properties:
> >   encap(hdr=)
> >   encap(hdr=,
> > prop(class=,type=,val=),
> > prop(class=,type=,val()))
> > 
> > CLI syntax for decap action:
> >   decap()
> >   decap(packet_type(ns=,type=))
> > 
> > The first header supported for encap and decap is "ethernet" to 
> > convert packets between packet_type (1,Ethertype) and (0,0).
> > 
> > This commit also implements a skeleton for the translation of generic 
> > encap and decap actions in ofproto-dpif and adds support to encap and 
> > decap an Ethernet header.
> > 
> > In general translation of encap commits pending actions and then 
> > rewrites struct flow in accordance with the new packet type and 
> > header. In the case of encap(ethernet) it suffices to change the 
> > packet type from (1, Ethertype) to (0,0) and set the dl_type 
> > accordingly. A new pending_encap flag in xlate ctx is set to mark that 
> > an corresponding datapath encap action must be triggered at the next 
> > commit. In the case of encap(ethernet) ofproto generetas a push_eth action.
> > 
> > The general case for translation of decap() is to emit a datapath 
> > action to decap the current outermost header and then recirculate the 
> > packet to reparse the inner headers. In the special case of an 
> > Ethernet packet,
> > decap() just changes the packet type from (0,0) to (1, dl_type) 
> > without a need to recirculate. The emission of the pop_eth action for 
> > the datapath is postponed to the next commit.
> > 
> > Hence encap(ethernet) and decap() on an Ethernet packet are OF octions 
> > that only incur a cost in the dataplane when a modifed packet is 
> > actually committed, e.g. because it is sent out. They can freely be 
> > used for normalizing the packet type in the OF pipeline without 
> > degrading performance.
> > 
> > Signed-off-by: Jan Scheurich 
> > Signed-off-by: Yi Yang 
> > Signed-off-by: Zoltan Balogh 
> > Co-authored-by: Zoltan Balogh 
> 
> Thanks for iterating so quickly!
> 
> Besides the syntax for properties, which I still think should be simplified 
> to e.g. nsh(md_type=1), I have only a few remaining comments.
> 
> I don't think there's any value in the n_props member in nx_action_encap.  
> (This is about the OpenFlow wire format now, not the internal format.)  The 
> properties are the whole tail of the structure and having a count doesn't 
> make anything easier.  Can you remove it?  It will allow us to drop 8 bytes 
> from the action structure due to padding.
> (In case it can't be removed, I'm providing a spelling fix.)
> 
> decode_ed_prop() still doesn't check the length properly, so I'm providing a 
> fix.
> 
> I'm appending my suggested incremental.
> 
> Thanks again!
> 
> --8<--cut here-->8--
> 
> diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index 
> d0437f20922a..c7d47eb5dd71 100644
> --- a/lib/ofp-actions.c
> +++ b/lib/ofp-actions.c
> @@ -4036,7 +4036,7 @@ struct nx_action_encap {
>  ovs_be16 subtype;  /* NXAST_ENCAP. */
>  ovs_be16 hdr_size; /* Header size in bytes, 0 = 'not specified'.*/
>  ovs_be32 new_pkt_type; /* Header type to add and PACKET_TYPE of result. 
> */
> -ovs_be16 n_props;  /* Number of the following endecap properties. */
> +ovs_be16 n_props;  /* Number of the following encap properties. */
>  uint8_t pad[6];/* Align to 8 bytes boundary */
>  struct ofp_ed_prop_header props[];  /* Encap TLV properties. */  }; @@ 
> -4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap *nad,
> enum ofp_version ofp_version OVS_UNUSED,
> struct ofpbuf *ofpacts)  {
> +if (ntohs(nad->len) > sizeof *nad) {
> +/* No properties supported yet. */
> +return OFPERR_OFPBPC_BAD_TYPE;
> +}
> +
>  struct ofpact_decap * decap;
>  
>  decap = ofpact_put_DECAP(ofpacts);
> diff --git a/lib/ofp-ed-props.c b/l

Re: [ovs-dev] [PATCH v4 1/2] OF support and translation of generic encap and decap

2017-08-01 Thread Ben Pfaff

An action has an embedded length.  For nx_action_encap, if the embedded
length ->len is longer than sizeof(struct nx_action_encap), then
properties follow struct nx_action_encap until the length has been
exhausted.

On Tue, Aug 01, 2017 at 11:06:21PM +, Yang, Yi Y wrote:
> About why we need n_props in nx_action_encap, I added this for Opendaylight 
> to deserialize the wire format from OVS, n_props can clearly tell 
> Opendaylight if there is any property. Otherwise how do we check if there is 
> a property following?
> 
> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org] 
> Sent: Tuesday, August 1, 2017 11:56 PM
> To: Yang, Yi Y 
> Cc: d...@openvswitch.org; Jan Scheurich ; Zoltan 
> Balogh 
> Subject: Re: [PATCH v4 1/2] OF support and translation of generic encap and 
> decap
> 
> On Tue, Aug 01, 2017 at 08:06:59PM +0800, Yi Yang wrote:
> > From: Jan Scheurich 
> > 
> > This commit adds support for the OpenFlow actions generic encap and 
> > decap (as specified in ONF EXT-382) to the OVS control plane.
> > 
> > CLI syntax for encap action with properties:
> >   encap(hdr=)
> >   encap(hdr=,
> > prop(class=,type=,val=),
> > prop(class=,type=,val()))
> > 
> > CLI syntax for decap action:
> >   decap()
> >   decap(packet_type(ns=,type=))
> > 
> > The first header supported for encap and decap is "ethernet" to 
> > convert packets between packet_type (1,Ethertype) and (0,0).
> > 
> > This commit also implements a skeleton for the translation of generic 
> > encap and decap actions in ofproto-dpif and adds support to encap and 
> > decap an Ethernet header.
> > 
> > In general translation of encap commits pending actions and then 
> > rewrites struct flow in accordance with the new packet type and 
> > header. In the case of encap(ethernet) it suffices to change the 
> > packet type from (1, Ethertype) to (0,0) and set the dl_type 
> > accordingly. A new pending_encap flag in xlate ctx is set to mark that 
> > an corresponding datapath encap action must be triggered at the next 
> > commit. In the case of encap(ethernet) ofproto generetas a push_eth action.
> > 
> > The general case for translation of decap() is to emit a datapath 
> > action to decap the current outermost header and then recirculate the 
> > packet to reparse the inner headers. In the special case of an 
> > Ethernet packet,
> > decap() just changes the packet type from (0,0) to (1, dl_type) 
> > without a need to recirculate. The emission of the pop_eth action for 
> > the datapath is postponed to the next commit.
> > 
> > Hence encap(ethernet) and decap() on an Ethernet packet are OF octions 
> > that only incur a cost in the dataplane when a modifed packet is 
> > actually committed, e.g. because it is sent out. They can freely be 
> > used for normalizing the packet type in the OF pipeline without 
> > degrading performance.
> > 
> > Signed-off-by: Jan Scheurich 
> > Signed-off-by: Yi Yang 
> > Signed-off-by: Zoltan Balogh 
> > Co-authored-by: Zoltan Balogh 
> 
> Thanks for iterating so quickly!
> 
> Besides the syntax for properties, which I still think should be simplified 
> to e.g. nsh(md_type=1), I have only a few remaining comments.
> 
> I don't think there's any value in the n_props member in nx_action_encap.  
> (This is about the OpenFlow wire format now, not the internal format.)  The 
> properties are the whole tail of the structure and having a count doesn't 
> make anything easier.  Can you remove it?  It will allow us to drop 8 bytes 
> from the action structure due to padding.
> (In case it can't be removed, I'm providing a spelling fix.)
> 
> decode_ed_prop() still doesn't check the length properly, so I'm providing a 
> fix.
> 
> I'm appending my suggested incremental.
> 
> Thanks again!
> 
> --8<--cut here-->8--
> 
> diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index 
> d0437f20922a..c7d47eb5dd71 100644
> --- a/lib/ofp-actions.c
> +++ b/lib/ofp-actions.c
> @@ -4036,7 +4036,7 @@ struct nx_action_encap {
>  ovs_be16 subtype;  /* NXAST_ENCAP. */
>  ovs_be16 hdr_size; /* Header size in bytes, 0 = 'not specified'.*/
>  ovs_be32 new_pkt_type; /* Header type to add and PACKET_TYPE of result. 
> */
> -ovs_be16 n_props;  /* Number of the following endecap properties. */
> +ovs_be16 n_props;  /* Number of the following encap properties. */
>  uint8_t pad[6];/* Align to 8 bytes boundary */
>  struct ofp_ed_prop_header props[];  /* Encap TLV properties. */  }; @@ 
> -4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap *nad,
> enum ofp_version ofp_version OVS_UNUSED,
> struct ofpbuf *ofpacts)  {
> +if (ntohs(nad->len) > sizeof *nad) {
> +/* No properties supported yet. */
> +return OFPERR_OFPBPC_BAD_TYPE;
> +}
> +
>  struct ofpact_decap * decap;
>  
>  decap = ofpact_put_DECAP(ofpacts);
> diff --

[ovs-dev] Pólizas del mes de Agosto

2017-08-01 Thread Recursos Humanos

Le recordamos que nuestras Pólizas de Capacitación son una excelente opción 
para capacitarse desde su oficina, sin gastos de traslado o viáticos y con 
expertos de primer nivel que le darán a sus colaboradores la actualización que 
necesitan para ejercer sus puestos de la manera más competitiva.  

Pólizas de Capacitación disponibles: 

• Las relaciones humanas y el éxito en la empresa
• Servicio al cliente estilo Disney
• Auditoría y Control Interno
• Compras
• Ventas
• Recursos Humanos
• Contabilidad y Finanzas

Para mayores detalles responda este correo con la póliza de su Interés y llene 
sus datos en la parte de abajo.
Nombre:   
Teléfono: 
Correo:

Y le enviaremos la información completa.

centro telefónico: 018002129393
 


Coordinador de Evento


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v4 1/2] OF support and translation of generic encap and decap

2017-08-01 Thread Yang, Yi Y

About why we need n_props in nx_action_encap, I added this for Opendaylight to 
deserialize the wire format from OVS, n_props can clearly tell Opendaylight if 
there is any property. Otherwise how do we check if there is a property 
following?

-Original Message-
From: Ben Pfaff [mailto:b...@ovn.org] 
Sent: Tuesday, August 1, 2017 11:56 PM
To: Yang, Yi Y 
Cc: d...@openvswitch.org; Jan Scheurich ; Zoltan 
Balogh 
Subject: Re: [PATCH v4 1/2] OF support and translation of generic encap and 
decap

On Tue, Aug 01, 2017 at 08:06:59PM +0800, Yi Yang wrote:
> From: Jan Scheurich 
> 
> This commit adds support for the OpenFlow actions generic encap and 
> decap (as specified in ONF EXT-382) to the OVS control plane.
> 
> CLI syntax for encap action with properties:
>   encap(hdr=)
>   encap(hdr=,
> prop(class=,type=,val=),
> prop(class=,type=,val()))
> 
> CLI syntax for decap action:
>   decap()
>   decap(packet_type(ns=,type=))
> 
> The first header supported for encap and decap is "ethernet" to 
> convert packets between packet_type (1,Ethertype) and (0,0).
> 
> This commit also implements a skeleton for the translation of generic 
> encap and decap actions in ofproto-dpif and adds support to encap and 
> decap an Ethernet header.
> 
> In general translation of encap commits pending actions and then 
> rewrites struct flow in accordance with the new packet type and 
> header. In the case of encap(ethernet) it suffices to change the 
> packet type from (1, Ethertype) to (0,0) and set the dl_type 
> accordingly. A new pending_encap flag in xlate ctx is set to mark that 
> an corresponding datapath encap action must be triggered at the next 
> commit. In the case of encap(ethernet) ofproto generetas a push_eth action.
> 
> The general case for translation of decap() is to emit a datapath 
> action to decap the current outermost header and then recirculate the 
> packet to reparse the inner headers. In the special case of an 
> Ethernet packet,
> decap() just changes the packet type from (0,0) to (1, dl_type) 
> without a need to recirculate. The emission of the pop_eth action for 
> the datapath is postponed to the next commit.
> 
> Hence encap(ethernet) and decap() on an Ethernet packet are OF octions 
> that only incur a cost in the dataplane when a modifed packet is 
> actually committed, e.g. because it is sent out. They can freely be 
> used for normalizing the packet type in the OF pipeline without 
> degrading performance.
> 
> Signed-off-by: Jan Scheurich 
> Signed-off-by: Yi Yang 
> Signed-off-by: Zoltan Balogh 
> Co-authored-by: Zoltan Balogh 

Thanks for iterating so quickly!

Besides the syntax for properties, which I still think should be simplified to 
e.g. nsh(md_type=1), I have only a few remaining comments.

I don't think there's any value in the n_props member in nx_action_encap.  
(This is about the OpenFlow wire format now, not the internal format.)  The 
properties are the whole tail of the structure and having a count doesn't make 
anything easier.  Can you remove it?  It will allow us to drop 8 bytes from the 
action structure due to padding.
(In case it can't be removed, I'm providing a spelling fix.)

decode_ed_prop() still doesn't check the length properly, so I'm providing a 
fix.

I'm appending my suggested incremental.

Thanks again!

--8<--cut here-->8--

diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index 
d0437f20922a..c7d47eb5dd71 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -4036,7 +4036,7 @@ struct nx_action_encap {
 ovs_be16 subtype;  /* NXAST_ENCAP. */
 ovs_be16 hdr_size; /* Header size in bytes, 0 = 'not specified'.*/
 ovs_be32 new_pkt_type; /* Header type to add and PACKET_TYPE of result. */
-ovs_be16 n_props;  /* Number of the following endecap properties. */
+ovs_be16 n_props;  /* Number of the following encap properties. */
 uint8_t pad[6];/* Align to 8 bytes boundary */
 struct ofp_ed_prop_header props[];  /* Encap TLV properties. */  }; @@ 
-4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap *nad,
enum ofp_version ofp_version OVS_UNUSED,
struct ofpbuf *ofpacts)  {
+if (ntohs(nad->len) > sizeof *nad) {
+/* No properties supported yet. */
+return OFPERR_OFPBPC_BAD_TYPE;
+}
+
 struct ofpact_decap * decap;
 
 decap = ofpact_put_DECAP(ofpacts);
diff --git a/lib/ofp-ed-props.c b/lib/ofp-ed-props.c index 
260adc30acf0..6bba3ac7d982 100644
--- a/lib/ofp-ed-props.c
+++ b/lib/ofp-ed-props.c
@@ -32,6 +32,9 @@ decode_ed_prop(const struct ofp_ed_prop_header **ofp_prop,
 uint16_t prop_class = ntohs((*ofp_prop)->prop_class);
 size_t len = (*ofp_prop)->len;
 size_t pad_len = ROUND_UP(len, 8);
+if (pad_len > *remaining) {
+return OFPERR_OFPBAC_BAD_LEN;
+}
 
 switch (prop_class) {
 default:
__

Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic encap and decap

2017-08-01 Thread Yang, Yi Y

Got it, thanks a lot.

-Original Message-
From: Ben Pfaff [mailto:b...@ovn.org] 
Sent: Wednesday, August 2, 2017 6:51 AM
To: Yang, Yi Y 
Cc: Zoltán Balogh ; d...@openvswitch.org
Subject: Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic 
encap and decap

I don't understand your explanation.  decode_NXAST_RAW_DECAP() converts an 
OpenFlow action in wire format into OVS internal format.  The OpenFlow action 
that it contains might have been generated by anything (for example, a 
controller).  parse_ENCAP() doesn't have anything to do with it as far as I can 
tell.

On Tue, Aug 01, 2017 at 10:45:50PM +, Yang, Yi Y wrote:
> Ben, thank you for your patch, but from my understanding, parse_ENCAP has 
> ensured it is impossible to have any property for decap, so I'm not sure in 
> what case this will happened.
> 
> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org]
> Sent: Tuesday, August 1, 2017 11:26 PM
> To: Yang, Yi Y 
> Cc: Zoltán Balogh ; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of 
> generic encap and decap
> 
> On Tue, Aug 01, 2017 at 12:32:20PM +, Yang, Yi Y wrote:
> > #2.
> > [Ben] I suspect that decode_NXAST_RAW_DECAP() should report an error if 
> > properties are present, since it doesn't support properties.
> > 
> > [Yi] It is impossible. 
> 
> What is impossible?  It is easy to detect that properties are present and 
> report an error:
> 
> diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index 
> d0437f20922a..7be302a4005d 100644
> --- a/lib/ofp-actions.c
> +++ b/lib/ofp-actions.c
> @@ -4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap 
> *nad,
> enum ofp_version ofp_version OVS_UNUSED,
> struct ofpbuf *ofpacts)  {
> +if (ntohs(nad->len) > sizeof *nad) {
> +/* No properties supported yet. */
> +return OFPERR_OFPBPC_BAD_TYPE;
> +}
> +
>  struct ofpact_decap * decap;
>  
>  decap = ofpact_put_DECAP(ofpacts);
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v4 1/2] OF support and translation of generic encap and decap

2017-08-01 Thread Yang, Yi Y

Ben, because we're considering to cover NSH md type 2 case, for NSH TLV, now we 
provide it by the below way.

encap(hdr=nsh,prop(class=nsh,type=md_type,val=2),prop(class=nsh,type=tlv,val(0x1000,10,0x12345678)),prop(class=nsh,type=tlv,val(0x2000,20,0xfedcba9876543210)))

Can you help provide a format you prefer to handle this use case?

-Original Message-
From: Ben Pfaff [mailto:b...@ovn.org] 
Sent: Tuesday, August 1, 2017 11:56 PM
To: Yang, Yi Y 
Cc: d...@openvswitch.org; Jan Scheurich ; Zoltan 
Balogh 
Subject: Re: [PATCH v4 1/2] OF support and translation of generic encap and 
decap

On Tue, Aug 01, 2017 at 08:06:59PM +0800, Yi Yang wrote:
> From: Jan Scheurich 
> 
> This commit adds support for the OpenFlow actions generic encap and 
> decap (as specified in ONF EXT-382) to the OVS control plane.
> 
> CLI syntax for encap action with properties:
>   encap(hdr=)
>   encap(hdr=,
> prop(class=,type=,val=),
> prop(class=,type=,val()))
> 
> CLI syntax for decap action:
>   decap()
>   decap(packet_type(ns=,type=))
> 
> The first header supported for encap and decap is "ethernet" to 
> convert packets between packet_type (1,Ethertype) and (0,0).
> 
> This commit also implements a skeleton for the translation of generic 
> encap and decap actions in ofproto-dpif and adds support to encap and 
> decap an Ethernet header.
> 
> In general translation of encap commits pending actions and then 
> rewrites struct flow in accordance with the new packet type and 
> header. In the case of encap(ethernet) it suffices to change the 
> packet type from (1, Ethertype) to (0,0) and set the dl_type 
> accordingly. A new pending_encap flag in xlate ctx is set to mark that 
> an corresponding datapath encap action must be triggered at the next 
> commit. In the case of encap(ethernet) ofproto generetas a push_eth action.
> 
> The general case for translation of decap() is to emit a datapath 
> action to decap the current outermost header and then recirculate the 
> packet to reparse the inner headers. In the special case of an 
> Ethernet packet,
> decap() just changes the packet type from (0,0) to (1, dl_type) 
> without a need to recirculate. The emission of the pop_eth action for 
> the datapath is postponed to the next commit.
> 
> Hence encap(ethernet) and decap() on an Ethernet packet are OF octions 
> that only incur a cost in the dataplane when a modifed packet is 
> actually committed, e.g. because it is sent out. They can freely be 
> used for normalizing the packet type in the OF pipeline without 
> degrading performance.
> 
> Signed-off-by: Jan Scheurich 
> Signed-off-by: Yi Yang 
> Signed-off-by: Zoltan Balogh 
> Co-authored-by: Zoltan Balogh 

Thanks for iterating so quickly!

Besides the syntax for properties, which I still think should be simplified to 
e.g. nsh(md_type=1), I have only a few remaining comments.

I don't think there's any value in the n_props member in nx_action_encap.  
(This is about the OpenFlow wire format now, not the internal format.)  The 
properties are the whole tail of the structure and having a count doesn't make 
anything easier.  Can you remove it?  It will allow us to drop 8 bytes from the 
action structure due to padding.
(In case it can't be removed, I'm providing a spelling fix.)

decode_ed_prop() still doesn't check the length properly, so I'm providing a 
fix.

I'm appending my suggested incremental.

Thanks again!

--8<--cut here-->8--

diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index 
d0437f20922a..c7d47eb5dd71 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -4036,7 +4036,7 @@ struct nx_action_encap {
 ovs_be16 subtype;  /* NXAST_ENCAP. */
 ovs_be16 hdr_size; /* Header size in bytes, 0 = 'not specified'.*/
 ovs_be32 new_pkt_type; /* Header type to add and PACKET_TYPE of result. */
-ovs_be16 n_props;  /* Number of the following endecap properties. */
+ovs_be16 n_props;  /* Number of the following encap properties. */
 uint8_t pad[6];/* Align to 8 bytes boundary */
 struct ofp_ed_prop_header props[];  /* Encap TLV properties. */  }; @@ 
-4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap *nad,
enum ofp_version ofp_version OVS_UNUSED,
struct ofpbuf *ofpacts)  {
+if (ntohs(nad->len) > sizeof *nad) {
+/* No properties supported yet. */
+return OFPERR_OFPBPC_BAD_TYPE;
+}
+
 struct ofpact_decap * decap;
 
 decap = ofpact_put_DECAP(ofpacts);
diff --git a/lib/ofp-ed-props.c b/lib/ofp-ed-props.c index 
260adc30acf0..6bba3ac7d982 100644
--- a/lib/ofp-ed-props.c
+++ b/lib/ofp-ed-props.c
@@ -32,6 +32,9 @@ decode_ed_prop(const struct ofp_ed_prop_header **ofp_prop,
 uint16_t prop_class = ntohs((*ofp_prop)->prop_class);
 size_t len = (*ofp_prop)->len;
 size_t pad_len = ROUND_UP(len, 8);
+if (pad_len > *remaining) {
+return OFPER

Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic encap and decap

2017-08-01 Thread Ben Pfaff

I don't understand your explanation.  decode_NXAST_RAW_DECAP() converts
an OpenFlow action in wire format into OVS internal format.  The
OpenFlow action that it contains might have been generated by anything
(for example, a controller).  parse_ENCAP() doesn't have anything to do
with it as far as I can tell.

On Tue, Aug 01, 2017 at 10:45:50PM +, Yang, Yi Y wrote:
> Ben, thank you for your patch, but from my understanding, parse_ENCAP has 
> ensured it is impossible to have any property for decap, so I'm not sure in 
> what case this will happened.
> 
> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org] 
> Sent: Tuesday, August 1, 2017 11:26 PM
> To: Yang, Yi Y 
> Cc: Zoltán Balogh ; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic 
> encap and decap
> 
> On Tue, Aug 01, 2017 at 12:32:20PM +, Yang, Yi Y wrote:
> > #2.
> > [Ben] I suspect that decode_NXAST_RAW_DECAP() should report an error if 
> > properties are present, since it doesn't support properties.
> > 
> > [Yi] It is impossible. 
> 
> What is impossible?  It is easy to detect that properties are present and 
> report an error:
> 
> diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index 
> d0437f20922a..7be302a4005d 100644
> --- a/lib/ofp-actions.c
> +++ b/lib/ofp-actions.c
> @@ -4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap 
> *nad,
> enum ofp_version ofp_version OVS_UNUSED,
> struct ofpbuf *ofpacts)  {
> +if (ntohs(nad->len) > sizeof *nad) {
> +/* No properties supported yet. */
> +return OFPERR_OFPBPC_BAD_TYPE;
> +}
> +
>  struct ofpact_decap * decap;
>  
>  decap = ofpact_put_DECAP(ofpacts);
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH V2] dpif-netlink: Fix log level for error message

2017-08-01 Thread Eric Garver

On Tue, Aug 01, 2017 at 01:56:44PM -0700, Joe Stringer wrote:
> On 1 August 2017 at 13:56, Joe Stringer  wrote:
> > On 29 July 2017 at 21:58, Roi Dayan  wrote:
> >> Since it's an error but also will always occur in older kernels
> >> log the message with level warning instead of info.
> >>
> >> Signed-off-by: Roi Dayan 
> >> ---
> >
> > When I run this patch on a fedora system with kernel 4.8 and
> > out-of-tree modules, when running the "make check-kmod" tests, every
> > vxlan test now fails because the last step of the test checks the logs
> > for any log message with WARN or higher level, and this message shows
> > up. Looking at the other platforms affected, it seems like this will
> > always triggers on newer kernels in my test environment. Here's the
> > log:
> >
> > 2017-08-01T20:43:59.387Z|00053|dpif_netlink|WARN|Failed to create
> > at_vxlan0 with rtnetlink: Invalid argument
> >
> > I wonder if the new rtnetlink infrastructure is not always working for
> > VXLAN, and this error message is just highlighting that? Eric, could
> > this be related to VXLAN-GPE?
> 
> Correction: The non-GPE VXLAN tests fail in this case.

Yup, sorry. This broke with VXLAN-GPE, but we didn't notice because it
was falling back to the compat interface. A good example of why this log
needs to be higher. :)

I posted a fix.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] RFC: publishing per-release roadmaps

2017-08-01 Thread Ben Pfaff

Hello everyone.  Last week I spent an afternoon with the OVS-DPDK folks
at Intel in Shannon, Ireland.  One of the ideas that came out of it was
a proposal for the contributors at each company to publish, at the
beginning of each release cycle, a list of what features and other
contributions they were planning to target for the upcoming release.
This sounds useful to me as a way for each of the major contributors to
get an idea of what to expect and how to coordinate over a release
cycle.

I believe that Intel is volunteering to kick this off in the 2.9 release
cycle for OVS.  I'm going to try to get us here at VMware to write a
list as well, and I encourage other contributors to give it a try as
well.

If this works out then I'd consider adding a item for it to
release-process.rst in the tree.

Ben
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] dpif-netlink-rtnl: Fix VXLAN port create for regular VXLAN

2017-08-01 Thread Eric Garver

When VXLAN-GPE was introduced we added IFLA_VXLAN_GPE to the policy
parsing, but did not mark it as optional. The kernel only returns this
netlink attribute if it's actually configured.

This also adds a missing entry for IFLA_VXLAN_GBP. Apparently we have no
system-traffic test coverage there.

Fixes: c33c412fb139 ("dpif-netlink-rtnl: Support VXLAN-GPE")
Fixes: 825e45e0109f ("dpif-netlink-rtnl: Support VXLAN creation")
Reported-by: Joe Stringer 
Signed-off-by: Eric Garver 
---
 lib/dpif-netlink-rtnl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c
index 98a2f2bddae2..83f51ac3f3b3 100644
--- a/lib/dpif-netlink-rtnl.c
+++ b/lib/dpif-netlink-rtnl.c
@@ -70,7 +70,8 @@ static const struct nl_policy vxlan_policy[] = {
 [IFLA_VXLAN_LEARNING] = { .type = NL_A_U8 },
 [IFLA_VXLAN_UDP_ZERO_CSUM6_RX] = { .type = NL_A_U8 },
 [IFLA_VXLAN_PORT] = { .type = NL_A_U16 },
-[IFLA_VXLAN_GPE] = { .type = NL_A_FLAG },
+[IFLA_VXLAN_GBP] = { .type = NL_A_FLAG, .optional = true },
+[IFLA_VXLAN_GPE] = { .type = NL_A_FLAG, .optional = true },
 };
 static const struct nl_policy gre_policy[] = {
 [IFLA_GRE_COLLECT_METADATA] = { .type = NL_A_FLAG },
-- 
2.12.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic encap and decap

2017-08-01 Thread Yang, Yi Y

Ben, thank you for your patch, but from my understanding, parse_ENCAP has 
ensured it is impossible to have any property for decap, so I'm not sure in 
what case this will happened.

-Original Message-
From: Ben Pfaff [mailto:b...@ovn.org] 
Sent: Tuesday, August 1, 2017 11:26 PM
To: Yang, Yi Y 
Cc: Zoltán Balogh ; d...@openvswitch.org
Subject: Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic 
encap and decap

On Tue, Aug 01, 2017 at 12:32:20PM +, Yang, Yi Y wrote:
> #2.
> [Ben] I suspect that decode_NXAST_RAW_DECAP() should report an error if 
> properties are present, since it doesn't support properties.
> 
> [Yi] It is impossible. 

What is impossible?  It is easy to detect that properties are present and 
report an error:

diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index 
d0437f20922a..7be302a4005d 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap *nad,
enum ofp_version ofp_version OVS_UNUSED,
struct ofpbuf *ofpacts)  {
+if (ntohs(nad->len) > sizeof *nad) {
+/* No properties supported yet. */
+return OFPERR_OFPBPC_BAD_TYPE;
+}
+
 struct ofpact_decap * decap;
 
 decap = ofpact_put_DECAP(ofpacts);
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2] datapath-windows: Refactor OvsCreateNewNBLsFromMultipleNBs

2017-08-01 Thread Shashank Ram

Guru, could you please apply this patch.

Thanks,
Shashank


From: Anand Kumar
Sent: Thursday, July 27, 2017 4:48:02 PM
To: Shashank Ram; d...@openvswitch.org
Subject: Re: [ovs-dev] [PATCH v2] datapath-windows: Refactor 
OvsCreateNewNBLsFromMultipleNBs

Thanks for refactoring it.

Acked-by: Anand Kumar 

Thanks,
Anand Kumar

On 7/24/17, 3:31 PM, "ovs-dev-boun...@openvswitch.org on behalf of Shashank 
Ram"  wrote:

Previously, the function would take the curNbl and nextNbl
as inputs, and modify the linked list, and merge the input
linked list with the newly generated newNbl list.

This is confusing for the caller, and the function has
unnecessary logic for merging linked lists that instead
the caller should take care of. This is because the
OvsCreateNewNBLsFromMultipleNBs() is a generic API
that can be used by other functions as well, and its
natural for different callers to have different needs.

This patch refactors the behavior of OvsCreateNewNBLsFromMultipleNBs
to take in the curNbl and lastNbl, and it returns
a linked list of NBLs and sets the HEAD and TAIL of the
new list obtained from the curNbl. If the caller wants
to chain a new linked list at the HEAD or TAIL, it
can make use of the curNbl and lastNbl to do so.

Signed-off-by: Shashank Ram 
---
 datapath-windows/ovsext/PacketIO.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/datapath-windows/ovsext/PacketIO.c 
b/datapath-windows/ovsext/PacketIO.c
index a90b556..81c574e 100644
--- a/datapath-windows/ovsext/PacketIO.c
+++ b/datapath-windows/ovsext/PacketIO.c
@@ -49,7 +49,7 @@ static VOID OvsCompleteNBLIngress(POVS_SWITCH_CONTEXT 
switchContext,
 static NTSTATUS OvsCreateNewNBLsFromMultipleNBs(
 POVS_SWITCH_CONTEXT switchContext,
 PNET_BUFFER_LIST *curNbl,
-PNET_BUFFER_LIST *nextNbl);
+PNET_BUFFER_LIST *lastNbl);

 VOID
 OvsInitCompletionList(OvsCompletionList *completionList,
@@ -212,7 +212,7 @@ OvsStartNBLIngress(POVS_SWITCH_CONTEXT switchContext,
 NDIS_SWITCH_PORT_ID sourcePort = 0;
 NDIS_SWITCH_NIC_INDEX sourceIndex = 0;
 PNDIS_SWITCH_FORWARDING_DETAIL_NET_BUFFER_LIST_INFO fwdDetail;
-PNET_BUFFER_LIST curNbl = NULL, nextNbl = NULL;
+PNET_BUFFER_LIST curNbl = NULL, nextNbl = NULL, lastNbl = NULL;
 ULONG sendCompleteFlags;
 UCHAR dispatch;
 LOCK_STATE_EX lockState, dpLockState;
@@ -282,7 +282,7 @@ OvsStartNBLIngress(POVS_SWITCH_CONTEXT switchContext,
 /* Create a NET_BUFFER_LIST for each NET_BUFFER. */
 status = OvsCreateNewNBLsFromMultipleNBs(switchContext,
  &curNbl,
- &nextNbl);
+ &lastNbl);
 if (!NT_SUCCESS(status)) {
 RtlInitUnicodeString(&filterReason,
  L"Cannot allocate NBLs with single 
NB.");
@@ -292,6 +292,10 @@ OvsStartNBLIngress(POVS_SWITCH_CONTEXT switchContext,
 NDIS_STATUS_RESOURCES);
 continue;
 }
+
+lastNbl->Next = nextNbl;
+nextNbl = curNbl->Next;
+curNbl->Next = NULL;
 }
 {
 OvsFlow *flow;
@@ -500,11 +504,10 @@ OvsExtCancelSendNBL(NDIS_HANDLE filterModuleContext,
 static NTSTATUS
 OvsCreateNewNBLsFromMultipleNBs(POVS_SWITCH_CONTEXT switchContext,
 PNET_BUFFER_LIST *curNbl,
-PNET_BUFFER_LIST *nextNbl)
+PNET_BUFFER_LIST *lastNbl)
 {
 NTSTATUS status = STATUS_SUCCESS;
 PNET_BUFFER_LIST newNbls = NULL;
-PNET_BUFFER_LIST lastNbl = NULL;
 PNET_BUFFER_LIST nbl = NULL;
 BOOLEAN error = TRUE;

@@ -520,16 +523,15 @@ OvsCreateNewNBLsFromMultipleNBs(POVS_SWITCH_CONTEXT 
switchContext,

 nbl = newNbls;
 while (nbl) {
-lastNbl = nbl;
+*lastNbl = nbl;
 nbl = NET_BUFFER_LIST_NEXT_NBL(nbl);
 }
-lastNbl->Next = *nextNbl;
-*nextNbl = newNbls->Next;
+
+(*curNbl)->Next = NULL;

 OvsCompleteNBL(switchContext, *curNbl, TRUE);

 *curNbl = newNbls;
-(*curNbl)->Next = NULL;

 error = FALSE;
 } while (error);
--
2.9.3.windows.2

___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman

Re: [ovs-dev] [PATCH] datapath-windows: Fix conntrack lookups for reversed keys

2017-08-01 Thread Shashank Ram





From: ovs-dev-boun...@openvswitch.org  on 
behalf of Anand Kumar 
Sent: Tuesday, August 1, 2017 3:01 PM
To: d...@openvswitch.org
Subject: [ovs-dev] [PATCH] datapath-windows: Fix conntrack lookups for  
reversed keys

From: Sairam Venugopal 

The conntrack table needs to be queried for entries in either directions
to determine if the packet is in forward direction or reply direction.

The current behavior ends up reversing the incoming packet's 5-Tuple for
every entry in the loop instead of doing it only once.

Testing Done:
- Verified that ICMP requests are no longer treated as replies
in Conntrack.

Change-Id: I826a164cfb9137e2167c404ff5c9bfd9dfaa33ad
Co-authored-by: Sairam Venugopal 
Signed-off-by: Anand Kumar 
---
 datapath-windows/ovsext/Conntrack.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/datapath-windows/ovsext/Conntrack.c 
b/datapath-windows/ovsext/Conntrack.c
index 8ea1e65..917ebee 100644
--- a/datapath-windows/ovsext/Conntrack.c
+++ b/datapath-windows/ovsext/Conntrack.c
@@ -401,7 +401,14 @@ OvsCtLookup(OvsConntrackKeyLookupCtx *ctx)
 POVS_CT_ENTRY entry;
 BOOLEAN reply = FALSE;
 POVS_CT_ENTRY found = NULL;
-OVS_CT_KEY key = ctx->key;
+
+/* Reverse NAT must be performed before OvsCtLookup, so here
+ * we simply need to flip the src and dst in key and compare
+ * they are equal. Note that flipped key is not equal to
+ * rev_key due to NAT effect.
+ */
+OVS_CT_KEY revCtxKey = ctx->key;
+OvsCtKeyReverse(&revCtxKey);

 if (!ctTotalEntries) {
 return found;
@@ -410,19 +417,13 @@ OvsCtLookup(OvsConntrackKeyLookupCtx *ctx)
 LIST_FORALL(&ovsConntrackTable[ctx->hash & CT_HASH_TABLE_MASK], link) {
 entry = CONTAINING_RECORD(link, OVS_CT_ENTRY, link);

-if (OvsCtKeyAreSame(key,entry->key)) {
+if (OvsCtKeyAreSame(ctx->key, entry->key)) {
 found = entry;
 reply = FALSE;
 break;
 }

-/* Reverse NAT must be performed before OvsCtLookup, so here
- * we simply need to flip the src and dst in key and compare
- * they are equal. Note that flipped key is not equal to
- * rev_key due to NAT effect.
- */
-OvsCtKeyReverse(&key);
-if (OvsCtKeyAreSame(key, entry->key)) {
+if (OvsCtKeyAreSame(revCtxKey, entry->key)) {
 found = entry;
 reply = TRUE;
 break;
--
2.9.3.windows.1

___

Acked-by: Shashank Ram 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 6/6] redhat: allow dpdk to also run as non-root user

2017-08-01 Thread Aaron Conole

After this commit, users may start a dpdk-enabled ovs setup as a
non-root user.  This is accomplished by exporting the $HOME directory,
which dpdk uses to fill in it's semi-persistent RTE configuration.

This change may be a bit controversial since it modifies /dev/hugepages
as part of starting the ovs-vswitchd to set a hugetlbfs group
ownership.  This is used to enable writing to /dev/hugepages so that the
dpdk_init will successfully complete.  There is an alternate way of
accomplishing this - namely to initialize DPDK before dropping
privileges.  However, this would mean that if DPDK ever grows an uninit
/ reinit function, non-root ovs likely could never use it.

This does not change OvS+DPDK's SELinux requirements.  It still must be
disabled.

Signed-off-by: Aaron Conole 
---
 Documentation/intro/install/dpdk.rst|  7 +++
 NEWS|  1 +
 rhel/README.RHEL.rst| 11 +++
 rhel/openvswitch-fedora.spec.in | 13 +
 rhel/usr_lib_systemd_system_ovs-vswitchd.service.in |  5 +
 5 files changed, 37 insertions(+)

diff --git a/Documentation/intro/install/dpdk.rst 
b/Documentation/intro/install/dpdk.rst
index a05aa1a..0585c6a 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -134,6 +134,13 @@ has to be configured with DPDK support (``--with-dpdk``).
 
 Additional information can be found in :doc:`general`.
 
+.. note::
+  If you are running using the Fedora or Red Hat package, the Open vSwitch
+  daemon will run as a non-root user.  This implies that you must have a
+  working IOMMU.  Visit the `RHEL README`__ for additional information.
+
+__ https://github.com/openvswitch/ovs/blob/master/rhel/README.RHEL.rst
+
 Setup
 -
 
diff --git a/NEWS b/NEWS
index facea02..095272a 100644
--- a/NEWS
+++ b/NEWS
@@ -64,6 +64,7 @@ Post-v2.7.0
  * OpenFlow 1.5 packet-out is now supported.
- Fedora Packaging:
  * OVN services are no longer restarted automatically after upgrade.
+ * ovs-vswitchd and ovsdb-server run as non-root users by default.
- Add --cleanup option to command 'ovs-appctl exit' (see ovs-vswitchd(8)).
- L3 tunneling:
  * Use new tunnel port option "packet_type" to configure L2 vs. L3.
diff --git a/rhel/README.RHEL.rst b/rhel/README.RHEL.rst
index 1845e8f..5f7a99a 100644
--- a/rhel/README.RHEL.rst
+++ b/rhel/README.RHEL.rst
@@ -337,6 +337,17 @@ running. All other commands where executed when Open 
vSwitch was successfully
 running.
 
 
+Non-root User Support
+---
+Fedora and RHEL support running the Open vSwitch daemons as a non-root user.
+By default, a fresh installation will create an *openvswitch* user, along
+with any additional support groups needed (such as *hugetlbfs* for DPDK
+support).
+
+This is controlled by modifying the ``OVS_USER_ID`` option.  Setting this
+to 'root:root', or commenting the variable out will revert this behavior.
+
+
 Reporting Bugs
 --
 
diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index 959aa2e..ccf6ea0 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -95,6 +95,10 @@ Requires: openssl hostname iproute module-init-tools
 Requires(post): /usr/bin/getent
 Requires(post): /usr/sbin/useradd
 Requires(post): /usr/bin/sed
+%if %{with dpdk}
+Requires(post): /usr/sbin/usermod
+Requires(post): /usr/sbin/groupadd
+%endif
 Requires(post): systemd-units
 Requires(preun): systemd-units
 Requires(postun): systemd-units
@@ -370,6 +374,15 @@ if [ $1 -eq 1 ]; then
 
 sed -i 's:^#OVS_USER_ID=:OVS_USER_ID=:' /etc/sysconfig/openvswitch
 
+%if %{with dpdk}
+getent group hugetlbfs >/dev/null || \
+groupadd hugetlbfs
+usermod -a -G hugetlbfs openvswitch
+sed -i \
+
's@OVS_USER_ID="openvswitch:openvswitch"@OVS_USER_ID="openvswitch:hugetlbfs"@'\
+/etc/sysconfig/openvswitch
+%endif
+
 # In the case of upgrade, this is not needed.
 chown -R openvswitch:openvswitch /etc/openvswitch
 fi
diff --git a/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in 
b/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in
index 9aff70b..bf0f058 100644
--- a/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in
+++ b/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in
@@ -10,8 +10,13 @@ PartOf=openvswitch.service
 [Service]
 Type=forking
 Restart=on-failure
+Environment=HOME=/var/run/openvswitch
 EnvironmentFile=/etc/openvswitch/default.conf
 EnvironmentFile=-/etc/sysconfig/openvswitch
+@begin_dpdk@
+ExecStartPre=/usr/bin/chown :hugetlbfs /dev/hugepages
+ExecStartPre=/usr/bin/chmod 0775 /dev/hugepages
+@end_dpdk@
 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl \
   --no-ovsdb-server --no-monitor --system-id=random \
   --ovs-user=${OVS_USER_ID} \
-- 
2.9.4

___
dev mailing list
d...@openvswitch.org
https://mail.o

[ovs-dev] [PATCH v3 5/6] dpdkstrip: add a preprocessor tool for stripping dpdk blocks

2017-08-01 Thread Aaron Conole

Normally, in C code, pre-processing macros can be used to enable/disable
specific functionality based on switches passed to configure.  This works
for DPDK using the --with-dpdk flag, which sets the DPDK_NETDEV define to
the appropriate value.

However, not all files are processed with the C pre-processor.  For those
files which are not, this commit adds a new pre-processor tool for .in
files to either include or exclude those stanzas as appropriate.

Signed-off-by: Aaron Conole 
---
 Makefile.am|  5 +
 build-aux/dpdkstrip.pl | 35 +++
 2 files changed, 40 insertions(+)
 create mode 100644 build-aux/dpdkstrip.pl

diff --git a/Makefile.am b/Makefile.am
index 30794ff..433a7f9 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -35,6 +35,9 @@ AM_CFLAGS += $(OVS_CFLAGS)
 
 if DPDK_NETDEV
 AM_CFLAGS += -D_FILE_OFFSET_BITS=64
+DPDKSTRIP_FLAGS = "--dpdk"
+else
+DPDKSTRIP_FLAGS = "--nodpdk"
 endif
 
 if NDEBUG
@@ -82,6 +85,7 @@ EXTRA_DIST = \
build-aux/cksum-schema-check \
build-aux/calculate-schema-cksum \
build-aux/dist-docs \
+   build-aux/dpdkstrip.pl \
build-aux/sodepends.pl \
build-aux/soexpand.pl \
build-aux/xml2nroff \
@@ -141,6 +145,7 @@ SUFFIXES += .in
 .in:
$(AM_V_GEN)$(MKDIR_P) $(dir $@)
$(AM_V_GEN)$(PERL) $(srcdir)/build-aux/soexpand.pl -I$(srcdir) < $< | \
+   $(PERL) $(srcdir)/build-aux/dpdkstrip.pl $(DPDKSTRIP_FLAGS) | \
  sed \
-e 's,[@]PKIDIR[@],$(PKIDIR),g' \
-e 's,[@]LOGDIR[@],$(LOGDIR),g' \
diff --git a/build-aux/dpdkstrip.pl b/build-aux/dpdkstrip.pl
new file mode 100644
index 000..98539cc
--- /dev/null
+++ b/build-aux/dpdkstrip.pl
@@ -0,0 +1,35 @@
+# Copyright (c) 2017 Red Hat, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at:
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+use strict;
+use warnings;
+use Getopt::Long;
+
+my ($check_dpdk) = 0;
+my ($disabled_print) = 0;
+
+Getopt::Long::Configure ("bundling");
+GetOptions("dpdk!" => \$check_dpdk) or exit(1);
+
+OUTER: while () {
+if (/@(begin|end)_dpdk@/) {
+if (!$check_dpdk) {
+$disabled_print = ! $disabled_print;
+}
+next;
+}
+
+print $_ unless $disabled_print;
+}
+exit 0;
-- 
2.9.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 4/6] redhat: dynamic service file for vswitchd

2017-08-01 Thread Aaron Conole

This commit changes the service file from static configuration to an
autogenerated file, produced during the build.  This will be relevant in a
future commit.

Signed-off-by: Aaron Conole 
---
 rhel/.gitignore  | 1 +
 rhel/automake.mk | 5 +
 ...witchd.service => usr_lib_systemd_system_ovs-vswitchd.service.in} | 0
 3 files changed, 6 insertions(+)
 rename rhel/{usr_lib_systemd_system_ovs-vswitchd.service => 
usr_lib_systemd_system_ovs-vswitchd.service.in} (100%)

diff --git a/rhel/.gitignore b/rhel/.gitignore
index 164bb66..e584a1e 100644
--- a/rhel/.gitignore
+++ b/rhel/.gitignore
@@ -4,3 +4,4 @@ openvswitch-kmod-rhel6.spec
 openvswitch-kmod-fedora.spec
 openvswitch.spec
 openvswitch-fedora.spec
+usr_lib_systemd_system_ovs-vswitchd.service
diff --git a/rhel/automake.mk b/rhel/automake.mk
index 2d9443f..56d377b 100644
--- a/rhel/automake.mk
+++ b/rhel/automake.mk
@@ -29,12 +29,15 @@ EXTRA_DIST += \
rhel/usr_lib_systemd_system_openvswitch.service \
rhel/usr_lib_systemd_system_ovsdb-server.service \
rhel/usr_lib_systemd_system_ovs-vswitchd.service \
+   rhel/usr_lib_systemd_system_ovs-vswitchd.service.in \
rhel/usr_lib_systemd_system_ovn-controller.service \
rhel/usr_lib_systemd_system_ovn-controller-vtep.service \
rhel/usr_lib_systemd_system_ovn-northd.service \
rhel/usr_lib_firewalld_services_ovn-central-firewall-service.xml \
rhel/usr_lib_firewalld_services_ovn-host-firewall-service.xml
 
+DISTCLEANFILES += rhel/usr_lib_systemd_system_ovs-vswitchd.service
+
 update_rhel_spec = \
   $(AM_V_GEN)($(ro_shell) && sed -e 's,[@]VERSION[@],$(VERSION),g') \
 < $(srcdir)/rhel/$(@F).in > $(@F).tmp || exit 1; \
@@ -46,6 +49,8 @@ $(srcdir)/rhel/openvswitch-dkms.spec: 
rhel/openvswitch-dkms.spec.in $(top_buildd
 $(srcdir)/rhel/openvswitch-kmod-rhel6.spec: 
rhel/openvswitch-kmod-rhel6.spec.in $(top_builddir)/config.status
$(update_rhel_spec)
 
+rhel/usr_lib_systemd_system_ovs-vswitchd.service: 
rhel/usr_lib_systemd_system_ovs-vswitchd.service.in 
$(top_builddir)/config.status
+
 $(srcdir)/rhel/openvswitch-kmod-fedora.spec: 
rhel/openvswitch-kmod-fedora.spec.in $(top_builddir)/config.status
$(update_rhel_spec)
 
diff --git a/rhel/usr_lib_systemd_system_ovs-vswitchd.service 
b/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in
similarity index 100%
rename from rhel/usr_lib_systemd_system_ovs-vswitchd.service
rename to rhel/usr_lib_systemd_system_ovs-vswitchd.service.in
-- 
2.9.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 3/6] makefile: create destdir when generating files

2017-08-01 Thread Aaron Conole

When intermediary files are generated, the destination directory is assumed
to exist.  This has worked so far because most files are built prior to
the dist-packaging step.  However, any files which require rebuild after
the packaging step may end up in failure if the output directory is not
available.  This commit adds a 'mkdir -p' before generating the output files.

Signed-off-by: Aaron Conole 
---
 Makefile.am | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Makefile.am b/Makefile.am
index 373ef6e..30794ff 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -139,6 +139,7 @@ ro_shell = printf '\043 Generated automatically -- do not 
modify!-*- buffer-
 
 SUFFIXES += .in
 .in:
+   $(AM_V_GEN)$(MKDIR_P) $(dir $@)
$(AM_V_GEN)$(PERL) $(srcdir)/build-aux/soexpand.pl -I$(srcdir) < $< | \
  sed \
-e 's,[@]PKIDIR[@],$(PKIDIR),g' \
-- 
2.9.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 2/6] redhat: dynamically allocate and reference ovs user

2017-08-01 Thread Aaron Conole

After this commit, the fedora RPM will create the openvswitch user, from the
non-static pool, for use as an Open vSwitch daemon user.  This only happens
on install - not upgrade.  This will be the default user:group
combination for the openvswitch daemons.

To do this in a way that doesn't impact existing installations, the
/etc/openvswitch directory will be created during the installation,
rather than being provided as part of the rpm.

Acked-by: Markos Chandras 
Signed-off-by: Aaron Conole 
---
 rhel/openvswitch-fedora.spec.in  | 13 +
 rhel/usr_lib_systemd_system_ovsdb-server.service |  1 +
 2 files changed, 14 insertions(+)

diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index 3179568..959aa2e 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -92,6 +92,9 @@ Requires: openssl hostname iproute module-init-tools
 #Upstream kernel commit 4f647e0a3c37b8d5086214128614a136064110c3
 #Requires: kernel >= 3.15.0-0
 
+Requires(post): /usr/bin/getent
+Requires(post): /usr/sbin/useradd
+Requires(post): /usr/bin/sed
 Requires(post): systemd-units
 Requires(preun): systemd-units
 Requires(postun): systemd-units
@@ -361,6 +364,16 @@ rm -rf $RPM_BUILD_ROOT
 %endif
 
 %post
+if [ $1 -eq 1 ]; then
+getent passwd openvswitch >/dev/null || \
+useradd -r -d / -s /sbin/nologin -c "Open vSwitch Daemons" openvswitch
+
+sed -i 's:^#OVS_USER_ID=:OVS_USER_ID=:' /etc/sysconfig/openvswitch
+
+# In the case of upgrade, this is not needed.
+chown -R openvswitch:openvswitch /etc/openvswitch
+fi
+
 %if 0%{?systemd_post:1}
 %systemd_post %{name}.service
 %else
diff --git a/rhel/usr_lib_systemd_system_ovsdb-server.service 
b/rhel/usr_lib_systemd_system_ovsdb-server.service
index b82cb33..7acd25f 100644
--- a/rhel/usr_lib_systemd_system_ovsdb-server.service
+++ b/rhel/usr_lib_systemd_system_ovsdb-server.service
@@ -10,6 +10,7 @@ Type=forking
 Restart=on-failure
 EnvironmentFile=/etc/openvswitch/default.conf
 EnvironmentFile=-/etc/sysconfig/openvswitch
+ExecStartPre=/usr/bin/chown ${OVS_USER_ID} /var/run/openvswitch
 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl \
   --no-ovs-vswitchd --no-monitor --system-id=random \
   --ovs-user=${OVS_USER_ID} \
-- 
2.9.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 1/6] redhat: allow arbitrary user:group

2017-08-01 Thread Aaron Conole

Under rpm based distributions, the only user:group that the rhel daemons run
as is 'root:root'.  This is fine as a default, but as part of a security
procedure, users may want to run as an alternate uid/gid.  This commit
adds an OVS_USER_ID environment variable for systemd, which defaults to
root:root, but can be overridden by changing the /etc/sysconfig/openvswitch
environment file.

Acked-by: Markos Chandras 
Signed-off-by: Aaron Conole 
---
 rhel/automake.mk  | 1 +
 rhel/etc_openvswitch_default.conf | 5 +
 rhel/openvswitch-fedora.spec.in   | 4 
 rhel/usr_lib_systemd_system_ovs-vswitchd.service  | 3 +++
 rhel/usr_lib_systemd_system_ovsdb-server.service  | 3 +++
 rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template | 3 +++
 6 files changed, 19 insertions(+)
 create mode 100644 rhel/etc_openvswitch_default.conf

diff --git a/rhel/automake.mk b/rhel/automake.mk
index 1265fa7..2d9443f 100644
--- a/rhel/automake.mk
+++ b/rhel/automake.mk
@@ -10,6 +10,7 @@ EXTRA_DIST += \
rhel/automake.mk \
rhel/etc_init.d_openvswitch \
rhel/etc_logrotate.d_openvswitch \
+   rhel/etc_openvswitch_default.conf \
rhel/etc_sysconfig_network-scripts_ifdown-ovs \
rhel/etc_sysconfig_network-scripts_ifup-ovs \
rhel/openvswitch-dkms.spec \
diff --git a/rhel/etc_openvswitch_default.conf 
b/rhel/etc_openvswitch_default.conf
new file mode 100644
index 000..c74417d
--- /dev/null
+++ b/rhel/etc_openvswitch_default.conf
@@ -0,0 +1,5 @@
+# DO NOT EDIT THIS FILE
+
+# The following is the *default* configuration for the openvswitch user ID.
+# This is for backward compatibility.
+OVS_USER_ID="root:root"
diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index 367207b..3179568 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -246,6 +246,9 @@ done
 install -m 0755 rhel/etc_init.d_openvswitch \
 $RPM_BUILD_ROOT%{_datadir}/openvswitch/scripts/openvswitch.init
 
+install -p -D -m 0644 rhel/etc_openvswitch_default.conf \
+$RPM_BUILD_ROOT/%{_sysconfdir}/openvswitch/default.conf
+
 install -p -D -m 0644 rhel/etc_logrotate.d_openvswitch \
 $RPM_BUILD_ROOT/%{_sysconfdir}/logrotate.d/openvswitch
 
@@ -475,6 +478,7 @@ fi
 %{_sysconfdir}/bash_completion.d/ovs-appctl-bashcomp.bash
 %{_sysconfdir}/bash_completion.d/ovs-vsctl-bashcomp.bash
 %dir %{_sysconfdir}/openvswitch
+%{_sysconfdir}/openvswitch/default.conf
 %config %ghost %{_sysconfdir}/openvswitch/conf.db
 %ghost %{_sysconfdir}/openvswitch/.conf.db.~lock~
 %config %ghost %{_sysconfdir}/openvswitch/system-id.conf
diff --git a/rhel/usr_lib_systemd_system_ovs-vswitchd.service 
b/rhel/usr_lib_systemd_system_ovs-vswitchd.service
index 886b68a..9aff70b 100644
--- a/rhel/usr_lib_systemd_system_ovs-vswitchd.service
+++ b/rhel/usr_lib_systemd_system_ovs-vswitchd.service
@@ -10,12 +10,15 @@ PartOf=openvswitch.service
 [Service]
 Type=forking
 Restart=on-failure
+EnvironmentFile=/etc/openvswitch/default.conf
 EnvironmentFile=-/etc/sysconfig/openvswitch
 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl \
   --no-ovsdb-server --no-monitor --system-id=random \
+  --ovs-user=${OVS_USER_ID} \
   start $OPTIONS
 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop
 ExecReload=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server \
   --no-monitor --system-id=random \
+  --ovs-user=${OVS_USER_ID} \
   restart $OPTIONS
 TimeoutSec=300
diff --git a/rhel/usr_lib_systemd_system_ovsdb-server.service 
b/rhel/usr_lib_systemd_system_ovsdb-server.service
index 68deace..b82cb33 100644
--- a/rhel/usr_lib_systemd_system_ovsdb-server.service
+++ b/rhel/usr_lib_systemd_system_ovsdb-server.service
@@ -8,12 +8,15 @@ PartOf=openvswitch.service
 [Service]
 Type=forking
 Restart=on-failure
+EnvironmentFile=/etc/openvswitch/default.conf
 EnvironmentFile=-/etc/sysconfig/openvswitch
 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl \
   --no-ovs-vswitchd --no-monitor --system-id=random \
+  --ovs-user=${OVS_USER_ID} \
   start $OPTIONS
 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd stop
 ExecReload=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd \
+   --ovs-user=${OVS_USER_ID} \
--no-monitor restart $OPTIONS
 RuntimeDirectory=openvswitch
 RuntimeDirectoryMode=0755
diff --git a/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template 
b/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template
index 3050a07..fdaee00 100644
--- a/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template
+++ b/rhel/usr_share_openvswitch_scripts_systemd_sysconfig.template
@@ -21,3 +21,6 @@
 #   --ovsdb-server-wrapper=valgrind
 #
 OPTIONS=""
+
+# Uncomment and set the OVS User/Group value
+#OVS_USER_ID="openvswitch:openvswitch"
--

[ovs-dev] [PATCH v3 0/6] rhel/fedora: non-root OvS out of the box

2017-08-01 Thread Aaron Conole

This series attempts to introduce the ability to start and use
Open vSwitch 'out of the box' as a non-root user.  It does this by
modifying the service files to pass the recently introduced --ovs-user
argument around, and by making some minor tweaks to the passwd, group,
and filesystem information.

I prefixed the packaging work with 'redhat', but if rpm or packaging
is a preferred prefx for that work, I can respin.

The more controversial changes are:

* This modifies the /etc/sysconfig/ file on install.
* The dpdk support directly modifies /dev/hugepages with a call to chmod
* A new user 'openvswitch', and up to two new groups 'openvswitch', and
  'hugetlbfs' are created
* A change to soexpand.pl to allow conditional inclusion of dpdk-related
  options

It is important to note that for the DPDK section to function properly, the
update to 17.05.1 patch needs to be accepted.  A workaround is documented
in the RHEL guide (instructing users to revert the functionality).

After this series:

> [root at wsfd-netdev60 ~]# yum install openvswitch-2.7.90-1.fc25.x86_64.rpm 
> Loaded plugins: product-id, search-disabled-repos, subscription-manager
> This system is not registered to Red Hat Subscription Management. You can use 
> subscription-manager to register.
> Examining openvswitch-2.7.90-1.fc25.x86_64.rpm: 
> openvswitch-2.7.90-1.fc25.x86_64
> Marking openvswitch-2.7.90-1.fc25.x86_64.rpm to be installed
> Resolving Dependencies
> --> Running transaction check
> ---> Package openvswitch.x86_64 0:2.7.90-1.fc25 will be installed
> --> Finished Dependency Resolution
>
> Dependencies Resolved
>
> 
>  Package  ArchVersion  Repository  
> Size
> 
> Installing:
>  openvswitch  x86_64  2.7.90-1.fc25/openvswitch-2.7.90-1.fc25.x86_64   11 
> M
>
> Transaction Summary
> 
> Install  1 Package
>
> Total size: 11 M
> Installed size: 11 M
> Is this ok [y/d/N]: y
> Downloading packages:
> Running transaction check
> Running transaction test
> Transaction test succeeded
> Running transaction
>   Installing : openvswitch-2.7.90-1.fc25.x86_64 
> 1/1 
>   Verifying  : openvswitch-2.7.90-1.fc25.x86_64 
> 1/1 
>
> Installed:
>   openvswitch.x86_64 0:2.7.90-1.fc25  
>   
>
> Complete!
> [root at wsfd-netdev60 ~]# systemctl start openvswitch
> [root at wsfd-netdev60 ~]# ps aux | grep ovs
> openvsw+  12642  0.0  0.0  47864  2296 ?S ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info 
> --remote=punix:/var/run/openvswitch/db.sock 
> --private-key=db:Open_vSwitch,SSL,private_key 
> --certificate=db:Open_vSwitch,SSL,certificate 
> --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user 
> openvswitch:openvswitch --no-chdir 
> --log-file=/var/log/openvswitch/ovsdb-server.log 
> --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
> openvsw+  12688  0.0  0.0  49588 10600 ?S ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err 
> -vfile:info --mlockall --user openvswitch:openvswitch --no-chdir 
> --log-file=/var/log/openvswitch/ovs-vswitchd.log 
> --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach


v1->v2:
https://lists.linux-foundation.org/pipermail/ovs-dev/2017-June/333417.html

The previous method used 3 different locations of configuration from
environment variables:
1. The systemd file.
2. A new /etc/sysconfig/openvswitch-pre
3. The existing /etc/sysconfig/openvswitch

Now, configuration is from two areas:
1. A new /etc/openvswitch/default.conf
2. The existing /etc/sysconfig/openvswitch

As part of the install, we set the OVS_USER_ID to the new values.

v2->v3:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/334946.html

Refactor for the dpdk non-root user portion due to an issue discovered
where the generated service file didn't honor new configuration when
re-running ./configure.

Also, converted the "Reviewed-by" to "Acked-by".  This is because there
is no such thing as Reviewed-by in the OVS contributing guide.

Finally, included some documentation updates.

Aaron Conole (6):
  redhat: allow arbitrary user:group
  redhat: dynamically allocate and reference ovs user
  makefile: create destdir when generating files
  redhat: dynamic service file for vswitchd
  dpdkstrip: add a preprocessor tool for stripping dpdk blocks
  redhat: allow dpdk to also run as non-root user

 Documentation/intro/install/dpdk.rst   |  7 +
 Makefile.am|  6 
 NEWS   |  1 +
 build-aux/dpdkstrip.pl | 35 ++
 rhel/.gitignore

[ovs-dev] [PATCH] datapath-windows: Fix conntrack lookups for reversed keys

2017-08-01 Thread Anand Kumar

From: Sairam Venugopal 

The conntrack table needs to be queried for entries in either directions
to determine if the packet is in forward direction or reply direction.

The current behavior ends up reversing the incoming packet's 5-Tuple for
every entry in the loop instead of doing it only once.

Testing Done:
- Verified that ICMP requests are no longer treated as replies
in Conntrack.

Change-Id: I826a164cfb9137e2167c404ff5c9bfd9dfaa33ad
Co-authored-by: Sairam Venugopal 
Signed-off-by: Anand Kumar 
---
 datapath-windows/ovsext/Conntrack.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/datapath-windows/ovsext/Conntrack.c 
b/datapath-windows/ovsext/Conntrack.c
index 8ea1e65..917ebee 100644
--- a/datapath-windows/ovsext/Conntrack.c
+++ b/datapath-windows/ovsext/Conntrack.c
@@ -401,7 +401,14 @@ OvsCtLookup(OvsConntrackKeyLookupCtx *ctx)
 POVS_CT_ENTRY entry;
 BOOLEAN reply = FALSE;
 POVS_CT_ENTRY found = NULL;
-OVS_CT_KEY key = ctx->key;
+
+/* Reverse NAT must be performed before OvsCtLookup, so here
+ * we simply need to flip the src and dst in key and compare
+ * they are equal. Note that flipped key is not equal to
+ * rev_key due to NAT effect.
+ */
+OVS_CT_KEY revCtxKey = ctx->key;
+OvsCtKeyReverse(&revCtxKey);
 
 if (!ctTotalEntries) {
 return found;
@@ -410,19 +417,13 @@ OvsCtLookup(OvsConntrackKeyLookupCtx *ctx)
 LIST_FORALL(&ovsConntrackTable[ctx->hash & CT_HASH_TABLE_MASK], link) {
 entry = CONTAINING_RECORD(link, OVS_CT_ENTRY, link);
 
-if (OvsCtKeyAreSame(key,entry->key)) {
+if (OvsCtKeyAreSame(ctx->key, entry->key)) {
 found = entry;
 reply = FALSE;
 break;
 }
 
-/* Reverse NAT must be performed before OvsCtLookup, so here
- * we simply need to flip the src and dst in key and compare
- * they are equal. Note that flipped key is not equal to
- * rev_key due to NAT effect.
- */
-OvsCtKeyReverse(&key);
-if (OvsCtKeyAreSame(key, entry->key)) {
+if (OvsCtKeyAreSame(revCtxKey, entry->key)) {
 found = entry;
 reply = TRUE;
 break;
-- 
2.9.3.windows.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] dpif-netlink-rtnl: Fix false errors on interfaces without tunnel config

2017-08-01 Thread Joe Stringer

On 28 July 2017 at 05:43, Simon Horman  wrote:
> On Thu, Jul 27, 2017 at 02:40:02PM +0300, Roi Dayan wrote:
>> When we skip adding a port using rtnetlink and not because of an error we
>> need to return EOPNOTSUPP to avoid logging an error message.
>>
>> Fixes: 2fd3d5eda508 ("dpif-netlink-rtnl: Support layer3 GRE")
>> Signed-off-by: Roi Dayan 
>> Reviewed-by: Paul Blakey 
>
> Thanks, this looks good to me. I would be happy to apply if someone
> provided an Ack. I'd also be happy if someone else applied it with:
>
> Acked-by: Simon Horman 

Acked-by: Joe Stringer 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH V2] tc: Correct convert ticks to msecs on parsing tc TM

2017-08-01 Thread Joe Stringer

On 29 July 2017 at 22:01, Roi Dayan  wrote:
> From: Paul Blakey 
>
> Use sysconf(_SC_CLK_TCK) to read run time "number of clock ticks per
> second" and use that to convert ticks to msecs.
> This is how iproute does the conversion when parsing tc filters.
> The system call is done only once.
>
> Signed-off-by: Paul Blakey 
> Reviewed-by: Roi Dayan 
> ---

Acked-by: Joe Stringer 

Looks like this is missing Simon's ack from the previous round. In
future, please consider propagating these from earlier patch revisions
if the change is minor.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH V2] dpif-netlink: Fix log level for error message

2017-08-01 Thread Joe Stringer

On 1 August 2017 at 13:56, Joe Stringer  wrote:
> On 29 July 2017 at 21:58, Roi Dayan  wrote:
>> Since it's an error but also will always occur in older kernels
>> log the message with level warning instead of info.
>>
>> Signed-off-by: Roi Dayan 
>> ---
>
> When I run this patch on a fedora system with kernel 4.8 and
> out-of-tree modules, when running the "make check-kmod" tests, every
> vxlan test now fails because the last step of the test checks the logs
> for any log message with WARN or higher level, and this message shows
> up. Looking at the other platforms affected, it seems like this will
> always triggers on newer kernels in my test environment. Here's the
> log:
>
> 2017-08-01T20:43:59.387Z|00053|dpif_netlink|WARN|Failed to create
> at_vxlan0 with rtnetlink: Invalid argument
>
> I wonder if the new rtnetlink infrastructure is not always working for
> VXLAN, and this error message is just highlighting that? Eric, could
> this be related to VXLAN-GPE?

Correction: The non-GPE VXLAN tests fail in this case.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH V2] dpif-netlink: Fix log level for error message

2017-08-01 Thread Joe Stringer

On 29 July 2017 at 21:58, Roi Dayan  wrote:
> Since it's an error but also will always occur in older kernels
> log the message with level warning instead of info.
>
> Signed-off-by: Roi Dayan 
> ---

When I run this patch on a fedora system with kernel 4.8 and
out-of-tree modules, when running the "make check-kmod" tests, every
vxlan test now fails because the last step of the test checks the logs
for any log message with WARN or higher level, and this message shows
up. Looking at the other platforms affected, it seems like this will
always triggers on newer kernels in my test environment. Here's the
log:

2017-08-01T20:43:59.387Z|00053|dpif_netlink|WARN|Failed to create
at_vxlan0 with rtnetlink: Invalid argument

I wonder if the new rtnetlink infrastructure is not always working for
VXLAN, and this error message is just highlighting that? Eric, could
this be related to VXLAN-GPE?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] RFC: Let's branch for 2.8 today

2017-08-01 Thread Ben Pfaff

On Tue, Aug 01, 2017 at 06:49:53PM +, Darrell Ball wrote:
> 
> -Original Message-
> From:  on behalf of Ben Pfaff 
> Date: Tuesday, August 1, 2017 at 10:23 AM
> To: "d...@openvswitch.org" 
> Subject: [ovs-dev] RFC: Let's branch for 2.8 today
> 
> I'd like to create branch-2.8 today in preparation for releasing OVS 2.8
> later in August.  I sent the patch that would lead off the branch a few
> minutes ago:
> 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2017-2DAugust_336544.html&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-uZnsw&m=EHRmZog9ZtlKH5MXuo4OklgP6dlm_pfMwfokiw2bXGk&s=udsL-1fyk8cB7_qe39p2Ua8qSMMWmIcBcWDmsjECPTg&e=
>  
> 
> According to our published schedule, we should have create branch-2.8 at
> the beginning of July, but we delayed to allow some more features to
> enter the 2.8 release.  There continue to be reasonable arguments along
> these lines in favor of delaying branching further:
> 
> * Many DPDK patches have been posted but are still under review
>   or awaiting application.
> 
> I don’t see much problem for double commit as the baselines should be
> identical.  However, most DPDK folks are offline until late tonight
> PST; can we give them a chance to be back online before branching ?

Waiting until Wednesday morning (US Pacific) isn't a problem for me.

> I propose that we branch today, with the following further details:
> 
> * Continue to apply patches that add features, that as of now
>   have already been posted and ready for review or iterating on
>   reviews, until approximately this Friday.
> 
> For limited scope features, would be ok to allow a little longer,
> maybe early next week ?

OK.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] RFC: Let's branch for 2.8 today

2017-08-01 Thread Russell Bryant

On Tue, Aug 1, 2017 at 1:23 PM, Ben Pfaff  wrote:
> I'd like to create branch-2.8 today in preparation for releasing OVS 2.8
> later in August.  I sent the patch that would lead off the branch a few
> minutes ago:
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/336544.html
>
> According to our published schedule, we should have create branch-2.8 at
> the beginning of July, but we delayed to allow some more features to
> enter the 2.8 release.  There continue to be reasonable arguments along
> these lines in favor of delaying branching further:
>
> * Many DPDK patches have been posted but are still under review
>   or awaiting application.
>
> * Multiple series of patches leading up to NSH support have been
>   posted but are still iterating through reviews.
>
> The primary argument in favor of branching quickly is to keep to a
> regular release cadence.  My understanding is that multiple downstream
> projects rely on prompt Open vSwitch releases.  (This is mainly based on
> speaking to Russell Bryant at Red Hat; he can fill in the details if
> need be.)  I'd prefer to stick to our schedule for this reason.

Yes, my reasoning is that a time based release schedule is only
valuable if we stick to it as closely as possible.  When dates are
reliable, more downstream advance planning can be done.

If we stray too far from published dates, downstream consumers will
lose confidence in any published dates.  The net result will be a
longer window of time between an upstream OVS release and downstream
integration.

Here is a concrete example.  We published a date for OVS 2.8 of
mid-August.  OpenStack Pike is scheduled to be released at the end of
August.  What version of OVS should a downstream aim to ship with this
version of OpenStack?

The ideal case in this scenario would be to provide OVS 2.8 with
OpenStack Pike.  Some new OVN features for OpenStack depend on support
in OVN as well.  The safer fallback is to plan on shipping OVS 2.7
instead.  That fallback is much less sensitive to OVS release timing,
but introduces another ~6 month delay before features are finally
shipped to users via normal distribution mechanisms.

A consistent time based release model can be very helpful to
downstream consumers, but only if the dates can be relied upon.

>
> I propose that we branch today, with the following further details:
>
> * Continue to apply patches that add features, that as of now
>   have already been posted and ready for review or iterating on
>   reviews, until approximately this Friday.  These patches will
>   need to be committed both to master and branch-2.8.
>
> * As always, bug fix patches are acceptable on any branch at any
>   time.
>
> * Plan to release, given acceptable quality, during the last
>   week of August.
>
> Thoughts?
>
> Thanks,
>
> Ben.
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

-- 
Russell Bryant
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] ovn-northd: Add native active-standby HA.

2017-08-01 Thread Russell Bryant

On Tue, Aug 1, 2017 at 3:26 PM, Han Zhou  wrote:
>
>
> On Tue, Aug 1, 2017 at 9:19 AM, Russell Bryant  wrote:
>>
>> Add native support for active-standby HA in ovn-northd by having each
>> instance attempt to acquire an OVSDB lock.  Only the instance of
>> ovn-northd that currently holds the lock will make active changes to
>> the OVN databases.
>>
>> Signed-off-by: Russell Bryant 
>> ---
>>  NEWS|  1 +
>>  ovn/northd/ovn-northd.8.xml |  9 +
>>  ovn/northd/ovn-northd.c | 40 +++-
>>  3 files changed, 41 insertions(+), 9 deletions(-)
>>
>> diff --git a/NEWS b/NEWS
>> index facea0228..f3cdd2443 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -49,6 +49,7 @@ Post-v2.7.0
>> one chassis is specified, OVN will manage high availability for
>> that
>> gateway.
>>   * Add support for ACL logging.
>> + * ovn-northd now has native support for active-standby high
>> availability.
>> - Tracing with ofproto/trace now traces through recirculation.
>> - OVSDB:
>>   * New support for role-based access control (see ovsdb-server(1)).
>> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
>> index 1527e8a60..0d85ec0d2 100644
>> --- a/ovn/northd/ovn-northd.8.xml
>> +++ b/ovn/northd/ovn-northd.8.xml
>> @@ -72,6 +72,15 @@
>>
>>  
>>
>> +Active-Standby for High Availability
>> +
>> +  You may run ovn-northd more than once in an OVN
>> deployment.
>> +  OVN will automatically ensure that only one of them is active at a
>> time.
>> +  If multiple instances of ovn-northd are running and
>> the
>> +  active ovn-northd fails, one of the hot standby
>> instances
>> +  of ovn-northd will automatically take over.
>> +
>> +
>>  Logical Flow Table Structure
>>
>>  
>> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>> index 10e0c7ce0..3d2be4267 100644
>> --- a/ovn/northd/ovn-northd.c
>> +++ b/ovn/northd/ovn-northd.c
>> @@ -6531,6 +6531,12 @@ main(int argc, char *argv[])
>>  ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
>>  ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
>>
>> +/* Ensure that only a single ovn-northd is active in the deployment
>> by
>> + * acquiring a lock called "ovn_northd" on the southbound database
>> + * and then only performing DB transactions if the lock is held. */
>> +ovsdb_idl_set_lock(ovnsb_idl_loop.idl, "ovn_northd");
>> +bool had_lock = false;
>> +
>>  /* Main loop. */
>>  exiting = false;
>>  while (!exiting) {
>> @@ -6541,15 +6547,29 @@ main(int argc, char *argv[])
>>  .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
>>  };
>>
>> -struct chassis_index chassis_index;
>> -chassis_index_init(&chassis_index, ctx.ovnsb_idl);
>> +if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
>> +VLOG_INFO("ovn-northd lock acquired. "
>> +  "This ovn-northd instance is now active.");
>> +had_lock = true;
>> +} else if (had_lock && !ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
>> +VLOG_INFO("ovn-northd lock lost. "
>> +  "This ovn-northd instance is now on standby.");
>
> Should it try lock again, if we want it to be standby? Otherwise, this
> instance won't have a chance to be active any more.

Good question ... I was assuming this scenario was due to a lost
connection, and that the IDL would automatically try to re-acquire the
lock.

I tested to make sure I saw a second ovn-northd go from standby to
active, but I have not tested active -> standby -> active again.

I'll take a closer look at this before applying the patch.

>
>> +had_lock = false;
>> +}
>>
>> -ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop);
>> -ovnsb_db_run(&ctx, &ovnsb_idl_loop);
>> -if (ctx.ovnsb_txn) {
>> -check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
>> -check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
>> -check_and_update_rbac(&ctx);
>> +struct chassis_index chassis_index;
>> +bool destroy_chassis_index = false;
>> +if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
>> +chassis_index_init(&chassis_index, ctx.ovnsb_idl);
>> +destroy_chassis_index = true;
>> +
>> +ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop);
>> +ovnsb_db_run(&ctx, &ovnsb_idl_loop);
>> +if (ctx.ovnsb_txn) {
>> +check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
>> +check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
>> +check_and_update_rbac(&ctx);
>> +}
>>  }
>>
>>  unixctl_server_run(unixctl);
>> @@ -6565,7 +6585,9 @@ main(int argc, char *argv[])
>>  exiting = true;
>>  }
>>
>> -chassis_index_destroy(&chassis_index);
>>

Re: [ovs-dev] [PATCH] ovn-northd: Add native active-standby HA.

2017-08-01 Thread Han Zhou

On Tue, Aug 1, 2017 at 9:19 AM, Russell Bryant  wrote:
>
> Add native support for active-standby HA in ovn-northd by having each
> instance attempt to acquire an OVSDB lock.  Only the instance of
> ovn-northd that currently holds the lock will make active changes to
> the OVN databases.
>
> Signed-off-by: Russell Bryant 
> ---
>  NEWS|  1 +
>  ovn/northd/ovn-northd.8.xml |  9 +
>  ovn/northd/ovn-northd.c | 40 +++-
>  3 files changed, 41 insertions(+), 9 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index facea0228..f3cdd2443 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -49,6 +49,7 @@ Post-v2.7.0
> one chassis is specified, OVN will manage high availability for
that
> gateway.
>   * Add support for ACL logging.
> + * ovn-northd now has native support for active-standby high
availability.
> - Tracing with ofproto/trace now traces through recirculation.
> - OVSDB:
>   * New support for role-based access control (see ovsdb-server(1)).
> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> index 1527e8a60..0d85ec0d2 100644
> --- a/ovn/northd/ovn-northd.8.xml
> +++ b/ovn/northd/ovn-northd.8.xml
> @@ -72,6 +72,15 @@
>
>  
>
> +Active-Standby for High Availability
> +
> +  You may run ovn-northd more than once in an OVN
deployment.
> +  OVN will automatically ensure that only one of them is active at a
time.
> +  If multiple instances of ovn-northd are running and
the
> +  active ovn-northd fails, one of the hot standby
instances
> +  of ovn-northd will automatically take over.
> +
> +
>  Logical Flow Table Structure
>
>  
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index 10e0c7ce0..3d2be4267 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -6531,6 +6531,12 @@ main(int argc, char *argv[])
>  ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
>  ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
>
> +/* Ensure that only a single ovn-northd is active in the deployment
by
> + * acquiring a lock called "ovn_northd" on the southbound database
> + * and then only performing DB transactions if the lock is held. */
> +ovsdb_idl_set_lock(ovnsb_idl_loop.idl, "ovn_northd");
> +bool had_lock = false;
> +
>  /* Main loop. */
>  exiting = false;
>  while (!exiting) {
> @@ -6541,15 +6547,29 @@ main(int argc, char *argv[])
>  .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
>  };
>
> -struct chassis_index chassis_index;
> -chassis_index_init(&chassis_index, ctx.ovnsb_idl);
> +if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
> +VLOG_INFO("ovn-northd lock acquired. "
> +  "This ovn-northd instance is now active.");
> +had_lock = true;
> +} else if (had_lock && !ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
> +VLOG_INFO("ovn-northd lock lost. "
> +  "This ovn-northd instance is now on standby.");

Should it try lock again, if we want it to be standby? Otherwise, this
instance won't have a chance to be active any more.

> +had_lock = false;
> +}
>
> -ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop);
> -ovnsb_db_run(&ctx, &ovnsb_idl_loop);
> -if (ctx.ovnsb_txn) {
> -check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
> -check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
> -check_and_update_rbac(&ctx);
> +struct chassis_index chassis_index;
> +bool destroy_chassis_index = false;
> +if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
> +chassis_index_init(&chassis_index, ctx.ovnsb_idl);
> +destroy_chassis_index = true;
> +
> +ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop);
> +ovnsb_db_run(&ctx, &ovnsb_idl_loop);
> +if (ctx.ovnsb_txn) {
> +check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
> +check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
> +check_and_update_rbac(&ctx);
> +}
>  }
>
>  unixctl_server_run(unixctl);
> @@ -6565,7 +6585,9 @@ main(int argc, char *argv[])
>  exiting = true;
>  }
>
> -chassis_index_destroy(&chassis_index);
> +if (destroy_chassis_index) {
> +chassis_index_destroy(&chassis_index);
> +}
>  }
>
>  unixctl_server_destroy(unixctl);
> --
> 2.13.3
>

Acked-by: Han Zhou 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] RFC: Let's branch for 2.8 today

2017-08-01 Thread Darrell Ball

-Original Message-
From:  on behalf of Ben Pfaff 
Date: Tuesday, August 1, 2017 at 10:23 AM
To: "d...@openvswitch.org" 
Subject: [ovs-dev] RFC: Let's branch for 2.8 today

I'd like to create branch-2.8 today in preparation for releasing OVS 2.8
later in August.  I sent the patch that would lead off the branch a few
minutes ago:

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2017-2DAugust_336544.html&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-uZnsw&m=EHRmZog9ZtlKH5MXuo4OklgP6dlm_pfMwfokiw2bXGk&s=udsL-1fyk8cB7_qe39p2Ua8qSMMWmIcBcWDmsjECPTg&e=

According to our published schedule, we should have create branch-2.8 at
the beginning of July, but we delayed to allow some more features to
enter the 2.8 release.  There continue to be reasonable arguments along
these lines in favor of delaying branching further:

* Many DPDK patches have been posted but are still under review
  or awaiting application.

I don’t see much problem for double commit as the baselines should be identical.
However, most DPDK folks are offline until late tonight PST; can we give them a 
chance to be
back online before branching ?

* Multiple series of patches leading up to NSH support have been
  posted but are still iterating through reviews.

The primary argument in favor of branching quickly is to keep to a
regular release cadence.  My understanding is that multiple downstream
projects rely on prompt Open vSwitch releases.  (This is mainly based on
speaking to Russell Bryant at Red Hat; he can fill in the details if
need be.)  I'd prefer to stick to our schedule for this reason.

I propose that we branch today, with the following further details:

* Continue to apply patches that add features, that as of now
  have already been posted and ready for review or iterating on
  reviews, until approximately this Friday.

For limited scope features, would be ok to allow a little longer, maybe early 
next week ?

  These patches will
  need to be committed both to master and branch-2.8.

* As always, bug fix patches are acceptable on any branch at any
  time.

* Plan to release, given acceptable quality, during the last
  week of August.

Thoughts?

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-uZnsw&m=EHRmZog9ZtlKH5MXuo4OklgP6dlm_pfMwfokiw2bXGk&s=9VEWp89QUGpFZt15Xw2A3I7u7X3Le-4H_EGM7Vo7eiU&e=

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] ofpbuf: Fix parameter for const initializer.

2017-08-01 Thread Ben Pfaff

On Tue, Aug 01, 2017 at 10:24:47AM -0700, Joe Stringer wrote:
> On 1 August 2017 at 07:55, Ben Pfaff  wrote:
> > On Mon, Jul 31, 2017 at 05:16:11PM -0700, Joe Stringer wrote:
> >> Clang complains:
> >>
> >> In file included from include/openvswitch/cxxtest.cc:11:0:
> >> ../include/openvswitch/ofpbuf.h: In function ‘ofpbuf 
> >> ofpbuf_const_initializer(const void*, size_t)’:
> >> ../include/openvswitch/ofpbuf.h:107:5: warning: narrowing conversion of 
> >> ‘size’ from ‘size_t {aka long unsigned int}’ to ‘uint32_t {aka unsigned 
> >> int}’ inside { } [-Wnarrowing]
> >>  };
> >>  ^
> >> ../include/openvswitch/ofpbuf.h:107:5: warning: narrowing conversion of 
> >> ‘size’ from ‘size_t {aka long unsigned int}’ to ‘uint32_t {aka unsigned 
> >> int}’ inside { } [-Wnarrowing]
> >>
> >> This is because the ofpbuf struct's "size" parameter is a uint32_t,
> >> while ofpbuf_const_initializer() takes a size_t for the size. Fix this
> >> function to take a uint32_t instead.
> >>
> >> Signed-off-by: Joe Stringer 
> >
> > Thanks.  (We must have different clang versions.0
> >
> > Acked-by: Ben Pfaff 
> 
> Perhaps - Clang-4.0?
> 
> Thanks, applied to master.

Probably, I have clang-3.8 here.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] ofpbuf: Fix parameter for const initializer.

2017-08-01 Thread Joe Stringer

On 1 August 2017 at 07:55, Ben Pfaff  wrote:
> On Mon, Jul 31, 2017 at 05:16:11PM -0700, Joe Stringer wrote:
>> Clang complains:
>>
>> In file included from include/openvswitch/cxxtest.cc:11:0:
>> ../include/openvswitch/ofpbuf.h: In function ‘ofpbuf 
>> ofpbuf_const_initializer(const void*, size_t)’:
>> ../include/openvswitch/ofpbuf.h:107:5: warning: narrowing conversion of 
>> ‘size’ from ‘size_t {aka long unsigned int}’ to ‘uint32_t {aka unsigned 
>> int}’ inside { } [-Wnarrowing]
>>  };
>>  ^
>> ../include/openvswitch/ofpbuf.h:107:5: warning: narrowing conversion of 
>> ‘size’ from ‘size_t {aka long unsigned int}’ to ‘uint32_t {aka unsigned 
>> int}’ inside { } [-Wnarrowing]
>>
>> This is because the ofpbuf struct's "size" parameter is a uint32_t,
>> while ofpbuf_const_initializer() takes a size_t for the size. Fix this
>> function to take a uint32_t instead.
>>
>> Signed-off-by: Joe Stringer 
>
> Thanks.  (We must have different clang versions.0
>
> Acked-by: Ben Pfaff 

Perhaps - Clang-4.0?

Thanks, applied to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] RFC: Let's branch for 2.8 today

2017-08-01 Thread Ben Pfaff

I'd like to create branch-2.8 today in preparation for releasing OVS 2.8
later in August.  I sent the patch that would lead off the branch a few
minutes ago:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/336544.html

According to our published schedule, we should have create branch-2.8 at
the beginning of July, but we delayed to allow some more features to
enter the 2.8 release.  There continue to be reasonable arguments along
these lines in favor of delaying branching further:

* Many DPDK patches have been posted but are still under review
  or awaiting application.

* Multiple series of patches leading up to NSH support have been
  posted but are still iterating through reviews.

The primary argument in favor of branching quickly is to keep to a
regular release cadence.  My understanding is that multiple downstream
projects rely on prompt Open vSwitch releases.  (This is mainly based on
speaking to Russell Bryant at Red Hat; he can fill in the details if
need be.)  I'd prefer to stick to our schedule for this reason.

I propose that we branch today, with the following further details:

* Continue to apply patches that add features, that as of now
  have already been posted and ready for review or iterating on
  reviews, until approximately this Friday.  These patches will
  need to be committed both to master and branch-2.8.

* As always, bug fix patches are acceptable on any branch at any
  time.

* Plan to release, given acceptable quality, during the last
  week of August.

Thoughts?

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] Prepare for 2.8.0.

2017-08-01 Thread Ben Pfaff

Signed-off-by: Ben Pfaff 
---
This is intended to be the first commit on a newly created branch-2.8.

 NEWS |  2 +-
 configure.ac |  2 +-
 debian/changelog | 88 ++--
 3 files changed, 81 insertions(+), 11 deletions(-)

diff --git a/NEWS b/NEWS
index facea0228d3a..a9e4d253b918 100644
--- a/NEWS
+++ b/NEWS
@@ -1,4 +1,4 @@
-Post-v2.7.0
+v2.8.0 - xx xxx 
 -
- ovs-ofctl:
  * ovs-ofctl can now accept and display port names in place of numbers.  By
diff --git a/configure.ac b/configure.ac
index 093b6ffed50e..aefe475bcac8 100644
--- a/configure.ac
+++ b/configure.ac
@@ -13,7 +13,7 @@
 # limitations under the License.
 
 AC_PREREQ(2.63)
-AC_INIT(openvswitch, 2.7.90, b...@openvswitch.org)
+AC_INIT(openvswitch, 2.8.0, b...@openvswitch.org)
 AC_CONFIG_SRCDIR([datapath/datapath.c])
 AC_CONFIG_MACRO_DIR([m4])
 AC_CONFIG_AUX_DIR([build-aux])
diff --git a/debian/changelog b/debian/changelog
index 45d566a872ad..e216bd4cabab 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,13 +1,83 @@
-openvswitch (2.7.90-1) unstable; urgency=low
-
+openvswitch (2.8.0-1) unstable; urgency=low
[ Open vSwitch team ]
-   * Use debhelper for DEB_VERSION_UPSTREAM.  Thanks to Clint Byrum
-  for this improvement.
-   * New upstream version
-   - Nothing yet!  Try NEWS...
-
- -- Open vSwitch team   Tue, 31 Jan 2017 17:51:32 -0700
-
+   - ovs-ofctl:
+ * ovs-ofctl can now accept and display port names in place of numbers.  By
+   default it always accepts names and in interactive use it displays them;
+   use --names or --no-names to override.  See ovs-ofctl(8) for details.
+ * "ovs-ofctl dump-flows" now accepts --no-stats to omit flow statistics.
+   - New ovs-dpctl command "ct-stats-show" to show connection tracking stats.
+   - Tunnels:
+ * Added support to set packet mark for tunnel endpoint using
+   `egress_pkt_mark` OVSDB option.
+ * When using Linux kernel datapath tunnels may be created using rtnetlink.
+   This will allow us to take advantage of new tunnel features without
+   having to make changes to the vport modules.
+   - EMC insertion probability is reduced to 1% and is configurable via
+ the new 'other_config:emc-insert-inv-prob' option.
+   - DPDK:
+ * DPDK log messages redirected to OVS logging subsystem.
+   Log level can be changed in a usual OVS way using
+   'ovs-appctl vlog' commands for 'dpdk' module. Lower bound
+   still can be configured via extra arguments for DPDK EAL.
+ * dpdkvhostuser ports are marked as deprecated.  They will be removed
+   in an upcoming release.
+   - IPFIX now provides additional counters:
+ * Total counters since metering process startup.
+ * Per-flow TCP flag counters.
+ * Multicast, broadcast, and unicast counters.
+   - New support for multiple VLANs (802.1ad or "QinQ"), including a new
+ "dot1q-tunnel" port VLAN mode.
+   - In ovn-vsctl and vtep-ctl, record UUIDs in commands may now be
+ abbreviated to 4 hex digits.
+   - Userspace Datapath:
+ * Added NAT support for userspace datapath.
+   - OVN:
+ * New built-in DNS support.
+ * IPAM for IPv4 can now exclude user-defined addresses from assignment.
+ * IPAM can now assign IPv6 addresses.
+ * Make the DHCPv4 router setting optional.
+ * Gratuitous ARP for NAT addresses on a distributed logical router.
+ * Allow ovn-controller SSL configuration to be obtained from vswitchd
+   database.
+ * ovn-trace now has basic support for tracing distributed firewalls.
+ * In ovn-nbctl and ovn-sbctl, record UUIDs in commands may now be
+   abbreviated to 4 hex digits.
+ * "ovn-sbctl lflow-list" can now print OpenFlow flows that correspond
+   to logical flows.
+ * Now uses OVSDB RBAC support to reduce impact of compromised hypervisors.
+ * Multiple chassis may now be specified for L3 gateways.  When more than
+   one chassis is specified, OVN will manage high availability for that
+   gateway.
+ * Add support for ACL logging.
+   - Tracing with ofproto/trace now traces through recirculation.
+   - OVSDB:
+ * New support for role-based access control (see ovsdb-server(1)).
+   - New commands 'stp/show' and 'rstp/show' (see ovs-vswitchd(8)).
+   - OpenFlow:
+ * All features required by OpenFlow 1.4 are now implemented, so
+   ovs-vswitchd now enables OpenFlow 1.4 by default (in addition to
+   OpenFlow 1.0 to 1.3).
+ * Increased support for OpenFlow 1.6 (draft).
+ * Bundles now support hashing by just nw_src or nw_dst.
+ * The "learn" action now supports a "limit" option (see ovs-ofctl(8)).
+ * The port status bit OFPPS_LIVE now reflects link aliveness.
+ * OpenFlow 1.5 packet-out is now supported.
+   - Fedora Packaging:
+ * OVN services are no longer restarted automatically after upgrade.
+   - Add --cleanup option to command 'ovs-appctl exit' (see

Re: [ovs-dev] FW: [patch_v1] docs/dpdk: Consolidate pmd-cpu-mask references.

2017-08-01 Thread Greg Rose


On 07/31/2017 04:56 PM, Darrell Ball wrote:

Hi Greg

Since you would be looking at the DPDK documentation these days; could you help 
review this patch ?

Thanks Darrell


The patch looks good to me Darrell.  Replying to your forward since my email 
client seems to have dumped the original patch on the ovs-dev list.

Reviewed-by: Greg rose 



-Original Message-
From:  on behalf of Darrell Ball 

Date: Thursday, July 27, 2017 at 4:50 PM
To: "d...@openvswitch.org" 
Subject: [ovs-dev] [patch_v1] docs/dpdk: Consolidate pmd-cpu-mask references.

 The DPDK introductory documentation has various references to
 pmd-cpu-mask, including a section devoted to it.  These parts of
 the documentation seeemed to have been written at different times
 and look like they were individually ported from other sources.
 They all include an example command which gets repeated several times.
 Here, we consolidate those referenes to make the documentation
 easier to maintain. At the same time, create linkages to the
 pmd-cpu-mask section from other sections to provide some level of
 coherence.

 Signed-off-by: Darrell Ball 
 ---
  Documentation/intro/install/dpdk.rst | 141 
---
  1 file changed, 65 insertions(+), 76 deletions(-)

 diff --git a/Documentation/intro/install/dpdk.rst 
b/Documentation/intro/install/dpdk.rst
 index a05aa1a..2e9cf8d 100644
 --- a/Documentation/intro/install/dpdk.rst
 +++ b/Documentation/intro/install/dpdk.rst
 @@ -237,20 +237,12 @@ or::
  $ ovs-vsctl --no-wait set Open_vSwitch . \
  other_config:dpdk-socket-mem="1024"

 -Similarly, if you wish to better scale the workloads across cores, then
 -multiple pmd threads can be created and pinned to CPU cores by explicity
 -specifying ``pmd-cpu-mask``. Cores are numbered from 0, so to spawn two 
pmd
 -threads and pin them to cores 1,2, run::
 -
 -$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6
 -
 -Refer to ovs-vswitchd.conf.db(5) for additional information on 
configuration
 -options.
 -
  .. note::
Changing any of these options requires restarting the ovs-vswitchd
application

 +See the section ``Performance Tuning`` for important DPDK customizations.
 +
  Validating
  --

 @@ -373,32 +365,6 @@ Now mount the huge pages, if not already done so::

  $ mount -t hugetlbfs -o pagesize=1G none /dev/hugepages

 -Enable HyperThreading
 -~
 -
 -With HyperThreading, or SMT, enabled, a physical core appears as two 
logical
 -cores. SMT can be utilized to spawn worker threads on logical cores of 
the same
 -physical core there by saving additional cores.
 -
 -With DPDK, when pinning pmd threads to logical cores, care must be taken 
to set
 -the correct bits of the ``pmd-cpu-mask`` to ensure that the pmd threads 
are
 -pinned to SMT siblings.
 -
 -Take a sample system configuration, with 2 sockets, 2 * 10 core 
processors, HT
 -enabled. This gives us a total of 40 logical cores. To identify the 
physical
 -core shared by two logical cores, run::
 -
 -$ cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list
 -
 -where ``N`` is the logical core number.
 -
 -In this example, it would show that cores ``1`` and ``21`` share the same
 -physical core. As cores are counted from 0, the ``pmd-cpu-mask`` can be 
used
 -to enable these two pmd threads running on these two logical cores (one
 -physical core) is::
 -
 -$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x22
 -
  Isolate Cores
  ~

 @@ -413,23 +379,6 @@ cmdline.
It has been verified that core isolation has minimal advantage due to 
mature
Linux scheduler in some circumstances.

 -NUMA/Cluster-on-Die
 -~~~
 -
 -Ideally inter-NUMA datapaths should be avoided where possible as packets 
will
 -go across QPI and there may be a slight performance penalty when compared 
with
 -intra NUMA datapaths. On Intel Xeon Processor E5 v3, Cluster On Die is
 -introduced on models that have 10 cores or more.  This makes it possible 
to
 -logically split a socket into two NUMA regions and again it is preferred 
where
 -possible to keep critical datapaths within the one cluster.
 -
 -It is good practice to ensure that threads that are in the datapath are 
pinned
 -to cores in the same NUMA area. e.g. pmd threads and QEMU vCPUs 
responsible for
 -forwarding. If DPDK is built with ``CONFIG_RTE_LIBRTE_VHOST_NUMA=y``, 
vHost
 -User ports automatically detect the NUMA socket of the QEMU vCPUs and 
will be
 -serviced by a PMD from the same node provided a core on this node is 
enabled in
 -the ``pmd-cpu-mask``. ``libnuma`` packages are req

Re: [ovs-dev] [PATCH] rhel: Fix typo in README.RHEL.rst

2017-08-01 Thread Russell Bryant

On Fri, Jul 28, 2017 at 3:02 PM, Timothy Redaelli  wrote:
> Replace systemctk with systemctl
>
> Signed-off-by: Timothy Redaelli 
> ---
>  rhel/README.RHEL.rst | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks, applied to master.

-- 
Russell Bryant
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] ovn-northd: Add native active-standby HA.

2017-08-01 Thread Russell Bryant

Add native support for active-standby HA in ovn-northd by having each
instance attempt to acquire an OVSDB lock.  Only the instance of
ovn-northd that currently holds the lock will make active changes to
the OVN databases.

Signed-off-by: Russell Bryant 
---
 NEWS|  1 +
 ovn/northd/ovn-northd.8.xml |  9 +
 ovn/northd/ovn-northd.c | 40 +++-
 3 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/NEWS b/NEWS
index facea0228..f3cdd2443 100644
--- a/NEWS
+++ b/NEWS
@@ -49,6 +49,7 @@ Post-v2.7.0
one chassis is specified, OVN will manage high availability for that
gateway.
  * Add support for ACL logging.
+ * ovn-northd now has native support for active-standby high availability.
- Tracing with ofproto/trace now traces through recirculation.
- OVSDB:
  * New support for role-based access control (see ovsdb-server(1)).
diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 1527e8a60..0d85ec0d2 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -72,6 +72,15 @@
   
 
 
+Active-Standby for High Availability
+
+  You may run ovn-northd more than once in an OVN deployment.
+  OVN will automatically ensure that only one of them is active at a time.
+  If multiple instances of ovn-northd are running and the
+  active ovn-northd fails, one of the hot standby instances
+  of ovn-northd will automatically take over.
+
+
 Logical Flow Table Structure
 
 
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 10e0c7ce0..3d2be4267 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -6531,6 +6531,12 @@ main(int argc, char *argv[])
 ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
 ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
 
+/* Ensure that only a single ovn-northd is active in the deployment by
+ * acquiring a lock called "ovn_northd" on the southbound database
+ * and then only performing DB transactions if the lock is held. */
+ovsdb_idl_set_lock(ovnsb_idl_loop.idl, "ovn_northd");
+bool had_lock = false;
+
 /* Main loop. */
 exiting = false;
 while (!exiting) {
@@ -6541,15 +6547,29 @@ main(int argc, char *argv[])
 .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
 };
 
-struct chassis_index chassis_index;
-chassis_index_init(&chassis_index, ctx.ovnsb_idl);
+if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
+VLOG_INFO("ovn-northd lock acquired. "
+  "This ovn-northd instance is now active.");
+had_lock = true;
+} else if (had_lock && !ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
+VLOG_INFO("ovn-northd lock lost. "
+  "This ovn-northd instance is now on standby.");
+had_lock = false;
+}
 
-ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop);
-ovnsb_db_run(&ctx, &ovnsb_idl_loop);
-if (ctx.ovnsb_txn) {
-check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
-check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
-check_and_update_rbac(&ctx);
+struct chassis_index chassis_index;
+bool destroy_chassis_index = false;
+if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
+chassis_index_init(&chassis_index, ctx.ovnsb_idl);
+destroy_chassis_index = true;
+
+ovnnb_db_run(&ctx, &chassis_index, &ovnsb_idl_loop);
+ovnsb_db_run(&ctx, &ovnsb_idl_loop);
+if (ctx.ovnsb_txn) {
+check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
+check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
+check_and_update_rbac(&ctx);
+}
 }
 
 unixctl_server_run(unixctl);
@@ -6565,7 +6585,9 @@ main(int argc, char *argv[])
 exiting = true;
 }
 
-chassis_index_destroy(&chassis_index);
+if (destroy_chassis_index) {
+chassis_index_destroy(&chassis_index);
+}
 }
 
 unixctl_server_destroy(unixctl);
-- 
2.13.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2 2/5] dpif-netdev: Avoid reading RSS hash when EMC is disabled.

2017-08-01 Thread Fischetti, Antonio



> -Original Message-
> From: O Mahony, Billy
> Sent: Monday, July 31, 2017 5:22 PM
> To: Fischetti, Antonio ; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v2 2/5] dpif-netdev: Avoid reading RSS hash when
> EMC is disabled.
> 
> Hi Antonio,
> 
> This is patch is definitely simpler than the original.
> 
> However on the original patch I suggested:
> 
> "If so it would be less disturbing to the existing code to just add a bool arg
> to dpif_netdev_packet_get_rss_hash() called do_not_check_recirc_depth and use
> that to return early (before the if (recirc_depth) check). Also in that case
> the patch would require none of the  conditional logic changes (neither the
> original or that suggested in this email) and should be able to just set the
> proposed do_not_check_recirc_depth based on md_is_valid."
> 
> I know you checked this and reported the performance gain was lower than with
> the v1 patch. We surmised that it was related to introducing a branch in the
> dpif_netdev_packet_get_rss_hash(). However there are many branches in this
> patch also.
> 
> Can you give details of how you are testing?
> * What is the traffic
> * the flows/rules and
> * how are you measuring the performance difference  (ie. cycles per packet or
> packet throughput or some other measure).

[Antonio]
I'm using a port-to-port setup, sending 1 UDP flow, 64B packets.
2 PMDs with 3 Tx queues.
The biggest difference is with case A where the 5-tuple hash is computed 
in software.

Case A) RSS Hash is disabled. I see for each side the Rx pkt rate:
 Orig OvS + previous patch#1:   1.281.29   =2.57 Mpps
 Orig OvS + previous patch#1 + this patch:  1.471.49   =2.96 Mpps   
  

Case B) In case RSS Hash is enabled I see more or less the same performance:
 Orig OvS + previous patch#1:   11.59   11.72  =23.31 Mpps
 Orig OvS + previous patch#1 + this patch:  11.45   11.84  =23.29 Mpps

Case C) RSS Hash is enabled, EMC is disabled.
 Orig OvS + previous patch#1 + No EMC:  7.687.65   =15.33 Mpps
 Orig OvS + previous patch#1 + this patch:  7.627.73   =15.35 Mpps

> 
> Apologies for going on about this but if we can't get the same effect with a
> two or three line change than a 20line change I think it'll be worth it.
> 
> One other comment below
> 
> Thanks,
> Billy.
> 
> 
> > -Original Message-
> > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > boun...@openvswitch.org] On Behalf Of antonio.fische...@intel.com
> > Sent: Wednesday, July 19, 2017 5:05 PM
> > To: d...@openvswitch.org
> > Subject: [ovs-dev] [PATCH v2 2/5] dpif-netdev: Avoid reading RSS hash when
> > EMC is disabled.
> >
> > When EMC is disabled the reading of RSS hash is skipped.
> 
> [[BO'M]] I think this is already the case with the existing code?  Just
> addition of OVS_UNLIKELY on the check.
[Antonio] No, in the existing code when EMC is disabled it just skips 
emc_lookup.

miniflow_extract(packet, &key->mf);
key->len = 0; /* Not computed yet. */
key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf);  <---  
Read Hash

/* If EMC is disabled skip emc_lookup */
flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key);  <--- Skip 
emc_lookup


> 
> > For packets that are not recirculated it retrieves the hash value without
> > considering the recirc id.
> >
> > This is mostly a preliminary change for the next patch in this series.
> >
> > Signed-off-by: Antonio Fischetti 
> > ---
> >  lib/dpif-netdev.c | 42 ++
> >  1 file changed, 34 insertions(+), 8 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 123e04a..9562827
> > 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -4472,6 +4472,22 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread
> > *pmd, struct dp_packet *packet_,  }
> >
> >  static inline uint32_t
> > +dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet,
> > +const struct miniflow *mf) {
> > +uint32_t hash;
> > +
> > +if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
> > +hash = dp_packet_get_rss_hash(packet);
> > +} else {
> > +hash = miniflow_hash_5tuple(mf, 0);
> > +dp_packet_set_rss_hash(packet, hash);
> > +}
> > +
> > +return hash;
> > +}
> > +
> > +static inline uint32_t
> >  dpif_netdev_packet_get_rss_hash(struct dp_packet *packet,
> >  const struct miniflow *mf)  { @@ -4572,7
> +4588,8 @@ static
> > inline size_t  emc_processing(struct dp_netdev_pmd_thread *pmd,
> > struct dp_packet_batch *packets_,
> > struct netdev_flow_key *keys,
> > -   struct packet_batch_per_flow batches[], size_t *n_batches)
> > +   struct packet_batch_per_flow batches[], size_t *n_batches,
> > +   bool md_is_valid)
> >  {
> >  struct emc_cache *flow_cac

[ovs-dev] [PATCH v3 6/6] dpif-netdev: Add ovs-appctl dpif-netdev/pmd-rxq-rebalance.

2017-08-01 Thread Kevin Traynor

Rxqs consumed processing cycles are used to improve the balance
of how rxqs are assigned to pmds. Currently some reconfiguration
is needed to perform a reassignment.

Add an ovs-appctl command to perform a new assignment in order
to balance based on the latest rxq processing cycle information.

Note: Jan requested this for testing purposes.

Suggested-by: Jan Scheurich 
Signed-off-by: Kevin Traynor 
---
 Documentation/howto/dpdk.rst |  5 -
 lib/dpif-netdev.c| 35 +++
 vswitchd/ovs-vswitchd.8.in   |  2 ++
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index c7bac4e..4642849 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -140,5 +140,8 @@ Core 7: Q4 (70%) | Q5 (10%)
 core 8: Q0 (60%) | Q0 (30%)
 
-Rxq to pmds assignment takes place whenever there are configuration changes.
+Rxq to pmds assignment takes place whenever there are configuration changes
+or can be triggered by using::
+
+$ ovs-appctl dpif-netdev/pmd-rxq-rebalance
 
 QoS
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 3f174c8..7c97c93 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -702,4 +702,6 @@ static inline bool emc_entry_alive(struct emc_entry *ce);
 static void emc_clear_entry(struct emc_entry *ce);
 
+static void dp_netdev_request_reconfigure(struct dp_netdev *dp);
+
 static void
 emc_cache_init(struct emc_cache *flow_cache)
@@ -995,4 +997,34 @@ sorted_poll_thread_list(struct dp_netdev *dp,
 
 static void
+dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, int argc,
+  const char *argv[], void *aux OVS_UNUSED)
+{
+struct ds reply = DS_EMPTY_INITIALIZER;
+struct dp_netdev *dp = NULL;
+
+ovs_mutex_lock(&dp_netdev_mutex);
+
+if (argc == 2) {
+dp = shash_find_data(&dp_netdevs, argv[1]);
+} else if (shash_count(&dp_netdevs) == 1) {
+/* There's only one datapath */
+dp = shash_first(&dp_netdevs)->data;
+}
+
+if (!dp) {
+ovs_mutex_unlock(&dp_netdev_mutex);
+unixctl_command_reply_error(conn,
+"please specify an existing datapath");
+return;
+}
+
+dp_netdev_request_reconfigure(dp);
+ovs_mutex_unlock(&dp_netdev_mutex);
+ds_put_cstr(&reply, "pmd rxq rebalance requested.\n");
+unixctl_command_reply(conn, ds_cstr(&reply));
+ds_destroy(&reply);
+}
+
+static void
 dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
  void *aux)
@@ -1073,4 +1105,7 @@ dpif_netdev_init(void)
  0, 1, dpif_netdev_pmd_info,
  (void *)&poll_aux);
+unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
+ 0, 1, dpif_netdev_pmd_rebalance,
+ NULL);
 return 0;
 }
diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in
index dfd209e..c18baf6 100644
--- a/vswitchd/ovs-vswitchd.8.in
+++ b/vswitchd/ovs-vswitchd.8.in
@@ -281,4 +281,6 @@ bridge statistics, only the values shown by the above 
command.
 For each pmd thread of the datapath \fIdp\fR shows list of queue-ids with
 port names, which this thread polls.
+.IP "\fBdpif-netdev/pmd-rxq-rebalance\fR [\fIdp\fR]"
+Reassigns rxqs to pmds in the datapath \fIdp\fR based on their current usage.
 .
 .so ofproto/ofproto-dpif-unixctl.man
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 5/6] dpif-netdev: Change pmd selection order.

2017-08-01 Thread Kevin Traynor

Up to his point rxqs are sorted by processing cycles they
consumed and assigned to pmds in a round robin manner.

Ian pointed out that on wrap around the most loaded pmd will be
the next one to be assigned an additional rxq and that it would be
better to reverse the pmd order when wraparound occurs.

In other words, change from assigning by rr to assigning in a forward
and reverse cycle through pmds.

Also, now that the algothim has finalised, document an example.

Suggested-by: Ian Stokes 
Signed-off-by: Kevin Traynor 
---
 Documentation/howto/dpdk.rst | 16 
 lib/dpif-netdev.c| 21 -
 tests/pmd.at |  2 +-
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index a969285..c7bac4e 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -124,4 +124,20 @@ will be used where known to assign rxqs with the highest 
consumption of
 processing cycles to different pmds.
 
+For example, in the case where here there are 5 rxqs and 3 cores (e.g. 3,7,8)
+available, and the measured usage of core cycles per rxq over the last
+interval is seen to be:
+
+- Queue #0: 30%
+- Queue #1: 80%
+- Queue #3: 60%
+- Queue #4: 70%
+- Queue #5: 10%
+
+The rxqs will be assigned to cores 3,7,8 in the following order:
+
+Core 3: Q1 (80%) |
+Core 7: Q4 (70%) | Q5 (10%)
+core 8: Q0 (60%) | Q0 (30%)
+
 Rxq to pmds assignment takes place whenever there are configuration changes.
 
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index a05e586..3f174c8 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3236,4 +3236,5 @@ struct rr_numa {
 
 int cur_index;
+bool idx_inc;
 };
 
@@ -3274,4 +3275,7 @@ rr_numa_list_populate(struct dp_netdev *dp, struct 
rr_numa_list *rr)
 numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa->pmds);
 numa->pmds[numa->n_pmds - 1] = pmd;
+/* At least one pmd so initialise curr_idx and idx_inc. */
+numa->cur_index = 0;
+numa->idx_inc = true;
 }
 }
@@ -3280,5 +3284,20 @@ static struct dp_netdev_pmd_thread *
 rr_numa_get_pmd(struct rr_numa *numa)
 {
-return numa->pmds[numa->cur_index++ % numa->n_pmds];
+int numa_idx = numa->cur_index;
+
+if (numa->idx_inc == true) {
+if (numa->cur_index == numa->n_pmds-1) {
+numa->idx_inc = false;
+} else {
+numa->cur_index++;
+}
+} else {
+if (numa->cur_index == 0) {
+numa->idx_inc = true;
+} else {
+numa->cur_index--;
+}
+}
+return numa->pmds[numa_idx];
 }
 
diff --git a/tests/pmd.at b/tests/pmd.at
index f95a016..73a8be1 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -54,5 +54,5 @@ m4_define([CHECK_PMD_THREADS_CREATED], [
 
 m4_define([SED_NUMA_CORE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id 
\)[[0-9]]*:/\1\2:/"])
-m4_define([SED_NUMA_CORE_QUEUE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id 
\)[[0-9]]*:/\1\2:/;s/\(queue-id: \)0 2 4 
6/\1/;s/\(queue-id: \)1 3 5 7/\1/"])
+m4_define([SED_NUMA_CORE_QUEUE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id 
\)[[0-9]]*:/\1\2:/;s/\(queue-id: \)1 2 5 
6/\1/;s/\(queue-id: \)0 3 4 7/\1/"])
 m4_define([DUMMY_NUMA], [--dummy-numa="0,0,0,0"])
 
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 4/6] dpif-netdev: Change rxq_scheduling to use rxq processing cycles.

2017-08-01 Thread Kevin Traynor

Previously rxqs were assigned to pmds by round robin in
port/queue order.

Now that we have the processing cycles used for existing rxqs,
use that information to try and produced a better balanced
distribution of rxqs across pmds. i.e. given multiple pmds, the
rxqs which have consumed the largest amount of processing cycles
will be placed on different pmds.

The rxqs are sorted by their processing cycles and assigned (in
sorted order) round robin across pmds.

Signed-off-by: Kevin Traynor 
---
 Documentation/howto/dpdk.rst |  7 +
 lib/dpif-netdev.c| 73 +++-
 2 files changed, 66 insertions(+), 14 deletions(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index af01d3e..a969285 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -119,4 +119,11 @@ After that PMD threads on cores where RX queues was pinned 
will become
   thread.
 
+If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores)
+automatically. The processing cycles that have been required for each rxq
+will be used where known to assign rxqs with the highest consumption of
+processing cycles to different pmds.
+
+Rxq to pmds assignment takes place whenever there are configuration changes.
+
 QoS
 ---
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 25a521a..a05e586 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3295,8 +3295,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
 }
 
+/* Sort Rx Queues by the processing cycles they are consuming. */
+static int
+rxq_cycle_sort(const void *a, const void *b)
+{
+struct dp_netdev_rxq * qa;
+struct dp_netdev_rxq * qb;
+
+qa = *(struct dp_netdev_rxq **) a;
+qb = *(struct dp_netdev_rxq **) b;
+
+if (dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_LAST) >=
+dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_LAST)) {
+return -1;
+}
+
+return 1;
+}
+
 /* Assign pmds to queues.  If 'pinned' is true, assign pmds to pinned
  * queues and marks the pmds as isolated.  Otherwise, assign non isolated
  * pmds to unpinned queues.
  *
+ * If 'pinned' is false queues will be sorted by processing cycles they are
+ * consuming and then assigned to pmds in round robin order.
+ *
  * The function doesn't touch the pmd threads, it just stores the assignment
  * in the 'pmd' member of each rxq. */
@@ -3306,18 +3327,14 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) 
OVS_REQUIRES(dp->port_mutex)
 struct dp_netdev_port *port;
 struct rr_numa_list rr;
-
-rr_numa_list_populate(dp, &rr);
+struct dp_netdev_rxq ** rxqs = NULL;
+int i, n_rxqs = 0;
+struct rr_numa *numa = NULL;
+int numa_id;
 
 HMAP_FOR_EACH (port, node, &dp->ports) {
-struct rr_numa *numa;
-int numa_id;
-
 if (!netdev_is_pmd(port->netdev)) {
 continue;
 }
 
-numa_id = netdev_get_numa_id(port->netdev);
-numa = rr_numa_list_lookup(&rr, numa_id);
-
 for (int qid = 0; qid < port->n_rxq; qid++) {
 struct dp_netdev_rxq *q = &port->rxqs[qid];
@@ -3337,17 +3354,45 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) 
OVS_REQUIRES(dp->port_mutex)
 }
 } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
-if (!numa) {
-VLOG_WARN("There's no available (non isolated) pmd thread "
-  "on numa node %d. Queue %d on port \'%s\' will "
-  "not be polled.",
-  numa_id, qid, netdev_get_name(port->netdev));
+if (n_rxqs == 0) {
+rxqs = xmalloc(sizeof *rxqs);
 } else {
-q->pmd = rr_numa_get_pmd(numa);
+rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
 }
+/* Store the queue. */
+rxqs[n_rxqs++] = q;
 }
 }
 }
 
+if (n_rxqs > 1) {
+/* Sort the queues in order of the processing cycles
+ * they consumed during their last pmd interval. */
+qsort(rxqs, n_rxqs, sizeof *rxqs, rxq_cycle_sort);
+}
+
+rr_numa_list_populate(dp, &rr);
+/* Assign the sorted queues to pmds in round robin. */
+for (i = 0; i < n_rxqs; i++) {
+numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
+numa = rr_numa_list_lookup(&rr, numa_id);
+if (!numa) {
+VLOG_WARN("There's no available (non isolated) pmd thread "
+  "on numa node %d. Queue %d on port \'%s\' will "
+  "not be polled.",
+  numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx),
+  netdev_get_name(rxqs[i]->port->netdev));
+continue;
+}
+rxqs[i]->pmd = rr_numa_get_pmd(numa);
+VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
+  "rx queue %d (mea

[ovs-dev] [PATCH v3 3/6] dpif-netdev: Count the rxq processing cycles for an rxq.

2017-08-01 Thread Kevin Traynor

Count the cycles used for processing an rxq during the
pmd rxq interval. As this is an in flight counter and
pmds run independently, also store the total cycles used
during the last full interval.

Signed-off-by: Kevin Traynor 
---
 lib/dpif-netdev.c | 65 ++-
 1 file changed, 60 insertions(+), 5 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 0a4daf9..25a521a 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -181,4 +181,8 @@ struct emc_cache {
 #define DPCLS_OPTIMIZATION_INTERVAL 1000
 
+/* Time in ms of the interval in which rxq processing cycles used in
+ * rxq to pmd assignments is measured and stored. */
+#define PMD_RXQ_INTERVAL 1000
+
 struct dpcls {
 struct cmap_node node;  /* Within dp_netdev_pmd_thread.classifiers */
@@ -554,4 +558,6 @@ struct dp_netdev_pmd_thread {
 /* Periodically sort subtable vectors according to hit frequencies */
 long long int next_optimization;
+/* Periodically store the processing cycles used for each rxq. */
+long long int rxq_interval;
 
 /* Statistics. */
@@ -678,5 +684,13 @@ static void pmd_load_cached_ports(struct 
dp_netdev_pmd_thread *pmd)
 OVS_REQUIRES(pmd->port_mutex);
 static inline void
-dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd);
+dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
+   struct polled_queue *poll_list, int poll_cnt);
+static void
+dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx,
+ enum rxq_cycles_counter_type type,
+ unsigned long long cycles);
+static uint64_t
+dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx,
+ enum rxq_cycles_counter_type type);
 static void
 dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd,
@@ -3099,4 +3113,21 @@ cycles_count_intermediate(struct dp_netdev_pmd_thread 
*pmd,
 }
 
+static void
+dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx,
+ enum rxq_cycles_counter_type type,
+ unsigned long long cycles)
+{
+   atomic_store_relaxed(&rx->cycles[type], cycles);
+}
+
+static uint64_t
+dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx,
+ enum rxq_cycles_counter_type type)
+{
+unsigned long long tmp;
+atomic_read_relaxed(&rx->cycles[type], &tmp);
+return tmp;
+}
+
 static int
 dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
@@ -3143,5 +3174,5 @@ port_reconfigure(struct dp_netdev_port *port)
 {
 struct netdev *netdev = port->netdev;
-int i, err;
+int i, err, last_nrxq;
 
 port->need_reconfigure = false;
@@ -3152,4 +3183,5 @@ port_reconfigure(struct dp_netdev_port *port)
 port->rxqs[i].rx = NULL;
 }
+last_nrxq = port->n_rxq;
 port->n_rxq = 0;
 
@@ -3172,4 +3204,9 @@ port_reconfigure(struct dp_netdev_port *port)
 for (i = 0; i < netdev_n_rxq(netdev); i++) {
 port->rxqs[i].port = port;
+if (i >= last_nrxq) {
+/* Only reset cycle stats for new queues */
+dp_netdev_rxq_set_cycles(&port->rxqs[i], RXQ_CYCLES_PROC_CURR, 0);
+dp_netdev_rxq_set_cycles(&port->rxqs[i], RXQ_CYCLES_PROC_LAST, 0);
+}
 err = netdev_rxq_open(netdev, &port->rxqs[i].rx, i);
 if (err) {
@@ -3765,5 +3802,5 @@ reload:
 dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
poll_list[i].port_no);
-cycles_count_intermediate(pmd, NULL,
+cycles_count_intermediate(pmd, poll_list[i].rxq,
   process_packets ? PMD_CYCLES_PROCESSING
   : PMD_CYCLES_IDLE);
@@ -3776,5 +3813,5 @@ reload:
 
 coverage_try_clear();
-dp_netdev_pmd_try_optimize(pmd);
+dp_netdev_pmd_try_optimize(pmd, poll_list, poll_cnt);
 if (!ovsrcu_try_quiesce()) {
 emc_cache_slow_sweep(&pmd->flow_cache);
@@ -4221,4 +4258,5 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, 
struct dp_netdev *dp,
 cmap_init(&pmd->classifiers);
 pmd->next_optimization = time_msec() + DPCLS_OPTIMIZATION_INTERVAL;
+pmd->rxq_interval = time_msec() + PMD_RXQ_INTERVAL;
 hmap_init(&pmd->poll_list);
 hmap_init(&pmd->tx_ports);
@@ -5654,8 +5692,25 @@ dpcls_sort_subtable_vector(struct dpcls *cls)
 
 static inline void
-dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd)
+dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
+   struct polled_queue *poll_list, int poll_cnt)
 {
 struct dpcls *cls;
 long long int now = time_msec();
+int i;
+uint64_t rxq_cyc_curr;
+
+if (now > pmd->rxq_interval) {
+/* Get the cycles that were used to process each queue and store. */
+for (i = 0; i < poll_cnt; i++) {
+rxq_cyc_curr = dp_netdev_rxq_get_

[ovs-dev] [PATCH v3 2/6] dpif-netdev: Add rxq processing cycle counters.

2017-08-01 Thread Kevin Traynor

Add two counters to dp_netdev_rxq which will be used for storing the
processing cycles of an rxq. Processing cycles will be stored in reference
to a defined interval. One counter is used for storing cycles during the
current in progress interval, while the other is used to store the cycles
of the last fully complete interval.

cycles_count_intermediate was used to count cycles for a pmd. With some small
additions we can also use it to count the cycles used for processing an rxq.

Signed-off-by: Kevin Traynor 
---
 lib/dpif-netdev.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 62a3bce..0a4daf9 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -336,4 +336,11 @@ enum pmd_cycles_counter_type {
 };
 
+enum rxq_cycles_counter_type {
+RXQ_CYCLES_PROC_CURR,  /* Cycles spent successfully polling and
+  processing polled packets */
+RXQ_CYCLES_PROC_LAST,
+RXQ_N_CYCLES
+};
+
 #define XPS_TIMEOUT_MS 500LL
 
@@ -347,4 +354,5 @@ struct dp_netdev_rxq {
   particular core. */
 struct dp_netdev_pmd_thread *pmd;  /* pmd thread that polls this queue. */
+atomic_ullong cycles[RXQ_N_CYCLES];
 };
 
@@ -671,5 +679,4 @@ static void pmd_load_cached_ports(struct 
dp_netdev_pmd_thread *pmd)
 static inline void
 dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd);
-
 static void
 dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd,
@@ -3077,4 +3084,5 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd,
 static inline void
 cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
+  struct dp_netdev_rxq *rxq,
   enum pmd_cycles_counter_type type)
 OVS_NO_THREAD_SAFETY_ANALYSIS
@@ -3085,4 +3093,8 @@ cycles_count_intermediate(struct dp_netdev_pmd_thread 
*pmd,
 
 non_atomic_ullong_add(&pmd->cycles.n[type], interval);
+if (rxq && (type == PMD_CYCLES_PROCESSING)) {
+/* Add to the amount of current processing cycles. */
+non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval);
+}
 }
 
@@ -3589,5 +3601,5 @@ dpif_netdev_run(struct dpif *dpif)
port->rxqs[i].rx,
port->port_no);
-cycles_count_intermediate(non_pmd, process_packets ?
+cycles_count_intermediate(non_pmd, NULL, process_packets ?
PMD_CYCLES_PROCESSING
  : PMD_CYCLES_IDLE);
@@ -3753,5 +3765,5 @@ reload:
 dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
poll_list[i].port_no);
-cycles_count_intermediate(pmd,
+cycles_count_intermediate(pmd, NULL,
   process_packets ? PMD_CYCLES_PROCESSING
   : PMD_CYCLES_IDLE);
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 1/6] dpif-netdev: Change polled_queue to use dp_netdev_rxq.

2017-08-01 Thread Kevin Traynor

Soon we will want to store processing cycle counts in the dp_netdev_rxq,
so use that as a basis for the polled_queue that pmd_thread_main uses.

Signed-off-by: Kevin Traynor 
---
 lib/dpif-netdev.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 47a9fa0..62a3bce 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -482,5 +482,5 @@ struct dp_netdev_pmd_cycles {
 
 struct polled_queue {
-struct netdev_rxq *rx;
+struct dp_netdev_rxq *rxq;
 odp_port_t port_no;
 };
@@ -3698,5 +3698,5 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread 
*pmd,
 i = 0;
 HMAP_FOR_EACH (poll, node, &pmd->poll_list) {
-poll_list[i].rx = poll->rxq->rx;
+poll_list[i].rxq = poll->rxq;
 poll_list[i].port_no = poll->rxq->port->port_no;
 i++;
@@ -3735,6 +3735,6 @@ reload:
 for (i = 0; i < poll_cnt; i++) {
VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
-pmd->core_id, netdev_rxq_get_name(poll_list[i].rx),
-netdev_rxq_get_queue_id(poll_list[i].rx));
+pmd->core_id, netdev_rxq_get_name(poll_list[i].rxq->rx),
+netdev_rxq_get_queue_id(poll_list[i].rxq->rx));
 }
 
@@ -3751,5 +3751,5 @@ reload:
 for (i = 0; i < poll_cnt; i++) {
 process_packets =
-dp_netdev_process_rxq_port(pmd, poll_list[i].rx,
+dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
poll_list[i].port_no);
 cycles_count_intermediate(pmd,
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v3 0/6] OVS-DPDK rxq to pmd assignment improvements.

2017-08-01 Thread Kevin Traynor

For the DPDK datapath, by default rxqs are assigned to available pmds
in round robin order with no weight or priority.

It can happen that some very busy queues are handled by one pmd which
does not have enough cycles to prevent packets being dropped on them.
While at the same time another pmd which handles queues with no traffic
on them is essentially idle.

Rxq to pmd assignment happens as a result of a number of events and
when it does, the same unweighted round robin approach is applied
each time.

This patchset proposes to improve the round robin nature of rxq to pmd
assignment by counting the processing cycles used by the rxqs during
their operation and incorporating that data into assignment.

Before assigning in a round robin manner, the rxqs will be sorted in
order of the processing cycles they have been consuming. Assuming
multiple pmds, this ensures that the rxqs measured to be using the
most processing cycles will be assigned to different cores.

In some cases the measured cycles for an rxq may be not available as
the rxq is new or may not be useful for assignment as traffic patterns
may change.  In those cases the code will essentially fallback to being
round round similar to what currently exists. However, in the case
where data is available and a reliable indication of future rxq cycles
consumption, rxq to pmd distribution will be much improved.

V2 -> V3

Dropped v2 1/7 as not reusing dpcls optimisation interval anymore

2/6
Moved unused functions to 3/6 to avoid compiler warning

3/6
Made pmd rxq interval independent from dpcls opt interval

4/6
Moved docs about rebalance command to when it is available in 6/6
Added logging info for pmd to rxq assignment

5/6
Added an example to docs

6/6
Noted in commit msg that Jan requested this for testing purposes

V1 -> V2

Dropped Ciara's patch to change how pmd cycles are counted as it merged.

6/7: Rebased unit tests.

RFC -> PATCH

Changed order of patches, squashed two and added some new patches.

1/8
Added Ciara's patch into this series so it can be applied in one go.
It's a standalone patch but has not merged yet.

2/8
No change.

3/8
No change.

4/8
Merged patch that adds counters with patch that updates them
through cycle_count_intermediate to remove unsed function warnings.
Consolidated counter functions by using enum as suggested by Ian Stokes.
Small changes to use new counter functions.

5/8
Minor changes to use new counter functions.

6/8
Minor changes to use new counter functions.
Added docs.

7/8


Kevin Traynor (6):
  dpif-netdev: Change polled_queue to use dp_netdev_rxq.
  dpif-netdev: Add rxq processing cycle counters.
  dpif-netdev: Count the rxq processing cycles for an rxq.
  dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
  dpif-netdev: Change pmd selection order.
  dpif-netdev: Add ovs-appctl dpif-netdev/pmd-rxq-rebalance.

 Documentation/howto/dpdk.rst |  26 +
 lib/dpif-netdev.c| 220 +--
 tests/pmd.at |   2 +-
 vswitchd/ovs-vswitchd.8.in   |   2 +
 4 files changed, 222 insertions(+), 28 deletions(-)

-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v4 1/2] OF support and translation of generic encap and decap

2017-08-01 Thread Ben Pfaff

On Tue, Aug 01, 2017 at 08:06:59PM +0800, Yi Yang wrote:
> From: Jan Scheurich 
> 
> This commit adds support for the OpenFlow actions generic encap
> and decap (as specified in ONF EXT-382) to the OVS control plane.
> 
> CLI syntax for encap action with properties:
>   encap(hdr=)
>   encap(hdr=,
> prop(class=,type=,val=),
> prop(class=,type=,val()))
> 
> CLI syntax for decap action:
>   decap()
>   decap(packet_type(ns=,type=))
> 
> The first header supported for encap and decap is "ethernet" to convert
> packets between packet_type (1,Ethertype) and (0,0).
> 
> This commit also implements a skeleton for the translation of generic
> encap and decap actions in ofproto-dpif and adds support to encap and
> decap an Ethernet header.
> 
> In general translation of encap commits pending actions and then rewrites
> struct flow in accordance with the new packet type and header. In the
> case of encap(ethernet) it suffices to change the packet type from
> (1, Ethertype) to (0,0) and set the dl_type accordingly. A new
> pending_encap flag in xlate ctx is set to mark that an corresponding
> datapath encap action must be triggered at the next commit. In the
> case of encap(ethernet) ofproto generetas a push_eth action.
> 
> The general case for translation of decap() is to emit a datapath action
> to decap the current outermost header and then recirculate the packet
> to reparse the inner headers. In the special case of an Ethernet packet,
> decap() just changes the packet type from (0,0) to (1, dl_type) without
> a need to recirculate. The emission of the pop_eth action for the
> datapath is postponed to the next commit.
> 
> Hence encap(ethernet) and decap() on an Ethernet packet are OF octions
> that only incur a cost in the dataplane when a modifed packet is
> actually committed, e.g. because it is sent out. They can freely be
> used for normalizing the packet type in the OF pipeline without
> degrading performance.
> 
> Signed-off-by: Jan Scheurich 
> Signed-off-by: Yi Yang 
> Signed-off-by: Zoltan Balogh 
> Co-authored-by: Zoltan Balogh 

Thanks for iterating so quickly!

Besides the syntax for properties, which I still think should be
simplified to e.g. nsh(md_type=1), I have only a few remaining comments.

I don't think there's any value in the n_props member in
nx_action_encap.  (This is about the OpenFlow wire format now, not the
internal format.)  The properties are the whole tail of the structure
and having a count doesn't make anything easier.  Can you remove it?  It
will allow us to drop 8 bytes from the action structure due to padding.
(In case it can't be removed, I'm providing a spelling fix.)

decode_ed_prop() still doesn't check the length properly, so I'm
providing a fix.

I'm appending my suggested incremental.

Thanks again!

--8<--cut here-->8--

diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
index d0437f20922a..c7d47eb5dd71 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -4036,7 +4036,7 @@ struct nx_action_encap {
 ovs_be16 subtype;  /* NXAST_ENCAP. */
 ovs_be16 hdr_size; /* Header size in bytes, 0 = 'not specified'.*/
 ovs_be32 new_pkt_type; /* Header type to add and PACKET_TYPE of result. */
-ovs_be16 n_props;  /* Number of the following endecap properties. */
+ovs_be16 n_props;  /* Number of the following encap properties. */
 uint8_t pad[6];/* Align to 8 bytes boundary */
 struct ofp_ed_prop_header props[];  /* Encap TLV properties. */
 };
@@ -4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap *nad,
enum ofp_version ofp_version OVS_UNUSED,
struct ofpbuf *ofpacts)
 {
+if (ntohs(nad->len) > sizeof *nad) {
+/* No properties supported yet. */
+return OFPERR_OFPBPC_BAD_TYPE;
+}
+
 struct ofpact_decap * decap;
 
 decap = ofpact_put_DECAP(ofpacts);
diff --git a/lib/ofp-ed-props.c b/lib/ofp-ed-props.c
index 260adc30acf0..6bba3ac7d982 100644
--- a/lib/ofp-ed-props.c
+++ b/lib/ofp-ed-props.c
@@ -32,6 +32,9 @@ decode_ed_prop(const struct ofp_ed_prop_header **ofp_prop,
 uint16_t prop_class = ntohs((*ofp_prop)->prop_class);
 size_t len = (*ofp_prop)->len;
 size_t pad_len = ROUND_UP(len, 8);
+if (pad_len > *remaining) {
+return OFPERR_OFPBAC_BAD_LEN;
+}
 
 switch (prop_class) {
 default:
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2 6/7] dpif-netdev: Change pmd selection order.

2017-08-01 Thread Kevin Traynor

On 07/22/2017 03:52 PM, Stokes, Ian wrote:
>> Up to his point rxqs are sorted by processing cycles they consumed and
>> assigned to pmds in a round robin manner.
>>
>> Ian pointed out that on wrap around the most loaded pmd will be the next
>> one to be assigned an additional rxq and that it would be better to
>> reverse the pmd order when wraparound occurs.
>>
>> In other words, change from assigning by rr to assigning in a forward and
>> reverse cycle through pmds.
>>
>> Suggested-by: Ian Stokes 
>> Signed-off-by: Kevin Traynor 
>> ---
>>  lib/dpif-netdev.c | 21 -
>>  tests/pmd.at  |  2 +-
>>  2 files changed, 21 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 7663dba..31f0ed4
>> 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -3230,4 +3230,5 @@ struct rr_numa {
>>
>>  int cur_index;
>> +bool idx_inc;
>>  };
>>
>> @@ -3268,4 +3269,7 @@ rr_numa_list_populate(struct dp_netdev *dp, struct
>> rr_numa_list *rr)
>>  numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa-
>>> pmds);
>>  numa->pmds[numa->n_pmds - 1] = pmd;
>> +/* At least one pmd so initialise curr_idx and idx_inc. */
>> +numa->cur_index = 0;
>> +numa->idx_inc = true;
>>  }
>>  }
>> @@ -3274,5 +3278,20 @@ static struct dp_netdev_pmd_thread *
>> rr_numa_get_pmd(struct rr_numa *numa)  {
>> -return numa->pmds[numa->cur_index++ % numa->n_pmds];
>> +int numa_idx = numa->cur_index;
>> +
>> +if (numa->idx_inc == true) {
>> +if (numa->cur_index == numa->n_pmds-1) {
>> +numa->idx_inc = false;
>> +} else {
>> +numa->cur_index++;
>> +}
>> +} else {
>> +if (numa->cur_index == 0) {
>> +numa->idx_inc = true;
>> +} else {
>> +numa->cur_index--;
>> +}
>> +}
>> +return numa->pmds[numa_idx];
>>  }
>>
> I like the above approach, as there's a bit more complexity introduced into 
> the numa selection process, I'm wondering does it make sense to add a 
> VLOG_DBG message here regarding the current index and index 
> increments/decrements?
> 
> I'm just thinking that this could be an area that has an impact on 
> performance and could be useful for someone to help debug the assignments.
> 

Good idea. I didn't add it here as this function just knows about the
pmd index. I've added it to the rxq_scheduling function as part of 4/6
where all the information about the pmds, queues, cycles etc. is
available and put it at info level.

>> diff --git a/tests/pmd.at b/tests/pmd.at index f95a016..73a8be1 100644
>> --- a/tests/pmd.at
>> +++ b/tests/pmd.at
>> @@ -54,5 +54,5 @@ m4_define([CHECK_PMD_THREADS_CREATED], [
>>
>>  m4_define([SED_NUMA_CORE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id
>> \)[[0-9]]*:/\1\2:/"])
>> -m4_define([SED_NUMA_CORE_QUEUE_PATTERN], ["s/\(numa_id \)[[0-9]]*\(
>> core_id \)[[0-9]]*:/\1\2:/;s/\(queue-id: \)0 2 4
>> 6/\1/;s/\(queue-id: \)1 3 5 7/\1/"])
>> +m4_define([SED_NUMA_CORE_QUEUE_PATTERN], ["s/\(numa_id \)[[0-9]]*\(
>> +core_id \)[[0-9]]*:/\1\2:/;s/\(queue-id: \)1 2 5
>> +6/\1/;s/\(queue-id: \)0 3 4 7/\1/"])
>>  m4_define([DUMMY_NUMA], [--dummy-numa="0,0,0,0"])
>>
>> --
>> 1.8.3.1
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2 5/7] dpif-netdev: Change rxq_scheduling to use rxq processing cycles.

2017-08-01 Thread Kevin Traynor

On 07/22/2017 03:52 PM, Stokes, Ian wrote:
>> Previously rxqs were assigned to pmds by round robin in port/queue order.
>>
>> Now that we have the processing cycles used for existing rxqs, use that
>> information to try and produced a better balanced distribution of rxqs
>> across pmds. i.e. given multiple pmds, the rxqs which have consumed the
>> largest amount of processing cycles will be placed on different pmds.
>>
>> The rxqs are sorted by their processing cycles and assigned (in sorted
>> order) round robin across pmds.
>>
>> Signed-off-by: Kevin Traynor 
>> ---
>>  Documentation/howto/dpdk.rst | 10 +++
>>  lib/dpif-netdev.c| 67 +++
>> -
>>  2 files changed, 63 insertions(+), 14 deletions(-)
>>
>> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
>> index af01d3e..d9ac8d3 100644
>> --- a/Documentation/howto/dpdk.rst
>> +++ b/Documentation/howto/dpdk.rst
>> @@ -119,4 +119,14 @@ After that PMD threads on cores where RX queues was
>> pinned will become
>>thread.
>>
>> +If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds
>> +automatically. The processing cycles that have been required for each
>> +rxq will be used where known to assign rxqs with the highest
>> +consumption of processing cycles to different pmds.
>> +
>> +Rxq to pmds assignment takes place whenever there are configuration
>> +changes or can be triggered by using::
>> +
>> +$ ovs-appctl dpif-netdev/pmd-rxq-rebalance
>> +
> I think an illustrated example of the expected assignment behavior would be 
> beneficial here to give users a feel for what's happening under the hood.
> 
> Something simple like how 4 queues would be distributed over 3 pmds, although 
> this change might make more sense to be rolled in with patch 6 when the pmd 
> selection process is modified.
> 

Sure. Yeah, I agree it makes more sense when the algorithm is finalized,
so I added it there.

>>  QoS
>>  ---
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 185de9b..7663dba
>> 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -3289,8 +3289,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr)  }
>>
>> +/* Sort Rx Queues by the processing cycles they are consuming. */
>> +static int rxq_cycle_sort(const void *a, const void *b) {
>> +struct dp_netdev_rxq * qa;
>> +struct dp_netdev_rxq * qb;
>> +
>> +qa = *(struct dp_netdev_rxq **) a;
>> +qb = *(struct dp_netdev_rxq **) b;
>> +
>> +if (dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_LAST) >=
>> +dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_LAST)) {
>> +return -1;
>> +}
>> +
>> +return 1;
>> +}
>> +
>>  /* Assign pmds to queues.  If 'pinned' is true, assign pmds to pinned
>>   * queues and marks the pmds as isolated.  Otherwise, assign non isolated
>>   * pmds to unpinned queues.
>>   *
>> + * If 'pinned' is false queues will be sorted by processing cycles they
>> + are
>> + * consuming and then assigned to pmds in round robin order.
>> + *
>>   * The function doesn't touch the pmd threads, it just stores the
>> assignment
>>   * in the 'pmd' member of each rxq. */
>> @@ -3300,18 +3321,14 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned)
>> OVS_REQUIRES(dp->port_mutex)
>>  struct dp_netdev_port *port;
>>  struct rr_numa_list rr;
>> -
>> -rr_numa_list_populate(dp, &rr);
>> +struct dp_netdev_rxq ** rxqs = NULL;
>> +int i, n_rxqs = 0;
>> +struct rr_numa *numa = NULL;
>> +int numa_id;
>>
>>  HMAP_FOR_EACH (port, node, &dp->ports) {
>> -struct rr_numa *numa;
>> -int numa_id;
>> -
>>  if (!netdev_is_pmd(port->netdev)) {
>>  continue;
>>  }
>>
>> -numa_id = netdev_get_numa_id(port->netdev);
>> -numa = rr_numa_list_lookup(&rr, numa_id);
>> -
>>  for (int qid = 0; qid < port->n_rxq; qid++) {
>>  struct dp_netdev_rxq *q = &port->rxqs[qid]; @@ -3331,17
>> +3348,39 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned)
>> OVS_REQUIRES(dp->port_mutex)
>>  }
>>  } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
>> -if (!numa) {
>> -VLOG_WARN("There's no available (non isolated) pmd
>> thread "
>> -  "on numa node %d. Queue %d on port \'%s\'
>> will "
>> -  "not be polled.",
>> -  numa_id, qid, netdev_get_name(port-
>>> netdev));
>> +if (n_rxqs == 0) {
>> +rxqs = xmalloc(sizeof *rxqs);
>>  } else {
>> -q->pmd = rr_numa_get_pmd(numa);
>> +rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
>>  }
>> +/* Store the queue. */
>> +rxqs[n_rxqs++] = q;
>>  }
>>  }
>>  }
>>
>> +if (n_rxqs > 1) {
>> +/* Sort the queues in order of the process

Re: [ovs-dev] [PATCH v2 3/7] dpif-netdev: Add rxq processing cycle counters.

2017-08-01 Thread Kevin Traynor

On 07/22/2017 03:51 PM, Stokes, Ian wrote:
>> Add two counters to dp_netdev_rxq which will be used for storing the
>> processing cycles of an rxq. Processing cycles will be stored in reference
>> to a defined interval. One counter is used for storing cycles during the
>> current in progress interval, while the other is used to store the cycles
>> of the last fully complete interval.
>>
>> cycles_count_intermediate was used to count cycles for a pmd. With some
>> small additions we can also use it to count the cycles used for processing
>> an rxq.
>>
>> Signed-off-by: Kevin Traynor 
> 
> Think I flagged this before but OVS doesn't compile cleanly after applying 
> this patch.
> 
> lib/dpif-netdev.c:3117:1: error: 'dp_netdev_rxq_get_cycles' defined but not 
> used [-Werror=unused-function]
>  dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx,
>  ^~~~
> lib/dpif-netdev.c:3109:1: error: 'dp_netdev_rxq_set_cycles' defined but not 
> used [-Werror=unused-function]
>  dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx,
> 
> Consider rolling it into patch 4 of the series to avoid the issue.

Hmm, I thought I had fixed this but obviously not :(

As you suggested, I moved them to the next patch where they are first used.

>> ---
>>  lib/dpif-netdev.c | 42 +++---
>>  1 file changed, 39 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 0b8a0c8..273db38
>> 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -336,4 +336,11 @@ enum pmd_cycles_counter_type {  };
>>
>> +enum rxq_cycles_counter_type {
>> +RXQ_CYCLES_PROC_CURR,  /* Cycles spent successfully polling and
>> +  processing polled packets */
>> +RXQ_CYCLES_PROC_LAST,
>> +RXQ_N_CYCLES
>> +};
>> +
>>  #define XPS_TIMEOUT_MS 500LL
>>
>> @@ -347,4 +354,5 @@ struct dp_netdev_rxq {
>>particular core. */
>>  struct dp_netdev_pmd_thread *pmd;  /* pmd thread that polls this
>> queue. */
>> +atomic_ullong cycles[RXQ_N_CYCLES]; /* Processing cycles. */
>>  };
> Very minor nit but not crazy about the alignment of the comments within the 
> struct above.
> 

I just deleted the comment as it's not really necessary and it won't
line up with the rest correctly.

>>
>> @@ -671,5 +679,11 @@ static void pmd_load_cached_ports(struct
>> dp_netdev_pmd_thread *pmd)  static inline void
>> dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd);
>> -
>> +static void
>> +dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx,
>> + enum rxq_cycles_counter_type type,
>> + unsigned long long cycles); static uint64_t
>> +dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx,
>> + enum rxq_cycles_counter_type type);
>>  static void
>>  dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd, @@
>> -3077,4 +3091,5 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd,
>> static inline void  cycles_count_intermediate(struct dp_netdev_pmd_thread
>> *pmd,
>> +  struct dp_netdev_rxq *rxq,
>>enum pmd_cycles_counter_type type)
>>  OVS_NO_THREAD_SAFETY_ANALYSIS
>> @@ -3085,4 +3100,25 @@ cycles_count_intermediate(struct
>> dp_netdev_pmd_thread *pmd,
>>
>>  non_atomic_ullong_add(&pmd->cycles.n[type], interval);
>> +if (rxq && (type == PMD_CYCLES_PROCESSING)) {
>> +/* Add to the amount of current processing cycles. */
>> +non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR],
>> interval);
>> +}
>> +}
>> +
>> +static void
>> +dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx,
>> + enum rxq_cycles_counter_type type,
>> + unsigned long long cycles) {
>> +   atomic_store_relaxed(&rx->cycles[type], cycles); }
>> +
>> +static uint64_t
>> +dp_netdev_rxq_get_cycles(struct dp_netdev_rxq *rx,
>> + enum rxq_cycles_counter_type type) {
>> +unsigned long long tmp;
>> +atomic_read_relaxed(&rx->cycles[type], &tmp);
>> +return tmp;
>>  }
>>
>> @@ -3589,5 +3625,5 @@ dpif_netdev_run(struct dpif *dpif)
>> port->rxqs[i].rx,
>> port->port_no);
>> -cycles_count_intermediate(non_pmd, process_packets ?
>> +cycles_count_intermediate(non_pmd, NULL,
>> process_packets ?
>>
>> PMD_CYCLES_PROCESSING
>>   : PMD_CYCLES_IDLE);
>> @@ -3753,5 +3789,5 @@ reload:
>>  dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
>> poll_list[i].port_no);
>> -cycles_count_intermediate(pmd,
>> +cycles_count_intermediate(pmd, NULL,
>>process_packets ?
>> PMD_CYCLES_PROCESSING

Re: [ovs-dev] [PATCH v2 1/7] dpif-netdev: Make dpcls optimization interval more generic.

2017-08-01 Thread Kevin Traynor

On 07/22/2017 03:50 PM, Stokes, Ian wrote:
>> So far the interval was only used for dpcls optimization.
>> Soon, we will use it for storing rxq cycles so make the names more
>> generic. Also, set the interval regardless of whether dpcls optimization
>> has occurred, as the optimization interval will need to be consistent
>> across pmds.
>>
>> Signed-off-by: Kevin Traynor 
>> ---
>>  lib/dpif-netdev.c | 10 +-
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 47a9fa0..aa8c05d
>> 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -178,6 +178,6 @@ struct emc_cache {
>>  /* Simple non-wildcarding single-priority classifier. */
>>
>> -/* Time in ms between successive optimizations of the dpcls subtable
>> vector */ -#define DPCLS_OPTIMIZATION_INTERVAL 1000
>> +/* Time in ms between successive optimizations of the pmd */ #define
>> +PMD_OPTIMIZATION_INTERVAL 1000
>>
>>  struct dpcls {
>> @@ -4208,5 +4208,5 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread
>> *pmd, struct dp_netdev *dp,
>>  cmap_init(&pmd->flow_table);
>>  cmap_init(&pmd->classifiers);
>> -pmd->next_optimization = time_msec() + DPCLS_OPTIMIZATION_INTERVAL;
>> +pmd->next_optimization = time_msec() + PMD_OPTIMIZATION_INTERVAL;
> 
> I'm not sure if we should be removing the dpcls optimization interval and 
> combining it with the pmd optimization for the purpose rxq balancing.
> Is there a situation when you will want the dpcls optimized at a different 
> interval from the rxq rebalance?
> 
> Maybe you have thought of this already in this solution, but for instance, in 
> the case of bursty non uniform traffic, would it be an idea to allow the 
> pmd_optmization period to be configurable by the user so that enough time 
> passes that the rxq rebalance occurs on statistic's that have been gathered 
> over a longer period to negate the bursty aspect of the traffic? And in that 
> case you may still want the dpcls optimization to occur at the usual period 
> of 1000.
> 

It's a good question. On the other hand a longer interval can mean using
older data. I'm reluctant to add another user config unless people
*really* think it's needed.

For the time being I've made the intervals independent. That way it
avoids a slight change in the dpcls optimization behaviour (when it
couldn't get the mutex), they can use different intervals and a user
config can easily be added in the future if needed.

As I don't need to make the dpcls interval generic anymore, I've dropped
this patch from v3.

>>  hmap_init(&pmd->poll_list);
>>  hmap_init(&pmd->tx_ports);
>> @@ -5656,7 +5656,7 @@ dp_netdev_pmd_try_optimize(struct
>> dp_netdev_pmd_thread *pmd)
>>  }
>>  ovs_mutex_unlock(&pmd->flow_mutex);
>> -/* Start new measuring interval */
>> -pmd->next_optimization = now + DPCLS_OPTIMIZATION_INTERVAL;
>>  }
>> +/* Start new measuring interval */
>> +pmd->next_optimization = now + PMD_OPTIMIZATION_INTERVAL;
>>  }
>>  }
>> --
>> 1.8.3.1
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH RFC v2 4/4] dpif-netdev: Time based output batching.

2017-08-01 Thread Bodireddy, Bhanuprakash

>On 28.07.2017 10:20, Darrell Ball wrote:
>> I have not tested yet
>>
>> However, I would have expected something max latency config. to be
>specific to netdev-dpdk port types
>
>IMHO, if we can make it generic, we must make it generic.
>
>[Darrell]
>The first question I ask myself is -  is this functionality intrinsically 
>generic or is
>it not ?
>It is clearly not and trying to make it artificially so would do the following:
>
>1) We end up designing something the wrong way where it partially works.
>2) Breaks other features present and future that really do intersect.
>
>
> Making of this
>functionality netdev-dpdk specific will brake ability to test it using
>unit tests. As the change is complex and has a lot of pitfalls like
>possible packet stucks and possible latency issues, this code should be
>covered by unit tests to simplify the support and modifications.
>(And it's already partly covered because it is generic. And I fixed many
>minor issues while developing through unit test failures.)
>
>[Darrell]
>Most of dpdk is not tested by our unit tests because it cannot be simulated
>well at the moment. This is orthogonal to the basic question however.

Darrell is right and the unit tests we have currently don't test DPDK datapath 
well. 
So having this changes in netdev layer shouldn't  impact the unit tests much. 

While I share your other concern that changes in netdev layer will be little 
complex and slightly
painful for future code changes, this max latency config  introduced in dpif 
layer may not hold good to
different port types and users may potentially introduce conflicting changes in 
netdev layer in future to
suit their use cases.
 
>
>
>In the future this can be used also to improve performance of netdev-linux
>by replacing sendmsg() with batched sendmmsg(). This should significantly
>increase performance of flood actions while MACs are not learned yet in
>action NORMAL.
>
>> This type of code also seems to intersect with present and future QoS
>considerations in netdev-dpdk

>
>Maybe, but there are also some related features in mail-list like rx queue
>prioritization which are implemented in generic way on dpif-netdev layer.

If you are referring to rxq prioritization work by Billy 
(https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336001.html),
this feature is more implemented in netdev layer with very minimal updates to 
dpif layer. 

BTW,  dp_execute_cb()  is getting cluttered with this patch. 

- Bhanuprakash.

>
>>
>> -Original Message-
>> From: Ilya Maximets 
>> Date: Wednesday, July 26, 2017 at 8:21 AM
>> To: "ovs-dev@openvswitch.org" ,
>Bhanuprakash Bodireddy 
>> Cc: Heetae Ahn , Ben Pfaff
>, Antonio Fischetti , Eelco
>Chaudron , Ciara Loftus ,
>Kevin Traynor , Darrell Ball ,
>Ilya Maximets 
>> Subject: [PATCH RFC v2 4/4] dpif-netdev: Time based output batching.
>>
>> This allows to collect packets from more than one RX burst
>> and send them together with a configurable maximum latency.
>>
>> 'other_config:output-max-latency' can be used to configure
>> time that a packet can wait in output batch for sending.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>
>> millisecon granularity is used for now. Can be easily switched to use
>> microseconds instead.
>>
>>  lib/dpif-netdev.c| 97
>+++-
>>  vswitchd/vswitch.xml | 15 
>>  2 files changed, 95 insertions(+), 17 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 07c7dad..e5f8a3d 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -84,6 +84,9 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev);
>>  #define MAX_RECIRC_DEPTH 5
>>  DEFINE_STATIC_PER_THREAD_DATA(uint32_t, recirc_depth, 0)
>>
>> +/* Use instant packet send by default. */
>> +#define DEFAULT_OUTPUT_MAX_LATENCY 0
>> +
>>  /* Configuration parameters. */
>>  enum { MAX_FLOWS = 65536 }; /* Maximum number of flows in flow
>table. */
>>  enum { MAX_METERS = 65536 };/* Maximum number of meters. */
>> @@ -261,6 +264,9 @@ struct dp_netdev {
>>  struct hmap ports;
>>  struct seq *port_seq;   /* Incremented whenever a port 
> changes.
>*/
>>
>> +/* The time that a packet can wait in output batch for sending. 
> */
>> +atomic_uint32_t output_max_latency;
>> +
>>  /* Meters. */
>>  struct ovs_mutex meter_locks[N_METER_LOCKS];
>>  struct dp_meter *meters[MAX_METERS]; /* Meter bands. */
>> @@ -498,6 +504,7 @@ struct tx_port {
>>  int qid;
>>  long long last_used;
>>  struct hmap_node node;
>> +long lo

Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic encap and decap

2017-08-01 Thread Ben Pfaff

On Tue, Aug 01, 2017 at 08:26:07AM -0700, Ben Pfaff wrote:
> On Tue, Aug 01, 2017 at 12:32:20PM +, Yang, Yi Y wrote:
> > #2.
> > [Ben] I suspect that decode_NXAST_RAW_DECAP() should report an error if 
> > properties are present, since it doesn't support properties.
> > 
> > [Yi] It is impossible. 
> 
> What is impossible?  It is easy to detect that properties are present
> and report an error:
> 
> diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
> index d0437f20922a..7be302a4005d 100644
> --- a/lib/ofp-actions.c
> +++ b/lib/ofp-actions.c
> @@ -4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap 
> *nad,
> enum ofp_version ofp_version OVS_UNUSED,
> struct ofpbuf *ofpacts)
>  {
> +if (ntohs(nad->len) > sizeof *nad) {
> +/* No properties supported yet. */
> +return OFPERR_OFPBPC_BAD_TYPE;
> +}

Looking again, I guess that OFPERR_NXBAC_UNKNOWN_ED_PROP is better.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2 0/4] Output packet batching.

2017-08-01 Thread Bodireddy, Bhanuprakash

>This patch-set inspired by [1] from Bhanuprakash Bodireddy.
>Implementation of [1] looks very complex and introduces many pitfalls for
>later code modifications like possible packet stucks.
>
>This version targeted to make simple and flexible output packet batching on
>higher level without introducing and even simplifying netdev layer.
>
>Patch set consists of 3 patches. All the functionality introduced in the first
>patch. Two others are just cleanups of netdevs to not do unnecessary things.
>
>Basic testing of 'PVP with OVS bonding on phy ports' scenario shows
>significant performance improvement.
>More accurate and intensive testing required.
>
>[1] [PATCH 0/6] netdev-dpdk: Use intermediate queue during packet
>transmission.
>https://mail.openvswitch.org/pipermail/ovs-dev/2017-June/334762.html
>
>Version 2:
>
>   * Rebased on current master.
>   * Added time based batching RFC patch.
>   * Fixed mixing packets with different sources in same batch.
>

Applied this series along with other patches[1] and gave initial try.
With this series, approximately half a million throughput drop is observed in 
simple test case (P2P - 1stream - udp) vs  master + [1]. 
The performance improvement is observed with multiple flows  (which this series 
is meant to address).

At this stage no latency settings were used. Yet to review and do more testing.

[1] Improves performance.
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335359.html
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336186.html
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336187.html
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336290.html

- Bhanuprakash.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic encap and decap

2017-08-01 Thread Ben Pfaff

On Tue, Aug 01, 2017 at 12:32:20PM +, Yang, Yi Y wrote:
> #2.
> [Ben] I suspect that decode_NXAST_RAW_DECAP() should report an error if 
> properties are present, since it doesn't support properties.
> 
> [Yi] It is impossible. 

What is impossible?  It is easy to detect that properties are present
and report an error:

diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
index d0437f20922a..7be302a4005d 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -4263,6 +4263,11 @@ decode_NXAST_RAW_DECAP(const struct nx_action_decap *nad,
enum ofp_version ofp_version OVS_UNUSED,
struct ofpbuf *ofpacts)
 {
+if (ntohs(nad->len) > sizeof *nad) {
+/* No properties supported yet. */
+return OFPERR_OFPBPC_BAD_TYPE;
+}
+
 struct ofpact_decap * decap;
 
 decap = ofpact_put_DECAP(ofpacts);
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic encap and decap

2017-08-01 Thread Ben Pfaff

On Tue, Aug 01, 2017 at 12:04:55AM +, Yang, Yi Y wrote:
> Hi, Ben
> 
> Thank you so much for your great review and great comments, I'll do changes 
> per your comments and post next version because Zoltan and Jan are taking 
> vacation.  I explained your concerns 
> 
> [Ben] The string format of the encap actions seems rather user unfriendly.  
> Is there a good reason why it should be the very generic
> prop(class=,type=,val=) rather than something more tailored 
> to the actual properties that users will want to set?  It is hard to tell 
> without any actual examples.
> 
> [Yi] because encap is very generic action, you can add a new header/encap 
> type and some new properties for it very easily by this kind of very generic 
> abstraction and mustn't change generic encap implementation. I make an 
> example, for NSH we have
> 
> prop(class=nsh,type=md_type,val=1)
> 
> If we add new encap/header type gtpu, we can have the below property for GTP-u
> 
> prop(class=gtpu,type=teid,val=1234)
> 
> Generic encap action can accept this kind of new prop without any change 
> against generic encap action implementation, this is the magic of very 
> generic " prop(class=,type=,val=)".

To me, this argument would make sense if it meant that we could add gtpu
without having to modify OVS.  But we'll still need to do that.  At the
very least, OVS isn't going to know class "gtpu" or type "teid" means
without modifications.

Here's a more user friendly syntax that seems just as generic to me:
class(type=value)
e.g.:
nsh(md_type=1)
gtpu(teid=1234)

> [Ben] In decode_NXAST_RAW_ENCAP() and parse_ENCAP(), shouldn't there be 
> validation of the header size value?  Most header types will only have a few 
> acceptable header sizes.
> 
> [Yi] Actually header_size isn't used at all now, it is reserved for later 
> enhancement. For NSH MD type 2, NSH header is length-variable, ranged from 8 
> to 264, number of props and size of each prop are also very flexible (like 
> Geneve tunnel metadata)

OK.

> [Ben] The internal representation of ofpact_ed_prop, has no examples so far.
> How confident are you that it actually needs to be variable-length?  If it 
> does, then using a member 'n_props' in struct ofpact_encap to count the 
> number of properties seems risky: it implicitly encourages programmers to try 
> to index the props[] array from 0 to n_props-1.  I'd encourage, in that case, 
> switching to a length member, or just eliminating the member if 
> ofpact_ed_props can be padded to 8-byte multiples.  But it's easier if we can 
> just make ofpact_ed_prop fixed-length for now; then we can just index props[] 
> as an array.
> 
> However, I question whether the properties actually need to be internally 
> represented as properties at all.  Presumably, OVS is only going to support a 
> specific set of properties.  Probably, it's easy to just add specific members 
> to struct ofpact_encap that represent the values of those properties.  Did 
> you consider that?  That approach usually simplifies code greatly; properties 
> are usually needed only for external representations.
> 
> [Yi] As I said above, prop is length-variable and unpredictable, especially 
> for NSH MD type 2, so a length-fixed ofpact_ed_prop impracticable and will 
> waste too much space, n_props can indicate number of props, actually props is 
> a pointer, we can use 
> 
> uint8 props[]
> 
> to replace
> 
> struct ofp_ed_prop_header props[];
> 
> But I think it doesn't bring any substantial benefit from my perspective.
> 
> ofpact_ed_prop is header of each peoperty, it just helps parse properties 
> better.
> 
> In addition, code is completely controllable, abuse of props[index] won't 
> happen because you guys check every commit very carefully :-)

OK.  I still think this seems to add more complexity than is probably
needed, but I'll leave it alone for now.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] How to tell which code is for userspace

2017-08-01 Thread ted.y.liang

Dear Ben,

Thanks. This make me clear.

Ted

Sent from Mail for Windows 10

From: Ben Pfaff
Sent: Tuesday, August 1, 2017 10:53 PM
To: ted.y.li...@gmail.com
Cc: ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] How to tell which code is for userspace

On Tue, Aug 01, 2017 at 10:01:31PM +0800, ted.y.li...@gmail.com wrote:
> I am reading ovs 2.7.90 code, do you have a quick way to tell which code is 
> for userspace and which for kernel?

Linux kernel code is in the datapath/ directory.
Windows kernel code is in the datapath-windows/ directory.
The rest of the code is userspace.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] ofpbuf: Fix parameter for const initializer.

2017-08-01 Thread Ben Pfaff

On Mon, Jul 31, 2017 at 05:16:11PM -0700, Joe Stringer wrote:
> Clang complains:
> 
> In file included from include/openvswitch/cxxtest.cc:11:0:
> ../include/openvswitch/ofpbuf.h: In function ‘ofpbuf 
> ofpbuf_const_initializer(const void*, size_t)’:
> ../include/openvswitch/ofpbuf.h:107:5: warning: narrowing conversion of 
> ‘size’ from ‘size_t {aka long unsigned int}’ to ‘uint32_t {aka unsigned int}’ 
> inside { } [-Wnarrowing]
>  };
>  ^
> ../include/openvswitch/ofpbuf.h:107:5: warning: narrowing conversion of 
> ‘size’ from ‘size_t {aka long unsigned int}’ to ‘uint32_t {aka unsigned int}’ 
> inside { } [-Wnarrowing]
> 
> This is because the ofpbuf struct's "size" parameter is a uint32_t,
> while ofpbuf_const_initializer() takes a size_t for the size. Fix this
> function to take a uint32_t instead.
> 
> Signed-off-by: Joe Stringer 

Thanks.  (We must have different clang versions.0

Acked-by: Ben Pfaff 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] How to tell which code is for userspace

2017-08-01 Thread Ben Pfaff

On Tue, Aug 01, 2017 at 10:01:31PM +0800, ted.y.li...@gmail.com wrote:
> I am reading ovs 2.7.90 code, do you have a quick way to tell which code is 
> for userspace and which for kernel?

Linux kernel code is in the datapath/ directory.
Windows kernel code is in the datapath-windows/ directory.
The rest of the code is userspace.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] How to tell which code is for userspace

2017-08-01 Thread ted.y.liang

Thanks. 

But there is a function in datapath/vport.c, as below:
struct vport *ovs_vport_alloc(int priv_size, const struct vport_ops *ops,
  const struct vport_parms *parms)
{
struct vport *vport;
size_t alloc_size;

alloc_size = sizeof(struct vport);
if (priv_size) {
alloc_size = ALIGN(alloc_size, VPORT_ALIGN);
alloc_size += priv_size;
}

vport = kzalloc(alloc_size, GFP_KERNEL);
if (!vport)
return ERR_PTR(-ENOMEM);

vport->dp = parms->dp;
vport->port_no = parms->port_no;
vport->ops = ops;
INIT_HLIST_NODE(&vport->dp_hash_node);

if (ovs_vport_set_upcall_portids(vport, parms->upcall_portids)) {
kfree(vport);
return ERR_PTR(-EINVAL);
}

return vport;
}

It uses kzalloc. Does not that mean it is part of the kernel code?

Br,

Ted
From: Russell Bryant
Sent: Tuesday, August 1, 2017 10:12 PM
To: ted.y.li...@gmail.com
Cc: ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] How to tell which code is for userspace

On Tue, Aug 1, 2017 at 10:01 AM,   wrote:
> Dear all,
>
> I am reading ovs 2.7.90 code, do you have a quick way to tell which code is 
> for userspace and which for kernel?

The ovs/datapath/linux/ directory is the Linux kernel datapath code.
The rest is userspace (with the exception of the Windows datapath
code).

-- 
Russell Bryant

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] Openvswitch crash when bringing down the dpdk bond port using "ovs-ofctl mod-port br-prv dpdk1 down"

2017-08-01 Thread Ilya Maximets

On 01.08.2017 17:19, Ilya Maximets wrote:
> On 01.08.2017 15:46, Keshav Gupta wrote:
>> Hi Ben/Ilya
>> Mainly patch do the following
>> 1) Call the dpdk rx api(rte_eth_rx_burst) only when ovs netdev state is UP
>> 2)  Now the rte_eth_dev_stop/ rte_eth_dev_start calls are removed in 
>> updating the flags   
>>
>>
>> This  fixes the core dump issue but I see below side effect with this patch:
>> With this change now only ports admin state(ovs netdev level) is changed 
>> (dpdk/nic level state is not changed) so NIC will continue to receive the 
>> packet until its Rxq ring is full
>> Later if somebody bring up the port then he will get the burst of old 
>> packets which are there in the Rxq (typically 4k packet (dpdk Rxq size)).
>>
>>
>> Thanks
>> Keshav
>>
> 
> Hi Keshav,
> 
> Yes, you're right. I think it's the same situation right now can be observed 
> on
> current master. We can try to fix this for the HW NICs in the similar way as 
> we
> handle destroy_device event for vhost-user ports. I prepared quick fix (not 
> even
> compile tested) for the reference how this can be fixed. Please check the code
> below. One thing is that we actually can't fix such behaviour for the 
> vhost-user
> because there is no such command to stop the vhost device. But we may try to
> fix it for HW NICs if it's important.
> 
> --8<->8--
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index ea17b97..202678f 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2300,20 +2300,48 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev,
> enum netdev_flags *old_flagsp)
>  OVS_REQUIRES(dev->mutex)
>  {
> +enum netdev_flags new_flags = dev->flags;
> +
>  if ((off | on) & ~(NETDEV_UP | NETDEV_PROMISC)) {
>  return EINVAL;
>  }
>  
>  *old_flagsp = dev->flags;
> -dev->flags |= on;
> -dev->flags &= ~off;
> +new_flags |= on;
> +new_flags &= ~off;
>  
> -if (dev->flags == *old_flagsp) {
> +if (new_flags == dev->flags) {
>  return 0;
>  }
>  
>  if (dev->type == DPDK_DEV_ETH) {
> -if (dev->flags & NETDEV_PROMISC) {
> +if (NETDEV_UP & ((*old_flagsp ^ on) | (*old_flagsp ^ off))) {
> +if (NETDEV_UP & off) {
> +bool quiescent = ovsrcu_is_quiescent();
> +
> +dev->flags = new_flags;
> +/* Wait for other threads to quiesce after turning off the
> + * 'NETDEV_UP' before actual stopping the device to be sure
> + * that all threads finished their rx/tx operations. */
> +ovsrcu_synchronize();
> +if (quiescent) {
> +/* As call to ovsrcu_synchronize() will end the quiescent
> + * state, put thread back into quiescent state. */
> +ovsrcu_quiesce_start();
> +}
> +rte_eth_dev_stop(dev->port_id);
> +}  else {
> +int retval = rte_eth_dev_start(dev->port_id);
> +
> +if (retval) {
> +VLOG_ERR("Interface %s start error: %s", dev->up.name,
> + rte_strerror(-diag));
> +return -retval;
> +}
> +}
> +}
> +
> +if (new_flags & NETDEV_PROMISC) {
>  rte_eth_promiscuous_enable(dev->port_id);
>  }
>  
> @@ -2336,6 +2364,7 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev,
>  }
>  }
>  
> +dev->flags = new_flags;
>  return 0;
>  }
>  
> --8<->8--
> 
> Also, the diff above made on top of current master. So, you possibly will
> need to port it a little.
> 

To make it work on current master at least following change also required:

--8<->8--
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ea17b97..7145266 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -761,11 +761,13 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
 return -diag;
 }
 
-diag = rte_eth_dev_start(dev->port_id);
-if (diag) {
-VLOG_ERR("Interface %s start error: %s", dev->up.name,
- rte_strerror(-diag));
-return -diag;
+if (dev->flags & NETDEV_UP) {
+diag = rte_eth_dev_start(dev->port_id);
+if (diag) {
+VLOG_ERR("Interface %s start error: %s", dev->up.name,
+ rte_strerror(-diag));
+return -diag;
+}
 }
 
 rte_eth_promiscuous_enable(dev->port_id);
--8<->8--

It is to prevent stopped port starting on each reconfiguration.

> Best regards, Ilya Maximets.
> 
>>
>> -Original Message-
>> From: Ben Pfaff [mailto:b...@ovn.org] 
>> Sent: Monday, July

Re: [ovs-dev] [PATCH 4/5] Automatically verify that OVS header files work OK in C++ also.

2017-08-01 Thread Aaron Conole

Ben Pfaff  writes:

> On Mon, Jul 31, 2017 at 04:44:10PM -0400, Russell Bryant wrote:
>> On Mon, Jul 31, 2017 at 4:33 PM, Ben Pfaff  wrote:
>> > On Mon, Jul 31, 2017 at 01:05:32PM -0700, Ben Pfaff wrote:
>> >> On Mon, Jul 31, 2017 at 03:50:59PM -0400, Russell Bryant wrote:
>> >> > On Sun, Jul 30, 2017 at 10:54 PM, Ben Pfaff  wrote:
>> >> > > This should help address a recurring problem.
>> >> > >
>> >> > > Signed-off-by: Ben Pfaff 
>> >> > > ---
>> >> > >  .travis.yml |  1 +
>> >> > >  Makefile.am |  1 +
>> >> > >  configure.ac|  2 ++
>> >> > >  include/openvswitch/automake.mk | 14 ++
>> >> > >  m4/openvswitch.m4   | 21 +
>> >> > >  5 files changed, 39 insertions(+)
>> >> >
>> >> > Why does this patch depend on libboost?  It looks like it's only used
>> >> > when building a test C++ program in configure.  We could build a test
>> >> > C++ program without the dependency, right?
>> >>
>> >> It's because of include/openvswitch/compiler.h, which has:
>> >>
>> >> #elif defined(__cplusplus)
>> >> #include 
>> >> #define BUILD_ASSERT BOOST_STATIC_ASSERT
>> >> #define BUILD_ASSERT_DECL BOOST_STATIC_ASSERT
>> >>
>> 
>> Ah, sorry ...
>> 
>> >> We could probably define our own C++-compatible static assert without
>> >> boost.  It looks like C++11 and later has a built-in static_assert:
>> >> http://en.cppreference.com/w/cpp/language/static_assert
>> >>
>> >> Any idea whether it's reasonable to assume C++11 support these days?
>> 
>> I don't know, but that seems pretty reasonable to me.
>> 
>> I suppose if it becomes an issue, we could re-add the boost version as
>> an optional dependency that would get used only if C++11 wasn't
>> available?
>
> Sure.

It's actually quite reasonable to assume support for static_assert, it
was added to the working group during 2007/2008 timeframe, and every
compiler at the time which had c++-0x support (which I know includes
GCC[1], and VS[2]) have support for it.

Since this commit was applied I guess it doesn't matter, but it's
possibly useful information for the future.

>> Acked-by: Russell Bryant 
>
> Thanks, I applied this whole series to master, using the revised version
> of this patch.
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[1]: https://gcc.gnu.org/gcc-4.3/cxx0x_status.html
[2]:
https://blogs.msdn.microsoft.com/vcblog/2008/10/28/lambdas-auto-and-static_assert-c0x-features-in-vc10-part-1/
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX cksum in ovs-dpdk side

2017-08-01 Thread Zhenyu Gao

Currently, the dpdk-vhost side in ovs doesn't support tcp/udp tx cksum.
So L4 packets's cksum were calculated in VM side but performance is not
good.
Implementing tcp/udp tx cksum in ovs-dpdk side improves throughput and
makes virtio-net frontend-driver support NETIF_F_SG as well

Signed-off-by: Zhenyu Gao 
---

Here is some performance number:

Setup:

 qperf client
+-+
|   VM|
+-+
 |
 |  qperf server
+--+  ++
| vswitch+dpdk |  | bare-metal |
+--+  ++
   ||
   ||
  pNic-PhysicalSwitch

do cksum in ovs-dpdk: Applied this patch and execute 'ethtool -K eth0 tx on' in 
VM side.
  It offload cksum job to ovs-dpdk side.

do cksum in VM: Applied this patch and execute 'ethtool -K eth0 tx off' in VM 
side.
VM calculate cksum for tcp/udp packets.

We can see huge improvment in TCP throughput if we leverage ovs-dpdk cksum.

[root@localhost ~]# qperf -t 10 -oo msg_size:1:64K:*2  host-qperf-server01 
tcp_bw tcp_lat udp_bw udp_lat
  do cksum in ovs-dpdk  do cksum in VM without this patch
tcp_bw:
bw  =  2.05 MB/secbw  =  1.92 MB/secbw  =  1.95 MB/sec
tcp_bw:
bw  =  3.9 MB/sec bw  =  3.99 MB/secbw  =  3.98 MB/sec
tcp_bw:
bw  =  8.09 MB/secbw  =  7.82 MB/secbw  =  8.19 MB/sec
tcp_bw:
bw  =  14.9 MB/secbw  =  14.8 MB/secbw  =  15.7 MB/sec
tcp_bw:
bw  =  27.7 MB/secbw  =  28 MB/sec  bw  =  29.7 MB/sec
tcp_bw:
bw  =  51.2 MB/secbw  =  50.9 MB/secbw  =  54.9 MB/sec
tcp_bw:
bw  =  86.7 MB/secbw  =  86.8 MB/secbw  =  95.1 MB/sec
tcp_bw: 
bw  =  149 MB/sec bw  =  160 MB/sec bw  =  149 MB/sec
tcp_bw:
bw  =  211 MB/sec bw  =  205 MB/sec bw  =  216 MB/sec
tcp_bw:
bw  =  271 MB/sec bw  =  254 MB/sec bw  =  275 MB/sec
tcp_bw:
bw  =  326 MB/sec bw  =  303 MB/sec bw  =  321 MB/sec
tcp_bw:
bw  =  407 MB/sec bw  =  359 MB/sec bw  =  361 MB/sec
tcp_bw:
bw  =  816 MB/sec bw  =  512 MB/sec bw  =  419 MB/sec
tcp_bw: 
bw  =  840 MB/sec bw  =  756 MB/sec bw  =  457 MB/sec
tcp_bw:
bw  =  1.07 GB/secbw  =  880 MB/sec bw  =  480 MB/sec
tcp_bw:
bw  =  1.17 GB/secbw  =  1.01 GB/secbw  =  488 MB/sec
tcp_bw:
bw  =  1.17 GB/secbw  =  1.11 GB/secbw  =  483 MB/sec
tcp_lat:
latency  =  29 us latency  =  29.2 us   latency  =  29.6 us
tcp_lat:
latency  =  28.9 us   latency  =  29.3 us   latency  =  29.5 us
tcp_lat:
latency  =  29 us latency  =  29.3 us   latency  =  29.6 us
tcp_lat:
latency  =  29 us latency  =  29.4 us   latency  =  29.5 us
tcp_lat:
latency  =  29 us latency  =  29.2 us   latency  =  29.6 us
tcp_lat:
latency  =  29.1 us   latency  =  29.3 us   latency  =  29.7 us
tcp_lat:
latency  =  29.4 us   latency  =  29.6 us   latency  =  30 us
tcp_lat:
latency  =  29.8 us   latency  =  30.1 us   latency  =  30.2 us
tcp_lat:
latency  =  30.9 us   latency  =  30.9 us   latency  =  31 us
tcp_lat:
latency  =  46.9 us   latency  =  46.2 us   latency  =  32.2 us
tcp_lat:
latency  =  51.5 us   latency  =  52.6 us   latency  =  34.5 us
tcp_lat:
latency  =  43.9 us   latency  =  43.8 us   latency  =  43.6 us
tcp_lat:
 latency  =  47.6 us  latency  =  48 us latency  =  48.1 us
tcp_lat:
latency  =  77.7 us   latency  =  78.8 us   latency  =  78.8 us
tcp_lat:
latency  =  82.8 us   latency  =  82.3 us   latency  =  116 us
tcp_lat:
latency  =  94.8 us   latency  =  94.2 us   latency  =  134 us
tcp_lat:
latency  =  167 uslatency  =  197 uslatency  =  172 us
udp_bw:
send_bw  =  418 KB/secsend_bw  =  413 KB/secsend_bw  =  403 KB/sec
recv_bw  =  410 KB/secrecv_bw  =  412 KB/secrecv_bw  =  400 KB/sec
udp_bw:
send_bw  =  831 KB/secsend_bw  =  825 KB/secsend_bw  =  810 KB/sec
recv_bw  =  828 KB/secrecv_bw  =  816 KB/secrecv_bw  =  807 KB/sec
udp_bw:
send_bw  =  1.67 MB/sec   send_bw  =  1.65 MB/sec   send_bw  =  1.63 MB/sec
recv_bw  =  1.64 MB/sec   recv_bw  =  1.62 MB/sec   recv_bw  =  1.63 MB/sec
udp_bw:
send_bw  =  3.36 MB/sec   send_bw  =  3.29 MB/sec   send_bw  =  3.26 MB/sec
recv_bw  =  3.29 MB/sec   recv_bw  =  3.25 MB/sec   recv_bw  =  2.82 MB/sec
udp_bw:
send_bw  =  6.72 MB/sec   send_bw  =  6.61 MB/sec   send_bw  =  6.45 MB/sec
recv_bw  =  6.54 MB/sec   recv_bw  =  6.59 MB/sec   recv_bw  =  6.45 MB/sec
udp_bw:
send_bw  =  13.4 MB/sec   send_bw  =  13.2 MB/s

Re: [ovs-dev] Openvswitch crash when bringing down the dpdk bond port using "ovs-ofctl mod-port br-prv dpdk1 down"

2017-08-01 Thread Ilya Maximets

On 01.08.2017 15:46, Keshav Gupta wrote:
> Hi Ben/Ilya
> Mainly patch do the following
> 1) Call the dpdk rx api(rte_eth_rx_burst) only when ovs netdev state is UP
> 2)  Now the rte_eth_dev_stop/ rte_eth_dev_start calls are removed in updating 
> the flags   
> 
> 
> This  fixes the core dump issue but I see below side effect with this patch:
> With this change now only ports admin state(ovs netdev level) is changed 
> (dpdk/nic level state is not changed) so NIC will continue to receive the 
> packet until its Rxq ring is full
> Later if somebody bring up the port then he will get the burst of old packets 
> which are there in the Rxq (typically 4k packet (dpdk Rxq size)).
> 
> 
> Thanks
> Keshav
>

Hi Keshav,

Yes, you're right. I think it's the same situation right now can be observed on
current master. We can try to fix this for the HW NICs in the similar way as we
handle destroy_device event for vhost-user ports. I prepared quick fix (not even
compile tested) for the reference how this can be fixed. Please check the code
below. One thing is that we actually can't fix such behaviour for the vhost-user
because there is no such command to stop the vhost device. But we may try to
fix it for HW NICs if it's important.

--8<->8--
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ea17b97..202678f 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2300,20 +2300,48 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev,
enum netdev_flags *old_flagsp)
 OVS_REQUIRES(dev->mutex)
 {
+enum netdev_flags new_flags = dev->flags;
+
 if ((off | on) & ~(NETDEV_UP | NETDEV_PROMISC)) {
 return EINVAL;
 }
 
 *old_flagsp = dev->flags;
-dev->flags |= on;
-dev->flags &= ~off;
+new_flags |= on;
+new_flags &= ~off;
 
-if (dev->flags == *old_flagsp) {
+if (new_flags == dev->flags) {
 return 0;
 }
 
 if (dev->type == DPDK_DEV_ETH) {
-if (dev->flags & NETDEV_PROMISC) {
+if (NETDEV_UP & ((*old_flagsp ^ on) | (*old_flagsp ^ off))) {
+if (NETDEV_UP & off) {
+bool quiescent = ovsrcu_is_quiescent();
+
+dev->flags = new_flags;
+/* Wait for other threads to quiesce after turning off the
+ * 'NETDEV_UP' before actual stopping the device to be sure
+ * that all threads finished their rx/tx operations. */
+ovsrcu_synchronize();
+if (quiescent) {
+/* As call to ovsrcu_synchronize() will end the quiescent
+ * state, put thread back into quiescent state. */
+ovsrcu_quiesce_start();
+}
+rte_eth_dev_stop(dev->port_id);
+}  else {
+int retval = rte_eth_dev_start(dev->port_id);
+
+if (retval) {
+VLOG_ERR("Interface %s start error: %s", dev->up.name,
+ rte_strerror(-diag));
+return -retval;
+}
+}
+}
+
+if (new_flags & NETDEV_PROMISC) {
 rte_eth_promiscuous_enable(dev->port_id);
 }
 
@@ -2336,6 +2364,7 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev,
 }
 }
 
+dev->flags = new_flags;
 return 0;
 }
 
--8<->8--

Also, the diff above made on top of current master. So, you possibly will
need to port it a little.

Best regards, Ilya Maximets.

> 
> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org] 
> Sent: Monday, July 31, 2017 9:26 PM
> To: Ilya Maximets
> Cc: ovs-dev@openvswitch.org; Keshav Gupta
> Subject: Re: [ovs-dev] Openvswitch crash when bringing down the dpdk bond 
> port using "ovs-ofctl mod-port br-prv dpdk1 down"
> 
> Thanks.
> 
> It doesn't cherry-pick cleanly to branch-2.6, so I think I'll leave it for 
> now since Darrell made the reasonable point that DPDK was experimental in 2.6.
> 
> On Mon, Jul 31, 2017 at 06:20:28PM +0300, Ilya Maximets wrote:
>> On 31.07.2017 16:05, Ben Pfaff wrote:
>>> Ilya, should we apply this patch to branch-2.6?  Are there other 
>>> patches that should be backported?
>>
>> It's definitely a bug and should be backported if someone wants to use 
>> barnch-2.6 with DPDK datapath. I traced only this exact case, so it's 
>> hard to say if something else should be backported, but this patch 
>> should fix described issue without  any additional changes.
>>  
>>> On Wed, Jul 26, 2017 at 03:28:12PM +0300, Ilya Maximets wrote:
 Hi.

 You need to backport at least following patch:

 commit 3b1fb0779b87788968c1a6a9ff295a9883547485
 Author: Daniele Di Proietto 
 Date:   Tue Nov 15 15:40:49 2016 -0800

 netdev-dpdk: Don't call rte_dev_stop() in update_flags().
 
 Calling rte_

Re: [ovs-dev] [PATCH v2 5/5] dp-packet: Use memcpy on dp_packet elements.

2017-08-01 Thread O Mahony, Billy

Hi Antonio,

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of antonio.fische...@intel.com
> Sent: Wednesday, July 19, 2017 5:05 PM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH v2 5/5] dp-packet: Use memcpy on dp_packet
> elements.
> 
> memcpy replaces the several single copies inside
> dp_packet_clone_with_headroom().
> 
> Signed-off-by: Antonio Fischetti 
> ---
>  lib/dp-packet.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/lib/dp-packet.c b/lib/dp-packet.c index 67aa406..f4dbcb7 100644
> --- a/lib/dp-packet.c
> +++ b/lib/dp-packet.c
> @@ -157,8 +157,9 @@ dp_packet_clone(const struct dp_packet *buffer)
>  return dp_packet_clone_with_headroom(buffer, 0);  }
> 
> -/* Creates and returns a new dp_packet whose data are copied from
> 'buffer'.   The
> - * returned dp_packet will additionally have 'headroom' bytes of headroom.
> */
> +/* Creates and returns a new dp_packet whose data are copied from
> 'buffer'.
> + * The returned dp_packet will additionally have 'headroom' bytes of
> + * headroom. */
>  struct dp_packet *
>  dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t
> headroom)  { @@ -167,13 +168,12 @@
> dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t
> headroom)
>  new_buffer =
> dp_packet_clone_data_with_headroom(dp_packet_data(buffer),
>   dp_packet_size(buffer),
>   headroom);
> -new_buffer->l2_pad_size = buffer->l2_pad_size;
> -new_buffer->l2_5_ofs = buffer->l2_5_ofs;
> -new_buffer->l3_ofs = buffer->l3_ofs;
> -new_buffer->l4_ofs = buffer->l4_ofs;
> -new_buffer->md = buffer->md;
> -new_buffer->cutlen = buffer->cutlen;
> -new_buffer->packet_type = buffer->packet_type;
> +/* Copy the following fields into the returned buffer: l2_pad_size,
> + * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
> +memcpy(&new_buffer->l2_pad_size, &buffer->l2_pad_size,
> +sizeof(struct dp_packet) -
> +offsetof(struct dp_packet, l2_pad_size));
> +
[[BO'M]] 
Does this change in itself have a perf improvement?

A reference/warning in the dp_packet declaration would be a good idea -  anyone 
changing fields from l2_pad_size to the end of the structure needs to be aware 
of this implementation of dp_packet_clone_data_with_headroom().

>  #ifdef DPDK_NETDEV
>  new_buffer->mbuf.ol_flags = buffer->mbuf.ol_flags;  #else
> --
> 2.4.11
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] How to tell which code is for userspace

2017-08-01 Thread Russell Bryant

On Tue, Aug 1, 2017 at 10:01 AM,   wrote:
> Dear all,
>
> I am reading ovs 2.7.90 code, do you have a quick way to tell which code is 
> for userspace and which for kernel?

The ovs/datapath/linux/ directory is the Linux kernel datapath code.
The rest is userspace (with the exception of the Windows datapath
code).

-- 
Russell Bryant
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] How to tell which code is for userspace

2017-08-01 Thread ted.y.liang

Dear all,

I am reading ovs 2.7.90 code, do you have a quick way to tell which code is for 
userspace and which for kernel?

Br,

Ted

Sent from Mail for Windows 10

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [RFC] Basic active-active HA for ovn-northd

2017-08-01 Thread Russell Bryant

On Tue, Aug 1, 2017 at 8:06 AM, Numan Siddique  wrote:
>
>
> On Tue, Aug 1, 2017 at 4:25 PM, Numan Siddique  wrote:
>>
>>
>>
>> On Tue, Aug 1, 2017 at 5:00 AM, Han Zhou  wrote:
>>>
>>> On Mon, Jul 31, 2017 at 1:53 PM, Russell Bryant  wrote:
>>> >
>>> > I wanted to share the idea before I code it to see if it makes sense.
>>> > I imagine the patch would be small, though.
>>> >
>>> > We currently provide HA for ovn-northd by using Pacemaker to ensure
>>> > that ovn-northd is running only one time somewhere in a cluster.
>>> >
>>
>>
>> In the case of Pacemaker, the pacemaker OCF resource script
>> (ovndb-servers.ocf) starts the ovn-northd on the master node if
>> manage_northd is set to true and this is the approach tripleo has taken.
>> ovn-northd uses the unix sockets to communicate to the NB and SB db servers.
>> With the suggested approach, ovn-northd would be using tcp connection to
>> communicate to the db servers ?
>
>
> Given that SB DB ovsdb-server would assign the lock, it has to be a tcp
> connection. Sorry for the confusion.

Right.  If we're still using Pacemaker for the databases anyway, we
could keep managing ovn-northd as well to ensure they stay local so we
can keep using the local unix socket.  On the other hand, separating
them will spread load a bit more across controllers.  It's not obvious
to me which is better.

>
>
>>
>> Thanks
>> Numan
>>
>>
>>
>>
>>> > What if we made ovn-northd acquire an OVSDB lock on the southbound
>>> > database before it did any real work?  That way we could start
>>> > multiple copies of ovn-northd in a cluster, but only one would be
>>> > active at a time.
>>> >
>>> > This is crude, and obviously we would want to distribute work among
>>> > ovn-northd instances eventually, but does this sound like an
>>> > improvement over requiring something like Pacemaker?
>>> >
>>> Russell, this sounds very good. I think it is better than Pacemaker. I
>>> would still call it active-standby though. The standby is hot-standby,
>>> not
>>> cold-standby.

Thanks, Han.  Yes, your "hot-standby" terminology sounds better.  :-)

>>>
>>> It seems the change would be having the active one updating >> northd hostname> periodically to southbound DB, so that other northd will
>>> know if it is still alive, and otherwise taking over by updating the
>>> . Is this correct?

This wouldn't be required.  If the active ovn-northd goes down, the
OVSDB lock will automatically be released when the database detects
that the connection is gone.  We would be relying on ovsdb-server for
how quickly the failover happens.

I think this change would be an improvement.  At a minimum, it ensures
ovn-northd behaves in a sane way if you accidentally run it more than
once in a deployment.  It also allows intentionally running it more
than once as one or more hot-standby instances.

Maybe in the next release we can revisit proposals for fully
active-active ovn-northd.

-- 
Russell Bryant
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] Proposal: EMC load-shedding

2017-08-01 Thread O Mahony, Billy

Hi All,

This proposal is an attempt to make a more general solution to the same issue
of EMC thrashing addressed by
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335940.html. That patch
proposes that when the EMC is overloaded that recirculated packets are neither
inserted into the EMC nor are they looked up in the EMC.

However, while it is a cheap decision, picking 'recirculated' as the category
of packet to drop from the EMC is not very granular - dropping them from the
EMC may result in an EMC that is now under-utilized or still overloaded - it 
all 
depends on what proportion of packets were in the recirculated category.

Once there are too many entries contending for any cache it's going to become
a liability as the lookup_cost + (miss_rate * cost_of_miss) grows to be greater
than cost_of_miss. And in that overloaded case any scheme where a very cheap
decision can be made to not insert a certain category of packets and to also
not check the EMC for that same category will reduce the cache miss rate.

Looking at a packet's RSS hash can also provide a "very cheap decision that can
be made to not insert a certain category of packets and to also not check the
EMC for that same category"

I don't want to propose an actual patch right now but if the general principle
is agreed then I would be happy to write at least a first iteration of the
patch or maybe the authors of the patch above would like to do so?

So I suggest that when the EMC becomes "overloaded" load is shed based on RSS
/5-tuple hash. If the hash is above a certain threshold it becomes eligible to
be inserted into the EMC. Packets with a hash below the threshold are non
inserted nor are they looked up in the EMC. The threshold can be changed in an
ongoing and granular way to adapt to current traffic conditions and maintain an
optimum EMC utilization.

So in simple, non-optimized (and non-compiling) terms:

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 47a9fa0..df5b9db 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4599,11 +4599,12 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
 miniflow_extract(packet, &key->mf);
 key->len = 0; /* Not computed yet. */
 key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf);

 /* If EMC is disabled skip emc_lookup */
-flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key);
+if (key->hash > flow_cache.shed_threshold) {
+flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key);
+}
 if (OVS_LIKELY(flow)) {
 dp_netdev_queue_batches(packet, flow, &key->mf, batches,
 n_batches);
 } else {
 /* Exact match cache missed. Group missed packets together at
@@ -4777,11 +4778,12 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
 continue;
 }

 flow = dp_netdev_flow_cast(rules[i]);

-emc_probabilistic_insert(pmd, &keys[i], flow);
+if (keys[i].hash > pmd->flow_cache->shed_threshold) {
+emc_probabilistic_insert(pmd, &keys[i], flow);
+}
 dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
 }

 dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt);
 dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);

The other question is setting the shed_threshold value. Generally:

 sched_threshold = some_function_of(emc_load)

Some suggestions for calculating emc_load:
* the num_alive_entries/num_entries
* the num_evictions/num_insertions (where num_evictions is the number of
  insertions that overwrote and existing alive entry).
* something more adaptable:
  emc_load = some_function_of (cost_of_emc_miss,
   cost_of_emc_lookup,
   probability_of_emc_miss)


I'd be interested to know what people think of doing something like this and
any more details that can be fleshed out, corner cases and so on.

Regards,
Billy

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] Openvswitch crash when bringing down the dpdk bond port using "ovs-ofctl mod-port br-prv dpdk1 down"

2017-08-01 Thread Keshav Gupta

Hi Ben/Ilya
Mainly patch do the following
1) Call the dpdk rx api(rte_eth_rx_burst) only when ovs netdev state is UP
2)  Now the rte_eth_dev_stop/ rte_eth_dev_start calls are removed in updating 
the flags   


This  fixes the core dump issue but I see below side effect with this patch:
With this change now only ports admin state(ovs netdev level) is changed 
(dpdk/nic level state is not changed) so NIC will continue to receive the 
packet until its Rxq ring is full
Later if somebody bring up the port then he will get the burst of old packets 
which are there in the Rxq (typically 4k packet (dpdk Rxq size)).


Thanks
Keshav
   
 

-Original Message-
From: Ben Pfaff [mailto:b...@ovn.org] 
Sent: Monday, July 31, 2017 9:26 PM
To: Ilya Maximets
Cc: ovs-dev@openvswitch.org; Keshav Gupta
Subject: Re: [ovs-dev] Openvswitch crash when bringing down the dpdk bond port 
using "ovs-ofctl mod-port br-prv dpdk1 down"

Thanks.

It doesn't cherry-pick cleanly to branch-2.6, so I think I'll leave it for now 
since Darrell made the reasonable point that DPDK was experimental in 2.6.

On Mon, Jul 31, 2017 at 06:20:28PM +0300, Ilya Maximets wrote:
> On 31.07.2017 16:05, Ben Pfaff wrote:
> > Ilya, should we apply this patch to branch-2.6?  Are there other 
> > patches that should be backported?
> 
> It's definitely a bug and should be backported if someone wants to use 
> barnch-2.6 with DPDK datapath. I traced only this exact case, so it's 
> hard to say if something else should be backported, but this patch 
> should fix described issue without  any additional changes.
>  
> > On Wed, Jul 26, 2017 at 03:28:12PM +0300, Ilya Maximets wrote:
> >> Hi.
> >>
> >> You need to backport at least following patch:
> >>
> >> commit 3b1fb0779b87788968c1a6a9ff295a9883547485
> >> Author: Daniele Di Proietto 
> >> Date:   Tue Nov 15 15:40:49 2016 -0800
> >>
> >> netdev-dpdk: Don't call rte_dev_stop() in update_flags().
> >> 
> >> Calling rte_eth_dev_stop() while the device is running causes a crash.
> >> 
> >> We could use rte_eth_dev_set_link_down(), but not every PMD implements
> >> that, and I found one NIC where that has no effect.
> >> 
> >> Instead, this commit checks if the device has the NETDEV_UP flag when
> >> transmitting or receiving (similarly to what we do for vhostuser). I
> >> didn't notice any performance difference with this check in case the
> >> device is up.
> >> 
> >> An alternative would be to remove the device queues from the pmd 
> >> threads
> >> tx and receive cache, but that requires reconfiguration and I'd prefer
> >> to avoid it, because the change can come from OpenFlow.
> >> 
> >> Signed-off-by: Daniele Di Proietto 
> >> Acked-by: Ilya Maximets 
> >>
> >> This should fix your issue.
> >> In general, I'm suggesting to use stable 2.7 OVS, there was too 
> >> many DPDK related changes including stability fixes since 2.6.
> >>
> >> Best regards, Ilya Maximets.
> >>
> >>> Hi
> >>>   We are experiencing a openvswitch crash when bringing down the dpdk 
> >>> bond port using "ovs-ofctl mod-port br-prv dpdk1 down".
> >>>
> >>> backtrace of core is like below. Is there any issue reported earlier  for 
> >>> this type of crash in openvswitch community.
> >>>
> >>> (gdb) bt
> >>> #0  ixgbe_rxq_rearm (rxq=0x7fa45061f800) at 
> >>> /home/sdn/new_cloud_sdn_switch_2/cloud-sdn-switch/dpdk/drivers/net
> >>> /ixgbe/ixgbe_rxtx_vec_sse.c:98
> >>> #1  _recv_raw_pkts_vec (split_packet=0x0, nb_pkts=32, rx_pkts= >>> out>, rxq=0x7fa45061f800)
> >>> at 
> >>> /home/sdn/new_cloud_sdn_switch_2/cloud-sdn-switch/dpdk/drivers/net
> >>> /ixgbe/ixgbe_rxtx_vec_sse.c:290
> >>> #2  ixgbe_recv_pkts_vec (rx_queue=0x7fa45061f800, rx_pkts= >>> out>, nb_pkts=)
> >>> at 
> >>> /home/sdn/new_cloud_sdn_switch_2/cloud-sdn-switch/dpdk/drivers/net
> >>> /ixgbe/ixgbe_rxtx_vec_sse.c:474
> >>> #3  0x00e500e4 in ?? ()
> >>> #4  0x004600e6 in ?? ()
> >>> #5  0x006a0069 in ?? ()
> >>> #6  0x006c006b in ?? ()
> >>> #7  0x00ec006d in ?? ()
> >>> #8  0x00ee00ed in ?? ()
> >>> #9  0x0001537f5780 in ?? ()
> >>> #10 0x in ?? ()
> >>> (gdb)
> >>>
> >>>
> >>> I have analyzed the core and it seems it is a result of device 
> >>> stop and packet receive from the port happening at same time by 
> >>> two thread OVS main thread(device stop) and PMD thread(pkt receive). More 
> >>> precisely main thread cleaning the packet buffer from rxq sw_ring to 
> >>> avoid the packet buffer leak while in parallel PMD thread is filling the 
> >>> packet buffer in sw_ring/descriptor ring as part of ixgbe_recv_pkts_vec.
> >>>
> >>> version used is: openvswitch (2.6.1) with dpdk (16.11).
> >>>
> >>> This crash is not every time reproducible but frequency seems to be high.
> >>>
> >>> I am new to openvswitch community and this is first time I am posting a 
> >>> query. let me know if anything you require from my side.
> >>>
> >>

Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic encap and decap

2017-08-01 Thread Yang, Yi Y

Hi, Ben

I have posted v4, they are:

https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/336504.html
https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/336505.html
https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/336506.html

I have fixed most of issues you commented, I need to clarify the below issues, 
actually they are not issues. I have explained the other issues in last email.

#1.
[Ben] I don't see anything in decode_ofp_prop() or its caller that checks that 
the property fits in the available space.  It looks like an integer overflow 
error.

[Yi] Currently it is a stub function, it doesn't handle any property, so it 
doesn't put any property into buf, so we mustn't check space at this point. Our 
NSH patch series will add more pieces for this. Let us check 

enum ofperr
decode_ed_prop(const struct ofp_ed_prop_header **ofp_prop,
   struct ofpbuf *out OVS_UNUSED,
   size_t *remaining)

We used ofpbuf_put_uninit(out, ...) to put the property to buffer, so we 
needn't care the available space, ofpbuf_put_uninit will allocate on demand.

#2.
[Ben] I suspect that decode_NXAST_RAW_DECAP() should report an error if 
properties are present, since it doesn't support properties.

[Yi] It is impossible. 

static enum ofperr
decode_NXAST_RAW_DECAP(const struct nx_action_decap *nad,
   enum ofp_version ofp_version OVS_UNUSED,
   struct ofpbuf *ofpacts)
{
struct ofpact_decap * decap;

decap = ofpact_put_DECAP(ofpacts);
decap->ofpact.raw = NXAST_RAW_DECAP;
decap->new_pkt_type = nad->new_pkt_type;
return 0;
}

struct ofpact_decap {
struct ofpact ofpact;

/* New packet type.
 *
 * The special value (0,0xFFFE) "Use next proto" is used to request OVS to
 * automatically set the new packet type based on the decap'ed header's
 * next protocol.
 */
ovs_be32 new_pkt_type;
};

struct nx_action_decap {
ovs_be16 type; /* OFPAT_VENDOR. */
ovs_be16 len;  /* Total size including any property TLVs. */
ovs_be32 vendor;   /* NX_VENDOR_ID. */
ovs_be16 subtype;  /* NXAST_DECAP. */
uint8_t pad[2];/* 2 bytes padding */

/* Packet type or result.
 *
 * The special value (0,0xFFFE) "Use next proto"
 * is used to request OVS to automatically set the new packet type based
 * on the decap'ed header's next protocol.
 */
ovs_be32 new_pkt_type;
};


-Original Message-
From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org] 
On Behalf Of Ben Pfaff
Sent: Monday, July 31, 2017 11:43 PM
To: Zoltán Balogh 
Cc: d...@openvswitch.org
Subject: Re: [ovs-dev] [PATCH v3 1/2] OF support and translation of generic 
encap and decap

On Fri, Jul 21, 2017 at 07:08:25PM +0200, Zoltán Balogh wrote:
> From: Jan Scheurich 
> 
> This commit adds support for the OpenFlow actions generic encap and 
> decap (as specified in ONF EXT-382) to the OVS control plane.
> 
> CLI syntax for encap action with properties:
>   encap(hdr=)
>   encap(hdr=,
> prop(class=,type=,val=),
> prop(class=,type=,val()))
> 
> CLI syntax for decap action:
>   decap()
>   decap(packet_type(ns=,type=))
> 
> The first header supported for encap and decap is "ethernet" to 
> convert packets between packet_type (1,Ethertype) and (0,0).
> 
> This commit also implements a skeleton for the translation of generic 
> encap and decap actions in ofproto-dpif and adds support to encap and 
> decap an Ethernet header.
> 
> In general translation of encap commits pending actions and then 
> rewrites struct flow in accordance with the new packet type and 
> header. In the case of encap(ethernet) it suffices to change the 
> packet type from (1, Ethertype) to (0,0) and set the dl_type 
> accordingly. A new pending_encap flag in xlate ctx is set to mark that 
> an corresponding datapath encap action must be triggered at the next 
> commit. In the case of encap(ethernet) ofproto generetas a push_eth action.
> 
> The general case for translation of decap() is to emit a datapath 
> action to decap the current outermost header and then recirculate the 
> packet to reparse the inner headers. In the special case of an 
> Ethernet packet,
> decap() just changes the packet type from (0,0) to (1, dl_type) 
> without a need to recirculate. The emission of the pop_eth action for 
> the datapath is postponed to the next commit.
> 
> Hence encap(ethernet) and decap() on an Ethernet packet are OF octions 
> that only incur a cost in the dataplane when a modifed packet is 
> actually committed, e.g. because it is sent out. They can freely be 
> used for normalizing the packet type in the OF pipeline without 
> degrading performance.
> 
> Signed-off-by: Jan Scheurich 
> Signed-off-by: Yi Yang 
> Signed-off-by: Zoltan Balogh 
> Co-authored-by: Zoltan Balogh 

Thanks for working on this.  I think that it will be a valuable feature.

First, I'm having a really hard

[ovs-dev] [PATCH v4 2/2] tests: Extend PTAP unit tests with decap action

2017-08-01 Thread Yi Yang

From: Zoltan Balogh 

  - Checking decap() prerequisits.
  - Encap/decap VLAN tagged Ethernet frames.
  - Send L3 packet over patch port.
  - Output L2/L3 packet to ports with different packet_type properties.

Signed-off-by: Zoltan Balogh 
Suggested-by: Jan Scheurich 
---
 tests/packet-type-aware.at | 405 +
 1 file changed, 405 insertions(+)

diff --git a/tests/packet-type-aware.at b/tests/packet-type-aware.at
index 883d9a0..8034e80 100644
--- a/tests/packet-type-aware.at
+++ b/tests/packet-type-aware.at
@@ -510,3 +510,408 @@ 
tunnel(src=20.0.0.3,dst=20.0.0.2,flags(-df-csum)),recirc_id(0),in_port(gre_sys),
 
 OVS_VSWITCHD_STOP(["/The Open vSwitch kernel module is probably not loaded/d"])
 AT_CLEANUP
+
+
+AT_SETUP([ptap - check decap() prerequisits])
+OVS_VSWITCHD_START
+
+# Decap IP header, then set IP destination address. This should fail.
+AT_CHECK([
+ovs-ofctl add-flow br0 
"in_port=1,packet_type=(1,0x800),actions=decap(),set_field:1.1.1.1->nw_dst"
+], [1], [stdout], [stderr])
+
+AT_CHECK([
+cat stderr | cut -d '|' -f 3-
+], [0], [dnl
+ofp_actions|WARN|set_field ip_dst lacks correct prerequisites
+ovs-ofctl: actions are invalid with specified match (OFPBAC_MATCH_INCONSISTENT)
+])
+
+# Decap Ethernet header, then set IP destination address. This should work.
+AT_CHECK([
+ovs-ofctl add-flow br0 -OOpenFlow13 
"in_port=1,ip,actions=decap(),set_field:1.1.1.1->nw_dst"
+], [0])
+
+# Decap IP header, then set metadata. This should work.
+AT_CHECK([
+ovs-ofctl add-flow br0 -OOpenFlow13 
"in_port=1,packet_type=(1,0x800),actions=decap(),set_field:1234->metadata"
+], [0])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
+
+AT_SETUP([ptap - check encap/decap VLAN tagged Ethernet frame])
+OVS_VSWITCHD_START([dnl
+-- add-port br0 n1 -- set interface n1 type=dummy ofport_request=1 \
+-- add-port br0 n2 -- set interface n2 type=dummy ofport_request=2 \
+-- add-port br0 p1 -- set interface p1 type=patch options:peer=p2 
ofport_request=3 \
+-- add-port br0 p2 -- set interface p2 type=patch options:peer=p1 
ofport_request=4
+])
+
+# Decap VLAN tagged Ethernet frames -> should be dropped
+AT_CHECK([
+ovs-ofctl add-flow br0 -OOpenFlow13 
"in_port=1,actions=push_vlan:0x8100,mod_vlan_vid:100,decap(),3"
+ovs-ofctl add-flow br0 -OOpenFlow13 
"in_port=4,actions=encap(hdr=ethernet),2"
+], [0])
+
+AT_CHECK([
+ovs-appctl netdev-dummy/receive n1 
1e2ce92a669e3a6dd2099cab080045548a53400040011addc0a80a0ac0a80a1e08006f200a4d0001fc509a5827150200101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637
+ovs-appctl netdev-dummy/receive n1 
1e2ce92a669e3a6dd2099cab080045548a83400040011aadc0a80a0ac0a80a1e0800b7170a4d0002fd509a58de1c0200101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637
+], [0], [ignore])
+
+ovs-appctl time/warp 1000
+
+AT_CHECK([
+ovs-appctl dpctl/dump-flows | strip_used | grep -v ipv6 | sort
+], [0], [flow-dump from non-dpdk interfaces:
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), 
packets:1, bytes:98, used:0.0s, actions:drop
+])
+
+# Encap(hdr=ethernet) on Ethernet frame -> should be droped
+AT_CHECK([
+ovs-ofctl del-flows br0
+ovs-ofctl add-flow br0 -OOpenFlow13 "in_port=1,actions=3"
+ovs-ofctl add-flow br0 -OOpenFlow13 
"in_port=4,actions=encap(hdr=ethernet),2"
+])
+
+ovs-appctl time/warp 11000
+
+AT_CHECK([
+ovs-appctl netdev-dummy/receive n1 
1e2ce92a669e3a6dd2099cab080045548a53400040011addc0a80a0ac0a80a1e08006f200a4d0001fc509a5827150200101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637
+ovs-appctl netdev-dummy/receive n1 
1e2ce92a669e3a6dd2099cab080045548a83400040011aadc0a80a0ac0a80a1e0800b7170a4d0002fd509a58de1c0200101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637
+], [0], [ignore])
+
+ovs-appctl time/warp 1000
+
+AT_CHECK([
+ovs-appctl dpctl/dump-flows | strip_used | grep -v ipv6 | sort
+], [0], [flow-dump from non-dpdk interfaces:
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), 
packets:1, bytes:98, used:0.0s, actions:drop
+])
+
+# Encap(hdr=ethernet) on VLAN tagged Ethernet frame -> should be droped
+AT_CHECK([
+ovs-ofctl del-flows br0
+ovs-ofctl add-flow br0 -OOpenFlow13 
"in_port=1,actions=push_vlan:0x8100,mod_vlan_vid:100,encap(hdr=ethernet),3"
+ovs-ofctl add-flow br0 -OOpenFlow13 "in_port=4,actions=2"
+])
+
+ovs-appctl time/warp 11000
+
+AT_CHECK([
+ovs-appctl netdev-dummy/receive n1 
1e2ce92a669e3a6dd2099cab080045548a53400040011addc0a80a0ac0a80a1e08006f200a4d0001fc509a5827150200101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637
+ovs-appctl netdev-dummy/receive n1 
1e2ce92a669e3a6dd2099cab080045548a83400040011aadc0a80a0ac0a80a1e0800b7170a4d0002fd509a58de1c02001

[ovs-dev] [PATCH v4 1/2] OF support and translation of generic encap and decap

2017-08-01 Thread Yi Yang

From: Jan Scheurich 

This commit adds support for the OpenFlow actions generic encap
and decap (as specified in ONF EXT-382) to the OVS control plane.

CLI syntax for encap action with properties:
  encap(hdr=)
  encap(hdr=,
prop(class=,type=,val=),
prop(class=,type=,val()))

CLI syntax for decap action:
  decap()
  decap(packet_type(ns=,type=))

The first header supported for encap and decap is "ethernet" to convert
packets between packet_type (1,Ethertype) and (0,0).

This commit also implements a skeleton for the translation of generic
encap and decap actions in ofproto-dpif and adds support to encap and
decap an Ethernet header.

In general translation of encap commits pending actions and then rewrites
struct flow in accordance with the new packet type and header. In the
case of encap(ethernet) it suffices to change the packet type from
(1, Ethertype) to (0,0) and set the dl_type accordingly. A new
pending_encap flag in xlate ctx is set to mark that an corresponding
datapath encap action must be triggered at the next commit. In the
case of encap(ethernet) ofproto generetas a push_eth action.

The general case for translation of decap() is to emit a datapath action
to decap the current outermost header and then recirculate the packet
to reparse the inner headers. In the special case of an Ethernet packet,
decap() just changes the packet type from (0,0) to (1, dl_type) without
a need to recirculate. The emission of the pop_eth action for the
datapath is postponed to the next commit.

Hence encap(ethernet) and decap() on an Ethernet packet are OF octions
that only incur a cost in the dataplane when a modifed packet is
actually committed, e.g. because it is sent out. They can freely be
used for normalizing the packet type in the OF pipeline without
degrading performance.

Signed-off-by: Jan Scheurich 
Signed-off-by: Yi Yang 
Signed-off-by: Zoltan Balogh 
Co-authored-by: Zoltan Balogh 
---
 NEWS   |   6 +
 include/openflow/openflow-common.h |   1 +
 include/openvswitch/automake.mk|   1 +
 include/openvswitch/ofp-actions.h  |  32 
 include/openvswitch/ofp-ed-props.h |  77 
 include/openvswitch/ofp-errors.h   |   9 +
 lib/automake.mk|   1 +
 lib/odp-util.c |  84 +---
 lib/odp-util.h |   3 +-
 lib/ofp-actions.c  | 380 -
 lib/ofp-ed-props.c | 151 +++
 lib/packets.h  |   3 +-
 ofproto/ofproto-dpif-xlate.c   | 116 ++-
 utilities/ovs-ofctl.8.in   |  73 ++-
 14 files changed, 890 insertions(+), 47 deletions(-)
 create mode 100644 include/openvswitch/ofp-ed-props.h
 create mode 100644 lib/ofp-ed-props.c

diff --git a/NEWS b/NEWS
index facea02..ca02ca7 100644
--- a/NEWS
+++ b/NEWS
@@ -62,11 +62,17 @@ Post-v2.7.0
  * The "learn" action now supports a "limit" option (see ovs-ofctl(8)).
  * The port status bit OFPPS_LIVE now reflects link aliveness.
  * OpenFlow 1.5 packet-out is now supported.
+ * Support for OpenFlow 1.5 field packet_type and packet-type-aware
+   pipeline (PTAP).
+ * Added generic encap and decap actions (EXT-382).
+   First supported use case is encap/decap for Ethernet.
- Fedora Packaging:
  * OVN services are no longer restarted automatically after upgrade.
- Add --cleanup option to command 'ovs-appctl exit' (see ovs-vswitchd(8)).
- L3 tunneling:
  * Use new tunnel port option "packet_type" to configure L2 vs. L3.
+ * In conjunction with PTAP tunnel ports can handle a mix of L2 and L3
+   payload.
  * New vxlan tunnel extension "gpe" to support VXLAN-GPE tunnels.
  * New support for non-Ethernet (L3) payloads in GRE and VXLAN-GPE.
- The BFD detection multiplier is now user-configurable.
diff --git a/include/openflow/openflow-common.h 
b/include/openflow/openflow-common.h
index 5f1e225..d665519 100644
--- a/include/openflow/openflow-common.h
+++ b/include/openflow/openflow-common.h
@@ -466,6 +466,7 @@ enum ofp_header_type_namespaces {
 OFPHTN_IP_PROTO = 2,/* ns_type is a IP protocol number. */
 OFPHTN_UDP_TCP_PORT = 3,/* ns_type is a TCP or UDP port. */
 OFPHTN_IPV4_OPTION = 4, /* ns_type is an IPv4 option number. */
+OFPHTN_N_TYPES
 };
 
 #endif /* openflow/openflow-common.h */
diff --git a/include/openvswitch/automake.mk b/include/openvswitch/automake.mk
index 6bace61..0a139d0 100644
--- a/include/openvswitch/automake.mk
+++ b/include/openvswitch/automake.mk
@@ -12,6 +12,7 @@ openvswitchinclude_HEADERS = \
include/openvswitch/meta-flow.h \
include/openvswitch/ofpbuf.h \
include/openvswitch/ofp-actions.h \
+   include/openvswitch/ofp-ed-props.h \
include/openvswitch/ofp-errors.h \
include/openvswitch/ofp-msgs.h \
include/openvswitch/ofp-parse.h \
diff --git a/include/openvswitch/ofp-actions.h 
b/include/

[ovs-dev] [PATCH v4 0/2] basic encap/decap

2017-08-01 Thread Yi Yang

This series is a continuation of other patch series initiated by Jan Scheurich 
before.

The main purpose of this series is to add support for the OpenFlow actions 
generic encap and decap (ONF EXT-382) to the OVS control plane. It implements
a skeleton for translation of generic encap and decap actions in ofproto-dpif
and provides support to encap and decap an Ethernet header. 

v3->v4
 - Add encap and decap actions man page in ovs-ofctl.8
 - Fix some alignment errors by ALIGNED_CAST
 - Fix some comments style issues
 - Fix various trivial issues per Ben Pfaff's comments
 - Rebase to current master branch

v2->v3
 - NEWS updated.
 - fix: drop VLAN tagged packet trying to decap it.
 - New unit tests for the fix.
 - Some tests were updated due to change in recirculation on master branch.

v1->v2
 - Squash 1/4 and 2/4 commits of v1.
 - Put unit tests in a separate commit.
 - Use aligned cast.
 - Nicira extension numbers for encap/decap action numbers and error codes.
 - Small fixes according to comments.

Jan Scheurich (1):
  OF support and translation of generic encap and decap

Zoltan Balogh (1):
  tests: Extend PTAP unit tests with decap action

 NEWS   |   6 +
 include/openflow/openflow-common.h |   1 +
 include/openvswitch/automake.mk|   1 +
 include/openvswitch/ofp-actions.h  |  32 +++
 include/openvswitch/ofp-ed-props.h |  77 +++
 include/openvswitch/ofp-errors.h   |   9 +
 lib/automake.mk|   1 +
 lib/odp-util.c |  84 +---
 lib/odp-util.h |   3 +-
 lib/ofp-actions.c  | 380 +-
 lib/ofp-ed-props.c | 151 ++
 lib/packets.h  |   3 +-
 ofproto/ofproto-dpif-xlate.c   | 116 ++-
 tests/packet-type-aware.at | 405 +
 utilities/ovs-ofctl.8.in   |  73 ++-
 15 files changed, 1295 insertions(+), 47 deletions(-)
 create mode 100644 include/openvswitch/ofp-ed-props.h
 create mode 100644 lib/ofp-ed-props.c

-- 
2.1.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [RFC] Basic active-active HA for ovn-northd

2017-08-01 Thread Numan Siddique

On Tue, Aug 1, 2017 at 4:25 PM, Numan Siddique  wrote:

>
>
> On Tue, Aug 1, 2017 at 5:00 AM, Han Zhou  wrote:
>
>> On Mon, Jul 31, 2017 at 1:53 PM, Russell Bryant  wrote:
>> >
>> > I wanted to share the idea before I code it to see if it makes sense.
>> > I imagine the patch would be small, though.
>> >
>> > We currently provide HA for ovn-northd by using Pacemaker to ensure
>> > that ovn-northd is running only one time somewhere in a cluster.
>> >
>>
>
> In the case of Pacemaker, the pacemaker OCF resource script
>  (ovndb-servers.ocf) starts the ovn-northd on the master node if
> manage_northd is set to true and this is the approach tripleo has taken.
> ovn-northd uses the unix sockets to communicate to the NB and SB db servers.
> With the suggested approach, ovn-northd would be using tcp connection to
> communicate to the db servers ?
>

Given that SB DB ovsdb-server would assign the lock, it has to be a tcp
connection. Sorry for the confusion.



> Thanks
> Numan
>
>
>
>
> > What if we made ovn-northd acquire an OVSDB lock on the southbound
>> > database before it did any real work?  That way we could start
>> > multiple copies of ovn-northd in a cluster, but only one would be
>> > active at a time.
>> >
>> > This is crude, and obviously we would want to distribute work among
>> > ovn-northd instances eventually, but does this sound like an
>> > improvement over requiring something like Pacemaker?
>> >
>> Russell, this sounds very good. I think it is better than Pacemaker. I
>> would still call it active-standby though. The standby is hot-standby, not
>> cold-standby.
>>
>> It seems the change would be having the active one updating > northd hostname> periodically to southbound DB, so that other northd will
>> know if it is still alive, and otherwise taking over by updating the
>> . Is this correct?
>>
>
>
>
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2 1/5] dpif-netdev: move pkt metadata init out of emc_processing.

2017-08-01 Thread Fischetti, Antonio



> -Original Message-
> From: O Mahony, Billy
> Sent: Monday, July 31, 2017 4:33 PM
> To: Fischetti, Antonio ; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v2 1/5] dpif-netdev: move pkt metadata init out
> of emc_processing.
> 
> There is also a reference to md_is_valid is the comments of emc_processing 
> that
> needs to be removed.

[Antonio] thanks will fix.


> 
> > -Original Message-
> > From: O Mahony, Billy
> > Sent: Monday, July 31, 2017 4:04 PM
> > To: 'antonio.fische...@intel.com' ;
> > d...@openvswitch.org
> > Subject: RE: [ovs-dev] [PATCH v2 1/5] dpif-netdev: move pkt metadata init
> > out of emc_processing.
> >
> > Hi Antionio,
> >
> > This looks like a reasonable change to me.
> >
> > Can you add some performance statistics for when dealing with re-circulated
> > packets?

[Antonio]
I used the ConnectionTracker setup for a firewall like

table=0, priority=100,ct_state=-trk,ip actions=ct(table=1)
table=0, priority=10,arp actions=NORMAL
table=0, priority=1 actions=drop
table=1, ct_state=+new+trk,ip,in_port=dpdk0 actions=ct(commit),output:dpdk1
table=1, ct_state=+new+trk,ip,in_port=dpdk1 actions=drop
table=1, ct_state=+est+trk,ip,in_port=dpdk0 actions=output:dpdk1
table=1, ct_state=+est+trk,ip,in_port=dpdk1 actions=output:dpdk0

I was sending 10 different UDP streams from the traffic generator, 
2 PMDs with 3 Tx queues, 64 B packets at the line rate. 
I got the following performance figures when I measured the received 
pkt rate [Mpps] on the 2 sides, regardless of packet loss.
 
Original OvS-DPDK
2.582.60

+ this patch
2.672.71

I didn't test with more UDP streams because - as the performance drops - the 
relative improvement becomes so small that is less visible and measurable.


> >
> > /Billy.
> >
> > > -Original Message-
> > > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > > boun...@openvswitch.org] On Behalf Of antonio.fische...@intel.com
> > > Sent: Wednesday, July 19, 2017 5:05 PM
> > > To: d...@openvswitch.org
> > > Subject: [ovs-dev] [PATCH v2 1/5] dpif-netdev: move pkt metadata init
> > > out of emc_processing.
> > >
> > > Packet metadata initialization is moved into dp_netdev_input to
> > > improve performance.
> > >
> > > Signed-off-by: Antonio Fischetti 
> > > ---
> > > In my testbench with the following port to port flow setup:
> > > in_port=1,action=output:2
> > > in_port=2,action=output:1
> > >
> > > I measured packet Rx rate (regardless of packet loss) in a
> > > Bidirectional test with  64B UDP packets.
> > >
> > > I saw the following performance improvement
> > >
> > > Orig:  11.30, 11.54 Mpps
> > > Orig + patch:  11.70, 11.76 Mpps
> > >
> > >  lib/dpif-netdev.c | 21 ++---
> > >  1 file changed, 10 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
> > > 98e7765..123e04a
> > > 100644
> > > --- a/lib/dpif-netdev.c
> > > +++ b/lib/dpif-netdev.c
> > > @@ -4572,8 +4572,7 @@ static inline size_t  emc_processing(struct
> > > dp_netdev_pmd_thread *pmd,
> > > struct dp_packet_batch *packets_,
> > > struct netdev_flow_key *keys,
> > > -   struct packet_batch_per_flow batches[], size_t *n_batches,
> > > -   bool md_is_valid, odp_port_t port_no)
> > > +   struct packet_batch_per_flow batches[], size_t
> > > + *n_batches)
> > >  {
> > >  struct emc_cache *flow_cache = &pmd->flow_cache;
> > >  struct netdev_flow_key *key = &keys[0]; @@ -4601,9 +4600,6 @@
> > > emc_processing(struct dp_netdev_pmd_thread *pmd,
> > >  pkt_metadata_prefetch_init(&packets[i+1]->md);
> > >  }
> > >
> > > -if (!md_is_valid) {
> > > -pkt_metadata_init(&packet->md, port_no);
> > > -}
> > >  miniflow_extract(packet, &key->mf);
> > >  key->len = 0; /* Not computed yet. */
> > >  key->hash = dpif_netdev_packet_get_rss_hash(packet,
> > > &key->mf); @@ -4805,8 +4801,7 @@ fast_path_processing(struct
> > > dp_netdev_pmd_thread *pmd,
> > >   * valid, 'md_is_valid' must be true and 'port_no' will be ignored.
> > > */  static void  dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
> > > -  struct dp_packet_batch *packets,
> > > -  bool md_is_valid, odp_port_t port_no)
> > > +  struct dp_packet_batch *packets)
> > >  {
> > >  int cnt = packets->count;
> > >  #if !defined(__CHECKER__) && !defined(_WIN32) @@ -4823,8 +4818,7
> > @@
> > > dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
> > >  odp_port_t in_port;
> > >
> > >  n_batches = 0;
> > > -emc_processing(pmd, packets, keys, batches, &n_batches,
> > > -md_is_valid, port_no);
> > > +emc_processing(pmd, packets, keys, batches, &n_batches);
> > >  if (!dp_packet_batch_is_empty(packets)) {
> > >  /* Get ingress port from first packet's metadata. */
> > >  in_port = packets->packet

Re: [ovs-dev] [RFC] Basic active-active HA for ovn-northd

2017-08-01 Thread Numan Siddique

On Tue, Aug 1, 2017 at 5:00 AM, Han Zhou  wrote:

> On Mon, Jul 31, 2017 at 1:53 PM, Russell Bryant  wrote:
> >
> > I wanted to share the idea before I code it to see if it makes sense.
> > I imagine the patch would be small, though.
> >
> > We currently provide HA for ovn-northd by using Pacemaker to ensure
> > that ovn-northd is running only one time somewhere in a cluster.
> >
>

In the case of Pacemaker, the pacemaker OCF resource script
 (ovndb-servers.ocf) starts the ovn-northd on the master node if
manage_northd is set to true and this is the approach tripleo has taken.
ovn-northd uses the unix sockets to communicate to the NB and SB db servers.
With the suggested approach, ovn-northd would be using tcp connection to
communicate to the db servers ?

Thanks
Numan




> What if we made ovn-northd acquire an OVSDB lock on the southbound
> > database before it did any real work?  That way we could start
> > multiple copies of ovn-northd in a cluster, but only one would be
> > active at a time.
> >
> > This is crude, and obviously we would want to distribute work among
> > ovn-northd instances eventually, but does this sound like an
> > improvement over requiring something like Pacemaker?
> >
> Russell, this sounds very good. I think it is better than Pacemaker. I
> would still call it active-standby though. The standby is hot-standby, not
> cold-standby.
>
> It seems the change would be having the active one updating  northd hostname> periodically to southbound DB, so that other northd will
> know if it is still alive, and otherwise taking over by updating the
> . Is this correct?
>



> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2 3/5] dpif-netdev: Skip EMC lookup/insert for recirc packets.

2017-08-01 Thread O Mahony, Billy

Hi Antonio,

Unfortunately I think the performance deltas of this here probably need to be 
re-worked given the bug discovered & fixed in EMC Insertion algorithm here 
which according to the patch notes will significantly reduce EMC contention for 
a given number of flows.

https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336452.html

However, before you commit more effort I would like to post a proposal to the 
list on a more generalized EMC load-shedding mechanism which I think could be 
more effective as it would be more granular than shedding just re-circulated 
traffic. I hope to post that today. 

Regards,
/Billy

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of antonio.fische...@intel.com
> Sent: Wednesday, July 19, 2017 5:05 PM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH v2 3/5] dpif-netdev: Skip EMC lookup/insert for
> recirc packets.
> 
> When OVS is configured as a firewall, with thousands of active concurrent
> connections, the EMC gets quicly saturated and may come under heavy
> thrashing for the reason that original and recirculated packets keep overwrite
> existing active EMC entries due to its limited size (8k).
> 
> This thrashing causes the EMC to be less efficient than the dcpls in terms of
> lookups and insertions.
> 
> This patch allows to use the EMC efficiently by allowing only the 'original'
> packets to hit EMC. All recirculated packets are sent to the classifier 
> directly.
> An empirical threshold (EMC_RECIRCT_NO_INSERT_THRESHOLD - of 50%) for
> EMC occupancy is set to trigger this logic. By doing so when EMC utilization
> exceeds
> EMC_RECIRCT_NO_INSERT_THRESHOLD:
>  - EMC Insertions are allowed just for original packets. EMC insertion
>and look up is skipped for recirculated packets.
>  - Recirculated packets are sent to the classifier.
> 
> This patch is based on patch
> "dpif-netdev: add EMC entry count and %full figure to pmd-stats-show" at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327570.html
> Also, this patch depends on the previous one in this series.
> 
> Signed-off-by: Antonio Fischetti 
> Signed-off-by: Bhanuprakash Bodireddy
> 
> Co-authored-by: Bhanuprakash Bodireddy
> 
> ---
> In our Connection Tracker testbench set up with
> 
>  table=0, priority=1 actions=drop
>  table=0, priority=10,arp actions=NORMAL  table=0, priority=100,ct_state=-
> trk,ip actions=ct(table=1)  table=1, ct_state=+new+trk,ip,in_port=1
> actions=ct(commit),output:2  table=1, ct_state=+est+trk,ip,in_port=1
> actions=output:2  table=1, ct_state=+new+trk,ip,in_port=2 actions=drop
> table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1
> 
> we saw the following performance improvement.
> 
> We measured packet Rx rate (regardless of packet loss). Bidirectional test
> with 64B UDP packets.
> Each row is a test with a different number of traffic streams. The traffic
> generator is set so that each stream establishes one UDP connection.
> Mpps columns reports the Rx rates on the 2 sides.
> 
>   +--+---+
>   |  Original OvS-DPDK   |Previous case  |
>   |  + patches #1,2  |+ this patch   |
>  -++-++--+
>   Traffic | Rx |   EMC   | Rx |   EMC|
>   Streams |   [Mpps]   | entries |   [Mpps]   | entries  |
>  -++-++--+
>   10  | 2.60, 2.67 |20   | 2.60, 2.64 |20|
>  100  | 2.53, 2.58 |   200   | 2.59, 2.61 |   201|
>1,000  | 2.02, 2.03 |  1929   | 2.15, 2.15 |  1997|
>2,000  | 1.94, 1.96 |  3661   | 1.97, 1.98 |  3668|
>3,000  | 1.87, 1.90 |  5086   | 1.96, 1.98 |  4736|
>4,000  | 1.82, 1.82 |  6173   | 1.95, 1.94 |  5280|
>   10,000  | 1.68, 1.69 |  7826   | 1.84, 1.84 |  7102|
>   30,000  | 1.57, 1.58 |  8192   | 1.68, 1.70 |  8192|
>  -++-++--+
> 
> This test setup implies 1 recirculation on each received packet.
> We didn't check this patch in a test scenario where more than 1 recirculation
> is occurring per packet.
> 
>  lib/dpif-netdev.c | 63
> ++-
>  1 file changed, 58 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 9562827..79efce6
> 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4573,6 +4573,9 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
>  packet_batch_per_flow_update(batch, pkt, mf);  }
> 
> +/* Threshold to skip EMC for recirculated packets. */ #define
> +EMC_RECIRCT_NO_INSERT_THRESHOLD 0xF000
> +
>  /* Try to process all ('cnt') the 'packets' using only the exact match cache
>   * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
>   * miniflow is copied into 'keys' and the packet pointer is moved at the @@ -
> 4620,15

Re: [ovs-dev] [PATCH v4] Update relevant artifacts to add support for DPDK 17.05.1.

2017-08-01 Thread Timothy M. Redaelli


On 07/25/2017 02:34 PM, Michal Weglicki wrote:

Upgrading to DPDK 17.05.1 stable release adds new
significant features relevant to OVS, including,
but not limited to:
- tun/tap PMD,
- VFIO hotplug support,
- Generic flow API.

Following changes are applied:
- netdev-dpdk: Changes required by DPDK API modifications.
- doc: Because of DPDK API changes, backward compatibility
   with previous DPDK releases will be broken, thus all
   relevant documentation entries are updated.
- .travis: DPDK version change from 16.11.1 to 17.05.1.
- rhel/openvswitch-fedora.spec.in: DPDK version change
   from 16.11 to 17.05.1

v1->v2: Patch rebase.
v2->v3: Fixed wrong formating after v2 patch rebase.
v3->v4: Minor documentation changes.

Signed-off-by: Michal Weglicki 
Reviewed-by: Aaron Conole 
---
  .travis/linux-build.sh   |   2 +-
  Documentation/faq/releases.rst   |   1 +
  Documentation/howto/dpdk.rst |   6 +-
  Documentation/intro/install/dpdk.rst |  14 +--
  Documentation/topics/dpdk/vhost-user.rst |  12 +--
  NEWS |   1 +
  lib/netdev-dpdk.c| 144 +++
  rhel/openvswitch-fedora.spec.in  |   2 +-
  tests/dpdk/ring_client.c |   6 +-
  9 files changed, 114 insertions(+), 74 deletions(-)


[...]

diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index 3a045d3..2bb7102 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -84,7 +84,7 @@ BuildRequires: libcap-ng libcap-ng-devel
  %endif
  %if %{with dpdk}
  BuildRequires: libpcap-devel numactl-devel
-BuildRequires: dpdk-devel >= 16.11
+BuildRequires: dpdk-devel >= 17.05.1
  Provides: %{name}-dpdk = %{version}-%{release}
  %endif
 



Hi,
on Fedora there is no dpdk-devel 17.05.1 package, the last rawhide 
version is dpdk-devel 17.05 (and usually Fedora doesn't package minor 
releases of dpdk).


I tried to build openvswitch with the dpdk 17.05 package from rawhide.
It build and it works (tested with dpdkvhostuserclient) so, unless you 
need a specific fix from 17.05.1, I suggest to change the BuildRequires 
to 17.05 instead of 17.05.1 or the package cannot be built on Fedora, 
unless you are using an additional repository (copr).

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

91 matches

Mail list logo