Re: [ovs-dev] [PATCH 1/2] ovn: Avoid tunneling for VLAN packets redirected to a gateway chassis

2018-10-12 Thread Mark Michelson

Hi Numan,

The patch does a good job of explaining the routing behavior and the 
tunneling problem solved within.


Prior to the patch, you can have a distributed gateway router with a 
redirect-chassis port set on it. This allows for east-west traffic to 
have an optimal direct path between hypervisors, but for the north-south 
use case, when the traffic is redirected to the redirect-chassis, the 
traffic is encapsulated.


With this patch, you add the reside-on-redirect-chassis option to router 
ports. This essentially makes all traffic destined for the router port 
get redirected to the gateway chassis prior to running the router 
pipeline. This removes the encapsulation issue, but it also means that 
east-west traffic is now also centralized.


I'm curious what the current behavior is when you specify a gateway 
router by setting options:chassis. Specifically, I'm curious about how 
it compares if you define a router where the "external" port has 
options:redirect-chassis set on it and all other ports have 
options:reside-on-redirect-chassis set on them. Have you essentially 
just created the same thing? Or is there some subtle difference?


On 10/05/2018 01:14 PM, nusid...@redhat.com wrote:

From: Numan Siddique 

An OVN deployment can have multiple logical switches each with a
localnet port connected to a distributed logical router with one
logical router port providing external connectivity (provider network)
and others used as tenant networks with VLAN tagging.

As reported in [1], external traffic from these VLAN tenant networks
are tunnelled to the gateway chassis (chassis hosting a distributed
gateway port which applies NAT rules). As part of the discussion in
[1], there were few possible solutions proposed by Russell [2]. This
patch implements the first option in [2].

With this patch, a new option 'reside-on-redirect-chassis' in 'options'
column of Logical_Router_Port table is added. If the value of this
option is set to 'true' and if the logical router also have a
distributed gateway port, then routing for this logical router port
is centralized in the chassis hosting the distributed gateway port.

If a logical switch 'sw0' is connected to a router 'lr0' with the
router port - 'lr0-sw0' with the address - "00:00:00:00:af:12 192.168.1.1"
, and it has a distributed logical port - 'lr0-public', then the
below logical flow is added in the logical switch pipeline
of 'sw0' if the 'reside-on-redirect-chassis' option is set on 'lr-sw0' -

table=16(ls_in_l2_lkup), priority=50,
match=(eth.dst == 00:00:00:00:af:12 && is_chassis_resident("cr-lr0-public")),
action=(outport = "sw0-lr0"; output;)

With the above flow, the packet doesn't enter the router pipeline in
the source chassis. Instead the packet is sent out via the localnet
port of 'sw0'. The gateway chassis upon receiving this packet, runs
the logical router pipeline applying NAT rules and sends the traffic
out via the localnet port of the provider network. The gateway
chassis will also reply to the ARP requests for the router port IPs.

With this approach, we avoid redirecting the external traffic to the
gateway chassis via the tunnel port. There are a couple of drawbacks
with this approach:

   - East - West routing is no more distributed for the VLAN tenant
 networks if 'reside-on-redirect-chassis' option is defined

   - 'dnat_and_snat' NAT rules with 'logical_mac' and 'logical_port'
 columns defined will not work for the VLAN tenant networks.

This approach is taken for now as it is simple. If there is a requirement
to support distributed routing for these VLAN tenant networks, we
can explore other possible solutions.

[1] -  https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
[2] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html

Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
Reported-by: venkata anil 
Co-authored-by: venkata anil 
Signed-off-by: Numan Siddique 
Signed-off-by: venkata anil 
---
  ovn/northd/ovn-northd.8.xml |  30 
  ovn/northd/ovn-northd.c |  71 +++---
  ovn/ovn-architecture.7.xml  | 160 +
  ovn/ovn-nb.xml  |  43 ++
  tests/ovn.at| 273 
  5 files changed, 561 insertions(+), 16 deletions(-)

diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 7352c6764..f52699bd3 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -874,6 +874,25 @@ output;
  resident.

  
+
+
+  For the Ethernet address on a logical switch port of type
+  router, when that logical switch port's
+   column is set to router and
+  the connected logical router port specifies a
+  reside-on-redirect-chassis and the logical router
+  to which the connected logical router port belongs to has a
+  redirect-chassis distributed ga

Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Avoid deadlock on multicast snooping recursion.

2018-10-12 Thread Yifeng Sun
Looks good to me, thanks.

Reviewed-by: Yifeng Sun 

On Mon, Aug 20, 2018 at 8:26 PM Ben Pfaff  wrote:

> Until now, OVS did multicast snooping outputs holding the read-lock on
> the mcast_snooping object.  This could recurse via a patch port to try to
> take the write-lock on the same object, which deadlocked.  This patch fixes
> the problem, by releasing the read-lock before doing any outputs.
>
> It would probably be better to use RCU for mcast_snooping.  That would be
> a bigger patch and less suitable for backporting.
>
> Reported-by: Sameh Elsharkawy
> Reported-at: https://github.com/openvswitch/ovs-issues/issues/153
> Signed-off-by
> : Ben
> Pfaff 
> ---
>  ofproto/ofproto-dpif-xlate.c | 102
> +++
>  1 file changed, 84 insertions(+), 18 deletions(-)
>
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index e26f6c8f554a..7701b64a2451 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -534,6 +534,8 @@ static void do_xlate_actions(const struct ofpact *,
> size_t ofpacts_len,
>  static void clone_xlate_actions(const struct ofpact *, size_t ofpacts_len,
>  struct xlate_ctx *, bool, bool);
>  static void xlate_normal(struct xlate_ctx *);
> +static void xlate_normal_flood(struct xlate_ctx *ct,
> +   struct xbundle *in_xbundle, struct xvlan
> *);
>  static void xlate_table_action(struct xlate_ctx *, ofp_port_t in_port,
> uint8_t table_id, bool may_packet_in,
> bool honor_table_miss, bool with_ct_orig,
> @@ -2690,6 +2692,53 @@ update_mcast_snooping_table(const struct xlate_ctx
> *ctx,
>  }
>  ovs_rwlock_unlock(&ms->rwlock);
>  }
> +
> +/* A list of multicast output ports.
> + *
> + * We accumulate output ports and then do all the outputs afterward.  It
> would
> + * be more natural to do the outputs one at a time as we discover the
> need for
> + * each one, but this can cause a deadlock because we need to take the
> + * mcast_snooping's rwlock for reading to iterate through the port lists
> and
> + * doing an output, if it goes to a patch port, can eventually come back
> to the
> + * same mcast_snooping and attempt to take the write lock (see
> + * https://github.com/openvswitch/ovs-issues/issues/153). */
> +struct mcast_output {
> +/* Discrete ports. */
> +struct xbundle **xbundles;
> +size_t n, allocated;
> +
> +/* If set, flood to all ports. */
> +bool flood;
> +};
> +#define MCAST_OUTPUT_INIT { NULL, 0, 0, false }
> +
> +/* Add 'mcast_bundle' to 'out'. */
> +static void
> +mcast_output_add(struct mcast_output *out, struct xbundle *mcast_xbundle)
> +{
> +if (out->n >= out->allocated) {
> +out->xbundles = x2nrealloc(out->xbundles, &out->allocated,
> +   sizeof *out->xbundles);
> +}
> +out->xbundles[out->n++] = mcast_xbundle;
> +}
> +
> +/* Outputs the packet in 'ctx' to all of the output ports in 'out', given
> input
> + * bundle 'in_xbundle' and the current 'xvlan'. */
> +static void
> +mcast_output_finish(struct xlate_ctx *ctx, struct mcast_output *out,
> +struct xbundle *in_xbundle, struct xvlan *xvlan)
> +{
> +if (out->flood) {
> +xlate_normal_flood(ctx, in_xbundle, xvlan);
> +} else {
> +for (size_t i = 0; i < out->n; i++) {
> +output_normal(ctx, out->xbundles[i], xvlan);
> +}
> +}
> +
> +free(out->xbundles);
> +}
>
>  /* send the packet to ports having the multicast group learned */
>  static void
> @@ -2697,7 +2746,7 @@ xlate_normal_mcast_send_group(struct xlate_ctx *ctx,
>struct mcast_snooping *ms OVS_UNUSED,
>struct mcast_group *grp,
>struct xbundle *in_xbundle,
> -  const struct xvlan *xvlan)
> +  struct mcast_output *out)
>  OVS_REQ_RDLOCK(ms->rwlock)
>  {
>  struct mcast_group_bundle *b;
> @@ -2707,7 +2756,7 @@ xlate_normal_mcast_send_group(struct xlate_ctx *ctx,
>  mcast_xbundle = xbundle_lookup(ctx->xcfg, b->port);
>  if (mcast_xbundle && mcast_xbundle != in_xbundle) {
>  xlate_report(ctx, OFT_DETAIL, "forwarding to mcast group
> port");
> -output_normal(ctx, mcast_xbundle, xvlan);
> +mcast_output_add(out, mcast_xbundle);
>  } else if (!mcast_xbundle) {
>  xlate_report(ctx, OFT_WARN,
>   "mcast group port is unknown, dropping");
> @@ -2723,7 +2772,8 @@ static void
>  xlate_normal_mcast_send_mrouters(struct xlate_ctx *ctx,
>   struct mcast_snooping *ms,
>   struct xbundle *in_xbundle,
> - const struct xvlan *xv

[ovs-dev] Nombreux modèles

2018-10-12 Thread Location ou achat
tranquille

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] dpif-netdev.at: Add datapath flow modification test.

2018-10-12 Thread Ilya Maximets
This test is intended to cover flow_put operation for datapath
flow modifications.

Original bug was reported here:
https://mail.openvswitch.org/pipermail/ovs-dev/2018-September/352579.html
And fixed by commit:
35fe9efb2f02 ("dpif-netdev: Add vlan to mask for flow_put operation.")

Signed-off-by: Ilya Maximets 
---

Ben, this test based on your test script. So, I'd like to add

Co-authored-by: Ben Pfaff 
Signed-off-by: Ben Pfaff 

if you agree to sign-off this.

 tests/dpif-netdev.at | 54 
 1 file changed, 54 insertions(+)

diff --git a/tests/dpif-netdev.at b/tests/dpif-netdev.at
index 378292dac..6915d43ba 100644
--- a/tests/dpif-netdev.at
+++ b/tests/dpif-netdev.at
@@ -149,6 +149,60 @@ 
skb_priority(0),skb_mark(0),ct_state(-new-est-rel-rpl-inv-trk-snat-dnat),ct_zone
 DPIF_NETDEV_MISS_FLOW_INSTALL([dummy])
 DPIF_NETDEV_MISS_FLOW_INSTALL([dummy-pmd])
 
+m4_define([DPIF_NETDEV_FLOW_PUT_MODIFY],
+  [AT_SETUP([dpif-netdev - datapath flow modification - $1])
+   OVS_VSWITCHD_START(
+ [add-port br0 p1 -- set interface p1 type=$1 ofport_request=1 
options:pstream=punix:$OVS_RUNDIR/p1.sock -- \
+  add-port br0 p2 -- set interface p2 type=$1 ofport_request=2 
options:pstream=punix:$OVS_RUNDIR/p2.sock -- \
+  set bridge br0 datapath-type=dummy \
+ other-config:datapath-id=1234 fail-mode=secure], [], [],
+  [m4_if([$1], [dummy-pmd], [--dummy-numa="0,0,0,0,1,1,1,1"], [])])
+   AT_CHECK([ovs-appctl vlog/set dpif:file:dbg dpif_netdev:file:dbg])
+
+   # Add a flow that directs some packets received on p1 to p2 and the
+   # rest back out p1.
+   AT_CHECK([ovs-ofctl del-flows br0])
+   AT_CHECK([ovs-ofctl add-flow br0 
priority=1,ip,in_port=1,dl_src=00:06:07:08:09:0a,dl_dst=00:01:02:03:04:05,actions=output:2])
+   AT_CHECK([ovs-ofctl add-flow br0 priority=0,in_port=1,actions=IN_PORT])
+
+   # Inject a packet of the form that should go to p2.
+   
packet="in_port(1),packet_type(ns=0,id=0),eth(src=00:06:07:08:09:0a,dst=00:01:02:03:04:05),eth_type(0x8100),vlan(vid=1000,pcp=5),encap(eth_type(0x0800),ipv4(src=127.0.0.1,dst=127.0.0.1,proto=0,tos=0,ttl=64,frag=no))"
+   AT_CHECK([ovs-appctl netdev-dummy/receive p1 $packet --len 64], [0])
+
+   OVS_WAIT_UNTIL([grep "miss upcall" ovs-vswitchd.log])
+   AT_CHECK([grep -A 1 'miss upcall' ovs-vswitchd.log | tail -n 1], [0], [dnl
+skb_priority(0),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),recirc_id(0),dp_hash(0),in_port(1),packet_type(ns=0,id=0),eth(src=00:06:07:08:09:0a,dst=00:01:02:03:04:05),eth_type(0x8100),vlan(vid=1000,pcp=5),encap(eth_type(0x0800),ipv4(src=127.0.0.1,dst=127.0.0.1,proto=0,tos=0,ttl=64,frag=no))
+])
+   ovs-appctl revalidator/wait
+   # Dump the datapath flow to see that it goes to p2 ("actions:2").
+   AT_CHECK([ovs-appctl dpif/dump-flows br0], [0], [dnl
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=00:06:07:08:09:0a,dst=00:01:02:03:04:05),eth_type(0x8100),vlan(vid=1000,pcp=5),encap(eth_type(0x0800),ipv4(frag=no)),
 packets:0, bytes:0, used:never, actions:2
+])
+
+   # Delete the flows, then add new flows that would not match the same
+   # packet as before.
+   AT_CHECK([ovs-ofctl del-flows br0])
+   AT_CHECK([ovs-ofctl add-flow br0 
priority=1,in_port=1,dl_src=00:06:07:08:09:0a,dl_dst=00:01:02:03:04:05,dl_type=0x0801,actions=output:2])
+   AT_CHECK([ovs-ofctl add-flow br0 priority=0,in_port=1,actions=IN_PORT])
+
+   # Wait for flow revalidation
+   ovs-appctl revalidator/wait
+
+   # Inject the same packet again.
+   AT_CHECK([ovs-appctl netdev-dummy/receive p1 $packet --len 64])
+
+   ovs-appctl revalidator/wait
+   # Dump the datapath flow to see that it goes to p1 ("actions:IN_PORT").
+   AT_CHECK([ovs-appctl dpif/dump-flows br0 | strip_timers], [0], [dnl
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=00:06:07:08:09:0a,dst=00:01:02:03:04:05),eth_type(0x8100),vlan(vid=1000,pcp=5),encap(eth_type(0x0800),ipv4(frag=no)),
 packets:1, bytes:64, used:0.0s, actions:1
+])
+   OVS_VSWITCHD_STOP
+   AT_CLEANUP])
+
+DPIF_NETDEV_FLOW_PUT_MODIFY([dummy])
+DPIF_NETDEV_FLOW_PUT_MODIFY([dummy-pmd])
+
+
 m4_define([DPIF_NETDEV_MISS_FLOW_DUMP],
   [AT_SETUP([dpif-netdev - miss upcall key matches flow_dump - $1])
OVS_VSWITCHD_START(
-- 
2.17.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] OVS DPDK: dpdk_merge pull request for master

2018-10-12 Thread Stokes, Ian
Hi Ben,

The following changes since commit 4617d1f6bd24c543f533f6485b42ebca6b0a8371:

  Work around Python/C JSON unicode differences (2018-10-11 15:00:59 -0700)

are available in the git repository at:

  https://github.com/istokes/ovs dpdk_merge

for you to fetch changes up to c92ccb42bdb544dc5a9fde1fa59826563905e09b:

  system-dpdk: Connect network namespaces via dpdkvhostuser ports (2018-10-12 
15:24:36 +0100)


Aaron Conole (3):
  system-dpdk: Update test suite for non-phy testing
  system-dpdk: Allow running the dpdk tests from a VM
  system-dpdk: Use a different character marker for sed commands

Bala Sankaran (3):
  system-dpdk: Skip all tests if there are no hugepages
  system-dpdk: Convert /tmp to use OVS_RUNDIR
  system-dpdk: Connect network namespaces via dpdkvhostuser ports

Flavio Leitner (1):
  dp-packet.h: move funcs to be within cond block

Ilya Maximets (3):
  dpif-netdev-unixctl: Change 'masked' to 'megaflow'.
  dpif-netdev-perf: Print SMC statistics.
  dpif-netdev-perf: Clarify frequency number.

 Documentation/topics/testing.rst |  14 ---
 lib/dp-packet.h  | 260 
++--
 lib/dpif-netdev-perf.c   |  39 --
 lib/dpif-netdev-perf.h   |   2 +-
 lib/dpif-netdev-unixctl.man  |  31 +---
 manpages.mk  |   2 ++
 tests/system-dpdk-macros.at  |  20 
 tests/system-dpdk.at | 191 

 vswitchd/ovs-vswitchd.8.in   |   6 +
 9 files changed, 391 insertions(+), 174 deletions(-)

Thanks
Ian

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v7 3/3] revalidator: Rebalance offloaded flows based on the pps rate

2018-10-12 Thread Kevin Traynor
On 09/27/2018 07:48 AM, Sriharsha Basavapatna via dev wrote:
> This is the third patch in the patch-set to support dynamic rebalancing
> of offloaded flows.
> 
> The dynamic rebalancing functionality is implemented in this patch. The
> ukeys that are not scheduled for deletion are obtained and passed as input
> to the rebalancing routine. The rebalancing is done in the context of
> revalidation leader thread, after all other revalidator threads are
> done with gathering rebalancing data for flows.
> 
> For each netdev that is in OOR state, a list of flows - both offloaded
> and non-offloaded (pending) - is obtained using the ukeys. For each netdev
> that is in OOR state, the flows are grouped and sorted into offloaded and
> pending flows.  The offloaded flows are sorted in descending order of
> pps-rate, while pending flows are sorted in ascending order of pps-rate.
> 
> The rebalancing is done in two phases. In the first phase, we try to
> offload all pending flows and if that succeeds, the OOR state on the device
> is cleared. If some (or none) of the pending flows could not be offloaded,
> then we start replacing an offloaded flow that has a lower pps-rate than
> a pending flow, until there are no more pending flows with a higher rate
> than an offloaded flow. The flows that are replaced from the device are
> added into kernel datapath.
> 
> A new OVS configuration parameter "offload-rebalance", is added to ovsdb.
> The default value of this is "false". To enable this feature, set the
> value of this parameter to "true", which provides packets-per-second
> rate based policy to dynamically offload and un-offload flows.
> 
> Note: This option can be enabled only when 'hw-offload' policy is enabled.
> It also requires 'tc-policy' to be set to 'skip_sw'; otherwise, flow
> offload errors (specifically ENOSPC error this feature depends on) reported
> by an offloaded device are supressed by TC-Flower kernel module.
> 
> Signed-off-by: Sriharsha Basavapatna 
> Co-authored-by: Venkat Duvvuru 
> Signed-off-by: Venkat Duvvuru 
> Reviewed-by: Sathya Perla 
> ---

A few of minor comments below, nothing worth holding up a merge over -
could be addressed in follow up if required.



>  
> +static void
> +udpif_run_flow_rebalance(struct udpif *udpif)
> +{
> +long long int now = 0;
> +
> +/* Don't rebalance if OFFL_REBAL_INTVL_MSEC have not elapsed */
> +now = time_msec();
> +if (now < udpif->offload_rebalance_time + OFFL_REBAL_INTVL_MSEC) {
> +return;
> +}
> +
> +if (!netdev_any_oor()) {

won't you continually check this when there are no oor netdevs as
offload_rebalance_time is not updated - how about setting
'offload_rebalance_time = now' here too, or you want it to check
continually?

> +return;
> +}
> +
> +VLOG_DBG("Offload rebalance: Found OOR netdevs");

Seems like this could easily be a persistent state, do you want to log
it each time this code runs? perhaps just when the state changes?

> +udpif->offload_rebalance_time = now;
> +udpif_flow_rebalance(udpif);
> +}
> +
>  static void *
>  udpif_revalidator(void *arg)
>  {
> @@ -933,6 +963,9 @@ udpif_revalidator(void *arg)
>  
>  dpif_flow_dump_destroy(udpif->dump);
>  seq_change(udpif->dump_seq);
> +if (netdev_is_offload_rebalance_policy_enabled()) {
> +udpif_run_flow_rebalance(udpif);
> +}
>  
>  duration = MAX(time_msec() - start_time, 1);
>  udpif->dump_duration = duration;
> @@ -977,7 +1010,7 @@ udpif_revalidator(void *arg)
>  
>  return NULL;
>  }
> -
> +
>  static enum upcall_type
>  classify_upcall(enum dpif_upcall_type type, const struct nlattr *userdata,
>  struct user_action_cookie *cookie)
> @@ -1579,7 +1612,7 @@ handle_upcalls(struct udpif *udpif, struct upcall 
> *upcalls,
>  for (i = 0; i < n_ops; i++) {
>  opsp[n_opsp++] = &ops[i].dop;
>  }
> -dpif_operate(udpif->dpif, opsp, n_opsp);
> +dpif_operate(udpif->dpif, opsp, n_opsp, DPIF_OFFLOAD_AUTO);
>  for (i = 0; i < n_ops; i++) {
>  struct udpif_key *ukey = ops[i].ukey;
>  
> @@ -1671,13 +1704,14 @@ ukey_create__(const struct nlattr *key, size_t 
> key_len,
>  ukey->state = UKEY_CREATED;
>  ukey->state_thread = ovsthread_id_self();
>  ukey->state_where = OVS_SOURCE_LOCATOR;
> -ukey->created = time_msec();
> +ukey->created = ukey->flow_time = time_msec();
>  memset(&ukey->stats, 0, sizeof ukey->stats);
>  ukey->stats.used = used;
>  ukey->xcache = NULL;
>  
>  ukey->offloaded = false;
>  ukey->flow_time = 0;

this is leftover from 2/3 and can be removed, flow_time is set a few
lines above.

> +ukey->in_netdev = NULL;
>  ukey->flow_packets = ukey->flow_backlog_packets = 0;
>  
>  ukey->key_recirc_id = key_recirc_id;
> @@ -2329,7 +2363,7 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, 
> size_t n_ops)
>  for (i = 0; i < n_ops; i++) {
>   

Re: [ovs-dev] OVS DPDK: dpdk_merge pull request for master

2018-10-12 Thread Ben Pfaff
On Fri, Oct 12, 2018 at 03:26:57PM +, Stokes, Ian wrote:
> Hi Ben,
> 
> The following changes since commit 4617d1f6bd24c543f533f6485b42ebca6b0a8371:
> 
>   Work around Python/C JSON unicode differences (2018-10-11 15:00:59 -0700)
> 
> are available in the git repository at:
> 
>   https://github.com/istokes/ovs dpdk_merge
> 
> for you to fetch changes up to c92ccb42bdb544dc5a9fde1fa59826563905e09b:
> 
>   system-dpdk: Connect network namespaces via dpdkvhostuser ports (2018-10-12 
> 15:24:36 +0100)

Thanks, Ian.  I merged this to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS DPDK: dpdk_merge pull request for branch-2.10

2018-10-12 Thread Ben Pfaff
On Fri, Oct 12, 2018 at 03:33:47PM +, Stokes, Ian wrote:
> Hi Ben,
> 
> The following changes since commit 9eaf09872cb96d4ab3fe80d23032d480442c4c2b:
> 
>   ovn-controller: Support processing DHCPv6 information request message type 
> (2018-10-11 17:52:43 -0700)
> 
> are available in the git repository at:
> 
>   https://github.com/istokes/ovs dpdk_merge_2_10
> 
> for you to fetch changes up to ce3f8ea195451fcfe66dd2bda86a1ab5c4ab5533:
> 
>   dpif-netdev-perf: Print SMC statistics. (2018-10-12 15:40:12 +0100)

Thanks, Ian.  I merged this to branch-2.10.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] OVS DPDK: dpdk_merge pull request for branch-2.10

2018-10-12 Thread Stokes, Ian
Hi Ben,

The following changes since commit 9eaf09872cb96d4ab3fe80d23032d480442c4c2b:

  ovn-controller: Support processing DHCPv6 information request message type 
(2018-10-11 17:52:43 -0700)

are available in the git repository at:

  https://github.com/istokes/ovs dpdk_merge_2_10

for you to fetch changes up to ce3f8ea195451fcfe66dd2bda86a1ab5c4ab5533:

  dpif-netdev-perf: Print SMC statistics. (2018-10-12 15:40:12 +0100)


Ilya Maximets (2):
  dpif-netdev-unixctl: Change 'masked' to 'megaflow'.
  dpif-netdev-perf: Print SMC statistics.

 lib/dpif-netdev-perf.c  | 3 +++
 lib/dpif-netdev-perf.h  | 2 +-
 lib/dpif-netdev-unixctl.man | 3 ++-
 3 files changed, 6 insertions(+), 2 deletions(-)

Thanks
Ian

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Queries on LTS Release & OVSDB Replication featureset

2018-10-12 Thread Ben Pfaff
I don't see any evidence that you've attempted to answer these questions
yourself.  Most of it can be easily figured out from the OVS Git
repository.

On Fri, Oct 12, 2018 at 09:49:52AM +, kishore.yetik...@dell.com wrote:
> Dear OVS Team:
>   Gentle reminder. We need this info at the earliest to decide on the right 
> release to be chosen for our projects.
> 
> Regards
> Kishore
> 
> From: Yetikuri, Kishore
> Sent: Thursday, October 11, 2018 2:24 PM
> To: 'd...@openvswitch.org'
> Cc: Ghalam, Joe; Gada, Bhavini; Ayalasomayajul, Mohan; Nisar, Bhavesh; Lazar, 
> Mihai
> Subject: Queries on LTS Release & OVSDB Replication featureset
> 
> Dear OVS Team:
>   We are interested in using OVSDB replication feature released as patch 
> (603155) with a LTS release. 
> Towards achieving that, we request following info from you:
> 
> Current LTS release - 2.5.x
> 
> -  Does current LTS release (2.5.5) contain this patch?
> 
> -  If i2.5.x does not contain this patch, is it possible to release 
> 2.5.6 with this patch?
> 
> -  Is this patch (603155) 
> compatible with 2.5.x? If we install this patch on top of 2.5.x, can we 
> continue to get support as expected of a LTS release?
> 
> Next LTS release
> 
> -  Pls indicate ETA for next LTS release
> 
> -  Is there any plan to convert 2.10.x into a LTS release?
> 
> 
> The above info will greatly help us in picking the right release matching all 
> our requirements.
> 
> Looking forward to hearing from you.
> 
> Regards
> Kihosre
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/6] connmgr: Suppress duplicate port status notifications.

2018-10-12 Thread Ben Pfaff
On Fri, Aug 24, 2018 at 10:35:16AM -0700, Ben Pfaff wrote:
> When the status of a port changes, ofproto calls into connmgr to notify
> controllers.  Sometimes, particular changes are only visible to controllers
> running specific versions of OpenFlow.  Until now, OVS would send those
> controllers duplicate port status notifications.  This is unnecessary and
> somewhat confusing.  This commit eliminates it.
> 
> This commit updates one of the tests not to expect duplicate notifications.
> 
> Signed-off-by: Ben Pfaff 

This series still needs a review.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Possible Regression due to "ossfuzz: Break flow test target into two targets to speed up fuzzing."

2018-10-12 Thread Ben Pfaff
On Fri, Oct 12, 2018 at 03:44:51PM +0300, Ilya Maximets wrote:
> > Hi,
> > 
> > it seems that travis-ci is failing due to a testsuite regression introduced
> > by 1adcbcee8f4c ("ossfuzz: Break flow test target into two targets to speed
> > up fuzzing.").
> > 
> > https://travis-ci.org/openvswitch/ovs/jobs/439811394
> 
> Hi Simon,
> 
> As it was already said, this is not the patch that produces the issue.
> I'm able to reproduce '2649: ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR'
> test failure locally by restricting to use only one CPU core with taskset.
> So, I bisected the issue and found that the fist patch that fails is:
> 
>   2e5cdb4b1392 ("OVN: add buffering support for ip packets").
> 
> Issue happens only if single core assigned:
> 
>   taskset -c 2   make check TESTSUITEFLAGS='2649'  #FAILS
>   taskset -c 2-3 make check TESTSUITEFLAGS='2649'  #OK
> 
> As Travis has no much CPU resources it fails all the time.
> 
> I'm not much familiar with OVN code/tests, hope above information will
> be useful for your investigation.

I can reproduce this too, with your hint (although it's test 3309, not
2649).

The failure is because a lot of expected packets don't show up:

...
rcv_n=212 exp_n=233
ovn.at:12: wait failed after 10 seconds

It needs more looking into.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 8/8] ofproto: Handle flow monitor requests with multiple parts.

2018-10-12 Thread Ben Pfaff
On Thu, Aug 30, 2018 at 01:00:56PM -0700, Ben Pfaff wrote:
> Signed-off-by: Ben Pfaff 

This series still needs a review.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] dpif-netdev.at: Add missing backslash.

2018-10-12 Thread Ilya Maximets
Lines splitted without '\' and the second line is never executed.

Fixes: b10d46a60013 ("tests: Check dpif-netdev odp_actions consistency.")
Signed-off-by: Ilya Maximets 
---
 tests/dpif-netdev.at | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/tests/dpif-netdev.at b/tests/dpif-netdev.at
index fff395d56..378292dac 100644
--- a/tests/dpif-netdev.at
+++ b/tests/dpif-netdev.at
@@ -110,8 +110,10 @@ DPIF_NETDEV_DUMMY_IFACE([dummy-pmd])
 m4_define([DPIF_NETDEV_MISS_FLOW_INSTALL],
   [AT_SETUP([dpif-netdev - miss upcall key matches flow_install - $1])
OVS_VSWITCHD_START(
- [add-port br0 p1 -- set interface p1 type=$1 
options:pstream=punix:$OVS_RUNDIR/p0.sock
-  set bridge br0 datapath-type=dummy other-config:datapath-id=1234 
fail-mode=secure], [], [],
+ [add-port br0 p1 \
+  -- set interface p1 type=$1 options:pstream=punix:$OVS_RUNDIR/p0.sock \
+  -- set bridge br0 datapath-type=dummy \
+other-config:datapath-id=1234 fail-mode=secure], [], 
[],
   [m4_if([$1], [dummy-pmd], [--dummy-numa="0,0,0,0,1,1,1,1"], [])])
AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg])
 
@@ -150,8 +152,10 @@ DPIF_NETDEV_MISS_FLOW_INSTALL([dummy-pmd])
 m4_define([DPIF_NETDEV_MISS_FLOW_DUMP],
   [AT_SETUP([dpif-netdev - miss upcall key matches flow_dump - $1])
OVS_VSWITCHD_START(
- [add-port br0 p1 -- set interface p1 type=$1 
options:pstream=punix:$OVS_RUNDIR/p0.sock
-  set bridge br0 datapath-type=dummy other-config:datapath-id=1234 
fail-mode=secure], [], [],
+ [add-port br0 p1 \
+  -- set interface p1 type=$1 options:pstream=punix:$OVS_RUNDIR/p0.sock \
+  -- set bridge br0 datapath-type=dummy \
+other-config:datapath-id=1234 fail-mode=secure], [], 
[],
   [m4_if([$1], [dummy-pmd], [--dummy-numa="0,0,0,0,1,1,1,1"], [])])
AT_CHECK([ovs-appctl upcall/disable-ufid], [0], [Datapath dumping tersely 
using UFID disabled
 ], [])
-- 
2.17.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [dpdk-howl PATCH v5 1/2] netdev-dpdk: Upgrade to dpdk v18.08

2018-10-12 Thread Kevin Traynor
On 10/10/2018 05:14 PM, Ophir Munk wrote:
> 1. Enable compilation and linkage with dpdk 18.08.0
> The following dpdk commits which were introduced after dpdk 17.11.x
> require OVS updates to accommodate to the dpdk changes.
> - ce17edde ("ethdev: introduce Rx queue offloads API")
> - ab3ce1e0 ("ethdev: remove old offload API")
> - c06ddf96 ("meter: add configuration profile")
> - e58638c3 ("ethdev: fix TPID handling in flow API")
> - cd8c7c7c ("ethdev: replace bus specific struct with generic dev")
> - ac8d22de ("ethdev: flatten RSS configuration in flow API")
> 
> 2. Limit configured rss hash functions to only those supported
> by the eth device.
> 
> 3. Set default RSS key in struct action_rss_data, required by OVS commit
> - e8a2b5bf ("netdev-dpdk: implement flow offload with rte flow")
> when configured with "other_config:hw-offload=true"
> Remark: calling RSS with 0 length (default) key is rejected
> in DPDK 18.08 and will be enabled in DPDK 18.11. It has no effect
> when running in a "hw-offload=false" configuration.
> 
> 4. Update references to DPDK version 18.08 in Documentation and in
> travis linux-build script
> 
> 5. There are currently warnings on DPDK deprecated functions calls:
> - rte_eth_dev_attach
> - rte_eth_dev_detach
> - rte_eth_devargs_parse
> The deprecated functions calls replacements will be added to
> DPDK 18.11.
> 

hi Ophir, thanks for the patch. Just a couple of minor comments below.

> Signed-off-by: Ophir Munk 
> ---
> v1:
> First version
> 
> v2:
> Avoid seg faults cases as described in 
> https://patchwork.ozlabs.org/patch/965451/
> by using the patch in:
> https://github.com/kevintraynor/ovs-dpdk-
> master/commit/88f46cc5ab338eb4f3ca5db1eacd0effefe4fa0c
> 
> v3:
> - rebase on latest dpdk-hwol branch
> - Updates based on latest reviews to versions v1 & v2
> 
> v4:
> This patch got lost in mailing list server due to administrative issues and
> is now obsolete
> 
> v5:
> - updated commit message
> - Address all reviews (some skipped by mistake) from recent versions
> - it is suggested to ignore deprecated functions warnings as the functions 
> replacements are missing in DPDK 18.08 and will be added to DPDK 18.11 
> 
>  .travis/linux-build.sh   |   2 +-
>  Documentation/intro/install/dpdk.rst |  14 ++--
>  Documentation/topics/dpdk/vhost-user.rst |   6 +-
>  lib/netdev-dpdk.c| 130 
> ---
>  4 files changed, 95 insertions(+), 57 deletions(-)
> 
> diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh
> index 4b9fc4a..4c9e952 100755
> --- a/.travis/linux-build.sh
> +++ b/.travis/linux-build.sh
> @@ -83,7 +83,7 @@ fi
>  
>  if [ "$DPDK" ]; then
>  if [ -z "$DPDK_VER" ]; then
> -DPDK_VER="17.11.3"
> +DPDK_VER="18.08"
>  fi
>  install_dpdk $DPDK_VER
>  if [ "$CC" = "clang" ]; then
> diff --git a/Documentation/intro/install/dpdk.rst 
> b/Documentation/intro/install/dpdk.rst
> index 36501c6..73610ef 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -42,7 +42,7 @@ Build requirements
>  In addition to the requirements described in :doc:`general`, building Open
>  vSwitch with DPDK will require the following:
>  
> -- DPDK 17.11.3
> +- DPDK 18.08.0
>  
>  - A `DPDK supported NIC`_
>  
> @@ -71,9 +71,9 @@ Install DPDK
>  #. Download the `DPDK sources`_, extract the file and set ``DPDK_DIR``::
>  
> $ cd /usr/src/
> -   $ wget http://fast.dpdk.org/rel/dpdk-17.11.3.tar.xz
> -   $ tar xf dpdk-17.11.3.tar.xz
> -   $ export DPDK_DIR=/usr/src/dpdk-stable-17.11.3
> +   $ wget http://fast.dpdk.org/rel/dpdk-18.08.tar.xz
> +   $ tar xf dpdk-18.08.tar.xz
> +   $ export DPDK_DIR=/usr/src/dpdk-stable-18.08
> $ cd $DPDK_DIR
>  
>  #. (Optional) Configure DPDK as a shared library
> @@ -283,9 +283,9 @@ with either the ovs-vswitchd logs, or by running either 
> of the commands::
>  
>$ ovs-vswitchd --version
>ovs-vswitchd (Open vSwitch) 2.9.0
> -  DPDK 17.11.0
> +  DPDK 18.08.0
>$ ovs-vsctl get Open_vSwitch . dpdk_version
> -  "DPDK 17.11.0"
> +  "DPDK 18.08.0"
>  
>  At this point you can use ovs-vsctl to set up bridges and other Open vSwitch
>  features. Seeing as we've configured the DPDK datapath, we will use DPDK-type
> @@ -673,7 +673,7 @@ Limitations
>The latest list of validated firmware versions can be found in the `DPDK
>release notes`_.
>  
> -.. _DPDK release notes: 
> http://dpdk.org/doc/guides/rel_notes/release_17_11.html
> +.. _DPDK release notes: 
> http://dpdk.org/doc/guides/rel_notes/release_18_08.html
>  
>  - Upper bound MTU: DPDK device drivers differ in how the L2 frame for a
>given MTU value is calculated e.g. i40e driver includes 2 x vlan headers in
> diff --git a/Documentation/topics/dpdk/vhost-user.rst 
> b/Documentation/topics/dpdk/vhost-user.rst
> index b1e2285..56f58ba 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/d

Re: [ovs-dev] Possible Regression due to "ossfuzz: Break flow test target into two targets to speed up fuzzing."

2018-10-12 Thread Ilya Maximets
> Hi,
> 
> it seems that travis-ci is failing due to a testsuite regression introduced
> by 1adcbcee8f4c ("ossfuzz: Break flow test target into two targets to speed
> up fuzzing.").
> 
> https://travis-ci.org/openvswitch/ovs/jobs/439811394

Hi Simon,

As it was already said, this is not the patch that produces the issue.
I'm able to reproduce '2649: ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR'
test failure locally by restricting to use only one CPU core with taskset.
So, I bisected the issue and found that the fist patch that fails is:

2e5cdb4b1392 ("OVN: add buffering support for ip packets").

Issue happens only if single core assigned:

taskset -c 2   make check TESTSUITEFLAGS='2649'  #FAILS
taskset -c 2-3 make check TESTSUITEFLAGS='2649'  #OK

As Travis has no much CPU resources it fails all the time.

I'm not much familiar with OVN code/tests, hope above information will
be useful for your investigation.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 0/2] Add support to offload GRE tunnels

2018-10-12 Thread Simon Horman
On Thu, Oct 11, 2018 at 10:06:41AM +0300, Roi Dayan wrote:
> Hi,
> 
> The first patch is to add support to offload to GRE tunnels using tc.
> THe second patch is to offload the tunnel csum option correctly.

Thanks Roi,

this looks good to me:

Reviewed-by: Simon Horman 

Let me see if I can get master clean wrt travis-ci so I can test
your patchset there before applying.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v11 09/14] dp-packet: Add support for data "linearization".

2018-10-12 Thread Lam, Tiago
On 11/10/2018 15:50, Eelco Chaudron wrote:
> Hi Tiago,
> 
> It has been a while since I reviewed this patchset, guess before my 
> summer holiday ;)
> I picked this patch out of the set as it does not have my acked-by, and 
> I assume all the other which have it did not change too much to warrant 
> a review. If they did, let me know.
> 
> See inline comments...
> 
> Cheers,

Hi Eelco,

Thanks for the comments. Yeah, I've kept them because I, too, think
there haven't been changes on those patches, aside from smaller fixes.
Most of the changes have fallen now on the linearization patch (09/14).

If I remember of something I'll make sure I'll ping you on the specific
patch. Thanks again for that.

> 
> Eelco
> 
> On 10 Oct 2018, at 18:22, Tiago Lam wrote:
> 
>> Previous commits have added support to the dp_packet API to handle
>> multi-segmented packets, where data is not stored contiguously in
>> memory. However, in some cases, it is inevitable and data must be
>> provided contiguously. Examples of such cases are when performing 
>> csums
>> over the entire packet data, or when write()'ing to a file descriptor
>> (for a tap interface, for example). For such cases, the dp_packet API
>> has been extended to provide a way to transform a multi-segmented
>> DPBUF_DPDK packet into a DPBUF_MALLOC system packet (at the expense of 
>> a
>> copy of memory). If the packet's data is already stored in memory
>> contigously then there's no need to convert the packet.
>>
>> Thus, the main use cases that were assuming that a dp_packet's data is
>> always held contiguously in memory were changed to make use of the new
>> "linear functions" in the dp_packet API when there's a need to 
>> traverse
>> the entire's packet data. Per the example above, when the packet's 
>> data
>> needs to be write() to the tap's file descriptor, or when the 
>> conntrack
>> module needs to verify a packet's checksum, the data is now 
>> linearized.
>>
>> Additionally, the layer functions, such as dp_packet_l3() and 
>> variants,
>> have been modified to check if there's enough data in the packet 
>> before
>> returning a pointer to the data (and callers have been modified
>> accordingly). This requirement is needed to guarantee that a caller
>> doesn't read beyond the available memory.
> 
> I think the last might be a problem, as now the dp_packet_l2_5(), 
> dp_packet_l3() and dp_packet_l4() might start returning NULL.
> Some the changes you made to the code referencing this functions would 
> crash if they return NULL.
> 
> Can I assume you verified that all these calls will be guarded by 
> dp_packet_linearize() so we do not access a NULL pointer?

I understand your point. Ilya also brought it up in v9 and v10. This is
my point of view.

At the moment the miniflow_extract(), where the layer offsets are
initially set, is extracting the data contiguously. It is mentioned as
part of the "limitations" that headers should be within the first mbuf
(1920B). So, the only other way to get a NULL is if while manipulating
the packet one sets the headers up to more than those 1920B. I haven't
been able to find a case where this happens, since even if you are
pushing headers you won't go above the headroom (128B by default). While
I haven't verified all cases, I would say that this would happen rarely
and would consider it a corner case (you also have to had opted-in for
multi-segs mbufs option).

So, my point here, really, is that I'd rather find and fix those corner
cases than polluting the code with NULL checks (as I mentioned before,
the code under ovn/ won't be dealing with DPBUF_DPDK so it won't be
returning NULL).

But I understand this is the second person pointing this out, so I'll
extend the scope to cover the miniflow_extract() and deal with the NULL
returns.

(There are some in-line comments below which I didn't reply to because
they are related to this.)

> 
> In addition, why have you decided not to use the dp_packet_linearize() 
> inside these functions (I think I have an idea)?
> This might allow skipping the:
> 
>   if (!dp_packet_is_linear(pkt)) {
>  dp_packet_linearize(pkt);
>  }
> 
> Maybe the above can be changed to just dp_packet_linearize(), and have 
> the dp_packet_is_linear() part of the first call. They are both inline 
> functions already.

A previous version introduced a dp_packet_linear_data() API, which would
linearize the packet and return the pointer to the data. As Ilya pointed
out in the patch, and I agreed with, because dp_packet_linear_data()
would change the address space of where the data is, it would be very
easy for a caller to misuse it. So this has evolved into two separate
functions; One to check if the packet is linear and another to linearize
it. This would be the main reason, I'd say. The other is more personal,
but it feels odd to me to have a standalone call to
`dp_packet_linearize()` in the middle of the code without any other
context. This, I think, gives some more context.

> 
> 

Re: [ovs-dev] Queries on LTS Release & OVSDB Replication featureset

2018-10-12 Thread Kishore.Yetikuri
Dear OVS Team:
  Gentle reminder. We need this info at the earliest to decide on the right 
release to be chosen for our projects.

Regards
Kishore

From: Yetikuri, Kishore
Sent: Thursday, October 11, 2018 2:24 PM
To: 'd...@openvswitch.org'
Cc: Ghalam, Joe; Gada, Bhavini; Ayalasomayajul, Mohan; Nisar, Bhavesh; Lazar, 
Mihai
Subject: Queries on LTS Release & OVSDB Replication featureset

Dear OVS Team:
  We are interested in using OVSDB replication feature released as patch 
(603155) with a LTS release. 
Towards achieving that, we request following info from you:

Current LTS release - 2.5.x

-  Does current LTS release (2.5.5) contain this patch?

-  If i2.5.x does not contain this patch, is it possible to release 
2.5.6 with this patch?

-  Is this patch (603155) 
compatible with 2.5.x? If we install this patch on top of 2.5.x, can we 
continue to get support as expected of a LTS release?

Next LTS release

-  Pls indicate ETA for next LTS release

-  Is there any plan to convert 2.10.x into a LTS release?


The above info will greatly help us in picking the right release matching all 
our requirements.

Looking forward to hearing from you.

Regards
Kihosre
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v7 0/3] Support dynamic rebalancing of offloaded flows

2018-10-12 Thread Simon Horman
On Thu, Oct 11, 2018 at 04:01:40PM -0700, Ben Pfaff wrote:
> Thanks for the revision.
> 
> This seems basically OK at a glance but I'd like a second set of eyes.
> Simon, are you willing to review this?  It seems roughly in your area
> too.

Thanks Ben, Thanks Sriharsha,

I am very pleased to see work in this area.

I have a few lingering concerns, which I noted in separate emails regarding
* Extra CPU cost of processing OOR flows and;
* Correctly detecting the offload device of a tunnel
but I think they can be treated as possible further work rather
than holding up this patchset.

I am also not entirely comfortable with the use of ovs_assert() (in general)
but this may be just a matter of personal taste and again I don't think
it needs to hold up this patchset.

In all, as the feature will be disabled by default and should have negligible
impact when disabled I think it would be good to merge in its current form
to allow further testing (and ideally evolution) of this feature.

Acked-by: Simon Horman 


Ben, I would be happy to apply this series but I'd rather
do so once master travis-ci clean for master. Something I am looking at
separately. I'm also happy for you to apply this series.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v7 1/3] dpif-netlink: Detect Out-Of-Resource condition on a netdev

2018-10-12 Thread Simon Horman
On Thu, Sep 27, 2018 at 12:18:25PM +0530, Sriharsha Basavapatna via dev wrote:
> This is the first patch in the patch-set to support dynamic rebalancing
> of offloaded flows.
> 
> The patch detects OOR condition on a netdev port when ENOSPC error is
> returned by TC-Flower while adding a flow rule. A new structure is added
> to the netdev called "netdev_hw_info", to store OOR related information
> required to perform dynamic offload-rebalancing.
> 
> Signed-off-by: Sriharsha Basavapatna 
> Co-authored-by: Venkat Duvvuru 
> Signed-off-by: Venkat Duvvuru 
> Reviewed-by: Sathya Perla 
> ---
>  lib/dpif-netlink.c| 18 +-
>  lib/flow.c| 25 +
>  lib/flow.h|  1 +
>  lib/netdev-provider.h | 11 +++
>  lib/netdev.c  | 35 +++
>  lib/netdev.h  |  3 +++
>  6 files changed, 92 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index e6d5a6ec5..b9ce9cbe2 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -2178,7 +2178,23 @@ parse_flow_put(struct dpif_netlink *dpif, struct 
> dpif_flow_put *put)
>  
>  VLOG_DBG("added flow");
>  } else if (err != EEXIST) {
> -VLOG_ERR_RL(&rl, "failed to offload flow: %s", ovs_strerror(err));
> +struct netdev *oor_netdev = NULL;
> +if (err == ENOSPC && netdev_is_offload_rebalance_policy_enabled()) {
> +/*
> + * We need to set OOR on the input netdev (i.e, 'dev') for the
> + * flow. But if the flow has a tunnel attribute (i.e, decap 
> action,
> + * with a virtual device like a VxLAN interface as its in-port),
> + * then lookup and set OOR on the underlying tunnel (real) 
> netdev.
> + */
> +oor_netdev = flow_get_tunnel_netdev(&match.flow.tunnel);
>
> +if (!oor_netdev) {
> +/* Not a 'tunnel' flow */
> +oor_netdev = dev;
> +}
> +netdev_set_hw_info(oor_netdev, HW_INFO_TYPE_OOR, true);
> +}
> +VLOG_ERR_RL(&rl, "failed to offload flow: %s: %s", ovs_strerror(err),
> +(oor_netdev ? oor_netdev->name : dev->name));
>  }
>  
>  out:
> diff --git a/lib/flow.c b/lib/flow.c
> index 77ed3d9df..a39807908 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -41,6 +42,8 @@
>  #include "unaligned.h"
>  #include "util.h"
>  #include "openvswitch/nsh.h"
> +#include "ovs-router.h"
> +#include "lib/netdev-provider.h"
>  
>  COVERAGE_DEFINE(flow_extract);
>  COVERAGE_DEFINE(miniflow_malloc);
> @@ -3403,3 +3406,25 @@ flow_limit_vlans(int vlan_limit)
>  flow_vlan_limit = MIN(vlan_limit, FLOW_MAX_VLAN_HEADERS);
>  }
>  }
> +
> +struct netdev *
> +flow_get_tunnel_netdev(struct flow_tnl *tunnel)
> +{
> +char iface[IFNAMSIZ];
> +struct in6_addr ip6;
> +struct in6_addr gw;
> +
> +if (tunnel->ip_src) {
> +in6_addr_set_mapped_ipv4(&ip6, tunnel->ip_src);
> +} else if (ipv6_addr_is_set(&tunnel->ipv6_src)) {
> +ip6 = tunnel->ipv6_src;
> +} else {
> +return NULL;
> +}
> +
> +if (!ovs_router_lookup(0, &ip6, iface, NULL, &gw)) {
> +return NULL;
> +}

I think what is desired here is the offload device of the flow.
In the case of a tunnel, its not the tunnel netdev. And it seems
this function tries to find the offload device by looking at the
egress route for the tunnel. But what if device of that route is a bridge,
LAG or other upper netdev?

I don't think this problem needs to block progress of this patchset.
But it may be an interesting area for further work.

> +
> +return netdev_from_name(iface);
> +}
> diff --git a/lib/flow.h b/lib/flow.h
> index d03f1ba9c..aca60c41a 100644
> --- a/lib/flow.h
> +++ b/lib/flow.h
> @@ -73,6 +73,7 @@ void flow_extract(struct dp_packet *, struct flow *);
>  void flow_zero_wildcards(struct flow *, const struct flow_wildcards *);
>  void flow_unwildcard_tp_ports(const struct flow *, struct flow_wildcards *);
>  void flow_get_metadata(const struct flow *, struct match *flow_metadata);
> +struct netdev *flow_get_tunnel_netdev(struct flow_tnl *tunnel);
>  
>  const char *ct_state_to_string(uint32_t state);
>  uint32_t ct_state_from_string(const char *);
> diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
> index 5a7947351..e320dad61 100644
> --- a/lib/netdev-provider.h
> +++ b/lib/netdev-provider.h
> @@ -35,6 +35,15 @@ extern "C" {
>  struct netdev_tnl_build_header_params;
>  #define NETDEV_NUMA_UNSPEC OVS_NUMA_UNSPEC
>  
> +/* Offload-capable (HW) netdev information */
> +struct netdev_hw_info {
> +bool oor;/* Out of Offload Resources ? */
> +};
> +
> +enum hw_info_type {
> +HW_INFO_TYPE_OOR = 1 /* OOR state */
> +};
> +
>  /* A network device (e.g. an Ethernet device).
>   *

Re: [ovs-dev] [dpdk-howl PATCH v5 2/2] netdev-dpdk: Set scatter based on capabilities

2018-10-12 Thread Eelco Chaudron



On 10 Oct 2018, at 18:14, Ophir Munk wrote:


Before this commit setting scatter offload was based on checking
net_nfp device.
Since DPDK 17.11 more PMD drivers are reporting offload
capabilities. Therefore this commit removes the specific check
against net_nfp device and replaces it with a generic check of
device capabilities before setting the scatter offload.

Signed-off-by: Ophir Munk 
---

v1-v4
This patch was not included in version v1-v4 of the series

v5
Initial version

 lib/netdev-dpdk.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 4dd0ec3..ecca276 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -367,6 +367,7 @@ struct ingress_policer {
 enum dpdk_hw_ol_features {
 NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0,
 NETDEV_RX_HW_CRC_STRIP = 1 << 1,
+NETDEV_RX_HW_SCATTER = 1 << 2
 };

 /*
@@ -894,13 +895,11 @@ dpdk_eth_dev_port_config(struct netdev_dpdk 
*dev, int n_rxq, int n_txq)

 rte_eth_dev_info_get(dev->port_id, &info);

 /* As of DPDK 17.11.1 a few PMDs require to explicitly enable
- * scatter to support jumbo RX. Checking the offload capabilities
- * is not an option as PMDs are not required yet to report
- * them. The only reliable info is the driver name and knowledge
- * (testing or code review). Listing all such PMDs feels harder
- * than highlighting the one known not to need scatter */
+ * scatter to support jumbo RX.
+ * Setting scatter for the device is done after checking for
+ * scatter support in the device capabilites. */
 if (dev->mtu > ETHER_MTU) {
-if (strncmp(info.driver_name, "net_nfp", 7)) {
+if (dev->hw_ol_features & NETDEV_RX_HW_SCATTER) {
 conf.rxmode.offloads |= DEV_RX_OFFLOAD_SCATTER;
 }
 }
@@ -1035,6 +1034,13 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
 dev->hw_ol_features |= NETDEV_RX_CHECKSUM_OFFLOAD;
 }

+if (info.rx_offload_capa & DEV_RX_OFFLOAD_SCATTER) {
+dev->hw_ol_features |= NETDEV_RX_HW_SCATTER;
+} else {
+/* Do not warn on lack of scatter support */
+dev->hw_ol_features &= ~NETDEV_RX_HW_SCATTER;


Don’t think we need to clear it explicitly, none of the other 
capabilities do this.

For the rest this patch look good to me.


+}
+
 n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
 n_txq = MIN(info.max_tx_queues, dev->up.n_txq);

--
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6 3/3] revalidator: Rebalance offloaded flows based on the pps rate

2018-10-12 Thread Simon Horman
On Thu, Sep 27, 2018 at 12:06:42PM +0530, Sriharsha Basavapatna via dev wrote:
> Hi Simon,
> 
> Thanks for your review comments; please see my response inline.
> 
> On Mon, Sep 24, 2018 at 8:16 PM Simon Horman  
> wrote:
> >
> > Hi Sriharsha,
> >
> > thanks for your patch.
> >
> > I am pleased to see work in this area. I do however, have some questions
> > about the implementation. Please see comments inline.

...

> > > +/*
> > > + * Rebalance offloaded flows on HW netdevs that are in OOR state.
> > > + */
> > > +static void
> > > +udpif_flow_rebalance(struct udpif *udpif)
> > > +{
> > > +struct udpif_key **sort_flows = NULL;
> > > +size_t alloc_flow_count = 0;
> > > +size_t total_flow_count = 0;
> > > +int oor_netdev_count = 0;
> > > +int offload_index = 0;
> > > +int pending_index;
> > > +
> > > +/* Collect flows (offloaded and pending) that reference OOR netdevs 
> > > */
> > > +for (size_t i = 0; i < N_UMAPS; i++) {
> > > +struct udpif_key *ukey;
> > > +struct umap *umap = &udpif->ukeys[i];
> > > +
> > > +CMAP_FOR_EACH (ukey, cmap_node, &umap->cmap) {
> > > +ukey_to_flow_netdevs(udpif, ukey);
> > > +sort_flows = udpif_build_oor_flows(sort_flows, 
> > > &total_flow_count,
> > > +   &alloc_flow_count, ukey,
> > > +   &oor_netdev_count);
> > > +}
> > > +}
> >
> > If I read the above correctly it is collecting all flows present in the
> > system into a sorted list and then operating on that list below. I am
> > concerned about memory and CPU impact of this if a large number of flows,
> > lets say 500k, are present.
> 
> It is not collecting all flows present in the system. It used to, until v4
> version of the code; Ben had a similar comment on this to combine the 2 passes
> (collecting all flows and discarding non-OOR flows) into one, saving both
> space and time. We now add a flow into sort_flows[] only if its in-netdev is
> OOR (dynamically resizing sort_flows[]). So, only a subset of the flows in the
> system, that are referencing an OOR-netdev as input port are added to
> sort_flows[] and we then sort them. Another comment from Ben was to prevent
> the entire rebalancing work by first checking whether the system has any
> out-of-resource devices. A new function netdev_any_oor(), is first invoked by
> the caller.
> 
> Thanks, -Harsha

Thanks for your response to this and my other review comments and apologies
for not following-up earlier (in this case CCing me would have helped me).

Given your explanation I think that what you have is a reasonable start
and I don't object to it given that the feature must be enabled. But
I do still have a concern about the cost of this.

As I understand things the OOR condition kicks in for an offload device
when it rejects flows due to resource contention.  I am still concerned
about the cost of this loop because in a system where the offload device
can handle a large number of flows the OOR, if it occurs due to flow table
exhaustion or near-exhaustion on the offload device, means that there
are a lot of flows present on the device.  And it also implies that the
host may already be under stressed due to a) revalidating the large number
of offloaded flows and b) forwarding packets for flows that are not
offloaded due to the OOR condition.

So I while I think there may be conditions where this works quite well.
I can envisage conditions where it does not. But I'm happy for things
to proceed as is and for such scaling problems to be left open for
future work.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [dpdk-howl PATCH v5 1/2] netdev-dpdk: Upgrade to dpdk v18.08

2018-10-12 Thread Eelco Chaudron



On 10 Oct 2018, at 18:14, Ophir Munk wrote:


1. Enable compilation and linkage with dpdk 18.08.0
The following dpdk commits which were introduced after dpdk 17.11.x
require OVS updates to accommodate to the dpdk changes.
- ce17edde ("ethdev: introduce Rx queue offloads API")
- ab3ce1e0 ("ethdev: remove old offload API")
- c06ddf96 ("meter: add configuration profile")
- e58638c3 ("ethdev: fix TPID handling in flow API")
- cd8c7c7c ("ethdev: replace bus specific struct with generic dev")
- ac8d22de ("ethdev: flatten RSS configuration in flow API")

2. Limit configured rss hash functions to only those supported
by the eth device.

3. Set default RSS key in struct action_rss_data, required by OVS 
commit

- e8a2b5bf ("netdev-dpdk: implement flow offload with rte flow")
when configured with "other_config:hw-offload=true"
Remark: calling RSS with 0 length (default) key is rejected
in DPDK 18.08 and will be enabled in DPDK 18.11. It has no effect
when running in a "hw-offload=false" configuration.

4. Update references to DPDK version 18.08 in Documentation and in
travis linux-build script

5. There are currently warnings on DPDK deprecated functions calls:
- rte_eth_dev_attach
- rte_eth_dev_detach
- rte_eth_devargs_parse
The deprecated functions calls replacements will be added to
DPDK 18.11.

Signed-off-by: Ophir Munk 
---
v1:
First version

v2:
Avoid seg faults cases as described in
https://patchwork.ozlabs.org/patch/965451/
by using the patch in:
https://github.com/kevintraynor/ovs-dpdk-
master/commit/88f46cc5ab338eb4f3ca5db1eacd0effefe4fa0c

v3:
- rebase on latest dpdk-hwol branch
- Updates based on latest reviews to versions v1 & v2

v4:
This patch got lost in mailing list server due to administrative 
issues and

is now obsolete

v5:
- updated commit message
- Address all reviews (some skipped by mistake) from recent versions
- it is suggested to ignore deprecated functions warnings as the 
functions

replacements are missing in DPDK 18.08 and will be added to DPDK 18.11

 .travis/linux-build.sh   |   2 +-
 Documentation/intro/install/dpdk.rst |  14 ++--
 Documentation/topics/dpdk/vhost-user.rst |   6 +-
 lib/netdev-dpdk.c| 130 
---

 4 files changed, 95 insertions(+), 57 deletions(-)

diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh
index 4b9fc4a..4c9e952 100755
--- a/.travis/linux-build.sh
+++ b/.travis/linux-build.sh
@@ -83,7 +83,7 @@ fi

 if [ "$DPDK" ]; then
 if [ -z "$DPDK_VER" ]; then
-DPDK_VER="17.11.3"
+DPDK_VER="18.08"
 fi
 install_dpdk $DPDK_VER
 if [ "$CC" = "clang" ]; then
diff --git a/Documentation/intro/install/dpdk.rst 
b/Documentation/intro/install/dpdk.rst

index 36501c6..73610ef 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -42,7 +42,7 @@ Build requirements
 In addition to the requirements described in :doc:`general`, building 
Open

 vSwitch with DPDK will require the following:

-- DPDK 17.11.3
+- DPDK 18.08.0

 - A `DPDK supported NIC`_

@@ -71,9 +71,9 @@ Install DPDK
 #. Download the `DPDK sources`_, extract the file and set 
``DPDK_DIR``::


$ cd /usr/src/
-   $ wget http://fast.dpdk.org/rel/dpdk-17.11.3.tar.xz
-   $ tar xf dpdk-17.11.3.tar.xz
-   $ export DPDK_DIR=/usr/src/dpdk-stable-17.11.3
+   $ wget http://fast.dpdk.org/rel/dpdk-18.08.tar.xz
+   $ tar xf dpdk-18.08.tar.xz
+   $ export DPDK_DIR=/usr/src/dpdk-stable-18.08
$ cd $DPDK_DIR

 #. (Optional) Configure DPDK as a shared library
@@ -283,9 +283,9 @@ with either the ovs-vswitchd logs, or by running 
either of the commands::


   $ ovs-vswitchd --version
   ovs-vswitchd (Open vSwitch) 2.9.0
-  DPDK 17.11.0
+  DPDK 18.08.0
   $ ovs-vsctl get Open_vSwitch . dpdk_version
-  "DPDK 17.11.0"
+  "DPDK 18.08.0"

 At this point you can use ovs-vsctl to set up bridges and other Open 
vSwitch
 features. Seeing as we've configured the DPDK datapath, we will use 
DPDK-type

@@ -673,7 +673,7 @@ Limitations
   The latest list of validated firmware versions can be found in the 
`DPDK

   release notes`_.

-.. _DPDK release notes: 
http://dpdk.org/doc/guides/rel_notes/release_17_11.html
+.. _DPDK release notes: 
http://dpdk.org/doc/guides/rel_notes/release_18_08.html


 - Upper bound MTU: DPDK device drivers differ in how the L2 frame for 
a
   given MTU value is calculated e.g. i40e driver includes 2 x vlan 
headers in
diff --git a/Documentation/topics/dpdk/vhost-user.rst 
b/Documentation/topics/dpdk/vhost-user.rst

index b1e2285..56f58ba 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -320,9 +320,9 @@ To begin, instantiate a guest as described in 
:ref:`dpdk-vhost-user` or

 DPDK sources to VM and build DPDK::

 $ cd /root/dpdk/
-$ wget http://fast.dpdk.org/rel/dpdk-17.11.3.tar.xz
-$ tar xf dpdk-17.11.3.tar.xz
-$ export DPDK_DIR=/root/dpdk/dpdk-stable-17.11.3
+