[ovs-dev] [PATCH V2 1/1] dpif-netlink: Log eth type 0x1234 not offloadable

2019-07-02 Thread Eli Britstein
Ethernet type 0x1234 is used for testing and is not offloadable. When
testing offloadable features, log a message about using this value.

Signed-off-by: Eli Britstein 
Acked-by: Roi Dayan 
Signed-off-by: Eli Britstein 
---
 lib/dpif-netlink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index ba80a0079..a0d51ae61 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2007,6 +2007,7 @@ parse_flow_put(struct dpif_netlink *dpif, struct 
dpif_flow_put *put)
 
 /* When we try to install a dummy flow from a probed feature. */
 if (match.flow.dl_type == htons(0x1234)) {
+VLOG_INFO_RL(&rl, "eth 0x1234 is special and not offloadable");
 return EOPNOTSUPP;
 }
 
-- 
2.17.2
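
For readers following the change from V1 (Ben asked for rate limiting and a
lower severity than a warning), here is a minimal, illustrative sketch of the
OVS logging pattern the patch relies on. dpif-netlink.c already defines its
own module and rate limiter, so the module name and limit values below are
only examples, not the actual code:

    #include "openvswitch/vlog.h"

    VLOG_DEFINE_THIS_MODULE(dpif_netlink_example);   /* Illustrative name. */

    /* Token-bucket rate limit; the (5, 20) values are just an example. */
    static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);

    static void
    log_probe_flow_rejected(void)
    {
        /* Rate-limited INFO message, as used in the hunk above. */
        VLOG_INFO_RL(&rl, "eth 0x1234 is special and not offloadable");
    }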



Re: [ovs-dev] [PATCH] tnl-neigh-cache: Purge learnt neighbors when port/bridge is deleted

2019-07-02 Thread Vasu Dasari
Hi Flavio,

I am trying to emulate the test case scenario you mentioned earlier, wherein
we need to clean up the neighbor cache when an external interface goes down.
To study that, I wrote a small script based on the system-traffic.at test
case "datapath - ping over vxlan tunnel". This experiment gave me a good
understanding of the netdev and kmod implementations of VxLAN.

Various tests I have performed using the new script include:
1. ovs-vswitchd started normally, for performing kernel-mode VxLAN tests.
In this mode, I tested the following cases:
   a. VxLAN interface being part of OVS.
   b. VxLAN interface external to OVS.
   I see that in kernel mode of operation, native tunneling is off, and
hence no ARP entries are learnt in tnl-neigh-cache. Please refer to
ofproto-dpif-xlate.c:terminate_native_tunnel(). So, whatever changes we are
making here (as part of my commit) have no effect in kernel mode of
operation.

2. ovs-vswitchd started with the "--disable-system" option, for performing
usermode VxLAN tests. In this mode, I tested the following cases:
   a. VxLAN interface being part of OVS. This test case works. In this
case, entries are cleanly removed when the user deletes the ports from the
bridge.
   b. VxLAN interface external to OVS. I could not get this case to work.

3. ovs-vswitchd started normally (without the --disable-system option), for
performing usermode VxLAN tests. In this mode, I tested the following cases:
   a. VxLAN interface being part of OVS. This test case works. In this
case, entries are cleanly removed when the user deletes the ports from the
bridge.
   b. VxLAN interface external to OVS. I could not get this case to work.

I could not get the 2b and 3b use cases to work at all. Do you expect these
to work normally? The reason I ask is that, as I understand from the code,
even though "ovs/route/show" shows the route to the remote VTEP, OVS does not
know which ODP port to send the packet out of (please refer to
ofproto-dpif-xlate.c:native_tunnel_output() and tnl_route_lookup_flow()), and
hence packet transmission fails with "native tunnel routing failed".

If you agree that 2b and 3b are not supported use cases, then I can submit
my code changes as per your review comments.

Please let me know what you think.
-Vasu

*Vasu Dasari*


On Mon, Jun 24, 2019 at 6:15 PM Flavio Leitner  wrote:

> On Mon, Jun 24, 2019 at 05:43:26PM -0400, Vasu Dasari wrote:
> > On Mon, Jun 24, 2019 at 3:58 PM Flavio Leitner  wrote:
> > > On Wed, Jun 19, 2019 at 11:02:07PM -0400, Vasu Dasari wrote:
> > > > +{
> > > > +struct tnl_neigh_entry *neigh;
> > > > +bool changed = false;
> > > > +
> > > > +ovs_mutex_lock(&mutex);
> > > > +CMAP_FOR_EACH(neigh, cmap_node, &table) {
> > > > +if (nullable_string_is_equal(neigh->br_name, br_name)) {
> > > > +tnl_neigh_delete(neigh);
> > > > +changed = true;
> > >
> > > Do you expect to match on additional entries? Otherwise it
> > > could bail out here.
> > >
> > >
> > Say there are two or more neighbors on the port or on bridge, then by
> > bailing out we would be missing others. So, will leave it there.
>
> You're right.
>
> [...]
> > > However, as I mentioned in the discussion, the tnl IP address could
> > > be on a device that doesn't belong to OVS at all. For example:
> [...]
> > > The tnl endpoint must have a valid route, so I suggest to hook the
> > > tnl_neigh_cache_flush into route-table.c which receives a notification
> > > when a route changes. If a route is deleted, we should flush the
> > > device's tnl cache. Doing so, would cover both userspace and kernel
> > > datapath for OVS and non-OVS devices.
> > >
> > >
> > I see what you are saying. Let me play with code a bit and resubmit
> patch.
>
> OK, looking forward to it!
> Thanks Vasu,
> fbl
>
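
For reference, the point about not bailing out early can be made concrete
with a sketch assembled from the snippet quoted above. The function name,
signature and the surrounding declarations (the 'table' cmap and 'mutex')
are assumptions for illustration, not the submitted patch:

    /* Sketch: flush every learnt neighbor associated with 'br_name'.
     * The loop deliberately does not break after the first match, since
     * several neighbors may belong to the same bridge/port. */
    static bool
    tnl_neigh_flush_by_name(const char *br_name)
    {
        struct tnl_neigh_entry *neigh;
        bool changed = false;

        ovs_mutex_lock(&mutex);
        CMAP_FOR_EACH (neigh, cmap_node, &table) {
            if (nullable_string_is_equal(neigh->br_name, br_name)) {
                tnl_neigh_delete(neigh);
                changed = true;
            }
        }
        ovs_mutex_unlock(&mutex);

        return changed;
    }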


Re: [ovs-dev] [PATCHv5] tunnel: Add layer 2 IPv6 GRE encapsulation support.

2019-07-02 Thread Gregory Rose


On 7/1/2019 12:45 PM, William Tu wrote:

The patch adds ip6gre support. Tunnel type 'ip6gre' with packet_type=
legacy_l2 is a layer 2 GRE tunnel over IPv6, carrying inner ethernet packets
and encap with GRE header with outer IPv6 header.  Encapsulation of layer 3
packet over IPv6 GRE, ip6gre, is not supported yet.  I tested it by running:
   # make check-kernel TESTSUITEFLAGS='-k ip6gre'
under kernel 5.2 and for userspace:
   # make check TESTSUITEFLAGS='-k ip6gre'

Signed-off-by: William Tu 
---
v1-v2
  - rebase to master
v2-v3
   - update documentation suggested by Eli
v3-v4
   - squash Eli's documentation
v4-v5
   - remove using 'ip6gretap', use only 'ip6gre' with
 options:packet_type=legacy_l2
---


LGTM - I ran into two problems while testing: user error on one, and an issue
on a single system that is not reproducible anywhere else.  Ran check-kmod on
systems running the out-of-tree kernel modules and check-kernel on a system
with the upstream kernel - all checks out well.  Passes a travis check -

https://travis-ci.org/gvrose8192/ovs-experimental/builds/552977116

Thanks William!

Tested-by: Greg Rose 
Reviewed-by: Greg Rose 

- Greg


  Documentation/faq/configuration.rst | 13 +++
  NEWS|  1 +
  datapath/linux/compat/ip6_gre.c |  2 +-
  lib/dpif-netlink-rtnl.c |  8 -
  tests/system-traffic.at | 40 +
  tests/tunnel-push-pop-ipv6.at   | 69 +
  vswitchd/vswitch.xml| 32 ++---
  7 files changed, 151 insertions(+), 14 deletions(-)

diff --git a/Documentation/faq/configuration.rst 
b/Documentation/faq/configuration.rst
index cb2c6b4eca98..ff3b71a5d4ef 100644
--- a/Documentation/faq/configuration.rst
+++ b/Documentation/faq/configuration.rst
@@ -212,6 +212,19 @@ Q: Does Open vSwitch support ERSPAN?
  options:erspan_ver=2 options:erspan_dir=1 \
  options:erspan_hwid=4
  
+Q: Does Open vSwitch support IPv6 GRE?

+
+A: Yes. L2 tunnel interface GRE over IPv6 is supported.
+L3 GRE tunnel over IPv6 is not supported.
+
+::
+
+$ ovs-vsctl add-br br0
+$ ovs-vsctl add-port br0 at_gre0 -- \
+set int at_gre0 type=ip6gre \
+options:remote_ip=fc00:100::1 \
+options:packet_type=legacy_l2
+
  Q: How do I connect two bridges?
  
  A: First, why do you want to do this?  Two connected bridges are not much

diff --git a/NEWS b/NEWS
index a38ab258fc6c..c7e84ed7931d 100644
--- a/NEWS
+++ b/NEWS
@@ -47,6 +47,7 @@ Post-v2.11.0
 - Linux datapath:
   * Support for the kernel versions 4.19.x and 4.20.x.
   * Support for the kernel version 5.0.x.
+   - Add L2 GRE tunnel over IPv6 support.
  
  
  v2.11.0 - 19 Feb 2019

diff --git a/datapath/linux/compat/ip6_gre.c b/datapath/linux/compat/ip6_gre.c
index ca4e66133570..ab50c72d0753 100644
--- a/datapath/linux/compat/ip6_gre.c
+++ b/datapath/linux/compat/ip6_gre.c
@@ -2550,7 +2550,7 @@ static struct rtnl_link_ops ip6gre_link_ops __read_mostly 
= {
  };
  
  static struct rtnl_link_ops ip6gre_tap_ops __read_mostly = {

-   .kind   = "ip6gre",
+   .kind   = "ip6gretap",
.maxtype= RPL_IFLA_GRE_MAX,
.policy = ip6gre_policy,
.priv_size  = sizeof(struct ip6_tnl),
diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c
index 2e23a8c14fcf..582274c46774 100644
--- a/lib/dpif-netlink-rtnl.c
+++ b/lib/dpif-netlink-rtnl.c
@@ -104,7 +104,13 @@ vport_type_to_kind(enum ovs_vport_type type,
  case OVS_VPORT_TYPE_IP6ERSPAN:
  return "ip6erspan";
  case OVS_VPORT_TYPE_IP6GRE:
-return "ip6gre";
+if (tnl_cfg->pt_mode == NETDEV_PT_LEGACY_L2) {
+return "ip6gretap";
+} else if (tnl_cfg->pt_mode == NETDEV_PT_LEGACY_L3) {
+return NULL;
+} else {
+return NULL;
+}
  case OVS_VPORT_TYPE_NETDEV:
  case OVS_VPORT_TYPE_INTERNAL:
  case OVS_VPORT_TYPE_LISP:
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index d23ee897b0b2..8ea450887076 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -340,6 +340,46 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 
10.1.1.100 | FORMAT_PI
  OVS_TRAFFIC_VSWITCHD_STOP
  AT_CLEANUP
  
+AT_SETUP([datapath - ping over ip6gre L2 tunnel])

+OVS_CHECK_KERNEL_EXCL(3, 10, 4, 15)
+OVS_CHECK_GRE()
+OVS_CHECK_ERSPAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad)
+AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS out

Re: [ovs-dev] [PATCHv5] tunnel: Add layer 2 IPv6 GRE encapsulation support.

2019-07-02 Thread Gregory Rose


On 7/2/2019 10:42 AM, Gregory Rose wrote:


On 7/1/2019 12:45 PM, William Tu wrote:

The patch adds ip6gre support. Tunnel type 'ip6gre' with packet_type=
legacy_l2 is a layer 2 GRE tunnel over IPv6, carrying inner ethernet 
packets
and encap with GRE header with outer IPv6 header.  Encapsulation of 
layer 3
packet over IPv6 GRE, ip6gre, is not supported yet.  I tested it by 
running:

   # make check-kernel TESTSUITEFLAGS='-k ip6gre'
under kernel 5.2 and for userspace:
   # make check TESTSUITEFLAGS='-k ip6gre'


On a Ubuntu 18.04 system with the distro provided 4.15.0-43-generic 
kernel I get the following splat:





[ 510.469589] Kernel panic - not syncing: Fatal exception in interrupt
[  510.470789] Kernel Offset: 0x1e00 from 0x8100 
(relocation range: 0x8000-0xbfff)
[  510.471360] ---[ end Kernel panic - not syncing: Fatal exception in 
interrupt




[snip]

I've run into an interesting problem but I don't think it's related to 
this patch.  I can revert the patch on that system and get the same problem.


In the interest of getting this patch tested and reviewed I'll just try 
a different distro/machine.


- Greg



Re: [ovs-dev] criteria/benchmarks for an improved userspace datapath classifier?

2019-07-02 Thread Michal Orsák

Hello,

On 02/07/2019 19:47, Flavio Leitner wrote:

On Tue, Jul 02, 2019 at 10:00:44AM -0700, Ben Pfaff wrote:

Hi Ilya and Ian.  Please allow me to introduce Michal Orsak, a grad
student currently looking at packet classifiers.  He's implemented a
novel classifier that is faster than the one already in OVS in the
benchmarks that he's run.  His classifier is tree-based, like most
high-performance classifiers, but also incremental so that flows can be
inserted and deleted individually without undue delay.  Ultimately, it
might make sense to replace the OVS userspace datapath classifier by one
based on the concepts that he's come up with.

A difficulty with classifiers, however, is coming up with an appropriate
set of benchmarks to compare them fairly.  The userspace datapath
focuses on performance, so do you have a set of benchmarks that you
recommend for comparison?  Are there other criteria that would be
important (besides correctness)?

(I'd take answers from anyone, not just Ian and Ilya!)

Hi Ben,

We use 1M constant IPv4 flows, and 200k new and 200k retiring IPv4
flows per second to compare as it seems to be close to some production
workloads.


What is the format of the rules in the classifier?

Do you also remove and add 200K rules/s?


fbl


Michal



Re: [ovs-dev] criteria/benchmarks for an improved userspace datapath classifier?

2019-07-02 Thread Michal Orsák

Hello everyone,

On 02/07/2019 19:43, Ian Stokes wrote:

On 7/2/2019 6:00 PM, Ben Pfaff wrote:

Hi Ilya and Ian.  Please allow me to introduce Michal Orsak, a grad
student currently looking at packet classifiers.  He's implemented a
novel classifier that is faster than the one already in OVS in the
benchmarks that he's run.  His classifier is tree-based, like most
high-performance classifiers, but also incremental so that flows can be
inserted and deleted individually without undue delay. Ultimately, it
might make sense to replace the OVS userspace datapath classifier by one
based on the concepts that he's come up with.


Thanks for the introduction Ben. I wasn't aware of Michal's work. The 
timing is quite apt, as currently I've been looking at Harry's (CC'd) 
approach to improving performance for the current DPCLS.


https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/358790.html

I was hoping to have this approach in place for OVS 2.12 as there's 
been ongoing testing/reviews across the community with general 
improvements of ~ 15% or more for both x86 and ARM architectures which 
looks promising.


I'd be interested in seeing Michal's proposals; if there are patches 
available, that would be great.


Now I am implementing the DPCLS API so we can test the thing. There are 
also some differences and problems, like:

* support for priority of the rules
* only a single matching rule
* C++
* my limited knowledge of DPCLS




A difficulty with classifiers, however, is coming up with an appropriate
set of benchmarks to compare them fairly.  The userspace datapath
focuses on performance, so do you have a set of benchmarks that you
recommend for comparison?  Are there other criteria that would be
important (besides correctness)?


Agreed, that's something we've been looking for feedback on from the 
community to date. Most testing we have seen has been performance 
focused.



There are so many corner cases... Any test is appreciated.
The areas I think would be good to measure are flow addition/flow 
deletion overhead in CPU cycles. Also, memory and time complexity 
comparisons for existing and proposed techniques would be nice to see.


Currently I have a comparison with OVS TSS, DPDK librte_acl, 
PartitionSort, HyperSplit, HyperCuts and a few others (C-based 
benchmarks; some of them are not optimized implementations, originally 
written for the development of a hardware-based classifier). All of the 
previously mentioned except TSS have worst-case complexities from the 
same class. However, it is more complicated, as it depends on the size 
of the matching field and other factors which have different meanings 
between algorithms. I do not have short, understandable definitions yet.




Thanks
Ian



(I'd take answers from anyone, not just Ian and Ilya!)

Thanks,

Ben.




Thanks,

Michal



Re: [ovs-dev] [PATCH v4 1/3] doc: Move vhost tx retry info to separate section.

2019-07-02 Thread Ian Stokes

On 7/2/2019 1:32 AM, Kevin Traynor wrote:

vhost tx retry is applicable to vhost-user and vhost-user-client,
but it was in the section that compares them. Also, moved it further
down the doc, as it is preferable to have more fundamental info about
vhost nearer the top.

Fixes: 6d6513bfc657 ("doc: Add info on vhost tx retries.")
Reported-by: David Marchand 
Signed-off-by: Kevin Traynor 
---
  Documentation/topics/dpdk/vhost-user.rst | 72 
  1 file changed, 36 insertions(+), 36 deletions(-)

diff --git a/Documentation/topics/dpdk/vhost-user.rst 
b/Documentation/topics/dpdk/vhost-user.rst
index a6cf9d1cc..5f393aced 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -76,40 +76,4 @@ mode ports require QEMU version 2.7.  Ports of type 
vhost-user are currently
  deprecated and will be removed in a future release.
  
-vhost tx retries

-
-
-When sending a batch of packets to a vhost-user or vhost-user-client interface,
-it may happen that some but not all of the packets in the batch are able to be
-sent to the guest. This is often because there is not enough free descriptors
-in the virtqueue for all the packets in the batch to be sent. In this case
-there will be a retry, with a default maximum of 8 occurring. If at any time no
-packets can be sent, it may mean the guest is not accepting packets, so there
-are no (more) retries.
-
-.. note::
-
-  Maximum vhost tx batch size is defined by NETDEV_MAX_BURST, and is currently
-  as 32.
-
-Tx Retries may be reduced or even avoided by some external configuration, such
-as increasing the virtqueue size through the ``rx_queue_size`` parameter
-introduced in QEMU 2.7.0 / libvirt 2.3.0::
-
-  
-  
-  
-  
-  
-  
-  
-
-The guest application will also need need to provide enough descriptors. For
-example with ``testpmd`` the command line argument can be used::
-
- --rxd=1024 --txd=1024
-
-The guest should also have sufficient cores dedicated for consuming and
-processing packets at the required rate.
-
  .. _dpdk-vhost-user:
  
@@ -521,4 +485,40 @@ DPDK vHost User ports can be configured to use Jumbo Frames. For more

  information, refer to :doc:`jumbo-frames`.
  
+vhost tx retries

+
+
+When sending a batch of packets to a vhost-user or vhost-user-client interface,
+it may happen that some but not all of the packets in the batch are able to be
+sent to the guest. This is often because there is not enough free descriptors
+in the virtqueue for all the packets in the batch to be sent. In this case
+there will be a retry, with a default maximum of 8 occurring. If at any time no
+packets can be sent, it may mean the guest is not accepting packets, so there
+are no (more) retries.
+
+.. note::
+
+  Maximum vhost tx batch size is defined by NETDEV_MAX_BURST, and is currently
+  as 32.
+
+Tx Retries may be reduced or even avoided by some external configuration, such
+as increasing the virtqueue size through the ``rx_queue_size`` parameter
+introduced in QEMU 2.7.0 / libvirt 2.3.0::
+
+  
+  
+  
+  
+  
+  
+  
+
+The guest application will also need need to provide enough descriptors. For
+example with ``testpmd`` the command line argument can be used::
+
+ --rxd=1024 --txd=1024
+
+The guest should also have sufficient cores dedicated for consuming and
+processing packets at the required rate.
+


Looks ok to me.

@David as this was suggested by you, are you happy with the change?

Thanks
Ian




Re: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and vhostuser ports

2019-07-02 Thread Kevin Traynor
On 02/07/2019 13:30, Sriram Vatala wrote:
> Hi Kevin,
> 
> Please consider this as a gentle reminder regarding the mail discussion 
> below, and let me know your thoughts, if any.
> 

Hi Sriram,

I think your suggested approach is the best available at present, i.e.
use a more generic name and document the most likely cause.

I thought about something with rte_vhost_avail_entries(), but I'm not
sure it's sufficient. You would need to re-write the OVS vhost send
code, and it seems to return 0 for multiple reasons.

thanks,
Kevin.

PS, I am on leave with limited email access, so best to discuss further
with Ian and Ilya if required.

> Thanks & Regards,
> Sriram.
> 
> -Original Message-
> From: Sriram Vatala 
> Sent: 26 June 2019 11:01
> To: 'Venkatesan Pradeep' ; 'Kevin Traynor' 
> ; 'ovs-dev@openvswitch.org' 
> Cc: 'Ilya Maximets' 
> Subject: RE: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk 
> and 
> vhostuser ports
> 
> Hi,
> 
> Agreed. As per the suggestions, I will go ahead with a generic name for
> packet drops due to transmission failure (i.e. packets left un-transmitted
> in the output packet buffer argument after the call to the transmit API),
> and since the most likely reason for this to happen is that the transmit
> queue is full or has been filled up, I will highlight this reason in the
> documentation.
> 
> Thanks & Regards,
> Sriram.
> 
> -Original Message-
> From: Venkatesan Pradeep 
> Sent: 24 June 2019 12:30
> To: Sriram Vatala ; 'Kevin Traynor'
> 
> Cc: 'Ilya Maximets' 
> Subject: RE: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and
> vhostuser ports
> 
> Hi,
> 
> Since the reason for introducing a new counter to track drops due to tx queue
> being full was to quickly isolate that the problem was on the VM side rather
> than on OVS, having a generic counter and name wouldn't help. One would still
> need to have some knowledge of the code to understand that the likely reason
> for tx failure was the queue being full. Short of having a new API that can
> return the failure code we won't be able to say for sure the drop reason.  If
> we want to use a generic name perhaps we can update the documentation to
> highlight the likely reason.
> 
> Thanks,
> 
> Pradeep
> -Original Message-
> From: ovs-dev-boun...@openvswitch.org  On
> Behalf Of Sriram Vatala via dev
> Sent: Wednesday, June 19, 2019 8:44 PM
> To: 'Kevin Traynor' ; ovs-dev@openvswitch.org
> Cc: 'Ilya Maximets' 
> Subject: Re: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and
> vhostuser ports
> 
> Hi kevin,
> 
> Thanks for your inputs. I will  send the updated patch ASAP.
> 
> Thanks & Regards,
> Sriram.
> 
> -Original Message-
> From: Kevin Traynor 
> Sent: 19 June 2019 19:07
> To: Sriram Vatala ; ovs-dev@openvswitch.org
> Cc: 'Ilya Maximets' 
> Subject: Re: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and
> vhostuser ports
> 
> On 19/06/2019 11:07, Sriram Vatala wrote:
>>
>> Hi Kevin,
>>
>> Thanks for reviewing the patch. Please find my answers to your
>> comments below.
>>
>> Comment-1
>> I'm not sure about this name, you would need to know that it was the
>> only reason rte_eth_tx_burst() and rte_vhost_enqueue_burst() will not
>> send pkts.
>> ---
>> Agree that queue full is not the only reason for which the above DPDK
>> apis will fail to send packets. But queue full is the most common reason.
>> Instead of overlooking other reasons like invalid queue-id, i can
>> change the name of the counter to a generic name something like
>> "tx_failed_drops".
>> what do you think?
>>
> 
> Sounds good to me.
> 
>> Comment-2
>> There can be other drops earlier in this function
>> ("__netdev_dpdk_vhost_send"), should they be logged also?
>> ---
>> These are the drops for invalid queue-id or vhost device id and if
>> device is not up.
>> Again these are very unlikely to happen, but i can add a rate limiting
>> warn log.
>>
> 
> Well I guess it doesn't invalidate your patches which provides drop stats for
> mtu/qos/qfull(or new name), so maybe it's ok to not log those other drops in
> the context of this patchset.
> 
>> Comment-3
>>>  rte_spinlock_lock(&dev->stats_lock);
>>>  netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts,
>>> total_pkts,
>>> + dropped);
>>> +dev->stats.tx_mtu_drops += mtu_drops;
>>
>> There is a function just above for updating vhost tx stats. Now the
>> updates are split with some updated in the function and some updated
>> here, it would be better to update them all in the same place.
>> ---
>> Agree. will change the implementation here.
>>
>> Comment-4
>>>
>>> +dropped += mtu_drops + qos_drops + qfull_drops;
>>
>> = is enough, you don't need +=
>> ---
>> Actually in the code above to this part in "dpdk_do_tx_copy" function,
>> dropped will updated if mbuf alloc fails.
>> Here is the code snippet:
>>
>>
>> pkts[txcnt] = rte_pktmbuf_alloc(dev->dpdk_mp->mp);
>> if (OVS_UNL

Re: [ovs-dev] criteria/benchmarks for an improved userspace datapath classifier?

2019-07-02 Thread Flavio Leitner via dev
On Tue, Jul 02, 2019 at 10:00:44AM -0700, Ben Pfaff wrote:
> Hi Ilya and Ian.  Please allow me to introduce Michal Orsak, a grad
> student currently looking at packet classifiers.  He's implemented a
> novel classifier that is faster than the one already in OVS in the
> benchmarks that he's run.  His classifier is tree-based, like most
> high-performance classifiers, but also incremental so that flows can be
> inserted and deleted individually without undue delay.  Ultimately, it
> might make sense to replace the OVS userspace datapath classifier by one
> based on the concepts that he's come up with.
> 
> A difficulty with classifiers, however, is coming up with an appropriate
> set of benchmarks to compare them fairly.  The userspace datapath
> focuses on performance, so do you have a set of benchmarks that you
> recommend for comparison?  Are there other criteria that would be
> important (besides correctness)?
> 
> (I'd take answers from anyone, not just Ian and Ilya!)

Hi Ben,

We use 1M constant IPv4 flows, and 200k new and 200k retiring IPv4
flows per second to compare as it seems to be close to some production
workloads.

fbl



Re: [ovs-dev] criteria/benchmarks for an improved userspace datapath classifier?

2019-07-02 Thread Ian Stokes

On 7/2/2019 6:00 PM, Ben Pfaff wrote:

Hi Ilya and Ian.  Please allow me to introduce Michal Orsak, a grad
student currently looking at packet classifiers.  He's implemented a
novel classifier that is faster than the one already in OVS in the
benchmarks that he's run.  His classifier is tree-based, like most
high-performance classifiers, but also incremental so that flows can be
inserted and deleted individually without undue delay.  Ultimately, it
might make sense to replace the OVS userspace datapath classifier by one
based on the concepts that he's come up with.


Thanks for the introduction Ben. I wasn't aware of Michal's work. The 
timing is quite apt, as currently I've been looking at Harry's (CC'd) 
approach to improving performance for the current DPCLS.


https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/358790.html

I was hoping to have this approach in place for OVS 2.12 as there's been 
ongoing testing/reviews across the community with general improvements 
of ~ 15% or more for both x86 and ARM architectures which looks promising.


I'd be interested in seeing Michal's proposals; if there are patches 
available, that would be great.




A difficulty with classifiers, however, is coming up with an appropriate
set of benchmarks to compare them fairly.  The userspace datapath
focuses on performance, so do you have a set of benchmarks that you
recommend for comparison?  Are there other criteria that would be
important (besides correctness)?


Agreed, that's something we've been looking for feedback on from the 
community to date. Most testing we have seen has been performance focused.


The areas I think would be good to measure are flow addition/flow 
deletion overhead in CPU cycles. Also, memory and time complexity 
comparisons for existing and proposed techniques would be nice to see.


Thanks
Ian



(I'd take answers from anyone, not just Ian and Ilya!)

Thanks,

Ben.





Re: [ovs-dev] [PATCHv5] tunnel: Add layer 2 IPv6 GRE encapsulation support.

2019-07-02 Thread Gregory Rose


On 7/1/2019 12:45 PM, William Tu wrote:

The patch adds ip6gre support. Tunnel type 'ip6gre' with packet_type=
legacy_l2 is a layer 2 GRE tunnel over IPv6, carrying inner ethernet packets
and encap with GRE header with outer IPv6 header.  Encapsulation of layer 3
packet over IPv6 GRE, ip6gre, is not supported yet.  I tested it by running:
   # make check-kernel TESTSUITEFLAGS='-k ip6gre'
under kernel 5.2 and for userspace:
   # make check TESTSUITEFLAGS='-k ip6gre'


On a Ubuntu 18.04 system with the distro provided 4.15.0-43-generic 
kernel I get the following splat:


[  510.428122] general protection fault:  [#1] SMP PTI
[  510.428192] Modules linked in: openvswitch(OE) tunnel6 
nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 netconsole xt_CHECKSUM 
iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
xt_tcpudp bridge stp llc iptable_filter snd_hda_codec_generic 
snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer 
joydev snd soundcore input_leds mac_hid serio_raw qemu_fw_cfg 
sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp 
libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs 
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel 
pcbc qxl ttm
[  510.428536]  drm_kms_helper syscopyarea sysfillrect sysimgblt 
fb_sys_fops aesni_intel aes_x86_64 crypto_simd glue_helper cryptd drm 
psmouse i2c_piix4 virtio_net virtio_blk pata_acpi floppy
[  510.428609] CPU: 0 PID: 35 Comm: kworker/0:1 Tainted: G OE    
4.15.0-43-generic #46-Ubuntu

[  510.428646] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  510.428727] Workqueue: ipv6_addrconf addrconf_dad_work
[  510.428763] RIP: 0010:__gre6_xmit+0x11f/0x2c0 [openvswitch]
[  510.428787] RSP: 0018:b1e840d8f3e8 EFLAGS: 00010286
[  510.428820] RAX: ffea RBX: 9dfaa8a1 RCX: 

[  510.428848] RDX: af55e000 RSI: 0040 RDI: 
9dfaae9cda00
[  510.428875] RBP: b1e840d8f440 R08: 5865 R09: 
b1e840d8f464
[  510.428903] R10: 2000 R11: 9dfab7002c00 R12: 
9dfaae9cda00
[  510.428930] R13:  R14: b1e840d8f468 R15: 

[  510.428972] FS:  () GS:9dfabfc0() 
knlGS:

[  510.429003] CS:  0010 DS:  ES:  CR0: 80050033
[  510.429950] CR2: 55d347881e60 CR3: 0001b760a005 CR4: 
003606f0
[  510.430860] DR0:  DR1:  DR2: 

[  510.431683] DR3:  DR6: fffe0ff0 DR7: 
0400

[  510.432494] Call Trace:
[  510.433318]  ip6gre_tunnel_xmit+0x103/0x530 [openvswitch]
[  510.434083]  ? __wake_up_common+0x73/0x130
[  510.434891]  ? __wake_up_common_lock+0x8e/0xc0
[  510.435763]  __ip6gre_tunnel_xmit+0x12/0x20 [openvswitch]
[  510.436562]  ovs_vport_send+0xd4/0x170 [openvswitch]
[  510.437393]  do_output+0x53/0x160 [openvswitch]
[  510.438221]  do_execute_actions+0x9a1/0x1880 [openvswitch]
[  510.438968]  ? __wake_up_common+0x73/0x130
[  510.439746]  ? __wake_up_common_lock+0x8e/0xc0
[  510.440515]  ? __skb_flow_dissect+0xd29/0x1450
[  510.441295]  ovs_execute_actions+0x4c/0x120 [openvswitch]
[  510.442030]  ? ovs_execute_actions+0x4c/0x120 [openvswitch]
[  510.442819]  ovs_dp_process_packet+0x95/0x150 [openvswitch]
[  510.443649]  ? ovs_ct_update_key+0x85/0xe0 [openvswitch]
[  510.92]  ovs_vport_receive+0x7e/0xd0 [openvswitch]
[  510.445294]  ? select_idle_sibling+0x29/0x410
[  510.446200]  ? select_task_rq_fair+0x637/0xab0
[  510.446990]  ? __update_load_avg_se.isra.38+0x1c0/0x1d0
[  510.447763]  ? try_to_wake_up+0x59/0x480
[  510.448549]  internal_dev_xmit+0x28/0x60 [openvswitch]
[  510.449356]  dev_hard_start_xmit+0xa8/0x210
[  510.450173]  __dev_queue_xmit+0x690/0x7d0
[  510.450908]  ? mroute6_socket+0x78/0xa0
[  510.451704]  dev_queue_xmit+0x10/0x20
[  510.452413]  ? dev_queue_xmit+0x10/0x20
[  510.453101]  ip6_finish_output2+0x339/0x520
[  510.453776]  ? ipv6_get_l4proto+0x96/0xe0 [nf_conntrack_ipv6]
[  510.454429]  ip6_finish_output+0x13a/0x1b0
[  510.455049]  ? ip6_finish_output+0x13a/0x1b0
[  510.455662]  ip6_output+0x6c/0x110
[  510.456285]  ? ip6_fragment+0x9f0/0x9f0
[  510.456825]  NF_HOOK.constprop.34+0x45/0xb0
[  510.457401]  ? ipv6_icmp_sysctl_init+0x40/0x40
[  510.458022]  mld_sendpack+0x163/0x210
[  510.458572]  mld_send_initial_cr.part.23+0x86/0xa0
[  510.459214]  ipv6_mc_dad_complete+0x2b/0x40
[  510.459744]  addrconf_dad_completed+0x302/0x370
[  510.460268]  ? __switch_to_asm+0x40/0x70
[  510.460768]  ? __switch_to_asm+0x34/0x70
[  510.461272]  ? __switch_to_asm+0x40/0x70
[  510.461766]  ? __switch_to_asm+0x34/0x70
[  510.462271]  ? __switch_to_asm+0x40/0x70
[  510.462738]  addrconf_dad_work+0x2b8/0x440
[

Re: [ovs-dev] [PATCH 1/1] dpif-netlink: Warn eth type 0x1234 not offloadable

2019-07-02 Thread Ben Pfaff
On Sun, Jun 30, 2019 at 06:16:05AM +, Eli Britstein wrote:
> Ethernet type 0x1234 is used for testing and not being offloadable. For
> testing offloadable features, warn about using this value.
> 
> Signed-off-by: Eli Britstein 
> ---
>  lib/dpif-netlink.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index ba80a0079..327076784 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -2007,6 +2007,7 @@ parse_flow_put(struct dpif_netlink *dpif, struct 
> dpif_flow_put *put)
>  
>  /* When we try to install a dummy flow from a probed feature. */
>  if (match.flow.dl_type == htons(0x1234)) {
> +VLOG_WARN("eth 0x1234 is special and not offloadable");
>  return EOPNOTSUPP;
>  }

This should be rate-limited.

Are you sure that you want it to be a warning?  It does not seem to be
an actual problem, so INFO or even DBG might be better.


[ovs-dev] criteria/benchmarks for an improved userspace datapath classifier?

2019-07-02 Thread Ben Pfaff
Hi Ilya and Ian.  Please allow me to introduce Michal Orsak, a grad
student currently looking at packet classifiers.  He's implemented a
novel classifier that is faster than the one already in OVS in the
benchmarks that he's run.  His classifier is tree-based, like most
high-performance classifiers, but also incremental so that flows can be
inserted and deleted individually without undue delay.  Ultimately, it
might make sense to replace the OVS userspace datapath classifier by one
based on the concepts that he's come up with.

A difficulty with classifiers, however, is coming up with an appropriate
set of benchmarks to compare them fairly.  The userspace datapath
focuses on performance, so do you have a set of benchmarks that you
recommend for comparison?  Are there other criteria that would be
important (besides correctness)?

(I'd take answers from anyone, not just Ian and Ilya!)

Thanks,

Ben.


Re: [ovs-dev] [PATCH v9 1/5] dpif-netdev: implement function pointers/subtable

2019-07-02 Thread Ian Stokes

On 7/2/2019 4:02 PM, Van Haaren, Harry wrote:

-Original Message-
From: Stokes, Ian
Sent: Tuesday, July 2, 2019 3:40 PM
To: Van Haaren, Harry ; ovs-dev@openvswitch.org
Cc: i.maxim...@samsung.com
Subject: Re: [ovs-dev] [PATCH v9 1/5] dpif-netdev: implement function
pointers/subtable

On 5/15/2019 6:02 PM, Ian Stokes wrote:

On 5/8/2019 4:13 PM, Harry van Haaren wrote:

This allows plugging-in of different subtable hash-lookup-verify
routines, and allows special casing of those functions based on
known context (eg: # of bits set) of the specific subtable.

Signed-off-by: Harry van Haaren 

---

v9:
- Use count_1bits in favour of __builtin_popcount (Ilya)

v6:
- Implement subtable effort per packet "lookups_match" counter  (Ilya)
- Remove double newline (Eelco)
- Remove doubel * before comments (Eelco)
- Reword comments in dpcls_lookup() for clarity (Harry)
---
   lib/dpif-netdev.c | 135 +++---
   1 file changed, 93 insertions(+), 42 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 5a6f2abac..e2e2c14e7 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -7572,6 +7572,27 @@ dpif_dummy_register(enum dummy_level level)
   



   /* Datapath Classifier. */
+/* forward declaration for lookup_func typedef */


Minor nit above, general rule of thumb for comment style, start with
capital letter and finish with period.


+struct dpcls_subtable;
+
+/* Lookup function for a subtable in the dpcls. This function is called
+ * by each subtable with an array of packets, and a bitmask of
packets to
+ * perform the lookup on. Using a function pointer gives flexibility to
+ * optimize the lookup function based on subtable properties and the
+ * CPU instruction set available at runtime.
+ */
+typedef uint32_t (*dpcls_subtable_lookup_func)(struct dpcls_subtable
*subtable,
+    uint32_t keys_map, const struct netdev_flow_key *keys[],
+    struct dpcls_rule **rules);


Alignment for the arguments above looks odd, typically arguments are
kept vertically in line with one another *similar to
dpcls_subtable_lookup_generic below).


+
+/* Prototype for generic lookup func, using same code path as before */


Period at end of comment above.


+uint32_t
+dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable,
+  uint32_t keys_map,
+  const struct netdev_flow_key *keys[],
+  struct dpcls_rule **rules);
+
+


One minor follow up: extra whitespace seems to have been added here; it can
be reduced to 1 new line.


Will fix.






+    uint32_t pkts_matched = count_1bits(found_map);


Just to clarify, at this point found_map has been returned and it only
contains matches where packets were found? i.e. it doesn't contain
matches from possible hash collisions?


+    lookups_match += pkts_matched * subtable_pos;


Hi Harry, can you clarify above why '* subtable_pos' is used? Is that
right? I'm just thinking that you already have the number of packets
matched from the count_1bits(found_map).


The "lookups match" counter variable name is a bit misleading, but I'd
change it in this patchset as its orthogonal to the DPCLS function pointer
concept.

The counter counts "number of subtables searched per hit packet", so to
speak. See Ilya's input and my response here:
https://www.mail-archive.com/ovs-dev@openvswitch.org/msg31442.html

This is a pseudo code of expected behavior:
1st subtable packet hits are += 1,
2nd subtable packet hits are += 2,
3rd subtable packet hits are += 3  etc.

At the end of a burst, we have a bitmask of packets hit, and a counter for
"subtable effort per packet matched".

Given that two experienced OVS folks asked roughly the same question, it
hints that it's time for a cleanup & refactor, perhaps even a
/* descriptive comment */ :)



Yeah, that makes more sense; it confirms what I suspected :).

Sure a description of the purpose would help here in the next revision.

Thanks
Ian




Re: [ovs-dev] [PATCHv13] netdev-afxdp: add new netdev type for AF_XDP.

2019-07-02 Thread Ilya Maximets
On 28.06.2019 19:37, William Tu wrote:
>>
>>
>> One more thing I noticed is the same issue as you had with completion queue, 
>> but
>> with rx queue. When I'm trying to send traffic from 2 threads to the same 
>> port,
> 
> Is the 2 threads send traffic using afxdp tx?

Yes.

> 
>> I'm starting receiving same pointers from rx ring. Not only the same ring 
>> entries,
>> but there was cases where two identical pointers was stored sequentially in 
>> rx ring.
> 
> I use similar way as used in completion queue (assign UINT64_MAX to rx ring
> at netdev_afxdp_rxq_recv) but do not see any identical pointers.
> 
>> I'm more and more thinking that it's a kernel/libbpf bug. The last bit that 
>> left
>> for checking is the pointers inside the fill queue. All other parts in OVS 
>> seems to
>> work correctly. I'll send more information about the testcase later after 
>> re-checking
>> with the most recent bpf-next.
>>
> 
> Look forward to your investigation! Thanks a lot.

It was a kernel bug: the generic receive path doesn't have any locks,
but generic receive could be triggered from different cores at the same
time, breaking the rx and fill queues. I tried to run 2 traffic flows over
the veth pair, one side of which was opened by netdev-afxdp in OVS, and
OVS constantly crashed because two kernel threads tried to allocate the
same addresses from the fill queue and pushed them to the rx queue. That
is the root cause of the duplicated addresses in the RX queue. Data in
these descriptors was most probably corrupted too.

I've sent a patch for this issue:
https://lore.kernel.org/bpf/20190702143634.19688-1-i.maxim...@samsung.com/

I'm still having some troubles with this scenario. Sometimes the traffic
simply stops flowing. But this seems a different issue. Most likely, one
more kernel issue...
However, OVS doesn't crash for me anymore. And this is good news.


-
Full testcase description
-
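# Topology, as I read the steps below: two namespaces, each holding one end
# of a veth pair; the host-side peers (patch-p0/patch-p1) are the interfaces
# opened by netdev-afxdp on the OVS bridge br0, and iperf3 then runs between
# the namespaces and via br0's internal port.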
ip netns add at_ns0
ip netns add at_ns1

ip link add p0 type veth peer name patch-p0
ethtool -K p0 tx off rxvlan off txvlan off
  
ip link set p0 netns at_ns0  
ip link set dev patch-p0 up
ip link set dev patch-p0 promisc on
  
ip netns exec at_ns0 ip addr add "10.1.1.1/24" dev p0
ip netns exec at_ns0 ip link set dev p0 up

ip link add p1 type veth peer name patch-p1
ethtool -K p1 tx off rxvlan off txvlan off

ip link set p1 netns at_ns1
ip link set dev patch-p1 up
ip link set dev patch-p1 promisc on

ip netns exec at_ns1 ip addr add "10.1.1.2/24" dev p1
ip netns exec at_ns1 ip link set dev p1 up



# up the internal port of ovs bridge
ip link set dev br0 up
ip addr add dev br0 10.1.1.13/24


[shell#1] ip netns exec at_ns1 iperf3 -s
[shell#2] ip netns exec at_ns1 iperf3 -s -p 5008
[shell#3] ip netns exec at_ns0 iperf3 -c 10.1.1.2 -t 3600

[shell#4] iperf3 -c 10.1.1.2 -t 3600 -p 5008 # Works via internal port.



-
For this testcase to work you need 'skb_unclone' patch applied in kernel,
otherwise TCP traffic will not flow.


Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH v9 1/5] dpif-netdev: implement function pointers/subtable

2019-07-02 Thread Van Haaren, Harry
> -Original Message-
> From: Stokes, Ian
> Sent: Tuesday, July 2, 2019 3:40 PM
> To: Van Haaren, Harry ; ovs-dev@openvswitch.org
> Cc: i.maxim...@samsung.com
> Subject: Re: [ovs-dev] [PATCH v9 1/5] dpif-netdev: implement function
> pointers/subtable
> 
> On 5/15/2019 6:02 PM, Ian Stokes wrote:
> > On 5/8/2019 4:13 PM, Harry van Haaren wrote:
> >> This allows plugging-in of different subtable hash-lookup-verify
> >> routines, and allows special casing of those functions based on
> >> known context (eg: # of bits set) of the specific subtable.
> >>
> >> Signed-off-by: Harry van Haaren 
> >>
> >> ---
> >>
> >> v9:
> >> - Use count_1bits in favour of __builtin_popcount (Ilya)
> >>
> >> v6:
> >> - Implement subtable effort per packet "lookups_match" counter  (Ilya)
> >> - Remove double newline (Eelco)
> >> - Remove doubel * before comments (Eelco)
> >> - Reword comments in dpcls_lookup() for clarity (Harry)
> >> ---
> >>   lib/dpif-netdev.c | 135 +++---
> >>   1 file changed, 93 insertions(+), 42 deletions(-)
> >>
> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >> index 5a6f2abac..e2e2c14e7 100644
> >> --- a/lib/dpif-netdev.c
> >> +++ b/lib/dpif-netdev.c
> >> @@ -7572,6 +7572,27 @@ dpif_dummy_register(enum dummy_level level)
> >>   

> >>   /* Datapath Classifier. */
> >> +/* forward declaration for lookup_func typedef */
> >
> > Minor nit above, general rule of thumb for comment style, start with
> > capital letter and finish with period.
> >
> >> +struct dpcls_subtable;
> >> +
> >> +/* Lookup function for a subtable in the dpcls. This function is called
> >> + * by each subtable with an array of packets, and a bitmask of
> >> packets to
> >> + * perform the lookup on. Using a function pointer gives flexibility to
> >> + * optimize the lookup function based on subtable properties and the
> >> + * CPU instruction set available at runtime.
> >> + */
> >> +typedef uint32_t (*dpcls_subtable_lookup_func)(struct dpcls_subtable
> >> *subtable,
> >> +    uint32_t keys_map, const struct netdev_flow_key *keys[],
> >> +    struct dpcls_rule **rules);
> >
> > Alignment for the arguments above looks odd, typically arguments are
> > kept vertically in line with one another *similar to
> > dpcls_subtable_lookup_generic below).
> >
> >> +
> >> +/* Prototype for generic lookup func, using same code path as before */
> >
> > Period at end of comment above.
> >
> >> +uint32_t
> >> +dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable,
> >> +  uint32_t keys_map,
> >> +  const struct netdev_flow_key *keys[],
> >> +  struct dpcls_rule **rules);
> >> +
> >> +
> 
> One minor follow up: extra whitespace seems to have been added here; it can
> be reduced to 1 new line.

Will fix.



> >
> >> +    uint32_t pkts_matched = count_1bits(found_map);
> >
> > Just to clarify, at this point found_map has been returned and it only
> > contains matches where packets were found? i.e. it doesn't contain
> > matches from possible hash collisions?
> >
> >> +    lookups_match += pkts_matched * subtable_pos;
> 
> Hi Harry, can you clarify above why '* subtable_pos' is used? Is that
> right? I'm just thinking that you already have the number of packets
> matched from the count_1bits(found_map).

The "lookups match" counter variable name is a bit misleading, but I'd
change it in this patchset as its orthogonal to the DPCLS function pointer
concept.

The counter counts "number of subtables searched per hit packet", so to
speak. See Ilya's input and my response here:
https://www.mail-archive.com/ovs-dev@openvswitch.org/msg31442.html

This is a pseudo code of expected behavior:
1st subtable packet hits are += 1,
2nd subtable packet hits are += 2,
3rd subtable packet hits are += 3  etc.

At the end of a burst, we have a bitmask of packets hit, and a counter for
"subtable effort per packet matched".

Given that two experienced OVS folks asked roughly the same question, it
hints that it's time for a cleanup & refactor, perhaps even a
/* descriptive comment */ :)



Thanks for review / comments! -Harry


Re: [ovs-dev] [PATCH v9 1/5] dpif-netdev: implement function pointers/subtable

2019-07-02 Thread Ian Stokes

On 5/15/2019 6:02 PM, Ian Stokes wrote:

On 5/8/2019 4:13 PM, Harry van Haaren wrote:

This allows plugging-in of different subtable hash-lookup-verify
routines, and allows special casing of those functions based on
known context (eg: # of bits set) of the specific subtable.

Signed-off-by: Harry van Haaren 

---

v9:
- Use count_1bits in favour of __builtin_popcount (Ilya)

v6:
- Implement subtable effort per packet "lookups_match" counter  (Ilya)
- Remove double newline (Eelco)
- Remove doubel * before comments (Eelco)
- Reword comments in dpcls_lookup() for clarity (Harry)
---
  lib/dpif-netdev.c | 135 +++---
  1 file changed, 93 insertions(+), 42 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 5a6f2abac..e2e2c14e7 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -7572,6 +7572,27 @@ dpif_dummy_register(enum dummy_level level)
  
  /* Datapath Classifier. */
+/* forward declaration for lookup_func typedef */


Minor nit above, general rule of thumb for comment style, start with 
capital letter and finish with period.



+struct dpcls_subtable;
+
+/* Lookup function for a subtable in the dpcls. This function is called
+ * by each subtable with an array of packets, and a bitmask of 
packets to

+ * perform the lookup on. Using a function pointer gives flexibility to
+ * optimize the lookup function based on subtable properties and the
+ * CPU instruction set available at runtime.
+ */
+typedef uint32_t (*dpcls_subtable_lookup_func)(struct dpcls_subtable 
*subtable,

+    uint32_t keys_map, const struct netdev_flow_key *keys[],
+    struct dpcls_rule **rules);


Alignment for the arguments above looks odd, typically arguments are 
kept vertically in line with one another *similar to 
dpcls_subtable_lookup_generic below).



+
+/* Prototype for generic lookup func, using same code path as before */


Period at end of comment above.


+uint32_t
+dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable,
+  uint32_t keys_map,
+  const struct netdev_flow_key *keys[],
+  struct dpcls_rule **rules);
+
+


One minor follow up: extra whitespace seems to have been added here; it can 
be reduced to 1 new line.



  /* A set of rules that all have the same fields wildcarded. */
  struct dpcls_subtable {
  /* The fields are only used by writers. */
@@ -7581,6 +7602,13 @@ struct dpcls_subtable {
  struct cmap rules;   /* Contains "struct dpcls_rule"s. */
  uint32_t hit_cnt;    /* Number of match hits in subtable 
in current

  optimization interval. */
+
+    /* the lookup function to use for this subtable. If there is a known
+ * property of the subtable (eg: only 3 bits of miniflow metadata is
+ * used for the lookup) then this can point at an optimized 
version of

+ * the lookup function for this particular subtable. */

the -> The above.

+    dpcls_subtable_lookup_func lookup_func;
+
  struct netdev_flow_key mask; /* Wildcards for fields (const). */
  /* 'mask' must be the last field, additional space is allocated 
here. */

  };
@@ -7640,6 +7668,10 @@ dpcls_create_subtable(struct dpcls *cls, const 
struct netdev_flow_key *mask)

  cmap_init(&subtable->rules);
  subtable->hit_cnt = 0;
  netdev_flow_key_clone(&subtable->mask, mask);
+
+    /* decide which hash/lookup/verify function to use */
+    subtable->lookup_func = dpcls_subtable_lookup_generic;


So by default dpcls_subtable_lookup_generic is set as the lookup 
function. Makes sense, as this is the only lookup available at this 
stage. I assume that later in the series a mechanism to select a 
different lookup is introduced. Or, by default, does the lookup always 
start off as the generic option when a subtable is created, even when 
other options are possible?



+
  cmap_insert(&cls->subtables_map, &subtable->cmap_node, mask->hash);
  /* Add the new subtable at the end of the pvector (with no hits 
yet) */

  pvector_insert(&cls->subtables, subtable, 0);
@@ -7800,6 +7832,53 @@ dpcls_rule_matches_key(const struct dpcls_rule 
*rule,

  return true;
  }
+uint32_t
+dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable,
+  uint32_t keys_map,
+  const struct netdev_flow_key *keys[],
+  struct dpcls_rule **rules)
+{
+    int i;
+    /* Compute hashes for the remaining keys.  Each search-key is
+ * masked with the subtable's mask to avoid hashing the 
wildcarded

+ * bits. */
+    uint32_t hashes[NETDEV_MAX_BURST];
+    ULLONG_FOR_EACH_1(i, keys_map) {
+    hashes[i] = netdev_flow_key_hash_in_mask(keys[i],
+ &subtable->mask);
+    }
+
+    /* Lookup. */
+    const struct cmap_node *no

Re: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and vhostuser ports

2019-07-02 Thread Sriram Vatala via dev
Hi Kevin,

Please consider this as a gentle reminder regarding the mail discussion below 
and let me know your thoughts, if any.

Thanks & Regards,
Sriram.

-Original Message-
From: Sriram Vatala 
Sent: 26 June 2019 11:01
To: 'Venkatesan Pradeep' ; 'Kevin Traynor' 
; 'ovs-dev@openvswitch.org' 
Cc: 'Ilya Maximets' 
Subject: RE: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and 
vhostuser ports

Hi,

Agreed. As per the suggestions, I will go ahead with a generic name for packet
drops due to transmission failure (i.e. packets left un-transmitted in the
output packet buffer argument after the call to the transmit API), and since
the most likely reason for this to happen is that the transmit queue is full
or has been filled up, I will highlight this reason in the documentation.

Thanks & Regards,
Sriram.

-Original Message-
From: Venkatesan Pradeep 
Sent: 24 June 2019 12:30
To: Sriram Vatala ; 'Kevin Traynor'

Cc: 'Ilya Maximets' 
Subject: RE: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and
vhostuser ports

Hi,

Since the reason for introducing a new counter to track drops due to tx queue
being full was to quickly isolate that the problem was on the VM side rather
than on OVS, having a generic counter and name wouldn't help. One would still
need to have some knowledge of the code to understand that the likely reason
for tx failure was the queue being full. Short of having a new API that can
return the failure code we won't be able to say for sure the drop reason.  If
we want to use a generic name perhaps we can update the documentation to
highlight the likely reason.

Thanks,

Pradeep
-Original Message-
From: ovs-dev-boun...@openvswitch.org  On
Behalf Of Sriram Vatala via dev
Sent: Wednesday, June 19, 2019 8:44 PM
To: 'Kevin Traynor' ; ovs-dev@openvswitch.org
Cc: 'Ilya Maximets' 
Subject: Re: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and
vhostuser ports

Hi kevin,

Thanks for your inputs. I will  send the updated patch ASAP.

Thanks & Regards,
Sriram.

-Original Message-
From: Kevin Traynor 
Sent: 19 June 2019 19:07
To: Sriram Vatala ; ovs-dev@openvswitch.org
Cc: 'Ilya Maximets' 
Subject: Re: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and
vhostuser ports

On 19/06/2019 11:07, Sriram Vatala wrote:
>
> Hi Kevin,
>
> Thanks for reviewing the patch. Please find my answers to your
> comments below.
>
> Comment-1
> I'm not sure about this name, you would need to know that it was the
> only reason rte_eth_tx_burst() and rte_vhost_enqueue_burst() will not
> send pkts.
> ---
> Agree that queue full is not the only reason for which the above DPDK
> apis will fail to send packets. But queue full is the most common reason.
> Instead of overlooking other reasons like invalid queue-id, i can
> change the name of the counter to a generic name something like
> "tx_failed_drops".
> what do you think?
>

Sounds good to me.

> Comment-2
> There can be other drops earlier in this function
> ("__netdev_dpdk_vhost_send"), should they be logged also?
> ---
> These are the drops for invalid queue-id or vhost device id and if
> device is not up.
> Again these are very unlikely to happen, but i can add a rate limiting
> warn log.
>

Well I guess it doesn't invalidate your patches which provides drop stats for
mtu/qos/qfull(or new name), so maybe it's ok to not log those other drops in
the context of this patchset.

> Comment-3
>>  rte_spinlock_lock(&dev->stats_lock);
>>  netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts,
>> total_pkts,
>> + dropped);
>> +dev->stats.tx_mtu_drops += mtu_drops;
>
> There is a function just above for updating vhost tx stats. Now the
> updates are split with some updated in the function and some updated
> here, it would be better to update them all in the same place.
> ---
> Agree. will change the implementation here.
>
> Comment-4
>>
>> +dropped += mtu_drops + qos_drops + qfull_drops;
>
> = is enough, you don't need +=
> ---
> Actually in the code above to this part in "dpdk_do_tx_copy" function,
> dropped will updated if mbuf alloc fails.
> Here is the code snippet:
>
>
> pkts[txcnt] = rte_pktmbuf_alloc(dev->dpdk_mp->mp);
> if (OVS_UNLIKELY(!pkts[txcnt])) {
> dropped += cnt - i;
> break;
> }
>
> To take care not to miss this in total drops, i am using dropped+=,
> Even if the above part of the code doesn't hit, as dropped variable is
> initialized to Zero that expression should not result in incorrect
> value for drops.
>

Yes, you're right, I missed it, thanks.

>
> Comment-5
>
>> +> +
>> filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-get-iface-st
>> ats
>> +
>
> Probably better to be consistent and not wrap this line
> ---
> Ok. I wrapped this line as utilities/checkpatch script is giving
> warning as the line length is exceeding 79 characters. I will unwrap this.
>
>
>
> Comment-6
>
>> I think

Re: [ovs-dev] [Suspected-Phishing]Re: [Suspected-Phishing]Re: [PATCH v2 1/3] netdev: Dynamic per-port Flow API.

2019-07-02 Thread Roi Dayan



On 2019-07-02 2:23 PM, Roi Dayan wrote:
>>> Regarding deprecation, I'd like to remove all the functionality from 
>>> ovs-dpctl
>>> utility keeping only ability to remove system datapaths, since this, IMHO,
>>> is the only useful function.
>>>
 Imagine some flows are in TC and some in OVS. I guess dpctl will only
 show the OVS ones.
>>> Yes, and this is documented.
>> I missed the documentation but bisected to find the offending commit. I 
>> think it's better to deprecate as you suggested above, maybe with some 
>> message to refer users to the documentation.
> It was added when you try to use the type arg.
> 
> # ovs-dpctl dump-flows type=tc
> ovs-dpctl: Invalid argument 'type'. Use 'ovs-appctl dpctl/dump-flows' instead.
> 
> Just outputting the msg all the time would have been annoying.
> 

Hi Ilya,

I didn't understand at first what Eli intended, but I do now.
Eli suggests that ovs-dpctl always output a deprecation notice
pointing users to ovs-appctl instead, so all users will see it and
start migrating to ovs-appctl.  Is that acceptable?

It could be printed by any ovs-dpctl command, or only by specific ones
like dump-flows, but always rather than only when the type arg is used.
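
If it helps the discussion, a minimal sketch of what such an always-on notice
could look like; the wording and the placement (e.g. early in ovs-dpctl's
main()) are assumptions, not an agreed patch:

    #include <stdio.h>

    /* Hypothetical helper printed unconditionally by ovs-dpctl. */
    static void
    print_dpctl_deprecation_notice(void)
    {
        fprintf(stderr,
                "ovs-dpctl is deprecated for dumping and manipulating flows; "
                "use 'ovs-appctl dpctl/...' against the running ovs-vswitchd "
                "instead.\n");
    }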

Thanks,
Roi
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [Suspected-Phishing]Re: [Suspected-Phishing]Re: [Suspected-Phishing] [PATCH 1/1] dpif-netlink: Warn eth type 0x1234 not offloadable

2019-07-02 Thread Roi Dayan



On 2019-07-02 2:33 PM, Roi Dayan wrote:
> 
> 
> On 2019-07-02 2:21 PM, Roi Dayan wrote:
>>
>>
>> On 2019-06-30 9:16 AM, Eli Britstein wrote:
>>> Ethernet type 0x1234 is used for testing and not being offloadable. For
>>> testing offloadable features, warn about using this value.
>>>
>>> Signed-off-by: Eli Britstein 
>>> ---
>>>  lib/dpif-netlink.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
>>> index ba80a0079..327076784 100644
>>> --- a/lib/dpif-netlink.c
>>> +++ b/lib/dpif-netlink.c
>>> @@ -2007,6 +2007,7 @@ parse_flow_put(struct dpif_netlink *dpif, struct 
>>> dpif_flow_put *put)
>>>  
>>>  /* When we try to install a dummy flow from a probed feature. */
>>>  if (match.flow.dl_type == htons(0x1234)) {
>>> +VLOG_WARN("eth 0x1234 is special and not offloadable");
>>>  return EOPNOTSUPP;
>>>  }
>>>  
>>>
>>
>> Why is a warn needed?
> 
> Never mind, somehow I skipped your commit msg.
> Did you check by any chance that the unit tests won't fail, as they need to
> filter this from the log?
> 
Tested here. I didn't need to filter it.

https://travis-ci.org/roidayan/ovs/builds/553249802

Acked-by: Roi Dayan 

>>
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [Suspected-Phishing] [PATCH 1/1] dpif-netlink: Warn eth type 0x1234 not offloadable

2019-07-02 Thread Roi Dayan



On 2019-06-30 9:16 AM, Eli Britstein wrote:
> Ethernet type 0x1234 is used for testing and not being offloadable. For
> testing offloadable features, warn about using this value.
> 
> Signed-off-by: Eli Britstein 
> ---
>  lib/dpif-netlink.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index ba80a0079..327076784 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -2007,6 +2007,7 @@ parse_flow_put(struct dpif_netlink *dpif, struct 
> dpif_flow_put *put)
>  
>  /* When we try to install a dummy flow from a probed feature. */
>  if (match.flow.dl_type == htons(0x1234)) {
> +VLOG_WARN("eth 0x1234 is special and not offloadable");
>  return EOPNOTSUPP;
>  }
>  
> 

Why is a warn needed?

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [Suspected-Phishing]Re: [Suspected-Phishing] [PATCH 1/1] dpif-netlink: Warn eth type 0x1234 not offloadable

2019-07-02 Thread Roi Dayan



On 2019-07-02 2:21 PM, Roi Dayan wrote:
> 
> 
> On 2019-06-30 9:16 AM, Eli Britstein wrote:
>> Ethernet type 0x1234 is used for testing and not being offloadable. For
>> testing offloadable features, warn about using this value.
>>
>> Signed-off-by: Eli Britstein 
>> ---
>>  lib/dpif-netlink.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
>> index ba80a0079..327076784 100644
>> --- a/lib/dpif-netlink.c
>> +++ b/lib/dpif-netlink.c
>> @@ -2007,6 +2007,7 @@ parse_flow_put(struct dpif_netlink *dpif, struct 
>> dpif_flow_put *put)
>>  
>>  /* When we try to install a dummy flow from a probed feature. */
>>  if (match.flow.dl_type == htons(0x1234)) {
>> +VLOG_WARN("eth 0x1234 is special and not offloadable");
>>  return EOPNOTSUPP;
>>  }
>>  
>>
> 
> Why is a warn needed?

Never mind, somehow I skipped your commit msg.
Did you check by any chance that the unit tests won't fail, as they need to
filter this from the log?

> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [Suspected-Phishing]Re: [PATCH v2 1/3] netdev: Dynamic per-port Flow API.

2019-07-02 Thread Roi Dayan



On 2019-07-01 1:53 PM, Eli Britstein wrote:
> 
> On 7/1/2019 1:46 PM, Ilya Maximets wrote:
>> On 01.07.2019 13:24, Eli Britstein wrote:
>>> On 7/1/2019 1:13 PM, Ilya Maximets wrote:
 On 30.06.2019 7:47, Eli Britstein wrote:
> This patch breaks ovs-dpctl dump-flows, when using TC offloads (kernel).
>
> I added a print in netdev_flow_dump_create, and flow_api is NULL when
> invoking ovs-dpctl dump-flows.
>
> I think new netdev objects are created for the ports (netdev_open), but
> not properly initialized.
>
> Could you please have a look?
>
 Hi. This is a known thing that ovs-dpctl is no longer suitable for dumping
 of HW offloaded flows. Please, use 'ovs-appctl dpctl/dump-flows' instead.
 There was a patch for documentation in v4 of patch-set:
   
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359013.html
 Here is the relevant thread:
   
 https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/358985.html

 However, I missed updating the NEWS about that change. Will send the patch
 for this soon.

 Best regards, Ilya Maximets.
>>> If that is so, don't you think this should be either fixed or deprecated?
>> Fixing seems not possible, because ovs-dpctl is a standalone application
>> that works only with default configuration and HW offloading is disabled
>> by default.
>>
>> Regarding deprecation, I'd like to remove all the functionality from 
>> ovs-dpctl
>> utility keeping only ability to remove system datapaths, since this, IMHO,
>> is the only useful function.
>>
>>> Imagine some flows are in TC and some in OVS. I guess dpctl will only
>>> show the OVS ones.
>> Yes, and this is documented.
> I missed the documentation but bisected to find the offending commit. I 
> think it's better to deprecate as you suggested above, maybe with some 
> message to refer users to the documentation.

A message was added when you try to use the type arg.

# ovs-dpctl dump-flows type=tc
ovs-dpctl: Invalid argument 'type'. Use 'ovs-appctl dpctl/dump-flows' instead.

Just outputting the msg all the time would have been annoying.

>>
>>> I know about that appctl, but at least currently it misses some
>>> information in the dump like "dp" and "offloaded".
>> appctl and dpctl work through the same code in lib/dpctl.c, so everything
>> is supported. If you don't see those fields, increase the verbosity with
>> the '--more' argument.
> Right. Sorry.
>>
>> Best regards, Ilya Maximets.
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] vswitchd: Always cleanup userspace datapath.

2019-07-02 Thread Ilya Maximets
On 01.07.2019 19:52, Ben Pfaff wrote:
> On Thu, Jun 27, 2019 at 08:24:46PM +0300, Ilya Maximets wrote:
>> On 26.06.2019 21:27, Ben Pfaff wrote:
>>> On Tue, Jun 25, 2019 at 01:12:11PM +0300, Ilya Maximets wrote:
 'netdev' datapath is implemented within ovs-vswitchd process and can
 not exist without it, so it should be gracefully terminated with a
 full cleanup of resources upon ovs-vswitchd exit.

 This change forces dpif cleanup for 'netdev' datapath regardless of
 passing '--cleanup' to 'ovs-appctl exit'. Such a solution allows not
 passing this additional option every time for userspace datapath
 installations and also allows not terminating the system datapath in
 setups where both datapaths run at the same time.

 Exception made for 'internal' ports that could have user ip/route
 configuration. These ports will not be removed without '--cleanup'.

 This change fixes OVS disappearing from the DPDK point of view
 (keeping HW NICs improperly configured, sudden closing of vhost-user
 connections) and will help with Linux device cleanup with the upcoming
 AF_XDP netdev support.

 Signed-off-by: Ilya Maximets 
>>>
>>> I'm having trouble figuring out what the critical step is in the
>>> destruction process that this enables or disables.  It controls whether
>>> dpif_port_del() gets called.  There's a lot of stuff under
>>> dpif_port_del(), most of it indirect.  I'm not sure which bit is the
>>> important one.  Would you mind explaining what it is as part of the
>>> commit message?
>>
>> """
>> The main part is that dpif_port_del() will lead to netdev_close()
>> and a subsequent netdev_class->destroy(dev), which will stop HW NICs
>> and free their resources. For vhost-user interfaces it will unregister
>> the vhost driver with a properly closed vhost-user connection. For the
>> upcoming AF_XDP netdev this will allow gracefully destroying XDP sockets
>> and unloading XDP programs from Linux interfaces.
>> Another important thing is that port deletion will also trigger
>> flushing of flows offloaded to HW NICs.
>> """
>>
>> Does above shed some light on the main goals of this patch?
>> I could add this information to commit message while applying the
>> patch.
> 
> Thanks.  It helps a lot.  Please add it to the commit message.
> 
> Acked-by: Ben Pfaff 

Thanks, Ben, Flavio and William!
I added the above information to the commit message and pushed the patch to master.

Best regards, Ilya Maximets.
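
For readers following along, a very rough sketch of the shape of that cleanup
chain; the types and function names below are placeholders, not the real
dpif/netdev symbols in lib/dpif.c and lib/netdev.c:

    /* Placeholder stand-in for a datapath port and its netdev. */
    struct port_sketch {
        int netdev_ref_cnt;
        void (*netdev_destroy)(struct port_sketch *); /* netdev_class->destroy() */
    };

    static void
    netdev_close_sketch(struct port_sketch *p)
    {
        if (p && !--p->netdev_ref_cnt) {
            /* Last reference gone: stop the HW NIC and free its resources,
             * unregister vhost-user, unload XDP programs, etc. */
            p->netdev_destroy(p);
        }
    }

    static void
    dpif_port_del_sketch(struct port_sketch *p)
    {
        /* Port removal also flushes flows offloaded to the HW NIC before
         * dropping the port's reference to its netdev. */
        netdev_close_sketch(p);
    }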
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] netdev-offload-tc: Fix requesting match on wildcarded vlan tpid.

2019-07-02 Thread Ilya Maximets
On 19.06.2019 12:16, Ilya Maximets wrote:
> 'mask' must be checked first before configuring key in flower.
> 
> CC: Eli Britstein 
> Fixes: 0b0a84783cd6 ("netdev-tc-offloads: Support match on priority tags")
> Signed-off-by: Ilya Maximets 
> ---
>  lib/netdev-offload-tc.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)

Thanks, Roi and Eli! Applied to master.

Best regards, Ilya Maximets.
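
For readers unfamiliar with the flower plumbing, a minimal illustration of the
pattern the fix refers to; the struct and field below are purely illustrative,
not the actual netdev-offload-tc code:

    #include <stdint.h>

    struct vlan_match_sketch {
        uint16_t tpid;
    };

    /* Copy a field into the flower key only when its mask says the flow
     * actually matches on it; otherwise leave it wildcarded. */
    static void
    set_vlan_tpid_sketch(struct vlan_match_sketch *key,
                         struct vlan_match_sketch *mask,
                         const struct vlan_match_sketch *match_key,
                         const struct vlan_match_sketch *match_mask)
    {
        if (match_mask->tpid) {           /* check the mask first */
            key->tpid = match_key->tpid;
            mask->tpid = match_mask->tpid;
        }
    }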
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] NEWS: Update regarding dumping HW offloaded flows.

2019-07-02 Thread Ilya Maximets
On 01.07.2019 13:52, Ilya Maximets wrote:
> NEWS update was missed while updating docs for dynamic Flow API.
> Since this is a user visible change, it should be mentioned here.
> 
> Fixes: d74ca2269e36 ("dpctl: Update docs about dump-flows and HW offloading.")
> Signed-off-by: Ilya Maximets 
> ---
>  NEWS | 2 ++
>  1 file changed, 2 insertions(+)
> 

Thanks, Roi and Eli! Applied to master.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 3/3] ovn: Send GARP for router port IPs of a router port connected to bridged logical switch

2019-07-02 Thread Dumitru Ceara
On Mon, Jul 1, 2019 at 9:45 AM  wrote:
>
> From: Numan Siddique 
>
> This patch handles sending GARPs for
>
>  - router port IPs of a distributed router port
>
>  - router port IPs of a router port which belongs to gateway router
>(with the option - redirect-chassis set in Logical_Router.options)
>
> Signed-off-by: Numan Siddique 

I tested the patches on my setup and the code looks good to me.
A bit unrelated to this change, but at least for me
ovn_port_update_sbrec is getting quite hard to follow. Maybe we can
consider refactoring it a bit in the future so that the different
functionalities are handled by smaller functions (i.e., router-port,
switch-port connected to vif, switch-port connected to router).
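
As a rough illustration of that split; the helper functions below are
hypothetical and only the dispatch shape is suggested (op->nbrp and op->peer
follow the existing structures):

    /* Hypothetical decomposition of ovn_port_update_sbrec(). */
    static void
    ovn_port_update_sbrec_sketch(struct northd_context *ctx,
                                 const struct ovn_port *op)
    {
        if (op->nbrp) {
            update_sbrec_router_port(ctx, op);         /* logical router port */
        } else if (op->peer && op->peer->nbrp) {
            update_sbrec_switch_port_router(ctx, op);  /* switch port facing a router */
        } else {
            update_sbrec_switch_port_vif(ctx, op);     /* plain VIF switch port */
        }
    }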

Acked-by: Dumitru Ceara 

> ---
>  ovn/northd/ovn-northd.c | 44 
>  tests/ovn.at| 89 +++--
>  2 files changed, 105 insertions(+), 28 deletions(-)
>
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index e0af234f8..e8cbc3534 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -1983,9 +1983,23 @@ get_nat_addresses(const struct ovn_port *op, size_t *n)
>  }
>  } else {
>  /* Centralized NAT rule, either on gateway router or distributed
> - * router. */
> -ds_put_format(&c_addresses, " %s", nat->external_ip);
> -central_ip_address = true;
> + * router.
> + * Check if external_ip is same as router ip. If so, then there
> + * is no need to add this to the nat_addresses. The router IPs
> + * will be added separately. */
> +bool is_router_ip = false;
> +for (size_t j = 0; j < op->lrp_networks.n_ipv4_addrs; j++) {
> +if (!strcmp(nat->external_ip,
> +op->lrp_networks.ipv4_addrs[j].addr_s)) {
> +is_router_ip = true;
> +break;
> +}
> +}
> +
> +if (!is_router_ip) {
> +ds_put_format(&c_addresses, " %s", nat->external_ip);
> +central_ip_address = true;
> +}
>  }
>  }
>
> @@ -2531,13 +2545,26 @@ ovn_port_update_sbrec(struct northd_context *ctx,
>   * -  op->peer has 'reside-on-gateway-chassis' set and the
>   *the logical router datapath has distributed router port.
>   *
> + * -  op->peer is distributed gateway router port.
> + *
> + * -  op->peer's router is a gateway router and op has a localnet
> + *port.
> + *
>   * Note: Port_Binding.nat_addresses column is also used for
>   * sending the GARPs for the router port IPs.
>   * */
> +bool add_router_port_garp = false;
>  if (op->peer && op->peer->nbrp && op->peer->od->l3dgw_port &&
>  op->peer->od->l3redirect_port &&
> -smap_get_bool(&op->peer->nbrp->options,
> -  "reside-on-redirect-chassis", false)) {
> +(smap_get_bool(&op->peer->nbrp->options,
> +  "reside-on-redirect-chassis", false) ||
> +op->peer == op->peer->od->l3dgw_port)) {
> +add_router_port_garp = true;
> +} else if (chassis && op->od->localnet_port) {
> +add_router_port_garp = true;
> +}
> +
> +if (add_router_port_garp) {
>  struct ds garp_info = DS_EMPTY_INITIALIZER;
>  ds_put_format(&garp_info, "%s", op->peer->lrp_networks.ea_s);
>  for (size_t i = 0; i < op->peer->lrp_networks.n_ipv4_addrs;
> @@ -2545,8 +2572,11 @@ ovn_port_update_sbrec(struct northd_context *ctx,
>  ds_put_format(&garp_info, " %s",
>
> op->peer->lrp_networks.ipv4_addrs[i].addr_s);
>  }
> -ds_put_format(&garp_info, " is_chassis_resident(%s)",
> -  op->peer->od->l3redirect_port->json_key);
> +
> +if (op->peer->od->l3redirect_port) {
> +ds_put_format(&garp_info, " is_chassis_resident(%s)",
> +  op->peer->od->l3redirect_port->json_key);
> +}
>
>  sbrec_port_binding_update_nat_addresses_addvalue(
>  op->sb, ds_cstr(&garp_info));
> diff --git a/tests/ovn.at b/tests/ovn.at
> index ea627e128..2e266d94a 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -6730,6 +6730,9 @@ AT_CHECK([ovn-nbctl lsp-set-addresses ln_port unknown])
>  AT_CHECK([ovn-nbctl lsp-set-type ln_port localnet])
>  AT_CHECK([ovn-nbctl lsp-set-options ln_port network_name=physnet1])
>
> +# Wait until the patch ports are created in hv1 to connect br-int to br-eth0
> +OVS_WAIT_UNTIL([test 1 = `as hv1 ovs-vsctl show | \
> +grep "Por

Re: [ovs-dev] [PATCH v4 0/3] vhost tx retry updates

2019-07-02 Thread Eelco Chaudron



On 2 Jul 2019, at 2:32, Kevin Traynor wrote:

> v4:
> - 1/2 New patch: Move vhost tx retries doc to a seperate section (David)
>
> - 2/3
> -- Changed tx_retries to be a custom stat for vhost (Ilya)
> -- Added back in MIN() that was dropped in v2, as in retesting I
>saw it is needed when the retry limit is reached to prevent
>an accounting error
> -- note: this is not a typo, it was just out of date
>   -/* 44 pad bytes here. */
>   +/* 4 pad bytes here. */
> -- Removed acks due to the amount of changes made
>
>
> - 3/3
> -- Added link between doc sections on vhost tx retries and
>and config (Ian)
> -- Fixed spacing, fixed typo and added default in docs (Ilya)
> -- Some minor renaming due to 2/3 changes
> -- Changed config option name from "vhost-tx-retries" to "tx-retries-max"
>in order to have a generic user facing name in case it is ever extended
>to other interfaces
> -- Other minor edits
> -- Kept acks as the operation didn't really change

Changes look good to me, so ack for the series

Acked-by: Eelco Chaudron 


> v3:
> - 2/4 really updated text to NETDEV_MAX_BURST this time
> - 4/4 comments from Flavio. Made max retries limit 32.
>
> v2:
> - 1/4 - split out minor fix from docs patch
> - 2/4 - updated text, NETDEV_MAX_BURST, QEMU/libvirt versions
> - 3/4 - removed unneeded MIN()
> - 4/4 - explicitly range check, removed unneeded tx_counters fn changes
>
> Kevin Traynor (3):
>   doc: Move vhost tx retry info to separate section.
>   netdev-dpdk: Add custom stat for vhost tx retries.
>   netdev-dpdk: Enable vhost-tx-retries config.
>
>  Documentation/topics/dpdk/vhost-user.rst | 105 +++
>  lib/netdev-dpdk.c|  75 +++-
>  vswitchd/vswitch.xml |  12 +++
>  3 files changed, 152 insertions(+), 40 deletions(-)
>
> -- 
> 2.20.1
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 4/4] OVN: Add IGMP tests

2019-07-02 Thread Dumitru Ceara
Signed-off-by: Dumitru Ceara 
---
 tests/ovn.at |  270 ++
 1 file changed, 270 insertions(+)

diff --git a/tests/ovn.at b/tests/ovn.at
index daf85a5..aa8c46f 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -14017,3 +14017,273 @@ ovn-hv4-0
 
 OVN_CLEANUP([hv1], [hv2], [hv3])
 AT_CLEANUP
+
+AT_SETUP([ovn -- IGMP snoop/querier])
+AT_SKIP_IF([test $HAVE_PYTHON = no])
+ovn_start
+
+# Logical network:
+# Two independent logical switches (sw1 and sw2).
+# sw1:
+#   - subnet 10.0.0.0/8
+#   - 2 ports bound on hv1 (sw1-p11, sw1-p12)
+#   - 2 ports bound on hv2 (sw1-p21, sw1-p22)
+# sw2:
+#   - subnet 20.0.0.0/8
+#   - 1 port bound on hv1 (sw2-p1)
+#   - 1 port bound on hv2 (sw2-p2)
+#   - IGMP Querier from 20.0.0.254
+
+reset_pcap_file() {
+local iface=$1
+local pcap_file=$2
+ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
+options:rxq_pcap=dummy-rx.pcap
+rm -f ${pcap_file}*.pcap
+ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
+options:rxq_pcap=${pcap_file}-rx.pcap
+}
+
+ip_to_hex() {
+ printf "%02x%02x%02x%02x" "$@"
+}
+
+#
+# send_igmp_v3_report INPORT HV ETH_SRC IP_SRC IP_CSUM GROUP REC_TYPE
+# IGMP_CSUM OUTFILE
+#
+# This shell function causes an IGMPv3 report to be received on INPORT of HV.
+# The packet's content has Ethernet destination 01:00:5E:00:00:22 and source
+# ETH_SRC (exactly 12 hex digits). Ethernet type is set to IP.
+# GROUP is the IP multicast group to be joined/to leave (based on REC_TYPE).
+# REC_TYPE == 04: join GROUP
+# REC_TYPE == 03: leave GROUP
+# The packet hexdump is also stored in OUTFILE.
+#
+send_igmp_v3_report() {
+local inport=$1 hv=$2 eth_src=$3 ip_src=$4 ip_chksum=$5 group=$6
+local rec_type=$7 igmp_chksum=$8 outfile=$9
+
+local eth_dst=01005e16
+local ip_dst=$(ip_to_hex 224 0 0 22)
+local ip_ttl=01
+local ip_ra_opt=9404
+
+local igmp_type=2200
+local num_rec=0001
+local aux_dlen=00
+local num_src=
+
+local eth=${eth_dst}${eth_src}0800
+local 
ip=46c000284000${ip_ttl}02${ip_chksum}${ip_src}${ip_dst}${ip_ra_opt}
+local 
igmp=${igmp_type}${igmp_chksum}${num_rec}${rec_type}${aux_dlen}${num_src}${group}
+local packet=${eth}${ip}${igmp}
+
+echo ${packet} >> ${outfile}
+as $hv ovs-appctl netdev-dummy/receive ${inport} ${packet}
+}
+
+#
+# store_igmp_v3_query ETH_SRC IP_SRC IP_CSUM OUTFILE
+#
+# This shell function builds an IGMPv3 general query from ETH_SRC and IP_SRC
+# and stores the hexdump of the packet in OUTFILE.
+#
+store_igmp_v3_query() {
+local eth_src=$1 ip_src=$2 ip_chksum=$3 outfile=$4
+
+local eth_dst=01005e01
+local ip_dst=$(ip_to_hex 224 0 0 1)
+local ip_ttl=01
+local igmp_type=11
+local max_resp=0a
+local igmp_chksum=eeeb
+local addr=
+
+local eth=${eth_dst}${eth_src}0800
+local ip=45204000${ip_ttl}02${ip_chksum}${ip_src}${ip_dst}
+local igmp=${igmp_type}${max_resp}${igmp_chksum}${addr}000a
+local packet=${eth}${ip}${igmp}
+
+echo ${packet} >> ${outfile}
+}
+
+#
+# send_ip_multicast_pkt INPORT HV ETH_SRC ETH_DST IP_SRC IP_DST IP_LEN
+#IP_PROTO DATA OUTFILE
+#
+# This shell function causes an IP multicast packet to be received on INPORT
+# of HV.
+# The hexdump of the packet is stored in OUTFILE.
+#
+send_ip_multicast_pkt() {
+local inport=$1 hv=$2 eth_src=$3 eth_dst=$4 ip_src=$5 ip_dst=$6
+local ip_len=$7 ip_chksum=$8 proto=$9 data=${10} outfile=${11}
+
+local ip_ttl=20
+
+local eth=${eth_dst}${eth_src}0800
+local ip=45${ip_len}95f14000${ip_ttl}${proto}${ip_chksum}${ip_src}${ip_dst}
+local packet=${eth}${ip}${data}
+
+as $hv ovs-appctl netdev-dummy/receive ${inport} ${packet}
+echo ${packet} >> ${outfile}
+}
+
+ovn-nbctl ls-add sw1
+ovn-nbctl ls-add sw2
+
+ovn-nbctl lsp-add sw1 sw1-p11
+ovn-nbctl lsp-add sw1 sw1-p12
+ovn-nbctl lsp-add sw1 sw1-p21
+ovn-nbctl lsp-add sw1 sw1-p22
+ovn-nbctl lsp-add sw2 sw2-p1
+ovn-nbctl lsp-add sw2 sw2-p2
+
+net_add n1
+sim_add hv1
+as hv1
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+ovs-vsctl -- add-port br-int hv1-vif1 -- \
+set interface hv1-vif1 external-ids:iface-id=sw1-p11 \
+options:tx_pcap=hv1/vif1-tx.pcap \
+options:rxq_pcap=hv1/vif1-rx.pcap \
+ofport-request=1
+ovs-vsctl -- add-port br-int hv1-vif2 -- \
+set interface hv1-vif2 external-ids:iface-id=sw1-p12 \
+options:tx_pcap=hv1/vif2-tx.pcap \
+options:rxq_pcap=hv1/vif2-rx.pcap \
+ofport-request=1
+ovs-vsctl -- add-port br-int hv1-vif3 -- \
+set interface hv1-vif3 external-ids:iface-id=sw2-p1 \
+options:tx_pcap=hv1/vif3-tx.pcap \
+options:rxq_pcap=hv1/vif3-rx.pcap \
+ofport-request=1
+
+sim_add hv2
+as hv2
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.2
+ovs-vsctl -- add-port br-int hv2-vif1 -- \
+set interface hv2-vif1 external-ids:iface-id=sw1-p21 \
+opti

[ovs-dev] [PATCH 3/4] OVN: Add ovn-northd IGMP support

2019-07-02 Thread Dumitru Ceara
New IP Multicast Snooping Options are added to the Northbound DB
Logical_Switch:other_config column. These allow enabling IGMP snooping and
querier on the logical switch and get translated by ovn-northd to rows in
the IP_Multicast Southbound DB table.

ovn-northd monitors for changes done by ovn-controllers in the Southbound DB
IGMP_Group table. Based on the entries in IGMP_Group ovn-northd creates
Multicast_Group entries in the Southbound DB, one per IGMP_Group address X,
containing the list of logical switch ports (aggregated from all controllers)
that have IGMP_Group entries for that datapath and address X. ovn-northd
also creates a logical flow that matches on IP multicast traffic destined
to address X and outputs it on the tunnel key of the corresponding
Multicast_Group entry.

Signed-off-by: Dumitru Ceara 
---
 ovn/northd/ovn-northd.c |  446 ---
 ovn/ovn-nb.xml  |   54 ++
 2 files changed, 471 insertions(+), 29 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 0b0a96a..3f24fbe 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -29,6 +29,7 @@
 #include "openvswitch/json.h"
 #include "ovn/lex.h"
 #include "ovn/lib/chassis-index.h"
+#include "ovn/lib/ip-mcast-index.h"
 #include "ovn/lib/ovn-l7.h"
 #include "ovn/lib/ovn-nb-idl.h"
 #include "ovn/lib/ovn-sb-idl.h"
@@ -57,6 +58,7 @@ struct northd_context {
 struct ovsdb_idl_txn *ovnnb_txn;
 struct ovsdb_idl_txn *ovnsb_txn;
 struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name;
+struct ovsdb_idl_index *sbrec_ip_mcast_by_dp;
 };
 
 static const char *ovnnb_db;
@@ -424,6 +426,32 @@ struct ipam_info {
 bool mac_only;
 };
 
+struct mcast_info {
+bool enabled;
+bool querier;
+bool flood_unregistered;
+
+uint32_t table_size;
+uint32_t idle_timeout;
+uint32_t query_interval;
+char *eth_src;
+char *ipv4_src;
+uint8_t  query_max_response;
+
+uint32_t active_flows;
+};
+
+/*
+ * Multicast snooping idle timeout and query interval definitions.
+ */
+#define OVN_MCAST_MIN_IDLE_TIMEOUT_S   15
+#define OVN_MCAST_MAX_IDLE_TIMEOUT_S   3600
+#define OVN_MCAST_DEFAULT_IDLE_TIMEOUT_S   300
+#define OVN_MCAST_MIN_QUERY_INTERVAL_S 1
+#define OVN_MCAST_MAX_QUERY_INTERVAL_S OVN_MCAST_MAX_IDLE_TIMEOUT_S
+#define OVN_MCAST_DEFAULT_QUERY_MAX_RESPONSE_S 1
+#define OVN_MCAST_DEFAULT_MAX_ENTRIES  2048
+
 /* The 'key' comes from nbs->header_.uuid or nbr->header_.uuid or
  * sb->external_ids:logical-switch. */
 struct ovn_datapath {
@@ -448,6 +476,9 @@ struct ovn_datapath {
 /* IPAM data. */
 struct ipam_info ipam_info;
 
+/* Multicast data. */
+struct mcast_info mcast_info;
+
 /* OVN northd only needs to know about the logical router gateway port for
  * NAT on a distributed router.  This "distributed gateway port" is
  * populated only when there is a "redirect-chassis" specified for one of
@@ -522,6 +553,8 @@ ovn_datapath_destroy(struct hmap *datapaths, struct 
ovn_datapath *od)
 hmap_remove(datapaths, &od->key_node);
 destroy_tnlids(&od->port_tnlids);
 bitmap_free(od->ipam_info.allocated_ipv4s);
+free(od->mcast_info.eth_src);
+free(od->mcast_info.ipv4_src);
 free(od->router_ports);
 ovn_ls_port_group_destroy(&od->nb_pgs);
 free(od);
@@ -659,6 +692,80 @@ init_ipam_info_for_datapath(struct ovn_datapath *od)
 }
 
 static void
+init_mcast_info_for_datapath(struct ovn_datapath *od)
+{
+if (!od->nbs) {
+return;
+}
+
+struct mcast_info *mcast_info = &od->mcast_info;
+
+mcast_info->enabled =
+smap_get_bool(&od->nbs->other_config, "mcast_snoop", false);
+mcast_info->querier =
+smap_get_bool(&od->nbs->other_config, "mcast_querier", true);
+mcast_info->flood_unregistered =
+smap_get_bool(&od->nbs->other_config, "mcast_flood_unregistered",
+  false);
+
+mcast_info->table_size =
+smap_get_ullong(&od->nbs->other_config, "mcast_table_size",
+OVN_MCAST_DEFAULT_MAX_ENTRIES);
+
+uint32_t idle_timeout =
+smap_get_ullong(&od->nbs->other_config, "mcast_idle_timeout",
+OVN_MCAST_DEFAULT_IDLE_TIMEOUT_S);
+if (idle_timeout < OVN_MCAST_MIN_IDLE_TIMEOUT_S) {
+idle_timeout = OVN_MCAST_MIN_IDLE_TIMEOUT_S;
+} else if (idle_timeout > OVN_MCAST_MAX_IDLE_TIMEOUT_S) {
+idle_timeout = OVN_MCAST_MAX_IDLE_TIMEOUT_S;
+}
+mcast_info->idle_timeout = idle_timeout;
+
+uint32_t query_interval =
+smap_get_ullong(&od->nbs->other_config, "mcast_query_interval",
+mcast_info->idle_timeout / 2);
+if (query_interval < OVN_MCAST_MIN_QUERY_INTERVAL_S) {
+query_interval = OVN_MCAST_MIN_QUERY_INTERVAL_S;
+} else if (query_interval > OVN_MCAST_MAX_QUERY_INTERVAL_S) {
+query_interval = OVN_MCAST_MAX_QUERY_

[ovs-dev] [PATCH 2/4] OVN: Add IGMP SB definitions and ovn-controller support

2019-07-02 Thread Dumitru Ceara
A new IP_Multicast table is added to Southbound DB. This table stores the
multicast related configuration for each datapath. Each row will be
populated by ovn-northd and will control:
- if IGMP Snooping is enabled or not, the snooping table size and multicast
  group idle timeout.
- if IGMP Querier is enabled or not (only if snooping is enabled too), query
  interval, query source addresses (Ethernet and IP) and the max-response
  field to be stored in outgoing queries.
- an additional "seq_no" column is added such that ovn-sbctl or if needed a
  CMS can flush currently learned groups. This can be achieved by incrementing
  the "seq_no" value.

A new IGMP_Group table is added to Southbound DB. This table stores all the
multicast groups learned by ovn-controllers. The table is indexed by
datapath, group address and chassis. For a learned multicast group on a
specific datapath each ovn-controller will store its own row in this table.
Each row contains the list of chassis-local ports on which the group was
learned. Rows in the IGMP_Group table are updated or deleted only by the
ovn-controllers that created them.

A new action ("igmp") is added to punt IGMP packets on a specific logical
switch datapath to ovn-controller if IGMP snooping is enabled.

Per datapath IGMP multicast snooping support is added to pinctrl:
- incoming IGMP reports are processed and multicast groups are maintained
  (using the OVS mcast-snooping library).
- each OVN controller syncs its in-memory IGMP groups to the Southbound DB
  in the IGMP_Group table.
- pinctrl also sends periodic IGMPv3 general queries for all datapaths where
  querier is enabled.

Signed-off-by: Mark Michelson 
Co-authored-by: Mark Michelson 
Signed-off-by: Dumitru Ceara 
---
 include/ovn/actions.h   |6 
 ovn/controller/automake.mk  |2 
 ovn/controller/ip-mcast.c   |  164 +
 ovn/controller/ip-mcast.h   |   52 +++
 ovn/controller/ovn-controller.c |   23 +
 ovn/controller/pinctrl.c|  742 +++
 ovn/controller/pinctrl.h|2 
 ovn/lib/actions.c   |   22 +
 ovn/lib/automake.mk |2 
 ovn/lib/ip-mcast-index.c|   40 ++
 ovn/lib/ip-mcast-index.h|   30 ++
 ovn/lib/logical-fields.c|2 
 ovn/ovn-sb.ovsschema|   39 ++
 ovn/ovn-sb.xml  |   73 
 ovn/utilities/ovn-sbctl.c   |   53 +++
 ovn/utilities/ovn-trace.c   |   14 +
 16 files changed, 1261 insertions(+), 5 deletions(-)
 create mode 100644 ovn/controller/ip-mcast.c
 create mode 100644 ovn/controller/ip-mcast.h
 create mode 100644 ovn/lib/ip-mcast-index.c
 create mode 100644 ovn/lib/ip-mcast-index.h

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index f42bbc2..1cc022b 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -67,6 +67,7 @@ struct ovn_extend_table;
 OVNACT(ICMP4, ovnact_nest)\
 OVNACT(ICMP4_ERROR,   ovnact_nest)\
 OVNACT(ICMP6, ovnact_nest)\
+OVNACT(IGMP,  ovnact_nest)\
 OVNACT(TCP_RESET, ovnact_nest)\
 OVNACT(ND_NA, ovnact_nest)\
 OVNACT(ND_NA_ROUTER,  ovnact_nest)\
@@ -486,6 +487,11 @@ enum action_opcode {
  * The actions, in OpenFlow 1.3 format, follow the action_header.
  */
 ACTION_OPCODE_ICMP4_ERROR,
+/* "igmp()".
+ *
+ * Snoop IGMP, learn the multicast participants
+ */
+ACTION_OPCODE_IGMP,
 };
 
 /* Header. */
diff --git a/ovn/controller/automake.mk b/ovn/controller/automake.mk
index fcdf7a4..193ea69 100644
--- a/ovn/controller/automake.mk
+++ b/ovn/controller/automake.mk
@@ -10,6 +10,8 @@ ovn_controller_ovn_controller_SOURCES = \
ovn/controller/encaps.h \
ovn/controller/ha-chassis.c \
ovn/controller/ha-chassis.h \
+   ovn/controller/ip-mcast.c \
+   ovn/controller/ip-mcast.h \
ovn/controller/lflow.c \
ovn/controller/lflow.h \
ovn/controller/lport.c \
diff --git a/ovn/controller/ip-mcast.c b/ovn/controller/ip-mcast.c
new file mode 100644
index 000..ef36be2
--- /dev/null
+++ b/ovn/controller/ip-mcast.c
@@ -0,0 +1,164 @@
+/* Copyright (c) 2019, Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+
+#include "ip-mcast.h"
+#include "lport.h"
+#include "ovn/lib/ovn-sb-idl.h"
+
+/*
+ * Used for (faster) upd

[ovs-dev] [PATCH 0/4] OVN: Add IGMP support

2019-07-02 Thread Dumitru Ceara
This series introduces support for IGMP Snooping and IGMP Querier. IGMP
versions v1-v3 are supported for snooping and IGMP queries originated by
ovn-controller are general IGMPv3 queries. The rationale behind using v3 for
querier is that it's backward compatible with v1-v2.

The majority of the code is IP version independent with the thought in mind
that support for MLD snooping for IPv6 will be added next.

Dumitru Ceara (4):
  packets: Add IGMPv3 query packet definitions
  OVN: Add IGMP SB definitions and ovn-controller support
  OVN: Add ovn-northd IGMP support
  OVN: Add IGMP tests


 include/ovn/actions.h   |6 
 lib/packets.c   |   44 ++
 lib/packets.h   |   19 +
 ovn/controller/automake.mk  |2 
 ovn/controller/ip-mcast.c   |  164 +
 ovn/controller/ip-mcast.h   |   52 +++
 ovn/controller/ovn-controller.c |   23 +
 ovn/controller/pinctrl.c|  742 +++
 ovn/controller/pinctrl.h|2 
 ovn/lib/actions.c   |   22 +
 ovn/lib/automake.mk |2 
 ovn/lib/ip-mcast-index.c|   40 ++
 ovn/lib/ip-mcast-index.h|   30 ++
 ovn/lib/logical-fields.c|2 
 ovn/northd/ovn-northd.c |  446 ++-
 ovn/ovn-nb.xml  |   54 +++
 ovn/ovn-sb.ovsschema|   39 ++
 ovn/ovn-sb.xml  |   73 
 ovn/utilities/ovn-sbctl.c   |   53 +++
 ovn/utilities/ovn-trace.c   |   14 +
 tests/ovn.at|  270 ++
 21 files changed, 2064 insertions(+), 35 deletions(-)
 create mode 100644 ovn/controller/ip-mcast.c
 create mode 100644 ovn/controller/ip-mcast.h
 create mode 100644 ovn/lib/ip-mcast-index.c
 create mode 100644 ovn/lib/ip-mcast-index.h


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/4] packets: Add IGMPv3 query packet definitions

2019-07-02 Thread Dumitru Ceara
Signed-off-by: Dumitru Ceara 
---
 lib/packets.c |   44 
 lib/packets.h |   19 ++-
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/lib/packets.c b/lib/packets.c
index a8fd61f..ab0b1a3 100644
--- a/lib/packets.c
+++ b/lib/packets.c
@@ -1281,6 +1281,50 @@ packet_set_icmp(struct dp_packet *packet, uint8_t type, 
uint8_t code)
 }
 }
 
+/* Sets the IGMP type to IGMP_HOST_MEMBERSHIP_QUERY and populates the
+ * v3 query header fields in 'packet'. 'packet' must be a valid IGMPv3
+ * query packet with its l4 offset properly populated.
+ */
+void
+packet_set_igmp3_query(struct dp_packet *packet, uint8_t max_resp,
+   ovs_be32 group, bool srs, uint8_t qrv, uint8_t qqic)
+{
+struct igmpv3_query_header *igh = dp_packet_l4(packet);
+ovs_be16 orig_type_max_resp =
+htons(igh->type << 8 | igh->max_resp);
+ovs_be16 new_type_max_resp =
+htons(IGMP_HOST_MEMBERSHIP_QUERY << 8 | max_resp);
+
+if (orig_type_max_resp != new_type_max_resp) {
+igh->type = IGMP_HOST_MEMBERSHIP_QUERY;
+igh->max_resp = max_resp;
+igh->csum = recalc_csum16(igh->csum, orig_type_max_resp,
+  new_type_max_resp);
+}
+
+ovs_be32 old_group = get_16aligned_be32(&igh->group);
+
+if (old_group != group) {
+put_16aligned_be32(&igh->group, group);
+igh->csum = recalc_csum32(igh->csum, old_group, group);
+}
+
+/* See RFC 3376 4.1.6. */
+if (qrv > 7) {
+qrv = 0;
+}
+
+ovs_be16 orig_srs_qrv_qqic = htons(igh->srs_qrv << 8 | igh->qqic);
+ovs_be16 new_srs_qrv_qqic = htons(srs << 11 | qrv << 8 | qqic);
+
+if (orig_srs_qrv_qqic != new_srs_qrv_qqic) {
+igh->srs_qrv = (srs << 3 | qrv);
+igh->qqic = qqic;
+igh->csum = recalc_csum16(igh->csum, orig_srs_qrv_qqic,
+  new_srs_qrv_qqic);
+}
+}
+
 void
 packet_set_nd_ext(struct dp_packet *packet, const ovs_16aligned_be32 rso_flags,
   const uint8_t opt_type)
diff --git a/lib/packets.h b/lib/packets.h
index d293b35..4124490 100644
--- a/lib/packets.h
+++ b/lib/packets.h
@@ -681,6 +681,7 @@ char *ip_parse_cidr_len(const char *s, int *n, ovs_be32 *ip,
 #define IP_ECN_ECT_0 0x02
 #define IP_ECN_CE 0x03
 #define IP_ECN_MASK 0x03
+#define IP_DSCP_CS6 0xc0
 #define IP_DSCP_MASK 0xfc
 
 static inline int
@@ -763,6 +764,20 @@ struct igmpv3_header {
 };
 BUILD_ASSERT_DECL(IGMPV3_HEADER_LEN == sizeof(struct igmpv3_header));
 
+#define IGMPV3_QUERY_HEADER_LEN 12
+struct igmpv3_query_header {
+uint8_t type;
+uint8_t max_resp;
+ovs_be16 csum;
+ovs_16aligned_be32 group;
+uint8_t srs_qrv;
+uint8_t qqic;
+ovs_be16 nsrcs;
+};
+BUILD_ASSERT_DECL(
+IGMPV3_QUERY_HEADER_LEN == sizeof(struct igmpv3_query_header
+));
+
 #define IGMPV3_RECORD_LEN 8
 struct igmpv3_record {
 uint8_t type;
@@ -1543,7 +1558,9 @@ void packet_set_nd(struct dp_packet *, const struct 
in6_addr *target,
 void packet_set_nd_ext(struct dp_packet *packet,
const ovs_16aligned_be32 rso_flags,
const uint8_t opt_type);
-
+void packet_set_igmp3_query(struct dp_packet *, uint8_t max_resp,
+ovs_be32 group, bool srs, uint8_t qrv,
+uint8_t qqic);
 void packet_format_tcp_flags(struct ds *, uint16_t);
 const char *packet_tcp_flag_to_string(uint32_t flag);
 void compose_arp__(struct dp_packet *);
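
A hedged usage sketch for the new helper: the packet is assumed to already
carry an IGMPv3 query skeleton with its L4 offset set, as the function's
comment requires, and the parameter values are only examples:

    /* Illustrative caller: turn 'pkt' into an IGMPv3 general query. */
    static void
    compose_general_query_sketch(struct dp_packet *pkt)
    {
        packet_set_igmp3_query(pkt,
                               100,       /* max_resp: 10 s, in 1/10 s units */
                               htonl(0),  /* group 0.0.0.0 => general query */
                               false,     /* srs: no suppress router-side flag */
                               2,         /* qrv: querier robustness variable */
                               125);      /* qqic: querier's query interval code */
    }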

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev