date:20180710

[ovs-dev] LEGITIMATE PROPOSAL!!!

2018-07-10 Thread Mass Anderson via dev

Compliments of the day to you and I hope this mail will meet you in good 
health. With Due Respect, I want to introduce myself. I am Mass Anderson from 
Belgium, Working for South coast Inc. South Coast Inc is an investment 
sub-company under Capital One Bank U.K. South Coast Inc handles all aspects of 
investment of customer funds on behalf of Capital One Bank U.K. 

South Coast Inc, on behalf of Capital One Bank U.K handles all Investors 
Treasury Bill Deposit which gives us access to trade with our Investors Funds 
on Private Arrangement. In addition, we make an average turn-over of GBPS 
£1.2-3 Billion Great British Pounds annually through Private Trading with our 
Investors fund and Treasury Bill Deposit while the said funds are currently 
lying as a Floating Capital in our Treasury Bill Magellan Trust Account. 

Based on recent development, I have been mandated by one of our numerous 
investors to find somne of your caliber who can help receive and manage his 
funds in our custody. Details of the said investment funds will be revealed to 
you upon receiving a confirmatory email from you as to your readiness to assume 
the task of investing the said funds in a Private arrangement that will be 
beneficial to all parties involved. 

Can you be able to secure and manage the investment fund in your country? 
contact me by email on (mass.ander...@yahoo.com ) for more clarifications and 
how we can proceed 

Thank you for the anticipated cooperation. 

Warmest Regards. 
Mass Anderson
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [ovs-dev, v4] dpif-netdev: Avoid reordering of packets in a batch with same megaflow

2018-07-10 Thread Vishal Deep Ajmera

> >>
> >> This is potentially dangerous from the future modifications and hard to 
> >> read
> for
> >> reviewer/person who tries to understand how it works.
> >>
> >> Current implementation will fail if someone will change the logic of
> >> 'DP_PACKET_BATCH_REFILL_FOR_EACH', for example.
> >>
> >> The key point is that 'DP_PACKET_BATCH_REFILL_FOR_EACH' and
> >> 'dp_packet_batch_refill' manipulates with the batch size in a hidden from 
> >> the
> >> end user manner. Using of the 'packets_->count' directly requires knowledge
> of
> >> how refilling works internally, but these functions was intended to hide
> internals
> >> of batch manipulations.
> >>
> >> Also, as I understand, you're storing 'map_cnt' for each missed packet.
> >> So, why not just use 'n_missed' for that purpose?
> >>
> >
> > map_cnt keeps count of not just the emc miss packet but also all subsequent
> > packets (with emc hit) after one emc miss.
> 
> Yes. But we're talking about replacing 'packets_->count' with 'n_missed'.
> Not about 'map_cnt'.

Thanks Ilya. I have fixed this in v5 patch-set.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v5] dpif-netdev: Avoid reordering of packets in a batch with same megaflow

2018-07-10 Thread Vishal Deep Ajmera

OVS reads packets in batches from a given port and packets in the
batch are subjected to potentially 3 levels of lookups to identify
the datapath megaflow entry (or flow) associated with the packet.
Each megaflow entry has a dedicated buffer in which packets that match
the flow classification criteria are collected. This buffer helps OVS
perform batch processing for all packets associated with a given flow.

Each packet in the received batch is first subjected to lookup in the
Exact Match Cache (EMC). Each EMC entry will point to a flow. If the
EMC lookup is successful, the packet is moved from the rx batch to the
per-flow buffer.

Packets that did not match any EMC entry are rearranged in the rx batch
at the beginning and are now subjected to a lookup in the megaflow cache.
Packets that match a megaflow cache entry are *appended* to the per-flow
buffer.

Packets that do not match any megaflow entry are subjected to slow-path
processing through the upcall mechanism. This cannot change the order of
packets as by definition upcall processing is only done for packets
without matching megaflow entry.

The EMC entry match fields encompass all potentially significant header
fields, typically more than specified in the associated flow's match
criteria. Hence, multiple EMC entries can point to the same flow. Given
that per-flow batching happens at each lookup stage, packets belonging
to the same megaflow can get re-ordered because some packets match EMC
entries while others do not.

The following example can illustrate the issue better. Consider
following batch of packets (labelled P1 to P8) associated with a single
TCP connection and associated with a single flow. Let us assume that
packets with just the ACK bit set in TCP flags have been received in a
prior batch also and a corresponding EMC entry exists.

1. P1 (TCP Flag: ACK)
2. P2 (TCP Flag: ACK)
3. P3 (TCP Flag: ACK)
4. P4 (TCP Flag: ACK, PSH)
5. P5 (TCP Flag: ACK)
6. P6 (TCP Flag: ACK)
7. P7 (TCP Flag: ACK)
8. P8 (TCP Flag: ACK)

The megaflow classification criteria does not include TCP flags while
the EMC match criteria does. Thus, all packets other than P4 match
the existing EMC entry and are moved to the per-flow packet batch.
Subsequently, packet P4 is moved to the same per-flow packet batch as
a result of the megaflow lookup. Though the packets have all been
correctly classified as being associated with the same flow, the
packet order has not been preserved because of the per-flow batching
performed during the EMC lookup stage. This packet re-ordering has
performance implications for TCP applications.

This patch preserves the packet ordering by performing the per-flow
batching after both the EMC and megaflow lookups are complete. As an
optimization, packets are flow-batched in emc processing till any
packet in the batch has an EMC miss.

A new flow map is maintained to keep the original order of packet
along with flow information. Post fastpath processing, packets from
flow map are *appended* to per-flow buffer.

Signed-off-by: Vishal Deep Ajmera 
Co-authored-by: Venkatesan Pradeep 
Signed-off-by: Venkatesan Pradeep 
---
 lib/dpif-netdev.c | 104 +-
 1 file changed, 88 insertions(+), 16 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 8b3556d..52ababb 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -208,6 +208,13 @@ struct dpcls_rule {
 /* 'flow' must be the last field, additional space is allocated here. */
 };
 
+/* data structure to keep packet order till fastpath processing */
+struct dp_packet_flow_map {
+struct dp_packet *packet;
+struct dp_netdev_flow *flow;
+uint16_t tcp_flags;
+};
+
 static void dpcls_init(struct dpcls *);
 static void dpcls_destroy(struct dpcls *);
 static void dpcls_sort_subtable_vector(struct dpcls *);
@@ -5602,6 +5609,19 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
 packet_batch_per_flow_update(batch, pkt, tcp_flags);
 }
 
+static inline void
+packet_enqueue_to_flow_map(struct dp_packet_flow_map *flow_map,
+   struct dp_netdev_flow *flow,
+   struct dp_packet *packet,
+   uint16_t tcp_flags,
+   size_t *map_cnt)
+{
+struct dp_packet_flow_map *map = _map[(*map_cnt)++];
+map->flow = flow;
+map->packet = packet;
+map->tcp_flags = tcp_flags;
+}
+
 /* Try to process all ('cnt') the 'packets' using only the exact match cache
  * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
  * miniflow is copied into 'keys' and the packet pointer is moved at the
@@ -5621,6 +5641,9 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
struct dp_packet_batch *packets_,
struct netdev_flow_key *keys,
struct packet_batch_per_flow batches[], size_t *n_batches,
+   struct dp_packet_flow_map *flow_map,
+   size_t *n_flows,
+

Re: [ovs-dev] [PATCH] sparse: Make IN6_IS_ADDR_MC_LINKLOCAL and IN6_ARE_ADDR_EQUAL pickier.

2018-07-10 Thread Aaron Conole

Ben Pfaff  writes:

> On GNU systems these macros work with arbitrary pointers, but the relevant
> standards only require IN6_IS_ADDR_MC_LINKLOCAL to work with in6_addr (and
> don't specify IN6_ARE_ADDR_EQUAL at all).  Make the "sparse"
> implementations correspondingly pickier so that we catch any introduced
> problems more quickly.
>
> CC: Aaron Conole 
> Signed-off-by: Ben Pfaff 
> ---

LGTM

Acked-by: Aaron Conole 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] tests: Add gre test that doesn't requiring Linux gre module

2018-07-10 Thread Darrell Ball

Thanks for the patch Yifeng

I gave the patch a spin.

Few datapoints:

1/ The new test fails for both userspace and kernel; at different points in
the test, lines 377 and 395
respectively; I later tried increasing the sleep time to 5 seconds, but
it did not seem to help.

2/ The test runs pretty slow for an arp/ICMP test.
 It takes about 35 seconds for the test to fail at line 377 and 140
seconds to fail at line 395.


## openvswitch 2.9.90 test suite. ##
## -- ##
 12: datapath - ping over gre tunnel by simulated packets FAILED (
system-traffic.at:395)

## - ##
## Test results. ##
## - ##

ERROR: 1 test was run,
1 failed unexpectedly.
## -- ##
## system-kmod-testsuite.log was created. ##
## -- ##

Please send `tests/system-kmod-testsuite.log' and all information you think
might help:

   To: 
   Subject: [openvswitch 2.9.90] system-kmod-testsuite: 12 failed

You may investigate any problem if you feel able to do so, in which
case the test suite provides a good starting point.  Its output may
be found below `tests/system-kmod-testsuite.dir'.

make[1]: *** [check-kernel] Error 1
make[1]: Leaving directory `/home/dball/openvswitch/ovs/_gcc'
make: *** [check-kmod] Error 2
make: Leaving directory `/home/dball/openvswitch/ovs/_gcc'

real 2m17.951s
user 0m7.528s
sys 0m0.944s


 In the same environment, test 1 takes about 5 seconds to complete

## -- ##
## openvswitch 2.9.90 test suite. ##
## -- ##
  1: datapath - ping between two ports   ok

## - ##
## Test results. ##
## - ##

1 test was successful.
make: Leaving directory `/home/dball/openvswitch/ovs/_gcc'

real 0m4.937s
user 0m1.440s
sys 0m0.304s


3/ I noticed the 4 other sendpkt tests, which are simple crafted packet
injection tests, also run slow - 1.5 to 4 minutes per test
for a successful run.

 ## -- ##
## openvswitch 2.9.90 test suite. ##
## -- ##
111: nsh - forward   ok

## - ##
## Test results. ##
## - ##

1 test was successful.
make: Leaving directory `/home/dball/openvswitch/ovs/_gcc'

real 4m10.855s
user 0m1.664s
sys 0m0.304s


Darrell

On Tue, Jul 10, 2018 at 12:24 PM, Yifeng Sun  wrote:

> Currently check-kmod's gre test is broken on certain kernel
> versions where ovs's gre module conflits with kernel's gre
> module. But at the same time, present gre test depends on
> Linux gre module to setup gre port.
>
> This patch repairs the gre test by completely removing the
> dependancy of kernel's gre module and emulating a virtual
> Linux gre port that sends out arp and icmp packets.
>
> Suggested-by: William Tu 
> Signed-off-by: Yifeng Sun 
> ---
>  tests/system-common-macros.at | 13 +
>  tests/system-traffic.at   | 67 ++
> +
>  2 files changed, 80 insertions(+)
>
> diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at
> index 64bf5ec63ab4..7b841250c383 100644
> --- a/tests/system-common-macros.at
> +++ b/tests/system-common-macros.at
> @@ -329,3 +329,16 @@ m4_define([OVS_CHECK_IPROUTE_ENCAP],
>  # OVS_CHECK_CT_CLEAR()
>  m4_define([OVS_CHECK_CT_CLEAR],
>  [AT_SKIP_IF([! grep -q "Datapath supports ct_clear action"
> ovs-vswitchd.log])])
> +
> +# GEN_ICMP_DATA([start], [len])
> +#
> +# Generate space-separated ICMP data that is acceptable to
> tests/sendpkt.py.
> +m4_define([GEN_ICMP_DATA],
> +[[
> +  ICMP_DATA=""
> +  for (( n=0; n<$2; n++ )); do
> +  ICMP_DATA="${ICMP_DATA} $(printf %x $((($1+n) % 256)))"
> +  done
> +  echo "$ICMP_DATA"
> +]]
> +)
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 519b234bb16b..37421ed1e78a 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -339,6 +339,73 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3
> -w 2 10.1.1.100 | FORMAT_PI
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>
> +AT_SETUP([datapath - ping over gre tunnel by simulated packets])
> +OVS_CHECK_GRE()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +AT_CHECK([ovs-vsctl -- set bridge br0 other-config:hwaddr=\"f2:ff:
> 00:00:00:01\"])
> +ADD_BR([br-underlay], [set bridge br-underlay other-config:hwaddr=\"f2:ff:
> 00:00:00:02\"])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH(p0, at_ns0, br-underlay, "172.31.1.1/24", f2:ff:00:00:00:03)
> +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace.
> +ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1], [10.1.1.100/24])

Re: [ovs-dev] [PATCH 2/3] ipsec: add CA-cert based authentication

2018-07-10 Thread Ansis Atteka

On Wed, 27 Jun 2018 at 10:59, Qiuyu Xiao  wrote:
>
> This patch adds CA-cert based authentication to the ovs-monitor-ipsec
> daemon. With CA-cert based authentication enabled, OVS approves IPsec
> tunnel if the peer has a cert signed by a trusted CA and the identity of
> the peer cert is as expected. Belows are the major changes and the
> reasons:
>
> 1) Added CA-cert based authentication. Compared with peer-cert based
> authentication, this one doesn't need to import peer cert to the local
> host to do configuration. This is especially beneficial if host has
> mutiple peers and peers frequently update their certs. This feature is
> required for the upcoming OVN IPsec support.
>
> 2) Changed the host cert and private key configuration interface.
> Previously, the host's cert and private key can be configured in either
> Open_vSwitch's SSL column or the SSL table. Now, the host certificate
> and private key can be only configured in the Open_vSwitch table's
> other_config column. Since it is not SSL cert and key, we'd better not
> to confuse users by saying so.
>
> 3) Changed the peer cert configuration interface. Previously, the peer
> cert is configured by setting the interface's options column as the
> content of the peer cert. It's changed to setting the column as the path
> of the peer cert. This is easier to be configured by the command line
> tool, and is consistent with other cert and key configuration interface
> which is better from a usability point of view.
>

Would you mind creating a patch ovs/poc/ipsec ansible+vagrant recipe
that deploys two VMs, installs strongswan, openvswitch and then
configure IPsec between them?

Current tests use mocked strongSwan's ipsec utility.


> Signed-off-by: Qiuyu Xiao 
> ---
>  Documentation/howto/ipsec.rst |  78 ---
>  ipsec/ovs-monitor-ipsec   | 138 +-
>  2 files changed, 156 insertions(+), 60 deletions(-)
>
> diff --git a/Documentation/howto/ipsec.rst b/Documentation/howto/ipsec.rst
> index 4e4f4d211..b42312da5 100644
> --- a/Documentation/howto/ipsec.rst
> +++ b/Documentation/howto/ipsec.rst
> @@ -21,9 +21,9 @@
>
>Avoid deeper levels because they do not render well.
>
> -==
> -How to Encrypt Open vSwitch Tunnels with IPsec
> -==
> +===
> +Encrypt Open vSwitch Tunnels with IPsec
> +===
>
>  This document describes how to use Open vSwitch integration with strongSwan
>  5.1 or later to provide IPsec security for STT, GENEVE, GRE and VXLAN 
> tunnels.
> @@ -77,6 +77,67 @@ Install strongSwan and openvswitch-ipsec debian packages::
>  Configuration
>  -
>
> +The IPsec configuration is done by setting options of the tunnel interface.
> +ovs-monitor-ipsec configures strongSwan accordingly based on the tunnel 
> options.
> +
> +Authentication Methods
> +~~
> +
> +Hosts of the IPsec tunnel need to authenticate each other to build a secure
> +channel. There are three authentication methods:
> +
> +1) You can set a pre-shared key in both hosts to do authentication. This
> +   method is easier to use but less secure::
> +
> +  % ovs-vsctl add-port br0 ipsec_gre0 -- \
> +  set interface ipsec_gre0 type=gre \
> + options:remote_ip=1.2.3.4 \
> + options:psk=swordfish])
> +
> +2) You can use peer certificates to do authentication. First, generate
> +   certificate and private key in each host. The certificate could be
> +   self-signed.  Refer to the ovs-pki(8) man page for more information
> +   regarding certificate and key generation. Then, copy the peer certificate
> +   to the local host and type::
> +
> +  % ovs-vsctl set Open_vSwitch . \
> +  other_config:certificate=/path/to/local_cert.pem \
> +  other_config:private_key=/path/to/priv_key.pem
> +  % ovs-vsctl add-port br0 ipsec_gre0 -- \
> +  set interface ipsec_gre0 type=gre \
> + options:remote_ip=1.2.3.4 \
> + options:pki=peer_auth \
It seems you have introduced options:pki to allow to pick between
self-signed certificate configuration and certificate authority signed
certificate configuration. Instead, could you determine the
configuration from some fields that you uniquely set in each
configuration?

> + options:peer_cert=/path/to/peer_cert.pem
> +
> +   `local_cert.pem` is the certificate of the local host. `priv_key.pem`
> +   is the private key of the local host. `priv_key.pem` needs to be stored in
> +   a secure location. `peer_cert.pem` is the certificate of the remote host.
> +
> +3) You can also use CA certificate to do authentication. First, you need to
> +   establish your public key infrastructure.

[ovs-dev] [PATCH] tests: Add gre test that doesn't requiring Linux gre module

2018-07-10 Thread Yifeng Sun

Currently check-kmod's gre test is broken on certain kernel
versions where ovs's gre module conflits with kernel's gre
module. But at the same time, present gre test depends on
Linux gre module to setup gre port.

This patch repairs the gre test by completely removing the
dependancy of kernel's gre module and emulating a virtual
Linux gre port that sends out arp and icmp packets.

Suggested-by: William Tu 
Signed-off-by: Yifeng Sun 
---
 tests/system-common-macros.at | 13 +
 tests/system-traffic.at   | 67 +++
 2 files changed, 80 insertions(+)

diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at
index 64bf5ec63ab4..7b841250c383 100644
--- a/tests/system-common-macros.at
+++ b/tests/system-common-macros.at
@@ -329,3 +329,16 @@ m4_define([OVS_CHECK_IPROUTE_ENCAP],
 # OVS_CHECK_CT_CLEAR()
 m4_define([OVS_CHECK_CT_CLEAR],
 [AT_SKIP_IF([! grep -q "Datapath supports ct_clear action" 
ovs-vswitchd.log])])
+
+# GEN_ICMP_DATA([start], [len])
+#
+# Generate space-separated ICMP data that is acceptable to tests/sendpkt.py.
+m4_define([GEN_ICMP_DATA],
+[[
+  ICMP_DATA=""
+  for (( n=0; n<$2; n++ )); do
+  ICMP_DATA="${ICMP_DATA} $(printf %x $((($1+n) % 256)))"
+  done
+  echo "$ICMP_DATA"
+]]
+)
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 519b234bb16b..37421ed1e78a 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -339,6 +339,73 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 
10.1.1.100 | FORMAT_PI
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([datapath - ping over gre tunnel by simulated packets])
+OVS_CHECK_GRE()
+
+OVS_TRAFFIC_VSWITCHD_START()
+AT_CHECK([ovs-vsctl -- set bridge br0 
other-config:hwaddr=\"f2:ff:00:00:00:01\"])
+ADD_BR([br-underlay], [set bridge br-underlay 
other-config:hwaddr=\"f2:ff:00:00:00:02\"])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH(p0, at_ns0, br-underlay, "172.31.1.1/24", f2:ff:00:00:00:03)
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace.
+ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1], [10.1.1.100/24])
+
+ip netns exec at_ns0 tcpdump -U -i p0 -w p0.pcap &
+
+dnl First, check the underlay.
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], 
[0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now we don't actually add the port as below, instead, we
+dnl emulate it. Suppose its mac address is f2:ff:00:00:00:04.
+dnl ADD_NATIVE_TUNNEL([gretap], [ns_gre0], [at_ns0], [172.31.1.100], 
[10.1.1.1/24])
+
+dnl Okay, now send out an arp request from 10.1.1.1 for 10.1.1.100 in gre.
+NS_CHECK_EXEC([at_ns0], [$PYTHON $srcdir/sendpkt.py p0 f2 ff 00 00 00 02 f2 ff 
00 00 00 03 08 00 45 00 00 42 ec 2c 40 00 40 2f f3 bc ac 1f 01 01 ac 1f 01 64 
00 00 65 58 ff ff ff ff ff ff f2 ff 00 00 00 04 08 06 00 01 08 00 06 04 00 01 
f2 ff 00 00 00 04 0a 01 01 01 00 00 00 00 00 00 0a 01 01 64 > /dev/null])
+
+sleep 1
+AT_CHECK([tcpdump -xx -r p0.pcap 2>&1 | egrep "IP 172.31.1.100 > 172.31.1.1: 
GREv0, length 46: ARP, Reply 10.1.1.100 is-at f2:ff:00:00:00:01 .* length 28" 
2>&1 1>/dev/null])
+
+dnl Oaky, now check the overlay by sending out raw icmp packets of
+dnl different sizes in gre protocol.
+
+dnl First, send the packet that emulates
+dnl `ip netns exec at_ns0 ping 10.1.1.100`
+NS_CHECK_EXEC([at_ns0], [$PYTHON $srcdir/sendpkt.py p0 f2 ff 00 00 00 02 f2 ff 
00 00 00 03 08 00 45 00 00 7a ec 8e 40 00 40 2f f3 22 ac 1f 01 01 ac 1f 01 64 
00 00 65 58 f2 ff 00 00 00 01 f2 ff 00 00 00 04 08 00 45 00 00 54 54 8f 40 00 
40 01 cf b3 0a 01 01 01 0a 01 01 64 08 00 e6 e8 29 27 00 03 e1 a3 43 5b 00 00 
00 00 ff 1a 05 00 00 00 00 00 $(GEN_ICMP_DATA(16, 40)) > /dev/null])
+
+sleep 1
+AT_CHECK([tcpdump -xx -r p0.pcap 2>&1 | egrep "IP 172.31.1.100 > 172.31.1.1: 
GREv0, length 102: IP 10.1.1.100 > 10.1.1.1: ICMP echo reply, .* length 64$" 
2>&1 1>/dev/null])
+
+dnl Second, send the packet that emulates
+dnl `ip netns exec at_ns0 ping -s 1600 10.1.1.100`
+NS_CHECK_EXEC([at_ns0], [$PYTHON $srcdir/sendpkt.py p0 f2 ff 00 00 00 02 f2 ff 
00 00 00 03 08 00 45 00 05 ca 9a 7b 40 00 40 2f 3f e6 ac 1f 01 01 ac 1f 01 64 
00 00 65 58 f2 ff 00 00 00 01 f2 ff 00 00 00 04 08 00 45 00 05 a4 6d 4d 20 00 
40 01 d1 a5 0a 01 01 01 0a 01 01 64 08 00 11 cc 13 ed 00 01 82 c6 43 5b 00 00 
00 00 d2 e1 0c 00 00 00 00 00 $(GEN_ICMP_DATA(16, 1400)) > /dev/null])
+NS_CHECK_EXEC([at_ns0], [$PYTHON $srcdir/sendpkt.py p0 f2 ff 00 00 00 02 f2 ff 
00 00 00 03 08 00 45 00 00 f2 9a 7c 40 00 40 2f 44 bd ac 1f 01 01 ac 1f 01 64 
00 00 65 58 f2 ff 00 00 00 01 f2 ff 00 00 00 04 08 00 45 00 00 cc 6d 4d 00 b2 
40 01 f5 cb 0a 01 01 01 0a

Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Check the right IPv6 address in is_nd_dst_correct().

2018-07-10 Thread Aaron Conole

Ben Pfaff  writes:

> Fixes test 815 "tunnel_push_pop_ipv6 - action".
>
> CC: Aaron Conole 
> Fixes: 6f231f7c3a9e ("xlate: use const struct in6_addr in linklocal check")
> Signed-off-by: Ben Pfaff 
> ---

D'oh!  The names killed me.  Sorry :-(

Acked-by: Aaron Conole 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] datapath: work around the single GRE receive limitation.

2018-07-10 Thread William Tu

On Tue, Jul 10, 2018 at 4:27 PM, Yifeng Sun  wrote:
> Looks good to me, thanks.
>
> Reviewed-by: Yifeng Sun 
>
>
>
> On Tue, Jul 10, 2018 at 10:50 AM, William Tu  wrote:
>>
>> Commit 9f57c67c379d ("gre: Remove support for sharing GRE protocol hook")
>> allows only single GRE packet receiver.  When upstream kernel's gre module
>> is loaded, the gre.ko exclusively becomes the only gre packet receiver,
>> preventing OVS kernel module from registering another gre receiver.
>>
>> We can either try to unload the gre.ko by removing its dependencies,
>> or, in this patch, we try to register OVS as only the GRE transmit
>> portion when detecting there already exists another GRE receiver.
>>
>> Signed-off-by: William Tu 
>> Cc: Greg Rose 
>> Cc: Yifeng Sun 
>> ---
>>  datapath/linux/compat/ip_gre.c | 60
>> +-
>>  datapath/vport.c   | 12 ++---
>>  2 files changed, 56 insertions(+), 16 deletions(-)
>>
>> diff --git a/datapath/linux/compat/ip_gre.c
>> b/datapath/linux/compat/ip_gre.c
>> index 92de70127189..1ab798164894 100644
>> --- a/datapath/linux/compat/ip_gre.c
>> +++ b/datapath/linux/compat/ip_gre.c
>> @@ -71,6 +71,7 @@ static void erspan_build_header(struct sk_buff *skb,
>> bool truncate, bool is_ipv4);
>>
>>  static struct rtnl_link_ops ipgre_link_ops __read_mostly;
>> +static bool ip_gre_loaded = false;
>>
>>  #define ip_gre_calc_hlen rpl_ip_gre_calc_hlen
>>  static int ip_gre_calc_hlen(__be16 o_flags)
>> @@ -1640,25 +1641,57 @@ int rpl_ipgre_init(void)
>> int err;
>>
>> err = register_pernet_device(_tap_net_ops);
>> -   if (err < 0)
>> -   goto pnet_tap_failed;
>> +   if (err < 0) {
>> +   if (err == -EEXIST)
>> +   goto ip_gre_loaded;
>> +   else
>> +   goto pnet_tap_failed;
>> +   }
>>
>> err = register_pernet_device(_net_ops);
>> -   if (err < 0)
>> -   goto pnet_erspan_failed;
>> +   if (err < 0) {
>> +   if (err == -EEXIST)
>> +   goto ip_gre_loaded;
>> +   else
>> +   goto pnet_erspan_failed;
>> +   }
>>
>> err = register_pernet_device(_net_ops);
>> -   if (err < 0)
>> -   goto pnet_ipgre_failed;
>> +   if (err < 0) {
>> +   if (err == -EEXIST)
>> +   goto ip_gre_loaded;
>> +   else
>> +   goto pnet_ipgre_failed;
>> +   }
>>
>> err = gre_add_protocol(_protocol, GREPROTO_CISCO);
>> if (err < 0) {
>> pr_info("%s: can't add protocol\n", __func__);
>> -   goto add_proto_failed;
>> +   if (err == -EBUSY) {
>> +   goto ip_gre_loaded;
>> +   } else {
>> +   goto add_proto_failed;
>> +   }
>> }
>>
>> pr_info("GRE over IPv4 tunneling driver\n");
>> -
>> +   ovs_vport_ops_register(_ipgre_vport_ops);
>> +   ovs_vport_ops_register(_erspan_vport_ops);
>> +   return 0;
>> +
>> +ip_gre_loaded:
>> +   /* Since GRE only allows single receiver to be registerd,
>> +* we skip here so only gre transmit works, see:
>> +*
>> +* commit 9f57c67c379d88a10e8ad676426fee5ae7341b14
>> +* Author: Pravin B Shelar 
>> +* Date:   Fri Aug 7 23:51:52 2015 -0700
>> +* gre: Remove support for sharing GRE protocol hook
>> +*
>> +* OVS GRE receive part is disabled.
>> +*/
>> +   pr_info("GRE TX only over IPv4 tunneling driver\n");
>> +   ip_gre_loaded = true;
>> ovs_vport_ops_register(_ipgre_vport_ops);
>> ovs_vport_ops_register(_erspan_vport_ops);
>> return 0;
>> @@ -1678,10 +1711,13 @@ void rpl_ipgre_fini(void)
>>  {
>> ovs_vport_ops_unregister(_erspan_vport_ops);
>> ovs_vport_ops_unregister(_ipgre_vport_ops);
>> -   gre_del_protocol(_protocol, GREPROTO_CISCO);
>> -   unregister_pernet_device(_net_ops);
>> -   unregister_pernet_device(_net_ops);
>> -   unregister_pernet_device(_tap_net_ops);
>> +
>> +   if (!ip_gre_loaded) {
>> +   gre_del_protocol(_protocol, GREPROTO_CISCO);
>> +   unregister_pernet_device(_net_ops);
>> +   unregister_pernet_device(_net_ops);
>> +   unregister_pernet_device(_tap_net_ops);
>> +   }
>>  }
>>
>>  #endif
>> diff --git a/datapath/vport.c b/datapath/vport.c
>> index 02f6b56d3243..05dd9dc2a149 100644
>> --- a/datapath/vport.c
>> +++ b/datapath/vport.c
>> @@ -66,11 +66,15 @@ int ovs_vport_init(void)
>> if (err)
>> goto err_lisp;
>> err = gre_init();
>> -   if (err && err != -EEXIST)
>> +   if (err && err != -EEXIST) {
>> goto err_gre;
>> -   else if (err == -EEXIST)
>> -   pr_warn("Cannot take GRE protocol entry - The ERSPAN

Re: [ovs-dev] [patch v1] conntrack: Fix conn_update_state_alg use after free.

2018-07-10 Thread Darrell Ball

sorry, I only had a few minutes in the morning
I sent a V2, with a proper commit message.

https://patchwork.ozlabs.org/patch/942279/

Thanks Darrell


On Tue, Jul 10, 2018 at 1:27 PM, Ben Pfaff  wrote:

> On Tue, Jul 10, 2018 at 09:15:09AM -0700, Darrell Ball wrote:
> > Fixes: bd5e81a0e596 ("Userspace Datapath: Add ALG infra and FTP.")
> > Signed-off-by: Darrell Ball 
>
> This could use a little longer commit message.  Is the following
> accurate:
>
> When conn_update_state() returns true, it has freed 'conn' which
> therefore should not be passed into handle_ftp_ctl().
>
> Thanks,
>
> Ben.
>
> > ---
> >
> > Needs backporting as far back as 2.8.
> >
> >  lib/conntrack.c | 7 +--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/conntrack.c b/lib/conntrack.c
> > index e1c1f2e..b818584 100644
> > --- a/lib/conntrack.c
> > +++ b/lib/conntrack.c
> > @@ -1159,8 +1159,11 @@ conn_update_state_alg(struct conntrack *ct,
> struct dp_packet *pkt,
> >  } else {
> >  *create_new_conn = conn_update_state(ct, pkt, ctx, ,
> now,
> >  bucket);
> > -handle_ftp_ctl(ct, ctx, pkt, conn, now, CT_FTP_CTL_OTHER,
> > -   !!nat_action_info);
> > +
> > +if (*create_new_conn == false) {
> > +handle_ftp_ctl(ct, ctx, pkt, conn, now,
> CT_FTP_CTL_OTHER,
> > +   !!nat_action_info);
> > +}
> >  }
> >  return true;
> >  }
> > --
> > 1.9.1
> >
> > ___
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [patch v2] conntrack: Fix conn_update_state_alg use after free.

2018-07-10 Thread Darrell Ball

When conn_update_state() returns true, conn has been freed, so skip calling
handle_ftp_ctl() with this conn and instead follow code path for new
connections.

Fixes: bd5e81a0e596 ("Userspace Datapath: Add ALG infra and FTP.")
Signed-off-by: Darrell Ball 
---

Needs backporting as far back as 2.8.

v1->v2: Fix commit messaage.

 lib/conntrack.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index e1c1f2e..b818584 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1159,8 +1159,11 @@ conn_update_state_alg(struct conntrack *ct, struct 
dp_packet *pkt,
 } else {
 *create_new_conn = conn_update_state(ct, pkt, ctx, , now,
 bucket);
-handle_ftp_ctl(ct, ctx, pkt, conn, now, CT_FTP_CTL_OTHER,
-   !!nat_action_info);
+
+if (*create_new_conn == false) {
+handle_ftp_ctl(ct, ctx, pkt, conn, now, CT_FTP_CTL_OTHER,
+   !!nat_action_info);
+}
 }
 return true;
 }
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] datapath: work around the single GRE receive limitation.

2018-07-10 Thread Yifeng Sun

Looks good to me, thanks.

Reviewed-by: Yifeng Sun 


On Tue, Jul 10, 2018 at 10:50 AM, William Tu  wrote:

> Commit 9f57c67c379d ("gre: Remove support for sharing GRE protocol hook")
> allows only single GRE packet receiver.  When upstream kernel's gre module
> is loaded, the gre.ko exclusively becomes the only gre packet receiver,
> preventing OVS kernel module from registering another gre receiver.
>
> We can either try to unload the gre.ko by removing its dependencies,
> or, in this patch, we try to register OVS as only the GRE transmit
> portion when detecting there already exists another GRE receiver.
>
> Signed-off-by: William Tu 
> Cc: Greg Rose 
> Cc: Yifeng Sun 
> ---
>  datapath/linux/compat/ip_gre.c | 60 ++
> +++-
>  datapath/vport.c   | 12 ++---
>  2 files changed, 56 insertions(+), 16 deletions(-)
>
> diff --git a/datapath/linux/compat/ip_gre.c b/datapath/linux/compat/ip_
> gre.c
> index 92de70127189..1ab798164894 100644
> --- a/datapath/linux/compat/ip_gre.c
> +++ b/datapath/linux/compat/ip_gre.c
> @@ -71,6 +71,7 @@ static void erspan_build_header(struct sk_buff *skb,
> bool truncate, bool is_ipv4);
>
>  static struct rtnl_link_ops ipgre_link_ops __read_mostly;
> +static bool ip_gre_loaded = false;
>
>  #define ip_gre_calc_hlen rpl_ip_gre_calc_hlen
>  static int ip_gre_calc_hlen(__be16 o_flags)
> @@ -1640,25 +1641,57 @@ int rpl_ipgre_init(void)
> int err;
>
> err = register_pernet_device(_tap_net_ops);
> -   if (err < 0)
> -   goto pnet_tap_failed;
> +   if (err < 0) {
> +   if (err == -EEXIST)
> +   goto ip_gre_loaded;
> +   else
> +   goto pnet_tap_failed;
> +   }
>
> err = register_pernet_device(_net_ops);
> -   if (err < 0)
> -   goto pnet_erspan_failed;
> +   if (err < 0) {
> +   if (err == -EEXIST)
> +   goto ip_gre_loaded;
> +   else
> +   goto pnet_erspan_failed;
> +   }
>
> err = register_pernet_device(_net_ops);
> -   if (err < 0)
> -   goto pnet_ipgre_failed;
> +   if (err < 0) {
> +   if (err == -EEXIST)
> +   goto ip_gre_loaded;
> +   else
> +   goto pnet_ipgre_failed;
> +   }
>
> err = gre_add_protocol(_protocol, GREPROTO_CISCO);
> if (err < 0) {
> pr_info("%s: can't add protocol\n", __func__);
> -   goto add_proto_failed;
> +   if (err == -EBUSY) {
> +   goto ip_gre_loaded;
> +   } else {
> +   goto add_proto_failed;
> +   }
> }
>
> pr_info("GRE over IPv4 tunneling driver\n");
> -
> +   ovs_vport_ops_register(_ipgre_vport_ops);
> +   ovs_vport_ops_register(_erspan_vport_ops);
> +   return 0;
> +
> +ip_gre_loaded:
> +   /* Since GRE only allows single receiver to be registerd,
> +* we skip here so only gre transmit works, see:
> +*
> +* commit 9f57c67c379d88a10e8ad676426fee5ae7341b14
> +* Author: Pravin B Shelar 
> +* Date:   Fri Aug 7 23:51:52 2015 -0700
> +* gre: Remove support for sharing GRE protocol hook
> +*
> +* OVS GRE receive part is disabled.
> +*/
> +   pr_info("GRE TX only over IPv4 tunneling driver\n");
> +   ip_gre_loaded = true;
> ovs_vport_ops_register(_ipgre_vport_ops);
> ovs_vport_ops_register(_erspan_vport_ops);
> return 0;
> @@ -1678,10 +1711,13 @@ void rpl_ipgre_fini(void)
>  {
> ovs_vport_ops_unregister(_erspan_vport_ops);
> ovs_vport_ops_unregister(_ipgre_vport_ops);
> -   gre_del_protocol(_protocol, GREPROTO_CISCO);
> -   unregister_pernet_device(_net_ops);
> -   unregister_pernet_device(_net_ops);
> -   unregister_pernet_device(_tap_net_ops);
> +
> +   if (!ip_gre_loaded) {
> +   gre_del_protocol(_protocol, GREPROTO_CISCO);
> +   unregister_pernet_device(_net_ops);
> +   unregister_pernet_device(_net_ops);
> +   unregister_pernet_device(_tap_net_ops);
> +   }
>  }
>
>  #endif
> diff --git a/datapath/vport.c b/datapath/vport.c
> index 02f6b56d3243..05dd9dc2a149 100644
> --- a/datapath/vport.c
> +++ b/datapath/vport.c
> @@ -66,11 +66,15 @@ int ovs_vport_init(void)
> if (err)
> goto err_lisp;
> err = gre_init();
> -   if (err && err != -EEXIST)
> +   if (err && err != -EEXIST) {
> goto err_gre;
> -   else if (err == -EEXIST)
> -   pr_warn("Cannot take GRE protocol entry - The ERSPAN
> feature may not be supported\n");
> -   else {
> +   } else {
> +   if (err == -EEXIST) {
> +   pr_warn("Cannot take GRE protocol entry"\
> +

Re: [ovs-dev] rhel: support kmod-openvswitch build against multiple kernels, rhel6

2018-07-10 Thread 0-day Robot

Bleep bloop.  Greetings Martin Xu, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
fatal: corrupt patch at line 127
Repository lacks necessary blobs to fall back on 3-way merge.
Cannot fall back to three-way merge.
Patch failed at 0001 rhel: support kmod-openvswitch build against multiple 
kernels, rhel6
The copy of the patch that failed is found in:
   
/var/lib/jenkins/jobs/upstream_build_from_pw/workspace/.git/rebase-apply/patch
When you have resolved this problem, run "git am --resolved".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Please check this out.  If you feel there has been an error, please email 
acon...@bytheb.org

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH branch-2.9] ovn-northd: Don't shadow addr_family in add_router_lb_flow().

2018-07-10 Thread Flavio Leitner

On Tue, Jul 10, 2018 at 03:24:17PM -0700, Ben Pfaff wrote:
> This is a partial backport of commit 396d492cfa8d ("Don't shadow
> variables.") to fix a build break due to backporting a different commit
> that depended on it.
> 
> CC: Mark Michelson 
> Fixes: 15fbc3baee5a ("ovn: Add router load balancer undnat rule for IPv6")
> Signed-off-by: Ben Pfaff 
> ---

LGTM, fixed the build issue.
Acked-by: Flavio Leitner 
Thanks Ben!

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [ovs-dev, branch-2.9] ovn-northd: Don't shadow addr_family in add_router_lb_flow().

2018-07-10 Thread 0-day Robot

Bleep bloop.  Greetings Ben Pfaff, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Too many signoffs; are you missing Co-authored-by lines?
Lines checked: 30, Warnings: 0, Errors: 1


Please check this out.  If you feel there has been an error, please email 
acon...@bytheb.org

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] rhel: support kmod-openvswitch build against multiple kernels, rhel6

2018-07-10 Thread Gregory Rose


On 7/10/2018 3:42 PM, Martin Xu wrote:

This patch only affects rhel6 spec file.

RHEL 7.4 introduced backward incompatible changes in the kernel. As
a result, prebuilt PRM packages against kernels newer than 693.17.1
will cannot be used on systems with older kernels, vice versa.

This patch allows multiple kernel version numbers delimited by
whitespace to be passed as variable "kversion". kmod-openvswitch RPM
packages the kernel module .ko files from all specified kernel
versions.

This patch also includes a script to update the weak-update symlinks
if the system kernel version is upgraded or downgraded after
kmod-openvswitch is installed.

Previouly the kernel_module_package macro is used to generate spec file
template to build kmod-openvswitch RPM. This macro is now removed.
Everything is built in the main package. To maintain consistent naming,
the rhel6 kmod spec file is renamed to kmod-openvswitch-rhel6.spec to
match the built package name kmod-openvswitch.

This patch also removes the openvswitch-kmod package.


Oops, no signed off by line.

I'm adding Aaron and Flavio so the Red Hat guys can give it a look. You 
can wait for their review

comments and then spin up a Rev 2 with the fixed signed off by line.

Thanks for the patch!

- Greg




---
  Documentation/intro/install/rhel.rst   |   7 +-
  poc/playbook-centos-builder.yml|   8 +-
  rhel/.gitignore|   1 +
  rhel/automake.mk   |   8 +-
  rhel/kmod-openvswitch-rhel6.spec.in| 120
+
  rhel/openvswitch-kmod-rhel6.spec.in| 103 --
  rhel/openvswitch-kmod.files|   3 -
  ...sr_share_openvswitch_scripts_ovs-kmod-manage.sh |  71 
  8 files changed, 203 insertions(+), 118 deletions(-)
  create mode 100644 rhel/kmod-openvswitch-rhel6.spec.in
  delete mode 100644 rhel/openvswitch-kmod-rhel6.spec.in
  delete mode 100644 rhel/openvswitch-kmod.files
  create mode 100644 rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh

diff --git a/Documentation/intro/install/rhel.rst
b/Documentation/intro/install/rhel.rst
index f8b2606..6f062e8 100644
--- a/Documentation/intro/install/rhel.rst
+++ b/Documentation/intro/install/rhel.rst
@@ -197,17 +197,16 @@ the unit tests, run::
  Kernel Module
  ~

-On RHEL 6, to build the Open vSwitch kernel module, copy
-rhel/openvswitch-kmod.files into the RPM sources directory and run::
+On RHEL 6, to build the Open vSwitch kernel module run::

-$ rpmbuild -bb rhel/openvswitch-kmod-rhel6.spec
+$ rpmbuild -bb rhel/kmod-openvswitch-rhel6.spec

  You might have to specify a kernel version and/or variants, e.g.:

  $ rpmbuild -bb \
  -D "kversion 2.6.32-131.6.1.el6.x86_64" \
  -D "kflavors default debug kdump" \
-rhel/openvswitch-kmod-rhel6.spec
+rhel/kmod-openvswitch-rhel6.spec

  This produces an "kmod-openvswitch" RPM for each kernel variant, in this
  example: "kmod-openvswitch", "kmod-openvswitch-debug", and
diff --git a/poc/playbook-centos-builder.yml b/poc/playbook-centos-builder.
yml
index 71f1040..e902db7 100644
--- a/poc/playbook-centos-builder.yml
+++ b/poc/playbook-centos-builder.yml
@@ -41,13 +41,13 @@
  chdir: /git/ovs/rhel
  with_items:
- openvswitch.spec
-  - openvswitch-kmod-rhel6.spec
+  - kmod-openvswitch-rhel6.spec

- name: Install build dependencies specified from spec files
  shell: echo "y" | yum-builddep /tmp/{{item}}
  with_items:
- openvswitch.spec
-  - openvswitch-kmod-rhel6.spec
+  - kmod-openvswitch-rhel6.spec

- name: Create rpm dev tree
  command: rpmdev-setuptree
@@ -81,7 +81,7 @@
line: "Release: {{ ansible_local.builder.release }}"
  with_items:
- openvswitch.spec
-  - openvswitch-kmod-rhel6.spec
+  - kmod-openvswitch-rhel6.spec

- name: Build Open vSwitch user space rpms
  command: rpmbuild -bb --without check rhel/openvswitch.spec
@@ -89,7 +89,7 @@
  chdir: "{{SOURCE}}/openvswitch-{{version.stdout}}"

- name: Build Open vSwitch kmod rpms (only for currently loaded kernel)
-command: rpmbuild -bb  --without check rhel/openvswitch-kmod-rhel6.spec
+command: rpmbuild -bb  --without check rhel/kmod-openvswitch-rhel6.spec
  args:
  chdir: "{{SOURCE}}/openvswitch-{{version.stdout}}"

diff --git a/rhel/.gitignore b/rhel/.gitignore
index e584a1e..9b0ce1d 100644
--- a/rhel/.gitignore
+++ b/rhel/.gitignore
@@ -1,6 +1,7 @@
  openvswitch-dkms.spec
  openvswitch-kmod-rhel5.spec
  openvswitch-kmod-rhel6.spec
+kmod-openvswitch-rhel6.spec
  openvswitch-kmod-fedora.spec
  openvswitch.spec
  openvswitch-fedora.spec
diff --git a/rhel/automake.mk b/rhel/automake.mk
index 137ff4a..7b6c78f 100644
--- a/rhel/automake.mk
+++ b/rhel/automake.mk
@@ -15,9 +15,8 @@ EXTRA_DIST += \
   rhel/etc_sysconfig_network-scripts_ifup-ovs \

[ovs-dev] [PATCH] rhel: support kmod-openvswitch build against multiple kernels, rhel6

2018-07-10 Thread Martin Xu

This patch only affects rhel6 spec file.

RHEL 7.4 introduced backward incompatible changes in the kernel. As
a result, prebuilt PRM packages against kernels newer than 693.17.1
will cannot be used on systems with older kernels, vice versa.

This patch allows multiple kernel version numbers delimited by
whitespace to be passed as variable "kversion". kmod-openvswitch RPM
packages the kernel module .ko files from all specified kernel
versions.

This patch also includes a script to update the weak-update symlinks
if the system kernel version is upgraded or downgraded after
kmod-openvswitch is installed.

Previouly the kernel_module_package macro is used to generate spec file
template to build kmod-openvswitch RPM. This macro is now removed.
Everything is built in the main package. To maintain consistent naming,
the rhel6 kmod spec file is renamed to kmod-openvswitch-rhel6.spec to
match the built package name kmod-openvswitch.

This patch also removes the openvswitch-kmod package.
---
 Documentation/intro/install/rhel.rst   |   7 +-
 poc/playbook-centos-builder.yml|   8 +-
 rhel/.gitignore|   1 +
 rhel/automake.mk   |   8 +-
 rhel/kmod-openvswitch-rhel6.spec.in| 120
+
 rhel/openvswitch-kmod-rhel6.spec.in| 103 --
 rhel/openvswitch-kmod.files|   3 -
 ...sr_share_openvswitch_scripts_ovs-kmod-manage.sh |  71 
 8 files changed, 203 insertions(+), 118 deletions(-)
 create mode 100644 rhel/kmod-openvswitch-rhel6.spec.in
 delete mode 100644 rhel/openvswitch-kmod-rhel6.spec.in
 delete mode 100644 rhel/openvswitch-kmod.files
 create mode 100644 rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh

diff --git a/Documentation/intro/install/rhel.rst
b/Documentation/intro/install/rhel.rst
index f8b2606..6f062e8 100644
--- a/Documentation/intro/install/rhel.rst
+++ b/Documentation/intro/install/rhel.rst
@@ -197,17 +197,16 @@ the unit tests, run::
 Kernel Module
 ~

-On RHEL 6, to build the Open vSwitch kernel module, copy
-rhel/openvswitch-kmod.files into the RPM sources directory and run::
+On RHEL 6, to build the Open vSwitch kernel module run::

-$ rpmbuild -bb rhel/openvswitch-kmod-rhel6.spec
+$ rpmbuild -bb rhel/kmod-openvswitch-rhel6.spec

 You might have to specify a kernel version and/or variants, e.g.:

 $ rpmbuild -bb \
 -D "kversion 2.6.32-131.6.1.el6.x86_64" \
 -D "kflavors default debug kdump" \
-rhel/openvswitch-kmod-rhel6.spec
+rhel/kmod-openvswitch-rhel6.spec

 This produces an "kmod-openvswitch" RPM for each kernel variant, in this
 example: "kmod-openvswitch", "kmod-openvswitch-debug", and
diff --git a/poc/playbook-centos-builder.yml b/poc/playbook-centos-builder.
yml
index 71f1040..e902db7 100644
--- a/poc/playbook-centos-builder.yml
+++ b/poc/playbook-centos-builder.yml
@@ -41,13 +41,13 @@
 chdir: /git/ovs/rhel
 with_items:
   - openvswitch.spec
-  - openvswitch-kmod-rhel6.spec
+  - kmod-openvswitch-rhel6.spec

   - name: Install build dependencies specified from spec files
 shell: echo "y" | yum-builddep /tmp/{{item}}
 with_items:
   - openvswitch.spec
-  - openvswitch-kmod-rhel6.spec
+  - kmod-openvswitch-rhel6.spec

   - name: Create rpm dev tree
 command: rpmdev-setuptree
@@ -81,7 +81,7 @@
   line: "Release: {{ ansible_local.builder.release }}"
 with_items:
   - openvswitch.spec
-  - openvswitch-kmod-rhel6.spec
+  - kmod-openvswitch-rhel6.spec

   - name: Build Open vSwitch user space rpms
 command: rpmbuild -bb --without check rhel/openvswitch.spec
@@ -89,7 +89,7 @@
 chdir: "{{SOURCE}}/openvswitch-{{version.stdout}}"

   - name: Build Open vSwitch kmod rpms (only for currently loaded kernel)
-command: rpmbuild -bb  --without check rhel/openvswitch-kmod-rhel6.spec
+command: rpmbuild -bb  --without check rhel/kmod-openvswitch-rhel6.spec
 args:
 chdir: "{{SOURCE}}/openvswitch-{{version.stdout}}"

diff --git a/rhel/.gitignore b/rhel/.gitignore
index e584a1e..9b0ce1d 100644
--- a/rhel/.gitignore
+++ b/rhel/.gitignore
@@ -1,6 +1,7 @@
 openvswitch-dkms.spec
 openvswitch-kmod-rhel5.spec
 openvswitch-kmod-rhel6.spec
+kmod-openvswitch-rhel6.spec
 openvswitch-kmod-fedora.spec
 openvswitch.spec
 openvswitch-fedora.spec
diff --git a/rhel/automake.mk b/rhel/automake.mk
index 137ff4a..7b6c78f 100644
--- a/rhel/automake.mk
+++ b/rhel/automake.mk
@@ -15,9 +15,8 @@ EXTRA_DIST += \
  rhel/etc_sysconfig_network-scripts_ifup-ovs \
  rhel/openvswitch-dkms.spec \
  rhel/openvswitch-dkms.spec.in \
- rhel/openvswitch-kmod-rhel6.spec \
- rhel/openvswitch-kmod-rhel6.spec.in \
- rhel/openvswitch-kmod.files \
+ rhel/kmod-openvswitch-rhel6.spec \
+ rhel/kmod-openvswitch-rhel6.spec.in \
  rhel/openvswitch-kmod-fedora.spec \
  rhel/openvswitch-kmod-fedora.spec.in \

Re: [ovs-dev] [PATCH] ovn: Add router load balancer undnat rule for IPv6

2018-07-10 Thread Ben Pfaff

Oops.  I sent a fix:
https://patchwork.ozlabs.org/patch/942271/
I forgot to add you in Reported-by but I'll do that before pushing.

On Tue, Jul 10, 2018 at 06:44:15PM -0300, Flavio Leitner wrote:
> 
> Hi Ben and Mark,
> 
> This patch broke 2.9 build over here, see below.
> 
> On Tue, Jul 10, 2018 at 02:24:46PM -0700, Ben Pfaff wrote:
> > Done.
> > 
> > On Tue, Jul 10, 2018 at 10:41:32AM -0400, Mark Michelson wrote:
> > > Hi Ben,
> > > 
> > > Thanks for applying this patch to master. Can you please apply the patch 
> > > to
> > > the 2.9 branch as well?
> > > 
> > > Thank you,
> > > Mark
> > > 
> > > On 06/26/2018 02:45 PM, Mark Michelson wrote:
> > > >A note: if approved, this patch will also need to go into version 2.9
> > > >
> > > >On 06/26/2018 02:42 PM, Mark Michelson wrote:
> > > >>When configuring a router port to have a redirect-chassis and using an
> > > >>IPv6 load balancer rule that specifies a TCP/UDP port, load balancing
> > > >>would not work as expected. This is because a rule to un-dnat the return
> > > >>traffic from the load balancer destination was not installed. This is
> > > >>because this rule was only being installed for IPv4 load balancers.
> > > >>
> > > >>This change adds the same rule for IPv6 load balancers as well.
> > > >>
> > > >>Signed-off-by: Mark Michelson 
> > > >>---
> > > >>  ovn/northd/ovn-northd.c | 15 +++
> > > >>  1 file changed, 11 insertions(+), 4 deletions(-)
> > > >>
> > > >>diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > > >>index 72fe4e795..2ca439b39 100644
> > > >>--- a/ovn/northd/ovn-northd.c
> > > >>+++ b/ovn/northd/ovn-northd.c
> > > >>@@ -4641,8 +4641,7 @@ add_router_lb_flow(struct hmap *lflows, struct
> > > >>ovn_datapath *od,
> > > >>  free(new_match);
> > > >>  free(est_match);
> > > >>-    if (!od->l3dgw_port || !od->l3redirect_port || !backend_ips
> > > >>-    || addr_family != AF_INET) {
> > > >>+    if (!od->l3dgw_port || !od->l3redirect_port || !backend_ips) {
> > > >>  return;
> > > >>  }
> > > >>@@ -4651,7 +4650,11 @@ add_router_lb_flow(struct hmap *lflows, struct
> > > >>ovn_datapath *od,
> > > >>   * router has a gateway router port associated.
> > > >>   */
> > > >>  struct ds undnat_match = DS_EMPTY_INITIALIZER;
> > > >>-    ds_put_cstr(_match, "ip4 && (");
> > > >>+    if (addr_family == AF_INET) {
> > > >>+    ds_put_cstr(_match, "ip4 && (");
> > > >>+    } else {
> > > >>+    ds_put_cstr(_match, "ip6 && (");
> > > >>+    }
> > > >>  char *start, *next, *ip_str;
> > > >>  start = next = xstrdup(backend_ips);
> > > >>  ip_str = strsep(, ",");
> > > >>@@ -4666,7 +4669,11 @@ add_router_lb_flow(struct hmap *lflows, struct
> > > >>ovn_datapath *od,
> > > >>  break;
> > > >>  }
> > > >>-    ds_put_format(_match, "(ip4.src == %s", ip_address);
> > > >>+    if (addr_family_ == AF_INET) {
> extra _?  
> 
> 
> > > >>+    ds_put_format(_match, "(ip4.src == %s", ip_address);
> > > >>+    } else {
> > > >>+    ds_put_format(_match, "(ip6.src == %s", ip_address);
> > > >>+    }
> > > >>  free(ip_address);
> > > >>  if (port) {
> > > >>  ds_put_format(_match, " && %s.src == %d) || ",
> > > >>
> > > >
> > > 
> > ___
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
> -- 
> Flavio
> 
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH branch-2.9] ovn-northd: Don't shadow addr_family in add_router_lb_flow().

2018-07-10 Thread Ben Pfaff

This is a partial backport of commit 396d492cfa8d ("Don't shadow
variables.") to fix a build break due to backporting a different commit
that depended on it.

CC: Mark Michelson 
Fixes: 15fbc3baee5a ("ovn: Add router load balancer undnat rule for IPv6")
Signed-off-by: Ben Pfaff 
---
 ovn/northd/ovn-northd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 2509b0a57320..334aa6f8c742 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -4463,9 +4463,9 @@ add_router_lb_flow(struct hmap *lflows, struct 
ovn_datapath *od,
 while (ip_str && ip_str[0]) {
 char *ip_address = NULL;
 uint16_t port = 0;
-int addr_family;
+int addr_family_;
 ip_address_and_port_from_lb_key(ip_str, _address, ,
-_family);
+_family_);
 if (!ip_address) {
 break;
 }
-- 
2.16.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] datapath: work around the single GRE receive limitation.

2018-07-10 Thread Gregory Rose


On 7/10/2018 10:50 AM, William Tu wrote:

Commit 9f57c67c379d ("gre: Remove support for sharing GRE protocol hook")
allows only single GRE packet receiver.  When upstream kernel's gre module
is loaded, the gre.ko exclusively becomes the only gre packet receiver,
preventing OVS kernel module from registering another gre receiver.

We can either try to unload the gre.ko by removing its dependencies,
or, in this patch, we try to register OVS as only the GRE transmit
portion when detecting there already exists another GRE receiver.

Signed-off-by: William Tu 
Cc: Greg Rose 
Cc: Yifeng Sun 


LGTM

Tested-by: Greg Rose 
Reviewed-by: Greg Rose 


---
  datapath/linux/compat/ip_gre.c | 60 +-
  datapath/vport.c   | 12 ++---
  2 files changed, 56 insertions(+), 16 deletions(-)

diff --git a/datapath/linux/compat/ip_gre.c b/datapath/linux/compat/ip_gre.c
index 92de70127189..1ab798164894 100644
--- a/datapath/linux/compat/ip_gre.c
+++ b/datapath/linux/compat/ip_gre.c
@@ -71,6 +71,7 @@ static void erspan_build_header(struct sk_buff *skb,
bool truncate, bool is_ipv4);
  
  static struct rtnl_link_ops ipgre_link_ops __read_mostly;

+static bool ip_gre_loaded = false;
  
  #define ip_gre_calc_hlen rpl_ip_gre_calc_hlen

  static int ip_gre_calc_hlen(__be16 o_flags)
@@ -1640,25 +1641,57 @@ int rpl_ipgre_init(void)
int err;
  
  	err = register_pernet_device(_tap_net_ops);

-   if (err < 0)
-   goto pnet_tap_failed;
+   if (err < 0) {
+   if (err == -EEXIST)
+   goto ip_gre_loaded;
+   else
+   goto pnet_tap_failed;
+   }
  
  	err = register_pernet_device(_net_ops);

-   if (err < 0)
-   goto pnet_erspan_failed;
+   if (err < 0) {
+   if (err == -EEXIST)
+   goto ip_gre_loaded;
+   else
+   goto pnet_erspan_failed;
+   }
  
  	err = register_pernet_device(_net_ops);

-   if (err < 0)
-   goto pnet_ipgre_failed;
+   if (err < 0) {
+   if (err == -EEXIST)
+   goto ip_gre_loaded;
+   else
+   goto pnet_ipgre_failed;
+   }
  
  	err = gre_add_protocol(_protocol, GREPROTO_CISCO);

if (err < 0) {
pr_info("%s: can't add protocol\n", __func__);
-   goto add_proto_failed;
+   if (err == -EBUSY) {
+   goto ip_gre_loaded;
+   } else {
+   goto add_proto_failed;
+   }
}
  
  	pr_info("GRE over IPv4 tunneling driver\n");

-   
+   ovs_vport_ops_register(_ipgre_vport_ops);
+   ovs_vport_ops_register(_erspan_vport_ops);
+   return 0;
+
+ip_gre_loaded:
+   /* Since GRE only allows single receiver to be registerd,
+* we skip here so only gre transmit works, see:
+*
+* commit 9f57c67c379d88a10e8ad676426fee5ae7341b14
+* Author: Pravin B Shelar 
+* Date:   Fri Aug 7 23:51:52 2015 -0700
+* gre: Remove support for sharing GRE protocol hook
+*
+* OVS GRE receive part is disabled.
+*/
+   pr_info("GRE TX only over IPv4 tunneling driver\n");
+   ip_gre_loaded = true;
ovs_vport_ops_register(_ipgre_vport_ops);
ovs_vport_ops_register(_erspan_vport_ops);
return 0;
@@ -1678,10 +1711,13 @@ void rpl_ipgre_fini(void)
  {
ovs_vport_ops_unregister(_erspan_vport_ops);
ovs_vport_ops_unregister(_ipgre_vport_ops);
-   gre_del_protocol(_protocol, GREPROTO_CISCO);
-   unregister_pernet_device(_net_ops);
-   unregister_pernet_device(_net_ops);
-   unregister_pernet_device(_tap_net_ops);
+
+   if (!ip_gre_loaded) {
+   gre_del_protocol(_protocol, GREPROTO_CISCO);
+   unregister_pernet_device(_net_ops);
+   unregister_pernet_device(_net_ops);
+   unregister_pernet_device(_tap_net_ops);
+   }
  }
  
  #endif

diff --git a/datapath/vport.c b/datapath/vport.c
index 02f6b56d3243..05dd9dc2a149 100644
--- a/datapath/vport.c
+++ b/datapath/vport.c
@@ -66,11 +66,15 @@ int ovs_vport_init(void)
if (err)
goto err_lisp;
err = gre_init();
-   if (err && err != -EEXIST)
+   if (err && err != -EEXIST) {
goto err_gre;
-   else if (err == -EEXIST)
-   pr_warn("Cannot take GRE protocol entry - The ERSPAN feature may not 
be supported\n");
-   else {
+   } else {
+   if (err == -EEXIST) {
+   pr_warn("Cannot take GRE protocol entry"\
+   "- The ERSPAN feature may not be supported\n");
+   /* continue GRE tx */
+   }
+
err = ipgre_init();
if (err && err != -EEXIST)
goto

Re: [ovs-dev] [PATCH v3 1/4] dpif-netlink: Detect Out-Of-Resource condition on a netdev

2018-07-10 Thread Ben Pfaff

On Tue, Jul 10, 2018 at 11:59:35AM +0530, Sriharsha Basavapatna via dev wrote:
> This is the first patch in the patch-set to support dynamic rebalancing
> of offloaded flows.
> 
> The patch detects OOR condition on a netdev port when ENOSPC error is
> returned by TC-Flower while adding a flow rule. A new structure is added
> to the netdev called "netdev_hw_info", to store OOR related information
> required to perform dynamic offload-rebalancing.
> 
> Signed-off-by: Sriharsha Basavapatna 
> Co-authored-by: Venkat Duvvuru 
> Signed-off-by: Venkat Duvvuru 
> Reviewed-by: Sathya Perla 

Thanks for the patch.

I spent some time adjusting the style to better fit what we usually do
these days in OVS, which puts declarations as close to first use as
practical.  I'm appending the incremental that I'd suggest folding in.

However I noticed a potentially serious bug while doing it.  Before this
patch, the code looked for output actions and took the dst_port from the
last one it found that was a tunnel.  After this patch, it does the same
thing but it also leaks a netdev for every output action other than the
first.  I added a "break;" to mitigate the damage but I guess it's not a
correct solution.

Thanks,

Ben.

--8<--cut here-->8--

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 5a6d53ad5697..ecc1ea5f4455 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2097,16 +2097,12 @@ parse_flow_put(struct dpif_netlink *dpif, struct 
dpif_flow_put *put)
 {
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
 const struct dpif_class *dpif_class = dpif->dpif.dpif_class;
-struct netdev_hw_info *hw_info;
 struct match match;
 odp_port_t in_port;
-odp_port_t out_port;
 const struct nlattr *nla;
 size_t left;
 struct netdev *dev;
 struct netdev *outdev = NULL;
-struct netdev *tunnel_netdev = NULL;
-struct netdev *oor_netdev = NULL;
 struct offload_info info;
 ovs_be16 dst_port = 0;
 int err;
@@ -2137,7 +2133,7 @@ parse_flow_put(struct dpif_netlink *dpif, struct 
dpif_flow_put *put)
 if (nl_attr_type(nla) == OVS_ACTION_ATTR_OUTPUT) {
 const struct netdev_tunnel_config *tnl_cfg;
 
-out_port = nl_attr_get_odp_port(nla);
+odp_port_t out_port = nl_attr_get_odp_port(nla);
 outdev = netdev_ports_get(out_port, dpif_class);
 if (!outdev) {
 err = EOPNOTSUPP;
@@ -2147,6 +2143,7 @@ parse_flow_put(struct dpif_netlink *dpif, struct 
dpif_flow_put *put)
 if (tnl_cfg && tnl_cfg->dst_port != 0) {
 dst_port = tnl_cfg->dst_port;
 }
+break;
 }
 }
 
@@ -2177,18 +2174,16 @@ parse_flow_put(struct dpif_netlink *dpif, struct 
dpif_flow_put *put)
 
 VLOG_DBG("added flow");
 } else if (err != EEXIST) {
-if (outdev && dev && (err == ENOSPC)) {
-tunnel_netdev = flow_get_tunnel_netdev();
-if (tunnel_netdev) {
-oor_netdev = tunnel_netdev;
-} else {
-oor_netdev = dev;
-}
-hw_info = _netdev->hw_info;
-hw_info->oor = true;
+struct netdev *oor_netdev = NULL;
+if (outdev && err == ENOSPC) {
+oor_netdev = flow_get_tunnel_netdev();
+if (!oor_netdev) {
+oor_netdev = dev;
+}
+oor_netdev->hw_info.oor = true;
 }
 VLOG_ERR_RL(, "failed to offload flow: %s: %s", ovs_strerror(err),
-(oor_netdev ? oor_netdev->name : dev->name));
+oor_netdev ? oor_netdev->name : dev->name);
 }
 
 out:
diff --git a/lib/flow.c b/lib/flow.c
index 90a1c0a3aa21..c1191e368419 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -3410,11 +3410,7 @@ flow_limit_vlans(int vlan_limit)
 struct netdev *
 flow_get_tunnel_netdev(struct flow_tnl *tunnel)
 {
-struct netdev *tunnel_netdev;
-char iface[IFNAMSIZ];
 struct in6_addr ip6;
-struct in6_addr gw;
-
 if (tunnel->ip_src) {
 in6_addr_set_mapped_ipv4(, tunnel->ip_src);
 } else if (ipv6_addr_is_set(>ipv6_src)) {
@@ -3423,10 +3419,11 @@ flow_get_tunnel_netdev(struct flow_tnl *tunnel)
 return NULL;
 }
 
+char iface[IFNAMSIZ];
+struct in6_addr gw;
 if (!ovs_router_lookup(0, , iface, NULL, )) {
-return (NULL);
+return NULL;
 }
 
-tunnel_netdev = netdev_from_name(iface);
-return tunnel_netdev;
+return netdev_from_name(iface);
 }
diff --git a/lib/netdev.c b/lib/netdev.c
index b17f0563fb3b..3e66158824e9 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -2473,7 +2473,6 @@ netdev_free_custom_stats_counters(struct 
netdev_custom_stats *custom_stats)
 }
 
 #ifdef __linux__
-
 static void
 netdev_ports_flow_init(void)
 {
___
dev mailing list
d...@openvswitch.org

Re: [ovs-dev] [PATCH 2/2] Add Drop Stats unit test cases

2018-07-10 Thread Ben Pfaff

On Fri, Jun 01, 2018 at 08:17:07AM +, Rohith Basavaraja wrote:
> This patch adds unit test cases for Improved Drop statistics feature
> request raised by following
> https://patchwork.ozlabs.org/patch/918934/
> 
> Signed-off-by: Rohith Basavaraja 

Where is patch 1/2?

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v3 1/1] netdev-vport: reject concomitant incompatible tunnels

2018-07-10 Thread Ben Pfaff

On Fri, Jun 01, 2018 at 07:03:55PM +0200, Eelco Chaudron wrote:
> This patch will make sure VXLAN tunnels with and without the group
> based policy (GBP) option enabled can not coexist on the same
> destination UDP port.
> 
> In theory, VXLAN tunnel with and without GBP enables can be
> multiplexed on the same UDP port as long as different VNI's are
> used. However currently OVS does not support this, hence this patch to
> check for this condition.
> 
> Signed-off-by: Eelco Chaudron 

Thanks for the patch, and sorry that I'm so slow.

Does this support the case where, in a single database transaction, a
GBP VXLAN tunnel is removed and a non-GBP VXLAN tunnel is created that
would otherwise interfere with one another (and the converse case)?  If
so, could that be included in the test?

"sparse" doesn't like the initialization strategy:

../lib/netdev-vport.c:1207:9: error: Using plain integer as NULL pointer
../lib/netdev-vport.c:1211:9: error: Using plain integer as NULL pointer
../lib/netdev-vport.c:1215:9: error: Using plain integer as NULL pointer
../lib/netdev-vport.c:1219:9: error: Using plain integer as NULL pointer
../lib/netdev-vport.c:1220:9: error: Using plain integer as NULL pointer
../lib/netdev-vport.c:1221:9: error: Using plain integer as NULL pointer
../lib/netdev-vport.c:1225:9: error: Using plain integer as NULL pointer
../lib/netdev-vport.c:1229:9: error: Using plain integer as NULL pointer
../lib/netdev-vport.c:1261:13: error: Using plain integer as NULL pointer

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] ovn: Add router load balancer undnat rule for IPv6

2018-07-10 Thread Flavio Leitner



Hi Ben and Mark,

This patch broke 2.9 build over here, see below.

On Tue, Jul 10, 2018 at 02:24:46PM -0700, Ben Pfaff wrote:
> Done.
> 
> On Tue, Jul 10, 2018 at 10:41:32AM -0400, Mark Michelson wrote:
> > Hi Ben,
> > 
> > Thanks for applying this patch to master. Can you please apply the patch to
> > the 2.9 branch as well?
> > 
> > Thank you,
> > Mark
> > 
> > On 06/26/2018 02:45 PM, Mark Michelson wrote:
> > >A note: if approved, this patch will also need to go into version 2.9
> > >
> > >On 06/26/2018 02:42 PM, Mark Michelson wrote:
> > >>When configuring a router port to have a redirect-chassis and using an
> > >>IPv6 load balancer rule that specifies a TCP/UDP port, load balancing
> > >>would not work as expected. This is because a rule to un-dnat the return
> > >>traffic from the load balancer destination was not installed. This is
> > >>because this rule was only being installed for IPv4 load balancers.
> > >>
> > >>This change adds the same rule for IPv6 load balancers as well.
> > >>
> > >>Signed-off-by: Mark Michelson 
> > >>---
> > >>  ovn/northd/ovn-northd.c | 15 +++
> > >>  1 file changed, 11 insertions(+), 4 deletions(-)
> > >>
> > >>diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > >>index 72fe4e795..2ca439b39 100644
> > >>--- a/ovn/northd/ovn-northd.c
> > >>+++ b/ovn/northd/ovn-northd.c
> > >>@@ -4641,8 +4641,7 @@ add_router_lb_flow(struct hmap *lflows, struct
> > >>ovn_datapath *od,
> > >>  free(new_match);
> > >>  free(est_match);
> > >>-    if (!od->l3dgw_port || !od->l3redirect_port || !backend_ips
> > >>-    || addr_family != AF_INET) {
> > >>+    if (!od->l3dgw_port || !od->l3redirect_port || !backend_ips) {
> > >>  return;
> > >>  }
> > >>@@ -4651,7 +4650,11 @@ add_router_lb_flow(struct hmap *lflows, struct
> > >>ovn_datapath *od,
> > >>   * router has a gateway router port associated.
> > >>   */
> > >>  struct ds undnat_match = DS_EMPTY_INITIALIZER;
> > >>-    ds_put_cstr(_match, "ip4 && (");
> > >>+    if (addr_family == AF_INET) {
> > >>+    ds_put_cstr(_match, "ip4 && (");
> > >>+    } else {
> > >>+    ds_put_cstr(_match, "ip6 && (");
> > >>+    }
> > >>  char *start, *next, *ip_str;
> > >>  start = next = xstrdup(backend_ips);
> > >>  ip_str = strsep(, ",");
> > >>@@ -4666,7 +4669,11 @@ add_router_lb_flow(struct hmap *lflows, struct
> > >>ovn_datapath *od,
> > >>  break;
> > >>  }
> > >>-    ds_put_format(_match, "(ip4.src == %s", ip_address);
> > >>+    if (addr_family_ == AF_INET) {
    extra _?  


> > >>+    ds_put_format(_match, "(ip4.src == %s", ip_address);
> > >>+    } else {
> > >>+    ds_put_format(_match, "(ip6.src == %s", ip_address);
> > >>+    }
> > >>  free(ip_address);
> > >>  if (port) {
> > >>  ds_put_format(_match, " && %s.src == %d) || ",
> > >>
> > >
> > 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

-- 
Flavio


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [patch v1] tests: Fix ICMP related 2 false positives.

2018-07-10 Thread Ben Pfaff

On Thu, Jul 05, 2018 at 08:11:33PM -0700, Darrell Ball wrote:
> Filter out packet-ins for V6 packets as this is a V4 test.
> 
> Signed-off-by: Darrell Ball 

Applied to master, thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH 2/3] Partially revert "m4: Add hard requirements for python in "configure""

2018-07-10 Thread Timothy Redaelli

On Tue, 10 Jul 2018 14:23:59 -0700
Ben Pfaff  wrote:

> On Thu, Jun 14, 2018 at 01:50:16PM -0700, Ben Pfaff wrote:
> > On Thu, May 31, 2018 at 04:52:41PM +0200, Timothy Redaelli wrote:  
> > > This partially reverts commit
> > > d53d2f4b84cc77868f2a55d66417fe6058d0.
> > > 
> > > Python2 check for "six.moves.range" is not reverted.
> > > 
> > > Signed-off-by: Timothy Redaelli   
> > 
> > Thanks for the patch.
> > 
> > The commit message doesn't make it clear to me the reason for the
> > patch.  Can you add a sentence or two?  
> 
> I guess this series is still under review, do you want it for 2.10?

Hi,
I re-sent this patch merged with patch 3 in
https://patchwork.ozlabs.org/project/openvswitch/list/?series=51572

It would be great, if possible, to have it for OVS 2.10.

Thank you
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] Creation of network using ovs in Docker

2018-07-10 Thread Sandeep Adapala

You have been a great help Guru. thank you.

On Tue, Jul 10, 2018 at 5:25 PM, Guru Shetty  wrote:

> I would suggest to create a new topic with a valid subject name to attract
> the dpdk folks and post it in disc...@openvswitch.org
>
> On 10 July 2018 at 13:44, Sandeep Adapala 
> wrote:
>
>> Hello Guru,
>>
>> Now I have installed DPDK and OVS both of them are compatible with each
>> other but now I got stuck at new place. I know you are not very familiar
>> with DPDK but can you add someone who can help me out in that.
>>
>> I have configured the OVS to use dpdk library using
>>
>> ./configure --with-dpdk=$DPDK_BUILD
>>
>> then i built it and started it the same way as mentioned in this document
>>
>> http://docs.openvswitch.org/en/latest/intro/install/general/
>> #general-building
>>
>> but when i run ovs-vsctl get Open_vSwitch . dpdk_initialized i dont see
>> dpdk running
>>
>>
>> Regards,
>> Sandeep
>>
>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] Creation of network using ovs in Docker

2018-07-10 Thread Guru Shetty

I would suggest to create a new topic with a valid subject name to attract
the dpdk folks and post it in disc...@openvswitch.org

On 10 July 2018 at 13:44, Sandeep Adapala 
wrote:

> Hello Guru,
>
> Now I have installed DPDK and OVS both of them are compatible with each
> other but now I got stuck at new place. I know you are not very familiar
> with DPDK but can you add someone who can help me out in that.
>
> I have configured the OVS to use dpdk library using
>
> ./configure --with-dpdk=$DPDK_BUILD
>
> then i built it and started it the same way as mentioned in this document
>
> http://docs.openvswitch.org/en/latest/intro/install/
> general/#general-building
>
> but when i run ovs-vsctl get Open_vSwitch . dpdk_initialized i dont see
> dpdk running
>
>
> Regards,
> Sandeep
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] ovn: Add router load balancer undnat rule for IPv6

2018-07-10 Thread Ben Pfaff

Done.

On Tue, Jul 10, 2018 at 10:41:32AM -0400, Mark Michelson wrote:
> Hi Ben,
> 
> Thanks for applying this patch to master. Can you please apply the patch to
> the 2.9 branch as well?
> 
> Thank you,
> Mark
> 
> On 06/26/2018 02:45 PM, Mark Michelson wrote:
> >A note: if approved, this patch will also need to go into version 2.9
> >
> >On 06/26/2018 02:42 PM, Mark Michelson wrote:
> >>When configuring a router port to have a redirect-chassis and using an
> >>IPv6 load balancer rule that specifies a TCP/UDP port, load balancing
> >>would not work as expected. This is because a rule to un-dnat the return
> >>traffic from the load balancer destination was not installed. This is
> >>because this rule was only being installed for IPv4 load balancers.
> >>
> >>This change adds the same rule for IPv6 load balancers as well.
> >>
> >>Signed-off-by: Mark Michelson 
> >>---
> >>  ovn/northd/ovn-northd.c | 15 +++
> >>  1 file changed, 11 insertions(+), 4 deletions(-)
> >>
> >>diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> >>index 72fe4e795..2ca439b39 100644
> >>--- a/ovn/northd/ovn-northd.c
> >>+++ b/ovn/northd/ovn-northd.c
> >>@@ -4641,8 +4641,7 @@ add_router_lb_flow(struct hmap *lflows, struct
> >>ovn_datapath *od,
> >>  free(new_match);
> >>  free(est_match);
> >>-    if (!od->l3dgw_port || !od->l3redirect_port || !backend_ips
> >>-    || addr_family != AF_INET) {
> >>+    if (!od->l3dgw_port || !od->l3redirect_port || !backend_ips) {
> >>  return;
> >>  }
> >>@@ -4651,7 +4650,11 @@ add_router_lb_flow(struct hmap *lflows, struct
> >>ovn_datapath *od,
> >>   * router has a gateway router port associated.
> >>   */
> >>  struct ds undnat_match = DS_EMPTY_INITIALIZER;
> >>-    ds_put_cstr(_match, "ip4 && (");
> >>+    if (addr_family == AF_INET) {
> >>+    ds_put_cstr(_match, "ip4 && (");
> >>+    } else {
> >>+    ds_put_cstr(_match, "ip6 && (");
> >>+    }
> >>  char *start, *next, *ip_str;
> >>  start = next = xstrdup(backend_ips);
> >>  ip_str = strsep(, ",");
> >>@@ -4666,7 +4669,11 @@ add_router_lb_flow(struct hmap *lflows, struct
> >>ovn_datapath *od,
> >>  break;
> >>  }
> >>-    ds_put_format(_match, "(ip4.src == %s", ip_address);
> >>+    if (addr_family_ == AF_INET) {
> >>+    ds_put_format(_match, "(ip4.src == %s", ip_address);
> >>+    } else {
> >>+    ds_put_format(_match, "(ip6.src == %s", ip_address);
> >>+    }
> >>  free(ip_address);
> >>  if (port) {
> >>  ds_put_format(_match, " && %s.src == %d) || ",
> >>
> >
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH 2/3] Partially revert "m4: Add hard requirements for python in "configure""

2018-07-10 Thread Ben Pfaff

On Thu, Jun 14, 2018 at 01:50:16PM -0700, Ben Pfaff wrote:
> On Thu, May 31, 2018 at 04:52:41PM +0200, Timothy Redaelli wrote:
> > This partially reverts commit d53d2f4b84cc77868f2a55d66417fe6058d0.
> > 
> > Python2 check for "six.moves.range" is not reverted.
> > 
> > Signed-off-by: Timothy Redaelli 
> 
> Thanks for the patch.
> 
> The commit message doesn't make it clear to me the reason for the
> patch.  Can you add a sentence or two?

I guess this series is still under review, do you want it for 2.10?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] bridge: Clean leaking netdevs when route is added.

2018-07-10 Thread Ben Pfaff

On Thu, Jun 21, 2018 at 06:39:16PM +0100, Tiago Lam wrote:
> When adding a route to a bridge, by executing "$appctl ovs/route/add
> $IP/$MASK $BR", a reference to the existing netdev is taken and stored
> in an instantiated ip_dev struct which is then stored in an addr_list
> list in tnl-ports.c. When OvS is signaled to exit, as a result of a
> "$appctl $OVS_PID exit --cleanup", for example, the bridge takes care of
> destroying its allocated port and iface structs. While destroying and
> freeing an iface, the netdev associated with it is also destroyed.
> However, for this to happen its ref_cnt must be 0.  Otherwise the
> destructor of the netdev (specific to each datapath) won't be called. On
> the userspace datapath this means a system interface, such as "br0",
> wouldn't get deleted upon exit of OvS (when a route happens to be
> assocaited).
> 
> This was first observed in the "ptap - triangle bridge setup with L2 and
> L3 GRE tunnels" test, which runs as part of the system userspace
> testsuite and uses the netdev datapath (as opoosed to several tests
> which use the dummy datapath, where this issue isn't seen). The test
> would pass every other time and fail the rest of the times because the
> needed system interfaces (br-p1, br-p2 and br-p3) were already present
> (from the previous successfull run which didn't clean up properly),
> leading to a failure.
> 
> To fix the leak and clean up the interfaces upon exit, on its final
> stage before destroying a netdev, in iface_destroy__(), the bridge calls
> tnl_port_map_delete_ipdev() which takes care of freeing the instatiated
> ip_dev structs that refer to a specific netdev.
> 
> An extra test is also introduced which verifies that the resources used
> by OvS netdev datapath have been correctly cleaned up between
> OVS_TRAFFIC_VSWITCHD_STOP and AT_CLEANUP.
> 
> Signed-off-by: Tiago Lam 

I don't fully understand this one, but the commit message is so detailed
and the patch is so short, so I applied it to master.  If it needs
backporting, please let me know.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] 答复: [bug] ovs crash when call xlate_normal_flood

2018-07-10 Thread Ben Pfaff

Hmm, that's interesting.  What was the problem, then?

On Fri, Jun 29, 2018 at 01:14:38AM +, Linhaifeng wrote:
> Sorry.This is not ovs‘s bug.
> 
> 发件人: Linhaifeng
> 发送时间: 2018年6月21日 13:54
> 收件人: 'd...@openvswitch.org' 
> 主题: 答复: [bug] ovs crash when call xlate_normal_flood
> 
> Or stop upcall thread before xlate_remove_ofproto in destruct.
> 
> 发件人: Linhaifeng
> 发送时间: 2018年6月21日 13:49
> 收件人: 'd...@openvswitch.org' 
> mailto:d...@openvswitch.org>>
> 主题: [bug] ovs crash when call xlate_normal_flood
> 
> Should we use rwlock to use xbridge->xbundles? xbridge->xbundles may write in 
> main thread and read in upcall thread.
> 
> The crash stack as follow:
> #5  0x004a04e8 in xlate_normal_flood (ctx=ctx@entry=0x7faeee7b6f30, 
> in_xbundle=in_xbundle@entry=0x553d950, vlan=vlan@entry=0) at 
> ofproto/ofproto_dpif_xlate.c:2509
> 2509   in ofproto/ofproto_dpif_xlate.c
> (gdb) info locals
> xbundle = 0xffe8
> (gdb) p xbundle->ofbundle
> Cannot access memory at address 0xfff8
> (gdb) p in_xbundle->ofbundle
> $5 = (struct ofbundle *) 0x0
> (gdb) p in_xbundle
> $6 = (struct xbundle *) 0x553d950
> (gdb) p *in_xbundle
> $7 = {hmap_node = {hash = 0, next = 0x0}, ofbundle = 0x0, list_node = {prev = 
> 0x0, next = 0x0}, xbridge = 0x0, xports = {prev = 0x0, next = 0x0}, name = 
> 0x0, bond = 0x0, lacp = 0x0, vlan_mode = PORT_VLAN_ACCESS, vlan = 0,
>   trunks = 0x0, use_priority_tags = false, floodable = false, protected = 
> false, external_id = 0}
> 
> (gdb) bt
> #0  0x7fb05ea85197 in raise () from /usr/lib64/libc.so.6
> #1  0x7fb05ea86888 in abort () from /usr/lib64/libc.so.6
> #2  0x0065e1e9 in PAT_abort ()
> #3  0x0065b32d in patchIllInsHandler ()
> #4  
> #5  0x004a04e8 in xlate_normal_flood (ctx=ctx@entry=0x7faeee7b6f30, 
> in_xbundle=in_xbundle@entry=0x553d950, vlan=vlan@entry=0) at 
> ofproto/ofproto_dpif_xlate.c:2509
> #6  0x004a0e6e in xlate_normal (ctx=0x7faeee7b6f30) at 
> ofproto/ofproto_dpif_xlate.c:2736
> #7  xlate_output_action (ctx=ctx@entry=0x7faeee7b6f30, port=, 
> max_len=, may_packet_in=may_packet_in@entry=true) at 
> ofproto/ofproto_dpif_xlate.c:4312
> #8  0x0049d240 in do_xlate_actions (ofpacts=, 
> ofpacts_len=, ctx=ctx@entry=0x7faeee7b6f30) at 
> ofproto/ofproto_dpif_xlate.c:5262
> #9  0x0049e410 in xlate_recursively (deepens=true, rule=0x6a4be00, 
> ctx=0x7faeee7b6f30) at ofproto/ofproto_dpif_xlate.c:3587
> #10 xlate_table_action (ctx=0x7faeee7b6f30, in_port=, 
> table_id=, may_packet_in=, 
> honor_table_miss=) at ofproto/ofproto_dpif_xlate.c:3654
> #11 0x0049faf7 in compose_output_action__ 
> (ctx=ctx@entry=0x7faeee7b6f30, ofp_port=, xr=, 
> check_stp=check_stp@entry=true) at ofproto/ofproto_dpif_xlate.c:3317
> #12 0x004a0126 in compose_output_action (xr=, 
> ofp_port=, ctx=0x7faeee7b6f30) at 
> ofproto/ofproto_dpif_xlate.c:3567
> #13 output_normal (ctx=ctx@entry=0x7faeee7b6f30, 
> out_xbundle=out_xbundle@entry=0x3005770, vlan=vlan@entry=0) at 
> ofproto/ofproto_dpif_xlate.c:2113
> #14 0x004a0511 in xlate_normal_flood (ctx=ctx@entry=0x7faeee7b6f30, 
> in_xbundle=in_xbundle@entry=0x3008a40, vlan=vlan@entry=0) at 
> ofproto/ofproto_dpif_xlate.c:2517
> #15 0x004a0e6e in xlate_normal (ctx=0x7faeee7b6f30) at 
> ofproto/ofproto_dpif_xlate.c:2736
> #16 xlate_output_action (ctx=ctx@entry=0x7faeee7b6f30, port=, 
> max_len=, may_packet_in=may_packet_in@entry=true) at 
> ofproto/ofproto_dpif_xlate.c:4312
> #17 0x0049d240 in do_xlate_actions (ofpacts=, 
> ofpacts_len=, ctx=ctx@entry=0x7faeee7b6f30) at 
> ofproto/ofproto_dpif_xlate.c:5262
> #18 0x0049e410 in xlate_recursively (deepens=true, rule=0x6a32500, 
> ctx=0x7faeee7b6f30) at ofproto/ofproto_dpif_xlate.c:3587
> #19 xlate_table_action (ctx=0x7faeee7b6f30, in_port=, 
> table_id=, may_packet_in=, 
> honor_table_miss=) at ofproto/ofproto_dpif_xlate.c:3654
> #20 0x0049faf7 in compose_output_action__ 
> (ctx=ctx@entry=0x7faeee7b6f30, ofp_port=, xr=, 
> check_stp=check_stp@entry=true) at ofproto/ofproto_dpif_xlate.c:3317
> #21 0x004a0126 in compose_output_action (xr=, 
> ofp_port=, ctx=0x7faeee7b6f30) at 
> ofproto/ofproto_dpif_xlate.c:3567
> #22 output_normal (ctx=ctx@entry=0x7faeee7b6f30, 
> out_xbundle=out_xbundle@entry=0x93e3e10, vlan=vlan@entry=0) at 
> ofproto/ofproto_dpif_xlate.c:2113
> #23 0x004a0511 in xlate_normal_flood (ctx=ctx@entry=0x7faeee7b6f30, 
> in_xbundle=in_xbundle@entry=0x894de90, vlan=vlan@entry=0) at 
> ofproto/ofproto_dpif_xlate.c:2517
> #24 0x004a0e6e in xlate_normal (ctx=0x7faeee7b6f30) at 
> ofproto/ofproto_dpif_xlate.c:2736
> #25 xlate_output_action (ctx=ctx@entry=0x7faeee7b6f30, port=, 
> max_len=, may_packet_in=may_packet_in@entry=true) at 
> ofproto/ofproto_dpif_xlate.c:4312
> #26 0x0049d240 in do_xlate_actions (ofpacts=, 
> ofpacts_len=, ctx=ctx@entry=0x7faeee7b6f30) at 
> ofproto/ofproto_dpif_xlate.c:5262
> #27 0x004a27df in xlate_actions

Re: [ovs-dev] ovs-vswitchd 2.4.1 scale >10K add/delete flows 100% cpu

2018-07-10 Thread Ben Pfaff

On Thu, Jul 05, 2018 at 01:44:33PM -0700, Ravi Kerur wrote:
> During scale flow add/delete (>10K), I am seeing ovs-vswitchd cpu usage
> spike to 100% and stay there without any sign of returning to normal cpu
> usage. It's normal OVS 2.4.1 and no DPDK involved. I am trying to get
> 'perf' working which might help in isolating the problem. In the meantime I
> would like to understand following things
> 
> (1) Recommended system configuration i.e. core allocation, memory,
> hugepages, ...
> (2) Published scale numbers for 2.4.1
> (3) Known performance issues with 2.4.1
> (4) Debugs to collect

2.4.x is really old.  It was released in 2015.  I doubt anyone is going
to be able to help you with it.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [ovs-discuss] a question about ovs crash relationship with learn action

2018-07-10 Thread Ben Pfaff

Can you explain further?  Throwing away the cache entries makes the
cache less useful.

Are you really using v2.7.0?  None of the line numbers match up, either
in the backtrace or in the patch.

On Tue, Jun 26, 2018 at 10:38:50AM +, wangyunjian wrote:
> I think the function xlate_cache_clear() needs be called after 
> xlate_push_stats() to
> avoid xcache->entries use-after-free.
> 
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 85f5792..71b846c 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -2252,6 +2252,7 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key 
> *ukey,
>  /* Stats for deleted flows will be attributed upon flow deletion. Skip. 
> */
>  if (result != UKEY_DELETE) {
>  xlate_push_stats(ukey->xcache, );
> +xlate_cache_clear(ukey->xcache);
>  ukey->stats = *stats;
>  ukey->reval_seq = reval_seq;
>  }
> 
> > From: wangyunjian 
> > Sent: Monday, June 25, 2018 2:19 PM
> > To: ovs-disc...@openvswitch.org
> > Cc: 'b...@ovn.org' ; Lilijun (Jerry, Cloud Networking) 
> > 
> > Subject: [ovs-discuss] a question about ovs crash relationship with learn 
> > action
> > 
> > I'm running OVS 2.7.0 on a Linux 3.10.0 kernel. I found a ovs crash.
> > I doubt it's caused by use-after-free set match->flow = NULL in
> > minimatch_destroy function with following stack:
> > 
> > (gdb) bt
> > #0  0x7ff273b71197 in raise () from /usr/lib64/libc.so.6
> > #1  0x7ff273b72888 in abort () from /usr/lib64/libc.so.6
> > #2  0x00787289 in PAT_abort ()
> > #3  0x007843cd in patchIllInsHandler ()
> > #4  
> > #5  0x004cbfae in miniflow_n_values (flow=0x0) at lib/flow.h:540
> > #6  0x004cc95f in minimask_hash (mask=0x0, basis=0) at 
> > lib/classifier-private.h:321
> > #7  0x004cf613 in find_subtable (cls=0x38ad6e8, mask=0x0) at 
> > lib/classifier.c:1406
> > #8  0x004cefa7 in classifier_find_rule_exactly (cls=0x38ad6e8, 
> > target=0x7ff118025500, version=18446744073709551615) at 
> > lib/classifier.c:1178
> > #9  0x0047bcaf in collect_rules_strict (ofproto=0x389bc30, 
> > criteria=0x7ff1180254f8, rules=0x7ff118025588) at ofproto/ofproto.c:4253
> > #10 0x0047eba3 in modify_flow_start_strict (ofproto=0x389bc30, 
> > ofm=0x7ff1180254f0) at ofproto/ofproto.c:5492
> > #11 0x00482c9f in ofproto_flow_mod_start (ofproto=0x389bc30, 
> > ofm=0x7ff1180254f0) at ofproto/ofproto.c:7506
> > #12 0x0047dc01 in ofproto_flow_mod_learn_start (ofm=0x7ff1180254f0) 
> > at ofproto/ofproto.c:5088
> > #13 0x0047dd4b in ofproto_flow_mod_learn (ofm=0x7ff1180254f0, 
> > keep_ref=true) at ofproto/ofproto.c:5140
> > #14 0x004b55d4 in xlate_push_stats_entry (entry=0x7ff118015148, 
> > stats=0x7ff11d6675f0) at ofproto/ofproto-dpif-xlate-cache.c:130
> > #15 0x004b57b6 in xlate_push_stats (xcache=0x7ff1180254a0, 
> > stats=0x7ff11d6675f0) at ofproto/ofproto-dpif-xlate-cache.c:183
> > #16 0x004a312f in revalidate_ukey (udpif=0x38a5260, 
> > ukey=0x7ff0fc015910, stats=0x7ff11d668260, odp_actions=0x7ff11d66a3d0, 
> > reval_seq=25145760, recircs=0x7ff11d66a3b0) at 
> > ofproto/ofproto-dpif-upcall.c:2134
> > #17 0x004a3d76 in revalidate (revalidator=0x4cdda08) at 
> > ofproto/ofproto-dpif-upcall.c:2428
> > #18 0x004a0528 in udpif_revalidator (arg=0x4cdda08) at 
> > ofproto/ofproto-dpif-upcall.c:954
> > #19 0x0058f811 in ovsthread_wrapper (aux_=0x55088a0) at 
> > lib/ovs-thread.c:682
> > #20 0x7ff27549adc5 in start_thread () from /usr/lib64/libpthread.so.0
> > 
> > Any idea about this? 
> > Thanks,
> > Yunjian
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] Creation of network using ovs in Docker

2018-07-10 Thread Sandeep Adapala

Hello Guru,

Now I have installed DPDK and OVS both of them are compatible with each
other but now I got stuck at new place. I know you are not very familiar
with DPDK but can you add someone who can help me out in that.

I have configured the OVS to use dpdk library using

./configure --with-dpdk=$DPDK_BUILD

then i built it and started it the same way as mentioned in this document

http://docs.openvswitch.org/en/latest/intro/install/general/#general-building

but when i run ovs-vsctl get Open_vSwitch . dpdk_initialized i dont see
dpdk running


Regards,
Sandeep
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v2] ovs-ofctl: New helper command "parse-packet".

2018-07-10 Thread Ben Pfaff

This was useful for testing commit 4fe080160685 ("flow: Fix buffer overread
for crafted IPv6 packets.").

Signed-off-by: Ben Pfaff 
---
v1->v2: Updated commit message.

 utilities/ovs-ofctl.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/utilities/ovs-ofctl.c b/utilities/ovs-ofctl.c
index 0cd0fcb63e4b..ee08178d8fff 100644
--- a/utilities/ovs-ofctl.c
+++ b/utilities/ovs-ofctl.c
@@ -4802,6 +4802,24 @@ ofctl_compose_packet(struct ovs_cmdl_context *ctx)
 }
 }
 
+/* "parse-packet" reads an Ethernet packet from stdin and prints it out its
+ * extracted flow fields. */
+static void
+ofctl_parse_packet(struct ovs_cmdl_context *ctx OVS_UNUSED)
+{
+char packet[65535];
+ssize_t size = read(STDIN_FILENO, packet, sizeof packet);
+if (size < 0) {
+ovs_fatal(errno, "failed to read packet from stdin");
+}
+
+/* Make a copy of the packet in allocated memory to better allow Valgrind
+ * and Address Sanitizer to catch out-of-range access. */
+void *packet_copy = xmemdup(packet, size);
+ofp_print_packet(stdout, packet_copy, size, 0);
+free(packet_copy);
+}
+
 static const struct ovs_cmdl_command all_commands[] = {
 { "show", "switch",
   1, 1, ofctl_show, OVS_RO },
@@ -4936,6 +4954,7 @@ static const struct ovs_cmdl_command all_commands[] = {
 { "encode-hello", NULL, 1, 1, ofctl_encode_hello, OVS_RW },
 { "parse-key-value", NULL, 1, INT_MAX, ofctl_parse_key_value, OVS_RW },
 { "compose-packet", NULL, 1, 2, ofctl_compose_packet, OVS_RO },
+{ "parse-packet", NULL, 0, 0, ofctl_parse_packet, OVS_RO },
 
 { NULL, NULL, 0, 0, NULL, OVS_RO },
 };
-- 
2.16.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH 1/2] ovs python: ovs.stream.open_block() returns success even if the remote is unreachable

2018-07-10 Thread Ben Pfaff

On Wed, Jul 11, 2018 at 12:56:39AM +0530, nusid...@redhat.com wrote:
> From: Numan Siddique 
> 
> The python function ovs.socket_util.check_connection_completion() uses 
> select()
> (provided by python) to monitor the socket file descriptor. The select()
> returns 1 when the file descriptor becomes ready. For error cases like -
> 111 (Connection refused) and 113 (No route to host) (POLLERR), 
> ovs.poller._SelectSelect.poll()
> expects the exceptfds list to be set by select(). But that is not the case.
> As per the select() man page, writefds list will be set for POLLERR.
> Please see "Correspondence between select() and poll() notifications" section 
> of select(2)
> man page.
> 
> Because of this behavior, ovs.socket_util.check_connection_completion() 
> returns success
> even if the remote is unreachable or not listening on the port.
> 
> This patch fixes this issue by adding a wrapper function - 
> check_connection_completion_status()
> which calls sock.connect_ex() to get the status of the connection if
> ovs.socket_util.check_connection_completion() returns success.
> 
> The test cases added fails without the fix in this patch.
> 
> Signed-off-by: Numan Siddique 

Can we just code check_connection_completion() like we do in C, by
directly using select() on Windows and poll() everywhere else?  The
approach in this patch seems a little indirect to me.

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2 3/3] OVN: add protocol unreachable support to OVN router ports

2018-07-10 Thread Ben Pfaff

This got fixed, right?

On Fri, Jun 29, 2018 at 02:59:39PM +0200, Daniel Alvarez Sanchez wrote:
> Yes, let's hope we can get it in soon... expecting an v3 from Darrell
> apparently.
> Thanks!
> 
> On Fri, Jun 29, 2018 at 2:15 PM Lucas Alvares Gomes 
> wrote:
> 
> >  Hi,
> >
> > > this should be the same issue reported by Darrell ('ovn: Fix gateway
> > > load balancing')
> > > Regards,
> > >
> >
> > Yeah that's right, I have applied Darrell's patch [0] and re-run the
> > same test that myself and Daniel were debugging [1] and I can confirm
> > that it works now.
> >
> > Thanks for the pointers!
> >
> > [0] http://patchwork.ozlabs.org/patch/935938/
> > [1]
> > https://github.com/openstack/neutron-tempest-plugin/blob/8bc66e3205b834e17e9a9e6b72b6203a7a02cada/neutron_tempest_plugin/scenario/test_floatingip.py#L180-L200
> >
> > Cheers,
> > Lucas
> >
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] conntrack: Fix using alg_exp_entry out of scope.

2018-07-10 Thread Ben Pfaff

Thanks Ilya and Darrell.  I applied this to master and backported as far
as branch-2.8.

On Tue, Jul 10, 2018 at 09:17:42AM -0700, Darrell Ball wrote:
> Good, thanks.
> 
> Acked-by: Darrell Ball 
> 
> This needs backporting to 2.8.
> 
> I re-ran some SA tools and found some other potential use after free bugs.
> 
> 
> On Tue, Jul 10, 2018 at 4:05 AM, Ilya Maximets 
> wrote:
> 
> > 'alg_exp_entry' is allocated on stack memory, but could be used via
> > 'alg_exp' pointer inside 'write_ct_md' function, i.e. outside its scope.
> >
> > CC: Darrell Ball 
> > Fixes: bd5e81a0e596 ("Userspace Datapath: Add ALG infra and FTP.")
> > Signed-off-by: Ilya Maximets 
> > ---
> >
> >  lib/conntrack.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/lib/conntrack.c b/lib/conntrack.c
> > index 97fd46a..51c1acb 100644
> > --- a/lib/conntrack.c
> > +++ b/lib/conntrack.c
> > @@ -1246,9 +1246,9 @@ process_one(struct conntrack *ct, struct dp_packet
> > *pkt,
> >  }
> >
> >  const struct alg_exp_node *alg_exp = NULL;
> > +struct alg_exp_node alg_exp_entry;
> >
> >  if (OVS_UNLIKELY(create_new_conn)) {
> > -struct alg_exp_node alg_exp_entry;
> >
> >  ct_rwlock_rdlock(>resources_lock);
> >  alg_exp = expectation_lookup(>alg_expectations, >key,
> > --
> > 2.7.4
> >
> >
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [patch v1] conntrack: Fix conn_update_state_alg use after free.

2018-07-10 Thread Ben Pfaff

On Tue, Jul 10, 2018 at 09:15:09AM -0700, Darrell Ball wrote:
> Fixes: bd5e81a0e596 ("Userspace Datapath: Add ALG infra and FTP.")
> Signed-off-by: Darrell Ball 

This could use a little longer commit message.  Is the following
accurate:

When conn_update_state() returns true, it has freed 'conn' which
therefore should not be passed into handle_ftp_ctl().

Thanks,

Ben.

> ---
> 
> Needs backporting as far back as 2.8.
> 
>  lib/conntrack.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index e1c1f2e..b818584 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -1159,8 +1159,11 @@ conn_update_state_alg(struct conntrack *ct, struct 
> dp_packet *pkt,
>  } else {
>  *create_new_conn = conn_update_state(ct, pkt, ctx, , now,
>  bucket);
> -handle_ftp_ctl(ct, ctx, pkt, conn, now, CT_FTP_CTL_OTHER,
> -   !!nat_action_info);
> +
> +if (*create_new_conn == false) {
> +handle_ftp_ctl(ct, ctx, pkt, conn, now, CT_FTP_CTL_OTHER,
> +   !!nat_action_info);
> +}
>  }
>  return true;
>  }
> -- 
> 1.9.1
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2 1/2] ovs python: ovs.stream.open_block() returns success even if the remote is unreachable

2018-07-10 Thread Ben Pfaff

On Tue, Jul 10, 2018 at 11:28:16PM +0530, Numan Siddique wrote:
> On Mon, Jul 9, 2018 at 11:36 PM Ben Pfaff  wrote:
> 
> > On Sun, Jul 08, 2018 at 10:05:41PM +0530, nusid...@redhat.com wrote:
> > > From: Numan Siddique 
> > >
> > > Calling ovs.stream.open_block(ovs.stream.open("tcp:127.0.0.1:6641"))
> > returns
> > > success even if there is no server listening on 6641. To check if the
> > connection
> > > is established or not, Stream class makes use of
> > ovs.socket_util.check_connection_completion().
> > > This function returns zero if the select for the socket fd signals. It
> > doesn't
> > > really check if the connection was established or not.
> > >
> > > This patch fixes this issue by adding a wrapper function -
> > check_connection_completion_status()
> > > which calls sock.connect_ex() to get the status of the connection if
> > > ovs.socket_util.check_connection_completion() returns success.
> > >
> > > The test cases added fails without the fix in this patch.
> > >
> > > Signed-off-by: Numan Siddique 
> >
> > I don't understand the problem here.  I mean, I believe when you say
> > there is a problem, but the cause doesn't really make sense to me.  The
> > code for check_connection_completion in socket_util.py looks correct to
> > me and equivalent to the C implementation in socket-util.c.  Do you have
> > an idea of why it doesn't work properly?  (Is it somehow specific to
> > Python?)
> >
> > I don't think we have an equivalent test for the C version.  Does it
> > pass, or does it need a similar change?
> >
> >
> The C implementation works fine. The function stream_open_block() returns 0
> when connected and a proper errno when it fails. I see the problem only in
> python implementation.
> 
> In C implementation, poll() is used. Suppose if I give remote as tcp:
> 127.0.0.1:6641
> and there is no server listening on 6641, poll() returns 1 and POLLERR bit
> is
> set in struct pollfd's revents (
> https://github.com/openvswitch/ovs/blob/master/lib/socket-util.c#L291)
> 
> In python implementation, select() is used. For the same remote "tcp:
> 127.0.0.1:6641",
> select() is returning 1, but 'ovs.poller.POLLERR' is not set in revents
> (
> https://github.com/openvswitch/ovs/blob/master/python/ovs/socket_util.py#L180
> ).
> So the python function check_connection_completion() returns 0, which is
> false positive.
> I also tested by adding below line in check_connection_completion() but
> still I see the
> same behavior.
> 
> 
> diff --git a/python/ovs/socket_util.py b/python/ovs/socket_util.py
> index 91f4532ea..01dc0af2e 100644
> --- a/python/ovs/socket_util.py
> +++ b/python/ovs/socket_util.py
> @@ -174,6 +174,8 @@ def check_connection_completion(sock):
>  p.register(event, ovs.poller.POLLOUT)
>  else:
>  p.register(sock, ovs.poller.POLLOUT)
> +p.register(sock, ovs.poller.POLLERR)
> +
>  pfds = p.poll(0)
>  if len(pfds) == 1:
>  revents = pfds[0][1]
> *
> 
> As per the select(2) man page,
> 
> *
> Correspondence between select() and poll() notifications
>Within  the  Linux  kernel  source,  we find the following
> definitions which show the correspondence between the readable, writable,
> and exceptional condition notifications of select() and the event
>notifications provided by poll(2) (and epoll(7)):
> 
>#define POLLIN_SET (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP |
>POLLERR)
>   /* Ready for reading */
>#define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR)
>   /* Ready for writing */
>#define POLLEX_SET (POLLPRI)
>   /* Exceptional condition */
> ***
> 
> To detect POLLERR, the python implementation is looking into the exceptfds
> list returned by select. But looks like exceptfds list will be set only for
> POLLPRI.

OK.  I didn't know about that difference between select() and poll().

Should we just make check_connection_completion() use select.poll()
directly (except on Windows)?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] ovn-nbctl: Fix compilation warnings.

2018-07-10 Thread Ben Pfaff

On Tue, Jul 10, 2018 at 07:46:55PM +0100, Ian Stokes wrote:
> This commit fixes 'maybe-uninitialized' warnings for pointers in various
> functions in ovn-nbctl when compiling with gcc 6.3.1 and -Werror.
> Pointers to structs nbrec_logical_switch, nbrec_logical_switch_port,
> nbrec_logical_router and nbrec_logical_router_port are now initialized
> to NULL where required.
> 
> Cc: Justin Pettit 
> Cc: Venkata Anil 
> Fixes: 31114af758c7 ("ovn-nbctl: Update logical router port commands.")
> Fixes: 80f408f4cffb ("ovn: Use Logical_Switch_Port in NB.")
> Fixes: 36f232bca2db ("ovn: l3ha, CLI for logical router port gateway
>   chassis")
> Signed-off-by: Ian Stokes 

Does the following patch also fix the problem?
https://patchwork.ozlabs.org/patch/942179/
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] ovn-nbctl: Always initialize output arguments in *_by_name_or_uuid().

2018-07-10 Thread Ben Pfaff

This makes it easier to reason about the outputs, and fixes warnings for
GCC 6.3.x.

Cc: Venkata Anil 
Fixes: 31114af758c7 ("ovn-nbctl: Update logical router port commands.")
Fixes: 80f408f4cffb ("ovn: Use Logical_Switch_Port in NB.")
Fixes: 36f232bca2db ("ovn: l3ha, CLI for logical router port gateway chassis")
Signed-off-by: Ben Pfaff 
---
 ovn/utilities/ovn-nbctl.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
index a98eacf170de..47df19b231b4 100644
--- a/ovn/utilities/ovn-nbctl.c
+++ b/ovn/utilities/ovn-nbctl.c
@@ -504,6 +504,7 @@ lr_by_name_or_uuid(struct ctl_context *ctx, const char *id,
 bool is_uuid = false;
 struct uuid lr_uuid;
 
+*lr_p = NULL;
 if (uuid_from_string(_uuid, id)) {
 is_uuid = true;
 lr = nbrec_logical_router_get_for_uuid(ctx->idl, _uuid);
@@ -529,9 +530,7 @@ lr_by_name_or_uuid(struct ctl_context *ctx, const char *id,
  id, is_uuid ? "UUID" : "name");
 }
 
-if (lr_p) {
-*lr_p = lr;
-}
+*lr_p = lr;
 return NULL;
 }
 
@@ -540,6 +539,7 @@ ls_by_name_or_uuid(struct ctl_context *ctx, const char *id, 
bool must_exist,
const struct nbrec_logical_switch **ls_p)
 {
 const struct nbrec_logical_switch *ls = NULL;
+*ls_p = NULL;
 
 struct uuid ls_uuid;
 bool is_uuid = uuid_from_string(_uuid, id);
@@ -567,9 +567,7 @@ ls_by_name_or_uuid(struct ctl_context *ctx, const char *id, 
bool must_exist,
  id, is_uuid ? "UUID" : "name");
 }
 
-if (ls_p) {
-*ls_p = ls;
-}
+*ls_p = ls;
 return NULL;
 }
 
@@ -925,6 +923,7 @@ lsp_by_name_or_uuid(struct ctl_context *ctx, const char *id,
 const struct nbrec_logical_switch_port **lsp_p)
 {
 const struct nbrec_logical_switch_port *lsp = NULL;
+*lsp_p = NULL;
 
 struct uuid lsp_uuid;
 bool is_uuid = uuid_from_string(_uuid, id);
@@ -945,9 +944,7 @@ lsp_by_name_or_uuid(struct ctl_context *ctx, const char *id,
  id, is_uuid ? "UUID" : "name");
 }
 
-if (lsp_p) {
-*lsp_p = lsp;
-}
+*lsp_p = lsp;
 return NULL;
 }
 
@@ -3043,7 +3040,8 @@ nbctl_lr_nat_add(struct ctl_context *ctx)
 }
 
 logical_port = ctx->argv[5];
-error = lsp_by_name_or_uuid(ctx, logical_port, true, NULL);
+const struct nbrec_logical_switch_port *lsp;
+error = lsp_by_name_or_uuid(ctx, logical_port, true, );
 if (error) {
 ctx->error = error;
 free(new_logical_ip);
@@ -3229,6 +3227,7 @@ lrp_by_name_or_uuid(struct ctl_context *ctx, const char 
*id, bool must_exist,
 const struct nbrec_logical_router_port **lrp_p)
 {
 const struct nbrec_logical_router_port *lrp = NULL;
+*lrp_p = NULL;
 
 struct uuid lrp_uuid;
 bool is_uuid = uuid_from_string(_uuid, id);
@@ -3249,9 +3248,7 @@ lrp_by_name_or_uuid(struct ctl_context *ctx, const char 
*id, bool must_exist,
  id, is_uuid ? "UUID" : "name");
 }
 
-if (lrp_p) {
-*lrp_p = lrp;
-}
+*lrp_p = lrp;
 return NULL;
 }
 
-- 
2.16.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH 0/3] IPsec support for tunneling

2018-07-10 Thread Ian Stokes


On 7/9/2018 5:54 PM, Ben Pfaff wrote:

On Thu, Jul 05, 2018 at 09:29:37PM +, Stokes, Ian wrote:

On Thu, Jul 05, 2018 at 09:29:12PM +0100, Ian Stokes wrote:

On 6/27/2018 6:58 PM, Qiuyu Xiao wrote:

This patch series reintroduce IPsec support for OVS tunneling and
adds new features to prepare for the OVN IPsec support. The new

features are:


1) Add CA-cert based authentication support to ovs-monitor-ipsec.
2) Enable ovs-pki to generate x.509 version 3 certificate.



Thanks for working on the series.

Just had a general query as regards IPsec in userspace.

I had previously looked at implementing a *rough* IPsec Tunnel
interface for userspace last year for OVS DPDK. I had put the work on
hold as DPDK has begun working on a general IPsec library which would
make implementation simpler and cleaner/simpler to maintain in the
future. Targeted for DPDK
18.11 (November this year).

Would the introduction of a specific IPsec tunnel interface still be
acceptable in light of this patch?

There are other libraries such as macsec that DPDK has libraries for
as well that could be introduced in the future for user space.

I'm just aware of the divergence of approaches between whats available
in kernel vs userspace so thought it was worth raising for discussion
at this point?


Qiuyu probably doesn't have the context for this so let me respond.

Ideally, I'd like to have a single IPsec tunnel configuration interface
that works well with all datapaths.  The one that Qiuyu is (re)introducing
works for the kernel datapath.  I don't know IPsec or DPDK well enough to
guess whether changes would be needed to better adapt it to a userspace
datapath.  Do you see weaknesses in that area?
It'd be great to get it right now, if we can.


Ok, Cc'ing Declan who is heading up the IPsec library for DPDK.

 From the userspace POV I guess we would have to do the IPsec
processing (encryption/decryption, SA lookup/selection/installation)
from when a packet is received on the datapath (if certs had not been
setup previously). This is why I had suggested using a new tunnel type
previously. The encap/decap action can be associated with the SA
actions ideally.


I don't understand yet why a new tunnel type is preferable.  Keep in
mind that it wouldn't be a single new tunnel type but a new tunnel type
per current tunnel type (gre_ipsec, vxlan_ipsec, stt_ipsec,
geneve_ipsec, ...).



I was thinking of an IPsec tunnel that uses esp mode, from feedback I 
had received previously on ipsec transport with vxlan it use was 
limited. The ipsec tunnel would not be specific for another tunnel 
encapsulation such as vxlan etc as those tunnel headers would be 
encapsulated within the IPsec payload in esp mode so would not be 
visible. the idea being once IPsec is decapsulated the packet would be 
recirculated, if the packet was vxlan or gre etc it would then 
decapsulated for that protocol.


I'll take sometime and look at Qiuyus patch in more detail to provide 
better feedback.


Ian

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH 2/2] python jsonrpc: Allow jsonrpc_session to have more than one remote.

2018-07-10 Thread nusiddiq

From: Numan Siddique 

Python IDL implementation doesn't have the support to connect to the
cluster dbs. This patch adds this support. We are still missing the
support in python idl class to connect to the cluster master. That
support will be added in an upcoming patch.

This patch is similar to the commit 8cf6bbb184 which added multiple remote
support in the C jsonrpc implementation.

Signed-off-by: Numan Siddique 
---
 python/ovs/db/idl.py  | 20 +++-
 python/ovs/jsonrpc.py | 39 ---
 tests/ovsdb-idl.at| 54 +++
 tests/test-ovsdb.py   | 13 ---
 4 files changed, 114 insertions(+), 12 deletions(-)

diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py
index 64eb1a886..d5bd21535 100644
--- a/python/ovs/db/idl.py
+++ b/python/ovs/db/idl.py
@@ -101,6 +101,9 @@ class Idl(object):
 ovs.jsonrpc.session.open().  The connection will maintain an in-memory
 replica of the remote database.
 
+'remote' can be comma separated multiple remotes and each remote
+should be in a form acceptable to ovs.jsonrpc.session.open().
+
 'schema_helper' should be an instance of the SchemaHelper class which
 generates schema for the remote database. The caller may have cut it
 down by removing tables or columns that are not of interest.  The IDL
@@ -127,7 +130,8 @@ class Idl(object):
 self.tables = schema.tables
 self.readonly = schema.readonly
 self._db = schema
-self._session = ovs.jsonrpc.Session.open(remote,
+remotes = self._parse_remotes(remote)
+self._session = ovs.jsonrpc.Session.open_multiple(remotes,
 probe_interval=probe_interval)
 self._monitor_request_id = None
 self._last_seqno = None
@@ -155,6 +159,20 @@ class Idl(object):
 table.condition = [True]
 table.cond_changed = False
 
+def _parse_remotes(self, remote):
+# If remote is -
+# "tcp:10.0.0.1:6641,unix:/tmp/db.sock,t,s,tcp:10.0.0.2:6642"
+# this function returns
+# ["tcp:10.0.0.1:6641", "unix:/tmp/db.sock,t,s", tcp:10.0.0.2:6642"]
+remotes = []
+for r in remote.split(','):
+r_length = len(remotes)
+if r.find(":") == -1 and r_length:
+remotes[r_length - 1] = remotes[r_length - 1] + "," + r
+else:
+remotes.append(r)
+return remotes
+
 def index_create(self, table, name):
 """Create a named multi-column index on a table"""
 return self.tables[table].rows.index_create(name)
diff --git a/python/ovs/jsonrpc.py b/python/ovs/jsonrpc.py
index 7c24e574a..4873cff98 100644
--- a/python/ovs/jsonrpc.py
+++ b/python/ovs/jsonrpc.py
@@ -14,6 +14,7 @@
 import codecs
 import errno
 import os
+import random
 import sys
 
 import ovs.json
@@ -368,12 +369,17 @@ class Connection(object):
 class Session(object):
 """A JSON-RPC session with reconnection."""
 
-def __init__(self, reconnect, rpc):
+def __init__(self, reconnect, rpc, remotes):
 self.reconnect = reconnect
 self.rpc = rpc
 self.stream = None
 self.pstream = None
 self.seqno = 0
+if type(remotes) != list:
+remotes = [remotes]
+self.remotes = remotes
+random.shuffle(self.remotes)
+self.next_remote = 0
 
 @staticmethod
 def open(name, probe_interval=None):
@@ -393,28 +399,38 @@ class Session(object):
 feature. If non-zero the value will be forced to at least 1000
 milliseconds. If None it will just use the default value in OVS.
 """
+return Session.open_multiple([name], probe_interval=probe_interval)
+
+@staticmethod
+def open_multiple(remotes, probe_interval=None):
 reconnect = ovs.reconnect.Reconnect(ovs.timeval.msec())
-reconnect.set_name(name)
+session = Session(reconnect, None, remotes)
+session.pick_remote()
 reconnect.enable(ovs.timeval.msec())
-
-if ovs.stream.PassiveStream.is_valid_name(name):
+reconnect.set_backoff_free_tries(len(remotes))
+if ovs.stream.PassiveStream.is_valid_name(reconnect.get_name()):
 reconnect.set_passive(True, ovs.timeval.msec())
 
-if not ovs.stream.stream_or_pstream_needs_probes(name):
+if not ovs.stream.stream_or_pstream_needs_probes(reconnect.get_name()):
 reconnect.set_probe_interval(0)
 elif probe_interval is not None:
 reconnect.set_probe_interval(probe_interval)
 
-return Session(reconnect, None)
+return session
 
 @staticmethod
 def open_unreliably(jsonrpc):
 reconnect = ovs.reconnect.Reconnect(ovs.timeval.msec())
+session = Session(reconnect, None, [jsonrpc.name])
 reconnect.set_quiet(True)
-reconnect.set_name(jsonrpc.name)
+session.pick_remote()

[ovs-dev] [PATCH 1/2] ovs python: ovs.stream.open_block() returns success even if the remote is unreachable

2018-07-10 Thread nusiddiq

From: Numan Siddique 

The python function ovs.socket_util.check_connection_completion() uses select()
(provided by python) to monitor the socket file descriptor. The select()
returns 1 when the file descriptor becomes ready. For error cases like -
111 (Connection refused) and 113 (No route to host) (POLLERR), 
ovs.poller._SelectSelect.poll()
expects the exceptfds list to be set by select(). But that is not the case.
As per the select() man page, writefds list will be set for POLLERR.
Please see "Correspondence between select() and poll() notifications" section 
of select(2)
man page.

Because of this behavior, ovs.socket_util.check_connection_completion() returns 
success
even if the remote is unreachable or not listening on the port.

This patch fixes this issue by adding a wrapper function - 
check_connection_completion_status()
which calls sock.connect_ex() to get the status of the connection if
ovs.socket_util.check_connection_completion() returns success.

The test cases added fails without the fix in this patch.

Signed-off-by: Numan Siddique 
---
 python/ovs/socket_util.py | 34 ++
 python/ovs/stream.py  | 16 +---
 tests/automake.mk |  1 +
 tests/ovsdb-idl.at| 16 
 tests/test-stream.py  | 32 
 5 files changed, 96 insertions(+), 3 deletions(-)
 create mode 100644 tests/test-stream.py

diff --git a/python/ovs/socket_util.py b/python/ovs/socket_util.py
index 403104936..91f4532ea 100644
--- a/python/ovs/socket_util.py
+++ b/python/ovs/socket_util.py
@@ -256,6 +256,40 @@ def inet_open_active(style, target, default_port, dscp):
 return get_exception_errno(e), None
 
 
+def check_connection_completion_status(sock, target, default_port):
+status = check_connection_completion(sock)
+if status:
+return status
+
+try:
+address = inet_parse_active(target, default_port)
+except ValueError:
+# It's not a valid tcp target.
+return status
+
+# For tcp connections, check_connection_completion function returns 0
+# when the select for the socket fd signals. But it doesn't really
+# verify the connection was established or not. So call connect again on
+# the socket to get the status.
+try:
+err = sock.connect_ex(address)
+except socket.error as e:
+err = get_exception_errno(e)
+if sys.platform == 'win32' and error == errno.WSAEWOULDBLOCK:
+# WSAEWOULDBLOCK would be the equivalent on Windows
+# for EINPROGRESS on Unix.
+err = err.EINPROGRESS
+
+if err == errno.EINPROGRESS or err == errno.EALREADY:
+return errno.EINPROGRESS
+
+if err == 0 or err == errno.EISCONN:
+return 0
+
+sock.close()
+return err
+
+
 def get_exception_errno(e):
 """A lot of methods on Python socket objects raise socket.error, but that
 exception is documented as having two completely different forms of
diff --git a/python/ovs/stream.py b/python/ovs/stream.py
index 5996497a5..7d5227469 100644
--- a/python/ovs/stream.py
+++ b/python/ovs/stream.py
@@ -119,6 +119,7 @@ class Stream(object):
   bInitialState=False)
 
 self.name = name
+self.suffix = name.split(":", 1)[1]
 if status == errno.EAGAIN:
 self.state = Stream.__S_CONNECTING
 elif status == 0:
@@ -191,8 +192,16 @@ class Stream(object):
 if error:
 return error, None
 else:
-status = ovs.socket_util.check_connection_completion(sock)
-return 0, cls(sock, name, status)
+err = ovs.socket_util.check_connection_completion_status(
+sock, suffix, 0)
+if err == errno.EAGAIN or err == errno.EINPROGRESS:
+status = errno.EAGAIN
+err = 0
+elif err == 0:
+status = 0
+else:
+status = err
+return err, cls(sock, name, status)
 
 @staticmethod
 def _open(suffix, dscp):
@@ -246,7 +255,8 @@ class Stream(object):
 
 def __scs_connecting(self):
 if self.socket is not None:
-retval = ovs.socket_util.check_connection_completion(self.socket)
+retval = ovs.socket_util.check_connection_completion_status(
+self.socket, self.suffix, 0)
 assert retval != errno.EINPROGRESS
 elif sys.platform == 'win32':
 if self.retry_connect:
diff --git a/tests/automake.mk b/tests/automake.mk
index 8224e5a4a..0abf29d47 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -421,6 +421,7 @@ CHECK_PYFILES = \
tests/test-l7.py \
tests/test-ovsdb.py \
tests/test-reconnect.py \
+   tests/test-stream.py \
tests/MockXenAPI.py \
tests/test-unix-socket.py \
tests/test-unixctl.py \
diff --git a/tests/ovsdb-idl.at

[ovs-dev] [PATCH v3 0/2] Partial cluster support in Python IDL client

2018-07-10 Thread nusiddiq

From: Numan Siddique 

Python IDL library is lacking the functionality to connect to the
clustered db servers by providing multiple remotes (like -
"tcp:10.0.0.1:6641, tcp:10.0.0.2:6641, tcp:10.0.0.3:6641") in the
connection string.

This patch adds this functionality to the python idl library.
It still lacks the feature to connect to the master of the cluster.
To add this
  - python idl client should monitor and read the '_Server' schema
  - connect to the master of the cluster.

I will submit the patch once that is ready. But for now I think this
is good enough for the clients to connect to the cluster dbs.


v2 -> v3

Addressed the review comments from Ben to parse the remote in
db/idl.py

v1 -> v2

Deleted the debug code which I forgot to cleanup when sending v1.

Numan Siddique (2):
  ovs python: ovs.stream.open_block() returns success even if the remote
is unreachable
  python jsonrpc: Allow jsonrpc_session to have more than one remote.

 python/ovs/db/idl.py  | 20 ++-
 python/ovs/jsonrpc.py | 39 +-
 python/ovs/socket_util.py | 34 +++
 python/ovs/stream.py  | 16 +++--
 tests/automake.mk |  1 +
 tests/ovsdb-idl.at| 70 +++
 tests/test-ovsdb.py   | 13 ++--
 tests/test-stream.py  | 32 ++
 8 files changed, 210 insertions(+), 15 deletions(-)
 create mode 100644 tests/test-stream.py

-- 
2.17.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] ovn-nbctl: Fix compilation warnings.

2018-07-10 Thread Ian Stokes

This commit fixes 'maybe-uninitialized' warnings for pointers in various
functions in ovn-nbctl when compiling with gcc 6.3.1 and -Werror.
Pointers to structs nbrec_logical_switch, nbrec_logical_switch_port,
nbrec_logical_router and nbrec_logical_router_port are now initialized
to NULL where required.

Cc: Justin Pettit 
Cc: Venkata Anil 
Fixes: 31114af758c7 ("ovn-nbctl: Update logical router port commands.")
Fixes: 80f408f4cffb ("ovn: Use Logical_Switch_Port in NB.")
Fixes: 36f232bca2db ("ovn: l3ha, CLI for logical router port gateway
  chassis")
Signed-off-by: Ian Stokes 
---
 ovn/utilities/ovn-nbctl.c | 66 +++
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
index a98eacf..bca588a 100644
--- a/ovn/utilities/ovn-nbctl.c
+++ b/ovn/utilities/ovn-nbctl.c
@@ -885,7 +885,7 @@ nbctl_ls_del(struct ctl_context *ctx)
 {
 bool must_exist = !shash_find(>options, "--if-exists");
 const char *id = ctx->argv[1];
-const struct nbrec_logical_switch *ls;
+const struct nbrec_logical_switch *ls = NULL;
 
 char *error = ls_by_name_or_uuid(ctx, id, must_exist, );
 if (error) {
@@ -986,7 +986,7 @@ nbctl_lsp_add(struct ctl_context *ctx)
 {
 bool may_exist = shash_find(>options, "--may-exist") != NULL;
 
-const struct nbrec_logical_switch *ls;
+const struct nbrec_logical_switch *ls = NULL;
 char *error = ls_by_name_or_uuid(ctx, ctx->argv[1], true, );
 if (error) {
 ctx->error = error;
@@ -1113,7 +1113,7 @@ static void
 nbctl_lsp_del(struct ctl_context *ctx)
 {
 bool must_exist = !shash_find(>options, "--if-exists");
-const struct nbrec_logical_switch_port *lsp;
+const struct nbrec_logical_switch_port *lsp = NULL;
 
 char *error = lsp_by_name_or_uuid(ctx, ctx->argv[1], must_exist, );
 if (error) {
@@ -1170,7 +1170,7 @@ nbctl_lsp_list(struct ctl_context *ctx)
 static void
 nbctl_lsp_get_parent(struct ctl_context *ctx)
 {
-const struct nbrec_logical_switch_port *lsp;
+const struct nbrec_logical_switch_port *lsp = NULL;
 
 char *error = lsp_by_name_or_uuid(ctx, ctx->argv[1], true, );
 if (error) {
@@ -1184,7 +1184,7 @@ nbctl_lsp_get_parent(struct ctl_context *ctx)
 static void
 nbctl_lsp_get_tag(struct ctl_context *ctx)
 {
-const struct nbrec_logical_switch_port *lsp;
+const struct nbrec_logical_switch_port *lsp = NULL;
 
 char *error = lsp_by_name_or_uuid(ctx, ctx->argv[1], true, );
 if (error) {
@@ -1199,7 +1199,7 @@ static void
 nbctl_lsp_set_addresses(struct ctl_context *ctx)
 {
 const char *id = ctx->argv[1];
-const struct nbrec_logical_switch_port *lsp;
+const struct nbrec_logical_switch_port *lsp = NULL;
 
 char *error = lsp_by_name_or_uuid(ctx, id, true, );
 if (error) {
@@ -1229,7 +1229,7 @@ static void
 nbctl_lsp_get_addresses(struct ctl_context *ctx)
 {
 const char *id = ctx->argv[1];
-const struct nbrec_logical_switch_port *lsp;
+const struct nbrec_logical_switch_port *lsp = NULL;
 struct svec addresses;
 const char *mac;
 size_t i;
@@ -1254,7 +1254,7 @@ static void
 nbctl_lsp_set_port_security(struct ctl_context *ctx)
 {
 const char *id = ctx->argv[1];
-const struct nbrec_logical_switch_port *lsp;
+const struct nbrec_logical_switch_port *lsp = NULL;
 
 char *error = lsp_by_name_or_uuid(ctx, id, true, );
 if (error) {
@@ -1268,7 +1268,7 @@ static void
 nbctl_lsp_get_port_security(struct ctl_context *ctx)
 {
 const char *id = ctx->argv[1];
-const struct nbrec_logical_switch_port *lsp;
+const struct nbrec_logical_switch_port *lsp = NULL;
 struct svec addrs;
 const char *addr;
 size_t i;
@@ -1292,7 +1292,7 @@ static void
 nbctl_lsp_get_up(struct ctl_context *ctx)
 {
 const char *id = ctx->argv[1];
-const struct nbrec_logical_switch_port *lsp;
+const struct nbrec_logical_switch_port *lsp = NULL;
 
 char *error = lsp_by_name_or_uuid(ctx, id, true, );
 if (error) {
@@ -1323,7 +1323,7 @@ nbctl_lsp_set_enabled(struct ctl_context *ctx)
 {
 const char *id = ctx->argv[1];
 const char *state = ctx->argv[2];
-const struct nbrec_logical_switch_port *lsp;
+const struct nbrec_logical_switch_port *lsp = NULL;
 
 char *error = lsp_by_name_or_uuid(ctx, id, true, );
 if (error) {
@@ -1341,7 +1341,7 @@ static void
 nbctl_lsp_get_enabled(struct ctl_context *ctx)
 {
 const char *id = ctx->argv[1];
-const struct nbrec_logical_switch_port *lsp;
+const struct nbrec_logical_switch_port *lsp = NULL;
 
 char *error = lsp_by_name_or_uuid(ctx, id, true, );
 if (error) {
@@ -1356,7 +1356,7 @@ nbctl_lsp_set_type(struct ctl_context *ctx)
 {
 const char *id = ctx->argv[1];
 const char *type = ctx->argv[2];
-const struct nbrec_logical_switch_port *lsp;
+const struct nbrec_logical_switch_port *lsp = NULL;
 
 char *error = lsp_by_name_or_uuid(ctx, id,

Re: [ovs-dev] [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

2018-07-10 Thread Pravin Shelar

On Wed, Jul 4, 2018 at 7:23 AM, Matteo Croce  wrote:
> From: Stefano Brivio 
>
> Open vSwitch sends to userspace all received packets that have
> no associated flow (thus doing an "upcall"). Then the userspace
> program creates a new flow and determines the actions to apply
> based on its configuration.
>
> When a single port generates a high rate of upcalls, it can
> prevent other ports from dispatching their own upcalls. vswitchd
> overcomes this problem by creating many netlink sockets for each
> port, but it quickly exceeds any reasonable maximum number of
> open files when dealing with huge amounts of ports.
>
> This patch queues all the upcalls into a list, ordering them in
> a per-port round-robin fashion, and schedules a deferred work to
> queue them to userspace.
>
> The algorithm to queue upcalls in a round-robin fashion,
> provided by Stefano, is based on these two rules:
>  - upcalls for a given port must be inserted after all the other
>occurrences of upcalls for the same port already in the queue,
>in order to avoid out-of-order upcalls for a given port
>  - insertion happens once the highest upcall count for any given
>port (excluding the one currently at hand) is greater than the
>count for the port we're queuing to -- if this condition is
>never true, upcall is queued at the tail. This results in a
>per-port round-robin order.
>
> In order to implement a fair round-robin behaviour, a variable
> queueing delay is introduced. This will be zero if the upcalls
> rate is below a given threshold, and grows linearly with the
> queue utilisation (i.e. upcalls rate) otherwise.
>
> This ensures fairness among ports under load and with few
> netlink sockets.
>
Thanks for the patch.
This patch is adding following overhead for upcall handling:
1. kmalloc.
2. global spin-lock.
3. context switch to single worker thread.
I think this could become bottle neck on most of multi core systems.
You have mentioned issue with existing fairness mechanism, Can you
elaborate on those, I think we could improve that before implementing
heavy weight fairness in upcall handling.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2 1/2] ovs python: ovs.stream.open_block() returns success even if the remote is unreachable

2018-07-10 Thread Numan Siddique

On Mon, Jul 9, 2018 at 11:36 PM Ben Pfaff  wrote:

> On Sun, Jul 08, 2018 at 10:05:41PM +0530, nusid...@redhat.com wrote:
> > From: Numan Siddique 
> >
> > Calling ovs.stream.open_block(ovs.stream.open("tcp:127.0.0.1:6641"))
> returns
> > success even if there is no server listening on 6641. To check if the
> connection
> > is established or not, Stream class makes use of
> ovs.socket_util.check_connection_completion().
> > This function returns zero if the select for the socket fd signals. It
> doesn't
> > really check if the connection was established or not.
> >
> > This patch fixes this issue by adding a wrapper function -
> check_connection_completion_status()
> > which calls sock.connect_ex() to get the status of the connection if
> > ovs.socket_util.check_connection_completion() returns success.
> >
> > The test cases added fails without the fix in this patch.
> >
> > Signed-off-by: Numan Siddique 
>
> I don't understand the problem here.  I mean, I believe when you say
> there is a problem, but the cause doesn't really make sense to me.  The
> code for check_connection_completion in socket_util.py looks correct to
> me and equivalent to the C implementation in socket-util.c.  Do you have
> an idea of why it doesn't work properly?  (Is it somehow specific to
> Python?)
>
> I don't think we have an equivalent test for the C version.  Does it
> pass, or does it need a similar change?
>
>
The C implementation works fine. The function stream_open_block() returns 0
when connected and a proper errno when it fails. I see the problem only in
python implementation.

In C implementation, poll() is used. Suppose if I give remote as tcp:
127.0.0.1:6641
and there is no server listening on 6641, poll() returns 1 and POLLERR bit
is
set in struct pollfd's revents (
https://github.com/openvswitch/ovs/blob/master/lib/socket-util.c#L291)

In python implementation, select() is used. For the same remote "tcp:
127.0.0.1:6641",
select() is returning 1, but 'ovs.poller.POLLERR' is not set in revents
(
https://github.com/openvswitch/ovs/blob/master/python/ovs/socket_util.py#L180
).
So the python function check_connection_completion() returns 0, which is
false positive.
I also tested by adding below line in check_connection_completion() but
still I see the
same behavior.


diff --git a/python/ovs/socket_util.py b/python/ovs/socket_util.py
index 91f4532ea..01dc0af2e 100644
--- a/python/ovs/socket_util.py
+++ b/python/ovs/socket_util.py
@@ -174,6 +174,8 @@ def check_connection_completion(sock):
 p.register(event, ovs.poller.POLLOUT)
 else:
 p.register(sock, ovs.poller.POLLOUT)
+p.register(sock, ovs.poller.POLLERR)
+
 pfds = p.poll(0)
 if len(pfds) == 1:
 revents = pfds[0][1]
*

As per the select(2) man page,

*
Correspondence between select() and poll() notifications
   Within  the  Linux  kernel  source,  we find the following
definitions which show the correspondence between the readable, writable,
and exceptional condition notifications of select() and the event
   notifications provided by poll(2) (and epoll(7)):

   #define POLLIN_SET (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP |
   POLLERR)
  /* Ready for reading */
   #define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR)
  /* Ready for writing */
   #define POLLEX_SET (POLLPRI)
  /* Exceptional condition */
***

To detect POLLERR, the python implementation is looking into the exceptfds
list returned by select. But looks like exceptfds list will be set only for
POLLPRI.

Thanks
Numan

Thanks,
>
> Ben.
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] datapath: work around the single GRE receive limitation.

2018-07-10 Thread William Tu

Commit 9f57c67c379d ("gre: Remove support for sharing GRE protocol hook")
allows only single GRE packet receiver.  When upstream kernel's gre module
is loaded, the gre.ko exclusively becomes the only gre packet receiver,
preventing OVS kernel module from registering another gre receiver.

We can either try to unload the gre.ko by removing its dependencies,
or, in this patch, we try to register OVS as only the GRE transmit
portion when detecting there already exists another GRE receiver.

Signed-off-by: William Tu 
Cc: Greg Rose 
Cc: Yifeng Sun 
---
 datapath/linux/compat/ip_gre.c | 60 +-
 datapath/vport.c   | 12 ++---
 2 files changed, 56 insertions(+), 16 deletions(-)

diff --git a/datapath/linux/compat/ip_gre.c b/datapath/linux/compat/ip_gre.c
index 92de70127189..1ab798164894 100644
--- a/datapath/linux/compat/ip_gre.c
+++ b/datapath/linux/compat/ip_gre.c
@@ -71,6 +71,7 @@ static void erspan_build_header(struct sk_buff *skb,
bool truncate, bool is_ipv4);
 
 static struct rtnl_link_ops ipgre_link_ops __read_mostly;
+static bool ip_gre_loaded = false;
 
 #define ip_gre_calc_hlen rpl_ip_gre_calc_hlen
 static int ip_gre_calc_hlen(__be16 o_flags)
@@ -1640,25 +1641,57 @@ int rpl_ipgre_init(void)
int err;
 
err = register_pernet_device(_tap_net_ops);
-   if (err < 0)
-   goto pnet_tap_failed;
+   if (err < 0) {
+   if (err == -EEXIST)
+   goto ip_gre_loaded;
+   else
+   goto pnet_tap_failed;
+   }
 
err = register_pernet_device(_net_ops);
-   if (err < 0)
-   goto pnet_erspan_failed;
+   if (err < 0) {
+   if (err == -EEXIST)
+   goto ip_gre_loaded;
+   else
+   goto pnet_erspan_failed;
+   }
 
err = register_pernet_device(_net_ops);
-   if (err < 0)
-   goto pnet_ipgre_failed;
+   if (err < 0) {
+   if (err == -EEXIST)
+   goto ip_gre_loaded;
+   else
+   goto pnet_ipgre_failed;
+   }
 
err = gre_add_protocol(_protocol, GREPROTO_CISCO);
if (err < 0) {
pr_info("%s: can't add protocol\n", __func__);
-   goto add_proto_failed;
+   if (err == -EBUSY) {
+   goto ip_gre_loaded;
+   } else {
+   goto add_proto_failed;
+   }
}
 
pr_info("GRE over IPv4 tunneling driver\n");
-   
+   ovs_vport_ops_register(_ipgre_vport_ops);
+   ovs_vport_ops_register(_erspan_vport_ops);
+   return 0;
+
+ip_gre_loaded:
+   /* Since GRE only allows single receiver to be registerd,
+* we skip here so only gre transmit works, see:
+*
+* commit 9f57c67c379d88a10e8ad676426fee5ae7341b14
+* Author: Pravin B Shelar 
+* Date:   Fri Aug 7 23:51:52 2015 -0700
+* gre: Remove support for sharing GRE protocol hook
+*
+* OVS GRE receive part is disabled.
+*/
+   pr_info("GRE TX only over IPv4 tunneling driver\n");
+   ip_gre_loaded = true;
ovs_vport_ops_register(_ipgre_vport_ops);
ovs_vport_ops_register(_erspan_vport_ops);
return 0;
@@ -1678,10 +1711,13 @@ void rpl_ipgre_fini(void)
 {
ovs_vport_ops_unregister(_erspan_vport_ops);
ovs_vport_ops_unregister(_ipgre_vport_ops);
-   gre_del_protocol(_protocol, GREPROTO_CISCO);
-   unregister_pernet_device(_net_ops);
-   unregister_pernet_device(_net_ops);
-   unregister_pernet_device(_tap_net_ops);
+
+   if (!ip_gre_loaded) {
+   gre_del_protocol(_protocol, GREPROTO_CISCO);
+   unregister_pernet_device(_net_ops);
+   unregister_pernet_device(_net_ops);
+   unregister_pernet_device(_tap_net_ops);
+   }
 }
 
 #endif
diff --git a/datapath/vport.c b/datapath/vport.c
index 02f6b56d3243..05dd9dc2a149 100644
--- a/datapath/vport.c
+++ b/datapath/vport.c
@@ -66,11 +66,15 @@ int ovs_vport_init(void)
if (err)
goto err_lisp;
err = gre_init();
-   if (err && err != -EEXIST)
+   if (err && err != -EEXIST) {
goto err_gre;
-   else if (err == -EEXIST)
-   pr_warn("Cannot take GRE protocol entry - The ERSPAN feature 
may not be supported\n");
-   else {
+   } else {
+   if (err == -EEXIST) {
+   pr_warn("Cannot take GRE protocol entry"\
+   "- The ERSPAN feature may not be supported\n");
+   /* continue GRE tx */
+   }
+
err = ipgre_init();
if (err && err != -EEXIST) 
goto err_ipgre;
-- 
2.7.4

___
dev mailing list

Re: [ovs-dev] [patch v1] conntrack: Support global drop statistics.

2018-07-10 Thread Ben Pfaff

On Tue, Jun 26, 2018 at 04:16:42PM -0700, Darrell Ball wrote:
> Signed-off-by: Darrell Ball 

I wonder whether coverage counters would be a better alternative?  They
use thread-local data so incrementing them is cheaper than atomic
increments, at least if they are accessed from multiple CPUs.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] NEWS: Re-add vhost zero copy support.

2018-07-10 Thread Ben Pfaff

On Tue, Jul 10, 2018 at 06:09:05PM +0100, Ian Stokes wrote:
> On 7/10/2018 2:33 PM, Loftus, Ciara wrote:
> >>
> >>An entry for experimental vhost zero copy support was removed
> >>incorrectly. Re-add this entry to NEWS.
> >>
> >>Reported-by: Eelco Chaudron 
> >>Cc: Ciara Loftus 
> >>Fixes: c3c722d2c7ee ("Documentation: document ovs-dpdk flow offload")
> >>Signed-off-by: Ian Stokes 
> >
> >Acked-by: Ciara Loftus 
> >
> Hi Ben,
> 
> this is a fairly minor patch, would you mind applying this directly to
> master instead of having it part of a pull request?

Ok, done.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] oss-fuzz: Move oss-fuzz test harnesses and fuzzer configs to ovs source repo

2018-07-10 Thread Ben Pfaff

On Tue, Jul 10, 2018 at 10:14:41AM -0700, Ben Pfaff wrote:
> On Tue, Jul 10, 2018 at 11:10:17AM +0200, Bhargava Shastry wrote:
> > Hi Ben,
> > 
> > Thanks a lot for the new patch. A couple of comments:
> > 
> > - Could you please rename each of `tests/oss-fuzz/config/*.options` from
> > *_fuzzer.options to *_target.options. This is because oss-fuzz infra
> > requires namesake fuzzer binaries and option.
> > 
> > Example: flow_extract_fuzzer.options renamed to
> > flow_extract_target.options etc.
> > 
> > - Could you please add "tests/oss-fuzz/fuzzer.h" to the distribution. At
> > the moment, I get the following build error:
> > 
> > ```
> > The following files are in git but not the distribution:
> > tests/oss-fuzz/fuzzer.h
> > Makefile:6314: recipe for target 'dist-hook-git' failed
> > make[2]: *** [dist-hook-git] Error 1
> > make[2]: Leaving directory '/src/openvswitch'
> > Makefile:5755: recipe for target 'all-recursive' failed
> > make[1]: *** [all-recursive] Error 1
> > make[1]: Leaving directory '/src/openvswitch'
> > Makefile:3373: recipe for target 'all' failed
> > make: *** [all] Error 2
> > ```
> 
> Oops.
> 
> Here's a v2:

Never mind that one, I failed to check in some of that.

I sent it formally:
https://patchwork.ozlabs.org/patch/942118/
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] oss-fuzz: Move oss-fuzz test harnesses and fuzzer configs to ovs source repo

2018-07-10 Thread Ben Pfaff

From: Bhargava Shastry 

Signed-off-by: Ben Pfaff 
---
 Makefile.am   |   1 +
 tests/automake.mk |   2 +
 tests/oss-fuzz/automake.mk|  27 ++
 tests/oss-fuzz/config/flow_extract_target.options |   2 +
 tests/oss-fuzz/config/json_parser_target.options  |   2 +
 tests/oss-fuzz/config/ofp_print_target.options|   3 +
 tests/oss-fuzz/config/ovs.dict| 293 ++
 tests/oss-fuzz/flow_extract_target.c  |  14 ++
 tests/oss-fuzz/fuzzer.h   |   9 +
 tests/oss-fuzz/json_parser_target.c   |  57 +
 tests/oss-fuzz/ofp_print_target.c |  50 
 11 files changed, 460 insertions(+)
 create mode 100644 tests/oss-fuzz/automake.mk
 create mode 100644 tests/oss-fuzz/config/flow_extract_target.options
 create mode 100644 tests/oss-fuzz/config/json_parser_target.options
 create mode 100644 tests/oss-fuzz/config/ofp_print_target.options
 create mode 100644 tests/oss-fuzz/config/ovs.dict
 create mode 100644 tests/oss-fuzz/flow_extract_target.c
 create mode 100644 tests/oss-fuzz/fuzzer.h
 create mode 100644 tests/oss-fuzz/json_parser_target.c
 create mode 100644 tests/oss-fuzz/ofp_print_target.c

diff --git a/Makefile.am b/Makefile.am
index e02799a90fab..ffbd051261b1 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -102,6 +102,7 @@ dist_pkgdata_SCRIPTS =
 dist_sbin_SCRIPTS =
 dist_scripts_SCRIPTS =
 dist_scripts_DATA =
+EXTRA_PROGRAMS =
 INSTALL_DATA_LOCAL =
 UNINSTALL_LOCAL =
 man_MANS =
diff --git a/tests/automake.mk b/tests/automake.mk
index 8224e5a4a22d..b7be1ea65811 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -475,3 +475,5 @@ clean-pki:
rm -f tests/pki/stamp
rm -rf tests/pki
 endif
+
+include tests/oss-fuzz/automake.mk
diff --git a/tests/oss-fuzz/automake.mk b/tests/oss-fuzz/automake.mk
new file mode 100644
index ..a2df5aa748f6
--- /dev/null
+++ b/tests/oss-fuzz/automake.mk
@@ -0,0 +1,27 @@
+OSS_FUZZ_TARGETS = \
+   tests/oss-fuzz/flow_extract_target \
+   tests/oss-fuzz/json_parser_target \
+   tests/oss-fuzz/ofp_print_target
+EXTRA_PROGRAMS += $(OSS_FUZZ_TARGETS)
+oss-fuzz-targets: $(OSS_FUZZ_TARGETS)
+
+tests_oss_fuzz_flow_extract_target_SOURCES = \
+   tests/oss-fuzz/flow_extract_target.c \
+   tests/oss-fuzz/fuzzer.h
+tests_oss_fuzz_flow_extract_target_LDADD = lib/libopenvswitch.la
+
+tests_oss_fuzz_json_parser_target_SOURCES = \
+   tests/oss-fuzz/json_parser_target.c \
+   tests/oss-fuzz/fuzzer.h
+tests_oss_fuzz_json_parser_target_LDADD = lib/libopenvswitch.la
+
+tests_oss_fuzz_ofp_print_target_SOURCES = \
+   tests/oss-fuzz/ofp_print_target.c \
+   tests/oss-fuzz/fuzzer.h
+tests_oss_fuzz_ofp_print_target_LDADD = lib/libopenvswitch.la
+
+EXTRA_DIST += \
+   tests/oss-fuzz/config/flow_extract_target.options \
+   tests/oss-fuzz/config/json_parser_target.options \
+   tests/oss-fuzz/config/ofp_print_target.options \
+   tests/oss-fuzz/config/ovs.dict
diff --git a/tests/oss-fuzz/config/flow_extract_target.options 
b/tests/oss-fuzz/config/flow_extract_target.options
new file mode 100644
index ..7a77aaf0462a
--- /dev/null
+++ b/tests/oss-fuzz/config/flow_extract_target.options
@@ -0,0 +1,2 @@
+[libfuzzer]
+dict = ovs.dict
diff --git a/tests/oss-fuzz/config/json_parser_target.options 
b/tests/oss-fuzz/config/json_parser_target.options
new file mode 100644
index ..8d3739a53fa2
--- /dev/null
+++ b/tests/oss-fuzz/config/json_parser_target.options
@@ -0,0 +1,2 @@
+[libfuzzer]
+dict = json.dict
diff --git a/tests/oss-fuzz/config/ofp_print_target.options 
b/tests/oss-fuzz/config/ofp_print_target.options
new file mode 100644
index ..7f117292e62d
--- /dev/null
+++ b/tests/oss-fuzz/config/ofp_print_target.options
@@ -0,0 +1,3 @@
+[libfuzzer]
+close_fd_mask = 3
+dict = ovs.dict
diff --git a/tests/oss-fuzz/config/ovs.dict b/tests/oss-fuzz/config/ovs.dict
new file mode 100644
index ..243b243ab42a
--- /dev/null
+++ b/tests/oss-fuzz/config/ovs.dict
@@ -0,0 +1,293 @@
+"0.2"
+"ADD_SUBSCRIBE"
+"-cbc"
+"CLEARSUB"
+"CLIENT"
+"GIMME"
+"GIMMEDEFS"
+"GIMMESTATS"
+"HM"
+"-hmac96"
+"HM_CTL"
+"HM_STAT"
+"HMST_CLIENT"
+"LOGIN"
+"\\MAILSLOT\\BROWSE"
+"NET-ANNOUNCED"
+"NET-VISIBLE"
+"-nodefs"
+"NONE"
+"OPSTAFF"
+"\\PIPE\\LANMAN"
+"public"
+"REALM"
+"REALM-ANNOUNCED"
+"REALM-VISIBLE"
+"REQ_SUBSCRIBE"
+"RLM_SUBSCRIBE"
+"RLM_UNSUBSCRIBE"
+"SENT"
+" %ssub%s"
+"SUBSCRIBE"
+"SUBSCRIBE_NODEFS"
+"un"
+"UNSUBSCRIBE"
+"USER_FLUSH"
+"USER_HIDE"
+"USER_LOCATE"
+"USER_UNHIDE"
+"WG_CTL"
+"\x01\x00"
+"\x01\x00\x00"
+"\x01\x00\x01"
+"\x01\x00\x02"
+"\x01\x00\x03"
+"\x01\x00\x05"
+"\x01\x01"
+"\x01\x02"
+"\x01\x03"
+"\x01\x04"
+"\x01\x05"
+"\x01\x07"
+"\x01\x0B"
+"\x01\x0C"
+"\x01\x10"
+"\x01\x11"
+"\x01\x12"
+"\x01\x13"
+"\x01\x14"
+"\x01\x15"
+"\x01\x16"
+"\x01\xE8\x48"
+"\x01\xF4"
+"\x01\xF5"
+"\x01\xF6"
+"\x01\xF7"
+"\x01\xF8"

[ovs-dev] [PATCH v5 2/2] tests: Fix unit test case caused by SMC cache.

2018-07-10 Thread Yipeng Wang

Test 1024 PMD - stats reported different stats data during tests because of
the SMC data. This commit fix the test.

Signed-off-by: Yipeng Wang 
---
 tests/pmd.at | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/pmd.at b/tests/pmd.at
index 60452f5..4cae6c8 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -196,12 +196,13 @@ dummy@ovs-dummy: hit:0 missed:0
 p0 7/1: (dummy-pmd: configured_rx_queues=4, 
configured_tx_queues=, requested_rx_queues=4, 
requested_tx_queues=)
 ])
 
-AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | 
sed '/cycles/d' | grep pmd -A 8], [0], [dnl
+AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | 
sed '/cycles/d' | grep pmd -A 9], [0], [dnl
 pmd thread numa_id  core_id :
   packets received: 0
   packet recirculations: 0
   avg. datapath passes per packet: 0.00
   emc hits: 0
+  smc hits: 0
   megaflow hits: 0
   avg. subtable lookups per megaflow hit: 0.00
   miss with success upcall: 0
@@ -226,12 +227,13 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | 
strip_xout], [0], [dnl
 
recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no),
 actions: 
 ])
 
-AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | 
sed '/cycles/d' | grep pmd -A 8], [0], [dnl
+AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | 
sed '/cycles/d' | grep pmd -A 9], [0], [dnl
 pmd thread numa_id  core_id :
   packets received: 20
   packet recirculations: 0
   avg. datapath passes per packet: 1.00
   emc hits: 19
+  smc hits: 0
   megaflow hits: 0
   avg. subtable lookups per megaflow hit: 0.00
   miss with success upcall: 1
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v5 1/2] dpif-netdev: Add SMC cache after EMC cache

2018-07-10 Thread Yipeng Wang

This patch adds a signature match cache (SMC) after exact match
cache (EMC). The difference between SMC and EMC is SMC only stores
a signature of a flow thus it is much more memory efficient. With
same memory space, EMC can store 8k flows while SMC can store 1M
flows. It is generally beneficial to turn on SMC but turn off EMC
when traffic flow count is much larger than EMC size.

SMC cache will map a signature to an dp_netdev_flow index in
flow_table. Thus, we add two new APIs in cmap for lookup key by
index and lookup index by key.

For now, SMC is an experimental feature that it is turned off by
default. One can turn it on using ovsdb options.

Signed-off-by: Yipeng Wang 
Co-authored-by: Jan Scheurich 
Signed-off-by: Jan Scheurich 
---
 Documentation/topics/dpdk/bridge.rst |  15 ++
 NEWS |   2 +
 lib/cmap.c   |  74 
 lib/cmap.h   |  11 ++
 lib/dpif-netdev-perf.h   |   1 +
 lib/dpif-netdev.c| 329 +++
 tests/pmd.at |   1 +
 vswitchd/vswitch.xml |  13 ++
 8 files changed, 409 insertions(+), 37 deletions(-)

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst
index 63f8a62..df74c02 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -102,3 +102,18 @@ For certain traffic profiles with many parallel flows, 
it's recommended to set
 ``N`` to '0' to achieve higher forwarding performance.
 
 For more information on the EMC refer to :doc:`/intro/install/dpdk` .
+
+
+SMC cache (experimental)
+-
+
+SMC cache or signature match cache is a new cache level after EMC cache.
+The difference between SMC and EMC is SMC only stores a signature of a flow
+thus it is much more memory efficient. With same memory space, EMC can store 8k
+flows while SMC can store 1M flows. When traffic flow count is much larger than
+EMC size, it is generally beneficial to turn off EMC and turn on SMC. It is
+currently turned off by default and an experimental feature.
+
+To turn on SMC::
+
+$ ovs-vsctl --no-wait set Open_vSwitch . other_config:smc-enable=true
diff --git a/NEWS b/NEWS
index 92e9b92..f30a1e0 100644
--- a/NEWS
+++ b/NEWS
@@ -44,6 +44,8 @@ Post-v2.9.0
  ovs-appctl dpif-netdev/pmd-perf-show
  * Supervision of PMD performance metrics and logging of suspicious
iterations
+ * Add signature match cache (SMC) as experimental feature. When turned on,
+   it improves throughput when traffic has many more flows than EMC size.
- ERSPAN:
  * Implemented ERSPAN protocol (draft-foschiano-erspan-00.txt) for
both kernel datapath and userspace datapath.
diff --git a/lib/cmap.c b/lib/cmap.c
index 07719a8..cb9cd32 100644
--- a/lib/cmap.c
+++ b/lib/cmap.c
@@ -373,6 +373,80 @@ cmap_find(const struct cmap *cmap, uint32_t hash)
hash);
 }
 
+/* Find a node by the index of the entry of cmap. Index N means the N/CMAP_K
+ * bucket and N%CMAP_K entry in that bucket.
+ * Notice that it is not protected by the optimistic lock (versioning) because
+ * it does not compare the hashes. Currently it is only used by the datapath
+ * SMC cache.
+ *
+ * Return node for the entry of index or NULL if the index beyond boundary */
+const struct cmap_node *
+cmap_find_by_index(const struct cmap *cmap, uint32_t index)
+{
+const struct cmap_impl *impl = cmap_get_impl(cmap);
+
+uint32_t b = index / CMAP_K;
+uint32_t e = index % CMAP_K;
+
+if (b > impl->mask) {
+return NULL;
+}
+
+const struct cmap_bucket *bucket = >buckets[b];
+
+return cmap_node_next(>nodes[e]);
+}
+
+/* Find the index of certain hash value. Currently only used by the datapath
+ * SMC cache.
+ *
+ * Return the index of the entry if found, or UINT32_MAX if not found. The
+ * function assumes entry index cannot be larger than UINT32_MAX. */
+uint32_t
+cmap_find_index(const struct cmap *cmap, uint32_t hash)
+{
+const struct cmap_impl *impl = cmap_get_impl(cmap);
+uint32_t h1 = rehash(impl, hash);
+uint32_t h2 = other_hash(h1);
+
+uint32_t b_index1 = h1 & impl->mask;
+uint32_t b_index2 = h2 & impl->mask;
+
+uint32_t c1, c2;
+uint32_t index = UINT32_MAX;
+
+const struct cmap_bucket *b1 = >buckets[b_index1];
+const struct cmap_bucket *b2 = >buckets[b_index2];
+
+do {
+do {
+c1 = read_even_counter(b1);
+for (int i = 0; i < CMAP_K; i++) {
+if (b1->hashes[i] == hash) {
+index = b_index1 * CMAP_K + i;
+ }
+}
+} while (OVS_UNLIKELY(counter_changed(b1, c1)));
+if (index != UINT32_MAX) {
+break;
+}
+do {
+c2 = read_even_counter(b2);
+for (int i = 0; i < CMAP_K; i++) {
+if (b2->hashes[i] == hash) {
+

[ovs-dev] [PATCH v5 0/2] dpif-netdev: Combine CD/DFC patch for datapath refactor

2018-07-10 Thread Yipeng Wang

This patch set is the V5 implementation to combine the CD and DFC design.
Both patches intend to refactor datapath to avoid costly sequential subtable
search.

We rename the CD/DFC as signature match cache (SMC) to be more clear since the
proposed cache structure stores signatures of flows. SMC works well when
traffic flow count is much larger than EMC size especially when subtable count
is large. Our tests show that with 20 subtables, the throughput could be
increased by more than 2x with SMC enabled.

In this version, SMC is made experimental and by default turned off.

CD and DFC patch sets:
CD: [PATCH v2 0/5] dpif-netdev: Cuckoo-Distributor implementation
https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/340305.html

DFC: [PATCH] dpif-netdev: Refactor datapath flow cache
https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341066.html

v4->v5:
1. Add comments in multiple places as Billy suggested.
2. Remove unnecessary arguments for smc_insert as Billy suggested.
3. Remove unnecessary return value from smc_insert.
4. Change the cmap_find_by_index's index type to be uin32_t from uint16_t
since index can be larger.
5. Coding style fixes.

v3->v4:
1. To make it easier for reviewers, we simplify the patch set to have only 2
commits. The first commit includes the main SMC design and the second commit
modifies the unit-test accordingly.
2. SMC is now an experimental feature turned off by default.
3. Remove the EMC changes from the original DFC patch set to make this patch
set smaller and more focus.
4. Remove adaptive turn on/off mechanism since it is now an experimental
feature turned off by default, and can be turned on manually by user.
5. Add documentation for SMC.
6. Packets hit SMC now verify input port when match dp_netdev_flow. This is
because SMC will bypass per-port classifier.

v2->v3:
1. Add the 5th commit: it is to automatically turn off DFC/CD when the number
of megaflow is larger than 2^16 since we use 16bits in the distributor to index
megaflows.
2. Add the 6th commit: since the pmd stats now print out the DFC/CD statistics
one of the unit test has mismatch output. This commit fixed this issue.
3. In first commit, the char key[248] array is changed to uint64_t key[31]
because of the OSX compilation warning that char array is 1 byte alligned while
8-byte alignment is required during type conversion.

v1->v2:
1. Add comment and follow code style for cmap code (Ben's comment)
2. Fix a bug in the first commit that fails multiple unit tests. Since DFC is
   per PMD not per port, the port mask should be included in rule.
3. Added commit 4. This commit separates DFC to be EMC cache and SMC (signature
   match cache) for easier optimization and readability.
4. In commit 4, DFC lookup is refactored to do batching lookup.
5. Rebase and other minor changes.

RFC->V1:
1. rebase to master head.
2. The last commit is totally rewritten to use the flow_table as indirect table.
   The CD/DFC distributor will cache the index of flow_table entry.
3. Incorporate commit 2 into commit 1. (Bhanu's comment)
4. Change DFC to be always on in commit 1. (Bhanu's comment)

RFC of this patch set:
https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343411.html


Yipeng Wang (2):
  dpif-netdev: Add SMC cache after EMC cache
  tests: Fix unit test case caused by SMC cache.

 Documentation/topics/dpdk/bridge.rst |  15 ++
 NEWS |   2 +
 lib/cmap.c   |  74 
 lib/cmap.h   |  11 ++
 lib/dpif-netdev-perf.h   |   1 +
 lib/dpif-netdev.c| 329 +++
 tests/pmd.at |   7 +-
 vswitchd/vswitch.xml |  13 ++
 8 files changed, 413 insertions(+), 39 deletions(-)

-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] oss-fuzz: Move oss-fuzz test harnesses and fuzzer configs to ovs source repo

2018-07-10 Thread Ben Pfaff

On Tue, Jul 10, 2018 at 11:10:17AM +0200, Bhargava Shastry wrote:
> Hi Ben,
> 
> Thanks a lot for the new patch. A couple of comments:
> 
> - Could you please rename each of `tests/oss-fuzz/config/*.options` from
> *_fuzzer.options to *_target.options. This is because oss-fuzz infra
> requires namesake fuzzer binaries and option.
> 
> Example: flow_extract_fuzzer.options renamed to
> flow_extract_target.options etc.
> 
> - Could you please add "tests/oss-fuzz/fuzzer.h" to the distribution. At
> the moment, I get the following build error:
> 
> ```
> The following files are in git but not the distribution:
> tests/oss-fuzz/fuzzer.h
> Makefile:6314: recipe for target 'dist-hook-git' failed
> make[2]: *** [dist-hook-git] Error 1
> make[2]: Leaving directory '/src/openvswitch'
> Makefile:5755: recipe for target 'all-recursive' failed
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory '/src/openvswitch'
> Makefile:3373: recipe for target 'all' failed
> make: *** [all] Error 2
> ```

Oops.

Here's a v2:

--8<--cut here-->8--

From: Bhargava Shastry 
Date: Thu, 5 Jul 2018 15:32:53 -0700
Subject: [PATCH] oss-fuzz: Move oss-fuzz test harnesses and fuzzer configs to
 ovs source repo

Signed-off-by: Ben Pfaff 
---
 Makefile.am   |   1 +
 tests/automake.mk |   2 +
 tests/oss-fuzz/automake.mk|  21 ++
 tests/oss-fuzz/config/flow_extract_fuzzer.options |   2 +
 tests/oss-fuzz/config/json_parser_fuzzer.options  |   2 +
 tests/oss-fuzz/config/ofp_print_fuzzer.options|   3 +
 tests/oss-fuzz/config/ovs.dict| 293 ++
 tests/oss-fuzz/flow_extract_target.c  |  14 ++
 tests/oss-fuzz/fuzzer.h   |   9 +
 tests/oss-fuzz/json_parser_target.c   |  57 +
 tests/oss-fuzz/ofp_print_target.c |  50 
 11 files changed, 454 insertions(+)
 create mode 100644 tests/oss-fuzz/automake.mk
 create mode 100644 tests/oss-fuzz/config/flow_extract_fuzzer.options
 create mode 100644 tests/oss-fuzz/config/json_parser_fuzzer.options
 create mode 100644 tests/oss-fuzz/config/ofp_print_fuzzer.options
 create mode 100644 tests/oss-fuzz/config/ovs.dict
 create mode 100644 tests/oss-fuzz/flow_extract_target.c
 create mode 100644 tests/oss-fuzz/fuzzer.h
 create mode 100644 tests/oss-fuzz/json_parser_target.c
 create mode 100644 tests/oss-fuzz/ofp_print_target.c

diff --git a/Makefile.am b/Makefile.am
index e02799a90fab..ffbd051261b1 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -102,6 +102,7 @@ dist_pkgdata_SCRIPTS =
 dist_sbin_SCRIPTS =
 dist_scripts_SCRIPTS =
 dist_scripts_DATA =
+EXTRA_PROGRAMS =
 INSTALL_DATA_LOCAL =
 UNINSTALL_LOCAL =
 man_MANS =
diff --git a/tests/automake.mk b/tests/automake.mk
index 8224e5a4a22d..b7be1ea65811 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -475,3 +475,5 @@ clean-pki:
rm -f tests/pki/stamp
rm -rf tests/pki
 endif
+
+include tests/oss-fuzz/automake.mk
diff --git a/tests/oss-fuzz/automake.mk b/tests/oss-fuzz/automake.mk
new file mode 100644
index ..d64b6986fc5e
--- /dev/null
+++ b/tests/oss-fuzz/automake.mk
@@ -0,0 +1,21 @@
+OSS_FUZZ_TARGETS = \
+   tests/oss-fuzz/flow_extract_target \
+   tests/oss-fuzz/json_parser_target \
+   tests/oss-fuzz/ofp_print_target
+EXTRA_PROGRAMS += $(OSS_FUZZ_TARGETS)
+oss-fuzz-targets: $(OSS_FUZZ_TARGETS)
+
+tests_oss_fuzz_flow_extract_target_SOURCES = 
tests/oss-fuzz/flow_extract_target.c
+tests_oss_fuzz_flow_extract_target_LDADD = lib/libopenvswitch.la
+
+tests_oss_fuzz_json_parser_target_SOURCES = tests/oss-fuzz/json_parser_target.c
+tests_oss_fuzz_json_parser_target_LDADD = lib/libopenvswitch.la
+
+tests_oss_fuzz_ofp_print_target_SOURCES = tests/oss-fuzz/ofp_print_target.c
+tests_oss_fuzz_ofp_print_target_LDADD = lib/libopenvswitch.la
+
+EXTRA_DIST += \
+   tests/oss-fuzz/config/flow_extract_fuzzer.options \
+   tests/oss-fuzz/config/json_parser_fuzzer.options \
+   tests/oss-fuzz/config/ofp_print_fuzzer.options \
+   tests/oss-fuzz/config/ovs.dict
diff --git a/tests/oss-fuzz/config/flow_extract_fuzzer.options 
b/tests/oss-fuzz/config/flow_extract_fuzzer.options
new file mode 100644
index ..7a77aaf0462a
--- /dev/null
+++ b/tests/oss-fuzz/config/flow_extract_fuzzer.options
@@ -0,0 +1,2 @@
+[libfuzzer]
+dict = ovs.dict
diff --git a/tests/oss-fuzz/config/json_parser_fuzzer.options 
b/tests/oss-fuzz/config/json_parser_fuzzer.options
new file mode 100644
index ..8d3739a53fa2
--- /dev/null
+++ b/tests/oss-fuzz/config/json_parser_fuzzer.options
@@ -0,0 +1,2 @@
+[libfuzzer]
+dict = json.dict
diff --git a/tests/oss-fuzz/config/ofp_print_fuzzer.options 
b/tests/oss-fuzz/config/ofp_print_fuzzer.options
new file mode 100644
index ..7f117292e62d
--- /dev/null
+++ b/tests/oss-fuzz/config/ofp_print_fuzzer.options
@@ -0,0 +1,3 @@

[ovs-dev] [PATCH] ofproto-dpif-xlate: Check the right IPv6 address in is_nd_dst_correct().

2018-07-10 Thread Ben Pfaff

Fixes test 815 "tunnel_push_pop_ipv6 - action".

CC: Aaron Conole 
Fixes: 6f231f7c3a9e ("xlate: use const struct in6_addr in linklocal check")
Signed-off-by: Ben Pfaff 
---
 ofproto/ofproto-dpif-xlate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index f8741d446116..dc63afa35a6b 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3831,7 +3831,7 @@ is_nd_dst_correct(const struct flow *flow, const struct 
in6_addr *ipv6_addr)
 const uint8_t *flow_ipv6_addr = (uint8_t *) >ipv6_dst;
 const uint8_t *addr = (uint8_t *) ipv6_addr;
 
-return (IN6_IS_ADDR_MC_LINKLOCAL(ipv6_addr) &&
+return (IN6_IS_ADDR_MC_LINKLOCAL(>ipv6_dst) &&
 flow_ipv6_addr[11] == 0x01 &&
 flow_ipv6_addr[12] == 0xff &&
 flow_ipv6_addr[13] == addr[13] &&
-- 
2.16.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] NEWS: Re-add vhost zero copy support.

2018-07-10 Thread Ian Stokes


On 7/10/2018 2:33 PM, Loftus, Ciara wrote:


An entry for experimental vhost zero copy support was removed
incorrectly. Re-add this entry to NEWS.

Reported-by: Eelco Chaudron 
Cc: Ciara Loftus 
Fixes: c3c722d2c7ee ("Documentation: document ovs-dpdk flow offload")
Signed-off-by: Ian Stokes 


Acked-by: Ciara Loftus 


Hi Ben,

this is a fairly minor patch, would you mind applying this directly to 
master instead of having it part of a pull request?


Thanks
Ian


---
  NEWS | 1 +
  1 file changed, 1 insertion(+)

diff --git a/NEWS b/NEWS
index 92e9b92..057e8bf 100644
--- a/NEWS
+++ b/NEWS
@@ -115,6 +115,7 @@ v2.9.0 - 19 Feb 2018
   * New appctl command 'dpif-netdev/pmd-rxq-rebalance' to rebalance rxq
to
 pmd assignments.
   * Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'.
+ * Add support for vHost dequeue zero copy (experimental).
 - Userspace datapath:
   * Output packet batching support.
 - vswitchd:
--
2.7.5




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] table: New function table_format() for formatting a table as a string.

2018-07-10 Thread Ben Pfaff

On Tue, Jul 10, 2018 at 01:13:31PM +0200, Jakub Sitnicki wrote:
> On Mon,  9 Jul 2018 16:34:00 -0700
> Ben Pfaff  wrote:
> 
> > This will be useful for daemonized ovn-nbctl.
> > 
> > Signed-off-by: Ben Pfaff 
> > ---
> 
> Acked-by: Jakub Sitnicki 

Thanks, applied to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] (no subject)

2018-07-10 Thread support

 
 Hello! My Name is Perry White,P.A to Antonia Ax:son Johnson, she made a 
donation to you. Contact Email: antoniajax...@gmail.com  for more details. 
Thanks.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] ofproto: Return error codes for Rule insertions"

2018-07-10 Thread Ben Pfaff

OK.

On Tue, Jul 10, 2018 at 02:58:47PM +0530, Aravind Prasad wrote:
> Hi Ben/All,
> 
> If possible, Kindly hold reviewing this patch for now. Expecting an
> approval from my Org. Sorry for the inconvenience caused and thanks for the
> support.
> 
> Will get back and intimate for the review as soon as possible after the
> approval (expecting it to take not more than a week).  And thanks again for
> understanding.
> 
> Thanks,
> Aravind Prasad S
> 
> On Tue, Jul 10, 2018 at 7:06 AM Aravind Prasad  wrote:
> 
> >
> > Currently, rule_insert() API doesnot have return value. There are some
> > possible
> > > scenarios where rule insertions can fail at run-time even though the
> > static
> > > checks during rule_construct() had passed previously.
> > > Some possible scenarios for failure of rule insertions:
> > > **) Rule insertions can fail dynamically in Hybrid mode (both Openflow
> > and
> > > Normal switch functioning coexist) where the CAM space could get suddenly
> > > filled up by Normal switch functioning and Openflow gets devoid of
> > > available space.
> > > **) Some deployments could have separate independent layers for HW rule
> > > insertions and application layer to interact with OVS. HW layer
> > > could face any dynamic issue during rule handling which application could
> > > not have predicted/captured in rule-construction phase.
> > > Rule-insert errors for bundles are not handled in this pull-request.
> > > Will be handled in upcoming pull request.
> >
> > >> I don't think that ofproto-dpif can ever see such a failure.  Are you
> > >> planning to submit an ofproto provider that exercises this behavior?
> >
> > Hi Ben,
> >
> > These type of errors are possible in actual Hardware implementations.
> > It is possible that ofproto and netdev providers could be implemented
> > for a actual HW.
> > Usually, in such cases, in the rule construct phase, all the static
> > checks like verifying the qualifiers and actions could be done and the
> > other related verifications.
> > But during the rule insert phase, it is possible that the rule insertion
> > may get failed in HW (runtime errors, HW errors and so on).
> > Hence, we need a way to rollback for rule-insert phase also.
> > Kindly let me know your views.
> >
> > Thanks,
> > Aravind Prasad S
> >
> >
> > On Tue, Jul 10, 2018 at 3:45 AM Ben Pfaff  wrote:
> >
> >> On Mon, Jul 09, 2018 at 01:02:08PM +0530, Aravind Prasad S wrote:
> >> > Currently, rule_insert() API doesnot have return value. There are some
> >> possible
> >> > scenarios where rule insertions can fail at run-time even though the
> >> static
> >> > checks during rule_construct() had passed previously.
> >> > Some possible scenarios for failure of rule insertions:
> >> > **) Rule insertions can fail dynamically in Hybrid mode (both Openflow
> >> and
> >> > Normal switch functioning coexist) where the CAM space could get
> >> suddenly
> >> > filled up by Normal switch functioning and Openflow gets devoid of
> >> > available space.
> >> > **) Some deployments could have separate independent layers for HW rule
> >> > insertions and application layer to interact with OVS. HW layer
> >> > could face any dynamic issue during rule handling which application
> >> could
> >> > not have predicted/captured in rule-construction phase.
> >> > Rule-insert errors for bundles are not handled in this pull-request.
> >> > Will be handled in upcoming pull request.
> >> >
> >> > Signed-off-by: Aravind Prasad S 
> >>
> >> I don't think that ofproto-dpif can ever see such a failure.  Are you
> >> planning to submit an ofproto provider that exercises this behavior?
> >>
> >> Thanks,
> >>
> >> Ben.
> >>
> >
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] sparse: Make IN6_IS_ADDR_MC_LINKLOCAL and IN6_ARE_ADDR_EQUAL pickier.

2018-07-10 Thread Ben Pfaff

On GNU systems these macros work with arbitrary pointers, but the relevant
standards only require IN6_IS_ADDR_MC_LINKLOCAL to work with in6_addr (and
don't specify IN6_ARE_ADDR_EQUAL at all).  Make the "sparse"
implementations correspondingly pickier so that we catch any introduced
problems more quickly.

CC: Aaron Conole 
Signed-off-by: Ben Pfaff 
---
 include/sparse/netinet/in.h | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/sparse/netinet/in.h b/include/sparse/netinet/in.h
index eea41bd7f672..21deceb28d41 100644
--- a/include/sparse/netinet/in.h
+++ b/include/sparse/netinet/in.h
@@ -124,14 +124,25 @@ struct sockaddr_in6 {
  (X)->s6_addr[11] == 0xff)
 
 #define IN6_IS_ADDR_MC_LINKLOCAL(a) \
-(((const uint8_t *) (a))[0] == 0xff &&  \
- (((const uint8_t *) (a))[1] & 0xf) == 0x2)
-
-# define IN6_ARE_ADDR_EQUAL(a,b)  \
-const uint32_t *) (a))[0] == ((const uint32_t *) (b))[0]) &&  \
- (((const uint32_t *) (a))[1] == ((const uint32_t *) (b))[1]) &&  \
- (((const uint32_t *) (a))[2] == ((const uint32_t *) (b))[2]) &&  \
- (((const uint32_t *) (a))[3] == ((const uint32_t *) (b))[3]))
+((a)->s6_addr[0] == 0xff && ((a)->s6_addr[1] & 0xf) == 0x2)
+
+# define IN6_ARE_ADDR_EQUAL(a, b)   \
+((a)->s6_addr[0] == (b)->s6_addr[0] &&  \
+ (a)->s6_addr[1] == (b)->s6_addr[1] &&  \
+ (a)->s6_addr[2] == (b)->s6_addr[2] &&  \
+ (a)->s6_addr[3] == (b)->s6_addr[3] &&  \
+ (a)->s6_addr[4] == (b)->s6_addr[4] &&  \
+ (a)->s6_addr[5] == (b)->s6_addr[5] &&  \
+ (a)->s6_addr[6] == (b)->s6_addr[6] &&  \
+ (a)->s6_addr[7] == (b)->s6_addr[7] &&  \
+ (a)->s6_addr[8] == (b)->s6_addr[8] &&  \
+ (a)->s6_addr[9] == (b)->s6_addr[9] &&  \
+ (a)->s6_addr[10] == (b)->s6_addr[10] &&\
+ (a)->s6_addr[11] == (b)->s6_addr[11] &&\
+ (a)->s6_addr[12] == (b)->s6_addr[12] &&\
+ (a)->s6_addr[13] == (b)->s6_addr[13] &&\
+ (a)->s6_addr[14] == (b)->s6_addr[14] &&\
+ (a)->s6_addr[15] == (b)->s6_addr[15])
 
 #define INET_ADDRSTRLEN 16
 #define INET6_ADDRSTRLEN 46
-- 
2.16.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] xlate: use const struct in6_addr in linklocal check

2018-07-10 Thread Ben Pfaff

On Tue, Jul 10, 2018 at 09:34:38AM -0400, Aaron Conole wrote:
> Commit 83c2757bd16e ("xlate: Move tnl_neigh_snoop() to
> terminate_native_tunnel()") introduced a call to
> IN6_IS_ADDR_MC_LINKLOCAL() when checking neighbor discovery.
> 
> The call to this assumes that the argument may be a const uint8_t *.
> According to The Open Group Base Specifications Issue 7, 2018:
> 
> macro is of type int and takes a single argument of
> type const struct in6_addr *
> 
> The GNU implementation allows a bit of flexibility, by internally
> casting the argument.  However, other implementations (such as OS X)
> more rigidly implement the standard and fail with errors like:
> 
> error: member reference base type 'const uint8_t'
>(aka 'const unsigned char') is not a structure or union
> 
> Fixes: 83c2757bd16e ("xlate: Move tnl_neigh_snoop() to 
> terminate_native_tunnel()")
> Cc: Zoltan Balogh 
> Cc: Jan Scheurich 
> Signed-off-by: Aaron Conole 

Thanks, applied to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] conntrack: Fix using alg_exp_entry out of scope.

2018-07-10 Thread Darrell Ball

Good, thanks.

Acked-by: Darrell Ball 

This needs backporting to 2.8.

I re-ran some SA tools and found some other potential use after free bugs.


On Tue, Jul 10, 2018 at 4:05 AM, Ilya Maximets 
wrote:

> 'alg_exp_entry' is allocated on stack memory, but could be used via
> 'alg_exp' pointer inside 'write_ct_md' function, i.e. outside its scope.
>
> CC: Darrell Ball 
> Fixes: bd5e81a0e596 ("Userspace Datapath: Add ALG infra and FTP.")
> Signed-off-by: Ilya Maximets 
> ---
>
>  lib/conntrack.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 97fd46a..51c1acb 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -1246,9 +1246,9 @@ process_one(struct conntrack *ct, struct dp_packet
> *pkt,
>  }
>
>  const struct alg_exp_node *alg_exp = NULL;
> +struct alg_exp_node alg_exp_entry;
>
>  if (OVS_UNLIKELY(create_new_conn)) {
> -struct alg_exp_node alg_exp_entry;
>
>  ct_rwlock_rdlock(>resources_lock);
>  alg_exp = expectation_lookup(>alg_expectations, >key,
> --
> 2.7.4
>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [patch v1] conntrack: Fix conn_update_state_alg use after free.

2018-07-10 Thread Darrell Ball

Fixes: bd5e81a0e596 ("Userspace Datapath: Add ALG infra and FTP.")
Signed-off-by: Darrell Ball 
---

Needs backporting as far back as 2.8.

 lib/conntrack.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index e1c1f2e..b818584 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1159,8 +1159,11 @@ conn_update_state_alg(struct conntrack *ct, struct 
dp_packet *pkt,
 } else {
 *create_new_conn = conn_update_state(ct, pkt, ctx, , now,
 bucket);
-handle_ftp_ctl(ct, ctx, pkt, conn, now, CT_FTP_CTL_OTHER,
-   !!nat_action_info);
+
+if (*create_new_conn == false) {
+handle_ftp_ctl(ct, ctx, pkt, conn, now, CT_FTP_CTL_OTHER,
+   !!nat_action_info);
+}
 }
 return true;
 }
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [patch v1] conntrack: Fix conn_update_state_alg use after free.

2018-07-10 Thread Darrell Ball

Signed-off-by: Darrell Ball 
---

Needs backporting as far back as 2.8.

 lib/conntrack.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index e1c1f2e..b818584 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1159,8 +1159,11 @@ conn_update_state_alg(struct conntrack *ct, struct 
dp_packet *pkt,
 } else {
 *create_new_conn = conn_update_state(ct, pkt, ctx, , now,
 bucket);
-handle_ftp_ctl(ct, ctx, pkt, conn, now, CT_FTP_CTL_OTHER,
-   !!nat_action_info);
+
+if (*create_new_conn == false) {
+handle_ftp_ctl(ct, ctx, pkt, conn, now, CT_FTP_CTL_OTHER,
+   !!nat_action_info);
+}
 }
 return true;
 }
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [ovs-dev, v4] dpif-netdev: Avoid reordering of packets in a batch with same megaflow

2018-07-10 Thread Ilya Maximets

On 10.07.2018 17:48, Vishal Deep Ajmera wrote:
>>
>> This is potentially dangerous from the future modifications and hard to read 
>> for
>> reviewer/person who tries to understand how it works.
>>
>> Current implementation will fail if someone will change the logic of
>> 'DP_PACKET_BATCH_REFILL_FOR_EACH', for example.
>>
>> The key point is that 'DP_PACKET_BATCH_REFILL_FOR_EACH' and
>> 'dp_packet_batch_refill' manipulates with the batch size in a hidden from the
>> end user manner. Using of the 'packets_->count' directly requires knowledge 
>> of
>> how refilling works internally, but these functions was intended to hide 
>> internals
>> of batch manipulations.
>>
>> Also, as I understand, you're storing 'map_cnt' for each missed packet.
>> So, why not just use 'n_missed' for that purpose?
>>
> 
> map_cnt keeps count of not just the emc miss packet but also all subsequent
> packets (with emc hit) after one emc miss.

Yes. But we're talking about replacing 'packets_->count' with 'n_missed'.
Not about 'map_cnt'.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [ovs-dev, v4] dpif-netdev: Avoid reordering of packets in a batch with same megaflow

2018-07-10 Thread Vishal Deep Ajmera

> 
> This is potentially dangerous from the future modifications and hard to read 
> for
> reviewer/person who tries to understand how it works.
> 
> Current implementation will fail if someone will change the logic of
> 'DP_PACKET_BATCH_REFILL_FOR_EACH', for example.
> 
> The key point is that 'DP_PACKET_BATCH_REFILL_FOR_EACH' and
> 'dp_packet_batch_refill' manipulates with the batch size in a hidden from the
> end user manner. Using of the 'packets_->count' directly requires knowledge of
> how refilling works internally, but these functions was intended to hide 
> internals
> of batch manipulations.
> 
> Also, as I understand, you're storing 'map_cnt' for each missed packet.
> So, why not just use 'n_missed' for that purpose?
> 

map_cnt keeps count of not just the emc miss packet but also all subsequent
packets (with emc hit) after one emc miss.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] ovn: Add router load balancer undnat rule for IPv6

2018-07-10 Thread Mark Michelson


Hi Ben,

Thanks for applying this patch to master. Can you please apply the patch 
to the 2.9 branch as well?


Thank you,
Mark

On 06/26/2018 02:45 PM, Mark Michelson wrote:

A note: if approved, this patch will also need to go into version 2.9

On 06/26/2018 02:42 PM, Mark Michelson wrote:

When configuring a router port to have a redirect-chassis and using an
IPv6 load balancer rule that specifies a TCP/UDP port, load balancing
would not work as expected. This is because a rule to un-dnat the return
traffic from the load balancer destination was not installed. This is
because this rule was only being installed for IPv4 load balancers.

This change adds the same rule for IPv6 load balancers as well.

Signed-off-by: Mark Michelson 
---
  ovn/northd/ovn-northd.c | 15 +++
  1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 72fe4e795..2ca439b39 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -4641,8 +4641,7 @@ add_router_lb_flow(struct hmap *lflows, struct 
ovn_datapath *od,

  free(new_match);
  free(est_match);
-    if (!od->l3dgw_port || !od->l3redirect_port || !backend_ips
-    || addr_family != AF_INET) {
+    if (!od->l3dgw_port || !od->l3redirect_port || !backend_ips) {
  return;
  }
@@ -4651,7 +4650,11 @@ add_router_lb_flow(struct hmap *lflows, struct 
ovn_datapath *od,

   * router has a gateway router port associated.
   */
  struct ds undnat_match = DS_EMPTY_INITIALIZER;
-    ds_put_cstr(_match, "ip4 && (");
+    if (addr_family == AF_INET) {
+    ds_put_cstr(_match, "ip4 && (");
+    } else {
+    ds_put_cstr(_match, "ip6 && (");
+    }
  char *start, *next, *ip_str;
  start = next = xstrdup(backend_ips);
  ip_str = strsep(, ",");
@@ -4666,7 +4669,11 @@ add_router_lb_flow(struct hmap *lflows, struct 
ovn_datapath *od,

  break;
  }
-    ds_put_format(_match, "(ip4.src == %s", ip_address);
+    if (addr_family_ == AF_INET) {
+    ds_put_format(_match, "(ip4.src == %s", ip_address);
+    } else {
+    ds_put_format(_match, "(ip6.src == %s", ip_address);
+    }
  free(ip_address);
  if (port) {
  ds_put_format(_match, " && %s.src == %d) || ",





___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v4 12/15] netdev-dpdk: support multi-segment jumbo frames

2018-07-10 Thread Lam, Tiago

On 10/07/2018 13:53, Eelco Chaudron wrote:
> 
> 
> On 10 Jul 2018, at 14:48, Lam, Tiago wrote:
> 

[snip]

>>
>> A blatant leftover from the rebase. Thanks, Eelco.
>>
>> I'll wait for Ian's reply in patch 01/15 and take this into account 
>> for v5.
>>
>> Tiago.
> 
> I’m on PTO starting tomorrow evening for 3 weeks, so might not have a 
> change to review your v5.
> 
> So if you are just fixing this, keep my ack :)
> 

Unless something pops up, that's the plan. Thanks for the promptness!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] xlate: use const struct in6_addr in linklocal check

2018-07-10 Thread Aaron Conole

Commit 83c2757bd16e ("xlate: Move tnl_neigh_snoop() to
terminate_native_tunnel()") introduced a call to
IN6_IS_ADDR_MC_LINKLOCAL() when checking neighbor discovery.

The call to this assumes that the argument may be a const uint8_t *.
According to The Open Group Base Specifications Issue 7, 2018:

macro is of type int and takes a single argument of
type const struct in6_addr *

The GNU implementation allows a bit of flexibility, by internally
casting the argument.  However, other implementations (such as OS X)
more rigidly implement the standard and fail with errors like:

error: member reference base type 'const uint8_t'
   (aka 'const unsigned char') is not a structure or union

Fixes: 83c2757bd16e ("xlate: Move tnl_neigh_snoop() to 
terminate_native_tunnel()")
Cc: Zoltan Balogh 
Cc: Jan Scheurich 
Signed-off-by: Aaron Conole 
---
 ofproto/ofproto-dpif-xlate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index c02a0327a..8c95935fa 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3832,7 +3832,7 @@ is_nd_dst_correct(const struct flow *flow, const struct 
in6_addr *ipv6_addr)
 const uint8_t *flow_ipv6_addr = (uint8_t *) >ipv6_dst;
 const uint8_t *addr = (uint8_t *) ipv6_addr;
 
-return (IN6_IS_ADDR_MC_LINKLOCAL(flow_ipv6_addr) &&
+return (IN6_IS_ADDR_MC_LINKLOCAL(ipv6_addr) &&
 flow_ipv6_addr[11] == 0x01 &&
 flow_ipv6_addr[12] == 0xff &&
 flow_ipv6_addr[13] == addr[13] &&
-- 
2.14.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] NEWS: Re-add vhost zero copy support.

2018-07-10 Thread Loftus, Ciara

> 
> An entry for experimental vhost zero copy support was removed
> incorrectly. Re-add this entry to NEWS.
> 
> Reported-by: Eelco Chaudron 
> Cc: Ciara Loftus 
> Fixes: c3c722d2c7ee ("Documentation: document ovs-dpdk flow offload")
> Signed-off-by: Ian Stokes 

Acked-by: Ciara Loftus 

> ---
>  NEWS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/NEWS b/NEWS
> index 92e9b92..057e8bf 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -115,6 +115,7 @@ v2.9.0 - 19 Feb 2018
>   * New appctl command 'dpif-netdev/pmd-rxq-rebalance' to rebalance rxq
> to
> pmd assignments.
>   * Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'.
> + * Add support for vHost dequeue zero copy (experimental).
> - Userspace datapath:
>   * Output packet batching support.
> - vswitchd:
> --
> 2.7.5

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] netdev-dpdk: Fix failure to configure flow control at netdev-init.

2018-07-10 Thread Sugesh Chandran

Configuring flow control at ixgbe netdev-init is throwing error in port
start.

For eg: without this fix, user cannot configure flow control on ixgbe dpdk
port as below,

"
ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \
options:dpdk-devargs=:05:00.1 options:rx-flow-ctrl=true
"

Instead,  it must be configured as two different commands,

"
ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \
   options:dpdk-devargs=:05:00.1
ovs-vsctl set Interface dpdk0 options:rx-flow-ctrl=true
"

The DPDK ixgbe driver is now validating all the 'rte_eth_fc_conf' fields before
trying to configuring the dpdk ethdev. Hence OVS can no longer set the
'dont care' fields to just '0' as before. This commit make sure all the
'rte_eth_fc_conf' fields are populated with default values before the dev
init.

Signed-off-by: Sugesh Chandran 
---
 lib/netdev-dpdk.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index bb4d60f26..023b80d3e 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1065,14 +1065,6 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
 
 mbp_priv = rte_mempool_get_priv(dev->dpdk_mp->mp);
 dev->buf_size = mbp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM;
-
-/* Get the Flow control configuration for DPDK-ETH */
-diag = rte_eth_dev_flow_ctrl_get(dev->port_id, >fc_conf);
-if (diag) {
-VLOG_DBG("cannot get flow control parameters on port "DPDK_PORT_ID_FMT
- ", err=%d", dev->port_id, diag);
-}
-
 return 0;
 }
 
@@ -1773,6 +1765,13 @@ netdev_dpdk_set_config(struct netdev *netdev, const 
struct smap *args,
 autoneg = smap_get_bool(args, "flow-ctrl-autoneg", false);
 
 fc_mode = fc_mode_set[tx_fc_en][rx_fc_en];
+/* Get the Flow control configuration for DPDK-ETH */
+err = rte_eth_dev_flow_ctrl_get(dev->port_id, >fc_conf);
+if (err) {
+VLOG_INFO("cannot get flow control parameters on port "DPDK_PORT_ID_FMT
+ ", err=%d", dev->port_id, err);
+}
+
 if (dev->fc_conf.mode != fc_mode || autoneg != dev->fc_conf.autoneg) {
 dev->fc_conf.mode = fc_mode;
 dev->fc_conf.autoneg = autoneg;
-- 
2.14.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH] NEWS: Re-add vhost zero copy support.

2018-07-10 Thread Ian Stokes

An entry for experimental vhost zero copy support was removed
incorrectly. Re-add this entry to NEWS.

Reported-by: Eelco Chaudron 
Cc: Ciara Loftus 
Fixes: c3c722d2c7ee ("Documentation: document ovs-dpdk flow offload")
Signed-off-by: Ian Stokes 
---
 NEWS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/NEWS b/NEWS
index 92e9b92..057e8bf 100644
--- a/NEWS
+++ b/NEWS
@@ -115,6 +115,7 @@ v2.9.0 - 19 Feb 2018
  * New appctl command 'dpif-netdev/pmd-rxq-rebalance' to rebalance rxq to
pmd assignments.
  * Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'.
+ * Add support for vHost dequeue zero copy (experimental).
- Userspace datapath:
  * Output packet batching support.
- vswitchd:
-- 
2.7.5

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v4 01/15] netdev-dpdk: Differentiate between mtu/mbuf size.

2018-07-10 Thread Eelco Chaudron




On 10 Jul 2018, at 13:06, Tiago Lam wrote:

> When configuring a mempool, in netdev_dpdk_mempool_configure(), the
> result of a call to dpdk_buf_size() is being used as the MTU. This,
> however, is not the MTU but a ROUND_UP aligned number based on the MTU,
> which could lead to the reuse of mempools even when the real MTUs
> between different ports differ but somehow round up to the same mbuf
> size.
>
> To support multi-segment mbufs, which is coming in later commits, this
> distinction is important. For multi-segment mbufs the mbuf size is
> always the same (multiple mbufs are chained together to hold the
> packet's data), which means that using the mbuf size as the mtu would
> lead to reusing always the same mempool, independent of the MTU set.
>
> To fix this, two fields are now passed to dpdk_mp_create(), the mtu and
> the mbuf_size, thus creating a distinction between the two. The latter
> is used for telling rte_pktmbuf_pool_create() the size of each mbuf,
> while the former is used for tracking purposes in struct dpdk_mp and as
> an input to create a unique hash for the mempool's name.
>
> Signed-off-by: Tiago Lam 
> ---

Changes look good to me.

Acked-by: Eelco Chaudron 



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v4 12/15] netdev-dpdk: support multi-segment jumbo frames

2018-07-10 Thread Eelco Chaudron




On 10 Jul 2018, at 14:48, Lam, Tiago wrote:


On 10/07/2018 13:40, Ilya Maximets wrote:

On 10.07.2018 15:30, Eelco Chaudron wrote:



On 10 Jul 2018, at 13:06, Tiago Lam wrote:


From: Mark Kavanagh 

Currently, jumbo frame support for OvS-DPDK is implemented by
increasing the size of mbufs within a mempool, such that each mbuf
within the pool is large enough to contain an entire jumbo frame of
a user-defined size. Typically, for each user-defined MTU,
'requested_mtu', a new mempool is created, containing mbufs of size
~requested_mtu.

With the multi-segment approach, a port uses a single mempool,
(containing standard/default-sized mbufs of ~2k bytes), 
irrespective

of the user-requested MTU value. To accommodate jumbo frames, mbufs
are chained together, where each mbuf in the chain stores a portion 
of
the jumbo frame. Each mbuf in the chain is termed a segment, hence 
the

name.

== Enabling multi-segment mbufs ==
Multi-segment and single-segment mbufs are mutually exclusive, and 
the
user must decide on which approach to adopt on init. The 
introduction
of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. 
This

is a global boolean value, which determines how jumbo frames are
represented across all DPDK ports. In the absence of a 
user-supplied

value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
mbufs must be explicitly enabled / single-segment mbufs remain the
default.

Setting the field is identical to setting existing DPDK-specific 
OVSDB

fields:

    ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
    ovs-vsctl set Open_vSwitch . 
other_config:dpdk-lcore-mask=0x10
    ovs-vsctl set Open_vSwitch . 
other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . 
other_config:dpdk-multi-seg-mbufs=true


Co-authored-by: Tiago Lam 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 Documentation/topics/dpdk/jumbo-frames.rst | 51 +
 NEWS   
|  2 +
 lib/dpdk.c 
|  8 +++
 lib/netdev-dpdk.c  
| 92 +-
 lib/netdev-dpdk.h  
|  2 +
 vswitchd/vswitch.xml   
| 23 

 6 files changed, 164 insertions(+), 14 deletions(-)

diff --git a/Documentation/topics/dpdk/jumbo-frames.rst 
b/Documentation/topics/dpdk/jumbo-frames.rst

index 00360b4..d1fb111 100644
--- a/Documentation/topics/dpdk/jumbo-frames.rst
+++ b/Documentation/topics/dpdk/jumbo-frames.rst
@@ -71,3 +71,54 @@ Jumbo frame support has been validated against 
9728B frames, which is the
 largest frame size supported by Fortville NIC using the DPDK i40e 
driver, but
 larger frames and other DPDK NIC drivers may be supported. These 
cases are

 common for use cases involving East-West traffic only.
+
+---
+Multi-segment mbufs
+---
+
+Instead of increasing the size of mbufs within a mempool, such 
that each mbuf
+within the pool is large enough to contain an entire jumbo frame 
of a
+user-defined size, mbufs can be chained together instead. In this 
approach each
+mbuf in the chain stores a portion of the jumbo frame, by default 
~2K bytes,
+irrespective of the user-requested MTU value. Since each mbuf in 
the chain is

+termed a segment, this approach is named "multi-segment mbufs".
+
+This approach may bring more flexibility in use cases where the 
maximum packet
+length may be hard to guess. For example, in cases where packets 
originate from
+sources marked for oflload (such as TSO), each packet may be 
larger than the
+MTU, and as such, when forwarding it to a DPDK port a single mbuf 
may not be

+enough to hold all of the packet's data.
+
+Multi-segment and single-segment mbufs are mutually exclusive, and 
the user
+must decide on which approach to adopt on initialisation. If 
multi-segment
+mbufs is to be enabled, it can be done so with the following 
command::

+
+    $ ovs-vsctl set Open_vSwitch . 
other_config:dpdk-multi-seg-mbufs=true

+
+Single-segment mbufs still remain the default when using OvS-DPDK, 
and the
+above option `dpdk-multi-seg-mbufs` must be explicitly set to 
`true` if

+multi-segment mbufs are to be used.
+
+~
+Performance notes
+~
+
+When using multi-segment mbufs some PMDs may not support 
vectorized Tx
+functions, due to its non-contiguous nature. As a result this can 
hit
+performance for smaller packet sizes. For example, on a setup 
sending 64B
+packets at line rate, a decrease of ~20% has been observed. The 
performance
+impact stops being noticeable for larger packet sizes, although 
the exact size

+will between PMDs, and depending on the architecture one's using.
+
+Tests performed with the i40e PMD driver only showed this 
limitation for 64B
+packets, and the same rate was observed when comparing 
multi-segment mbufs and
+single-segment mbuf for 128B packets. In other words, the 20% drop 
in

Re: [ovs-dev] [PATCH v4 12/15] netdev-dpdk: support multi-segment jumbo frames

2018-07-10 Thread Lam, Tiago

On 10/07/2018 13:40, Ilya Maximets wrote:
> On 10.07.2018 15:30, Eelco Chaudron wrote:
>>
>>
>> On 10 Jul 2018, at 13:06, Tiago Lam wrote:
>>
>>> From: Mark Kavanagh 
>>>
>>> Currently, jumbo frame support for OvS-DPDK is implemented by
>>> increasing the size of mbufs within a mempool, such that each mbuf
>>> within the pool is large enough to contain an entire jumbo frame of
>>> a user-defined size. Typically, for each user-defined MTU,
>>> 'requested_mtu', a new mempool is created, containing mbufs of size
>>> ~requested_mtu.
>>>
>>> With the multi-segment approach, a port uses a single mempool,
>>> (containing standard/default-sized mbufs of ~2k bytes), irrespective
>>> of the user-requested MTU value. To accommodate jumbo frames, mbufs
>>> are chained together, where each mbuf in the chain stores a portion of
>>> the jumbo frame. Each mbuf in the chain is termed a segment, hence the
>>> name.
>>>
>>> == Enabling multi-segment mbufs ==
>>> Multi-segment and single-segment mbufs are mutually exclusive, and the
>>> user must decide on which approach to adopt on init. The introduction
>>> of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This
>>> is a global boolean value, which determines how jumbo frames are
>>> represented across all DPDK ports. In the absence of a user-supplied
>>> value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
>>> mbufs must be explicitly enabled / single-segment mbufs remain the
>>> default.
>>>
>>> Setting the field is identical to setting existing DPDK-specific OVSDB
>>> fields:
>>>
>>>     ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
>>>     ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
>>>     ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
>>> ==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
>>>
>>> Co-authored-by: Tiago Lam 
>>>
>>> Signed-off-by: Mark Kavanagh 
>>> Signed-off-by: Tiago Lam 
>>> Acked-by: Eelco Chaudron 
>>> ---
>>>  Documentation/topics/dpdk/jumbo-frames.rst | 51 +
>>>  NEWS   |  2 +
>>>  lib/dpdk.c |  8 +++
>>>  lib/netdev-dpdk.c  | 92 
>>> +-
>>>  lib/netdev-dpdk.h  |  2 +
>>>  vswitchd/vswitch.xml   | 23 
>>>  6 files changed, 164 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/Documentation/topics/dpdk/jumbo-frames.rst 
>>> b/Documentation/topics/dpdk/jumbo-frames.rst
>>> index 00360b4..d1fb111 100644
>>> --- a/Documentation/topics/dpdk/jumbo-frames.rst
>>> +++ b/Documentation/topics/dpdk/jumbo-frames.rst
>>> @@ -71,3 +71,54 @@ Jumbo frame support has been validated against 9728B 
>>> frames, which is the
>>>  largest frame size supported by Fortville NIC using the DPDK i40e driver, 
>>> but
>>>  larger frames and other DPDK NIC drivers may be supported. These cases are
>>>  common for use cases involving East-West traffic only.
>>> +
>>> +---
>>> +Multi-segment mbufs
>>> +---
>>> +
>>> +Instead of increasing the size of mbufs within a mempool, such that each 
>>> mbuf
>>> +within the pool is large enough to contain an entire jumbo frame of a
>>> +user-defined size, mbufs can be chained together instead. In this approach 
>>> each
>>> +mbuf in the chain stores a portion of the jumbo frame, by default ~2K 
>>> bytes,
>>> +irrespective of the user-requested MTU value. Since each mbuf in the chain 
>>> is
>>> +termed a segment, this approach is named "multi-segment mbufs".
>>> +
>>> +This approach may bring more flexibility in use cases where the maximum 
>>> packet
>>> +length may be hard to guess. For example, in cases where packets originate 
>>> from
>>> +sources marked for oflload (such as TSO), each packet may be larger than 
>>> the
>>> +MTU, and as such, when forwarding it to a DPDK port a single mbuf may not 
>>> be
>>> +enough to hold all of the packet's data.
>>> +
>>> +Multi-segment and single-segment mbufs are mutually exclusive, and the user
>>> +must decide on which approach to adopt on initialisation. If multi-segment
>>> +mbufs is to be enabled, it can be done so with the following command::
>>> +
>>> +    $ ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
>>> +
>>> +Single-segment mbufs still remain the default when using OvS-DPDK, and the
>>> +above option `dpdk-multi-seg-mbufs` must be explicitly set to `true` if
>>> +multi-segment mbufs are to be used.
>>> +
>>> +~
>>> +Performance notes
>>> +~
>>> +
>>> +When using multi-segment mbufs some PMDs may not support vectorized Tx
>>> +functions, due to its non-contiguous nature. As a result this can hit
>>> +performance for smaller packet sizes. For example, on a setup sending 64B
>>> +packets at line rate, a decrease of ~20% has been observed. The performance
>>> +impact stops being noticeable for larger packet sizes,

Re: [ovs-dev] [PATCH v4 12/15] netdev-dpdk: support multi-segment jumbo frames

2018-07-10 Thread Ian Stokes


On 7/10/2018 1:40 PM, Ilya Maximets wrote:

On 10.07.2018 15:30, Eelco Chaudron wrote:



On 10 Jul 2018, at 13:06, Tiago Lam wrote:


From: Mark Kavanagh 

Currently, jumbo frame support for OvS-DPDK is implemented by
increasing the size of mbufs within a mempool, such that each mbuf
within the pool is large enough to contain an entire jumbo frame of
a user-defined size. Typically, for each user-defined MTU,
'requested_mtu', a new mempool is created, containing mbufs of size
~requested_mtu.

With the multi-segment approach, a port uses a single mempool,
(containing standard/default-sized mbufs of ~2k bytes), irrespective
of the user-requested MTU value. To accommodate jumbo frames, mbufs
are chained together, where each mbuf in the chain stores a portion of
the jumbo frame. Each mbuf in the chain is termed a segment, hence the
name.

== Enabling multi-segment mbufs ==
Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide on which approach to adopt on init. The introduction
of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This
is a global boolean value, which determines how jumbo frames are
represented across all DPDK ports. In the absence of a user-supplied
value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
mbufs must be explicitly enabled / single-segment mbufs remain the
default.

Setting the field is identical to setting existing DPDK-specific OVSDB
fields:

     ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
     ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
     ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

Co-authored-by: Tiago Lam 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
  Documentation/topics/dpdk/jumbo-frames.rst | 51 +
  NEWS   |  2 +
  lib/dpdk.c |  8 +++
  lib/netdev-dpdk.c  | 92 +-
  lib/netdev-dpdk.h  |  2 +
  vswitchd/vswitch.xml   | 23 
  6 files changed, 164 insertions(+), 14 deletions(-)

diff --git a/Documentation/topics/dpdk/jumbo-frames.rst 
b/Documentation/topics/dpdk/jumbo-frames.rst
index 00360b4..d1fb111 100644
--- a/Documentation/topics/dpdk/jumbo-frames.rst
+++ b/Documentation/topics/dpdk/jumbo-frames.rst
@@ -71,3 +71,54 @@ Jumbo frame support has been validated against 9728B frames, 
which is the
  largest frame size supported by Fortville NIC using the DPDK i40e driver, but
  larger frames and other DPDK NIC drivers may be supported. These cases are
  common for use cases involving East-West traffic only.
+
+---
+Multi-segment mbufs
+---
+
+Instead of increasing the size of mbufs within a mempool, such that each mbuf
+within the pool is large enough to contain an entire jumbo frame of a
+user-defined size, mbufs can be chained together instead. In this approach each
+mbuf in the chain stores a portion of the jumbo frame, by default ~2K bytes,
+irrespective of the user-requested MTU value. Since each mbuf in the chain is
+termed a segment, this approach is named "multi-segment mbufs".
+
+This approach may bring more flexibility in use cases where the maximum packet
+length may be hard to guess. For example, in cases where packets originate from
+sources marked for oflload (such as TSO), each packet may be larger than the
+MTU, and as such, when forwarding it to a DPDK port a single mbuf may not be
+enough to hold all of the packet's data.
+
+Multi-segment and single-segment mbufs are mutually exclusive, and the user
+must decide on which approach to adopt on initialisation. If multi-segment
+mbufs is to be enabled, it can be done so with the following command::
+
+    $ ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
+
+Single-segment mbufs still remain the default when using OvS-DPDK, and the
+above option `dpdk-multi-seg-mbufs` must be explicitly set to `true` if
+multi-segment mbufs are to be used.
+
+~
+Performance notes
+~
+
+When using multi-segment mbufs some PMDs may not support vectorized Tx
+functions, due to its non-contiguous nature. As a result this can hit
+performance for smaller packet sizes. For example, on a setup sending 64B
+packets at line rate, a decrease of ~20% has been observed. The performance
+impact stops being noticeable for larger packet sizes, although the exact size
+will between PMDs, and depending on the architecture one's using.
+
+Tests performed with the i40e PMD driver only showed this limitation for 64B
+packets, and the same rate was observed when comparing multi-segment mbufs and
+single-segment mbuf for 128B packets. In other words, the 20% drop in
+performance was not observed for packets >= 128B during this test case.
+
+Because of this,

Re: [ovs-dev] [PATCH v4 12/15] netdev-dpdk: support multi-segment jumbo frames

2018-07-10 Thread Ilya Maximets

On 10.07.2018 15:30, Eelco Chaudron wrote:
> 
> 
> On 10 Jul 2018, at 13:06, Tiago Lam wrote:
> 
>> From: Mark Kavanagh 
>>
>> Currently, jumbo frame support for OvS-DPDK is implemented by
>> increasing the size of mbufs within a mempool, such that each mbuf
>> within the pool is large enough to contain an entire jumbo frame of
>> a user-defined size. Typically, for each user-defined MTU,
>> 'requested_mtu', a new mempool is created, containing mbufs of size
>> ~requested_mtu.
>>
>> With the multi-segment approach, a port uses a single mempool,
>> (containing standard/default-sized mbufs of ~2k bytes), irrespective
>> of the user-requested MTU value. To accommodate jumbo frames, mbufs
>> are chained together, where each mbuf in the chain stores a portion of
>> the jumbo frame. Each mbuf in the chain is termed a segment, hence the
>> name.
>>
>> == Enabling multi-segment mbufs ==
>> Multi-segment and single-segment mbufs are mutually exclusive, and the
>> user must decide on which approach to adopt on init. The introduction
>> of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This
>> is a global boolean value, which determines how jumbo frames are
>> represented across all DPDK ports. In the absence of a user-supplied
>> value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
>> mbufs must be explicitly enabled / single-segment mbufs remain the
>> default.
>>
>> Setting the field is identical to setting existing DPDK-specific OVSDB
>> fields:
>>
>>     ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
>>     ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
>>     ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
>> ==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
>>
>> Co-authored-by: Tiago Lam 
>>
>> Signed-off-by: Mark Kavanagh 
>> Signed-off-by: Tiago Lam 
>> Acked-by: Eelco Chaudron 
>> ---
>>  Documentation/topics/dpdk/jumbo-frames.rst | 51 +
>>  NEWS   |  2 +
>>  lib/dpdk.c |  8 +++
>>  lib/netdev-dpdk.c  | 92 
>> +-
>>  lib/netdev-dpdk.h  |  2 +
>>  vswitchd/vswitch.xml   | 23 
>>  6 files changed, 164 insertions(+), 14 deletions(-)
>>
>> diff --git a/Documentation/topics/dpdk/jumbo-frames.rst 
>> b/Documentation/topics/dpdk/jumbo-frames.rst
>> index 00360b4..d1fb111 100644
>> --- a/Documentation/topics/dpdk/jumbo-frames.rst
>> +++ b/Documentation/topics/dpdk/jumbo-frames.rst
>> @@ -71,3 +71,54 @@ Jumbo frame support has been validated against 9728B 
>> frames, which is the
>>  largest frame size supported by Fortville NIC using the DPDK i40e driver, 
>> but
>>  larger frames and other DPDK NIC drivers may be supported. These cases are
>>  common for use cases involving East-West traffic only.
>> +
>> +---
>> +Multi-segment mbufs
>> +---
>> +
>> +Instead of increasing the size of mbufs within a mempool, such that each 
>> mbuf
>> +within the pool is large enough to contain an entire jumbo frame of a
>> +user-defined size, mbufs can be chained together instead. In this approach 
>> each
>> +mbuf in the chain stores a portion of the jumbo frame, by default ~2K bytes,
>> +irrespective of the user-requested MTU value. Since each mbuf in the chain 
>> is
>> +termed a segment, this approach is named "multi-segment mbufs".
>> +
>> +This approach may bring more flexibility in use cases where the maximum 
>> packet
>> +length may be hard to guess. For example, in cases where packets originate 
>> from
>> +sources marked for oflload (such as TSO), each packet may be larger than the
>> +MTU, and as such, when forwarding it to a DPDK port a single mbuf may not be
>> +enough to hold all of the packet's data.
>> +
>> +Multi-segment and single-segment mbufs are mutually exclusive, and the user
>> +must decide on which approach to adopt on initialisation. If multi-segment
>> +mbufs is to be enabled, it can be done so with the following command::
>> +
>> +    $ ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
>> +
>> +Single-segment mbufs still remain the default when using OvS-DPDK, and the
>> +above option `dpdk-multi-seg-mbufs` must be explicitly set to `true` if
>> +multi-segment mbufs are to be used.
>> +
>> +~
>> +Performance notes
>> +~
>> +
>> +When using multi-segment mbufs some PMDs may not support vectorized Tx
>> +functions, due to its non-contiguous nature. As a result this can hit
>> +performance for smaller packet sizes. For example, on a setup sending 64B
>> +packets at line rate, a decrease of ~20% has been observed. The performance
>> +impact stops being noticeable for larger packet sizes, although the exact 
>> size
>> +will between PMDs, and depending on the architecture one's using.
>> +
>> +Tests performed with the i40e PMD driver only showed

Re: [ovs-dev] [PATCH v4 12/15] netdev-dpdk: support multi-segment jumbo frames

2018-07-10 Thread Eelco Chaudron





On 10 Jul 2018, at 13:06, Tiago Lam wrote:


From: Mark Kavanagh 

Currently, jumbo frame support for OvS-DPDK is implemented by
increasing the size of mbufs within a mempool, such that each mbuf
within the pool is large enough to contain an entire jumbo frame of
a user-defined size. Typically, for each user-defined MTU,
'requested_mtu', a new mempool is created, containing mbufs of size
~requested_mtu.

With the multi-segment approach, a port uses a single mempool,
(containing standard/default-sized mbufs of ~2k bytes), irrespective
of the user-requested MTU value. To accommodate jumbo frames, mbufs
are chained together, where each mbuf in the chain stores a portion of
the jumbo frame. Each mbuf in the chain is termed a segment, hence the
name.

== Enabling multi-segment mbufs ==
Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide on which approach to adopt on init. The introduction
of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This
is a global boolean value, which determines how jumbo frames are
represented across all DPDK ports. In the absence of a user-supplied
value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
mbufs must be explicitly enabled / single-segment mbufs remain the
default.

Setting the field is identical to setting existing DPDK-specific OVSDB
fields:

ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . 
other_config:dpdk-multi-seg-mbufs=true


Co-authored-by: Tiago Lam 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 Documentation/topics/dpdk/jumbo-frames.rst | 51 +
 NEWS   |  2 +
 lib/dpdk.c |  8 +++
 lib/netdev-dpdk.c  | 92 
+-

 lib/netdev-dpdk.h  |  2 +
 vswitchd/vswitch.xml   | 23 
 6 files changed, 164 insertions(+), 14 deletions(-)

diff --git a/Documentation/topics/dpdk/jumbo-frames.rst 
b/Documentation/topics/dpdk/jumbo-frames.rst

index 00360b4..d1fb111 100644
--- a/Documentation/topics/dpdk/jumbo-frames.rst
+++ b/Documentation/topics/dpdk/jumbo-frames.rst
@@ -71,3 +71,54 @@ Jumbo frame support has been validated against 
9728B frames, which is the
 largest frame size supported by Fortville NIC using the DPDK i40e 
driver, but
 larger frames and other DPDK NIC drivers may be supported. These 
cases are

 common for use cases involving East-West traffic only.
+
+---
+Multi-segment mbufs
+---
+
+Instead of increasing the size of mbufs within a mempool, such that 
each mbuf

+within the pool is large enough to contain an entire jumbo frame of a
+user-defined size, mbufs can be chained together instead. In this 
approach each
+mbuf in the chain stores a portion of the jumbo frame, by default ~2K 
bytes,
+irrespective of the user-requested MTU value. Since each mbuf in the 
chain is

+termed a segment, this approach is named "multi-segment mbufs".
+
+This approach may bring more flexibility in use cases where the 
maximum packet
+length may be hard to guess. For example, in cases where packets 
originate from
+sources marked for oflload (such as TSO), each packet may be larger 
than the
+MTU, and as such, when forwarding it to a DPDK port a single mbuf may 
not be

+enough to hold all of the packet's data.
+
+Multi-segment and single-segment mbufs are mutually exclusive, and 
the user
+must decide on which approach to adopt on initialisation. If 
multi-segment
+mbufs is to be enabled, it can be done so with the following 
command::

+
+$ ovs-vsctl set Open_vSwitch . 
other_config:dpdk-multi-seg-mbufs=true

+
+Single-segment mbufs still remain the default when using OvS-DPDK, 
and the
+above option `dpdk-multi-seg-mbufs` must be explicitly set to `true` 
if

+multi-segment mbufs are to be used.
+
+~
+Performance notes
+~
+
+When using multi-segment mbufs some PMDs may not support vectorized 
Tx

+functions, due to its non-contiguous nature. As a result this can hit
+performance for smaller packet sizes. For example, on a setup sending 
64B
+packets at line rate, a decrease of ~20% has been observed. The 
performance
+impact stops being noticeable for larger packet sizes, although the 
exact size

+will between PMDs, and depending on the architecture one's using.
+
+Tests performed with the i40e PMD driver only showed this limitation 
for 64B
+packets, and the same rate was observed when comparing multi-segment 
mbufs and

+single-segment mbuf for 128B packets. In other words, the 20% drop in
+performance was not observed for packets >= 128B during this test 
case.

+
+Because of this, multi-segment mbufs is not advised to be used with 
smaller

[ovs-dev] How many are queues available at egress port by default ? - Please reply !

2018-07-10 Thread rakesh kumar

I am trying to implement 8 queues at egress port in the switch and will
queue the packets based on the priority I need the current file name which
is handing the queuing for OVS .

Regards
Rakesh Kumar
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [ovs-dev, v4] dpif-netdev: Avoid reordering of packets in a batch with same megaflow

2018-07-10 Thread Ilya Maximets

On 10.07.2018 14:14, Vishal Deep Ajmera wrote:
>>> +
>>> +/* preserve the order of packet for flow batching */
>>> +index_map[packets_->count - 1] = map_cnt;
>>
>> Using the "packets_->count" without accessor + inside a refill loop is highly
>> inaccurate/potentially dangerous. Isn't it equal to "n_missed"?
>>
> Hi Ilya,
> 
> Thanks for the review. However I could not understand your point about
> inaccurate/potentially dangerous code. Can you elaborate on a scenario 
> where this is likely to fail/crash ?

This is potentially dangerous from the future modifications and hard to
read for reviewer/person who tries to understand how it works.

Current implementation will fail if someone will change the logic of
'DP_PACKET_BATCH_REFILL_FOR_EACH', for example.

The key point is that 'DP_PACKET_BATCH_REFILL_FOR_EACH' and
'dp_packet_batch_refill' manipulates with the batch size in a hidden
from the end user manner. Using of the 'packets_->count' directly
requires knowledge of how refilling works internally, but these
functions was intended to hide internals of batch manipulations.

Also, as I understand, you're storing 'map_cnt' for each missed packet.
So, why not just use 'n_missed' for that purpose?

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [ovs-dev, v4] dpif-netdev: Avoid reordering of packets in a batch with same megaflow

2018-07-10 Thread Vishal Deep Ajmera

> > +
> > +/* preserve the order of packet for flow batching */
> > +index_map[packets_->count - 1] = map_cnt;
> 
> Using the "packets_->count" without accessor + inside a refill loop is highly
> inaccurate/potentially dangerous. Isn't it equal to "n_missed"?
> 
Hi Ilya,

Thanks for the review. However I could not understand your point about
inaccurate/potentially dangerous code. Can you elaborate on a scenario 
where this is likely to fail/crash ?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] table: New function table_format() for formatting a table as a string.

2018-07-10 Thread Jakub Sitnicki

On Mon,  9 Jul 2018 16:34:00 -0700
Ben Pfaff  wrote:

> This will be useful for daemonized ovn-nbctl.
> 
> Signed-off-by: Ben Pfaff 
> ---

Acked-by: Jakub Sitnicki 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v4 01/15] netdev-dpdk: Differentiate between mtu/mbuf size.

2018-07-10 Thread Lam, Tiago

On 10/07/2018 12:06, Tiago Lam wrote:
> When configuring a mempool, in netdev_dpdk_mempool_configure(), the
> result of a call to dpdk_buf_size() is being used as the MTU. This,
> however, is not the MTU but a ROUND_UP aligned number based on the MTU,
> which could lead to the reuse of mempools even when the real MTUs
> between different ports differ but somehow round up to the same mbuf
> size.
> 
> To support multi-segment mbufs, which is coming in later commits, this
> distinction is important. For multi-segment mbufs the mbuf size is
> always the same (multiple mbufs are chained together to hold the
> packet's data), which means that using the mbuf size as the mtu would
> lead to reusing always the same mempool, independent of the MTU set.
> 
> To fix this, two fields are now passed to dpdk_mp_create(), the mtu and
> the mbuf_size, thus creating a distinction between the two. The latter
> is used for telling rte_pktmbuf_pool_create() the size of each mbuf,
> while the former is used for tracking purposes in struct dpdk_mp and as
> an input to create a unique hash for the mempool's name.
> 
> Signed-off-by: Tiago Lam 
> ---
>  lib/netdev-dpdk.c | 22 +-
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 

Hi Ian,

This patch changes the way the dpdk_mp struct tracks the "mtu", which
was introduced in "Support both shared and per port mempools". Instead
of assigning the "buf_size" to the "mtu" member in the dpdk_mp struct,
we assign the "requested_mtu".

This was needed because for multi-segment mbufs the "mbuf_size" may end
up being always the same even for different MTUs (as the mbufs are
chained together to hold all data), which meant the same mempool would
end up being reused.

I don't expect this to change the functionality (ran a few tests myself
to confirm the correct behavior), but does this sound OK to you? Any
thoughts here?

Thanks,
Tiago.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v4 14/15] dpdk-tests: Accept other configs in OVS_DPDK_START

2018-07-10 Thread Tiago Lam

As it stands, OVS_DPDK_START() won't allow other configs to be set
before starting the ovs-vswitchd daemon. This is a problem since some
configs, such as the "dpdk-multi-seg-mbufs=true" for enabling the
multi-segment mbufs, need to be set prior to start OvS.

To support other options, OVS_DPDK_START() has been modified to accept
extra configs in the form "$config_name=$config_value". It then uses
ovs-vsctl to set the configs.

Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 tests/system-dpdk-macros.at | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tests/system-dpdk-macros.at b/tests/system-dpdk-macros.at
index 0762ee0..485a0f3 100644
--- a/tests/system-dpdk-macros.at
+++ b/tests/system-dpdk-macros.at
@@ -21,7 +21,7 @@ m4_define([OVS_DPDK_PRE_CHECK],
 ])
 
 
-# OVS_DPDK_START()
+# OVS_DPDK_START([other-conf-args])
 #
 # Create an empty database and start ovsdb-server. Add special configuration
 # dpdk-init to enable DPDK functionality. Start ovs-vswitchd connected to that
@@ -48,6 +48,11 @@ m4_define([OVS_DPDK_START],
AT_CHECK([lscpu], [], [stdout])
AT_CHECK([cat stdout | grep "NUMA node(s)" | awk '{c=1; while (c++<$(3)) 
{printf "1024,"}; print "1024"}' > SOCKET_MEM])
AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . 
other_config:dpdk-socket-mem="$(cat SOCKET_MEM)"])
+   dnl Iterate through $other-conf-args list and include them
+   m4_foreach_w(opt, $1, [
+   AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:opt])
+   ])
+   dnl m4_if([$1], [], [], [AT_CHECK(["$1"])])
 
dnl Start ovs-vswitchd.
AT_CHECK([ovs-vswitchd --detach --no-chdir --pidfile --log-file -vvconn 
-vofproto_dpif -vunixctl], [0], [stdout], [stderr])
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v4 15/15] dpdk-tests: End-to-end tests for multi-seg mbufs.

2018-07-10 Thread Tiago Lam

The following tests are added to the DPDK testsuite to add some
coverage for the multi-segment mbufs:
- Check that multi-segment mbufs are disabled by default;
- Check that providing `other_config:dpdk-multi-seg-mbufs=true` indeed
  enables mbufs;
- Using a DPDK port, send a random packet out and check that `ofctl
  dump-flows` shows the correct amount of packets and bytes sent.

Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 tests/system-dpdk.at | 65 
 1 file changed, 65 insertions(+)

diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
index 3d21b01..7468d49 100644
--- a/tests/system-dpdk.at
+++ b/tests/system-dpdk.at
@@ -71,3 +71,68 @@ OVS_VSWITCHD_STOP("/does not exist. The Open vSwitch kernel 
module is probably n
 ")
 AT_CLEANUP
 dnl --
+
+AT_SETUP([Jumbo frames - Multi-segment disabled by default])
+OVS_DPDK_START()
+
+AT_CHECK([grep "multi-segment mbufs enabled" ovs-vswitchd.log], [1], [])
+OVS_VSWITCHD_STOP("/Global register is changed during/d
+/EAL: No free hugepages reported in hugepages-1048576kB/d
+")
+AT_CLEANUP
+
+AT_SETUP([Jumbo frames - Multi-segment enabled])
+OVS_DPDK_START([dpdk-multi-seg-mbufs=true])
+AT_CHECK([grep "multi-segment mbufs enabled" ovs-vswitchd.log], [], [stdout])
+OVS_VSWITCHD_STOP("/Global register is changed during/d
+/EAL: No free hugepages reported in hugepages-1048576kB/d
+")
+AT_CLEANUP
+
+AT_SETUP([Jumbo frames - Multi-segment mbufs Tx])
+OVS_DPDK_PRE_CHECK()
+OVS_DPDK_START([dpdk-multi-seg-mbufs=true])
+
+dnl Add userspace bridge and attach it to OVS
+AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
+AT_CHECK([ovs-vsctl add-port br10 dpdk0 \
+-- set Interface dpdk0 type=dpdk options:dpdk-devargs=$(cat PCI_ADDR) \
+-- set Interface dpdk0 mtu_request=9000], [], [stdout], [stderr])
+
+AT_CHECK([ovs-vsctl show], [], [stdout])
+
+dnl Add flows to send packets out from the 'dpdk0' port
+AT_CHECK([
+ovs-ofctl del-flows br10
+ovs-ofctl add-flow br10 in_port=LOCAL,actions=output:dpdk0
+], [], [stdout])
+
+AT_CHECK([ovs-ofctl dump-flows br10], [], [stdout])
+
+dnl Send packet out, of the 'dpdk0' port
+AT_CHECK([
+ARP_HEADER="09000B0009000A000806000108000604000100010A\
+0100020A02"
+dnl Build a random hex string to append to the ARP_HEADER
+RANDOM_BODY=$(printf '0102030405%.0s' {1..1750})
+dnl 8792B ARP packet
+RANDOM_ARP="$ARP_HEADER$RANDOM_BODY"
+
+ovs-ofctl packet-out br10 "packet=$RANDOM_ARP,action=resubmit:LOCAL"
+], [], [stdout])
+
+AT_CHECK([ovs-ofctl dump-flows br10], [0], [stdout])
+
+dnl Confirm the single packet as been sent with correct size
+AT_CHECK([ovs-ofctl dump-flows br10 | ofctl_strip | grep in_port], [0], [dnl
+ n_packets=1, n_bytes=8792, in_port=LOCAL actions=output:1
+])
+
+dnl Clean up
+OVS_VSWITCHD_STOP("/does not exist. The Open vSwitch kernel module is probably 
not loaded./d
+/Failed to enable flow control/d
+/failed to connect to \/tmp\/dpdkvhostclient0: No such file or directory/d
+/Global register is changed during/d
+/EAL: No free hugepages reported in hugepages-1048576kB/d
+")
+AT_CLEANUP
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v4 13/15] dpdk-tests: Add uni-tests for multi-seg mbufs.

2018-07-10 Thread Tiago Lam

In order to create a minimal environment that allows the tests to get
mbufs from an existing mempool, the following approach is taken:
- EAL is initialised (by using the main dpdk_init()) and a (very) small
  mempool is instantiated (mimicking the logic in dpdk_mp_create()).
  This mempool instance is global and used by all the tests;
- Packets are then allocated from the instantiated mempool, and tested
  on, by running some operations on them and manipulating data.

The tests introduced focus on testing DPDK dp_packets (where
source=DPBUF_DPDK), linked with a single or multiple mbufs, across
several operations, such as:
- dp_packet_put();
- dp_packet_shift();
- dp_packet_reserve();
- dp_packet_push_uninit();
- dp_packet_clear();
- And as a consequence of some of these, dp_packet_put_uninit() and
  dp_packet_resize__().

Finally, this has also been integrated with the new DPDK testsuite.
Thus, when running `$sudo make check-dpdk` one will also be running
these tests.

Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 tests/automake.mk  |  10 +-
 tests/dpdk-packet-mbufs.at |   7 +
 tests/system-dpdk-testsuite.at |   1 +
 tests/test-dpdk-mbufs.c| 518 +
 4 files changed, 535 insertions(+), 1 deletion(-)
 create mode 100644 tests/dpdk-packet-mbufs.at
 create mode 100644 tests/test-dpdk-mbufs.c

diff --git a/tests/automake.mk b/tests/automake.mk
index 8224e5a..b3941d0 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -134,7 +134,8 @@ SYSTEM_DPDK_TESTSUITE_AT = \
tests/system-common-macros.at \
tests/system-dpdk-macros.at \
tests/system-dpdk-testsuite.at \
-   tests/system-dpdk.at
+   tests/system-dpdk.at \
+   tests/dpdk-packet-mbufs.at
 
 check_SCRIPTS += tests/atlocal
 
@@ -391,6 +392,10 @@ tests_ovstest_SOURCES = \
tests/test-vconn.c \
tests/test-aa.c \
tests/test-stopwatch.c
+if DPDK_NETDEV
+tests_ovstest_SOURCES = \
+   tests/test-dpdk-mbufs.c
+endif
 
 if !WIN32
 tests_ovstest_SOURCES += \
@@ -403,6 +408,9 @@ tests_ovstest_SOURCES += \
 endif
 
 tests_ovstest_LDADD = lib/libopenvswitch.la ovn/lib/libovn.la
+if DPDK_NETDEV
+tests_ovstest_LDFLAGS = $(AM_LDFLAGS) $(DPDK_vswitchd_LDFLAGS)
+endif
 
 noinst_PROGRAMS += tests/test-strtok_r
 tests_test_strtok_r_SOURCES = tests/test-strtok_r.c
diff --git a/tests/dpdk-packet-mbufs.at b/tests/dpdk-packet-mbufs.at
new file mode 100644
index 000..f28e4fc
--- /dev/null
+++ b/tests/dpdk-packet-mbufs.at
@@ -0,0 +1,7 @@
+AT_BANNER([OVS-DPDK dp_packet unit tests])
+
+AT_SETUP([OVS-DPDK dp_packet - mbufs allocation])
+AT_KEYWORDS([dp_packet, multi-seg, mbufs])
+AT_CHECK(ovstest test-dpdk-packet, [], [ignore], [ignore])
+
+AT_CLEANUP
diff --git a/tests/system-dpdk-testsuite.at b/tests/system-dpdk-testsuite.at
index 382f09e..f5edf58 100644
--- a/tests/system-dpdk-testsuite.at
+++ b/tests/system-dpdk-testsuite.at
@@ -23,3 +23,4 @@ m4_include([tests/system-common-macros.at])
 m4_include([tests/system-dpdk-macros.at])
 
 m4_include([tests/system-dpdk.at])
+m4_include([tests/dpdk-packet-mbufs.at])
diff --git a/tests/test-dpdk-mbufs.c b/tests/test-dpdk-mbufs.c
new file mode 100644
index 000..1c77038
--- /dev/null
+++ b/tests/test-dpdk-mbufs.c
@@ -0,0 +1,518 @@
+/*
+ * Copyright (c) 2018 Intel Corporation
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "dp-packet.h"
+#include "ovstest.h"
+#include "dpdk.h"
+#include "smap.h"
+
+#define N_MBUFS 1024
+#define MBUF_DATA_LEN 2048
+
+int num_tests = 0;
+
+/* Global var to hold a mempool instance, "test-mp", used in all of the tests
+ * below. This instance is instantiated in dpdk_setup_eal_with_mp(). */
+static struct rte_mempool *mp;
+
+/* Test data used to fill the packets with data. Note that this isn't a string
+ * that repsents a valid packet, by any means. The pattern is generated in set_
+ * testing_pattern_str() and the sole purpose is to verify the data remains the
+ * same after inserting and operating on multi-segment mbufs. */
+static char *test_str;
+
+/* Asserts a dp_packet that holds a single mbuf, where:
+ * - nb_segs must be 1;
+ * - pkt_len must be equal to data_len which in turn must equal the provided
+ *   'pkt_len';
+ * - data_off must start at the provided 'data_ofs';
+ * - next must be NULL. */
+static void
+assert_single_mbuf(struct dp_packet

[ovs-dev] [PATCH v4 12/15] netdev-dpdk: support multi-segment jumbo frames

2018-07-10 Thread Tiago Lam

From: Mark Kavanagh 

Currently, jumbo frame support for OvS-DPDK is implemented by
increasing the size of mbufs within a mempool, such that each mbuf
within the pool is large enough to contain an entire jumbo frame of
a user-defined size. Typically, for each user-defined MTU,
'requested_mtu', a new mempool is created, containing mbufs of size
~requested_mtu.

With the multi-segment approach, a port uses a single mempool,
(containing standard/default-sized mbufs of ~2k bytes), irrespective
of the user-requested MTU value. To accommodate jumbo frames, mbufs
are chained together, where each mbuf in the chain stores a portion of
the jumbo frame. Each mbuf in the chain is termed a segment, hence the
name.

== Enabling multi-segment mbufs ==
Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide on which approach to adopt on init. The introduction
of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This
is a global boolean value, which determines how jumbo frames are
represented across all DPDK ports. In the absence of a user-supplied
value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
mbufs must be explicitly enabled / single-segment mbufs remain the
default.

Setting the field is identical to setting existing DPDK-specific OVSDB
fields:

ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

Co-authored-by: Tiago Lam 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 Documentation/topics/dpdk/jumbo-frames.rst | 51 +
 NEWS   |  2 +
 lib/dpdk.c |  8 +++
 lib/netdev-dpdk.c  | 92 +-
 lib/netdev-dpdk.h  |  2 +
 vswitchd/vswitch.xml   | 23 
 6 files changed, 164 insertions(+), 14 deletions(-)

diff --git a/Documentation/topics/dpdk/jumbo-frames.rst 
b/Documentation/topics/dpdk/jumbo-frames.rst
index 00360b4..d1fb111 100644
--- a/Documentation/topics/dpdk/jumbo-frames.rst
+++ b/Documentation/topics/dpdk/jumbo-frames.rst
@@ -71,3 +71,54 @@ Jumbo frame support has been validated against 9728B frames, 
which is the
 largest frame size supported by Fortville NIC using the DPDK i40e driver, but
 larger frames and other DPDK NIC drivers may be supported. These cases are
 common for use cases involving East-West traffic only.
+
+---
+Multi-segment mbufs
+---
+
+Instead of increasing the size of mbufs within a mempool, such that each mbuf
+within the pool is large enough to contain an entire jumbo frame of a
+user-defined size, mbufs can be chained together instead. In this approach each
+mbuf in the chain stores a portion of the jumbo frame, by default ~2K bytes,
+irrespective of the user-requested MTU value. Since each mbuf in the chain is
+termed a segment, this approach is named "multi-segment mbufs".
+
+This approach may bring more flexibility in use cases where the maximum packet
+length may be hard to guess. For example, in cases where packets originate from
+sources marked for oflload (such as TSO), each packet may be larger than the
+MTU, and as such, when forwarding it to a DPDK port a single mbuf may not be
+enough to hold all of the packet's data.
+
+Multi-segment and single-segment mbufs are mutually exclusive, and the user
+must decide on which approach to adopt on initialisation. If multi-segment
+mbufs is to be enabled, it can be done so with the following command::
+
+$ ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
+
+Single-segment mbufs still remain the default when using OvS-DPDK, and the
+above option `dpdk-multi-seg-mbufs` must be explicitly set to `true` if
+multi-segment mbufs are to be used.
+
+~
+Performance notes
+~
+
+When using multi-segment mbufs some PMDs may not support vectorized Tx
+functions, due to its non-contiguous nature. As a result this can hit
+performance for smaller packet sizes. For example, on a setup sending 64B
+packets at line rate, a decrease of ~20% has been observed. The performance
+impact stops being noticeable for larger packet sizes, although the exact size
+will between PMDs, and depending on the architecture one's using.
+
+Tests performed with the i40e PMD driver only showed this limitation for 64B
+packets, and the same rate was observed when comparing multi-segment mbufs and
+single-segment mbuf for 128B packets. In other words, the 20% drop in
+performance was not observed for packets >= 128B during this test case.
+
+Because of this, multi-segment mbufs is not advised to be used with smaller
+packet sizes, such as 64B.
+
+Also, note that using multi-segment mbufs won't improve

[ovs-dev] [PATCH v4 11/15] netdev-dpdk: copy large packet to multi-seg. mbufs

2018-07-10 Thread Tiago Lam

From: Mark Kavanagh 

Currently, packets are only copied to a single segment in the function
dpdk_do_tx_copy(). This could be an issue in the case of jumbo frames,
particularly when multi-segment mbufs are involved.

This patch calculates the number of segments needed by a packet and
copies the data to each segment.

A new function, dpdk_buf_alloc(), has also been introduced as a wrapper
around the nonpmd_mp_mutex to serialise allocations from a non-pmd
context.

Co-authored-by: Michael Qiu 
Co-authored-by: Tiago Lam 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Michael Qiu 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/netdev-dpdk.c | 79 ---
 1 file changed, 70 insertions(+), 9 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0a3eb3a..81dd1ed 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -552,6 +552,22 @@ dpdk_rte_mzalloc(size_t sz)
 return rte_zmalloc(OVS_VPORT_DPDK, sz, OVS_CACHE_LINE_SIZE);
 }
 
+static struct rte_mbuf *
+dpdk_buf_alloc(struct rte_mempool *mp)
+{
+if (!dpdk_thread_is_pmd()) {
+ovs_mutex_lock(_mp_mutex);
+}
+
+struct rte_mbuf *mbuf = rte_pktmbuf_alloc(mp);
+
+if (!dpdk_thread_is_pmd()) {
+ovs_mutex_unlock(_mp_mutex);
+}
+
+return mbuf;
+}
+
 void
 free_dpdk_buf(struct dp_packet *packet)
 {
@@ -2313,6 +2329,49 @@ out:
 }
 }
 
+static int
+dpdk_copy_dp_packet_to_mbuf(struct dp_packet *packet, struct rte_mbuf **head,
+struct rte_mempool *mp)
+{
+struct rte_mbuf *mbuf, *fmbuf;
+uint32_t size = dp_packet_size(packet);
+uint16_t max_data_len;
+uint32_t nb_segs = 0;
+
+/* Allocate first mbuf to know the size of data available */
+fmbuf = mbuf = *head = dpdk_buf_alloc(mp);
+if (OVS_UNLIKELY(!mbuf)) {
+return ENOMEM;
+}
+
+/* All new allocated mbuf's max data len is the same */
+max_data_len = mbuf->buf_len - mbuf->data_off;
+
+/* Calculate # of output mbufs. */
+nb_segs = size / max_data_len;
+if (size % max_data_len) {
+nb_segs = nb_segs + 1;
+}
+
+/* Allocate additional mbufs, less the one alredy allocated above */
+for (int i = 1; i < nb_segs; i++) {
+mbuf->next = dpdk_buf_alloc(mp);
+if (!mbuf->next) {
+free_dpdk_buf(CONTAINER_OF(fmbuf, struct dp_packet, mbuf));
+fmbuf = NULL;
+return ENOMEM;
+}
+mbuf = mbuf->next;
+}
+
+fmbuf->nb_segs = nb_segs;
+fmbuf->pkt_len = size;
+
+dp_packet_mbuf_write(fmbuf, 0, size, dp_packet_data(packet));
+
+return 0;
+}
+
 /* Tx function. Transmit packets indefinitely */
 static void
 dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
@@ -2329,6 +2388,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct 
dp_packet_batch *batch)
 struct rte_mbuf *pkts[PKT_ARRAY_SIZE];
 uint32_t cnt = batch_cnt;
 uint32_t dropped = 0;
+uint32_t i;
 
 if (dev->type != DPDK_DEV_VHOST) {
 /* Check if QoS has been configured for this netdev. */
@@ -2339,28 +2399,29 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct 
dp_packet_batch *batch)
 
 uint32_t txcnt = 0;
 
-for (uint32_t i = 0; i < cnt; i++) {
+for (i = 0; i < cnt; i++) {
 struct dp_packet *packet = batch->packets[i];
 uint32_t size = dp_packet_size(packet);
+int err = 0;
 
 if (OVS_UNLIKELY(size > dev->max_packet_len)) {
 VLOG_WARN_RL(, "Too big size %u max_packet_len %d",
  size, dev->max_packet_len);
-
 dropped++;
 continue;
 }
 
-pkts[txcnt] = rte_pktmbuf_alloc(dev->dpdk_mp->mp);
-if (OVS_UNLIKELY(!pkts[txcnt])) {
+err = dpdk_copy_dp_packet_to_mbuf(packet, [txcnt],
+  dev->dpdk_mp->mp);
+if (err != 0) {
+if (err == ENOMEM) {
+VLOG_ERR_RL(, "Failed to alloc mbufs! %u packets dropped",
+cnt - i);
+}
+
 dropped += cnt - i;
 break;
 }
-
-/* We have to do a copy for now */
-memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *),
-   dp_packet_data(packet), size);
-dp_packet_set_size((struct dp_packet *)pkts[txcnt], size);
 dp_packet_copy_mbuf_flags((struct dp_packet *)pkts[txcnt], packet);
 
 txcnt++;
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v4 09/15] dp-packet: Handle multi-seg mbufs in resize__().

2018-07-10 Thread Tiago Lam

When enabled with DPDK OvS relies on mbufs allocated by mempools to
receive and output data on DPDK ports. Until now, each OvS dp_packet has
had only one mbuf associated, which is allocated with the maximum
possible size, taking the MTU into account. This approach, however,
doesn't allow us to increase the allocated size in an mbuf, if needed,
since an mbuf is allocated and initialised upon mempool creation. Thus,
in the current implementatin this is dealt with by calling
OVS_NOT_REACHED() and terminating OvS.

To avoid this, and allow the (already) allocated space to be better
used, dp_packet_resize__() now tries to use the available room, both the
tailroom and the headroom, to make enough space for the new data. Since
this happens for packets of source DPBUF_DPDK, the single-segment mbuf
case mentioned above is also covered by this new aproach in resize__().

Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/dp-packet.c | 48 ++--
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index d6e19eb..87af459 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b, size_t 
new_headroom, size_t new_tailroom
 new_allocated = new_headroom + dp_packet_size(b) + new_tailroom;
 
 switch (b->source) {
+/* When resizing mbufs, both a single mbuf and multi-segment mbufs (where
+ * data is not contigously held in memory), both the headroom and the
+ * tailroom available will be used to make more space for where data needs
+ * to be inserted. I.e if there's not enough headroom, data may be shifted
+ * right if there's enough tailroom.
+ * However, this is not bulletproof and in some cases the space available
+ * won't be enough - in those cases, an error should be returned and the
+ * packet dropped. */
 case DPBUF_DPDK:
-OVS_NOT_REACHED();
+{
+size_t miss_len;
+
+if (new_headroom == dp_packet_headroom(b)) {
+/* This is a tailroom adjustment. Since there's no tailroom space
+ * left, try and shift data towards the head to free up tail space,
+ * if there's enough headroom */
+
+miss_len = new_tailroom - dp_packet_tailroom(b);
+
+if (miss_len <= new_headroom) {
+dp_packet_shift(b, -miss_len);
+} else {
+/* XXX: Handle error case and report error to caller */
+OVS_NOT_REACHED();
+}
+} else {
+/* Otherwise, this is a headroom adjustment. Try to shift data
+ * towards the tail to free up head space, if there's enough
+ * tailroom */
+
+miss_len = new_headroom - dp_packet_headroom(b);
 
+
+if (miss_len <= new_tailroom) {
+dp_packet_shift(b, miss_len);
+} else {
+/* XXX: Handle error case and report error to caller */
+OVS_NOT_REACHED();
+}
+}
+
+new_base = dp_packet_base(b);
+
+break;
+}
 case DPBUF_MALLOC:
 if (new_headroom == dp_packet_headroom(b)) {
 new_base = xrealloc(dp_packet_base(b), new_allocated);
@@ -263,7 +305,9 @@ dp_packet_resize__(struct dp_packet *b, size_t 
new_headroom, size_t new_tailroom
 OVS_NOT_REACHED();
 }
 
-dp_packet_set_allocated(b, new_allocated);
+if (b->source != DPBUF_DPDK) {
+dp_packet_set_allocated(b, new_allocated);
+}
 dp_packet_set_base(b, new_base);
 
 new_data = (char *) new_base + new_headroom;
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v4 10/15] dp-packet: copy data from multi-seg. DPDK mbuf

2018-07-10 Thread Tiago Lam

From: Michael Qiu 

When doing packet clone, if packet source is from DPDK driver,
multi-segment must be considered, and copy the segment's data one by
one.

Also, lots of DPDK mbuf's info is missed during a copy, like packet
type, ol_flags, etc.  That information is very important for DPDK to do
packets processing.

Co-authored-by: Mark Kavanagh 
Co-authored-by: Tiago Lam 

Signed-off-by: Michael Qiu 
Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/dp-packet.c   | 71 ++-
 lib/dp-packet.h   |  3 +++
 lib/netdev-dpdk.c |  1 +
 3 files changed, 64 insertions(+), 11 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 87af459..ae060e2 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -48,6 +48,23 @@ dp_packet_use__(struct dp_packet *b, void *base, size_t 
allocated,
 dp_packet_set_size(b, 0);
 }
 
+#ifdef DPDK_NETDEV
+void
+dp_packet_copy_mbuf_flags(struct dp_packet *dst, const struct dp_packet *src)
+{
+ovs_assert(dst != NULL && src != NULL);
+struct rte_mbuf *buf_dst = &(dst->mbuf);
+struct rte_mbuf buf_src = src->mbuf;
+
+buf_dst->nb_segs = buf_src.nb_segs;
+buf_dst->ol_flags = buf_src.ol_flags;
+buf_dst->packet_type = buf_src.packet_type;
+buf_dst->tx_offload = buf_src.tx_offload;
+}
+#else
+#define dp_packet_copy_mbuf_flags(arg1, arg2)
+#endif
+
 /* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of
  * memory starting at 'base'.  'base' should be the first byte of a region
  * obtained from malloc().  It will be freed (with free()) if 'b' is resized or
@@ -158,6 +175,44 @@ dp_packet_clone(const struct dp_packet *buffer)
 return dp_packet_clone_with_headroom(buffer, 0);
 }
 
+#ifdef DPDK_NETDEV
+struct dp_packet *
+dp_packet_clone_with_headroom(const struct dp_packet *b, size_t headroom) {
+struct dp_packet *new_buffer;
+uint32_t pkt_len = dp_packet_size(b);
+
+/* copy multi-seg data */
+if (b->source == DPBUF_DPDK && b->mbuf.nb_segs > 1) {
+void *dst = NULL;
+struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, >mbuf);
+
+new_buffer = dp_packet_new_with_headroom(pkt_len, headroom);
+dst = dp_packet_data(new_buffer);
+dp_packet_set_size(new_buffer, pkt_len);
+
+if (!rte_pktmbuf_read(mbuf, 0, pkt_len, dst)) {
+return NULL;
+}
+} else {
+new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(b),
+dp_packet_size(b),
+headroom);
+}
+
+/* Copy the following fields into the returned buffer: l2_pad_size,
+ * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
+memcpy(_buffer->l2_pad_size, >l2_pad_size,
+   sizeof(struct dp_packet) -
+   offsetof(struct dp_packet, l2_pad_size));
+
+dp_packet_copy_mbuf_flags(new_buffer, b);
+if (dp_packet_rss_valid(new_buffer)) {
+new_buffer->mbuf.hash.rss = b->mbuf.hash.rss;
+}
+
+return new_buffer;
+}
+#else
 /* Creates and returns a new dp_packet whose data are copied from 'buffer'.
  * The returned dp_packet will additionally have 'headroom' bytes of
  * headroom. */
@@ -165,32 +220,25 @@ struct dp_packet *
 dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom)
 {
 struct dp_packet *new_buffer;
+uint32_t pkt_len = dp_packet_size(buffer);
 
 new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer),
- dp_packet_size(buffer),
- headroom);
+ pkt_len, headroom);
+
 /* Copy the following fields into the returned buffer: l2_pad_size,
  * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
 memcpy(_buffer->l2_pad_size, >l2_pad_size,
 sizeof(struct dp_packet) -
 offsetof(struct dp_packet, l2_pad_size));
 
-#ifdef DPDK_NETDEV
-new_buffer->mbuf.ol_flags = buffer->mbuf.ol_flags;
-#else
 new_buffer->rss_hash_valid = buffer->rss_hash_valid;
-#endif
-
 if (dp_packet_rss_valid(new_buffer)) {
-#ifdef DPDK_NETDEV
-new_buffer->mbuf.hash.rss = buffer->mbuf.hash.rss;
-#else
 new_buffer->rss_hash = buffer->rss_hash;
-#endif
 }
 
 return new_buffer;
 }
+#endif
 
 /* Creates and returns a new dp_packet that initially contains a copy of the
  * 'size' bytes of data starting at 'data' with no headroom or tailroom. */
@@ -374,6 +422,7 @@ dp_packet_mbuf_write(struct rte_mbuf *mbuf, int16_t ofs, 
uint32_t len,
 len -= len_copy;
 ofs = 0;
 
+mbuf->data_len = len_copy;
 mbuf = mbuf->next;
 }
 }
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 6ca4e98..d087cb3 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -124,6 +124,9 @@ void

[ovs-dev] [PATCH v4 01/15] netdev-dpdk: Differentiate between mtu/mbuf size.

2018-07-10 Thread Tiago Lam

When configuring a mempool, in netdev_dpdk_mempool_configure(), the
result of a call to dpdk_buf_size() is being used as the MTU. This,
however, is not the MTU but a ROUND_UP aligned number based on the MTU,
which could lead to the reuse of mempools even when the real MTUs
between different ports differ but somehow round up to the same mbuf
size.

To support multi-segment mbufs, which is coming in later commits, this
distinction is important. For multi-segment mbufs the mbuf size is
always the same (multiple mbufs are chained together to hold the
packet's data), which means that using the mbuf size as the mtu would
lead to reusing always the same mempool, independent of the MTU set.

To fix this, two fields are now passed to dpdk_mp_create(), the mtu and
the mbuf_size, thus creating a distinction between the two. The latter
is used for telling rte_pktmbuf_pool_create() the size of each mbuf,
while the former is used for tracking purposes in struct dpdk_mp and as
an input to create a unique hash for the mempool's name.

Signed-off-by: Tiago Lam 
---
 lib/netdev-dpdk.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index bb4d60f..be8e8b9 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -632,7 +632,8 @@ dpdk_calculate_mbufs(struct netdev_dpdk *dev, int mtu, bool 
per_port_mp)
 }
 
 static struct dpdk_mp *
-dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
+dpdk_mp_create(struct netdev_dpdk *dev, int mtu, uint32_t mbuf_size,
+   bool per_port_mp)
 {
 char mp_name[RTE_MEMPOOL_NAMESIZE];
 const char *netdev_name = netdev_get_name(>up);
@@ -666,21 +667,23 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool 
per_port_mp)
 if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
 VLOG_DBG("snprintf returned %d. "
  "Failed to generate a mempool name for \"%s\". "
- "Hash:0x%x, socket_id: %d, mtu:%d, mbufs:%u.",
- ret, netdev_name, hash, socket_id, mtu, n_mbufs);
+ "Hash:0x%x, socket_id: %d, mtu:%d, mbufs:%u, "
+ "mbuf size: %u",
+ ret, netdev_name, hash, socket_id, mtu, n_mbufs,
+ mbuf_size);
 break;
 }
 
-VLOG_DBG("Port %s: Requesting a mempool of %u mbufs "
+VLOG_DBG("Port %s: Requesting a mempool of %u mbufs of size %u "
   "on socket %d for %d Rx and %d Tx queues.",
-  netdev_name, n_mbufs, socket_id,
+  netdev_name, n_mbufs, mbuf_size, socket_id,
   dev->requested_n_rxq, dev->requested_n_txq);
 
 dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs,
   MP_CACHE_SZ,
   sizeof (struct dp_packet)
   - sizeof (struct rte_mbuf),
-  MBUF_SIZE(mtu)
+  MBUF_SIZE(mbuf_size)
   - sizeof(struct dp_packet),
   socket_id);
 
@@ -718,7 +721,8 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool 
per_port_mp)
 }
 
 static struct dpdk_mp *
-dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
+dpdk_mp_get(struct netdev_dpdk *dev, int mtu, uint32_t mbuf_size,
+bool per_port_mp)
 {
 struct dpdk_mp *dmp, *next;
 bool reuse = false;
@@ -741,7 +745,7 @@ dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool 
per_port_mp)
 dpdk_mp_sweep();
 
 if (!reuse) {
-dmp = dpdk_mp_create(dev, mtu, per_port_mp);
+dmp = dpdk_mp_create(dev, mtu, mbuf_size, per_port_mp);
 if (dmp) {
 /* Shared memory will hit the reuse case above so will not
  * request a mempool that already exists but we need to check
@@ -807,7 +811,7 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
 return ret;
 }
 
-dmp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size), per_port_mp);
+dmp = dpdk_mp_get(dev, dev->requested_mtu, buf_size, per_port_mp);
 if (!dmp) {
 VLOG_ERR("Failed to create memory pool for netdev "
  "%s, with MTU %d on socket %d: %s\n",
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v4 08/15] dp-packet: Handle multi-seg mubfs in shift() func.

2018-07-10 Thread Tiago Lam

In its current implementation dp_packet_shift() is also unaware of
multi-seg mbufs (that holds data in memory non-contiguously) and assumes
that data exists contiguously in memory, memmove'ing data to perform the
shift.

To add support for multi-seg mbuds a new set of functions was
introduced, dp_packet_mbuf_shift() and dp_packet_mbuf_write(). These
functions are used by dp_packet_shift(), when handling multi-seg mbufs,
to shift and write data within a chain of mbufs.

Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/dp-packet.c | 97 +
 lib/dp-packet.h | 10 ++
 2 files changed, 107 insertions(+)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 2aaeaae..d6e19eb 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -294,6 +294,97 @@ dp_packet_prealloc_headroom(struct dp_packet *b, size_t 
size)
 }
 }
 
+#ifdef DPDK_NETDEV
+/* Write len data bytes in a mbuf at specified offset.
+ *
+ * 'mbuf', pointer to the destination mbuf where 'ofs' is, and the mbuf where
+ * the data will first be written.
+ * 'ofs', the offset within the provided 'mbuf' where 'data' is to be written.
+ * 'len', the size of the to be written 'data'.
+ * 'data', pointer to the to be written bytes.
+ *
+ * XXX: This function is the counterpart of the `rte_pktmbuf_read()` function
+ * available with DPDK, in the rte_mbuf.h */
+void
+dp_packet_mbuf_write(struct rte_mbuf *mbuf, int16_t ofs, uint32_t len,
+ const void *data)
+{
+char *dst_addr;
+uint16_t data_len;
+int len_copy;
+while (mbuf) {
+if (len == 0) {
+break;
+}
+
+dst_addr = rte_pktmbuf_mtod_offset(mbuf, char *, ofs);
+data_len = MBUF_BUF_END(mbuf->buf_addr, mbuf->buf_len) - dst_addr;
+
+len_copy = MIN(len, data_len);
+/* We don't know if 'data' is the result of a rte_pktmbuf_read() call,
+ * in which case we may end up writing to the same region of memory we
+ * are reading from and overlapping. Hence the use of memmove() here */
+memmove(dst_addr, data, len_copy);
+
+data = ((char *) data) + len_copy;
+len -= len_copy;
+ofs = 0;
+
+mbuf = mbuf->next;
+}
+}
+
+static void
+dp_packet_mbuf_shift_(struct rte_mbuf *dbuf, int16_t dst_ofs,
+  const struct rte_mbuf *sbuf, uint16_t src_ofs, int len)
+{
+char rd[len];
+const char *wd = rte_pktmbuf_read(sbuf, src_ofs, len, rd);
+
+ovs_assert(wd);
+
+dp_packet_mbuf_write(dbuf, dst_ofs, len, wd);
+}
+
+/* Similarly to dp_packet_shift(), shifts the data within the mbufs of a
+ * dp_packet of DPBUF_DPDK source by 'delta' bytes.
+ * Caller must make sure of the following conditions:
+ * - When shifting left, delta can't be bigger than the data_len available in
+ *   the last mbuf;
+ * - When shifting right, delta can't be bigger than the space available in the
+ *   first mbuf (buf_len - data_off).
+ * Both these conditions guarantee that a shift operation doesn't fall outside
+ * the bounds of the existing mbufs, so that the first and last mbufs (when
+ * using multi-segment mbufs), remain the same. */
+static void
+dp_packet_mbuf_shift(struct dp_packet *b, int delta)
+{
+uint16_t src_ofs;
+int16_t dst_ofs;
+
+struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, >mbuf);
+struct rte_mbuf *tmbuf = rte_pktmbuf_lastseg(mbuf);
+
+if (delta < 0) {
+ovs_assert(-delta <= tmbuf->data_len);
+} else {
+ovs_assert(delta < (mbuf->buf_len - mbuf->data_off));
+}
+
+/* Set the destination and source offsets to copy to */
+dst_ofs = delta;
+src_ofs = 0;
+
+/* Shift data from src mbuf and offset to dst mbuf and offset */
+dp_packet_mbuf_shift_(mbuf, dst_ofs, mbuf, src_ofs,
+  rte_pktmbuf_pkt_len(mbuf));
+
+/* Update mbufs' properties, and if using multi-segment mbufs, first and
+ * last mbuf's data_len also needs to be adjusted */
+mbuf->data_off = mbuf->data_off + dst_ofs;
+}
+#endif
+
 /* Shifts all of the data within the allocated space in 'b' by 'delta' bytes.
  * For example, a 'delta' of 1 would cause each byte of data to move one byte
  * forward (from address 'p' to 'p+1'), and a 'delta' of -1 would cause each
@@ -306,6 +397,12 @@ dp_packet_shift(struct dp_packet *b, int delta)
: true);
 
 if (delta != 0) {
+#ifdef DPDK_NETDEV
+if (b->source == DPBUF_DPDK) {
+dp_packet_mbuf_shift(b, delta);
+return;
+}
+#endif
 char *dst = (char *) dp_packet_data(b) + delta;
 memmove(dst, dp_packet_data(b), dp_packet_size(b));
 dp_packet_set_data(b, dst);
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 14e2551..6ca4e98 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -80,6 +80,11 @@ struct dp_packet {
 };
 };
 
+#ifdef DPDK_NETDEV
+#define MBUF_BUF_END(BUF_ADDR, BUF_LEN) \
+(char *) (((char *) BUF_ADDR)

[ovs-dev] [PATCH v4 03/15] dp-packet: Init specific mbuf fields.

2018-07-10 Thread Tiago Lam

From: Mark Kavanagh 

dp_packets are created using xmalloc(); in the case of OvS-DPDK, it's
possible the the resultant mbuf portion of the dp_packet contains
random data. For some mbuf fields, specifically those related to
multi-segment mbufs and/or offload features, random values may cause
unexpected behaviour, should the dp_packet's contents be later copied
to a DPDK mbuf. It is critical therefore, that these fields should be
initialized to 0.

This patch ensures that the following mbuf fields are initialized to
appropriate values on creation of a new dp_packet:
   - ol_flags=0
   - nb_segs=1
   - tx_offload=0
   - packet_type=0
   - next=NULL

Adapted from an idea by Michael Qiu :
https://patchwork.ozlabs.org/patch/777570/

Co-authored-by: Tiago Lam 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/dp-packet.h | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index ba91e58..b948fe1 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -625,14 +625,15 @@ dp_packet_mbuf_rss_flag_reset(struct dp_packet *p 
OVS_UNUSED)
 }
 
 /* This initialization is needed for packets that do not come
- * from DPDK interfaces, when vswitchd is built with --with-dpdk.
- * The DPDK rte library will still otherwise manage the mbuf.
- * We only need to initialize the mbuf ol_flags. */
+ * from DPDK interfaces, when vswitchd is built with --with-dpdk. */
 static inline void
 dp_packet_mbuf_init(struct dp_packet *p OVS_UNUSED)
 {
 #ifdef DPDK_NETDEV
-p->mbuf.ol_flags = 0;
+struct rte_mbuf *mbuf = &(p->mbuf);
+mbuf->ol_flags = mbuf->tx_offload = mbuf->packet_type = 0;
+mbuf->nb_segs = 1;
+mbuf->next = NULL;
 #endif
 }
 
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v4 07/15] dp-packet: Handle multi-seg mbufs in helper funcs.

2018-07-10 Thread Tiago Lam

Most helper functions in dp-packet assume that the data held by a
dp_packet is contiguous, and perform operations such as pointer
arithmetic under that assumption. However, with the introduction of
multi-segment mbufs, where data is non-contiguous, such assumptions are
no longer possible. Some examples of Such helper functions are
dp_packet_tail(), dp_packet_tailroom(), dp_packet_end(),
dp_packet_get_allocated() and dp_packet_at().

Thus, instead of assuming contiguous data in dp_packet, they  now
iterate over the (non-contiguous) data in mbufs to perform their
calculations.

Finally, dp_packet_use__() has also been modified to perform the
initialisation of the packet (and setting the source) before continuing
to set its size and data length, which now depends on the type of
packet.

Co-authored-by: Mark Kavanagh 

Signed-off-by: Mark Kavanagh 
Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/dp-packet.c |   4 +-
 lib/dp-packet.h | 114 +---
 2 files changed, 111 insertions(+), 7 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 782e7c2..2aaeaae 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -41,11 +41,11 @@ static void
 dp_packet_use__(struct dp_packet *b, void *base, size_t allocated,
  enum dp_packet_source source)
 {
+dp_packet_init__(b, allocated, source);
+
 dp_packet_set_base(b, base);
 dp_packet_set_data(b, base);
 dp_packet_set_size(b, 0);
-
-dp_packet_init__(b, allocated, source);
 }
 
 /* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index d2803af..14e2551 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -185,9 +185,25 @@ dp_packet_delete(struct dp_packet *b)
 static inline void *
 dp_packet_at(const struct dp_packet *b, size_t offset, size_t size)
 {
-return offset + size <= dp_packet_size(b)
-   ? (char *) dp_packet_data(b) + offset
-   : NULL;
+if (offset + size > dp_packet_size(b)) {
+return NULL;
+}
+
+#ifdef DPDK_NETDEV
+if (b->source == DPBUF_DPDK) {
+struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, >mbuf);
+
+while (buf && offset > buf->data_len) {
+offset -= buf->data_len;
+
+buf = buf->next;
+}
+
+return buf ? rte_pktmbuf_mtod_offset(buf, char *, offset) : NULL;
+}
+#endif
+
+return (char *) dp_packet_data(b) + offset;
 }
 
 /* Returns a pointer to byte 'offset' in 'b', which must contain at least
@@ -196,13 +212,23 @@ static inline void *
 dp_packet_at_assert(const struct dp_packet *b, size_t offset, size_t size)
 {
 ovs_assert(offset + size <= dp_packet_size(b));
-return ((char *) dp_packet_data(b)) + offset;
+return dp_packet_at(b, offset, size);
 }
 
 /* Returns a pointer to byte following the last byte of data in use in 'b'. */
 static inline void *
 dp_packet_tail(const struct dp_packet *b)
 {
+#ifdef DPDK_NETDEV
+if (b->source == DPBUF_DPDK) {
+struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, >mbuf);
+/* Find last segment where data ends, meaning the tail of the chained
+ *  mbufs must be there */
+buf = rte_pktmbuf_lastseg(buf);
+
+return rte_pktmbuf_mtod_offset(buf, void *, buf->data_len);
+}
+#endif
 return (char *) dp_packet_data(b) + dp_packet_size(b);
 }
 
@@ -211,6 +237,15 @@ dp_packet_tail(const struct dp_packet *b)
 static inline void *
 dp_packet_end(const struct dp_packet *b)
 {
+#ifdef DPDK_NETDEV
+if (b->source == DPBUF_DPDK) {
+struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &(b->mbuf));
+
+buf = rte_pktmbuf_lastseg(buf);
+
+return (char *) buf->buf_addr + buf->buf_len;
+}
+#endif
 return (char *) dp_packet_base(b) + dp_packet_get_allocated(b);
 }
 
@@ -236,6 +271,15 @@ dp_packet_tailroom(const struct dp_packet *b)
 static inline void
 dp_packet_clear(struct dp_packet *b)
 {
+#ifdef DPDK_NETDEV
+if (b->source == DPBUF_DPDK) {
+/* sets pkt_len and data_len to zero and frees unused mbufs */
+dp_packet_set_size(b, 0);
+rte_pktmbuf_reset(>mbuf);
+
+return;
+}
+#endif
 dp_packet_set_data(b, dp_packet_base(b));
 dp_packet_set_size(b, 0);
 }
@@ -252,12 +296,33 @@ dp_packet_pull(struct dp_packet *b, size_t size)
 return data;
 }
 
+#ifdef DPDK_NETDEV
+/* Similar to dp_packet_try_pull() but doesn't actually pull any data, only
+ * checks if it could and returns true or false accordingly.
+ *
+ * Valid for dp_packets carrying mbufs only. */
+static inline bool
+dp_packet_mbuf_may_pull(const struct dp_packet *b, size_t size) {
+if (size > b->mbuf.data_len) {
+return false;
+}
+
+return true;
+}
+#endif
+
 /* If 'b' has at least 'size' bytes of data, removes that many bytes from the
  * head end of 'b' and returns the first byte removed.  Otherwise, returns a
  * null pointer without modifying

[ovs-dev] [PATCH v4 04/15] dp-packet: Fix allocated size on DPDK init.

2018-07-10 Thread Tiago Lam

When enabled with DPDK OvS deals with two types of packets, the ones
coming from the mempool and the ones locally created by OvS - which are
copied to mempool mbufs before output. In the latter, the space is
allocated from the system, while in the former the mbufs are allocated
from a mempool, which takes care of initialising them appropriately.

In the current implementation, during mempool's initialisation of mbufs,
dp_packet_set_allocated() is called from dp_packet_init_dpdk() without
considering that the allocated space, in the case of multi-segment
mbufs, might be greater than a single mbuf.  Furthermore, given that
dp_packet_init_dpdk() is on the code path that's called upon mempool's
initialisation, a call to dp_packet_set_allocated() is redundant, since
mempool takes care of initialising it.

To fix this, dp_packet_set_allocated() is no longer called after
initialisation of a mempool, only in dp_packet_init__(), which is still
called by OvS when initialising locally created packets.

Signed-off-by: Tiago Lam 
Acked-by: Eelco Chaudron 
---
 lib/dp-packet.c   | 3 +--
 lib/dp-packet.h   | 2 +-
 lib/netdev-dpdk.c | 2 +-
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 443c225..782e7c2 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -99,9 +99,8 @@ dp_packet_use_const(struct dp_packet *b, const void *data, 
size_t size)
  * buffer.  Here, non-transient ovs dp-packet fields are initialized for
  * packets that are part of a DPDK memory pool. */
 void
-dp_packet_init_dpdk(struct dp_packet *b, size_t allocated)
+dp_packet_init_dpdk(struct dp_packet *b)
 {
-dp_packet_set_allocated(b, allocated);
 b->source = DPBUF_DPDK;
 }
 
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index b948fe1..6376039 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -114,7 +114,7 @@ void dp_packet_use(struct dp_packet *, void *, size_t);
 void dp_packet_use_stub(struct dp_packet *, void *, size_t);
 void dp_packet_use_const(struct dp_packet *, const void *, size_t);
 
-void dp_packet_init_dpdk(struct dp_packet *, size_t allocated);
+void dp_packet_init_dpdk(struct dp_packet *);
 
 void dp_packet_init(struct dp_packet *, size_t);
 void dp_packet_uninit(struct dp_packet *);
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0b964c8..a68f78a 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -550,7 +550,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp OVS_UNUSED,
 {
 struct rte_mbuf *pkt = _p;
 
-dp_packet_init_dpdk((struct dp_packet *) pkt, pkt->buf_len);
+dp_packet_init_dpdk((struct dp_packet *) pkt);
 }
 
 static int
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

1 2 >

1 - 100 of 119 matches

Mail list logo