[ovs-dev] Re: [PATCH 0/3] userspace-tso: Improve L4 csum offload support.

2020-04-15 Thread D
Ethtool can turn off TSO separately:

$ ethtool -K vethXXX tso off

$ ethtool -K vethXXX tx off will turn off tx checksumming, TSO, and sg.

TSO depends on tx checksumming and sg, so if you just want to turn off TSO and
keep tx checksumming on, you can do it in the following way:

$ ethtool -K vethXXX tx on
$ ethtool -K vethXXX tso off
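
The same bits can be toggled programmatically. A minimal sketch using the
legacy ethtool ioctl interface (illustration only, not OVS code; "veth0" is an
example name and the set operation needs CAP_NET_ADMIN):

#include <linux/ethtool.h>
#include <linux/sockios.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Roughly what "ethtool -K veth0 tso off" does: disable TSO while leaving
 * tx checksumming untouched. */
int
main(void)
{
    struct ethtool_value eval = { .cmd = ETHTOOL_STSO, .data = 0 };
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0) {
        perror("socket");
        return 1;
    }
    memset(&ifr, 0, sizeof ifr);
    snprintf(ifr.ifr_name, IFNAMSIZ, "%s", "veth0");  /* Example name. */
    ifr.ifr_data = (void *) &eval;
    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
        perror("ETHTOOL_STSO");
    }
    close(fd);
    return 0;
}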


-----Original Message-----
From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of William Tu
Sent: April 16, 2020 7:49
To: Ilya Maximets
Cc: ; Flavio Leitner
Subject: Re: [ovs-dev] [PATCH 0/3] userspace-tso: Improve L4 csum offload support.

On Fri, Feb 28, 2020 at 7:34 AM Ilya Maximets  wrote:
>
> On 2/14/20 2:03 PM, Flavio Leitner wrote:
> > This patchset disables unsupported offload features for vhost device 
> > such as UFO and ECN.
> >
> > Then it includes UDP checksum offload as a must have to enable 
> > userspace TSO, but leave SCTP as optional. Only a few drivers 
> > support SCTP checksum offload and the protocol is not widely used.

Hi Flavio and Ilya,

I have a question about this.
If we do "other_config:userspace-tso-enable=true", it enables both TSO and
checksum offload.
Can we enable only the checksum offload, but not TSO, by making it a separate
configuration knob?

Currently, "make check-system-userspace" has to add "ethtool -K $1 tx off"
because there is no checksum support. If checksum offload were enabled by
default, we would not need to turn off tx offload, right?

Regards,
William
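
For readers unfamiliar with what the offload replaces: when tx checksum
offload is off, the 16-bit one's-complement Internet checksum (RFC 1071) has
to be computed in software for every TCP/UDP packet. A standalone sketch of
that computation (illustration only, not the OVS or kernel implementation):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 16-bit one's-complement checksum over 'len' bytes (RFC 1071). */
static uint16_t
checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t) p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len) {
        sum += (uint32_t) p[0] << 8;          /* Pad the odd final byte. */
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);   /* Fold carries back in. */
    }
    return (uint16_t) ~sum;
}

int
main(void)
{
    const uint8_t pkt[] = { 0x00, 0x01, 0xf2, 0x03, 0xf4, 0xf5, 0xf6, 0xf7 };

    printf("checksum: 0x%04x\n", checksum(pkt, sizeof pkt));
    return 0;
}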


[ovs-dev] Re: Re: Re: Re: [PATCH v4 0/3] Add support for TSO with DPDK

2020-04-15 Thread D
Hi, Flavio

Are you Red Hat folks working on VxLAN TSO support for OVS DPDK? This is a
great feature; many NICs can do VxLAN TSO, and even the very old Intel 82599ES
controller supports it. Per our test, for two VMs across two compute nodes,
iperf3 TCP and UDP performance can reach line speed (about 10 Gbps).

Per our understanding, DPDK and the hardware are ready for VxLAN TSO. How much
extra effort is needed to enable VxLAN TSO support, and what is the roadblock
for this?

-----Original Message-----
From: Flavio Leitner [mailto:f...@sysclose.org]
Sent: March 10, 2020 21:10
To: txfh2007
Cc: William Tu; Yi Yang - Cloud Services Group; d...@openvswitch.org; i.maxim...@ovn.org
Subject: Re: Re: [ovs-dev] Re: Re: [PATCH v4 0/3] Add support for TSO with DPDK

On Tue, Mar 10, 2020 at 04:08:43PM +0800, txfh2007 wrote:
> Hi Flavio and all:
> 
> Is there a way to support software TSO for a DPDK tunnel network? I have
> tried the userspace TSO function running on a tunnel network, and I got the
> following error:
> "Tunneling packets with HW offload flags is not supported: packet dropped"
> So is there a way to work around this if we want to support both VLAN and
> tunnel networks on the same compute node?


No, there is no support for tunneling at this point.
fbl

> 
> Thanks
> Timo
> 
> On Fri, Feb 28, 2020 at 9:56 AM Flavio Leitner  wrote:
> >
> >
> > Hi Yi Yang,
> >
> > This is the bug fix required to make veth TSO work in OvS:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9d2f67e43b73e8af7438be219b66a5de0cfa8bd9
> >
> > commit 9d2f67e43b73e8af7438be219b66a5de0cfa8bd9
> > Author: Jianfeng Tan 
> > Date:   Sat Sep 29 15:41:27 2018 +
> >
> >     net/packet: fix packet drop as of virtio gso
> >
> >     When we use raw socket as the vhost backend, a packet from virtio with
> >     gso offloading information cannot be sent out in later validation at
> >     xmit path, as we did not set correct skb->protocol, which is further
> >     used for looking up the gso function.
> >
> >     To fix this, we set this field according to virtio hdr information.
> >
> >     Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr
> >     and skb GSO conversion")
> >     Signed-off-by: Jianfeng Tan 
> >     Signed-off-by: David S. Miller 
> >
> >
> > So, the minimum kernel version is 4.19.
> >
> Thanks,
> I sent a patch to update the documentation. Please take a look.
> William
> 

-- 
fbl
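
For context, the quoted commit is about the raw-socket path that carries
virtio GSO metadata. A minimal standalone sketch of that mechanism (not the
kernel fix itself; needs CAP_NET_RAW): with PACKET_VNET_HDR enabled, every
frame sent or received on the AF_PACKET socket is prefixed by a struct
virtio_net_hdr describing GSO and checksum state.

#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <linux/virtio_net.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int
main(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    int on = 1;

    if (fd < 0
        || setsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, &on, sizeof on)) {
        perror("PACKET_VNET_HDR");
        return 1;
    }

    /* From here on, a TSO frame is written as
     *     struct virtio_net_hdr | Ethernet frame (up to ~64 kB)
     * with gso_type = VIRTIO_NET_HDR_GSO_TCPV4 and gso_size set to the MSS;
     * the kernel segments it on the egress path.  The commit quoted above
     * fixes skb->protocol so that this segmentation works for raw-socket
     * vhost backends. */
    close(fd);
    return 0;
}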


[ovs-dev] Re: [PATCH v7] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-03-18 Thread D
Ilya, the raw socket for interfaces whose type is "system" has been set to
non-blocking mode, so can you explain which syscall will lead to sleeping?
Yes, a pmd thread will consume CPU even if it has nothing to do, but all the
type=dpdk ports are handled by pmd threads as well; here we just make system
interfaces look like a DPDK interface. I didn't see any problem in my test; it
would help if you could tell me what will cause a problem and how I can
reproduce it. By the way, type=tap/internal interfaces are still handled by
the ovs-vswitchd thread.

In addition, the change is only one line, ".is_pmd = true,"; switching it back
to ".is_pmd = false," will keep these interfaces in ovs-vswitchd if there are
other concerns. We can change the non-thread-safe parts to support pmd.

-----Original Message-----
From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Ilya Maximets
Sent: March 18, 2020 19:45
To: yang_y...@163.com; ovs-dev@openvswitch.org
Cc: i.maxim...@ovn.org
Subject: Re: [ovs-dev] [PATCH v7] Use TPACKET_V3 to accelerate veth for userspace datapath

On 3/18/20 10:02 AM, yang_y...@163.com wrote:
> From: Yi Yang 
> 
> We can avoid high system call overhead by using TPACKET_V3 and using 
> DPDK-like poll to receive and send packets (Note: send still needs to 
> call sendto to trigger final packet transmission).
> 
> From Linux kernel 3.10 on, TPACKET_V3 has been supported, so all the 
> Linux kernels current OVS supports can run
> TPACKET_V3 without any problem.
> 
> I can see about 50% performance improvement for veth compared to last 
> recvmmsg optimization if I use TPACKET_V3, it is about 2.21 Gbps, but 
> it was 1.47 Gbps before.
> 
> After is_pmd is set to true, performance can be improved much more, it 
> is about 180% performance improvement.
> 
> TPACKET_V3 can support TSO, but its performance isn't good because of 
> TPACKET_V3 kernel implementation issue, so it falls back to recvmmsg 
> in case userspace-tso-enable is set to true, but its performance is 
> better than recvmmsg in case userspace-tso-enable is set to false, so 
> just use TPACKET_V3 in that case.
> 
> Note: how much performance improvement is up to your platform, some 
> platforms can see huge improvement, some ones aren't so noticeable, 
> but if is_pmd is set to true, you can see big performance improvement, 
> the prerequisite is your tested veth interfaces should be attached to 
> different pmd threads.
> 
> Signed-off-by: Yi Yang 
> Co-authored-by: William Tu 
> Signed-off-by: William Tu 
> ---
>  acinclude.m4 |  12 ++
>  configure.ac |   1 +
>  include/sparse/linux/if_packet.h | 111 +++
>  lib/dp-packet.c  |  18 ++
>  lib/dp-packet.h  |   9 +
>  lib/netdev-linux-private.h   |  26 +++
>  lib/netdev-linux.c   | 419
+--
>  7 files changed, 579 insertions(+), 17 deletions(-)
> 
> Changelog:
> - v6->v7
>  * is_pmd is set to true for system interfaces

This cannot be done that simply and should not be done unconditionally
anyway.  netdev-linux is not thread safe in many ways; at least, stats
accounting will be messed up.  Second, this change will harm all the usual
DPDK-based setups, since PMD threads will start making a lot of syscalls and
sleeping inside the kernel, missing packets from the fast DPDK interfaces.
Third, this change will fire up at least one PMD thread consuming 100% CPU
constantly, even for setups where it's not needed.  So, this version is
definitely not acceptable.

Best regards, Ilya Maximets.
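
For readers who have not used the API discussed in this thread, a standalone
sketch (not the patch itself) of the basic TPACKET_V3 RX ring setup it relies
on; "veth0" is an example name, the ring sizes are arbitrary, error handling
is trimmed, and CAP_NET_RAW is required:

#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    const char *dev = argc > 1 ? argv[1] : "veth0";   /* Example name. */
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    int ver = TPACKET_V3;
    struct tpacket_req3 req = {
        .tp_block_size = 1 << 22,                /* 4 MB per block. */
        .tp_block_nr = 8,
        .tp_frame_size = 1 << 11,                /* 2 KB frames. */
        .tp_frame_nr = ((1 << 22) / (1 << 11)) * 8,
        .tp_retire_blk_tov = 10,                 /* Retire blocks after 10 ms. */
    };
    struct sockaddr_ll sll = {
        .sll_family = AF_PACKET,
        .sll_protocol = htons(ETH_P_ALL),
        .sll_ifindex = (int) if_nametoindex(dev),
    };
    size_t ring_len = (size_t) req.tp_block_size * req.tp_block_nr;
    void *ring;

    if (fd < 0
        || setsockopt(fd, SOL_PACKET, PACKET_VERSION, &ver, sizeof ver)
        || setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof req)
        || bind(fd, (struct sockaddr *) &sll, sizeof sll)) {
        perror("tpacket_v3 setup");
        return EXIT_FAILURE;
    }
    ring = mmap(NULL, ring_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ring == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }
    printf("TPACKET_V3 RX ring mapped for %s\n", dev);

    munmap(ring, ring_len);
    close(fd);
    return 0;
}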


[ovs-dev] Re: [PATCH v5] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-02-25 Thread D
In the same environment, but using tap instead of veth, the Retr
(retransmission) count is 0 for the case without this patch (of course, I
applied Flavio's tap enable patch).

vagrant@ubuntu1804:~$ sudo ./run-iperf3.sh
Connecting to host 10.15.1.3, port 5201
[  4] local 10.15.1.2 port 54572 connected to 10.15.1.3 port 5201
[ ID] Interval   Transfer Bandwidth   Retr  Cwnd
[  4]   0.00-10.00  sec  12.6 GBytes  10.9 Gbits/sec0   3.14 MBytes
[  4]  10.00-20.00  sec  12.8 GBytes  11.0 Gbits/sec0   3.14 MBytes
[  4]  20.00-30.00  sec  10.2 GBytes  8.76 Gbits/sec0   3.14 MBytes
[  4]  30.00-40.00  sec  10.0 GBytes  8.63 Gbits/sec0   3.14 MBytes
[  4]  40.00-50.00  sec  10.4 GBytes  8.94 Gbits/sec0   3.14 MBytes
[  4]  50.00-60.00  sec  10.8 GBytes  9.31 Gbits/sec0   3.14 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-60.00  sec  67.0 GBytes  9.59 Gbits/sec    0             sender
[  4]   0.00-60.00  sec  67.0 GBytes  9.59 Gbits/sec                  receiver

Server output:
Accepted connection from 10.15.1.2, port 54570
[  5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 54572
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-10.00  sec  12.6 GBytes  10.9 Gbits/sec
[  5]  10.00-20.00  sec  12.8 GBytes  11.0 Gbits/sec
[  5]  20.00-30.00  sec  10.2 GBytes  8.76 Gbits/sec
[  5]  30.00-40.00  sec  10.0 GBytes  8.63 Gbits/sec
[  5]  40.00-50.00  sec  10.4 GBytes  8.94 Gbits/sec
[  5]  50.00-60.00  sec  10.8 GBytes  9.31 Gbits/sec
[  5]  60.00-60.00  sec  1.75 MBytes  9.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-60.00  sec  0.00 Bytes  0.00 bits/sec  sender
[  5]   0.00-60.00  sec  67.0 GBytes  9.59 Gbits/sec
receiver


iperf Done.
vagrant@ubuntu1804:~$

-----Original Message-----
From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of William Tu
Sent: February 26, 2020 6:32
To: yang_y...@126.com
Cc: yang_y_yi; ovs-dev
Subject: Re: [ovs-dev] [PATCH v5] Use TPACKET_V3 to accelerate veth for userspace datapath

On Mon, Feb 24, 2020 at 5:01 AM  wrote:
>
> From: Yi Yang 
>
> We can avoid high system call overhead by using TPACKET_V3 and using 
> DPDK-like poll to receive and send packets (Note: send still needs to 
> call sendto to trigger final packet transmission).
>
> From Linux kernel 3.10 on, TPACKET_V3 has been supported, so all the 
> Linux kernels current OVS supports can run
> TPACKET_V3 without any problem.
>
> I can see about 30% performance improvement for veth compared to last 
> recvmmsg optimization if I use TPACKET_V3, it is about 1.98 Gbps, but 
> it was 1.47 Gbps before.
>
> TPACKET_V3 can support TSO, it can work only if your kernel can 
> support, this has been verified on Ubuntu 18.04 5.3.0-40-generic , if 
> you find the performance is very poor, please turn off tso for veth 
> interfaces in case userspace-tso-enable is set to true.

Did you test the performance with TSO enabled?

Using veth (like your run-iperf3.sh) and kernel 5.3: without your patch, with
TSO enabled, I can get around 6 Gbps, but with this patch, with TSO enabled,
the performance drops to 1.9 Gbps.

Regards,
William


[ovs-dev] Re: [PATCHv2 1/2] userspace: Enable TSO support for non-DPDK.

2020-02-20 Thread D
William, which kernel version did you use to test this patch? I don't want to
build a kernel if the Ubuntu 16.04 kernel can work.

-----Original Message-----
From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of William Tu
Sent: February 21, 2020 3:00
To: d...@openvswitch.org
Cc: f...@sysclose.org; i.maxim...@ovn.org
Subject: [ovs-dev] [PATCHv2 1/2] userspace: Enable TSO support for non-DPDK.

This patch enables TSO support for non-DPDK use cases and also adds a
check-system-tso testsuite. Before TSO, we have to disable checksum offload,
allowing the kernel to calculate the TCP/UDP packet checksum. With TSO, we can
skip the checksum validation by enabling checksum offload, and with a large
packet size we see better performance.

Consider a container-to-container use case:
  iperf3 -c (ns0) -> veth peer -> OVS -> veth peer -> iperf3 -s (ns1)
I got around 6 Gbps, similar to TSO with DPDK enabled.

Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/653109097
Signed-off-by: William Tu 
---
v2:
  - add make check-system-tso test
  - combine logging for dpdk and non-dpdk
  - I'm surprised that most of the test cases passed.
This is due to few tests using tcp/udp, so it does not trigger
TSO.  I saw only geneve/vxlan fails randomly, maybe we can
check it later.
---
 lib/dp-packet.h   | 95
++-
 lib/userspace-tso.c   |  5 ---
 tests/.gitignore  |  3 ++
 tests/automake.mk | 15 +++
 tests/system-tso-macros.at| 42 +++
 tests/system-tso-testsuite.at | 26 
 6 files changed, 143 insertions(+), 43 deletions(-)  create mode 100644
tests/system-tso-macros.at  create mode 100644 tests/system-tso-testsuite.at

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 9f8991faad52..6b90cec2afb4 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -53,7 +53,25 @@ enum OVS_PACKED_ENUM dp_packet_source {
 enum dp_packet_offload_mask {
     DP_PACKET_OL_RSS_HASH_MASK  = 0x1, /* Is the 'rss_hash' valid? */
     DP_PACKET_OL_FLOW_MARK_MASK = 0x2, /* Is the 'flow_mark' valid? */
+    DP_PACKET_OL_RX_L4_CKSUM_BAD = 1 << 3,
+    DP_PACKET_OL_RX_IP_CKSUM_BAD = 1 << 4,
+    DP_PACKET_OL_RX_L4_CKSUM_GOOD = 1 << 5,
+    DP_PACKET_OL_RX_IP_CKSUM_GOOD = 1 << 6,
+    DP_PACKET_OL_TX_TCP_SEG = 1 << 7,
+    DP_PACKET_OL_TX_IPV4 = 1 << 8,
+    DP_PACKET_OL_TX_IPV6 = 1 << 9,
+    DP_PACKET_OL_TX_TCP_CKSUM = 1 << 10,
+    DP_PACKET_OL_TX_UDP_CKSUM = 1 << 11,
+    DP_PACKET_OL_TX_SCTP_CKSUM = 1 << 12,
 };
+
+#define DP_PACKET_OL_TX_L4_MASK (DP_PACKET_OL_TX_TCP_CKSUM | \
+                                 DP_PACKET_OL_TX_UDP_CKSUM | \
+                                 DP_PACKET_OL_TX_SCTP_CKSUM)
+#define DP_PACKET_OL_RX_IP_CKSUM_MASK (DP_PACKET_OL_RX_IP_CKSUM_GOOD | \
+                                       DP_PACKET_OL_RX_IP_CKSUM_BAD)
+#define DP_PACKET_OL_RX_L4_CKSUM_MASK (DP_PACKET_OL_RX_L4_CKSUM_GOOD | \
+                                       DP_PACKET_OL_RX_L4_CKSUM_BAD)
 #else
 /* DPDK mbuf ol_flags that are not really an offload flags.  These are mostly
  * related to mbuf memory layout and OVS should not touch/clear them. */
@@ -739,82 +757,79 @@ dp_packet_set_allocated(struct dp_packet *b, uint16_t s)
     b->allocated_ = s;
 }
 
-/* There are no implementation when not DPDK enabled datapath. */
 static inline bool
-dp_packet_hwol_is_tso(const struct dp_packet *b OVS_UNUSED)
+dp_packet_hwol_is_tso(const struct dp_packet *b)
 {
-    return false;
+    return !!(b->ol_flags & DP_PACKET_OL_TX_TCP_SEG);
 }
 
-/* There are no implementation when not DPDK enabled datapath. */
 static inline bool
-dp_packet_hwol_is_ipv4(const struct dp_packet *b OVS_UNUSED)
+dp_packet_hwol_is_ipv4(const struct dp_packet *b)
 {
-    return false;
+    return !!(b->ol_flags & DP_PACKET_OL_TX_IPV4);
 }
 
-/* There are no implementation when not DPDK enabled datapath. */
 static inline uint64_t
-dp_packet_hwol_l4_mask(const struct dp_packet *b OVS_UNUSED)
+dp_packet_hwol_l4_mask(const struct dp_packet *b)
 {
-    return 0;
+    return b->ol_flags & DP_PACKET_OL_TX_L4_MASK;
 }
 
-/* There are no implementation when not DPDK enabled datapath. */
 static inline bool
-dp_packet_hwol_l4_is_tcp(const struct dp_packet *b OVS_UNUSED)
+dp_packet_hwol_l4_is_tcp(const struct dp_packet *b)
 {
-    return false;
+    return (b->ol_flags & DP_PACKET_OL_TX_L4_MASK) ==
+            DP_PACKET_OL_TX_TCP_CKSUM;
 }
 
-/* There are no implementation when not DPDK enabled datapath. */
 static inline bool
-dp_packet_hwol_l4_is_udp(const struct dp_packet *b OVS_UNUSED)
+dp_packet_hwol_l4_is_udp(const struct dp_packet *b)
 {
-    return false;
+    return (b->ol_flags & DP_PACKET_OL_TX_L4_MASK) ==
+            DP_PACKET_OL_TX_UDP_CKSUM;
 }
 
-/* There are no implementation when not DPDK enabled datapath. */
 static inline bool
-dp_packet_hwol_l4_is_sctp(const struct dp_packet *b OVS_UNUSED)
+dp_packet_hwol_l4_is_sctp(const struct dp_packet *b)
 {
-    return false;
+    return 
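
A tiny standalone demo (mirroring the flag layout in the patch above, not the
OVS code itself) of why the helpers compare against the whole L4 mask instead
of just testing a single bit:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Same bit positions as the patch above. */
enum {
    OL_TX_TCP_CKSUM  = 1 << 10,
    OL_TX_UDP_CKSUM  = 1 << 11,
    OL_TX_SCTP_CKSUM = 1 << 12,
};
#define OL_TX_L4_MASK (OL_TX_TCP_CKSUM | OL_TX_UDP_CKSUM | OL_TX_SCTP_CKSUM)

/* Comparing the masked value with one flag ensures that exactly that one
 * L4 checksum type is requested, not a mix of them. */
static bool
l4_is_tcp(uint64_t ol_flags)
{
    return (ol_flags & OL_TX_L4_MASK) == OL_TX_TCP_CKSUM;
}

int
main(void)
{
    printf("%d\n", l4_is_tcp(OL_TX_TCP_CKSUM));                   /* 1 */
    printf("%d\n", l4_is_tcp(OL_TX_TCP_CKSUM | OL_TX_UDP_CKSUM)); /* 0 */
    return 0;
}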

[ovs-dev] Re: [PATCH v4 0/3] Add support for TSO with DPDK

2020-02-20 Thread D
Hi, Flavio

I find this TSO feature doesn't work normally on my Ubuntu 16.04; here is my
result. My kernel version is:

$ uname -a
Linux cmp008 4.15.0-55-generic #60~16.04.2-Ubuntu SMP Thu Jul 4 09:03:09 UTC
2019 x86_64 x86_64 x86_64 GNU/Linux
$

$ ./run-iperf3.sh
Connecting to host 10.15.1.3, port 5201
[  4] local 10.15.1.2 port 56466 connected to 10.15.1.3 port 5201
[ ID] Interval   Transfer Bandwidth   Retr  Cwnd
[  4]   0.00-10.00  sec  7.05 MBytes  5.91 Mbits/sec  2212   5.66 KBytes
[  4]  10.00-20.00  sec  7.67 MBytes  6.44 Mbits/sec  2484   5.66 KBytes
[  4]  20.00-30.00  sec  7.77 MBytes  6.52 Mbits/sec  2500   5.66 KBytes
[  4]  30.00-40.00  sec  7.77 MBytes  6.52 Mbits/sec  2490   5.66 KBytes
[  4]  40.00-50.00  sec  7.76 MBytes  6.51 Mbits/sec  2500   5.66 KBytes
[  4]  50.00-60.00  sec  7.79 MBytes  6.54 Mbits/sec  2504   5.66 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-60.00  sec  45.8 MBytes  6.40 Mbits/sec  14690          sender
[  4]   0.00-60.00  sec  45.7 MBytes  6.40 Mbits/sec                 receiver

Server output:
Accepted connection from 10.15.1.2, port 56464
[  5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 56466
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-10.00  sec  6.90 MBytes  5.79 Mbits/sec
[  5]  10.00-20.00  sec  7.71 MBytes  6.47 Mbits/sec
[  5]  20.00-30.00  sec  7.73 MBytes  6.48 Mbits/sec
[  5]  30.00-40.00  sec  7.79 MBytes  6.53 Mbits/sec
[  5]  40.00-50.00  sec  7.79 MBytes  6.53 Mbits/sec
[  5]  50.00-60.00  sec  7.79 MBytes  6.54 Mbits/sec


iperf Done.
$

But it does work for tap. I'm not sure if it is a kernel issue; which kernel
version are you using? I didn't use the tpacket_v3 patch. Here is my local OVS
info.

$ git log
commit 1223cf123ed141c0a0110ebed17572bdb2e3d0f4
Author: Ilya Maximets 
Date:   Thu Feb 6 14:24:23 2020 +0100

netdev-dpdk: Don't enable offloading on HW device if not requested.

DPDK drivers has different implementations of transmit functions.
Enabled offloading may cause driver to choose slower variant
significantly affecting performance if userspace TSO wasn't requested.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Reported-by: David Marchand 
Acked-by: David Marchand 
Acked-by: Flavio Leitner 
Acked-by: Kevin Traynor 
Signed-off-by: Ilya Maximets 

commit 73858f9dbe83daf8cc8d4b604acc23eb62cc3f52
Author: Flavio Leitner 
Date:   Mon Feb 3 18:45:50 2020 -0300

netdev-linux: Prepend the std packet in the TSO packet

Usually TSO packets are close to 50k, 60k bytes long, so to
to copy less bytes when receiving a packet from the kernel
change the approach. Instead of extending the MTU sized
packet received and append with remaining TSO data from
the TSO buffer, allocate a TSO packet with enough headroom
to prepend the std packet data.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Suggested-by: Ben Pfaff 
Signed-off-by: Flavio Leitner 
Signed-off-by: Ben Pfaff 

commit 2297cbe6cc25b6b1862c499ce8f16f52f75d9e5f
Author: Flavio Leitner 
Date:   Mon Feb 3 11:22:22 2020 -0300

netdev-linux-private: fix max length to be 16 bits

The dp_packet length is limited to 16 bits, so document that
and fix the length value accordingly.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Signed-off-by: Flavio Leitner 
Signed-off-by: Ben Pfaff 

commit 3d6a6f450af5b7eaf4b532983cb14458ae792b72
Author: David Marchand 
Date:   Tue Feb 4 22:28:26 2020 +0100

netdev-dpdk: Fix port init when lacking Tx offloads for TSO.

The check on TSO capability did not ensure ip checksum, tcp checksum and
TSO tx offloads were available which resulted in a port init failure
(example below with a ena device):

*2020-02-04T17:42:52.976Z|00084|dpdk|ERR|Ethdev port_id=0 requested Tx
offloads 0x2a doesn't match Tx offloads capabilities 0xe in
rte_eth_dev_configure()*

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")

Reported-by: Ravi Kerur 
Signed-off-by: David Marchand 
Acked-by: Kevin Traynor 
Acked-by: Flavio Leitner 
Signed-off-by: Ilya Maximets 

commit 8e371aa497aa95e3562d53f566c2d634b4b0f589
Author: Kirill A. Kornilov 
Date:   Mon Jan 13 12:29:10 2020 +0300

vswitchd: Add serial number configuration.

Signed-off-by: Kirill A. Kornilov 
Signed-off-by: Ben Pfaff 

I applied your tap patch.

$ git diff
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index c6f3d27..74a5728 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -1010,6 +1010,23 @@ netdev_linux_construct_tap(struct netdev *netdev_)
 goto error_close;
 }

+if (userspace_tso_enabled()) {
+/* Old kernels don't support TUNSETOFFLOAD. If TUNSETOFFLOAD is
+ * available, it will return EINVAL when a flag is unknown.
+ 
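
For reference, a minimal standalone sketch (not Flavio's patch) of how a tap
device's offloads are enabled from userspace with TUNSETOFFLOAD; "tap0" is
just an example name:

#include <fcntl.h>
#include <linux/if_tun.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int
main(void)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);

    if (fd < 0) {
        perror("open /dev/net/tun");
        return 1;
    }
    memset(&ifr, 0, sizeof ifr);
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
    snprintf(ifr.ifr_name, IFNAMSIZ, "%s", "tap0");   /* Example name. */
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        perror("TUNSETIFF");
        return 1;
    }

    /* Ask the kernel to accept checksum-less, GSO-sized frames from us.
     * Old kernels return EINVAL for flags they do not know about. */
    if (ioctl(fd, TUNSETOFFLOAD, TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6) < 0) {
        perror("TUNSETOFFLOAD");
    }
    close(fd);
    return 0;
}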

[ovs-dev] Re: [PATCH v4] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-02-17 Thread D
Ilya, thank you so much for your comments; I'll fix them in the next version.

For TSO support, this patch works functionally, but I checked the tpacket_v3
kernel code and I don't think the tpacket_v3 kernel part can support it well;
my test results also showed very bad performance when userspace-tso-enable is
set to true. For the case you're describing, current tpacket_v3 can't reach
very good performance.

I have a question: in an OpenStack and VXLAN scenario, can
userspace-tso-enable be set to true? Per the TSO patch series comments, it
can't support such a case, so I'm thinking about how we can trade off TSO and
tpacket_v3. From my perspective, tpacket_v3 is the best choice for the cases
where TSO can't work. I hope OpenStack can get good performance when it uses
the new OVS DPDK version without needing any change on the Neutron and OVS
agent side. It would be better if we can work out a good way to cover both use
cases.

-----Original Message-----
From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Ilya Maximets
Sent: February 17, 2020 21:12
To: ovs-dev@openvswitch.org
Cc: yang_y...@163.com; yang_y...@126.com; i.maxim...@ovn.org
Subject: Re: [ovs-dev] [PATCH v4] Use TPACKET_V3 to accelerate veth for userspace datapath

On 2/16/20 2:10 AM, yang_y...@126.com wrote:
> From: Yi Yang 
> 
> We can avoid high system call overhead by using TPACKET_V3 and using 
> DPDK-like poll to receive and send packets (Note: send still needs to 
> call sendto to trigger final packet transmission).
> 
> From Linux kernel 3.10 on, TPACKET_V3 has been supported, so all the 
> Linux kernels current OVS supports can run
> TPACKET_V3 without any problem.
> 
> I can see about 30% performance improvement for veth compared to last 
> recvmmsg optimization if I use TPACKET_V3, it is about 1.98 Gbps, but 
> it was 1.47 Gbps before.
> 
> Note: Linux kernel TPACKET_V3 can't support TSO, so the performance is 
> very poor, please turn off tso for veth interfaces in case 
> userspace-tso-enable is set to true.

So, does this patch support TSO or not?
What if I want to have TSO support AND good performance?


I didn't review the code, but have some patch-wide style comments:
1. Comments in code should be complete sentences, i.e. start with
   a capital letter and end with a period.
2. Don't parenthesize arguments of sizeof if possible.


Best regards, Ilya Maximets.


[ovs-dev] Re: [PATCH v3] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-02-15 Thread D
Hi, William

I checked the sparse check errors on my local machine; the new v4 version
should fix these errors, so please use v4. Thanks a lot.

https://mail.openvswitch.org/pipermail/ovs-dev/2020-February/367883.html


diff --git a/include/sparse/linux/if_packet.h
b/include/sparse/linux/if_packet.h
index 503bade..d6a9fb0 100644
--- a/include/sparse/linux/if_packet.h
+++ b/include/sparse/linux/if_packet.h
@@ -5,6 +5,7 @@
 #error "Use this header only with sparse.  It is not a correct
implementation."
 #endif

+#include 
 #include_next 

 /* Fix endianness of 'spkt_protocol' and 'sll_protocol' members. */
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 49b6aa4..c275a64 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -1139,7 +1139,7 @@ tpacket_mmap_rx_tx_ring(int sock, struct tpacket_ring
*rx_ring,
 {
 int i;

-rx_ring->mm_space = mmap(0, rx_ring->mm_len + tx_ring->mm_len,
+rx_ring->mm_space = mmap(NULL, rx_ring->mm_len + tx_ring->mm_len,
   PROT_READ | PROT_WRITE,
   MAP_SHARED | MAP_LOCKED | MAP_POPULATE, sock, 0);
 if (rx_ring->mm_space == MAP_FAILED) {
@@ -1194,7 +1194,7 @@ netdev_linux_rxq_construct(struct netdev_rxq *rxq_)
 };

 /* Create file descriptor. */
-rx->fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+rx->fd = socket(PF_PACKET, SOCK_RAW, (OVS_FORCE int)
htons(ETH_P_ALL));
 if (rx->fd < 0) {
 error = errno;
 VLOG_ERR("failed to create raw socket (%s)",
ovs_strerror(error));
@@ -1282,7 +1282,7 @@ netdev_linux_rxq_construct(struct netdev_rxq *rxq_)
 sll.sll_halen = 0;
 #endif
 sll.sll_ifindex = ifindex;
-sll.sll_protocol = htons(ETH_P_ALL);
+sll.sll_protocol = (OVS_FORCE ovs_be16) htons(ETH_P_ALL);
 if (bind(rx->fd, (struct sockaddr *) , sizeof sll) < 0) {
 error = errno;
 VLOG_ERR("%s: failed to bind raw socket (%s)",

-----Original Message-----
From: Yi Yang - Cloud Services Group
Sent: February 15, 2020 12:09
To: 'yang_y...@126.com'; 'ovs-dev@openvswitch.org'
Cc: 'b...@ovn.org'; 'ian.sto...@intel.com'; 'yang_y...@163.com'
Subject: Re: [PATCH v3] Use TPACKET_V3 to accelerate veth for userspace datapath
Importance: High

William, I don't know why I can't receive your comments in my Outlook:
https://mail.openvswitch.org/pipermail/ovs-dev/2020-February/367860.html

I don't know how to check the Travis build issue; can you provide a quick
guide so that I can fix it?


-----Original Message-----
From: yang_y...@126.com [mailto:yang_y...@126.com]
Sent: February 11, 2020 18:22
To: ovs-dev@openvswitch.org
Cc: b...@ovn.org; ian.sto...@intel.com; Yi Yang - Cloud Services Group; yang_y...@163.com
Subject: [PATCH v3] Use TPACKET_V3 to accelerate veth for userspace datapath

From: Yi Yang 

We can avoid high system call overhead by using TPACKET_V3 and using
DPDK-like poll to receive and send packets (Note: send still needs to call
sendto to trigger final packet transmission).

From Linux kernel 3.10 on, TPACKET_V3 has been supported, so all the Linux
kernels current OVS supports can run
TPACKET_V3 without any problem.

I can see about 30% performance improvement for veth compared to last
recvmmsg optimization if I use TPACKET_V3, it is about 1.98 Gbps, but it was
1.47 Gbps before.

Note: Linux kernel TPACKET_V3 can't support TSO, so the performance is very
poor, please turn off tso for veth interfaces in case userspace-tso-enable is
set to true.

Signed-off-by: Yi Yang 
Co-authored-by: William Tu 
Signed-off-by: William Tu 
---
 acinclude.m4 |  12 ++
 configure.ac |   1 +
 include/linux/automake.mk|   1 +
 include/linux/if_packet.h| 126 +
 include/sparse/linux/if_packet.h | 108 +++
 lib/netdev-linux-private.h   |  22 +++
 lib/netdev-linux.c   | 375
++-
 7 files changed, 640 insertions(+), 5 deletions(-)  create mode 100644
include/linux/if_packet.h

Changelog:
- v2->v3
 * Fix build issues in case HAVE_TPACKET_V3 is not defined
 * Add tso-related support code
 * make sure it can work normally in case userspace-tso-enable is true

- v1->v2
 * Remove TPACKET_V1 and TPACKET_V2 which is obsolete
 * Add include/linux/if_packet.h
 * Change include/sparse/linux/if_packet.h

diff --git a/acinclude.m4 b/acinclude.m4 index 1212a46..b39bbb9 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -1093,6 +1093,18 @@ AC_DEFUN([OVS_CHECK_IF_DL],
   AC_SEARCH_LIBS([pcap_open_live], [pcap])
fi])
 
+dnl OVS_CHECK_LINUX_TPACKET
+dnl
+dnl Configure Linux TPACKET.
+AC_DEFUN([OVS_CHECK_LINUX_TPACKET], [
+  AC_COMPILE_IFELSE([
+AC_LANG_PROGRAM([#include ], [
+struct tpacket3_hdr x =  { 0 };
+])],
+[AC_DEFINE([HAVE_TPACKET_V3], [1],
+[Define to 1 if struct tpacket3_hdr is available.])])
+])
+
 dnl Checks for buggy strtok_r.
 dnl
 dnl Some versions of glibc 2.7

[ovs-dev] Re: [PATCH v3] Use TPACKET_V3 to accelerate veth for userspace datapath

2020-02-14 Thread D
William, I don't know why I can't receive your comments in my Outlook:
https://mail.openvswitch.org/pipermail/ovs-dev/2020-February/367860.html

I don't know how to check the Travis build issue; can you provide a quick
guide so that I can fix it?


-----Original Message-----
From: yang_y...@126.com [mailto:yang_y...@126.com]
Sent: February 11, 2020 18:22
To: ovs-dev@openvswitch.org
Cc: b...@ovn.org; ian.sto...@intel.com; Yi Yang - Cloud Services Group; yang_y...@163.com
Subject: [PATCH v3] Use TPACKET_V3 to accelerate veth for userspace datapath

From: Yi Yang 

We can avoid high system call overhead by using TPACKET_V3 and using
DPDK-like poll to receive and send packets (Note: send still needs to call
sendto to trigger final packet transmission).

From Linux kernel 3.10 on, TPACKET_V3 has been supported, so all the Linux
kernels current OVS supports can run
TPACKET_V3 without any problem.

I can see about 30% performance improvement for veth compared to last
recvmmsg optimization if I use TPACKET_V3, it is about 1.98 Gbps, but it was
1.47 Gbps before.

Note: Linux kernel TPACKET_V3 can't support TSO, so the performance is very
poor, please turn off tso for veth interfaces in case userspace-tso-enable is
set to true.

Signed-off-by: Yi Yang 
Co-authored-by: William Tu 
Signed-off-by: William Tu 
---
 acinclude.m4 |  12 ++
 configure.ac |   1 +
 include/linux/automake.mk|   1 +
 include/linux/if_packet.h| 126 +
 include/sparse/linux/if_packet.h | 108 +++
 lib/netdev-linux-private.h   |  22 +++
 lib/netdev-linux.c   | 375
++-
 7 files changed, 640 insertions(+), 5 deletions(-)  create mode 100644
include/linux/if_packet.h

Changelog:
- v2->v3
 * Fix build issues in case HAVE_TPACKET_V3 is not defined
 * Add tso-related support code
 * make sure it can work normally in case userspace-tso-enable is true

- v1->v2
 * Remove TPACKET_V1 and TPACKET_V2 which is obsolete
 * Add include/linux/if_packet.h
 * Change include/sparse/linux/if_packet.h

diff --git a/acinclude.m4 b/acinclude.m4 index 1212a46..b39bbb9 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -1093,6 +1093,18 @@ AC_DEFUN([OVS_CHECK_IF_DL],
   AC_SEARCH_LIBS([pcap_open_live], [pcap])
fi])
 
+dnl OVS_CHECK_LINUX_TPACKET
+dnl
+dnl Configure Linux TPACKET.
+AC_DEFUN([OVS_CHECK_LINUX_TPACKET], [
+  AC_COMPILE_IFELSE([
+AC_LANG_PROGRAM([#include ], [
+struct tpacket3_hdr x =  { 0 };
+])],
+[AC_DEFINE([HAVE_TPACKET_V3], [1],
+[Define to 1 if struct tpacket3_hdr is available.])])
+])
+
 dnl Checks for buggy strtok_r.
 dnl
 dnl Some versions of glibc 2.7 has a bug in strtok_r when compiling diff
--git a/configure.ac b/configure.ac index 1877aae..b61a1f4 100644
--- a/configure.ac
+++ b/configure.ac
@@ -89,6 +89,7 @@ OVS_CHECK_VISUAL_STUDIO_DDK  OVS_CHECK_COVERAGE
OVS_CHECK_NDEBUG  OVS_CHECK_NETLINK
+OVS_CHECK_LINUX_TPACKET
 OVS_CHECK_OPENSSL
 OVS_CHECK_LIBCAPNG
 OVS_CHECK_LOGDIR
diff --git a/include/linux/automake.mk b/include/linux/automake.mk index
8f063f4..a659e65 100644
--- a/include/linux/automake.mk
+++ b/include/linux/automake.mk
@@ -1,4 +1,5 @@
 noinst_HEADERS += \
+   include/linux/if_packet.h \
include/linux/netlink.h \
include/linux/netfilter/nf_conntrack_sctp.h \
include/linux/pkt_cls.h \
diff --git a/include/linux/if_packet.h b/include/linux/if_packet.h new file
mode 100644 index 000..34c5747
--- /dev/null
+++ b/include/linux/if_packet.h
@@ -0,0 +1,126 @@
+#ifndef __LINUX_IF_PACKET_WRAPPER_H
+#define __LINUX_IF_PACKET_WRAPPER_H 1
+
+#ifdef HAVE_TPACKET_V3
+#include_next 
+#else
+#define HAVE_TPACKET_V3 1
+
+struct sockaddr_pkt {
+unsigned short  spkt_family;
+unsigned char   spkt_device[14];
+uint16_tspkt_protocol;
+};
+
+struct sockaddr_ll {
+unsigned short  sll_family;
+uint16_tsll_protocol;
+int sll_ifindex;
+unsigned short  sll_hatype;
+unsigned char   sll_pkttype;
+unsigned char   sll_halen;
+unsigned char   sll_addr[8];
+};
+
+/* Packet types */
+#define PACKET_HOST 0 /* To us*/
+
+/* Packet socket options */
+#define PACKET_RX_RING  5
+#define PACKET_VERSION 10
+#define PACKET_TX_RING 13
+#define PACKET_VNET_HDR15
+
+/* Rx ring - header status */
+#define TP_STATUS_KERNEL0
+#define TP_STATUS_USER(1 << 0)
+#define TP_STATUS_VLAN_VALID  (1 << 4) /* auxdata has valid tp_vlan_tci
*/
+#define TP_STATUS_VLAN_TPID_VALID (1 << 6) /* auxdata has valid 
+tp_vlan_tpid */
+
+/* Tx ring - header status */
+#define TP_STATUS_SEND_REQUEST(1 << 0)
+#define TP_STATUS_SENDING (1 << 1)
+
+struct tpacket_hdr {
+unsigned long tp_status;
+unsigned int tp_len;
+unsigned int tp_snaplen;
+unsigned s

[ovs-dev] Re: [PATCH v2] netdev-linux: Prepend the std packet in the TSO packet

2020-02-03 Thread D
Hi, Flavio

With this one patch and the several previously merged TSO-related patches, can
veth work with "ethtool -K vethX tx on"? I could never figure out why veth can
work in the DPDK datapath when tx offload features are on; it looks like
you're fixing this big issue, right?

For the tap interface, it can't support TSO; do you Red Hat folks have a plan
to enable it on the kernel side?

-----Original Message-----
From: Flavio Leitner [mailto:f...@sysclose.org]
Sent: February 4, 2020 5:46
To: d...@openvswitch.org
Cc: Stokes Ian; Loftus Ciara; Ilya Maximets; Yi Yang - Cloud Services Group; txfh2007; Ben Pfaff; Flavio Leitner
Subject: [PATCH v2] netdev-linux: Prepend the std packet in the TSO packet

Usually TSO packets are close to 50k-60k bytes long, so to copy fewer bytes
when receiving a packet from the kernel, change the approach. Instead of
extending the MTU-sized packet received and appending the remaining TSO data
from the TSO buffer, allocate a TSO packet with enough headroom to prepend the
std packet data.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Suggested-by: Ben Pfaff 
Signed-off-by: Flavio Leitner 
---
 lib/dp-packet.c|   8 +--
 lib/dp-packet.h|   2 +
 lib/netdev-linux-private.h |   3 +-
 lib/netdev-linux.c | 117 ++---
 4 files changed, 78 insertions(+), 52 deletions(-)

V2:
  - tso packets tailroom depends on headroom in netdev_linux_rxq_recv()
  - iov_len uses packet's tailroom.

  This patch depends on a previous posted patch to work:
  Subject: netdev-linux-private: fix max length to be 16 bits
  https://mail.openvswitch.org/pipermail/ovs-dev/2020-February/367469.html

  With both patches applied, I can run iperf3 and scp on both directions
  with good performance and no issues.

diff --git a/lib/dp-packet.c b/lib/dp-packet.c index 8dfedcb7c..cd2623500
100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -243,8 +243,8 @@ dp_packet_copy__(struct dp_packet *b, uint8_t *new_base,
 
 /* Reallocates 'b' so that it has exactly 'new_headroom' and 'new_tailroom'
  * bytes of headroom and tailroom, respectively. */ -static void
-dp_packet_resize__(struct dp_packet *b, size_t new_headroom, size_t
new_tailroom)
+void
+dp_packet_resize(struct dp_packet *b, size_t new_headroom, size_t 
+new_tailroom)
 {
 void *new_base, *new_data;
 size_t new_allocated;
@@ -297,7 +297,7 @@ void
 dp_packet_prealloc_tailroom(struct dp_packet *b, size_t size)  {
 if (size > dp_packet_tailroom(b)) {
-dp_packet_resize__(b, dp_packet_headroom(b), MAX(size, 64));
+dp_packet_resize(b, dp_packet_headroom(b), MAX(size, 64));
 }
 }
 
@@ -308,7 +308,7 @@ void
 dp_packet_prealloc_headroom(struct dp_packet *b, size_t size)  {
 if (size > dp_packet_headroom(b)) {
-dp_packet_resize__(b, MAX(size, 64), dp_packet_tailroom(b));
+dp_packet_resize(b, MAX(size, 64), dp_packet_tailroom(b));
 }
 }
 
diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 69ae5dfac..9a9d35183
100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -152,6 +152,8 @@ struct dp_packet *dp_packet_clone_with_headroom(const
struct dp_packet *,  struct dp_packet *dp_packet_clone_data(const void *,
size_t);  struct dp_packet *dp_packet_clone_data_with_headroom(const void *,
size_t,
  size_t headroom);
+void dp_packet_resize(struct dp_packet *b, size_t new_headroom,
+  size_t new_tailroom);
 static inline void dp_packet_delete(struct dp_packet *);
 
 static inline void *dp_packet_at(const struct dp_packet *, size_t offset,
diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h index
be2d7b10b..c7c515f70 100644
--- a/lib/netdev-linux-private.h
+++ b/lib/netdev-linux-private.h
@@ -45,7 +45,8 @@ struct netdev_rxq_linux {
 struct netdev_rxq up;
 bool is_tap;
 int fd;
-char *aux_bufs[NETDEV_MAX_BURST]; /* Batch of preallocated TSO buffers.
*/
+struct dp_packet *aux_bufs[NETDEV_MAX_BURST]; /* Preallocated TSO
+ packets. */
 };
 
 int netdev_linux_construct(struct netdev *); diff --git a/lib/netdev-linux.
c b/lib/netdev-linux.c index 6add3e2fc..c6f3d2740 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -1052,15 +1052,6 @@ static struct netdev_rxq *
 netdev_linux_rxq_alloc(void)
 {
 struct netdev_rxq_linux *rx = xzalloc(sizeof *rx);
-if (userspace_tso_enabled()) {
-int i;
-
-/* Allocate auxiliay buffers to receive TSO packets. */
-for (i = 0; i < NETDEV_MAX_BURST; i++) {
-rx->aux_bufs[i] = xmalloc(LINUX_RXQ_TSO_MAX_LEN);
-}
-}
-
 return >up;
 }
 
@@ -1172,7 +1163,7 @@ netdev_linux_rxq_destruct(struct netdev_rxq *rxq_)
 }
 
 for (i = 0; i < NETDEV_MAX_BURST; i++) {
-free(rx->aux_bufs[i]);
+dp_packet_delete(rx->aux_bufs[i]);
 }
 }
 
@@ -1238,13 +1229,18 @@ netdev_linux_batch_rxq_re
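
A toy illustration (not OVS's dp_packet API) of the approach described in this
patch: allocate the TSO buffer with enough headroom so that the MTU-sized
packet read from the kernel can be prepended by moving the data pointer back,
instead of copying the much larger TSO payload around.

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct buf {
    uint8_t *base;      /* Start of the allocation. */
    uint8_t *data;      /* Start of valid packet data. */
    size_t size;        /* Bytes of valid packet data. */
};

static struct buf
buf_alloc(size_t headroom, size_t tailroom)
{
    struct buf b;

    b.base = malloc(headroom + tailroom);
    b.data = b.base + headroom;        /* Leave room to prepend later. */
    b.size = 0;
    return b;
}

/* Prepending costs O(len) regardless of how many TSO bytes already sit
 * after the data pointer. */
static void
buf_prepend(struct buf *b, const void *src, size_t len)
{
    assert((size_t) (b->data - b->base) >= len);
    b->data -= len;
    memcpy(b->data, src, len);
    b->size += len;
}

int
main(void)
{
    uint8_t mtu_pkt[1500] = { 0 };     /* Std packet received from the kernel. */
    struct buf tso = buf_alloc(2048, 64 * 1024);

    /* The TSO payload would be received directly into tso.data here... */
    tso.size = 60 * 1024;

    buf_prepend(&tso, mtu_pkt, sizeof mtu_pkt);   /* Copies only ~1.5 kB. */
    free(tso.base);
    return 0;
}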

[ovs-dev] Re: [PATCH] Use TPACKET_V1/V2/V3 to accelerate veth for DPDK datapath

2020-02-02 Thread D
William, sorry for the late reply.

About your question in
https://mail.openvswitch.org/pipermail/ovs-dev/2020-January/367133.html:
the af_packet I was referring to is DPDK's af_packet, whose interface type is
dpdk, so its performance can be up to 4.00 Gbps. For a non-DPDK interface, the
handling thread is ovs-vswitchd rather than a pmd thread, so TPACKET_V3 can
only reach 1.98 Gbps; 1.47 Gbps is for my last patch that Ben merged (recvmmsg
for batch receiving).

>Hi Yiyang,
>
>I don't understand these three numbers.
>Don't you also use af_packet for veth for 1.47 Gbps and 1.98 Gbps?
>What's the difference between your 4.00 Gbps and 1.98Gbps?
>
>William

-----Original Message-----
From: Yi Yang - Cloud Services Group
Sent: February 3, 2020 12:06
To: 'u9012...@gmail.com'; 'b...@ovn.org'; 'yang_y...@163.com'
Cc: 'ovs-dev@openvswitch.org'; 'ian.sto...@intel.com'
Subject: Re: [PATCH] Use TPACKET_V1/V2/V3 to accelerate veth for DPDK datapath
Importance: High

Hi, William

Sorry for the late reply. I don't know why I can never get your comment emails
in my Outlook; Ben's comments arrive fine, and I can't see your comments in
the Outlook junk box either.

About your comments in
https://mail.openvswitch.org/pipermail/ovs-dev/2020-January/367146.html, I
checked on my CentOS 7 machine, which has a 3.10.0 kernel, and the TPACKET_V3
sample code works there, so I'm ok with removing the V1 code.

>Hi Yiyang,
>
>Can we just implement TPACKET v3, and drop v2 and v1?
>V3 is supported since kernel 3.10,

>commit f6fb8f100b807378fda19e83e5ac6828b638603a
>Author: chetan loke 
>Date:   Fri Aug 19 10:18:16 2011 +
>
>af-packet: TPACKET_V3 flexible buffer implementation.
>
>and based on OVS release
>http://docs.openvswitch.org/en/latest/faq/releases/
>after OVS 2.12, the minimum kernel requirement is 3.10.
>
>Regards,
>William


-----Original Message-----
From: Yi Yang - Cloud Services Group
Sent: February 3, 2020 10:36
To: 'b...@ovn.org'; 'yang_y...@163.com'
Cc: 'ovs-dev@openvswitch.org'; 'ian.sto...@intel.com'
Subject: Re: [PATCH] Use TPACKET_V1/V2/V3 to accelerate veth for DPDK datapath
Importance: High

Hi, all

Currently tap, internal, and system interfaces aren't handled by pmd threads,
so the performance can't be boosted very high. I did a very simple test just
by setting is_pmd to true for them; below is my data for veth (using
TPACKET_V3). You can see a pmd thread is obviously much better than
ovs-vswitchd, compared with my previous figure of 1.98 Gbps. My question is
whether we can set is_pmd to true by default; I'll set is_pmd to true in the
next version if there is no objection.

$ sudo ip netns exec ns01 iperf3 -t 60 -i 10 -c 10.15.1.3
--get-server-output
Connecting to host 10.15.1.3, port 5201
[  4] local 10.15.1.2 port 59590 connected to 10.15.1.3 port 5201
[ ID] Interval   Transfer Bandwidth   Retr  Cwnd
[  4]   0.00-10.00  sec  3.59 GBytes  3.09 Gbits/sec0   3.04 MBytes
[  4]  10.00-20.00  sec  3.57 GBytes  3.06 Gbits/sec0   3.04 MBytes
[  4]  20.00-30.00  sec  3.60 GBytes  3.09 Gbits/sec0   3.04 MBytes
[  4]  30.00-40.00  sec  3.56 GBytes  3.06 Gbits/sec0   3.04 MBytes
[  4]  40.00-50.00  sec  3.64 GBytes  3.12 Gbits/sec0   3.04 MBytes
[  4]  50.00-60.00  sec  3.62 GBytes  3.11 Gbits/sec0   3.04 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-60.00  sec  21.6 GBytes  3.09 Gbits/sec    0             sender
[  4]   0.00-60.00  sec  21.6 GBytes  3.09 Gbits/sec                  receiver

Server output:
---
Accepted connection from 10.15.1.2, port 59588
[  5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 59590
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-10.00  sec  3.57 GBytes  3.07 Gbits/sec
[  5]  10.00-20.00  sec  3.57 GBytes  3.06 Gbits/sec
[  5]  20.00-30.00  sec  3.60 GBytes  3.09 Gbits/sec
[  5]  30.00-40.00  sec  3.56 GBytes  3.06 Gbits/sec
[  5]  40.00-50.00  sec  3.64 GBytes  3.12 Gbits/sec
[  5]  50.00-60.00  sec  3.62 GBytes  3.11 Gbits/sec


iperf Done.
eipadmin@cmp008:~$

-----Original Message-----
From: Ben Pfaff [mailto:b...@ovn.org]
Sent: January 22, 2020 3:26
To: yang_y...@163.com
Cc: ovs-dev@openvswitch.org; ian.sto...@intel.com; Yi Yang - Cloud Services Group
Subject: Re: [PATCH] Use TPACKET_V1/V2/V3 to accelerate veth for DPDK datapath

On Tue, Jan 21, 2020 at 02:49:47AM -0500, yang_y...@163.com wrote:
> From: Yi Yang 
> 
> We can avoid high system call overhead by using TPACKET_V1/V2/V3 and 
> use DPDK-like poll to receive and send packets (Note: send still needs 
> to call sendto to trigger final packet transmission).
> 
> I can see about 30% improvement compared to last recvmmsg optimization 
> if I use TPACKET_V3. TPACKET_V1/V2 is worse than TPACKET_V3, but it 
> still can improve about 20%.
> 
> For veth, it is 1.47 Gbps before this patch, it is about 1.98 Gbps 
> after applied this patch. But it is about 4.00 Gbps if we use 
> af_packet for veth, the bottle neck lies in ovs-vswitchd thread, it 

[ovs-dev] Re: [PATCH] Use TPACKET_V1/V2/V3 to accelerate veth for DPDK datapath

2020-02-02 Thread D
Hi, William

Sorry for the late reply. I don't know why I can never get your comment emails
in my Outlook; Ben's comments arrive fine, and I can't see your comments in
the Outlook junk box either.

About your comments in
https://mail.openvswitch.org/pipermail/ovs-dev/2020-January/367146.html, I
checked on my CentOS 7 machine, which has a 3.10.0 kernel, and the TPACKET_V3
sample code works there, so I'm ok with removing the V1 code.

>Hi Yiyang,
>
>Can we just implement TPACKET v3, and drop v2 and v1?
>V3 is supported since kernel 3.10,

>commit f6fb8f100b807378fda19e83e5ac6828b638603a
>Author: chetan loke 
>Date:   Fri Aug 19 10:18:16 2011 +
>
>af-packet: TPACKET_V3 flexible buffer implementation.
>
>and based on OVS release
>http://docs.openvswitch.org/en/latest/faq/releases/
>after OVS 2.12, the minimum kernel requirement is 3.10.
>
>Regards,
>William


-----Original Message-----
From: Yi Yang - Cloud Services Group
Sent: February 3, 2020 10:36
To: 'b...@ovn.org'; 'yang_y...@163.com'
Cc: 'ovs-dev@openvswitch.org'; 'ian.sto...@intel.com'
Subject: Re: [PATCH] Use TPACKET_V1/V2/V3 to accelerate veth for DPDK datapath
Importance: High

Hi, all

Currently tap, internal, and system interfaces aren't handled by pmd threads,
so the performance can't be boosted very high. I did a very simple test just
by setting is_pmd to true for them; below is my data for veth (using
TPACKET_V3). You can see a pmd thread is obviously much better than
ovs-vswitchd, compared with my previous figure of 1.98 Gbps. My question is
whether we can set is_pmd to true by default; I'll set is_pmd to true in the
next version if there is no objection.

$ sudo ip netns exec ns01 iperf3 -t 60 -i 10 -c 10.15.1.3
--get-server-output
Connecting to host 10.15.1.3, port 5201
[  4] local 10.15.1.2 port 59590 connected to 10.15.1.3 port 5201
[ ID] Interval   Transfer Bandwidth   Retr  Cwnd
[  4]   0.00-10.00  sec  3.59 GBytes  3.09 Gbits/sec0   3.04 MBytes
[  4]  10.00-20.00  sec  3.57 GBytes  3.06 Gbits/sec0   3.04 MBytes
[  4]  20.00-30.00  sec  3.60 GBytes  3.09 Gbits/sec0   3.04 MBytes
[  4]  30.00-40.00  sec  3.56 GBytes  3.06 Gbits/sec0   3.04 MBytes
[  4]  40.00-50.00  sec  3.64 GBytes  3.12 Gbits/sec0   3.04 MBytes
[  4]  50.00-60.00  sec  3.62 GBytes  3.11 Gbits/sec0   3.04 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-60.00  sec  21.6 GBytes  3.09 Gbits/sec    0             sender
[  4]   0.00-60.00  sec  21.6 GBytes  3.09 Gbits/sec                  receiver

Server output:
---
Accepted connection from 10.15.1.2, port 59588
[  5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 59590
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-10.00  sec  3.57 GBytes  3.07 Gbits/sec
[  5]  10.00-20.00  sec  3.57 GBytes  3.06 Gbits/sec
[  5]  20.00-30.00  sec  3.60 GBytes  3.09 Gbits/sec
[  5]  30.00-40.00  sec  3.56 GBytes  3.06 Gbits/sec
[  5]  40.00-50.00  sec  3.64 GBytes  3.12 Gbits/sec
[  5]  50.00-60.00  sec  3.62 GBytes  3.11 Gbits/sec


iperf Done.
eipadmin@cmp008:~$

-----Original Message-----
From: Ben Pfaff [mailto:b...@ovn.org]
Sent: January 22, 2020 3:26
To: yang_y...@163.com
Cc: ovs-dev@openvswitch.org; ian.sto...@intel.com; Yi Yang - Cloud Services Group
Subject: Re: [PATCH] Use TPACKET_V1/V2/V3 to accelerate veth for DPDK datapath

On Tue, Jan 21, 2020 at 02:49:47AM -0500, yang_y...@163.com wrote:
> From: Yi Yang 
> 
> We can avoid high system call overhead by using TPACKET_V1/V2/V3 and 
> use DPDK-like poll to receive and send packets (Note: send still needs 
> to call sendto to trigger final packet transmission).
> 
> I can see about 30% improvement compared to last recvmmsg optimization 
> if I use TPACKET_V3. TPACKET_V1/V2 is worse than TPACKET_V3, but it 
> still can improve about 20%.
> 
> For veth, it is 1.47 Gbps before this patch, it is about 1.98 Gbps 
> after applied this patch. But it is about 4.00 Gbps if we use 
> af_packet for veth, the bottle neck lies in ovs-vswitchd thread, it 
> will handle too many things for every loop (as below) , so it can't 
> work very efficiently as pmd_thread.
> 
> memory_run();
> bridge_run();
> unixctl_server_run(unixctl);
> netdev_run();
> 
> memory_wait();
> bridge_wait();
> unixctl_server_wait(unixctl);
> netdev_wait();
> poll_block();
> 
> In the next step, it will be better if let pmd_thread to handle tap 
> and veth interface.
> 
> Signed-off-by: Yi Yang 
> Co-authored-by: William Tu 
> Signed-off-by: William Tu 

Thanks for the patch!

I am a bit concerned about version compatibility issues here.  There are two
relevant kinds of versions.  The first is the version of the kernel/library
headers.  This patch works pretty hard to adapt to the headers that are
available at compile time, only dealing with the versions of the

[ovs-dev] Re: [PATCH] Use TPACKET_V1/V2/V3 to accelerate veth for DPDK datapath

2020-02-02 Thread D
Hi, all

Currently tap, internal, and system interfaces aren't handled by pmd threads,
so the performance can't be boosted very high. I did a very simple test just
by setting is_pmd to true for them; below is my data for veth (using
TPACKET_V3). You can see a pmd thread is obviously much better than
ovs-vswitchd, compared with my previous figure of 1.98 Gbps. My question is
whether we can set is_pmd to true by default; I'll set is_pmd to true in the
next version if there is no objection.

$ sudo ip netns exec ns01 iperf3 -t 60 -i 10 -c 10.15.1.3
--get-server-output
Connecting to host 10.15.1.3, port 5201
[  4] local 10.15.1.2 port 59590 connected to 10.15.1.3 port 5201
[ ID] Interval   Transfer Bandwidth   Retr  Cwnd
[  4]   0.00-10.00  sec  3.59 GBytes  3.09 Gbits/sec0   3.04 MBytes
[  4]  10.00-20.00  sec  3.57 GBytes  3.06 Gbits/sec0   3.04 MBytes
[  4]  20.00-30.00  sec  3.60 GBytes  3.09 Gbits/sec0   3.04 MBytes
[  4]  30.00-40.00  sec  3.56 GBytes  3.06 Gbits/sec0   3.04 MBytes
[  4]  40.00-50.00  sec  3.64 GBytes  3.12 Gbits/sec0   3.04 MBytes
[  4]  50.00-60.00  sec  3.62 GBytes  3.11 Gbits/sec0   3.04 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-60.00  sec  21.6 GBytes  3.09 Gbits/sec    0             sender
[  4]   0.00-60.00  sec  21.6 GBytes  3.09 Gbits/sec                  receiver

Server output:
---
Accepted connection from 10.15.1.2, port 59588
[  5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 59590
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-10.00  sec  3.57 GBytes  3.07 Gbits/sec
[  5]  10.00-20.00  sec  3.57 GBytes  3.06 Gbits/sec
[  5]  20.00-30.00  sec  3.60 GBytes  3.09 Gbits/sec
[  5]  30.00-40.00  sec  3.56 GBytes  3.06 Gbits/sec
[  5]  40.00-50.00  sec  3.64 GBytes  3.12 Gbits/sec
[  5]  50.00-60.00  sec  3.62 GBytes  3.11 Gbits/sec


iperf Done.
eipadmin@cmp008:~$

-----Original Message-----
From: Ben Pfaff [mailto:b...@ovn.org]
Sent: January 22, 2020 3:26
To: yang_y...@163.com
Cc: ovs-dev@openvswitch.org; ian.sto...@intel.com; Yi Yang - Cloud Services Group
Subject: Re: [PATCH] Use TPACKET_V1/V2/V3 to accelerate veth for DPDK datapath

On Tue, Jan 21, 2020 at 02:49:47AM -0500, yang_y...@163.com wrote:
> From: Yi Yang 
> 
> We can avoid high system call overhead by using TPACKET_V1/V2/V3 and 
> use DPDK-like poll to receive and send packets (Note: send still needs 
> to call sendto to trigger final packet transmission).
> 
> I can see about 30% improvement compared to last recvmmsg optimization 
> if I use TPACKET_V3. TPACKET_V1/V2 is worse than TPACKET_V3, but it 
> still can improve about 20%.
> 
> For veth, it is 1.47 Gbps before this patch, it is about 1.98 Gbps 
> after applied this patch. But it is about 4.00 Gbps if we use 
> af_packet for veth, the bottle neck lies in ovs-vswitchd thread, it 
> will handle too many things for every loop (as below) , so it can't 
> work very efficiently as pmd_thread.
> 
> memory_run();
> bridge_run();
> unixctl_server_run(unixctl);
> netdev_run();
> 
> memory_wait();
> bridge_wait();
> unixctl_server_wait(unixctl);
> netdev_wait();
> poll_block();
> 
> In the next step, it will be better if let pmd_thread to handle tap 
> and veth interface.
> 
> Signed-off-by: Yi Yang 
> Co-authored-by: William Tu 
> Signed-off-by: William Tu 

Thanks for the patch!

I am a bit concerned about version compatibility issues here.  There are two
relevant kinds of versions.  The first is the version of the kernel/library
headers.  This patch works pretty hard to adapt to the headers that are
available at compile time, only dealing with the versions of the protocols
that are available from the headers.  This approach is sometimes fine, but
an approach can be better is to simply declare the structures or constants
that the headers lack.  This is often pretty easy for Linux data structures.
OVS does this for some structures that it cares about with the headers in
ovs/include/linux.
This approach has two advantages: the OVS code (outside these special
declarations) doesn't have to care whether particular structures are
declared, because they are always declared, and the OVS build always
supports a particular feature regardless of the headers of the system on
which it was built.

The second kind of version is the version of the system that OVS runs on.
Unless a given feature is one that is supported by every version that OVS
cares about, OVS needs to test at runtime whether the feature is supported
and, if not, fall back to the older feature.  I don't see that in this code.
Instead, it looks to me like it assumes that if the feature was available at
build time, then it is available at runtime.
This is not a good way to do things, since we want people to be ab

[ovs-dev] Re: [PATCH] Use TPACKET_V1/V2/V3 to accelerate veth for DPDK datapath

2020-01-22 Thread D
Ben, thank you so much for your quick comments. Yes, using some code to check
for TPACKET features will be better, but I'm not familiar with the AC_CHECK*
stuff; it would help if you could show me a good example for reference. I'll
fix the issues you mentioned in the next version. BTW, I'm taking the Chinese
New Year long holiday, so the next version will be sent out after one week at
least. More comments from other folks are welcome.

-----Original Message-----
From: Ben Pfaff [mailto:b...@ovn.org]
Sent: January 22, 2020 3:26
To: yang_y...@163.com
Cc: ovs-dev@openvswitch.org; ian.sto...@intel.com; Yi Yang - Cloud Services Group
Subject: Re: [PATCH] Use TPACKET_V1/V2/V3 to accelerate veth for DPDK datapath

On Tue, Jan 21, 2020 at 02:49:47AM -0500, yang_y...@163.com wrote:
> From: Yi Yang 
> 
> We can avoid high system call overhead by using TPACKET_V1/V2/V3 and 
> use DPDK-like poll to receive and send packets (Note: send still needs 
> to call sendto to trigger final packet transmission).
> 
> I can see about 30% improvement compared to last recvmmsg optimization 
> if I use TPACKET_V3. TPACKET_V1/V2 is worse than TPACKET_V3, but it 
> still can improve about 20%.
> 
> For veth, it is 1.47 Gbps before this patch, it is about 1.98 Gbps 
> after applied this patch. But it is about 4.00 Gbps if we use 
> af_packet for veth, the bottle neck lies in ovs-vswitchd thread, it 
> will handle too many things for every loop (as below) , so it can't 
> work very efficiently as pmd_thread.
> 
> memory_run();
> bridge_run();
> unixctl_server_run(unixctl);
> netdev_run();
> 
> memory_wait();
> bridge_wait();
> unixctl_server_wait(unixctl);
> netdev_wait();
> poll_block();
> 
> In the next step, it will be better if let pmd_thread to handle tap 
> and veth interface.
> 
> Signed-off-by: Yi Yang 
> Co-authored-by: William Tu 
> Signed-off-by: William Tu 

Thanks for the patch!

I am a bit concerned about version compatibility issues here.  There are two
relevant kinds of versions.  The first is the version of the kernel/library
headers.  This patch works pretty hard to adapt to the headers that are
available at compile time, only dealing with the versions of the protocols
that are available from the headers.  This approach is sometimes fine, but
an approach can be better is to simply declare the structures or constants
that the headers lack.  This is often pretty easy for Linux data structures.
OVS does this for some structures that it cares about with the headers in
ovs/include/linux.
This approach has two advantages: the OVS code (outside these special
declarations) doesn't have to care whether particular structures are
declared, because they are always declared, and the OVS build always
supports a particular feature regardless of the headers of the system on
which it was built.

The second kind of version is the version of the system that OVS runs on.
Unless a given feature is one that is supported by every version that OVS
cares about, OVS needs to test at runtime whether the feature is supported
and, if not, fall back to the older feature.  I don't see that in this code.
Instead, it looks to me like it assumes that if the feature was available at
build time, then it is available at runtime.
This is not a good way to do things, since we want people to be able to get
builds from distributors such as Red Hat or Debian and then run those builds
on a diverse collection of kernels.
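
A minimal sketch of the kind of runtime probe described above, using
TPACKET_V3 as the example feature: open an AF_PACKET socket and see whether
the running kernel accepts the ring version (needs CAP_NET_RAW; illustration
only, not a proposed patch):

#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

static bool
kernel_supports_tpacket_v3(void)
{
    int ver = TPACKET_V3;
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    bool ok = false;

    if (fd >= 0) {
        /* An old kernel rejects an unknown ring version with EINVAL. */
        ok = !setsockopt(fd, SOL_PACKET, PACKET_VERSION, &ver, sizeof ver);
        close(fd);
    }
    return ok;
}

int
main(void)
{
    printf("TPACKET_V3: %s\n",
           kernel_supports_tpacket_v3() ? "supported" : "not supported");
    return 0;
}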

One specific comment I have here is that, in acinclude.m4, it would be
better to use AC_CHECK_TYPE or AC_CHECK_TYPES than OVS_GREP_IFELSE.
The latter is for testing for kernel builds only; we can't use the normal
AC_* tests for those because we often can't successfully build kernel
headers using the compiler and flags that Autoconf sets up for building OVS.

Thanks,

Ben.


[ovs-dev] Re: [PATCH RFC] WIP: netdev-tpacket: Add AF_PACKET v3 support.

2019-12-22 Thread D
Thanks William. af_packet can only open a tap interface; it can't create one.
Such a tap interface can only be created in the following way:

ovs-vsctl add-port tapX -- set interface tapX type=internal

This tap is very special; it is still like a mystery to me. "ip tuntap add
tapX mode tap" can't create such a tap interface.

Can anybody tell me how I can create such a tap interface without using
"ovs-vsctl add-port tapX"?

By the way, I tried af_packet for veth and the performance is very good, about
4 Gbps on my machine, but it used TPACKET_V2.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com]
Sent: December 21, 2019 1:50
To: Ben Pfaff
Cc: d...@openvswitch.org; i.maxim...@ovn.org; Yi Yang - Cloud Services Group; echau...@redhat.com
Subject: Re: [PATCH RFC] WIP: netdev-tpacket: Add AF_PACKET v3 support.

On Thu, Dec 19, 2019 at 08:44:30PM -0800, Ben Pfaff wrote:
> On Thu, Dec 19, 2019 at 04:41:25PM -0800, William Tu wrote:
> > Currently the performance of sending packets from userspace ovs to 
> > kernel veth device is pretty bad as reported from YiYang[1].
> > The patch adds AF_PACKET v3, tpacket v3, as another way to tx/rx 
> > packet to linux device, hopefully showing better performance.
> > 
> > AF_PACKET v3 should get close to 1 Mpps, as shown[2]. However, my 
> > current patch using iperf tcp shows only 1.4Gbps, maybe I'm doing 
> > something wrong.  Also DPDK has similar implementation using 
> > AF_PACKET v2[3].  This is still work-in-progress but any feedbacks 
> > are welcome.
> 
> Is there a good reason that this is implemented as a new kind of 
> netdev rather than just a new way for the existing netdev 
> implementation to do packet i/o?

AF_PACKET v3 is more like a PMD mode driver (similar to netdev-afxdp and the
other dpdk netdevs): it has its own memory management and ring structure and
polls the descriptors itself, so I implemented it as a new kind of netdev. It
feels pretty different from the tap or the existing af_packet netdev.
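
For readers who haven't used it, the per-queue setup AF_PACKET v3 needs is
roughly the following (a sketch only, not code from this patch; names and
sizing are illustrative):

#include <string.h>
#include <arpa/inet.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>

/* Open an AF_PACKET socket, switch it to TPACKET_V3 and map its RX block
 * ring. */
static int
tpacket_v3_open_rx(struct tpacket_req3 *req, char **ring)
{
    int ver = TPACKET_V3;
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    if (fd < 0
        || setsockopt(fd, SOL_PACKET, PACKET_VERSION, &ver, sizeof ver)) {
        return -1;
    }

    memset(req, 0, sizeof *req);
    req->tp_block_size = 1 << 22;             /* 4 MB per block. */
    req->tp_block_nr = 64;
    req->tp_frame_size = 2048;
    req->tp_frame_nr = req->tp_block_size / req->tp_frame_size
                       * req->tp_block_nr;
    req->tp_retire_blk_tov = 60;              /* Retire blocks after 60 ms. */
    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, req, sizeof *req)) {
        return -1;
    }

    *ring = mmap(NULL, (size_t) req->tp_block_size * req->tp_block_nr,
                 PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return *ring == MAP_FAILED ? -1 : fd;
}

Receive then amounts to poll()ing the fd, walking the frames of each block the
kernel has retired, and handing the block back by resetting its status word.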

But integrating it into the existing netdev (lib/netdev-linux.c) would also be OK.

William

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: [PATCH RFC] WIP: netdev-tpacket: Add AF_PACKET v3 support.

2019-12-19 Thread D
Hi, William

Which kernel version is needed for AF_PACKET v3? I can try it with your patch.

-----Original Message-----
From: William Tu [mailto:u9012...@gmail.com]
Sent: 2019-12-20 8:41
To: d...@openvswitch.org
Cc: i.maxim...@ovn.org; Yi Yang - Cloud Services Group ;
b...@ovn.org; echau...@redhat.com
Subject: [PATCH RFC] WIP: netdev-tpacket: Add AF_PACKET v3 support.

Currently the performance of sending packets from userspace ovs to kernel
veth device is pretty bad as reported from YiYang[1].
The patch adds AF_PACKET v3, tpacket v3, as another way to tx/rx packet to
linux device, hopefully showing better performance.

AF_PACKET v3 should get close to 1 Mpps, as shown[2]. However, my current
patch using iperf tcp shows only 1.4Gbps, maybe I'm doing something wrong.
Also DPDK has similar implementation using AF_PACKET v2[3].  This is still
work-in-progress but any feedbacks are welcome.

[1] https://patchwork.ozlabs.org/patch/1204939/
[2] slide 18, https://www.netdevconf.info/2.2/slides/karlsson-afpacket-talk.
pdf
[3] dpdk/drivers/net/af_packet/rte_eth_af_packet.c
---
 lib/automake.mk|   2 +
 lib/netdev-linux-private.h |  23 +++
 lib/netdev-linux.c |  24 ++-
 lib/netdev-provider.h  |   1 +
 lib/netdev-tpacket.c   | 487
+
 lib/netdev-tpacket.h   |  43 
 lib/netdev.c   |   1 +
 7 files changed, 580 insertions(+), 1 deletion(-)
 create mode 100644 lib/netdev-tpacket.c
 create mode 100644 lib/netdev-tpacket.h

diff --git a/lib/automake.mk b/lib/automake.mk
index 17b36b43d9d7..0c635404cb43 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -398,6 +398,8 @@ lib_libopenvswitch_la_SOURCES += \
lib/netdev-linux.c \
lib/netdev-linux.h \
lib/netdev-linux-private.h \
+   lib/netdev-tpacket.c \
+   lib/netdev-tpacket.h \
lib/netdev-offload-tc.c \
lib/netlink-conntrack.c \
lib/netlink-conntrack.h \
diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
index f08159aa7b53..99a2c03bb2a6 100644
--- a/lib/netdev-linux-private.h
+++ b/lib/netdev-linux-private.h
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -37,6 +38,24 @@
 
 struct netdev;
 
+/* tpacket rx and tx ring structure. */
+struct tp_ring {
+struct iovec *rd;   /* rd[n] points to mmap area. */
+int rd_len;
+int rd_num;
+char *mm;   /* mmap address. */
+size_t mm_len;
+unsigned int next_avail_block;
+int frame_len;
+};
+
+struct tpacket_info {
+int fd;
+struct tpacket_req3 req;
+struct tp_ring rxring;
+struct tp_ring txring;
+};
+
 struct netdev_rxq_linux {
 struct netdev_rxq up;
 bool is_tap;
@@ -110,6 +129,10 @@ struct netdev_linux {
 
 struct netdev_afxdp_tx_lock *tx_locks;  /* Array of locks for TX
queues. */  #endif
+
+/* tpacket v3 information. */
+struct tpacket_info **tps;
+int n_tps;
 };
 
 static bool
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index f8e59bacfb13..edfc389ee6f2 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -36,9 +36,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
-#include 
+//#include 
 #include 
 #include 
 #include 
@@ -57,6 +58,7 @@
 #include "openvswitch/hmap.h"
 #include "netdev-afxdp.h"
 #include "netdev-provider.h"
+#include "netdev-tpacket.h"
 #include "netdev-vport.h"
 #include "netlink-notifier.h"
 #include "netlink-socket.h"
@@ -3315,6 +3317,26 @@ const struct netdev_class netdev_afxdp_class = {
 .rxq_recv = netdev_afxdp_rxq_recv,
 };
 #endif
+
+const struct netdev_class netdev_tpacket_class = {
+NETDEV_LINUX_CLASS_COMMON,
+.type = "tpacket",
+.is_pmd = true,
+.construct = netdev_linux_construct,
+.destruct = netdev_linux_destruct,
+.get_stats = netdev_linux_get_stats,
+.get_features = netdev_linux_get_features,
+.get_status = netdev_linux_get_status,
+.set_config = netdev_tpacket_set_config,
+.get_config = netdev_tpacket_get_config,
+.reconfigure = netdev_tpacket_reconfigure,
+.get_block_id = netdev_linux_get_block_id,
+.get_numa_id = netdev_afxdp_get_numa_id,
+.send = netdev_tpacket_batch_send,
+.rxq_construct = netdev_linux_rxq_construct,
+.rxq_destruct = netdev_linux_rxq_destruct,
+.rxq_recv = netdev_tpacket_rxq_recv,
+};
 

 
 #define CODEL_N_QUEUES 0x
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index f109c4e66f0d..518d1dc6e02c 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -833,6 +833,7 @@ extern const struct netdev_class netdev_bsd_class;
 extern const struct netdev_class netdev_windows_class;
 #else
 extern const struct netdev_class netdev_linux_class;
+extern const struct netdev_class netdev_tpacket_class;
 #endif
 extern const struct netdev_class netdev_internal_class;
 extern const struct netdev_class netdev_tap_class;
dif

[ovs-dev] Re: [PATCH] socket-util: Introduce emulation and wrapper for recvmmsg().

2019-12-19 Thread D
Current OVS master has already included the sendmmsg declaration in
include/sparse/sys/socket.h:

int sendmmsg(int, struct mmsghdr *, unsigned int, unsigned int);

I saw  "+^L" in your patch.

--- a/lib/socket-util.c
+++ b/lib/socket-util.c
@@ -1283,3 +1283,59 @@ wrap_sendmmsg(int fd, struct mmsghdr *msgs, unsigned
int n, unsigned int flags)
 }
 #endif
 #endif
+^L
+#ifndef _WIN32 /* Avoid using recvmsg on Windows entirely. */

+#undef recvmmsg
+int
+wrap_recvmmsg(int fd, struct mmsghdr *msgs, unsigned int n,
+  int flags, struct timespec *timeout)
+{
+ovs_assert(!timeout);   /* XXX not emulated */
+
+static bool recvmmsg_broken = false;
+if (!recvmmsg_broken) {
+int save_errno = errno;
+int retval = recvmmsg(fd, msgs, n, flags, timeout);
+if (retval >= 0 || errno != ENOSYS) {
+return retval;
+}
+recvmmsg_broken = true;
+errno = save_errno;
+}
+return emulate_recvmmsg(fd, msgs, n, flags, timeout);
+}
+#endif

I don't understand why recvmmsg is called here although we already know that
recvmmsg isn't defined, and I don't think "static bool recvmmsg_broken" is
thread-safe. I think we can completely remove the part below if we know that
recvmmsg isn't defined (autoconf can detect this precisely, so we don't need a
runtime check for it):
+static bool recvmmsg_broken = false;
+if (!recvmmsg_broken) {
+int save_errno = errno;
+int retval = recvmmsg(fd, msgs, n, flags, timeout);
+if (retval >= 0 || errno != ENOSYS) {
+return retval;
+}
+recvmmsg_broken = true;
+errno = save_errno;
+}


-----Original Message-----
From: Ben Pfaff [mailto:b...@ovn.org]
Sent: 2019-12-18 4:39
To: d...@openvswitch.org
Cc: Ben Pfaff ; Yi Yang - Cloud Services Group

Subject: [PATCH] socket-util: Introduce emulation and wrapper for recvmmsg().

Not every system will have recvmmsg(), so introduce compatibility code that
will allow it to be used blindly from the rest of the tree.

This assumes that recvmmsg() and sendmmsg() are either both present or both
absent in system libraries and headers.

CC: Yi Yang 
Signed-off-by: Ben Pfaff 
---
I haven't actually tested this!

 include/sparse/sys/socket.h |  7 -
 lib/socket-util.c   | 56 +
 lib/socket-util.h   | 24 +---
 3 files changed, 76 insertions(+), 11 deletions(-)

diff --git a/include/sparse/sys/socket.h b/include/sparse/sys/socket.h
index 4178f57e2bda..6ff245ae939b 100644
--- a/include/sparse/sys/socket.h
+++ b/include/sparse/sys/socket.h
@@ -27,6 +27,7 @@
 
 typedef unsigned short int sa_family_t;  typedef __socklen_t socklen_t;
+struct timespec;
 
 struct sockaddr {
 sa_family_t sa_family;
@@ -126,7 +127,8 @@ enum {
 MSG_PEEK,
 MSG_TRUNC,
 MSG_WAITALL,
-MSG_DONTWAIT
+MSG_DONTWAIT,
+MSG_WAITFORONE
 };
 
 enum {
@@ -171,4 +173,7 @@ int sockatmark(int);  int socket(int, int, int);  int
socketpair(int, int, int, int[2]);
 
+int sendmmsg(int, struct mmsghdr *, unsigned int, int);
+int recvmmsg(int, struct mmsghdr *, unsigned int, int, struct timespec *);
+
 #endif /*  for sparse */
diff --git a/lib/socket-util.c b/lib/socket-util.c
index 6b7378de934b..f6f6f3b0a33f 100644
--- a/lib/socket-util.c
+++ b/lib/socket-util.c
@@ -1283,3 +1283,59 @@ wrap_sendmmsg(int fd, struct mmsghdr *msgs, unsigned int n, unsigned int flags)
 }
 #endif
 #endif
+

+#ifndef _WIN32 /* Avoid using recvmsg on Windows entirely. */
+static int
+emulate_recvmmsg(int fd, struct mmsghdr *msgs, unsigned int n,
+ int flags, struct timespec *timeout OVS_UNUSED)
+{
+ovs_assert(!timeout);   /* XXX not emulated */
+
+bool waitforone = flags & MSG_WAITFORONE;
+flags &= ~MSG_WAITFORONE;
+
+for (unsigned int i = 0; i < n; i++) {
+ssize_t retval = recvmsg(fd, &msgs[i].msg_hdr, flags);
+if (retval < 0) {
+return i ? i : retval;
+}
+msgs[i].msg_len = retval;
+
+if (waitforone) {
+flags |= MSG_DONTWAIT;
+}
+}
+return n;
+}
+
+#ifndef HAVE_SENDMMSG
+int
+recvmmsg(int fd, struct mmsghdr *msgs, unsigned int n,
+ int flags, struct timespec *timeout)
+{
+return emulate_recvmmsg(fd, msgs, n, flags, timeout);
+}
+#else
+/* recvmmsg was redefined in lib/socket-util.c, should undef recvmmsg here
+ * to avoid recursion */
+#undef recvmmsg
+int
+wrap_recvmmsg(int fd, struct mmsghdr *msgs, unsigned int n,
+  int flags, struct timespec *timeout) {
+ovs_assert(!timeout);   /* XXX not emulated */
+
+static bool recvmmsg_broken = false;
+if (!recvmmsg_broken) {
+int save_errno = errno;
+int retval = recvmmsg(fd, msgs, n, flags, timeout);
+if (retval >= 0 || errno != ENOSYS) {
+return retval;
+}
+recvmmsg_broken = true;
+errno = save_errno;
+}
+return emulate_recvmmsg(fd, msgs, n, fl

[ovs-dev] Re: [PATCH] Use batch process recv for tap and raw socket in netdev datapath

2019-12-17 Thread D
Ben, thanks for your review. For recvmmsg we have to prepare buffers in
advance, but we have no way to know how many packets are waiting on the
socket, so these mallocs are unavoidable overhead. Maybe a self-adaptive
allocation mechanism would be better: the first receive mallocs just 4
buffers; if all 4 are filled, we increase the count to 8, and so on up to 32;
if not all buffers are filled, we halve the count. But this will make the code
a bit more complicated.

Your fix is right, i should be set to 0 when retval < 0. Thanks for your
review again; I'll fold in your fix and send another version.
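
Something like the following, purely as an illustration of that idea (names
are made up, and in real code the counter would live in the per-rxq state
rather than be passed around like this):

#define BATCH_MIN 4
#define BATCH_MAX 32

/* Grow the number of buffers prepared for the next recvmmsg() call when the
 * previous call filled all of them; shrink when most went unused. */
static int
next_batch_size(int batch_size, int received)
{
    if (received == batch_size && batch_size < BATCH_MAX) {
        return batch_size * 2;
    } else if (received < batch_size / 2 && batch_size > BATCH_MIN) {
        return batch_size / 2;
    }
    return batch_size;
}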

-----Original Message-----
From: Ben Pfaff [mailto:b...@ovn.org]
Sent: 2019-12-18 4:14
To: yang_y...@163.com
Cc: ovs-dev@openvswitch.org; ian.sto...@intel.com; Yi Yang - Cloud Services Group
Subject: Re: [PATCH] Use batch process recv for tap and raw socket in netdev
datapath

On Fri, Dec 06, 2019 at 02:09:24AM -0500, yang_y...@163.com wrote:
> From: Yi Yang 
> 
> Current netdev_linux_rxq_recv_tap and netdev_linux_rxq_recv_sock just 
> receive single packet, that is very inefficient, per my test case 
> which adds two tap ports or veth ports into OVS bridge
> (datapath_type=netdev) and use iperf3 to do performance test between 
> two ports (they are set into different network name space).

Thanks for the patch!  This is an impressive performance improvement!

Each call to netdev_linux_batch_rxq_recv_sock() now calls malloc() 32 times.
This is expensive if only a few packets (or none) are received.
Maybe it doesn't matter, but I wonder whether it affects performance.

I think that no packets are freed on error.  Fix:

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 9cb45d5c7d29..3414a6495ced 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -1198,6 +1198,7 @@ netdev_linux_batch_rxq_recv_sock(int fd, int mtu,
 if (retval < 0) {
 /* Save -errno to retval temporarily */
 retval = -errno;
+i = 0;
 goto free_buffers;
 }
 

To get sparse to work, one must fold in the following:

diff --git a/include/sparse/sys/socket.h b/include/sparse/sys/socket.h
index 4178f57e2bda..e954ade714b5 100644
--- a/include/sparse/sys/socket.h
+++ b/include/sparse/sys/socket.h
@@ -27,6 +27,7 @@
 
 typedef unsigned short int sa_family_t;  typedef __socklen_t socklen_t;
+struct timespec;
 
 struct sockaddr {
 sa_family_t sa_family;
@@ -171,4 +172,7 @@ int sockatmark(int);  int socket(int, int, int);  int
socketpair(int, int, int, int[2]);
 
+int sendmmsg(int, struct mmsghdr *, unsigned int, int);
+int recvmmsg(int, struct mmsghdr *, unsigned int, int, struct timespec *);
+
 #endif /*  for sparse */
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: [sent on behalf of openvswitch.org] Re: [PATCH] Use batch process recv for tap and raw socket in netdev datapath

2019-12-07 Thread D
William, thank you for your test. This is one of the solutions to the OVS DPDK
issues raised at the OVS Conference :-). It is a very cheap way to improve
things; the performance isn't that bad and is basically acceptable for common
use cases that don't expect high network performance.

-----Original Message-----
From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of William Tu
Sent: 2019-12-07 12:19
To: yang_y...@163.com
Cc: ovs-dev 
Subject: [sent on behalf of openvswitch.org] Re: [ovs-dev] [PATCH] Use batch process recv for
tap and raw socket in netdev datapath

On Thu, Dec 5, 2019 at 11:09 PM  wrote:
>
> From: Yi Yang 
>
> Current netdev_linux_rxq_recv_tap and netdev_linux_rxq_recv_sock just 
> receive single packet, that is very inefficient, per my test case 
> which adds two tap ports or veth ports into OVS bridge
> (datapath_type=netdev) and use iperf3 to do performance test between 
> two ports (they are set into different network name space).
>
> The result is as below:
>
>   tap:  295 Mbits/sec
>   veth: 207 Mbits/sec
>
> After I change netdev_linux_rxq_recv_tap and 
> netdev_linux_rxq_recv_sock to use batch process, the performance is 
> boosted by about 7 times, here is the result:
>
>   tap:  1.96 Gbits/sec
>   veth: 1.47 Gbits/sec
>
> Undoubtedly this is a huge improvement although it can't match OVS 
> kernel datapath yet.
>
> FYI: here is thr result for OVS kernel datapath:
>
>   tap:  37.2 Gbits/sec
>   veth: 36.3 Gbits/sec
>
> Note: performance result is highly related with your test machine , 
> you shouldn't expect the same results on your test machine.

Hi Yi Yang,

Thanks for the patch, it's amazing to see so much performance improvement.
I haven't reviewed the code but Yifeng and I applied and tested this patch.
Using netdev-afxdp + tap port, we do see performance improves from 300Mbps
to 2Gbps in our testbed!

Will add more feedback next week.
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: [sent on behalf of openvswitch.org] [PATCH v2] netdev-afxdp: Best-effort configuration of XDP mode.

2019-11-19 Thread D
Hi, Ilya

Can you explain what the kernel limitations for TCP over veth are? I can't
understand why veth has such a limitation only for TCP. I saw a veth bug
(https://tech.vijayp.ca/linux-kernel-bug-delivers-corrupt-tcp-ip-data-to-mesos-kubernetes-docker-containers-4986f88f7a19)
but it was fixed in 2016.

-----Original Message-----
From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-bounces@openvswitch.
org] On Behalf Of Ilya Maximets
Sent: 2019-11-07 19:37
To: ovs-dev@openvswitch.org
Cc: Ilya Maximets 
Subject: [sent on behalf of openvswitch.org][ovs-dev] [PATCH v2] netdev-afxdp: Best-effort
configuration of XDP mode.

Until now there was only two options for XDP mode in OVS: SKB or DRV.
i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.

Devices like 'veth' interfaces in Linux supports native XDP, but doesn't
support zero-copy mode.  This case can not be covered by existing API and we
have to use slower generic XDP for such devices.
There are few more issues, e.g. TCP is not supported in generic XDP mode for
veth interfaces due to kernel limitations, however it is supported in native
mode.

This change introduces ability to use native XDP without zero-copy along
with best-effort configuration option that enabled by default.
In best-effort case OVS will sequentially try different modes starting from
the fastest one and will choose the first acceptable for current interface.
This will guarantee the best possible performance.

If user will want to choose specific mode, it's still possible by setting
the 'options:xdp-mode'.

This change additionally changes the API by renaming the configuration knob
from 'xdpmode' to 'xdp-mode' and also renaming the modes themselves to be
more user-friendly.

The full list of currently supported modes:
  * native-with-zerocopy - former DRV
  * native   - new one, DRV without zero-copy
  * generic  - former SKB
  * best-effort  - new one, chooses the best available from
   3 above modes

Since 'best-effort' is a default mode, users will not need to explicitly
set 'xdp-mode' in most cases.

TCP related tests enabled back in system afxdp testsuite, because
'best-effort' will choose 'native' mode for veth interfaces and this mode
has no issues with TCP.

Signed-off-by: Ilya Maximets 
---

With this patch I modified the user-visible API, but I think it's OK since
it's still an experimental netdev.  Comments are welcome.

Version 2:
  * Rebased on current master.

 Documentation/intro/install/afxdp.rst |  54 ---
 NEWS  |  12 +-
 lib/netdev-afxdp.c| 223 --
 lib/netdev-afxdp.h|   9 ++
 lib/netdev-linux-private.h|   8 +-
 tests/system-afxdp-macros.at  |   7 -
 vswitchd/vswitch.xml  |  38 +++--
 7 files changed, 227 insertions(+), 124 deletions(-)

diff --git a/Documentation/intro/install/afxdp.rst
b/Documentation/intro/install/afxdp.rst
index a136db0c9..937770ad0 100644
--- a/Documentation/intro/install/afxdp.rst
+++ b/Documentation/intro/install/afxdp.rst
@@ -153,9 +153,8 @@ To kick start end-to-end autotesting::
   make check-afxdp TESTSUITEFLAGS='1'
 
 .. note::
-   Not all test cases pass at this time. Currenly all TCP related
-   tests, ex: using wget or http, are skipped due to XDP limitations
-   on veth. cvlan test is also skipped.
+   Not all test cases pass at this time. Currenly all cvlan tests are
skipped
+   due to kernel issues.
 
 If a test case fails, check the log at::
 
@@ -177,33 +176,35 @@ in :doc:`general`::
   ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
 
 Make sure your device driver support AF_XDP, netdev-afxdp supports -the
following additional options (see man ovs-vswitchd.conf.db for
+the following additional options (see ``man ovs-vswitchd.conf.db`` for
 more details):
 
- * **xdpmode**: use "drv" for driver mode, or "skb" for skb mode.
+ * ``xdp-mode``: ``best-effort``, ``native-with-zerocopy``,
+   ``native`` or ``generic``.  Defaults to ``best-effort``, i.e. best of
+   supported modes, so in most cases you don't need to change it.
 
- * **use-need-wakeup**: default "true" if libbpf supports it, otherwise
false.
+ * ``use-need-wakeup``: default ``true`` if libbpf supports it,
+   otherwise ``false``.
 
 For example, to use 1 PMD (on core 4) on 1 queue (queue 0) device,
-configure these options: **pmd-cpu-mask, pmd-rxq-affinity, and n_rxq**.
-The **xdpmode** can be "drv" or "skb"::
+configure these options: ``pmd-cpu-mask``, ``pmd-rxq-affinity``, and
+``n_rxq``::
 
   ethtool -L enp2s0 combined 1
   ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
   ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
-options:n_rxq=1 options:xdpmode=drv \
-other_config:pmd-rxq-affinity="0:4"
+   other_config:pmd-rxq-affinity="0:4"
 
 Or, use 4 pmds/cores and 4 queues by doing::
 
   ethtool -L enp2s0 combined 4
   ovs-vsctl 

[ovs-dev] can OVS conntrack support IP list like this: actions=ct(commit, table=0, zone=1, nat(dst=220.0.0.3, 220.0.0.7, 220.0.0.123))?

2019-11-05 Thread D
Hi, folks

 

We need to do SNAT for many internal IPs by just using several public IPs,
we also need to do DNAT by some other public IPs for exposing webservice,
openflow rules look like the below:

 

table=0,ip,nw_src=172.17.0.0/16,…,actions=ct(commit,table=0,zone=1,nat(src=
220.0.0.3,220.0.0.7,220.0.0.123))

table=0,ip,nw_src=172.18.0.67,…,actions=ct(commit,table=0,zone=1,nat(src=22
0.0.0.3,220.0.0.7,220.0.0.123))

table=0,ip,tcp,nw_dst=220.0.0.11,tp_dst=80,…,actions=ct(commit,table=0,zone
=2,nat(dst=172.16.0.100:80))

table=0,ip,tcp,nw_dst=220.0.0.11,
tp_dst=443,…,actions=ct(commit,table=0,zone=2,nat(dst=172.16.0.100:443))

 

 

From the ct documentation, it seems nat() can't take an IP list; does anybody
know a feasible way to handle such cases?
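
For what it's worth, the nat() option of ct() takes a single address or a
contiguous range written as addr1-addr2 rather than an arbitrary list, so a
non-contiguous pool like the three addresses above would have to be split
across several flows, roughly:

table=0,ip,nw_src=172.17.0.0/16,actions=ct(commit,table=0,zone=1,nat(src=220.0.0.3-220.0.0.7))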

 

In addition, is it OK if multiple openflow rules use the same NAT IP:port
combination? I'm not sure whether it will result in conflicts for SNAT,
because all of them need to do dynamic source port mapping; per my test, it
doesn't seem to be a problem.

 

Thank you all in advance and appreciate your help sincerely.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Why are iperf3 udp packets out of order in OVS DPDK case?

2019-08-27 Thread D
Ian, here is my configuration; sorry, I can't show the flow details because
they are confidential. By the way, iperf3 TCP is OK and its performance is
good enough. I'm really confused: in my VM environment before, UDP was OK but
TCP was not, so this breaks my intuition :-). I can avoid the out-of-order
issue if I limit the UDP bandwidth to 1G with -b 1G.

The traffic doesn't reach the vlan ports; this OVS node acts as a NAT gateway
and steers the traffic back and forth between the iperf3 client and server,
which are other physical machines that are IP reachable from this OVS node.

$ sudo ovs-vsctl show
4135a1ed-2bcb-449a-bb07-ed907d6c265f
Bridge br-int
Port br-int
Interface br-int
type: internal
Port "vlan151"
tag: 151
Interface "vlan151"
type: internal
Port "vlan12"
tag: 12
Interface "vlan12"
type: internal
Port "dpdk0"
Interface "dpdk0"
type: dpdk
options: {dpdk-devargs=":07:00.1", n_rxq="7"}
Port "vlan11"
tag: 11
Interface "vlan11"
type: internal
Port "vlan153"
tag: 153
Interface "vlan153"
type: internal
ovs_version: "2.11.1"
$ sudo ovs-vsctl list Open_vSwitch
_uuid   : 4135a1ed-2bcb-449a-bb07-ed907d6c265f
bridges : [778ea619-496c-417c-ac08-92d7784f1660]
cur_cfg : 46
datapath_types  : [netdev, system]
db_version  : "7.16.1"
dpdk_initialized: true
dpdk_version: "DPDK 18.11.1"
external_ids: {hostname="eip01", rundir="/var/run/openvswitch",
system-id="f331dcc0-8ae7-4f2b-aa30-10ae4c8a7b11"}
iface_types : [dpdk, dpdkr, dpdkvhostuser, dpdkvhostuserclient,
erspan, geneve, gre, internal, "ip6erspan", "ip6gre", lisp, patch, stt,
system, tap, vxlan]
manager_options : []
next_cfg: 46
other_config: {dpdk-init="true", dpdk-socket-mem="4096",
pmd-cpu-mask="0xfe"}
ovs_version : "2.11.1"
ssl : []
statistics      : {}
system_type : ubuntu
system_version  : "16.04"
inspur@eip01:~$ sudo ovs-vsctl -- get Interface dpdk0 mtu_request
9000

-----Original Message-----
From: Stokes, Ian [mailto:ian.sto...@intel.com]
Sent: 2019-08-27 18:02
To: Yi Yang - Cloud Services Group ;
ovs-disc...@openvswitch.org
Cc: ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] Why are iperf3 udp packets out of order in OVS DPDK
case?



On 8/27/2019 9:35 AM, Yi Yang - Cloud Services Group wrote:
> Hi, all
> 
>   
> 
> I’m doing experiments with OVS and OVS DPDK, only one bridge is there, 
> ports and flows are same for OVS and OVS DPDK, in OVS case, everything 
> works well, but in OVS DPDK case, iperf udp performance data are very 
> poor, udp packets are out of order, I have limited MTU and send buffer 
> by -l1410 -M1410, anybody knows why and how to fix it? Thank you in
advance.
> 

Hi,

can you provide more detail of you deployment? OVS version, DPDK version,
configuration commands for ports/flows etc.

Thanks
Ian

> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Why are iperf3 udp packets out of order in OVS DPDK case?

2019-08-27 Thread D
Hi, all

 

I'm doing experiments with OVS and OVS DPDK; only one bridge is there, and the
ports and flows are the same for OVS and OVS DPDK. In the OVS case everything
works well, but in the OVS DPDK case the iperf udp performance numbers are
very poor and udp packets are out of order. I have limited the MTU and send
buffer with -l1410 -M1410. Does anybody know why and how to fix it? Thank you
in advance.

 

iperf3: OUT OF ORDER - incoming packet = 65 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 66 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 67 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 68 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 69 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 70 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 71 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 72 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 73 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 74 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 75 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 76 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 77 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 78 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 79 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 80 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 81 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 82 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 83 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 84 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 85 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 86 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 87 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 88 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 89 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 90 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 91 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 92 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 93 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 94 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 95 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 96 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 97 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 98 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 99 and received packet = 352 AND SP
= 5

iperf3: OUT OF ORDER - incoming packet = 100 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 101 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 102 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 103 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 104 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 105 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 106 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 107 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 108 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 109 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 110 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 111 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 112 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 113 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 114 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 115 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 116 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 117 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 118 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 119 and received packet = 352 AND
SP = 5

iperf3: OUT OF ORDER - incoming packet = 120 and received packet = 352 AND
SP = 5


[ovs-dev] why action "meter" only can be specified once?

2019-08-05 Thread D
Hi, all

 

I was told the meter instruction can be specified only once, but such a use
case does exist: multiple flows share a total bandwidth, while every flow also
has its own bandwidth limit. With two meters we could get not only per-flow
stats but also total stats. I think this is a very reasonable user scenario.

 

ovs-ofctl: instruction meter may be specified only once

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] How can we improve veth and tap performance in OVS DPDK?

2019-07-29 Thread D
Hi, all

 

We're trying OVS DPDK in an OpenStack cloud, but a big warning makes us
hesitate. Floating IP and qrouter use tap interfaces that are attached to
br-int, and SNAT should work in a similar way, so OVS DPDK will impact VM
network performance significantly. I believe many cloud providers have
deployed OVS DPDK. My questions are:

 

1.   Do we have some known ways to improve this?

2.   Is there any existing effort for this? Veth in Kubernetes should
have the same performance issue in the OVS DPDK case.

 

I also found a very weird issue. I added two veth pairs to an OVS bridge and
to an OVS DPDK bridge; in the OVS case iperf3 works well, but it doesn't in
the OVS DPDK case. What's wrong?

 

$ sudo ./my-ovs-vsctl show

2a67c1d9-51dc-4728-bb3e-405f2f49e2b1

Bridge br-int

Port "veth3-br"

Interface "veth3-br"

Port "dpdk0"

Interface "dpdk0"

type: dpdk

options: {dpdk-devargs=":00:08.0"}

Port br-int

Interface br-int

type: internal

Port "veth2-br"

Interface "veth2-br"

Port "dpdk1"

Interface "dpdk1"

type: dpdk

options: {dpdk-devargs=":00:09.0"}

Port "veth4-br"

Interface "veth4-br"

Port "veth1-br"

Interface "veth1-br"

$ sudo ip netns exec ns1 ifconfig veth1

veth1 Link encap:Ethernet  HWaddr 26:32:e8:f3:1e:2a

  inet addr:20.1.1.1  Bcast:20.1.1.255  Mask:255.255.255.0

  inet6 addr: fe80::2432:e8ff:fef3:1e2a/64 Scope:Link

  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

  RX packets:809 errors:0 dropped:0 overruns:0 frame:0

  TX packets:20 errors:0 dropped:0 overruns:0 carrier:0

  collisions:0 txqueuelen:1000

  RX bytes:66050 (66.0 KB)  TX bytes:1580 (1.5 KB)

 

$ sudo ip netns exec ns2 ifconfig veth2

veth2 Link encap:Ethernet  HWaddr 82:71:3b:41:d1:ec

  inet addr:20.1.1.2  Bcast:20.1.1.255  Mask:255.255.255.0

  inet6 addr: fe80::8071:3bff:fe41:d1ec/64 Scope:Link

  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

  RX packets:862 errors:0 dropped:0 overruns:0 frame:0

  TX packets:26 errors:0 dropped:0 overruns:0 carrier:0

  collisions:0 txqueuelen:1000

  RX bytes:70436 (70.4 KB)  TX bytes:2024 (2.0 KB)

 

$ sudo ip netns exec ns2 ping 20.1.1.1

PING 20.1.1.1 (20.1.1.1) 56(84) bytes of data.

64 bytes from 20.1.1.1: icmp_seq=1 ttl=64 time=0.353 ms

64 bytes from 20.1.1.1: icmp_seq=2 ttl=64 time=0.322 ms

64 bytes from 20.1.1.1: icmp_seq=3 ttl=64 time=0.333 ms

64 bytes from 20.1.1.1: icmp_seq=4 ttl=64 time=0.329 ms

64 bytes from 20.1.1.1: icmp_seq=5 ttl=64 time=0.340 ms

^C

--- 20.1.1.1 ping statistics ---

5 packets transmitted, 5 received, 0% packet loss, time 4099ms

rtt min/avg/max/mdev = 0.322/0.335/0.353/0.019 ms

$ sudo ip netns exec ns1 iperf3 -s -i 10 &

[2] 2851

[1]   Exit 1  sudo ip netns exec ns1 iperf3 -s -i 10

$ ---

Server listening on 5201

---

 

$ sudo ip netns exec ns2 iperf3 -t 60 -i 10 -c 20.1.1.1

iperf3: error - unable to connect to server: Connection timed out

$

 

iperf3 just hangs there and then exits because of the timeout. What's wrong
here?

 

$ sudo ./my-ovs-ofctl -Oopenflow13 dump-flows br-int

cookie=0x0, duration=1076.396s, table=0, n_packets=1522, n_bytes=124264,
priority=0 actions=NORMAL

$

 

The below is the Red Hat OSP document for your reference.

 

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/
html/network_functions_virtualization_planning_and_configuration_guide/part-
dpdk-configure

 


8.8. Known limitations


There are certain limitations when configuring OVS-DPDK with Red Hat
OpenStack Platform for the NFV use case: 

*   Use Linux bonds for control plane networks. Ensure both PCI devices
used in the bond are on the same NUMA node for optimum performance. Neutron
Linux bridge configuration is not supported by Red Hat. 
*   Huge pages are required for every instance running on the hosts with
OVS-DPDK. If huge pages are not present in the guest, the interface appears
but does not function. 
*   There is a performance degradation of services that use tap devices,
because these devices do not support DPDK. For example, services such as
DVR, FWaaS, and LBaaS use tap devices. 

*   With OVS-DPDK, you can enable DVR with netdev datapath, but this has
poor performance and is not suitable for a production environment. DVR uses
kernel namespace and tap devices to perform the routing. 
*   To ensure the DVR routing performs well with OVS-DPDK, you need to
use a controller such as ODL which implements routing as OpenFlow rules.
With OVS-DPDK, OpenFlow routing removes the bottleneck introduced by the
Linux kernel 

[ovs-dev] How can I delete flows which match a given cookie value?

2019-07-16 Thread D
Hi, all

 

I need to add and delete flows according to user operations. I know the
openflowplugin in OpenDaylight can do this, but it seems "ovs-ofctl del-flows"
can't work this way. Why can't a cookie value be used for "ovs-ofctl
del-flows"?

 

sudo ovs-ofctl -Oopenflow13 --strict del-flows br-int "table=2,cookie=12345"

ovs-ofctl: cannot set cookie

 

mod-flows can use a cookie to modify flows; can anybody tell me a way to do
this for del-flows? I have a unique cookie value for every user's flows, and I
want to delete those flows by ID/cookie when the user is deleted.
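
For reference, del-flows does accept a cookie when it is written together with
a mask; an all-ones mask matches exactly one cookie value, e.g.:

sudo ovs-ofctl -Oopenflow13 del-flows br-int "table=2,cookie=12345/-1"

The bare cookie=12345 form is rejected because, without a mask, it is treated
as a request to set the cookie, which del-flows cannot do.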

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Why is ovs DPDK much worse than ovs in my test case?

2019-07-09 Thread D
Hi, all

 

I just use OVS as a static router in my test case. OVS runs in a Vagrant VM
and the ethernet interfaces use the virtio driver. I create two OVS bridges,
each with one ethernet interface added; the two bridges are connected by a
patch port, and only the default openflow rule is there.

 

table=0, priority=0 actions=NORMAL

 

Bridge br-int

Port patch-br-ex

Interface patch-br-ex

type: patch

options: {peer=patch-br-int}

Port br-int

Interface br-int

type: internal

Port "dpdk0"

Interface "dpdk0"

type: dpdk

options: {dpdk-devargs=":00:08.0"}

Bridge br-ex

Port "dpdk1"

Interface "dpdk1"

type: dpdk

options: {dpdk-devargs=":00:09.0"}

Port patch-br-int

Interface patch-br-int

type: patch

options: {peer=patch-br-ex}

Port br-ex

Interface br-ex

type: internal

 

But when I run iperf to benchmark performance, the result shocked me.

 

For ovs nondpdk, the result is

 

vagrant@client1:~$ iperf -t 60 -i 10 -c 192.168.230.101



Client connecting to 192.168.230.101, TCP port 5001

TCP window size: 85.0 KByte (default)



[  3] local 192.168.200.101 port 53900 connected with 192.168.230.101 port
5001

[ ID] Interval   Transfer Bandwidth

[  3]  0.0-10.0 sec  1.05 GBytes   905 Mbits/sec

[  3] 10.0-20.0 sec  1.02 GBytes   877 Mbits/sec

[  3] 20.0-30.0 sec  1.07 GBytes   922 Mbits/sec

[  3] 30.0-40.0 sec  1.08 GBytes   927 Mbits/sec

[  3] 40.0-50.0 sec  1.06 GBytes   914 Mbits/sec

[  3] 50.0-60.0 sec  1.07 GBytes   922 Mbits/sec

[  3]  0.0-60.0 sec  6.37 GBytes   911 Mbits/sec

vagrant@client1:~$

 

For OVS DPDK, the bandwidth is just about 45 Mbits/sec. Why? I really don't
understand what happened.

 

vagrant@client1:~$ iperf -t 60 -i 10 -c 192.168.230.101



Client connecting to 192.168.230.101, TCP port 5001

TCP window size: 85.0 KByte (default)



[  3] local 192.168.200.101 port 53908 connected with 192.168.230.101 port
5001

[ ID] Interval   Transfer Bandwidth

[  3]  0.0-10.0 sec  54.6 MBytes  45.8 Mbits/sec

[  3] 10.0-20.0 sec  55.5 MBytes  46.6 Mbits/sec

[  3] 20.0-30.0 sec  52.5 MBytes  44.0 Mbits/sec

[  3] 30.0-40.0 sec  53.6 MBytes  45.0 Mbits/sec

[  3] 40.0-50.0 sec  54.0 MBytes  45.3 Mbits/sec

[  3] 50.0-60.0 sec  53.9 MBytes  45.2 Mbits/sec

[  3]  0.0-60.0 sec   324 MBytes  45.3 Mbits/sec

vagrant@client1:~$

 

By the way, I tried to pin physical cores to the qemu processes that
correspond to the OVS PMD threads, but it hardly affects performance.

 

  PID USER  PR  NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND
P

16303 yangyi 20   0   9207120 209700 107500 R 99.9  0.1   63:02.37
EMT-1  1

16304 yangyi 20   0   9207120 209700 107500 R 99.9  0.1   69:16.16
EMT-2  2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev