On 9/25/17, 10:37 PM, "Yuanhan Liu" <[email protected]> wrote:
Some highlights in v3
=====================
Here is the v3, with two major changes and further testing (including
many flows). This took more effort than I expected, so the v3
publication was delayed for a while.
The first major change is that the association between the mark id and
the flow is now done with an array instead of a CMAP. This gives us a
further performance gain: the boost can be up to 70% now (see the exact
numbers below).
This change also makes the code a bit more complex, though, due to the
locking issue. RCU is used (although I'm not sure it is used correctly).
For now, RCU only protects the array base address update (needed
because of reallocation); it does not protect updates to the array
items (array[i] = xx). I think this is buggy and I need to rethink it.
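To make the idea concrete, below is a minimal, hypothetical sketch of
that pattern (not the patch code), assuming OVS's ovsrcu helpers and a
made-up mark_map structure: readers dereference the published base
pointer, the writer reallocates, copies, and publishes a new base, and
the old array is freed only after a grace period. Only the base-pointer
swap is protected, which is exactly the open question above.

/* Hypothetical sketch (not the patch code): an RCU-protected, growable
 * array mapping a mark id to a flow pointer.  Only the base-pointer
 * swap on reallocation is protected; the per-slot stores (array[i] = x)
 * are not, which is the open question mentioned above. */

#include <string.h>

#include "ovs-rcu.h"    /* ovsrcu_get(), ovsrcu_set(), ovsrcu_postpone() */
#include "ovs-thread.h"
#include "util.h"

struct dp_netdev_flow;                  /* Defined in dpif-netdev.c. */

struct mark_map {
    OVSRCU_TYPE(struct dp_netdev_flow **) flows;  /* RCU-protected base. */
    size_t capacity;
    struct ovs_mutex mutex;                       /* Serializes writers. */
};

/* Reader (PMD thread): safe against a concurrent reallocation. */
static struct dp_netdev_flow *
mark_map_find(const struct mark_map *map, uint32_t mark)
{
    struct dp_netdev_flow **flows =
        ovsrcu_get(struct dp_netdev_flow **, &map->flows);

    return mark < map->capacity ? flows[mark] : NULL;
}

/* Writer: grow the array, publish the new base pointer, and free the
 * old one only after all current readers have quiesced. */
static void
mark_map_expand(struct mark_map *map, size_t new_capacity)
    OVS_REQUIRES(map->mutex)
{
    struct dp_netdev_flow **old_base =
        ovsrcu_get_protected(struct dp_netdev_flow **, &map->flows);
    struct dp_netdev_flow **new_base = xcalloc(new_capacity,
                                               sizeof *new_base);

    memcpy(new_base, old_base, map->capacity * sizeof *old_base);
    ovsrcu_set(&map->flows, new_base);  /* Atomic base-address update. */
    map->capacity = new_capacity;       /* XXX: not published atomically
                                         * with the base pointer; one of
                                         * the things still to fix. */
    ovsrcu_postpone(free, old_base);    /* Free after a grace period. */
}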
The second major change is that a dedicated thread has been introduced
to do the real flow offload. This reduces the overhead of hw flow
offload installation/deletion in the data path. See patch 9 for more
detailed info.
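A rough, hypothetical sketch of that hand-off (not the patch code; the
queue, mutex, and offload_item names are made up, built on OVS's
ovs_list/ovs_mutex helpers): the data path only enqueues a request, and
the dedicated thread drains the queue and talks to the NIC.

/* Hypothetical sketch of the hand-off in patch 9: the datapath only
 * enqueues an offload request; a dedicated thread performs the (slow)
 * rte_flow installation/deletion so PMD threads never block on the NIC. */

#include "openvswitch/list.h"
#include "ovs-thread.h"
#include "util.h"

struct dp_netdev_flow;              /* Defined in dpif-netdev.c. */

enum offload_op { OFFLOAD_OP_ADD, OFFLOAD_OP_DEL };

struct offload_item {
    struct ovs_list node;           /* In offload_queue. */
    enum offload_op op;
    struct dp_netdev_flow *flow;    /* Flow to install or remove. */
};

static struct ovs_mutex offload_mutex = OVS_MUTEX_INITIALIZER;
static pthread_cond_t offload_cond = PTHREAD_COND_INITIALIZER;
static struct ovs_list offload_queue = OVS_LIST_INITIALIZER(&offload_queue);

/* Called from the datapath: cheap, never touches the NIC. */
static void
queue_flow_offload(struct dp_netdev_flow *flow, enum offload_op op)
{
    struct offload_item *item = xmalloc(sizeof *item);

    item->op = op;
    item->flow = flow;
    ovs_mutex_lock(&offload_mutex);
    ovs_list_push_back(&offload_queue, &item->node);
    xpthread_cond_signal(&offload_cond);
    ovs_mutex_unlock(&offload_mutex);
}

/* Body of the dedicated offload thread. */
static void *
flow_offload_main(void *arg OVS_UNUSED)
{
    for (;;) {
        struct offload_item *item;

        ovs_mutex_lock(&offload_mutex);
        while (ovs_list_is_empty(&offload_queue)) {
            ovs_mutex_cond_wait(&offload_cond, &offload_mutex);
        }
        item = CONTAINER_OF(ovs_list_pop_front(&offload_queue),
                            struct offload_item, node);
        ovs_mutex_unlock(&offload_mutex);

        /* The slow part happens here, outside the datapath: translate
         * the flow to rte_flow patterns/actions and install or remove
         * it on the NIC. */
        free(item);
    }
    return NULL;
}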
[Darrell] There might be other options to handle this, such as dividing
time to HWOL within a given PMD.
Another option is PMD isolation.
In the last discussion, using the RSS action was suggested as a way to
get rid of the QUEUE action workaround. Unfortunately, it didn't work:
flow creation failed with the MLX5 PMD, and that is the only driver in
DPDK that supports the RSS action so far.
[Darrell]
I wonder if we should take a pause before jumping into the next version
of the code.
If workarounds are required in OVS code, then so be it, as long as they
are not overly disruptive to the existing code and hard to maintain.
In the case of RTE_FLOW_ACTION_TYPE_RSS, we might have a reasonable
option to avoid some unpleasant OVS workarounds. This would make a
significant difference in the code paths if we supported it, so we need
to be sure as early as possible.
The support needed would be in some drivers and seems reasonably
doable. Moreover, this was discussed in the last DPDK meeting and the
support was indicated as existing(?), although I only verified the MLX
code myself.
I had seen the MLX code supporting the RSS action, and there are some
checks for supported cases; when you say "it didn't work", what was the
issue?
Let us also have a discussion about the Intel NIC side and the Napatech
side. It seems reasonable to ask where the disconnect is and whether
this support can be added, and then make a decision based on the
answers.
What do you think?
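For reference, here is a hypothetical sketch of the MARK + RSS
combination under discussion, written against the rte_flow API of
recent DPDK releases (struct rte_flow_action_rss was laid out
differently in the DPDK version this series targets, so the field names
below are assumptions). The RSS action's only job here is to keep
packets spread across the existing Rx queues while the MARK action tags
them, avoiding the single-QUEUE workaround.

/* Hypothetical sketch: tag matching packets with a mark while an RSS
 * action keeps them spread across the port's Rx queues, instead of
 * pinning them to one queue with RTE_FLOW_ACTION_TYPE_QUEUE. */

#include <rte_flow.h>

static struct rte_flow *
offload_with_rss(uint16_t port_id, uint32_t mark_id,
                 const uint16_t *rxq_ids, uint32_t n_rxq,
                 const struct rte_flow_item *patterns,
                 struct rte_flow_error *error)
{
    const struct rte_flow_attr attr = { .ingress = 1 };
    const struct rte_flow_action_mark mark = { .id = mark_id };
    const struct rte_flow_action_rss rss = {
        .func = RTE_ETH_HASH_FUNCTION_DEFAULT,
        .level = 0,
        .types = 0,             /* Use the port's configured RSS hash. */
        .queue_num = n_rxq,
        .queue = rxq_ids,       /* All Rx queues of the port. */
    };
    const struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
        { .type = RTE_FLOW_ACTION_TYPE_RSS,  .conf = &rss  },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };

    return rte_flow_create(port_id, &attr, patterns, actions, error);
}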
I also tested with many flows this time. The result is even more
exciting: the boost can be up to 267% with 512 megaflows (each of which
has another 512 exact-match flows, so it's 512*512=256K flows in
total), with one core and one queue doing PHY-PHY forwarding. For the
offload case, the performance stays the same as with one flow only,
because the cost of the mark-to-flow translation is constant no matter
how many flows are inserted (as long as they are all offloaded). For
vanilla OVS-DPDK, however, the more flows there are, the worse the
performance gets. In other words, the more flows, the bigger the
difference we will see.
There were a lot of discussions on the last version. I'm sorry if I
missed some comments and didn't make the corresponding changes in v3.
Please let me know if I made such mistakes.
Below is the formal cover letter introduction, for those who are seeing
this patchset for the first time.
---
Hi,
Here is joint work from Mellanox and Napatech to enable flow hw offload
with the DPDK generic flow interface (rte_flow).
The basic idea is to associate the flow with a mark id (a uint32_t
number). Later, we can then get the flow directly from the mark id,
bypassing the heavy EMC processing, including miniflow_extract.
The association is done with an array in patch 1. It also reuses the
flow APIs introduced while adding the tc offloads. The EMC bypass is
done in patch 2. The flow offload is done in patch 4, which mainly does
two things:
- translate the OVS match to DPDK rte_flow patterns
- bind those patterns with a MARK action
Afterwards, the NIC will set the mark id in the mbuf of every packet
that matches the flow. That's basically how we can get the flow
directly from the received mbuf.
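To make the mechanism concrete, here is a hypothetical sketch (not the
patch code) of both ends of that path; mark_to_flow_find() and the
dp_netdev_flow type stand in for the array lookup added in patch 1.

/* Hypothetical sketch of both ends of the mark path: install a rte_flow
 * that tags matching packets with a mark id, then read that mark back
 * from the received mbuf to look the flow up directly. */

#include <rte_flow.h>
#include <rte_mbuf.h>

struct dp_netdev_flow;                                    /* dpif-netdev.c */
struct dp_netdev_flow *mark_to_flow_find(uint32_t mark);  /* Hypothetical:
                                                           * the array lookup
                                                           * from patch 1. */

/* Offload side: bind the translated patterns to a MARK action (the real
 * patches also append a QUEUE action as a workaround). */
static struct rte_flow *
install_mark_flow(uint16_t port_id, uint32_t mark_id,
                  const struct rte_flow_item *patterns,
                  struct rte_flow_error *error)
{
    const struct rte_flow_attr attr = { .ingress = 1 };
    const struct rte_flow_action_mark mark = { .id = mark_id };
    const struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };

    return rte_flow_create(port_id, &attr, patterns, actions, error);
}

/* Receive side: if the NIC stamped a mark, use it to find the flow and
 * skip miniflow_extract() and the EMC lookup entirely. */
static struct dp_netdev_flow *
flow_from_mbuf(const struct rte_mbuf *m)
{
    if (m->ol_flags & PKT_RX_FDIR_ID) {
        return mark_to_flow_find(m->hash.fdir.hi);
    }
    return NULL;    /* No mark: fall back to the normal lookup path. */
}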
While testing PHY-PHY forwarding with one core, one queue and one flow,
I got about a 70% performance boost. For PHY-vhost forwarding, I got
about a 50% performance boost. That is basically the performance I got
with v1, when tcp_flags was ignored. In summary, the CMAP-to-array
change gives yet another 16% performance boost.
The major issue mentioned in v1 has also been worked around: the queue
index is no longer blindly set to 0, but is set to the rxq that first
receives the upcall packet.
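A hypothetical sketch of that workaround (add_queue_action() and its
parameters are made up): the rxq recorded at upcall time becomes the
target of the QUEUE action appended after the MARK action.

/* Hypothetical sketch of the workaround: the rx queue that received the
 * upcall packet is remembered and used as the QUEUE action target, so
 * marked packets keep arriving on the queue that was already handling
 * that traffic, rather than being steered blindly to queue 0. */

#include <rte_flow.h>

static void
add_queue_action(struct rte_flow_action *action,
                 struct rte_flow_action_queue *conf,
                 uint16_t upcall_rxq)
{
    conf->index = upcall_rxq;   /* rxq recorded when the upcall was made. */
    action->type = RTE_FLOW_ACTION_TYPE_QUEUE;
    action->conf = conf;
}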
Note that hw offload is disabled by default; it can be enabled by:
$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
v3: - The mark and id association is done with an array instead of a
      CMAP.
    - Added a thread to do the hw offload operations.
    - Removed macros completely.
    - Dropped the patch that sets FDIR_CONF, which is a workaround for
      some Intel NICs.
    - Added a debug patch to show all flow patterns we have created.
    - Misc fixes.
v2: - Worked around the queue action issue.
    - Fixed the tcp_flags being skipped issue, which also fixed the
      build warnings.
    - Fixed L2 patterns for Intel NICs.
    - Converted some macros to functions.
    - Did not hardcode the max number of flows/actions.
    - Rebased on top of the latest code.
Thanks.
--yliu
---
Finn Christensen (2):
netdev-dpdk: implement flow put with rte flow
netdev-dpdk: retry with queue action
Shachar Beiser (1):
dpif-netdev: record rx queue id for the upcall
Yuanhan Liu (6):
dpif-netdev: associate flow with a mark id
dpif-netdev: retrieve flow directly from the flow mark
netdev-dpdk: convert ufid to dpdk flow
netdev-dpdk: remove offloaded flow on deletion
netdev-dpdk: add debug for rte flow patterns
dpif-netdev: do hw flow offload in another thread
lib/dp-packet.h | 14 +
lib/dpif-netdev.c | 421 ++++++++++++++++++++++++++++-
lib/flow.c | 155 ++++++++---
lib/flow.h | 1 +
lib/netdev-dpdk.c | 776 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
lib/netdev.c | 1 +
lib/netdev.h | 7 +
7 files changed, 1331 insertions(+), 44 deletions(-)
--
2.7.4
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev