Hi Yuanhan, Thank you for your reply!..and for the wishes too :)
At very high level, my comments are on the approach taken for enabling hardware acceleration in OVS-DPDK. In general, the proposal works very well for the specific partial offload use case. i.e alleviate the miniflow extract and optimize the megaflow lookup. I totally agree on the performance advantage aspect of it. However I expect the hardware acceleration in OVS-DPDK should be in much more broad scope than just limited to a flow lookup optimization. I had raised a few comments around these point earlier when the previous versions of patches were released. Once again I am sharing my thoughts here to initiate a discussion on this area. In my point of view, hardware acceleration in OVS-DPDK will be, 1) Must understand the hardware and its capabilities. I would prefer a proactive hardware management than the reactive model. This approach let the software collect the relevant hardware information in advance before interacting with it. It will have impact on all the port specific operations including port init, device programming and etc. 2) Define hardware acceleration modes to support different acceleration methods. It will allow to work with various hardware devices based on its capabilities. 3) Its nice to have to make OVS software to use device + port model. This is helpful when enumerating device bound characteristics (such as supported protocols, tcam size, and etc). Also it helps when handling flows that span across different hardware acceleration devices. 4) Having device model as mentioned in (3) helps OVS to distribute hardware flow install into multiple threads than just single thread in the current implementation. May be its possible to have one flow thread per device to avoid the contention. 5) The action translation and flow validation for the hardware acceleration has to be device centric. So having one set of functions may not work for other use cases. Consider another partial hardware acceleration approach 'zero copy' where packets are copied directly to the VM queues by the NIC. For zero copy acceleration, the actions cannot be 'mark and RSS ' instead it will be specific queue action. Similarly different acceleration mode will have its own validations and set of actions. May be its good to have function hooks for these operation based on acceleration mode. Considering there are no DPDK support available for handle most of the above points, I don't mind pushing this patch into OVS mainline. However I would expect a refactoring of that code in the future to make it generic enough for supporting different use cases and modes. By having a generic approach it is possible to support various partial acceleration & full acceleration modes based on the available hardware. Regards _Sugesh > -----Original Message----- > From: Yuanhan Liu [mailto:[email protected]] > Sent: Monday, December 25, 2017 6:41 AM > To: Chandran, Sugesh <[email protected]> > Cc: Finn Christensen <[email protected]>; [email protected]; Darrell Ball > <[email protected]>; Simon Horman <[email protected]>; > Stokes, Ian <[email protected]> > Subject: Re: [PATCH v5 0/5] OVS-DPDK flow offload with rte_flow > > Hi Sugesh, > > On Fri, Dec 22, 2017 at 06:02:16PM +0000, Chandran, Sugesh wrote: > > Hi Yuanhan, > > > > Thank you for working on this. > > I will be reviewing this patch when I am back office after my Xmas/New Year > vacation. > > Thank you for that. And enjoy your new year vacation! > > > I do have some comments as well as the questions on how this proposal is > going to work with the different hardwares, full acceleration, and other > partial > offload options. Lets have that conversation after the holidays and see whats > the best way to get it work for most of the cases. > > I think most of this patchset is generic enough, including translating > "match" to > rte flow patterns, doing offload in another thread. > > What is not really generic enough is the flow mark part. Firstly, it's a > partial > offload. Secondly, it may not work well with Intel NIC (and other NICs) due to > the MAKR + RSS action combination: some NIC might only support one of the > two actions. Also, the code path for full acceleration should be different. > But I > think none of them will be a real matter. Say, for the full acceleration, > there > would be no MARK attached in the mbuf, then the flow mark code path is totally > passed, while the flow offload installing path is re-used. > > Probably what we need to do is to determine when to do a full offload, or > partial offload (say, the flow mark) at somewhere (i.e. netdev-dpdk.c). > Based on the decision, we chose the proper flow action for the different > kinds of > offload. > > I think we will know that when we are dealing with full offload later. > And I also think this patchset is a good start for the OVS-DPDK flow offload. > > --yliu > > > > > > > > > > > > Regards > > _Sugesh > > > > > > > -----Original Message----- > > > From: Finn Christensen [mailto:[email protected]] > > > Sent: Friday, December 22, 2017 11:29 AM > > > To: Yuanhan Liu <[email protected]>; [email protected] > > > Cc: Darrell Ball <[email protected]>; Chandran, Sugesh > > > <[email protected]>; Simon Horman > > > <[email protected]>; Stokes, Ian <[email protected]> > > > Subject: RE: [PATCH v5 0/5] OVS-DPDK flow offload with rte_flow > > > > > > Hi, > > > > > > The addition of the offload thread is an great addition, > > > furthermore, RSS action usage for target queues, works now well with > > > our NICs, I just had to do some additional drive work, to make it all > > > work. > > > > > > The support for RSS removed the problem with rte-flow queue > > > selection, but RSS in general added an additional challenge as now > > > multiple pmd's may request the same flow offload (as Yuanhan pointed > > > out). The solution has worked without issues for me. I have run > > > tests with 1 and 2 queues in a PV and PVP setup going over virtio. > > > Yuanhan mentioned tests in a P2P setup with high throughput > > > acceleration. I have focused on a PVP scenario, which, in addition, > > > will crosses virtio, and most importantly, has acceleration in Rx > > > direction only (seen from VM). It still gives fairly good performance > improvements. > > > > > > Here are my initial test results. > > > > > > Percentage improvements: > > > 1 RX/Tx queue: > > > 1 megaflow - 10K flowsPV: 73% > > > 10 megaflows - 100K flowsPVP: 50%PV: 56% > > > 100 megaflows - 1M flowsPVP: 42%PV: 49% > > > 1000 megaflows - 10M flowsPVP: 40%PV: 42% > > > > > > With RSS: 2 Rx/Tx queue pairs: > > > 1 megaflow - 10K flowsPV: 55% > > > 10 megaflows - 100K flowsPVP: 36%PV: 38% > > > 100 megaflows - 1M flowsPVP: 27%PV: 24% > > > 1000 megaflows - 10M flowsPVP: 21%PV: 24% > > > > > > PVP for 1 megaflow offload is omitted because my VM-app imposed a > > > bottle- neck. > > > > > > Regards, > > > Finn > > > > > > -----Original Message----- > > > From: Yuanhan Liu [mailto:[email protected]] > > > Sent: 20. december 2017 15:45 > > > To: [email protected] > > > Cc: Finn Christensen <[email protected]>; Darrell Ball > > > <[email protected]>; Chandran Sugesh <[email protected]>; > > > Simon Horman <[email protected]>; Ian Stokes > > > <[email protected]>; Yuanhan Liu <[email protected]> > > > Subject: [PATCH v5 0/5] OVS-DPDK flow offload with rte_flow > > > > > > Hi, > > > > > > Here is a joint work from Mellanox and Napatech, to enable the flow hw > > > offload with the DPDK generic flow interface (rte_flow). > > > > > > The basic idea is to associate the flow with a mark id (a unit32_t > > > number). > > > Later, we then get the flow directly from the mark id, which could > > > bypass > > > some heavy CPU operations, including but not limiting to mini flow > extract, > > > emc lookup, dpcls lookup, etc. > > > > > > The association is done with CMAP in patch 1. The CPU workload > bypassing > > > is done in patch 2. The flow offload is done in patch 3, which mainly > > > does > > > two things: > > > > > > - translate the ovs match to DPDK rte flow patterns > > > - bind those patterns with a RSS + MARK action. > > > > > > Patch 5 makes the offload work happen in another thread, for leaving > > > the > > > datapath as light as possible. > > > > > > A PHY-PHY forwarding with 1000 mega flows (udp,tp_src=1000-1999) and > 1 > > > million streams (tp_src=1000-1999, tp_dst=2000-2999) show more than > > > 260% performance boost. > > > > > > Note that it's disabled by default, which can be enabled by: > > > > > > $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true > > > > > > > > > v5: - fixed an issue that it took too long if we do flow add/remove > > > repeatedly. > > > - removed an unused mutex lock > > > - turned most of the log level to DBG > > > - rebased on top of the latest code > > > > > > v4: - use RSS action instead of QUEUE action with MARK > > > - make it work with multiple queue (see patch 1) > > > - rebased on top of latest code > > > > > > v3: - The mark and id association is done with array instead of CMAP. > > > - Added a thread to do hw offload operations > > > - Removed macros completely > > > - dropped the patch to set FDIR_CONF, which is a workround some > > > Intel NICs. > > > - Added a debug patch to show all flow patterns we have created. > > > - Misc fixes > > > > > > v2: - workaround the queue action issue > > > - fixed the tcp_flags being skipped issue, which also fixed the > > > build warnings > > > - fixed l2 patterns for Intel nic > > > - Converted some macros to functions > > > - did not hardcode the max number of flow/action > > > - rebased on top of the latest code > > > > > > Thanks. > > > > > > --yliu > > > > > > > > > --- > > > Finn Christensen (1): > > > netdev-dpdk: implement flow offload with rte flow > > > > > > Yuanhan Liu (4): > > > dpif-netdev: associate flow with a mark id > > > dpif-netdev: retrieve flow directly from the flow mark > > > netdev-dpdk: add debug for rte flow patterns > > > dpif-netdev: do hw flow offload in a thread > > > > > > lib/dp-packet.h | 13 + > > > lib/dpif-netdev.c | 473 ++++++++++++++++++++++++++++++++++- > > > lib/flow.c | 155 +++++++++--- > > > lib/flow.h | 1 + > > > lib/netdev-dpdk.c | 732 > > > +++++++++++++++++++++++++++++++++++++++++++++++++++++- > > > lib/netdev.h | 6 + > > > 6 files changed, 1341 insertions(+), 39 deletions(-) > > > > > > -- > > > 2.7.4 > > > > > > Disclaimer: This email and any files transmitted with it may contain > > > confidential information intended for the addressee(s) only. The > > > information is not to be surrendered or copied to unauthorized > > > persons. If you have received this communication in error, please > > > notify the sender immediately and delete this e- mail from your system. _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
