Hi All, Based on the comments below, I would like to call a meeting to decide what further needs to be done to handle different kinds of hardware acceleration in OVS-DPDK. We have a POC implementation that partially handles some of the cases, but it only works with some Intel hardware acceleration products. I also presented this proposal at the last OVS conference. I feel it's time to discuss this now and come up with a design that works for all the different hardware acceleration devices and possible acceleration modes. If everyone agrees with this suggestion, I can send out a meeting invite sometime next week (maybe Monday, January 15th, 5:00 PM GMT). Let me know if this timeslot is OK for everyone.
I am CCing the 'dev' mailing list here as well, in case anyone else wants to join the call and provide input. Regards _Sugesh > -----Original Message----- > From: Chandran, Sugesh > Sent: Wednesday, January 10, 2018 2:38 PM > To: 'Yuanhan Liu' <y...@fridaylinux.org> > Cc: Finn Christensen <f...@napatech.com>; d...@openvswitch.org; Darrell Ball > <db...@vmware.com>; Simon Horman <simon.hor...@netronome.com>; > Stokes, Ian <ian.sto...@intel.com> > Subject: RE: [PATCH v5 0/5] OVS-DPDK flow offload with rte_flow > > Hi Yuanhan, > > Thank you for your reply!.. and for the wishes too :) > > > At a very high level, my comments are on the approach taken for enabling > hardware acceleration in OVS-DPDK. In general, the proposal works very well for > the specific partial offload use case, i.e. alleviating the miniflow extract and > optimizing the megaflow lookup. I totally agree on its performance advantage. > However, I expect hardware acceleration in OVS-DPDK to have a much broader scope > than just a flow lookup optimization. I had > raised a few comments around these points earlier, when the previous versions of the > patches were released. > > Once again I am sharing my thoughts here to initiate a discussion in this > area. In > my point of view, hardware acceleration in OVS-DPDK must: > 1) Understand the hardware and its capabilities. I would prefer > proactive hardware management over a reactive model. This approach lets the > software collect the relevant hardware information in advance, before > interacting with the hardware. It will have an impact on all the port-specific operations, > including port init, device programming, etc. > 2) Define hardware acceleration modes to support different acceleration > methods. This will allow OVS to work with various hardware devices based on their > capabilities. > 3) It would be nice to have the OVS software use a device + port model. 
This > is helpful when enumerating device-bound characteristics (such as supported > protocols, TCAM size, etc.). It also helps when handling flows that span across > different hardware acceleration devices. > 4) Having the device model mentioned in (3) helps OVS distribute > hardware flow installs across multiple threads, rather than the single thread in the current > implementation. Maybe it's possible to have one flow thread per device to avoid > contention. > 5) The action translation and flow validation for hardware acceleration > have to be device-centric, so one set of functions may not work for other > use cases. Consider another partial hardware acceleration approach, 'zero copy', > where packets are copied directly to the VM queues by the NIC. For zero-copy > acceleration, the actions cannot be 'mark and RSS'; instead there will be a specific > queue action. Similarly, each acceleration mode will have its own validations > and set of actions. Maybe it's good to have function hooks for these operations, > based on the acceleration mode. > > Considering there is no DPDK support available for handling most of the above > points, I don't mind pushing this patch into OVS mainline. However, I would > expect a refactoring of that code in the future to make it generic enough to > support different use cases and modes. With a generic approach it is > possible to support various partial and full acceleration modes, based > on the available hardware. 
> > Regards > _Sugesh > > > -----Original Message----- > > From: Yuanhan Liu [mailto:y...@fridaylinux.org] > > Sent: Monday, December 25, 2017 6:41 AM > > To: Chandran, Sugesh <sugesh.chand...@intel.com> > > Cc: Finn Christensen <f...@napatech.com>; d...@openvswitch.org; Darrell > > Ball <db...@vmware.com>; Simon Horman > <simon.hor...@netronome.com>; > > Stokes, Ian <ian.sto...@intel.com> > > Subject: Re: [PATCH v5 0/5] OVS-DPDK flow offload with rte_flow > > > > Hi Sugesh, > > > > On Fri, Dec 22, 2017 at 06:02:16PM +0000, Chandran, Sugesh wrote: > > > Hi Yuanhan, > > > > > > Thank you for working on this. > > > I will be reviewing this patch when I am back in the office after my > > > Xmas/New Year > > vacation. > > > > Thank you for that. And enjoy your new year vacation! > > > > > I do have some comments, as well as questions on how this > > > proposal is > > going to work with different hardware, full acceleration, and > > other partial offload options. Let's have that conversation after the > > holidays and see what's the best way to get it to work for most of the cases. > > > > I think most of this patchset is generic enough, including translating > > "match" to rte flow patterns, doing offload in another thread. > > > > What is not really generic enough is the flow mark part. Firstly, it's > > a partial offload. Secondly, it may not work well with Intel NICs (and > > other NICs) due to the MARK + RSS action combination: some NICs might > > only support one of the two actions. Also, the code path for full > > acceleration should be different. But I think none of them will be a > > real issue. Say, for full acceleration, there would be no MARK > > attached in the mbuf, so the flow mark code path is bypassed entirely, while > > the > flow offload installing path is re-used. > > > > Probably what we need to do is to determine when to do a full offload, > > or partial offload (say, the flow mark) somewhere (e.g. netdev-dpdk.c). 
> > Based on the decision, we choose the proper flow action for the > > different kinds of offload. > > > > I think we will know that when we are dealing with full offload later. > > And I also think this patchset is a good start for the OVS-DPDK flow > > offload. > > > > --yliu > > > > > > > > > > > > > > > > > > Regards > > > _Sugesh > > > > > > > -----Original Message----- > > > > From: Finn Christensen [mailto:f...@napatech.com] > > > > Sent: Friday, December 22, 2017 11:29 AM > > > > To: Yuanhan Liu <y...@fridaylinux.org>; d...@openvswitch.org > > > > Cc: Darrell Ball <db...@vmware.com>; Chandran, Sugesh > > > > <sugesh.chand...@intel.com>; Simon Horman > > > > <simon.hor...@netronome.com>; Stokes, Ian <ian.sto...@intel.com> > > > > Subject: RE: [PATCH v5 0/5] OVS-DPDK flow offload with rte_flow > > > > > > > > Hi, > > > > > > > > The addition of the offload thread is a great addition; furthermore, > > > > the RSS action usage for target queues now works well > > > > with our NICs. I just had to do some additional driver work to make it > > > > all > work. > > > > > > > > The support for RSS removed the problem with rte_flow queue > > > > selection, but RSS in general added an additional challenge, as now > > > > multiple PMDs may request the same flow offload (as Yuanhan > > > > pointed out). The solution has worked without issues for me. I > > > > have run tests with 1 and 2 queues in a PV and PVP setup going over > > > > virtio. > > > > Yuanhan mentioned tests in a P2P setup with high-throughput > > > > acceleration. I have focused on a PVP scenario, which, in > > > > addition, crosses virtio, and most importantly, has > > > > acceleration in the Rx direction only (as seen from the VM). It still gives > > > > fairly good performance > > improvements. > > > > > > > > Here are my initial test results. 
> > > > Percentage improvements:
> > > > 1 Rx/Tx queue:
> > > >   1 megaflow     - 10K flows:   PV: 73%
> > > >   10 megaflows   - 100K flows:  PVP: 50%  PV: 56%
> > > >   100 megaflows  - 1M flows:    PVP: 42%  PV: 49%
> > > >   1000 megaflows - 10M flows:   PVP: 40%  PV: 42%
> > > >
> > > > With RSS, 2 Rx/Tx queue pairs:
> > > >   1 megaflow     - 10K flows:   PV: 55%
> > > >   10 megaflows   - 100K flows:  PVP: 36%  PV: 38%
> > > >   100 megaflows  - 1M flows:    PVP: 27%  PV: 24%
> > > >   1000 megaflows - 10M flows:   PVP: 21%  PV: 24%
> > > > > > > > PVP for 1 megaflow offload is omitted because my VM app imposed a > > > > bottleneck. > > > > > > > > Regards, > > > > Finn > > > > -----Original Message----- > > > > From: Yuanhan Liu [mailto:y...@fridaylinux.org] > > > > Sent: 20. december 2017 15:45 > > > > To: d...@openvswitch.org > > > > Cc: Finn Christensen <f...@napatech.com>; Darrell Ball > > > > <db...@vmware.com>; Chandran Sugesh <sugesh.chand...@intel.com>; > > > > Simon Horman <simon.hor...@netronome.com>; Ian Stokes > > > > <ian.sto...@intel.com>; Yuanhan Liu <y...@fridaylinux.org> > > > > Subject: [PATCH v5 0/5] OVS-DPDK flow offload with rte_flow > > > > > > > > Hi, > > > > > > > > This is joint work from Mellanox and Napatech, to enable flow hw > > > > offload with the DPDK generic flow interface (rte_flow). > > > > > > > > The basic idea is to associate the flow with a mark id (a uint32_t number). > > > > Later, we then get the flow directly from the mark id, which could bypass > > > > some heavy CPU operations, including but not limited to miniflow > > extract, > > > > EMC lookup, dpcls lookup, etc. > > > > > > > > The association is done with CMAP in patch 1. The CPU workload > > bypassing > > > > is done in patch 2. The flow offload is done in patch 3, which > > > > mainly does > > > > two things: > > > > > > > > - translate the ovs match to DPDK rte flow patterns > > > > - bind those patterns with an RSS + MARK action. 
> > > > > > > > Patch 5 makes the offload work happen in another thread, to leave the > > > > datapath as light as possible. > > > > > > > > A PHY-PHY forwarding test with 1000 megaflows > > > > (udp,tp_src=1000-1999) and > > 1 > > > > million streams (tp_src=1000-1999, tp_dst=2000-2999) shows a more than > > > > 260% performance boost. > > > > > > > > Note that it's disabled by default; it can be enabled by: > > > > > > > > $ ovs-vsctl set Open_vSwitch . > > > > other_config:hw-offload=true > > > > > > > > v5: - fixed an issue where it took too long if we do flow add/remove > > > > repeatedly > > > > - removed an unused mutex lock > > > > - turned most of the log levels to DBG > > > > - rebased on top of the latest code > > > > > > > > v4: - use RSS action instead of QUEUE action with MARK > > > > - make it work with multiple queues (see patch 1) > > > > - rebased on top of the latest code > > > > > > > > v3: - the mark and id association is done with an array instead of > > > > CMAP > > > > - added a thread to do hw offload operations > > > > - removed macros completely > > > > - dropped the patch to set FDIR_CONF, which was a workaround for some > > > > Intel NICs > > > > - added a debug patch to show all flow patterns we have created > > > > - misc fixes > > > > > > > > v2: - worked around the queue action issue > > > > - fixed the tcp_flags being skipped issue, which also fixed the > > > > build warnings > > > > - fixed l2 patterns for Intel NICs > > > > - converted some macros to functions > > > > - did not hardcode the max number of flows/actions > > > > - rebased on top of the latest code > > > > > > > > Thanks. 
> > > > > > > > --yliu > > > > > > > > > > > > --- > > > > Finn Christensen (1): > > > > netdev-dpdk: implement flow offload with rte flow > > > > > > > > Yuanhan Liu (4): > > > > dpif-netdev: associate flow with a mark id > > > > dpif-netdev: retrieve flow directly from the flow mark > > > > netdev-dpdk: add debug for rte flow patterns > > > > dpif-netdev: do hw flow offload in a thread > > > > > > > > lib/dp-packet.h | 13 + > > > > lib/dpif-netdev.c | 473 ++++++++++++++++++++++++++++++++++- > > > > lib/flow.c | 155 +++++++++--- > > > > lib/flow.h | 1 + > > > > lib/netdev-dpdk.c | 732 > > > > +++++++++++++++++++++++++++++++++++++++++++++++++++++- > > > > lib/netdev.h | 6 + > > > > 6 files changed, 1341 insertions(+), 39 deletions(-) > > > > > > > > -- > > > > 2.7.4