On Wed, Jan 10, 2018 at 02:38:13PM +0000, Chandran, Sugesh wrote:
> Hi Yuanhan,
> 
> Thank you for your reply!..and for the wishes too :)
>
> At a very high level, my comments are on the approach taken for enabling
> hardware acceleration in OVS-DPDK. In general, the proposal works very well
> for the specific partial offload use case, i.e. alleviating the miniflow
> extract and optimizing the megaflow lookup. I totally agree on the
> performance advantage aspect of it. However, I expect hardware acceleration
> in OVS-DPDK to have a much broader scope than just flow lookup optimization.
> I raised a few comments around these points earlier, when the previous
> versions of the patches were released.
> 
> Once again I am sharing my thoughts here to initiate a discussion in this
> area. From my point of view, hardware acceleration in OVS-DPDK:

Thank you for sharing it!

> 1)    Must understand the hardware and its capabilities. I would prefer
> proactive hardware management to a reactive model. This approach lets the
> software collect the relevant hardware information in advance, before
> interacting with the device. It has an impact on all the port-specific
> operations, including port init, device programming, and so on.

Yes, I agree, and I think we have discussed it before as well. Doing the
probes proactively seems likely to work.
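For illustration, here is a very rough sketch of what such a proactive probe
could look like with rte_flow: at port init, build the MARK + RSS action
combination this series relies on and ask the PMD to validate it, caching the
answer instead of discovering a failure at flow-install time. This is only my
untested sketch, not code from the series; the helper name is made up, the
struct rte_flow_action_rss layout is the DPDK 17.11 one, and whether a NULL
rss_conf is accepted is PMD-dependent.

    /* Hedged sketch: probe at port init whether the PMD accepts the
     * MARK + RSS action combination, instead of finding out when the
     * first flow install fails.  netdev_dpdk_probe_mark_rss() is a
     * hypothetical helper name. */
    #include <stdbool.h>
    #include <stdlib.h>
    #include <rte_flow.h>

    static bool
    netdev_dpdk_probe_mark_rss(uint16_t port_id, uint16_t n_rxq)
    {
        const struct rte_flow_attr attr = { .ingress = 1 };
        const struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },  /* match any ethernet frame */
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        const struct rte_flow_action_mark mark = { .id = 1 };
        struct rte_flow_action_rss *rss;
        struct rte_flow_error error;
        bool supported;
        int i;

        /* rte_flow_action_rss ends in a flexible queue[] array (17.11). */
        rss = malloc(sizeof *rss + n_rxq * sizeof rss->queue[0]);
        if (!rss) {
            return false;
        }
        rss->rss_conf = NULL;        /* keep the port's RSS configuration */
        rss->num = n_rxq;
        for (i = 0; i < n_rxq; i++) {
            rss->queue[i] = i;
        }

        const struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
            { .type = RTE_FLOW_ACTION_TYPE_RSS,  .conf = rss },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        /* Validation checks the rule without programming the hardware. */
        supported = !rte_flow_validate(port_id, &attr, pattern,
                                       actions, &error);
        free(rss);
        return supported;
    }

The result could then be cached per port, and we would fall back to the pure
software path for ports where the probe fails.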

> 2)    Define hardware acceleration modes to support different acceleration
> methods. This will allow working with various hardware devices based on
> their capabilities.
> 3)    It would be nice to have OVS use a device + port model. This is
> helpful when enumerating device-bound characteristics (such as supported
> protocols, TCAM size, and so on). It also helps when handling flows that
> span different hardware acceleration devices.
> 4)    Having the device model mentioned in (3) helps OVS distribute
> hardware flow installs across multiple threads, rather than the single
> thread in the current implementation. Maybe it is possible to have one flow
> thread per device to avoid contention.

A single thread was chosen for simplicity, and I do think it could be
extended to multiple threads when that becomes necessary.
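If we go the per-device route from point (4), I would imagine something along
these lines: one request queue and one worker thread per offload device, with
the PMD threads enqueuing onto the queue of the flow's in-port device. This is
only a generic, untested sketch with made-up names, not the code in this
series (which uses a single queue and thread):

    /* Hedged sketch of per-device offload dispatch: each offload device
     * gets its own FIFO request queue and worker thread, so flow installs
     * for different NICs do not contend on one lock. */
    #include <pthread.h>
    #include <stdlib.h>

    struct offload_request {
        struct offload_request *next;
        void *flow;                    /* dp_netdev_flow, opaque here */
        int op;                        /* add / modify / delete */
    };

    struct offload_dev {
        pthread_mutex_t mutex;
        pthread_cond_t cond;
        struct offload_request *head, *tail;   /* FIFO request queue */
        pthread_t thread;
    };

    static void *
    offload_dev_main(void *aux)
    {
        struct offload_dev *dev = aux;

        for (;;) {
            pthread_mutex_lock(&dev->mutex);
            while (!dev->head) {
                pthread_cond_wait(&dev->cond, &dev->mutex);
            }
            struct offload_request *req = dev->head;
            dev->head = req->next;
            if (!dev->head) {
                dev->tail = NULL;
            }
            pthread_mutex_unlock(&dev->mutex);

            /* Translate and program the flow on this device only,
             * e.g. via rte_flow_create() / rte_flow_destroy(). */
            free(req);
        }
        return NULL;
    }

    /* Called from PMD threads; requests for different NICs never
     * serialize on the same lock. */
    static void
    offload_dev_enqueue(struct offload_dev *dev, struct offload_request *req)
    {
        req->next = NULL;
        pthread_mutex_lock(&dev->mutex);
        if (dev->tail) {
            dev->tail->next = req;
        } else {
            dev->head = req;
        }
        dev->tail = req;
        pthread_cond_signal(&dev->cond);
        pthread_mutex_unlock(&dev->mutex);
    }

    static int
    offload_dev_start(struct offload_dev *dev)
    {
        dev->head = dev->tail = NULL;
        pthread_mutex_init(&dev->mutex, NULL);
        pthread_cond_init(&dev->cond, NULL);
        return pthread_create(&dev->thread, NULL, offload_dev_main, dev);
    }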

> 5)    The action translation and flow validation for hardware acceleration
> have to be device centric, so having one set of functions may not work for
> other use cases. Consider another partial hardware acceleration approach,
> 'zero copy', where packets are delivered directly to the VM queues by the
> NIC. For zero-copy acceleration, the actions cannot be 'MARK and RSS';
> instead it will be a specific queue action. Similarly, each acceleration
> mode will have its own validations and set of actions. Maybe it is good to
> have function hooks for these operations based on the acceleration mode.

Sure, the ideal case would be to choose different flow actions (or
combinations) based on the offload model, and I think it would be easy to
make such a change.
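To make that concrete, here is a rough sketch of how the fate actions could
be chosen per mode. The mode enum and helper are made up for illustration;
only the MARK, RSS and QUEUE action types come from the discussion above, and
the caller is assumed to own the conf structs.

    /* Hedged sketch: pick the rte_flow actions from the acceleration mode
     * instead of hard-coding MARK + RSS.  All names are illustrative. */
    #include <rte_flow.h>

    enum hw_accel_mode {
        HW_ACCEL_FLOW_MARK,   /* partial offload: MARK + RSS, as in this series */
        HW_ACCEL_ZERO_COPY,   /* packets steered straight to a VM queue */
        HW_ACCEL_FULL,        /* hardware also executes the OVS actions */
    };

    /* Fill 'actions' (room for at least 3 entries) for the given mode.
     * 'mark', 'rss' and 'queue' are caller-owned and must stay alive
     * until the flow is created. */
    static void
    build_fate_actions(struct rte_flow_action actions[3],
                       enum hw_accel_mode mode,
                       const struct rte_flow_action_mark *mark,
                       const struct rte_flow_action_rss *rss,
                       const struct rte_flow_action_queue *queue)
    {
        int n = 0;

        switch (mode) {
        case HW_ACCEL_FLOW_MARK:
            /* Tag the packet and spread it with RSS, so the datapath can
             * skip miniflow extract / EMC / dpcls based on the mark. */
            actions[n++] = (struct rte_flow_action) {
                .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = mark };
            actions[n++] = (struct rte_flow_action) {
                .type = RTE_FLOW_ACTION_TYPE_RSS, .conf = rss };
            break;

        case HW_ACCEL_ZERO_COPY:
            /* Steer to the queue mapped into the VM; no MARK, since the
             * packet never reaches the OVS datapath. */
            actions[n++] = (struct rte_flow_action) {
                .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = queue };
            break;

        case HW_ACCEL_FULL:
            /* Full offload would translate the OVS actions themselves
             * (output, vlan push/pop, ...); out of scope here. */
            break;
        }

        actions[n] = (struct rte_flow_action) {
            .type = RTE_FLOW_ACTION_TYPE_END,
        };
    }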

> 
> Considering there is no DPDK support available for handling most of the
> above points, I don't mind pushing this patch into OVS mainline. However, I
> would expect a refactoring of that code in the future to make it generic
> enough to support different use cases and modes. With a generic approach it
> is possible to support various partial and full acceleration modes based on
> the available hardware.
> 

Agreed, and that is also what I have been thinking of. It will be easier to
make those decisions when we are at the point of doing a real extension
(say, adding full offload).
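When that refactoring happens, the "function hooks" idea from point (5) could
look roughly like the ops table below: one instance per acceleration mode, so
the common offload path stays generic and full offload becomes another
instance rather than a rewrite. All names and signatures here are made up for
illustration; they are not the API of this series.

    /* Hedged sketch of per-mode offload hooks. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct netdev;     /* OVS port abstraction */
    struct match;      /* OVS flow match */
    struct nlattr;     /* OVS datapath actions */

    struct hw_offload_ops {
        /* Can this mode accelerate the given match + actions on this port? */
        bool (*validate)(struct netdev *, const struct match *,
                         const struct nlattr *actions, size_t actions_len);
        /* Translate to device rules and program them; 0 on success. */
        int (*flow_put)(struct netdev *, const struct match *,
                        const struct nlattr *actions, size_t actions_len,
                        uint32_t mark_id);
        int (*flow_del)(struct netdev *, uint32_t mark_id);
    };

    /* This series would provide the flow-mark instance; zero-copy or full
     * offload modes could later provide their own without touching the
     * dispatcher. */
    extern const struct hw_offload_ops flow_mark_offload_ops;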

        --yliu

> > -----Original Message-----
> > From: Yuanhan Liu [mailto:y...@fridaylinux.org]
> > Sent: Monday, December 25, 2017 6:41 AM
> > To: Chandran, Sugesh <sugesh.chand...@intel.com>
> > Cc: Finn Christensen <f...@napatech.com>; d...@openvswitch.org; Darrell Ball
> > <db...@vmware.com>; Simon Horman <simon.hor...@netronome.com>;
> > Stokes, Ian <ian.sto...@intel.com>
> > Subject: Re: [PATCH v5 0/5] OVS-DPDK flow offload with rte_flow
> > 
> > Hi Sugesh,
> > 
> > On Fri, Dec 22, 2017 at 06:02:16PM +0000, Chandran, Sugesh wrote:
> > > Hi Yuanhan,
> > >
> > > Thank you for working on this.
> > > I will be reviewing this patch when I am back in the office after my
> > > Xmas/New Year vacation.
> > 
> > Thank you for that. And enjoy your new year vacation!
> > 
> > > I do have some comments, as well as questions on how this proposal is
> > > going to work with different hardware, full acceleration, and other
> > > partial offload options. Let's have that conversation after the holidays
> > > and see what's the best way to get it to work for most of the cases.
> > 
> > I think most of this patchset is generic enough, including translating
> > "match" to rte_flow patterns and doing the offload in another thread.
> > 
> > What is not really generic enough is the flow mark part. Firstly, it's a
> > partial offload. Secondly, it may not work well with Intel NICs (and
> > other NICs) due to the MARK + RSS action combination: some NICs might
> > only support one of the two actions. Also, the code path for full
> > acceleration should be different. But I think none of them will be a
> > real problem. Say, for full acceleration, there would be no MARK attached
> > to the mbuf, so the flow mark code path is bypassed entirely, while the
> > flow offload install path is re-used.
> > 
> > Probably what we need to do is to determine when to do a full offload or
> > a partial offload (say, the flow mark) somewhere (i.e. netdev-dpdk.c).
> > Based on that decision, we choose the proper flow actions for the
> > different kinds of offload.
> > 
> > I think we will know that when we are dealing with full offload later.
> > And I also think this patchset is a good start for the OVS-DPDK flow 
> > offload.
> > 
> >     --yliu
> > >
> > >
> > >
> > >
> > >
> > > Regards
> > > _Sugesh
> > >
> > >
> > > > -----Original Message-----
> > > > From: Finn Christensen [mailto:f...@napatech.com]
> > > > Sent: Friday, December 22, 2017 11:29 AM
> > > > To: Yuanhan Liu <y...@fridaylinux.org>; d...@openvswitch.org
> > > > Cc: Darrell Ball <db...@vmware.com>; Chandran, Sugesh
> > > > <sugesh.chand...@intel.com>; Simon Horman
> > > > <simon.hor...@netronome.com>; Stokes, Ian <ian.sto...@intel.com>
> > > > Subject: RE: [PATCH v5 0/5] OVS-DPDK flow offload with rte_flow
> > > >
> > > > Hi,
> > > >
> > > > The addition of the offload thread is a great addition. Furthermore,
> > > > the RSS action usage for target queues now works well with our NICs;
> > > > I just had to do some additional driver work to make it all work.
> > > >
> > > > The RSS support removed the problem with rte_flow queue selection,
> > > > but RSS in general added an additional challenge, as multiple PMDs
> > > > may now request the same flow offload (as Yuanhan pointed out). The
> > > > solution has worked without issues for me. I have run tests with 1
> > > > and 2 queues in PV and PVP setups going over virtio. Yuanhan
> > > > mentioned tests in a P2P setup with high-throughput acceleration. I
> > > > have focused on a PVP scenario, which, in addition, crosses virtio
> > > > and, most importantly, has acceleration in the Rx direction only
> > > > (seen from the VM). It still gives fairly good performance
> > > > improvements.
> > > >
> > > > Here are my initial test results.
> > > >
> > > > Percentage improvements:
> > > > 1 Rx/Tx queue:
> > > >     1    megaflow  -  10K flows:    PV: 73%
> > > >     10   megaflows -  100K flows:   PVP: 50%   PV: 56%
> > > >     100  megaflows -  1M flows:     PVP: 42%   PV: 49%
> > > >     1000 megaflows -  10M flows:    PVP: 40%   PV: 42%
> > > >
> > > > With RSS: 2 Rx/Tx queue pairs:
> > > >     1    megaflow  -  10K flows:    PV: 55%
> > > >     10   megaflows -  100K flows:   PVP: 36%   PV: 38%
> > > >     100  megaflows -  1M flows:     PVP: 27%   PV: 24%
> > > >     1000 megaflows -  10M flows:    PVP: 21%   PV: 24%
> > > >
> > > > PVP for 1 megaflow offload is omitted because my VM-app imposed a
> > > > bottleneck.
> > > >
> > > > Regards,
> > > > Finn
> > > >
> > > >     -----Original Message-----
> > > >     From: Yuanhan Liu [mailto:y...@fridaylinux.org]
> > > >     Sent: 20. december 2017 15:45
> > > >     To: d...@openvswitch.org
> > > >     Cc: Finn Christensen <f...@napatech.com>; Darrell Ball
> > > >     <db...@vmware.com>; Chandran Sugesh <sugesh.chand...@intel.com>;
> > > >     Simon Horman <simon.hor...@netronome.com>; Ian Stokes
> > > >     <ian.sto...@intel.com>; Yuanhan Liu <y...@fridaylinux.org>
> > > >     Subject: [PATCH v5 0/5] OVS-DPDK flow offload with rte_flow
> > > >
> > > >     Hi,
> > > >
> > > >     Here is a joint work from Mellanox and Napatech to enable flow hw
> > > >     offload with the DPDK generic flow interface (rte_flow).
> > > >
> > > >     The basic idea is to associate the flow with a mark id (a uint32_t
> > > >     number). Later, we then get the flow directly from the mark id,
> > > >     which can bypass some heavy CPU operations, including but not
> > > >     limited to miniflow extract, EMC lookup, dpcls lookup, etc.
> > > >
> > > >     The association is done with a CMAP in patch 1. The CPU workload
> > > >     bypassing is done in patch 2. The flow offload is done in patch 3,
> > > >     which mainly does two things:
> > > >
> > > >     - translate the OVS match to DPDK rte_flow patterns
> > > >     - bind those patterns with an RSS + MARK action.
> > > >
> > > >     Patch 5 makes the offload work happen in another thread, to keep
> > > >     the datapath as light as possible.
> > > >
> > > >     A PHY-PHY forwarding test with 1000 megaflows (udp, tp_src=1000-1999)
> > > >     and 1 million streams (tp_src=1000-1999, tp_dst=2000-2999) shows a
> > > >     more than 260% performance boost.
> > > >
> > > >     Note that it's disabled by default, which can be enabled by:
> > > >
> > > >         $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> > > >
> > > >
> > > >     v5: - fixed an issue where it took too long if we did flow add/remove
> > > >           repeatedly.
> > > >         - removed an unused mutex lock
> > > >         - turned most of the log level to DBG
> > > >         - rebased on top of the latest code
> > > >
> > > >     v4: - use RSS action instead of QUEUE action with MARK
> > > >         - make it work with multiple queue (see patch 1)
> > > >         - rebased on top of latest code
> > > >
> > > >     v3: - The mark and id association is done with array instead of 
> > > > CMAP.
> > > >         - Added a thread to do hw offload operations
> > > >         - Removed macros completely
> > > >         - dropped the patch to set FDIR_CONF, which is a workaround for
> > > >           some Intel NICs.
> > > >         - Added a debug patch to show all flow patterns we have created.
> > > >         - Misc fixes
> > > >
> > > >     v2: - workaround the queue action issue
> > > >         - fixed the tcp_flags being skipped issue, which also fixed the
> > > >           build warnings
> > > >         - fixed l2 patterns for Intel NICs
> > > >         - Converted some macros to functions
> > > >         - did not hardcode the max number of flow/action
> > > >         - rebased on top of the latest code
> > > >
> > > >     Thanks.
> > > >
> > > >         --yliu
> > > >
> > > >
> > > >     ---
> > > >     Finn Christensen (1):
> > > >       netdev-dpdk: implement flow offload with rte flow
> > > >
> > > >     Yuanhan Liu (4):
> > > >       dpif-netdev: associate flow with a mark id
> > > >       dpif-netdev: retrieve flow directly from the flow mark
> > > >       netdev-dpdk: add debug for rte flow patterns
> > > >       dpif-netdev: do hw flow offload in a thread
> > > >
> > > >      lib/dp-packet.h   |  13 +
> > > >      lib/dpif-netdev.c | 473 ++++++++++++++++++++++++++++++++++-
> > > >      lib/flow.c        | 155 +++++++++---
> > > >      lib/flow.h        |   1 +
> > > >      lib/netdev-dpdk.c | 732
> > > >     +++++++++++++++++++++++++++++++++++++++++++++++++++++-
> > > >      lib/netdev.h      |   6 +
> > > >      6 files changed, 1341 insertions(+), 39 deletions(-)
> > > >
> > > >     --
> > > >     2.7.4
> > > >