On Wed, Jan 10, 2018 at 02:38:13PM +0000, Chandran, Sugesh wrote:
> Hi Yuanhan,
>
> Thank you for your reply!..and for the wishes too :)
>
> At very high level, my comments are on the approach taken for enabling
> hardware acceleration in OVS-DPDK. In general, the proposal works very well
> for the specific partial offload use case. i.e alleviate the miniflow extract
> and optimize the megaflow lookup. I totally agree on the performance
> advantage aspect of it. However I expect the hardware acceleration in
> OVS-DPDK should be in much more broad scope than just limited to a flow
> lookup optimization. I had raised a few comments around these point earlier
> when the previous versions of patches were released.
>
> Once again I am sharing my thoughts here to initiate a discussion on this
> area. In my point of view, hardware acceleration in OVS-DPDK will be,
Thank you for sharing it!
> 1) Must understand the hardware and its capabilities. I would prefer a
> proactive hardware management than the reactive model. This approach let the
> software collect the relevant hardware information in advance before
> interacting with it. It will have impact on all the port specific operations
> including port init, device programming and etc.
Yes, I agree, and I think we have also discussed it before. Likely, doing
probes proactively seems to work.
> 2) Define hardware acceleration modes to support different acceleration
> methods. It will allow to work with various hardware devices based on its
> capabilities.
> 3) Its nice to have to make OVS software to use device + port model. This
> is helpful when enumerating device bound characteristics (such as supported
> protocols, tcam size, and etc). Also it helps when handling flows that span
> across different hardware acceleration devices.
> 4) Having device model as mentioned in (3) helps OVS to distribute
> hardware flow install into multiple threads than just single thread in the
> current implementation. May be its possible to have one flow thread per
> device to avoid the contention.
Single thread is chosen for simplicity, and I do think it could be extended
to be multiple thread, when it comes to necessary.
> 5) The action translation and flow validation for the hardware
> acceleration has to be device centric. So having one set of functions may not
> work for other use cases. Consider another partial hardware acceleration
> approach 'zero copy' where packets are copied directly to the VM queues by
> the NIC. For zero copy acceleration, the actions cannot be 'mark and RSS '
> instead it will be specific queue action. Similarly different acceleration
> mode will have its own validations and set of actions. May be its good to
> have function hooks for these operation based on acceleration mode.
Sure, the idea case would be choose different flow actions (or combinations)
based on different model. And I think it would be easy to do such change.
>
> Considering there are no DPDK support available for handle most of the above
> points, I don't mind pushing this patch into OVS mainline. However I would
> expect a refactoring of that code in the future to make it generic enough for
> supporting different use cases and modes. By having a generic approach it is
> possible to support various partial acceleration & full acceleration modes
> based on the available hardware.
>
Agree. And that's also what I have been thinking of. It's also easier to
make decisions when we are at the step of doing real extension (say, adding
the full offload).
--yliu
> > -----Original Message-----
> > From: Yuanhan Liu [mailto:[email protected]]
> > Sent: Monday, December 25, 2017 6:41 AM
> > To: Chandran, Sugesh <[email protected]>
> > Cc: Finn Christensen <[email protected]>; [email protected]; Darrell Ball
> > <[email protected]>; Simon Horman <[email protected]>;
> > Stokes, Ian <[email protected]>
> > Subject: Re: [PATCH v5 0/5] OVS-DPDK flow offload with rte_flow
> >
> > Hi Sugesh,
> >
> > On Fri, Dec 22, 2017 at 06:02:16PM +0000, Chandran, Sugesh wrote:
> > > Hi Yuanhan,
> > >
> > > Thank you for working on this.
> > > I will be reviewing this patch when I am back office after my Xmas/New
> > > Year
> > vacation.
> >
> > Thank you for that. And enjoy your new year vacation!
> >
> > > I do have some comments as well as the questions on how this proposal is
> > going to work with the different hardwares, full acceleration, and other
> > partial
> > offload options. Lets have that conversation after the holidays and see
> > whats
> > the best way to get it work for most of the cases.
> >
> > I think most of this patchset is generic enough, including translating
> > "match" to
> > rte flow patterns, doing offload in another thread.
> >
> > What is not really generic enough is the flow mark part. Firstly, it's a
> > partial
> > offload. Secondly, it may not work well with Intel NIC (and other NICs) due
> > to
> > the MAKR + RSS action combination: some NIC might only support one of the
> > two actions. Also, the code path for full acceleration should be different.
> > But I
> > think none of them will be a real matter. Say, for the full acceleration,
> > there
> > would be no MARK attached in the mbuf, then the flow mark code path is
> > totally
> > passed, while the flow offload installing path is re-used.
> >
> > Probably what we need to do is to determine when to do a full offload, or
> > partial offload (say, the flow mark) at somewhere (i.e. netdev-dpdk.c).
> > Based on the decision, we chose the proper flow action for the different
> > kinds of
> > offload.
> >
> > I think we will know that when we are dealing with full offload later.
> > And I also think this patchset is a good start for the OVS-DPDK flow
> > offload.
> >
> > --yliu
> > >
> > >
> > >
> > >
> > >
> > > Regards
> > > _Sugesh
> > >
> > >
> > > > -----Original Message-----
> > > > From: Finn Christensen [mailto:[email protected]]
> > > > Sent: Friday, December 22, 2017 11:29 AM
> > > > To: Yuanhan Liu <[email protected]>; [email protected]
> > > > Cc: Darrell Ball <[email protected]>; Chandran, Sugesh
> > > > <[email protected]>; Simon Horman
> > > > <[email protected]>; Stokes, Ian <[email protected]>
> > > > Subject: RE: [PATCH v5 0/5] OVS-DPDK flow offload with rte_flow
> > > >
> > > > Hi,
> > > >
> > > > The addition of the offload thread is an great addition,
> > > > furthermore, RSS action usage for target queues, works now well with
> > > > our NICs, I just had to do some additional drive work, to make it all
> > > > work.
> > > >
> > > > The support for RSS removed the problem with rte-flow queue
> > > > selection, but RSS in general added an additional challenge as now
> > > > multiple pmd's may request the same flow offload (as Yuanhan pointed
> > > > out). The solution has worked without issues for me. I have run
> > > > tests with 1 and 2 queues in a PV and PVP setup going over virtio.
> > > > Yuanhan mentioned tests in a P2P setup with high throughput
> > > > acceleration. I have focused on a PVP scenario, which, in addition,
> > > > will crosses virtio, and most importantly, has acceleration in Rx
> > > > direction only (seen from VM). It still gives fairly good performance
> > improvements.
> > > >
> > > > Here are my initial test results.
> > > >
> > > > Percentage improvements:
> > > > 1 RX/Tx queue:
> > > > 1 megaflow - 10K flowsPV: 73%
> > > > 10 megaflows - 100K flowsPVP: 50%PV: 56%
> > > > 100 megaflows - 1M flowsPVP: 42%PV: 49%
> > > > 1000 megaflows - 10M flowsPVP: 40%PV: 42%
> > > >
> > > > With RSS: 2 Rx/Tx queue pairs:
> > > > 1 megaflow - 10K flowsPV: 55%
> > > > 10 megaflows - 100K flowsPVP: 36%PV: 38%
> > > > 100 megaflows - 1M flowsPVP: 27%PV: 24%
> > > > 1000 megaflows - 10M flowsPVP: 21%PV: 24%
> > > >
> > > > PVP for 1 megaflow offload is omitted because my VM-app imposed a
> > > > bottle- neck.
> > > >
> > > > Regards,
> > > > Finn
> > > >
> > > > -----Original Message-----
> > > > From: Yuanhan Liu [mailto:[email protected]]
> > > > Sent: 20. december 2017 15:45
> > > > To: [email protected]
> > > > Cc: Finn Christensen <[email protected]>; Darrell Ball
> > > > <[email protected]>; Chandran Sugesh <[email protected]>;
> > > > Simon Horman <[email protected]>; Ian Stokes
> > > > <[email protected]>; Yuanhan Liu <[email protected]>
> > > > Subject: [PATCH v5 0/5] OVS-DPDK flow offload with rte_flow
> > > >
> > > > Hi,
> > > >
> > > > Here is a joint work from Mellanox and Napatech, to enable the flow
> > > > hw
> > > > offload with the DPDK generic flow interface (rte_flow).
> > > >
> > > > The basic idea is to associate the flow with a mark id (a unit32_t
> > > > number).
> > > > Later, we then get the flow directly from the mark id, which could
> > > > bypass
> > > > some heavy CPU operations, including but not limiting to mini flow
> > extract,
> > > > emc lookup, dpcls lookup, etc.
> > > >
> > > > The association is done with CMAP in patch 1. The CPU workload
> > bypassing
> > > > is done in patch 2. The flow offload is done in patch 3, which
> > > > mainly does
> > > > two things:
> > > >
> > > > - translate the ovs match to DPDK rte flow patterns
> > > > - bind those patterns with a RSS + MARK action.
> > > >
> > > > Patch 5 makes the offload work happen in another thread, for
> > > > leaving the
> > > > datapath as light as possible.
> > > >
> > > > A PHY-PHY forwarding with 1000 mega flows (udp,tp_src=1000-1999) and
> > 1
> > > > million streams (tp_src=1000-1999, tp_dst=2000-2999) show more than
> > > > 260% performance boost.
> > > >
> > > > Note that it's disabled by default, which can be enabled by:
> > > >
> > > > $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> > > >
> > > >
> > > > v5: - fixed an issue that it took too long if we do flow add/remove
> > > > repeatedly.
> > > > - removed an unused mutex lock
> > > > - turned most of the log level to DBG
> > > > - rebased on top of the latest code
> > > >
> > > > v4: - use RSS action instead of QUEUE action with MARK
> > > > - make it work with multiple queue (see patch 1)
> > > > - rebased on top of latest code
> > > >
> > > > v3: - The mark and id association is done with array instead of
> > > > CMAP.
> > > > - Added a thread to do hw offload operations
> > > > - Removed macros completely
> > > > - dropped the patch to set FDIR_CONF, which is a workround some
> > > > Intel NICs.
> > > > - Added a debug patch to show all flow patterns we have created.
> > > > - Misc fixes
> > > >
> > > > v2: - workaround the queue action issue
> > > > - fixed the tcp_flags being skipped issue, which also fixed the
> > > > build warnings
> > > > - fixed l2 patterns for Intel nic
> > > > - Converted some macros to functions
> > > > - did not hardcode the max number of flow/action
> > > > - rebased on top of the latest code
> > > >
> > > > Thanks.
> > > >
> > > > --yliu
> > > >
> > > >
> > > > ---
> > > > Finn Christensen (1):
> > > > netdev-dpdk: implement flow offload with rte flow
> > > >
> > > > Yuanhan Liu (4):
> > > > dpif-netdev: associate flow with a mark id
> > > > dpif-netdev: retrieve flow directly from the flow mark
> > > > netdev-dpdk: add debug for rte flow patterns
> > > > dpif-netdev: do hw flow offload in a thread
> > > >
> > > > lib/dp-packet.h | 13 +
> > > > lib/dpif-netdev.c | 473 ++++++++++++++++++++++++++++++++++-
> > > > lib/flow.c | 155 +++++++++---
> > > > lib/flow.h | 1 +
> > > > lib/netdev-dpdk.c | 732
> > > > +++++++++++++++++++++++++++++++++++++++++++++++++++++-
> > > > lib/netdev.h | 6 +
> > > > 6 files changed, 1341 insertions(+), 39 deletions(-)
> > > >
> > > > --
> > > > 2.7.4
> > > >
> > > > Disclaimer: This email and any files transmitted with it may contain
> > > > confidential information intended for the addressee(s) only. The
> > > > information is not to be surrendered or copied to unauthorized
> > > > persons. If you have received this communication in error, please
> > > > notify the sender immediately and delete this e- mail from your system.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev