Re: [ovs-dev] [PATCH v1 1/3] dpif-netdev: Move port flush after datapath reconfiguration

2022-02-08 Thread Sriharsha Basavapatna via dev
On Fri, Feb 4, 2022 at 9:43 PM Gaetan Rivet  wrote:

> Port flush and offload uninit should be moved after the datapath
> has been reconfigured. That way, no other thread, such as a PMD,
> will find this port to poll or enqueue further offload requests.
>
> After a flush, almost no further offload request for this port should
> be found in the queue.
>
> There will still be some issued by revalidators, but they
> will be catched when the offload thread fails to take a netdev ref.
>

catched --> caught

>
> This change fixes the issue of the datapath reference being
> improperly accessed by offload threads while the datapath is
> being destroyed.
>
> Fixes: 5b0aa55776cb ("dpif-netdev: Execute flush from offload thread.")
> Fixes: 62d1c28e9ce0 ("dpif-netdev: Flush offload rules upon port
> deletion.")
> Signed-off-by: Ilya Maximets 
> Signed-off-by: Gaetan Rivet 
> ---
>  lib/dpif-netdev.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index e28e0b554..b5702e6a1 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -2313,13 +2313,22 @@ static void
>  do_del_port(struct dp_netdev *dp, struct dp_netdev_port *port)
>      OVS_REQ_WRLOCK(dp->port_rwlock)
>  {
> -    dp_netdev_offload_flush(dp, port);
> -    netdev_uninit_flow_api(port->netdev);
>      hmap_remove(&dp->ports, &port->node);
>      seq_change(dp->port_seq);
>
>      reconfigure_datapath(dp);
>
> +    /* Flush and disable offloads only after 'port' has been made
> +     * inaccessible through datapath reconfiguration.
> +     * This prevents having PMDs enqueuing offload requests after
> +     * the flush.
> +     * When only this port is deleted instead of the whole datapath,
> +     * revalidator threads are still active and can still enqueue
> +     * offload modification or deletion. Managing those stray requests
> +     * is done in the offload threads. */
> +    dp_netdev_offload_flush(dp, port);
> +    netdev_uninit_flow_api(port->netdev);
> +
>      port_destroy(port);
>  }
>
> --
> 2.31.1
>

 Acked-by: Sriharsha Basavapatna 

Thanks,
-Harsha


-- 
This electronic communication and the information and any files transmitted 
with it, or attached to it, are confidential and are intended solely for 
the use of the individual or entity to whom it is addressed and may contain 
information that is confidential, legally privileged, protected by privacy 
laws, or otherwise restricted from disclosure to anyone else. If you are 
not the intended recipient or the person responsible for delivering the 
e-mail to the intended recipient, you are hereby notified that any use, 
copying, distributing, dissemination, forwarding, printing, or copying of 
this e-mail is strictly prohibited. If you received this e-mail in error, 
please return the e-mail to the sender, delete it from your computer, and 
destroy any printed copy of it.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-02-02 Thread Sriharsha Basavapatna via dev
On Wed, Feb 2, 2022 at 9:22 PM Ilya Maximets  wrote:
>
> On 2/1/22 16:07, Gaëtan Rivet wrote:
> > On Tue, Feb 1, 2022, at 15:00, Ilya Maximets wrote:
> >> On 2/1/22 14:38, Gaëtan Rivet wrote:
> >>> On Tue, Feb 1, 2022, at 12:48, Ilya Maximets wrote:
> >>>> On 2/1/22 12:38, Gaëtan Rivet wrote:
> >>>>> On Mon, Jan 31, 2022, at 20:33, Ilya Maximets wrote:
> >>>>>> On 1/31/22 11:42, Gaëtan Rivet wrote:
> >>>>>>> On Thu, Jan 27, 2022, at 11:38, Ilya Maximets wrote:
> >>>>>>>> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
> >>>>>>>>> In dp_netdev_pmd_remove_flow() we schedule the deletion of an
> >>>>>>>>> offloaded flow, if a mark has been assigned to the flow. But if
> >>>>>>>>> this occurs in the window in which the offload thread completes
> >>>>>>>>> offloading the flow and assigns a mark to the flow, then we miss
> >>>>>>>>> deleting the flow. This problem has been observed while adding
> >>>>>>>>> and deleting flows in a loop. To fix this, always enqueue flow
> >>>>>>>>> deletion regardless of the flow->mark being set.
> >>>>>>>>>
> >>>>>>>>> Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
> >>>>>>>>> Signed-off-by: Sriharsha Basavapatna 
> >>>>>>>>> 
> >>>>>>>>> ---
> >>>>>>>>>  lib/dpif-netdev.c | 4 +---
> >>>>>>>>>  1 file changed, 1 insertion(+), 3 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >>>>>>>>> index e28e0b554..22c5f19a8 100644
> >>>>>>>>> --- a/lib/dpif-netdev.c
> >>>>>>>>> +++ b/lib/dpif-netdev.c
> >>>>>>>>> @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
> >>>>>>>>>      dp_netdev_simple_match_remove(pmd, flow);
> >>>>>>>>>      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
> >>>>>>>>>      ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
> >>>>>>>>> -    if (flow->mark != INVALID_FLOW_MARK) {
> >>>>>>>>> -        queue_netdev_flow_del(pmd, flow);
> >>>>>>>>> -    }
> >>>>>>>>> +    queue_netdev_flow_del(pmd, flow);
> >>>>>>>>>      flow->dead = true;
> >>>>>>>>>
> >>>>>>>>>      dp_netdev_flow_unref(flow);
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks for the patch, but it looks like that it's not that simple...
> >>>>>>>> A lot of tests are crashing in GHA and ASAN reports use-after-free
> >>>>>>>> on flow disassociation if the dpif already destroyed:
> >>>>>>>>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
> >>>>>>>>
> >>>>>>>> I think that the problem is that offload thread holds the ref count
> >>>>>>>> for PMD thread, but PMD thread structure doesn't hold the ref count
> >>>>>>>> for the dp, because it doesn't expect that dp_netdev_pmd structure
> >>>>>>>> will be used while PMD thread is destroyed.  I guess, we should fix
> >>>>>>>> that.  OTOH, I'm not sure if offload thread actually needs a 
> >>>>>>>> reference
> >>>>>>>> to dp_netdev_pmd structure.  If I didn't miss something, it only
> >>>>>>>> uses pmd pointer to access pmd->dp.  So, maybe, 
> >>>>>>>> dp_offload_thread_item
> >>>>>>>> should hold and ref the dp pointer instead?
> >>>>>>>>
> >>>>>>>> What do you think?  Gaetan?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Hi Ilya,
> >>>>>>>
> >>>>>>> I've been looking into this issue, you are right that the o

Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-01-27 Thread Sriharsha Basavapatna via dev
On Thu, Jan 27, 2022 at 4:08 PM Ilya Maximets  wrote:
>
> On 1/27/22 07:42, Sriharsha Basavapatna via dev wrote:
> > In dp_netdev_pmd_remove_flow() we schedule the deletion of an
> > offloaded flow, if a mark has been assigned to the flow. But if
> > this occurs in the window in which the offload thread completes
> > offloading the flow and assigns a mark to the flow, then we miss
> > deleting the flow. This problem has been observed while adding
> > and deleting flows in a loop. To fix this, always enqueue flow
> > deletion regardless of the flow->mark being set.
> >
> > Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
> > Signed-off-by: Sriharsha Basavapatna 
> > ---
> >  lib/dpif-netdev.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index e28e0b554..22c5f19a8 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
> >      dp_netdev_simple_match_remove(pmd, flow);
> >      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
> >      ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
> > -    if (flow->mark != INVALID_FLOW_MARK) {
> > -        queue_netdev_flow_del(pmd, flow);
> > -    }
> > +    queue_netdev_flow_del(pmd, flow);
> >      flow->dead = true;
> >
> >      dp_netdev_flow_unref(flow);
> >
>
> Thanks for the patch, but it looks like that it's not that simple...
> A lot of tests are crashing in GHA and ASAN reports use-after-free
> on flow disassociation if the dpif already destroyed:
>   https://github.com/ovsrobot/ovs/actions/runs/1754866321
>
> I think that the problem is that offload thread holds the ref count
> for PMD thread, but PMD thread structure doesn't hold the ref count
> for the dp, because it doesn't expect that dp_netdev_pmd structure
> will be used while PMD thread is destroyed.  I guess, we should fix
> that.  OTOH, I'm not sure if offload thread actually needs a reference
> to dp_netdev_pmd structure.  If I didn't miss something, it only
> uses pmd pointer to access pmd->dp.  So, maybe, dp_offload_thread_item
> should hold and ref the dp pointer instead?
>
> What do you think?  Gaetan?
>
> Another point is that queue_netdev_flow_del() should check for
> netdev_is_flow_api_enabled() to avoid creation of offload threads
> if offloading is disabled.  But that's good that we didn't have it,
> as the refcount issue got exposed.  Otherwise it would be hard
> to reproduce.
>
> Best regards, Ilya Maximets.
>
> Asan report below, for convenience:
>
> =
> ==17076==ERROR: AddressSanitizer: heap-use-after-free on address 
> 0x61603080 at pc 0x005e0e38 bp 0x7fe0594309f0 sp 0x7fe0594309e8
> READ of size 8 at 0x61603080 thread T5 (hw_offload4)
> #0 0x5e0e37 in mark_to_flow_disassociate 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2556:62
> #1 0x5dfaf1 in dp_offload_flow 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2821:15
> #2 0x5df8bd in dp_netdev_flow_offload_main 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2884:17
> #3 0x73a2fc in ovsthread_wrapper 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/ovs-thread.c:422:12
> #4 0x7fe0619506da in start_thread 
> (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
> #5 0x7fe060ecf71e in clone (/lib/x86_64-linux-gnu/libc.so.6+0x12171e)
>
> 0x61603080 is located 0 bytes inside of 576-byte region 
> [0x61603080,0x616032c0)
> freed by thread T0 here:
> #0 0x49640d in free 
> (/home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/vswitchd/ovs-vswitchd+0x49640d)
> #1 0x5d6652 in dp_netdev_free 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:1980:5
> #2 0x5f0722 in dp_netdev_unref 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:1991:13
> #3 0x5cf5e5 in dpif_netdev_close 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif-netdev.c:2002:5
> #4 0x5fc393 in dpif_uninit 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:1725:9
> #5 0x5fc0c0 in dpif_close 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../lib/dpif.c:457:9
> #6 0x52a972 in close_dpif_backer 
> /home/runner/work/ovs/ovs/openvswitch-2.17.90/_build/sub/../../ofproto/ofproto-dpif.c:715:5
> #7

Re: [ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-01-27 Thread Sriharsha Basavapatna via dev
On Thu, Jan 27, 2022 at 2:45 PM Gaëtan Rivet  wrote:
>
> On Thu, Jan 27, 2022, at 07:42, Sriharsha Basavapatna via dev wrote:
> > In dp_netdev_pmd_remove_flow() we schedule the deletion of an
> > offloaded flow, if a mark has been assigned to the flow. But if
> > this occurs in the window in which the offload thread completes
> > offloading the flow and assigns a mark to the flow, then we miss
> > deleting the flow. This problem has been observed while adding
> > and deleting flows in a loop. To fix this, always enqueue flow
> > deletion regardless of the flow->mark being set.
> >
> > Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
> > Signed-off-by: Sriharsha Basavapatna 
> > ---
> >  lib/dpif-netdev.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index e28e0b554..22c5f19a8 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
> >      dp_netdev_simple_match_remove(pmd, flow);
> >      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
> >      ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
> > -    if (flow->mark != INVALID_FLOW_MARK) {
> > -        queue_netdev_flow_del(pmd, flow);
> > -    }
> > +    queue_netdev_flow_del(pmd, flow);
>
> Hi Sriharsha,
>
> It makes sense, thanks for the patch.
> Additionally, the mark is written in one thread and read in another, but
> the access is not atomic. Without a fence, the update can take time to
> become visible, making the described issue more likely on some archs.

Right, good point.
>
> Acked-by: Gaetan Rivet 

Thanks Gaetan.
-Harsha
>
> --
> Gaetan Rivet



[ovs-dev] [PATCH] dpif-netdev: Fix a race condition in deletion of offloaded flows

2022-01-26 Thread Sriharsha Basavapatna via dev
In dp_netdev_pmd_remove_flow() we schedule the deletion of an
offloaded flow, if a mark has been assigned to the flow. But if
this occurs in the window in which the offload thread completes
offloading the flow and assigns a mark to the flow, then we miss
deleting the flow. This problem has been observed while adding
and deleting flows in a loop. To fix this, always enqueue flow
deletion regardless of the flow->mark being set.

Fixes: 241bad15d99a("dpif-netdev: associate flow with a mark id")
Signed-off-by: Sriharsha Basavapatna 
---
 lib/dpif-netdev.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e28e0b554..22c5f19a8 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3029,9 +3029,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
     dp_netdev_simple_match_remove(pmd, flow);
     cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
     ccmap_dec(&pmd->n_flows, odp_to_u32(in_port));
-    if (flow->mark != INVALID_FLOW_MARK) {
-        queue_netdev_flow_del(pmd, flow);
-    }
+    queue_netdev_flow_del(pmd, flow);
     flow->dead = true;

     dp_netdev_flow_unref(flow);
-- 
2.30.0.349.g30b29f044a




Re: [ovs-dev] [PATCH v5 00/27] dpif-netdev: Parallel offload processing

2022-01-26 Thread Sriharsha Basavapatna via dev
On Wed, Jan 26, 2022 at 3:26 AM Ilya Maximets  wrote:
>
> On 1/23/22 17:28, Sriharsha Basavapatna wrote:
> > Hi Ilya,
> >
> > On Wed, Jan 19, 2022 at 7:24 AM Ilya Maximets  wrote:
> > 
> >>
> >> I also spotted one bug where the flow stays offloaded if it gets added
> >> and removed shortly after that.  But it's an old existing race condition,
> >> not introduced by this patch set.  We basically have to enqueue the
> >> flow deletion regardless of the flow->mark being set.
> >> I'll send a patch for that issue, probably, in the next couple of days.
> >> Or if you want to work on that, that's fine for me too.
> >
> > We are seeing a similar issue with OVS-2.16. While adding and deleting
> > flows in a loop, after a few iterations, rte_flow_destroy() is not
> > seen by the PMD for some of the offloaded flows. And those flows stay
> > offloaded in HW.  I tried the following change in 2.16 like you
> > suggested above and it resolved the issue.
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index d6bee2a5a..8cca57f1f 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -2758,9 +2758,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
> >      ovs_assert(cls != NULL);
> >      dpcls_remove(cls, &flow->cr);
> >      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
> > -    if (flow->mark != INVALID_FLOW_MARK) {
> > -        queue_netdev_flow_del(pmd, flow);
> > -    }
> > +    queue_netdev_flow_del(pmd, flow);
> >      flow->dead = true;
> >
> >      dp_netdev_flow_unref(flow);
>
> Yeah, I was thinking about something similar.  Would you mind sending an
> official patch?
>
> Best regards, Ilya Maximets.

Sure, I will send it.
Thanks,
-Harsha



Re: [ovs-dev] [PATCH v5 00/27] dpif-netdev: Parallel offload processing

2022-01-23 Thread Sriharsha Basavapatna via dev
Hi Ilya,

On Wed, Jan 19, 2022 at 7:24 AM Ilya Maximets  wrote:
>
> On 9/15/21 15:45, Maxime Coquelin wrote:
> > Hi,
> >
> > On 9/8/21 11:47 AM, Gaetan Rivet wrote:
> >> This patch series aims to improve the performance of the management
> >> of hw-offloads in dpif-netdev. In the current version, some setup
> >> will experience high memory usage and poor latency between a flow
> >> decision and its execution regarding hardware offloading.
> >>
> >> This series starts by measuring key metrics regarding both issues.
> >> Those patches are introduced first to compare the current status
> >> with each improvement introduced.
> >> The number of offloads enqueued and inserted, as well as the latency
> >> from queue insertion to hardware insertion, is measured. A new
> >> command 'ovs-appctl dpctl/offload-stats-show' is introduced
> >> to show the current measurements.
> >>
> >> In my current performance test setup I am measuring an
> >> average latency hovering between 1~2 seconds.
> >> After the optimizations, it is reduced to 500~900 ms.
> >> Finally when using multiple threads and with proper driver
> >> support[1], it is measured in the order of 1 ms.
> >>
> >> A few modules are introduced:
> >>
> >>   * An ID pool with reduced capabilities, simplifying its
> >> operations and allowing better performances in both
> >> single and multi-thread setup.
> >>
> >>   * A lockless queue between PMDs / revalidators and
> >> offload thread(s). As the number of PMDs increases,
> >> contention can be high on the shared queue.
> >> This queue is designed to serve as message queue
> >> between threads.
> >>
> >>   * A bounded lockless MPMC ring and some helpers for
> >> calculating moving averages.
> >>
> >>   * A moving average module for Cumulative and Exponential
> >> moving averages.
> >>
> >> The netdev-offload-dpdk module is made thread-safe.
> >> Internal maps are made per-netdev instead, and locks are
> >> taken for shorter critical sections within the module.
> >>
> >> CI result: https://github.com/grivet/ovs/actions/runs/554918929
> >>
> >> [1]: The rte_flow API was made thread-safe in the 20.11 DPDK
> >>  release. Drivers that do not implement those operations
> >>  concurrently are protected by a lock. Others will
> >>  allow better concurrency, that improve the result
> >>  of this series.
> >>
> >> v2:
> >>
> >>   * Improved the MPSC queue API to simplify usage.
> >>
> >>   * Moved flush operation from initiator thread to offload
> >> thread(s). This ensures offload metadata are shared only
> >> among the offload thread pool.
> >>
> >>   * Flush operation needs additional thread synchronization.
> >> The ovs_barrier currently triggers a UAF. Add a unit-test to
> >> validate its operations and a fix for the UAF.
> >>
> >> CI result: https://github.com/grivet/ovs/actions/runs/741430135
> >>The error comes from a failure to download 'automake' on
> >>osx, unrelated to any change in this series.
> >>
> >> v3:
> >>
> >>   * Re-ordered commits so fixes are first. No conflict seen currently,
> >> but it might prevent them if some requested changes to the series
> >> were to move code in the same parts.
> >>
> >>   * Modified the reduced quiescing of the thread to use ovsrcu_quiesce(),
> >> and base next_rcu on the current time value (after quiescing happened,
> >> however long it takes).
> >>
> >>   * Added Reviewed-by tags to the relevant commits.
> >>
> >> CI result: https://github.com/grivet/ovs/actions/runs/782655601
> >>
> >> v4:
> >>
> >>   * Modified the seq-pool to use batches of IDs with a spinlock
> >> instead of lockless rings.
> >>
> >>   * The llring structure is removed.
> >>
> >>   * Due to the length of the changes to the structure, some
> >> acked-by or reviewed-by were not ported to the id-fpool patch.
> >>
> >> CI result: https://github.com/grivet/ovs/actions/runs/921095015
> >>
> >> v5:
> >>
> >>   * Rebase on master.
> >> Conflicts were seen related to the vxlan-decap and pmd rebalance
> >> series.
> >>
> >>   * Fix typo in xchg patch spotted by Maxime Coquelin.
> >>
> >>   * Added Reviewed-by Maxime Coquelin on 4 patches.
> >
> >
> > I went through the changes between v4 and v5, and would like to confirm
> > these are minor and OK to me.
> >
> > Regards,
> > Maxime
> >
> >> CI result: https://github.com/grivet/ovs/actions/runs/1212804378
> >>
> >> Gaetan Rivet (27):
> >>   ovs-thread: Fix barrier use-after-free
> >>   dpif-netdev: Rename flow offload thread
> >>   tests: Add ovs-barrier unit test
> >>   netdev: Add flow API uninit function
> >>   netdev-offload-dpdk: Use per-netdev offload metadata
> >>   netdev-offload-dpdk: Implement hw-offload statistics read
> >>   dpctl: Add function to read hardware offload statistics
> >>   dpif-netdev: Rename offload thread structure
> >>   mov-avg: Add a moving average helper structure
> >>   dpif-netdev: Implement hardware offloads stats query
> >>   

Re: [ovs-dev] [PATCH v3] dpif-netdev: Forwarding optimization for flows with a simple match.

2021-12-21 Thread Sriharsha Basavapatna via dev
On Tue, Nov 2, 2021 at 1:48 AM Ilya Maximets  wrote:
>
> On 10/30/21 09:04, Sriharsha Basavapatna wrote:
> > On Mon, Aug 9, 2021 at 6:28 PM Ilya Maximets  wrote:
> >>
> >> There are cases where users might want simple forwarding or drop rules
> >> for all packets received from a specific port, e.g ::
> >>
> >>   "in_port=1,actions=2"
> >>   "in_port=2,actions=IN_PORT"
> >>   "in_port=3,vlan_tci=0x1234/0x1fff,actions=drop"
> >>   "in_port=4,actions=push_vlan:0x8100,set_field:4196->vlan_vid,output:3"
> >>
> >> There are also cases where complex OpenFlow rules can be simplified
> >> down to datapath flows with very simple match criteria.
> >>
> >> In theory, for very simple forwarding, OVS doesn't need to parse
> >> packets at all in order to follow these rules.  "Simple match" lookup
> >> optimization is intended to speed up packet forwarding in these cases.
> >>
> >> Design:
> >>
> >> Due to various implementation constraints userspace datapath has
> >> following flow fields always in exact match (i.e. it's required to
> >> match at least these fields of a packet even if the OF rule doesn't
> >> need that):
> >>
> >>   - recirc_id
> >>   - in_port
> >>   - packet_type
> >>   - dl_type
> >>   - vlan_tci (CFI + VID) - in most cases
> >>   - nw_frag - for ip packets
> >>
> >> Not all of these fields are related to packet itself.  We already
> >> know the current 'recirc_id' and the 'in_port' before starting the
> >> packet processing.  It also seems safe to assume that we're working
> >> with Ethernet packets.  So, for the simple OF rule we need to match
> >> only on 'dl_type', 'vlan_tci' and 'nw_frag'.
> >>
> >> 'in_port', 'dl_type', 'nw_frag' and 13 bits of 'vlan_tci' can be
> >> combined in a single 64bit integer (mark) that can be used as a
> >> hash in a hash map.  We are using only VID and CFI from the 'vlan_tci',
> >> flows that need to match on PCP will not qualify for the optimization.
> >> Workaround for matching on non-existence of vlan updated to match on
> >> CFI and VID only in order to qualify for the optimization.  CFI is
> >> always set by OVS if vlan is present in a packet, so there is no need
> >> to match on PCP in this case.  'nw_frag' takes 2 bits of PCP inside
> >> the simple match mark.
> >>
> >> New per-PMD flow table 'simple_match_table' introduced to store
> >> simple match flows only.  'dp_netdev_flow_add' adds flow to the
> >> usual 'flow_table' and to the 'simple_match_table' if the flow
> >> meets following constraints:
> >>
> >>   - 'recirc_id' in flow match is 0.
> >>   - 'packet_type' in flow match is Ethernet.
> >>   - Flow wildcards contains only minimal set of non-wildcarded fields
> >> (listed above).
> >>
> >> If the number of flows for current 'in_port' in a regular 'flow_table'
> >> equals number of flows for current 'in_port' in a 'simple_match_table',
> >> we may use simple match optimization, because all the flows we have
> >> are simple match flows.  This means that we only need to parse
> >> 'dl_type', 'vlan_tci' and 'nw_frag' to perform packet matching.
> >> Now we make the unique flow mark from the 'in_port', 'dl_type',
> >> 'nw_frag' and 'vlan_tci' and looking for it in the 'simple_match_table'.
> >> On successful lookup we don't need to run full 'miniflow_extract()'.
> >>
> >> Unsuccessful lookup technically means that we have no suitable flow
> >> in the datapath and upcall will be required.  So, in this case EMC and
> >> SMC lookups are disabled.  We may optimize this path in the future by
> >> bypassing the dpcls lookup too.
> >>
> >> Performance improvement of this solution on a 'simple match' flows
> >> should be comparable with partial HW offloading, because it parses same
> >> packet fields and uses similar flow lookup scheme.
> >> However, unlike partial HW offloading, it works for all port types
> >> including virtual ones.
> >>
> >> Performance results when compared to EMC:
> >>
> >> Test setup:
> >>
> >>  virtio-user      OVS      virtio-user
> >>   Testpmd1  -->  pmd1  -->  Testpmd2
> >>   (txonly)  x<--  pmd2  <--  (mac swap)
> >>
> >> Single stream of 64byte packets.  Actions:
> >>   in_port=vhost0,actions=vhost1
> >>   in_port=vhost1,actions=vhost0
> >>
> >> Stats collected from pmd1 and pmd2, so there are 2 scenarios:
> >> Virt-to-Virt   : Testpmd1 --> pmd1 --> Testpmd2.
> >> Virt-to-NoCopy : Testpmd2 --> pmd2 --->x   Testpmd1.
> >> Here the packet sent from pmd2 to Testpmd1 is always dropped, because
> >> the virtqueue is full since Testpmd1 is in txonly mode and doesn't
> >> receive any packets.  This should be closer to the performance of a
> >> VM-to-Phy scenario.
> >>
> >> Test performed on machine with Intel Xeon CPU E5-2690 v4 @ 2.60GHz.
> >> Table below represents improvement in throughput when compared to EMC.
> >>
> >>  ++++
> >>  ||Default (-g -O2)| "-Ofast -march=native" |
> >>  | 

[ovs-dev] [PATCH v2] dpif-netdev: avoid hw_miss_packet_recover() for devices with no support

2021-12-12 Thread Sriharsha Basavapatna via dev
The hw_miss_packet_recover() API results in performance degradation for
ports that are either not offload capable or do not support this
specific offload API.

For example, in the test configuration shown below, the vhost-user port
does not support offloads and the VF port doesn't support hw_miss offload
API. But because tunnel offload needs to be configured in other bridges
(br-vxlan and br-phy), OVS has been built with -DALLOW_EXPERIMENTAL_API.

    br-vhost           br-vxlan          br-phy
vhost-user<-->VF    VF-Rep<-->VxLAN    uplink-port

For every packet between the VF and the vhost-user ports, hw_miss API is
called even though it is not supported by the ports involved. This leads
to significant performance drop (~3x in some cases; both cycles and pps).

Return EOPNOTSUPP when this API fails for a device that doesn't support it
and avoid this API on that port for subsequent packets.

Signed-off-by: Sriharsha Basavapatna 
---

v2: 
Rebased changes to commit: 20a4f546f7db; updated the logic to
save 'hw_miss_api_supported' variable in struct dp_netdev_rxq.

---
 lib/dpif-netdev.c | 18 +-
 lib/netdev-offload-dpdk.c |  8 ++--
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index a790df5fd..dbe3f427b 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -377,6 +377,7 @@ struct dp_netdev_rxq {
     unsigned intrvl_idx;   /* Write index for 'cycles_intrvl'. */
     struct dp_netdev_pmd_thread *pmd;  /* pmd thread that polls this queue. */
     bool is_vhost; /* Is rxq of a vhost port. */
+    bool hw_miss_api_supported;    /* hw_miss_packet_recover() supported. */

     /* Counters of cycles spent successfully polling and processing pkts. */
     atomic_ullong cycles[RXQ_N_CYCLES];
@@ -4790,6 +4791,7 @@ port_reconfigure(struct dp_netdev_port *port)

     port->rxqs[i].port = port;
     port->rxqs[i].is_vhost = !strncmp(port->type, "dpdkvhost", 9);
+    port->rxqs[i].hw_miss_api_supported = true;

     err = netdev_rxq_open(netdev, &port->rxqs[i].rx, i);
     if (err) {
@@ -7332,12 +7334,18 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
 #ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
     /* Restore the packet if HW processing was terminated before completion. */
     struct dp_netdev_rxq *rxq = pmd->ctx.last_rxq;
-    int err;

-    err = netdev_hw_miss_packet_recover(rxq->port->netdev, packet);
-    if (err && err != EOPNOTSUPP) {
-        COVERAGE_INC(datapath_drop_hw_miss_recover);
-        return -1;
+    if (rxq->hw_miss_api_supported) {
+        int err = netdev_hw_miss_packet_recover(rxq->port->netdev, packet);
+        if (err) {
+            if (err != EOPNOTSUPP) {
+                COVERAGE_INC(datapath_drop_hw_miss_recover);
+                return -1;
+            } else {
+                /* API unsupported by the port; avoid subsequent calls. */
+                rxq->hw_miss_api_supported = false;
+            }
+        }
     }
 #endif
 
diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index 9fee7570a..b335572cd 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -2292,11 +2292,15 @@ netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev,
     odp_port_t vport_odp;
     int ret = 0;

-    if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet,
-                                              &rte_restore_info, NULL)) {
+    ret = netdev_dpdk_rte_flow_get_restore_info(netdev, packet,
+                                                &rte_restore_info, NULL);
+    if (ret) {
         /* This function is called for every packet, and in most cases there
          * will be no restore info from the HW, thus error is expected.
          */
+        if (ret == -EOPNOTSUPP) {
+            return -ret;
+        }
         return 0;
     }
 
-- 
2.30.0.349.g30b29f044a


-- 
This electronic communication and the information and any files transmitted 
with it, or attached to it, are confidential and are intended solely for 
the use of the individual or entity to whom it is addressed and may contain 
information that is confidential, legally privileged, protected by privacy 
laws, or otherwise restricted from disclosure to anyone else. If you are 
not the intended recipient or the person responsible for delivering the 
e-mail to the intended recipient, you are hereby notified that any use, 
copying, distributing, dissemination, forwarding, printing, or copying of 
this e-mail is strictly prohibited. If you received this e-mail in error, 
please return the e-mail to the sender, delete it from your computer, and 
destroy any printed copy of it.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpif-netdev: avoid hw_miss_packet_recover() for devices with no support

2021-11-22 Thread Sriharsha Basavapatna via dev
Hi Eli,

On Sun, Nov 21, 2021 at 12:03 PM Eli Britstein via dev wrote:
>
> Hi Harsha,
>
> It's a clever idea, though it has some problems in the implementation. PSB.

Thanks, please see my response below.
>
>
> On 11/20/2021 11:20 AM, Sriharsha Basavapatna wrote:
> > The hw_miss_packet_recover() API results in performance degradation, for
> > ports that are either not offload capable or do not support this specific
> > offload API.
> >
> > For example, in the test configuration shown below, the vhost-user port
> > does not support offloads and the VF port doesn't support hw_miss offload
> > API. But because tunnel offload needs to be configured in other bridges
> > (br-vxlan and br-phy), OVS has been built with -DALLOW_EXPERIMENTAL_API.
> >
> >     br-vhost             br-vxlan            br-phy
> > vhost-user<-->VF    VF-Rep<-->VxLAN     uplink-port
> >
> > For every packet between the VF and the vhost-user ports, hw_miss API is
> > called even though it is not supported by the ports involved. This leads
> > to significant performance drop (~3x in some cases; both cycles and pps).
> >
> > To fix this, return EOPNOTSUPP when this API fails for a device that
> "To fix" -> "To improve"
> > doesn't support it and avoid this API on that port for subsequent packets.
> >
> > Signed-off-by: Sriharsha Basavapatna 
> > ---
> >   lib/dpif-netdev-private.h |  2 +-
> >   lib/dpif-netdev.c | 29 +
> >   lib/netdev-offload-dpdk.c |  9 +++--
> >   3 files changed, 29 insertions(+), 11 deletions(-)
> >
> > diff --git a/lib/dpif-netdev-private.h b/lib/dpif-netdev-private.h
> > index 4593649bd..e2a6a9d3a 100644
> > --- a/lib/dpif-netdev-private.h
> > +++ b/lib/dpif-netdev-private.h
> > @@ -46,7 +46,7 @@ dp_netdev_batch_execute(struct dp_netdev_pmd_thread *pmd,
> >
> >   int
> >   dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
> > -  odp_port_t port_no,
> > +  void *port,
>
> void * -> struct tx_port *. use a forward declaration.
>
> > struct dp_packet *packet,
> > struct dp_netdev_flow **flow);
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 69d7ec26e..207b1961c 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -434,6 +434,7 @@ struct tx_port {
> >   long long last_used;
> >   struct hmap_node node;
> >   long long flush_time;
> > +bool hw_miss_api_supported;
> >   struct dp_packet_batch output_pkts;
> >   struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
> >   };
> > @@ -6972,6 +6973,7 @@ dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread *pmd,
> >   tx->port = port;
> >   tx->qid = -1;
> >   tx->flush_time = 0LL;
> > +tx->hw_miss_api_supported = true;
> > +dp_packet_batch_init(&tx->output_pkts);
> >
> > hmap_insert(&pmd->tx_ports, &tx->node, hash_port_no(tx->port->port_no));
> > @@ -7327,22 +7329,28 @@ static struct tx_port * pmd_send_port_cache_lookup(
> >
> >   inline int
> >   dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
> > -  odp_port_t port_no OVS_UNUSED,
> > +  void *port,
> Don't omit OVS_UNUSED; it is needed when compiling without ALLOW_EXPERIMENTAL_API.
Ok.
> > struct dp_packet *packet,
> > struct dp_netdev_flow **flow)
> >   {
> > -struct tx_port *p OVS_UNUSED;
> > +struct tx_port *p = port;
> no need for this local variable, you get it from the function arguments
The declaration of dp_netdev_hw_flow() in dpif-netdev-private.h can't
see 'struct tx_port' since it is defined in dpif-netdev.c. So it needs
to be a void * argument.
> >   uint32_t mark;
> >
> >   #ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
> >   /* Restore the packet if HW processing was terminated before 
> > completion. */
> > -p = pmd_send_port_cache_lookup(pmd, port_no);
> > -if (OVS_LIKELY(p)) {
> > +if (OVS_LIKELY(p) && p->hw_miss_api_supported) {
> >   int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);
> >
> > -if (err && err != EOPNOTSUPP) {
> > -COVERAGE_INC(datapath_drop_hw_miss_recover);
> > -return -1;
> > +if (err) {
> > +if (err != EOPNOTSUPP) {
> > +COVERAGE_INC(datapath_drop_hw_miss_recover);
> > +return -1;
> > +} else {
> > +/* API unsupported by the port; avoid subsequent calls. */
> > +VLOG_DBG("hw_miss_api unsupported: port: %d",
> > + p->port->port_no);
> > +p->hw_miss_api_supported = false;
> > +}
> >   }
> >   }
> >   #endif
> > @@ -7394,6 +7402,11 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
> >   uint16_t tcp_flags;
> >   size_t map_cnt = 0;
> >   bool batch_enable = true;
> > +struct tx_port *port = NULL;
> > +
> > +#ifdef 

[ovs-dev] [PATCH] dpif-netdev: avoid hw_miss_packet_recover() for devices with no support

2021-11-20 Thread Sriharsha Basavapatna via dev
The hw_miss_packet_recover() API results in performance degradation, for
ports that are either not offload capable or do not support this specific
offload API.

For example, in the test configuration shown below, the vhost-user port
does not support offloads and the VF port doesn't support hw_miss offload
API. But because tunnel offload needs to be configured in other bridges
(br-vxlan and br-phy), OVS has been built with -DALLOW_EXPERIMENTAL_API.

    br-vhost             br-vxlan            br-phy
vhost-user<-->VF    VF-Rep<-->VxLAN     uplink-port

For every packet between the VF and the vhost-user ports, hw_miss API is
called even though it is not supported by the ports involved. This leads
to significant performance drop (~3x in some cases; both cycles and pps).

To fix this, return EOPNOTSUPP when this API fails for a device that
doesn't support it and avoid this API on that port for subsequent packets.

Signed-off-by: Sriharsha Basavapatna 
---
 lib/dpif-netdev-private.h |  2 +-
 lib/dpif-netdev.c | 29 +
 lib/netdev-offload-dpdk.c |  9 +++--
 3 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/lib/dpif-netdev-private.h b/lib/dpif-netdev-private.h
index 4593649bd..e2a6a9d3a 100644
--- a/lib/dpif-netdev-private.h
+++ b/lib/dpif-netdev-private.h
@@ -46,7 +46,7 @@ dp_netdev_batch_execute(struct dp_netdev_pmd_thread *pmd,
 
 int
 dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
-  odp_port_t port_no,
+  void *port,
   struct dp_packet *packet,
   struct dp_netdev_flow **flow);
 
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 69d7ec26e..207b1961c 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -434,6 +434,7 @@ struct tx_port {
 long long last_used;
 struct hmap_node node;
 long long flush_time;
+bool hw_miss_api_supported;
 struct dp_packet_batch output_pkts;
 struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
 };
@@ -6972,6 +6973,7 @@ dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread *pmd,
 tx->port = port;
 tx->qid = -1;
 tx->flush_time = 0LL;
+tx->hw_miss_api_supported = true;
 dp_packet_batch_init(&tx->output_pkts);
 
 hmap_insert(&pmd->tx_ports, &tx->node, hash_port_no(tx->port->port_no));
@@ -7327,22 +7329,28 @@ static struct tx_port * pmd_send_port_cache_lookup(
 
 inline int
 dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
-  odp_port_t port_no OVS_UNUSED,
+  void *port,
   struct dp_packet *packet,
   struct dp_netdev_flow **flow)
 {
-struct tx_port *p OVS_UNUSED;
+struct tx_port *p = port;
 uint32_t mark;
 
 #ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
 /* Restore the packet if HW processing was terminated before completion. */
-p = pmd_send_port_cache_lookup(pmd, port_no);
-if (OVS_LIKELY(p)) {
+if (OVS_LIKELY(p) && p->hw_miss_api_supported) {
 int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);
 
-if (err && err != EOPNOTSUPP) {
-COVERAGE_INC(datapath_drop_hw_miss_recover);
-return -1;
+if (err) {
+if (err != EOPNOTSUPP) {
+COVERAGE_INC(datapath_drop_hw_miss_recover);
+return -1;
+} else {
+/* API unsupported by the port; avoid subsequent calls. */
+VLOG_DBG("hw_miss_api unsupported: port: %d",
+ p->port->port_no);
+p->hw_miss_api_supported = false;
+}
 }
 }
 #endif
@@ -7394,6 +7402,11 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
 uint16_t tcp_flags;
 size_t map_cnt = 0;
 bool batch_enable = true;
+struct tx_port *port = NULL;
+
+#ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
+port = pmd_send_port_cache_lookup(pmd, port_no);
+#endif
 
 pmd_perf_update_counter(&pmd->perf_stats,
 md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV,
@@ -7420,7 +7433,7 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
 }
 
 if (netdev_flow_api && recirc_depth == 0) {
-if (OVS_UNLIKELY(dp_netdev_hw_flow(pmd, port_no, packet, &flow))) {
+if (OVS_UNLIKELY(dp_netdev_hw_flow(pmd, port, packet, &flow))) {
 /* Packet restoration failed and it was dropped, do not
  * continue processing.
  */
diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index 9fee7570a..8bd2e 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -2292,11 +2292,16 @@ netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev,
 odp_port_t vport_odp;
 int ret = 0;
 
-if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet,
-  &rte_restore_info, NULL)) {
+ret = 

Re: [ovs-dev] Regarding the performance issues reported during today's OVS+DPDK meeting.

2021-11-11 Thread Sriharsha Basavapatna via dev
On Thu, Nov 11, 2021 at 3:22 PM Sriharsha Basavapatna wrote:
>
> Hi Ilya,
>
> On Wed, Nov 10, 2021 at 11:31 PM Ilya Maximets  wrote:
> >
> > Hi, Harsha.
> >
> > I was thinking about 3x performance drop due to enabling
> > of experimental API that you reported during the meeting
> > today.  I just want to clarify one thing to be sure that
> > you're not making the same mistake as I did.
> >
> > If you're just building OVS without specifying CFLAGS,
> > binaries will be compiled with the default '-g -O2'.
> > However, any manipulations with CFLAGS will result in
> > overriding of these default flags.  Therefore,
> > ./configure CFLAGS='-DALLOW_EXPERIMENTAL_API' will result
> > in building OVS with the compiler default optimization
> > level, which is -O0.  On my setup in a simple test,
> > -O2 works approximately 3-4x faster than -O0.  So, I'm
> > wondering if you just didn't add '-O2' while building
> > with experimental API, so the results are so low?
> > The correct way to configure would be:
> > ./configure CFLAGS='-DALLOW_EXPERIMENTAL_API -g -O2'
> >
> > I made this mistake several times in the past.  And the
> > most recent time it was today. :)
>
> Thanks for this information. We have been using '-g -O3' (with and
> without ALLOW_EXPERIMENTAL_API) . We will run the tests with '-g -O2'
> and see if there's any difference.
>
> Thanks,
> -Harsha

We see some improvement, but not comparable to numbers without the
'experimental' flag (non tunnel config). For example, at 32 flows and
with 64B packet size:
1)  -g -O2:    cycles: 850,  pps: 3.7M
2)  -g -O3:    cycles: 1100, pps: 2.8M
3)  expected:  cycles: 350,  pps: 9M

I'm working on a patch that almost meets the expected numbers. I will
send it out for review after more testing.
Thanks,
-Harsha
> >
> >
> > Best regards, Ilya Maximets.



Re: [ovs-dev] Regarding the performance issues reported during today's OVS+DPDK meeting.

2021-11-11 Thread Sriharsha Basavapatna via dev
Hi Ilya,

On Wed, Nov 10, 2021 at 11:31 PM Ilya Maximets  wrote:
>
> Hi, Harsha.
>
> I was thinking about 3x performance drop due to enabling
> of experimental API that you reported during the meeting
> today.  I just want to clarify one thing to be sure that
> you're not making the same mistake as I did.
>
> If you're just building OVS without specifying CFLAGS,
> binaries will be compiled with the default '-g -O2'.
> However, any manipulations with CFLAGS will result in
> overriding of these default flags.  Therefore,
> ./configure CFLAGS='-DALLOW_EXPERIMENTAL_API' will result
> in building OVS with the compiler default optimization
> level, which is -O0.  On my setup in a simple test,
> -O2 works approximately 3-4x faster than -O0.  So, I'm
> wondering if you just didn't add '-O2' while building
> with experimental API, so the results are so low?
> The correct way to configure would be:
> ./configure CFLAGS='-DALLOW_EXPERIMENTAL_API -g -O2'
>
> I made this mistake several times in the past.  And the
> most recent time it was today. :)

Thanks for this information. We have been using '-g -O3' (with and
without ALLOW_EXPERIMENTAL_API) . We will run the tests with '-g -O2'
and see if there's any difference.

Thanks,
-Harsha
>
>
> Best regards, Ilya Maximets.



Re: [ovs-dev] [PATCH v3] dpif-netdev: Forwarding optimization for flows with a simple match.

2021-10-30 Thread Sriharsha Basavapatna via dev
On Mon, Aug 9, 2021 at 6:28 PM Ilya Maximets  wrote:
>
> There are cases where users might want simple forwarding or drop rules
> for all packets received from a specific port, e.g.::
>
>   "in_port=1,actions=2"
>   "in_port=2,actions=IN_PORT"
>   "in_port=3,vlan_tci=0x1234/0x1fff,actions=drop"
>   "in_port=4,actions=push_vlan:0x8100,set_field:4196->vlan_vid,output:3"
>
> There are also cases where complex OpenFlow rules can be simplified
> down to datapath flows with very simple match criteria.
>
> In theory, for very simple forwarding, OVS doesn't need to parse
> packets at all in order to follow these rules.  "Simple match" lookup
> optimization is intended to speed up packet forwarding in these cases.
>
> Design:
>
> Due to various implementation constraints userspace datapath has
> following flow fields always in exact match (i.e. it's required to
> match at least these fields of a packet even if the OF rule doesn't
> need that):
>
>   - recirc_id
>   - in_port
>   - packet_type
>   - dl_type
>   - vlan_tci (CFI + VID) - in most cases
>   - nw_frag - for ip packets
>
> Not all of these fields are related to packet itself.  We already
> know the current 'recirc_id' and the 'in_port' before starting the
> packet processing.  It also seems safe to assume that we're working
> with Ethernet packets.  So, for the simple OF rule we need to match
> only on 'dl_type', 'vlan_tci' and 'nw_frag'.
>
> 'in_port', 'dl_type', 'nw_frag' and 13 bits of 'vlan_tci' can be
> combined in a single 64bit integer (mark) that can be used as a
> hash in hash map.  We are using only VID and CFI from the 'vlan_tci',
> flows that need to match on PCP will not qualify for the optimization.
> Workaround for matching on non-existence of vlan updated to match on
> CFI and VID only in order to qualify for the optimization.  CFI is
> always set by OVS if vlan is present in a packet, so there is no need
> to match on PCP in this case.  'nw_frag' takes 2 bits of PCP inside
> the simple match mark.
>
> New per-PMD flow table 'simple_match_table' introduced to store
> simple match flows only.  'dp_netdev_flow_add' adds flow to the
> usual 'flow_table' and to the 'simple_match_table' if the flow
> meets following constraints:
>
>   - 'recirc_id' in flow match is 0.
>   - 'packet_type' in flow match is Ethernet.
>   - Flow wildcards contains only minimal set of non-wildcarded fields
> (listed above).
>
> If the number of flows for current 'in_port' in a regular 'flow_table'
> equals number of flows for current 'in_port' in a 'simple_match_table',
> we may use simple match optimization, because all the flows we have
> are simple match flows.  This means that we only need to parse
> 'dl_type', 'vlan_tci' and 'nw_frag' to perform packet matching.
> Now we make the unique flow mark from the 'in_port', 'dl_type',
> 'nw_frag' and 'vlan_tci' and looking for it in the 'simple_match_table'.
> On successful lookup we don't need to run full 'miniflow_extract()'.
>
> Unsuccessful lookup technically means that we have no suitable flow
> in the datapath and upcall will be required.  So, in this case EMC and
> SMC lookups are disabled.  We may optimize this path in the future by
> bypassing the dpcls lookup too.
>
> Performance improvement of this solution on a 'simple match' flows
> should be comparable with partial HW offloading, because it parses same
> packet fields and uses similar flow lookup scheme.
> However, unlike partial HW offloading, it works for all port types
> including virtual ones.
>
> Performance results when compared to EMC:
>
> Test setup:
>
>  virtio-user    OVS    virtio-user
>   Testpmd1  ---> pmd1 --->  Testpmd2
>   (txonly)  x<-- pmd2 <---  (mac swap)
>
> Single stream of 64byte packets.  Actions:
>   in_port=vhost0,actions=vhost1
>   in_port=vhost1,actions=vhost0
>
> Stats collected from pmd1 and pmd2, so there are 2 scenarios:
> Virt-to-Virt   : Testpmd1 --> pmd1 --> Testpmd2.
> Virt-to-NoCopy : Testpmd2 --> pmd2 --->x   Testpmd1.
> Here the packet sent from pmd2 to Testpmd1 is always dropped, because
> the virtqueue is full since Testpmd1 is in txonly mode and doesn't
> receive any packets.  This should be closer to the performance of a
> VM-to-Phy scenario.
>
> Test performed on machine with Intel Xeon CPU E5-2690 v4 @ 2.60GHz.
> Table below represents improvement in throughput when compared to EMC.
>
>  +----------------+------------------------+------------------------+
>  |                |    Default (-g -O2)    | "-Ofast -march=native" |
>  |   Scenario     +------------+-----------+------------+-----------+
>  |                |    GCC     |   Clang   |    GCC     |   Clang   |
>  +----------------+------------+-----------+------------+-----------+
>  | Virt-to-Virt   |   +18.9%   |   +25.5%  |   +10.8%   |   +16.7%  |
>  | Virt-to-NoCopy |   +24.3%   |   +33.7%  |   +14.9%   |   +22.0%  |
>  +----------------+------------+-----------+------------+-----------+

Re: [ovs-dev] [PATCH] netdev-offload-dpdk: initialize s_tnl dynamic string

2021-08-13 Thread Sriharsha Basavapatna via dev
On Fri, Aug 13, 2021 at 6:27 PM Gaëtan Rivet  wrote:

> On Fri, Aug 13, 2021, at 08:14, Sriharsha Basavapatna via dev wrote:
> > The 's_tnl' member in flow_patterns and flow_actions should be
> > set to DS_EMPTY_INITIALIZER, to be consistent with dynamic string
> > initializations.
> >
> > Also, there's a potential memory leak of flow_patterns->s_tnl.
> > Fix this by destroying s_tnl in free_flow_patterns().
> >
> > Fixes: 507d20e77bfe ("netdev-offload-dpdk: Support vports flows
> offload.")
> > Fixes: be56e063d028 ("netdev-offload-dpdk: Support tunnel pop action.")
> > Signed-off-by: Sriharsha Basavapatna  >
>
> Hi Harsha,
>
> Thanks for the fix. I have an optional remark below.
> However, I'm not sure it's worth sending a v2 for it, so
> whether it's integrated as-is or modified is fine.
>
> Acked-by: Gaetan Rivet 
>
>
Thanks Gaetan, I'll not send out a v2 for this change.
-Harsha

> ---
> >  lib/netdev-offload-dpdk.c | 19 ---
> >  1 file changed, 16 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> > index f6706ee0c..aaf2f9a0b 100644
> > --- a/lib/netdev-offload-dpdk.c
> > +++ b/lib/netdev-offload-dpdk.c
> > @@ -791,6 +791,7 @@ free_flow_patterns(struct flow_patterns *patterns)
> >  free(patterns->items);
> >  patterns->items = NULL;
> >  patterns->cnt = 0;
> > +ds_destroy(&patterns->s_tnl);
> >  }
> >
> >  static void
> > @@ -1324,7 +1325,11 @@ netdev_offload_dpdk_mark_rss(struct flow_patterns *patterns,
> >   struct netdev *netdev,
> >   uint32_t flow_mark)
> >  {
> > -struct flow_actions actions = { .actions = NULL, .cnt = 0 };
> > +struct flow_actions actions = {
> > +.actions = NULL,
> > +.cnt = 0,
> > +.s_tnl = DS_EMPTY_INITIALIZER
>
> If the initializer is multi-line, I prefer to always use a terminating
> comma.
> It reduces the diff size when the structure is modified.
>
> --
> Gaetan Rivet
>



[ovs-dev] [PATCH] netdev-offload-dpdk: initialize s_tnl dynamic string

2021-08-13 Thread Sriharsha Basavapatna via dev
The 's_tnl' member in flow_patterns and flow_actions should be
set to DS_EMPTY_INITIALIZER, to be consistent with dynamic string
initializations.

Also, there's a potential memory leak of flow_patterns->s_tnl.
Fix this by destroying s_tnl in free_flow_patterns().

Fixes: 507d20e77bfe ("netdev-offload-dpdk: Support vports flows offload.")
Fixes: be56e063d028 ("netdev-offload-dpdk: Support tunnel pop action.")
Signed-off-by: Sriharsha Basavapatna 
---
 lib/netdev-offload-dpdk.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index f6706ee0c..aaf2f9a0b 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -791,6 +791,7 @@ free_flow_patterns(struct flow_patterns *patterns)
 free(patterns->items);
 patterns->items = NULL;
 patterns->cnt = 0;
+ds_destroy(&patterns->s_tnl);
 }
 
 static void
@@ -1324,7 +1325,11 @@ netdev_offload_dpdk_mark_rss(struct flow_patterns *patterns,
  struct netdev *netdev,
  uint32_t flow_mark)
 {
-struct flow_actions actions = { .actions = NULL, .cnt = 0 };
+struct flow_actions actions = {
+.actions = NULL,
+.cnt = 0,
+.s_tnl = DS_EMPTY_INITIALIZER
+};
 const struct rte_flow_attr flow_attr = {
 .group = 0,
 .priority = 0,
@@ -1809,7 +1814,11 @@ netdev_offload_dpdk_actions(struct netdev *netdev,
 size_t actions_len)
 {
 const struct rte_flow_attr flow_attr = { .ingress = 1, .transfer = 1 };
-struct flow_actions actions = { .actions = NULL, .cnt = 0 };
+struct flow_actions actions = {
+.actions = NULL,
+.cnt = 0,
+.s_tnl = DS_EMPTY_INITIALIZER
+};
 struct rte_flow *flow = NULL;
 struct rte_flow_error error;
 int ret;
@@ -1833,7 +1842,11 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
  const ovs_u128 *ufid,
  struct offload_info *info)
 {
-struct flow_patterns patterns = { .items = NULL, .cnt = 0 };
+struct flow_patterns patterns = {
+.items = NULL,
+.cnt = 0,
+.s_tnl = DS_EMPTY_INITIALIZER
+};
 struct ufid_to_rte_flow_data *flows_data = NULL;
 bool actions_offloaded = true;
 struct rte_flow *flow;
-- 
2.30.0.349.g30b29f044a




Re: [ovs-dev] [PATCH v2] dynamic-string: fix a crash in ds_clone()

2021-08-12 Thread Sriharsha Basavapatna via dev
On Fri, Aug 13, 2021 at 4:07 AM Ilya Maximets  wrote:
>
> On 8/12/21 8:33 AM, Sriharsha Basavapatna via dev wrote:
> > In netdev_offload_dpdk_flow_create() when an offload request fails,
> > dump_flow() is called to log a warning message. The 's_tnl' string
> > in flow_patterns gets initialized in vport_to_rte_tunnel() conditionally
> > via ds_put_format(). If it is not initialized, it crashes later in
> > dump_flow_attr()->ds_clone()->memcpy() while dereferencing this string.
> >
> > To fix this, check if memory for the src string has been allocated,
> > before copying it to the dst string.
> >
> > Fixes: fa44a4a3ff7b ("ovn-controller: Persist desired conntrack groups.")
> > Signed-off-by: Sriharsha Basavapatna 
> >
> > ---
> >
> > v1->v2: fix ds_clone(); ds_cstr() not needed in callers.
>
> Thanks!  This version looks good to me.  I'd add a few more generic
> words to the commit message, so it will be easier to understand the
> change on older branches, but I can do that before applying the patch.

Yes, please feel free to update the commit message, thanks !
>
> There supposed to be a separate patch for correct initialization of
> s_tnl in the lib/netdev-offload-dpdk.c, as Gaetan suggested.  We need
> to initialize them with DS_EMPTY_INITIALIZER.  Though it will not be
> different from the actual memory initialization point of view, it's
> a more correct way to work with dynamic string.  Something like this:
>
> @@ -1324,7 +1325,11 @@ netdev_offload_dpdk_mark_rss(struct flow_patterns *patterns,
>   struct netdev *netdev,
>   uint32_t flow_mark)
>  {
> -struct flow_actions actions = { .actions = NULL, .cnt = 0 };
> +struct flow_actions actions = {
> +.actions = NULL,
> +.cnt = 0,
> +.s_tnl = DS_EMPTY_INITIALIZER,
> +};
>  const struct rte_flow_attr flow_attr = {
>  .group = 0,
>  .priority = 0,
> ---
>
> And the same for other initializations of 'struct flow_actions' and
> 'struct flow_patterns'.  I also noticed that free_flow_patterns()
> doesn't destroy the s_tnl, i.e. leaks it.  This can be fixed in the
> same patch along with correct initialization.
>
> Could you prepare this kind of patch?

Sure, I'll send out a separate patch for this.

Thanks,
-Harsha
>
> Best regards, Ilya Maximets.



[ovs-dev] [PATCH v2] dynamic-string: fix a crash in ds_clone()

2021-08-12 Thread Sriharsha Basavapatna via dev
In netdev_offload_dpdk_flow_create() when an offload request fails,
dump_flow() is called to log a warning message. The 's_tnl' string
in flow_patterns gets initialized in vport_to_rte_tunnel() conditionally
via ds_put_format(). If it is not initialized, it crashes later in
dump_flow_attr()->ds_clone()->memcpy() while dereferencing this string.

To fix this, check if memory for the src string has been allocated,
before copying it to the dst string.

Fixes: fa44a4a3ff7b ("ovn-controller: Persist desired conntrack groups.")
Signed-off-by: Sriharsha Basavapatna 

---

v1->v2: fix ds_clone(); ds_cstr() not needed in callers.

---

 lib/dynamic-string.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/dynamic-string.c b/lib/dynamic-string.c
index 6f7b610a9..fd0127ed1 100644
--- a/lib/dynamic-string.c
+++ b/lib/dynamic-string.c
@@ -460,6 +460,10 @@ ds_chomp(struct ds *ds, int c)
 void
 ds_clone(struct ds *dst, struct ds *source)
 {
+if (!source->allocated) {
+ds_init(dst);
+return;
+}
 dst->length = source->length;
 dst->allocated = dst->length;
 dst->string = xmalloc(dst->allocated + 1);
-- 
2.30.0.349.g30b29f044a




Re: [ovs-dev] [PATCH] netdev-offload-dpdk: fix a crash in dump_flow_attr()

2021-08-11 Thread Sriharsha Basavapatna via dev
On Wed, Aug 11, 2021 at 4:55 PM Ilya Maximets  wrote:
>
> On 8/11/21 7:05 AM, Sriharsha Basavapatna via dev wrote:
> > On Wed, Aug 11, 2021 at 6:21 AM Gaëtan Rivet  wrote:
> >>
> >> On Wed, Aug 4, 2021, at 14:37, Sriharsha Basavapatna via dev wrote:
> >>> In netdev_offload_dpdk_flow_create() when an offload request fails,
> >>> dump_flow() is called to log a warning message. The 's_tnl' string
> >>> in flow_patterns gets initialized in vport_to_rte_tunnel() conditionally
> >>> via ds_put_format(). If it is not initialized, it crashes later in
> >>> dump_flow_attr()->ds_clone()->memcpy() while dereferencing this string.
> >>>
> >>> Fix this by initializing s_tnl using ds_cstr(). Fix a similar
> >>> issue with actions->s_tnl.
> >>>
> >>> Signed-off-by: Sriharsha Basavapatna 
> >>
> >> Hello Harsha,
> >>
> >> Thanks for the fix. I have a remark below.
> >
> > Thanks Gaetan, please see my response inline.
> >
> >>
> >>> ---
> >>>  lib/netdev-offload-dpdk.c | 4 
> >>>  1 file changed, 4 insertions(+)
> >>>
> >>> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> >>> index f6706ee0c..243e2c430 100644
> >>> --- a/lib/netdev-offload-dpdk.c
> >>> +++ b/lib/netdev-offload-dpdk.c
> >>> @@ -788,6 +788,7 @@ free_flow_patterns(struct flow_patterns *patterns)
> >>>  free(CONST_CAST(void *, patterns->items[i].mask));
> >>>  }
> >>>  }
> >>> +ds_destroy(&patterns->s_tnl);
> >>>  free(patterns->items);
> >>>  patterns->items = NULL;
> >>>  patterns->cnt = 0;
> >>> @@ -1334,6 +1335,7 @@ netdev_offload_dpdk_mark_rss(struct flow_patterns
> >>> *patterns,
> >>>  struct rte_flow_error error;
> >>>  struct rte_flow *flow;
> >>>
> >>> +ds_cstr(&actions.s_tnl);
> >>
> >> You are forced to use 'ds_cstr' because 'ds_clone' crashes on properly 
> >> initialized
> >> ds. I think this is an issue with 'ds_clone'.
> >>
> >> I would expect all ds_ functions to work with strings that were 
> >> initialized with
> >> 'ds_init' or set to 'DS_EMPTY_INITIALIZER'.
> >>
> >> In this case, I think it's better to use .s_tnl = DS_EMPTY_INITIALIZER in 
> >> the actions
> >> initializer list and have a fix for 'ds_clone' so that it won't crash with 
> >> correct strings.
> >
> > Are you suggesting something like this in ds_clone():
> >
> > if (source->string)
> >  memcpy(dst->string, source->string, dst->allocated + 1);
> >
> > I suspect that's inconsistent with other functions in the ds library.
> > The expectation seems to be that callers initialize their strings with
> > a call to ds_cstr() before using a ds string.
> >
> > Here are the comments from the header file for reference:
> >
> >  * The 'string' member does not always point to a null-terminated string.
> >  * Initially it is NULL, and even when it is nonnull, some operations do not
> >  * ensure that it is null-terminated.  Use ds_cstr() to ensure that memory 
> > is
> >  * allocated for the string and that it is null-terminated. */
>
> This part talks about Null-termination.  ds_cstr() should not be needed
> for the initialization of a dynamic string.  The problem here, I think,
> is that ds_clone doesn't check whether memory was allocated, so it
> incorrectly clones the empty dynamic string.  How about this (not tested):
>
> diff --git a/lib/dynamic-string.c b/lib/dynamic-string.c
> index 6f7b610a9..5e3f46bb5 100644
> --- a/lib/dynamic-string.c
> +++ b/lib/dynamic-string.c
> @@ -460,6 +460,10 @@ ds_chomp(struct ds *ds, int c)
>  void
>  ds_clone(struct ds *dst, struct ds *source)
>  {
> +if (!source->allocated) {
> +ds_init(dst);
> +return;
> +}
>  dst->length = source->length;
>  dst->allocated = dst->length;
>  dst->string = xmalloc(dst->allocated + 1);
> ---

Ok, thanks Ilya, I will send out v2.

Thanks,
-Harsha
>
> ?
>
> ds_clone seems to be a fairly new function and it was used only in one
> place in OVN until now, so it wasn't tested enough.  We also seem to
> not have any unit tests for the dynamic-string library...
>
> > struct ds {
> > char *string;   /* Null-terminated string. */
> >
> > Let me know what you think.

Re: [ovs-dev] [PATCH] netdev-offload-dpdk: fix a crash in dump_flow_attr()

2021-08-10 Thread Sriharsha Basavapatna via dev
On Wed, Aug 11, 2021 at 6:21 AM Gaëtan Rivet  wrote:
>
> On Wed, Aug 4, 2021, at 14:37, Sriharsha Basavapatna via dev wrote:
> > In netdev_offload_dpdk_flow_create() when an offload request fails,
> > dump_flow() is called to log a warning message. The 's_tnl' string
> > in flow_patterns gets initialized in vport_to_rte_tunnel() conditionally
> > via ds_put_format(). If it is not initialized, it crashes later in
> > dump_flow_attr()->ds_clone()->memcpy() while dereferencing this string.
> >
> > Fix this by initializing s_tnl using ds_cstr(). Fix a similar
> > issue with actions->s_tnl.
> >
> > Signed-off-by: Sriharsha Basavapatna 
>
> Hello Harsha,
>
> Thanks for the fix. I have a remark below.

Thanks Gaetan, please see my response inline.

>
> > ---
> >  lib/netdev-offload-dpdk.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> > index f6706ee0c..243e2c430 100644
> > --- a/lib/netdev-offload-dpdk.c
> > +++ b/lib/netdev-offload-dpdk.c
> > @@ -788,6 +788,7 @@ free_flow_patterns(struct flow_patterns *patterns)
> >  free(CONST_CAST(void *, patterns->items[i].mask));
> >  }
> >  }
> > +ds_destroy(&patterns->s_tnl);
> >  free(patterns->items);
> >  patterns->items = NULL;
> >  patterns->cnt = 0;
> > @@ -1334,6 +1335,7 @@ netdev_offload_dpdk_mark_rss(struct flow_patterns
> > *patterns,
> >  struct rte_flow_error error;
> >  struct rte_flow *flow;
> >
> > +ds_cstr(&actions.s_tnl);
>
> You are forced to use 'ds_cstr' because 'ds_clone' crashes on properly 
> initialized
> ds. I think this is an issue with 'ds_clone'.
>
> I would expect all ds_ functions to work with strings that were initialized 
> with
> 'ds_init' or set to 'DS_EMPTY_INITIALIZER'.
>
> In this case, I think it's better to use .s_tnl = DS_EMPTY_INITIALIZER in the 
> actions
> initializer list and have a fix for 'ds_clone' so that it won't crash with 
> correct strings.

Are you suggesting something like this in ds_clone():

if (source->string)
 memcpy(dst->string, source->string, dst->allocated + 1);

I suspect that's inconsistent with other functions in the ds library.
The expectation seems to be that callers initialize their strings with
a call to ds_cstr() before using a ds string.

Here are the comments from the header file for reference:

 * The 'string' member does not always point to a null-terminated string.
 * Initially it is NULL, and even when it is nonnull, some operations do not
 * ensure that it is null-terminated.  Use ds_cstr() to ensure that memory is
 * allocated for the string and that it is null-terminated. */
struct ds {
char *string;   /* Null-terminated string. */

Let me know what you think.

Thanks,
-Harsha
>
> --
> Gaetan Rivet



Re: [ovs-dev] [PATCH] netdev-offload-dpdk: fix a crash in dump_flow_attr()

2021-08-09 Thread Sriharsha Basavapatna via dev
On Wed, Aug 4, 2021 at 6:07 PM Sriharsha Basavapatna
 wrote:
>
> In netdev_offload_dpdk_flow_create() when an offload request fails,
> dump_flow() is called to log a warning message. The 's_tnl' string
> in flow_patterns gets initialized in vport_to_rte_tunnel() conditionally
> via ds_put_format(). If it is not initialized, it crashes later in
> dump_flow_attr()->ds_clone()->memcpy() while dereferencing this string.
>
> Fix this by initializing s_tnl using ds_cstr(). Fix a similar
> issue with actions->s_tnl.
>
> Signed-off-by: Sriharsha Basavapatna 
> ---
>  lib/netdev-offload-dpdk.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index f6706ee0c..243e2c430 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -788,6 +788,7 @@ free_flow_patterns(struct flow_patterns *patterns)
>  free(CONST_CAST(void *, patterns->items[i].mask));
>  }
>  }
> > +ds_destroy(&patterns->s_tnl);
>  free(patterns->items);
>  patterns->items = NULL;
>  patterns->cnt = 0;
> @@ -1334,6 +1335,7 @@ netdev_offload_dpdk_mark_rss(struct flow_patterns 
> *patterns,
>  struct rte_flow_error error;
>  struct rte_flow *flow;
>
> > +ds_cstr(&actions.s_tnl);
> >  add_flow_mark_rss_actions(&actions, flow_mark, netdev);
>
> >  flow = netdev_offload_dpdk_flow_create(netdev, &flow_attr, patterns,
> @@ -1814,6 +1816,7 @@ netdev_offload_dpdk_actions(struct netdev *netdev,
>  struct rte_flow_error error;
>  int ret;
>
> > +ds_cstr(&actions.s_tnl);
> >  ret = parse_flow_actions(netdev, &actions, nl_actions, actions_len);
>  if (ret) {
>  goto out;
> @@ -1838,6 +1841,7 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
>  bool actions_offloaded = true;
>  struct rte_flow *flow;
>
> > +ds_cstr(&patterns.s_tnl);
> >  if (parse_flow_match(netdev, info->orig_in_port, &patterns, match)) {
> >  VLOG_DBG_RL(&rl, "%s: matches of ufid "UUID_FMT" are not supported",
>  netdev_get_name(netdev), UUID_ARGS((struct uuid *) 
> ufid));
> --
> 2.30.0.349.g30b29f044a
>

A gentle reminder on this fix. This crash was observed with the tunnel
offload feature.  The stack trace is below for reference.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x7f61311f028c in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f611700 (LWP 25514))]
(gdb) bt
#0  0x7f61311f028c in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
#1  0x027c01cb in ds_clone (dst=0x7f611fffbee0, source=0x7f611fffc078)
at lib/dynamic-string.c:466
#2  0x02917189 in dump_flow_attr (s=0x7f611fffbec0,
s_extra=0x7f611fffbee0, attr=0x7f611fffbfe8, flow_patterns=0x7f611fffc050,
flow_actions=0x7f611fffbfa0) at lib/netdev-offload-dpdk.c:175
#3  0x02919884 in dump_flow (s=0x7f611fffbec0, s_extra=0x7f611fffbee0,
attr=0x7f611fffbfe8, flow_patterns=0x7f611fffc050,
flow_actions=0x7f611fffbfa0) at lib/netdev-offload-dpdk.c:627
#4  0x02919b74 in netdev_offload_dpdk_flow_create (netdev=0x23ffe8400,
attr=0x7f611fffbfe8, flow_patterns=0x7f611fffc050,
flow_actions=0x7f611fffbfa0, error=0x7f611fffbf80)
at lib/netdev-offload-dpdk.c:672
#5  0x0291cc60 in netdev_offload_dpdk_actions (netdev=0x23ffe8400,
patterns=0x7f611fffc050, nl_actions=0x7f60fc00b120, actions_len=8)
at lib/netdev-offload-dpdk.c:1821
#6  0x0291ce14 in netdev_offload_dpdk_add_flow (netdev=0x585d320,
match=0x7f60fc00b2a8, nl_actions=0x7f60fc00b120, actions_len=8,
ufid=0x7f60fc00a970, info=0x7f611fffc220) at lib/netdev-offload-dpdk.c:1847
#7  0x0291d304 in netdev_offload_dpdk_flow_put (netdev=0x585d320,
match=0x7f60fc00b2a8, actions=0x7f60fc00b120, actions_len=8,
ufid=0x7f60fc00a970, info=0x7f611fffc220, stats=0x0)
at lib/netdev-offload-dpdk.c:1963
#8  0x027f1d1e in netdev_flow_put (netdev=0x585d320,
match=0x7f60fc00b2a8, actions=0x7f60fc00b120, act_len=8,
ufid=0x7f60fc00a970, info=0x7f611fffc220, stats=0x0)
at lib/netdev-offload.c:251
#9  0x027a5a43 in dp_netdev_flow_offload_put (offload=0x7f60fc00b290)
at lib/dpif-netdev.c:2622
#10 0x027a5b70 in dp_netdev_flow_offload_main (data=0x0)
at lib/dpif-netdev.c:2671
#11 0x02877c0e in ovsthread_wrapper (aux_=0x57f90d0)
at lib/ovs-thread.c:383
#12 0x7f613299a2de in start_thread () from /lib64/libpthread.so.0
#13 0x7f613118e133 in clone () from /lib64/libc.so.6

Thanks,
-Harsha


[ovs-dev] [PATCH] netdev-offload-dpdk: fix a crash in dump_flow_attr()

2021-08-04 Thread Sriharsha Basavapatna via dev
In netdev_offload_dpdk_flow_create() when an offload request fails,
dump_flow() is called to log a warning message. The 's_tnl' string
in flow_patterns gets initialized in vport_to_rte_tunnel() conditionally
via ds_put_format(). If it is not initialized, it crashes later in
dump_flow_attr()->ds_clone()->memcpy() while dereferencing this string.

Fix this by initializing s_tnl using ds_cstr(). Fix a similar
issue with actions->s_tnl.

Signed-off-by: Sriharsha Basavapatna 
---
 lib/netdev-offload-dpdk.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index f6706ee0c..243e2c430 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -788,6 +788,7 @@ free_flow_patterns(struct flow_patterns *patterns)
 free(CONST_CAST(void *, patterns->items[i].mask));
 }
 }
+ds_destroy(&patterns->s_tnl);
 free(patterns->items);
 patterns->items = NULL;
 patterns->cnt = 0;
@@ -1334,6 +1335,7 @@ netdev_offload_dpdk_mark_rss(struct flow_patterns 
*patterns,
 struct rte_flow_error error;
 struct rte_flow *flow;
 
+ds_cstr(&actions.s_tnl);
 add_flow_mark_rss_actions(&actions, flow_mark, netdev);
 
 flow = netdev_offload_dpdk_flow_create(netdev, &flow_attr, patterns,
@@ -1814,6 +1816,7 @@ netdev_offload_dpdk_actions(struct netdev *netdev,
 struct rte_flow_error error;
 int ret;
 
+ds_cstr(&actions.s_tnl);
 ret = parse_flow_actions(netdev, &actions, nl_actions, actions_len);
 if (ret) {
 goto out;
@@ -1838,6 +1841,7 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
 bool actions_offloaded = true;
 struct rte_flow *flow;
 
+ds_cstr(&patterns.s_tnl);
 if (parse_flow_match(netdev, info->orig_in_port, &patterns, match)) {
 VLOG_DBG_RL(&rl, "%s: matches of ufid "UUID_FMT" are not supported",
 netdev_get_name(netdev), UUID_ARGS((struct uuid *) ufid));
-- 
2.30.0.349.g30b29f044a




Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload

2021-06-24 Thread Sriharsha Basavapatna via dev
On Thu, Jun 24, 2021 at 12:23 AM Ilya Maximets  wrote:
>
> On 6/23/21 5:52 PM, Eli Britstein wrote:
> > VXLAN decap in OVS-DPDK configuration consists of two flows:
> > F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
> > F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
> >
> > F1 is a classification flow. It has outer headers matches and it
> > classifies the packet as a VXLAN packet, and using tnl_pop action the
> > packet continues processing in F2.
> > F2 is a flow that has matches on tunnel metadata as well as on the inner
> > packet headers (as any other flow).
> >
> > In order to fully offload VXLAN decap path, both F1 and F2 should be
> > offloaded. As there are more than one flow in HW, it is possible that
> > F1 is done by HW but F2 is not. Packet is received by SW, and should be
> > processed starting from F2 as F1 was already done by HW.
> > Rte_flows are applicable only on physical port IDs. Keeping the original
> > physical in port on which the packet is received on enables applying
> > vport flows (e.g. F2) on that physical port.
> >
> > This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
> > for tunnel offloads.
> >
> > Note that MLX5 PMD has a bug that the tnl_pop private actions must be
> > first. In OVS it is not.
> > Fixing this issue is scheduled for 21.05 (and stable 20.11.2).
> > Meanwhile, tests were done with a workaround for it [2].
> >
> > v2-v1:
> > - Tracking original in_port, and applying vport on that physical port 
> > instead of all PFs.
> > v3-v2:
> > - Traversing ports using a new API instead of flow_dump.
> > - Refactor packet state recover logic, with bug fix for error pop_header.
> > - One ref count for netdev in non-tunnel case.
> > - Rename variables, comments, rebase.
> > v4-v3:
> > - Extract orig_in_port from physdev for flow modify.
> > - Miss handling fixes.
> > v5-v4:
> > - Drop refactor offload rule creation commit.
> > - Comment about setting in_port in restore.
> > - Refactor vports flow offload commit.
> > v6-v5:
> > - Fixed duplicate netdev ref bug.
> > v7-v6:
> > - Adopting Ilya's diff, with a minor fix in set_error stub.
> > - Fixed abort (remove OVS_NOT_REACHED()) with tunnels other than vxlan
> >   ("netdev-offload-dpdk: Support tunnel pop action.").
>
> Thanks!  I see the only difference (beside the set_error fix) with what
> I have locally is following:
>
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index 363f32f71..6bd5b6c9f 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -835,7 +835,9 @@ vport_to_rte_tunnel(struct netdev *vport,
>netdev_dpdk_get_port_id(netdev));
>  }
>  } else {
> -OVS_NOT_REACHED();
> +VLOG_DBG_RL(&rl, "vport type '%s' is not supported",
> +netdev_get_type(vport));
> +return -1;
>  }
>
>  return 0;
> ---
>
> That looks good to me.  So, I guess, Harsha, we're waiting for
> your review/tests here.

Thanks Ilya and Eli, looks good to me; I've also tested it and it works fine.
-Harsha
>
> >
> > Travis:
> > v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
> > v2: https://travis-ci.org/github/elibritstein/OVS/builds/758382963
> > v3: https://travis-ci.org/github/elibritstein/OVS/builds/761089087
> > v4: https://travis-ci.org/github/elibritstein/OVS/builds/763146966
> > v5: https://travis-ci.org/github/elibritstein/OVS/builds/765271879
> > v6: https://travis-ci.org/github/elibritstein/OVS/builds/765816800
> > v7: Have a problem to run
>
> Yes, this thing is non-functional.  Even travis-ci.com doesn't work
> for me for unknown reason (I do have compute credits).
>
> Best regards, Ilya Maximets.



Re: [ovs-dev] NVIDIA Roadmap for v2.16

2021-05-26 Thread Sriharsha Basavapatna via dev
Gaetan,

Thanks for sending out this list.

Here's the list that Broadcom is reviewing/working on for the v2.16 release:

OVS-DPDK:

1) VXLAN decap offload:  v6:
https://patchwork.ozlabs.org/project/openvswitch/list/?series=237293
  -- Completed review; acked v6 patchset.

2) Port-forward (direct output) feature:

-- Submitted RFC for review:
https://patchwork.ozlabs.org/project/openvswitch/patch/20210420080710.8200-1-sriharsha.basavapa...@broadcom.com/

-- Alternate/generic RFC by Ilya:
v2: 
https://patchwork.ozlabs.org/project/openvswitch/patch/20210524065140.31891-1-sriharsha.basavapa...@broadcom.com/

Thanks,
-Harsha

On Thu, Apr 15, 2021 at 10:33 PM Gaëtan Rivet  wrote:
>
> Hello,
>
> This is the list of topics we are working on for the v2.16 release:
>
> OVS-DPDK Hardware Offloads:
>
>   * VXLAN decap
> v6 submitted: 
> https://patchwork.ozlabs.org/project/openvswitch/list/?series=237293
>
>   * IP fragmentation match
> v1 submitted: 
> https://patchwork.ozlabs.org/project/openvswitch/list/?series=222530
>
>   * Parallel offload insertion
> v2 submitted: 
> https://patchwork.ozlabs.org/project/openvswitch/list/?series=238779
>
>   * Local port mirroring HW offload
> Need rework following a few changes to VXLAN decap series.
> Submission pending VXLAN decap integration.
>
> OVS-kernel Hardware Offloads:
>
>   * Sflow support
> v12 submitted: 
> https://patchwork.ozlabs.org/project/openvswitch/list/?series=226570=*
> v13 under development.
>
> Userland conntrack:
>
>   * Multithread scalability improvement:
> v1 submitted: 
> https://patchwork.ozlabs.org/project/openvswitch/list/?series=229861
> v2 under development.
>
> Library:
>
>   * RCU debugger
> RFC under development.
>
> Best regards,
> --
> Gaetan Rivet



[ovs-dev] [PATCH v2] dpif-netdev: Forwarding optimization for direct output flows.

2021-05-24 Thread Sriharsha Basavapatna via dev
From: Ilya Maximets 

There are some cases where users want to have simple forwarding or drop
rules for all packets received from a particular port, e.g.:

  "in_port=1,actions=2"
  "in_port=1,actions=IN_PORT"
  "in_port=1,actions=drop"

There are also cases where complex OF flows could be simplified down
to simple forwarding/drop datapath flows.

In theory, we don't need to parse packets at all to follow these flows.
"Direct output forwarding" optimization is intended to speed up above
cases.

Design:

Due to various implementation restrictions userspace datapath has
following flow fields always in exact match (i.e. it's required to
match at least these fields of a packet even if the OF rule doesn't
need that):

  - recirc_id
  - in_port
  - packet_type
  - dl_type
  - vlan_tci
  - nw_frag (for ip packets)

Not all of these fields are related to packet itself. We already
know the current 'recirc_id' and the 'in_port' before starting the
packet processing. It also seems safe to assume that we're working
with Ethernet packets. dpif-netdev sets exact match on 'vlan_tci'
to avoid issues with flow format conversion, and we don't really need
to match on it unless the ofproto layer asks us to.
So, for the simple forwarding OF rule we need to match only with
'dl_type' and 'nw_frag'.

'in_port', 'dl_type' and 'nw_frag' could be combined in a single
64bit integer that could be used as a hash in hash map.

New per-PMD flow table 'direct_output_table' introduced to store
direct output flows only. 'dp_netdev_flow_add' adds flow to the
usual 'flow_table' and to 'direct_output_table' if the flow meets
following constraints:

  - 'recirc_id' in flow match is 0.
  - 'packet_type' in flow match is Ethernet.
  - Flow wildcards originally had wildcarded 'vlan_tci'.
  - Flow has no actions (drop) or exactly one action equal to
OVS_ACTION_ATTR_OUTPUT.
  - Flow wildcards contains only minimal set of non-wildcarded fields
(listed above).

If the number of flows for current 'in_port' in regular 'flow_table'
equals number of flows for current 'in_port' in 'direct_output_table',
we may use direct output optimization, because all the flows we have
are direct output flows. This means that we only need to parse
'dl_type' and 'nw_frag' to perform packet matching.
Now we make a unique flow mark from 'in_port', 'dl_type' and
'nw_frag' and look it up in 'direct_output_table'.
On successful lookup we don't need to make full 'miniflow_extract()'.

Unsuccessful lookup technically means that we have no sufficient flow
in datapath and upcall will be required. We may optimize this path
in the future by bypassing the EMC, SMC and dpcls lookups in this case.

Performance improvement of this solution on a 'direct output' flows
should be comparable with partial HW offloading, because it parses same
packet fields and uses similar flow lookup scheme.
However, unlike partial HW offloading, it works for all port types
including virtual ones.

Signed-off-by: Ilya Maximets 
Signed-off-by: Sriharsha Basavapatna 

---
v1->v2:
Rebased to master branch.
Added a coverage counter.
---

 lib/dpif-netdev.c | 263 +++---
 lib/flow.c|  12 ++-
 lib/flow.h|   4 +-
 3 files changed, 259 insertions(+), 20 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 650e67ab3..ec09c67cd 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -35,6 +35,7 @@
 
 #include "bitmap.h"
 #include "cmap.h"
+#include "ccmap.h"
 #include "conntrack.h"
 #include "conntrack-tp.h"
 #include "coverage.h"
@@ -114,6 +115,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
 COVERAGE_DEFINE(datapath_drop_invalid_bond);
 COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
 COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
+COVERAGE_DEFINE(datapath_direct_output_packet);
 
 /* Protects against changes to 'dp_netdevs'. */
 static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
@@ -543,6 +545,8 @@ struct dp_netdev_flow {
 /* Hash table index by unmasked flow. */
 const struct cmap_node node; /* In owning dp_netdev_pmd_thread's */
  /* 'flow_table'. */
+const struct cmap_node direct_output_node; /* In dp_netdev_pmd_thread's
+ 'direct_output_table'. */
 const struct cmap_node mark_node; /* In owning flow_mark's mark_to_flow */
 const ovs_u128 ufid; /* Unique flow identifier. */
 const ovs_u128 mega_ufid;/* Unique mega flow identifier. */
@@ -556,7 +560,8 @@ struct dp_netdev_flow {
 struct ovs_refcount ref_cnt;
 
 bool dead;
-uint32_t mark;   /* Unique flow mark assigned to a flow */
+uint32_t mark;   /* Unique flow mark for netdev offloading. */
+uint64_t direct_output_mark; /* Unique flow mark for direct output. */
 
 /* Statistics. */
 struct dp_netdev_flow_stats stats;
@@ -690,12 +695,19 @@ struct dp_netdev_pmd_thread {
 
 

Re: [ovs-dev] [RFC PATCH] dpif-netdev: Support "port-forward" mode to avoid dp cache lookup

2021-04-27 Thread Sriharsha Basavapatna via dev
On Tue, Apr 27, 2021 at 6:42 PM Eli Britstein  wrote:
>
>
> On 4/27/2021 2:45 PM, Sriharsha Basavapatna wrote:
> > On Tue, Apr 27, 2021 at 4:26 PM Ilya Maximets  wrote:
> >> On 4/27/21 11:56 AM, Sriharsha Basavapatna via dev wrote:
> >>> Hi Eli,
> >>>
> >>> On Sun, Apr 25, 2021 at 6:22 PM Eli Britstein  wrote:
> >>>> Hi Harsha,
> >>>>
> >>>> On 4/20/2021 11:07 AM, Sriharsha Basavapatna wrote:
> >>>>> Sometimes a port might be configured with a single flow that just
> >>>>> forwards packets to another port. This would be useful in configs
> >>>>> where the bridge is just forwarding packets between two ports (for
> >>>>> example, between a vhost-user port and a physical port). A flow
> >>>>> that matches only on the in_port and with an action that forwards
> >>>>> to another port would be configured, to avoid learning or matching
> >>>>> on packet headers.
> >>>>>
> >>>>> Example:
> >>>>> $ ovs-ofctl add-flow br0 in_port=1,actions=output:2
> >>>>> $ ovs-ofctl add-flow br0 in_port=2,actions=output:1
> >>>>>
> >>>>> This translates to a datapath flow with the match fields wildcarded
> >>>>> for the packet headers. However, the datapath processing still involves
> >>>> There are still several matches (not wildcards):
> >>>>
> >>>> - recirc_id
> >>>> - in_port
> >>>> - packet_type
> >>>> - dl_type
> >>>> - vlan_tci
> >>>> - nw_frag (for ip packets)
> >>>>
> >>>> So there might be multiple flows for each such openflow rule.
> >>>>
> >>>> In the past, I have tried to optimize such scenario, see:
> >>>>
> >>>> https://mail.openvswitch.org/pipermail/ovs-dev/2019-April/357882.html
> >>>>
> >>>> That was wrong as commented afterwards.
> >>>>
> >>>> Another related patch-set was this (also not accepted):
> >>>>
> >>>> https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/363948.html
> >>>>
> >>>> Ilya wrote an alternative patch:
> >>>>
> >>>> https://patchwork.ozlabs.org/patch/1105880/
> >>>>
> >>>> AFAIR, it didn't improve performance either.
> >> Would be good to have some performance numbers for it as there was
> >> no test results published and I don't know if someone ever tested it.
> >>
> >>> Thanks for the above pointers. Ilya had also shared this patch
> >>> recently while discussing this topic at the ovs-dpdk community
> >>> meeting. I want to see if we can utilize part of the logic in that
> >>> patch to add some constraints, while still avoiding an additional
> >>> table/lookup.  The 'port-forward' mode implies that the user wants to
> >>> avoid any kind of lookup in the datapath (as indicated by the ofctl
> >>> rule + port-forward mode).
> >> I don't see how to completely avoid lookups.
> >>
> >> IIUC, in this patch there is a match and upcall for the first packet,
> >> but there are no matches for subsequent packets.
> > That's right. Allow the first packet to go through match, upcall,
> > dp/cache insertion etc. For subsequent packets avoid lookup.
> >
> >>   This will work
> >> only for flow actions that doesn't modify the packet.  If for some
> >> reason the flow contains header modifications OVS will not do that
> >> correctly because the header is not parsed.  Also, if the packet is
> >> a bit different from the very first packet, we might attempt to
> >> modify headers that doesn't exist.  All in all, this is very dangerous
> >> and might lead to OVS crash.  We can't rely on the user to set specific
> >> OF rules for this functionality and we should not have a feature that
> >> might crash OVS if not used accurately.
> >>
> >> The way to not parse the packet at all and to not perform any matches is
> >> the way to completely ignore OF rules, but OVS is an OF switch and
> >> such functionality just doesn't fit.
> > If I add a constraint to check that there is only one action and it's
> > an OUTPUT action (i.e don't enable port-forward mode if the DP flow
> > contains other actions like modify), like it is done in your patch,

Re: [ovs-dev] [RFC PATCH] dpif-netdev: Support "port-forward" mode to avoid dp cache lookup

2021-04-27 Thread Sriharsha Basavapatna via dev
On Tue, Apr 27, 2021 at 4:26 PM Ilya Maximets  wrote:
>
> On 4/27/21 11:56 AM, Sriharsha Basavapatna via dev wrote:
> > Hi Eli,
> >
> > On Sun, Apr 25, 2021 at 6:22 PM Eli Britstein  wrote:
> >>
> >> Hi Harsha,
> >>
> >> On 4/20/2021 11:07 AM, Sriharsha Basavapatna wrote:
> >>> Sometimes a port might be configured with a single flow that just
> >>> forwards packets to another port. This would be useful in configs
> >>> where the bridge is just forwarding packets between two ports (for
> >>> example, between a vhost-user port and a physical port). A flow
> >>> that matches only on the in_port and with an action that forwards
> >>> to another port would be configured, to avoid learning or matching
> >>> on packet headers.
> >>>
> >>> Example:
> >>> $ ovs-ofctl add-flow br0 in_port=1,actions=output:2
> >>> $ ovs-ofctl add-flow br0 in_port=2,actions=output:1
> >>>
> >>> This translates to a datapath flow with the match fields wildcarded
> >>> for the packet headers. However, the datapath processing still involves
> >>
> >> There are still several matches (not wildcards):
> >>
> >>- recirc_id
> >>- in_port
> >>- packet_type
> >>- dl_type
> >>- vlan_tci
> >>- nw_frag (for ip packets)
> >>
> >> So there might be multiple flows for each such openflow rule.
> >>
> >> In the past, I have tried to optimize such scenario, see:
> >>
> >> https://mail.openvswitch.org/pipermail/ovs-dev/2019-April/357882.html
> >>
> >> That was wrong as commented afterwards.
> >>
> >> Another related patch-set was this (also not accepted):
> >>
> >> https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/363948.html
> >>
> >> Ilya wrote an alternative patch:
> >>
> >> https://patchwork.ozlabs.org/patch/1105880/
> >>
> >> AFAIR, it didn't improve performance either.
>
> Would be good to have some performance numbers for it as there was
> no test results published and I don't know if someone ever tested it.
>
> >
> > Thanks for the above pointers. Ilya had also shared this patch
> > recently while discussing this topic at the ovs-dpdk community
> > meeting. I want to see if we can utilize part of the logic in that
> > patch to add some constraints, while still avoiding an additional
> > table/lookup.  The 'port-forward' mode implies that the user wants to
> > avoid any kind of lookup in the datapath (as indicated by the ofctl
> > rule + port-forward mode).
>
> I don't see how to completely avoid lookups.
>
> IIUC, in this patch there is a match and upcall for the first packet,
> but there are no matches for subsequent packets.

That's right. Allow the first packet to go through match, upcall,
dp/cache insertion etc. For subsequent packets avoid lookup.

>  This will work
> only for flow actions that don't modify the packet.  If for some
> reason the flow contains header modifications OVS will not do that
> correctly because the header is not parsed.  Also, if the packet is
> a bit different from the very first packet, we might attempt to
> modify headers that don't exist.  All in all, this is very dangerous
> and might lead to OVS crash.  We can't rely on the user to set specific
> OF rules for this functionality and we should not have a feature that
> might crash OVS if not used accurately.
>
> The way to not parse the packet at all and to not perform any matches is
> the way to completely ignore OF rules, but OVS is an OF switch and
> such functionality just doesn't fit.

If I add a constraint to check that there is only one action and that
it's an OUTPUT action (i.e., don't enable port-forward mode if the DP
flow contains other actions, like modify), as is done in your patch,
would that handle this issue?

Thanks,
-Harsha
>
> In my change I minimized the lookup as possible to a single 64bit key.
> And it will actually work with any OF rules and without enabling of
> any special flags.  Would be great to see some performance numbers
> for it as I didn't see any.
>
> > With pvp tests (vxlan config), we have
> > seen better performance both in pps: ~50% and cpp: ~35%, at a few
> > thousand flows. Similar improvement can be seen with simple
> > configurations (e.g testpmd in the vm in txonly fwd mode).
> >
> >>
> >> Besides, I've tried this patch. Maybe I did something wrong (I
> >> configured port-forward=true on those ports and those openflow rules,
> >>

Re: [ovs-dev] [RFC PATCH] dpif-netdev: Support "port-forward" mode to avoid dp cache lookup

2021-04-27 Thread Sriharsha Basavapatna via dev
Hi Eli,

On Sun, Apr 25, 2021 at 6:22 PM Eli Britstein  wrote:
>
> Hi Harsha,
>
> On 4/20/2021 11:07 AM, Sriharsha Basavapatna wrote:
> > Sometimes a port might be configured with a single flow that just
> > forwards packets to another port. This would be useful in configs
> > where the bridge is just forwarding packets between two ports (for
> > example, between a vhost-user port and a physical port). A flow
> > that matches only on the in_port and with an action that forwards
> > to another port would be configured, to avoid learning or matching
> > on packet headers.
> >
> > Example:
> > $ ovs-ofctl add-flow br0 in_port=1,actions=output:2
> > $ ovs-ofctl add-flow br0 in_port=2,actions=output:1
> >
> > This translates to a datapath flow with the match fields wildcarded
> > for the packet headers. However, the datapath processing still involves
>
> There are still several matches (not wildcards):
>
>- recirc_id
>- in_port
>- packet_type
>- dl_type
>- vlan_tci
>- nw_frag (for ip packets)
>
> So there might be multiple flows for each such openflow rule.
>
> In the past, I have tried to optimize such scenario, see:
>
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-April/357882.html
>
> That was wrong as commented afterwards.
>
> Another related patch-set was this (also not accepted):
>
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/363948.html
>
> Ilya wrote an alternative patch:
>
> https://patchwork.ozlabs.org/patch/1105880/
>
> AFAIR, it didn't improve performance either.

Thanks for the above pointers. Ilya had also shared this patch
recently while discussing this topic at the ovs-dpdk community
meeting. I want to see if we can utilize part of the logic in that
patch to add some constraints, while still avoiding an additional
table/lookup.  The 'port-forward' mode implies that the user wants to
avoid any kind of lookup in the datapath (as indicated by the ofctl
rule + port-forward mode).  With pvp tests (vxlan config), we have
seen better performance both in pps: ~50% and cpp: ~35%, at a few
thousand flows. Similar improvement can be seen with simple
configurations (e.g testpmd in the vm in txonly fwd mode).

>
> Besides, I've tried this patch. Maybe I did something wrong (I
> configured port-forward=true on those ports and those openflow rules,
> and pinged between those ports). I didn't see it work (the coverage,
> and also I added my own prints).

When you enable port-forward and start the traffic, you should see a
message like this:
"dpif_netdev(pmd-c02/id:74)|DBG|Setting port_forward_flow: port:
0x7f63400050b0 flow: 0x7f634000afb0"

I'm guessing the flow isn't getting added to the port; the insertion
is currently done when there's an EMC hit. I should probably move the
insertion code to handle_packet_upcall(). As a workaround, can you
please update the EMC insertion probability (ovs-vsctl --no-wait set
Open_vSwitch . other_config:emc-insert-inv-prob=1) and retry your test?

Also, please disable normal mode in the bridge (ovs-ofctl del-flows
br0; and then add ofctl rules).  Let me know if you still see the
problem, I'll work with you offline.

>
> With this proposed patch, what will be the behavior in case there are
> multiple DP flows for that single openflow rule?

Right now I'm thinking that the ofctl rule takes precedence since the
user just wants to forward to another port. If there are multiple DP
flows, then the first one will act as the default flow.  What do you
think?

Thanks,
-Harsha


>
> Thanks,
> Eli
>
> > flow cache (EMC/SMC) lookup and with a large number of flows it also
> > results in dpcls lookup due to cache miss. Avoiding cache lookup in
> > such configurations results in better performance (pps and cpp).
> >
> > This patch provides a new interface config parameter - "port-forward",
> > to avoid datapath cache lookup. When this is enabled, the datapath flow
> > is saved in the port when there is a cache hit for the initial packet.
> > For subsequent packets, the flow is readily found in the port structure,
> > thus avoiding cache and dpcls lookup.
> >
> > Example:
> > $ ovs-vsctl add-port br0 dpdk0 \
> > -- set Interface dpdk0 other_config:port-forward=true
> >
> > A coverage counter has also been added to track packets processed in
> > port-forward mode.
> >
> > $ ovs-appctl coverage/show   | grep datapath_port_forward_packet
> >
> > Signed-off-by: Sriharsha Basavapatna 
> > ---
> >   lib/dpif-netdev.c| 79 ++--
> >   vswitchd/vswitch.xml | 11 ++
> >   2 files changed, 72 insertions(+), 18 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 251788b04..133ed7c1e 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
> >   COVERAGE_DEFINE(datapath_drop_invalid_bond);
> >   COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
> >   COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
> > 

[ovs-dev] [RFC PATCH] dpif-netdev: Support "port-forward" mode to avoid dp cache lookup

2021-04-20 Thread Sriharsha Basavapatna via dev
Sometimes a port might be configured with a single flow that just
forwards packets to another port. This would be useful in configs
where the bridge is just forwarding packets between two ports (for
example, between a vhost-user port and a physical port). A flow
that matches only on the in_port and with an action that forwards
to another port would be configured, to avoid learning or matching
on packet headers.

Example:
$ ovs-ofctl add-flow br0 in_port=1,actions=output:2
$ ovs-ofctl add-flow br0 in_port=2,actions=output:1

This translates to a datapath flow with the match fields wildcarded
for the packet headers. However, the datapath processing still involves
flow cache (EMC/SMC) lookup and with a large number of flows it also
results in dpcls lookup due to cache miss. Avoiding cache lookup in
such configurations results in better performance (pps and cpp).

This patch provides a new interface config parameter - "port-forward",
to avoid datapath cache lookup. When this is enabled, the datapath flow
is saved in the port when there is a cache hit for the initial packet.
For subsequent packets, the flow is readily found in the port structure,
thus avoiding cache and dpcls lookup.

Example:
$ ovs-vsctl add-port br0 dpdk0 \
-- set Interface dpdk0 other_config:port-forward=true

A coverage counter has also been added to track packets processed in
port-forward mode.

$ ovs-appctl coverage/show   | grep datapath_port_forward_packet

Signed-off-by: Sriharsha Basavapatna 
---
 lib/dpif-netdev.c| 79 ++--
 vswitchd/vswitch.xml | 11 ++
 2 files changed, 72 insertions(+), 18 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 251788b04..133ed7c1e 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
 COVERAGE_DEFINE(datapath_drop_invalid_bond);
 COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
 COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
+COVERAGE_DEFINE(datapath_port_forward_packet);
 
 /* Protects against changes to 'dp_netdevs'. */
 static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
@@ -483,6 +484,7 @@ struct dp_netdev_port {
 unsigned *txq_used; /* Number of threads that use each tx queue. */
 struct ovs_mutex txq_used_mutex;
 bool emc_enabled;   /* If true EMC will be used. */
+bool port_forward;
 char *type; /* Port type as requested by user. */
 char *rxq_affinity_list;/* Requested affinity of rx queues. */
 };
@@ -557,6 +559,7 @@ struct dp_netdev_flow {
 
 bool dead;
 uint32_t mark;   /* Unique flow mark assigned to a flow */
+void *port_forward_txp;
 
 /* Statistics. */
 struct dp_netdev_flow_stats stats;
@@ -610,6 +613,7 @@ struct polled_queue {
 struct dp_netdev_rxq *rxq;
 odp_port_t port_no;
 bool emc_enabled;
+bool port_forward;
 bool rxq_enabled;
 uint64_t change_seq;
 };
@@ -628,6 +632,7 @@ struct tx_port {
 long long last_used;
 struct hmap_node node;
 long long flush_time;
+struct dp_netdev_flow *port_forward_flow;
 struct dp_packet_batch output_pkts;
 struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
 };
@@ -840,7 +845,8 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd,
   const struct nlattr *actions,
   size_t actions_len);
 static void dp_netdev_input(struct dp_netdev_pmd_thread *,
-struct dp_packet_batch *, odp_port_t port_no);
+struct dp_packet_batch *, odp_port_t port_no,
+bool port_forward);
 static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
   struct dp_packet_batch *);
 
@@ -2083,6 +2089,7 @@ port_create(const char *devname, const char *type,
 port->type = xstrdup(type);
 port->sf = NULL;
 port->emc_enabled = true;
+port->port_forward = false;
 port->need_reconfigure = true;
 ovs_mutex_init(&port->txq_used_mutex);
 
@@ -2845,6 +2852,15 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
 if (flow->mark != INVALID_FLOW_MARK) {
 queue_netdev_flow_del(pmd, flow);
 }
+if (flow->port_forward_txp) {
+struct tx_port *p = flow->port_forward_txp;
+if (p->port_forward_flow == flow) {
+VLOG_DBG("Deleting port_forward_flow: port: %p flow: %p",
+  p, flow);
+p->port_forward_flow = NULL;
+flow->port_forward_txp = NULL;
+}
+}
 flow->dead = true;
 
 dp_netdev_flow_unref(flow);
@@ -3664,6 +3680,7 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
 flow->dead = false;
 flow->batch = NULL;
 flow->mark = INVALID_FLOW_MARK;
+flow->port_forward_txp = NULL;
 *CONST_CAST(unsigned *, &flow->pmd_id) = pmd->core_id;
 *CONST_CAST(struct 

Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload

2021-04-08 Thread Sriharsha Basavapatna via dev
On Wed, Apr 7, 2021 at 2:50 PM Eli Britstein  wrote:

>
> On 4/7/2021 12:10 PM, Sriharsha Basavapatna wrote:
>
>
> On Sun, Apr 4, 2021 at 3:25 PM Eli Britstein  wrote:
>
>> VXLAN decap in OVS-DPDK configuration consists of two flows:
>> F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
>> F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
>>
>> F1 is a classification flow. It has outer headers matches and it
>> classifies the packet as a VXLAN packet, and using tnl_pop action the
>> packet continues processing in F2.
>> F2 is a flow that has matches on tunnel metadata as well as on the inner
>> packet headers (as any other flow).
>>
>> In order to fully offload VXLAN decap path, both F1 and F2 should be
>> offloaded. As there are more than one flow in HW, it is possible that
>> F1 is done by HW but F2 is not. Packet is received by SW, and should be
>> processed starting from F2 as F1 was already done by HW.
>> Rte_flows are applicable only on physical port IDs. Keeping the original
>> physical in port on which the packet is received on enables applying
>> vport flows (e.g. F2) on that physical port.
>>
>> This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
>> for tunnel offloads.
>>
>> Note that MLX5 PMD has a bug that the tnl_pop private actions must be
>> first. In OVS it is not.
>> Fixing this issue is scheduled for 21.05 (and stable 20.11.2).
>> Meanwhile, tests were done with a workaround for it [2].
>>
>> v2-v1:
>> - Tracking original in_port, and applying vport on that physical port
>> instead of all PFs.
>> v3-v2:
>> - Traversing ports using a new API instead of flow_dump.
>> - Refactor packet state recover logic, with bug fix for error pop_header.
>> - One ref count for netdev in non-tunnel case.
>> - Rename variables, comments, rebase.
>> v4-v3:
>> - Extract orig_in_port from physdev for flow modify.
>> - Miss handling fixes.
>> v5-v4:
>> - Drop refactor offload rule creation commit.
>> - Comment about setting in_port in restore.
>> - Refactor vports flow offload commit.
>> v6-v5:
>> - Fixed duplicate netdev ref bug.
>>
>
> Can you provide some info on this bug ?  and what changes were done to fix
> this ?
>
> With v5, the 2 netdevs sent to ufid_to_rte_flow_associate are always
> non-NULL (was not like this previously), and there was this line:
>
> +data->netdev = vport ? netdev_ref(vport) : physdev;
>
> As the "vport" was always non-null, even for non-tunnels, it took another
> ref of it, but in disassociate, only one close was done.
>
> With v6 it is like this (changed arguments names a bit)
>
> +data->physdev = netdev != physdev ? netdev_ref(physdev) : physdev;
>
> Checking the netdevs are different, not non-NULL.
>
> Thanks,
> -Harsha
>
>>
>> Travis:
>> v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
>> v2 :
>> https://travis-ci.org/github/elibritstein/OVS/builds/758382963
>> v3 :
>> https://travis-ci.org/github/elibritstein/OVS/builds/761089087
>> v4 :
>> https://travis-ci.org/github/elibritstein/OVS/builds/763146966
>> v5 :
>> https://travis-ci.org/github/elibritstein/OVS/builds/765271879
>> v6 :
>> https://travis-ci.org/github/elibritstein/OVS/builds/765816800
>>
>> GitHub Actions:
>> v1: https://github.com/elibritstein/OVS/actions/runs/515334647
>> v2: https://github.com/elibritstein/OVS/actions/runs/554986007
>> v3: https://github.com/elibritstein/OVS/actions/runs/613226225
>> v4: https://github.com/elibritstein/OVS/actions/runs/658415274
>> v5: https://github.com/elibritstein/OVS/actions/runs/704357369
>> v6: https://github.com/elibritstein/OVS/actions/runs/716304028
>>
>> [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
>> [2] https://github.com/elibritstein/dpdk-stable/pull/1
>>
>>
>> Eli Britstein (10):
>>   netdev-offload: Add HW miss packet state recover API
>>   netdev-dpdk: Introduce DPDK tunnel APIs
>>   netdev-offload: Introduce an API to traverse ports
>>   netdev-dpdk: Add flow_api support for netdev vxlan vports
>>   netdev-offload-dpdk: Implement HW miss packet recover for vport
>>   dpif-netdev: Add HW miss packet state recover logic
>>   netdev-offload-dpdk: Change log rate limits
>>   netdev-offload-dpdk: Support tunnel pop action
>>   netdev-offload-dpdk: Support vports flows offload
>>   netdev-dpdk-offload: Add vxlan pattern matching function
>>
>> Ilya Maximets (2):
>>   netdev-offload: Allow offloading to netdev without ifindex.
>>   netdev-offload: Disallow offloading to unrelated tunneling vports.
>>
>> Sriharsha Basavapatna (1):
>>   dpif-netdev: Provide orig_in_port in metadata for tunneled packets
>>
>>  Documentation/howto/dpdk.rst  |  

Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload

2021-04-07 Thread Sriharsha Basavapatna via dev
On Sun, Apr 4, 2021 at 3:25 PM Eli Britstein  wrote:

> VXLAN decap in OVS-DPDK configuration consists of two flows:
> F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
> F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
>
> F1 is a classification flow. It has outer headers matches and it
> classifies the packet as a VXLAN packet, and using tnl_pop action the
> packet continues processing in F2.
> F2 is a flow that has matches on tunnel metadata as well as on the inner
> packet headers (as any other flow).
>
> In order to fully offload VXLAN decap path, both F1 and F2 should be
> offloaded. As there are more than one flow in HW, it is possible that
> F1 is done by HW but F2 is not. Packet is received by SW, and should be
> processed starting from F2 as F1 was already done by HW.
> Rte_flows are applicable only on physical port IDs. Keeping the original
> physical in port on which the packet is received on enables applying
> vport flows (e.g. F2) on that physical port.
>
> This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
> for tunnel offloads.
>
> Note that MLX5 PMD has a bug that the tnl_pop private actions must be
> first. In OVS it is not.
> Fixing this issue is scheduled for 21.05 (and stable 20.11.2).
> Meanwhile, tests were done with a workaround for it [2].
>
> v2-v1:
> - Tracking original in_port, and applying vport on that physical port
> instead of all PFs.
> v3-v2:
> - Traversing ports using a new API instead of flow_dump.
> - Refactor packet state recover logic, with bug fix for error pop_header.
> - One ref count for netdev in non-tunnel case.
> - Rename variables, comments, rebase.
> v4-v3:
> - Extract orig_in_port from physdev for flow modify.
> - Miss handling fixes.
> v5-v4:
> - Drop refactor offload rule creation commit.
> - Comment about setting in_port in restore.
> - Refactor vports flow offload commit.
> v6-v5:
> - Fixed duplicate netdev ref bug.
>

Can you provide some info on this bug, and what changes were done to fix
this?
Thanks,
-Harsha

>
> Travis:
> v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
> v2 :
> https://travis-ci.org/github/elibritstein/OVS/builds/758382963
> v3 :
> https://travis-ci.org/github/elibritstein/OVS/builds/761089087
> v4 :
> https://travis-ci.org/github/elibritstein/OVS/builds/763146966
> v5 :
> https://travis-ci.org/github/elibritstein/OVS/builds/765271879
> v6 :
> https://travis-ci.org/github/elibritstein/OVS/builds/765816800
>
> GitHub Actions:
> v1: https://github.com/elibritstein/OVS/actions/runs/515334647
> v2: https://github.com/elibritstein/OVS/actions/runs/554986007
> v3: https://github.com/elibritstein/OVS/actions/runs/613226225
> v4: https://github.com/elibritstein/OVS/actions/runs/658415274
> v5: https://github.com/elibritstein/OVS/actions/runs/704357369
> v6: https://github.com/elibritstein/OVS/actions/runs/716304028
>
> [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
> [2] https://github.com/elibritstein/dpdk-stable/pull/1
>
>
> Eli Britstein (10):
>   netdev-offload: Add HW miss packet state recover API
>   netdev-dpdk: Introduce DPDK tunnel APIs
>   netdev-offload: Introduce an API to traverse ports
>   netdev-dpdk: Add flow_api support for netdev vxlan vports
>   netdev-offload-dpdk: Implement HW miss packet recover for vport
>   dpif-netdev: Add HW miss packet state recover logic
>   netdev-offload-dpdk: Change log rate limits
>   netdev-offload-dpdk: Support tunnel pop action
>   netdev-offload-dpdk: Support vports flows offload
>   netdev-dpdk-offload: Add vxlan pattern matching function
>
> Ilya Maximets (2):
>   netdev-offload: Allow offloading to netdev without ifindex.
>   netdev-offload: Disallow offloading to unrelated tunneling vports.
>
> Sriharsha Basavapatna (1):
>   dpif-netdev: Provide orig_in_port in metadata for tunneled packets
>
>  Documentation/howto/dpdk.rst  |   1 +
>  NEWS  |   2 +
>  lib/dpif-netdev.c |  97 +++--
>  lib/netdev-dpdk.c | 118 ++
>  lib/netdev-dpdk.h | 106 -
>  lib/netdev-offload-dpdk.c | 704 +++---
>  lib/netdev-offload-provider.h |   5 +
>  lib/netdev-offload-tc.c   |   8 +
>  lib/netdev-offload.c  |  47 ++-
>  lib/netdev-offload.h  |  10 +
>  lib/packets.h |   8 +-
>  11 files changed, 1022 insertions(+), 84 deletions(-)
>
> --
> 2.28.0.2311.g225365fb51
>
>


Re: [ovs-dev] [PATCH V4 00/14] Netdev vxlan-decap offload

2021-03-30 Thread Sriharsha Basavapatna via dev
On Thu, Mar 25, 2021 at 2:40 PM Eli Britstein  wrote:
>
> Hello,
>
> Note that MLX5 PMD has a bug that the tnl_pop private actions must be
> first. In OVS it is not.
>
> Fixing this issue is scheduled for 21.05 (and stable 20.11.2).
>
> Meanwhile, tests were done with a workaround for it.
>
> See https://github.com/elibritstein/dpdk-stable/pull/1
>
>
> Also, any other comments on this series?
>
>
> Thanks,
>
> Eli
>
>
> On 3/17/2021 8:35 AM, Eli Britstein wrote:
> > VXLAN decap in OVS-DPDK configuration consists of two flows:
> > F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
> > F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
> >
> > F1 is a classification flow. It has outer headers matches and it
> > classifies the packet as a VXLAN packet, and using tnl_pop action the
> > packet continues processing in F2.
> > F2 is a flow that has matches on tunnel metadata as well as on the inner
> > packet headers (as any other flow).
> >
> > In order to fully offload VXLAN decap path, both F1 and F2 should be
> > offloaded. As there are more than one flow in HW, it is possible that
> > F1 is done by HW but F2 is not. Packet is received by SW, and should be
> > processed starting from F2 as F1 was already done by HW.
> > Rte_flows are applicable only on physical port IDs. Keeping the original
> > physical in port on which the packet is received on enables applying
> > vport flows (e.g. F2) on that physical port.
> >
> > This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
> > for tunnel offloads.
> >
> > v2-v1:
> > - Tracking original in_port, and applying vport on that physical port 
> > instead of all PFs.
> > v3-v2:
> > - Traversing ports using a new API instead of flow_dump.
> > - Refactor packet state recover logic, with bug fix for error pop_header.
> > - One ref count for netdev in non-tunnel case.
> > - Rename variables, comments, rebase.
> > v4-v3:
> > - Extract orig_in_port from physdev for flow modify.
> > - Miss handling fixes.
> >
> > Travis:
> > v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
> > v2: https://travis-ci.org/github/elibritstein/OVS/builds/758382963
> > v3: https://travis-ci.org/github/elibritstein/OVS/builds/761089087
> > v4: https://travis-ci.org/github/elibritstein/OVS/builds/763146966
> >
> > GitHub Actions:
> > v1: https://github.com/elibritstein/OVS/actions/runs/515334647
> > v2: https://github.com/elibritstein/OVS/actions/runs/554986007
> > v3: https://github.com/elibritstein/OVS/actions/runs/613226225
> > v4: https://github.com/elibritstein/OVS/actions/runs/658415274
> >
> > [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
> >
> > Eli Britstein (11):
> >netdev-offload: Add HW miss packet state recover API
> >netdev-dpdk: Introduce DPDK tunnel APIs
> >netdev-offload: Introduce an API to traverse ports
> >netdev-dpdk: Add flow_api support for netdev vxlan vports
> >netdev-offload-dpdk: Implement HW miss packet recover for vport
> >dpif-netdev: Add HW miss packet state recover logic
> >netdev-offload-dpdk: Change log rate limits
> >netdev-offload-dpdk: Support tunnel pop action
> >netdev-offload-dpdk: Refactor offload rule creation
> >netdev-offload-dpdk: Support vports flows offload
> >netdev-dpdk-offload: Add vxlan pattern matching function
> >
> > Ilya Maximets (2):
> >netdev-offload: Allow offloading to netdev without ifindex.
> >netdev-offload: Disallow offloading to unrelated tunneling vports.
> >
> > Sriharsha Basavapatna (1):
> >dpif-netdev: Provide orig_in_port in metadata for tunneled packets
> >
> >   Documentation/howto/dpdk.rst  |   1 +
> >   NEWS  |   2 +
> >   lib/dpif-netdev.c |  97 +++--
> >   lib/netdev-dpdk.c | 118 +
> >   lib/netdev-dpdk.h | 106 -
> >   lib/netdev-offload-dpdk.c | 782 ++
> >   lib/netdev-offload-provider.h |   5 +
> >   lib/netdev-offload-tc.c   |   8 +
> >   lib/netdev-offload.c  |  47 +-
> >   lib/netdev-offload.h  |  10 +
> >   lib/packets.h |   8 +-
> >   11 files changed, 1052 insertions(+), 132 deletions(-)
> >

Looks good overall;  couple of minor comments in patches 5 and 13.
Thanks,
-Harsha


Re: [ovs-dev] [PATCH V4 13/14] netdev-offload-dpdk: Support vports flows offload

2021-03-30 Thread Sriharsha Basavapatna via dev
On Wed, Mar 17, 2021 at 12:05 PM Eli Britstein  wrote:
>
> Vports are virtual, OVS only logical devices, so rte_flows cannot be
> applied as is on them. Instead, apply the rules the physical port from
> which the packet has arrived, provided by orig_in_port field.
>
> Signed-off-by: Eli Britstein 
> Reviewed-by: Gaetan Rivet 
> ---
>  lib/netdev-offload-dpdk.c | 204 --
>  1 file changed, 171 insertions(+), 33 deletions(-)
>
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index ade7fae09..69aaefa0f 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -25,6 +25,7 @@
>  #include "netdev-offload-provider.h"
>  #include "netdev-provider.h"
>  #include "netdev-vport.h"
> +#include "odp-util.h"
>  #include "openvswitch/match.h"
>  #include "openvswitch/vlog.h"
>  #include "packets.h"
> @@ -62,6 +63,7 @@ struct ufid_to_rte_flow_data {
>  struct rte_flow *rte_flow;
>  bool actions_offloaded;
>  struct dpif_flow_stats stats;
> +struct netdev *physdev;
>  };
>
>  /* Find rte_flow with @ufid. */
> @@ -87,7 +89,8 @@ ufid_to_rte_flow_data_find(const ovs_u128 *ufid, bool warn)
>
>  static inline struct ufid_to_rte_flow_data *
>  ufid_to_rte_flow_associate(const ovs_u128 *ufid, struct netdev *netdev,
> -   struct rte_flow *rte_flow, bool actions_offloaded)
> +   struct netdev *vport, struct rte_flow *rte_flow,
> +   bool actions_offloaded)
>  {
>  size_t hash = hash_bytes(ufid, sizeof *ufid, 0);
>  struct ufid_to_rte_flow_data *data = xzalloc(sizeof *data);
> @@ -105,7 +108,8 @@ ufid_to_rte_flow_associate(const ovs_u128 *ufid, struct netdev *netdev,
>  }
>
>  data->ufid = *ufid;
> -data->netdev = netdev_ref(netdev);
> +data->physdev = netdev_ref(netdev);
> +data->netdev = vport ? netdev_ref(vport) : netdev;
>  data->rte_flow = rte_flow;
>  data->actions_offloaded = actions_offloaded;
>
> @@ -121,7 +125,10 @@ ufid_to_rte_flow_disassociate(struct ufid_to_rte_flow_data *data)
>
>  cmap_remove(&ufid_to_rte_flow,
>  CONST_CAST(struct cmap_node *, &data->node), hash);
> -netdev_close(data->netdev);
> +if (data->netdev != data->physdev) {
> +netdev_close(data->netdev);
> +}
> +netdev_close(data->physdev);
>  ovsrcu_postpone(free, data);
>  }
>
> @@ -134,6 +141,8 @@ struct flow_patterns {
>  struct rte_flow_item *items;
>  int cnt;
>  int current_max;
> +uint32_t tnl_pmd_items_cnt;
> +struct ds s_tnl;
>  };
>
>  struct flow_actions {
> @@ -154,16 +163,20 @@ struct flow_actions {
>  static void
>  dump_flow_attr(struct ds *s, struct ds *s_extra,
> const struct rte_flow_attr *attr,
> +   struct flow_patterns *flow_patterns,
> struct flow_actions *flow_actions)
>  {
>  if (flow_actions->tnl_pmd_actions_cnt) {
>  ds_clone(s_extra, &flow_actions->s_tnl);
> +} else if (flow_patterns->tnl_pmd_items_cnt) {
> +ds_clone(s_extra, &flow_patterns->s_tnl);
>  }
> -ds_put_format(s, "%s%spriority %"PRIu32" group %"PRIu32" %s%s",
> +ds_put_format(s, "%s%spriority %"PRIu32" group %"PRIu32" %s%s%s",
>attr->ingress  ? "ingress " : "",
>attr->egress   ? "egress " : "", attr->priority, attr->group,
>attr->transfer ? "transfer " : "",
> -  flow_actions->tnl_pmd_actions_cnt ? "tunnel_set 1 " : "");
> +  flow_actions->tnl_pmd_actions_cnt ? "tunnel_set 1 " : "",
> +  flow_patterns->tnl_pmd_items_cnt ? "tunnel_match 1 " : "");
>  }
>
>  /* Adds one pattern item 'field' with the 'mask' to dynamic string 's' using
> @@ -177,9 +190,18 @@ dump_flow_attr(struct ds *s, struct ds *s_extra,
>  }
>
>  static void
> -dump_flow_pattern(struct ds *s, const struct rte_flow_item *item)
> +dump_flow_pattern(struct ds *s,
> +  struct flow_patterns *flow_patterns,
> +  int pattern_index)
>  {
> -if (item->type == RTE_FLOW_ITEM_TYPE_ETH) {
> +const struct rte_flow_item *item = &flow_patterns->items[pattern_index];
> +
> +if (item->type == RTE_FLOW_ITEM_TYPE_END) {
> +ds_put_cstr(s, "end ");
> +} else if (flow_patterns->tnl_pmd_items_cnt &&
> +   pattern_index < flow_patterns->tnl_pmd_items_cnt) {
> +return;
> +} else if (item->type == RTE_FLOW_ITEM_TYPE_ETH) {
>  const struct rte_flow_item_eth *eth_spec = item->spec;
>  const struct rte_flow_item_eth *eth_mask = item->mask;
>
> @@ -569,19 +591,19 @@ dump_flow_action(struct ds *s, struct ds *s_extra,
>  static struct ds *
>  dump_flow(struct ds *s, struct ds *s_extra,
>const struct rte_flow_attr *attr,
> -  const struct rte_flow_item *items,
> +  struct flow_patterns *flow_patterns,
>struct flow_actions *flow_actions)
>  {
>  int i;
>
>  

Re: [ovs-dev] [PATCH V4 05/14] netdev-offload-dpdk: Implement HW miss packet recover for vport

2021-03-30 Thread Sriharsha Basavapatna via dev
On Wed, Mar 17, 2021 at 12:05 PM Eli Britstein  wrote:
>
> A miss in virtual port offloads means the flow with tnl_pop was
> offloaded, but not the following one. Recover the state and continue
> with SW processing.
>
> Signed-off-by: Eli Britstein 
> Reviewed-by: Gaetan Rivet 
> ---
>  lib/netdev-offload-dpdk.c | 150 ++
>  1 file changed, 150 insertions(+)
>
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index f2413f5be..c78089605 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -1588,6 +1588,155 @@ netdev_offload_dpdk_flow_flush(struct netdev *netdev)
>  return 0;
>  }
>
> +struct get_vport_netdev_aux {
> +struct rte_flow_tunnel *tunnel;
> +odp_port_t *odp_port;
> +struct netdev *vport;
> +};
> +
> +static bool
> +get_vxlan_netdev_cb(struct netdev *netdev,
> +odp_port_t odp_port,
> +void *aux_)
> +{
> +const struct netdev_tunnel_config *tnl_cfg;
> +struct get_vport_netdev_aux *aux = aux_;
> +
> +if (strcmp(netdev_get_type(netdev), "vxlan")) {
> +   return false;
> +}
> +
> +tnl_cfg = netdev_get_tunnel_config(netdev);
> +if (!tnl_cfg) {
> +VLOG_ERR_RL(&rl, "Cannot get a tunnel config for netdev %s",
> +netdev_get_name(netdev));
> +return false;
> +}
> +
> +if (tnl_cfg->dst_port == aux->tunnel->tp_dst) {
> +/* Found the netdev. Store the results and stop the traversal. */
> +aux->vport = netdev_ref(netdev);
> +*aux->odp_port = odp_port;
> +return true;
> +}
> +
> +return false;
> +}
> +
> +static struct netdev *
> +get_vxlan_netdev(const char *dpif_type,
> + struct rte_flow_tunnel *tunnel,
> + odp_port_t *odp_port)
> +{
> +struct get_vport_netdev_aux aux = {
> +.tunnel = tunnel,
> +.odp_port = odp_port,
> +.vport = NULL,
> +};
> +
> +netdev_ports_traverse(dpif_type, get_vxlan_netdev_cb, &aux);
> +return aux.vport;
> +}
> +
> +static struct netdev *
> +get_vport_netdev(const char *dpif_type,
> + struct rte_flow_tunnel *tunnel,
> + odp_port_t *odp_port)
> +{
> +if (tunnel->type == RTE_FLOW_ITEM_TYPE_VXLAN) {
> +return get_vxlan_netdev(dpif_type, tunnel, odp_port);
> +}
> +
> +OVS_NOT_REACHED();
> +}
> +
> +static int
> +netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev,
> +   struct dp_packet *packet)
> +{
> +struct rte_flow_restore_info rte_restore_info;
> +struct rte_flow_tunnel *rte_tnl;
> +struct netdev *vport_netdev;
> +struct rte_flow_error error;
> +struct pkt_metadata *md;
> +struct flow_tnl *md_tnl;
> +odp_port_t vport_odp;
> +int ret = 0;
> +
> +if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet,
> +  &rte_restore_info, &error)) {
> +/* This function is called for every packet, and in most cases there
> + * will be no restore info from the HW, thus an error is expected.
> + */
> +(void) error;
> +return 0;
> +}
> +
> +if (!(rte_restore_info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL)) {
> +return EOPNOTSUPP;
> +}
> +
> +rte_tnl = &rte_restore_info.tunnel;
> +vport_netdev = get_vport_netdev(netdev->dpif_type, rte_tnl,
> +&vport_odp);
> +if (!vport_netdev) {
> +VLOG_WARN("Could not find vport netdev");
> +return EOPNOTSUPP;
> +}
> +
> +md = &packet->md;
> +/* For tunnel recovery (RTE_FLOW_RESTORE_INFO_TUNNEL), the packet may
> + * still be encapsulated, or already decapsulated, as indicated by
> + * RTE_FLOW_RESTORE_INFO_ENCAPSULATED.
> + * If the flag is set, the packet is still encapsulated, and we do
> + * the pop in SW.
> + * If the flag is not set, the packet was already decapsulated by HW,
> + * and the tunnel info is provided in the tunnel struct. In this case
> + * we copy it into the OVS metadata.
> + */
> +if (rte_restore_info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED) {
> +if (!vport_netdev->netdev_class ||
> +!vport_netdev->netdev_class->pop_header) {
> +VLOG_ERR("vport netdev=%s with no pop_header method",
> + netdev_get_name(vport_netdev));
> +ret = EOPNOTSUPP;
> +goto close_vport_netdev;
> +}
> +parse_tcp_flags(packet);
> +if (vport_netdev->netdev_class->pop_header(packet) == NULL) {
> +/* If there is an error with popping the header, the packet is
> + * freed. In this case it should not continue SW processing.
> + */
> +ret = -1;
> +goto close_vport_netdev;
> +}
> +} else {
> +md_tnl = &md->tunnel;
> +if (rte_tnl->is_ipv6) {
> +memcpy(&md_tnl->ipv6_src, 

Re: [ovs-dev] [PATCH V3 12/14] dpif-netdev: Provide orig_in_port in metadata for tunneled packets

2021-03-15 Thread Sriharsha Basavapatna via dev
On Mon, Mar 15, 2021 at 3:57 PM Eli Britstein  wrote:
>
>
> On 3/15/2021 11:04 AM, Sriharsha Basavapatna wrote:
> > On Tue, Mar 2, 2021 at 4:56 PM Eli Britstein  wrote:
> >> From: Sriharsha Basavapatna 
> >>
> >> When an encapsulated packet is recirculated through a TUNNEL_POP
> >> action, the metadata gets reinitialized and the originating physical
> >> port information is lost. When this flow gets processed by the vport
> >> and it needs to be offloaded, we can't figure out the physical port
> >> through which the tunneled packet was received.
> >>
> >> Add a new member to the metadata: 'orig_in_port'. This is passed to
> >> the next stage during recirculation and the offload layer can use it
> >> to offload the flow to this physical port.
> >>
> >> Signed-off-by: Sriharsha Basavapatna 
> >> Signed-off-by: Eli Britstein 
> >> Reviewed-by: Gaetan Rivet 
> >> ---
> >>   lib/dpif-netdev.c| 20 ++--
> >>   lib/netdev-offload.h |  1 +
> >>   lib/packets.h|  8 +++-
> >>   3 files changed, 22 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >> index 58cad7ded..291c6eaa4 100644
> >> --- a/lib/dpif-netdev.c
> >> +++ b/lib/dpif-netdev.c
> >> @@ -430,6 +430,7 @@ struct dp_flow_offload_item {
> >>   struct match match;
> >>   struct nlattr *actions;
> >>   size_t actions_len;
> >> +odp_port_t orig_in_port; /* Originating in_port for tnl flows. */
> >>
> >>   struct ovs_list node;
> >>   };
> >> @@ -2695,11 +2696,13 @@ dp_netdev_flow_offload_put(struct 
> >> dp_flow_offload_item *offload)
> >>   }
> >>   }
> >>   info.flow_mark = mark;
> >> +info.orig_in_port = offload->orig_in_port;
> >>
> >>   port = netdev_ports_get(in_port, dpif_type_str);
> >>   if (!port) {
> >>   goto err_free;
> >>   }
> >> +
> >>   /* Taking a global 'port_mutex' to fulfill thread safety 
> >> restrictions for
> >>* the netdev-offload-dpdk module. */
> >>   ovs_mutex_lock(&pmd->dp->port_mutex);
> >> @@ -2797,7 +2800,8 @@ queue_netdev_flow_del(struct dp_netdev_pmd_thread 
> >> *pmd,
> >>   static void
> >>   queue_netdev_flow_put(struct dp_netdev_pmd_thread *pmd,
> >> struct dp_netdev_flow *flow, struct match *match,
> >> -  const struct nlattr *actions, size_t actions_len)
> >> +  const struct nlattr *actions, size_t actions_len,
> >> +  odp_port_t orig_in_port)
> >>   {
> >>   struct dp_flow_offload_item *offload;
> >>   int op;
> >> @@ -2823,6 +2827,7 @@ queue_netdev_flow_put(struct dp_netdev_pmd_thread 
> >> *pmd,
> >>   offload->actions = xmalloc(actions_len);
> >>   memcpy(offload->actions, actions, actions_len);
> >>   offload->actions_len = actions_len;
> >> +offload->orig_in_port = orig_in_port;
> >>
> >>   dp_netdev_append_flow_offload(offload);
> >>   }
> >> @@ -3624,7 +3629,8 @@ dp_netdev_get_mega_ufid(const struct match *match, 
> >> ovs_u128 *mega_ufid)
> >>   static struct dp_netdev_flow *
> >>   dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
> >>  struct match *match, const ovs_u128 *ufid,
> >> -   const struct nlattr *actions, size_t actions_len)
> >> +   const struct nlattr *actions, size_t actions_len,
> >> +   odp_port_t orig_in_port)
> >>   OVS_REQUIRES(pmd->flow_mutex)
> >>   {
> >>   struct ds extra_info = DS_EMPTY_INITIALIZER;
> >> @@ -3690,7 +3696,8 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
> >>   cmap_insert(&pmd->flow_table, CONST_CAST(struct cmap_node *, 
> >> &flow->node),
> >>   dp_netdev_flow_hash(&flow->ufid));
> >>
> >> -queue_netdev_flow_put(pmd, flow, match, actions, actions_len);
> >> +queue_netdev_flow_put(pmd, flow, match, actions, actions_len,
> >> +  orig_in_port);
> >>
> >>   if (OVS_UNLIKELY(!VLOG_DROP_DBG((&upcall_rl)))) {
> >>   struct ds ds = DS_EMPTY_INITIALIZER;
> >> @@ -3761,7 +3768,7 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
> >>   if (!netdev_flow) {
> >>   if (put->flags & DPIF_FP_CREATE) {
> >>   dp_netdev_flow_add(pmd, match, ufid, put->actions,
> >> -   put->actions_len);
> >> +   put->actions_len, ODPP_NONE);
> >>   } else {
> >>   error = ENOENT;
> >>   }
> >> @@ -3777,7 +3784,7 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
> >>   ovsrcu_set(&netdev_flow->actions, new_actions);
> >>
> >>   queue_netdev_flow_put(pmd, netdev_flow, match,
> >> -  put->actions, put->actions_len);
> >> +  put->actions, put->actions_len, 
> >> ODPP_NONE);
> >>
> >>   if (stats) {
> >>   get_dpif_flow_status(pmd->dp, netdev_flow, stats, NULL);
> >> @@ -7232,6 +7239,7 @@ 

Re: [ovs-dev] [PATCH V3 12/14] dpif-netdev: Provide orig_in_port in metadata for tunneled packets

2021-03-15 Thread Sriharsha Basavapatna via dev
On Tue, Mar 2, 2021 at 4:56 PM Eli Britstein  wrote:
>
> From: Sriharsha Basavapatna 
>
> When an encapsulated packet is recirculated through a TUNNEL_POP
> action, the metadata gets reinitialized and the originating physical
> port information is lost. When this flow gets processed by the vport
> and it needs to be offloaded, we can't figure out the physical port
> through which the tunneled packet was received.
>
> Add a new member to the metadata: 'orig_in_port'. This is passed to
> the next stage during recirculation and the offload layer can use it
> to offload the flow to this physical port.
>
> Signed-off-by: Sriharsha Basavapatna 
> Signed-off-by: Eli Britstein 
> Reviewed-by: Gaetan Rivet 
> ---
>  lib/dpif-netdev.c| 20 ++--
>  lib/netdev-offload.h |  1 +
>  lib/packets.h|  8 +++-
>  3 files changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 58cad7ded..291c6eaa4 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -430,6 +430,7 @@ struct dp_flow_offload_item {
>  struct match match;
>  struct nlattr *actions;
>  size_t actions_len;
> +odp_port_t orig_in_port; /* Originating in_port for tnl flows. */
>
>  struct ovs_list node;
>  };
> @@ -2695,11 +2696,13 @@ dp_netdev_flow_offload_put(struct 
> dp_flow_offload_item *offload)
>  }
>  }
>  info.flow_mark = mark;
> +info.orig_in_port = offload->orig_in_port;
>
>  port = netdev_ports_get(in_port, dpif_type_str);
>  if (!port) {
>  goto err_free;
>  }
> +
>  /* Taking a global 'port_mutex' to fulfill thread safety restrictions for
>   * the netdev-offload-dpdk module. */
>  ovs_mutex_lock(&pmd->dp->port_mutex);
> @@ -2797,7 +2800,8 @@ queue_netdev_flow_del(struct dp_netdev_pmd_thread *pmd,
>  static void
>  queue_netdev_flow_put(struct dp_netdev_pmd_thread *pmd,
>struct dp_netdev_flow *flow, struct match *match,
> -  const struct nlattr *actions, size_t actions_len)
> +  const struct nlattr *actions, size_t actions_len,
> +  odp_port_t orig_in_port)
>  {
>  struct dp_flow_offload_item *offload;
>  int op;
> @@ -2823,6 +2827,7 @@ queue_netdev_flow_put(struct dp_netdev_pmd_thread *pmd,
>  offload->actions = xmalloc(actions_len);
>  memcpy(offload->actions, actions, actions_len);
>  offload->actions_len = actions_len;
> +offload->orig_in_port = orig_in_port;
>
>  dp_netdev_append_flow_offload(offload);
>  }
> @@ -3624,7 +3629,8 @@ dp_netdev_get_mega_ufid(const struct match *match, 
> ovs_u128 *mega_ufid)
>  static struct dp_netdev_flow *
>  dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
> struct match *match, const ovs_u128 *ufid,
> -   const struct nlattr *actions, size_t actions_len)
> +   const struct nlattr *actions, size_t actions_len,
> +   odp_port_t orig_in_port)
>  OVS_REQUIRES(pmd->flow_mutex)
>  {
>  struct ds extra_info = DS_EMPTY_INITIALIZER;
> @@ -3690,7 +3696,8 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>  cmap_insert(&pmd->flow_table, CONST_CAST(struct cmap_node *, 
> &flow->node),
>  dp_netdev_flow_hash(&flow->ufid));
>
> -queue_netdev_flow_put(pmd, flow, match, actions, actions_len);
> +queue_netdev_flow_put(pmd, flow, match, actions, actions_len,
> +  orig_in_port);
>
>  if (OVS_UNLIKELY(!VLOG_DROP_DBG((&upcall_rl)))) {
>  struct ds ds = DS_EMPTY_INITIALIZER;
> @@ -3761,7 +3768,7 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>  if (!netdev_flow) {
>  if (put->flags & DPIF_FP_CREATE) {
>  dp_netdev_flow_add(pmd, match, ufid, put->actions,
> -   put->actions_len);
> +   put->actions_len, ODPP_NONE);
>  } else {
>  error = ENOENT;
>  }
> @@ -3777,7 +3784,7 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>  ovsrcu_set(&netdev_flow->actions, new_actions);
>
>  queue_netdev_flow_put(pmd, netdev_flow, match,
> -  put->actions, put->actions_len);
> +  put->actions, put->actions_len, ODPP_NONE);
>
>  if (stats) {
>  get_dpif_flow_status(pmd->dp, netdev_flow, stats, NULL);
> @@ -7232,6 +7239,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>  ovs_u128 ufid;
>  int error;
>  uint64_t cycles = cycles_counter_update(&pmd->perf_stats);
> +odp_port_t orig_in_port = packet->md.orig_in_port;
>
>  match.tun_md.valid = false;
>  miniflow_expand(&key->mf, &match.flow);
> @@ -7281,7 +7289,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>  if (OVS_LIKELY(!netdev_flow)) {
>  netdev_flow = dp_netdev_flow_add(pmd, &match, &ufid,
>   add_actions->data,
> -  

Re: [ovs-dev] [PATCH V2 13/14] netdev-offload-dpdk: Support vports flows offload

2021-03-01 Thread Sriharsha Basavapatna via dev
On Thu, Feb 25, 2021 at 7:51 PM Eli Britstein  wrote:
>
>
> On 2/25/2021 9:35 AM, Sriharsha Basavapatna wrote:
> > On Wed, Feb 24, 2021 at 4:50 PM Eli Britstein  wrote:
> >>
> >> On 2/24/2021 12:37 PM, Sriharsha Basavapatna wrote:
> >>> On Wed, Feb 10, 2021 at 8:57 PM Eli Britstein  wrote:
>  Vports are virtual, OVS-only logical devices, so rte_flows cannot be
>  applied as is on them. Instead, apply the rules to the physical port from
>  which the packet has arrived, provided by the orig_in_port field.
> 
>  Signed-off-by: Eli Britstein 
>  Reviewed-by: Gaetan Rivet 
>  ---
> lib/netdev-offload-dpdk.c | 169 ++
> 1 file changed, 137 insertions(+), 32 deletions(-)
> 
>  diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
>  index f6e668bff..ad47d717f 100644
>  --- a/lib/netdev-offload-dpdk.c
>  +++ b/lib/netdev-offload-dpdk.c
>  @@ -62,6 +62,7 @@ struct ufid_to_rte_flow_data {
> struct rte_flow *rte_flow;
> bool actions_offloaded;
> struct dpif_flow_stats stats;
>  +struct netdev *physdev;
> };
> 
> /* Find rte_flow with @ufid. */
>  @@ -87,7 +88,8 @@ ufid_to_rte_flow_data_find(const ovs_u128 *ufid, bool 
>  warn)
> 
> static inline struct ufid_to_rte_flow_data *
> ufid_to_rte_flow_associate(const ovs_u128 *ufid, struct netdev 
>  *netdev,
>  -   struct rte_flow *rte_flow, bool 
>  actions_offloaded)
>  +   struct netdev *vport, struct rte_flow 
>  *rte_flow,
>  +   bool actions_offloaded)
> {
> size_t hash = hash_bytes(ufid, sizeof *ufid, 0);
> struct ufid_to_rte_flow_data *data = xzalloc(sizeof *data);
>  @@ -105,7 +107,8 @@ ufid_to_rte_flow_associate(const ovs_u128 *ufid, 
>  struct netdev *netdev,
> }
> 
> data->ufid = *ufid;
>  -data->netdev = netdev_ref(netdev);
>  +data->netdev = vport ? netdev_ref(vport) : netdev_ref(netdev);
>  +data->physdev = netdev_ref(netdev);
> >>> For non-tunnel offloads, we end up getting two references to the same
> >>> 'netdev'; can we avoid this ? That is, get a reference to physdev only
> >>> for the vport case.
> >> I know. This is on purpose. It simplifies other places, for example
> >> query, to always use physdev, and always close both without any
> >> additional logic there.
> > Ok,  please add this as an inline comment so we know why it is done
> > this way. Also, you missed my last comment in this patch (see
> > VXLAN_DECAP action below).
>
> OK.
>
> Sorry, see below.
>
> >
> >
> data->rte_flow = rte_flow;
> data->actions_offloaded = actions_offloaded;
> 
>  @@ -122,6 +125,7 @@ ufid_to_rte_flow_disassociate(struct 
>  ufid_to_rte_flow_data *data)
> cmap_remove(&ufid_to_rte_flow,
> CONST_CAST(struct cmap_node *, &data->node), hash);
> netdev_close(data->netdev);
>  +netdev_close(data->physdev);
> >>> Similar comment, release reference to physdev only if we got a
> >>> reference earlier (i.e., physdev should be non-null only when netdev
> >>> is a vport).
> >> Right. As it is written this way, no need for any additional logic here.
> ovsrcu_postpone(free, data);
> }
> 
>  @@ -134,6 +138,8 @@ struct flow_patterns {
> struct rte_flow_item *items;
> int cnt;
> int current_max;
>  +uint32_t num_of_tnl_items;
> >>> change to --> num_pmd_tnl_items
> >> OK.
>  +struct ds s_tnl;
> };
> 
> struct flow_actions {
>  @@ -154,16 +160,20 @@ struct flow_actions {
> static void
> dump_flow_attr(struct ds *s, struct ds *s_extra,
>    const struct rte_flow_attr *attr,
>  +   struct flow_patterns *flow_patterns,
>    struct flow_actions *flow_actions)
> {
> if (flow_actions->num_of_tnl_actions) {
> ds_clone(s_extra, &flow_actions->s_tnl);
>  +} else if (flow_patterns->num_of_tnl_items) {
>  +ds_clone(s_extra, &flow_patterns->s_tnl);
> }
>  -ds_put_format(s, "%s%spriority %"PRIu32" group %"PRIu32" %s%s",
>  +ds_put_format(s, "%s%spriority %"PRIu32" group %"PRIu32" %s%s%s",
>   attr->ingress  ? "ingress " : "",
>   attr->egress   ? "egress " : "", attr->priority, 
>  attr->group,
>   attr->transfer ? "transfer " : "",
>  -  flow_actions->num_of_tnl_actions ? "tunnel_set 1 " : 
>  "");
>  +  flow_actions->num_of_tnl_actions ? "tunnel_set 1 " : 
>  "",
>  +  flow_patterns->num_of_tnl_items ? "tunnel_match 1 " : 
>  "");
> }
> 

Re: [ovs-dev] [PATCH V2 10/14] netdev-offload-dpdk: Support tunnel pop action

2021-03-01 Thread Sriharsha Basavapatna via dev
On Thu, Feb 25, 2021 at 7:50 PM Eli Britstein  wrote:
>
>
> On 2/25/2021 9:35 AM, Sriharsha Basavapatna wrote:
> > On Wed, Feb 24, 2021 at 1:50 PM Eli Britstein  wrote:
> >>
> >> On 2/24/2021 9:52 AM, Sriharsha Basavapatna wrote:
> >>> On Wed, Feb 10, 2021 at 8:57 PM Eli Britstein  wrote:
>  Support tunnel pop action.
> 
>  Signed-off-by: Eli Britstein 
>  Reviewed-by: Gaetan Rivet 
>  ---
> Documentation/howto/dpdk.rst |   1 +
> NEWS |   1 +
> lib/netdev-offload-dpdk.c| 173 ---
> 3 files changed, 160 insertions(+), 15 deletions(-)
> 
>  diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
>  index f0d45e47b..4918d80f3 100644
>  --- a/Documentation/howto/dpdk.rst
>  +++ b/Documentation/howto/dpdk.rst
>  @@ -398,6 +398,7 @@ Supported actions for hardware offload are:
> - VLAN Push/Pop (push_vlan/pop_vlan).
> - Modification of IPv6 
>  (set_field:->ipv6_src/ipv6_dst/mod_nw_ttl).
> - Clone/output (tnl_push and output) for encapsulating over a tunnel.
>  +- Tunnel pop, for changing from a physical port to a vport.
> 
> Further Reading
> ---
>  diff --git a/NEWS b/NEWS
>  index a7bffce97..6850d5621 100644
>  --- a/NEWS
>  +++ b/NEWS
>  @@ -26,6 +26,7 @@ v2.15.0 - xx xxx 
>    - DPDK:
>  * Removed support for vhost-user dequeue zero-copy.
>  * Add support for DPDK 20.11.
>  + * Add hardware offload support for tunnel pop action 
>  (experimental).
>    - Userspace datapath:
>  * Add the 'pmd' option to "ovs-appctl dpctl/dump-flows", which
>    restricts a flow dump to a single PMD thread if set.
>  diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
>  index 78f866080..493cc9159 100644
>  --- a/lib/netdev-offload-dpdk.c
>  +++ b/lib/netdev-offload-dpdk.c
>  @@ -140,15 +140,30 @@ struct flow_actions {
> struct rte_flow_action *actions;
> int cnt;
> int current_max;
>  +struct netdev *tnl_netdev;
>  +/* tnl_actions is the opaque array of actions returned by the PMD. 
>  */
>  +struct rte_flow_action *tnl_actions;
> >>> Why is this an opaque array ? Since it is struct rte_flow_action, OVS
> >>> knows the type and members. Is it opaque because the value of
> >>> rte_flow_action_type member is unknown to OVS ? Is it a private action
> >>> type and if so how does the PMD ensure that it doesn't conflict with
> >>> standard action types ?
> >> True it is not used by OVS, but that's not why it is opaque. Although it is
> >> struct rte_flow_action array, the PMD may use its own private actions,
> >> not defined in rte_flow.h, thus not known to the application.
> >>
> >> The details of this array is under the PMD's responsibility.
> > So this means the PMD has to pick a range of private action values
> > that are not defined in rte_flow.h,  but what if later a new action
> > type with the same value is added to rte_flow.h ?
> > The other question is if the PMD could use one of the existing action
> > types in rte_flow.h [i.e, to avoid defining its own private action
> > types] and return it in the opaque action array ?
>
> The goal of the API is to be able for each PMD to have its own
> implementation, private actions or not.
>
> For the application, this is opaque, as it doesn't know the details of it.
>
> If there is a change in rte_flow.h, it means ABI change in DPDK. DPDK
> release policy protects us in a sense.

Take this example: assume action type 100 is not yet defined in
rte_flow.h and a PMD uses this value for a new private action that it
defines. Later, if a new standard action type is added to rte_flow.h
with the same value, then the PMD has no way to distinguish if the
action is a standard action or its private action. Also, this private
action type defined by some vendor's PMD could be 100 and it could be
200 in another vendor's PMD. So don't we need to ensure that the
standard action types and private action types don't overlap ? One way
to handle this might be to reserve a range of values in rte_flow.h as
a vendor specific range, for example 100 to 200. And each PMD could
define its own private action types within this range, since it is
ensured that no standard action types would be defined in that range.

>
> >
>  +uint32_t num_of_tnl_actions;
>  +/* tnl_actions_pos is where the tunnel actions starts within the 
>  'actions'
>  + * field.
>  + */
>  +int tnl_actions_pos;
> >>> Names should indicate that they are private or pmd specific ?
> >>>
> >>> tnl_actions --> tnl_private_actions or tnl_pmd_actions
> >>> num_of_tnl_actions --> num_private_tnl_actions or num_pmd_tnl_actions
> >>> tnl_actions_pos --> tnl_private_actions_pos or tnl_pmd_actions_pos
> 

Re: [ovs-dev] [PATCH V2 11/14] netdev-offload-dpdk: Refactor offload rule creation

2021-02-25 Thread Sriharsha Basavapatna via dev
On Thu, Feb 25, 2021 at 7:51 PM Eli Britstein  wrote:
>
>
> On 2/25/2021 9:35 AM, Sriharsha Basavapatna wrote:
> > On Wed, Feb 24, 2021 at 3:47 PM Eli Britstein  wrote:
> >>
> >> On 2/24/2021 11:55 AM, Sriharsha Basavapatna wrote:
> >>> On Wed, Feb 10, 2021 at 8:57 PM Eli Britstein  wrote:
>  Refactor offload rule creation as a pre-step towards tunnel matching
>  that depends on the netdev.
> 
>  Signed-off-by: Eli Britstein 
>  Reviewed-by: Gaetan Rivet 
>  ---
> lib/netdev-offload-dpdk.c | 106 --
> 1 file changed, 45 insertions(+), 61 deletions(-)
> 
>  diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
>  index 493cc9159..f6e668bff 100644
>  --- a/lib/netdev-offload-dpdk.c
>  +++ b/lib/netdev-offload-dpdk.c
>  @@ -1009,30 +1009,6 @@ add_flow_mark_rss_actions(struct flow_actions 
>  *actions,
> add_flow_action(actions, RTE_FLOW_ACTION_TYPE_END, NULL);
> }
> 
>  -static struct rte_flow *
>  -netdev_offload_dpdk_mark_rss(struct flow_patterns *patterns,
>  - struct netdev *netdev,
>  - uint32_t flow_mark)
>  -{
>  -struct flow_actions actions = { .actions = NULL, .cnt = 0 };
>  -const struct rte_flow_attr flow_attr = {
>  -.group = 0,
>  -.priority = 0,
>  -.ingress = 1,
>  -.egress = 0
>  -};
>  -struct rte_flow_error error;
>  -struct rte_flow *flow;
>  -
>  -add_flow_mark_rss_actions(&actions, flow_mark, netdev);
>  -
>  -flow = netdev_offload_dpdk_flow_create(netdev, &flow_attr, 
>  patterns->items,
>  -   &actions, &error);
>  -
>  -free_flow_actions(&actions);
>  -return flow;
>  -}
>  -
> static void
> add_count_action(struct flow_actions *actions)
> {
>  @@ -1509,27 +1485,48 @@ parse_flow_actions(struct netdev *netdev,
> return 0;
> }
> 
>  -static struct rte_flow *
>  -netdev_offload_dpdk_actions(struct netdev *netdev,
>  -struct flow_patterns *patterns,
>  -struct nlattr *nl_actions,
>  -size_t actions_len)
>  +static struct ufid_to_rte_flow_data *
>  +create_netdev_offload(struct netdev *netdev,
>  +  const ovs_u128 *ufid,
>  +  struct flow_patterns *flow_patterns,
>  +  struct flow_actions *flow_actions,
>  +  bool enable_full,
>  +  uint32_t flow_mark)
> {
>  -const struct rte_flow_attr flow_attr = { .ingress = 1, .transfer = 
>  1 };
>  -struct flow_actions actions = { .actions = NULL, .cnt = 0 };
>  +struct rte_flow_attr flow_attr = { .ingress = 1, .transfer = 1, };
>  +struct flow_actions rss_actions = { .actions = NULL, .cnt = 0 };
>  +struct rte_flow_item *items = flow_patterns->items;
>  +struct ufid_to_rte_flow_data *flow_data = NULL;
>  +bool actions_offloaded = true;
> struct rte_flow *flow = NULL;
> struct rte_flow_error error;
>  -int ret;
> 
>  -ret = parse_flow_actions(netdev, &actions, nl_actions, actions_len);
>  -if (ret) {
>  -goto out;
>  +if (enable_full) {
>  +flow = netdev_offload_dpdk_flow_create(netdev, &flow_attr, 
>  items,
>  +   flow_actions, &error);
> }
>  -flow = netdev_offload_dpdk_flow_create(netdev, &flow_attr, 
>  patterns->items,
>  -   &actions, &error);
>  -out:
>  -free_flow_actions(&actions);
>  -return flow;
>  +
>  +if (!flow) {
>  +/* If we failed to offload the rule actions fallback to MARK+RSS
>  + * actions.
>  + */
> >>> A  debug message might be useful here, when we fallback to mark/rss 
> >>> action ?
> >> We can, but this is just a refactor commit, and this fallback is not
> >> new. Add it anyway?
> > I think it'd be useful to add a debug message here and also in
> > parse_flow_actions() to indicate that action offloading failed.
>
> For the fallback, we have this info by dpctl/dump-flows ("partial").
> Furthermore, it may flood the log, depending on PMD rte_flow support.
>
> For parse_flow_action failure, there are some prints there with the
> places of failure, to be more useful rather than just a generic failure
> message.
>
> Let's keep this commit as a refactor commit without any logic changes,
> that can be added later. What do you think?

Ok.

>
> >
>  +actions_offloaded = false;
>  +flow_attr.transfer = 0;
>  +add_flow_mark_rss_actions(&rss_actions, flow_mark, netdev);
> 

Re: [ovs-dev] [PATCH V2 05/14] netdev-offload-dpdk: Implement HW miss packet recover for vport

2021-02-25 Thread Sriharsha Basavapatna via dev
On Thu, Feb 25, 2021 at 7:50 PM Eli Britstein  wrote:
>
>
> On 2/25/2021 9:34 AM, Sriharsha Basavapatna wrote:
> > On Tue, Feb 23, 2021 at 7:24 PM Eli Britstein  wrote:
> >>
> >> On 2/23/2021 3:10 PM, Sriharsha Basavapatna wrote:
> >>> On Wed, Feb 10, 2021 at 8:57 PM Eli Britstein  wrote:
>  A miss in virtual port offloads means the flow with tnl_pop was
>  offloaded, but not the following one. Recover the state and continue
>  with SW processing.
> >>> Relates to my comment on Patch-1; please explain what is meant by
> >>> recovering the packet state.
> >> See my response there.
> > Responded to your comment.
>  Signed-off-by: Eli Britstein 
>  Reviewed-by: Gaetan Rivet 
>  ---
> lib/netdev-offload-dpdk.c | 95 +++
> 1 file changed, 95 insertions(+)
> 
>  diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
>  index 8cc90d0f1..21aa26b42 100644
>  --- a/lib/netdev-offload-dpdk.c
>  +++ b/lib/netdev-offload-dpdk.c
>  @@ -1610,6 +1610,100 @@ netdev_offload_dpdk_flow_dump_destroy(struct 
>  netdev_flow_dump *dump)
> return 0;
> }
> 
>  +static struct netdev *
>  +get_vport_netdev(const char *dpif_type,
>  + struct rte_flow_tunnel *tunnel,
>  + odp_port_t *odp_port)
>  +{
>  +const struct netdev_tunnel_config *tnl_cfg;
>  +struct netdev_flow_dump **netdev_dumps;
>  +struct netdev *vport = NULL;
>  +bool found = false;
>  +int num_ports = 0;
>  +int err;
>  +int i;
>  +
>  +netdev_dumps = netdev_ports_flow_dump_create(dpif_type, &num_ports, 
>  false);
> >>> This relates to my comment in Patch-3; flow_dump_create() in this
> >>> context is very confusing since we are not really dumping flows. We
> >>> might as well walk the list of ports/netdevs looking for a specific
> >>> netdev, just like other netdev_ports_*() routines in netdev-offload.c;
> >>> may be add a new function in netdev-offload.c:
> >> As in the response there. I used an existing API, not introducing new ones.
> > Like I said, this can be confusing that while trying to offload a
> > flow, we invoke flow-dump API. And the flow-dump API implementation
> > does nothing, apart from getting a reference to a netdev. IMO, it'd be
> > better to add a new API to keep it simple and clear, as shown below.
> >>> netdev_ports_get_tunnel_vport(dpif_type, tunnel_type, tp_dst):/*
> >>> example: tunnel_type == "vxlan", tp_dst = 4789 */
> >>>
> >>> HMAP_FOR_EACH (data, portno_node, &port_to_netdev) {
> >>>   if (netdev_get_dpif_type(data->netdev) == dpif_type &&
> >>>   netdev_get_type(data->netdev) == tunnel_type) {
> >>>   tnl_cfg = netdev_get_tunnel_config(data->netdev);
> >>>if (tnl_cfg && tnl_cfg->dst_port == tp_dst) {
> >>>
> >>>
> >>>   }
> >>>   }
> An alternative I can suggest is something like the following:
>
> New API:
>
> void
> netdev_ports_traverse(const char *dpif_type,
>void (*cb)(struct netdev *, odp_port_t, void *),
>void *cookie)
> {
>  struct port_to_netdev_data *data;
>
> ovs_rwlock_rdlock(&netdev_hmap_rwlock);
>  HMAP_FOR_EACH (data, portno_node, &port_to_netdev) {
>  if (netdev_get_dpif_type(data->netdev) == dpif_type) {
>  cb(data->netdev, data->dpif_port.port_no, cookie);
> }
> }
> ovs_rwlock_unlock(&netdev_hmap_rwlock);
> }
>
> Then, implement cb to find the tunnel in netdev-offload-dpdk.c. What do
> you think?

Yes, that should be ok; it might need some additional logic to stop
traversal when you find the required node ? Callback routine could
return a value that indicates continue or stop traversal ? Anyway,
please go ahead with the changes.

Thanks,
-Harsha

>
> >>>
>  +for (i = 0; i < num_ports; i++) {
>  +if (!found && tunnel->type == RTE_FLOW_ITEM_TYPE_VXLAN &&
>  +!strcmp(netdev_get_type(netdev_dumps[i]->netdev), "vxlan")) 
>  {
>  +tnl_cfg = netdev_get_tunnel_config(netdev_dumps[i]->netdev);
>  +if (tnl_cfg && tnl_cfg->dst_port == tunnel->tp_dst) {
>  +found = true;
>  +vport = netdev_dumps[i]->netdev;
>  +netdev_ref(vport);
>  +*odp_port = netdev_dumps[i]->port;
>  +}
>  +}
>  +err = netdev_flow_dump_destroy(netdev_dumps[i]);
>  +if (err != 0 && err != EOPNOTSUPP) {
>  +VLOG_ERR("failed dumping netdev: %s", ovs_strerror(err));
>  +}
>  +}
>  +return vport;
>  +}
>  +
>  +static int
>  +netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev,
>  +   struct dp_packet *packet)
>  +{
>  +struct 

Re: [ovs-dev] [PATCH V2 13/14] netdev-offload-dpdk: Support vports flows offload

2021-02-24 Thread Sriharsha Basavapatna via dev
On Wed, Feb 24, 2021 at 4:50 PM Eli Britstein  wrote:
>
>
> On 2/24/2021 12:37 PM, Sriharsha Basavapatna wrote:
> > On Wed, Feb 10, 2021 at 8:57 PM Eli Britstein  wrote:
> >> Vports are virtual, OVS-only logical devices, so rte_flows cannot be
> >> applied as is on them. Instead, apply the rules to the physical port from
> >> which the packet has arrived, provided by the orig_in_port field.
> >>
> >> Signed-off-by: Eli Britstein 
> >> Reviewed-by: Gaetan Rivet 
> >> ---
> >>   lib/netdev-offload-dpdk.c | 169 ++
> >>   1 file changed, 137 insertions(+), 32 deletions(-)
> >>
> >> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> >> index f6e668bff..ad47d717f 100644
> >> --- a/lib/netdev-offload-dpdk.c
> >> +++ b/lib/netdev-offload-dpdk.c
> >> @@ -62,6 +62,7 @@ struct ufid_to_rte_flow_data {
> >>   struct rte_flow *rte_flow;
> >>   bool actions_offloaded;
> >>   struct dpif_flow_stats stats;
> >> +struct netdev *physdev;
> >>   };
> >>
> >>   /* Find rte_flow with @ufid. */
> >> @@ -87,7 +88,8 @@ ufid_to_rte_flow_data_find(const ovs_u128 *ufid, bool 
> >> warn)
> >>
> >>   static inline struct ufid_to_rte_flow_data *
> >>   ufid_to_rte_flow_associate(const ovs_u128 *ufid, struct netdev *netdev,
> >> -   struct rte_flow *rte_flow, bool 
> >> actions_offloaded)
> >> +   struct netdev *vport, struct rte_flow 
> >> *rte_flow,
> >> +   bool actions_offloaded)
> >>   {
> >>   size_t hash = hash_bytes(ufid, sizeof *ufid, 0);
> >>   struct ufid_to_rte_flow_data *data = xzalloc(sizeof *data);
> >> @@ -105,7 +107,8 @@ ufid_to_rte_flow_associate(const ovs_u128 *ufid, 
> >> struct netdev *netdev,
> >>   }
> >>
> >>   data->ufid = *ufid;
> >> -data->netdev = netdev_ref(netdev);
> >> +data->netdev = vport ? netdev_ref(vport) : netdev_ref(netdev);
> >> +data->physdev = netdev_ref(netdev);
> > For non-tunnel offloads, we end up getting two references to the same
> > 'netdev'; can we avoid this ? That is, get a reference to physdev only
> > for the vport case.
> I know. This is on purpose. It simplifies other places, for example
> query, to always use physdev, and always close both without any
> additional logic there.

Ok, please add this as an inline comment so we know why it is done
this way. Also, you missed my last comment in this patch (see
VXLAN_DECAP action below).


> >>   data->rte_flow = rte_flow;
> >>   data->actions_offloaded = actions_offloaded;
> >>
> >> @@ -122,6 +125,7 @@ ufid_to_rte_flow_disassociate(struct 
> >> ufid_to_rte_flow_data *data)
> >>   cmap_remove(&ufid_to_rte_flow,
> >>   CONST_CAST(struct cmap_node *, &data->node), hash);
> >>   netdev_close(data->netdev);
> >> +netdev_close(data->physdev);
> > Similar comment, release reference to physdev only if we got a
> > reference earlier (i.e., physdev should be non-null only when netdev
> > is a vport).
> Right. As it is written this way, no need for any additional logic here.
> >>   ovsrcu_postpone(free, data);
> >>   }
> >>
> >> @@ -134,6 +138,8 @@ struct flow_patterns {
> >>   struct rte_flow_item *items;
> >>   int cnt;
> >>   int current_max;
> >> +uint32_t num_of_tnl_items;
> > change to --> num_pmd_tnl_items
> OK.
> >> +struct ds s_tnl;
> >>   };
> >>
> >>   struct flow_actions {
> >> @@ -154,16 +160,20 @@ struct flow_actions {
> >>   static void
> >>   dump_flow_attr(struct ds *s, struct ds *s_extra,
> >>  const struct rte_flow_attr *attr,
> >> +   struct flow_patterns *flow_patterns,
> >>  struct flow_actions *flow_actions)
> >>   {
> >>   if (flow_actions->num_of_tnl_actions) {
> >> ds_clone(s_extra, &flow_actions->s_tnl);
> >> +} else if (flow_patterns->num_of_tnl_items) {
> >> +ds_clone(s_extra, &flow_patterns->s_tnl);
> >>   }
> >> -ds_put_format(s, "%s%spriority %"PRIu32" group %"PRIu32" %s%s",
> >> +ds_put_format(s, "%s%spriority %"PRIu32" group %"PRIu32" %s%s%s",
> >> attr->ingress  ? "ingress " : "",
> >> attr->egress   ? "egress " : "", attr->priority, 
> >> attr->group,
> >> attr->transfer ? "transfer " : "",
> >> -  flow_actions->num_of_tnl_actions ? "tunnel_set 1 " : 
> >> "");
> >> +  flow_actions->num_of_tnl_actions ? "tunnel_set 1 " : "",
> >> +  flow_patterns->num_of_tnl_items ? "tunnel_match 1 " : 
> >> "");
> >>   }
> >>
> >>   /* Adds one pattern item 'field' with the 'mask' to dynamic string 's' 
> >> using
> >> @@ -177,9 +187,18 @@ dump_flow_attr(struct ds *s, struct ds *s_extra,
> >>   }
> >>
> >>   static void
> >> -dump_flow_pattern(struct ds *s, const struct rte_flow_item *item)
> >> +dump_flow_pattern(struct ds *s,
> >> +  struct flow_patterns *flow_patterns,
> >> +  int pattern_index)
> >>   {
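The unconditional double-reference scheme debated above (always take a reference on both 'netdev' and 'physdev', always close both) can be sketched with stand-in types. This is plain illustrative C, not OVS code; `associate`/`disassociate` and the visible `refcount` field are simplifications of the real `netdev_ref`/`netdev_close` pair:

```c
#include <stddef.h>

/* Stand-in for struct netdev with an inspectable refcount. */
struct netdev { int refcount; };

static struct netdev *netdev_ref(struct netdev *nd) { nd->refcount++; return nd; }
static void netdev_close(struct netdev *nd) { nd->refcount--; }

struct flow_data { struct netdev *netdev, *physdev; };

/* For a vport flow, 'netdev' is the vport and 'physdev' the uplink; for a
 * plain flow both point at the same device, which is simply referenced
 * twice.  Either way the disassociate path closes both unconditionally,
 * which is the "no additional logic" argument made in the thread. */
static void associate(struct flow_data *d, struct netdev *phys,
                      struct netdev *vport)
{
    d->netdev = vport ? netdev_ref(vport) : netdev_ref(phys);
    d->physdev = netdev_ref(phys);
}

static void disassociate(struct flow_data *d)
{
    netdev_close(d->netdev);
    netdev_close(d->physdev);
}
```

The cost of the extra reference on non-tunnel flows buys symmetric teardown: no branch is needed to decide which pointers to release.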

Re: [ovs-dev] [PATCH V2 11/14] netdev-offload-dpdk: Refactor offload rule creation

2021-02-24 Thread Sriharsha Basavapatna via dev
On Wed, Feb 24, 2021 at 3:47 PM Eli Britstein  wrote:
>
>
> On 2/24/2021 11:55 AM, Sriharsha Basavapatna wrote:
> > On Wed, Feb 10, 2021 at 8:57 PM Eli Britstein  wrote:
> >> Refactor offload rule creation as a pre-step towards tunnel matching
> >> that depends on the netdev.
> >>
> >> Signed-off-by: Eli Britstein 
> >> Reviewed-by: Gaetan Rivet 
> >> ---
> >>   lib/netdev-offload-dpdk.c | 106 --
> >>   1 file changed, 45 insertions(+), 61 deletions(-)
> >>
> >> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> >> index 493cc9159..f6e668bff 100644
> >> --- a/lib/netdev-offload-dpdk.c
> >> +++ b/lib/netdev-offload-dpdk.c
> >> @@ -1009,30 +1009,6 @@ add_flow_mark_rss_actions(struct flow_actions 
> >> *actions,
> >>   add_flow_action(actions, RTE_FLOW_ACTION_TYPE_END, NULL);
> >>   }
> >>
> >> -static struct rte_flow *
> >> -netdev_offload_dpdk_mark_rss(struct flow_patterns *patterns,
> >> - struct netdev *netdev,
> >> - uint32_t flow_mark)
> >> -{
> >> -struct flow_actions actions = { .actions = NULL, .cnt = 0 };
> >> -const struct rte_flow_attr flow_attr = {
> >> -.group = 0,
> >> -.priority = 0,
> >> -.ingress = 1,
> >> -.egress = 0
> >> -};
> >> -struct rte_flow_error error;
> >> -struct rte_flow *flow;
> >> -
> >> -add_flow_mark_rss_actions(&actions, flow_mark, netdev);
> >> -
> >> -flow = netdev_offload_dpdk_flow_create(netdev, &flow_attr,
> >> -   patterns->items,
> >> -   &actions, &error);
> >> -
> >> -free_flow_actions(&actions);
> >> -return flow;
> >> -}
> >> -
> >>   static void
> >>   add_count_action(struct flow_actions *actions)
> >>   {
> >> @@ -1509,27 +1485,48 @@ parse_flow_actions(struct netdev *netdev,
> >>   return 0;
> >>   }
> >>
> >> -static struct rte_flow *
> >> -netdev_offload_dpdk_actions(struct netdev *netdev,
> >> -struct flow_patterns *patterns,
> >> -struct nlattr *nl_actions,
> >> -size_t actions_len)
> >> +static struct ufid_to_rte_flow_data *
> >> +create_netdev_offload(struct netdev *netdev,
> >> +  const ovs_u128 *ufid,
> >> +  struct flow_patterns *flow_patterns,
> >> +  struct flow_actions *flow_actions,
> >> +  bool enable_full,
> >> +  uint32_t flow_mark)
> >>   {
> >> -const struct rte_flow_attr flow_attr = { .ingress = 1, .transfer = 1 
> >> };
> >> -struct flow_actions actions = { .actions = NULL, .cnt = 0 };
> >> +struct rte_flow_attr flow_attr = { .ingress = 1, .transfer = 1, };
> >> +struct flow_actions rss_actions = { .actions = NULL, .cnt = 0 };
> >> +struct rte_flow_item *items = flow_patterns->items;
> >> +struct ufid_to_rte_flow_data *flow_data = NULL;
> >> +bool actions_offloaded = true;
> >>   struct rte_flow *flow = NULL;
> >>   struct rte_flow_error error;
> >> -int ret;
> >>
> >> -ret = parse_flow_actions(netdev, &actions, nl_actions, actions_len);
> >> -if (ret) {
> >> -goto out;
> >> +if (enable_full) {
> >> +flow = netdev_offload_dpdk_flow_create(netdev, &flow_attr, items,
> >> +   flow_actions, &error);
> >>   }
> >> -flow = netdev_offload_dpdk_flow_create(netdev, &flow_attr,
> >> -   patterns->items,
> >> -   &actions, &error);
> >> -out:
> >> -free_flow_actions(&actions);
> >> -return flow;
> >> +
> >> +if (!flow) {
> >> +/* If we failed to offload the rule actions fallback to MARK+RSS
> >> + * actions.
> >> + */
> > A debug message might be useful here, when we fall back to the mark/rss action?
> We can, but this is just a refactor commit, and this fallback is not
> new. Add it anyway?

I think it'd be useful to add a debug message here and also in
parse_flow_actions() to indicate that action offloading failed.

> >> +actions_offloaded = false;
> >> +flow_attr.transfer = 0;
> >> +add_flow_mark_rss_actions(&rss_actions, flow_mark, netdev);
> >> +flow = netdev_offload_dpdk_flow_create(netdev, &flow_attr, items,
> >> +   &rss_actions, &error);
> >> +}
> >> +
> >> +if (flow) {
> >> +flow_data = ufid_to_rte_flow_associate(ufid, netdev, flow,
> >> +   actions_offloaded);
> >> +VLOG_DBG("%s: installed flow %p by ufid "UUID_FMT,
> >> + netdev_get_name(netdev), flow,
> >> + UUID_ARGS((struct uuid *) ufid));
> >> +}
> >> +
> >> +free_flow_actions(&rss_actions);
> > This free is needed only when we fall back to mark/rss offload, not 
> > otherwise.
> OK.
> >
> >> +return flow_data;
> >>   }
> >>
> >>   static struct ufid_to_rte_flow_data *
> >> @@ -1541,9 +1538,9 @@ 

Re: [ovs-dev] [PATCH V2 10/14] netdev-offload-dpdk: Support tunnel pop action

2021-02-24 Thread Sriharsha Basavapatna via dev
On Wed, Feb 24, 2021 at 1:50 PM Eli Britstein  wrote:
>
>
> On 2/24/2021 9:52 AM, Sriharsha Basavapatna wrote:
> > On Wed, Feb 10, 2021 at 8:57 PM Eli Britstein  wrote:
> >> Support tunnel pop action.
> >>
> >> Signed-off-by: Eli Britstein 
> >> Reviewed-by: Gaetan Rivet 
> >> ---
> >>   Documentation/howto/dpdk.rst |   1 +
> >>   NEWS |   1 +
> >>   lib/netdev-offload-dpdk.c| 173 ---
> >>   3 files changed, 160 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> >> index f0d45e47b..4918d80f3 100644
> >> --- a/Documentation/howto/dpdk.rst
> >> +++ b/Documentation/howto/dpdk.rst
> >> @@ -398,6 +398,7 @@ Supported actions for hardware offload are:
> >>   - VLAN Push/Pop (push_vlan/pop_vlan).
> >>   - Modification of IPv6 (set_field:->ipv6_src/ipv6_dst/mod_nw_ttl).
> >>   - Clone/output (tnl_push and output) for encapsulating over a tunnel.
> >> +- Tunnel pop, for changing from a physical port to a vport.
> >>
> >>   Further Reading
> >>   ---
> >> diff --git a/NEWS b/NEWS
> >> index a7bffce97..6850d5621 100644
> >> --- a/NEWS
> >> +++ b/NEWS
> >> @@ -26,6 +26,7 @@ v2.15.0 - xx xxx 
> >>  - DPDK:
> >>* Removed support for vhost-user dequeue zero-copy.
> >>* Add support for DPDK 20.11.
> >> + * Add hardware offload support for tunnel pop action (experimental).
> >>  - Userspace datapath:
> >>* Add the 'pmd' option to "ovs-appctl dpctl/dump-flows", which
> >>  restricts a flow dump to a single PMD thread if set.
> >> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> >> index 78f866080..493cc9159 100644
> >> --- a/lib/netdev-offload-dpdk.c
> >> +++ b/lib/netdev-offload-dpdk.c
> >> @@ -140,15 +140,30 @@ struct flow_actions {
> >>   struct rte_flow_action *actions;
> >>   int cnt;
> >>   int current_max;
> >> +struct netdev *tnl_netdev;
> >> +/* tnl_actions is the opaque array of actions returned by the PMD. */
> >> +struct rte_flow_action *tnl_actions;
> > Why is this an opaque array ? Since it is struct rte_flow_action, OVS
> > knows the type and members. Is it opaque because the value of
> > rte_flow_action_type member is unknown to OVS ? Is it a private action
> > type and if so how does the PMD ensure that it doesn't conflict with
> > standard action types ?
>
> True, it is not used by OVS, but that's not why it is opaque. Although it is
> struct rte_flow_action array, the PMD may use its own private actions,
> not defined in rte_flow.h, thus not known to the application.
>
> The details of this array are under the PMD's responsibility.

So this means the PMD has to pick a range of private action values
that are not defined in rte_flow.h, but what if a new action type with
the same value is later added to rte_flow.h?
The other question is whether the PMD could use one of the existing
action types in rte_flow.h [i.e., to avoid defining its own private
action types] and return it in the opaque action array?

>
> >
> >> +uint32_t num_of_tnl_actions;
> >> +/* tnl_actions_pos is where the tunnel actions starts within the 
> >> 'actions'
> >> + * field.
> >> + */
> >> +int tnl_actions_pos;
> > Names should indicate that they are private or pmd specific ?
> >
> > tnl_actions --> tnl_private_actions or tnl_pmd_actions
> > num_of_tnl_actions --> num_private_tnl_actions or num_pmd_tnl_actions
> > tnl_actions_pos --> tnl_private_actions_pos or tnl_pmd_actions_pos
> OK. _pmd_
> >
> >> +struct ds s_tnl;
> >>   };
> >>
> >>   static void
> >> -dump_flow_attr(struct ds *s, const struct rte_flow_attr *attr)
> >> +dump_flow_attr(struct ds *s, struct ds *s_extra,
> >> +   const struct rte_flow_attr *attr,
> >> +   struct flow_actions *flow_actions)
> >>   {
> >> -ds_put_format(s, "%s%spriority %"PRIu32" group %"PRIu32" %s",
> >> +if (flow_actions->num_of_tnl_actions) {
> >> +ds_clone(s_extra, &flow_actions->s_tnl);
> >> +}
> >> +ds_put_format(s, "%s%spriority %"PRIu32" group %"PRIu32" %s%s",
> >> attr->ingress  ? "ingress " : "",
> >> attr->egress   ? "egress " : "", attr->priority, 
> >> attr->group,
> >> -  attr->transfer ? "transfer " : "");
> >> +  attr->transfer ? "transfer " : "",
> >> +  flow_actions->num_of_tnl_actions ? "tunnel_set 1 " : 
> >> "");
> >>   }
> >>
> >>   /* Adds one pattern item 'field' with the 'mask' to dynamic string 's' 
> >> using
> >> @@ -395,9 +410,19 @@ dump_vxlan_encap(struct ds *s, const struct 
> >> rte_flow_item *items)
> >>
> >>   static void
> >>   dump_flow_action(struct ds *s, struct ds *s_extra,
> >> - const struct rte_flow_action *actions)
> >> + struct flow_actions *flow_actions, int act_index)
> >>   {
> >> -if (actions->type == RTE_FLOW_ACTION_TYPE_MARK) {
> >> +const struct 
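The bookkeeping discussed in this thread (an opaque PMD-provided prefix of tunnel actions, plus `num_of_tnl_actions`/`tnl_actions_pos` so dump and destroy paths can skip that prefix) can be sketched with stand-in types. This is plain illustrative C, not OVS or DPDK code; the struct and function names are simplified stand-ins for `struct rte_flow_action` and the patch's helpers:

```c
#include <string.h>

/* Stand-in for struct rte_flow_action: only the type tag matters here. */
struct action { int type; };

#define MAX_ACTIONS 16

struct flow_actions {
    struct action actions[MAX_ACTIONS];
    int cnt;
    int num_tnl_pmd_actions;   /* length of the opaque PMD prefix */
    int tnl_pmd_actions_pos;   /* index where that prefix starts */
};

/* Record where the PMD-provided tunnel actions sit inside 'actions',
 * mirroring the patch's add_tnl_pmd_actions(). */
static void add_tnl_pmd_actions(struct flow_actions *fa,
                                const struct action *pmd, int n)
{
    fa->tnl_pmd_actions_pos = fa->cnt;
    memcpy(fa->actions + fa->cnt, pmd, n * sizeof *pmd);
    fa->cnt += n;
    fa->num_tnl_pmd_actions = n;
}

static void add_action(struct flow_actions *fa, int type)
{
    fa->actions[fa->cnt++].type = type;
}

/* The dump path skips exactly this range, since the action types in it
 * may be private to the PMD and unknown to the application. */
static int is_tnl_pmd_action(const struct flow_actions *fa, int i)
{
    return fa->num_tnl_pmd_actions
           && i >= fa->tnl_pmd_actions_pos
           && i < fa->tnl_pmd_actions_pos + fa->num_tnl_pmd_actions;
}
```

This is why the array can stay "opaque" to OVS: consumers only need the position and count, never the prefix's contents.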

Re: [ovs-dev] [PATCH V2 05/14] netdev-offload-dpdk: Implement HW miss packet recover for vport

2021-02-24 Thread Sriharsha Basavapatna via dev
On Tue, Feb 23, 2021 at 7:24 PM Eli Britstein  wrote:
>
>
> On 2/23/2021 3:10 PM, Sriharsha Basavapatna wrote:
> > On Wed, Feb 10, 2021 at 8:57 PM Eli Britstein  wrote:
> >> A miss in virtual port offloads means the flow with tnl_pop was
> >> offloaded, but not the following one. Recover the state and continue
> >> with SW processing.
> > Relates to my comment on Patch-1; please explain what is meant by
> > recovering the packet state.
> See my response there.

Responded to your comment.
> >
> >> Signed-off-by: Eli Britstein 
> >> Reviewed-by: Gaetan Rivet 
> >> ---
> >>   lib/netdev-offload-dpdk.c | 95 +++
> >>   1 file changed, 95 insertions(+)
> >>
> >> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> >> index 8cc90d0f1..21aa26b42 100644
> >> --- a/lib/netdev-offload-dpdk.c
> >> +++ b/lib/netdev-offload-dpdk.c
> >> @@ -1610,6 +1610,100 @@ netdev_offload_dpdk_flow_dump_destroy(struct 
> >> netdev_flow_dump *dump)
> >>   return 0;
> >>   }
> >>
> >> +static struct netdev *
> >> +get_vport_netdev(const char *dpif_type,
> >> + struct rte_flow_tunnel *tunnel,
> >> + odp_port_t *odp_port)
> >> +{
> >> +const struct netdev_tunnel_config *tnl_cfg;
> >> +struct netdev_flow_dump **netdev_dumps;
> >> +struct netdev *vport = NULL;
> >> +bool found = false;
> >> +int num_ports = 0;
> >> +int err;
> >> +int i;
> >> +
> >> +netdev_dumps = netdev_ports_flow_dump_create(dpif_type, &num_ports,
> >> false);
> > This relates to my comment in Patch-3; flow_dump_create() in this
> > context is very confusing since we are not really dumping flows. We
> > might as well walk the list of ports/netdevs looking for a specific
> > netdev, just like other netdev_ports_*() routines in netdev-offload.c;
maybe add a new function in netdev-offload.c:
> As in the response there. I used an existing API, not introducing new ones.

Like I said, this can be confusing that while trying to offload a
flow, we invoke flow-dump API. And the flow-dump API implementation
does nothing, apart from getting a reference to a netdev. IMO, it'd be
better to add a new API to keep it simple and clear, as shown below.
> >
> > netdev_ports_get_tunnel_vport(dpif_type, tunnel_type, tp_dst): /*
> > example: tunnel_type == "vxlan", tp_dst == 4789 */
> >
> > HMAP_FOR_EACH (data, portno_node, &port_to_netdev) {
> >     if (!strcmp(netdev_get_dpif_type(data->netdev), dpif_type) &&
> >         !strcmp(netdev_get_type(data->netdev), tunnel_type)) {
> >         tnl_cfg = netdev_get_tunnel_config(data->netdev);
> >         if (tnl_cfg && tnl_cfg->dst_port == tp_dst) {
> >             netdev_ref(data->netdev);
> >             return data->netdev;
> >         }
> >     }
> > }
> >
> >> +for (i = 0; i < num_ports; i++) {
> >> +if (!found && tunnel->type == RTE_FLOW_ITEM_TYPE_VXLAN &&
> >> +!strcmp(netdev_get_type(netdev_dumps[i]->netdev), "vxlan")) {
> >> +tnl_cfg = netdev_get_tunnel_config(netdev_dumps[i]->netdev);
> >> +if (tnl_cfg && tnl_cfg->dst_port == tunnel->tp_dst) {
> >> +found = true;
> >> +vport = netdev_dumps[i]->netdev;
> >> +netdev_ref(vport);
> >> +*odp_port = netdev_dumps[i]->port;
> >> +}
> >> +}
> >> +err = netdev_flow_dump_destroy(netdev_dumps[i]);
> >> +if (err != 0 && err != EOPNOTSUPP) {
> >> +VLOG_ERR("failed dumping netdev: %s", ovs_strerror(err));
> >> +}
> >> +}
> >> +return vport;
> >> +}
> >> +
> >> +static int
> >> +netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev,
> >> +   struct dp_packet *packet)
> >> +{
> >> +struct rte_flow_restore_info rte_restore_info;
> >> +struct rte_flow_tunnel *rte_tnl;
> >> +struct rte_flow_error error;
> >> +struct netdev *vport_netdev;
> >> +struct pkt_metadata *md;
> >> +struct flow_tnl *md_tnl;
> >> +odp_port_t vport_odp;
> >> +
> >> +if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet,
> >> +  &rte_restore_info, &error)) 
> >> {
> >> +/* This function is called for every packet, and in most cases 
> >> there
> >> + * will be no restore info from the HW, thus error is expected.
> > Right now this API (get_restore_info) is needed only in the case of
> > tunnel offloads; is there some way we could avoid calling this for
> > every packet (e.g non-offloaded, non-tunnel-offloaded packets) ?
> No other way I can think of. Suggestions?
Suggested in Patch-6 (code refactoring in dfc_processing())

> > What is expected from the PMD using the given 'packet' argument ? This
> > is not clear from the API description in rte_flow.h.
>
> The PMD is expected to use any data on the packet in order to provide
> the HW info.
>
> mlx5 PMD uses mark to do it, but each PMD is free to use whatever
> implementation suitable for its HW.
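The port lookup debated above (both the flow-dump walk in `get_vport_netdev()` and the suggested `netdev_ports_get_tunnel_vport()` helper) boils down to the same search: find the tunnel netdev matching a tunnel type and destination port, and take a reference. A minimal self-contained sketch with stand-in types (plain C, not OVS code; `find_tunnel_vport` and the struct layouts are illustrative only):

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for struct netdev and its tunnel config. */
struct tunnel_config { int dst_port; };
struct port {
    const char *type;            /* e.g. "vxlan", "geneve", "system" */
    struct tunnel_config tnl_cfg;
    int refcount;
};

/* Return the vport whose tunnel type and UDP destination port match,
 * taking a reference the caller must release; NULL if none matches. */
static struct port *find_tunnel_vport(struct port *ports, size_t n,
                                      const char *type, int dst_port)
{
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(ports[i].type, type)
            && ports[i].tnl_cfg.dst_port == dst_port) {
            ports[i].refcount++;   /* mirrors netdev_ref() */
            return &ports[i];
        }
    }
    return NULL;
}
```

Whether this walk happens through the flow-dump API or a dedicated helper is the interface question; the matching logic itself is identical.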

Re: [ovs-dev] [PATCH V2 03/14] netdev-offload-dpdk: Implement flow dump create/destroy APIs

2021-02-24 Thread Sriharsha Basavapatna via dev
On Tue, Feb 23, 2021 at 6:55 PM Eli Britstein  wrote:
>
>
> On 2/23/2021 3:10 PM, Sriharsha Basavapatna wrote:
>
> On Wed, Feb 10, 2021 at 8:57 PM Eli Britstein  wrote:
>
> When offloading vports, we don't configure rte_flow on the vport itself,
> as it is not a physical dpdk port, but rather on uplinks. Implement
> those APIs as a pre-step to enable iterate over the ports.
>
> We don't need these flow_dump APIs, since we are not really dumping
> any flows here and also orig_in_port is provided to the offload layer
> (Patch 12).
>
> We still need them to traverse the ports to get the vxlan netdev in case of a 
> miss. See patch #5:
>
> 37056941f netdev-offload-dpdk: Implement HW miss packet recover for vport
>
> There, see get_vport_netdev.
>
> The naming is because this is "flow_api" and the implementation is of an 
> existing API rather than introducing new one(s).

I've suggested a new API (see my comments in Patch 5).
>
> Thanks,
> -Harsha
>
>

-- 
This electronic communication and the information and any files transmitted 
with it, or attached to it, are confidential and are intended solely for 
the use of the individual or entity to whom it is addressed and may contain 
information that is confidential, legally privileged, protected by privacy 
laws, or otherwise restricted from disclosure to anyone else. If you are 
not the intended recipient or the person responsible for delivering the 
e-mail to the intended recipient, you are hereby notified that any use, 
copying, distributing, dissemination, forwarding, printing, or copying of 
this e-mail is strictly prohibited. If you received this e-mail in error, 
please return the e-mail to the sender, delete it from your computer, and 
destroy any printed copy of it.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH V2 01/14] netdev-offload: Add HW miss packet state recover API

2021-02-24 Thread Sriharsha Basavapatna via dev
On Tue, Feb 23, 2021 at 6:51 PM Eli Britstein  wrote:
>
>
> On 2/23/2021 3:10 PM, Sriharsha Basavapatna wrote:
>
> On Wed, Feb 10, 2021 at 8:57 PM Eli Britstein  wrote:
>
> When the HW offload involves multiple flows, like in tunnel decap path,
> it is possible that not all flows in the path are offloaded, resulting
> in partial processing in HW. In order to proceed with rest of the
> processing in SW, the packet state has to be recovered as if it was
> processed in SW from the beginning. Add API for that.
>
> Can you be more specific/clear on what this API does ? What specific
> packet state is this referring to and what is meant by recovering the
> state here ? For example, if recovering  the packet state means to
> check if the packet is encapsulated and to pop the tunnel header, then
> it would make it clear to just state that.
> Thanks,
> -Harsha
>
> The state refers to the state provided by the HW. This patch introduces a 
> generic API to support all cases.
>
> The case of popping the tunnel header in SW, when the provided info says the
> packet is encapsulated, is just one particular case.

Private case? IMO, the API/interface should provide sufficient
information on what is meant by the state and recovery for each use
case, starting with tunnel encapsulated packets for now.

>
>
>
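For the tunnel-decap case raised above, "recovering the packet state" means rebuilding the packet metadata as if the partially offloaded tnl_pop had been executed in software, then retargeting the packet to the vport so SW processing resumes at the right point. A minimal sketch with simplified stand-in types (not the real OVS `pkt_metadata` or DPDK `rte_flow_restore_info` layouts):

```c
/* Simplified stand-ins for the HW-provided restore info and packet
 * metadata; field sets are trimmed to the essentials. */
struct restore_info {
    int tunnel_valid;                 /* HW matched and reported the outer header */
    unsigned int ip_src, ip_dst;      /* outer IPv4 addresses */
    unsigned long long tun_id;        /* tunnel key, e.g. a VXLAN VNI */
};

struct pkt_metadata {
    struct {
        unsigned int ip_src, ip_dst;
        unsigned long long tun_id;
    } tunnel;
    int in_port;
};

/* Rebuild the state the offloaded tnl_pop would have produced in SW:
 * populate the tunnel metadata from the restore info and make the
 * packet appear to arrive on the vport. */
static int recover_packet_state(const struct restore_info *info,
                                int vport_port_no, struct pkt_metadata *md)
{
    if (!info->tunnel_valid) {
        return -1;    /* no HW state to recover; process the packet as-is */
    }
    md->tunnel.ip_src = info->ip_src;
    md->tunnel.ip_dst = info->ip_dst;
    md->tunnel.tun_id = info->tun_id;
    md->in_port = vport_port_no;
    return 0;
}
```

Other recovery cases (whatever state a given HW reports) would follow the same shape, which is why the API is kept generic.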



Re: [ovs-dev] [PATCH V2 13/14] netdev-offload-dpdk: Support vports flows offload

2021-02-24 Thread Sriharsha Basavapatna via dev
On Wed, Feb 10, 2021 at 8:57 PM Eli Britstein  wrote:
>
> Vports are virtual, OVS-only logical devices, so rte_flows cannot be
> applied as is on them. Instead, apply the rules to the physical port from
> which the packet has arrived, provided by orig_in_port field.
>
> Signed-off-by: Eli Britstein 
> Reviewed-by: Gaetan Rivet 
> ---
>  lib/netdev-offload-dpdk.c | 169 ++
>  1 file changed, 137 insertions(+), 32 deletions(-)
>
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index f6e668bff..ad47d717f 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -62,6 +62,7 @@ struct ufid_to_rte_flow_data {
>  struct rte_flow *rte_flow;
>  bool actions_offloaded;
>  struct dpif_flow_stats stats;
> +struct netdev *physdev;
>  };
>
>  /* Find rte_flow with @ufid. */
> @@ -87,7 +88,8 @@ ufid_to_rte_flow_data_find(const ovs_u128 *ufid, bool warn)
>
>  static inline struct ufid_to_rte_flow_data *
>  ufid_to_rte_flow_associate(const ovs_u128 *ufid, struct netdev *netdev,
> -   struct rte_flow *rte_flow, bool actions_offloaded)
> +   struct netdev *vport, struct rte_flow *rte_flow,
> +   bool actions_offloaded)
>  {
>  size_t hash = hash_bytes(ufid, sizeof *ufid, 0);
>  struct ufid_to_rte_flow_data *data = xzalloc(sizeof *data);
> @@ -105,7 +107,8 @@ ufid_to_rte_flow_associate(const ovs_u128 *ufid, struct 
> netdev *netdev,
>  }
>
>  data->ufid = *ufid;
> -data->netdev = netdev_ref(netdev);
> +data->netdev = vport ? netdev_ref(vport) : netdev_ref(netdev);
> +data->physdev = netdev_ref(netdev);

For non-tunnel offloads, we end up getting two references to the same
'netdev'; can we avoid this ? That is, get a reference to physdev only
for the vport case.
>  data->rte_flow = rte_flow;
>  data->actions_offloaded = actions_offloaded;
>
> @@ -122,6 +125,7 @@ ufid_to_rte_flow_disassociate(struct 
> ufid_to_rte_flow_data *data)
>  cmap_remove(&ufid_to_rte_flow,
>  CONST_CAST(struct cmap_node *, &data->node), hash);
>  netdev_close(data->netdev);
> +netdev_close(data->physdev);

Similar comment, release reference to physdev only if we got a
reference earlier (i.e., physdev should be non-null only when netdev
is a vport).
>  ovsrcu_postpone(free, data);
>  }
>
> @@ -134,6 +138,8 @@ struct flow_patterns {
>  struct rte_flow_item *items;
>  int cnt;
>  int current_max;
> +uint32_t num_of_tnl_items;

change to --> num_pmd_tnl_items
> +struct ds s_tnl;
>  };
>
>  struct flow_actions {
> @@ -154,16 +160,20 @@ struct flow_actions {
>  static void
>  dump_flow_attr(struct ds *s, struct ds *s_extra,
> const struct rte_flow_attr *attr,
> +   struct flow_patterns *flow_patterns,
> struct flow_actions *flow_actions)
>  {
>  if (flow_actions->num_of_tnl_actions) {
>  ds_clone(s_extra, &flow_actions->s_tnl);
> +} else if (flow_patterns->num_of_tnl_items) {
> +ds_clone(s_extra, &flow_patterns->s_tnl);
>  }
> -ds_put_format(s, "%s%spriority %"PRIu32" group %"PRIu32" %s%s",
> +ds_put_format(s, "%s%spriority %"PRIu32" group %"PRIu32" %s%s%s",
>attr->ingress  ? "ingress " : "",
>attr->egress   ? "egress " : "", attr->priority, 
> attr->group,
>attr->transfer ? "transfer " : "",
> -  flow_actions->num_of_tnl_actions ? "tunnel_set 1 " : "");
> +  flow_actions->num_of_tnl_actions ? "tunnel_set 1 " : "",
> +  flow_patterns->num_of_tnl_items ? "tunnel_match 1 " : "");
>  }
>
>  /* Adds one pattern item 'field' with the 'mask' to dynamic string 's' using
> @@ -177,9 +187,18 @@ dump_flow_attr(struct ds *s, struct ds *s_extra,
>  }
>
>  static void
> -dump_flow_pattern(struct ds *s, const struct rte_flow_item *item)
> +dump_flow_pattern(struct ds *s,
> +  struct flow_patterns *flow_patterns,
> +  int pattern_index)
>  {
> -if (item->type == RTE_FLOW_ITEM_TYPE_ETH) {
> +const struct rte_flow_item *item = &flow_patterns->items[pattern_index];
> +
> +if (item->type == RTE_FLOW_ITEM_TYPE_END) {
> +ds_put_cstr(s, "end ");
> +} else if (flow_patterns->num_of_tnl_items &&
> +   pattern_index < flow_patterns->num_of_tnl_items) {
> +return;
> +} else if (item->type == RTE_FLOW_ITEM_TYPE_ETH) {
>  const struct rte_flow_item_eth *eth_spec = item->spec;
>  const struct rte_flow_item_eth *eth_mask = item->mask;
>
> @@ -569,19 +588,19 @@ dump_flow_action(struct ds *s, struct ds *s_extra,
>  static struct ds *
>  dump_flow(struct ds *s, struct ds *s_extra,
>const struct rte_flow_attr *attr,
> -  const struct rte_flow_item *items,
> +  struct flow_patterns *flow_patterns,
>struct flow_actions *flow_actions)
>  {
>  int 

Re: [ovs-dev] [PATCH V2 11/14] netdev-offload-dpdk: Refactor offload rule creation

2021-02-24 Thread Sriharsha Basavapatna via dev
On Wed, Feb 10, 2021 at 8:57 PM Eli Britstein  wrote:
>
> Refactor offload rule creation as a pre-step towards tunnel matching
> that depends on the netdev.
>
> Signed-off-by: Eli Britstein 
> Reviewed-by: Gaetan Rivet 
> ---
>  lib/netdev-offload-dpdk.c | 106 --
>  1 file changed, 45 insertions(+), 61 deletions(-)
>
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index 493cc9159..f6e668bff 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -1009,30 +1009,6 @@ add_flow_mark_rss_actions(struct flow_actions *actions,
>  add_flow_action(actions, RTE_FLOW_ACTION_TYPE_END, NULL);
>  }
>
> -static struct rte_flow *
> -netdev_offload_dpdk_mark_rss(struct flow_patterns *patterns,
> - struct netdev *netdev,
> - uint32_t flow_mark)
> -{
> -struct flow_actions actions = { .actions = NULL, .cnt = 0 };
> -const struct rte_flow_attr flow_attr = {
> -.group = 0,
> -.priority = 0,
> -.ingress = 1,
> -.egress = 0
> -};
> -struct rte_flow_error error;
> -struct rte_flow *flow;
> -
> -add_flow_mark_rss_actions(&actions, flow_mark, netdev);
> -
> -flow = netdev_offload_dpdk_flow_create(netdev, &flow_attr,
> -   patterns->items,
> -   &actions, &error);
> -
> -free_flow_actions(&actions);
> -return flow;
> -}
> -
>  static void
>  add_count_action(struct flow_actions *actions)
>  {
> @@ -1509,27 +1485,48 @@ parse_flow_actions(struct netdev *netdev,
>  return 0;
>  }
>
> -static struct rte_flow *
> -netdev_offload_dpdk_actions(struct netdev *netdev,
> -struct flow_patterns *patterns,
> -struct nlattr *nl_actions,
> -size_t actions_len)
> +static struct ufid_to_rte_flow_data *
> +create_netdev_offload(struct netdev *netdev,
> +  const ovs_u128 *ufid,
> +  struct flow_patterns *flow_patterns,
> +  struct flow_actions *flow_actions,
> +  bool enable_full,
> +  uint32_t flow_mark)
>  {
> -const struct rte_flow_attr flow_attr = { .ingress = 1, .transfer = 1 };
> -struct flow_actions actions = { .actions = NULL, .cnt = 0 };
> +struct rte_flow_attr flow_attr = { .ingress = 1, .transfer = 1, };
> +struct flow_actions rss_actions = { .actions = NULL, .cnt = 0 };
> +struct rte_flow_item *items = flow_patterns->items;
> +struct ufid_to_rte_flow_data *flow_data = NULL;
> +bool actions_offloaded = true;
>  struct rte_flow *flow = NULL;
>  struct rte_flow_error error;
> -int ret;
>
> -ret = parse_flow_actions(netdev, &actions, nl_actions, actions_len);
> -if (ret) {
> -goto out;
> +if (enable_full) {
> +flow = netdev_offload_dpdk_flow_create(netdev, &flow_attr, items,
> +   flow_actions, &error);
>  }
> -flow = netdev_offload_dpdk_flow_create(netdev, &flow_attr,
> -   patterns->items,
> -   &actions, &error);
> -out:
> -free_flow_actions(&actions);
> -return flow;
> +
> +if (!flow) {
> +/* If we failed to offload the rule actions fallback to MARK+RSS
> + * actions.
> + */
A debug message might be useful here, when we fall back to the mark/rss action?
> +actions_offloaded = false;
> +flow_attr.transfer = 0;
> +add_flow_mark_rss_actions(&rss_actions, flow_mark, netdev);
> +flow = netdev_offload_dpdk_flow_create(netdev, &flow_attr, items,
> +   &rss_actions, &error);
> +}
> +
> +if (flow) {
> +flow_data = ufid_to_rte_flow_associate(ufid, netdev, flow,
> +   actions_offloaded);
> +VLOG_DBG("%s: installed flow %p by ufid "UUID_FMT,
> + netdev_get_name(netdev), flow,
> + UUID_ARGS((struct uuid *) ufid));
> +}
> +
> +free_flow_actions(&rss_actions);
This free is needed only when we fall back to mark/rss offload, not otherwise.

> +return flow_data;
>  }
>
>  static struct ufid_to_rte_flow_data *
> @@ -1541,9 +1538,9 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
>   struct offload_info *info)
>  {
>  struct flow_patterns patterns = { .items = NULL, .cnt = 0 };
> -struct ufid_to_rte_flow_data *flows_data = NULL;
> -bool actions_offloaded = true;
> -struct rte_flow *flow;
> +struct flow_actions actions = { .actions = NULL, .cnt = 0 };
> +struct ufid_to_rte_flow_data *flow_data = NULL;
> +int err;
>
>  if (parse_flow_match(&patterns, match)) {
>  VLOG_DBG_RL(, "%s: matches of ufid "UUID_FMT" are not supported",
> @@ -1551,28 +1548,15 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
>  goto out;
>  }
>
> -flow = netdev_offload_dpdk_actions(netdev, , nl_actions,
> - 

Re: [ovs-dev] [PATCH V2 10/14] netdev-offload-dpdk: Support tunnel pop action

2021-02-23 Thread Sriharsha Basavapatna via dev
On Wed, Feb 10, 2021 at 8:57 PM Eli Britstein  wrote:
>
> Support tunnel pop action.
>
> Signed-off-by: Eli Britstein 
> Reviewed-by: Gaetan Rivet 
> ---
>  Documentation/howto/dpdk.rst |   1 +
>  NEWS |   1 +
>  lib/netdev-offload-dpdk.c| 173 ---
>  3 files changed, 160 insertions(+), 15 deletions(-)
>
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index f0d45e47b..4918d80f3 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -398,6 +398,7 @@ Supported actions for hardware offload are:
>  - VLAN Push/Pop (push_vlan/pop_vlan).
>  - Modification of IPv6 (set_field:->ipv6_src/ipv6_dst/mod_nw_ttl).
>  - Clone/output (tnl_push and output) for encapsulating over a tunnel.
> +- Tunnel pop, for changing from a physical port to a vport.
>
>  Further Reading
>  ---
> diff --git a/NEWS b/NEWS
> index a7bffce97..6850d5621 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -26,6 +26,7 @@ v2.15.0 - xx xxx 
> - DPDK:
>   * Removed support for vhost-user dequeue zero-copy.
>   * Add support for DPDK 20.11.
> + * Add hardware offload support for tunnel pop action (experimental).
> - Userspace datapath:
>   * Add the 'pmd' option to "ovs-appctl dpctl/dump-flows", which
> restricts a flow dump to a single PMD thread if set.
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index 78f866080..493cc9159 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -140,15 +140,30 @@ struct flow_actions {
>  struct rte_flow_action *actions;
>  int cnt;
>  int current_max;
> +struct netdev *tnl_netdev;
> +/* tnl_actions is the opaque array of actions returned by the PMD. */
> +struct rte_flow_action *tnl_actions;

Why is this an opaque array? Since it is struct rte_flow_action, OVS
knows the type and members. Is it opaque because the value of the
rte_flow_action_type member is unknown to OVS? Is it a private action
type, and if so, how does the PMD ensure that it doesn't conflict with
standard action types?

> +uint32_t num_of_tnl_actions;
> +/* tnl_actions_pos is where the tunnel actions starts within the 'actions'
> + * field.
> + */
> +int tnl_actions_pos;

Names should indicate that they are private or PMD-specific, e.g.:

tnl_actions --> tnl_private_actions or tnl_pmd_actions
num_of_tnl_actions --> num_private_tnl_actions or num_pmd_tnl_actions
tnl_actions_pos --> tnl_private_actions_pos or tnl_pmd_actions_pos

> +struct ds s_tnl;
>  };
>
>  static void
> -dump_flow_attr(struct ds *s, const struct rte_flow_attr *attr)
> +dump_flow_attr(struct ds *s, struct ds *s_extra,
> +   const struct rte_flow_attr *attr,
> +   struct flow_actions *flow_actions)
>  {
> -ds_put_format(s, "%s%spriority %"PRIu32" group %"PRIu32" %s",
> +if (flow_actions->num_of_tnl_actions) {
> +ds_clone(s_extra, &flow_actions->s_tnl);
> +}
> +ds_put_format(s, "%s%spriority %"PRIu32" group %"PRIu32" %s%s",
>attr->ingress  ? "ingress " : "",
>attr->egress   ? "egress " : "", attr->priority, attr->group,
> -  attr->transfer ? "transfer " : "");
> +  attr->transfer ? "transfer " : "",
> +  flow_actions->num_of_tnl_actions ? "tunnel_set 1 " : "");
>  }
>
>  /* Adds one pattern item 'field' with the 'mask' to dynamic string 's' using
> @@ -395,9 +410,19 @@ dump_vxlan_encap(struct ds *s, const struct rte_flow_item *items)
>
>  static void
>  dump_flow_action(struct ds *s, struct ds *s_extra,
> - const struct rte_flow_action *actions)
> + struct flow_actions *flow_actions, int act_index)
>  {
> -if (actions->type == RTE_FLOW_ACTION_TYPE_MARK) {
> +const struct rte_flow_action *actions = &flow_actions->actions[act_index];
> +
> +if (actions->type == RTE_FLOW_ACTION_TYPE_END) {
> +ds_put_cstr(s, "end");
> +} else if (flow_actions->num_of_tnl_actions &&
> +   act_index >= flow_actions->tnl_actions_pos &&
> +   act_index < flow_actions->tnl_actions_pos +
> +   flow_actions->num_of_tnl_actions) {
> +/* Opaque PMD tunnel actions is skipped. */

Wouldn't it be useful to at least print the value of PMD action types?

> +return;
> +} else if (actions->type == RTE_FLOW_ACTION_TYPE_MARK) {
>  const struct rte_flow_action_mark *mark = actions->conf;
>
>  ds_put_cstr(s, "mark ");
> @@ -528,6 +553,14 @@ dump_flow_action(struct ds *s, struct ds *s_extra,
>  ds_put_cstr(s, "vxlan_encap / ");
>  dump_vxlan_encap(s_extra, items);
>  ds_put_cstr(s_extra, ";");
> +} else if (actions->type == RTE_FLOW_ACTION_TYPE_JUMP) {
> +const struct rte_flow_action_jump *jump = actions->conf;
> +
> +ds_put_cstr(s, "jump ");
> +if (jump) {

Re: [ovs-dev] [PATCH V2 06/14] dpif-netdev: Add HW miss packet state recover logic

2021-02-23 Thread Sriharsha Basavapatna via dev
On Wed, Feb 10, 2021 at 8:57 PM Eli Britstein  wrote:
>
> Recover the packet if it was partially processed by the HW. Fallback to
> lookup flow by mark association.
>
> Signed-off-by: Eli Britstein 
> Reviewed-by: Gaetan Rivet 
> ---
>  lib/dpif-netdev.c | 46 ++
>  1 file changed, 30 insertions(+), 16 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index e3fd0a07f..09e86631e 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -7036,6 +7036,10 @@ smc_lookup_batch(struct dp_netdev_pmd_thread *pmd,
>  pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, n_smc_hit);
>  }
>
> +static struct tx_port *
> +pmd_send_port_cache_lookup(const struct dp_netdev_pmd_thread *pmd,
> +   odp_port_t port_no);
> +
>  /* Try to process all ('cnt') the 'packets' using only the datapath flow cache
>   * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
>   * miniflow is copied into 'keys' and the packet pointer is moved at the
> @@ -7099,23 +7103,33 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>  pkt_metadata_init(&packet->md, port_no);
>  }
>
> -if ((*recirc_depth_get() == 0) &&
> -dp_packet_has_flow_mark(packet, &mark)) {
> -flow = mark_to_flow_find(pmd, mark);
> -if (OVS_LIKELY(flow)) {
> -tcp_flags = parse_tcp_flags(packet);
> -if (OVS_LIKELY(batch_enable)) {
> -dp_netdev_queue_batches(packet, flow, tcp_flags, batches,
> -n_batches);
> -} else {
> -/* Flow batching should be performed only after fast-path
> - * processing is also completed for packets with emc miss
> - * or else it will result in reordering of packets with
> - * same datapath flows. */
> -packet_enqueue_to_flow_map(packet, flow, tcp_flags,
> -   flow_map, map_cnt++);
> +if (*recirc_depth_get() == 0) {
> +/* Restore the packet if HW processing was terminated before
> + * completion.
> + */
> +struct tx_port *p;
> +
> +tcp_flags = parse_tcp_flags(packet);
> +p = pmd_send_port_cache_lookup(pmd, port_no);
> +if (!p || netdev_hw_miss_packet_recover(p->port->netdev, packet)) {
> +if (dp_packet_has_flow_mark(packet, &mark)) {
> +flow = mark_to_flow_find(pmd, mark);
> +if (OVS_LIKELY(flow)) {
> +if (OVS_LIKELY(batch_enable)) {
> +dp_netdev_queue_batches(packet, flow, tcp_flags,
> +batches, n_batches);
> +} else {
> +/* Flow batching should be performed only after
> + * fast-path processing is also completed for
> + * packets with emc miss or else it will result in
> + * reordering of packets with same datapath flows.
> + */
> +packet_enqueue_to_flow_map(packet, flow, tcp_flags,
> +   flow_map, map_cnt++);
> +}
> +continue;
> +}
>  }
> -continue;
>  }
>  }

The above logic should be changed to avoid hw_miss_packet_recover()
when hw-offload is not enabled:

if (*recirc_depth_get() == 0) {
    ...
    ...
    if (netdev_is_flow_api_enabled()) {
        p = pmd_send_port_cache_lookup(pmd, port_no);
        if (OVS_UNLIKELY(p && !netdev_hw_miss_packet_recover(p->port->netdev,
                                                             packet))) {
            goto miniflow;
        }
    }
    if (dp_packet_has_flow_mark(packet, &mark)) {
        flow = mark_to_flow_find(pmd, mark);
        if (OVS_LIKELY(flow)) {
            ...
        }
    }
}

miniflow:
miniflow_extract();
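
The suggested gating can be exercised in isolation. The sketch below is a
stand-in, not OVS code: classify_packet() and the three flags are
hypothetical stand-ins for netdev_is_flow_api_enabled(), a successful
netdev_hw_miss_packet_recover() call, and dp_packet_has_flow_mark().

```c
#include <stdbool.h>

/* Stand-in flags for the real predicates in lib/dpif-netdev.c:
 * netdev_is_flow_api_enabled(), netdev_hw_miss_packet_recover() == 0,
 * and dp_packet_has_flow_mark(). */
static bool flow_api_enabled;
static bool recover_succeeded;
static bool has_flow_mark;

enum path { FLOW_MARK_HIT, MINIFLOW };

/* Mirrors the suggested control flow: probe the HW only when offloads
 * are enabled; a successfully recovered packet (e.g. tunnel popped)
 * skips the mark lookup and goes straight to miniflow extraction. */
static enum path
classify_packet(void)
{
    if (flow_api_enabled && recover_succeeded) {
        return MINIFLOW;            /* goto miniflow; */
    }
    if (has_flow_mark) {
        return FLOW_MARK_HIT;       /* mark_to_flow_find() fast path. */
    }
    return MINIFLOW;                /* miniflow_extract() slow path. */
}
```

With hw-offload disabled, the recovery probe is never attempted, which is
the point of the suggested change.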

>
> --
> 2.28.0.546.g385c171
>

-- 
This electronic communication and the information and any files transmitted 
with it, or attached to it, are confidential and are intended solely for 
the use of the individual or entity to whom it is addressed and may contain 
information that is confidential, legally privileged, protected by privacy 
laws, or otherwise restricted from disclosure to anyone else. If you are 
not the intended recipient or the person responsible for delivering the 
e-mail to the intended recipient, you are hereby notified that any use, 
copying, distributing, dissemination, forwarding, printing, or copying of 
this e-mail is strictly prohibited. If you received this e-mail in error, 
please return the e-mail to the sender, delete it from your computer, and 

Re: [ovs-dev] [PATCH V2 00/14] Netdev vxlan-decap offload

2021-02-23 Thread Sriharsha Basavapatna via dev
On Tue, Feb 23, 2021 at 5:14 PM Eli Britstein  wrote:
>
>
> On 2/23/2021 12:48 PM, Sriharsha Basavapatna wrote:
> > On Sun, Feb 21, 2021 at 7:04 PM Eli Britstein  wrote:
> >>
> >> On 2/18/2021 6:38 PM, Kovacevic, Marko wrote:
> >>> External email: Use caution opening links or attachments
> >>>
> >>>
> >>> <...>
>  Sending to Marko. As he wasn't subscribed to ovs-dev then.
> 
> >>> <...>
> > VXLAN decap in OVS-DPDK configuration consists of two flows:
> > F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
> > F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
> >
> > F1 is a classification flow. It has outer headers matches and it 
> > classifies the
> > packet as a VXLAN packet, and using tnl_pop action the packet continues
> > processing in F2.
> > F2 is a flow that has matches on tunnel metadata as well as on the inner
> > packet headers (as any other flow).
> >
> >>> <...>
> >>>
> >>> Hi Eli,
> >>>
> >>> Hi,
> >>> After testing the patchset it seems  after the tenth patch I start seeing 
> >>> a drop in the scatter performance around ~4% decrease  across all packet 
> >>> sizes tested(112,256,512,1518)
> >>> Burst measurement see a decrease also but not as much as the scatter does.
> >> Hi Marko,
> >>
> >> Thanks for testing this series.
> >>
> >>> Patch10
> >>> fff1f9168 netdev-offload-dpdk: Support tunnel pop action
> >> It doesn't make sense that this commit causes any degradation as it only
> >> enhances offloads that are not in the datapath and not done for
> >> virtio-user ports in any case.
> > Patch 10 enables offload for flow F1 with tnl_pop action. If
> > hw_offload is enabled, then the new code to offload this flow would be
> > executed for virtio-user ports as well, since this flow is independent
> > of the end point port (whether virtio or vf-rep).
> No. virtio-user ports won't have "flow_api" function pointer to dpdk
> offload provider. Although, this tnl_pop flow is on the PF, so it is not
> virtio-user.

I know that virtio-user ports won't have "flow-api" function pointers.
That's not what I meant. While offloading flow-F1, we don't really
know what the final endpoint port is (virtio or vf-rep), since the
in_port for flow-F1 is a PF port. So, add_tnl_pop_action() would be
executed independent of the target destination port (which is
available as out_port in flow-F2). So, even if the packet is
eventually destined to a virtio-user port (in F2), F1 still executes
add_tnl_pop_action().

> >
> > Before this patch (i.e, with the original code in master/2.15),
> > parse_flow_actions() would fail for TUNNEL_POP action. But with the
> > new code, this action is processed by the function -
> > add_tnl_pop_action(). There is some processing in this function,
> > including a new rte_flow API (rte_flow_tunnel_decap_set) to the PMD.
> > Maybe this is adding some overhead ?
>
> The new API is processed in the offload thread, not in the datapath.
> Indeed it can affect the datapath, depending if/how the PF's PMD
> support/implementation.
>
> As seen from Marko's configuration line, there is no experimental
> support, so there are no new offloads either.

Even if experimental API support is not enabled, if hw-offload is
enabled in OVS, then add_tnl_pop_action() would still be called? And
at the very least these 3 functions would be invoked in that function:
netdev_ports_get(), vport_to_rte_tunnel() and
netdev_dpdk_rte_flow_tunnel_decap_set(), the last one returns -1.

Is hw-offload enabled in Marko's configuration?


>
> >
> > Thanks,
> > -Harsha
> >> Could you please double check?
> >>
> >> I would expect maybe a degradation with:
> >>
> >> Patch 12: 8a21a377c dpif-netdev: Provide orig_in_port in metadata for
> >> tunneled packets
> >>
> >> Patch 6: e548c079d dpif-netdev: Add HW miss packet state recover logic
> >>
> >> Could you please double check what is the offending commit?
> >>
> >> Do you compile with ALLOW_EXPERIMENTAL_API defined or not?
> >>
> >>> The test used for this is a 32 virito-user ports with 1Millions flows.
> >> Could you please elaborate exactly about your setup and test?
> >>
> >> What are "1M flows"? what are the differences between them?
> >>
> >> What are the OpenFlow rules you use?
> >>
> >> Are there any other configurations set (other_config for example)?
> >>
> >> What is being done with the packets in the guest side? all ports are in
> >> the same VM?
> >>
> >>> Traffic @ Phy NIC Rx:
> >>> Ether()/IP()/UDP()/VXLAN()/Ether()/IP()
> >>>
> >>> Burst: on the outer ip we do a burst of 32 packets with same ip then 
> >>> switch for next 32 and so on
> >>> Scatter: for scatter we do incrementally for 32
> >>> And on the inner packet we have a total of  1048576 flows
> >>>
> >>> I can send on a diagram directly just restricted with html here to send 
> >>> the diagram here of the test setup
> >> As commented above, I would appreciate more details about your tests and
> >> setup.
> >>
> >> 

Re: [ovs-dev] [PATCH V2 05/14] netdev-offload-dpdk: Implement HW miss packet recover for vport

2021-02-23 Thread Sriharsha Basavapatna via dev
On Wed, Feb 10, 2021 at 8:57 PM Eli Britstein  wrote:
>
> A miss in virtual port offloads means the flow with tnl_pop was
> offloaded, but not the following one. Recover the state and continue
> with SW processing.

Relates to my comment on Patch-1; please explain what is meant by
recovering the packet state.

>
> Signed-off-by: Eli Britstein 
> Reviewed-by: Gaetan Rivet 
> ---
>  lib/netdev-offload-dpdk.c | 95 +++
>  1 file changed, 95 insertions(+)
>
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index 8cc90d0f1..21aa26b42 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -1610,6 +1610,100 @@ netdev_offload_dpdk_flow_dump_destroy(struct netdev_flow_dump *dump)
>  return 0;
>  }
>
> +static struct netdev *
> +get_vport_netdev(const char *dpif_type,
> + struct rte_flow_tunnel *tunnel,
> + odp_port_t *odp_port)
> +{
> +const struct netdev_tunnel_config *tnl_cfg;
> +struct netdev_flow_dump **netdev_dumps;
> +struct netdev *vport = NULL;
> +bool found = false;
> +int num_ports = 0;
> +int err;
> +int i;
> +
> +netdev_dumps = netdev_ports_flow_dump_create(dpif_type, &num_ports, false);

This relates to my comment in Patch-3; flow_dump_create() in this
context is very confusing since we are not really dumping flows. We
might as well walk the list of ports/netdevs looking for a specific
netdev, just like other netdev_ports_*() routines in netdev-offload.c;
maybe add a new function in netdev-offload.c:

netdev_ports_get_tunnel_vport(dpif_type, tunnel_type, tp_dst):
/* example: tunnel_type == "vxlan", tp_dst == 4789 */

HMAP_FOR_EACH (data, portno_node, &port_to_netdev) {
    if (netdev_get_dpif_type(data->netdev) == dpif_type &&
        netdev_get_type(data->netdev) == tunnel_type) {
        tnl_cfg = netdev_get_tunnel_config(data->netdev);
        if (tnl_cfg && tnl_cfg->dst_port == tp_dst) {
            ...
        }
    }
}
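
Filled out, the proposed helper could look roughly like the stand-alone
sketch below. All names here are hypothetical: struct fake_netdev stands in
for the real port_to_netdev hmap entries, and a plain array replaces the
hmap iteration for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for an entry in the port_to_netdev map. */
struct fake_netdev {
    const char *dpif_type;   /* e.g. "netdev" */
    const char *type;        /* e.g. "vxlan" */
    uint16_t tnl_dst_port;   /* tunnel config dst_port, 0 if none */
};

/* Return the first port whose dpif type, tunnel type, and configured
 * destination port all match, or NULL if there is no such vport. */
static struct fake_netdev *
netdev_ports_get_tunnel_vport(struct fake_netdev *ports, size_t n,
                              const char *dpif_type,
                              const char *tunnel_type, uint16_t tp_dst)
{
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(ports[i].dpif_type, dpif_type)
            && !strcmp(ports[i].type, tunnel_type)
            && ports[i].tnl_dst_port == tp_dst) {
            return &ports[i];
        }
    }
    return NULL;
}
```

The lookup key is (dpif_type, tunnel_type, tp_dst), which is exactly what
the rte_flow_tunnel in the restore info provides.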

> +for (i = 0; i < num_ports; i++) {
> +if (!found && tunnel->type == RTE_FLOW_ITEM_TYPE_VXLAN &&
> +!strcmp(netdev_get_type(netdev_dumps[i]->netdev), "vxlan")) {
> +tnl_cfg = netdev_get_tunnel_config(netdev_dumps[i]->netdev);
> +if (tnl_cfg && tnl_cfg->dst_port == tunnel->tp_dst) {
> +found = true;
> +vport = netdev_dumps[i]->netdev;
> +netdev_ref(vport);
> +*odp_port = netdev_dumps[i]->port;
> +}
> +}
> +err = netdev_flow_dump_destroy(netdev_dumps[i]);
> +if (err != 0 && err != EOPNOTSUPP) {
> +VLOG_ERR("failed dumping netdev: %s", ovs_strerror(err));
> +}
> +}
> +return vport;
> +}
> +
> +static int
> +netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev,
> +   struct dp_packet *packet)
> +{
> +struct rte_flow_restore_info rte_restore_info;
> +struct rte_flow_tunnel *rte_tnl;
> +struct rte_flow_error error;
> +struct netdev *vport_netdev;
> +struct pkt_metadata *md;
> +struct flow_tnl *md_tnl;
> +odp_port_t vport_odp;
> +
> +if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet,
> +  &rte_restore_info, &error)) {
> +/* This function is called for every packet, and in most cases there
> + * will be no restore info from the HW, thus error is expected.

Right now this API (get_restore_info) is needed only in the case of
tunnel offloads; is there some way we could avoid calling this for
every packet (e.g., non-offloaded, non-tunnel-offloaded packets)?
What is expected from the PMD using the given 'packet' argument? This
is not clear from the API description in rte_flow.h.
Is the PMD supposed to parse this packet and return success only if it
is an encapsulated packet? Are there any other additional conditions,
like the packet should be marked (i.e., the PMD should validate
rte_mbuf->hash.fdir fields etc.)? If it is a marked packet, does OVS
set the mark action or should the PMD implicitly add a mark action
while offloading flow F1? If the PMD implicitly adds a mark action,
won't it conflict with mark-ids managed by OVS?

> + */
> +(void) error;
> +return -1;
> +}
> +
> +rte_tnl = &rte_restore_info.tunnel;
> +if (rte_restore_info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL) {
> +vport_netdev = get_vport_netdev(netdev->dpif_type, rte_tnl,
> +&vport_odp);
> +md = &packet->md;
> +if (rte_restore_info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED) {
> +if (!vport_netdev || !vport_netdev->netdev_class ||
> +!vport_netdev->netdev_class->pop_header) {
> +VLOG_ERR("vport nedtdev=%s with no pop_header method",
> + netdev_get_name(vport_netdev));
> +

Re: [ovs-dev] [PATCH V2 03/14] netdev-offload-dpdk: Implement flow dump create/destroy APIs

2021-02-23 Thread Sriharsha Basavapatna via dev
On Wed, Feb 10, 2021 at 8:57 PM Eli Britstein  wrote:
>
> When offloading vports, we don't configure rte_flow on the vport itself,
> as it is not a physical dpdk port, but rather on uplinks. Implement
> those APIs as a pre-step to enable iterate over the ports.

We don't need these flow_dump APIs, since we are not really dumping
any flows here and also orig_in_port is provided to the offload layer
(Patch 12).

Thanks,
-Harsha


>
> Signed-off-by: Eli Britstein 
> Reviewed-by: Gaetan Rivet 
> ---
>  lib/netdev-offload-dpdk.c | 24 
>  1 file changed, 24 insertions(+)
>
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index f2413f5be..8cc90d0f1 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -1588,6 +1588,28 @@ netdev_offload_dpdk_flow_flush(struct netdev *netdev)
>  return 0;
>  }
>
> +static int
> +netdev_offload_dpdk_flow_dump_create(struct netdev *netdev,
> + struct netdev_flow_dump **dump_out,
> + bool terse OVS_UNUSED)
> +{
> +struct netdev_flow_dump *dump;
> +
> +dump = xzalloc(sizeof *dump);
> +dump->netdev = netdev_ref(netdev);
> +
> +*dump_out = dump;
> +return 0;
> +}
> +
> +static int
> +netdev_offload_dpdk_flow_dump_destroy(struct netdev_flow_dump *dump)
> +{
> +netdev_close(dump->netdev);
> +free(dump);
> +return 0;
> +}
> +
>  const struct netdev_flow_api netdev_offload_dpdk = {
>  .type = "dpdk_flow_api",
>  .flow_put = netdev_offload_dpdk_flow_put,
> @@ -1595,4 +1617,6 @@ const struct netdev_flow_api netdev_offload_dpdk = {
>  .init_flow_api = netdev_offload_dpdk_init_flow_api,
>  .flow_get = netdev_offload_dpdk_flow_get,
>  .flow_flush = netdev_offload_dpdk_flow_flush,
> +.flow_dump_create = netdev_offload_dpdk_flow_dump_create,
> +.flow_dump_destroy = netdev_offload_dpdk_flow_dump_destroy,
>  };
> --
> 2.28.0.546.g385c171
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH V2 01/14] netdev-offload: Add HW miss packet state recover API

2021-02-23 Thread Sriharsha Basavapatna via dev
On Wed, Feb 10, 2021 at 8:57 PM Eli Britstein  wrote:
>
> When the HW offload involves multiple flows, like in tunnel decap path,
> it is possible that not all flows in the path are offloaded, resulting
> in partial processing in HW. In order to proceed with rest of the
> processing in SW, the packet state has to be recovered as if it was
> processed in SW from the beginning. Add API for that.

Can you be more specific/clear on what this API does? What specific
packet state is this referring to, and what is meant by recovering the
state here? For example, if recovering the packet state means to
check if the packet is encapsulated and to pop the tunnel header, then
it would make it clearer to just state that.
Thanks,
-Harsha



>
> Signed-off-by: Eli Britstein 
> Reviewed-by: Gaetan Rivet 
> ---
>  lib/netdev-offload-provider.h |  5 +
>  lib/netdev-offload.c  | 12 
>  lib/netdev-offload.h  |  1 +
>  3 files changed, 18 insertions(+)
>
> diff --git a/lib/netdev-offload-provider.h b/lib/netdev-offload-provider.h
> index cf859d1b4..f24c7dd19 100644
> --- a/lib/netdev-offload-provider.h
> +++ b/lib/netdev-offload-provider.h
> @@ -87,6 +87,11 @@ struct netdev_flow_api {
>   * Return 0 if successful, otherwise returns a positive errno value. */
>  int (*flow_get_n_flows)(struct netdev *, uint64_t *n_flows);
>
> +/* Recover the packet state (contents and data) for continued processing
> + * in software.
> + * Return 0 if successful, otherwise returns a positive errno value. */
> +int (*hw_miss_packet_recover)(struct netdev *, struct dp_packet *);
> +
>  /* Initializies the netdev flow api.
>   * Return 0 if successful, otherwise returns a positive errno value. */
>  int (*init_flow_api)(struct netdev *);
> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
> index 6237667c3..e5d24651f 100644
> --- a/lib/netdev-offload.c
> +++ b/lib/netdev-offload.c
> @@ -253,6 +253,18 @@ netdev_flow_put(struct netdev *netdev, struct match *match,
> : EOPNOTSUPP;
>  }
>
> +int
> +netdev_hw_miss_packet_recover(struct netdev *netdev,
> +  struct dp_packet *packet)
> +{
> +const struct netdev_flow_api *flow_api =
> +ovsrcu_get(const struct netdev_flow_api *, &netdev->flow_api);
> +
> +return (flow_api && flow_api->hw_miss_packet_recover)
> +? flow_api->hw_miss_packet_recover(netdev, packet)
> +: EOPNOTSUPP;
> +}
> +
>  int
>  netdev_flow_get(struct netdev *netdev, struct match *match,
>  struct nlattr **actions, const ovs_u128 *ufid,
> diff --git a/lib/netdev-offload.h b/lib/netdev-offload.h
> index 18b48790f..b063c43a3 100644
> --- a/lib/netdev-offload.h
> +++ b/lib/netdev-offload.h
> @@ -89,6 +89,7 @@ bool netdev_flow_dump_next(struct netdev_flow_dump *, struct match *,
>  int netdev_flow_put(struct netdev *, struct match *, struct nlattr *actions,
>  size_t actions_len, const ovs_u128 *,
>  struct offload_info *, struct dpif_flow_stats *);
> +int netdev_hw_miss_packet_recover(struct netdev *, struct dp_packet *);
>  int netdev_flow_get(struct netdev *, struct match *, struct nlattr **actions,
>  const ovs_u128 *, struct dpif_flow_stats *,
>  struct dpif_flow_attrs *, struct ofpbuf *wbuffer);
> --
> 2.28.0.546.g385c171
>
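
The wrapper in this patch follows OVS's usual optional-callback dispatch:
if the provider did not register hw_miss_packet_recover, the caller gets
EOPNOTSUPP and falls back to software processing. A minimal stand-alone
sketch of that pattern follows; the types here are simplified stand-ins,
not the real OVS structs.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for dp_packet and netdev_flow_api. */
struct packet {
    int recovered;
};

struct flow_api {
    int (*hw_miss_packet_recover)(struct packet *);
};

/* Dispatch through the optional callback; EOPNOTSUPP signals "no HW
 * recovery available, continue in software". */
static int
hw_miss_packet_recover(const struct flow_api *api, struct packet *pkt)
{
    return (api && api->hw_miss_packet_recover)
           ? api->hw_miss_packet_recover(pkt)
           : EOPNOTSUPP;
}

/* Fake provider hook: pretend the packet state was restored. */
static int
fake_recover(struct packet *pkt)
{
    pkt->recovered = 1;   /* e.g. tunnel header popped, metadata filled */
    return 0;
}
```

A zero return means the packet state was restored; any positive errno
(including EOPNOTSUPP) means the datapath must do the work itself.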



Re: [ovs-dev] [PATCH V2 00/14] Netdev vxlan-decap offload

2021-02-23 Thread Sriharsha Basavapatna via dev
On Sun, Feb 21, 2021 at 7:04 PM Eli Britstein  wrote:
>
>
> On 2/18/2021 6:38 PM, Kovacevic, Marko wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > <...>
> >> Sending to Marko. As he wasn't subscribed to ovs-dev then.
> >>
> > <...>
> >>> VXLAN decap in OVS-DPDK configuration consists of two flows:
> >>> F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
> >>> F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
> >>>
> >>> F1 is a classification flow. It has outer headers matches and it 
> >>> classifies the
> >>> packet as a VXLAN packet, and using tnl_pop action the packet continues
> >>> processing in F2.
> >>> F2 is a flow that has matches on tunnel metadata as well as on the inner
> >>> packet headers (as any other flow).
> >>>
> > <...>
> >
> > Hi Eli,
> >
> > Hi,
> > After testing the patchset it seems  after the tenth patch I start seeing a 
> > drop in the scatter performance around ~4% decrease  across all packet 
> > sizes tested(112,256,512,1518)
> > Burst measurement see a decrease also but not as much as the scatter does.
>
> Hi Marko,
>
> Thanks for testing this series.
>
> >
> > Patch10
> > fff1f9168 netdev-offload-dpdk: Support tunnel pop action
>
> It doesn't make sense that this commit causes any degradation as it only
> enhances offloads that are not in the datapath and not done for
> virtio-user ports in any case.

Patch 10 enables offload for flow F1 with tnl_pop action. If
hw_offload is enabled, then the new code to offload this flow would be
executed for virtio-user ports as well, since this flow is independent
of the end point port (whether virtio or vf-rep).

Before this patch (i.e, with the original code in master/2.15),
parse_flow_actions() would fail for TUNNEL_POP action. But with the
new code, this action is processed by the function -
add_tnl_pop_action(). There is some processing in this function,
including a new rte_flow API (rte_flow_tunnel_decap_set) to the PMD.
Maybe this is adding some overhead ?

Thanks,
-Harsha
>
> Could you please double check?
>
> I would expect maybe a degradation with:
>
> Patch 12: 8a21a377c dpif-netdev: Provide orig_in_port in metadata for
> tunneled packets
>
> Patch 6: e548c079d dpif-netdev: Add HW miss packet state recover logic
>
> Could you please double check what is the offending commit?
>
> Do you compile with ALLOW_EXPERIMENTAL_API defined or not?
>
> > The test used for this is a 32 virito-user ports with 1Millions flows.
>
> Could you please elaborate exactly about your setup and test?
>
> What are "1M flows"? what are the differences between them?
>
> What are the OpenFlow rules you use?
>
> Are there any other configurations set (other_config for example)?
>
> What is being done with the packets in the guest side? all ports are in
> the same VM?
>
> >
> > Traffic @ Phy NIC Rx:
> > Ether()/IP()/UDP()/VXLAN()/Ether()/IP()
> >
> > Burst: on the outer ip we do a burst of 32 packets with same ip then switch 
> > for next 32 and so on
> > Scatter: for scatter we do incrementally for 32
> > And on the inner packet we have a total of  1048576 flows
> >
> > I can send on a diagram directly just restricted with html here to send the 
> > diagram here of the test setup
>
> As commented above, I would appreciate more details about your tests and
> setup.
>
> Thanks,
>
> Eli
>
> >
> > Thanks
> > Marko K
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev



Re: [ovs-dev] [PATCH 00/15] Netdev vxlan-decap offload

2021-02-09 Thread Sriharsha Basavapatna via dev
On Tue, Feb 9, 2021 at 8:41 PM Eli Britstein  wrote:
>
>
> On 2/8/2021 6:21 PM, Sriharsha Basavapatna wrote:
> > On Mon, Feb 8, 2021 at 7:33 PM Eli Britstein  wrote:
> >>
> >> On 2/8/2021 3:11 PM, Sriharsha Basavapatna wrote:
> >>> On Sun, Feb 7, 2021 at 4:58 PM Eli Britstein  wrote:
>  On 2/5/2021 8:26 PM, Sriharsha Basavapatna wrote:
> > On Fri, Feb 5, 2021 at 4:55 PM Sriharsha Basavapatna
> >  wrote:
> >> On Wed, Jan 27, 2021 at 11:40 PM Eli Britstein  
> >> wrote:
> >>> VXLAN decap in OVS-DPDK configuration consists of two flows:
> >>> F1: in_port(ens1f0),eth(),ipv4(),udp(), 
> >>> actions:tnl_pop(vxlan_sys_4789)
> >>> F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
> >>>
> >>> F1 is a classification flow. It has outer headers matches and it
> >>> classifies the packet as a VXLAN packet, and using tnl_pop action the
> >>> packet continues processing in F2.
> >>> F2 is a flow that has matches on tunnel metadata as well as on the 
> >>> inner
> >>> packet headers (as any other flow).
> >>>
> >>> In order to fully offload VXLAN decap path, both F1 and F2 should be
> >>> offloaded. As there are more than one flow in HW, it is possible that
> >>> F1 is done by HW but F2 is not. Packet is received by SW, and should 
> >>> be
> >>> processed starting from F2 as F1 was already done by HW.
> >>> Rte_flows are applicable only on physical port IDs. Vport flows (e.g. 
> >>> F2)
> >>> are applied on uplink ports attached to OVS.
> >>>
> >>> This patch-set makes use of [1] introduced in DPDK 20.11, that adds 
> >>> API
> >>> for tunnel offloads.
> >>>
> >>> Travis:
> >>> v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
> >>>
> >>> GitHub Actions:
> >>> v1: https://github.com/elibritstein/OVS/actions/runs/515334647
> >>>
> >>> [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
> >>>
> >>> Eli Britstein (13):
> >>>  netdev-offload: Add HW miss packet state recover API
> >>>  netdev-dpdk: Introduce DPDK tunnel APIs
> >>>  netdev-offload-dpdk: Implement flow dump create/destroy APIs
> >>>  netdev-dpdk: Add flow_api support for netdev vxlan vports
> >>>  netdev-offload-dpdk: Implement HW miss packet recover for vport
> >>>  dpif-netdev: Add HW miss packet state recover logic
> >>>  netdev-offload-dpdk: Change log rate limits
> >>>  netdev-offload-dpdk: Support tunnel pop action
> >>>  netdev-offload-dpdk: Refactor offload rule creation
> >>>  netdev-dpdk: Introduce an API to query if a dpdk port is an 
> >>> uplink
> >>>port
> >>>  netdev-offload-dpdk: Map netdev and ufid to offload objects
> >>>  netdev-offload-dpdk: Support vports flows offload
> >>>  netdev-dpdk-offload: Add vxlan pattern matching function
> >>>
> >>> Ilya Maximets (2):
> >>>  netdev-offload: Allow offloading to netdev without ifindex.
> >>>  netdev-offload: Disallow offloading to unrelated tunneling 
> >>> vports.
> >>>
> >>> Documentation/howto/dpdk.rst  |   1 +
> >>> NEWS  |   2 +
> >>> lib/dpif-netdev.c |  49 +-
> >>> lib/netdev-dpdk.c | 135 ++
> >>> lib/netdev-dpdk.h | 104 -
> >>> lib/netdev-offload-dpdk.c | 851 
> >>> +-
> >>> lib/netdev-offload-provider.h |   5 +
> >>> lib/netdev-offload-tc.c   |   8 +
> >>> lib/netdev-offload.c  |  29 +-
> >>> lib/netdev-offload.h  |   1 +
> >>> 10 files changed, 1033 insertions(+), 152 deletions(-)
> >>>
> >>> --
> >>> 2.28.0.546.g385c171
> >>>
> >> Hi Eli,
> >>
> >> Thanks for posting this new patchset to support tunnel decap action 
> >> offload.
> >>
> >> I haven't looked at the entire patchset yet. But I focused on the
> >> patches that introduce 1-to-many mapping between an OVS flow (f2) and
> >> HW offloaded flows.
> >>
> >> Here is a representation of the design proposed in this patchset. A
> >> flow f2 (inner flow) between the VxLAN-vPort and VFRep-1, for which
> >> the underlying uplink/physical port is P0, gets offloaded to not only
> >> P0, but also to other physical ports P1, P2... and so on.
> >>
> >>P0 <> VxLAN-vPort <> VFRep-1
> >>
> >>P1
> >>P2
> >>...
> >>Pn
> >>
> >> IMO, the problems with this design are:
> >>
> >> - Offloading a flow to an unrelated physical device that has nothing
> >> to do with that flow (invalid device for the flow).
> >> - Offloading to not just one, but several such invalid physical devices.
> >> - Consuming HW resources 

Re: [ovs-dev] [PATCH 00/15] Netdev vxlan-decap offload

2021-02-08 Thread Sriharsha Basavapatna via dev
On Mon, Feb 8, 2021 at 7:33 PM Eli Britstein  wrote:
>
>
> On 2/8/2021 3:11 PM, Sriharsha Basavapatna wrote:
> > On Sun, Feb 7, 2021 at 4:58 PM Eli Britstein  wrote:
> >>
> >> On 2/5/2021 8:26 PM, Sriharsha Basavapatna wrote:
> >>> On Fri, Feb 5, 2021 at 4:55 PM Sriharsha Basavapatna
> >>>  wrote:
>  On Wed, Jan 27, 2021 at 11:40 PM Eli Britstein  wrote:
> > VXLAN decap in OVS-DPDK configuration consists of two flows:
> > F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
> > F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
> >
> > F1 is a classification flow. It has outer header matches and it
> > classifies the packet as a VXLAN packet, and using tnl_pop action the
> > packet continues processing in F2.
> > F2 is a flow that has matches on tunnel metadata as well as on the inner
> > packet headers (as any other flow).
> >
> > In order to fully offload VXLAN decap path, both F1 and F2 should be
> > offloaded. As there is more than one flow in HW, it is possible that
> > F1 is done by HW but F2 is not. Packet is received by SW, and should be
> > processed starting from F2 as F1 was already done by HW.
> > Rte_flows are applicable only on physical port IDs. Vport flows (e.g. F2)
> > are applied on uplink ports attached to OVS.
> >
> > This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
> > for tunnel offloads.
> >
> > Travis:
> > v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
> >
> > GitHub Actions:
> > v1: https://github.com/elibritstein/OVS/actions/runs/515334647
> >
> > [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
> >
> > Eli Britstein (13):
> > netdev-offload: Add HW miss packet state recover API
> > netdev-dpdk: Introduce DPDK tunnel APIs
> > netdev-offload-dpdk: Implement flow dump create/destroy APIs
> > netdev-dpdk: Add flow_api support for netdev vxlan vports
> > netdev-offload-dpdk: Implement HW miss packet recover for vport
> > dpif-netdev: Add HW miss packet state recover logic
> > netdev-offload-dpdk: Change log rate limits
> > netdev-offload-dpdk: Support tunnel pop action
> > netdev-offload-dpdk: Refactor offload rule creation
> > netdev-dpdk: Introduce an API to query if a dpdk port is an uplink
> >   port
> > netdev-offload-dpdk: Map netdev and ufid to offload objects
> > netdev-offload-dpdk: Support vports flows offload
> > netdev-dpdk-offload: Add vxlan pattern matching function
> >
> > Ilya Maximets (2):
> > netdev-offload: Allow offloading to netdev without ifindex.
> > netdev-offload: Disallow offloading to unrelated tunneling vports.
> >
> >Documentation/howto/dpdk.rst  |   1 +
> >NEWS  |   2 +
> >lib/dpif-netdev.c |  49 +-
> >lib/netdev-dpdk.c | 135 ++
> >lib/netdev-dpdk.h | 104 -
> >    lib/netdev-offload-dpdk.c | 851 +-
> >lib/netdev-offload-provider.h |   5 +
> >lib/netdev-offload-tc.c   |   8 +
> >lib/netdev-offload.c  |  29 +-
> >lib/netdev-offload.h  |   1 +
> >10 files changed, 1033 insertions(+), 152 deletions(-)
> >
> > --
> > 2.28.0.546.g385c171
> >
>  Hi Eli,
> 
>  Thanks for posting this new patchset to support tunnel decap action offload.
> 
>  I haven't looked at the entire patchset yet. But I focused on the
>  patches that introduce 1-to-many mapping between an OVS flow (f2) and
>  HW offloaded flows.
> 
>  Here is a representation of the design proposed in this patchset. A
>  flow f2 (inner flow) between the VxLAN-vPort and VFRep-1, for which
>  the underlying uplink/physical port is P0, gets offloaded to not only
>  P0, but also to other physical ports P1, P2... and so on.
> 
>    P0 <> VxLAN-vPort <> VFRep-1
> 
>    P1
>    P2
>    ...
>    Pn
> 
>  IMO, the problems with this design are:
> 
>  - Offloading a flow to an unrelated physical device that has nothing
>  to do with that flow (invalid device for the flow).
>  - Offloading to not just one, but several such invalid physical devices.
>  - Consuming HW resources for a flow that is never seen or intended to
>  be processed by those physical devices.
>  - Impacts flow scale on other physical devices, since it would consume
>  their HW resources with a large number of such invalid flows.
>  - The indirect list used to track these multiple mappings complicates
>  the offload layer implementation.
>  - The addition of flow_dump_create() to offload APIs, just to 

Re: [ovs-dev] [PATCH 00/15] Netdev vxlan-decap offload

2021-02-08 Thread Sriharsha Basavapatna via dev
On Sun, Feb 7, 2021 at 4:58 PM Eli Britstein  wrote:
>
>
> On 2/5/2021 8:26 PM, Sriharsha Basavapatna wrote:
> > On Fri, Feb 5, 2021 at 4:55 PM Sriharsha Basavapatna
> >  wrote:
> >> On Wed, Jan 27, 2021 at 11:40 PM Eli Britstein  wrote:
> >>> VXLAN decap in OVS-DPDK configuration consists of two flows:
> >>> F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
> >>> F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
> >>>
> >>> F1 is a classification flow. It has outer header matches and it
> >>> classifies the packet as a VXLAN packet, and using tnl_pop action the
> >>> packet continues processing in F2.
> >>> F2 is a flow that has matches on tunnel metadata as well as on the inner
> >>> packet headers (as any other flow).
> >>>
> >>> In order to fully offload VXLAN decap path, both F1 and F2 should be
> >>> offloaded. As there is more than one flow in HW, it is possible that
> >>> F1 is done by HW but F2 is not. Packet is received by SW, and should be
> >>> processed starting from F2 as F1 was already done by HW.
> >>> Rte_flows are applicable only on physical port IDs. Vport flows (e.g. F2)
> >>> are applied on uplink ports attached to OVS.
> >>>
> >>> This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
> >>> for tunnel offloads.
> >>>
> >>> Travis:
> >>> v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
> >>>
> >>> GitHub Actions:
> >>> v1: https://github.com/elibritstein/OVS/actions/runs/515334647
> >>>
> >>> [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
> >>>
> >>> Eli Britstein (13):
> >>>netdev-offload: Add HW miss packet state recover API
> >>>netdev-dpdk: Introduce DPDK tunnel APIs
> >>>netdev-offload-dpdk: Implement flow dump create/destroy APIs
> >>>netdev-dpdk: Add flow_api support for netdev vxlan vports
> >>>netdev-offload-dpdk: Implement HW miss packet recover for vport
> >>>dpif-netdev: Add HW miss packet state recover logic
> >>>netdev-offload-dpdk: Change log rate limits
> >>>netdev-offload-dpdk: Support tunnel pop action
> >>>netdev-offload-dpdk: Refactor offload rule creation
> >>>netdev-dpdk: Introduce an API to query if a dpdk port is an uplink
> >>>  port
> >>>netdev-offload-dpdk: Map netdev and ufid to offload objects
> >>>netdev-offload-dpdk: Support vports flows offload
> >>>netdev-dpdk-offload: Add vxlan pattern matching function
> >>>
> >>> Ilya Maximets (2):
> >>>netdev-offload: Allow offloading to netdev without ifindex.
> >>>netdev-offload: Disallow offloading to unrelated tunneling vports.
> >>>
> >>>   Documentation/howto/dpdk.rst  |   1 +
> >>>   NEWS  |   2 +
> >>>   lib/dpif-netdev.c |  49 +-
> >>>   lib/netdev-dpdk.c | 135 ++
> >>>   lib/netdev-dpdk.h | 104 -
> >>>   lib/netdev-offload-dpdk.c | 851 +-
> >>>   lib/netdev-offload-provider.h |   5 +
> >>>   lib/netdev-offload-tc.c   |   8 +
> >>>   lib/netdev-offload.c  |  29 +-
> >>>   lib/netdev-offload.h  |   1 +
> >>>   10 files changed, 1033 insertions(+), 152 deletions(-)
> >>>
> >>> --
> >>> 2.28.0.546.g385c171
> >>>
> >> Hi Eli,
> >>
> >> Thanks for posting this new patchset to support tunnel decap action offload.
> >>
> >> I haven't looked at the entire patchset yet. But I focused on the
> >> patches that introduce 1-to-many mapping between an OVS flow (f2) and
> >> HW offloaded flows.
> >>
> >> Here is a representation of the design proposed in this patchset. A
> >> flow f2 (inner flow) between the VxLAN-vPort and VFRep-1, for which
> >> the underlying uplink/physical port is P0, gets offloaded to not only
> >> P0, but also to other physical ports P1, P2... and so on.
> >>
> >>  P0 <> VxLAN-vPort <> VFRep-1
> >>
> >>  P1
> >>  P2
> >>  ...
> >>  Pn
> >>
> >> IMO, the problems with this design are:
> >>
> >> - Offloading a flow to an unrelated physical device that has nothing
> >> to do with that flow (invalid device for the flow).
> >> - Offloading to not just one, but several such invalid physical devices.
> >> - Consuming HW resources for a flow that is never seen or intended to
> >> be processed by those physical devices.
> >> - Impacts flow scale on other physical devices, since it would consume
> >> their HW resources with a large number of such invalid flows.
> >> - The indirect list used to track these multiple mappings complicates
> >> the offload layer implementation.
> >> - The addition of flow_dump_create() to offload APIs, just to parse
> >> and get a list of user datapath netdevs is confusing and not needed.
> >>
> >> I have been exploring an alternate design to address this problem of
> >> figuring out the right physical device for a given tunnel inner-flow.
> >> I will send a patch, please take a look so we can continue the discussion.
> > I just posted this patch, please see the link 

Re: [ovs-dev] [PATCH 00/15] Netdev vxlan-decap offload

2021-02-05 Thread Sriharsha Basavapatna via dev
On Fri, Feb 5, 2021 at 4:55 PM Sriharsha Basavapatna
 wrote:
>
> On Wed, Jan 27, 2021 at 11:40 PM Eli Britstein  wrote:
> >
> > VXLAN decap in OVS-DPDK configuration consists of two flows:
> > F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
> > F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
> >
> > F1 is a classification flow. It has outer header matches and it
> > classifies the packet as a VXLAN packet, and using tnl_pop action the
> > packet continues processing in F2.
> > F2 is a flow that has matches on tunnel metadata as well as on the inner
> > packet headers (as any other flow).
> >
> > In order to fully offload VXLAN decap path, both F1 and F2 should be
> > offloaded. As there is more than one flow in HW, it is possible that
> > F1 is done by HW but F2 is not. Packet is received by SW, and should be
> > processed starting from F2 as F1 was already done by HW.
> > Rte_flows are applicable only on physical port IDs. Vport flows (e.g. F2)
> > are applied on uplink ports attached to OVS.
> >
> > This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
> > for tunnel offloads.
> >
> > Travis:
> > v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
> >
> > GitHub Actions:
> > v1: https://github.com/elibritstein/OVS/actions/runs/515334647
> >
> > [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
> >
> > Eli Britstein (13):
> >   netdev-offload: Add HW miss packet state recover API
> >   netdev-dpdk: Introduce DPDK tunnel APIs
> >   netdev-offload-dpdk: Implement flow dump create/destroy APIs
> >   netdev-dpdk: Add flow_api support for netdev vxlan vports
> >   netdev-offload-dpdk: Implement HW miss packet recover for vport
> >   dpif-netdev: Add HW miss packet state recover logic
> >   netdev-offload-dpdk: Change log rate limits
> >   netdev-offload-dpdk: Support tunnel pop action
> >   netdev-offload-dpdk: Refactor offload rule creation
> >   netdev-dpdk: Introduce an API to query if a dpdk port is an uplink
> > port
> >   netdev-offload-dpdk: Map netdev and ufid to offload objects
> >   netdev-offload-dpdk: Support vports flows offload
> >   netdev-dpdk-offload: Add vxlan pattern matching function
> >
> > Ilya Maximets (2):
> >   netdev-offload: Allow offloading to netdev without ifindex.
> >   netdev-offload: Disallow offloading to unrelated tunneling vports.
> >
> >  Documentation/howto/dpdk.rst  |   1 +
> >  NEWS  |   2 +
> >  lib/dpif-netdev.c |  49 +-
> >  lib/netdev-dpdk.c | 135 ++
> >  lib/netdev-dpdk.h | 104 -
> >  lib/netdev-offload-dpdk.c | 851 +-
> >  lib/netdev-offload-provider.h |   5 +
> >  lib/netdev-offload-tc.c   |   8 +
> >  lib/netdev-offload.c  |  29 +-
> >  lib/netdev-offload.h  |   1 +
> >  10 files changed, 1033 insertions(+), 152 deletions(-)
> >
> > --
> > 2.28.0.546.g385c171
> >
>
> Hi Eli,
>
> Thanks for posting this new patchset to support tunnel decap action offload.
>
> I haven't looked at the entire patchset yet. But I focused on the
> patches that introduce 1-to-many mapping between an OVS flow (f2) and
> HW offloaded flows.
>
> Here is a representation of the design proposed in this patchset. A
> flow f2 (inner flow) between the VxLAN-vPort and VFRep-1, for which
> the underlying uplink/physical port is P0, gets offloaded to not only
> P0, but also to other physical ports P1, P2... and so on.
>
> P0 <> VxLAN-vPort <> VFRep-1
>
> P1
> P2
> ...
> Pn
>
> IMO, the problems with this design are:
>
> - Offloading a flow to an unrelated physical device that has nothing
> to do with that flow (invalid device for the flow).
> - Offloading to not just one, but several such invalid physical devices.
> - Consuming HW resources for a flow that is never seen or intended to
> be processed by those physical devices.
> - Impacts flow scale on other physical devices, since it would consume
> their HW resources with a large number of such invalid flows.
> - The indirect list used to track these multiple mappings complicates
> the offload layer implementation.
> - The addition of flow_dump_create() to offload APIs, just to parse
> and get a list of user datapath netdevs is confusing and not needed.
>
> I have been exploring an alternate design to address this problem of
> figuring out the right physical device for a given tunnel inner-flow.
> I will send a patch, please take a look so we can continue the discussion.

I just posted this patch, please see the link below; this is currently
based on the decap offload patchset (but it can be rebased if needed,
without the decap patchset too). The patch provides changes to pass
physical port (orig_in_port) information to the offload layer in the
context of flow F2. Note that additional changes would be needed in
the decap patchset to utilize this, if we agree on this approach.


[ovs-dev] [PATCH] dpif-netdev: Provide orig_in_port in metadata for tunneled packets

2021-02-05 Thread Sriharsha Basavapatna via dev
When an encapsulated packet is recirculated through a TUNNEL_POP
action, the metadata gets reinitialized and the originating physical
port information is lost. When this flow gets processed by the vport
and it needs to be offloaded, we can't figure out the physical port
through which the tunneled packet was received.

Add a new member to the metadata: 'orig_in_port'. This is passed to
the next stage during recirculation and the offload layer should use
this to offload the flow to the right uplink port.

Signed-off-by: Sriharsha Basavapatna 
---
 lib/dpif-netdev.c | 28 ++--
 lib/netdev-offload-dpdk.c |  9 +
 lib/netdev-offload.h  |  1 +
 lib/packets.h |  1 +
 4 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 7c82a7a27..c65e1d39c 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -430,6 +430,7 @@ struct dp_flow_offload_item {
 struct match match;
 struct nlattr *actions;
 size_t actions_len;
+odp_port_t orig_in_port;/* Originating in_port for tnl flows */
 
 struct ovs_list node;
 };
@@ -2695,11 +2696,13 @@ dp_netdev_flow_offload_put(struct dp_flow_offload_item *offload)
 }
 }
 info.flow_mark = mark;
+info.orig_in_port = offload->orig_in_port;
 
 port = netdev_ports_get(in_port, dpif_type_str);
 if (!port) {
 goto err_free;
 }
+
 /* Taking a global 'port_mutex' to fulfill thread safety restrictions for
  * the netdev-offload-dpdk module. */
ovs_mutex_lock(&pmd->dp->port_mutex);
@@ -2797,7 +2800,8 @@ queue_netdev_flow_del(struct dp_netdev_pmd_thread *pmd,
 static void
 queue_netdev_flow_put(struct dp_netdev_pmd_thread *pmd,
   struct dp_netdev_flow *flow, struct match *match,
-  const struct nlattr *actions, size_t actions_len)
+  const struct nlattr *actions, size_t actions_len,
+  odp_port_t orig_in_port)
 {
 struct dp_flow_offload_item *offload;
 int op;
@@ -2823,6 +2827,7 @@ queue_netdev_flow_put(struct dp_netdev_pmd_thread *pmd,
 offload->actions = xmalloc(actions_len);
 memcpy(offload->actions, actions, actions_len);
 offload->actions_len = actions_len;
+offload->orig_in_port = orig_in_port;
 
 dp_netdev_append_flow_offload(offload);
 }
@@ -3624,7 +3629,8 @@ dp_netdev_get_mega_ufid(const struct match *match, ovs_u128 *mega_ufid)
 static struct dp_netdev_flow *
 dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
struct match *match, const ovs_u128 *ufid,
-   const struct nlattr *actions, size_t actions_len)
+   const struct nlattr *actions, size_t actions_len,
+   odp_port_t orig_in_port)
 OVS_REQUIRES(pmd->flow_mutex)
 {
 struct ds extra_info = DS_EMPTY_INITIALIZER;
@@ -3690,7 +3696,8 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
cmap_insert(&pmd->flow_table, CONST_CAST(struct cmap_node *, &flow->node),
            dp_netdev_flow_hash(&flow->ufid));
 
-queue_netdev_flow_put(pmd, flow, match, actions, actions_len);
+queue_netdev_flow_put(pmd, flow, match, actions, actions_len,
+  orig_in_port);
 
if (OVS_UNLIKELY(!VLOG_DROP_DBG((&upcall_rl)))) {
 struct ds ds = DS_EMPTY_INITIALIZER;
@@ -3761,7 +3768,7 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
 if (!netdev_flow) {
 if (put->flags & DPIF_FP_CREATE) {
 dp_netdev_flow_add(pmd, match, ufid, put->actions,
-   put->actions_len);
+   put->actions_len, ODPP_NONE);
 } else {
 error = ENOENT;
 }
@@ -3777,7 +3784,7 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
ovsrcu_set(&netdev_flow->actions, new_actions);
 
 queue_netdev_flow_put(pmd, netdev_flow, match,
-  put->actions, put->actions_len);
+  put->actions, put->actions_len, ODPP_NONE);
 
 if (stats) {
 get_dpif_flow_status(pmd->dp, netdev_flow, stats, NULL);
@@ -7100,6 +7107,7 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
 
 if (!md_is_valid) {
 pkt_metadata_init(>md, port_no);
+packet->md.orig_in_port = port_no;
 }
 
 if (*recirc_depth_get() == 0) {
@@ -7129,6 +7137,8 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
 continue;
 }
 }
+} else {
+packet->md.orig_in_port = port_no;
 }
 }
 
@@ -7204,6 +7214,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
 ovs_u128 ufid;
 int error;
uint64_t cycles = cycles_counter_update(&pmd->perf_stats);
+odp_port_t orig_in_port = packet->md.orig_in_port;
 
 match.tun_md.valid = false;
miniflow_expand(&key->mf, &match.flow);
@@ -7253,7 +7264,7 @@ 

Re: [ovs-dev] [PATCH 00/15] Netdev vxlan-decap offload

2021-02-05 Thread Sriharsha Basavapatna via dev
On Wed, Jan 27, 2021 at 11:40 PM Eli Britstein  wrote:
>
> VXLAN decap in OVS-DPDK configuration consists of two flows:
> F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
> F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
>
> F1 is a classification flow. It has outer header matches and it
> classifies the packet as a VXLAN packet, and using tnl_pop action the
> packet continues processing in F2.
> F2 is a flow that has matches on tunnel metadata as well as on the inner
> packet headers (as any other flow).
>
> In order to fully offload VXLAN decap path, both F1 and F2 should be
> offloaded. As there is more than one flow in HW, it is possible that
> F1 is done by HW but F2 is not. Packet is received by SW, and should be
> processed starting from F2 as F1 was already done by HW.
> Rte_flows are applicable only on physical port IDs. Vport flows (e.g. F2)
> are applied on uplink ports attached to OVS.
>
> This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
> for tunnel offloads.
>
> Travis:
> v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
>
> GitHub Actions:
> v1: https://github.com/elibritstein/OVS/actions/runs/515334647
>
> [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
>
> Eli Britstein (13):
>   netdev-offload: Add HW miss packet state recover API
>   netdev-dpdk: Introduce DPDK tunnel APIs
>   netdev-offload-dpdk: Implement flow dump create/destroy APIs
>   netdev-dpdk: Add flow_api support for netdev vxlan vports
>   netdev-offload-dpdk: Implement HW miss packet recover for vport
>   dpif-netdev: Add HW miss packet state recover logic
>   netdev-offload-dpdk: Change log rate limits
>   netdev-offload-dpdk: Support tunnel pop action
>   netdev-offload-dpdk: Refactor offload rule creation
>   netdev-dpdk: Introduce an API to query if a dpdk port is an uplink
> port
>   netdev-offload-dpdk: Map netdev and ufid to offload objects
>   netdev-offload-dpdk: Support vports flows offload
>   netdev-dpdk-offload: Add vxlan pattern matching function
>
> Ilya Maximets (2):
>   netdev-offload: Allow offloading to netdev without ifindex.
>   netdev-offload: Disallow offloading to unrelated tunneling vports.
>
>  Documentation/howto/dpdk.rst  |   1 +
>  NEWS  |   2 +
>  lib/dpif-netdev.c |  49 +-
>  lib/netdev-dpdk.c | 135 ++
>  lib/netdev-dpdk.h | 104 -
>  lib/netdev-offload-dpdk.c | 851 +-
>  lib/netdev-offload-provider.h |   5 +
>  lib/netdev-offload-tc.c   |   8 +
>  lib/netdev-offload.c  |  29 +-
>  lib/netdev-offload.h  |   1 +
>  10 files changed, 1033 insertions(+), 152 deletions(-)
>
> --
> 2.28.0.546.g385c171
>

Hi Eli,

Thanks for posting this new patchset to support tunnel decap action offload.

I haven't looked at the entire patchset yet. But I focused on the
patches that introduce 1-to-many mapping between an OVS flow (f2) and
HW offloaded flows.

Here is a representation of the design proposed in this patchset. A
flow f2 (inner flow) between the VxLAN-vPort and VFRep-1, for which
the underlying uplink/physical port is P0, gets offloaded to not only
P0, but also to other physical ports P1, P2... and so on.

P0 <> VxLAN-vPort <> VFRep-1

P1
P2
...
Pn

IMO, the problems with this design are:

- Offloading a flow to an unrelated physical device that has nothing
to do with that flow (invalid device for the flow).
- Offloading to not just one, but several such invalid physical devices.
- Consuming HW resources for a flow that is never seen or intended to
be processed by those physical devices.
- Impacts flow scale on other physical devices, since it would consume
their HW resources with a large number of such invalid flows.
- The indirect list used to track these multiple mappings complicates
the offload layer implementation.
- The addition of flow_dump_create() to offload APIs, just to parse
and get a list of user datapath netdevs is confusing and not needed.

I have been exploring an alternate design to address this problem of
figuring out the right physical device for a given tunnel inner-flow.
I will send a patch, please take a look so we can continue the discussion.

Thanks,
-Harsha

-- 
This electronic communication and the information and any files transmitted 
with it, or attached to it, are confidential and are intended solely for 
the use of the individual or entity to whom it is addressed and may contain 
information that is confidential, legally privileged, protected by privacy 
laws, or otherwise restricted from disclosure to anyone else. If you are 
not the intended recipient or the person responsible for delivering the 
e-mail to the intended recipient, you are hereby notified that any use, 
copying, distributing, dissemination, forwarding, printing, or copying of 
this e-mail is strictly prohibited. If you received this e-mail in error, 

Re: [ovs-dev] [PATCH V3 0/4] netdev datapath flush offloaded flows

2021-01-16 Thread Sriharsha Basavapatna via dev
On Sun, Jan 10, 2021 at 12:25 PM Eli Britstein  wrote:
>
>
> On 1/7/2021 9:10 PM, Sriharsha Basavapatna wrote:
> > Hi Eli,
> >
> > On Mon, Dec 28, 2020 at 11:43 PM Eli Britstein  wrote:
> >> Netdev datapath offloads are done in a separate thread, using messaging
> >> between the threads. With port removal there is a race between the offload
> >> thread removing the offloaded rules and the actual port removal, so some
> >> rules will not be removed.
> > Can you provide details on this sequence, to get more clarity on how
> > some rules will not be removed ?
>
> A deletion sequence from dpif point of view starts with dpif_port_del.
> It calls dpif->dpif_class->port_del (implemented by
> dpif_netdev_port_del) and netdev_ports_remove.
>
> flow_mark_flush is called
> (dpif_netdev_port_del->do_del_port->reconfigure_datapath->reload_affected_pmds).
> This function only posts deletion requests to the queue
> (queue_netdev_flow_del), to be later handled by the offload thread.
>
> When the offload thread wakes up to handle the request
> (dp_netdev_flow_offload_del->mark_to_flow_disassociate) it looks for the
> netdev object by netdev_ports_get in port_to_netdev map, racing the
> execution of netdev_ports_remove that removes that mapping.
>
> If the mapping is removed before the handling, the removal of the HW
> rule won't take place, leaking it and the related allocated memory.

Thanks for the details; the above commit message could be reworded to
be more clear like this:

With port removal there is a race -->

"With port removal, there is a race due to which, by the time the flow
delete request is processed by the offload thread, the corresponding
netdev would have already been removed (in dpif_port_del()), and the
offload thread fails to find the netdev in mark_to_flow_disassociate().
Thus the flow is not deleted from HW."

> >> thread removing the offloaded rules and the actual port removal, so some
> >> rules will not be removed.
>
> >
> >> In OVS the offload objects are not freed (memory
> >> leak).
> > Since dp_netdev_free_flow_offload() frees the offload object, not sure
> > how the memory leak happens ?
> As eventually netdev_offload_dpdk_flow_del is not called, the objects in
> ufid_to_rte_flow that should have been freed by
> ufid_to_rte_flow_disassociate are leaked (as well as the rte_flows not
> destroyed).

Ok;  the commit message could be more specific about the objects that
are leaked.

> >
> >> In HW, whether the rules remain depends on PMD behavior.
> > A few related questions:
> >
> > 1.  In Patch-1, netdev_flow_flush() is called from do_del_port().
> > Doesn't it violate the layering rules, since port deletion shouldn't
> > be directly involved with netdev/flow related operations (that would
> > otherwise be processed as a part of datapath reconfig) ?
>
> netdev/flow operations are called from dpif-netdev layer, that includes
> do_del_port. We can call the offload functions under dp->port_mutex
> which is locked in that function.
>
> Which layering rules are violated?

With this change, a port directly operates on the flows, as opposed to
letting the normal datapath routines delete the flow in the context of
a specific pmd thread.

Before this change, queue_netdev_flow_del() was the only entry point
for flow deletion and it was called by either flow_mark_flush() for a
given pmd thread or by dp_netdev_pmd_remove_flow() again for a given
pmd thread. But with the new change, the flow gets deleted without the
context of a pmd thread.  Also, we are now entirely bypassing the
offload thread.
>
> >
> > 2. Since the rte_flow is deleted first, doesn't it leave the
> > dpif_netdev_flow in an inconsistent state, since the flow is still
> > active from the pmd thread's perspective (megaflow_to_mark and
> > mark_to_flow cmap entries are still valid, indicating that it is
> > offloaded).
>
> That flow is a sequence not related to this patch set, that occurs normally.
>
> Currently there are two types of offloads (both are ingress):
>
> - Partial. Packets are handled by SW with or without offload. Packets
> are fully processed correctly by the SW either if they don't have mark
> or have mark that is not found by mark_to_flow_find.
>
> - Full. Packets will be handled correctly by the SW as they arrive with
> no mark.
>
> The mark<->megaflow mapping disassociation is handled correctly even if
> the rte_flow is not removed.
>
> However, indeed, this is a good point to take into consideration when
> implementing egress offloading.

I understand how full and partial offloads work w.r.t mark. That was
not my point here. For an offloaded flow (partial or full), entries
are added to the "megaflow_to_mark" and "mark_to_flow" cmaps after the
flow is successfully offloaded by rte. Those entries get removed, when
the flow gets deleted, through dp_netdev_flow_offload_del() .  The
cmap entries are removed at the same time the flow is deleted from
rte. But with this change, the deletion of a flow occurs in two
separate 

Re: [ovs-dev] [PATCH V3 0/4] netdev datapath flush offloaded flows

2021-01-07 Thread Sriharsha Basavapatna via dev
Hi Eli,

On Mon, Dec 28, 2020 at 11:43 PM Eli Britstein  wrote:
>
> Netdev datapath offloads are done in a separate thread, using messaging
> between the threads. With port removal there is a race between the offload
> thread removing the offloaded rules and the actual port removal, so some
> rules will not be removed.

Can you provide details on this sequence, to get more clarity on how
some rules will not be removed ?

> In OVS the offload objects are not freed (memory
> leak).

Since dp_netdev_free_flow_offload() frees the offload object, not sure
how the memory leak happens ?

> In HW, whether the rules remain depends on PMD behavior.

A few related questions:

1.  In Patch-1, netdev_flow_flush() is called from do_del_port().
Doesn't it violate the layering rules, since port deletion shouldn't
be directly involved with netdev/flow related operations (that would
otherwise be processed as a part of datapath reconfig) ?

2. Since the rte_flow is deleted first, doesn't it leave the
dpif_netdev_flow in an inconsistent state, since the flow is still
active from the pmd thread's perspective (megaflow_to_mark and
mark_to_flow cmap entries are still valid, indicating that it is
offloaded).

3. Also, how does this work with partially offloaded flows (same
megaflow mapped to multiple pmd threads) ?

Thanks,
-Harsha
> This patch-set resolves this issue using flush.
>
> v2-v1:
> - Fix for pending offload requests.
> v3-v2:
> - Rebase.
>
> Travis:
> v1: https://travis-ci.org/github/elibritstein/OVS/builds/747022942
> v2: https://travis-ci.org/github/elibritstein/OVS/builds/748788786
> v3: https://travis-ci.org/github/elibritstein/OVS/builds/751787939
>
> GitHub Actions:
> v1: https://github.com/elibritstein/OVS/actions/runs/394296553
> v2: https://github.com/elibritstein/OVS/actions/runs/413379295
> v3: https://github.com/elibritstein/OVS/actions/runs/448541155
>
> Eli Britstein (4):
>   dpif-netdev: Flush offload rules upon port deletion
>   netdev-offload-dpdk: Keep netdev in offload object
>   netdev-offload-dpdk: Refactor disassociate and flow destroy
>   netdev-offload-dpdk: Implement flow flush
>
>  lib/dpif-netdev.c |  2 +
>  lib/netdev-offload-dpdk.c | 85 +--
>  2 files changed, 56 insertions(+), 31 deletions(-)
>
> --
> 2.28.0.546.g385c171
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH branch-2.13] netdev-offload-dpdk: Support vxlan encap offload with load actions

2020-10-20 Thread Sriharsha Basavapatna via dev
On Tue, Oct 20, 2020 at 8:59 PM Ilya Maximets  wrote:
>
> On 10/20/20 5:04 PM, Eli Britstein wrote:
> >
> > On 10/20/2020 1:12 PM, Sriharsha Basavapatna wrote:
> >> On Tue, Oct 20, 2020 at 2:40 PM Ilya Maximets  wrote:
> >>> On 10/20/20 10:51 AM, Sriharsha Basavapatna wrote:
>  On Mon, Oct 19, 2020 at 7:59 PM Ilya Maximets  wrote:
> > On 10/18/20 9:10 AM, Eli Britstein wrote:
> >> Struct match has the tunnel values/masks in
> >> match->flow.tunnel/match->wc.masks.tunnel.
> >> Load actions such as load:0xa566c10->NXM_NX_TUN_IPV4_DST[],
> >> load:0xbba->NXM_NX_TUN_ID[] are utilizing the tunnel masks fields, but
> >> those should not be used for matching.
> >> Offloading fails if the masks are not cleared. Fix it by checking if a
> >> tunnel is present.
> >>
> >> Signed-off-by: Eli Britstein 
> >> ---
> > Thanks, Eli.
> >
> > Harsha, Emma, could you, please, review/test this version?
> >
> > Best regards, Ilya Maximets.
>  Reviewed this change. Do we even need this backported to 2.13, since
>  we don't support clone/tnl-push actions with offload ?
> >>> I think we could have metadata partially set even if we're not performing
> >>> clone/tnl-push actions.  Maybe something like this:
> >>>
> >>>table=0,in_port=1,ip,actions=load:0xbba->NXM_NX_TUN_ID[],goto_table(1)
> >>>
> >>> table=1,ip,nw_src=192.168.0.1,actions=load:0xa566c10->NXM_NX_TUN_IPV4_DST[], <output to tunnel>
> >>>table=1,ip,nw_src=192.168.0.2,actions=drop
> >>>
> >>> In this scenario packet that goes to 'drop' action might have tun_id set
> >>> in the metadata and we will not offload it.  I didn't test that though.
> >>> Does that make sense?
> >> I tested with the above set of rules. It doesn't fail in
> >> validate_flow() even without this fix, since it checks
> >> match_zero_wc.flow.tunnel and not match_zero_wc.masks.tunnel which has
> >> the tun_id (mask) set. So, this fix doesn't make any difference
> >> (validate succeeds with or without it).
> >>
> >> (gdb) p /x match_zero_wc.flow.tunnel.tun_id
> >> $4 = 0x0
> >> (gdb) p /x match_zero_wc.wc.masks.tunnel.tun_id
> >> $5 = 0x
> >
> > Harsha - you are right. Thanks. This patch is not needed there as is. Maybe 
> > it was intentional to check the "flow" and not "masks" for tunnel, so I 
> > think we can abandon it for backport <=2.13.
>
> Good catch. Thanks, Harsha!
>
> Lets abandon this patch in this case.

Thanks Ilya and Eli.
-Harsha

>
> >
> > Regarding not supporting clone/tnl-push - it is not related. The issue is 
> > (might have been) with validation of matches. If we checked the masks we 
> > would fail also the partial offload.
> >
> >>
> >> Thanks,
> >> -Harsha
>  Thanks,
>  -Harsha
> >>   lib/netdev-offload-dpdk.c | 3 ++-
> >>   1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> >> index 4538baf5e..c68d539ea 100644
> >> --- a/lib/netdev-offload-dpdk.c
> >> +++ b/lib/netdev-offload-dpdk.c
> >> @@ -1092,7 +1092,8 @@ netdev_offload_dpdk_validate_flow(const struct 
> >> match *match)
> >>   /* Create a wc-zeroed version of flow. */
> >>   match_init(&match_zero_wc, &match->flow, &match->wc);
> >>
> >> -if (!is_all_zeros(&match_zero_wc.flow.tunnel,
> >> +if (flow_tnl_dst_is_set(&match->flow.tunnel) &&
> >> +!is_all_zeros(&match_zero_wc.flow.tunnel,
> >> sizeof match_zero_wc.flow.tunnel)) {
> >>   goto err;
> >>   }
> >>
>


[ovs-dev] [PATCH] netdev-offload-dpdk: Pass L4 proto-id to match in the L3 rte_flow_item

2020-10-20 Thread Sriharsha Basavapatna via dev
The offload layer clears the L4 protocol mask in the L3 item, when the
L4 item is passed for matching, as an optimization. This can be confusing
while parsing the headers in the PMD. Also, the datapath flow specifies
this field to be matched. This optimization is best left to the PMD.
This patch restores the code to pass the L4 protocol type in L3 match.

Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
Signed-off-by: Sriharsha Basavapatna 
Acked-by: Eli Britstein 

---
v3: Updated "Acked-by:" and rebased.

v2: Updated "fixes:" tag with the right commit id.
---

 lib/netdev-offload-dpdk.c | 23 ---
 1 file changed, 23 deletions(-)

diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index 4d19f93cd..786193e16 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -676,7 +676,6 @@ static int
 parse_flow_match(struct flow_patterns *patterns,
  struct match *match)
 {
-uint8_t *next_proto_mask = NULL;
 struct flow *consumed_masks;
 uint8_t proto = 0;
 
@@ -782,7 +781,6 @@ parse_flow_match(struct flow_patterns *patterns,
 /* Save proto for L4 protocol setup. */
 proto = spec->hdr.next_proto_id &
 mask->hdr.next_proto_id;
-next_proto_mask = &mask->hdr.next_proto_id;
 }
 /* If fragmented, then don't HW accelerate - for now. */
 if (match->wc.masks.nw_frag & match->flow.nw_frag) {
@@ -825,7 +823,6 @@ parse_flow_match(struct flow_patterns *patterns,
 
 /* Save proto for L4 protocol setup. */
 proto = spec->hdr.proto & mask->hdr.proto;
-next_proto_mask = &mask->hdr.proto;
 }
 
 if (proto != IPPROTO_ICMP && proto != IPPROTO_UDP  &&
@@ -858,11 +855,6 @@ parse_flow_match(struct flow_patterns *patterns,
 consumed_masks->tcp_flags = 0;
 
 add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_TCP, spec, mask);
-
-/* proto == TCP and ITEM_TYPE_TCP, thus no need for proto match. */
-if (next_proto_mask) {
-*next_proto_mask = 0;
-}
 } else if (proto == IPPROTO_UDP) {
 struct rte_flow_item_udp *spec, *mask;
 
@@ -879,11 +871,6 @@ parse_flow_match(struct flow_patterns *patterns,
 consumed_masks->tp_dst = 0;
 
 add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_UDP, spec, mask);
-
-/* proto == UDP and ITEM_TYPE_UDP, thus no need for proto match. */
-if (next_proto_mask) {
-*next_proto_mask = 0;
-}
 } else if (proto == IPPROTO_SCTP) {
 struct rte_flow_item_sctp *spec, *mask;
 
@@ -900,11 +887,6 @@ parse_flow_match(struct flow_patterns *patterns,
 consumed_masks->tp_dst = 0;
 
 add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_SCTP, spec, mask);
-
-/* proto == SCTP and ITEM_TYPE_SCTP, thus no need for proto match. */
-if (next_proto_mask) {
-*next_proto_mask = 0;
-}
 } else if (proto == IPPROTO_ICMP) {
 struct rte_flow_item_icmp *spec, *mask;
 
@@ -921,11 +903,6 @@ parse_flow_match(struct flow_patterns *patterns,
 consumed_masks->tp_dst = 0;
 
 add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_ICMP, spec, mask);
-
-/* proto == ICMP and ITEM_TYPE_ICMP, thus no need for proto match. */
-if (next_proto_mask) {
-*next_proto_mask = 0;
-}
 }
 
 add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_END, NULL, NULL);
-- 
2.25.0.rc2



Re: [ovs-dev] [PATCH 0/3] netdev datapath offload: misc fixes

2020-10-20 Thread Sriharsha Basavapatna via dev
On Fri, Jul 10, 2020 at 5:37 PM Sriharsha Basavapatna
 wrote:
>
> Hi,
>
> This patchset fixes some issues found during netdev-offload-dpdk testing.
>
> Patch-1: Initialize rte 'transfer' attribute for mark/rss offload.
Patch-1 is not needed (discussed in an earlier thread).
> Patch-2: Pass L4 protocol-id to match in the rte_flow_item.
I will rebase Patch-2 (already acked, but it has a merge conflict due to
recent changes) and send v3 separately.
> Patch-3: Set IP_ECN_MASK only when the ECN field is matched.
Some issues were observed with 'make check'; I need to see if this can be
fixed differently (maybe in the offload layer); please ignore this
patch for now.

Thanks,
-Harsha
>
> Thanks,
> -Harsha
>
> **
>
> v1:
> - Created this patchset using patches 1 & 2, sent separately earlier.
>   Please ignore the previous version of these patches.
> - Patch-2: Updated "fixes:" tag with the right commit id.
> - Added patch-3.
>
> **
>
> Sriharsha Basavapatna (3):
>   netdev-offload-dpdk: Set transfer attribute to zero for mark/rss
> offload
>   netdev-offload-dpdk: Pass L4 proto-id to match in the L3 rte_flow_item
>   tunnel: Set ECN mask bits only when it is matched in the IP header
>
>  lib/netdev-offload-dpdk.c | 25 ++---
>  ofproto/tunnel.c  |  8 ++--
>  2 files changed, 8 insertions(+), 25 deletions(-)
>
> --
> 2.25.0.rc2
>


Re: [ovs-dev] [PATCH branch-2.13] netdev-offload-dpdk: Support vxlan encap offload with load actions

2020-10-20 Thread Sriharsha Basavapatna via dev
On Tue, Oct 20, 2020 at 2:40 PM Ilya Maximets  wrote:
>
> On 10/20/20 10:51 AM, Sriharsha Basavapatna wrote:
> > On Mon, Oct 19, 2020 at 7:59 PM Ilya Maximets  wrote:
> >>
> >> On 10/18/20 9:10 AM, Eli Britstein wrote:
> >>> Struct match has the tunnel values/masks in
> >>> match->flow.tunnel/match->wc.masks.tunnel.
> >>> Load actions such as load:0xa566c10->NXM_NX_TUN_IPV4_DST[],
> >>> load:0xbba->NXM_NX_TUN_ID[] are utilizing the tunnel masks fields, but
> >>> those should not be used for matching.
> >>> Offloading fails if the masks are not cleared. Fix it by checking if a
> >>> tunnel is present.
> >>>
> >>> Signed-off-by: Eli Britstein 
> >>> ---
> >>
> >> Thanks, Eli.
> >>
> >> Harsha, Emma, could you, please, review/test this version?
> >>
> >> Best regards, Ilya Maximets.
> >
> > Reviewed this change. Do we even need this backported to 2.13, since
> > we don't support clone/tnl-push actions with offload ?
>
> I think we could have metadata partially set even if we're not performing
> clone/tnl-push actions.  Maybe something like this:
>
>   table=0,in_port=1,ip,actions=load:0xbba->NXM_NX_TUN_ID[],goto_table(1)
>   
> table=1,ip,nw_src=192.168.0.1,actions=load:0xa566c10->NXM_NX_TUN_IPV4_DST[], <output to tunnel>
>   table=1,ip,nw_src=192.168.0.2,actions=drop
>
> In this scenario packet that goes to 'drop' action might have tun_id set
> in the metadata and we will not offload it.  I didn't test that though.
> Does that make sense?

I tested with the above set of rules. It doesn't fail in
validate_flow() even without this fix, since it checks
match_zero_wc.flow.tunnel and not match_zero_wc.masks.tunnel which has
the tun_id (mask) set. So, this fix doesn't make any difference
(validate succeeds with or without it).

(gdb) p /x match_zero_wc.flow.tunnel.tun_id
$4 = 0x0
(gdb) p /x match_zero_wc.wc.masks.tunnel.tun_id
$5 = 0x

Thanks,
-Harsha
>
> >
> > Thanks,
> > -Harsha
> >>
> >>>  lib/netdev-offload-dpdk.c | 3 ++-
> >>>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> >>> index 4538baf5e..c68d539ea 100644
> >>> --- a/lib/netdev-offload-dpdk.c
> >>> +++ b/lib/netdev-offload-dpdk.c
> >>> @@ -1092,7 +1092,8 @@ netdev_offload_dpdk_validate_flow(const struct 
> >>> match *match)
> >>>  /* Create a wc-zeroed version of flow. */
> >>>  match_init(&match_zero_wc, &match->flow, &match->wc);
> >>>
> >>> -if (!is_all_zeros(&match_zero_wc.flow.tunnel,
> >>> +if (flow_tnl_dst_is_set(&match->flow.tunnel) &&
> >>> +!is_all_zeros(&match_zero_wc.flow.tunnel,
> >>>sizeof match_zero_wc.flow.tunnel)) {
> >>>  goto err;
> >>>  }
> >>>
> >>
>


Re: [ovs-dev] [PATCH branch-2.13] netdev-offload-dpdk: Support vxlan encap offload with load actions

2020-10-20 Thread Sriharsha Basavapatna via dev
On Mon, Oct 19, 2020 at 7:59 PM Ilya Maximets  wrote:
>
> On 10/18/20 9:10 AM, Eli Britstein wrote:
> > Struct match has the tunnel values/masks in
> > match->flow.tunnel/match->wc.masks.tunnel.
> > Load actions such as load:0xa566c10->NXM_NX_TUN_IPV4_DST[],
> > load:0xbba->NXM_NX_TUN_ID[] are utilizing the tunnel masks fields, but
> > those should not be used for matching.
> > Offloading fails if the masks are not cleared. Fix it by checking if a
> > tunnel is present.
> >
> > Signed-off-by: Eli Britstein 
> > ---
>
> Thanks, Eli.
>
> Harsha, Emma, could you, please, review/test this version?
>
> Best regards, Ilya Maximets.

Reviewed this change. Do we even need this backported to 2.13, since
we don't support clone/tnl-push actions with offload ?

Thanks,
-Harsha
>
> >  lib/netdev-offload-dpdk.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> > index 4538baf5e..c68d539ea 100644
> > --- a/lib/netdev-offload-dpdk.c
> > +++ b/lib/netdev-offload-dpdk.c
> > @@ -1092,7 +1092,8 @@ netdev_offload_dpdk_validate_flow(const struct match 
> > *match)
> >  /* Create a wc-zeroed version of flow. */
> >  match_init(&match_zero_wc, &match->flow, &match->wc);
> >
> > -if (!is_all_zeros(&match_zero_wc.flow.tunnel,
> > +if (flow_tnl_dst_is_set(&match->flow.tunnel) &&
> > +!is_all_zeros(&match_zero_wc.flow.tunnel,
> >sizeof match_zero_wc.flow.tunnel)) {
> >  goto err;
> >  }
> >
>


Re: [ovs-dev] [PATCH branch-2.13] netdev-offload-dpdk: Support vxlan encap offload with load actions

2020-10-19 Thread Sriharsha Basavapatna via dev
On Mon, Oct 19, 2020 at 7:59 PM Ilya Maximets  wrote:
>
> On 10/18/20 9:10 AM, Eli Britstein wrote:
> > Struct match has the tunnel values/masks in
> > match->flow.tunnel/match->wc.masks.tunnel.
> > Load actions such as load:0xa566c10->NXM_NX_TUN_IPV4_DST[],
> > load:0xbba->NXM_NX_TUN_ID[] are utilizing the tunnel masks fields, but
> > those should not be used for matching.
> > Offloading fails if the masks are not cleared. Fix it by checking if a
> > tunnel is present.
> >
> > Signed-off-by: Eli Britstein 
> > ---
>
> Thanks, Eli.
>
> Harsha, Emma, could you, please, review/test this version?
>
> Best regards, Ilya Maximets.

Ilya, I'll review/test this today.
Thanks,
-Harsha
>
> >  lib/netdev-offload-dpdk.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> > index 4538baf5e..c68d539ea 100644
> > --- a/lib/netdev-offload-dpdk.c
> > +++ b/lib/netdev-offload-dpdk.c
> > @@ -1092,7 +1092,8 @@ netdev_offload_dpdk_validate_flow(const struct match 
> > *match)
> >  /* Create a wc-zeroed version of flow. */
> >  match_init(&match_zero_wc, &match->flow, &match->wc);
> >
> > -if (!is_all_zeros(&match_zero_wc.flow.tunnel,
> > +if (flow_tnl_dst_is_set(&match->flow.tunnel) &&
> > +!is_all_zeros(&match_zero_wc.flow.tunnel,
> >sizeof match_zero_wc.flow.tunnel)) {
> >  goto err;
> >  }
> >
>


Re: [ovs-dev] [PATCH 3/3] tunnel: Set ECN mask bits only when it is matched in the IP header

2020-10-16 Thread Sriharsha Basavapatna via dev
On Wed, Oct 14, 2020 at 8:29 PM Mark Gray  wrote:
>
> This seems to break tests for me? Did you run a "make check"? I didn't
> apply the whole series because patch 2 doesn't apply so maybe that is
> needed for the tests to pass.

I rebased patch-2 and it now applies fine. But I'm seeing errors like
you mentioned with a "make check", and the errors are seen only with
this patch (Set ECN...). These are the failing tests:
0758 - tunnel - Geneve option present
0759 - tunnel - Delete Geneve option
0763 - tunnel - Mix Geneve/GRE options
2270 - ptap - triangle bridge setup with L2 and L3 GRE tunnels
2273 - ptap - L3 over patch port

I looked at the testsuite.log for these tests, but it is not clear to
me why they are failing.

Ilya,

Could this be related to your comment on another patch ("Support vxlan
encap offload with load actions"), where you mentioned: "It seems like
tunnel metadata and a couple of other fields are special cases that
allowed to have masks set without having keys."

I'm wondering if this fix should also be moved into the offload layer.
With vxlan encap action we noticed that the mask for the IP tos field
was set though the field itself was not really being matched. Datapath
rule looks like this (tos=0/0x3 below).

ufid:255d198a-72ab-47a8-9a2b-a7e55feb66f5,
skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(port1),packet_type(ns=0,id=0),eth(src=8e:40:64:9c:02:72,dst=36:b2:fc:ba:37:cc),eth_type(0x0800),ipv4(src=192.168.11.3/0.0.0.0,dst=192.168.11.1/0.0.0.0,proto=1/0,tos=0/0x3,ttl=64/0,frag=no),icmp(type=0/0,code=0/0),
packets:255, bytes:24990, used:0.000s, dp:ovs,
actions:clone(tnl_push(tnl_port(.).

 Any thoughts ?

Thanks,
-Harsha


>
> Also, could you add a description in the commit message to make it clear
> this should be done?
>
> This comment now also looks wrong?
>
> >  /* ECN fields are always inherited. */
>


Re: [ovs-dev] [PATCH V2 1/1] netdev-offload-dpdk: Preserve HW statistics for modified flows

2020-10-14 Thread Sriharsha Basavapatna via dev
On Mon, Oct 12, 2020 at 7:57 PM Eli Britstein  wrote:
>
> In case of a flow modification, preserve the HW statistics of the old HW
> flow to the new one.
>
> Fixes: 3c7330ebf036 ("netdev-offload-dpdk: Support offload of output action.")
> Signed-off-by: Eli Britstein 
> Reviewed-by: Gaetan Rivet 

Acked-by: Sriharsha Basavapatna 

> ---
>  lib/netdev-offload-dpdk.c | 33 ++---
>  1 file changed, 22 insertions(+), 11 deletions(-)
>
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index 5b632bac4..dadd8f253 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -78,7 +78,7 @@ ufid_to_rte_flow_data_find(const ovs_u128 *ufid)
>  return NULL;
>  }
>
> -static inline void
> +static inline struct ufid_to_rte_flow_data *
>  ufid_to_rte_flow_associate(const ovs_u128 *ufid,
> struct rte_flow *rte_flow, bool actions_offloaded)
>  {
> @@ -103,6 +103,7 @@ ufid_to_rte_flow_associate(const ovs_u128 *ufid,
>
> >  cmap_insert(&ufid_to_rte_flow,
> >  CONST_CAST(struct cmap_node *, &data->node), hash);
> +return data;
>  }
>
>  static inline void
> @@ -1420,7 +1421,7 @@ out:
>  return flow;
>  }
>
> -static int
> +static struct ufid_to_rte_flow_data *
>  netdev_offload_dpdk_add_flow(struct netdev *netdev,
>   struct match *match,
>   struct nlattr *nl_actions,
> @@ -1429,12 +1430,11 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
>   struct offload_info *info)
>  {
>  struct flow_patterns patterns = { .items = NULL, .cnt = 0 };
> +struct ufid_to_rte_flow_data *flows_data = NULL;
>  bool actions_offloaded = true;
>  struct rte_flow *flow;
> -int ret = 0;
>
> -ret = parse_flow_match(&patterns, match);
> -if (ret) {
> +if (parse_flow_match(&patterns, match)) {
>  VLOG_DBG_RL(, "%s: matches of ufid "UUID_FMT" are not supported",
>  netdev_get_name(netdev), UUID_ARGS((struct uuid *) 
> ufid));
>  goto out;
> @@ -1452,16 +1452,15 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
>  }
>
>  if (!flow) {
> -ret = -1;
>  goto out;
>  }
> -ufid_to_rte_flow_associate(ufid, flow, actions_offloaded);
> +flows_data = ufid_to_rte_flow_associate(ufid, flow, actions_offloaded);
>  VLOG_DBG("%s: installed flow %p by ufid "UUID_FMT,
>   netdev_get_name(netdev), flow, UUID_ARGS((struct uuid *)ufid));
>
>  out:
>  free_flow_patterns(&patterns);
> -return ret;
> +return flows_data;
>  }
>
>  static int
> @@ -1495,14 +1494,19 @@ netdev_offload_dpdk_flow_put(struct netdev *netdev, 
> struct match *match,
>   struct dpif_flow_stats *stats)
>  {
>  struct ufid_to_rte_flow_data *rte_flow_data;
> +struct dpif_flow_stats old_stats;
> +bool modification = false;
>  int ret;
>
>  /*
>   * If an old rte_flow exists, it means it's a flow modification.
>   * Here destroy the old rte flow first before adding a new one.
> + * Keep the stats for the newly created rule.
>   */
>  rte_flow_data = ufid_to_rte_flow_data_find(ufid);
>  if (rte_flow_data && rte_flow_data->rte_flow) {
> +old_stats = rte_flow_data->stats;
> +modification = true;
>  ret = netdev_offload_dpdk_destroy_flow(netdev, ufid,
> rte_flow_data->rte_flow);
>  if (ret < 0) {
> @@ -1510,11 +1514,18 @@ netdev_offload_dpdk_flow_put(struct netdev *netdev, 
> struct match *match,
>  }
>  }
>
> +rte_flow_data = netdev_offload_dpdk_add_flow(netdev, match, actions,
> + actions_len, ufid, info);
> +if (!rte_flow_data) {
> +return -1;
> +}
> +if (modification) {
> +rte_flow_data->stats = old_stats;
> +}
>  if (stats) {
> -memset(stats, 0, sizeof *stats);
> +*stats = rte_flow_data->stats;
>  }
> -return netdev_offload_dpdk_add_flow(netdev, match, actions,
> -actions_len, ufid, info);
> +return 0;
>  }
>
>  static int
> --
> 2.26.2.1730.g385c171
>


Re: [ovs-dev] [PATCH 1/1] netdev-offload-dpdk: Preserve HW statistics for modified flows

2020-10-12 Thread Sriharsha Basavapatna via dev
On Mon, Oct 12, 2020 at 4:43 PM Eli Britstein  wrote:
>
>
> On 10/12/2020 11:45 AM, Sriharsha Basavapatna wrote:
> > On Sun, Sep 6, 2020 at 5:52 PM Eli Britstein  wrote:
> >> ping
> >>
> >> On 7/30/2020 4:52 PM, Eli Britstein wrote:
> >>> In case of a flow modification, preserve the HW statistics of the old HW
> >>> flow to the new one.
> >>>
> >>> Signed-off-by: Eli Britstein 
> >>> Reviewed-by: Gaetan Rivet 
> > Update fixes: tag
> I'll put 3c7330ebf036 ("netdev-offload-dpdk: Support offload of output
> action."), as it's the first commit that does full offload. Before that
> there are no HW rules that count. What do you think?

That's fine.
> >>> ---
> >>>lib/netdev-offload-dpdk.c | 28 +---
> >>>1 file changed, 21 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> >>> index de6101e4d..960acb2da 100644
> >>> --- a/lib/netdev-offload-dpdk.c
> >>> +++ b/lib/netdev-offload-dpdk.c
> >>> @@ -78,7 +78,7 @@ ufid_to_rte_flow_data_find(const ovs_u128 *ufid)
> >>>return NULL;
> >>>}
> >>>
> >>> -static inline void
> >>> +static inline struct ufid_to_rte_flow_data *
> >>>ufid_to_rte_flow_associate(const ovs_u128 *ufid,
> >>>   struct rte_flow *rte_flow, bool 
> >>> actions_offloaded)
> >>>{
> >>> @@ -103,6 +103,7 @@ ufid_to_rte_flow_associate(const ovs_u128 *ufid,
> >>>
> >>>cmap_insert(&ufid_to_rte_flow,
> >>>CONST_CAST(struct cmap_node *, &data->node), hash);
> >>> +return data;
> >>>}
> >>>
> >>>static inline void
> >>> @@ -1407,7 +1408,7 @@ out:
> >>>return flow;
> >>>}
> >>>
> >>> -static int
> >>> +static struct ufid_to_rte_flow_data *
> >>>netdev_offload_dpdk_add_flow(struct netdev *netdev,
> >>> struct match *match,
> >>> struct nlattr *nl_actions,
> >>> @@ -1416,6 +1417,7 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
> >>> struct offload_info *info)
> >>>{
> >>>struct flow_patterns patterns = { .items = NULL, .cnt = 0 };
> >>> +struct ufid_to_rte_flow_data *flows_data = NULL;
> >>>bool actions_offloaded = true;
> >>>struct rte_flow *flow;
> >>>int ret = 0;
> > We can eliminate 'ret' with this change, since it is only being used
> > to catch the return value of parse_flow_match().
> Ack
> >>> @@ -1442,13 +1444,13 @@ netdev_offload_dpdk_add_flow(struct netdev 
> >>> *netdev,
> >>>ret = -1;
> > Relates to the above comment.
> >>>goto out;
> >>>}
> >>> -ufid_to_rte_flow_associate(ufid, flow, actions_offloaded);
> >>> +flows_data = ufid_to_rte_flow_associate(ufid, flow, 
> >>> actions_offloaded);
> >>>VLOG_DBG("%s: installed flow %p by ufid "UUID_FMT,
> >>> netdev_get_name(netdev), flow, UUID_ARGS((struct uuid 
> >>> *)ufid));
> >>>
> >>>out:
> >>>free_flow_patterns(&patterns);
> >>> -return ret;
> >>> +return flows_data;
> >>>}
> >>>
> >>>static int
> >>> @@ -1482,14 +1484,19 @@ netdev_offload_dpdk_flow_put(struct netdev 
> >>> *netdev, struct match *match,
> >>> struct dpif_flow_stats *stats)
> >>>{
> >>>struct ufid_to_rte_flow_data *rte_flow_data;
> >>> +struct dpif_flow_stats old_stats;
> >>> +bool modification = false;
> >>>int ret;
> >>>
> >>>/*
> >>> * If an old rte_flow exists, it means it's a flow modification.
> >>> * Here destroy the old rte flow first before adding a new one.
> >>> + * Keep the stats for the newly created rule.
> >>> */
> >>>rte_flow_data = ufid_to_rte_flow_data_find(ufid);
> >>>if (rte_flow_data && rte_flow_data->rte_flow) {
> >>> +old_stats = rte_flow_data->stats;
> >>> +modification = true;
> >>>ret = netdev_offload_dpdk_destroy_flow(netdev, ufid,
> >>>   
> >>> rte_flow_data->rte_flow);
> >>>if (ret < 0) {
> >>> @@ -1497,11 +1504,18 @@ netdev_offload_dpdk_flow_put(struct netdev 
> >>> *netdev, struct match *match,
> >>>}
> >>>}
> >>>
> >>> +rte_flow_data = netdev_offload_dpdk_add_flow(netdev, match, actions,
> >>> + actions_len, ufid, 
> >>> info);
> >>> +if (!rte_flow_data) {
> >>> +return -1;
> >>> +}
> >>> +if (modification) {
> >>> +rte_flow_data->stats = old_stats;
> >>> +}
> >>>if (stats) {
> >>> -memset(stats, 0, sizeof *stats);
> > What if it is a new flow, don't we need to clear the stats ?
>
> It is already cleared before; the allocation uses xzalloc(). See
> ufid_to_rte_flow_associate():
>
>  struct ufid_to_rte_flow_data *data = xzalloc(sizeof *data);

Ok, thanks.
>
> >>> +*stats = rte_flow_data->stats;
> >>>}
> >>> -return 

Re: [ovs-dev] [PATCH 1/1] netdev-offload-dpdk: Support vxlan encap offload with load actions

2020-10-12 Thread Sriharsha Basavapatna via dev
On Mon, Oct 12, 2020 at 4:48 PM Eli Britstein  wrote:
>
>
> On 10/12/2020 10:20 AM, Sriharsha Basavapatna wrote:
> > On Sun, Sep 6, 2020 at 5:51 PM Eli Britstein  wrote:
> >> ping
> >>
> >> On 7/30/2020 1:58 PM, Eli Britstein wrote:
> >>> From: Lei Wang 
> >>>
> >>> Struct match has the tunnel values/masks in
> >>> match->flow.tunnel/match->wc.masks.tunnel.
> >>> Load actions such as load:0xa566c10->NXM_NX_TUN_IPV4_DST[],
> >>> load:0xbba->NXM_NX_TUN_ID[] are utilizing the tunnel masks fields,
> >>> but those should not be used for matching.
> >>> Offloading fails if the masks are not cleared. Clear them if no tunnel is used.
> >>>
> >>> Signed-off-by: Lei Wang 
> >>> Reviewed-by: Eli Britstein 
> >>> Reviewed-by: Gaetan Rivet 
> >>> Signed-off-by: Eli Britstein 

Acked-by: Sriharsha Basavapatna 

See my comment below.

> >>> ---
> >>>lib/netdev-offload-dpdk.c | 4 
> >>>1 file changed, 4 insertions(+)
> >>>
> >>> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> >>> index de6101e4d..0d23e4879 100644
> >>> --- a/lib/netdev-offload-dpdk.c
> >>> +++ b/lib/netdev-offload-dpdk.c
> >>> @@ -682,6 +682,10 @@ parse_flow_match(struct flow_patterns *patterns,
> >>>
> >>>consumed_masks = &match->wc.masks;
> >>>
> >>> +if (!flow_tnl_dst_is_set(&match->flow.tunnel)) {
> >>> +memset(&match->wc.masks.tunnel, 0, sizeof 
> >>> match->wc.masks.tunnel);
> >>> +}
> >>> +
> > This fix looks good to me.  Did you take a look to see if this can be
> > fixed in the code that generates these invalid masks in the first
> > place ?
> I wouldn't say these are "invalid masks". OVS takes those masks into
> consideration through this kind of flow_tnl_dst_is_set() usage, which is
> done in several places in the code, for example odp-util.c and
> lib/netdev-offload-tc.c.

It is invalid because the datapath rule is specifying tunnel masks
when we are not really matching on them. In the context of encap
actions (which is what this patch is fixing), tunnel masks shouldn't
be set.

> >
> > Thanks,
> > -Harsha
> >>>memset(&consumed_masks->in_port, 0, sizeof 
> >>> consumed_masks->in_port);
> >>>/* recirc id must be zero. */
> >>>if (match->wc.masks.recirc_id & match->flow.recirc_id) {


Re: [ovs-dev] [PATCH 1/1] netdev-offload-dpdk: Preserve HW statistics for modified flows

2020-10-12 Thread Sriharsha Basavapatna via dev
On Sun, Sep 6, 2020 at 5:52 PM Eli Britstein  wrote:
>
> ping
>
> On 7/30/2020 4:52 PM, Eli Britstein wrote:
> > In case of a flow modification, preserve the HW statistics of the old HW
> > flow to the new one.
> >
> > Signed-off-by: Eli Britstein 
> > Reviewed-by: Gaetan Rivet 

Update fixes: tag
> > ---
> >   lib/netdev-offload-dpdk.c | 28 +---
> >   1 file changed, 21 insertions(+), 7 deletions(-)
> >
> > diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> > index de6101e4d..960acb2da 100644
> > --- a/lib/netdev-offload-dpdk.c
> > +++ b/lib/netdev-offload-dpdk.c
> > @@ -78,7 +78,7 @@ ufid_to_rte_flow_data_find(const ovs_u128 *ufid)
> >   return NULL;
> >   }
> >
> > -static inline void
> > +static inline struct ufid_to_rte_flow_data *
> >   ufid_to_rte_flow_associate(const ovs_u128 *ufid,
> >  struct rte_flow *rte_flow, bool 
> > actions_offloaded)
> >   {
> > @@ -103,6 +103,7 @@ ufid_to_rte_flow_associate(const ovs_u128 *ufid,
> >
> >   cmap_insert(&ufid_to_rte_flow,
> >   CONST_CAST(struct cmap_node *, &data->node), hash);
> > +return data;
> >   }
> >
> >   static inline void
> > @@ -1407,7 +1408,7 @@ out:
> >   return flow;
> >   }
> >
> > -static int
> > +static struct ufid_to_rte_flow_data *
> >   netdev_offload_dpdk_add_flow(struct netdev *netdev,
> >struct match *match,
> >struct nlattr *nl_actions,
> > @@ -1416,6 +1417,7 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
> >struct offload_info *info)
> >   {
> >   struct flow_patterns patterns = { .items = NULL, .cnt = 0 };
> > +struct ufid_to_rte_flow_data *flows_data = NULL;
> >   bool actions_offloaded = true;
> >   struct rte_flow *flow;
> >   int ret = 0;

We can eliminate 'ret' with this change, since it is only being used
to catch the return value of parse_flow_match().
> > @@ -1442,13 +1444,13 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
> >   ret = -1;

Relates to the above comment.
> >   goto out;
> >   }
> > -ufid_to_rte_flow_associate(ufid, flow, actions_offloaded);
> > +flows_data = ufid_to_rte_flow_associate(ufid, flow, actions_offloaded);
> >   VLOG_DBG("%s: installed flow %p by ufid "UUID_FMT,
> >netdev_get_name(netdev), flow, UUID_ARGS((struct uuid 
> > *)ufid));
> >
> >   out:
> >   free_flow_patterns(&patterns);
> > -return ret;
> > +return flows_data;
> >   }
> >
> >   static int
> > @@ -1482,14 +1484,19 @@ netdev_offload_dpdk_flow_put(struct netdev *netdev, 
> > struct match *match,
> >struct dpif_flow_stats *stats)
> >   {
> >   struct ufid_to_rte_flow_data *rte_flow_data;
> > +struct dpif_flow_stats old_stats;
> > +bool modification = false;
> >   int ret;
> >
> >   /*
> >* If an old rte_flow exists, it means it's a flow modification.
> >* Here destroy the old rte flow first before adding a new one.
> > + * Keep the stats for the newly created rule.
> >*/
> >   rte_flow_data = ufid_to_rte_flow_data_find(ufid);
> >   if (rte_flow_data && rte_flow_data->rte_flow) {
> > +old_stats = rte_flow_data->stats;
> > +modification = true;
> >   ret = netdev_offload_dpdk_destroy_flow(netdev, ufid,
> >  rte_flow_data->rte_flow);
> >   if (ret < 0) {
> > @@ -1497,11 +1504,18 @@ netdev_offload_dpdk_flow_put(struct netdev *netdev, 
> > struct match *match,
> >   }
> >   }
> >
> > +rte_flow_data = netdev_offload_dpdk_add_flow(netdev, match, actions,
> > + actions_len, ufid, info);
> > +if (!rte_flow_data) {
> > +return -1;
> > +}
> > +if (modification) {
> > +rte_flow_data->stats = old_stats;
> > +}
> >   if (stats) {
> > -memset(stats, 0, sizeof *stats);

What if it is a new flow, don't we need to clear the stats ?
> > +*stats = rte_flow_data->stats;
> >   }
> > -return netdev_offload_dpdk_add_flow(netdev, match, actions,
> > -actions_len, ufid, info);
> > +return 0;
> >   }
> >
> >   static int


Re: [ovs-dev] [PATCH 1/1] netdev-offload-dpdk: Support vxlan encap offload with load actions

2020-10-12 Thread Sriharsha Basavapatna via dev
On Sun, Sep 6, 2020 at 5:51 PM Eli Britstein  wrote:
>
> ping
>
> On 7/30/2020 1:58 PM, Eli Britstein wrote:
> > From: Lei Wang 
> >
> > Struct match has the tunnel values/masks in
> > match->flow.tunnel/match->wc.masks.tunnel.
> > Load actions such as load:0xa566c10->NXM_NX_TUN_IPV4_DST[],
> > load:0xbba->NXM_NX_TUN_ID[] are utilizing the tunnel masks fields,
> > but those should not be used for matching.
> > Offloading fails if masks is not clear. Clear it if no tunnel used.
> >
> > Signed-off-by: Lei Wang 
> > Reviewed-by: Eli Britstein 
> > Reviewed-by: Gaetan Rivet 
> > Signed-off-by: Eli Britstein 
> > ---
> >   lib/netdev-offload-dpdk.c | 4 
> >   1 file changed, 4 insertions(+)
> >
> > diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> > index de6101e4d..0d23e4879 100644
> > --- a/lib/netdev-offload-dpdk.c
> > +++ b/lib/netdev-offload-dpdk.c
> > @@ -682,6 +682,10 @@ parse_flow_match(struct flow_patterns *patterns,
> >
> >   consumed_masks = &match->wc.masks;
> >
> > +if (!flow_tnl_dst_is_set(>flow.tunnel)) {
> > +memset(>wc.masks.tunnel, 0, sizeof match->wc.masks.tunnel);
> > +}
> > +
This fix looks good to me.  Did you take a look to see if this can be
fixed in the code that generates these invalid masks in the first
place ?

Thanks,
-Harsha
> >   memset(&consumed_masks->in_port, 0, sizeof consumed_masks->in_port);
> >   /* recirc id must be zero. */
> >   if (match->wc.masks.recirc_id & match->flow.recirc_id) {
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] dpif-netdev: Avoid deadlock with offloading during PMD thread deletion.

2020-07-16 Thread Sriharsha Basavapatna via dev
On Thu, Jul 16, 2020 at 2:52 PM Ilya Maximets  wrote:
>
> On 7/16/20 10:01 AM, Eli Britstein wrote:
> >
> > On 7/16/2020 1:37 AM, Ilya Maximets wrote:
> >> On 7/15/20 8:30 PM, Stokes, Ian wrote:
>  On 15/07/2020 17:00, Ilya Maximets wrote:
> > Main thread will try to pause/stop all revalidators during datapath
> > reconfiguration via datapath purge callback (dp_purge_cb) while
> > holding 'dp->port_mutex'.  And deadlock happens in case any of
> > revalidator threads is already waiting on 'dp->port_mutex' while
> > dumping offloaded flows:
> >
> > main thread   revalidator
> >   -  --
> >
> >   ovs_mutex_lock(&dp->port_mutex)
> >
> >  dpif_netdev_flow_dump_next()
> >  -> dp_netdev_flow_to_dpif_flow
> >  -> get_dpif_flow_status
> >  -> 
> > dpif_netdev_get_flow_offload_status()
> >  -> ovs_mutex_lock(&dp->port_mutex)
> > 
> >
> >   reconfigure_datapath()
> >   -> reconfigure_pmd_threads()
> >   -> dp_netdev_del_pmd()
> >   -> dp_purge_cb()
> >   -> udpif_pause_revalidators()
> >   -> ovs_barrier_block(&udpif->pause_barrier)
> >  
> >
> >
> >
> > We're not allowed to call offloading API without holding global port
> > mutex from the userspace datapath due to thread safety restrictions on
> > netdev-offload-dpdk module.  And it's also not easy to rework datapath
> > reconfiguration process in order to move actual PMD removal and
> > datapath purge out of the port mutex.
> >
> > So, for now, not sleeping on a mutex if it's not immediately available
> > seem like an easiest workaround.  This will have impact on flow
> > statistics update rate and on ability to get the latest statistics
> > before removing the flow (latest stats will be lost in case we were
> > not able to take the mutex).  However, this will allow us to operate
> > normally avoiding the deadlock.
> >
> > The last bit is that to avoid flapping of flow attributes and
> > statistics we're not failing the operation, but returning last
> > statistics and attributes returned by offload provider.  Since those
> > might be updated in different threads, stores and reads are atomic.
> >
> > Reported-by: Frank Wang (王培辉) 
> > Reported-at:
> > https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371753.html
> > Fixes: a309e4f52660 ("dpif-netdev: Update offloaded flows
> > statistics.")
> > Signed-off-by: Ilya Maximets 
>  LGTM
> 
>  Acked-by: Kevin Traynor 
> 
> >>> Looks ok to me as well Ilya, had some trouble getting some hardware setup 
> >>> to test in lab,  but standard test were ok it seems, no ill effect there. 
> >>> Not sure if you want to see a test from another vendor device?
> >>>
> >>> Acked-by: Ian Stokes 
> >> Thanks, Kevin and Ian.
> >>
> >> It'll be good to have tests from Mellanox or Broadcom, or from the
> >> original reporter.  But, if there will be no more replies until the
> >> end of July 16, I'll try to perform couple of synthetic tests by my
> >> own hands to see if it actually avoids deadlock and apply the patch.
> >> Anyway deadlock is worse than any stats or attributes misreporting.
> >
> > Hi. I tried but failed to reproduce it on master (without this patch).
> >
> > I run traffic (ping) between 2 ports (offloaded), and in the background 
> > removing/adding one of the ports in a loop.
> >
> > Is there a better way to reproduce?
>
> You need dp_netdev_del_pmd() being called, so I'd suggest removing
> pmd threads from the pmd-cpu-mask.  I guess, that there was such
> event preceding the port deletion in the original bug report since
> port deletion itself should not trigger the pmd thread deletion.

Ilya,

If you can send me the steps, I'll try this with my partial offload
patchset. Let me know.

Thanks,
-Harsha
>
> >
> >>
> >> Best regards, Ilya Maximets.
> >>
> > ---
> >
> > Version 2:
> >- Storing error code from netdev_flow_get() to mimic the actual result of the last call.
> >
> >   AUTHORS.rst   |  1 +
> >   lib/dpif-netdev.c | 93
> > ---
> >   2 files changed, 88 insertions(+), 6 deletions(-)
> >
> > diff --git a/AUTHORS.rst b/AUTHORS.rst index 8e6a0769f..eb36a01d0
> > 100644
> 

Re: [ovs-dev] [PATCH 1/3] netdev-offload-dpdk: Set transfer attribute to zero for mark/rss offload

2020-07-13 Thread Sriharsha Basavapatna via dev
On Mon, Jul 13, 2020 at 1:28 PM Eli Britstein  wrote:
>
>
> On 7/13/2020 10:55 AM, Sriharsha Basavapatna wrote:
> > On Mon, Jul 13, 2020 at 1:07 PM Eli Britstein  wrote:
> >>
> >> On 7/10/2020 3:07 PM, Sriharsha Basavapatna wrote:
> >>> The offload layer doesn't initialize the 'transfer' attribute
> >>> for mark/rss offload (partial offload). It should be set to 0.
> >>>
> >>> Fixes: 60e778c7533a ("netdev-offload-dpdk: Framework for actions 
> >>> offload.")
> >> It is not a bug. .ingress = 1 is also sufficient.
> > ingress and transfer are different attributes.
> Need to initialize only the non-zero fields. Others are implicitly
> initialized to zero.
> >
> >> See
> >> http://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html#Initializing-Structure-Members
> > Let us set it explicitly to 0, like other members (group etc).
> >> Anyway, this is the commit that introduced it.
> >>
> >> e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
> Remove the fixes. It doesn't fix anything. You can explain to make it
> explicit in the commit msg.
I thought about it, the concern I had was uninitialized value for the
transfer field. Like you pointed out, since it is not an issue (even
for local struct members), I'll drop this patch. I updated the second
patch (L4 proto id) with the right 'fixes' as per your previous
comment. Did you look at the 3rd patch in this set? Any comments?
Thanks,
-Harsha
> > Ok, thanks.
> > -Harsha
> >
> >>> Signed-off-by: Sriharsha Basavapatna 
> >>> ---
> >>>lib/netdev-offload-dpdk.c | 3 ++-
> >>>1 file changed, 2 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> >>> index 26a75f0f2..4c652fd82 100644
> >>> --- a/lib/netdev-offload-dpdk.c
> >>> +++ b/lib/netdev-offload-dpdk.c
> >>> @@ -818,7 +818,8 @@ netdev_offload_dpdk_mark_rss(struct flow_patterns *patterns,
> >>>.group = 0,
> >>>.priority = 0,
> >>>.ingress = 1,
> >>> -.egress = 0
> >>> +.egress = 0,
> >>> +.transfer = 0
> >>>};
> >>>struct rte_flow_error error;
> >>>struct rte_flow *flow;


Re: [ovs-dev] [PATCH 1/3] netdev-offload-dpdk: Set transfer attribute to zero for mark/rss offload

2020-07-13 Thread Sriharsha Basavapatna via dev
On Mon, Jul 13, 2020 at 1:07 PM Eli Britstein  wrote:
>
>
> On 7/10/2020 3:07 PM, Sriharsha Basavapatna wrote:
> > The offload layer doesn't initialize the 'transfer' attribute
> > for mark/rss offload (partial offload). It should be set to 0.
> >
> > Fixes: 60e778c7533a ("netdev-offload-dpdk: Framework for actions offload.")
>
> It is not a bug. .ingress = 1 is also sufficient.

ingress and transfer are different attributes.

>
> See
> http://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html#Initializing-Structure-Members

Let us set it explicitly to 0, like other members (group etc).
>
> Anyway, this is the commit that introduced it.
>
> e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")

Ok, thanks.
-Harsha

>
> > Signed-off-by: Sriharsha Basavapatna 
> > ---
> >   lib/netdev-offload-dpdk.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> > index 26a75f0f2..4c652fd82 100644
> > --- a/lib/netdev-offload-dpdk.c
> > +++ b/lib/netdev-offload-dpdk.c
> > @@ -818,7 +818,8 @@ netdev_offload_dpdk_mark_rss(struct flow_patterns *patterns,
> >   .group = 0,
> >   .priority = 0,
> >   .ingress = 1,
> > -.egress = 0
> > +.egress = 0,
> > +.transfer = 0
> >   };
> >   struct rte_flow_error error;
> >   struct rte_flow *flow;


[ovs-dev] [PATCH v6 1/8] dpif-netdev: Refactor dp_netdev_flow_offload_put()

2020-07-12 Thread Sriharsha Basavapatna via dev
This patch refactors dp_netdev_flow_offload_put() to prepare for
changes to support partial action offload, in subsequent patches.

- Move mark allocation code into a separate wrapper function, outside of
  dp_netdev_flow_offload_put() to improve readability and to facilitate
  more changes in this function to support partial action offload.

- We need to get the in-port's netdev-type (e.g, vhost) to determine if
  the flow should be offloaded on the egress device instead. To facilitate
  such changes, netdev_ports_get() is moved ahead of mark allocation.

Reviewed-by: Hemal Shah 
Signed-off-by: Sriharsha Basavapatna 
---
 lib/dpif-netdev.c | 73 +--
 1 file changed, 45 insertions(+), 28 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 629a0cb53..f26caef6d 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2499,6 +2499,43 @@ dp_netdev_flow_offload_del(struct dp_flow_offload_item *offload)
 return mark_to_flow_disassociate(offload->pmd, offload->flow);
 }
 
+static int
+dp_netdev_alloc_flow_mark(struct dp_netdev_flow *flow, bool modification,
+  uint32_t *markp)
+{
+uint32_t mark;
+
+if (modification) {
+mark = flow->mark;
+ovs_assert(mark != INVALID_FLOW_MARK);
+*markp = mark;
+return 0;
+}
+
+/*
+ * If a mega flow has already been offloaded (from other PMD
+ * instances), do not offload it again.
+ */
+mark = megaflow_to_mark_find(&flow->mega_ufid);
+if (mark != INVALID_FLOW_MARK) {
+VLOG_DBG("Flow has already been offloaded with mark %u\n", mark);
+if (flow->mark != INVALID_FLOW_MARK) {
+ovs_assert(flow->mark == mark);
+} else {
+mark_to_flow_associate(mark, flow);
+}
+return 1;
+}
+
+mark = flow_mark_alloc();
+if (mark == INVALID_FLOW_MARK) {
+VLOG_ERR("Failed to allocate flow mark!\n");
+}
+
+*markp = mark;
+return 0;
+}
+
 /*
  * There are two flow offload operations here: addition and modification.
  *
@@ -2527,38 +2564,18 @@ dp_netdev_flow_offload_put(struct dp_flow_offload_item *offload)
 return -1;
 }
 
-if (modification) {
-mark = flow->mark;
-ovs_assert(mark != INVALID_FLOW_MARK);
-} else {
-/*
- * If a mega flow has already been offloaded (from other PMD
- * instances), do not offload it again.
- */
-mark = megaflow_to_mark_find(&flow->mega_ufid);
-if (mark != INVALID_FLOW_MARK) {
-VLOG_DBG("Flow has already been offloaded with mark %u\n", mark);
-if (flow->mark != INVALID_FLOW_MARK) {
-ovs_assert(flow->mark == mark);
-} else {
-mark_to_flow_associate(mark, flow);
-}
-return 0;
-}
+port = netdev_ports_get(in_port, dpif_type_str);
+if (!port) {
+return -1;
+}
 
-mark = flow_mark_alloc();
-if (mark == INVALID_FLOW_MARK) {
-VLOG_ERR("Failed to allocate flow mark!\n");
-return -1;
-}
+if (dp_netdev_alloc_flow_mark(flow, modification, &mark)) {
+/* flow already offloaded */
+netdev_close(port);
+return 0;
 }
 info.flow_mark = mark;
 
-port = netdev_ports_get(in_port, dpif_type_str);
-if (!port || netdev_vport_is_vport_class(port->netdev_class)) {
-netdev_close(port);
-goto err_free;
-}
 /* Taking a global 'port_mutex' to fulfill thread safety restrictions for
  * the netdev-offload-dpdk module. */
ovs_mutex_lock(&pmd->dp->port_mutex);
-- 
2.25.0.rc2



[ovs-dev] [PATCH v6 0/8] netdev datapath: Partial action offload

2020-07-12 Thread Sriharsha Basavapatna via dev
Hi,

This patchset extends the "Partial HW acceleration" mode to offload a
part of the action processing to HW, instead of offloading just lookup
(MARK/RSS), for "vhost-user" ports. This is referred to as "Partial Action
Offload". This mode does not require SRIOV/switchdev configuration. In this
mode, forwarding (output) action is still performed by OVS-DPDK SW datapath.

Thanks,
-Harsha

**

v5-->v6:

- Addressed the following comments from Ilya:
  - Introduced new flow offload APIs for egress & ingress partial offload
  - Added netdev-offload-dpdk implementation of the new APIs
  - Removed netdev specific info from the datapath layer
  - Removed flow_api_supported() checks from the datapath layer
  - Added dp class-specific assistance to avoid impact to kernel-dp (vlan)
  - Updated flow-dump identifiers for partial-action offload

v4-->v5:

- Rebased to the current ovs-master (includes vxlan-encap full offload)
- Added 2 patches to support partial offload of vlan push/pop actions

v3-->v4:

- Removed mega-ufid mapping code from dpif-netdev (patch #5) and updated the
  existing mega-ufid mapping code in netdev-offload-dpdk to support partial
  action offload.

v2-->v3:

- Added more commit log comments in the refactoring patch (#1).
- Removed wrapper function for should_partial_offload_egress().
- Removed partial offload check for output action in parse_flow_actions().
- Changed patch sequence (skip-encap and get-stats before offload patch).
- Removed locking code (details in email), added inline comments.
- Moved mega-ufid mapping code from skip-encap (#3) to the offload patch (#5).

v1-->v2:

Fixed review comments from Eli:
- Revamped action list parsing to reject multiple clone/output actions
- Updated comments to reflect support for single clone/output action
- Removed place-holder function for ingress partial action offload
- Created a separate patch (#2) to query dpdk-vhost netdevs
- Set transfer attribute to 0 for partial action offload
- Updated data type of 'dp_flow' in 'dp_netdev_execute_aux'
- Added a mutex to synchronize between offload and datapath threads
- Avoid fall back to mark/rss when egress partial offload fails
- Drop count action for partial action offload

Other changes:
- Avoid duplicate offload requests for the same mega-ufid (from PMD threads)
- Added a coverage counter to track pkts with tnl-push partial offloaded
- Fixed dp_netdev_pmd_remove_flow() to delete partial offloaded flow

**

Sriharsha Basavapatna (8):
  dpif-netdev: Refactor dp_netdev_flow_offload_put()
  netdev-dpdk: provide a function to identify dpdk-vhost netdevs
  dpif-netdev: Skip encap action during datapath execution
  dpif-netdev: Support flow_get() with partial-action-offload
  dpif-netdev: Support partial-action-offload of VXLAN encap flow
  dpif-netdev: Support partial offload of PUSH_VLAN action
  dpif-netdev: Support partial offload of POP_VLAN action
  dpctl: update flow-dump output to reflect partial action offload

 lib/dpctl.c   |   5 +-
 lib/dpif-netdev.c | 295 +--
 lib/dpif.c|   2 +-
 lib/netdev-dpdk.c |   5 +
 lib/netdev-dpdk.h |   1 +
 lib/netdev-offload-dpdk.c | 315 +++---
 lib/netdev-offload-provider.h |  13 ++
 lib/netdev-offload.c  | 102 +++
 lib/netdev-offload.h  |  12 +-
 lib/odp-execute.c |  25 ++-
 lib/odp-execute.h |   5 +-
 11 files changed, 693 insertions(+), 87 deletions(-)

-- 
2.25.0.rc2



[ovs-dev] [PATCH v6 7/8] dpif-netdev: Support partial offload of POP_VLAN action

2020-07-12 Thread Sriharsha Basavapatna via dev
If the output-port is a vhost-user port and the action is POP_VLAN,
offload the action on the ingress device if it is offload capable.

Note:
- With ingress partial action offload, the flow must be added to the
  mark-to-flow table. Otherwise, since the action (e.g, POP_VLAN) is
  already performed in the HW, the flow can't be located in the datapath
  flow tables and caches.
- The mark action is offloaded implicitly to facilitate this mark-to-flow
  lookup.
- Add a new member 'partial_actions_offloaded' to the info structure passed
  to the offload layer. When the offload layer successfully offloads the
  partial action, it indicates this to the dpif-netdev layer through this
  flag. This is needed by the dpif-netdev layer to distinguish partial
  offload (i.e, classification offload or mark,rss actions) from partial
  actions offload (classification + some actions, e.g vlan-pop,mark actions)
  in ingress direction.

Signed-off-by: Sriharsha Basavapatna 
---
 lib/dpif-netdev.c | 61 ++
 lib/netdev-offload-dpdk.c | 97 ---
 lib/netdev-offload-provider.h |  6 +++
 lib/netdev-offload.c  | 35 +
 lib/netdev-offload.h  |  4 ++
 5 files changed, 186 insertions(+), 17 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 53f07fc44..60de686bd 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -115,6 +115,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
 COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
 COVERAGE_DEFINE(datapath_skip_tunnel_push);
 COVERAGE_DEFINE(datapath_skip_vlan_push);
+COVERAGE_DEFINE(datapath_skip_vlan_pop);
 
 /* Protects against changes to 'dp_netdevs'. */
 static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
@@ -2551,6 +2552,28 @@ dp_netdev_flow_offload_del(struct dp_flow_offload_item *offload)
 }
 }
 
+/* This function determines if the given flow actions can be partially
+ * offloaded. Partial action offload is attempted when either the in-port
+ * or the out-port for the flow is a vhost-user port.
+ */
+static bool
+should_partial_offload(struct netdev *in_netdev, const char *dpif_type,
+   struct match *match, struct nlattr *actions,
+   size_t act_len, struct netdev **egress_netdev,
+   odp_port_t *egress_port)
+{
+if (netdev_partial_offload_ingress(in_netdev, dpif_type, match, actions,
+   act_len)) {
+return true;
+} else if (netdev_partial_offload_egress(in_netdev, dpif_type, match,
+ actions, act_len, egress_netdev,
+ egress_port)) {
+return true;
+} else {
+return false;
+}
+}
+
 static int
 dp_netdev_alloc_flow_mark(struct dp_netdev_flow *flow, bool modification,
   uint32_t *markp)
@@ -2626,14 +2649,14 @@ dp_netdev_flow_offload_put(struct dp_flow_offload_item *offload)
 
 info.attr_egress = 0;
 info.partial_actions = 0;
-
-if (unlikely(netdev_partial_offload_egress(netdev, dpif_type_str,
-   &offload->match,
-   CONST_CAST(struct nlattr *,
-   offload->actions),
-   offload->actions_len,
-   _netdev,
-   _port))) {
+info.partial_actions_offloaded = 0;
+
+if (unlikely(should_partial_offload(netdev, dpif_type_str, &offload->match,
+CONST_CAST(struct nlattr *,
+offload->actions),
+offload->actions_len,
+_netdev,
+_port))) {
 if (egress_netdev) {
 netdev_close(netdev);
 netdev = egress_netdev;
@@ -2666,12 +2689,14 @@ dp_netdev_flow_offload_put(struct dp_flow_offload_item *offload)
 goto err_free;
 }
 
-if (unlikely(info.partial_actions && egress_netdev)) {
+if (unlikely(info.partial_actions && info.partial_actions_offloaded)) {
 VLOG_DBG_RL("%s: flow: %p mega_ufid: "UUID_FMT" pmd_id: %d\n",
__func__, flow, UUID_ARGS((struct uuid *)&flow->mega_ufid),
 flow->pmd_id);
 flow->partial_actions_offloaded = true;
-} else if (!modification) {
+}
+
+if (!modification && alloc_mark) {
megaflow_to_mark_associate(&flow->mega_ufid, mark);
 mark_to_flow_associate(mark, flow);
 }
@@ -7529,6 +7554,7 @@ dp_netdev_assist_cb(void *dp OVS_UNUSED, const struct nlattr *a)
 
 switch (type) {
 case OVS_ACTION_ATTR_PUSH_VLAN:
+case OVS_ACTION_ATTR_POP_VLAN:
 return true;
 default:
 return false;
@@ -7903,7 

[ovs-dev] [PATCH v6 8/8] dpctl: update flow-dump output to reflect partial action offload

2020-07-12 Thread Sriharsha Basavapatna via dev
To identify flows that have actions partially offloaded, the dp and
offloaded fields in the output of dpctl/flow-dump, are updated with
the values: "dp:ovs,dpdk" and "offloaded:partial-action".

Signed-off-by: Sriharsha Basavapatna 
---
 lib/dpctl.c   |  5 -
 lib/netdev-offload-dpdk.c | 14 +++---
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/lib/dpctl.c b/lib/dpctl.c
index db2b1f896..93e73e99a 100644
--- a/lib/dpctl.c
+++ b/lib/dpctl.c
@@ -819,7 +819,10 @@ format_dpif_flow(struct ds *ds, const struct dpif_flow *f, struct hmap *ports,
dpif_flow_stats_format(&f->stats, ds);
 if (dpctl_p->verbosity && f->attrs.offloaded) {
 if (f->attrs.dp_layer && !strcmp(f->attrs.dp_layer, "ovs")) {
-ds_put_cstr(ds, ", offloaded:partial");
+ds_put_cstr(ds, ", offloaded:partial");
+} else if (f->attrs.dp_layer &&
+   !strcmp(f->attrs.dp_layer, "ovs,dpdk")) {
+ds_put_cstr(ds, ", offloaded:partial-action");
 } else {
 ds_put_cstr(ds, ", offloaded:yes");
 }
diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index a4da03e62..7190d417b 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -61,6 +61,7 @@ struct ufid_to_rte_flow_data {
 uint32_t refcnt;
 struct rte_flow *rte_flow;
 bool actions_offloaded;
+bool partial_actions_offloaded;
 struct dpif_flow_stats stats;
 };
 
@@ -82,7 +83,8 @@ ufid_to_rte_flow_data_find(const ovs_u128 *ufid)
 
 static inline void
 ufid_to_rte_flow_associate(const ovs_u128 *ufid,
-   struct rte_flow *rte_flow, bool actions_offloaded)
+   struct rte_flow *rte_flow, bool actions_offloaded,
+   bool partial_actions_offloaded)
 {
 size_t hash = hash_bytes(ufid, sizeof *ufid, 0);
 struct ufid_to_rte_flow_data *data = xzalloc(sizeof *data);
@@ -103,6 +105,7 @@ ufid_to_rte_flow_associate(const ovs_u128 *ufid,
 data->ufid = *ufid;
 data->rte_flow = rte_flow;
 data->actions_offloaded = actions_offloaded;
+data->partial_actions_offloaded = partial_actions_offloaded;
 
cmap_insert(&ufid_to_rte_flow,
CONST_CAST(struct cmap_node *, &data->node), hash);
@@ -1479,7 +1482,8 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
 ret = -1;
 goto out;
 }
-ufid_to_rte_flow_associate(ufid, flow, actions_offloaded);
+ufid_to_rte_flow_associate(ufid, flow, actions_offloaded,
+   info->partial_actions_offloaded);
 VLOG_DBG("%s: installed flow %p by ufid "UUID_FMT,
  netdev_get_name(netdev), flow, UUID_ARGS((struct uuid *)ufid));
 
@@ -1605,7 +1609,11 @@ netdev_offload_dpdk_flow_get(struct netdev *netdev,
 
 attrs->offloaded = true;
 if (!rte_flow_data->actions_offloaded) {
-attrs->dp_layer = "ovs";
+if (!rte_flow_data->partial_actions_offloaded) {
+attrs->dp_layer = "ovs";
+} else {
+attrs->dp_layer = "ovs,dpdk";
+}
 memset(stats, 0, sizeof *stats);
 goto out;
 }
-- 
2.25.0.rc2



[ovs-dev] [PATCH v6 5/8] dpif-netdev: Support partial-action-offload of VXLAN encap flow

2020-07-12 Thread Sriharsha Basavapatna via dev
In this patch, we support offloading of VXLAN_ENCAP action for a vhost-user
port (aka "partial-action-offload"). At the time of offloading the flow, we
determine if the flow can be offloaded to an egress device, if the input
port is not offload capable such as a vhost-user port. We then offload the
flow with a VXLAN_ENCAP RTE action, to the egress device. We do not add
the OUTPUT RTE action, which indicates to the PMD that it is a partial
action offload request. Note that since the action is being offloaded in
egress direction, classification is expected to be done by OVS SW datapath
and hence there's no need to offload a MARK action.

If offload succeeds, we save the information in 'dp_netdev_flow' so that
we skip execution of the corresponding action (previous patch) during SW
datapath processing.

Reviewed-by: Hemal Shah 
Signed-off-by: Sriharsha Basavapatna 
---
 lib/dpif-netdev.c | 105 ++---
 lib/netdev-offload-dpdk.c | 204 ++
 lib/netdev-offload-provider.h |   7 ++
 lib/netdev-offload.c  |  67 +++
 lib/netdev-offload.h  |   8 +-
 5 files changed, 355 insertions(+), 36 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 2c20e6d5e..a0bb5d4e1 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2505,10 +2505,49 @@ dp_netdev_append_flow_offload(struct dp_flow_offload_item *offload)
ovs_mutex_unlock(&dp_flow_offload.mutex);
 }
 
+static int
+partial_offload_egress_flow_del(struct dp_flow_offload_item *offload)
+{
+struct dp_netdev_pmd_thread *pmd = offload->pmd;
+struct dp_netdev_flow *flow = offload->flow;
+const char *dpif_type_str = dpif_normalize_type(pmd->dp->class->type);
+struct netdev *port;
+int ret;
+
+port = netdev_ports_get(flow->egress_offload_port, dpif_type_str);
+if (!port) {
+return -1;
+}
+
+/* Taking a global 'port_mutex' to fulfill thread safety
+ * restrictions for the netdev-offload-dpdk module. */
+ovs_mutex_lock(&pmd->dp->port_mutex);
+ret = netdev_flow_del(port, &flow->mega_ufid, NULL);
+ovs_mutex_unlock(&pmd->dp->port_mutex);
+netdev_close(port);
+
+if (ret) {
+return ret;
+}
+
+flow->egress_offload_port = NULL;
+flow->partial_actions_offloaded = false;
+
+VLOG_DBG_RL("%s: flow: %p mega_ufid: "UUID_FMT" pmd_id: %d\n", __func__,
+flow, UUID_ARGS((struct uuid *)&flow->mega_ufid),
+offload->flow->pmd_id);
+return ret;
+}
+
 static int
 dp_netdev_flow_offload_del(struct dp_flow_offload_item *offload)
 {
-return mark_to_flow_disassociate(offload->pmd, offload->flow);
+if (unlikely(offload->flow->partial_actions_offloaded &&
+offload->flow->egress_offload_port != ODPP_NONE)) {
+return partial_offload_egress_flow_del(offload);
+} else {
+return mark_to_flow_disassociate(offload->pmd, offload->flow);
+}
 }
 
 static int
@@ -2568,51 +2607,82 @@ dp_netdev_flow_offload_put(struct dp_flow_offload_item *offload)
 const char *dpif_type_str = dpif_normalize_type(pmd->dp->class->type);
 bool modification = offload->op == DP_NETDEV_FLOW_OFFLOAD_OP_MOD;
 struct offload_info info;
-struct netdev *port;
-uint32_t mark;
+struct netdev *netdev;
+odp_port_t egress_port = ODPP_NONE;
+struct netdev *egress_netdev = NULL;
+bool alloc_mark = true;
+uint32_t mark = INVALID_FLOW_MARK;
 int ret;
 
 if (flow->dead) {
 return -1;
 }
 
-port = netdev_ports_get(in_port, dpif_type_str);
-if (!port) {
+netdev = netdev_ports_get(in_port, dpif_type_str);
+if (!netdev) {
 return -1;
 }
 
-if (dp_netdev_alloc_flow_mark(flow, modification, &mark)) {
+info.attr_egress = 0;
+info.partial_actions = 0;
+
+if (unlikely(netdev_partial_offload_egress(netdev, dpif_type_str,
+   &offload->match,
+   CONST_CAST(struct nlattr *,
+   offload->actions),
+   offload->actions_len,
+   _netdev,
+   _port))) {
+if (egress_netdev) {
+netdev_close(netdev);
+netdev = egress_netdev;
+flow->egress_offload_port = egress_port;
+info.attr_egress = 1;
+alloc_mark = false;
+}
+info.partial_actions = 1;
+}
+
+if (alloc_mark && dp_netdev_alloc_flow_mark(flow, modification, &mark)) {
 /* flow already offloaded */
-netdev_close(port);
-return 0;
+netdev_close(netdev);
+return 0;
 }
+
 info.flow_mark = mark;
 
 /* Taking a global 'port_mutex' to fulfill thread safety restrictions for
  * the netdev-offload-dpdk module. */
ovs_mutex_lock(&pmd->dp->port_mutex);
-ret = netdev_flow_put(port, 

Re: [ovs-dev] [PATCH v5 0/7] netdev datapath: Partial action offload

2020-07-12 Thread Sriharsha Basavapatna via dev
Hi Ilya,

Thanks for your comments, please see my response inline.

On Fri, Jul 10, 2020 at 5:41 PM Ilya Maximets  wrote:
>
> On 7/9/20 8:47 AM, Sriharsha Basavapatna via dev wrote:
> > Hi,
> >
> > This patchset extends the "Partial HW acceleration" mode to offload a
> > part of the action processing to HW, instead of offloading just lookup
> > (MARK/RSS), for "vhost-user" ports. This is referred to as "Partial Action
> > Offload". This mode does not require SRIOV/switchdev configuration. In this
> > mode, forwarding (output) action is still performed by OVS-DPDK SW datapath.
>
> Hi.  I like the idea of egress offloading.  It's interesting.

Yes,  you get the performance benefits of flow offload to virtual NICs
in a VM without SR-IOV (VFs).

> IIUC, HW will perform matching on egress packets and perform actions before
> actually sending them. Is that right?

Right, HW matches on the rule and performs the subset of actions
specified before transmitting. For example, egress packet from the VM
(from a virtio nic) is processed by OVS and OVS forwards it to the
physical port (PF) and the packet is encapsulated (VXLAN) on its way
out to the wire.
>
> For the implementation I have a few concerns:
>
> 1. Why only vhost-user ports?  I mean, egress offloading could be done
>for any ingress port in case ingress port doesn't support full offloading.

You are right. But just to reduce the scope of initial implementation
and validation, we focused on the use-case involving vhost-user ports.
Once this design is in place, it should be pretty easy to remove this
restriction to include more non-offload capable ingress ports.
>
>Moreover, you could have classification offloading on ingress and actions
>offloading on egress at the same time.  This might be useful, for example,
>if we have two diferent NICs that both supports offloading, but we have to
>send packets between them.  But, yes, that might be a way for further
>improvement.

Yes, good idea. This use case can be supported in the future, with
some slight extensions to the current design. Forwarding between two
physical ports, but since they are from different NICs (i.e, they
don't belong to the same eSwitch), break up the offload flow rule into
2 separate rules, with classification on the ingress-port and
partial-action offloading on the egress-port.
>
>Regarding vhost-user, you're exposing too much of netdev internals to
>datapath layer by checking for a specific netdev type.  This is not
>a good thing to do.  Especially because egress offloading doesn't depend
>on a type of ingress interface.

Sure, I'll remove this netdev type specific check from dpif-netdev.
But the current use case for egress offload that we are considering,
is only when the ingress device is not offload capable. So, I'd like
to retain this condition, but I'll move this into the offload layer
(see below).
>
> 2. I'm worried about other offload providers like tinux-tc that doesn't know
>anything that happens in dpif-netdev and will not work correctly if
>dpif-netdev will try to use egress offloading on it.

Maybe I'm missing something here, but this cannot happen, since
dpif-netdev handles only user (dpdk) datapath flows and thus should
only be talking to the netdev-offload-dpdk offload provider, and in
turn only to devices of class netdev-dpdk. But I'll add additional
checks to the new offload APIs to make sure only flows of a given
datapath class are processed.

>I see that you're using netdev_dpdk_flow_api_supported() inside the
>dpif-netdev, but that is the violation of netdev-offload abstraction layer.

Sure, this will be moved into the offload layer API that will be
introduced (see below).
>
>I think, some more generic extension of netdev-offload interface required,
>so all offload providers will be able to use this API.  I mean, egress
>offloading should be possible with TC.  You don't need to add support for
>this, but we should have a generic interface to utilize this support in
>the future.
>
>At least there should be something more generic like info.offload_direction
>of some enumeration type like
> enum netdev_offload_direction {
> NETDEV_OFFLOAD_INGRESS,
> NETDEV_OFFLOAD_EGRESS,
> };
>And each offload provider should decide by itself if it supports some type
>of offloading or not.

I agree; will implement the following new offload APIs:

netdev-offload.h:  netdev_partial_offload_egress()
netdev-offload.h:  netdev_partial_offload_ingress()
netdev-offload-provider.h:  netdev_flow_api::flow_offload_egress_partial()
netdev-offload-provider.h:  netdev_flow_api::flow_offload_ingress_partial()
netdev-offload-dpdk.c:  netdev_offload_dpdk_egress_partial()
netdev-offload-dpdk.c:  netd

[ovs-dev] [PATCH v6 2/8] netdev-dpdk: provide a function to identify dpdk-vhost netdevs

2020-07-12 Thread Sriharsha Basavapatna via dev
This patch adds a function to determine if a given netdev belongs to the
dpdk-vhost class, using the netdev_class specific data.

Reviewed-by: Hemal Shah 
Signed-off-by: Sriharsha Basavapatna 
---
 lib/netdev-dpdk.c | 5 +
 lib/netdev-dpdk.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 44ebf96da..a2a9bb8e7 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -558,6 +558,11 @@ is_dpdk_class(const struct netdev_class *class)
|| class->destruct == netdev_dpdk_vhost_destruct;
 }
 
+bool is_dpdk_vhost_netdev(struct netdev *netdev)
+{
+return netdev->netdev_class->destruct == netdev_dpdk_vhost_destruct;
+}
+
 /* DPDK NIC drivers allocate RX buffers at a particular granularity, typically
  * aligned at 1k or less. If a declared mbuf size is not a multiple of this
  * value, insufficient buffers are allocated to accomodate the packet in its
diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
index 848346cb4..ab3c3102e 100644
--- a/lib/netdev-dpdk.h
+++ b/lib/netdev-dpdk.h
@@ -37,6 +37,7 @@ void netdev_dpdk_register(void);
 void free_dpdk_buf(struct dp_packet *);
 
 bool netdev_dpdk_flow_api_supported(struct netdev *);
+bool is_dpdk_vhost_netdev(struct netdev *);
 
 int
 netdev_dpdk_rte_flow_destroy(struct netdev *netdev,
-- 
2.25.0.rc2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v6 3/8] dpif-netdev: Skip encap action during datapath execution

2020-07-12 Thread Sriharsha Basavapatna via dev
In this patch we check if action processing (apart from the OUTPUT action)
should be skipped for a given dp_netdev_flow. Specifically, if the action
is TNL_PUSH and it has been offloaded to HW, then we do not push the
tunnel header in SW. The datapath only executes the OUTPUT action, and
the packet will be encapsulated in HW during transmit.

Reviewed-by: Hemal Shah 
Signed-off-by: Sriharsha Basavapatna 
---
 lib/dpif-netdev.c | 46 --
 1 file changed, 36 insertions(+), 10 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f26caef6d..c75a07c5e 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -113,6 +113,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
 COVERAGE_DEFINE(datapath_drop_invalid_bond);
 COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
 COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
+COVERAGE_DEFINE(datapath_skip_tunnel_push);
 
 /* Protects against changes to 'dp_netdevs'. */
 static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
@@ -547,6 +548,16 @@ struct dp_netdev_flow {
 bool dead;
 uint32_t mark;   /* Unique flow mark assigned to a flow */
 
+/* The next two members are used to support partial offloading of
+ * actions. The boolean flag tells if this flow has its actions partially
+ * offloaded. The egress port# tells if the action should be offloaded
+ * on the egress (output) port instead of the in-port for the flow. Note
+ * that we support flows with a single egress port action.
+ * (see MAX_ACTION_ATTRS for related comments).
+ */
+bool partial_actions_offloaded;
+odp_port_t  egress_offload_port;
+
 /* Statistics. */
 struct dp_netdev_flow_stats stats;
 
@@ -822,7 +833,8 @@ static void dp_netdev_execute_actions(struct 
dp_netdev_pmd_thread *pmd,
   bool should_steal,
   const struct flow *flow,
   const struct nlattr *actions,
-  size_t actions_len);
+  size_t actions_len,
+  const struct dp_netdev_flow *dp_flow);
 static void dp_netdev_input(struct dp_netdev_pmd_thread *,
 struct dp_packet_batch *, odp_port_t port_no);
 static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
@@ -3966,7 +3978,7 @@ dpif_netdev_execute(struct dpif *dpif, struct 
dpif_execute *execute)
 dp_packet_batch_init_packet(, execute->packet);
 pp.do_not_steal = true;
 dp_netdev_execute_actions(pmd, , false, execute->flow,
-  execute->actions, execute->actions_len);
+  execute->actions, execute->actions_len, NULL);
 dp_netdev_pmd_flush_output_packets(pmd, true);
 
 if (pmd->core_id == NON_PMD_CORE_ID) {
@@ -6700,7 +6712,7 @@ packet_batch_per_flow_execute(struct 
packet_batch_per_flow *batch,
 actions = dp_netdev_flow_get_actions(flow);
 
 dp_netdev_execute_actions(pmd, >array, true, >flow,
-  actions->actions, actions->size);
+  actions->actions, actions->size, flow);
 }
 
 static inline void
@@ -6995,7 +7007,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
  * we'll send the packet up twice. */
 dp_packet_batch_init_packet(, packet);
 dp_netdev_execute_actions(pmd, , true, ,
-  actions->data, actions->size);
+  actions->data, actions->size, NULL);
 
 add_actions = put_actions->size ? put_actions : actions;
 if (OVS_LIKELY(error != ENOSPC)) {
@@ -7230,6 +7242,7 @@ dp_netdev_recirculate(struct dp_netdev_pmd_thread *pmd,
 struct dp_netdev_execute_aux {
 struct dp_netdev_pmd_thread *pmd;
 const struct flow *flow;
+const struct dp_netdev_flow *dp_flow;/* for partial action offload */
 };
 
 static void
@@ -7374,7 +7387,7 @@ dp_execute_userspace_action(struct dp_netdev_pmd_thread 
*pmd,
 if (!error || error == ENOSPC) {
 dp_packet_batch_init_packet(, packet);
 dp_netdev_execute_actions(pmd, , should_steal, flow,
-  actions->data, actions->size);
+  actions->data, actions->size, NULL);
 } else if (should_steal) {
 dp_packet_delete(packet);
 COVERAGE_INC(datapath_drop_userspace_action_error);
@@ -7483,6 +7496,7 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
*packets_,
 int type = nl_attr_type(a);
 struct tx_port *p;
 uint32_t packet_count, packets_dropped;
+const struct dp_netdev_flow *dp_flow = aux->dp_flow;
 
 switch ((enum ovs_action_attr)type) {
 case OVS_ACTION_ATTR_OUTPUT:
@@ -7505,9 +7519,20 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
*packets_,
 }
 dp_packet_batch_apply_cutlen(packets_);
 packet_count = 

[ovs-dev] [PATCH v6 6/8] dpif-netdev: Support partial offload of PUSH_VLAN action

2020-07-12 Thread Sriharsha Basavapatna via dev
If the input-port is a vhost-user port and the action is PUSH_VLAN,
offload the action on the egress device if it is offload capable.

Signed-off-by: Sriharsha Basavapatna 
---
 lib/dpif-netdev.c | 32 ++--
 lib/dpif.c|  2 +-
 lib/netdev-offload-dpdk.c | 44 +++
 lib/odp-execute.c | 25 --
 lib/odp-execute.h |  5 -
 5 files changed, 80 insertions(+), 28 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index a0bb5d4e1..53f07fc44 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_bond);
 COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
 COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
 COVERAGE_DEFINE(datapath_skip_tunnel_push);
+COVERAGE_DEFINE(datapath_skip_vlan_push);
 
 /* Protects against changes to 'dp_netdevs'. */
 static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
@@ -7521,6 +7522,19 @@ dp_execute_output_action(struct dp_netdev_pmd_thread 
*pmd,
 return true;
 }
 
+static bool
+dp_netdev_assist_cb(void *dp OVS_UNUSED, const struct nlattr *a)
+{
+enum ovs_action_attr type = nl_attr_type(a);
+
+switch (type) {
+case OVS_ACTION_ATTR_PUSH_VLAN:
+return true;
+default:
+return false;
+}
+}
+
 static void
 dp_execute_lb_output_action(struct dp_netdev_pmd_thread *pmd,
 struct dp_packet_batch *packets_,
@@ -7874,7 +7888,21 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
*packets_,
 pmd->ctx.now);
 break;
 
-case OVS_ACTION_ATTR_PUSH_VLAN:
+case OVS_ACTION_ATTR_PUSH_VLAN: {
+const struct ovs_action_push_vlan *vlan = nl_attr_get(a);
+struct dp_packet *packet;
+
+if (!dp_flow || !dp_flow->partial_actions_offloaded) {
+DP_PACKET_BATCH_FOR_EACH (i, packet, packets_) {
+eth_push_vlan(packet, vlan->vlan_tpid, vlan->vlan_tci);
+}
+} else {
+packet_count = dp_packet_batch_size(packets_);
+COVERAGE_ADD(datapath_skip_vlan_push, packet_count);
+}
+break;
+}
+
 case OVS_ACTION_ATTR_POP_VLAN:
 case OVS_ACTION_ATTR_PUSH_MPLS:
 case OVS_ACTION_ATTR_POP_MPLS:
@@ -7909,7 +7937,7 @@ dp_netdev_execute_actions(struct dp_netdev_pmd_thread 
*pmd,
 struct dp_netdev_execute_aux aux = { pmd, flow, dp_flow };
 
 odp_execute_actions(, packets, should_steal, actions,
-actions_len, dp_execute_cb);
+actions_len, dp_execute_cb, dp_netdev_assist_cb);
 }
 
 struct dp_netdev_ct_dump {
diff --git a/lib/dpif.c b/lib/dpif.c
index 7cac3a629..23c481f70 100644
--- a/lib/dpif.c
+++ b/lib/dpif.c
@@ -1294,7 +1294,7 @@ dpif_execute_with_help(struct dpif *dpif, struct 
dpif_execute *execute)
 
 dp_packet_batch_init_packet(, execute->packet);
 odp_execute_actions(, , false, execute->actions,
-execute->actions_len, dpif_execute_helper_cb);
+execute->actions_len, dpif_execute_helper_cb, NULL);
 return aux.error;
 }
 
diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index 78a71f84a..449333248 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -1622,15 +1622,18 @@ struct action_attr {
 };
 
 /*
- * Maxium number of actions to be parsed while selecting a flow for partial
- * action offload. This number is currently based on the minimum number of
- * attributes seen with the tunnel encap action (clone, tunnel_push, output).
- * This number includes output action to a single egress device (uplink) and
- * supports neither multiple clone() actions nor multiple output actions.
- * This number could change if and when we support other actions or
- * combinations of actions for partial offload.
+ * Maxium number of actions to be parsed while selecting a flow for egress
+ * partial action offload. This number is currently based on the minimum
+ * number of attributes seen with the tunnel encap action (clone, tunnel_push,
+ * output). This number includes output action to a single egress device
+ * (uplink) and supports neither multiple clone() actions nor multiple output
+ * actions.
  */
-#define MAX_ACTION_ATTRS3 /* Max # action attributes supported */
+enum num_action_attr_egress {
+VLAN_PUSH_ATTRS = 2,/* vlan_push, output */
+TUNNEL_PUSH_ATTRS = 3,  /* clone, tunnel_push, output */
+MAX_ACTION_ATTRS_EGRESS = TUNNEL_PUSH_ATTRS
+};
 
 /*
  * This function parses the list of OVS "actions" of length "actions_len",
@@ -1690,7 +1693,7 @@ netdev_offload_dpdk_egress_partial(struct netdev *netdev, 
struct match *match,
struct netdev **egress_netdev,
odp_port_t *egress_port)
 {
-struct action_attr attrs[MAX_ACTION_ATTRS];
+struct action_attr 

[ovs-dev] [PATCH v6 4/8] dpif-netdev: Support flow_get() with partial-action-offload

2020-07-12 Thread Sriharsha Basavapatna via dev
For flows whose actions are partially offloaded in the egress direction,
provide the right netdev from which to fetch statistics.

Reviewed-by: Hemal Shah 
Signed-off-by: Sriharsha Basavapatna 
---
 lib/dpif-netdev.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c75a07c5e..2c20e6d5e 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3197,8 +3197,14 @@ dpif_netdev_get_flow_offload_status(const struct 
dp_netdev *dp,
 return false;
 }
 
-netdev = netdev_ports_get(netdev_flow->flow.in_port.odp_port,
-  dpif_normalize_type(dp->class->type));
+if (netdev_flow->partial_actions_offloaded &&
+netdev_flow->egress_offload_port != ODPP_NONE) {
+netdev = netdev_ports_get(netdev_flow->egress_offload_port,
+  dpif_normalize_type(dp->class->type));
+} else {
+netdev = netdev_ports_get(netdev_flow->flow.in_port.odp_port,
+  dpif_normalize_type(dp->class->type));
+}
 if (!netdev) {
 return false;
 }
-- 
2.25.0.rc2



[ovs-dev] [PATCH 1/3] netdev-offload-dpdk: Set transfer attribute to zero for mark/rss offload

2020-07-10 Thread Sriharsha Basavapatna via dev
The offload layer doesn't initialize the 'transfer' attribute
for mark/rss offload (partial offload). It should be set to 0.

Fixes: 60e778c7533a ("netdev-offload-dpdk: Framework for actions offload.")
Signed-off-by: Sriharsha Basavapatna 
---
 lib/netdev-offload-dpdk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index 26a75f0f2..4c652fd82 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -818,7 +818,8 @@ netdev_offload_dpdk_mark_rss(struct flow_patterns *patterns,
 .group = 0,
 .priority = 0,
 .ingress = 1,
-.egress = 0
+.egress = 0,
+.transfer = 0
 };
 struct rte_flow_error error;
 struct rte_flow *flow;
-- 
2.25.0.rc2



[ovs-dev] [PATCH 3/3] tunnel: Set ECN mask bits only when it is matched in the IP header

2020-07-10 Thread Sriharsha Basavapatna via dev
IP_ECN_MASK is set unconditionally in the mask field for a
tunneled flow. Set this only when the ECN field is matched.

Fixes: abcd4402fec4 ("tunnel: Only un-wildcard the ECN bits for IP traffic")
Signed-off-by: Sriharsha Basavapatna 
---
 ofproto/tunnel.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/ofproto/tunnel.c b/ofproto/tunnel.c
index 03f0ab765..d0a65b430 100644
--- a/ofproto/tunnel.c
+++ b/ofproto/tunnel.c
@@ -455,13 +455,17 @@ tnl_port_send(const struct ofport_dpif *ofport, struct 
flow *flow,
 
 /* ECN fields are always inherited. */
 if (is_ip_any(flow)) {
-wc->masks.nw_tos |= IP_ECN_MASK;
-
 if (IP_ECN_is_ce(flow->nw_tos)) {
 flow->tunnel.ip_tos |= IP_ECN_ECT_0;
 } else {
 flow->tunnel.ip_tos |= flow->nw_tos & IP_ECN_MASK;
 }
+
+if (flow->tunnel.ip_tos & IP_ECN_MASK) {
+wc->masks.nw_tos |= IP_ECN_MASK;
+} else {
+wc->masks.nw_tos &= ~IP_ECN_MASK;
+}
 }
 
 flow->tunnel.flags &= ~(FLOW_TNL_F_MASK & ~FLOW_TNL_PUB_F_MASK);
-- 
2.25.0.rc2



[ovs-dev] [PATCH 2/3] netdev-offload-dpdk: Pass L4 proto-id to match in the L3 rte_flow_item

2020-07-10 Thread Sriharsha Basavapatna via dev
The offload layer clears the L4 protocol mask in the L3 item, when the
L4 item is passed for matching, as an optimization. This can be confusing
while parsing the headers in the PMD. Also, the datapath flow specifies
this field to be matched. This optimization is best left to the PMD.
This patch restores the code to pass the L4 protocol type in L3 match.

Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
Signed-off-by: Sriharsha Basavapatna 
---
 lib/netdev-offload-dpdk.c | 22 --
 1 file changed, 22 deletions(-)

diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index 4c652fd82..165fd1f47 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -596,7 +596,6 @@ static int
 parse_flow_match(struct flow_patterns *patterns,
  const struct match *match)
 {
-uint8_t *next_proto_mask = NULL;
 uint8_t proto = 0;
 
 /* Eth */
@@ -667,7 +666,6 @@ parse_flow_match(struct flow_patterns *patterns,
 /* Save proto for L4 protocol setup. */
 proto = spec->hdr.next_proto_id &
 mask->hdr.next_proto_id;
-next_proto_mask = >hdr.next_proto_id;
 }
 
 if (proto != IPPROTO_ICMP && proto != IPPROTO_UDP  &&
@@ -701,11 +699,6 @@ parse_flow_match(struct flow_patterns *patterns,
 mask->hdr.tcp_flags = ntohs(match->wc.masks.tcp_flags) & 0xff;
 
 add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_TCP, spec, mask);
-
-/* proto == TCP and ITEM_TYPE_TCP, thus no need for proto match. */
-if (next_proto_mask) {
-*next_proto_mask = 0;
-}
 } else if (proto == IPPROTO_UDP) {
 struct rte_flow_item_udp *spec, *mask;
 
@@ -719,11 +712,6 @@ parse_flow_match(struct flow_patterns *patterns,
 mask->hdr.dst_port = match->wc.masks.tp_dst;
 
 add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_UDP, spec, mask);
-
-/* proto == UDP and ITEM_TYPE_UDP, thus no need for proto match. */
-if (next_proto_mask) {
-*next_proto_mask = 0;
-}
 } else if (proto == IPPROTO_SCTP) {
 struct rte_flow_item_sctp *spec, *mask;
 
@@ -737,11 +725,6 @@ parse_flow_match(struct flow_patterns *patterns,
 mask->hdr.dst_port = match->wc.masks.tp_dst;
 
 add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_SCTP, spec, mask);
-
-/* proto == SCTP and ITEM_TYPE_SCTP, thus no need for proto match. */
-if (next_proto_mask) {
-*next_proto_mask = 0;
-}
 } else if (proto == IPPROTO_ICMP) {
 struct rte_flow_item_icmp *spec, *mask;
 
@@ -755,11 +738,6 @@ parse_flow_match(struct flow_patterns *patterns,
 mask->hdr.icmp_code = (uint8_t) ntohs(match->wc.masks.tp_dst);
 
 add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_ICMP, spec, mask);
-
-/* proto == ICMP and ITEM_TYPE_ICMP, thus no need for proto match. */
-if (next_proto_mask) {
-*next_proto_mask = 0;
-}
 }
 
 add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_END, NULL, NULL);
-- 
2.25.0.rc2



[ovs-dev] [PATCH 0/3] netdev datapath offload: misc fixes

2020-07-10 Thread Sriharsha Basavapatna via dev
Hi,

This patchset fixes some issues found during netdev-offload-dpdk testing.

Patch-1: Initialize rte 'transfer' attribute for mark/rss offload.
Patch-2: Pass L4 protocol-id to match in the rte_flow_item.
Patch-3: Set IP_ECN_MASK only when the ECN field is matched.

Thanks,
-Harsha

**

v1:
- Created this patchset using patches 1 & 2, sent separately earlier.
  Please ignore the previous version of these patches.
- Patch-2: Updated "fixes:" tag with the right commit id.
- Added patch-3.

**

Sriharsha Basavapatna (3):
  netdev-offload-dpdk: Set transfer attribute to zero for mark/rss
offload
  netdev-offload-dpdk: Pass L4 proto-id to match in the L3 rte_flow_item
  tunnel: Set ECN mask bits only when it is matched in the IP header

 lib/netdev-offload-dpdk.c | 25 ++---
 ofproto/tunnel.c  |  8 ++--
 2 files changed, 8 insertions(+), 25 deletions(-)

-- 
2.25.0.rc2



Re: [ovs-dev] [PATCH] netdev-offload-dpdk: Pass L4 proto-id to match in the L3 rte_flow_item

2020-07-10 Thread Sriharsha Basavapatna via dev
On Fri, Jul 10, 2020 at 4:01 PM Sriharsha Basavapatna <
sriharsha.basavapa...@broadcom.com> wrote:

>
>
> On Sun, Jul 5, 2020 at 6:00 PM Eli Britstein  wrote:
>
>>
>> On 7/5/2020 2:48 PM, Sriharsha Basavapatna wrote:
>> > The offload layer clears the L4 protocol mask in the L3 item, when the
>> > L4 item is passed for matching, as an optimization. This can be
>> confusing
>> > while parsing the headers in the PMD. Also, the datapath flow specifies
>> > this field to be matched. This optimization is best left to the PMD.
>> > This patch restores the code to pass the L4 protocol type in L3 match.
>> >
>> > Fixes: 900fe00784ca ("netdev-offload-dpdk: Dynamically allocate pattern
>> items.")
>>
>> It's arguable if it's really a fix.
>
> It is better not to ignore a field that is specified to be matched by the
> datapath flow.
>
>
>> I don't see any further information
>> the PMD can use, but it's harmless anyway, so OK by me either with this
>> commit or without.
>
> If you insist it's a fix, this is the correct commit that did it in the
>> first place:
>>
>> e8a2b5bf92bb netdev-dpdk: implement flow offload with rte flow
>>
>
> Thanks, I'll update the "fixes" field in v2.
> -Harsha
>

I'll send v2 of this patch in a patchset with a couple of other fixes.
-Harsha


>
>> > Signed-off-by: Sriharsha Basavapatna <
>> sriharsha.basavapa...@broadcom.com>
>> > ---
>> >   lib/netdev-offload-dpdk.c | 22 --
>> >   1 file changed, 22 deletions(-)
>> >
>> > diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
>> > index 4c652fd82..165fd1f47 100644
>> > --- a/lib/netdev-offload-dpdk.c
>> > +++ b/lib/netdev-offload-dpdk.c
>> > @@ -596,7 +596,6 @@ static int
>> >   parse_flow_match(struct flow_patterns *patterns,
>> >const struct match *match)
>> >   {
>> > -uint8_t *next_proto_mask = NULL;
>> >   uint8_t proto = 0;
>> >
>> >   /* Eth */
>> > @@ -667,7 +666,6 @@ parse_flow_match(struct flow_patterns *patterns,
>> >   /* Save proto for L4 protocol setup. */
>> >   proto = spec->hdr.next_proto_id &
>> >   mask->hdr.next_proto_id;
>> > -next_proto_mask = >hdr.next_proto_id;
>> >   }
>> >
>> >   if (proto != IPPROTO_ICMP && proto != IPPROTO_UDP  &&
>> > @@ -701,11 +699,6 @@ parse_flow_match(struct flow_patterns *patterns,
>> >   mask->hdr.tcp_flags = ntohs(match->wc.masks.tcp_flags) & 0xff;
>> >
>> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_TCP, spec,
>> mask);
>> > -
>> > -/* proto == TCP and ITEM_TYPE_TCP, thus no need for proto
>> match. */
>> > -if (next_proto_mask) {
>> > -*next_proto_mask = 0;
>> > -}
>> >   } else if (proto == IPPROTO_UDP) {
>> >   struct rte_flow_item_udp *spec, *mask;
>> >
>> > @@ -719,11 +712,6 @@ parse_flow_match(struct flow_patterns *patterns,
>> >   mask->hdr.dst_port = match->wc.masks.tp_dst;
>> >
>> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_UDP, spec,
>> mask);
>> > -
>> > -/* proto == UDP and ITEM_TYPE_UDP, thus no need for proto
>> match. */
>> > -if (next_proto_mask) {
>> > -*next_proto_mask = 0;
>> > -}
>> >   } else if (proto == IPPROTO_SCTP) {
>> >   struct rte_flow_item_sctp *spec, *mask;
>> >
>> > @@ -737,11 +725,6 @@ parse_flow_match(struct flow_patterns *patterns,
>> >   mask->hdr.dst_port = match->wc.masks.tp_dst;
>> >
>> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_SCTP, spec,
>> mask);
>> > -
>> > -/* proto == SCTP and ITEM_TYPE_SCTP, thus no need for proto
>> match. */
>> > -if (next_proto_mask) {
>> > -*next_proto_mask = 0;
>> > -}
>> >   } else if (proto == IPPROTO_ICMP) {
>> >   struct rte_flow_item_icmp *spec, *mask;
>> >
>> > @@ -755,11 +738,6 @@ parse_flow_match(struct flow_patterns *patterns,
>> >   mask->hdr.icmp_code = (uint8_t) ntohs(match->wc.masks.tp_dst);
>> >
>> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_ICMP, spec,
>> mask);
>> > -
>> > -/* proto == ICMP and ITEM_TYPE_ICMP, thus no need for proto
>> match. */
>> > -if (next_proto_mask) {
>> > -*next_proto_mask = 0;
>> > -}
>> >   }
>> >
>> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_END, NULL, NULL);
>>
>


Re: [ovs-dev] [PATCH] netdev-offload-dpdk: Set transfer attribute to zero for mark/rss offload

2020-07-10 Thread Sriharsha Basavapatna via dev
Please ignore this patch, I'm resending it in a patchset.

Thanks,
-Harsha

On Mon, Jul 6, 2020 at 6:54 PM Sriharsha Basavapatna <
sriharsha.basavapa...@broadcom.com> wrote:

> A gentle reminder on this patch.
> Thanks,
> -Harsha
>
> On Mon, Jun 29, 2020 at 11:31 PM Sriharsha Basavapatna
>  wrote:
> >
> > The offload layer doesn't initialize the 'transfer' attribute
> > for mark/rss offload (partial offload). It should be set to 0.
> >
> > Fixes: 60e778c7533a ("netdev-offload-dpdk: Framework for actions
> offload.")
> > Signed-off-by: Sriharsha Basavapatna  >
> > ---
> >  lib/netdev-offload-dpdk.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> > index 26a75f0f2..4c652fd82 100644
> > --- a/lib/netdev-offload-dpdk.c
> > +++ b/lib/netdev-offload-dpdk.c
> > @@ -818,7 +818,8 @@ netdev_offload_dpdk_mark_rss(struct flow_patterns
> *patterns,
> >  .group = 0,
> >  .priority = 0,
> >  .ingress = 1,
> > -.egress = 0
> > +.egress = 0,
> > +.transfer = 0
> >  };
> >  struct rte_flow_error error;
> >  struct rte_flow *flow;
> > --
> > 2.25.0.rc2
> >
>


Re: [ovs-dev] [PATCH] netdev-offload-dpdk: Pass L4 proto-id to match in the L3 rte_flow_item

2020-07-10 Thread Sriharsha Basavapatna via dev
On Sun, Jul 5, 2020 at 6:00 PM Eli Britstein  wrote:

>
> On 7/5/2020 2:48 PM, Sriharsha Basavapatna wrote:
> > The offload layer clears the L4 protocol mask in the L3 item, when the
> > L4 item is passed for matching, as an optimization. This can be confusing
> > while parsing the headers in the PMD. Also, the datapath flow specifies
> > this field to be matched. This optimization is best left to the PMD.
> > This patch restores the code to pass the L4 protocol type in L3 match.
> >
> > Fixes: 900fe00784ca ("netdev-offload-dpdk: Dynamically allocate pattern
> items.")
>
> It's arguable if it's really a fix.

It is better not to ignore a field that is specified to be matched by the
datapath flow.


> I don't see any further information
> the PMD can use, but it's harmless anyway, so OK by me either with this
> commit or without.

If you insist it's a fix, this is the correct commit that did it in the
> first place:
>
> e8a2b5bf92bb netdev-dpdk: implement flow offload with rte flow
>

Thanks, I'll update the "fixes" field in v2.
-Harsha

>
> > Signed-off-by: Sriharsha Basavapatna  >
> > ---
> >   lib/netdev-offload-dpdk.c | 22 --
> >   1 file changed, 22 deletions(-)
> >
> > diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> > index 4c652fd82..165fd1f47 100644
> > --- a/lib/netdev-offload-dpdk.c
> > +++ b/lib/netdev-offload-dpdk.c
> > @@ -596,7 +596,6 @@ static int
> >   parse_flow_match(struct flow_patterns *patterns,
> >const struct match *match)
> >   {
> > -uint8_t *next_proto_mask = NULL;
> >   uint8_t proto = 0;
> >
> >   /* Eth */
> > @@ -667,7 +666,6 @@ parse_flow_match(struct flow_patterns *patterns,
> >   /* Save proto for L4 protocol setup. */
> >   proto = spec->hdr.next_proto_id &
> >   mask->hdr.next_proto_id;
> > -next_proto_mask = >hdr.next_proto_id;
> >   }
> >
> >   if (proto != IPPROTO_ICMP && proto != IPPROTO_UDP  &&
> > @@ -701,11 +699,6 @@ parse_flow_match(struct flow_patterns *patterns,
> >   mask->hdr.tcp_flags = ntohs(match->wc.masks.tcp_flags) & 0xff;
> >
> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_TCP, spec, mask);
> > -
> > -/* proto == TCP and ITEM_TYPE_TCP, thus no need for proto
> match. */
> > -if (next_proto_mask) {
> > -*next_proto_mask = 0;
> > -}
> >   } else if (proto == IPPROTO_UDP) {
> >   struct rte_flow_item_udp *spec, *mask;
> >
> > @@ -719,11 +712,6 @@ parse_flow_match(struct flow_patterns *patterns,
> >   mask->hdr.dst_port = match->wc.masks.tp_dst;
> >
> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_UDP, spec, mask);
> > -
> > -/* proto == UDP and ITEM_TYPE_UDP, thus no need for proto
> match. */
> > -if (next_proto_mask) {
> > -*next_proto_mask = 0;
> > -}
> >   } else if (proto == IPPROTO_SCTP) {
> >   struct rte_flow_item_sctp *spec, *mask;
> >
> > @@ -737,11 +725,6 @@ parse_flow_match(struct flow_patterns *patterns,
> >   mask->hdr.dst_port = match->wc.masks.tp_dst;
> >
> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_SCTP, spec,
> mask);
> > -
> > -/* proto == SCTP and ITEM_TYPE_SCTP, thus no need for proto
> match. */
> > -if (next_proto_mask) {
> > -*next_proto_mask = 0;
> > -}
> >   } else if (proto == IPPROTO_ICMP) {
> >   struct rte_flow_item_icmp *spec, *mask;
> >
> > @@ -755,11 +738,6 @@ parse_flow_match(struct flow_patterns *patterns,
> >   mask->hdr.icmp_code = (uint8_t) ntohs(match->wc.masks.tp_dst);
> >
> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_ICMP, spec,
> mask);
> > -
> > -/* proto == ICMP and ITEM_TYPE_ICMP, thus no need for proto
> match. */
> > -if (next_proto_mask) {
> > -*next_proto_mask = 0;
> > -}
> >   }
> >
> >   add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_END, NULL, NULL);
>


Re: [ovs-dev] [PATCH v5 0/7] netdev datapath: Partial action offload

2020-07-09 Thread Sriharsha Basavapatna via dev
Hi Ilya,

I removed the RFC tag on this partial-offload patchset and it is
rebased to the latest master branch. We are targeting this for 2.14.
So just a gentle reminder on this offload patchset.

Thanks,
-Harsha

On Thu, Jul 9, 2020 at 12:17 PM Sriharsha Basavapatna
 wrote:
>
> Hi,
>
> This patchset extends the "Partial HW acceleration" mode to offload a
> part of the action processing to HW, instead of offloading just lookup
> (MARK/RSS), for "vhost-user" ports. This is referred to as "Partial Action
> Offload". This mode does not require SRIOV/switchdev configuration. In this
> mode, forwarding (output) action is still performed by OVS-DPDK SW datapath.
>
> Thanks,
> -Harsha
>
> **
>
> v4-->v5:
>
> - Rebased to the current ovs-master (includes vxlan-encap full offload)
> - Added 2 patches to support partial offload of vlan push/pop actions
>
> v3-->v4:
>
> - Removed mega-ufid mapping code from dpif-netdev (patch #5) and updated the
>   existing mega-ufid mapping code in netdev-offload-dpdk to support partial
>   action offload.
>
> v2-->v3:
>
> - Added more commit log comments in the refactoring patch (#1).
> - Removed wrapper function for should_partial_offload_egress().
> - Removed partial offload check for output action in parse_flow_actions().
> - Changed patch sequence (skip-encap and get-stats before offload patch).
> - Removed locking code (details in email), added inline comments.
> - Moved mega-ufid mapping code from skip-encap (#3) to the offload patch (#5).
>
> v1-->v2:
>
> Fixed review comments from Eli:
> - Revamped action list parsing to reject multiple clone/output actions
> - Updated comments to reflect support for single clone/output action
> - Removed place-holder function for ingress partial action offload
> - Created a separate patch (#2) to query dpdk-vhost netdevs
> - Set transfer attribute to 0 for partial action offload
> - Updated data type of 'dp_flow' in 'dp_netdev_execute_aux'
> - Added a mutex to synchronize between offload and datapath threads
> - Avoid fall back to mark/rss when egress partial offload fails
> - Drop count action for partial action offload
>
> Other changes:
> - Avoid duplicate offload requests for the same mega-ufid (from PMD threads)
> - Added a coverage counter to track pkts with tnl-push partial offloaded
> - Fixed dp_netdev_pmd_remove_flow() to delete partial offloaded flow
>
> **
>
> Sriharsha Basavapatna (7):
>   dpif-netdev: Refactor dp_netdev_flow_offload_put()
>   netdev-dpdk: provide a function to identify dpdk-vhost netdevs
>   dpif-netdev: Skip encap action during datapath execution
>   dpif-netdev: Support flow_get() with partial-action-offload
>   dpif-netdev: Support partial-action-offload of VXLAN encap flow
>   dpif-netdev: Support partial offload of PUSH_VLAN action
>   dpif-netdev: Support partial offload of POP_VLAN action
>
>  lib/dpif-netdev.c | 463 ++
>  lib/netdev-dpdk.c |   5 +
>  lib/netdev-dpdk.h |   1 +
>  lib/netdev-offload-dpdk.c |  96 ++--
>  lib/netdev-offload.h  |   3 +
>  lib/odp-execute.c |  19 +-
>  6 files changed, 502 insertions(+), 85 deletions(-)
>
> --
> 2.25.0.rc2
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v5 5/7] dpif-netdev: Support partial-action-offload of VXLAN encap flow

2020-07-09 Thread Sriharsha Basavapatna via dev
In this patch, we support offloading of VXLAN_ENCAP action for a vhost-user
port (aka "partial-action-offload"). At the time of offloading the flow, we
determine if the flow can be offloaded to an egress device, when the input
port is not offload-capable, such as a vhost-user port. We then offload the
flow with a VXLAN_ENCAP RTE action, to the egress device. We do not add
the OUTPUT RTE action, which indicates to the PMD that it is a partial
action offload request. Note that since the action is being offloaded in the
egress direction, classification is expected to be done by OVS SW datapath
and hence there's no need to offload a MARK action.

If offload succeeds, we save the information in 'dp_netdev_flow' so that
we skip execution of the corresponding action (previous patch) during SW
datapath processing.

Reviewed-by: Hemal Shah 
Signed-off-by: Sriharsha Basavapatna 
---
 lib/dpif-netdev.c | 212 --
 lib/netdev-offload-dpdk.c |  78 ++
 lib/netdev-offload.h  |   2 +
 3 files changed, 262 insertions(+), 30 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e208000a6..ed1b009c2 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2499,10 +2499,174 @@ dp_netdev_append_flow_offload(struct dp_flow_offload_item *offload)
     ovs_mutex_unlock(&dp_flow_offload.mutex);
 }
 
+static int
+partial_offload_egress_flow_del(struct dp_flow_offload_item *offload)
+{
+    struct dp_netdev_pmd_thread *pmd = offload->pmd;
+    struct dp_netdev_flow *flow = offload->flow;
+    const char *dpif_type_str = dpif_normalize_type(pmd->dp->class->type);
+    struct netdev *port;
+    int ret;
+
+    port = netdev_ports_get(flow->egress_offload_port, dpif_type_str);
+    if (!port) {
+        return -1;
+    }
+
+    /* Taking a global 'port_mutex' to fulfill thread safety
+     * restrictions for the netdev-offload-dpdk module. */
+    ovs_mutex_lock(&pmd->dp->port_mutex);
+    ret = netdev_flow_del(port, &flow->mega_ufid, NULL);
+    ovs_mutex_unlock(&pmd->dp->port_mutex);
+    netdev_close(port);
+
+    if (ret) {
+        return ret;
+    }
+
+    flow->egress_offload_port = ODPP_NONE;
+    flow->partial_actions_offloaded = false;
+
+    VLOG_DBG_RL(&rl, "%s: flow: %p mega_ufid: "UUID_FMT" pmd_id: %d\n",
+                __func__, flow, UUID_ARGS((struct uuid *) &flow->mega_ufid),
+                offload->flow->pmd_id);
+    return ret;
+}
+
 static int
 dp_netdev_flow_offload_del(struct dp_flow_offload_item *offload)
 {
-    return mark_to_flow_disassociate(offload->pmd, offload->flow);
+    if (unlikely(offload->flow->partial_actions_offloaded &&
+                 offload->flow->egress_offload_port != ODPP_NONE)) {
+        return partial_offload_egress_flow_del(offload);
+    } else {
+        return mark_to_flow_disassociate(offload->pmd, offload->flow);
+    }
+}
+
+/* Structure to hold a netlink-parsed OVS action. */
+struct action_attr {
+    int type;                /* OVS action type */
+    struct nlattr *action;   /* Action attribute */
+};
+
+/*
+ * Maximum number of actions to be parsed while selecting a flow for partial
+ * action offload. This number is currently based on the minimum number of
+ * attributes seen with the tunnel encap action (clone, tunnel_push, output).
+ * This number includes output action to a single egress device (uplink) and
+ * supports neither multiple clone() actions nor multiple output actions.
+ * This number could change if and when we support other actions or
+ * combinations of actions for partial offload.
+ */
+#define MAX_ACTION_ATTRS 3   /* Max # action attributes supported */
+
+/*
+ * This function parses the list of OVS "actions" of length "actions_len",
+ * and returns them in an array of action "attrs", of size "max_attrs".
+ * The parsed number of actions is returned in "num_attrs". If the number
+ * of actions exceeds "max_attrs", parsing is stopped and E2BIG is returned.
+ * Otherwise, returns success (0).
+ */
+static int
+parse_nlattr_actions(struct nlattr *actions, size_t actions_len,
+                     struct action_attr *attrs, int max_attrs, int *num_attrs)
+{
+    const struct nlattr *a;
+    unsigned int left;
+    int num_actions = 0;
+    int n_attrs = 0;
+    int rc = 0;
+    int type;
+
+    *num_attrs = 0;
+
+    NL_ATTR_FOR_EACH (a, left, actions, actions_len) {
+        type = nl_attr_type(a);
+
+        if (num_actions >= max_attrs) {
+            *num_attrs = num_actions;
+            return E2BIG;
+        }
+
+        attrs[num_actions].type = type;
+        attrs[num_actions].action = a;
+        num_actions++;
+        if (type == OVS_ACTION_ATTR_CLONE) {
+            rc = parse_nlattr_actions(nl_attr_get(a), nl_attr_get_size(a),
+                                      &attrs[num_actions],
+                                      (max_attrs - num_actions), &n_attrs);
+            num_actions += n_attrs;
+            if (rc == E2BIG) {
+                *num_attrs = num_actions;
+                return rc;
+            }
+        }

[ovs-dev] [PATCH v5 6/7] dpif-netdev: Support partial offload of PUSH_VLAN action

2020-07-09 Thread Sriharsha Basavapatna via dev
If the input-port is a vhost-user port and the action is PUSH_VLAN,
offload the action on the egress device if it is offload capable.

Signed-off-by: Sriharsha Basavapatna 
---
 lib/dpif-netdev.c | 59 ---
 lib/odp-execute.c | 11 +
 2 files changed, 42 insertions(+), 28 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index ed1b009c2..bfb016059 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_bond);
 COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
 COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
 COVERAGE_DEFINE(datapath_skip_tunnel_push);
+COVERAGE_DEFINE(datapath_skip_vlan_push);
 
 /* Protects against changes to 'dp_netdevs'. */
 static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
@@ -2551,15 +2552,18 @@ struct action_attr {
 };
 
 /*
- * Maximum number of actions to be parsed while selecting a flow for partial
- * action offload. This number is currently based on the minimum number of
- * attributes seen with the tunnel encap action (clone, tunnel_push, output).
- * This number includes output action to a single egress device (uplink) and
- * supports neither multiple clone() actions nor multiple output actions.
- * This number could change if and when we support other actions or
- * combinations of actions for partial offload.
+ * Maximum number of actions to be parsed while selecting a flow for egress
+ * partial action offload. This number is currently based on the minimum
+ * number of attributes seen with the tunnel encap action (clone, tunnel_push,
+ * output). This number includes output action to a single egress device
+ * (uplink) and supports neither multiple clone() actions nor multiple output
+ * actions.
  */
-#define MAX_ACTION_ATTRS 3   /* Max # action attributes supported */
+enum num_action_attr_egress {
+    VLAN_PUSH_ATTRS = 2,        /* vlan_push, output */
+    TUNNEL_PUSH_ATTRS = 3,      /* clone, tunnel_push, output */
+    MAX_ACTION_ATTRS_EGRESS = TUNNEL_PUSH_ATTRS
+};
 
 /*
  * This function parses the list of OVS "actions" of length "actions_len",
@@ -2620,7 +2624,7 @@ should_partial_offload_egress(struct netdev *in_netdev,
 {
     const char *dpif_type_str =
         dpif_normalize_type(offload->pmd->dp->class->type);
-    struct action_attr attrs[MAX_ACTION_ATTRS];
+    struct action_attr attrs[MAX_ACTION_ATTRS_EGRESS];
     odp_port_t out_port = ODPP_NONE;
     struct netdev *out_netdev;
     int num_attrs = 0;
@@ -2633,26 +2637,31 @@ should_partial_offload_egress(struct netdev *in_netdev,
     }
 
     rc = parse_nlattr_actions(offload->actions, offload->actions_len, attrs,
-                              MAX_ACTION_ATTRS, &num_attrs);
+                              MAX_ACTION_ATTRS_EGRESS, &num_attrs);
     if (rc == E2BIG) {
         /* Action list too big; decline partial offload */
         return false;
     }
 
-    /* Number of attrs expected with tunnel encap action */
-    if (num_attrs < MAX_ACTION_ATTRS) {
+    /* Minimum number of attrs expected (push_vlan) */
+    if (num_attrs < VLAN_PUSH_ATTRS) {
         return false;
     }
 
-    /* Only support clone sub-actions for now, tnl-push specifically. */
-    if (attrs[0].type != OVS_ACTION_ATTR_CLONE ||
-        attrs[1].type != OVS_ACTION_ATTR_TUNNEL_PUSH ||
-        attrs[2].type != OVS_ACTION_ATTR_OUTPUT) {
+    /* Only support clone(tnl-push) or push_vlan actions for now. */
+    if (num_attrs == TUNNEL_PUSH_ATTRS &&
+        (attrs[0].type != OVS_ACTION_ATTR_CLONE ||
+         attrs[1].type != OVS_ACTION_ATTR_TUNNEL_PUSH ||
+         attrs[2].type != OVS_ACTION_ATTR_OUTPUT)) {
+        return false;
+    } else if (num_attrs == VLAN_PUSH_ATTRS &&
+               (attrs[0].type != OVS_ACTION_ATTR_PUSH_VLAN ||
+                attrs[1].type != OVS_ACTION_ATTR_OUTPUT)) {
         return false;
     }
 
     /* Egress partial-offload needs an output action at the end. */
-    out_port = nl_attr_get_odp_port(attrs[2].action);
+    out_port = nl_attr_get_odp_port(attrs[num_attrs - 1].action);
     if (out_port == ODPP_NONE) {
         return false;
     }
@@ -7985,7 +7994,21 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
                                              pmd->ctx.now);
         break;
 
-    case OVS_ACTION_ATTR_PUSH_VLAN:
+    case OVS_ACTION_ATTR_PUSH_VLAN: {
+        const struct ovs_action_push_vlan *vlan = nl_attr_get(a);
+        struct dp_packet *packet;
+
+        if (!dp_flow || !dp_flow->partial_actions_offloaded) {
+            DP_PACKET_BATCH_FOR_EACH (i, packet, packets_) {
+                eth_push_vlan(packet, vlan->vlan_tpid, vlan->vlan_tci);
+            }
+        } else {
+            packet_count = dp_packet_batch_size(packets_);
+            COVERAGE_ADD(datapath_skip_vlan_push, packet_count);
+        }
+        break;
+    }
+
     case OVS_ACTION_ATTR_POP_VLAN:
     case OVS_ACTION_ATTR_PUSH_MPLS:
     case OVS_ACTION_ATTR_POP_MPLS:
diff 

[ovs-dev] [PATCH v5 7/7] dpif-netdev: Support partial offload of POP_VLAN action

2020-07-09 Thread Sriharsha Basavapatna via dev
If the output-port is a vhost-user port and the action is POP_VLAN,
offload the action on the ingress device if it is offload capable.

Note:
- With ingress partial action offload, the flow must be added to the
  mark-to-flow table. Otherwise, since the action (e.g, POP_VLAN) is
  already performed in the HW, the flow can't be located in the datapath
  flow tables and caches.
- The mark action is offloaded implicitly to facilitate this mark-to-flow
  lookup.
- Add a new member 'partial_actions_offloaded' to the info structure passed
  to the offload layer. When the offload layer successfully offloads the
  partial action, it indicates this to the dpif-netdev layer through this
  flag. This is needed by the dpif-netdev layer to distinguish partial
  offload (i.e., classification offload or mark/rss actions) from partial
  actions offload (classification + some actions, e.g. vlan-pop and mark
  actions)
  in ingress direction.

Signed-off-by: Sriharsha Basavapatna 
---
 lib/dpif-netdev.c | 114 --
 lib/netdev-offload-dpdk.c |  26 +++--
 lib/netdev-offload.h  |   1 +
 lib/odp-execute.c |   8 +--
 4 files changed, 132 insertions(+), 17 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index bfb016059..3f295ca33 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -115,6 +115,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
 COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
 COVERAGE_DEFINE(datapath_skip_tunnel_push);
 COVERAGE_DEFINE(datapath_skip_vlan_push);
+COVERAGE_DEFINE(datapath_skip_vlan_pop);
 
 /* Protects against changes to 'dp_netdevs'. */
 static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
@@ -2565,6 +2566,18 @@ enum num_action_attr_egress {
 MAX_ACTION_ATTRS_EGRESS = TUNNEL_PUSH_ATTRS
 };
 
+/*
+ * Maximum number of actions to be parsed while selecting a flow for ingress
+ * partial action offload. This number is currently based on the minimum
+ * number of attributes seen with the vlan pop action (vlan_pop, output).
+ * This number includes output action to a single vhost device (uplink) and
+ * does not support multiple output actions.
+ */
+enum num_action_attr_ingress {
+    VLAN_POP_ATTRS = 2,    /* vlan_pop, output */
+    MAX_ACTION_ATTRS_INGRESS = VLAN_POP_ATTRS
+};
+
 /*
  * This function parses the list of OVS "actions" of length "actions_len",
  * and returns them in an array of action "attrs", of size "max_attrs".
@@ -2678,6 +2691,83 @@ should_partial_offload_egress(struct netdev *in_netdev,
 return true;
 }
 
+/* This function determines if the given flow should be partially offloaded
+ * on the ingress device, when the out-port is not offload-capable like a
+ * vhost-user port. The function currently supports offloading of only the
+ * vlan-pop action.
+ */
+static bool
+should_partial_offload_ingress(struct netdev *in_netdev,
+                               struct dp_flow_offload_item *offload)
+{
+    const char *dpif_type_str =
+        dpif_normalize_type(offload->pmd->dp->class->type);
+    struct action_attr attrs[MAX_ACTION_ATTRS_INGRESS];
+    odp_port_t out_port = ODPP_NONE;
+    struct netdev *out_netdev;
+    int num_attrs = 0;
+    int rc;
+
+    /* Support ingress partial-offload only
+     * when the in-port supports offloads.
+     */
+    if (!netdev_dpdk_flow_api_supported(in_netdev)) {
+        return false;
+    }
+
+    rc = parse_nlattr_actions(offload->actions, offload->actions_len, attrs,
+                              MAX_ACTION_ATTRS_INGRESS, &num_attrs);
+    if (rc == E2BIG) {
+        /* Action list too big; decline partial offload */
+        return false;
+    }
+
+    /* Minimum number of attrs expected (pop_vlan) */
+    if (num_attrs < VLAN_POP_ATTRS) {
+        return false;
+    }
+
+    if (num_attrs == VLAN_POP_ATTRS &&
+        (attrs[0].type != OVS_ACTION_ATTR_POP_VLAN ||
+         attrs[1].type != OVS_ACTION_ATTR_OUTPUT)) {
+        return false;
+    }
+
+    /* Ingress partial-offload needs an output action at the end. */
+    out_port = nl_attr_get_odp_port(attrs[num_attrs - 1].action);
+    if (out_port == ODPP_NONE) {
+        return false;
+    }
+
+    /* Support ingress partial-offload only if out-port is vhost-user.
+     * netdev_ports_get() takes a reference; release it before returning. */
+    out_netdev = netdev_ports_get(out_port, dpif_type_str);
+    if (out_netdev && is_dpdk_vhost_netdev(out_netdev)) {
+        netdev_close(out_netdev);
+        return true;
+    }
+    netdev_close(out_netdev);
+
+    return false;
+}
+
+/* This function determines if the given flow actions can be partially
+ * offloaded. Partial action offload is attempted when either the in-port
+ * or the out-port for the flow is a vhost-user port.
+ */
+static bool
+should_partial_offload(struct netdev *in_netdev,
+                       struct dp_flow_offload_item *offload,
+                       struct netdev **egress_netdev)
+{
+    if (should_partial_offload_ingress(in_netdev, offload)) {
+        return true;
+    } else if (should_partial_offload_egress(in_netdev, offload,
+  

[ovs-dev] [PATCH v5 2/7] netdev-dpdk: provide a function to identify dpdk-vhost netdevs

2020-07-09 Thread Sriharsha Basavapatna via dev
This patch adds a function to determine if a given netdev belongs to the
dpdk-vhost class, using the netdev_class specific data.

Reviewed-by: Hemal Shah 
Signed-off-by: Sriharsha Basavapatna 
---
 lib/netdev-dpdk.c | 5 +
 lib/netdev-dpdk.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 44ebf96da..a2a9bb8e7 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -558,6 +558,11 @@ is_dpdk_class(const struct netdev_class *class)
|| class->destruct == netdev_dpdk_vhost_destruct;
 }
 
+bool is_dpdk_vhost_netdev(struct netdev *netdev)
+{
+    return netdev->netdev_class->destruct == netdev_dpdk_vhost_destruct;
+}
+
 /* DPDK NIC drivers allocate RX buffers at a particular granularity, typically
  * aligned at 1k or less. If a declared mbuf size is not a multiple of this
 * value, insufficient buffers are allocated to accommodate the packet in its
diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
index 848346cb4..ab3c3102e 100644
--- a/lib/netdev-dpdk.h
+++ b/lib/netdev-dpdk.h
@@ -37,6 +37,7 @@ void netdev_dpdk_register(void);
 void free_dpdk_buf(struct dp_packet *);
 
 bool netdev_dpdk_flow_api_supported(struct netdev *);
+bool is_dpdk_vhost_netdev(struct netdev *);
 
 int
 netdev_dpdk_rte_flow_destroy(struct netdev *netdev,
-- 
2.25.0.rc2


