Re: [ovs-dev] [PATCH v3 07/10] netdev-offload-tc: Add recirculation support via tc chains

2019-12-10 Thread Paul Blakey
Hi, sorry for the late reply, I didn't get this email.

On 03.12.2019 16:16, Ilya Maximets wrote:
 > From: Ilya Maximets 
 > On 03.12.2019 14:45, Roi Dayan wrote:
 > > From: Paul Blakey 
 > >
 > > Each recirculation id will create a tc chain, and we translate
 > > the recirculation action to a tc goto chain action.
 > >
 > > We check for kernel support for this by probing OvS Datapath for the
 > > tc recirc id sharing feature. If supported, we can offload rules
 > > that match on recirc_id, and the recirculation action, safely.
 > > ---
 > >
 > > Changelog:
 > > V2->V3:
 > > Merged part of probe for recirc_id support in here to help future git bisect.
 > > Added tunnel released check to avoid bug with mirroring.
 > > Removed cascading condition in netdev_tc_flow_put() check of recirc_id support.
 > >
 > > V1->V2:
 > > moved make_tc_id_chain helper to tc.h as static inline
 > > updated is_tc_id_eq with chain compare instead of find_ufid
 > >
 > > Signed-off-by: Paul Blakey 
 >
 > This tag should go before the '---', otherwise it'll not be part of a
 > commit-message.
 > You may see that checkpatch complains about missing signed-off on some
 > of the patches.

Yeah, we saw it, but since I wanted to remove this changelog for the last
revision it was just easier to manage than git notes :) Will do it for the
next one.

 > > ---
 > >  lib/dpif-netlink.c  |  1 +
 > >  lib/netdev-offload-tc.c | 61 +
 > >  lib/tc.c                | 49 ++-
 > >  lib/tc.h                | 18 ++-
 > >  4 files changed, 112 insertions(+), 17 deletions(-)
 > >
 > > diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
 > > index 92da918544d1..f0e870543ae5 100644
 > > --- a/lib/dpif-netlink.c
 > > +++ b/lib/dpif-netlink.c

[...]

 > > @@ -983,12 +995,6 @@ test_key_and_mask(struct match *match)
 > >  return EOPNOTSUPP;
 > >  }
 > >
 > > -    if (mask->recirc_id && key->recirc_id) {
 > > -        VLOG_DBG_RL(&error_rl, "offloading attribute recirc_id isn't supported");
 > > -        return EOPNOTSUPP;
 > > -    }
 > > -    mask->recirc_id = 0;
 > > -
 > >      if (mask->dp_hash) {
 > >          VLOG_DBG_RL(&error_rl, "offloading attribute dp_hash isn't supported");
 > >          return EOPNOTSUPP;
 > > @@ -1139,6 +1145,25 @@ flower_match_to_tun_opt(struct tc_flower *flower, const struct flow_tnl *tnl,
 > >      flower->mask.tunnel.metadata.present.len = tnl->metadata.present.len;
 > >  }
 > >
 > > +static bool
 > > +recirc_id_sharing_support(struct dpif *dpif)
 > > +{
 > > +    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
 > > +    static bool supported = false;
 > > +    int err;
 > > +
 > > +    if (ovsthread_once_start(&once)) {
 > > +    err = dpif_set_features(dpif, OVS_DP_F_TC_RECIRC_SHARING);
 >
 > I don't think that it's the right thing to do to change datapath
 > configuration from the netdev-offload implementation.  This also requires
 > you to have access to the dpif-provider internals, breaking the OVS
 > architecture of modules. (You may see that this patch set doesn't build at
 > some point because you need to know the internal structure of a dpif
 > instance.)
 >
 > I'd suggest moving this to the common feature probing code, i.e. to
 > ofproto.  After that you may pass supported features via the offload_info
 > structure.
 >
 > Thoughts?
 >

We are calling dpif_set_features(), not accessing dpif internals.

The thing is that this sets a feature, not just probes for one. It enables
a static branch on the kernel side, which we might not want to enable at
all times, and offloading a recirc() action was our hook to know that this
is wanted, as we talked about in the kernel patch.

I'm not sure how to do that from the ofproto layer.
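
For readers following the thread: the quoted hunk above is cut off, so here
is a sketch of how the probe plausibly completes. The once/supported
bookkeeping is the usual OVS ovsthread_once pattern; dpif_set_features()
and OVS_DP_F_TC_RECIRC_SHARING are introduced by this very series, so treat
them as assumptions rather than settled API:

    /* Sketch only: the first call tries to enable OVS_DP_F_TC_RECIRC_SHARING
     * on the datapath and caches the result; later calls return the cached
     * answer. */
    static bool
    recirc_id_sharing_support(struct dpif *dpif)
    {
        static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
        static bool supported = false;

        if (ovsthread_once_start(&once)) {
            /* Setting the feature doubles as the probe: an older kernel
             * rejects the flag, and rules matching on recirc_id then stay
             * in the software datapath. */
            supported = !dpif_set_features(dpif, OVS_DP_F_TC_RECIRC_SHARING);
            ovsthread_once_done(&once);
        }
        return supported;
    }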



Re: [ovs-dev] [PATCH v3 03/10] tc: Introduce tcf_id to specify a tc filter

2019-12-10 Thread Paul Blakey


On 12/10/2019 7:51 PM, Marcelo Ricardo Leitner wrote:
> On Mon, Dec 09, 2019 at 03:09:37PM +, Paul Blakey wrote:
>> On 12/3/2019 3:45 PM, Roi Dayan wrote:
>>> @@ -1428,38 +1419,35 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>>>}
>>>}
>>>
>>> -block_id = get_block_id_from_netdev(netdev);
>> Accidentally removed this line, will post V4 soon with this fix.
>>
>> There is nothing else pending so I welcome more replies :)
> Not from me, but Ilya mentioned an open one at
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-December/365459.html
> (at the end of the email)
>
>Marcelo


Ah, I didn't get that. Will reply, thanks.



Re: [ovs-dev] [PATCHv4] userspace: Add GTP-U support.

2019-12-10 Thread William Tu
On Sun, Dec 08, 2019 at 08:11:19AM +, Roni Bar Yanai wrote:
> 
Hi Roni,

Thanks for your feedback!

> GTP-U header size is not constant; you *must* take into account the flags,
> mainly the sequence. The use of sequence in GTP-U is optional but some
> devices do use it. See the 3GPP definition:
> "For PGW, SGW and eNodeB the usage of sequence numbers in G-PDUs is
> optional, but if GTP-U protocol entities in these nodes are relaying
> G-PDUs to other nodes, they shall relay the sequence numbers as well. An
> RNC, SGSN or GGSN shall reorder out of sequence T-PDUs when in sequence
> delivery is required. This is optional"

I see, thanks.
I was reading slides 9 and 10 from
https://www.slideshare.net/aliirfan04/gtp-overview
and thought the sequence number was not needed.

I will add sequence number support.
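
To make the variable-size point concrete, here is a minimal sketch of the
GTP-U header layout (following 3GPP TS 29.281; the struct, flag names and
helper are illustrative, not taken from the patch):

    #include <stdint.h>
    #include <stddef.h>

    /* Mandatory 8-byte GTP-U header.  If any of the E, S or PN flags is set,
     * a 4-byte optional block (sequence number, N-PDU number, next extension
     * header type) follows, so the header size is not constant. */
    struct gtpu_hdr {
        uint8_t  flags;      /* Version (3 bits), PT, spare, E, S, PN. */
        uint8_t  msgtype;    /* 0xff = G-PDU, 0xfe = End Marker. */
        uint16_t len;        /* Payload length, network byte order. */
        uint32_t teid;       /* Tunnel endpoint identifier. */
    };

    #define GTPU_F_E  0x04   /* Extension header flag. */
    #define GTPU_F_S  0x02   /* Sequence number flag. */
    #define GTPU_F_PN 0x01   /* N-PDU number flag. */

    static inline size_t
    gtpu_hdr_len(const struct gtpu_hdr *gtpu)
    {
        /* The optional fields are all-or-nothing: any one of the three
         * flags pulls in the whole 4-byte block. */
        return sizeof *gtpu
               + ((gtpu->flags & (GTPU_F_E | GTPU_F_S | GTPU_F_PN)) ? 4 : 0);
    }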

> 
> The assumption that GTP-U is only data is not correct; you have some
> signaling, for example END MARKER (like you wrote). This message is sent
> when there is a mobility event between cells, and the message contains a
> TEID and indicates the last packet sent from the origin cell, to prevent
> packet re-ordering as a handover might have packets in flight. So I would
> expect END MARKER to do the exact same forwarding as the GTP-U data.

OK, thanks.

> 
> see inline
> 
> >-----Original Message-----
> >From: dev  On Behalf Of William Tu
> >Sent: Thursday, December 5, 2019 10:44 PM
> >To: d...@openvswitch.org
> >Subject: [ovs-dev] [PATCHv4] userspace: Add GTP-U support.
> >
> >GTP, GPRS Tunneling Protocol, is a group of IP-based communications
> >protocols used to carry general packet radio service (GPRS) within
> >GSM, UMTS and LTE networks.  GTP protocol has two parts: Signalling
> >(GTP-Control, GTP-C) and User data (GTP-User, GTP-U). GTP-C is used
> >for setting up GTP-U protocol, which is an IP-in-UDP tunneling
> >protocol. Usually GTP is used in connecting between base station for
> >radio, Serving Gateway (S-GW), and PDN Gateway (P-GW).
> >
> >This patch implements GTP-U protocol for userspace datapath,
> >supporting only required header fields and G-PDU message type.
> >See spec in:
> >https://tools.ietf.org/html/draft-hmm-dmm-5g-uplane-analysis-00
> >
> >Signed-off-by: Yi Yang 
> >Co-authored-by: Yi Yang 
> >Signed-off-by: William Tu 
> >---
> >v3 -> v4:
> >  - applied Ben's doc revise
> >  - increment FLOW_WC_SEQ to 42
> >  - minor fixes
> >  - travis:
> >https://travis-ci.org/williamtu/ovs-travis/builds/621289678
> >
> >v2 -> v3:
> >  - pick up the code from v2, rebase to master
> >  - many fixes in code, docs, and add more tests
> >  - travis:
> >https://travis-ci.org/williamtu/ovs-travis/builds/619799361
> >
> >v1 -> v2:
> >  - Add new packet_type PT_GTPU_MSG to handle GTP-U signaling message,
> >GTP-U signaling message should be steered into a control plane handler
> >by explicit openflow entry in flow table.
> >  - Fix gtpu_flags and gptu_msgtype set issue.
> >  - Verify communication with kernel GTP implementation from Jiannan
> >Ouyang.
> >  - Fix unit test issue and make sure all the unit tests can pass.
> >  - This patch series add GTP-U tunnel support in DPDK userspace,
> >GTP-U is used for carrying user data within the GPRS core network and
> >between the radio access network and the core network. The user data
> >transported can be packets in any of IPv4, IPv6, or PPP formats.
> >---
> > Documentation/faq/configuration.rst   |  13 +++
> > Documentation/faq/releases.rst|   1 +
> > NEWS  |   3 +
> > datapath/linux/compat/include/linux/openvswitch.h |   2 +
> > include/openvswitch/flow.h|   4 +-
> > include/openvswitch/match.h   |   6 ++
> > include/openvswitch/meta-flow.h   |  28 ++
> > include/openvswitch/packets.h |   4 +-
> > lib/dpif-netlink-rtnl.c   |   5 +
> > lib/dpif-netlink.c|   5 +
> > lib/flow.c|  22 +++--
> > lib/flow.h

Re: [ovs-dev] thundering herd wakeup of handler threads

2019-12-10 Thread Jason Baron via dev



On 12/10/19 5:20 PM, David Ahern wrote:
> On 12/10/19 3:09 PM, Jason Baron wrote:
>> Hi David,
>>
>> The idea is that we try and queue new work to 'idle' threads in an
>> attempt to distribute a workload. Thus, once we find an 'idle' thread we
>> stop waking up other threads. While we are searching the wakeup list for
>> idle threads, we do queue an epoll event to the non-idle threads, this
>> doesn't mean they are woken up. It just means that when they go to
>> epoll_wait() to harvest events from the kernel, if the event is still
>> available it will be reported. If the condition for the event is no
>> longer true (because another thread consumed it), they the event
>> wouldn't be visible. So its a way of load balancing a workload while
>> also reducing the number of wakeups. Its 'exclusive' in the sense that
>> it will stop after it finds the first idle thread.
>>
>> We certainly can employ other wakeup strategies - there was interest
>> (and patches) for a strict 'round robin' but that has not been merged
>> upstream.
>>
>> I would like to better understand the current use case. It sounds like
>> each thread has an epoll file descriptor. And each epoll file descriptor
>> is attached to the same netlink socket. But when that netlink socket gets a
>> packet it causes all the threads to wake up? Are you sure there is just 1
>> netlink socket that all epoll file descriptors are attached to?
>>
> 
> Thanks for the response.
> 
> This is the code in question:
> 
> https://github.com/openvswitch/ovs/blob/branch-2.11/lib/dpif-netlink.c#L492
> 
> Yes, prior to finding the above code reference I had traced it to a
> single socket with all handler threads (71 threads on this 96 cpu box)
> on the wait queue.
> 
> The ovs kernel module is punting a packet to userspace. It generates a
> netlink message and invokes netlink_unicast. This is the stack trace:
> 
> ad09cc02 ttwu_do_wakeup+0x92 ([kernel.kallsyms])
> ad09d945 try_to_wake_up+0x1d5 ([kernel.kallsyms])
> ad257275 pollwake+0x75 ([kernel.kallsyms])
> ad0b58a4 __wake_up_common+0x74 ([kernel.kallsyms])
> ad0b59cc __wake_up_common_lock+0x7c ([kernel.kallsyms])
> ad289ecc ep_poll_wakeup_proc+0x1c ([kernel.kallsyms])
> ad28a4bc ep_call_nested.constprop.18+0xbc ([kernel.kallsyms])
> ad28b0f2 ep_poll_callback+0x172 ([kernel.kallsyms])
> ad0b58a4 __wake_up_common+0x74 ([kernel.kallsyms])
> ad0b59cc __wake_up_common_lock+0x7c ([kernel.kallsyms])
> ad794af9 sock_def_readable+0x39 ([kernel.kallsyms])
> ad7e846e __netlink_sendskb+0x3e ([kernel.kallsyms])
> ad7eb11a netlink_unicast+0x20a ([kernel.kallsyms])
> c07abd44 queue_userspace_packet+0x2d4 ([kernel.kallsyms])
> c07ac330 ovs_dp_upcall+0x50 ([kernel.kallsyms])
> 
> 
> A probe on sock_def_readable shows it is a single socket whose wait
> queue is processed. Eventually ttwu_do_wakeup is invoked 71 times (once
> for each thread). In some cases I see the awakened threads immediately
> running on the target_cpu, while the queue walk continues to wake up all
> threads. Only the first one is going to handle the packet so the rest of
> the wakeups are just noise.
> 
> On this system in just a 1-second interval I see this sequence play out
> 400+ times.
> 


That stack trace is interesting. I *think* it means you are doing
select() or poll() on the epoll file descriptors? I say that because I
see 'pollwake()' in the stack. Notice that pollwake() is only in
fs/select.c.

Thanks,

-jason


Re: [ovs-dev] thundering herd wakeup of handler threads

2019-12-10 Thread Jason Baron via dev



On 12/10/19 4:00 PM, David Ahern wrote:
> [ adding Jason as author of the patch that added the epoll exclusive flag ]
> 
> On 12/10/19 12:37 PM, Matteo Croce wrote:
>> On Tue, Dec 10, 2019 at 8:13 PM David Ahern  wrote:
>>>
>>> Hi Matteo:
>>>
>>> On a hypervisor running a 4.14.91 kernel and OVS 2.11 I am seeing a
>>> thundering herd wake up problem. Every packet punted to userspace wakes
>>> up every one of the handler threads. On a box with 96 cpus, there are 71
>>> handler threads which means 71 process wakeups for every packet punted.
>>>
>>> This is really easy to see, just watch sched:sched_wakeup tracepoints.
>>> With a few extra probes:
>>>
>>> perf probe sock_def_readable sk=%di
>>> perf probe ep_poll_callback wait=%di mode=%si sync=%dx key=%cx
>>> perf probe __wake_up_common wq_head=%di mode=%si nr_exclusive=%dx
>>> wake_flags=%cx key=%8
>>>
>>> you can see there is a single netlink socket and its wait queue contains
>>> an entry for every handler thread.
>>>
>>> This does not happen with the 2.7.3 version. Roaming commits, it appears
>>> that the change in behavior comes from this commit:
>>>
>>> commit 69c51582ff786a68fc325c1c50624715482bc460
>>> Author: Matteo Croce 
>>> Date:   Tue Sep 25 10:51:05 2018 +0200
>>>
>>> dpif-netlink: don't allocate per thread netlink sockets
>>>
>>>
>>> Is this a known problem?
>>>
>>> David
>>>
>>
>> Hi David,
>>
>> before my patch, vswitchd created NxM sockets, N being the number of
>> ports and M the number of active cores, because every thread opens a
>> netlink socket per port.
>>
>> With my patch, a pool is created with N sockets, one per port, and all
>> the threads poll the same list with the EPOLLEXCLUSIVE flag.
>> As the name suggests, EPOLLEXCLUSIVE lets the kernel wake up only one
>> of the waiting threads.
>>
>> I'm not aware of this problem, but it goes against the intended
>> behaviour of EPOLLEXCLUSIVE.
>> The flag has existed since Linux 4.5; can you check that it's passed
>> correctly to epoll()?
>>
> 
> This the commit that added the EXCLUSIVE flag:
> 
> commit df0108c5da561c66c333bb46bfe3c1fc65905898
> Author: Jason Baron 
> Date:   Wed Jan 20 14:59:24 2016 -0800
> 
> epoll: add EPOLLEXCLUSIVE flag
> 
> 
> The commit message acknowledges that multiple threads can still be awakened:
> 
> "The implementation walks the list of exclusive waiters, and queues an
> event to each epfd, until it finds the first waiter that has threads
> blocked on it via epoll_wait().  The idea is to search for threads which
> are idle and ready to process the wakeup events.  Thus, we queue an
> event to at least 1 epfd, but may still potentially queue an event to
> all epfds that are attached to the shared fd source."
> 
> To me that means all idle handler threads are going to be awakened on
> each upcall message even though only 1 is needed to handle the message.
> 
> Jason: What was the rationale behind the exclusive flag that still wakes
> up more than 1 waiter? In the case of OVS and vswitchd I am seeing all N
> handler threads awakened on every single event which is a horrible
> scaling property.
> 

Hi David,

The idea is that we try and queue new work to 'idle' threads in an
attempt to distribute a workload. Thus, once we find an 'idle' thread we
stop waking up other threads. While we are searching the wakeup list for
idle threads, we do queue an epoll event to the non-idle threads, this
doesn't mean they are woken up. It just means that when they go to
epoll_wait() to harvest events from the kernel, if the event is still
available it will be reported. If the condition for the event is no
longer true (because another thread consumed it), then the event
wouldn't be visible. So it's a way of load balancing a workload while
also reducing the number of wakeups. It's 'exclusive' in the sense that
it will stop after it finds the first idle thread.

We certainly can employ other wakeup strategies - there was interest
(and patches) for a strict 'round robin' but that has not been merged
upstream.

I would like to better understand the current use case. It sounds like
each thread has an epoll file descriptor. And each epoll file descriptor
is attached to the same netlink socket. But when that netlink socket gets a
packet it causes all the threads to wake up? Are you sure there is just 1
netlink socket that all epoll file descriptors are attached to?

Thanks,

-Jason
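
For readers following the thread, the setup under discussion reduces to
roughly the following (an illustrative sketch, not the actual
dpif-netlink.c code; each handler thread owns one epoll instance, all
registered against the same netlink socket):

    #include <sys/epoll.h>
    #include <unistd.h>

    /* Called once per handler thread.  On a wakeup the kernel walks the
     * exclusive waiters, queueing the event to each epfd it passes, and
     * stops at the first epfd with a thread blocked in epoll_wait(). */
    static int
    watch_upcall_fd(int nl_sock_fd)
    {
        struct epoll_event ev = {
            .events = EPOLLIN | EPOLLEXCLUSIVE, /* EPOLLEXCLUSIVE: Linux >= 4.5. */
            .data.fd = nl_sock_fd,
        };
        int epfd = epoll_create1(0);

        if (epfd < 0) {
            return -1;
        }
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, nl_sock_fd, &ev) < 0) {
            close(epfd);
            return -1;
        }
        return epfd;
    }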


Re: [ovs-dev] thundering herd wakeup of handler threads

2019-12-10 Thread David Ahern
On 12/10/19 3:09 PM, Jason Baron wrote:
> Hi David,
> 
> The idea is that we try and queue new work to 'idle' threads in an
> attempt to distribute a workload. Thus, once we find an 'idle' thread we
> stop waking up other threads. While we are searching the wakeup list for
> idle threads, we do queue an epoll event to the non-idle threads, this
> doesn't mean they are woken up. It just means that when they go to
> epoll_wait() to harvest events from the kernel, if the event is still
> available it will be reported. If the condition for the event is no
> longer true (because another thread consumed it), then the event
> wouldn't be visible. So it's a way of load balancing a workload while
> also reducing the number of wakeups. It's 'exclusive' in the sense that
> it will stop after it finds the first idle thread.
> 
> We certainly can employ other wakeup strategies - there was interest
> (and patches) for a strict 'round robin' but that has not been merged
> upstream.
> 
> I would like to better understand the current use case. It sounds like
> each thread has an epoll file descriptor. And each epoll file descriptor
> is attached to the same netlink socket. But when that netlink socket gets a
> packet it causes all the threads to wake up? Are you sure there is just 1
> netlink socket that all epoll file descriptors are attached to?
> 

Thanks for the response.

This is the code in question:

https://github.com/openvswitch/ovs/blob/branch-2.11/lib/dpif-netlink.c#L492

Yes, prior to finding the above code reference I had traced it to a
single socket with all handler threads (71 threads on this 96 cpu box)
on the wait queue.

The ovs kernel module is punting a packet to userspace. It generates a
netlink message and invokes netlink_unicast. This is the stack trace:

ad09cc02 ttwu_do_wakeup+0x92 ([kernel.kallsyms])
ad09d945 try_to_wake_up+0x1d5 ([kernel.kallsyms])
ad257275 pollwake+0x75 ([kernel.kallsyms])
ad0b58a4 __wake_up_common+0x74 ([kernel.kallsyms])
ad0b59cc __wake_up_common_lock+0x7c ([kernel.kallsyms])
ad289ecc ep_poll_wakeup_proc+0x1c ([kernel.kallsyms])
ad28a4bc ep_call_nested.constprop.18+0xbc ([kernel.kallsyms])
ad28b0f2 ep_poll_callback+0x172 ([kernel.kallsyms])
ad0b58a4 __wake_up_common+0x74 ([kernel.kallsyms])
ad0b59cc __wake_up_common_lock+0x7c ([kernel.kallsyms])
ad794af9 sock_def_readable+0x39 ([kernel.kallsyms])
ad7e846e __netlink_sendskb+0x3e ([kernel.kallsyms])
ad7eb11a netlink_unicast+0x20a ([kernel.kallsyms])
c07abd44 queue_userspace_packet+0x2d4 ([kernel.kallsyms])
c07ac330 ovs_dp_upcall+0x50 ([kernel.kallsyms])


A probe on sock_def_readable shows it is a single socket whose wait
queue is processed. Eventually ttwu_do_wakeup is invoked 71 times (once
for each thread). In some cases I see the awakened threads immediately
running on the target_cpu, while the queue walk continues to wake up all
threads. Only the first one is going to handle the packet so the rest of
the wakeups are just noise.

On this system in just a 1-second interval I see this sequence play out
400+ times.


Re: [ovs-dev] thundering herd wakeup of handler threads

2019-12-10 Thread David Ahern
On 12/10/19 2:20 PM, Matteo Croce wrote:
> 
> Before this patch (which unfortunately is needed to avoid -EMFILE
> errors with many ports), how many sockets are awakened when an ARP is
> received?
> 

on systems using 2.7.3 I see only a single handler thread awakened on
upcalls.

Yes, I saw the commit message on the change. At the moment, the socket
descriptor scaling problem has morphed into a thundering herd scheduling
problem with the added overhead of waking up all of the threads in the
packet path.


Re: [ovs-dev] thundering herd wakeup of handler threads

2019-12-10 Thread Matteo Croce
On Tue, Dec 10, 2019 at 10:00 PM David Ahern  wrote:
>
> [ adding Jason as author of the patch that added the epoll exclusive flag ]
>
> On 12/10/19 12:37 PM, Matteo Croce wrote:
> > On Tue, Dec 10, 2019 at 8:13 PM David Ahern  wrote:
> >>
> >> Hi Matteo:
> >>
> >> On a hypervisor running a 4.14.91 kernel and OVS 2.11 I am seeing a
> >> thundering herd wake up problem. Every packet punted to userspace wakes
> >> up every one of the handler threads. On a box with 96 cpus, there are 71
> >> handler threads which means 71 process wakeups for every packet punted.
> >>
> >> This is really easy to see, just watch sched:sched_wakeup tracepoints.
> >> With a few extra probes:
> >>
> >> perf probe sock_def_readable sk=%di
> >> perf probe ep_poll_callback wait=%di mode=%si sync=%dx key=%cx
> >> perf probe __wake_up_common wq_head=%di mode=%si nr_exclusive=%dx
> >> wake_flags=%cx key=%8
> >>
> >> you can see there is a single netlink socket and its wait queue contains
> >> an entry for every handler thread.
> >>
> >> This does not happen with the 2.7.3 version. Roaming commits, it appears
> >> that the change in behavior comes from this commit:
> >>
> >> commit 69c51582ff786a68fc325c1c50624715482bc460
> >> Author: Matteo Croce 
> >> Date:   Tue Sep 25 10:51:05 2018 +0200
> >>
> >> dpif-netlink: don't allocate per thread netlink sockets
> >>
> >>
> >> Is this a known problem?
> >>
> >> David
> >>
> >
> > Hi David,
> >
> > before my patch, vswitchd created NxM sockets, N being the number of
> > ports and M the number of active cores, because every thread opens a
> > netlink socket per port.
> >
> > With my patch, a pool is created with N sockets, one per port, and all
> > the threads poll the same list with the EPOLLEXCLUSIVE flag.
> > As the name suggests, EPOLLEXCLUSIVE lets the kernel wake up only one
> > of the waiting threads.
> >
> > I'm not aware of this problem, but it goes against the intended
> > behaviour of EPOLLEXCLUSIVE.
> > The flag has existed since Linux 4.5; can you check that it's passed
> > correctly to epoll()?
> >
>
> This the commit that added the EXCLUSIVE flag:
>
> commit df0108c5da561c66c333bb46bfe3c1fc65905898
> Author: Jason Baron 
> Date:   Wed Jan 20 14:59:24 2016 -0800
>
> epoll: add EPOLLEXCLUSIVE flag
>
>
> The commit message acknowledges that multiple threads can still be awakened:
>
> "The implementation walks the list of exclusive waiters, and queues an
> event to each epfd, until it finds the first waiter that has threads
> blocked on it via epoll_wait().  The idea is to search for threads which
> are idle and ready to process the wakeup events.  Thus, we queue an
> event to at least 1 epfd, but may still potentially queue an event to
> all epfds that are attached to the shared fd source."
>
> To me that means all idle handler threads are going to be awakened on
> each upcall message even though only 1 is needed to handle the message.
>
> Jason: What was the rationale behind the exclusive flag that still wakes
> up more than 1 waiter? In the case of OVS and vswitchd I am seeing all N
> handler threads awakened on every single event which is a horrible
> scaling property.
>

Actually, I didn't look at that commit message, but I read the
epoll_ctl manpage which says:

"When a wakeup event occurs and multiple epoll file descriptors
are attached to the same target file using EPOLLEXCLUSIVE, one
or more of the epoll file descriptors will receive an event
with epoll_wait(2).  The default in this scenario (when
EPOLLEXCLUSIVE is not set) is for all epoll file descriptors
to receive an event.  EPOLLEXCLUSIVE is thus useful for
avoiding thundering herd problems in certain scenarios."

I'd expect "one or more" to be probably greater than 1, but still much
lower than all.

Before this patch (which unfortunately is needed to avoid -EMFILE
errors with many ports), how many sockets are awakened when an ARP is
received?

Regards,

-- 
Matteo Croce
per aspera ad upstream



Re: [ovs-dev] thundering herd wakeup of handler threads

2019-12-10 Thread David Ahern
[ adding Jason as author of the patch that added the epoll exclusive flag ]

On 12/10/19 12:37 PM, Matteo Croce wrote:
> On Tue, Dec 10, 2019 at 8:13 PM David Ahern  wrote:
>>
>> Hi Matteo:
>>
>> On a hypervisor running a 4.14.91 kernel and OVS 2.11 I am seeing a
>> thundering herd wake up problem. Every packet punted to userspace wakes
>> up every one of the handler threads. On a box with 96 cpus, there are 71
>> handler threads which means 71 process wakeups for every packet punted.
>>
>> This is really easy to see, just watch sched:sched_wakeup tracepoints.
>> With a few extra probes:
>>
>> perf probe sock_def_readable sk=%di
>> perf probe ep_poll_callback wait=%di mode=%si sync=%dx key=%cx
>> perf probe __wake_up_common wq_head=%di mode=%si nr_exclusive=%dx
>> wake_flags=%cx key=%8
>>
>> you can see there is a single netlink socket and its wait queue contains
>> an entry for every handler thread.
>>
>> This does not happen with the 2.7.3 version. Roaming commits, it appears
>> that the change in behavior comes from this commit:
>>
>> commit 69c51582ff786a68fc325c1c50624715482bc460
>> Author: Matteo Croce 
>> Date:   Tue Sep 25 10:51:05 2018 +0200
>>
>> dpif-netlink: don't allocate per thread netlink sockets
>>
>>
>> Is this a known problem?
>>
>> David
>>
> 
> Hi David,
> 
> > before my patch, vswitchd created NxM sockets, N being the number of
> > ports and M the number of active cores, because every thread opens a
> > netlink socket per port.
> >
> > With my patch, a pool is created with N sockets, one per port, and all
> > the threads poll the same list with the EPOLLEXCLUSIVE flag.
> > As the name suggests, EPOLLEXCLUSIVE lets the kernel wake up only one
> > of the waiting threads.
> >
> > I'm not aware of this problem, but it goes against the intended
> > behaviour of EPOLLEXCLUSIVE.
> > The flag has existed since Linux 4.5; can you check that it's passed
> > correctly to epoll()?
> 

This the commit that added the EXCLUSIVE flag:

commit df0108c5da561c66c333bb46bfe3c1fc65905898
Author: Jason Baron 
Date:   Wed Jan 20 14:59:24 2016 -0800

epoll: add EPOLLEXCLUSIVE flag


The commit message acknowledges that multiple threads can still be awakened:

"The implementation walks the list of exclusive waiters, and queues an
event to each epfd, until it finds the first waiter that has threads
blocked on it via epoll_wait().  The idea is to search for threads which
are idle and ready to process the wakeup events.  Thus, we queue an
event to at least 1 epfd, but may still potentially queue an event to
all epfds that are attached to the shared fd source."

To me that means all idle handler threads are going to be awakened on
each upcall message even though only 1 is needed to handle the message.

Jason: What was the rationale behind the exclusive flag that still wakes
up more than 1 waiter? In the case of OVS and vswitchd I am seeing all N
handler threads awakened on every single event which is a horrible
scaling property.


Re: [ovs-dev] [PATCH V3 18/19] netdev-offload-dpdk: Support offload of clone tnl_push/output actions

2019-12-10 Thread Ilya Maximets
On 08.12.2019 14:23, Eli Britstein wrote:
> Signed-off-by: Eli Britstein 
> Reviewed-by: Oz Shlomo 
> ---
>  Documentation/howto/dpdk.rst |  2 ++
>  NEWS |  4 ++--
>  lib/netdev-dpdk.c| 14 +
>  lib/netdev-offload-dpdk.c| 48 +
>  4 files changed, 66 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index b9be83002..b9aaf26bf 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -394,6 +394,8 @@ Supported actions for hardware offload are:
>  - Drop.
>  - Modification of Ethernet (mod_dl_src/mod_dl_dst).
>  - Modification of IPv4 (mod_nw_src/mod_nw_dst/mod_nw_ttl).
> +- Clone with tnl_push and output (a single clone, as the last action,
> +  with a single output, as the last nested clone action).
>  
>  Further Reading
>  ---
> diff --git a/NEWS b/NEWS
> index 297ca6fff..25125801b 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -26,8 +26,8 @@ Post-v2.12.0
>   * DPDK ring ports (dpdkr) are deprecated and will be removed in next
> releases.
>   * Add support for DPDK 19.11.
> - * Add hardware offload support for output, drop and set actions of
> -   MAC and IPv4 (experimental).
> + * Add hardware offload support for output, drop, set of MAC, IPv4
> +   and tunnel push-output actions (experimental).
>  
>  v2.12.0 - 03 Sep 2019
>  -
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index c4606363d..14f5d6f3b 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -4762,6 +4762,20 @@ ds_put_flow_action(struct ds *s, const struct rte_flow_action *actions)
>  } else {
>  ds_put_cstr(s, "  Set-ttl = null\n");
>  }
> +} else if (actions->type == RTE_FLOW_ACTION_TYPE_RAW_ENCAP) {
> +const struct rte_flow_action_raw_encap *raw_encap = actions->conf;
> +
> +ds_put_cstr(s, "rte flow raw-encap action:\n");
> +if (raw_encap) {
> +ds_put_format(s,
> +  "  Raw-encap: size=%ld\n",
> +  raw_encap->size);
> +ds_put_format(s,
> +  "  Raw-encap: encap=\n");

Why so multilined?

> +ds_put_hex_dump(s, raw_encap->data, raw_encap->size, 0, false);
> +} else {
> +ds_put_cstr(s, "  Raw-encap = null\n");
> +}
>  } else {
>  ds_put_format(s, "unknown rte flow action (%d)\n", actions->type);
>  }
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index 3fccac956..b0403a085 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -634,6 +634,45 @@ parse_set_actions(struct flow_actions *actions,
>  return 0;
>  }
>  
> +static int
> +parse_clone_actions(struct netdev *netdev,
> +struct flow_actions *actions,
> +const struct nlattr *clone_actions,
> +const size_t clone_actions_len,
> +struct offload_info *info)
> +{
> +const struct nlattr *ca;
> +unsigned int cleft;
> +
> +NL_ATTR_FOR_EACH_UNSAFE (ca, cleft, clone_actions, clone_actions_len) {
> +int clone_type = nl_attr_type(ca);
> +
> +if (clone_type == OVS_ACTION_ATTR_TUNNEL_PUSH) {
> +const struct ovs_action_push_tnl *tnl_push = nl_attr_get(ca);
> +struct rte_flow_action_raw_encap *raw_encap =
> +xzalloc(sizeof *raw_encap);
> +
> +raw_encap->data = (uint8_t *)tnl_push->header;
> +raw_encap->preserve = NULL;
> +raw_encap->size = tnl_push->header_len;
> +
> +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RAW_ENCAP,
> +raw_encap);
> +} else if (clone_type == OVS_ACTION_ATTR_OUTPUT) {
> +if (!(cleft <= NLA_ALIGN(ca->nla_len)) ||

The same question as for the simple output action:
Why should it be the last one here?  I guess, any combination of
encaps and outputs should be valid here.  Isn't it?

> +add_output_action(netdev, actions, ca, info)) {
> +return -1;
> +}
> +} else {
> +VLOG_DBG_RL(&error_rl,
> +"Unsupported clone action. clone_type=%d", clone_type);

Maybe, "Unsupported nested action inside clone(), action type: %d" ?

> +return -1;
> +}
> +}
> +
> +return 0;
> +}
> +
>  static int
>  parse_flow_actions(struct netdev *netdev,
> struct flow_actions *actions,
> @@ -661,6 +700,15 @@ parse_flow_actions(struct netdev *netdev,
>masked)) {
>  return -1;
>  }
> +} else if (nl_attr_type(nla) == OVS_ACTION_ATTR_CLONE) {
> +const struct nlattr *clone_actions = nl_attr_get(nla);
> +size_t clone_actions_len = 

Re: [ovs-dev] thundering herd wakeup of handler threads

2019-12-10 Thread Matteo Croce
On Tue, Dec 10, 2019 at 8:41 PM David Ahern  wrote:
>
> On 12/10/19 12:37 PM, Matteo Croce wrote:
> > On Tue, Dec 10, 2019 at 8:13 PM David Ahern  wrote:
> >>
> >> Hi Matteo:
> >>
> >> On a hypervisor running a 4.14.91 kernel and OVS 2.11 I am seeing a
> >> thundering herd wake up problem. Every packet punted to userspace wakes
> >> up every one of the handler threads. On a box with 96 cpus, there are 71
> >> handler threads which means 71 process wakeups for every packet punted.
> >>
> >> This is really easy to see, just watch sched:sched_wakeup tracepoints.
> >> With a few extra probes:
> >>
> >> perf probe sock_def_readable sk=%di
> >> perf probe ep_poll_callback wait=%di mode=%si sync=%dx key=%cx
> >> perf probe __wake_up_common wq_head=%di mode=%si nr_exclusive=%dx
> >> wake_flags=%cx key=%8
> >>
> >> you can see there is a single netlink socket and its wait queue contains
> >> an entry for every handler thread.
> >>
> >> This does not happen with the 2.7.3 version. Roaming commits, it appears
> >> that the change in behavior comes from this commit:
> >>
> >> commit 69c51582ff786a68fc325c1c50624715482bc460
> >> Author: Matteo Croce 
> >> Date:   Tue Sep 25 10:51:05 2018 +0200
> >>
> >> dpif-netlink: don't allocate per thread netlink sockets
> >>
> >>
> >> Is this a known problem?
> >>
> >> David
> >>
> >
> > Hi David,
> >
> > before my patch, vswitchd created NxM sockets, N being the number of
> > ports and M the number of active cores, because every thread opens a
> > netlink socket per port.
> >
> > With my patch, a pool is created with N sockets, one per port, and all
> > the threads poll the same list with the EPOLLEXCLUSIVE flag.
> > As the name suggests, EPOLLEXCLUSIVE lets the kernel wake up only one
> > of the waiting threads.
> >
> > I'm not aware of this problem, but it goes against the intended
> > behaviour of EPOLLEXCLUSIVE.
> > The flag has existed since Linux 4.5; can you check that it's passed
> > correctly to epoll()?
> >
>
> I get the theory, but the reality is that all threads are awakened.
> Also, it is not limited to the 4.14 kernel; I see the same behavior with
> 5.4.
>

So all threads are awakened, even if there is only one upcall packet to read?
This is not good; epoll() should wake just one thread at a time, as the
manpage says.

I have to check this; do you have a minimal setup? How do you generate
upcalls, is it BUM traffic or via action=userspace?

Bye,
-- 
Matteo Croce
per aspera ad upstream



[ovs-dev] Re. Guten Tag

2019-12-10 Thread Mr. David Raymond


Dearest Beloved,

I am David Raymon, a Portuguese citizen. I searched and came across your
e-mail, so I decided to write to you to see whether your e-mail is genuine. I
have been diagnosed with esophageal cancer. It has defied all forms of medical
treatment, and now I have only a few months left to live.

I am very rich but was never generous; I have given most of my fortune to my
immediate family members. I have decided to donate alms to charities. For
health reasons I can no longer do this myself. I once asked family members to
donate some money to charities; they refused and kept the money. I have a huge
cash deposit of eighteen million dollars with a security company in America. I
would like you to help me collect this deposit and send it to charities. You
will take 30% of these funds for your assistance. I would like to ask you to
confirm receipt of this e-mail as soon as possible and to treat it with
absolute confidentiality and sincerity.

For security reasons, please reply to my private e-mail address
fabiano.ron5...@gmail.com

David Raymon


Re: [ovs-dev] thundering herd wakeup of handler threads

2019-12-10 Thread David Ahern
On 12/10/19 12:37 PM, Matteo Croce wrote:
> On Tue, Dec 10, 2019 at 8:13 PM David Ahern  wrote:
>>
>> Hi Matteo:
>>
>> On a hypervisor running a 4.14.91 kernel and OVS 2.11 I am seeing a
>> thundering herd wake up problem. Every packet punted to userspace wakes
>> up every one of the handler threads. On a box with 96 cpus, there are 71
>> handler threads which means 71 process wakeups for every packet punted.
>>
>> This is really easy to see, just watch sched:sched_wakeup tracepoints.
>> With a few extra probes:
>>
>> perf probe sock_def_readable sk=%di
>> perf probe ep_poll_callback wait=%di mode=%si sync=%dx key=%cx
>> perf probe __wake_up_common wq_head=%di mode=%si nr_exclusive=%dx
>> wake_flags=%cx key=%8
>>
>> you can see there is a single netlink socket and its wait queue contains
>> an entry for every handler thread.
>>
>> This does not happen with the 2.7.3 version. Roaming commits, it appears
>> that the change in behavior comes from this commit:
>>
>> commit 69c51582ff786a68fc325c1c50624715482bc460
>> Author: Matteo Croce 
>> Date:   Tue Sep 25 10:51:05 2018 +0200
>>
>> dpif-netlink: don't allocate per thread netlink sockets
>>
>>
>> Is this a known problem?
>>
>> David
>>
> 
> Hi David,
> 
> before my patch, vswitchd created NxM sockets, N being the number of
> ports and M the number of active cores, because every thread opens a
> netlink socket per port.
> 
> With my patch, a pool is created with N sockets, one per port, and all
> the threads poll the same list with the EPOLLEXCLUSIVE flag.
> As the name suggests, EPOLLEXCLUSIVE lets the kernel wake up only one
> of the waiting threads.
> 
> I'm not aware of this problem, but it goes against the intended
> behaviour of EPOLLEXCLUSIVE.
> The flag has existed since Linux 4.5; can you check that it's passed
> correctly to epoll()?
> 

I get the theory, but the reality is that all threads are awakened.
Also, it is not limited to the 4.14 kernel; I see the same behavior with
5.4.


Re: [ovs-dev] thundering herd wakeup of handler threads

2019-12-10 Thread Matteo Croce
On Tue, Dec 10, 2019 at 8:13 PM David Ahern  wrote:
>
> Hi Matteo:
>
> On a hypervisor running a 4.14.91 kernel and OVS 2.11 I am seeing a
> thundering herd wake up problem. Every packet punted to userspace wakes
> up every one of the handler threads. On a box with 96 cpus, there are 71
> handler threads which means 71 process wakeups for every packet punted.
>
> This is really easy to see, just watch sched:sched_wakeup tracepoints.
> With a few extra probes:
>
> perf probe sock_def_readable sk=%di
> perf probe ep_poll_callback wait=%di mode=%si sync=%dx key=%cx
> perf probe __wake_up_common wq_head=%di mode=%si nr_exclusive=%dx
> wake_flags=%cx key=%8
>
> you can see there is a single netlink socket and its wait queue contains
> an entry for every handler thread.
>
> This does not happen with the 2.7.3 version. Roaming commits, it appears
> that the change in behavior comes from this commit:
>
> commit 69c51582ff786a68fc325c1c50624715482bc460
> Author: Matteo Croce 
> Date:   Tue Sep 25 10:51:05 2018 +0200
>
> dpif-netlink: don't allocate per thread netlink sockets
>
>
> Is this a known problem?
>
> David
>

Hi David,

before my patch, vswitchd created NxM sockets, N being the number of
ports and M the number of active cores, because every thread opens a
netlink socket per port.

With my patch, a pool is created with N sockets, one per port, and all
the threads poll the same list with the EPOLLEXCLUSIVE flag.
As the name suggests, EPOLLEXCLUSIVE lets the kernel wake up only one
of the waiting threads.

I'm not aware of this problem, but it goes against the intended
behaviour of EPOLLEXCLUSIVE.
The flag has existed since Linux 4.5; can you check that it's passed
correctly to epoll()?

Bye,

-- 
Matteo Croce
per aspera ad upstream



Re: [ovs-dev] [PATCH V3 16/19] netdev-offload-dpdk: Support offload of set MAC actions

2019-12-10 Thread Ilya Maximets
On 08.12.2019 14:23, Eli Britstein wrote:
> Signed-off-by: Eli Britstein 
> Reviewed-by: Oz Shlomo 
> ---
>  Documentation/howto/dpdk.rst |   1 +
>  NEWS |   3 +-
>  lib/netdev-dpdk.c|  15 ++
>  lib/netdev-offload-dpdk.c| 107 +++
>  4 files changed, 125 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index 6cedd7f63..d228493e9 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -392,6 +392,7 @@ Supported actions for hardware offload are:
>  
>  - Output (a single output, as the last action).
>  - Drop.
> +- Modification of Ethernet (mod_dl_src/mod_dl_dst).
>  
>  Further Reading
>  ---
> diff --git a/NEWS b/NEWS
> index d019e066f..3ade86d49 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -26,7 +26,8 @@ Post-v2.12.0
>   * DPDK ring ports (dpdkr) are deprecated and will be removed in next
> releases.
>   * Add support for DPDK 19.11.
> - * Add hardware offload support for output and drop actions 
> (experimental).
> + * Add hardware offload support for output, drop and set MAC actions
> +   (experimental).
>  
>  v2.12.0 - 03 Sep 2019
>  -
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 872a45e75..e67a3dd76 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -4723,6 +4723,21 @@ ds_put_flow_action(struct ds *s, const struct rte_flow_action *actions)
>  }
>  } else if (actions->type == RTE_FLOW_ACTION_TYPE_DROP) {
>  ds_put_cstr(s, "rte flow drop action\n");
> +} else if (actions->type == RTE_FLOW_ACTION_TYPE_SET_MAC_SRC ||
> +   actions->type == RTE_FLOW_ACTION_TYPE_SET_MAC_DST) {
> +const struct rte_flow_action_set_mac *set_mac = actions->conf;
> +
> +char *dirstr = actions->type == RTE_FLOW_ACTION_TYPE_SET_MAC_DST
> +   ? "dst" : "src";
> +
> +ds_put_format(s, "rte flow set-mac-%s action:\n", dirstr);
> +if (set_mac) {
> +ds_put_format(s,
> +  "  Set-mac-%s: "ETH_ADDR_FMT"\n",
> +  dirstr, ETH_ADDR_BYTES_ARGS(set_mac->mac_addr));
> +} else {
> +ds_put_format(s, "  Set-mac-%s = null\n", dirstr);
> +}
>  } else {
>  ds_put_format(s, "unknown rte flow action (%d)\n", actions->type);
>  }
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index bffb69cad..07f5d4687 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -511,6 +511,103 @@ add_output_action(struct netdev *netdev,
>  return ret;
>  }
>  
> +struct set_action_info {
> +const uint8_t *value, *mask;
> +const uint8_t size;
> +uint8_t *spec;
> +const int attr;
> +};
> +
> +static int
> +add_set_flow_action(struct flow_actions *actions,
> +struct set_action_info *sa_info_arr,
> +size_t sa_info_arr_size)
> +{
> +int field, i;
> +
> +for (field = 0; field < sa_info_arr_size; field++) {
> +if (sa_info_arr[field].mask) {
> +/* DPDK does not support partially masked set actions. In such
> + * case, fail the offload.
> + */

Could you, please, just use is_all_zeros/ones() functions?

> +if (sa_info_arr[field].mask[0] != 0x00 &&
> +sa_info_arr[field].mask[0] != 0xFF) {
> +VLOG_DBG_RL(&error_rl,
> +"Partial mask is not supported");
> +return -1;
> +}
> +
> +for (i = 1; i < sa_info_arr[field].size; i++) {
> +if (sa_info_arr[field].mask[i] !=
> +sa_info_arr[field].mask[i - 1]) {
> +VLOG_DBG_RL(&error_rl,
> +"Partial mask is not supported");
> +return -1;
> +}
> +}
> +
> +if (sa_info_arr[field].mask[0] == 0x00) {
> +/* mask bytes are all 0 - no rewrite action required */
> +continue;
> +}
> +}
> +
> +memcpy(sa_info_arr[field].spec, sa_info_arr[field].value,
> +   sa_info_arr[field].size);
> +add_flow_action(actions, sa_info_arr[field].attr,
> +sa_info_arr[field].spec);
> +}
> +
> +return 0;
> +}
> +
> +/* Mask is at the midpoint of the data. */
> +#define get_mask(a, type) ((const type *)(const void *)(a + 1) + 1)
> +
> +#define SA_INFO(_field, _spec, _attr) { \
> +.value = (uint8_t *)&key->_field, \
> +.mask = (masked) ? (uint8_t *)&mask->_field : NULL, \
> +.size = sizeof key->_field, \
> +.spec = (uint8_t *)&_spec, \
> +.attr = _attr }

All of this doesn't look good and is not easy to read.
I think it might be much better if you'll just pass required things
to add_set_flow_action() and call this function 
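
For reference, a sketch of the suggested simplification using the
is_all_zeros()/is_all_ones() helpers from lib/util.h. The helper name below
is illustrative; error_rl and add_flow_action() are assumed from the quoted
patch:

    static int
    add_set_flow_action__(struct flow_actions *actions, const void *value,
                          const void *mask, size_t size, void *spec, int attr)
    {
        if (mask) {
            if (is_all_zeros(mask, size)) {
                /* Mask is all-zero: this field is not rewritten. */
                return 0;
            }
            if (!is_all_ones(mask, size)) {
                /* DPDK cannot express a partially masked set action. */
                VLOG_DBG_RL(&error_rl, "Partial mask is not supported");
                return -1;
            }
        }
        /* Fully masked (or unmasked): rewrite the whole field. */
        memcpy(spec, value, size);
        add_flow_action(actions, attr, spec);
        return 0;
    }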

[ovs-dev] thundering herd wakeup of handler threads

2019-12-10 Thread David Ahern
Hi Matteo:

On a hypervisor running a 4.14.91 kernel and OVS 2.11 I am seeing a
thundering herd wake up problem. Every packet punted to userspace wakes
up every one of the handler threads. On a box with 96 cpus, there are 71
handler threads which means 71 process wakeups for every packet punted.

This is really easy to see, just watch sched:sched_wakeup tracepoints.
With a few extra probes:

perf probe sock_def_readable sk=%di
perf probe ep_poll_callback wait=%di mode=%si sync=%dx key=%cx
perf probe __wake_up_common wq_head=%di mode=%si nr_exclusive=%dx
wake_flags=%cx key=%8

you can see there is a single netlink socket and its wait queue contains
an entry for every handler thread.

This does not happen with the 2.7.3 version. Roaming commits, it appears
that the change in behavior comes from this commit:

commit 69c51582ff786a68fc325c1c50624715482bc460
Author: Matteo Croce 
Date:   Tue Sep 25 10:51:05 2018 +0200

dpif-netlink: don't allocate per thread netlink sockets


Is this a known problem?

David


Re: [ovs-dev] [PATCH V3 15/19] netdev-offload-dpdk: Support offload of drop action

2019-12-10 Thread Ilya Maximets
On 08.12.2019 14:23, Eli Britstein wrote:
> Signed-off-by: Eli Britstein 
> Reviewed-by: Oz Shlomo 
> ---
>  Documentation/howto/dpdk.rst | 1 +
>  NEWS | 2 +-
>  lib/netdev-dpdk.c| 2 ++
>  lib/netdev-offload-dpdk.c| 4 +---
>  4 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index f62ce82af..6cedd7f63 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -391,6 +391,7 @@ Supported protocols for hardware offload matches are:
>  Supported actions for hardware offload are:
>  
>  - Output (a single output, as the last action).
> +- Drop.
>  
>  Further Reading
>  ---
> diff --git a/NEWS b/NEWS
> index c430999a0..d019e066f 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -26,7 +26,7 @@ Post-v2.12.0
>   * DPDK ring ports (dpdkr) are deprecated and will be removed in next
> releases.
>   * Add support for DPDK 19.11.
> - * Add hardware offload support for output actions (experimental).
> + * Add hardware offload support for output and drop actions 
> (experimental).
>  
>  v2.12.0 - 03 Sep 2019
>  -
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index d9a2c2004..872a45e75 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -4721,6 +4721,8 @@ ds_put_flow_action(struct ds *s, const struct rte_flow_action *actions)
>  } else {
>  ds_put_cstr(s, "  Port-id = null\n");
>  }
> +} else if (actions->type == RTE_FLOW_ACTION_TYPE_DROP) {
> +ds_put_cstr(s, "rte flow drop action\n");
>  } else {
>  ds_put_format(s, "unknown rte flow action (%d)\n", actions->type);
>  }
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index 0b9087192..bffb69cad 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -536,9 +536,7 @@ parse_flow_actions(struct netdev *netdev,
>  }
>  
>  if (nl_actions_len == 0) {
> -VLOG_DBG_RL(&error_rl,
> -"Unsupported action type drop");
> -return -1;
> +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_DROP, NULL);

Do we need an explicit drop action?
What will happen if the HW has only packet modification actions,
or no actions at all?  Will it send the packet to the driver for
processing, or will it drop it?
Do we need to finish all the flows that don't end with DROP/PORT_ID
actions with an explicit drop action?
Asking because it's possible to have such flows in OVS.


Re: [ovs-dev] [PATCH V3 14/19] netdev-offload-dpdk: Support offload of output action

2019-12-10 Thread Ilya Maximets
On 08.12.2019 14:22, Eli Britstein wrote:
> Support offload of output action, also configuring count action for
> allowing query statistics of HW offloaded flows.
> 
> Signed-off-by: Eli Britstein 
> Reviewed-by: Oz Shlomo 
> ---
>  Documentation/howto/dpdk.rst | 17 ---
>  NEWS |  1 +
>  lib/netdev-dpdk.c| 22 +
>  lib/netdev-offload-dpdk.c| 73 +---
>  4 files changed, 104 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index 766a7950c..f62ce82af 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -370,19 +370,28 @@ The flow hardware offload is disabled by default and can be enabled by::
>  
>  $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
>  
> -So far only partial flow offload is implemented. Moreover, it only works
> -with PMD drivers have the rte_flow action "MARK + RSS" support.
> +Matches and actions are programmed into HW to achieve full offload of
> +the flow. If not all actions are supported, fallback to partial flow
> +offload (offloading matches only). Moreover, it only works with PMD
> +drivers that support the configured rte_flow actions.
> +Partial flow offload requires support of "MARK + RSS" actions. Full
> +hardware offload requires support of the actions listed below.
>  
>  The validated NICs are:
>  
>  - Mellanox (ConnectX-4, ConnectX-4 Lx, ConnectX-5)
>  - Napatech (NT200B01)
>  
> -Supported protocols for hardware offload are:
> +Supported protocols for hardware offload matches are:
> +
>  - L2: Ethernet, VLAN
> -- L3: IPv4, IPv6
> +- L3: IPv4
>  - L4: TCP, UDP, SCTP, ICMP
>  
> +Supported actions for hardware offload are:
> +
> +- Output (a single output, as the last action).

Is there any particular restriction why we cant output to
several ports in a row?

> +
>  Further Reading
>  ---
>  
> diff --git a/NEWS b/NEWS
> index 2acde3fe8..c430999a0 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -26,6 +26,7 @@ Post-v2.12.0
>   * DPDK ring ports (dpdkr) are deprecated and will be removed in next
> releases.
>   * Add support for DPDK 19.11.
> + * Add hardware offload support for output actions (experimental).
>  
>  v2.12.0 - 03 Sep 2019
>  -
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 16baae0c4..d9a2c2004 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -4699,6 +4699,28 @@ ds_put_flow_action(struct ds *s, const struct rte_flow_action *actions)
>  } else {
>  ds_put_cstr(s, "  RSS = null\n");
>  }
> +} else if (actions->type == RTE_FLOW_ACTION_TYPE_COUNT) {
> +const struct rte_flow_action_count *count = actions->conf;
> +
> +ds_put_cstr(s, "rte flow count action:\n");
> +if (count) {
> +ds_put_format(s,
> +  "  Count: shared=%d, id=%d\n",
> +  count->shared, count->id);
> +} else {
> +ds_put_cstr(s, "  Count = null\n");
> +}
> +} else if (actions->type == RTE_FLOW_ACTION_TYPE_PORT_ID) {
> +const struct rte_flow_action_port_id *port_id = actions->conf;
> +
> +ds_put_cstr(s, "rte flow port-id action:\n");
> +if (port_id) {
> +ds_put_format(s,
> +  "  Port-id: original=%d, id=%d\n",
> +  port_id->original, port_id->id);
> +} else {
> +ds_put_cstr(s, "  Port-id = null\n");
> +}
>  } else {
>  ds_put_format(s, "unknown rte flow action (%d)\n", actions->type);
>  }
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index 10ab9240a..0b9087192 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -456,20 +456,83 @@ netdev_offload_dpdk_mark_rss(struct flow_patterns *patterns,
>  return flow;
>  }
>  
> +static void
> +add_count_action(struct flow_actions *actions)
> +{
> +struct rte_flow_action_count *count = xzalloc(sizeof *count);
> +
> +count->shared = 0;
> +/* Each flow has a single count action, so no need of id */

Comments style.

> +count->id = 0;
> +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_COUNT, count);
> +}
> +
>  static int
> -parse_flow_actions(struct netdev *netdev OVS_UNUSED,
> +add_port_id_action(struct flow_actions *actions,
> +   struct netdev *outdev)
> +{
> +struct rte_flow_action_port_id *port_id = xzalloc(sizeof *port_id);
> +int outdev_id;
> +
> +outdev_id = netdev_dpdk_get_port_id(outdev);
> +if (outdev_id < 0) {
> +return -1;
> +}
> +port_id->id = outdev_id;
> +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_PORT_ID, port_id);
> +return 0;
> +}
> +
> +static int
> +add_output_action(struct netdev *netdev,
> +  struct flow_actions *actions,
> +  

Re: [ovs-dev] [PATCH 15/20] netdev-offload-dpdk-flow: Support offload of output action

2019-12-10 Thread Ilya Maximets
On 08.12.2019 10:38, Eli Britstein wrote:
> 
> On 12/5/2019 6:46 PM, Sriharsha Basavapatna wrote:
>> On Wed, Dec 4, 2019 at 8:55 PM Eli Britstein  wrote:
>>>
>>> On 12/3/2019 5:19 PM, Sriharsha Basavapatna wrote:
 On Wed, Nov 20, 2019 at 9:07 PM Eli Britstein  wrote:
> Signed-off-by: Eli Britstein 
> Reviewed-by: Oz Shlomo 
> ---
>NEWS   |  2 +
>    lib/netdev-offload-dpdk-flow.c | 87 --
>2 files changed, 85 insertions(+), 4 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 88b818948..ca9c2b230 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -10,6 +10,8 @@ Post-v2.12.0
>   if supported by libbpf.
> * Add option to enable, disable and query TCP sequence checking in
>   conntrack.
> +   - DPDK:
> + * Add hardware offload support for output actions.
>
>v2.12.0 - 03 Sep 2019
>-
> diff --git a/lib/netdev-offload-dpdk-flow.c b/lib/netdev-offload-dpdk-flow.c
> index dbbc45e99..6e7efb315 100644
> --- a/lib/netdev-offload-dpdk-flow.c
> +++ b/lib/netdev-offload-dpdk-flow.c
> @@ -274,6 +274,28 @@ ds_put_flow_action(struct ds *s, const struct rte_flow_action *actions)
>} else {
>ds_put_cstr(s, "  RSS = null\n");
>}
> +} else if (actions->type == RTE_FLOW_ACTION_TYPE_COUNT) {
> +const struct rte_flow_action_count *count = actions->conf;
> +
> +ds_put_cstr(s, "rte flow count action:\n");
> +if (count) {
> +ds_put_format(s,
> +  "  Count: shared=%d, id=%d\n",
> +  count->shared, count->id);
> +} else {
> +ds_put_cstr(s, "  Count = null\n");
> +}
> +} else if (actions->type == RTE_FLOW_ACTION_TYPE_PORT_ID) {
> +const struct rte_flow_action_port_id *port_id = actions->conf;
> +
> +ds_put_cstr(s, "rte flow port-id action:\n");
> +if (port_id) {
> +ds_put_format(s,
> +  "  Port-id: original=%d, id=%d\n",
> +  port_id->original, port_id->id);
> +} else {
> +ds_put_cstr(s, "  Port-id = null\n");
> +}
>} else {
>ds_put_format(s, "unknown rte flow action (%d)\n", 
> actions->type);
>}
> @@ -531,19 +553,76 @@ netdev_dpdk_flow_patterns_add(struct flow_patterns 
> *patterns,
>return 0;
>}
>
> +static void
> +netdev_dpdk_flow_add_count_action(struct flow_actions *actions)
> +{
> +struct rte_flow_action_count *count = xzalloc(sizeof *count);
> +
> +count->shared = 0;
> +/* Each flow has a single count action, so no need of id */
> +count->id = 0;
> +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_COUNT, count);
> +}
> +
> +static void
> +netdev_dpdk_flow_add_port_id_action(struct flow_actions *actions,
> +struct netdev *outdev)
> +{
> +struct rte_flow_action_port_id *port_id = xzalloc(sizeof *port_id);
> +
> +port_id->id = netdev_dpdk_get_port_id(outdev);
> +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_PORT_ID, port_id);
> +}
> +
> +static int
> +netdev_dpdk_flow_add_output_action(struct flow_actions *actions,
> +   const struct nlattr *nla,
> +   struct offload_info *info)
> +{
> +struct netdev *outdev;
> +odp_port_t port;
> +
> +port = nl_attr_get_odp_port(nla);
> +outdev = netdev_ports_get(port, info->dpif_class);
> +if (outdev == NULL) {
> +VLOG_DBG_RL(&error_rl,
> +"Cannot find netdev for odp port %d", port);
> +return -1;
> +}
> +if (!netdev_dpdk_flow_api_supported(outdev)) {
> +VLOG_DBG_RL(&error_rl,
> +"Output to %s cannot be offloaded",
> +netdev_get_name(outdev));
> +return -1;
> +}
> +
> +netdev_dpdk_flow_add_count_action(actions);
> +netdev_dpdk_flow_add_port_id_action(actions, outdev);
> +netdev_close(outdev);
> +
> +return 0;
> +}
> +
>int
>netdev_dpdk_flow_actions_add(struct flow_actions *actions,
> struct nlattr *nl_actions,
> size_t nl_actions_len,
> - struct offload_info *info OVS_UNUSED)
> + struct offload_info *info)
>{
>struct nlattr *nla;
>size_t left;
>
>NL_ATTR_FOR_EACH_UNSAFE (nla, 
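
(The hunk is cut off above. As a sketch, not the exact patch text, such a
dispatch loop typically continues along these lines, reusing the helpers
from this patch:)

NL_ATTR_FOR_EACH_UNSAFE (nla, left, nl_actions, nl_actions_len) {
    if (nl_attr_type(nla) == OVS_ACTION_ATTR_OUTPUT) {
        /* Output is translated into a count action plus a port-id action. */
        if (netdev_dpdk_flow_add_output_action(actions, nla, info)) {
            return -1;
        }
    } else {
        /* Any other action cannot be offloaded here. */
        return -1;
    }
}
add_flow_action(actions, RTE_FLOW_ACTION_TYPE_END, NULL);
return 0;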

Re: [ovs-dev] [PATCH V3 10/19] dpif-netdev: Read hw stats during flow_dump_next() call

2019-12-10 Thread Ilya Maximets
On 08.12.2019 14:22, Eli Britstein wrote:
> From: Ophir Munk 
> 
> Use netdev flow get API in order to update the statistics of fully
> offloaded flows.
> 
> Co-authored-by: Eli Britstein 
> Signed-off-by: Ophir Munk 
> Reviewed-by: Oz Shlomo 
> Signed-off-by: Eli Britstein 
> ---
>  lib/dpif-netdev.c | 47 +--
>  1 file changed, 45 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 1e5493615..37e7e5c38 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -527,6 +527,7 @@ struct dp_netdev_flow {
>  
>  bool dead;
>  uint32_t mark;   /* Unique flow mark assigned to a flow */
> +bool actions_offloaded;  /* true if flow is fully offloaded. */

IIUC, this is modified and used asynchronously in different threads.
It should be atomic.

It's better to put 'true' into single quotes.
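
A minimal sketch of the atomic variant, using the helpers from
lib/ovs-atomic.h (exact placement assumed):

/* In struct dp_netdev_flow: */
atomic_bool actions_offloaded;  /* 'true' if flow is fully offloaded. */

/* Offload thread (writer): */
atomic_store_relaxed(&flow->actions_offloaded, true);

/* Dump side (reader, possibly another thread): */
bool offloaded;

atomic_read_relaxed(&netdev_flow->actions_offloaded, &offloaded);
flow->attrs.offloaded = offloaded;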

>  
>  /* Statistics. */
>  struct dp_netdev_flow_stats stats;
> @@ -2409,6 +2410,7 @@ dp_netdev_flow_offload_put(struct dp_flow_offload_item 
> *offload)
>  }
>  }
>  info.flow_mark = mark;
> +info.actions_offloaded = &flow->actions_offloaded;
>  
>  port = netdev_ports_get(in_port, pmd->dp->dpif->dpif_class);
>  if (!port || netdev_vport_is_vport_class(port->netdev_class)) {
> @@ -3071,8 +3073,8 @@ dp_netdev_flow_to_dpif_flow(const struct dp_netdev_flow 
> *netdev_flow,
>  flow->pmd_id = netdev_flow->pmd_id;
>  get_dpif_flow_stats(netdev_flow, &flow->stats);
>  
> -flow->attrs.offloaded = false;
> -flow->attrs.dp_layer = "ovs";
> +flow->attrs.offloaded = netdev_flow->actions_offloaded;
> +flow->attrs.dp_layer = flow->attrs.offloaded ? "in_hw" : "ovs";
>  }
>  
>  static int
> @@ -3242,6 +3244,7 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>  flow->dead = false;
>  flow->batch = NULL;
>  flow->mark = INVALID_FLOW_MARK;
> +flow->actions_offloaded = false;
>  *CONST_CAST(unsigned *, >pmd_id) = pmd->core_id;
>  *CONST_CAST(struct flow *, >flow) = match->flow;
>  *CONST_CAST(ovs_u128 *, >ufid) = *ufid;
> @@ -3596,6 +3599,42 @@ dpif_netdev_flow_dump_thread_destroy(struct 
> dpif_flow_dump_thread *thread_)
>  free(thread);
>  }
>  
> +static int
> +dpif_netdev_offload_used(struct dp_netdev_flow *netdev_flow,
> + struct dp_netdev_pmd_thread *pmd)
> +{
> +struct dpif_flow_stats stats;
> +struct netdev *netdev;
> +struct match match;
> +struct nlattr *actions;
> +struct dpif_flow_attrs attrs;
> +struct ofpbuf wbuffer;
> +
> +ovs_u128 ufid;
> +int ret = 0;
> +
> +netdev = netdev_ports_get(netdev_flow->flow.in_port.odp_port,
> +  pmd->dp->class);
> +if (!netdev) {
> +return -1;
> +}
> +/* get offloaded stats */

Comment style.

> +ufid = netdev_flow->mega_ufid;

Why copying?  'ufid' argument is constant.

> +ret = netdev_flow_get(netdev, &match, &actions, &ufid, &stats, &attrs,
> +  &wbuffer);

dp->port_mutex around above + please, copy the comment from my port_mutex patch.
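
I.e., something along these lines (a sketch; 'dp' reached via 'pmd' as above):

ovs_mutex_lock(&pmd->dp->port_mutex);
ret = netdev_flow_get(netdev, &match, &actions, &ufid, &stats, &attrs,
                      &wbuffer);
ovs_mutex_unlock(&pmd->dp->port_mutex);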

> +netdev_close(netdev);
> +if (ret) {
> +return -1;
> +}
> +if (stats.n_packets) {
> +atomic_store_relaxed(&netdev_flow->stats.used, pmd->ctx.now / 1000);

This code should not be here according to review for the previous patch, but
anyway, you should not use pmd context as you're not in a pmd thread.
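
E.g., time_msec() from lib/timeval.h would be safe here instead:

atomic_store_relaxed(&netdev_flow->stats.used, time_msec());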

> +non_atomic_ullong_add(&netdev_flow->stats.packet_count,
> stats.n_packets);
> +non_atomic_ullong_add(&netdev_flow->stats.byte_count, stats.n_bytes);
> +}
> +
> +return 0;
> +}
> +
>  static int
>  dpif_netdev_flow_dump_next(struct dpif_flow_dump_thread *thread_,
> struct dpif_flow *flows, int max_flows)
> @@ -3636,6 +3675,10 @@ dpif_netdev_flow_dump_next(struct 
> dpif_flow_dump_thread *thread_,
>  netdev_flows[n_flows] = CONTAINER_OF(node,
>   struct dp_netdev_flow,
>   node);
> +/* Read hardware stats in case of hardware offload */

Comment style.

> +if (netdev_flows[n_flows]->actions_offloaded) {
> +dpif_netdev_offload_used(netdev_flows[n_flows], pmd);
> +}
>  }
>  /* When finishing dumping the current pmd thread, moves to
>   * the next. */
> 


Re: [ovs-dev] [PATCH v3 03/10] tc: Introduce tcf_id to specify a tc filter

2019-12-10 Thread Marcelo Ricardo Leitner
On Mon, Dec 09, 2019 at 03:09:37PM +, Paul Blakey wrote:
> 
> On 12/3/2019 3:45 PM, Roi Dayan wrote:
> > @@ -1428,38 +1419,35 @@ netdev_tc_flow_put(struct netdev *netdev, struct 
> > match *match,
> >   }
> >   }
> >   
> > -block_id = get_block_id_from_netdev(netdev);
> 
> Accidentally removed this line, will post V4 soon with this fix.
> 
> There is nothing else pending so I welcome more replies :)

Not from me, but Ilya mentioned an open one at
https://mail.openvswitch.org/pipermail/ovs-dev/2019-December/365459.html
(at the end of the email)

  Marcelo



Re: [ovs-dev] [PATCH V3 09/19] netdev-offload-dpdk: Implement flow get method

2019-12-10 Thread Ilya Maximets
On 08.12.2019 14:22, Eli Britstein wrote:
> Implement the flow get method for DPDK, to get the statistics of the
> provided ufid, towards reading statistics of fully offloaded flows.
> 
> Signed-off-by: Eli Britstein 
> Reviewed-by: Oz Shlomo 
> ---
>  lib/netdev-offload-dpdk.c | 40 
>  1 file changed, 40 insertions(+)
> 
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index 5568400b6..10ab9240a 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -705,9 +705,49 @@ netdev_offload_dpdk_init_flow_api(struct netdev *netdev)
>  return netdev_dpdk_flow_api_supported(netdev) ? 0 : EOPNOTSUPP;
>  }
>  
> +static int
> +netdev_offload_dpdk_flow_get(struct netdev *netdev,
> + struct match *match OVS_UNUSED,
> + struct nlattr **actions OVS_UNUSED,
> + const ovs_u128 *ufid,
> + struct dpif_flow_stats *stats,
> + struct dpif_flow_attrs *attrs OVS_UNUSED,
> + struct ofpbuf *buf OVS_UNUSED)
> +{
> +struct rte_flow_query_count query = { .reset = 1 };

I see that you're implementing a workaround for the 'used' field of the stats
struct by resetting the counters on each request, but netdev_flow_get() has
different semantics.  It should return full flow stats, so you need to
implement the 'used' logic here at the netdev-offload-provider level.
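
One way to do that (a sketch, assuming the offload layer keeps a per-ufid
record next to the rte_flow handle) is to accumulate the reset-on-read
counters and track 'used' locally:

struct ufid_flow_data {
    struct rte_flow *rte_flow;
    struct dpif_flow_stats stats;  /* Accumulated totals, incl. 'used'. */
};

/* On each netdev_flow_get() call: */
ret = netdev_dpdk_rte_flow_query(netdev, data->rte_flow, &query, &error);
if (!ret) {
    data->stats.n_packets += query.hits_set ? query.hits : 0;
    data->stats.n_bytes += query.bytes_set ? query.bytes : 0;
    if (query.hits_set && query.hits) {
        data->stats.used = time_msec();  /* Traffic seen since last read. */
    }
    *stats = data->stats;
}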

> +struct rte_flow_error error;
> +struct rte_flow *rte_flow;
> +int ret = 0;
> +
> +ovs_mutex_lock(&ufid_map_mutex);
> +rte_flow = ufid_to_rte_flow_find(ufid);
> +if (!rte_flow) {
> +ret = -1;
> +goto out;
> +}
> +
> +memset(stats, 0, sizeof *stats);
> +ret = netdev_dpdk_rte_flow_query(netdev, rte_flow, &query, &error);
> +if (ret) {
> +VLOG_DBG_RL(&error_rl,
> +"%s: Failed to query ufid "UUID_FMT" flow: %p\n",
> +netdev_get_name(netdev), UUID_ARGS((struct uuid *)ufid),
> +rte_flow);
> +ret = -1;
> +goto out;
> +}
> +stats->n_packets += (query.hits_set) ? query.hits : 0;
> +stats->n_bytes += (query.bytes_set) ? query.bytes : 0;

Why '+=' ?

> +
> +out:
> +ovs_mutex_unlock(&ufid_map_mutex);
> +return ret;
> +}
> +
>  const struct netdev_flow_api netdev_offload_dpdk = {
>  .type = "dpdk_flow_api",
>  .flow_put = netdev_offload_dpdk_flow_put,
>  .flow_del = netdev_offload_dpdk_flow_del,
>  .init_flow_api = netdev_offload_dpdk_init_flow_api,
> +.flow_get = netdev_offload_dpdk_flow_get,
>  };
> 


Re: [ovs-dev] [PATCH V3 05/19] netdev-dpdk: Improve HW offload flow debuggability

2019-12-10 Thread Ilya Maximets
On 10.12.2019 15:44, Eli Britstein wrote:
> 
> On 12/10/2019 4:38 PM, Ilya Maximets wrote:
>> On 08.12.2019 14:22, Eli Britstein wrote:
>>> Add debug prints when creating and destroying rte flows, with all the
>>> flow details (attributes, patterns, actions).
>>>
>>> Signed-off-by: Eli Britstein 
>>> Reviewed-by: Oz Shlomo 
>>> ---
>>>   lib/netdev-dpdk.c | 260 
>>> ++
>>>   lib/netdev-offload-dpdk.c | 207 +---
>>>   2 files changed, 264 insertions(+), 203 deletions(-)
>>>
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 89c73a29b..da1349b69 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -4458,9 +4458,252 @@ netdev_dpdk_rte_flow_destroy(struct netdev *netdev,
>>>   ovs_mutex_lock(&dev->mutex);
>>>   ret = rte_flow_destroy(dev->port_id, rte_flow, error);
>>>   ovs_mutex_unlock(&dev->mutex);
>>> +if (!ret) {
>>> +VLOG_DBG_RL(&rl, "Destroy rte_flow %p: netdev=%s, port=%d\n",
>>> +rte_flow, netdev_get_name(netdev), dev->port_id);
>>> +} else {
>>> +VLOG_ERR_RL(&rl, "Destroy rte_flow %p: netdev=%s, port=%d\n"
>>> +"FAILED. error %u : message : %s",
>>> +rte_flow, netdev_get_name(netdev), dev->port_id,
>>> +error->type, error->message);
>>> +}
>> So, what is the reason of doing that?
>> netdev_offload_dpdk_add_flow() prints/could print exactly same messages
>> (dev->port_id is not that useful to print).
>> Am I missing something?
> Yes, it could have been printed in netdev_offload_dpdk_add_flow(), but 
> if moving there, and you just enable DBG in netdev_dpdk, you will see 
> only the create messages, without the destroys. It is helpful for debug.

netdev_dpdk_rte_flow_create/destroy() doesn't have any logging, so you should
not have any messages if you only enable debug for netdev-dpdk.
It's straightforward that you need to enable debug for netdev-offload-dpdk to
get debug logging about offloading.  Offloading logs are huge and should not
be printed or even constructed without special request.
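
One common OVS pattern for that is to check the rate limiter before building
the string at all, e.g. (a sketch; 'dump_flow_attr' per the naming discussed
below):

if (!VLOG_DROP_DBG(&rl)) {
    struct ds s = DS_EMPTY_INITIALIZER;

    dump_flow_attr(&s, attr);  /* Construct the dump only when needed. */
    VLOG_DBG("%s", ds_cstr(&s));
    ds_destroy(&s);
}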

>>
>>>   return ret;
>>>   }
>>>   
>>> +static void
>>> +ds_put_flow_attr(struct ds *s, const struct rte_flow_attr *attr)
>> Please, don't use 'ds_' prefix.  It's reserved for dynamic-string module.
> Actually, I used it on purpose, as similar to ds_put_format, 
> ds_put_cstr. Those functions do the same for flow_attr, patterns etc.

This is very confusing.

> Any other suggestion?

Previous 'dump_' prefix was fine.

>>
>>> +{
>>> +ds_put_format(s,
>>> +  "  Attributes: "
>>> +  "ingress=%d, egress=%d, prio=%d, group=%d, 
>>> transfer=%d\n",
>>> +  attr->ingress, attr->egress, attr->priority, attr->group,
>>> +  attr->transfer);
>>> +}
>>> +
>>> +static void
>>> +ds_put_flow_pattern(struct ds *s, const struct rte_flow_item *item)
>>> +{
>>> +if (item->type == RTE_FLOW_ITEM_TYPE_ETH) {
>>> +const struct rte_flow_item_eth *eth_spec = item->spec;
>>> +const struct rte_flow_item_eth *eth_mask = item->mask;
>>> +
>>> +ds_put_cstr(s, "rte flow eth pattern:\n");
>>> +if (eth_spec) {
>>> +ds_put_format(s,
>>> +  "  Spec: src="ETH_ADDR_FMT", dst="ETH_ADDR_FMT", 
>>> "
>>> +  "type=0x%04" PRIx16"\n",
>>> +  ETH_ADDR_BYTES_ARGS(eth_spec->src.addr_bytes),
>>> +  ETH_ADDR_BYTES_ARGS(eth_spec->dst.addr_bytes),
>>> +  ntohs(eth_spec->type));
>>> +} else {
>>> +ds_put_cstr(s, "  Spec = null\n");
>>> +}
>>> +if (eth_mask) {
>>> +ds_put_format(s,
>>> +  "  Mask: src="ETH_ADDR_FMT", dst="ETH_ADDR_FMT", 
>>> "
>>> +  "type=0x%04"PRIx16"\n",
>>> +  ETH_ADDR_BYTES_ARGS(eth_mask->src.addr_bytes),
>>> +  ETH_ADDR_BYTES_ARGS(eth_mask->dst.addr_bytes),
>>> +  ntohs(eth_mask->type));
>>> +} else {
>>> +ds_put_cstr(s, "  Mask = null\n");
>>> +}
>>> +} else if (item->type == RTE_FLOW_ITEM_TYPE_VLAN) {
>>> +const struct rte_flow_item_vlan *vlan_spec = item->spec;
>>> +const struct rte_flow_item_vlan *vlan_mask = item->mask;
>>> +
>>> +ds_put_cstr(s, "rte flow vlan pattern:\n");
>>> +if (vlan_spec) {
>>> +ds_put_format(s,
>>> +  "  Spec: inner_type=0x%"PRIx16", 
>>> tci=0x%"PRIx16"\n",
>>> +  ntohs(vlan_spec->inner_type), 
>>> ntohs(vlan_spec->tci));
>>> +} else {
>>> +ds_put_cstr(s, "  Spec = null\n");
>>> +}
>>> +
>>> +if (vlan_mask) {
>>> +ds_put_format(s,
>>> +  "  Mask: inner_type=0x%"PRIx16", 
>>> tci=0x%"PRIx16"\n",
>>> +   

Re: [ovs-dev] [PATCH v2] Encap & Decap actions for MPLS Packet Type

2019-12-10 Thread Gregory Rose



On 12/10/2019 12:17 AM, Martin Varghese wrote:

From: Martin Varghese 

The existing PUSH MPLS & POP MPLS actions insert & remove an MPLS header
between the ethernet header and the IP header. Though this behaviour is fine
for L3 VPN, where an IP packet is encapsulated inside an MPLS tunnel, it
does not satisfy the L2 VPN requirements. In L2 VPN the ethernet packets
must be encapsulated inside an MPLS tunnel.

In this change the encap & decap actions are extended to support MPLS
packet type. The encap & decap adds and removes MPLS header at the start
of packet as depicted below.

Encapsulation:

Actions - encap(mpls(ether_type=0x8847)),encap(ethernet)

Incoming packet -> | ETH | IP | Payload |

1 Actions -  encap(mpls(ether_type=0x8847)) [Kernel action -
ptap_push_mpls:0x8847]

 Outgoing packet -> | MPLS | ETH | Payload|

2 Actions - encap(ethernet) [ Kernel action - push_eth ]

 Outgoing packet -> | ETH | MPLS | ETH | Payload|

Decapsulation:

Incoming packet -> | ETH | MPLS | ETH | IP | Payload |

Actions - decap(),decap(packet_type(ns=0,type=0))

1 Actions -  decap() [Kernel action - pop_eth)

 Outgoing packet -> | MPLS | ETH | IP | Payload|

2 Actions - decap(packet_type(ns=0,type=0)) [Kernel action -
ptap_pop_mpls:0]

 Outgoing packet -> | ETH  | IP | Payload
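
Illustrative ovs-ofctl usage, building on the action syntax above (bridge
name and port numbers are arbitrary):

ovs-ofctl add-flow br0 "in_port=1,actions=encap(mpls(ether_type=0x8847)),encap(ethernet),output:2"
ovs-ofctl add-flow br0 "in_port=2,dl_type=0x8847,actions=decap(),decap(packet_type(ns=0,type=0)),output:1"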

Signed-off-by: Martin Varghese 


This patch should be rebased for the Linux kernel and posted to netdev.

- Greg



---
  datapath/actions.c| 56 ++
  datapath/flow_netlink.c   | 21 ++
  datapath/linux/compat/include/linux/openvswitch.h |  2 +
  include/openvswitch/ofp-ed-props.h| 18 +
  lib/dpif-netdev.c |  2 +
  lib/dpif.c|  2 +
  lib/odp-execute.c | 14 
  lib/odp-util.c| 49 ++--
  lib/ofp-actions.c |  5 ++
  lib/ofp-ed-props.c| 92 ++-
  lib/packets.c | 28 +++
  lib/packets.h |  3 +-
  ofproto/ofproto-dpif-ipfix.c  |  2 +
  ofproto/ofproto-dpif-sflow.c  |  2 +
  ofproto/ofproto-dpif-xlate.c  | 54 +
  tests/system-traffic.at   | 34 +
  16 files changed, 376 insertions(+), 8 deletions(-)

diff --git a/datapath/actions.c b/datapath/actions.c
index fbf4457..1d0d904 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -256,6 +256,54 @@ static int pop_mpls(struct sk_buff *skb, struct 
sw_flow_key *key,
return 0;
  }
  
+static int push_ptap_mpls(struct sk_buff *skb, struct sw_flow_key *key,
+			  const struct ovs_action_push_mpls *mpls)
+{
+
+struct mpls_shim_hdr *lse;
+int err;
+
+if (unlikely(!eth_p_mpls(mpls->mpls_ethertype)))
+return -EINVAL;
+
+/* Networking stack does not allow simultaneous Tunnel and MPLS GSO. */
+if (skb->encapsulation)
+return -EINVAL;
+
+err = skb_cow_head(skb, MPLS_HLEN);
+if (unlikely(err))
+return err;
+
+if (!skb->inner_protocol) {
+skb_set_inner_network_header(skb, skb->mac_len);
+skb_set_inner_protocol(skb, skb->protocol);
+}
+
+skb_push(skb, MPLS_HLEN);
+skb_reset_mac_header(skb);
+skb_reset_network_header(skb);
+
+lse = mpls_hdr(skb);
+lse->label_stack_entry = mpls->mpls_lse;
+skb_postpush_rcsum(skb, lse, MPLS_HLEN);
+skb->protocol = mpls->mpls_ethertype;
+
+invalidate_flow_key(key);
+return 0;
+}
+
+static int ptap_pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
+const __be16 ethertype)
+{
+if (!ethertype)
+key->mac_proto = MAC_PROTO_ETHERNET;
+
+pop_mpls(skb, key, ethertype);
+invalidate_flow_key(key);
+return 0;
+}
+
+
  static int set_mpls(struct sk_buff *skb, struct sw_flow_key *flow_key,
const __be32 *mpls_lse, const __be32 *mask)
  {
@@ -1313,10 +1361,18 @@ static int do_execute_actions(struct datapath *dp, 
struct sk_buff *skb,
err = push_mpls(skb, key, nla_data(a));
break;
  
+		case OVS_ACTION_ATTR_PTAP_PUSH_MPLS:

+err = push_ptap_mpls(skb, key, nla_data(a));
+break;
+
case OVS_ACTION_ATTR_POP_MPLS:
err = pop_mpls(skb, key, nla_get_be16(a));
break;
  
+		 case OVS_ACTION_ATTR_PTAP_POP_MPLS:

+err = ptap_pop_mpls(skb, key, nla_get_be16(a));
+break;
+
case OVS_ACTION_ATTR_PUSH_VLAN:
err = push_vlan(skb, key, 

Re: [ovs-dev] [PATCH] dpif-netdev: Hold global port mutex while calling offload API.

2019-12-10 Thread Eli Britstein


On 12/10/2019 3:01 PM, Ilya Maximets wrote:
> We changed datapath port lookup to netdev-offload API usage, but
> forgot that port mutex was there not only to protect datapath
> port hash map.  It was also there as a workaround for the complete
> thread unsafety of the netdev-offload-dpdk functions.
>
> Turn it back on to fix the behaviour, and add a comment to prevent
> removing it in the future unless netdev-offload-dpdk is fixed.
>
> For the thread safety notice see the top of netdev-offload-dpdk.c.
>
> Fixes: 30115809da2e ("dpif-netdev: Use netdev-offload API for port lookup 
> while offloading")
> Signed-off-by: Ilya Maximets 
> ---
>
> Shame on me, I forgot about the documentation that I wrote 9 months ago.
>
>   lib/dpif-netdev.c | 9 +
>   1 file changed, 9 insertions(+)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 7ebf4ef87..c57458b62 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -2271,7 +2271,12 @@ mark_to_flow_disassociate(struct dp_netdev_pmd_thread 
> *pmd,
>   
>   port = netdev_ports_get(in_port, pmd->dp->class);
>   if (port) {
> +/* Taking a global 'port_mutex' to fulfill thread safety
> + * restrictions for the netdev-offload-dpdk module. */
> +ovs_mutex_lock(&pmd->dp->port_mutex);
>   ret = netdev_flow_del(port, &flow->mega_ufid, NULL);
> +ovs_mutex_unlock(&pmd->dp->port_mutex);
> +

Just remove this blank line.

Acked-by: Eli Britstein 

>   netdev_close(port);
>   }
>   
> @@ -2415,10 +2420,14 @@ dp_netdev_flow_offload_put(struct 
> dp_flow_offload_item *offload)
>   netdev_close(port);
>   goto err_free;
>   }
> +/* Taking a global 'port_mutex' to fulfill thread safety restrictions for
> + * the netdev-offload-dpdk module. */
> +ovs_mutex_lock(&pmd->dp->port_mutex);
>   ret = netdev_flow_put(port, &offload->match,
> CONST_CAST(struct nlattr *, offload->actions),
> offload->actions_len, &offload->mega_ufid, &stats,
> NULL);
> +ovs_mutex_unlock(&pmd->dp->port_mutex);
>   netdev_close(port);
>   
>   if (ret) {


Re: [ovs-dev] [PATCH V3 08/19] netdev-offload-dpdk: Protect UFID map by mutex

2019-12-10 Thread Eli Britstein


On 12/10/2019 5:03 PM, Ilya Maximets wrote:
> On 10.12.2019 15:56, Eli Britstein wrote:
>> On 12/10/2019 4:53 PM, Ilya Maximets wrote:
>>> On 08.12.2019 14:22, Eli Britstein wrote:
 Flow deletion and dumping for statistics collection are called from
 different threads. As a pre-step towards collecting HW statistics,
 protect the UFID map by mutex to make it thread safe.

 Signed-off-by: Eli Britstein 
 ---
lib/netdev-offload-dpdk.c | 7 ++-
1 file changed, 6 insertions(+), 1 deletion(-)

>>> This patch is not needed as you have to hold global datapath port
>>> mutex while calling netdev-offload-dpdk functions.  This module
>>> is not thread safe at all.
>>>
>>> See https://patchwork.ozlabs.org/patch/1207077/ .
>> That's right, but in the next commits, reading the stats using
>> netdev_flow_get is done from another thread. This commit is a pre-step
>> towards it; see the commit msg.
> You have to hold dp->port_mutex while calling netdev_flow_get().
OK.


Re: [ovs-dev] [PATCH V3 07/19] netdev-dpdk: Introduce rte flow query function

2019-12-10 Thread Ilya Maximets
On 10.12.2019 15:53, Eli Britstein wrote:
> 
> On 12/10/2019 4:50 PM, Ilya Maximets wrote:
>> On 08.12.2019 14:22, Eli Britstein wrote:
>>> Introduce a rte flow query function as a pre-step towards reading HW
>>> statistics of fully offloaded flows.
>>>
>>> Signed-off-by: Eli Britstein 
>>> Reviewed-by: Oz Shlomo 
>>> ---
>>>   lib/netdev-dpdk.c | 25 +
>>>   lib/netdev-dpdk.h |  6 ++
>>>   2 files changed, 31 insertions(+)
>>>
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index da1349b69..e63a496c1 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -4737,6 +4737,31 @@ netdev_dpdk_rte_flow_create(struct netdev *netdev,
>>>   return flow;
>>>   }
>>>   
>>> +int
>>> +netdev_dpdk_rte_flow_query(struct netdev *netdev,
>> Shouldn't it be named netdev_dpdk_rte_flow_query_count() ?
> it's a paraphrase of "rte_flow_query()" (the DPDK API), and it could stay
> as is if later extended beyond "count" usage, but OK.

You'll need to change the function prototype anyway.


Re: [ovs-dev] [PATCH V3 08/19] netdev-offload-dpdk: Protect UFID map by mutex

2019-12-10 Thread Ilya Maximets
On 08.12.2019 14:22, Eli Britstein wrote:
> Flow deletion and dumping for statistics collection are called from
> different threads. As a pre-step towards collecting HW statistics,
> protect the UFID map by mutex to make it thread safe.
> 
> Signed-off-by: Eli Britstein 
> ---
>  lib/netdev-offload-dpdk.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 

This patch is not needed as you have to hold global datapath port
mutex while calling netdev-offload-dpdk functions.  This module
is not thread safe at all.

See https://patchwork.ozlabs.org/patch/1207077/ .

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH V3 07/19] netdev-dpdk: Introduce rte flow query function

2019-12-10 Thread Eli Britstein


On 12/10/2019 4:50 PM, Ilya Maximets wrote:
> On 08.12.2019 14:22, Eli Britstein wrote:
>> Introduce a rte flow query function as a pre-step towards reading HW
>> statistics of fully offloaded flows.
>>
>> Signed-off-by: Eli Britstein 
>> Reviewed-by: Oz Shlomo 
>> ---
>>   lib/netdev-dpdk.c | 25 +
>>   lib/netdev-dpdk.h |  6 ++
>>   2 files changed, 31 insertions(+)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index da1349b69..e63a496c1 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -4737,6 +4737,31 @@ netdev_dpdk_rte_flow_create(struct netdev *netdev,
>>   return flow;
>>   }
>>   
>> +int
>> +netdev_dpdk_rte_flow_query(struct netdev *netdev,
> Shouldn't it be named netdev_dpdk_rte_flow_query_count() ?
it's a paraphrase of "rte_flow_query()" (the DPDK API), and it could stay
as is if later extended beyond "count" usage, but OK.
>
>> +   struct rte_flow *rte_flow,
>> +   struct rte_flow_query_count *query,
>> +   struct rte_flow_error *error)
>> +{
>> +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>> +struct rte_flow_action_count count = {};
>> +const struct rte_flow_action actions[] = {
>> +{
>> +.type = RTE_FLOW_ACTION_TYPE_COUNT,
>> +.conf = &count,
>> +},
>> +{
>> +.type = RTE_FLOW_ACTION_TYPE_END,
>> +},
>> +};
>> +int ret;
>> +
>> +ovs_mutex_lock(&dev->mutex);
>> +ret = rte_flow_query(dev->port_id, rte_flow, actions, query, error);
>> +ovs_mutex_unlock(&dev->mutex);
>> +return ret;
>> +}
>> +
>>   #define NETDEV_DPDK_CLASS_COMMON\
>>   .is_pmd = true, \
>>   .alloc = netdev_dpdk_alloc, \
>> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
>> index 60631c4f0..ed7cb235a 100644
>> --- a/lib/netdev-dpdk.h
>> +++ b/lib/netdev-dpdk.h
>> @@ -31,6 +31,7 @@ struct rte_flow_error;
>>   struct rte_flow_attr;
>>   struct rte_flow_item;
>>   struct rte_flow_action;
>> +struct rte_flow_query_count;
>>   
>>   void netdev_dpdk_register(void);
>>   void free_dpdk_buf(struct dp_packet *);
>> @@ -47,6 +48,11 @@ netdev_dpdk_rte_flow_create(struct netdev *netdev,
>>   const struct rte_flow_item *items,
>>   const struct rte_flow_action *actions,
>>   struct rte_flow_error *error);
>> +int
>> +netdev_dpdk_rte_flow_query(struct netdev *netdev,
>> +   struct rte_flow *rte_flow,
>> +   struct rte_flow_query_count *query,
>> +   struct rte_flow_error *error);
>>   
>>   #else
>>   
>>


Re: [ovs-dev] [PATCH V3 07/19] netdev-dpdk: Introduce rte flow query function

2019-12-10 Thread Ilya Maximets
On 08.12.2019 14:22, Eli Britstein wrote:
> Introduce a rte flow query function as a pre-step towards reading HW
> statistics of fully offloaded flows.
> 
> Signed-off-by: Eli Britstein 
> Reviewed-by: Oz Shlomo 
> ---
>  lib/netdev-dpdk.c | 25 +
>  lib/netdev-dpdk.h |  6 ++
>  2 files changed, 31 insertions(+)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index da1349b69..e63a496c1 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -4737,6 +4737,31 @@ netdev_dpdk_rte_flow_create(struct netdev *netdev,
>  return flow;
>  }
>  
> +int
> +netdev_dpdk_rte_flow_query(struct netdev *netdev,

Shouldn't it be named netdev_dpdk_rte_flow_query_count() ?
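
I.e., the renamed prototype would be (a sketch):

int
netdev_dpdk_rte_flow_query_count(struct netdev *netdev,
                                 struct rte_flow *rte_flow,
                                 struct rte_flow_query_count *query,
                                 struct rte_flow_error *error);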

> +   struct rte_flow *rte_flow,
> +   struct rte_flow_query_count *query,
> +   struct rte_flow_error *error)
> +{
> +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +struct rte_flow_action_count count = {};
> +const struct rte_flow_action actions[] = {
> +{
> +.type = RTE_FLOW_ACTION_TYPE_COUNT,
> +.conf = &count,
> +},
> +{
> +.type = RTE_FLOW_ACTION_TYPE_END,
> +},
> +};
> +int ret;
> +
> +ovs_mutex_lock(&dev->mutex);
> +ret = rte_flow_query(dev->port_id, rte_flow, actions, query, error);
> +ovs_mutex_unlock(&dev->mutex);
> +return ret;
> +}
> +
>  #define NETDEV_DPDK_CLASS_COMMON\
>  .is_pmd = true, \
>  .alloc = netdev_dpdk_alloc, \
> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
> index 60631c4f0..ed7cb235a 100644
> --- a/lib/netdev-dpdk.h
> +++ b/lib/netdev-dpdk.h
> @@ -31,6 +31,7 @@ struct rte_flow_error;
>  struct rte_flow_attr;
>  struct rte_flow_item;
>  struct rte_flow_action;
> +struct rte_flow_query_count;
>  
>  void netdev_dpdk_register(void);
>  void free_dpdk_buf(struct dp_packet *);
> @@ -47,6 +48,11 @@ netdev_dpdk_rte_flow_create(struct netdev *netdev,
>  const struct rte_flow_item *items,
>  const struct rte_flow_action *actions,
>  struct rte_flow_error *error);
> +int
> +netdev_dpdk_rte_flow_query(struct netdev *netdev,
> +   struct rte_flow *rte_flow,
> +   struct rte_flow_query_count *query,
> +   struct rte_flow_error *error);
>  
>  #else
>  
> 


Re: [ovs-dev] [PATCH V3 04/19] netdev-offload-dpdk: Fix typo

2019-12-10 Thread Eli Britstein


On 12/10/2019 4:22 PM, Ilya Maximets wrote:
> On 08.12.2019 14:22, Eli Britstein wrote:
>> Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
>> Signed-off-by: Eli Britstein 
>> Reviewed-by: Oz Shlomo 
>> ---
>>   lib/netdev-offload-dpdk.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
>> index e48ac3b63..041b72d53 100644
>> --- a/lib/netdev-offload-dpdk.c
>> +++ b/lib/netdev-offload-dpdk.c
>> @@ -656,7 +656,7 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
>>  actions.actions, );
>>   
>>   if (!flow) {
>> -VLOG_ERR("%s: rte flow creat error: %u : message : %s\n",
>> +VLOG_ERR("%s: rte flow create error: %u : message : %s\n",
> I don't think that we need to backport this and you're changing/touching
> this line several (three?) times over the series.  So, I think, it's better
> to drop this patch and update along with later changes.  Or keep it and
> drop later changes (reviews for next patches are coming).
OK. will drop it.
>
>>netdev_get_name(netdev), error.type, error.message);
>>   ret = -1;
>>   goto out;
>>


Re: [ovs-dev] [PATCH V3 05/19] netdev-dpdk: Improve HW offload flow debuggability

2019-12-10 Thread Eli Britstein


On 12/10/2019 4:38 PM, Ilya Maximets wrote:
> On 08.12.2019 14:22, Eli Britstein wrote:
>> Add debug prints when creating and destroying rte flows, with all the
>> flow details (attributes, patterns, actions).
>>
>> Signed-off-by: Eli Britstein 
>> Reviewed-by: Oz Shlomo 
>> ---
>>   lib/netdev-dpdk.c | 260 
>> ++
>>   lib/netdev-offload-dpdk.c | 207 +---
>>   2 files changed, 264 insertions(+), 203 deletions(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 89c73a29b..da1349b69 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -4458,9 +4458,252 @@ netdev_dpdk_rte_flow_destroy(struct netdev *netdev,
>>   ovs_mutex_lock(&dev->mutex);
>>   ret = rte_flow_destroy(dev->port_id, rte_flow, error);
>>   ovs_mutex_unlock(&dev->mutex);
>> +if (!ret) {
>> +VLOG_DBG_RL(&rl, "Destroy rte_flow %p: netdev=%s, port=%d\n",
>> +rte_flow, netdev_get_name(netdev), dev->port_id);
>> +} else {
>> +VLOG_ERR_RL(&rl, "Destroy rte_flow %p: netdev=%s, port=%d\n"
>> +"FAILED. error %u : message : %s",
>> +rte_flow, netdev_get_name(netdev), dev->port_id,
>> +error->type, error->message);
>> +}
> So, what is the reason of doing that?
> netdev_offload_dpdk_add_flow() prints/could print exactly same messages
> (dev->port_id is not that useful to print).
> Am I missing something?
Yes, it could have been printed in netdev_offload_dpdk_add_flow(), but 
if moving there, and you just enable DBG in netdev_dpdk, you will see 
only the create messages, without the destroys. It is helpful for debug.
>
>>   return ret;
>>   }
>>   
>> +static void
>> +ds_put_flow_attr(struct ds *s, const struct rte_flow_attr *attr)
> Please, don't use 'ds_' prefix.  It's reserved for dynamic-string module.
Actually, I used it on purpose, as similar to ds_put_format, 
ds_put_cstr. Those functions do the same for flow_attr, patterns etc. 
Any other suggestion?
>
>> +{
>> +ds_put_format(s,
>> +  "  Attributes: "
>> +  "ingress=%d, egress=%d, prio=%d, group=%d, transfer=%d\n",
>> +  attr->ingress, attr->egress, attr->priority, attr->group,
>> +  attr->transfer);
>> +}
>> +
>> +static void
>> +ds_put_flow_pattern(struct ds *s, const struct rte_flow_item *item)
>> +{
>> +if (item->type == RTE_FLOW_ITEM_TYPE_ETH) {
>> +const struct rte_flow_item_eth *eth_spec = item->spec;
>> +const struct rte_flow_item_eth *eth_mask = item->mask;
>> +
>> +ds_put_cstr(s, "rte flow eth pattern:\n");
>> +if (eth_spec) {
>> +ds_put_format(s,
>> +  "  Spec: src="ETH_ADDR_FMT", dst="ETH_ADDR_FMT", "
>> +  "type=0x%04" PRIx16"\n",
>> +  ETH_ADDR_BYTES_ARGS(eth_spec->src.addr_bytes),
>> +  ETH_ADDR_BYTES_ARGS(eth_spec->dst.addr_bytes),
>> +  ntohs(eth_spec->type));
>> +} else {
>> +ds_put_cstr(s, "  Spec = null\n");
>> +}
>> +if (eth_mask) {
>> +ds_put_format(s,
>> +  "  Mask: src="ETH_ADDR_FMT", dst="ETH_ADDR_FMT", "
>> +  "type=0x%04"PRIx16"\n",
>> +  ETH_ADDR_BYTES_ARGS(eth_mask->src.addr_bytes),
>> +  ETH_ADDR_BYTES_ARGS(eth_mask->dst.addr_bytes),
>> +  ntohs(eth_mask->type));
>> +} else {
>> +ds_put_cstr(s, "  Mask = null\n");
>> +}
>> +} else if (item->type == RTE_FLOW_ITEM_TYPE_VLAN) {
>> +const struct rte_flow_item_vlan *vlan_spec = item->spec;
>> +const struct rte_flow_item_vlan *vlan_mask = item->mask;
>> +
>> +ds_put_cstr(s, "rte flow vlan pattern:\n");
>> +if (vlan_spec) {
>> +ds_put_format(s,
>> +  "  Spec: inner_type=0x%"PRIx16", 
>> tci=0x%"PRIx16"\n",
>> +  ntohs(vlan_spec->inner_type), 
>> ntohs(vlan_spec->tci));
>> +} else {
>> +ds_put_cstr(s, "  Spec = null\n");
>> +}
>> +
>> +if (vlan_mask) {
>> +ds_put_format(s,
>> +  "  Mask: inner_type=0x%"PRIx16", 
>> tci=0x%"PRIx16"\n",
>> +  ntohs(vlan_mask->inner_type), 
>> ntohs(vlan_mask->tci));
>> +} else {
>> +ds_put_cstr(s, "  Mask = null\n");
>> +}
>> +} else if (item->type == RTE_FLOW_ITEM_TYPE_IPV4) {
>> +const struct rte_flow_item_ipv4 *ipv4_spec = item->spec;
>> +const struct rte_flow_item_ipv4 *ipv4_mask = item->mask;
>> +
>> +ds_put_cstr(s, "rte flow ipv4 pattern:\n");
>> +if (ipv4_spec) {
>> +ds_put_format(s,
>> +  "  Spec: tos=0x%"PRIx8", ttl=%"PRIx8
>> +  ", 

Re: [ovs-dev] [PATCH V3 05/19] netdev-dpdk: Improve HW offload flow debuggability

2019-12-10 Thread Ilya Maximets
On 08.12.2019 14:22, Eli Britstein wrote:
> Add debug prints when creating and destroying rte flows, with all the
> flow details (attributes, patterns, actions).
> 
> Signed-off-by: Eli Britstein 
> Reviewed-by: Oz Shlomo 
> ---
>  lib/netdev-dpdk.c | 260 
> ++
>  lib/netdev-offload-dpdk.c | 207 +---
>  2 files changed, 264 insertions(+), 203 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 89c73a29b..da1349b69 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -4458,9 +4458,252 @@ netdev_dpdk_rte_flow_destroy(struct netdev *netdev,
>  ovs_mutex_lock(&dev->mutex);
>  ret = rte_flow_destroy(dev->port_id, rte_flow, error);
>  ovs_mutex_unlock(&dev->mutex);
> +if (!ret) {
> +VLOG_DBG_RL(&rl, "Destroy rte_flow %p: netdev=%s, port=%d\n",
> +rte_flow, netdev_get_name(netdev), dev->port_id);
> +} else {
> +VLOG_ERR_RL(&rl, "Destroy rte_flow %p: netdev=%s, port=%d\n"
> +"FAILED. error %u : message : %s",
> +rte_flow, netdev_get_name(netdev), dev->port_id,
> +error->type, error->message);
> +}

So, what is the reason of doing that?
netdev_offload_dpdk_add_flow() prints/could print exactly same messages
(dev->port_id is not that useful to print).
Am I missing something?

>  return ret;
>  }
>  
> +static void
> +ds_put_flow_attr(struct ds *s, const struct rte_flow_attr *attr)

Please, don't use 'ds_' prefix.  It's reserved for dynamic-string module.

> +{
> +ds_put_format(s,
> +  "  Attributes: "
> +  "ingress=%d, egress=%d, prio=%d, group=%d, transfer=%d\n",
> +  attr->ingress, attr->egress, attr->priority, attr->group,
> +  attr->transfer);
> +}
> +
> +static void
> +ds_put_flow_pattern(struct ds *s, const struct rte_flow_item *item)
> +{
> +if (item->type == RTE_FLOW_ITEM_TYPE_ETH) {
> +const struct rte_flow_item_eth *eth_spec = item->spec;
> +const struct rte_flow_item_eth *eth_mask = item->mask;
> +
> +ds_put_cstr(s, "rte flow eth pattern:\n");
> +if (eth_spec) {
> +ds_put_format(s,
> +  "  Spec: src="ETH_ADDR_FMT", dst="ETH_ADDR_FMT", "
> +  "type=0x%04" PRIx16"\n",
> +  ETH_ADDR_BYTES_ARGS(eth_spec->src.addr_bytes),
> +  ETH_ADDR_BYTES_ARGS(eth_spec->dst.addr_bytes),
> +  ntohs(eth_spec->type));
> +} else {
> +ds_put_cstr(s, "  Spec = null\n");
> +}
> +if (eth_mask) {
> +ds_put_format(s,
> +  "  Mask: src="ETH_ADDR_FMT", dst="ETH_ADDR_FMT", "
> +  "type=0x%04"PRIx16"\n",
> +  ETH_ADDR_BYTES_ARGS(eth_mask->src.addr_bytes),
> +  ETH_ADDR_BYTES_ARGS(eth_mask->dst.addr_bytes),
> +  ntohs(eth_mask->type));
> +} else {
> +ds_put_cstr(s, "  Mask = null\n");
> +}
> +} else if (item->type == RTE_FLOW_ITEM_TYPE_VLAN) {
> +const struct rte_flow_item_vlan *vlan_spec = item->spec;
> +const struct rte_flow_item_vlan *vlan_mask = item->mask;
> +
> +ds_put_cstr(s, "rte flow vlan pattern:\n");
> +if (vlan_spec) {
> +ds_put_format(s,
> +  "  Spec: inner_type=0x%"PRIx16", 
> tci=0x%"PRIx16"\n",
> +  ntohs(vlan_spec->inner_type), 
> ntohs(vlan_spec->tci));
> +} else {
> +ds_put_cstr(s, "  Spec = null\n");
> +}
> +
> +if (vlan_mask) {
> +ds_put_format(s,
> +  "  Mask: inner_type=0x%"PRIx16", 
> tci=0x%"PRIx16"\n",
> +  ntohs(vlan_mask->inner_type), 
> ntohs(vlan_mask->tci));
> +} else {
> +ds_put_cstr(s, "  Mask = null\n");
> +}
> +} else if (item->type == RTE_FLOW_ITEM_TYPE_IPV4) {
> +const struct rte_flow_item_ipv4 *ipv4_spec = item->spec;
> +const struct rte_flow_item_ipv4 *ipv4_mask = item->mask;
> +
> +ds_put_cstr(s, "rte flow ipv4 pattern:\n");
> +if (ipv4_spec) {
> +ds_put_format(s,
> +  "  Spec: tos=0x%"PRIx8", ttl=%"PRIx8
> +  ", proto=0x%"PRIx8
> +  ", src="IP_FMT", dst="IP_FMT"\n",
> +  ipv4_spec->hdr.type_of_service,
> +  ipv4_spec->hdr.time_to_live,
> +  ipv4_spec->hdr.next_proto_id,
> +  IP_ARGS(ipv4_spec->hdr.src_addr),
> +  IP_ARGS(ipv4_spec->hdr.dst_addr));
> +} else {
> +ds_put_cstr(s, "  Spec = null\n");
> +}
> +if (ipv4_mask) {
> +ds_put_format(s,
> + 

Re: [ovs-dev] [PATCH V3 18/19] netdev-offload-dpdk: Support offload of clone tnl_push/output actions

2019-12-10 Thread Eli Britstein


On 12/10/2019 2:53 PM, Sathya Perla wrote:
> On Tue, Dec 10, 2019 at 5:25 PM Eli Britstein  wrote:
>>
>> On 12/10/2019 12:09 PM, Sathya Perla wrote:
>>> On Tue, Dec 10, 2019 at 1:21 PM Eli Britstein  wrote:
 On 12/10/2019 8:52 AM, Sathya Perla wrote:
> On Sun, Dec 8, 2019 at 6:53 PM Eli Britstein  wrote:
> ...
>> +static int
>> +parse_clone_actions(struct netdev *netdev,
>> +struct flow_actions *actions,
>> +const struct nlattr *clone_actions,
>> +const size_t clone_actions_len,
>> +struct offload_info *info)
>> +{
>> +const struct nlattr *ca;
>> +unsigned int cleft;
>> +
>> +NL_ATTR_FOR_EACH_UNSAFE (ca, cleft, clone_actions, 
>> clone_actions_len) {
>> +int clone_type = nl_attr_type(ca);
>> +
>> +if (clone_type == OVS_ACTION_ATTR_TUNNEL_PUSH) {
>> +const struct ovs_action_push_tnl *tnl_push = 
>> nl_attr_get(ca);
>> +struct rte_flow_action_raw_encap *raw_encap =
>> +xzalloc(sizeof *raw_encap);
>> +
>> +raw_encap->data = (uint8_t *)tnl_push->header;
>> +raw_encap->preserve = NULL;
>> +raw_encap->size = tnl_push->header_len;
>> +
>> +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RAW_ENCAP,
>> +raw_encap);
> Hi, converting OVS_TUNNEL_PUSH into RTE_RAW_ENCAP loses the 'tnl_type'
> information provided by OVS. RAW_ENCAP provides the tunnel header just
> as a blob of bits which may not be ideal for NIC HW to offload.
>
> How about using tnl_push->tnl_type field to parse the header and
> compose specific tunnel encap actions like RTE_VXLAN_ENCAP,
> RTE_NVGRE_ENCAP etc.
> This would also be more in line with how it's done with TC-offload.
 I see your point. However, struct ovs_action_push_tnl has the "header"
 field just as a raw binary buffer, and not detailed like in TC.
 "tnl_type" has a comment /* For logging. */. It is indeed used for
 logging, as in lib/odp-util.c, in function format_odp_tnl_push_header.
>>> This is not entirely true. Check out propagate_tunnel_data_to_flow()
>>> where tnl_type is being used
>>> to figure out nw_proto.
>>>
 using VXLAN_ENCAP instead of RAW_ENCAP will make the push less generic,
 as I will have to parse the header to build the vxlan_encap fields, just
 so they can be re-built as a raw header at the PMD level, so I don't see
 the offload benefit of it.
>>> Why are you assuming that all PMDs will rebuild the tunnel header
>>> fields as a 'raw header'?
>> As far as I saw in DPDK code, if I'm not wrong, all PMDs that support it
>> do it like this. Anyway, it is indeed not relevant.
>>> What if the NIC HW needs tunnel specific headers to be programmed
>>> separately for each tunnel type?
>>>
 Another point is that this way it will support any header, not just
 VXLAN (as an example).
>>> Given that VXLAN and NVGRE ENCAP and DECAP actions are already defined
>>> in rte_flow, how about
>>> we use them for vxlan and nvgre and use RAW for the rest of the tunnel 
>>> types?
>>> We can support all tunnel headers even this way.
>> Again, I see your point.
>>
>> However, in OVS level, we have the encap as a raw buffer and DPDK
>> supports it natively using RAW_ENCAP. For that reason I think we should
>> use the straight forward method.
>>
>> I think that if there is a PMD that requires the fields separately, it
>> is under its responsibility to parse it, and not forcing the application
>> to do it.
> How should a PMD parse the header buffer? E.g., for vxlan, should the PMD
> look for dst UDP port 4789? What if a different port number is used
> for vxlan, as is the case with some deployments?
> I think it's important to deal with this in OVS because the 'tnl_type'
> info is available in OVS and it should be relayed to the PMD
> in some form.
>
>> As it's a valid attribute in DPDK, and it's the natural way for OVS to
>> use, I think we should use it. If it is from some reason not valid, in
>> the future, DPDK will deprecate it.
> I agree RAW_ENCAP is a valid RTE action. But I feel we must choose the
> action that conveys/translates as much
> data as possible from the OVS layer to the PMD. This will enable
> offload support from a larger number of NIC HWs.

AFAIU, currently there is no such NIC HW that actually uses the
additional info in VXLAN_ENCAP vs RAW_ENCAP, but they would just
re-build the header as if it had been received with RAW, so for now I
don't see any benefit of doing so. In the future, if there is such HW as
you suggest, maybe it can be enhanced.
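
For concreteness, the tnl_type dispatch being proposed could look roughly
like this (a sketch; assuming 'tnl_type' carries the OVS_VPORT_TYPE_* value
it is set from, and that a vxlan-specific builder exists):

const struct ovs_action_push_tnl *tnl_push = nl_attr_get(ca);

if (tnl_push->tnl_type == OVS_VPORT_TYPE_VXLAN) {
    /* Hypothetical helper that parses the raw header and emits
     * RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP items. */
    add_vxlan_encap_action(actions, tnl_push);
} else {
    struct rte_flow_action_raw_encap *raw_encap = xzalloc(sizeof *raw_encap);

    raw_encap->data = (uint8_t *) tnl_push->header;
    raw_encap->size = tnl_push->header_len;
    add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RAW_ENCAP, raw_encap);
}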

What do you think?

>
> thanks!


Re: [ovs-dev] [PATCH V3 04/19] netdev-offload-dpdk: Fix typo

2019-12-10 Thread Ilya Maximets
On 08.12.2019 14:22, Eli Britstein wrote:
> Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
> Signed-off-by: Eli Britstein 
> Reviewed-by: Oz Shlomo 
> ---
>  lib/netdev-offload-dpdk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index e48ac3b63..041b72d53 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -656,7 +656,7 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
> actions.actions, );
>  
>  if (!flow) {
> -VLOG_ERR("%s: rte flow creat error: %u : message : %s\n",
> +VLOG_ERR("%s: rte flow create error: %u : message : %s\n",

I don't think that we need to backport this and you're changing/touching
this line several (three?) times over the series.  So, I think, it's better
to drop this patch and update along with later changes.  Or keep it and
drop later changes (reviews for next patches are coming).

>   netdev_get_name(netdev), error.type, error.message);
>  ret = -1;
>  goto out;
> 


Re: [ovs-dev] [PATCH V3 02/19] netdev-offload-dpdk: Refactor action items freeing scheme

2019-12-10 Thread Eli Britstein


On 12/10/2019 4:09 PM, Ilya Maximets wrote:
> On 10.12.2019 15:06, Eli Britstein wrote:
>> On 12/10/2019 3:43 PM, Ilya Maximets wrote:
>>> On 08.12.2019 14:22, Eli Britstein wrote:
 Action item data structures are pointed to by rte_flow_action items.
 Refactor the code to free the data structures when freeing the
 rte_flow_action items, making future actions simpler to add to
 the code.

 Signed-off-by: Eli Britstein 
 ---
lib/netdev-offload-dpdk.c | 92 
 ++-
1 file changed, 52 insertions(+), 40 deletions(-)

 diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
 index a008e642f..c3b595a0a 100644
 --- a/lib/netdev-offload-dpdk.c
 +++ b/lib/netdev-offload-dpdk.c
 @@ -374,40 +374,19 @@ add_flow_action(struct flow_actions *actions, enum 
 rte_flow_action_type type,
actions->cnt++;
}

 -struct action_rss_data {
 -struct rte_flow_action_rss conf;
 -uint16_t queue[0];
 -};
 -
 -static struct action_rss_data *
 -add_flow_rss_action(struct flow_actions *actions,
 -struct netdev *netdev)
 +static void
 +free_flow_actions(struct flow_actions *actions)
{
int i;
 -struct action_rss_data *rss_data;
 -
 -rss_data = xmalloc(sizeof *rss_data +
 -   netdev_n_rxq(netdev) * sizeof rss_data->queue[0]);
 -*rss_data = (struct action_rss_data) {
 -.conf = (struct rte_flow_action_rss) {
 -.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
 -.level = 0,
 -.types = 0,
 -.queue_num = netdev_n_rxq(netdev),
 -.queue = rss_data->queue,
 -.key_len = 0,
 -.key  = NULL
 -},
 -};

 -/* Override queue array with default. */
 -for (i = 0; i < netdev_n_rxq(netdev); i++) {
 -   rss_data->queue[i] = i;
 +for (i = 0; i < actions->cnt; i++) {
 +if (actions->actions[i].conf) {
 +free((void *)actions->actions[i].conf);
>>> Please, don't cast the argument.
>> The conf field is declared as "const void *" in DPDK. How can we avoid it?
> If it's here to remove the 'const', use explicit CONST_CAST instead.
OK
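
I.e. (a sketch):

for (i = 0; i < actions->cnt; i++) {
    if (actions->actions[i].conf) {
        free(CONST_CAST(void *, actions->actions[i].conf));
    }
}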
>
 +}
}
 -
 -add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RSS, &rss_data->conf);
 -
 -return rss_data;
 +free(actions->actions);
 +actions->actions = NULL;
 +actions->cnt = 0;
}

struct flow_items {
 @@ -572,6 +551,47 @@ parse_flow_match(struct flow_patterns *patterns,
return 0;
}

 +static void
 +add_flow_mark_rss_actions(struct flow_actions *actions,
 +  uint32_t flow_mark,
 +  struct netdev *netdev)
>>> const struct netdev *netdev)
>> OK
 +{
 +struct rte_flow_action_mark *mark;
 +struct action_rss_data {
 +struct rte_flow_action_rss conf;
 +uint16_t queue[0];
 +} *rss_data;
>>> It seems we need this:
>>>
>>> BUILD_ASSERT_DECL(offsetof(struct action_rss_data, conf) == 0);
>> OK
 +int i;
 +
 +mark = xzalloc(sizeof *mark);
 +
 +mark->id = flow_mark;
 +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_MARK, mark);
 +
 +rss_data = xmalloc(sizeof *rss_data +
 +   netdev_n_rxq(netdev) * sizeof rss_data->queue[0]);
 +*rss_data = (struct action_rss_data) {
 +.conf = (struct rte_flow_action_rss) {
 +.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
 +.level = 0,
 +.types = 0,
 +.queue_num = netdev_n_rxq(netdev),
 +.queue = rss_data->queue,
 +.key_len = 0,
 +.key  = NULL
 +},
 +};
 +
 +/* Override queue array with default. */
 +for (i = 0; i < netdev_n_rxq(netdev); i++) {
 +   rss_data->queue[i] = i;
 +}
 +
 +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RSS, &rss_data->conf);
 +
>>> This empty line doesn't seem to be necessary.
>> OK
 +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_END, NULL);
 +}
 +
static int
netdev_offload_dpdk_add_flow(struct netdev *netdev,
 const struct match *match,
 @@ -599,20 +619,12 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
goto out;
}

 -struct rte_flow_action_mark mark;
 -struct action_rss_data *rss;
 -
 -mark.id = info->flow_mark;
 -add_flow_action(&actions, RTE_FLOW_ACTION_TYPE_MARK, &mark);
 -
 -rss = add_flow_rss_action(&actions, netdev);
 -add_flow_action(&actions, 

Re: [ovs-dev] [PATCH V3 16/19] netdev-offload-dpdk: Support offload of set MAC actions

2019-12-10 Thread Andrew Rybchenko
On 12/8/19 4:23 PM, Eli Britstein wrote:
> Signed-off-by: Eli Britstein 
> Reviewed-by: Oz Shlomo 
> ---
>  Documentation/howto/dpdk.rst |   1 +
>  NEWS |   3 +-
>  lib/netdev-dpdk.c|  15 ++
>  lib/netdev-offload-dpdk.c| 107 
> +++
>  4 files changed, 125 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index 6cedd7f63..d228493e9 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -392,6 +392,7 @@ Supported actions for hardware offload are:
>  
>  - Output (a single output, as the last action).
>  - Drop.
> +- Modification of Ethernet (mod_dl_src/mod_dl_dst).
>  
>  Further Reading
>  ---
> diff --git a/NEWS b/NEWS
> index d019e066f..3ade86d49 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -26,7 +26,8 @@ Post-v2.12.0
>   * DPDK ring ports (dpdkr) are deprecated and will be removed in next
> releases.
>   * Add support for DPDK 19.11.
> - * Add hardware offload support for output and drop actions 
> (experimental).
> + * Add hardware offload support for output, drop and set MAC actions
> +   (experimental).
>  
>  v2.12.0 - 03 Sep 2019
>  -
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 872a45e75..e67a3dd76 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -4723,6 +4723,21 @@ ds_put_flow_action(struct ds *s, const struct 
> rte_flow_action *actions)
>  }
>  } else if (actions->type == RTE_FLOW_ACTION_TYPE_DROP) {
>  ds_put_cstr(s, "rte flow drop action\n");
> +} else if (actions->type == RTE_FLOW_ACTION_TYPE_SET_MAC_SRC ||
> +   actions->type == RTE_FLOW_ACTION_TYPE_SET_MAC_DST) {
> +const struct rte_flow_action_set_mac *set_mac = actions->conf;
> +
> +char *dirstr = actions->type == RTE_FLOW_ACTION_TYPE_SET_MAC_DST
> +   ? "dst" : "src";
> +
> +ds_put_format(s, "rte flow set-mac-%s action:\n", dirstr);
> +if (set_mac) {
> +ds_put_format(s,
> +  "  Set-mac-%s: "ETH_ADDR_FMT"\n",
> +  dirstr, ETH_ADDR_BYTES_ARGS(set_mac->mac_addr));
> +} else {
> +ds_put_format(s, "  Set-mac-%s = null\n", dirstr);
> +}
>  } else {
>  ds_put_format(s, "unknown rte flow action (%d)\n", actions->type);
>  }
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index bffb69cad..07f5d4687 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -511,6 +511,103 @@ add_output_action(struct netdev *netdev,
>  return ret;
>  }
>  
> +struct set_action_info {
> +const uint8_t *value, *mask;
> +const uint8_t size;
> +uint8_t *spec;
> +const int attr;
> +};
> +
> +static int
> +add_set_flow_action(struct flow_actions *actions,
> +struct set_action_info *sa_info_arr,
> +size_t sa_info_arr_size)
> +{
> +int field, i;
> +
> +for (field = 0; field < sa_info_arr_size; field++) {
> +if (sa_info_arr[field].mask) {
> +/* DPDK does not support partially masked set actions. In such
> + * case, fail the offload.
> + */
> +if (sa_info_arr[field].mask[0] != 0x00 &&
> +sa_info_arr[field].mask[0] != 0xFF) {
> +VLOG_DBG_RL(&error_rl,
> +"Partial mask is not supported");
> +return -1;
> +}
> +
> +for (i = 1; i < sa_info_arr[field].size; i++) {
> +if (sa_info_arr[field].mask[i] !=
> +sa_info_arr[field].mask[i - 1]) {
> +VLOG_DBG_RL(&error_rl,
> +"Partial mask is not supported");
> +return -1;
> +}
> +}
> +
> +if (sa_info_arr[field].mask[0] == 0x00) {
> +/* mask bytes are all 0 - no rewrite action required */
> +continue;
> +}
> +}
> +
> +memcpy(sa_info_arr[field].spec, sa_info_arr[field].value,
> +   sa_info_arr[field].size);
> +add_flow_action(actions, sa_info_arr[field].attr,
> +sa_info_arr[field].spec);
> +}
> +
> +return 0;
> +}
> +
> +/* Mask is at the midpoint of the data. */
> +#define get_mask(a, type) ((const type *)(const void *)(a + 1) + 1)
> +
> +#define SA_INFO(_field, _spec, _attr) { \
> +.value = (uint8_t *)&key->_field, \
> +.mask = (masked) ? (uint8_t *)&mask->_field : NULL, \
> +.size = sizeof key->_field, \
> +.spec = (uint8_t *)&_spec, \
> +.attr = _attr }
> +
> +static int
> +parse_set_actions(struct flow_actions *actions,
> +  const struct nlattr *set_actions,
> +  const size_t set_actions_len,
> +  bool masked)
> +{
> +const 

Re: [ovs-dev] [PATCH V3 02/19] netdev-offload-dpdk: Refactor action items freeing scheme

2019-12-10 Thread Ilya Maximets
On 10.12.2019 15:06, Eli Britstein wrote:
> 
> On 12/10/2019 3:43 PM, Ilya Maximets wrote:
>> On 08.12.2019 14:22, Eli Britstein wrote:
>>> Action item data structures are pointed to by rte_flow_action items.
>>> Refactor the code to free the data structures when freeing the
>>> rte_flow_action items, making future actions simpler to add to
>>> the code.
>>>
>>> Signed-off-by: Eli Britstein 
>>> ---
>>>   lib/netdev-offload-dpdk.c | 92 
>>> ++-
>>>   1 file changed, 52 insertions(+), 40 deletions(-)
>>>
>>> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
>>> index a008e642f..c3b595a0a 100644
>>> --- a/lib/netdev-offload-dpdk.c
>>> +++ b/lib/netdev-offload-dpdk.c
>>> @@ -374,40 +374,19 @@ add_flow_action(struct flow_actions *actions, enum 
>>> rte_flow_action_type type,
>>>   actions->cnt++;
>>>   }
>>>   
>>> -struct action_rss_data {
>>> -struct rte_flow_action_rss conf;
>>> -uint16_t queue[0];
>>> -};
>>> -
>>> -static struct action_rss_data *
>>> -add_flow_rss_action(struct flow_actions *actions,
>>> -struct netdev *netdev)
>>> +static void
>>> +free_flow_actions(struct flow_actions *actions)
>>>   {
>>>   int i;
>>> -struct action_rss_data *rss_data;
>>> -
>>> -rss_data = xmalloc(sizeof *rss_data +
>>> -   netdev_n_rxq(netdev) * sizeof rss_data->queue[0]);
>>> -*rss_data = (struct action_rss_data) {
>>> -.conf = (struct rte_flow_action_rss) {
>>> -.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
>>> -.level = 0,
>>> -.types = 0,
>>> -.queue_num = netdev_n_rxq(netdev),
>>> -.queue = rss_data->queue,
>>> -.key_len = 0,
>>> -.key  = NULL
>>> -},
>>> -};
>>>   
>>> -/* Override queue array with default. */
>>> -for (i = 0; i < netdev_n_rxq(netdev); i++) {
>>> -   rss_data->queue[i] = i;
>>> +for (i = 0; i < actions->cnt; i++) {
>>> +if (actions->actions[i].conf) {
>>> +free((void *)actions->actions[i].conf);
>> Please, don't cast the argument.
> The conf field is declared as "const void *" in DPDK. How can we avoid it?

If it's here to remove the 'const', use explicit CONST_CAST instead.
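
A minimal sketch of that suggestion, assuming the free_flow_actions()
loop quoted above and OVS's CONST_CAST macro from lib/util.h:

    for (i = 0; i < actions->cnt; i++) {
        if (actions->actions[i].conf) {
            /* DPDK declares 'conf' as 'const void *'; drop the
             * qualifier explicitly instead of with a bare cast. */
            free(CONST_CAST(void *, actions->actions[i].conf));
        }
    }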

>>
>>> +}
>>>   }
>>> -
> >>> -add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RSS, &rss_data->conf);
>>> -
>>> -return rss_data;
>>> +free(actions->actions);
>>> +actions->actions = NULL;
>>> +actions->cnt = 0;
>>>   }
>>>   
>>>   struct flow_items {
>>> @@ -572,6 +551,47 @@ parse_flow_match(struct flow_patterns *patterns,
>>>   return 0;
>>>   }
>>>   
>>> +static void
>>> +add_flow_mark_rss_actions(struct flow_actions *actions,
>>> +  uint32_t flow_mark,
>>> +  struct netdev *netdev)
>> const struct netdev *netdev)
> OK
>>
>>> +{
>>> +struct rte_flow_action_mark *mark;
>>> +struct action_rss_data {
>>> +struct rte_flow_action_rss conf;
>>> +uint16_t queue[0];
>>> +} *rss_data;
>> It seems we need this:
>>
>> BUILD_ASSERT_DECL(offsetof(struct action_rss_data, conf) == 0);
> OK
>>
>>> +int i;
>>> +
>>> +mark = xzalloc(sizeof *mark);
>>> +
>>> +mark->id = flow_mark;
>>> +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_MARK, mark);
>>> +
>>> +rss_data = xmalloc(sizeof *rss_data +
>>> +   netdev_n_rxq(netdev) * sizeof rss_data->queue[0]);
>>> +*rss_data = (struct action_rss_data) {
>>> +.conf = (struct rte_flow_action_rss) {
>>> +.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
>>> +.level = 0,
>>> +.types = 0,
>>> +.queue_num = netdev_n_rxq(netdev),
>>> +.queue = rss_data->queue,
>>> +.key_len = 0,
>>> +.key  = NULL
>>> +},
>>> +};
>>> +
>>> +/* Override queue array with default. */
>>> +for (i = 0; i < netdev_n_rxq(netdev); i++) {
>>> +   rss_data->queue[i] = i;
>>> +}
>>> +
> >>> +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RSS, &rss_data->conf);
>>> +
>> This empty line doesn't seem to be necessary.
> OK
>>
>>> +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_END, NULL);
>>> +}
>>> +
>>>   static int
>>>   netdev_offload_dpdk_add_flow(struct netdev *netdev,
>>>const struct match *match,
>>> @@ -599,20 +619,12 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
>>>   goto out;
>>>   }
>>>   
>>> -struct rte_flow_action_mark mark;
>>> -struct action_rss_data *rss;
>>> -
>>> -mark.id = info->flow_mark;
> >>> -add_flow_action(&actions, RTE_FLOW_ACTION_TYPE_MARK, &mark);
> >>> -
> >>> -rss = add_flow_rss_action(&actions, netdev);
> >>> -add_flow_action(&actions, RTE_FLOW_ACTION_TYPE_END, NULL);
> >>> +add_flow_mark_rss_actions(&actions, info->flow_mark, netdev);
> >>>   
> >>> flow = netdev_dpdk_rte_flow_create(netdev, &flow_attr,
>>>  

Re: [ovs-dev] [PATCH V3 02/19] netdev-offload-dpdk: Refactor action items freeing scheme

2019-12-10 Thread Eli Britstein


On 12/10/2019 3:43 PM, Ilya Maximets wrote:
> On 08.12.2019 14:22, Eli Britstein wrote:
>> Action item data structures are pointed to by rte_flow_action items.
>> Refactor the code to free the data structures when freeing the
>> rte_flow_action items, making future actions simpler to add to
>> the code.
>>
>> Signed-off-by: Eli Britstein 
>> ---
>>   lib/netdev-offload-dpdk.c | 92 
>> ++-
>>   1 file changed, 52 insertions(+), 40 deletions(-)
>>
>> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
>> index a008e642f..c3b595a0a 100644
>> --- a/lib/netdev-offload-dpdk.c
>> +++ b/lib/netdev-offload-dpdk.c
>> @@ -374,40 +374,19 @@ add_flow_action(struct flow_actions *actions, enum 
>> rte_flow_action_type type,
>>   actions->cnt++;
>>   }
>>   
>> -struct action_rss_data {
>> -struct rte_flow_action_rss conf;
>> -uint16_t queue[0];
>> -};
>> -
>> -static struct action_rss_data *
>> -add_flow_rss_action(struct flow_actions *actions,
>> -struct netdev *netdev)
>> +static void
>> +free_flow_actions(struct flow_actions *actions)
>>   {
>>   int i;
>> -struct action_rss_data *rss_data;
>> -
>> -rss_data = xmalloc(sizeof *rss_data +
>> -   netdev_n_rxq(netdev) * sizeof rss_data->queue[0]);
>> -*rss_data = (struct action_rss_data) {
>> -.conf = (struct rte_flow_action_rss) {
>> -.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
>> -.level = 0,
>> -.types = 0,
>> -.queue_num = netdev_n_rxq(netdev),
>> -.queue = rss_data->queue,
>> -.key_len = 0,
>> -.key  = NULL
>> -},
>> -};
>>   
>> -/* Override queue array with default. */
>> -for (i = 0; i < netdev_n_rxq(netdev); i++) {
>> -   rss_data->queue[i] = i;
>> +for (i = 0; i < actions->cnt; i++) {
>> +if (actions->actions[i].conf) {
>> +free((void *)actions->actions[i].conf);
> Please, don't cast the argument.
The conf field is declared as "const void *" in DPDK. How can we avoid it?
>
>> +}
>>   }
>> -
> >> -add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RSS, &rss_data->conf);
>> -
>> -return rss_data;
>> +free(actions->actions);
>> +actions->actions = NULL;
>> +actions->cnt = 0;
>>   }
>>   
>>   struct flow_items {
>> @@ -572,6 +551,47 @@ parse_flow_match(struct flow_patterns *patterns,
>>   return 0;
>>   }
>>   
>> +static void
>> +add_flow_mark_rss_actions(struct flow_actions *actions,
>> +  uint32_t flow_mark,
>> +  struct netdev *netdev)
> const struct netdev *netdev)
OK
>
>> +{
>> +struct rte_flow_action_mark *mark;
>> +struct action_rss_data {
>> +struct rte_flow_action_rss conf;
>> +uint16_t queue[0];
>> +} *rss_data;
> It seems we need this:
>
> BUILD_ASSERT_DECL(offsetof(struct action_rss_data, conf) == 0);
OK
>
>> +int i;
>> +
>> +mark = xzalloc(sizeof *mark);
>> +
>> +mark->id = flow_mark;
>> +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_MARK, mark);
>> +
>> +rss_data = xmalloc(sizeof *rss_data +
>> +   netdev_n_rxq(netdev) * sizeof rss_data->queue[0]);
>> +*rss_data = (struct action_rss_data) {
>> +.conf = (struct rte_flow_action_rss) {
>> +.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
>> +.level = 0,
>> +.types = 0,
>> +.queue_num = netdev_n_rxq(netdev),
>> +.queue = rss_data->queue,
>> +.key_len = 0,
>> +.key  = NULL
>> +},
>> +};
>> +
>> +/* Override queue array with default. */
>> +for (i = 0; i < netdev_n_rxq(netdev); i++) {
>> +   rss_data->queue[i] = i;
>> +}
>> +
> >> +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RSS, &rss_data->conf);
>> +
> This empty line doesn't seem to be necessary.
OK
>
>> +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_END, NULL);
>> +}
>> +
>>   static int
>>   netdev_offload_dpdk_add_flow(struct netdev *netdev,
>>const struct match *match,
>> @@ -599,20 +619,12 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
>>   goto out;
>>   }
>>   
>> -struct rte_flow_action_mark mark;
>> -struct action_rss_data *rss;
>> -
>> -mark.id = info->flow_mark;
>> -add_flow_action(&actions, RTE_FLOW_ACTION_TYPE_MARK, &mark);
>> -
>> -rss = add_flow_rss_action(&actions, netdev);
>> -add_flow_action(&actions, RTE_FLOW_ACTION_TYPE_END, NULL);
>> +add_flow_mark_rss_actions(&actions, info->flow_mark, netdev);
>>   
>> flow = netdev_dpdk_rte_flow_create(netdev, &flow_attr,
>>  patterns.items,
>>  actions.actions, &error);
>>   
>> -free(rss);
>>   if (!flow) {
>>   VLOG_ERR("%s: rte flow creat error: %u : message : %s\n",
>>netdev_get_name(netdev), error.type, error.message);

Re: [ovs-dev] [PATCH V3 18/19] netdev-offload-dpdk: Support offload of clone tnl_push/output actions

2019-12-10 Thread Sathya Perla via dev
On Tue, Dec 10, 2019 at 5:25 PM Eli Britstein  wrote:
>
>
> On 12/10/2019 12:09 PM, Sathya Perla wrote:
> > On Tue, Dec 10, 2019 at 1:21 PM Eli Britstein  wrote:
> >>
> >> On 12/10/2019 8:52 AM, Sathya Perla wrote:
> >>> On Sun, Dec 8, 2019 at 6:53 PM Eli Britstein  wrote:
> >>> ...
>  +static int
>  +parse_clone_actions(struct netdev *netdev,
>  +struct flow_actions *actions,
>  +const struct nlattr *clone_actions,
>  +const size_t clone_actions_len,
>  +struct offload_info *info)
>  +{
>  +const struct nlattr *ca;
>  +unsigned int cleft;
>  +
>  +NL_ATTR_FOR_EACH_UNSAFE (ca, cleft, clone_actions, 
>  clone_actions_len) {
>  +int clone_type = nl_attr_type(ca);
>  +
>  +if (clone_type == OVS_ACTION_ATTR_TUNNEL_PUSH) {
>  +const struct ovs_action_push_tnl *tnl_push = 
>  nl_attr_get(ca);
>  +struct rte_flow_action_raw_encap *raw_encap =
>  +xzalloc(sizeof *raw_encap);
>  +
>  +raw_encap->data = (uint8_t *)tnl_push->header;
>  +raw_encap->preserve = NULL;
>  +raw_encap->size = tnl_push->header_len;
>  +
>  +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RAW_ENCAP,
>  +raw_encap);
> >>> Hi, converting OVS_TUNNEL_PUSH into RTE_RAW_ENCAP loses the 'tnl_type'
> >>> information provided by OVS. RAW_ENCAP provides the tunnel header just
> >>> as a blob of bits which may not be ideal for NIC HW to offload.
> >>>
> >>> How about using tnl_push->tnl_type field to parse the header and
> >>> compose specific tunnel encap actions like RTE_VXLAN_ENCAP,
> >>> RTE_NVGRE_ENCAP etc.
> >>> This would also be more in line with how it's done with TC offload.
> >> I see your point. However, struct ovs_action_push_tnl has the "header"
> >> field just as a raw binary buffer, and not detailed like in TC.
> >> "tnl_type" has a comment /* For logging. */. It is indeed used for
> >> logging, as in lib/odp-util.c, in function format_odp_tnl_push_header.
> > This is not entirely true. Check out propagate_tunnel_data_to_flow()
> > where tnl_type is being used
> > to figure out nw_proto.
> >
> >> using VXLAN_ENCAP instead of RAW_ENCAP will make the push less generic,
> >> as I will have to parse the header to build the vxlan_encap fields, just
> >> so they can re-build as a raw header in the PMD level, so I don't see
> >> the offload benefit of it.
> > Why are you assuming that all PMDs will rebuild the tunnel header
> > fields as a 'raw header'?
> As far as I saw in DPDK code, if I'm not wrong, all PMDs that support it
> do it like this. Anyway, it is indeed not relevant.
> > What if the NIC HW needs tunnel specific headers to be programmed
> > separately for each tunnel type?
> >
> >> Another point is that this way it will support any header, not just
> >> VXLAN (as an example).
> > Given that VXLAN and NVGRE ENCAP and DECAP actions are already defined
> > in rte_flow, how about
> > we use them for vxlan and nvgre and use RAW for the rest of the tunnel 
> > types?
> > We can support all tunnel headers even this way.
>
> Again, I see your point.
>
> However, at the OVS level, we have the encap as a raw buffer and DPDK
> supports it natively using RAW_ENCAP. For that reason I think we should
> use the straightforward method.
>
> I think that if there is a PMD that requires the fields separately, it
> is under its responsibility to parse it, and not forcing the application
> to do it.

How should a PMD parse the header buffer? E.g., for vxlan, should the PMD
look for dst UDP port 4789? What if a different port number is used
for vxlan, as is the case with some deployments?
I think it's important to deal with this in OVS because the 'tnl_type'
info is available in OVS and it should be relayed to the PMD
in some form.

>
> As it's a valid attribute in DPDK, and it's the natural way for OVS to
> use, I think we should use it. If it is for some reason not valid, in
> the future, DPDK will deprecate it.

I agree RAW_ENCAP is a valid RTE action. But I feel we must choose the
action that conveys/translates as much
data as possible from the OVS layer to the PMD. This will enable
offload support from a greater number of NIC HWs.

thanks!
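
A minimal sketch of the dispatch suggested here, assuming that
tnl_push->tnl_type carries OVS_VPORT_TYPE_* values (as used by
propagate_tunnel_data_to_flow()) and with hypothetical helper names for
the two encap flavors:

    switch (tnl_push->tnl_type) {
    case OVS_VPORT_TYPE_VXLAN:
        /* Parse tnl_push->header into the item list that
         * struct rte_flow_action_vxlan_encap expects. */
        ret = add_vxlan_encap_action(actions, tnl_push);
        break;
    default:
        /* Fall back to the opaque header blob. */
        ret = add_raw_encap_action(actions, tnl_push);
        break;
    }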


Re: [ovs-dev] [PATCH V3 02/19] netdev-offload-dpdk: Refactor action items freeing scheme

2019-12-10 Thread Ilya Maximets
On 08.12.2019 14:22, Eli Britstein wrote:
> Action item data structures are pointed to by rte_flow_action items.
> Refactor the code to free the data structures when freeing the
> rte_flow_action items, making future actions simpler to add to
> the code.
> 
> Signed-off-by: Eli Britstein 
> ---
>  lib/netdev-offload-dpdk.c | 92 
> ++-
>  1 file changed, 52 insertions(+), 40 deletions(-)
> 
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index a008e642f..c3b595a0a 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -374,40 +374,19 @@ add_flow_action(struct flow_actions *actions, enum 
> rte_flow_action_type type,
>  actions->cnt++;
>  }
>  
> -struct action_rss_data {
> -struct rte_flow_action_rss conf;
> -uint16_t queue[0];
> -};
> -
> -static struct action_rss_data *
> -add_flow_rss_action(struct flow_actions *actions,
> -struct netdev *netdev)
> +static void
> +free_flow_actions(struct flow_actions *actions)
>  {
>  int i;
> -struct action_rss_data *rss_data;
> -
> -rss_data = xmalloc(sizeof *rss_data +
> -   netdev_n_rxq(netdev) * sizeof rss_data->queue[0]);
> -*rss_data = (struct action_rss_data) {
> -.conf = (struct rte_flow_action_rss) {
> -.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
> -.level = 0,
> -.types = 0,
> -.queue_num = netdev_n_rxq(netdev),
> -.queue = rss_data->queue,
> -.key_len = 0,
> -.key  = NULL
> -},
> -};
>  
> -/* Override queue array with default. */
> -for (i = 0; i < netdev_n_rxq(netdev); i++) {
> -   rss_data->queue[i] = i;
> +for (i = 0; i < actions->cnt; i++) {
> +if (actions->actions[i].conf) {
> +free((void *)actions->actions[i].conf);

Please, don't cast the argument.

> +}
>  }
> -
> -add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RSS, &rss_data->conf);
> -
> -return rss_data;
> +free(actions->actions);
> +actions->actions = NULL;
> +actions->cnt = 0;
>  }
>  
>  struct flow_items {
> @@ -572,6 +551,47 @@ parse_flow_match(struct flow_patterns *patterns,
>  return 0;
>  }
>  
> +static void
> +add_flow_mark_rss_actions(struct flow_actions *actions,
> +  uint32_t flow_mark,
> +  struct netdev *netdev)

const struct netdev *netdev)

> +{
> +struct rte_flow_action_mark *mark;
> +struct action_rss_data {
> +struct rte_flow_action_rss conf;
> +uint16_t queue[0];
> +} *rss_data;

It seems we need this:

BUILD_ASSERT_DECL(offsetof(struct action_rss_data, conf) == 0); 
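
The check matters because free_flow_actions() frees each non-NULL 'conf'
pointer directly: passing &rss_data->conf to free() only releases the
whole xmalloc'd block if 'conf' sits at offset zero of the enclosing
struct, i.e. the guard amounts to:

    /* free(&rss_data->conf) must be equivalent to free(rss_data). */
    BUILD_ASSERT_DECL(offsetof(struct action_rss_data, conf) == 0);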

> +int i;
> +
> +mark = xzalloc(sizeof *mark);
> +
> +mark->id = flow_mark;
> +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_MARK, mark);
> +
> +rss_data = xmalloc(sizeof *rss_data +
> +   netdev_n_rxq(netdev) * sizeof rss_data->queue[0]);
> +*rss_data = (struct action_rss_data) {
> +.conf = (struct rte_flow_action_rss) {
> +.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
> +.level = 0,
> +.types = 0,
> +.queue_num = netdev_n_rxq(netdev),
> +.queue = rss_data->queue,
> +.key_len = 0,
> +.key  = NULL
> +},
> +};
> +
> +/* Override queue array with default. */
> +for (i = 0; i < netdev_n_rxq(netdev); i++) {
> +   rss_data->queue[i] = i;
> +}
> +
> +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RSS, &rss_data->conf);
> +

This empty line doesn't seem to be necessary.

> +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_END, NULL);
> +}
> +
>  static int
>  netdev_offload_dpdk_add_flow(struct netdev *netdev,
>   const struct match *match,
> @@ -599,20 +619,12 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
>  goto out;
>  }
>  
> -struct rte_flow_action_mark mark;
> -struct action_rss_data *rss;
> -
> -mark.id = info->flow_mark;
> -add_flow_action(&actions, RTE_FLOW_ACTION_TYPE_MARK, &mark);
> -
> -rss = add_flow_rss_action(&actions, netdev);
> -add_flow_action(&actions, RTE_FLOW_ACTION_TYPE_END, NULL);
> +add_flow_mark_rss_actions(&actions, info->flow_mark, netdev);
>  
>  flow = netdev_dpdk_rte_flow_create(netdev, &flow_attr,
> patterns.items,
> actions.actions, &error);
>  
> -free(rss);
>  if (!flow) {
>  VLOG_ERR("%s: rte flow creat error: %u : message : %s\n",
>   netdev_get_name(netdev), error.type, error.message);
> @@ -625,7 +637,7 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
>  
>  out:
>  free(patterns.items);
> -free(actions.actions);
> +free_flow_actions(&actions);
>  return ret;
>  }
>  
> 

Re: [ovs-dev] [PATCH V3 01/19] netdev-offload-dpdk: Refactor flow patterns

2019-12-10 Thread Eli Britstein


On 12/10/2019 3:27 PM, Ilya Maximets wrote:
> On 08.12.2019 14:22, Eli Britstein wrote:
>> Refactor the flow patterns code to a helper function for better
>> readability and towards supporting more matches.
>>
>> Signed-off-by: Eli Britstein 
>> Reviewed-by: Oz Shlomo 
>> ---
> Although I'm not quite sure that we need this change, a few comments
> for the current patch inline.
>
>>   lib/netdev-offload-dpdk.c | 208 
>> +-
>>   1 file changed, 112 insertions(+), 96 deletions(-)
>>
>> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
>> index 96794dc4d..a008e642f 100644
>> --- a/lib/netdev-offload-dpdk.c
>> +++ b/lib/netdev-offload-dpdk.c
>> @@ -410,54 +410,42 @@ add_flow_rss_action(struct flow_actions *actions,
>>   return rss_data;
>>   }
>>   
>> +struct flow_items {
>> +struct rte_flow_item_eth  eth;
>> +struct rte_flow_item_vlan vlan;
>> +struct rte_flow_item_ipv4 ipv4;
>> +union {
>> +struct rte_flow_item_tcp  tcp;
>> +struct rte_flow_item_udp  udp;
>> +struct rte_flow_item_sctp sctp;
>> +struct rte_flow_item_icmp icmp;
>> +};
>> +};
>> +
>>   static int
>> -netdev_offload_dpdk_add_flow(struct netdev *netdev,
>> - const struct match *match,
>> - struct nlattr *nl_actions OVS_UNUSED,
>> - size_t actions_len OVS_UNUSED,
>> - const ovs_u128 *ufid,
>> - struct offload_info *info)
>> +parse_flow_match(struct flow_patterns *patterns,
>> + struct flow_items *spec,
>> + struct flow_items *mask,
>> + const struct match *match)
>>   {
>> -const struct rte_flow_attr flow_attr = {
>> -.group = 0,
>> -.priority = 0,
>> -.ingress = 1,
>> -.egress = 0
>> -};
>> -struct flow_patterns patterns = { .items = NULL, .cnt = 0 };
>> -struct flow_actions actions = { .actions = NULL, .cnt = 0 };
>> -struct rte_flow *flow;
>> -struct rte_flow_error error;
>>   uint8_t proto = 0;
>> -int ret = 0;
>> -struct flow_items {
>> -struct rte_flow_item_eth  eth;
>> -struct rte_flow_item_vlan vlan;
>> -struct rte_flow_item_ipv4 ipv4;
>> -union {
>> -struct rte_flow_item_tcp  tcp;
>> -struct rte_flow_item_udp  udp;
>> -struct rte_flow_item_sctp sctp;
>> -struct rte_flow_item_icmp icmp;
>> -};
>> -} spec, mask;
>> -
>> -memset(&spec, 0, sizeof spec);
>> -memset(&mask, 0, sizeof mask);
>> +
>> +memset(spec, 0, sizeof *spec);
>> +memset(mask, 0, sizeof *mask);
> I think that we need to clear those before calling parse_flow_match().
> Will look cleaner.
OK
>
>>   
>>   /* Eth */
>>   if (!eth_addr_is_zero(match->wc.masks.dl_src) ||
>>   !eth_addr_is_zero(match->wc.masks.dl_dst)) {
>> -memcpy(&spec.eth.dst, &match->flow.dl_dst, sizeof spec.eth.dst);
>> -memcpy(&spec.eth.src, &match->flow.dl_src, sizeof spec.eth.src);
>> -spec.eth.type = match->flow.dl_type;
>> +memcpy(&spec->eth.dst, &match->flow.dl_dst, sizeof spec->eth.dst);
>> +memcpy(&spec->eth.src, &match->flow.dl_src, sizeof spec->eth.src);
>> +spec->eth.type = match->flow.dl_type;
>>   
>> -memcpy(&mask.eth.dst, &match->wc.masks.dl_dst, sizeof mask.eth.dst);
>> -memcpy(&mask.eth.src, &match->wc.masks.dl_src, sizeof mask.eth.src);
>> -mask.eth.type = match->wc.masks.dl_type;
>> +memcpy(&mask->eth.dst, &match->wc.masks.dl_dst, sizeof 
>> mask->eth.dst);
>> +memcpy(&mask->eth.src, &match->wc.masks.dl_src, sizeof 
>> mask->eth.src);
>> +mask->eth.type = match->wc.masks.dl_type;
>>   
>> -add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_ETH,
>> - &spec.eth, &mask.eth);
>> +add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_ETH,
>> + &spec->eth, &mask->eth);
>>   } else {
>>   /*
>>* If user specifies a flow (like UDP flow) without L2 patterns,
>> @@ -466,41 +454,41 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
>>* NIC (such as XL710) doesn't support that. Below is a workaround,
>>* which simply matches any L2 pkts.
>>*/
>> -add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_ETH, NULL, NULL);
>> +add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_ETH, NULL, NULL);
>>   }
>>   
>>   /* VLAN */
>>   if (match->wc.masks.vlans[0].tci && match->flow.vlans[0].tci) {
>> -spec.vlan.tci  = match->flow.vlans[0].tci & ~htons(VLAN_CFI);
>> -mask.vlan.tci  = match->wc.masks.vlans[0].tci & ~htons(VLAN_CFI);
>> +spec->vlan.tci  = match->flow.vlans[0].tci & ~htons(VLAN_CFI);
>> +mask->vlan.tci  = match->wc.masks.vlans[0].tci & ~htons(VLAN_CFI);
>>   
>>   /* Match any protocols. */
>> -mask.vlan.inner_type = 0;
>> +mask->vlan.inner_type = 0;
>>   
>> -add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_VLAN,
>> -   

Re: [ovs-dev] [PATCH V3 01/19] netdev-offload-dpdk: Refactor flow patterns

2019-12-10 Thread Ilya Maximets
On 08.12.2019 14:22, Eli Britstein wrote:
> Refactor the flow patterns code to a helper function for better
> readability and towards supporting more matches.
> 
> Signed-off-by: Eli Britstein 
> Reviewed-by: Oz Shlomo 
> ---

Although I'm not quite sure that we need this change, a few comments
for the current patch inline.

>  lib/netdev-offload-dpdk.c | 208 
> +-
>  1 file changed, 112 insertions(+), 96 deletions(-)
> 
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index 96794dc4d..a008e642f 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -410,54 +410,42 @@ add_flow_rss_action(struct flow_actions *actions,
>  return rss_data;
>  }
>  
> +struct flow_items {
> +struct rte_flow_item_eth  eth;
> +struct rte_flow_item_vlan vlan;
> +struct rte_flow_item_ipv4 ipv4;
> +union {
> +struct rte_flow_item_tcp  tcp;
> +struct rte_flow_item_udp  udp;
> +struct rte_flow_item_sctp sctp;
> +struct rte_flow_item_icmp icmp;
> +};
> +};
> +
>  static int
> -netdev_offload_dpdk_add_flow(struct netdev *netdev,
> - const struct match *match,
> - struct nlattr *nl_actions OVS_UNUSED,
> - size_t actions_len OVS_UNUSED,
> - const ovs_u128 *ufid,
> - struct offload_info *info)
> +parse_flow_match(struct flow_patterns *patterns,
> + struct flow_items *spec,
> + struct flow_items *mask,
> + const struct match *match)
>  {
> -const struct rte_flow_attr flow_attr = {
> -.group = 0,
> -.priority = 0,
> -.ingress = 1,
> -.egress = 0
> -};
> -struct flow_patterns patterns = { .items = NULL, .cnt = 0 };
> -struct flow_actions actions = { .actions = NULL, .cnt = 0 };
> -struct rte_flow *flow;
> -struct rte_flow_error error;
>  uint8_t proto = 0;
> -int ret = 0;
> -struct flow_items {
> -struct rte_flow_item_eth  eth;
> -struct rte_flow_item_vlan vlan;
> -struct rte_flow_item_ipv4 ipv4;
> -union {
> -struct rte_flow_item_tcp  tcp;
> -struct rte_flow_item_udp  udp;
> -struct rte_flow_item_sctp sctp;
> -struct rte_flow_item_icmp icmp;
> -};
> -} spec, mask;
> -
> -memset(&spec, 0, sizeof spec);
> -memset(&mask, 0, sizeof mask);
> +
> +memset(spec, 0, sizeof *spec);
> +memset(mask, 0, sizeof *mask);

I think that we need to clear those before calling parse_flow_match().
Will look cleaner.

>  
>  /* Eth */
>  if (!eth_addr_is_zero(match->wc.masks.dl_src) ||
>  !eth_addr_is_zero(match->wc.masks.dl_dst)) {
> -memcpy(&spec.eth.dst, &match->flow.dl_dst, sizeof spec.eth.dst);
> -memcpy(&spec.eth.src, &match->flow.dl_src, sizeof spec.eth.src);
> -spec.eth.type = match->flow.dl_type;
> +memcpy(&spec->eth.dst, &match->flow.dl_dst, sizeof spec->eth.dst);
> +memcpy(&spec->eth.src, &match->flow.dl_src, sizeof spec->eth.src);
> +spec->eth.type = match->flow.dl_type;
>  
> -memcpy(&mask.eth.dst, &match->wc.masks.dl_dst, sizeof mask.eth.dst);
> -memcpy(&mask.eth.src, &match->wc.masks.dl_src, sizeof mask.eth.src);
> -mask.eth.type = match->wc.masks.dl_type;
> +memcpy(&mask->eth.dst, &match->wc.masks.dl_dst, sizeof 
> mask->eth.dst);
> +memcpy(&mask->eth.src, &match->wc.masks.dl_src, sizeof 
> mask->eth.src);
> +mask->eth.type = match->wc.masks.dl_type;
>  
> -add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_ETH,
> - &spec.eth, &mask.eth);
> +add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_ETH,
> + &spec->eth, &mask->eth);
>  } else {
>  /*
>   * If user specifies a flow (like UDP flow) without L2 patterns,
> @@ -466,41 +454,41 @@ netdev_offload_dpdk_add_flow(struct netdev *netdev,
>   * NIC (such as XL710) doesn't support that. Below is a workaround,
>   * which simply matches any L2 pkts.
>   */
> -add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_ETH, NULL, NULL);
> +add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_ETH, NULL, NULL);
>  }
>  
>  /* VLAN */
>  if (match->wc.masks.vlans[0].tci && match->flow.vlans[0].tci) {
> -spec.vlan.tci  = match->flow.vlans[0].tci & ~htons(VLAN_CFI);
> -mask.vlan.tci  = match->wc.masks.vlans[0].tci & ~htons(VLAN_CFI);
> +spec->vlan.tci  = match->flow.vlans[0].tci & ~htons(VLAN_CFI);
> +mask->vlan.tci  = match->wc.masks.vlans[0].tci & ~htons(VLAN_CFI);
>  
>  /* Match any protocols. */
> -mask.vlan.inner_type = 0;
> +mask->vlan.inner_type = 0;
>  
> -add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_VLAN,
> - &spec.vlan, &mask.vlan);
> +add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_VLAN,
> + &spec->vlan, &mask->vlan);
>  }
>  
>  /* IP v4 */
>  if (match->flow.dl_type == 

[ovs-dev] [PATCH] dpif-netdev: Hold global port mutex while calling offload API.

2019-12-10 Thread Ilya Maximets
We changed datapath port lookup to netdev-offload API usage, but
forgot that port mutex was there not only to protect datapath
port hash map.  It was there also as a workaround solution for
complete unsafety of netdev-offload-dpdk functions.

Turning it back to fix the behaviour and adding a comment to prevent
removing it in the future unless netdev-offload-dpdk fixed.

For the thread safety notice see the top of netdev-offload-dpdk.c.

Fixes: 30115809da2e ("dpif-netdev: Use netdev-offload API for port lookup while 
offloading")
Signed-off-by: Ilya Maximets 
---

Shame on me, I forgot about the documentation that I wrote 9 months ago.

 lib/dpif-netdev.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 7ebf4ef87..c57458b62 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2271,7 +2271,12 @@ mark_to_flow_disassociate(struct dp_netdev_pmd_thread 
*pmd,
 
 port = netdev_ports_get(in_port, pmd->dp->class);
 if (port) {
+/* Taking a global 'port_mutex' to fulfill thread safety
+ * restrictions for the netdev-offload-dpdk module. */
+ovs_mutex_lock(&pmd->dp->port_mutex);
 ret = netdev_flow_del(port, &flow->mega_ufid, NULL);
+ovs_mutex_unlock(&pmd->dp->port_mutex);
+
 netdev_close(port);
 }
 
@@ -2415,10 +2420,14 @@ dp_netdev_flow_offload_put(struct dp_flow_offload_item 
*offload)
 netdev_close(port);
 goto err_free;
 }
+/* Taking a global 'port_mutex' to fulfill thread safety restrictions for
+ * the netdev-offload-dpdk module. */
+ovs_mutex_lock(&pmd->dp->port_mutex);
 ret = netdev_flow_put(port, &offload->match,
   CONST_CAST(struct nlattr *, offload->actions),
   offload->actions_len, &flow->mega_ufid, &info,
   NULL);
+ovs_mutex_unlock(&pmd->dp->port_mutex);
 netdev_close(port);
 
 if (ret) {
-- 
2.17.1



Re: [ovs-dev] [PATCH V3 18/19] netdev-offload-dpdk: Support offload of clone tnl_push/output actions

2019-12-10 Thread Eli Britstein


On 12/10/2019 12:09 PM, Sathya Perla wrote:
> On Tue, Dec 10, 2019 at 1:21 PM Eli Britstein  wrote:
>>
>> On 12/10/2019 8:52 AM, Sathya Perla wrote:
>>> On Sun, Dec 8, 2019 at 6:53 PM Eli Britstein  wrote:
>>> ...
 +static int
 +parse_clone_actions(struct netdev *netdev,
 +struct flow_actions *actions,
 +const struct nlattr *clone_actions,
 +const size_t clone_actions_len,
 +struct offload_info *info)
 +{
 +const struct nlattr *ca;
 +unsigned int cleft;
 +
 +NL_ATTR_FOR_EACH_UNSAFE (ca, cleft, clone_actions, clone_actions_len) 
 {
 +int clone_type = nl_attr_type(ca);
 +
 +if (clone_type == OVS_ACTION_ATTR_TUNNEL_PUSH) {
 +const struct ovs_action_push_tnl *tnl_push = nl_attr_get(ca);
 +struct rte_flow_action_raw_encap *raw_encap =
 +xzalloc(sizeof *raw_encap);
 +
 +raw_encap->data = (uint8_t *)tnl_push->header;
 +raw_encap->preserve = NULL;
 +raw_encap->size = tnl_push->header_len;
 +
 +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RAW_ENCAP,
 +raw_encap);
>>> Hi, converting OVS_TUNNEL_PUSH into RTE_RAW_ENCAP loses the 'tnl_type'
>>> information provided by OVS. RAW_ENCAP provides the tunnel header just
>>> as a blob of bits which may not be ideal for NIC HW to offload.
>>>
>>> How about using tnl_push->tnl_type field to parse the header and
>>> compose specific tunnel encap actions like RTE_VXLAN_ENCAP,
>>> RTE_NVGRE_ENCAP etc.
>>> This would also be more in line with how it's done with TC offload.
>> I see your point. However, struct ovs_action_push_tnl has the "header"
>> field just as a raw binary buffer, and not detailed like in TC.
>> "tnl_type" has a comment /* For logging. */. It is indeed used for
>> logging, as in lib/odp-util.c, in function format_odp_tnl_push_header.
> This is not entirely true. Check out propagate_tunnel_data_to_flow()
> where tnl_type is being used
> to figure out nw_proto.
>
>> using VXLAN_ENCAP instead of RAW_ENCAP will make the push less generic,
>> as I will have to parse the header to build the vxlan_encap fields, just
>> so they can re-build as a raw header in the PMD level, so I don't see
>> the offload benefit of it.
> Why are you assuming that all PMDs will rebuild the tunnel header
> fields as a 'raw header'?
As far as I saw in DPDK code, if I'm not wrong, all PMDs that support it 
do it like this. Anyway, it is indeed not relevant.
> What if the NIC HW needs tunnel specific headers to be programmed
> separately for each tunnel type?
>
>> Another point is that this way it will support any header, not just
>> VXLAN (as an example).
> Given that VXLAN and NVGRE ENCAP and DECAP actions are already defined
> in rte_flow, how about
> we use them for vxlan and nvgre and use RAW for the rest of the tunnel types?
> We can support all tunnel headers even this way.

Again, I see your point.

However, at the OVS level, we have the encap as a raw buffer and DPDK 
supports it natively using RAW_ENCAP. For that reason I think we should 
use the straightforward method.

I think that if there is a PMD that requires the fields separately, it 
is under its responsibility to parse it, and not forcing the application 
to do it.

As it's a valid attribute in DPDK, and it's the natural way for OVS to 
use, I think we should use it. If it is for some reason not valid, in 
the future, DPDK will deprecate it.

>
> thanks!


Re: [ovs-dev] [PATCH V3 18/19] netdev-offload-dpdk: Support offload of clone tnl_push/output actions

2019-12-10 Thread Sathya Perla via dev
On Tue, Dec 10, 2019 at 1:21 PM Eli Britstein  wrote:
>
>
> On 12/10/2019 8:52 AM, Sathya Perla wrote:
> > On Sun, Dec 8, 2019 at 6:53 PM Eli Britstein  wrote:
> > ...
> >> +static int
> >> +parse_clone_actions(struct netdev *netdev,
> >> +struct flow_actions *actions,
> >> +const struct nlattr *clone_actions,
> >> +const size_t clone_actions_len,
> >> +struct offload_info *info)
> >> +{
> >> +const struct nlattr *ca;
> >> +unsigned int cleft;
> >> +
> >> +NL_ATTR_FOR_EACH_UNSAFE (ca, cleft, clone_actions, clone_actions_len) 
> >> {
> >> +int clone_type = nl_attr_type(ca);
> >> +
> >> +if (clone_type == OVS_ACTION_ATTR_TUNNEL_PUSH) {
> >> +const struct ovs_action_push_tnl *tnl_push = nl_attr_get(ca);
> >> +struct rte_flow_action_raw_encap *raw_encap =
> >> +xzalloc(sizeof *raw_encap);
> >> +
> >> +raw_encap->data = (uint8_t *)tnl_push->header;
> >> +raw_encap->preserve = NULL;
> >> +raw_encap->size = tnl_push->header_len;
> >> +
> >> +add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RAW_ENCAP,
> >> +raw_encap);
> > Hi, converting OVS_TUNNEL_PUSH into RTE_RAW_ENCAP loses the 'tnl_type'
> > information provided by OVS. RAW_ENCAP provides the tunnel header just
> > as a blob of bits which may not be ideal for NIC HW to offload.
> >
> > How about using tnl_push->tnl_type field to parse the header and
> > compose specific tunnel encap actions like RTE_VXLAN_ENCAP,
> > RTE_NVGRE_ENCAP etc.
> > This would also be more in line with how it's done with TC offload.
>
> I see your point. However, struct ovs_action_push_tnl has the "header"
> field just as a raw binary buffer, and not detailed like in TC.
> "tnl_type" has a comment /* For logging. */. It is indeed used for
> logging, as in lib/odp-util.c, in function format_odp_tnl_push_header.

This is not entirely true. Check out propagate_tunnel_data_to_flow()
where tnl_type is being used
to figure out nw_proto.

>
> using VXLAN_ENCAP instead of RAW_ENCAP will make the push less generic,
> as I will have to parse the header to build the vxlan_encap fields, just
> so they can re-build as a raw header in the PMD level, so I don't see
> the offload benefit of it.

Why are you assuming that all PMDs will rebuild the tunnel header
fields as a 'raw header'?
What if the NIC HW needs tunnel specific headers to be programmed
separately for each tunnel type?

>
> Another point is that this way it will support any header, not just
> VXLAN (as an example).

Given that VXLAN and NVGRE ENCAP and DECAP actions are already defined
in rte_flow, how about
we use them for vxlan and nvgre and use RAW for the rest of the tunnel types?
We can support all tunnel headers even this way.

thanks!


[ovs-dev] [PATCH v2] dpdk: Support running PMD threads on cores > RTE_MAX_LCORE.

2019-12-10 Thread David Marchand
Most DPDK components make the assumption that rte_lcore_id() returns
a valid lcore_id in [0..RTE_MAX_LCORE[ range (with the exception of
the LCORE_ID_ANY special value).

OVS does not currently check which value is set in
RTE_PER_LCORE(_lcore_id), which exposes us to potential crashes on the
DPDK side.

Introduce a lcore allocator in OVS for PMD threads and map them to
unused lcores from DPDK à la --lcores.

The physical cores on which the PMD threads are running still
constitute important information when debugging, so keep
those in the PMD thread names but add a new debug log when starting
them.
Mapping between OVS threads and DPDK lcores can be dumped with a new
dpdk/dump-lcores command.

Synchronize DPDK internals on numa and cpuset for the PMD threads by
registering them via the rte_thread_set_affinity() helper.
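
As an illustration of the rewiring (core numbers hypothetical): with
dpdk-lcore-mask=0x6, the construct_dpdk_lcore_option() helper below
would emit EAL arguments equivalent to:

    --lcores 0@1,1@2

i.e. DPDK lcore ids 0 and 1 mapped onto physical cores 1 and 2, leaving
the remaining lcore ids free to hand out to PMD threads regardless of
which physical core they run on.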

Signed-off-by: David Marchand 
---
Changelog since v1:
- rewired existing configuration 'dpdk-lcore-mask' to use --lcores,
- switched to a bitmap to track lcores,
- added a command to dump current mapping (Flavio): used an experimental
  API to get DPDK lcores cpuset since it is the most reliable/portable
  information,
- used the same code for the logs when starting DPDK/PMD threads,
- addressed Ilya comments,

---
 lib/dpdk-stub.c   |   8 ++-
 lib/dpdk.c| 163 --
 lib/dpdk.h|   5 +-
 lib/dpif-netdev.c |   3 +-
 4 files changed, 170 insertions(+), 9 deletions(-)

diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
index c332c217c..90473bc8e 100644
--- a/lib/dpdk-stub.c
+++ b/lib/dpdk-stub.c
@@ -39,7 +39,13 @@ dpdk_init(const struct smap *ovs_other_config)
 }
 
 void
-dpdk_set_lcore_id(unsigned cpu OVS_UNUSED)
+dpdk_init_thread_context(unsigned cpu OVS_UNUSED)
+{
+/* Nothing */
+}
+
+void
+dpdk_uninit_thread_context(void)
 {
 /* Nothing */
 }
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 37ea2973c..0173366a0 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -30,6 +30,7 @@
 #include 
 #endif
 
+#include "bitmap.h"
 #include "dirs.h"
 #include "fatal-signal.h"
 #include "netdev-dpdk.h"
@@ -39,6 +40,7 @@
 #include "ovs-numa.h"
 #include "smap.h"
 #include "svec.h"
+#include "unixctl.h"
 #include "util.h"
 #include "vswitch-idl.h"
 
@@ -54,6 +56,9 @@ static bool dpdk_initialized = false; /* Indicates successful 
initialization
* of DPDK. */
 static bool per_port_memory = false; /* Status of per port memory support */
 
+static struct ovs_mutex lcore_bitmap_mutex = OVS_MUTEX_INITIALIZER;
+static unsigned long *lcore_bitmap OVS_GUARDED_BY(lcore_bitmap_mutex);
+
 static int
 process_vhost_flags(char *flag, const char *default_val, int size,
 const struct smap *ovs_other_config,
@@ -94,6 +99,38 @@ args_contains(const struct svec *args, const char *value)
 return false;
 }
 
+static void
+construct_dpdk_lcore_option(const struct smap *ovs_other_config,
+struct svec *args)
+{
+const char *cmask = smap_get(ovs_other_config, "dpdk-lcore-mask");
+struct svec lcores = SVEC_EMPTY_INITIALIZER;
+struct ovs_numa_info_core *core;
+struct ovs_numa_dump *cores;
+int index = 0;
+
+if (!cmask) {
+return;
+}
+if (args_contains(args, "-c") || args_contains(args, "-l") ||
+args_contains(args, "--lcores")) {
+VLOG_WARN("Ignoring database defined option 'dpdk-lcore-mask' "
+  "due to dpdk-extra config");
+return;
+}
+
+cores = ovs_numa_dump_cores_with_cmask(cmask);
+FOR_EACH_CORE_ON_DUMP(core, cores) {
+svec_add_nocopy(&lcores, xasprintf("%d@%d", index, core->core_id));
+index++;
+}
+svec_terminate(&lcores);
+ovs_numa_dump_destroy(cores);
+svec_add(args, "--lcores");
+svec_add_nocopy(args, svec_join(&lcores, ",", ""));
+svec_destroy(&lcores);
+}
+
 static void
 construct_dpdk_options(const struct smap *ovs_other_config, struct svec *args)
 {
@@ -103,7 +140,6 @@ construct_dpdk_options(const struct smap *ovs_other_config, 
struct svec *args)
 bool default_enabled;
 const char *default_value;
 } opts[] = {
-{"dpdk-lcore-mask",   "-c", false, NULL},
 {"dpdk-hugepage-dir", "--huge-dir", false, NULL},
 {"dpdk-socket-limit", "--socket-limit", false, NULL},
 };
@@ -224,6 +260,7 @@ construct_dpdk_args(const struct smap *ovs_other_config, 
struct svec *args)
 svec_parse_words(args, extra_configuration);
 }
 
+construct_dpdk_lcore_option(ovs_other_config, args);
 construct_dpdk_options(ovs_other_config, args);
 construct_dpdk_mutex_options(ovs_other_config, args);
 }
@@ -264,6 +301,58 @@ static cookie_io_functions_t dpdk_log_func = {
 .write = dpdk_log_write,
 };
 
+static void
+dpdk_dump_lcore(struct ds *ds, unsigned lcore)
+{
+struct svec cores = SVEC_EMPTY_INITIALIZER;
+rte_cpuset_t cpuset;
+unsigned cpu;
+
+cpuset = rte_lcore_cpuset(lcore);
+for (cpu 

[ovs-dev] [PATCH v2] Encap & Decap actions for MPLS Packet Type

2019-12-10 Thread Martin Varghese
From: Martin Varghese 

The existing PUSH MPLS & POP MPLS actions insert & remove an MPLS header
between the Ethernet header and the IP header. Though this behaviour is fine
for L3 VPN, where an IP packet is encapsulated inside an MPLS tunnel, it
does not satisfy the L2 VPN requirements. In L2 VPN the Ethernet packets
must be encapsulated inside an MPLS tunnel.

In this change the encap & decap actions are extended to support the MPLS
packet type. The encap & decap actions add and remove an MPLS header at the
start of the packet as depicted below.

Encapsulation:

Actions - encap(mpls(ether_type=0x8847)),encap(ethernet)

Incoming packet -> | ETH | IP | Payload |

1 Actions -  encap(mpls(ether_type=0x8847)) [Kernel action -
ptap_push_mpls:0x8847]

Outgoing packet -> | MPLS | ETH | Payload |

2 Actions - encap(ethernet) [ Kernel action - push_eth ]

Outgoing packet -> | ETH | MPLS | ETH | Payload |

Decapsulation:

Incoming packet -> | ETH | MPLS | ETH | IP | Payload |

Actions - decap(),decap(packet_type(ns=0,type=0))

1 Actions -  decap() [Kernel action - pop_eth)

Outgoing packet -> | MPLS | ETH | IP | Payload |

2 Actions - decap(packet_type(ns=0,type=0)) [Kernel action -
ptap_pop_mpls:0]

Outgoing packet -> | ETH | IP | Payload |

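As an illustration (bridge and port numbers hypothetical), the
encapsulation side could be installed with a rule such as:

    ovs-ofctl add-flow br0 \
        "in_port=1,actions=encap(mpls(ether_type=0x8847)),encap(ethernet),output:2"

with the decapsulation side using decap(),decap(packet_type(ns=0,type=0))
as shown above.
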
Signed-off-by: Martin Varghese 
---
 datapath/actions.c| 56 ++
 datapath/flow_netlink.c   | 21 ++
 datapath/linux/compat/include/linux/openvswitch.h |  2 +
 include/openvswitch/ofp-ed-props.h| 18 +
 lib/dpif-netdev.c |  2 +
 lib/dpif.c|  2 +
 lib/odp-execute.c | 14 
 lib/odp-util.c| 49 ++--
 lib/ofp-actions.c |  5 ++
 lib/ofp-ed-props.c| 92 ++-
 lib/packets.c | 28 +++
 lib/packets.h |  3 +-
 ofproto/ofproto-dpif-ipfix.c  |  2 +
 ofproto/ofproto-dpif-sflow.c  |  2 +
 ofproto/ofproto-dpif-xlate.c  | 54 +
 tests/system-traffic.at   | 34 +
 16 files changed, 376 insertions(+), 8 deletions(-)

diff --git a/datapath/actions.c b/datapath/actions.c
index fbf4457..1d0d904 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -256,6 +256,54 @@ static int pop_mpls(struct sk_buff *skb, struct 
sw_flow_key *key,
return 0;
 }
 
+static int push_ptap_mpls(struct sk_buff *skb, struct sw_flow_key *key,
+const struct ovs_action_push_mpls *mpls)
+{
+
+struct mpls_shim_hdr *lse;
+int err;
+
+if (unlikely(!eth_p_mpls(mpls->mpls_ethertype)))
+return -EINVAL;
+
+/* Networking stack does not allow simultaneous Tunnel and MPLS GSO. */
+if (skb->encapsulation)
+return -EINVAL;
+
+err = skb_cow_head(skb, MPLS_HLEN);
+if (unlikely(err))
+return err;
+
+if (!skb->inner_protocol) {
+skb_set_inner_network_header(skb, skb->mac_len);
+skb_set_inner_protocol(skb, skb->protocol);
+}
+
+skb_push(skb, MPLS_HLEN);
+skb_reset_mac_header(skb);
+skb_reset_network_header(skb);
+
+lse = mpls_hdr(skb);
+lse->label_stack_entry = mpls->mpls_lse;
+skb_postpush_rcsum(skb, lse, MPLS_HLEN);
+skb->protocol = mpls->mpls_ethertype;
+
+invalidate_flow_key(key);
+return 0;
+}
+
+static int ptap_pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
+const __be16 ethertype)
+{
+if(!ethertype)
+key->mac_proto = MAC_PROTO_ETHERNET;
+
+pop_mpls(skb, key, ethertype);
+invalidate_flow_key(key);
+return 0;
+}
+
+
 static int set_mpls(struct sk_buff *skb, struct sw_flow_key *flow_key,
const __be32 *mpls_lse, const __be32 *mask)
 {
@@ -1313,10 +1361,18 @@ static int do_execute_actions(struct datapath *dp, 
struct sk_buff *skb,
err = push_mpls(skb, key, nla_data(a));
break;
 
+   case OVS_ACTION_ATTR_PTAP_PUSH_MPLS:
+err = push_ptap_mpls(skb, key, nla_data(a));
+break;
+
case OVS_ACTION_ATTR_POP_MPLS:
err = pop_mpls(skb, key, nla_get_be16(a));
break;
 
+case OVS_ACTION_ATTR_PTAP_POP_MPLS:
+err = ptap_pop_mpls(skb, key, nla_get_be16(a));
+break;
+
case OVS_ACTION_ATTR_PUSH_VLAN:
err = push_vlan(skb, key, nla_data(a));
break;
diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c
index 9fc1a19..46d77d2 100644