from:"Roni Bar Yanai"

Re: [ovs-dev] [PATCH v4 0/9] Add offload support for sFlow

2020-10-11 Thread Roni Bar Yanai

Hi Ilya, 
please see inline

>-Original Message-
>From: Chris Mi 
>Sent: Friday, September 25, 2020 3:40 PM
>To: Roni Bar Yanai ; i.maxim...@ovn.org;
>d...@openvswitch.org
>Cc: sriharsha.basavapa...@broadcom.com; Eli Britstein ;
>hemal.s...@broadcom.com; ian.sto...@intel.com; u9012...@gmail.com;
>simon.hor...@netronome.com
>Subject: Re: [ovs-dev] [PATCH v4 0/9] Add offload support for sFlow
>
>+ Roni
>
>On Thu, 2020-09-24 at 17:56 +0200, Ilya Maximets wrote:
>> On 9/24/20 12:24 PM, Chris Mi wrote:
>> > This patch set adds offload support for sFlow.
>> >
>> > Psample is a genetlink channel for packet sampling. TC action
>> > act_sample uses psample to send sampled packets to userspace.
>> >
>> > When offloading sample action to TC, userspace creates a unique ID
>> > to map sFlow action and tunnel info and passes this ID to kernel
>> > instead of the sFlow info. psample will send this ID and sampled
>> > packet to userspace. Using the ID, userspace can recover the sFlow
>> > info and send sampled packet to the right sFlow monitoring host.
>>
>> Hi.  Thanks for working on this.
>>
>> I read through the implementation really roughly and I have a few
>> questions about the feature in general.  And also a big concern about
>implementation.
>>
>> The main issue that I see is that current implementation is tightly
>> coupled with kernel datapath and doesn't consider userspace datapath
>> at all even in terms of prohibiting support of sampling for userspace 
>> datapath.
>> Also dpif_provider and netdev_offload API relies on psample as a base
>> primitive and this will not allow us to reuse this API once we will
>> have offload support with rte_flow in netdev-offload-dpdk or dummy.
>>
>> So, while designing the feature implementation following concepts
>> should be taken into consideration:
>> 1. netdev-offload-tc could be used by userspace datapath and it should be
>>possible to use it correctly (it might be possible at least with
>>skip-sw right now).
>> 2. High-level dpif API should not depend on the implementation and type of
>>the offload provider or datapath and should work correctly with all
>>combinations.  (we should think on how current sampling infrastructure
>>could work with rte_flow sampling implementation in the future).
>>

Right. The implementation should be agnostic to the data plane. I think
maybe having a dedicate sample up-call function pointer added to dpif-provider.
It will be called directly from data path (dpif-netlink in this case) and can 
be shared 
between Implementations. HW offload using TC in user space has an implementation
Issues (to my opinion). For example,  when offloading conntrack using TC the 
kernel 
CT is accessed while user space has its own implementation. 
Other issues are passing information in case offload is not complete, such as 
mark. 
Not clear if we can do that. I think maybe user space offload using TC should 
be limited 
to flows that are single end to end (packet is processed using a single rule).

>> This leads to some questions about the feature itself:
>>
>> How does HW deliver packets to kernel that these packets appears on
>> psample socket instead of regular upcall sockets?  Is it a separate HW
>> rx queue or packets has special marks?
>>
>> How will it work if we will have smaple() action in HW and AF_XDP
>> socket with generic XDP program assigned?  Will packet be delivered
>> via this AF_XDP socket to userspace or it will still be placed into
>> psample socket? (this depends on how HW marks these packets and where
>> it places them) Is it possible for XDP program to determine if this
>> packet sampled or it just missed HW flow?

XDP and TC currently are not in sync. While in SW, XDP will always happen first,
when using HW offload, the HW offload happens first, so TC and XDP are done in 
reverse way. Note that AF_XDP has no meta data right now, so there is no way to 
mark packets for user space (however, I think it might be in the future same as 
DPDK 
has meta data,  I think it is must). This conflict makes the behavior of 
sampled packet
in HW and XDP undefined (or implementation specific). Since XDP executed before 
skb creation, sampled packet would not be marked for XDP code and user space 
AF_XDP 
might get the sampled packet instead of the PSAMPLE channel without any mark.
(although this might be considered as bug, and defining the sampled packet will
 happen before XDP)

Maybe use cases should be completely separated. In case of AF_XDP, the AF_XDP
can be looked as a port representor. We offload rules using TC, expecting that
TC SW is not in the game. All misses in HW wi

Re: [ovs-dev] OvS-DPDK change interface MAC

2020-05-17 Thread Roni Bar Yanai

Hi Ravi,

The patch only configures internal ports of OVS, other 
ports are blocked (you can look at the code).
We currently looking for a solution for setting the MAC
 in case of switchmod when VF is in passthrough and VM is 
 untrusted.
why do you need to change the mac address on the fly?

>-Original Message-
>From: dev  On Behalf Of Ravi Kerur
>Sent: Wednesday, May 13, 2020 9:56 PM
>To: ovs-dev 
>Subject: [ovs-dev] OvS-DPDK change interface MAC
>
>Hello OvS-DPDK team,
>
>Is there a way to change interface mac address for DPDK interfaces?
>Interfaces are part of LACP bond.
>
>I tried following commands and they don't seem to work.
>
>ovs-vsctl set interface  other-
>config:hwaddr=\"00:11:11:11:11:01\"
>ovs-vsctl set interface  mac=\"00:00:00:01:01:01\"
>
>No error messages in vswitchd log. Logs shown below
>
>
>ovs-ofctl dump-ports-desc br-phy
>OFPST_PORT_DESC reply (xid=0x2):
> 1(dpdk-enp4s0f0): addr:a0:36:9f:5d:af:58
> config: 0
> state:  0
> current:10GB-FD AUTO_NEG
> speed: 1 Mbps now, 0 Mbps max
> 2(dpdk-ens11f0): addr:90:e2:ba:a0:e6:10
> config: 0
> state:  0
> current:10GB-FD AUTO_NEG
> speed: 1 Mbps now, 0 Mbps max
> LOCAL(br-phy): addr:a0:36:9f:5d:af:58
> config: 0
> state:  0
> current:10MB-FD COPPER
> speed: 10 Mbps now, 0 Mbps max
>
>/Execute command***/
>ovs-vsctl set interface dpdk-ens11f0
>other-config:hwaddr=\"00:11:11:11:11:01\"
>
>/***Nothing changed***/
>ovs-ofctl dump-ports-desc br-phy
>OFPST_PORT_DESC reply (xid=0x2):
> 1(dpdk-enp4s0f0): addr:a0:36:9f:5d:af:58
> config: 0
> state:  0
> current:10GB-FD AUTO_NEG
> speed: 1 Mbps now, 0 Mbps max
> 2(dpdk-ens11f0): addr:90:e2:ba:a0:e6:10
> config: 0
> state:  0
> current:10GB-FD AUTO_NEG
> speed: 1 Mbps now, 0 Mbps max
> LOCAL(br-phy): addr:a0:36:9f:5d:af:58
> config: 0
> state:  0
> current:10MB-FD COPPER
> speed: 10 Mbps now, 0 Mbps max
>
>Thanks,
>Ravi
>___
>dev mailing list
>d...@openvswitch.org
>https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.open
>vswitch.org%2Fmailman%2Flistinfo%2Fovs-
>devdata=02%7C01%7Croniba%40mellanox.com%7C17acfbf576d840852ac40
>8d7f76f4355%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C63724992956
>6354578sdata=UKJmh36BntaRjllz551sBv0ah6O4sF%2BO8d6IGtr3xwM%3D&
>amp;reserved=0
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH 1/1] vswitchd: Allow setting MAC on DPDK interfaces

2020-04-20 Thread Roni Bar Yanai

Hi Ben, Ilya

Going back to this thread. We've tried app-ctl approach and it fails on 
consistency
problem. Orchestrator can configure on full system init, but now executing 
local 
restart will lose the configuration (again for bifurcated driver it is not 
problem).

The requirement for setting VF mac through the representor, comes from different
use cases such as Open Stack and Kubernetes. VF is used with pass-through and
untrusted user VM/Pod/Container . The MAC address is set by the orchestrator
only. For Linux use case there is ip tool for doing that, and Linux takes care 
of the
consistency. For OVS-DPDK with port representor, there is no way to configure 
VF MAC
address (except of for bifurcated drivers).

The requirement is to enable orchestrator configuration of VF MAC address in 
DPDK,
when working with port representor.
The solution should handle the generic use case and not bifurcated drivers only.
The MAC should be kept and configured in case of OVS restart. 

Maybe we can go back to the first solution and open set MAC to port representors
only. We treat the solution as a generic solution and ignore bifurcated 
drivers. 
When user configure the representor MAC (which is a reflection of the VF), the 
VF MAC is configured. The MAC address is saved in the DB. In case of returning 
back
to kernel there is no problem because the DB MAC applies only for DPDK 
representor.
For bifurcated driver, user can cause in consistency, but this is a unique case 
and 
we cannot defend against misuse of bifurcated drivers.

Any thoughts?

>-Original Message-
>From: dev  On Behalf Of Roni Bar Yanai
>Sent: Sunday, January 26, 2020 9:47 AM
>To: Ben Pfaff 
>Cc: d...@openvswitch.org; Adrian Chiris ; Ilya Maximets
>; Ameer Mahagneh ; Eveline
>Raine 
>Subject: Re: [ovs-dev] [PATCH 1/1] vswitchd: Allow setting MAC on DPDK
>interfaces
>
>Thanks Ben.
>
>>-Original Message-
>>From: Ben Pfaff 
>>Sent: Thursday, January 23, 2020 11:33 PM
>>To: Roni Bar Yanai 
>>Cc: Ilya Maximets ; Ophir Munk
>>; Eveline Raine ;
>>d...@openvswitch.org; Moshe Levi ; Adrian Chiris
>>; Majd Dibbiny ; Ameer
>>Mahagneh 
>>Subject: Re: [PATCH 1/1] vswitchd: Allow setting MAC on DPDK interfaces
>>
>>The main problem is that the database is stateful.  I would not have
>>the same objection to an RPC to set an Ethernet address.  This could be
>>implemented via the ovs-appctl interface, if local-only is acceptable,
>>or via OpenFlow, if the controller needs to do it.
>>
>>On Wed, Jan 22, 2020 at 02:49:47PM +, Roni Bar Yanai wrote:
>>> Hi Ilya, Ben
>>>
>>> What is the plan to manage MAC addresses of DPDK ports with representors?
>>> The solution should be generic and support non bifurcated drivers.
>>> Currently we cannot integrate DPDK with port representors in
>>> platforms such as open stack.
>>>
>>> Thanks,
>>> Roni
>>> >-Original Message-
>>> >From: Roni Bar Yanai
>>> >Sent: Monday, January 13, 2020 1:24 PM
>>> >To: Ilya Maximets ; Ophir Munk
>>> >; Ben Pfaff 
>>> >Cc: Eveline Raine ; d...@openvswitch.org;
>>> >Moshe Levi ; Adrian Chiris
>>> >; Majd Dibbiny ; Ameer
>>> >Mahagneh 
>>> >Subject: RE: [PATCH 1/1] vswitchd: Allow setting MAC on DPDK
>>> >interfaces
>>> >
>>> >
>>> >
>>> >>-Original Message-
>>> >>From: Ilya Maximets 
>>> >>Sent: Thursday, January 9, 2020 3:28 PM
>>> >>To: Ophir Munk ; Ben Pfaff ;
>>> >>Ilya Maximets 
>>> >>Cc: Eveline Raine ; d...@openvswitch.org;
>>> >>Moshe Levi ; Adrian Chiris
>>> >>; Majd Dibbiny ; Roni Bar
>>> >>Yanai ; Ameer Mahagneh
>>
>>> >>Subject: Re: [PATCH 1/1] vswitchd: Allow setting MAC on DPDK
>>> >>interfaces
>>> >>
>>> >>On 08.01.2020 11:48, Ophir Munk wrote:
>>> >>> Hi,
>>> >>> There is an important use case for having OVS change MAC
>>> >>> addresses of dpdk
>>> >>interfaces.
>>> >>> OpenStack for example needs to update the MAC address of a VF
>>> >>> assigned to a
>>> >>VM, where the corresponding VF representor is owned by dpdk.
>>> >>> For some NIC vendors using "ifconfig" or "ip" commands - is not
>>> >>> an option (if the
>>> >>NIC is not bifurcated).
>>> >>> Therefore OpenStack should use the OVS API to set the MAC address
>>> >>> for dpdk
>>> >

Re: [ovs-dev] [PATCH] docs: Update conntrack established state description

2020-02-24 Thread Roni Bar Yanai




>-Original Message-
>From: Yi-Hung Wei 
>Sent: Saturday, February 22, 2020 12:51 AM
>To: d...@openvswitch.org; Roni Bar Yanai 
>Cc: Yi-Hung Wei 
>Subject: [PATCH] docs: Update conntrack established state description
>
>Patch a867c010ee91 ("conntrack: Fix conntrack new state") fixes the userspace
>conntrack behavior.  This patch updates the corresponding conntrack state
>description.
>
>Fixes: a867c010ee91 ("conntrack: Fix conntrack new state")
>Reported-by: Roni Bar Yanai 
>Signed-off-by: Yi-Hung Wei 
>---
>Please backports to branch 2.13.
>
>---
> lib/meta-flow.xml | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/lib/meta-flow.xml b/lib/meta-flow.xml index
>90b405c73750..2f9c5ee16342 100644
>--- a/lib/meta-flow.xml
>+++ b/lib/meta-flow.xml
>@@ -2566,8 +2566,8 @@ actions=clone(load:0->NXM_OF_IN_PORT[],output:123)
>
> est (0x02)
> 
>-  Part of an existing connection.  Set to 1 if this is a committed
>-  connection.
>+  Part of an existing connection.  Set to 1 if packets of a committed
>+  connection have been seen by conntrack from both directions.
> 
>
> rel (0x04)
>--
>2.7.4
Looks good to me.
Acked-by: Roni Bar Yanai 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] conntrack: Fix conntrack new state

2020-02-20 Thread Roni Bar Yanai

Hi Yi-Hung,

>From the man page, it looks like the +est is the result of committing action 
>and not related to the TCP state machine itself.
I'm aware that this is not how it works in kernel conntrack, if you want to 
align both , the description might need an update.

new (0x01)
 A new connection. Set to 1 if this is an uncommitted con‐
 nection.

  est (0x02)
 Part of an existing connection. Set to 1  if  this  is  a
 committed connection.

>-Original Message-
>From: dev  On Behalf Of Dumitru Ceara
>Sent: Tuesday, February 4, 2020 10:06 AM
>To: Yi-Hung Wei 
>Cc: ovs dev 
>Subject: Re: [ovs-dev] [PATCH] conntrack: Fix conntrack new state
>
>On 2/4/20 12:48 AM, Yi-Hung Wei wrote:
>> On Fri, Jan 31, 2020 at 2:07 AM Dumitru Ceara  wrote:
>>>
>>> On 1/29/20 7:58 PM, William Tu wrote:
 On Fri, Dec 20, 2019 at 01:16:39PM -0800, Ben Pfaff wrote:
> On Fri, Dec 20, 2019 at 09:51:08AM -0800, Yi-Hung Wei wrote:
>> In connection tracking system, a connection is established if we
>> see packets from both directions.  However, in userspace
>> datapath's conntrack, if we send a connection setup packet in one
>> direction twice, it will make the connection to be in established state.
>>
>> This patch fixes the aforementioned issue, and adds a system
>> traffic test for UDP and TCP traffic to avoid regression.
>>
>> Fixes: a489b16854b59 ("conntrack: New userspace connection
>> tracker.")
>> Signed-off-by: Yi-Hung Wei 
>> ---

 LGTM. I applied this to master, and branch 2.13.
 Thanks
 William

>>>
>>> Hi William, Yi-Hung,
>>>
>>> This patch breaks OVN switch load balancer system tests when running
>>> with the OVS userspace datapath. I only had a glance at the code and
>>> didn't dig into why exactly the tests fail yet but it's quite easy to
>>> reproduce:
>>>
>>> $ cd /tmp/
>>> $ git clone https://github.com/openvswitch/ovs
>>> $ cd /tmp/ovs
>>> $ git checkout a867c010ee9183885ee9d3eb76a0005c075c4d2e
>>> $ ./boot.sh && ./configure && make -j8 $ cd /tmp/ $ git clone
>>> https://github.com/ovn-org/ovn $ cd /tmp/ovn/ $ ./boot.sh &&
>>> ./configure --with-ovs-source=/tmp/ovs --with-ovs-build=/tmp/ovs &&
>>> make -j8 $ make -j8 check-system-userspace TESTSUITEFLAGS="-k ovnlb"
>>> [...]
>>> system-ovn
>>>
>>>   7: ovn -- load-balancing   FAILED
>>> (system-ovn.at:1121)
>>>   8: ovn -- load-balancing - IPv6FAILED
>>> (system-ovn.at:1268)
>>>   9: ovn -- load-balancing - same subnet.FAILED
>>> (system-ovn.at:1389)
>>>  10: ovn -- load-balancing - same subnet. - IPv6 FAILED
>>> (system-ovn.at:1498)
>>>  11: ovn -- load balancing in gateway router ok
>>>  12: ovn -- load balancing in gateway router - IPv6  ok
>>>  13: ovn -- multiple gateway routers, load-balancing ok
>>>  14: ovn -- multiple gateway routers, load-balancing - IPv6 ok
>>>  15: ovn -- load balancing in router with gateway router port ok
>>>  16: ovn -- load balancing in router with gateway router port - IPv6
>>> ok
>>>
>>> # Reverting commit a867c010ee9183885ee9d3eb76a0005c075c4d2e
>>>
>>> $ cd /tmp/ovs
>>> $ git checkout a867c010ee9183885ee9d3eb76a0005c075c4d2e~
>>> $ make -j8
>>> $ cd /tmp/ovn
>>> $ make -j8 check-system-userspace TESTSUITEFLAGS="-k ovnlb"
>>> [...]
>>> system-ovn
>>>
>>>   7: ovn -- load-balancing   ok
>>>   8: ovn -- load-balancing - IPv6ok
>>>   9: ovn -- load-balancing - same subnet.ok
>>>  10: ovn -- load-balancing - same subnet. - IPv6 ok
>>>  11: ovn -- load balancing in gateway router ok
>>>  12: ovn -- load balancing in gateway router - IPv6  ok
>>>  13: ovn -- multiple gateway routers, load-balancing ok
>>>  14: ovn -- multiple gateway routers, load-balancing - IPv6 ok
>>>  15: ovn -- load balancing in router with gateway router port ok
>>>  16: ovn -- load balancing in router with gateway router port - IPv6
>>> ok
>>>
>>> Thanks,
>>> Dumitru
>>
>>
>> Hi Dumitru,
>>
>> Thanks for reporting this issue. I am not familiar with OVN and the
>> broken system tests, but I will take a look.
>>
>> -Yi-Hung
>>
>
>Hi Yi-Hung,
>
>Please let me know if you need help. I can try to find a simpler way to 
>reproduce
>the issue.
>
>Regards,
>Dumitru
>
>___
>dev mailing list
>d...@openvswitch.org
>https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH 1/1] vswitchd: Allow setting MAC on DPDK interfaces

2020-01-25 Thread Roni Bar Yanai

Thanks Ben.

>-Original Message-
>From: Ben Pfaff 
>Sent: Thursday, January 23, 2020 11:33 PM
>To: Roni Bar Yanai 
>Cc: Ilya Maximets ; Ophir Munk
>; Eveline Raine ;
>d...@openvswitch.org; Moshe Levi ; Adrian Chiris
>; Majd Dibbiny ; Ameer
>Mahagneh 
>Subject: Re: [PATCH 1/1] vswitchd: Allow setting MAC on DPDK interfaces
>
>The main problem is that the database is stateful.  I would not have the same
>objection to an RPC to set an Ethernet address.  This could be implemented via 
>the
>ovs-appctl interface, if local-only is acceptable, or via OpenFlow, if the 
>controller
>needs to do it.
>
>On Wed, Jan 22, 2020 at 02:49:47PM +, Roni Bar Yanai wrote:
>> Hi Ilya, Ben
>>
>> What is the plan to manage MAC addresses of DPDK ports with representors?
>> The solution should be generic and support non bifurcated drivers.
>> Currently we cannot integrate DPDK with port representors in platforms
>> such as open stack.
>>
>> Thanks,
>> Roni
>> >-Original Message-
>> >From: Roni Bar Yanai
>> >Sent: Monday, January 13, 2020 1:24 PM
>> >To: Ilya Maximets ; Ophir Munk
>> >; Ben Pfaff 
>> >Cc: Eveline Raine ; d...@openvswitch.org; Moshe
>> >Levi ; Adrian Chiris ;
>> >Majd Dibbiny ; Ameer Mahagneh
>> >
>> >Subject: RE: [PATCH 1/1] vswitchd: Allow setting MAC on DPDK
>> >interfaces
>> >
>> >
>> >
>> >>-Original Message-----
>> >>From: Ilya Maximets 
>> >>Sent: Thursday, January 9, 2020 3:28 PM
>> >>To: Ophir Munk ; Ben Pfaff ; Ilya
>> >>Maximets 
>> >>Cc: Eveline Raine ; d...@openvswitch.org;
>> >>Moshe Levi ; Adrian Chiris
>> >>; Majd Dibbiny ; Roni Bar
>> >>Yanai ; Ameer Mahagneh
>
>> >>Subject: Re: [PATCH 1/1] vswitchd: Allow setting MAC on DPDK
>> >>interfaces
>> >>
>> >>On 08.01.2020 11:48, Ophir Munk wrote:
>> >>> Hi,
>> >>> There is an important use case for having OVS change MAC addresses
>> >>> of dpdk
>> >>interfaces.
>> >>> OpenStack for example needs to update the MAC address of a VF
>> >>> assigned to a
>> >>VM, where the corresponding VF representor is owned by dpdk.
>> >>> For some NIC vendors using "ifconfig" or "ip" commands - is not an
>> >>> option (if the
>> >>NIC is not bifurcated).
>> >>> Therefore OpenStack should use the OVS API to set the MAC address
>> >>> for dpdk
>> >>interfaces.
>> >>> Along with Ben's explanation it seems right to only allow "internal" or
>"dpdk"
>> >>port types to set the MAC.
>> >>
>> >>There are still same issues for DPDK ports.  Just because OVS
>> >>doesn't necessarily "owns" them.  In case of Mellanox NICs, ports
>> >>are still could be
>> >configured by 'ip'
>> >>tool due to bifurcated drivers.  And this produces the same confusion.
>> >>
>> >This makes it hard to integrate OVS-DPDK in cloud with port
>> >representor. There Is no way to configure MAC address for untrusted
>> >VM. If the driver is bifurcated there is some WA,  but this is not a generic
>solution and will not be used.
>> >Any thoughts on how we can configure MAC for untrusted VM on such
>> >topologies?
>> >>>
>> >>> Testing both patches [1] and [2] - passed successfully.
>> >>> Acked-by: Ophir Munk 
>> >>>
>> >>> I hope patches [1] and [2] can be merged to master.
>> >>>
>> >>> [1]
>> >>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2F
>> >>> pat
>> >>> c
>> >>>
>>
>>>hwork.ozlabs.org%2Fpatch%2F1186896%2Fdata=02%7C01%7Croniba%40
>> >m
>> >>ell
>> >>>
>>
>>>anox.com%7Ca56df03e685a4f6ca39708d79507b4bf%7Ca652971c7d2e4d9ba6a4d
>1
>> >4
>> >>9
>> >>>
>>
>>>256f461b%7C0%7C0%7C637141732649459544sdata=okjLnR63gplGcbFABh
>Y
>> >y
>> >>Kw
>> >>> m0TXgT6VyGvlLtpnmRB%2Fk%3Dreserved=0
>> >>> ("[ovs-dev,v2] netdev-dpdk: Add ability to set MAC address.")
>> >>>
>> >>> [2]
>> >>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2F
>> >>> pat
>> >>> c
>> >

Re: [ovs-dev] [PATCH 1/1] vswitchd: Allow setting MAC on DPDK interfaces

2020-01-22 Thread Roni Bar Yanai

Hi Ilya, Ben

What is the plan to manage MAC addresses of DPDK ports with representors?
The solution should be generic and support non bifurcated drivers.
Currently we cannot integrate DPDK with port representors in platforms
 such as open stack.

Thanks,
Roni
>-Original Message-
>From: Roni Bar Yanai
>Sent: Monday, January 13, 2020 1:24 PM
>To: Ilya Maximets ; Ophir Munk
>; Ben Pfaff 
>Cc: Eveline Raine ; d...@openvswitch.org; Moshe Levi
>; Adrian Chiris ; Majd Dibbiny
>; Ameer Mahagneh 
>Subject: RE: [PATCH 1/1] vswitchd: Allow setting MAC on DPDK interfaces
>
>
>
>>-Original Message-
>>From: Ilya Maximets 
>>Sent: Thursday, January 9, 2020 3:28 PM
>>To: Ophir Munk ; Ben Pfaff ; Ilya
>>Maximets 
>>Cc: Eveline Raine ; d...@openvswitch.org; Moshe
>>Levi ; Adrian Chiris ; Majd
>>Dibbiny ; Roni Bar Yanai ;
>>Ameer Mahagneh 
>>Subject: Re: [PATCH 1/1] vswitchd: Allow setting MAC on DPDK interfaces
>>
>>On 08.01.2020 11:48, Ophir Munk wrote:
>>> Hi,
>>> There is an important use case for having OVS change MAC addresses of
>>> dpdk
>>interfaces.
>>> OpenStack for example needs to update the MAC address of a VF
>>> assigned to a
>>VM, where the corresponding VF representor is owned by dpdk.
>>> For some NIC vendors using "ifconfig" or "ip" commands - is not an
>>> option (if the
>>NIC is not bifurcated).
>>> Therefore OpenStack should use the OVS API to set the MAC address for
>>> dpdk
>>interfaces.
>>> Along with Ben's explanation it seems right to only allow "internal" or 
>>> "dpdk"
>>port types to set the MAC.
>>
>>There are still same issues for DPDK ports.  Just because OVS doesn't
>>necessarily "owns" them.  In case of Mellanox NICs, ports are still could be
>configured by 'ip'
>>tool due to bifurcated drivers.  And this produces the same confusion.
>>
>This makes it hard to integrate OVS-DPDK in cloud with port representor. There 
>Is
>no way to configure MAC address for untrusted VM. If the driver is bifurcated
>there is some WA,  but this is not a generic solution and will not be used.
>Any thoughts on how we can configure MAC for untrusted VM on such
>topologies?
>>>
>>> Testing both patches [1] and [2] - passed successfully.
>>> Acked-by: Ophir Munk 
>>>
>>> I hope patches [1] and [2] can be merged to master.
>>>
>>> [1]
>>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpat
>>> c
>>>
>>hwork.ozlabs.org%2Fpatch%2F1186896%2Fdata=02%7C01%7Croniba%40
>m
>>ell
>>>
>>anox.com%7Ca56df03e685a4f6ca39708d79507b4bf%7Ca652971c7d2e4d9ba6a4d1
>4
>>9
>>>
>>256f461b%7C0%7C0%7C637141732649459544sdata=okjLnR63gplGcbFABhY
>y
>>Kw
>>> m0TXgT6VyGvlLtpnmRB%2Fk%3Dreserved=0
>>> ("[ovs-dev,v2] netdev-dpdk: Add ability to set MAC address.")
>>>
>>> [2]
>>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpat
>>> c
>>>
>>hwork.ozlabs.org%2Fpatch%2F1215075%2Fdata=02%7C01%7Croniba%40
>m
>>ell
>>>
>>anox.com%7Ca56df03e685a4f6ca39708d79507b4bf%7Ca652971c7d2e4d9ba6a4d1
>4
>>9
>>>
>>256f461b%7C0%7C0%7C637141732649459544sdata=YovBQCaUkKBstJVj9lq
>J8
>>t
>>> 5aWo8ZHBvZzPZpFCzbUFM%3Dreserved=0
>>> ("[ovs-dev,1/1] vswitchd: Allow setting MAC on DPDK interfaces")
>>>
>>>> -Original Message-
>>>> From: Ben Pfaff 
>>>> Sent: Tuesday, January 7, 2020 2:26 AM
>>>> To: Ilya Maximets 
>>>> Cc: Eveline Raine ; d...@openvswitch.org;
>>>> Moshe Levi ; Adrian Chiris
>>>> ; Ophir Munk ; Majd
>>>> Dibbiny
>>;
>>>> Roni Bar Yanai ; Ameer Mahagneh
>>>> 
>>>> Subject: Re: [PATCH 1/1] vswitchd: Allow setting MAC on DPDK
>>>> interfaces
>>>>
>>>> On Fri, Jan 03, 2020 at 03:56:59PM +0100, Ilya Maximets wrote:
>>>>> Ben, do you see any other drawbacks that we should handle if we'll
>>>>> allow changing MAC addresses for non-internal ports?  Or, maybe
>>>>> some issues with my logic?
>>>>
>>>> It can cause surprises for interactions with regular system tools.
>>>> Anyone who uses "ip" or "ifconfig" to change the MAC will find it
>>>> changed back later (perhaps not immediately).  And if you un-set it
>>>> in the database, OVS doesn't know what to change it back to.
>>>>
>>>> These drawbacks aren't there in the same way for devices that OVS "owns"
>>>> like internal devices or dpdk devices.  Well, I guess they are for
>>>> OVS internal devices to some extent, but for those OVS has some
>>>> responsibility to pick a reasonable MAC address to begin with.  If
>>>> OVS doesn't, then it causes confusion of its own through things like
>>>> having a machine's MAC address change if you create a bridge and
>>>> move a physical NIC onto it.  We had lots of experience with that early on 
>>>> with
>OVS.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCHv6] userspace: Add GTP-U support.

2020-01-21 Thread Roni Bar Yanai

Hi William,

Two comments, please inline.

>-Original Message-
>From: William Tu 
>Sent: Thursday, January 16, 2020 9:45 PM
>To: d...@openvswitch.org
>Cc: Roni Bar Yanai ; acon...@redhat.com;
>yangy...@inspur.com; fhal...@redhat.com; ashishvarma@gmail.com;
>b...@ovn.org
>Subject: [PATCHv6] userspace: Add GTP-U support.
>
>GTP, GPRS Tunneling Protocol, is a group of IP-based communications
>protocols used to carry general packet radio service (GPRS) within
>GSM, UMTS and LTE networks.  GTP protocol has two parts: Signalling
>(GTP-Control, GTP-C) and User data (GTP-User, GTP-U). GTP-C is used
>for setting up GTP-U protocol, which is an IP-in-UDP tunneling
>protocol. Usually GTP is used in connecting between base station for
>radio, Serving Gateway (S-GW), and PDN Gateway (P-GW).
>
>This patch implements GTP-U protocol for userspace datapath,
>supporting only required header fields and G-PDU message type.
>See spec in:
>https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftools.ietf.o
>rg%2Fhtml%2Fdraft-hmm-dmm-5g-uplane-analysis-
>00data=02%7C01%7Croniba%40mellanox.com%7Cf870ce16727e40fbf3cf08d
>79abcc4ef%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C6371480078905
>90483sdata=vU5nHP%2BTF1GqyHJfyfyYKhdla%2FT8BblYRCu2NUfxt7Q%3D
>reserved=0
>
>Signed-off-by: Yi Yang 
>Co-authored-by: Yi Yang 
>Signed-off-by: William Tu 
>---
>v5 -> v6:
>  - rebase to master
>  - travis:
>https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-
>ci.org%2Fwilliamtu%2Fovs-
>travis%2Fbuilds%2F638083655data=02%7C01%7Croniba%40mellanox.com%
>7Cf870ce16727e40fbf3cf08d79abcc4ef%7Ca652971c7d2e4d9ba6a4d149256f461b%7
>C0%7C0%7C637148007890590483sdata=lgdyvMiOvCjgEiLcFtqJvW2n2sm2clB
>aBqkwoWCzBbg%3Dreserved=0
>
>v4 -> v5:
>  - address Ben and Aaron comments
>1) flow_get_metadata, format_flow_tunnel
>2) use of ?: in MSVS
>3) tun_key_to_attr
>4) handling PT_GTPU_MSG packets
>5) make gtpu_flags and gtpu_msgtype read-only
>  - use be32_to_be64 and be64_to_be32
>  - make gtpu_flags as hexadicimal, gtpu_msgtype as decimal
>  - remove PT_GTPU_MSG
>  Address Roni's comments
>  - for non-GPDU msgtype, don't pop header
>  - add seq number flags parsing, push/pop header support
>  - refactor netdev_tnl_push_udp_header()
>
>v3 -> v4:
>  - applied Ben's doc revise
>  - increment FLOW_WC_SEQ to 42
>  - minor fixes
>  - travis:
>https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-
>ci.org%2Fwilliamtu%2Fovs-
>travis%2Fbuilds%2F621289678data=02%7C01%7Croniba%40mellanox.com%
>7Cf870ce16727e40fbf3cf08d79abcc4ef%7Ca652971c7d2e4d9ba6a4d149256f461b%7
>C0%7C0%7C637148007890590483sdata=6%2FPIBAFpCtpWXip89BU8B3R6S8B
>2CZFWRxB3rYNRp9Q%3Dreserved=0
>
>v2 -> v3:
>  - pick up the code from v2, rebase to master
>  - many fixes in code, docs, and add more tests
>  - travis:
>https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-
>ci.org%2Fwilliamtu%2Fovs-
>travis%2Fbuilds%2F619799361data=02%7C01%7Croniba%40mellanox.com%
>7Cf870ce16727e40fbf3cf08d79abcc4ef%7Ca652971c7d2e4d9ba6a4d149256f461b%7
>C0%7C0%7C637148007890590483sdata=ESUBrHITGL7Tn4PuyoCiWq0%2F%2
>FaxTDqIuckfw2ynrAyQ%3Dreserved=0
>
>v1 -> v2:
>  - Add new packet_type PT_GTPU_MSG to handle GTP-U signaling message,
>GTP-U signaling message should be steered into a control plane handler
>by explicit openflow entry in flow table.
>  - Fix gtpu_flags and gptu_msgtype set issue.
>  - Verify communication with kernel GTP implementation from Jiannan
>Ouyang.
>  - Fix unit test issue and make sure all the unit tests can pass.
>  - This patch series add GTP-U tunnel support in DPDK userspace,
>GTP-U is used for carrying user data within the GPRS core network and
>between the radio access network and the core network. The user data
>transported can be packets in any of IPv4, IPv6, or PPP formats.
>---
> Documentation/faq/configuration.rst   |  13 ++
> Documentation/faq/releases.rst|   1 +
> NEWS  |   3 +
> datapath/linux/compat/include/linux/openvswitch.h |   2 +
> include/openvswitch/flow.h|   4 +-
> include/openvswitch/match.h   |   6 +
> include/openvswitch/meta-flow.h   |  28 
> include/openvswitch/packets.h |   4 +-
> lib/dpif-netlink-rtnl.c   |   5 +
> lib/dpif-netlink.c|   5 +
> lib/flow.c|  28 ++--
> lib/flow.h|   2 +-
> lib/match.c

Re: [ovs-dev] [PATCH 1/1] vswitchd: Allow setting MAC on DPDK interfaces

2020-01-13 Thread Roni Bar Yanai




>-Original Message-
>From: Ilya Maximets 
>Sent: Thursday, January 9, 2020 3:28 PM
>To: Ophir Munk ; Ben Pfaff ; Ilya
>Maximets 
>Cc: Eveline Raine ; d...@openvswitch.org; Moshe Levi
>; Adrian Chiris ; Majd Dibbiny
>; Roni Bar Yanai ; Ameer
>Mahagneh 
>Subject: Re: [PATCH 1/1] vswitchd: Allow setting MAC on DPDK interfaces
>
>On 08.01.2020 11:48, Ophir Munk wrote:
>> Hi,
>> There is an important use case for having OVS change MAC addresses of dpdk
>interfaces.
>> OpenStack for example needs to update the MAC address of a VF assigned to a
>VM, where the corresponding VF representor is owned by dpdk.
>> For some NIC vendors using "ifconfig" or "ip" commands - is not an option 
>> (if the
>NIC is not bifurcated).
>> Therefore OpenStack should use the OVS API to set the MAC address for dpdk
>interfaces.
>> Along with Ben's explanation it seems right to only allow "internal" or 
>> "dpdk"
>port types to set the MAC.
>
>There are still same issues for DPDK ports.  Just because OVS doesn't 
>necessarily
>"owns" them.  In case of Mellanox NICs, ports are still could be configured by 
>'ip'
>tool due to bifurcated drivers.  And this produces the same confusion.
>
This makes it hard to integrate OVS-DPDK in cloud with port representor. There  
Is no way to configure MAC address for untrusted VM. If the driver is 
bifurcated 
there is some WA,  but this is not a generic solution and will not be used.
Any thoughts on how we can configure MAC for untrusted VM on such topologies?
>>
>> Testing both patches [1] and [2] - passed successfully.
>> Acked-by: Ophir Munk 
>>
>> I hope patches [1] and [2] can be merged to master.
>>
>> [1]
>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatc
>>
>hwork.ozlabs.org%2Fpatch%2F1186896%2Fdata=02%7C01%7Croniba%40m
>ell
>>
>anox.com%7Ca56df03e685a4f6ca39708d79507b4bf%7Ca652971c7d2e4d9ba6a4d14
>9
>>
>256f461b%7C0%7C0%7C637141732649459544sdata=okjLnR63gplGcbFABhYy
>Kw
>> m0TXgT6VyGvlLtpnmRB%2Fk%3Dreserved=0
>> ("[ovs-dev,v2] netdev-dpdk: Add ability to set MAC address.")
>>
>> [2]
>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatc
>>
>hwork.ozlabs.org%2Fpatch%2F1215075%2Fdata=02%7C01%7Croniba%40m
>ell
>>
>anox.com%7Ca56df03e685a4f6ca39708d79507b4bf%7Ca652971c7d2e4d9ba6a4d14
>9
>>
>256f461b%7C0%7C0%7C637141732649459544sdata=YovBQCaUkKBstJVj9lqJ8
>t
>> 5aWo8ZHBvZzPZpFCzbUFM%3Dreserved=0
>> ("[ovs-dev,1/1] vswitchd: Allow setting MAC on DPDK interfaces")
>>
>>> -Original Message-
>>> From: Ben Pfaff 
>>> Sent: Tuesday, January 7, 2020 2:26 AM
>>> To: Ilya Maximets 
>>> Cc: Eveline Raine ; d...@openvswitch.org; Moshe
>>> Levi ; Adrian Chiris ;
>>> Ophir Munk ; Majd Dibbiny
>;
>>> Roni Bar Yanai ; Ameer Mahagneh
>>> 
>>> Subject: Re: [PATCH 1/1] vswitchd: Allow setting MAC on DPDK
>>> interfaces
>>>
>>> On Fri, Jan 03, 2020 at 03:56:59PM +0100, Ilya Maximets wrote:
>>>> Ben, do you see any other drawbacks that we should handle if we'll
>>>> allow changing MAC addresses for non-internal ports?  Or, maybe some
>>>> issues with my logic?
>>>
>>> It can cause surprises for interactions with regular system tools.
>>> Anyone who uses "ip" or "ifconfig" to change the MAC will find it
>>> changed back later (perhaps not immediately).  And if you un-set it
>>> in the database, OVS doesn't know what to change it back to.
>>>
>>> These drawbacks aren't there in the same way for devices that OVS "owns"
>>> like internal devices or dpdk devices.  Well, I guess they are for
>>> OVS internal devices to some extent, but for those OVS has some
>>> responsibility to pick a reasonable MAC address to begin with.  If
>>> OVS doesn't, then it causes confusion of its own through things like
>>> having a machine's MAC address change if you create a bridge and move
>>> a physical NIC onto it.  We had lots of experience with that early on with 
>>> OVS.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [RFC] odp-execute: dp_hash HW offload inconsistency.

2020-01-08 Thread Roni Bar Yanai

Hi Ilya,

I have a setup with vxlan,  and then I do dp_hash (after pop).
First packet does pop in SW, so RSS is invalidated (and anyway it was probably
 calculated on outer),  dp_hash is calculated again SW.
Following packet will do decap in HW so packet has a valid internal RSS 
(mark+rss),
And it is used as the hash. I have inconsistency between the packets that passed
 through SW and packet that were offloaded with the same 5-tuple (internal).

I know vxlan pop is not offload yet, but we hope it will soon.
I thought of solving this issue by calling netdev_offload_dpdk layer to calc 
the hash in case
when HW offload Is enabled. So it will be the same hash for both cases.
dp_hash HW offload without a tunnel, will have the same issue, as HW will use 
some hash not
 necessarily the same as RSS configuration, and even if RSS is used, in SW 
there is another
salt on top of that, so packet with same 5-tuple offload vs not offloaded will 
get different hash.

diff --git a/lib/odp-execute.c b/lib/odp-execute.c
index 563ad1d..26dc907 100644
--- a/lib/odp-execute.c
+++ b/lib/odp-execute.c
@@ -822,14 +822,18 @@ odp_execute_actions(void *dp, struct dp_packet_batch 
*batch, bool steal,
 DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
 /* RSS hash can be used here instead of 5tuple for
  * performance reasons. */
-if (dp_packet_rss_valid(packet)) {
-hash = dp_packet_get_rss_hash(packet);
-hash = hash_int(hash, hash_act->hash_basis);
+if (netdev_is_flow_api_enabled()) {
+hash = netdev_offload_dpdk_set_dp_hash(packet);
 } else {
-flow_extract(packet, );
-hash = flow_hash_5tuple(, hash_act->hash_basis);
-}
-packet->md.dp_hash = hash;
+
+if (dp_packet_rss_valid(packet)) {
+hash = dp_packet_get_rss_hash(packet);
+hash = hash_int(hash, hash_act->hash_basis);
+} else {
+flow_extract(packet, );
+hash = flow_hash_5tuple(, 
hash_act->hash_basis);
+}
+packet->md.dp_hash = hash;
 }
 break;
 }

What do you think?

BR,
Roni

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCHv4] userspace: Add GTP-U support.

2020-01-06 Thread Roni Bar Yanai

Hi William,
See inline

>-Original Message-
>From: William Tu 
>Sent: Thursday, December 12, 2019 7:25 PM
>To: Roni Bar Yanai 
>Cc: d...@openvswitch.org
>Subject: Re: [ovs-dev] [PATCHv4] userspace: Add GTP-U support.
>
>On Thu, Dec 12, 2019 at 07:28:27AM +, Roni Bar Yanai wrote:
>> Hi William,
>>
>> I think the question here is what is the main target functionality?
>> 3-5/G VNF switching?
>
>I actually don't have a target use case. The original patch was from Facebook 
>and
>Intel, see:
>https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.linuxf
>oundation.org%2Fpipermail%2Fovs-dev%2F2018-
>May%2F346957.htmldata=02%7C01%7Croniba%40mellanox.com%7C5ea521
>3063934e92f8c808d77f283106%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0
>%7C637117682910041411sdata=QyL1wgHhtLZfIe%2Bc%2B7F6eYM6R7E1%2F
>qc5%2FpGH2U4M3VY%3Dreserved=0
>https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.linuxf
>oundation.org%2Fpipermail%2Fovs-dev%2F2018-
>May%2F347029.htmldata=02%7C01%7Croniba%40mellanox.com%7C5ea521
>3063934e92f8c808d77f283106%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0
>%7C637117682910041411sdata=%2FKf47zwbTz5Rrf8u6SMKIjhd8hHKetnGS6
>7%2FHy5X%2FVw%3Dreserved=0
>
>Like other OVS tunnel supports, the patch allows OVS to
>- create GTP-U tunnel dev
>- decap/encap GTP-U
>- match on flags and msgtype
>
>I don't have experience in deploying GTP-U, but I'd love to know/learn from 
>your
>use cases.
I can think of the following:
*  4/5G VNF, In this case you might have
  - VNF get the packet with the GTP, no decap then TX with encap of GTP
  - Termination point, decap on RX and might also do NAT, On TX UN-NAT and 
encap
  - VNF that process control, doing match on msg type
* OVS as an implementation of UPF data (just a crazy idea):
   - UPF configures through N4 with matches that can have combination of 
teid (with ips)
 and  internal 5-tuple (or part of it). Actions of UPF can include 
decap. NAT and QoS.
 This can be translated to open-flow rules easily.
 However, unmatched packets should go to some VNF that have access to 
the SMF, and
 as a result will create new open-flow rules.
>
>> Note that in many cases there is RSS issue here because GTP between
>> two  functions will have the exact same outer header, there is no
>> port entropy. All the packets between two peers will get to the same
>> core (per direction, according to RSS symmetric cfg).
>
>I see. This looks like a performance issue.
>Maybe we can improve it later.
>
>>
>> See inline.
>>
>> >-Original Message-
>> >From: William Tu 
>> >Sent: Wednesday, December 11, 2019 8:20 PM
>> >To: Roni Bar Yanai 
>> >Cc: d...@openvswitch.org
>> >Subject: Re: [ovs-dev] [PATCHv4] userspace: Add GTP-U support.
>> >
>> >On Sun, Dec 08, 2019 at 08:11:19AM +, Roni Bar Yanai wrote:
>> >> Hi William,
>> >>
>> >> GTP-U header size is not constant, you *must* take into account the
>> >> flags, mainly  the sequence. The use of sequence in GTP-U is
>> >> optional but some devices do use  it.  see from 3GPP definition:
>> >> "For PGW, SGW and eNodeB the usage of sequence numbers in G-PDUs is
>> >> optional, but  if GTP-U protocol entities in these nodes are
>> >> relaying G-PDUs to other nodes, they  shall relay the sequence numbers as
>well.
>> >> An RNC, SGSN or GGSN shall reorder out  of sequence T-PDUs when in
>> >sequence delivery is required. This is optional"
>> >>
>> >> The assumption that GTP-U is only data is not correct, you have
>> >> some signaling for example END MARKER (like you wrote). This
>> >> message is sent when there is a mobility  between cells, and the
>> >> message contains a TEID and indicates that last packet sent from
>> >> the origin cell, to prevent packet re-order as handover might have
>> >> packets on the fly. So I would expected that END MARKER will do the
>> >> exact same forwarding as the
>> >GTP-U data.
>> >>
>> >> see inline
>> >>
>> >> >diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
>> >> >index
>> >> >a78972888e33..7ae554c87964 100644
>> >> >--- a/lib/netdev-native-tnl.c
>> >> >+++ b/lib/netdev-native-tnl.c
>> >> >@@ -55,6 +55,9 @@ static struct vlog_rate_limit err_rl =
>> >> >VLOG_RATE_LIMIT_INIT(60, 5);
>> >> > #define GENEVE_BASE_HLEN   (sizeof(struct udp_header) + \
>> &

Re: [ovs-dev] [PATCHv4] userspace: Add GTP-U support.

2019-12-11 Thread Roni Bar Yanai

Hi William,

I think the question here is what is the main target functionality?
3-5/G VNF switching?
Note that in many cases there is RSS issue here because GTP between
 two  functions will have the exact same outer header, there is no
 port entropy. All the packets between two peers will get to the same
 core (per direction, according to RSS symmetric cfg).

See inline.

>-Original Message-
>From: William Tu 
>Sent: Wednesday, December 11, 2019 8:20 PM
>To: Roni Bar Yanai 
>Cc: d...@openvswitch.org
>Subject: Re: [ovs-dev] [PATCHv4] userspace: Add GTP-U support.
>
>On Sun, Dec 08, 2019 at 08:11:19AM +, Roni Bar Yanai wrote:
>> Hi William,
>>
>> GTP-U header size is not constant, you *must* take into account the
>> flags, mainly  the sequence. The use of sequence in GTP-U is optional
>> but some devices do use  it.  see from 3GPP definition:
>> "For PGW, SGW and eNodeB the usage of sequence numbers in G-PDUs is
>> optional, but  if GTP-U protocol entities in these nodes are relaying
>> G-PDUs to other nodes, they  shall relay the sequence numbers as well.
>> An RNC, SGSN or GGSN shall reorder out  of sequence T-PDUs when in
>sequence delivery is required. This is optional"
>>
>> The assumption that GTP-U is only data is not correct, you have some
>> signaling for example END MARKER (like you wrote). This message is
>> sent when there is a mobility  between cells, and the message contains
>> a TEID and indicates that last packet sent from the origin cell, to
>> prevent packet re-order as handover might have packets on the fly. So
>> I would expected that END MARKER will do the exact same forwarding as the
>GTP-U data.
>>
>> see inline
>>
>> >diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c index
>> >a78972888e33..7ae554c87964 100644
>> >--- a/lib/netdev-native-tnl.c
>> >+++ b/lib/netdev-native-tnl.c
>> >@@ -55,6 +55,9 @@ static struct vlog_rate_limit err_rl =
>> >VLOG_RATE_LIMIT_INIT(60, 5);
>> > #define GENEVE_BASE_HLEN   (sizeof(struct udp_header) + \
>> > sizeof(struct genevehdr))
>> >
>> >+#define GTPU_HLEN   (sizeof(struct udp_header) +\
>> >+ sizeof(struct gtpuhdr))
>> This is the base header length, if S or N-PDU is on you have to add
>> 4-bytes
>> >+
>> > uint16_t tnl_udp_port_min = 32768;
>> > uint16_t tnl_udp_port_max = 61000;
>> >
>> >@@ -708,6 +711,88 @@ netdev_erspan_build_header(const struct netdev
>> >*netdev,  }
>> >
>> > struct dp_packet *
>> >+netdev_gtpu_pop_header(struct dp_packet *packet) {
>> >+struct pkt_metadata *md = >md;
>> >+struct flow_tnl *tnl = >tunnel;
>> >+struct gtpuhdr *gtph;
>> >+unsigned int hlen;
>> >+
>> >+ovs_assert(packet->l3_ofs > 0);
>> >+ovs_assert(packet->l4_ofs > 0);
>> >+
>> >+pkt_metadata_init_tnl(md);
>> >+if (GTPU_HLEN > dp_packet_l4_size(packet)) {
>> >+goto err;
>> >+}
>> >+
>> >+gtph = udp_extract_tnl_md(packet, tnl, );
>> >+if (!gtph) {
>> >+goto err;
>> >+}
>> >+
>> >+if (gtph->md.flags != GTPU_FLAGS_DEFAULT) {
>> >+VLOG_WARN_RL(_rl, "GTP-U not supported");
>> >+goto err;
>> >+}
>> >+
>> >+tnl->gtpu_flags = gtph->md.flags;
>> >+tnl->gtpu_msgtype = gtph->md.msgtype;
>> >+tnl->tun_id = htonll(ntohl(get_16aligned_be32(>teid)));
>> >+
>> >+if (tnl->gtpu_msgtype == GTPU_MSGTYPE_GPDU) {
>> >+struct ip_header *ip;
>> >+
>> >+ip = (struct ip_header *)(gtph + 1);
>> No always correct, should be additional 4 bytes on sequence, n-pdu
>> >+if (IP_VER(ip->ip_ihl_ver) == 4) {
>> >+packet->packet_type = htonl(PT_IPV4);
>> >+} else if (IP_VER(ip->ip_ihl_ver) == 6) {
>> >+packet->packet_type = htonl(PT_IPV6);
>> >+} else {
>> >+VLOG_WARN_RL(_rl, "GTP-U: Receive non-IP packet");
>> >+}
>> >+} else {
>> >+/* non GPDU GTP-U messages, ex: echo request, end marker. */
>> >+packet->packet_type = htonl(PT_GTPU_MSG);
>> >+}
>> >+
>> >+dp_packet_reset_packet(packet, hlen + GTPU_HLEN);
>> Why do you want to reset on control such as end marker?
>
&

Re: [ovs-dev] [PATCHv4] userspace: Add GTP-U support.

2019-12-08 Thread Roni Bar Yanai

Hi William,

GTP-U header size is not constant, you *must* take into account the flags, 
mainly
 the sequence. The use of sequence in GTP-U is optional but some devices do use
 it.  see from 3GPP definition:
"For PGW, SGW and eNodeB the usage of sequence numbers in G-PDUs is optional, 
but
 if GTP-U protocol entities in these nodes are relaying G-PDUs to other nodes, 
they 
 shall relay the sequence numbers as well. An RNC, SGSN or GGSN shall reorder 
out 
 of sequence T-PDUs when in sequence delivery is required. This is optional"

The assumption that GTP-U is only data is not correct, you have some signaling
for example END MARKER (like you wrote). This message is sent when there is a 
mobility
 between cells, and the message contains a TEID and indicates that last packet 
sent from 
the origin cell, to prevent packet re-order as handover might have packets
on the fly. So I would expected that END MARKER will do the exact same 
forwarding as the GTP-U data.

see inline

>-Original Message-
>From: dev  On Behalf Of William Tu
>Sent: Thursday, December 5, 2019 10:44 PM
>To: d...@openvswitch.org
>Subject: [ovs-dev] [PATCHv4] userspace: Add GTP-U support.
>
>GTP, GPRS Tunneling Protocol, is a group of IP-based communications
>protocols used to carry general packet radio service (GPRS) within
>GSM, UMTS and LTE networks.  GTP protocol has two parts: Signalling
>(GTP-Control, GTP-C) and User data (GTP-User, GTP-U). GTP-C is used
>for setting up GTP-U protocol, which is an IP-in-UDP tunneling
>protocol. Usually GTP is used in connecting between base station for
>radio, Serving Gateway (S-GW), and PDN Gateway (P-GW).
>
>This patch implements GTP-U protocol for userspace datapath,
>supporting only required header fields and G-PDU message type.
>See spec in:
>https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftools.ietf.o
>rg%2Fhtml%2Fdraft-hmm-dmm-5g-uplane-analysis-
>00data=02%7C01%7Croniba%40mellanox.com%7Cb73529a0bea34637f9ed0
>8d779c3f1e4%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C63711175479
>1292921sdata=tY0GruNTBV9NWZg9gO2Lo4PWQZ1swHkB1AwdEg3AJUE%3
>Dreserved=0
>
>Signed-off-by: Yi Yang 
>Co-authored-by: Yi Yang 
>Signed-off-by: William Tu 
>---
>v3 -> v4:
>  - applied Ben's doc revise
>  - increment FLOW_WC_SEQ to 42
>  - minor fixes
>  - travis:
>https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-
>ci.org%2Fwilliamtu%2Fovs-
>travis%2Fbuilds%2F621289678data=02%7C01%7Croniba%40mellanox.com%
>7Cb73529a0bea34637f9ed08d779c3f1e4%7Ca652971c7d2e4d9ba6a4d149256f461b
>%7C0%7C0%7C637111754791292921sdata=eYRkIjcPUQqJwBQndsjCgtcPGG1
>l4QFUnAWW4vwPjpI%3Dreserved=0
>
>v2 -> v3:
>  - pick up the code from v2, rebase to master
>  - many fixes in code, docs, and add more tests
>  - travis:
>https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-
>ci.org%2Fwilliamtu%2Fovs-
>travis%2Fbuilds%2F619799361data=02%7C01%7Croniba%40mellanox.com%
>7Cb73529a0bea34637f9ed08d779c3f1e4%7Ca652971c7d2e4d9ba6a4d149256f461b
>%7C0%7C0%7C637111754791292921sdata=cu5bdmJ4ydInKt4chxOcGy33K5E
>9BMDzxgk5%2F7Cq%2FtI%3Dreserved=0
>
>v1 -> v2:
>  - Add new packet_type PT_GTPU_MSG to handle GTP-U signaling message,
>GTP-U signaling message should be steered into a control plane handler
>by explicit openflow entry in flow table.
>  - Fix gtpu_flags and gptu_msgtype set issue.
>  - Verify communication with kernel GTP implementation from Jiannan
>Ouyang.
>  - Fix unit test issue and make sure all the unit tests can pass.
>  - This patch series add GTP-U tunnel support in DPDK userspace,
>GTP-U is used for carrying user data within the GPRS core network and
>between the radio access network and the core network. The user data
>transported can be packets in any of IPv4, IPv6, or PPP formats.
>---
> Documentation/faq/configuration.rst   |  13 +++
> Documentation/faq/releases.rst|   1 +
> NEWS  |   3 +
> datapath/linux/compat/include/linux/openvswitch.h |   2 +
> include/openvswitch/flow.h|   4 +-
> include/openvswitch/match.h   |   6 ++
> include/openvswitch/meta-flow.h   |  28 ++
> include/openvswitch/packets.h |   4 +-
> lib/dpif-netlink-rtnl.c   |   5 +
> lib/dpif-netlink.c|   5 +
> lib/flow.c|  22 +++--
> lib/flow.h|   2 +-
> lib/match.c   |  30 +-
> lib/meta-flow.c   |  38 
> lib/meta-flow.xml |  79 ++-
> lib/netdev-native-tnl.c   |  85 
> lib/netdev-native-tnl.h   |   8 ++
> lib/netdev-vport.c|  25 -
> lib/nx-match.c

[ovs-dev] [RFC v2] netdev-dpdk: setting VF MAC address

2019-10-30 Thread Roni Bar Yanai



On cloud topology with hardware offload, when SR-IOV is used there is an
architecture limitation to configure the VF from the host. The port representor
is attached to OVS, while the VF is usually configured in pass through to the
VM. In such topology the MAC address of the VF is configured by the infra
 (OpenStack for example), as VM is usually untrusted.

For OVS-kernel this can be done by ip link command line. For OVS-DPDK and in
fact for any DPDK application that doesn't use bifurcated driver, the only
entity that can configure the VF MAC address is the PF owner.

Since the DPDK ports management is controlled by the application (OVS here),
such configuration request should be intercepted by OVS.
A more complex approach would be to have an external tool, provided by DPDK,
which would take care of avoiding any conflict or race condition with the
application threads. It is obviously easier to follow the current DPDK model
and keep full control in OVS.

A new command line for OVS will be added:

ovs-vsctl set interface  mac=XX:XX:XX:XX:XX:XX

The OVS will use the rte_eth_dev_default_mac_addr_set api that should be
extended to configure the VF mac address in the name of the representor.

see link: https://www.mail-archive.com/dev@dpdk.org/msg140671.html

It seems like this local approach requires minor changes in OVS netdev-dpdk,
and the benefit will be substantial mainly OVS-DPDK integration with  OpenStack
and other controllers.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH ovs v3 0/2] Introduce dpdkvdpa netdev

2019-10-28 Thread Roni Bar Yanai

Hi ilya, 
please see inline

>-Original Message-
>From: Ilya Maximets 
>Sent: Monday, October 28, 2019 3:46 PM
>To: Noa Levy ; ovs-dev@openvswitch.org; Roni Bar Yanai
>
>Cc: Oz Shlomo ; Majd Dibbiny ;
>Ameer Mahagneh ; Eli Britstein
>; William Tu ; Simon Horman
>
>Subject: Re: [ovs-dev] [PATCH ovs v3 0/2] Introduce dpdkvdpa netdev
>
>On 17.10.2019 13:16, Noa Ezra wrote:
>> There are two approaches to communicate with a guest, using virtIO or
>> SR-IOV.
>> SR-IOV allows working with port representor which is attached to the
>> OVS and a matching VF is given with pass-through to the VM.
>> HW rules can process packets from up-link and direct them to the VF
>> without going through SW (OVS) and therefore SR-IOV gives the best
>> performance.
>> However, SR-IOV architecture requires that the guest will use a driver
>> which is specific to the underlying HW. Specific HW driver has two
>> main
>> drawbacks:
>> 1. Breaks virtualization in some sense (VM aware of the HW), can also
>> limit the type of images supported.
>> 2. Less natural support for live migration.
>>
>> Using virtIO interface solves both problems, but reduces performance
>> and causes losing of some functionality, for example, for some HW
>> offload, working directly with virtIO cannot be supported.
>> In order to solve this conflict, we created a new netdev type-dpdkvdpa.
>> The new netdev is basically similar to a regular dpdk netdev, but it
>> has some additional functionality for transferring packets from virtIO
>> guest (VM) to a VF and vice versa. With this solution we can benefit
>> both SR-IOV and virtIO.
>> vDPA netdev is designed to support both SW and HW use-cases.
>> HW mode will be used to configure vDPA capable devices. The support
>> for this mode is on progress in the dpdk community.
>> SW acceleration is used to leverage SR-IOV offloads to virtIO guests
>> by relaying packets between VF and virtio devices and as a pre-step
>> for supporting vDPA in HW mode.
>>
>> Running example:
>> 1. Configure OVS bridge and ports:
>> ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev
>> ovs-vsctl add-port br0-ovs pf -- set Interface pf type=dpdk options: \
>>  dpdk-devargs=
>> ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa \
>>  options:vdpa-socket-path= \
>>  options:vdpa-accelerator-devargs= \
>>  options:dpdk-devargs=,representor=[id] 2. Run a
>> virtIO guest (VM) in server mode that creates the socket of
>> the vDPA port.
>> 3. Send traffic.
>>
>> Noa Ezra (2):
>>netdev-dpdk-vdpa: Introduce dpdkvdpa netdev
>>netdev-dpdk: Add dpdkvdpa port
>>
>>   Documentation/automake.mk   |   1 +
>>   Documentation/topics/dpdk/index.rst |   1 +
>>   Documentation/topics/dpdk/vdpa.rst  |  90 +
>>   NEWS|   1 +
>>   lib/automake.mk |   4 +-
>>   lib/netdev-dpdk-vdpa.c  | 750
>
>>   lib/netdev-dpdk-vdpa.h  |  54 +++
>>   lib/netdev-dpdk.c   | 162 
>>   vswitchd/vswitch.xml|  25 ++
>>   9 files changed, 1087 insertions(+), 1 deletion(-)
>>   create mode 100644 Documentation/topics/dpdk/vdpa.rst
>>   create mode 100755 lib/netdev-dpdk-vdpa.c
>>   create mode 100644 lib/netdev-dpdk-vdpa.h
>>
>
>Hi, everyone.
>
>So, I have a few questions (mostly to Roni?):
>
>1. What happened to idea of implementing this as a DPDK vdev?
We wanted to solve both OVS-kernel and OVS-DPDK issue.
The main argument against it was that we allow to define ports that are not
 OF ports on the switch. regardless of how useful we think this feature can be
 it breaks something fundamental, and looking back I agree.  
The vdev, was a workaround, but after speaking with some of our DPDK
experts, they had similar arguments, this time for DPDK world. You create 
a port that never gets packet and never sends packets. This totally
doesn't make sense even if it can be useful in some cases. When we remove
the kernel form the equation, we are left with vDPA.
In fact we followed your suggestion of having SW vDPA and HW vDPA. 
This makes total sense, for example open stack can configure vDPA and 
let the OVS probe the device and go with HW or SW, leaving same functionality
and same configuration.

>
>2. What was the results of "direct output optimization" patch [1] testing?
>At this point I also have to mention that OVS is going to have TSO support
>in one of the next releases (at least we have some p

[ovs-dev] [RFC} netdev-dpdk: setting VF MAC address

2019-09-24 Thread Roni Bar Yanai

On cloud topology with hardware offload, when SR-IOV is used there is an
architecture limitation to configure the VF from the host. The port representor
is attached to OVS, while the VF is usually configured in pass through to the
VM. In such topology the MAC address of the VF is configured by the infra
 (OpenStack for example), as VM is usually untrusted.

For OVS-kernel this can be done by ip link command line. For OVS-DPDK and in
fact for any DPDK application that doesn't use bifurcated driver, the only
entity that can configure the VF MAC address is the PF owner.

Since the DPDK ports management is controlled by the application (OVS here),
such configuration request should be intercepted by OVS.
A more complex approach would be to have an external tool, provided by DPDK,
which would take care of avoiding any conflict or race condition with the
application threads. It is obviously easier to follow the current DPDK model
and keep full control in OVS.

A new command line for OVS will be added:

ovs-vsctl set interface  mac=XX:XX:XX:XX:XX:XX

The OVS will use the rte_eth_dev_default_mac_addr_set api that should be
extended to configure the VF mac address in the name of the representor.

see link: https://www.mail-archive.com/dev@dpdk.org/msg140671.html

It seems like this local approach requires minor changes in OVS netdev-dpdk,
and the benefit will be substantial mainly OVS-DPDK integration with  OpenStack
and other controllers.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v4 1/4] netdev: Dynamic per-port Flow API.

2019-06-10 Thread Roni Bar Yanai

Hi Ilya,

See inline

>-Original Message-
>From: Ilya Maximets 
>Sent: Thursday, June 6, 2019 2:19 PM
>To: Roni Bar Yanai ; Ophir Munk
>; Roi Dayan ; ovs-
>d...@openvswitch.org
>Cc: Ian Stokes ; Flavio Leitner ; 
>Kevin
>Traynor ; Finn Christensen ; Ben
>Pfaff ; Simon Horman ; Olga
>Shern ; Asaf Penso ; Majd Dibbiny
>; Oz Shlomo 
>Subject: Re: [PATCH v4 1/4] netdev: Dynamic per-port Flow API.
>
>On 06.06.2019 13:38, Roni Bar Yanai wrote:
>> Hi Ilya,
>> was curious myself. Mark & RSS is working  (didn't test with representors 
>> yet).
>
>Hi. Thanks for testing!
>
>> I've tested offload is supported on vport using !has_system_port, do you 
>> think
>failing in this test is enough to say that this is netdev bridge port?
>
>I think it's OK for now. '!has_system_port' allows to say that it's not a
>system vport and, since 'dummy' flow API doesn't support vports offloading,
>we could be sure that only dpdk flow API could be used (if allowed).
>
>> When I returned supported, "dpdk_put" was called as expected (after removing
>the disallow).
>>
>> One thing I've encounters is while addin a tap to the dpdk bridge. I got 
>> this:
>>
>> 2019-06-06T10:18:21.865Z|00055|netdev_offload_tc|INFO|probe tc: block
>offload is supported.
>> 2019-06-06T10:18:21.866Z|00056|netdev_offload_tc|INFO|added ingress qdisc
>to veth0
>> 2019-06-06T10:18:21.866Z|00057|netdev_offload|INFO|veth0: Assigned flow
>API 'linux_tc'.
>> 2019-06-06T10:18:21.866Z|00058|bridge|INFO|bridge br-int: added interface
>veth0 on port 1
>> 2019-06-06T10:18:21.867Z|00059|bridge|INFO|bridge br-phy: added interface
>br-phy on port 65534
>>
>> Seems that we should block this as well.
>
>I don't think that we should block this, because we should allow
>'linux_tc' offloading for linux interfaces. For example, someone
>could open two linux ports from the userspace datapath and expect
>that flows could be offloaded to HW/tc layer. I agree that it will
>not fully work in case of mixing port types within a single OVS
>bridge, however this should be a valid case. Packets from linux to
>DPDK ports and backwards will go through OVS, because such flows
>could not be offloaded.
>
>Curious fact is that working this way (opening physical ports via
>netdev-linux) could be even faster than using DPDK ports in some
>cases, because 'linux_tc' supports full HW offloading unlike DPDK.

Can you explain what exactly are the use cases?
DPDK supports full offload when using port representors, it is just a matter of 
integrating it into OVS.

>
>We'll also have AF_XDP support some time soon which will give even
>more benefits to linux interfaces (XDP bypasses tc layer, however
>it might be possible to install flows to HW and handle unmatched
>packets in userspace).
>
>Best regards, Ilya Maximets.
>
>>
>> BR,
>> Roni
>>
>>> -Original Message-
>>> From: Ilya Maximets 
>>> Sent: Monday, June 3, 2019 6:30 PM
>>> To: Ophir Munk ; Roi Dayan ;
>>> ovs-dev@openvswitch.org
>>> Cc: Ian Stokes ; Flavio Leitner ;
>Kevin
>>> Traynor ; Roni Bar Yanai ;
>Finn
>>> Christensen ; Ben Pfaff ; Simon Horman
>>> ; Olga Shern ; Asaf
>>> Penso ; Majd Dibbiny ; Oz
>Shlomo
>>> 
>>> Subject: Re: [PATCH v4 1/4] netdev: Dynamic per-port Flow API.
>>>
>>> Hi Ophir.
>>> I'm curious, what is the current status of your testing?
>>>
>>> I'd like to apply this series to not block further development.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>> On 23.05.2019 16:54, Ilya Maximets wrote:
>>>> On 23.05.2019 16:53, Ilya Maximets wrote:
>>>>> On 23.05.2019 16:18, Ophir Munk wrote:
>>>>>>
>>>>>>
>>>>>>> -Original Message-
>>>>>>> From: Ilya Maximets 
>>>>>>> Sent: Wednesday, May 22, 2019 2:50 PM
>>>>>>> To: Ophir Munk ; Roi Dayan
>>>>>>> ; ovs-dev@openvswitch.org
>>>>>>> Cc: Ian Stokes ; Flavio Leitner 
>>>>>>> ;
>>>>>>> Kevin Traynor ; Roni Bar Yanai
>>>>>>> ; Finn Christensen ; Ben
>Pfaff
>>>>>>> ; Simon Horman ; Olga
>>>>>>> Shern ; Asaf Penso ; Majd
>>>>>>> Dibbiny 
>>>>>>> Subject: Re: [PATCH v4 1/4] netdev: Dynamic per-port Flow API.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 22.05.2019 13:15, Ilya M

Re: [ovs-dev] [PATCH v4 1/4] netdev: Dynamic per-port Flow API.

2019-06-06 Thread Roni Bar Yanai

Hi Ilya, 
was curious myself. Mark & RSS is working  (didn't test with representors yet). 
I've tested offload is supported on vport using !has_system_port, do you think 
failing in this test is enough to say that this is netdev bridge port?
When I returned supported, "dpdk_put" was called as expected (after removing 
the disallow).

One thing I've encounters is while addin a tap to the dpdk bridge. I got this:

2019-06-06T10:18:21.865Z|00055|netdev_offload_tc|INFO|probe tc: block offload 
is supported.
2019-06-06T10:18:21.866Z|00056|netdev_offload_tc|INFO|added ingress qdisc to 
veth0
2019-06-06T10:18:21.866Z|00057|netdev_offload|INFO|veth0: Assigned flow API 
'linux_tc'.
2019-06-06T10:18:21.866Z|00058|bridge|INFO|bridge br-int: added interface veth0 
on port 1
2019-06-06T10:18:21.867Z|00059|bridge|INFO|bridge br-phy: added interface 
br-phy on port 65534

Seems that we should block this as well. 

BR,
Roni

>-Original Message-
>From: Ilya Maximets 
>Sent: Monday, June 3, 2019 6:30 PM
>To: Ophir Munk ; Roi Dayan ;
>ovs-dev@openvswitch.org
>Cc: Ian Stokes ; Flavio Leitner ; 
>Kevin
>Traynor ; Roni Bar Yanai ; Finn
>Christensen ; Ben Pfaff ; Simon Horman
>; Olga Shern ; Asaf
>Penso ; Majd Dibbiny ; Oz Shlomo
>
>Subject: Re: [PATCH v4 1/4] netdev: Dynamic per-port Flow API.
>
>Hi Ophir.
>I'm curious, what is the current status of your testing?
>
>I'd like to apply this series to not block further development.
>
>Best regards, Ilya Maximets.
>
>On 23.05.2019 16:54, Ilya Maximets wrote:
>> On 23.05.2019 16:53, Ilya Maximets wrote:
>>> On 23.05.2019 16:18, Ophir Munk wrote:
>>>>
>>>>
>>>>> -Original Message-
>>>>> From: Ilya Maximets 
>>>>> Sent: Wednesday, May 22, 2019 2:50 PM
>>>>> To: Ophir Munk ; Roi Dayan
>>>>> ; ovs-dev@openvswitch.org
>>>>> Cc: Ian Stokes ; Flavio Leitner ;
>>>>> Kevin Traynor ; Roni Bar Yanai
>>>>> ; Finn Christensen ; Ben Pfaff
>>>>> ; Simon Horman ; Olga
>>>>> Shern ; Asaf Penso ; Majd
>>>>> Dibbiny 
>>>>> Subject: Re: [PATCH v4 1/4] netdev: Dynamic per-port Flow API.
>>>>>
>>>>>
>>>>>
>>>>> On 22.05.2019 13:15, Ilya Maximets wrote:
>>>>>> On 22.05.2019 1:12, Ophir Munk wrote:
>>>>>>>
>>>>>>>> -Original Message-
>>>>>>>> From: Roi Dayan
>>>>>>>> Sent: Tuesday, May 21, 2019 7:48 PM
>>>>>>>> To: Ilya Maximets ; ovs-
>>>>> d...@openvswitch.org
>>>>>>>> Cc: Ian Stokes ; Flavio Leitner
>>>>>>>> ; Ophir Munk ; Kevin
>>>>> Traynor
>>>>>>>> ; Roni Bar Yanai ; Finn
>>>>>>>> Christensen ; Ben Pfaff ; Simon
>>>>> Horman
>>>>>>>> 
>>>>>>>> Subject: Re: [PATCH v4 1/4] netdev: Dynamic per-port Flow API.
>>>>>>>>
>>>>>>>>
>>>>>>>> Acked-by: Roi Dayan 
>>>>>>>
>>>>>>> Hi Ilya,
>>>>>>> Can you please send a patch for the detection of netdev vport on top of
>>>>> this series (as you have already started suggesting in ML discussions)?
>>>>>>> I will then test it and will make sure it's applicable with this 
>>>>>>> series. I think it
>>>>> is better to do that before series acceptance.
>>>>>>> What do you think?
>>>>>>
>>>>>> Hi.
>>>>>> Actually patches are already on a list. You only need to add few lines
>>>>>> to make them allow vxlan for netdev-offload-dpdk.
>>>>>>
>>>>>> Apply following patch sets on top of this one:
>>>>>>
>>>>>>
>https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork
>.ozlabs.org%2Fproject%2Fopenvswitch%2Flist%2F%3Fseries%3D107534dat
>a=02%7C01%7Croniba%40mellanox.com%7C90cd4c9ba95b4629884f08d6e8384c41
>%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636951725824527442
>p;sdata=n%2FVMz34ICfk0dvGOP77Ip4L008RRKdraRXMG%2FtFRM64%3Dre
>served=0
>>>>
>>>> FYI - applying this patch succeeded, however applying the next patch failed
>unless I applied
>>>> patches 2/4, 3/4/ 4/4 first and only then I applied the next patch.
>>>
>>> Yes. That is expected.
>>>
>>>>
>>>>>>
>https://eur03.safelinks.p

Re: [ovs-dev] [PATCH 0/2] dpif-netdev: Flow mark leak + cross-dp offloading.

2019-06-06 Thread Roni Bar Yanai

Acked-By: Roni Bar Yanai 

>-Original Message-
>From: Ilya Maximets 
>Sent: Thursday, June 6, 2019 1:16 PM
>To: ovs-dev@openvswitch.org
>Cc: Ian Stokes ; Flavio Leitner ; 
>Ophir
>Munk ; Kevin Traynor ; Roni
>Bar Yanai 
>Subject: Re: [PATCH 0/2] dpif-netdev: Flow mark leak + cross-dp offloading.
>
>Hi everyone.
>
>These patches are bug fixes for issues that affects not only the master branch.
>It'll be good if someone could review them.
>
>Best regards, Ilya Maximets.
>
>On 13.05.2019 17:01, Ilya Maximets wrote:
>> Ilya Maximets (2):
>>   dpif-netdev: Fix flow mark leak on port lookup failure.
>>   dpif-netdev: Forbid vport offloading attempts.
>>
>>  lib/dpif-netdev.c | 20 +++-
>>  1 file changed, 11 insertions(+), 9 deletions(-)
>>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [RFC V2] netdev-rte-offloads: HW offload virtio-forwarder

2019-05-27 Thread Roni Bar Yanai

Ilya thanks for the answer.
Seems we have a different point of view. 

"You contradict yourself. "not part of the switch" vs "OF switch with great
performance".
It's not the OF switch anymore if you're hiding ports from OF and users."

I don't think I contradict myself.
VF and Vhost "ports" (forwarded), are not switch ports.
They are closing a technical gap: users wants virtio and HW can offload only 
sriov.
It is an HW offload optimization, better performance (less cores for the 
solution), it also makes
It easier to deploy, maintain and optimize.
In my suggestion I didn't defined them as ports in the switch, they are 
configuration in the hw module.
It is just an OVS configuration similar to what you suggested, with netdev.
The reason for creating it in different layer is solving the same problem for 
kernel OVS.
I agree that if we add the kernel part (from my suggestion) you can say we turn 
OVS
Into multi tool because we now can use it just to forward packets without rules 
(and ports).
This is of course not the purpose. We want to add the ability for OVS to 
forward packets to
solve HW limit and get a better OVS performance that works with virtio 
hw-offload.

We are checking your patch, let's see the performance we can get.

>-Original Message-
>From: Ilya Maximets 
>Sent: Monday, May 27, 2019 11:32 AM
>To: Roni Bar Yanai ; Simon Horman
>
>Cc: ovs-dev@openvswitch.org; Ian Stokes ; Kevin Traynor
>; Oz Shlomo ; Eli Britstein
>; Eyal Lavee ; Rony Efraim
>; Ben Pfaff 
>Subject: Re: [ovs-dev] [RFC V2] netdev-rte-offloads: HW offload 
>virtio-forwarder
>
>On 24.05.2019 23:51, Roni Bar Yanai wrote:
>> Hi Ilya,
>> (S
>>
>> See inline.
>>
>>> -----Original Message-
>>> From: Ilya Maximets 
>>> Sent: Friday, May 24, 2019 3:21 PM
>>> To: Simon Horman ; Roni Bar Yanai
>>> 
>>> Cc: ovs-dev@openvswitch.org; Ian Stokes ; Kevin
>Traynor
>>> ; Oz Shlomo ; Eli Britstein
>>> ; Eyal Lavee ; Rony Efraim
>>> ; Ben Pfaff 
>>> Subject: Re: [ovs-dev] [RFC V2] netdev-rte-offloads: HW offload virtio-
>forwarder
>>>
>>> On 22.05.2019 15:10, Simon Horman wrote:
>>>> Hi,
>>>>
>>>> On Thu, May 16, 2019 at 08:44:31AM +, Roni Bar Yanai wrote:
>>>>>> -Original Message-
>>>>>> From: Ilya Maximets 
>>>>>> Sent: Wednesday, May 15, 2019 4:37 PM
>>>>>> To: Roni Bar Yanai ; ovs-dev@openvswitch.org;
>Ian
>>>>>> Stokes ; Kevin Traynor 
>>>>>> Cc: Eyal Lavee ; Oz Shlomo
>;
>>> Eli
>>>>>> Britstein ; Rony Efraim ;
>Asaf
>>>>>> Penso 
>>>>>> Subject: Re: [RFC V2] netdev-rte-offloads: HW offload virtio-forwarder
>>>>>>
>>>>>> On 15.05.2019 16:01, Roni Bar Yanai wrote:
>>>>>>> Hi Ilya,
>>>>>>>
>>>>>>> Thanks for the comment.
>>>>>>>
>>>>>>> I think the suggested arch is very good and has many advantages, and
>>>>>>> in fact I had something very similar as my internally first approach.
>>>>>>>
>>>>>>> However, I had one problem: it doesn't solves the kernel case. It make
>>>>>>> sense doing forwarding using dpdk also when OVS is kernel (port
>>>>>>> representor and rule offloads are done with kernel OVS). It makes
>>>>>>> sense because we can have one solution and because DPDK has better
>>>>>>> performance.
>>>>>>
>>>>>> I'm not sure if it makes practical sense to run separate userpace
>>>>>> datapath just to pass packets between vhost and VF. This actually
>>>>>> matches with some of your own disadvantages of separate DPDK apps.
>>>>>> Separate userspace datapath will need its own complex start,
>>>>>> configuration and maintenance. Also it will consume additional cpu cores
>>>>>> which will not be shared with kernel packet processing.  I think that
>>>>>> just move everything to userspace in this case would be much more simple
>>>>>> for user than maintaining such configuration.
>>>>>
>>>>> Maybe It doesn't make sense for OVS-DPDK but for OVS users it does.
>When
>>>>> you run offload with OVS-kernel, and for some vendors this is the current
>>>>> status, and virtio is a requirement, you now have millions of packets
>>>>> that should be forwarded. Basically you hav

Re: [ovs-dev] [RFC V2] netdev-rte-offloads: HW offload virtio-forwarder

2019-05-24 Thread Roni Bar Yanai

Hi Ilya,
(S

See inline.

>-Original Message-
>From: Ilya Maximets 
>Sent: Friday, May 24, 2019 3:21 PM
>To: Simon Horman ; Roni Bar Yanai
>
>Cc: ovs-dev@openvswitch.org; Ian Stokes ; Kevin Traynor
>; Oz Shlomo ; Eli Britstein
>; Eyal Lavee ; Rony Efraim
>; Ben Pfaff 
>Subject: Re: [ovs-dev] [RFC V2] netdev-rte-offloads: HW offload 
>virtio-forwarder
>
>On 22.05.2019 15:10, Simon Horman wrote:
>> Hi,
>>
>> On Thu, May 16, 2019 at 08:44:31AM +, Roni Bar Yanai wrote:
>>>> -Original Message-
>>>> From: Ilya Maximets 
>>>> Sent: Wednesday, May 15, 2019 4:37 PM
>>>> To: Roni Bar Yanai ; ovs-dev@openvswitch.org; Ian
>>>> Stokes ; Kevin Traynor 
>>>> Cc: Eyal Lavee ; Oz Shlomo ;
>Eli
>>>> Britstein ; Rony Efraim ; Asaf
>>>> Penso 
>>>> Subject: Re: [RFC V2] netdev-rte-offloads: HW offload virtio-forwarder
>>>>
>>>> On 15.05.2019 16:01, Roni Bar Yanai wrote:
>>>>> Hi Ilya,
>>>>>
>>>>> Thanks for the comment.
>>>>>
>>>>> I think the suggested arch is very good and has many advantages, and
>>>>> in fact I had something very similar as my internally first approach.
>>>>>
>>>>> However, I had one problem: it doesn't solves the kernel case. It make
>>>>> sense doing forwarding using dpdk also when OVS is kernel (port
>>>>> representor and rule offloads are done with kernel OVS). It makes
>>>>> sense because we can have one solution and because DPDK has better
>>>>> performance.
>>>>
>>>> I'm not sure if it makes practical sense to run separate userpace
>>>> datapath just to pass packets between vhost and VF. This actually
>>>> matches with some of your own disadvantages of separate DPDK apps.
>>>> Separate userspace datapath will need its own complex start,
>>>> configuration and maintenance. Also it will consume additional cpu cores
>>>> which will not be shared with kernel packet processing.  I think that
>>>> just move everything to userspace in this case would be much more simple
>>>> for user than maintaining such configuration.
>>>
>>> Maybe It doesn't make sense for OVS-DPDK but for OVS users it does.  When
>>> you run offload with OVS-kernel, and for some vendors this is the current
>>> status, and virtio is a requirement, you now have millions of packets
>>> that should be forwarded. Basically you have two options:
>>>
>>> 1. use external application (we discussed that).
>>>
>>> 2. create user space data plane and configure forwarding (OVS), but then
>>> you have performance issues as OVS is not optimized for this. And for
>>> kernel data plane much worse off course.
>>>
>>> Regarding burning a core. In case of HW offload you will do it either
>>> way, and there is no benefit for adding FW functionality for kernel data
>>> path, mainly because of kernel performance limitations.
>>>
>>> I agree that in such case moving to user space is a solution for some,
>>> but keep in mind that some doesn't have such support for DPDK and for
>>> others they have their own OVS based data path with their adjustments, so
>>> it will be a hard transition.
>>>
>>> While arch is good for the two DPDK use cases, it leaves the kernel one
>>> out.  Any thoughts how we can add this use case as well and still keep
>>> the suggested arch?
>>
>> ...
>>
>> At Netronome we have an Open Source standalone application,
>> called virtio-forwarder
>(https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.co
>m%2FNetronome%2Fvirtio-
>forwarderdata=02%7C01%7Croniba%40mellanox.com%7C72d768141c794f4
>59eee08d6e0424d0a%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C6369
>42972699205388sdata=Nk1a9tRUD4%2BjDFOh%2BUwrdnQ6sJ2cnhXfvJsXJs
>F0X3E%3Dreserved=0).
>> The reason that we provide this solution is that we see this as a
>> requirement for some customers. This includes customers using OVS
>> with the kernel based HW offload (OVS-TC).
>>
>> In general I agree that integration with OVS has some advantages and
>> I'm happy to see this issue being discussed. But as we see demand
>> for use of virtio-forwarder in conjunction with OVS-TC I see that
>> as a requirement for a solution that is integrated with OVS, which leads
>> me to lean towards the proposal put forward by Roni.
>>
>> I also feel that the prop

Re: [ovs-dev] [RFC V2] netdev-rte-offloads: HW offload virtio-forwarder

2019-05-16 Thread Roni Bar Yanai



>-Original Message-
>From: Ilya Maximets 
>Sent: Wednesday, May 15, 2019 4:37 PM
>To: Roni Bar Yanai ; ovs-dev@openvswitch.org; Ian
>Stokes ; Kevin Traynor 
>Cc: Eyal Lavee ; Oz Shlomo ; Eli
>Britstein ; Rony Efraim ; Asaf
>Penso 
>Subject: Re: [RFC V2] netdev-rte-offloads: HW offload virtio-forwarder
>
>On 15.05.2019 16:01, Roni Bar Yanai wrote:
>> Hi Ilya,
>>
>> Thanks for the comment.
>> I think the suggested arch is very good and has many advantages, and in fact 
>> I
>had something very similar as my internally first approach.
>> However, I had one problem: it doesn't solves the kernel case. It make sense
>doing forwarding using dpdk also when OVS is kernel (port representor and rule
>offloads are done with kernel OVS). It makes sense because we can have one
>solution and because DPDK has better performance.
>
>I'm not sure if it makes practical sense to run separate userpace
>datapath just to pass packets between vhost and VF. This actually
>matches with some of your own disadvantages of separate DPDK apps.
>Separate userspace datapath will need its own complex start,
>configuration and maintenance. Also it will consume additional cpu
>cores which will not be shared with kernel packet processing.
>I think that just move everything to userspace in this case would
>be much more simple for user than maintaining such configuration.
>

Maybe It doesn't make sense for OVS-DPDK but for OVS users it does.
When you run offload with OVS-kernel, and for some vendors this is the current
status, and virtio is a requirement, you now have millions of packets that
should be forwarded. Basically you have two options:
1. use external application (we discussed that).
2. create user space data plane and configure forwarding (OVS), but then
 you have performance issues as OVS is not optimized for this. And for kernel
 data plane much worse off course.
Regarding burning a core. In case of HW offload you will do it either way, and
there is no benefit for adding FW functionality for kernel data path, mainly 
because 
of kernel performance limitations.  
I agree that in such case moving to user space is a solution for some, but keep 
in
mind that some doesn't have such support for DPDK and for others they have
their own OVS based data path with their adjustments, so it will be a hard
transition. 
While arch is good for the two DPDK use cases, it leaves the kernel one out. 
Any thoughts how we can add this use case as well and still keep the suggested 
arch?

>> Do you think of a way we can create also support just FW case when there is 
>> no
>port representors?
>>
>> Thanks,
>> Roni
>>
>>
>>> -Original Message-
>>> From: Ilya Maximets 
>>> Sent: Wednesday, May 15, 2019 2:25 PM
>>> To: Roni Bar Yanai ; ovs-dev@openvswitch.org; Ian
>>> Stokes ; Kevin Traynor 
>>> Cc: Eyal Lavee ; Oz Shlomo ; Eli
>>> Britstein ; Rony Efraim ; Asaf
>>> Penso 
>>> Subject: Re: [RFC V2] netdev-rte-offloads: HW offload virtio-forwarder
>>>
>>> On 06.05.2019 13:43, Roni Bar Yanai wrote:
>>>> Background
>>>> ==
>>>> OVS HW offload solution is consisted of forwarding and control. HW
>implements
>>>> embedded switch that connects SRIOV VF's and forwards packets according
>to
>>> the
>>>> dynamically configured HW rules (packets can be altered by HW rules).
>Packets
>>>> that have no forwarding rule, called exception packets, are sent to the 
>>>> control
>>>> path (OVS SW). OVS SW will handle the exception packet (just like in SW 
>>>> only
>>>> mode),  namely calling up-call if no DP flow exists. OVS SW will use port
>>>> representor for representing the VF. see:
>>>>
>>>>     (_https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fd
>oc.
>>>
>dpdk.org%2Fguides%2Fprog_guide%2Fswitch_representation.html)._data
>>>
>=02%7C01%7Croniba%40mellanox.com%7Ce6056cf632f343dd734308d6d927f1e0%
>>>
>7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636935162916810125
>>>
>sdata=OT%2BXn0yz2pllFVRqFZDQCmXYymDuZfRamK72EX4Z2ZU%3Dreserv
>>> ed=0
>>>>
>>>> Packets sent from VF will get to the port representor and packets
>>>> sent to the port representor will get to the VF. Once OVS SW generates a
>>>> data plane flow, a new HW rules will be configured in the embedded switch.
>>>> following packet on the on the same flow will be directed by HW only. Will
>>>> arrive directly from VF (also uplink) to VF without getting to SW.
>>>>
>>>> For some HW architectur

Re: [ovs-dev] [RFC V2] netdev-rte-offloads: HW offload virtio-forwarder

2019-05-15 Thread Roni Bar Yanai

Hi Ilya,

Thanks for the comment.
I think the suggested arch is very good and has many advantages, and in fact I 
had something very similar as my internally first approach.
However, I had one problem: it doesn't solves the kernel case. It make sense 
doing forwarding using dpdk also when OVS is kernel (port representor and rule 
offloads are done with kernel OVS). It makes sense because we can have one 
solution and because DPDK has better performance.
Do you think of a way we can create also support just FW case when there is no 
port representors?

Thanks,
Roni


>-Original Message-
>From: Ilya Maximets 
>Sent: Wednesday, May 15, 2019 2:25 PM
>To: Roni Bar Yanai ; ovs-dev@openvswitch.org; Ian
>Stokes ; Kevin Traynor 
>Cc: Eyal Lavee ; Oz Shlomo ; Eli
>Britstein ; Rony Efraim ; Asaf
>Penso 
>Subject: Re: [RFC V2] netdev-rte-offloads: HW offload virtio-forwarder
>
>On 06.05.2019 13:43, Roni Bar Yanai wrote:
>> Background
>> ==
>> OVS HW offload solution is consisted of forwarding and control. HW implements
>> embedded switch that connects SRIOV VF's and forwards packets according to
>the
>> dynamically configured HW rules (packets can be altered by HW rules). Packets
>> that have no forwarding rule, called exception packets, are sent to the 
>> control
>> path (OVS SW). OVS SW will handle the exception packet (just like in SW only
>> mode),  namely calling up-call if no DP flow exists. OVS SW will use port
>> representor for representing the VF. see:
>>
>>     (_https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdoc.
>dpdk.org%2Fguides%2Fprog_guide%2Fswitch_representation.html)._data
>=02%7C01%7Croniba%40mellanox.com%7Ce6056cf632f343dd734308d6d927f1e0%
>7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636935162916810125
>sdata=OT%2BXn0yz2pllFVRqFZDQCmXYymDuZfRamK72EX4Z2ZU%3Dreserv
>ed=0
>>
>> Packets sent from VF will get to the port representor and packets
>> sent to the port representor will get to the VF. Once OVS SW generates a
>> data plane flow, a new HW rules will be configured in the embedded switch.
>> following packet on the on the same flow will be directed by HW only. Will
>> arrive directly from VF (also uplink) to VF without getting to SW.
>>
>> For some HW architecture only the shortly presented SRIOV hw offload
>> architecture is supported. SRIOV architecture requires that the guest will
>> install a driver which is specific for the underlying HW. Specific HW driver
>> interduces two main problems for virtualization:
>>
>> 1. It breaks virtualization in some sense (VM aware of the HW).
>> 2. less natural support for live migration.
>>
>> Using virtio interface solves both problems (on the expense of some loss in
>> functionality and performance). However, for some HW offload, working 
>> directly
>> with vitrio cannot be supported.
>>
>> HW offload for virtio architecture
>> 
>> We suggest an architecture for HW offload of virtio interface that
>> adds another component called virtio-forwarder on top of the current
>> architecture. The forwarder is a software or hardware (for vdpa) component
>that
>> connects the VF with a matching virtio interface as shown below:
>>
>>    | PR1  ---
>>  --|--   |   |
>>     | |  | forwarder |
>>     | OVS |  |   |
>>     | |  - -
>>  -    | VF1   | virtio1   | |
>>    | uplink   |   |   |  guest  |
>>   -   |   \ --| |
>> |  |- /    -
>> |  e-switch    |
>> |  |
>>  --
>>
>> The forwarder role is to function as a wire between the VF and the virtio.
>> Forwarder reads packets from the rx-queue and sends them to the peer
>> tx-queue (and vice versa). since the function in this case is reduced to
>> forwarding packets without inspecting them, a single core can push a very 
>> high
>> number of PPS (near DPDK forwarding performance).
>>
>> There are 3 sub use cases.
>>
>> OVS-dpdk
>> 
>> This is the basic use case that was just described. In this use case we have
>> port representor, VF and virtio (forwarding should be done between VF and
>virtio).
>>
>> Vdpa
>> -
>> Vdpa enables the HW to directly put the packets in the VM virtio. In this 
>> case
>> the forwarding is done i

[ovs-dev] [RFC V2] netdev-rte-offloads: HW offload virtio-forwarder

2019-05-06 Thread Roni Bar Yanai

Background
==

OVS HW offload solution is consisted of forwarding and control. HW implements
embedded switch that connects SRIOV VF's and forwards packets according to the
dynamically configured HW rules (packets can be altered by HW rules). Packets
that have no forwarding rule, called exception packets, are sent to the control
path (OVS SW). OVS SW will handle the exception packet (just like in SW only
mode),  namely calling up-call if no DP flow exists. OVS SW will use port
representor for representing the VF. see:
(https://doc.dpdk.org/guides/prog_guide/switch_representation.html).
Packets sent from VF will get to the port representor and packets
sent to the port representor will get to the VF. Once OVS SW generates a
data plane flow, a new HW rules will be configured in the embedded switch.
following packet on the on the same flow will be directed by HW only. Will
arrive directly from VF (also uplink) to VF without getting to SW.

For some HW architecture only the shortly presented SRIOV hw offload
architecture is supported. SRIOV architecture requires that the guest will
install a driver which is specific for the underlying HW. Specific HW driver
interduces two main problems for virtualization:
1. It breaks virtualization in some sense (VM aware of the HW).
2. less natural support for live migration.
Using virtio interface solves both problems (on the expense of some loss in
functionality and performance). However, for some HW offload, working directly
with vitrio cannot be supported.

HW offload for virtio architecture


We suggest an architecture for HW offload of virtio interface that
adds another component called virtio-forwarder on top of the current
architecture. The forwarder is a software or hardware (for vdpa) component that
connects the VF with a matching virtio interface as shown below:


   | PR1  ---
 --|--   |   |
| |  | forwarder |
| OVS |  |   |
   | |  - -
 -| VF1   | virtio1   | |
   | uplink   |   |   |  guest  |
  -   |   \ --| |
|  |- /-
|  e-switch|
|  |
--

The forwarder role is to function as a wire between the VF and the virtio.
Forwarder reads packets from the rx-queue and sends them to the peer
tx-queue (and vice versa). since the function in this case is reduced to
forwarding packets without inspecting them, a single core can push a very high
number of PPS (near DPDK forwarding performance).

There are 3 sub use cases.

OVS-dpdk

This is the basic use case that was just described. In this use case we have
port representor, VF and virtio (forwarding should be done between VF and 
virtio).

Vdpa
-
Vdpa enables the HW to directly put the packets in the VM virtio. In this case
the forwarding is done in HW, but it requires that some SW will handle the
control. Configure the queues and adjust configuration according to VHOST
updates.

OVS-kernel
--
OVS-kernel HW offload has the same limitation. However, in the case of
just forwarding packets, DPDK has a great performance advantage over the kernel.
It would be good to also add this use case, looking on implementation
effort and performance gain.

Why not just use standalone (DPDK test PMD)?


1. When HW-offload is running, we expect that most of the traffic will be
   handled by HW, so the PMD thread will be mostly idle. we don't want to burn
   another core for the forwarding.
2. Standalone application is another application with all the additional
   overheads: start, configuration, monitoring...etc. besides being another
   project which means another dependency.
3. Using already existing OVS load balancing and NUMA awareness. Forwarding
   should have the exact same symptoms of unbalanced workload as regular 
rx-queue
4. We might need to have some prioritization, exception packets are more
   important than forwarding. Being on the same domain will make it possible
   to add such prioritization while reducing CPU requirement to minimum.

OVS virtio-forwarder


The suggestion is to put the wire and control functionality in the hw-offload
module. Looking on the forwarder functionality we have control and data.
The control is the configuration: Virtio/VF matching (and type). queues
configuration (defined when VM initialized, and can change)...etc. The data is
the actual forwarding that needs a context to run it. As explained, forwarding
is reduced to a simple rx-burst and tx-burst where all can be predefined
after the configuration.

We add the forwarding layer to the hw offload module and we configure it
separately. For example:

ovs-appctl hw-offload/set-fw

[ovs-dev] [RFC] netdev-rte-offloads: HW offload virtio-forwarder

2019-05-06 Thread Roni Bar Yanai

Background
==

OVS HW offload solution is consisted of forwarding and control. HW implements
embedded switch that connects SRIOV VF's and forwards packets according to the
dynamically configured HW rules (packets can be altered by HW rules). Packets
that have no forwarding rule, called exception packets, are sent to the control
path (OVS SW). OVS SW will handle the exception packet (just like in SW only
mode),  namely calling up-call if no DP flow exists. OVS SW will use port
representor for representing the VF. see:
(https://doc.dpdk.org/guides/prog_guide/switch_representation.html).
Packets sent from VF will get to the port representor and packets
sent to the port representor will get to the VF. Once OVS SW generates a
data plane flow, a new HW rules will be configured in the embedded switch.
following packet on the on the same flow will be directed by HW only. Will
arrive directly from VF (also uplink) to VF without getting to SW.

For some HW architecture only the shortly presented SRIOV hw offload
architecture is supported. SRIOV architecture requires that the guest will
install a driver which is specific for the underlying HW. Specific HW driver
interduces two main problems for virtualization:
1. It breaks virtualization in some sense (VM aware of the HW).
2. less natural support for live migration.
Using virtio interface solves both problems (on the expense of some loss in
functionality and performance). However, for some HW offload, working directly
with vitrio cannot be supported.

HW offload for virtio architecture


We suggest an architecture for HW offload of virtio interface that
adds another component called virtio-forwarder on top of the current
architecture. The forwarder is a software or hardware (for vdpa) component that
connects the VF with a matching virtio interface as shown below:


   | PR1  ---
 --|--   |   |
| |  | forwarder |
| OVS |  |   |
| |  - -
 -| VF1   | virtio1   | |
   | uplink   |   |   |  guest  |
  -   |   \ --| |
|  |- /-
|  e-switch|
|  |
--

The forwarder role is to function as a wire between the VF and the virtio.
Forwarder reads packets from the rx-queue and sends them to the peer
tx-queue (and vice versa). since the function in this case is reduced to
forwarding packets without inspecting them, a single core can push a very high
number of PPS (near DPDK forwarding performance).

There are 3 sub use cases.

OVS-dpdk

This is the basic use case that was just described. In this use case we have
port representor, VF and virtio (forwarding should be done between VF and 
virtio).

Vdpa
-
Vdpa enables the HW to directly put the packets in the VM virtio. In this case
the forwarding is done in HW, but it requires that some SW will handle the
control. Configure the queues and adjust configuration according to VHOST
updates.

OVS-kernel
--
OVS-kernel HW offload has the same limitation. However, in the case of
just forwarding packets, DPDK has a great performance advantage over the kernel.
It would be good to also add this use case, looking on implementation
effort and performance gain.

Why not just use standalone (DPDK test PMD)?


1. When HW-offload is running, we expect that most of the traffic will be
   handled by HW, so the PMD thread will be mostly idle. we don't want to burn
   another core for the forwarding.
2. Standalone application is another application with all the additional
   overheads: start, configuration, monitoring...etc. besides being another
   project which means another dependency.
3. Using already existing OVS load balancing and NUMA awareness. Forwarding
   should have the exact same symptoms of unbalanced workload as regular 
rx-queue
4. We might need to have some prioritization, exception packets are more
   important than forwarding. Being on the same domain will make it possible
   to add such prioritization while reducing CPU requirement to minimum.

OVS virtio-forwarder


The suggestion is to put the wire and control functionality in the hw-offload
module. Looking on the forwarder functionality we have control and data.
The control is the configuration: Virtio/VF matching (and type). queues
configuration (defined when VM initialized, and can change)...etc. The data is
the actual forwarding that needs a context to run it. As explained, forwarding
is reduced to a simple rx-burst and tx-burst where all can be predefined
after the configuration.

We add the forwarding layer to the hw offload module and we configure it
separately. For example:

ovs-appctl hw-offload/set-fw

Re: [ovs-dev] [ovs-dev, v1] netdev-rte-offloads: Reassign vport netdev functions.

2019-04-30 Thread Roni Bar Yanai

Aaron, please see inline.

>-Original Message-
>From: ovs-dev-boun...@openvswitch.org 
>On Behalf Of Aaron Conole
>Sent: Monday, April 29, 2019 4:44 PM
>To: Ilya Maximets 
>Cc: ovs-dev@openvswitch.org; Thomas Monjalon 
>Subject: Re: [ovs-dev] [ovs-dev, v1] netdev-rte-offloads: Reassign vport netdev
>functions.
>
>Ilya Maximets  writes:
>
>>> 21/04/2019 11:11, Ophir Munk:
 Thomas - would you like to explain more on the origins of "rte"?
>>>
>>> Ian explained (below) the origin quite clearly.
>>> It has been decided in the early days by Intel.
>>>
 From: Ian Stokes
 > On 4/17/2019 5:34 PM, Ben Pfaff wrote:
 > > On Wed, Apr 17, 2019 at 11:45:33AM -0400, Aaron Conole wrote:
 > >> rte comes from dpdk as an acronym for Run Time Environment.  Maybe
 > >> even just dropping the 'rte_' portion?
 > >
 > > *That* is what rte stands for?  What a ridiculously generic name.
 > > It's like naming a library Operating System.
>>>
>>> Yes I agree that it's ridiculous :)
>>
>> The best header is "rte_eal.h". It looks like DPDK tries to abstract from 
>> itself.
>>
>>>
>>> I already proposed to replace rte_ with dpdk_ prefix
>>> but the vast majority was against a big replacement.
>>> Would you support such a change?
>>
>> I'm not contributing much to dpdk these days, but I'd support such a
>> change.
>
>I have a longer (and probably useless, but w/e) opinion below.
>
>> This would be a big step toward apps that tries to work with DPDK as a 
>> library
>> and not as a run-time environment.
>> And, probably, right now is the last chance for DPDK do make such a huge API
>break.
>> There was way too much discussions about API stability and I'm afraid that 
>> DPDK
>> will petrify soon without ability to change anything.
>
>+ .5
>
>I think it makes sense to do this change at the time DPDK project
>declares a stronger ABI guarantee.  "This is the new ossified
>ABI/API, and we don't break it."  It's a strong declaration to users.
>And API / ABI are really just user interfaces.  Compilers, computers,
>etc. don't "care".
>
>Then again, history is littered with crazier ossification in software.
>After all, we still use CAR,CDR in lisp and I don't think most people
>know what that's all about either.  Heck, supposedly indices start at 0
>rather than 1 to correct for yacht handicapping interrupting the
>compiler[1][2] and it turns out that isn't a terrible thing.  "rte_"
>*is* one character less than "dpdk_" after all.
>
>All this to say, I don't think it really truly matters.  As seen in the
>original patch, people now just think "rte == dpdk" and so it's probably
>a lot of hassle for not as much gain.  Anyone who googles "rte_*" upon
>encountering it in code will find DPDK.  And whatever the reason for
>using "rte_" originally won't matter.
>
>So I guess... do whatever you'd like in your existing code.  :)
>
>OTOH, for code in OvS, I'll continue to advocate against using *rte*
>because it's new code and *rte* doesn't really mean anything to OvS
>anyway.  At least in the DPDK project, it meant something.
>
>1:
>https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fen.wikipe
>dia.org%2Fwiki%2FZero-
>based_numbering%23Origindata=02%7C01%7Croniba%40mellanox.com%7
>C6de746f1250948803ec608d6cca94ccd%7Ca652971c7d2e4d9ba6a4d149256f461b%7
>C0%7C0%7C636921424851725523sdata=IvceIDWnPxohuOl2smsbmf2Pz43tk
>8KgnUm5AVXDWfU%3Dreserved=0
>2:
>https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fexple.tive.o
>rg%2Fblarg%2F2013%2F10%2F22%2Fcitation-
>needed%2Fdata=02%7C01%7Croniba%40mellanox.com%7C6de746f125094
>8803ec608d6cca94ccd%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636
>921424851725523sdata=6ZlgjwSktRL8ltjAOUHcDJHksoWH8rKqo%2BenInMJ
>6Y8%3Dreserved=0
>
Agree that for most developers dpdk == rte, and when looking at the code rte* 
will not
be confusing. Having said that, totally agree that it makes more sense for OVS 
code to 
use dpdk and not rte in the API. We have tc-offload and dpdk-offload and we 
have 
netdev-dpdk...etc. 
This patch is not relevant anymore as the required functionality was 
implemented by Ilya
as infrastructure in a different patch, but will follow the convention on 
future patches.
Thanks,
Roni
>>>
 > This piqued my interest also, with DPDK in the early days it was 
 > targeting
>bare
 > metal comms systems, so the original API was LWRTE (LiteWeight Run Time
 > Environment) which became RTE as it moved on from bare metal, so it
>seems
 > more of a legacy convention.
>>>
>___
>dev mailing list
>d...@openvswitch.org
>https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.open
>vswitch.org%2Fmailman%2Flistinfo%2Fovs-
>devdata=02%7C01%7Croniba%40mellanox.com%7C6de746f1250948803ec60
>8d6cca94ccd%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C63692142485
>1735523sdata=ITEQBetXzMxsyR7wU02rwrpx4epHc4LARdEhEDTXq3M%3D&
>amp;reserved=0
___
dev

Re: [ovs-dev] [PATCH] netdev: Dynamic per-port Flow API.

2019-04-24 Thread Roni Bar Yanai

Hi Ilya,

Please see comment/clarification inline.
Thanks,
Roni

>-Original Message-
>From: Ilya Maximets 
>Sent: Wednesday, April 24, 2019 6:12 PM
>To: Ophir Munk ; ovs-dev@openvswitch.org
>Cc: Ian Stokes ; Flavio Leitner ; 
>Kevin
>Traynor ; Roni Bar Yanai ; Finn
>Christensen ; Ben Pfaff ; Roi Dayan
>; Simon Horman ; Olga
>Shern ; Asaf Penso ; Oz Shlomo
>; Paul Blakey 
>Subject: Re: [PATCH] netdev: Dynamic per-port Flow API.
>
>On 24.04.2019 17:26, Ophir Munk wrote:
>> Hi Ilya,
>> Please find comments inline and at the end of this mail.
>
>Hi. Thanks for review.
>Comments inline.
>
>Best regards, Ilya Maximets.
>
>>
>> Regards,
>> Ophir
>>
>>> -Original Message-
>>> From: Ilya Maximets 
>>> Sent: Tuesday, April 23, 2019 7:19 PM
>>> To: ovs-dev@openvswitch.org
>>> Cc: Ian Stokes ; Flavio Leitner ;
>>> Ophir Munk ; Kevin Traynor
>>> ; Roni Bar Yanai ; Finn
>>> Christensen ; Ben Pfaff ; Roi Dayan
>>> ; Simon Horman ;
>>> Ilya Maximets 
>>> Subject: [PATCH] netdev: Dynamic per-port Flow API.
>>>
>>> Current issues with Flow API:
>>>
>>> * OVS calls offloading functions regardless of successful
>>>   flow API initialization. (ex. on init_flow_api failure)
>>> * Static initilaization of Flow API for a netdev_class forbids
>>>   having different offloading types for different instances
>>>   of netdev with the same netdev_class. (ex. different vports in
>>>   'system' and 'netdev' datapaths at the same time)
>>>
>>> Solution:
>>>
>>> * Move Flow API from the netdev_class to netdev instance.
>>> * Make Flow API dynamic, i.e. probe the APIs and choose the
>>>   suitable one.
>>>
>>
>> I suggest distinguishing between initialization and probing.
>> The probing you mention is implemented by testing device initialization:
>init_flow_api().
>> I suggest adding a new probe() API just for the sake of probing. It will 
>> check the
>netdev struct.
>> I suggest  leaving the init_flow_api() API for HW/drivers initialization.
>> The HW/driver may fail at initialization, but it does not mean we need to 
>> probe
>for a new API in that case.
>
>I also thought about separate 'probe()' api, but it seems that probing
>will be equal to 'init()' in most cases. Checking the 'netdev struct'
>fro probing is a bad solution as instances of same netdev class could
>require different offloading apis. Probing a new api on init failure
>could be useful exactly for this case. Anyway we'll be able to improve
>the logic of assigning in the future if it will be needed. This is
>the single point in code where we have information about all the available
>offloading providers.
>
>>
>>> Side effects:
>>>
>>> * Flow API providers localized as possible in their modules.
>>> * Now we have an ability to make runtime checks. For example,
>>>   we could check if particular device supports features we
>>>   need, like if dpdk device supports RSS+MARK action.
>>>
>>> Signed-off-by: Ilya Maximets 
>>> ---
>>>
>>> Since RFC:
>>>   * Fixed sparce build.
>>>   * Some logs turned from INFO to DBG.
>>>   * Enabled HW offloading on non-Linux systems
>>> (For testing with dummy provider).
>>>
>>>  lib/automake.mk   |   2 +-
>>>  lib/dpdk.c|   2 +
>>>  lib/netdev-dpdk.c |  24 +++-
>>>  lib/netdev-dpdk.h |   3 +
>>>  lib/netdev-dummy.c|  24 +++-
>>>  lib/netdev-linux.c|   3 -
>>>  lib/netdev-linux.h|  10 --
>>>  lib/netdev-offload-provider.h |  99 +++
>>>  lib/netdev-provider.h |  67 +--
>>>  lib/netdev-rte-offloads.c |  40 +-
>>>  lib/netdev-rte-offloads.h |  41 +--
>>>  lib/netdev-tc-offloads.c  |  39 --
>>>  lib/netdev-tc-offloads.h  |  44 ---
>>>  lib/netdev-vport.c|   6 +-
>>>  lib/netdev.c  | 221 +++---
>>>  tests/dpif-netdev.at  |   4 +-
>>>  tests/ofproto-macros.at   |   1 -
>>>  17 files changed, 398 insertions(+), 232 deletions(-)  create mode 100644
>>> lib/netdev-offload-provider.h  delete mode 100644 lib/netdev-tc-offloads.h
>>>
>>> diff --git a/lib/automake.mk b/lib/automake.mk index cc5dccf39..b9f3b69e7
>>> 100644
>>> --- a/lib/automake.mk

Re: [ovs-dev] [PATCH v1] netdev-vport: Check DPDK_NETDEV for offloaded API.

2019-04-10 Thread Roni Bar Yanai




>-Original Message-
>From: Ilya Maximets 
>Sent: Wednesday, April 10, 2019 5:21 PM
>To: Ophir Munk ; ovs-dev@openvswitch.org
>Cc: Ian Stokes ; Olga Shern ;
>Kevin Traynor ; Asaf Penso ;
>Roni Bar Yanai ; Paul Blakey ;
>Roi Dayan ; Flavio Leitner 
>Subject: Re: [PATCH v1] netdev-vport: Check DPDK_NETDEV for offloaded
>API.
>
>On 10.04.2019 1:30, Ophir Munk wrote:
>> Hi Ilya,
>> Please see comments inline
>>
>>> -Original Message-
>>> From: Ilya Maximets 
>> ...
>>> On 02.04.2019 12:18, Ophir Munk wrote:
>>>> vport offloaded functions should have a different implementation for
>>>> linux based OVS versus dpdk based OVS. Currently there is only support
>>>> for linux based offloaded API. The code in the file named
>>>> netdev-vport.c checks 'ifdef __linux__' to select the linux based
>>>> offloaded functions, without checking 'ifndef DPDK_NETDEV' as well.
>>>> '__linux__' is a pre-defined compiler macro, indicating that a source
>>>> code is compiled on a linux based system. Any code inside a __linux__
>>>> definition will be excluded on a windows based system, for example.
>>>> 'DPDK_NETDEV' is a macro defined by autoconf tools when configuring
>>>> OVS to be dpdk based as shown in [1].
>>>> Before this commit and in case hw-offload=true - using a vport
>>>> interface with a dpdk based OVS daemon running on a linux machine
>>>> resulted in an error since the vport linux based offloaded APIs were
>>>> called instead of returning EOPNOTSUPP. Luckily the linux offloaded
>>>> API returned immediately on a get_ifindex() failure, which caused no
>>>> harm. An example of the failure message is shown in [2].
>>>>
>>>> [1]
>>>> configure --with-dpdk=/
>>>>
>>>> [2]
>>>> ovs|2|netdev_tc_offloads(dp_netdev_flow_5)|ERR|flow_put:
>failed
>>> to
>>>> get ifindex for vxlan_sys_4789: No such device
>>>>
>>>> Fixes: 01b257860c89 ("netdev-vport: Use common offloads interface")
>>>> Signed-off-by: Ophir Munk 
>>>> ---
>>>
>>> Hi.
>>>
>>> We already discussed this with Roni some time ago. You can't just disable
>>> vport offloading on compile time. Compiling with DPDK support doesn't
>>> mean that it will be used only with userspace datapath. DPDK support is
>just
>>> an additional feature. You could still use kernel datapath and even use
>kernel
>>> and userpace datapaths at the same time under control of the same ovs-
>>> vswitchd process. If those error messages are annoying,
>>
>> The main issue I am concerned with is that unexpected/unplanned kernel
>targeted code is executed
>> as part of dpdk run. It may now end with annoying messages, but I think
>kernel code should be avoided
>> in the first place in case of dpdk datapath. I will send a new patch to 
>> address
>it.
>>
>>> you need to
>>> *check* if offloading supported by *each particular netdev-vport in
>>> runtime*, probably based on the datapath type they assigned to.
>>
>> Thank you for this advice. Do you have pointers into the code where such
>checks occur?
>
>The simplest way is to duplicate vport types stripping the offloading
>and update 'dpif_netdev_port_open_type' to return names of duplicated
>ones.
>It probably be good to add '-user' suffix like 'geneve-user' or 'erspan-user'.
>
>I thought that Roni already has some solution for his tunneling offload
>patches.
>You may ask how he done that.

Thanks Ilya, I did a patch but haven't submit it yet.
I did it by changing the pointers according to the bridge type the port was 
added to, I didn't duplicate.
Will discuss with Ophir.
Thanks

>
>>
>>> In case of
>>> simultaneous running of different datapaths some vports will have backing
>>> linux devices and some will not.
>>>
>>> Compiling out offloading based on enabling of DPDK support will just break
>>> offloading for kernel datapath. At least, this will cause issues for distros
>that
>>> will have to decide if they want DPDK support or vport offloading in kernel.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>>>  lib/netdev-vport.c | 8 
>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c index
>>>> 808a43f..5ba7455 100644
>>>> --- a/lib/netdev-vport.c
>>&g

Re: [ovs-dev] [PATCH] netdev-rte-offloads: Fix printing masks with wrong byte order.

2019-03-31 Thread Roni Bar Yanai

Looks good. 

> -Original Message-
> From: Ilya Maximets 
> Sent: Tuesday, March 26, 2019 2:43 PM
> To: ovs-dev@openvswitch.org; Ian Stokes 
> Cc: Ophir Munk ; Roni Bar Yanai
> ; Ilya Maximets 
> Subject: [PATCH] netdev-rte-offloads: Fix printing masks with wrong byte
> order.
> 
> 'spec's and 'mask's should be printed in a same byte order.
> 
> Fixes: daf90186e291 ("netdev-dpdk: add debug for rte flow patterns")
> Signed-off-by: Ilya Maximets 
> ---
>  lib/netdev-rte-offloads.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/netdev-rte-offloads.c b/lib/netdev-rte-offloads.c
> index b945b3243..e9ab08624 100644
> --- a/lib/netdev-rte-offloads.c
> +++ b/lib/netdev-rte-offloads.c
> @@ -146,7 +146,7 @@ dump_flow_pattern(struct rte_flow_item *item)
>"type=0x%04"PRIx16"\n",
>ETH_ADDR_BYTES_ARGS(eth_mask->src.addr_bytes),
>ETH_ADDR_BYTES_ARGS(eth_mask->dst.addr_bytes),
> -  eth_mask->type);
> +  ntohs(eth_mask->type));
>  } else {
>  ds_put_cstr(, "  Mask = null\n");
>  }
> @@ -224,8 +224,8 @@ dump_flow_pattern(struct rte_flow_item *item)
>  ds_put_format(,
>"  Mask: src_port=0x%"PRIx16
>", dst_port=0x%"PRIx16"\n",
> -  udp_mask->hdr.src_port,
> -  udp_mask->hdr.dst_port);
> +  ntohs(udp_mask->hdr.src_port),
> +  ntohs(udp_mask->hdr.dst_port));
>  } else {
>  ds_put_cstr(, "  Mask = null\n");
>  }
> @@ -248,8 +248,8 @@ dump_flow_pattern(struct rte_flow_item *item)
>  ds_put_format(,
>"  Mask: src_port=0x%"PRIx16
>", dst_port=0x%"PRIx16"\n",
> -  sctp_mask->hdr.src_port,
> -  sctp_mask->hdr.dst_port);
> +  ntohs(sctp_mask->hdr.src_port),
> +  ntohs(sctp_mask->hdr.dst_port));
>  } else {
>  ds_put_cstr(, "  Mask = null\n");
>  }
> @@ -299,8 +299,8 @@ dump_flow_pattern(struct rte_flow_item *item)
>  ds_put_format(,
>"  Mask: src_port=%"PRIx16", dst_port=%"PRIx16
>", data_off=0x%"PRIx8", tcp_flags=0x%"PRIx8"\n",
> -  tcp_mask->hdr.src_port,
> -  tcp_mask->hdr.dst_port,
> +  ntohs(tcp_mask->hdr.src_port),
> +  ntohs(tcp_mask->hdr.dst_port),
>tcp_mask->hdr.data_off,
>tcp_mask->hdr.tcp_flags);
>  } else {
> --
> 2.17.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] netdev-rte-offloads: Add thread-safety notes.

2019-03-21 Thread Roni Bar Yanai

Looks good.

> -Original Message-
> From: Ilya Maximets 
> Sent: Wednesday, March 20, 2019 1:15 PM
> To: ovs-dev@openvswitch.org; Ian Stokes 
> Cc: Flavio Leitner ; Ophir Munk
> ; Kevin Traynor ; Roni Bar
> Yanai ; Finn Christensen ; Asaf
> Penso ; Ilya Maximets 
> Subject: [PATCH] netdev-rte-offloads: Add thread-safety notes.
> 
> DPDK_FLOW_OFFLOAD_API is not safe in a variety of ways.
> This should be documented.
> 
> Signed-off-by: Ilya Maximets 
> ---
> 
> Wording/spelling suggestions are welcome.
> 
>  lib/netdev-rte-offloads.h | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/lib/netdev-rte-offloads.h b/lib/netdev-rte-offloads.h
> index 3f7d15f45..18c8a7558 100644
> --- a/lib/netdev-rte-offloads.h
> +++ b/lib/netdev-rte-offloads.h
> @@ -25,6 +25,23 @@ struct nlattr;
>  struct offload_info;
>  struct dpif_flow_stats;
> 
> +/* Thread-safety
> + * =
> + *
> + * Below API is NOT thread safe in following terms:
> + *
> + *  - The caller must be sure that none of these functions will be called
> + *simultaneously.  Even for different 'netdev's.
> + *
> + *  - The caller must be sure that 'netdev' will not be
> destructed/deallocated.
> + *
> + *  - The caller must be sure that 'netdev' configuration will not be 
> changed.
> + *For example, simultaneous call of 'netdev_reconfigure()' for the same
> + *'netdev' is forbidden.
> + *
> + * For current implementation all above restrictions could be fulfilled by
> + * taking the datapath 'port_mutex' in lib/dpif-netdev.c.  */
> +
>  int netdev_rte_offloads_flow_put(struct netdev *netdev, struct match
> *match,
>   struct nlattr *actions, size_t actions_len,
>   const ovs_u128 *ufid,
> --
> 2.17.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v1] dpif: Add marked packets stats

2019-03-12 Thread Roni Bar Yanai




> -Original Message-
> From: Ilya Maximets 
> Sent: Tuesday, March 12, 2019 5:25 PM
> To: Roni Bar Yanai ; Ophir Munk
> ; ovs-dev@openvswitch.org
> Cc: Asaf Penso ; Ian Stokes ;
> Shahaf Shuler ; Olga Shern
> ; Kevin Traynor 
> Subject: Re: [PATCH v1] dpif: Add marked packets stats
> 
> On 12.03.2019 17:53, Roni Bar Yanai wrote:
> >
> >
> >> -Original Message-
> >> From: Ilya Maximets 
> >> Sent: Tuesday, March 12, 2019 11:11 AM
> >> To: Ophir Munk ; ovs-dev@openvswitch.org
> >> Cc: Asaf Penso ; Ian Stokes
> ;
> >> Shahaf Shuler ; Olga Shern
> >> ; Kevin Traynor ; Roni Bar
> >> Yanai 
> >> Subject: Re: [PATCH v1] dpif: Add marked packets stats
> >>
> >> On 12.03.2019 11:41, Ophir Munk wrote:
> >>> This commit adds marked packets statistics. Following commit [1],
> >>> received packets can be marked with a mark id which is associated
> >>> with the flow on which the packets were recieved. This commits counts
> >>> the marked packets per flow and displays the result when executing
> >>> datapath dump-flow command  with a "-m" (more verbosity) option, as
> >>> shown in [2].
> >>> Packets are marked only when hw-offload option is enabled and only on
> >>> some netdev classes such as netdev-dpdk. In all other cases marked
> >>> packets are not counted and are not displayed when executing the
> >> command
> >>> in [2].
> >>>
> >>> [1]
> >>> Commit 241bad15d9 ("dpif-netdev: associate flow with a mark id")
> >>>
> >>> [2]
> >>> ovs-appctl dpif/dump-flows -m 
> >>>
> >>> Signed-off-by: Ophir Munk 
> >>> ---
> >>
> >> Hi Ophir.
> >>
> >> Thanks for working on this, but I don't think that flow stats is the right
> >> place to expose information like that. We already have pmd-stats/perf-
> show
> >> appctl calls that prints various hit stats for PMD threads. And, I think,
> >> 'flow mark hits' or something similar should be one of these stats.
> >> Right now flow mark hits are not accounted there and that is the issue.
> >>
> >> Regarding the flow stats, we probably need to expose status of the flow,
> >> i.e. if it was offloaded or not, like it done for tc offloading.
> >> Right now all the flows reported by dpif-netdev as not offloaded.
> >> See 'flow->attrs.offloaded = false;' in dp_netdev_flow_to_dpif_flow().
> >> And we probably need to change this from boolean to some
> 'yes/no/partial'.
> >>
> >> BTW, your current implementation assumes that packets had flow mark if
> >> 'flow->mark' is set, but this is could be not true, because packets could
> >> arrive while offloading is in progress or offloading could be started
> >> between packets arrival and finishing of their processing.
> >>
> >> Best regards, Ilya Maximets.
> >
> > Hi Ilya,
> >
> > We agree that flow should be marked as offloaded yes/no/partial, but
> seems this is not enough.
> > Like you said, since hardware offload is executed in a different thread, it
> might be that the number
> > of packets that were marked/handled by hardware is different than the
> total packets of the flow.
> > If there is a load on the offload or just high latency, it might be that in 
> > some
> cases the difference will be significant.
> > Although I agree that you will probably be able to see the problem In the
> pmd-stats (assuming mark will be added there),
> > ,it will be less clear. For example, there are probably flows that cannot be
> offloaded at all so you won't expect
> > mark count for them. Another example is a bug, we say that the flow was
> offloaded but in fact it is not.
> > Of course you should expect the code to be tested, but the field is always
> different.
> > Maybe we can add it we special flag for hardware offload specific
> information.
> 
> Not sure if I understand your point correctly, but isn't it expected that
> if flow is offloaded than all the packets through the flow had flow mark?
> In case of a bug it's most likely that it happened after the flow mark
> association and your flow stats will be wrong anyway.
> The only trusted source of the number of "offloaded" packets could be the
> number of real packets that had flow mark set on receive. And these stats
> should be displayed by PMD stats.
> 
> If everything works as expected, all the packets for offloaded flow will be
> marked

Re: [ovs-dev] [PATCH v1] dpif: Add marked packets stats

2019-03-12 Thread Roni Bar Yanai




> -Original Message-
> From: Ilya Maximets 
> Sent: Tuesday, March 12, 2019 11:11 AM
> To: Ophir Munk ; ovs-dev@openvswitch.org
> Cc: Asaf Penso ; Ian Stokes ;
> Shahaf Shuler ; Olga Shern
> ; Kevin Traynor ; Roni Bar
> Yanai 
> Subject: Re: [PATCH v1] dpif: Add marked packets stats
> 
> On 12.03.2019 11:41, Ophir Munk wrote:
> > This commit adds marked packets statistics. Following commit [1],
> > received packets can be marked with a mark id which is associated
> > with the flow on which the packets were recieved. This commits counts
> > the marked packets per flow and displays the result when executing
> > datapath dump-flow command  with a "-m" (more verbosity) option, as
> > shown in [2].
> > Packets are marked only when hw-offload option is enabled and only on
> > some netdev classes such as netdev-dpdk. In all other cases marked
> > packets are not counted and are not displayed when executing the
> command
> > in [2].
> >
> > [1]
> > Commit 241bad15d9 ("dpif-netdev: associate flow with a mark id")
> >
> > [2]
> > ovs-appctl dpif/dump-flows -m 
> >
> > Signed-off-by: Ophir Munk 
> > ---
> 
> Hi Ophir.
> 
> Thanks for working on this, but I don't think that flow stats is the right
> place to expose information like that. We already have pmd-stats/perf-show
> appctl calls that prints various hit stats for PMD threads. And, I think,
> 'flow mark hits' or something similar should be one of these stats.
> Right now flow mark hits are not accounted there and that is the issue.
> 
> Regarding the flow stats, we probably need to expose status of the flow,
> i.e. if it was offloaded or not, like it done for tc offloading.
> Right now all the flows reported by dpif-netdev as not offloaded.
> See 'flow->attrs.offloaded = false;' in dp_netdev_flow_to_dpif_flow().
> And we probably need to change this from boolean to some 'yes/no/partial'.
> 
> BTW, your current implementation assumes that packets had flow mark if
> 'flow->mark' is set, but this is could be not true, because packets could
> arrive while offloading is in progress or offloading could be started
> between packets arrival and finishing of their processing.
> 
> Best regards, Ilya Maximets.

Hi Ilya, 

We agree that flow should be marked as offloaded yes/no/partial, but seems this 
is not enough.
Like you said, since hardware offload is executed in a different thread, it 
might be that the number
of packets that were marked/handled by hardware is different than the total 
packets of the flow. 
If there is a load on the offload or just high latency, it might be that in 
some cases the difference will be significant.
Although I agree that you will probably be able to see the problem In the 
pmd-stats (assuming mark will be added there),
,it will be less clear. For example, there are probably flows that cannot be 
offloaded at all so you won't expect
mark count for them. Another example is a bug, we say that the flow was 
offloaded but in fact it is not. 
Of course you should expect the code to be tested, but the field is always 
different.
Maybe we can add it we special flag for hardware offload specific information.

Thanks
> 
> >  lib/dpif-netdev.c  | 16 
> >  lib/dpif.c |  9 +
> >  lib/dpif.h |  3 +++
> >  ofproto/ofproto-dpif.c |  4 
> >  4 files changed, 32 insertions(+)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 4d6d0c3..ef2f8f1 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -486,6 +486,8 @@ struct dp_netdev_flow_stats {
> >  atomic_ullong packet_count;/* Number of packets matched. */
> >  atomic_ullong byte_count;  /* Number of bytes matched. */
> >  atomic_uint16_t tcp_flags; /* Bitwise-OR of seen tcp_flags values. 
> > */
> > +/* Offload stats. */
> > +atomic_ullong mark_count;  /* Number of marked packets matched.
> */
> >  };
> >
> >  /* A flow in 'dp_netdev_pmd_thread's 'flow_table'.
> > @@ -3008,6 +3010,9 @@ get_dpif_flow_stats(const struct
> dp_netdev_flow *netdev_flow_,
> >  stats->used = used;
> >  atomic_read_relaxed(_flow->stats.tcp_flags, );
> >  stats->tcp_flags = flags;
> > +/* Offload stats. */
> > +atomic_read_relaxed(_flow->stats.mark_count, );
> > +stats->n_marked = n;
> >  }
> >
> >  /* Converts to the dpif_flow format, using 'key_buf' and 'mask_buf' for
> > @@ -6124,6 +6129,15 @@ dp_netdev_flow_used(struct dp_netdev_flow
> *netdev_flow, int cnt, int size,
> >  atomic_store_relaxed(_flow-&

Re: [ovs-dev] [PATCH v2 2/3] netdev-dpdk: Move offloading-code to a new file

2019-02-26 Thread Roni Bar Yanai




> -Original Message-
> From: Ilya Maximets 
> Sent: Tuesday, February 26, 2019 4:38 PM
> To: Roni Bar Yanai ; Ophir Munk
> ; ovs-dev@openvswitch.org
> Cc: Ian Stokes ; Olga Shern ;
> Kevin Traynor ; Asaf Penso ;
> Flavio Leitner 
> Subject: Re: [PATCH v2 2/3] netdev-dpdk: Move offloading-code to a new
> file
> 
> On 21.02.2019 19:31, Roni Bar Yanai wrote:
> >
> >
> >> -Original Message-
> >> From: Ilya Maximets 
> >> Sent: Thursday, February 21, 2019 6:20 PM
> >> To: Ophir Munk ; ovs-dev@openvswitch.org
> >> Cc: Ian Stokes ; Olga Shern
> ;
> >> Kevin Traynor ; Asaf Penso
> ;
> >> Roni Bar Yanai ; Flavio Leitner
> 
> >> Subject: Re: [PATCH v2 2/3] netdev-dpdk: Move offloading-code to a new
> >> file
> >>
> >> At first, 'git am' fails to apply the patch:
> >>
> >> Applying: netdev-dpdk: Move offloading-code to a new file
> >> error: patch failed: lib/netdev-dpdk.c:4240
> >> error: lib/netdev-dpdk.c: patch does not apply
> >> .git/rebase-apply/patch:1564: new blank line at EOF.
> >> +
> >> Patch failed at 0002 netdev-dpdk: Move offloading-code to a new file
> >>
> >> I'm not sure why it complains about 'lib/netdev-dpdk.c' because the
> >> issue is the blanlk line at the end of netdev-rte-offloads.c.
> >>
> >> On 21.02.2019 15:07, Ophir Munk wrote:
> >>> From: Roni Bar Yanai 
> >>>
> >>> Hardware offloading-code is moved to a new file called
> >>
> >> Just curious, why you writing "offloading-code" as a single dash
> >> separated word?
> >>
> >>> netdev-rte-offloads.c. The original offloading-code is copied as is
> >>
> >> IMHO, it's a good time to make style changes in the code, because
> >> current functions doesn't follow a lot of OVS coding style guidelines
> >> and we're touching them anyway.
> >>
> >
> > I agree it is a good time, but I think it should be done on a separate 
> > commit,
> because if
> > something breaks (although all are style changes), it will be hard to find 
> > the
> changes on git log. The entire file
> > Will be + .
> 
> You already made mistakes while rebasing the copy-pasted code.
> Thorough review is required anyway. Moving of few braces will
> not change anything. 'sizeof' related changes are mostly obvious.
> 
> If you wish, I could handle 'n_rxq' usage update in a separate
> patch as it's not really a style change.
> 
Ok. Will fix all on V3.
Thanks.
 > >
> >
> >>> from the netdev-dpdk.c file to the new file, where future
> >>> offloading-code should be added as well. The netdev-dpdk.c file will
> >>> remain unchanged as new offloading-code is added.
> >>>
> >>> Reviewed-by: Asaf Penso 
> >>> Signed-off-by: Roni Bar Yanai 
> >>> Signed-off-by: Ophir Munk 
> >>> ---
> >>>  lib/automake.mk   |   4 +-
> >>>  lib/netdev-dpdk.c | 727 
> >>> +--
> >>>  lib/netdev-rte-offloads.c | 775
> >> ++
> >>>  lib/netdev-rte-offloads.h |  38 +++
> >>>  4 files changed, 817 insertions(+), 727 deletions(-)
> >>>  create mode 100644 lib/netdev-rte-offloads.c
> >>>  create mode 100644 lib/netdev-rte-offloads.h
> >>>
> >>> diff --git a/lib/automake.mk b/lib/automake.mk
> >>> index bae032b..cc5dccf 100644
> >>> --- a/lib/automake.mk
> >>> +++ b/lib/automake.mk
> >>> @@ -138,6 +138,7 @@ lib_libopenvswitch_la_SOURCES = \
> >>>   lib/netdev-dpdk.h \
> >>>   lib/netdev-dummy.c \
> >>>   lib/netdev-provider.h \
> >>> + lib/netdev-rte-offloads.h \
> >>>   lib/netdev-vport.c \
> >>>   lib/netdev-vport.h \
> >>>   lib/netdev-vport-private.h \
> >>> @@ -411,7 +412,8 @@ endif
> >>>  if DPDK_NETDEV
> >>>  lib_libopenvswitch_la_SOURCES += \
> >>>   lib/dpdk.c \
> >>> - lib/netdev-dpdk.c
> >>> + lib/netdev-dpdk.c \
> >>> + lib/netdev-rte-offloads.c
> >>>  else
> >>>  lib_libopenvswitch_la_SOURCES += \
> >>>   lib/dpdk-stub.c
> >>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >>> index 163d4ec..c4228d2 100644
> >>> --- a/lib/netdev-dpdk.c
> >>> +++ b/lib/netdev-dpdk.c
> >>> @@ -47,6 +47,7

Re: [ovs-dev] [PATCH v2 1/3] netdev-dpdk: Expose flow creation/destruction calls

2019-02-22 Thread Roni Bar Yanai




> -Original Message-
> From: Ilya Maximets 
> Sent: Friday, February 22, 2019 1:26 PM
> To: Roni Bar Yanai ; Ophir Munk
> ; ovs-dev@openvswitch.org
> Cc: Ian Stokes ; Olga Shern ;
> Kevin Traynor ; Asaf Penso ;
> Flavio Leitner 
> Subject: Re: [PATCH v2 1/3] netdev-dpdk: Expose flow creation/destruction
> calls
> 
> On 21.02.2019 19:37, Roni Bar Yanai wrote:
> >
> >
> >> -Original Message-
> >> From: Ilya Maximets 
> >> Sent: Thursday, February 21, 2019 5:27 PM
> >> To: Ophir Munk ; ovs-dev@openvswitch.org
> >> Cc: Ian Stokes ; Olga Shern
> ;
> >> Kevin Traynor ; Asaf Penso
> ;
> >> Roni Bar Yanai ; Flavio Leitner
> 
> >> Subject: Re: [PATCH v2 1/3] netdev-dpdk: Expose flow
> creation/destruction
> >> calls
> >>
> >> On 21.02.2019 15:07, Ophir Munk wrote:
> >>> From: Roni Bar Yanai 
> >>>
> >>> Before offloading-code was added to the netdev-dpdk.c file (MARK and
> >>> RSS actions) the only DPDK RTE calls in use were rte_flow_create() and
> >>> rte_flow_destroy(). In preparation for splitting the offloading-code
> >>> from the netdev-dpdk.c file to a separate file, it is required
> >>> to embed these RTE calls into a global netdev-dpdk-* API so that
> >>> they can be called from the new file. An example for this requirement
> >>> can be seen in the handling of dpdk_mutex, which should be
> encapsulated
> >>
> >> You probably meant dev->mutex.
> >>
> >>> inside netdev-dpdk class (netdev-dpdk.c file), and should be unknown
> >>> to the outside callers. This commit embeds the rte_flow_create() call
> >>> inside the netdev_dpdk_flow_create() API and the rte_flow_destroy()
> >>> call inside the netdev_dpdk_rte_flow_destroy() API.
> >>>
> >>> Reviewed-by: Asaf Penso 
> >>> Signed-off-by: Roni Bar Yanai 
> >>> Signed-off-by: Ophir Munk 
> >>> ---
> >>>  lib/netdev-dpdk.c | 51
> >> +++
> >>>  lib/netdev-dpdk.h | 14 ++
> >>>  2 files changed, 53 insertions(+), 12 deletions(-)
> >>>
> >>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >>> index 32a6ffd..163d4ec 100644
> >>> --- a/lib/netdev-dpdk.c
> >>> +++ b/lib/netdev-dpdk.c
> >>> @@ -4203,6 +4203,42 @@ unlock:
> >>>  return err;
> >>>  }
> >>>
> >>> +int
> >>> +netdev_dpdk_rte_flow_destroy(struct netdev *netdev,
> >>> + struct rte_flow *rte_flow,
> >>> + struct rte_flow_error *error)
> >>> +{
> >>> +if (!is_dpdk_class(netdev->netdev_class)) {
> >>> +return -1;
> >>> +}
> >>
> >> Not sure if this check is needed, because we're registering
> >> this offload API only for DPDK devices. However, it's not
> >> correct anyway, because 'is_dpdk_class' will return 'true'
> >> for vhost netdevs which are not rte_ethdev devices, i.e. has
> >> no offloading API.
> >>
> >>> +
> >>> +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> >>> +int ret;
> >>> +
> >>> +ovs_mutex_lock(>mutex);
> >>> +ret = rte_flow_destroy(dev->port_id, rte_flow, error);
> >>> +ovs_mutex_unlock(>mutex);
> >>> +return ret;
> >>> +}
> >>> +
> >>> +struct rte_flow *
> >>> +netdev_dpdk_rte_flow_create(struct netdev *netdev,
> >>> +const struct rte_flow_attr *attr,
> >>> +const struct rte_flow_item *items,
> >>> +const struct rte_flow_action *actions,
> >>> +struct rte_flow_error *error)
> >>> +{
> >>> +if (!is_dpdk_class(netdev->netdev_class)) {
> >>> +return NULL;
> >>> +}
> >>
> >> Same here.
> >>
> >>> +
> >>> +struct rte_flow *flow;
> >>> +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> >>
> >> It's better to have an empty line between declarations and code.
> >>
> >>> +ovs_mutex_lock(>mutex);
> >>> +flow = rte_flow_create(dev->port_id, attr, items, actions, error);
> >>> +o

Re: [ovs-dev] [PATCH v2 1/3] netdev-dpdk: Expose flow creation/destruction calls

2019-02-21 Thread Roni Bar Yanai




> -Original Message-
> From: Ilya Maximets 
> Sent: Thursday, February 21, 2019 5:27 PM
> To: Ophir Munk ; ovs-dev@openvswitch.org
> Cc: Ian Stokes ; Olga Shern ;
> Kevin Traynor ; Asaf Penso ;
> Roni Bar Yanai ; Flavio Leitner 
> Subject: Re: [PATCH v2 1/3] netdev-dpdk: Expose flow creation/destruction
> calls
> 
> On 21.02.2019 15:07, Ophir Munk wrote:
> > From: Roni Bar Yanai 
> >
> > Before offloading-code was added to the netdev-dpdk.c file (MARK and
> > RSS actions) the only DPDK RTE calls in use were rte_flow_create() and
> > rte_flow_destroy(). In preparation for splitting the offloading-code
> > from the netdev-dpdk.c file to a separate file, it is required
> > to embed these RTE calls into a global netdev-dpdk-* API so that
> > they can be called from the new file. An example for this requirement
> > can be seen in the handling of dpdk_mutex, which should be encapsulated
> 
> You probably meant dev->mutex.
> 
> > inside netdev-dpdk class (netdev-dpdk.c file), and should be unknown
> > to the outside callers. This commit embeds the rte_flow_create() call
> > inside the netdev_dpdk_flow_create() API and the rte_flow_destroy()
> > call inside the netdev_dpdk_rte_flow_destroy() API.
> >
> > Reviewed-by: Asaf Penso 
> > Signed-off-by: Roni Bar Yanai 
> > Signed-off-by: Ophir Munk 
> > ---
> >  lib/netdev-dpdk.c | 51
> +++
> >  lib/netdev-dpdk.h | 14 ++
> >  2 files changed, 53 insertions(+), 12 deletions(-)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 32a6ffd..163d4ec 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -4203,6 +4203,42 @@ unlock:
> >  return err;
> >  }
> >
> > +int
> > +netdev_dpdk_rte_flow_destroy(struct netdev *netdev,
> > + struct rte_flow *rte_flow,
> > + struct rte_flow_error *error)
> > +{
> > +if (!is_dpdk_class(netdev->netdev_class)) {
> > +return -1;
> > +}
> 
> Not sure if this check is needed, because we're registering
> this offload API only for DPDK devices. However, it's not
> correct anyway, because 'is_dpdk_class' will return 'true'
> for vhost netdevs which are not rte_ethdev devices, i.e. has
> no offloading API.
> 
> > +
> > +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > +int ret;
> > +
> > +ovs_mutex_lock(>mutex);
> > +ret = rte_flow_destroy(dev->port_id, rte_flow, error);
> > +ovs_mutex_unlock(>mutex);
> > +return ret;
> > +}
> > +
> > +struct rte_flow *
> > +netdev_dpdk_rte_flow_create(struct netdev *netdev,
> > +const struct rte_flow_attr *attr,
> > +const struct rte_flow_item *items,
> > +const struct rte_flow_action *actions,
> > +struct rte_flow_error *error)
> > +{
> > +if (!is_dpdk_class(netdev->netdev_class)) {
> > +return NULL;
> > +}
> 
> Same here.
> 
> > +
> > +struct rte_flow *flow;
> > +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> 
> It's better to have an empty line between declarations and code.
> 
> > +ovs_mutex_lock(>mutex);
> > +flow = rte_flow_create(dev->port_id, attr, items, actions, error);
> > +ovs_mutex_unlock(>mutex);
> > +return flow;
> > +}
> >
> >  /* Find rte_flow with @ufid */
> >  static struct rte_flow *
> > @@ -4554,7 +4590,6 @@ netdev_dpdk_add_rte_flow_offload(struct
> netdev *netdev,
> >   size_t actions_len OVS_UNUSED,
> >   const ovs_u128 *ufid,
> >   struct offload_info *info) {
> > -struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> >  const struct rte_flow_attr flow_attr = {
> >  .group = 0,
> >  .priority = 0,
> > @@ -4759,15 +4794,12 @@ end_proto_check:
> >  mark.id = info->flow_mark;
> >  add_flow_action(, RTE_FLOW_ACTION_TYPE_MARK, );
> >
> > -ovs_mutex_lock(>mutex);
> >
> >  rss = add_flow_rss_action(, netdev);
> 
Just wondering what will happen if rss action is added with current n_rxq, and 
immediately after
there is a reconfiguration. The result will be exactly the same, rss flow has 
wrong
configuration. The lock doesn't solve this issue.

> This doesn't look right. 'add_flow_rss

Re: [ovs-dev] [PATCH v2 2/3] netdev-dpdk: Move offloading-code to a new file

2019-02-21 Thread Roni Bar Yanai




> -Original Message-
> From: Ilya Maximets 
> Sent: Thursday, February 21, 2019 6:20 PM
> To: Ophir Munk ; ovs-dev@openvswitch.org
> Cc: Ian Stokes ; Olga Shern ;
> Kevin Traynor ; Asaf Penso ;
> Roni Bar Yanai ; Flavio Leitner 
> Subject: Re: [PATCH v2 2/3] netdev-dpdk: Move offloading-code to a new
> file
> 
> At first, 'git am' fails to apply the patch:
> 
> Applying: netdev-dpdk: Move offloading-code to a new file
> error: patch failed: lib/netdev-dpdk.c:4240
> error: lib/netdev-dpdk.c: patch does not apply
> .git/rebase-apply/patch:1564: new blank line at EOF.
> +
> Patch failed at 0002 netdev-dpdk: Move offloading-code to a new file
> 
> I'm not sure why it complains about 'lib/netdev-dpdk.c' because the
> issue is the blanlk line at the end of netdev-rte-offloads.c.
> 
> On 21.02.2019 15:07, Ophir Munk wrote:
> > From: Roni Bar Yanai 
> >
> > Hardware offloading-code is moved to a new file called
> 
> Just curious, why you writing "offloading-code" as a single dash
> separated word?
> 
> > netdev-rte-offloads.c. The original offloading-code is copied as is
> 
> IMHO, it's a good time to make style changes in the code, because
> current functions doesn't follow a lot of OVS coding style guidelines
> and we're touching them anyway.
> 

I agree it is a good time, but I think it should be done on a separate commit, 
because if 
something breaks (although all are style changes), it will be hard to find the 
changes on git log. The entire file
Will be + . 
 

> > from the netdev-dpdk.c file to the new file, where future
> > offloading-code should be added as well. The netdev-dpdk.c file will
> > remain unchanged as new offloading-code is added.
> >
> > Reviewed-by: Asaf Penso 
> > Signed-off-by: Roni Bar Yanai 
> > Signed-off-by: Ophir Munk 
> > ---
> >  lib/automake.mk   |   4 +-
> >  lib/netdev-dpdk.c | 727 +--
> >  lib/netdev-rte-offloads.c | 775
> ++
> >  lib/netdev-rte-offloads.h |  38 +++
> >  4 files changed, 817 insertions(+), 727 deletions(-)
> >  create mode 100644 lib/netdev-rte-offloads.c
> >  create mode 100644 lib/netdev-rte-offloads.h
> >
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index bae032b..cc5dccf 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -138,6 +138,7 @@ lib_libopenvswitch_la_SOURCES = \
> > lib/netdev-dpdk.h \
> > lib/netdev-dummy.c \
> > lib/netdev-provider.h \
> > +   lib/netdev-rte-offloads.h \
> > lib/netdev-vport.c \
> > lib/netdev-vport.h \
> > lib/netdev-vport-private.h \
> > @@ -411,7 +412,8 @@ endif
> >  if DPDK_NETDEV
> >  lib_libopenvswitch_la_SOURCES += \
> > lib/dpdk.c \
> > -   lib/netdev-dpdk.c
> > +   lib/netdev-dpdk.c \
> > +   lib/netdev-rte-offloads.c
> >  else
> >  lib_libopenvswitch_la_SOURCES += \
> > lib/dpdk-stub.c
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 163d4ec..c4228d2 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -47,6 +47,7 @@
> >  #include "dpif-netdev.h"
> >  #include "fatal-signal.h"
> >  #include "netdev-provider.h"
> > +#include "netdev-rte-offloads.h"
> >  #include "netdev-vport.h"
> >  #include "odp-util.h"
> >  #include "openvswitch/dynamic-string.h"
> > @@ -179,17 +180,6 @@ static const struct rte_eth_conf port_conf = {
> >  };
> >
> >  /*
> > - * A mapping from ufid to dpdk rte_flow.
> > - */
> > -static struct cmap ufid_to_rte_flow = CMAP_INITIALIZER;
> > -
> > -struct ufid_to_rte_flow_data {
> > -struct cmap_node node;
> > -ovs_u128 ufid;
> > -struct rte_flow *rte_flow;
> > -};
> > -
> > -/*
> >   * These callbacks allow virtio-net devices to be added to vhost ports when
> >   * configuration has been fully completed.
> >   */
> > @@ -4240,721 +4230,6 @@ netdev_dpdk_rte_flow_create(struct netdev
> *netdev,
> >  return flow;
> >  }
> >
> > -/* Find rte_flow with @ufid */
> > -static struct rte_flow *
> > -ufid_to_rte_flow_find(const ovs_u128 *ufid) {
> > -size_t hash = hash_bytes(ufid, sizeof(*ufid), 0);
> > -struct ufid_to_rte_flow_data *data;
> > -
> > -CMAP_FOR_EACH_WITH_HASH (data, node, hash, _to_rte_flow) {
> > -if (ovs_u128_equals(*ufid, data->ufid)) {
> > -return data->rte_flow;

Re: [ovs-dev] [PATCH v1 0/3] Move offloading-code into a new file

2019-02-21 Thread Roni Bar Yanai




> -Original Message-
> From: Flavio Leitner 
> Sent: Thursday, February 21, 2019 1:40 PM
> To: Ophir Munk 
> Cc: Ilya Maximets ; ovs-dev@openvswitch.org;
> Ian Stokes ; Olga Shern ;
> Kevin Traynor ; Asaf Penso ;
> Roni Bar Yanai ; Shahaf Shuler
> ; Finn Christensen 
> Subject: Re: [PATCH v1 0/3] Move offloading-code into a new file
> 
> On Wed, Feb 20, 2019 at 10:25:45PM +, Ophir Munk wrote:
> > If we wanted to switch to a multi-threaded offload model today- we could
> have done it in two steps:
> > 1. Implement  a multi-threaded model on current master branch.
> > 2. Do the split.
> >
> > The two steps  seem independent. The split is rather technical and there
> are ways to share/pass locks between two files if needed.
> > I don't think that splitting now can limit any future plans to implement a
> multi-threaded offload.
> >
> > What do you think?
> 
> 
> Do we really need multi-threaded model? I had a hope that this low
> transfer rate to update flows in the NIC would be solved by the HW
> itself.
> 
> fbl
> 
You will need multi-thread only if OVS is the bottleneck, this not seems to be 
the case.
I think this is a HW bug/limitation that should be solved there like you said.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v1 0/3] Move offloading-code into a new file

2019-02-20 Thread Roni Bar Yanai

Ilya, this is what I think.
I think there are two points here. First is the atomicity of offloading flows 
and the second is concurrency in offloading on dpif.
Regarding the first point, at this point flows are logically atomic, so if for 
example you are offloading with two threads the lock
Is required only for the offload action itself (as it today). Nothing breaks if 
you add from one thread and then from another..
Of course, if that the case the netdev-rte-offloads.c should be thread safe, 
and also the offload itself should keep flows on same
thread so we won't get out of order for same flow (del, add).
Regarding the seconds point, I think that concurrency should be used only if 
OVS is the bottleneck.  The number of flows is a function
of where the OVS is used. If OVS is used as VNF load balancer, than maybe few 
rules will do the job. In case of a cloud for example
you might have hundreds of thousands of flows, and update rate of thousands per 
second. In many cases, if the latency can get to seconds, by
the time the flow will be offloaded, the flow will probably already has ended. 
In this case there is also a risk of memory leak, as rules rate is much higher
than apply rate, and rules will pileup in the queue. I think that depending on 
the scenario there is a max latency requirement. 
It is up to the driver owner to reach that latency, and it can do the 
optimization in the driver or firmware.
If we already raised this issues, I think that queue size for offload thread 
should be metered some how to avoid memory leak, and also applying rules 
too late. 

Saying that, we can expose dpdk_port and add some acquire/release to the api in 
the future if needed. I think that the netde-rte-offloads.c module
uses netdev-dpdk module, same way as RTE_FLOW uses dpdk_port. RTE_FLOW is a 
framework on top of DPDK.

BR,
Roni 

> -Original Message-
> From: Ilya Maximets 
> Sent: Wednesday, February 20, 2019 7:06 PM
> To: Ophir Munk ; ovs-dev@openvswitch.org
> Cc: Ian Stokes ; Olga Shern ;
> Kevin Traynor ; Asaf Penso ;
> Roni Bar Yanai ; Shahaf Shuler
> ; Flavio Leitner ; Finn
> Christensen 
> Subject: Re: [PATCH v1 0/3] Move offloading-code into a new file
> 
> One general concern to think about:
> 
> This code split introduces few helper functions that resides in netdev-dpdk
> and calls dpdk API by the requests of other modules (netdev-rte-offloads).
> This strategy will not allow us to lock the device for performing several
> operations atomically because all the locks remains in netdev-dpdk.
> While we have a single thread that handles all the offloading in dpif-netdev
> it's not an issue. But I'm wondering, what will happen if we'll decide to
> use offload API concurrently ? What will happen if we'll need to perform
> few operations on the same device atomically.
> 
> I'm not sure if we'll need this or not. Maybe someone has ideas or knows
> such scenarios that will require atomicity ?
> Is there need of changing current single-thread model to multi-thread ?
> 
> In fact that some NICs are not fast enough in updating flow tables
> (like cxgbe that could take up to 10 seconds), having multiple offloading
> threads might be a good idea.
> 
> Best regards, Ilya Maximets.
> 
> 
> On 18.02.2019 19:16, Ophir Munk wrote:
> > Hardware offloading-code is moved to a new file called
> > netdev-rte-offloads.c. The original offloading-code is copied as is
> > from the netdev-dpdk.c file to the new file where future
> > offloading-code should be added as well.
> >
> > This series is essential for offloading-code development for the following
> > reasons:
> > 1. This series does not change the existing OVS code flows/logic on master
> branch.
> > OVS functionality is the same before and after this series.
> > 2. The separation is essential for new offloading-code
> > development without interfering with the rest of OVS development.
> > 3. Vice versa: it is essential that while developing offloading-code -
> > to be able to frequently rebase on top of master branch.
> > 4. The OVS-kernel is practicing the same approach. Please note the file
> lib/netdev-tc-offloads.c.
> >
> > Based on the points mentioned above we kindly ask that the series will be
> > applied on top of the master branch.
> >
> > Ophir Munk (1):
> >   netdev-rte-offloads: Rename netdev_dpdk_* functions
> >
> > Roni Bar Yanai (2):
> >   netdev-dpdk: Expose flow creation/destruction calls
> >   netdev-dpdk: Move offloading-code to a new file
> >
> >  lib/automake.mk   |   4 +-
> >  lib/netdev-dpdk.c | 764 
> > ++---
> >  lib/netdev-dpdk.h |  14 +
> >  lib/netdev-rte-offloads.c | 778
> ++

38 matches

Mail list logo