Re: [ovs-dev] [PATCH v1] netdev-vport: Check DPDK_NETDEV for offloaded API.

2019-04-10 Thread Ilya Maximets
On 10.04.2019 22:56, Roni Bar Yanai wrote:
> 
> 
>> -Original Message-
>> From: Ilya Maximets 
>> Sent: Wednesday, April 10, 2019 5:21 PM
>> To: Ophir Munk ; ovs-dev@openvswitch.org
>> Cc: Ian Stokes ; Olga Shern ;
>> Kevin Traynor ; Asaf Penso ;
>> Roni Bar Yanai ; Paul Blakey ;
>> Roi Dayan ; Flavio Leitner 
>> Subject: Re: [PATCH v1] netdev-vport: Check DPDK_NETDEV for offloaded
>> API.
>>
>> On 10.04.2019 1:30, Ophir Munk wrote:
>>> Hi Ilya,
>>> Please see comments inline
>>>
 -Original Message-
 From: Ilya Maximets 
>>> ...
 On 02.04.2019 12:18, Ophir Munk wrote:
> vport offloaded functions should have a different implementation for
> linux based OVS versus dpdk based OVS. Currently there is only support
> for linux based offloaded API. The code in the file named
> netdev-vport.c checks 'ifdef __linux__' to select the linux based
> offloaded functions, without checking 'ifndef DPDK_NETDEV' as well.
> '__linux__' is a pre-defined compiler macro, indicating that a source
> code is compiled on a linux based system. Any code inside a __linux__
> definition will be excluded on a windows based system, for example.
> 'DPDK_NETDEV' is a macro defined by autoconf tools when configuring
> OVS to be dpdk based as shown in [1].
> Before this commit and in case hw-offload=true - using a vport
> interface with a dpdk based OVS daemon running on a linux machine
> resulted in an error since the vport linux based offloaded APIs were
> called instead of returning EOPNOTSUPP. Luckily the linux offloaded
> API returned immediately on a get_ifindex() failure, which caused no
> harm. An example of the failure message is shown in [2].
>
> [1]
> configure --with-dpdk=/
>
> [2]
> ovs|2|netdev_tc_offloads(dp_netdev_flow_5)|ERR|flow_put:
>> failed
 to
> get ifindex for vxlan_sys_4789: No such device
>
> Fixes: 01b257860c89 ("netdev-vport: Use common offloads interface")
> Signed-off-by: Ophir Munk 
> ---

 Hi.

 We already discussed this with Roni some time ago. You can't just disable
 vport offloading on compile time. Compiling with DPDK support doesn't
 mean that it will be used only with userspace datapath. DPDK support is
>> just
 an additional feature. You could still use kernel datapath and even use
>> kernel
 and userpace datapaths at the same time under control of the same ovs-
 vswitchd process. If those error messages are annoying,
>>>
>>> The main issue I am concerned with is that unexpected/unplanned kernel
>> targeted code is executed
>>> as part of dpdk run. It may now end with annoying messages, but I think
>> kernel code should be avoided
>>> in the first place in case of dpdk datapath. I will send a new patch to 
>>> address
>> it.
>>>
 you need to
 *check* if offloading supported by *each particular netdev-vport in
 runtime*, probably based on the datapath type they assigned to.
>>>
>>> Thank you for this advice. Do you have pointers into the code where such
>> checks occur?
>>
>> The simplest way is to duplicate vport types stripping the offloading
>> and update 'dpif_netdev_port_open_type' to return names of duplicated
>> ones.
>> It probably be good to add '-user' suffix like 'geneve-user' or 
>> 'erspan-user'.
>>
>> I thought that Roni already has some solution for his tunneling offload
>> patches.
>> You may ask how he done that.
> 
> Thanks Ilya, I did a patch but haven't submit it yet.
> I did it by changing the pointers according to the bridge type the port was 
> added to, I didn't duplicate.

There is only one instance of each 'netdev_class'. So, all the netdevs
has same pointer to 'netdev_class'. That's why I mentioned duplicating.
Please, be sure that offloading will work if both datapaths will have
tunneling ports at the same time.

> Will discuss with Ophir.
> Thanks
> 
>>
>>>
 In case of
 simultaneous running of different datapaths some vports will have backing
 linux devices and some will not.

 Compiling out offloading based on enabling of DPDK support will just break
 offloading for kernel datapath. At least, this will cause issues for 
 distros
>> that
 will have to decide if they want DPDK support or vport offloading in 
 kernel.

 Best regards, Ilya Maximets.

>  lib/netdev-vport.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c index
> 808a43f..5ba7455 100644
> --- a/lib/netdev-vport.c
> +++ b/lib/netdev-vport.c
> @@ -47,8 +47,8 @@
>  #include "unaligned.h"
>  #include "unixctl.h"
>  #include "openvswitch/vlog.h"
> +#if defined(__linux__) && !defined(DPDK_NETDEV)
>  #include "netdev-tc-offloads.h"
> -#ifdef __linux__
>  #include "netdev-linux.h"
>  #endif
>
> @@ -1093,7 +1093,7 @@ 

Re: [ovs-dev] datapath: fix flow actions reallocation

2019-04-10 Thread Gregory Rose



On 4/10/2019 4:50 PM, 0-day Robot wrote:

Bleep bloop.  Greetings Greg Rose, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Andrea Righi  needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors or 
committers: Greg Rose 
Lines checked: 77, Warnings: 1, Errors: 1


Please check this out.  If you feel there has been an error, please email 
acon...@bytheb.org

Thanks,
0-day Robot


I saw this when I ran checkpatch but still think it's in the correct 
format.  Since I'm the one who does a lot of
these things I suppose I should just go fix it...  Maybe key off the 
'Upstream commit' keywords.


- Greg
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [branch-2.5 1/2] Set release date for 2.5.8.

2019-04-10 Thread Justin Pettit


> On Apr 10, 2019, at 5:13 PM, Ben Pfaff  wrote:
> 
> On Sat, Mar 30, 2019 at 12:53:25PM -0700, Justin Pettit wrote:
>> Signed-off-by: Justin Pettit 
> 
> Acked-by: Ben Pfaff 
> 
> You can take this for all of the patches in all these release series.
> 
> (I didn't actually apply anything or test it, but the changes look
> good.)

Thanks!  I pushed the branches.  I'll get the releases out soon.

--Justin


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [branch-2.5 1/2] Set release date for 2.5.8.

2019-04-10 Thread Ben Pfaff
On Sat, Mar 30, 2019 at 12:53:25PM -0700, Justin Pettit wrote:
> Signed-off-by: Justin Pettit 

Acked-by: Ben Pfaff 

You can take this for all of the patches in all these release series.

(I didn't actually apply anything or test it, but the changes look
good.)
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] datapath: fix flow actions reallocation

2019-04-10 Thread 0-day Robot
Bleep bloop.  Greetings Greg Rose, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Andrea Righi  needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors 
or committers: Greg Rose 
Lines checked: 77, Warnings: 1, Errors: 1


Please check this out.  If you feel there has been an error, please email 
acon...@bytheb.org

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] datapath: fix flow actions reallocation

2019-04-10 Thread William Tu
On Wed, Apr 10, 2019 at 3:51 PM Greg Rose  wrote:
>
> From: Andrea Righi 
>
> Upstream commit:
> commit f28cd2af22a0c134e4aa1c64a70f70d815d473fb
> Author: Andrea Righi 
> Date:   Thu Mar 28 07:36:00 2019 +0100
>
> openvswitch: fix flow actions reallocation
>
> The flow action buffer can be resized if it's not big enough to contain
> all the requested flow actions. However, this resize doesn't take into
> account the new requested size, the buffer is only increased by a factor
> of 2x. This might be not enough to contain the new data, causing a
> buffer overflow, for example:
>
> [   42.044472] 
> =
> [   42.045608] BUG kmalloc-96 (Not tainted): Redzone overwritten
> [   42.046415] 
> -
>
> [   42.047715] Disabling lock debugging due to kernel taint
> [   42.047716] INFO: 0x8bf2c4a5-0x720c0928. First byte 0x0 instead of 0xcc
> [   42.048677] INFO: Slab 0xbc6d2040 objects=29 used=18 fp=0xdc07dec4 
> flags=0x2808101
> [   42.049743] INFO: Object 0xd53a3464 @offset=2528 fp=0xccdcdebb
>
> [   42.050747] Redzone 76f1b237: cc cc cc cc cc cc cc cc  
> 
> [   42.051839] Object d53a3464: 6b 6b 6b 6b 6b 6b 6b 6b 0c 00 00 00 6c 00 
> 00 00  l...
> [   42.053015] Object f49a30cc: 6c 00 0c 00 00 00 00 00 00 00 00 03 78 a3 
> 15 f6  l...x...
> [   42.054203] Object acfe4220: 20 00 02 00 ff ff ff ff 00 00 00 00 00 00 
> 00 00   ...
> [   42.055370] Object 21024e91: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00 00  
> [   42.056541] Object 070e04c3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00 00  
> [   42.057797] Object 948a777a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00 00  
> [   42.059061] Redzone 8bf2c4a5: 00 00 00 00  
> 
> [   42.060189] Padding a681b46e: 5a 5a 5a 5a 5a 5a 5a 5a  
> 
>
> Fix by making sure the new buffer is properly resized to contain all the
> requested data.
>
> BugLink: https://bugs.launchpad.net/bugs/1813244
> Signed-off-by: Andrea Righi 
> Acked-by: Pravin B Shelar 
> Signed-off-by: David S. Miller 
>
> Cc: Andrea Righi 
> Signed-off-by: Greg Rose 

LGTM.
Acked-by: William Tu 

> ---
>  datapath/flow_netlink.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c
> index e5e469a..13e6e33 100644
> --- a/datapath/flow_netlink.c
> +++ b/datapath/flow_netlink.c
> @@ -2311,14 +2311,14 @@ static struct nlattr *reserve_sfa_size(struct 
> sw_flow_actions **sfa,
>
> struct sw_flow_actions *acts;
> int new_acts_size;
> -   int req_size = NLA_ALIGN(attr_len);
> +   size_t req_size = NLA_ALIGN(attr_len);
> int next_offset = offsetof(struct sw_flow_actions, actions) +
> (*sfa)->actions_len;
>
> if (req_size <= (ksize(*sfa) - next_offset))
> goto out;
>
> -   new_acts_size = ksize(*sfa) * 2;
> +   new_acts_size = max(next_offset + req_size, ksize(*sfa) * 2);
>
> if (new_acts_size > MAX_ACTIONS_BUFSIZE) {
> if ((MAX_ACTIONS_BUFSIZE - next_offset) < req_size) {
> --
> 1.8.3.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] openvswitch crash on i386

2019-04-10 Thread Gregory Rose


On 4/10/2019 10:19 AM, Gregory Rose wrote:



On 4/8/2019 1:31 AM, Juerg Haefliger wrote:

On Tue, 19 Mar 2019 11:45:44 -0700
Gregory Rose  wrote:

[snip]


Hi Ben, I apologize for the late reply - I've been out and am just
catching up.

Yes I have, and when I do the bug goes away.  I've also run a lockdep
enabled kernel - didn't see anything there.  I've enabled some further
kernel debug options and will get back to working on this now that I'm
back to work.
For me, this commit [1] seems to fix the issue.

...Juerg

[1] https://lore.kernel.org/lkml/20190328063600.GC16096@xps-13


I'll give it a try.  Thanks for the pointer!


That does the trick.  After 100 passes of adding and deleting the test 
bridge the problem does not occur.
In my prior testing it never made it more than 6 times through and 
usually would panic in first few tries.


I'll send a patch to this list so the fix can be back ported to our OOT 
driver.


Thanks,

- Greg

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] SST Tunnel missing from Open vSwitch v2.11.0?

2019-04-10 Thread Gregory Rose



On 4/10/2019 1:32 PM, Nikolas Britton wrote:

Hi,

I just finished compiling OVS 2.11.0 packages for Ubuntu 18.04 using the
Debian upstream source package. I was experimenting with tunnels to create
an overlay network to connect multiple KVM hypervisors. One of the options
I tried to use was an STT tunnel. However, it appears that the STT protocol
didn't get included in the binary somehow. Does it need to be enabled with
a build flag? The STT option also doesn't work with IPsec, but the manual
says it should.


The STT driver isn't included in the upstream drivers.  You'd need to 
pull from the OOT drivers at

github.

- Greg

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] datapath: fix flow actions reallocation

2019-04-10 Thread Greg Rose
From: Andrea Righi 

Upstream commit:
commit f28cd2af22a0c134e4aa1c64a70f70d815d473fb
Author: Andrea Righi 
Date:   Thu Mar 28 07:36:00 2019 +0100

openvswitch: fix flow actions reallocation

The flow action buffer can be resized if it's not big enough to contain
all the requested flow actions. However, this resize doesn't take into
account the new requested size, the buffer is only increased by a factor
of 2x. This might be not enough to contain the new data, causing a
buffer overflow, for example:

[   42.044472] 
=
[   42.045608] BUG kmalloc-96 (Not tainted): Redzone overwritten
[   42.046415] 
-

[   42.047715] Disabling lock debugging due to kernel taint
[   42.047716] INFO: 0x8bf2c4a5-0x720c0928. First byte 0x0 instead of 0xcc
[   42.048677] INFO: Slab 0xbc6d2040 objects=29 used=18 fp=0xdc07dec4 
flags=0x2808101
[   42.049743] INFO: Object 0xd53a3464 @offset=2528 fp=0xccdcdebb

[   42.050747] Redzone 76f1b237: cc cc cc cc cc cc cc cc
  
[   42.051839] Object d53a3464: 6b 6b 6b 6b 6b 6b 6b 6b 0c 00 00 00 6c 00 
00 00  l...
[   42.053015] Object f49a30cc: 6c 00 0c 00 00 00 00 00 00 00 00 03 78 a3 
15 f6  l...x...
[   42.054203] Object acfe4220: 20 00 02 00 ff ff ff ff 00 00 00 00 00 00 
00 00   ...
[   42.055370] Object 21024e91: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00  
[   42.056541] Object 070e04c3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00  
[   42.057797] Object 948a777a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00  
[   42.059061] Redzone 8bf2c4a5: 00 00 00 00
  
[   42.060189] Padding a681b46e: 5a 5a 5a 5a 5a 5a 5a 5a
  

Fix by making sure the new buffer is properly resized to contain all the
requested data.

BugLink: https://bugs.launchpad.net/bugs/1813244
Signed-off-by: Andrea Righi 
Acked-by: Pravin B Shelar 
Signed-off-by: David S. Miller 

Cc: Andrea Righi 
Signed-off-by: Greg Rose 
---
 datapath/flow_netlink.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c
index e5e469a..13e6e33 100644
--- a/datapath/flow_netlink.c
+++ b/datapath/flow_netlink.c
@@ -2311,14 +2311,14 @@ static struct nlattr *reserve_sfa_size(struct 
sw_flow_actions **sfa,
 
struct sw_flow_actions *acts;
int new_acts_size;
-   int req_size = NLA_ALIGN(attr_len);
+   size_t req_size = NLA_ALIGN(attr_len);
int next_offset = offsetof(struct sw_flow_actions, actions) +
(*sfa)->actions_len;
 
if (req_size <= (ksize(*sfa) - next_offset))
goto out;
 
-   new_acts_size = ksize(*sfa) * 2;
+   new_acts_size = max(next_offset + req_size, ksize(*sfa) * 2);
 
if (new_acts_size > MAX_ACTIONS_BUFSIZE) {
if ((MAX_ACTIONS_BUFSIZE - next_offset) < req_size) {
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] SST Tunnel missing from Open vSwitch v2.11.0?

2019-04-10 Thread Nikolas Britton
Hi,

I just finished compiling OVS 2.11.0 packages for Ubuntu 18.04 using the
Debian upstream source package. I was experimenting with tunnels to create
an overlay network to connect multiple KVM hypervisors. One of the options
I tried to use was an STT tunnel. However, it appears that the STT protocol
didn't get included in the binary somehow. Does it need to be enabled with
a build flag? The STT option also doesn't work with IPsec, but the manual
says it should.

2019-04-10T17:26:44.063Z|00041|netdev_vport|WARN|stt0: unknown stt argument
'remote_cert'
2019-04-10T17:26:44.064Z|00042|dpif|WARN|system@ovs-system: failed to add
stt0 as port: Address family not supported by protocol
2019-04-10T17:26:44.064Z|00043|bridge|WARN|could not add network device
stt0 to ofproto (Address family not supported by protocol)

root@bm002:~# ovs-vsctl show
75ef1aaf-a31a-4e87-b3d3-94241d9b7125
Bridge "overlay0"
Port "overlay0"
Interface "overlay0"
type: internal
Port "stt0"
Interface "stt0"
type: stt
options: {local_ip="10.3.40.52", remote_ip="10.3.40.51"}
error: "could not add network device stt0 to ofproto
(Address family not supported by protocol)"
ovs_version: "2.11.0"

-- 
*Nikolas Britton*
*Senior Deployment Engineer, Mirantis*
12301-B Riata Trace Pkwy Bldg #2, Suite 150, Austin, Texas
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1] netdev-vport: Check DPDK_NETDEV for offloaded API.

2019-04-10 Thread Roni Bar Yanai



>-Original Message-
>From: Ilya Maximets 
>Sent: Wednesday, April 10, 2019 5:21 PM
>To: Ophir Munk ; ovs-dev@openvswitch.org
>Cc: Ian Stokes ; Olga Shern ;
>Kevin Traynor ; Asaf Penso ;
>Roni Bar Yanai ; Paul Blakey ;
>Roi Dayan ; Flavio Leitner 
>Subject: Re: [PATCH v1] netdev-vport: Check DPDK_NETDEV for offloaded
>API.
>
>On 10.04.2019 1:30, Ophir Munk wrote:
>> Hi Ilya,
>> Please see comments inline
>>
>>> -Original Message-
>>> From: Ilya Maximets 
>> ...
>>> On 02.04.2019 12:18, Ophir Munk wrote:
 vport offloaded functions should have a different implementation for
 linux based OVS versus dpdk based OVS. Currently there is only support
 for linux based offloaded API. The code in the file named
 netdev-vport.c checks 'ifdef __linux__' to select the linux based
 offloaded functions, without checking 'ifndef DPDK_NETDEV' as well.
 '__linux__' is a pre-defined compiler macro, indicating that a source
 code is compiled on a linux based system. Any code inside a __linux__
 definition will be excluded on a windows based system, for example.
 'DPDK_NETDEV' is a macro defined by autoconf tools when configuring
 OVS to be dpdk based as shown in [1].
 Before this commit and in case hw-offload=true - using a vport
 interface with a dpdk based OVS daemon running on a linux machine
 resulted in an error since the vport linux based offloaded APIs were
 called instead of returning EOPNOTSUPP. Luckily the linux offloaded
 API returned immediately on a get_ifindex() failure, which caused no
 harm. An example of the failure message is shown in [2].

 [1]
 configure --with-dpdk=/

 [2]
 ovs|2|netdev_tc_offloads(dp_netdev_flow_5)|ERR|flow_put:
>failed
>>> to
 get ifindex for vxlan_sys_4789: No such device

 Fixes: 01b257860c89 ("netdev-vport: Use common offloads interface")
 Signed-off-by: Ophir Munk 
 ---
>>>
>>> Hi.
>>>
>>> We already discussed this with Roni some time ago. You can't just disable
>>> vport offloading on compile time. Compiling with DPDK support doesn't
>>> mean that it will be used only with userspace datapath. DPDK support is
>just
>>> an additional feature. You could still use kernel datapath and even use
>kernel
>>> and userpace datapaths at the same time under control of the same ovs-
>>> vswitchd process. If those error messages are annoying,
>>
>> The main issue I am concerned with is that unexpected/unplanned kernel
>targeted code is executed
>> as part of dpdk run. It may now end with annoying messages, but I think
>kernel code should be avoided
>> in the first place in case of dpdk datapath. I will send a new patch to 
>> address
>it.
>>
>>> you need to
>>> *check* if offloading supported by *each particular netdev-vport in
>>> runtime*, probably based on the datapath type they assigned to.
>>
>> Thank you for this advice. Do you have pointers into the code where such
>checks occur?
>
>The simplest way is to duplicate vport types stripping the offloading
>and update 'dpif_netdev_port_open_type' to return names of duplicated
>ones.
>It probably be good to add '-user' suffix like 'geneve-user' or 'erspan-user'.
>
>I thought that Roni already has some solution for his tunneling offload
>patches.
>You may ask how he done that.

Thanks Ilya, I did a patch but haven't submit it yet.
I did it by changing the pointers according to the bridge type the port was 
added to, I didn't duplicate.
Will discuss with Ophir.
Thanks

>
>>
>>> In case of
>>> simultaneous running of different datapaths some vports will have backing
>>> linux devices and some will not.
>>>
>>> Compiling out offloading based on enabling of DPDK support will just break
>>> offloading for kernel datapath. At least, this will cause issues for distros
>that
>>> will have to decide if they want DPDK support or vport offloading in kernel.
>>>
>>> Best regards, Ilya Maximets.
>>>
  lib/netdev-vport.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

 diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c index
 808a43f..5ba7455 100644
 --- a/lib/netdev-vport.c
 +++ b/lib/netdev-vport.c
 @@ -47,8 +47,8 @@
  #include "unaligned.h"
  #include "unixctl.h"
  #include "openvswitch/vlog.h"
 +#if defined(__linux__) && !defined(DPDK_NETDEV)
  #include "netdev-tc-offloads.h"
 -#ifdef __linux__
  #include "netdev-linux.h"
  #endif

 @@ -1093,7 +1093,7 @@ netdev_vport_get_pt_mode(const struct
>netdev
 *netdev)



>>
 -#ifdef __linux__
 +#if defined(__linux__) && !defined(DPDK_NETDEV)
  static int
  netdev_vport_get_ifindex(const struct netdev *netdev_)  { @@ -
>1105,10
 +1105,10 @@ netdev_vport_get_ifindex(const struct netdev *netdev_)

  #define NETDEV_VPORT_GET_IFINDEX netdev_vport_get_ifindex
>>> #define
 NETDEV_FLOW_OFFLOAD_API , LINUX_FLOW_OFFLOAD_API -#else /*
>>> !__linux__
 */
 

Re: [ovs-dev] [ovs-dev, v1] datapath-windows: Do not send out nbls when cloned nbls are being accessed

2019-04-10 Thread Aaron Conole
Aaron Conole  writes:

> 0-day Robot  writes:
>
>> Bleep bloop.  Greetings Anand Kumar via dev, I am a robot and I have tried 
>> out your patch.
>> Thanks for your contribution.
>>
>> I encountered some error that I wasn't expecting.  See the details below.
>>
>>
>> checkpatch:
>> ERROR: Author should not be mailing list.
>> Lines checked: 31, Warnings: 0, Errors: 1
>
> I looked into this - it seems that your patch doesn't have a Reply-To
> field.

Clarification, when fetched from Patchwork it doesn't seem that the
Reply-To field is included.  Perhaps this is something useful to address
in Patchwork as well?

> I plan to catch cases like this and duplicate the 'Signed-off-by:' as a
> 'From:'.
>
> Sorry for the noise, again.

>>
>> Please check this out.  If you feel there has been an error, please email 
>> acon...@bytheb.org
>>
>> Thanks,
>> 0-day Robot
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovs-dev, v1] datapath-windows: Do not send out nbls when cloned nbls are being accessed

2019-04-10 Thread Aaron Conole
0-day Robot  writes:

> Bleep bloop.  Greetings Anand Kumar via dev, I am a robot and I have tried 
> out your patch.
> Thanks for your contribution.
>
> I encountered some error that I wasn't expecting.  See the details below.
>
>
> checkpatch:
> ERROR: Author should not be mailing list.
> Lines checked: 31, Warnings: 0, Errors: 1

I looked into this - it seems that your patch doesn't have a Reply-To
field.

I plan to catch cases like this and duplicate the 'Signed-off-by:' as a
'From:'.

Sorry for the noise, again.

>
> Please check this out.  If you feel there has been an error, please email 
> acon...@bytheb.org
>
> Thanks,
> 0-day Robot
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovs-dev,2/5] datapath: convert to kvmalloc

2019-04-10 Thread Yifeng Sun
Got it, thanks!

Reviewed-by: Yifeng Sun 

On Wed, Apr 10, 2019 at 10:43 AM Gregory Rose  wrote:
>
>
>
> On 4/1/2019 10:21 AM, Yifeng Sun wrote:
> > As commented, previously flex_array was used because "we can have large
> >  hash-table without large contiguous kernel memory".
> >
> > Actually how large can the flow hash table can be? Will this backport
> > affect currently supported hash table size?
> Hi Yifeng,
>
> sorry for the late reply...
>
> The introduction of kvmalloc in the Linux kernel with commit a7c3e901a4
> "mm: introduce kv[mz]alloc helpers"
> helps explain.  kvmalloc_array() will allocate large, contiguous memory
> areas.
>
> As noted in this patch by Kent Overstreet we should probably use
> rhashtable instead but that would take
> additional work and is for future development.
>
> Thanks,
>
> - Greg
>
> >
> > Thanks,
> > Yifeng
> >
> > On Wed, Mar 27, 2019 at 9:06 AM 0-day Robot  wrote:
> >> Bleep bloop.  Greetings Gregory Rose, I am a robot and I have tried out 
> >> your patch.
> >> Thanks for your contribution.
> >>
> >> I encountered some error that I wasn't expecting.  See the details below.
> >>
> >>
> >> checkpatch:
> >> ERROR: Author Kent Overstreet  needs to sign 
> >> off.
> >> WARNING: Unexpected sign-offs from developers who are not authors or 
> >> co-authors or committers: Greg Rose 
> >> Lines checked: 208, Warnings: 1, Errors: 1
> >>
> >>
> >> Please check this out.  If you feel there has been an error, please email 
> >> acon...@bytheb.org
> >>
> >> Thanks,
> >> 0-day Robot
> >> ___
> >> dev mailing list
> >> d...@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovs-dev,2/5] datapath: convert to kvmalloc

2019-04-10 Thread Gregory Rose



On 4/1/2019 10:21 AM, Yifeng Sun wrote:

As commented, previously flex_array was used because "we can have large
 hash-table without large contiguous kernel memory".

Actually how large can the flow hash table can be? Will this backport
affect currently supported hash table size?

Hi Yifeng,

sorry for the late reply...

The introduction of kvmalloc in the Linux kernel with commit a7c3e901a4 
"mm: introduce kv[mz]alloc helpers"
helps explain.  kvmalloc_array() will allocate large, contiguous memory 
areas.


As noted in this patch by Kent Overstreet we should probably use 
rhashtable instead but that would take

additional work and is for future development.

Thanks,

- Greg



Thanks,
Yifeng

On Wed, Mar 27, 2019 at 9:06 AM 0-day Robot  wrote:

Bleep bloop.  Greetings Gregory Rose, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Kent Overstreet  needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors or 
committers: Greg Rose 
Lines checked: 208, Warnings: 1, Errors: 1


Please check this out.  If you feel there has been an error, please email 
acon...@bytheb.org

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] openvswitch crash on i386

2019-04-10 Thread Gregory Rose



On 4/8/2019 1:31 AM, Juerg Haefliger wrote:

On Tue, 19 Mar 2019 11:45:44 -0700
Gregory Rose  wrote:

[snip]


Hi Ben, I apologize for the late reply - I've been out and am just
catching up.

Yes I have, and when I do the bug goes away.  I've also run a lockdep
enabled kernel - didn't see anything there.  I've enabled some further
kernel debug options and will get back to working on this now that I'm
back to work.
For me, this commit [1] seems to fix the issue.

...Juerg

[1] https://lore.kernel.org/lkml/20190328063600.GC16096@xps-13


I'll give it a try.  Thanks for the pointer!

- Greg





Thanks,

- Greg



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Allocate vhost_id dynamically.

2019-04-10 Thread Ian Stokes

On 4/10/2019 5:50 PM, Ben Pfaff wrote:

On Wed, Apr 10, 2019 at 05:35:34PM +0100, Ian Stokes wrote:

On 4/10/2019 5:21 PM, Ian Stokes wrote:

On 3/5/2019 4:28 PM, Ilya Maximets wrote:

'vhost_id' is an array of 'PATH_MAX' bytes in the middle of
'netdev_dpdk' structure. That is 4K bytes.

'vhost_id' never used on a hot path and there is no need to keep
it inside the structure memory. Dynamic allocation will allow to
decrease 'struct netdev_dpdk' signigficantly, saving 4KB per ETH

signigficantly -> significantly

port (ETH ports doesn't use 'vhost_id') and almost same value per
vhost ports (real 'vhost_id's, in common case, are much shorter).
We could save the pointer space by making the union with 'devargs'
which is mutually exclusive with 'vhost_id'.
As we're just removing the single 'PADDED_MEMBER', the total
cacheline layout is not affected.

Stats for 'struct netdev_dpdk':

  Before: /* size: 4992, cachelines: 78 */
  After : /* size:  896, cachelines: 14 */



Thanks Ilya, this looks like a good change, validated without issue.

I've addressed a minor typo in the commit message and comment style
issue below. Other than that it looks good so applied to master.

On a broader note, do you think it's worth capturing changes such as
this for users in documentation? i.e. previously the vhost-id path was
limited to 4096 (however unlikely it was that someone would exceed
this), but it wasn't captured in the OVS documentation from what I can
see. From a usability point is it something to document between the 2.10
and 2.11 release?



Sorry, meant OVS 2.11 to 2.12 above.


I like to document user-visible changes, but I think in this case it is
unlikely to fix any real user problems.  (It would be different if we
made this change in response to a user having a problem with a very long
vhost-id path.)



Sure, I don't think a user would be aware of this unless they were 
hitting the 4096 limit, in which case I think they would have a 
different issue at hand :). It was just an after thought during the 
review when I checked if the vhost-user path limit was documented. Not 
necessary to be captured now though.


Ian
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Allocate vhost_id dynamically.

2019-04-10 Thread Ben Pfaff
On Wed, Apr 10, 2019 at 05:35:34PM +0100, Ian Stokes wrote:
> On 4/10/2019 5:21 PM, Ian Stokes wrote:
> > On 3/5/2019 4:28 PM, Ilya Maximets wrote:
> > > 'vhost_id' is an array of 'PATH_MAX' bytes in the middle of
> > > 'netdev_dpdk' structure. That is 4K bytes.
> > > 
> > > 'vhost_id' never used on a hot path and there is no need to keep
> > > it inside the structure memory. Dynamic allocation will allow to
> > > decrease 'struct netdev_dpdk' signigficantly, saving 4KB per ETH
> > signigficantly -> significantly
> > > port (ETH ports doesn't use 'vhost_id') and almost same value per
> > > vhost ports (real 'vhost_id's, in common case, are much shorter).
> > > We could save the pointer space by making the union with 'devargs'
> > > which is mutually exclusive with 'vhost_id'.
> > > As we're just removing the single 'PADDED_MEMBER', the total
> > > cacheline layout is not affected.
> > > 
> > > Stats for 'struct netdev_dpdk':
> > > 
> > >  Before: /* size: 4992, cachelines: 78 */
> > >  After : /* size:  896, cachelines: 14 */
> > > 
> > 
> > Thanks Ilya, this looks like a good change, validated without issue.
> > 
> > I've addressed a minor typo in the commit message and comment style
> > issue below. Other than that it looks good so applied to master.
> > 
> > On a broader note, do you think it's worth capturing changes such as
> > this for users in documentation? i.e. previously the vhost-id path was
> > limited to 4096 (however unlikely it was that someone would exceed
> > this), but it wasn't captured in the OVS documentation from what I can
> > see. From a usability point is it something to document between the 2.10
> > and 2.11 release?
> > 
> 
> Sorry, meant OVS 2.11 to 2.12 above.

I like to document user-visible changes, but I think in this case it is
unlikely to fix any real user problems.  (It would be different if we
made this change in response to a user having a problem with a very long
vhost-id path.)
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Allocate vhost_id dynamically.

2019-04-10 Thread Ian Stokes

On 4/10/2019 5:21 PM, Ian Stokes wrote:

On 3/5/2019 4:28 PM, Ilya Maximets wrote:

'vhost_id' is an array of 'PATH_MAX' bytes in the middle of
'netdev_dpdk' structure. That is 4K bytes.

'vhost_id' never used on a hot path and there is no need to keep
it inside the structure memory. Dynamic allocation will allow to
decrease 'struct netdev_dpdk' signigficantly, saving 4KB per ETH

signigficantly -> significantly

port (ETH ports doesn't use 'vhost_id') and almost same value per
vhost ports (real 'vhost_id's, in common case, are much shorter).
We could save the pointer space by making the union with 'devargs'
which is mutually exclusive with 'vhost_id'.
As we're just removing the single 'PADDED_MEMBER', the total
cacheline layout is not affected.

Stats for 'struct netdev_dpdk':

 Before: /* size: 4992, cachelines: 78 */
 After : /* size:  896, cachelines: 14 */



Thanks Ilya, this looks like a good change, validated without issue.

I've addressed a minor typo in the commit message and comment style 
issue below. Other than that it looks good so applied to master.


On a broader note, do you think it's worth capturing changes such as 
this for users in documentation? i.e. previously the vhost-id path was 
limited to 4096 (however unlikely it was that someone would exceed 
this), but it wasn't captured in the OVS documentation from what I can 
see. From a usability point is it something to document between the 2.10 
and 2.11 release?




Sorry, meant OVS 2.11 to 2.12 above.

Ian

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Allocate vhost_id dynamically.

2019-04-10 Thread Ian Stokes

On 3/5/2019 4:28 PM, Ilya Maximets wrote:

'vhost_id' is an array of 'PATH_MAX' bytes in the middle of
'netdev_dpdk' structure. That is 4K bytes.

'vhost_id' never used on a hot path and there is no need to keep
it inside the structure memory. Dynamic allocation will allow to
decrease 'struct netdev_dpdk' signigficantly, saving 4KB per ETH

signigficantly -> significantly

port (ETH ports doesn't use 'vhost_id') and almost same value per
vhost ports (real 'vhost_id's, in common case, are much shorter).
We could save the pointer space by making the union with 'devargs'
which is mutually exclusive with 'vhost_id'.
As we're just removing the single 'PADDED_MEMBER', the total
cacheline layout is not affected.

Stats for 'struct netdev_dpdk':

 Before: /* size: 4992, cachelines: 78 */
 After : /* size:  896, cachelines: 14 */



Thanks Ilya, this looks like a good change, validated without issue.

I've addressed a minor typo in the commit message and comment style 
issue below. Other than that it looks good so applied to master.


On a broader note, do you think it's worth capturing changes such as 
this for users in documentation? i.e. previously the vhost-id path was 
limited to 4096 (however unlikely it was that someone would exceed 
this), but it wasn't captured in the OVS documentation from what I can 
see. From a usability point is it something to document between the 2.10 
and 2.11 release?


Ian
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Trabajando desde Casa

2019-04-10 Thread Reducir costos y potenciar la productividad
Webinar Interactivo – Viernes 17 de Mayo

Guía para la implementación del Home Office

Hace algunos años era impensable trabajar desde casa. Ahora esta tendencia 
incluso podría representar utilidades
 a la organización, reducir costos y potenciar la productividad de nuestros 
empleados. 

Revisaremos los conceptos del Home Offce así como el proceso deimplementación 
en una empresa. 
 

Ejes Temáticos:

• Definición de conceptos básicos.

• Evaluación de situación actual y potencial de la organización para ésta 
práctica.

• Proceso a seguir para la correcta implementación.

• Supervisión y control del trabajo desde casa.

 
Para mayor información, responder sobre este correo con la palabra HOME + los 
siguientes datos:


NOMBRE:

TELÉFONO:

EMPRESA:

CORREO ALTERNO: 

 
 Llamanos al (045) 55 1554 6630
www.Innovalearn.mx 


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues

2019-04-10 Thread Jan Scheurich
> >
> > I am afraid it is not a valid assumption that there will be similarly large
> number of OVS PMD threads as there are queues.
> >
> > In OpenStack deployments the OVS is typically statically configured to use a
> few dedicated host CPUs for PMDs (perhaps 2-8).
> >
> > Typical Telco VNF VMs, on the other hand, are very large (12-20 vCPUs or
> even more). If they enable an instance for multi-queue in Nova, Nova (in its
> eternal wisdom) will set up every vhostuser port with #vCPU queue pairs.
> 
> For me, it's an issue of Nova. It's pretty easy to limit the maximum number of
> queue pairs to some sane value (the value that could be handled by your
> number of available PMD threads).
> It'll be a one config and a small patch to nova-compute. With a bit more work
> you could make this per-port configurable and finally stop wasting HW
> resources.

OK, I fully agree. 
The OpenStack community is slow, though, when it comes to these kind of changes.
Do we have contacts we could push?

> 
> > A (real world) VM with 20 vCPUs and 6 ports would have 120 queue pairs,
> even if only one or two high-traffic ports can actually profit from 
> multi-queue.
> Even on those ports is it unlikely that the application will use all 16 
> queues. And
> often there would be another such VM on the second NUMA node.
> 
> With limiting the number of queues in Nova (like I described above) to 4 
> you'll
> have just
> 24 queues for 6 ports. If you'll make it per-port, you'll be able to limit 
> this to
> even more sane values.

Yes, per port configuration in Neutron seems the logical thing for me to do,
rather than a global per instance parameter in the Nova flavor. A per server
setting in Nova compute to limit the number of acceptable queue pairs to
match the OVS configuration might still be useful on top.

> 
> >
> > So, as soon as a VNF enables MQ in OpenStack, there will typically be a vast
> number of un-used queue pairs in OVS and it makes a lot of sense to minimize
> the run-time impact of having these around.
> 
> For me it seems like not an OVS, DPDK or QEMU issue. The orchestrator should
> configure sane values first of all. It's totally unclear why we're changing 
> OVS
> instead of changing Nova.

The VNF orchestrator would request queues based on the applications needs. They
should not need to be aware of the configuration of the infrastructure (such as
the number of PMD threads in OVS). The OpenStack operator would have to make
sure that the instantiated queues are a good compromise between application
needs and infra capabilities.

> 
> >
> > We have had discussion earlier with RedHat as to how a vhostuser backend
> like OVS could negotiate the number of queue pairs with Qemu down to a
> reasonable value (e.g. the number PMDs available for polling) *before* Qemu
> would actually start the guest. The guest would then not have to guess on the
> optimal number of queue pairs to actually activate.
> >
> > BR, Jan
> >
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues

2019-04-10 Thread Ilya Maximets
On 10.04.2019 17:10, David Marchand wrote:
> On Wed, Apr 10, 2019 at 9:32 AM David Marchand  > wrote:
> 
> On Tue, Apr 9, 2019 at 1:42 PM Ilya Maximets  > wrote:
> 
> 
> What do you think about something like this (not even compile tested):
> ---
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index bd9718824..0cf492a3b 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -591,6 +591,8 @@ struct polled_queue {
>      struct dp_netdev_rxq *rxq;
>      odp_port_t port_no;
>      bool emc_enabled;
> +    bool enabled;
> +    uint64_t change_seq;
>  };
> 
>  /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
> @@ -5375,6 +5377,9 @@ pmd_load_queues_and_ports(struct 
> dp_netdev_pmd_thread *pmd,
>          poll_list[i].rxq = poll->rxq;
>          poll_list[i].port_no = poll->rxq->port->port_no;
>          poll_list[i].emc_enabled = poll->rxq->port->emc_enabled;
> +        poll_list[i].enabled = netdev_rxq_enabled(poll->rxq);
> +        poll_list[i].change_seq =
> +                     netdev_get_change_seq(poll->rxq->port->netdev);
>          i++;
>      }
> 
> @@ -5440,6 +5445,10 @@ reload:
> 
>          for (i = 0; i < poll_cnt; i++) {
> 
> +            if (!poll_list[i].enabled) {
> +                continue;
> +            }
> +
>              if (poll_list[i].emc_enabled) {
>                  atomic_read_relaxed(>dp->emc_insert_min,
>                                      >ctx.emc_insert_min);
> @@ -5476,6 +5485,15 @@ reload:
>              if (reload) {
>                  break;
>              }
> +
> +            for (i = 0; i < poll_cnt; i++) {
> +                uint64_t current_seq =
> +                         
> netdev_get_change_seq(poll_list[i].rxq->port->netdev);
> +                if (poll_list[i].change_seq != current_seq) {
> +                    poll_list[i].change_seq = current_seq;
> +                    poll_list[i].enabled = 
> netdev_rxq_enabled(poll_list[i].rxq);
> +                }
> +            }
>          }
>          pmd_perf_end_iteration(s, rx_packets, tx_packets,
>                                 pmd_perf_metrics_enabled(pmd));
> ---
> + replacing of your 'netdev_request_reconfigure(>up)' inside 
> 'vring_state_changed'
> with 'netdev_change_seq_changed(>up)'?
> 
> 
> Interesting, I would prefer to have a bitmap of rxq rather than looping 
> on all and checking the state.
> But let me try your approach first.
> 
> 
> Incorporated this change, and it works fine, I get similar numbers 
> with/without a bitmap, so I dropped the bitmap.
> I would say we are only missing some information in dpif-netdev/pmd-rxq-show 
> to get a per rxq status.
> 
> I will submit this with the fix on F_MQ if this is okay for you.

Sure.
I'll make a more detailed review and, probably, some tests after that.

> 
> Thanks!
> 
> -- 
> David Marchand
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1] netdev-vport: Check DPDK_NETDEV for offloaded API.

2019-04-10 Thread Ilya Maximets
On 10.04.2019 1:30, Ophir Munk wrote:
> Hi Ilya,
> Please see comments inline
> 
>> -Original Message-
>> From: Ilya Maximets 
> ...
>> On 02.04.2019 12:18, Ophir Munk wrote:
>>> vport offloaded functions should have a different implementation for
>>> linux based OVS versus dpdk based OVS. Currently there is only support
>>> for linux based offloaded API. The code in the file named
>>> netdev-vport.c checks 'ifdef __linux__' to select the linux based
>>> offloaded functions, without checking 'ifndef DPDK_NETDEV' as well.
>>> '__linux__' is a pre-defined compiler macro, indicating that a source
>>> code is compiled on a linux based system. Any code inside a __linux__
>>> definition will be excluded on a windows based system, for example.
>>> 'DPDK_NETDEV' is a macro defined by autoconf tools when configuring
>>> OVS to be dpdk based as shown in [1].
>>> Before this commit and in case hw-offload=true - using a vport
>>> interface with a dpdk based OVS daemon running on a linux machine
>>> resulted in an error since the vport linux based offloaded APIs were
>>> called instead of returning EOPNOTSUPP. Luckily the linux offloaded
>>> API returned immediately on a get_ifindex() failure, which caused no
>>> harm. An example of the failure message is shown in [2].
>>>
>>> [1]
>>> configure --with-dpdk=/
>>>
>>> [2]
>>> ovs|2|netdev_tc_offloads(dp_netdev_flow_5)|ERR|flow_put: failed
>> to
>>> get ifindex for vxlan_sys_4789: No such device
>>>
>>> Fixes: 01b257860c89 ("netdev-vport: Use common offloads interface")
>>> Signed-off-by: Ophir Munk 
>>> ---
>>
>> Hi.
>>
>> We already discussed this with Roni some time ago. You can't just disable
>> vport offloading on compile time. Compiling with DPDK support doesn't
>> mean that it will be used only with userspace datapath. DPDK support is just
>> an additional feature. You could still use kernel datapath and even use 
>> kernel
>> and userpace datapaths at the same time under control of the same ovs-
>> vswitchd process. If those error messages are annoying, 
> 
> The main issue I am concerned with is that unexpected/unplanned kernel 
> targeted code is executed 
> as part of dpdk run. It may now end with annoying messages, but I think  
> kernel code should be avoided 
> in the first place in case of dpdk datapath. I will send a new patch to 
> address it.
> 
>> you need to
>> *check* if offloading supported by *each particular netdev-vport in
>> runtime*, probably based on the datapath type they assigned to. 
> 
> Thank you for this advice. Do you have pointers into the code where such 
> checks occur?

The simplest way is to duplicate vport types stripping the offloading
and update 'dpif_netdev_port_open_type' to return names of duplicated ones.
It probably be good to add '-user' suffix like 'geneve-user' or 'erspan-user'.

I thought that Roni already has some solution for his tunneling offload patches.
You may ask how he done that.

> 
>> In case of
>> simultaneous running of different datapaths some vports will have backing
>> linux devices and some will not.
>>
>> Compiling out offloading based on enabling of DPDK support will just break
>> offloading for kernel datapath. At least, this will cause issues for distros 
>> that
>> will have to decide if they want DPDK support or vport offloading in kernel.
>>
>> Best regards, Ilya Maximets.
>>
>>>  lib/netdev-vport.c | 8 
>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c index
>>> 808a43f..5ba7455 100644
>>> --- a/lib/netdev-vport.c
>>> +++ b/lib/netdev-vport.c
>>> @@ -47,8 +47,8 @@
>>>  #include "unaligned.h"
>>>  #include "unixctl.h"
>>>  #include "openvswitch/vlog.h"
>>> +#if defined(__linux__) && !defined(DPDK_NETDEV)
>>>  #include "netdev-tc-offloads.h"
>>> -#ifdef __linux__
>>>  #include "netdev-linux.h"
>>>  #endif
>>>
>>> @@ -1093,7 +1093,7 @@ netdev_vport_get_pt_mode(const struct netdev
>>> *netdev)
>>>
>>>
>>>  
> 
>>> -#ifdef __linux__
>>> +#if defined(__linux__) && !defined(DPDK_NETDEV)
>>>  static int
>>>  netdev_vport_get_ifindex(const struct netdev *netdev_)  { @@ -1105,10
>>> +1105,10 @@ netdev_vport_get_ifindex(const struct netdev *netdev_)
>>>
>>>  #define NETDEV_VPORT_GET_IFINDEX netdev_vport_get_ifindex
>> #define
>>> NETDEV_FLOW_OFFLOAD_API , LINUX_FLOW_OFFLOAD_API -#else /*
>> !__linux__
>>> */
>>> +#else
>>>  #define NETDEV_VPORT_GET_IFINDEX NULL  #define
>>> NETDEV_FLOW_OFFLOAD_API -#endif /* __linux__ */
>>> +#endif
>>>
>>>  #define VPORT_FUNCTIONS_COMMON  \
>>>  .run = netdev_vport_run,\
>>>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues

2019-04-10 Thread David Marchand
On Wed, Apr 10, 2019 at 9:32 AM David Marchand 
wrote:

> On Tue, Apr 9, 2019 at 1:42 PM Ilya Maximets 
> wrote:
>
>>
>> What do you think about something like this (not even compile tested):
>> ---
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index bd9718824..0cf492a3b 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -591,6 +591,8 @@ struct polled_queue {
>>  struct dp_netdev_rxq *rxq;
>>  odp_port_t port_no;
>>  bool emc_enabled;
>> +bool enabled;
>> +uint64_t change_seq;
>>  };
>>
>>  /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
>> @@ -5375,6 +5377,9 @@ pmd_load_queues_and_ports(struct
>> dp_netdev_pmd_thread *pmd,
>>  poll_list[i].rxq = poll->rxq;
>>  poll_list[i].port_no = poll->rxq->port->port_no;
>>  poll_list[i].emc_enabled = poll->rxq->port->emc_enabled;
>> +poll_list[i].enabled = netdev_rxq_enabled(poll->rxq);
>> +poll_list[i].change_seq =
>> + netdev_get_change_seq(poll->rxq->port->netdev);
>>  i++;
>>  }
>>
>> @@ -5440,6 +5445,10 @@ reload:
>>
>>  for (i = 0; i < poll_cnt; i++) {
>>
>> +if (!poll_list[i].enabled) {
>> +continue;
>> +}
>> +
>>  if (poll_list[i].emc_enabled) {
>>  atomic_read_relaxed(>dp->emc_insert_min,
>>  >ctx.emc_insert_min);
>> @@ -5476,6 +5485,15 @@ reload:
>>  if (reload) {
>>  break;
>>  }
>> +
>> +for (i = 0; i < poll_cnt; i++) {
>> +uint64_t current_seq =
>> +
>>  netdev_get_change_seq(poll_list[i].rxq->port->netdev);
>> +if (poll_list[i].change_seq != current_seq) {
>> +poll_list[i].change_seq = current_seq;
>> +poll_list[i].enabled =
>> netdev_rxq_enabled(poll_list[i].rxq);
>> +}
>> +}
>>  }
>>  pmd_perf_end_iteration(s, rx_packets, tx_packets,
>> pmd_perf_metrics_enabled(pmd));
>> ---
>> + replacing of your 'netdev_request_reconfigure(>up)' inside
>> 'vring_state_changed'
>> with 'netdev_change_seq_changed(>up)'?
>>
>
> Interesting, I would prefer to have a bitmap of rxq rather than looping on
> all and checking the state.
> But let me try your approach first.
>

Incorporated this change, and it works fine, I get similar numbers
with/without a bitmap, so I dropped the bitmap.
I would say we are only missing some information in
dpif-netdev/pmd-rxq-show to get a per rxq status.

I will submit this with the fix on F_MQ if this is okay for you.

Thanks!

-- 
David Marchand
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues

2019-04-10 Thread Jan Scheurich
Hi Ilya,

> >
> > With a simple pvp setup of mine.
> > 1c/2t poll two physical ports.
> > 1c/2t poll four vhost ports with 16 queues each.
> >   Only one queue is enabled on each virtio device attached by the guest.
> >   The first two virtio devices are bound to the virtio kmod.
> >   The last two virtio devices are bound to vfio-pci and used to forward
> incoming traffic with testpmd.
> >
> > The forwarding zeroloss rate goes from 5.2Mpps (polling all 64 vhost queues)
> to 6.2Mpps (polling only the 4 enabled vhost queues).
> 
> That's interesting. However, this doesn't look like a realistic scenario.
> In practice you'll need much more PMD threads to handle so many queues.
> If you'll add more threads, zeroloss test could show even worse results if 
> one of
> idle VMs will periodically change the number of queues. Periodic latency 
> spikes
> will cause queue overruns and subsequent packet drops on hot Rx queues. This
> could be partially solved by allowing n_rxq to grow only.
> However, I'd be happy to have different solution that will not hide number of
> queues from the datapath.
> 

I am afraid it is not a valid assumption that there will be similarly large 
number of OVS PMD threads as there are queues. 

In OpenStack deployments the OVS is typically statically configured to use a 
few dedicated host CPUs for PMDs (perhaps 2-8).

Typical Telco VNF VMs, on the other hand, are very large (12-20 vCPUs or even 
more). If they enable an instance for multi-queue in Nova, Nova (in its eternal 
wisdom) will set up every vhostuser port with #vCPU queue pairs. A (real world) 
VM with 20 vCPUs and 6 ports would have 120 queue pairs, even if only one or 
two high-traffic ports can actually profit from multi-queue. Even on those 
ports is it unlikely that the application will use all 16 queues. And often 
there would be another such VM on the second NUMA node.

So, as soon as a VNF enables MQ in OpenStack, there will typically be a vast 
number of un-used queue pairs in OVS and it makes a lot of sense to minimize 
the run-time impact of having these around. 

We have had discussion earlier with RedHat as to how a vhostuser backend like 
OVS could negotiate the number of queue pairs with Qemu down to a reasonable 
value (e.g. the number PMDs available for polling) *before* Qemu would actually 
start the guest. The guest would then not have to guess on the optimal number 
of queue pairs to actually activate.

BR, Jan
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpif-netdev: fix meter granulariy.

2019-04-10 Thread Ilya Maximets
On 09.04.2019 20:59, William Tu wrote:
> When testing packet rate around 1Mpps with meter enabled, the frequency
> of hitting meter action becomes much higher, around 30us each time.
> As a result, the meter's calculation of 'uint32_t delta_t' becomes
> always 0 and meter action has no effect.  This is due to the previous
> commit 05f9e707e194 divides the delta by 1000, in order to convert to
> msec granularity.  The patch fixes it by using double to hold the delta
> value.
> 
> Fixes: 05f9e707e194 ("dpif-netdev: Use microsecond granularity.")
> Cc: Ilya Maximets 
> Cc: Yi-Hung Wei 
> Signed-off-by: William Tu 
> ---

Hi. Good catch.

In fact, we may return previous behaviour by just dividing before
subtracting like this:
long_delta_t = now / 1000 - meter->used / 1000; /* msec */

I'm not much familiar with meters here. Is it OK if 'band->bucket'
grows only if we're crossing millisecond boundary?
Your change makes it grow smoothly on each call which wasn't the
case previously.

Jarno, maybe you have some thoughts about this?

Best regards, Ilya Maximets.

>  lib/dpif-netdev.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 4d6d0c372236..61d0e9f2bf69 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -5525,8 +5525,8 @@ dp_netdev_run_meter(struct dp_netdev *dp, struct 
> dp_packet_batch *packets_,
>  struct dp_meter *meter;
>  struct dp_meter_band *band;
>  struct dp_packet *packet;
> -long long int long_delta_t; /* msec */
> -uint32_t delta_t; /* msec */
> +double double_delta_t; /* msec */
> +double delta_t; /* msec */
>  const size_t cnt = dp_packet_batch_size(packets_);
>  uint32_t bytes, volume;
>  int exceeded_band[NETDEV_MAX_BURST];
> @@ -5549,12 +5549,12 @@ dp_netdev_run_meter(struct dp_netdev *dp, struct 
> dp_packet_batch *packets_,
>  memset(exceeded_rate, 0, cnt * sizeof *exceeded_rate);
>  
>  /* All packets will hit the meter at the same time. */
> -long_delta_t = (now - meter->used) / 1000; /* msec */
> +double_delta_t = (double)(now - meter->used) / 1000.0; /* msec */
>  
>  /* Make sure delta_t will not be too large, so that bucket will not
>   * wrap around below. */
> -delta_t = (long_delta_t > (long long int)meter->max_delta_t)
> -? meter->max_delta_t : (uint32_t)long_delta_t;
> +delta_t = (double_delta_t > (long long int)meter->max_delta_t)
> +? (double)meter->max_delta_t : double_delta_t;
>  
>  /* Update meter stats. */
>  meter->used = now;
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH OVS v3 0/4] ovs-tc: support OvS internal port offload

2019-04-10 Thread Simon Horman
On Tue, Apr 09, 2019 at 03:36:10PM +0100, John Hurley wrote:
> Common use-cases in OvS can produce datapath rules that match on OvS
> internal ports. For example, when the endpoint IP address of a VXLAN
> tunnel is on the OvS bridge itself, datapath rules may take the form:
> 
> 1. in_port(eth1),actions:set(tunnel(...)),vxlan_sys_4789
> 2. in_port(br0),eth(src=...,dst=...),actions:eth2
> 
> Here, the first rule outputs to the vxlan port which determines the next
> hop to be the OvS bridge. The user-space NORMAL rule can then be used
> to form rule 2 that forwards the tunnelled packet to a physical port.
> 
> Similarly, OvS can add rules that redirect to an internal port, e.g.:
> 
> 1. in_port(eth2),eth(src=...,dst=...),actions:br0
> 2. in_port(vxlan_sys_4789),tunnel(...),actions:eth1
> 
> In this case, tunnelled packets arriving on eth2 are matched on their MAC
> addresses and forwarded to internal port br0. As the tunnel endpoint IP
> exists here, the packet can be decapped and match on rule 2.
> 
> Currently, rules applied to internal ports are not offloaded to the TC
> datapath. This patchset proposes changes to addresses that. 
> 
> The implimentation of OvS internal ports in Linux means that if a packet
> is sent to them (via a datapath rule) then the packet is redirected
> through the network stack as if it had been received on that port. If a
> packet egresses an internal port (via its xmit ndo) then packet is passed
> back into the OvS datapath with an ingress port set to that of the
> internal port.
> 
> To offload rules sending to internal ports as TC filters, the OvS-TC API
> is modified to make use of the ingress mirred action. This allows us to
> direct packets through the network stack from TC.
> 
> For packets egressing internal ports, OvS-TC is modified to apply and use
> egress filters. The start xmit ndo in OvS internal ports pushes packets
> into the OvS kernel datapath with ingress set to the given internal port.
> This is essentially what happens when packets RX on other ports that have
> an OvS bridge as their master. Therefore, applying flower filters to the
> egress path of an internal port is akin to the current practice of adding
> the filters to the ingress path of non internal ports.
> 
> Once such filters exist in TC, it enables drivers to interpret OvS
> internal ports with the potential to offload a representation of them to
> harware devices.

Thanks John,

I have applied this series to the master branch.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH OVS v3 4/4] ovs-tc: offload datapath rules matching on internal ports

2019-04-10 Thread Roi Dayan



On 09/04/2019 17:36, John Hurley wrote:
> Rules applied to OvS internal ports are not represented in TC datapaths.
> However, it is possible to support rules matching on internal ports in TC.
> The start_xmit ndo of OvS internal ports directs packets back into the OvS
> kernel datapath where they are rematched with the ingress port now being
> that of the internal port. Due to this, rules matching on an internal port
> can be added as TC filters to an egress qdisc for these ports.
> 
> Allow rules applied to internal ports to be offloaded to TC as egress
> filters. Rules redirecting to an internal port are also offloaded. These
> are supported by the redirect ingress functionality applied in an earlier
> patch.
> 
> Signed-off-by: John Hurley 
> ---
>  lib/dpif.c   | 13 +
>  lib/netdev-linux.c   |  1 +
>  lib/netdev-tc-offloads.c | 41 +
>  3 files changed, 35 insertions(+), 20 deletions(-)
> 
> diff --git a/lib/dpif.c b/lib/dpif.c
> index 457c9bf..063ba20 100644
> --- a/lib/dpif.c
> +++ b/lib/dpif.c
> @@ -101,12 +101,9 @@ static bool should_log_flow_message(const struct 
> vlog_module *module,
>  struct seq *tnl_conf_seq;
>  
>  static bool
> -dpif_is_internal_port(const char *type)
> +dpif_is_tap_port(const char *type)
>  {
> -/* For userspace datapath, tap devices are the equivalent
> - * of internal devices in the kernel datapath, so both
> - * these types are 'internal' devices. */
> -return !strcmp(type, "internal") || !strcmp(type, "tap");
> +return !strcmp(type, "tap");
>  }
>  
>  static void
> @@ -359,7 +356,7 @@ do_open(const char *name, const char *type, bool create, 
> struct dpif **dpifp)
>  struct netdev *netdev;
>  int err;
>  
> -if (dpif_is_internal_port(dpif_port.type)) {
> +if (dpif_is_tap_port(dpif_port.type)) {
>  continue;
>  }
>  
> @@ -434,7 +431,7 @@ dpif_remove_netdev_ports(struct dpif *dpif) {
>  struct dpif_port dpif_port;
>  
>  DPIF_PORT_FOR_EACH (_port, _dump, dpif) {
> -if (!dpif_is_internal_port(dpif_port.type)) {
> +if (!dpif_is_tap_port(dpif_port.type)) {
>  netdev_ports_remove(dpif_port.port_no, dpif->dpif_class);
>  }
>  }
> @@ -582,7 +579,7 @@ dpif_port_add(struct dpif *dpif, struct netdev *netdev, 
> odp_port_t *port_nop)
>  VLOG_DBG_RL(_rl, "%s: added %s as port %"PRIu32,
>  dpif_name(dpif), netdev_name, port_no);
>  
> -if (!dpif_is_internal_port(netdev_get_type(netdev))) {
> +if (!dpif_is_tap_port(netdev_get_type(netdev))) {
>  
>  struct dpif_port dpif_port;
>  
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index 87337e0..776d938 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -3340,6 +3340,7 @@ const struct netdev_class netdev_tap_class = {
>  
>  const struct netdev_class netdev_internal_class = {
>  NETDEV_LINUX_CLASS_COMMON,
> +LINUX_FLOW_OFFLOAD_API,
>  .type = "internal",
>  .construct = netdev_linux_construct,
>  .get_stats = netdev_internal_get_stats,
> diff --git a/lib/netdev-tc-offloads.c b/lib/netdev-tc-offloads.c
> index 634c9b9..d5c66ac 100644
> --- a/lib/netdev-tc-offloads.c
> +++ b/lib/netdev-tc-offloads.c
> @@ -59,6 +59,12 @@ is_internal_port(const char *type)
>  return !strcmp(type, "internal");
>  }
>  
> +static enum tc_qdisc_hook
> +get_tc_qdisc_hook(struct netdev *netdev)
> +{
> +return is_internal_port(netdev_get_type(netdev)) ? TC_EGRESS : 
> TC_INGRESS;
> +}
> +
>  static struct netlink_field set_flower_map[][4] = {
>  [OVS_KEY_ATTR_IPV4] = {
>  { offsetof(struct ovs_key_ipv4, ipv4_src),
> @@ -185,11 +191,12 @@ del_ufid_tc_mapping(const ovs_u128 *ufid)
>  /* Wrapper function to delete filter and ufid tc mapping */
>  static int
>  del_filter_and_ufid_mapping(int ifindex, int prio, int handle,
> -uint32_t block_id, const ovs_u128 *ufid)
> +uint32_t block_id, const ovs_u128 *ufid,
> +enum tc_qdisc_hook hook)
>  {
>  int err;
>  
> -err = tc_del_filter(ifindex, prio, handle, block_id, TC_INGRESS);
> +err = tc_del_filter(ifindex, prio, handle, block_id, hook);
>  del_ufid_tc_mapping(ufid);
>  
>  return err;
> @@ -346,6 +353,7 @@ get_block_id_from_netdev(struct netdev *netdev)
>  int
>  netdev_tc_flow_flush(struct netdev *netdev)
>  {
> +enum tc_qdisc_hook hook = get_tc_qdisc_hook(netdev);
>  int ifindex = netdev_get_ifindex(netdev);
>  uint32_t block_id = 0;
>  
> @@ -357,13 +365,14 @@ netdev_tc_flow_flush(struct netdev *netdev)
>  
>  block_id = get_block_id_from_netdev(netdev);
>  
> -return tc_flush(ifindex, block_id, TC_INGRESS);
> +return tc_flush(ifindex, block_id, hook);
>  }
>  
>  int
>  netdev_tc_flow_dump_create(struct netdev *netdev,
>  

Re: [ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues

2019-04-10 Thread Ilya Maximets
On 09.04.2019 17:49, Kevin Traynor wrote:
> On 09/04/2019 12:41, Ilya Maximets wrote:
>> On 09.04.2019 11:34, Ilya Maximets wrote:
>>> On 08.04.2019 16:44, David Marchand wrote:
 Hello Ilya,

 On Mon, Apr 8, 2019 at 10:27 AM Ilya Maximets >>> > wrote:

 On 04.04.2019 22:49, David Marchand wrote:
 > We tried to lower the number of rebalances but we don't have a
 > satisfying solution at the moment, so this patch rebalances on each
 > update.

 Hi.

 Triggering the reconfiguration on each vring state change is a bad 
 thing.
 This could be abused by the guest to break the host networking by 
 infinite
 disabling/enabling queues. Each reconfiguration leads to removing ports
 from the PMD port caches and their reloads. On rescheduling all the 
 ports


 I'd say the reconfiguration itself is not wanted here.
 Only rebalancing the queues would be enough.
>>>
>>> As you correctly mentioned in commit message where will be no real port
>>> configuration changes.
>>>
>>> Under 'reconfiguration' I mean datapath reconfiguration and 'rebalancing'
>>> is one of its stages.
>>>


 could be moved to a different PMD threads resulting in EMC/SMC/dpcls
 invalidation and subsequent upcalls/packet reorderings.


 I agree that rebalancing does trigger EMC/SMC/dpcls invalidation when 
 moving queues.
 However, EMC/SMC/dpcls are per pmd specific, where would we have packet 
 reordering ?
>>>
>>> Rx queue could be scheduled to different PMD thread --> new packets will go 
>>> to different
>>> Tx queue. It's unlikely, but it will depend on device/driver which packets 
>>> will be sent
>>> first. The main issue here that it happens to other ports, not only to port 
>>> we're trying
>>> to reconfigure.
>>>



 Same issues was discussed previously while looking at possibility of
 vhost-pmd integration (with some test results):
 https://mail.openvswitch.org/pipermail/ovs-dev/2016-August/320430.html


 Thanks for the link, I will test this.



 One more reference:
 7f5f2bd0ce43 ("netdev-dpdk: Avoid reconfiguration on reconnection of 
 same vhost device.")


 Yes, I saw this patch.
 Are we safe against guest drivers/applications that play with 
 VIRTIO_NET_F_MQ, swapping it continuously ?
>>>
>>> Good point. I didn't test that, but it looks like we're not safe here.
>>> Kernel and DPDK drivers has F_MQ enabled by default so it'll require
>>> some changes/explicit disabling. But I agree that this could produce
>>> issues if someone will do that.
>>>
>>> We could probably handle this using 'max seen qp_num' approach. But I'm
>>> not a fan of this. Need to figure out how to do this correctly.
>>>
>>> In general, I think, that we should not allow cases where guest is able
>>> to manipulate the host configuration.
>>>




 Anyway, do you have some numbers of how much time PMD thread spends on 
 polling
 disabled queues? What the performance improvement you're able to 
 achieve by
 avoiding that?


 With a simple pvp setup of mine.
 1c/2t poll two physical ports.
 1c/2t poll four vhost ports with 16 queues each.
   Only one queue is enabled on each virtio device attached by the guest.
   The first two virtio devices are bound to the virtio kmod.
   The last two virtio devices are bound to vfio-pci and used to forward 
 incoming traffic with testpmd.

 The forwarding zeroloss rate goes from 5.2Mpps (polling all 64 vhost 
 queues) to 6.2Mpps (polling only the 4 enabled vhost queues).
>>>
>>> That's interesting. However, this doesn't look like a realistic scenario.
>>> In practice you'll need much more PMD threads to handle so many queues.
>>> If you'll add more threads, zeroloss test could show even worse results
>>> if one of idle VMs will periodically change the number of queues. Periodic
>>> latency spikes will cause queue overruns and subsequent packet drops on
>>> hot Rx queues. This could be partially solved by allowing n_rxq to grow 
>>> only.
>>> However, I'd be happy to have different solution that will not hide number
>>> of queues from the datapath.
>>
>> What do you think about something like this (not even compile tested):
>> ---
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index bd9718824..0cf492a3b 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -591,6 +591,8 @@ struct polled_queue {
>>  struct dp_netdev_rxq *rxq;
>>  odp_port_t port_no;
>>  bool emc_enabled;
>> +bool enabled;
>> +uint64_t change_seq;
>>  };
>>  
>>  /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
>> @@ -5375,6 +5377,9 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread 
>> *pmd,
>>  

Re: [ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues

2019-04-10 Thread David Marchand
Hello Ilya,

On Tue, Apr 9, 2019 at 1:42 PM Ilya Maximets  wrote:

>
> What do you think about something like this (not even compile tested):
> ---
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index bd9718824..0cf492a3b 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -591,6 +591,8 @@ struct polled_queue {
>  struct dp_netdev_rxq *rxq;
>  odp_port_t port_no;
>  bool emc_enabled;
> +bool enabled;
> +uint64_t change_seq;
>  };
>
>  /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
> @@ -5375,6 +5377,9 @@ pmd_load_queues_and_ports(struct
> dp_netdev_pmd_thread *pmd,
>  poll_list[i].rxq = poll->rxq;
>  poll_list[i].port_no = poll->rxq->port->port_no;
>  poll_list[i].emc_enabled = poll->rxq->port->emc_enabled;
> +poll_list[i].enabled = netdev_rxq_enabled(poll->rxq);
> +poll_list[i].change_seq =
> + netdev_get_change_seq(poll->rxq->port->netdev);
>  i++;
>  }
>
> @@ -5440,6 +5445,10 @@ reload:
>
>  for (i = 0; i < poll_cnt; i++) {
>
> +if (!poll_list[i].enabled) {
> +continue;
> +}
> +
>  if (poll_list[i].emc_enabled) {
>  atomic_read_relaxed(>dp->emc_insert_min,
>  >ctx.emc_insert_min);
> @@ -5476,6 +5485,15 @@ reload:
>  if (reload) {
>  break;
>  }
> +
> +for (i = 0; i < poll_cnt; i++) {
> +uint64_t current_seq =
> +
>  netdev_get_change_seq(poll_list[i].rxq->port->netdev);
> +if (poll_list[i].change_seq != current_seq) {
> +poll_list[i].change_seq = current_seq;
> +poll_list[i].enabled =
> netdev_rxq_enabled(poll_list[i].rxq);
> +}
> +}
>  }
>  pmd_perf_end_iteration(s, rx_packets, tx_packets,
> pmd_perf_metrics_enabled(pmd));
> ---
> + replacing of your 'netdev_request_reconfigure(>up)' inside
> 'vring_state_changed'
> with 'netdev_change_seq_changed(>up)'?
>

Interesting, I would prefer to have a bitmap of rxq rather than looping on
all and checking the state.
But let me try your approach first.



> This should help to not reconfigure/reschedule everything and not waste
> much time on
> polling. Also this will give ability to faster react on queue state
> changes.
>

I suspect the additional branch could still have an impact, but I won't
know before trying.



> After that we'll be able to fix reconfiguration on F_MQ manipulations with
> something
> simple like that:
> ---
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 91af377e8..1937561cc 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -3492,8 +3528,8 @@ new_device(int vid)
>  newnode = dev->socket_id;
>  }
>
> -if (dev->requested_n_txq != qp_num
> -|| dev->requested_n_rxq != qp_num
> +if (dev->requested_n_txq < qp_num
> +|| dev->requested_n_rxq < qp_num
>  || dev->requested_socket_id != newnode) {
>  dev->requested_socket_id = newnode;
>  dev->requested_n_rxq = qp_num;
> ---
> Decreasing of 'qp_num' will not cause big issues because we'll not poll
> disabled queues.
>

Yes.



> And there could be one separate change to drop the requested values if
> socket connection
> closed (I hope that guest is not able to control that):
>

I don't think the guest can control this part.


---
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 91af377e8..1937561cc 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -185,13 +185,16 @@ static const struct rte_eth_conf port_conf = {
>   */
>  static int new_device(int vid);
>  static void destroy_device(int vid);
> +static void destroy_connection(int vid);
>  static int vring_state_changed(int vid, uint16_t queue_id, int enable);
>  static const struct vhost_device_ops virtio_net_device_ops =
>  {
>  .new_device =  new_device,
>  .destroy_device = destroy_device,
>  .vring_state_changed = vring_state_changed,
> -.features_changed = NULL
> +.features_changed = NULL,
> +.new_connection = NULL,
> +.destroy_connection = destroy_connection
>  };
>
>  enum { DPDK_RING_SIZE = 256 };
> @@ -3462,6 +3465,39 @@ netdev_dpdk_remap_txqs(struct netdev_dpdk *dev)
>  free(enabled_queues);
>  }
>
> +/*
> + * vhost-user socket connection is closed.
> + */
> +static void
> +destroy_connection(int vid)
> +{
> +struct netdev_dpdk *dev;
> +char ifname[IF_NAME_SZ];
> +
> +rte_vhost_get_ifname(vid, ifname, sizeof ifname);
> +
> +ovs_mutex_lock(_mutex);
> +LIST_FOR_EACH(dev, list_node, _list) {
> +ovs_mutex_lock(>mutex);
> +if (nullable_string_is_equal(ifname, dev->vhost_id)) {
> +uint32_t qp_num = NR_QUEUE;
> +
> +/* Restore the 

Re: [ovs-dev] [PATCH 5/5] odp-util: Add FLOW_WC_SEQ assertions.

2019-04-10 Thread Numan Siddique
On Mon, Apr 8, 2019 at 11:55 PM Ben Pfaff  wrote:

> The assertions make it easier to find all the places that need to be
> updated when adding protocol support.
>
> Signed-off-by: Ben Pfaff 
>

Acked-by: Numan Siddique 


> ---
>  lib/odp-util.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/lib/odp-util.c b/lib/odp-util.c
> index 6e545b80ba6c..f60536c1d9dc 100644
> --- a/lib/odp-util.c
> +++ b/lib/odp-util.c
> @@ -5819,6 +5819,11 @@ static void
>  odp_flow_key_from_flow__(const struct odp_flow_key_parms *parms,
>   bool export_mask, struct ofpbuf *buf)
>  {
> +/* New "struct flow" fields that are visible to the datapath
> (including all
> + * data fields) should be translated into equivalent datapath flow
> fields
> + * here (you will have to add a OVS_KEY_ATTR_* for them). */
> +BUILD_ASSERT_DECL(FLOW_WC_SEQ == 41);
> +
>  struct ovs_key_ethernet *eth_key;
>  size_t encap[FLOW_MAX_VLAN_HEADERS] = {0};
>  size_t max_vlans;
> @@ -6900,6 +6905,11 @@ odp_flow_key_to_flow__(const struct nlattr *key,
> size_t key_len,
> struct flow *flow, const struct flow *src_flow,
> char **errorp)
>  {
> +/* New "struct flow" fields that are visible to the datapath
> (including all
> + * data fields) should be translated from equivalent datapath flow
> fields
> + * here (you will have to add a OVS_KEY_ATTR_* for them).  */
> +BUILD_ASSERT_DECL(FLOW_WC_SEQ == 41);
> +
>  enum odp_key_fitness fitness = ODP_FIT_ERROR;
>  if (errorp) {
>  *errorp = NULL;
> --
> 2.20.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 4/5] flow: Add FLOW_WC_SEQ assertions and improve comments.

2019-04-10 Thread Numan Siddique
On Mon, Apr 8, 2019 at 11:54 PM Ben Pfaff  wrote:

> The assertions make it easier to find all the places that need to be
> updated when adding protocol support.
>
> Signed-off-by: Ben Pfaff 
>

Acked-by: Numan Siddique 


> ---
>  lib/flow.c | 64 --
>  lib/odp-util.c | 15 ++--
>  2 files changed, 49 insertions(+), 30 deletions(-)
>
> diff --git a/lib/flow.c b/lib/flow.c
> index 7511b1877ca4..f39b57f5b9d9 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -626,32 +626,8 @@ parse_nsh(const void **datap, size_t *sizep, struct
> ovs_key_nsh *key)
>  return true;
>  }
>
> -/* Initializes 'flow' members from 'packet' and 'md', taking the packet
> type
> - * into account.
> - *
> - * Initializes the layer offsets as follows:
> - *
> - *- packet->l2_5_ofs to the
> - *  * the start of the MPLS shim header. Can be zero, if the
> - *packet is of type (OFPHTN_ETHERTYPE, ETH_TYPE_MPLS).
> - *  * UINT16_MAX when there is no MPLS shim header.
> - *
> - *- packet->l3_ofs is set to
> - *  * zero if the packet_type is in name space OFPHTN_ETHERTYPE
> - *and there is no MPLS shim header.
> - *  * just past the Ethernet header, or just past the vlan_header
> if
> - *one is present, to the first byte of the payload of the
> - *Ethernet frame if the packet type is Ethernet and there is
> - *no MPLS shim header.
> - *  * just past the MPLS label stack to the first byte of the MPLS
> - *payload if there is at least one MPLS shim header.
> - *  * UINT16_MAX if the packet type is Ethernet and the frame is
> - *too short to contain an Ethernet header.
> - *
> - *- packet->l4_ofs is set to just past the IPv4 or IPv6 header, if
> one is
> - *  present and the packet has at least the content used for the
> fields
> - *  of interest for the flow, otherwise UINT16_MAX.
> - */
> +/* This does the same thing as miniflow_extract() with a full-size 'flow'
> as
> + * the destination. */
>  void
>  flow_extract(struct dp_packet *packet, struct flow *flow)
>  {
> @@ -730,11 +706,38 @@ ipv6_sanity_check(const struct ovs_16aligned_ip6_hdr
> *nh, size_t size)
>  return true;
>  }
>
> -/* Caller is responsible for initializing 'dst' with enough storage for
> - * FLOW_U64S * 8 bytes. */
> +/* Initializes 'dst' from 'packet' and 'md', taking the packet type into
> + * account.  'dst' must have enough space for FLOW_U64S * 8 bytes.
> + *
> + * Initializes the layer offsets as follows:
> + *
> + *- packet->l2_5_ofs to the
> + *  * the start of the MPLS shim header. Can be zero, if the
> + *packet is of type (OFPHTN_ETHERTYPE, ETH_TYPE_MPLS).
> + *  * UINT16_MAX when there is no MPLS shim header.
> + *
> + *- packet->l3_ofs is set to
> + *  * zero if the packet_type is in name space OFPHTN_ETHERTYPE
> + *and there is no MPLS shim header.
> + *  * just past the Ethernet header, or just past the vlan_header
> if
> + *one is present, to the first byte of the payload of the
> + *Ethernet frame if the packet type is Ethernet and there is
> + *no MPLS shim header.
> + *  * just past the MPLS label stack to the first byte of the MPLS
> + *payload if there is at least one MPLS shim header.
> + *  * UINT16_MAX if the packet type is Ethernet and the frame is
> + *too short to contain an Ethernet header.
> + *
> + *- packet->l4_ofs is set to just past the IPv4 or IPv6 header, if
> one is
> + *  present and the packet has at least the content used for the
> fields
> + *  of interest for the flow, otherwise UINT16_MAX.
> + */
>  void
>  miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
>  {
> +/* Add code to this function (or its callees) to extract new fields.
> */
> +BUILD_ASSERT_DECL(FLOW_WC_SEQ == 41);
> +
>  const struct pkt_metadata *md = >md;
>  const void *data = dp_packet_data(packet);
>  size_t size = dp_packet_size(packet);
> @@ -3182,6 +3185,11 @@ void
>  flow_compose(struct dp_packet *p, const struct flow *flow,
>   const void *l7, size_t l7_len)
>  {
> +/* Add code to this function (or its callees) for emitting new fields
> or
> + * protocols.  (This isn't essential, so it can be skipped for initial
> + * testing.) */
> +BUILD_ASSERT_DECL(FLOW_WC_SEQ == 41);
> +
>  uint32_t pseudo_hdr_csum;
>  size_t l4_len;
>
> diff --git a/lib/odp-util.c b/lib/odp-util.c
> index b6552c5c208a..6e545b80ba6c 100644
> --- a/lib/odp-util.c
> +++ b/lib/odp-util.c
> @@ -8230,14 +8230,25 @@ commit_encap_decap_action(const struct flow *flow,
>   * in addition to this function if needed.  Sets fields in 'wc' that are
>   * used as part of the action.
>   *
> - * Returns a reason to force processing the flow's packets into the
> userspace
> -