Re: [ovs-discuss] [ovs-dev] Issues configuring OVS-DPDK in openstack queens

2018-10-24 Thread O Mahony, Billy


From: Manojawa Paritala [mailto:manojaw...@biarca.com]
Sent: Tuesday, October 23, 2018 5:37 PM
To: O Mahony, Billy 
Cc: ovs-discuss@openvswitch.org; ovs-...@openvswitch.org; Subba Rao Kodavalla 
; Song, Kee SangX ; Srinivasa Goda 
; Kris Rajana 
Subject: Re: [ovs-dev] Issues configuring OVS-DPDK in openstack queens

Hi Billy,

There are no br-flat1 entries in ovsdb.

As suggested I increased the log level to debug and then tried the same 
scenario again. Though the result was the same (br-flat1 getting deleted), I 
observed the below 2 issues (I assume).
[[BO'M]] You can just increase the log level for the bridge module (all the 
extra revalidator debug is making it hard to read the logs.):
ovs-appctl vlog/set bridge:file:dbg

[[BO'M]] Also, is there logging from the neutron agent? OvS should not be 
removing br-flat1 of its own accord. Something external is updating ovsdb 
to remove its record. Is OVN in the loop here? Is there an ovn-controller or 
similar process running on the host?
I think this is important. As far as I know the decision to delete the bridge 
will not be made by vswitchd; it will be something external that removes 
the record in the ovsdb Bridge table. If the neutron agent log does not mention 
that it is doing this then maybe check the ovsdb-server log file (it may not exist 
if logging is not configured on the ovsdb-server command line – you’ll have to check 
the ovsdb-server man page and set it up if needed). The ovsdb-tool output is showing 
just the delete transaction, so we need to find out where that delete request is 
coming from.
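For example, something along these lines should show the delete transaction and any 
surrounding context (the db and log paths are only illustrative - typically 
/etc/openvswitch/conf.db or /usr/local/etc/openvswitch/conf.db depending on the install):
ovsdb-tool -mmm show-log /etc/openvswitch/conf.db | grep -i -B2 -A2 br-flat1
grep -i br-flat1 /var/log/openvswitch/ovsdb-server.log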

Issue-1 :-
1. Everything is up and running. That is, all the bridges are displayed in OVS 
and there are no issues in the logs.

[[BO'M]] Can you add the output from 'ovs-vsctl show' at this point so I can see the 
desired post-agent-restart state?
2. I add the below entries in the OVS section of neutron's 
openvswitch-agent.ini file and restart the respective service.

datapath_type=netdev
vhostuser_socket_dir=/var/run/openvswitch

3. As mentioned earlier, bridge br-flat1 is deleted. At this point in time, I 
observed the below.

3.1 br-int datapath type changed from "system" to "netdev".
3.2 Not sure if it is expected behaviour, but there were "remote MAC address changed" 
messages only for br-flat1 & br-int.

2018-10-23T14:39:40.253Z|41205|in_band|DBG|br-int: remote MAC address changed 
from 00:00:00:00:00:00 to 00:a0:c9:0e:01:01
2018-10-23T14:39:41.347Z|41374|in_band|DBG|br-flat1: remote MAC address changed 
from 00:00:00:00:00:00 to 00:a0:c9:0e:01:01
2018-10-23T14:39:48.229Z|41852|in_band|DBG|br-int: remote MAC address changed 
from 00:00:00:00:00:00 to 00:a0:c9:0e:01:01
2018-10-23T14:39:55.032Z|42008|in_band|DBG|br-int: remote MAC address changed 
from 00:00:00:00:00:00 to 00:a0:c9:0e:01:01
[[BO'M]] The datapath type change is expected. I'm not sure about the MAC address 
changes.

3.3 The interface states of br-int & eth8 (attached to br-flat1) are down.
[[BO'M]] Can you copy the o/p from 'ovs-vsctl show' again at this point?

Attaching the debug logs of ovs-vswitchd.


Issue-2 :-
1. Everything is up and running. That is, all the bridges are displayed in OVS 
and there are no issues in the logs. I have one bridge named br0, which is of netdev 
type, and I have attached a dpdk port and a vhost-user port. Everything is fine.


[[BO'M]] When you say everything is fine, br-flat1 is still missing, right? If 
that is the case let's stick with the missing-bridge issue for now. (There are 
actually quite a few issues in the vsctl output below that we can deal with after 
we figure out who deleted br-flat1.)

2. Now, in the OVS section of neutron's openvswitch-agent.ini file, I added an 
extra value "vlan1:br0" to the existing "bridge_mappings" key. The new 
key-value pair is as below. I wanted to create a new network and then map 
the bridge br0, so I added this entry.

bridge_mappings = flat:br-flat1,vxlan:br-vxlan,vlan:br-vlan,vlan1:br0

3. Now, when I restart the openvswitch-agent service, I observed that the 
datapath type of br0 changed from netdev to system. In the ovs-vsctl show 
output, I see the below.

   Bridge "br0"
Controller "tcp:127.0.0.1:6633<http://127.0.0.1:6633>"
is_connected: true
fail_mode: secure
Port "vhost-user-1"
Interface "vhost-user-1"
type: dpdkvhostuser
error: "could not add network device vhost-user-1 to ofproto 
(Invalid argument)"
Port "br0"
Interface "br0"
type: internal
Port "port0"
Interface "port0"
type: dpdk
options: {dpdk-devargs="0000:af:00.1"}
error: "could not add network device port0 to ofproto (Invalid 
argument)"
Port "phy-br0"
Interface "phy-br0"
    type: patch
options: {peer="int-br0&qu

Re: [ovs-discuss] [ovs-dev] Issues configuring OVS-DPDK in openstack queens

2018-10-23 Thread O Mahony, Billy
Hi Manojawa,

So is there any remaining br-flat entry in the ovsdb? Does it give any clue as to 
the reason – there may be a free-form ‘status’ or ‘info’ field for that purpose.

I can understand the situation where a bridge might get incorrectly configured 
but I can’t understand why it is deleted by something other than the agent.

Maybe it tries to create the bridge, hits some error, and so decides to delete it. 
Are there more detailed log levels available for the agent? You may be able to turn 
on more detailed logging for the bridge module in OvS too.

/Billy.


From: Manojawa Paritala [mailto:manojaw...@biarca.com]
Sent: Tuesday, October 23, 2018 12:16 PM
To: O Mahony, Billy 
Cc: ovs-discuss@openvswitch.org; ovs-...@openvswitch.org
Subject: Re: [ovs-dev] Issues configuring OVS-DPDK in openstack queens

Hi Billy,

Thank you for your reply.

1. Huge pages are properly set. Based on the dpdk configuration 
dpdk-socket-mem="4096,4096", 8 pages were created under /dev/hugepages.
2. dpdk-p0 is not attached to br-flat1. Actually I defined the bridge as 
br-flat1.
3. Yes, 'ovs-vsctl show' does not show br-flat1. As soon as I add the below 
entries in openvswitch-agent.ini and restart the neutron-openvswitch-agent 
service, br-flat1 is deleted. I can see that in the ovs-vswitchd logs 
and also in the output of "ovsdb-tool -mmm show-log".

datapath_type=netdev
vhostuser_socket_dir=/var/run/openvswitch

4. I do not see any errors in the neutron-openvswitch-agent logs, except for 
the below, which are displayed after the bridge is deleted.

ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
[req-99a234e3-c943-4234-8c4d-f0fdc594df8f - - - - -] Bridge br-flat1 for 
physical network flat does not exist. Agent terminated!

Thanks & Regards,
PVMJ

On Tue, Oct 23, 2018 at 3:06 PM O Mahony, Billy <billy.o.mah...@intel.com> wrote:
Hi,

I don't see any errors relating to the dpdk interfaces. But it is also not 
clear whether the user-space drivers are bound and the hugepage memory is set up, 
so double-check those two items.

Is the dpdk-p0 interface being attached to br-flat? Even if there are issues 
with the dpdk port the bridge should not be deleted (at least not automatically 
by OvS).

Can you confirm with 'ovs-vsctl show' that br-flat is actually not present 
after the agent is restarted, and that dpdk-p0 is not reporting an error?
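E.g. (the interface name here is whatever you created it as):
ovs-vsctl show
ovs-vsctl list Interface dpdk-p0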

What do the neutron-openvswitch-agent logs say?

Also run 'ovsdb-tool -mmm show-log', which might give a clue as to when and how 
br-flat is being modified.

Regards,
Billy

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org<mailto:ovs-dev-boun...@openvswitch.org> 
> [mailto:ovs-dev-<mailto:ovs-dev->
> boun...@openvswitch.org<mailto:boun...@openvswitch.org>] On Behalf Of 
> Manojawa Paritala
> Sent: Monday, October 22, 2018 3:31 PM
> To: ovs-discuss@openvswitch.org<mailto:ovs-discuss@openvswitch.org>; 
> ovs-...@openvswitch.org<mailto:ovs-...@openvswitch.org>
> Subject: [ovs-dev] Issues configuring OVS-DPDK in openstack queens
>
> Hello All,
>
> On a 3-node setup (one controller + 2 computes), we configured Openstack Queens
> using OSA with OVS. On all the nodes, we defined br-mgmt as a linux bridge,
> br-tun as the private network and br-flat as external.
> Installation was successful and we could create networks and instances on
> Openstack.
>
> Below are the versions of the OVS packages used on each node.
>
> Controller :- openstack-vswitch - 2.9.0
> Computes :- openstack-vswitch-dpdk - 2.9.0 (as we wanted to configure dpdk on
> the compute hosts)
>
> The openstack-vswitch-dpdk 2.9.0 package that we installed had dpdk version
> 17.11.3. When we tried to enable DPDK it failed with the below error.
>
> dpdk|ERR|DPDK not supported in this copy of Open vSwitch
>
> So, we downloaded the sources for dpdk 17.11.4 and openvswitch 2.9.2, built
> openvswitch with dpdk as suggested in the below official link.
> No issues on Openstack or OVS.
> http://docs.openvswitch.org/en/latest/intro/install/dpdk/
>
> Then, we added the below parameters to OVS and everything looked ok.
> No issues in Openstack or OVS.
>
> $ovs-vsctl get Open_vSwitch . other_config {dpdk-extra="-n 2", 
> dpdk-init="true",
> dpdk-lcore-mask="0x3000", dpdk-socket-mem="4096,4096", pmd-
> cpu-mask="0xf3c", vhost-iommu-support="true"}
>
> Then on the compute node, in openvswitch_agent.ini file - OVS section, I added
> the below (based on the link
> https://docs.openstack.org/neutron/pike/contributor/internals/ovs_vhostuser.h
> tml
> )
> and restarted neutron-openmvswitch-agent service.
>
> datapath_type=netdev
> vhostuser_socket_dir=/var/run/openvswitch
>
> After the a

Re: [ovs-discuss] Mega-flow generation

2018-08-30 Thread O Mahony, Billy
Hi Sara,

A colleague of mine had a lot of difficulty using dpctl/add-flow to add flows 
directly to the datapath. And the man page is a little lukewarm on its usage 
too, albeit you are just debugging (ovs-vswitchd (8) - DATAPATH FLOW TABLE 
DEBUGGING COMMANDS). Also you will find that whatever rules you can insert 
will be very shortly deleted again by the revalidator thread, as they will not 
correspond to any existing ofproto rule.

You should set up flows using ofctl, then generate packets to match those flows, 
and you should be able to monitor the netlink messages exchanged between 
vswitchd and the openvswitch .ko as the megaflow is installed to the kernel module. 
I don't know how you might monitor the netlink messages - ip-monitor (8) is a 
possibility. Turning on vswitchd debug in the relevant module is another 
possibility.
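For example, something like this (bridge name and port numbers are only illustrative):
ovs-ofctl add-flow br0 "in_port=1,ip,nw_dst=10.0.0.0/24,actions=output:2"
# offer matching packets, then dump the datapath (mega)flows that get installed:
ovs-appctl dpctl/dump-flows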

Hope some of that helps,

Billy.



From: Sara Gittlin [mailto:sara.gitt...@gmail.com]
Sent: Thursday, August 30, 2018 9:44 AM
To: O Mahony, Billy 
Cc: Ben Pfaff ; ovs-discuss@openvswitch.org
Subject: Re: [ovs-discuss] Mega-flow generation

I think I can 'ride on' the callback "dpctl_put_flow", which is invoked 
whenever the CLI 'ovs-dpctl add-flow' is executed. Correct?

On Thu, Aug 30, 2018 at 10:32 AM Sara Gittlin <sara.gitt...@gmail.com> wrote:
Billy,
Can you please refer me to the code where we start the megaflow creation for the 
kernel upon an upcall, and where we send it to the kernel via the netlink 
socket?
What I want to do is to generate megaflows based on openflow tables, not 
necessarily upon an upcall (that is, not reactive mode), and to send them to the 
kernel.
Another question is, who is responsible for associating a specific megaflow with 
the appropriate table in the cache? Is it the kernel module or vswitchd?
Thank you
-Sara


On Tue, Aug 28, 2018 at 1:12 PM Sara Gittlin <sara.gitt...@gmail.com> wrote:
Thank you Ben and Billy
-Sara

On Tue, Aug 28, 2018 at 11:27 AM O Mahony, Billy <billy.o.mah...@intel.com> wrote:
Hi Sara,

This article 
https://software.intel.com/en-us/articles/ovs-dpdk-datapath-classifier gives a 
practical overview of how megaflows, aka wildcarded or datapath flows, work, at 
least in the ovs-dpdk (userspace datapath) context.

Regards,
Billy

> -Original Message-
> From: 
> ovs-discuss-boun...@openvswitch.org<mailto:ovs-discuss-boun...@openvswitch.org>
>  [mailto:ovs-discuss-<mailto:ovs-discuss->
> boun...@openvswitch.org<mailto:boun...@openvswitch.org>] On Behalf Of Ben 
> Pfaff
> Sent: Monday, August 27, 2018 5:10 PM
> To: Sara Gittlin mailto:sara.gitt...@gmail.com>>
> Cc: ovs-discuss@openvswitch.org<mailto:ovs-discuss@openvswitch.org>
> Subject: Re: [ovs-discuss] Mega-flow generation
>
> On Mon, Aug 27, 2018 at 02:46:19PM +0300, Sara Gittlin wrote:
> > Can someone refer me to the code of the megaflow generation process ?
> > Is this process  invoked by an upcall from the kernel module ? like in
> > microflow ?
>
> Did you read the OVS paper?  It's all about megaflows.
> http://www.openvswitch.org/support/papers/nsdi2015.pdf
> ___
> discuss mailing list
> disc...@openvswitch.org<mailto:disc...@openvswitch.org>
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] ovs w dpdk live lock?

2018-08-29 Thread O Mahony, Billy
Thanks, David. I’ll re-create the scenario here tomorrow and see what happens.

From: davidjoshuaev...@gmail.com [mailto:davidjoshuaev...@gmail.com]
Sent: Wednesday, August 29, 2018 2:16 PM
To: O Mahony, Billy 
Cc: ovs-discuss@openvswitch.org
Subject: Re: [ovs-discuss] ovs w dpdk live lock?

Thats correct Billy.

Say for instance you have pings from an internal vPort coming through to a DPDK 
port on a netdev bridge (an openflow rule for each direction of traffic, no 
mac learning).
Pings work fine till you unplug the other device.
Then traffic will come in bursts and have massive latency, up to 3 seconds or 
thereabouts.
Now, from the callback check_link_status down the stack... 
ixgbe_setup_mac_link_multispeed_fiber is 'supposed' to be non-blocking... but I 
see we end up in rte_delay_us_block, and it looks to me that the 'main' thread 
gets held up in that activity.

I hope I am not leading you up the garden path, but that seems to me to be 
where it's at.

Dave.






On Wed, Aug 29, 2018 at 7:37 AM O Mahony, Billy <billy.o.mah...@intel.com> wrote:
Hi David,

I just saw this email now. Thanks, that’s interesting.

Can you confirm that the issue is a transient (approx. 3 s) interruption in 
traffic when the cable of another port (not involved in the test traffic) is 
unplugged? Then I will try to recreate it here.

Also I’m re-adding the ovs-discuss list to the cc. “Don’t drop the list” is a 
regular comment on ovs-*.

Regards,
Billy

From: davidjoshuaev...@gmail.com<mailto:davidjoshuaev...@gmail.com> 
[mailto:davidjoshuaev...@gmail.com<mailto:davidjoshuaev...@gmail.com>]
Sent: Tuesday, August 28, 2018 9:07 PM
To: O Mahony, Billy mailto:billy.o.mah...@intel.com>>
Subject: Re: [ovs-discuss] ovs w dpdk live lock?

also...
perhaps related to stats collection

#0  0x7fe86789c0e2 in rte_delay_us_block () from /lib64/librte_eal.so.6
#1  0x7fe8624940c1 in ixgbe_setup_mac_link_multispeed_fiber () from 
/lib64/librte_pmd_ixgbe.so.2
#2  0x7fe8624b5684 in ixgbe_dev_link_update () from 
/lib64/librte_pmd_ixgbe.so.2
#3  0x7fe8673f52ad in rte_eth_link_get_nowait () from 
/lib64/librte_ethdev.so.8
#4  0x560fa8333905 in check_link_status ()
#5  0x560fa8333a53 in netdev_dpdk_get_carrier ()
#6  0x560fa8333abb in netdev_dpdk_get_stats ()
#7  0x560fa8282adb in netdev_get_stats ()
#8  0x560fa81f1ca8 in iface_refresh_stats.part.26 ()
#9  0x560fa81fad68 in bridge_run ()
#10 0x560fa81efc9d in main ()

main thread seems to end up here a lot...


On Mon, Aug 27, 2018 at 11:14 AM David Evans <davidjoshuaev...@gmail.com> wrote:
Hi Billy,
I did some more tests.
It appears to happen when I unplug the optics from the SFP.
The dpdk is 17.11.3 and ovs 2.9.2 source, but built as RPMs and installed 
from the RPM.
I don't know why the RPM build reports 17.11.0 - this might be a bug in 
the RPM script.

As you can see from below, I don't even have rules on the switch.
If I add rules to the switch just to force traffic from one port to another, 
there will be traffic interruption - multiple seconds of latency against the ports.

The startup script looks like this, after attaching all viable devices to igb_uio:

ovsdb-tool create $OVS_DBDIR/conf.db $OVS_SCHEMA
echo "Starting the OVS database server"
ovsdb-server --remote=punix:"$DB_SOCK" \
 --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
 --pidfile --detach

ovs-vsctl --no-wait init

echo ""
echo "start ovs   "
echo ""
echo "Telling the OVS controller to start OVS with DPDK using 512MB hugepage 
memory and run the ovswitchd daemon on logical core 1"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true \
 other_config:dpdk-lcore-mask=0x1E other_config:dpdk-socket-mem="512"

echo "Starting the OVS daemon..."
ovs-vswitchd unix:$DB_SOCK --pidfile 
--log-file=/var/log/openvswitch/vswitchd.log --detach
sleep 2
echo "Before creating the DPDK ports, tell OVS to use Core 2 for the DPDK PMD"
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1E
ovs-vsctl add-port br0 dpdkx0 -- set Interface dpdkx0 type=dpdk options:dpdk-devargs=0000:0a:00.0 options:n_rxq=2
ovs-vsctl add-port br0 dpdkx1 -- set Interface dpdkx1 type=dpdk options:dpdk-devargs=0000:0a:00.1 options:n_rxq=2
ovs-vsctl add-port br0 dpdkx2 -- set Interface dpdkx2 type=dpdk options:dpdk-devargs=0000:0b:00.0
ovs-vsctl add-port br0 dpdkx3 -- set Interface dpdkx3 type=dpdk options:dpdk-devargs=0000:0b:00.1

echo "**Clearing current flows"
ovs-ofctl del-flows br0

You don't need any traffic to get these kinds of messages, just unplug the fibre.
2018-08-27T15:41:27.629Z|00483|timeval|WARN|Unreasonably long 2961ms poll 
interval (998ms user, 0ms system)
2018-08-27T15:41:27.629Z|00484|timeval|WARN|context switches: 

Re: [ovs-discuss] ovs w dpdk live lock?

2018-08-29 Thread O Mahony, Billy
Hi David,

I just saw this email now. Thanks, that’s interesting.

Can you confirm that the issue is a transient (approx. 3 s) interruption in 
traffic when the cable of another port (not involved in the test traffic) is 
unplugged? Then I will try to recreate it here.

Also I’m re-adding the ovs-discuss list to the cc. “Don’t drop the list” is a 
regular comment on ovs-*.

Regards,
Billy

From: davidjoshuaev...@gmail.com [mailto:davidjoshuaev...@gmail.com]
Sent: Tuesday, August 28, 2018 9:07 PM
To: O Mahony, Billy 
Subject: Re: [ovs-discuss] ovs w dpdk live lock?

also...
perhaps related to stats collection

#0  0x7fe86789c0e2 in rte_delay_us_block () from /lib64/librte_eal.so.6
#1  0x7fe8624940c1 in ixgbe_setup_mac_link_multispeed_fiber () from 
/lib64/librte_pmd_ixgbe.so.2
#2  0x7fe8624b5684 in ixgbe_dev_link_update () from 
/lib64/librte_pmd_ixgbe.so.2
#3  0x7fe8673f52ad in rte_eth_link_get_nowait () from 
/lib64/librte_ethdev.so.8
#4  0x560fa8333905 in check_link_status ()
#5  0x560fa8333a53 in netdev_dpdk_get_carrier ()
#6  0x560fa8333abb in netdev_dpdk_get_stats ()
#7  0x560fa8282adb in netdev_get_stats ()
#8  0x560fa81f1ca8 in iface_refresh_stats.part.26 ()
#9  0x560fa81fad68 in bridge_run ()
#10 0x560fa81efc9d in main ()

main thread seems to end up here a lot...


On Mon, Aug 27, 2018 at 11:14 AM David Evans <davidjoshuaev...@gmail.com> wrote:
Hi Billy,
I did some more tests.
It appears to happen when I unplug the optics from the SFP.
The dpdk is 17.11.3 and ovs 2.9.2 source, but built as RPMs and installed 
from the RPM.
I don't know why the RPM build reports 17.11.0 - this might be a bug in 
the RPM script.

As you can see from below, I don't even have rules on the switch.
If I add rules to the switch just to force traffic from one port to another, 
there will be traffic interruption - multiple seconds of latency against the ports.

The startup script looks like this, after attaching all viable devices to igb_uio:

ovsdb-tool create $OVS_DBDIR/conf.db $OVS_SCHEMA
echo "Starting the OVS database server"
ovsdb-server --remote=punix:"$DB_SOCK" \
 --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
 --pidfile --detach

ovs-vsctl --no-wait init

echo ""
echo "start ovs   "
echo ""
echo "Telling the OVS controller to start OVS with DPDK using 512MB hugepage 
memory and run the ovswitchd daemon on logical core 1"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true \
 other_config:dpdk-lcore-mask=0x1E other_config:dpdk-socket-mem="512"

echo "Starting the OVS daemon..."
ovs-vswitchd unix:$DB_SOCK --pidfile 
--log-file=/var/log/openvswitch/vswitchd.log --detach
sleep 2
echo "Before creating the DPDK ports, tell OVS to use Core 2 for the DPDK PMD"
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1E
ovs-vsctl add-port br0 dpdkx0 -- set Interface dpdkx0 type=dpdk options:dpdk-devargs=0000:0a:00.0 options:n_rxq=2
ovs-vsctl add-port br0 dpdkx1 -- set Interface dpdkx1 type=dpdk options:dpdk-devargs=0000:0a:00.1 options:n_rxq=2
ovs-vsctl add-port br0 dpdkx2 -- set Interface dpdkx2 type=dpdk options:dpdk-devargs=0000:0b:00.0
ovs-vsctl add-port br0 dpdkx3 -- set Interface dpdkx3 type=dpdk options:dpdk-devargs=0000:0b:00.1

echo "**Clearing current flows"
ovs-ofctl del-flows br0

You don't need any traffic to get these kinds of messages, just unplug the fibre.
2018-08-27T15:41:27.629Z|00483|timeval|WARN|Unreasonably long 2961ms poll 
interval (998ms user, 0ms system)
2018-08-27T15:41:27.629Z|00484|timeval|WARN|context switches: 1 voluntary, 105 
involuntary
2018-08-27T15:41:35.469Z|00485|timeval|WARN|Unreasonably long 2840ms poll 
interval (955ms user, 0ms system)
2018-08-27T15:41:35.469Z|00486|timeval|WARN|context switches: 1 voluntary, 99 
involuntary
2018-08-27T15:41:43.469Z|00487|timeval|WARN|Unreasonably long 3000ms poll 
interval (1000ms user, 0ms system)
2018-08-27T15:41:43.469Z|00488|timeval|WARN|context switches: 1 voluntary, 106 
involuntary
2018-08-27T15:41:51.389Z|00489|timeval|WARN|Unreasonably long 2920ms poll 
interval (996ms user, 0ms system)
2018-08-27T15:41:51.389Z|00490|timeval|WARN|context switches: 1 voluntary, 102 
involuntary
2018-08-27T15:41:58.369Z|00491|timeval|WARN|Unreasonably long 1980ms poll 
interval (992ms user, 0ms system)
2018-08-27T15:41:58.369Z|00492|timeval|WARN|context switches: 0 voluntary, 101 
involuntary
2018-08-27T15:41:58.369Z|00493|coverage|INFO|Dropped 4 log messages in last 31 
seconds (most recently, 7 seconds ago) due to excessive rate
2018-08-27T15:41:58.369Z|00494|coverage|INFO|Event coverage, avg rate over 
last: 5 seconds, last minute, last hour,  hash=dc43467d:
2018-08-27T15:41:58.369Z|00495|coverage|INFO|bridge_reconfigure 0.0/sec 
0.000/sec   

Re: [ovs-discuss] ovs w dpdk live lock?

2018-08-29 Thread O Mahony, Billy
Hi David,

From: davidjoshuaev...@gmail.com [mailto:davidjoshuaev...@gmail.com]
Sent: Monday, August 27, 2018 5:14 PM
To: O Mahony, Billy 
Subject: Re: [ovs-discuss] ovs w dpdk live lock?

Hi Billy,
I did some more tests.
It appears to happen when I unplug the optics from the SFP.
[[BO'M]] Can you be a little clearer on exactly what happens? You appear to 
have 2 dual-port NICs configured. Which ports are being offered traffic, where 
are you expecting the packets to appear, how are you generating the packets, 
and how are you detecting whether they arrive at the intended destination or not?
The dpdk is 17.11.3 and ovs 2.9.2 source, but built as RPMs and installed 
from the RPM.
I don't know why the RPM build reports 17.11.0 - this might be a bug in 
the RPM script.

As you can see from below, I don't even have rules on the switch.
[[BO'M]] If there are no rules on the switch then no packets should be 
forwarded. If you want the bridge to act like a default l2-learning switch you 
have to explicitly add action=NORMAL to the bridge. Otherwise, at least on my 
setup, it appears that any incoming packets will be dropped.
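E.g. (assuming the bridge is named br0):
ovs-ofctl add-flow br0 actions=NORMAL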
If I add rules to the switch just to force traffic from one port to another, 
there will be traffic interruption - multiple seconds of latency against the ports.

The startup script looks like this, after attaching all viable devices to igb_uio:

ovsdb-tool create $OVS_DBDIR/conf.db $OVS_SCHEMA
echo "Starting the OVS database server"
ovsdb-server --remote=punix:"$DB_SOCK" \
 --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
 --pidfile --detach

ovs-vsctl --no-wait init

echo ""
echo "start ovs   "
echo ""
echo "Telling the OVS controller to start OVS with DPDK using 512MB hugepage 
memory and run the ovswitchd daemon on logical core 1"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true \
 other_config:dpdk-lcore-mask=0x1E other_config:dpdk-socket-mem="512"

echo "Starting the OVS daemon..."
ovs-vswitchd unix:$DB_SOCK --pidfile 
--log-file=/var/log/openvswitch/vswitchd.log --detach
sleep 2
echo "Before creating the DPDK ports, tell OVS to use Core 2 for the DPDK PMD"
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1E
ovs-vsctl add-port br0 dpdkx0 -- set Interface dpdkx0 type=dpdk options:dpdk-devargs=0000:0a:00.0 options:n_rxq=2
ovs-vsctl add-port br0 dpdkx1 -- set Interface dpdkx1 type=dpdk options:dpdk-devargs=0000:0a:00.1 options:n_rxq=2
ovs-vsctl add-port br0 dpdkx2 -- set Interface dpdkx2 type=dpdk options:dpdk-devargs=0000:0b:00.0
ovs-vsctl add-port br0 dpdkx3 -- set Interface dpdkx3 type=dpdk options:dpdk-devargs=0000:0b:00.1

[[BO'M]] You shouldn’t have overlap between the lcore mask and the pmd mask. 
Although I can’t see that causing anything like your issue, change them so they 
don't overlap.
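For example, something like this (the core masks are only illustrative - pick cores that suit your topology):
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x2
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1C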

echo "**Clearing current flows"
ovs-ofctl del-flows br0

You don't need any traffic to get these kinds of messages, just unplug the fibre.
2018-08-27T15:41:27.629Z|00483|timeval|WARN|Unreasonably long 2961ms poll 
interval (998ms user, 0ms system)
2018-08-27T15:41:27.629Z|00484|timeval|WARN|context switches: 1 voluntary, 105 
involuntary
2018-08-27T15:41:35.469Z|00485|timeval|WARN|Unreasonably long 2840ms poll 
interval (955ms user, 0ms system)
2018-08-27T15:41:35.469Z|00486|timeval|WARN|context switches: 1 voluntary, 99 
involuntary
2018-08-27T15:41:43.469Z|00487|timeval|WARN|Unreasonably long 3000ms poll 
interval (1000ms user, 0ms system)
2018-08-27T15:41:43.469Z|00488|timeval|WARN|context switches: 1 voluntary, 106 
involuntary
2018-08-27T15:41:51.389Z|00489|timeval|WARN|Unreasonably long 2920ms poll 
interval (996ms user, 0ms system)
2018-08-27T15:41:51.389Z|00490|timeval|WARN|context switches: 1 voluntary, 102 
involuntary
2018-08-27T15:41:58.369Z|00491|timeval|WARN|Unreasonably long 1980ms poll 
interval (992ms user, 0ms system)
2018-08-27T15:41:58.369Z|00492|timeval|WARN|context switches: 0 voluntary, 101 
involuntary
2018-08-27T15:41:58.369Z|00493|coverage|INFO|Dropped 4 log messages in last 31 
seconds (most recently, 7 seconds ago) due to excessive rate
2018-08-27T15:41:58.369Z|00494|coverage|INFO|Event coverage, avg rate over 
last: 5 seconds, last minute, last hour,  hash=dc43467d:
2018-08-27T15:41:58.369Z|00495|coverage|INFO|bridge_reconfigure 0.0/sec 
0.000/sec0.0025/sec   total: 9
2018-08-27T15:41:58.369Z|00496|coverage|INFO|ofproto_flush  0.0/sec 
0.000/sec0.0003/sec   total: 1
2018-08-27T15:41:58.369Z|00497|coverage|INFO|ofproto_recv_openflow  0.0/sec 
0.000/sec0.0008/sec   total: 6
2018-08-27T15:41:58.369Z|00498|coverage|INFO|ofproto_update_port0.0/sec 
0.033/sec0.0036/sec   total: 13
2018-08-27T15:41:58.369Z|00499|coverage|INFO|rev_reconfigure 

Re: [ovs-discuss] Requested device cannot be used

2018-08-28 Thread O Mahony, Billy
> -Original Message-
> From: 聶大鈞 [mailto:tcn...@iii.org.tw]
> Sent: Tuesday, August 28, 2018 11:52 AM
> To: O Mahony, Billy ; ovs-discuss@openvswitch.org
> Subject: RE: [ovs-discuss] Requested device cannot be used
> 
> Hello Billy,
> 
> Thanks for your suggestion, It does solve my problem.
> 
> Here comes a further question, I did notice that this NIC card is allocated to
> NUMA 1 (my second NUMA node).
> Hence, I setup socket-mem by trying following commands:
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-
> mem="0,1024", or ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-
> socket-mem=1024, or ovs-vsctl --no-wait set Open_vSwitch .
> other_config:dpdk-socket-mem="1024,1024"
> All these commands could not solve the problem.
> 
[[BO'M]] I would have thought that 1024,1024 should work, as you do have 2048 
pages configured. However, it could be that the allocation was not even across 
the two NUMA nodes, and that on one of the nodes not all the requested pages were 
allocated (due to there not being enough large contiguous physical memory 
free).

I usually do 'echo 1024 > 
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages' to make 
my hugepage allocations and then read back from that file and .../free_hugepages to 
verify the actual allocations (note those figures are denominated in pages, NOT 
MB as dpdk-socket-mem is).
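E.g. for 2MB pages on a two-node system (the page counts are just an example):
echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
echo 1024 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
cat /sys/devices/system/node/node*/hugepages/hugepages-2048kB/free_hugepages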

> Could you please tell me what the difference between yours (512,512) and
> mine (1024,1024) is?
> 
> Finally, thanks for your help again, when things go stable, I'll adjust 
> pmd-cpu-
> mask for performance.
> 
> Best Regard
> 
> Tcnieh
> 
> 
> -Original Message-
> From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> Sent: Tuesday, August 28, 2018 5:09 PM
> To: tcn...@iii.org.tw; ovs-discuss@openvswitch.org
> Subject: RE: [ovs-discuss] Requested device cannot be used
> 
> Hi Tcnieh,
> 
> 
> 
> Looks like your NICs are on NUMA 1 (the second NUMA node) – as their PCI bus
> number is > 80.
> 
> 
> 
> But you have not told OvS to allocate hugepage memory on the second numa
> node – the 0 in “--socket-mem 1024,0).”
> 
> 
> 
> So you need to change your line to something like:
> 
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-
> mem="512,512"
> 
> 
> 
> to have Hugepages available on both nodes.
> 
> 
> 
> Also you have allocated just a single core (core 0) for DPDK PMDs. It is also
> unusual to allocate core zero. That should work but with reduced performance
> as the PMD (on NUMA0) will have to access the packet data on NUMA1.
> 
> 
> 
> Have a look at your cpu topology. And modify your core-mask to allocate a core
> from NUMA1 also.
> 
> 
> 
> The details are in the docs: Documentation/topics/dpdk/* and
> Documentation/howto/dpdk.rst.
> 
> 
> 
> Regards,
> 
> Billy
> 
> 
> 
> 
> 
> From: ovs-discuss-boun...@openvswitch.org [mailto:ovs-discuss-
> boun...@openvswitch.org] On Behalf Of ???
> Sent: Tuesday, August 28, 2018 3:37 AM
> To: ovs-discuss@openvswitch.org
> Subject: [ovs-discuss] Requested device cannot be used
> 
> 
> 
> Hello all,
>   I am trying to get the performance of intel x520 10G NIC over Dell 
> R630/R730,
> but I keep getting an unexpected error, please see below.
> 
> I followed the instruction of https://goo.gl/T7iTuk <https://goo.gl/T7iTuk>  
> to
> compiler the DPDK and OVS code. I've successfully binded both my x520 NIC
> ports to DPDK, using either igb_uio or vfio_pci:
> 
> ~~
> Network devices using DPDK-compatible driver
> 
> 0000:82:00.0 'Ethernet 10G 2P X520 Adapter 154d' drv=igb_uio unused=vfio-pci
> 0000:82:00.1 'Ethernet 10G 2P X520 Adapter 154d' drv=igb_uio unused=vfio-pci
> 
> Network devices using kernel driver
> ===
> 0000:01:00.0 'NetXtreme BCM5720 Gigabit Ethernet PCIe 165f' if=eno1 drv=tg3
> unused=igb_uio,vfio-pci
> 0000:01:00.1 'NetXtreme BCM5720 Gigabit Ethernet PCIe 165f' if=eno2 drv=tg3
> unused=igb_uio,vfio-pci
> 0000:02:00.0 'NetXtreme BCM5720 Gigabit Ethernet PCIe 165f' if=eno3 drv=tg3
> unused=igb_uio,vfio-pci
> 0000:02:00.1 'NetXtreme BCM5720 Gigabit Ethernet PCIe 165f' if=eno4 drv=tg3
> unused=igb_uio,vfio-pci *Active*
> 
> Other Network devices
> =
> 
> ~~~
> 
> And the hugepage was set to 2048 * 2M
> ~~~
> HugePages_Total:2048
> HugePages_Free: 1024
> HugePages_Rsvd:0
> HugePages_Surp:0
> Hugepag

Re: [ovs-discuss] Requested device cannot be used

2018-08-28 Thread O Mahony, Billy
Hi Tcnieh,

Looks like your NICs are on NUMA 1 (the second NUMA node) – as their PCI bus number 
is > 80.

But you have not told OvS to allocate hugepage memory on the second numa node – 
the 0 in “--socket-mem 1024,0).”

So you need to change your line to something like:
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="512,512"

to have Hugepages available on both nodes.

Also you have allocated just a single core (core 0) for DPDK PMDs. It is also 
unusual to allocate core zero. That should work but with reduced performance as 
the PMD (on NUMA0) will have to access the packet data on NUMA1.

Have a look at your cpu topology. And modify your core-mask to allocate a core 
from NUMA1 also.

The details are in the docs: Documentation/topics/dpdk/* and 
Documentation/howto/dpdk.rst.
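For example, if cores 0-15 were on NUMA0 and cores 16-31 on NUMA1 (the numbering 
varies per platform - check 'lscpu' or 'numactl -H' first), a mask picking one core 
on each node would be something like:
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10002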

Regards,
Billy


From: ovs-discuss-boun...@openvswitch.org 
[mailto:ovs-discuss-boun...@openvswitch.org] On Behalf Of ???
Sent: Tuesday, August 28, 2018 3:37 AM
To: ovs-discuss@openvswitch.org
Subject: [ovs-discuss] Requested device cannot be used


Hello all,

  I am trying to get the performance of an Intel X520 10G NIC on Dell R630/R730, 
but I keep getting an unexpected error, please see below.



I followed the instructions at https://goo.gl/T7iTuk to compile the DPDK and 
OVS code. I've successfully bound both my X520 NIC ports to DPDK, using either 
igb_uio or vfio_pci:



~~

Network devices using DPDK-compatible driver



0000:82:00.0 'Ethernet 10G 2P X520 Adapter 154d' drv=igb_uio unused=vfio-pci

0000:82:00.1 'Ethernet 10G 2P X520 Adapter 154d' drv=igb_uio unused=vfio-pci



Network devices using kernel driver

===

0000:01:00.0 'NetXtreme BCM5720 Gigabit Ethernet PCIe 165f' if=eno1 drv=tg3 unused=igb_uio,vfio-pci

0000:01:00.1 'NetXtreme BCM5720 Gigabit Ethernet PCIe 165f' if=eno2 drv=tg3 unused=igb_uio,vfio-pci

0000:02:00.0 'NetXtreme BCM5720 Gigabit Ethernet PCIe 165f' if=eno3 drv=tg3 unused=igb_uio,vfio-pci

0000:02:00.1 'NetXtreme BCM5720 Gigabit Ethernet PCIe 165f' if=eno4 drv=tg3 unused=igb_uio,vfio-pci *Active*



Other Network devices

=



~~~



And the hugepage was set to 2048 * 2M

~~~

HugePages_Total:2048

HugePages_Free: 1024

HugePages_Rsvd:0

HugePages_Surp:0

Hugepagesize:   2048 kB

~~~



Here comes the problem: when I tried to init the ovsdb-server and ovs-vswitchd, 
I got the following error:

~~~

   2018-08-27T09:54:05.548Z|2|ovs_numa|INFO|Discovered 16 CPU cores on NUMA 
node 0

   2018-08-27T09:54:05.548Z|3|ovs_numa|INFO|Discovered 16 CPU cores on NUMA 
node 1

   2018-08-27T09:54:05.548Z|4|ovs_numa|INFO|Discovered 2 NUMA nodes and 32 
CPU cores

   
2018-08-27T09:54:05.548Z|5|reconnect|INFO|unix:/usr/local/var/run/openvswitch/db.sock:
 connecting...

   2018-08-27T09:54:05.549Z|6|reconnect|INFO|unix:/usr/local/var/run/openvswitch/db.sock: connected

   2018-08-27T09:54:05.552Z|7|dpdk|INFO|DPDK Enabled - initializing...

   2018-08-27T09:54:05.552Z|8|dpdk|INFO|No vhost-sock-dir provided - 
defaulting to /usr/local/var/run/openvswitch

   2018-08-27T09:54:05.552Z|9|dpdk|INFO|EAL ARGS: ovs-vswitchd --socket-mem 
1024,0 -c 0x0001

   2018-08-27T09:54:05.553Z|00010|dpdk|INFO|EAL: Detected 32 lcore(s)

   2018-08-27T09:54:05.558Z|00011|dpdk|WARN|EAL: No free hugepages reported in 
hugepages-1048576kB

   2018-08-27T09:54:05.559Z|00012|dpdk|INFO|EAL: Probing VFIO support...

   2018-08-27T09:54:06.700Z|00013|dpdk|INFO|EAL: PCI device 0000:82:00.0 on NUMA socket 1

   2018-08-27T09:54:06.700Z|00014|dpdk|INFO|EAL:   probe driver: 8086:154d net_ixgbe

   2018-08-27T09:54:06.700Z|00015|dpdk|ERR|EAL: Requested device 0000:82:00.0 cannot be used

   2018-08-27T09:54:06.700Z|00016|dpdk|INFO|EAL: PCI device 0000:82:00.1 on NUMA socket 1

   2018-08-27T09:54:06.700Z|00017|dpdk|INFO|EAL:   probe driver: 8086:154d net_ixgbe

   2018-08-27T09:54:06.700Z|00018|dpdk|ERR|EAL: Requested device 0000:82:00.1 cannot be used

   2018-08-27T09:54:06.701Z|00019|dpdk|INFO|DPDK Enabled - initialized

   2018-08-27T09:54:06.705Z|00020|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath 
supports recirculation

~~~



Therefore, I also got the same error when I added a dpdk-port:

~~~

2018-08-27T09:54:06.709Z|00036|dpdk|INFO|EAL: PCI device 0000:82:00.0 on NUMA socket 1

2018-08-27T09:54:06.709Z|00037|dpdk|INFO|EAL:   probe driver: 8086:154d net_ixgbe

2018-08-27T09:54:06.710Z|00038|dpdk|WARN|EAL: Requested device 0000:82:00.0 cannot be used

2018-08-27T09:54:06.710Z|00039|dpdk|ERR|EAL: Driver cannot attach the device (0000:82:00.0)


Re: [ovs-discuss] ovs-dpdk crash when use vhost-user in docker

2018-08-21 Thread O Mahony, Billy
Hi,

One thing to look out for with DPDK < 18.05 is that you need to use 1GB huge 
pages (and no more than eight of them) to use virtio. I’m not sure if that is 
the issue you have, as I don’t remember it causing a seg fault, but it 
is certainly worth checking.
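E.g. something along these lines (sizes are only illustrative):
# kernel cmdline: default_hugepagesz=1G hugepagesz=1G hugepages=8
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="4096,4096"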

If that does not work please send the info Ciara refers to as well as the 
ovs-vsctl interface config for the ovs vhost backend.
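E.g. (the port name is whatever you created it as):
ovs-vsctl list Interface vhost-user0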

Thanks,
Billy

From: ovs-discuss-boun...@openvswitch.org 
[mailto:ovs-discuss-boun...@openvswitch.org] On Behalf Of Loftus, Ciara
Sent: Tuesday, August 21, 2018 9:06 AM
To: gmzhan...@gmail.com; ovs-discuss@openvswitch.org
Cc: us...@dpdk.org
Subject: Re: [ovs-discuss] ovs-dpdk crash when use vhost-user in docker

Hi,

I am cc-ing the DPDK users’ list as the SEGV originates in the DPDK vHost code 
and somebody there might be able to help too.
Could you provide more information about your environment please? eg. OVS & 
DPDK versions, hugepage configuration, etc.

Thanks,
Ciara

From: 
ovs-discuss-boun...@openvswitch.org 
[mailto:ovs-discuss-boun...@openvswitch.org] On Behalf Of ???
Sent: Monday, August 20, 2018 12:06 PM
To: ovs-discuss@openvswitch.org
Subject: [ovs-discuss] ovs-dpdk crash when use vhost-user in docker

Hi,

   I used ovs-dpdk as the bridge and l2fwd in a container. When l2fwd was run, 
ovs-dpdk crashed.

My command is :

docker run -it --privileged --name=dpdk-docker  -v /dev/hugepages:/mnt/huge 
-v /usr/local/var/run/openvswitch:/var/run/openvswitch dpdk-docker

./l2fwd -c 0x06 -n 4  --socket-mem=1024  --no-pci 
--vdev=net_virtio_user0,mac=00:00:00:00:00:05,path=/var/run/openvswitch/vhost-user0
  
--vdev=net_virtio_user1,mac=00:00:00:00:00:01,path=/var/run/openvswitch/vhost-user1
 -- -p 0x3



The crash log



Program terminated with signal 11, Segmentation fault.

#0  0x00445828 in malloc_elem_alloc ()

Missing separate debuginfos, use: debuginfo-install 
glibc-2.17-196.el7_4.2.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 
krb5-libs-1.15.1-8.el7.x86_64 libcap-ng-0.7.5-4.el7.x86_64 
libcom_err-1.42.9-10.el7.x86_64 libgcc-4.8.5-16.el7_4.1.x86_64 
libpcap-1.5.3-9.el7.x86_64 libselinux-2.5-12.el7.x86_64 
numactl-libs-2.0.9-6.el7_2.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 
pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64

(gdb) bt

#0  0x00445828 in malloc_elem_alloc ()

#1  0x00445e5d in malloc_heap_alloc ()

#2  0x00444c74 in rte_zmalloc ()

#3  0x006c16bf in vhost_new_device ()

#4  0x006bfaf4 in vhost_user_add_connection ()

#5  0x006beb88 in fdset_event_dispatch ()

#6  0x7f613b288e25 in start_thread () from /usr/lib64/libpthread.so.0

#7  0x7f613a86b34d in clone () from /usr/lib64/libc.so.6



My OVS  version is 2.9.1 , DPDK version is 17.11.3





Thanks






___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] EMC lookup disabled but still there some processing going on with emc lookup

2018-03-23 Thread O Mahony, Billy
Are you setting emc-insert-inv-prob in the ovsdb before OvS starts?

There should not be any packets being processed before the configuration is 
applied, but if you remove from the analysis the early startup time (until you are 
sure the system is in a steady state), do you still see time being spent in 
emc_lookup and emc_insert?

Are you sure that the debug info (.exe) available to vtune when it is analyzing 
is identical to the one that was used to gather the stats? Otherwise an address 
could be attributed to the wrong symbol.

/Billy.

From: ovs-discuss-boun...@openvswitch.org 
[mailto:ovs-discuss-boun...@openvswitch.org] On Behalf Of Krish
Sent: Thursday, March 22, 2018 3:48 AM
To: ovs-discuss@openvswitch.org
Subject: [ovs-discuss] EMC lookup disabled but still there some processing 
going on with emc lookup

Hello everyone

I am testing the time spent in the ovs-vswitchd caches using Intel VTune.

I disabled EMC lookup using "emc-insert-inv-prob=0", but I can still see that EMC 
lookup is not disabled, and also that insertion into the EMC takes place after 
the packet completes fast-path processing.

I am attaching the screenshots on Intel vtune along with this mail.

Can anyone please explain why this happened?

Thank you

Regards


___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Way to get average time spent

2018-03-23 Thread O Mahony, Billy
Hi Krish,

You will need to compile with debug info and also with inlining turned off. 
Many functions get inlined, and in that case the perf report cannot attribute 
time to the inlined function but only to the caller. Obviously overall performance 
drops in that case, but the relative cost of each function should not be 
affected.
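For OvS that would be roughly (flags are only illustrative):
./configure CFLAGS="-g -O2 -fno-inline"
make && make install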

You should also checkout Brendan Gregg’s flame graphs 
http://www.brendangregg.com/flamegraphs.html as a visualization of cpu cost.

If you are using dpdk-ovs then you should pin the PMD to an isolated core and 
use perf to sample the call stack on that core only. Details of pinning & 
isolation are in the dpdk sections of the ovs documentation.
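E.g. if the PMD is pinned to core 2 (core number illustrative):
perf record -C 2 -g -- sleep 10
perf report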

I have not run perf on kernel OvS. That would be a matter of using the correct 
flags to sample by kernel module or thread.

You will need to be reasonably familiar with the code in order to interpret the 
results. These are some blog posts on the EMC and netdev datapath classifier 
for OvS-DPDK :
https://software.intel.com/en-us/articles/the-open-vswitch-exact-match-cache
https://software.intel.com/en-us/articles/ovs-dpdk-datapath-classifier

Again for OvS-DPDK you could also use the output from ovs-appctl 
dpif-netdev/pmd-stats-show. By using a very small number of flows you can ensure that 
your traffic is hitting the EMC and you can measure the EMC lookup cost. Then 
you can disable the EMC (using the emc-insert-inv-prob setting) and measure the 
classifier lookup cost. Be aware those costs include the rx and tx costs, which 
actually decrease as traffic increases – as the packet batch size increases, 
leading to lower per-packet tx/rx cost.
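E.g. something like (assuming a recent enough OvS-DPDK):
ovs-appctl dpif-netdev/pmd-stats-clear
# run traffic for a while, then:
ovs-appctl dpif-netdev/pmd-stats-show
# to stop EMC insertion:
ovs-vsctl set Open_vSwitch . other_config:emc-insert-inv-prob=0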

Regards,
Billy.

From: ovs-discuss-boun...@openvswitch.org 
[mailto:ovs-discuss-boun...@openvswitch.org] On Behalf Of Krish
Sent: Thursday, March 15, 2018 2:52 AM
To: Justin Pettit ; Greg Rose ; 
ovs-discuss@openvswitch.org
Subject: Re: [ovs-discuss] Way to get average time spent

Justin

Thanks for telling me about the perf tool. I think it's a really good tool for 
finding hotspots. But I don't know how to test the OVS caches with the perf tool.

Greg
Can you please throw some light on this matter. If perf is the right tool for 
getting time spent in EMC,datapath classifier and packet header extraction?
If yes, please tell me the way to do that? Which part I should perf for?


Thanks and Regards


On Wed, Mar 14, 2018 at 4:31 PM, Krish 
> wrote:
Justin

Thanks for telling me about perf tool. I think its a really good tool for 
finding hotspots. But I don't know how to test OVS caches with the perf tool.

Greg
Can you please throw some light on this matter. If perf is the right tool for 
getting time spent in EMC,datapath classifier and packet header extraction?
If yes, please tell me the way to do that? Which part I should perf for?


Thanks and Regards

On Tue, Mar 13, 2018 at 2:08 AM, Justin Pettit 
> wrote:
Greg (cc'd) could probably provide you a better answer, but I suspect the perf 
tool is a good place to start.

--Justin


> On Mar 12, 2018, at 3:24 AM, Krish 
> > wrote:
>
> Hi users,
>
> I need to get the average time spent in packet extraction then in first level 
> cache , second level cache. I don't know the way to measure that.
>
> Can anyone please help me pointing to right direction?
> I think, I need to modify some code. Please guide me if I am right or wrong.
>
>
> Looking forward for a response from someone.
>
> Thanks
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] ovs_dpdk: dpdk-socket-mem usage question

2017-09-19 Thread O Mahony, Billy
Hi Wang,

Typically I reserve between 512M and 1G on each Numa.

There is no formula I am aware of for how much memory is actually required.

Fundamentally this will be determined by the maximum number and size of packets 
in-flight at any given time, which is determined by the ingress packet rate, 
the processing time in OVS, and the rate and frequency at which egress queues are 
drained.

The maximum memory requirement is determined by the number of rx and tx queues 
and how many descriptors each has. Also longer queues (more descriptors) will 
protect against packet loss up to a point, so QoS/throughput also comes into 
play.

On that point, for dpdkvhostuser ports, as far as I know current versions of qemu 
have the virtio queue length fixed at compile time, so these queue lengths cannot 
be modified by OVS at all.

In short I don't think there is any way other than testing and tuning of the 
dpdk application (in this case OVS) and the particular use case while 
monitoring internal queue usage. This should give you an idea of an acceptable 
maximum length for the various queues and a good first guess as to the total 
amount of memory required.

Regards,
Billy.



> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of ???
> Sent: Wednesday, September 13, 2017 6:35 AM
> To: ovs-...@openvswitch.org; ovs-discuss@openvswitch.org
> Subject: [ovs-dev] ovs_dpdk: dpdk-socket-mem usage question
> 
> Hi All,
> 
> I read below doc, and have one question:
> 
> http://docs.openvswitch.org/en/latest/intro/install/dpdk/
> dpdk-socket-mem
> Comma separated list of memory to pre-allocate from hugepages on specific
> sockets.
> 
> Question:
>OVS+DPDK can let user to specify the needed memory using dpdk-socket-
> mem. But the question is that how to know how much memory is needed. Is
> there some algorithm on how to calculate the memory?Thanks.
> 
> Br,
> Wang Zhike
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-11 Thread O Mahony, Billy
Hi Wang,

I believe that the PMD stats processing cycles include EMC processing time.

This is just in the context of your results being surprising. It could be a 
factor if you are using code where the bug exists. The patch carries a Fixes: 
tag (I think) that should help you figure out whether your results were potentially 
affected by this issue.

Regards,
/Billy. 

> -Original Message-
> From: 王志克 [mailto:wangzh...@jd.com]
> Sent: Monday, September 11, 2017 3:00 AM
> To: O Mahony, Billy <billy.o.mah...@intel.com>; ovs-
> d...@openvswitch.org; Jan Scheurich <jan.scheur...@ericsson.com>; Darrell
> Ball <db...@vmware.com>; ovs-discuss@openvswitch.org; Kevin Traynor
> <ktray...@redhat.com>
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Billy,
> 
> In my test, almost all traffic went trough via EMC. So the fix does not impact
> the result, especially we want to know the difference (not the exact num).
> 
> Can you test to get some data? Thanks.
> 
> Br,
> Wang Zhike
> 
> -Original Message-
> From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> Sent: Friday, September 08, 2017 11:18 PM
> To: 王志克; ovs-...@openvswitch.org; Jan Scheurich; Darrell Ball; ovs-
> disc...@openvswitch.org; Kevin Traynor
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Wang,
> 
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337309.html
> 
> I see it's been acked and is due to be pushed to master with other changes
> on the dpdk merge branch so you'll have to apply it manually for now.
> 
> /Billy.
> 
> > -Original Message-
> > From: 王志克 [mailto:wangzh...@jd.com]
> > Sent: Friday, September 8, 2017 11:48 AM
> > To: ovs-...@openvswitch.org; Jan Scheurich
> > <jan.scheur...@ericsson.com>; O Mahony, Billy
> > <billy.o.mah...@intel.com>; Darrell Ball <db...@vmware.com>; ovs-
> > disc...@openvswitch.org; Kevin Traynor <ktray...@redhat.com>
> > Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> > Hi Billy,
> >
> > I used ovs2.7.0. I searched the git log, and not sure which commit it
> > is. Do you happen to know?
> >
> > Yes, I cleared the stats after traffic run.
> >
> > Br,
> > Wang Zhike
> >
> >
> > From: "O Mahony, Billy" <billy.o.mah...@intel.com>
> > To: "wangzh...@jd.com" <wangzh...@jd.com>, Jan Scheurich
> > <jan.scheur...@ericsson.com>, Darrell Ball <db...@vmware.com>,
> > "ovs-discuss@openvswitch.org" <ovs-discuss@openvswitch.org>,
> > "ovs-...@openvswitch.org" <ovs-...@openvswitch.org>, Kevin
> Traynor
> > <ktray...@redhat.com>
> > Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> > Message-ID:
> > <03135aea779d444e90975c2703f148dc58c19...@irsmsx107.ger.c
> > orp.intel.com>
> >
> > Content-Type: text/plain; charset="utf-8"
> >
> > Hi Wang,
> >
> > Thanks for the figures. Unexpected results as you say. Two things come
> > to
> > mind:
> >
> > I'm not sure what code you are using but the cycles per packet
> > statistic was broken for a while recently. Ilya posted a patch to fix
> > it so make sure you have that patch included.
> >
> > Also remember to reset the pmd stats after you start your traffic and
> > then measure after a short duration.
> >
> > Regards,
> > Billy.
> >
> >
> >
> > From: ??? [mailto:wangzh...@jd.com]
> > Sent: Friday, September 8, 2017 8:01 AM
> > To: Jan Scheurich <jan.scheur...@ericsson.com>; O Mahony, Billy
> > <billy.o.mah...@intel.com>; Darrell Ball <db...@vmware.com>; ovs-
> > disc...@openvswitch.org; ovs-...@openvswitch.org; Kevin Traynor
> > <ktray...@redhat.com>
> > Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> >
> > Hi All,
> >
> >
> >
> > I tested below cases, and get some performance data. The data shows
> > there is little impact for cross NUMA communication, which is
> > different from my expectation. (Previously I mentioned that cross NUMA
> > would add 60% cycles, but I can NOT reproduce it any more).
> >
> >
> >
> > @Jan,
> >
> > You mentioned cross NUMA communication would cost lots more cycles.
> > Can you share your data? I am not sure whether I made some mistake or
> not.
> >
> &

Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-08 Thread O Mahony, Billy
Hi Wang,

https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337309.html

I see it's been acked and is due to be pushed to master with other changes on 
the dpdk merge branch so you'll have to apply it manually for now.

/Billy. 

> -Original Message-
> From: 王志克 [mailto:wangzh...@jd.com]
> Sent: Friday, September 8, 2017 11:48 AM
> To: ovs-...@openvswitch.org; Jan Scheurich
> <jan.scheur...@ericsson.com>; O Mahony, Billy
> <billy.o.mah...@intel.com>; Darrell Ball <db...@vmware.com>; ovs-
> disc...@openvswitch.org; Kevin Traynor <ktray...@redhat.com>
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Billy,
> 
> I used ovs2.7.0. I searched the git log, and not sure which commit it is. Do 
> you
> happen to know?
> 
> Yes, I cleared the stats after traffic run.
> 
> Br,
> Wang Zhike
> 
> 
> From: "O Mahony, Billy" <billy.o.mah...@intel.com>
> To: "wangzh...@jd.com" <wangzh...@jd.com>, Jan Scheurich
>   <jan.scheur...@ericsson.com>, Darrell Ball <db...@vmware.com>,
>   "ovs-discuss@openvswitch.org" <ovs-discuss@openvswitch.org>,
>   "ovs-...@openvswitch.org" <ovs-...@openvswitch.org>, Kevin
> Traynor
>   <ktray...@redhat.com>
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
>   physical port
> Message-ID:
>   <03135aea779d444e90975c2703f148dc58c19...@irsmsx107.ger.c
> orp.intel.com>
> 
> Content-Type: text/plain; charset="utf-8"
> 
> Hi Wang,
> 
> Thanks for the figures. Unexpected results as you say. Two things come to
> mind:
> 
> I'm not sure what code you are using but the cycles per packet statistic was
> broken for a while recently. Ilya posted a patch to fix it so make sure you
> have that patch included.
> 
> Also remember to reset the pmd stats after you start your traffic and then
> measure after a short duration.
> 
> Regards,
> Billy.
> 
> 
> 
> From: ??? [mailto:wangzh...@jd.com]
> Sent: Friday, September 8, 2017 8:01 AM
> To: Jan Scheurich <jan.scheur...@ericsson.com>; O Mahony, Billy
> <billy.o.mah...@intel.com>; Darrell Ball <db...@vmware.com>; ovs-
> disc...@openvswitch.org; ovs-...@openvswitch.org; Kevin Traynor
> <ktray...@redhat.com>
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> 
> Hi All,
> 
> 
> 
> I tested below cases, and get some performance data. The data shows there
> is little impact for cross NUMA communication, which is different from my
> expectation. (Previously I mentioned that cross NUMA would add 60%
> cycles, but I can NOT reproduce it any more).
> 
> 
> 
> @Jan,
> 
> You mentioned cross NUMA communication would cost lots more cycles. Can
> you share your data? I am not sure whether I made some mistake or not.
> 
> 
> 
> @All,
> 
> Welcome your data if you have data for similar cases. Thanks.
> 
> 
> 
> Case1: VM0->PMD0->NIC0
> 
> Case2:VM1->PMD1->NIC0
> 
> Case3:VM1->PMD0->NIC0
> 
> Case4:NIC0->PMD0->VM0
> 
> Case5:NIC0->PMD1->VM1
> 
> Case6:NIC0->PMD0->VM1
> 
> 
> 
> >        VM Tx Mpps   Host Tx Mpps   avg cycles per packet   avg processing cycles per packet
> > Case1  1.4          1.4            512                     415
> > Case2  1.3          1.3            537                     436
> > Case3  1.35         1.35           514                     390
> >
> >        VM Rx Mpps   Host Rx Mpps   avg cycles per packet   avg processing cycles per packet
> > Case4  1.3          1.3            549                     533
> > Case5  1.3          1.3            559                     540
> > Case6  1.28         1.28           568                     551
> 
> 
> 
> Br,
> 
> Wang Zhike
> 
> 
> 
> -Original Message-
> From: Jan Scheurich [mailto:jan.scheur...@ericsson.com]
> Sent: Wednesday, September 06, 2017 9:33 PM
> To: O Mahony, Billy; ???; Darrell Ball; ovs-
> disc...@openvswitch.org<mailto:ovs-discuss@openvswitch.org>; ovs-
> d...@openvswitch.org<mailto:ovs-...@openvswitch.org>; Kevin Traynor
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> 
> 
> Hi Billy,
> 
> 
> 
> > You are going to have to take the hit crossing the NUMA boundary at some
> point if your NIC and VM are on different NUMAs.
> 
> >
>

Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-08 Thread O Mahony, Billy
Hi Wang,

Thanks for the figures. Unexpected results as you say. Two things come to mind:

I’m not sure what code you are using but the cycles per packet statistic was 
broken for a while recently. Ilya posted a patch to fix it so make sure you 
have that patch included.

Also remember to reset the pmd stats after you start your traffic and then 
measure after a short duration.

Regards,
Billy.



From: 王志克 [mailto:wangzh...@jd.com]
Sent: Friday, September 8, 2017 8:01 AM
To: Jan Scheurich <jan.scheur...@ericsson.com>; O Mahony, Billy 
<billy.o.mah...@intel.com>; Darrell Ball <db...@vmware.com>; 
ovs-discuss@openvswitch.org; ovs-...@openvswitch.org; Kevin Traynor 
<ktray...@redhat.com>
Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port


Hi All,



I tested the below cases and got some performance data. The data shows there is 
little impact from cross-NUMA communication, which is different from my 
expectation. (Previously I mentioned that cross-NUMA would add 60% cycles, but 
I can NOT reproduce it any more.)



@Jan,

You mentioned cross NUMA communication would cost lots more cycles. Can you 
share your data? I am not sure whether I made some mistake or not.



@All,

Welcome your data if you have data for similar cases. Thanks.



Case1: VM0->PMD0->NIC0

Case2:VM1->PMD1->NIC0

Case3:VM1->PMD0->NIC0

Case4:NIC0->PMD0->VM0

Case5:NIC0->PMD1->VM1

Case6:NIC0->PMD0->VM1



       VM Tx Mpps   Host Tx Mpps   avg cycles per packet   avg processing cycles per packet
Case1  1.4          1.4            512                     415
Case2  1.3          1.3            537                     436
Case3  1.35         1.35           514                     390

       VM Rx Mpps   Host Rx Mpps   avg cycles per packet   avg processing cycles per packet
Case4  1.3          1.3            549                     533
Case5  1.3          1.3            559                     540
Case6  1.28         1.28           568                     551



Br,

Wang Zhike



-Original Message-
From: Jan Scheurich [mailto:jan.scheur...@ericsson.com]
Sent: Wednesday, September 06, 2017 9:33 PM
To: O Mahony, Billy; 王志克; Darrell Ball; 
ovs-discuss@openvswitch.org<mailto:ovs-discuss@openvswitch.org>; 
ovs-...@openvswitch.org<mailto:ovs-...@openvswitch.org>; Kevin Traynor
Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port



Hi Billy,



> You are going to have to take the hit crossing the NUMA boundary at some 
> point if your NIC and VM are on different NUMAs.

>

> So are you saying that it is more expensive to cross the NUMA boundary from 
> the pmd to the VM than to cross it from the NIC to the

> PMD?



Indeed, that is the case: If the NIC crosses the QPI bus when storing packets 
in the remote NUMA there is no cost involved for the PMD. (The QPI bandwidth is 
typically not a bottleneck.) The PMD only performs local memory access.



On the other hand, if the PMD crosses the QPI when copying packets into a 
remote VM, there is a huge latency penalty involved, consuming lots of PMD 
cycles that cannot be spent on processing packets. We at Ericsson have observed 
exactly this behavior.



This latency penalty becomes even worse when the LLC cache hit rate is degraded 
due to LLC cache contention with real VNFs and/or unfavorable packet buffer 
re-use patterns as exhibited by real VNFs compared to typical synthetic 
benchmark apps like DPDK testpmd.



>

> If so then in that case you'd like to have two (for example) PMDs polling 2 
> queues on the same NIC. With the PMDs on each of the

> NUMA nodes forwarding to the VMs local to that NUMA?

>

> Of course your NIC would then also need to be able to know which VM (or at least 
> which NUMA the VM is on) in order to send the frame

> to the correct rxq.



That would indeed be optimal but hard to realize in the general case (e.g. with 
VXLAN encapsulation) as the actual destination is only known after tunnel pop. 
Here perhaps some probabilistic steering of RSS hash values based on measured 
distribution of final destinations might help in the future.



But even without that in place, we need PMDs on both NUMAs anyhow (for 
NUMA-aware polling of vhostuser ports), so why not use them to also poll remote 
eth ports. We can achieve better average performance with fewer PMDs than with 
the current limitation to NUMA-local polling.



BR, Jan


___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread O Mahony, Billy


> -Original Message-
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Wednesday, September 6, 2017 3:02 PM
> To: Jan Scheurich <jan.scheur...@ericsson.com>; O Mahony, Billy
> <billy.o.mah...@intel.com>; wangzh...@jd.com; Darrell Ball
> <db...@vmware.com>; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> On 09/06/2017 02:43 PM, Jan Scheurich wrote:
> >>
> >> I think the mention of pinning was confusing me a little. Let me see
> >> if I fully understand your use case:  You don't 'want' to pin
> >> anything but you are using it as a way to force the distribution of rxq 
> >> from
> a single nic across to PMDs on different NUMAs. As without pinning all rxqs
> are assigned to the NUMA-local pmd leaving the other PMD totally unused.
> >>
> >> But then when you used pinning the PMDs became isolated so the
> >> vhostuser ports rxqs would not be assigned to the PMDs unless they too
> were pinned. Which worked but was not manageable as VM (and vhost
> ports) came and went.
> >>
> >> Yes?
> >
> > Yes!!!
[[BO'M]] Hurrah!
> >
> >>
> >> In that case what we probably want is the ability to pin an rxq to a
> >> pmd but without also isolating the pmd. So the PMD could be assigned
> some rxqs manually and still have others automatically assigned.
> >
> > Wonderful. That is exactly what I have wanted to propose for a while:
> Separate PMD isolation from pinning of Rx queues.
> >
> > Tying these two together makes it impossible to use pinning of Rx queues
> in OpenStack context (without the addition of dedicated PMDs/cores). And
> even during manual testing it is a nightmare to have to manually pin all 48
> vhostuser queues just because we want to pin the two heavy-loaded Rx
> queues to different PMDs.
> >
> 
> That sounds like it would be useful. Do you know in advance of running which
> rxq's they will be? i.e. you know it's a particular port and there is only one
> queue. Or you don't know but analyze at runtime and then reconfigure?
> 
> > The idea would be to introduce a separate configuration option for PMDs
> to isolate them, and no longer automatically set that when pinning an rx
> queue to the PMD.
> >
> 
> Please don't break backward compatibility. I think it would be better to keep
> the existing command as is and add a new softer version that allows other
> rxq's to be scheduled on that pmd also.
[[BO'M]] Although, is the implicit isolation feature of pmd-rxq-affinity actually 
used in the wild? But still it's sensible to introduce the new 'softer 
version' as you say. 
> 
> Kevin.
> 
> > BR, Jan
> >

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread O Mahony, Billy


> -Original Message-
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Wednesday, September 6, 2017 2:50 PM
> To: Jan Scheurich <jan.scheur...@ericsson.com>; O Mahony, Billy
> <billy.o.mah...@intel.com>; wangzh...@jd.com; Darrell Ball
> <db...@vmware.com>; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> On 09/06/2017 02:33 PM, Jan Scheurich wrote:
> > Hi Billy,
> >
> >> You are going to have to take the hit crossing the NUMA boundary at
> some point if your NIC and VM are on different NUMAs.
> >>
> >> So are you saying that it is more expensive to cross the NUMA
> >> boundary from the pmd to the VM than to cross it from the NIC to the
> PMD?
> >
> > Indeed, that is the case: If the NIC crosses the QPI bus when storing
> packets in the remote NUMA there is no cost involved for the PMD. (The QPI
> bandwidth is typically not a bottleneck.) The PMD only performs local
> memory access.
> >
> > On the other hand, if the PMD crosses the QPI when copying packets into a
> remote VM, there is a huge latency penalty involved, consuming lots of PMD
> cycles that cannot be spent on processing packets. We at Ericsson have
> observed exactly this behavior.
> >
> > This latency penalty becomes even worse when the LLC cache hit rate is
> degraded due to LLC cache contention with real VNFs and/or unfavorable
> packet buffer re-use patterns as exhibited by real VNFs compared to typical
> synthetic benchmark apps like DPDK testpmd.
> >
> >>
> >> If so then in that case you'd like to have two (for example) PMDs
> >> polling 2 queues on the same NIC. With the PMDs on each of the NUMA
> nodes forwarding to the VMs local to that NUMA?
> >>
> >> Of course your NIC would then also need to be able to know which VM (or
> >> at least which NUMA the VM is on) in order to send the frame to the
> correct rxq.
> >
> > That would indeed be optimal but hard to realize in the general case (e.g.
> with VXLAN encapsulation) as the actual destination is only known after
> tunnel pop. Here perhaps some probabilistic steering of RSS hash values
> based on measured distribution of final destinations might help in the future.
> >
> > But even without that in place, we need PMDs on both NUMAs anyhow
> (for NUMA-aware polling of vhostuser ports), so why not use them to also
> poll remote eth ports. We can achieve better average performance with
> fewer PMDs than with the current limitation to NUMA-local polling.
> >
> 
> If the user has some knowledge of the numa locality of ports and can place
> VMs accordingly, default cross-numa assignment can harm performance.
> Also, it would make for very unpredictable performance from test to test and
> even for flow to flow on a datapath.
[[BO'M]] Wang's original request would constitute default cross-NUMA assignment, 
but I don't think this modified proposal would, as it still requires explicit 
config to assign to the remote NUMA.
> 
> Kevin.
> 
> > BR, Jan
> >

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread O Mahony, Billy
Hi Wang,

I think the mention of pinning was confusing me a little. Let me see if I fully 
understand your use case: you don't 'want' to pin anything, but you are using it 
as a way to force the distribution of rxqs from a single nic across PMDs on 
different NUMAs, as without pinning all rxqs are assigned to the NUMA-local pmd, 
leaving the other PMD totally unused.

But then when you used pinning the PMDs became isolated, so the vhostuser ports' 
rxqs would not be assigned to the PMDs unless they too were pinned. Which worked 
but was not manageable as VMs (and vhost ports) came and went.

Yes? 

In that case what we probably want is the ability to pin an rxq to a pmd but 
without also isolating the pmd. So the PMD could be assigned some rxqs manually 
and still have others automatically assigned. 

But what I still don't understand is why you don't put both PMDs on the same 
NUMA node. Given that you cannot program the NIC to know which VM a frame is 
for, you would have to RSS the frames across rxqs (i.e. across NUMA nodes). 
Of those going to the NIC's local NUMA node, 50% would have to go across the NUMA 
boundary when their destination VM was decided - which is okay - they have to 
cross the boundary at some point. But of the frames going to the non-local NUMA, 
50% will actually be destined for what was originally the local NUMA node. These 
packets (25% of all traffic) will cross NUMA *twice*, whereas if all PMDs were on 
the NIC's NUMA node those frames would never have had to pass between NUMA nodes.

In short I think it's more efficient to have both PMDs on the same NUMA node as 
the NIC.
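
As a sketch of that layout (assuming the NIC sits on NUMA 0 and that cores 1 and 
2 are on that node - the actual mask depends on your CPU topology):

# run both PMDs on cores local to the NIC's NUMA node
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6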

There is one more comment below...

> -Original Message-
> From: 王志克 [mailto:wangzh...@jd.com]
> Sent: Wednesday, September 6, 2017 12:50 PM
> To: O Mahony, Billy <billy.o.mah...@intel.com>; Darrell Ball
> <db...@vmware.com>; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor <ktray...@redhat.com>
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Billy,
> 
> See my reply in line.
> 
> Br,
> Wang Zhike
> 
> -Original Message-
> From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> Sent: Wednesday, September 06, 2017 7:26 PM
> To: 王志克; Darrell Ball; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Wang,
> 
> You are going to have to take the hit crossing the NUMA boundary at some
> point if your NIC and VM are on different NUMAs.
> 
> So are you saying that it is more expensive to cross the NUMA boundary
> from the pmd to the VM than to cross it from the NIC to the PMD?
> 
> [Wang Zhike] I do not have such data. I hope we can try the new behavior
> and get the test result, and then know whether and how much performance
> can be improved.

[[BO'M]] You don't need a code change to compare the performance of these two 
scenarios. You can simulate it by pinning queues to VMs. I'd imagine crossing 
the NUMA boundary during the PCI DMA would be cheaper than crossing it over 
vhost. But I don't know what the result would be, and this would be a pretty 
interesting figure to have, by the way.
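
For instance (a rough sketch only - it assumes the NIC port is named dpdk0 with 
two rx queues, and that core 1 is on the NIC-local NUMA node while core 17 is on 
the remote one; adjust names and core ids for your system):

# pin rxq 0 to a NUMA-local PMD and rxq 1 to a PMD on the remote NUMA node
ovs-vsctl set Interface dpdk0 other_config:pmd-rxq-affinity="0:1,1:17"

Note that, as discussed earlier in the thread, the pinned PMDs become isolated, 
so for a controlled comparison the vhostuser queues would need to be pinned as 
well.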


> 
> If so then in that case you'd like to have two (for example) PMDs polling 2
> queues on the same NIC. With the PMDs on each of the NUMA nodes
> forwarding to the VMs local to that NUMA?
> 
> Of course your NIC would then also need to be able to know which VM (or at
> least which NUMA the VM is on) in order to send the frame to the correct
> rxq.
> 
> [Wang Zhike] Currently I do not know how to achieve it. From my view, the NIC
> does not know which NUMA node is the destination of the packet. Only
> after OVS handling (e.g. looking up the forwarding rule in OVS) can it know
> the destination. If the NIC does not know the destination NUMA socket, it does
> not matter which PMD polls it.
> 
> 
> /Billy.
> 
> > -Original Message-
> > From: 王志克 [mailto:wangzh...@jd.com]
> > Sent: Wednesday, September 6, 2017 11:41 AM
> > To: O Mahony, Billy <billy.o.mah...@intel.com>; Darrell Ball
> > <db...@vmware.com>; ovs-discuss@openvswitch.org; ovs-
> > d...@openvswitch.org; Kevin Traynor <ktray...@redhat.com>
> > Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> > Hi Billy,
> >
> > It depends on the destination of the traffic.
> >
> > I observed that if the traffic destination is across the NUMA socket, the
> > "avg processing cycles per packet" would increase by 60% compared with
> > traffic to the same NUMA socket.
> >
> > Br,
> > Wang Zhike
> >
> > -Original Message-
> > F

Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread O Mahony, Billy
Hi Wang,

You are going to have to take the hit crossing the NUMA boundary at some point 
if your NIC and VM are on different NUMAs.

So are you saying that it is more expensive to cross the NUMA boundary from the 
pmd to the VM than to cross it from the NIC to the PMD?

If so then in that case you'd like to have two (for example) PMDs polling 2 
queues on the same NIC. With the PMDs on each of the NUMA nodes forwarding to 
the VMs local to that NUMA?

Of course your NIC would then also need to be able to know which VM (or at least 
which NUMA the VM is on) in order to send the frame to the correct rxq. 

/Billy. 

> -Original Message-
> From: 王志克 [mailto:wangzh...@jd.com]
> Sent: Wednesday, September 6, 2017 11:41 AM
> To: O Mahony, Billy <billy.o.mah...@intel.com>; Darrell Ball
> <db...@vmware.com>; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor <ktray...@redhat.com>
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Billy,
> 
> It depends on the destination of the traffic.
> 
> I observed that if the traffic destination is across the NUMA socket, the "avg
> processing cycles per packet" would increase by 60% compared with traffic to
> the same NUMA socket.
> 
> Br,
> Wang Zhike
> 
> -Original Message-
> From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> Sent: Wednesday, September 06, 2017 6:35 PM
> To: 王志克; Darrell Ball; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Wang,
> 
> If you create several PMDs on the NUMA of the physical port does that have
> the same performance characteristic?
> 
> /Billy
> 
> 
> 
> > -Original Message-
> > From: 王志克 [mailto:wangzh...@jd.com]
> > Sent: Wednesday, September 6, 2017 10:20 AM
> > To: O Mahony, Billy <billy.o.mah...@intel.com>; Darrell Ball
> > <db...@vmware.com>; ovs-discuss@openvswitch.org; ovs-
> > d...@openvswitch.org; Kevin Traynor <ktray...@redhat.com>
> > Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> > Hi Billy,
> >
> > Yes, I want to achieve better performance.
> >
> > The commit "dpif-netdev: Assign ports to pmds on non-local numa node"
> > can NOT meet my needs.
> >
> > I do have pmd on socket 0 to poll the physical NIC which is also on socket 
> > 0.
> > However, this is not enough since I also have other pmd on socket 1. I
> > hope such pmds on socket 1 can together poll physical NIC. In this
> > way, we have more CPU (in my case, double CPU) to poll the NIC, which
> > results in performance improvement.
> >
> > BR,
> > Wang Zhike
> >
> > -Original Message-
> > From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> > Sent: Wednesday, September 06, 2017 5:14 PM
> > To: Darrell Ball; 王志克; ovs-discuss@openvswitch.org; ovs-
> > d...@openvswitch.org; Kevin Traynor
> > Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> > Hi Wang,
> >
> > A change was committed to head of master 2017-08-02 "dpif-netdev:
> > Assign ports to pmds on non-local numa node" which if I understand
> > your request correctly will do what you require.
> >
> > However it is not clear to me why you are pinning rxqs to PMDs in the
> > first instance. Currently if you configure at least on pmd on each
> > numa there should always be a PMD available. Is the pinning for
> performance reasons?
> >
> > Regards,
> > Billy
> >
> >
> >
> > > -Original Message-
> > > From: Darrell Ball [mailto:db...@vmware.com]
> > > Sent: Wednesday, September 6, 2017 8:25 AM
> > > To: 王志克 <wangzh...@jd.com>; ovs-discuss@openvswitch.org; ovs-
> > > d...@openvswitch.org; O Mahony, Billy <billy.o.mah...@intel.com>;
> > Kevin
> > > Traynor <ktray...@redhat.com>
> > > Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > > physical port
> > >
> > > Adding Billy and Kevin
> > >
> > >
> > > On 9/6/17, 12:22 AM, "Darrell Ball" <db...@vmware.com> wrote:
> > >
> > >
> > >
> > > On 9/6/17, 12:03 AM, "王志克" <wangzh...@jd.com> wrote:
> > >
> > > Hi Darrell,
> > >
> > > pmd-rxq-affinity has below limitation: (so isolated pmd can
> > > not be used for others, which is not my expect

Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread O Mahony, Billy
Hi Wang,

A change was committed to head of master 2017-08-02 "dpif-netdev: Assign ports 
to pmds on non-local numa node" which if I understand your request correctly 
will do what you require.

However it is not clear to me why you are pinning rxqs to PMDs in the first 
instance. Currently if you configure at least one pmd on each numa there should 
always be a PMD available. Is the pinning for performance reasons?
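
For example (a sketch only - the mask bits depend entirely on your CPU topology), 
with core 1 on NUMA node 0 and core 17 on NUMA node 1:

# spawn one PMD on each NUMA node
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x20002
# then check which PMD is polling each rx queue
ovs-appctl dpif-netdev/pmd-rxq-show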

Regards,
Billy



> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Wednesday, September 6, 2017 8:25 AM
> To: 王志克 <wangzh...@jd.com>; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; O Mahony, Billy <billy.o.mah...@intel.com>; Kevin
> Traynor <ktray...@redhat.com>
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Adding Billy and Kevin
> 
> 
> On 9/6/17, 12:22 AM, "Darrell Ball" <db...@vmware.com> wrote:
> 
> 
> 
> On 9/6/17, 12:03 AM, "王志克" <wangzh...@jd.com> wrote:
> 
> Hi Darrell,
> 
> pmd-rxq-affinity has below limitation: (so isolated pmd can not be 
> used
> for others, which is not my expectation. Lots of VMs come and go on the fly,
> and manual assignment is not feasible.)
>   >>After that PMD threads on cores where RX queues was pinned
> will become isolated. This means that this thread will poll only pinned RX
> queues
> 
> My problem is that I have several CPUs spreading on different NUMA
> nodes. I hope all these CPU can have chance to serve the rxq. However,
> because the phy NIC only locates on one certain socket node, non-same
> numa pmd/CPU would be excluded. So I am wondering whether we can
> have different behavior for phy port rxq:
>   round-robin to all PMDs even the pmd on different NUMA socket.
> 
> I guess this is a common case, and I believe it would improve rx
> performance.
> 
> 
> [Darrell] I agree it would be a common problem and some distribution
> would seem to make sense, maybe factoring in some favoring of local numa
> PMDs ?
> Maybe an optional config to enable ?
> 
> 
> Br,
> Wang Zhike
> 
> 

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] prioritizing latency-sensitive traffic

2017-04-24 Thread O Mahony, Billy
Hi Ben, Darrell,

I've done some PoC work on this kind of traffic prioritization. However, with 
OVS-DPDK's run-to-completion model I ran into the same issue as you 
outlined - by the time the priority of the packet has been determined, most of 
the effort to process the packet has already been spent.

So I relied on hardware offload, i.e. Flow Director on the NIC, to analyse 
and steer packets to high and low priority queues, and then modified the 
PMD (dpif_netdev_run) to read preferentially from the high priority queue. The 
results were good for overload situations - packets on the priority queue were 
not dropped. In terms of latency there was an improvement, but there was still a 
long tail to the latency profile, i.e. the latency profile moved down but not 
left.

As Darrell pointed out for egress scheduling, perhaps this kind of ingress 
prioritization can be encapsulated as an optional property of a port? 

If the port can implement the prioritization (either by offloading to hardware 
or in software) it can accept the property being set; If not it can return 
ENOTSUPP?

There are currently HWOL use cases being gathered for OVS-DPDK:  
https://docs.google.com/document/d/1A8adu1xzg53bzcFKINhffYyC0nCQjqwqnI1jAFx2sck/edit?usp=sharing
 + Kevin, who is co-ordinating that.

Thanks,
Billy.


> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org]
> Sent: Friday, April 21, 2017 10:39 PM
> To: O Mahony, Billy <billy.o.mah...@intel.com>
> Cc: ovs-discuss@openvswitch.org; Darrell Ball <db...@vmware.com>
> Subject: Re: [ovs-discuss] prioritizing latency-sensitive traffic
> 
> Thanks for letting us know.  I'm happy to continue the conversation if there
> are interesting ideas; it's a frustrating situation, frankly, and I'd love to 
> hear
> creative approaches.
> 
> On Tue, Apr 18, 2017 at 10:01:28AM +, O Mahony, Billy wrote:
> > Hi Ben, Darrell,
> >
> > It sounds like the general feeling is that any kind of tc pre-processing is 
> > not
> worth it and the existing egress queing/QoS facilities should suffice.
> >
> > Thanks for your comments.
> >
> > /Billy
> >
> >
> >
> > > -Original Message-
> > > From: Ben Pfaff [mailto:b...@ovn.org]
> > > Sent: Thursday, April 13, 2017 7:47 PM
> > > To: O Mahony, Billy <billy.o.mah...@intel.com>
> > > Cc: ovs-discuss@openvswitch.org
> > > Subject: Re: [ovs-discuss] prioritizing latency-sensitive traffic
> > >
> > > I don't know how much more OVS can contribute to this than it already
> does.
> > > By the time that OVS has classified a packet to the extent that is
> > > necessary to determine that it should be handled with a high
> > > priority, OVS has already done most of the work that it does on a packet.
> > [[BO'M]] I'm investigating how I could go about classifying packets
> > before the main
> > > The work to transmit the
> > > packet is not part of OVS's job, it is the job of the driver, and at
> > > most OVS can mark the packet with a priority or a queue.  OVS can
> > > already do that.  So the usual answer is that it's a matter of
> > > configuring QoS in the driver to do what the user wants.
> > >
> > > On Mon, Apr 10, 2017 at 09:30:12AM +, O Mahony, Billy wrote:
> > > > Hi Everybody,
> > > >
> > > > I just wanted to reflag this discussion below about possible
> > > > methods of
> > > how to prioritize certain types of traffic handled by OVS.
> > > >
> > > > By prioritize I mean either or both of:
> > > > a) 'priority' packets are sent to their destination port faster
> > > > than other packets
> > > > b) in an overloaded situation the switch drops non-prioritized
> > > > packets
> > > rather than prioritized packets.
> > > >
> > > > Also just to be clear I am discussing kernel ovs here. Also I'm
> > > > looking at
> > > doing this without writing new code - ie is it possible and if so
> > > how is it configured using existing OVS.
> > > >
> > > > Thanks again,
> > > > Billy.
> > > >
> > > > > -Original Message-
> > > > > From: ovs-discuss-boun...@openvswitch.org [mailto:ovs-discuss-
> > > > > boun...@openvswitch.org] On Behalf Of O Mahony, Billy
> > > > > Sent: Friday, November 25, 2016 5:04 PM
> > > > > To: ovs-discuss@openvswitch.org
> > > > > Subject: [ovs-discuss] prioritizing latency-sensitive traffic
> > > > >
> > > > > Hi,
> > > > >
> > > > 

Re: [ovs-discuss] prioritizing latency-sensitive traffic

2017-04-18 Thread O Mahony, Billy
Hi Ben, Darrell,

It sounds like the general feeling is that any kind of tc pre-processing is not 
worth it and the existing egress queueing/QoS facilities should suffice. 
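
For the record, a minimal sketch of that existing egress path (the port, bridge, 
rates and the use of PTP event traffic on UDP 319 are all just placeholders):

# a linux-htb QoS on the egress port with a best-effort queue 0 and a higher-priority queue 1
ovs-vsctl set port eth0 qos=@qos -- \
  --id=@qos create qos type=linux-htb other-config:max-rate=1000000000 \
    queues:0=@q0 queues:1=@q1 -- \
  --id=@q0 create queue other-config:max-rate=1000000000 -- \
  --id=@q1 create queue other-config:min-rate=100000000
# steer the latency-sensitive traffic into the higher-priority queue
ovs-ofctl add-flow br0 "udp,tp_dst=319,actions=set_queue:1,normal"

As discussed, this only shapes traffic on egress; it does nothing to get a packet 
classified any earlier on ingress.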

Thanks for your comments.

/Billy



> -Original Message-
> From: Ben Pfaff [mailto:b...@ovn.org]
> Sent: Thursday, April 13, 2017 7:47 PM
> To: O Mahony, Billy <billy.o.mah...@intel.com>
> Cc: ovs-discuss@openvswitch.org
> Subject: Re: [ovs-discuss] prioritizing latency-sensitive traffic
> 
> I don't know how much more OVS can contribute to this than it already does.
> By the time that OVS has classified a packet to the extent that is necessary 
> to
> determine that it should be handled with a high priority, OVS has already
> done most of the work that it does on a packet.  
[[BO'M]] I'm investigating how I could go about classifying packets before the 
main 
> The work to transmit the
> packet is not part of OVS's job, it is the job of the driver, and at most OVS 
> can
> mark the packet with a priority or a queue.  OVS can already do that.  So the
> usual answer is that it's a matter of configuring QoS in the driver to do what
> the user wants.
> 
> On Mon, Apr 10, 2017 at 09:30:12AM +, O Mahony, Billy wrote:
> > Hi Everybody,
> >
> > I just wanted to reflag this discussion below about possible methods of
> how to prioritize certain types of traffic handled by OVS.
> >
> > By prioritize I mean either or both of:
> > a) 'priority' packets are sent to their destination port faster than
> > other packets
> > b) in an overloaded situation the switch drops non-prioritized packets
> rather than prioritized packets.
> >
> > Also just to be clear I am discussing kernel ovs here. Also I'm looking at
> doing this without writing new code - ie is it possible and if so how is it
> configured using existing OVS.
> >
> > Thanks again,
> > Billy.
> >
> > > -Original Message-
> > > From: ovs-discuss-boun...@openvswitch.org [mailto:ovs-discuss-
> > > boun...@openvswitch.org] On Behalf Of O Mahony, Billy
> > > Sent: Friday, November 25, 2016 5:04 PM
> > > To: ovs-discuss@openvswitch.org
> > > Subject: [ovs-discuss] prioritizing latency-sensitive traffic
> > >
> > > Hi,
> > >
> > > I have been performing tests investigating latency profiles of
> > > low-bandwidth time-sensitive traffic when the system is busy with
> 'normal' traffic.
> > > Unsurprisingly the latency-sensitive traffic is affected by the
> > > normal traffic and has basically the same latency profile as the normal
> traffic.
> > >
> > > I would like to be able to perform prioritization of traffic as some
> > > protocols such as PTP would benefit greatly from having its packets 'jump
> > > the queue'.
> > >
> > > From skimming the documentation it looks that ingress QoS offers
> > > only policing (rate-limiting). Is this actually the case or maybe
> > > I'm not looking in the right place?
> > >
> > > But if so, I am looking at some alternatives:
> > >
> > > a) create two separate egress ports and have PTP listen on one port,
> > > everything else listen on the other port and use normal forwarding
> > > rules to send PTP traffic incoming from eth0 to it's own port. Something
> like:
> > >
> > >  other apps  ptp_daemon
> > >   +   +
> > >   +   +
> > >if_norm if_ptp
> > >++
> > >||
> > >||
> > >   ++++
> > >   |  |
> > >   |ovs   |
> > >   |  |
> > >   +-++
> > > |
> > > +
> > >   eth0
> > >
> > > b) create prioritized queues on a port and use match and actions
> > > such as
> > > set_queue(queue) and enqueue(port, queue) on ingress traffic to
> > > forward the PTP traffic to the higher priority queue. However I
> > > think queue priority for this case only relates to which queue get
> > > to consume the bandwidth of the port first and not about changing
> > > the order in which the packets egress the port.
> > >
> > > c) Or perhaps I can re-use tc PRIO or CBQ qdiscs by passing all
> > > traffic to tc first before ovs?
> > >
> > >  other apps
> > >   |
> > >   |
> > >if_norm
> > >+
> > >|
> > >|

Re: [ovs-discuss] prioritizing latency-sensitive traffic

2017-04-10 Thread O Mahony, Billy
Hi Everybody,

I just wanted to reflag this discussion below about possible methods of how to 
prioritize certain types of traffic handled by OVS.

By prioritize I mean either or both of:
a) 'priority' packets are sent to their destination port faster than other 
packets
b) in an overloaded situation the switch drops non-prioritized packets rather 
than prioritized packets.

Also just to be clear I am discussing kernel ovs here. Also I'm looking at 
doing this without writing new code - ie is it possible and if so how is it 
configured using existing OVS.

Thanks again,
Billy.

> -Original Message-
> From: ovs-discuss-boun...@openvswitch.org [mailto:ovs-discuss-
> boun...@openvswitch.org] On Behalf Of O Mahony, Billy
> Sent: Friday, November 25, 2016 5:04 PM
> To: ovs-discuss@openvswitch.org
> Subject: [ovs-discuss] prioritizing latency-sensitive traffic
> 
> Hi,
> 
> I have been performing tests investigating latency profiles of low-bandwidth
> time-sensitive traffic when the system is busy with 'normal' traffic.
> Unsurprisingly the latency-sensitive traffic is affected by the normal traffic
> and has basically the same latency profile as the normal traffic.
> 
> I would like to be able to perform prioritization of traffic as some protocols
> such as PTP would benefit greatly from having its packets 'jump the queue'.
> 
> From skimming the documentation it looks that ingress QoS offers only
> policing (rate-limiting). Is this actually the case or maybe I'm not looking 
> in the
> right place?
> 
> But if so, I am looking at some alternatives:
> 
> a) create two separate egress ports and have PTP listen on one port,
> everything else listen on the other port and use normal forwarding rules to
> send PTP traffic incoming from eth0 to it's own port. Something like:
> 
>  other apps  ptp_daemon
>   +   +
>   +   +
>if_norm if_ptp
>++
>||
>||
>   ++++
>   |  |
>   |ovs   |
>   |  |
>   +-++
> |
> +
>   eth0
> 
> b) create prioritized queues on a port and use match and actions such as
> set_queue(queue) and enqueue(port, queue) on ingress traffic to forward
> the PTP traffic to the higher priority queue. However I think queue priority
> for this case only relates to which queue get to consume the bandwidth of
> the port first and not about changing the order in which the packets egress
> the port.
> 
> c) Or perhaps I can re-use tc PRIO or CBQ qdiscs by passing all traffic to tc 
> first
> before ovs?
> 
>  other apps
>   |
>   |
>if_norm
>+
>|
>|
>   +--+
>   |  |
>   |ovs   |
>   |  |
>   +-++
> |
> |
> tc - if_ptp  ptp_daemon
> +
>   eth0
> 
> Any thoughts, ideas or clarifications most welcome.
> 
> Thanks,
> Billy.
> 
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Error attaching device to DPDK

2017-03-29 Thread O Mahony, Billy
Hi All,

Just to add something to this conversation that has not been explicitly 
mentioned below.

Brad outlines how to set other_config:dpdk-socket-mem to reserve hugepages from 
both NUMA nodes in order to be sure to avoid pci topology issues.

Strictly this doesn’t have to be done on both nodes – just on the correct node 
:) – but sometimes it’s easier to just cover all the bases.

In addition you will need to ensure that:
 * Linux has hugepages available on both (or the right) nodes.
 * There is a pmd thread created on each node by setting 
other_config:pmd-cpu-mask correctly – or again at least on the right node.

These config items are described in 
* ./Documentation/intro/install/dpdk.rst
* ./Documentation/howto/dpdk.rst
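
A quick way to check the first bullet above on a running system (a sketch; 1G 
pages shown, the path for 2M pages is analogous):

# number of 1G hugepages currently allocated on each NUMA node
cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages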

Currently if there is not a pmd thread on the NUMA node belonging to the PCI 
device the device will not be polled (a warning message is issued to this 
effect).

There is currently a patch under consideration so that if a NUMA-local pmd is 
not available for a device then the device will be assigned to some other pmd 
thread. While this is not as efficient, due to the need for the data to travel 
between NUMA nodes, it is less frustrating to configure. 

https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/329175.html

If you install the hwloc package you can use “hwloc-calc pci=:01:02.0 
--intersect Socket” to show the NUMA node that is local to a pci device. 
However I have seen some systems which report that a pci device is local to both 
nodes. The lstopo command can also be used for this.
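
The kernel's sysfs view gives the same information without any extra packages (it 
prints -1 when the platform does not report locality). For a (hypothetical) 
device at 0000:01:02.0:

# NUMA node local to the PCI device, as reported by the kernel
cat /sys/bus/pci/devices/0000:01:02.0/numa_node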

Hope some of that helps,

Billy.


From: ovs-discuss-boun...@openvswitch.org 
[mailto:ovs-discuss-boun...@openvswitch.org] On Behalf Of Darrell Ball
Sent: Wednesday, March 29, 2017 2:36 AM
To: Brad Cowie ; Shivaram Mysore 
Cc: ovs-discuss@openvswitch.org
Subject: Re: [ovs-discuss] Error attaching device to DPDK



From:  on behalf of Brad Cowie 

Date: Tuesday, March 28, 2017 at 5:35 PM
To: Shivaram Mysore 
Cc: "ovs-discuss@openvswitch.org" 
Subject: Re: [ovs-discuss] Error attaching device to DPDK

Hi folks,
I have been helping Shivaram off-list with this problem and just wanted to 
update this thread with the solution.
Basically the issue was with how hugepages were being preallocated by OVS for 
use with DPDK. I believe by default OVS sets other_config:dpdk-socket-mem to 
"1024,0". Which will give one 1GB hugepage to CPU0 and none to CPU1.
The ports Shivaram was trying to assign to DPDK were physically connected via a 
PCI-E bus to CPU1, and since there were no hugepages preallocated there for DPDK 
we couldn't add them.
The fix is simple, to assign a hugepage to CPU1:

$ sudo ovs-vsctl --no-wait set Open_vSwitch . 
other_config:dpdk-socket-mem="1024,1024"
In OVS 2.7.0 unfortunately due to an issue (that is now fixed by commit 
ef6cca6fdc67f3cee21c6bb1c13c4ca7f8241314) you get a segfault rather than an 
error message telling you the problem.
In OVS 2.7.90 however the issue becomes quite clear as there is a nice log 
message like this:

2017-03-29T00:19:57Z|00067|netdev_dpdk|ERR|Insufficient memory to create memory 
pool for netdev dpdk-p0, with MTU 1500 on socket 1
There is documentation regarding these fields, but for this case the documentation 
could simply state something like
“you need to allocate hugepage memory on each numa node where dpdk ports are 
bound or those ports
will not work with dpdk”
The error log could also be a little more succinct like “no memory pool 
allocated for netdev dpdk-p0,.., usage on socket 1”

Possibly, going even further, when a PMD thread gets auto-allocated to run on a 
given numa node,
memory allocation is defaulted there as well. If it cannot be allocated, an 
error log is generated.

Unfortunately to those unfamiliar with DPDK and NUMA architectures this won't 
be very obvious. Potentially we could add some additional help to the DPDK 
documentation pages for OVS that explains the other_config options you may need 
to reconfigure on multi-CPU NUMA machines?
Brad

On 29 March 2017 at 10:02, Shivaram Mysore  wrote:
Ok thanks.  Will remember the same.  Right now, I will not specify anything as 
I have to get this to work first!

On Tue, Mar 28, 2017 at 2:00 PM, Bodireddy, Bhanuprakash 
 wrote:
>Question:
>
>I don't have any specific cpu cores associated with DPDK.  Will this make any
>difference?  I also did not see this requirement as a MUST have in the
>documentation.

I assume your question is on PMD threads. If you don’t specify anything 
explicitly a PMD thread shall be created and pinned to core 0.  You can 
explicitly select the number of pmd threads and the cores by setting the 
affinity mask using 'other_config:pmd-cpu-mask'.

Based on the affinity mask, the PMD threads shall be spawned and the rx queues 
will be