[ovs-discuss] [ovs-ovn 2.7] How to find the compute node hosting l3 gateway router

2017-09-06 Thread Vikrant Aggarwal
Hi Team,

I have installed Packstack Pike using OVN as the mechanism driver on
CentOS. I have one controller and two compute nodes.

- Created one tenant Geneve-based network (added as a port to the router) and a
flat external network (set as the gateway for the router).

~~~
[root@controller ~(keystone_admin)]# rpm -qa | awk '/openvswitch-ovn/
{print $1}'
openvswitch-ovn-common-2.7.2-3.1fc27.el7.x86_64
openvswitch-ovn-host-2.7.2-3.1fc27.el7.x86_64
openvswitch-ovn-central-2.7.2-3.1fc27.el7.x86_64
~~~

I am trying to find the compute node on which my gateway router is hosted, and
also the command to check the health of distributed logical routers.

It seems that the "lrp-get-gateway-chassis" command is not present in the
version I am using.

~~~
[root@controller ~]# ovn-nbctl lrp-get-gateway-chassis
ovn-nbctl: unknown command 'lrp-get-gateway-chassis'; use --help for help

[root@controller ~]# ovn-nbctl --help | grep -i gateway

~~~

Output of ovn-nbctl show.

~~~

[root@controller ~(keystone_admin)]# ovn-nbctl show
switch 0d413d9c-7f23-4ace-9a8a-29817b3b33b5
(neutron-89113f8b-bc01-46b1-84fb-edd5d606879c)
port 6fe3cab5-5f84-44c8-90f2-64c21b489c62
addresses: ["fa:16:3e:fa:d6:d3 10.10.10.9"]
port 397c019e-9bc3-49d3-ac4c-4aeeb1b3ba3e
addresses: ["router"]
port 4c72cee2-35b7-4bcd-8c77-135a22d16df1
addresses: ["fa:16:3e:55:3f:be 10.10.10.4"]
switch 1ec08997-0899-40d1-9b74-0a25ef476c00
(neutron-e411bbe8-e169-4268-b2bf-d5959d9d7260)
port provnet-e411bbe8-e169-4268-b2bf-d5959d9d7260
addresses: ["unknown"]
port b95e9ae7-5c91-4037-8d2c-660d4af00974
addresses: ["router"]
router 7418a4e7-abff-4af7-85f5-6eea2ede9bea
(neutron-67dc2e78-e109-4dac-acce-b71b2c944dc1)
port lrp-b95e9ae7-5c91-4037-8d2c-660d4af00974
mac: "fa:16:3e:52:20:7c"
networks: ["192.168.122.50/24"]
port lrp-397c019e-9bc3-49d3-ac4c-4aeeb1b3ba3e
mac: "fa:16:3e:87:28:40"
networks: ["10.10.10.1/24"]
~~~
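For reference, "lrp-get-gateway-chassis" (and the gateway_chassis column it reads) only arrived in OVN 2.8, so on 2.7 the scheduling information lives elsewhere. The commands below are standard ovn-nbctl/ovn-sbctl database commands; which table actually carries the chassis depends on how networking-ovn scheduled the gateway, so treat this as a pointer rather than a recipe:

```shell
# Gateway-router model: the hosting chassis is an option on the Logical_Router
ovn-nbctl --columns=name,options list Logical_Router

# Distributed-router model: the option sits on the gateway router port instead
ovn-nbctl --columns=name,options list Logical_Router_Port

# Either way, the southbound DB records which chassis each port is bound to
ovn-sbctl --columns=logical_port,chassis list Port_Binding
ovn-sbctl show    # per-chassis view, including the ports it hosts
```

As far as I know there is no dedicated health command in 2.7; ovn-sbctl show on the controller plus the ovn-controller logs on each chassis are about as close as it gets.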


Thanks & Regards,
Vikrant Aggarwal
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread 王志克


-Original Message-
From: O Mahony, Billy [mailto:billy.o.mah...@intel.com] 
Sent: Wednesday, September 06, 2017 10:49 PM
To: Kevin Traynor; Jan Scheurich; 王志克; Darrell Ball; 
ovs-discuss@openvswitch.org; ovs-...@openvswitch.org
Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port



> -Original Message-
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Wednesday, September 6, 2017 2:50 PM
> To: Jan Scheurich ; O Mahony, Billy
> ; wangzh...@jd.com; Darrell Ball
> ; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> On 09/06/2017 02:33 PM, Jan Scheurich wrote:
> > Hi Billy,
> >
> >> You are going to have to take the hit crossing the NUMA boundary at
> some point if your NIC and VM are on different NUMAs.
> >>
> >> So are you saying that it is more expensive to cross the NUMA
> >> boundary from the pmd to the VM than to cross it from the NIC to the
> PMD?
> >
> > Indeed, that is the case: If the NIC crosses the QPI bus when storing
> packets in the remote NUMA there is no cost involved for the PMD. (The QPI
> bandwidth is typically not a bottleneck.) The PMD only performs local
> memory access.
> >
> > On the other hand, if the PMD crosses the QPI when copying packets into a
> remote VM, there is a huge latency penalty involved, consuming lots of PMD
> cycles that cannot be spent on processing packets. We at Ericsson have
> observed exactly this behavior.
> >
> > This latency penalty becomes even worse when the LLC cache hit rate is
> degraded due to LLC cache contention with real VNFs and/or unfavorable
> packet buffer re-use patterns as exhibited by real VNFs compared to typical
> synthetic benchmark apps like DPDK testpmd.
> >
> >>
> >> If so then in that case you'd like to have two (for example) PMDs
> >> polling 2 queues on the same NIC. With the PMDs on each of the NUMA
> nodes forwarding to the VMs local to that NUMA?
> >>
> >> Of course your NIC would then also need to be able to know which VM (or
> >> at least which NUMA the VM is on) in order to send the frame to the
> correct rxq.
> >
> > That would indeed be optimal but hard to realize in the general case (e.g.
> with VXLAN encapsulation) as the actual destination is only known after
> tunnel pop. Here perhaps some probabilistic steering of RSS hash values
> based on measured distribution of final destinations might help in the future.
> >
> > But even without that in place, we need PMDs on both NUMAs anyhow
> (for NUMA-aware polling of vhostuser ports), so why not use them to also
> poll remote eth ports. We can achieve better average performance with
> fewer PMDs than with the current limitation to NUMA-local polling.
> >
> 
> If the user has some knowledge of the NUMA locality of ports and can place
> VMs accordingly, default cross-NUMA assignment can harm performance.
> Also, it would make for very unpredictable performance from test to test and
> even from flow to flow on a datapath.
[[BO'M]] Wang's original request would constitute default cross-NUMA assignment, 
but I don't think this modified proposal would, as it still requires explicit 
config to assign rxqs to the remote NUMA.

[Wangzhike] Either a configuration option or a compile-time option is OK with 
me, since only the physical NIC rxqs need to be configured. It is a one-shot job.
Regarding the test concern, I think it is worth quantifying the performance 
difference if the new behavior improves rx throughput significantly.
> 
> Kevin.
> 
> > BR, Jan
> >



Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread 王志克
Hi Billy,

Please see my reply in line.

Br,
Wang Zhike

-Original Message-
From: O Mahony, Billy [mailto:billy.o.mah...@intel.com] 
Sent: Wednesday, September 06, 2017 9:01 PM
To: 王志克; Darrell Ball; ovs-discuss@openvswitch.org; ovs-...@openvswitch.org; 
Kevin Traynor
Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

Hi Wang,

I think the mention of pinning was confusing me a little. Let me see if I fully 
understand your use case: you don't 'want' to pin anything, but you are using 
it as a way to force the distribution of rxqs from a single nic across PMDs on 
different NUMAs, because without pinning all rxqs are assigned to the 
NUMA-local pmd, leaving the other PMD totally unused.

But then when you used pinning, the PMDs became isolated, so the vhostuser 
ports' rxqs would not be assigned to the PMDs unless they too were pinned. That 
worked but was not manageable as VMs (and vhost ports) came and went.

Yes? 
[Wang Zhike] Yes, exactly.

In that case what we probably want is the ability to pin an rxq to a pmd but 
without also isolating the pmd. So the PMD could be assigned some rxqs manually 
and still have others automatically assigned. 

But what I still don't understand is why you don't put both PMDs on the same 
NUMA node. Given that you cannot program the NIC to know which VM a frame is 
for, you would have to RSS the frames across rxqs (i.e. across NUMA nodes). 
Of those going to the NIC's local NUMA node, 50% would have to go across the 
NUMA boundary when their destination VM was decided - which is okay - they have 
to cross the boundary at some point. But of the frames going to the non-local 
NUMA node, 50% will actually be destined for what was originally the local NUMA 
node. These packets (25% of all traffic) will cross NUMA *twice*, whereas if 
all PMDs were on the NIC's NUMA node those frames would never have had to pass 
between NUMA nodes.

In short I think it's more efficient to have both PMDs on the same NUMA node as 
the NIC.
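The 50%/25% accounting above can be sanity-checked with a little arithmetic. The model below is mine, not from the thread: NIC on NUMA 0, traffic RSS'ed evenly over the rxqs, VM destinations split 50/50 across the two nodes and independent of the RSS hash. It only counts boundary crossings, not their (very different) costs:

```python
def numa_cost(pmd_nodes, nic_node=0, vm_split=(0.5, 0.5)):
    """Return (avg NUMA crossings per packet, fraction crossing twice).

    pmd_nodes: the NUMA node of the PMD polling each rxq; traffic is
    assumed to be spread evenly over the rxqs by RSS."""
    avg = twice = 0.0
    share = 1.0 / len(pmd_nodes)
    for pmd in pmd_nodes:
        for vm_node, frac in enumerate(vm_split):
            # One possible hop NIC->PMD (PCI DMA), one PMD->VM (vhost copy)
            hops = (nic_node != pmd) + (pmd != vm_node)
            avg += share * frac * hops
            if hops == 2:
                twice += share * frac
    return avg, twice

print(numa_cost([0, 1]))  # (1.0, 0.25): one PMD per node -> 25% cross twice
print(numa_cost([0, 0]))  # (0.5, 0.0): both PMDs on NIC's node -> never twice
```

Note the model weights the PCI DMA hop and the vhost copy equally; Jan's observation elsewhere in the thread is that the DMA hop is far cheaper, which is exactly what makes this worth measuring rather than counting.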

[Wang Zhike] If considering the Tx direction, i.e. from a VM on a different NUMA 
node to the phy NIC, I am not sure whether your proposal would degrade TX 
performance...
I will try to test the different cross-NUMA scenarios to get performance
penalty data.
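For what it's worth, the per-PMD counters built into OVS-DPDK should make that comparison measurable without any code change (both appctl commands exist in this era of the DPDK datapath, though the output format varies by version):

```shell
# Which rxqs each PMD polls, and which core (hence NUMA node) each PMD runs on
ovs-appctl dpif-netdev/pmd-rxq-show

# Per-PMD processing-cycle and packet counters: clear, run traffic, then read
ovs-appctl dpif-netdev/pmd-stats-clear
ovs-appctl dpif-netdev/pmd-stats-show
```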

There is one more comment below.

> -Original Message-
> From: 王志克 [mailto:wangzh...@jd.com]
> Sent: Wednesday, September 6, 2017 12:50 PM
> To: O Mahony, Billy ; Darrell Ball
> ; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor 
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Billy,
> 
> See my reply in line.
> 
> Br,
> Wang Zhike
> 
> -Original Message-
> From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> Sent: Wednesday, September 06, 2017 7:26 PM
> To: 王志克; Darrell Ball; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Wang,
> 
> You are going to have to take the hit crossing the NUMA boundary at some
> point if your NIC and VM are on different NUMAs.
> 
> So are you saying that it is more expensive to cross the NUMA boundary
> from the pmd to the VM than to cross it from the NIC to the PMD?
> 
> [Wang Zhike] I do not have such data. I hope we can try the new behavior
> and get the test result, and then know whether and how much performance
> can be improved.

[[BO'M]] You don't need a code change to compare the performance of these two 
scenarios: you can simulate it by pinning queues to VMs. I'd imagine crossing 
the NUMA boundary during the PCI DMA would be cheaper than crossing it over 
vhost. But I don't know what the result would be, and it would be a pretty 
interesting figure to have, by the way.
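The simulation described above can be done with the existing pmd-rxq-affinity knob (a real option; the queue:core pairs below are placeholder values for cores on different NUMA nodes):

```shell
# Pin rxq 0 of the phy port to core 2 (NUMA 0) and rxq 1 to core 22 (NUMA 1).
# Caveat: in current releases this also isolates those PMDs, so vhostuser
# rxqs must then be pinned by hand too -- the pain point discussed in this
# thread.
ovs-vsctl set Interface dpdk0 other_config:pmd-rxq-affinity="0:2,1:22"
```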


> 
> If so then in that case you'd like to have two (for example) PMDs polling 2
> queues on the same NIC. With the PMDs on each of the NUMA nodes
> forwarding to the VMs local to that NUMA?
> 
> Of course your NIC would then also need to be able to know which VM (or at
> least which NUMA the VM is on) in order to send the frame to the correct
> rxq.
> 
> [Wang Zhike] Currently I do not know how to achieve that. From my view, the
> NIC does not know which NUMA node is the destination of the packet. Only
> after OVS processing (e.g. looking up the forwarding rules in OVS) is the
> destination known. If the NIC does not know the destination NUMA socket, it
> does not matter which PMD polls it.
> 
> 
> /Billy.
> 
> > -Original Message-
> > From: 王志克 [mailto:wangzh...@jd.com]
> > Sent: Wednesday, September 6, 2017 11:41 AM
> > To: O Mahony, Billy ; Darrell Ball
> > ; ovs-discuss@openvswitch.org; ovs-
> > d...@openvswitch.org; Kevin Traynor 
> > Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> > Hi Billy,
> >
> > It depends on the destination of the traffic.
> >
> > I observed that if the traffic destination is across NUMA socket, th

Re: [ovs-discuss] conntrack: Another ct-clean thread crash bug

2017-09-06 Thread Huanglili (lee)
Sorry for the confusion; we'd like the new flow to go through br-linux.
   vm 
___|_3__ ___ 
||__1|   |
||__2|   |
||   |___|
| 4  
ovs+dpdk   br-linux

A new flow to the vm takes the path:
4->2->1->3
Established (est) flows take:
4->3

--
Darrell Ball [mailto:db...@vmware.com] 
2017-09-06 23:50
Sender: Huanglili (lee); ovs-discuss@openvswitch.org
CC: b...@nicira.com; caihe; liucheng (J)
OBJ: Re: [ovs-discuss] conntrack: Another ct-clean thread crash bug

Hmm, that seems odd.
Also, the code change you propose below does not make sense and would likely 
cause similar crashes itself.

Maybe you can explain what you are trying to do in your testing?
Can you say what traffic you are sending and from which ports?

I’ll take another look at the related code.

Darrell


On 9/6/17, 6:14 AM, "Huanglili (lee)"  wrote:

Hi,
We met another vswitchd crash when we use ct(nat) (ovs+dpdk).

Program terminated with signal 11, Segmentation fault.
#0  0x00574a0b in hmap_remove (node=0x7f150c6e60a8, 
hmap=0x7f1553c40780) at lib/hmap.h:270
while (*bucket != node) {

(gdb) bt
#0  0x00574a0b in hmap_remove (node=0x7f150c6e60a8, 
hmap=0x7f1553c40780)
#1  sweep_bucket (limit=1808, now=563303851, ctb=0x7f1553c40778, 
ct=0x7f1553c3f9a8)
#2  conntrack_clean (now=563303851, ct=0x7f1553c3f9a8) 
#3  clean_thread_main (f_=0x7f1553c3f9a8) 

This crash can be triggered using the following flows; maybe the flows are 
not reasonable, but they shouldn't trigger a crash:
"table=0,priority=2,in_port=1 actions=resubmit(,2)
table=0,priority=2,in_port=4 actions=resubmit(,2)
table=0,priority=0 actions=drop
table=0,priority=1 actions=resubmit(,10)
table=1,priority=0 actions=resubmit(,14)
table=2,priority=0 actions=resubmit(,4)
table=4,priority=0 actions=resubmit(,14)
table=10,priority=2,arp actions=resubmit(,12)
table=10,priority=1,dl_src=90:E2:BA:69:CD:61 actions=resubmit(,1)
table=10,priority=0 actions=drop

table=12,priority=3,arp,dl_src=90:E2:BA:69:CD:61,arp_spa=194.168.100.1,arp_sha=90:E2:BA:69:CD:61
 actions=resubmit(,1)
table=12,priority=2,arp actions=drop
table=14,priority=6,ip actions=ct(table=16,zone=1)
table=14,priority=0 actions=resubmit(,20)
table=14,priority=20,ip,ip_frag=yes,actions=resubmit(,18)
table=16,priority=20,ct_state=+est+trk,ip actions=resubmit(,20)
table=16,priority=15,ct_state=+rel+trk,ip actions=resubmit(,20)
table=16,priority=10,ct_mark=0x8000/0x8000,udp actions=resubmit(,20)
table=16,priority=5,ct_state=+new+trk,ip,in_port=3 actions=resubmit(,18)
table=16,priority=5,ct_state=+new+trk,ip,in_port=4 actions=resubmit(,18)
table=16,priority=5,ct_state=+new+trk,ip,in_port=2 
actions=ct(commit,zone=1,exec(load:0x1->NXM_NX_CT_MARK[31])),output:4
table=16,priority=5,ct_state=+new+trk,ip,in_port=1 
actions=ct(commit,zone=1,exec(load:0x1->NXM_NX_CT_MARK[31])),output:3
table=18,priority=0,in_port=3 actions=ct(zone=1,table=24)
table=18,priority=0,in_port=2 actions=output:4
table=18,priority=0,in_port=4,ip 
actions=ct(commit,zone=1,nat(dst=194.168.100.1)),2
table=18,priority=0,in_port=1 actions=output:3
table=20,priority=10,in_port=3,ip actions=ct(zone=1,table=22)
table=20,priority=10,in_port=4,ip actions=ct(zone=1,table=23)
table=20,priority=1 actions=ct(zone=1,table=18)
table=22,priority=10,in_port=3 action=4
table=23,priority=10,in_port=4 action=3
table=24,priority=10,in_port=3 action=1"

The networking:
vm
 |
br-ply - br-linux
 |
br-int

We find that rev_conn is sometimes still on the ctb->exp_lists[] list.
The following change solves the problem, but we can't explain why:

$ git diff
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 419cb1d..d5141c4 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ nat_clean(struct conntrack *ct, struct conn *conn,
 if (rev_conn && (!nat_conn_key_node ||
  conn_key_cmp(&nat_conn_key_node->value,
   &rev_conn->rev_key))) {
+ovs_list_remove(&rev_conn->exp_node);
 hmap_remove(&ct->buckets[bucket_rev_conn].connections,
 &rev_conn->node);
 free(rev_conn);
@@ create_un_nat_conn(struct conntrack *ct, struct conn *conn_f
or_un_nat_copy,
 nat_conn_keys_lookup(&ct->nat_conn_keys, &nc->key, 
ct->hash_basis);
 if (nat_conn_key_node && !conn_key_cmp(&nat_conn_key_node->value,
 &nc->rev_key) && !rev_conn) {
-
+ovs_list_init(&nc->exp_node);
  

Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread Jan Scheurich
> 
> I think the mention of pinning was confusing me a little. Let me see if I 
> fully understand your use case:  You don't 'want' to pin
> anything but you are using it as a way to force the distribution of rxq from 
> a single nic across to PMDs on different NUMAs. As without
> pinning all rxqs are assigned to the NUMA-local pmd leaving the other PMD 
> totally unused.
> 
> But then when you used pinning, the PMDs became isolated, so the vhostuser 
> ports' rxqs would not be assigned to the PMDs unless
> they too were pinned. That worked but was not manageable as VMs (and vhost 
> ports) came and went.
> 
> Yes?

Yes!!!

> 
> In that case what we probably want is the ability to pin an rxq to a pmd but 
> without also isolating the pmd. So the PMD could be
> assigned some rxqs manually and still have others automatically assigned.

Wonderful. That is exactly what I have wanted to propose for a while: Separate 
PMD isolation from pinning of Rx queues. 

Tying these two together makes it impossible to use pinning of Rx queues in 
OpenStack context (without the addition of dedicated PMDs/cores). And even 
during manual testing it is a nightmare to have to manually pin all 48 
vhostuser queues just because we want to pin the two heavy-loaded Rx queues to 
different PMDs.

The idea would be to introduce a separate configuration option for PMDs to 
isolate them, and no longer automatically set that when pinning an rx queue to 
the PMD.
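To make the proposal concrete, a sketch of the before/after configuration; pmd-rxq-affinity is the real, existing option, while pmd-isolate is a hypothetical name invented here purely for illustration:

```shell
# Today: pinning rxq 0 to core 2 also isolates the PMD on core 2
ovs-vsctl set Interface dpdk0 other_config:pmd-rxq-affinity="0:2"

# Proposed: pinning would no longer isolate; isolation would be requested
# explicitly, e.g. via a separate option (name hypothetical)
ovs-vsctl set Open_vSwitch . other_config:pmd-isolate="2"
```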

BR, Jan


Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread Jan Scheurich
Hi Billy,

> You are going to have to take the hit crossing the NUMA boundary at some 
> point if your NIC and VM are on different NUMAs.
> 
> So are you saying that it is more expensive to cross the NUMA boundary from 
> the pmd to the VM than to cross it from the NIC to the
> PMD?

Indeed, that is the case: If the NIC crosses the QPI bus when storing packets 
in the remote NUMA there is no cost involved for the PMD. (The QPI bandwidth is 
typically not a bottleneck.) The PMD only performs local memory access.

On the other hand, if the PMD crosses the QPI when copying packets into a 
remote VM, there is a huge latency penalty involved, consuming lots of PMD 
cycles that cannot be spent on processing packets. We at Ericsson have observed 
exactly this behavior.

This latency penalty becomes even worse when the LLC cache hit rate is degraded 
due to LLC cache contention with real VNFs and/or unfavorable packet buffer 
re-use patterns as exhibited by real VNFs compared to typical synthetic 
benchmark apps like DPDK testpmd.

> 
> If so then in that case you'd like to have two (for example) PMDs polling 2 
> queues on the same NIC. With the PMDs on each of the
> NUMA nodes forwarding to the VMs local to that NUMA?
> 
> Of course your NIC would then also need to be able to know which VM (or at least 
> which NUMA the VM is on) in order to send the frame
> to the correct rxq.

That would indeed be optimal but hard to realize in the general case (e.g. with 
VXLAN encapsulation) as the actual destination is only known after tunnel pop. 
Here perhaps some probabilistic steering of RSS hash values based on measured 
distribution of final destinations might help in the future.

But even without that in place, we need PMDs on both NUMAs anyhow (for 
NUMA-aware polling of vhostuser ports), so why not use them to also poll remote 
eth ports. We can achieve better average performance with fewer PMDs than with 
the current limitation to NUMA-local polling.

BR, Jan



Re: [ovs-discuss] conntrack: Another ct-clean thread crash bug

2017-09-06 Thread Darrell Ball
Hmm, that seems odd.
Also, the code change you propose below does not make sense and would likely 
cause similar crashes itself.

Maybe you can explain what you are trying to do in your testing?
Can you say what traffic you are sending and from which ports?

I’ll take another look at the related code.

Darrell


On 9/6/17, 6:14 AM, "Huanglili (lee)"  wrote:

Hi,
We met another vswitchd crash when we use ct(nat) (ovs+dpdk).

Program terminated with signal 11, Segmentation fault.
#0  0x00574a0b in hmap_remove (node=0x7f150c6e60a8, 
hmap=0x7f1553c40780) at lib/hmap.h:270
while (*bucket != node) {

(gdb) bt
#0  0x00574a0b in hmap_remove (node=0x7f150c6e60a8, 
hmap=0x7f1553c40780)
#1  sweep_bucket (limit=1808, now=563303851, ctb=0x7f1553c40778, 
ct=0x7f1553c3f9a8)
#2  conntrack_clean (now=563303851, ct=0x7f1553c3f9a8) 
#3  clean_thread_main (f_=0x7f1553c3f9a8) 

This crash can be triggered using the following flows; maybe the flows are 
not reasonable, but they shouldn't trigger a crash:
"table=0,priority=2,in_port=1 actions=resubmit(,2)
table=0,priority=2,in_port=4 actions=resubmit(,2)
table=0,priority=0 actions=drop
table=0,priority=1 actions=resubmit(,10)
table=1,priority=0 actions=resubmit(,14)
table=2,priority=0 actions=resubmit(,4)
table=4,priority=0 actions=resubmit(,14)
table=10,priority=2,arp actions=resubmit(,12)
table=10,priority=1,dl_src=90:E2:BA:69:CD:61 actions=resubmit(,1)
table=10,priority=0 actions=drop

table=12,priority=3,arp,dl_src=90:E2:BA:69:CD:61,arp_spa=194.168.100.1,arp_sha=90:E2:BA:69:CD:61
 actions=resubmit(,1)
table=12,priority=2,arp actions=drop
table=14,priority=6,ip actions=ct(table=16,zone=1)
table=14,priority=0 actions=resubmit(,20)
table=14,priority=20,ip,ip_frag=yes,actions=resubmit(,18)
table=16,priority=20,ct_state=+est+trk,ip actions=resubmit(,20)
table=16,priority=15,ct_state=+rel+trk,ip actions=resubmit(,20)
table=16,priority=10,ct_mark=0x8000/0x8000,udp actions=resubmit(,20)
table=16,priority=5,ct_state=+new+trk,ip,in_port=3 actions=resubmit(,18)
table=16,priority=5,ct_state=+new+trk,ip,in_port=4 actions=resubmit(,18)
table=16,priority=5,ct_state=+new+trk,ip,in_port=2 
actions=ct(commit,zone=1,exec(load:0x1->NXM_NX_CT_MARK[31])),output:4
table=16,priority=5,ct_state=+new+trk,ip,in_port=1 
actions=ct(commit,zone=1,exec(load:0x1->NXM_NX_CT_MARK[31])),output:3
table=18,priority=0,in_port=3 actions=ct(zone=1,table=24)
table=18,priority=0,in_port=2 actions=output:4
table=18,priority=0,in_port=4,ip 
actions=ct(commit,zone=1,nat(dst=194.168.100.1)),2
table=18,priority=0,in_port=1 actions=output:3
table=20,priority=10,in_port=3,ip actions=ct(zone=1,table=22)
table=20,priority=10,in_port=4,ip actions=ct(zone=1,table=23)
table=20,priority=1 actions=ct(zone=1,table=18)
table=22,priority=10,in_port=3 action=4
table=23,priority=10,in_port=4 action=3
table=24,priority=10,in_port=3 action=1"

The networking:
vm
 |
br-ply - br-linux
 |
br-int

We find that rev_conn is sometimes still on the ctb->exp_lists[] list.
The following change solves the problem, but we can't explain why:

$ git diff
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 419cb1d..d5141c4 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ nat_clean(struct conntrack *ct, struct conn *conn,
 if (rev_conn && (!nat_conn_key_node ||
  conn_key_cmp(&nat_conn_key_node->value,
   &rev_conn->rev_key))) {
+ovs_list_remove(&rev_conn->exp_node);
 hmap_remove(&ct->buckets[bucket_rev_conn].connections,
 &rev_conn->node);
 free(rev_conn);
@@ create_un_nat_conn(struct conntrack *ct, struct conn *conn_f
or_un_nat_copy,
 nat_conn_keys_lookup(&ct->nat_conn_keys, &nc->key, 
ct->hash_basis);
 if (nat_conn_key_node && !conn_key_cmp(&nat_conn_key_node->value,
 &nc->rev_key) && !rev_conn) {
-
+ovs_list_init(&nc->exp_node);
 hmap_insert(&ct->buckets[un_nat_conn_bucket].connections,
 &nc->node, un_nat_hash);

Any idea?

Thanks.



On 8/24/17, 3:36 AM, "ovs-dev-boun...@openvswitch.org on behalf of 
huanglili"  wrote:

From: Lili Huang 

Conn should be removed from the list before freed.

This crash will be triggered when a established flow do ct(nat)
again, like
"ip,actions=ct(table=1)
 table=1,in_port=1,ip,actions=ct(commit,nat(dst=5.5.5.5)),2
 table=1,in_port=2,ip,ct

Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread O Mahony, Billy


> -Original Message-
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Wednesday, September 6, 2017 3:02 PM
> To: Jan Scheurich ; O Mahony, Billy
> ; wangzh...@jd.com; Darrell Ball
> ; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> On 09/06/2017 02:43 PM, Jan Scheurich wrote:
> >>
> >> I think the mention of pinning was confusing me a little. Let me see
> >> if I fully understand your use case:  You don't 'want' to pin
> >> anything but you are using it as a way to force the distribution of rxq 
> >> from
> a single nic across to PMDs on different NUMAs. As without pinning all rxqs
> are assigned to the NUMA-local pmd leaving the other PMD totally unused.
> >>
> >> But then when you used pinning you the PMDs became isolated so the
> >> vhostuser ports rxqs would not be assigned to the PMDs unless they too
> were pinned. Which worked but was not manageable as VM (and vhost
> ports) came and went.
> >>
> >> Yes?
> >
> > Yes!!!
[[BO'M]] Hurrah!
> >
> >>
> >> In that case what we probably want is the ability to pin an rxq to a
> >> pmd but without also isolating the pmd. So the PMD could be assigned
> some rxqs manually and still have others automatically assigned.
> >
> > Wonderful. That is exactly what I have wanted to propose for a while:
> Separate PMD isolation from pinning of Rx queues.
> >
> > Tying these two together makes it impossible to use pinning of Rx queues
> in OpenStack context (without the addition of dedicated PMDs/cores). And
> even during manual testing it is a nightmare to have to manually pin all 48
> vhostuser queues just because we want to pin the two heavy-loaded Rx
> queues to different PMDs.
> >
> 
> That sounds like it would be useful. Do you know in advance of running which
> rxqs they will be? i.e. you know it's a particular port and there is only one
> queue. Or do you not know, but analyze at runtime and then reconfigure?
> 
> > The idea would be to introduce a separate configuration option for PMDs
> to isolate them, and no longer automatically set that when pinning an rx
> queue to the PMD.
> >
> 
> Please don't break backward compatibility. I think it would be better to keep
> the existing command as is and add a new softer version that allows other
> rxq's to be scheduled on that pmd also.
[[BO'M]] Although, is the implicit isolation feature of pmd-rxq-affinity actually 
used in the wild? But still, it's sensible to introduce the new 'softer 
version' as you say.
> 
> Kevin.
> 
> > BR, Jan
> >



Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread O Mahony, Billy


> -Original Message-
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Wednesday, September 6, 2017 2:50 PM
> To: Jan Scheurich ; O Mahony, Billy
> ; wangzh...@jd.com; Darrell Ball
> ; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> On 09/06/2017 02:33 PM, Jan Scheurich wrote:
> > Hi Billy,
> >
> >> You are going to have to take the hit crossing the NUMA boundary at
> some point if your NIC and VM are on different NUMAs.
> >>
> >> So are you saying that it is more expensive to cross the NUMA
> >> boundary from the pmd to the VM than to cross it from the NIC to the
> PMD?
> >
> > Indeed, that is the case: If the NIC crosses the QPI bus when storing
> packets in the remote NUMA there is no cost involved for the PMD. (The QPI
> bandwidth is typically not a bottleneck.) The PMD only performs local
> memory access.
> >
> > On the other hand, if the PMD crosses the QPI when copying packets into a
> remote VM, there is a huge latency penalty involved, consuming lots of PMD
> cycles that cannot be spent on processing packets. We at Ericsson have
> observed exactly this behavior.
> >
> > This latency penalty becomes even worse when the LLC cache hit rate is
> degraded due to LLC cache contention with real VNFs and/or unfavorable
> packet buffer re-use patterns as exhibited by real VNFs compared to typical
> synthetic benchmark apps like DPDK testpmd.
> >
> >>
> >> If so then in that case you'd like to have two (for example) PMDs
> >> polling 2 queues on the same NIC. With the PMDs on each of the NUMA
> nodes forwarding to the VMs local to that NUMA?
> >>
> >> Of course your NIC would then also need to be able to know which VM (or
> >> at least which NUMA the VM is on) in order to send the frame to the
> correct rxq.
> >
> > That would indeed be optimal but hard to realize in the general case (e.g.
> with VXLAN encapsulation) as the actual destination is only known after
> tunnel pop. Here perhaps some probabilistic steering of RSS hash values
> based on measured distribution of final destinations might help in the future.
> >
> > But even without that in place, we need PMDs on both NUMAs anyhow
> (for NUMA-aware polling of vhostuser ports), so why not use them to also
> poll remote eth ports. We can achieve better average performance with
> fewer PMDs than with the current limitation to NUMA-local polling.
> >
> 
> If the user has some knowledge of the NUMA locality of ports and can place
> VMs accordingly, default cross-NUMA assignment can harm performance.
> Also, it would make for very unpredictable performance from test to test and
> even from flow to flow on a datapath.
[[BO'M]] Wang's original request would constitute default cross-NUMA assignment, 
but I don't think this modified proposal would, as it still requires explicit 
config to assign rxqs to the remote NUMA.
> 
> Kevin.
> 
> > BR, Jan
> >



[ovs-discuss] adding dpdk ports sharing same pci address to ovs-dpdk bridge

2017-09-06 Thread devendra rawat
Hi,

I have compiled and built ovs-dpdk using DPDK v17.08 and OVS v2.8.0. The
NIC that I am using is Mellanox ConnectX-3 Pro, which is a dual port 10G
NIC. The problem with this NIC is that it provides only one PCI address for
both the 10G ports.

So when I am trying to add the two DPDK ports to my br0 bridge

# ovs-vsctl --no-wait add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
options:dpdk-devargs=0002:01:00.0

# ovs-vsctl --no-wait add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
options:dpdk-devargs=0002:01:00.0

The port dpdk1 is added successfully and is able to transfer data, but adding
dpdk0 to br0 fails:
2017-09-06T14:19:20Z|00045|netdev_dpdk|INFO|Port 0: e4:1d:2d:4f:78:60
2017-09-06T14:19:20Z|00046|bridge|INFO|bridge br0: added interface dpdk1 on
port 1
2017-09-06T14:19:20Z|00047|bridge|INFO|bridge br0: added interface br0 on
port 65534
2017-09-06T14:19:20Z|00048|dpif_netlink|WARN|Generic Netlink family
'ovs_datapath' does not exist. The Open vSwitch kernel module is probably
not loaded.
2017-09-06T14:19:20Z|00049|netdev_dpdk|WARN|'dpdk0' is trying to use device
'0002:01:00.0' which is already in use by 'dpdk1'
2017-09-06T14:19:20Z|00050|netdev|WARN|dpdk0: could not set configuration
(Address already in use)
2017-09-06T14:19:20Z|00051|bridge|INFO|bridge br0: using datapath ID
e41d2d4f7860


With OVS v2.6.1 I never had this problem, as dpdk-devargs was not mandatory
and just specifying the port name was enough to add a port to the bridge.

Is there a way to add both ports to the bridge?
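One avenue that may be worth trying (an assumption on my part, based on the DPDK mlx4 PMD's documented "port" device argument; I have not verified it against OVS 2.8.0): give each Interface a distinct devargs string by selecting the physical port inside the devargs:

```shell
# mlx4 exposes both ports behind one PCI address; the driver's "port"
# devarg selects which physical port to attach (untested with OVS 2.8.0)
ovs-vsctl --no-wait add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \
    options:dpdk-devargs="0002:01:00.0,port=0"
ovs-vsctl --no-wait add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk \
    options:dpdk-devargs="0002:01:00.0,port=1"
```

If the driver rejects the extra argument, this NIC may simply need newer OVS/DPDK support for multi-port PCI devices; the log above shows OVS keying uniqueness purely on the device string.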

Thanks,
Devendra


Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread Kevin Traynor
On 09/06/2017 02:43 PM, Jan Scheurich wrote:
>>
>> I think the mention of pinning was confusing me a little. Let me see if I 
>> fully understand your use case:  You don't 'want' to pin
>> anything but you are using it as a way to force the distribution of rxq from 
>> a single nic across to PMDs on different NUMAs. As without
>> pinning all rxqs are assigned to the NUMA-local pmd leaving the other PMD 
>> totally unused.
>>
>> But then when you used pinning, the PMDs became isolated, so the vhostuser 
>> ports' rxqs would not be assigned to the PMDs unless
>> they too were pinned. That worked but was not manageable as VMs (and vhost 
>> ports) came and went.
>>
>> Yes?
> 
> Yes!!!
> 
>>
>> In that case what we probably want is the ability to pin an rxq to a pmd but 
>> without also isolating the pmd. So the PMD could be
>> assigned some rxqs manually and still have others automatically assigned.
> 
> Wonderful. That is exactly what I have wanted to propose for a while: 
> Separate PMD isolation from pinning of Rx queues. 
> 
> Tying these two together makes it impossible to use pinning of Rx queues in 
> OpenStack context (without the addition of dedicated PMDs/cores). And even 
> during manual testing it is a nightmare to have to manually pin all 48 
> vhostuser queues just because we want to pin the two heavy-loaded Rx queues 
> to different PMDs.
> 

That sounds like it would be useful. Do you know in advance of running
which rxqs they will be? i.e. you know it's a particular port and there
is only one queue. Or do you not know, but analyze at runtime and then
reconfigure?

> The idea would be to introduce a separate configuration option for PMDs to 
> isolate them, and no longer automatically set that when pinning an rx queue 
> to the PMD.
> 

Please don't break backward compatibility. I think it would be better to
keep the existing command as is and add a new, softer version that allows
other rxqs to also be scheduled on that pmd.
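For reference, the existing pinning mechanism under discussion is the per-Interface pmd-rxq-affinity option; a minimal sketch (the queue and core numbers are placeholders):

```shell
# Pin rxq 0 of dpdk0 to core 3 and rxq 1 to core 27 (placeholder IDs).
# In current OVS this also isolates those PMDs, which is the coupling
# the thread proposes relaxing.
ovs-vsctl set Interface dpdk0 \
    other_config:pmd-rxq-affinity="0:3,1:27"
```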

Kevin.

> BR, Jan
> 



Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread Kevin Traynor
On 09/06/2017 02:33 PM, Jan Scheurich wrote:
> Hi Billy,
> 
>> You are going to have to take the hit crossing the NUMA boundary at some 
>> point if your NIC and VM are on different NUMAs.
>>
>> So are you saying that it is more expensive to cross the NUMA boundary from
>> the pmd to the VM than to cross it from the NIC to the PMD?
> 
> Indeed, that is the case: If the NIC crosses the QPI bus when storing packets 
> in the remote NUMA there is no cost involved for the PMD. (The QPI bandwidth 
> is typically not a bottleneck.) The PMD only performs local memory access.
> 
> On the other hand, if the PMD crosses the QPI when copying packets into a 
> remote VM, there is a huge latency penalty involved, consuming lots of PMD 
> cycles that cannot be spent on processing packets. We at Ericsson have 
> observed exactly this behavior.
> 
> This latency penalty becomes even worse when the LLC cache hit rate is 
> degraded due to LLC cache contention with real VNFs and/or unfavorable packet 
> buffer re-use patterns as exhibited by real VNFs compared to typical 
> synthetic benchmark apps like DPDK testpmd.
> 
>>
>> If so then in that case you'd like to have two (for example) PMDs polling 2 
>> queues on the same NIC. With the PMDs on each of the
>> NUMA nodes forwarding to the VMs local to that NUMA?
>>
>> Of course your NIC would then also need to be able to know which VM (or at
>> least which NUMA node the VM is on) in order to send the frame to the
>> correct rxq.
> 
> That would indeed be optimal but hard to realize in the general case (e.g. 
> with VXLAN encapsulation) as the actual destination is only known after 
> tunnel pop. Here perhaps some probabilistic steering of RSS hash values based 
> on measured distribution of final destinations might help in the future.
> 
> But even without that in place, we need PMDs on both NUMAs anyhow (for 
> NUMA-aware polling of vhostuser ports), so why not use them to also poll 
> remote eth ports. We can achieve better average performance with fewer PMDs 
> than with the current limitation to NUMA-local polling.
> 

If the user has some knowledge of the NUMA locality of ports and can
place VMs accordingly, default cross-NUMA assignment can harm
performance. It would also make performance very unpredictable from
test to test, and even from flow to flow on a datapath.

Kevin.

> BR, Jan
> 



[ovs-discuss] conntrack: Another ct-clean thread crash bug

2017-09-06 Thread Huanglili (lee)
Hi,
We hit another vswitchd crash when using ct(nat) (OVS+DPDK).

Program terminated with signal 11, Segmentation fault.
#0  0x00574a0b in hmap_remove (node=0x7f150c6e60a8, 
hmap=0x7f1553c40780) at lib/hmap.h:270
while (*bucket != node) {

(gdb) bt
#0  0x00574a0b in hmap_remove (node=0x7f150c6e60a8, hmap=0x7f1553c40780)
#1  sweep_bucket (limit=1808, now=563303851, ctb=0x7f1553c40778, 
ct=0x7f1553c3f9a8)
#2  conntrack_clean (now=563303851, ct=0x7f1553c3f9a8) 
#3  clean_thread_main (f_=0x7f1553c3f9a8) 

This crash can be triggered with the following flows. Maybe the flows are not
reasonable, but they shouldn't trigger a crash:
"table=0,priority=2,in_port=1 actions=resubmit(,2)
table=0,priority=2,in_port=4 actions=resubmit(,2)
table=0,priority=0 actions=drop
table=0,priority=1 actions=resubmit(,10)
table=1,priority=0 actions=resubmit(,14)
table=2,priority=0 actions=resubmit(,4)
table=4,priority=0 actions=resubmit(,14)
table=10,priority=2,arp actions=resubmit(,12)
table=10,priority=1,dl_src=90:E2:BA:69:CD:61 actions=resubmit(,1)
table=10,priority=0 actions=drop
table=12,priority=3,arp,dl_src=90:E2:BA:69:CD:61,arp_spa=194.168.100.1,arp_sha=90:E2:BA:69:CD:61
 actions=resubmit(,1)
table=12,priority=2,arp actions=drop
table=14,priority=6,ip actions=ct(table=16,zone=1)
table=14,priority=0 actions=resubmit(,20)
table=14,priority=20,ip,ip_frag=yes,actions=resubmit(,18)
table=16,priority=20,ct_state=+est+trk,ip actions=resubmit(,20)
table=16,priority=15,ct_state=+rel+trk,ip actions=resubmit(,20)
table=16,priority=10,ct_mark=0x8000/0x8000,udp actions=resubmit(,20)
table=16,priority=5,ct_state=+new+trk,ip,in_port=3 actions=resubmit(,18)
table=16,priority=5,ct_state=+new+trk,ip,in_port=4 actions=resubmit(,18)
table=16,priority=5,ct_state=+new+trk,ip,in_port=2 
actions=ct(commit,zone=1,exec(load:0x1->NXM_NX_CT_MARK[31])),output:4
table=16,priority=5,ct_state=+new+trk,ip,in_port=1 
actions=ct(commit,zone=1,exec(load:0x1->NXM_NX_CT_MARK[31])),output:3
table=18,priority=0,in_port=3 actions=ct(zone=1,table=24)
table=18,priority=0,in_port=2 actions=output:4
table=18,priority=0,in_port=4,ip 
actions=ct(commit,zone=1,nat(dst=194.168.100.1)),2
table=18,priority=0,in_port=1 actions=output:3
table=20,priority=10,in_port=3,ip actions=ct(zone=1,table=22)
table=20,priority=10,in_port=4,ip actions=ct(zone=1,table=23)
table=20,priority=1 actions=ct(zone=1,table=18)
table=22,priority=10,in_port=3 action=4
table=23,priority=10,in_port=4 action=3
table=24,priority=10,in_port=3 action=1"

The networking:
vm
 |
br-ply - br-linux
 |
br-int

We find that rev_conn is sometimes still on one of the ctb->exp_lists[] lists.
The following change solves the problem, but we can't explain why:

$ git diff
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 419cb1d..d5141c4 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ nat_clean(struct conntrack *ct, struct conn *conn,
     if (rev_conn && (!nat_conn_key_node ||
                      conn_key_cmp(&nat_conn_key_node->value,
                                   &rev_conn->rev_key))) {
+        ovs_list_remove(&rev_conn->exp_node);
         hmap_remove(&ct->buckets[bucket_rev_conn].connections,
                     &rev_conn->node);
         free(rev_conn);
@@ create_un_nat_conn(struct conntrack *ct, struct conn *conn_for_un_nat_copy,
     nat_conn_keys_lookup(&ct->nat_conn_keys, &nc->key, ct->hash_basis);
     if (nat_conn_key_node && !conn_key_cmp(&nat_conn_key_node->value,
                                            &nc->rev_key) && !rev_conn) {
-
+        ovs_list_init(&nc->exp_node);
         hmap_insert(&ct->buckets[un_nat_conn_bucket].connections,
                     &nc->node, un_nat_hash);

Any idea?
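Not a root cause, but while reproducing this it can help to watch the userspace conntrack table as traffic runs; a sketch:

```shell
# Dump the userspace datapath's conntrack entries; stale or duplicated
# NAT entries often show up here before the crash.
ovs-appctl dpctl/dump-conntrack
```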

Thanks.


On 8/24/17, 3:36 AM, "ovs-dev-boun...@openvswitch.org on behalf of 
huanglili"  wrote:

From: Lili Huang 

Conn should be removed from the list before freed.

This crash will be triggered when a established flow do ct(nat)
again, like
"ip,actions=ct(table=1)
 table=1,in_port=1,ip,actions=ct(commit,nat(dst=5.5.5.5)),2
 table=1,in_port=2,ip,ct_state=+est,actions=1
 table=1,in_port=1,ip,ct_state=+est,actions=2"

Signed-off-by: Lili Huang 
---
 lib/conntrack.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 1c0e023..dd73e1a 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -779,6 +779,8 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
                 ct, nc, conn_for_un_nat_copy);
 
     if (!nat_res) {
+        ovs_list_remove(&nc->exp_node);
+        ctx->conn = NULL;
         goto nat_res_exhaustion;

Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread O Mahony, Billy
Hi Wang,

I think the mention of pinning was confusing me a little. Let me see if I fully 
understand your use case:  You don't 'want' to pin anything but you are using 
it as a way to force the distribution of rxq from a single nic across to PMDs 
on different NUMAs. As without pinning all rxqs are assigned to the NUMA-local 
pmd leaving the other PMD totally unused.

But then when you used pinning, the PMDs became isolated, so the vhostuser
ports' rxqs would not be assigned to the PMDs unless they too were pinned. That
worked but was not manageable as VMs (and vhost ports) came and went.

Yes? 

In that case what we probably want is the ability to pin an rxq to a pmd but 
without also isolating the pmd. So the PMD could be assigned some rxqs manually 
and still have others automatically assigned. 

But what I still don't understand is why you don't put both PMDs on the same
NUMA node. Given that you cannot program the NIC to know which VM a frame is
for, you would have to RSS the frames across rxqs (i.e. across NUMA nodes).
Of the frames going to the NIC's local NUMA node, 50% would have to cross the
NUMA boundary when their destination VM was decided - which is okay - they have
to cross the boundary at some point. But of the frames going to the non-local
NUMA node, 50% will actually be destined for what was originally the local NUMA
node. Those packets (25% of all traffic) will cross the NUMA boundary *twice*,
whereas if all PMDs were on the NIC's NUMA node those frames would never have
had to pass between NUMA nodes.

In short, I think it's more efficient to have both PMDs on the same NUMA node
as the NIC.
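The 25% figure follows from simple arithmetic; a sketch assuming RSS spreads frames evenly over the two rxqs and VM destinations are split evenly between the NUMA nodes:

```shell
# Half of all frames are RSS'd to the PMD on the NIC-remote NUMA node;
# half of *those* turn out to be destined for VMs on the NIC-local node,
# so they cross the NUMA boundary twice.
total=100
to_remote_pmd=$((total / 2))            # 50% polled on the remote node
double_crossers=$((to_remote_pmd / 2))  # half of those cross back again
echo "${double_crossers}% of all traffic crosses the NUMA boundary twice"
# prints: 25% of all traffic crosses the NUMA boundary twice
```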

There is one more comment below.

> -Original Message-
> From: 王志克 [mailto:wangzh...@jd.com]
> Sent: Wednesday, September 6, 2017 12:50 PM
> To: O Mahony, Billy ; Darrell Ball
> ; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor 
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Billy,
> 
> See my reply in line.
> 
> Br,
> Wang Zhike
> 
> -Original Message-
> From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> Sent: Wednesday, September 06, 2017 7:26 PM
> To: 王志克; Darrell Ball; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Wang,
> 
> You are going to have to take the hit crossing the NUMA boundary at some
> point if your NIC and VM are on different NUMAs.
> 
> So are you saying that it is more expensive to cross the NUMA boundary
> from the pmd to the VM than to cross it from the NIC to the PMD?
> 
> [Wang Zhike] I do not have such data. I hope we can try the new behavior
> and get the test result, and then know whether and how much performance
> can be improved.

[[BO'M]] You don't need a code change to compare the performance of these two
scenarios; you can simulate it by pinning queues to VMs. I'd imagine crossing
the NUMA boundary during the PCI DMA would be cheaper than crossing it over
vhost, but I don't know what the result would be, and it would be a pretty
interesting figure to have, by the way.


> 
> If so then in that case you'd like to have two (for example) PMDs polling 2
> queues on the same NIC. With the PMDs on each of the NUMA nodes
> forwarding to the VMs local to that NUMA?
> 
> Of course your NIC would then also need to be able know which VM (or at
> least which NUMA the VM is on) in order to send the frame to the correct
> rxq.
> 
> [Wang Zhike] Currently I do not know how to achieve it. From my view, the
> NIC does not know which NUMA node should be the destination of the packet.
> Only after OVS handling (e.g. looking up the forwarding rule in OVS) can it
> know the destination. If the NIC does not know the destination NUMA socket,
> it does not matter which PMD polls it.
> 
> 
> /Billy.
> 
> > -Original Message-
> > From: 王志克 [mailto:wangzh...@jd.com]
> > Sent: Wednesday, September 6, 2017 11:41 AM
> > To: O Mahony, Billy ; Darrell Ball
> > ; ovs-discuss@openvswitch.org; ovs-
> > d...@openvswitch.org; Kevin Traynor 
> > Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> > Hi Billy,
> >
> > It depends on the destination of the traffic.
> >
>> I observed that if the traffic destination is across the NUMA boundary, the
>> "avg processing cycles per packet" increases by about 60% compared with
>> traffic staying on the same NUMA socket.
> >
> > Br,
> > Wang Zhike
> >
> > -Original Message-
> > From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> > Sent: Wednesday, September 06, 2017 6:35 PM
> > To: 王志克; Darrell Ball; ovs-discuss@openvswitch.org; ovs-
> > d...@openvswitch.org; Kevin Traynor
> > Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> > Hi Wang,
> >
> > If you create several PMDs on the NUMA of the physical port does that
> > have the same performance characteristic?
> >
> > /Billy
> >
> >
> >
> > > 

Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread 王志克
Hi Billy,

See my reply in line.

Br,
Wang Zhike

-Original Message-
From: O Mahony, Billy [mailto:billy.o.mah...@intel.com] 
Sent: Wednesday, September 06, 2017 7:26 PM
To: 王志克; Darrell Ball; ovs-discuss@openvswitch.org; ovs-...@openvswitch.org; 
Kevin Traynor
Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

Hi Wang,

You are going to have to take the hit crossing the NUMA boundary at some point 
if your NIC and VM are on different NUMAs.

So are you saying that it is more expensive to cross the NUMA boundary from the
pmd to the VM than to cross it from the NIC to the PMD?

[Wang Zhike] I do not have such data. I hope we can try the new behavior and 
get the test result, and then know whether and how much performance can be 
improved.

If so then in that case you'd like to have two (for example) PMDs polling 2 
queues on the same NIC. With the PMDs on each of the NUMA nodes forwarding to 
the VMs local to that NUMA?

Of course your NIC would then also need to be able to know which VM (or at least
which NUMA node the VM is on) in order to send the frame to the correct rxq.

[Wang Zhike] Currently I do not know how to achieve it. From my view, the NIC
does not know which NUMA node should be the destination of the packet. Only
after OVS handling (e.g. looking up the forwarding rule in OVS) can it know the
destination. If the NIC does not know the destination NUMA socket, it does not
matter which PMD polls it.


/Billy. 

> -Original Message-
> From: 王志克 [mailto:wangzh...@jd.com]
> Sent: Wednesday, September 6, 2017 11:41 AM
> To: O Mahony, Billy ; Darrell Ball
> ; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor 
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Billy,
> 
> It depends on the destination of the traffic.
> 
> I observed that if the traffic destination is across the NUMA boundary, the
> "avg processing cycles per packet" increases by about 60% compared with
> traffic staying on the same NUMA socket.
> 
> Br,
> Wang Zhike
> 
> -Original Message-
> From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> Sent: Wednesday, September 06, 2017 6:35 PM
> To: 王志克; Darrell Ball; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Wang,
> 
> If you create several PMDs on the NUMA of the physical port does that have
> the same performance characteristic?
> 
> /Billy
> 
> 
> 
> > -Original Message-
> > From: 王志克 [mailto:wangzh...@jd.com]
> > Sent: Wednesday, September 6, 2017 10:20 AM
> > To: O Mahony, Billy ; Darrell Ball
> > ; ovs-discuss@openvswitch.org; ovs-
> > d...@openvswitch.org; Kevin Traynor 
> > Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> > Hi Billy,
> >
> > Yes, I want to achieve better performance.
> >
> > The commit "dpif-netdev: Assign ports to pmds on non-local numa node"
> > can NOT meet my needs.
> >
> > I do have pmd on socket 0 to poll the physical NIC which is also on socket 
> > 0.
> > However, this is not enough, since I also have other pmds on socket 1. I
> > hope the pmds on socket 1 can also poll the physical NIC. In this
> > way, we have more CPU (in my case, double CPU) to poll the NIC, which
> > results in performance improvement.
> >
> > BR,
> > Wang Zhike
> >
> > -Original Message-
> > From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> > Sent: Wednesday, September 06, 2017 5:14 PM
> > To: Darrell Ball; 王志克; ovs-discuss@openvswitch.org; ovs-
> > d...@openvswitch.org; Kevin Traynor
> > Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> > Hi Wang,
> >
> > A change was committed to head of master 2017-08-02 "dpif-netdev:
> > Assign ports to pmds on non-local numa node" which if I understand
> > your request correctly will do what you require.
> >
> > However it is not clear to me why you are pinning rxqs to PMDs in the
> > first instance. Currently if you configure at least on pmd on each
> > numa there should always be a PMD available. Is the pinning for
> performance reasons?
> >
> > Regards,
> > Billy
> >
> >
> >
> > > -Original Message-
> > > From: Darrell Ball [mailto:db...@vmware.com]
> > > Sent: Wednesday, September 6, 2017 8:25 AM
> > > To: 王志克 ; ovs-discuss@openvswitch.org; ovs-
> > > d...@openvswitch.org; O Mahony, Billy ;
> > Kevin
> > > Traynor 
> > > Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > > physical port
> > >
> > > Adding Billy and Kevin
> > >
> > >
> > > On 9/6/17, 12:22 AM, "Darrell Ball"  wrote:
> > >
> > >
> > >
> > > On 9/6/17, 12:03 AM, "王志克"  wrote:
> > >
> > > Hi Darrell,
> > >
> > > pmd-rxq-affinity has below limitation: (so isolated pmd can
> > > not be used for others, which is not my expectation. Lots of VMs
> > > come and go on the fly, and manual assignment is not feasible.)
> > >   >>After

Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread O Mahony, Billy
Hi Wang,

You are going to have to take the hit crossing the NUMA boundary at some point 
if your NIC and VM are on different NUMAs.

So are you saying that it is more expensive to cross the NUMA boundary from the
pmd to the VM than to cross it from the NIC to the PMD?

If so then in that case you'd like to have two (for example) PMDs polling 2 
queues on the same NIC. With the PMDs on each of the NUMA nodes forwarding to 
the VMs local to that NUMA?

Of course your NIC would then also need to be able to know which VM (or at least
which NUMA node the VM is on) in order to send the frame to the correct rxq.

/Billy. 

> -Original Message-
> From: 王志克 [mailto:wangzh...@jd.com]
> Sent: Wednesday, September 6, 2017 11:41 AM
> To: O Mahony, Billy ; Darrell Ball
> ; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor 
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Billy,
> 
> It depends on the destination of the traffic.
> 
> I observed that if the traffic destination is across the NUMA boundary, the
> "avg processing cycles per packet" increases by about 60% compared with
> traffic staying on the same NUMA socket.
> 
> Br,
> Wang Zhike
> 
> -Original Message-
> From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> Sent: Wednesday, September 06, 2017 6:35 PM
> To: 王志克; Darrell Ball; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Wang,
> 
> If you create several PMDs on the NUMA of the physical port does that have
> the same performance characteristic?
> 
> /Billy
> 
> 
> 
> > -Original Message-
> > From: 王志克 [mailto:wangzh...@jd.com]
> > Sent: Wednesday, September 6, 2017 10:20 AM
> > To: O Mahony, Billy ; Darrell Ball
> > ; ovs-discuss@openvswitch.org; ovs-
> > d...@openvswitch.org; Kevin Traynor 
> > Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> > Hi Billy,
> >
> > Yes, I want to achieve better performance.
> >
> > The commit "dpif-netdev: Assign ports to pmds on non-local numa node"
> > can NOT meet my needs.
> >
> > I do have pmd on socket 0 to poll the physical NIC which is also on socket 
> > 0.
> > However, this is not enough, since I also have other pmds on socket 1. I
> > hope the pmds on socket 1 can also poll the physical NIC. In this
> > way, we have more CPU (in my case, double CPU) to poll the NIC, which
> > results in performance improvement.
> >
> > BR,
> > Wang Zhike
> >
> > -Original Message-
> > From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> > Sent: Wednesday, September 06, 2017 5:14 PM
> > To: Darrell Ball; 王志克; ovs-discuss@openvswitch.org; ovs-
> > d...@openvswitch.org; Kevin Traynor
> > Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> > Hi Wang,
> >
> > A change was committed to head of master 2017-08-02 "dpif-netdev:
> > Assign ports to pmds on non-local numa node" which if I understand
> > your request correctly will do what you require.
> >
> > However it is not clear to me why you are pinning rxqs to PMDs in the
> > first instance. Currently if you configure at least one pmd on each
> > numa there should always be a PMD available. Is the pinning for
> performance reasons?
> >
> > Regards,
> > Billy
> >
> >
> >
> > > -Original Message-
> > > From: Darrell Ball [mailto:db...@vmware.com]
> > > Sent: Wednesday, September 6, 2017 8:25 AM
> > > To: 王志克 ; ovs-discuss@openvswitch.org; ovs-
> > > d...@openvswitch.org; O Mahony, Billy ;
> > Kevin
> > > Traynor 
> > > Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > > physical port
> > >
> > > Adding Billy and Kevin
> > >
> > >
> > > On 9/6/17, 12:22 AM, "Darrell Ball"  wrote:
> > >
> > >
> > >
> > > On 9/6/17, 12:03 AM, "王志克"  wrote:
> > >
> > > Hi Darrell,
> > >
> > > pmd-rxq-affinity has below limitation: (so isolated pmd can
> > > not be used for others, which is not my expectation. Lots of VMs
> > > come and go on the fly, and manual assignment is not feasible.)
> > >   >>After that PMD threads on cores where RX queues
> > > was pinned will become isolated. This means that this thread will
> > > poll only pinned RX queues
> > >
> > > My problem is that I have several CPUs spreading on
> > > different NUMA nodes. I hope all these CPU can have chance to serve
> the rxq.
> > > However, because the phy NIC only locates on one certain socket
> > > node, non-same numa pmd/CPU would be excluded. So I am wondering
> > > whether
> > we
> > > can have different behavior for phy port rxq:
> > >   round-robin to all PMDs even the pmd on different NUMA 
> > > socket.
> > >
> > > I guess this is a common case, and I believe it would
> > > improve rx performance.
> > >
> > >
> > > [Darrell] I agree it would be a common problem and some
> > > distribution would seem to make sense, maybe facto

Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread 王志克
Hi Billy,

It depends on the destination of the traffic.

I observed that if the traffic destination is across the NUMA boundary, the
"avg processing cycles per packet" increases by about 60% compared with traffic
staying on the same NUMA socket.

Br,
Wang Zhike

-Original Message-
From: O Mahony, Billy [mailto:billy.o.mah...@intel.com] 
Sent: Wednesday, September 06, 2017 6:35 PM
To: 王志克; Darrell Ball; ovs-discuss@openvswitch.org; ovs-...@openvswitch.org; 
Kevin Traynor
Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

Hi Wang,

If you create several PMDs on the NUMA of the physical port does that have the 
same performance characteristic? 

/Billy



> -Original Message-
> From: 王志克 [mailto:wangzh...@jd.com]
> Sent: Wednesday, September 6, 2017 10:20 AM
> To: O Mahony, Billy ; Darrell Ball
> ; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor 
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Billy,
> 
> Yes, I want to achieve better performance.
> 
> The commit "dpif-netdev: Assign ports to pmds on non-local numa node" can
> NOT meet my needs.
> 
> I do have pmd on socket 0 to poll the physical NIC which is also on socket 0.
> However, this is not enough, since I also have other pmds on socket 1. I hope
> the pmds on socket 1 can also poll the physical NIC. In this way, we have
> more CPU (in my case, double CPU) to poll the NIC, which results in
> performance improvement.
> 
> BR,
> Wang Zhike
> 
> -Original Message-
> From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> Sent: Wednesday, September 06, 2017 5:14 PM
> To: Darrell Ball; 王志克; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Wang,
> 
> A change was committed to head of master 2017-08-02 "dpif-netdev: Assign
> ports to pmds on non-local numa node" which if I understand your request
> correctly will do what you require.
> 
> However it is not clear to me why you are pinning rxqs to PMDs in the first
> instance. Currently if you configure at least one pmd on each numa there
> should always be a PMD available. Is the pinning for performance reasons?
> 
> Regards,
> Billy
> 
> 
> 
> > -Original Message-
> > From: Darrell Ball [mailto:db...@vmware.com]
> > Sent: Wednesday, September 6, 2017 8:25 AM
> > To: 王志克 ; ovs-discuss@openvswitch.org; ovs-
> > d...@openvswitch.org; O Mahony, Billy ;
> Kevin
> > Traynor 
> > Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> > Adding Billy and Kevin
> >
> >
> > On 9/6/17, 12:22 AM, "Darrell Ball"  wrote:
> >
> >
> >
> > On 9/6/17, 12:03 AM, "王志克"  wrote:
> >
> > Hi Darrell,
> >
> > pmd-rxq-affinity has below limitation: (so isolated pmd can
> > not be used for others, which is not my expectation. Lots of VMs come
> > and go on the fly, and manual assignment is not feasible.)
> >   >>After that PMD threads on cores where RX queues
> > was pinned will become isolated. This means that this thread will poll
> > only pinned RX queues
> >
> > My problem is that I have several CPUs spreading on different
> > NUMA nodes. I hope all these CPU can have chance to serve the rxq.
> > However, because the phy NIC only locates on one certain socket node,
> > non-same numa pmd/CPU would be excluded. So I am wondering whether
> we
> > can have different behavior for phy port rxq:
> >   round-robin to all PMDs even the pmd on different NUMA socket.
> >
> > I guess this is a common case, and I believe it would improve
> > rx performance.
> >
> >
> > [Darrell] I agree it would be a common problem and some
> > distribution would seem to make sense, maybe factoring in some
> > favoring of local numa PMDs ?
> > Maybe an optional config to enable ?
> >
> >
> > Br,
> > Wang Zhike
> >
> >



Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread O Mahony, Billy
Hi Wang,

If you create several PMDs on the NUMA of the physical port does that have the 
same performance characteristic? 

/Billy



> -Original Message-
> From: 王志克 [mailto:wangzh...@jd.com]
> Sent: Wednesday, September 6, 2017 10:20 AM
> To: O Mahony, Billy ; Darrell Ball
> ; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor 
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Billy,
> 
> Yes, I want to achieve better performance.
> 
> The commit "dpif-netdev: Assign ports to pmds on non-local numa node" can
> NOT meet my needs.
> 
> I do have pmd on socket 0 to poll the physical NIC which is also on socket 0.
> However, this is not enough, since I also have other pmds on socket 1. I hope
> the pmds on socket 1 can also poll the physical NIC. In this way, we have
> more CPU (in my case, double CPU) to poll the NIC, which results in
> performance improvement.
> 
> BR,
> Wang Zhike
> 
> -Original Message-
> From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> Sent: Wednesday, September 06, 2017 5:14 PM
> To: Darrell Ball; 王志克; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Wang,
> 
> A change was committed to head of master 2017-08-02 "dpif-netdev: Assign
> ports to pmds on non-local numa node" which if I understand your request
> correctly will do what you require.
> 
> However it is not clear to me why you are pinning rxqs to PMDs in the first
> instance. Currently if you configure at least one pmd on each numa there
> should always be a PMD available. Is the pinning for performance reasons?
> 
> Regards,
> Billy
> 
> 
> 
> > -Original Message-
> > From: Darrell Ball [mailto:db...@vmware.com]
> > Sent: Wednesday, September 6, 2017 8:25 AM
> > To: 王志克 ; ovs-discuss@openvswitch.org; ovs-
> > d...@openvswitch.org; O Mahony, Billy ;
> Kevin
> > Traynor 
> > Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> > Adding Billy and Kevin
> >
> >
> > On 9/6/17, 12:22 AM, "Darrell Ball"  wrote:
> >
> >
> >
> > On 9/6/17, 12:03 AM, "王志克"  wrote:
> >
> > Hi Darrell,
> >
> > pmd-rxq-affinity has below limitation: (so isolated pmd can
> > not be used for others, which is not my expectation. Lots of VMs come
> > and go on the fly, and manual assignment is not feasible.)
> >   >>After that PMD threads on cores where RX queues
> > was pinned will become isolated. This means that this thread will poll
> > only pinned RX queues
> >
> > My problem is that I have several CPUs spreading on different
> > NUMA nodes. I hope all these CPU can have chance to serve the rxq.
> > However, because the phy NIC only locates on one certain socket node,
> > non-same numa pmd/CPU would be excluded. So I am wondering whether
> we
> > can have different behavior for phy port rxq:
> >   round-robin to all PMDs even the pmd on different NUMA socket.
> >
> > I guess this is a common case, and I believe it would improve
> > rx performance.
> >
> >
> > [Darrell] I agree it would be a common problem and some
> > distribution would seem to make sense, maybe factoring in some
> > favoring of local numa PMDs ?
> > Maybe an optional config to enable ?
> >
> >
> > Br,
> > Wang Zhike
> >
> >



Re: [ovs-discuss] dev Digest, Vol 98, Issue 38

2017-09-06 Thread 王志克
Hi Kevin,

Consider the scenario:

One host with 1 physical NIC, and the NIC locates on NUMA socket0. There are 
lots of VM on this host.

I can see several methods to improve the performance:
1) Try to make sure the VM memory used for networking always stays on socket 0.
E.g., if a VM uses 4G of memory, we can split off 1G for networking and have
that 1G come from socket 0. In this way we can always allocate CPUs from
socket 0 only. I do not know whether this is feasible.
2) If option 1 is not feasible, VM memory will spread across NUMA sockets. Then
packets from the physical NIC (socket 0) may go to a VM on another socket (say
socket 1). Such cross-NUMA communication leads to a performance penalty.

What I am talking about is option 2. Since cross-NUMA communication is
unavoidable, why not add more CPUs?

Br,
Wang Zhike
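The "add more CPU" idea above can be sketched with the standard OVS-DPDK knobs; the core layout here is hypothetical, and exact behavior depends on the OVS version in use:

```shell
# Hypothetical layout: cores 2 and 4 on socket 0, cores 3 and 5 on socket 1.
# Give OVS PMD threads on both sockets so each NUMA node has local pollers:
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x3C

# Inspect which PMD (and which NUMA node) each rx queue was assigned to:
ovs-appctl dpif-netdev/pmd-rxq-show
```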











Message: 5
Date: Wed, 6 Sep 2017 10:23:53 +0100
From: Kevin Traynor 
To: 王志克 , Darrell Ball ,
"ovs-discuss@openvswitch.org" ,
"ovs-...@openvswitch.org" 
Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
physical port
Message-ID: 
Content-Type: text/plain; charset=utf-8

On 09/06/2017 08:03 AM, 王志克 wrote:
> Hi Darrell,
> 
> pmd-rxq-affinity has the limitation below: (so an isolated pmd cannot be used
> for others, which is not my expectation. Lots of VMs come and go on the fly,
> and manual assignment is not feasible.)
>   >>After that PMD threads on cores where RX queues was pinned will
> become isolated. This means that this thread will poll only pinned RX queues
> 
> My problem is that I have several CPUs spread across different NUMA nodes. I
> would like all of these CPUs to have a chance to serve the rxqs. However,
> because the phy NIC resides on only one socket, pmds/CPUs on the other NUMA
> node are excluded. So I am wondering whether we can have different behavior
> for phy port rxqs:
>   round-robin to all PMDs, even pmds on a different NUMA socket.
> 
> I guess this is a common case, and I believe it would improve rx performance.
> 

The issue is that cross-NUMA datapaths incur a large performance penalty
(~2x the cycles). This is the reason rxq assignment uses pmds from the same
NUMA node as the port. Also, any rxqs from other ports that are also
scheduled on the same pmd could suffer as a result of cpu starvation
from that cross-NUMA assignment.

An issue was that in the case of no pmds available on the correct NUMA
node for a port, it meant that rxqs from that port were not polled at
all. Billy's commit addressed that by allowing cross-numa assignment
*only* in the event of no pmds on the same numa node as the port.

If you look through the threads on Billy's patch you'll see more
discussion on it.

Kevin.


> Br,
> Wang Zhike
> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com] 
> Sent: Wednesday, September 06, 2017 1:39 PM
> To: 王志克; ovs-discuss@openvswitch.org; ovs-...@openvswitch.org
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port
> 
> You could use  pmd-rxq-affinity for the queues you want serviced locally and 
> let the others go remote
> 
> On 9/5/17, 8:14 PM, "王志克"  wrote:
> 
> It is a bit different from my expectation.
> 
> 
> 
> I have separate CPUs and pmds for each NUMA node. However, the physical NIC
> resides only on NUMA socket 0, so only some of the CPUs and pmds (the ones on
> the same NUMA node) can poll the physical NIC. Since I have multiple rx
> queues, I would like some queues to be polled by pmds on the same node and
> others by pmds on the non-local NUMA node. In this way, more pmds contribute
> to polling the physical NIC, so a performance improvement is expected in
> terms of total rx traffic.
> 
> 
> 
> Br,
> 
> Wang Zhike
> 
> 
> 
> -Original Message-
> 
> From: Darrell Ball [mailto:db...@vmware.com] 
> 
> Sent: Wednesday, September 06, 2017 10:47 AM
> 
> To: ???; ovs-discuss@openvswitch.org; ovs-...@openvswitch.org
> 
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical 
> port
> 
> 
> 
> This same numa node limitation was already removed, although same numa is 
> preferred for performance reasons.
> 
> 
> 
> commit c37813fdb030b4270d05ad61943754f67021a50d
> 
> Author: Billy O'Mahony 
> 
> Date:   Tue Aug 1 14:38:43 2017 -0700
> 
> 
> 
> dpif-netdev: Assign ports to pmds on non-local numa node.
> 
> 
> 
> Previously if there is no available (non-isolated) pmd on the numa 
> node
> 
> for a port then the port is not polled at all. This can result in a
> 
> non-operational system until such time as nics are physically
> 
> repositioned. It is preferable to operate with a pmd on the 'wrong' 
> numa
> 
> node albeit with lower performance. Local pmds are still chosen when
> available.

Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread Kevin Traynor
On 09/06/2017 08:03 AM, 王志克 wrote:
> Hi Darrell,
> 
> pmd-rxq-affinity has the limitation below: (so an isolated pmd cannot be used
> for others, which is not my expectation. Lots of VMs come and go on the fly,
> and manual assignment is not feasible.)
>   >>After that PMD threads on cores where RX queues was pinned will
> become isolated. This means that this thread will poll only pinned RX queues
> 
> My problem is that I have several CPUs spread across different NUMA nodes. I
> would like all of these CPUs to have a chance to serve the rxqs. However,
> because the phy NIC resides on only one socket, pmds/CPUs on the other NUMA
> node are excluded. So I am wondering whether we can have different behavior
> for phy port rxqs:
>   round-robin to all PMDs, even pmds on a different NUMA socket.
> 
> I guess this is a common case, and I believe it would improve rx performance.
> 

The issue is that cross-NUMA datapaths incur a large performance penalty
(~2x the cycles). This is the reason rxq assignment uses pmds from the same
NUMA node as the port. Also, any rxqs from other ports that are also
scheduled on the same pmd could suffer as a result of cpu starvation
from that cross-NUMA assignment.

An issue was that in the case of no pmds available on the correct NUMA
node for a port, it meant that rxqs from that port were not polled at
all. Billy's commit addressed that by allowing cross-numa assignment
*only* in the event of no pmds on the same numa node as the port.

If you look through the threads on Billy's patch you'll see more
discussion on it.

Kevin.
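For anyone trying to observe the cross-NUMA penalty and the pmd starvation described above, OVS exposes per-PMD counters. A sketch (exact output format varies by version):

```shell
# Show the rx-queue-to-PMD mapping, including each PMD's core and NUMA node:
ovs-appctl dpif-netdev/pmd-rxq-show

# Show per-PMD cycle statistics (polling vs. processing) to spot overloaded
# or starved PMDs:
ovs-appctl dpif-netdev/pmd-stats-show
```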


> Br,
> Wang Zhike
> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com] 
> Sent: Wednesday, September 06, 2017 1:39 PM
> To: 王志克; ovs-discuss@openvswitch.org; ovs-...@openvswitch.org
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port
> 
> You could use  pmd-rxq-affinity for the queues you want serviced locally and 
> let the others go remote
> 
> On 9/5/17, 8:14 PM, "王志克"  wrote:
> 
> It is a bit different from my expectation.
> 
> 
> 
> I have separate CPUs and pmds for each NUMA node. However, the physical NIC
> resides only on NUMA socket 0, so only some of the CPUs and pmds (the ones on
> the same NUMA node) can poll the physical NIC. Since I have multiple rx
> queues, I would like some queues to be polled by pmds on the same node and
> others by pmds on the non-local NUMA node. In this way, more pmds contribute
> to polling the physical NIC, so a performance improvement is expected in
> terms of total rx traffic.
> 
> 
> 
> Br,
> 
> Wang Zhike
> 
> 
> 
> -Original Message-
> 
> From: Darrell Ball [mailto:db...@vmware.com] 
> 
> Sent: Wednesday, September 06, 2017 10:47 AM
> 
> To: 王志克; ovs-discuss@openvswitch.org; ovs-...@openvswitch.org
> 
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical 
> port
> 
> 
> 
> This same numa node limitation was already removed, although same numa is 
> preferred for performance reasons.
> 
> 
> 
> commit c37813fdb030b4270d05ad61943754f67021a50d
> 
> Author: Billy O'Mahony 
> 
> Date:   Tue Aug 1 14:38:43 2017 -0700
> 
> 
> 
> dpif-netdev: Assign ports to pmds on non-local numa node.
> 
> 
> 
> Previously if there is no available (non-isolated) pmd on the numa 
> node
> 
> for a port then the port is not polled at all. This can result in a
> 
> non-operational system until such time as nics are physically
> 
> repositioned. It is preferable to operate with a pmd on the 'wrong' 
> numa
> 
> node albeit with lower performance. Local pmds are still chosen when
> 
> available.
> 
> 
> 
> Signed-off-by: Billy O'Mahony 
> 
> Signed-off-by: Ilya Maximets 
> 
> Co-authored-by: Ilya Maximets 
> 
> 
> 
> 
> 
> The sentence “The rx queues are assigned to pmd threads on the same NUMA 
> node in a round-robin fashion.”
> 
> 
> 
> under
> 
> 
> 
> DPDK Physical Port Rx Queues
> 
> 
> 
> should be removed since it is outdated in a couple of ways and there is 
> other correct documentation on the same page
> 
> and also here
> http://docs.openvswitch.org/en/latest/howto/dpdk/
> 
> 
> 
> Maybe you could submit a patch ?
> 
> 
> 
> Thanks Darrell
> 
> 
> 
> 
> 
> On 9/5/17, 7:18 PM, "ovs-dev-boun...@openvswitch.org on behalf of 王志克" 
>  wrote:
> 
> 
> 
> Hi All,
> 
> 
> 
> 
> 
> 

Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread 王志克
Hi Billy,

Yes, I want to achieve better performance.

The commit "dpif-netdev: Assign ports to pmds on non-local numa node" can NOT 
meet my needs.

I do have a pmd on socket 0 to poll the physical NIC, which is also on socket 
0. However, this is not enough, since I also have pmds on socket 1. I would 
like those pmds on socket 1 to poll the physical NIC as well. In this way, we 
have more CPUs (in my case, twice as many) to poll the NIC, which results in a 
performance improvement.

BR,
Wang Zhike
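Darrell's earlier suggestion of pinning some queues locally and letting others go remote could look roughly like this; the port name, queue count, and core IDs are hypothetical:

```shell
# Hypothetical: dpdk0 sits on socket 0; cores 2 and 4 are on socket 0,
# cores 3 and 5 on socket 1. Create 4 rx queues and split them across sockets:
ovs-vsctl set Interface dpdk0 options:n_rxq=4
ovs-vsctl set Interface dpdk0 \
    other_config:pmd-rxq-affinity="0:2,1:4,2:3,3:5"
# Caveat (the limitation discussed in this thread): the PMDs on cores
# 2, 3, 4 and 5 are now isolated and poll only their pinned queues.
```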

-Original Message-
From: O Mahony, Billy [mailto:billy.o.mah...@intel.com] 
Sent: Wednesday, September 06, 2017 5:14 PM
To: Darrell Ball; 王志克; ovs-discuss@openvswitch.org; ovs-...@openvswitch.org; 
Kevin Traynor
Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

Hi Wang,

A change was committed to head of master 2017-08-02 "dpif-netdev: Assign ports 
to pmds on non-local numa node" which if I understand your request correctly 
will do what you require.

However it is not clear to me why you are pinning rxqs to PMDs in the first 
instance. Currently if you configure at least one pmd on each numa node there 
should always be a PMD available. Is the pinning for performance reasons?

Regards,
Billy



> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Wednesday, September 6, 2017 8:25 AM
> To: 王志克 ; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; O Mahony, Billy ; Kevin
> Traynor 
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Adding Billy and Kevin
> 
> 
> On 9/6/17, 12:22 AM, "Darrell Ball"  wrote:
> 
> 
> 
> On 9/6/17, 12:03 AM, "王志克"  wrote:
> 
> Hi Darrell,
> 
> pmd-rxq-affinity has the limitation below: (so an isolated pmd cannot be
> used
> for others, which is not my expectation. Lots of VMs come and go on the fly,
> and manual assignment is not feasible.)
>   >>After that PMD threads on cores where RX queues was pinned
> will become isolated. This means that this thread will poll only pinned RX
> queues
> 
> My problem is that I have several CPUs spread across different NUMA
> nodes. I would like all of these CPUs to have a chance to serve the rxqs.
> However, because the phy NIC resides on only one socket, pmds/CPUs on the
> other NUMA node are excluded. So I am wondering whether we can
> have different behavior for phy port rxqs:
>   round-robin to all PMDs, even pmds on a different NUMA socket.
> 
> I guess this is a common case, and I believe it would improve rx
> performance.
> 
> 
> [Darrell] I agree it would be a common problem and some distribution
> would seem to make sense, maybe factoring in some favoring of local numa
> PMDs ?
> Maybe an optional config to enable ?
> 
> 
> Br,
> Wang Zhike
> 
> 



Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread O Mahony, Billy
Hi Wang,

A change was committed to head of master 2017-08-02 "dpif-netdev: Assign ports 
to pmds on non-local numa node" which if I understand your request correctly 
will do what you require.

However it is not clear to me why you are pinning rxqs to PMDs in the first 
instance. Currently if you configure at least one pmd on each numa node there 
should always be a PMD available. Is the pinning for performance reasons?

Regards,
Billy



> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Wednesday, September 6, 2017 8:25 AM
> To: 王志克 ; ovs-discuss@openvswitch.org; ovs-
> d...@openvswitch.org; O Mahony, Billy ; Kevin
> Traynor 
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Adding Billy and Kevin
> 
> 
> On 9/6/17, 12:22 AM, "Darrell Ball"  wrote:
> 
> 
> 
> On 9/6/17, 12:03 AM, "王志克"  wrote:
> 
> Hi Darrell,
> 
> pmd-rxq-affinity has the limitation below: (so an isolated pmd cannot be
> used
> for others, which is not my expectation. Lots of VMs come and go on the fly,
> and manual assignment is not feasible.)
>   >>After that PMD threads on cores where RX queues was pinned
> will become isolated. This means that this thread will poll only pinned RX
> queues
> 
> My problem is that I have several CPUs spread across different NUMA
> nodes. I would like all of these CPUs to have a chance to serve the rxqs.
> However, because the phy NIC resides on only one socket, pmds/CPUs on the
> other NUMA node are excluded. So I am wondering whether we can
> have different behavior for phy port rxqs:
>   round-robin to all PMDs, even pmds on a different NUMA socket.
> 
> I guess this is a common case, and I believe it would improve rx
> performance.
> 
> 
> [Darrell] I agree it would be a common problem and some distribution
> would seem to make sense, maybe factoring in some favoring of local numa
> PMDs ?
> Maybe an optional config to enable ?
> 
> 
> Br,
> Wang Zhike
> 
> 



Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread Darrell Ball
Adding Billy and Kevin


On 9/6/17, 12:22 AM, "Darrell Ball"  wrote:



On 9/6/17, 12:03 AM, "王志克"  wrote:

Hi Darrell,

pmd-rxq-affinity has the limitation below: (so an isolated pmd cannot be used 
for others, which is not my expectation. Lots of VMs come and go on the fly, 
and manual assignment is not feasible.)
  >>After that PMD threads on cores where RX queues was pinned 
will become isolated. This means that this thread will poll only pinned RX 
queues

My problem is that I have several CPUs spread across different NUMA 
nodes. I would like all of these CPUs to have a chance to serve the rxqs. 
However, because the phy NIC resides on only one socket, pmds/CPUs on the 
other NUMA node are excluded. So I am wondering whether we can have different 
behavior for phy port rxqs: 
  round-robin to all PMDs, even pmds on a different NUMA socket.

I guess this is a common case, and I believe it would improve rx 
performance.


[Darrell] I agree it would be a common problem and some distribution would 
seem to make sense, maybe factoring in some favoring of local numa PMDs ?
Maybe an optional config to enable ?
  

Br,
Wang Zhike





Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread Darrell Ball


On 9/6/17, 12:03 AM, "王志克"  wrote:

Hi Darrell,

pmd-rxq-affinity has the limitation below: (so an isolated pmd cannot be used 
for others, which is not my expectation. Lots of VMs come and go on the fly, 
and manual assignment is not feasible.)
  >>After that PMD threads on cores where RX queues was pinned will 
become isolated. This means that this thread will poll only pinned RX queues

My problem is that I have several CPUs spread across different NUMA nodes. I 
would like all of these CPUs to have a chance to serve the rxqs. However, 
because the phy NIC resides on only one socket, pmds/CPUs on the other NUMA 
node are excluded. So I am wondering whether we can have different behavior 
for phy port rxqs: 
  round-robin to all PMDs, even pmds on a different NUMA socket.

I guess this is a common case, and I believe it would improve rx 
performance.


[Darrell] I agree it would be a common problem and some distribution would seem 
to make sense, maybe factoring in some favoring of local numa PMDs ?
Maybe an optional config to enable ?
  

Br,
Wang Zhike



Re: [ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread 王志克
Hi Darrell,

pmd-rxq-affinity has the limitation below: (so an isolated pmd cannot be used 
for others, which is not my expectation. Lots of VMs come and go on the fly, 
and manual assignment is not feasible.)
  >>After that PMD threads on cores where RX queues was pinned will 
become isolated. This means that this thread will poll only pinned RX queues

My problem is that I have several CPUs spread across different NUMA nodes. I 
would like all of these CPUs to have a chance to serve the rxqs. However, 
because the phy NIC resides on only one socket, pmds/CPUs on the other NUMA 
node are excluded. So I am wondering whether we can have different behavior 
for phy port rxqs: 
  round-robin to all PMDs, even pmds on a different NUMA socket.

I guess this is a common case, and I believe it would improve rx performance.

Br,
Wang Zhike
-Original Message-
From: Darrell Ball [mailto:db...@vmware.com] 
Sent: Wednesday, September 06, 2017 1:39 PM
To: 王志克; ovs-discuss@openvswitch.org; ovs-...@openvswitch.org
Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

You could use  pmd-rxq-affinity for the queues you want serviced locally and 
let the others go remote

On 9/5/17, 8:14 PM, "王志克"  wrote:

It is a bit different from my expectation.



I have separate CPUs and pmds for each NUMA node. However, the physical NIC 
resides only on NUMA socket 0, so only some of the CPUs and pmds (the ones on 
the same NUMA node) can poll the physical NIC. Since I have multiple rx 
queues, I would like some queues to be polled by pmds on the same node and 
others by pmds on the non-local NUMA node. In this way, more pmds contribute 
to polling the physical NIC, so a performance improvement is expected in terms 
of total rx traffic.



Br,

Wang Zhike



-Original Message-

From: Darrell Ball [mailto:db...@vmware.com] 

Sent: Wednesday, September 06, 2017 10:47 AM

To: 王志克; ovs-discuss@openvswitch.org; ovs-...@openvswitch.org

Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical 
port



This same numa node limitation was already removed, although same numa is 
preferred for performance reasons.



commit c37813fdb030b4270d05ad61943754f67021a50d

Author: Billy O'Mahony 

Date:   Tue Aug 1 14:38:43 2017 -0700



dpif-netdev: Assign ports to pmds on non-local numa node.



Previously if there is no available (non-isolated) pmd on the numa node

for a port then the port is not polled at all. This can result in a

non-operational system until such time as nics are physically

repositioned. It is preferable to operate with a pmd on the 'wrong' numa

node albeit with lower performance. Local pmds are still chosen when

available.



Signed-off-by: Billy O'Mahony 

Signed-off-by: Ilya Maximets 

Co-authored-by: Ilya Maximets 





The sentence “The rx queues are assigned to pmd threads on the same NUMA 
node in a round-robin fashion.”



under



DPDK Physical Port Rx Queues



should be removed since it is outdated in a couple of ways and there is 
other correct documentation on the same page

and also here 
http://docs.openvswitch.org/en/latest/howto/dpdk/



Maybe you could submit a patch ?



Thanks Darrell





On 9/5/17, 7:18 PM, "ovs-dev-boun...@openvswitch.org on behalf of 王志克" 
 wrote:



Hi All,







I read the doc below about pmd assignment for physical ports. I think the 
limitation “on the same NUMA node” may not be efficient.








http://docs.openvswitch.org/en/latest/intro/install/dpdk/



DPDK Physical Port Rx Queues







$ ovs-vsctl set Interface <DPDK interface> options:n_rxq=<integer>







The above command sets the number of rx queues for the DPDK physical port.
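As a concrete illustration (the interface name and queue count here are hypothetical):

```shell
# Configure 4 rx queues on DPDK physical port dpdk0; the queues are then
# distributed among the available PMD threads:
ovs-vsctl set Interface dpdk0 options:n_rxq=4
```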