-----Original Message-----
From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
Sent: Wednesday, September 06, 2017 10:49 PM
To: Kevin Traynor; Jan Scheurich; 王志克; Darrell Ball; ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port
> -----Original Message-----
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Wednesday, September 6, 2017 2:50 PM
> To: Jan Scheurich <jan.scheur...@ericsson.com>; O Mahony, Billy <billy.o.mah...@intel.com>; wangzh...@jd.com; Darrell Ball <db...@vmware.com>; ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port
>
> On 09/06/2017 02:33 PM, Jan Scheurich wrote:
> > Hi Billy,
> >
> >> You are going to have to take the hit crossing the NUMA boundary at some point if your NIC and VM are on different NUMA nodes.
> >>
> >> So are you saying that it is more expensive to cross the NUMA boundary from the PMD to the VM than to cross it from the NIC to the PMD?
> >
> > Indeed, that is the case: if the NIC crosses the QPI bus when storing packets in the remote NUMA node, there is no cost involved for the PMD. (The QPI bandwidth is typically not a bottleneck.) The PMD only performs local memory accesses.
> >
> > On the other hand, if the PMD crosses the QPI when copying packets into a remote VM, there is a huge latency penalty involved, consuming lots of PMD cycles that cannot be spent on processing packets. We at Ericsson have observed exactly this behavior.
> >
> > This latency penalty becomes even worse when the LLC hit rate is degraded due to LLC contention with real VNFs and/or unfavorable packet buffer re-use patterns exhibited by real VNFs, compared to typical synthetic benchmark apps like DPDK testpmd.
> >
> >> If so, then in that case you'd like to have two (for example) PMDs polling 2 queues on the same NIC, with the PMDs on each of the NUMA nodes forwarding to the VMs local to that NUMA node?
> >>
> >> Of course your NIC would then also need to know which VM (or at least which NUMA node the VM is on) in order to send the frame to the correct rxq.
> >
> > That would indeed be optimal but hard to realize in the general case (e.g. with VXLAN encapsulation), as the actual destination is only known after tunnel pop. Here perhaps some probabilistic steering of RSS hash values based on the measured distribution of final destinations might help in the future.
> >
> > But even without that in place, we need PMDs on both NUMA nodes anyhow (for NUMA-aware polling of vhostuser ports), so why not use them to also poll remote eth ports? We can achieve better average performance with fewer PMDs than with the current limitation to NUMA-local polling.
> >
>
> If the user has some knowledge of the NUMA locality of ports and can place VMs accordingly, default cross-NUMA assignment can harm performance. Also, it would make for very unpredictable performance from test to test, and even from flow to flow on a datapath.

[[BO'M]] Wang's original request would constitute default cross-NUMA assignment, but I don't think this modified proposal would, as it still requires explicit config to assign an rxq to the remote NUMA node.

[Wangzhike] A configuration option or compile-time option is OK to me, since only the physical NIC rxqs need to be configured. It is a one-shot job. Regarding the testing concern, I think it is worth quantifying the performance difference if the new behavior improves rx throughput significantly.

> Kevin.
>
> > BR, Jan
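
For reference, the explicit (non-default) configuration discussed above can be sketched with existing OVS options (pmd-cpu-mask, n_rxq, pmd-rxq-affinity). The port name, core IDs and NUMA layout below are hypothetical examples, not taken from the thread, and whether a pinned rxq may actually be served by a PMD on the remote NUMA node is exactly the behavior under discussion here, so treat this as a sketch of the intended setup rather than something guaranteed to work on every OVS version:

    # Hypothetical layout: core 1 is on NUMA 0, core 17 is on NUMA 1,
    # and physical port dpdk0 is attached to NUMA 0.

    # Run one PMD thread on each NUMA node (bits 1 and 17 set).
    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x20002

    # Give the physical port two rx queues.
    ovs-vsctl set Interface dpdk0 options:n_rxq=2

    # Pin rxq 0 to the NUMA-local PMD on core 1 and rxq 1 to the
    # remote PMD on core 17, so each PMD forwards to VMs local to
    # its own NUMA node.
    ovs-vsctl set Interface dpdk0 other_config:pmd-rxq-affinity="0:1,1:17"

This is the one-shot, per-phy-port configuration referred to above; vhostuser ports would still rely on the normal NUMA-aware automatic assignment.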