Hi Jan, Kevin,

> -----Original Message-----
> From: Jan Scheurich [mailto:jan.scheur...@ericsson.com]
> Sent: Thursday, September 21, 2017 4:12 PM
> To: Kevin Traynor <ktray...@redhat.com>; O Mahony, Billy
> <billy.o.mah...@intel.com>; d...@openvswitch.org
> Cc: Mechthild Buescher <mechthild.buesc...@ericsson.com>; Venkatesan
> Pradeep <venkatesan.prad...@ericsson.com>
> Subject: RE: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
>
> Hi all,
>
> We seriously want to pursue this kind of ingress traffic prioritization from
> physical ports in OVS-DPDK for the use case I mentioned earlier:
> prioritization of in-band control plane traffic running on the same physical
> network as the tenant data traffic.
>
> We first focused on testing the effectiveness of the SW queue prioritization
> in Billy's patch. To this end we added two DPDK ports to a PMD: dpdk0 with
> normal priority and dpdk1 with hard-coded high priority (i.e. not using the
> config interface in the patch). We cross-connected dpdk0 to a vhostuser port
> in a VM and dpdk1 to the LOCAL port on the host.
>
> We overloaded the PMD with 64-byte packets on dpdk0 (~25% rx packet drop on
> dpdk0) and in parallel sent iperf3 UDP traffic (256-byte datagrams) in on
> dpdk1, destined to an iperf3 server running on the host.
>
> With the dpdk1 queue prioritized, we achieve ~1 Gbit/s (460 Kpps) iperf3
> throughput with zero packet drop, whether or not the parallel overload
> traffic on dpdk0 is running. (The throughput is limited by the UDP/IP stack
> on the client side.) In the same test with a non-prioritized dpdk1 queue,
> iperf3 reports about 28% packet drop, the same as experienced by the dpdk0
> traffic.
>
> With that we can conclude that the PMD priority queue polling scheme
> implemented in Billy's patch effectively solves our problem. We haven't
> tested whether the inner priority polling loop has any performance impact on
> normal PMD processing. Not likely, though.
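For reference, one plausible shape of the priority queue polling scheme described above is: the PMD services the high-priority rxq on every iteration, while the remaining rxqs take turns. The sketch below is an illustrative Python model, not the patch's actual C code, and the queue names are made up:

```python
def poll_order(rxqs, priority_rxq, iterations):
    """Model of a priority-aware PMD poll loop: return the sequence of
    rxqs polled.  The priority rxq is serviced on every iteration; the
    non-priority rxqs are serviced round-robin, one per iteration."""
    order = []
    others = [q for q in rxqs if q != priority_rxq]
    for i in range(iterations):
        order.append(priority_rxq)                 # priority rxq first, always
        if others:
            order.append(others[i % len(others)])  # then one normal rxq in turn
    return order
```

Under this model a priority rxq is never more than one rxq-poll away from being serviced, which matches the observed result that prioritized traffic sees near-zero drops even when the PMD is overloaded.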
[[BO'M]] That's great to know!

> The next question is how to classify the ingress traffic on the NIC and
> insert it into rx queues with different priority. Any scheme implemented
> should preferably work with as many NICs as possible. Use of the new
> rte_flow API in DPDK seems the right direction to go here.

[[BO'M]] This may be getting ahead of where we are, but is it important to
know if a NIC does not support a prioritization scheme? Someone, Darrell I
believe, mentioned a capability discovery mechanism at one point. I was
thinking it was not necessary, as functionally nothing changes whether or not
prioritization is supported. But maybe in terms of an orchestrator it does
make sense, as it may want to make other arrangements to protect control
traffic in the absence of a working prioritization mechanism.

> We are very interested in starting the dialogue on how to configure the
> {queue, priority, filter} mapping in OVS and which filters are most
> meaningful to start with and supported by most NICs. Candidates could
> include VLAN tags and p-bits, Ethertype and IP DSCP.
>
> One thing that we consider important and that we would not want to lose with
> prioritization is the possibility to share load over a number of PMDs with
> RSS. So preferably the prioritization and the RSS spread over a number of rx
> queues should be orthogonal.

[[BO'M]] We have a proposed solution for this now, which is simply to change
the RETA table to prevent RSS'd packets 'polluting' the priority queue. It
hasn't been implemented but it should work. That's in the context of
DPDK/FlowDirector/XL710, but the rte_flow API should allow this too.

> BR, Jan
>
> Note: There seems to be a significant overlap with the discussion around
> classification HW offload for datapath flow entries currently going on, with
> the exception that the QoS filters here are static and not in any way tied
> to dynamic megaflows.
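The RETA-based proposal above (keep prioritization and RSS orthogonal by spreading the RSS hash buckets over every rx queue except the priority one) can be modelled in outline as below. This is an illustrative Python sketch; on real hardware the table would be programmed through DPDK (e.g. rte_eth_dev_rss_reta_update()), and the queue counts and table size here are hypothetical:

```python
def reta_excluding_priority(reta_size, n_rxqs, priority_rxq):
    """Build an RSS redirection table (RETA) that maps every hash bucket
    round-robin onto the non-priority rx queues only, so RSS'd traffic
    can never land on (and 'pollute') the priority queue."""
    spread = [q for q in range(n_rxqs) if q != priority_rxq]
    if not spread:
        raise ValueError("need at least one non-priority rx queue")
    return [spread[i % len(spread)] for i in range(reta_size)]
```

With the priority queue excluded from the RETA, only explicitly filtered (e.g. rte_flow-steered) traffic reaches it, while all other traffic still gets the full RSS spread over the remaining queues.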
> >
> > -----Original Message-----
> > From: Kevin Traynor [mailto:ktray...@redhat.com]
> > Sent: Friday, 18 August, 2017 20:40
> > To: Jan Scheurich <jan.scheur...@ericsson.com>; O Mahony, Billy
> > <billy.o.mah...@intel.com>; d...@openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
> >
> > On 08/17/2017 05:21 PM, Jan Scheurich wrote:
> > > Good discussion. Some thoughts:
> > >
> > > 1. Prioritizing queues by assigning them to dedicated PMDs is a simple
> > > and effective but very crude method, considering that you have to
> > > reserve an entire (logical) core for that. So I am all for a more
> > > economic and perhaps slightly less deterministic option!
> >
> > Sure - if you have the ability to effectively prioritize traffic on all
> > ports then I agree. At present you would only be able to prioritize
> > traffic from a 2-rxq i40e, which would mean any other high priority
> > traffic may get penalized if it lands on the same pmd. I'm not sure that
> > limited a use case would really be useful.
> >
> > Kevin.
> >
> > > 2. Offering the option to prioritize certain queues in OVS-DPDK is a
> > > highly desirable feature. We have at least one important use case in
> > > OpenStack (prioritizing "in-band" infrastructure control plane traffic
> > > over tenant data, in case both are carried on the same physical
> > > network). In our case the traffic separation would be done per VLAN.
> > > Can we add this to the list of supported filters?
> > >
> > > 3. It would be nice to be able to combine priority queues with filters
> > > with a number of RSS queues without filter. Is this an XL710 HW
> > > limitation or only a limitation of the drivers and DPDK APIs?
> > >
> > > BR, Jan
> > >
> > >
> > >> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > >> boun...@openvswitch.org] On Behalf Of O Mahony, Billy
> > >> Sent: Thursday, 17 August, 2017 18:07
> > >> To: Kevin Traynor <ktray...@redhat.com>; d...@openvswitch.org
> > >> Subject: Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive
> > >> traffic
> > >>
> > >> Hi Kevin,
> > >>
> > >> Thanks for the comments - more inline.
> > >>
> > >> Billy.
> > >>
> > >>> -----Original Message-----
> > >>> From: Kevin Traynor [mailto:ktray...@redhat.com]
> > >>> Sent: Thursday, August 17, 2017 3:37 PM
> > >>> To: O Mahony, Billy <billy.o.mah...@intel.com>; d...@openvswitch.org
> > >>> Subject: Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive
> > >>> traffic
> > >>>
> > >>> Hi Billy,
> > >>>
> > >>> I just happened to be about to send a reply to the previous patchset,
> > >>> so adding comments here instead.
> > >>>
> > >>> On 08/17/2017 03:24 PM, Billy O'Mahony wrote:
> > >>>> Hi All,
> > >>>>
> > >>>> v2: Addresses various review comments; applies cleanly on 0bedb3d6.
> > >>>>
> > >>>> This patch set provides a method to request ingress scheduling on
> > >>>> interfaces. It also provides an implementation of same for DPDK
> > >>>> physical ports.
> > >>>>
> > >>>> This allows specific packet types to be:
> > >>>> * forwarded to their destination port ahead of other packets, and/or
> > >>>> * less likely to be dropped in an overloaded situation.
> > >>>>
> > >>>> It was previously discussed
> > >>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2017-May/044395.html
> > >>>> and RFC'd
> > >>>> https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335237.html
> > >>>>
> > >>>> Limitations of this patch:
> > >>>> * The patch uses the Flow Director filter API in DPDK and has only
> > >>>> been tested on the Fortville (XL710) NIC.
> > >>>> * Prioritization is limited to:
> > >>>>   ** eth_type
> > >>>>   ** fully specified 5-tuple (src & dst IP and port numbers) for UDP
> > >>>>      & TCP packets
> > >>>> * ovs-appctl dpif-netdev/pmd-*-show o/p should indicate rxq
> > >>>> prioritization.
> > >>>> * any requirements for a more granular prioritization mechanism
> > >>>
> > >>> In general I like the idea of splitting priority traffic to a
> > >>> specific queue but I have concerns about the implementation. I shared
> > >>> most of these when we met already but am adding them here too. Not a
> > >>> detailed review.
> > >> [[BO'M]] No worries. If we get the high-level sorted out first the
> > >> details will fall into place :)
> > >>>
> > >>> - It is using the deprecated DPDK filter API.
> > >>> http://dpdk.org/doc/guides/rel_notes/deprecation.html
> > >> [[BO'M]] Yes, it looks like a move to the shiny new Flow API is in
> > >> order.
> > >>>
> > >>> - It is an invasive change that seems to be for only one Intel NIC in
> > >>> the DPDK datapath. Even then it is very limited, as it only works
> > >>> when that Intel NIC is using exactly 2 rx queues.
> > >> [[BO'M]] That's the current case but it is really a limitation of the
> > >> Flow Director API/DPDK/XL710 combination. Maybe the Flow API will
> > >> allow RSS over many queues and place the prioritized traffic on
> > >> another queue.
> > >>>
> > >>> - It's a hardcoded opaque QoS which will have a negative impact on
> > >>> whichever queues happen to land on the same pmd, so it's
> > >>> unpredictable which queues will be affected. It could affect other
> > >>> latency sensitive traffic that cannot be prioritized because of the
> > >>> limitations above.
> > >>>
> > >>> - I guess multiple priority queues could land on the same pmd and
> > >>> starve each other?
> > >> [[BO'M]] Interaction with pmd assignment is definitely an issue that
> > >> needs to be addressed. I know there is work in flight in that regard,
> > >> so it will be easier to address that when the in-flight work lands.
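The "fully specified 5-tuple" limitation listed above means a filter only matches when all five fields are given exactly; there is no wildcarding of individual fields. A minimal model of that steering semantic is sketched below (Python for illustration only; the real matching happens in NIC hardware via Flow Director, and the field names and queue numbers here are made up):

```python
from collections import namedtuple

# All five fields must be present and equal for a match - no wildcards.
FiveTuple = namedtuple("FiveTuple", "proto src_ip dst_ip src_port dst_port")

def steer(pkt, filters, priority_rxq, default_rxq):
    """Return the rx queue a packet lands on: the priority queue on an
    exact 5-tuple match, otherwise the default (RSS) queue."""
    return priority_rxq if pkt in filters else default_rxq
```

Note that under this semantic a flow differing in even one field (e.g. an ephemeral source port) falls through to the default queue, which is why VLAN-, Ethertype- or DSCP-based filters are attractive for control-plane traffic.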
> > >>>
> > >>> I think a more general, less restricted scheme using the DPDK
> > >>> rte_flow API with controls on the effects to other traffic is needed.
> > >>> Perhaps if a user is very concerned with latency on traffic from a
> > >>> port, they would be ok with dedicating a pmd to it.
> > >> [[BO'M]] You are proposing to prioritize queues by allocating a single
> > >> pmd to them rather than by changing the pmd's read algorithm to favor
> > >> prioritized queues? For sure that could be another implementation of
> > >> the solution.
> > >>
> > >> If we look at the patch set as containing two distinct things, as per
> > >> the cover letter, "the patch set provides a method to request ingress
> > >> scheduling on interfaces. It also provides an implementation of same
> > >> for DPDK physical ports", then this would change the second part but
> > >> the first would still be valid. Each port type would in any case have
> > >> to come up with its own implementation - it's just that for
> > >> non-physical ports that cannot offload the prioritization decision it
> > >> is not worth the effort, as was noted in an earlier RFC.
> > >>
> > >>>
> > >>> thanks,
> > >>> Kevin.
> > >>>
> > >>>> Initial results:
> > >>>> * Even when userspace OVS is very much overloaded and dropping
> > >>>> significant numbers of packets, the drop rate for prioritized
> > >>>> traffic is running at 1/1000th of the drop rate for non-prioritized
> > >>>> traffic.
> > >>>>
> > >>>> * The latency profile of prioritized traffic through userspace OVS
> > >>>> is also much improved:
> > >>>>
> > >>>>  1e0 |*
> > >>>>      |*
> > >>>> 1e-1 |*    | Non-prioritized pkt latency
> > >>>>      |*    * Prioritized pkt latency
> > >>>> 1e-2 |*
> > >>>>      |*
> > >>>> 1e-3 |*   |
> > >>>>      |*   |
> > >>>> 1e-4 |*   |    |     |
> > >>>>      |*   |*   |     |
> > >>>> 1e-5 |*   |*   |     |     |
> > >>>>      |*   |*   |*    |     |
> > >>>> 1e-6 |*   |*   |*    |*    |     |
> > >>>>      |*   |*   |*    |*    |*    |
> > >>>> 1e-7 |*   |*   |*    |*    |*    |*
> > >>>>      |*   |*   |*    |*    |*    |*
> > >>>> 1e-8 |*   |*   |*    |*    |*    |*
> > >>>>      0-1  1-20 20-40 40-50 50-60 60-70 ... 120-400
> > >>>>                    Latency (us)
> > >>>>
> > >>>> Proportion of packets per latency bin @ 80% Max Throughput
> > >>>> (Log scale)
> > >>>>
> > >>>> Regards,
> > >>>> Billy.
> > >>>>
> > >>>> billy O'Mahony (4):
> > >>>>   netdev: Add set_ingress_sched to netdev api
> > >>>>   netdev-dpdk: Apply ingress_sched config to dpdk phy ports
> > >>>>   dpif-netdev: Add rxq prioritization
> > >>>>   docs: Document ingress scheduling feature
> > >>>>
> > >>>>  Documentation/howto/dpdk.rst    |  31 +++++++
> > >>>>  include/openvswitch/ofp-parse.h |   3 +
> > >>>>  lib/dpif-netdev.c               |  25 ++++--
> > >>>>  lib/netdev-bsd.c                |   1 +
> > >>>>  lib/netdev-dpdk.c               | 192 +++++++++++++++++++++++++++++++++++++++-
> > >>>>  lib/netdev-dummy.c              |   1 +
> > >>>>  lib/netdev-linux.c              |   1 +
> > >>>>  lib/netdev-provider.h           |  10 +++
> > >>>>  lib/netdev-vport.c              |   1 +
> > >>>>  lib/netdev.c                    |  22 +++++
> > >>>>  lib/netdev.h                    |   1 +
> > >>>>  vswitchd/bridge.c               |   4 +
> > >>>>  vswitchd/vswitch.xml            |  31 +++++++
> > >>>>  13 files changed, 315 insertions(+), 8 deletions(-)
> > >>
> > >> _______________________________________________
> > >> dev mailing list
> > >> d...@openvswitch.org
> > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev