Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
Hi All,

As a way of indicating whether an ingress scheduling configuration was
applied successfully, and of advertising an Interface's ingress scheduling
capability, I suggest that both be written back to the Interface table's
other_config column.

The schema change (relative to the current patch-set) would look like this:

 The format of the ingress_sched field is specified in ovs-fields(7) in
 the ``Matching'' and ``FIELD REFERENCE'' sections.
+
+
+A comma separated list of ovs-fields(7) that the interface supports for
+ingress scheduling. If ingress scheduling is not supported this column
+is cleared.
+
+
+
+
+If the specified ingress scheduling could not be applied, Open vSwitch
+sets this column to an error description in human readable form.
+Otherwise, Open vSwitch clears this column.
+
+

It would be nice to have input on the feasibility of writing back to the
Interface table. There are already a few columns in the Interface table
that are written to, e.g. the stats column and the ofport column. But this
would make the other_config column both read and write, which hopefully
doesn't confuse the mechanism that notifies Interface table changes from
ovsdb into vswitchd.

Regards,
Billy.

> -----Original Message-----
> From: Jan Scheurich [mailto:jan.scheur...@ericsson.com]
> Sent: Friday, September 22, 2017 12:37 PM
> To: O Mahony, Billy <billy.o.mah...@intel.com>; Kevin Traynor
> <ktray...@redhat.com>; d...@openvswitch.org
> Cc: Mechthild Buescher <mechthild.buesc...@ericsson.com>; Venkatesan
> Pradeep <venkatesan.prad...@ericsson.com>
> Subject: RE: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
>
> Hi Billy,
>
> > -----Original Message-----
> > From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> > Sent: Friday, 22 September, 2017 10:52
> >
> > > The next question is how to classify the ingress traffic on the NIC
> > > and insert it into rx queues with different priority. Any scheme
> > > implemented should preferably work with as many NICs as possible.
> > > Use of the new rte_flow API in DPDK seems the right direction to go
> > > here.
> >
> > [[BO'M]] This may be getting ahead of where we are but is it important
> > to know if a NIC does not support a prioritization scheme?
> > Someone, Darrell I believe, mentioned a capability discovery mechanism
> > at one point. I was thinking it was not necessary as functionally
> > nothing changes if prioritization is or is not supported. But maybe in
> > terms of an orchestrator it does make sense - as it may want to make
> > other arrangements to protect control traffic in the absence of a
> > working prioritization mechanism.
>
> [Jan] In our use case the configuration of filters for prioritization
> would happen "manually" at OVS deployment time with full knowledge of the
> NIC type and capabilities. A run-time capability discovery mechanism is
> not really needed for that. But it would anyway be good to get feedback
> if the configured filter is supported by the present NIC or if the
> prioritization will not work.
>
> > > We are very interested in starting the dialogue how to configure the
> > > {queue, priority, filter} mapping in OVS and which filters are most
> > > meaningful to start with and supported by most NICs. Candidates
> > > could include VLAN tags and p-bits, Ethertype and IP DSCP.
>
> Any feedback as to the viability of filtering on those fields with i40e
> and ixgbe?
>
> > > One thing that we consider important and that we would not want to
> > > lose with prioritization is the possibility to share load over a
> > > number of PMDs with RSS. So preferably the prioritization and RSS
> > > spread over a number of rx queues were orthogonal.
> >
> > [[BO'M]] We have a proposed solution for this now. Which is simply to
> > change the RETA table to avoid RSS'd packets 'polluting' the priority
> > queue. It hasn't been implemented but it should work. That's in the
> > context of DPDK/FlowDirector/XL710 but rte_flow api should allow this
> > too.
>
> [Jan] Does this mean there is work needed to enhance the NIC firmware,
> the i40e DPDK PMD, or the rte_flow API (or any combination of those)?
> What about the ixgbe PMD in this context? Will the Niantic support
> similar classification?
>
> Do you have a pointer to Fortville documentation that would help us to
> understand how i40e implements the rte_flow API?
>
> Thanks, Jan
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
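
For illustration only, the write-back idea in the message above could be
modelled roughly as follows. This is a Python sketch, not OVS code: the
Interface row's other_config column is modelled as a plain dict, and the
key names ingress_sched_capabilities and ingress_sched_error, the helper
apply_ingress_sched, and the "field=value" request syntax are all
hypothetical placeholders rather than names taken from the patch.

```python
def apply_ingress_sched(other_config, supported_fields):
    """Return a copy of other_config with capability/status keys written back.

    other_config: dict modelling the Interface row's other_config column.
    supported_fields: set of match field names the port can filter on
    (empty if ingress scheduling is not supported at all).
    """
    result = dict(other_config)
    cap_key = "ingress_sched_capabilities"   # hypothetical key name
    err_key = "ingress_sched_error"          # hypothetical key name

    # Advertise the capability, or clear the key when unsupported.
    if supported_fields:
        result[cap_key] = ",".join(sorted(supported_fields))
    else:
        result.pop(cap_key, None)

    # Report a failure in human readable form; clear the key otherwise.
    result.pop(err_key, None)
    requested = result.get("ingress_sched")
    if requested:
        # Assumed request syntax: comma separated field=value pairs.
        fields = {item.split("=", 1)[0].strip() for item in requested.split(",")}
        unsupported = sorted(fields - set(supported_fields))
        if unsupported:
            result[err_key] = "unsupported field(s): " + ", ".join(unsupported)
    return result
```

An orchestrator would then only need to read the row back: a missing
error key means the request was applied, and a missing capability key
means the port cannot prioritize at all.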
Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
Hi Jan,

> -----Original Message-----
> From: Jan Scheurich [mailto:jan.scheur...@ericsson.com]
> Sent: Friday, September 22, 2017 12:37 PM
> To: O Mahony, Billy <billy.o.mah...@intel.com>; Kevin Traynor
> <ktray...@redhat.com>; d...@openvswitch.org
> Cc: Mechthild Buescher <mechthild.buesc...@ericsson.com>; Venkatesan
> Pradeep <venkatesan.prad...@ericsson.com>
> Subject: RE: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
>
> Hi Billy,
>
> > -----Original Message-----
> > From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> > Sent: Friday, 22 September, 2017 10:52
> >
> > > The next question is how to classify the ingress traffic on the NIC
> > > and insert it into rx queues with different priority. Any scheme
> > > implemented should preferably work with as many NICs as possible.
> > > Use of the new rte_flow API in DPDK seems the right direction to go
> > > here.
> >
> > [[BO'M]] This may be getting ahead of where we are but is it important
> > to know if a NIC does not support a prioritization scheme?
> > Someone, Darrell I believe, mentioned a capability discovery mechanism
> > at one point. I was thinking it was not necessary as functionally
> > nothing changes if prioritization is or is not supported. But maybe in
> > terms of an orchestrator it does make sense - as it may want to make
> > other arrangements to protect control traffic in the absence of a
> > working prioritization mechanism.
>
> [Jan] In our use case the configuration of filters for prioritization
> would happen "manually" at OVS deployment time with full knowledge of the
> NIC type and capabilities. A run-time capability discovery mechanism is
> not really needed for that. But it would anyway be good to get feedback
> if the configured filter is supported by the present NIC or if the
> prioritization will not work.

[[BO'M]] There is a log warning message, but if something more
software-friendly is required, maybe the ovsdb entry for the other_config
could be cleared by vswitchd if the interface can't perform?

> > > We are very interested in starting the dialogue how to configure the
> > > {queue, priority, filter} mapping in OVS and which filters are most
> > > meaningful to start with and supported by most NICs. Candidates
> > > could include VLAN tags and p-bits, Ethertype and IP DSCP.
>
> Any feedback as to the viability of filtering on those fields with i40e
> and ixgbe?

[[BO'M]] There is a flex filter feature which should make this possible
for XL710. I will verify.

> > > One thing that we consider important and that we would not want to
> > > lose with prioritization is the possibility to share load over a
> > > number of PMDs with RSS. So preferably the prioritization and RSS
> > > spread over a number of rx queues were orthogonal.
> >
> > [[BO'M]] We have a proposed solution for this now. Which is simply to
> > change the RETA table to avoid RSS'd packets 'polluting' the priority
> > queue. It hasn't been implemented but it should work. That's in the
> > context of DPDK/FlowDirector/XL710 but rte_flow api should allow this
> > too.
>
> [Jan] Does this mean there is work needed to enhance the NIC firmware,
> the i40e DPDK PMD, or the rte_flow API (or any combination of those)?
> What about the ixgbe PMD in this context? Will the Niantic support
> similar classification?

[[BO'M]] I'd imagine that all NICs implementing RSS have a RETA and I'm
sure it's accessible by both fdir and rte_flow currently. In terms of
Niantic supporting queue assignment based on VLAN tags etc. I'm not so
sure. I'll take an AR to dig into this.

> Do you have a pointer to Fortville documentation that would help us to
> understand how i40e implements the rte_flow API?

[[BO'M]] AFAIK the flow API is pretty expressive. The issue would be more
with NIC support. There is the XL710 datasheet
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xl710-10-40-controller-datasheet.pdf
though tbh I find it hard to figure out from it how the various filter
mechanisms interact.

> Thanks, Jan
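
As a rough sketch of the RETA idea discussed in this message (not
implemented in the patch, by Billy's own account): the RSS redirection
table is filled so that none of its entries points at the priority queue,
which leaves that queue to explicitly filtered traffic only. The table
size of 128 and the function name build_reta are assumptions made for the
example; real table sizes are NIC-specific, and in DPDK the result would
presumably be programmed through something like
rte_eth_dev_rss_reta_update().

```python
def build_reta(reta_size, n_rxq, priority_queue):
    """Build an RSS redirection table that never selects the priority queue.

    Entries cycle round-robin over all rx queues except priority_queue,
    so RSS-hashed packets are still spread over the remaining queues.
    """
    rss_queues = [q for q in range(n_rxq) if q != priority_queue]
    if not rss_queues:
        raise ValueError("need at least one non-priority rx queue for RSS")
    return [rss_queues[i % len(rss_queues)] for i in range(reta_size)]


# Example: 4 rx queues, queue 3 reserved for prioritized traffic.
reta = build_reta(128, 4, priority_queue=3)
```

With this layout RSS load sharing over queues 0 to 2 is kept, which is
the orthogonality Jan asks for, at the cost of one queue less for RSS.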
Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
Hi Billy,

> -----Original Message-----
> From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> Sent: Friday, 22 September, 2017 10:52
>
> > The next question is how to classify the ingress traffic on the NIC
> > and insert it into rx queues with different priority. Any scheme
> > implemented should preferably work with as many NICs as possible.
> > Use of the new rte_flow API in DPDK seems the right direction to go
> > here.
>
> [[BO'M]] This may be getting ahead of where we are but is it important
> to know if a NIC does not support a prioritization scheme?
> Someone, Darrell I believe, mentioned a capability discovery mechanism
> at one point. I was thinking it was not necessary as functionally
> nothing changes if prioritization is or is not supported. But maybe in
> terms of an orchestrator it does make sense - as it may want to make
> other arrangements to protect control traffic in the absence of a
> working prioritization mechanism.

[Jan] In our use case the configuration of filters for prioritization
would happen "manually" at OVS deployment time with full knowledge of the
NIC type and capabilities. A run-time capability discovery mechanism is
not really needed for that. But it would anyway be good to get feedback
if the configured filter is supported by the present NIC or if the
prioritization will not work.

> > We are very interested in starting the dialogue how to configure the
> > {queue, priority, filter} mapping in OVS and which filters are most
> > meaningful to start with and supported by most NICs. Candidates could
> > include VLAN tags and p-bits, Ethertype and IP DSCP.

Any feedback as to the viability of filtering on those fields with i40e
and ixgbe?

> > One thing that we consider important and that we would not want to
> > lose with prioritization is the possibility to share load over a
> > number of PMDs with RSS. So preferably the prioritization and RSS
> > spread over a number of rx queues were orthogonal.
>
> [[BO'M]] We have a proposed solution for this now. Which is simply to
> change the RETA table to avoid RSS'd packets 'polluting' the priority
> queue. It hasn't been implemented but it should work. That's in the
> context of DPDK/FlowDirector/XL710 but rte_flow api should allow this
> too.

[Jan] Does this mean there is work needed to enhance the NIC firmware,
the i40e DPDK PMD, or the rte_flow API (or any combination of those)?
What about the ixgbe PMD in this context? Will the Niantic support
similar classification?

Do you have a pointer to Fortville documentation that would help us to
understand how i40e implements the rte_flow API?

Thanks, Jan
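
To make the candidate filter fields named in this message concrete, here
is a small software model of which header bits a filter on VLAN tag,
p-bits, Ethertype or IP DSCP would look at. On a real NIC this matching
would happen in hardware; the function names and the shape of the
is_priority() policy are invented for the example and are not part of
OVS, DPDK or the patch. Only a single 802.1Q tag and IPv4 are handled.

```python
import struct

ETH_P_8021Q = 0x8100   # 802.1Q VLAN tag
ETH_P_IP = 0x0800      # IPv4


def classify_fields(frame):
    """Extract VLAN id, p-bits, Ethertype and IP DSCP from an Ethernet frame."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    offset = 14                      # start of the L3 header when untagged
    vlan_id = pcp = None
    if ethertype == ETH_P_8021Q:
        tci, ethertype = struct.unpack_from("!HH", frame, 14)
        pcp = tci >> 13              # 3 priority (p-) bits
        vlan_id = tci & 0x0FFF       # 12-bit VLAN id
        offset = 18
    dscp = None
    if ethertype == ETH_P_IP:
        dscp = frame[offset + 1] >> 2   # upper 6 bits of the ToS byte
    return {"vlan_id": vlan_id, "pcp": pcp, "ethertype": ethertype, "dscp": dscp}


def is_priority(frame, priority_vlans=frozenset(), min_pcp=None, dscps=frozenset()):
    """Hypothetical policy: prioritize on VLAN id, minimum p-bits, or DSCP."""
    f = classify_fields(frame)
    return (f["vlan_id"] in priority_vlans
            or (min_pcp is not None and f["pcp"] is not None and f["pcp"] >= min_pcp)
            or f["dscp"] in dscps)
```

For the per-VLAN separation Jan describes, only the priority_vlans part
of such a policy would be needed.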
Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
Hi Jan, Kevin,

> -----Original Message-----
> From: Jan Scheurich [mailto:jan.scheur...@ericsson.com]
> Sent: Thursday, September 21, 2017 4:12 PM
> To: Kevin Traynor <ktray...@redhat.com>; O Mahony, Billy
> <billy.o.mah...@intel.com>; d...@openvswitch.org
> Cc: Mechthild Buescher <mechthild.buesc...@ericsson.com>; Venkatesan
> Pradeep <venkatesan.prad...@ericsson.com>
> Subject: RE: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
>
> Hi all,
>
> We seriously want to pursue this kind of ingress traffic prioritization
> from physical ports in OVS-DPDK for the use case I mentioned earlier:
> prioritization of in-band control plane traffic running on the same
> physical network as the tenant data traffic.
>
> We have first focused on testing the effectiveness of the SW queue
> prioritization in Billy's patch. To this end we added two DPDK ports to
> a PMD: dpdk0 with normal priority and dpdk1 with hard-coded high
> priority (i.e. not using the config interface in the patch). We
> cross-connected dpdk0 to a vhostuser port in a VM and dpdk1 to the LOCAL
> port on the host.
>
> We overloaded the PMD with 64 byte packets on dpdk0 (~25% rx packet
> drop on dpdk0) and in parallel sent iperf3 UDP traffic (256 byte
> datagrams) into dpdk1, destined to an iperf3 server running on the host.
>
> With the dpdk1 queue prioritized, we achieve ~1Gbit/s (460 Kpps) iperf3
> throughput with zero packet drop no matter if the parallel overload
> traffic on dpdk0 is running or not. (The throughput is limited by the
> UDP/IP stack on the client side.) In the same test with non-prioritized
> dpdk1 queue iperf3 reports about 28% packet drop, same as experienced by
> the dpdk0 traffic.
>
> With that we can conclude that the PMD priority queue polling scheme
> implemented in Billy's patch effectively solves our problem. We haven't
> tested if the inner priority polling loop has any performance impact on
> the normal PMD processing. Not likely, though.

[[BO'M]] That's great to know!

> The next question is how to classify the ingress traffic on the NIC and
> insert it into rx queues with different priority. Any scheme implemented
> should preferably work with as many NICs as possible. Use of the new
> rte_flow API in DPDK seems the right direction to go here.

[[BO'M]] This may be getting ahead of where we are but is it important to
know if a NIC does not support a prioritization scheme? Someone, Darrell
I believe, mentioned a capability discovery mechanism at one point. I was
thinking it was not necessary as functionally nothing changes if
prioritization is or is not supported. But maybe in terms of an
orchestrator it does make sense - as it may want to make other
arrangements to protect control traffic in the absence of a working
prioritization mechanism.

> We are very interested in starting the dialogue how to configure the
> {queue, priority, filter} mapping in OVS and which filters are most
> meaningful to start with and supported by most NICs. Candidates could
> include VLAN tags and p-bits, Ethertype and IP DSCP.
>
> One thing that we consider important and that we would not want to lose
> with prioritization is the possibility to share load over a number of
> PMDs with RSS. So preferably the prioritization and RSS spread over a
> number of rx queues were orthogonal.

[[BO'M]] We have a proposed solution for this now. Which is simply to
change the RETA table to avoid RSS'd packets 'polluting' the priority
queue. It hasn't been implemented but it should work. That's in the
context of DPDK/FlowDirector/XL710 but rte_flow api should allow this too.

> BR, Jan
>
> Note: There seems to be a significant overlap with the discussion around
> classification HW offload for datapath flow entries currently going on,
> with the exception that the QoS filters here are static and not in any
> way tied to dynamic megaflows.
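
As a quick sanity check of the figures quoted in this message (~1 Gbit/s
at 460 Kpps with 256 byte datagrams), the arithmetic can be worked
through as below. The encapsulation is an assumption, since the thread
does not state it: UDP over IPv4 without options, untagged Ethernet, FCS
counted, preamble and inter-frame gap ignored.

```python
PPS = 460_000          # packet rate reported for the iperf3 stream
PAYLOAD = 256          # UDP datagram size in bytes
# Assumed per-packet overhead: UDP + IPv4 (no options) + Ethernet + FCS.
UDP_HDR, IPV4_HDR, ETH_HDR, ETH_FCS = 8, 20, 14, 4


def mbit_per_s(bytes_per_packet, pps=PPS):
    """Convert a per-packet size and a packet rate to Mbit/s."""
    return bytes_per_packet * 8 * pps / 1e6


payload_mbps = mbit_per_s(PAYLOAD)
frame_mbps = mbit_per_s(PAYLOAD + UDP_HDR + IPV4_HDR + ETH_HDR + ETH_FCS)
print(f"payload: {payload_mbps:.0f} Mbit/s, on the wire: {frame_mbps:.0f} Mbit/s")
```

Under those assumptions 460 Kpps corresponds to roughly 0.94 Gbit/s of
UDP payload and about 1.11 Gbit/s of Ethernet frames, so the reported
"~1Gbit/s" is consistent with the packet rate either way.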
Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
Hi all,

We seriously want to pursue this kind of ingress traffic prioritization
from physical ports in OVS-DPDK for the use case I mentioned earlier:
prioritization of in-band control plane traffic running on the same
physical network as the tenant data traffic.

We have first focused on testing the effectiveness of the SW queue
prioritization in Billy's patch. To this end we added two DPDK ports to a
PMD: dpdk0 with normal priority and dpdk1 with hard-coded high priority
(i.e. not using the config interface in the patch). We cross-connected
dpdk0 to a vhostuser port in a VM and dpdk1 to the LOCAL port on the host.

We overloaded the PMD with 64 byte packets on dpdk0 (~25% rx packet drop
on dpdk0) and in parallel sent iperf3 UDP traffic (256 byte datagrams)
into dpdk1, destined to an iperf3 server running on the host.

With the dpdk1 queue prioritized, we achieve ~1Gbit/s (460 Kpps) iperf3
throughput with zero packet drop no matter if the parallel overload
traffic on dpdk0 is running or not. (The throughput is limited by the
UDP/IP stack on the client side.) In the same test with non-prioritized
dpdk1 queue iperf3 reports about 28% packet drop, same as experienced by
the dpdk0 traffic.

With that we can conclude that the PMD priority queue polling scheme
implemented in Billy's patch effectively solves our problem. We haven't
tested if the inner priority polling loop has any performance impact on
the normal PMD processing. Not likely, though.

The next question is how to classify the ingress traffic on the NIC and
insert it into rx queues with different priority. Any scheme implemented
should preferably work with as many NICs as possible. Use of the new
rte_flow API in DPDK seems the right direction to go here.

We are very interested in starting the dialogue how to configure the
{queue, priority, filter} mapping in OVS and which filters are most
meaningful to start with and supported by most NICs. Candidates could
include VLAN tags and p-bits, Ethertype and IP DSCP.

One thing that we consider important and that we would not want to lose
with prioritization is the possibility to share load over a number of
PMDs with RSS. So preferably the prioritization and RSS spread over a
number of rx queues were orthogonal.

BR, Jan

Note: There seems to be a significant overlap with the discussion around
classification HW offload for datapath flow entries currently going on,
with the exception that the QoS filters here are static and not in any
way tied to dynamic megaflows.

> -----Original Message-----
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Friday, 18 August, 2017 20:40
> To: Jan Scheurich <jan.scheur...@ericsson.com>; O Mahony, Billy
> <billy.o.mah...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
>
> On 08/17/2017 05:21 PM, Jan Scheurich wrote:
> > Good discussion. Some thoughts:
> >
> > 1. Prioritizing queues by assigning them to dedicated PMDs is a simple
> > and effective but very crude method, considering that you have to
> > reserve an entire (logical) core for that. So I am all for a more
> > economic and perhaps slightly less deterministic option!
> >
>
> Sure - if you have the ability to effectively prioritize traffic on all
> ports then I agree. At present you would only be able to prioritize
> traffic from a 2 rxq i40e which would mean any other high priority
> traffic may get penalized if it lands on the same pmd. I'm not sure that
> limited a use case would really be useful.
>
> Kevin.
>
> > 2. Offering the option to prioritize certain queues in OVS-DPDK is a
> > highly desirable feature. We have at least one important use case in
> > OpenStack (prioritizing "in-band" infrastructure control plane traffic
> > over tenant data, in case both are carried on the same physical
> > network). In our case the traffic separation would be done per VLAN.
> > Can we add this to the list of supported filters?
> >
> > 3. It would be nice to be able to combine priority queues with filters
> > with a number of RSS queues without filter. Is this an XL710 HW
> > limitation or only a limitation of the drivers and DPDK APIs?
> >
> > BR, Jan
> >
> >
> >> From: ovs-dev-boun...@openvswitch.org
> >> [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of O Mahony, Billy
> >> Sent: Thursday, 17 August, 2017 18:07
> >> To: Kevin Traynor <ktray...@redhat.com>; d...@openvswitch.org
> >> Subject: Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
> >>
> >> Hi Kevin,
> >>
> >> Thanks for the comments - more inline.
> >>
> >> Billy.
> >>
> >>> -----Original Message-----
> >>> From: Kevin Trayn
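
The effect Jan reports for the priority polling scheme can be reproduced
qualitatively with a toy model. This is not the patch's code: it simply
compares plain round-robin polling with a variant whose inner loop drains
the priority queue first, under a fixed per-iteration processing budget.
All numbers (batch size, arrivals per iteration, budget, ring size) are
made up for the illustration.

```python
BATCH = 32  # packets taken per rx poll (assumed)


def simulate(prioritized, ticks=1000, normal_in=40, prio_in=40,
             budget=64, ring_size=128):
    """Return (normal_drop_ratio, prio_drop_ratio) after `ticks` iterations."""
    queued = {"normal": 0, "prio": 0}
    offered = {"normal": 0, "prio": 0}
    dropped = {"normal": 0, "prio": 0}
    arrivals = {"normal": normal_in, "prio": prio_in}

    for _ in range(ticks):
        # Packets arrive; whatever does not fit in the rx ring is dropped.
        for q, n in arrivals.items():
            accepted = min(n, ring_size - queued[q])
            queued[q] += accepted
            offered[q] += n
            dropped[q] += n - accepted

        remaining = budget
        if prioritized:
            # Inner loop: keep polling the priority queue until it is empty.
            while queued["prio"] and remaining:
                took = min(BATCH, queued["prio"], remaining)
                queued["prio"] -= took
                remaining -= took
            order = ["normal"]
        else:
            order = ["normal", "prio"]

        # Plain round-robin over the remaining queues, one batch per poll.
        while remaining and any(queued[q] for q in order):
            for q in order:
                took = min(BATCH, queued[q], remaining)
                queued[q] -= took
                remaining -= took

    return tuple(dropped[q] / offered[q] for q in ("normal", "prio"))


round_robin = simulate(prioritized=False)
with_priority = simulate(prioritized=True)
```

In this model the round-robin case drops the same share of packets on
both queues, while the prioritized case drops nothing on the priority
queue and shifts the whole loss onto the normal queue. That matches the
shape of the measurement above, though not its exact percentages.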
Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
On 08/17/2017 05:21 PM, Jan Scheurich wrote:
> Good discussion. Some thoughts:
>
> 1. Prioritizing queues by assigning them to dedicated PMDs is a simple
> and effective but very crude method, considering that you have to
> reserve an entire (logical) core for that. So I am all for a more
> economic and perhaps slightly less deterministic option!
>

Sure - if you have the ability to effectively prioritize traffic on all
ports then I agree. At present you would only be able to prioritize
traffic from a 2 rxq i40e which would mean any other high priority
traffic may get penalized if it lands on the same pmd. I'm not sure that
limited a use case would really be useful.

Kevin.

> 2. Offering the option to prioritize certain queues in OVS-DPDK is a
> highly desirable feature. We have at least one important use case in
> OpenStack (prioritizing "in-band" infrastructure control plane traffic
> over tenant data, in case both are carried on the same physical
> network). In our case the traffic separation would be done per VLAN.
> Can we add this to the list of supported filters?
>
> 3. It would be nice to be able to combine priority queues with filters
> with a number of RSS queues without filter. Is this a XL710 HW
> limitation or only a limitation of the drivers and DPDK APIs?
>
> BR, Jan
>
>
>> From: ovs-dev-boun...@openvswitch.org
>> [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of O Mahony, Billy
>> Sent: Thursday, 17 August, 2017 18:07
>> To: Kevin Traynor <ktray...@redhat.com>; d...@openvswitch.org
>> Subject: Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
>>
>> Hi Kevin,
>>
>> Thanks for the comments - more inline.
>>
>> Billy.
Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
Hi All,

> -----Original Message-----
> From: Jan Scheurich [mailto:jan.scheur...@ericsson.com]
> Sent: Thursday, August 17, 2017 5:22 PM
> To: O Mahony, Billy <billy.o.mah...@intel.com>; Kevin Traynor
> <ktray...@redhat.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
>
> Good discussion. Some thoughts:
>
> 1. Prioritizing queues by assigning them to dedicated PMDs is a simple
> and effective but very crude method, considering that you have to
> reserve an entire (logical) core for that. So I am all for a more
> economic and perhaps slightly less deterministic option!

[[BO'M]] I would agree. I was just drawing attention to the two-part
nature of the patch, i.e. that it's not DPDK-specific as such but comes
with a DPDK implementation.

> 2. Offering the option to prioritize certain queues in OVS-DPDK is a
> highly desirable feature. We have at least one important use case in
> OpenStack (prioritizing "in-band" infrastructure control plane traffic
> over tenant data, in case both are carried on the same physical
> network). In our case the traffic separation would be done per VLAN.
> Can we add this to the list of supported filters?

[[BO'M]] Good to know about use-cases. I'll dig a bit on that wrt DPDK
drivers and hardware.

> 3. It would be nice to be able to combine priority queues with filters
> with a number of RSS queues without filter. Is this an XL710 HW
> limitation or only a limitation of the drivers and DPDK APIs?

[[BO'M]] Again I'll have to dig on this. Our go-to guy for this is on
vacation at the moment. Remind me if I don't get back with a response.
Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
Good discussion. Some thoughts:

1. Prioritizing queues by assigning them to dedicated PMDs is a simple and effective but very crude method, considering that you have to reserve an entire (logical) core for that. So I am all for a more economical and perhaps slightly less deterministic option!

2. Offering the option to prioritize certain queues in OVS-DPDK is a highly desirable feature. We have at least one important use case in OpenStack (prioritizing "in-band" infrastructure control plane traffic over tenant data, in case both are carried on the same physical network). In our case the traffic separation would be done per VLAN. Can we add this to the list of supported filters?

3. It would be nice to be able to combine priority queues with filters with a number of RSS queues without filters. Is this an XL710 HW limitation or only a limitation of the drivers and DPDK APIs?

BR, Jan

> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of O Mahony, Billy
> Sent: Thursday, 17 August, 2017 18:07
> To: Kevin Traynor <ktray...@redhat.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
>
> Hi Kevin,
>
> Thanks for the comments - more inline.
>
> Billy.
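On the per-VLAN separation asked for in Jan's point 2: as a rough illustration only, the sketch below shows in C which fields a VLAN-based filter looks at, namely the 802.1Q TPID (0x8100) at byte offset 12 of the frame and the 16-bit TCI that follows it (3 priority bits, 1 DEI bit, 12 VLAN ID bits). This is a software stand-in for what would be a NIC filter in the proposal; struct vlan_tag, parse_vlan_tag() and is_priority_vlan() are hypothetical names, not OVS or DPDK APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TPID_8021Q 0x8100u  /* 802.1Q tag protocol identifier */

/* Hypothetical result type for this sketch. */
struct vlan_tag {
    uint16_t vid;  /* 12-bit VLAN ID */
    uint8_t pcp;   /* 3-bit priority code point (the "p-bits") */
};

/* Extracts the outer 802.1Q tag of an Ethernet frame.
 * Returns false if the frame is too short or carries no 802.1Q tag. */
bool parse_vlan_tag(const uint8_t *frame, size_t len, struct vlan_tag *tag)
{
    assert(frame != NULL && tag != NULL);

    /* 6 bytes dst MAC + 6 bytes src MAC + 2 bytes TPID + 2 bytes TCI. */
    if (len < 16) {
        return false;
    }

    uint16_t tpid = (uint16_t) ((frame[12] << 8) | frame[13]);
    if (tpid != TPID_8021Q) {
        return false;
    }

    uint16_t tci = (uint16_t) ((frame[14] << 8) | frame[15]);
    tag->vid = tci & 0x0fff;           /* low 12 bits */
    tag->pcp = (uint8_t) (tci >> 13);  /* top 3 bits */
    return true;
}

/* True if the frame is tagged with the VLAN chosen for priority traffic. */
bool is_priority_vlan(const uint8_t *frame, size_t len, uint16_t prio_vid)
{
    struct vlan_tag tag;

    return parse_vlan_tag(frame, len, &tag) && tag.vid == prio_vid;
}
```

A frame whose bytes 12 to 15 are 81 00 A0 64 carries VLAN ID 100 with priority 5, so is_priority_vlan(frame, len, 100) would steer it to the priority queue. Whether a given NIC can match on these fields in hardware is exactly the open question in this thread.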
Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
Hi Kevin,

Thanks for the comments - more inline.

Billy.

> -----Original Message-----
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Thursday, August 17, 2017 3:37 PM
> To: O Mahony, Billy <billy.o.mah...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
>
> Hi Billy,
>
> I just happened to be about to send a reply to the previous patchset, so
> adding comments here instead.
>
> In general I like the idea of splitting priority traffic to a specific queue
> but I have concerns about the implementation. I shared most of these when we
> met already but adding here too. Not a detailed review.

[[BO'M]] No worries. If we get the high-level sorted out first the details will fall into place :)

> - It is using a deprecated DPDK filter API.
> http://dpdk.org/doc/guides/rel_notes/deprecation.html

[[BO'M]] Yes, it looks like a move to the shiny new Flow API is in order.

> - It is an invasive change that seems to be for only one Intel NIC in the DPDK
> datapath. Even then it is very limited as it only works when that Intel NIC is
> using exactly 2 rx queues.

[[BO'M]] That's the current case, but it is really a limitation of the FlowDirectorAPI/DPDK/XL710 combination. Maybe the Flow API will allow us to RSS over many queues and place the prioritized traffic on another queue.

> - It's a hardcoded opaque QoS which will have a negative impact on
> whichever queues happen to land on the same pmd so it's unpredictable
> which queues will be affected. It could affect other latency sensitive traffic
> that cannot be prioritized because of the limitations above.
>
> - I guess multiple priority queues could land on the same pmd and starve
> each other?

[[BO'M]] Interaction with pmd assignment is definitely an issue that needs to be addressed. I know there is work in-flight in that regard, so it will be easier to address that when the in-flight work lands.

> I think a more general, less restricted scheme using DPDK rte_flow API with
> controls on the effects to other traffic is needed. Perhaps if a user is very
> concerned with latency on traffic from a port, they would be ok with
> dedicating a pmd to it.

[[BO'M]] You are proposing to prioritize queues by allocating a single pmd to them rather than by changing the pmd's read algorithm to favor prioritized queues? For sure that could be another implementation of the solution.

If we look at the patch set as containing two distinct things, as per the cover letter ("the patch set provides a method to request ingress scheduling on interfaces. It also provides an implementation of same for DPDK physical ports."), then this would change the second part but the first would still be valid. Each port type would in any case have to come up with its own implementation - it's just that for non-physical ports that cannot offload the prioritization decision it is not worth the effort - as was noted in an earlier RFC.

> thanks,
> Kevin.
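For reference, the alternative mentioned in the reply above (changing the pmd's read algorithm to favor prioritized queues, as opposed to dedicating a pmd) could look roughly like the sketch below. This is not the code from the patch set. struct sim_rxq, sim_rxq_recv(), pmd_poll_once(), BURST and the max_repolls bound are all hypothetical names made up for illustration, and the rx queues are simulated with plain counters rather than real DPDK queues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BURST 32u  /* hypothetical maximum number of packets per rx burst */

/* Hypothetical stand-in for an rx queue: only a backlog counter. */
struct sim_rxq {
    unsigned backlog;  /* packets currently waiting in the queue */
    bool prioritized;  /* true if priority traffic is steered to this queue */
    unsigned served;   /* packets received from this queue so far */
};

/* Receives up to BURST packets and returns how many were received. */
unsigned sim_rxq_recv(struct sim_rxq *q)
{
    unsigned n = q->backlog < BURST ? q->backlog : BURST;

    q->backlog -= n;
    q->served += n;
    return n;
}

/* One pass of a pmd over its queues.  A prioritized queue is polled again
 * for as long as it keeps returning full bursts, up to max_repolls extra
 * polls, so that the other queues on the same pmd are delayed by a bounded
 * amount rather than starved. */
void pmd_poll_once(struct sim_rxq *qs, size_t n_qs, unsigned max_repolls)
{
    assert(qs != NULL || n_qs == 0);

    for (size_t i = 0; i < n_qs; i++) {
        unsigned repolls = 0;
        unsigned got = sim_rxq_recv(&qs[i]);

        while (qs[i].prioritized && got == BURST && repolls < max_repolls) {
            got = sim_rxq_recv(&qs[i]);
            repolls++;
        }
    }
}
```

With a backlog of 100 packets on a prioritized queue and 100 on a normal queue, a single pass with max_repolls set to 4 drains the prioritized queue completely and takes one burst of 32 from the normal one. The bound on extra polls is one possible answer to the concern that queues sharing a pmd with a priority queue are affected unpredictably; choosing its value is a trade-off that this sketch does not settle.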
Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
Hi Billy,

I just happened to be about to send a reply to the previous patchset, so adding comments here instead.

On 08/17/2017 03:24 PM, Billy O'Mahony wrote:
> Hi All,
>
> v2: Addresses various review comments; Applies cleanly on 0bedb3d6.
>
> This patch set provides a method to request ingress scheduling on interfaces.
> It also provides an implementation of same for DPDK physical ports.
>
> This allows specific packet types to be:
> * forwarded to their destination port ahead of other packets.
> and/or
> * be less likely to be dropped in an overloaded situation.
>
> It was previously discussed
> https://mail.openvswitch.org/pipermail/ovs-discuss/2017-May/044395.html
> and RFC'd
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335237.html
>
> Limitations of this patch:
> * The patch uses the Flow Director filter API in DPDK and has only been tested
> on Fortville (XL710) NIC.
> * Prioritization is limited to:
> ** eth_type
> ** Fully specified 5-tuple src & dst ip and port numbers for UDP & TCP packets
> * ovs-appctl dpif-netdev/pmd-*-show o/p should indicate rxq prioritization.
> * any requirements for a more granular prioritization mechanism

In general I like the idea of splitting priority traffic to a specific queue but I have concerns about the implementation. I shared most of these when we met already but adding here too. Not a detailed review.

- It is using a deprecated DPDK filter API.
  http://dpdk.org/doc/guides/rel_notes/deprecation.html

- It is an invasive change that seems to be for only one Intel NIC in the DPDK datapath. Even then it is very limited as it only works when that Intel NIC is using exactly 2 rx queues.

- It's a hardcoded opaque QoS which will have a negative impact on whichever queues happen to land on the same pmd, so it's unpredictable which queues will be affected. It could affect other latency sensitive traffic that cannot be prioritized because of the limitations above.

- I guess multiple priority queues could land on the same pmd and starve each other?

I think a more general, less restricted scheme using DPDK rte_flow API with controls on the effects to other traffic is needed. Perhaps if a user is very concerned with latency on traffic from a port, they would be ok with dedicating a pmd to it.

thanks,
Kevin.

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
Hi All,

v2: Addresses various review comments; Applies cleanly on 0bedb3d6.

This patch set provides a method to request ingress scheduling on interfaces. It also provides an implementation of same for DPDK physical ports.

This allows specific packet types to be:
* forwarded to their destination port ahead of other packets.
and/or
* be less likely to be dropped in an overloaded situation.

It was previously discussed
https://mail.openvswitch.org/pipermail/ovs-discuss/2017-May/044395.html
and RFC'd
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335237.html

Limitations of this patch:
* The patch uses the Flow Director filter API in DPDK and has only been tested on Fortville (XL710) NIC.
* Prioritization is limited to:
** eth_type
** Fully specified 5-tuple src & dst ip and port numbers for UDP & TCP packets
* ovs-appctl dpif-netdev/pmd-*-show output should indicate rxq prioritization.
* any requirements for a more granular prioritization mechanism

Initial results:
* even when userspace OVS is very much overloaded and dropping significant numbers of packets, the drop rate for prioritized traffic is running at 1/1000th of the drop rate for non-prioritized traffic.
* the latency profile of prioritized traffic through userspace OVS is also much improved

[Figure: ASCII histogram, proportion of packets per latency bin @ 80% Max Throughput (log scale, 1e0 down to 1e-8). X axis: Latency (us), bins 0-1, 1-20, 20-40, 40-50, 50-60, 60-70, ... 120-400. Series: '|' Non-prioritized pkt latency, '*' Prioritized pkt latency.]

Regards,
Billy.

billy O'Mahony (4):
  netdev: Add set_ingress_sched to netdev api
  netdev-dpdk: Apply ingress_sched config to dpdk phy ports
  dpif-netdev: Add rxq prioritization
  docs: Document ingress scheduling feature

 Documentation/howto/dpdk.rst    |  31
 include/openvswitch/ofp-parse.h |   3
 lib/dpif-netdev.c               |  25
 lib/netdev-bsd.c                |   1
 lib/netdev-dpdk.c               | 192
 lib/netdev-dummy.c              |   1
 lib/netdev-linux.c              |   1
 lib/netdev-provider.h           |  10
 lib/netdev-vport.c              |   1
 lib/netdev.c                    |  22
 lib/netdev.h                    |   1
 vswitchd/bridge.c               |   4
 vswitchd/vswitch.xml            |  31
 13 files changed, 315 insertions(+), 8 deletions(-)

--
2.7.4
Re: [ovs-dev] [PATCH 0/4] prioritizing latency sensitive traffic
Hi Billy,

Thanks for working on this feature. I started reviewing the code initially but later decided to test it first, as it uses XL710 NIC Flow Director features and I wanted to know the implications, if any. I had a few observations and would like to know if you have seen these during your unit tests.

1) With this patch series, the Rx burst bulk allocation callback function is invoked instead of the vector rx function, meaning i40e_recv_pkts_bulk_alloc() gets invoked instead of i40e_recv_pkts_vec(). Please check i40e_set_rx_function() in the i40e DPDK driver. I am speculating this may be due to enabling flow director and its rules. I don't know the implications of using the bulk_alloc() function; maybe we should check with the DPDK guys on this.

2) When I tried to prioritize the UDP packets for specific IPs and ports, I saw a massive performance drop. I am using an XL710 NIC with a stable firmware version. Below are my steps.

   - Start OvS and make sure that n_rxq for the DPDK0 and DPDK1 ports is set to 2.
   - Do a simple P2P test with a single stream (ip_src=8.18.8.1,ip_dst=101.10.10.1,udp_src=10001,udp_dst=5001) and check the throughput.
   - Prioritize the active stream:
       ovs-vsctl set interface dpdk0 other_config:ingress_sched=udp,ip_src=8.18.8.1,ip_dst=101.10.10.1,udp_src=10001,udp_dst=5001
   - A throughput drop is observed now. (~1.7Mpps)

   After a bit of debugging into case 2, I found that miniflow_hash_5tuple() is getting invoked and consuming 10% of the total cycles. One of the commits had the lines below.

   dpdk_eth_dev_queue_setup
   ------------------------
       /* Ingress scheduling requires ETH_MQ_RX_NONE so limit it to when exactly
        * two rxqs are defined. Otherwise MQ will not work as expected. */
       if (dev->ingress_sched_str && n_rxq == 2) {
           conf.rxmode.mq_mode = ETH_MQ_RX_NONE;
       } else {
           conf.rxmode.mq_mode = ETH_MQ_RX_RSS;
       }
   ------------------------

   Does ingress scheduling turn off RSS? This would be a big drawback, as calculating the hash in SW consumes significant cycles.

3) This is another corner case.

   - Here n_rxq is set to 4 for my DPDK ports. OvS is started, traffic is started and throughput is as expected.
   - Now prioritize the stream:
       ovs-vsctl set interface dpdk0 other_config:ingress_sched=udp,ip_src=8.18.8.1,ip_dst=101.10.10.1,udp_src=10001,udp_dst=5001
   - The above command shouldn't take effect, as n_rxq is set to 4 and not 2, and the same is logged appropriately.
       "2017-07-28T11:11:57.792Z|00104|netdev_dpdk|ERR|Interface dpdk0: Ingress scheduling config ignored; Requires n_rxq==2.
       2017-07-28T11:11:57.809Z|00105|dpdk|INFO|PMD: i40e_pf_config_rss(): Max of contiguous 4 PF queues are configured"
   - However the