On 15/10/15 00:23, Ananyev, Konstantin wrote:
>
>
>> -----Original Message-----
>> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
>> Sent: Wednesday, October 14, 2015 5:11 PM
>> To: Ananyev, Konstantin; Richardson, Bruce; dev at dpdk.org
>> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive
>> function
>>
>>
>>
>> On 28/09/15 00:19, Ananyev, Konstantin wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
>>>> Sent: Friday, September 25, 2015 7:29 PM
>>>> To: Richardson, Bruce; dev at dpdk.org
>>>> Cc: Ananyev, Konstantin
>>>> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive
>>>> function
>>>>
>>>> On 07/09/15 07:41, Richardson, Bruce wrote:
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
>>>>>> Sent: Monday, September 7, 2015 3:15 PM
>>>>>> To: Richardson, Bruce; dev at dpdk.org
>>>>>> Cc: Ananyev, Konstantin
>>>>>> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD receive
>>>>>> function
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 07/09/15 13:57, Richardson, Bruce wrote:
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
>>>>>>>> Sent: Monday, September 7, 2015 1:26 PM
>>>>>>>> To: dev at dpdk.org
>>>>>>>> Cc: Ananyev, Konstantin; Richardson, Bruce
>>>>>>>> Subject: Re: [PATCH] ixgbe: prefetch packet headers in vector PMD
>>>>>>>> receive function
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I just realized I've missed the "[PATCH]" tag from the subject. Did
>>>>>>>> anyone have time to review this?
>>>>>>>>
>>>>>>> Hi Zoltan,
>>>>>>>
>>>>>>> the big thing that concerns me with this is the addition of new
>>>>>>> instructions for each packet in the fast path. Ideally, this
>>>>>>> prefetching would be better handled in the application itself, as for
>>>>>>> some apps, e.g. those using pipelining, the core doing the RX from the
>>>>>>> NIC may not touch the packet data at all, and the prefetches will
>>>>>>> instead cause a performance slowdown.
>>>>>>> Is it possible to get the same performance increase - or something
>>>>>>> close to it - by making changes in OVS?
>>>>>> OVS already does a prefetch when it's processing the previous packet, but
>>>>>> apparently it's not early enough. At least for my test scenario, where
>>>>>> I'm forwarding UDP packets with the least possible overhead. I guess in
>>>>>> tests where OVS does more complex processing it should be fine.
>>>>>> I'll try to move the prefetch earlier in the OVS codebase, but I'm not
>>>>>> sure if it'll help.
>>>>> I would suggest trying to prefetch more than one packet ahead.
>>>>> Prefetching 4 or 8 ahead might work better, depending on the processing
>>>>> being done.
>>>>
>>>> I've moved the prefetch earlier, and it seems to work:
>>>>
>>>> https://patchwork.ozlabs.org/patch/519017/
>>>>
>>>> However it raises the question: should we remove header prefetch from
>>>> all the other drivers, and the CONFIG_RTE_PMD_PACKET_PREFETCH option?
>>>
>>> My vote would be for that.
>>> Konstantin
>>
>> After some further thinking I would rather support the
>> rte_packet_prefetch() macro (prefetch the header in the driver, and
>> configure it through CONFIG_RTE_PMD_PACKET_PREFETCH)
>>
>> - the prefetch can happen earlier, so if an application needs the header
>> right away, this is the fastest
>> - if the application doesn't need header prefetch, it can turn it off
>> at compile time. Same as if we didn't have this option.
>> - if the application has mixed needs (sometimes it needs the header
>> right away, sometimes it doesn't), it can turn it off and do what it
>> needs. Same as if we didn't have this option.
>>
>> A harder decision would be whether it should be turned on or off by
>> default. Currently it's on, and half of the receive functions don't use it.
>
> Yep, it is sort of a mess right now.
> Another question if we'd like to keep it and standardise it:
> at what moment to prefetch: as soon as we realize that HW is done with that
> buffer,
> or as late inside rx_burst() as possible?
> I suppose there is no clear answer for that.

I think if the application needs the driver to start the prefetching, that
is because it's already too late by the time rte_eth_rx_burst() returns. So
I think it's best to do it as soon as we learn that the HW is done.

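To make the timing point concrete, here is a minimal sketch of issuing the prefetch at the moment the descriptor is seen as done, before the rest of the per-packet bookkeeping. Everything in it (mock_desc, mock_buf, MOCK_DESC_DONE, mock_rx_burst) is a hypothetical stand-in I made up for illustration, not the ixgbe or rte_mbuf code. It assumes GCC/Clang's __builtin_prefetch and a ring that holds at least nb_pkts entries with no wrap-around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real ixgbe descriptor or rte_mbuf. */
struct mock_desc { uint32_t status; uint16_t len; };
struct mock_buf  { uint8_t data[2048]; uint16_t pkt_len; };

#define MOCK_DESC_DONE 0x1u  /* assumed "HW is done with this buffer" bit */

/*
 * Collect up to nb_pkts completed buffers. The header prefetch is issued
 * as soon as the done bit is observed, so it overlaps with the remaining
 * work of the burst. Assumes ring and bufs have at least nb_pkts entries.
 */
static uint16_t
mock_rx_burst(struct mock_desc *ring, struct mock_buf *bufs,
              struct mock_buf **rx_pkts, uint16_t nb_pkts)
{
    uint16_t nb_rx = 0;

    assert(ring != NULL && bufs != NULL && rx_pkts != NULL);

    while (nb_rx < nb_pkts) {
        struct mock_desc *d = &ring[nb_rx];

        if (!(d->status & MOCK_DESC_DONE))
            break;  /* HW has not finished this one yet */

        /* Earliest point at which the packet data is known to be valid. */
        __builtin_prefetch(bufs[nb_rx].data, 0, 3);

        /* Remaining bookkeeping runs while the prefetch is in flight. */
        bufs[nb_rx].pkt_len = d->len;
        d->status = 0;
        rx_pkts[nb_rx] = &bufs[nb_rx];
        nb_rx++;
    }
    return nb_rx;
}
```

Moving the __builtin_prefetch() call to just before the return would be the "as late as possible" variant; which one wins probably depends on how much work the application does on the first packet.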
> That's why my thought was to just get rid of it.
> Though if it would be implemented in some standardized way and disabled by
> default -
> that's probably ok too.
> BTW, couldn't that be implemented just via rte_ethdev rx callbacks mechanism?
> Then we can have the code in one place and don't need a compile option at all -
> could be enabled/disabled dynamically on a per nic basis.
> Or would it be too high overhead for that?

I think if we go that way, it's better to pass an option to
rte_eth_rx_burst() that tells the driver whether you want the header
prefetched. That would be the most flexible.

> Konstantin
>
>>>>>
>>>>> /Bruce
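
For comparison, a rough sketch of the callback-style alternative: a per-queue hook that can be switched on and off at runtime and prefetches the headers of a finished burst. The names (mock_pkt, mock_queue, mock_rx_cb, prefetch_hdr_cb, mock_rx_done) are hypothetical and only mimic the shape of a post-RX hook; this is not the rte_ethdev callback API. It again assumes GCC/Clang's __builtin_prefetch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet and queue types, NOT rte_mbuf or rte_ethdev. */
struct mock_pkt { uint8_t hdr[64]; };

typedef uint16_t (*mock_rx_cb)(struct mock_pkt **pkts, uint16_t nb_pkts,
                               void *arg);

struct mock_queue {
    mock_rx_cb cb;   /* NULL means no post-RX hook installed */
    void *cb_arg;    /* opaque argument handed back to the hook */
};

/* Hook: prefetch every header in the burst, optionally counting packets. */
static uint16_t
prefetch_hdr_cb(struct mock_pkt **pkts, uint16_t nb_pkts, void *arg)
{
    uint64_t *count = arg;

    for (uint16_t i = 0; i < nb_pkts; i++)
        __builtin_prefetch(pkts[i]->hdr, 0, 3);
    if (count != NULL)
        *count += nb_pkts;
    return nb_pkts;
}

/* Called once the driver has filled pkts[]; runs the hook if one is set. */
static uint16_t
mock_rx_done(struct mock_queue *q, struct mock_pkt **pkts, uint16_t nb_rx)
{
    assert(q != NULL);
    if (q->cb != NULL)
        nb_rx = q->cb(pkts, nb_rx, q->cb_arg);
    return nb_rx;
}
```

Setting q.cb to prefetch_hdr_cb turns the prefetch on for that queue and setting it back to NULL turns it off, with no rebuild. The catch is that the hook only runs after the whole burst has been collected, so the prefetch is issued later than it would be inside the driver loop.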