On Tue, Mar 29, 2016 at 09:54:18AM -0700, Stephen Hemminger wrote:
> On Tue, 29 Mar 2016 10:31:19 +0100
> Bruce Richardson <bruce.richardson at intel.com> wrote:
> 
> > On Mon, Mar 28, 2016 at 06:45:26PM -0700, Mohammad El-Shabani wrote:
> > > Hi,
> > > Looking into why it hurts performance, I see that ixgbe_dev_rx_queue_count
> > > is implemented as a scan of the rx descriptor ring entries, which is very
> > > expensive. I am wondering why it's implemented the way it is. Could it not
> > > just read the head location from the driver?
> > > 
> > > Thanks!
> > > Mohammad El-Shabani
> > 
> > It's likely that reading the head pointer register from the NIC will be even
> > slower than scanning the descriptor ring in memory. Access to PCI is very
> > much slower than access to memory - especially since on platforms with DDIO,
> > many of those memory accesses will actually be cache reads.
> > 
> > That being said, I haven't actually written a test to prove this out, so feel
> > free to try out the head-pointer-read method instead and see if it improves
> > things. The results may vary depending on how far ahead needs to be scanned,
> > but certainly for the empty-ring case the descriptor scan method will be far
> > faster than a head read.
> > 
> > Regards,
> > /Bruce
> 
> Also, the most common use case is "are there any more packets ready before
> I go to sleep on epoll?", and the descriptor done API tells you more than
> is needed.

Yes, it's not designed for that case. For the are-there-any-more-packets query,
the rx_burst API is the one to call. :-)
The rx_queue_count API is for the case where you are under load and need to see
beyond the max count returned by rx_burst before you process the burst of 
packets.
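
To make the distinction concrete, here's a rough (untested) sketch of the two
call patterns - BURST_SIZE and the port/queue ids are arbitrary, and the exact
integer types of the ethdev calls have changed between releases:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* "Any more packets before I sleep?" - just poll with rx_burst; a zero
 * return answers the question, and a non-zero return hands you the packets.
 */
static uint16_t
poll_once(uint16_t port, uint16_t queue, struct rte_mbuf **pkts)
{
	return rte_eth_rx_burst(port, queue, pkts, BURST_SIZE);
}

/* Under load: peek at how deep the backlog is beyond a single burst,
 * e.g. to adjust scheduling, before processing the burst itself.
 */
static void
process_under_load(uint16_t port, uint16_t queue)
{
	struct rte_mbuf *pkts[BURST_SIZE];
	unsigned int backlog = rte_eth_rx_queue_count(port, queue);
	uint16_t nb_rx = rte_eth_rx_burst(port, queue, pkts, BURST_SIZE);

	/* use 'backlog' to decide how much work to schedule, then
	 * process the nb_rx mbufs as normal */
	(void)backlog;
	(void)nb_rx;
}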
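
On the earlier point about scanning descriptors versus reading the head
register, the trade-off looks roughly like this. Sketch only, not the actual
ixgbe code: rx_desc_dd_set() and read_rdh_register() are made-up stand-ins for
"check the DD bit in the written-back descriptor" and "read the head register
over MMIO", and the ring size is assumed to be a power of two.

#include <stdint.h>

struct rx_queue;	/* whatever per-queue state the PMD keeps */

/* Stand-ins for the real operations - not actual DPDK/ixgbe APIs. */
extern int rx_desc_dd_set(struct rx_queue *rxq, uint32_t idx);
extern uint32_t read_rdh_register(struct rx_queue *rxq);
extern uint32_t ring_size(struct rx_queue *rxq);
extern uint32_t ring_tail(struct rx_queue *rxq);

/* Approach 1: walk the ring in host memory, counting descriptors whose
 * DD (descriptor done) bit the NIC has set. With DDIO the written-back
 * descriptors are typically already in cache, so each check is cheap;
 * the cost grows with how far ahead you have to look. (IIRC the real
 * ixgbe_dev_rx_queue_count() scans in strides rather than one by one.)
 */
static uint32_t
count_by_descriptor_scan(struct rx_queue *rxq)
{
	uint32_t n = ring_size(rxq), tail = ring_tail(rxq), count = 0;

	while (count < n && rx_desc_dd_set(rxq, (tail + count) & (n - 1)))
		count++;
	return count;
}

/* Approach 2: one MMIO read of the head register. A single answer, but
 * an uncached read across PCI costs on the order of hundreds of cycles,
 * which is why it can easily lose to the scan - especially when the
 * ring is empty and the scan stops after the first descriptor.
 */
static uint32_t
count_by_head_read(struct rx_queue *rxq)
{
	uint32_t head = read_rdh_register(rxq);	/* slow MMIO read */

	return (head - ring_tail(rxq)) & (ring_size(rxq) - 1);
}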

/Bruce
