On Tue, Mar 29, 2016 at 09:54:18AM -0700, Stephen Hemminger wrote:
> On Tue, 29 Mar 2016 10:31:19 +0100
> Bruce Richardson <bruce.richardson at intel.com> wrote:
>
> > On Mon, Mar 28, 2016 at 06:45:26PM -0700, Mohammad El-Shabani wrote:
> > > Hi,
> > > Looking into why it hurts performance, I see that ixgbe_dev_rx_queue_count
> > > is implemented as a scan of elements of rx descriptors, which is very
> > > expensive. I am wondering why it's implemented the way it is. Could it not
> > > just read the head location from the driver?
> > >
> > > Thanks!
> > > Mohammad El-Shabani
> >
> > It's likely that reading the head location from the driver will be even slower
> > than scanning the descriptor rings in memory. Access to PCI is very much slower
> > than accessing memory - especially since on platforms with DDIO, many memory
> > accesses will actually be cache reads.
> >
> > That being said, I haven't actually written a test to prove this out, so feel
> > free to try out the head pointer read method instead and see if it improves
> > things. The results may vary depending on how far ahead needs to be scanned,
> > but certainly for the empty ring case, the descriptor scan method will be far
> > faster than a head read.
> >
> > Regards,
> > /Bruce
>
> Also the most common use case is "are there any more packets ready before
> I go to sleep on epoll", and the descriptor done API tells more than
> is needed.

Yes, it's not designed for that case. For the are-there-any-more-packets
query, the rx_burst API is the one to call. :-)
The rx_queue_count API is for the case where you are under load and need
to see beyond the max count returned by rx_burst before you process the
burst of packets.

/Bruce
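To illustrate the distinction drawn in this thread, here is a toy model in C. Everything in it (struct toy_rx_ring, toy_rx_burst(), toy_rx_queue_count(), the ring size, the burst size and the scan step of 4) is hypothetical. It only loosely mirrors the roles of the real rte_eth_rx_burst() and rte_eth_rx_queue_count() calls, simulates the NIC in software, and touches no hardware. It shows two points made above: a descriptor scan on an empty ring costs a single memory read, and the queue count is what reveals the backlog remaining beyond one burst.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* All names and sizes are hypothetical: a toy RX descriptor ring,
 * not DPDK or ixgbe code. */
#define RING_SIZE 512u /* descriptors in the ring */
#define SCAN_STEP 4u   /* descriptors advanced per scan step (assumed) */
#define BURST_MAX 32u  /* max packets returned by one burst call */

struct toy_rx_ring {
    uint8_t dd[RING_SIZE]; /* "descriptor done" flag, set by the NIC */
    uint32_t hw_head;      /* next slot the simulated NIC writes */
    uint32_t sw_tail;      /* next slot software reads */
};

static void toy_ring_init(struct toy_rx_ring *r)
{
    memset(r, 0, sizeof(*r));
}

/* Simulated NIC: mark n more descriptors as done. */
static void toy_nic_receive(struct toy_rx_ring *r, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++) {
        assert(r->dd[r->hw_head] == 0); /* ring must not overflow */
        r->dd[r->hw_head] = 1;
        r->hw_head = (r->hw_head + 1) % RING_SIZE;
    }
}

/* Burst receive: consume up to max done descriptors. Returning 0 answers
 * "are there any more packets?" without any separate query. */
static uint32_t toy_rx_burst(struct toy_rx_ring *r, uint32_t max)
{
    uint32_t n = 0;
    while (n < max && r->dd[r->sw_tail]) {
        r->dd[r->sw_tail] = 0; /* hand the descriptor back to the NIC */
        r->sw_tail = (r->sw_tail + 1) % RING_SIZE;
        n++;
    }
    return n;
}

/* Queue count by descriptor scan: reads only ring memory, never a device
 * register. An empty ring costs one read. Because it checks every
 * SCAN_STEP-th descriptor, the result is rounded up to a multiple of
 * SCAN_STEP, so it is an estimate rather than an exact count. */
static uint32_t toy_rx_queue_count(const struct toy_rx_ring *r)
{
    uint32_t n = 0;
    while (n < RING_SIZE && r->dd[(r->sw_tail + n) % RING_SIZE])
        n += SCAN_STEP;
    return n;
}
```

In a polling loop, the "is anything left before I sleep" check is simply toy_rx_burst() returning 0. toy_rx_queue_count() earns its cost under load, for example right after a full burst of BURST_MAX packets, to see how far the backlog extends. With 100 packets queued, one burst returns 32 and the scan then reports 68 still waiting; with 69 waiting it would report 72, which is fine as a load estimate but not an exact packet count.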