On Tue, Jul 15, 2014 at 10:01:11AM -0400, John W. Linville wrote:
> On Tue, Jul 15, 2014 at 08:17:44AM -0400, Neil Horman wrote:
> > On Tue, Jul 15, 2014 at 12:15:49AM +0000, Zhou, Danny wrote:
> > > According to my performance measurement results for 64B small
> > > packets, 1 queue perf. is better than 16 queues (1.35M pps vs.
> > > 0.93M pps), which makes sense to me: in the 16-queue case more
> > > CPU cycles are spent in kernel land (87% for 16 queues vs. 80%
> > > for 1 queue) for the NAPI-enabled ixgbe driver to switch between
> > > polling and interrupt modes in order to service per-queue rx
> > > interrupts, so more context-switch overhead is involved. Also,
> > > since the eth_packet_rx/eth_packet_tx routines involve two memory
> > > copies between the DPDK mbuf and the pbuf for each packet, it can
> > > hardly achieve high performance unless packets are DMA'd directly
> > > into the mbuf, which needs support from the ixgbe driver.
> >
> > I thought 16 queues would be spread out between as many cpus as you
> > had though, obviating the need for context switches, no?
>
> I think Danny is testing the single CPU case.  Having more queues
> than CPUs probably does not provide any benefit.
>
Ah, yes, generally speaking, you never want nr_cpus < nr_queues.
Otherwise you'll just be fighting yourself.
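
(For anyone following along, the copy Danny describes looks roughly like the
sketch below.  This is not the actual eth_packet_rx code from the patch; the
pkt_rx_queue layout and the names are illustrative, and the mbuf fields follow
the current rte_mbuf layout.  The point is just that every received frame gets
memcpy'd out of the mmap'ed PACKET_RX_RING into a freshly allocated mbuf, and
TX does the mirror-image copy into the PACKET_TX_RING.)

#include <stdint.h>
#include <linux/if_packet.h>
#include <rte_mbuf.h>
#include <rte_memcpy.h>

/* Illustrative queue layout, not the one in the patch. */
struct pkt_rx_queue {
	struct tpacket2_hdr **rd;	/* mmap'ed PACKET_RX_RING frame headers */
	unsigned int framecount;
	unsigned int framenum;
	struct rte_mempool *mb_pool;
};

static uint16_t
sketch_packet_rx(struct pkt_rx_queue *rxq, struct rte_mbuf **bufs,
		 uint16_t nb_pkts)
{
	uint16_t num_rx = 0;
	uint16_t i;

	for (i = 0; i < nb_pkts; i++) {
		struct tpacket2_hdr *hdr = rxq->rd[rxq->framenum];

		if ((hdr->tp_status & TP_STATUS_USER) == 0)
			break;	/* frame still owned by the kernel */

		struct rte_mbuf *mbuf = rte_pktmbuf_alloc(rxq->mb_pool);
		if (mbuf == NULL)
			break;

		/* the per-packet copy: ring frame -> mbuf data room */
		uint16_t len = (uint16_t)hdr->tp_len;
		rte_memcpy(rte_pktmbuf_mtod(mbuf, void *),
			   (uint8_t *)hdr + hdr->tp_mac, len);
		mbuf->pkt_len = len;
		mbuf->data_len = len;

		/* hand the frame back to the kernel and advance the ring */
		hdr->tp_status = TP_STATUS_KERNEL;
		rxq->framenum = (rxq->framenum + 1) % rxq->framecount;

		bufs[num_rx++] = mbuf;
	}
	return num_rx;
}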
> It would be cool to hack the DPDK memory management to work directly
> out of the mmap'ed AF_PACKET buffers.  But at this point I don't
> have enough knowledge of DPDK internals to know if that is at all
> reasonable...
>
> John
>
> P.S.  Danny, have you run any performance tests on the PCAP driver?
>
> --
> John W. Linville		Someday the world will need a hero, and you
> linville at tuxdriver.com			might be all we have.  Be ready.
>
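
(To sketch what "working directly out of the mmap'ed AF_PACKET buffers" might
look like: instead of copying, each mbuf could be attached to a ring frame as
an external buffer, with the frame handed back to the kernel only when the
mbuf is freed.  Note that rte_pktmbuf_attach_extbuf() is a much later DPDK
API than anything available in this thread's timeframe, and frame_ref /
sketch_attach_frame are made-up names, so treat this purely as an illustration
of the idea, not a workable patch.)

#include <stdint.h>
#include <stdlib.h>
#include <linux/if_packet.h>
#include <rte_common.h>
#include <rte_mbuf.h>

/* Per-frame bookkeeping: the shared info the extbuf machinery needs, plus a
 * pointer back to the ring slot so it can be returned to the kernel. */
struct frame_ref {
	struct rte_mbuf_ext_shared_info shinfo;
	struct tpacket2_hdr *hdr;
};

/* Called when the last mbuf reference is freed: return the frame. */
static void
ring_frame_free_cb(void *addr __rte_unused, void *opaque)
{
	struct frame_ref *ref = opaque;

	ref->hdr->tp_status = TP_STATUS_KERNEL;
	free(ref);
}

static struct rte_mbuf *
sketch_attach_frame(struct rte_mempool *mp, struct tpacket2_hdr *hdr)
{
	struct rte_mbuf *mbuf = rte_pktmbuf_alloc(mp);
	struct frame_ref *ref;
	void *frame;
	uint16_t len;

	if (mbuf == NULL)
		return NULL;

	ref = malloc(sizeof(*ref));
	if (ref == NULL) {
		rte_pktmbuf_free(mbuf);
		return NULL;
	}
	ref->hdr = hdr;
	ref->shinfo.free_cb = ring_frame_free_cb;
	ref->shinfo.fcb_opaque = ref;
	rte_mbuf_ext_refcnt_set(&ref->shinfo, 1);

	frame = (uint8_t *)hdr + hdr->tp_mac;
	len = (uint16_t)hdr->tp_len;

	/* point the mbuf straight at the mmap'ed frame: no per-packet memcpy;
	 * a software PMD never does hardware DMA, so the iova value is nominal */
	rte_pktmbuf_attach_extbuf(mbuf, frame, (rte_iova_t)(uintptr_t)frame,
				  len, &ref->shinfo);
	mbuf->pkt_len = len;
	mbuf->data_len = len;
	return mbuf;
}

A real implementation would also have to cope with the application holding
mbufs for a long time, since a frame that never goes back to TP_STATUS_KERNEL
eventually stalls the kernel's side of the ring.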