On Mon, Aug 15, 2016 at 01:00:24PM -0700, Rich Lane wrote:
> Concurrent enqueue is an important performance optimization when the number
> of cores used for switching is different than the number of vhost queues.
> I've observed a 20% performance improvement compared to a strategy that
> binds queues to cores.
>
> The atomic cmpset is only executed when the application calls
> rte_vhost_enqueue_burst_mp. Benchmarks show no performance impact
> when not using concurrent enqueue.
>
> Mergeable RX buffers aren't supported by concurrent enqueue to minimize
> code complexity.

I think leaving mergeable RX unsupported would break things when mergeable
RX is enabled (which it actually is by default).

Besides that, as mentioned in last week's f2f talk, do you think adding
a new flag RTE_VHOST_USER_CONCURRENT_ENQUEUE (for rte_vhost_driver_register())
__might__ be a better idea? That would save us an API, though I don't object
to adding one.

	--yliu