> >>> This version of the patch seems to have a negative impact on
> >>> performance for burst traffic profile [1].
> >>> Benefits seen with the previous version (v2) were up to ~1.6x for
> >>> 1568-byte packets, compared to ~1.2x seen with the current design
> >>> (v3), as measured on new Intel hardware that supports DSA [2],
> >>> CPU @ 1.8 GHz.
> >>> The cause of the drop seems to be excessive vhost txq contention
> >>> across the PMD threads.
> >>
> >> So it means the Tx/Rx queue pairs aren't consumed by the same PMD
> >> thread. Can you confirm?
> >
> > Yes, the completion polls for a given txq happen on a single PMD
> > thread (the same thread where its corresponding rxq is being polled),
> > but other threads can submit (enqueue) packets on the same txq, which
> > leads to contention.
>
> Why can't this process be lockless?
> If we have to lock the device, maybe we can do both submission and
> completion from the thread that polls the corresponding Rx queue?
> Tx threads may enqueue mbufs to some lockless ring inside
> rte_vhost_enqueue_burst. The Rx thread may dequeue them and submit jobs
> to the DMA device and check completions. No locks required.
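If it helps the discussion, the hand-off Ilya describes might look roughly like the sketch below. This is an untested illustration, not a proposal for the actual code: the names txq_ring, RING_SIZE and the opaque struct mbuf are made up for the example, and a real implementation would presumably just use an rte_ring in multi-producer/single-consumer mode rather than hand-rolled atomics.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the lockless hand-off: several Tx PMD threads enqueue mbuf
 * pointers into a per-txq ring; the single Rx-polling thread dequeues
 * them, submits DMA jobs and checks completions.  All names here are
 * hypothetical; DPDK's rte_ring (MP/SC mode) provides this already. */

#define RING_SIZE 1024u                 /* must be a power of two */

struct mbuf;                            /* opaque stand-in for rte_mbuf */

struct txq_ring {
    _Atomic uint32_t prod_head;         /* claimed by Tx (producer) threads */
    _Atomic uint32_t prod_tail;         /* published, visible to consumer */
    _Atomic uint32_t cons;              /* advanced only by the Rx thread */
    struct mbuf *slots[RING_SIZE];
};

/* Multi-producer enqueue: claim a slot with CAS, write it, then wait for
 * earlier producers before publishing (the same scheme rte_ring uses). */
static bool
txq_ring_enqueue(struct txq_ring *r, struct mbuf *m)
{
    uint32_t head, next;

    do {
        head = atomic_load(&r->prod_head);
        next = head + 1;
        if (next - atomic_load(&r->cons) > RING_SIZE) {
            return false;               /* ring full */
        }
    } while (!atomic_compare_exchange_weak(&r->prod_head, &head, next));

    r->slots[head & (RING_SIZE - 1)] = m;

    /* Publish in order: wait until earlier enqueues are visible. */
    while (atomic_load(&r->prod_tail) != head) {
        /* spin; a real implementation would rte_pause() here */
    }
    atomic_store(&r->prod_tail, next);
    return true;
}

/* Single-consumer dequeue, called only from the Rx-polling thread,
 * which then submits the mbuf to the DMA device. */
static struct mbuf *
txq_ring_dequeue(struct txq_ring *r)
{
    uint32_t c = atomic_load(&r->cons);

    if (c == atomic_load(&r->prod_tail)) {
        return NULL;                    /* ring empty */
    }
    struct mbuf *m = r->slots[c & (RING_SIZE - 1)];
    atomic_store(&r->cons, c + 1);
    return m;
}
```

With this split, only the Rx-polling thread ever touches the DMA device and the txq completion state, so no lock is needed on the submission/completion path; contention is reduced to the CAS on the ring's producer head.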
Thank you for the comments, Ilya.

Hi Jiayu, Maxime,

Could I request your opinions on this from the vhost library perspective?

Thanks and regards,
Sunil

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
