> >>> This version of the patch seems to have a negative impact on
> >>> performance for the burst traffic profile [1].
> >>> Benefits seen with the previous version (v2) were up to ~1.6x for
> >>> 1568-byte packets, compared to ~1.2x seen with the current design
> >>> (v3), as measured on new Intel hardware that supports DSA [2],
> >>> CPU @ 1.8 GHz.
> >>> The cause of the drop seems to be excessive vhost txq contention
> >>> across the PMD threads.
> >>
> >> So that means the Tx/Rx queue pairs aren't serviced by the same PMD
> >> thread. Can you confirm?
> >
> > Yes, the completion polls for a given txq happen on a single PMD
> > thread (the same thread where its corresponding rxq is being polled),
> > but other threads can submit (enqueue) packets on the same txq, which
> > leads to contention.
> 
> Why can't this process be lockless?
> If we have to lock the device, maybe we can do both submission and
> completion from the thread that polls the corresponding Rx queue?
> Tx threads may enqueue mbufs to some lockless ring inside
> rte_vhost_enqueue_burst.  The Rx thread may dequeue them, submit jobs
> to the DMA device, and check completions.  No locks required.
> 

Thank you for the comments, Ilya.
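For reference, the lockless hand-off Ilya describes could be sketched as a bounded multi-producer/single-consumer ring: Tx threads reserve a slot and publish the mbuf pointer; the single PMD thread that owns the txq drains the ring and submits to the DMA device. This is only a minimal illustration using C11 atomics (Vyukov-style sequence numbers) — in a real implementation DPDK's rte_ring in MP/SC mode would likely be used instead, and the names here (mpsc_ring, ring_enqueue, ring_dequeue) are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 16 /* must be a power of two */

struct slot {
    atomic_size_t seq;  /* per-slot sequence number for lock-free sync */
    void *data;         /* the mbuf pointer being handed off */
};

struct mpsc_ring {
    struct slot slots[RING_SIZE];
    atomic_size_t head; /* shared by all producer (Tx) threads */
    size_t tail;        /* owned by the single consumer (PMD) thread */
};

static void ring_init(struct mpsc_ring *r)
{
    for (size_t i = 0; i < RING_SIZE; i++)
        atomic_store(&r->slots[i].seq, i);
    atomic_store(&r->head, (size_t) 0);
    r->tail = 0;
}

/* Called by any Tx thread: reserve a slot, then publish the mbuf. */
static bool ring_enqueue(struct mpsc_ring *r, void *mbuf)
{
    size_t pos = atomic_load_explicit(&r->head, memory_order_relaxed);
    for (;;) {
        struct slot *s = &r->slots[pos & (RING_SIZE - 1)];
        size_t seq = atomic_load_explicit(&s->seq, memory_order_acquire);
        intptr_t diff = (intptr_t) seq - (intptr_t) pos;
        if (diff == 0) {
            /* Slot is free: try to claim index 'pos'. */
            if (atomic_compare_exchange_weak_explicit(
                    &r->head, &pos, pos + 1,
                    memory_order_relaxed, memory_order_relaxed)) {
                s->data = mbuf;
                /* Release makes the data visible before seq flips. */
                atomic_store_explicit(&s->seq, pos + 1,
                                      memory_order_release);
                return true;
            }
        } else if (diff < 0) {
            return false; /* ring full */
        } else {
            pos = atomic_load_explicit(&r->head, memory_order_relaxed);
        }
    }
}

/* Called only by the Rx/PMD thread that owns the txq; returns NULL
 * when the ring is empty. */
static void *ring_dequeue(struct mpsc_ring *r)
{
    struct slot *s = &r->slots[r->tail & (RING_SIZE - 1)];
    size_t seq = atomic_load_explicit(&s->seq, memory_order_acquire);
    if ((intptr_t) seq - (intptr_t) (r->tail + 1) < 0)
        return NULL; /* empty */
    void *mbuf = s->data;
    /* Mark the slot reusable one full lap later. */
    atomic_store_explicit(&s->seq, r->tail + RING_SIZE,
                          memory_order_release);
    r->tail++;
    return mbuf;
}
```

Since only the owning PMD thread dequeues, submits to the DMA device, and polls completions, the txq itself is never touched concurrently and no lock is needed on the submission/completion path.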

Hi Jiayu, Maxime,

Could I request your opinions on this from the vhost library perspective?

Thanks and regards,
Sunil 
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
