On 2018/12/14 4:12 AM, Michael S. Tsirkin wrote:
On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
Hi:

This series tries to access virtqueue metadata through kernel virtual
address instead of copy_user() friends since they had too much
overheads like checks, spec barriers or even hardware feature
toggling.

Test shows about 24% improvement on TX PPS. It should benefit other
cases as well.
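
Roughly, the idea is to replace the per-access uaccess path with a plain load through a kernel mapping of the rings, along these lines (an illustration only, not the actual patches; the kernel-mapped pointer and its setup via get_user_pages()/vmap() are assumptions here):

#include <linux/compiler.h>
#include <linux/uaccess.h>
#include <linux/virtio_ring.h>

/* copy_user() path: every access pays access_ok()/SMAP/spec-barrier costs. */
static int get_avail_idx_uaccess(struct vring_avail __user *avail,
				 __virtio16 *idx)
{
	return __get_user(*idx, &avail->idx);
}

/*
 * Direct path: 'avail_kva' is assumed to be a kernel mapping of the same
 * ring (e.g. pages pinned with get_user_pages() and mapped with vmap()
 * when the ring address is set up), so the hot-path read is a plain load.
 */
static __virtio16 get_avail_idx_kva(struct vring_avail *avail_kva)
{
	return READ_ONCE(avail_kva->idx);
}

The win comes from doing the validation and mapping work once at setup time instead of on every metadata access.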

Please review
I think the idea of speeding up userspace access is a good one.
However, I think that moving all checks to the start is way too aggressive.


So do packet sockets and AF_XDP. Anyway, sharing the address space and accessing it directly is the fastest way. Performance is the major consideration when people choose a backend. Compared to a userspace implementation, vhost does not have security advantages at any level. If vhost remains slow, people will start to develop backends based on e.g. AF_XDP.


Instead, let's batch things up but let's not keep them
around forever.
Here are some ideas:


1. Disable preemption, process a small number of small packets
    directly in an atomic context. This should cut latency
    down significantly; the tricky part is to only do it
    under light load and to disable it for the streaming
    case, otherwise it's unfair. This might fail; if it
    does, just bounce things out to a thread.
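
    Roughly something like this (a sketch only; the handlers below are
    made-up names for the illustration, not existing vhost functions):

#include <linux/preempt.h>

#define ATOMIC_BUDGET	4

int handle_one_packet_atomic(void *vq);	/* hypothetical, must not sleep */
bool more_work_pending(void *vq);	/* hypothetical */
void bounce_to_worker(void *vq);	/* hypothetical: defer to vhost worker */

static void handle_light_load(void *vq)
{
	int i;

	preempt_disable();
	for (i = 0; i < ATOMIC_BUDGET; i++) {
		/* Stop on error or when the ring is empty. */
		if (handle_one_packet_atomic(vq) <= 0)
			break;
	}
	preempt_enable();

	/* Streaming load or any failure: fall back to the worker thread. */
	if (more_work_pending(vq))
		bounce_to_worker(vq);
}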


I'm not sure what context you mean here. Is this for the TX path of TUN? A fundamental difference is that my series targets extremely heavy load, not light load; 100% CPU for vhost is expected.



2. Switch to unsafe_put_user/unsafe_get_user,
    and batch up multiple accesses.
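
    For example, something along these lines (a sketch, not existing
    vhost code; it assumes a kernel where user_access_begin(ptr, len)
    performs the access_ok() check, which the 2018-era API did not):

#include <linux/uaccess.h>
#include <linux/virtio_ring.h>

/*
 * Read avail->idx and one ring entry inside a single
 * user_access_begin()/user_access_end() window, so SMAP is toggled
 * once and both reads share one fault-handling label.
 */
static int get_avail_batch(struct vring_avail __user *avail,
			   unsigned int slot,
			   __virtio16 *idx, __virtio16 *head)
{
	size_t len = sizeof(*avail) + (slot + 1) * sizeof(__virtio16);

	if (!user_access_begin(avail, len))
		return -EFAULT;

	unsafe_get_user(*idx, &avail->idx, efault);
	unsafe_get_user(*head, &avail->ring[slot], efault);

	user_access_end();
	return 0;

efault:
	user_access_end();
	return -EFAULT;
}

    The single efault label here is also, in effect, the shared fixup
    point that point 3 below asks for.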


As I said, this only helps if we can batch accesses across two or three of the different places (avail, descriptor and used); batching accesses to a single place like used won't help. I'm not even sure this can be done considering the case of packed virtqueue, where we have a single descriptor ring; batching through the unsafe helpers may not help there since it's equivalent to the safe ones. It also requires non-trivial refactoring of vhost, and such refactoring may itself have a noticeable impact (e.g. it may lead to regressions).



3. Allow adding a fixup point manually,
    such that multiple independent get_user accesses
    can get a single fixup (will allow better compiler
    optimizations).


So for metadata access, I don't see how what you suggest here can help in the case of a heavy workload.

For data access, this may help, but I've already tried batching the data copy to reduce SMAP/spec barriers in vhost-net and I didn't see a performance improvement.

Thanks





Jason Wang (3):
   vhost: generalize adding used elem
   vhost: fine grain userspace memory accessors
   vhost: access vq metadata through kernel virtual address

  drivers/vhost/vhost.c | 281 ++++++++++++++++++++++++++++++++++++++----
  drivers/vhost/vhost.h |  11 ++
  2 files changed, 266 insertions(+), 26 deletions(-)

--
2.17.1
