2015-12-16 10:34 GMT+01:00 Paolo Bonzini <pbonz...@redhat.com>:
>
>
> On 16/12/2015 10:28, Vincenzo Maffione wrote:
>> Assuming my TX experiments with a disconnected backend (and with CPU
>> dynamic performance scaling disabled, etc.):
>> 1) after patches 1 and 2, the virtio bottleneck jumps from ~1 Mpps to
>>    1.910 Mpps.
>> 2) after patches 1, 2 and 3, the virtio bottleneck jumps to 2.039 Mpps.
>>
>> So I see an improvement from patch 3, and I guess it's because we avoid
>> an additional memory translation and the related overhead. I believe
>> that avoiding the memory translation is more beneficial than avoiding
>> the variable-sized memcpy.
>> I'm not surprised by that: taking a brief look at what happens under
>> the hood when you call an access_memory() function, it looks like a
>> lot of operations.
>
> Great, thanks for confirming!
>
> Paolo
No problem. I have some additional (orthogonal) curiosities:

1) Assuming "hw/virtio/dataplane/vring.c" is what I think it is (the VQ
data structures made directly accessible in host virtual memory, with
the guest-physical-to-host-virtual mapping done statically at setup
time), why isn't QEMU using this approach for virtio-net as well? I see
it is used by virtio-blk only. (A minimal sketch of the idea is
appended below.)

2) In any case (vring or not), QEMU dynamically maps data buffers from
guest physical memory for each descriptor to be processed: e1000 uses
pci_dma_read()/pci_dma_write(), virtio uses
cpu_physical_memory_map()/cpu_physical_memory_unmap(), and vring uses
the more specialized vring_map()/vring_unmap(). All of these go through
expensive lookups and related operations to perform the address
translation. Have you considered caching the translation results to
remove this bottleneck (maybe just for virtio devices)? Or is there any
consistency- or migration-related problem that would create issues?
Just to give an example of what I'm talking about:
https://github.com/vmaffione/qemu/blob/master/hw/net/e1000.c#L349-L423
(a sketch of the same caching idea is also appended below).

At very high packet rates, once notifications (kicks and interrupts)
have been amortized in some way, memory translation becomes the major
bottleneck. And this (points 1 and 2) is why the QEMU virtio
implementation cannot achieve the same throughput as bhyve does
(5-6 Mpps or more, IIRC).

Cheers,
  Vincenzo

--
Vincenzo Maffione
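P.S.: to make point 1 concrete, here is a minimal self-contained sketch
of the "translate once at setup" idea. Everything in it is a stand-in:
map_guest_region() plays the role that vring_map() plays in
hw/virtio/dataplane/vring.c, and fake_guest_ram[] replaces real guest
memory so the sketch compiles on its own; only struct vring_desc follows
the virtio specification layout.

#include <stdint.h>
#include <stddef.h>

typedef uint64_t hwaddr;

/* Fake guest RAM, identity-mapped, so the example is self-contained. */
static uint8_t fake_guest_ram[65536];

/* Stand-in for the one-time guest-physical-to-host-virtual mapping. */
static void *map_guest_region(hwaddr gpa, size_t len)
{
    if (gpa + len > sizeof(fake_guest_ram)) {
        return NULL;
    }
    return fake_guest_ram + gpa;
}

/* Virtio descriptor layout, as defined by the virtio specification. */
struct vring_desc {
    uint64_t addr;   /* guest-physical address of the data buffer */
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

struct vring_state {
    struct vring_desc *desc;   /* host pointer, computed once at setup */
    unsigned int num;
};

/* Setup time: pay the translation cost exactly once. */
static int vring_setup(struct vring_state *vr, hwaddr desc_gpa,
                       unsigned int num)
{
    vr->desc = map_guest_region(desc_gpa, num * sizeof(*vr->desc));
    vr->num = num;
    return vr->desc ? 0 : -1;
}

/*
 * Datapath: descriptors are read through a plain pointer, with no
 * per-descriptor address translation.
 */
static struct vring_desc *vring_desc_at(struct vring_state *vr,
                                        unsigned int i)
{
    return &vr->desc[i % vr->num];
}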
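And here is a sketch of the per-device translation cache from point 2,
in the same spirit as the e1000 prototype linked above. The names
(tcache_*, slow_translate()) are hypothetical; in real QEMU code the
slow path would be cpu_physical_memory_map() or a pci_dma_* helper.

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

typedef uint64_t hwaddr;

/* Fake guest RAM again, so the sketch is self-contained. */
static uint8_t fake_guest_ram[65536];

/*
 * Stand-in for the expensive lookup, i.e. what
 * cpu_physical_memory_map() or pci_dma_read() end up doing.
 */
static void *slow_translate(hwaddr gpa, uint64_t len)
{
    if (gpa + len > sizeof(fake_guest_ram)) {
        return NULL;
    }
    return fake_guest_ram + gpa;
}

/* One cached guest-physical-to-host-virtual mapping. */
struct tcache {
    hwaddr   gpa;    /* guest-physical base of the cached region */
    uint64_t len;    /* length of the cached region */
    uint8_t *hva;    /* host-virtual address it translates to */
    bool     valid;
};

/*
 * Fast path: if the requested range falls inside the cached region,
 * the host address is computed with plain arithmetic. On a miss, do
 * the expensive translation once and remember the result. (A real
 * implementation would cache the whole RAM block containing gpa,
 * not just the requested range.)
 */
static void *tcache_translate(struct tcache *tc, hwaddr gpa, uint64_t len)
{
    if (tc->valid && gpa >= tc->gpa && gpa + len <= tc->gpa + tc->len) {
        return tc->hva + (gpa - tc->gpa);   /* cache hit, no lookup */
    }
    tc->hva = slow_translate(gpa, len);
    tc->valid = (tc->hva != NULL);
    if (tc->valid) {
        tc->gpa = gpa;
        tc->len = len;
    }
    return tc->hva;
}

/*
 * The cache must be dropped whenever the guest memory layout can
 * change (memory hotplug, reset, before migration): this is exactly
 * the consistency/migration question raised above.
 */
static void tcache_invalidate(struct tcache *tc)
{
    tc->valid = false;
}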