2015-12-16 12:02 GMT+01:00 Michael S. Tsirkin <m...@redhat.com>:
> On Wed, Dec 16, 2015 at 11:39:46AM +0100, Vincenzo Maffione wrote:
>> 2015-12-16 10:34 GMT+01:00 Paolo Bonzini <pbonz...@redhat.com>:
>> >
>> >
>> > On 16/12/2015 10:28, Vincenzo Maffione wrote:
>> >> In my TX experiments with a disconnected backend (and with dynamic
>> >> CPU performance scaling disabled, etc.):
>> >>   1) after patches 1 and 2, the virtio bottleneck jumps from ~1 Mpps
>> >> to 1.910 Mpps.
>> >>   2) after patches 1, 2 and 3, the virtio bottleneck jumps to 2.039 Mpps.
>> >>
>> >> So I see an improvement from patch 3, and I guess it's because we avoid
>> >> an additional memory translation and the related overhead. I believe that
>> >> avoiding the memory translation is more beneficial than avoiding the
>> >> variable-sized memcpy.
>> >> I'm not surprised by that: taking a brief look at what happens
>> >> under the hood when you call an access_memory()-like function, it
>> >> looks like a lot of operations.
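
To spell out what I mean by "a lot of operations": every access first has
to locate the RAM region that contains the guest physical address before a
host pointer can even be computed, on top of things like reference counting
and dirty memory tracking. A toy model of that per-access walk (not QEMU's
actual code, all names here are invented) looks roughly like this:

#include <stdint.h>
#include <stddef.h>

/* Invented stand-in for a RAM region in the guest memory map. */
typedef struct {
    uint64_t gpa_start;   /* guest-physical base of the region */
    uint64_t size;        /* region length in bytes */
    uint8_t *hva_start;   /* host-virtual mapping of the region */
} ToyMemRegion;

static ToyMemRegion toy_regions[16];
static int toy_nr_regions;

/* Every single access repeats a lookup like this one (the real code
 * also has to handle MMIO dispatch, endianness, dirty logging, ...). */
static void *toy_translate(uint64_t gpa, uint64_t len)
{
    for (int i = 0; i < toy_nr_regions; i++) {
        ToyMemRegion *r = &toy_regions[i];
        if (gpa >= r->gpa_start && gpa + len <= r->gpa_start + r->size) {
            return r->hva_start + (gpa - r->gpa_start);
        }
    }
    return NULL;   /* not guest RAM: would need the slow path */
}
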
>> >
>> > Great, thanks for confirming!
>> >
>> > Paolo
>>
>> No problem.
>>
>> I have some additional (orthogonal) curiosities:
>>
>>   1) Assuming "hw/virtio/dataplane/vring.c" is what I think it is (the VQ
>> data structures directly accessible in host virtual memory, with the
>> guest-physical-to-host-virtual mapping done statically at setup time),
>> why isn't QEMU using this approach for virtio-net as well? I see it is
>> used only by virtio-blk.
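
(For context, the difference I have in mind, sketched with invented names
rather than the actual hw/virtio/dataplane/vring.c code, is: translate the
ring once when the device is set up, and then touch the descriptors through
plain host pointers on the datapath.)

#include <stdint.h>

/* Invented, simplified descriptor layout, just for illustration. */
struct toy_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

struct toy_vring {
    struct toy_desc *desc;   /* host pointer, computed once at setup */
    unsigned int num;
};

/* Setup time: a single guest-physical-to-host-virtual translation
 * covering the whole ring; ring_hva would come from that lookup. */
static void toy_vring_setup(struct toy_vring *vr, void *ring_hva,
                            unsigned int num)
{
    vr->desc = ring_hva;
    vr->num = num;
}

/* Datapath: no address translation at all, just a load. */
static uint64_t toy_desc_addr(const struct toy_vring *vr, unsigned int i)
{
    return vr->desc[i % vr->num].addr;
}
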
>
> Because on Linux, nothing would be gained compared to using vhost-net
> in the kernel or vhost-user with DPDK.  virtio-net is there for non-Linux
> hosts; keeping it simple is important to avoid e.g. security problems.
> Same as serial, etc.

Ok, thanks for the clarification.

>
>>   2) In any case (vring or not), QEMU dynamically maps data buffers
>> from guest physical memory, for each descriptor to be processed: e1000
>> uses pci_dma_read/pci_dma_write, virtio uses
>> cpu_physical_memory_map()/cpu_physical_memory_unmap(), vring uses the
>> more specialized vring_map()/vring_unmap(). All of these go through
>> expensive lookups and related operations to do the address
>> translation.
>> Have you considered the possibility of caching the translation result to
>> remove this bottleneck (maybe just for virtio devices)? Or is there any
>> consistency or migration-related problem that would create issues?
>> Just to give an example of what I'm talking about:
>> https://github.com/vmaffione/qemu/blob/master/hw/net/e1000.c#L349-L423.
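
By "cache the translation result" I mean something along these lines (a
rough sketch with invented names, assuming a single RAM region for
simplicity; it is not the code from the link above):

#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

/* Last successful gpa->hva translation, kept e.g. per device or queue. */
typedef struct {
    uint64_t gpa_base;
    uint64_t size;
    uint8_t *hva_base;   /* NULL means "no valid cached mapping" */
} ToyXlatCache;

/* Pretend the guest has a single RAM region; the real slow path would
 * be the full lookup done by cpu_physical_memory_map() and friends. */
#define TOY_RAM_GPA   0x40000000ULL
#define TOY_RAM_SIZE  (64ULL << 20)
static uint8_t *toy_ram;

static void toy_ram_setup(void)
{
    toy_ram = malloc(TOY_RAM_SIZE);
}

static uint8_t *toy_slow_translate(ToyXlatCache *c, uint64_t gpa,
                                   uint64_t len)
{
    if (gpa < TOY_RAM_GPA || gpa + len > TOY_RAM_GPA + TOY_RAM_SIZE) {
        return NULL;   /* outside RAM: no caching, use the normal path */
    }
    c->gpa_base = TOY_RAM_GPA;
    c->size = TOY_RAM_SIZE;
    c->hva_base = toy_ram;
    return c->hva_base + (gpa - c->gpa_base);
}

/* Fast path: reuse the previous translation while buffers keep falling
 * into the same region, which is the common case for a virtio queue.
 * The cached mapping would have to be invalidated whenever the guest
 * memory map changes, which is exactly the consistency question above. */
static uint8_t *toy_cached_translate(ToyXlatCache *c, uint64_t gpa,
                                     uint64_t len)
{
    if (c->hva_base && gpa >= c->gpa_base &&
        gpa + len <= c->gpa_base + c->size) {
        return c->hva_base + (gpa - c->gpa_base);
    }
    return toy_slow_translate(c, gpa, len);
}
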
>>
>> At very high packet rates, once notifications (kicks and interrupts)
>> have been amortized in some way, memory translation becomes the major
>> bottleneck. And this (points 1 and 2) is why QEMU's virtio
>> implementation cannot achieve the same throughput as bhyve does
>> (5-6 Mpps or more, IIRC).
>>
>> Cheers,
>>   Vincenzo
>>
>>
>>
>> --
>> Vincenzo Maffione



-- 
Vincenzo Maffione
