On Wed, Dec 16, 2015 at 11:39:46AM +0100, Vincenzo Maffione wrote:
> 2015-12-16 10:34 GMT+01:00 Paolo Bonzini <pbonz...@redhat.com>:
> >
> >
> > On 16/12/2015 10:28, Vincenzo Maffione wrote:
> >> Assuming my TX experiments with disconnected backend (and I disable
> >> CPU dynamic scaling of performance, etc.):
> >> 1) after patches 1 and 2, the virtio bottleneck jumps from ~1 Mpps to
> >> 1.910 Mpps.
> >> 2) after patches 1, 2 and 3, the virtio bottleneck jumps to 2.039 Mpps.
> >>
> >> So I see an improvement from patch 3, and I guess it's because we avoid
> >> an additional memory translation and the related overhead. I believe
> >> that avoiding the memory translation is more beneficial than avoiding
> >> the variable-sized memcpy.
> >> I'm not surprised by that, because taking a brief look at what happens
> >> under the hood when you call an access_memory() function, it looks
> >> like a lot of operations.
> >
> > Great, thanks for confirming!
> >
> > Paolo
> 
> No problem.
> 
> I have some additional (orthogonal) curiosities:
> 
> 1) Assuming "hw/virtio/dataplane/vring.c" is what I think it is (VQ
> data structures directly accessible in host virtual memory, with the
> guest-physical-to-host-virtual mapping done statically at setup time),
> why isn't QEMU using this approach also for virtio-net? I see it is
> used by virtio-blk only.
Because on Linux, nothing would be gained compared to using vhost-net in
the kernel or vhost-user with DPDK. virtio-net is there for non-Linux
hosts, and keeping it simple is important to avoid e.g. security problems.
Same as serial, etc.

> 2) In any case (vring or not), QEMU dynamically maps data buffers
> from guest physical memory for each descriptor to be processed: e1000
> uses pci_dma_read()/pci_dma_write(), virtio uses
> cpu_physical_memory_map()/cpu_physical_memory_unmap(), and vring uses
> the more specialized vring_map()/vring_unmap(). All of these go through
> expensive lookups and related operations to do the address translation.
> Have you considered the possibility of caching the translation result
> to remove this bottleneck (maybe just for virtio devices)? Or is there
> any consistency or migration-related problem that would create issues?
> Just to give an example of what I'm talking about:
> https://github.com/vmaffione/qemu/blob/master/hw/net/e1000.c#L349-L423.
> 
> At very high packet rates, once notifications (kicks and interrupts)
> have been amortized in some way, memory translation becomes the major
> bottleneck. This (points 1 and 2) is why the QEMU virtio implementation
> cannot achieve the same throughput as bhyve does (5-6 Mpps or more,
> IIRC).
> 
> Cheers,
> Vincenzo
> 
> 
> 
> -- 
> Vincenzo Maffione