On 22.10.2018 at 13:26, Harry Schmalzbauer wrote:
…
Test-Runs:
Each hypervisor had only the one bench guest running; no other tasks/guests were active besides the system's native standard processes. Since the time between powering up the guest and finishing logon differed notably (~5s vs. ~20s) from one host to the other, I did a quick synthetic IO test beforehand. I've been using IOmeter ever since heise.de published a great test pattern called IOmix, about 18 years ago I guess.  This access pattern has always reflected system performance for interactive, non-calculation-centric use very well and is still my favourite, even though throughput and latency have changed by some orders of magnitude during the last decade (I had also defined an fio job which mimics IOmix and shows reasonable relative results, but I still prefer IOmeter for homogeneous IO benchmarking).
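Roughly, an fio job along the following lines is what I mean by an IOmix-like mixed workload. The read/write split, block-size split, queue depth and target device below are purely illustrative assumptions, not the actual IOmix definition:

    fio --name=iomix-like --filename=/dev/da0 --direct=1 \
        --ioengine=posixaio --rw=randrw --rwmixread=70 \
        --bssplit=4k/40:16k/30:64k/30 --iodepth=4 \
        --runtime=300 --time_based --group_reporting

The point of such a job is only to get comparable relative numbers across hosts, not to reproduce IOmeter's absolute figures.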

The result differs by about a factor of 7 :-(
~3800iops & 69MB/s on bhyve (guest CPU usage: 42% IOmeter + 12% irq)
                vs.
~29000iops & 530MB/s on ESXi (guest CPU usage: 11% IOmeter + 19% irq)


    [with the debug kernel and debug malloc, the numbers are 3000iops & 56MB/s;
     virtio-blk instead of ahci-hd results in 5660iops & 104MB/s with the non-debug kernel,
     much better, but with even higher CPU load and still a factor of 4 slower]
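The only intended difference between those two bhyve runs is the disk emulation selected on the bhyve command line; a minimal sketch, where the slot number and the backing device are assumptions and everything else is omitted:

    # ahci-hd emulation (first run):
    bhyve ... -s 4,ahci-hd,/dev/da0 ...
    # virtio-blk emulation (second run), same backing device:
    bhyve ... -s 4,virtio-blk,/dev/da0 ...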

What I don't understand is why the IOmeter process differs that much in CPU utilization!?!  It's the same binary on the same OS (guest) with the same OS driver and the same underlying hardware; "just" the AHCI emulation and the VMM differ...

Unfortunately, the picture for virtio-net vs. vmxnet3 is similarly sad.
Copying a single 5GB file from a CIFS share to the DB SSD results in 100% guest CPU usage, of which 40% are irqs, and the throughput maxes out at ~40MB/s. When copying the same file from the same source with the same guest on the same host, but with the host booted into ESXi, there's 20% guest CPU usage while transferring 111MB/s, the uplink GbE limit.
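On the bhyve side the guest NIC is a virtio-net device on a tap(4) backend; the setup was along these lines, where the interface names, the bridge to the physical uplink and the slot number are assumptions:

    # host-side backend (names assumed):
    ifconfig tap0 create
    ifconfig bridge0 create
    ifconfig bridge0 addm em0 addm tap0 up
    # guest NIC:
    bhyve ... -s 5,virtio-net,tap0 ...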

These synthetic benchmarks explain the perceptible difference between the two hypervisors when using a guest quite well, but
…

To add an additional and, at least for me, rather surprising result:

Virtualbox provides
'VBoxManage internalcommands createrawvmdk -filename "testbench_da0.vmdk" -rawdisk /dev/da0'

So I could use exactly the same test setup as for ESXi and bhyve.
Virtualbox on FreeBSD (running on the same host installation as bhyve) performed quite well, although it doesn't survive an IOmix benchmark run when "testbench_da0.vmdk" (the "raw" SSD-R0 array) is hooked up to the emulated SATA controller. Connected to the emulated SAS controller (LSI1068), however, it runs without problems and achieves 9600iops @ 185MB/s with 1% IOmeter + 7% irq CPU utilization (yes, 1% vs. 42% for the IOmeter load). That is still far away from what ESXi provides, but almost double the performance of virtio-blk with bhyve and, most importantly, with much less load (host and guest show exactly the same low values, as opposed to the very high loads shown on host and guest with bhyve:virtio-blk). The HDtune random-access benchmark also shows the factor of 2, linearly over all block sizes.
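For anyone wanting to reproduce this: attaching the raw-disk vmdk to the emulated LSI1068 SAS controller can be done with VBoxManage roughly like this; the VM name "testbench", the controller name and the port are assumptions:

    VBoxManage storagectl "testbench" --name "SAS" --add sas --controller LSILogicSAS
    VBoxManage storageattach "testbench" --storagectl "SAS" --port 0 --device 0 \
        --type hdd --medium testbench_da0.vmdk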

Virtualbox's virtio-net setup gives ~100MB/s with peaks at 111MB/s and ~40% CPU load. The guest uses the same driver as with bhyve:virtio-net, while the backend of virtualbox:virtio-net is vboxnetflt utilizing netgraph and vboxnetadp.ko, vs. tap(4) with bhyve. So not only is the IO efficiency remarkably better (lower throughput than ESXi, but at much lower CPU utilization), the network performance is as well.  Even low-bandwidth RDP sessions via the GbE LAN suffer from micro-hangs under bhyve with virtio-net.  And 40MB/s transfers cause 100% CPU load on bhyve; both runs had exactly the same Windows virtio-net driver in use (RedHat 141).

Conclusion: Virtualbox vs. ESXi shows an overall efficiency factor of roughly 0.5, while bhyve vs. ESXi shows roughly 0.25. I tried to provide a test environment with the shortest possible hardware paths.  At least the benchmarks were run 100% reproducibly with the same binaries.

So I'm really interested if
…
Are these performance issues (related to the emulation only, I guess?) well known?  I mean, does somebody know what needs to be done in which area in order to catch up with the other results, so that it's just a matter of time/resources? Or are these results surprising, and extensive analysis must be done before anybody can tell how to fix the IO limitations?

Is the root cause of the problematically low virtio-net throughput perhaps the same as that of the disk IO limits?  Both really hurt in my use case, and the host isn't comparatively idle either; it actually shows higher load with lower results.  So even if the lower perceived performance were considered tolerable, the guest-per-host density would only be half as high.

Thanks,

-harry
