On 22.10.2018 at 13:26, Harry Schmalzbauer wrote:
…
Test-Runs:
Each hypervisor had only the single bench guest running; no other
tasks/guests were active besides the system's standard processes.
Since the time between powering up the guest and finishing logon
differed notably (~5 s vs. ~20 s) from one host to the other, I did a
quick synthetic IO test beforehand.
I'm using IOmeter, since heise.de published a great test pattern called
IOmix – about 18 years ago, I guess. This access pattern has always
reflected system performance very well for human computer usage with
non-calculation-centric applications, and it is still my favourite,
even though throughput and latency have changed by some orders of
magnitude over the last decade. (I had also defined something for "fio"
which mimics IOmix and shows reasonable relative results, but I still
prefer IOmeter for homogeneous IO benchmarking.)
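For illustration only, a fio job along those lines could look like the
following sketch – a mixed random read/write pattern spread over several
block sizes. Every parameter here is a placeholder of mine, not my
actual IOmix port:

```ini
; iomix-like.fio — purely illustrative mixed-IO job, NOT the real IOmix
[global]
ioengine=posixaio
direct=1
time_based=1
runtime=60

[iomix-like]
filename=/dev/da0          ; target device is a placeholder
rw=randrw
rwmixread=70               ; assumed read/write mix
bssplit=512/10:4k/60:64k/20:1m/10   ; assumed block-size distribution
iodepth=4
```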
The results differ by about a factor of 7 :-(
~3800 IOPS & 69 MB/s (CPU guest usage: 42% IOmeter + 12% IRQ) [bhyve]
vs.
~29000 IOPS & 530 MB/s (CPU guest usage: 11% IOmeter + 19% IRQ) [ESXi]
[With the debug kernel and debug malloc, the bhyve numbers are
3000 IOPS & 56 MB/s; virtio-blk instead of ahci,hd: results in
5660 IOPS & 104 MB/s with the non-debug kernel
– much better, but with even higher CPU load, and still a factor of 4
slower.]
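For reference, the two disk backends compared above differ only in the
bhyve device slot. The sketch below is hypothetical – VM name, slot
numbers, memory size and paths are placeholders, not my actual setup:

```sh
# AHCI emulation (the slower case):
bhyve -c 2 -m 4G -H \
  -s 0,hostbridge -s 1,lpc \
  -s 2,ahci,hd:/dev/da0 \
  -s 3,virtio-net,tap0 \
  -l com1,stdio testbench

# virtio-blk instead (the faster, but still limited case):
bhyve -c 2 -m 4G -H \
  -s 0,hostbridge -s 1,lpc \
  -s 2,virtio-blk,/dev/da0 \
  -s 3,virtio-net,tap0 \
  -l com1,stdio testbench
```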
What I don't understand is why the IOmeter process differs that much
in CPU utilization!?! It's the same binary on the same (guest) OS,
with the same OS driver and the same underlying hardware – "just" the
AHCI emulation and the VMM differ...
Unfortunately, the picture for virtio-net vs. vmxnet3 is similarly sad.
Copying a single 5 GB file from a CIFS share to the DB SSD results in
100% guest CPU usage, of which 40% are IRQs, and throughput maxes out
at ~40 MB/s.
When copying the same file from the same source with the same guest on
the same host, but with the host booted into ESXi, there's 20% guest
CPU usage while transferring 111 MB/s – the GbE uplink limit.
These synthetic benchmarks explain very well the perceptible
difference when using a guest under the two hypervisors, but
…
To add an additional and, at least for me, rather surprising result:
Virtualbox provides
'VBoxManage internalcommands createrawvmdk -filename
"testbench_da0.vmdk" -rawdisk /dev/da0'
So I could use exactly the same test setup as for ESXi and bhyve.
FreeBSD VirtualBox (running on the same host installation as bhyve)
performed quite well, although it doesn't survive an IOmix benchmark
run when "testbench_da0.vmdk" (the "raw" SSD R0 array) is hooked up to
the SATA controller.
But connected to the emulated SAS controller (LSI1068), it runs
without problems and yields 9600 IOPS @ 185 MB/s with 1% IOmeter + 7%
IRQ CPU utilization (yes, 1% vs. 42% for the IOmeter load).
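The raw-disk plus SAS-controller setup described above can be sketched
roughly as follows; the VM name "testbench", controller name, and port
numbers are placeholders of mine, not necessarily my exact invocation:

```sh
# Wrap the raw SSD array in a vmdk descriptor:
VBoxManage internalcommands createrawvmdk \
    -filename "testbench_da0.vmdk" -rawdisk /dev/da0

# Add an emulated LSI1068 SAS controller (attaching the raw disk to
# the SATA controller made the IOmix run crash) and attach the disk:
VBoxManage storagectl testbench --name "SAS" --add sas \
    --controller LSILogicSAS
VBoxManage storageattach testbench --storagectl "SAS" \
    --port 0 --device 0 --type hdd --medium testbench_da0.vmdk
```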
Still far from what ESXi provides, but almost double the performance
of virtio-blk with bhyve, and, most importantly, with much less load
(host and guest show exactly the same low values, as opposed to the
very high loads shown on host and guest with bhyve:virtio-blk).
The HDtune random-access benchmark also shows the factor of 2,
consistently across all block sizes.
VirtualBox's virtio-net setup gives ~100 MB/s, with peaks at 111 MB/s,
and ~40% CPU load.
The guest uses the same driver as with bhyve:virtio-net, while the
backend of virtualbox:virtio-net is vboxnetflt, utilizing netgraph and
vboxnetadp.ko, vs. tap(4).
So not only is the IO efficiency remarkably better (lower throughput,
but also much lower CPU utilization), but so is the network
performance.
Even low-bandwidth RDP sessions via GbE LAN suffer from micro-hangs
under bhyve with virtio-net. And 40 MB/s transfers cause 100% CPU load
on bhyve – both runs had exactly the same Windows virtio-net driver in
use (RedHat 141).
Conclusion: VirtualBox vs. ESXi shows an overall efficiency factor of
roughly 0.5, while bhyve vs. ESXi shows roughly 0.25.
I tried to provide a test environment with the shortest possible
hardware paths. At least the benchmarks were 100% reproducible with
the same binaries.
So I'm really interested if
…
Are these performance issues (emulation-related only, I guess?) well
known? I mean, does somebody know what needs to be done in which area
in order to catch up with the other results, so that it's just a
matter of time/resources?
Or are these results surprising, and must extensive analysis be done
before anybody can tell how to fix the IO limitations?
Is the root cause of the problematically low virtio-net throughput
perhaps the same as for the disk IO limits? Both really hurt in my
use case, and the host is not idling in proportion, but even shows
higher load with lower results. So even if the lower
user-experience performance were considered tolerable, the guest/host
density would only be half.
Thanks,
-harry
_______________________________________________
freebsd-virtualization@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to
"freebsd-virtualization-unsubscr...@freebsd.org"