I did a bit of work on this back in early 2016 and wrote a paper which analyzes what Intel were doing with Clear Containers back then, and how it fitted with the more distribution-centric view of Fedora and Red Hat, i.e. that we ideally want a single qemu binary, a single kernel, a single SeaBIOS, etc.

You can read the paper here:

http://oirase.annexia.org/tmp/paper.pdf

and the source is here:

http://git.annexia.org/?p=libguestfs-talks.git;a=tree;f=2016-eng-talk;h=5a0a29ceb1e9db39539669717ec06e4f94eba086;hb=HEAD

To address a few points from this thread:

* As Paolo mentioned, one problem is that we link qemu to so many
  libraries, and glibc / ELF dynamic loading is very slow.
  Modularizing qemu could help here.  Reducing symbol interposition
  (-fvisibility=hidden) in more libraries would help a bit.  Linker
  security features enabled in downstream distros don't help.
  Rewriting ELF to be less crazy would help a lot, but good luck
  there :-)

* As Stefan & Paolo mentioned, it would be nice if SeaBIOS was faster
  in the default configuration.  I ended up compiling a special
  minimal SeaBIOS which saved a load of time, mainly by not probing
  PCI unnecessarily IIRC.

* Considerable time is taken in booting the kernel, and that's mostly
  in running all the initcall functions.  We wanted to use a Fedora
  distro kernel, but unfortunately many subsystems do loads of
  initcall work which is run even when that subsystem is not used.
  This is why compiling a custom kernel (compiling out these
  subsystems) is enticing.  Parallelizing initialization could help
  here, although at the moment using -smp slows things down, and
  parallelizing was rejected upstream IIRC.

* PCI config space probing is really slow.  Unfortunately
  accelerating it in the kernel doesn't seem either very easy or very
  acceptable to KVM upstream: the use case is rather narrow & the
  implementation seems like it would be very complex.  I also wrote a
  parallelizing PCI probe for Linux which helped a bit but wasn't
  upstream material.

* udev is another huge problem.  It's slow, it's monolithic, and it
  resists modifications such as modularization or removing parts.

* Debugging over the UART is slow.
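
To make the initcall point above a bit more concrete, here is a rough sketch of how one might rank initcalls by time taken. It is not taken from the paper. It assumes the guest kernel was booted with the initcall_debug parameter and that the log contains the usual "initcall NAME+0x.../0x... returned N after M usecs" lines; the function names (slowest_initcalls, total_initcall_usecs) are made up for illustration.

```python
import re
import sys
from typing import List, Tuple

# Assumed log format (kernel booted with initcall_debug), for example:
#   [    0.512345] initcall pci_subsys_init+0x0/0x50 returned 0 after 1234 usecs
# Initcalls from modules carry an extra " [module]" after the symbol.
INITCALL_RE = re.compile(
    r"initcall (?P<name>[^\s+]+)\S*(?: \[[^\]]+\])?"
    r" returned (?P<ret>-?\d+) after (?P<usecs>\d+) usecs"
)


def parse_initcalls(log_text: str) -> List[Tuple[str, int]]:
    """Extract (initcall name, microseconds) pairs in log order."""
    return [
        (m.group("name"), int(m.group("usecs")))
        for m in INITCALL_RE.finditer(log_text)
    ]


def slowest_initcalls(log_text: str, top: int = 10) -> List[Tuple[str, int]]:
    """Return the `top` slowest initcalls, slowest first."""
    timings = parse_initcalls(log_text)
    timings.sort(key=lambda item: item[1], reverse=True)
    return timings[:top]


def total_initcall_usecs(log_text: str) -> int:
    """Sum the time reported for every initcall found in the log."""
    return sum(usecs for _, usecs in parse_initcalls(log_text))


if __name__ == "__main__":
    # Hypothetical usage: python3 initcalls.py < console.log
    text = sys.stdin.read()
    for name, usecs in slowest_initcalls(text):
        print(f"{usecs:>10} usecs  {name}")
    print(f"{total_initcall_usecs(text):>10} usecs  total")
```

Feeding it the serial console log or dmesg output from the guest gives a quick list of which subsystems would be worth compiling out. Note that capturing the log over the UART itself adds time, so the absolute numbers should be treated with some suspicion.
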

libguestfs also ships with benchmarking tools which can be very useful to actually measure boot time:

https://github.com/libguestfs/libguestfs/tree/master/utils

HTH,

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html