Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-09-01 Thread Jing Liu




On 8/30/2019 10:27 PM, Sergio Lopez wrote:
> Jing Liu  writes:
>
>> Hi Sergio,
>>
>> On 8/29/2019 11:46 PM, Sergio Lopez wrote:
>>> Jing Liu  writes:
>>>
>>>> Hi Sergio,
>>>>
>>>> The idea is interesting and I tried to launch a guest following your
>>>> guide, but it failed for me. I tried both legacy and normal modes,
>>>> but the vncviewer connected and told me:
>>>> The vm has no graphic display device.
>>>> The whole screen in vnc is just black.
>>>
>>> The microvm machine type doesn't support any graphics device, so you
>>> need to rely on the serial console.
>>
>> Got it.
>>
>>>> kernel config:
>>>> CONFIG_KVM_MMIO=y
>>>> CONFIG_VIRTIO_MMIO=y
>>>>
>>>> I don't know if any specific kernel version/patch/config
>>>> is needed or whether I missed anything.
>>>> Could you kindly give some tips?
>>>
>>> I'm testing it with upstream vanilla Linux. In addition to MMIO, you
>>> need to add support for PVH (the next version of this patchset, v4, will
>>> support booting from FW, so it'll be possible to use non-PVH ELF kernels
>>> and bzImages too).
>>>
>>> I've just uploaded a working kernel config here:
>>>
>>> https://gist.github.com/slp/1060ba3aaf708584572ad4109f28c8f9
>>
>> Thanks very much; this config is helpful to me.
>>
>>> As for the QEMU command line, something like this should do the trick:
>>>
>>> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm,legacy -kernel
>>> vmlinux -append "earlyprintk=ttyS0 console=ttyS0 reboot=k panic=1" -nodefaults
>>> -no-user-config -nographic -serial stdio
>>>
>>> If this works, you can move to non-legacy mode with a virtio-console:
>>>
>>> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux
>>> -append "console=hvc0 reboot=k panic=1" -nodefaults -no-user-config -nographic
>>> -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device
>>> virtconsole,chardev=virtiocon0
>>
>> I tried the above two ways and it works now. Thanks!
>>
>>> If it is still working, you can try adding some devices too:
>>>
>>> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux
>>> -append "console=hvc0 reboot=k panic=1 root=/dev/vda" -nodefaults
>>> -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device
>>> virtio-serial-device -device virtconsole,chardev=virtiocon0 -netdev user,id=testnet
>>> -device virtio-net-device,netdev=testnet -drive
>>> id=test,file=alpine-rootfs-x86_64.raw,format=raw,if=none -device
>>> virtio-blk-device,drive=test
>>
>> But I'm wondering why the image I used cannot be found.
>> root=/dev/vda3 and the same image worked well on a normal qemu/guest-
>> config bootup, but didn't work here. The details are:
>>
>> -append "console=hvc0 reboot=k panic=1 root=/dev/vda3 rw rootfstype=ext4" \
>>
>> [0.022784] Key type encrypted registered
>> [0.022988] VFS: Cannot open root device "vda3" or
>> unknown-block(254,3): error -6
>> [0.023041] Please append a correct "root=" boot option; here are
>> the available partitions:
>> [0.023089] fe00 8946688 vda
>> [0.023090]  driver: virtio_blk
>> [0.023143] Kernel panic - not syncing: VFS: Unable to mount root
>> fs on unknown-block(254,3)
>> [0.023201] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc3 #23
>>
>> BTW, root=/dev/vda was also tried and didn't work. The dmesg is a
>> little different:
>>
>> [0.028050] Key type encrypted registered
>> [0.028484] List of all partitions:
>> [0.028529] fe00 8946688 vda
>> [0.028529]  driver: virtio_blk
>> [0.028615] No filesystem could mount root, tried:
>> [0.028616]  ext4
>> [0.028670]
>> [0.028712] Kernel panic - not syncing: VFS: Unable to mount root
>> fs on unknown-block(254,0)
>>
>> I tried another ext4 img but it still doesn't work.
>> Is there any limitation on the blk image? Could I copy your image for a
>> simple test?
>
> The kernel config I posted lacks support for DOS partitions. Adding
> CONFIG_MSDOS_PARTITION=y should allow you to boot from /dev/vda3.
>
> Anyway, in case you also want to try booting from /dev/vda (without
> partitions), this is the recipe I use to quickly create a minimal rootfs
> image:
>
> # wget
> http://dl-cdn.alpinelinux.org/alpine/v3.10/releases/x86_64/alpine-minirootfs-3.10.2-x86_64.tar.gz
> # qemu-img create -f raw alpine-rootfs-x86_64.raw 1G
> # sudo losetup /dev/loop0 alpine-rootfs-x86_64.raw
> # sudo mkfs.ext4 /dev/loop0
> # sudo mount /dev/loop0 /mnt
> # sudo tar xpf alpine-minirootfs-3.10.2-x86_64.tar.gz -C /mnt
> # sudo umount /mnt
> # sudo losetup -d /dev/loop0
>
> The rootfs will be missing openrc, so you'll need to add "init=/bin/sh"
> to the command line.

Thank you Sergio. I'll try that.

Jing

> Sergio.
>
>> Thanks in advance,
>> Jing
>>
>>> Sergio.




Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-08-30 Thread Sergio Lopez

Jing Liu  writes:

> Hi Sergio,
>
> On 8/29/2019 11:46 PM, Sergio Lopez wrote:
>>
>> Jing Liu  writes:
>>
>>> Hi Sergio,
>>>
>>> The idea is interesting and I tried to launch a guest following your
>>> guide, but it failed for me. I tried both legacy and normal modes,
>>> but the vncviewer connected and told me:
>>> The vm has no graphic display device.
>>> The whole screen in vnc is just black.
>>
>> The microvm machine type doesn't support any graphics device, so you
>> need to rely on the serial console.
> Got it.
>
>>
>>> kernel config:
>>> CONFIG_KVM_MMIO=y
>>> CONFIG_VIRTIO_MMIO=y
>>>
>>> I don't know if any specific kernel version/patch/config
>>> is needed or whether I missed anything.
>>> Could you kindly give some tips?
>>
>> I'm testing it with upstream vanilla Linux. In addition to MMIO, you
>> need to add support for PVH (the next version of this patchset, v4, will
>> support booting from FW, so it'll be possible to use non-PVH ELF kernels
>> and bzImages too).
>>
>> I've just uploaded a working kernel config here:
>>
>> https://gist.github.com/slp/1060ba3aaf708584572ad4109f28c8f9
>>
> Thanks very much; this config is helpful to me.
>
>> As for the QEMU command line, something like this should do the trick:
>>
>> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M 
>> microvm,legacy -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 
>> reboot=k panic=1" -nodefaults -no-user-config -nographic -serial stdio
>>
>> If this works, you can move to non-legacy mode with a virtio-console:
>>
>> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm 
>> -kernel vmlinux -append "console=hvc0 reboot=k panic=1" -nodefaults 
>> -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server 
>> -device virtio-serial-device -device virtconsole,chardev=virtiocon0
>>
> I tried the above two ways and it works now. Thanks!
>
>> If it is still working, you can try adding some devices too:
>>
>> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm 
>> -kernel vmlinux -append "console=hvc0 reboot=k panic=1 root=/dev/vda" 
>> -nodefaults -no-user-config -nographic -serial pty -chardev 
>> stdio,id=virtiocon0,server -device virtio-serial-device -device 
>> virtconsole,chardev=virtiocon0 -netdev user,id=testnet -device 
>> virtio-net-device,netdev=testnet -drive 
>> id=test,file=alpine-rootfs-x86_64.raw,format=raw,if=none -device 
>> virtio-blk-device,drive=test
>>
> But I'm wondering why the image I used cannot be found.
> root=/dev/vda3 and the same image worked well on a normal qemu/guest-
> config bootup, but didn't work here. The details are:
>
> -append "console=hvc0 reboot=k panic=1 root=/dev/vda3 rw rootfstype=ext4" \
>
> [0.022784] Key type encrypted registered
> [0.022988] VFS: Cannot open root device "vda3" or
> unknown-block(254,3): error -6
> [0.023041] Please append a correct "root=" boot option; here are
> the available partitions:
> [0.023089] fe00 8946688 vda
> [0.023090]  driver: virtio_blk
> [0.023143] Kernel panic - not syncing: VFS: Unable to mount root
> fs on unknown-block(254,3)
> [0.023201] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc3 #23
>
>
> BTW, root=/dev/vda was also tried and didn't work. The dmesg is a
> little different:
>
> [0.028050] Key type encrypted registered
> [0.028484] List of all partitions:
> [0.028529] fe00 8946688 vda
> [0.028529]  driver: virtio_blk
> [0.028615] No filesystem could mount root, tried:
> [0.028616]  ext4
> [0.028670]
> [0.028712] Kernel panic - not syncing: VFS: Unable to mount root
> fs on unknown-block(254,0)
>
> I tried another ext4 img but it still doesn't work.
> Is there any limitation on the blk image? Could I copy your image for a
> simple test?

The kernel config I posted lacks support for DOS partitions. Adding
CONFIG_MSDOS_PARTITION=y should allow you to boot from /dev/vda3.

Anyway, in case you also want to try booting from /dev/vda (without
partitions), this is the recipe I use to quickly create a minimal rootfs
image:

# wget 
http://dl-cdn.alpinelinux.org/alpine/v3.10/releases/x86_64/alpine-minirootfs-3.10.2-x86_64.tar.gz
# qemu-img create -f raw alpine-rootfs-x86_64.raw 1G
# sudo losetup /dev/loop0 alpine-rootfs-x86_64.raw
# sudo mkfs.ext4 /dev/loop0
# sudo mount /dev/loop0 /mnt
# sudo tar xpf alpine-minirootfs-3.10.2-x86_64.tar.gz -C /mnt
# sudo umount /mnt
# sudo losetup -d /dev/loop0

The rootfs will be missing openrc, so you'll need to add "init=/bin/sh"
to the command line.
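
For reference, combining this recipe with the non-legacy command line quoted earlier in the thread, booting the image straight into a shell could look like the following (an illustrative sketch, not part of the original message; it assumes the same build tree and image name):

./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm \
  -kernel vmlinux \
  -append "console=hvc0 reboot=k panic=1 root=/dev/vda init=/bin/sh" \
  -nodefaults -no-user-config -nographic -serial pty \
  -chardev stdio,id=virtiocon0,server -device virtio-serial-device \
  -device virtconsole,chardev=virtiocon0 \
  -drive id=test,file=alpine-rootfs-x86_64.raw,format=raw,if=none \
  -device virtio-blk-device,drive=test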

Sergio.

> Thanks in advance,
> Jing
>
>> Sergio.
>>
>>> Thanks very much.
>>> Jing
>>>
>>>
>>>
>>>> A QEMU instance with the microvm machine type can be invoked this way:
>>>>
>>>>   - Normal mode:
>>>>
>>>> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
>>>>   -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>>>>   -nodefaults -no-user-config \
>>>>   -chardev pty,id=virtiocon0,server \
>>>>   -device virtio-serial-device \

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-08-29 Thread Jing Liu

Hi Sergio,

On 8/29/2019 11:46 PM, Sergio Lopez wrote:
> Jing Liu  writes:
>
>> Hi Sergio,
>>
>> The idea is interesting and I tried to launch a guest following your
>> guide, but it failed for me. I tried both legacy and normal modes,
>> but the vncviewer connected and told me:
>> The vm has no graphic display device.
>> The whole screen in vnc is just black.
>
> The microvm machine type doesn't support any graphics device, so you
> need to rely on the serial console.

Got it.

>> kernel config:
>> CONFIG_KVM_MMIO=y
>> CONFIG_VIRTIO_MMIO=y
>>
>> I don't know if any specific kernel version/patch/config
>> is needed or whether I missed anything.
>> Could you kindly give some tips?
>
> I'm testing it with upstream vanilla Linux. In addition to MMIO, you
> need to add support for PVH (the next version of this patchset, v4, will
> support booting from FW, so it'll be possible to use non-PVH ELF kernels
> and bzImages too).
>
> I've just uploaded a working kernel config here:
>
> https://gist.github.com/slp/1060ba3aaf708584572ad4109f28c8f9

Thanks very much; this config is helpful to me.

> As for the QEMU command line, something like this should do the trick:
>
> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm,legacy -kernel
> vmlinux -append "earlyprintk=ttyS0 console=ttyS0 reboot=k panic=1" -nodefaults
> -no-user-config -nographic -serial stdio
>
> If this works, you can move to non-legacy mode with a virtio-console:
>
> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux
> -append "console=hvc0 reboot=k panic=1" -nodefaults -no-user-config -nographic
> -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device
> virtconsole,chardev=virtiocon0

I tried the above two ways and it works now. Thanks!

> If it is still working, you can try adding some devices too:
>
> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux
> -append "console=hvc0 reboot=k panic=1 root=/dev/vda" -nodefaults
> -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device
> virtio-serial-device -device virtconsole,chardev=virtiocon0 -netdev user,id=testnet
> -device virtio-net-device,netdev=testnet -drive
> id=test,file=alpine-rootfs-x86_64.raw,format=raw,if=none -device
> virtio-blk-device,drive=test

But I'm wondering why the image I used cannot be found.
root=/dev/vda3 and the same image worked well on a normal qemu/guest-
config bootup, but didn't work here. The details are:

-append "console=hvc0 reboot=k panic=1 root=/dev/vda3 rw rootfstype=ext4" \

[0.022784] Key type encrypted registered
[0.022988] VFS: Cannot open root device "vda3" or
unknown-block(254,3): error -6
[0.023041] Please append a correct "root=" boot option; here are the
available partitions:
[0.023089] fe00 8946688 vda
[0.023090]  driver: virtio_blk
[0.023143] Kernel panic - not syncing: VFS: Unable to mount root fs
on unknown-block(254,3)
[0.023201] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc3 #23

BTW, root=/dev/vda was also tried and didn't work. The dmesg is a
little different:

[0.028050] Key type encrypted registered
[0.028484] List of all partitions:
[0.028529] fe00 8946688 vda
[0.028529]  driver: virtio_blk
[0.028615] No filesystem could mount root, tried:
[0.028616]  ext4
[0.028670]
[0.028712] Kernel panic - not syncing: VFS: Unable to mount root fs
on unknown-block(254,0)

I tried another ext4 img but it still doesn't work.
Is there any limitation on the blk image? Could I copy your image for a
simple test?

Thanks in advance,
Jing

> Sergio.
>
>> Thanks very much.
>> Jing
>>
>>> A QEMU instance with the microvm machine type can be invoked this way:
>>>
>>>   - Normal mode:
>>>
>>> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
>>>   -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>>>   -nodefaults -no-user-config \
>>>   -chardev pty,id=virtiocon0,server \
>>>   -device virtio-serial-device \
>>>   -device virtconsole,chardev=virtiocon0 \
>>>   -drive id=test,file=test.img,format=raw,if=none \
>>>   -device virtio-blk-device,drive=test \
>>>   -netdev tap,id=tap0,script=no,downscript=no \
>>>   -device virtio-net-device,netdev=tap0
>>>
>>>   - Legacy mode:
>>>
>>> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>>>   -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>>>   -nodefaults -no-user-config \
>>>   -drive id=test,file=test.img,format=raw,if=none \
>>>   -device virtio-blk-device,drive=test \
>>>   -netdev tap,id=tap0,script=no,downscript=no \
>>>   -device virtio-net-device,netdev=tap0 \
>>>   -serial stdio







Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-08-29 Thread Sergio Lopez

Jing Liu  writes:

> Hi Sergio,
>
> The idea is interesting and I tried to launch a guest following your
> guide, but it failed for me. I tried both legacy and normal modes,
> but the vncviewer connected and told me:
> The vm has no graphic display device.
> The whole screen in vnc is just black.

The microvm machine type doesn't support any graphics device, so you
need to rely on the serial console.

> kernel config:
> CONFIG_KVM_MMIO=y
> CONFIG_VIRTIO_MMIO=y
>
> I don't know if any specific kernel version/patch/config
> is needed or whether I missed anything.
> Could you kindly give some tips?

I'm testing it with upstream vanilla Linux. In addition to MMIO, you
need to add support for PVH (the next version of this patchset, v4, will
support booting from FW, so it'll be possible to use non-PVH ELF kernels
and bzImages too).

I've just uploaded a working kernel config here:

https://gist.github.com/slp/1060ba3aaf708584572ad4109f28c8f9
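
As a rough sketch of what that amounts to (assuming a recent mainline kernel; the gist is authoritative and its exact option set may differ), the essentials on top of the MMIO options already mentioned are the PVH entry point and the virtio drivers:

CONFIG_KVM_GUEST=y
CONFIG_PVH=y
CONFIG_VIRTIO_MMIO=y
CONFIG_VIRTIO_BLK=y
CONFIG_VIRTIO_NET=y
CONFIG_VIRTIO_CONSOLE=y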

As for the QEMU command line, something like this should do the trick:

./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm,legacy 
-kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 reboot=k panic=1" 
-nodefaults -no-user-config -nographic -serial stdio

If this works, you can move to non-legacy mode with a virtio-console:

./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel 
vmlinux -append "console=hvc0 reboot=k panic=1" -nodefaults -no-user-config 
-nographic -serial pty -chardev stdio,id=virtiocon0,server -device 
virtio-serial-device -device virtconsole,chardev=virtiocon0

If it is still working, you can try adding some devices too:

./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel 
vmlinux -append "console=hvc0 reboot=k panic=1 root=/dev/vda" -nodefaults 
-no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server 
-device virtio-serial-device -device virtconsole,chardev=virtiocon0 -netdev 
user,id=testnet -device virtio-net-device,netdev=testnet -drive 
id=test,file=alpine-rootfs-x86_64.raw,format=raw,if=none -device 
virtio-blk-device,drive=test

Sergio.

> Thanks very much.
> Jing
>
>
>
>> A QEMU instance with the microvm machine type can be invoked this way:
>>
>>   - Normal mode:
>>
>> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
>>   -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>>   -nodefaults -no-user-config \
>>   -chardev pty,id=virtiocon0,server \
>>   -device virtio-serial-device \
>>   -device virtconsole,chardev=virtiocon0 \
>>   -drive id=test,file=test.img,format=raw,if=none \
>>   -device virtio-blk-device,drive=test \
>>   -netdev tap,id=tap0,script=no,downscript=no \
>>   -device virtio-net-device,netdev=tap0
>>
>>   - Legacy mode:
>>
>> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>>   -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>>   -nodefaults -no-user-config \
>>   -drive id=test,file=test.img,format=raw,if=none \
>>   -device virtio-blk-device,drive=test \
>>   -netdev tap,id=tap0,script=no,downscript=no \
>>   -device virtio-net-device,netdev=tap0 \
>>   -serial stdio
>>





Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-08-29 Thread Jing Liu

Hi Sergio,

The idea is interesting and I tried to launch a guest following your
guide, but it failed for me. I tried both legacy and normal modes,
but the vncviewer connected and told me:
The vm has no graphic display device.
The whole screen in vnc is just black.

kernel config:
CONFIG_KVM_MMIO=y
CONFIG_VIRTIO_MMIO=y

I don't know if any specific kernel version/patch/config
is needed or whether I missed anything.
Could you kindly give some tips?

Thanks very much.
Jing




> A QEMU instance with the microvm machine type can be invoked this way:
>
>   - Normal mode:
>
> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
>   -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>   -nodefaults -no-user-config \
>   -chardev pty,id=virtiocon0,server \
>   -device virtio-serial-device \
>   -device virtconsole,chardev=virtiocon0 \
>   -drive id=test,file=test.img,format=raw,if=none \
>   -device virtio-blk-device,drive=test \
>   -netdev tap,id=tap0,script=no,downscript=no \
>   -device virtio-net-device,netdev=tap0
>
>   - Legacy mode:
>
> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>   -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>   -nodefaults -no-user-config \
>   -drive id=test,file=test.img,format=raw,if=none \
>   -device virtio-blk-device,drive=test \
>   -netdev tap,id=tap0,script=no,downscript=no \
>   -device virtio-net-device,netdev=tap0 \
>   -serial stdio





Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-26 Thread Igor Mammedov
On Thu, 25 Jul 2019 13:38:48 -0400
"Michael S. Tsirkin"  wrote:

> On Thu, Jul 25, 2019 at 05:39:39PM +0200, Paolo Bonzini wrote:
> > On 25/07/19 17:01, Michael S. Tsirkin wrote:  
> > >> It would be educational to try to enable ACPI core but disable all
> > >> optional features.  
> > 
> > A lot of them are select'ed so it's not easy.
> >   
> > > Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational.  
> > 
> > That's what the NEMU guys experimented with.  It's not supported by our
> > DSDT since it uses ACPI GPE,  
> 
> Well there are two GPE blocks in FADT. We could just switch to
> these if necessary I think.

If it's a simplistic VM, we could build a dedicated DSDT (or a whole set of tables)
for it and use the reduced profile like the arm-virt machine does (just a newer
version of FADT with the needed flags set). That would probably cut the ACPI cost
on the QEMU side.

> > and the reduction in code size is small
> > (about 15000 lines of code in ACPICA, perhaps 100k if you're lucky?).
> > 
> > Paolo  
> 
> Well ACPI is 150k loc I think, right?
> 
> linux]$ wc -l `find drivers/acpi/ -name '*.c' `|tail -1
>  145926 total
> 
> So 100k wouldn't be too shabby.
> 




Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-26 Thread Michael S. Tsirkin
On Fri, Jul 26, 2019 at 09:57:51AM +0200, Paolo Bonzini wrote:
> On 25/07/19 22:30, Michael S. Tsirkin wrote:
> > On Thu, Jul 25, 2019 at 05:35:01PM +0200, Paolo Bonzini wrote:
> >> On 25/07/19 16:46, Michael S. Tsirkin wrote:
> >>> Actually, I think I have a better idea.
> >>> At the moment we just get an exit on these reads and return all-ones.
> >>> Yes, in theory there could be a UR bit set in a bunch of
> >>> registers but in practice no one cares about these,
> >>> and I don't think we implement them.
> >>> So how about mapping a single page, read-only, and filling it
> >>> with all-ones?
> >>
> >> Yes, that's nice indeed. :)  But it does have some cost, in terms of
> >> either number of VMAs or QEMU RSS since the MMCONFIG area is large.
> >>
> >> What breaks if we return all zeroes?  Zero is not a valid vendor ID.
> >>
> >> Paolo
> > 
> > I think I know what you are thinking of doing:
> > map /dev/zero so we get a single VMA but all mapped to
> > a single zero pte?
> 
> Yes, exactly.  You absolutely need to share the page because the guest
> could easily touch 32*256 pages just to scan function 0 on every bus and
> device, even if the VM has just 4 or 5 devices and all of them on the
> root complex.  And that causes fragmentation so you have to map bigger
> areas.
> 
> > - we can implement /dev/ones. in fact, we can implement
> >   /dev/byteXX for each possible value, the cost will
> >   be only 1M on a 4k page system.
> >   it might come in handy for e.g. free page hinting:
> >   at the moment if guest memory is poisoned
> >   we can not unmap it, with this trick we can
> >   map it to /dev/byteXX.
> 
> I also thought of /dev/ones, not sure how it would be accepted. :)  Also
> you cannot map lazily on page fault, otherwise you get a vmexit and it's
> slow again.  So /dev/ones needs to be written to use a huge page, possibly.
> 
> Paolo

It's not easy to do that - each device gets 4K within MCFG.

So what we need then is a kvm option to create an address range - or
maybe even a group of address ranges and aggressively map all pages in a
group to the same guest page on a fault of one page in the group.

-- 
MST



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-26 Thread Paolo Bonzini
On 25/07/19 22:30, Michael S. Tsirkin wrote:
> On Thu, Jul 25, 2019 at 05:35:01PM +0200, Paolo Bonzini wrote:
>> On 25/07/19 16:46, Michael S. Tsirkin wrote:
>>> Actually, I think I have a better idea.
>>> At the moment we just get an exit on these reads and return all-ones.
>>> Yes, in theory there could be a UR bit set in a bunch of
>>> registers but in practice no one cares about these,
>>> and I don't think we implement them.
>>> So how about mapping a single page, read-only, and filling it
>>> with all-ones?
>>
>> Yes, that's nice indeed. :)  But it does have some cost, in terms of
>> either number of VMAs or QEMU RSS since the MMCONFIG area is large.
>>
>> What breaks if we return all zeroes?  Zero is not a valid vendor ID.
>>
>> Paolo
> 
> I think I know what you are thinking of doing:
> map /dev/zero so we get a single VMA but all mapped to
> a single zero pte?

Yes, exactly.  You absolutely need to share the page because the guest
could easily touch 32*256 pages just to scan function 0 on every bus and
device, even if the VM has just 4 or 5 devices and all of them on the
root complex.  And that causes fragmentation so you have to map bigger
areas.

> - we can implement /dev/ones. in fact, we can implement
>   /dev/byteXX for each possible value, the cost will
>   be only 1M on a 4k page system.
>   it might come in handy for e.g. free page hinting:
>   at the moment if guest memory is poisoned
>   we can not unmap it, with this trick we can
>   map it to /dev/byteXX.

I also thought of /dev/ones, not sure how it would be accepted. :)  Also
you cannot map lazily on page fault, otherwise you get a vmexit and it's
slow again.  So /dev/ones needs to be written to use a huge page, possibly.

Paolo
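
As a rough illustration of the trade-off being discussed (a sketch under the assumption that a memfd page behaves like the proposed /dev/ones would), a single 0xff-filled page can back many read-only aliases of a region: RSS stays at one shared page, but every alias costs a VMA, since mappings at the same file offset cannot be merged.

/* ones_alias.c: alias one all-ones page across a larger region. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    size_t region = 16 * psz;              /* stand-in for an MMCONFIG window */

    int fd = memfd_create("ones", 0);      /* single backing page for all aliases */
    ftruncate(fd, psz);
    unsigned char *p = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    memset(p, 0xff, psz);                  /* reads through any alias now see all-ones */
    munmap(p, psz);

    /* Reserve the region, then map the same page-cache page over each page.
     * Each mmap() below adds a VMA because the file offset never advances. */
    unsigned char *base = mmap(NULL, region, PROT_NONE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    for (size_t off = 0; off < region; off += psz)
        mmap(base + off, psz, PROT_READ, MAP_SHARED | MAP_FIXED, fd, 0);

    printf("byte at page 5: 0x%02x\n", base[5 * psz]);   /* prints 0xff */
    return 0;
}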



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2019 at 05:35:01PM +0200, Paolo Bonzini wrote:
> On 25/07/19 16:46, Michael S. Tsirkin wrote:
> > Actually, I think I have a better idea.
> > At the moment we just get an exit on these reads and return all-ones.
> > Yes, in theory there could be a UR bit set in a bunch of
> > registers but in practice no one cares about these,
> > and I don't think we implement them.
> > So how about mapping a single page, read-only, and filling it
> > with all-ones?
> 
> Yes, that's nice indeed. :)  But it does have some cost, in terms of
> either number of VMAs or QEMU RSS since the MMCONFIG area is large.
> 
> What breaks if we return all zeroes?  Zero is not a valid vendor ID.
> 
> Paolo

I think I know what you are thinking of doing:
map /dev/zero so we get a single VMA but all mapped to
a single zero pte?

We could start with that, at least as an experiment.
Further:

- we can limit the amount of fragmentation and simply
  unmap everything if we exceed a specific limit:
  with more than X devices it's no longer a lightweight
  VM anyway :)

- we can implement /dev/ones. in fact, we can implement
  /dev/byteXX for each possible value, the cost will
  be only 1M on a 4k page system.
  it might come in handy for e.g. free page hinting:
  at the moment if guest memory is poisoned
  we can not unmap it, with this trick we can
  map it to /dev/byteXX.

Note that the kvm memory array is still fragmented.
Again, we can fallback on disabling the optimization
if there are too many devices.


-- 
MST



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2019 at 05:39:39PM +0200, Paolo Bonzini wrote:
> On 25/07/19 17:01, Michael S. Tsirkin wrote:
> >> It would be educational to try to enable ACPI core but disable all
> >> optional features.
> 
> A lot of them are select'ed so it's not easy.
> 
> > Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational.
> 
> That's what the NEMU guys experimented with.  It's not supported by our
> DSDT since it uses ACPI GPE,

Well there are two GPE blocks in FADT. We could just switch to
these if necessary I think.

> and the reduction in code size is small
> (about 15000 lines of code in ACPICA, perhaps 100k if you're lucky?).
> 
> Paolo

Well ACPI is 150k loc I think, right?

linux]$ wc -l `find drivers/acpi/ -name '*.c' `|tail -1
 145926 total

So 100k wouldn't be too shabby.

-- 
MST



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2019 at 05:35:01PM +0200, Paolo Bonzini wrote:
> On 25/07/19 16:46, Michael S. Tsirkin wrote:
> > Actually, I think I have a better idea.
> > At the moment we just get an exit on these reads and return all-ones.
> > Yes, in theory there could be a UR bit set in a bunch of
> > registers but in practice no one cares about these,
> > and I don't think we implement them.
> > So how about mapping a single page, read-only, and filling it
> > with all-ones?
> 
> Yes, that's nice indeed. :)  But it does have some cost, in terms of
> either number of VMAs or QEMU RSS since the MMCONFIG area is large.
> 
> What breaks if we return all zeroes?  Zero is not a valid vendor ID.
> 
> Paolo

It isn't but that's not what baremetal does. So there's some risk
there ...

Why is all zeroes better? We still need to map it, right?

-- 
MST



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Sergio Lopez

Michael S. Tsirkin  writes:

> On Thu, Jul 25, 2019 at 10:58:22AM -0400, Michael S. Tsirkin wrote:
>> On Thu, Jul 25, 2019 at 04:42:42PM +0200, Sergio Lopez wrote:
>> > 
>> > Paolo Bonzini  writes:
>> > 
>> > > On 25/07/19 15:26, Stefan Hajnoczi wrote:
>> > >> The microvm design has a premise and it can be answered definitively
>> > >> through performance analysis.
>> > >> 
>> > >> If I had to explain to someone why PCI or ACPI significantly slows
>> > >> things down, I couldn't honestly do so.  I say significantly because
>> > >> PCI init definitely requires more vmexits but can it be a small
>> > >> number?  For ACPI I have no idea why it would consume significant
>> > >> amounts of time.
>> > >
>> > > My guess is that it's just a lot of code that has to run. :(
>> > 
>> > I think I haven't shared any numbers about ACPI.
>> > 
>> > I don't have details about where exactly the time is spent, but
>> > compiling a guest kernel without ACPI decreases the average boot time by
>> > ~12ms, and the kernel's unstripped ELF binary size goes down by a
>> > whopping ~300KiB.
>> 
>> At least the binary size is hardly surprising.
>> 
>> I'm guessing you built in lots of drivers.
>> 
>> It would be educational to try to enable ACPI core but disable all
>> optional features.

I just tried disabling everything that menuconfig allowed me to. Saves
~27KiB and doesn't improve boot time.

> Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational.

I also tried enabling this one in my original config. It saves ~11.5KiB,
and has no impact on boot time either.

>> 
>> > On the other hand, removing ACPI from QEMU decreases its initialization
>> > time by ~5ms, and the binary size is ~183KiB smaller.
>> 
>> Yes - ACPI generation uses a ton of allocations and data copies.
>> 
>> Need to play with pre-allocation strategies. Maybe something
>> as simple as:
>> 
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index f3fdfefcd5..24becc069e 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -2629,8 +2629,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>>  acpi_get_pci_holes(&pci_hole, &pci_hole64);
>>  acpi_get_slic_oem(&slic_oem);
>>  
>> +#define DEFAULT_ARRAY_SIZE 16
>>  table_offsets = g_array_new(false, true /* clear */,
>> -sizeof(uint32_t));
>> +sizeof(uint32_t),
>> +DEFAULT_ARRAY_SIZE);
>>  ACPI_BUILD_DPRINTF("init ACPI tables\n");
>>  
>>  bios_linker_loader_alloc(tables->linker,
>> 
>> will already help a bit.
>> 
>> > 
>> > IMHO, those are pretty relevant savings on both fronts.
>> > 
>> > >> Until we have this knowledge, the premise of microvm is unproven and
>> > >> merging it would be premature because maybe we can get into the same
>> > >> ballpark by optimizing existing code.
>> > >> 
>> > >> I'm sorry for being a pain.  I actually think the analysis will
>> > >> support microvm, but it still needs to be done in order to justify it.
>> > >
>> > > No, you're not a pain, you're explaining your reasoning and that helps.
>> > >
>> > > To me *maintainability is the biggest consideration* when introducing a
>> > > new feature.  "We can do just as well with q35" is a good reason to
>> > > deprecate and delete microvm, but not a good reason to reject it now as
>> > > long as microvm is good enough in terms of maintainability.  Keeping it
>> > > out of tree only makes it harder to do this kind of experiment.  virtio
>> > > 1 seems to be the biggest remaining blocker and I think it'd be a good
>> > > thing to have even for the ARM virt machine type.
>> > >
>> > > FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
>> > > and ~25 ms in the kernel.  I must say that's pretty good, but it's still
>> > > 30% of the whole boot time and reducing it is the hardest part.  If
>> > > having microvm in tree can help reducing it, good.  Yes, it will get
>> > > users, but most likely they will have to support pc or q35 as a fallback
>> > > so we could still delete microvm at any time with the due deprecation
>> > > period if it turns out to be a failed experiment.
>> > >
>> > > Whether to use qboot or SeaBIOS for microvm is another story, but it's
>> > > an implementation detail as long as the ROM size doesn't change and/or
>> > > we don't do versioned machine types.  So we can switch from one to the
>> > > other at any time; we can also include qboot directly in QEMU's tree,
>> > > without going through a submodule, which also reduces the infrastructure
>> > > needed (mirrors, etc.) and makes it easier to delete it.
>> > >
>> > > Paolo
>> > >
>> > > (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
>> > > last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
>> > > end up measured as PCI in SeaBIOS, due to different init order, so the
>> > > real firmware cost of PAM and PCI 

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Paolo Bonzini
On 25/07/19 17:01, Michael S. Tsirkin wrote:
>> It would be educational to try to enable ACPI core but disable all
>> optional features.

A lot of them are select'ed so it's not easy.

> Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational.

That's what the NEMU guys experimented with.  It's not supported by our
DSDT since it uses ACPI GPE, and the reduction in code size is small
(about 15000 lines of code in ACPICA, perhaps 100k if you're lucky?).

Paolo



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Paolo Bonzini
On 25/07/19 16:46, Michael S. Tsirkin wrote:
> Actually, I think I have a better idea.
> At the moment we just get an exit on these reads and return all-ones.
> Yes, in theory there could be a UR bit set in a bunch of
> registers but in practice no one cares about these,
> and I don't think we implement them.
> So how about mapping a single page, read-only, and filling it
> with all-ones?

Yes, that's nice indeed. :)  But it does have some cost, in terms of
either number of VMAs or QEMU RSS since the MMCONFIG area is large.

What breaks if we return all zeroes?  Zero is not a valid vendor ID.

Paolo



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2019 at 10:58:22AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 25, 2019 at 04:42:42PM +0200, Sergio Lopez wrote:
> > 
> > Paolo Bonzini  writes:
> > 
> > > On 25/07/19 15:26, Stefan Hajnoczi wrote:
> > >> The microvm design has a premise and it can be answered definitively
> > >> through performance analysis.
> > >> 
> > >> If I had to explain to someone why PCI or ACPI significantly slows
> > >> things down, I couldn't honestly do so.  I say significantly because
> > >> PCI init definitely requires more vmexits but can it be a small
> > >> number?  For ACPI I have no idea why it would consume significant
> > >> amounts of time.
> > >
> > > My guess is that it's just a lot of code that has to run. :(
> > 
> > I think I haven't shared any numbers about ACPI.
> > 
> > I don't have details about where exactly the time is spent, but
> > compiling a guest kernel without ACPI decreases the average boot time by
> > ~12ms, and the kernel's unstripped ELF binary size goes down by a
> > whopping ~300KiB.
> 
> At least the binary size is hardly surprising.
> 
> I'm guessing you built in lots of drivers.
> 
> It would be educational to try to enable ACPI core but disable all
> optional features.

Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational.


> 
> > On the other hand, removing ACPI from QEMU decreases its initialization
> > time by ~5ms, and the binary size is ~183KiB smaller.
> 
> Yes - ACPI generation uses a ton of allocations and data copies.
> 
> Need to play with pre-allocation strategies. Maybe something
> as simple as:
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index f3fdfefcd5..24becc069e 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2629,8 +2629,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>  acpi_get_pci_holes(&pci_hole, &pci_hole64);
>  acpi_get_slic_oem(&slic_oem);
>  
> +#define DEFAULT_ARRAY_SIZE 16
>  table_offsets = g_array_new(false, true /* clear */,
> -sizeof(uint32_t));
> +sizeof(uint32_t),
> +DEFAULT_ARRAY_SIZE);
>  ACPI_BUILD_DPRINTF("init ACPI tables\n");
>  
>  bios_linker_loader_alloc(tables->linker,
> 
> will already help a bit.
> 
> > 
> > IMHO, those are pretty relevant savings on both fronts.
> > 
> > >> Until we have this knowledge, the premise of microvm is unproven and
> > >> merging it would be premature because maybe we can get into the same
> > >> ballpark by optimizing existing code.
> > >> 
> > >> I'm sorry for being a pain.  I actually think the analysis will
> > >> support microvm, but it still needs to be done in order to justify it.
> > >
> > > No, you're not a pain, you're explaining your reasoning and that helps.
> > >
> > > To me *maintainability is the biggest consideration* when introducing a
> > > new feature.  "We can do just as well with q35" is a good reason to
> > > deprecate and delete microvm, but not a good reason to reject it now as
> > > long as microvm is good enough in terms of maintainability.  Keeping it
> > > out of tree only makes it harder to do this kind of experiment.  virtio
> > > 1 seems to be the biggest remaining blocker and I think it'd be a good
> > > thing to have even for the ARM virt machine type.
> > >
> > > FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
> > > and ~25 ms in the kernel.  I must say that's pretty good, but it's still
> > > 30% of the whole boot time and reducing it is the hardest part.  If
> > > having microvm in tree can help reducing it, good.  Yes, it will get
> > > users, but most likely they will have to support pc or q35 as a fallback
> > > so we could still delete microvm at any time with the due deprecation
> > > period if it turns out to be a failed experiment.
> > >
> > > Whether to use qboot or SeaBIOS for microvm is another story, but it's
> > > an implementation detail as long as the ROM size doesn't change and/or
> > > we don't do versioned machine types.  So we can switch from one to the
> > > other at any time; we can also include qboot directly in QEMU's tree,
> > > without going through a submodule, which also reduces the infrastructure
> > > needed (mirrors, etc.) and makes it easier to delete it.
> > >
> > > Paolo
> > >
> > > (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
> > > last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
> > > end up measured as PCI in SeaBIOS, due to different init order, so the
> > > real firmware cost of PAM and PCI initialization should be 5ms for qboot
> > > and 10ms for SeaBIOS.
> > 
> 
> 



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2019 at 04:42:42PM +0200, Sergio Lopez wrote:
> 
> Paolo Bonzini  writes:
> 
> > On 25/07/19 15:26, Stefan Hajnoczi wrote:
> >> The microvm design has a premise and it can be answered definitively
> >> through performance analysis.
> >> 
> >> If I had to explain to someone why PCI or ACPI significantly slows
> >> things down, I couldn't honestly do so.  I say significantly because
> >> PCI init definitely requires more vmexits but can it be a small
> >> number?  For ACPI I have no idea why it would consume significant
> >> amounts of time.
> >
> > My guess is that it's just a lot of code that has to run. :(
> 
> I think I haven't shared any numbers about ACPI.
> 
> I don't have details about where exactly the time is spent, but
> compiling a guest kernel without ACPI decreases the average boot time by
> ~12ms, and the kernel's unstripped ELF binary size goes down by a
> whopping ~300KiB.

At least the binary size is hardly surprising.

I'm guessing you built in lots of drivers.

It would be educational to try to enable ACPI core but disable all
optional features.


> On the other hand, removing ACPI from QEMU decreases its initialization
> time by ~5ms, and the binary size is ~183KiB smaller.

Yes - ACPI generation uses a ton of allocations and data copies.

Need to play with pre-allocation strategies. Maybe something
as simple as:

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index f3fdfefcd5..24becc069e 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2629,8 +2629,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
 acpi_get_pci_holes(&pci_hole, &pci_hole64);
 acpi_get_slic_oem(&slic_oem);
 
+#define DEFAULT_ARRAY_SIZE 16
 table_offsets = g_array_new(false, true /* clear */,
-sizeof(uint32_t));
+sizeof(uint32_t),
+DEFAULT_ARRAY_SIZE);
 ACPI_BUILD_DPRINTF("init ACPI tables\n");
 
 bios_linker_loader_alloc(tables->linker,

will already help a bit.
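
A side note on the sketch above: glib's g_array_new() takes only three arguments, so the four-argument form would presumably map onto g_array_sized_new(), roughly:

    table_offsets = g_array_sized_new(false, true /* clear */,
                                      sizeof(uint32_t), DEFAULT_ARRAY_SIZE);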

> 
> IMHO, those are pretty relevant savings on both fronts.
> 
> >> Until we have this knowledge, the premise of microvm is unproven and
> >> merging it would be premature because maybe we can get into the same
> >> ballpark by optimizing existing code.
> >> 
> >> I'm sorry for being a pain.  I actually think the analysis will
> >> support microvm, but it still needs to be done in order to justify it.
> >
> > No, you're not a pain, you're explaining your reasoning and that helps.
> >
> > To me *maintainability is the biggest consideration* when introducing a
> > new feature.  "We can do just as well with q35" is a good reason to
> > deprecate and delete microvm, but not a good reason to reject it now as
> > long as microvm is good enough in terms of maintainability.  Keeping it
> > out of tree only makes it harder to do this kind of experiment.  virtio
> > 1 seems to be the biggest remaining blocker and I think it'd be a good
> > thing to have even for the ARM virt machine type.
> >
> > FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
> > and ~25 ms in the kernel.  I must say that's pretty good, but it's still
> > 30% of the whole boot time and reducing it is the hardest part.  If
> > having microvm in tree can help reducing it, good.  Yes, it will get
> > users, but most likely they will have to support pc or q35 as a fallback
> > so we could still delete microvm at any time with the due deprecation
> > period if it turns out to be a failed experiment.
> >
> > Whether to use qboot or SeaBIOS for microvm is another story, but it's
> > an implementation detail as long as the ROM size doesn't change and/or
> > we don't do versioned machine types.  So we can switch from one to the
> > other at any time; we can also include qboot directly in QEMU's tree,
> > without going through a submodule, which also reduces the infrastructure
> > needed (mirrors, etc.) and makes it easier to delete it.
> >
> > Paolo
> >
> > (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
> > last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
> > end up measured as PCI in SeaBIOS, due to different init order, so the
> > real firmware cost of PAM and PCI initialization should be 5ms for qboot
> > and 10ms for SeaBIOS.
> 





Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Sergio Lopez

Michael S. Tsirkin  writes:

> On Thu, Jul 25, 2019 at 11:05:05AM +0100, Peter Maydell wrote:
>> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin  wrote:
>> > OK so please start with adding virtio 1 support. Guest bits
>> > have been ready for years now.
>> 
>> I'd still rather we just used pci virtio. If pci isn't
>> fast enough at startup, do something to make it faster...
>> 
>> thanks
>> -- PMM
>
> Oh that's putting microvm aside - if we have a maintainer for
> virtio mmio that's great because it does need a maintainer,
> and virtio 1 would be the thing to fix before adding features ;)

There seems to be a general consensus that virtio-mmio needs some care,
and looking at the specs, implementing virtio-mmio v2/virtio v1
shouldn't be too time consuming, so I'm going to give it a try.

Cheers,
Sergio.




Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Michael S. Tsirkin
On Wed, Jul 24, 2019 at 01:14:35PM +0200, Paolo Bonzini wrote:
> On 23/07/19 12:01, Paolo Bonzini wrote:
> > The number of buses is determined by the firmware, not by QEMU, so
> > fw_cfg would not be the right interface.  In fact (as I have just
> > learnt) lastbus is an x86-specific option that overrides the last bus
> > returned by SeaBIOS's handle_1ab101.
> > 
> > So the next step could be to figure out what is the lastbus returned by
> > handle_1ab101 and possibly why it isn't zero.
> 
> Some update:
> 
> - for 64-bit, PCIBIOS (and thus handle_1ab101) is not called.  PCIBIOS is
> only used by 32-bit kernels.  As a side effect, PCI expander bridges do not
> work on 32-bit kernels with ACPI disabled, because they are located beyond
> pcibios_last_bus (with ACPI enabled, the DSDT exposes them).
> 
> - for -M pc, pcibios_last_bus in Linux remains -1 and no "legacy scanning" is 
> done.
> 
> - for -M q35, pcibios_last_bus in Linux is set based on the size of the 
> MMCONFIG aperture and Linux ends up scanning all 32*255 (bus,dev) pairs 
> for buses above 0.
> 
> Here is a patch that only scans devfn==0, which should mostly remove the need
> for pci=lastbus=0.  (Testing is welcome).

Actually, I think I have a better idea.
At the moment we just get an exit on these reads and return all-ones.
Yes, in theory there could be a UR bit set in a bunch of
registers but in practice no one cares about these,
and I don't think we implement them.
So how about mapping a single page, read-only, and filling it
with all-ones?

We'll still run the code within linux but it will be free.

What do you think?


> Actually, KVM could probably avoid the scanning altogether.  The only 
> "hidden" root
> buses we expect are from PCI expander bridges and if you found an MMCONFIG 
> area
> through the ACPI MCFG table, you can also use the DSDT to find PCI expander 
> bridges.
> However, I am being conservative.
> 
> A possible alternative could be a mechanism whereby the vmlinuz real mode 
> entry
> point, or the 32-bit PVH entry point, fetch lastbus and they pass it to the
> kernel via the vmlinuz or PVH boot information structs.  However, I don't 
> think
> that's very useful, and there is some risk of breaking real hardware too.
> 
> Paolo
> 
> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> index 73bb404f4d2a..17012aa60d22 100644
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -61,6 +61,7 @@ enum pci_bf_sort_state {
>  extern struct pci_ops pci_root_ops;
>  
>  void pcibios_scan_specific_bus(int busn);
> +void pcibios_scan_bus_by_device(int busn);
>  
>  /* pci-irq.c */
>  
> @@ -216,8 +217,10 @@ static inline void mmio_config_writel(void __iomem *pos, 
> u32 val)
>  # endif
>  # define x86_default_pci_init_irqpcibios_irq_init
>  # define x86_default_pci_fixup_irqs  pcibios_fixup_irqs
> +# define x86_default_pci_scan_buspcibios_scan_bus_by_device
>  #else
>  # define x86_default_pci_initNULL
>  # define x86_default_pci_init_irqNULL
>  # define x86_default_pci_fixup_irqs  NULL
> +# define x86_default_pci_scan_bus  NULL
>  #endif
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index b85a7c54c6a1..4c3a0a17a600 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -251,6 +251,7 @@ struct x86_hyper_runtime {
>   * @save_sched_clock_state:  save state for sched_clock() on suspend
>   * @restore_sched_clock_state:   restore state for sched_clock() on 
> resume
>   * @apic_post_init:  adjust apic if needed
> + * @pci_scan_bus:scan a PCI bus
>   * @legacy:  legacy features
>   * @set_legacy_features: override legacy features. Use of this callback
>   *   is highly discouraged. You should only need
> @@ -273,6 +274,7 @@ struct x86_platform_ops {
>   void (*save_sched_clock_state)(void);
>   void (*restore_sched_clock_state)(void);
>   void (*apic_post_init)(void);
> + void (*pci_scan_bus)(int busn);
>   struct x86_legacy_features legacy;
>   void (*set_legacy_features)(void);
>   struct x86_hyper_runtime hyper;
> diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
> index 6857b4577f17..b248d7036dd3 100644
> --- a/arch/x86/kernel/jailhouse.c
> +++ b/arch/x86/kernel/jailhouse.c
> @@ -11,12 +11,14 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -136,6 +138,22 @@ static int __init jailhouse_pci_arch_init(void)
>   return 0;
>  }
>  
> +static void jailhouse_pci_scan_bus_by_function(int busn)
> +{
> +int devfn;
> +u32 l;
> +
> +for (devfn = 0; devfn < 256; devfn++) {
> +if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
> +l != 0x0000 && l != 0xffff) {
> + 

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Sergio Lopez

Paolo Bonzini  writes:

> On 25/07/19 15:26, Stefan Hajnoczi wrote:
>> The microvm design has a premise and it can be answered definitively
>> through performance analysis.
>> 
>> If I had to explain to someone why PCI or ACPI significantly slows
>> things down, I couldn't honestly do so.  I say significantly because
>> PCI init definitely requires more vmexits but can it be a small
>> number?  For ACPI I have no idea why it would consume significant
>> amounts of time.
>
> My guess is that it's just a lot of code that has to run. :(

I think I haven't shared any numbers about ACPI.

I don't have details about where exactly the time is spent, but
compiling a guest kernel without ACPI decreases the average boot time by
~12ms, and the kernel's unstripped ELF binary size goes down by a
whopping ~300KiB.

On the other hand, removing ACPI from QEMU decreases its initialization
time by ~5ms, and the binary size is ~183KiB smaller.

IMHO, those are pretty relevant savings on both fronts.

>> Until we have this knowledge, the premise of microvm is unproven and
>> merging it would be premature because maybe we can get into the same
>> ballpark by optimizing existing code.
>> 
>> I'm sorry for being a pain.  I actually think the analysis will
>> support microvm, but it still needs to be done in order to justify it.
>
> No, you're not a pain, you're explaining your reasoning and that helps.
>
> To me *maintainability is the biggest consideration* when introducing a
> new feature.  "We can do just as well with q35" is a good reason to
> deprecate and delete microvm, but not a good reason to reject it now as
> long as microvm is good enough in terms of maintainability.  Keeping it
> out of tree only makes it harder to do this kind of experiment.  virtio
> 1 seems to be the biggest remaining blocker and I think it'd be a good
> thing to have even for the ARM virt machine type.
>
> FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
> and ~25 ms in the kernel.  I must say that's pretty good, but it's still
> 30% of the whole boot time and reducing it is the hardest part.  If
> having microvm in tree can help reducing it, good.  Yes, it will get
> users, but most likely they will have to support pc or q35 as a fallback
> so we could still delete microvm at any time with the due deprecation
> period if it turns out to be a failed experiment.
>
> Whether to use qboot or SeaBIOS for microvm is another story, but it's
> an implementation detail as long as the ROM size doesn't change and/or
> we don't do versioned machine types.  So we can switch from one to the
> other at any time; we can also include qboot directly in QEMU's tree,
> without going through a submodule, which also reduces the infrastructure
> needed (mirrors, etc.) and makes it easier to delete it.
>
> Paolo
>
> (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
> last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
> end up measured as PCI in SeaBIOS, due to different init order, so the
> real firmware cost of PAM and PCI initialization should be 5ms for qboot
> and 10ms for SeaBIOS.





Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2019 at 04:13:13PM +0200, Paolo Bonzini wrote:
> On 25/07/19 15:54, Michael S. Tsirkin wrote:
> >> FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
> >> and ~25 ms in the kernel.
> > How did you measure the qemu time btw?
> > 
> 
> It's QEMU startup, but not QEMU altogether.  For example the time spent
> in memory.c when a BAR is programmed is not part of those 10 ms.
> 
> So I just computed q35 qemu startup - microvm qemu startup, it's 65 vs
> 65 ms.
> 
> Paolo

Oh so it could be eventfd or whatever, just as well.

I actually wonder whether we spend much time within
synchronize_* calls. eventfd triggers this a lot of times.

How about ioeventfd=off? Does this speed up things?
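
For what it's worth (an aside, assuming the standard virtio-pci device properties rather than anything stated in the thread), ioeventfd can be toggled per device on the command line, e.g. for the q35 comparison:

-device virtio-blk-pci,drive=test,ioeventfd=off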



-- 
MST



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2019 at 04:26:42PM +0200, Paolo Bonzini wrote:
> On 25/07/19 16:04, Peter Maydell wrote:
> > On Thu, 25 Jul 2019 at 14:43, Paolo Bonzini  wrote:
> >> To me *maintainability is the biggest consideration* when introducing a
> >> new feature.  "We can do just as well with q35" is a good reason to
> >> deprecate and delete microvm, but not a good reason to reject it now as
> >> long as microvm is good enough in terms of maintainability.
> > 
> > I think maintainability matters, but also important is "are
> > we going in the right direction in the first place?".
> > virtio-mmio is (variously deliberately and accidentally)
> > quite a long way behind virtio-pci, and certain kinds of things
> > (hotplug, extensibility beyond a certain number of endpoints)
> > are not going to be possible (either ever, or without a lot
> > of extra design and implementation work to reimplement stuff
> > we have already today with PCI). Are we sure we're not going
> > to end up with a stream of "oh, now we need to implement X for
> > virtio-mmio (that virtio-pci already has)", "users want Y now
> > (that virtio-pci already has)", etc?
> 
> I think this is part of maintainability in a wider sense.  For every
> missing feature there should be a good reason why it's not needed.  And
> if there is already code to do that in QEMU, then there should be an
> excellent reason why it's not being used.  (This was the essence of the
> firmware debate).
> 
> So for microvm you could do without hotplug because the idea is that you
> just tear down the VM and restart it.  Lack of MSI is actually what
> worries me the most, but we could say that microvm clients generally
> have little multiprocessing so it's not common to have multiple network
> flows at the same time and so you don't need multiqueue.

Me too, and in fact someone just posted
virtio-mmio: support multiple interrupt vectors


> For microvm in particular there are two reasons why we can take some
> shortcuts (but with care):
> 
> - we won't support versioned machine types for microvm.  microvm guests
> die every time you upgrade QEMU, by design.  So this is not another QED,
> which implemented more features than qcow2 but did so at the wrong place
> of the stack.  In fact it's exactly the opposite (it implements less
> features, so that the implementation of e.g. q35 or PCI is untouched and
> does not need one-off boot time optimization hacks)
> 
> - we know that Amazon is using something very similar to microvm in
> production, with virtio-mmio, so the feature set is at least usable for
> something.
> 
> > The other thing is that once we've introduced something we're
> > stuck with whatever it does, because we don't like breaking
> > backwards compatibility. So I think getting the virtio-legacy
> > vs virtio-1 story sorted out before we land microvm is
> > important, at least to the point where we know we haven't
> > backed ourselves into a corner or required a lot of extra
> > effort on transitional-device support that we could have
> > avoided.
> 
> Even though we won't support versioned machine types, I think there is
> agreement that virtio 0.9 is a bad idea and should be fixed.
> 
> Paolo

Right, for the simple reason that mmio does not support transitional
devices, only transitional drivers.  So if we commit to supporting old
guests, we won't be able to back out of that.

> > Which isn't to say that I'm against the microvm approach;
> > just that I'd like us to consider and make a decision on
> > these issues before landing it, rather than just saying
> > "the patches in themselves look good, let's merge it".
> > 
> > thanks
> > -- PMM
> > 



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Paolo Bonzini
On 25/07/19 16:04, Peter Maydell wrote:
> On Thu, 25 Jul 2019 at 14:43, Paolo Bonzini  wrote:
>> To me *maintainability is the biggest consideration* when introducing a
>> new feature.  "We can do just as well with q35" is a good reason to
>> deprecate and delete microvm, but not a good reason to reject it now as
>> long as microvm is good enough in terms of maintainability.
> 
> I think maintainability matters, but also important is "are
> we going in the right direction in the first place?".
> virtio-mmio is (variously deliberately and accidentally)
> quite a long way behind virtio-pci, and certain kinds of things
> (hotplug, extensibility beyond a certain number of endpoints)
> are not going to be possible (either ever, or without a lot
> of extra design and implementation work to reimplement stuff
> we have already today with PCI). Are we sure we're not going
> to end up with a stream of "oh, now we need to implement X for
> virtio-mmio (that virtio-pci already has)", "users want Y now
> (that virtio-pci already has)", etc?

I think this is part of maintainability in a wider sense.  For every
missing feature there should be a good reason why it's not needed.  And
if there is already code to do that in QEMU, then there should be an
excellent reason why it's not being used.  (This was the essence of the
firmware debate).

So for microvm you could do without hotplug because the idea is that you
just tear down the VM and restart it.  Lack of MSI is actually what
worries me the most, but we could say that microvm clients generally
have little multiprocessing so it's not common to have multiple network
flows at the same time and so you don't need multiqueue.

For microvm in particular there are two reasons why we can take some
shortcuts (but with care):

- we won't support versioned machine types for microvm.  microvm guests
die every time you upgrade QEMU, by design.  So this is not another QED,
which implemented more features than qcow2 but did so at the wrong place
in the stack.  In fact it's exactly the opposite (it implements fewer
features, so that the implementation of e.g. q35 or PCI is untouched and
does not need one-off boot time optimization hacks)

- we know that Amazon is using something very similar to microvm in
production, with virtio-mmio, so the feature set is at least usable for
something.

> The other thing is that once we've introduced something we're
> stuck with whatever it does, because we don't like breaking
> backwards compatibility. So I think getting the virtio-legacy
> vs virtio-1 story sorted out before we land microvm is
> important, at least to the point where we know we haven't
> backed ourselves into a corner or required a lot of extra
> effort on transitional-device support that we could have
> avoided.

Even though we won't support versioned machine types, I think there is
agreement that virtio 0.9 is a bad idea and should be fixed.

Paolo

> Which isn't to say that I'm against the microvm approach;
> just that I'd like us to consider and make a decision on
> these issues before landing it, rather than just saying
> "the patches in themselves look good, let's merge it".
> 
> thanks
> -- PMM
> 




Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Paolo Bonzini
On 25/07/19 15:54, Michael S. Tsirkin wrote:
>> FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
>> and ~25 ms in the kernel.
> How did you measure the qemu time btw?
> 

It's QEMU startup, but not QEMU altogether.  For example the time spent
in memory.c when a BAR is programmed is not part of those 10 ms.

So I just computed q35 qemu startup - microvm qemu startup; it's 75 vs
65 ms.

Paolo



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Peter Maydell
On Thu, 25 Jul 2019 at 14:43, Paolo Bonzini  wrote:
> To me *maintainability is the biggest consideration* when introducing a
> new feature.  "We can do just as well with q35" is a good reason to
> deprecate and delete microvm, but not a good reason to reject it now as
> long as microvm is good enough in terms of maintainability.

I think maintainability matters, but also important is "are
we going in the right direction in the first place?".
virtio-mmio is (variously deliberately and accidentally)
quite a long way behind virtio-pci, and certain kinds of things
(hotplug, extensibility beyond a certain number of endpoints)
are not going to be possible (either ever, or without a lot
of extra design and implementation work to reimplement stuff
we have already today with PCI). Are we sure we're not going
to end up with a stream of "oh, now we need to implement X for
virtio-mmio (that virtio-pci already has)", "users want Y now
(that virtio-pci already has)", etc?

The other thing is that once we've introduced something we're
stuck with whatever it does, because we don't like breaking
backwards compatibility. So I think getting the virtio-legacy
vs virtio-1 story sorted out before we land microvm is
important, at least to the point where we know we haven't
backed ourselves into a corner or required a lot of extra
effort on transitional-device support that we could have
avoided.

Which isn't to say that I'm against the microvm approach;
just that I'd like us to consider and make a decision on
these issues before landing it, rather than just saying
"the patches in themselves look good, let's merge it".

thanks
-- PMM



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2019 at 03:43:12PM +0200, Paolo Bonzini wrote:
> On 25/07/19 15:26, Stefan Hajnoczi wrote:
> > The microvm design has a premise and it can be answered definitively
> > through performance analysis.
> > 
> > If I had to explain to someone why PCI or ACPI significantly slows
> > things down, I couldn't honestly do so.  I say significantly because
> > PCI init definitely requires more vmexits but can it be a small
> > number?  For ACPI I have no idea why it would consume significant
> > amounts of time.
> 
> My guess is that it's just a lot of code that has to run. :(
> 
> > Until we have this knowledge, the premise of microvm is unproven and
> > merging it would be premature because maybe we can get into the same
> > ballpark by optimizing existing code.
> > 
> > I'm sorry for being a pain.  I actually think the analysis will
> > support microvm, but it still needs to be done in order to justify it.
> 
> No, you're not a pain, you're explaining your reasoning and that helps.
> 
> To me *maintainability is the biggest consideration* when introducing a
> new feature.  "We can do just as well with q35" is a good reason to
> deprecate and delete microvm, but not a good reason to reject it now as
> long as microvm is good enough in terms of maintainability.  Keeping it
> out of tree only makes it harder to do this kind of experiment.  virtio
> 1 seems to be the biggest remaining blocker and I think it'd be a good
> thing to have even for the ARM virt machine type.

Yep. E.g. virtio-iommu guys wanted that too.

> FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
> and ~25 ms in the kernel.

How did you measure the qemu time btw?

>  I must say that's pretty good, but it's still
> 30% of the whole boot time and reducing it is the hardest part.  If
> having microvm in tree can help reducing it, good.  Yes, it will get
> users, but most likely they will have to support pc or q35 as a fallback
> so we could still delete microvm at any time with the due deprecation
> period if it turns out to be a failed experiment.
> 
> Whether to use qboot or SeaBIOS for microvm is another story, but it's
> an implementation detail as long as the ROM size doesn't change and/or
> we don't do versioned machine types.  So we can switch from one to the
> other at any time; we can also include qboot directly in QEMU's tree,
> without going through a submodule, which also reduces the infrastructure
> needed (mirrors, etc.) and makes it easier to delete it.
> 
> Paolo
> 
> (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
> last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
> end up measured as PCI in SeaBIOS, due to different init order, so the
> real firmware cost of PAM and PCI initialization should be 5ms for qboot
> and 10ms for SeaBIOS.



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2019 at 02:26:12PM +0100, Stefan Hajnoczi wrote:
> On Thu, Jul 25, 2019 at 1:10 PM Michael S. Tsirkin  wrote:
> > On Thu, Jul 25, 2019 at 01:01:29PM +0100, Stefan Hajnoczi wrote:
> > > On Thu, Jul 25, 2019 at 12:23 PM Paolo Bonzini  
> > > wrote:
> > > > On 25/07/19 12:42, Sergio Lopez wrote:
> > > > > Peter Maydell  writes:
> > > > >> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin  
> > > > >> wrote:
> > > > >>> OK so please start with adding virtio 1 support. Guest bits
> > > > >>> have been ready for years now.
> > > > >>
> > > > >> I'd still rather we just used pci virtio. If pci isn't
> > > > >> fast enough at startup, do something to make it faster...
> > > > >
> > > > > Actually, removing PCI (and ACPI), is one of the main ways microvm has
> > > > > to reduce not only boot time, but also the exposed surface and the
> > > > > general footprint.
> > > > >
> > > > > I think we need to discuss and settle whether using virtio-mmio (even 
> > > > > if
> > > > > maintained and upgraded to virtio 1) for a new machine type is
> > > > > acceptable or not. Because if it isn't, we should probably just ditch
> > > > > the whole microvm idea and move to something else.
> > > >
> > > > I agree.  IMNSHO the reduced attack surface from removing PCI is
> > > > (mostly) security theater, however the boot time numbers that Sergio
> > > > showed for microvm are quite extreme and I don't think there is any hope
> > > > of getting even close with a PCI-based virtual machine.
> > > >
> > > > So I'd even go a step further: if using virtio-mmio for a new machine
> > > > type is not acceptable, we should admit that boot time optimization in
> > > > QEMU is basically as good as it can get---low-hanging fruit has been
> > > > picked with PVH and mmap is the logical next step, but all that's left
> > > > is optimizing the guest or something else.
> > >
> > > I haven't seen enough analysis to declare boot time optimization done.
> > > QEMU startup can be profiled and improved.
> >
> > Right, and that will always stay the case.
> 
> The microvm design has a premise and it can be answered definitively
> through performance analysis.
> 
> If I had to explain to someone why PCI or ACPI significantly slows
> things down, I couldn't honestly do so.

well with pci each device describes itself. you read
this description dword by dword normally. typical
description is 20-50 dwords.

if both bios and linux do this, that's twice the amount.

bios also uses two vmexits for each access.

there's also the resource allocation game.

I would say up to 200 exits per device is reasonable.
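
As a rough illustration of where those exits come from, here is a minimal
sketch of the classic 0xCF8/0xCFC config cycle (not code from this thread;
the outl/inl port helpers are assumed, and in a guest each of them is a
trapped I/O access):

#include <stdint.h>

extern void outl(uint16_t port, uint32_t val);  /* assumed port helpers */
extern uint32_t inl(uint16_t port);

#define PCI_CFG_ADDR 0xCF8
#define PCI_CFG_DATA 0xCFC

static uint32_t pci_cfg_read32(uint8_t bus, uint8_t devfn, uint8_t reg)
{
    /* Two trapped accesses per config dword: one write to latch the
     * address, one read for the data.  A 20-50 dword description read
     * by both the BIOS and Linux adds up quickly. */
    outl(PCI_CFG_ADDR, 0x80000000u | ((uint32_t)bus << 16) |
         ((uint32_t)devfn << 8) | (reg & 0xFCu));
    return inl(PCI_CFG_DATA);
}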


>  I say significantly because
> PCI init definitely requires more vmexits but can it be a small
> number?

each bus is scanned for devices. 32 accesses, 256 bus numbers
(that's the lastbus thing). Paolo posted a hack just
for the root bus but whenever we have a bridge the problem
will just re-surface.

pcie is actually link based, so downstream buses do not
need to be scanned beyond device 0 unless we see
a multifunction bit set. I don't think linux
implements this optimization atm.
But the scan is still needed for internal buses.
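
The brute-force scan then multiplies that cost out. A sketch, reusing the
hypothetical pci_cfg_read32() above (vendor ID 0xffff means nothing
responded, as in the pcibios code quoted later in the thread):

static void legacy_scan_all_buses(void)
{
    for (int bus = 0; bus < 256; bus++) {       /* the "lastbus" range */
        for (int dev = 0; dev < 32; dev++) {    /* 32 slots per bus */
            uint32_t id = pci_cfg_read32(bus, dev << 3, 0x00);
            if ((id & 0xffff) != 0xffff) {
                /* a device is present at bus:dev, function 0 */
            }
        }
    }
}

That is 256 * 32 = 8192 probes even when almost every bus is empty.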


> For ACPI I have no idea why it would consume significant
> amounts of time.


me neither. I suspect it's not vmexit related at all.  Is the ACPI driver in
linux just slow?  It's not been designed to be on any data path...
I'd love to know. I don't feel it's fair to ask someone
interested in writing new performant code to first optimize the
old non-performant code.

> Until we have this knowledge, the premise of microvm is unproven and
> merging it would be premature because maybe we can get into the same
> ballpark by optimizing existing code.

maybe but who is working on this right now?

If it's possible to make PC faster but not enough people
know how to do it, and enough people know how to make microvm
faster, then it does not matter what's possible in theory.


> 
> I'm sorry for being a pain.  I actually think the analysis will
> support microvm, but it still needs to be done in order to justify it.
> 
> Stefan

At some level it would be great to have someone do detailed performance
profiling. But it is a lot of work, which also needs to be justified
given there's working code, and it's not bad code at that.

Yes speeding up PC would be nice but if everyone's gut feeling is it
won't get us what microvm is trying to achieve, why spend cycles making
sure?

-- 
MST



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Paolo Bonzini
On 25/07/19 15:26, Stefan Hajnoczi wrote:
> The microvm design has a premise and it can be answered definitively
> through performance analysis.
> 
> If I had to explain to someone why PCI or ACPI significantly slows
> things down, I couldn't honestly do so.  I say significantly because
> PCI init definitely requires more vmexits but can it be a small
> number?  For ACPI I have no idea why it would consume significant
> amounts of time.

My guess is that it's just a lot of code that has to run. :(

> Until we have this knowledge, the premise of microvm is unproven and
> merging it would be premature because maybe we can get into the same
> ballpark by optimizing existing code.
> 
> I'm sorry for being a pain.  I actually think the analysis will
> support microvm, but it still needs to be done in order to justify it.

No, you're not a pain, you're explaining your reasoning and that helps.

To me *maintainability is the biggest consideration* when introducing a
new feature.  "We can do just as well with q35" is a good reason to
deprecate and delete microvm, but not a good reason to reject it now as
long as microvm is good enough in terms of maintainability.  Keeping it
out of tree only makes it harder to do this kind of experiment.  virtio
1 seems to be the biggest remaining blocker and I think it'd be a good
thing to have even for the ARM virt machine type.

FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
and ~25 ms in the kernel.  I must say that's pretty good, but it's still
30% of the whole boot time and reducing it is the hardest part.  If
having microvm in tree can help reducing it, good.  Yes, it will get
users, but most likely they will have to support pc or q35 as a fallback
so we could still delete microvm at any time with the due deprecation
period if it turns out to be a failed experiment.

Whether to use qboot or SeaBIOS for microvm is another story, but it's
an implementation detail as long as the ROM size doesn't change and/or
we don't do versioned machine types.  So we can switch from one to the
other at any time; we can also include qboot directly in QEMU's tree,
without going through a submodule, which also reduces the infrastructure
needed (mirrors, etc.) and makes it easier to delete it.

Paolo

(*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
end up measured as PCI in SeaBIOS, due to different init order, so the
real firmware cost of PAM and PCI initialization should be 5ms for qboot
and 10ms for SeaBIOS.



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Stefan Hajnoczi
On Thu, Jul 25, 2019 at 1:10 PM Michael S. Tsirkin  wrote:
> On Thu, Jul 25, 2019 at 01:01:29PM +0100, Stefan Hajnoczi wrote:
> > On Thu, Jul 25, 2019 at 12:23 PM Paolo Bonzini  wrote:
> > > On 25/07/19 12:42, Sergio Lopez wrote:
> > > > Peter Maydell  writes:
> > > >> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin  
> > > >> wrote:
> > > >>> OK so please start with adding virtio 1 support. Guest bits
> > > >>> have been ready for years now.
> > > >>
> > > >> I'd still rather we just used pci virtio. If pci isn't
> > > >> fast enough at startup, do something to make it faster...
> > > >
> > > > Actually, removing PCI (and ACPI), is one of the main ways microvm has
> > > > to reduce not only boot time, but also the exposed surface and the
> > > > general footprint.
> > > >
> > > > I think we need to discuss and settle whether using virtio-mmio (even if
> > > > maintained and upgraded to virtio 1) for a new machine type is
> > > > acceptable or not. Because if it isn't, we should probably just ditch
> > > > the whole microvm idea and move to something else.
> > >
> > > I agree.  IMNSHO the reduced attack surface from removing PCI is
> > > (mostly) security theater, however the boot time numbers that Sergio
> > > showed for microvm are quite extreme and I don't think there is any hope
> > > of getting even close with a PCI-based virtual machine.
> > >
> > > So I'd even go a step further: if using virtio-mmio for a new machine
> > > type is not acceptable, we should admit that boot time optimization in
> > > QEMU is basically as good as it can get---low-hanging fruit has been
> > > picked with PVH and mmap is the logical next step, but all that's left
> > > is optimizing the guest or something else.
> >
> > I haven't seen enough analysis to declare boot time optimization done.
> > QEMU startup can be profiled and improved.
>
> Right, and that will always stay the case.

The microvm design has a premise and it can be answered definitively
through performance analysis.

If I had to explain to someone why PCI or ACPI significantly slows
things down, I couldn't honestly do so.  I say significantly because
PCI init definitely requires more vmexits but can it be a small
number?  For ACPI I have no idea why it would consume significant
amounts of time.

Until we have this knowledge, the premise of microvm is unproven and
merging it would be premature because maybe we can get into the same
ballpark by optimizing existing code.

I'm sorry for being a pain.  I actually think the analysis will
support microvm, but it still needs to be done in order to justify it.

Stefan



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2019 at 01:01:29PM +0100, Stefan Hajnoczi wrote:
> On Thu, Jul 25, 2019 at 12:23 PM Paolo Bonzini  wrote:
> > On 25/07/19 12:42, Sergio Lopez wrote:
> > > Peter Maydell  writes:
> > >> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin  wrote:
> > >>> OK so please start with adding virtio 1 support. Guest bits
> > >>> have been ready for years now.
> > >>
> > >> I'd still rather we just used pci virtio. If pci isn't
> > >> fast enough at startup, do something to make it faster...
> > >
> > > Actually, removing PCI (and ACPI), is one of the main ways microvm has
> > > to reduce not only boot time, but also the exposed surface and the
> > > general footprint.
> > >
> > > I think we need to discuss and settle whether using virtio-mmio (even if
> > > maintained and upgraded to virtio 1) for a new machine type is
> > > acceptable or not. Because if it isn't, we should probably just ditch
> > > the whole microvm idea and move to something else.
> >
> > I agree.  IMNSHO the reduced attack surface from removing PCI is
> > (mostly) security theater, however the boot time numbers that Sergio
> > showed for microvm are quite extreme and I don't think there is any hope
> > of getting even close with a PCI-based virtual machine.
> >
> > So I'd even go a step further: if using virtio-mmio for a new machine
> > type is not acceptable, we should admit that boot time optimization in
> > QEMU is basically as good as it can get---low-hanging fruit has been
> > picked with PVH and mmap is the logical next step, but all that's left
> > is optimizing the guest or something else.
> 
> I haven't seen enough analysis to declare boot time optimization done.
> QEMU startup can be profiled and improved.

Right, and that will always stay the case. OTOH imho microvm is
non-intrusive enough, and small enough, that we'd just put it upstream
after addressing low-level comments.
This will allow more contributions from people interested in boot time.
With no cross-version migration support, or maybe migration
disabled completely, the maintenance burden should not be too high.
Not everyone wants to hack on pci/acpi specifically.


> The numbers show that removing PCI and ACPI makes things faster but
> this doesn't justify removing them.  Understanding of why they are
> slow is what justifies removing them.  Otherwise it could just be a
> misconfiguration, inefficient implementation, etc and we've seen there
> is low-hanging fruit.
> 
> How much time is spent doing PCI initialization?  Is the vmexit
> pattern for PCI initialization as good as the hardware interface
> allows?

I know in the bios we have wanted to use memory-mapped PCI config
accesses for a very long time now. This makes each vmexit slower but
cuts the number of exits by half. Only affects seabios though.
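
Schematically, a memory-mapped (ECAM/MMCONFIG) access replaces the
address-write-plus-data-read pair with one MMIO load. A sketch, with a
made-up base address standing in for whatever the MCFG table advertises:

#include <stdint.h>

#define ECAM_BASE 0xb0000000UL  /* hypothetical MMCONFIG aperture */

static inline uint32_t ecam_read32(uint8_t bus, uint8_t devfn, uint16_t reg)
{
    volatile uint32_t *p = (volatile uint32_t *)(ECAM_BASE +
        ((uint32_t)bus << 20) + ((uint32_t)devfn << 12) + (reg & 0xffcu));
    return *p;  /* one (slower) MMIO exit instead of two port exits */
}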




> Without an analysis of why things are slow it's not possible to come to
> an informed decision.
> 
> Stefan



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Stefan Hajnoczi
On Thu, Jul 25, 2019 at 12:23 PM Paolo Bonzini  wrote:
> On 25/07/19 12:42, Sergio Lopez wrote:
> > Peter Maydell  writes:
> >> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin  wrote:
> >>> OK so please start with adding virtio 1 support. Guest bits
> >>> have been ready for years now.
> >>
> >> I'd still rather we just used pci virtio. If pci isn't
> >> fast enough at startup, do something to make it faster...
> >
> > Actually, removing PCI (and ACPI), is one of the main ways microvm has
> > to reduce not only boot time, but also the exposed surface and the
> > general footprint.
> >
> > I think we need to discuss and settle whether using virtio-mmio (even if
> > maintained and upgraded to virtio 1) for a new machine type is
> > acceptable or not. Because if it isn't, we should probably just ditch
> > the whole microvm idea and move to something else.
>
> I agree.  IMNSHO the reduced attack surface from removing PCI is
> (mostly) security theater, however the boot time numbers that Sergio
> showed for microvm are quite extreme and I don't think there is any hope
> of getting even close with a PCI-based virtual machine.
>
> So I'd even go a step further: if using virtio-mmio for a new machine
> type is not acceptable, we should admit that boot time optimization in
> QEMU is basically as good as it can get---low-hanging fruit has been
> picked with PVH and mmap is the logical next step, but all that's left
> is optimizing the guest or something else.

I haven't seen enough analysis to declare boot time optimization done.
QEMU startup can be profiled and improved.

The numbers show that removing PCI and ACPI makes things faster but
this doesn't justify removing them.  Understanding of why they are
slow is what justifies removing them.  Otherwise it could just be a
misconfiguration, inefficient implementation, etc and we've seen there
is low-hanging fruit.

How much time is spent doing PCI initialization?  Is the vmexit
pattern for PCI initialization as good as the hardware interface
allows?

Without an analysis of why things are slow it's not possible to come to
an informed decision.

Stefan



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Paolo Bonzini
On 25/07/19 12:42, Sergio Lopez wrote:
> 
> Peter Maydell  writes:
> 
>> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin  wrote:
>>> OK so please start with adding virtio 1 support. Guest bits
>>> have been ready for years now.
>>
>> I'd still rather we just used pci virtio. If pci isn't
>> fast enough at startup, do something to make it faster...
> 
> Actually, removing PCI (and ACPI), is one of the main ways microvm has
> to reduce not only boot time, but also the exposed surface and the
> general footprint.
> 
> I think we need to discuss and settle whether using virtio-mmio (even if
> maintained and upgraded to virtio 1) for a new machine type is
> acceptable or not. Because if it isn't, we should probably just ditch
> the whole microvm idea and move to something else.

I agree.  IMNSHO the reduced attack surface from removing PCI is
(mostly) security theater, however the boot time numbers that Sergio
showed for microvm are quite extreme and I don't think there is any hope
of getting even close with a PCI-based virtual machine.

So I'd even go a step further: if using virtio-mmio for a new machine
type is not acceptable, we should admit that boot time optimization in
QEMU is basically as good as it can get---low-hanging fruit has been
picked with PVH and mmap is the logical next step, but all that's left
is optimizing the guest or something else.

I must say that -M microvm took a while to grow on me, but I think it's
a great example of how QEMU's existing infrastructure provides
useful features for free, even for the simplest emulated hardware.  For
example, in v3 microvm could only boot from PVH kernels, but the next
firmware-enabled version reuses more of the PC code and thus supports
all of vmlinuz, multiboot and PVH.

Again: Sergio has been very receptive to feedback and has provided
numbers to back the design choices, and we should reciprocate or at
least be very clear on the constraints.

Paolo



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Paolo Bonzini
On 25/07/19 12:03, Michael S. Tsirkin wrote:
>> +#ifdef CONFIG_PCI
>> +x86_platform.pci_scan_bus = kvm_pci_scan_bus;
>> +#endif
>> +
>>  if (!kvm_para_available())
>>  return;
>>  
> Shouldn't this happen after kvm_para_available?

Actually kvm_para_available() is not needed anymore, since this only
runs after kvm_detect() has returned true.

> In fact, let's add a CPU ID flag for this, so it's
> easy to tell guest whether to scan extra buses.
> What do you say?

I think it would make it much harder to deploy this, since it relies on
having new userspace and new machine types.  This patch is basically a
reflection of the status quo, which is that there are generally no
"hidden" buses on commonly-used KVM userspaces, and even in the weird
configurations that have them there is always something at devfn=0.

(On real hardware, the only such hidden buses are e.g. 0x7f/0xff, which
have a bunch of QPI and MCH-related devices.  This is not something
you'd have in a virtual machine.)

Paolo



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Sergio Lopez

Peter Maydell  writes:

> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin  wrote:
>> OK so please start with adding virtio 1 support. Guest bits
>> have been ready for years now.
>
> I'd still rather we just used pci virtio. If pci isn't
> fast enough at startup, do something to make it faster...

Actually, removing PCI (and ACPI), is one of the main ways microvm has
to reduce not only boot time, but also the exposed surface and the
general footprint.

I think we need to discuss and settle whether using virtio-mmio (even if
maintained and upgraded to virtio 1) for a new machine type is
acceptable or not. Because if it isn't, we should probably just ditch
the whole microvm idea and move to something else.

Sergio.






Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Michael S. Tsirkin
On Thu, Jul 25, 2019 at 11:05:05AM +0100, Peter Maydell wrote:
> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin  wrote:
> > OK so please start with adding virtio 1 support. Guest bits
> > have been ready for years now.
> 
> I'd still rather we just used pci virtio. If pci isn't
> fast enough at startup, do something to make it faster...
> 
> thanks
> -- PMM

Oh that's putting microvm aside - if we have a maintainer for
virtio mmio that's great because it does need a maintainer,
and virtio 1 would be the thing to fix before adding features ;)

-- 
MST



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Peter Maydell
On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin  wrote:
> OK so please start with adding virtio 1 support. Guest bits
> have been ready for years now.

I'd still rather we just used pci virtio. If pci isn't
fast enough at startup, do something to make it faster...

thanks
-- PMM



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Michael S. Tsirkin
On Wed, Jul 24, 2019 at 01:14:35PM +0200, Paolo Bonzini wrote:
> On 23/07/19 12:01, Paolo Bonzini wrote:
> > The number of buses is determined by the firmware, not by QEMU, so
> > fw_cfg would not be the right interface.  In fact (as I have just
> > learnt) lastbus is an x86-specific option that overrides the last bus
> > returned by SeaBIOS's handle_1ab101.
> > 
> > So the next step could be to figure out what is the lastbus returned by
> > handle_1ab101 and possibly why it isn't zero.
> 
> Some update:
> 
> - for 64-bit, PCIBIOS (and thus handle_1ab101) is not called.  PCIBIOS is
> only used by 32-bit kernels.  As a side effect, PCI expander bridges do not
> work on 32-bit kernels with ACPI disabled, because they are located beyond
> pcibios_last_bus (with ACPI enabled, the DSDT exposes them).
> 
> - for -M pc, pcibios_last_bus in Linux remains -1 and no "legacy scanning" is 
> done.
> 
> - for -M q35, pcibios_last_bus in Linux is set based on the size of the 
> MMCONFIG aperture and Linux ends up scanning all 32*255 (bus,dev) pairs 
> for buses above 0.
> 
> Here is a patch that only scans devfn==0, which should mostly remove the need
> for pci=lastbus=0.  (Testing is welcome).
> 
> Actually, KVM could probably avoid the scanning altogether.  The only 
> "hidden" root
> buses we expect are from PCI expander bridges and if you found an MMCONFIG 
> area
> through the ACPI MCFG table, you can also use the DSDT to find PCI expander 
> bridges.
> However, I am being conservative.
> 
> A possible alternative could be a mechanism whereby the vmlinuz real mode 
> entry
> point, or the 32-bit PVH entry point, fetch lastbus and they pass it to the
> kernel via the vmlinuz or PVH boot information structs.  However, I don't 
> think
> that's very useful, and there is some risk of breaking real hardware too.
> 
> Paolo
> 
> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> index 73bb404f4d2a..17012aa60d22 100644
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -61,6 +61,7 @@ enum pci_bf_sort_state {
>  extern struct pci_ops pci_root_ops;
>  
>  void pcibios_scan_specific_bus(int busn);
> +void pcibios_scan_bus_by_device(int busn);
>  
>  /* pci-irq.c */
>  
> @@ -216,8 +217,10 @@ static inline void mmio_config_writel(void __iomem *pos, 
> u32 val)
>  # endif
>  # define x86_default_pci_init_irq    pcibios_irq_init
>  # define x86_default_pci_fixup_irqs  pcibios_fixup_irqs
> +# define x86_default_pci_scan_bus    pcibios_scan_bus_by_device
>  #else
>  # define x86_default_pci_init        NULL
>  # define x86_default_pci_init_irq    NULL
>  # define x86_default_pci_fixup_irqs  NULL
> +# define x86_default_pci_scan_bus    NULL
>  #endif
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index b85a7c54c6a1..4c3a0a17a600 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -251,6 +251,7 @@ struct x86_hyper_runtime {
>   * @save_sched_clock_state:  save state for sched_clock() on suspend
>   * @restore_sched_clock_state:   restore state for sched_clock() on 
> resume
>   * @apic_post_init:  adjust apic if needed
> + * @pci_scan_bus:scan a PCI bus
>   * @legacy:  legacy features
>   * @set_legacy_features: override legacy features. Use of this callback
>   *   is highly discouraged. You should only need
> @@ -273,6 +274,7 @@ struct x86_platform_ops {
>   void (*save_sched_clock_state)(void);
>   void (*restore_sched_clock_state)(void);
>   void (*apic_post_init)(void);
> + void (*pci_scan_bus)(int busn);
>   struct x86_legacy_features legacy;
>   void (*set_legacy_features)(void);
>   struct x86_hyper_runtime hyper;
> diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
> index 6857b4577f17..b248d7036dd3 100644
> --- a/arch/x86/kernel/jailhouse.c
> +++ b/arch/x86/kernel/jailhouse.c
> @@ -11,12 +11,14 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -136,6 +138,22 @@ static int __init jailhouse_pci_arch_init(void)
>   return 0;
>  }
>  
> +static void jailhouse_pci_scan_bus_by_function(int busn)
> +{
> +int devfn;
> +u32 l;
> +
> +for (devfn = 0; devfn < 256; devfn++) {
> +if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
> +l != 0x0000 && l != 0xffff) {
> +DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
> +pr_info("PCI: Discovered peer bus %02x\n", busn);
> +pcibios_scan_root(busn);
> +return;
> +}
> +}
> +}
> +
>  static void __init jailhouse_init_platform(void)
>  {
>   u64 pa_data = boot_params.hdr.setup_data;
> @@ -153,6 

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Michael S. Tsirkin
On Wed, Jul 03, 2019 at 12:04:00AM +0200, Sergio Lopez wrote:
> On Tue, Jul 02, 2019 at 07:04:15PM +0100, Peter Maydell wrote:
> > On Tue, 2 Jul 2019 at 18:34, Sergio Lopez  wrote:
> > > Peter Maydell  writes:
> > > > Could we use virtio-pci instead of virtio-mmio? virtio-mmio is
> > > > a bit deprecated and tends not to support all the features that
> > > > virtio-pci does. It was introduced mostly as a stopgap while we
> > > > didn't have pci support in the aarch64 virt machine, and remains
> > > > for legacy "we don't like to break existing working setups" rather
> > > > than as a recommended config for new systems.
> > >
> > > Using virtio-pci implies keeping PCI and ACPI support, defeating a
> > > significant part of microvm's purpose.
> > >
> > > What are the issues with the current state of virtio-mmio? Is there a
> > > way I can help to improve the situation?
> > 
> > Off the top of my head:
> >  * limitations on numbers of devices
> >  * no hotplug support
> >  * unlike PCI, it's not probeable, so you have to tell the
> >guest where all the transports are using device tree or
> >some similar mechanism
> >  * you need one IRQ line per transport, which restricts how
> >many you can have
> >  * it's only virtio-0.9, it doesn't support any of the new
> >virtio-1.0 functionality
> >  * it is broadly not really maintained in QEMU (and I think
> >not really in the kernel either? not sure), because we'd
> >rather not have to maintain two mechanisms for doing virtio
> >when virtio-pci is clearly better than virtio-mmio
> 
> Some of these are design issues, but others can be improved with a bit
> of work.
> 
> As for the maintenance burden, I volunteer myself to help with that, so
> it won't have an impact on other developers and/or projects.
> 
> Sergio.

OK so please start with adding virtio 1 support. Guest bits
have been ready for years now.

-- 
MST



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Sergio Lopez

Paolo Bonzini  writes:

> On 23/07/19 12:01, Paolo Bonzini wrote:
>> The number of buses is determined by the firmware, not by QEMU, so
>> fw_cfg would not be the right interface.  In fact (as I have just
>> learnt) lastbus is an x86-specific option that overrides the last bus
>> returned by SeaBIOS's handle_1ab101.
>> 
>> So the next step could be to figure out what is the lastbus returned by
>> handle_1ab101 and possibly why it isn't zero.
>
> Some update:
>
> - for 64-bit, PCIBIOS (and thus handle_1ab101) is not called.  PCIBIOS is
> only used by 32-bit kernels.  As a side effect, PCI expander bridges do not
> work on 32-bit kernels with ACPI disabled, because they are located beyond
> pcibios_last_bus (with ACPI enabled, the DSDT exposes them).
>
> - for -M pc, pcibios_last_bus in Linux remains -1 and no "legacy scanning" is 
> done.
>
> - for -M q35, pcibios_last_bus in Linux is set based on the size of the 
> MMCONFIG aperture and Linux ends up scanning all 32*255 (bus,dev) pairs 
> for buses above 0.
>
> Here is a patch that only scans devfn==0, which should mostly remove the need
> for pci=lastbus=0.  (Testing is welcome).

I just gave it a try. These are the results (average of 10 consecutive runs, in ms):

 - Unpatched kernel:

Avg
 qemu_init_end: 75.207386
 linux_start_kernel: 115.056767 (+39.849381)
 linux_start_user: 241.020113 (+125.963346)

 - Unpatched kernel with pci=lastbus=0:

Avg
 qemu_init_end: 75.468282
 linux_start_kernel: 115.189322 (+39.72104)
 linux_start_user: 192.404823 (+77.215501)

 - Patched kernel (without pci=lastbus=0):

Avg
 qemu_init_end: 75.605627
 linux_start_kernel: 115.656557 (+40.05093)
 linux_start_user: 192.857655 (+77.201098)

Looks fine to me. There must be an extra cost in the patched kernel
vs. using pci=lastbus=0, but it's so low that it's hard to catch in the
average numbers.

> Actually, KVM could probably avoid the scanning altogether.  The only 
> "hidden" root
> buses we expect are from PCI expander bridges and if you found an MMCONFIG 
> area
> through the ACPI MCFG table, you can also use the DSDT to find PCI expander 
> bridges.
> However, I am being conservative.
>
> A possible alternative could be a mechanism whereby the vmlinuz real mode 
> entry
> point, or the 32-bit PVH entry point, fetch lastbus and they pass it to the
> kernel via the vmlinuz or PVH boot information structs.  However, I don't 
> think
> that's very useful, and there is some risk of breaking real hardware too.
>
> Paolo
>
> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> index 73bb404f4d2a..17012aa60d22 100644
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -61,6 +61,7 @@ enum pci_bf_sort_state {
>  extern struct pci_ops pci_root_ops;
>  
>  void pcibios_scan_specific_bus(int busn);
> +void pcibios_scan_bus_by_device(int busn);
>  
>  /* pci-irq.c */
>  
> @@ -216,8 +217,10 @@ static inline void mmio_config_writel(void __iomem *pos, 
> u32 val)
>  # endif
>  # define x86_default_pci_init_irq    pcibios_irq_init
>  # define x86_default_pci_fixup_irqs  pcibios_fixup_irqs
> +# define x86_default_pci_scan_bus    pcibios_scan_bus_by_device
>  #else
>  # define x86_default_pci_init        NULL
>  # define x86_default_pci_init_irq    NULL
>  # define x86_default_pci_fixup_irqs  NULL
> +# define x86_default_pci_scan_bus    NULL
>  #endif
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index b85a7c54c6a1..4c3a0a17a600 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -251,6 +251,7 @@ struct x86_hyper_runtime {
>   * @save_sched_clock_state:  save state for sched_clock() on suspend
>   * @restore_sched_clock_state:   restore state for sched_clock() on 
> resume
>   * @apic_post_init:  adjust apic if needed
> + * @pci_scan_bus:scan a PCI bus
>   * @legacy:  legacy features
>   * @set_legacy_features: override legacy features. Use of this callback
>   *   is highly discouraged. You should only need
> @@ -273,6 +274,7 @@ struct x86_platform_ops {
>   void (*save_sched_clock_state)(void);
>   void (*restore_sched_clock_state)(void);
>   void (*apic_post_init)(void);
> + void (*pci_scan_bus)(int busn);
>   struct x86_legacy_features legacy;
>   void (*set_legacy_features)(void);
>   struct x86_hyper_runtime hyper;
> diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
> index 6857b4577f17..b248d7036dd3 100644
> --- a/arch/x86/kernel/jailhouse.c
> +++ b/arch/x86/kernel/jailhouse.c
> @@ -11,12 +11,14 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -136,6 +138,22 @@ static int __init jailhouse_pci_arch_init(void)
>   return 0;
>  }
>  
> +static void jailhouse_pci_scan_bus_by_function(int busn)
> +{
> +   

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-24 Thread Stefano Garzarella



On Tue, Jul 23, 2019 at 1:30 PM Stefano Garzarella  wrote:
>
> On Tue, Jul 23, 2019 at 10:47:39AM +0100, Stefan Hajnoczi wrote:
> > On Tue, Jul 23, 2019 at 9:43 AM Sergio Lopez  wrote:
> > > Montes, Julio  writes:
> > >
> > > > On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote:
> > > >> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez  wrote:
> > > >> > Stefan Hajnoczi  writes:
> > > >> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> > > >> > > > Stefan Hajnoczi  writes:
> > > >> > > >
> > > >> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> > > >> > > >  --------------
> > > >> > > >  | Conclusion |
> > > >> > > >  --------------
> > > >> > > >
> > > >> > > > The average boot time of microvm is a third of Q35's (115ms vs.
> > > >> > > > 363ms),
> > > >> > > > and is smaller on all sections (QEMU initialization, firmware
> > > >> > > > overhead
> > > >> > > > and kernel start-to-user).
> > > >> > > >
> > > >> > > > Microvm's memory tree is also visibly simpler, significantly
> > > >> > > > reducing
> > > >> > > > the exposed surface to the guest.
> > > >> > > >
> > > >> > > > While we can certainly work on making Q35 smaller, I definitely
> > > >> > > > think
> > > >> > > > it's better (and way safer!) having a specialized machine type
> > > >> > > > for a
> > > >> > > > specific use case, than a minimal Q35 whose behavior
> > > >> > > > significantly
> > > >> > > > diverges from a conventional Q35.
> > > >> > >
> > > >> > > Interesting, so not a 10x difference!  This might be amenable to
> > > >> > > optimization.
> > > >> > >
> > > >> > > My concern with microvm is that it's so limited that few users
> > > >> > > will be
> > > >> > > able to benefit from the reduced attack surface and faster
> > > >> > > startup time.
> > > >> > > I think it's worth investigating slimming down Q35 further first.
> > > >> > >
> > > >> > > In terms of startup time the first step would be profiling Q35
> > > >> > > kernel
> > > >> > > startup to find out what's taking so long (firmware
> > > >> > > initialization, PCI
> > > >> > > probing, etc)?
> > > >> >
> > > >> > Some findings:
> > > >> >
> > > >> >  1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host")
> > > >> > saves a
> > > >> > whopping 120ms by avoiding the APIC timer calibration at
> > > >> > arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
> > > >> >
> > > >> > Average boot time with "-cpu host"
> > > >> >  qemu_init_end: 76.408950
> > > >> >  linux_start_kernel: 116.166142 (+39.757192)
> > > >> >  linux_start_user: 242.954347 (+126.788205)
> > > >> >
> > > >> > Average boot time with default "cpu"
> > > >> >  qemu_init_end: 77.467852
> > > >> >  linux_start_kernel: 116.688472 (+39.22062)
> > > >> >  linux_start_user: 363.033365 (+246.344893)
> > > >>
> > > >> \o/
> > > >>
> > > >> >  2. The other 130ms are a direct result of PCI and ACPI presence
> > > >> > (tested
> > > >> > with a kernel without support for those elements). I'll publish
> > > >> > some
> > > >> > detailed numbers next week.
> > > >>
> > > >> Here are the Kata Containers kernel parameters:
> > > >>
> > > >> var kernelParams = []Param{
> > > >> {"tsc", "reliable"},
> > > >> {"no_timer_check", ""},
> > > >> {"rcupdate.rcu_expedited", "1"},
> > > >> {"i8042.direct", "1"},
> > > >> {"i8042.dumbkbd", "1"},
> > > >> {"i8042.nopnp", "1"},
> > > >> {"i8042.noaux", "1"},
> > > >> {"noreplace-smp", ""},
> > > >> {"reboot", "k"},
> > > >> {"console", "hvc0"},
> > > >> {"console", "hvc1"},
> > > >> {"iommu", "off"},
> > > >> {"cryptomgr.notests", ""},
> > > >> {"net.ifnames", "0"},
> > > >> {"pci", "lastbus=0"},
> > > >> }
> > > >>
> > > >> pci lastbus=0 looks interesting and so do some of the others :).
> > > >>
> > > >
> > > > yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35,
> > > > kernel won't scan the 255.. buses :)
> > >
> > > I can confirm that adding pci=lastbus=0 makes a significant
> > improvement. In fact, it is the only option from Kata's kernel parameter
> > > list that has an impact, probably because the kernel is already quite
> > > minimalistic.
> > >
> > > Average boot time with "-cpu host" and "pci=lastbus=0"
> > >  qemu_init_end: 73.711569
> > >  linux_start_kernel: 113.414311 (+39.702742)
> > >  linux_start_user: 190.949939 (+77.535628)
> > >
> > That's still ~40% slower than microvm, and the gap quickly widens
> > > when adding more PCI devices (each one adds 10-15ms), but it's certainly
> > > an improvement over the original numbers.
> > >
> > > On the other hand, there isn't much we can do here from QEMU's
> > > perspective, as this is basically Guest OS tuning.
> >
> > fw_cfg could expose this information so guest kernels know when to
> > stop enumerating the PCI bus.  This would make all PCI guests with new
> > kernels boot ~50 ms faster, regardless of machine type.

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-24 Thread Paolo Bonzini
On 23/07/19 12:01, Paolo Bonzini wrote:
> The number of buses is determined by the firmware, not by QEMU, so
> fw_cfg would not be the right interface.  In fact (as I have just
> learnt) lastbus is an x86-specific option that overrides the last bus
> returned by SeaBIOS's handle_1ab101.
> 
> So the next step could be to figure out what is the lastbus returned by
> handle_1ab101 and possibly why it isn't zero.

Some update:

- for 64-bit, PCIBIOS (and thus handle_1ab101) is not called.  PCIBIOS is
only used by 32-bit kernels.  As a side effect, PCI expander bridges do not
work on 32-bit kernels with ACPI disabled, because they are located beyond
pcibios_last_bus (with ACPI enabled, the DSDT exposes them).

- for -M pc, pcibios_last_bus in Linux remains -1 and no "legacy scanning" is 
done.

- for -M q35, pcibios_last_bus in Linux is set based on the size of the 
MMCONFIG aperture and Linux ends up scanning all 32*255 (bus,dev) pairs 
for buses above 0.
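
(That is 32 * 255 = 8160 vendor-ID probes just to learn that the extra
buses are empty; probing only devfn == 0 on each bus cuts it to 255.)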

Here is a patch that only scans devfn==0, which should mostly remove the need
for pci=lastbus=0.  (Testing is welcome).

Actually, KVM could probably avoid the scanning altogether.  The only "hidden" 
root
buses we expect are from PCI expander bridges and if you found an MMCONFIG area
through the ACPI MCFG table, you can also use the DSDT to find PCI expander 
bridges.
However, I am being conservative.

A possible alternative could be a mechanism whereby the vmlinuz real mode entry
point, or the 32-bit PVH entry point, fetch lastbus and they pass it to the
kernel via the vmlinuz or PVH boot information structs.  However, I don't think
that's very useful, and there is some risk of breaking real hardware too.

Paolo

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index 73bb404f4d2a..17012aa60d22 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -61,6 +61,7 @@ enum pci_bf_sort_state {
 extern struct pci_ops pci_root_ops;
 
 void pcibios_scan_specific_bus(int busn);
+void pcibios_scan_bus_by_device(int busn);
 
 /* pci-irq.c */
 
@@ -216,8 +217,10 @@ static inline void mmio_config_writel(void __iomem *pos, 
u32 val)
 # endif
 # define x86_default_pci_init_irq  pcibios_irq_init
 # define x86_default_pci_fixup_irqs  pcibios_fixup_irqs
+# define x86_default_pci_scan_bus  pcibios_scan_bus_by_device
 #else
 # define x86_default_pci_init  NULL
 # define x86_default_pci_init_irq  NULL
 # define x86_default_pci_fixup_irqs  NULL
+# define x86_default_pci_scan_bus  NULL
 #endif
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index b85a7c54c6a1..4c3a0a17a600 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -251,6 +251,7 @@ struct x86_hyper_runtime {
  * @save_sched_clock_state:save state for sched_clock() on suspend
  * @restore_sched_clock_state: restore state for sched_clock() on resume
  * @apic_post_init:adjust apic if needed
+ * @pci_scan_bus:  scan a PCI bus
  * @legacy:legacy features
  * @set_legacy_features:   override legacy features. Use of this callback
  * is highly discouraged. You should only need
@@ -273,6 +274,7 @@ struct x86_platform_ops {
void (*save_sched_clock_state)(void);
void (*restore_sched_clock_state)(void);
void (*apic_post_init)(void);
+   void (*pci_scan_bus)(int busn);
struct x86_legacy_features legacy;
void (*set_legacy_features)(void);
struct x86_hyper_runtime hyper;
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index 6857b4577f17..b248d7036dd3 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -11,12 +11,14 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -136,6 +138,22 @@ static int __init jailhouse_pci_arch_init(void)
return 0;
 }
 
+static void jailhouse_pci_scan_bus_by_function(int busn)
+{
+int devfn;
+u32 l;
+
+for (devfn = 0; devfn < 256; devfn++) {
+if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
+l != 0x0000 && l != 0xffff) {
+DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
+pr_info("PCI: Discovered peer bus %02x\n", busn);
+pcibios_scan_root(busn);
+return;
+}
+}
+}
+
 static void __init jailhouse_init_platform(void)
 {
u64 pa_data = boot_params.hdr.setup_data;
@@ -153,6 +171,7 @@ static void __init jailhouse_init_platform(void)
x86_platform.legacy.rtc = 0;
x86_platform.legacy.warm_reset  = 0;
x86_platform.legacy.i8042   = X86_LEGACY_I8042_PLATFORM_ABSENT;
+   x86_platform.pci_scan_bus   = jailhouse_pci_scan_bus_by_function;
 

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-23 Thread Stefano Garzarella
On Tue, Jul 23, 2019 at 10:47:39AM +0100, Stefan Hajnoczi wrote:
> On Tue, Jul 23, 2019 at 9:43 AM Sergio Lopez  wrote:
> > Montes, Julio  writes:
> >
> > > On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote:
> > >> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez  wrote:
> > >> > Stefan Hajnoczi  writes:
> > >> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> > >> > > > Stefan Hajnoczi  writes:
> > >> > > >
> > >> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> > >> > > >  --------------
> > >> > > >  | Conclusion |
> > >> > > >  --------------
> > >> > > >
> > >> > > > The average boot time of microvm is a third of Q35's (115ms vs.
> > >> > > > 363ms),
> > >> > > > and is smaller on all sections (QEMU initialization, firmware
> > >> > > > overhead
> > >> > > > and kernel start-to-user).
> > >> > > >
> > >> > > > Microvm's memory tree is also visibly simpler, significantly
> > >> > > > reducing
> > >> > > > the exposed surface to the guest.
> > >> > > >
> > >> > > > While we can certainly work on making Q35 smaller, I definitely
> > >> > > > think
> > >> > > > it's better (and way safer!) having a specialized machine type
> > >> > > > for a
> > >> > > > specific use case, than a minimal Q35 whose behavior
> > >> > > > significantly
> > >> > > > diverges from a conventional Q35.
> > >> > >
> > >> > > Interesting, so not a 10x difference!  This might be amenable to
> > >> > > optimization.
> > >> > >
> > >> > > My concern with microvm is that it's so limited that few users
> > >> > > will be
> > >> > > able to benefit from the reduced attack surface and faster
> > >> > > startup time.
> > >> > > I think it's worth investigating slimming down Q35 further first.
> > >> > >
> > >> > > In terms of startup time the first step would be profiling Q35
> > >> > > kernel
> > >> > > startup to find out what's taking so long (firmware
> > >> > > initialization, PCI
> > >> > > probing, etc)?
> > >> >
> > >> > Some findings:
> > >> >
> > >> >  1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host")
> > >> > saves a
> > >> > whopping 120ms by avoiding the APIC timer calibration at
> > >> > arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
> > >> >
> > >> > Average boot time with "-cpu host"
> > >> >  qemu_init_end: 76.408950
> > >> >  linux_start_kernel: 116.166142 (+39.757192)
> > >> >  linux_start_user: 242.954347 (+126.788205)
> > >> >
> > >> > Average boot time with default "cpu"
> > >> >  qemu_init_end: 77.467852
> > >> >  linux_start_kernel: 116.688472 (+39.22062)
> > >> >  linux_start_user: 363.033365 (+246.344893)
> > >>
> > >> \o/
> > >>
> > >> >  2. The other 130ms are a direct result of PCI and ACPI presence
> > >> > (tested
> > >> > with a kernel without support for those elements). I'll publish
> > >> > some
> > >> > detailed numbers next week.
> > >>
> > >> Here are the Kata Containers kernel parameters:
> > >>
> > >> var kernelParams = []Param{
> > >> {"tsc", "reliable"},
> > >> {"no_timer_check", ""},
> > >> {"rcupdate.rcu_expedited", "1"},
> > >> {"i8042.direct", "1"},
> > >> {"i8042.dumbkbd", "1"},
> > >> {"i8042.nopnp", "1"},
> > >> {"i8042.noaux", "1"},
> > >> {"noreplace-smp", ""},
> > >> {"reboot", "k"},
> > >> {"console", "hvc0"},
> > >> {"console", "hvc1"},
> > >> {"iommu", "off"},
> > >> {"cryptomgr.notests", ""},
> > >> {"net.ifnames", "0"},
> > >> {"pci", "lastbus=0"},
> > >> }
> > >>
> > >> pci lastbus=0 looks interesting and so do some of the others :).
> > >>
> > >
> > > yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35,
> > > kernel won't scan the 255.. buses :)
> >
> > I can confirm that adding pci=lastbus=0 makes a significant
> > improvement. In fact, it is the only option from Kata's kernel parameter
> > list that has an impact, probably because the kernel is already quite
> > minimalistic.
> >
> > Average boot time with "-cpu host" and "pci=lastbus=0"
> >  qemu_init_end: 73.711569
> >  linux_start_kernel: 113.414311 (+39.702742)
> >  linux_start_user: 190.949939 (+77.535628)
> >
> > That's still ~40% slower than microvm, and the gap quickly widens
> > when adding more PCI devices (each one adds 10-15ms), but it's certainly
> > an improvement over the original numbers.
> >
> > On the other hand, there isn't much we can do here from QEMU's
> > perspective, as this is basically Guest OS tuning.
> 
> fw_cfg could expose this information so guest kernels know when to
> stop enumerating the PCI bus.  This would make all PCI guests with new
> kernels boot ~50 ms faster, regardless of machine type.
> 
> The difference between microvm and tuned Q35 is 76 ms now.
> 
> microvm:
> qemu_init_end: 64.043264
> linux_start_kernel: 65.481782 (+1.438518)
> linux_start_user: 114.938353 (+49.456571)
> 
> Q35 with -cpu host and pci=lastbus=0:
> qemu_init_end: 73.711569
> linux_start_kernel: 113.414311 

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-23 Thread Paolo Bonzini
On 23/07/19 11:47, Stefan Hajnoczi wrote:
> fw_cfg could expose this information so guest kernels know when to
> stop enumerating the PCI bus.  This would make all PCI guests with new
> kernels boot ~50 ms faster, regardless of machine type.

The number of buses is determined by the firmware, not by QEMU, so
fw_cfg would not be the right interface.  In fact (as I have just
learnt) lastbus is an x86-specific option that overrides the last bus
returned by SeaBIOS's handle_1ab101.

So the next step could be to figure out what is the lastbus returned by
handle_1ab101 and possibly why it isn't zero.

Paolo

> The difference between microvm and tuned Q35 is 76 ms now.
> 
> microvm:
> qemu_init_end: 64.043264
> linux_start_kernel: 65.481782 (+1.438518)
> linux_start_user: 114.938353 (+49.456571)
> 
> Q35 with -cpu host and pci=lastbus=0:
> qemu_init_end: 73.711569
> linux_start_kernel: 113.414311 (+39.702742)
> linux_start_user: 190.949939 (+77.535628)
> 
> There is a ~39 ms difference before linux_start_kernel.  SeaBIOS is
> loading the PVH Option ROM.
> 
> Stefano: any recommendations for profiling or tuning SeaBIOS?




Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-23 Thread Stefan Hajnoczi
On Tue, Jul 23, 2019 at 9:43 AM Sergio Lopez  wrote:
> Montes, Julio  writes:
>
> > On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote:
> >> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez  wrote:
> >> > Stefan Hajnoczi  writes:
> >> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> >> > > > Stefan Hajnoczi  writes:
> >> > > >
> >> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> >> > > >  --------------
> >> > > >  | Conclusion |
> >> > > >  --------------
> >> > > >
> >> > > > The average boot time of microvm is a third of Q35's (115ms
> >> > > > vs. 363ms), and it is shorter in all sections (QEMU
> >> > > > initialization, firmware overhead and kernel start-to-user).
> >> > > >
> >> > > > Microvm's memory tree is also visibly simpler, significantly
> >> > > > reducing the surface exposed to the guest.
> >> > > >
> >> > > > While we can certainly work on making Q35 smaller, I definitely
> >> > > > think it's better (and way safer!) to have a specialized machine
> >> > > > type for a specific use case than a minimal Q35 whose behavior
> >> > > > significantly diverges from a conventional Q35.
> >> > >
> >> > > Interesting, so not a 10x difference!  This might be amenable to
> >> > > optimization.
> >> > >
> >> > > My concern with microvm is that it's so limited that few users
> >> > > will be able to benefit from the reduced attack surface and faster
> >> > > startup time.
> >> > > I think it's worth investigating slimming down Q35 further first.
> >> > >
> >> > > In terms of startup time the first step would be profiling Q35
> >> > > kernel startup to find out what's taking so long (firmware
> >> > > initialization, PCI probing, etc)?
> >> >
> >> > Some findings:
> >> >
> >> >  1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host")
> >> > saves a whopping 120ms by avoiding the APIC timer calibration at
> >> > arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
> >> >
> >> > Average boot time with "-cpu host"
> >> >  qemu_init_end: 76.408950
> >> >  linux_start_kernel: 116.166142 (+39.757192)
> >> >  linux_start_user: 242.954347 (+126.788205)
> >> >
> >> > Average boot time with the default "-cpu"
> >> >  qemu_init_end: 77.467852
> >> >  linux_start_kernel: 116.688472 (+39.22062)
> >> >  linux_start_user: 363.033365 (+246.344893)
> >>
> >> \o/
> >>
> >> >  2. The other 130ms are a direct result of PCI and ACPI presence
> >> > (tested with a kernel without support for those elements). I'll
> >> > publish some detailed numbers next week.
> >>
> >> Here are the Kata Containers kernel parameters:
> >>
> >> var kernelParams = []Param{
> >> {"tsc", "reliable"},
> >> {"no_timer_check", ""},
> >> {"rcupdate.rcu_expedited", "1"},
> >> {"i8042.direct", "1"},
> >> {"i8042.dumbkbd", "1"},
> >> {"i8042.nopnp", "1"},
> >> {"i8042.noaux", "1"},
> >> {"noreplace-smp", ""},
> >> {"reboot", "k"},
> >> {"console", "hvc0"},
> >> {"console", "hvc1"},
> >> {"iommu", "off"},
> >> {"cryptomgr.notests", ""},
> >> {"net.ifnames", "0"},
> >> {"pci", "lastbus=0"},
> >> }
> >>
> >> pci lastbus=0 looks interesting and so do some of the others :).
> >>
> >
> > yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35,
> > the kernel won't scan the remaining 255 buses :)
>
> I can confirm that adding pci=lastbus=0 makes a significant
> improvement. In fact, it is the only option from Kata's kernel parameter
> list that has an impact, probably because the kernel is already quite
> minimalistic.
>
> Average boot time with "-cpu host" and "pci=lastbus=0"
>  qemu_init_end: 73.711569
>  linux_start_kernel: 113.414311 (+39.702742)
>  linux_start_user: 190.949939 (+77.535628)
>
> That's still ~40% slower than microvm, and the gap quickly widens
> when adding more PCI devices (each one adds 10-15ms), but it's certainly
> an improvement over the original numbers.
>
> On the other hand, there isn't much we can do here from QEMU's
> perspective, as this is basically Guest OS tuning.

fw_cfg could expose this information so guest kernels know when to
stop enumerating the PCI bus.  This would make all PCI guests with new
kernels boot ~50 ms faster, regardless of machine type.
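
As a rough sketch of how such a hint could be published and consumed (the
entry name is purely hypothetical; the sysfs interface is the one provided
by Linux's qemu_fw_cfg module):

    # Host side: publish a hypothetical lastbus hint through fw_cfg.
    qemu-system-x86_64 ... -fw_cfg name=opt/org.qemu/pci-lastbus,string=0

    # Guest side: read it back via sysfs.
    modprobe qemu_fw_cfg
    cat /sys/firmware/qemu_fw_cfg/by_name/opt/org.qemu/pci-lastbus/raw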

The difference between microvm and tuned Q35 is 76 ms now.

microvm:
qemu_init_end: 64.043264
linux_start_kernel: 65.481782 (+1.438518)
linux_start_user: 114.938353 (+49.456571)

Q35 with -cpu host and pci=lastbus=0:
qemu_init_end: 73.711569
linux_start_kernel: 113.414311 (+39.702742)
linux_start_user: 190.949939 (+77.535628)

There is a ~39 ms difference before linux_start_kernel.  SeaBIOS is
loading the PVH Option ROM.

Stefano: any recommendations for profiling or tuning SeaBIOS?

Stefan



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-23 Thread Sergio Lopez

Montes, Julio  writes:

> On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote:
>> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez  wrote:
>> > Stefan Hajnoczi  writes:
>> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
>> > > > Stefan Hajnoczi  writes:
>> > > > 
>> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
>> > > >  --------------
>> > > >  | Conclusion |
>> > > >  --------------
>> > > > 
>> > > > The average boot time of microvm is a third of Q35's (115ms
>> > > > vs. 363ms), and it is shorter in all sections (QEMU
>> > > > initialization, firmware overhead and kernel start-to-user).
>> > > >
>> > > > Microvm's memory tree is also visibly simpler, significantly
>> > > > reducing the surface exposed to the guest.
>> > > >
>> > > > While we can certainly work on making Q35 smaller, I definitely
>> > > > think it's better (and way safer!) to have a specialized machine
>> > > > type for a specific use case than a minimal Q35 whose behavior
>> > > > significantly diverges from a conventional Q35.
>> > > 
>> > > Interesting, so not a 10x difference!  This might be amenable to
>> > > optimization.
>> > > 
>> > > My concern with microvm is that it's so limited that few users
>> > > will be able to benefit from the reduced attack surface and faster
>> > > startup time.
>> > > I think it's worth investigating slimming down Q35 further first.
>> > >
>> > > In terms of startup time the first step would be profiling Q35
>> > > kernel startup to find out what's taking so long (firmware
>> > > initialization, PCI probing, etc)?
>> > 
>> > Some findings:
>> > 
>> >  1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host")
>> > saves a whopping 120ms by avoiding the APIC timer calibration at
>> > arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
>> > 
>> > Average boot time with "-cpu host"
>> >  qemu_init_end: 76.408950
>> >  linux_start_kernel: 116.166142 (+39.757192)
>> >  linux_start_user: 242.954347 (+126.788205)
>> > 
>> > Average boot time with the default "-cpu"
>> >  qemu_init_end: 77.467852
>> >  linux_start_kernel: 116.688472 (+39.22062)
>> >  linux_start_user: 363.033365 (+246.344893)
>> 
>> \o/
>> 
>> >  2. The other 130ms are a direct result of PCI and ACPI presence
>> > (tested with a kernel without support for those elements). I'll
>> > publish some detailed numbers next week.
>> 
>> Here are the Kata Containers kernel parameters:
>> 
>> var kernelParams = []Param{
>> {"tsc", "reliable"},
>> {"no_timer_check", ""},
>> {"rcupdate.rcu_expedited", "1"},
>> {"i8042.direct", "1"},
>> {"i8042.dumbkbd", "1"},
>> {"i8042.nopnp", "1"},
>> {"i8042.noaux", "1"},
>> {"noreplace-smp", ""},
>> {"reboot", "k"},
>> {"console", "hvc0"},
>> {"console", "hvc1"},
>> {"iommu", "off"},
>> {"cryptomgr.notests", ""},
>> {"net.ifnames", "0"},
>> {"pci", "lastbus=0"},
>> }
>> 
>> pci lastbus=0 looks interesting and so do some of the others :).
>> 
>
> yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35,
> the kernel won't scan the remaining 255 buses :)

I can confirm that adding pci=lastbus=0 makes a significant
improvement. In fact, it is the only option from Kata's kernel parameter
list that has an impact, probably because the kernel is already quite
minimalistic.

Average boot time with "-cpu host" and "pci=lastbus=0"
 qemu_init_end: 73.711569
 linux_start_kernel: 113.414311 (+39.702742)
 linux_start_user: 190.949939 (+77.535628)

That's still ~40% slower than microvm, and the gap quickly widens
when adding more PCI devices (each one adds 10-15ms), but it's certainly
an improvement over the original numbers.

On the other hand, there isn't much we can do here from QEMU's
perspective, as this is basically Guest OS tuning.
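
For reference, this is roughly how the two tunings combine on the Q35
command line quoted further down this thread (paths and image names as in
that message; a sketch, not a new measurement setup):

./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -cpu host \
 -M q35,smbus=off,nvdimm=off,pit=off,vmport=off,sata=off,usb=off,graphics=off \
 -kernel /root/src/images/vmlinux-5.2 \
 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet pci=lastbus=0" \
 -smp 1 -nodefaults -no-user-config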

Sergio.




Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-19 Thread Montes, Julio
On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote:
> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez  wrote:
> > Stefan Hajnoczi  writes:
> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> > > > Stefan Hajnoczi  writes:
> > > > 
> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> > > >  --------------
> > > >  | Conclusion |
> > > >  --------------
> > > > 
> > > > The average boot time of microvm is a third of Q35's (115ms
> > > > vs. 363ms), and it is shorter in all sections (QEMU
> > > > initialization, firmware overhead and kernel start-to-user).
> > > >
> > > > Microvm's memory tree is also visibly simpler, significantly
> > > > reducing the surface exposed to the guest.
> > > >
> > > > While we can certainly work on making Q35 smaller, I definitely
> > > > think it's better (and way safer!) to have a specialized machine
> > > > type for a specific use case than a minimal Q35 whose behavior
> > > > significantly diverges from a conventional Q35.
> > > 
> > > Interesting, so not a 10x difference!  This might be amenable to
> > > optimization.
> > > 
> > > My concern with microvm is that it's so limited that few users
> > > will be able to benefit from the reduced attack surface and faster
> > > startup time.
> > > I think it's worth investigating slimming down Q35 further first.
> > >
> > > In terms of startup time the first step would be profiling Q35
> > > kernel startup to find out what's taking so long (firmware
> > > initialization, PCI probing, etc)?
> > 
> > Some findings:
> > 
> >  1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host")
> > saves a whopping 120ms by avoiding the APIC timer calibration at
> > arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
> > 
> > Average boot time with "-cpu host"
> >  qemu_init_end: 76.408950
> >  linux_start_kernel: 116.166142 (+39.757192)
> >  linux_start_user: 242.954347 (+126.788205)
> > 
> > Average boot time with the default "-cpu"
> >  qemu_init_end: 77.467852
> >  linux_start_kernel: 116.688472 (+39.22062)
> >  linux_start_user: 363.033365 (+246.344893)
> 
> \o/
> 
> >  2. The other 130ms are a direct result of PCI and ACPI presence
> > (tested with a kernel without support for those elements). I'll
> > publish some detailed numbers next week.
> 
> Here are the Kata Containers kernel parameters:
> 
> var kernelParams = []Param{
> {"tsc", "reliable"},
> {"no_timer_check", ""},
> {"rcupdate.rcu_expedited", "1"},
> {"i8042.direct", "1"},
> {"i8042.dumbkbd", "1"},
> {"i8042.nopnp", "1"},
> {"i8042.noaux", "1"},
> {"noreplace-smp", ""},
> {"reboot", "k"},
> {"console", "hvc0"},
> {"console", "hvc1"},
> {"iommu", "off"},
> {"cryptomgr.notests", ""},
> {"net.ifnames", "0"},
> {"pci", "lastbus=0"},
> }
> 
> pci lastbus=0 looks interesting and so do some of the others :).
> 

yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35,
the kernel won't scan the remaining 255 buses :)

> Stefan
> 


Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-19 Thread Stefan Hajnoczi
On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez  wrote:
> Stefan Hajnoczi  writes:
> > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> >>
> >> Stefan Hajnoczi  writes:
> >>
> >> > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> >>  --------------
> >>  | Conclusion |
> >>  --------------
> >>
> >> The average boot time of microvm is a third of Q35's (115ms vs. 363ms),
> >> and it is shorter in all sections (QEMU initialization, firmware overhead
> >> and kernel start-to-user).
> >>
> >> Microvm's memory tree is also visibly simpler, significantly reducing
> >> the surface exposed to the guest.
> >>
> >> While we can certainly work on making Q35 smaller, I definitely think
> >> it's better (and way safer!) to have a specialized machine type for a
> >> specific use case than a minimal Q35 whose behavior significantly
> >> diverges from a conventional Q35.
> >
> > Interesting, so not a 10x difference!  This might be amenable to
> > optimization.
> >
> > My concern with microvm is that it's so limited that few users will be
> > able to benefit from the reduced attack surface and faster startup time.
> > I think it's worth investigating slimming down Q35 further first.
> >
> > In terms of startup time the first step would be profiling Q35 kernel
> > startup to find out what's taking so long (firmware initialization, PCI
> > probing, etc)?
>
> Some findings:
>
>  1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host") saves a
> whopping 120ms by avoiding the APIC timer calibration at
> arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
>
> Average boot time with "-cpu host"
>  qemu_init_end: 76.408950
>  linux_start_kernel: 116.166142 (+39.757192)
>  linux_start_user: 242.954347 (+126.788205)
>
> Average boot time with the default "-cpu"
>  qemu_init_end: 77.467852
>  linux_start_kernel: 116.688472 (+39.22062)
>  linux_start_user: 363.033365 (+246.344893)

\o/

>  2. The other 130ms are a direct result of PCI and ACPI presence (tested
> with a kernel without support for those elements). I'll publish some
> detailed numbers next week.

Here are the Kata Containers kernel parameters:

var kernelParams = []Param{
{"tsc", "reliable"},
{"no_timer_check", ""},
{"rcupdate.rcu_expedited", "1"},
{"i8042.direct", "1"},
{"i8042.dumbkbd", "1"},
{"i8042.nopnp", "1"},
{"i8042.noaux", "1"},
{"noreplace-smp", ""},
{"reboot", "k"},
{"console", "hvc0"},
{"console", "hvc1"},
{"iommu", "off"},
{"cryptomgr.notests", ""},
{"net.ifnames", "0"},
{"pci", "lastbus=0"},
}
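
Assuming each Param is rendered as key=value, or as a bare key when the
value is empty, that list flattens to a kernel command line like this
(logically one line, wrapped here for readability):

    tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1
    i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k
    console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0
    pci=lastbus=0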

pci lastbus=0 looks interesting and so do some of the others :).
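
On the TSC_DEADLINE point, a quick sanity check from inside the guest is
to look for the corresponding flag in /proc/cpuinfo (this is the flag
name Linux uses on x86):

    grep -c tsc_deadline_timer /proc/cpuinfo   # non-zero when the flag is exposed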

Stefan



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-19 Thread Sergio Lopez

Stefan Hajnoczi  writes:

> On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
>> 
>> Stefan Hajnoczi  writes:
>> 
>> > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
>> >> Microvm is a machine type inspired by both NEMU and Firecracker, and
>> >> modeled after the machine implemented by the latter.
>> >> 
>> >> Its main purpose is to provide users with a KVM-only machine type with
>> >> fast boot times, minimal attack surface (measured as the number of IO
>> >> ports and MMIO regions exposed to the Guest) and small footprint
>> >> (especially when combined with the ongoing QEMU modularization effort).
>> >> 
>> >> Normally, other than the device support provided by KVM itself,
>> >> microvm only supports virtio-mmio devices. Microvm also includes a
>> >> legacy mode, which adds an ISA bus with a 16550A serial port, useful
>> >> for being able to see the early boot kernel messages.
>> >> 
>> >> Microvm only supports booting PVH-enabled Linux ELF images. Booting
>> >> other PVH-enabled kernels may be possible, but due to the lack of ACPI
>> >> and firmware, we're relying on the command line for specifying the
>> >> location of the virtio-mmio transports. If there's interest in
>> >> using this machine type with other kernels, we'll try to find some
>> >> kind of middle ground solution.
>> >> 
>> >> This is the list of the exposed IO ports and MMIO regions when running
>> >> in non-legacy mode:
>> >> 
>> >> address-space: memory
>> >> d000-d1ff (prio 0, i/o): virtio-mmio
>> >> d200-d3ff (prio 0, i/o): virtio-mmio
>> >> d400-d5ff (prio 0, i/o): virtio-mmio
>> >> d600-d7ff (prio 0, i/o): virtio-mmio
>> >> d800-d9ff (prio 0, i/o): virtio-mmio
>> >> da00-dbff (prio 0, i/o): virtio-mmio
>> >> dc00-ddff (prio 0, i/o): virtio-mmio
>> >> de00-dfff (prio 0, i/o): virtio-mmio
>> >> fee0-feef (prio 4096, i/o): kvm-apic-msi
>> >> 
>> >> address-space: I/O
>> >>   - (prio 0, i/o): io
>> >> 0020-0021 (prio 0, i/o): kvm-pic
>> >> 0040-0043 (prio 0, i/o): kvm-pit
>> >> 007e-007f (prio 0, i/o): kvmvapic
>> >> 00a0-00a1 (prio 0, i/o): kvm-pic
>> >> 04d0-04d0 (prio 0, i/o): kvm-elcr
>> >> 04d1-04d1 (prio 0, i/o): kvm-elcr
>> >> 
>> >> A QEMU instance with the microvm machine type can be invoked this way:
>> >> 
>> >>  - Normal mode:
>> >> 
>> >> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
>> >>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>> >>  -nodefaults -no-user-config \
>> >>  -chardev pty,id=virtiocon0,server \
>> >>  -device virtio-serial-device \
>> >>  -device virtconsole,chardev=virtiocon0 \
>> >>  -drive id=test,file=test.img,format=raw,if=none \
>> >>  -device virtio-blk-device,drive=test \
>> >>  -netdev tap,id=tap0,script=no,downscript=no \
>> >>  -device virtio-net-device,netdev=tap0
>> >> 
>> >>  - Legacy mode:
>> >> 
>> >> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>> >>  -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>> >>  -nodefaults -no-user-config \
>> >>  -drive id=test,file=test.img,format=raw,if=none \
>> >>  -device virtio-blk-device,drive=test \
>> >>  -netdev tap,id=tap0,script=no,downscript=no \
>> >>  -device virtio-net-device,netdev=tap0 \
>> >>  -serial stdio
>> >
>> > Please post metrics that compare this against a minimal Q35.
>> >
>> > With qboot it was later found that SeaBIOS can achieve comparable boot
>> > times, so it wasn't worth maintaining qboot.
>> >
>> > Data is needed to show that microvm is really a significant improvement
>> > over a minimal Q35.
>> 
>> I've just run some numbers using Stefano Garzarella's qemu-boot-time
>> scripts [1] on a server with 2xIntel Xeon Silver 4114 2.20GHz, using the
>> upstream QEMU (474f3938d79ab36b9231c9ad3b5a9314c2aeacde) built with
>> minimal features [2]. The VM boots a minimal kernel [3] without initrd,
>> using a kata container image as root via virtio-blk (though this isn't
>> really relevant, as we're just taking measurements until the kernel is
>> about to exec init).
>> 
>> To try to make the comparison as fair as possible, I've used a minimal
>> q35 machine with as few devices as possible. Disabling HPET and PIT at
>> the same time caused the kernel to get stuck on boot, so I ran two
>> iterations, one without HPET and the other without PIT:
>> 
>> 
>>  -----------------
>>  | Q35 with HPET |
>>  -----------------
>> 
>> Command line:
>> 
>> ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M 
>> q35,smbus=off,nvdimm=off,pit=off,vmport=off,sata=off,usb=off,graphics=off 
>> -kernel /root/src/images/vmlinux-5.2 -append 

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-19 Thread Stefan Hajnoczi
On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> 
> Stefan Hajnoczi  writes:
> 
> > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> >> Microvm is a machine type inspired by both NEMU and Firecracker, and
> >> modeled after the machine implemented by the latter.
> >> 
> >> Its main purpose is to provide users with a KVM-only machine type with
> >> fast boot times, minimal attack surface (measured as the number of IO
> >> ports and MMIO regions exposed to the Guest) and small footprint
> >> (especially when combined with the ongoing QEMU modularization effort).
> >> 
> >> Normally, other than the device support provided by KVM itself,
> >> microvm only supports virtio-mmio devices. Microvm also includes a
> >> legacy mode, which adds an ISA bus with a 16550A serial port, useful
> >> for being able to see the early boot kernel messages.
> >> 
> >> Microvm only supports booting PVH-enabled Linux ELF images. Booting
> >> other PVH-enabled kernels may be possible, but due to the lack of ACPI
> >> and firmware, we're relying on the command line for specifying the
> >> location of the virtio-mmio transports. If there's interest in
> >> using this machine type with other kernels, we'll try to find some
> >> kind of middle ground solution.
> >> 
> >> This is the list of the exposed IO ports and MMIO regions when running
> >> in non-legacy mode:
> >> 
> >> address-space: memory
> >> d000-d1ff (prio 0, i/o): virtio-mmio
> >> d200-d3ff (prio 0, i/o): virtio-mmio
> >> d400-d5ff (prio 0, i/o): virtio-mmio
> >> d600-d7ff (prio 0, i/o): virtio-mmio
> >> d800-d9ff (prio 0, i/o): virtio-mmio
> >> da00-dbff (prio 0, i/o): virtio-mmio
> >> dc00-ddff (prio 0, i/o): virtio-mmio
> >> de00-dfff (prio 0, i/o): virtio-mmio
> >> fee0-feef (prio 4096, i/o): kvm-apic-msi
> >> 
> >> address-space: I/O
> >>   - (prio 0, i/o): io
> >> 0020-0021 (prio 0, i/o): kvm-pic
> >> 0040-0043 (prio 0, i/o): kvm-pit
> >> 007e-007f (prio 0, i/o): kvmvapic
> >> 00a0-00a1 (prio 0, i/o): kvm-pic
> >> 04d0-04d0 (prio 0, i/o): kvm-elcr
> >> 04d1-04d1 (prio 0, i/o): kvm-elcr
> >> 
> >> A QEMU instance with the microvm machine type can be invoked this way:
> >> 
> >>  - Normal mode:
> >> 
> >> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
> >>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
> >>  -nodefaults -no-user-config \
> >>  -chardev pty,id=virtiocon0,server \
> >>  -device virtio-serial-device \
> >>  -device virtconsole,chardev=virtiocon0 \
> >>  -drive id=test,file=test.img,format=raw,if=none \
> >>  -device virtio-blk-device,drive=test \
> >>  -netdev tap,id=tap0,script=no,downscript=no \
> >>  -device virtio-net-device,netdev=tap0
> >> 
> >>  - Legacy mode:
> >> 
> >> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
> >>  -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
> >>  -nodefaults -no-user-config \
> >>  -drive id=test,file=test.img,format=raw,if=none \
> >>  -device virtio-blk-device,drive=test \
> >>  -netdev tap,id=tap0,script=no,downscript=no \
> >>  -device virtio-net-device,netdev=tap0 \
> >>  -serial stdio
> >
> > Please post metrics that compare this against a minimal Q35.
> >
> > With qboot it was later found that SeaBIOS can achieve comparable boot
> > times, so it wasn't worth maintaining qboot.
> >
> > Data is needed to show that microvm is really a significant improvement
> > over a minimal Q35.
> 
> I've just run some numbers using Stefano Garzarella's qemu-boot-time
> scripts [1] on a server with 2xIntel Xeon Silver 4114 2.20GHz, using the
> upstream QEMU (474f3938d79ab36b9231c9ad3b5a9314c2aeacde) built with
> minimal features [2]. The VM boots a minimal kernel [3] without initrd,
> using a kata container image as root via virtio-blk (though this isn't
> really relevant, as we're just taking measurements until the kernel is
> about to exec init).
> 
> To try to make the comparison as fair as possible, I've used a minimal
> q35 machine with as few devices as possible. Disabling HPET and PIT at
> the same time caused the kernel to get stuck on boot, so I ran two
> iterations, one without HPET and the other without PIT:
> 
> 
>  -----------------
>  | Q35 with HPET |
>  -----------------
> 
> Command line:
> 
> ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M 
> q35,smbus=off,nvdimm=off,pit=off,vmport=off,sata=off,usb=off,graphics=off 
> -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 
> root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev 
> pty,id=virtiocon0,server -device 

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-18 Thread Sergio Lopez

Stefan Hajnoczi  writes:

> On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
>> Microvm is a machine type inspired by both NEMU and Firecracker, and
>> modeled after the machine implemented by the latter.
>> 
>> Its main purpose is to provide users with a KVM-only machine type with
>> fast boot times, minimal attack surface (measured as the number of IO
>> ports and MMIO regions exposed to the Guest) and small footprint
>> (especially when combined with the ongoing QEMU modularization effort).
>> 
>> Normally, other than the device support provided by KVM itself,
>> microvm only supports virtio-mmio devices. Microvm also includes a
>> legacy mode, which adds an ISA bus with a 16550A serial port, useful
>> for being able to see the early boot kernel messages.
>> 
>> Microvm only supports booting PVH-enabled Linux ELF images. Booting
>> other PVH-enabled kernels may be possible, but due to the lack of ACPI
>> and firmware, we're relying on the command line for specifying the
>> location of the virtio-mmio transports. If there's interest in
>> using this machine type with other kernels, we'll try to find some
>> kind of middle ground solution.
>> 
>> This is the list of the exposed IO ports and MMIO regions when running
>> in non-legacy mode:
>> 
>> address-space: memory
>> d000-d1ff (prio 0, i/o): virtio-mmio
>> d200-d3ff (prio 0, i/o): virtio-mmio
>> d400-d5ff (prio 0, i/o): virtio-mmio
>> d600-d7ff (prio 0, i/o): virtio-mmio
>> d800-d9ff (prio 0, i/o): virtio-mmio
>> da00-dbff (prio 0, i/o): virtio-mmio
>> dc00-ddff (prio 0, i/o): virtio-mmio
>> de00-dfff (prio 0, i/o): virtio-mmio
>> fee0-feef (prio 4096, i/o): kvm-apic-msi
>> 
>> address-space: I/O
>>   - (prio 0, i/o): io
>> 0020-0021 (prio 0, i/o): kvm-pic
>> 0040-0043 (prio 0, i/o): kvm-pit
>> 007e-007f (prio 0, i/o): kvmvapic
>> 00a0-00a1 (prio 0, i/o): kvm-pic
>> 04d0-04d0 (prio 0, i/o): kvm-elcr
>> 04d1-04d1 (prio 0, i/o): kvm-elcr
>> 
>> A QEMU instance with the microvm machine type can be invoked this way:
>> 
>>  - Normal mode:
>> 
>> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
>>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>>  -nodefaults -no-user-config \
>>  -chardev pty,id=virtiocon0,server \
>>  -device virtio-serial-device \
>>  -device virtconsole,chardev=virtiocon0 \
>>  -drive id=test,file=test.img,format=raw,if=none \
>>  -device virtio-blk-device,drive=test \
>>  -netdev tap,id=tap0,script=no,downscript=no \
>>  -device virtio-net-device,netdev=tap0
>> 
>>  - Legacy mode:
>> 
>> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>>  -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>>  -nodefaults -no-user-config \
>>  -drive id=test,file=test.img,format=raw,if=none \
>>  -device virtio-blk-device,drive=test \
>>  -netdev tap,id=tap0,script=no,downscript=no \
>>  -device virtio-net-device,netdev=tap0 \
>>  -serial stdio
>
> Please post metrics that compare this against a minimal Q35.
>
> With qboot it was later found that SeaBIOS can achieve comparable boot
> times, so it wasn't worth maintaining qboot.
>
> Data is needed to show that microvm is really a significant improvement
> over a minimal Q35.

I've just run some numbers using Stefano Garzarella's qemu-boot-time
scripts [1] on a server with 2xIntel Xeon Silver 4114 2.20GHz, using the
upstream QEMU (474f3938d79ab36b9231c9ad3b5a9314c2aeacde) built with
minimal features [2]. The VM boots a minimal kernel [3] without initrd,
using a kata container image as root via virtio-blk (though this isn't
really relevant, as we're just taking measurements until the kernel is
about to exec init).

To try to make the comparison as fair as possible, I've used a minimal
q35 machine with as few devices as possible. Disabling HPET and PIT at
the same time caused the kernel to get stuck on boot, so I ran two
iterations, one without HPET and the other without PIT:


 -----------------
 | Q35 with HPET |
 -----------------

Command line:

./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm \
 -M q35,smbus=off,nvdimm=off,pit=off,vmport=off,sata=off,usb=off,graphics=off \
 -kernel /root/src/images/vmlinux-5.2 \
 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" \
 -smp 1 -nodefaults -no-user-config \
 -chardev pty,id=virtiocon0,server \
 -device virtio-serial -device virtconsole,chardev=virtiocon0 \
 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none \
 -device virtio-blk,drive=test

Average boot times after 10 consecutive runs:

 qemu_init_end: 77.637936
 linux_start_kernel: 117.082526 

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-03 Thread Stefan Hajnoczi
On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> Microvm is a machine type inspired by both NEMU and Firecracker, and
> modeled after the machine implemented by the latter.
> 
> Its main purpose is to provide users with a KVM-only machine type with
> fast boot times, minimal attack surface (measured as the number of IO
> ports and MMIO regions exposed to the Guest) and small footprint
> (especially when combined with the ongoing QEMU modularization effort).
> 
> Normally, other than the device support provided by KVM itself,
> microvm only supports virtio-mmio devices. Microvm also includes a
> legacy mode, which adds an ISA bus with a 16550A serial port, useful
> for being able to see the early boot kernel messages.
> 
> Microvm only supports booting PVH-enabled Linux ELF images. Booting
> other PVH-enabled kernels may be possible, but due to the lack of ACPI
> and firmware, we're relying on the command line for specifying the
> location of the virtio-mmio transports. If there's interest in
> using this machine type with other kernels, we'll try to find some
> kind of middle ground solution.
> 
> This is the list of the exposed IO ports and MMIO regions when running
> in non-legacy mode:
> 
> address-space: memory
> d000-d1ff (prio 0, i/o): virtio-mmio
> d200-d3ff (prio 0, i/o): virtio-mmio
> d400-d5ff (prio 0, i/o): virtio-mmio
> d600-d7ff (prio 0, i/o): virtio-mmio
> d800-d9ff (prio 0, i/o): virtio-mmio
> da00-dbff (prio 0, i/o): virtio-mmio
> dc00-ddff (prio 0, i/o): virtio-mmio
> de00-dfff (prio 0, i/o): virtio-mmio
> fee0-feef (prio 4096, i/o): kvm-apic-msi
> 
> address-space: I/O
>   - (prio 0, i/o): io
> 0020-0021 (prio 0, i/o): kvm-pic
> 0040-0043 (prio 0, i/o): kvm-pit
> 007e-007f (prio 0, i/o): kvmvapic
> 00a0-00a1 (prio 0, i/o): kvm-pic
> 04d0-04d0 (prio 0, i/o): kvm-elcr
> 04d1-04d1 (prio 0, i/o): kvm-elcr
> 
> A QEMU instance with the microvm machine type can be invoked this way:
> 
>  - Normal mode:
> 
> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>  -nodefaults -no-user-config \
>  -chardev pty,id=virtiocon0,server \
>  -device virtio-serial-device \
>  -device virtconsole,chardev=virtiocon0 \
>  -drive id=test,file=test.img,format=raw,if=none \
>  -device virtio-blk-device,drive=test \
>  -netdev tap,id=tap0,script=no,downscript=no \
>  -device virtio-net-device,netdev=tap0
> 
>  - Legacy mode:
> 
> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>  -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>  -nodefaults -no-user-config \
>  -drive id=test,file=test.img,format=raw,if=none \
>  -device virtio-blk-device,drive=test \
>  -netdev tap,id=tap0,script=no,downscript=no \
>  -device virtio-net-device,netdev=tap0 \
>  -serial stdio

Please post metrics that compare this against a minimal Q35.

With qboot it was later found that SeaBIOS can achieve comparable boot
times, so it wasn't worth maintaining qboot.

Data is needed to show that microvm is really a significant improvement
over a minimal Q35.

Stefan




Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-02 Thread Sergio Lopez
On Tue, Jul 02, 2019 at 07:04:15PM +0100, Peter Maydell wrote:
> On Tue, 2 Jul 2019 at 18:34, Sergio Lopez  wrote:
> > Peter Maydell  writes:
> > > Could we use virtio-pci instead of virtio-mmio? virtio-mmio is
> > > a bit deprecated and tends not to support all the features that
> > > virtio-pci does. It was introduced mostly as a stopgap while we
> > > didn't have pci support in the aarch64 virt machine, and remains
> > > for legacy "we don't like to break existing working setups" rather
> > > than as a recommended config for new systems.
> >
> > Using virtio-pci implies keeping PCI and ACPI support, defeating a
> > significant part of microvm's purpose.
> >
> > What are the issues with the current state of virtio-mmio? Is there a
> > way I can help to improve the situation?
> 
> Off the top of my head:
>  * limitations on numbers of devices
>  * no hotplug support
>  * unlike PCI, it's not probeable, so you have to tell the
>guest where all the transports are using device tree or
>some similar mechanism
>  * you need one IRQ line per transport, which restricts how
>many you can have
>  * it's only virtio-0.9, it doesn't support any of the new
>virtio-1.0 functionality
>  * it is broadly not really maintained in QEMU (and I think
>not really in the kernel either? not sure), because we'd
>rather not have to maintain two mechanisms for doing virtio
>when virtio-pci is clearly better than virtio-mmio

Some of these are design issues, but others can be improved with a bit
of work.

As for the maintenance burden, I volunteer myself to help with that, so
it won't have an impact on other developers and/or projects.

Sergio.




Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-02 Thread Sergio Lopez

Peter Maydell  writes:

> On Tue, 2 Jul 2019 at 13:14, Sergio Lopez  wrote:
>>
>> Microvm is a machine type inspired by both NEMU and Firecracker, and
>> modeled after the machine implemented by the latter.
>>
>> Its main purpose is to provide users with a KVM-only machine type with
>> fast boot times, minimal attack surface (measured as the number of IO
>> ports and MMIO regions exposed to the Guest) and small footprint
>> (especially when combined with the ongoing QEMU modularization effort).
>>
>> Normally, other than the device support provided by KVM itself,
>> microvm only supports virtio-mmio devices. Microvm also includes a
>> legacy mode, which adds an ISA bus with a 16550A serial port, useful
>> for being able to see the early boot kernel messages.
>
> Could we use virtio-pci instead of virtio-mmio? virtio-mmio is
> a bit deprecated and tends not to support all the features that
> virtio-pci does. It was introduced mostly as a stopgap while we
> didn't have pci support in the aarch64 virt machine, and remains
> for legacy "we don't like to break existing working setups" rather
> than as a recommended config for new systems.

Using virtio-pci implies keeping PCI and ACPI support, defeating a
significant part of microvm's purpose.

What are the issues with the current state of virtio-mmio? Is there a
way I can help to improve the situation?

Sergio.






Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-02 Thread Peter Maydell
On Tue, 2 Jul 2019 at 18:34, Sergio Lopez  wrote:
> Peter Maydell  writes:
> > Could we use virtio-pci instead of virtio-mmio? virtio-mmio is
> > a bit deprecated and tends not to support all the features that
> > virtio-pci does. It was introduced mostly as a stopgap while we
> > didn't have pci support in the aarch64 virt machine, and remains
> > for legacy "we don't like to break existing working setups" rather
> > than as a recommended config for new systems.
>
> Using virtio-pci implies keeping PCI and ACPI support, defeating a
> significant part of microvm's purpose.
>
> What are the issues with the current state of virtio-mmio? Is there a
> way I can help to improve the situation?

Off the top of my head:
 * limitations on numbers of devices
 * no hotplug support
 * unlike PCI, it's not probeable, so you have to tell the
   guest where all the transports are using device tree or
   some similar mechanism
 * you need one IRQ line per transport, which restricts how
   many you can have
 * it's only virtio-0.9, it doesn't support any of the new
   virtio-1.0 functionality
 * it is broadly not really maintained in QEMU (and I think
   not really in the kernel either? not sure), because we'd
   rather not have to maintain two mechanisms for doing virtio
   when virtio-pci is clearly better than virtio-mmio

thanks
-- PMM



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-02 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20190702121106.28374-1-...@redhat.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

PASS 2 fdc-test /x86_64/fdc/no_media_on_start
PASS 3 fdc-test /x86_64/fdc/read_without_media
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/check-qlit -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="check-qlit" 
==7808==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 4 fdc-test /x86_64/fdc/media_change
PASS 5 fdc-test /x86_64/fdc/sense_interrupt
PASS 6 fdc-test /x86_64/fdc/relative_seek
---
PASS 32 test-opts-visitor /visitor/opts/range/beyond
PASS 33 test-opts-visitor /visitor/opts/dict/unvisited
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-coroutine -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-coroutine" 
==7851==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
==7851==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 
0x7ffc0ad0a000; bottom 0x7fa44def8000; size: 0x0057bce12000 (376831025152)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-coroutine /basic/no-dangling-access
---
PASS 11 test-aio /aio/event/wait
PASS 12 test-aio /aio/event/flush
PASS 13 test-aio /aio/event/wait/no-flush-cb
==7866==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 14 test-aio /aio/timer/schedule
PASS 15 test-aio /aio/coroutine/queue-chaining
PASS 16 test-aio /aio-gsource/flush
---
PASS 28 test-aio /aio-gsource/timer/schedule
PASS 13 fdc-test /x86_64/fdc/fuzz-registers
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-aio-multithread -m=quick -k --tap < /dev/null | 
./scripts/tap-driver.pl --test-name="test-aio-multithread" 
==7873==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-aio-multithread /aio/multi/lifecycle
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img 
tests/ide-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="ide-test" 
==7890==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 2 test-aio-multithread /aio/multi/schedule
PASS 1 ide-test /x86_64/ide/identify
PASS 3 test-aio-multithread /aio/multi/mutex/contended
==7901==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 2 ide-test /x86_64/ide/flush
==7912==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 3 ide-test /x86_64/ide/bmdma/simple_rw
==7918==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 4 test-aio-multithread /aio/multi/mutex/handoff
PASS 4 ide-test /x86_64/ide/bmdma/trim
==7929==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 5 test-aio-multithread /aio/multi/mutex/mcs
PASS 5 ide-test /x86_64/ide/bmdma/short_prdt
==7940==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 6 test-aio-multithread /aio/multi/mutex/pthread
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-throttle -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-throttle" 
PASS 6 ide-test /x86_64/ide/bmdma/one_sector_short_prdt
==7948==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-throttle /throttle/leak_bucket
PASS 2 test-throttle /throttle/compute_wait
PASS 3 test-throttle /throttle/init
---
PASS 14 test-throttle /throttle/config/max
PASS 15 test-throttle /throttle/config/iops_size
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-thread-pool -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-thread-pool" 
==7951==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
==7955==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-thread-pool /thread-pool/submit
PASS 2 

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-02 Thread Peter Maydell
On Tue, 2 Jul 2019 at 13:14, Sergio Lopez  wrote:
>
> Microvm is a machine type inspired by both NEMU and Firecracker, and
> modeled after the machine implemented by the latter.
>
> Its main purpose is to provide users with a KVM-only machine type with
> fast boot times, minimal attack surface (measured as the number of IO
> ports and MMIO regions exposed to the Guest) and small footprint
> (especially when combined with the ongoing QEMU modularization effort).
>
> Normally, other than the device support provided by KVM itself,
> microvm only supports virtio-mmio devices. Microvm also includes a
> legacy mode, which adds an ISA bus with a 16550A serial port, useful
> for being able to see the early boot kernel messages.

Could we use virtio-pci instead of virtio-mmio? virtio-mmio is
a bit deprecated and tends not to support all the features that
virtio-pci does. It was introduced mostly as a stopgap while we
didn't have pci support in the aarch64 virt machine, and remains
for legacy "we don't like to break existing working setups" rather
than as a recommended config for new systems.

thanks
-- PMM



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-02 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20190702121106.28374-1-...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Message-id: 20190702121106.28374-1-...@redhat.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  patchew/20190702113414.6896-1-arm...@redhat.com -> 
patchew/20190702113414.6896-1-arm...@redhat.com
Switched to a new branch 'test'
8ebe540 hw/i386: Introduce the microvm machine type
ac71c2a hw/i386: Factorize PVH related functions
faeccbd hw/i386: Add an Intel MPTable generator
7540b93 hw/virtio: Factorize virtio-mmio headers

=== OUTPUT BEGIN ===
1/4 Checking commit 7540b9358a0f (hw/virtio: Factorize virtio-mmio headers)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#66: 
new file mode 100644

total: 0 errors, 1 warnings, 105 lines checked

Patch 1/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
2/4 Checking commit faeccbd2c589 (hw/i386: Add an Intel MPTable generator)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#16: 
new file mode 100644

total: 0 errors, 1 warnings, 374 lines checked

Patch 2/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
3/4 Checking commit ac71c2af3972 (hw/i386: Factorize PVH related functions)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#186: 
new file mode 100644

ERROR: do not initialise statics to 0 or NULL
#210: FILE: hw/i386/pvh.c:20:
+static size_t pvh_start_addr = 0;

total: 1 errors, 1 warnings, 281 lines checked

Patch 3/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

4/4 Checking commit 8ebe540c4430 (hw/i386: Introduce the microvm machine type)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#67: 
new file mode 100644

ERROR: Error messages should not contain newlines
#291: FILE: hw/i386/microvm.c:220:
+error_report("qemu: error reading initrd %s: %s\n",

ERROR: Error messages should not contain newlines
#299: FILE: hw/i386/microvm.c:228:
+ "(max: %"PRIu32", need %"PRId64")\n",

total: 2 errors, 1 warnings, 653 lines checked

Patch 4/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1
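
For what it's worth, both checkpatch complaints have standard fixes: the
static initializer flagged in patch 3/4 can simply be dropped, since
statics are zero-initialized in C anyway, and the error_report() calls
flagged in patch 4/4 should lose their trailing "\n", as error_report()
terminates the message itself. Sketched against the excerpts above (the
arguments elided by checkpatch stay as they are):

    static size_t pvh_start_addr;

    error_report("qemu: error reading initrd %s: %s", ...);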


The full log is available at
http://patchew.org/logs/20190702121106.28374-1-...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-02 Thread Sergio Lopez
Microvm is a machine type inspired by both NEMU and Firecracker, and
modeled after the machine implemented by the latter.

Its main purpose is to provide users with a KVM-only machine type with
fast boot times, minimal attack surface (measured as the number of IO
ports and MMIO regions exposed to the Guest) and small footprint
(especially when combined with the ongoing QEMU modularization effort).

Normally, other than the device support provided by KVM itself,
microvm only supports virtio-mmio devices. Microvm also includes a
legacy mode, which adds an ISA bus with a 16550A serial port, useful
for being able to see the early boot kernel messages.

Microvm only supports booting PVH-enabled Linux ELF images. Booting
other PVH-enabled kernels may be possible, but due to the lack of ACPI
and firmware, we're relying on the command line for specifying the
location of the virtio-mmio transports. If there's interest in
using this machine type with other kernels, we'll try to find some
kind of middle ground solution.
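
For reference, the stock Linux virtio-mmio driver accepts those locations
as kernel command line entries of the form
virtio_mmio.device=<size>@<baseaddr>:<irq>, one entry per transport; the
values below are purely illustrative, not the ones microvm registers:

    virtio_mmio.device=512@0xd0000000:12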

This is the list of the exposed IO ports and MMIO regions when running
in non-legacy mode:

address-space: memory
d000-d1ff (prio 0, i/o): virtio-mmio
d200-d3ff (prio 0, i/o): virtio-mmio
d400-d5ff (prio 0, i/o): virtio-mmio
d600-d7ff (prio 0, i/o): virtio-mmio
d800-d9ff (prio 0, i/o): virtio-mmio
da00-dbff (prio 0, i/o): virtio-mmio
dc00-ddff (prio 0, i/o): virtio-mmio
de00-dfff (prio 0, i/o): virtio-mmio
fee0-feef (prio 4096, i/o): kvm-apic-msi

address-space: I/O
  - (prio 0, i/o): io
0020-0021 (prio 0, i/o): kvm-pic
0040-0043 (prio 0, i/o): kvm-pit
007e-007f (prio 0, i/o): kvmvapic
00a0-00a1 (prio 0, i/o): kvm-pic
04d0-04d0 (prio 0, i/o): kvm-elcr
04d1-04d1 (prio 0, i/o): kvm-elcr

A QEMU instance with the microvm machine type can be invoked this way:

 - Normal mode:

qemu-system-x86_64 -M microvm -m 512m -smp 2 \
 -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
 -nodefaults -no-user-config \
 -chardev pty,id=virtiocon0,server \
 -device virtio-serial-device \
 -device virtconsole,chardev=virtiocon0 \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0

 - Legacy mode:

qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
 -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
 -nodefaults -no-user-config \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0 \
 -serial stdio


Changelog:
v3:
  - Add initrd support (thanks Stefano).

v2:
  - Drop "[PATCH 1/4] hw/i386: Factorize CPU routine".
  - Simplify machine definition (thanks Eduardo).
  - Remove use of unneeded NUMA-related callbacks (thanks Eduardo).
  - Add a patch to factorize PVH-related functions.
  - Replace use of Linux's Zero Page with PVH (thanks Maran and Paolo).


Sergio Lopez (4):
  hw/virtio: Factorize virtio-mmio headers
  hw/i386: Add an Intel MPTable generator
  hw/i386: Factorize PVH related functions
  hw/i386: Introduce the microvm machine type

 default-configs/i386-softmmu.mak|   1 +
 hw/i386/Kconfig |   4 +
 hw/i386/Makefile.objs   |   2 +
 hw/i386/microvm.c   | 550 
 hw/i386/mptable.c   | 156 ++
 hw/i386/pc.c| 120 +
 hw/i386/pvh.c   | 113 
 hw/i386/pvh.h   |  10 +
 hw/virtio/virtio-mmio.c |  35 +-
 hw/virtio/virtio-mmio.h |  60 +++
 include/hw/i386/microvm.h   |  82 +++
 include/hw/i386/mptable.h   |  36 ++
 include/standard-headers/linux/mpspec_def.h | 182 +++
 13 files changed, 1209 insertions(+), 142 deletions(-)
 create mode 100644 hw/i386/microvm.c
 create mode 100644 hw/i386/mptable.c
 create mode 100644 hw/i386/pvh.c
 create mode 100644 hw/i386/pvh.h
 create mode 100644 hw/virtio/virtio-mmio.h
 create mode 100644 include/hw/i386/microvm.h
 create mode 100644 include/hw/i386/mptable.h
 create mode 100644 include/standard-headers/linux/mpspec_def.h

--
2.21.0