Re: [PATCH][RESEND v5 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)

2023-06-23 Thread Maciej S. Szmigiero

On 22.06.2023 20:45, Maciej S. Szmigiero wrote:

On 22.06.2023 14:52, David Hildenbrand wrote:

On 22.06.23 14:14, Maciej S. Szmigiero wrote:

On 22.06.2023 14:06, David Hildenbrand wrote:

On 22.06.23 13:17, Maciej S. Szmigiero wrote:

On 22.06.2023 13:15, David Hildenbrand wrote:

On 22.06.23 13:12, Maciej S. Szmigiero wrote:

On 22.06.2023 13:01, David Hildenbrand wrote:

[...]


We'd use a memory region container as the device memory region (like [1]) and
would have to handle the !memdev case (I can help with that). Into that
container, you can map the RAM memory region on demand (and eventually even
using multiple memslots, like [1]).

(2) Use a single virtual DIMM and (un)plug that on demand. Let the machine code 
handle (un)plugging of the device.


(1) feels cleanest to me, although it will require a bit more work.



I also think approach (1) makes more sense as it avoids memslot metadata
overhead for not-yet-hot-added parts of the memory backing device.
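(As a rough illustration only -- a minimal C sketch of what approach (1) could
look like on the device side, assuming the device keeps a HostMemoryBackend in
a "memdev" property; the HvBalloon type and its fields are made-up
placeholders, not taken from the actual patch:)

/* headers needed: "exec/memory.h", "sysemu/hostmem.h", "hw/qdev-core.h" */
typedef struct HvBalloon {
    DeviceState parent_obj;
    HostMemoryBackend *memdev;     /* the optional "memdev" property */
    MemoryRegion container_mr;     /* empty container = device memory region */
} HvBalloon;

static void hv_balloon_realize(DeviceState *dev, Error **errp)
{
    HvBalloon *balloon = (HvBalloon *)dev;
    MemoryRegion *backend_mr =
        host_memory_backend_get_memory(balloon->memdev);

    /* create the container sized for the whole backing memdev, but don't
     * map any RAM into it yet */
    memory_region_init(&balloon->container_mr, OBJECT(dev),
                       "hv-balloon-container",
                       memory_region_size(backend_mr));
}

/* later, once the guest actually hot-adds memory, map the backing RAM into
 * the container on demand (eventually as multiple slices/memslots) */
static void hv_balloon_map_backend(HvBalloon *balloon)
{
    MemoryRegion *backend_mr =
        host_memory_backend_get_memory(balloon->memdev);

    memory_region_add_subregion(&balloon->container_mr, 0, backend_mr);
}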

I'm not sure what you mean by the !memdev case being problematic here - it
works with the driver in its current shape, so why would adding potential
memory subregions (used in the memdev case) change that?


I'm thinking about the case where you have a hv-balloon device without a memdev.

Without -m X,maxmem=y we don't currently expect to have memory devices around
(and especially not to have them get (un)plugged). But why should we "force"
the user to set the "maxmem" option?


I guess it's only a small change to QEMU to allow having an hv-balloon
device (without a memdev) even in the case where there's no "maxmem"
option given on the QEMU command line.



I hope I'll find some time soonish to prototype what I have in mind, to see
if it can be made to work.



Okay, so I'll wait for your prototype before commencing further work on
the next version of this driver.


I'm about to have something simplistic running -- I think. I want to test with
a Linux VM, but I don't seem to be able to get it working (even without my
changes).


#!/bin/bash

build/qemu-system-x86_64 \
        --enable-kvm \
        -m 4G,maxmem=36G \
        -cpu host,hv-syndbg=on,hv-synic,hv-relaxed,hv-vpindex \
        -smp 16 \
        -nographic \
        -nodefaults \
        -net nic -net user \
        -chardev stdio,nosignal,id=serial \
        -hda Fedora-Cloud-Base-37-1.7.x86_64.qcow2 \
        -cdrom /home/dhildenb/git/cloud-init/cloud-init.iso \
        -device isa-serial,chardev=serial \
        -chardev socket,id=monitor,path=/var/tmp/mon_src,server,nowait \
        -mon chardev=monitor,mode=readline \
        -device vmbus-bridge \
        -object memory-backend-ram,size=2G,id=mem0 \
        -device hv-balloon,id=hv1,memdev=mem0



[root@vm-0 ~]# uname -r
6.3.5-100.fc37.x86_64
[root@vm-0 ~]# modprobe hv_balloon
modprobe: ERROR: could not insert 'hv_balloon': No such device


Is there any magic flag I am missing? Or is there something preventing this
from working with Linux VMs?



I haven't tested the driver with Linux guests in a long time (as it is
targeting Windows), but I think you need to disable the KVM PV interface for
the Hyper-V one to be detected by Linux.

Something like adding "kvm=off" to "-cpu" and then checking in dmesg whether
the detected hypervisor is now Hyper-V.

Also, you need to disable S4 in the guest for the hot-add capability to work
(I'm adding "-global ICH9-LPC.disable_s4=1" with the q35 machine for this).

I would also suggest adding "--trace 'hv_balloon_*' --trace 'memory_device_*'"
to the QEMU command line to see what's happening.
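(Putting those suggestions together, the invocation above would become roughly
the following -- an untested sketch; only the -machine/-global/-cpu/--trace
parts differ from the original script. Inside the guest, "dmesg | grep -i
hypervisor" should then report Hyper-V before retrying "modprobe hv_balloon".)

#!/bin/bash
# as above, but: q35 machine with S4 disabled, KVM PV interface hidden
# (kvm=off) so Linux detects Hyper-V, and hv-balloon/memory-device tracing on
build/qemu-system-x86_64 \
        --enable-kvm \
        -machine q35 \
        -global ICH9-LPC.disable_s4=1 \
        -m 4G,maxmem=36G \
        -cpu host,kvm=off,hv-syndbg=on,hv-synic,hv-relaxed,hv-vpindex \
        -smp 16 \
        -nographic \
        -nodefaults \
        -net nic -net user \
        -chardev stdio,nosignal,id=serial \
        -hda Fedora-Cloud-Base-37-1.7.x86_64.qcow2 \
        -cdrom /home/dhildenb/git/cloud-init/cloud-init.iso \
        -device isa-serial,chardev=serial \
        -chardev socket,id=monitor,path=/var/tmp/mon_src,server,nowait \
        -mon chardev=monitor,mode=readline \
        -device vmbus-bridge \
        -object memory-backend-ram,size=2G,id=mem0 \
        -device hv-balloon,id=hv1,memdev=mem0 \
        --trace 'hv_balloon_*' --trace 'memory_device_*'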


VM is not happy:

[    1.908595] BUG: kernel NULL pointer dereference, address: 0007
[    1.908837] #PF: supervisor read access in kernel mode
[    1.908837] #PF: error_code(0x) - not-present page
[    1.908837] PGD 0 P4D 0
[    1.908837] Oops:  [#1] PREEMPT SMP NOPTI
[    1.908837] CPU: 13 PID: 492 Comm: (udev-worker) Not tainted 
6.3.5-100.fc37.x86_64 #1
[    1.908837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.16.2-0-gea1b7a073390-p4
[    1.908837] RIP: 0010:acpi_ns_lookup+0x8f/0x4c0
[    1.908837] Code: 8b 3d f5 eb 1c 03 83 05 52 ec 1c 03 01 48 85 ff 0f 84 51 
03 00 00 44 89 c3 4c 89 cb
[    1.908837] RSP: 0018:95b680ad7950 EFLAGS: 00010286
[    1.908837] RAX: 95b680ad79e0 RBX: 0002 RCX: 0003
[    1.908837] RDX:  RSI: 8a0283a3c558 RDI: a4b376e0
[    1.908837] RBP:  R08: 0002 R09: 
[    1.908837] R10: 8a02811034ec R11:  R12: 
[    1.908837] R13: 8a02811034e8 R14: 8a02811034e8 R15: 
[    1.908837] FS:  7f3bb2e7d0c0() GS:8a02bbd4() 
knlGS:
[    1.908837] CS:  0010 DS:  ES:  CR0: 80050033
[    1.908837] CR2: 0007 CR3: 000100a58002 CR4: 00770ee0
[    1.908837] PKRU: 5554
[    1.908837] Call Trace:
[    1.908837]  
[    1.908837]  ? __die+0x23/0x70
[    1.908837]  ? page_fault_oops+0x171/0x4e0
[    1.908837]  ? prepare_alloc_pages.constprop.0+0xf6/0x1a0
[    1.908837]  ? exc_page_fault+0x74/0x170
[    1.908837]  ? asm_exc_page_fault+0x26/0x30
[    1.908837]  ? acpi_ns_lookup+0x8f/0x4c0
[    1.908837]  acpi_ns_get_node_unlocked+0xdd/0x110
[    1.908837]  ? down_timeout+0x3e/0x60
[    1.908837]  ? acpi_ns_get_node+0x

Re: [PATCH][RESEND v5 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)

2023-06-21 Thread Maciej S. Szmigiero

On 21.06.2023 12:32, David Hildenbrand wrote:

On 20.06.23 22:13, Maciej S. Szmigiero wrote:

On 19.06.2023 17:58, David Hildenbrand wrote:

[...]

Sorry for the late reply!

Still trying to make up my mind what the right way forward with this is.



This usage is still problematic, I suspect (well, and a layer violation
regarding the machine). The machine hotplug handler is supposed to call the
pre_plug/plug/unplug hooks in response to pre_plug/plug/unplug notifications
from the core. See how we handle virtio-mem/virtio-pmem/nvdimms as an example.

We assume that when memory_device_pre_plug() gets called the device is not
realized yet, but that once it gets plugged it already is realized, and that
the device will actually vanish (get unrealized) when unplugging the device.
Otherwise memory device logic like in get_plugged_memory_size() stops working.


get_plugged_memory_size() just calls the get_plugged_size() method on every
realized TYPE_MEMORY_DEVICE.

While this now always returns the whole backing memory size (once the
backend gets plugged), I don't see a reason why this method could not be
overridden in hv-balloon to return just the currently hot-added size.

By the way, this function seems to be used just for reporting stats via QMP.
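(A minimal sketch of what such an override could look like -- the
MemoryDeviceClass hook is real QEMU API, while the HvBalloon type and its
hot_added_size field are placeholders made up for illustration:)

static uint64_t hv_balloon_md_get_plugged_size(const MemoryDeviceState *md,
                                               Error **errp)
{
    const HvBalloon *balloon = (const HvBalloon *)md;  /* hypothetical type */

    /* report only what has actually been hot-added to the guest so far,
     * not the whole backing memdev size */
    return balloon->hot_added_size;
}

static void hv_balloon_class_init(ObjectClass *klass, void *data)
{
    MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(klass);

    mdc->get_plugged_size = hv_balloon_md_get_plugged_size;
    /* ... get_addr/set_addr/get_memory_region/fill_device_info ... */
}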


memory_device_build_list() is another example, used for 
memory_device_get_free_addr().


I don't see it calling the get_plugged_size() method; I can see it only using
(indirectly) the get_addr() method.


It similarly contains the TYPE_MEMORY_DEVICE -> dev->realized logic.
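(For reference, roughly what that logic in hw/mem/memory-device.c looks like --
paraphrased from memory, not quoted verbatim:)

static int memory_device_build_list(Object *obj, void *opaque)
{
    GSList **list = opaque;

    if (object_dynamic_cast(obj, TYPE_MEMORY_DEVICE)) {
        DeviceState *dev = DEVICE(obj);

        /* only *realized* memory devices are taken into account */
        if (dev->realized) {
            *list = g_slist_insert_sorted(*list, dev, memory_device_addr_sort);
        }
    }

    object_child_foreach(obj, memory_device_build_list, opaque);
    return 0;
}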


All right, I thought at first you meant just that the get_plugged_memory_size()
function reports misleading values.






You'd be blocking memory address ranges with an unplugged-but-realized memory
device.

Memory device code expects that realized memory devices are plugged and vice
versa.


Which QEMU code do you mean specifically? Maybe it just needs a trivial
change.

Before the driver hot-adds the first chunk of memory it does not use any
part of the address space.

After that, it has to reserve address space for the whole backing memory
device, so that no other devices will claim parts of it, and because a
TYPE_MEMORY_DEVICE (currently) can have just a single range.

This address space is released when the VM is restarted.



As I said, memory device code currently expects that you don't have realized
TYPE_MEMORY_DEVICE that are not plugged, and currently that holds true for all
memory devices.

We could modify memory device code to change that, but IMHO it's the wrong way
around: the machine (hotplug) is responsible for (un)plugging memory devices
as they get realized.

Doing a qdev_get_machine()/current_machine from device code and then
modifying the state of the machine (here: calling plug/unplug handlers)
is usually a warning sign that there is a layer violation.

That's why I'm thinking about a cleaner way to handle that.


Okay, now I think I understand what you think is questionable:
calling memory_device_pre_plug(), memory_device_plug() and friends from
the driver when hot-adding the first memory chunk, even though no actual
device is getting plugged in at that time.

I'm open to other approaches here (besides the virtual DIMMs one that we
already tried in the past).


[...]


Is it to support the !memdev case, or why is this plugging/unplugging in
our_range_plugged_new()/our_range_plugged_free() required?


At least for three (four) reasons:
1a) At hv-balloon plug time the device doesn't yet know the guest
alignment requirements - or whether the guest supports memory hot add at
all - that's what the device will learn only once the guest connects
to the protocol.


Understood, so you want to at least expose the memory dynamically to the VM
(map the MR on demand).

That could be done fairly easily using a memory region container, like
virtio-mem is planning on using [1].

[1] https://lkml.kernel.org/r/20230616092654.175518-14-da...@redhat.com


Thanks for the pointer to your series - I've looked at it and it seems
to me that, while it allows multiple memory subregions, each backed by
a separate memslot, it still needs a single big main region for
the particular TYPE_MEMORY_DEVICE - am I right?


Yes.




1b) For the same reason the memory region has to be unplugged at VM
reset time - the new guest might have stricter alignment requirements


Alignment is certainly interesting, but is it a real problem?

By default (with no other memory devices) you get an address that's aligned
to 1 GiB. And, in fact, you can simply always request a 1 GiB alignment for
the device, independent of the guest requirement.

Would the guest requirement be even stricter than that (e.g., 2 GiB)?


The protocol allows up to 32 GiB alignment, so we cannot simply
hardcode the alignment to 1 GiB, especially since this is Windows
we're talking about (so this parameter is subject to unpredictable
changes).


Note that anything bigger than 1 GiB is not really guaranteed to work in
QEMU on x86-64. See the w

Re: [PATCH][RESEND v5 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)

2023-06-21 Thread David Hildenbrand

On 20.06.23 22:13, Maciej S. Szmigiero wrote:

On 19.06.2023 17:58, David Hildenbrand wrote:

[...]

Sorry for the late reply!

Still trying to make up my mind what the right way forward with this is.



This usage is still problematic, I suspect (well, and a layer violation
regarding the machine). The machine hotplug handler is supposed to call the
pre_plug/plug/unplug hooks in response to pre_plug/plug/unplug notifications
from the core. See how we handle virtio-mem/virtio-pmem/nvdimms as an example.

We assume that when memory_device_pre_plug() gets called the device is not
realized yet, but that once it gets plugged it already is realized, and that
the device will actually vanish (get unrealized) when unplugging the device.
Otherwise memory device logic like in get_plugged_memory_size() stops working.
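(For reference, a minimal sketch of that expected flow -- a machine hotplug
handler reacting to the core's notifications; the handler names here are made
up, while memory_device_pre_plug()/memory_device_plug() are the real helpers,
with the signatures they have around this QEMU version:)

static void my_machine_device_pre_plug(HotplugHandler *hotplug_dev,
                                       DeviceState *dev, Error **errp)
{
    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
        /* called by the core before the device gets realized */
        memory_device_pre_plug(MEMORY_DEVICE(dev), MACHINE(hotplug_dev),
                               NULL, errp);
    }
}

static void my_machine_device_plug(HotplugHandler *hotplug_dev,
                                   DeviceState *dev, Error **errp)
{
    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
        /* called by the core once the device is realized */
        memory_device_plug(MEMORY_DEVICE(dev), MACHINE(hotplug_dev));
    }
}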


get_plugged_memory_size() just calls the get_plugged_size() method on every
realized TYPE_MEMORY_DEVICE.

While this now always returns the whole backing memory size (once the
backend gets plugged), I don't see a reason why this method could not be
overridden in hv-balloon to return just the currently hot-added size.

By the way, this function seems to be used just for reporting stats via QMP.


memory_device_build_list() is another example, used for 
memory_device_get_free_addr().


I don't see it calling the get_plugged_size() method; I can see it only using
(indirectly) the get_addr() method.


It similarly contains the TYPE_MEMORY_DEVICE -> dev->realized logic.




You'd be blocking memory address ranges with an unplugged-but-realized memory
device.

Memory device code expects that realized memory devices are plugged and vice
versa.


Which QEMU code do you mean specifically? Maybe it just needs a trivial
change.

Before the driver hot-adds the first chunk of memory it does not use any
part of the address space.

After that, it has to reserve address space for the whole backing memory
device, so that no other devices will claim parts of it, and because a
TYPE_MEMORY_DEVICE (currently) can have just a single range.

This address space is released when the VM is restarted.



As I said, memory device code currently expects that you don't have realized
TYPE_MEMORY_DEVICE that are not plugged, and currently that holds true for all
memory devices.

We could modify memory device code to change that, but IMHO it's the wrong way
around: the machine (hotplug) is responsible for (un)plugging memory devices
as they get realized.

Doing a qdev_get_machine()/current_machine from device code and then
modifying the state of the machine (here: calling plug/unplug handlers)
is usually a warning sign that there is a layer violation.

That's why I'm thinking about a cleaner way to handle that.

[...]


Is it to support the !memdev case or why is this plugging/unplugging in 
our_range_plugged_new()/our_range_plugged_free() required?


At least for three (four) reasons:
1a) At the hv-balloon plug time the device doesn't yet know the guest
alignment requirements - or whether the guest supports memory hot add at
all - that's what the device will learn only once the guest connects
to the protocol.


Understood, so you want to at least expose the memory dynamically to the VM 
(map the MR on demand).

That could be done using a memory region container like virtio-mem is planning 
[1] on using fairly easily.

[1] https://lkml.kernel.org/r/20230616092654.175518-14-da...@redhat.com


Thanks for the pointer to your series - I've looked at it and it seems
to me that while it allows multiple memory subregions, each backed by
a separate memslot, it still needs a single big main region for
the particular TYPE_MEMORY_DEVICE, am I right?


Yes.




1b) For the same reason the memory region has to be unplugged at the VM
reset time - the new guest might have stricter alignment requirements


Alignment is certainly interesting, but is it a real problem?

By default (no other memory devices) you get an address that's aligned to 1 
GiB. And, in fact, you can simply always request a 1 GiB alignment for the 
device, independent of the guest requirement.

Would the guest requirement be even stricter than that (e.g., 2 GiB)?


The protocol allows up to 32 GiB alignment, so we cannot simply
hardcode the alignment to 1 GiB, especially since this is Windows
we're talking about (so this parameter is subject to unpredictable
changes).


Note that anything bigger than 1 GiB is not really guaranteed to work in
QEMU on x86-64. See the warning in memory_device_get_free_addr():

    /* start of address space indicates the maximum alignment we expect */
    if (!QEMU_IS_ALIGNED(range_lob(&as), align)) {
        warn_report("the alignment (0x%" PRIx64 ") exceeds the expected"
                    " maximum alignment, memory will get fragmented and not"
                    " all 'maxmem' might be usable for memory devices.",
                    align);
    }


So assume you do a "-m 4G,maxmem=36G"

You cannot add a 32 GiB device with an alignment of 32 GiB
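
For illustration, a quick sketch of the arithmetic behind that statement
(the addresses below are made-up example values, assuming the hotplug area
starts a bit above the 4 GiB of boot memory):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define ALIGN_UP(x, a)  (((x) + (a) - 1) / (a) * (a))

int main(void)
{
    /* rough model of "-m 4G,maxmem=36G": a 32 GiB hotplug area above 4 GiB */
    uint64_t area_start = 5ULL << 30;          /* example start address */
    uint64_t area_size  = 32ULL << 30;
    uint64_t dev_size   = 32ULL << 30;
    uint64_t dev_align  = 32ULL << 30;

    uint64_t addr = ALIGN_UP(area_start, dev_align);  /* next 32 GiB boundary */

    printf("wasted below the aligned address: %" PRIu64 " GiB\n",
           (addr - area_start) >> 30);
    printf("device fits: %s\n",
           addr + dev_size <= area_start + area_size ? "yes" : "no");
    return 0;
}

With these numbers 27 GiB of the hotplug area are lost to alignment padding,
so the 32 GiB device no longer fits.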

Re: [PATCH][RESEND v5 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)

2023-06-20 Thread Maciej S. Szmigiero

On 19.06.2023 17:58, David Hildenbrand wrote:

[...]

Sorry for the late reply!

Still trying to make up my mind what the right way forward with this is.



This usage is still problematic I suspect (well, and a layer violation 
regarding the machine). The machine hotplug handler is supposed to call the 
pre_plug/plug/unplug hooks in response to pre_plug/plug/unplug notifications 
from the core. See how we handle virtio-mem/virtio-pmem/nvdimms as an example.

We assume that when memory_device_pre_plug() gets called the device is 
not realized yet, but that once it gets plugged it already is realized, 
and that the device will actually vanish (get unrealized) when unplugging 
the device.
Otherwise memory device logic like in get_plugged_memory_size() stops working.


get_plugged_memory_size() just calls get_plugged_size() method on every
realized TYPE_MEMORY_DEVICE.

While this now always returns the whole backing memory size (once the
backend gets plugged) I don't see a reason why this method could not be
overridden in hv-balloon to return just the currently hot-added size.

By the way, this function seems to be used just for reporting stats via QMP.


memory_device_build_list() is another example, used for 
memory_device_get_free_addr().


I don't see it calling the get_plugged_size() method; I can see it only using
(indirectly) the get_addr() method.

You'd be blocking memory address ranges with an unplugged-but-realized memory device.
Memory device code expects that realized memory devices are plugged and vice versa.


Which QEMU code do you mean specifically? Maybe it just needs a trivial
change.

Before the driver hot-adds the first chunk of memory it does not use any
part of the address space.

After that, it has to reserve address space for the whole backing memory
device, so that no other devices will claim parts of it, and because a
TYPE_MEMORY_DEVICE (currently) can have just a single range.

This address space is released when the VM is restarted.





As an example, see device_set_realized() on the pre_plug+realize+plug 
interaction.

IIRC, you're reusing the already-realized hv-balloon device here, correct?


Yes - in this version of the driver.

The previous version used separate virtual DIMM devices instead but you have
recommended against that approach.



Yes. My recommendation was to make the hv-balloon device a memory device and 
use a single memory region, which you did (and I think it's much better).

It's now all about when we (un)plug the memory device itself -- and how.



Why can't you call the pre_plug/plug/unplug functions from the machine 
pre_plug/plug/unplug hooks -- exactly once for the memory device when plugging 
the hv-balloon device?

Is it to support the !memdev case or why is this plugging/unplugging in 
our_range_plugged_new()/our_range_plugged_free() required?


At least for three (four) reasons:
1a) At the hv-balloon plug time the device doesn't yet know the guest
alignment requirements - or whether the guest supports memory hot add at
all - that's what the device will learn only once the guest connects
to the protocol.


Understood, so you want to at least expose the memory dynamically to the VM 
(map the MR on demand).

That could be done using a memory region container like virtio-mem is planning 
[1] on using fairly easily.

[1] https://lkml.kernel.org/r/20230616092654.175518-14-da...@redhat.com


Thanks for the pointer to your series - I've looked at it and it seems
to me that while it allows multiple memory subregions, each backed by
a separate memslot, it still needs a single big main region for
the particular TYPE_MEMORY_DEVICE, am I right?


1b) For the same reason the memory region has to be unplugged at the VM
reset time - the new guest might have stricter alignment requirements


Alignment is certainly interesting, but is it a real problem?

By default (no other memory devices) you get an address that's aligned to 1 
GiB. And, in fact, you can simply always request a 1 GiB alignment for the 
device, independent of the guest requirement.

Would the guest requirement be even stricter than that (e.g., 2 GiB)?


The protocol allows up to 32 GiB alignment, so we cannot simply
hardcode the alignment to 1 GiB, especially since this is Windows
we're talking about (so this parameter is subject to unpredictable
changes).


In theory, when using a memory region container (again [1]) into which you 
dynamically map the RAM region, you can do this alignment internally.

So it might be an option to use a memory region container and dynamically map 
into that one as you please (it just has to have a fixed size).
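
For what it's worth, a rough sketch of that idea against QEMU's memory API
(the function name and sizing policy here are made up for illustration, this
is not hv-balloon code): the container is plugged once with a fixed size, and
the RAM backend is mapped into it later at whatever offset satisfies the
alignment the guest asked for.

#include "qemu/osdep.h"
#include "exec/memory.h"
#include "sysemu/hostmem.h"

/* Map the backend RAM into an already-plugged, fixed-size container so that
 * its guest-physical address meets the alignment reported by the guest. */
static void map_backend_aligned(MemoryRegion *container, hwaddr container_base,
                                HostMemoryBackend *hostmem, uint64_t guest_align)
{
    MemoryRegion *ram = host_memory_backend_get_memory(hostmem);
    hwaddr offset = QEMU_ALIGN_UP(container_base, guest_align) - container_base;

    /* the container must be sized for the backend plus worst-case padding */
    memory_region_add_subregion(container, offset, ram);
}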


Still, demand-allocating just the right memory region (with the right
alignment) seems to me like a cleaner solution than allocating a huge
worst-case memory region upfront and then trying to carve the right
part of it.



By the way, the memory region *can't* be unplugged yet at VMBus device
reset time - Windows keeps on using it until the system is restarted

Re: [PATCH][RESEND v5 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)

2023-06-19 Thread David Hildenbrand

[...]

Sorry for the late reply!

Still trying to make up my mind what the right way forward with this is.



This usage is still problematic I suspect (well, and a layer violation 
regarding the machine). The machine hotplug handler is supposed to call the 
pre_plug/plug/unplug hooks in response to pre_plug/plug/unplug notifications 
from the core. See how we handle virtio-mem/virtio-pmem/nvdimms as an example.

We assume that when memory_device_pre_plug() gets called the device is 
not realized yet, but that once it gets plugged it already is realized, 
and that the device will actually vanish (get unrealized) when unplugging 
the device.
Otherwise memory device logic like in get_plugged_memory_size() stops working.


get_plugged_memory_size() just calls get_plugged_size() method on every
realized TYPE_MEMORY_DEVICE.

While this now always returns the whole backing memory size (once the
backend gets plugged) I don't see a reason why this method could not be
overridden in hv-balloon to return just the currently hot-added size.

By the way, this function seems to be used just for reporting stats via QMP.


memory_device_build_list() is another example, used for 
memory_device_get_free_addr(). You'd be blocking memory address ranges 
with an unplugged-but-realized memory device.


Memory device code expects that realized memory devices are plugged and 
vice versa.






As an example, see device_set_realized() on the pre_plug+realize+plug 
interaction.

IIRC, you're reusing the already-realized hv-balloon device here, correct?


Yes - in this version of the driver.

The previous version used separate virtual DIMM devices instead but you have
recommended against that approach.



Yes. My recommendation was to make the hv-balloon device a memory device 
and use a single memory region, which you did (and I think it's much 
better).


It's now all about when we (un)plug the memory device itself -- and how.



Why can't you call the pre_plug/plug/unplug functions from the machine 
pre_plug/plug/unplug hooks -- exactly once for the memory device when plugging 
the hv-balloon device?

Is it to support the !memdev case or why is this plugging/unplugging in 
our_range_plugged_new()/our_range_plugged_free() required?


At least for three (four) reasons:
1a) At the hv-balloon plug time the device doesn't yet know the guest
alignment requirements - or whether the guest supports memory hot add at
all - that's what the device will learn only once the guest connects
to the protocol.


Understood, so you want to at least expose the memory dynamically to the 
VM (map the MR on demand).


That could be done using a memory region container like virtio-mem is 
planning [1] on using fairly easily.


[1] https://lkml.kernel.org/r/20230616092654.175518-14-da...@redhat.com


1b) For the same reason the memory region has to be unplugged at the VM
reset time - the new guest might have stricter alignment requirements


Alignment is certainly interesting, but is it a real problem?

By default (no other memory devices) you get an address that's aligned 
to 1 GiB. And, in fact, you can simply always request a 1 GiB alignment 
for the device, independent of the guest requirement.


Would the guest requirement be even stricter than that (e.g., 2 GiB)?

In theory, when using a memory region container (again [1]) into which 
you dynamically map the RAM region, you can do this alignment internally.


So it might be an option to use a memory region container and 
dynamically map into that one as you please (it just has to have a fixed 
size).




By the way, the memory region *can't* be unplugged yet at VMBus device
reset time - Windows keeps on using it until the system is restarted,
even after disconnecting from the VMBus.


Yes, similar to virtio-mem -- we can only e.g. do it at system reset time.



2) The !memdev case, when the driver is just used for Windows-native
ballooning and stats reporting.


So we'd want support for a memory device that doesn't expose any memory 
-- in the current configuration. Should be doable (NULL returned as 
device memory region -> skip pre_plug/plug/unplug and teach the other 
code to just ignore this device). It would be easier if we could decide 
at runtime that this device is not a memory device ...
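
A rough sketch of that direction (the callback, type and field names below
are only assumptions for illustration, not the actual patch code):

#include "hw/mem/memory-device.h"
#include "sysemu/hostmem.h"

/* MemoryDeviceClass::get_memory_region hook: with no memdev configured the
 * device backs no guest memory, so report that there is nothing to map. */
static MemoryRegion *hv_balloon_md_get_memory_region(MemoryDeviceState *md,
                                                     Error **errp)
{
    HvBalloon *balloon = HV_BALLOON(md);   /* type/cast assumed from the patch */

    if (!balloon->hostmem) {
        return NULL;    /* machine code would then skip (pre_)plug/unplug */
    }

    return host_memory_backend_get_memory(balloon->hostmem);
}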


But let's first figure out if that's the right approach.




3) This will hopefully allow sharing the backing memory device between
virtio-mem and hv-balloon in the future - Linux guests will connect to
the former interface while Windows guests will connect to the latter.



I've been told that the virtio-mem driver for Windows will show up 
polished in the near future ... we'll see :)


Anyhow, I consider that a secondary requirement. (virtio-mem is not 
compatible with shared memdevs)





Supporting the !memdev case is interesting: you essentially want to plug a 
memory device without a device region (or with an empty stub). I guess we 
should get that figured out somehow.



That's why t

Re: [PATCH][RESEND v5 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)

2023-06-13 Thread Maciej S. Szmigiero

On 12.06.2023 19:42, David Hildenbrand wrote:

On 12.06.23 16:00, Maciej S. Szmigiero wrote:

From: "Maciej S. Szmigiero" 

This driver is like virtio-balloon on steroids: it allows both changing the
guest memory allocation via ballooning and inserting pieces of extra RAM
into it on demand from a provided memory backend.

One of the advantages of this approach over ACPI-based PC DIMM hotplug is
that such memory can be hotplugged at a much smaller granularity, because
the ACPI DIMM slot limit does not apply.

In order to enable hot-adding additional memory a new memory backend needs
to be created and provided to the driver via the "memdev" parameter.
This can be achieved by, for example, adding
"-object memory-backend-ram,id=mem1,size=32G" to the QEMU command line and
then instantiating the driver with "memdev=mem1" parameter.

In contrast with ACPI DIMM hotplug, where one can only request to unplug a
whole DIMM stick, this driver allows removing memory from the guest in
single-page (4k) units via ballooning.

The actual resizing is done via the ballooning interface (for example, via
the "balloon" HMP command).
This includes resizing the guest past its boot size - that is, hot-adding
additional memory in granularity limited only by the guest alignment
requirements.

After a VM reboot the guest is back to its original (boot) size.

In the future, the guest boot memory size might be changed on reboot
instead, taking into account the effective size that the VM had before that
reboot (much like Hyper-V does).

For performance reasons, the guest-released memory is tracked in a few
range trees, as a series of (start, count) ranges.
Each time a new page range is inserted into such a tree, its neighbors are
checked as candidates for possible merging with it.

Besides performance reasons, the Dynamic Memory protocol itself uses page
ranges as the data structure in its messages, so relevant pages need to be
merged into such ranges anyway.

One has to be careful when tracking the guest-released pages, since the
guest can maliciously report returning pages outside its current address
space, which later clash with the address range of newly added memory.
Similarly, the guest can report freeing the same page twice.

The above design results in much better ballooning performance than when
using virtio-balloon with the same guest: 230 GB / minute with this driver
versus 70 GB / minute with virtio-balloon.

During a ballooning operation most of the time is spent waiting for the guest
to come up with newly freed page ranges; processing the received ranges on
the host side (in QEMU and KVM) is nearly instantaneous.

The unballoon operation is also pretty much instantaneous:
thanks to the merging of the ballooned out page ranges 200 GB of memory can
be returned to the guest in about 1 second.
With virtio-balloon this operation takes about 2.5 minutes.

These tests were done against a Windows Server 2019 guest running on a
Xeon E5-2699, after dirtying the whole memory inside guest before each
balloon operation.

Using a range tree instead of a bitmap to track the removed memory also
means that the solution scales well with the guest size: even a 1 TB range
takes just a few bytes of such metadata.

Since the required GTree operations aren't present in every GLib version,
a check for them was added to the meson build script, together with new
"--enable-hv-balloon" and "--disable-hv-balloon" configure arguments.
If these GTree operations are missing in the system's GLib version, this
driver will be skipped during the QEMU build.

An optional "status-report=on" device parameter requests memory status
events from the guest (typically sent every second), which allow the host
to learn both the guest's available memory and in-use memory counts.
They are emitted externally as "HV_BALLOON_STATUS_REPORT" QMP events.

The driver is named hv-balloon since the Linux kernel client driver for
the Dynamic Memory Protocol is named as such and to follow the naming
pattern established by the virtio-balloon driver.
The whole protocol runs over Hyper-V VMBus.

The driver was tested against Windows Server 2012 R2, Windows Server 2016
and Windows Server 2019 guests and obeys the guest alignment requirements
reported to the host via DM_CAPABILITIES_REPORT message.

Signed-off-by: Maciej S. Szmigiero 
---

(...)

+/* OurRangePlugged */
+static OurRangePlugged *our_range_plugged_new(MemoryDeviceState *md,
+                                              HostMemoryBackend *hostmem,
+                                              uint64_t align,
+                                              Error **errp)
+{
+    ERRP_GUARD();
+    OurRangePlugged *our_range;
+    MachineState *ms = MACHINE(qdev_get_machine());
+    MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(md);
+    uint64_t addr, count;
+
+    if (!align) {
+        align = HV_BALLOON_PAGE_SIZE;
+    }
+
+    if (host_memory_backend_is_mapped(hostmem)) {
+        error_setg(errp, "memory backend already mapped");
+        return NULL

Re: [PATCH][RESEND v5 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)

2023-06-12 Thread David Hildenbrand

On 12.06.23 16:00, Maciej S. Szmigiero wrote:

From: "Maciej S. Szmigiero" 

This driver is like virtio-balloon on steroids: it allows both changing the
guest memory allocation via ballooning and inserting pieces of extra RAM
into it on demand from a provided memory backend.

One of the advantages of this approach over ACPI-based PC DIMM hotplug is
that such memory can be hotplugged at a much smaller granularity, because
the ACPI DIMM slot limit does not apply.

In order to enable hot-adding additional memory a new memory backend needs
to be created and provided to the driver via the "memdev" parameter.
This can be achieved by, for example, adding
"-object memory-backend-ram,id=mem1,size=32G" to the QEMU command line and
then instantiating the driver with "memdev=mem1" parameter.

In contrast with ACPI DIMM hotplug, where one can only request to unplug a
whole DIMM stick, this driver allows removing memory from the guest in
single-page (4k) units via ballooning.

The actual resizing is done via the ballooning interface (for example, via
the "balloon" HMP command).
This includes resizing the guest past its boot size - that is, hot-adding
additional memory in granularity limited only by the guest alignment
requirements.

After a VM reboot the guest is back to its original (boot) size.

In the future, the guest boot memory size might be changed on reboot
instead, taking into account the effective size that the VM had before that
reboot (much like Hyper-V does).

For performance reasons, the guest-released memory is tracked in a few
range trees, as a series of (start, count) ranges.
Each time a new page range is inserted into such a tree, its neighbors are
checked as candidates for possible merging with it.

Besides performance reasons, the Dynamic Memory protocol itself uses page
ranges as the data structure in its messages, so relevant pages need to be
merged into such ranges anyway.

One has to be careful when tracking the guest-released pages, since the
guest can maliciously report returning pages outside its current address
space, which later clash with the address range of newly added memory.
Similarly, the guest can report freeing the same page twice.

The above design results in much better ballooning performance than when
using virtio-balloon with the same guest: 230 GB / minute with this driver
versus 70 GB / minute with virtio-balloon.

During a ballooning operation most of the time is spent waiting for the guest
to come up with newly freed page ranges; processing the received ranges on
the host side (in QEMU and KVM) is nearly instantaneous.

The unballoon operation is also pretty much instantaneous:
thanks to the merging of the ballooned out page ranges 200 GB of memory can
be returned to the guest in about 1 second.
With virtio-balloon this operation takes about 2.5 minutes.

These tests were done against a Windows Server 2019 guest running on a
Xeon E5-2699, after dirtying the whole memory inside guest before each
balloon operation.

Using a range tree instead of a bitmap to track the removed memory also
means that the solution scales well with the guest size: even a 1 TB range
takes just a few bytes of such metadata.

Since the required GTree operations aren't present in every GLib version,
a check for them was added to the meson build script, together with new
"--enable-hv-balloon" and "--disable-hv-balloon" configure arguments.
If these GTree operations are missing in the system's GLib version, this
driver will be skipped during the QEMU build.

An optional "status-report=on" device parameter requests memory status
events from the guest (typically sent every second), which allow the host
to learn both the guest's available memory and in-use memory counts.
They are emitted externally as "HV_BALLOON_STATUS_REPORT" QMP events.

The driver is named hv-balloon since the Linux kernel client driver for
the Dynamic Memory Protocol is named as such and to follow the naming
pattern established by the virtio-balloon driver.
The whole protocol runs over Hyper-V VMBus.

The driver was tested against Windows Server 2012 R2, Windows Server 2016
and Windows Server 2019 guests and obeys the guest alignment requirements
reported to the host via DM_CAPABILITIES_REPORT message.

Signed-off-by: Maciej S. Szmigiero 
---
  Kconfig.host  |3 +
  hw/hyperv/Kconfig |5 +
  hw/hyperv/hv-balloon.c| 2040 +
  hw/hyperv/meson.build |1 +
  hw/hyperv/trace-events|   16 +
  meson.build   |   28 +-
  meson_options.txt |2 +
  qapi/machine.json |   25 +
  scripts/meson-buildoptions.sh |3 +
  9 files changed, 2122 insertions(+), 1 deletion(-)
  create mode 100644 hw/hyperv/hv-balloon.c

diff --git a/Kconfig.host b/Kconfig.host
index d763d892693c..2ee71578f38f 100644
--- a/Kconfig.host
+++ b/Kconfig.host
@@ -46,3 +46,6 @@ config FUZZ
  config VFIO_USER_SERVER_ALLOWED
  bool
  imply VFIO_USER_SERVER
+
+config HV_BALLO

[PATCH][RESEND v5 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)

2023-06-12 Thread Maciej S. Szmigiero
From: "Maciej S. Szmigiero" 

This driver is like virtio-balloon on steroids: it allows both changing the
guest memory allocation via ballooning and inserting pieces of extra RAM
into it on demand from a provided memory backend.

One of the advantages of this approach over ACPI-based PC DIMM hotplug is
that such memory can be hotplugged at a much smaller granularity, because
the ACPI DIMM slot limit does not apply.

In order to enable hot-adding additional memory a new memory backend needs
to be created and provided to the driver via the "memdev" parameter.
This can be achieved by, for example, adding
"-object memory-backend-ram,id=mem1,size=32G" to the QEMU command line and
then instantiating the driver with "memdev=mem1" parameter.

In contrast with ACPI DIMM hotplug, where one can only request to unplug a
whole DIMM stick, this driver allows removing memory from the guest in
single-page (4k) units via ballooning.

The actual resizing is done via the ballooning interface (for example, via
the "balloon" HMP command).
This includes resizing the guest past its boot size - that is, hot-adding
additional memory in granularity limited only by the guest alignment
requirements.
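
For example, with a guest booted with "-m 4G" and a 32G memdev attached to
the driver, resizing could look like this from the HMP monitor (the "balloon"
argument is the target guest size in megabytes; the numbers are illustrative):

(qemu) balloon 2048     <- balloon the guest down to ~2 GiB
(qemu) balloon 4096     <- give the ballooned-out memory back
(qemu) balloon 16384    <- grow past the 4 GiB boot size by hot-adding memory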

After a VM reboot the guest is back to its original (boot) size.

In the future, the guest boot memory size might be changed on reboot
instead, taking into account the effective size that the VM had before that
reboot (much like Hyper-V does).

For performance reasons, the guest-released memory is tracked in a few
range trees, as a series of (start, count) ranges.
Each time a new page range is inserted into such a tree, its neighbors are
checked as candidates for possible merging with it.

Besides performance reasons, the Dynamic Memory protocol itself uses page
ranges as the data structure in its messages, so relevant pages need to be
merged into such ranges anyway.
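
As a simplified standalone illustration of the neighbor-merging idea (not the
driver's actual GTree-based code):

#include <stdint.h>
#include <stdbool.h>

typedef struct PageRange {
    uint64_t start;     /* first page frame number of the range */
    uint64_t count;     /* number of pages in the range */
} PageRange;

/* If 'next' starts exactly where 'prev' ends, fold it into 'prev'. */
static bool page_range_try_merge(PageRange *prev, const PageRange *next)
{
    if (prev->start + prev->count != next->start) {
        return false;
    }
    prev->count += next->count;
    return true;
}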

One has to be careful when tracking the guest-released pages, since the
guest can maliciously report returning pages outside its current address
space, which later clash with the address range of newly added memory.
Similarly, the guest can report freeing the same page twice.

The above design results in much better ballooning performance than when
using virtio-balloon with the same guest: 230 GB / minute with this driver
versus 70 GB / minute with virtio-balloon.

During a ballooning operation most of the time is spent waiting for the guest
to come up with newly freed page ranges; processing the received ranges on
the host side (in QEMU and KVM) is nearly instantaneous.

The unballoon operation is also pretty much instantaneous:
thanks to the merging of the ballooned out page ranges 200 GB of memory can
be returned to the guest in about 1 second.
With virtio-balloon this operation takes about 2.5 minutes.

These tests were done against a Windows Server 2019 guest running on a
Xeon E5-2699, after dirtying the whole memory inside guest before each
balloon operation.

Using a range tree instead of a bitmap to track the removed memory also
means that the solution scales well with the guest size: even a 1 TB range
takes just a few bytes of such metadata.
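
For a rough sense of scale: 1 TiB is 2^28 4-KiB pages, so a page bitmap would
need 2^28 bits = 32 MiB, while a single contiguous (start, count) range costs
just two 64-bit values, i.e. 16 bytes.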

Since the required GTree operations aren't present in every GLib version,
a check for them was added to the meson build script, together with new
"--enable-hv-balloon" and "--disable-hv-balloon" configure arguments.
If these GTree operations are missing in the system's GLib version, this
driver will be skipped during the QEMU build.

An optional "status-report=on" device parameter requests memory status
events from the guest (typically sent every second), which allow the host
to learn both the guest's available memory and in-use memory counts.
They are emitted externally as "HV_BALLOON_STATUS_REPORT" QMP events.
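
Purely as an illustration of the shape of such an event (the data member
names are assumptions here; the actual definitions live in the patch's
qapi/machine.json change):

{ "event": "HV_BALLOON_STATUS_REPORT",
  "data": { "committed": 816123904, "available": 3232759808 },
  "timestamp": { "seconds": 1686571200, "microseconds": 0 } }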

The driver is named hv-balloon since the Linux kernel client driver for
the Dynamic Memory Protocol is named as such and to follow the naming
pattern established by the virtio-balloon driver.
The whole protocol runs over Hyper-V VMBus.

The driver was tested against Windows Server 2012 R2, Windows Server 2016
and Windows Server 2019 guests and obeys the guest alignment requirements
reported to the host via DM_CAPABILITIES_REPORT message.

Signed-off-by: Maciej S. Szmigiero 
---
 Kconfig.host  |3 +
 hw/hyperv/Kconfig |5 +
 hw/hyperv/hv-balloon.c| 2040 +
 hw/hyperv/meson.build |1 +
 hw/hyperv/trace-events|   16 +
 meson.build   |   28 +-
 meson_options.txt |2 +
 qapi/machine.json |   25 +
 scripts/meson-buildoptions.sh |3 +
 9 files changed, 2122 insertions(+), 1 deletion(-)
 create mode 100644 hw/hyperv/hv-balloon.c

diff --git a/Kconfig.host b/Kconfig.host
index d763d892693c..2ee71578f38f 100644
--- a/Kconfig.host
+++ b/Kconfig.host
@@ -46,3 +46,6 @@ config FUZZ
 config VFIO_USER_SERVER_ALLOWED
 bool
 imply VFIO_USER_SERVER
+
+config HV_BALLOON_POSSIBLE
+bool
diff --git a/hw/hyperv/Kconfig b/hw/hype