Re: [PATCH 2/2] vhost-use-blk: convert to new virtio_delete_queue
On 2/21/2020 7:31 PM, Stefan Hajnoczi wrote: > On Thu, Feb 13, 2020 at 09:28:07AM +0800, pannengy...@huawei.com wrote: >> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c >> index 2eba8b9db0..ed6a5cc03b 100644 >> --- a/hw/block/vhost-user-blk.c >> +++ b/hw/block/vhost-user-blk.c >> @@ -420,9 +420,10 @@ static void vhost_user_blk_device_realize(DeviceState >> *dev, Error **errp) >> virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK, >> sizeof(struct virtio_blk_config)); >> >> +s->virtqs = g_new0(VirtQueue *, s->num_queues); > > Minor point, up to you if you want to change it: the array is fully > initialized by the for loop in the next line. There is no need to clear > the memory first: > > s/g_new0/g_new/ OK, it's fine, I will change it. Thanks. > >> diff --git a/include/hw/virtio/vhost-user-blk.h >> b/include/hw/virtio/vhost-user-blk.h >> index 108bfadeeb..f68911f6f0 100644 >> --- a/include/hw/virtio/vhost-user-blk.h >> +++ b/include/hw/virtio/vhost-user-blk.h >> @@ -37,6 +37,7 @@ typedef struct VHostUserBlk { >> struct vhost_inflight *inflight; >> VhostUserState vhost_user; >> struct vhost_virtqueue *vqs; >> +VirtQueue **virtqs; > > Both vqs and virtqs exist and are easily confused. Please rename vqs to > vhost_vqs. OK, I will do it. Thanks. >
Re: [PATCH] target: i386: Check float overflow about register stack
On 22/02/20 03:10, Chen Gang wrote: > Set C1 to 1 if stack overflow occurred; set to 0 otherwise". > > In helper_fxam_ST0, I guess, we need "env->fpus |= 0x200" (but I don't > know wheter it will be conflict with SIGND(temp)). And we have to still > need foverflow, because all env->fptags being 0 doesn't mean overflow. No, you need to add "env->fpus |= 0x200" and "env->fpus &= ~0x200" directly to fpush, fpop, etc. Polo
[Bug 1814420] Re: drive-backup with iscsi, it will failed "Could not create image: Invalid argument"
[Expired for QEMU because there has been no activity for 60 days.] ** Changed in: qemu Status: Incomplete => Expired -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1814420 Title: drive-backup with iscsi, it will failed "Could not create image: Invalid argument" Status in QEMU: Expired Bug description: I use iscsi protocol to drive-backup: ---iscsi target--- yum -y install targetcli python-rtslib systemctl start target systemctl enable target targetcli /iscsi create iqn.2019-01.com.iaas targetcli /iscsi/iqn.2019-01.com.iaas/tpg1 set attribute authentication=0 demo_mode_write_protect=0 generate_node_acls=1 targetcli /iscsi/iqn.2019-01.com.iaas/tpg1/portals create 192.168.1.1 3260 targetcli /backstores/fileio create file1 /opt/file1 2G targetcli /iscsi/iqn.2019-01.com.iaas/tpg1/luns create /backstores/fileio/file1 --- Now, '{ "execute" : "drive-backup" , "arguments" : { "device" : "drive-virtio-disk0" , "sync" : "top" , "target" : "iscsi://192.168.1.1:3260/iqn.2019-01.com.iaas/0" } }' It may failed: {"id":"libvirt-1785","error":{"class":"GenericError","desc":"Could not create image: Invalid argument"}} But, This abnormal will be appear at the first time. Because when I retry again, It works very well. Then, I re-start the vm, It still be failed 'Could not create image: Invalid argument' on the first try, and the second try it will work very well. --- Host: centos 7.5 qemu version: 2.12 and 3.1.0 qemu command line: qemu-system-x86_64 -name guest=test,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-190-test./master-key.aes -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off,mem-merge=off -m 1024 -mem-prealloc -mem-path /dev/hugepages1G/libvirt/qemu/190-test -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 1c8611c2-a18a-4b1c-b40b-9d82040eafa4 -smbios type=1,manufacturer=IaaS -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=31,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot menu=on,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/opt/vol/sas/fb0c7c37-13e7-41fe-b3f8-f0fbaaeec7ce,format=qcow2,if=none,id=drive-virtio-disk0,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on -drive file=/opt/vol/sas/bde66671-536d-49cd-8b46-a4f1ea7be513,format=qcow2,if=none,id=drive-virtio-disk1,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1,write-cache=on -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=34 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:85:45:3e:d4:3a,bus=pci.0,addr=0x6 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=35,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1814420/+subscriptions
[Bug 1719282] Re: Unable to boot after drive-mirror
[Expired for QEMU because there has been no activity for 60 days.] ** Changed in: qemu Status: Incomplete => Expired -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1719282 Title: Unable to boot after drive-mirror Status in QEMU: Expired Bug description: Hi, I am using "drive-mirror" qmp block-job command to transfer VM disk image to other path (different physical disk on host). Unfortunately after shutting down and starting from new image, VM is unable to boot and qrub enters rescue mode displaying following error: ``` error: file '/grub/i386-pc/normal.mod' not found. Entering rescue mode... grub rescue> ``` To investigate the problem, I compared both RAW images using linux "cmp -l" command and found out that they differ in 569028 bytes starting from address 185598977 to 252708864 which are located on /boot partition. So I mounted /boot partition of mirrored RAW image on host OS and it seems that file-system is broken and grub folder is not recognized. But /boot on original RAW image has no problem. Mirrored Image: ls -l /mnt/vm-boot/ ls: cannot access /mnt/vm-boot/grub: Structure needs cleaning total 38168 -rw-r--r-- 1 root root 157721 Oct 19 2016 config-3.16.0-4-amd64 -rw-r--r-- 1 root root 129281 Sep 20 2015 config-3.2.0-4-amd64 d? ? ?? ?? grub -rw-r--r-- 1 root root 15739360 Nov 2 2016 initrd.img-3.16.0-4-amd64 -rw-r--r-- 1 root root 12115412 Oct 10 2015 initrd.img-3.2.0-4-amd64 drwxr-xr-x 2 root root12288 Oct 7 2013 lost+found -rw-r--r-- 1 root root 2679264 Oct 19 2016 System.map-3.16.0-4-amd64 -rw-r--r-- 1 root root 2114662 Sep 20 2015 System.map-3.2.0-4-amd64 -rw-r--r-- 1 root root 3126448 Oct 19 2016 vmlinuz-3.16.0-4-amd64 -rw-r--r-- 1 root root 2842592 Sep 20 2015 vmlinuz-3.2.0-4-amd64 Original Image: ls /mnt/vm-boot/ -l total 38173 -rw-r--r-- 1 root root 157721 Oct 19 2016 config-3.16.0-4-amd64 -rw-r--r-- 1 root root 129281 Sep 20 2015 config-3.2.0-4-amd64 drwxr-xr-x 5 root root 5120 Nov 2 2016 grub -rw-r--r-- 1 root root 15739360 Nov 2 2016 initrd.img-3.16.0-4-amd64 -rw-r--r-- 1 root root 12115412 Oct 10 2015 initrd.img-3.2.0-4-amd64 drwxr-xr-x 2 root root12288 Oct 7 2013 lost+found -rw-r--r-- 1 root root 2679264 Oct 19 2016 System.map-3.16.0-4-amd64 -rw-r--r-- 1 root root 2114662 Sep 20 2015 System.map-3.2.0-4-amd64 -rw-r--r-- 1 root root 3126448 Oct 19 2016 vmlinuz-3.16.0-4-amd64 -rw-r--r-- 1 root root 2842592 Sep 20 2015 vmlinuz-3.2.0-4-amd64 ls /mnt/vm-boot/grub/ -l total 2376 -rw-r--r-- 1 root root 48 Oct 7 2013 device.map drwxr-xr-x 2 root root1024 Oct 10 2015 fonts -r--r--r-- 1 root root9432 Nov 2 2016 grub.cfg -rw-r--r-- 1 root root1024 Oct 7 2013 grubenv drwxr-xr-x 2 root root6144 Aug 6 2016 i386-pc drwxr-xr-x 2 root root1024 Aug 6 2016 locale -rw-r--r-- 1 root root 2400500 Aug 6 2016 unicode.pf2 qemu Version: 2.7.0-10 Host OS: Debian 8x64 Guest OS: Debian 8x64 QMP Commands log: socat UNIX-CONNECT:/var/run/qemu-server/48016.qmp STDIO {"QMP": {"version": {"qemu": {"micro": 0, "minor": 7, "major": 2}, "package": "pve-qemu-kvm_2.7.0-10"}, "capabilities": []}} { "execute": "qmp_capabilities" } {"return": {}} { "execute": "drive-mirror", "arguments": { "device": "drive-ide0", "target": "/diskc/48016/vm-48016-disk-2.raw", "sync": "full", "mode": "absolute-paths", "speed": 0 } } {"return": {}} {"timestamp": {"seconds": 1506331591, "microseconds": 623095}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-ide0", "len": 269445758976, "offset": 269445758976, "speed": 0, "type": "mirror"}} {"timestamp": {"seconds": 1506332641, "microseconds": 245272}, "event": "SHUTDOWN"} {"timestamp": {"seconds": 1506332641, "microseconds": 377751}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-ide0", "len": 271707340800, "offset": 271707340800, "speed": 0, "type": "mirror"}} To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1719282/+subscriptions
Re: [PATCH 2/2] riscv: sifive_u: Update BIOS_FILENAME for 32-bit
On Fri, Feb 21, 2020 at 6:53 PM Bin Meng wrote: > > On Sat, Feb 22, 2020 at 3:51 AM Alistair Francis wrote: > > > > On Thu, Feb 20, 2020 at 6:43 AM Bin Meng wrote: > > > > > > Update BIOS_FILENAME to consider 32-bit bios image file name. > > > > > > Tested booting Linux v5.5 32-bit image (built from rv32_defconfig > > > plus CONFIG_SOC_SIFIVE) with the default 32-bit bios image. > > > > Do we really want to support a 32-bit sifive_u machine? > > > > QEMU is an emulator, why not? With 32-bit sifive_u machine, we can > have 32-bit test coverage for SiFive specific drivers that cannot be > done with the 'virt' machine. That is a good point. Reviewed-by: Alistair Francis Alistair > > Regards, > Bin
Re: [PATCH 2/2] riscv: sifive_u: Update BIOS_FILENAME for 32-bit
On Sat, Feb 22, 2020 at 3:51 AM Alistair Francis wrote: > > On Thu, Feb 20, 2020 at 6:43 AM Bin Meng wrote: > > > > Update BIOS_FILENAME to consider 32-bit bios image file name. > > > > Tested booting Linux v5.5 32-bit image (built from rv32_defconfig > > plus CONFIG_SOC_SIFIVE) with the default 32-bit bios image. > > Do we really want to support a 32-bit sifive_u machine? > QEMU is an emulator, why not? With 32-bit sifive_u machine, we can have 32-bit test coverage for SiFive specific drivers that cannot be done with the 'virt' machine. Regards, Bin
Re: [PATCH] target: i386: Check float overflow about register stack
On 2020/2/22 上午10:10, Chen Gang wrote: > On 2020/2/22 上午12:18, Paolo Bonzini wrote: >> On 21/02/20 15:09, Chen Gang wrote: -/* XXX: test fptags too */ +if (env->fptags[env->fpstt]) { +env->fpus |= 0x4100; /* Empty */ +return; +} + >>> For fpop overflow, this fix is enough, but for me, we still need >>> foverflow to check fpush/fld*_ST0 overflow. >>> >>> Don't you think we need check fpush/f[i]ld*_ST0 overflow? >> >> After fld/fild or any other push, FXAM ST0 should not return empty in my >> opinion. >> > > OK, it sounds reasonable. > > After check the intel document for f[i]ld* instructions, it says: > > "Set C1 to 1 if stack overflow occurred; set to 0 otherwise". > > In helper_fxam_ST0, I guess, we need "env->fpus |= 0x200" (but I don't > know wheter it will be conflict with SIGND(temp)). And we have to still > need foverflow, because all env->fptags being 0 doesn't mean overflow. > After read the document more about fstp* and fld* instructions, I find that fstp* are also related with C1, and fld* will generate IS exception when stack underflow or overflow occurred. It seems more modifications should be done (but they will be several patches). Thanks.
Re: [PATCH] target: i386: Check float overflow about register stack
On 2020/2/22 上午12:18, Paolo Bonzini wrote: > On 21/02/20 15:09, Chen Gang wrote: >>> -/* XXX: test fptags too */ >>> +if (env->fptags[env->fpstt]) { >>> +env->fpus |= 0x4100; /* Empty */ >>> +return; >>> +} >>> + >> For fpop overflow, this fix is enough, but for me, we still need >> foverflow to check fpush/fld*_ST0 overflow. >> >> Don't you think we need check fpush/f[i]ld*_ST0 overflow? > > After fld/fild or any other push, FXAM ST0 should not return empty in my > opinion. > OK, it sounds reasonable. After check the intel document for f[i]ld* instructions, it says: "Set C1 to 1 if stack overflow occurred; set to 0 otherwise". In helper_fxam_ST0, I guess, we need "env->fpus |= 0x200" (but I don't know wheter it will be conflict with SIGND(temp)). And we have to still need foverflow, because all env->fptags being 0 doesn't mean overflow. Thanks.
Re: [PATCH] hw/ide: Remove status register read side effect
> The chip docs don't mention any side effect, they only say that the > DMA > IRQ and Error bits in the bus master IDE status reg (bits 2 and 1) > are > write 1 to clear I haven't found any documentation that mention that side effect either. As you say, it only mentions that you should set the bit to clear. While the side effect of clearing interrupts by reading the status register doesn't appear to be in documentation anywhere (to my knowledge), I do see this side effect referenced in the source code of drivers occasionally. In /drivers/ide/ide-io.c of the Linux kernel: /* * Whack the status register, just in case * we have a leftover pending IRQ. */ (void)hwif->tp_ops->read_status(hwif); Along with: * There's nothing really useful we can do with an unexpected interrupt, * other than reading the status register (to clear it), and logging it. The CMD64x specific code in the Linux kernel does explicitly set the IRQ bits in the bus master IDE status register to clear it. I'm not sure why, so maybe someone else can chime in explaining why Linux sometimes clears interrupts by reading the status register and other times follows the documentation and sets the required bits. The OpenBSD driver also appears to set the bits explicitly. I think the reason why the Solaris 10 driver crashes fatally whereas Linux and OpenBSD ignore the side effect is because when clearing interrupts, Solaris 10 expects the interrupt bit to be set and checks this. Linux and OpenBSD appear to clear it regardless of its state. With the patch, I didn't notice any regression for OpenBSD under Sun4u and there were improvements to the Solaris 10 boot under Sun4u. > Unfortunately > this does not change my problems with other BMDMA devices on PPC. This patch doesn't solve all the problems for Solaris 10. It gets further in the boot process but it is still unable to mount the file system. I suspect that there are more bugs in the IDE/CMD646 emulation. I'm going to continue looking into it. It's still possible we are being affected by the same bugs. Right now, I'm considering that the unresponsive serial console under Sun4u on Solaris 10 is due to not being able to mount the file system and pull the required assets for the installation menu. > this change seems to break > something else according to a CI test report from patchew. The test appears to break here in /tests/qtest/ide-test.c for obvious reasons: /* Reading the status register clears the IRQ */ g_assert(!qtest_get_irq(qts, IDE_PRIMARY_IRQ)); Should the patch I've suggested be correct, this test would need to be updated. This is my first patch attempt for QEMU so I'm not sure what the process is. Should I be submitting a new V2 patch with these changes? I won't have the opportunity to update the patch for a few days but will continue watching the thread for reviews. Thanks, Jasper Lowell. On Fri, 2020-02-21 at 16:43 +0100, BALATON Zoltan wrote: > On Fri, 21 Feb 2020, jasper.low...@bt.com wrote: > > The Linux libATA API documentation mentions that on some hardware, > > reading the status register has the side effect of clearing the > > interrupt condition. When emulating the generic Sun4u machine > > running > > Solaris 10, the Solaris 10 CMD646 driver exits fatally because of > > this > > emulated side effect. This side effect is likely to not exist on > > real > > CMD646 hardware. > > The chip docs don't mention any side effect, they only say that the > DMA > IRQ and Error bits in the bus master IDE status reg (bits 2 and 1) > are > write 1 to clear so this might be OK but this change seems to break > something else according to a CI test report from patchew. > Unfortunately > this does not change my problems with other BMDMA devices on PPC. > > Regards, > BALATON Zoltan > > > Signed-off-by: Jasper Lowell > > --- > > hw/ide/core.c | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/hw/ide/core.c b/hw/ide/core.c > > index 8eb766..82fd0632ac 100644 > > --- a/hw/ide/core.c > > +++ b/hw/ide/core.c > > @@ -2210,7 +2210,6 @@ uint32_t ide_ioport_read(void *opaque, > > uint32_t addr) > > } else { > > ret = s->status; > > } > > -qemu_irq_lower(bus->irq); > > break; > > } > > > >
[PATCH] linux-user: Add an argument QEMU_MMAP_BASE to set custom mmap base address in qemu user mode
This change allows us to set custom base address for guest programs. It is needed to allow qemu to work with Thread Sanitizer (TSan), which has specific boundary definitions for memory mappings on different platforms: https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/tsan/rtl/tsan_platform.h Signed-off-by: Lirong Yuan --- linux-user/main.c | 12 linux-user/mmap.c | 3 ++- linux-user/qemu.h | 5 + 3 files changed, 19 insertions(+), 1 deletion(-) diff --git a/linux-user/main.c b/linux-user/main.c index fba833aac9..c01af6bfee 100644 --- a/linux-user/main.c +++ b/linux-user/main.c @@ -336,6 +336,16 @@ static void handle_arg_guest_base(const char *arg) have_guest_base = 1; } +static void handle_arg_mmap_base(const char *arg) +{ +int err = qemu_strtoul(arg, NULL, 0, _base); +if (err) { +fprintf(stderr, "Invalid mmap_base: %s, err: %d\n", arg, err); +exit(EXIT_FAILURE); +} +mmap_next_start = mmap_base; +} + static void handle_arg_reserved_va(const char *arg) { char *p; @@ -440,6 +450,8 @@ static const struct qemu_argument arg_table[] = { "uname", "set qemu uname release string to 'uname'"}, {"B", "QEMU_GUEST_BASE", true, handle_arg_guest_base, "address","set guest_base address to 'address'"}, +{"mmap_base", "QEMU_MMAP_BASE", true, handle_arg_mmap_base, + "", "begin allocating guest pages at this host address"}, {"R", "QEMU_RESERVED_VA", true, handle_arg_reserved_va, "size", "reserve 'size' bytes for guest virtual address space"}, {"d", "QEMU_LOG", true, handle_arg_log, diff --git a/linux-user/mmap.c b/linux-user/mmap.c index 8685f02e7e..3f35543acf 100644 --- a/linux-user/mmap.c +++ b/linux-user/mmap.c @@ -189,6 +189,7 @@ static int mmap_frag(abi_ulong real_start, # define TASK_UNMAPPED_BASE 0x4000 #endif abi_ulong mmap_next_start = TASK_UNMAPPED_BASE; +abi_ulong mmap_base = TASK_UNMAPPED_BASE; unsigned long last_brk; @@ -299,7 +300,7 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, abi_ulong align) if ((addr & (align - 1)) == 0) { /* Success. */ -if (start == mmap_next_start && addr >= TASK_UNMAPPED_BASE) { +if (start == mmap_next_start && addr >= mmap_base) { mmap_next_start = addr + size; } return addr; diff --git a/linux-user/qemu.h b/linux-user/qemu.h index 560a68090e..83c00cfea2 100644 --- a/linux-user/qemu.h +++ b/linux-user/qemu.h @@ -161,6 +161,11 @@ void task_settid(TaskState *); void stop_all_tasks(void); extern const char *qemu_uname_release; extern unsigned long mmap_min_addr; +/* + * mmap_base is minimum address to use when allocating guest pages. All guest + * pages will be allocated at this (guest) address or higher addresses. + */ +extern abi_ulong mmap_base; /* ??? See if we can avoid exposing so much of the loader internals. */ -- 2.25.0.265.gbab2e86ba0-goog
Re: RFC: Split EPT huge pages in advance of dirty logging
On Fri, Feb 21, 2020 at 2:08 PM Junaid Shahid wrote: > > On 2/20/20 9:34 AM, Ben Gardon wrote: > > > > FWIW, we currently do this eager splitting at Google for live > > migration. When the log-dirty-memory flag is set on a memslot we > > eagerly split all pages in the slot down to 4k granularity. > > As Jay said, this does not cause crippling lock contention because the > > vCPU page faults generated by write protection / splitting can be > > resolved in the fast page fault path without acquiring the MMU lock. > > I believe +Junaid Shahid tried to upstream this approach at some point > > in the past, but the patch set didn't make it in. (This was before my > > time, so I'm hoping he has a link.) > > I haven't done the analysis to know if eager splitting is more or less > > efficient with parallel slow-path page faults, but it's definitely > > faster under the MMU lock. > > > > I am not sure if we ever posted those patches upstream. Peter Feiner would > know for sure. One notable difference in what we do compared to the approach > outlined by Jay is that we don't rely on tdp_page_fault() to do the > splitting. So we don't have to create a dummy VCPU and the specialized split > function is also much faster. We've been carrying these patches since 2015. I've never posted them. Getting them in shape for upstream consumption will take some work. I can look into this next week. Peter
Re: [PATCH] linux-user: Add an argument QEMU_MMAP_BASE to set custom mmap base address in qemu user mode
Patchew URL: https://patchew.org/QEMU/20200221214614.165338-1-yua...@google.com/ Hi, This series seems to have some coding style problems. See output below for more information: Subject: [PATCH] linux-user: Add an argument QEMU_MMAP_BASE to set custom mmap base address in qemu user mode Message-id: 20200221214614.165338-1-yua...@google.com Type: series === TEST SCRIPT BEGIN === #!/bin/bash git rev-parse base > /dev/null || exit 0 git config --local diff.renamelimit 0 git config --local diff.renames True git config --local diff.algorithm histogram ./scripts/checkpatch.pl --mailback base.. === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 Switched to a new branch 'test' 38a7b04 linux-user: Add an argument QEMU_MMAP_BASE to set custom mmap base address in qemu user mode === OUTPUT BEGIN === ERROR: consider using qemu_strtoul in preference to strtoul #23: FILE: linux-user/main.c:341: +mmap_base = strtoul(arg, NULL, 0); total: 1 errors, 0 warnings, 50 lines checked Commit 38a7b049a7f8 (linux-user: Add an argument QEMU_MMAP_BASE to set custom mmap base address in qemu user mode) has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. === OUTPUT END === Test command exited with code: 1 The full log is available at http://patchew.org/logs/20200221214614.165338-1-yua...@google.com/testing.checkpatch/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-de...@redhat.com
Re: RFC: Split EPT huge pages in advance of dirty logging
On 2/20/20 9:34 AM, Ben Gardon wrote: FWIW, we currently do this eager splitting at Google for live migration. When the log-dirty-memory flag is set on a memslot we eagerly split all pages in the slot down to 4k granularity. As Jay said, this does not cause crippling lock contention because the vCPU page faults generated by write protection / splitting can be resolved in the fast page fault path without acquiring the MMU lock. I believe +Junaid Shahid tried to upstream this approach at some point in the past, but the patch set didn't make it in. (This was before my time, so I'm hoping he has a link.) I haven't done the analysis to know if eager splitting is more or less efficient with parallel slow-path page faults, but it's definitely faster under the MMU lock. I am not sure if we ever posted those patches upstream. Peter Feiner would know for sure. One notable difference in what we do compared to the approach outlined by Jay is that we don't rely on tdp_page_fault() to do the splitting. So we don't have to create a dummy VCPU and the specialized split function is also much faster. Thanks, Junaid
[PATCH] linux-user: Add an argument QEMU_MMAP_BASE to set custom mmap base address in qemu user mode
This change allows us to set custom base address for guest programs. It is needed to allow qemu to work with Thread Sanitizer (TSan), which has specific boundary definitions for memory mappings on different platforms: https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/tsan/rtl/tsan_platform.h Signed-off-by: Lirong Yuan --- linux-user/main.c | 12 linux-user/mmap.c | 3 ++- linux-user/qemu.h | 5 + 3 files changed, 19 insertions(+), 1 deletion(-) diff --git a/linux-user/main.c b/linux-user/main.c index fba833aac9..dfcd867399 100644 --- a/linux-user/main.c +++ b/linux-user/main.c @@ -336,6 +336,16 @@ static void handle_arg_guest_base(const char *arg) have_guest_base = 1; } +static void handle_arg_mmap_base(const char *arg) +{ +mmap_base = strtoul(arg, NULL, 0); +if (mmap_base == 0) { +fprintf(stderr, "Invalid mmap_base: %s\n", arg); +exit(EXIT_FAILURE); +} +mmap_next_start = mmap_base; +} + static void handle_arg_reserved_va(const char *arg) { char *p; @@ -440,6 +450,8 @@ static const struct qemu_argument arg_table[] = { "uname", "set qemu uname release string to 'uname'"}, {"B", "QEMU_GUEST_BASE", true, handle_arg_guest_base, "address","set guest_base address to 'address'"}, +{"mmap_base", "QEMU_MMAP_BASE", true, handle_arg_mmap_base, + "", "begin allocating guest pages at this host address"}, {"R", "QEMU_RESERVED_VA", true, handle_arg_reserved_va, "size", "reserve 'size' bytes for guest virtual address space"}, {"d", "QEMU_LOG", true, handle_arg_log, diff --git a/linux-user/mmap.c b/linux-user/mmap.c index 8685f02e7e..3f35543acf 100644 --- a/linux-user/mmap.c +++ b/linux-user/mmap.c @@ -189,6 +189,7 @@ static int mmap_frag(abi_ulong real_start, # define TASK_UNMAPPED_BASE 0x4000 #endif abi_ulong mmap_next_start = TASK_UNMAPPED_BASE; +abi_ulong mmap_base = TASK_UNMAPPED_BASE; unsigned long last_brk; @@ -299,7 +300,7 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, abi_ulong align) if ((addr & (align - 1)) == 0) { /* Success. */ -if (start == mmap_next_start && addr >= TASK_UNMAPPED_BASE) { +if (start == mmap_next_start && addr >= mmap_base) { mmap_next_start = addr + size; } return addr; diff --git a/linux-user/qemu.h b/linux-user/qemu.h index 560a68090e..83c00cfea2 100644 --- a/linux-user/qemu.h +++ b/linux-user/qemu.h @@ -161,6 +161,11 @@ void task_settid(TaskState *); void stop_all_tasks(void); extern const char *qemu_uname_release; extern unsigned long mmap_min_addr; +/* + * mmap_base is minimum address to use when allocating guest pages. All guest + * pages will be allocated at this (guest) address or higher addresses. + */ +extern abi_ulong mmap_base; /* ??? See if we can avoid exposing so much of the loader internals. */ -- 2.25.0.265.gbab2e86ba0-goog
[PATCH v2 0/1] Enable PMR feature from NVMe 1.4 spec to NVMe driver
Changes since v1: - provided support for Bit 1 from PMRWBM register instead of Bit 0 to ensure improved performance in virtualized environment [1] (Stefan) - added check if pmr size is power of two in size (David) - addressed cross compilation build problems reported by CI environment [1]: https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4-2019.06.10-Ratified.pdf [2]: https://lore.kernel.org/qemu-devel/20200218224811.30050-1-andrzej.jakow...@linux.intel.com/ --- Persistent Memory Region (PMR) is a new optional feature provided in NVMe 1.4 specification. This patch implements initial support for it in NVMe driver. Andrzej Jakowski (1): block/nvme: introduce PMR support from NVMe 1.4 spec hw/block/nvme.c | 165 +++- hw/block/nvme.h | 5 ++ hw/block/trace-events | 5 ++ include/block/nvme.h | 172 ++ 4 files changed, 346 insertions(+), 1 deletion(-) -- 2.21.1
[PATCH v2 1/1] block/nvme: introduce PMR support from NVMe 1.4 spec
This patch introduces support for PMR that has been defined as part of NVMe 1.4 spec. User can now specify a pmr_file which will be mmap'ed into qemu address space and subsequently in PCI BAR 2. Guest OS can perform mmio read and writes to the PMR region that will stay persistent accross system reboot. Signed-off-by: Andrzej Jakowski --- hw/block/nvme.c | 165 +++- hw/block/nvme.h | 5 ++ hw/block/trace-events | 5 ++ include/block/nvme.h | 172 ++ 4 files changed, 346 insertions(+), 1 deletion(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index d28335cbf3..ff7e74d765 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -19,10 +19,14 @@ * -drive file=,if=none,id= * -device nvme,drive=,serial=,id=, \ * cmb_size_mb=, \ + * [pmr_file=,] \ * num_queues= * * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at * offset 0 in BAR2 and supports only WDS, RDS and SQS for now. + * + * Either cmb or pmr - due to limitation in avaialbe BAR indexes. + * pmr_file file needs to be preallocated and power of two in size. */ #include "qemu/osdep.h" @@ -1141,6 +1145,28 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data, NVME_GUEST_ERR(nvme_ub_mmiowr_cmbsz_readonly, "invalid write to read only CMBSZ, ignored"); return; +#ifndef _WIN32 +case 0xE00: /* PMRCAP */ +NVME_GUEST_ERR(nvme_ub_mmiowr_pmrcap_readonly, + "invalid write to PMRCAP register, ignored"); +return; +case 0xE04: /* TODO PMRCTL */ +break; +case 0xE08: /* PMRSTS */ +NVME_GUEST_ERR(nvme_ub_mmiowr_pmrsts_readonly, + "invalid write to PMRSTS register, ignored"); +return; +case 0xE0C: /* PMREBS */ +NVME_GUEST_ERR(nvme_ub_mmiowr_pmrebs_readonly, + "invalid write to PMREBS register, ignored"); +return; +case 0xE10: /* PMRSWTP */ +NVME_GUEST_ERR(nvme_ub_mmiowr_pmrswtp_readonly, + "invalid write to PMRSWTP register, ignored"); +return; +case 0xE14: /* TODO PMRMSC */ + break; +#endif /* !_WIN32 */ default: NVME_GUEST_ERR(nvme_ub_mmiowr_invalid, "invalid MMIO write," @@ -1169,6 +1195,22 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr addr, unsigned size) } if (addr < sizeof(n->bar)) { +#ifndef _WIN32 +/* + * When PMRWBM bit 1 is set then read from + * from PMRSTS should ensure prior writes + * made it to persistent media + */ +if (addr == 0xE08 && +(NVME_PMRCAP_PMRWBM(n->bar.pmrcap) & 0x02) >> 1) { +int ret; +ret = msync(n->pmrbuf, n->f_pmr_size, MS_SYNC); +if (!ret) { +NVME_GUEST_ERR(nvme_ub_mmiord_pmrread_barrier, + "error while persisting data"); +} +} +#endif /* !_WIN32 */ memcpy(, ptr + addr, size); } else { NVME_GUEST_ERR(nvme_ub_mmiord_invalid_ofs, @@ -1303,6 +1345,31 @@ static const MemoryRegionOps nvme_cmb_ops = { }, }; +#ifndef _WIN32 +static void nvme_pmr_write(void *opaque, hwaddr addr, uint64_t data, +unsigned size) +{ +NvmeCtrl *n = (NvmeCtrl *)opaque; +stn_le_p(>pmrbuf[addr], size, data); +} + +static uint64_t nvme_pmr_read(void *opaque, hwaddr addr, unsigned size) +{ +NvmeCtrl *n = (NvmeCtrl *)opaque; +return ldn_le_p(>pmrbuf[addr], size); +} + +static const MemoryRegionOps nvme_pmr_ops = { +.read = nvme_pmr_read, +.write = nvme_pmr_write, +.endianness = DEVICE_LITTLE_ENDIAN, +.impl = { +.min_access_size = 1, +.max_access_size = 8, +}, +}; +#endif /* !_WIN32 */ + static void nvme_realize(PCIDevice *pci_dev, Error **errp) { NvmeCtrl *n = NVME(pci_dev); @@ -1332,6 +1399,39 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp) error_setg(errp, "serial property not set"); return; } + +#ifndef _WIN32 +if (!n->cmb_size_mb && n->pmr_file) { +int fd; + +n->f_pmr = fopen(n->pmr_file, "r+b"); +if (!n->f_pmr) { +error_setg(errp, "pmr backend file open error"); +return; +} + +fseek(n->f_pmr, 0L, SEEK_END); +n->f_pmr_size = ftell(n->f_pmr); +fseek(n->f_pmr, 0L, SEEK_SET); + +/* pmr file size needs to be power of 2 in size */ +if (!n->f_pmr_size || !is_power_of_2(n->f_pmr_size)) { +error_setg(errp, "pmr backend file size needs to be greater than 0" + "and power of 2 in size"); +return; +} + +fd = fileno(n->f_pmr); +n->pmrbuf = mmap(NULL, n->f_pmr_size, + (PROT_READ | PROT_WRITE), MAP_SHARED, fd, 0); + +
[PATCH] linux-user: Add AT_EXECFN and AT_EXECFD auxval
This change adds the support for AT_EXECFN and AT_EXECFD auxval. Signed-off-by: Lirong Yuan --- linux-user/elfload.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index f3080a1635..7e0f3042f1 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -1568,7 +1568,7 @@ struct exec ~(abi_ulong)(TARGET_ELF_EXEC_PAGESIZE-1)) #define TARGET_ELF_PAGEOFFSET(_v) ((_v) & (TARGET_ELF_EXEC_PAGESIZE-1)) -#define DLINFO_ITEMS 15 +#define DLINFO_ITEMS 17 static inline void memcpy_fromfs(void * to, const void * from, unsigned long n) { @@ -1888,11 +1888,14 @@ static abi_ulong loader_build_fdpic_loadmap(struct image_info *info, abi_ulong s return sp; } -static abi_ulong create_elf_tables(abi_ulong p, int argc, int envc, +static abi_ulong create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec, struct image_info *info, struct image_info *interp_info) { +abi_ulong p = bprm->p; +int argc = bprm->argc; +int envc = bprm->envc; abi_ulong sp; abi_ulong u_argc, u_argv, u_envp, u_auxv; int size; @@ -2032,6 +2035,8 @@ static abi_ulong create_elf_tables(abi_ulong p, int argc, int envc, NEW_AUX_ENT(AT_CLKTCK, (abi_ulong) sysconf(_SC_CLK_TCK)); NEW_AUX_ENT(AT_RANDOM, (abi_ulong) u_rand_bytes); NEW_AUX_ENT(AT_SECURE, (abi_ulong) qemu_getauxval(AT_SECURE)); +NEW_AUX_ENT(AT_EXECFN, info->file_string); +NEW_AUX_ENT(AT_EXECFD, bprm->fd); #ifdef ELF_HWCAP2 NEW_AUX_ENT(AT_HWCAP2, (abi_ulong) ELF_HWCAP2); @@ -2870,8 +2875,8 @@ int load_elf_binary(struct linux_binprm *bprm, struct image_info *info) #endif } -bprm->p = create_elf_tables(bprm->p, bprm->argc, bprm->envc, _ex, -info, (elf_interpreter ? _info : NULL)); +bprm->p = create_elf_tables(bprm, _ex, info, +(elf_interpreter ? _info : NULL)); info->start_stack = bprm->p; /* If we have an interpreter, set that as the program's entry point. -- 2.25.0.265.gbab2e86ba0-goog
Re: [PATCH v1] block/nvme: introduce PMR support from NVMe 1.4 spec
On 2/21/20 12:32 PM, Stefan Hajnoczi wrote: > Hi Andrzej, > After having looked at the PMRWBM part of the spec, I think that the > Bit 1 mode should be implemented for slightly better performance. Bit > 0 mode is not well-suited to virtualization for the reasons I > mentioned in the previous email. > > The spec describes Bit 1 mode as "The completion of a read to the > PMRSTS register shall ensure that all prior writes to the Persistent > Memory Region have completed and are persistent". > > Stefan Make sense -- will incorporate that feedback in second version of patch.
Re: [PATCH v1] block/nvme: introduce PMR support from NVMe 1.4 spec
On 2/21/20 11:45 AM, Dr. David Alan Gilbert wrote: > Isn't there also a requirement that BARs are powers of two? Wouldn't > you need to ensure the PMR file is a power of 2 in size? > > Dave Agree, need to make sure it is power of 2.
Re: [PATCH] maint: Include top-level *.rst files early in git diff
On 2/21/20 4:17 AM, Peter Maydell wrote: On Thu, 20 Feb 2020 at 16:22, Eric Blake wrote: We are converting more doc files to *.rst rather than *.texi. Most doc files are already listed early in diffs due to our catchall docs/*, but a few top-level files get missed by that glob. Signed-off-by: Eric Blake --- Both *.texi and *.rst entries make sense while we are still converting things, but we'll need a followup to drop *.texi when the conversion is complete... I was wondering what rst files we had outside docs, and the answer is "README.rst, CODING_STYLE.rst and tests/acceptance/README.rst". (There's a reasonable argument for moving CODING_STYLE.rst to docs/devel now; depends how highly you rate having it more "discoverable" to new contributors.) Or even: git mv CODING_STYLE.rst docs/devel/CODING_STYLE.rst ln -s docs/devel/CODING_STYLE.rst . git add CODING_STYLE.rst to turn it into a git link in the top level for easy discoverabilty on a fresh checkout, while still making it part of our overall built documentation. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [PATCH v7 01/11] qapi/error: add (Error **errp) cleaning APIs
On 2/21/20 3:20 AM, Vladimir Sementsov-Ogievskiy wrote: +static inline void warn_report_errp(Error **errp) +{ + assert(errp && *errp); + warn_report_err(*errp); + *errp = NULL; +} + + /* * Just like error_setg(), except you get to specify the error class. * Note: use of error classes other than ERROR_CLASS_GENERIC_ERROR is These appear to be unused apart from the Coccinelle script in PATCH 03. They are used in the full "[RFC v5 000/126] error: auto propagated local_err" series. Options: 1. Pick a few more patches into this part I series, so these guys come with users. It needs some additional effort for this series.. But it's possible. Still, I think that we at least should not pull out patches which should be in future series (for example from ppc or block/).. 2. Punt this patch to the first part that has users, along with the part of the Coccinelle script that deals with them. But coccinelle script would be wrong, if we drop this part from it. I think, that after commit which adds coccinelle script, it should work with any file, not only subset of these series. So, it's probably OK for now to drop these functions, forcing their addition if coccinelle script will be applied where these functions are needed. We can, for example comment these three functions. Splitting coccinelle script into two parts, which will be in different series will not help any patch-porting processes. Splitting the coccinelle script across multiple patches is actually quite reviewable, and still easy to backport. Consider this series by Philippe: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg05554.html which makes multiple additions to scripts/coccinelle/exec_rw_const.cocci over the course of the series. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [PATCH] migration/throttle: Add throttle-trig-thres migration parameter
On 2/20/20 8:57 PM, Keqian Zhu wrote: Currently, if the bytes_dirty_period is more than the 50% of bytes_xfer_period, we start or increase throttling. If we make this percentage higher, then we can tolerate higher dirty rate during migration, which means less impact on guest. The side effect of higher percentage is longer migration time. We can configure this parameter to switch between migration time firt or guest performance first. The default value is 50. Signed-off-by: Keqian Zhu --- Cc: Juan Quintela Cc: "Dr. David Alan Gilbert" Cc: Eric Blake Cc: Markus Armbruster --- +++ b/qapi/migration.json @@ -524,6 +524,10 @@ # compression, so set the decompress-threads to the number about 1/4 # of compress-threads is adequate. # +# @throttle-trig-thres: The ratio of bytes_dirty_period and bytes_xfer_period to +# trigger throttling. It is expressed as percentage. The +# default value is 50. (Since 5.0) +# Abbreviating feels odd; can you please spell this out as throttle-trigger-threshold? Can the threshold exceed 100%? -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [PATCH v7 02/11] error: auto propagated local_err
On 2/21/20 3:42 AM, Vladimir Sementsov-Ogievskiy wrote: +#define ERRP_AUTO_PROPAGATE() \ + g_auto(ErrorPropagator) _auto_errp_prop = {.errp = errp}; \ + errp = ((errp == NULL || *errp == error_fatal) \ + ? &_auto_errp_prop.local_err : errp) + /* * Special error destination to abort on error. * See error_setg() and error_propagate() for details. *errp == error_fatal tests *errp == NULL, which is not what you want. You need to test errp == _fatal, just like error_handle_fatal(). Oops, great bug) And nobody noticed before) Of course, you are right. Sorry I missed it in my earlier looks. Superfluous parenthesis around the first operand of ?:. Wouldn't #define ERRP_AUTO_PROPAGATE() \ g_auto(ErrorPropagator) _auto_errp_prop = {.errp = errp}; \ if (!errp || errp == _fatal) { \ errp = &_auto_errp_prop.local_err; \ } be clearer? Hmm, notation with "if" will allow omitting ';' after macro invocation, which seems not good.. Then wrap it: g_auto(ErrorPropagator) _auto_errp_prop = {.errp = errp}; \ do { \ if (!errp || errp == _fata) { errp = &_auto_errp_prop.local_err; \ } \ while (0) And if I'm not wrong we've already discussed it somewhere in previous versions. The original use of ?: stems from my suggestion on an earlier revision when we were still trying to pack everything into two consecutive declaration lines, rather than a declaration and a statement (as ?: is necessary for conditionals in declarations). But since then, we decided to go with a statement (we require a C99 compiler, so declaration after statement is supported by our compiler, even if our coding style currently avoids it where possible), so as long as we support statements, we might as well go with a legible statement instead of insisting on the compact ?: form. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: Emulating Solaris 10 on SPARC64 sun4u
* BALATON Zoltan (bala...@eik.bme.hu) wrote: > On Wed, 19 Feb 2020, BALATON Zoltan wrote: > > faster or doing something differently? Does someone know what interrupts > > are generated on real hardware in DMA mode so we can compare that to > > what we see with QEMU? > > The document Programming Interface for Bus Master IDE Controller, Revision > 1.0 (5/16/94) has some info on this. AFAIU it says that after DMA operation > is completed an IRQ should be raised. On page 5, section 3.1. Data > Synchronization it says: > > "Another way to view this requirement is that the first read to the > controller Status register in response to the IDE device interrupt must > return with the Interrupt bit set and with the guarantee that all buffered > data has been written to memory." > > Not sure if this is relevant but how is it handled in QEMU? Is the right > interrupt bit set after DMA transfer is done? If so is it the one that's > checked by the OS driver? One thing to be a little careful of is I remember the 646 was always known for having a few quirks (I've not got a SPARC64 ith one, but I think my Alpha has it). So whether you're chasing a generic IDE BM problem or a 646 special is fun. Dave > Regards, > BALATON Zoltan > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC
On Fri, 21 Feb 2020, Peter Maydell wrote: On Fri, 21 Feb 2020 at 18:04, BALATON Zoltan wrote: On Fri, 21 Feb 2020, Peter Maydell wrote: I think that is the wrong approach. Enabling use of the host FPU should not affect the accuracy of the emulation, which should remain bitwise-correct. We should only be using the host FPU to the extent that we can do that without discarding accuracy. As far as I'm aware that's how the hardfloat support for other guest CPUs that use it works. I don't know of a better approach. Please see section 4.2.2 Floating-Point Status and Control Register on page 124 in this document: https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0 especially the definition of the FR and FI bits and tell me how can we emulate these accurately and use host FPU. I don't know much about PPC, but if you can't emulate the guest architecture accurately with the host FPU, then don't use the host FPU. We used to have a kind of 'hardfloat' I don't know if it's possible or not to emulate these accurately and use the FPU but nobody did it for QEMU so far. But if someone knows a way please speak up then we can try to implement it. Unfortunately this would require more detailed knowledge about different FPU implementations (at least X86_64, ARM and PPC that are the mostly used platforms) than what I have or willing to spend time to learn. support that was fast but inaccurate, but it was a mess because it meant that most guest code sort of worked but some guest code would confusingly misbehave. Deliberately not correctly emulating the guest CPU/FPU behaviour is not something I want us to return to. You're right that sometimes you can't get both speed and accuracy; other emulators (and especially ones which are trying to emulate games consoles) may choose to prefer speed over accuracy. For QEMU we prefer to choose accuracy over speed in this area. OK, then how about keeping the default accurate but allow to opt in to use FPU even if it's known to break some bits for workloads where users would need speed over accuracy and would be happy to live with the limitation. Note that i've found that just removing the define that disables hardfloat for PPC target makes VMX vector instructions faster while normal FPU is a little slower without any other changes so disabling hardfloat already limits performance for guests using VMX even when not using the FPU for cases when it would cause inaccuracy. If you say we want accuracy and don't care about speed, then just don't disable hardfloat as it helps at least VMX and then we can add option to allow the user to say we can use hardfloat even if it's inaccurate then they can test their workload and decide for themselves. Regards, BALATON Zoltan
Re: [PATCH 2/2] riscv: sifive_u: Update BIOS_FILENAME for 32-bit
On Thu, Feb 20, 2020 at 6:43 AM Bin Meng wrote: > > Update BIOS_FILENAME to consider 32-bit bios image file name. > > Tested booting Linux v5.5 32-bit image (built from rv32_defconfig > plus CONFIG_SOC_SIFIVE) with the default 32-bit bios image. Do we really want to support a 32-bit sifive_u machine? Alistair > > Signed-off-by: Bin Meng > --- > > hw/riscv/sifive_u.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c > index ca561d3..371133e 100644 > --- a/hw/riscv/sifive_u.c > +++ b/hw/riscv/sifive_u.c > @@ -57,7 +57,11 @@ > > #include > > -#define BIOS_FILENAME "opensbi-riscv64-sifive_u-fw_jump.bin" > +#if defined(TARGET_RISCV32) > +# define BIOS_FILENAME "opensbi-riscv32-sifive_u-fw_jump.bin" > +#else > +# define BIOS_FILENAME "opensbi-riscv64-sifive_u-fw_jump.bin" > +#endif > > static const struct MemmapEntry { > hwaddr base; > -- > 2.7.4 > >
Re: [PATCH 2/3] hw/riscv/spike: Allow loading firmware separately using -bios option
On Thu, Feb 13, 2020 at 11:22 PM Anup Patel wrote: > > This patch extends Spike machine support to allow loading OpenSBI > firmware (fw_jump.elf) separately using -bios option. > > Signed-off-by: Anup Patel Can you add something in the commit message adding that you also support adding an initrd with this patch? Otherwise: Reviewed-by: Alistair Francis Alistair > --- > hw/riscv/spike.c | 24 +++- > 1 file changed, 23 insertions(+), 1 deletion(-) > > diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c > index 8823681783..060a86f922 100644 > --- a/hw/riscv/spike.c > +++ b/hw/riscv/spike.c > @@ -45,6 +45,12 @@ > > #include > > +#if defined(TARGET_RISCV32) > +# define BIOS_FILENAME "opensbi-riscv32-spike-fw_jump.elf" > +#else > +# define BIOS_FILENAME "opensbi-riscv64-spike-fw_jump.elf" > +#endif > + > static const struct MemmapEntry { > hwaddr base; > hwaddr size; > @@ -183,8 +189,24 @@ static void spike_board_init(MachineState *machine) > memory_region_add_subregion(system_memory, memmap[SPIKE_MROM].base, > mask_rom); > > +riscv_find_and_load_firmware(machine, BIOS_FILENAME, > + memmap[SPIKE_DRAM].base, > + htif_symbol_callback); > + > if (machine->kernel_filename) { > -riscv_load_kernel(machine->kernel_filename, htif_symbol_callback); > +uint64_t kernel_entry = riscv_load_kernel(machine->kernel_filename, > + htif_symbol_callback); > + > +if (machine->initrd_filename) { > +hwaddr start; > +hwaddr end = riscv_load_initrd(machine->initrd_filename, > + machine->ram_size, kernel_entry, > + ); > +qemu_fdt_setprop_cell(s->fdt, "/chosen", > + "linux,initrd-start", start); > +qemu_fdt_setprop_cell(s->fdt, "/chosen", "linux,initrd-end", > + end); > +} > } > > /* reset vector */ > -- > 2.17.1 > >
Re: [PATCH 1/3] hw/riscv: Add optional symbol callback ptr to riscv_load_firmware()
On Thu, Feb 13, 2020 at 11:24 PM Anup Patel wrote: > > This patch adds an optional function pointer, "sym_cb", to > riscv_load_firmware() which provides the possibility to access > the symbol table during kernel loading. > > The pointer is ignored, if supplied with flat (non-elf) firmware image. > > The Spike board requires it locate the HTIF symbols from firmware ELF > passed via "-bios" option. > > Signed-off-by: Anup Patel Reviewed-by: Alistair Francis Alistair > --- > hw/riscv/boot.c | 13 - > hw/riscv/sifive_u.c | 2 +- > hw/riscv/virt.c | 2 +- > include/hw/riscv/boot.h | 6 -- > 4 files changed, 14 insertions(+), 9 deletions(-) > > diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c > index 027303d2a3..7ec94dc701 100644 > --- a/hw/riscv/boot.c > +++ b/hw/riscv/boot.c > @@ -36,7 +36,8 @@ > > void riscv_find_and_load_firmware(MachineState *machine, >const char *default_machine_firmware, > - hwaddr firmware_load_addr) > + hwaddr firmware_load_addr, > + symbol_fn_t sym_cb) > { > char *firmware_filename = NULL; > > @@ -76,7 +77,7 @@ void riscv_find_and_load_firmware(MachineState *machine, > > if (firmware_filename) { > /* If not "none" load the firmware */ > -riscv_load_firmware(firmware_filename, firmware_load_addr); > +riscv_load_firmware(firmware_filename, firmware_load_addr, sym_cb); > g_free(firmware_filename); > } > } > @@ -96,12 +97,14 @@ char *riscv_find_firmware(const char *firmware_filename) > } > > target_ulong riscv_load_firmware(const char *firmware_filename, > - hwaddr firmware_load_addr) > + hwaddr firmware_load_addr, > + symbol_fn_t sym_cb) > { > uint64_t firmware_entry, firmware_start, firmware_end; > > -if (load_elf(firmware_filename, NULL, NULL, NULL, _entry, > - _start, _end, 0, EM_RISCV, 1, 0) > 0) { > +if (load_elf_ram_sym(firmware_filename, NULL, NULL, NULL, > + _entry, _start, _end, 0, > + EM_RISCV, 1, 0, NULL, true, sym_cb) > 0) { > return firmware_entry; > } > > diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c > index 0140e95732..0c84215f42 100644 > --- a/hw/riscv/sifive_u.c > +++ b/hw/riscv/sifive_u.c > @@ -341,7 +341,7 @@ static void riscv_sifive_u_init(MachineState *machine) > create_fdt(s, memmap, machine->ram_size, machine->kernel_cmdline); > > riscv_find_and_load_firmware(machine, BIOS_FILENAME, > - memmap[SIFIVE_U_DRAM].base); > + memmap[SIFIVE_U_DRAM].base, NULL); > > if (machine->kernel_filename) { > uint64_t kernel_entry = riscv_load_kernel(machine->kernel_filename, > diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c > index c44b865959..90a5bfef63 100644 > --- a/hw/riscv/virt.c > +++ b/hw/riscv/virt.c > @@ -476,7 +476,7 @@ static void riscv_virt_board_init(MachineState *machine) > mask_rom); > > riscv_find_and_load_firmware(machine, BIOS_FILENAME, > - memmap[VIRT_DRAM].base); > + memmap[VIRT_DRAM].base, NULL); > > if (machine->kernel_filename) { > uint64_t kernel_entry = riscv_load_kernel(machine->kernel_filename, > diff --git a/include/hw/riscv/boot.h b/include/hw/riscv/boot.h > index df80051fbc..474a940ad5 100644 > --- a/include/hw/riscv/boot.h > +++ b/include/hw/riscv/boot.h > @@ -24,10 +24,12 @@ > > void riscv_find_and_load_firmware(MachineState *machine, >const char *default_machine_firmware, > - hwaddr firmware_load_addr); > + hwaddr firmware_load_addr, > + symbol_fn_t sym_cb); > char *riscv_find_firmware(const char *firmware_filename); > target_ulong riscv_load_firmware(const char *firmware_filename, > - hwaddr firmware_load_addr); > + hwaddr firmware_load_addr, > + symbol_fn_t sym_cb); > target_ulong riscv_load_kernel(const char *kernel_filename, > symbol_fn_t sym_cb); > hwaddr riscv_load_initrd(const char *filename, uint64_t mem_size, > -- > 2.17.1 > >
Re: [PATCH v1] block/nvme: introduce PMR support from NVMe 1.4 spec
Hi Andrzej, After having looked at the PMRWBM part of the spec, I think that the Bit 1 mode should be implemented for slightly better performance. Bit 0 mode is not well-suited to virtualization for the reasons I mentioned in the previous email. The spec describes Bit 1 mode as "The completion of a read to the PMRSTS register shall ensure that all prior writes to the Persistent Memory Region have completed and are persistent". Stefan
Re: [PATCH v1] block/nvme: introduce PMR support from NVMe 1.4 spec
* Stefan Hajnoczi (stefa...@gmail.com) wrote: > On Tue, Feb 18, 2020 at 03:48:11PM -0700, Andrzej Jakowski wrote: > > This patch introduces support for PMR that has been defined as part of NVMe > > 1.4 > > spec. User can now specify a pmr_file which will be mmap'ed into qemu > > address > > space and subsequently in PCI BAR 2. Guest OS can perform mmio read and > > writes > > to the PMR region that will stay persistent accross system reboot. > > > > Signed-off-by: Andrzej Jakowski > > --- > > hw/block/nvme.c | 145 ++- > > hw/block/nvme.h | 5 ++ > > hw/block/trace-events | 5 ++ > > include/block/nvme.h | 172 ++ > > 4 files changed, 326 insertions(+), 1 deletion(-) > > NVDIMM folks, please take a look. There seems to be commonality here. > > Can this use -object memory-backend-file instead of manually opening and > mapping a file? > > Also CCing David Gilbert because there is some similarity with the > vhost-user-fs's DAX Window feature where QEMU mmaps regions of files > into a BAR. I guess the biggest difference here is that the read can have the side effect; in my world I don't have to set read/write/endian ops - I just map a chunk of memory and use memory_region_add_subregion, so all the read/writes are native read/writes - I assume that would be a lot faster - I guess it depends if NVME_PMRCAP_PMRWBM is something constant you know early on; if you know that you don't need to do side effects in the read you could do the same trick and avoid the IO ops altogether. Isn't there also a requirement that BARs are powers of two? Wouldn't you need to ensure the PMR file is a power of 2 in size? Dave > > @@ -1303,6 +1327,38 @@ static const MemoryRegionOps nvme_cmb_ops = { > > }, > > }; > > > > +static void nvme_pmr_write(void *opaque, hwaddr addr, uint64_t data, > > +unsigned size) > > +{ > > +NvmeCtrl *n = (NvmeCtrl *)opaque; > > +stn_le_p(>pmrbuf[addr], size, data); > > +} > > + > > +static uint64_t nvme_pmr_read(void *opaque, hwaddr addr, unsigned size) > > +{ > > +NvmeCtrl *n = (NvmeCtrl *)opaque; > > +if (!NVME_PMRCAP_PMRWBM(n->bar.pmrcap)) { > > +int ret; > > +ret = msync(n->pmrbuf, n->f_pmr_size, MS_SYNC); > > +if (!ret) { > > +NVME_GUEST_ERR(nvme_ub_mmiowr_pmrread_barrier, > > + "error while persisting data"); > > +} > > +} > > Why is msync(2) done on memory loads instead of stores? -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [PULL 0/6] virtiofs queue
On Fri, 21 Feb 2020 at 13:52, Dr. David Alan Gilbert (git) wrote: > > From: "Dr. David Alan Gilbert" > > The following changes since commit b651b80822fa8cb66ca30087ac7fbc75507ae5d2: > > Merge remote-tracking branch > 'remotes/vivier2/tags/linux-user-for-5.0-pull-request' into staging > (2020-02-20 17:35:42 +) > > are available in the Git repository at: > > https://gitlab.com/dagrh/qemu.git tags/pull-virtiofs-20200221 > > for you to fetch changes up to 5bb8e8beedb47fc0d0a44957a154918c4f4afc80: > > docs: Fix virtiofsd.1 location (2020-02-21 13:05:27 +) > > ---- > virtiofs pull 20200221 > > Mostly minor cleanups. > Miroslav's fixes a make install corner case. > Philippe's set includes an error corner case fix. > Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0 for any user-visible changes. -- PMM
Re: [PATCH v1 3/4] Acceptance test: provides new functions
Hi Oksana, On 2/14/20 12:52 PM, Oksana Vohchana wrote: Adds functions to check if service RDMA is enabled and gets the interface where it was configured Signed-off-by: Oksana Vohchana --- tests/acceptance/migration.py | 17 + 1 file changed, 17 insertions(+) diff --git a/tests/acceptance/migration.py b/tests/acceptance/migration.py index 8209dcf71d..bbd88f8dda 100644 --- a/tests/acceptance/migration.py +++ b/tests/acceptance/migration.py @@ -11,12 +11,16 @@ import tempfile +import re +import netifaces Since netifaces isn't a standard Python library that import might fail. The tests dependencies are listed in tests/requirements.txt, and installed in the environment created by `make check-acceptance`. If you want to ensure the test behaves well even when executed manually (i.e. not via `make check-acceptance`), you can add runtime checks as can be seen in tests/acceptance/machine_m68k_nextcube.py from avocado_qemu import Test from avocado import skipUnless from avocado.utils import network from avocado.utils import wait from avocado.utils.path import find_command +from avocado.utils import service +from avocado.utils import process class Migration(Test): @@ -58,6 +62,19 @@ class Migration(Test): self.cancel('Failed to find a free port') return port +def _if_rdma_enable(self): +rdma_stat = service.ServiceManager() +rdma = rdma_stat.status('rdma') +return rdma Above function is used on patch04, but actually I don't think it needs to check this service for RoCE. It would be needed if it was using the rxe_cfg to configure the rdma link. Or am I missing something? + +def _get_ip_rdma(self): +get_ip_rdma = process.run('rdma link show').stdout.decode() +for line in get_ip_rdma.split('\n'): +if re.search(r"ACTIVE", line): +interface = line.split(" ")[-2] +ip = netifaces.ifaddresses(interface)[netifaces.AF_INET][0]['addr'] +return ip + I suggest that it explicitly returns None if none is found. Thanks! - Wainer def test_migration_with_tcp_localhost(self): dest_uri = 'tcp:localhost:%u' % self._get_free_port()
Re: [PATCH v2 12/13] hw/arm/raspi: Add the Raspberry Pi B+ machine
On Tue, Feb 18, 2020 at 10:35:43AM +0100, Philippe Mathieu-Daudé wrote: > Cc'ing Eduardo/Igor. > > On 2/18/20 9:48 AM, Luc Michel wrote: > > On 2/17/20 12:45 PM, Philippe Mathieu-Daudé wrote: > > >$ qemu-system-arm -M raspi1b -serial stdio \ > > >-kernel raspberrypi/firmware/boot/kernel.img \ > > >-dtb raspberrypi/firmware/boot/bcm2708-rpi-b.dtb \ > > >-append 'printk.time=0 earlycon=pl011,0x20201000 console=ttyAMA0' > > >[0.00] Booting Linux on physical CPU 0x0 > > >[0.00] Linux version 4.19.69+ (dom@buildbot) (gcc version > > > 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #1261 Tue Sep 3 > > > 20:21:01 BST 2019 > > >[0.00] CPU: ARMv6-compatible processor [410fb767] revision 7 > > > (ARMv7), cr=00c5387d > > >[0.00] CPU: VIPT aliasing data cache, unknown instruction cache > > >[0.00] OF: fdt: Machine model: Raspberry Pi Model B > > >[0.00] earlycon: pl11 at MMIO 0x20201000 (options '') > > >[0.00] bootconsole [pl11] enabled > > >[0.00] Memory policy: Data cache writeback > > >[0.00] cma: Reserved 8 MiB at 0x1b80 > > >[0.00] random: get_random_bytes called from > > > start_kernel+0x8c/0x49c with crng_init=0 > > >[0.00] Built 1 zonelists, mobility grouping on. Total pages: > > > 113680 > > >[0.00] Kernel command line: printk.time=0 > > > earlycon=pl011,0x20201000 console=ttyAMA0 > > >Dentry cache hash table entries: 65536 (order: 6, 262144 bytes) > > >Inode-cache hash table entries: 32768 (order: 5, 131072 bytes) > > >Memory: 434380K/458752K available (6971K kernel code, 635K rwdata, > > > 2080K rodata, 464K init, 797K bss, 16180K reserved, 8192K cma-reserved) > > >... > > > > > > Signed-off-by: Philippe Mathieu-Daudé > > > --- > > > hw/arm/raspi.c | 13 + > > > 1 file changed, 13 insertions(+) > > > > > > diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c > > > index 3537a329ac..2d9f4e3085 100644 > > > --- a/hw/arm/raspi.c > > > +++ b/hw/arm/raspi.c > > > @@ -324,6 +324,15 @@ static void > > > raspi_machine_class_common_init(MachineClass *mc, > > > mc->default_ram_size = board_ram_size(board_rev); > > > }; > > > +static void raspi1b_machine_class_init(ObjectClass *oc, void *data) > > > +{ > > > +MachineClass *mc = MACHINE_CLASS(oc); > > > +RaspiMachineClass *rmc = RASPI_MACHINE_CLASS(oc); > > > + > > > +rmc->board_rev = 0x900032; > > > +raspi_machine_class_common_init(mc, rmc->board_rev); > > > +}; > > > + > > > static void raspi2b_machine_class_init(ObjectClass *oc, void *data) > > > { > > > MachineClass *mc = MACHINE_CLASS(oc); > > > @@ -348,6 +357,10 @@ static void raspi3b_machine_class_init(ObjectClass > > > *oc, void *data) > > > static const TypeInfo raspi_machine_types[] = { > > > { > > > +.name = MACHINE_TYPE_NAME("raspi1b"), > > If it's the B+ model, why not call it raspi1b+ ? > > I thought about it (and prefer it), but I'm not sure this can have some > side-effect. > > Eduardo, Igor, is that OK to use a '+' in a machine name? It would probably work, but I would prefer to avoid that. > > So far the names used match [a-zA-Z0-9-]. > > > > > > +.parent = TYPE_RASPI_MACHINE, > > > +.class_init = raspi1b_machine_class_init, > > > +}, { > > > .name = MACHINE_TYPE_NAME("raspi2b"), > > > .parent = TYPE_RASPI_MACHINE, > > > .class_init = raspi2b_machine_class_init, > > > > > > -- Eduardo
Re: [PATCH v1] block/nvme: introduce PMR support from NVMe 1.4 spec
On Fri, Feb 21, 2020, 17:50 Dr. David Alan Gilbert wrote: > * Stefan Hajnoczi (stefa...@gmail.com) wrote: > > On Fri, Feb 21, 2020 at 3:36 PM Andrzej Jakowski > > wrote: > > > On 2/21/20 6:45 AM, Stefan Hajnoczi wrote: > > > > Why is msync(2) done on memory loads instead of stores? > > > > > > This is my interpretation of NVMe spec wording with regards to PMRWBM > field > > > which says: > > > > > > "The completion of a memory read from any Persistent > > > Memory Region address ensures that all prior writes to the > > > Persistent Memory Region have completed and are > > > persistent." > > > > Thanks, I haven't read the PMR section of the spec :). > > > > A synchronous operation is bad for virtualization performance. While > > the sync may be a cheap operation in hardware, it can be arbitrarily > > expensive with msync(2). The vCPU will be stuck until msync(2) > > completes on the host. > > > > It's also a strange design choice since performance will suffer when > > an unrelated read has to wait for writes to complete. This is > > especially problematic for multi-threaded applications or multi-core > > systems where I guess this case is hit frequently. Maybe it's so > > cheap in hardware that it doesn't matter? But then why didn't NVDIMM > > use this mechanism? > > > > If anyone knows the answer I'd be interested in learning. But this > > isn't a criticism of the patch - of course it needs to implement the > > hardware spec and we can't change it. > > Is this coming from the underlying PCIe spec ? > In PCIe Base 4.0 Rev 0.3 Feb19-2014, section 2.4.1 Transaction ordering, > there's a Table 2-39 and entry B2a in that table is: > > > 'A Read Request must not pass a Posted Request unless B2b applies.' > > and a posted request is defined as a 'Memory Write Request or a Message > Request'. > > ??? > No, that relates to transaction ordering in PCI, not data persistence in PMR. PMR can define whatever persistence semantics it wants. The completion of the write transaction at the PCI level does not mean data has to be persistent at the PMR level. Stefan >
Re: [QUESTION] Clang as QEMU compiler - two questions
On Fri, 21 Feb 2020 at 18:09, Aleksandar Markovic wrote: > 1) Is ./configure --cc=clang the correct way to enable clang > compilation of QEMU? Any other gotchas? You'll also want "--cxx=clang++", so the (few) C++ parts of QEMU get compiled with the same toolchain. It should just work -- clang is our default compiler for OSX and is included in the CI and pre-merge tests on Linux. thanks -- PMM
Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC
On Fri, 21 Feb 2020 at 18:04, BALATON Zoltan wrote: > On Fri, 21 Feb 2020, Peter Maydell wrote: > > I think that is the wrong approach. Enabling use of the host > > FPU should not affect the accuracy of the emulation, which > > should remain bitwise-correct. We should only be using the > > host FPU to the extent that we can do that without discarding > > accuracy. As far as I'm aware that's how the hardfloat support > > for other guest CPUs that use it works. > > I don't know of a better approach. Please see section 4.2.2 Floating-Point > Status and Control Register on page 124 in this document: > > https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0 > > especially the definition of the FR and FI bits and tell me how can we > emulate these accurately and use host FPU. I don't know much about PPC, but if you can't emulate the guest architecture accurately with the host FPU, then don't use the host FPU. We used to have a kind of 'hardfloat' support that was fast but inaccurate, but it was a mess because it meant that most guest code sort of worked but some guest code would confusingly misbehave. Deliberately not correctly emulating the guest CPU/FPU behaviour is not something I want us to return to. You're right that sometimes you can't get both speed and accuracy; other emulators (and especially ones which are trying to emulate games consoles) may choose to prefer speed over accuracy. For QEMU we prefer to choose accuracy over speed in this area. thanks -- PMM
Re: [GSoC/Outreachy] Arduino complete setup visualization and emulation
On 2/21/20 12:14 PM, Philippe Mathieu-Daudé wrote: > On 2/21/20 11:56 AM, Stefan Hajnoczi wrote: >> On Tue, Feb 11, 2020 at 10:51:19AM +, Stefan Hajnoczi wrote: >>> On Mon, Feb 10, 2020 at 08:58:28PM +0100, Philippe Mathieu-Daudé wrote: ... > [*] Test with the preset arduino examples (### TODO add references ###) > > - Basic: "Blink: Turn a LED on and off." > - Analog: "Fading: Use an analog output (PWM pin) to dim a LED." > - Analog: "Analog Input: Use a potentiometer to control the flashing > of a LED." > [*] Test with the preset Arduino examples [0] - Basic: "Blink: Turn a LED on and off." [1] - Analog: "Fading: Use an analog output (PWM pin) to dim a LED." [2] - Analog: "Analog Input: Use a potentiometer to control the blinking of a LED." [3] [0] https://www.arduino.cc/en/Tutorial/BuiltInExamples [1] https://www.arduino.cc/en/Tutorial/Blink [2] https://www.arduino.cc/en/Tutorial/Fading [3] https://www.arduino.cc/en/Tutorial/AnalogInput --Joa
Re: [PATCH] hw/char/pl011: Output characters using best-effort mode
On 21/02/20 14:14, Peter Maydell wrote: > The initial case reported by Gavin in this thread is > "-serial tcp:127.0.0.1:50900" with the other end being a program which > listens on TCP port 50900 and then sleeps without accepting any incoming > connections, which blocks the serial port output and effectively blocks > the guest bootup. If you want to insulate the guest from badly > behaved consumers like that (or the related consumer who accepts > the connection and then just doesn't read data from it) you probably > need to deal with more than just POLLHUP. But I'm not sure how much > we should care about these cases as opposed to just telling users > not to do that... No, I think we don't do anything (on purpose; that is, it was considered the lesser evil) for x86 in that case. Paolo
[QUESTION] Clang as QEMU compiler - two questions
Hello, all I have two questions: 1) Is ./configure --cc=clang the correct way to enable clang compilation of QEMU? Any other gotchas? 2) Did somebody compare QEMU compiled by gcc vs. QEMU compiled with clang (in any sense: executable size, performance, etc.)? Thanks in advance, Aleksandar
Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC
On Fri, 21 Feb 2020, Peter Maydell wrote: On Fri, 21 Feb 2020 at 16:05, BALATON Zoltan wrote: On Thu, 20 Feb 2020, Richard Henderson wrote: On 2/18/20 9:10 AM, BALATON Zoltan wrote: +DEFINE_PROP_BOOL("hardfloat", PowerPCCPU, hardfloat, true), I would also prefer a different name here -- perhaps x-no-fp-fi. What's wrong with hardfloat? That's how the code refers to this so if anyone searches what it does would turn up some meaningful results. This prompted me to check what you're using the property for. The cover letter says: This patch implements a simple way to keep the inexact flag set for hardfloat while still allowing to revert to softfloat for workloads that need more accurate albeit slower emulation. (Set hardfloat property of CPU, i.e. -cpu name,hardfloat=false for that.) I think that is the wrong approach. Enabling use of the host FPU should not affect the accuracy of the emulation, which should remain bitwise-correct. We should only be using the host FPU to the extent that we can do that without discarding accuracy. As far as I'm aware that's how the hardfloat support for other guest CPUs that use it works. I don't know of a better approach. Please see section 4.2.2 Floating-Point Status and Control Register on page 124 in this document: https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0 especially the definition of the FR and FI bits and tell me how can we emulate these accurately and use host FPU. Not using the FPU even when these bits are not needed (which seems to be the case for all workloads we've tested so far) seriously limits the emulation speed so spending time to emulate obscure and unused part of an architecture when not actually needed just to keep emulation accurate but unusably slow does not seem to be the right approach. In an ideal world of course this should be both fast and accurate but we don't seem to have anyone who could achieve that in past two years so maybe we could give up some accuracy now to get usable speed and worry about emulating obscure features when we come across some workload that actually needs it (but we have the option to revert to accurate but slow emulation for that until a better way can be devised that's both fast and accurate). Insisting on accuracy without any solution to current state just hinders making any progress with this. Other PowerPC emulators also seem to not bother or have similar optimisation. I've quickly checked three that I know about: https://github.com/mamedev/mame/blob/master/src/devices/cpu/powerpc/ppcdrc.cpp#L1893 https://github.com/mamedev/mame/blob/master/src/devices/cpu/powerpc/ppcdrc.cpp#L3503 there's also something here but no mention of FI bit I could notice: https://github.com/mamedev/mame/blob/master/src/devices/cpu/powerpc/ppccom.cpp#L2023 https://github.com/xenia-project/xenia/blob/master/src/xenia/cpu/ppc/ppc_hir_builder.cc#L428 https://github.com/dolphin-emu/dolphin/blob/master/Source/Core/Core/PowerPC/Jit64/Jit_FloatingPoint.cpp But I'm not sure I understand all of the above so hope this makes more sense to someone and can advise. Regards, BALATON Zoltan
Re: [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating
On Fri, Feb 21, 2020 at 05:41:51PM +0100, David Hildenbrand wrote: > I was now able to actually test resizing while migrating. I am using the > prototype of virtio-mem to test (which also makes use of resizable > allocations). Things I was able to reproduce: The test cases cover quite a lot. Thanks for doing that. > - Resize while still running on the migration source. Migration is canceled > -- Test case for "migraton/ram: Handle RAM block resizes during precopy" > - Resize (grow+shrink) on the migration target during postcopy migration > (when syncing RAM blocks), while not yet running on the target > -- Test case for "migration/ram: Discard new RAM when growing RAM blocks >and the VM is stopped", and overall RAM size synchronization. Seems to >work just fine. This won't be able to trigger without virtio-mem, right? And I'm also curious on how to test this even with virtio-mem. Is that a QMP command to extend/shrink virtio-mem? > - Resize (grow+shrink) on the migration tagret during postcopy migration > while already running on the target. > -- Test case for "migration/ram: Handle RAM block resizes during postcopy" > -- Test case for "migration/ram: Tolerate partially changed mappings in >postcopy code" - I can see that -ENOENT is actually triggered and that >migration succeeds. Migration seems to work just fine. Thanks, -- Peter Xu
Re: [PATCH v4 04/16] hw/i386: Introduce init_topo_info to initialize X86CPUTopoInfo
On 2/21/20 11:05 AM, Igor Mammedov wrote: > On Thu, 13 Feb 2020 12:16:51 -0600 > Babu Moger wrote: > >> Initialize all the parameters in one function init_topo_info. > > is it possible to squash it in 2/16 > Sure. We can do that. > >> >> Move the data structure X86CPUTopoIDs and X86CPUTopoInfo into >> x86.h. > A reason why it's moved should be here. Apicid functions will be part of X86MachineState data structure(patches introduced later).These functions will use X86CPUTopoIDs and X86CPUTopoInfo definition. Will add these details. Thanks > >> >> Signed-off-by: Babu Moger >> Reviewed-by: Eduardo Habkost >> --- >> hw/i386/pc.c |4 +--- >> hw/i386/x86.c | 14 +++--- >> include/hw/i386/topology.h | 26 ++ >> include/hw/i386/x86.h | 17 + >> 4 files changed, 31 insertions(+), 30 deletions(-) >> >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c >> index 2adf7f6afa..9803413dd9 100644 >> --- a/hw/i386/pc.c >> +++ b/hw/i386/pc.c >> @@ -1749,9 +1749,7 @@ static void pc_cpu_pre_plug(HotplugHandler >> *hotplug_dev, >> return; >> } >> >> -topo_info.dies_per_pkg = x86ms->smp_dies; >> -topo_info.cores_per_die = smp_cores; >> -topo_info.threads_per_core = smp_threads; >> +init_topo_info(_info, x86ms); >> >> env->nr_dies = x86ms->smp_dies; >> >> diff --git a/hw/i386/x86.c b/hw/i386/x86.c >> index f18cab8e5c..083effb2f5 100644 >> --- a/hw/i386/x86.c >> +++ b/hw/i386/x86.c >> @@ -63,15 +63,12 @@ static size_t pvh_start_addr; >> uint32_t x86_cpu_apic_id_from_index(X86MachineState *x86ms, >> unsigned int cpu_index) >> { >> -MachineState *ms = MACHINE(x86ms); >> X86MachineClass *x86mc = X86_MACHINE_GET_CLASS(x86ms); >> X86CPUTopoInfo topo_info; >> uint32_t correct_id; >> static bool warned; >> >> -topo_info.dies_per_pkg = x86ms->smp_dies; >> -topo_info.cores_per_die = ms->smp.cores; >> -topo_info.threads_per_core = ms->smp.threads; >> +init_topo_info(_info, x86ms); >> >> correct_id = x86_apicid_from_cpu_idx(_info, cpu_index); >> if (x86mc->compat_apic_id_mode) { >> @@ -146,10 +143,7 @@ int64_t x86_get_default_cpu_node_id(const MachineState >> *ms, int idx) >> X86MachineState *x86ms = X86_MACHINE(ms); >> X86CPUTopoInfo topo_info; >> >> - topo_info.dies_per_pkg = x86ms->smp_dies; >> - topo_info.cores_per_die = ms->smp.cores; >> - topo_info.threads_per_core = ms->smp.threads; >> - >> + init_topo_info(_info, x86ms); >> >> assert(idx < ms->possible_cpus->len); >> x86_topo_ids_from_apicid(ms->possible_cpus->cpus[idx].arch_id, >> @@ -177,9 +171,7 @@ const CPUArchIdList >> *x86_possible_cpu_arch_ids(MachineState *ms) >>sizeof(CPUArchId) * max_cpus); >> ms->possible_cpus->len = max_cpus; >> >> -topo_info.dies_per_pkg = x86ms->smp_dies; >> -topo_info.cores_per_die = ms->smp.cores; >> -topo_info.threads_per_core = ms->smp.threads; >> +init_topo_info(_info, x86ms); >> >> for (i = 0; i < ms->possible_cpus->len; i++) { >> X86CPUTopoIDs topo_ids; >> diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h >> index ba52d49079..ef0ab0b6a3 100644 >> --- a/include/hw/i386/topology.h >> +++ b/include/hw/i386/topology.h >> @@ -40,23 +40,17 @@ >> >> >> #include "qemu/bitops.h" >> +#include "hw/i386/x86.h" >> >> -/* APIC IDs can be 32-bit, but beware: APIC IDs > 255 require x2APIC support >> - */ >> -typedef uint32_t apic_id_t; >> - >> -typedef struct X86CPUTopoIDs { >> -unsigned pkg_id; >> -unsigned die_id; >> -unsigned core_id; >> -unsigned smt_id; >> -} X86CPUTopoIDs; >> - >> -typedef struct X86CPUTopoInfo { >> -unsigned dies_per_pkg; >> -unsigned cores_per_die; >> -unsigned threads_per_core; >> -} X86CPUTopoInfo; >> +static inline void init_topo_info(X86CPUTopoInfo *topo_info, >> + const X86MachineState *x86ms) >> +{ >> +MachineState *ms = MACHINE(x86ms); >> + >> +topo_info->dies_per_pkg = x86ms->smp_dies; >> +topo_info->cores_per_die = ms->smp.cores; >> +topo_info->threads_per_core = ms->smp.threads; >> +} >> >> /* Return the bit width needed for 'count' IDs >> */ >> diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h >> index 4b84917885..ad62b01cf2 100644 >> --- a/include/hw/i386/x86.h >> +++ b/include/hw/i386/x86.h >> @@ -36,6 +36,23 @@ typedef struct { >> bool compat_apic_id_mode; >> } X86MachineClass; >> >> +/* APIC IDs can be 32-bit, but beware: APIC IDs > 255 require x2APIC support >> + */ >> +typedef uint32_t apic_id_t; >> + >> +typedef struct X86CPUTopoIDs { >> +unsigned pkg_id; >> +unsigned die_id; >> +unsigned core_id; >> +unsigned smt_id; >> +} X86CPUTopoIDs; >> + >> +typedef struct X86CPUTopoInfo { >> +unsigned dies_per_pkg; >> +unsigned cores_per_die; >> +
Re: [PATCH v1] block/nvme: introduce PMR support from NVMe 1.4 spec
* Stefan Hajnoczi (stefa...@gmail.com) wrote: > On Fri, Feb 21, 2020 at 3:36 PM Andrzej Jakowski > wrote: > > On 2/21/20 6:45 AM, Stefan Hajnoczi wrote: > > > Why is msync(2) done on memory loads instead of stores? > > > > This is my interpretation of NVMe spec wording with regards to PMRWBM field > > which says: > > > > "The completion of a memory read from any Persistent > > Memory Region address ensures that all prior writes to the > > Persistent Memory Region have completed and are > > persistent." > > Thanks, I haven't read the PMR section of the spec :). > > A synchronous operation is bad for virtualization performance. While > the sync may be a cheap operation in hardware, it can be arbitrarily > expensive with msync(2). The vCPU will be stuck until msync(2) > completes on the host. > > It's also a strange design choice since performance will suffer when > an unrelated read has to wait for writes to complete. This is > especially problematic for multi-threaded applications or multi-core > systems where I guess this case is hit frequently. Maybe it's so > cheap in hardware that it doesn't matter? But then why didn't NVDIMM > use this mechanism? > > If anyone knows the answer I'd be interested in learning. But this > isn't a criticism of the patch - of course it needs to implement the > hardware spec and we can't change it. Is this coming from the underlying PCIe spec ? In PCIe Base 4.0 Rev 0.3 Feb19-2014, section 2.4.1 Transaction ordering, there's a Table 2-39 and entry B2a in that table is: 'A Read Request must not pass a Posted Request unless B2b applies.' and a posted request is defined as a 'Memory Write Request or a Message Request'. ??? Dave > Stefan > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [PATCH 2/7] hw/arm: Let devices own the MemoryRegion they create
On Fri, 21 Feb 2020 at 17:31, Philippe Mathieu-Daudé wrote: > > To avoid orphean memory regions being added in the /unattached ("orphan") > QOM container, use the memory_region_owner_nonnull.cocci script > to set the correct ownership. > > Signed-off-by: Philippe Mathieu-Daudé > --- > hw/arm/exynos4210.c| 14 +++--- > hw/arm/fsl-imx25.c | 14 +++--- > hw/arm/fsl-imx31.c | 10 +- > hw/arm/fsl-imx6.c | 10 +- > hw/arm/fsl-imx6ul.c| 14 +++--- > hw/arm/msf2-soc.c | 8 > hw/arm/nrf51_soc.c | 2 +- > hw/arm/stm32f205_soc.c | 10 +- > hw/arm/stm32f405_soc.c | 13 +++-- > hw/arm/xlnx-zynqmp.c | 13 ++--- > 10 files changed, 54 insertions(+), 54 deletions(-) > > diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c > index 59a27bdd68..d4b05336ee 100644 > --- a/hw/arm/exynos4210.c > +++ b/hw/arm/exynos4210.c > @@ -305,28 +305,28 @@ static void exynos4210_realize(DeviceState *socdev, > Error **errp) > /*** Memory ***/ > > /* Chip-ID and OMR */ > -memory_region_init_io(>chipid_mem, NULL, > _chipid_and_omr_ops, > -NULL, "exynos4210.chipid", sizeof(chipid_and_omr)); > +memory_region_init_io(>chipid_mem, OBJECT(socdev), > + _chipid_and_omr_ops, NULL, > + "exynos4210.chipid", sizeof(chipid_and_omr)); > memory_region_add_subregion(system_mem, EXYNOS4210_CHIPID_ADDR, > >chipid_mem); > > /* Internal ROM */ > -memory_region_init_ram(>irom_mem, NULL, "exynos4210.irom", > +memory_region_init_ram(>irom_mem, OBJECT(socdev), "exynos4210.irom", > EXYNOS4210_IROM_SIZE, _fatal); I have a feeling that the owner of a RAM memory region affects the name we use for the underlying ramblock, which in turn is used in the on-the-wire data stream in migration, which means that changing ownership breaks migration compatibility. That's probably OK in most cases as we don't care too much about migration-compat on most boards, but I think it does mean that you want to keep those changes separated out from the ones for IO and alias regions, which I think shouldn't have visible effects. thanks -- PMM
Re: [PATCH v7 01/11] qapi/error: add (Error **errp) cleaning APIs
21.02.2020 19:34, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: 21.02.2020 10:38, Markus Armbruster wrote: Vladimir Sementsov-Ogievskiy writes: Add functions to clean Error **errp: call corresponding Error *err cleaning function an set pointer to NULL. New functions: error_free_errp error_report_errp warn_report_errp Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Greg Kurz Reviewed-by: Eric Blake --- CC: Eric Blake CC: Kevin Wolf CC: Max Reitz CC: Greg Kurz CC: Stefano Stabellini CC: Anthony Perard CC: Paul Durrant CC: Stefan Hajnoczi CC: "Philippe Mathieu-Daudé" CC: Laszlo Ersek CC: Gerd Hoffmann CC: Stefan Berger CC: Markus Armbruster CC: Michael Roth CC: qemu-bl...@nongnu.org CC: xen-de...@lists.xenproject.org include/qapi/error.h | 26 ++ 1 file changed, 26 insertions(+) diff --git a/include/qapi/error.h b/include/qapi/error.h index ad5b6e896d..d34987148d 100644 --- a/include/qapi/error.h +++ b/include/qapi/error.h @@ -309,6 +309,32 @@ void warn_reportf_err(Error *err, const char *fmt, ...) void error_reportf_err(Error *err, const char *fmt, ...) GCC_FMT_ATTR(2, 3); +/* + * Functions to clean Error **errp: call corresponding Error *err cleaning + * function, then set pointer to NULL. + */ +static inline void error_free_errp(Error **errp) +{ +assert(errp && *errp); +error_free(*errp); +*errp = NULL; +} + +static inline void error_report_errp(Error **errp) +{ +assert(errp && *errp); +error_report_err(*errp); +*errp = NULL; +} + +static inline void warn_report_errp(Error **errp) +{ +assert(errp && *errp); +warn_report_err(*errp); +*errp = NULL; +} + + /* * Just like error_setg(), except you get to specify the error class. * Note: use of error classes other than ERROR_CLASS_GENERIC_ERROR is These appear to be unused apart from the Coccinelle script in PATCH 03. They are used in the full "[RFC v5 000/126] error: auto propagated local_err" series. Options: 1. Pick a few more patches into this part I series, so these guys come with users. It needs some additional effort for this series.. But it's possible. Still, I think that we at least should not pull out patches which should be in future series (for example from ppc or block/).. Yes, we want to keep related stuff together. Grepping through v5: for x in {warn_report_errp,error_report_errp,error_free_errp}; do echo == $x ==; git grep -l $x | grep -v coccinelle | grep -v 'error\.h'; echo; done == warn_report_errp == block/file-posix.c hw/ppc/spapr.c hw/ppc/spapr_caps.c hw/ppc/spapr_irq.c hw/vfio/pci.c net/tap.c qom/object.c == error_report_errp == hw/block/vhost-user-blk.c util/oslib-posix.c == error_free_errp == block.c block/qapi.c block/sheepdog.c block/snapshot.c blockdev.c chardev/char-socket.c hw/audio/intel-hda.c hw/core/qdev-properties.c hw/pci-bridge/pci_bridge_dev.c hw/pci-bridge/pcie_pci_bridge.c hw/scsi/megasas.c hw/scsi/mptsas.c hw/usb/hcd-xhci.c io/net-listener.c migration/colo.c qga/commands-posix.c qga/commands-win32.c util/qemu-sockets.c What do you want to add? PATCH v5 032 uses both error_report_errp() and error_free_errp(). Adding warn_report_errp() without a user is okay with me. What do you think? If there are patches you consider related to 032, feel free to throw them in. 032 is qga/commands-win32.c and util/oslib-posix.c Seems that they are wrongly grouped into one patch. qga/commands-win32.c matches qga/ (Michael Roth) and util/oslib-posix.c matches POSIX (Paolo Bonzini) So, it should be two separate patches anyway. For [1.] I only afraid that we'll have to wait for maintainers, who were not interested in previous iterations, to review these new patches.. 2. Punt this patch to the first part that has users, along with the part of the Coccinelle script that deals with them. But coccinelle script would be wrong, if we drop this part from it. I think, that after commit which adds coccinelle script, it should work with any file, not only subset of these series. So, it's probably OK for now to drop these functions, forcing their addition if coccinelle script will be applied where these functions are needed. We can, for example comment these three functions. Splitting coccinelle script into two parts, which will be in different series will not help any patch-porting processes. Moreover, this will create dependencies between future series updating other files. So, I don't like [2.].. And it's likely more work than 1. 3. Do nothing: accept the functions without users. OK for me) I habitually dislike 3., but reviewing the rest of this series might make me override that dislike. [...] -- Best regards, Vladimir
Re: [PATCH v1] block/nvme: introduce PMR support from NVMe 1.4 spec
On Fri, Feb 21, 2020 at 3:36 PM Andrzej Jakowski wrote: > On 2/21/20 6:45 AM, Stefan Hajnoczi wrote: > > Why is msync(2) done on memory loads instead of stores? > > This is my interpretation of NVMe spec wording with regards to PMRWBM field > which says: > > "The completion of a memory read from any Persistent > Memory Region address ensures that all prior writes to the > Persistent Memory Region have completed and are > persistent." Thanks, I haven't read the PMR section of the spec :). A synchronous operation is bad for virtualization performance. While the sync may be a cheap operation in hardware, it can be arbitrarily expensive with msync(2). The vCPU will be stuck until msync(2) completes on the host. It's also a strange design choice since performance will suffer when an unrelated read has to wait for writes to complete. This is especially problematic for multi-threaded applications or multi-core systems where I guess this case is hit frequently. Maybe it's so cheap in hardware that it doesn't matter? But then why didn't NVDIMM use this mechanism? If anyone knows the answer I'd be interested in learning. But this isn't a criticism of the patch - of course it needs to implement the hardware spec and we can't change it. Stefan
[PATCH 7/7] hw/riscv: Let devices own the MemoryRegion they create
To avoid orphean memory regions being added in the /unattached QOM container, use the memory_region_owner_nonnull.cocci script to set the correct ownership. Signed-off-by: Philippe Mathieu-Daudé --- hw/riscv/sifive_e.c | 8 hw/riscv/sifive_u.c | 4 ++-- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c index 8a6b0348df..583015c247 100644 --- a/hw/riscv/sifive_e.c +++ b/hw/riscv/sifive_e.c @@ -145,8 +145,8 @@ static void riscv_sifive_e_soc_realize(DeviceState *dev, Error **errp) _abort); /* Mask ROM */ -memory_region_init_rom(>mask_rom, NULL, "riscv.sifive.e.mrom", -memmap[SIFIVE_E_MROM].size, _fatal); +memory_region_init_rom(>mask_rom, OBJECT(dev), "riscv.sifive.e.mrom", + memmap[SIFIVE_E_MROM].size, _fatal); memory_region_add_subregion(sys_mem, memmap[SIFIVE_E_MROM].base, >mask_rom); @@ -208,8 +208,8 @@ static void riscv_sifive_e_soc_realize(DeviceState *dev, Error **errp) memmap[SIFIVE_E_PWM2].base, memmap[SIFIVE_E_PWM2].size); /* Flash memory */ -memory_region_init_ram(>xip_mem, NULL, "riscv.sifive.e.xip", -memmap[SIFIVE_E_XIP].size, _fatal); +memory_region_init_ram(>xip_mem, OBJECT(dev), "riscv.sifive.e.xip", + memmap[SIFIVE_E_XIP].size, _fatal); memory_region_set_readonly(>xip_mem, true); memory_region_add_subregion(sys_mem, memmap[SIFIVE_E_XIP].base, >xip_mem); diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c index 0e12b3ccef..592830a3a2 100644 --- a/hw/riscv/sifive_u.c +++ b/hw/riscv/sifive_u.c @@ -497,7 +497,7 @@ static void riscv_sifive_u_soc_realize(DeviceState *dev, Error **errp) _abort); /* boot rom */ -memory_region_init_rom(mask_rom, NULL, "riscv.sifive.u.mrom", +memory_region_init_rom(mask_rom, OBJECT(dev), "riscv.sifive.u.mrom", memmap[SIFIVE_U_MROM].size, _fatal); memory_region_add_subregion(system_memory, memmap[SIFIVE_U_MROM].base, mask_rom); @@ -511,7 +511,7 @@ static void riscv_sifive_u_soc_realize(DeviceState *dev, Error **errp) * leave it enabled all the time. This won't break anything, but will be * too generous to misbehaving guests. */ -memory_region_init_ram(l2lim_mem, NULL, "riscv.sifive.u.l2lim", +memory_region_init_ram(l2lim_mem, OBJECT(dev), "riscv.sifive.u.l2lim", memmap[SIFIVE_U_L2LIM].size, _fatal); memory_region_add_subregion(system_memory, memmap[SIFIVE_U_L2LIM].base, l2lim_mem); -- 2.21.1
[PATCH 2/7] hw/arm: Let devices own the MemoryRegion they create
To avoid orphean memory regions being added in the /unattached QOM container, use the memory_region_owner_nonnull.cocci script to set the correct ownership. Signed-off-by: Philippe Mathieu-Daudé --- hw/arm/exynos4210.c| 14 +++--- hw/arm/fsl-imx25.c | 14 +++--- hw/arm/fsl-imx31.c | 10 +- hw/arm/fsl-imx6.c | 10 +- hw/arm/fsl-imx6ul.c| 14 +++--- hw/arm/msf2-soc.c | 8 hw/arm/nrf51_soc.c | 2 +- hw/arm/stm32f205_soc.c | 10 +- hw/arm/stm32f405_soc.c | 13 +++-- hw/arm/xlnx-zynqmp.c | 13 ++--- 10 files changed, 54 insertions(+), 54 deletions(-) diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c index 59a27bdd68..d4b05336ee 100644 --- a/hw/arm/exynos4210.c +++ b/hw/arm/exynos4210.c @@ -305,28 +305,28 @@ static void exynos4210_realize(DeviceState *socdev, Error **errp) /*** Memory ***/ /* Chip-ID and OMR */ -memory_region_init_io(>chipid_mem, NULL, _chipid_and_omr_ops, -NULL, "exynos4210.chipid", sizeof(chipid_and_omr)); +memory_region_init_io(>chipid_mem, OBJECT(socdev), + _chipid_and_omr_ops, NULL, + "exynos4210.chipid", sizeof(chipid_and_omr)); memory_region_add_subregion(system_mem, EXYNOS4210_CHIPID_ADDR, >chipid_mem); /* Internal ROM */ -memory_region_init_ram(>irom_mem, NULL, "exynos4210.irom", +memory_region_init_ram(>irom_mem, OBJECT(socdev), "exynos4210.irom", EXYNOS4210_IROM_SIZE, _fatal); memory_region_set_readonly(>irom_mem, true); memory_region_add_subregion(system_mem, EXYNOS4210_IROM_BASE_ADDR, >irom_mem); /* mirror of iROM */ -memory_region_init_alias(>irom_alias_mem, NULL, "exynos4210.irom_alias", - >irom_mem, - 0, +memory_region_init_alias(>irom_alias_mem, OBJECT(socdev), + "exynos4210.irom_alias", >irom_mem, 0, EXYNOS4210_IROM_SIZE); memory_region_set_readonly(>irom_alias_mem, true); memory_region_add_subregion(system_mem, EXYNOS4210_IROM_MIRROR_BASE_ADDR, >irom_alias_mem); /* Internal RAM */ -memory_region_init_ram(>iram_mem, NULL, "exynos4210.iram", +memory_region_init_ram(>iram_mem, OBJECT(socdev), "exynos4210.iram", EXYNOS4210_IRAM_SIZE, _fatal); memory_region_add_subregion(system_mem, EXYNOS4210_IRAM_BASE_ADDR, >iram_mem); diff --git a/hw/arm/fsl-imx25.c b/hw/arm/fsl-imx25.c index da3471b395..a77c873cfe 100644 --- a/hw/arm/fsl-imx25.c +++ b/hw/arm/fsl-imx25.c @@ -247,16 +247,16 @@ static void fsl_imx25_realize(DeviceState *dev, Error **errp) } /* initialize 2 x 16 KB ROM */ -memory_region_init_rom(>rom[0], NULL, - "imx25.rom0", FSL_IMX25_ROM0_SIZE, ); +memory_region_init_rom(>rom[0], OBJECT(dev), "imx25.rom0", + FSL_IMX25_ROM0_SIZE, ); if (err) { error_propagate(errp, err); return; } memory_region_add_subregion(get_system_memory(), FSL_IMX25_ROM0_ADDR, >rom[0]); -memory_region_init_rom(>rom[1], NULL, - "imx25.rom1", FSL_IMX25_ROM1_SIZE, ); +memory_region_init_rom(>rom[1], OBJECT(dev), "imx25.rom1", + FSL_IMX25_ROM1_SIZE, ); if (err) { error_propagate(errp, err); return; @@ -265,8 +265,8 @@ static void fsl_imx25_realize(DeviceState *dev, Error **errp) >rom[1]); /* initialize internal RAM (128 KB) */ -memory_region_init_ram(>iram, NULL, "imx25.iram", FSL_IMX25_IRAM_SIZE, - ); +memory_region_init_ram(>iram, OBJECT(dev), "imx25.iram", + FSL_IMX25_IRAM_SIZE, ); if (err) { error_propagate(errp, err); return; @@ -275,7 +275,7 @@ static void fsl_imx25_realize(DeviceState *dev, Error **errp) >iram); /* internal RAM (128 KB) is aliased over 128 MB - 128 KB */ -memory_region_init_alias(>iram_alias, NULL, "imx25.iram_alias", +memory_region_init_alias(>iram_alias, OBJECT(dev), "imx25.iram_alias", >iram, 0, FSL_IMX25_IRAM_ALIAS_SIZE); memory_region_add_subregion(get_system_memory(), FSL_IMX25_IRAM_ALIAS_ADDR, >iram_alias); diff --git a/hw/arm/fsl-imx31.c b/hw/arm/fsl-imx31.c index 55e90d104b..d5057ea49b 100644 --- a/hw/arm/fsl-imx31.c +++ b/hw/arm/fsl-imx31.c @@ -206,7 +206,7 @@ static void fsl_imx31_realize(DeviceState *dev, Error **errp) } /* On a real system, the first 16k is a `secure boot rom' */ -memory_region_init_rom(>secure_rom, NULL, "imx31.secure_rom", +
[PATCH 5/7] hw/display: Let devices own the MemoryRegion they create
To avoid orphean memory regions being added in the /unattached QOM container, use the memory_region_owner_nonnull.cocci script to set the correct ownership. Signed-off-by: Philippe Mathieu-Daudé --- hw/display/cg3.c| 4 ++-- hw/display/g364fb.c | 5 +++-- hw/display/macfb.c | 4 ++-- hw/display/vmware_vga.c | 4 ++-- 4 files changed, 9 insertions(+), 8 deletions(-) diff --git a/hw/display/cg3.c b/hw/display/cg3.c index 4fb67c6b1c..db2e291d16 100644 --- a/hw/display/cg3.c +++ b/hw/display/cg3.c @@ -315,8 +315,8 @@ static void cg3_realizefn(DeviceState *dev, Error **errp) } } -memory_region_init_ram(>vram_mem, NULL, "cg3.vram", s->vram_size, - _fatal); +memory_region_init_ram(>vram_mem, OBJECT(dev), "cg3.vram", + s->vram_size, _fatal); memory_region_set_log(>vram_mem, true, DIRTY_MEMORY_VGA); sysbus_init_mmio(sbd, >vram_mem); diff --git a/hw/display/g364fb.c b/hw/display/g364fb.c index 55185c95c6..28785cceda 100644 --- a/hw/display/g364fb.c +++ b/hw/display/g364fb.c @@ -477,8 +477,9 @@ static void g364fb_init(DeviceState *dev, G364State *s) s->con = graphic_console_init(dev, 0, _ops, s); -memory_region_init_io(>mem_ctrl, NULL, _ctrl_ops, s, "ctrl", 0x18); -memory_region_init_ram_ptr(>mem_vram, NULL, "vram", +memory_region_init_io(>mem_ctrl, OBJECT(dev), _ctrl_ops, s, + "ctrl", 0x18); +memory_region_init_ram_ptr(>mem_vram, OBJECT(dev), "vram", s->vram_size, s->vram); vmstate_register_ram(>mem_vram, dev); memory_region_set_log(>mem_vram, true, DIRTY_MEMORY_VGA); diff --git a/hw/display/macfb.c b/hw/display/macfb.c index 8bff16d535..b68faff4bb 100644 --- a/hw/display/macfb.c +++ b/hw/display/macfb.c @@ -362,8 +362,8 @@ static void macfb_common_realize(DeviceState *dev, MacfbState *s, Error **errp) return; } -memory_region_init_io(>mem_ctrl, NULL, _ctrl_ops, s, "macfb-ctrl", - 0x1000); +memory_region_init_io(>mem_ctrl, OBJECT(dev), _ctrl_ops, s, + "macfb-ctrl", 0x1000); memory_region_init_ram_nomigrate(>mem_vram, OBJECT(s), "macfb-vram", MACFB_VRAM_SIZE, errp); diff --git a/hw/display/vmware_vga.c b/hw/display/vmware_vga.c index 58ea82e3e5..79574e3c46 100644 --- a/hw/display/vmware_vga.c +++ b/hw/display/vmware_vga.c @@ -1238,8 +1238,8 @@ static void vmsvga_init(DeviceState *dev, struct vmsvga_state_s *s, s->vga.con = graphic_console_init(dev, 0, _ops, s); s->fifo_size = SVGA_FIFO_SIZE; -memory_region_init_ram(>fifo_ram, NULL, "vmsvga.fifo", s->fifo_size, - _fatal); +memory_region_init_ram(>fifo_ram, OBJECT(dev), "vmsvga.fifo", + s->fifo_size, _fatal); s->fifo_ptr = memory_region_get_ram_ptr(>fifo_ram); vga_common_init(>vga, OBJECT(dev)); -- 2.21.1
[PATCH 6/7] hw/dma: Let devices own the MemoryRegion they create
To avoid orphean memory regions being added in the /unattached QOM container, use the memory_region_owner_nonnull.cocci script to set the correct ownership. Signed-off-by: Philippe Mathieu-Daudé --- hw/dma/i8257.c | 2 +- hw/dma/rc4030.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/hw/dma/i8257.c b/hw/dma/i8257.c index bad8debae4..ef15c06d77 100644 --- a/hw/dma/i8257.c +++ b/hw/dma/i8257.c @@ -553,7 +553,7 @@ static void i8257_realize(DeviceState *dev, Error **errp) I8257State *d = I8257(dev); int i; -memory_region_init_io(>channel_io, NULL, _io_ops, d, +memory_region_init_io(>channel_io, OBJECT(dev), _io_ops, d, "dma-chan", 8 << d->dshift); memory_region_add_subregion(isa_address_space_io(isa), d->base, >channel_io); diff --git a/hw/dma/rc4030.c b/hw/dma/rc4030.c index c4cf8236f4..f62eb3d2a3 100644 --- a/hw/dma/rc4030.c +++ b/hw/dma/rc4030.c @@ -679,9 +679,9 @@ static void rc4030_realize(DeviceState *dev, Error **errp) s->periodic_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, rc4030_periodic_timer, s); -memory_region_init_io(>iomem_chipset, NULL, _ops, s, +memory_region_init_io(>iomem_chipset, o, _ops, s, "rc4030.chipset", 0x300); -memory_region_init_io(>iomem_jazzio, NULL, _ops, s, +memory_region_init_io(>iomem_jazzio, o, _ops, s, "rc4030.jazzio", 0x1000); memory_region_init_iommu(>dma_mr, sizeof(s->dma_mr), -- 2.21.1
[PATCH 3/7] hw/char: Let devices own the MemoryRegion they create
To avoid orphean memory regions being added in the /unattached QOM container, use the memory_region_owner_nonnull.cocci script to set the correct ownership. Signed-off-by: Philippe Mathieu-Daudé --- hw/char/serial.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/hw/char/serial.c b/hw/char/serial.c index 9298881af9..2ab8b69e03 100644 --- a/hw/char/serial.c +++ b/hw/char/serial.c @@ -997,7 +997,7 @@ static void serial_io_realize(DeviceState *dev, Error **errp) return; } -memory_region_init_io(>io, NULL, _io_ops, s, "serial", 8); +memory_region_init_io(>io, OBJECT(dev), _io_ops, s, "serial", 8); sysbus_init_mmio(SYS_BUS_DEVICE(sio), >io); sysbus_init_irq(SYS_BUS_DEVICE(sio), >irq); } @@ -1106,8 +1106,9 @@ static void serial_mm_realize(DeviceState *dev, Error **errp) return; } -memory_region_init_io(>io, NULL, _mm_ops[smm->endianness], smm, - "serial", 8 << smm->regshift); +memory_region_init_io(>io, OBJECT(dev), + _mm_ops[smm->endianness], smm, "serial", + 8 << smm->regshift); sysbus_init_mmio(SYS_BUS_DEVICE(smm), >io); sysbus_init_irq(SYS_BUS_DEVICE(smm), >serial.irq); } -- 2.21.1
[PATCH 4/7] hw/core: Let devices own the MemoryRegion they create
To avoid orphean memory regions being added in the /unattached QOM container, use the memory_region_owner_nonnull.cocci script to set the correct ownership. Signed-off-by: Philippe Mathieu-Daudé --- hw/core/platform-bus.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/core/platform-bus.c b/hw/core/platform-bus.c index 22c5f76dd0..d494e5cec1 100644 --- a/hw/core/platform-bus.c +++ b/hw/core/platform-bus.c @@ -187,7 +187,8 @@ static void platform_bus_realize(DeviceState *dev, Error **errp) d = SYS_BUS_DEVICE(dev); pbus = PLATFORM_BUS_DEVICE(dev); -memory_region_init(>mmio, NULL, "platform bus", pbus->mmio_size); +memory_region_init(>mmio, OBJECT(dev), "platform bus", + pbus->mmio_size); sysbus_init_mmio(d, >mmio); pbus->used_irqs = bitmap_new(pbus->num_irqs); -- 2.21.1
[PATCH 0/7] hw: Let devices own the MemoryRegion they create
When a device creates a MemoryRegion without setting its ownership, the MemoryRegion is added to the machine "/unattached" container in the QOM tree. Add a script to do automatically let the device take the ownership of the memory regions it creates, and run it. Philippe Mathieu-Daudé (7): scripts/coccinelle: Add a script to let devices own their MemoryRegions hw/arm: Let devices own the MemoryRegion they create hw/char: Let devices own the MemoryRegion they create hw/core: Let devices own the MemoryRegion they create hw/display: Let devices own the MemoryRegion they create hw/dma: Let devices own the MemoryRegion they create hw/riscv: Let devices own the MemoryRegion they create hw/arm/exynos4210.c | 14 ++-- hw/arm/fsl-imx25.c| 14 ++-- hw/arm/fsl-imx31.c| 10 +-- hw/arm/fsl-imx6.c | 10 +-- hw/arm/fsl-imx6ul.c | 14 ++-- hw/arm/msf2-soc.c | 8 +- hw/arm/nrf51_soc.c| 2 +- hw/arm/stm32f205_soc.c| 10 +-- hw/arm/stm32f405_soc.c| 13 +-- hw/arm/xlnx-zynqmp.c | 13 ++- hw/char/serial.c | 7 +- hw/core/platform-bus.c| 3 +- hw/display/cg3.c | 4 +- hw/display/g364fb.c | 5 +- hw/display/macfb.c| 4 +- hw/display/vmware_vga.c | 4 +- hw/dma/i8257.c| 2 +- hw/dma/rc4030.c | 4 +- hw/riscv/sifive_e.c | 8 +- hw/riscv/sifive_u.c | 4 +- .../memory_region_owner_nonnull.cocci | 80 +++ 21 files changed, 158 insertions(+), 75 deletions(-) create mode 100644 scripts/coccinelle/memory_region_owner_nonnull.cocci -- 2.21.1
[PATCH 1/7] scripts/coccinelle: Add a script to let devices own their MemoryRegions
When a device creates a MemoryRegion without setting its ownership, the MemoryRegion is added to the machine "/unattached" container in the QOM tree. Example with the Samsung SMDKC210 board: $ arm-softmmu/qemu-system-arm -M smdkc210 -S -monitor stdio (qemu) info qom-tree /machine (smdkc210-machine) /unattached (container) /io[0] (qemu:memory-region) /exynos4210.dram0[0] (qemu:memory-region) /exynos4210.irom[0] (qemu:memory-region) /exynos4210.iram[0] (qemu:memory-region) /exynos4210.chipid[0] (qemu:memory-region) ... /device[26] (exynos4210.uart) /exynos4210.uart[0] (qemu:memory-region) /soc (exynos4210) ^ \__ [*] The irom/iram/chipid regions should go under 'soc' at [*]. Add a coccinelle script to do automatic semantic patching. Signed-off-by: Philippe Mathieu-Daudé --- .../memory_region_owner_nonnull.cocci | 80 +++ 1 file changed, 80 insertions(+) create mode 100644 scripts/coccinelle/memory_region_owner_nonnull.cocci diff --git a/scripts/coccinelle/memory_region_owner_nonnull.cocci b/scripts/coccinelle/memory_region_owner_nonnull.cocci new file mode 100644 index 00..6748a200b2 --- /dev/null +++ b/scripts/coccinelle/memory_region_owner_nonnull.cocci @@ -0,0 +1,80 @@ +/* + Usage: + +spatch \ +--macro-file scripts/cocci-macro-file.h \ +--sp-file scripts/coccinelle/memory_region_owner_nonnull.cocci \ +--keep-comments \ +--in-place \ +--dir . + +*/ + +// Device is owner +@@ +typedef DeviceState; +identifier device_fn, dev, obj; +expression E1, E2, E3, E4, E5; +@@ +static void device_fn(DeviceState *dev, ...) +{ + ... + Object *obj = OBJECT(dev); + <+... +( +- memory_region_init(E1, NULL, E2, E3); ++ memory_region_init(E1, obj, E2, E3); +| +- memory_region_init_io(E1, NULL, E2, E3, E4, E5); ++ memory_region_init_io(E1, obj, E2, E3, E4, E5); +| +- memory_region_init_alias(E1, NULL, E2, E3, E4, E5); ++ memory_region_init_alias(E1, obj, E2, E3, E4, E5); +| +- memory_region_init_rom(E1, NULL, E2, E3, E4); ++ memory_region_init_rom(E1, obj, E2, E3, E4); +| +- memory_region_init_ram(E1, NULL, E2, E3, E4); ++ memory_region_init_ram(E1, obj, E2, E3, E4); +| +- memory_region_init_ram_ptr(E1, NULL, E2, E3, E4); ++ memory_region_init_ram_ptr(E1, obj, E2, E3, E4); +| +- memory_region_init_ram_shared_nomigrate(E1, NULL, E2, E3, E4, E5); ++ memory_region_init_ram_shared_nomigrate(E1, obj, E2, E3, E4, E5); +) + ...+> +} + +// Device is owner +@@ +identifier device_fn, dev; +expression E1, E2, E3, E4, E5; +@@ +static void device_fn(DeviceState *dev, ...) +{ + <+... +( +- memory_region_init(E1, NULL, E2, E3); ++ memory_region_init(E1, OBJECT(dev), E2, E3); +| +- memory_region_init_io(E1, NULL, E2, E3, E4, E5); ++ memory_region_init_io(E1, OBJECT(dev), E2, E3, E4, E5); +| +- memory_region_init_alias(E1, NULL, E2, E3, E4, E5); ++ memory_region_init_alias(E1, OBJECT(dev), E2, E3, E4, E5); +| +- memory_region_init_rom(E1, NULL, E2, E3, E4); ++ memory_region_init_rom(E1, OBJECT(dev), E2, E3, E4); +| +- memory_region_init_ram(E1, NULL, E2, E3, E4); ++ memory_region_init_ram(E1, OBJECT(dev), E2, E3, E4); +| +- memory_region_init_ram_ptr(E1, NULL, E2, E3, E4); ++ memory_region_init_ram_ptr(E1, OBJECT(dev), E2, E3, E4); +| +- memory_region_init_ram_shared_nomigrate(E1, NULL, E2, E3, E4, E5); ++ memory_region_init_ram_shared_nomigrate(E1, OBJECT(dev), E2, E3, E4, E5); +) + ...+> +} -- 2.21.1
Re: [PATCH] WHPX: Assigning maintainer for Windows Hypervisor Platform
On 18/02/20 21:38, Sunil Muthuswamy wrote: > Signed-off-by: Sunil Muthuswamy > --- > MAINTAINERS | 8 > 1 file changed, 8 insertions(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 1740a4fddc..9b3ba4e1b5 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -404,6 +404,14 @@ S: Supported > F: target/i386/kvm.c > F: scripts/kvm/vmxcap > > +WHPX CPUs > +M: Sunil Muthuswamy > +S: Supported > +F: target/i386/whpx-all.c > +F: target/i386/whp-dispatch.h > +F: accel/stubs/whpx-stub.c > +F: include/sysemu/whpx.h > + > Guest CPU Cores (Xen) > - > X86 Xen CPUs > Queued. Patches can still go through my tree while you get up to speed. Thanks! Paolo
Re: [RFC PATCH v3 00/27] Add subcluster allocation to qcow2
On 22.12.19 12:36, Alberto Garcia wrote: > Hi, > > here's the new version of the patches to add subcluster allocation > support to qcow2. > > Please refer to the cover letter of the first version for a full > description of the patches: > >https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html > > This version fixes many of the problems highlighted by Max. I decided > not to replace completely the cluster logic with subcluster logic in > all cases because I felt that sometimes it only complicated the code. > Let's see what you think :-) Looks good overall. :) So now I wonder on what your plans are after this series. Here are some things that come to my mind, and I wonder whether you plan to address them or whether there are more things to do still: - In v2, you had a patch for preallocation support with backing files. It didn’t quite work, which is why I think you dropped it for now (why not, it isn’t crucial). - There is a TODO on subcluster zeroing. - I think adding support to amend for switching extended_l2 on or off would make sense. But maybe it’s too complicated to be worth the effort. - As I noted in v2, I think it’d be great if it were possible to run the iotests with -o extended_l2=on. But I suppose this kind of depends on me adding data_file support to the Python tests first... Max signature.asc Description: OpenPGP digital signature
Re: [RFC PATCH v3 27/27] iotests: Add tests for qcow2 images with extended L2 entries
On 22.12.19 12:37, Alberto Garcia wrote: > Signed-off-by: Alberto Garcia > --- > tests/qemu-iotests/271 | 256 + > tests/qemu-iotests/271.out | 208 ++ > tests/qemu-iotests/group | 1 + > 3 files changed, 465 insertions(+) > create mode 100755 tests/qemu-iotests/271 > create mode 100644 tests/qemu-iotests/271.out Currently, you’re using the reference output to verify the results. I find this rather difficult. Can this not be written in a way that the test itself verifies the results? I realize bit manipulation in bash is hard, which is why I wonder whether Python may be better suited for the job. Or maybe at least there could be some way to produce a hexdump-like result from some more abstract description on what to expect and then compare the strings. I suppose I can live with how it is, but I feel like I’d have to do something in my head that could be better done by a script. Max signature.asc Description: OpenPGP digital signature
Re: [PATCH v4 04/16] hw/i386: Introduce init_topo_info to initialize X86CPUTopoInfo
On Thu, 13 Feb 2020 12:16:51 -0600 Babu Moger wrote: > Initialize all the parameters in one function init_topo_info. is it possible to squash it in 2/16 > > Move the data structure X86CPUTopoIDs and X86CPUTopoInfo into > x86.h. A reason why it's moved should be here. > > Signed-off-by: Babu Moger > Reviewed-by: Eduardo Habkost > --- > hw/i386/pc.c |4 +--- > hw/i386/x86.c | 14 +++--- > include/hw/i386/topology.h | 26 ++ > include/hw/i386/x86.h | 17 + > 4 files changed, 31 insertions(+), 30 deletions(-) > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > index 2adf7f6afa..9803413dd9 100644 > --- a/hw/i386/pc.c > +++ b/hw/i386/pc.c > @@ -1749,9 +1749,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev, > return; > } > > -topo_info.dies_per_pkg = x86ms->smp_dies; > -topo_info.cores_per_die = smp_cores; > -topo_info.threads_per_core = smp_threads; > +init_topo_info(_info, x86ms); > > env->nr_dies = x86ms->smp_dies; > > diff --git a/hw/i386/x86.c b/hw/i386/x86.c > index f18cab8e5c..083effb2f5 100644 > --- a/hw/i386/x86.c > +++ b/hw/i386/x86.c > @@ -63,15 +63,12 @@ static size_t pvh_start_addr; > uint32_t x86_cpu_apic_id_from_index(X86MachineState *x86ms, > unsigned int cpu_index) > { > -MachineState *ms = MACHINE(x86ms); > X86MachineClass *x86mc = X86_MACHINE_GET_CLASS(x86ms); > X86CPUTopoInfo topo_info; > uint32_t correct_id; > static bool warned; > > -topo_info.dies_per_pkg = x86ms->smp_dies; > -topo_info.cores_per_die = ms->smp.cores; > -topo_info.threads_per_core = ms->smp.threads; > +init_topo_info(_info, x86ms); > > correct_id = x86_apicid_from_cpu_idx(_info, cpu_index); > if (x86mc->compat_apic_id_mode) { > @@ -146,10 +143,7 @@ int64_t x86_get_default_cpu_node_id(const MachineState > *ms, int idx) > X86MachineState *x86ms = X86_MACHINE(ms); > X86CPUTopoInfo topo_info; > > - topo_info.dies_per_pkg = x86ms->smp_dies; > - topo_info.cores_per_die = ms->smp.cores; > - topo_info.threads_per_core = ms->smp.threads; > - > + init_topo_info(_info, x86ms); > > assert(idx < ms->possible_cpus->len); > x86_topo_ids_from_apicid(ms->possible_cpus->cpus[idx].arch_id, > @@ -177,9 +171,7 @@ const CPUArchIdList > *x86_possible_cpu_arch_ids(MachineState *ms) >sizeof(CPUArchId) * max_cpus); > ms->possible_cpus->len = max_cpus; > > -topo_info.dies_per_pkg = x86ms->smp_dies; > -topo_info.cores_per_die = ms->smp.cores; > -topo_info.threads_per_core = ms->smp.threads; > +init_topo_info(_info, x86ms); > > for (i = 0; i < ms->possible_cpus->len; i++) { > X86CPUTopoIDs topo_ids; > diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h > index ba52d49079..ef0ab0b6a3 100644 > --- a/include/hw/i386/topology.h > +++ b/include/hw/i386/topology.h > @@ -40,23 +40,17 @@ > > > #include "qemu/bitops.h" > +#include "hw/i386/x86.h" > > -/* APIC IDs can be 32-bit, but beware: APIC IDs > 255 require x2APIC support > - */ > -typedef uint32_t apic_id_t; > - > -typedef struct X86CPUTopoIDs { > -unsigned pkg_id; > -unsigned die_id; > -unsigned core_id; > -unsigned smt_id; > -} X86CPUTopoIDs; > - > -typedef struct X86CPUTopoInfo { > -unsigned dies_per_pkg; > -unsigned cores_per_die; > -unsigned threads_per_core; > -} X86CPUTopoInfo; > +static inline void init_topo_info(X86CPUTopoInfo *topo_info, > + const X86MachineState *x86ms) > +{ > +MachineState *ms = MACHINE(x86ms); > + > +topo_info->dies_per_pkg = x86ms->smp_dies; > +topo_info->cores_per_die = ms->smp.cores; > +topo_info->threads_per_core = ms->smp.threads; > +} > > /* Return the bit width needed for 'count' IDs > */ > diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h > index 4b84917885..ad62b01cf2 100644 > --- a/include/hw/i386/x86.h > +++ b/include/hw/i386/x86.h > @@ -36,6 +36,23 @@ typedef struct { > bool compat_apic_id_mode; > } X86MachineClass; > > +/* APIC IDs can be 32-bit, but beware: APIC IDs > 255 require x2APIC support > + */ > +typedef uint32_t apic_id_t; > + > +typedef struct X86CPUTopoIDs { > +unsigned pkg_id; > +unsigned die_id; > +unsigned core_id; > +unsigned smt_id; > +} X86CPUTopoIDs; > + > +typedef struct X86CPUTopoInfo { > +unsigned dies_per_pkg; > +unsigned cores_per_die; > +unsigned threads_per_core; > +} X86CPUTopoInfo; > + > typedef struct { > /*< private >*/ > MachineState parent; >
Re: [PATCH] accel/kvm: Check ioctl(KVM_SET_USER_MEMORY_REGION) return value
On 21/02/20 17:33, Philippe Mathieu-Daudé wrote: > kvm_vm_ioctl() can fail, check its return value, and log an error > when it failed. This fixes Coverity CID 1412229: > > Unchecked return value (CHECKED_RETURN) > > check_return: Calling kvm_vm_ioctl without checking return value > > Reported-by: Coverity (CID 1412229) > Fixes: 235e8982ad3 ("support using KVM_MEM_READONLY flag for regions") > Signed-off-by: Philippe Mathieu-Daudé > --- > accel/kvm/kvm-all.c | 12 +++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c > index c111312dfd..6df3a4d030 100644 > --- a/accel/kvm/kvm-all.c > +++ b/accel/kvm/kvm-all.c > @@ -308,13 +308,23 @@ static int kvm_set_user_memory_region(KVMMemoryListener > *kml, KVMSlot *slot, boo > /* Set the slot size to 0 before setting the slot to the desired > * value. This is needed based on KVM commit 75d61fbc. */ > mem.memory_size = 0; > -kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, ); > +ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, ); > +if (ret < 0) { > +goto err; > +} > } > mem.memory_size = slot->memory_size; > ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, ); > slot->old_flags = mem.flags; > +err: > trace_kvm_set_user_memory(mem.slot, mem.flags, mem.guest_phys_addr, >mem.memory_size, mem.userspace_addr, ret); > +if (ret < 0) { > +error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d," > + " start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s", > + __func__, mem.slot, slot->start_addr, > + (uint64_t)mem.memory_size, strerror(errno)); > +} > return ret; > } > > Queued, thanks. Paolo
Re: [PULL v2 00/46] target-arm queue
On Fri, 21 Feb 2020 at 16:18, Peter Maydell wrote: > > v1->v2 changes: dropped the last 6 patches from rth as there's > a problem with one of them that's too complicated to try to > fix up. > > thanks > -- PMM > > The following changes since commit a8c6af67e1e8d460e2c6e87070807e0a02c0fec2: > > Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.0-20200221' > into staging (2020-02-21 14:20:42 +) > > are available in the Git repository at: > > https://git.linaro.org/people/pmaydell/qemu-arm.git > tags/pull-target-arm-20200221-1 > > for you to fetch changes up to 9eb4f58918a851fb46895fd9b7ce579afeac9d02: > > target/arm: Set MVFR0.FPSP for ARMv5 cpus (2020-02-21 16:07:03 +) > > > target-arm queue: > * aspeed/scu: Implement chip ID register > * hw/misc/iotkit-secctl: Fix writing to 'PPC Interrupt Clear' register > * mainstone: Make providing flash images non-mandatory > * z2: Make providing flash images non-mandatory > * Fix failures to flush SVE high bits after AdvSIMD > INS/ZIP/UZP/TRN/TBL/TBX/EXT > * Minor performance improvement: spend less time recalculating hflags values > * Code cleanup to isar_feature function tests > * Implement ARMv8.1-PMU and ARMv8.4-PMU extensions > * Bugfix: correct handling of PMCR_EL0.LC bit > * Bugfix: correct definition of PMCRDP > * Correctly implement ACTLR2, HACTLR2 > * allwinner: Wire up USB ports > * Vectorize emulation of USHL, SSHL, PMUL* > * xilinx_spips: Correct the number of dummy cycles for the FAST_READ_4 cmd > * sh4: Fix PCI ISA IO memory subregion > Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0 for any user-visible changes. -- PMM
Re: [RFC PATCH v3 26/27] qcow2: Add subcluster support to qcow2_measure()
On 22.12.19 12:37, Alberto Garcia wrote: > Extended L2 entries are bigger than normal L2 entries so this has an > impact on the amount of metadata needed for a qcow2 file. > > Signed-off-by: Alberto Garcia > --- > block/qcow2.c | 19 --- > 1 file changed, 12 insertions(+), 7 deletions(-) Reviewed-by: Max Reitz signature.asc Description: OpenPGP digital signature
Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC
On Fri, Feb 21, 2020 at 5:11 PM Peter Maydell wrote: > > On Fri, 21 Feb 2020 at 16:05, BALATON Zoltan wrote: > > > > On Thu, 20 Feb 2020, Richard Henderson wrote: > > > On 2/18/20 9:10 AM, BALATON Zoltan wrote: > > >> +DEFINE_PROP_BOOL("hardfloat", PowerPCCPU, hardfloat, true), > > > > > > I would also prefer a different name here -- perhaps x-no-fp-fi. > > > > What's wrong with hardfloat? That's how the code refers to this so if > > anyone searches what it does would turn up some meaningful results. > > This prompted me to check what you're using the property for. > The cover letter says: > > This patch implements a simple way to keep the inexact flag set for > > hardfloat while still allowing to revert to softfloat for workloads > > that need more accurate albeit slower emulation. (Set hardfloat > > property of CPU, i.e. -cpu name,hardfloat=false for that.) > > I think that is the wrong approach. Enabling use of the host > FPU should not affect the accuracy of the emulation, which > should remain bitwise-correct. We should only be using the > host FPU to the extent that we can do that without discarding > accuracy. As far as I'm aware that's how the hardfloat support > for other guest CPUs that use it works. > Right, that is my understanding as well. There shouldn't be "hardfloat" switch at all. If there is, it is either a mistake, or for experimental or other similar purposes. In my understanding, hardfloat feature should work entirely transparently, making the decision whether to use host math functions (instead of softfloat library) by itself, based on the particular execution circumstances. In other words, the accuracy of FP emulation should not be compromised in absolutely any case (there should not be an option "ok, we are happy with approximate calculations"). On top of that, artificial generating of "inexact" flag really seems problematic (or, speaking bluntly, looks like a hack)). Perhaps hardfloat feature needs a little bit of more work, but not in this way. Yours, Aleksandar > thanks > -- PMM >
Re: [PATCH] console: make QMP screendump use coroutine
Kevin Wolf writes: > Am 20.02.2020 um 17:01 hat Markus Armbruster geschrieben: >> >> > void qmp_screendump(const char *filename, bool has_device, const char >> >> > *device, >> >> > bool has_head, int64_t head, Error **errp) >> >> > { >> >> > QemuConsole *con; >> >> > DisplaySurface *surface; >> >> > +g_autoptr(pixman_image_t) image = NULL; >> >> > int fd; >> >> > >> >> > if (has_device) { >> >> > @@ -365,7 +375,15 @@ void qmp_screendump(const char *filename, bool >> >> > has_device, const char *device, >> >> > } >> >> > } >> >> > >> >> > -graphic_hw_update(con); >> >> > +if (qemu_in_coroutine()) { >> >> > +assert(!con->screendump_co); >> >> > +con->screendump_co = qemu_coroutine_self(); >> >> > +aio_bh_schedule_oneshot(qemu_get_aio_context(), >> >> > +graphic_hw_update_bh, con); >> >> > +qemu_coroutine_yield(); >> >> > +con->screendump_co = NULL; >> >> > +} >> >> >> >> What if multiple QMP monitors simultaneously screendump? Hmm, it works >> >> because all execute one after another in the same coroutine >> >> qmp_dispatcher_co. Implicit mutual exclusion. >> >> >> >> Executing them one after another is bad, because it lets an ill-behaved >> >> QMP command starve *all* QMP monitors. We do it only out of >> >> (reasonable!) fear of implicit mutual exclusion requirements like the >> >> one you add. >> >> >> >> Let's not add more if we can help it. >> > >> > The situation is not worse than the current blocking handling. >> >> Really? >> >> What makes executing multiple qmp_screendump() concurrently (in separate >> threads) or interleaved (in separate coroutines in the same thread) >> unsafe before this patch? > > QMP command handlers are guaranteed to run in the main thread with the > BQL held, so there is no concurrency. If you want to change this, you > would have much more complicated problems to solve than in this handler. > I'm not sure it's fair to require thread-safety from one handler when > no other handler is thread safe (except accidentally) and nobody seems > to plan actually calling them from multiple threads. "Let's not [...] if we can help it." is hardly a "change this or else no merge" demand. It is a challenge to find a more elegant solution. >> >> Your screendump_co is per QemuConsole instead of per QMP monitor only >> >> because you need to find the coroutine in graphic_hw_update_done(). Can >> >> we somehow pass it via function arguments? >> > >> > I think it could be done later, so I suggest a TODO. >> >> We should avoid making our dependence on implicit mutual exclusion >> worse. When we do it anyway, a big, fat, ugly comment is definitely >> called for. > > Anyway, what I really wanted to add: > > This should be easy to solve by having a CoQueue instead of a single Ah, challenge accepted! Exactly the outcome I was hoping for :) > Coroutine pointer. The coroutine would just call qemu_co_queue_wait(), > which adds itself to the queue before it yields and the update > completion would wake up all coroutines that are currently queued with > qemu_co_queue_restart_all(). > > qemu_co_queue_wait() takes a lock as its second parameter. You don't > need it in this context and can just pass NULL. (This is a lock that > would be dropped while the coroutine is sleeping and automatically > reacquired afterwards.) > >> >> In case avoiding the mutual exclusion is impractical: please explain it >> >> in a comment to make it somewhat less implicit. >> >> It is anything but: see appended patch. > > This works, too, but it requires an additional struct. I think the queue > is easier. (Note there is a difference in the mechanism: Your patch > waits for the specific update it triggered, while the CoQueue would wait > for _any_ update to complete. I assume effectively the result is the > same.) Your idea sounds much nicer to me. Thanks!
Re: [PATCH] accel/kvm: Check ioctl(KVM_SET_USER_MEMORY_REGION) return value
On Fri, Feb 21, 2020 at 05:33:36PM +0100, Philippe Mathieu-Daudé wrote: > kvm_vm_ioctl() can fail, check its return value, and log an error > when it failed. This fixes Coverity CID 1412229: > > Unchecked return value (CHECKED_RETURN) > > check_return: Calling kvm_vm_ioctl without checking return value > > Reported-by: Coverity (CID 1412229) > Fixes: 235e8982ad3 ("support using KVM_MEM_READONLY flag for regions") > Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Peter Xu -- Peter Xu
Re: [RFC PATCH v3 25/27] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
On 22.12.19 12:37, Alberto Garcia wrote: > Now that the implementation of subclusters is complete we can finally > add the necessary options to create and read images with this feature, > which we call "extended L2 entries". > > Signed-off-by: Alberto Garcia > --- > block/qcow2.c| 65 ++-- > block/qcow2.h| 8 ++- > include/block/block_int.h| 1 + > qapi/block-core.json | 7 +++ > tests/qemu-iotests/031.out | 8 +-- > tests/qemu-iotests/036.out | 4 +- > tests/qemu-iotests/049.out | 102 +++ > tests/qemu-iotests/060.out | 1 + > tests/qemu-iotests/061.out | 20 +++--- > tests/qemu-iotests/065 | 18 -- > tests/qemu-iotests/082.out | 48 --- > tests/qemu-iotests/085.out | 38 ++-- > tests/qemu-iotests/144.out | 4 +- > tests/qemu-iotests/182.out | 2 +- > tests/qemu-iotests/185.out | 8 +-- > tests/qemu-iotests/198.out | 2 + > tests/qemu-iotests/206.out | 4 ++ > tests/qemu-iotests/242.out | 5 ++ > tests/qemu-iotests/255.out | 8 +-- > tests/qemu-iotests/273.out | 9 ++- > tests/qemu-iotests/common.filter | 1 + > 21 files changed, 245 insertions(+), 118 deletions(-) With the .qapi versions adjusted to match $next_release, and with the bit fixed to be at index 4 instead of 3 (and with the iotests rebases that always become necessary[1]): Reviewed-by: Max Reitz [1] e.g. 280 fails now – I suppose qemu_img_log should filter just like the bash tests do, but then again, I’d rather drop that function altogether anyway (https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg00136.html) signature.asc Description: OpenPGP digital signature
[PATCH v2 11/13] migration/multifd: Print used_length of memory block
We actually want to print the used_length, against which we check. Cc: "Dr. David Alan Gilbert" Cc: Juan Quintela Cc: Peter Xu Signed-off-by: David Hildenbrand --- migration/multifd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/migration/multifd.c b/migration/multifd.c index b3e8ae9bcc..dd9e88c5f1 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -222,7 +222,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp) if (offset > (block->used_length - qemu_target_page_size())) { error_setg(errp, "multifd: offset too long %" PRIu64 " (max " RAM_ADDR_FMT ")", - offset, block->max_length); + offset, block->used_length); return -1; } p->pages->iov[i].iov_base = block->host + offset; -- 2.24.1
[PATCH v2 12/13] migration/ram: Use offset_in_ramblock() in range checks
We never read or write beyond the used_length of memory blocks when migrating. Make this clearer by using offset_in_ramblock() consistently. Cc: "Dr. David Alan Gilbert" Cc: Juan Quintela Cc: Peter Xu Signed-off-by: David Hildenbrand --- migration/ram.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index ee5c3d5784..5cc9993899 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1309,8 +1309,8 @@ static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again) *again = false; return false; } -if ram_addr_t)pss->page) << TARGET_PAGE_BITS) ->= pss->block->used_length) { +if (!offset_in_ramblock(pss->block, +((ram_addr_t)pss->page) << TARGET_PAGE_BITS)) { /* Didn't find anything in this RAM Block */ pss->page = 0; pss->block = QLIST_NEXT_RCU(pss->block, next); @@ -1514,7 +1514,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len) rs->last_req_rb = ramblock; } trace_ram_save_queue_pages(ramblock->idstr, start, len); -if (start+len > ramblock->used_length) { +if (!offset_in_ramblock(ramblock, start + len - 1)) { error_report("%s request overrun start=" RAM_ADDR_FMT " len=" RAM_ADDR_FMT " blocklen=" RAM_ADDR_FMT, __func__, start, len, ramblock->used_length); @@ -3325,8 +3325,8 @@ static void colo_flush_ram_cache(void) while (block) { offset = migration_bitmap_find_dirty(ram_state, block, offset); -if (((ram_addr_t)offset) << TARGET_PAGE_BITS ->= block->used_length) { +if (!offset_in_ramblock(block, +((ram_addr_t)offset) << TARGET_PAGE_BITS)) { offset = 0; block = QLIST_NEXT_RCU(block, next); } else { -- 2.24.1
[PATCH v2 10/13] migration/ram: Handle RAM block resizes during postcopy
Resizing while migrating is dangerous and does not work as expected. The whole migration code works on the usable_length of ram blocks and does not expect this to change at random points in time. In the case of postcopy, relying on used_length is racy as soon as the guest is running. Also, when used_length changes we might leave the uffd handler registered for some memory regions, reject valid pages when migrating and fail when sending the recv bitmap to the source. Resizing can be trigger *after* (but not during) a reset in ACPI code by the guest - hw/arm/virt-acpi-build.c:acpi_ram_update() - hw/i386/acpi-build.c:acpi_ram_update() Let's remember the original used_length in a separate variable and use it in relevant postcopy code. Make sure to update it when we resize during precopy, when synchronizing the RAM block sizes with the source. Cc: "Dr. David Alan Gilbert" Cc: Juan Quintela Cc: Eduardo Habkost Cc: Paolo Bonzini Cc: Igor Mammedov Cc: "Michael S. Tsirkin" Cc: Richard Henderson Cc: Shannon Zhao Cc: Alex Bennée Cc: Peter Xu Signed-off-by: David Hildenbrand --- include/exec/ramblock.h | 10 ++ migration/postcopy-ram.c | 15 --- migration/ram.c | 11 +-- 3 files changed, 31 insertions(+), 5 deletions(-) diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h index 07d50864d8..664701b759 100644 --- a/include/exec/ramblock.h +++ b/include/exec/ramblock.h @@ -59,6 +59,16 @@ struct RAMBlock { */ unsigned long *clear_bmap; uint8_t clear_bmap_shift; + +/* + * RAM block length that corresponds to the used_length on the migration + * source (after RAM block sizes were synchronized). Especially, after + * starting to run the guest, used_length and postcopy_length can differ. + * Used to register/unregister uffd handlers and as the size of the received + * bitmap. Receiving any page beyond this length will bail out, as it + * could not have been valid on the source. + */ +ram_addr_t postcopy_length; }; #endif #endif diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index a36402722b..c68caf4e42 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -17,6 +17,7 @@ */ #include "qemu/osdep.h" +#include "qemu/rcu.h" #include "exec/target_page.h" #include "migration.h" #include "qemu-file.h" @@ -31,6 +32,7 @@ #include "qemu/error-report.h" #include "trace.h" #include "hw/boards.h" +#include "exec/ramblock.h" /* Arbitrary limit on size of each discard command, * keeps them around ~200 bytes @@ -456,6 +458,13 @@ static int init_range(RAMBlock *rb, void *opaque) ram_addr_t length = qemu_ram_get_used_length(rb); trace_postcopy_init_range(block_name, host_addr, offset, length); +/* + * Save the used_length before running the guest. In case we have to + * resize RAM blocks when syncing RAM block sizes from the source during + * precopy, we'll update it manually via the ram block notifier. + */ +rb->postcopy_length = length; + /* * We need the whole of RAM to be truly empty for postcopy, so things * like ROMs and any data tables built during init must be zero'd @@ -478,7 +487,7 @@ static int cleanup_range(RAMBlock *rb, void *opaque) const char *block_name = qemu_ram_get_idstr(rb); void *host_addr = qemu_ram_get_host_addr(rb); ram_addr_t offset = qemu_ram_get_offset(rb); -ram_addr_t length = qemu_ram_get_used_length(rb); +ram_addr_t length = rb->postcopy_length; MigrationIncomingState *mis = opaque; struct uffdio_range range_struct; trace_postcopy_cleanup_range(block_name, host_addr, offset, length); @@ -600,7 +609,7 @@ static int nhp_range(RAMBlock *rb, void *opaque) const char *block_name = qemu_ram_get_idstr(rb); void *host_addr = qemu_ram_get_host_addr(rb); ram_addr_t offset = qemu_ram_get_offset(rb); -ram_addr_t length = qemu_ram_get_used_length(rb); +ram_addr_t length = rb->postcopy_length; trace_postcopy_nhp_range(block_name, host_addr, offset, length); /* @@ -644,7 +653,7 @@ static int ram_block_enable_notify(RAMBlock *rb, void *opaque) struct uffdio_register reg_struct; reg_struct.range.start = (uintptr_t)qemu_ram_get_host_addr(rb); -reg_struct.range.len = qemu_ram_get_used_length(rb); +reg_struct.range.len = rb->postcopy_length; reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING; /* Now tell our userfault_fd that it's responsible for this area */ diff --git a/migration/ram.c b/migration/ram.c index 1a5ff07997..ee5c3d5784 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -244,7 +244,7 @@ int64_t ramblock_recv_bitmap_send(QEMUFile *file, return -1; } -nbits = block->used_length >> TARGET_PAGE_BITS; +nbits = block->postcopy_length >> TARGET_PAGE_BITS; /* * Make sure the tmp bitmap buffer is big enough, e.g., on 32bit @@ -3160,7 +3160,13 @@ static int
[PATCH v2 13/13] migration/ram: Tolerate partially changed mappings in postcopy code
When we partially change mappings (esp., mmap over parts of an existing mmap like qemu_ram_remap() does) where we have a userfaultfd handler registered, the handler will implicitly be unregistered from the parts that changed. Trying to place pages onto mappings where there is no longer a handler registered will fail. Let's make sure that any waiter is woken up - we have to do that manually. Let's also document how UFFDIO_UNREGISTER will handle this scenario. This is mainly a preparation for RAM blocks with resizable allcoations, where the mapping of the invalid RAM range will change. The source will keep sending pages that are outside of the new (shrunk) RAM size. We have to treat these pages like they would have been migrated, but can essentially simply drop the content (ignore the placement error). Keep printing a warning on EINVAL, to avoid hiding other (programming) issues. ENOENT is unique. Cc: "Dr. David Alan Gilbert" Cc: Juan Quintela Cc: Peter Xu Cc: Andrea Arcangeli Signed-off-by: David Hildenbrand --- migration/postcopy-ram.c | 37 + 1 file changed, 37 insertions(+) diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index c68caf4e42..f023830b9a 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -506,6 +506,12 @@ static int cleanup_range(RAMBlock *rb, void *opaque) range_struct.start = (uintptr_t)host_addr; range_struct.len = length; +/* + * In case the mapping was partially changed since we enabled userfault + * (e.g., via qemu_ram_remap()), the userfaultfd handler was already removed + * for the mappings that changed. Unregistering will, however, still work + * and ignore mappings without a registered handler. + */ if (ioctl(mis->userfault_fd, UFFDIO_UNREGISTER, _struct)) { error_report("%s: userfault unregister %s", __func__, strerror(errno)); @@ -1180,6 +1186,17 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis) return 0; } +static int qemu_ufd_wake_ioctl(int userfault_fd, void *host_addr, + uint64_t pagesize) +{ +struct uffdio_range range = { +.start = (uint64_t)(uintptr_t)host_addr, +.len = pagesize, +}; + +return ioctl(userfault_fd, UFFDIO_WAKE, ); +} + static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr, void *from_addr, uint64_t pagesize, RAMBlock *rb) { @@ -1198,6 +1215,26 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr, zero_struct.mode = 0; ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, _struct); } + +/* + * When the mapping gets partially changed (e.g., qemu_ram_remap()) before + * we try to place a page, the userfaultfd handler will be removed for the + * changed mappings and placing pages will fail. We can safely ignore this, + * because mappings that changed on the destination don't need data from the + * source (e.g., qemu_ram_remap()). Wake up any waiter waiting for that page + * (unlikely but possible). Waking up waiters is always possible, even + * without a registered userfaultfd handler. + * + * Old kernels report EINVAL, new kernels report ENOENT in case there is + * no longer a userfaultfd handler for a mapping. + */ +if (ret && (errno == ENOENT || errno == EINVAL)) { +if (errno == EINVAL) { +warn_report("%s: Failed to place page %p. Waking up any waiters.", + __func__, host_addr); +} +ret = qemu_ufd_wake_ioctl(userfault_fd, host_addr, pagesize); +} if (!ret) { ramblock_recv_bitmap_set_range(rb, host_addr, pagesize / qemu_target_page_size()); -- 2.24.1
[PATCH v2 08/13] migration/ram: Simplify host page handling in ram_load_postcopy()
Add two new helper functions. This will in come handy once we want to handle ram block resizes while postcopy is active. Cc: "Dr. David Alan Gilbert" Cc: Juan Quintela Cc: Peter Xu Signed-off-by: David Hildenbrand --- migration/ram.c | 54 - 1 file changed, 31 insertions(+), 23 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index d5a4d69e1c..f815f4e532 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2734,6 +2734,20 @@ static inline void *host_from_ram_block_offset(RAMBlock *block, return block->host + offset; } +static void *host_page_from_ram_block_offset(RAMBlock *block, + ram_addr_t offset) +{ +/* Note: Explicitly no check against offset_in_ramblock(). */ +return (void *)QEMU_ALIGN_DOWN((uintptr_t)block->host + offset, + block->page_size); +} + +static ram_addr_t host_page_offset_from_ram_block_offset(RAMBlock *block, + ram_addr_t offset) +{ +return ((uintptr_t)block->host + offset) & (block->page_size - 1); +} + static inline void *colo_cache_from_block_offset(RAMBlock *block, ram_addr_t offset) { @@ -3111,13 +3125,12 @@ static int ram_load_postcopy(QEMUFile *f) MigrationIncomingState *mis = migration_incoming_get_current(); /* Temporary page that is later 'placed' */ void *postcopy_host_page = mis->postcopy_tmp_page; -void *this_host = NULL; +void *host_page = NULL; bool all_zero = false; int target_pages = 0; while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) { ram_addr_t addr; -void *host = NULL; void *page_buffer = NULL; void *place_source = NULL; RAMBlock *block = NULL; @@ -3143,9 +3156,12 @@ static int ram_load_postcopy(QEMUFile *f) if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE | RAM_SAVE_FLAG_COMPRESS_PAGE)) { block = ram_block_from_stream(f, flags); +if (!block) { +ret = -EINVAL; +break; +} -host = host_from_ram_block_offset(block, addr); -if (!host) { +if (!offset_in_ramblock(block, addr)) { error_report("Illegal RAM offset " RAM_ADDR_FMT, addr); ret = -EINVAL; break; @@ -3163,21 +3179,18 @@ static int ram_load_postcopy(QEMUFile *f) * of a host page in one chunk. */ page_buffer = postcopy_host_page + - ((uintptr_t)host & (block->page_size - 1)); + host_page_offset_from_ram_block_offset(block, addr); /* If all TP are zero then we can optimise the place */ if (target_pages == 1) { all_zero = true; -this_host = (void *)QEMU_ALIGN_DOWN((uintptr_t)host, -block->page_size); -} else { +host_page = host_page_from_ram_block_offset(block, addr); +} else if (host_page != host_page_from_ram_block_offset(block, +addr)) { /* not the 1st TP within the HP */ -if (QEMU_ALIGN_DOWN((uintptr_t)host, block->page_size) != -(uintptr_t)this_host) { -error_report("Non-same host page %p/%p", - host, this_host); -ret = -EINVAL; -break; -} +error_report("Non-same host page %p/%p", host_page, + host_page_from_ram_block_offset(block, addr)); +ret = -EINVAL; +break; } /* @@ -3257,16 +3270,11 @@ static int ram_load_postcopy(QEMUFile *f) } if (!ret && place_needed) { -/* This gets called at the last target page in the host page */ -void *place_dest = (void *)QEMU_ALIGN_DOWN((uintptr_t)host, - block->page_size); - if (all_zero) { -ret = postcopy_place_page_zero(mis, place_dest, - block); +ret = postcopy_place_page_zero(mis, host_page, block); } else { -ret = postcopy_place_page(mis, place_dest, - place_source, block); +ret = postcopy_place_page(mis, host_page, place_source, + block); } } } -- 2.24.1
[PATCH v2 09/13] migration/ram: Consolidate variable reset after placement in ram_load_postcopy()
Let's consolidate resetting the variables. Cc: "Dr. David Alan Gilbert" Cc: Juan Quintela Cc: Peter Xu Signed-off-by: David Hildenbrand --- migration/ram.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index f815f4e532..1a5ff07997 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3126,7 +3126,7 @@ static int ram_load_postcopy(QEMUFile *f) /* Temporary page that is later 'placed' */ void *postcopy_host_page = mis->postcopy_tmp_page; void *host_page = NULL; -bool all_zero = false; +bool all_zero = true; int target_pages = 0; while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) { @@ -3152,7 +3152,6 @@ static int ram_load_postcopy(QEMUFile *f) addr &= TARGET_PAGE_MASK; trace_ram_load_postcopy_loop((uint64_t)addr, flags); -place_needed = false; if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE | RAM_SAVE_FLAG_COMPRESS_PAGE)) { block = ram_block_from_stream(f, flags); @@ -3180,9 +3179,7 @@ static int ram_load_postcopy(QEMUFile *f) */ page_buffer = postcopy_host_page + host_page_offset_from_ram_block_offset(block, addr); -/* If all TP are zero then we can optimise the place */ if (target_pages == 1) { -all_zero = true; host_page = host_page_from_ram_block_offset(block, addr); } else if (host_page != host_page_from_ram_block_offset(block, addr)) { @@ -3199,7 +3196,6 @@ static int ram_load_postcopy(QEMUFile *f) */ if (target_pages == (block->page_size / TARGET_PAGE_SIZE)) { place_needed = true; -target_pages = 0; } place_source = postcopy_host_page; } @@ -3276,6 +3272,10 @@ static int ram_load_postcopy(QEMUFile *f) ret = postcopy_place_page(mis, host_page, place_source, block); } +place_needed = false; +target_pages = 0; +/* Assume we have a zero page until we detect something different */ +all_zero = true; } } -- 2.24.1
[PATCH v2 03/13] numa: Teach ram block notifiers about resizeable ram blocks
Ram block notifiers are currently not aware of resizes. Especially to handle resizes during migration, but also to implement actually resizeable ram blocks (make everything between used_length and max_length inaccessible), we want to teach ram block notifiers about resizeable ram. Introduce the basic infrastructure but keep using max_size in the existing notifiers. Supply the max_size when adding and removing ram blocks. Also, notify on resizes. Acked-by: Paul Durrant Reviewed-by: Peter Xu Cc: Richard Henderson Cc: Paolo Bonzini Cc: "Dr. David Alan Gilbert" Cc: Eduardo Habkost Cc: Marcel Apfelbaum Cc: Stefano Stabellini Cc: Anthony Perard Cc: Paul Durrant Cc: "Michael S. Tsirkin" Cc: xen-de...@lists.xenproject.org Cc: Igor Mammedov Signed-off-by: David Hildenbrand --- exec.c | 13 +++-- hw/core/numa.c | 22 +- hw/i386/xen/xen-mapcache.c | 7 --- include/exec/ramlist.h | 13 + target/i386/hax-mem.c | 5 +++-- target/i386/sev.c | 18 ++ util/vfio-helpers.c| 16 7 files changed, 62 insertions(+), 32 deletions(-) diff --git a/exec.c b/exec.c index dfd43d27c6..b75250e773 100644 --- a/exec.c +++ b/exec.c @@ -2129,6 +2129,8 @@ static int memory_try_enable_merging(void *addr, size_t len) */ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp) { +const ram_addr_t oldsize = block->used_length; + assert(block); newsize = HOST_PAGE_ALIGN(newsize); @@ -2153,6 +2155,11 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp) return -EINVAL; } +/* Notify before modifying the ram block and touching the bitmaps. */ +if (block->host) { +ram_block_notify_resize(block->host, oldsize, newsize); +} + cpu_physical_memory_clear_dirty_range(block->offset, block->used_length); block->used_length = newsize; cpu_physical_memory_set_dirty_range(block->offset, block->used_length, @@ -2312,7 +2319,8 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared) qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_HUGEPAGE); /* MADV_DONTFORK is also needed by KVM in absence of synchronous MMU */ qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_DONTFORK); -ram_block_notify_add(new_block->host, new_block->max_length); +ram_block_notify_add(new_block->host, new_block->used_length, + new_block->max_length); } } @@ -2492,7 +2500,8 @@ void qemu_ram_free(RAMBlock *block) } if (block->host) { -ram_block_notify_remove(block->host, block->max_length); +ram_block_notify_remove(block->host, block->used_length, +block->max_length); } qemu_mutex_lock_ramlist(); diff --git a/hw/core/numa.c b/hw/core/numa.c index 6599c69e05..e28ad24fcd 100644 --- a/hw/core/numa.c +++ b/hw/core/numa.c @@ -902,11 +902,12 @@ void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms) static int ram_block_notify_add_single(RAMBlock *rb, void *opaque) { const ram_addr_t max_size = qemu_ram_get_max_length(rb); +const ram_addr_t size = qemu_ram_get_used_length(rb); void *host = qemu_ram_get_host_addr(rb); RAMBlockNotifier *notifier = opaque; if (host) { -notifier->ram_block_added(notifier, host, max_size); +notifier->ram_block_added(notifier, host, size, max_size); } return 0; } @@ -923,20 +924,31 @@ void ram_block_notifier_remove(RAMBlockNotifier *n) QLIST_REMOVE(n, next); } -void ram_block_notify_add(void *host, size_t size) +void ram_block_notify_add(void *host, size_t size, size_t max_size) { RAMBlockNotifier *notifier; QLIST_FOREACH(notifier, _list.ramblock_notifiers, next) { -notifier->ram_block_added(notifier, host, size); +notifier->ram_block_added(notifier, host, size, max_size); } } -void ram_block_notify_remove(void *host, size_t size) +void ram_block_notify_remove(void *host, size_t size, size_t max_size) { RAMBlockNotifier *notifier; QLIST_FOREACH(notifier, _list.ramblock_notifiers, next) { -notifier->ram_block_removed(notifier, host, size); +notifier->ram_block_removed(notifier, host, size, max_size); +} +} + +void ram_block_notify_resize(void *host, size_t old_size, size_t new_size) +{ +RAMBlockNotifier *notifier; + +QLIST_FOREACH(notifier, _list.ramblock_notifiers, next) { +if (notifier->ram_block_resized) { +notifier->ram_block_resized(notifier, host, old_size, new_size); +} } } diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c index 5b120ed44b..d6dcea65d1 100644 --- a/hw/i386/xen/xen-mapcache.c +++ b/hw/i386/xen/xen-mapcache.c @@ -169,7 +169,8 @@ static void xen_remap_bucket(MapCacheEntry *entry, if
[PATCH v2 07/13] migration/ram: Discard RAM when growing RAM blocks after ram_postcopy_incoming_init()
In case we grow our RAM after ram_postcopy_incoming_init() (e.g., when synchronizing the RAM block state with the migration source), the resized part would not get discarded. Let's perform that when being notified about a resize while postcopy has been advised, but is not listening yet. With precopy, the process is as following: 1. VM created - RAM blocks are created 2. Incomming migration started - Postcopy is advised - All pages in RAM blocks are discarded 3. Precopy starts - RAM blocks are resized to match the size on the migration source. - RAM pages from precopy stream are loaded - Uffd handler is registered, postcopy starts listening 3. Guest started, postcopy running - Pagefaults get resolved, pages get placed Cc: "Dr. David Alan Gilbert" Cc: Juan Quintela Cc: Peter Xu Signed-off-by: David Hildenbrand --- migration/ram.c | 30 ++ 1 file changed, 30 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index 39c7d1c4a6..d5a4d69e1c 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3714,6 +3714,7 @@ static SaveVMHandlers savevm_ram_handlers = { static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host, size_t old_size, size_t new_size) { +PostcopyState ps = postcopy_state_get(); ram_addr_t offset; Error *err = NULL; RAMBlock *rb = qemu_ram_block_from_host(host, false, ); @@ -3734,6 +3735,35 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host, error_free(err); migration_cancel(); } + +switch (ps) { +case POSTCOPY_INCOMING_ADVISE: +/* + * Update what ram_postcopy_incoming_init()->init_range() does at the + * time postcopy was advised. Syncing RAM blocks with the source will + * result in RAM resizes. + */ +if (old_size < new_size) { +if (ram_discard_range(rb->idstr, old_size, new_size - old_size)) { +error_report("RAM block '%s' discard of resized RAM failed", + rb->idstr); +} +} +break; +case POSTCOPY_INCOMING_NONE: +case POSTCOPY_INCOMING_RUNNING: +case POSTCOPY_INCOMING_END: +/* + * Once our guest is running, postcopy does no longer care about + * resizes. When growing, the new memory was not available on the + * source, no handler needed. + */ +break; +default: +error_report("RAM block '%s' resized during postcopy state: %d", + rb->idstr, ps); +exit(-1); +} } static RAMBlockNotifier ram_mig_ram_notifier = { -- 2.24.1
[PATCH v2 04/13] numa: Make all callbacks of ram block notifiers optional
Let's make add/remove optional. We want to introduce a RAM block notifier for RAM migration, that's only interested in resizes. Reviewed-by: Peter Xu Cc: Richard Henderson Cc: Paolo Bonzini Cc: Eduardo Habkost Cc: Marcel Apfelbaum Cc: Peter Xu Signed-off-by: David Hildenbrand --- hw/core/numa.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/hw/core/numa.c b/hw/core/numa.c index e28ad24fcd..4270b268c8 100644 --- a/hw/core/numa.c +++ b/hw/core/numa.c @@ -915,8 +915,11 @@ static int ram_block_notify_add_single(RAMBlock *rb, void *opaque) void ram_block_notifier_add(RAMBlockNotifier *n) { QLIST_INSERT_HEAD(_list.ramblock_notifiers, n, next); + /* Notify about all existing ram blocks. */ -qemu_ram_foreach_block(ram_block_notify_add_single, n); +if (n->ram_block_added) { +qemu_ram_foreach_block(ram_block_notify_add_single, n); +} } void ram_block_notifier_remove(RAMBlockNotifier *n) @@ -929,7 +932,9 @@ void ram_block_notify_add(void *host, size_t size, size_t max_size) RAMBlockNotifier *notifier; QLIST_FOREACH(notifier, _list.ramblock_notifiers, next) { -notifier->ram_block_added(notifier, host, size, max_size); +if (notifier->ram_block_added) { +notifier->ram_block_added(notifier, host, size, max_size); +} } } @@ -938,7 +943,9 @@ void ram_block_notify_remove(void *host, size_t size, size_t max_size) RAMBlockNotifier *notifier; QLIST_FOREACH(notifier, _list.ramblock_notifiers, next) { -notifier->ram_block_removed(notifier, host, size, max_size); +if (notifier->ram_block_removed) { +notifier->ram_block_removed(notifier, host, size, max_size); +} } } -- 2.24.1
[PATCH v2 06/13] exec: Relax range check in ram_block_discard_range()
We want to make use of ram_block_discard_range() in the RAM block resize callback when growing a RAM block, *before* used_length is changed. Let's relax the check. We always have a reserved mapping for the whole max_length, so we cannot corrupt unrelated data. Cc: Richard Henderson Cc: Paolo Bonzini Cc: Eduardo Habkost Cc: Peter Xu Signed-off-by: David Hildenbrand --- exec.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/exec.c b/exec.c index 8b015821d6..8737acedab 100644 --- a/exec.c +++ b/exec.c @@ -3915,7 +3915,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length) goto err; } -if ((start + length) <= rb->used_length) { +if ((start + length) <= rb->max_length) { bool need_madvise, need_fallocate; if (!QEMU_IS_ALIGNED(length, rb->page_size)) { error_report("ram_block_discard_range: Unaligned length: %zx", @@ -3982,7 +3982,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length) } else { error_report("ram_block_discard_range: Overrun block '%s' (%" PRIu64 "/%zx/" RAM_ADDR_FMT")", - rb->idstr, start, length, rb->used_length); + rb->idstr, start, length, rb->max_length); } err: -- 2.24.1
[PATCH v2 05/13] migration/ram: Handle RAM block resizes during precopy
Resizing while migrating is dangerous and does not work as expected. The whole migration code works on the usable_length of ram blocks and does not expect this to change at random points in time. In the case of precopy, the ram block size must not change on the source, after syncing the RAM block list in ram_save_setup(), so as long as the guest is still running on the source. Resizing can be trigger *after* (but not during) a reset in ACPI code by the guest - hw/arm/virt-acpi-build.c:acpi_ram_update() - hw/i386/acpi-build.c:acpi_ram_update() Use the ram block notifier to get notified about resizes. Let's simply cancel migration and indicate the reason. We'll continue running on the source. No harm done. Update the documentation. Postcopy will be handled separately. Cc: "Dr. David Alan Gilbert" Cc: Juan Quintela Cc: Eduardo Habkost Cc: Paolo Bonzini Cc: Igor Mammedov Cc: "Michael S. Tsirkin" Cc: Richard Henderson Cc: Shannon Zhao Cc: Alex Bennée Cc: Peter Xu Signed-off-by: David Hildenbrand --- exec.c| 5 +++-- include/exec/memory.h | 10 ++ migration/migration.c | 9 +++-- migration/migration.h | 1 + migration/ram.c | 31 +++ 5 files changed, 48 insertions(+), 8 deletions(-) diff --git a/exec.c b/exec.c index b75250e773..8b015821d6 100644 --- a/exec.c +++ b/exec.c @@ -2120,8 +2120,9 @@ static int memory_try_enable_merging(void *addr, size_t len) return qemu_madvise(addr, len, QEMU_MADV_MERGEABLE); } -/* Only legal before guest might have detected the memory size: e.g. on - * incoming migration, or right after reset. +/* + * Resizing RAM while migrating can result in the migration being canceled. + * Care has to be taken if the guest might have already detected the memory. * * As memory core doesn't know how is memory accessed, it is up to * resize callback to update device state and/or add assertions to detect diff --git a/include/exec/memory.h b/include/exec/memory.h index e85b7de99a..de111347e8 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -113,7 +113,7 @@ typedef struct IOMMUNotifier IOMMUNotifier; #define RAM_SHARED (1 << 1) /* Only a portion of RAM (used_length) is actually used, and migrated. - * This used_length size can change across reboots. + * Resizing RAM while migrating can result in the migration being canceled. */ #define RAM_RESIZEABLE (1 << 2) @@ -843,7 +843,9 @@ void memory_region_init_ram_shared_nomigrate(MemoryRegion *mr, * RAM. Accesses into the region will * modify memory directly. Only an initial * portion of this RAM is actually used. - * The used size can change across reboots. + * Changing the size while migrating + * can result in the migration being + * canceled. * * @mr: the #MemoryRegion to be initialized. * @owner: the object that tracks the region's reference count @@ -1464,8 +1466,8 @@ void *memory_region_get_ram_ptr(MemoryRegion *mr); /* memory_region_ram_resize: Resize a RAM region. * - * Only legal before guest might have detected the memory size: e.g. on - * incoming migration, or right after reset. + * Resizing RAM while migrating can result in the migration being canceled. + * Care has to be taken if the guest might have already detected the memory. * * @mr: a memory region created with @memory_region_init_resizeable_ram. * @newsize: the new size the region diff --git a/migration/migration.c b/migration/migration.c index 8fb68795dc..ac9751dbe5 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -175,13 +175,18 @@ void migration_object_init(void) } } +void migration_cancel(void) +{ +migrate_fd_cancel(current_migration); +} + void migration_shutdown(void) { /* * Cancel the current migration - that will (eventually) * stop the migration using this structure */ -migrate_fd_cancel(current_migration); +migration_cancel(); object_unref(OBJECT(current_migration)); } @@ -2019,7 +2024,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, void qmp_migrate_cancel(Error **errp) { -migrate_fd_cancel(migrate_get_current()); +migration_cancel(); } void qmp_migrate_continue(MigrationStatus state, Error **errp) diff --git a/migration/migration.h b/migration/migration.h index 8473ddfc88..79fd74afa5 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -343,5 +343,6 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque); void migration_make_urgent_request(void); void migration_consume_urgent_request(void); bool migration_rate_limit(void); +void migration_cancel(void); #endif diff --git a/migration/ram.c b/migration/ram.c index ed23ed1c7c..39c7d1c4a6 100644 ---
[PATCH v2 02/13] stubs/ram-block: Remove stubs that are no longer needed
Current code no longer needs these stubs to compile. Let's just remove them. Reviewed-by: Peter Xu Cc: Richard Henderson Cc: Paolo Bonzini Cc: Eduardo Habkost Cc: Peter Xu Signed-off-by: David Hildenbrand --- stubs/ram-block.c | 20 1 file changed, 20 deletions(-) diff --git a/stubs/ram-block.c b/stubs/ram-block.c index 73c0a3ee08..10855b52dd 100644 --- a/stubs/ram-block.c +++ b/stubs/ram-block.c @@ -2,21 +2,6 @@ #include "exec/ramlist.h" #include "exec/cpu-common.h" -void *qemu_ram_get_host_addr(RAMBlock *rb) -{ -return 0; -} - -ram_addr_t qemu_ram_get_offset(RAMBlock *rb) -{ -return 0; -} - -ram_addr_t qemu_ram_get_used_length(RAMBlock *rb) -{ -return 0; -} - void ram_block_notifier_add(RAMBlockNotifier *n) { } @@ -24,8 +9,3 @@ void ram_block_notifier_add(RAMBlockNotifier *n) void ram_block_notifier_remove(RAMBlockNotifier *n) { } - -int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque) -{ -return 0; -} -- 2.24.1
[PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating
This is the follow up of "[PATCH RFC] memory: Don't allow to resize RAM while migrating" [1] This series contains some (slightly modified) patches also contained in: "[PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX" [2] That series will be based on this series. The last patch (#13) in this series could be moved to the other series, but I decided to include it in here for now (similar context). I realized that resizing RAM blocks while the guest is being migrated (precopy: resize while still running on the source, postcopy: resize while already running on the target) is buggy. In case of precopy, we can simply cancel migration. Postcopy handling is more involved. Resizing can currently happen during a guest reboot, triggered by ACPI rebuilds. Along with the fixes, some cleanups. [1] https://lkml.kernel.org/r/20200213172016.196609-1-da...@redhat.com [2] https://lkml.kernel.org/r/20200212134254.11073-1-da...@redhat.com I was now able to actually test resizing while migrating. I am using the prototype of virtio-mem to test (which also makes use of resizable allocations). Things I was able to reproduce: - Resize while still running on the migration source. Migration is canceled -- Test case for "migraton/ram: Handle RAM block resizes during precopy" - Resize (grow+shrink) on the migration target during postcopy migration (when syncing RAM blocks), while not yet running on the target -- Test case for "migration/ram: Discard new RAM when growing RAM blocks and the VM is stopped", and overall RAM size synchronization. Seems to work just fine. - Resize (grow+shrink) on the migration tagret during postcopy migration while already running on the target. -- Test case for "migration/ram: Handle RAM block resizes during postcopy" -- Test case for "migration/ram: Tolerate partially changed mappings in postcopy code" - I can see that -ENOENT is actually triggered and that migration succeeds. Migration seems to work just fine. In addition I run avocado-vt migration tests + usual QEMU checks. v1 -> v2: - "util: vfio-helpers: Factor out and fix processing of existing ram blocks" -- Stringify error - "migraton/ram: Handle RAM block resizes during precopy" -- Simplified check if we're migrating on the source - "exec: Relax range check in ram_block_discard_range()" -- Added to make discard during resizes actually work - "migration/ram: Discard new RAM when growing RAM blocks after ram_postcopy_incoming_init()" -- Better checks if in the right postcopy mode. -- Better patch subject/description/comments - "migration/ram: Handle RAM block resizes during postcopy" -- Better comments -- Adapt to changed postcopy checks - "migrate/ram: Get rid of "place_source" in ram_load_postcopy()" -- Dropped, as broken - "migration/ram: Tolerate partially changed mappings in postcopy code" -- Better comment / description. Clarify that no implicit wakeup will happen -- Warn on EINVAL (older kernels) -- Wake up any waiter explicitly David Hildenbrand (13): util: vfio-helpers: Factor out and fix processing of existing ram blocks stubs/ram-block: Remove stubs that are no longer needed numa: Teach ram block notifiers about resizeable ram blocks numa: Make all callbacks of ram block notifiers optional migration/ram: Handle RAM block resizes during precopy exec: Relax range check in ram_block_discard_range() migration/ram: Discard RAM when growing RAM blocks after ram_postcopy_incoming_init() migration/ram: Simplify host page handling in ram_load_postcopy() migration/ram: Consolidate variable reset after placement in ram_load_postcopy() migration/ram: Handle RAM block resizes during postcopy migration/multifd: Print used_length of memory block migration/ram: Use offset_in_ramblock() in range checks migration/ram: Tolerate partially changed mappings in postcopy code exec.c | 27 +-- hw/core/numa.c | 41 +-- hw/i386/xen/xen-mapcache.c | 7 +- include/exec/cpu-common.h | 1 + include/exec/memory.h | 10 +-- include/exec/ramblock.h| 10 +++ include/exec/ramlist.h | 13 ++-- migration/migration.c | 9 ++- migration/migration.h | 1 + migration/multifd.c| 2 +- migration/postcopy-ram.c | 52 +- migration/ram.c| 144 - stubs/ram-block.c | 20 -- target/i386/hax-mem.c | 5 +- target/i386/sev.c | 18 ++--- util/vfio-helpers.c| 41 --- 16 files changed, 283 insertions(+), 118 deletions(-) -- 2.24.1
[PATCH v2 01/13] util: vfio-helpers: Factor out and fix processing of existing ram blocks
Factor it out into common code when a new notifier is registered, just as done with the memory region notifier. This allows us to have the logic about how to process existing ram blocks at a central place (which will be extended soon). Just like when adding a new ram block, we have to register the max_length for now. We don't have a way to get notified about resizes yet, and some memory would not be mapped when growing the ram block. Note: Currently, ram blocks are only "fake resized". All memory (max_length) is accessible. We can get rid of a bunch of functions in stubs/ram-block.c . Print the warning from inside qemu_vfio_ram_block_added(). Reviewed-by: Peter Xu Cc: Richard Henderson Cc: Paolo Bonzini Cc: Eduardo Habkost Cc: Marcel Apfelbaum Cc: Alex Williamson Cc: Stefan Hajnoczi Cc: Peter Xu Signed-off-by: David Hildenbrand --- exec.c| 5 + hw/core/numa.c| 14 ++ include/exec/cpu-common.h | 1 + util/vfio-helpers.c | 29 - 4 files changed, 28 insertions(+), 21 deletions(-) diff --git a/exec.c b/exec.c index 8e9cc3b47c..dfd43d27c6 100644 --- a/exec.c +++ b/exec.c @@ -2016,6 +2016,11 @@ ram_addr_t qemu_ram_get_used_length(RAMBlock *rb) return rb->used_length; } +ram_addr_t qemu_ram_get_max_length(RAMBlock *rb) +{ +return rb->max_length; +} + bool qemu_ram_is_shared(RAMBlock *rb) { return rb->flags & RAM_SHARED; diff --git a/hw/core/numa.c b/hw/core/numa.c index 0d1b4be76a..6599c69e05 100644 --- a/hw/core/numa.c +++ b/hw/core/numa.c @@ -899,9 +899,23 @@ void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms) } } +static int ram_block_notify_add_single(RAMBlock *rb, void *opaque) +{ +const ram_addr_t max_size = qemu_ram_get_max_length(rb); +void *host = qemu_ram_get_host_addr(rb); +RAMBlockNotifier *notifier = opaque; + +if (host) { +notifier->ram_block_added(notifier, host, max_size); +} +return 0; +} + void ram_block_notifier_add(RAMBlockNotifier *n) { QLIST_INSERT_HEAD(_list.ramblock_notifiers, n, next); +/* Notify about all existing ram blocks. */ +qemu_ram_foreach_block(ram_block_notify_add_single, n); } void ram_block_notifier_remove(RAMBlockNotifier *n) diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h index 81753bbb34..9760ac9068 100644 --- a/include/exec/cpu-common.h +++ b/include/exec/cpu-common.h @@ -59,6 +59,7 @@ const char *qemu_ram_get_idstr(RAMBlock *rb); void *qemu_ram_get_host_addr(RAMBlock *rb); ram_addr_t qemu_ram_get_offset(RAMBlock *rb); ram_addr_t qemu_ram_get_used_length(RAMBlock *rb); +ram_addr_t qemu_ram_get_max_length(RAMBlock *rb); bool qemu_ram_is_shared(RAMBlock *rb); bool qemu_ram_is_uf_zeroable(RAMBlock *rb); void qemu_ram_set_uf_zeroable(RAMBlock *rb); diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c index ddd9a96e76..260570ae19 100644 --- a/util/vfio-helpers.c +++ b/util/vfio-helpers.c @@ -376,8 +376,14 @@ static void qemu_vfio_ram_block_added(RAMBlockNotifier *n, void *host, size_t size) { QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier); +int ret; + trace_qemu_vfio_ram_block_added(s, host, size); -qemu_vfio_dma_map(s, host, size, false, NULL); +ret = qemu_vfio_dma_map(s, host, size, false, NULL); +if (ret) { +error_report("qemu_vfio_dma_map(%p, %zu) failed: %s", host, size, + strerror(-ret)); +} } static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n, @@ -390,33 +396,14 @@ static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n, } } -static int qemu_vfio_init_ramblock(RAMBlock *rb, void *opaque) -{ -void *host_addr = qemu_ram_get_host_addr(rb); -ram_addr_t length = qemu_ram_get_used_length(rb); -int ret; -QEMUVFIOState *s = opaque; - -if (!host_addr) { -return 0; -} -ret = qemu_vfio_dma_map(s, host_addr, length, false, NULL); -if (ret) { -fprintf(stderr, "qemu_vfio_init_ramblock: failed %p %" PRId64 "\n", -host_addr, (uint64_t)length); -} -return 0; -} - static void qemu_vfio_open_common(QEMUVFIOState *s) { qemu_mutex_init(>lock); s->ram_notifier.ram_block_added = qemu_vfio_ram_block_added; s->ram_notifier.ram_block_removed = qemu_vfio_ram_block_removed; -ram_block_notifier_add(>ram_notifier); s->low_water_mark = QEMU_VFIO_IOVA_MIN; s->high_water_mark = QEMU_VFIO_IOVA_MAX; -qemu_ram_foreach_block(qemu_vfio_init_ramblock, s); +ram_block_notifier_add(>ram_notifier); } /** -- 2.24.1
Re: [PATCH] maint: Include top-level *.rst files early in git diff
Peter Maydell writes: > On Thu, 20 Feb 2020 at 16:22, Eric Blake wrote: >> >> We are converting more doc files to *.rst rather than *.texi. Most >> doc files are already listed early in diffs due to our catchall >> docs/*, but a few top-level files get missed by that glob. >> >> Signed-off-by: Eric Blake >> --- >> >> Both *.texi and *.rst entries make sense while we are still converting >> things, but we'll need a followup to drop *.texi when the conversion >> is complete... > > I was wondering what rst files we had outside docs, and > the answer is "README.rst, CODING_STYLE.rst and > tests/acceptance/README.rst". > > (There's a reasonable argument for moving CODING_STYLE.rst to > docs/devel now; depends how highly you rate having it more > "discoverable" to new contributors.) The little cynic in the back of my head cackles and points out that CODING_STYLE.rst exists not for new contributors to discover, but so that old contributors have something to smack on new heads. SCNR :)
Re: [PATCH v5 0/5] gpio: Add GPIO Aggregator
Hi Linus and Bartosz, On Tue, Feb 18, 2020 at 4:18 PM Geert Uytterhoeven wrote: > GPIO controllers are exported to userspace using /dev/gpiochip* > character devices. Access control to these devices is provided by > standard UNIX file system permissions, on an all-or-nothing basis: > either a GPIO controller is accessible for a user, or it is not. > Currently no mechanism exists to control access to individual GPIOs. > > Hence this adds a GPIO driver to aggregate existing GPIOs, and expose > them as a new gpiochip. This is useful for implementing access control, > and assigning a set of GPIOs to a specific user. Furthermore, this > simplifies and hardens exporting GPIOs to a virtual machine, as the VM > can just grab the full GPIO controller, and no longer needs to care > about which GPIOs to grab and which not, reducing the attack surface. Do you have any more comments, before I respin and post v6? Thanks, and have a niec weekend! Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH v5 4/5] docs: gpio: Add GPIO Aggregator documentation
On 2/21/20 8:34 AM, Geert Uytterhoeven wrote: > Hi Randy, > Hi Geert, Those changes look good. Thanks. -- ~Randy
Re: [PATCH 0/3] hw: More dma_memory_read/write() API cleanup
On 21/02/20 14:25, Philippe Mathieu-Daudé wrote: > Following up "global exec/memory/dma APIs cleanup" > https://www.mail-archive.com/qemu-devel@nongnu.org/msg681475.html > > Few more cleanups in PCNET & SCSI ESP devices. Can you send a pull request for everything? Paolo
[PATCH] accel/kvm: Check ioctl(KVM_SET_USER_MEMORY_REGION) return value
kvm_vm_ioctl() can fail, check its return value, and log an error when it failed. This fixes Coverity CID 1412229: Unchecked return value (CHECKED_RETURN) check_return: Calling kvm_vm_ioctl without checking return value Reported-by: Coverity (CID 1412229) Fixes: 235e8982ad3 ("support using KVM_MEM_READONLY flag for regions") Signed-off-by: Philippe Mathieu-Daudé --- accel/kvm/kvm-all.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index c111312dfd..6df3a4d030 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -308,13 +308,23 @@ static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, boo /* Set the slot size to 0 before setting the slot to the desired * value. This is needed based on KVM commit 75d61fbc. */ mem.memory_size = 0; -kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, ); +ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, ); +if (ret < 0) { +goto err; +} } mem.memory_size = slot->memory_size; ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, ); slot->old_flags = mem.flags; +err: trace_kvm_set_user_memory(mem.slot, mem.flags, mem.guest_phys_addr, mem.memory_size, mem.userspace_addr, ret); +if (ret < 0) { +error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d," + " start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s", + __func__, mem.slot, slot->start_addr, + (uint64_t)mem.memory_size, strerror(errno)); +} return ret; } -- 2.21.1
Re: [PATCH v1 05/13] migrate/ram: Handle RAM block resizes during precopy
On Fri, Feb 21, 2020 at 11:19:58AM +0100, David Hildenbrand wrote: [...] > Lol, man this code is confusing. (Yes in many places it is...) > > So, migration_is_idle() really only checks current_migration, not > current_incoming. > > I didn't expect to be knees deep in migration code at this point ... > > if (migration_is_idle()) { > return: > } Yes this seems to work too for postcopy, though may worth add a comment showing the fact... Thanks, -- Peter Xu
[PATCH v2 1/2] block/curl: HTTP header fields allow whitespace around values
RFC 7230 section 3.2 indicates that whitespace is permitted between the field name and field value and after the field value. Signed-off-by: David Edmondson --- block/curl.c | 31 +++ 1 file changed, 27 insertions(+), 4 deletions(-) diff --git a/block/curl.c b/block/curl.c index f86299378e38..f9ffb7f4e2bf 100644 --- a/block/curl.c +++ b/block/curl.c @@ -214,11 +214,34 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque) { BDRVCURLState *s = opaque; size_t realsize = size * nmemb; -const char *accept_line = "Accept-Ranges: bytes"; +const char *header = (char *)ptr; +const char *end = header + realsize; +const char *accept_ranges = "Accept-Ranges:"; +const char *bytes = "bytes"; -if (realsize >= strlen(accept_line) -&& strncmp((char *)ptr, accept_line, strlen(accept_line)) == 0) { -s->accept_range = true; +if (realsize >= strlen(accept_ranges) +&& strncmp(header, accept_ranges, strlen(accept_ranges)) == 0) { + +char *p = strchr(header, ':') + 1; + +/* Skip whitespace between the header name and value. */ +while (p < end && *p && g_ascii_isspace(*p)) { +p++; +} + +if (end - p >= strlen(bytes) +&& strncmp(p, bytes, strlen(bytes)) == 0) { + +/* Check that there is nothing but whitespace after the value. */ +p += strlen(bytes); +while (p < end && *p && g_ascii_isspace(*p)) { +p++; +} + +if (p == end || !*p) { +s->accept_range = true; +} +} } return realsize; -- 2.24.1
Re: [PATCH v7 01/11] qapi/error: add (Error **errp) cleaning APIs
Vladimir Sementsov-Ogievskiy writes: > 21.02.2020 10:38, Markus Armbruster wrote: >> Vladimir Sementsov-Ogievskiy writes: >> >>> Add functions to clean Error **errp: call corresponding Error *err >>> cleaning function an set pointer to NULL. >>> >>> New functions: >>>error_free_errp >>>error_report_errp >>>warn_report_errp >>> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy >>> Reviewed-by: Greg Kurz >>> Reviewed-by: Eric Blake >>> --- >>> >>> CC: Eric Blake >>> CC: Kevin Wolf >>> CC: Max Reitz >>> CC: Greg Kurz >>> CC: Stefano Stabellini >>> CC: Anthony Perard >>> CC: Paul Durrant >>> CC: Stefan Hajnoczi >>> CC: "Philippe Mathieu-Daudé" >>> CC: Laszlo Ersek >>> CC: Gerd Hoffmann >>> CC: Stefan Berger >>> CC: Markus Armbruster >>> CC: Michael Roth >>> CC: qemu-bl...@nongnu.org >>> CC: xen-de...@lists.xenproject.org >>> >>> include/qapi/error.h | 26 ++ >>> 1 file changed, 26 insertions(+) >>> >>> diff --git a/include/qapi/error.h b/include/qapi/error.h >>> index ad5b6e896d..d34987148d 100644 >>> --- a/include/qapi/error.h >>> +++ b/include/qapi/error.h >>> @@ -309,6 +309,32 @@ void warn_reportf_err(Error *err, const char *fmt, ...) >>> void error_reportf_err(Error *err, const char *fmt, ...) >>> GCC_FMT_ATTR(2, 3); >>> +/* >>> + * Functions to clean Error **errp: call corresponding Error *err cleaning >>> + * function, then set pointer to NULL. >>> + */ >>> +static inline void error_free_errp(Error **errp) >>> +{ >>> +assert(errp && *errp); >>> +error_free(*errp); >>> +*errp = NULL; >>> +} >>> + >>> +static inline void error_report_errp(Error **errp) >>> +{ >>> +assert(errp && *errp); >>> +error_report_err(*errp); >>> +*errp = NULL; >>> +} >>> + >>> +static inline void warn_report_errp(Error **errp) >>> +{ >>> +assert(errp && *errp); >>> +warn_report_err(*errp); >>> +*errp = NULL; >>> +} >>> + >>> + >>> /* >>>* Just like error_setg(), except you get to specify the error class. >>>* Note: use of error classes other than ERROR_CLASS_GENERIC_ERROR is >> >> These appear to be unused apart from the Coccinelle script in PATCH 03. >> >> They are used in the full "[RFC v5 000/126] error: auto propagated >> local_err" series. Options: >> >> 1. Pick a few more patches into this part I series, so these guys come >> with users. > > It needs some additional effort for this series.. But it's possible. Still, > I think that we at least should not pull out patches which should be in > future series (for example from ppc or block/).. Yes, we want to keep related stuff together. > Grepping through v5: > for x in {warn_report_errp,error_report_errp,error_free_errp}; do echo == $x > ==; git grep -l $x | grep -v coccinelle | grep -v 'error\.h'; echo; done > == warn_report_errp == > block/file-posix.c > hw/ppc/spapr.c > hw/ppc/spapr_caps.c > hw/ppc/spapr_irq.c > hw/vfio/pci.c > net/tap.c > qom/object.c > > == error_report_errp == > hw/block/vhost-user-blk.c > util/oslib-posix.c > > == error_free_errp == > block.c > block/qapi.c > block/sheepdog.c > block/snapshot.c > blockdev.c > chardev/char-socket.c > hw/audio/intel-hda.c > hw/core/qdev-properties.c > hw/pci-bridge/pci_bridge_dev.c > hw/pci-bridge/pcie_pci_bridge.c > hw/scsi/megasas.c > hw/scsi/mptsas.c > hw/usb/hcd-xhci.c > io/net-listener.c > migration/colo.c > qga/commands-posix.c > qga/commands-win32.c > util/qemu-sockets.c > > What do you want to add? PATCH v5 032 uses both error_report_errp() and error_free_errp(). Adding warn_report_errp() without a user is okay with me. What do you think? If there are patches you consider related to 032, feel free to throw them in. >> >> 2. Punt this patch to the first part that has users, along with the >> part of the Coccinelle script that deals with them. > > But coccinelle script would be wrong, if we drop this part from it. I think, > that after commit which adds coccinelle script, it should work with any file, > not only subset of these series. > > So, it's probably OK for now to drop these functions, forcing their addition > if > coccinelle script will be applied where these functions are needed. We can, > for > example comment these three functions. > > Splitting coccinelle script into two parts, which will be in different series > will > not help any patch-porting processes. > > Moreover, this will create dependencies between future series updating other > files. > > So, I don't like [2.].. And it's likely more work than 1. >> 3. Do nothing: accept the functions without users. > > OK for me) > >> >> I habitually dislike 3., but reviewing the rest of this series might >> make me override that dislike. [...]
Re: [PATCH v5 4/5] docs: gpio: Add GPIO Aggregator documentation
Hi Randy, On Tue, Feb 18, 2020 at 7:30 PM Randy Dunlap wrote: > On 2/18/20 7:18 AM, Geert Uytterhoeven wrote: > > Document the GPIO Aggregator, and the two typical use-cases. > > > > Signed-off-by: Geert Uytterhoeven > > --- /dev/null > > +++ b/Documentation/admin-guide/gpio/gpio-aggregator.rst > > @@ -0,0 +1,102 @@ > > +.. SPDX-License-Identifier: GPL-2.0-only > > + > > +GPIO Aggregator > > +=== > > + > > +The GPIO Aggregator allows to aggregate GPIOs, and expose them as a new > > "allows" really wants an object following the verb [although the kernel > sources > and docs have many cases of it not having an object]. Something like > >allows {you, one, someone, users, a user} to aggregate Changing to: provides a mechanism to aggregate GPIOs > > +gpio_chip. This supports the following use cases. > > + > > + > > +Aggregating GPIOs using Sysfs > > +- > > + > > +GPIO controllers are exported to userspace using /dev/gpiochip* character > > +devices. Access control to these devices is provided by standard UNIX file > > +system permissions, on an all-or-nothing basis: either a GPIO controller is > > +accessible for a user, or it is not. > > + > > +The GPIO Aggregator allows access control for individual GPIOs, by > > aggregating Changing to: provides access control for a set of one or more GPIOs > > +them into a new gpio_chip, which can be assigned to a group or user using > > +standard UNIX file ownership and permissions. Furthermore, this > > simplifies and > > +hardens exporting GPIOs to a virtual machine, as the VM can just grab the > > full > > +GPIO controller, and no longer needs to care about which GPIOs to grab and > > +which not, reducing the attack surface. > > +Generic GPIO Driver > > +--- > > + > > +The GPIO Aggregator can also be used as a generic driver for a simple > > +GPIO-operated device described in DT, without a dedicated in-kernel driver. > > +This is useful in industrial control, and is not unlike e.g. spidev, which > > +allows to communicate with an SPI device from userspace. > >allows {choose an object} to communicate Changing to: allows the user to communicate Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
[PATCH v2 0/2] block/curl: Improve HTTP header parsing
An HTTP object store of my acquaintance returns "accept-ranges: bytes" (all lower case) as a header, causing the QEMU curl backend to refuse to talk to it. RFC 7230 says that HTTP headers are case insensitive, so update the curl backend accordingly. At the same time, allow for arbitrary white space around the HTTP header field value, as required by the RFC. v2: - strncasecmp -> g_ascii_strncasecmp (Philippe). - isspace -> g_ascii_isspace, for good measure. David Edmondson (2): block/curl: HTTP header fields allow whitespace around values block/curl: HTTP header field names are case insensitive block/curl.c | 31 +++ 1 file changed, 27 insertions(+), 4 deletions(-) -- 2.24.1
[PATCH v2 2/2] block/curl: HTTP header field names are case insensitive
RFC 7230 section 3.2 indicates that HTTP header field names are case insensitive. Signed-off-by: David Edmondson --- block/curl.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/block/curl.c b/block/curl.c index f9ffb7f4e2bf..1421e8fb9815 100644 --- a/block/curl.c +++ b/block/curl.c @@ -216,11 +216,11 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque) size_t realsize = size * nmemb; const char *header = (char *)ptr; const char *end = header + realsize; -const char *accept_ranges = "Accept-Ranges:"; +const char *accept_ranges = "accept-ranges:"; const char *bytes = "bytes"; if (realsize >= strlen(accept_ranges) -&& strncmp(header, accept_ranges, strlen(accept_ranges)) == 0) { +&& g_ascii_strncasecmp(header, accept_ranges, strlen(accept_ranges)) == 0) { char *p = strchr(header, ':') + 1; -- 2.24.1
[PATCH] hw/unicore32/puv3: Simplify puv3_soc_init()
Since commit d8ed887bdc, the puv3_intc_cpu_handler handler takes a pointer to UniCore32CPU in its opaque argument. Directly pass the cpu pointer. Signed-off-by: Philippe Mathieu-Daudé --- hw/unicore32/puv3.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/hw/unicore32/puv3.c b/hw/unicore32/puv3.c index 7e933de228..eec7f561eb 100644 --- a/hw/unicore32/puv3.c +++ b/hw/unicore32/puv3.c @@ -48,7 +48,7 @@ static void puv3_intc_cpu_handler(void *opaque, int irq, int level) } } -static void puv3_soc_init(CPUUniCore32State *env) +static void puv3_soc_init(UniCore32CPU *cpu) { qemu_irq cpu_intc, irqs[PUV3_IRQS_NR]; DeviceState *dev; @@ -56,8 +56,7 @@ static void puv3_soc_init(CPUUniCore32State *env) int i; /* Initialize interrupt controller */ -cpu_intc = qemu_allocate_irq(puv3_intc_cpu_handler, - env_archcpu(env), 0); +cpu_intc = qemu_allocate_irq(puv3_intc_cpu_handler, cpu, 0); dev = sysbus_create_simple("puv3_intc", PUV3_INTC_BASE, cpu_intc); for (i = 0; i < PUV3_IRQS_NR; i++) { irqs[i] = qdev_get_gpio_in(dev, i); @@ -131,7 +130,7 @@ static void puv3_init(MachineState *machine) cpu = UNICORE32_CPU(cpu_create(machine->cpu_type)); env = >env; -puv3_soc_init(env); +puv3_soc_init(cpu); puv3_board_init(env, ram_size); puv3_load_kernel(kernel_filename); } -- 2.21.1
Re: [PATCH v7 02/11] error: auto propagated local_err
Vladimir Sementsov-Ogievskiy writes: > 21.02.2020 12:19, Markus Armbruster wrote: >> Vladimir Sementsov-Ogievskiy writes: >> >>> Here is introduced ERRP_AUTO_PROPAGATE macro, to be used at start of >>> functions with an errp OUT parameter. >>> >>> It has three goals: >>> >>> 1. Fix issue with error_fatal and error_prepend/error_append_hint: user >>> can't see this additional information, because exit() happens in >>> error_setg earlier than information is added. [Reported by Greg Kurz] >>> >>> 2. Fix issue with error_abort and error_propagate: when we wrap >>> error_abort by local_err+error_propagate, the resulting coredump will >>> refer to error_propagate and not to the place where error happened. >>> (the macro itself doesn't fix the issue, but it allows us to [3.] drop >>> the local_err+error_propagate pattern, which will definitely fix the >>> issue) [Reported by Kevin Wolf] >>> >>> 3. Drop local_err+error_propagate pattern, which is used to workaround >>> void functions with errp parameter, when caller wants to know resulting >>> status. (Note: actually these functions could be merely updated to >>> return int error code). >>> >>> To achieve these goals, later patches will add invocations >>> of this macro at the start of functions with either use >>> error_prepend/error_append_hint (solving 1) or which use >>> local_err+error_propagate to check errors, switching those >>> functions to use *errp instead (solving 2 and 3). >>> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy >>> Reviewed-by: Greg Kurz >>> Reviewed-by: Eric Blake >>> --- >>> >>> CC: Eric Blake >>> CC: Kevin Wolf >>> CC: Max Reitz >>> CC: Greg Kurz >>> CC: Stefano Stabellini >>> CC: Anthony Perard >>> CC: Paul Durrant >>> CC: Stefan Hajnoczi >>> CC: "Philippe Mathieu-Daudé" >>> CC: Laszlo Ersek >>> CC: Gerd Hoffmann >>> CC: Stefan Berger >>> CC: Markus Armbruster >>> CC: Michael Roth >>> CC: qemu-bl...@nongnu.org >>> CC: xen-de...@lists.xenproject.org >>> >>> include/qapi/error.h | 83 +++- >>> 1 file changed, 82 insertions(+), 1 deletion(-) >>> >>> diff --git a/include/qapi/error.h b/include/qapi/error.h >>> index d34987148d..b9452d4806 100644 >>> --- a/include/qapi/error.h >>> +++ b/include/qapi/error.h >>> @@ -78,7 +78,7 @@ >>>* Call a function treating errors as fatal: >>>* foo(arg, _fatal); >>>* >>> - * Receive an error and pass it on to the caller: >>> + * Receive an error and pass it on to the caller (DEPRECATED*): >>>* Error *err = NULL; >>>* foo(arg, ); >>>* if (err) { >>> @@ -98,6 +98,50 @@ >>>* foo(arg, errp); >>>* for readability. >>>* >>> + * DEPRECATED* This pattern is deprecated now, the use ERRP_AUTO_PROPAGATE >>> macro >>> + * instead (defined below). >>> + * It's deprecated because of two things: >>> + * >>> + * 1. Issue with error_abort and error_propagate: when we wrap error_abort >>> by >>> + * local_err+error_propagate, the resulting coredump will refer to >>> + * error_propagate and not to the place where error happened. >>> + * >>> + * 2. A lot of extra code of the same pattern >>> + * >>> + * How to update old code to use ERRP_AUTO_PROPAGATE? >>> + * >>> + * All you need is to add ERRP_AUTO_PROPAGATE() invocation at function >>> start, >>> + * than you may safely dereference errp to check errors and do not need any >>> + * additional local Error variables or calls to error_propagate(). >>> + * >>> + * Example: >>> + * >>> + * old code >>> + * >>> + * void fn(..., Error **errp) { >>> + * Error *err = NULL; >>> + * foo(arg, ); >>> + * if (err) { >>> + * handle the error... >>> + * error_propagate(errp, err); >>> + * return; >>> + * } >>> + * ... >>> + * } >>> + * >>> + * updated code >>> + * >>> + * void fn(..., Error **errp) { >>> + * ERRP_AUTO_PROPAGATE(); >>> + * foo(arg, errp); >>> + * if (*errp) { >>> + * handle the error... >>> + * return; >>> + * } >>> + * ... >>> + * } >>> + * >>> + * >>>* Receive and accumulate multiple errors (first one wins): >>>* Error *err = NULL, *local_err = NULL; >>>* foo(arg, ); >> >> Let's explain what should be done *first*, and only then talk about the >> deprecated pattern and how to convert it to current usage. >> >>> @@ -348,6 +392,43 @@ void error_set_internal(Error **errp, >>> ErrorClass err_class, const char *fmt, ...) >>> GCC_FMT_ATTR(6, 7); >>> +typedef struct ErrorPropagator { >>> +Error *local_err; >>> +Error **errp; >>> +} ErrorPropagator; >>> + >>> +static inline void error_propagator_cleanup(ErrorPropagator *prop) >>> +{ >>> +error_propagate(prop->errp, prop->local_err); >>> +} >>> + >>> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagator, >>> error_propagator_cleanup); >>> + >>> +/* >>> + * ERRP_AUTO_PROPAGATE >>> + * >>> + *
[PATCH] target/i386: check for empty register in FXAM
The fxam instruction returns the wrong result after fdecstp or after an underflow. Check fptags to handle this. Reported-by: Signed-off-by: Paolo Bonzini --- target/i386/fpu_helper.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c index 99f28f267f..792a128a6d 100644 --- a/target/i386/fpu_helper.c +++ b/target/i386/fpu_helper.c @@ -991,7 +991,11 @@ void helper_fxam_ST0(CPUX86State *env) env->fpus |= 0x200; /* C1 <-- 1 */ } -/* XXX: test fptags too */ +if (env->fptags[env->fpstt]) { +env->fpus |= 0x4100; /* Empty */ +return; +} + expdif = EXPD(temp); if (expdif == MAXEXPD) { if (MANTD(temp) == 0x8000ULL) { -- 2.21.1
[PATCH] hw/mips/mips_int: Simplify cpu_mips_irq_init_cpu()
Since commit d8ed887bdc, the cpu_mips_irq_request handler takes a pointer to MIPSCPU in its opaque argument. Directly pass the cpu pointer. Signed-off-by: Philippe Mathieu-Daudé --- hw/mips/mips_int.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/mips/mips_int.c b/hw/mips/mips_int.c index 863ed45659..796730b11d 100644 --- a/hw/mips/mips_int.c +++ b/hw/mips/mips_int.c @@ -77,7 +77,7 @@ void cpu_mips_irq_init_cpu(MIPSCPU *cpu) qemu_irq *qi; int i; -qi = qemu_allocate_irqs(cpu_mips_irq_request, env_archcpu(env), 8); +qi = qemu_allocate_irqs(cpu_mips_irq_request, cpu, 8); for (i = 0; i < 8; i++) { env->irq[i] = qi[i]; } -- 2.21.1
[PULL v2 00/46] target-arm queue
v1->v2 changes: dropped the last 6 patches from rth as there's a problem with one of them that's too complicated to try to fix up. thanks -- PMM The following changes since commit a8c6af67e1e8d460e2c6e87070807e0a02c0fec2: Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.0-20200221' into staging (2020-02-21 14:20:42 +) are available in the Git repository at: https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200221-1 for you to fetch changes up to 9eb4f58918a851fb46895fd9b7ce579afeac9d02: target/arm: Set MVFR0.FPSP for ARMv5 cpus (2020-02-21 16:07:03 +) target-arm queue: * aspeed/scu: Implement chip ID register * hw/misc/iotkit-secctl: Fix writing to 'PPC Interrupt Clear' register * mainstone: Make providing flash images non-mandatory * z2: Make providing flash images non-mandatory * Fix failures to flush SVE high bits after AdvSIMD INS/ZIP/UZP/TRN/TBL/TBX/EXT * Minor performance improvement: spend less time recalculating hflags values * Code cleanup to isar_feature function tests * Implement ARMv8.1-PMU and ARMv8.4-PMU extensions * Bugfix: correct handling of PMCR_EL0.LC bit * Bugfix: correct definition of PMCRDP * Correctly implement ACTLR2, HACTLR2 * allwinner: Wire up USB ports * Vectorize emulation of USHL, SSHL, PMUL* * xilinx_spips: Correct the number of dummy cycles for the FAST_READ_4 cmd * sh4: Fix PCI ISA IO memory subregion Francisco Iglesias (1): xilinx_spips: Correct the number of dummy cycles for the FAST_READ_4 cmd Guenter Roeck (6): mainstone: Make providing flash images non-mandatory z2: Make providing flash images non-mandatory hw: usb: hcd-ohci: Move OHCISysBusState and TYPE_SYSBUS_OHCI to include file hcd-ehci: Introduce "companion-enable" sysbus property arm: allwinner: Wire up USB ports sh4: Fix PCI ISA IO memory subregion Joel Stanley (2): aspeed/scu: Create separate write callbacks aspeed/scu: Implement chip ID register Peter Maydell (21): target/arm: Add _aa32_ to isar_feature functions testing 32-bit ID registers target/arm: Check aa32_pan in take_aarch32_exception(), not aa64_pan target/arm: Add isar_feature_any_fp16 and document naming/usage conventions target/arm: Define and use any_predinv isar_feature test target/arm: Factor out PMU register definitions target/arm: Add and use FIELD definitions for ID_AA64DFR0_EL1 target/arm: Use FIELD macros for clearing ID_DFR0 PERFMON field target/arm: Define an aa32_pmu_8_1 isar feature test function target/arm: Add _aa64_ and _any_ versions of pmu_8_1 isar checks target/arm: Stop assuming DBGDIDR always exists target/arm: Move DBGDIDR into ARMISARegisters target/arm: Read debug-related ID registers from KVM target/arm: Implement ARMv8.1-PMU extension target/arm: Implement ARMv8.4-PMU extension target/arm: Provide ARMv8.4-PMU in '-cpu max' target/arm: Correct definition of PMCRDP target/arm: Correct handling of PMCR_EL0.LC bit target/arm: Test correct register in aa32_pan and aa32_ats1e1 checks target/arm: Use isar_feature function for testing AA32HPD feature target/arm: Use FIELD_EX32 for testing 32-bit fields target/arm: Correctly implement ACTLR2, HACTLR2 Philippe Mathieu-Daudé (1): hw/misc/iotkit-secctl: Fix writing to 'PPC Interrupt Clear' register Richard Henderson (15): target/arm: Flush high bits of sve register after AdvSIMD EXT target/arm: Flush high bits of sve register after AdvSIMD TBL/TBX target/arm: Flush high bits of sve register after AdvSIMD ZIP/UZP/TRN target/arm: Flush high bits of sve register after AdvSIMD INS target/arm: Use bit 55 explicitly for pauth target/arm: Fix select for aa64_va_parameters_both target/arm: Remove ttbr1_valid check from get_phys_addr_lpae target/arm: Split out aa64_va_parameter_tbi, aa64_va_parameter_tbid target/arm: Vectorize USHL and SSHL target/arm: Convert PMUL.8 to gvec target/arm: Convert PMULL.64 to gvec target/arm: Convert PMULL.8 to gvec target/arm: Rename isar_feature_aa32_simd_r32 target/arm: Use isar_feature_aa32_simd_r32 more places target/arm: Set MVFR0.FPSP for ARMv5 cpus hw/usb/hcd-ohci.h | 16 ++ include/hw/arm/allwinner-a10.h | 6 + target/arm/cpu.h | 145 ++--- target/arm/helper-sve.h| 2 + target/arm/helper.h| 21 +- target/arm/internals.h | 47 - target/arm/translate.h | 6 + hw/arm/allwinner-a10.c | 43 hw/arm/mainstone.c | 11 +- hw/arm/z2.c| 6 - hw/intc/armv7m_nvic.c | 10 +- hw/misc/aspeed_scu.c | 93 ++-- h
Re: [PATCH] target: i386: Check float overflow about register stack
On 21/02/20 15:09, Chen Gang wrote: >> -/* XXX: test fptags too */ >> +if (env->fptags[env->fpstt]) { >> +env->fpus |= 0x4100; /* Empty */ >> +return; >> +} >> + > For fpop overflow, this fix is enough, but for me, we still need > foverflow to check fpush/fld*_ST0 overflow. > > Don't you think we need check fpush/f[i]ld*_ST0 overflow? After fld/fild or any other push, FXAM ST0 should not return empty in my opinion. Paolo
Re: [PULL 00/52] target-arm queue
Patchew URL: https://patchew.org/QEMU/20200221130740.7583-1-peter.mayd...@linaro.org/ Hi, This series failed the docker-quick@centos7 build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. === TEST SCRIPT BEGIN === #!/bin/bash make docker-image-centos7 V=1 NETWORK=1 time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1 === TEST SCRIPT END === The full log is available at http://patchew.org/logs/20200221130740.7583-1-peter.mayd...@linaro.org/testing.docker-quick@centos7/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-de...@redhat.com
Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC
On Fri, 21 Feb 2020 at 16:05, BALATON Zoltan wrote: > > On Thu, 20 Feb 2020, Richard Henderson wrote: > > On 2/18/20 9:10 AM, BALATON Zoltan wrote: > >> +DEFINE_PROP_BOOL("hardfloat", PowerPCCPU, hardfloat, true), > > > > I would also prefer a different name here -- perhaps x-no-fp-fi. > > What's wrong with hardfloat? That's how the code refers to this so if > anyone searches what it does would turn up some meaningful results. This prompted me to check what you're using the property for. The cover letter says: > This patch implements a simple way to keep the inexact flag set for > hardfloat while still allowing to revert to softfloat for workloads > that need more accurate albeit slower emulation. (Set hardfloat > property of CPU, i.e. -cpu name,hardfloat=false for that.) I think that is the wrong approach. Enabling use of the host FPU should not affect the accuracy of the emulation, which should remain bitwise-correct. We should only be using the host FPU to the extent that we can do that without discarding accuracy. As far as I'm aware that's how the hardfloat support for other guest CPUs that use it works. thanks -- PMM
Re: [PATCH 10/19] target/arm: Add missing checks for fpsp_v2
On Fri, 14 Feb 2020 at 18:16, Richard Henderson wrote: > > We will eventually remove the early ARM_FEATURE_VFP test, > so add a proper test for each trans_* that does not already > have another ISA test. > > Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell thanks -- PMM
Re: [PATCH 04/19] target/arm: Set MVFR0.FPSP for ARMv5 cpus
On Fri, 14 Feb 2020 at 18:15, Richard Henderson wrote: > > We are going to convert FEATURE tests to ISAR tests, > so FPSP needs to be set for these cpus, like we have > already for FPDP. > > Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell thanks -- PMM