date:20200221

On 22/02/20 03:10, Chen Gang wrote:
> Set C1 to 1 if stack overflow occurred; set to 0 otherwise".
> 
> In helper_fxam_ST0, I guess, we need "env->fpus |= 0x200" (but I don't
> know wheter it will be conflict with SIGND(temp)). And we have to still
> need foverflow, because all env->fptags being 0 doesn't mean overflow.

No, you need to add "env->fpus |= 0x200" and "env->fpus &= ~0x200"
directly to fpush, fpop, etc.

Polo

[Bug 1814420] Re: drive-backup with iscsi, it will failed "Could not create image: Invalid argument"

2020-02-21 Thread Launchpad Bug Tracker

[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1814420

Title:
  drive-backup with iscsi, it will failed "Could not create image:
  Invalid argument"

Status in QEMU:
  Expired

Bug description:
  I use iscsi protocol to drive-backup:

  ---iscsi target---
  yum -y install targetcli python-rtslib
  systemctl start target
  systemctl enable target
  targetcli /iscsi create iqn.2019-01.com.iaas
  targetcli /iscsi/iqn.2019-01.com.iaas/tpg1 set attribute authentication=0 
demo_mode_write_protect=0 generate_node_acls=1
  targetcli /iscsi/iqn.2019-01.com.iaas/tpg1/portals create 192.168.1.1 3260
  targetcli /backstores/fileio create file1 /opt/file1 2G
  targetcli /iscsi/iqn.2019-01.com.iaas/tpg1/luns create 
/backstores/fileio/file1
  ---

  Now, '{ "execute" : "drive-backup" , "arguments" : { "device" :
  "drive-virtio-disk0" , "sync" : "top" , "target" :
  "iscsi://192.168.1.1:3260/iqn.2019-01.com.iaas/0" } }'

  It may failed:
  {"id":"libvirt-1785","error":{"class":"GenericError","desc":"Could not
  create image: Invalid argument"}}

  But, This abnormal will be appear at the first time. Because when I
  retry again, It works very well.

  Then, I re-start the vm, It still be failed 'Could not create image:
  Invalid argument' on the first try, and the second try it will work
  very well.

  ---
  Host: centos 7.5
  qemu version: 2.12 and 3.1.0
  qemu command line: qemu-system-x86_64 -name guest=test,debug-threads=on -S 
-object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-190-test./master-key.aes
 -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off,mem-merge=off -m 
1024 -mem-prealloc -mem-path /dev/hugepages1G/libvirt/qemu/190-test -realtime 
mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 
1c8611c2-a18a-4b1c-b40b-9d82040eafa4 -smbios type=1,manufacturer=IaaS 
-no-user-config -nodefaults -chardev socket,id=charmonitor,fd=31,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-boot menu=on,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive 
file=/opt/vol/sas/fb0c7c37-13e7-41fe-b3f8-f0fbaaeec7ce,format=qcow2,if=none,id=drive-virtio-disk0,cache=writeback
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on
 -drive 
file=/opt/vol/sas/bde66671-536d-49cd-8b46-a4f1ea7be513,format=qcow2,if=none,id=drive-virtio-disk1,cache=writeback
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1,write-cache=on
 -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=34 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=00:85:45:3e:d4:3a,bus=pci.0,addr=0x6 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-chardev socket,id=charchannel0,fd=35,server,nowait -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0,password -device 
cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1814420/+subscriptions

[Bug 1719282] Re: Unable to boot after drive-mirror

2020-02-21 Thread Launchpad Bug Tracker

[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1719282

Title:
  Unable to boot after drive-mirror

Status in QEMU:
  Expired

Bug description:
  Hi,
  I am using "drive-mirror" qmp block-job command to transfer VM disk image to 
other path (different physical disk on host).
  Unfortunately after shutting down and starting from new image, VM is unable 
to boot and qrub enters rescue mode displaying following error:
  ```
  error: file '/grub/i386-pc/normal.mod' not found.
  Entering rescue mode...
  grub rescue>
  ```

  To investigate the problem, I compared both RAW images using linux
  "cmp -l" command and found out that they differ in 569028 bytes
  starting from address 185598977 to 252708864 which are located on
  /boot partition.

  So I mounted /boot partition of mirrored RAW image on host OS and it
  seems that file-system is broken and grub folder is not recognized.
  But /boot on original RAW image has no problem.

  Mirrored Image:
  ls -l /mnt/vm-boot/
  ls: cannot access /mnt/vm-boot/grub: Structure needs cleaning
  total 38168
  -rw-r--r-- 1 root root   157721 Oct 19  2016 config-3.16.0-4-amd64
  -rw-r--r-- 1 root root   129281 Sep 20  2015 config-3.2.0-4-amd64
  d? ? ??   ?? grub
  -rw-r--r-- 1 root root 15739360 Nov  2  2016 initrd.img-3.16.0-4-amd64
  -rw-r--r-- 1 root root 12115412 Oct 10  2015 initrd.img-3.2.0-4-amd64
  drwxr-xr-x 2 root root12288 Oct  7  2013 lost+found
  -rw-r--r-- 1 root root  2679264 Oct 19  2016 System.map-3.16.0-4-amd64
  -rw-r--r-- 1 root root  2114662 Sep 20  2015 System.map-3.2.0-4-amd64
  -rw-r--r-- 1 root root  3126448 Oct 19  2016 vmlinuz-3.16.0-4-amd64
  -rw-r--r-- 1 root root  2842592 Sep 20  2015 vmlinuz-3.2.0-4-amd64

  Original Image:
  ls /mnt/vm-boot/ -l
  total 38173
  -rw-r--r-- 1 root root   157721 Oct 19  2016 config-3.16.0-4-amd64
  -rw-r--r-- 1 root root   129281 Sep 20  2015 config-3.2.0-4-amd64
  drwxr-xr-x 5 root root 5120 Nov  2  2016 grub
  -rw-r--r-- 1 root root 15739360 Nov  2  2016 initrd.img-3.16.0-4-amd64
  -rw-r--r-- 1 root root 12115412 Oct 10  2015 initrd.img-3.2.0-4-amd64
  drwxr-xr-x 2 root root12288 Oct  7  2013 lost+found
  -rw-r--r-- 1 root root  2679264 Oct 19  2016 System.map-3.16.0-4-amd64
  -rw-r--r-- 1 root root  2114662 Sep 20  2015 System.map-3.2.0-4-amd64
  -rw-r--r-- 1 root root  3126448 Oct 19  2016 vmlinuz-3.16.0-4-amd64
  -rw-r--r-- 1 root root  2842592 Sep 20  2015 vmlinuz-3.2.0-4-amd64

  ls /mnt/vm-boot/grub/ -l
  total 2376
  -rw-r--r-- 1 root root  48 Oct  7  2013 device.map
  drwxr-xr-x 2 root root1024 Oct 10  2015 fonts
  -r--r--r-- 1 root root9432 Nov  2  2016 grub.cfg
  -rw-r--r-- 1 root root1024 Oct  7  2013 grubenv
  drwxr-xr-x 2 root root6144 Aug  6  2016 i386-pc
  drwxr-xr-x 2 root root1024 Aug  6  2016 locale
  -rw-r--r-- 1 root root 2400500 Aug  6  2016 unicode.pf2

  qemu Version: 2.7.0-10

  Host OS: Debian 8x64
  Guest OS: Debian 8x64

  QMP Commands log:
  socat UNIX-CONNECT:/var/run/qemu-server/48016.qmp STDIO
  {"QMP": {"version": {"qemu": {"micro": 0, "minor": 7, "major": 2}, "package": 
"pve-qemu-kvm_2.7.0-10"}, "capabilities": []}}
  { "execute": "qmp_capabilities" }
  {"return": {}}
  { "execute": "drive-mirror",
"arguments": {
  "device": "drive-ide0",
  "target": "/diskc/48016/vm-48016-disk-2.raw",
  "sync": "full",
  "mode": "absolute-paths",
  "speed": 0
}
  }
  {"return": {}}
  {"timestamp": {"seconds": 1506331591, "microseconds": 623095}, "event": 
"BLOCK_JOB_READY", "data": {"device": "drive-ide0", "len": 269445758976, 
"offset": 269445758976, "speed": 0, "type": "mirror"}}
  {"timestamp": {"seconds": 1506332641, "microseconds": 245272}, "event": 
"SHUTDOWN"}
  {"timestamp": {"seconds": 1506332641, "microseconds": 377751}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "drive-ide0", "len": 271707340800, 
"offset": 271707340800, "speed": 0, "type": "mirror"}}

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1719282/+subscriptions

Re: [PATCH 2/2] riscv: sifive_u: Update BIOS_FILENAME for 32-bit

On Fri, Feb 21, 2020 at 6:53 PM Bin Meng  wrote:
>
> On Sat, Feb 22, 2020 at 3:51 AM Alistair Francis  wrote:
> >
> > On Thu, Feb 20, 2020 at 6:43 AM Bin Meng  wrote:
> > >
> > > Update BIOS_FILENAME to consider 32-bit bios image file name.
> > >
> > > Tested booting Linux v5.5 32-bit image (built from rv32_defconfig
> > > plus CONFIG_SOC_SIFIVE) with the default 32-bit bios image.
> >
> > Do we really want to support a 32-bit sifive_u machine?
> >
>
> QEMU is an emulator, why not? With 32-bit sifive_u machine, we can
> have 32-bit test coverage for SiFive specific drivers that cannot be
> done with the 'virt' machine.

That is a good point.

Reviewed-by: Alistair Francis 

Alistair

>
> Regards,
> Bin

Re: [PATCH 2/2] riscv: sifive_u: Update BIOS_FILENAME for 32-bit

2020-02-21 Thread Bin Meng

On Sat, Feb 22, 2020 at 3:51 AM Alistair Francis  wrote:
>
> On Thu, Feb 20, 2020 at 6:43 AM Bin Meng  wrote:
> >
> > Update BIOS_FILENAME to consider 32-bit bios image file name.
> >
> > Tested booting Linux v5.5 32-bit image (built from rv32_defconfig
> > plus CONFIG_SOC_SIFIVE) with the default 32-bit bios image.
>
> Do we really want to support a 32-bit sifive_u machine?
>

QEMU is an emulator, why not? With 32-bit sifive_u machine, we can
have 32-bit test coverage for SiFive specific drivers that cannot be
done with the 'virt' machine.

Regards,
Bin

Re: [PATCH] target: i386: Check float overflow about register stack

2020-02-21 Thread Chen Gang

On 2020/2/22 上午10:10, Chen Gang wrote:
> On 2020/2/22 上午12:18, Paolo Bonzini wrote:
>> On 21/02/20 15:09, Chen Gang wrote:
 -/* XXX: test fptags too */
 +if (env->fptags[env->fpstt]) {
 +env->fpus |= 0x4100; /* Empty */
 +return;
 +}
 +
>>> For fpop overflow, this fix is enough, but for me, we still need
>>> foverflow to check fpush/fld*_ST0 overflow.
>>>
>>> Don't you think we need check fpush/f[i]ld*_ST0 overflow?
>>
>> After fld/fild or any other push, FXAM ST0 should not return empty in my
>> opinion.
>>
> 
> OK, it sounds reasonable.
> 
> After check the intel document for f[i]ld* instructions, it says:
> 
>   "Set C1 to 1 if stack overflow occurred; set to 0 otherwise".
> 
> In helper_fxam_ST0, I guess, we need "env->fpus |= 0x200" (but I don't
> know wheter it will be conflict with SIGND(temp)). And we have to still
> need foverflow, because all env->fptags being 0 doesn't mean overflow.
> 

After read the document more about fstp* and fld* instructions, I find
that fstp* are also related with C1, and fld* will generate IS exception
when stack underflow or overflow occurred.

It seems more modifications should be done (but they will be several
patches).

Thanks.

Re: [PATCH] target: i386: Check float overflow about register stack

2020-02-21 Thread Chen Gang

On 2020/2/22 上午12:18, Paolo Bonzini wrote:
> On 21/02/20 15:09, Chen Gang wrote:
>>> -/* XXX: test fptags too */
>>> +if (env->fptags[env->fpstt]) {
>>> +env->fpus |= 0x4100; /* Empty */
>>> +return;
>>> +}
>>> +
>> For fpop overflow, this fix is enough, but for me, we still need
>> foverflow to check fpush/fld*_ST0 overflow.
>>
>> Don't you think we need check fpush/f[i]ld*_ST0 overflow?
> 
> After fld/fild or any other push, FXAM ST0 should not return empty in my
> opinion.
> 

OK, it sounds reasonable.

After check the intel document for f[i]ld* instructions, it says:

  "Set C1 to 1 if stack overflow occurred; set to 0 otherwise".

In helper_fxam_ST0, I guess, we need "env->fpus |= 0x200" (but I don't
know wheter it will be conflict with SIGND(temp)). And we have to still
need foverflow, because all env->fptags being 0 doesn't mean overflow.

Thanks.

Re: [PATCH] hw/ide: Remove status register read side effect

2020-02-21 Thread jasper.lowell

> The chip docs don't mention any side effect, they only say that the
> DMA 
> IRQ and Error bits in the bus master IDE status reg (bits 2 and 1)
> are 
> write 1 to clear

I haven't found any documentation that mention that side effect either.
As you say, it only mentions that you should set the bit to clear.
While the side effect of clearing interrupts by reading the status
register doesn't appear to be in documentation anywhere (to my
knowledge), I do see this side effect referenced in the source code of
drivers occasionally.

In /drivers/ide/ide-io.c of the Linux kernel:
/*
 * Whack the status register, just in case
 * we have a leftover pending IRQ.
 */
(void)hwif->tp_ops->read_status(hwif);

Along with:
 *  There's nothing really useful we can do with an unexpected
interrupt,
 *  other than reading the status register (to clear it), and
logging it.

The CMD64x specific code in the Linux kernel does explicitly set the
IRQ bits in the bus master IDE status register to clear it. I'm not
sure why, so maybe someone else can chime in explaining why Linux
sometimes clears interrupts by reading the status register and other
times follows the documentation and sets the required bits. The OpenBSD
driver also appears to set the bits explicitly.

I think the reason why the Solaris 10 driver crashes fatally whereas
Linux and OpenBSD ignore the side effect is because when clearing
interrupts, Solaris 10 expects the interrupt bit to be set and checks
this. Linux and OpenBSD appear to clear it regardless of its state.

With the patch, I didn't notice any regression for OpenBSD under Sun4u
and there were improvements to the Solaris 10 boot under Sun4u.

> Unfortunately 
> this does not change my problems with other BMDMA devices on PPC.

This patch doesn't solve all the problems for Solaris 10. It gets
further in the boot process but it is still unable to mount the file
system. I suspect that there are more bugs in the IDE/CMD646 emulation.
I'm going to continue looking into it. It's still possible we are being
affected by the same bugs. Right now, I'm considering that the
unresponsive serial console under Sun4u on Solaris 10 is due to not
being able to mount the file system and pull the required assets for
the installation menu.

> this change seems to break 
> something else according to a CI test report from patchew.

The test appears to break here in /tests/qtest/ide-test.c for obvious
reasons:
/* Reading the status register clears the IRQ */
g_assert(!qtest_get_irq(qts, IDE_PRIMARY_IRQ));

Should the patch I've suggested be correct, this test would need to be
updated. This is my first patch attempt for QEMU so I'm not sure what
the process is. Should I be submitting a new V2 patch with these
changes? I won't have the opportunity to update the patch for a few
days but will continue watching the thread for reviews.

Thanks,
Jasper Lowell.

On Fri, 2020-02-21 at 16:43 +0100, BALATON Zoltan wrote:
> On Fri, 21 Feb 2020, jasper.low...@bt.com wrote:
> > The Linux libATA API documentation mentions that on some hardware,
> > reading the status register has the side effect of clearing the
> > interrupt condition. When emulating the generic Sun4u machine
> > running
> > Solaris 10, the Solaris 10 CMD646 driver exits fatally because of
> > this
> > emulated side effect. This side effect is likely to not exist on
> > real
> > CMD646 hardware.
> 
> The chip docs don't mention any side effect, they only say that the
> DMA 
> IRQ and Error bits in the bus master IDE status reg (bits 2 and 1)
> are 
> write 1 to clear so this might be OK but this change seems to break 
> something else according to a CI test report from patchew.
> Unfortunately 
> this does not change my problems with other BMDMA devices on PPC.
> 
> Regards,
> BALATON Zoltan
> 
> > Signed-off-by: Jasper Lowell 
> > ---
> > hw/ide/core.c | 1 -
> > 1 file changed, 1 deletion(-)
> > 
> > diff --git a/hw/ide/core.c b/hw/ide/core.c
> > index 8eb766..82fd0632ac 100644
> > --- a/hw/ide/core.c
> > +++ b/hw/ide/core.c
> > @@ -2210,7 +2210,6 @@ uint32_t ide_ioport_read(void *opaque,
> > uint32_t addr)
> > } else {
> > ret = s->status;
> > }
> > -qemu_irq_lower(bus->irq);
> > break;
> > }
> > 
> >

[PATCH] linux-user: Add an argument QEMU_MMAP_BASE to set custom mmap base address in qemu user mode

2020-02-21 Thread Lirong Yuan

This change allows us to set custom base address for guest programs. It is 
needed to allow qemu to work with Thread Sanitizer (TSan), which has specific 
boundary definitions for memory mappings on different platforms:
https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/tsan/rtl/tsan_platform.h

Signed-off-by: Lirong Yuan 
---
 linux-user/main.c | 12 
 linux-user/mmap.c |  3 ++-
 linux-user/qemu.h |  5 +
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index fba833aac9..c01af6bfee 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -336,6 +336,16 @@ static void handle_arg_guest_base(const char *arg)
 have_guest_base = 1;
 }
 
+static void handle_arg_mmap_base(const char *arg)
+{
+int err = qemu_strtoul(arg, NULL, 0, _base);
+if (err) {
+fprintf(stderr, "Invalid mmap_base: %s, err: %d\n", arg, err);
+exit(EXIT_FAILURE);
+}
+mmap_next_start = mmap_base;
+}
+
 static void handle_arg_reserved_va(const char *arg)
 {
 char *p;
@@ -440,6 +450,8 @@ static const struct qemu_argument arg_table[] = {
  "uname",  "set qemu uname release string to 'uname'"},
 {"B",  "QEMU_GUEST_BASE",  true,  handle_arg_guest_base,
  "address","set guest_base address to 'address'"},
+{"mmap_base",  "QEMU_MMAP_BASE",   true,  handle_arg_mmap_base,
+ "",   "begin allocating guest pages at this host address"},
 {"R",  "QEMU_RESERVED_VA", true,  handle_arg_reserved_va,
  "size",   "reserve 'size' bytes for guest virtual address space"},
 {"d",  "QEMU_LOG", true,  handle_arg_log,
diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 8685f02e7e..3f35543acf 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -189,6 +189,7 @@ static int mmap_frag(abi_ulong real_start,
 # define TASK_UNMAPPED_BASE  0x4000
 #endif
 abi_ulong mmap_next_start = TASK_UNMAPPED_BASE;
+abi_ulong mmap_base = TASK_UNMAPPED_BASE;
 
 unsigned long last_brk;
 
@@ -299,7 +300,7 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, 
abi_ulong align)
 
 if ((addr & (align - 1)) == 0) {
 /* Success.  */
-if (start == mmap_next_start && addr >= TASK_UNMAPPED_BASE) {
+if (start == mmap_next_start && addr >= mmap_base) {
 mmap_next_start = addr + size;
 }
 return addr;
diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 560a68090e..83c00cfea2 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -161,6 +161,11 @@ void task_settid(TaskState *);
 void stop_all_tasks(void);
 extern const char *qemu_uname_release;
 extern unsigned long mmap_min_addr;
+/*
+ * mmap_base is minimum address to use when allocating guest pages. All guest
+ * pages will be allocated at this (guest) address or higher addresses.
+ */
+extern abi_ulong mmap_base;
 
 /* ??? See if we can avoid exposing so much of the loader internals.  */
 
-- 
2.25.0.265.gbab2e86ba0-goog

Re: RFC: Split EPT huge pages in advance of dirty logging

2020-02-21 Thread Peter Feiner

On Fri, Feb 21, 2020 at 2:08 PM Junaid Shahid  wrote:
>
> On 2/20/20 9:34 AM, Ben Gardon wrote:
> >
> > FWIW, we currently do this eager splitting at Google for live
> > migration. When the log-dirty-memory flag is set on a memslot we
> > eagerly split all pages in the slot down to 4k granularity.
> > As Jay said, this does not cause crippling lock contention because the
> > vCPU page faults generated by write protection / splitting can be
> > resolved in the fast page fault path without acquiring the MMU lock.
> > I believe +Junaid Shahid tried to upstream this approach at some point
> > in the past, but the patch set didn't make it in. (This was before my
> > time, so I'm hoping he has a link.)
> > I haven't done the analysis to know if eager splitting is more or less
> > efficient with parallel slow-path page faults, but it's definitely
> > faster under the MMU lock.
> >
>
> I am not sure if we ever posted those patches upstream. Peter Feiner would 
> know for sure. One notable difference in what we do compared to the approach 
> outlined by Jay is that we don't rely on tdp_page_fault() to do the 
> splitting. So we don't have to create a dummy VCPU and the specialized split 
> function is also much faster.

We've been carrying these patches since 2015. I've never posted them.
Getting them in shape for upstream consumption will take some work. I
can look into this next week.

Peter

Re: [PATCH] linux-user: Add an argument QEMU_MMAP_BASE to set custom mmap base address in qemu user mode

2020-02-21 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20200221214614.165338-1-yua...@google.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH] linux-user: Add an argument QEMU_MMAP_BASE to set custom mmap 
base address in qemu user mode
Message-id: 20200221214614.165338-1-yua...@google.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
38a7b04 linux-user: Add an argument QEMU_MMAP_BASE to set custom mmap base 
address in qemu user mode

=== OUTPUT BEGIN ===
ERROR: consider using qemu_strtoul in preference to strtoul
#23: FILE: linux-user/main.c:341:
+mmap_base = strtoul(arg, NULL, 0);

total: 1 errors, 0 warnings, 50 lines checked

Commit 38a7b049a7f8 (linux-user: Add an argument QEMU_MMAP_BASE to set custom 
mmap base address in qemu user mode) has style problems, please review.  If any 
of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200221214614.165338-1-yua...@google.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: RFC: Split EPT huge pages in advance of dirty logging

2020-02-21 Thread Junaid Shahid


On 2/20/20 9:34 AM, Ben Gardon wrote:


FWIW, we currently do this eager splitting at Google for live
migration. When the log-dirty-memory flag is set on a memslot we
eagerly split all pages in the slot down to 4k granularity.
As Jay said, this does not cause crippling lock contention because the
vCPU page faults generated by write protection / splitting can be
resolved in the fast page fault path without acquiring the MMU lock.
I believe +Junaid Shahid tried to upstream this approach at some point
in the past, but the patch set didn't make it in. (This was before my
time, so I'm hoping he has a link.)
I haven't done the analysis to know if eager splitting is more or less
efficient with parallel slow-path page faults, but it's definitely
faster under the MMU lock.



I am not sure if we ever posted those patches upstream. Peter Feiner would know 
for sure. One notable difference in what we do compared to the approach 
outlined by Jay is that we don't rely on tdp_page_fault() to do the splitting. 
So we don't have to create a dummy VCPU and the specialized split function is 
also much faster.

Thanks,
Junaid

[PATCH] linux-user: Add an argument QEMU_MMAP_BASE to set custom mmap base address in qemu user mode

2020-02-21 Thread Lirong Yuan

This change allows us to set custom base address for guest programs. It is 
needed to allow qemu to work with Thread Sanitizer (TSan), which has specific 
boundary definitions for memory mappings on different platforms:
https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/tsan/rtl/tsan_platform.h

Signed-off-by: Lirong Yuan 
---
 linux-user/main.c | 12 
 linux-user/mmap.c |  3 ++-
 linux-user/qemu.h |  5 +
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index fba833aac9..dfcd867399 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -336,6 +336,16 @@ static void handle_arg_guest_base(const char *arg)
 have_guest_base = 1;
 }
 
+static void handle_arg_mmap_base(const char *arg)
+{
+mmap_base = strtoul(arg, NULL, 0);
+if (mmap_base == 0) {
+fprintf(stderr, "Invalid mmap_base: %s\n", arg);
+exit(EXIT_FAILURE);
+}
+mmap_next_start = mmap_base;
+}
+
 static void handle_arg_reserved_va(const char *arg)
 {
 char *p;
@@ -440,6 +450,8 @@ static const struct qemu_argument arg_table[] = {
  "uname",  "set qemu uname release string to 'uname'"},
 {"B",  "QEMU_GUEST_BASE",  true,  handle_arg_guest_base,
  "address","set guest_base address to 'address'"},
+{"mmap_base",  "QEMU_MMAP_BASE",   true,  handle_arg_mmap_base,
+ "",   "begin allocating guest pages at this host address"},
 {"R",  "QEMU_RESERVED_VA", true,  handle_arg_reserved_va,
  "size",   "reserve 'size' bytes for guest virtual address space"},
 {"d",  "QEMU_LOG", true,  handle_arg_log,
diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 8685f02e7e..3f35543acf 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -189,6 +189,7 @@ static int mmap_frag(abi_ulong real_start,
 # define TASK_UNMAPPED_BASE  0x4000
 #endif
 abi_ulong mmap_next_start = TASK_UNMAPPED_BASE;
+abi_ulong mmap_base = TASK_UNMAPPED_BASE;
 
 unsigned long last_brk;
 
@@ -299,7 +300,7 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, 
abi_ulong align)
 
 if ((addr & (align - 1)) == 0) {
 /* Success.  */
-if (start == mmap_next_start && addr >= TASK_UNMAPPED_BASE) {
+if (start == mmap_next_start && addr >= mmap_base) {
 mmap_next_start = addr + size;
 }
 return addr;
diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 560a68090e..83c00cfea2 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -161,6 +161,11 @@ void task_settid(TaskState *);
 void stop_all_tasks(void);
 extern const char *qemu_uname_release;
 extern unsigned long mmap_min_addr;
+/*
+ * mmap_base is minimum address to use when allocating guest pages. All guest
+ * pages will be allocated at this (guest) address or higher addresses.
+ */
+extern abi_ulong mmap_base;
 
 /* ??? See if we can avoid exposing so much of the loader internals.  */
 
-- 
2.25.0.265.gbab2e86ba0-goog

[PATCH v2 0/1] Enable PMR feature from NVMe 1.4 spec to NVMe driver

Changes since v1:
 - provided support for Bit 1 from PMRWBM register instead of Bit 0 to ensure
   improved performance in virtualized environment [1] (Stefan)

 - added check if pmr size is power of two in size (David)

 - addressed cross compilation build problems reported by CI environment

[1]: 
https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4-2019.06.10-Ratified.pdf
[2]: 
https://lore.kernel.org/qemu-devel/20200218224811.30050-1-andrzej.jakow...@linux.intel.com/
 
---

Persistent Memory Region (PMR) is a new optional feature provided in NVMe 1.4
specification. This patch implements initial support for it in NVMe driver.

Andrzej Jakowski (1):
  block/nvme: introduce PMR support from NVMe 1.4 spec

 hw/block/nvme.c   | 165 +++-
 hw/block/nvme.h   |   5 ++
 hw/block/trace-events |   5 ++
 include/block/nvme.h  | 172 ++
 4 files changed, 346 insertions(+), 1 deletion(-)

-- 
2.21.1

[PATCH v2 1/1] block/nvme: introduce PMR support from NVMe 1.4 spec

This patch introduces support for PMR that has been defined as part of NVMe 1.4
spec. User can now specify a pmr_file which will be mmap'ed into qemu address
space and subsequently in PCI BAR 2. Guest OS can perform mmio read and writes
to the PMR region that will stay persistent accross system reboot.

Signed-off-by: Andrzej Jakowski 
---
 hw/block/nvme.c   | 165 +++-
 hw/block/nvme.h   |   5 ++
 hw/block/trace-events |   5 ++
 include/block/nvme.h  | 172 ++
 4 files changed, 346 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d28335cbf3..ff7e74d765 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -19,10 +19,14 @@
  *  -drive file=,if=none,id=
  *  -device nvme,drive=,serial=,id=, \
  *  cmb_size_mb=, \
+ *  [pmr_file=,] \
  *  num_queues=
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
  * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
+ *
+ * Either cmb or pmr - due to limitation in avaialbe BAR indexes.
+ * pmr_file file needs to be preallocated and power of two in size.
  */
 
 #include "qemu/osdep.h"
@@ -1141,6 +1145,28 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 NVME_GUEST_ERR(nvme_ub_mmiowr_cmbsz_readonly,
"invalid write to read only CMBSZ, ignored");
 return;
+#ifndef _WIN32
+case 0xE00: /* PMRCAP */
+NVME_GUEST_ERR(nvme_ub_mmiowr_pmrcap_readonly,
+   "invalid write to PMRCAP register, ignored");
+return;
+case 0xE04: /* TODO PMRCTL */
+break;
+case 0xE08: /* PMRSTS */
+NVME_GUEST_ERR(nvme_ub_mmiowr_pmrsts_readonly,
+   "invalid write to PMRSTS register, ignored");
+return;
+case 0xE0C: /* PMREBS */
+NVME_GUEST_ERR(nvme_ub_mmiowr_pmrebs_readonly,
+   "invalid write to PMREBS register, ignored");
+return;
+case 0xE10: /* PMRSWTP */
+NVME_GUEST_ERR(nvme_ub_mmiowr_pmrswtp_readonly,
+   "invalid write to PMRSWTP register, ignored");
+return;
+case 0xE14: /* TODO PMRMSC */
+ break;
+#endif /* !_WIN32 */
 default:
 NVME_GUEST_ERR(nvme_ub_mmiowr_invalid,
"invalid MMIO write,"
@@ -1169,6 +1195,22 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr 
addr, unsigned size)
 }
 
 if (addr < sizeof(n->bar)) {
+#ifndef _WIN32
+/*
+ * When PMRWBM bit 1 is set then read from
+ * from PMRSTS should ensure prior writes
+ * made it to persistent media
+ */
+if (addr == 0xE08 &&
+(NVME_PMRCAP_PMRWBM(n->bar.pmrcap) & 0x02) >> 1) {
+int ret;
+ret = msync(n->pmrbuf, n->f_pmr_size, MS_SYNC);
+if (!ret) {
+NVME_GUEST_ERR(nvme_ub_mmiord_pmrread_barrier,
+   "error while persisting data");
+}
+}
+#endif /* !_WIN32 */
 memcpy(, ptr + addr, size);
 } else {
 NVME_GUEST_ERR(nvme_ub_mmiord_invalid_ofs,
@@ -1303,6 +1345,31 @@ static const MemoryRegionOps nvme_cmb_ops = {
 },
 };
 
+#ifndef _WIN32
+static void nvme_pmr_write(void *opaque, hwaddr addr, uint64_t data,
+unsigned size)
+{
+NvmeCtrl *n = (NvmeCtrl *)opaque;
+stn_le_p(>pmrbuf[addr], size, data);
+}
+
+static uint64_t nvme_pmr_read(void *opaque, hwaddr addr, unsigned size)
+{
+NvmeCtrl *n = (NvmeCtrl *)opaque;
+return ldn_le_p(>pmrbuf[addr], size);
+}
+
+static const MemoryRegionOps nvme_pmr_ops = {
+.read = nvme_pmr_read,
+.write = nvme_pmr_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.impl = {
+.min_access_size = 1,
+.max_access_size = 8,
+},
+};
+#endif /* !_WIN32 */
+
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
 NvmeCtrl *n = NVME(pci_dev);
@@ -1332,6 +1399,39 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 error_setg(errp, "serial property not set");
 return;
 }
+
+#ifndef _WIN32
+if (!n->cmb_size_mb && n->pmr_file) {
+int fd;
+
+n->f_pmr = fopen(n->pmr_file, "r+b");
+if (!n->f_pmr) {
+error_setg(errp, "pmr backend file open error");
+return;
+}
+
+fseek(n->f_pmr, 0L, SEEK_END);
+n->f_pmr_size = ftell(n->f_pmr);
+fseek(n->f_pmr, 0L, SEEK_SET);
+
+/* pmr file size needs to be power of 2 in size */
+if (!n->f_pmr_size || !is_power_of_2(n->f_pmr_size)) {
+error_setg(errp, "pmr backend file size needs to be greater than 0"
+ "and power of 2 in size");
+return;
+}
+
+fd = fileno(n->f_pmr);
+n->pmrbuf = mmap(NULL, n->f_pmr_size,
+ (PROT_READ | PROT_WRITE), MAP_SHARED, fd, 0);
+
+

[PATCH] linux-user: Add AT_EXECFN and AT_EXECFD auxval

2020-02-21 Thread Lirong Yuan

This change adds the support for AT_EXECFN and AT_EXECFD auxval.

Signed-off-by: Lirong Yuan 
---
 linux-user/elfload.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index f3080a1635..7e0f3042f1 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1568,7 +1568,7 @@ struct exec
  ~(abi_ulong)(TARGET_ELF_EXEC_PAGESIZE-1))
 #define TARGET_ELF_PAGEOFFSET(_v) ((_v) & (TARGET_ELF_EXEC_PAGESIZE-1))
 
-#define DLINFO_ITEMS 15
+#define DLINFO_ITEMS 17
 
 static inline void memcpy_fromfs(void * to, const void * from, unsigned long n)
 {
@@ -1888,11 +1888,14 @@ static abi_ulong loader_build_fdpic_loadmap(struct 
image_info *info, abi_ulong s
 return sp;
 }
 
-static abi_ulong create_elf_tables(abi_ulong p, int argc, int envc,
+static abi_ulong create_elf_tables(struct linux_binprm *bprm,
struct elfhdr *exec,
struct image_info *info,
struct image_info *interp_info)
 {
+abi_ulong p = bprm->p;
+int argc = bprm->argc;
+int envc = bprm->envc;
 abi_ulong sp;
 abi_ulong u_argc, u_argv, u_envp, u_auxv;
 int size;
@@ -2032,6 +2035,8 @@ static abi_ulong create_elf_tables(abi_ulong p, int argc, 
int envc,
 NEW_AUX_ENT(AT_CLKTCK, (abi_ulong) sysconf(_SC_CLK_TCK));
 NEW_AUX_ENT(AT_RANDOM, (abi_ulong) u_rand_bytes);
 NEW_AUX_ENT(AT_SECURE, (abi_ulong) qemu_getauxval(AT_SECURE));
+NEW_AUX_ENT(AT_EXECFN, info->file_string);
+NEW_AUX_ENT(AT_EXECFD, bprm->fd);
 
 #ifdef ELF_HWCAP2
 NEW_AUX_ENT(AT_HWCAP2, (abi_ulong) ELF_HWCAP2);
@@ -2870,8 +2875,8 @@ int load_elf_binary(struct linux_binprm *bprm, struct 
image_info *info)
 #endif
 }
 
-bprm->p = create_elf_tables(bprm->p, bprm->argc, bprm->envc, _ex,
-info, (elf_interpreter ? _info : NULL));
+bprm->p = create_elf_tables(bprm, _ex, info,
+(elf_interpreter ? _info : NULL));
 info->start_stack = bprm->p;
 
 /* If we have an interpreter, set that as the program's entry point.
-- 
2.25.0.265.gbab2e86ba0-goog

Re: [PATCH v1] block/nvme: introduce PMR support from NVMe 1.4 spec

On 2/21/20 12:32 PM, Stefan Hajnoczi wrote:
> Hi Andrzej,
> After having looked at the PMRWBM part of the spec, I think that the
> Bit 1 mode should be implemented for slightly better performance.  Bit
> 0 mode is not well-suited to virtualization for the reasons I
> mentioned in the previous email.
> 
> The spec describes Bit 1 mode as "The completion of a read to the
> PMRSTS register shall ensure that all prior writes to the Persistent
> Memory Region have completed and are persistent".
> 
> Stefan

Make sense -- will incorporate that feedback in second version of patch.

Re: [PATCH v1] block/nvme: introduce PMR support from NVMe 1.4 spec

On 2/21/20 11:45 AM, Dr. David Alan Gilbert wrote:
> Isn't there also a requirement that BARs are powers of two? Wouldn't
> you need to ensure the PMR file is a power of 2 in size?
> 
> Dave

Agree, need to make sure it is power of 2.

Re: [PATCH] maint: Include top-level *.rst files early in git diff


On 2/21/20 4:17 AM, Peter Maydell wrote:

On Thu, 20 Feb 2020 at 16:22, Eric Blake  wrote:


We are converting more doc files to *.rst rather than *.texi.  Most
doc files are already listed early in diffs due to our catchall
docs/*, but a few top-level files get missed by that glob.

Signed-off-by: Eric Blake 
---

Both *.texi and *.rst entries make sense while we are still converting
things, but we'll need a followup to drop *.texi when the conversion
is complete...


I was wondering what rst files we had outside docs, and
the answer is "README.rst, CODING_STYLE.rst and
tests/acceptance/README.rst".

(There's a reasonable argument for moving CODING_STYLE.rst to
docs/devel now; depends how highly you rate having it more
"discoverable" to new contributors.)


Or even:
git mv CODING_STYLE.rst docs/devel/CODING_STYLE.rst
ln -s docs/devel/CODING_STYLE.rst .
git add CODING_STYLE.rst

to turn it into a git link in the top level for easy discoverabilty on a 
fresh checkout, while still making it part of our overall built 
documentation.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v7 01/11] qapi/error: add (Error **errp) cleaning APIs


On 2/21/20 3:20 AM, Vladimir Sementsov-Ogievskiy wrote:


+static inline void warn_report_errp(Error **errp)
+{
+    assert(errp && *errp);
+    warn_report_err(*errp);
+    *errp = NULL;
+}
+
+
  /*
   * Just like error_setg(), except you get to specify the error class.
   * Note: use of error classes other than ERROR_CLASS_GENERIC_ERROR is


These appear to be unused apart from the Coccinelle script in PATCH 03.

They are used in the full "[RFC v5 000/126] error: auto propagated
local_err" series.  Options:

1. Pick a few more patches into this part I series, so these guys come
    with users.


It needs some additional effort for this series.. But it's possible. Still,
I think that we at least should not pull out patches which should be in
future series (for example from ppc or block/)..




2. Punt this patch to the first part that has users, along with the
    part of the Coccinelle script that deals with them.


But coccinelle script would be wrong, if we drop this part from it. I 
think,
that after commit which adds coccinelle script, it should work with any 
file,

not only subset of these series.

So, it's probably OK for now to drop these functions, forcing their 
addition if
coccinelle script will be applied where these functions are needed. We 
can, for

example comment these three functions.

Splitting coccinelle script into two parts, which will be in different 
series will

not help any patch-porting processes.


Splitting the coccinelle script across multiple patches is actually 
quite reviewable, and still easy to backport.  Consider this series by 
Philippe:


https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg05554.html

which makes multiple additions to scripts/coccinelle/exec_rw_const.cocci 
over the course of the series.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH] migration/throttle: Add throttle-trig-thres migration parameter


On 2/20/20 8:57 PM, Keqian Zhu wrote:

Currently, if the bytes_dirty_period is more than the 50% of
bytes_xfer_period, we start or increase throttling.

If we make this percentage higher, then we can tolerate higher
dirty rate during migration, which means less impact on guest.
The side effect of higher percentage is longer migration time.

We can configure this parameter to switch between migration time
firt or guest performance first. The default value is 50.

Signed-off-by: Keqian Zhu 
---
Cc: Juan Quintela 
Cc: "Dr. David Alan Gilbert" 
Cc: Eric Blake 
Cc: Markus Armbruster 
---



+++ b/qapi/migration.json
@@ -524,6 +524,10 @@
  #  compression, so set the decompress-threads to the 
number about 1/4
  #  of compress-threads is adequate.
  #
+# @throttle-trig-thres: The ratio of bytes_dirty_period and bytes_xfer_period 
to
+#   trigger throttling. It is expressed as percentage. The
+#   default value is 50. (Since 5.0)
+#


Abbreviating feels odd; can you please spell this out as 
throttle-trigger-threshold?


Can the threshold exceed 100%?

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v7 02/11] error: auto propagated local_err


On 2/21/20 3:42 AM, Vladimir Sementsov-Ogievskiy wrote:


+#define ERRP_AUTO_PROPAGATE()  \
+    g_auto(ErrorPropagator) _auto_errp_prop = {.errp = errp};  \
+    errp = ((errp == NULL || *errp == error_fatal) \
+    ? &_auto_errp_prop.local_err : errp)
+
  /*
   * Special error destination to abort on error.
   * See error_setg() and error_propagate() for details.


*errp == error_fatal tests *errp == NULL, which is not what you want.
You need to test errp == _fatal, just like error_handle_fatal().


Oops, great bug) And nobody noticed before) Of course, you are right.


Sorry I missed it in my earlier looks.





Superfluous parenthesis around the first operand of ?:.

Wouldn't

    #define ERRP_AUTO_PROPAGATE()  \
    g_auto(ErrorPropagator) _auto_errp_prop = {.errp = errp};  \
    if (!errp || errp == _fatal) {   \
    errp = &_auto_errp_prop.local_err; \
    }

be clearer?



Hmm, notation with "if" will allow omitting ';' after macro invocation, 
which seems not good..


Then wrap it:

g_auto(ErrorPropagator) _auto_errp_prop = {.errp = errp}; \
do { \
  if (!errp || errp == _fata) {
errp = &_auto_errp_prop.local_err; \
  } \
while (0)


And if I'm not wrong we've already discussed it somewhere in previous 
versions.


The original use of ?: stems from my suggestion on an earlier revision 
when we were still trying to pack everything into two consecutive 
declaration lines, rather than a declaration and a statement (as ?: is 
necessary for conditionals in declarations).  But since then, we decided 
to go with a statement (we require a C99 compiler, so declaration after 
statement is supported by our compiler, even if our coding style 
currently avoids it where possible), so as long as we support 
statements, we might as well go with a legible statement instead of 
insisting on the compact ?: form.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: Emulating Solaris 10 on SPARC64 sun4u

2020-02-21 Thread Dr. David Alan Gilbert

* BALATON Zoltan (bala...@eik.bme.hu) wrote:
> On Wed, 19 Feb 2020, BALATON Zoltan wrote:
> > faster or doing something differently? Does someone know what interrupts
> > are generated on real hardware in DMA mode so we can compare that to
> > what we see with QEMU?
> 
> The document Programming Interface for Bus Master IDE Controller, Revision
> 1.0 (5/16/94) has some info on this. AFAIU it says that after DMA operation
> is completed an IRQ should be raised. On page 5, section 3.1. Data
> Synchronization it says:
> 
> "Another way to view this requirement is that the first read to the
> controller Status register in response to the IDE device interrupt must
> return with the Interrupt bit set and with the guarantee that all buffered
> data has been written to memory."
> 
> Not sure if this is relevant but how is it handled in QEMU? Is the right
> interrupt bit set after DMA transfer is done? If so is it the one that's
> checked by the OS driver?

One thing to be a little careful of is I remember the 646 was always
known for having a few quirks (I've not got a SPARC64 ith one, but I
think my Alpha has it).  So whether you're chasing a generic IDE BM
problem or a 646 special is fun.

Dave

> Regards,
> BALATON Zoltan
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC

2020-02-21 Thread BALATON Zoltan


On Fri, 21 Feb 2020, Peter Maydell wrote:

On Fri, 21 Feb 2020 at 18:04, BALATON Zoltan  wrote:

On Fri, 21 Feb 2020, Peter Maydell wrote:

I think that is the wrong approach. Enabling use of the host
FPU should not affect the accuracy of the emulation, which
should remain bitwise-correct. We should only be using the
host FPU to the extent that we can do that without discarding
accuracy. As far as I'm aware that's how the hardfloat support
for other guest CPUs that use it works.


I don't know of a better approach. Please see section 4.2.2 Floating-Point
Status and Control Register on page 124 in this document:

https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0

especially the definition of the FR and FI bits and tell me how can we
emulate these accurately and use host FPU.


I don't know much about PPC, but if you can't emulate the
guest architecture accurately with the host FPU, then
don't use the host FPU. We used to have a kind of 'hardfloat'


I don't know if it's possible or not to emulate these accurately and use 
the FPU but nobody did it for QEMU so far. But if someone knows a way 
please speak up then we can try to implement it. Unfortunately this would 
require more detailed knowledge about different FPU implementations (at 
least X86_64, ARM and PPC that are the mostly used platforms) than what I 
have or willing to spend time to learn.



support that was fast but inaccurate, but it was a mess
because it meant that most guest code sort of worked but
some guest code would confusingly misbehave. Deliberately
not correctly emulating the guest CPU/FPU behaviour is not
something I want us to return to.

You're right that sometimes you can't get both speed
and accuracy; other emulators (and especially ones
which are trying to emulate games consoles) may choose
to prefer speed over accuracy. For QEMU we prefer to
choose accuracy over speed in this area.


OK, then how about keeping the default accurate but allow to opt in to use 
FPU even if it's known to break some bits for workloads where users would 
need speed over accuracy and would be happy to live with the limitation. 
Note that i've found that just removing the define that disables hardfloat 
for PPC target makes VMX vector instructions faster while normal FPU is a 
little slower without any other changes so disabling hardfloat already 
limits performance for guests using VMX even when not using the FPU for 
cases when it would cause inaccuracy. If you say we want accuracy and 
don't care about speed, then just don't disable hardfloat as it helps at 
least VMX and then we can add option to allow the user to say we can use 
hardfloat even if it's inaccurate then they can test their workload and 
decide for themselves.


Regards,
BALATON Zoltan

Re: [PATCH 2/2] riscv: sifive_u: Update BIOS_FILENAME for 32-bit

On Thu, Feb 20, 2020 at 6:43 AM Bin Meng  wrote:
>
> Update BIOS_FILENAME to consider 32-bit bios image file name.
>
> Tested booting Linux v5.5 32-bit image (built from rv32_defconfig
> plus CONFIG_SOC_SIFIVE) with the default 32-bit bios image.

Do we really want to support a 32-bit sifive_u machine?

Alistair

>
> Signed-off-by: Bin Meng 
> ---
>
>  hw/riscv/sifive_u.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
> index ca561d3..371133e 100644
> --- a/hw/riscv/sifive_u.c
> +++ b/hw/riscv/sifive_u.c
> @@ -57,7 +57,11 @@
>
>  #include 
>
> -#define BIOS_FILENAME "opensbi-riscv64-sifive_u-fw_jump.bin"
> +#if defined(TARGET_RISCV32)
> +# define BIOS_FILENAME "opensbi-riscv32-sifive_u-fw_jump.bin"
> +#else
> +# define BIOS_FILENAME "opensbi-riscv64-sifive_u-fw_jump.bin"
> +#endif
>
>  static const struct MemmapEntry {
>  hwaddr base;
> --
> 2.7.4
>
>

Re: [PATCH 2/3] hw/riscv/spike: Allow loading firmware separately using -bios option

On Thu, Feb 13, 2020 at 11:22 PM Anup Patel  wrote:
>
> This patch extends Spike machine support to allow loading OpenSBI
> firmware (fw_jump.elf) separately using -bios option.
>
> Signed-off-by: Anup Patel 

Can you add something in the commit message adding that you also
support adding an initrd with this patch?

Otherwise:

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/riscv/spike.c | 24 +++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
> index 8823681783..060a86f922 100644
> --- a/hw/riscv/spike.c
> +++ b/hw/riscv/spike.c
> @@ -45,6 +45,12 @@
>
>  #include 
>
> +#if defined(TARGET_RISCV32)
> +# define BIOS_FILENAME "opensbi-riscv32-spike-fw_jump.elf"
> +#else
> +# define BIOS_FILENAME "opensbi-riscv64-spike-fw_jump.elf"
> +#endif
> +
>  static const struct MemmapEntry {
>  hwaddr base;
>  hwaddr size;
> @@ -183,8 +189,24 @@ static void spike_board_init(MachineState *machine)
>  memory_region_add_subregion(system_memory, memmap[SPIKE_MROM].base,
>  mask_rom);
>
> +riscv_find_and_load_firmware(machine, BIOS_FILENAME,
> + memmap[SPIKE_DRAM].base,
> + htif_symbol_callback);
> +
>  if (machine->kernel_filename) {
> -riscv_load_kernel(machine->kernel_filename, htif_symbol_callback);
> +uint64_t kernel_entry = riscv_load_kernel(machine->kernel_filename,
> +  htif_symbol_callback);
> +
> +if (machine->initrd_filename) {
> +hwaddr start;
> +hwaddr end = riscv_load_initrd(machine->initrd_filename,
> +   machine->ram_size, kernel_entry,
> +   );
> +qemu_fdt_setprop_cell(s->fdt, "/chosen",
> +  "linux,initrd-start", start);
> +qemu_fdt_setprop_cell(s->fdt, "/chosen", "linux,initrd-end",
> +  end);
> +}
>  }
>
>  /* reset vector */
> --
> 2.17.1
>
>

Re: [PATCH 1/3] hw/riscv: Add optional symbol callback ptr to riscv_load_firmware()

On Thu, Feb 13, 2020 at 11:24 PM Anup Patel  wrote:
>
> This patch adds an optional function pointer, "sym_cb", to
> riscv_load_firmware() which provides the possibility to access
> the symbol table during kernel loading.
>
> The pointer is ignored, if supplied with flat (non-elf) firmware image.
>
> The Spike board requires it locate the HTIF symbols from firmware ELF
> passed via "-bios" option.
>
> Signed-off-by: Anup Patel 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/riscv/boot.c | 13 -
>  hw/riscv/sifive_u.c |  2 +-
>  hw/riscv/virt.c |  2 +-
>  include/hw/riscv/boot.h |  6 --
>  4 files changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c
> index 027303d2a3..7ec94dc701 100644
> --- a/hw/riscv/boot.c
> +++ b/hw/riscv/boot.c
> @@ -36,7 +36,8 @@
>
>  void riscv_find_and_load_firmware(MachineState *machine,
>const char *default_machine_firmware,
> -  hwaddr firmware_load_addr)
> +  hwaddr firmware_load_addr,
> +  symbol_fn_t sym_cb)
>  {
>  char *firmware_filename = NULL;
>
> @@ -76,7 +77,7 @@ void riscv_find_and_load_firmware(MachineState *machine,
>
>  if (firmware_filename) {
>  /* If not "none" load the firmware */
> -riscv_load_firmware(firmware_filename, firmware_load_addr);
> +riscv_load_firmware(firmware_filename, firmware_load_addr, sym_cb);
>  g_free(firmware_filename);
>  }
>  }
> @@ -96,12 +97,14 @@ char *riscv_find_firmware(const char *firmware_filename)
>  }
>
>  target_ulong riscv_load_firmware(const char *firmware_filename,
> - hwaddr firmware_load_addr)
> + hwaddr firmware_load_addr,
> + symbol_fn_t sym_cb)
>  {
>  uint64_t firmware_entry, firmware_start, firmware_end;
>
> -if (load_elf(firmware_filename, NULL, NULL, NULL, _entry,
> - _start, _end, 0, EM_RISCV, 1, 0) > 0) {
> +if (load_elf_ram_sym(firmware_filename, NULL, NULL, NULL,
> + _entry, _start, _end, 0,
> + EM_RISCV, 1, 0, NULL, true, sym_cb) > 0) {
>  return firmware_entry;
>  }
>
> diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
> index 0140e95732..0c84215f42 100644
> --- a/hw/riscv/sifive_u.c
> +++ b/hw/riscv/sifive_u.c
> @@ -341,7 +341,7 @@ static void riscv_sifive_u_init(MachineState *machine)
>  create_fdt(s, memmap, machine->ram_size, machine->kernel_cmdline);
>
>  riscv_find_and_load_firmware(machine, BIOS_FILENAME,
> - memmap[SIFIVE_U_DRAM].base);
> + memmap[SIFIVE_U_DRAM].base, NULL);
>
>  if (machine->kernel_filename) {
>  uint64_t kernel_entry = riscv_load_kernel(machine->kernel_filename,
> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> index c44b865959..90a5bfef63 100644
> --- a/hw/riscv/virt.c
> +++ b/hw/riscv/virt.c
> @@ -476,7 +476,7 @@ static void riscv_virt_board_init(MachineState *machine)
>  mask_rom);
>
>  riscv_find_and_load_firmware(machine, BIOS_FILENAME,
> - memmap[VIRT_DRAM].base);
> + memmap[VIRT_DRAM].base, NULL);
>
>  if (machine->kernel_filename) {
>  uint64_t kernel_entry = riscv_load_kernel(machine->kernel_filename,
> diff --git a/include/hw/riscv/boot.h b/include/hw/riscv/boot.h
> index df80051fbc..474a940ad5 100644
> --- a/include/hw/riscv/boot.h
> +++ b/include/hw/riscv/boot.h
> @@ -24,10 +24,12 @@
>
>  void riscv_find_and_load_firmware(MachineState *machine,
>const char *default_machine_firmware,
> -  hwaddr firmware_load_addr);
> +  hwaddr firmware_load_addr,
> +  symbol_fn_t sym_cb);
>  char *riscv_find_firmware(const char *firmware_filename);
>  target_ulong riscv_load_firmware(const char *firmware_filename,
> - hwaddr firmware_load_addr);
> + hwaddr firmware_load_addr,
> + symbol_fn_t sym_cb);
>  target_ulong riscv_load_kernel(const char *kernel_filename,
> symbol_fn_t sym_cb);
>  hwaddr riscv_load_initrd(const char *filename, uint64_t mem_size,
> --
> 2.17.1
>
>

Re: [PATCH v1] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-02-21 Thread Stefan Hajnoczi

Hi Andrzej,
After having looked at the PMRWBM part of the spec, I think that the
Bit 1 mode should be implemented for slightly better performance.  Bit
0 mode is not well-suited to virtualization for the reasons I
mentioned in the previous email.

The spec describes Bit 1 mode as "The completion of a read to the
PMRSTS register shall ensure that all prior writes to the Persistent
Memory Region have completed and are persistent".

Stefan

Re: [PATCH v1] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-02-21 Thread Dr. David Alan Gilbert

* Stefan Hajnoczi (stefa...@gmail.com) wrote:
> On Tue, Feb 18, 2020 at 03:48:11PM -0700, Andrzej Jakowski wrote:
> > This patch introduces support for PMR that has been defined as part of NVMe 
> > 1.4
> > spec. User can now specify a pmr_file which will be mmap'ed into qemu 
> > address
> > space and subsequently in PCI BAR 2. Guest OS can perform mmio read and 
> > writes
> > to the PMR region that will stay persistent accross system reboot.
> > 
> > Signed-off-by: Andrzej Jakowski 
> > ---
> >  hw/block/nvme.c   | 145 ++-
> >  hw/block/nvme.h   |   5 ++
> >  hw/block/trace-events |   5 ++
> >  include/block/nvme.h  | 172 ++
> >  4 files changed, 326 insertions(+), 1 deletion(-)
> 
> NVDIMM folks, please take a look.  There seems to be commonality here.
> 
> Can this use -object memory-backend-file instead of manually opening and
> mapping a file?
> 
> Also CCing David Gilbert because there is some similarity with the
> vhost-user-fs's DAX Window feature where QEMU mmaps regions of files
> into a BAR.

I guess the biggest difference here is that the read can have the side
effect; in my world I don't have to set read/write/endian ops - I just
map a chunk of memory and use memory_region_add_subregion, so all the
read/writes are native read/writes - I assume that would be a lot
faster - I guess it depends if NVME_PMRCAP_PMRWBM is something constant
you know early on; if you know that you don't need to do side effects in
the read you could do the same trick and avoid the IO ops altogether.


Isn't there also a requirement that BARs are powers of two? Wouldn't
you need to ensure the PMR file is a power of 2 in size?

Dave

> > @@ -1303,6 +1327,38 @@ static const MemoryRegionOps nvme_cmb_ops = {
> >  },
> >  };
> >  
> > +static void nvme_pmr_write(void *opaque, hwaddr addr, uint64_t data,
> > +unsigned size)
> > +{
> > +NvmeCtrl *n = (NvmeCtrl *)opaque;
> > +stn_le_p(>pmrbuf[addr], size, data);
> > +}
> > +
> > +static uint64_t nvme_pmr_read(void *opaque, hwaddr addr, unsigned size)
> > +{
> > +NvmeCtrl *n = (NvmeCtrl *)opaque;
> > +if (!NVME_PMRCAP_PMRWBM(n->bar.pmrcap)) {
> > +int ret;
> > +ret = msync(n->pmrbuf, n->f_pmr_size, MS_SYNC);
> > +if (!ret) {
> > +NVME_GUEST_ERR(nvme_ub_mmiowr_pmrread_barrier,
> > +   "error while persisting data");
> > +}
> > +}
> 
> Why is msync(2) done on memory loads instead of stores?


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PULL 0/6] virtiofs queue

On Fri, 21 Feb 2020 at 13:52, Dr. David Alan Gilbert (git)
 wrote:
>
> From: "Dr. David Alan Gilbert" 
>
> The following changes since commit b651b80822fa8cb66ca30087ac7fbc75507ae5d2:
>
>   Merge remote-tracking branch 
> 'remotes/vivier2/tags/linux-user-for-5.0-pull-request' into staging 
> (2020-02-20 17:35:42 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/dagrh/qemu.git tags/pull-virtiofs-20200221
>
> for you to fetch changes up to 5bb8e8beedb47fc0d0a44957a154918c4f4afc80:
>
>   docs: Fix virtiofsd.1 location (2020-02-21 13:05:27 +)
>
> ----
> virtiofs pull 20200221
>
> Mostly minor cleanups.
> Miroslav's fixes a make install corner case.
> Philippe's set includes an error corner case fix.
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0
for any user-visible changes.

-- PMM

Re: [PATCH v1 3/4] Acceptance test: provides new functions

2020-02-21 Thread Wainer dos Santos Moschetta


Hi Oksana,

On 2/14/20 12:52 PM, Oksana Vohchana wrote:

Adds functions to check if service RDMA is enabled and gets the interface
where it was configured

Signed-off-by: Oksana Vohchana 
---
  tests/acceptance/migration.py | 17 +
  1 file changed, 17 insertions(+)

diff --git a/tests/acceptance/migration.py b/tests/acceptance/migration.py
index 8209dcf71d..bbd88f8dda 100644
--- a/tests/acceptance/migration.py
+++ b/tests/acceptance/migration.py
@@ -11,12 +11,16 @@
  
  
  import tempfile

+import re
+import netifaces


Since netifaces isn't a standard Python library that import might fail.

The tests dependencies are listed in tests/requirements.txt, and 
installed in the environment created by `make check-acceptance`. If you 
want to ensure the test behaves well even when executed manually (i.e. 
not via `make check-acceptance`), you can add runtime checks as can be 
seen in tests/acceptance/machine_m68k_nextcube.py



  from avocado_qemu import Test
  from avocado import skipUnless
  
  from avocado.utils import network

  from avocado.utils import wait
  from avocado.utils.path import find_command
+from avocado.utils import service
+from avocado.utils import process
  
  
  class Migration(Test):

@@ -58,6 +62,19 @@ class Migration(Test):
  self.cancel('Failed to find a free port')
  return port
  
+def _if_rdma_enable(self):

+rdma_stat = service.ServiceManager()
+rdma = rdma_stat.status('rdma')
+return rdma



Above function is used on patch04, but actually I don't think it needs 
to check this service for RoCE. It would be needed if it was using the 
rxe_cfg to configure the rdma link. Or am I missing something?




+
+def _get_ip_rdma(self):
+get_ip_rdma = process.run('rdma link show').stdout.decode()
+for line in get_ip_rdma.split('\n'):
+if re.search(r"ACTIVE", line):
+interface = line.split(" ")[-2]
+ip = 
netifaces.ifaddresses(interface)[netifaces.AF_INET][0]['addr']
+return ip
+


I suggest that it explicitly returns None if none is found.

Thanks!

- Wainer

  
  def test_migration_with_tcp_localhost(self):

  dest_uri = 'tcp:localhost:%u' % self._get_free_port()

Re: [PATCH v2 12/13] hw/arm/raspi: Add the Raspberry Pi B+ machine

2020-02-21 Thread Eduardo Habkost

On Tue, Feb 18, 2020 at 10:35:43AM +0100, Philippe Mathieu-Daudé wrote:
> Cc'ing Eduardo/Igor.
> 
> On 2/18/20 9:48 AM, Luc Michel wrote:
> > On 2/17/20 12:45 PM, Philippe Mathieu-Daudé wrote:
> > >$ qemu-system-arm -M raspi1b -serial stdio \
> > >-kernel raspberrypi/firmware/boot/kernel.img \
> > >-dtb raspberrypi/firmware/boot/bcm2708-rpi-b.dtb \
> > >-append 'printk.time=0 earlycon=pl011,0x20201000 console=ttyAMA0'
> > >[0.00] Booting Linux on physical CPU 0x0
> > >[0.00] Linux version 4.19.69+ (dom@buildbot) (gcc version 
> > > 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #1261 Tue Sep 3 
> > > 20:21:01 BST 2019
> > >[0.00] CPU: ARMv6-compatible processor [410fb767] revision 7 
> > > (ARMv7), cr=00c5387d
> > >[0.00] CPU: VIPT aliasing data cache, unknown instruction cache
> > >[0.00] OF: fdt: Machine model: Raspberry Pi Model B
> > >[0.00] earlycon: pl11 at MMIO 0x20201000 (options '')
> > >[0.00] bootconsole [pl11] enabled
> > >[0.00] Memory policy: Data cache writeback
> > >[0.00] cma: Reserved 8 MiB at 0x1b80
> > >[0.00] random: get_random_bytes called from 
> > > start_kernel+0x8c/0x49c with crng_init=0
> > >[0.00] Built 1 zonelists, mobility grouping on.  Total pages: 
> > > 113680
> > >[0.00] Kernel command line: printk.time=0 
> > > earlycon=pl011,0x20201000 console=ttyAMA0
> > >Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
> > >Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
> > >Memory: 434380K/458752K available (6971K kernel code, 635K rwdata, 
> > > 2080K rodata, 464K init, 797K bss, 16180K reserved, 8192K cma-reserved)
> > >...
> > > 
> > > Signed-off-by: Philippe Mathieu-Daudé 
> > > ---
> > >   hw/arm/raspi.c | 13 +
> > >   1 file changed, 13 insertions(+)
> > > 
> > > diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
> > > index 3537a329ac..2d9f4e3085 100644
> > > --- a/hw/arm/raspi.c
> > > +++ b/hw/arm/raspi.c
> > > @@ -324,6 +324,15 @@ static void 
> > > raspi_machine_class_common_init(MachineClass *mc,
> > >   mc->default_ram_size = board_ram_size(board_rev);
> > >   };
> > > +static void raspi1b_machine_class_init(ObjectClass *oc, void *data)
> > > +{
> > > +MachineClass *mc = MACHINE_CLASS(oc);
> > > +RaspiMachineClass *rmc = RASPI_MACHINE_CLASS(oc);
> > > +
> > > +rmc->board_rev = 0x900032;
> > > +raspi_machine_class_common_init(mc, rmc->board_rev);
> > > +};
> > > +
> > >   static void raspi2b_machine_class_init(ObjectClass *oc, void *data)
> > >   {
> > >   MachineClass *mc = MACHINE_CLASS(oc);
> > > @@ -348,6 +357,10 @@ static void raspi3b_machine_class_init(ObjectClass 
> > > *oc, void *data)
> > >   static const TypeInfo raspi_machine_types[] = {
> > >   {
> > > +.name   = MACHINE_TYPE_NAME("raspi1b"),
> > If it's the B+ model, why not call it raspi1b+ ?
> 
> I thought about it (and prefer it), but I'm not sure this can have some
> side-effect.
> 
> Eduardo, Igor, is that OK to use a '+' in a machine name?

It would probably work, but I would prefer to avoid that.

> 
> So far the names used match [a-zA-Z0-9-].
> 
> > 
> > > +.parent = TYPE_RASPI_MACHINE,
> > > +.class_init = raspi1b_machine_class_init,
> > > +}, {
> > >   .name   = MACHINE_TYPE_NAME("raspi2b"),
> > >   .parent = TYPE_RASPI_MACHINE,
> > >   .class_init = raspi2b_machine_class_init,
> > > 
> > 
> 

-- 
Eduardo

Re: [PATCH v1] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-02-21 Thread Stefan Hajnoczi

On Fri, Feb 21, 2020, 17:50 Dr. David Alan Gilbert 
wrote:

> * Stefan Hajnoczi (stefa...@gmail.com) wrote:
> > On Fri, Feb 21, 2020 at 3:36 PM Andrzej Jakowski
> >  wrote:
> > > On 2/21/20 6:45 AM, Stefan Hajnoczi wrote:
> > > > Why is msync(2) done on memory loads instead of stores?
> > >
> > > This is my interpretation of NVMe spec wording with regards to PMRWBM
> field
> > > which says:
> > >
> > > "The completion of a memory read from any Persistent
> > > Memory Region address ensures that all prior writes to the
> > > Persistent Memory Region have completed and are
> > > persistent."
> >
> > Thanks, I haven't read the PMR section of the spec :).
> >
> > A synchronous operation is bad for virtualization performance.  While
> > the sync may be a cheap operation in hardware, it can be arbitrarily
> > expensive with msync(2).  The vCPU will be stuck until msync(2)
> > completes on the host.
> >
> > It's also a strange design choice since performance will suffer when
> > an unrelated read has to wait for writes to complete.  This is
> > especially problematic for multi-threaded applications or multi-core
> > systems where I guess this case is hit frequently.  Maybe it's so
> > cheap in hardware that it doesn't matter?  But then why didn't NVDIMM
> > use this mechanism?
> >
> > If anyone knows the answer I'd be interested in learning.  But this
> > isn't a criticism of the patch - of course it needs to implement the
> > hardware spec and we can't change it.
>
> Is this coming from the underlying PCIe spec ?
> In PCIe Base 4.0 Rev 0.3 Feb19-2014, section 2.4.1 Transaction ordering,
> there's a Table 2-39 and entry B2a in that table is:
>
>
>   'A Read Request must not pass a Posted Request unless B2b applies.'
>
> and a posted request is defined as a 'Memory Write Request or a Message
> Request'.
>
> ???
>

No, that relates to transaction ordering in PCI, not data persistence in
PMR.  PMR can define whatever persistence semantics it wants.

The completion of the write transaction at the PCI level does not mean data
has to be persistent at the PMR level.

Stefan

>

Re: [QUESTION] Clang as QEMU compiler - two questions

On Fri, 21 Feb 2020 at 18:09, Aleksandar Markovic
 wrote:
> 1) Is ./configure --cc=clang the correct way to enable clang
> compilation of QEMU? Any other gotchas?

You'll also want "--cxx=clang++", so the (few) C++ parts of
QEMU get compiled with the same toolchain.

It should just work -- clang is our default compiler for OSX
and is included in the CI and pre-merge tests on Linux.

thanks
-- PMM

Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC

On Fri, 21 Feb 2020 at 18:04, BALATON Zoltan  wrote:
> On Fri, 21 Feb 2020, Peter Maydell wrote:
> > I think that is the wrong approach. Enabling use of the host
> > FPU should not affect the accuracy of the emulation, which
> > should remain bitwise-correct. We should only be using the
> > host FPU to the extent that we can do that without discarding
> > accuracy. As far as I'm aware that's how the hardfloat support
> > for other guest CPUs that use it works.
>
> I don't know of a better approach. Please see section 4.2.2 Floating-Point
> Status and Control Register on page 124 in this document:
>
> https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
>
> especially the definition of the FR and FI bits and tell me how can we
> emulate these accurately and use host FPU.

I don't know much about PPC, but if you can't emulate the
guest architecture accurately with the host FPU, then
don't use the host FPU. We used to have a kind of 'hardfloat'
support that was fast but inaccurate, but it was a mess
because it meant that most guest code sort of worked but
some guest code would confusingly misbehave. Deliberately
not correctly emulating the guest CPU/FPU behaviour is not
something I want us to return to.

You're right that sometimes you can't get both speed
and accuracy; other emulators (and especially ones
which are trying to emulate games consoles) may choose
to prefer speed over accuracy. For QEMU we prefer to
choose accuracy over speed in this area.

thanks
-- PMM

Re: [GSoC/Outreachy] Arduino complete setup visualization and emulation

2020-02-21 Thread Joaquin de Andres

On 2/21/20 12:14 PM, Philippe Mathieu-Daudé wrote:
> On 2/21/20 11:56 AM, Stefan Hajnoczi wrote:
>> On Tue, Feb 11, 2020 at 10:51:19AM +, Stefan Hajnoczi wrote:
>>> On Mon, Feb 10, 2020 at 08:58:28PM +0100, Philippe Mathieu-Daudé wrote:
...
> [*] Test with the preset arduino examples (### TODO add references ###)
> 
> - Basic: "Blink: Turn a LED on and off."
> - Analog: "Fading: Use an analog output (PWM pin) to dim a LED."
> - Analog: "Analog Input: Use a potentiometer to control the flashing
>   of a LED."
> 

[*] Test with the preset Arduino examples [0]

- Basic: "Blink: Turn a LED on and off." [1]
- Analog: "Fading: Use an analog output (PWM pin) to dim a LED." [2]
- Analog: "Analog Input: Use a potentiometer to control the blinking
  of a LED." [3]

[0] https://www.arduino.cc/en/Tutorial/BuiltInExamples
[1] https://www.arduino.cc/en/Tutorial/Blink
[2] https://www.arduino.cc/en/Tutorial/Fading
[3] https://www.arduino.cc/en/Tutorial/AnalogInput

--Joa

Re: [PATCH] hw/char/pl011: Output characters using best-effort mode

On 21/02/20 14:14, Peter Maydell wrote:
> The initial case reported by Gavin in this thread is
> "-serial tcp:127.0.0.1:50900" with the other end being a program which
> listens on TCP port 50900 and then sleeps without accepting any incoming
> connections, which blocks the serial port output and effectively blocks
> the guest bootup. If you want to insulate the guest from badly
> behaved consumers like that (or the related consumer who accepts
> the connection and then just doesn't read data from it) you probably
> need to deal with more than just POLLHUP. But I'm not sure how much
> we should care about these cases as opposed to just telling users
> not to do that...

No, I think we don't do anything (on purpose; that is, it was considered
the lesser evil) for x86 in that case.

Paolo

[QUESTION] Clang as QEMU compiler - two questions

2020-02-21 Thread Aleksandar Markovic

Hello, all

I have two questions:

1) Is ./configure --cc=clang the correct way to enable clang
compilation of QEMU? Any other gotchas?

2) Did somebody compare QEMU compiled by gcc vs. QEMU compiled with
clang (in any sense: executable size, performance, etc.)?

Thanks in advance,
Aleksandar

Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC

2020-02-21 Thread BALATON Zoltan

On Fri, 21 Feb 2020, Peter Maydell wrote:

On Fri, 21 Feb 2020 at 16:05, BALATON Zoltan wrote:

On Thu, 20 Feb 2020, Richard Henderson wrote:

On 2/18/20 9:10 AM, BALATON Zoltan wrote:

+DEFINE_PROP_BOOL("hardfloat", PowerPCCPU, hardfloat, true),

I would also prefer a different name here -- perhaps x-no-fp-fi.

What's wrong with hardfloat? That's how the code refers to this so if
anyone searches what it does would turn up some meaningful results.

This prompted me to check what you're using the property for.
The cover letter says:

This patch implements a simple way to keep the inexact flag set for
hardfloat while still allowing to revert to softfloat for workloads
that need more accurate albeit slower emulation. (Set hardfloat
property of CPU, i.e. -cpu name,hardfloat=false for that.)

I think that is the wrong approach. Enabling use of the host
FPU should not affect the accuracy of the emulation, which
should remain bitwise-correct. We should only be using the
host FPU to the extent that we can do that without discarding
accuracy. As far as I'm aware that's how the hardfloat support
for other guest CPUs that use it works.

I don't know of a better approach. Please see section 4.2.2 Floating-Point
Status and Control Register on page 124 in this document:

https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0

especially the definition of the FR and FI bits and tell me how can we
emulate these accurately and use host FPU. Not using the FPU even when
these bits are not needed (which seems to be the case for all workloads
we've tested so far) seriously limits the emulation speed so spending time
to emulate obscure and unused part of an architecture when not actually
needed just to keep emulation accurate but unusably slow does not seem to
be the right approach. In an ideal world of course this should be both
fast and accurate but we don't seem to have anyone who could achieve that
in past two years so maybe we could give up some accuracy now to get
usable speed and worry about emulating obscure features when we come
across some workload that actually needs it (but we have the option to
revert to accurate but slow emulation for that until a better way can be
devised that's both fast and accurate). Insisting on accuracy without any
solution to current state just hinders making any progress with this.

Other PowerPC emulators also seem to not bother or have similar
optimisation. I've quickly checked three that I know about:

https://github.com/mamedev/mame/blob/master/src/devices/cpu/powerpc/ppcdrc.cpp#L1893
https://github.com/mamedev/mame/blob/master/src/devices/cpu/powerpc/ppcdrc.cpp#L3503
there's also something here but no mention of FI bit I could notice:
https://github.com/mamedev/mame/blob/master/src/devices/cpu/powerpc/ppccom.cpp#L2023

https://github.com/xenia-project/xenia/blob/master/src/xenia/cpu/ppc/ppc_hir_builder.cc#L428

https://github.com/dolphin-emu/dolphin/blob/master/Source/Core/Core/PowerPC/Jit64/Jit_FloatingPoint.cpp

But I'm not sure I understand all of the above so hope this makes more
sense to someone and can advise.

Regards,
BALATON Zoltan

Re: [PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating

2020-02-21 Thread Peter Xu

On Fri, Feb 21, 2020 at 05:41:51PM +0100, David Hildenbrand wrote:
> I was now able to actually test resizing while migrating. I am using the
> prototype of virtio-mem to test (which also makes use of resizable
> allocations). Things I was able to reproduce:

The test cases cover quite a lot.  Thanks for doing that.

> - Resize while still running on the migration source. Migration is canceled
> -- Test case for "migraton/ram: Handle RAM block resizes during precopy"

> - Resize (grow+shrink) on the migration target during postcopy migration
>   (when syncing RAM blocks), while not yet running on the target
> -- Test case for "migration/ram: Discard new RAM when growing RAM blocks
>and the VM is stopped", and overall RAM size synchronization. Seems to
>work just fine.

This won't be able to trigger without virtio-mem, right?

And I'm also curious on how to test this even with virtio-mem.  Is
that a QMP command to extend/shrink virtio-mem?

> - Resize (grow+shrink) on the migration tagret during postcopy migration
>   while already running on the target.
> -- Test case for "migration/ram: Handle RAM block resizes during postcopy"
> -- Test case for "migration/ram: Tolerate partially changed mappings in
>postcopy code" - I can see that -ENOENT is actually triggered and that
>migration succeeds. Migration seems to work just fine.

Thanks,

-- 
Peter Xu

Re: [PATCH v4 04/16] hw/i386: Introduce init_topo_info to initialize X86CPUTopoInfo

2020-02-21 Thread Babu Moger




On 2/21/20 11:05 AM, Igor Mammedov wrote:
> On Thu, 13 Feb 2020 12:16:51 -0600
> Babu Moger  wrote:
> 
>> Initialize all the parameters in one function init_topo_info.
> 
> is it possible to squash it in 2/16
> 
Sure. We can do that.
> 
>>
>> Move the data structure X86CPUTopoIDs and X86CPUTopoInfo into
>> x86.h.
> A reason why it's moved should be here.

Apicid functions will be part of X86MachineState data structure(patches
introduced later).These functions will use X86CPUTopoIDs and
X86CPUTopoInfo definition. Will add these details. Thanks

> 
>>
>> Signed-off-by: Babu Moger 
>> Reviewed-by: Eduardo Habkost 
>> ---
>>  hw/i386/pc.c   |4 +---
>>  hw/i386/x86.c  |   14 +++---
>>  include/hw/i386/topology.h |   26 ++
>>  include/hw/i386/x86.h  |   17 +
>>  4 files changed, 31 insertions(+), 30 deletions(-)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 2adf7f6afa..9803413dd9 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -1749,9 +1749,7 @@ static void pc_cpu_pre_plug(HotplugHandler 
>> *hotplug_dev,
>>  return;
>>  }
>>  
>> -topo_info.dies_per_pkg = x86ms->smp_dies;
>> -topo_info.cores_per_die = smp_cores;
>> -topo_info.threads_per_core = smp_threads;
>> +init_topo_info(_info, x86ms);
>>  
>>  env->nr_dies = x86ms->smp_dies;
>>  
>> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
>> index f18cab8e5c..083effb2f5 100644
>> --- a/hw/i386/x86.c
>> +++ b/hw/i386/x86.c
>> @@ -63,15 +63,12 @@ static size_t pvh_start_addr;
>>  uint32_t x86_cpu_apic_id_from_index(X86MachineState *x86ms,
>>  unsigned int cpu_index)
>>  {
>> -MachineState *ms = MACHINE(x86ms);
>>  X86MachineClass *x86mc = X86_MACHINE_GET_CLASS(x86ms);
>>  X86CPUTopoInfo topo_info;
>>  uint32_t correct_id;
>>  static bool warned;
>>  
>> -topo_info.dies_per_pkg = x86ms->smp_dies;
>> -topo_info.cores_per_die = ms->smp.cores;
>> -topo_info.threads_per_core = ms->smp.threads;
>> +init_topo_info(_info, x86ms);
>>  
>>  correct_id = x86_apicid_from_cpu_idx(_info, cpu_index);
>>  if (x86mc->compat_apic_id_mode) {
>> @@ -146,10 +143,7 @@ int64_t x86_get_default_cpu_node_id(const MachineState 
>> *ms, int idx)
>> X86MachineState *x86ms = X86_MACHINE(ms);
>> X86CPUTopoInfo topo_info;
>>  
>> -   topo_info.dies_per_pkg = x86ms->smp_dies;
>> -   topo_info.cores_per_die = ms->smp.cores;
>> -   topo_info.threads_per_core = ms->smp.threads;
>> -
>> +   init_topo_info(_info, x86ms);
>>  
>> assert(idx < ms->possible_cpus->len);
>> x86_topo_ids_from_apicid(ms->possible_cpus->cpus[idx].arch_id,
>> @@ -177,9 +171,7 @@ const CPUArchIdList 
>> *x86_possible_cpu_arch_ids(MachineState *ms)
>>sizeof(CPUArchId) * max_cpus);
>>  ms->possible_cpus->len = max_cpus;
>>  
>> -topo_info.dies_per_pkg = x86ms->smp_dies;
>> -topo_info.cores_per_die = ms->smp.cores;
>> -topo_info.threads_per_core = ms->smp.threads;
>> +init_topo_info(_info, x86ms);
>>  
>>  for (i = 0; i < ms->possible_cpus->len; i++) {
>>  X86CPUTopoIDs topo_ids;
>> diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
>> index ba52d49079..ef0ab0b6a3 100644
>> --- a/include/hw/i386/topology.h
>> +++ b/include/hw/i386/topology.h
>> @@ -40,23 +40,17 @@
>>  
>>  
>>  #include "qemu/bitops.h"
>> +#include "hw/i386/x86.h"
>>  
>> -/* APIC IDs can be 32-bit, but beware: APIC IDs > 255 require x2APIC support
>> - */
>> -typedef uint32_t apic_id_t;
>> -
>> -typedef struct X86CPUTopoIDs {
>> -unsigned pkg_id;
>> -unsigned die_id;
>> -unsigned core_id;
>> -unsigned smt_id;
>> -} X86CPUTopoIDs;
>> -
>> -typedef struct X86CPUTopoInfo {
>> -unsigned dies_per_pkg;
>> -unsigned cores_per_die;
>> -unsigned threads_per_core;
>> -} X86CPUTopoInfo;
>> +static inline void init_topo_info(X86CPUTopoInfo *topo_info,
>> +  const X86MachineState *x86ms)
>> +{
>> +MachineState *ms = MACHINE(x86ms);
>> +
>> +topo_info->dies_per_pkg = x86ms->smp_dies;
>> +topo_info->cores_per_die = ms->smp.cores;
>> +topo_info->threads_per_core = ms->smp.threads;
>> +}
>>  
>>  /* Return the bit width needed for 'count' IDs
>>   */
>> diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
>> index 4b84917885..ad62b01cf2 100644
>> --- a/include/hw/i386/x86.h
>> +++ b/include/hw/i386/x86.h
>> @@ -36,6 +36,23 @@ typedef struct {
>>  bool compat_apic_id_mode;
>>  } X86MachineClass;
>>  
>> +/* APIC IDs can be 32-bit, but beware: APIC IDs > 255 require x2APIC support
>> + */
>> +typedef uint32_t apic_id_t;
>> +
>> +typedef struct X86CPUTopoIDs {
>> +unsigned pkg_id;
>> +unsigned die_id;
>> +unsigned core_id;
>> +unsigned smt_id;
>> +} X86CPUTopoIDs;
>> +
>> +typedef struct X86CPUTopoInfo {
>> +unsigned dies_per_pkg;
>> +unsigned cores_per_die;
>> +

Re: [PATCH v1] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-02-21 Thread Dr. David Alan Gilbert

* Stefan Hajnoczi (stefa...@gmail.com) wrote:
> On Fri, Feb 21, 2020 at 3:36 PM Andrzej Jakowski
>  wrote:
> > On 2/21/20 6:45 AM, Stefan Hajnoczi wrote:
> > > Why is msync(2) done on memory loads instead of stores?
> >
> > This is my interpretation of NVMe spec wording with regards to PMRWBM field
> > which says:
> >
> > "The completion of a memory read from any Persistent
> > Memory Region address ensures that all prior writes to the
> > Persistent Memory Region have completed and are
> > persistent."
> 
> Thanks, I haven't read the PMR section of the spec :).
> 
> A synchronous operation is bad for virtualization performance.  While
> the sync may be a cheap operation in hardware, it can be arbitrarily
> expensive with msync(2).  The vCPU will be stuck until msync(2)
> completes on the host.
> 
> It's also a strange design choice since performance will suffer when
> an unrelated read has to wait for writes to complete.  This is
> especially problematic for multi-threaded applications or multi-core
> systems where I guess this case is hit frequently.  Maybe it's so
> cheap in hardware that it doesn't matter?  But then why didn't NVDIMM
> use this mechanism?
> 
> If anyone knows the answer I'd be interested in learning.  But this
> isn't a criticism of the patch - of course it needs to implement the
> hardware spec and we can't change it.

Is this coming from the underlying PCIe spec ?
In PCIe Base 4.0 Rev 0.3 Feb19-2014, section 2.4.1 Transaction ordering,
there's a Table 2-39 and entry B2a in that table is:


  'A Read Request must not pass a Posted Request unless B2b applies.'

and a posted request is defined as a 'Memory Write Request or a Message
Request'.

???

Dave

> Stefan
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH 2/7] hw/arm: Let devices own the MemoryRegion they create

On Fri, 21 Feb 2020 at 17:31, Philippe Mathieu-Daudé  wrote:
>
> To avoid orphean memory regions being added in the /unattached

("orphan")

> QOM container, use the memory_region_owner_nonnull.cocci script
> to set the correct ownership.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/arm/exynos4210.c| 14 +++---
>  hw/arm/fsl-imx25.c | 14 +++---
>  hw/arm/fsl-imx31.c | 10 +-
>  hw/arm/fsl-imx6.c  | 10 +-
>  hw/arm/fsl-imx6ul.c| 14 +++---
>  hw/arm/msf2-soc.c  |  8 
>  hw/arm/nrf51_soc.c |  2 +-
>  hw/arm/stm32f205_soc.c | 10 +-
>  hw/arm/stm32f405_soc.c | 13 +++--
>  hw/arm/xlnx-zynqmp.c   | 13 ++---
>  10 files changed, 54 insertions(+), 54 deletions(-)
>
> diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
> index 59a27bdd68..d4b05336ee 100644
> --- a/hw/arm/exynos4210.c
> +++ b/hw/arm/exynos4210.c
> @@ -305,28 +305,28 @@ static void exynos4210_realize(DeviceState *socdev, 
> Error **errp)
>  /*** Memory ***/
>
>  /* Chip-ID and OMR */
> -memory_region_init_io(>chipid_mem, NULL, 
> _chipid_and_omr_ops,
> -NULL, "exynos4210.chipid", sizeof(chipid_and_omr));
> +memory_region_init_io(>chipid_mem, OBJECT(socdev),
> +  _chipid_and_omr_ops, NULL,
> +  "exynos4210.chipid", sizeof(chipid_and_omr));
>  memory_region_add_subregion(system_mem, EXYNOS4210_CHIPID_ADDR,
>  >chipid_mem);
>
>  /* Internal ROM */
> -memory_region_init_ram(>irom_mem, NULL, "exynos4210.irom",
> +memory_region_init_ram(>irom_mem, OBJECT(socdev), "exynos4210.irom",
> EXYNOS4210_IROM_SIZE, _fatal);

I have a feeling that the owner of a RAM memory region affects
the name we use for the underlying ramblock, which in turn
is used in the on-the-wire data stream in migration, which means
that changing ownership breaks migration compatibility.
That's probably OK in most cases as we don't care too much
about migration-compat on most boards, but I think it does
mean that you want to keep those changes separated out from
the ones for IO and alias regions, which I think shouldn't
have visible effects.

thanks
-- PMM

Re: [PATCH v7 01/11] qapi/error: add (Error **errp) cleaning APIs

2020-02-21 Thread Vladimir Sementsov-Ogievskiy


21.02.2020 19:34, Markus Armbruster wrote:

Vladimir Sementsov-Ogievskiy  writes:


21.02.2020 10:38, Markus Armbruster wrote:

Vladimir Sementsov-Ogievskiy  writes:


Add functions to clean Error **errp: call corresponding Error *err
cleaning function an set pointer to NULL.

New functions:
error_free_errp
error_report_errp
warn_report_errp

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Greg Kurz 
Reviewed-by: Eric Blake 
---

CC: Eric Blake 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Greg Kurz 
CC: Stefano Stabellini 
CC: Anthony Perard 
CC: Paul Durrant 
CC: Stefan Hajnoczi 
CC: "Philippe Mathieu-Daudé" 
CC: Laszlo Ersek 
CC: Gerd Hoffmann 
CC: Stefan Berger 
CC: Markus Armbruster 
CC: Michael Roth 
CC: qemu-bl...@nongnu.org
CC: xen-de...@lists.xenproject.org

   include/qapi/error.h | 26 ++
   1 file changed, 26 insertions(+)

diff --git a/include/qapi/error.h b/include/qapi/error.h
index ad5b6e896d..d34987148d 100644
--- a/include/qapi/error.h
+++ b/include/qapi/error.h
@@ -309,6 +309,32 @@ void warn_reportf_err(Error *err, const char *fmt, ...)
   void error_reportf_err(Error *err, const char *fmt, ...)
   GCC_FMT_ATTR(2, 3);
   +/*
+ * Functions to clean Error **errp: call corresponding Error *err cleaning
+ * function, then set pointer to NULL.
+ */
+static inline void error_free_errp(Error **errp)
+{
+assert(errp && *errp);
+error_free(*errp);
+*errp = NULL;
+}
+
+static inline void error_report_errp(Error **errp)
+{
+assert(errp && *errp);
+error_report_err(*errp);
+*errp = NULL;
+}
+
+static inline void warn_report_errp(Error **errp)
+{
+assert(errp && *errp);
+warn_report_err(*errp);
+*errp = NULL;
+}
+
+
   /*
* Just like error_setg(), except you get to specify the error class.
* Note: use of error classes other than ERROR_CLASS_GENERIC_ERROR is


These appear to be unused apart from the Coccinelle script in PATCH 03.

They are used in the full "[RFC v5 000/126] error: auto propagated
local_err" series.  Options:

1. Pick a few more patches into this part I series, so these guys come
 with users.


It needs some additional effort for this series.. But it's possible. Still,
I think that we at least should not pull out patches which should be in
future series (for example from ppc or block/)..


Yes, we want to keep related stuff together.


Grepping through v5:
  for x in {warn_report_errp,error_report_errp,error_free_errp}; do echo == $x 
==; git grep -l $x | grep -v coccinelle | grep -v 'error\.h'; echo; done
== warn_report_errp ==
block/file-posix.c
hw/ppc/spapr.c
hw/ppc/spapr_caps.c
hw/ppc/spapr_irq.c
hw/vfio/pci.c
net/tap.c
qom/object.c

== error_report_errp ==
hw/block/vhost-user-blk.c
util/oslib-posix.c

== error_free_errp ==
block.c
block/qapi.c
block/sheepdog.c
block/snapshot.c
blockdev.c
chardev/char-socket.c
hw/audio/intel-hda.c
hw/core/qdev-properties.c
hw/pci-bridge/pci_bridge_dev.c
hw/pci-bridge/pcie_pci_bridge.c
hw/scsi/megasas.c
hw/scsi/mptsas.c
hw/usb/hcd-xhci.c
io/net-listener.c
migration/colo.c
qga/commands-posix.c
qga/commands-win32.c
util/qemu-sockets.c

What do you want to add?


PATCH v5 032 uses both error_report_errp() and error_free_errp().
Adding warn_report_errp() without a user is okay with me.  What do you
think?

If there are patches you consider related to 032, feel free to throw
them in.


032 is qga/commands-win32.c and util/oslib-posix.c

Seems that they are wrongly grouped into one patch.

qga/commands-win32.c matches qga/ (Michael Roth)
and  util/oslib-posix.c matches POSIX (Paolo Bonzini)

So, it should be two separate patches anyway.

For [1.] I only afraid that we'll have to wait for maintainers, who were
not interested in previous iterations, to review these new patches..





2. Punt this patch to the first part that has users, along with the
 part of the Coccinelle script that deals with them.


But coccinelle script would be wrong, if we drop this part from it. I think,
that after commit which adds coccinelle script, it should work with any file,
not only subset of these series.

So, it's probably OK for now to drop these functions, forcing their addition if
coccinelle script will be applied where these functions are needed. We can, for
example comment these three functions.

Splitting coccinelle script into two parts, which will be in different series 
will
not help any patch-porting processes.

Moreover, this will create dependencies between future series updating other 
files.

So, I don't like [2.]..


And it's likely more work than 1.


3. Do nothing: accept the functions without users.


OK for me)



I habitually dislike 3., but reviewing the rest of this series might
make me override that dislike.

[...]




--
Best regards,
Vladimir

Re: [PATCH v1] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-02-21 Thread Stefan Hajnoczi

On Fri, Feb 21, 2020 at 3:36 PM Andrzej Jakowski
 wrote:
> On 2/21/20 6:45 AM, Stefan Hajnoczi wrote:
> > Why is msync(2) done on memory loads instead of stores?
>
> This is my interpretation of NVMe spec wording with regards to PMRWBM field
> which says:
>
> "The completion of a memory read from any Persistent
> Memory Region address ensures that all prior writes to the
> Persistent Memory Region have completed and are
> persistent."

Thanks, I haven't read the PMR section of the spec :).

A synchronous operation is bad for virtualization performance.  While
the sync may be a cheap operation in hardware, it can be arbitrarily
expensive with msync(2).  The vCPU will be stuck until msync(2)
completes on the host.

It's also a strange design choice since performance will suffer when
an unrelated read has to wait for writes to complete.  This is
especially problematic for multi-threaded applications or multi-core
systems where I guess this case is hit frequently.  Maybe it's so
cheap in hardware that it doesn't matter?  But then why didn't NVDIMM
use this mechanism?

If anyone knows the answer I'd be interested in learning.  But this
isn't a criticism of the patch - of course it needs to implement the
hardware spec and we can't change it.

Stefan

[PATCH 7/7] hw/riscv: Let devices own the MemoryRegion they create

To avoid orphean memory regions being added in the /unattached
QOM container, use the memory_region_owner_nonnull.cocci script
to set the correct ownership.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/riscv/sifive_e.c | 8 
 hw/riscv/sifive_u.c | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c
index 8a6b0348df..583015c247 100644
--- a/hw/riscv/sifive_e.c
+++ b/hw/riscv/sifive_e.c
@@ -145,8 +145,8 @@ static void riscv_sifive_e_soc_realize(DeviceState *dev, 
Error **errp)
 _abort);
 
 /* Mask ROM */
-memory_region_init_rom(>mask_rom, NULL, "riscv.sifive.e.mrom",
-memmap[SIFIVE_E_MROM].size, _fatal);
+memory_region_init_rom(>mask_rom, OBJECT(dev), "riscv.sifive.e.mrom",
+   memmap[SIFIVE_E_MROM].size, _fatal);
 memory_region_add_subregion(sys_mem,
 memmap[SIFIVE_E_MROM].base, >mask_rom);
 
@@ -208,8 +208,8 @@ static void riscv_sifive_e_soc_realize(DeviceState *dev, 
Error **errp)
 memmap[SIFIVE_E_PWM2].base, memmap[SIFIVE_E_PWM2].size);
 
 /* Flash memory */
-memory_region_init_ram(>xip_mem, NULL, "riscv.sifive.e.xip",
-memmap[SIFIVE_E_XIP].size, _fatal);
+memory_region_init_ram(>xip_mem, OBJECT(dev), "riscv.sifive.e.xip",
+   memmap[SIFIVE_E_XIP].size, _fatal);
 memory_region_set_readonly(>xip_mem, true);
 memory_region_add_subregion(sys_mem, memmap[SIFIVE_E_XIP].base,
 >xip_mem);
diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 0e12b3ccef..592830a3a2 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -497,7 +497,7 @@ static void riscv_sifive_u_soc_realize(DeviceState *dev, 
Error **errp)
  _abort);
 
 /* boot rom */
-memory_region_init_rom(mask_rom, NULL, "riscv.sifive.u.mrom",
+memory_region_init_rom(mask_rom, OBJECT(dev), "riscv.sifive.u.mrom",
memmap[SIFIVE_U_MROM].size, _fatal);
 memory_region_add_subregion(system_memory, memmap[SIFIVE_U_MROM].base,
 mask_rom);
@@ -511,7 +511,7 @@ static void riscv_sifive_u_soc_realize(DeviceState *dev, 
Error **errp)
  * leave it enabled all the time. This won't break anything, but will be
  * too generous to misbehaving guests.
  */
-memory_region_init_ram(l2lim_mem, NULL, "riscv.sifive.u.l2lim",
+memory_region_init_ram(l2lim_mem, OBJECT(dev), "riscv.sifive.u.l2lim",
memmap[SIFIVE_U_L2LIM].size, _fatal);
 memory_region_add_subregion(system_memory, memmap[SIFIVE_U_L2LIM].base,
 l2lim_mem);
-- 
2.21.1

[PATCH 2/7] hw/arm: Let devices own the MemoryRegion they create

To avoid orphean memory regions being added in the /unattached
QOM container, use the memory_region_owner_nonnull.cocci script
to set the correct ownership.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/exynos4210.c| 14 +++---
 hw/arm/fsl-imx25.c | 14 +++---
 hw/arm/fsl-imx31.c | 10 +-
 hw/arm/fsl-imx6.c  | 10 +-
 hw/arm/fsl-imx6ul.c| 14 +++---
 hw/arm/msf2-soc.c  |  8 
 hw/arm/nrf51_soc.c |  2 +-
 hw/arm/stm32f205_soc.c | 10 +-
 hw/arm/stm32f405_soc.c | 13 +++--
 hw/arm/xlnx-zynqmp.c   | 13 ++---
 10 files changed, 54 insertions(+), 54 deletions(-)

diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 59a27bdd68..d4b05336ee 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -305,28 +305,28 @@ static void exynos4210_realize(DeviceState *socdev, Error 
**errp)
 /*** Memory ***/
 
 /* Chip-ID and OMR */
-memory_region_init_io(>chipid_mem, NULL, _chipid_and_omr_ops,
-NULL, "exynos4210.chipid", sizeof(chipid_and_omr));
+memory_region_init_io(>chipid_mem, OBJECT(socdev),
+  _chipid_and_omr_ops, NULL,
+  "exynos4210.chipid", sizeof(chipid_and_omr));
 memory_region_add_subregion(system_mem, EXYNOS4210_CHIPID_ADDR,
 >chipid_mem);
 
 /* Internal ROM */
-memory_region_init_ram(>irom_mem, NULL, "exynos4210.irom",
+memory_region_init_ram(>irom_mem, OBJECT(socdev), "exynos4210.irom",
EXYNOS4210_IROM_SIZE, _fatal);
 memory_region_set_readonly(>irom_mem, true);
 memory_region_add_subregion(system_mem, EXYNOS4210_IROM_BASE_ADDR,
 >irom_mem);
 /* mirror of iROM */
-memory_region_init_alias(>irom_alias_mem, NULL, "exynos4210.irom_alias",
- >irom_mem,
- 0,
+memory_region_init_alias(>irom_alias_mem, OBJECT(socdev),
+ "exynos4210.irom_alias", >irom_mem, 0,
  EXYNOS4210_IROM_SIZE);
 memory_region_set_readonly(>irom_alias_mem, true);
 memory_region_add_subregion(system_mem, EXYNOS4210_IROM_MIRROR_BASE_ADDR,
 >irom_alias_mem);
 
 /* Internal RAM */
-memory_region_init_ram(>iram_mem, NULL, "exynos4210.iram",
+memory_region_init_ram(>iram_mem, OBJECT(socdev), "exynos4210.iram",
EXYNOS4210_IRAM_SIZE, _fatal);
 memory_region_add_subregion(system_mem, EXYNOS4210_IRAM_BASE_ADDR,
 >iram_mem);
diff --git a/hw/arm/fsl-imx25.c b/hw/arm/fsl-imx25.c
index da3471b395..a77c873cfe 100644
--- a/hw/arm/fsl-imx25.c
+++ b/hw/arm/fsl-imx25.c
@@ -247,16 +247,16 @@ static void fsl_imx25_realize(DeviceState *dev, Error 
**errp)
 }
 
 /* initialize 2 x 16 KB ROM */
-memory_region_init_rom(>rom[0], NULL,
-   "imx25.rom0", FSL_IMX25_ROM0_SIZE, );
+memory_region_init_rom(>rom[0], OBJECT(dev), "imx25.rom0",
+   FSL_IMX25_ROM0_SIZE, );
 if (err) {
 error_propagate(errp, err);
 return;
 }
 memory_region_add_subregion(get_system_memory(), FSL_IMX25_ROM0_ADDR,
 >rom[0]);
-memory_region_init_rom(>rom[1], NULL,
-   "imx25.rom1", FSL_IMX25_ROM1_SIZE, );
+memory_region_init_rom(>rom[1], OBJECT(dev), "imx25.rom1",
+   FSL_IMX25_ROM1_SIZE, );
 if (err) {
 error_propagate(errp, err);
 return;
@@ -265,8 +265,8 @@ static void fsl_imx25_realize(DeviceState *dev, Error 
**errp)
 >rom[1]);
 
 /* initialize internal RAM (128 KB) */
-memory_region_init_ram(>iram, NULL, "imx25.iram", FSL_IMX25_IRAM_SIZE,
-   );
+memory_region_init_ram(>iram, OBJECT(dev), "imx25.iram",
+   FSL_IMX25_IRAM_SIZE, );
 if (err) {
 error_propagate(errp, err);
 return;
@@ -275,7 +275,7 @@ static void fsl_imx25_realize(DeviceState *dev, Error 
**errp)
 >iram);
 
 /* internal RAM (128 KB) is aliased over 128 MB - 128 KB */
-memory_region_init_alias(>iram_alias, NULL, "imx25.iram_alias",
+memory_region_init_alias(>iram_alias, OBJECT(dev), "imx25.iram_alias",
  >iram, 0, FSL_IMX25_IRAM_ALIAS_SIZE);
 memory_region_add_subregion(get_system_memory(), FSL_IMX25_IRAM_ALIAS_ADDR,
 >iram_alias);
diff --git a/hw/arm/fsl-imx31.c b/hw/arm/fsl-imx31.c
index 55e90d104b..d5057ea49b 100644
--- a/hw/arm/fsl-imx31.c
+++ b/hw/arm/fsl-imx31.c
@@ -206,7 +206,7 @@ static void fsl_imx31_realize(DeviceState *dev, Error 
**errp)
 }
 
 /* On a real system, the first 16k is a `secure boot rom' */
-memory_region_init_rom(>secure_rom, NULL, "imx31.secure_rom",
+

[PATCH 5/7] hw/display: Let devices own the MemoryRegion they create

To avoid orphean memory regions being added in the /unattached
QOM container, use the memory_region_owner_nonnull.cocci script
to set the correct ownership.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/display/cg3.c| 4 ++--
 hw/display/g364fb.c | 5 +++--
 hw/display/macfb.c  | 4 ++--
 hw/display/vmware_vga.c | 4 ++--
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/hw/display/cg3.c b/hw/display/cg3.c
index 4fb67c6b1c..db2e291d16 100644
--- a/hw/display/cg3.c
+++ b/hw/display/cg3.c
@@ -315,8 +315,8 @@ static void cg3_realizefn(DeviceState *dev, Error **errp)
 }
 }
 
-memory_region_init_ram(>vram_mem, NULL, "cg3.vram", s->vram_size,
-   _fatal);
+memory_region_init_ram(>vram_mem, OBJECT(dev), "cg3.vram",
+   s->vram_size, _fatal);
 memory_region_set_log(>vram_mem, true, DIRTY_MEMORY_VGA);
 sysbus_init_mmio(sbd, >vram_mem);
 
diff --git a/hw/display/g364fb.c b/hw/display/g364fb.c
index 55185c95c6..28785cceda 100644
--- a/hw/display/g364fb.c
+++ b/hw/display/g364fb.c
@@ -477,8 +477,9 @@ static void g364fb_init(DeviceState *dev, G364State *s)
 
 s->con = graphic_console_init(dev, 0, _ops, s);
 
-memory_region_init_io(>mem_ctrl, NULL, _ctrl_ops, s, "ctrl", 
0x18);
-memory_region_init_ram_ptr(>mem_vram, NULL, "vram",
+memory_region_init_io(>mem_ctrl, OBJECT(dev), _ctrl_ops, s,
+  "ctrl", 0x18);
+memory_region_init_ram_ptr(>mem_vram, OBJECT(dev), "vram",
s->vram_size, s->vram);
 vmstate_register_ram(>mem_vram, dev);
 memory_region_set_log(>mem_vram, true, DIRTY_MEMORY_VGA);
diff --git a/hw/display/macfb.c b/hw/display/macfb.c
index 8bff16d535..b68faff4bb 100644
--- a/hw/display/macfb.c
+++ b/hw/display/macfb.c
@@ -362,8 +362,8 @@ static void macfb_common_realize(DeviceState *dev, 
MacfbState *s, Error **errp)
 return;
 }
 
-memory_region_init_io(>mem_ctrl, NULL, _ctrl_ops, s, "macfb-ctrl",
-  0x1000);
+memory_region_init_io(>mem_ctrl, OBJECT(dev), _ctrl_ops, s,
+  "macfb-ctrl", 0x1000);
 
 memory_region_init_ram_nomigrate(>mem_vram, OBJECT(s), "macfb-vram",
  MACFB_VRAM_SIZE, errp);
diff --git a/hw/display/vmware_vga.c b/hw/display/vmware_vga.c
index 58ea82e3e5..79574e3c46 100644
--- a/hw/display/vmware_vga.c
+++ b/hw/display/vmware_vga.c
@@ -1238,8 +1238,8 @@ static void vmsvga_init(DeviceState *dev, struct 
vmsvga_state_s *s,
 s->vga.con = graphic_console_init(dev, 0, _ops, s);
 
 s->fifo_size = SVGA_FIFO_SIZE;
-memory_region_init_ram(>fifo_ram, NULL, "vmsvga.fifo", s->fifo_size,
-   _fatal);
+memory_region_init_ram(>fifo_ram, OBJECT(dev), "vmsvga.fifo",
+   s->fifo_size, _fatal);
 s->fifo_ptr = memory_region_get_ram_ptr(>fifo_ram);
 
 vga_common_init(>vga, OBJECT(dev));
-- 
2.21.1

[PATCH 6/7] hw/dma: Let devices own the MemoryRegion they create

To avoid orphean memory regions being added in the /unattached
QOM container, use the memory_region_owner_nonnull.cocci script
to set the correct ownership.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/dma/i8257.c  | 2 +-
 hw/dma/rc4030.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/dma/i8257.c b/hw/dma/i8257.c
index bad8debae4..ef15c06d77 100644
--- a/hw/dma/i8257.c
+++ b/hw/dma/i8257.c
@@ -553,7 +553,7 @@ static void i8257_realize(DeviceState *dev, Error **errp)
 I8257State *d = I8257(dev);
 int i;
 
-memory_region_init_io(>channel_io, NULL, _io_ops, d,
+memory_region_init_io(>channel_io, OBJECT(dev), _io_ops, d,
   "dma-chan", 8 << d->dshift);
 memory_region_add_subregion(isa_address_space_io(isa),
 d->base, >channel_io);
diff --git a/hw/dma/rc4030.c b/hw/dma/rc4030.c
index c4cf8236f4..f62eb3d2a3 100644
--- a/hw/dma/rc4030.c
+++ b/hw/dma/rc4030.c
@@ -679,9 +679,9 @@ static void rc4030_realize(DeviceState *dev, Error **errp)
 s->periodic_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
  rc4030_periodic_timer, s);
 
-memory_region_init_io(>iomem_chipset, NULL, _ops, s,
+memory_region_init_io(>iomem_chipset, o, _ops, s,
   "rc4030.chipset", 0x300);
-memory_region_init_io(>iomem_jazzio, NULL, _ops, s,
+memory_region_init_io(>iomem_jazzio, o, _ops, s,
   "rc4030.jazzio", 0x1000);
 
 memory_region_init_iommu(>dma_mr, sizeof(s->dma_mr),
-- 
2.21.1

[PATCH 3/7] hw/char: Let devices own the MemoryRegion they create

To avoid orphean memory regions being added in the /unattached
QOM container, use the memory_region_owner_nonnull.cocci script
to set the correct ownership.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/char/serial.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/char/serial.c b/hw/char/serial.c
index 9298881af9..2ab8b69e03 100644
--- a/hw/char/serial.c
+++ b/hw/char/serial.c
@@ -997,7 +997,7 @@ static void serial_io_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
-memory_region_init_io(>io, NULL, _io_ops, s, "serial", 8);
+memory_region_init_io(>io, OBJECT(dev), _io_ops, s, "serial", 8);
 sysbus_init_mmio(SYS_BUS_DEVICE(sio), >io);
 sysbus_init_irq(SYS_BUS_DEVICE(sio), >irq);
 }
@@ -1106,8 +1106,9 @@ static void serial_mm_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
-memory_region_init_io(>io, NULL, _mm_ops[smm->endianness], smm,
-  "serial", 8 << smm->regshift);
+memory_region_init_io(>io, OBJECT(dev),
+  _mm_ops[smm->endianness], smm, "serial",
+  8 << smm->regshift);
 sysbus_init_mmio(SYS_BUS_DEVICE(smm), >io);
 sysbus_init_irq(SYS_BUS_DEVICE(smm), >serial.irq);
 }
-- 
2.21.1

[PATCH 4/7] hw/core: Let devices own the MemoryRegion they create

To avoid orphean memory regions being added in the /unattached
QOM container, use the memory_region_owner_nonnull.cocci script
to set the correct ownership.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/core/platform-bus.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/core/platform-bus.c b/hw/core/platform-bus.c
index 22c5f76dd0..d494e5cec1 100644
--- a/hw/core/platform-bus.c
+++ b/hw/core/platform-bus.c
@@ -187,7 +187,8 @@ static void platform_bus_realize(DeviceState *dev, Error 
**errp)
 d = SYS_BUS_DEVICE(dev);
 pbus = PLATFORM_BUS_DEVICE(dev);
 
-memory_region_init(>mmio, NULL, "platform bus", pbus->mmio_size);
+memory_region_init(>mmio, OBJECT(dev), "platform bus",
+   pbus->mmio_size);
 sysbus_init_mmio(d, >mmio);
 
 pbus->used_irqs = bitmap_new(pbus->num_irqs);
-- 
2.21.1

[PATCH 0/7] hw: Let devices own the MemoryRegion they create

When a device creates a MemoryRegion without setting its ownership,
the MemoryRegion is added to the machine "/unattached" container in
the QOM tree.

Add a script to do automatically let the device take the ownership
of the memory regions it creates, and run it.

Philippe Mathieu-Daudé (7):
  scripts/coccinelle: Add a script to let devices own their
MemoryRegions
  hw/arm: Let devices own the MemoryRegion they create
  hw/char: Let devices own the MemoryRegion they create
  hw/core: Let devices own the MemoryRegion they create
  hw/display: Let devices own the MemoryRegion they create
  hw/dma: Let devices own the MemoryRegion they create
  hw/riscv: Let devices own the MemoryRegion they create

 hw/arm/exynos4210.c   | 14 ++--
 hw/arm/fsl-imx25.c| 14 ++--
 hw/arm/fsl-imx31.c| 10 +--
 hw/arm/fsl-imx6.c | 10 +--
 hw/arm/fsl-imx6ul.c   | 14 ++--
 hw/arm/msf2-soc.c |  8 +-
 hw/arm/nrf51_soc.c|  2 +-
 hw/arm/stm32f205_soc.c| 10 +--
 hw/arm/stm32f405_soc.c| 13 +--
 hw/arm/xlnx-zynqmp.c  | 13 ++-
 hw/char/serial.c  |  7 +-
 hw/core/platform-bus.c|  3 +-
 hw/display/cg3.c  |  4 +-
 hw/display/g364fb.c   |  5 +-
 hw/display/macfb.c|  4 +-
 hw/display/vmware_vga.c   |  4 +-
 hw/dma/i8257.c|  2 +-
 hw/dma/rc4030.c   |  4 +-
 hw/riscv/sifive_e.c   |  8 +-
 hw/riscv/sifive_u.c   |  4 +-
 .../memory_region_owner_nonnull.cocci | 80 +++
 21 files changed, 158 insertions(+), 75 deletions(-)
 create mode 100644 scripts/coccinelle/memory_region_owner_nonnull.cocci

-- 
2.21.1

[PATCH 1/7] scripts/coccinelle: Add a script to let devices own their MemoryRegions

When a device creates a MemoryRegion without setting its ownership,
the MemoryRegion is added to the machine "/unattached" container in
the QOM tree.

Example with the Samsung SMDKC210 board:

  $ arm-softmmu/qemu-system-arm -M smdkc210 -S -monitor stdio
  (qemu) info qom-tree
  /machine (smdkc210-machine)
/unattached (container)
  /io[0] (qemu:memory-region)
  /exynos4210.dram0[0] (qemu:memory-region)
  /exynos4210.irom[0] (qemu:memory-region)
  /exynos4210.iram[0] (qemu:memory-region)
  /exynos4210.chipid[0] (qemu:memory-region)
  ...
  /device[26] (exynos4210.uart)
/exynos4210.uart[0] (qemu:memory-region)
/soc (exynos4210)
  ^
   \__ [*]

The irom/iram/chipid regions should go under 'soc' at [*].

Add a coccinelle script to do automatic semantic patching.

Signed-off-by: Philippe Mathieu-Daudé 
---
 .../memory_region_owner_nonnull.cocci | 80 +++
 1 file changed, 80 insertions(+)
 create mode 100644 scripts/coccinelle/memory_region_owner_nonnull.cocci

diff --git a/scripts/coccinelle/memory_region_owner_nonnull.cocci 
b/scripts/coccinelle/memory_region_owner_nonnull.cocci
new file mode 100644
index 00..6748a200b2
--- /dev/null
+++ b/scripts/coccinelle/memory_region_owner_nonnull.cocci
@@ -0,0 +1,80 @@
+/*
+  Usage:
+
+spatch \
+--macro-file scripts/cocci-macro-file.h \
+--sp-file scripts/coccinelle/memory_region_owner_nonnull.cocci \
+--keep-comments \
+--in-place \
+--dir .
+
+*/
+
+// Device is owner
+@@
+typedef DeviceState;
+identifier device_fn, dev, obj;
+expression E1, E2, E3, E4, E5;
+@@
+static void device_fn(DeviceState *dev, ...)
+{
+  ...
+  Object *obj = OBJECT(dev);
+  <+...
+(
+- memory_region_init(E1, NULL, E2, E3);
++ memory_region_init(E1, obj, E2, E3);
+|
+- memory_region_init_io(E1, NULL, E2, E3, E4, E5);
++ memory_region_init_io(E1, obj, E2, E3, E4, E5);
+|
+- memory_region_init_alias(E1, NULL, E2, E3, E4, E5);
++ memory_region_init_alias(E1, obj, E2, E3, E4, E5);
+|
+- memory_region_init_rom(E1, NULL, E2, E3, E4);
++ memory_region_init_rom(E1, obj, E2, E3, E4);
+|
+- memory_region_init_ram(E1, NULL, E2, E3, E4);
++ memory_region_init_ram(E1, obj, E2, E3, E4);
+|
+- memory_region_init_ram_ptr(E1, NULL, E2, E3, E4);
++ memory_region_init_ram_ptr(E1, obj, E2, E3, E4);
+|
+- memory_region_init_ram_shared_nomigrate(E1, NULL, E2, E3, E4, E5);
++ memory_region_init_ram_shared_nomigrate(E1, obj, E2, E3, E4, E5);
+)
+  ...+>
+}
+
+// Device is owner
+@@
+identifier device_fn, dev;
+expression E1, E2, E3, E4, E5;
+@@
+static void device_fn(DeviceState *dev, ...)
+{
+  <+...
+(
+- memory_region_init(E1, NULL, E2, E3);
++ memory_region_init(E1, OBJECT(dev), E2, E3);
+|
+- memory_region_init_io(E1, NULL, E2, E3, E4, E5);
++ memory_region_init_io(E1, OBJECT(dev), E2, E3, E4, E5);
+|
+- memory_region_init_alias(E1, NULL, E2, E3, E4, E5);
++ memory_region_init_alias(E1, OBJECT(dev), E2, E3, E4, E5);
+|
+- memory_region_init_rom(E1, NULL, E2, E3, E4);
++ memory_region_init_rom(E1, OBJECT(dev), E2, E3, E4);
+|
+- memory_region_init_ram(E1, NULL, E2, E3, E4);
++ memory_region_init_ram(E1, OBJECT(dev), E2, E3, E4);
+|
+- memory_region_init_ram_ptr(E1, NULL, E2, E3, E4);
++ memory_region_init_ram_ptr(E1, OBJECT(dev), E2, E3, E4);
+|
+- memory_region_init_ram_shared_nomigrate(E1, NULL, E2, E3, E4, E5);
++ memory_region_init_ram_shared_nomigrate(E1, OBJECT(dev), E2, E3, E4, E5);
+)
+  ...+>
+}
-- 
2.21.1

Re: [PATCH] WHPX: Assigning maintainer for Windows Hypervisor Platform

On 18/02/20 21:38, Sunil Muthuswamy wrote:
> Signed-off-by: Sunil Muthuswamy 
> ---
>  MAINTAINERS | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 1740a4fddc..9b3ba4e1b5 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -404,6 +404,14 @@ S: Supported
>  F: target/i386/kvm.c
>  F: scripts/kvm/vmxcap
>  
> +WHPX CPUs
> +M: Sunil Muthuswamy 
> +S: Supported
> +F: target/i386/whpx-all.c
> +F: target/i386/whp-dispatch.h
> +F: accel/stubs/whpx-stub.c
> +F: include/sysemu/whpx.h
> +
>  Guest CPU Cores (Xen)
>  -
>  X86 Xen CPUs
> 

Queued.  Patches can still go through my tree while you get up to speed.
 Thanks!

Paolo

Re: [RFC PATCH v3 00/27] Add subcluster allocation to qcow2

On 22.12.19 12:36, Alberto Garcia wrote:
> Hi,
> 
> here's the new version of the patches to add subcluster allocation
> support to qcow2.
> 
> Please refer to the cover letter of the first version for a full
> description of the patches:
> 
>https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html
> 
> This version fixes many of the problems highlighted by Max. I decided
> not to replace completely the cluster logic with subcluster logic in
> all cases because I felt that sometimes it only complicated the code.
> Let's see what you think :-)

Looks good overall. :)

So now I wonder on what your plans are after this series.  Here are some
things that come to my mind, and I wonder whether you plan to address
them or whether there are more things to do still:

- In v2, you had a patch for preallocation support with backing files.
It didn’t quite work, which is why I think you dropped it for now (why
not, it isn’t crucial).

- There is a TODO on subcluster zeroing.

- I think adding support to amend for switching extended_l2 on or off
would make sense.  But maybe it’s too complicated to be worth the effort.

- As I noted in v2, I think it’d be great if it were possible to run the
iotests with -o extended_l2=on.  But I suppose this kind of depends on
me adding data_file support to the Python tests first...

Max



signature.asc
Description: OpenPGP digital signature

Re: [RFC PATCH v3 27/27] iotests: Add tests for qcow2 images with extended L2 entries

On 22.12.19 12:37, Alberto Garcia wrote:
> Signed-off-by: Alberto Garcia 
> ---
>  tests/qemu-iotests/271 | 256 +
>  tests/qemu-iotests/271.out | 208 ++
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 465 insertions(+)
>  create mode 100755 tests/qemu-iotests/271
>  create mode 100644 tests/qemu-iotests/271.out

Currently, you’re using the reference output to verify the results.  I
find this rather difficult.

Can this not be written in a way that the test itself verifies the
results?  I realize bit manipulation in bash is hard, which is why I
wonder whether Python may be better suited for the job.

Or maybe at least there could be some way to produce a hexdump-like
result from some more abstract description on what to expect and then
compare the strings.

I suppose I can live with how it is, but I feel like I’d have to do
something in my head that could be better done by a script.

Max

signature.asc
Description: OpenPGP digital signature

Re: [PATCH v4 04/16] hw/i386: Introduce init_topo_info to initialize X86CPUTopoInfo

2020-02-21 Thread Igor Mammedov

On Thu, 13 Feb 2020 12:16:51 -0600
Babu Moger  wrote:

> Initialize all the parameters in one function init_topo_info.

is it possible to squash it in 2/16


> 
> Move the data structure X86CPUTopoIDs and X86CPUTopoInfo into
> x86.h.
A reason why it's moved should be here.

> 
> Signed-off-by: Babu Moger 
> Reviewed-by: Eduardo Habkost 
> ---
>  hw/i386/pc.c   |4 +---
>  hw/i386/x86.c  |   14 +++---
>  include/hw/i386/topology.h |   26 ++
>  include/hw/i386/x86.h  |   17 +
>  4 files changed, 31 insertions(+), 30 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 2adf7f6afa..9803413dd9 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1749,9 +1749,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>  return;
>  }
>  
> -topo_info.dies_per_pkg = x86ms->smp_dies;
> -topo_info.cores_per_die = smp_cores;
> -topo_info.threads_per_core = smp_threads;
> +init_topo_info(_info, x86ms);
>  
>  env->nr_dies = x86ms->smp_dies;
>  
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index f18cab8e5c..083effb2f5 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -63,15 +63,12 @@ static size_t pvh_start_addr;
>  uint32_t x86_cpu_apic_id_from_index(X86MachineState *x86ms,
>  unsigned int cpu_index)
>  {
> -MachineState *ms = MACHINE(x86ms);
>  X86MachineClass *x86mc = X86_MACHINE_GET_CLASS(x86ms);
>  X86CPUTopoInfo topo_info;
>  uint32_t correct_id;
>  static bool warned;
>  
> -topo_info.dies_per_pkg = x86ms->smp_dies;
> -topo_info.cores_per_die = ms->smp.cores;
> -topo_info.threads_per_core = ms->smp.threads;
> +init_topo_info(_info, x86ms);
>  
>  correct_id = x86_apicid_from_cpu_idx(_info, cpu_index);
>  if (x86mc->compat_apic_id_mode) {
> @@ -146,10 +143,7 @@ int64_t x86_get_default_cpu_node_id(const MachineState 
> *ms, int idx)
> X86MachineState *x86ms = X86_MACHINE(ms);
> X86CPUTopoInfo topo_info;
>  
> -   topo_info.dies_per_pkg = x86ms->smp_dies;
> -   topo_info.cores_per_die = ms->smp.cores;
> -   topo_info.threads_per_core = ms->smp.threads;
> -
> +   init_topo_info(_info, x86ms);
>  
> assert(idx < ms->possible_cpus->len);
> x86_topo_ids_from_apicid(ms->possible_cpus->cpus[idx].arch_id,
> @@ -177,9 +171,7 @@ const CPUArchIdList 
> *x86_possible_cpu_arch_ids(MachineState *ms)
>sizeof(CPUArchId) * max_cpus);
>  ms->possible_cpus->len = max_cpus;
>  
> -topo_info.dies_per_pkg = x86ms->smp_dies;
> -topo_info.cores_per_die = ms->smp.cores;
> -topo_info.threads_per_core = ms->smp.threads;
> +init_topo_info(_info, x86ms);
>  
>  for (i = 0; i < ms->possible_cpus->len; i++) {
>  X86CPUTopoIDs topo_ids;
> diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> index ba52d49079..ef0ab0b6a3 100644
> --- a/include/hw/i386/topology.h
> +++ b/include/hw/i386/topology.h
> @@ -40,23 +40,17 @@
>  
>  
>  #include "qemu/bitops.h"
> +#include "hw/i386/x86.h"
>  
> -/* APIC IDs can be 32-bit, but beware: APIC IDs > 255 require x2APIC support
> - */
> -typedef uint32_t apic_id_t;
> -
> -typedef struct X86CPUTopoIDs {
> -unsigned pkg_id;
> -unsigned die_id;
> -unsigned core_id;
> -unsigned smt_id;
> -} X86CPUTopoIDs;
> -
> -typedef struct X86CPUTopoInfo {
> -unsigned dies_per_pkg;
> -unsigned cores_per_die;
> -unsigned threads_per_core;
> -} X86CPUTopoInfo;
> +static inline void init_topo_info(X86CPUTopoInfo *topo_info,
> +  const X86MachineState *x86ms)
> +{
> +MachineState *ms = MACHINE(x86ms);
> +
> +topo_info->dies_per_pkg = x86ms->smp_dies;
> +topo_info->cores_per_die = ms->smp.cores;
> +topo_info->threads_per_core = ms->smp.threads;
> +}
>  
>  /* Return the bit width needed for 'count' IDs
>   */
> diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
> index 4b84917885..ad62b01cf2 100644
> --- a/include/hw/i386/x86.h
> +++ b/include/hw/i386/x86.h
> @@ -36,6 +36,23 @@ typedef struct {
>  bool compat_apic_id_mode;
>  } X86MachineClass;
>  
> +/* APIC IDs can be 32-bit, but beware: APIC IDs > 255 require x2APIC support
> + */
> +typedef uint32_t apic_id_t;
> +
> +typedef struct X86CPUTopoIDs {
> +unsigned pkg_id;
> +unsigned die_id;
> +unsigned core_id;
> +unsigned smt_id;
> +} X86CPUTopoIDs;
> +
> +typedef struct X86CPUTopoInfo {
> +unsigned dies_per_pkg;
> +unsigned cores_per_die;
> +unsigned threads_per_core;
> +} X86CPUTopoInfo;
> +
>  typedef struct {
>  /*< private >*/
>  MachineState parent;
>

Re: [PATCH] accel/kvm: Check ioctl(KVM_SET_USER_MEMORY_REGION) return value

On 21/02/20 17:33, Philippe Mathieu-Daudé wrote:
> kvm_vm_ioctl() can fail, check its return value, and log an error
> when it failed. This fixes Coverity CID 1412229:
> 
>   Unchecked return value (CHECKED_RETURN)
> 
>   check_return: Calling kvm_vm_ioctl without checking return value
> 
> Reported-by: Coverity (CID 1412229)
> Fixes: 235e8982ad3 ("support using KVM_MEM_READONLY flag for regions")
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  accel/kvm/kvm-all.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index c111312dfd..6df3a4d030 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -308,13 +308,23 @@ static int kvm_set_user_memory_region(KVMMemoryListener 
> *kml, KVMSlot *slot, boo
>  /* Set the slot size to 0 before setting the slot to the desired
>   * value. This is needed based on KVM commit 75d61fbc. */
>  mem.memory_size = 0;
> -kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, );
> +ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, );
> +if (ret < 0) {
> +goto err;
> +}
>  }
>  mem.memory_size = slot->memory_size;
>  ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, );
>  slot->old_flags = mem.flags;
> +err:
>  trace_kvm_set_user_memory(mem.slot, mem.flags, mem.guest_phys_addr,
>mem.memory_size, mem.userspace_addr, ret);
> +if (ret < 0) {
> +error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d,"
> + " start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s",
> + __func__, mem.slot, slot->start_addr,
> + (uint64_t)mem.memory_size, strerror(errno));
> +}
>  return ret;
>  }
>  
> 

Queued, thanks.

Paolo

Re: [PULL v2 00/46] target-arm queue

On Fri, 21 Feb 2020 at 16:18, Peter Maydell  wrote:
>
> v1->v2 changes: dropped the last 6 patches from rth as there's
> a problem with one of them that's too complicated to try to
> fix up.
>
> thanks
> -- PMM
>
> The following changes since commit a8c6af67e1e8d460e2c6e87070807e0a02c0fec2:
>
>   Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.0-20200221' 
> into staging (2020-02-21 14:20:42 +)
>
> are available in the Git repository at:
>
>   https://git.linaro.org/people/pmaydell/qemu-arm.git 
> tags/pull-target-arm-20200221-1
>
> for you to fetch changes up to 9eb4f58918a851fb46895fd9b7ce579afeac9d02:
>
>   target/arm: Set MVFR0.FPSP for ARMv5 cpus (2020-02-21 16:07:03 +)
>
> 
> target-arm queue:
>  * aspeed/scu: Implement chip ID register
>  * hw/misc/iotkit-secctl: Fix writing to 'PPC Interrupt Clear' register
>  * mainstone: Make providing flash images non-mandatory
>  * z2: Make providing flash images non-mandatory
>  * Fix failures to flush SVE high bits after AdvSIMD 
> INS/ZIP/UZP/TRN/TBL/TBX/EXT
>  * Minor performance improvement: spend less time recalculating hflags values
>  * Code cleanup to isar_feature function tests
>  * Implement ARMv8.1-PMU and ARMv8.4-PMU extensions
>  * Bugfix: correct handling of PMCR_EL0.LC bit
>  * Bugfix: correct definition of PMCRDP
>  * Correctly implement ACTLR2, HACTLR2
>  * allwinner: Wire up USB ports
>  * Vectorize emulation of USHL, SSHL, PMUL*
>  * xilinx_spips: Correct the number of dummy cycles for the FAST_READ_4 cmd
>  * sh4: Fix PCI ISA IO memory subregion
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0
for any user-visible changes.

-- PMM

Re: [RFC PATCH v3 26/27] qcow2: Add subcluster support to qcow2_measure()

On 22.12.19 12:37, Alberto Garcia wrote:
> Extended L2 entries are bigger than normal L2 entries so this has an
> impact on the amount of metadata needed for a qcow2 file.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2.c | 19 ---
>  1 file changed, 12 insertions(+), 7 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC

2020-02-21 Thread Aleksandar Markovic

On Fri, Feb 21, 2020 at 5:11 PM Peter Maydell  wrote:
>
> On Fri, 21 Feb 2020 at 16:05, BALATON Zoltan  wrote:
> >
> > On Thu, 20 Feb 2020, Richard Henderson wrote:
> > > On 2/18/20 9:10 AM, BALATON Zoltan wrote:
> > >> +DEFINE_PROP_BOOL("hardfloat", PowerPCCPU, hardfloat, true),
> > >
> > > I would also prefer a different name here -- perhaps x-no-fp-fi.
> >
> > What's wrong with hardfloat? That's how the code refers to this so if
> > anyone searches what it does would turn up some meaningful results.
>
> This prompted me to check what you're using the property for.
> The cover letter says:
> > This patch implements a simple way to keep the inexact flag set for
> > hardfloat while still allowing to revert to softfloat for workloads
> > that need more accurate albeit slower emulation. (Set hardfloat
> > property of CPU, i.e. -cpu name,hardfloat=false for that.)
>
> I think that is the wrong approach. Enabling use of the host
> FPU should not affect the accuracy of the emulation, which
> should remain bitwise-correct. We should only be using the
> host FPU to the extent that we can do that without discarding
> accuracy. As far as I'm aware that's how the hardfloat support
> for other guest CPUs that use it works.
>

Right, that is my understanding as well. There shouldn't be
"hardfloat" switch at all. If there is, it is either a mistake, or for
experimental or other similar purposes. In my understanding,
hardfloat feature should work entirely transparently, making
the decision whether to use host math functions (instead of
softfloat library) by itself, based on the particular execution
circumstances.

In other words, the accuracy of FP emulation should not be
compromised in absolutely any case (there should not be an
option "ok, we are happy with approximate calculations").

On top of that, artificial generating of "inexact" flag really
seems problematic (or, speaking bluntly, looks like a hack)).

Perhaps hardfloat feature needs a little bit of more work,
but not in this way.

Yours,
Aleksandar

> thanks
> -- PMM
>

Re: [PATCH] console: make QMP screendump use coroutine

Kevin Wolf  writes:

> Am 20.02.2020 um 17:01 hat Markus Armbruster geschrieben:
>> >> >  void qmp_screendump(const char *filename, bool has_device, const char 
>> >> > *device,
>> >> >  bool has_head, int64_t head, Error **errp)
>> >> >  {
>> >> >  QemuConsole *con;
>> >> >  DisplaySurface *surface;
>> >> > +g_autoptr(pixman_image_t) image = NULL;
>> >> >  int fd;
>> >> >
>> >> >  if (has_device) {
>> >> > @@ -365,7 +375,15 @@ void qmp_screendump(const char *filename, bool 
>> >> > has_device, const char *device,
>> >> >  }
>> >> >  }
>> >> >
>> >> > -graphic_hw_update(con);
>> >> > +if (qemu_in_coroutine()) {
>> >> > +assert(!con->screendump_co);
>> >> > +con->screendump_co = qemu_coroutine_self();
>> >> > +aio_bh_schedule_oneshot(qemu_get_aio_context(),
>> >> > +graphic_hw_update_bh, con);
>> >> > +qemu_coroutine_yield();
>> >> > +con->screendump_co = NULL;
>> >> > +}
>> >>
>> >> What if multiple QMP monitors simultaneously screendump?  Hmm, it works
>> >> because all execute one after another in the same coroutine
>> >> qmp_dispatcher_co.  Implicit mutual exclusion.
>> >>
>> >> Executing them one after another is bad, because it lets an ill-behaved
>> >> QMP command starve *all* QMP monitors.  We do it only out of
>> >> (reasonable!) fear of implicit mutual exclusion requirements like the
>> >> one you add.
>> >>
>> >> Let's not add more if we can help it.
>> >
>> > The situation is not worse than the current blocking handling.
>> 
>> Really?
>> 
>> What makes executing multiple qmp_screendump() concurrently (in separate
>> threads) or interleaved (in separate coroutines in the same thread)
>> unsafe before this patch?
>
> QMP command handlers are guaranteed to run in the main thread with the
> BQL held, so there is no concurrency. If you want to change this, you
> would have much more complicated problems to solve than in this handler.
> I'm not sure it's fair to require thread-safety from one handler when
> no other handler is thread safe (except accidentally) and nobody seems
> to plan actually calling them from multiple threads.

"Let's not [...] if we can help it." is hardly a "change this or else no
merge" demand.  It is a challenge to find a more elegant solution.

>> >> Your screendump_co is per QemuConsole instead of per QMP monitor only
>> >> because you need to find the coroutine in graphic_hw_update_done().  Can
>> >> we somehow pass it via function arguments?
>> >
>> > I think it could be done later, so I suggest a TODO.
>> 
>> We should avoid making our dependence on implicit mutual exclusion
>> worse.  When we do it anyway, a big, fat, ugly comment is definitely
>> called for.
>
> Anyway, what I really wanted to add:
>
> This should be easy to solve by having a CoQueue instead of a single

Ah, challenge accepted!  Exactly the outcome I was hoping for :)

> Coroutine pointer. The coroutine would just call qemu_co_queue_wait(),
> which adds itself to the queue before it yields and the update
> completion would wake up all coroutines that are currently queued with
> qemu_co_queue_restart_all().
>
> qemu_co_queue_wait() takes a lock as its second parameter. You don't
> need it in this context and can just pass NULL. (This is a lock that
> would be dropped while the coroutine is sleeping and automatically
> reacquired afterwards.)
>
>> >> In case avoiding the mutual exclusion is impractical: please explain it
>> >> in a comment to make it somewhat less implicit.
>> 
>> It is anything but: see appended patch.
>
> This works, too, but it requires an additional struct. I think the queue
> is easier. (Note there is a difference in the mechanism: Your patch
> waits for the specific update it triggered, while the CoQueue would wait
> for _any_ update to complete. I assume effectively the result is the
> same.)

Your idea sounds much nicer to me.  Thanks!

Re: [PATCH] accel/kvm: Check ioctl(KVM_SET_USER_MEMORY_REGION) return value

2020-02-21 Thread Peter Xu

On Fri, Feb 21, 2020 at 05:33:36PM +0100, Philippe Mathieu-Daudé wrote:
> kvm_vm_ioctl() can fail, check its return value, and log an error
> when it failed. This fixes Coverity CID 1412229:
> 
>   Unchecked return value (CHECKED_RETURN)
> 
>   check_return: Calling kvm_vm_ioctl without checking return value
> 
> Reported-by: Coverity (CID 1412229)
> Fixes: 235e8982ad3 ("support using KVM_MEM_READONLY flag for regions")
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Peter Xu 

-- 
Peter Xu

Re: [RFC PATCH v3 25/27] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit

On 22.12.19 12:37, Alberto Garcia wrote:
> Now that the implementation of subclusters is complete we can finally
> add the necessary options to create and read images with this feature,
> which we call "extended L2 entries".
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2.c|  65 ++--
>  block/qcow2.h|   8 ++-
>  include/block/block_int.h|   1 +
>  qapi/block-core.json |   7 +++
>  tests/qemu-iotests/031.out   |   8 +--
>  tests/qemu-iotests/036.out   |   4 +-
>  tests/qemu-iotests/049.out   | 102 +++
>  tests/qemu-iotests/060.out   |   1 +
>  tests/qemu-iotests/061.out   |  20 +++---
>  tests/qemu-iotests/065   |  18 --
>  tests/qemu-iotests/082.out   |  48 ---
>  tests/qemu-iotests/085.out   |  38 ++--
>  tests/qemu-iotests/144.out   |   4 +-
>  tests/qemu-iotests/182.out   |   2 +-
>  tests/qemu-iotests/185.out   |   8 +--
>  tests/qemu-iotests/198.out   |   2 +
>  tests/qemu-iotests/206.out   |   4 ++
>  tests/qemu-iotests/242.out   |   5 ++
>  tests/qemu-iotests/255.out   |   8 +--
>  tests/qemu-iotests/273.out   |   9 ++-
>  tests/qemu-iotests/common.filter |   1 +
>  21 files changed, 245 insertions(+), 118 deletions(-)

With the .qapi versions adjusted to match $next_release, and with the
bit fixed to be at index 4 instead of 3 (and with the iotests rebases
that always become necessary[1]):

Reviewed-by: Max Reitz 

[1] e.g. 280 fails now – I suppose qemu_img_log should filter just like
the bash tests do, but then again, I’d rather drop that function
altogether anyway
(https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg00136.html)



signature.asc
Description: OpenPGP digital signature

[PATCH v2 11/13] migration/multifd: Print used_length of memory block

We actually want to print the used_length, against which we check.

Cc: "Dr. David Alan Gilbert" 
Cc: Juan Quintela 
Cc: Peter Xu 
Signed-off-by: David Hildenbrand 
---
 migration/multifd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index b3e8ae9bcc..dd9e88c5f1 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -222,7 +222,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, 
Error **errp)
 if (offset > (block->used_length - qemu_target_page_size())) {
 error_setg(errp, "multifd: offset too long %" PRIu64
" (max " RAM_ADDR_FMT ")",
-   offset, block->max_length);
+   offset, block->used_length);
 return -1;
 }
 p->pages->iov[i].iov_base = block->host + offset;
-- 
2.24.1

[PATCH v2 12/13] migration/ram: Use offset_in_ramblock() in range checks

We never read or write beyond the used_length of memory blocks when
migrating. Make this clearer by using offset_in_ramblock() consistently.

Cc: "Dr. David Alan Gilbert" 
Cc: Juan Quintela 
Cc: Peter Xu 
Signed-off-by: David Hildenbrand 
---
 migration/ram.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index ee5c3d5784..5cc9993899 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1309,8 +1309,8 @@ static bool find_dirty_block(RAMState *rs, 
PageSearchStatus *pss, bool *again)
 *again = false;
 return false;
 }
-if ram_addr_t)pss->page) << TARGET_PAGE_BITS)
->= pss->block->used_length) {
+if (!offset_in_ramblock(pss->block,
+((ram_addr_t)pss->page) << TARGET_PAGE_BITS)) {
 /* Didn't find anything in this RAM Block */
 pss->page = 0;
 pss->block = QLIST_NEXT_RCU(pss->block, next);
@@ -1514,7 +1514,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t 
start, ram_addr_t len)
 rs->last_req_rb = ramblock;
 }
 trace_ram_save_queue_pages(ramblock->idstr, start, len);
-if (start+len > ramblock->used_length) {
+if (!offset_in_ramblock(ramblock, start + len - 1)) {
 error_report("%s request overrun start=" RAM_ADDR_FMT " len="
  RAM_ADDR_FMT " blocklen=" RAM_ADDR_FMT,
  __func__, start, len, ramblock->used_length);
@@ -3325,8 +3325,8 @@ static void colo_flush_ram_cache(void)
 while (block) {
 offset = migration_bitmap_find_dirty(ram_state, block, offset);
 
-if (((ram_addr_t)offset) << TARGET_PAGE_BITS
->= block->used_length) {
+if (!offset_in_ramblock(block,
+((ram_addr_t)offset) << TARGET_PAGE_BITS)) 
{
 offset = 0;
 block = QLIST_NEXT_RCU(block, next);
 } else {
-- 
2.24.1

[PATCH v2 10/13] migration/ram: Handle RAM block resizes during postcopy

Resizing while migrating is dangerous and does not work as expected.
The whole migration code works on the usable_length of ram blocks and does
not expect this to change at random points in time.

In the case of postcopy, relying on used_length is racy as soon as the
guest is running. Also, when used_length changes we might leave the
uffd handler registered for some memory regions, reject valid pages
when migrating and fail when sending the recv bitmap to the source.

Resizing can be trigger *after* (but not during) a reset in
ACPI code by the guest
- hw/arm/virt-acpi-build.c:acpi_ram_update()
- hw/i386/acpi-build.c:acpi_ram_update()

Let's remember the original used_length in a separate variable and
use it in relevant postcopy code. Make sure to update it when we resize
during precopy, when synchronizing the RAM block sizes with the source.

Cc: "Dr. David Alan Gilbert" 
Cc: Juan Quintela 
Cc: Eduardo Habkost 
Cc: Paolo Bonzini 
Cc: Igor Mammedov 
Cc: "Michael S. Tsirkin" 
Cc: Richard Henderson 
Cc: Shannon Zhao 
Cc: Alex Bennée 
Cc: Peter Xu 
Signed-off-by: David Hildenbrand 
---
 include/exec/ramblock.h  | 10 ++
 migration/postcopy-ram.c | 15 ---
 migration/ram.c  | 11 +--
 3 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 07d50864d8..664701b759 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -59,6 +59,16 @@ struct RAMBlock {
  */
 unsigned long *clear_bmap;
 uint8_t clear_bmap_shift;
+
+/*
+ * RAM block length that corresponds to the used_length on the migration
+ * source (after RAM block sizes were synchronized). Especially, after
+ * starting to run the guest, used_length and postcopy_length can differ.
+ * Used to register/unregister uffd handlers and as the size of the 
received
+ * bitmap. Receiving any page beyond this length will bail out, as it
+ * could not have been valid on the source.
+ */
+ram_addr_t postcopy_length;
 };
 #endif
 #endif
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index a36402722b..c68caf4e42 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -17,6 +17,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/rcu.h"
 #include "exec/target_page.h"
 #include "migration.h"
 #include "qemu-file.h"
@@ -31,6 +32,7 @@
 #include "qemu/error-report.h"
 #include "trace.h"
 #include "hw/boards.h"
+#include "exec/ramblock.h"
 
 /* Arbitrary limit on size of each discard command,
  * keeps them around ~200 bytes
@@ -456,6 +458,13 @@ static int init_range(RAMBlock *rb, void *opaque)
 ram_addr_t length = qemu_ram_get_used_length(rb);
 trace_postcopy_init_range(block_name, host_addr, offset, length);
 
+/*
+ * Save the used_length before running the guest. In case we have to
+ * resize RAM blocks when syncing RAM block sizes from the source during
+ * precopy, we'll update it manually via the ram block notifier.
+ */
+rb->postcopy_length = length;
+
 /*
  * We need the whole of RAM to be truly empty for postcopy, so things
  * like ROMs and any data tables built during init must be zero'd
@@ -478,7 +487,7 @@ static int cleanup_range(RAMBlock *rb, void *opaque)
 const char *block_name = qemu_ram_get_idstr(rb);
 void *host_addr = qemu_ram_get_host_addr(rb);
 ram_addr_t offset = qemu_ram_get_offset(rb);
-ram_addr_t length = qemu_ram_get_used_length(rb);
+ram_addr_t length = rb->postcopy_length;
 MigrationIncomingState *mis = opaque;
 struct uffdio_range range_struct;
 trace_postcopy_cleanup_range(block_name, host_addr, offset, length);
@@ -600,7 +609,7 @@ static int nhp_range(RAMBlock *rb, void *opaque)
 const char *block_name = qemu_ram_get_idstr(rb);
 void *host_addr = qemu_ram_get_host_addr(rb);
 ram_addr_t offset = qemu_ram_get_offset(rb);
-ram_addr_t length = qemu_ram_get_used_length(rb);
+ram_addr_t length = rb->postcopy_length;
 trace_postcopy_nhp_range(block_name, host_addr, offset, length);
 
 /*
@@ -644,7 +653,7 @@ static int ram_block_enable_notify(RAMBlock *rb, void 
*opaque)
 struct uffdio_register reg_struct;
 
 reg_struct.range.start = (uintptr_t)qemu_ram_get_host_addr(rb);
-reg_struct.range.len = qemu_ram_get_used_length(rb);
+reg_struct.range.len = rb->postcopy_length;
 reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
 
 /* Now tell our userfault_fd that it's responsible for this area */
diff --git a/migration/ram.c b/migration/ram.c
index 1a5ff07997..ee5c3d5784 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -244,7 +244,7 @@ int64_t ramblock_recv_bitmap_send(QEMUFile *file,
 return -1;
 }
 
-nbits = block->used_length >> TARGET_PAGE_BITS;
+nbits = block->postcopy_length >> TARGET_PAGE_BITS;
 
 /*
  * Make sure the tmp bitmap buffer is big enough, e.g., on 32bit
@@ -3160,7 +3160,13 @@ static int

[PATCH v2 13/13] migration/ram: Tolerate partially changed mappings in postcopy code

When we partially change mappings (esp., mmap over parts of an existing
mmap like qemu_ram_remap() does) where we have a userfaultfd handler
registered, the handler will implicitly be unregistered from the parts that
changed.

Trying to place pages onto mappings where there is no longer a handler
registered will fail. Let's make sure that any waiter is woken up - we
have to do that manually.

Let's also document how UFFDIO_UNREGISTER will handle this scenario.

This is mainly a preparation for RAM blocks with resizable allcoations,
where the mapping of the invalid RAM range will change. The source will
keep sending pages that are outside of the new (shrunk) RAM size. We have
to treat these pages like they would have been migrated, but can
essentially simply drop the content (ignore the placement error).

Keep printing a warning on EINVAL, to avoid hiding other (programming)
issues. ENOENT is unique.

Cc: "Dr. David Alan Gilbert" 
Cc: Juan Quintela 
Cc: Peter Xu 
Cc: Andrea Arcangeli 
Signed-off-by: David Hildenbrand 
---
 migration/postcopy-ram.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index c68caf4e42..f023830b9a 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -506,6 +506,12 @@ static int cleanup_range(RAMBlock *rb, void *opaque)
 range_struct.start = (uintptr_t)host_addr;
 range_struct.len = length;
 
+/*
+ * In case the mapping was partially changed since we enabled userfault
+ * (e.g., via qemu_ram_remap()), the userfaultfd handler was already 
removed
+ * for the mappings that changed. Unregistering will, however, still work
+ * and ignore mappings without a registered handler.
+ */
 if (ioctl(mis->userfault_fd, UFFDIO_UNREGISTER, _struct)) {
 error_report("%s: userfault unregister %s", __func__, strerror(errno));
 
@@ -1180,6 +1186,17 @@ int postcopy_ram_incoming_setup(MigrationIncomingState 
*mis)
 return 0;
 }
 
+static int qemu_ufd_wake_ioctl(int userfault_fd, void *host_addr,
+   uint64_t pagesize)
+{
+struct uffdio_range range = {
+.start = (uint64_t)(uintptr_t)host_addr,
+.len = pagesize,
+};
+
+return ioctl(userfault_fd, UFFDIO_WAKE, );
+}
+
 static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
void *from_addr, uint64_t pagesize, RAMBlock 
*rb)
 {
@@ -1198,6 +1215,26 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void 
*host_addr,
 zero_struct.mode = 0;
 ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, _struct);
 }
+
+/*
+ * When the mapping gets partially changed (e.g., qemu_ram_remap()) before
+ * we try to place a page, the userfaultfd handler will be removed for the
+ * changed mappings and placing pages will fail. We can safely ignore this,
+ * because mappings that changed on the destination don't need data from 
the
+ * source (e.g., qemu_ram_remap()). Wake up any waiter waiting for that 
page
+ * (unlikely but possible). Waking up waiters is always possible, even
+ * without a registered userfaultfd handler.
+ *
+ * Old kernels report EINVAL, new kernels report ENOENT in case there is
+ * no longer a userfaultfd handler for a mapping.
+ */
+if (ret && (errno == ENOENT || errno == EINVAL)) {
+if (errno == EINVAL) {
+warn_report("%s: Failed to place page %p. Waking up any waiters.",
+ __func__, host_addr);
+}
+ret = qemu_ufd_wake_ioctl(userfault_fd, host_addr, pagesize);
+}
 if (!ret) {
 ramblock_recv_bitmap_set_range(rb, host_addr,
pagesize / qemu_target_page_size());
-- 
2.24.1

[PATCH v2 08/13] migration/ram: Simplify host page handling in ram_load_postcopy()

Add two new helper functions. This will in come handy once we want to
handle ram block resizes while postcopy is active.

Cc: "Dr. David Alan Gilbert" 
Cc: Juan Quintela 
Cc: Peter Xu 
Signed-off-by: David Hildenbrand 
---
 migration/ram.c | 54 -
 1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index d5a4d69e1c..f815f4e532 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2734,6 +2734,20 @@ static inline void *host_from_ram_block_offset(RAMBlock 
*block,
 return block->host + offset;
 }
 
+static void *host_page_from_ram_block_offset(RAMBlock *block,
+ ram_addr_t offset)
+{
+/* Note: Explicitly no check against offset_in_ramblock(). */
+return (void *)QEMU_ALIGN_DOWN((uintptr_t)block->host + offset,
+   block->page_size);
+}
+
+static ram_addr_t host_page_offset_from_ram_block_offset(RAMBlock *block,
+ ram_addr_t offset)
+{
+return ((uintptr_t)block->host + offset) & (block->page_size - 1);
+}
+
 static inline void *colo_cache_from_block_offset(RAMBlock *block,
  ram_addr_t offset)
 {
@@ -3111,13 +3125,12 @@ static int ram_load_postcopy(QEMUFile *f)
 MigrationIncomingState *mis = migration_incoming_get_current();
 /* Temporary page that is later 'placed' */
 void *postcopy_host_page = mis->postcopy_tmp_page;
-void *this_host = NULL;
+void *host_page = NULL;
 bool all_zero = false;
 int target_pages = 0;
 
 while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
 ram_addr_t addr;
-void *host = NULL;
 void *page_buffer = NULL;
 void *place_source = NULL;
 RAMBlock *block = NULL;
@@ -3143,9 +3156,12 @@ static int ram_load_postcopy(QEMUFile *f)
 if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
  RAM_SAVE_FLAG_COMPRESS_PAGE)) {
 block = ram_block_from_stream(f, flags);
+if (!block) {
+ret = -EINVAL;
+break;
+}
 
-host = host_from_ram_block_offset(block, addr);
-if (!host) {
+if (!offset_in_ramblock(block, addr)) {
 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
 ret = -EINVAL;
 break;
@@ -3163,21 +3179,18 @@ static int ram_load_postcopy(QEMUFile *f)
  * of a host page in one chunk.
  */
 page_buffer = postcopy_host_page +
-  ((uintptr_t)host & (block->page_size - 1));
+  host_page_offset_from_ram_block_offset(block, addr);
 /* If all TP are zero then we can optimise the place */
 if (target_pages == 1) {
 all_zero = true;
-this_host = (void *)QEMU_ALIGN_DOWN((uintptr_t)host,
-block->page_size);
-} else {
+host_page = host_page_from_ram_block_offset(block, addr);
+} else if (host_page != host_page_from_ram_block_offset(block,
+addr)) {
 /* not the 1st TP within the HP */
-if (QEMU_ALIGN_DOWN((uintptr_t)host, block->page_size) !=
-(uintptr_t)this_host) {
-error_report("Non-same host page %p/%p",
-  host, this_host);
-ret = -EINVAL;
-break;
-}
+error_report("Non-same host page %p/%p", host_page,
+ host_page_from_ram_block_offset(block, addr));
+ret = -EINVAL;
+break;
 }
 
 /*
@@ -3257,16 +3270,11 @@ static int ram_load_postcopy(QEMUFile *f)
 }
 
 if (!ret && place_needed) {
-/* This gets called at the last target page in the host page */
-void *place_dest = (void *)QEMU_ALIGN_DOWN((uintptr_t)host,
-   block->page_size);
-
 if (all_zero) {
-ret = postcopy_place_page_zero(mis, place_dest,
-   block);
+ret = postcopy_place_page_zero(mis, host_page, block);
 } else {
-ret = postcopy_place_page(mis, place_dest,
-  place_source, block);
+ret = postcopy_place_page(mis, host_page, place_source,
+  block);
 }
 }
 }
-- 
2.24.1

[PATCH v2 09/13] migration/ram: Consolidate variable reset after placement in ram_load_postcopy()

Let's consolidate resetting the variables.

Cc: "Dr. David Alan Gilbert" 
Cc: Juan Quintela 
Cc: Peter Xu 
Signed-off-by: David Hildenbrand 
---
 migration/ram.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index f815f4e532..1a5ff07997 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3126,7 +3126,7 @@ static int ram_load_postcopy(QEMUFile *f)
 /* Temporary page that is later 'placed' */
 void *postcopy_host_page = mis->postcopy_tmp_page;
 void *host_page = NULL;
-bool all_zero = false;
+bool all_zero = true;
 int target_pages = 0;
 
 while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
@@ -3152,7 +3152,6 @@ static int ram_load_postcopy(QEMUFile *f)
 addr &= TARGET_PAGE_MASK;
 
 trace_ram_load_postcopy_loop((uint64_t)addr, flags);
-place_needed = false;
 if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
  RAM_SAVE_FLAG_COMPRESS_PAGE)) {
 block = ram_block_from_stream(f, flags);
@@ -3180,9 +3179,7 @@ static int ram_load_postcopy(QEMUFile *f)
  */
 page_buffer = postcopy_host_page +
   host_page_offset_from_ram_block_offset(block, addr);
-/* If all TP are zero then we can optimise the place */
 if (target_pages == 1) {
-all_zero = true;
 host_page = host_page_from_ram_block_offset(block, addr);
 } else if (host_page != host_page_from_ram_block_offset(block,
 addr)) {
@@ -3199,7 +3196,6 @@ static int ram_load_postcopy(QEMUFile *f)
  */
 if (target_pages == (block->page_size / TARGET_PAGE_SIZE)) {
 place_needed = true;
-target_pages = 0;
 }
 place_source = postcopy_host_page;
 }
@@ -3276,6 +3272,10 @@ static int ram_load_postcopy(QEMUFile *f)
 ret = postcopy_place_page(mis, host_page, place_source,
   block);
 }
+place_needed = false;
+target_pages = 0;
+/* Assume we have a zero page until we detect something different 
*/
+all_zero = true;
 }
 }
 
-- 
2.24.1

[PATCH v2 03/13] numa: Teach ram block notifiers about resizeable ram blocks

Ram block notifiers are currently not aware of resizes. Especially to
handle resizes during migration, but also to implement actually resizeable
ram blocks (make everything between used_length and max_length
inaccessible), we want to teach ram block notifiers about resizeable
ram.

Introduce the basic infrastructure but keep using max_size in the
existing notifiers. Supply the max_size when adding and removing ram
blocks. Also, notify on resizes.

Acked-by: Paul Durrant 
Reviewed-by: Peter Xu 
Cc: Richard Henderson 
Cc: Paolo Bonzini 
Cc: "Dr. David Alan Gilbert" 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Paul Durrant 
Cc: "Michael S. Tsirkin" 
Cc: xen-de...@lists.xenproject.org
Cc: Igor Mammedov 
Signed-off-by: David Hildenbrand 
---
 exec.c | 13 +++--
 hw/core/numa.c | 22 +-
 hw/i386/xen/xen-mapcache.c |  7 ---
 include/exec/ramlist.h | 13 +
 target/i386/hax-mem.c  |  5 +++--
 target/i386/sev.c  | 18 ++
 util/vfio-helpers.c| 16 
 7 files changed, 62 insertions(+), 32 deletions(-)

diff --git a/exec.c b/exec.c
index dfd43d27c6..b75250e773 100644
--- a/exec.c
+++ b/exec.c
@@ -2129,6 +2129,8 @@ static int memory_try_enable_merging(void *addr, size_t 
len)
  */
 int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
 {
+const ram_addr_t oldsize = block->used_length;
+
 assert(block);
 
 newsize = HOST_PAGE_ALIGN(newsize);
@@ -2153,6 +2155,11 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, 
Error **errp)
 return -EINVAL;
 }
 
+/* Notify before modifying the ram block and touching the bitmaps. */
+if (block->host) {
+ram_block_notify_resize(block->host, oldsize, newsize);
+}
+
 cpu_physical_memory_clear_dirty_range(block->offset, block->used_length);
 block->used_length = newsize;
 cpu_physical_memory_set_dirty_range(block->offset, block->used_length,
@@ -2312,7 +2319,8 @@ static void ram_block_add(RAMBlock *new_block, Error 
**errp, bool shared)
 qemu_madvise(new_block->host, new_block->max_length, 
QEMU_MADV_HUGEPAGE);
 /* MADV_DONTFORK is also needed by KVM in absence of synchronous MMU */
 qemu_madvise(new_block->host, new_block->max_length, 
QEMU_MADV_DONTFORK);
-ram_block_notify_add(new_block->host, new_block->max_length);
+ram_block_notify_add(new_block->host, new_block->used_length,
+ new_block->max_length);
 }
 }
 
@@ -2492,7 +2500,8 @@ void qemu_ram_free(RAMBlock *block)
 }
 
 if (block->host) {
-ram_block_notify_remove(block->host, block->max_length);
+ram_block_notify_remove(block->host, block->used_length,
+block->max_length);
 }
 
 qemu_mutex_lock_ramlist();
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 6599c69e05..e28ad24fcd 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -902,11 +902,12 @@ void query_numa_node_mem(NumaNodeMem node_mem[], 
MachineState *ms)
 static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
 {
 const ram_addr_t max_size = qemu_ram_get_max_length(rb);
+const ram_addr_t size = qemu_ram_get_used_length(rb);
 void *host = qemu_ram_get_host_addr(rb);
 RAMBlockNotifier *notifier = opaque;
 
 if (host) {
-notifier->ram_block_added(notifier, host, max_size);
+notifier->ram_block_added(notifier, host, size, max_size);
 }
 return 0;
 }
@@ -923,20 +924,31 @@ void ram_block_notifier_remove(RAMBlockNotifier *n)
 QLIST_REMOVE(n, next);
 }
 
-void ram_block_notify_add(void *host, size_t size)
+void ram_block_notify_add(void *host, size_t size, size_t max_size)
 {
 RAMBlockNotifier *notifier;
 
 QLIST_FOREACH(notifier, _list.ramblock_notifiers, next) {
-notifier->ram_block_added(notifier, host, size);
+notifier->ram_block_added(notifier, host, size, max_size);
 }
 }
 
-void ram_block_notify_remove(void *host, size_t size)
+void ram_block_notify_remove(void *host, size_t size, size_t max_size)
 {
 RAMBlockNotifier *notifier;
 
 QLIST_FOREACH(notifier, _list.ramblock_notifiers, next) {
-notifier->ram_block_removed(notifier, host, size);
+notifier->ram_block_removed(notifier, host, size, max_size);
+}
+}
+
+void ram_block_notify_resize(void *host, size_t old_size, size_t new_size)
+{
+RAMBlockNotifier *notifier;
+
+QLIST_FOREACH(notifier, _list.ramblock_notifiers, next) {
+if (notifier->ram_block_resized) {
+notifier->ram_block_resized(notifier, host, old_size, new_size);
+}
 }
 }
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index 5b120ed44b..d6dcea65d1 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -169,7 +169,8 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 
 if

[PATCH v2 07/13] migration/ram: Discard RAM when growing RAM blocks after ram_postcopy_incoming_init()

In case we grow our RAM after ram_postcopy_incoming_init() (e.g., when
synchronizing the RAM block state with the migration source), the resized
part would not get discarded. Let's perform that when being notified
about a resize while postcopy has been advised, but is not listening
yet. With precopy, the process is as following:

1. VM created
- RAM blocks are created
2. Incomming migration started
- Postcopy is advised
- All pages in RAM blocks are discarded
3. Precopy starts
- RAM blocks are resized to match the size on the migration source.
- RAM pages from precopy stream are loaded
- Uffd handler is registered, postcopy starts listening
3. Guest started, postcopy running
- Pagefaults get resolved, pages get placed

Cc: "Dr. David Alan Gilbert" 
Cc: Juan Quintela 
Cc: Peter Xu 
Signed-off-by: David Hildenbrand 
---
 migration/ram.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 39c7d1c4a6..d5a4d69e1c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3714,6 +3714,7 @@ static SaveVMHandlers savevm_ram_handlers = {
 static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
   size_t old_size, size_t new_size)
 {
+PostcopyState ps = postcopy_state_get();
 ram_addr_t offset;
 Error *err = NULL;
 RAMBlock *rb = qemu_ram_block_from_host(host, false, );
@@ -3734,6 +3735,35 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier 
*n, void *host,
 error_free(err);
 migration_cancel();
 }
+
+switch (ps) {
+case POSTCOPY_INCOMING_ADVISE:
+/*
+ * Update what ram_postcopy_incoming_init()->init_range() does at the
+ * time postcopy was advised. Syncing RAM blocks with the source will
+ * result in RAM resizes.
+ */
+if (old_size < new_size) {
+if (ram_discard_range(rb->idstr, old_size, new_size - old_size)) {
+error_report("RAM block '%s' discard of resized RAM failed",
+ rb->idstr);
+}
+}
+break;
+case POSTCOPY_INCOMING_NONE:
+case POSTCOPY_INCOMING_RUNNING:
+case POSTCOPY_INCOMING_END:
+/*
+ * Once our guest is running, postcopy does no longer care about
+ * resizes. When growing, the new memory was not available on the
+ * source, no handler needed.
+ */
+break;
+default:
+error_report("RAM block '%s' resized during postcopy state: %d",
+ rb->idstr, ps);
+exit(-1);
+}
 }
 
 static RAMBlockNotifier ram_mig_ram_notifier = {
-- 
2.24.1

[PATCH v2 04/13] numa: Make all callbacks of ram block notifiers optional

Let's make add/remove optional. We want to introduce a RAM block
notifier for RAM migration, that's only interested in resizes.

Reviewed-by: Peter Xu 
Cc: Richard Henderson 
Cc: Paolo Bonzini 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Peter Xu 
Signed-off-by: David Hildenbrand 
---
 hw/core/numa.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index e28ad24fcd..4270b268c8 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -915,8 +915,11 @@ static int ram_block_notify_add_single(RAMBlock *rb, void 
*opaque)
 void ram_block_notifier_add(RAMBlockNotifier *n)
 {
 QLIST_INSERT_HEAD(_list.ramblock_notifiers, n, next);
+
 /* Notify about all existing ram blocks. */
-qemu_ram_foreach_block(ram_block_notify_add_single, n);
+if (n->ram_block_added) {
+qemu_ram_foreach_block(ram_block_notify_add_single, n);
+}
 }
 
 void ram_block_notifier_remove(RAMBlockNotifier *n)
@@ -929,7 +932,9 @@ void ram_block_notify_add(void *host, size_t size, size_t 
max_size)
 RAMBlockNotifier *notifier;
 
 QLIST_FOREACH(notifier, _list.ramblock_notifiers, next) {
-notifier->ram_block_added(notifier, host, size, max_size);
+if (notifier->ram_block_added) {
+notifier->ram_block_added(notifier, host, size, max_size);
+}
 }
 }
 
@@ -938,7 +943,9 @@ void ram_block_notify_remove(void *host, size_t size, 
size_t max_size)
 RAMBlockNotifier *notifier;
 
 QLIST_FOREACH(notifier, _list.ramblock_notifiers, next) {
-notifier->ram_block_removed(notifier, host, size, max_size);
+if (notifier->ram_block_removed) {
+notifier->ram_block_removed(notifier, host, size, max_size);
+}
 }
 }
 
-- 
2.24.1

[PATCH v2 06/13] exec: Relax range check in ram_block_discard_range()

We want to make use of ram_block_discard_range() in the RAM block resize
callback when growing a RAM block, *before* used_length is changed.
Let's relax the check. We always have a reserved mapping for the whole
max_length, so we cannot corrupt unrelated data.

Cc: Richard Henderson 
Cc: Paolo Bonzini 
Cc: Eduardo Habkost 
Cc: Peter Xu 
Signed-off-by: David Hildenbrand 
---
 exec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index 8b015821d6..8737acedab 100644
--- a/exec.c
+++ b/exec.c
@@ -3915,7 +3915,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, 
size_t length)
 goto err;
 }
 
-if ((start + length) <= rb->used_length) {
+if ((start + length) <= rb->max_length) {
 bool need_madvise, need_fallocate;
 if (!QEMU_IS_ALIGNED(length, rb->page_size)) {
 error_report("ram_block_discard_range: Unaligned length: %zx",
@@ -3982,7 +3982,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, 
size_t length)
 } else {
 error_report("ram_block_discard_range: Overrun block '%s' (%" PRIu64
  "/%zx/" RAM_ADDR_FMT")",
- rb->idstr, start, length, rb->used_length);
+ rb->idstr, start, length, rb->max_length);
 }
 
 err:
-- 
2.24.1

[PATCH v2 05/13] migration/ram: Handle RAM block resizes during precopy

Resizing while migrating is dangerous and does not work as expected.
The whole migration code works on the usable_length of ram blocks and does
not expect this to change at random points in time.

In the case of precopy, the ram block size must not change on the source,
after syncing the RAM block list in ram_save_setup(), so as long as the
guest is still running on the source.

Resizing can be trigger *after* (but not during) a reset in
ACPI code by the guest
- hw/arm/virt-acpi-build.c:acpi_ram_update()
- hw/i386/acpi-build.c:acpi_ram_update()

Use the ram block notifier to get notified about resizes. Let's simply
cancel migration and indicate the reason. We'll continue running on the
source. No harm done.

Update the documentation. Postcopy will be handled separately.

Cc: "Dr. David Alan Gilbert" 
Cc: Juan Quintela 
Cc: Eduardo Habkost 
Cc: Paolo Bonzini 
Cc: Igor Mammedov 
Cc: "Michael S. Tsirkin" 
Cc: Richard Henderson 
Cc: Shannon Zhao 
Cc: Alex Bennée 
Cc: Peter Xu 
Signed-off-by: David Hildenbrand 
---
 exec.c|  5 +++--
 include/exec/memory.h | 10 ++
 migration/migration.c |  9 +++--
 migration/migration.h |  1 +
 migration/ram.c   | 31 +++
 5 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index b75250e773..8b015821d6 100644
--- a/exec.c
+++ b/exec.c
@@ -2120,8 +2120,9 @@ static int memory_try_enable_merging(void *addr, size_t 
len)
 return qemu_madvise(addr, len, QEMU_MADV_MERGEABLE);
 }
 
-/* Only legal before guest might have detected the memory size: e.g. on
- * incoming migration, or right after reset.
+/*
+ * Resizing RAM while migrating can result in the migration being canceled.
+ * Care has to be taken if the guest might have already detected the memory.
  *
  * As memory core doesn't know how is memory accessed, it is up to
  * resize callback to update device state and/or add assertions to detect
diff --git a/include/exec/memory.h b/include/exec/memory.h
index e85b7de99a..de111347e8 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -113,7 +113,7 @@ typedef struct IOMMUNotifier IOMMUNotifier;
 #define RAM_SHARED (1 << 1)
 
 /* Only a portion of RAM (used_length) is actually used, and migrated.
- * This used_length size can change across reboots.
+ * Resizing RAM while migrating can result in the migration being canceled.
  */
 #define RAM_RESIZEABLE (1 << 2)
 
@@ -843,7 +843,9 @@ void memory_region_init_ram_shared_nomigrate(MemoryRegion 
*mr,
  * RAM.  Accesses into the region will
  * modify memory directly.  Only an initial
  * portion of this RAM is actually used.
- * The used size can change across reboots.
+ * Changing the size while migrating
+ * can result in the migration being
+ * canceled.
  *
  * @mr: the #MemoryRegion to be initialized.
  * @owner: the object that tracks the region's reference count
@@ -1464,8 +1466,8 @@ void *memory_region_get_ram_ptr(MemoryRegion *mr);
 
 /* memory_region_ram_resize: Resize a RAM region.
  *
- * Only legal before guest might have detected the memory size: e.g. on
- * incoming migration, or right after reset.
+ * Resizing RAM while migrating can result in the migration being canceled.
+ * Care has to be taken if the guest might have already detected the memory.
  *
  * @mr: a memory region created with @memory_region_init_resizeable_ram.
  * @newsize: the new size the region
diff --git a/migration/migration.c b/migration/migration.c
index 8fb68795dc..ac9751dbe5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -175,13 +175,18 @@ void migration_object_init(void)
 }
 }
 
+void migration_cancel(void)
+{
+migrate_fd_cancel(current_migration);
+}
+
 void migration_shutdown(void)
 {
 /*
  * Cancel the current migration - that will (eventually)
  * stop the migration using this structure
  */
-migrate_fd_cancel(current_migration);
+migration_cancel();
 object_unref(OBJECT(current_migration));
 }
 
@@ -2019,7 +2024,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 
 void qmp_migrate_cancel(Error **errp)
 {
-migrate_fd_cancel(migrate_get_current());
+migration_cancel();
 }
 
 void qmp_migrate_continue(MigrationStatus state, Error **errp)
diff --git a/migration/migration.h b/migration/migration.h
index 8473ddfc88..79fd74afa5 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -343,5 +343,6 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void 
*opaque);
 void migration_make_urgent_request(void);
 void migration_consume_urgent_request(void);
 bool migration_rate_limit(void);
+void migration_cancel(void);
 
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index ed23ed1c7c..39c7d1c4a6 100644
---

[PATCH v2 02/13] stubs/ram-block: Remove stubs that are no longer needed

Current code no longer needs these stubs to compile. Let's just remove
them.

Reviewed-by: Peter Xu 
Cc: Richard Henderson 
Cc: Paolo Bonzini 
Cc: Eduardo Habkost 
Cc: Peter Xu 
Signed-off-by: David Hildenbrand 
---
 stubs/ram-block.c | 20 
 1 file changed, 20 deletions(-)

diff --git a/stubs/ram-block.c b/stubs/ram-block.c
index 73c0a3ee08..10855b52dd 100644
--- a/stubs/ram-block.c
+++ b/stubs/ram-block.c
@@ -2,21 +2,6 @@
 #include "exec/ramlist.h"
 #include "exec/cpu-common.h"
 
-void *qemu_ram_get_host_addr(RAMBlock *rb)
-{
-return 0;
-}
-
-ram_addr_t qemu_ram_get_offset(RAMBlock *rb)
-{
-return 0;
-}
-
-ram_addr_t qemu_ram_get_used_length(RAMBlock *rb)
-{
-return 0;
-}
-
 void ram_block_notifier_add(RAMBlockNotifier *n)
 {
 }
@@ -24,8 +9,3 @@ void ram_block_notifier_add(RAMBlockNotifier *n)
 void ram_block_notifier_remove(RAMBlockNotifier *n)
 {
 }
-
-int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
-{
-return 0;
-}
-- 
2.24.1

[PATCH v2 00/13] migrate/ram: Fix resizing RAM blocks while migrating

This is the follow up of
"[PATCH RFC] memory: Don't allow to resize RAM while migrating" [1]

This series contains some (slightly modified) patches also contained in:
"[PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations
 under POSIX" [2]
That series will be based on this series. The last patch (#13) in this
series could be moved to the other series, but I decided to include it in
here for now (similar context).

I realized that resizing RAM blocks while the guest is being migrated
(precopy: resize while still running on the source, postcopy: resize
 while already running on the target) is buggy. In case of precopy, we
can simply cancel migration. Postcopy handling is more involved. Resizing
can currently happen during a guest reboot, triggered by ACPI rebuilds.

Along with the fixes, some cleanups.

[1] https://lkml.kernel.org/r/20200213172016.196609-1-da...@redhat.com
[2] https://lkml.kernel.org/r/20200212134254.11073-1-da...@redhat.com

I was now able to actually test resizing while migrating. I am using the
prototype of virtio-mem to test (which also makes use of resizable
allocations). Things I was able to reproduce:
- Resize while still running on the migration source. Migration is canceled
-- Test case for "migraton/ram: Handle RAM block resizes during precopy"
- Resize (grow+shrink) on the migration target during postcopy migration
  (when syncing RAM blocks), while not yet running on the target
-- Test case for "migration/ram: Discard new RAM when growing RAM blocks
   and the VM is stopped", and overall RAM size synchronization. Seems to
   work just fine.
- Resize (grow+shrink) on the migration tagret during postcopy migration
  while already running on the target.
-- Test case for "migration/ram: Handle RAM block resizes during postcopy"
-- Test case for "migration/ram: Tolerate partially changed mappings in
   postcopy code" - I can see that -ENOENT is actually triggered and that
   migration succeeds. Migration seems to work just fine.

In addition I run avocado-vt migration tests + usual QEMU checks.

v1 -> v2:
- "util: vfio-helpers: Factor out and fix processing of existing ram
   blocks"
-- Stringify error
- "migraton/ram: Handle RAM block resizes during precopy"
-- Simplified check if we're migrating on the source
- "exec: Relax range check in ram_block_discard_range()"
-- Added to make discard during resizes actually work
- "migration/ram: Discard new RAM when growing RAM blocks after
   ram_postcopy_incoming_init()"
-- Better checks if in the right postcopy mode.
-- Better patch subject/description/comments
- "migration/ram: Handle RAM block resizes during postcopy"
-- Better comments
-- Adapt to changed postcopy checks
- "migrate/ram: Get rid of "place_source" in ram_load_postcopy()"
-- Dropped, as broken
- "migration/ram: Tolerate partially changed mappings in postcopy code"
-- Better comment / description. Clarify that no implicit wakeup will
   happen
-- Warn on EINVAL (older kernels)
-- Wake up any waiter explicitly


David Hildenbrand (13):
  util: vfio-helpers: Factor out and fix processing of existing ram
blocks
  stubs/ram-block: Remove stubs that are no longer needed
  numa: Teach ram block notifiers about resizeable ram blocks
  numa: Make all callbacks of ram block notifiers optional
  migration/ram: Handle RAM block resizes during precopy
  exec: Relax range check in ram_block_discard_range()
  migration/ram: Discard RAM when growing RAM blocks after
ram_postcopy_incoming_init()
  migration/ram: Simplify host page handling in ram_load_postcopy()
  migration/ram: Consolidate variable reset after placement in
ram_load_postcopy()
  migration/ram: Handle RAM block resizes during postcopy
  migration/multifd: Print used_length of memory block
  migration/ram: Use offset_in_ramblock() in range checks
  migration/ram: Tolerate partially changed mappings in postcopy code

 exec.c |  27 +--
 hw/core/numa.c |  41 +--
 hw/i386/xen/xen-mapcache.c |   7 +-
 include/exec/cpu-common.h  |   1 +
 include/exec/memory.h  |  10 +--
 include/exec/ramblock.h|  10 +++
 include/exec/ramlist.h |  13 ++--
 migration/migration.c  |   9 ++-
 migration/migration.h  |   1 +
 migration/multifd.c|   2 +-
 migration/postcopy-ram.c   |  52 +-
 migration/ram.c| 144 -
 stubs/ram-block.c  |  20 --
 target/i386/hax-mem.c  |   5 +-
 target/i386/sev.c  |  18 ++---
 util/vfio-helpers.c|  41 ---
 16 files changed, 283 insertions(+), 118 deletions(-)

-- 
2.24.1

[PATCH v2 01/13] util: vfio-helpers: Factor out and fix processing of existing ram blocks

Factor it out into common code when a new notifier is registered, just
as done with the memory region notifier. This allows us to have the
logic about how to process existing ram blocks at a central place (which
will be extended soon).

Just like when adding a new ram block, we have to register the max_length
for now. We don't have a way to get notified about resizes yet, and some
memory would not be mapped when growing the ram block.

Note: Currently, ram blocks are only "fake resized". All memory
(max_length) is accessible.

We can get rid of a bunch of functions in stubs/ram-block.c . Print the
warning from inside qemu_vfio_ram_block_added().

Reviewed-by: Peter Xu 
Cc: Richard Henderson 
Cc: Paolo Bonzini 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Alex Williamson 
Cc: Stefan Hajnoczi 
Cc: Peter Xu 
Signed-off-by: David Hildenbrand 
---
 exec.c|  5 +
 hw/core/numa.c| 14 ++
 include/exec/cpu-common.h |  1 +
 util/vfio-helpers.c   | 29 -
 4 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/exec.c b/exec.c
index 8e9cc3b47c..dfd43d27c6 100644
--- a/exec.c
+++ b/exec.c
@@ -2016,6 +2016,11 @@ ram_addr_t qemu_ram_get_used_length(RAMBlock *rb)
 return rb->used_length;
 }
 
+ram_addr_t qemu_ram_get_max_length(RAMBlock *rb)
+{
+return rb->max_length;
+}
+
 bool qemu_ram_is_shared(RAMBlock *rb)
 {
 return rb->flags & RAM_SHARED;
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 0d1b4be76a..6599c69e05 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -899,9 +899,23 @@ void query_numa_node_mem(NumaNodeMem node_mem[], 
MachineState *ms)
 }
 }
 
+static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
+{
+const ram_addr_t max_size = qemu_ram_get_max_length(rb);
+void *host = qemu_ram_get_host_addr(rb);
+RAMBlockNotifier *notifier = opaque;
+
+if (host) {
+notifier->ram_block_added(notifier, host, max_size);
+}
+return 0;
+}
+
 void ram_block_notifier_add(RAMBlockNotifier *n)
 {
 QLIST_INSERT_HEAD(_list.ramblock_notifiers, n, next);
+/* Notify about all existing ram blocks. */
+qemu_ram_foreach_block(ram_block_notify_add_single, n);
 }
 
 void ram_block_notifier_remove(RAMBlockNotifier *n)
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 81753bbb34..9760ac9068 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -59,6 +59,7 @@ const char *qemu_ram_get_idstr(RAMBlock *rb);
 void *qemu_ram_get_host_addr(RAMBlock *rb);
 ram_addr_t qemu_ram_get_offset(RAMBlock *rb);
 ram_addr_t qemu_ram_get_used_length(RAMBlock *rb);
+ram_addr_t qemu_ram_get_max_length(RAMBlock *rb);
 bool qemu_ram_is_shared(RAMBlock *rb);
 bool qemu_ram_is_uf_zeroable(RAMBlock *rb);
 void qemu_ram_set_uf_zeroable(RAMBlock *rb);
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index ddd9a96e76..260570ae19 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -376,8 +376,14 @@ static void qemu_vfio_ram_block_added(RAMBlockNotifier *n,
   void *host, size_t size)
 {
 QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
+int ret;
+
 trace_qemu_vfio_ram_block_added(s, host, size);
-qemu_vfio_dma_map(s, host, size, false, NULL);
+ret = qemu_vfio_dma_map(s, host, size, false, NULL);
+if (ret) {
+error_report("qemu_vfio_dma_map(%p, %zu) failed: %s", host, size,
+ strerror(-ret));
+}
 }
 
 static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
@@ -390,33 +396,14 @@ static void qemu_vfio_ram_block_removed(RAMBlockNotifier 
*n,
 }
 }
 
-static int qemu_vfio_init_ramblock(RAMBlock *rb, void *opaque)
-{
-void *host_addr = qemu_ram_get_host_addr(rb);
-ram_addr_t length = qemu_ram_get_used_length(rb);
-int ret;
-QEMUVFIOState *s = opaque;
-
-if (!host_addr) {
-return 0;
-}
-ret = qemu_vfio_dma_map(s, host_addr, length, false, NULL);
-if (ret) {
-fprintf(stderr, "qemu_vfio_init_ramblock: failed %p %" PRId64 "\n",
-host_addr, (uint64_t)length);
-}
-return 0;
-}
-
 static void qemu_vfio_open_common(QEMUVFIOState *s)
 {
 qemu_mutex_init(>lock);
 s->ram_notifier.ram_block_added = qemu_vfio_ram_block_added;
 s->ram_notifier.ram_block_removed = qemu_vfio_ram_block_removed;
-ram_block_notifier_add(>ram_notifier);
 s->low_water_mark = QEMU_VFIO_IOVA_MIN;
 s->high_water_mark = QEMU_VFIO_IOVA_MAX;
-qemu_ram_foreach_block(qemu_vfio_init_ramblock, s);
+ram_block_notifier_add(>ram_notifier);
 }
 
 /**
-- 
2.24.1

Re: [PATCH] maint: Include top-level *.rst files early in git diff

Peter Maydell  writes:

> On Thu, 20 Feb 2020 at 16:22, Eric Blake  wrote:
>>
>> We are converting more doc files to *.rst rather than *.texi.  Most
>> doc files are already listed early in diffs due to our catchall
>> docs/*, but a few top-level files get missed by that glob.
>>
>> Signed-off-by: Eric Blake 
>> ---
>>
>> Both *.texi and *.rst entries make sense while we are still converting
>> things, but we'll need a followup to drop *.texi when the conversion
>> is complete...
>
> I was wondering what rst files we had outside docs, and
> the answer is "README.rst, CODING_STYLE.rst and
> tests/acceptance/README.rst".
>
> (There's a reasonable argument for moving CODING_STYLE.rst to
> docs/devel now; depends how highly you rate having it more
> "discoverable" to new contributors.)

The little cynic in the back of my head cackles and points out that
CODING_STYLE.rst exists not for new contributors to discover, but so
that old contributors have something to smack on new heads.

SCNR :)

Re: [PATCH v5 0/5] gpio: Add GPIO Aggregator

2020-02-21 Thread Geert Uytterhoeven

Hi Linus and Bartosz,

On Tue, Feb 18, 2020 at 4:18 PM Geert Uytterhoeven
 wrote:
> GPIO controllers are exported to userspace using /dev/gpiochip*
> character devices.  Access control to these devices is provided by
> standard UNIX file system permissions, on an all-or-nothing basis:
> either a GPIO controller is accessible for a user, or it is not.
> Currently no mechanism exists to control access to individual GPIOs.
>
> Hence this adds a GPIO driver to aggregate existing GPIOs, and expose
> them as a new gpiochip.  This is useful for implementing access control,
> and assigning a set of GPIOs to a specific user.  Furthermore, this
> simplifies and hardens exporting GPIOs to a virtual machine, as the VM
> can just grab the full GPIO controller, and no longer needs to care
> about which GPIOs to grab and which not, reducing the attack surface.

Do you have any more comments, before I respin and post v6?

Thanks, and have a niec weekend!

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH v5 4/5] docs: gpio: Add GPIO Aggregator documentation

2020-02-21 Thread Randy Dunlap

On 2/21/20 8:34 AM, Geert Uytterhoeven wrote:
> Hi Randy,
> 

Hi Geert,
Those changes look good. Thanks.

-- 
~Randy

Re: [PATCH 0/3] hw: More dma_memory_read/write() API cleanup

On 21/02/20 14:25, Philippe Mathieu-Daudé wrote:
> Following up "global exec/memory/dma APIs cleanup"
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg681475.html
> 
> Few more cleanups in PCNET & SCSI ESP devices.

Can you send a pull request for everything?

Paolo

[PATCH] accel/kvm: Check ioctl(KVM_SET_USER_MEMORY_REGION) return value

kvm_vm_ioctl() can fail, check its return value, and log an error
when it failed. This fixes Coverity CID 1412229:

  Unchecked return value (CHECKED_RETURN)

  check_return: Calling kvm_vm_ioctl without checking return value

Reported-by: Coverity (CID 1412229)
Fixes: 235e8982ad3 ("support using KVM_MEM_READONLY flag for regions")
Signed-off-by: Philippe Mathieu-Daudé 
---
 accel/kvm/kvm-all.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index c111312dfd..6df3a4d030 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -308,13 +308,23 @@ static int kvm_set_user_memory_region(KVMMemoryListener 
*kml, KVMSlot *slot, boo
 /* Set the slot size to 0 before setting the slot to the desired
  * value. This is needed based on KVM commit 75d61fbc. */
 mem.memory_size = 0;
-kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, );
+ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, );
+if (ret < 0) {
+goto err;
+}
 }
 mem.memory_size = slot->memory_size;
 ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, );
 slot->old_flags = mem.flags;
+err:
 trace_kvm_set_user_memory(mem.slot, mem.flags, mem.guest_phys_addr,
   mem.memory_size, mem.userspace_addr, ret);
+if (ret < 0) {
+error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d,"
+ " start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s",
+ __func__, mem.slot, slot->start_addr,
+ (uint64_t)mem.memory_size, strerror(errno));
+}
 return ret;
 }
 
-- 
2.21.1

Re: [PATCH v1 05/13] migrate/ram: Handle RAM block resizes during precopy

2020-02-21 Thread Peter Xu

On Fri, Feb 21, 2020 at 11:19:58AM +0100, David Hildenbrand wrote:

[...]

> Lol, man this code is confusing.

(Yes in many places it is...)

> 
> So, migration_is_idle() really only checks current_migration, not
> current_incoming.
> 
> I didn't expect to be knees deep in migration code at this point ...
> 
> if (migration_is_idle()) {
>   return:
> }

Yes this seems to work too for postcopy, though may worth add a
comment showing the fact...  Thanks,

-- 
Peter Xu

[PATCH v2 1/2] block/curl: HTTP header fields allow whitespace around values

2020-02-21 Thread David Edmondson

RFC 7230 section 3.2 indicates that whitespace is permitted between
the field name and field value and after the field value.

Signed-off-by: David Edmondson 
---
 block/curl.c | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index f86299378e38..f9ffb7f4e2bf 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -214,11 +214,34 @@ static size_t curl_header_cb(void *ptr, size_t size, 
size_t nmemb, void *opaque)
 {
 BDRVCURLState *s = opaque;
 size_t realsize = size * nmemb;
-const char *accept_line = "Accept-Ranges: bytes";
+const char *header = (char *)ptr;
+const char *end = header + realsize;
+const char *accept_ranges = "Accept-Ranges:";
+const char *bytes = "bytes";
 
-if (realsize >= strlen(accept_line)
-&& strncmp((char *)ptr, accept_line, strlen(accept_line)) == 0) {
-s->accept_range = true;
+if (realsize >= strlen(accept_ranges)
+&& strncmp(header, accept_ranges, strlen(accept_ranges)) == 0) {
+
+char *p = strchr(header, ':') + 1;
+
+/* Skip whitespace between the header name and value. */
+while (p < end && *p && g_ascii_isspace(*p)) {
+p++;
+}
+
+if (end - p >= strlen(bytes)
+&& strncmp(p, bytes, strlen(bytes)) == 0) {
+
+/* Check that there is nothing but whitespace after the value. */
+p += strlen(bytes);
+while (p < end && *p && g_ascii_isspace(*p)) {
+p++;
+}
+
+if (p == end || !*p) {
+s->accept_range = true;
+}
+}
 }
 
 return realsize;
-- 
2.24.1

Re: [PATCH v7 01/11] qapi/error: add (Error **errp) cleaning APIs

Vladimir Sementsov-Ogievskiy  writes:

> 21.02.2020 10:38, Markus Armbruster wrote:
>> Vladimir Sementsov-Ogievskiy  writes:
>>
>>> Add functions to clean Error **errp: call corresponding Error *err
>>> cleaning function an set pointer to NULL.
>>>
>>> New functions:
>>>error_free_errp
>>>error_report_errp
>>>warn_report_errp
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> Reviewed-by: Greg Kurz 
>>> Reviewed-by: Eric Blake 
>>> ---
>>>
>>> CC: Eric Blake 
>>> CC: Kevin Wolf 
>>> CC: Max Reitz 
>>> CC: Greg Kurz 
>>> CC: Stefano Stabellini 
>>> CC: Anthony Perard 
>>> CC: Paul Durrant 
>>> CC: Stefan Hajnoczi 
>>> CC: "Philippe Mathieu-Daudé" 
>>> CC: Laszlo Ersek 
>>> CC: Gerd Hoffmann 
>>> CC: Stefan Berger 
>>> CC: Markus Armbruster 
>>> CC: Michael Roth 
>>> CC: qemu-bl...@nongnu.org
>>> CC: xen-de...@lists.xenproject.org
>>>
>>>   include/qapi/error.h | 26 ++
>>>   1 file changed, 26 insertions(+)
>>>
>>> diff --git a/include/qapi/error.h b/include/qapi/error.h
>>> index ad5b6e896d..d34987148d 100644
>>> --- a/include/qapi/error.h
>>> +++ b/include/qapi/error.h
>>> @@ -309,6 +309,32 @@ void warn_reportf_err(Error *err, const char *fmt, ...)
>>>   void error_reportf_err(Error *err, const char *fmt, ...)
>>>   GCC_FMT_ATTR(2, 3);
>>>   +/*
>>> + * Functions to clean Error **errp: call corresponding Error *err cleaning
>>> + * function, then set pointer to NULL.
>>> + */
>>> +static inline void error_free_errp(Error **errp)
>>> +{
>>> +assert(errp && *errp);
>>> +error_free(*errp);
>>> +*errp = NULL;
>>> +}
>>> +
>>> +static inline void error_report_errp(Error **errp)
>>> +{
>>> +assert(errp && *errp);
>>> +error_report_err(*errp);
>>> +*errp = NULL;
>>> +}
>>> +
>>> +static inline void warn_report_errp(Error **errp)
>>> +{
>>> +assert(errp && *errp);
>>> +warn_report_err(*errp);
>>> +*errp = NULL;
>>> +}
>>> +
>>> +
>>>   /*
>>>* Just like error_setg(), except you get to specify the error class.
>>>* Note: use of error classes other than ERROR_CLASS_GENERIC_ERROR is
>>
>> These appear to be unused apart from the Coccinelle script in PATCH 03.
>>
>> They are used in the full "[RFC v5 000/126] error: auto propagated
>> local_err" series.  Options:
>>
>> 1. Pick a few more patches into this part I series, so these guys come
>> with users.
>
> It needs some additional effort for this series.. But it's possible. Still,
> I think that we at least should not pull out patches which should be in
> future series (for example from ppc or block/)..

Yes, we want to keep related stuff together.

> Grepping through v5:
>  for x in {warn_report_errp,error_report_errp,error_free_errp}; do echo == $x 
> ==; git grep -l $x | grep -v coccinelle | grep -v 'error\.h'; echo; done
> == warn_report_errp ==
> block/file-posix.c
> hw/ppc/spapr.c
> hw/ppc/spapr_caps.c
> hw/ppc/spapr_irq.c
> hw/vfio/pci.c
> net/tap.c
> qom/object.c
>
> == error_report_errp ==
> hw/block/vhost-user-blk.c
> util/oslib-posix.c
>
> == error_free_errp ==
> block.c
> block/qapi.c
> block/sheepdog.c
> block/snapshot.c
> blockdev.c
> chardev/char-socket.c
> hw/audio/intel-hda.c
> hw/core/qdev-properties.c
> hw/pci-bridge/pci_bridge_dev.c
> hw/pci-bridge/pcie_pci_bridge.c
> hw/scsi/megasas.c
> hw/scsi/mptsas.c
> hw/usb/hcd-xhci.c
> io/net-listener.c
> migration/colo.c
> qga/commands-posix.c
> qga/commands-win32.c
> util/qemu-sockets.c
>
> What do you want to add?

PATCH v5 032 uses both error_report_errp() and error_free_errp().
Adding warn_report_errp() without a user is okay with me.  What do you
think?

If there are patches you consider related to 032, feel free to throw
them in.

>>
>> 2. Punt this patch to the first part that has users, along with the
>> part of the Coccinelle script that deals with them.
>
> But coccinelle script would be wrong, if we drop this part from it. I think,
> that after commit which adds coccinelle script, it should work with any file,
> not only subset of these series.
>
> So, it's probably OK for now to drop these functions, forcing their addition 
> if
> coccinelle script will be applied where these functions are needed. We can, 
> for
> example comment these three functions.
>
> Splitting coccinelle script into two parts, which will be in different series 
> will
> not help any patch-porting processes.
>
> Moreover, this will create dependencies between future series updating other 
> files.
>
> So, I don't like [2.]..

And it's likely more work than 1.

>> 3. Do nothing: accept the functions without users.
>
> OK for me)
>
>>
>> I habitually dislike 3., but reviewing the rest of this series might
>> make me override that dislike.
[...]

Re: [PATCH v5 4/5] docs: gpio: Add GPIO Aggregator documentation

2020-02-21 Thread Geert Uytterhoeven

Hi Randy,

On Tue, Feb 18, 2020 at 7:30 PM Randy Dunlap  wrote:
> On 2/18/20 7:18 AM, Geert Uytterhoeven wrote:
> > Document the GPIO Aggregator, and the two typical use-cases.
> >
> > Signed-off-by: Geert Uytterhoeven 

> > --- /dev/null
> > +++ b/Documentation/admin-guide/gpio/gpio-aggregator.rst
> > @@ -0,0 +1,102 @@
> > +.. SPDX-License-Identifier: GPL-2.0-only
> > +
> > +GPIO Aggregator
> > +===
> > +
> > +The GPIO Aggregator allows to aggregate GPIOs, and expose them as a new
>
> "allows" really wants an object following the verb [although the kernel 
> sources
> and docs have many cases of it not having an object].  Something like
>
>allows {you, one, someone, users, a user} to aggregate

Changing to:

provides a mechanism to aggregate GPIOs

> > +gpio_chip.  This supports the following use cases.
> > +
> > +
> > +Aggregating GPIOs using Sysfs
> > +-
> > +
> > +GPIO controllers are exported to userspace using /dev/gpiochip* character
> > +devices.  Access control to these devices is provided by standard UNIX file
> > +system permissions, on an all-or-nothing basis: either a GPIO controller is
> > +accessible for a user, or it is not.
> > +
> > +The GPIO Aggregator allows access control for individual GPIOs, by 
> > aggregating

Changing to:

provides access control for a set of one or more GPIOs

> > +them into a new gpio_chip, which can be assigned to a group or user using
> > +standard UNIX file ownership and permissions.  Furthermore, this 
> > simplifies and
> > +hardens exporting GPIOs to a virtual machine, as the VM can just grab the 
> > full
> > +GPIO controller, and no longer needs to care about which GPIOs to grab and
> > +which not, reducing the attack surface.

> > +Generic GPIO Driver
> > +---
> > +
> > +The GPIO Aggregator can also be used as a generic driver for a simple
> > +GPIO-operated device described in DT, without a dedicated in-kernel driver.
> > +This is useful in industrial control, and is not unlike e.g. spidev, which
> > +allows to communicate with an SPI device from userspace.
>
>allows {choose an object} to communicate

Changing to:

allows the user to communicate

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

[PATCH v2 0/2] block/curl: Improve HTTP header parsing

2020-02-21 Thread David Edmondson

An HTTP object store of my acquaintance returns "accept-ranges: bytes"
(all lower case) as a header, causing the QEMU curl backend to refuse
to talk to it. RFC 7230 says that HTTP headers are case insensitive,
so update the curl backend accordingly.

At the same time, allow for arbitrary white space around the HTTP
header field value, as required by the RFC.

v2:
- strncasecmp -> g_ascii_strncasecmp (Philippe).
- isspace -> g_ascii_isspace, for good measure.

David Edmondson (2):
  block/curl: HTTP header fields allow whitespace around values
  block/curl: HTTP header field names are case insensitive

 block/curl.c | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

-- 
2.24.1

[PATCH v2 2/2] block/curl: HTTP header field names are case insensitive

2020-02-21 Thread David Edmondson

RFC 7230 section 3.2 indicates that HTTP header field names are case
insensitive.

Signed-off-by: David Edmondson 
---
 block/curl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index f9ffb7f4e2bf..1421e8fb9815 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -216,11 +216,11 @@ static size_t curl_header_cb(void *ptr, size_t size, 
size_t nmemb, void *opaque)
 size_t realsize = size * nmemb;
 const char *header = (char *)ptr;
 const char *end = header + realsize;
-const char *accept_ranges = "Accept-Ranges:";
+const char *accept_ranges = "accept-ranges:";
 const char *bytes = "bytes";
 
 if (realsize >= strlen(accept_ranges)
-&& strncmp(header, accept_ranges, strlen(accept_ranges)) == 0) {
+&& g_ascii_strncasecmp(header, accept_ranges, strlen(accept_ranges)) 
== 0) {
 
 char *p = strchr(header, ':') + 1;
 
-- 
2.24.1

[PATCH] hw/unicore32/puv3: Simplify puv3_soc_init()

Since commit d8ed887bdc, the puv3_intc_cpu_handler handler takes
a pointer to UniCore32CPU in its opaque argument.  Directly pass
the cpu pointer.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/unicore32/puv3.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/unicore32/puv3.c b/hw/unicore32/puv3.c
index 7e933de228..eec7f561eb 100644
--- a/hw/unicore32/puv3.c
+++ b/hw/unicore32/puv3.c
@@ -48,7 +48,7 @@ static void puv3_intc_cpu_handler(void *opaque, int irq, int 
level)
 }
 }
 
-static void puv3_soc_init(CPUUniCore32State *env)
+static void puv3_soc_init(UniCore32CPU *cpu)
 {
 qemu_irq cpu_intc, irqs[PUV3_IRQS_NR];
 DeviceState *dev;
@@ -56,8 +56,7 @@ static void puv3_soc_init(CPUUniCore32State *env)
 int i;
 
 /* Initialize interrupt controller */
-cpu_intc = qemu_allocate_irq(puv3_intc_cpu_handler,
- env_archcpu(env), 0);
+cpu_intc = qemu_allocate_irq(puv3_intc_cpu_handler, cpu, 0);
 dev = sysbus_create_simple("puv3_intc", PUV3_INTC_BASE, cpu_intc);
 for (i = 0; i < PUV3_IRQS_NR; i++) {
 irqs[i] = qdev_get_gpio_in(dev, i);
@@ -131,7 +130,7 @@ static void puv3_init(MachineState *machine)
 cpu = UNICORE32_CPU(cpu_create(machine->cpu_type));
 env = >env;
 
-puv3_soc_init(env);
+puv3_soc_init(cpu);
 puv3_board_init(env, ram_size);
 puv3_load_kernel(kernel_filename);
 }
-- 
2.21.1

Re: [PATCH v7 02/11] error: auto propagated local_err

Vladimir Sementsov-Ogievskiy  writes:

> 21.02.2020 12:19, Markus Armbruster wrote:
>> Vladimir Sementsov-Ogievskiy  writes:
>>
>>> Here is introduced ERRP_AUTO_PROPAGATE macro, to be used at start of
>>> functions with an errp OUT parameter.
>>>
>>> It has three goals:
>>>
>>> 1. Fix issue with error_fatal and error_prepend/error_append_hint: user
>>> can't see this additional information, because exit() happens in
>>> error_setg earlier than information is added. [Reported by Greg Kurz]
>>>
>>> 2. Fix issue with error_abort and error_propagate: when we wrap
>>> error_abort by local_err+error_propagate, the resulting coredump will
>>> refer to error_propagate and not to the place where error happened.
>>> (the macro itself doesn't fix the issue, but it allows us to [3.] drop
>>> the local_err+error_propagate pattern, which will definitely fix the
>>> issue) [Reported by Kevin Wolf]
>>>
>>> 3. Drop local_err+error_propagate pattern, which is used to workaround
>>> void functions with errp parameter, when caller wants to know resulting
>>> status. (Note: actually these functions could be merely updated to
>>> return int error code).
>>>
>>> To achieve these goals, later patches will add invocations
>>> of this macro at the start of functions with either use
>>> error_prepend/error_append_hint (solving 1) or which use
>>> local_err+error_propagate to check errors, switching those
>>> functions to use *errp instead (solving 2 and 3).
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> Reviewed-by: Greg Kurz 
>>> Reviewed-by: Eric Blake 
>>> ---
>>>
>>> CC: Eric Blake 
>>> CC: Kevin Wolf 
>>> CC: Max Reitz 
>>> CC: Greg Kurz 
>>> CC: Stefano Stabellini 
>>> CC: Anthony Perard 
>>> CC: Paul Durrant 
>>> CC: Stefan Hajnoczi 
>>> CC: "Philippe Mathieu-Daudé" 
>>> CC: Laszlo Ersek 
>>> CC: Gerd Hoffmann 
>>> CC: Stefan Berger 
>>> CC: Markus Armbruster 
>>> CC: Michael Roth 
>>> CC: qemu-bl...@nongnu.org
>>> CC: xen-de...@lists.xenproject.org
>>>
>>>   include/qapi/error.h | 83 +++-
>>>   1 file changed, 82 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/qapi/error.h b/include/qapi/error.h
>>> index d34987148d..b9452d4806 100644
>>> --- a/include/qapi/error.h
>>> +++ b/include/qapi/error.h
>>> @@ -78,7 +78,7 @@
>>>* Call a function treating errors as fatal:
>>>* foo(arg, _fatal);
>>>*
>>> - * Receive an error and pass it on to the caller:
>>> + * Receive an error and pass it on to the caller (DEPRECATED*):
>>>* Error *err = NULL;
>>>* foo(arg, );
>>>* if (err) {
>>> @@ -98,6 +98,50 @@
>>>* foo(arg, errp);
>>>* for readability.
>>>*
>>> + * DEPRECATED* This pattern is deprecated now, the use ERRP_AUTO_PROPAGATE 
>>> macro
>>> + * instead (defined below).
>>> + * It's deprecated because of two things:
>>> + *
>>> + * 1. Issue with error_abort and error_propagate: when we wrap error_abort 
>>> by
>>> + * local_err+error_propagate, the resulting coredump will refer to
>>> + * error_propagate and not to the place where error happened.
>>> + *
>>> + * 2. A lot of extra code of the same pattern
>>> + *
>>> + * How to update old code to use ERRP_AUTO_PROPAGATE?
>>> + *
>>> + * All you need is to add ERRP_AUTO_PROPAGATE() invocation at function 
>>> start,
>>> + * than you may safely dereference errp to check errors and do not need any
>>> + * additional local Error variables or calls to error_propagate().
>>> + *
>>> + * Example:
>>> + *
>>> + * old code
>>> + *
>>> + * void fn(..., Error **errp) {
>>> + * Error *err = NULL;
>>> + * foo(arg, );
>>> + * if (err) {
>>> + * handle the error...
>>> + * error_propagate(errp, err);
>>> + * return;
>>> + * }
>>> + * ...
>>> + * }
>>> + *
>>> + * updated code
>>> + *
>>> + * void fn(..., Error **errp) {
>>> + * ERRP_AUTO_PROPAGATE();
>>> + * foo(arg, errp);
>>> + * if (*errp) {
>>> + * handle the error...
>>> + * return;
>>> + * }
>>> + * ...
>>> + * }
>>> + *
>>> + *
>>>* Receive and accumulate multiple errors (first one wins):
>>>* Error *err = NULL, *local_err = NULL;
>>>* foo(arg, );
>>
>> Let's explain what should be done *first*, and only then talk about the
>> deprecated pattern and how to convert it to current usage.
>>
>>> @@ -348,6 +392,43 @@ void error_set_internal(Error **errp,
>>>   ErrorClass err_class, const char *fmt, ...)
>>>   GCC_FMT_ATTR(6, 7);
>>>   +typedef struct ErrorPropagator {
>>> +Error *local_err;
>>> +Error **errp;
>>> +} ErrorPropagator;
>>> +
>>> +static inline void error_propagator_cleanup(ErrorPropagator *prop)
>>> +{
>>> +error_propagate(prop->errp, prop->local_err);
>>> +}
>>> +
>>> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagator, 
>>> error_propagator_cleanup);
>>> +
>>> +/*
>>> + * ERRP_AUTO_PROPAGATE
>>> + *
>>> + *

[PATCH] target/i386: check for empty register in FXAM

The fxam instruction returns the wrong result after fdecstp or after
an underflow.  Check fptags to handle this.

Reported-by: 
Signed-off-by: Paolo Bonzini 
---
 target/i386/fpu_helper.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 99f28f267f..792a128a6d 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -991,7 +991,11 @@ void helper_fxam_ST0(CPUX86State *env)
 env->fpus |= 0x200; /* C1 <-- 1 */
 }
 
-/* XXX: test fptags too */
+if (env->fptags[env->fpstt]) {
+env->fpus |= 0x4100; /* Empty */
+return;
+}
+
 expdif = EXPD(temp);
 if (expdif == MAXEXPD) {
 if (MANTD(temp) == 0x8000ULL) {
-- 
2.21.1

[PATCH] hw/mips/mips_int: Simplify cpu_mips_irq_init_cpu()

Since commit d8ed887bdc, the cpu_mips_irq_request handler takes
a pointer to MIPSCPU in its opaque argument.  Directly pass the
cpu pointer.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/mips/mips_int.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/mips/mips_int.c b/hw/mips/mips_int.c
index 863ed45659..796730b11d 100644
--- a/hw/mips/mips_int.c
+++ b/hw/mips/mips_int.c
@@ -77,7 +77,7 @@ void cpu_mips_irq_init_cpu(MIPSCPU *cpu)
 qemu_irq *qi;
 int i;
 
-qi = qemu_allocate_irqs(cpu_mips_irq_request, env_archcpu(env), 8);
+qi = qemu_allocate_irqs(cpu_mips_irq_request, cpu, 8);
 for (i = 0; i < 8; i++) {
 env->irq[i] = qi[i];
 }
-- 
2.21.1

[PULL v2 00/46] target-arm queue

v1->v2 changes: dropped the last 6 patches from rth as there's
a problem with one of them that's too complicated to try to
fix up.

thanks
-- PMM

The following changes since commit a8c6af67e1e8d460e2c6e87070807e0a02c0fec2:

  Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.0-20200221' into 
staging (2020-02-21 14:20:42 +)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git 
tags/pull-target-arm-20200221-1

for you to fetch changes up to 9eb4f58918a851fb46895fd9b7ce579afeac9d02:

  target/arm: Set MVFR0.FPSP for ARMv5 cpus (2020-02-21 16:07:03 +)


target-arm queue:
 * aspeed/scu: Implement chip ID register
 * hw/misc/iotkit-secctl: Fix writing to 'PPC Interrupt Clear' register
 * mainstone: Make providing flash images non-mandatory
 * z2: Make providing flash images non-mandatory
 * Fix failures to flush SVE high bits after AdvSIMD INS/ZIP/UZP/TRN/TBL/TBX/EXT
 * Minor performance improvement: spend less time recalculating hflags values
 * Code cleanup to isar_feature function tests
 * Implement ARMv8.1-PMU and ARMv8.4-PMU extensions
 * Bugfix: correct handling of PMCR_EL0.LC bit
 * Bugfix: correct definition of PMCRDP
 * Correctly implement ACTLR2, HACTLR2
 * allwinner: Wire up USB ports
 * Vectorize emulation of USHL, SSHL, PMUL*
 * xilinx_spips: Correct the number of dummy cycles for the FAST_READ_4 cmd
 * sh4: Fix PCI ISA IO memory subregion


Francisco Iglesias (1):
  xilinx_spips: Correct the number of dummy cycles for the FAST_READ_4 cmd

Guenter Roeck (6):
  mainstone: Make providing flash images non-mandatory
  z2: Make providing flash images non-mandatory
  hw: usb: hcd-ohci: Move OHCISysBusState and TYPE_SYSBUS_OHCI to include 
file
  hcd-ehci: Introduce "companion-enable" sysbus property
  arm: allwinner: Wire up USB ports
  sh4: Fix PCI ISA IO memory subregion

Joel Stanley (2):
  aspeed/scu: Create separate write callbacks
  aspeed/scu: Implement chip ID register

Peter Maydell (21):
  target/arm: Add _aa32_ to isar_feature functions testing 32-bit ID 
registers
  target/arm: Check aa32_pan in take_aarch32_exception(), not aa64_pan
  target/arm: Add isar_feature_any_fp16 and document naming/usage 
conventions
  target/arm: Define and use any_predinv isar_feature test
  target/arm: Factor out PMU register definitions
  target/arm: Add and use FIELD definitions for ID_AA64DFR0_EL1
  target/arm: Use FIELD macros for clearing ID_DFR0 PERFMON field
  target/arm: Define an aa32_pmu_8_1 isar feature test function
  target/arm: Add _aa64_ and _any_ versions of pmu_8_1 isar checks
  target/arm: Stop assuming DBGDIDR always exists
  target/arm: Move DBGDIDR into ARMISARegisters
  target/arm: Read debug-related ID registers from KVM
  target/arm: Implement ARMv8.1-PMU extension
  target/arm: Implement ARMv8.4-PMU extension
  target/arm: Provide ARMv8.4-PMU in '-cpu max'
  target/arm: Correct definition of PMCRDP
  target/arm: Correct handling of PMCR_EL0.LC bit
  target/arm: Test correct register in aa32_pan and aa32_ats1e1 checks
  target/arm: Use isar_feature function for testing AA32HPD feature
  target/arm: Use FIELD_EX32 for testing 32-bit fields
  target/arm: Correctly implement ACTLR2, HACTLR2

Philippe Mathieu-Daudé (1):
  hw/misc/iotkit-secctl: Fix writing to 'PPC Interrupt Clear' register

Richard Henderson (15):
  target/arm: Flush high bits of sve register after AdvSIMD EXT
  target/arm: Flush high bits of sve register after AdvSIMD TBL/TBX
  target/arm: Flush high bits of sve register after AdvSIMD ZIP/UZP/TRN
  target/arm: Flush high bits of sve register after AdvSIMD INS
  target/arm: Use bit 55 explicitly for pauth
  target/arm: Fix select for aa64_va_parameters_both
  target/arm: Remove ttbr1_valid check from get_phys_addr_lpae
  target/arm: Split out aa64_va_parameter_tbi, aa64_va_parameter_tbid
  target/arm: Vectorize USHL and SSHL
  target/arm: Convert PMUL.8 to gvec
  target/arm: Convert PMULL.64 to gvec
  target/arm: Convert PMULL.8 to gvec
  target/arm: Rename isar_feature_aa32_simd_r32
  target/arm: Use isar_feature_aa32_simd_r32 more places
  target/arm: Set MVFR0.FPSP for ARMv5 cpus

 hw/usb/hcd-ohci.h  |  16 ++
 include/hw/arm/allwinner-a10.h |   6 +
 target/arm/cpu.h   | 145 ++---
 target/arm/helper-sve.h|   2 +
 target/arm/helper.h|  21 +-
 target/arm/internals.h |  47 -
 target/arm/translate.h |   6 +
 hw/arm/allwinner-a10.c |  43 
 hw/arm/mainstone.c |  11 +-
 hw/arm/z2.c|   6 -
 hw/intc/armv7m_nvic.c  |  10 +-
 hw/misc/aspeed_scu.c   |  93 ++--
 h

Re: [PATCH] target: i386: Check float overflow about register stack

On 21/02/20 15:09, Chen Gang wrote:
>> -/* XXX: test fptags too */
>> +if (env->fptags[env->fpstt]) {
>> +env->fpus |= 0x4100; /* Empty */
>> +return;
>> +}
>> +
> For fpop overflow, this fix is enough, but for me, we still need
> foverflow to check fpush/fld*_ST0 overflow.
> 
> Don't you think we need check fpush/f[i]ld*_ST0 overflow?

After fld/fild or any other push, FXAM ST0 should not return empty in my
opinion.

Paolo

Re: [PULL 00/52] target-arm queue

2020-02-21 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20200221130740.7583-1-peter.mayd...@linaro.org/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===




The full log is available at
http://patchew.org/logs/20200221130740.7583-1-peter.mayd...@linaro.org/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC

On Fri, 21 Feb 2020 at 16:05, BALATON Zoltan  wrote:
>
> On Thu, 20 Feb 2020, Richard Henderson wrote:
> > On 2/18/20 9:10 AM, BALATON Zoltan wrote:
> >> +DEFINE_PROP_BOOL("hardfloat", PowerPCCPU, hardfloat, true),
> >
> > I would also prefer a different name here -- perhaps x-no-fp-fi.
>
> What's wrong with hardfloat? That's how the code refers to this so if
> anyone searches what it does would turn up some meaningful results.

This prompted me to check what you're using the property for.
The cover letter says:
> This patch implements a simple way to keep the inexact flag set for
> hardfloat while still allowing to revert to softfloat for workloads
> that need more accurate albeit slower emulation. (Set hardfloat
> property of CPU, i.e. -cpu name,hardfloat=false for that.)

I think that is the wrong approach. Enabling use of the host
FPU should not affect the accuracy of the emulation, which
should remain bitwise-correct. We should only be using the
host FPU to the extent that we can do that without discarding
accuracy. As far as I'm aware that's how the hardfloat support
for other guest CPUs that use it works.

thanks
-- PMM

Re: [PATCH 10/19] target/arm: Add missing checks for fpsp_v2

On Fri, 14 Feb 2020 at 18:16, Richard Henderson
 wrote:
>
> We will eventually remove the early ARM_FEATURE_VFP test,
> so add a proper test for each trans_* that does not already
> have another ISA test.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [PATCH 04/19] target/arm: Set MVFR0.FPSP for ARMv5 cpus