date:20190107

Re: [Qemu-devel] [RFC PATCH 13/25] scsi: express dependencies with Kconfig

2019-01-07 Thread Yang Zhong

On Fri, Jan 04, 2019 at 02:38:23PM +0100, Paolo Bonzini wrote:
> On 27/12/18 07:34, Yang Zhong wrote:
> > From: Paolo Bonzini 
> > 
> > This lets you disable SCSI altogether with "CONFIG_SCSI=n".
> 
> USB_STORAGE_BOT and USB_STORAGE_UAS must also select SCSI.
> 
> Paolo

  Thanks for the reminder, Paolo. I will add this.

  Regards,

  Yang

Re: [Qemu-devel] [PATCH v2] s390: avoid potential null dereference in s390_pcihost_unplug()

2019-01-07 Thread 李强


At 2019-01-08 00:10:29, "Cornelia Huck"  wrote:
>On Mon, 7 Jan 2019 16:04:35 +
>Peter Maydell  wrote:
>
>> On Mon, 7 Jan 2019 at 15:57, Cornelia Huck  wrote:
>> > On Mon, 7 Jan 2019 15:54:21 +
>> > Peter Maydell  wrote:  
>> > > On Mon, 7 Jan 2019 at 15:48, Cornelia Huck  wrote:  
>> > > > Sounds good. But please return anyway in the unplug case, so that the
>> > > > code is fine if asserts have been configured out.  
>> > >
>> > > Hopefully that won't cause the compiler to complain about
>> > > unreachable code :-)  
>> >
>> > BTW: Is there a common configuration where asserts are configured out?
>> > Not that this is an accident waiting to happen...  
>> 
>> No -- we insist they are always enabled, and osdep.h will #error
>> out if either NDEBUG or G_DISABLE_ASSERT are set.
>
>Ah, now I remember (I thought we still had that problem.)
>

>In that case, no return is needed.


Ok, later I will send out a revised patch.


Thanks,
Li Qiang
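
The conclusion of this thread is that assertions are never compiled out, so an "unreachable" branch guarded by an assert needs no fallback return. A minimal sketch of that pattern follows. The enum, the function name unplug_dispatch and the #error wording are hypothetical and are not taken from osdep.h or from the s390 patch; only the idea of refusing to build with NDEBUG or G_DISABLE_ASSERT comes from the discussion above.

```c
#include <assert.h>
#include <stddef.h>

/* Refuse to build when assertions would be compiled out (wording is illustrative). */
#ifdef NDEBUG
#error "building with NDEBUG is not supported"
#endif
#ifdef G_DISABLE_ASSERT
#error "building with G_DISABLE_ASSERT is not supported"
#endif

/* Hypothetical device kinds, for illustration only. */
enum dev_kind { DEV_PCI_FUNCTION, DEV_PCI_BRIDGE, DEV_OTHER };

/* Records which path handled the device: 1 for a function, 2 for a bridge. */
static void unplug_dispatch(enum dev_kind kind, int *handled)
{
    if (kind == DEV_PCI_FUNCTION) {
        *handled = 1;
        return;
    }
    if (kind == DEV_PCI_BRIDGE) {
        *handled = 2;
        return;
    }
    /* Assertions are always enabled, so this aborts and no return is needed. */
    assert(!"unexpected device kind");
}
```

Because the guard at the top makes the build fail when assertions are disabled, the final assert can be relied on to stop execution for any unexpected kind.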

Re: [Qemu-devel] [RFC PATCH 10/25] build: convert pci.mak to Kconfig

2019-01-07 Thread Yang Zhong

On Fri, Jan 04, 2019 at 02:48:03PM +0100, Thomas Huth wrote:
> On 2018-12-27 07:34, Yang Zhong wrote:
> > From: Paolo Bonzini 
> > 
> > Instead of including the same list of devices for each target,
> > set CONFIG_PCI to true, and make the devices default to present
> > whenever PCI is available.
> > 
> > Done mostly with the following script:
> > 
> >   while read i; do
> >  i=${i%=y}; i=${i#CONFIG_}
> >  sed -i -e'/^config '$i'$/!b' -en \
> > -e'a\' -e'default y\' -e'depends on PCI' \
> >   `grep -lw $i hw/*/Kconfig`
> >   done < default-configs/pci.mak
> > 
> > followed by replacing a few "depends on" clauses with "select"
> > whenever the symbol is not really related to PCI.
> > 
> > Signed-off-by: Paolo Bonzini 
> > Signed-off-by: Yang Zhong 
> > ---
> >  default-configs/i386-softmmu.mak |  2 +-
> >  default-configs/pci.mak  | 47 
> >  hw/audio/Kconfig |  6 
> >  hw/block/Kconfig |  2 ++
> >  hw/char/Kconfig  |  2 ++
> >  hw/display/Kconfig   | 12 +++-
> >  hw/ide/Kconfig   |  6 
> >  hw/ipack/Kconfig |  2 ++
> >  hw/misc/Kconfig  |  4 +++
> >  hw/net/Kconfig   | 19 +
> >  hw/scsi/Kconfig  | 11 
> >  hw/sd/Kconfig|  3 ++
> >  hw/usb/Kconfig   | 10 +++
> >  hw/virtio/Kconfig|  3 ++
> >  hw/watchdog/Kconfig  |  2 ++
> >  15 files changed, 82 insertions(+), 49 deletions(-)
> >  delete mode 100644 default-configs/pci.mak
> [...]
> > diff --git a/hw/char/Kconfig b/hw/char/Kconfig
> > index 26c13243cf..1ed6f0dbce 100644
> > --- a/hw/char/Kconfig
> > +++ b/hw/char/Kconfig
> > @@ -15,6 +15,8 @@ config SERIAL_ISA
> >  
> >  config SERIAL_PCI
> >  bool
> > +default y
> > +depends on PCI
> 
> I think this likely needs a "select SERIAL", too? At least the
> SERIAL_ISA switch gets a "select SERIAL" in patch 15/25, so it sounds
> logical if the SERIAL_PCI would get this, too...
> 
>  Thomas

  Thomas, thanks. I will add "select SERIAL" in patch 15/25. Thanks again!

  Regards,

  Yang

Re: [Qemu-devel] [PATCH for-4.0 v9 09/16] qemu_thread: supplement error handling for pci_edu_realize

2019-01-07 Thread Peter Xu

On Tue, Jan 08, 2019 at 07:14:11AM +0100, Jiri Slaby wrote:
> On 07. 01. 19, 18:29, Markus Armbruster wrote:
> >static void pci_edu_uninit(PCIDevice *pdev)
> >{
> >EduState *edu = EDU(pdev);
> > 
> >qemu_mutex_lock(&edu->thr_mutex);
> >edu->stopping = true;
> >qemu_mutex_unlock(&edu->thr_mutex);
> >qemu_cond_signal(&edu->thr_cond);
> >qemu_thread_join(&edu->thread);
> > 
> >qemu_cond_destroy(&edu->thr_cond);
> >qemu_mutex_destroy(&edu->thr_mutex);
> > 
> >timer_del(&edu->dma_timer);
> >}
> > 
> > Preexisting: pci_edu_uninit() neglects to call msi_uninit().  Jiri?
> 
> I don't know, the MSI support was added in:
> commit eabb5782f70b4a10975b24ccd7129929a05ac932
> Author: Peter Xu 
> Date:   Wed Sep 28 21:03:39 2016 +0800
> 
> hw/misc/edu: support MSI interrupt
> 
> Hence CCing Peter.

Hi, Jiri, Markus, Fei,

IMHO msi_uninit() is optional since it only operates on the config
space of the device to remove the capability or fix up the flags
without really doing any real destruction of objects so nothing will
be leaked (unlike msix_uninit, which should be required).  But I do
agree that calling msi_uninit() could be even nicer here.

Would anyone like to post a patch? Or should I?

Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH] hw: pvrdma: fix memory leak in error path

2019-01-07 Thread 李强


At 2019-01-08 00:26:58, "Yuval Shaia"  wrote:
>On Thu, Jan 03, 2019 at 02:47:37PM +0100, Philippe Mathieu-Daudé wrote:
>> On 1/3/19 2:03 PM, Li Qiang wrote:
>> > Spotted by Coverity: CID 1398595
>> > 
>> 
>> Fixes: 2b05705dc8
>> 
>> > Signed-off-by: Li Qiang 
>> 
>> Reviewed-by: Philippe Mathieu-Daudé 
>> 
>> > ---
>> >  hw/rdma/vmw/pvrdma_qp_ops.c | 2 ++
>> >  1 file changed, 2 insertions(+)
>> > 
>> > diff --git a/hw/rdma/vmw/pvrdma_qp_ops.c b/hw/rdma/vmw/pvrdma_qp_ops.c
>> > index 300471a4c9..584be2043e 100644
>> > --- a/hw/rdma/vmw/pvrdma_qp_ops.c
>> > +++ b/hw/rdma/vmw/pvrdma_qp_ops.c
>> > @@ -168,6 +168,7 @@ int pvrdma_qp_send(PVRDMADev *dev, uint32_t qp_handle)
>> >  sgid = rdma_rm_get_gid(&dev->rdma_dev_res, 
>> > wqe->hdr.wr.ud.av.gid_index);
>> >  if (!sgid) {
>> >  pr_dbg("Fail to get gid for idx %d\n", 
>> > wqe->hdr.wr.ud.av.gid_index);
>> > +g_free(comp_ctx);
>> >  return -EIO;
>> >  }
>> >  pr_dbg("sgid_id=%d, sgid=0x%llx\n", wqe->hdr.wr.ud.av.gid_index,
>> > @@ -179,6 +180,7 @@ int pvrdma_qp_send(PVRDMADev *dev, uint32_t qp_handle)
>> >  if (sgid_idx <= 0) {
>> >  pr_dbg("Fail to get bk sgid_idx for sgid_idx %d\n",
>> > wqe->hdr.wr.ud.av.gid_index);
>> > +g_free(comp_ctx);
>> >  return -EIO;
>> >  }
>
>Since comp_ctx is not used until the two checks are done we just can
>relocate the allocation & initialization right after the two checks.

>


OK, will send a revised version later.


Thanks,
Li Qiang


>Yuval
>
>> >  
>> >
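
Yuval's suggestion above (run the two checks first, allocate afterwards) can be sketched in isolation as follows. This is not the pvrdma code: struct comp_ctx, send_one and its parameters are hypothetical stand-ins, and plain malloc is used instead of the GLib allocator so the example is self-contained.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the completion context in the quoted patch. */
struct comp_ctx {
    int handle;
};

/*
 * Run both validity checks first and allocate only once they pass,
 * so the early-return paths have nothing to free.
 */
static int send_one(int sgid_ok, int sgid_idx, struct comp_ctx **out)
{
    struct comp_ctx *ctx;

    if (!sgid_ok) {
        return -EIO;
    }
    if (sgid_idx <= 0) {
        return -EIO;
    }

    ctx = malloc(sizeof(*ctx));
    if (!ctx) {
        return -ENOMEM;
    }
    ctx->handle = sgid_idx;
    *out = ctx; /* ownership passes to the caller */
    return 0;
}
```

Compared with adding a free call to every error path, this ordering keeps the error paths trivially leak-free even if more checks are added later.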

Re: [Qemu-devel] [PATCH 3/3] machine: Use shorter format for GlobalProperty arrays

2019-01-07 Thread Gerd Hoffmann

  Hi,

> +{ "migration", "decompress-error-check", "off" },
> +{ "hda-audio", "use-timer", "false" },
> +{ "cirrus-vga", "global-vmstate", "true" },
> +{ "VGA", "global-vmstate", "true" },
> +{ "vmware-svga", "global-vmstate", "true" },
> +{ "qxl-vga", "global-vmstate", "true" },

I'd like to have the fields aligned.  Especially in cases like this one
where multiple devices get the same value assigned it makes things more
readable:

{ "migration",   "decompress-error-check", "off"   },
{ "hda-audio",   "use-timer",              "false" },
{ "cirrus-vga",  "global-vmstate",         "true"  },
{ "VGA",         "global-vmstate",         "true"  },
{ "vmware-svga", "global-vmstate",         "true"  },
{ "qxl-vga",     "global-vmstate",         "true"  },

thanks,
  Gerd

Re: [Qemu-devel] [PATCH v1] dump: Set correct vaddr for ELF dump

2019-01-07 Thread Jon Doron

Thank you for looking into this. Perhaps I could change the patch (at
least the C part, not the Python script) to something like:
-phdr.p_vaddr = cpu_to_dumpXX(s, memory_mapping->virt_addr);
+phdr.p_vaddr = cpu_to_dumpXX(s, memory_mapping->virt_addr) ?
+    cpu_to_dumpXX(s, memory_mapping->virt_addr) : phdr.p_paddr;

So when paging is enabled and virt_addr is available, we will use it;
otherwise p_vaddr falls back to p_paddr.

Thanks,
-- Jon.

On Mon, Jan 7, 2019 at 8:04 PM Laszlo Ersek  wrote:
>
> On 01/07/19 13:14, Marc-André Lureau wrote:
> > Hi
> >
> > On Tue, Dec 25, 2018 at 5:52 PM Jon Doron  wrote:
> >>
> >> vaddr needs to be equal to the paddr since the dump file represents the
> >> physical memory image.
> >>
> >> Without setting vaddr correctly, GDB would load all the different memory
> >> regions on top of each other to vaddr 0, thus making GDB showing the wrong
> >> memory data for a given address.
> >>
> >> Signed-off-by: Jon Doron 
> >
> > This is a non-trivial patch! (qemu-trivial, please ignore).
> >
> >> ---
> >>  dump.c   | 4 ++--
> >>  scripts/dump-guest-memory.py | 1 +
> >>  2 files changed, 3 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/dump.c b/dump.c
> >> index 4ec94c5e25..bf77a119ea 100644
> >> --- a/dump.c
> >> +++ b/dump.c
> >> @@ -192,7 +192,7 @@ static void write_elf64_load(DumpState *s, 
> >> MemoryMapping *memory_mapping,
> >>  phdr.p_paddr = cpu_to_dump64(s, memory_mapping->phys_addr);
> >>  phdr.p_filesz = cpu_to_dump64(s, filesz);
> >>  phdr.p_memsz = cpu_to_dump64(s, memory_mapping->length);
> >> -phdr.p_vaddr = cpu_to_dump64(s, memory_mapping->virt_addr);
> >> +phdr.p_vaddr = phdr.p_paddr;
> >
> > This is likely breaking paging=true somehow, which sets
> > memory_mapping->virt_addr to non-0.
> >
> > According to doc "If you want to use gdb to process the core, please
> > set @paging to true."
> >
> > Although I am not able to (gdb) x/10bx 0xa for example on a core
> > produced with paging. Not sure why, anybody could help?
> >
> >>  assert(memory_mapping->length >= filesz);
> >>
> >> @@ -216,7 +216,7 @@ static void write_elf32_load(DumpState *s, 
> >> MemoryMapping *memory_mapping,
> >>  phdr.p_paddr = cpu_to_dump32(s, memory_mapping->phys_addr);
> >>  phdr.p_filesz = cpu_to_dump32(s, filesz);
> >>  phdr.p_memsz = cpu_to_dump32(s, memory_mapping->length);
> >> -phdr.p_vaddr = cpu_to_dump32(s, memory_mapping->virt_addr);
> >> +phdr.p_vaddr = phdr.p_paddr;
> >>
> >>  assert(memory_mapping->length >= filesz);
> >>
> >> diff --git a/scripts/dump-guest-memory.py b/scripts/dump-guest-memory.py
> >> index 198cd0fe40..2c587cbefc 100644
> >> --- a/scripts/dump-guest-memory.py
> >> +++ b/scripts/dump-guest-memory.py
> >> @@ -163,6 +163,7 @@ class ELF(object):
> >>  phdr = get_arch_phdr(self.endianness, self.elfclass)
> >>  phdr.p_type = p_type
> >>  phdr.p_paddr = p_paddr
> >> +phdr.p_vaddr = p_paddr
> >
> > With your proposed change though, I can dump memory with gdb...
> >
> >>  phdr.p_filesz = p_size
> >>  phdr.p_memsz = p_size
> >>  self.segments.append(phdr)
> >> --
> >> 2.19.2
> >>
> >>
> >
> >
>
> I've never used paging-enabled dumps. First, because doing so requires
> QEMU to trust guest memory contents (see original commit 783e9b4826b9;
> or more recently/generally, the @dump-guest-memory docs in
> "qapi/misc.json"). Second, because whenever I had to deal with guest
> memory dumps, I always used "crash" (which needs no paging), and the
> subject guests were all Linux.
>
> I can't comment on paging-enabled patches for dump, except that they
> shouldn't regress the paging-disabled functionality. :) If the patches
> satisfy that, I'm fine.
>
> (I *am* surprised that GDB insists on p_vaddr equaling p_paddr; after
> all, in the guest, the virtual address is "memory_mapping->virt_addr".
> But, I would never claim to understand most of the ELF intricacies,
> and/or what GDB requires on top of those.)
>
> Thanks
> Laszlo
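
The fallback Jon proposes at the top of this message can be sketched on its own. The structs below are simplified, hypothetical stand-ins for MemoryMapping and the ELF program header, and the byte-order conversion done by cpu_to_dump32/cpu_to_dump64 is left out. The sketch assumes that a virt_addr of 0 means that no paging information is available.

```c
#include <stdint.h>

/* Simplified, hypothetical stand-ins for MemoryMapping and the ELF phdr. */
struct mapping {
    uint64_t phys_addr;
    uint64_t virt_addr; /* assumed to be 0 when paging is disabled */
};

struct load_phdr {
    uint64_t p_paddr;
    uint64_t p_vaddr;
};

static void fill_load_addrs(struct load_phdr *ph, const struct mapping *m)
{
    ph->p_paddr = m->phys_addr;
    /* Use the guest virtual address when present, else mirror the physical one. */
    ph->p_vaddr = m->virt_addr ? m->virt_addr : m->phys_addr;
}
```

One consequence of this design is that a mapping whose real virtual address is 0 is indistinguishable from "no virtual address", so it would also receive the physical address.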

Re: [Qemu-devel] [PATCH for-4.0 v9 09/16] qemu_thread: supplement error handling for pci_edu_realize

2019-01-07 Thread Jiri Slaby

On 07. 01. 19, 18:29, Markus Armbruster wrote:
>static void pci_edu_uninit(PCIDevice *pdev)
>{
>EduState *edu = EDU(pdev);
> 
>qemu_mutex_lock(&edu->thr_mutex);
>edu->stopping = true;
>qemu_mutex_unlock(&edu->thr_mutex);
>qemu_cond_signal(&edu->thr_cond);
>qemu_thread_join(&edu->thread);
> 
>qemu_cond_destroy(&edu->thr_cond);
>qemu_mutex_destroy(&edu->thr_mutex);
> 
>timer_del(&edu->dma_timer);
>}
> 
> Preexisting: pci_edu_uninit() neglects to call msi_uninit().  Jiri?

I don't know, the MSI support was added in:
commit eabb5782f70b4a10975b24ccd7129929a05ac932
Author: Peter Xu 
Date:   Wed Sep 28 21:03:39 2016 +0800

hw/misc/edu: support MSI interrupt

Hence CCing Peter.

thanks,
-- 
js
suse labs

[Qemu-devel] [PATCH] vfio: assign idstr for VFIO's mmaped regions for migration

2019-01-07 Thread Zhao Yan

If multiple regions in vfio are mmapped, their corresponding ramblocks
look like the listing below, i.e. their idstrs are "".

(qemu) info ramblock
Block Name    PSize    Offset        Used          Total
pc.ram        4 KiB    0x            0x2000        0x2000
              4 KiB    0x2110        0x2000        0x2000
              4 KiB    0x2090        0x0080        0x0080
              4 KiB    0x2024        0x00687000    0x00687000
              4 KiB    0x200c        0x00178000    0x00178000
pc.bios       4 KiB    0x2000        0x0004        0x0004
pc.rom        4 KiB    0x2004        0x0002        0x0002

This is because a ramblock's idstr is assigned by calling
vmstate_register_ram(), but memory regions of type ram device ptr never
call vmstate_register_ram().
vfio_region_mmap
|->memory_region_init_ram_device_ptr
   |-> memory_region_init_ram_ptr

Empty idstrs cause problems for snapshot copying during migration,
because migration uses each ramblock's idstr to identify it.
ram_save_setup {
  …
  RAMBLOCK_FOREACH(block) {
  qemu_put_byte(f, strlen(block->idstr));
  qemu_put_buffer(f, (uint8_t *)block->idstr,strlen(block->idstr));
  qemu_put_be64(f, block->used_length);
  }
  …
}
ram_load() {
block = qemu_ram_block_by_name(id);
if (block) {
if (length != block->used_length) {
qemu_ram_resize(block, length, &local_err);
}
 ….
   }
}

Therefore, this patch calls vmstate_register_ram() for memory regions of
type ram ptr, and assigns each vfio device a unique vfioid that is the
same on the source and target VMs.
E.g. in the source VM, use the QEMU parameter
-device
vfio-pci,sysfsdev=/sys/bus/pci/devices/:00:02.0/
882cc4da-dede-11e7-9180-078a62063ab1,vfioid=igd

and in the target VM, use the QEMU parameter
-device
vfio-pci,sysfsdev=/sys/bus/pci/devices/:00:02.0/
5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,vfioid=igd

Signed-off-by: Zhao Yan 
---
 hw/vfio/pci.c | 8 +++-
 include/hw/vfio/vfio-common.h | 1 +
 memory.c  | 4 
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index c0cb1ec289..7bc2ed0752 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2533,7 +2533,12 @@ static void vfio_populate_device(VFIOPCIDevice *vdev, 
Error **errp)
 }
 
 for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
-char *name = g_strdup_printf("%s BAR %d", vbasedev->name, i);
+char *name;
+if (vbasedev->vfioid) {
+name = g_strdup_printf("%s BAR %d", vbasedev->vfioid, i);
+} else {
+name = g_strdup_printf("%s BAR %d", vbasedev->name, i);
+}
 
 ret = vfio_region_setup(OBJECT(vdev), vbasedev,
 &vdev->bars[i].region, i, name);
@@ -3180,6 +3185,7 @@ static void vfio_instance_init(Object *obj)
 static Property vfio_pci_dev_properties[] = {
 DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIOPCIDevice, host),
 DEFINE_PROP_STRING("sysfsdev", VFIOPCIDevice, vbasedev.sysfsdev),
+DEFINE_PROP_STRING("vfioid", VFIOPCIDevice, vbasedev.vfioid),
 DEFINE_PROP_ON_OFF_AUTO("display", VFIOPCIDevice,
 display, ON_OFF_AUTO_OFF),
 DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIOPCIDevice,
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 1b434d02f6..84bab94f52 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -108,6 +108,7 @@ typedef struct VFIODevice {
 struct VFIOGroup *group;
 char *sysfsdev;
 char *name;
+char *vfioid;
 DeviceState *dev;
 int fd;
 int type;
diff --git a/memory.c b/memory.c
index d14c6dec1d..dbb29fa989 100644
--- a/memory.c
+++ b/memory.c
@@ -1588,6 +1588,7 @@ void memory_region_init_ram_ptr(MemoryRegion *mr,
 uint64_t size,
 void *ptr)
 {
+DeviceState *owner_dev;
 memory_region_init(mr, owner, name, size);
 mr->ram = true;
 mr->terminates = true;
@@ -1597,6 +1598,9 @@ void memory_region_init_ram_ptr(MemoryRegion *mr,
 /* qemu_ram_alloc_from_ptr cannot fail with ptr != NULL.  */
 assert(ptr != NULL);
 mr->ram_block = qemu_ram_alloc_from_ptr(size, ptr, mr, &error_fatal);
+
+owner_dev = DEVICE(owner);
+vmstate_register_ram(mr, owner_dev);
 }
 
 void memory_region_init_ram_device_ptr(MemoryRegion *mr,
-- 
2.17.1
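
To illustrate why empty idstrs are a problem for the load side, here is a toy name lookup in the spirit of the qemu_ram_block_by_name() call quoted in the commit message. It is a sketch under assumptions, not QEMU code: struct ram_block and block_by_name are hypothetical, and the block names used with it are illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified RAM block: only the fields the lookup needs. */
struct ram_block {
    const char *idstr;
    unsigned long used_length;
};

/* Return the first block whose idstr equals id, or NULL if none matches. */
static const struct ram_block *block_by_name(const struct ram_block *blocks,
                                             size_t n, const char *id)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(blocks[i].idstr, id) == 0) {
            return &blocks[i];
        }
    }
    return NULL;
}
```

With several blocks all named "", every lookup for "" resolves to the first of them, so the remaining unnamed blocks can never be matched. Giving each block a distinct name makes each one individually addressable.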

Re: [Qemu-devel] [RFC PATCH 0/7] virtio-fs: shared file system for virtual machines

2019-01-07 Thread jiangyiwen

On 2018/12/27 3:08, Vivek Goyal wrote:
> On Sat, Dec 22, 2018 at 05:27:28PM +0800, jiangyiwen wrote:
>> On 2018/12/11 1:31, Dr. David Alan Gilbert (git) wrote:
>>> From: "Dr. David Alan Gilbert" 
>>>
>>> Hi,
>>>   This is the first RFC for the QEMU side of 'virtio-fs';
>>> a new mechanism for mounting host directories into the guest
>>> in a fast, consistent and secure manner.  Our primary use
>>> case is kata containers, but it should be usable in other scenarios
>>> as well.
>>>
>>> There are corresponding patches being posted to Linux kernel,
>>> libfuse and kata lists.
>>>
>>> For a fuller design description, and benchmark numbers, please see
>>> Vivek's posting of the kernel set here:
>>>
>>> https://marc.info/?l=linux-kernel&m=154446243024251&w=2
>>>
>>> We've got a small website with instructions on how to use it, here:
>>>
>>> https://virtio-fs.gitlab.io/
>>>
>>> and all the code is available on gitlab at:
>>>
>>> https://gitlab.com/virtio-fs
>>>
>>> QEMU's changes
>>> --
>>>
>>> The QEMU changes are pretty small; 
>>>
>>> There's a new vhost-user device, which is used to carry a stream of
>>> FUSE messages to an external daemon that actually performs
>>> all the file IO.  The FUSE daemon is an external process in order to
>>> achieve better isolation for security and resource control (e.g. number
>>> of file descriptors) and also because it's cleaner than trying to
>>> integrate libfuse into QEMU.
>>>
>>> This device has an extra BAR that contains (up to) 3 regions:
>>>
>>>  a) a DAX mapping range ('the cache') - into which QEMU mmap's
>>> files on behalf of the external daemon; those files are
>>> then directly mapped by the guest in a way similar to a DAX
>>> backed file system;  one advantage of this is that multiple
>>> guests all accessing the same files should all be sharing
>>> those pages of host cache.
>>>
>>>  b) An experimental set of mappings for use by a metadata versioning
>>> daemon;  this mapping is shared between multiple guests and
>>> the daemon, but only contains a set of version counters that
>>> allow a guest to quickly tell if its metadata is stale.
>>>
>>> TODO
>>> 
>>>
>>> This is the first RFC, we know we have a bunch of things to clear up:
>>>
>>>   a) The virtio device specification is still in flux and is expected
>>>  to change
>>>
>>>   b) We'd like to find ways of reducing the map/unmap latency for DAX
>>>
>>>   c) The metadata versioning scheme needs to settle out.
>>>
>>>   d) mmap'ing host files has some interesting side effects; for example
>>>  if the file gets truncated by the host and then the guest accesses
>>>  the mapping, KVM can fail the guest hard.
>>>
>>> Dr. David Alan Gilbert (6):
>>>   virtio: Add shared memory capability
>>>   virtio-fs: Add cache BAR
>>>   virtio-fs: Add vhost-user slave commands for mapping
>>>   virtio-fs: Fill in slave commands for mapping
>>>   virtio-fs: Allow mapping of meta data version table
>>>   virtio-fs: Allow mapping of journal
>>>
>>> Stefan Hajnoczi (1):
>>>   virtio: add vhost-user-fs-pci device
>>>
>>>  configure   |  10 +
>>>  contrib/libvhost-user/libvhost-user.h   |   3 +
>>>  docs/interop/vhost-user.txt |  35 ++
>>>  hw/virtio/Makefile.objs |   1 +
>>>  hw/virtio/vhost-user-fs.c   | 517 
>>>  hw/virtio/vhost-user.c  |  16 +
>>>  hw/virtio/virtio-pci.c  | 115 +
>>>  hw/virtio/virtio-pci.h  |  19 +
>>>  include/hw/pci/pci.h|   1 +
>>>  include/hw/virtio/vhost-user-fs.h   |  79 +++
>>>  include/standard-headers/linux/virtio_fs.h  |  48 ++
>>>  include/standard-headers/linux/virtio_ids.h |   1 +
>>>  include/standard-headers/linux/virtio_pci.h |   9 +
>>>  13 files changed, 854 insertions(+)
>>>  create mode 100644 hw/virtio/vhost-user-fs.c
>>>  create mode 100644 include/hw/virtio/vhost-user-fs.h
>>>  create mode 100644 include/standard-headers/linux/virtio_fs.h
>>>
>>
>> Hi Dave,
>>
>> I encounter a problem after running qemu with virtio-fs,
>>
>> I find I only can mount virtio-fs using the following command:
>> mount -t virtio_fs /dev/null /mnt/virtio_fs/ -o 
>> tag=myfs,rootmode=04,user_id=0,group_id=0
>> or mount -t virtio_fs /dev/null /mnt/virtio_fs/ -o 
>> tag=myfs,rootmode=04,user_id=0,group_id=0,dax
>>
>> Then, I want to know how to use "cache=always" or "cache=none", even 
>> "cache=auto", "cache=writeback"?
>>
>> Thanks,
>> Yiwen.
> 
> Hi Yiwen,
> 
> As of now, cache options are libfuse daemon options. So while starting
> daemon, specify "-o cache=none" or "-o cache=always" etc. One cannot
> specify a caching option at virtio-fs mount time.
> 
> Thanks
> Vivek
> 
> .
> 

Ok, I get it, thanks.

Yiwen.

Re: [Qemu-devel] [PATCH v2 06/52] audio: -audiodev command line option basic implementation

2019-01-07 Thread Gerd Hoffmann

  Hi,

> > I suspect your series uses the options visitor only because back when
> > you started, qobject_input_visitor_new_str() didn't exist.
> 
> Yes, this patch series is a bit old, and at that time this was the best
> I could do. I can look into this (probably only on the weekend
> though). It looks like it supports foo.bar=something syntax, so I don't
> have to specify JSON on the command line...

It supports both, i.e. this ...

qemu -display gtk,full-screen=on

... and this ...

qemu -display '{ "type": "gtk", "full-screen": true }'

... has the same effect.

cheers,
  Gerd

Re: [Qemu-devel] [RFC PATCH 10/25] build: convert pci.mak to Kconfig

2019-01-07 Thread Yang Zhong

On Thu, Jan 03, 2019 at 05:06:32PM +0100, Thomas Huth wrote:
> On 2018-12-27 07:34, Yang Zhong wrote:
> > From: Paolo Bonzini 
> > 
> > Instead of including the same list of devices for each target,
> > set CONFIG_PCI to true, and make the devices default to present
> > whenever PCI is available.
> > 
> > Done mostly with the following script:
> > 
> >   while read i; do
> >  i=${i%=y}; i=${i#CONFIG_}
> >  sed -i -e'/^config '$i'$/!b' -en \
> > -e'a\' -e'default y\' -e'depends on PCI' \
> >   `grep -lw $i hw/*/Kconfig`
> >   done < default-configs/pci.mak
> > 
> > followed by replacing a few "depends on" clauses with "select"
> > whenever the symbol is not really related to PCI.
> > 
> > Signed-off-by: Paolo Bonzini 
> > Signed-off-by: Yang Zhong 
> > ---
> >  default-configs/i386-softmmu.mak |  2 +-
> >  default-configs/pci.mak  | 47 
> >  hw/audio/Kconfig |  6 
> >  hw/block/Kconfig |  2 ++
> >  hw/char/Kconfig  |  2 ++
> >  hw/display/Kconfig   | 12 +++-
> >  hw/ide/Kconfig   |  6 
> >  hw/ipack/Kconfig |  2 ++
> >  hw/misc/Kconfig  |  4 +++
> >  hw/net/Kconfig   | 19 +
> >  hw/scsi/Kconfig  | 11 
> >  hw/sd/Kconfig|  3 ++
> >  hw/usb/Kconfig   | 10 +++
> >  hw/virtio/Kconfig|  3 ++
> >  hw/watchdog/Kconfig  |  2 ++
> >  15 files changed, 82 insertions(+), 49 deletions(-)
> >  delete mode 100644 default-configs/pci.mak
> > 
> > diff --git a/default-configs/i386-softmmu.mak 
> > b/default-configs/i386-softmmu.mak
> > index 560250c998..61f19e5231 100644
> > --- a/default-configs/i386-softmmu.mak
> > +++ b/default-configs/i386-softmmu.mak
> > @@ -1,6 +1,6 @@
> >  # Default configuration for i386-softmmu
> [...]
> > diff --git a/hw/ide/Kconfig b/hw/ide/Kconfig
> > index 091f3a81c9..0f25b27163 100644
> > --- a/hw/ide/Kconfig
> > +++ b/hw/ide/Kconfig
> > @@ -7,6 +7,7 @@ config IDE_QDEV
> >  
> >  config IDE_PCI
> >  bool
> > +depends on PCI
> >  select IDE_CORE
> >  
> >  config IDE_ISA
> > @@ -15,10 +16,12 @@ config IDE_ISA
> >  
> >  config IDE_PIIX
> >  bool
> > +depends on PCI
> >  select IDE_QDEV
> >  
> >  config IDE_CMD646
> >  bool
> > +select IDE_PCI
> >  select IDE_QDEV
> 
> Why "select" here and not "depends on" like in the previous cases?
> 
  Thomas, thanks for reviewing. I will move "select IDE_PCI" to the previous
  patch (the IDE patch); IDE_PIIX should also "select IDE_PCI" here, thanks!

  Regards,

  Yang
 
> >  config IDE_MACIO
> > @@ -31,6 +34,7 @@ config IDE_MMIO
> >  
> >  config IDE_VIA
> >  bool
> > +select IDE_PCI
> >  select IDE_QDEV
> 
> ditto
> 
  Yes, same as above, thanks for the reminder!


> >  config MICRODRIVE
> > @@ -39,6 +43,8 @@ config MICRODRIVE
> >  
> >  config AHCI
> >  bool
> > +default y
> > +depends on PCI
> >  select IDE_QDEV
> >  
> >  config IDE_SII3112
> 
> I think the SII3112 needs a "depends on PCI", too?
> 
  Yes, I need to add "depends on PCI" here, thanks!

  Regards,

  Yang

> > diff --git a/hw/net/Kconfig b/hw/net/Kconfig
> > index 6b2ec971b5..c87375ee52 100644
> > --- a/hw/net/Kconfig
> > +++ b/hw/net/Kconfig
> > @@ -3,27 +3,42 @@ config DP8393X
> >  
> >  config NE2000_PCI
> >  bool
> > +default y
> > +depends on PCI
> >  
> >  config EEPRO100_PCI
> >  bool
> > +default y
> > +depends on PCI
> >  
> >  config PCNET_PCI
> >  bool
> > +default y
> > +depends on PCI
> > +select PCNET_COMMON
> >  
> >  config PCNET_COMMON
> >  bool
> >  
> >  config E1000_PCI
> >  bool
> > +default y
> > +depends on PCI
> >  
> >  config E1000E_PCI
> >  bool
> > +default y
> > +depends on PCI
> >  
> >  config RTL8139_PCI
> >  bool
> > +default y
> > +depends on PCI
> >  
> >  config VMXNET3_PCI
> >  bool
> > +default y
> > +depends on PCI
> >  
> >  config SMC91C111
> >  bool
> > @@ -81,12 +96,16 @@ config ETSEC
> >  
> >  config ROCKER
> >  bool
> > +default y
> > +depends on PCI
> >  
> >  config CAN_BUS
> >  bool
> >  
> >  config CAN_PCI
> >  bool
> > +default y
> > +depends on PCI
> 
> Do we need a "select CAN_BUS" here? (Well, maybe that's rather something
> for a later patch instead...)
> 
>  Thomas

  Hello Thomas,

  Yes, right. All of that code is based on the CAN bus, which is implemented
  in net/can. CAN_PCI and CAN_SJA1000 will select CAN_BUS, thanks!

  Regards,

  Yang

Re: [Qemu-devel] [RFC PATCH 09/25] ide: express dependencies with Kconfig

2019-01-07 Thread Yang Zhong

On Thu, Jan 03, 2019 at 04:47:02PM +0100, Thomas Huth wrote:
> On 2018-12-27 07:34, Yang Zhong wrote:
> > From: Paolo Bonzini 
> > 
> > Signed-off-by: Paolo Bonzini 
> > ---
> [...]
> > diff --git a/hw/ide/Kconfig b/hw/ide/Kconfig
> > index 5ec449525f..091f3a81c9 100644
> > --- a/hw/ide/Kconfig
> > +++ b/hw/ide/Kconfig
> > @@ -3,33 +3,44 @@ config IDE_CORE
> >  
> >  config IDE_QDEV
> >  bool
> > +select IDE_CORE
> >  
> >  config IDE_PCI
> >  bool
> > +select IDE_CORE
> >  
> >  config IDE_ISA
> >  bool
> > +select IDE_QDEV
> >  
> >  config IDE_PIIX
> >  bool
> > +select IDE_QDEV
> >  
> >  config IDE_CMD646
> >  bool
> > +select IDE_QDEV
> 
> PIIX and CMD646 seem to be derived from TYPE_PCI_IDE, so shouldn't these
> switches rather select IDE_PCI instead? (Or depend on IDE_PCI?)
> 
  Hello Thomas,

  Thanks, you are right. CONFIG_IDE_PCI=y was removed in this patch,
  so "select IDE_PCI" should be moved here from the next patch (the
  "convert pci.mak to Kconfig" patch). Thanks!

  Regards,

  Yang  


> >  config IDE_MACIO
> >  bool
> > +select IDE_QDEV
> >  
> >  config IDE_MMIO
> >  bool
> > +select IDE_QDEV
> >  
> >  config IDE_VIA
> >  bool
> > +select IDE_QDEV
> 
> ditto, VIA is a PCI device, too.
> 
  Yes, you are right. I will change this as above. Thanks!

  Regards,

  Yang

> >  config MICRODRIVE
> >  bool
> > +select IDE_QDEV
> >  
> >  config AHCI
> >  bool
> > +select IDE_QDEV
> >  
> >  config IDE_SII3112
> >  bool
> > +select IDE_QDEV
> > 
> 
> ditto, SII3112 is a PCI device.
> 
>  Thomas

  Yes, thanks for the reminder!
  I will add "depends on PCI" in the next patch (the "convert pci.mak to
  Kconfig" patch), thanks.

  Regards,

  Yang

Re: [Qemu-devel] [PATCH v2 23/27] target/arm: Implement pauth_computepac

2019-01-07 Thread Richard Henderson

On 1/8/19 12:09 AM, Peter Maydell wrote:
>> +static int rot_cell(int cell, int n)
>> +{
>> +cell |= cell << 4;
>> +cell >>= n;
>> +return cell & 0xf;
> 
> This doesn't seem to do what the RotCell pseudocode does?
> Unless I've made an error, RotCell(ABCD, 1) == BCDA,
> but rot_cell(ABCD, 1) == DABC.

Yep, I mis-read the direction of the rotate.

Thanks for all of the proof-reading.
This section I found particularly eye watering.


r~
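
For reference, a minimal sketch of a 4-bit cell rotate in the direction Peter describes, where RotCell(ABCD, 1) == BCDA. It is based only on that statement in the review, not on the code that was eventually committed. The names rot_cell_left and rot_cell_demo are hypothetical, and n is assumed to be in the range 0..4.

```c
#include <assert.h>

/*
 * Rotate a 4-bit cell left by n bits (0 <= n <= 4 assumed).
 * Duplicating the cell into bits 7:4 lets a single right shift
 * by (4 - n) bring the wrapped-around bits into the low nibble.
 */
static int rot_cell_left(int cell, int n)
{
    cell &= 0xf;
    cell |= cell << 4;
    return (cell >> (4 - n)) & 0xf;
}

/* Hypothetical self-check: with A = 1, ABCD = 1000 becomes BCDA = 0001. */
void rot_cell_demo(void)
{
    assert(rot_cell_left(0x8, 1) == 0x1);
    assert(rot_cell_left(0x1, 1) == 0x2);
}
```

The quoted version shifts right by n instead, which is why it yields DABC for the same input.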

Re: [Qemu-devel] [PATCH v2 22/27] target/arm: Implement pauth_addpac

2019-01-07 Thread Richard Henderson

On 1/7/19 11:31 PM, Peter Maydell wrote:
>> +/* Preserve the determination between upper and lower at bit 55,
>> + * and insert pointer authentication code.
>> + */
>> +if (param.tbi) {
>> +ptr &= ~MAKE_64BIT_MASK(bot_bit, 55 - bot_bit + 1);
>> +pac &= MAKE_64BIT_MASK(bot_bit, 54 - bot_bit + 1);
>> +} else {
>> +ptr &= MAKE_64BIT_MASK(0, bot_bit);
>> +pac &= ~(MAKE_64BIT_MASK(55, 1) | MAKE_64BIT_MASK(0, bot_bit));
>> +}
>> +ext &= MAKE_64BIT_MASK(55, 1);
> 
> I found this a bit confusing to disentangle and compare with
> the pseudocode: the difference between the tbi and
> not-tbi cases is only "what are bits 63:56 in the result",
> but the implementation of how we put together bits 55:0 is
> different in the two code paths here.

Yes.  I found the pseudocode itself to be confusing.
Perhaps I went away from it too far.


r~
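
One possible way to line the code up with Peter's observation that only bits 63:56 differ between the two cases: build bits 55:0 once, then pick the top byte. This is a sketch under assumptions, not the committed code. mask64, insert_pac and insert_pac_demo are hypothetical names, ext is assumed to already carry the extension bit at bit 55, and bot_bit is assumed to satisfy 1 <= bot_bit <= 54.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask of len bits starting at bit shift (1 <= len <= 64 assumed). */
static uint64_t mask64(unsigned shift, unsigned len)
{
    return (~UINT64_C(0) >> (64 - len)) << shift;
}

/*
 * Combine pointer and PAC.  Bits 55:0 are assembled the same way in
 * both cases; only the source of bits 63:56 depends on tbi.
 */
static uint64_t insert_pac(uint64_t ptr, uint64_t pac, uint64_t ext,
                           unsigned bot_bit, bool tbi)
{
    uint64_t low = ptr & mask64(0, bot_bit);            /* address bits   */
    uint64_t mid = pac & mask64(bot_bit, 55 - bot_bit); /* PAC bits 54:bot */
    uint64_t sel = ext & mask64(55, 1);                 /* bit 55         */
    uint64_t top = (tbi ? ptr : pac) & mask64(56, 8);   /* bits 63:56     */

    return top | sel | mid | low;
}

/* Hypothetical self-check with an all-ones PAC and bot_bit = 48. */
void insert_pac_demo(void)
{
    uint64_t ptr = UINT64_C(0x1200000000001234);
    uint64_t pac = ~UINT64_C(0);
    uint64_t ext = UINT64_C(1) << 55;

    assert(insert_pac(ptr, pac, ext, 48, true) == UINT64_C(0x12FF000000001234));
    assert(insert_pac(ptr, pac, ext, 48, false) == UINT64_C(0xFFFF000000001234));
}
```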

Re: [Qemu-devel] [PATCH v2 06/52] audio: -audiodev command line option basic implementation

2019-01-07 Thread Markus Armbruster

"Zoltán Kővágó"  writes:

> On 2019-01-07 14:13, Markus Armbruster wrote:
>> "Kővágó, Zoltán"  writes:
>> 
>>> Audio drivers now get an Audiodev * as config parameters, instead of the
>>> global audio_option structs.  There is some code in audio/audio_legacy.c
>>> that converts the old environment variables to audiodev options (this
>>> way backends do not have to worry about legacy options).  It also
>>> contains a replacement of -audio-help, which prints out the equivalent
>>> -audiodev based config of the currently specified environment variables.
>>>
>>> Note that backends are not updated and still rely on environment
>>> variables.
>>>
>>> Also note that (due to moving try-poll from global to backend specific
>>> option) currently ALSA and OSS will always try poll mode, regardless of
>>> environment variables or -audiodev options.
>>>
>>> Signed-off-by: Kővágó, Zoltán 
>>> ---
>> [...]
>>> diff --git a/audio/audio.c b/audio/audio.c
>>> index 96cbd57c37..e7f25ea84b 100644
>>> --- a/audio/audio.c
>>> +++ b/audio/audio.c
>> [...]
>>> @@ -2127,3 +1841,158 @@ void AUD_set_volume_in (SWVoiceIn *sw, int mute, uint8_t lvol, uint8_t rvol)
>>>  }
>>>  }
>>>  }
>>> +
>>> +QemuOptsList qemu_audiodev_opts = {
>>> +.name = "audiodev",
>>> +.head = QTAILQ_HEAD_INITIALIZER(qemu_audiodev_opts.head),
>>> +.implied_opt_name = "driver",
>>> +.desc = {
>>> +/*
>>> + * no elements => accept any params
>>> + * sanity checking will happen later
>>> + */
>>> +{ /* end of list */ }
>>> +},
>>> +};
>>> +
>>> +static void validate_per_direction_opts(AudiodevPerDirectionOptions *pdo,
>>> +Error **errp)
>>> +{
>>> +if (!pdo->has_fixed_settings) {
>>> +pdo->has_fixed_settings = true;
>>> +pdo->fixed_settings = true;
>>> +}
>>> +if (!pdo->fixed_settings &&
>>> +(pdo->has_frequency || pdo->has_channels || pdo->has_format)) {
>>> +error_setg(errp,
>>> +   "You can't use frequency, channels or format with fixed-settings=off");
>>> +return;
>>> +}
>>> +
>>> +if (!pdo->has_frequency) {
>>> +pdo->has_frequency = true;
>>> +pdo->frequency = 44100;
>>> +}
>>> +if (!pdo->has_channels) {
>>> +pdo->has_channels = true;
>>> +pdo->channels = 2;
>>> +}
>>> +if (!pdo->has_voices) {
>>> +pdo->has_voices = true;
>>> +pdo->voices = 1;
>>> +}
>>> +if (!pdo->has_format) {
>>> +pdo->has_format = true;
>>> +pdo->format = AUDIO_FORMAT_S16;
>>> +}
>>> +}
>>> +
>>> +static Audiodev *parse_option(QemuOpts *opts, Error **errp)
>>> +{
>>> +Error *local_err = NULL;
>>> +Visitor *v = opts_visitor_new(opts, true);
>>> +Audiodev *dev = NULL;
>>> +visit_type_Audiodev(v, NULL, &dev, &local_err);
>>> +visit_free(v);
>>> +
>>> +if (local_err) {
>>> +goto err2;
>>> +}
>>> +
>>> +validate_per_direction_opts(dev->in, &local_err);
>>> +if (local_err) {
>>> +goto err;
>>> +}
>>> +validate_per_direction_opts(dev->out, &local_err);
>>> +if (local_err) {
>>> +goto err;
>>> +}
>>> +
>>> +if (!dev->has_timer_period) {
>>> +dev->has_timer_period = true;
>>> +dev->timer_period = 1; /* 100Hz -> 10ms */
>>> +}
>>> +
>>> +return dev;
>>> +
>>> +err:
>>> +qapi_free_Audiodev(dev);
>>> +err2:
>>> +error_propagate(errp, local_err);
>>> +return NULL;
>>> +}
>>> +
>>> +static int each_option(void *opaque, QemuOpts *opts, Error **errp)
>>> +{
>>> +Audiodev *dev = parse_option(opts, errp);
>>> +if (!dev) {
>>> +return -1;
>>> +}
>>> +return audio_init(dev);
>>> +}
>>> +
>>> +void audio_set_options(void)
>>> +{
>>> +if (qemu_opts_foreach(qemu_find_opts("audiodev"), each_option, NULL,
>>> +  &error_abort)) {
>>> +exit(1);
>>> +}
>>> +}
>> [...]
>>> diff --git a/vl.c b/vl.c
>>> index 8353d3c718..b5364ffe46 100644
>>> --- a/vl.c
>>> +++ b/vl.c
>>> @@ -3074,6 +3074,7 @@ int main(int argc, char **argv, char **envp)
>>>  qemu_add_opts(&qemu_option_rom_opts);
>>>  qemu_add_opts(&qemu_machine_opts);
>>>  qemu_add_opts(&qemu_accel_opts);
>>> +qemu_add_opts(&qemu_audiodev_opts);
>>>  qemu_add_opts(&qemu_mem_opts);
>>>  qemu_add_opts(&qemu_smp_opts);
>>>  qemu_add_opts(&qemu_boot_opts);
>>> @@ -3307,9 +3308,15 @@ int main(int argc, char **argv, char **envp)
>>>  add_device_config(DEV_BT, optarg);
>>>  break;
>>>  case QEMU_OPTION_audio_help:
>>> -AUD_help ();
>>> +audio_legacy_help();
>>>  exit (0);
>>>  break;
>>> +case QEMU_OPTION_audiodev:
>>> +if (!qemu_opts_parse_noisily(qemu_find_opts("audiodev"),
>>> + optarg, true)) {
>>> +

Re: [Qemu-devel] [PATCH v2 04/27] target/arm: Add PAuth helpers

2019-01-07 Thread Richard Henderson

On 1/5/19 2:25 AM, Peter Maydell wrote:
> On Fri, 14 Dec 2018 at 05:24, Richard Henderson
>  wrote:
>>
>> The cryptographic internals are stubbed out for now,
>> but the enable and trap bits are checked.
>>
>> Signed-off-by: Richard Henderson 
>> 
> 
>> +static void QEMU_NORETURN pauth_trap(CPUARMState *env, int target_el,
>> + uintptr_t ra)
>> +{
>> +CPUState *cs = ENV_GET_CPU(env);
>> +
>> +cs->exception_index = EXCP_UDEF;
>> +env->exception.syndrome = syn_pactrap();
>> +env->exception.target_el = target_el;
>> +cpu_loop_exit_restore(cs, ra);
> 
> This should use raise_exception(), or some variant on it that
> lets you pass in the ra, because otherwise you lose the
> "redirect EL1 exceptions to EL2" HCR.TGE behaviour.
> Or can we only ever call this for a target_el of 2 or 3?

This particular usage can only ever target EL 2 or 3,
in response to {SCR,HCR}.API being clear.  AFAICS that
trap is properly directed already.

But, yes, raise_exception_ra would be useful.


r~

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/5] Support for datapath switching during live migration

2019-01-07 Thread Michael S. Tsirkin

On Mon, Jan 07, 2019 at 05:45:22PM -0800, si-wei liu wrote:
> 
> 
> On 01/07/2019 03:32 PM, Michael S. Tsirkin wrote:
> > On Mon, Jan 07, 2019 at 05:29:39PM -0500, Venu Busireddy wrote:
> > > Implement the infrastructure to support datapath switching during live
> > > migration involving SR-IOV devices.
> > > 
> > > 1. This patch is based off on the current VIRTIO_NET_F_STANDBY feature
> > > bit and MAC address device pairing.
> > > 
> > > 2. This set of events will be consumed by userspace management software
> > > to orchestrate all the hot plug and datapath switching activities.
> > > This scheme has the least QEMU modifications while allowing userspace
> > > software to build its own intelligence to control the whole process
> > > of SR-IOV live migration.
> > > 
> > > 3. While the hidden device model (viz. coupled device model) is still
> > > being explored for automatic hot plugging (QEMU) and automatic datapath
> > > switching (host-kernel), this series provides a supplemental set
> > > of interfaces if management software wants to drive the SR-IOV live
> > > migration on its own. It should not conflict with the hidden device
> > > model but just offers simplicity of implementation.
> > > 
> > > 
> > > Si-Wei Liu (2):
> > >vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime 
> > > during failover
> > >pci: query command extension to check the bus master enabling status 
> > > of the failover-primary device
> > > 
> > > Sridhar Samudrala (1):
> > >virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.
> > > 
> > > Venu Busireddy (2):
> > >virtio_net: Add support for "Data Path Switching" during Live 
> > > Migration.
> > >virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.
> > > 
> > > ---
> > > Changes in v3:
> > >Fix issues with coding style in patch 3/5.
> > > 
> > > Changes in v2:
> > >Added a query command for FAILOVER_STANDBY_CHANGED event.
> > >Added a query command for FAILOVER_PRIMARY_CHANGED event.
> > Hmm it looks like all feedback I sent e.g. here:
> > https://patchwork.kernel.org/patch/10721571/
> > got ignored.
> > 
> > To summarize I suggest reworking the series adding a new command along
> > the lines of (naming is up to you):
> > 
> > query-pci-master - this returns status for a device
> >and enables a *single* event after
> >it changes
> > 
> > and then removing all status data from the event,
> > just notify about the change and *only once*.
> Why remove all status data from the event?

To make sure users do not forget to call query-pci-master to
re-enable more events.

> It does not hurt to keep it,
> as the FAILOVER_PRIMARY_CHANGED event is in general fairly low-frequency.

A malicious guest can make it as frequent as it wants to.
OTOH there is no way to limit.


> As can be seen, other similar low-frequency QMP events do carry data.
> 
> As this event relates to datapath switching, coalescing events has an
> implication: packets might not get a chance to be sent out, as nothing would
> ever happen when going through quick transitions like
> disabled->enabled->disabled. I would allow at least a few packets to be sent
> over the wire rather than none. Who knows how fast management can react and
> consume these events?
> 
> Thanks,
> -Siwei

OK if it's so important for latency let's include data in the event.
Please add comments explaining that you must always re-run query
afterwards to make sure it's stable and re-enable more events.



> > 
> > 
> > upon event management does query-pci-master
> > and acts accordingly.
> > 
> > 
> > 
> > 
> > >   hmp.c  |   5 +++
> > >   hw/acpi/pcihp.c|  27 +++
> > >   hw/net/virtio-net.c|  42 +
> > >   hw/pci/pci.c   |   5 +++
> > >   hw/vfio/pci.c  |  60 +
> > >   hw/vfio/pci.h  |   1 +
> > >   include/hw/pci/pci.h   |   1 +
> > >   include/hw/virtio/virtio-net.h |   1 +
> > >   include/net/net.h  |   2 +
> > >   net/net.c  |  61 +
> > >   qapi/misc.json |   5 ++-
> > >   qapi/net.json  | 100 
> > > +
> > >   12 files changed, 309 insertions(+), 1 deletion(-)
> > -
> > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
> >

[Qemu-devel] [PULL 2/3] target/alpha: Fix user-only initialization of fpcr

2019-01-07 Thread Richard Henderson

When the representation of fpcr was changed, the user-only
initialization was not updated to match.  Oops.

Fixes: f3d3aad4a92
Fixes: https://bugs.launchpad.net/bugs/1701835
Reported-by: Bruno Haible 
Signed-off-by: Richard Henderson 
---
 target/alpha/cpu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
index a953897fcc..1fd95d6c0f 100644
--- a/target/alpha/cpu.c
+++ b/target/alpha/cpu.c
@@ -205,9 +205,9 @@ static void alpha_cpu_initfn(Object *obj)
 env->lock_addr = -1;
 #if defined(CONFIG_USER_ONLY)
 env->flags = ENV_FLAG_PS_USER | ENV_FLAG_FEN;
-cpu_alpha_store_fpcr(env, (FPCR_INVD | FPCR_DZED | FPCR_OVFD
-   | FPCR_UNFD | FPCR_INED | FPCR_DNOD
-   | FPCR_DYN_NORMAL));
+cpu_alpha_store_fpcr(env, (uint64_t)(FPCR_INVD | FPCR_DZED | FPCR_OVFD
+ | FPCR_UNFD | FPCR_INED | FPCR_DNOD
+ | FPCR_DYN_NORMAL) << 32);
 #else
 env->flags = ENV_FLAG_PAL_MODE | ENV_FLAG_FEN;
 #endif
-- 
2.17.2
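
A note on the cast in the new code: judging only from the diff, the flags now have to be handed over in the upper 32 bits of a 64-bit value. If the flag constants are 32-bit values (an assumption here), they must be widened before the shift, since shifting a 32-bit operand by 32 is undefined in C. A minimal sketch with made-up flag values; EX_STATUS_A, EX_STATUS_B, to_upper_half and fpcr_demo are hypothetical names, not the real FPCR_* definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit flag values, for illustration only. */
#define EX_STATUS_A (UINT32_C(1) << 17)
#define EX_STATUS_B (UINT32_C(1) << 31)

/* Widen first, then shift, so the flags land in bits 63:32. */
static uint64_t to_upper_half(uint32_t flags)
{
    return (uint64_t)flags << 32;
}

void fpcr_demo(void)
{
    assert(to_upper_half(EX_STATUS_B) == UINT64_C(0x8000000000000000));
    assert(to_upper_half(EX_STATUS_A | EX_STATUS_B) ==
           UINT64_C(0x8002000000000000));
}
```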

[Qemu-devel] [PULL 3/3] pc-bios: Update palcode-clipper

2019-01-07 Thread Richard Henderson

Do not double-update the PC after OPCDEC.

Fixes: https://bugs.launchpad.net/bugs/1810545
Signed-off-by: Richard Henderson 
---
 pc-bios/palcode-clipper | Bin 152680 -> 155968 bytes
 roms/qemu-palcode   |   2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/pc-bios/palcode-clipper b/pc-bios/palcode-clipper
index 1df377a0fde034e958f6a3b6f90ed9f19cfe582a..fb9026ae64c19937007516b55a4048c28bd8db29 100644
GIT binary patch
literal 155968

[Qemu-devel] [PULL 1/3] hw/alpha/typhoon: Stop calling cpu_unassigned_access()

2019-01-07 Thread Richard Henderson

From: Peter Maydell 

The typhoon MemoryRegionOps callbacks directly call
cpu_unassigned_access(), presumably as the old-fashioned way
to provoke a CPU exception.  This won't work since commit
6ad4d7eed05a1e235 when we switched Alpha over to the
transaction_failed hook API, because now cpu_unassigned_access()
is a no-op for Alpha.

Make the MemoryRegionOps callbacks use the read_with_attrs
and write_with_attrs hooks, so they can signal a failure
that should cause a CPU exception by returning MEMTX_ERROR.

Signed-off-by: Peter Maydell 
Message-Id: <20181210173350.13073-1-peter.mayd...@linaro.org>
Tested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
---
 hw/alpha/typhoon.c | 47 ++
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
index 8004afe45b..cbacea5fbd 100644
--- a/hw/alpha/typhoon.c
+++ b/hw/alpha/typhoon.c
@@ -75,7 +75,9 @@ static void cpu_irq_change(AlphaCPU *cpu, uint64_t req)
 }
 }
 
-static uint64_t cchip_read(void *opaque, hwaddr addr, unsigned size)
+static MemTxResult cchip_read(void *opaque, hwaddr addr,
+  uint64_t *data, unsigned size,
+  MemTxAttrs attrs)
 {
 CPUState *cpu = current_cpu;
 TyphoonState *s = opaque;
@@ -196,11 +198,11 @@ static uint64_t cchip_read(void *opaque, hwaddr addr, unsigned size)
 break;
 
 default:
-cpu_unassigned_access(cpu, addr, false, false, 0, size);
-return -1;
+return MEMTX_ERROR;
 }
 
-return ret;
+*data = ret;
+return MEMTX_OK;
 }
 
 static uint64_t dchip_read(void *opaque, hwaddr addr, unsigned size)
@@ -209,7 +211,8 @@ static uint64_t dchip_read(void *opaque, hwaddr addr, unsigned size)
 return 0;
 }
 
-static uint64_t pchip_read(void *opaque, hwaddr addr, unsigned size)
+static MemTxResult pchip_read(void *opaque, hwaddr addr, uint64_t *data,
+  unsigned size, MemTxAttrs attrs)
 {
 TyphoonState *s = opaque;
 uint64_t ret = 0;
@@ -294,15 +297,16 @@ static uint64_t pchip_read(void *opaque, hwaddr addr, unsigned size)
 break;
 
 default:
-cpu_unassigned_access(current_cpu, addr, false, false, 0, size);
-return -1;
+return MEMTX_ERROR;
 }
 
-return ret;
+*data = ret;
+return MEMTX_OK;
 }
 
-static void cchip_write(void *opaque, hwaddr addr,
-uint64_t val, unsigned size)
+static MemTxResult cchip_write(void *opaque, hwaddr addr,
+   uint64_t val, unsigned size,
+   MemTxAttrs attrs)
 {
 TyphoonState *s = opaque;
 uint64_t oldval, newval;
@@ -446,9 +450,10 @@ static void cchip_write(void *opaque, hwaddr addr,
 break;
 
 default:
-cpu_unassigned_access(current_cpu, addr, true, false, 0, size);
-return;
+return MEMTX_ERROR;
 }
+
+return MEMTX_OK;
 }
 
 static void dchip_write(void *opaque, hwaddr addr,
@@ -457,8 +462,9 @@ static void dchip_write(void *opaque, hwaddr addr,
 /* Skip this.  It's all related to DRAM timing and setup.  */
 }
 
-static void pchip_write(void *opaque, hwaddr addr,
-uint64_t val, unsigned size)
+static MemTxResult pchip_write(void *opaque, hwaddr addr,
+   uint64_t val, unsigned size,
+   MemTxAttrs attrs)
 {
 TyphoonState *s = opaque;
 uint64_t oldval;
@@ -553,14 +559,15 @@ static void pchip_write(void *opaque, hwaddr addr,
 break;
 
 default:
-cpu_unassigned_access(current_cpu, addr, true, false, 0, size);
-return;
+return MEMTX_ERROR;
 }
+
+return MEMTX_OK;
 }
 
 static const MemoryRegionOps cchip_ops = {
-.read = cchip_read,
-.write = cchip_write,
+.read_with_attrs = cchip_read,
+.write_with_attrs = cchip_write,
 .endianness = DEVICE_LITTLE_ENDIAN,
 .valid = {
 .min_access_size = 8,
@@ -587,8 +594,8 @@ static const MemoryRegionOps dchip_ops = {
 };
 
 static const MemoryRegionOps pchip_ops = {
-.read = pchip_read,
-.write = pchip_write,
+.read_with_attrs = pchip_read,
+.write_with_attrs = pchip_write,
 .endianness = DEVICE_LITTLE_ENDIAN,
 .valid = {
 .min_access_size = 8,
-- 
2.17.2
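
The shape of the conversion in this patch, reduced to a standalone sketch: the read callback stops returning the value directly and instead returns a status, with the value passed back through an out-parameter. The types and names below (TxResult, TX_OK, TX_ERROR, demo_read, demo_read_check) are hypothetical stand-ins so the example compiles on its own; they are not the QEMU definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a transaction result type such as MemTxResult. */
typedef enum { TX_OK = 0, TX_ERROR = 1 } TxResult;

/*
 * Status-returning read: known registers store their value in *data
 * and report success; anything else reports an error and leaves
 * *data untouched, so the caller can raise the fault.
 */
static TxResult demo_read(uint64_t addr, uint64_t *data)
{
    uint64_t ret;

    switch (addr) {
    case 0x0000:
        ret = 0x1234;
        break;
    case 0x0040:
        ret = 0;
        break;
    default:
        return TX_ERROR;
    }

    *data = ret;
    return TX_OK;
}

void demo_read_check(void)
{
    uint64_t val = 0xdead;

    assert(demo_read(0x0000, &val) == TX_OK && val == 0x1234);
    assert(demo_read(0x0999, &val) == TX_ERROR && val == 0x1234);
}
```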

[Qemu-devel] [PULL 0/3] target/alpha update

2019-01-07 Thread Richard Henderson

One quite old queued patch, and two recent bug fixes.


r~


The following changes since commit c102d9471f8f02d9fbea72ec4505d7089173f470:

  Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20190107' 
into staging (2019-01-07 16:56:33 +)

are available in the Git repository at:

  https://github.com/rth7680/qemu.git tags/pull-axp-20190108

for you to fetch changes up to ac89de40ef5d4eb1704aa830342a5371413a81dc:

  pc-bios: Update palcode-clipper (2019-01-08 12:12:51 +1000)


Queued target/alpha patches


Peter Maydell (1):
  hw/alpha/typhoon: Stop calling cpu_unassigned_access()

Richard Henderson (2):
  target/alpha: Fix user-only initialization of fpcr
  pc-bios: Update palcode-clipper

 hw/alpha/typhoon.c  |  47 +++
 target/alpha/cpu.c  |   6 +++---
 pc-bios/palcode-clipper | Bin 152680 -> 155968 bytes
 roms/qemu-palcode   |   2 +-
 4 files changed, 31 insertions(+), 24 deletions(-)

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/5] Support for datapath switching during live migration

2019-01-07 Thread si-wei liu





On 01/07/2019 03:32 PM, Michael S. Tsirkin wrote:

On Mon, Jan 07, 2019 at 05:29:39PM -0500, Venu Busireddy wrote:

Implement the infrastructure to support datapath switching during live
migration involving SR-IOV devices.

1. This patch is based off on the current VIRTIO_NET_F_STANDBY feature
bit and MAC address device pairing.

2. This set of events will be consumed by userspace management software
to orchestrate all the hot plug and datapath switching activities.
This scheme has the least QEMU modifications while allowing userspace
software to build its own intelligence to control the whole process
of SR-IOV live migration.

3. While the hidden device model (viz. coupled device model) is still
being explored for automatic hot plugging (QEMU) and automatic datapath
switching (host-kernel), this series provides a supplemental set
of interfaces if management software wants to drive the SR-IOV live
migration on its own. It should not conflict with the hidden device
model but just offers simplicity of implementation.


Si-Wei Liu (2):
   vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during 
failover
   pci: query command extension to check the bus master enabling status of the 
failover-primary device

Sridhar Samudrala (1):
   virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.

Venu Busireddy (2):
   virtio_net: Add support for "Data Path Switching" during Live Migration.
   virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.

---
Changes in v3:
   Fix issues with coding style in patch 3/5.

Changes in v2:
   Added a query command for FAILOVER_STANDBY_CHANGED event.
   Added a query command for FAILOVER_PRIMARY_CHANGED event.

Hmm it looks like all feedback I sent e.g. here:
https://patchwork.kernel.org/patch/10721571/
got ignored.

To summarize I suggest reworking the series adding a new command along
the lines of (naming is up to you):

query-pci-master - this returns status for a device
   and enables a *single* event after
   it changes

and then removing all status data from the event,
just notify about the change and *only once*.

Why remove all status data from the event? It does not hurt to keep
it, as the FAILOVER_PRIMARY_CHANGED event is in general fairly
low-frequency. As can be seen, other similar low-frequency QMP events do
carry data.

As this event relates to datapath switching, coalescing events has an
implication: packets might not get a chance to be sent out, as nothing
would ever happen when going through quick transitions like
disabled->enabled->disabled. I would allow at least a few packets to be
sent over the wire rather than none. Who knows how fast management can
react and consume these events?


Thanks,
-Siwei




upon event management does query-pci-master
and acts accordingly.





  hmp.c  |   5 +++
  hw/acpi/pcihp.c|  27 +++
  hw/net/virtio-net.c|  42 +
  hw/pci/pci.c   |   5 +++
  hw/vfio/pci.c  |  60 +
  hw/vfio/pci.h  |   1 +
  include/hw/pci/pci.h   |   1 +
  include/hw/virtio/virtio-net.h |   1 +
  include/net/net.h  |   2 +
  net/net.c  |  61 +
  qapi/misc.json |   5 ++-
  qapi/net.json  | 100 +
  12 files changed, 309 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org

Re: [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover

2019-01-07 Thread si-wei liu





On 01/07/2019 03:41 PM, Alex Williamson wrote:

On Mon, 7 Jan 2019 18:22:20 -0500
"Michael S. Tsirkin"  wrote:


On Mon, Jan 07, 2019 at 04:17:17PM -0700, Alex Williamson wrote:

On Mon,  7 Jan 2019 17:29:43 -0500
Venu Busireddy  wrote:
   

From: Si-Wei Liu 

When a VF is hotplugged into the guest, datapath switching will be
performed immediately, which is sub-optimal in terms of timing, and
could end up with substantial network downtime. One of ways to shorten
this downtime is to switch the datapath only after the VF is seen to get
enabled by guest, indicated by the bus master bit in VF's PCI config
space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
at that time to indicate this condition. Then management stack can kick
off datapath switching upon receiving the event.

Signed-off-by: Si-Wei Liu 
Signed-off-by: Venu Busireddy 
---
  hw/vfio/pci.c | 57 +
  qapi/net.json | 26 ++
  2 files changed, 83 insertions(+)

Why is this done at the vfio driver layer rather than the PCI core
layer?  We write everything through using pci_default_write_config(), I
don't see that anything here is particularly vfio specific.  Please copy
me on any changes in hw/vfio.  Thanks,

Alex

Hmm so you are saying let's send events for each device?
I don't have a problem with this but in this case
I think I would like to see a per-device option "send events".
We don't want a ton of events in the simple default config.

In the below we're only sending events for PCIDevice.failover_primary,
seems like that would filter out the rest of the non-NIC PCI devices as
well as it does for non-NIC VFIO PCI devices.  The only thing remotely
vfio specific below is that it might notify based on the vfio device
name, but it's a fallback to PCIDevice.qdev.id.

Not exactly. It will first try to use the qdev ID to notify. If the qdev ID
is missing (a vfio-pci device can live without one), then the sysfsdev name
will be used instead (in the form of host device
":." location rather than ID). The intent was
indeed to make this notification applicable to every possible vfio-pci
device, even those without a qdev ID.



  A real ID could just
be a requirement to make use of this.
I'm fine with making the qdev ID required for a failover_primary PCI device.
But please note that this is a restriction rather than a generalization, and
it has to apply to all other non-VFIO PCI devices, which don't have to specify
a qdev ID today. I'm not sure it's a good idea to restrict it this early.


Thanks,
-Siwei


  Thanks,

Alex


diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index bd83b58..adcc95a 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -34,6 +34,7 @@
  #include "pci.h"
  #include "trace.h"
  #include "qapi/error.h"
+#include "qapi/qapi-events-net.h"
  
  #define MSIX_CAP_LENGTH 12
  
@@ -42,6 +43,7 @@
  
  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);

  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
+static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
  
  /*

   * Disabling BAR mmaping can be slow, but toggling it around INTx can
@@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
  {
  VFIOPCIDevice *vdev = PCI_VFIO(pdev);
  uint32_t val_le = cpu_to_le32(val);
+bool may_notify = false;
+bool master_was = false;
  
  trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
  
@@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,

   __func__, vdev->vbasedev.name, addr, val, len);
  }
  
+/* Bus Master Enabling/Disabling */

+if (pdev->failover_primary && current_cpu &&
+range_covers_byte(addr, len, PCI_COMMAND)) {
+master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
+PCI_COMMAND_MASTER);
+may_notify = true;
+}
+
  /* MSI/MSI-X Enabling/Disabling */
  if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
  ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
@@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
  /* Write everything to QEMU to keep emulated bits correct */
  pci_default_write_config(pdev, addr, val, len);
  }
+
+if (may_notify) {
+bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
+ PCI_COMMAND_MASTER);
+if (master_was != master_now) {
+vfio_failover_notify(vdev, master_now);
+}
+}
  }
  
  /*

@@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice 
*vdev)
  vdev->req_enabled = false;
  }
  
+static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)

+{
+PCIDevice *pdev = &vdev->pdev;
+const char *n;
+gchar *path;
+
+n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
+path = object_get_canonical_path(OBJECT(vdev));
+qapi_event_send_failover_primary_changed(!!

Re: [Qemu-devel] [PATCH for-4.0 v4 3/4] i386: import & use bootparam.h

2019-01-07 Thread Li Zhijian


Hi Stefano,


On 1/5/19 00:41, Stefano Garzarella wrote:

+# Remove everything except the macros from bootparam.h avoiding the
+# unnecessary import of several video/ist/etc headers
+sed -e '/__ASSEMBLY__/,/__ASSEMBLY__/d' $tmpdir/include/asm/bootparam.h > $tmpdir/bootparam.h
+cp_portable $tmpdir/bootparam.h "$output/include/standard-headers/asm-$arch"

Maybe it is better to use double quotes for all paths.


Sure, I will update it in the next version.

Thanks
Zhijian




Reviewed-by: Stefano Garzarella

Thanks,
Stefano

Re: [Qemu-devel] [PATCH for-4.0 v4 2/4] refactor load_image_size

2019-01-07 Thread Li Zhijian




On 1/7/19 18:33, Stefano Garzarella wrote:

On Mon, Dec 24, 2018 at 3:16 AM Li Zhijian  wrote:


On 12/22/18 00:12, Michael S. Tsirkin wrote:

On Thu, Dec 06, 2018 at 10:32:11AM +0800, Li Zhijian wrote:

Don't expect read(2) to always read as many bytes as it is asked for.

Signed-off-by: Li Zhijian 
Reviewed-by: Richard Henderson 

This is more a theoretical bugfix than a refactoring right?

Yes, it is.

How about changing the title to: "enhance reading on load_image_size()" or such

Maybe something like this: "hw/core/loader.c: Read as long as possible
in load_image_size()"


That really helps; I will use your suggestion for the subject.

Thanks
Zhijian






Reviewed-by: Stefano Garzarella
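
The point of the patch under review, that read(2) may return fewer bytes than requested, is usually handled with a loop like the one below. This is a generic POSIX sketch, not the actual change to load_image_size(); read_full and read_full_demo are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to count bytes from fd, retrying on short reads and EINTR.
 * Returns the number of bytes read (less than count only at end of
 * file), or -1 on error.
 */
static ssize_t read_full(int fd, void *buf, size_t count)
{
    unsigned char *p = buf;
    size_t total = 0;

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);

        if (n < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted, try again */
            }
            return -1;          /* real error */
        }
        if (n == 0) {
            break;              /* end of file */
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/* Hypothetical self-check using a pipe holding five bytes. */
void read_full_demo(void)
{
    int fds[2];
    char buf[16];

    assert(pipe(fds) == 0);
    assert(write(fds[1], "hello", 5) == 5);
    close(fds[1]);
    assert(read_full(fds[0], buf, sizeof(buf)) == 5);
    close(fds[0]);
}
```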

Re: [Qemu-devel] [PATCH v15 23/26] sched: early boot clock

2019-01-07 Thread Dominique Martinet

Pavel Tatashin wrote on Mon, Jan 07, 2019:
> I did exactly the same sequence on Kaby Lake CPU and could not
> reproduce it. What is your host CPU?

skylake consumer laptop CPU: Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz

I don't have any kaby lake around; I have access to older servers though...
-- 
Dominique

Re: [Qemu-devel] [PATCH for-4.0 v4 1/4] unify len and addr type for memory/address APIs

2019-01-07 Thread Li Zhijian




On 1/7/19 18:04, Stefano Garzarella wrote:

As Philippe already suggested,
s/espcially/especially


Hi Stefano,

Thanks a lot, I will update it in the next version.

Thanks

Re: [Qemu-devel] [PATCH v15 23/26] sched: early boot clock

2019-01-07 Thread Pavel Tatashin

I did exactly the same sequence on Kaby Lake CPU and could not
reproduce it. What is your host CPU?

Thank you,
Pasha

On Mon, Jan 7, 2019 at 6:48 PM Dominique Martinet
 wrote:
>
> Pavel Tatashin wrote on Mon, Jan 07, 2019:
> > I could not reproduce the problem. Did you suspend to memory between
> > wake ups? Does this time jump happen every time, even if your laptop
> > sleeps for a minute?
>
> I'm not sure I understand "suspend to memory between the wake ups".
> The full sequence is:
>  - start a VM (just in case, I let it boot till the end)
>  - suspend to memory (aka systemctl suspend) the host
>  - after resuming the host, soft reboot the VM (login through
> serial/ssh/whatever and reboot or in the qemu console 'system_reset')
>
> I've just slept exactly one minute and reproduced again with the fedora
> stock kernel now (4.19.13-300.fc29.x86_64) in the VM.
>
> Interestingly I'm not getting the same offset between multiple reboots
> now despite not suspending again; but if I don't suspend I cannot seem
> to get it to give an offset at all (only tried for a few minutes; this
> might not be true) ; OTOH I pushed my luck further and even with a five
> seconds sleep I'm getting a noticeable offset on first VM reboot after
> resume:
>
> [0.00] Hypervisor detected: KVM
> [0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00
> [  179.362163] kvm-clock: cpu 0, msr 13c01001, primary cpu clock
> [  179.362163] clocksource: kvm-clock: mask: 0x max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
>
> Honestly not sure what more information I could give, I'll try on some
> other hardware than my laptop (if I can get a server to resume after
> suspend through ipmi or wake on lan); but I don't have anything I could
> install ubuntu on to try their qemu's version... although I really don't
> want to believe that's the difference...
>
> Thanks,
> --
> Dominique

Re: [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover

2019-01-07 Thread Michael S. Tsirkin

On Mon, Jan 07, 2019 at 05:24:15PM -0700, Alex Williamson wrote:
> On Mon, 7 Jan 2019 19:12:06 -0500
> "Michael S. Tsirkin"  wrote:
> 
> > On Mon, Jan 07, 2019 at 04:41:15PM -0700, Alex Williamson wrote:
> > > On Mon, 7 Jan 2019 18:22:20 -0500
> > > "Michael S. Tsirkin"  wrote:
> > >   
> > > > On Mon, Jan 07, 2019 at 04:17:17PM -0700, Alex Williamson wrote:  
> > > > > On Mon,  7 Jan 2019 17:29:43 -0500
> > > > > Venu Busireddy  wrote:
> > > > > 
> > > > > > From: Si-Wei Liu 
> > > > > > 
> > > > > > When a VF is hotplugged into the guest, datapath switching will be
> > > > > > performed immediately, which is sub-optimal in terms of timing, and
> > > > > > > could end up with substantial network downtime. One of ways to shorten
> > > > > > > this downtime is to switch the datapath only after the VF is seen to get
> > > > > > > enabled by guest, indicated by the bus master bit in VF's PCI config
> > > > > > > space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
> > > > > > > at that time to indicate this condition. Then management stack can kick
> > > > > > > off datapath switching upon receiving the event.
> > > > > > 
> > > > > > Signed-off-by: Si-Wei Liu 
> > > > > > Signed-off-by: Venu Busireddy 
> > > > > > ---
> > > > > >  hw/vfio/pci.c | 57 
> > > > > > +
> > > > > >  qapi/net.json | 26 ++
> > > > > >  2 files changed, 83 insertions(+)
> > > > > 
> > > > > Why is this done at the vfio driver layer rather than the PCI core
> > > > > > layer?  We write everything through using pci_default_write_config(), I
> > > > > > don't see that anything here is particularly vfio specific.  Please copy
> > > > > > me on any changes in hw/vfio.  Thanks,
> > > > > 
> > > > > Alex
> > > > 
> > > > Hmm so you are saying let's send events for each device?
> > > > I don't have a problem with this but in this case
> > > > I think I would like to see a per-device option "send events".
> > > > We don't want a ton of events in the simple default config.  
> > > 
> > > In the below we're only sending events for PCIDevice.failover_primary,  
> > 
> > Well failover_primary in this patch is a vfio property, not a
> > pci device property.
> 
> It's both and it's kind of a kludge (from 2/5):
> 
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3077,6 +3077,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>  vfio_register_err_notifier(vdev);
>  vfio_register_req_notifier(vdev);
>  vfio_setup_resetfn_quirk(vdev);
> +pdev->failover_primary = vdev->failover_primary;
>  
>  return;
>  
> @@ -3219,6 +3220,8 @@ static Property vfio_pci_dev_properties[] = {
> qdev_prop_nv_gpudirect_clique, uint8_t),
>  DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
>  OFF_AUTOPCIBAR_OFF),
> +DEFINE_PROP_BOOL("failover-primary", VFIOPCIDevice, failover_primary,
> + false),
>  /*
>   * TODO - support passed fds... is this necessary?
>   * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
> 
> The property could have set VFIOPCIDevice.pdev.failover_primary
> directly.  I'm not thrilled about that name either, it's a very NIC
> centric property whereas vfio-pci supports plenty of non-networking
> devices, as of course does PCIDevice.  Maybe the concept needs to be
> more general or the name needs to be more NIC specific and fail for
> devices that don't have the correct class code.  Thanks,
> 
> Alex

I actually think it's a generic concept. I came up with the name failover
exactly to avoid the "bonding" name that was used originally
and was net specific.

In particular
https://fedoraproject.org/wiki/Features/Virt_Device_Failover
suggests using multipath for storage.

In theory it can easily be imagined to work with rng or crypto devices,
even though I don't think Linux makes supporting this easy.



> > > seems like that would filter out the rest of the non-NIC PCI devices as
> > > well as it does for non-NIC VFIO PCI devices.  The only thing remotely
> > > vfio specific below is that it might notify based on the vfio device
> > > name, but it's a fallback to PCIDevice.qdev.id.  A real ID could just
> > > be a requirement to make use of this.  
> > 
> > 
> > Right and in fact I don't see why we can't make reporting
> > bus master status a capability of all devices.
> > 
> > 
> > >  Thanks,
> > > 
> > > Alex
> > >   
> > > > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > > > > index bd83b58..adcc95a 100644
> > > > > > --- a/hw/vfio/pci.c
> > > > > > +++ b/hw/vfio/pci.c
> > > > > > @@ -34,6 +34,7 @@
> > > > > >  #include "pci.h"
> > > > > >  #include "trace.h"
> > > > > >  #include "qapi/error.h"
> > > > > > +#include "qapi/qapi-events-net.h"
> > > > > >  
> > > > > >  #define MSIX_CAP_LENGTH 12
> > > > > >  
> > > > > > @@ -42,6 +43,7 @@
> > > > >

Re: [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover

2019-01-07 Thread Alex Williamson

On Mon, 7 Jan 2019 19:12:06 -0500
"Michael S. Tsirkin"  wrote:

> On Mon, Jan 07, 2019 at 04:41:15PM -0700, Alex Williamson wrote:
> > On Mon, 7 Jan 2019 18:22:20 -0500
> > "Michael S. Tsirkin"  wrote:
> >   
> > > On Mon, Jan 07, 2019 at 04:17:17PM -0700, Alex Williamson wrote:  
> > > > On Mon,  7 Jan 2019 17:29:43 -0500
> > > > Venu Busireddy  wrote:
> > > > 
> > > > > From: Si-Wei Liu 
> > > > > 
> > > > > When a VF is hotplugged into the guest, datapath switching will be
> > > > > performed immediately, which is sub-optimal in terms of timing, and
> > > > > could end up with substantial network downtime. One of ways to shorten
> > > > > > this downtime is to switch the datapath only after the VF is seen to get
> > > > > > enabled by guest, indicated by the bus master bit in VF's PCI config
> > > > > > space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
> > > > > > at that time to indicate this condition. Then management stack can kick
> > > > > off datapath switching upon receiving the event.
> > > > > 
> > > > > Signed-off-by: Si-Wei Liu 
> > > > > Signed-off-by: Venu Busireddy 
> > > > > ---
> > > > >  hw/vfio/pci.c | 57 
> > > > > +
> > > > >  qapi/net.json | 26 ++
> > > > >  2 files changed, 83 insertions(+)
> > > > 
> > > > Why is this done at the vfio driver layer rather than the PCI core
> > > > layer?  We write everything through using pci_default_write_config(), I
> > > > don't see that anything here is particularly vfio specific.  Please copy
> > > > me on any changes in hw/vfio.  Thanks,
> > > > 
> > > > Alex
> > > 
> > > Hmm so you are saying let's send events for each device?
> > > I don't have a problem with this but in this case
> > > I think I would like to see a per-device option "send events".
> > > We don't want a ton of events in the simple default config.  
> > 
> > In the below we're only sending events for PCIDevice.failover_primary,  
> 
> Well failover_primary in this patch is a vfio property, not a
> pci device property.

It's both and it's kind of a kludge (from 2/5):

--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3077,6 +3077,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 vfio_register_err_notifier(vdev);
 vfio_register_req_notifier(vdev);
 vfio_setup_resetfn_quirk(vdev);
+pdev->failover_primary = vdev->failover_primary;
 
 return;
 
@@ -3219,6 +3220,8 @@ static Property vfio_pci_dev_properties[] = {
qdev_prop_nv_gpudirect_clique, uint8_t),
 DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
 OFF_AUTOPCIBAR_OFF),
+DEFINE_PROP_BOOL("failover-primary", VFIOPCIDevice, failover_primary,
+ false),
 /*
  * TODO - support passed fds... is this necessary?
  * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),

The property could have set VFIOPCIDevice.pdev.failover_primary
directly.  I'm not thrilled about that name either, it's a very NIC
centric property whereas vfio-pci supports plenty of non-networking
devices, as of course does PCIDevice.  Maybe the concept needs to be
more general or the name needs to be more NIC specific and fail for
devices that don't have the correct class code.  Thanks,

Alex

> > seems like that would filter out the rest of the non-NIC PCI devices as
> > well as it does for non-NIC VFIO PCI devices.  The only thing remotely
> > vfio specific below is that it might notify based on the vfio device
> > name, but it's a fallback to PCIDevice.qdev.id.  A real ID could just
> > be a requirement to make use of this.  
> 
> 
> Right and in fact I don't see why we can't make reporting
> bus master status a capability of all devices.
> 
> 
> >  Thanks,
> > 
> > Alex
> >   
> > > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > > > index bd83b58..adcc95a 100644
> > > > > --- a/hw/vfio/pci.c
> > > > > +++ b/hw/vfio/pci.c
> > > > > @@ -34,6 +34,7 @@
> > > > >  #include "pci.h"
> > > > >  #include "trace.h"
> > > > >  #include "qapi/error.h"
> > > > > +#include "qapi/qapi-events-net.h"
> > > > >  
> > > > >  #define MSIX_CAP_LENGTH 12
> > > > >  
> > > > > @@ -42,6 +43,7 @@
> > > > >  
> > > > >  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> > > > >  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> > > > > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
> > > > >  
> > > > >  /*
> > > > >   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > > > > @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > > > >  {
> > > > >  VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > > > >  uint32_t val_le = cpu_to_le32(val);
> > > > > +bool may_notify = false;
> > > > > +bool master_was = false;
> > > > >  
> > > > >  trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);

Re: [Qemu-devel] [PATCH 1/3] spapr: Eliminate SPAPR_PCI_2_7_MMIO_WIN_SIZE macro

2019-01-07 Thread David Gibson

On Mon, Jan 07, 2019 at 05:30:18PM -0200, Eduardo Habkost wrote:
> The macro is only used in one place, where the purpose of the
> value is obvious.  Eliminate the macro so we don't need to rely
> on stringify().
> 
> Signed-off-by: Eduardo Habkost 

Acked-by: David Gibson 

> ---
>  include/hw/pci-host/spapr.h | 1 -
>  hw/ppc/spapr.c  | 2 +-
>  2 files changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> index 7c66c3872f..a85a995b6c 100644
> --- a/include/hw/pci-host/spapr.h
> +++ b/include/hw/pci-host/spapr.h
> @@ -99,7 +99,6 @@ struct sPAPRPHBState {
>  #define SPAPR_PCI_BASE   (1ULL << 45) /* 32 TiB */
>  #define SPAPR_PCI_LIMIT  (1ULL << 46) /* 64 TiB */
>  
> -#define SPAPR_PCI_2_7_MMIO_WIN_SIZE  0xf8000
>  #define SPAPR_PCI_IO_WIN_SIZE0x1
>  
>  #define SPAPR_PCI_MSI_WINDOW 0x400ULL
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 5671608cea..bff42f0adb 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -4225,7 +4225,7 @@ static void spapr_machine_2_7_class_options(MachineClass *mc)
>  {
>  .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,
>  .property = "mem_win_size",
> -.value= stringify(SPAPR_PCI_2_7_MMIO_WIN_SIZE),
> +.value= "0xf8000",
>  },
>  {
>  .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH 2/3] machine: Eliminate unnecessary stringify() usage

2019-01-07 Thread David Gibson

On Mon, Jan 07, 2019 at 05:30:19PM -0200, Eduardo Habkost wrote:
> stringify() is useful when we need to use macros in compat_props
> (like when we set virtio-balloon-pci.class=PCI_CLASS_MEMORY_RAM at
> pc_i440fx_1_0_machine_options()), but it is pointless when we are
> already providing a number literal.
> 
> Replace stringify() with string literals when appropriate.
> 
> Signed-off-by: Eduardo Habkost 

Reviewed-by: David Gibson 
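
As background for the commit message above, here is a small sketch of why a
two-level stringify macro matters for macro arguments but adds nothing for a
plain number. The TOY_* names are hypothetical and are not QEMU's own
definitions; they only mirror the usual preprocessor idiom.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names; the usual two-level stringification idiom. */
#define TOY_TOSTRING(s)  #s                 /* stringifies the token as written */
#define TOY_STRINGIFY(s) TOY_TOSTRING(s)    /* expands macros first, then stringifies */

#define TOY_FILE_SLOTS 0x10                 /* illustrative value only */

/* The macro argument is expanded before stringification: "0x10". */
static const char *toy_from_macro = TOY_STRINGIFY(TOY_FILE_SLOTS);

/* With a number literal there is nothing to expand: also "0x10". */
static const char *toy_from_number = TOY_STRINGIFY(0x10);

/* So the plain string literal is equivalent and easier to read. */
static const char *toy_from_literal = "0x10";

/* Single-level stringification does not expand: "TOY_FILE_SLOTS". */
static const char *toy_unexpanded = TOY_TOSTRING(TOY_FILE_SLOTS);
```

The second and third definitions produce the same string, which is the case
the patch replaces with literals; only the first form genuinely needs the
macro.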

> ---
>  hw/core/machine.c |  8 ++--
>  hw/i386/pc.c  | 94 +++
>  hw/i386/pc_piix.c | 30 +++
>  hw/ppc/spapr.c|  2 +-
>  4 files changed, 67 insertions(+), 67 deletions(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index f8563efb86..4b4d6c23de 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -135,11 +135,11 @@ GlobalProperty hw_compat_2_8[] = {
>  {
>  .driver   = "fw_cfg_mem",
>  .property = "x-file-slots",
> -.value= stringify(0x10),
> +.value= "0x10",
>  },{
>  .driver   = "fw_cfg_io",
>  .property = "x-file-slots",
> -.value= stringify(0x10),
> +.value= "0x10",
>  },{
>  .driver   = "pflash_cfi01",
>  .property = "old-multiple-chip-handling",
> @@ -337,11 +337,11 @@ GlobalProperty hw_compat_2_1[] = {
>  },{
>  .driver   = "usb-mouse",
>  .property = "usb_version",
> -.value= stringify(1),
> +.value= "1",
>  },{
>  .driver   = "usb-kbd",
>  .property = "usb_version",
> -.value= stringify(1),
> +.value= "1",
>  },{
>  .driver   = "virtio-pci",
>  .property = "virtio-pci-bus-master-bug-migration",
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 4952feb476..ff14b6d4df 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -148,11 +148,11 @@ GlobalProperty pc_compat_2_12[] = {
>  },{
>  .driver   = "EPYC-" TYPE_X86_CPU,
>  .property = "xlevel",
> -.value= stringify(0x800a),
> +.value= "0x800a",
>  },{
>  .driver   = "EPYC-IBPB-" TYPE_X86_CPU,
>  .property = "xlevel",
> -.value= stringify(0x800a),
> +.value= "0x800a",
>  },
>  };
>  const size_t pc_compat_2_12_len = G_N_ELEMENTS(pc_compat_2_12);
> @@ -191,7 +191,7 @@ GlobalProperty pc_compat_2_9[] = {
>  {
>  .driver   = "mch",
>  .property = "extended-tseg-mbytes",
> -.value= stringify(0),
> +.value= "0",
>  },
>  };
>  const size_t pc_compat_2_9_len = G_N_ELEMENTS(pc_compat_2_9);
> @@ -365,75 +365,75 @@ GlobalProperty pc_compat_2_3[] = {
>  },{
>  .driver   = "qemu64" "-" TYPE_X86_CPU,
>  .property = "min-level",
> -.value= stringify(4),
> +.value= "4",
>  },{
>  .driver   = "kvm64" "-" TYPE_X86_CPU,
>  .property = "min-level",
> -.value= stringify(5),
> +.value= "5",
>  },{
>  .driver   = "pentium3" "-" TYPE_X86_CPU,
>  .property = "min-level",
> -.value= stringify(2),
> +.value= "2",
>  },{
>  .driver   = "n270" "-" TYPE_X86_CPU,
>  .property = "min-level",
> -.value= stringify(5),
> +.value= "5",
>  },{
>  .driver   = "Conroe" "-" TYPE_X86_CPU,
>  .property = "min-level",
> -.value= stringify(4),
> +.value= "4",
>  },{
>  .driver   = "Penryn" "-" TYPE_X86_CPU,
>  .property = "min-level",
> -.value= stringify(4),
> +.value= "4",
>  },{
>  .driver   = "Nehalem" "-" TYPE_X86_CPU,
>  .property = "min-level",
> -.value= stringify(4),
> +.value= "4",
>  },{
>  .driver   = "n270" "-" TYPE_X86_CPU,
>  .property = "min-xlevel",
> -.value= stringify(0x800a),
> +.value= "0x800a",
>  },{
>  .driver   = "Penryn" "-" TYPE_X86_CPU,
>  .property = "min-xlevel",
> -.value= stringify(0x800a),
> +.value= "0x800a",
>  },{
>  .driver   = "Conroe" "-" TYPE_X86_CPU,
>  .property = "min-xlevel",
> -.value= stringify(0x800a),
> +.value= "0x800a",
>  },{
>  .driver   = "Nehalem" "-" TYPE_X86_CPU,
>  .property = "min-xlevel",
> -.value= stringify(0x800a),
> +.value= "0x800a",
>  },{
>  .driver   = "Westmere" "-" TYPE_X86_CPU,
>  .property = "min-xlevel",
> -.value= stringify(0x800a),
> +.value= "0x800a",
>  },{
>  .driver   = "SandyBridge" "-" TYPE_X86_CPU,
>  .property = "min-xlevel",
> -.value= stringify(0x800a),
> +.value= "0x800a",
>  },{
>  .driver   = "IvyBridge" "-"

Re: [Qemu-devel] [PATCH v4 04/11] pci/pcie: stop plug/unplug if the slot is locked

2019-01-07 Thread Michael S. Tsirkin

On Mon, Jan 07, 2019 at 01:13:58PM +0100, David Hildenbrand wrote:
> On 12.12.18 10:16, David Hildenbrand wrote:
> > We better stop right away. For now, errors would be partially ignored
> > (so the guest might get informed or the device might get unplugged),
> > although actual plug/unplug will be reported as failed to the user.
> > 
> > While at it, properly move the check to the pre_plug handler for the plug
> > case, as we can test the slot state before the device will be realized.
> > 
> > Reviewed-by: Igor Mammedov 
> > Reviewed-by: David Gibson 
> > Signed-off-by: David Hildenbrand 
> 
> @MST, looks like you missed this one.


Ouch I wonder what happened.
Thanks!

> 
> -- 
> 
> Thanks,
> 
> David / dhildenb

Re: [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover

2019-01-07 Thread Michael S. Tsirkin

On Mon, Jan 07, 2019 at 04:41:15PM -0700, Alex Williamson wrote:
> On Mon, 7 Jan 2019 18:22:20 -0500
> "Michael S. Tsirkin"  wrote:
> 
> > On Mon, Jan 07, 2019 at 04:17:17PM -0700, Alex Williamson wrote:
> > > On Mon,  7 Jan 2019 17:29:43 -0500
> > > Venu Busireddy  wrote:
> > >   
> > > > From: Si-Wei Liu 
> > > > 
> > > > When a VF is hotplugged into the guest, datapath switching will be
> > > > performed immediately, which is sub-optimal in terms of timing, and
> > > > could end up with substantial network downtime. One of ways to shorten
> > > > this downtime is to switch the datapath only after the VF is seen to get
> > > > enabled by guest, indicated by the bus master bit in VF's PCI config
> > > > space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
> > > > at that time to indicate this condition. Then management stack can kick
> > > > off datapath switching upon receiving the event.
> > > > 
> > > > Signed-off-by: Si-Wei Liu 
> > > > Signed-off-by: Venu Busireddy 
> > > > ---
> > > >  hw/vfio/pci.c | 57 
> > > > +
> > > >  qapi/net.json | 26 ++
> > > >  2 files changed, 83 insertions(+)  
> > > 
> > > Why is this done at the vfio driver layer rather than the PCI core
> > > layer?  We write everything through using pci_default_write_config(), I
> > > don't see that anything here is particularly vfio specific.  Please copy
> > > me on any changes in hw/vfio.  Thanks,
> > > 
> > > Alex  
> > 
> > Hmm so you are saying let's send events for each device?
> > I don't have a problem with this but in this case
> > I think I would like to see a per-device option "send events".
> > We don't want a ton of events in the simple default config.
> 
> In the below we're only sending events for PCIDevice.failover_primary,

Well failover_primary in this patch is a vfio property, not a
pci device property.


> seems like that would filter out the rest of the non-NIC PCI devices as
> well as it does for non-NIC VFIO PCI devices.  The only thing remotely
> vfio specific below is that it might notify based on the vfio device
> name, but it's a fallback to PCIDevice.qdev.id.  A real ID could just
> be a requirement to make use of this.


Right and in fact I don't see why we can't make reporting
bus master status a capability of all devices.
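
To make the "nothing here is vfio specific" argument concrete, the
sample-before, write, compare-after logic from the quoted patch can be
modelled against a plain config-space array. This is a sketch under stated
assumptions, not a proposed implementation: ToyPCIDevice and the toy_*
functions are hypothetical, and the event is replaced by a counter. Only the
PCI_COMMAND offset (0x04) and the PCI_COMMAND_MASTER bit (0x4) are taken from
the PCI specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PCI_COMMAND        0x04  /* offset of the 16-bit command register */
#define PCI_COMMAND_MASTER 0x4   /* bus master enable bit */

/* Hypothetical stand-in for a device with emulated config space. */
typedef struct {
    uint8_t config[256];
    bool    failover_primary;   /* only flagged devices report changes */
    int     events_sent;        /* stand-in for emitting an event */
    bool    last_reported;      /* bus master state in the last event */
} ToyPCIDevice;

static bool toy_bus_master(const ToyPCIDevice *d)
{
    uint16_t cmd = (uint16_t)(d->config[PCI_COMMAND] |
                              (d->config[PCI_COMMAND + 1] << 8));
    return (cmd & PCI_COMMAND_MASTER) != 0;
}

static void toy_write_config(ToyPCIDevice *d, uint32_t addr,
                             uint32_t val, int len)
{
    bool may_notify, master_was;

    if (len < 1 || len > 4 || addr > sizeof(d->config) - (size_t)len) {
        return;                 /* reject out-of-range accesses */
    }

    /* Sample the old state only if this write touches the low command byte. */
    may_notify = d->failover_primary &&
                 addr <= PCI_COMMAND && PCI_COMMAND < addr + (uint32_t)len;
    master_was = may_notify && toy_bus_master(d);

    /* Little-endian store, standing in for the default config write. */
    for (int i = 0; i < len; i++) {
        d->config[addr + (uint32_t)i] = (uint8_t)(val >> (8 * i));
    }

    /* Report only on an actual transition of the bus master bit. */
    if (may_notify) {
        bool master_now = toy_bus_master(d);
        if (master_was != master_now) {
            d->events_sent++;
            d->last_reported = master_now;
        }
    }
}
```

Nothing in this model depends on how the device is backed, which is the gist
of the suggestion to move the check into the PCI core.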


>  Thanks,
> 
> Alex
> 
> > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > > index bd83b58..adcc95a 100644
> > > > --- a/hw/vfio/pci.c
> > > > +++ b/hw/vfio/pci.c
> > > > @@ -34,6 +34,7 @@
> > > >  #include "pci.h"
> > > >  #include "trace.h"
> > > >  #include "qapi/error.h"
> > > > +#include "qapi/qapi-events-net.h"
> > > >  
> > > >  #define MSIX_CAP_LENGTH 12
> > > >  
> > > > @@ -42,6 +43,7 @@
> > > >  
> > > >  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> > > >  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> > > > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
> > > >  
> > > >  /*
> > > >   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > > > @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > > >  {
> > > >  VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > > >  uint32_t val_le = cpu_to_le32(val);
> > > > +bool may_notify = false;
> > > > +bool master_was = false;
> > > >  
> > > >  trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
> > > >  
> > > > @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > > >   __func__, vdev->vbasedev.name, addr, val, len);
> > > >  }
> > > >  
> > > > +/* Bus Master Enabling/Disabling */
> > > > +if (pdev->failover_primary && current_cpu &&
> > > > +range_covers_byte(addr, len, PCI_COMMAND)) {
> > > > +master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > > > +PCI_COMMAND_MASTER);
> > > > +may_notify = true;
> > > > +}
> > > > +
> > > >  /* MSI/MSI-X Enabling/Disabling */
> > > >  if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
> > > >  ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
> > > > @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > > >  /* Write everything to QEMU to keep emulated bits correct */
> > > >  pci_default_write_config(pdev, addr, val, len);
> > > >  }
> > > > +
> > > > +if (may_notify) {
> > > > +bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > > > + PCI_COMMAND_MASTER);
> > > > +if (master_was != master_now) {
> > > > +vfio_failover_notify(vdev, master_now);
> > > > +}
> > > > +}
> > > >  }
> > > >  
> > > >  /*
> > > > @@ -2801,6 +2821,17 @@ static void 
> > > > vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> > > >  vdev->req_enabled = false;
> > > >  }
> > > >  
> > > > +static void vfio_failover_notify(VFIOPCIDevice

Re: [Qemu-devel] [PATCH v3 3/5] virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.

2019-01-07 Thread Michael S. Tsirkin

On Mon, Jan 07, 2019 at 05:29:42PM -0500, Venu Busireddy wrote:
> Add a query command to check the status of the FAILOVER_STANDBY_CHANGED
> state of the virtio_net devices.
> 
> Signed-off-by: Venu Busireddy 
> ---
>  hw/net/virtio-net.c| 16 +++
>  include/hw/virtio/virtio-net.h |  1 +
>  include/net/net.h  |  2 ++
>  net/net.c  | 61 
> ++
>  qapi/net.json  | 46 +++
>  5 files changed, 126 insertions(+)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 7b1bcde..a4e07ac 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -263,9 +263,11 @@ static void virtio_net_failover_notify_event(VirtIONet *n, uint8_t status)
>   */
>  if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
>  (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))) {
> +n->standby_enabled = true;
>  qapi_event_send_failover_standby_changed(!!ncn, ncn, path, true);
>  } else if ((!(status & VIRTIO_CONFIG_S_DRIVER_OK)) &&
>  (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
> +n->standby_enabled = false;
>  qapi_event_send_failover_standby_changed(!!ncn, ncn, path, false);
>  }
>  }

Here too, we are sending an endless stream of events.

Instead, let's send one "changed" event without data,
and then be silent until management runs the query command.
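
A rough sketch of the "one event, then silence until queried" scheme
suggested above, assuming a per-device flag that is cleared when the event
goes out and set again by the query. All names here (ToyStandby,
notify_armed, the toy_* functions) are hypothetical, and the event itself is
replaced by a counter.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool standby_enabled;   /* current state, as tracked in the quoted patch */
    bool notify_armed;      /* hypothetical: may another event be sent? */
    int  events_sent;       /* stand-in for emitting a data-less event */
} ToyStandby;

static void toy_standby_init(ToyStandby *s)
{
    s->standby_enabled = false;
    s->notify_armed = true;
    s->events_sent = 0;
}

/* Called whenever the guest driver status changes. */
static void toy_standby_set(ToyStandby *s, bool enabled)
{
    if (s->standby_enabled == enabled) {
        return;                     /* no transition, nothing to report */
    }
    s->standby_enabled = enabled;

    if (s->notify_armed) {
        s->events_sent++;           /* one "changed" event without data */
        s->notify_armed = false;    /* stay silent until the next query */
    }
}

/* Called when management runs the query command. */
static bool toy_standby_query(ToyStandby *s)
{
    s->notify_armed = true;         /* re-arm notification */
    return s->standby_enabled;
}
```

With this shape a guest that toggles the state rapidly produces at most one
event per query, and management always learns the current value from the
query itself.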




> @@ -448,6 +450,19 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
>  return info;
>  }
>  
> +static StandbyStatusInfo *virtio_net_query_standby_status(NetClientState *nc)
> +{
> +StandbyStatusInfo *info;
> +VirtIONet *n = qemu_get_nic_opaque(nc);
> +
> +info = g_malloc0(sizeof(*info));
> +info->device = g_strdup(n->netclient_name);
> +info->path = g_strdup(object_get_canonical_path(OBJECT(n->qdev)));
> +info->enabled = n->standby_enabled;
> +
> +return info;
> +}
> +
>  static void virtio_net_reset(VirtIODevice *vdev)
>  {
>  VirtIONet *n = VIRTIO_NET(vdev);
> @@ -1923,6 +1938,7 @@ static NetClientInfo net_virtio_info = {
>  .receive = virtio_net_receive,
>  .link_status_changed = virtio_net_set_link_status,
>  .query_rx_filter = virtio_net_query_rxfilter,
> +.query_standby_status = virtio_net_query_standby_status,
>  };
>  
>  static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index 4d7f3c8..9071e96 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -103,6 +103,7 @@ typedef struct VirtIONet {
>  int announce_counter;
>  bool needs_vnet_hdr_swap;
>  bool mtu_bypass_backend;
> +bool standby_enabled;
>  } VirtIONet;
>  
>  void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> diff --git a/include/net/net.h b/include/net/net.h
> index ec13702..61e8513 100644
> --- a/include/net/net.h
> +++ b/include/net/net.h
> @@ -50,6 +50,7 @@ typedef void (NetCleanup) (NetClientState *);
>  typedef void (LinkStatusChanged)(NetClientState *);
>  typedef void (NetClientDestructor)(NetClientState *);
>  typedef RxFilterInfo *(QueryRxFilter)(NetClientState *);
> +typedef StandbyStatusInfo *(QueryStandbyStatus)(NetClientState *);
>  typedef bool (HasUfo)(NetClientState *);
>  typedef bool (HasVnetHdr)(NetClientState *);
>  typedef bool (HasVnetHdrLen)(NetClientState *, int);
> @@ -71,6 +72,7 @@ typedef struct NetClientInfo {
>  NetCleanup *cleanup;
>  LinkStatusChanged *link_status_changed;
>  QueryRxFilter *query_rx_filter;
> +QueryStandbyStatus *query_standby_status;
>  NetPoll *poll;
>  HasUfo *has_ufo;
>  HasVnetHdr *has_vnet_hdr;
> diff --git a/net/net.c b/net/net.c
> index 1f7d626..fbf288e 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -1320,6 +1320,67 @@ RxFilterInfoList *qmp_query_rx_filter(bool has_name, const char *name,
>  return filter_list;
>  }
>  
> +StandbyStatusInfoList *qmp_query_standby_status(bool has_device,
> +const char *device,
> +Error **errp)
> +{
> +NetClientState *nc;
> +StandbyStatusInfoList *status_list = NULL, *last_entry = NULL;
> +
> +QTAILQ_FOREACH(nc, &net_clients, next) {
> +StandbyStatusInfoList *entry;
> +StandbyStatusInfo *info;
> +
> +if (has_device && strcmp(nc->name, device) != 0) {
> +continue;
> +}
> +
> +/* only query standby status information of NIC */
> +if (nc->info->type != NET_CLIENT_DRIVER_NIC) {
> +if (has_device) {
> +error_setg(errp, "net client(%s) isn't a NIC", device);
> +return NULL;
> +}
> +continue;
> +}
> +
> +/*
> + * only query information on queu

Re: [Qemu-devel] [PATCH v15 23/26] sched: early boot clock

2019-01-07 Thread Dominique Martinet

Pavel Tatashin wrote on Mon, Jan 07, 2019:
> I could not reproduce the problem. Did you suspend to memory between
> wake ups? Does this time jump happen every time, even if your laptop
> sleeps for a minute?

I'm not sure I understand "suspend to memory between the wake ups".
The full sequence is:
 - start a VM (just in case, I let it boot till the end)
 - suspend to memory (aka systemctl suspend) the host
 - after resuming the host, soft reboot the VM (login through
serial/ssh/whatever and reboot or in the qemu console 'system_reset')

I've just slept exactly one minute and reproduced again with the fedora
stock kernel now (4.19.13-300.fc29.x86_64) in the VM.

Interestingly I'm not getting the same offset between multiple reboots
now despite not suspending again; but if I don't suspend I cannot seem
to get it to give an offset at all (only tried for a few minutes; this
might not be true) ; OTOH I pushed my luck further and even with a five
seconds sleep I'm getting a noticeable offset on first VM reboot after
resume:

[0.00] Hypervisor detected: KVM
[0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00
[  179.362163] kvm-clock: cpu 0, msr 13c01001, primary cpu clock
[  179.362163] clocksource: kvm-clock: mask: 0x max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns

Honestly not sure what more information I could give, I'll try on some
other hardware than my laptop (if I can get a server to resume after
suspend through ipmi or wake on lan); but I don't have anything I could
install ubuntu on to try their qemu's version... although I really don't
want to believe that's the difference...

Thanks,
-- 
Dominique

Re: [Qemu-devel] [PATCH] qom: cpu: destroy work_mutex in cpu_common_finalize

2019-01-07 Thread Paolo Bonzini

On 02/01/19 08:41, Li Qiang wrote:
> Commit 376692b9dc6 (cpus: protect work list with work_mutex)
> initializes a work_mutex in cpu_common_initfn but forgets
> to destroy it. This causes a resource leak when a CPU is hot-unplugged
> or when CPU hotplug fails.
> 
> Signed-off-by: Li Qiang 
> ---
>  qom/cpu.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/qom/cpu.c b/qom/cpu.c
> index 9ad1372d57..367ebf9d61 100644
> --- a/qom/cpu.c
> +++ b/qom/cpu.c
> @@ -380,6 +380,9 @@ static void cpu_common_initfn(Object *obj)
>  
>  static void cpu_common_finalize(Object *obj)
>  {
> +CPUState *cpu = CPU(obj);
> +
> +qemu_mutex_destroy(&cpu->work_mutex);
>  }
>  
>  static int64_t cpu_common_get_arch_id(CPUState *cpu)
> 

Queued, thanks.

Paolo
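
For context, the pattern the patch restores is simply pairing every mutex
initialisation with a destroy in the matching teardown path. The sketch below
shows that pairing with plain POSIX threads rather than QEMU's own mutex
wrappers; ToyCPU and the toy_* functions are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical object with a lock protecting its work list. */
typedef struct {
    pthread_mutex_t work_mutex;
    int             queued_work;
} ToyCPU;

/* Counterpart of an instance-init hook: create the lock. */
static int toy_cpu_init(ToyCPU *cpu)
{
    cpu->queued_work = 0;
    return pthread_mutex_init(&cpu->work_mutex, NULL);
}

/* Counterpart of a finalize hook: release what init created. */
static int toy_cpu_finalize(ToyCPU *cpu)
{
    return pthread_mutex_destroy(&cpu->work_mutex);
}
```

Every path that runs the init hook, including a hotplug that fails later on,
should end up running the finalize hook so the lock's resources are released.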

Re: [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover

2019-01-07 Thread Alex Williamson

On Mon, 7 Jan 2019 18:22:20 -0500
"Michael S. Tsirkin"  wrote:

> On Mon, Jan 07, 2019 at 04:17:17PM -0700, Alex Williamson wrote:
> > On Mon,  7 Jan 2019 17:29:43 -0500
> > Venu Busireddy  wrote:
> >   
> > > From: Si-Wei Liu 
> > > 
> > > When a VF is hotplugged into the guest, datapath switching will be
> > > performed immediately, which is sub-optimal in terms of timing, and
> > > could end up with substantial network downtime. One of ways to shorten
> > > this downtime is to switch the datapath only after the VF is seen to get
> > > enabled by guest, indicated by the bus master bit in VF's PCI config
> > > space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
> > > at that time to indicate this condition. Then management stack can kick
> > > off datapath switching upon receiving the event.
> > > 
> > > Signed-off-by: Si-Wei Liu 
> > > Signed-off-by: Venu Busireddy 
> > > ---
> > >  hw/vfio/pci.c | 57 
> > > +
> > >  qapi/net.json | 26 ++
> > >  2 files changed, 83 insertions(+)  
> > 
> > Why is this done at the vfio driver layer rather than the PCI core
> > layer?  We write everything through using pci_default_write_config(), I
> > don't see that anything here is particularly vfio specific.  Please copy
> > me on any changes in hw/vfio.  Thanks,
> > 
> > Alex  
> 
> Hmm so you are saying let's send events for each device?
> I don't have a problem with this but in this case
> I think I would like to see a per-device option "send events".
> We don't want a ton of events in the simple default config.

In the below we're only sending events for PCIDevice.failover_primary,
seems like that would filter out the rest of the non-NIC PCI devices as
well as it does for non-NIC VFIO PCI devices.  The only thing remotely
vfio specific below is that it might notify based on the vfio device
name, but it's a fallback to PCIDevice.qdev.id.  A real ID could just
be a requirement to make use of this.  Thanks,

Alex
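
To make the "nothing here is vfio specific" point concrete, here is a minimal sketch of the same before/after comparison done in a generic config-write path. It is an illustration under assumptions, not the QEMU implementation: DemoPCIDevice, demo_write_config and the counters are hypothetical, and only the command register offset (0x04) and the bus master bit (0x0004) are taken from the PCI specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PCI_COMMAND        0x04    /* command register offset */
#define DEMO_PCI_COMMAND_MASTER 0x0004  /* bus master enable bit */

/* Hypothetical device model: config space plus the failover flag. */
typedef struct DemoPCIDevice {
    uint8_t config[64];
    bool failover_primary;
    int notify_count;       /* how many "changed" events were sent */
    bool last_notified;     /* state carried by the latest event */
} DemoPCIDevice;

/* Read the little-endian command word and test the bus master bit. */
static bool demo_bus_master(const DemoPCIDevice *d)
{
    uint16_t cmd = (uint16_t)(d->config[DEMO_PCI_COMMAND] |
                              (d->config[DEMO_PCI_COMMAND + 1] << 8));
    return (cmd & DEMO_PCI_COMMAND_MASTER) != 0;
}

/* Generic write path: sample the bit, apply the write, compare. */
void demo_write_config(DemoPCIDevice *d, uint32_t addr, uint32_t val,
                       unsigned len)
{
    bool was;
    unsigned i;

    assert(d != 0);
    assert(len >= 1 && len <= 4 && addr + len <= sizeof(d->config));

    was = demo_bus_master(d);
    for (i = 0; i < len; i++) {
        d->config[addr + i] = (uint8_t)(val >> (8 * i));
    }
    /* Only devices flagged as failover primary generate events. */
    if (d->failover_primary && was != demo_bus_master(d)) {
        d->last_notified = demo_bus_master(d);
        d->notify_count++;
    }
}
```

Because the filter is the failover_primary flag rather than the device type, devices without the flag stay silent, which is the filtering behaviour described in the reply above.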

> > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > index bd83b58..adcc95a 100644
> > > --- a/hw/vfio/pci.c
> > > +++ b/hw/vfio/pci.c
> > > @@ -34,6 +34,7 @@
> > >  #include "pci.h"
> > >  #include "trace.h"
> > >  #include "qapi/error.h"
> > > +#include "qapi/qapi-events-net.h"
> > >  
> > >  #define MSIX_CAP_LENGTH 12
> > >  
> > > @@ -42,6 +43,7 @@
> > >  
> > >  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> > >  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> > > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
> > >  
> > >  /*
> > >   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > > @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > >  {
> > >  VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > >  uint32_t val_le = cpu_to_le32(val);
> > > +bool may_notify = false;
> > > +bool master_was = false;
> > >  
> > >  trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
> > >  
> > > @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > >   __func__, vdev->vbasedev.name, addr, val, len);
> > >  }
> > >  
> > > +/* Bus Master Enabling/Disabling */
> > > +if (pdev->failover_primary && current_cpu &&
> > > +range_covers_byte(addr, len, PCI_COMMAND)) {
> > > +master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > > +PCI_COMMAND_MASTER);
> > > +may_notify = true;
> > > +}
> > > +
> > >  /* MSI/MSI-X Enabling/Disabling */
> > >  if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
> > >  ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
> > > @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > >  /* Write everything to QEMU to keep emulated bits correct */
> > >  pci_default_write_config(pdev, addr, val, len);
> > >  }
> > > +
> > > +if (may_notify) {
> > > +bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > > + PCI_COMMAND_MASTER);
> > > +if (master_was != master_now) {
> > > +vfio_failover_notify(vdev, master_now);
> > > +}
> > > +}
> > >  }
> > >  
> > >  /*
> > > @@ -2801,6 +2821,17 @@ static void 
> > > vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> > >  vdev->req_enabled = false;
> > >  }
> > >  
> > > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
> > > +{
> > > +PCIDevice *pdev = &vdev->pdev;
> > > +const char *n;
> > > +gchar *path;
> > > +
> > > +n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
> > > +path = object_get_canonical_path(OBJECT(vdev));
> > > +qapi_event_send_failover_primary_changed(!!n, n, path, status);
> > > +}
> > > +
> > >  static void vfio_realize(PCIDevice *pdev, Error **errp)
> > >  {
> > >  VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > > @@ -3109,10

Re: [Qemu-devel] [PATCH for-4.0 v4 4/4] i386: allow to load initrd below 4G for recent linux

2019-01-07 Thread Paolo Bonzini

On 27/12/18 21:31, Eduardo Habkost wrote:
> All that said, I miss one piece of information here: is
> XLF_CAN_BE_LOADED_ABOVE_4G really supposed to override
> header+0x22c?  linux/Documentation/x86/boot.txt isn't clear about
> that.  Is there any reference that can help us confirm this?

Linux has supported initrd up to 4 GB for a very long time (2007, long
before XLF_CAN_BE_LOADED_ABOVE_4G which was added in 2013), though it
only sets initrd_max to 2 GB to "work around bootloader bugs".  So I
guess the flag can be taken as a hint that you can load at any address,
and perhaps could be renamed.

Paolo
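
The policy being discussed can be sketched as a small decision function. This is a sketch under assumptions, not the patch itself: demo_initrd_max is a hypothetical name, the flag value (bit 1 of xloadflags) and the protocol thresholds (0x203 for the initrd_addr_max field at header+0x22c, 0x20c for xloadflags) follow the Linux x86 boot protocol as I understand it, and treating the flag as an override of header+0x22c is exactly the interpretation the thread is still questioning.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 1 of xloadflags, per the Linux x86 boot protocol. */
#define DEMO_XLF_CAN_BE_LOADED_ABOVE_4G (1u << 1)

/*
 * Pick the highest address the initrd may occupy.
 *  - protocol >= 0x20c with the flag set: treat the flag as a hint that
 *    any address below 4 GB is fine (the assumption debated above).
 *  - protocol >= 0x203: trust the initrd_addr_max header field.
 *  - older kernels: fall back to the documented 0x37ffffff limit.
 */
uint32_t demo_initrd_max(uint16_t protocol, uint16_t xloadflags,
                         uint32_t hdr_initrd_addr_max)
{
    if (protocol >= 0x20c &&
        (xloadflags & DEMO_XLF_CAN_BE_LOADED_ABOVE_4G)) {
        return UINT32_MAX;
    }
    if (protocol >= 0x203) {
        return hdr_initrd_addr_max;
    }
    return 0x37ffffffu;
}
```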

Re: [Qemu-devel] [PATCH v3 0/5] Support for datapath switching during live migration

2019-01-07 Thread Michael S. Tsirkin

On Mon, Jan 07, 2019 at 05:29:39PM -0500, Venu Busireddy wrote:
> Implement the infrastructure to support datapath switching during live
> migration involving SR-IOV devices.
> 
> 1. This patch is based off on the current VIRTIO_NET_F_STANDBY feature
>bit and MAC address device pairing.
> 
> 2. This set of events will be consumed by userspace management software
>to orchestrate all the hot plug and datapath switching activities.
>This scheme has the least QEMU modifications while allowing userspace
>software to build its own intelligence to control the whole process
>of SR-IOV live migration.
> 
> 3. While the hidden device model (viz. coupled device model) is still
>being explored for automatic hot plugging (QEMU) and automatic datapath
>switching (host-kernel), this series provides a supplemental set
>of interfaces if management software wants to drive the SR-IOV live
>migration on its own. It should not conflict with the hidden device
>model but just offers simplicity of implementation.
> 
> 
> Si-Wei Liu (2):
>   vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
>   pci: query command extension to check the bus master enabling status of the failover-primary device
> 
> Sridhar Samudrala (1):
>   virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.
> 
> Venu Busireddy (2):
>   virtio_net: Add support for "Data Path Switching" during Live Migration.
>   virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.
> 
> ---
> Changes in v3:
>   Fix issues with coding style in patch 3/5.
> 
> Changes in v2:
>   Added a query command for FAILOVER_STANDBY_CHANGED event.
>   Added a query command for FAILOVER_PRIMARY_CHANGED event.

Hmm, it looks like all the feedback I sent, e.g. here:
https://patchwork.kernel.org/patch/10721571/
got ignored.

To summarize, I suggest reworking the series by adding a new command
along the lines of (naming is up to you):

query-pci-master - this returns the status for a device
   and enables a *single* event after
   it changes

and then removing all status data from the event:
just notify about the change, and *only once*.

Upon receiving the event, management does query-pci-master
and acts accordingly.
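
The query-then-single-event scheme suggested here can be sketched as a tiny state machine. This is only an illustration of the idea: DemoMasterWatch, demo_query_pci_master and demo_master_changed are hypothetical names, and the events_sent counter stands in for actually emitting a QMP event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state for the one-shot notification scheme. */
typedef struct DemoMasterWatch {
    bool master_enabled;    /* current bus master state */
    bool event_armed;       /* set by a query, cleared by one event */
    int events_sent;        /* stand-in for emitting a QMP event */
} DemoMasterWatch;

/* The query returns the status and arms exactly one future event. */
bool demo_query_pci_master(DemoMasterWatch *w)
{
    assert(w != NULL);
    w->event_armed = true;
    return w->master_enabled;
}

/* Called when the guest toggles the bit; the event carries no status. */
void demo_master_changed(DemoMasterWatch *w, bool enabled)
{
    assert(w != NULL);
    if (w->master_enabled == enabled) {
        return;             /* no transition, nothing to report */
    }
    w->master_enabled = enabled;
    if (w->event_armed) {
        w->event_armed = false;
        w->events_sent++;   /* "something changed, please query" */
    }
}
```

With this shape a guest that toggles the bit rapidly produces at most one event per query, and management always learns the final state from the query rather than from the event payload.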




>  hmp.c  |   5 +++
>  hw/acpi/pcihp.c|  27 +++
>  hw/net/virtio-net.c|  42 +
>  hw/pci/pci.c   |   5 +++
>  hw/vfio/pci.c  |  60 +
>  hw/vfio/pci.h  |   1 +
>  include/hw/pci/pci.h   |   1 +
>  include/hw/virtio/virtio-net.h |   1 +
>  include/net/net.h  |   2 +
>  net/net.c  |  61 +
>  qapi/misc.json |   5 ++-
>  qapi/net.json  | 100 +
>  12 files changed, 309 insertions(+), 1 deletion(-)

Re: [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover

2019-01-07 Thread Alex Williamson

On Mon,  7 Jan 2019 17:29:43 -0500
Venu Busireddy  wrote:

> From: Si-Wei Liu 
> 
> When a VF is hotplugged into the guest, datapath switching will be
> performed immediately, which is sub-optimal in terms of timing, and
> could end up with substantial network downtime. One of ways to shorten
> this downtime is to switch the datapath only after the VF is seen to get
> enabled by guest, indicated by the bus master bit in VF's PCI config
> space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
> at that time to indicate this condition. Then management stack can kick
> off datapath switching upon receiving the event.
> 
> Signed-off-by: Si-Wei Liu 
> Signed-off-by: Venu Busireddy 
> ---
>  hw/vfio/pci.c | 57 +
>  qapi/net.json | 26 ++
>  2 files changed, 83 insertions(+)

Why is this done at the vfio driver layer rather than the PCI core
layer?  We write everything through using pci_default_write_config(), I
don't see that anything here is particularly vfio specific.  Please copy
me on any changes in hw/vfio.  Thanks,

Alex

> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index bd83b58..adcc95a 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -34,6 +34,7 @@
>  #include "pci.h"
>  #include "trace.h"
>  #include "qapi/error.h"
> +#include "qapi/qapi-events-net.h"
>  
>  #define MSIX_CAP_LENGTH 12
>  
> @@ -42,6 +43,7 @@
>  
>  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
>  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
>  
>  /*
>   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
>  {
>  VFIOPCIDevice *vdev = PCI_VFIO(pdev);
>  uint32_t val_le = cpu_to_le32(val);
> +bool may_notify = false;
> +bool master_was = false;
>  
>  trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
>  
> @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
>   __func__, vdev->vbasedev.name, addr, val, len);
>  }
>  
> +/* Bus Master Enabling/Disabling */
> +if (pdev->failover_primary && current_cpu &&
> +range_covers_byte(addr, len, PCI_COMMAND)) {
> +master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> +PCI_COMMAND_MASTER);
> +may_notify = true;
> +}
> +
>  /* MSI/MSI-X Enabling/Disabling */
>  if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
>  ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
> @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
>  /* Write everything to QEMU to keep emulated bits correct */
>  pci_default_write_config(pdev, addr, val, len);
>  }
> +
> +if (may_notify) {
> +bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> + PCI_COMMAND_MASTER);
> +if (master_was != master_now) {
> +vfio_failover_notify(vdev, master_now);
> +}
> +}
>  }
>  
>  /*
> @@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>  vdev->req_enabled = false;
>  }
>  
> +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
> +{
> +PCIDevice *pdev = &vdev->pdev;
> +const char *n;
> +gchar *path;
> +
> +n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
> +path = object_get_canonical_path(OBJECT(vdev));
> +qapi_event_send_failover_primary_changed(!!n, n, path, status);
> +}
> +
>  static void vfio_realize(PCIDevice *pdev, Error **errp)
>  {
>  VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> @@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
>  vfio_put_group(group);
>  }
>  
> +static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
> +{
> +PCIDevice *pdev = &vdev->pdev;
> +
> +/*
> + * Guest driver may not get the chance to disable bus mastering
> + * before the device object gets to be unrealized. In that event,
> + * send out a "disabled" notification on behalf of guest driver.
> + */
> +if (pdev->failover_primary &&
> +pdev->bus_master_enable_region.enabled) {
> +vfio_failover_notify(vdev, false);
> +}
> +}
> +
>  static void vfio_exitfn(PCIDevice *pdev)
>  {
>  VFIOPCIDevice *vdev = PCI_VFIO(pdev);
>  
> +/*
> + * During the guest reboot sequence, it is sometimes possible that
> + * the guest may not get sufficient time to complete the entire driver
> + * removal sequence, near the end of which a PCI config space write to
> + * disable bus mastering can be intercepted by device. In such cases,
> + * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
> + * is imperative to generate the event on the guest's behalf if the
> + * guest fails to make it.
> + */
>

Re: [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover

2019-01-07 Thread Michael S. Tsirkin

On Mon, Jan 07, 2019 at 04:17:17PM -0700, Alex Williamson wrote:
> On Mon,  7 Jan 2019 17:29:43 -0500
> Venu Busireddy  wrote:
> 
> > From: Si-Wei Liu 
> > 
> > When a VF is hotplugged into the guest, datapath switching will be
> > performed immediately, which is sub-optimal in terms of timing, and
> > could end up with substantial network downtime. One of ways to shorten
> > this downtime is to switch the datapath only after the VF is seen to get
> > enabled by guest, indicated by the bus master bit in VF's PCI config
> > space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
> > at that time to indicate this condition. Then management stack can kick
> > off datapath switching upon receiving the event.
> > 
> > Signed-off-by: Si-Wei Liu 
> > Signed-off-by: Venu Busireddy 
> > ---
> >  hw/vfio/pci.c | 57 +
> >  qapi/net.json | 26 ++
> >  2 files changed, 83 insertions(+)
> 
> Why is this done at the vfio driver layer rather than the PCI core
> layer?  We write everything through using pci_default_write_config(), I
> don't see that anything here is particularly vfio specific.  Please copy
> me on any changes in hw/vfio.  Thanks,
> 
> Alex

Hmm so you are saying let's send events for each device?
I don't have a problem with this but in this case
I think I would like to see a per-device option "send events".
We don't want a ton of events in the simple default config.

> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index bd83b58..adcc95a 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -34,6 +34,7 @@
> >  #include "pci.h"
> >  #include "trace.h"
> >  #include "qapi/error.h"
> > +#include "qapi/qapi-events-net.h"
> >  
> >  #define MSIX_CAP_LENGTH 12
> >  
> > @@ -42,6 +43,7 @@
> >  
> >  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> >  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
> >  
> >  /*
> >   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
> >  {
> >  VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> >  uint32_t val_le = cpu_to_le32(val);
> > +bool may_notify = false;
> > +bool master_was = false;
> >  
> >  trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
> >  
> > @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> >   __func__, vdev->vbasedev.name, addr, val, len);
> >  }
> >  
> > +/* Bus Master Enabling/Disabling */
> > +if (pdev->failover_primary && current_cpu &&
> > +range_covers_byte(addr, len, PCI_COMMAND)) {
> > +master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > +PCI_COMMAND_MASTER);
> > +may_notify = true;
> > +}
> > +
> >  /* MSI/MSI-X Enabling/Disabling */
> >  if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
> >  ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
> > @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> >  /* Write everything to QEMU to keep emulated bits correct */
> >  pci_default_write_config(pdev, addr, val, len);
> >  }
> > +
> > +if (may_notify) {
> > +bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > + PCI_COMMAND_MASTER);
> > +if (master_was != master_now) {
> > +vfio_failover_notify(vdev, master_now);
> > +}
> > +}
> >  }
> >  
> >  /*
> > @@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> >  vdev->req_enabled = false;
> >  }
> >  
> > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
> > +{
> > +PCIDevice *pdev = &vdev->pdev;
> > +const char *n;
> > +gchar *path;
> > +
> > +n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
> > +path = object_get_canonical_path(OBJECT(vdev));
> > +qapi_event_send_failover_primary_changed(!!n, n, path, status);
> > +}
> > +
> >  static void vfio_realize(PCIDevice *pdev, Error **errp)
> >  {
> >  VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > @@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
> >  vfio_put_group(group);
> >  }
> >  
> > +static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
> > +{
> > +PCIDevice *pdev = &vdev->pdev;
> > +
> > +/*
> > + * Guest driver may not get the chance to disable bus mastering
> > + * before the device object gets to be unrealized. In that event,
> > + * send out a "disabled" notification on behalf of guest driver.
> > + */
> > +if (pdev->failover_primary &&
> > +pdev->bus_master_enable_region.enabled) {
> > +vfio_failover_notify(vdev, false);
> > +}
> > +}
> > +
> >  static void vfio_exitfn(PCIDevice *pdev)
> >  {
> >  VFIOP

Re: [Qemu-devel] [PATCH] cpus: ignore ESRCH in qemu_cpu_kick_thread()

2019-01-07 Thread Paolo Bonzini

On 02/01/19 15:16, Laurent Vivier wrote:
> We can have a race condition between qemu_cpu_kick_thread() and
> qemu_kvm_cpu_thread_fn() when we hotunplug a CPU. In this case,
> qemu_cpu_kick_thread() can try to kick a thread that is exiting.
> pthread_kill() returns an error and qemu is stopped by an exit(1).
> 
>qemu:qemu_cpu_kick_thread: No such process
> 
> We can safely ignore this error.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  cpus.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 0ddeeefc14..4717490bd0 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1778,7 +1778,7 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
>  }
>  cpu->thread_kicked = true;
>  err = pthread_kill(cpu->thread->thread, SIG_IPI);
> -if (err) {
> +if (err && err != ESRCH) {
>  fprintf(stderr, "qemu:%s: %s", __func__, strerror(err));
>  exit(1);
>  }
> 

You could in principle be sending the signal to another thread, so the
fix is a bit hackish.  However, I don't have a better idea that is not
racy. :(

The problem is that qemu_cpu_kick does not use any spinlock or mutex to
synchronize against cpu_remove_sync's qemu_thread_join.  I think once
you reach qemu_cpu_kick in cpu_remove_sync (so if cpu->unplug) you
do not need to reset cpu->thread_kicked anymore, but I don't think
that's enough to fix it.

Paolo
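
The error handling in the patch boils down to a three-way classification of the pthread_kill() return value. The sketch below isolates just that decision; DemoKickResult and demo_classify_kick are hypothetical names, and it deliberately does not address the race described above, where the thread ID may no longer refer to the intended thread.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome of trying to kick a vCPU thread. */
typedef enum DemoKickResult {
    DEMO_KICK_DELIVERED,    /* signal was queued for the thread */
    DEMO_KICK_TARGET_GONE,  /* ESRCH: thread already exited, ignore */
    DEMO_KICK_FATAL         /* anything else is a real error */
} DemoKickResult;

/* Classify the value returned by pthread_kill() (0 or an errno code). */
DemoKickResult demo_classify_kick(int err)
{
    assert(err >= 0);       /* pthread_kill returns 0 or a positive errno */
    if (err == 0) {
        return DEMO_KICK_DELIVERED;
    }
    if (err == ESRCH) {
        return DEMO_KICK_TARGET_GONE;
    }
    return DEMO_KICK_FATAL;
}
```

A caller would exit(1) only on DEMO_KICK_FATAL, which corresponds to the "err && err != ESRCH" condition in the patch.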

[Qemu-devel] [PULL 21/35] tests/hexloader-test: Don't pass -nographic to the QEMU under test

2019-01-07 Thread Paolo Bonzini

From: Peter Maydell 

The hexloader test invokes QEMU with the -nographic argument. This
is unnecessary, because the qtest_initf() function will pass it
-display none, which suffices to disable the graphical window.
It also means that the QEMU process will make the stdin/stdout
O_NONBLOCK. Since O_NONBLOCK is not per-file descriptor but per
"file description", this non-blocking behaviour is then shared
with any other process that's using the stdin/stdout of the
'make check' run, including make itself. This can result in make
falling over with "make: write error: stdout" because it got
an unexpected EINTR trying to write output messages to the terminal.
This is particularly noticeable if running 'make check' in a loop with
  while make check; do true; done
(It does not affect single make check runs so much because the
shell will remove the O_NONBLOCK status before it reads the
terminal for interactive input.)

Remove the unwanted -nographic argument.
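
The "per file description, not per file descriptor" behaviour described above can be demonstrated in isolation. This is a minimal sketch using a pipe and dup() within one process rather than stdin/stdout shared between make and QEMU; demo_nonblock_is_shared is a hypothetical helper name.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Set O_NONBLOCK through a dup()ed descriptor and check whether the
 * original descriptor sees it. Returns 1 if the flag is shared, 0 if
 * it is not, and -1 if a system call failed.
 */
int demo_nonblock_is_shared(void)
{
    int fds[2];
    int copy, flags, result = -1;

    if (pipe(fds) != 0) {
        return -1;
    }
    /* copy and fds[0] now refer to the same open file description. */
    copy = dup(fds[0]);
    if (copy >= 0) {
        flags = fcntl(copy, F_GETFL);
        if (flags != -1 &&
            fcntl(copy, F_SETFL, flags | O_NONBLOCK) != -1) {
            /* Query the flag through the *other* descriptor. */
            flags = fcntl(fds[0], F_GETFL);
            if (flags != -1) {
                result = (flags & O_NONBLOCK) ? 1 : 0;
            }
        }
        close(copy);
    }
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

The same sharing applies to descriptors inherited across fork() and exec(), which is how a child process can change the blocking mode of its parent's terminal.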

Signed-off-by: Peter Maydell 
Message-Id: <20190104145018.16950-1-peter.mayd...@linaro.org>
Reviewed-by: Thomas Huth 
Signed-off-by: Paolo Bonzini 
---
 tests/hexloader-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/hexloader-test.c b/tests/hexloader-test.c
index 834ed52c22..8b7aa2d72d 100644
--- a/tests/hexloader-test.c
+++ b/tests/hexloader-test.c
@@ -23,7 +23,7 @@ static void hex_loader_test(void)
 const unsigned int base_addr = 0x0001;
 
 QTestState *s = qtest_initf(
-"-M vexpress-a9 -nographic -device loader,file=tests/data/hex-loader/test.hex");
+"-M vexpress-a9 -device loader,file=tests/data/hex-loader/test.hex");
 
 for (i = 0; i < 256; ++i) {
 uint8_t val = qtest_readb(s, base_addr + i);
-- 
2.20.1

[Qemu-devel] [PATCH v3 3/5] virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.

2019-01-07 Thread Venu Busireddy

Add a query command to check the status of the FAILOVER_STANDBY_CHANGED
state of the virtio_net devices.

Signed-off-by: Venu Busireddy 
---
 hw/net/virtio-net.c| 16 +++
 include/hw/virtio/virtio-net.h |  1 +
 include/net/net.h  |  2 ++
 net/net.c  | 61 ++
 qapi/net.json  | 46 +++
 5 files changed, 126 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7b1bcde..a4e07ac 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -263,9 +263,11 @@ static void virtio_net_failover_notify_event(VirtIONet *n, uint8_t status)
  */
 if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
 (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))) {
+n->standby_enabled = true;
 qapi_event_send_failover_standby_changed(!!ncn, ncn, path, true);
 } else if ((!(status & VIRTIO_CONFIG_S_DRIVER_OK)) &&
 (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+n->standby_enabled = false;
 qapi_event_send_failover_standby_changed(!!ncn, ncn, path, false);
 }
 }
@@ -448,6 +450,19 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
 return info;
 }
 
+static StandbyStatusInfo *virtio_net_query_standby_status(NetClientState *nc)
+{
+StandbyStatusInfo *info;
+VirtIONet *n = qemu_get_nic_opaque(nc);
+
+info = g_malloc0(sizeof(*info));
+info->device = g_strdup(n->netclient_name);
+info->path = g_strdup(object_get_canonical_path(OBJECT(n->qdev)));
+info->enabled = n->standby_enabled;
+
+return info;
+}
+
 static void virtio_net_reset(VirtIODevice *vdev)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
@@ -1923,6 +1938,7 @@ static NetClientInfo net_virtio_info = {
 .receive = virtio_net_receive,
 .link_status_changed = virtio_net_set_link_status,
 .query_rx_filter = virtio_net_query_rxfilter,
+.query_standby_status = virtio_net_query_standby_status,
 };
 
 static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 4d7f3c8..9071e96 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -103,6 +103,7 @@ typedef struct VirtIONet {
 int announce_counter;
 bool needs_vnet_hdr_swap;
 bool mtu_bypass_backend;
+bool standby_enabled;
 } VirtIONet;
 
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
diff --git a/include/net/net.h b/include/net/net.h
index ec13702..61e8513 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -50,6 +50,7 @@ typedef void (NetCleanup) (NetClientState *);
 typedef void (LinkStatusChanged)(NetClientState *);
 typedef void (NetClientDestructor)(NetClientState *);
 typedef RxFilterInfo *(QueryRxFilter)(NetClientState *);
+typedef StandbyStatusInfo *(QueryStandbyStatus)(NetClientState *);
 typedef bool (HasUfo)(NetClientState *);
 typedef bool (HasVnetHdr)(NetClientState *);
 typedef bool (HasVnetHdrLen)(NetClientState *, int);
@@ -71,6 +72,7 @@ typedef struct NetClientInfo {
 NetCleanup *cleanup;
 LinkStatusChanged *link_status_changed;
 QueryRxFilter *query_rx_filter;
+QueryStandbyStatus *query_standby_status;
 NetPoll *poll;
 HasUfo *has_ufo;
 HasVnetHdr *has_vnet_hdr;
diff --git a/net/net.c b/net/net.c
index 1f7d626..fbf288e 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1320,6 +1320,67 @@ RxFilterInfoList *qmp_query_rx_filter(bool has_name, const char *name,
 return filter_list;
 }
 
+StandbyStatusInfoList *qmp_query_standby_status(bool has_device,
+const char *device,
+Error **errp)
+{
+NetClientState *nc;
+StandbyStatusInfoList *status_list = NULL, *last_entry = NULL;
+
+QTAILQ_FOREACH(nc, &net_clients, next) {
+StandbyStatusInfoList *entry;
+StandbyStatusInfo *info;
+
+if (has_device && strcmp(nc->name, device) != 0) {
+continue;
+}
+
+/* only query standby status information of NIC */
+if (nc->info->type != NET_CLIENT_DRIVER_NIC) {
+if (has_device) {
+error_setg(errp, "net client(%s) isn't a NIC", device);
+return NULL;
+}
+continue;
+}
+
+/*
+ * only query information on queue 0 since the info is per nic,
+ * not per queue.
+ */
+if (nc->queue_index != 0) {
+continue;
+}
+
+if (nc->info->query_standby_status) {
+info = nc->info->query_standby_status(nc);
+entry = g_malloc0(sizeof(*entry));
+entry->value = info;
+
+if (!status_list) {
+status_list = entry;
+} else {
+last_entry->next = entry;
+}
+

[Qemu-devel] [PULL 23/35] test: replace gtester with a TAP driver

2019-01-07 Thread Paolo Bonzini

gtester is deprecated by upstream glib (see for example the announcement
at https://blog.gtk.org/2018/07/11/news-from-glib-2-58/) and it does
not support tests that call g_test_skip in some glib stable releases.

glib suggests instead using Automake's TAP support, which gtest itself
supports since version 2.38 (QEMU's minimum requirement is 2.40).
We do not support Automake, but we can use Automake's code to beautify
the TAP output.  I chose to use the Perl copy rather than the shell/awk
one, with some changes so that it can accept TAP through stdin, in order
to reuse Perl's TAP parsing package.  This also avoids duplicating the
parser between tap-driver.pl and tap-merge.pl.
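
For readers unfamiliar with TAP, the kind of line-by-line classification a driver performs can be sketched as follows. This is a simplified illustration in C, not the Perl TAP::Parser logic the patch relies on: DemoTapKind and demo_tap_classify are hypothetical names, and directives other than SKIP (for example TODO) are ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef enum DemoTapKind {
    DEMO_TAP_PLAN,      /* "1..N" */
    DEMO_TAP_PASS,      /* "ok ..." */
    DEMO_TAP_FAIL,      /* "not ok ..." */
    DEMO_TAP_SKIP,      /* "ok ... # SKIP reason" */
    DEMO_TAP_DIAG,      /* "# comment" */
    DEMO_TAP_OTHER      /* anything else */
} DemoTapKind;

/* Case-insensitive check that p starts with "SKIP". */
static int demo_has_skip(const char *p)
{
    const char *word = "SKIP";
    int i;

    for (i = 0; i < 4; i++) {
        if (toupper((unsigned char)p[i]) != word[i]) {
            return 0;   /* also stops at the terminating NUL */
        }
    }
    return 1;
}

/* Classify one line of TAP output (without its trailing newline). */
DemoTapKind demo_tap_classify(const char *line)
{
    assert(line != NULL);

    if (line[0] == '#') {
        return DEMO_TAP_DIAG;
    }
    if (strncmp(line, "1..", 3) == 0 && isdigit((unsigned char)line[3])) {
        return DEMO_TAP_PLAN;
    }
    if (strncmp(line, "not ok", 6) == 0 &&
        (line[6] == '\0' || line[6] == ' ')) {
        return DEMO_TAP_FAIL;
    }
    if (strncmp(line, "ok", 2) == 0 &&
        (line[2] == '\0' || line[2] == ' ')) {
        const char *hash = strchr(line, '#');
        if (hash != NULL) {
            hash++;
            while (*hash == ' ') {
                hash++;
            }
            if (demo_has_skip(hash)) {
                return DEMO_TAP_SKIP;
            }
        }
        return DEMO_TAP_PASS;
    }
    return DEMO_TAP_OTHER;
}
```

Recognizing the SKIP directive is what lets a TAP driver report tests that call g_test_skip as skipped rather than as passes or failures.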

Signed-off-by: Paolo Bonzini 
Message-Id: <1543513531-1151-3-git-send-email-pbonz...@redhat.com>
Reviewed-by: Eric Blake 
Signed-off-by: Paolo Bonzini 
---
 rules.mak|   4 +-
 scripts/gtester-cat  |  26 --
 scripts/tap-driver.pl| 379 +++
 scripts/tap-merge.pl | 111 ++
 tests/Makefile.include   |  75 ++--
 tests/docker/dockerfiles/centos7.docker  |   1 +
 tests/docker/dockerfiles/debian-amd64.docker |   1 +
 tests/docker/dockerfiles/debian-ports.docker |   1 +
 tests/docker/dockerfiles/debian-sid.docker   |   1 +
 tests/docker/dockerfiles/debian8.docker  |   1 +
 tests/docker/dockerfiles/debian9.docker  |   1 +
 tests/docker/dockerfiles/fedora.docker   |   1 +
 tests/docker/dockerfiles/ubuntu.docker   |   1 +
 13 files changed, 551 insertions(+), 52 deletions(-)
 delete mode 100755 scripts/gtester-cat
 create mode 100755 scripts/tap-driver.pl
 create mode 100755 scripts/tap-merge.pl

diff --git a/rules.mak b/rules.mak
index bbb2667928..86e033d815 100644
--- a/rules.mak
+++ b/rules.mak
@@ -132,7 +132,9 @@ modules:
 #  otherwise print the 'quiet' output in the format "  NAME args to print"
 # NAME should be a short name of the command, 7 letters or fewer.
 # If called with only a single argument, will print nothing in quiet mode.
-quiet-command = $(if $(V),$1,$(if $(2),@printf "  %-7s %s\n" $2 $3 && $1, @$1))
+quiet-command-run = $(if $(V),,$(if $2,printf "  %-7s %s\n" $2 $3 && ))$1
+quiet-@ = $(if $(V),,@)
+quiet-command = $(quiet-@)$(call quiet-command-run,$1,$2,$3)
 
 # cc-option
 # Usage: CFLAGS+=$(call cc-option, -falign-functions=0, -malign-functions=0)
diff --git a/scripts/gtester-cat b/scripts/gtester-cat
deleted file mode 100755
index 061a952cad..00
--- a/scripts/gtester-cat
+++ /dev/null
@@ -1,26 +0,0 @@
-#!/bin/sh
-#
-# Copyright IBM, Corp. 2012
-#
-# Authors:
-#  Anthony Liguori 
-#
-# This work is licensed under the terms of the GNU GPLv2 or later.
-# See the COPYING file in the top-level directory.
-
-cat <
-
- 
-  qemu
-  0.0
-  rev
- 
-EOF
-
-sed \
-  -e '/$/d' \
-  -e '//,/<\/info>/d' \
-  -e '$b' \
-  -e '/^<\/gtester>$/d' "$@"
diff --git a/scripts/tap-driver.pl b/scripts/tap-driver.pl
new file mode 100755
index 00..6621a5cd67
--- /dev/null
+++ b/scripts/tap-driver.pl
@@ -0,0 +1,379 @@
+#! /usr/bin/env perl
+# Copyright (C) 2011-2013 Free Software Foundation, Inc.
+# Copyright (C) 2018 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2, or (at your option)
+# any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+# As a special exception to the GNU General Public License, if you
+# distribute this file as part of a program that contains a
+# configuration script generated by Autoconf, you may include it under
+# the same distribution terms that you use for the rest of that program.
+
+# ---------------------------------- #
+#  Imports, static data, and setup.  #
+# ---------------------------------- #
+
+use warnings FATAL => 'all';
+use strict;
+use Getopt::Long ();
+use TAP::Parser;
+use Term::ANSIColor qw(:constants);
+
+my $ME = "tap-driver.pl";
+my $VERSION = "2018-11-30";
+
+my $USAGE = <<'END';
+Usage:
+  tap-driver [--test-name=TEST] [--color={always|never|auto}]
+ [--verbose] [--show-failures-only]
+END
+
+my $HELP = "$ME: TAP-aware test driver for QEMU testsuite harness." .
+   "\n" . $USAGE;
+
+# It's important that NO_PLAN evaluates "false" as a boolean.
+use constant NO_PLAN => 0;
+use constant EARLY_PLAN => 1;
+use constant LATE_PLAN => 2;
+
+use constant DIAG_STRING => "#";
+
+# ------------------- #
+#  Global variables.  #
+# ------------------- #
+
+my $testno = 0; # Number of test

[Qemu-devel] [PATCH v3 5/5] pci: query command extension to check the bus master enabling status of the failover-primary device

2019-01-07 Thread Venu Busireddy

From: Si-Wei Liu 

Signed-off-by: Si-Wei Liu 
Signed-off-by: Venu Busireddy 
---
 hmp.c  | 5 +
 hw/pci/pci.c   | 5 +
 qapi/misc.json | 5 -
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/hmp.c b/hmp.c
index 7828f93..7a75c93 100644
--- a/hmp.c
+++ b/hmp.c
@@ -890,6 +890,11 @@ static void hmp_info_pci_device(Monitor *mon, const PciDeviceInfo *dev)
 }
 }
 
+if (dev->has_failover_status) {
+monitor_printf(mon, "  Failover primary, bus master %s.\n",
+   dev->failover_status ? "enabled" : "disabled");
+}
+
 monitor_printf(mon, "  id \"%s\"\n", dev->qdev_id);
 
 if (dev->has_pci_bridge) {
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 56b13b3..9da49fd 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1761,6 +1761,11 @@ static PciDeviceInfo *qmp_query_pci_device(PCIDevice *dev, PCIBus *bus,
 pci_get_word(dev->config + PCI_CB_SUBSYSTEM_VENDOR_ID);
 }
 
+if (dev->failover_primary) {
+info->has_failover_status = true;
+info->failover_status = dev->bus_master_enable_region.enabled;
+}
+
 return info;
 }
 
diff --git a/qapi/misc.json b/qapi/misc.json
index 6c1c5c0..05f003e 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -865,6 +865,9 @@
 #
 # @regions: a list of the PCI I/O regions associated with the device
 #
+# @failover_status: if 'failover-primary' property is 'true', true if PCI
+#   bus master bit on the device is enabled
+#
 # Notes: the contents of @class_info.desc are not stable and should only be
 #treated as informational.
 #
@@ -874,7 +877,7 @@
   'data': {'bus': 'int', 'slot': 'int', 'function': 'int',
'class_info': 'PciDeviceClass', 'id': 'PciDeviceId',
'*irq': 'int', 'qdev_id': 'str', '*pci_bridge': 'PciBridgeInfo',
-   'regions': ['PciMemoryRegion']} }
+   'regions': ['PciMemoryRegion'], '*failover_status': 'bool'} }
 
 ##
 # @PciInfo:

[Qemu-devel] [PULL v4 00/35] Misc patches for 2018-12-18

2019-01-07 Thread Paolo Bonzini

The following changes since commit 9b2e891ec5ccdb4a7d583b77988848282606fdea:

  Merge remote-tracking branch 'remotes/marcel/tags/rdma-pull-request' into staging (2018-12-22 11:25:31 +)

are available in the Git repository at:

  git://github.com/bonzini/qemu.git tags/for-upstream

for you to fetch changes up to a486bd29567a23a454e248aed1d8c28b90a6ea3e:

  avoid TABs in files that only contain a few (2019-01-04 20:59:49 +0100)


* HAX support for Linux hosts (Alejandro)
* esp bugfixes (Guenter)
* Windows build cleanup (Marc-André)
* checkpatch logic improvements (Paolo)
* coalesced range bugfix (Paolo)
* switch testsuite to TAP (Paolo, with Peter's fix to hexloader-test)
* QTAILQ rewrite (Paolo)
* block/iscsi.c cancellation fixes (Stefan)
* improve selection of the default accelerator (Thomas)


v3->v4: fix write error due to O_NONBLOCK for good

Alexandro Sanchez Bach (1):
  hax: Support for Linux hosts

Guenter Roeck (2):
  esp-pci: Fix status register write erase control
  scsi: esp: Defer command completion until previous interrupts have been handled

Marc-André Lureau (4):
  build-sys: don't include windows.h, osdep.h does it
  build-sys: move windows defines in osdep.h header
  build-sys: build with Vista API by default
  qga: drop < Vista compatibility

Paolo Bonzini (21):
  checkpatch: fix premature exit when no input or --mailback
  checkpatch: check Signed-off-by in --mailback mode
  checkpatch: improve handling of multiple patches or files
  checkpatch: colorize output to terminal
  pam: wrap MemoryRegion initialization in a transaction
  memory: extract flat_range_coalesced_io_{del,add}
  memory: avoid unnecessary coalesced_io_del operations
  memory: update coalesced_range on transaction_commit
  test: execute g_test_run when tests are skipped
  test: replace gtester with a TAP driver
  qemu/queue.h: do not access tqe_prev directly
  vfio: make vfio_address_spaces static
  qemu/queue.h: leave head structs anonymous unless necessary
  qemu/queue.h: typedef QTAILQ heads
  qemu/queue.h: remove Q_TAILQ_{HEAD,ENTRY}
  qemu/queue.h: reimplement QTAILQ without pointer-to-pointers
  qemu/queue.h: simplify reverse access to QTAILQ
  checkpatch: warn about qemu/queue.h head structs that are not typedef-ed
  scripts: add script to convert multiline comments into 4-line format
  remove space-tab sequences
  avoid TABs in files that only contain a few

Peng Hao (1):
  hw/watchdog/wdt_i6300esb: remove a unnecessary comment

Peter Maydell (1):
  tests/hexloader-test: Don't pass -nographic to the QEMU under test

Stefan Hajnoczi (4):
  block/iscsi: drop unused IscsiAIOCB->buf field
  block/iscsi: take iscsilun->mutex in iscsi_timed_check_events()
  block/iscsi: fix ioctl cancel use-after-free
  block/iscsi: cancel libiscsi task when ABORT TASK TMF completes

Thomas Huth (1):
  accel: Improve selection of the default accelerator

 accel/accel.c|  18 +-
 accel/kvm/kvm-all.c  |   4 +-
 accel/tcg/translate-all.c|   4 -
 block/bochs.c|  22 +-
 block/file-posix.c   |   2 +-
 block/file-win32.c   |   8 +-
 block/gluster.c  |   2 +-
 block/iscsi.c|  47 +++-
 block/linux-aio.c|   4 +-
 block/mirror.c   |   2 +-
 block/qcow2-bitmap.c |   4 +-
 block/qcow2-cluster.c|   2 +-
 block/qcow2.h|   5 +-
 block/sheepdog.c |   6 +-
 block/vhdx.h |   2 +-
 block/vpc.c  |   2 +-
 blockdev.c   |   4 +-
 bsd-user/elfload.c   |   2 +-
 bsd-user/x86_64/target_syscall.h |   2 +-
 configure|   3 -
 contrib/elf2dmp/main.c   |   2 +-
 contrib/ivshmem-client/ivshmem-client.h  |   4 +-
 contrib/ivshmem-server/ivshmem-server.h  |   5 +-
 cpus-common.c|   2 +-
 crypto/aes.c |  28 +-
 disas/alpha.c|   8 +-
 disas/arm.c  |   2 +-
 disas/i386.c |   4 +-
 disas/m68k.c |   4 +-
 dump.c   |   2 +-
 exec.c   |   5 +-
 fsdev/qemu-fsdev.c   |   2 +-
 hw/alpha/typhoon.c   |  12 +-
 hw/arm/stellaris.c

[Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover

2019-01-07 Thread Venu Busireddy

From: Si-Wei Liu 

When a VF is hotplugged into the guest, datapath switching is performed
immediately. That timing is sub-optimal and can cause substantial
network downtime. One way to shorten this downtime is to switch the
datapath only after the guest has enabled the VF, as indicated by the
bus master bit in the VF's PCI config space being set. The
FAILOVER_PRIMARY_CHANGED event is emitted at that time to indicate this
condition. The management stack can then kick off datapath switching
upon receiving the event.
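
The edge detection described above can be sketched in isolation as
follows. This is only an illustration and not the patch's code:
ToyDevice and the toy_* helpers are hypothetical stand-ins for QEMU's
PCIDevice and config-space accessors. The register offset (0x04) and
the bus master bit (0x0004) are the standard PCI command register
values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard PCI values: command register offset and bus master bit. */
#define CMD_REG_OFFSET 0x04u
#define CMD_BUS_MASTER 0x0004u

/* Hypothetical stand-in for a device's emulated config space. */
typedef struct {
    uint8_t config[256];
    int notifications;   /* number of "events" sent, for illustration */
    bool last_status;    /* status carried by the most recent event */
} ToyDevice;

/* Read a little-endian 16-bit word from config space. */
static uint16_t toy_get_word(const ToyDevice *d, unsigned off)
{
    assert(off + 1 < sizeof(d->config));
    return (uint16_t)(d->config[off] | (d->config[off + 1] << 8));
}

static bool toy_bus_master(const ToyDevice *d)
{
    return (toy_get_word(d, CMD_REG_OFFSET) & CMD_BUS_MASTER) != 0;
}

/* Write the command register and notify only on a real transition. */
static void toy_write_command(ToyDevice *d, uint16_t val)
{
    bool was = toy_bus_master(d);

    d->config[CMD_REG_OFFSET] = (uint8_t)(val & 0xff);
    d->config[CMD_REG_OFFSET + 1] = (uint8_t)(val >> 8);

    bool now = toy_bus_master(d);
    if (was != now) {
        d->notifications++;
        d->last_status = now;
    }
}
```

Sampling the bit before and after the write means that rewriting the
same value produces no event, so a listener sees one event per real
enable or disable.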

Signed-off-by: Si-Wei Liu 
Signed-off-by: Venu Busireddy 
---
 hw/vfio/pci.c | 57 +
 qapi/net.json | 26 ++
 2 files changed, 83 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index bd83b58..adcc95a 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -34,6 +34,7 @@
 #include "pci.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "qapi/qapi-events-net.h"
 
 #define MSIX_CAP_LENGTH 12
 
@@ -42,6 +43,7 @@
 
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
+static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
 
 /*
  * Disabling BAR mmaping can be slow, but toggling it around INTx can
@@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
 {
 VFIOPCIDevice *vdev = PCI_VFIO(pdev);
 uint32_t val_le = cpu_to_le32(val);
+bool may_notify = false;
+bool master_was = false;
 
 trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
 
@@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
  __func__, vdev->vbasedev.name, addr, val, len);
 }
 
+/* Bus Master Enabling/Disabling */
+if (pdev->failover_primary && current_cpu &&
+range_covers_byte(addr, len, PCI_COMMAND)) {
+master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
+PCI_COMMAND_MASTER);
+may_notify = true;
+}
+
 /* MSI/MSI-X Enabling/Disabling */
 if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
 ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
@@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
 /* Write everything to QEMU to keep emulated bits correct */
 pci_default_write_config(pdev, addr, val, len);
 }
+
+if (may_notify) {
+bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
+ PCI_COMMAND_MASTER);
+if (master_was != master_now) {
+vfio_failover_notify(vdev, master_now);
+}
+}
 }
 
 /*
@@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
 vdev->req_enabled = false;
 }
 
+static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
+{
+PCIDevice *pdev = &vdev->pdev;
+const char *n;
+gchar *path;
+
+n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
+path = object_get_canonical_path(OBJECT(vdev));
+qapi_event_send_failover_primary_changed(!!n, n, path, status);
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
 VFIOPCIDevice *vdev = PCI_VFIO(pdev);
@@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
 vfio_put_group(group);
 }
 
+static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
+{
+PCIDevice *pdev = &vdev->pdev;
+
+/*
+ * Guest driver may not get the chance to disable bus mastering
+ * before the device object gets to be unrealized. In that event,
+ * send out a "disabled" notification on behalf of guest driver.
+ */
+if (pdev->failover_primary &&
+pdev->bus_master_enable_region.enabled) {
+vfio_failover_notify(vdev, false);
+}
+}
+
 static void vfio_exitfn(PCIDevice *pdev)
 {
 VFIOPCIDevice *vdev = PCI_VFIO(pdev);
 
+/*
+ * During the guest reboot sequence, it is sometimes possible that
+ * the guest may not get sufficient time to complete the entire driver
+ * removal sequence, near the end of which a PCI config space write to
+ * disable bus mastering can be intercepted by device. In such cases,
+ * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
+ * is imperative to generate the event on the guest's behalf if the
+ * guest fails to make it.
+ */
+vfio_exit_failover_notify(vdev);
+
 vfio_unregister_req_notifier(vdev);
 vfio_unregister_err_notifier(vdev);
 pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
diff --git a/qapi/net.json b/qapi/net.json
index 633ac87..a5b8d70 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -757,3 +757,29 @@
 ##
 { 'command': 'query-standby-status', 'data': { '*device': 'str' },
   'returns': ['StandbyStatusInfo'] }
+
+##
+# @FAILOVER_PRIMARY_CHANGED:
+#
+# Emitted whenever the driver of failover primary is loaded or unloaded
+# by the guest.
+#
+# @device: d

[Qemu-devel] [PATCH v3 2/5] virtio_net: Add support for "Data Path Switching" during Live Migration.

2019-01-07 Thread Venu Busireddy

Add a new event, FAILOVER_STANDBY_CHANGED, which is emitted whenever
the status of the virtio_net driver in the guest changes (either the
guest successfully loads the driver after the F_STANDBY feature bit
is negotiated, or the guest unloads the driver or reboots). The
management stack can use this event to determine when to plug the VF
device into the guest or unplug it from the guest.

Also, the Virtual Functions will be automatically removed from the guest
if the guest is rebooted. To properly identify the VFIO devices that
must be removed, a new property named "failover-primary" is added to
the vfio-pci devices. Only the vfio-pci devices that have this property
enabled are removed from the guest upon reboot.
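
As a rough illustration of the reboot-time cleanup, the sketch below
builds an eject bitmap with one bit per slot, set only for devices
flagged as failover primaries. ToySlotDevice and eject_mask() are
hypothetical names. The real patch walks the hotplug bus children and
sets bits in acpi_pcihp_pci_status[bsel].down instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a device sitting on a PCI bus. */
typedef struct {
    unsigned slot;          /* PCI slot number on the bus, 0..31 */
    bool failover_primary;  /* mirrors the "failover-primary" property */
} ToySlotDevice;

/* Return a bitmap with one bit set per slot that should be ejected. */
static uint32_t eject_mask(const ToySlotDevice *devs, size_t n)
{
    uint32_t down = 0;

    for (size_t i = 0; i < n; i++) {
        assert(devs[i].slot < 32);
        if (devs[i].failover_primary) {
            down |= UINT32_C(1) << devs[i].slot;
        }
    }
    return down;
}
```

Devices without the property never contribute a bit, which matches the
statement that only flagged vfio-pci devices are removed on reboot.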

Signed-off-by: Venu Busireddy 
---
 hw/acpi/pcihp.c  | 27 +++
 hw/net/virtio-net.c  | 24 
 hw/vfio/pci.c|  3 +++
 hw/vfio/pci.h|  1 +
 include/hw/pci/pci.h |  1 +
 qapi/net.json| 28 
 6 files changed, 84 insertions(+)

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index 80d42e1..2a3ffd3 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -176,6 +176,25 @@ static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slo
 }
 }
 
+static void acpi_pcihp_cleanup_failover_primary(AcpiPciHpState *s, int bsel)
+{
+BusChild *kid, *next;
+PCIBus *bus = acpi_pcihp_find_hotplug_bus(s, bsel);
+
+if (!bus) {
+return;
+}
+QTAILQ_FOREACH_SAFE(kid, &bus->qbus.children, sibling, next) {
+DeviceState *qdev = kid->child;
+PCIDevice *pdev = PCI_DEVICE(qdev);
+int slot = PCI_SLOT(pdev->devfn);
+
+if (pdev->failover_primary) {
+s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
+}
+}
+}
+
 static void acpi_pcihp_update_hotplug_bus(AcpiPciHpState *s, int bsel)
 {
 BusChild *kid, *next;
@@ -207,6 +226,14 @@ static void acpi_pcihp_update(AcpiPciHpState *s)
 int i;
 
 for (i = 0; i < ACPI_PCIHP_MAX_HOTPLUG_BUS; ++i) {
+/*
+ * Set the acpi_pcihp_pci_status[].down bits of all the
+ * failover_primary devices so that the devices are ejected
+ * from the guest. We can't use the qdev_unplug() as well as the
+ * hotplug_handler to unplug the devices, because the guest may
+ * not be in a state to cooperate.
+ */
+acpi_pcihp_cleanup_failover_primary(s, i);
 acpi_pcihp_update_hotplug_bus(s, i);
 }
 }
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 411f8fb..7b1bcde 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -248,6 +248,29 @@ static void virtio_net_drop_tx_queue_data(VirtIODevice *vdev, VirtQueue *vq)
 }
 }
 
+static void virtio_net_failover_notify_event(VirtIONet *n, uint8_t status)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(n);
+
+if (virtio_has_feature(vdev->guest_features, VIRTIO_NET_F_STANDBY)) {
+const char *ncn = n->netclient_name;
+gchar *path = object_get_canonical_path(OBJECT(n->qdev));
+/*
+ * Emit FAILOVER_STANDBY_CHANGED event with enabled=true
+ *   when the status transitions from 0 to VIRTIO_CONFIG_S_DRIVER_OK
+ * Emit FAILOVER_STANDBY_CHANGED event with enabled=false
+ *   when the status transitions from VIRTIO_CONFIG_S_DRIVER_OK to 0
+ */
+if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+(!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))) {
+qapi_event_send_failover_standby_changed(!!ncn, ncn, path, true);
+} else if ((!(status & VIRTIO_CONFIG_S_DRIVER_OK)) &&
+(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+qapi_event_send_failover_standby_changed(!!ncn, ncn, path, false);
+}
+}
+}
+
 static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
@@ -256,6 +279,7 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
 uint8_t queue_status;
 
 virtio_net_vnet_endian_status(n, status);
+virtio_net_failover_notify_event(n, status);
 virtio_net_vhost_status(n, status);
 
 for (i = 0; i < n->max_queues; i++) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5c7bd96..bd83b58 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3077,6 +3077,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 vfio_register_err_notifier(vdev);
 vfio_register_req_notifier(vdev);
 vfio_setup_resetfn_quirk(vdev);
+pdev->failover_primary = vdev->failover_primary;
 
 return;
 
@@ -3219,6 +3220,8 @@ static Property vfio_pci_dev_properties[] = {
qdev_prop_nv_gpudirect_clique, uint8_t),
 DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
 OFF_AUTOPCIBAR_OFF),
+DEFINE_PROP_BOOL("failover-primary", VFIOPCIDevice, failover_primary,
+ false),
 /*

[Qemu-devel] [PATCH v3 1/5] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.

2019-01-07 Thread Venu Busireddy

From: Sridhar Samudrala 

This feature bit can be used by a hypervisor to indicate to the virtio_net
device that it can act as a standby for another device with the same MAC
address.

Signed-off-by: Sridhar Samudrala 
Signed-off-by: Venu Busireddy 
---
 hw/net/virtio-net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 385b1a0..411f8fb 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
  true),
 DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
 DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
+DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
+  false),
 DEFINE_PROP_END_OF_LIST(),
 };

[Qemu-devel] [PATCH v3 0/5] Support for datapath switching during live migration

2019-01-07 Thread Venu Busireddy

Implement the infrastructure to support datapath switching during live
migration involving SR-IOV devices.

1. This series is based on the current VIRTIO_NET_F_STANDBY feature
   bit and MAC address device pairing.

2. This set of events will be consumed by userspace management software
   to orchestrate all the hot plug and datapath switching activities.
   This scheme requires the fewest QEMU modifications while allowing userspace
   software to build its own intelligence to control the whole process
   of SR-IOV live migration.

3. While the hidden device model (viz. coupled device model) is still
   being explored for automatic hot plugging (QEMU) and automatic datapath
   switching (host-kernel), this series provides a supplemental set
   of interfaces if management software wants to drive the SR-IOV live
   migration on its own. It should not conflict with the hidden device
   model but just offers simplicity of implementation.


Si-Wei Liu (2):
  vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
  pci: query command extension to check the bus master enabling status of the failover-primary device

Sridhar Samudrala (1):
  virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.

Venu Busireddy (2):
  virtio_net: Add support for "Data Path Switching" during Live Migration.
  virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.

---
Changes in v3:
  Fix issues with coding style in patch 3/5.

Changes in v2:
  Added a query command for FAILOVER_STANDBY_CHANGED event.
  Added a query command for FAILOVER_PRIMARY_CHANGED event.

 hmp.c  |   5 +++
 hw/acpi/pcihp.c|  27 +++
 hw/net/virtio-net.c|  42 +
 hw/pci/pci.c   |   5 +++
 hw/vfio/pci.c  |  60 +
 hw/vfio/pci.h  |   1 +
 include/hw/pci/pci.h   |   1 +
 include/hw/virtio/virtio-net.h |   1 +
 include/net/net.h  |   2 +
 net/net.c  |  61 +
 qapi/misc.json |   5 ++-
 qapi/net.json  | 100 +
 12 files changed, 309 insertions(+), 1 deletion(-)

Re: [Qemu-devel] [PATCH v2 19/27] target/arm: Export aa64_va_parameters to internals.h

2019-01-07 Thread Richard Henderson

On 1/7/19 9:45 PM, Peter Maydell wrote:
> I assume 48 bits is what the kernel sets up for userspace ?

Yep.  There's been some discussion about what to do with 52-bit addressing, but
it hasn't landed upstream yet.


r~

Re: [Qemu-devel] [PULL 00/37] target-arm queue

2019-01-07 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20190107163117.16269-1-peter.mayd...@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PULL 00/37] target-arm queue
Message-id: 20190107163117.16269-1-peter.mayd...@linaro.org
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
b2409c8 Support u-boot noload images for arm as used by, NetBSD/evbarm GENERIC kernel.
a212ed8 hw/misc/tz-mpc: Fix value of BLK_MAX register
d3d8681 target/arm: Emit barriers for A32/T32 load-acquire/store-release insns
95dc9ac arm: Add Clock peripheral stub to NRF51 SOC
814a757 tests/microbit-test: Add Tests for nRF51 Timer
31d3967 arm: Instantiate NRF51 Timers
a07deaf hw/timer/nrf51_timer: Add nRF51 Timer peripheral
1659930 tests/microbit-test: Add Tests for nRF51 GPIO
a0f346b arm: Instantiate NRF51 general purpose I/O
51dadd5 hw/gpio/nrf51_gpio: Add nRF51 GPIO peripheral
f5e930f arm: Instantiate NRF51 random number generator
10a0ea4 hw/misc/nrf51_rng: Add NRF51 random number generator peripheral
27b50a5 arm: Add header to host common definition for nRF51 SOC peripherals
328b5be qtest: Add set_irq_in command to set IRQ/GPIO level
04a8786 hw/arm/allwinner-a10: Add the 'A' SRAM and the SRAM controller
9ed39b1 cpus.c: Fix race condition in cpu_stop_current()
5f802b8 MAINTAINERS: Add ARM-related files for hw/[misc|input|timer]/
9731ced hw/arm: versal: Plug memory leaks
d60bc50 Revert "armv7m: Guard against no -kernel argument"
5ad714f arm/xlnx-zynqmp: put APUs and RPUs in separate CPU clusters
1df1cea gdbstub: add multiprocess extension support
531013e gdbstub: gdb_set_stop_cpu: ignore request when process is not attached
fbbf155 gdbstub: processes initialization on new peer connection
6bfe258 gdbstub: add support for vAttach packets
6deda56 gdbstub: add support for extended mode packet
b680a90 gdbstub: add multiprocess support to 'D' packets
f57d087 gdbstub: add multiprocess support to gdb_vm_state_change()
e9badd6 gdbstub: add multiprocess support to Xfer:features:read:
34b351b gdbstub: add multiprocess support to (f|s)ThreadInfo and ThreadExtraInfo
c1bf225 gdbstub: add multiprocess support to 'sC' packets
a5c8ccd gdbstub: add multiprocess support to vCont packets
f90eb5e gdbstub: add multiprocess support to 'H' and 'T' packets
01bf3a9 gdbstub: add multiprocess support to '?' packets
182cc17 gdbstub: introduce GDB processes
e54216e hw/cpu: introduce CPU clusters
029c2eb target/arm: SVE brk[ab] merging does not have s bit
cdbc6a8 target/arm: Convert ARM_TBFLAG_* to FIELDs

=== OUTPUT BEGIN ===
Checking PATCH 1/37: target/arm: Convert ARM_TBFLAG_* to FIELDs...
Checking PATCH 2/37: target/arm: SVE brk[ab] merging does not have s bit...
Checking PATCH 3/37: hw/cpu: introduce CPU clusters...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#50: 
new file mode 100644

WARNING: Block comments use a leading /* on a separate line
#153: FILE: include/hw/cpu/cluster.h:43:
+/**

total: 0 errors, 2 warnings, 122 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 4/37: gdbstub: introduce GDB processes...
Checking PATCH 5/37: gdbstub: add multiprocess support to '?' packets...
Checking PATCH 6/37: gdbstub: add multiprocess support to 'H' and 'T' packets...
Checking PATCH 7/37: gdbstub: add multiprocess support to vCont packets...
Checking PATCH 8/37: gdbstub: add multiprocess support to 'sC' packets...
Checking PATCH 9/37: gdbstub: add multiprocess support to (f|s)ThreadInfo and ThreadExtraInfo...
Checking PATCH 10/37: gdbstub: add multiprocess support to Xfer:features:read:...
Checking PATCH 11/37: gdbstub: add multiprocess support to gdb_vm_state_change()...
Checking PATCH 12/37: gdbstub: add multiprocess support to 'D' packets...
Checking PATCH 13/37: gdbstub: add support for extended mode packet...
Checking PATCH 14/37: gdbstub: add support for vAttach packets...
Checking PATCH 15/37: gdbstub: processes initialization on new peer connection...
Checking PATCH 16/37: gdbstub: gdb_set_stop_cpu: ignore request when process is not attached...
Checking PATCH 17/37: gdbstub: add multiprocess extension support...
Checking PATCH 18/37: arm/xlnx-zynqmp: put APUs and RPUs in separate CPU clusters...
Checking PATCH 19/37: Revert "armv7m: Guar

Re: [Qemu-devel] [Bug 1810545] Re: [alpha] Strange exception address reported

2019-01-07 Thread Richard Henderson

On 1/8/19 5:00 AM, Peter Maydell wrote:
> On Mon, 7 Jan 2019 at 18:10, Peter Maydell  wrote:
> (re: https://bugs.launchpad.net/bugs/1810545)
> 
>> The problem seems to be that the PC we report for an OPCDEC
>> is first selected by gen_invalid()/gen_excp() in
>> target/alpha/translate.c, which uses pc_next (ie the insn's
>> address plus 4). But that is then handed through to our custom
>> PALcode
>> (https://git.qemu.org/?p=qemu-palcode.git;a=blob;f=pal.S;h=1781c4b415700ca3a68af07fdae90ae43e722501;hb=HEAD)
>> which does
>>   addq    p6, 4, p1  // increment past the faulting insn
>> resulting in insn + 8.
>>
>> That is, the palcode and the QEMU code have a disagreement about what
>> the (private) API between them is. I'm not sure which side is wrong and
>> should be corrected. I think the linux-user code assumes the same thing
>> that translate.c is doing, so perhaps the palcode.
> 
> Richard -- any suggestions for which side of this API we should
> be changing?

Probably the palcode side.  I'll take care of it.
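
For readers following along, the double increment can be written out as
plain arithmetic. The function names below are hypothetical, and
palcode_without_increment() is only one way the PALcode side could be
changed. The actual fix is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define INSN_SIZE 4u  /* Alpha instructions are 4 bytes wide */

/* What the translator hands over: the address of the next insn. */
static uint64_t translator_reported_pc(uint64_t insn_addr)
{
    assert(insn_addr % INSN_SIZE == 0);
    return insn_addr + INSN_SIZE;
}

/* PALcode as described above: it increments a second time. */
static uint64_t palcode_with_increment(uint64_t reported_pc)
{
    return reported_pc + INSN_SIZE;
}

/* Hypothetical alternative: pass the reported value through unchanged. */
static uint64_t palcode_without_increment(uint64_t reported_pc)
{
    return reported_pc;
}
```

With a faulting instruction at 0x1000, the first path yields 0x1008
(insn + 8) and the second yields 0x1004 (insn + 4).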


r~

Re: [Qemu-devel] [PATCH 4/8] ppc4xx: Use ram_addr_t in ppc4xx_sdram_adjust()

2019-01-07 Thread BALATON Zoltan


On Fri, 4 Jan 2019, David Gibson wrote:
> On Thu, Jan 03, 2019 at 03:03:20PM +0100, BALATON Zoltan wrote:
>> On Wed, 2 Jan 2019, David Gibson wrote:
>>> On Wed, Jan 02, 2019 at 03:06:38AM +0100, BALATON Zoltan wrote:
>>>> To avoid overflow if larger values are added later use ram_addr_t for
>>>> the sdram_bank_sizes parameter to match ram_size to which it is
>>>> compared.
>>>
>>> So, technically I think these should be 'hwaddr' (which represents a
>>> guest physical address) rather than ram_addr_t which
>>> represents... something subtly different I've never properly
>>> understood.
>>
>> I don't understand the difference either but ram_size in MachineState where
>> this value comes from is ram_addr_t now so I've left it for now. If someone
>> knows which type this should be, they can change it in another patch
>> later.
>
> Ok, fair enough.

Then will you take v3 of this series or is there anything else that should
be corrected?


Regards,
BALATON Zoltan
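
To make the overflow concern concrete, here is a small sketch. It
assumes a 64-bit ram_addr_t (modelled by the hypothetical
toy_ram_addr_t), and pick_bank() is a simplified stand-in, not the
actual ppc4xx_sdram_adjust() logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_ram_addr_t;  /* stand-in; assumes 64-bit ram_addr_t */

/*
 * Return the largest allowed bank size that still fits in size_left.
 * The list is sorted in descending order and terminated by 0.
 */
static toy_ram_addr_t pick_bank(toy_ram_addr_t size_left,
                                const toy_ram_addr_t *sizes)
{
    assert(sizes != NULL);
    for (size_t i = 0; sizes[i] != 0; i++) {
        if (sizes[i] <= size_left) {
            return sizes[i];
        }
    }
    return 0;
}

/* What would happen to a value stored in a 32-bit unsigned type. */
static uint32_t truncate_to_32(toy_ram_addr_t v)
{
    return (uint32_t)v;
}
```

A 4 GiB bank size is representable in the 64-bit type but truncates to
0 in a 32-bit one, which is the kind of silent wrap the type change is
meant to avoid.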

Re: [Qemu-devel] [PATCH v2 0/5] Support for datapath switching during live migration

2019-01-07 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/1546883690-17798-1-git-send-email-venu.busire...@oracle.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH v2 0/5] Support for datapath switching during live migration
Message-id: 1546883690-17798-1-git-send-email-venu.busire...@oracle.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
88b0acb pci: query command extension to check the bus master enabling status of the failover-primary device
ed2dc77 vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
c677246 virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.
135dfa7 virtio_net: Add support for "Data Path Switching" during Live Migration.
a87f451 virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.

=== OUTPUT BEGIN ===
Checking PATCH 1/5: virtio_net: Add VIRTIO_NET_F_STANDBY feature bit
Checking PATCH 2/5: virtio_net: Add support for "Data Path Switching" during Live Migration
Checking PATCH 3/5: virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event
WARNING: Block comments use a leading /* on a separate line
#121: FILE: net/net.c:1347:
+/* only query information on queue 0 since the info is per nic,

ERROR: braces {} are necessary for all arms of this statement
#124: FILE: net/net.c:1350:
+if (nc->queue_index != 0)
[...]

total: 1 errors, 1 warnings, 172 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 4/5: vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover...
Checking PATCH 5/5: pci: query command extension to check the bus master enabling status of the failover-primary device...
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/1546883690-17798-1-git-send-email-venu.busire...@oracle.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [Qemu-devel] [PATCH v2 06/52] audio: -audiodev command line option basic implementation

2019-01-07 Thread Zoltán Kővágó

On 2019-01-07 14:13, Markus Armbruster wrote:
> "Kővágó, Zoltán"  writes:
> 
>> Audio drivers now get an Audiodev * as config parameters, instead of the
>> global audio_option structs.  There is some code in audio/audio_legacy.c
>> that converts the old environment variables to audiodev options (this
>> way backends do not have to worry about legacy options).  It also
>> contains a replacement of -audio-help, which prints out the equivalent
>> -audiodev based config of the currently specified environment variables.
>>
>> Note that backends are not updated and still rely on environment
>> variables.
>>
>> Also note that (due to moving try-poll from global to backend specific
>> option) currently ALSA and OSS will always try poll mode, regardless of
>> environment variables or -audiodev options.
>>
>> Signed-off-by: Kővágó, Zoltán 
>> ---
> [...]
>> diff --git a/audio/audio.c b/audio/audio.c
>> index 96cbd57c37..e7f25ea84b 100644
>> --- a/audio/audio.c
>> +++ b/audio/audio.c
> [...]
>> @@ -2127,3 +1841,158 @@ void AUD_set_volume_in (SWVoiceIn *sw, int mute, uint8_t lvol, uint8_t rvol)
>>  }
>>  }
>>  }
>> +
>> +QemuOptsList qemu_audiodev_opts = {
>> +.name = "audiodev",
>> +.head = QTAILQ_HEAD_INITIALIZER(qemu_audiodev_opts.head),
>> +.implied_opt_name = "driver",
>> +.desc = {
>> +/*
>> + * no elements => accept any params
>> + * sanity checking will happen later
>> + */
>> +{ /* end of list */ }
>> +},
>> +};
>> +
>> +static void validate_per_direction_opts(AudiodevPerDirectionOptions *pdo,
>> +Error **errp)
>> +{
>> +if (!pdo->has_fixed_settings) {
>> +pdo->has_fixed_settings = true;
>> +pdo->fixed_settings = true;
>> +}
>> +if (!pdo->fixed_settings &&
>> +(pdo->has_frequency || pdo->has_channels || pdo->has_format)) {
>> +error_setg(errp,
>> +   "You can't use frequency, channels or format with fixed-settings=off");
>> +return;
>> +}
>> +
>> +if (!pdo->has_frequency) {
>> +pdo->has_frequency = true;
>> +pdo->frequency = 44100;
>> +}
>> +if (!pdo->has_channels) {
>> +pdo->has_channels = true;
>> +pdo->channels = 2;
>> +}
>> +if (!pdo->has_voices) {
>> +pdo->has_voices = true;
>> +pdo->voices = 1;
>> +}
>> +if (!pdo->has_format) {
>> +pdo->has_format = true;
>> +pdo->format = AUDIO_FORMAT_S16;
>> +}
>> +}
>> +
>> +static Audiodev *parse_option(QemuOpts *opts, Error **errp)
>> +{
>> +Error *local_err = NULL;
>> +Visitor *v = opts_visitor_new(opts, true);
>> +Audiodev *dev = NULL;
>> +visit_type_Audiodev(v, NULL, &dev, &local_err);
>> +visit_free(v);
>> +
>> +if (local_err) {
>> +goto err2;
>> +}
>> +
>> +validate_per_direction_opts(dev->in, &local_err);
>> +if (local_err) {
>> +goto err;
>> +}
>> +validate_per_direction_opts(dev->out, &local_err);
>> +if (local_err) {
>> +goto err;
>> +}
>> +
>> +if (!dev->has_timer_period) {
>> +dev->has_timer_period = true;
>> +dev->timer_period = 1; /* 100Hz -> 10ms */
>> +}
>> +
>> +return dev;
>> +
>> +err:
>> +qapi_free_Audiodev(dev);
>> +err2:
>> +error_propagate(errp, local_err);
>> +return NULL;
>> +}
>> +
>> +static int each_option(void *opaque, QemuOpts *opts, Error **errp)
>> +{
>> +Audiodev *dev = parse_option(opts, errp);
>> +if (!dev) {
>> +return -1;
>> +}
>> +return audio_init(dev);
>> +}
>> +
>> +void audio_set_options(void)
>> +{
>> +if (qemu_opts_foreach(qemu_find_opts("audiodev"), each_option, NULL,
>> +  &error_abort)) {
>> +exit(1);
>> +}
>> +}
> [...]
>> diff --git a/vl.c b/vl.c
>> index 8353d3c718..b5364ffe46 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -3074,6 +3074,7 @@ int main(int argc, char **argv, char **envp)
>>  qemu_add_opts(&qemu_option_rom_opts);
>>  qemu_add_opts(&qemu_machine_opts);
>>  qemu_add_opts(&qemu_accel_opts);
>> +qemu_add_opts(&qemu_audiodev_opts);
>>  qemu_add_opts(&qemu_mem_opts);
>>  qemu_add_opts(&qemu_smp_opts);
>>  qemu_add_opts(&qemu_boot_opts);
>> @@ -3307,9 +3308,15 @@ int main(int argc, char **argv, char **envp)
>>  add_device_config(DEV_BT, optarg);
>>  break;
>>  case QEMU_OPTION_audio_help:
>> -AUD_help ();
>> +audio_legacy_help();
>>  exit (0);
>>  break;
>> +case QEMU_OPTION_audiodev:
>> +if (!qemu_opts_parse_noisily(qemu_find_opts("audiodev"),
>> + optarg, true)) {
>> +exit(1);
>> +}
>> +break;
>>  case QEMU_OPTION_soundhw:
>>  select_soundhw (optarg);
>>
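
The default-filling pattern used by validate_per_direction_opts() in
the quoted patch can be tried standalone with a sketch like the one
below. ToyDirectionOpts and toy_apply_defaults() are hypothetical,
cut-down stand-ins for the QAPI-generated AudiodevPerDirectionOptions
and the patch's function. Only three of the options are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Each optional field is paired with a has_* flag, as QAPI did then. */
typedef struct {
    bool has_fixed_settings;
    bool fixed_settings;
    bool has_frequency;
    uint32_t frequency;
    bool has_channels;
    uint32_t channels;
} ToyDirectionOpts;

/* Fill in defaults; return false if the options conflict. */
static bool toy_apply_defaults(ToyDirectionOpts *o)
{
    assert(o != NULL);

    if (!o->has_fixed_settings) {
        o->has_fixed_settings = true;
        o->fixed_settings = true;
    }
    /* Explicit format options make no sense without fixed settings. */
    if (!o->fixed_settings && (o->has_frequency || o->has_channels)) {
        return false;
    }
    if (!o->has_frequency) {
        o->has_frequency = true;
        o->frequency = 44100;
    }
    if (!o->has_channels) {
        o->has_channels = true;
        o->channels = 2;
    }
    return true;
}
```

The conflict check has to run before the defaults are filled in.
Otherwise every has_* flag would already be set and a user-supplied
value could no longer be told apart from a default.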

Re: [Qemu-devel] [PATCH 3/3] machine: Use shorter format for GlobalProperty arrays

2019-01-07 Thread Marc-André Lureau

On Mon, Jan 7, 2019 at 11:33 PM Eduardo Habkost  wrote:
>
> Instead of verbose arrays with 4 lines for each entry, make each
> entry take only one line.  This makes long arrays that couldn't
> fit in the screen become short and readable.
>
> Signed-off-by: Eduardo Habkost 
> ---
>  include/hw/i386/pc.h   |  18 +-
>  hw/core/machine.c  | 338 -
>  hw/i386/pc.c   | 720 +++--
>  hw/i386/pc_piix.c  | 192 ++
>  hw/ppc/spapr.c |  72 +---
>  hw/s390x/s390-virtio-ccw.c |  75 +---
>  hw/xen/xen-common.c|  18 +-
>  7 files changed, 265 insertions(+), 1168 deletions(-)

Nice diffstat, hopefully I didn't miss any before/after difference:
Reviewed-by: Marc-André Lureau 
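
As a quick sanity check of the conversion idea, the sketch below
compares the two initializer styles on a toy struct. ToyGlobalProperty
is a hypothetical stand-in that assumes the field order driver,
property, value, which is what the positional form in the patch
implies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in; assumes this field order. */
typedef struct {
    const char *driver;
    const char *property;
    const char *value;
} ToyGlobalProperty;

/* Old style: designated initializers, four lines per entry. */
static const ToyGlobalProperty verbose_form[] = {
    {
        .driver   = "pcie-root-port",
        .property = "x-speed",
        .value    = "2_5",
    },{
        .driver   = "pcie-root-port",
        .property = "x-width",
        .value    = "1",
    },
};

/* New style: positional initializers, one line per entry. */
static const ToyGlobalProperty short_form[] = {
    { "pcie-root-port", "x-speed", "2_5" },
    { "pcie-root-port", "x-width", "1" },
};

/* Return 1 if the first n entries of both arrays match field by field. */
static int same_entries(const ToyGlobalProperty *a,
                        const ToyGlobalProperty *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(a[i].driver, b[i].driver) != 0 ||
            strcmp(a[i].property, b[i].property) != 0 ||
            strcmp(a[i].value, b[i].value) != 0) {
            return 0;
        }
    }
    return 1;
}
```

The short form is only equivalent while the struct keeps this field
order. Reordering the fields would silently change the meaning of
every positional entry, which is the trade-off for the shorter arrays.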

>
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index 84720bede9..0abbe45637 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -354,21 +354,9 @@ extern const size_t pc_compat_1_4_len;
>   * depending on QEMU versions up to QEMU 2.4.
>   */
>  #define PC_CPU_MODEL_IDS(v) \
> -{\
> -.driver   = "qemu32-" TYPE_X86_CPU,\
> -.property = "model-id",\
> -.value= "QEMU Virtual CPU version " v,\
> -},\
> -{\
> -.driver   = "qemu64-" TYPE_X86_CPU,\
> -.property = "model-id",\
> -.value= "QEMU Virtual CPU version " v,\
> -},\
> -{\
> -.driver   = "athlon-" TYPE_X86_CPU,\
> -.property = "model-id",\
> -.value= "QEMU Virtual CPU version " v,\
> -},
> +{ "qemu32-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },\
> +{ "qemu64-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },\
> +{ "athlon-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },
>
>  #define DEFINE_PC_MACHINE(suffix, namestr, initfn, optsfn) \
>  static void pc_machine_##suffix##_class_init(ObjectClass *oc, void *data) \
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 4b4d6c23de..5530b71981 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -24,23 +24,10 @@
>  #include "hw/pci/pci.h"
>
>  GlobalProperty hw_compat_3_1[] = {
> -{
> -.driver   = "pcie-root-port",
> -.property = "x-speed",
> -.value= "2_5",
> -},{
> -.driver   = "pcie-root-port",
> -.property = "x-width",
> -.value= "1",
> -},{
> -.driver   = "memory-backend-file",
> -.property = "x-use-canonical-path-for-ramblock-id",
> -.value= "true",
> -},{
> -.driver   = "memory-backend-memfd",
> -.property = "x-use-canonical-path-for-ramblock-id",
> -.value= "true",
> -},
> +{ "pcie-root-port", "x-speed", "2_5" },
> +{ "pcie-root-port", "x-width", "1" },
> +{ "memory-backend-file", "x-use-canonical-path-for-ramblock-id", "true" },
> +{ "memory-backend-memfd", "x-use-canonical-path-for-ramblock-id", "true" },
>  };
>  const size_t hw_compat_3_1_len = G_N_ELEMENTS(hw_compat_3_1);
>
> @@ -48,269 +35,96 @@ GlobalProperty hw_compat_3_0[] = {};
>  const size_t hw_compat_3_0_len = G_N_ELEMENTS(hw_compat_3_0);
>
>  GlobalProperty hw_compat_2_12[] = {
> -{
> -.driver   = "migration",
> -.property = "decompress-error-check",
> -.value= "off",
> -},{
> -.driver   = "hda-audio",
> -.property = "use-timer",
> -.value= "false",
> -},{
> -.driver   = "cirrus-vga",
> -.property = "global-vmstate",
> -.value= "true",
> -},{
> -.driver   = "VGA",
> -.property = "global-vmstate",
> -.value= "true",
> -},{
> -.driver   = "vmware-svga",
> -.property = "global-vmstate",
> -.value= "true",
> -},{
> -.driver   = "qxl-vga",
> -.property = "global-vmstate",
> -.value= "true",
> -},
> +{ "migration", "decompress-error-check", "off" },
> +{ "hda-audio", "use-timer", "false" },
> +{ "cirrus-vga", "global-vmstate", "true" },
> +{ "VGA", "global-vmstate", "true" },
> +{ "vmware-svga", "global-vmstate", "true" },
> +{ "qxl-vga", "global-vmstate", "true" },
>  };
>  const size_t hw_compat_2_12_len = G_N_ELEMENTS(hw_compat_2_12);
>
>  GlobalProperty hw_compat_2_11[] = {
> -{
> -.driver   = "hpet",
> -.property = "hpet-offset-saved",
> -.value= "false",
> -},{
> -.driver   = "virtio-blk-pci",
> -.property = "vectors",
> -.value= "2",
> -},{
> -.driver   = "vhost-user-blk-pci",
> -.property = "vectors",
> -.value= "2",
> -},{
> -.driver   = "e1000",
> -.property = "migrate_tso_props",
> -.value= "off",
> -},
> +{ "hpet", "hpet-offset-saved", "false" },
> +{ "virtio-blk-pci", "vectors", "2" },
> +{ "vhost-user-blk-pci", "vectors", "2" },
> +{ "e1000", "m

Re: [Qemu-devel] [PATCH 2/3] machine: Eliminate unnecessary stringify() usage

2019-01-07 Thread Marc-André Lureau

On Mon, Jan 7, 2019 at 11:32 PM Eduardo Habkost  wrote:
>
> stringify() is useful when we need to use macros in compat_props
> (like when we set virtio-balloon-pci.class=PCI_CLASS_MEMORY_RAM at
> pc_i440fx_1_0_machine_options()), but it is pointless when we are
> already providing a number literal.
>
> Replace stringify() with string literals when appropriate.
>
> Signed-off-by: Eduardo Habkost 

Reviewed-by: Marc-André Lureau 

> ---
>  hw/core/machine.c |  8 ++--
>  hw/i386/pc.c  | 94 +++
>  hw/i386/pc_piix.c | 30 +++
>  hw/ppc/spapr.c|  2 +-
>  4 files changed, 67 insertions(+), 67 deletions(-)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index f8563efb86..4b4d6c23de 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -135,11 +135,11 @@ GlobalProperty hw_compat_2_8[] = {
>  {
>  .driver   = "fw_cfg_mem",
>  .property = "x-file-slots",
> -.value= stringify(0x10),
> +.value= "0x10",
>  },{
>  .driver   = "fw_cfg_io",
>  .property = "x-file-slots",
> -.value= stringify(0x10),
> +.value= "0x10",
>  },{
>  .driver   = "pflash_cfi01",
>  .property = "old-multiple-chip-handling",
> @@ -337,11 +337,11 @@ GlobalProperty hw_compat_2_1[] = {
>  },{
>  .driver   = "usb-mouse",
>  .property = "usb_version",
> -.value= stringify(1),
> +.value= "1",
>  },{
>  .driver   = "usb-kbd",
>  .property = "usb_version",
> -.value= stringify(1),
> +.value= "1",
>  },{
>  .driver   = "virtio-pci",
>  .property = "virtio-pci-bus-master-bug-migration",
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 4952feb476..ff14b6d4df 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -148,11 +148,11 @@ GlobalProperty pc_compat_2_12[] = {
>  },{
>  .driver   = "EPYC-" TYPE_X86_CPU,
>  .property = "xlevel",
> -.value= stringify(0x800a),
> +.value= "0x800a",
>  },{
>  .driver   = "EPYC-IBPB-" TYPE_X86_CPU,
>  .property = "xlevel",
> -.value= stringify(0x800a),
> +.value= "0x800a",
>  },
>  };
>  const size_t pc_compat_2_12_len = G_N_ELEMENTS(pc_compat_2_12);
> @@ -191,7 +191,7 @@ GlobalProperty pc_compat_2_9[] = {
>  {
>  .driver   = "mch",
>  .property = "extended-tseg-mbytes",
> -.value= stringify(0),
> +.value= "0",
>  },
>  };
>  const size_t pc_compat_2_9_len = G_N_ELEMENTS(pc_compat_2_9);
> @@ -365,75 +365,75 @@ GlobalProperty pc_compat_2_3[] = {
>  },{
>  .driver   = "qemu64" "-" TYPE_X86_CPU,
>  .property = "min-level",
> -.value= stringify(4),
> +.value= "4",
>  },{
>  .driver   = "kvm64" "-" TYPE_X86_CPU,
>  .property = "min-level",
> -.value= stringify(5),
> +.value= "5",
>  },{
>  .driver   = "pentium3" "-" TYPE_X86_CPU,
>  .property = "min-level",
> -.value= stringify(2),
> +.value= "2",
>  },{
>  .driver   = "n270" "-" TYPE_X86_CPU,
>  .property = "min-level",
> -.value= stringify(5),
> +.value= "5",
>  },{
>  .driver   = "Conroe" "-" TYPE_X86_CPU,
>  .property = "min-level",
> -.value= stringify(4),
> +.value= "4",
>  },{
>  .driver   = "Penryn" "-" TYPE_X86_CPU,
>  .property = "min-level",
> -.value= stringify(4),
> +.value= "4",
>  },{
>  .driver   = "Nehalem" "-" TYPE_X86_CPU,
>  .property = "min-level",
> -.value= stringify(4),
> +.value= "4",
>  },{
>  .driver   = "n270" "-" TYPE_X86_CPU,
>  .property = "min-xlevel",
> -.value= stringify(0x800a),
> +.value= "0x800a",
>  },{
>  .driver   = "Penryn" "-" TYPE_X86_CPU,
>  .property = "min-xlevel",
> -.value= stringify(0x800a),
> +.value= "0x800a",
>  },{
>  .driver   = "Conroe" "-" TYPE_X86_CPU,
>  .property = "min-xlevel",
> -.value= stringify(0x800a),
> +.value= "0x800a",
>  },{
>  .driver   = "Nehalem" "-" TYPE_X86_CPU,
>  .property = "min-xlevel",
> -.value= stringify(0x800a),
> +.value= "0x800a",
>  },{
>  .driver   = "Westmere" "-" TYPE_X86_CPU,
>  .property = "min-xlevel",
> -.value= stringify(0x800a),
> +.value= "0x800a",
>  },{
>  .driver   = "SandyBridge" "-" TYPE_X86_CPU,
>  .property = "min-xlevel",
> -.value= stringify(0x800a),
> +.value= "0x800a",
>  },{
>  .driver   = "IvyBridge" "-" TYPE_

Re: [Qemu-devel] [PATCH 1/3] spapr: Eliminate SPAPR_PCI_2_7_MMIO_WIN_SIZE macro

2019-01-07 Thread Marc-André Lureau

On Mon, Jan 7, 2019 at 11:34 PM Eduardo Habkost  wrote:
>
> The macro is only used in one place, where the purpose of the
> value is obvious.  Eliminate the macro so we don't need to rely
> on stringify().
>
> Signed-off-by: Eduardo Habkost 

Reviewed-by: Marc-André Lureau 

> ---
>  include/hw/pci-host/spapr.h | 1 -
>  hw/ppc/spapr.c  | 2 +-
>  2 files changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> index 7c66c3872f..a85a995b6c 100644
> --- a/include/hw/pci-host/spapr.h
> +++ b/include/hw/pci-host/spapr.h
> @@ -99,7 +99,6 @@ struct sPAPRPHBState {
>  #define SPAPR_PCI_BASE   (1ULL << 45) /* 32 TiB */
>  #define SPAPR_PCI_LIMIT  (1ULL << 46) /* 64 TiB */
>
> -#define SPAPR_PCI_2_7_MMIO_WIN_SIZE  0xf8000
>  #define SPAPR_PCI_IO_WIN_SIZE0x1
>
>  #define SPAPR_PCI_MSI_WINDOW 0x400ULL
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 5671608cea..bff42f0adb 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -4225,7 +4225,7 @@ static void spapr_machine_2_7_class_options(MachineClass *mc)
>  {
>  .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,
>  .property = "mem_win_size",
> -.value= stringify(SPAPR_PCI_2_7_MMIO_WIN_SIZE),
> +.value= "0xf8000",
>  },
>  {
>  .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,
> --
> 2.18.0.rc1.1.g3f1ff2140
>
>


-- 
Marc-André Lureau

[Qemu-devel] [PATCH 1/3] spapr: Eliminate SPAPR_PCI_2_7_MMIO_WIN_SIZE macro

2019-01-07 Thread Eduardo Habkost

The macro is only used in one place, where the purpose of the
value is obvious.  Eliminate the macro so we don't need to rely
on stringify().

Signed-off-by: Eduardo Habkost 
---
 include/hw/pci-host/spapr.h | 1 -
 hw/ppc/spapr.c  | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 7c66c3872f..a85a995b6c 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -99,7 +99,6 @@ struct sPAPRPHBState {
 #define SPAPR_PCI_BASE   (1ULL << 45) /* 32 TiB */
 #define SPAPR_PCI_LIMIT  (1ULL << 46) /* 64 TiB */
 
-#define SPAPR_PCI_2_7_MMIO_WIN_SIZE  0xf8000
 #define SPAPR_PCI_IO_WIN_SIZE0x1
 
 #define SPAPR_PCI_MSI_WINDOW 0x400ULL
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5671608cea..bff42f0adb 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4225,7 +4225,7 @@ static void spapr_machine_2_7_class_options(MachineClass *mc)
 {
 .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,
 .property = "mem_win_size",
-.value= stringify(SPAPR_PCI_2_7_MMIO_WIN_SIZE),
+.value= "0xf8000",
 },
 {
 .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,
-- 
2.18.0.rc1.1.g3f1ff2140

[Qemu-devel] [PATCH 2/3] machine: Eliminate unnecessary stringify() usage

2019-01-07 Thread Eduardo Habkost

stringify() is useful when we need to use macros in compat_props
(like when we set virtio-balloon-pci.class=PCI_CLASS_MEMORY_RAM at
pc_i440fx_1_0_machine_options()), but it is pointless when we are
already providing a number literal.

Replace stringify() with string literals when appropriate.

Signed-off-by: Eduardo Habkost 
---
 hw/core/machine.c |  8 ++--
 hw/i386/pc.c  | 94 +++
 hw/i386/pc_piix.c | 30 +++
 hw/ppc/spapr.c|  2 +-
 4 files changed, 67 insertions(+), 67 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index f8563efb86..4b4d6c23de 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -135,11 +135,11 @@ GlobalProperty hw_compat_2_8[] = {
 {
 .driver   = "fw_cfg_mem",
 .property = "x-file-slots",
-.value= stringify(0x10),
+.value= "0x10",
 },{
 .driver   = "fw_cfg_io",
 .property = "x-file-slots",
-.value= stringify(0x10),
+.value= "0x10",
 },{
 .driver   = "pflash_cfi01",
 .property = "old-multiple-chip-handling",
@@ -337,11 +337,11 @@ GlobalProperty hw_compat_2_1[] = {
 },{
 .driver   = "usb-mouse",
 .property = "usb_version",
-.value= stringify(1),
+.value= "1",
 },{
 .driver   = "usb-kbd",
 .property = "usb_version",
-.value= stringify(1),
+.value= "1",
 },{
 .driver   = "virtio-pci",
 .property = "virtio-pci-bus-master-bug-migration",
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 4952feb476..ff14b6d4df 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -148,11 +148,11 @@ GlobalProperty pc_compat_2_12[] = {
 },{
 .driver   = "EPYC-" TYPE_X86_CPU,
 .property = "xlevel",
-.value= stringify(0x800a),
+.value= "0x800a",
 },{
 .driver   = "EPYC-IBPB-" TYPE_X86_CPU,
 .property = "xlevel",
-.value= stringify(0x800a),
+.value= "0x800a",
 },
 };
 const size_t pc_compat_2_12_len = G_N_ELEMENTS(pc_compat_2_12);
@@ -191,7 +191,7 @@ GlobalProperty pc_compat_2_9[] = {
 {
 .driver   = "mch",
 .property = "extended-tseg-mbytes",
-.value= stringify(0),
+.value= "0",
 },
 };
 const size_t pc_compat_2_9_len = G_N_ELEMENTS(pc_compat_2_9);
@@ -365,75 +365,75 @@ GlobalProperty pc_compat_2_3[] = {
 },{
 .driver   = "qemu64" "-" TYPE_X86_CPU,
 .property = "min-level",
-.value= stringify(4),
+.value= "4",
 },{
 .driver   = "kvm64" "-" TYPE_X86_CPU,
 .property = "min-level",
-.value= stringify(5),
+.value= "5",
 },{
 .driver   = "pentium3" "-" TYPE_X86_CPU,
 .property = "min-level",
-.value= stringify(2),
+.value= "2",
 },{
 .driver   = "n270" "-" TYPE_X86_CPU,
 .property = "min-level",
-.value= stringify(5),
+.value= "5",
 },{
 .driver   = "Conroe" "-" TYPE_X86_CPU,
 .property = "min-level",
-.value= stringify(4),
+.value= "4",
 },{
 .driver   = "Penryn" "-" TYPE_X86_CPU,
 .property = "min-level",
-.value= stringify(4),
+.value= "4",
 },{
 .driver   = "Nehalem" "-" TYPE_X86_CPU,
 .property = "min-level",
-.value= stringify(4),
+.value= "4",
 },{
 .driver   = "n270" "-" TYPE_X86_CPU,
 .property = "min-xlevel",
-.value= stringify(0x800a),
+.value= "0x800a",
 },{
 .driver   = "Penryn" "-" TYPE_X86_CPU,
 .property = "min-xlevel",
-.value= stringify(0x800a),
+.value= "0x800a",
 },{
 .driver   = "Conroe" "-" TYPE_X86_CPU,
 .property = "min-xlevel",
-.value= stringify(0x800a),
+.value= "0x800a",
 },{
 .driver   = "Nehalem" "-" TYPE_X86_CPU,
 .property = "min-xlevel",
-.value= stringify(0x800a),
+.value= "0x800a",
 },{
 .driver   = "Westmere" "-" TYPE_X86_CPU,
 .property = "min-xlevel",
-.value= stringify(0x800a),
+.value= "0x800a",
 },{
 .driver   = "SandyBridge" "-" TYPE_X86_CPU,
 .property = "min-xlevel",
-.value= stringify(0x800a),
+.value= "0x800a",
 },{
 .driver   = "IvyBridge" "-" TYPE_X86_CPU,
 .property = "min-xlevel",
-.value= stringify(0x800a),
+.value= "0x800a",
 },{
 .driver   = "Haswell" "-" TYPE_X86_CPU,
 .property = "min-xlevel",
-.value= stringify(0x800a),
+.value= "0x800a",
 },{
 .driver   = "Haswell-noTSX" "-" TYPE_X86_CPU,
 .prop

[Qemu-devel] [PATCH 3/3] machine: Use shorter format for GlobalProperty arrays

2019-01-07 Thread Eduardo Habkost

Instead of verbose arrays with 4 lines for each entry, make each
entry take only one line.  This makes long arrays that couldn't
fit in the screen become short and readable.

Signed-off-by: Eduardo Habkost 
---
 include/hw/i386/pc.h   |  18 +-
 hw/core/machine.c  | 338 -
 hw/i386/pc.c   | 720 +++--
 hw/i386/pc_piix.c  | 192 ++
 hw/ppc/spapr.c |  72 +---
 hw/s390x/s390-virtio-ccw.c |  75 +---
 hw/xen/xen-common.c|  18 +-
 7 files changed, 265 insertions(+), 1168 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 84720bede9..0abbe45637 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -354,21 +354,9 @@ extern const size_t pc_compat_1_4_len;
  * depending on QEMU versions up to QEMU 2.4.
  */
 #define PC_CPU_MODEL_IDS(v) \
-{\
-.driver   = "qemu32-" TYPE_X86_CPU,\
-.property = "model-id",\
-.value= "QEMU Virtual CPU version " v,\
-},\
-{\
-.driver   = "qemu64-" TYPE_X86_CPU,\
-.property = "model-id",\
-.value= "QEMU Virtual CPU version " v,\
-},\
-{\
-.driver   = "athlon-" TYPE_X86_CPU,\
-.property = "model-id",\
-.value= "QEMU Virtual CPU version " v,\
-},
+{ "qemu32-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },\
+{ "qemu64-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },\
+{ "athlon-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, },
 
 #define DEFINE_PC_MACHINE(suffix, namestr, initfn, optsfn) \
 static void pc_machine_##suffix##_class_init(ObjectClass *oc, void *data) \
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 4b4d6c23de..5530b71981 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -24,23 +24,10 @@
 #include "hw/pci/pci.h"
 
 GlobalProperty hw_compat_3_1[] = {
-{
-.driver   = "pcie-root-port",
-.property = "x-speed",
-.value= "2_5",
-},{
-.driver   = "pcie-root-port",
-.property = "x-width",
-.value= "1",
-},{
-.driver   = "memory-backend-file",
-.property = "x-use-canonical-path-for-ramblock-id",
-.value= "true",
-},{
-.driver   = "memory-backend-memfd",
-.property = "x-use-canonical-path-for-ramblock-id",
-.value= "true",
-},
+{ "pcie-root-port", "x-speed", "2_5" },
+{ "pcie-root-port", "x-width", "1" },
+{ "memory-backend-file", "x-use-canonical-path-for-ramblock-id", "true" },
+{ "memory-backend-memfd", "x-use-canonical-path-for-ramblock-id", "true" },
 };
 const size_t hw_compat_3_1_len = G_N_ELEMENTS(hw_compat_3_1);
 
@@ -48,269 +35,96 @@ GlobalProperty hw_compat_3_0[] = {};
 const size_t hw_compat_3_0_len = G_N_ELEMENTS(hw_compat_3_0);
 
 GlobalProperty hw_compat_2_12[] = {
-{
-.driver   = "migration",
-.property = "decompress-error-check",
-.value= "off",
-},{
-.driver   = "hda-audio",
-.property = "use-timer",
-.value= "false",
-},{
-.driver   = "cirrus-vga",
-.property = "global-vmstate",
-.value= "true",
-},{
-.driver   = "VGA",
-.property = "global-vmstate",
-.value= "true",
-},{
-.driver   = "vmware-svga",
-.property = "global-vmstate",
-.value= "true",
-},{
-.driver   = "qxl-vga",
-.property = "global-vmstate",
-.value= "true",
-},
+{ "migration", "decompress-error-check", "off" },
+{ "hda-audio", "use-timer", "false" },
+{ "cirrus-vga", "global-vmstate", "true" },
+{ "VGA", "global-vmstate", "true" },
+{ "vmware-svga", "global-vmstate", "true" },
+{ "qxl-vga", "global-vmstate", "true" },
 };
 const size_t hw_compat_2_12_len = G_N_ELEMENTS(hw_compat_2_12);
 
 GlobalProperty hw_compat_2_11[] = {
-{
-.driver   = "hpet",
-.property = "hpet-offset-saved",
-.value= "false",
-},{
-.driver   = "virtio-blk-pci",
-.property = "vectors",
-.value= "2",
-},{
-.driver   = "vhost-user-blk-pci",
-.property = "vectors",
-.value= "2",
-},{
-.driver   = "e1000",
-.property = "migrate_tso_props",
-.value= "off",
-},
+{ "hpet", "hpet-offset-saved", "false" },
+{ "virtio-blk-pci", "vectors", "2" },
+{ "vhost-user-blk-pci", "vectors", "2" },
+{ "e1000", "migrate_tso_props", "off" },
 };
 const size_t hw_compat_2_11_len = G_N_ELEMENTS(hw_compat_2_11);
 
 GlobalProperty hw_compat_2_10[] = {
-{
-.driver   = "virtio-mouse-device",
-.property = "wheel-axis",
-.value= "false",
-},{
-.driver   = "virtio-tablet-device",
-.property = "wheel-axis",
-.value= "false",
-},
+{ "virtio-mouse-device", "wheel-axis", "false"

[Qemu-devel] [PATCH 0/3] machine: Make compat_props arrays shorter and more readable

2019-01-07 Thread Eduardo Habkost

Current declarations of compat_props arrays are very verbose,
with each entry taking 4 lines of code.  By omitting the field
designators, we can make each array entry fit a single line of
code and be more readable.

Eduardo Habkost (3):
  spapr: Eliminate SPAPR_PCI_2_7_MMIO_WIN_SIZE macro
  machine: Eliminate unnecessary stringify() usage
  machine: Use shorter format for GlobalProperty arrays

 include/hw/i386/pc.h|  18 +-
 include/hw/pci-host/spapr.h |   1 -
 hw/core/machine.c   | 338 -
 hw/i386/pc.c| 720 +++-
 hw/i386/pc_piix.c   | 192 ++
 hw/ppc/spapr.c  |  72 +---
 hw/s390x/s390-virtio-ccw.c  |  75 +---
 hw/xen/xen-common.c |  18 +-
 8 files changed, 265 insertions(+), 1169 deletions(-)

-- 
2.18.0.rc1.1.g3f1ff2140

Re: [Qemu-devel] [PATCH] compat: Use explicit type names on HW_COMPAT_2_6

2019-01-07 Thread Eduardo Habkost

On Mon, Jan 07, 2019 at 09:12:06AM +0100, Cornelia Huck wrote:
> On Fri, 4 Jan 2019 20:00:50 -0200
> Eduardo Habkost  wrote:
> 
> > On Fri, Jan 04, 2019 at 04:13:15PM -0500, Michael S. Tsirkin wrote:
> > > On Fri, Jan 04, 2019 at 07:06:56PM -0200, Eduardo Habkost wrote:  
> > > > On Fri, Jan 04, 2019 at 03:48:02PM -0500, Michael S. Tsirkin wrote:  
> > > > > On Fri, Jan 04, 2019 at 06:09:52PM -0200, Eduardo Habkost wrote:  
> 
> > > > > > Anyway, while writing this I noticed another issue: many of the
> > > > > > virtio devices in QEMU 2.6 were already modern-only!
> > > > > > 
> > > > > > Setting disable-modern=off on modern-only devices like virtio-vga
> > > > > > or virtio-tablet-pci doesn't make sense.  This means setting
> > > > > > virtio-pci.disable-modern=off on HW_COMPAT_2_6 was incorrect even
> > > > > > before the -non-transitional and -transitional device types were
> > > > > > introduced.  
> > > > > 
> > > > > 
> > > > > It did create an opportunity to create non working devices.
> > > > > 
> > > > > Whether that's incorrect as such I'm not sure.  
> > > > 
> > > > This is not just creating the opportunity for a user to set
> > > > disable-modern=on.  HW_COMPAT_2_6 is actually setting
> > > > disable-modern=on on virtio-vga and other modern-only devices.
> > > > Sounds like a mistake to me.
> > > > 
> > > > Luckily those modern-only devices silently ignore the
> > > > disable-modern/disable-legacy properties, but this might change
> > > > in the future.  
> > > 
> > > Worry about it then?  
> > 
> > Right, we don't need to worry about it today.  But if a solution
> > to the crash reported by Thomas will make the problem go away,
> > that's even better.
> 
> It seems your patch with the modern-only devices removed from the list
> would achieve that?

I think so.  But I think I'll do the removal of modern-only
devices from the list in a separate patch, just to be safe.

-- 
Eduardo

Re: [Qemu-devel] [QEMU-devel][PATCH v4 0/2] Fix concurrent aio_poll/set_fd_handler.

2019-01-07 Thread remy . noel


On Thu, Dec 20, 2018 at 04:20:28PM +0100, Remy Noel wrote:

From: Remy Noel 

It is possible for an io_poll/read/write callback to be executed concurrently
with a call to aio_set_fd_handler(). This can cause all sorts of problems, like
a NULL callback or a bad opaque pointer.

V2:
   * Do not use RCU anymore as it incurs a performance loss
V3:
   * Don't drop revents when a handler is modified [Stefan]
V4:
   * Unregister fd from ctx epoll when removing fd_handler [Paolo]

Remy Noel (2):
 aio-posix: Unregister fd from ctx epoll when removing fd_handler.
 aio-posix: Fix concurrent aio_poll/set_fd_handler.

util/aio-posix.c | 90 +---
util/aio-win32.c | 67 ---
2 files changed, 84 insertions(+), 73 deletions(-)

--
2.19.2


ping.

Does it need anything else to get queued?

Thanks.

Remy.

Re: [Qemu-devel] [PATCH 10/15] s390-bios: Support for running format-0/1 channel programs

2019-01-07 Thread Jason J. Herne


On 12/13/18 12:21 PM, Cornelia Huck wrote:

On Wed, 12 Dec 2018 09:11:13 -0500
"Jason J. Herne"  wrote:


Add struct for format-0 ccws. Support executing format-0 channel
programs and waiting for their completion before continuing execution.
This will be used for real dasd ipl.

Add cu_type() to channel io library. This will be used to query control
unit type which is used to determine if we are booting a virtio device or a
real dasd device.

Signed-off-by: Jason J. Herne 
---
  pc-bios/s390-ccw/cio.c  | 108 ++
  pc-bios/s390-ccw/cio.h  | 124 ++--
  pc-bios/s390-ccw/s390-ccw.h |   1 +
  pc-bios/s390-ccw/start.S|  33 +++-
  4 files changed, 261 insertions(+), 5 deletions(-)



(...)


+static bool irb_error(Irb *irb)
+{
+/* We have to ignore Incorrect Length (cstat == 0x40) indicators because
+ * real devices expect a 24 byte SenseID  buffer, and virtio devices expect
+ * a much larger buffer. Neither device type can tolerate a buffer size
+ * different from what they expect so they set this indicator.


Hm, can't you specify SLI for SenseID?



Yes, but this requires modifying run_ccw() in virtio.c to always specify the SLI flag. I'm 
not sure that is the best choice? I suppose I could add an sli argument to run_ccw if 
you'd prefer that.



+ */
+if (irb->scsw.cstat != 0x00 && irb->scsw.cstat != 0x40) {
+return true;
+}
+return irb->scsw.dstat != 0xc;


Also, shouldn't you actually use the #defines you introduce further
down?



Yep, I added the defines after I wrote this code. I'll fix that.


+}
+
+/* Executes a channel program at a given subchannel. The request to run the
+ * channel program is sent to the subchannel, we then wait for the interrupt
+ * signaling completion of the I/O operation(s) performed by the channel
+ * program. Lastly we verify that the i/o operation completed without error and
+ * that the interrupt we received was for the subchannel used to run the
+ * channel program.
+ *
+ * Note: This function assumes it is running in an environment where no other
+ * cpus are generating or receiving I/O interrupts. So either run it in a
+ * single-cpu environment or make sure all other cpus are not doing I/O and
+ * have I/O interrupts masked off.


Anything about iscs here (cr6)?



Those details are handled in the assembler code. Do you think I should mention something 
about cr6 here?



+ */
+int do_cio(SubChannelId schid, uint32_t ccw_addr, int fmt)
+{
+CmdOrb orb = {};
+Irb irb = {};
+SenseData sd;
+int rc, retries = 0;
+
+IPL_assert(fmt == 0 || fmt == 1, "Invalid ccw format");
+
+/* ccw_addr must be <= 24 bits and point to at least one whole ccw. */
+if (fmt == 0) {
+IPL_assert(ccw_addr <= 0xFF - 8, "Invalid ccw address");
+}
+
+orb.fmt = fmt ;
+orb.pfch = 1;  /* QEMU's cio implementation requires prefetch */
+orb.c64 = 1;   /* QEMU's cio implementation requires 64-bit idaws */
+orb.lpm = 0xFF; /* All paths allowed */
+orb.cpa = ccw_addr;
+
+while (true) {
+rc = ssch(schid, &orb);


I think we can get here:
- cc 0 -> all ok
- cc 1 -> status pending; could that be an unsolicited interrupt from
   the device? or would we always get a deferred cc 1 in that case?
- cc 2 -> another function pending; Should Not Happen
- cc 3 -> it's dead, Jim

So I'm wondering whether we should consume the status and retry for cc
1. The handling of the others is fine.



I took a look at css_do_ssch() in hw/s390x/css.c and it appears as though CC1 is a 
possibility here. I'm not against taking action, but I suspect we would have to clear the 
status with a basic sense (or something) before simply retrying... right?


Is it safe for us to just assume we can clear it and move on? It seems like an edge case 
that we'd be better off failing on. Perhaps let the user try again which will redrive the 
process?




+if (rc) {
+print_int("ssch failed with rc=", rc);
+break;
+}
+
+consume_io_int();
+
+/* Clear read */


I find that comment confusing. /* collect status */ maybe?


+rc = tsch(schid, &irb);


Here we can get:
- cc 0 -> status pending, all ok
- cc 1 -> no status pending, Should Not Happen
- cc 3 -> it's dead, Jim

So this looks fine.


+if (rc) {
+print_int("tsch failed with rc=", rc);
+break;
+}
+
+if (!irb_error(&irb)) {
+break;
+}
+
+/* Unexpected unit check. Use sense to clear unit check then retry. */


The dasds still don't support concurrent sense, do they? Might also be
worth investigating whether some unit checks are more "recoverable"
than others.



I wasn't sure on concurrent sense. I'd bet there are situations or environments where it 
won't be supported so it seems safest to assume we don't have it.


We already recover from the one unit check

Re: [Qemu-devel] [Bug 1810545] Re: [alpha] Strange exception address reported

2019-01-07 Thread Peter Maydell

On Mon, 7 Jan 2019 at 18:10, Peter Maydell  wrote:
(re: https://bugs.launchpad.net/bugs/1810545)

> The problem seems to be that the PC we report for an OPCDEC
> is first selected by gen_invalid()/gen_excp() in
> target/alpha/translate.c, which uses pc_next (ie the insn's
> address plus 4). But that is then handed through to our custom
> PALcode 
> (https://git.qemu.org/?p=qemu-palcode.git;a=blob;f=pal.S;h=1781c4b415700ca3a68af07fdae90ae43e722501;hb=HEAD)
>  which does
>   addq    p6, 4, p1  // increment past the faulting insn
> resulting in insn + 8.
>
> That is, the palcode and the QEMU code have a disagreement about what
> the (private) API between them is. I'm not sure which side is wrong and
> should be corrected. I think the linux-user code assumes the same thing
> that translate.c is doing, so perhaps the palcode.

Richard -- any suggestions for which side of this API we should
be changing?

thanks
-- PMM

[Qemu-devel] [Bug 1810603] Re: QEMU QCow Images grow dramatically

2019-01-07 Thread Peter Maydell

** Summary changed:

- QEMU QCow Images crow dramatically
+ QEMU QCow Images grow dramatically

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1810603

Title:
  QEMU QCow Images grow dramatically

Status in QEMU:
  New

Bug description:
  I've recently migrated our VM infrastructure (~200 guests on 15 hosts)
  from vbox to Qemu (using KVM / libvirt). We have a master image (QEMU
  QCow v3) from which we spawn multiple instances (linked clones). All
  guests are reverted once per hour for security reasons.

  About 2 weeks after we successfully migrated to Qemu, we noticed that
  almost all disks had filled up across all 15 hosts. Our investigation
  showed that the initial qcow disk images had grown from a few gigabytes
  to 100GB and more. This should not happen, as we revert all VMs back
  to the initial snapshot once per hour and hence all changes that have
  been made to disks must be reverted too.

  We ran an additional test over a 24-hour time frame, in which we could
  reproduce this bug as documented below.

  Initial disk image size (created on Jan 04):
  -rw-r--r-- 1 root root 7.1G Jan  4 15:59 W10-TS01-0.img
  -rw-r--r-- 1 root root 7.3G Jan  4 15:59 W10-TS02-0.img
  -rw-r--r-- 1 root root 7.4G Jan  4 15:59 W10-TS03-0.img
  -rw-r--r-- 1 root root 8.3G Jan  4 16:02 W10-CLIENT01-0.img
  -rw-r--r-- 1 root root 8.6G Jan  4 16:05 W10-CLIENT02-0.img
  -rw-r--r-- 1 root root 8.0G Jan  4 16:05 W10-CLIENT03-0.img
  -rw-r--r-- 1 root root 8.3G Jan  4 16:08 W10-CLIENT04-0.img
  -rw-r--r-- 1 root root 8.1G Jan  4 16:12 W10-CLIENT05-0.img
  -rw-r--r-- 1 root root 8.0G Jan  4 16:12 W10-CLIENT06-0.img
  -rw-r--r-- 1 root root 8.1G Jan  4 16:16 W10-CLIENT07-0.img
  -rw-r--r-- 1 root root 7.6G Jan  4 16:16 W10-CLIENT08-0.img
  -rw-r--r-- 1 root root 7.6G Jan  4 16:19 W10-CLIENT09-0.img
  -rw-r--r-- 1 root root 7.5G Jan  4 16:21 W10-ROUTER-0.img
  -rw-r--r-- 1 root root  18G Jan  4 16:25 W10-MASTER-IMG.qcow2

  Disk image size after 24 hours (printed on Jan 05):
  -rw-r--r-- 1 root root  13G Jan  5 15:07 W10-TS01-0.img
  -rw-r--r-- 1 root root 8.9G Jan  5 14:20 W10-TS02-0.img
  -rw-r--r-- 1 root root 9.0G Jan  5 15:07 W10-TS03-0.img
  -rw-r--r-- 1 root root  10G Jan  5 15:08 W10-CLIENT01-0.img
  -rw-r--r-- 1 root root  11G Jan  5 15:08 W10-CLIENT02-0.img
  -rw-r--r-- 1 root root  11G Jan  5 15:08 W10-CLIENT03-0.img
  -rw-r--r-- 1 root root  11G Jan  5 15:08 W10-CLIENT04-0.img
  -rw-r--r-- 1 root root  19G Jan  5 15:07 W10-CLIENT05-0.img
  -rw-r--r-- 1 root root  14G Jan  5 15:08 W10-CLIENT06-0.img
  -rw-r--r-- 1 root root 9.7G Jan  5 15:07 W10-CLIENT07-0.img
  -rw-r--r-- 1 root root  35G Jan  5 15:08 W10-CLIENT08-0.img
  -rw-r--r-- 1 root root 9.2G Jan  5 15:07 W10-CLIENT09-0.img
  -rw-r--r-- 1 root root  41G Jan  5 15:08 W10-ROUTER-0.img
  -rw-r--r-- 1 root root  18G Jan  4 16:25 W10-MASTER-IMG.qcow2

  You can reproduce this bug as follows:
  1) create an initial disk image
  2) create a linked clone
  3) create a snapshot of the linked clone
  4) revert the snapshot every X minutes / hours

  Due to the described behavior / bug, our VM farm is completely down at
  the moment (as we have run out of disk space on all host systems). A quick
  fix for this bug would be much appreciated.

  Host OS: Ubuntu 18.04.01 LTS
  Kernel: 4.15.0-43-generic
  Qemu: 3.1.0
  libvirt: 4.10.0
  Guest OS: Windows 10 64bit

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1810603/+subscriptions

[Qemu-devel] [PATCH] spice: Remove unused include

2019-01-07 Thread Frediano Ziglio

The definitions in the header are not used.
Also this fixes porting SPICE to Windows where the header is not
available.

Signed-off-by: Frediano Ziglio 
---
 ui/spice-core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/ui/spice-core.c b/ui/spice-core.c
index 525e0929b9..fb87944a24 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -18,7 +18,6 @@
 #include "qemu/osdep.h"
 #include 
 
-#include 
 #include "sysemu/sysemu.h"
 
 #include "ui/qemu-spice.h"
-- 
2.20.1

Re: [Qemu-devel] [PATCH v2] qemu-io: Reinitialize optind to 1 (not 0) before parsing inner command.

2019-01-07 Thread Eric Blake

On 1/7/19 12:14 PM, Max Reitz wrote:
> On 07.01.19 18:59, Eric Blake wrote:
>> On 1/7/19 11:50 AM, Max Reitz wrote:
>>
>> Note I didn't set optreset.  It's not present in glibc and the "hard
>> reset" is not necessary in this context.
>
> But it sure sounds like FreeBSD requires you to set it, doesn't it?
>>
>> No.  Quoting https://www.freebsd.org/cgi/man.cgi?getopt(3)
>>
>>  The variables opterr and optind are both initialized to 1.  The optind
>>  variable may be set to another value before a set of calls to getopt() in
>>  order to skip over more or less argv entries.
>>
>> so resetting it to 1 as a soft reset is no different to setting it to 2
>> to skip argv[1].
> 
> In theory it is very much different because the text clearly says "in
> order to skip", not "in order to re-parse or use a different argv".
> Especially the fact that we use different argvs is something that
> implementations may not expect.

Consider the following input:

./prog -ab -cd -ef

against

while ((opt = getopt(argc, argv, "abcdef")) != -1) {
  switch (opt) {
   case 'a': case 'b': case 'f': break;
   case 'c': optind = 3; break;
   case 'd': case 'e': abort();
  }
}

What does that do on BSD?  On glibc, after the third call, optind is
still 2 (but I hard-set it to 3), then the fourth call returns 'd' and
increments optind to 4, before abort()ing, never reaching 'e' or 'f'.
But if BSD goes from 'c' to 'f' and skips 'd' and 'e', it is because BSD
tracks internal state differently from glibc.  Either way, the fact that
setting optind = 3 does NOT make glibc return 'e' or 'f' means that I
did NOT skip ahead to argument 3 (glibc still returned 'd' then skipped
to argument 4; either BSD does the same, or BSD skips to 'f'), and thus
I can argue that the BSD man page is incomplete, and SHOULD be corrected
to mention that assigning to optind to skip to a future argument is safe
ONLY when the hidden state is not affected by being mid-parse of merged
short options.  But how do you get out of the hidden state of merged
short options? By parsing until getopt() returns -1.  And once you've
reached that point, then hidden state is clear, and skipping backwards
is just as reasonable as skipping forwards.

>> I think the BSD man page needs updating, and that will probably happen
>> if I file my promised POSIX defect.
> 
> Sure.  But as it is, it doesn't tell me that resetting optind to 1 is
> sufficient to be able to parse a new argv.

But arguing that something that worked for Richard's testing is wrong,
without reading the BSD source code, isn't going to help us either.

>> I don't see the point - Richard has already tested that optind = 1
>> worked on BSD machines for our purposes, so we don't have to worry about
>> the hard reset aspect of optreset=1.
> 
> Well, and as far as I remember glibc's memcpy() at one point only copied
> in one direction and things broke badly once they reversed it at some
> point for some CPUs.

That was because of buggy software that didn't read the function
contracts, and should have been using memmove() instead.

> 
> Just because it works now doesn't mean it will work always if the
> specification allows for different behavior.

Yes, but that's why I need to file a POSIX defect, so that BSD won't
change their current behavior because POSIX will require the soft reset
behavior.

Here's the current thread on the POSIX list:
https://www.mail-archive.com/austin-group-l@opengroup.org/msg03210.html

which I hope to turn into a formal defect soon.

> 
>> (But yes, it would also be nice if
>> BSD and glibc folks could agree on how to do hard resets, instead of
>> having two different incompatible ways)
> I don't see why we should have a general code path if there is no
> standard way of resetting getopt() other than "This seems to work".
> What's so bad about a weak optreset or an
> "#ifdef __FreeBSD__; optreset = 1; #endif"?
> 
> Sure, if you can get POSIX to define the fact that optind = 1 after
> getopt() == -1 will be sufficient to start parsing a new argv, that'd be
> great.  But there is no such standard yet (other than "Why would that
> not work?").

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


Re: [Qemu-devel] [PATCH v2] qemu-io: Reinitialize optind to 1 (not 0) before parsing inner command.

2019-01-07 Thread Richard W.M. Jones

On Mon, Jan 07, 2019 at 06:50:53PM +0100, Max Reitz wrote:
[...]

I don't particularly care how we fix this, but it breaks the nbdkit
tests on FreeBSD so I am keen to fix it one way or another.

> And if optreset not being available for glibc is the only issue, I'd say
> adding it as a weak global variable would work without #ifdefs.

The weak global variable doesn't make the code "#ifdef free".  I tried
a patch like this:

+int optreset __attribute__((weak));

...

 static int command(...)
 {
   ...
   optind = 0;
+  optreset = 1;
  ...
 }

but that still doesn't work on FreeBSD.

You have to set optind=1 apparently.  So if we want to set optreset=1
we still end up with #ifdef __FreeBSD__.  The final patch will
end up looking something like:

 static int command(...)
 {
   ...
+#ifdef __FreeBSD__
+  optind = 1;
+  optreset = 1;
+#else
   optind = 0;
+#endif
  ...
 }

If you want me to submit a formal patch like this let me know.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org

[Qemu-devel] [PATCH 10/13] spapr: introduce routines to delete the KVM IRQ device

2019-01-07 Thread Cédric Le Goater

If a new interrupt mode is chosen by CAS, the machine generates a
reset to reconfigure. At this point, the connection with the previous
KVM device needs to be closed and a new connection needs to be opened
with the KVM device operating the chosen interrupt mode.

New routines are introduced to destroy the XICS and the XIVE KVM
devices. They make use of a new KVM device ioctl which destroys the
device and also disconnects the IRQ presenters from the vCPUs.

Signed-off-by: Cédric Le Goater 
---
 include/hw/ppc/spapr_xive.h |  1 +
 include/hw/ppc/xics.h   |  1 +
 hw/intc/spapr_xive_kvm.c| 60 +
 hw/intc/xics_kvm.c  | 57 +++
 4 files changed, 119 insertions(+)

diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 52804516e909..f172fc20b650 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -70,6 +70,7 @@ void spapr_xive_map_mmio(sPAPRXive *xive);
  * KVM XIVE device helpers
  */
 void kvmppc_xive_connect(sPAPRXive *xive, Error **errp);
+void kvmppc_xive_disconnect(sPAPRXive *xive, Error **errp);
 void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp);
 int kvmppc_xive_pre_save(sPAPRXive *xive);
 int kvmppc_xive_post_load(sPAPRXive *xive, int version_id);
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 07508cbd217e..75d4effb5c5f 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -205,6 +205,7 @@ typedef struct sPAPRMachineState sPAPRMachineState;
 void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
uint32_t phandle);
 int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
+int xics_kvm_disconnect(sPAPRMachineState *spapr, Error **errp);
 void xics_spapr_init(sPAPRMachineState *spapr);
 
 Object *icp_create(Object *cpu, const char *type, XICSFabric *xi,
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index fe58a9ee32d3..93ea8e71047a 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -57,6 +57,16 @@ static void kvm_cpu_enable(CPUState *cs)
 QLIST_INSERT_HEAD(&kvm_enabled_cpus, enabled_cpu, node);
 }
 
+static void kvm_cpu_disable_all(void)
+{
+KVMEnabledCPU *enabled_cpu, *next;
+
+QLIST_FOREACH_SAFE(enabled_cpu, &kvm_enabled_cpus, node, next) {
+QLIST_REMOVE(enabled_cpu, node);
+g_free(enabled_cpu);
+}
+}
+
 /*
  * XIVE Thread Interrupt Management context (KVM)
  */
@@ -769,3 +779,53 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
 /* Map all regions */
 spapr_xive_map_mmio(xive);
 }
+
+void kvmppc_xive_disconnect(sPAPRXive *xive, Error **errp)
+{
+XiveSource *xsrc;
+struct kvm_create_device xive_destroy_device = { 0 };
+size_t esb_len;
+int rc;
+
+if (!kvm_enabled() || !kvmppc_has_cap_xive()) {
+error_setg(errp, "IRQ_XIVE capability must be present for KVM");
+return;
+}
+
+/* The KVM XIVE device is not in use */
+if (!xive || xive->fd == -1) {
+return;
+}
+
+/* Clear the KVM mapping */
+xsrc = &xive->source;
+esb_len = (1ull << xsrc->esb_shift) * xsrc->nr_irqs;
+
+sysbus_mmio_unmap(SYS_BUS_DEVICE(xive), 0);
+munmap(xsrc->esb_mmap, esb_len);
+
+sysbus_mmio_unmap(SYS_BUS_DEVICE(xive), 1);
+
+sysbus_mmio_unmap(SYS_BUS_DEVICE(xive), 2);
+munmap(xive->tm_mmap, 4ull << TM_SHIFT);
+
+/* Destroy the KVM device. This also clears the VCPU presenters */
+xive_destroy_device.fd = xive->fd;
+xive_destroy_device.type = KVM_DEV_TYPE_XIVE;
+rc = kvm_vm_ioctl(kvm_state, KVM_DESTROY_DEVICE, &xive_destroy_device);
+if (rc < 0) {
+error_setg_errno(errp, -rc, "Error on KVM_DESTROY_DEVICE for XIVE");
+}
+close(xive->fd);
+xive->fd = -1;
+
+kvm_kernel_irqchip = false;
+kvm_msi_via_irqfd_allowed = false;
+kvm_gsi_direct_mapping = false;
+
+/* Clear the local list of presenter (hotplug) */
+kvm_cpu_disable_all();
+
+/* VM Change state handler is not needed anymore */
+qemu_del_vm_change_state_handler(xive->change);
+}
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index 2426e5b2f4ed..da6a00bc88cc 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -50,6 +50,16 @@ typedef struct KVMEnabledICP {
 static QLIST_HEAD(, KVMEnabledICP)
 kvm_enabled_icps = QLIST_HEAD_INITIALIZER(&kvm_enabled_icps);
 
+static void kvm_disable_icps(void)
+{
+KVMEnabledICP *enabled_icp, *next;
+
+QLIST_FOREACH_SAFE(enabled_icp, &kvm_enabled_icps, node, next) {
+QLIST_REMOVE(enabled_icp, node);
+g_free(enabled_icp);
+}
+}
+
 /*
  * ICP-KVM
  */
@@ -455,6 +465,53 @@ fail:
 return -1;
 }
 
+int xics_kvm_disconnect(sPAPRMachineState *spapr, Error **errp)
+{
+int rc;
+struct kvm_create_device xics_create_device = {
+.fd = kernel_xics_fd,
+.type = KVM_DEV_TYPE_XICS,
+.flags = 0,
+};
+
+/* The KVM XICS device is not in

[Qemu-devel] [PATCH 09/13] sysbus: add a sysbus_mmio_unmap() helper

2019-01-07 Thread Cédric Le Goater

This will be used to remove the MMIO regions of the POWER9 XIVE
interrupt controller when the sPAPR machine is reset.

Signed-off-by: Cédric Le Goater 
Reviewed-by: David Gibson 
Signed-off-by: Cédric Le Goater 
---
 include/hw/sysbus.h |  1 +
 hw/core/sysbus.c| 10 ++
 2 files changed, 11 insertions(+)

diff --git a/include/hw/sysbus.h b/include/hw/sysbus.h
index 1aedcf05c92b..4c668fbbdc60 100644
--- a/include/hw/sysbus.h
+++ b/include/hw/sysbus.h
@@ -89,6 +89,7 @@ qemu_irq sysbus_get_connected_irq(SysBusDevice *dev, int n);
 void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr);
 void sysbus_mmio_map_overlap(SysBusDevice *dev, int n, hwaddr addr,
  int priority);
+void sysbus_mmio_unmap(SysBusDevice *dev, int n);
 void sysbus_add_io(SysBusDevice *dev, hwaddr addr,
MemoryRegion *mem);
 MemoryRegion *sysbus_address_space(SysBusDevice *dev);
diff --git a/hw/core/sysbus.c b/hw/core/sysbus.c
index 9f9edbcab96f..f90d87b058c3 100644
--- a/hw/core/sysbus.c
+++ b/hw/core/sysbus.c
@@ -153,6 +153,16 @@ static void sysbus_mmio_map_common(SysBusDevice *dev, int n, hwaddr addr,
 }
 }
 
+void sysbus_mmio_unmap(SysBusDevice *dev, int n)
+{
+assert(n >= 0 && n < dev->num_mmio);
+
+if (dev->mmio[n].addr != (hwaddr)-1) {
+memory_region_del_subregion(get_system_memory(), dev->mmio[n].memory);
+dev->mmio[n].addr = (hwaddr)-1;
+}
+}
+
 void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr)
 {
 sysbus_mmio_map_common(dev, n, addr, false, 0);
-- 
2.20.1

[Qemu-devel] [PATCH 13/13] spapr: add KVM support to the 'dual' machine

2019-01-07 Thread Cédric Le Goater

The interrupt mode is chosen by the CAS negotiation process and
activated after a reset to take into account the required changes in
the machine. This brings new constraints on how the associated KVM IRQ
device is initialized.

Currently, each model takes care of the initialization of the KVM
device in its realize method, but this is no longer possible because the
initialization needs to be done globally when the interrupt mode is
known, i.e. when the machine is reset. It also means that we need a way
to delete a KVM device when another mode is chosen.

Also, to support migration, the QEMU objects holding the state to
transfer should always be available but not necessarily activated.

The overall approach of this proposal is to initialize both interrupt
modes at the QEMU level and keep the IRQ number space in sync to allow
switching from one mode to another. For the KVM side of things, the
whole initialization of the KVM device, sources and presenters, is
grouped in a single routine. The XICS and XIVE sPAPR IRQ reset
handlers are modified accordingly to handle the init and the delete
sequences of the KVM device.

As KVM is now initialized at reset, we lose the possibility to
fall back to the QEMU emulated mode in case of failure, and failures
become fatal to the machine.
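
The init and delete sequences described above can be sketched with a
toy model. All names below (toy_kvm_dev, toy_connect(), toy_disconnect(),
toy_machine_reset()) are hypothetical stand-ins, not QEMU types or
functions, and the ordering inside the real reset handlers is an
assumption. The sketch only shows the two properties the text relies
on: connecting is a no-op when the device already exists (same mode
across a reboot), and switching modes deletes the other device first.

```c
#include <assert.h>

/* Toy stand-ins for the two interrupt modes; not QEMU types. */
enum irq_mode { MODE_XICS, MODE_XIVE };

struct toy_kvm_dev {
    int fd;        /* -1 means "KVM device not in use" */
    int creations; /* how many times the device was (re)created */
};

struct toy_machine {
    struct toy_kvm_dev xics;
    struct toy_kvm_dev xive;
};

static int toy_next_fd = 100; /* fake descriptor allocator for the sketch */

/* Idempotent: a reboot into the same mode keeps the existing device. */
static void toy_connect(struct toy_kvm_dev *dev)
{
    if (dev->fd != -1) {
        return;
    }
    dev->fd = toy_next_fd++;
    dev->creations++;
}

/* Safe to call when the device was never connected. */
static void toy_disconnect(struct toy_kvm_dev *dev)
{
    if (dev->fd == -1) {
        return;
    }
    dev->fd = -1;
}

/* Called at machine reset, once the interrupt mode is known. */
static void toy_machine_reset(struct toy_machine *m, enum irq_mode mode)
{
    if (mode == MODE_XIVE) {
        toy_disconnect(&m->xics);
        toy_connect(&m->xive);
    } else {
        toy_disconnect(&m->xive);
        toy_connect(&m->xics);
    }
    /* Exactly one KVM device is active after reset. */
    assert((m->xics.fd == -1) != (m->xive.fd == -1));
}
```

In this model a failing toy_connect() has nowhere to fall back to,
which mirrors the point that failures at reset become fatal.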

Signed-off-by: Cédric Le Goater 
---
 hw/intc/spapr_xive.c |  8 +---
 hw/intc/spapr_xive_kvm.c | 27 ++
 hw/intc/xics_kvm.c   | 25 +
 hw/intc/xive.c   |  4 --
 hw/ppc/spapr_irq.c   | 79 
 5 files changed, 109 insertions(+), 34 deletions(-)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 21f3c1ef0901..0661aca35900 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -330,13 +330,7 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
 xive->eat = g_new0(XiveEAS, xive->nr_irqs);
 xive->endt = g_new0(XiveEND, xive->nr_ends);
 
-if (kvmppc_xive_enabled()) {
-kvmppc_xive_connect(xive, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
-} else {
+if (!kvmppc_xive_enabled()) {
 /* TIMA initialization */
 memory_region_init_io(&xive->tm_mmio, OBJECT(xive), &xive_tm_ops, xive,
   "xive.tima", 4ull << TM_SHIFT);
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index d35814c1992e..3ebc947f2be7 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -737,6 +737,15 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
 Error *local_err = NULL;
 size_t esb_len;
 size_t tima_len;
+CPUState *cs;
+
+/*
+ * The KVM XIVE device already in use. This is the case when
+ * rebooting XIVE -> XIVE
+ */
+if (xive->fd != -1) {
+return;
+}
 
 if (!kvm_enabled() || !kvmppc_has_cap_xive()) {
 error_setg(errp, "IRQ_XIVE capability must be present for KVM");
@@ -800,6 +809,24 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
 xive->change = qemu_add_vm_change_state_handler(
 kvmppc_xive_change_state_handler, xive);
 
+/* Connect the presenters to the initial VCPUs of the machine */
+CPU_FOREACH(cs) {
+PowerPCCPU *cpu = POWERPC_CPU(cs);
+
+kvmppc_xive_cpu_connect(cpu->tctx, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+
+/* Update the KVM sources */
+kvmppc_xive_source_reset(xsrc, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
 kvm_kernel_irqchip = true;
 kvm_msi_via_irqfd_allowed = true;
 kvm_gsi_direct_mapping = true;
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index 1d21ff217b82..bfc35d71df7f 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -448,6 +448,16 @@ static void rtas_dummy(PowerPCCPU *cpu, sPAPRMachineState *spapr,
 int xics_kvm_init(sPAPRMachineState *spapr, Error **errp)
 {
 int rc;
+CPUState *cs;
+Error *local_err = NULL;
+
+/*
+ * The KVM XICS device already in use. This is the case when
+ * rebooting XICS -> XICS
+ */
+if (kernel_xics_fd != -1) {
+return 0;
+}
 
 if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_IRQ_XICS)) {
 error_setg(errp,
@@ -496,6 +506,21 @@ int xics_kvm_init(sPAPRMachineState *spapr, Error **errp)
 kvm_msi_via_irqfd_allowed = true;
 kvm_gsi_direct_mapping = true;
 
+/* Connect the presenters to the initial VCPUs of the machine */
+CPU_FOREACH(cs) {
+PowerPCCPU *cpu = POWERPC_CPU(cs);
+
+icp_kvm_connect(cpu->icp, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto fail;
+}
+icp_set_kvm_state(cpu->icp, 1);
+}
+
+/* Update the KVM sources */
+ics_set_kvm_state(ICS_KVM(spapr->ics), 1);
+

[Qemu-devel] [PATCH 11/13] spapr: check for the activation of the KVM IRQ device

2019-01-07 Thread Cédric Le Goater

The activation of the KVM IRQ device depends on the interrupt mode
chosen at CAS time by the machine, so some methods used at reset or
during migration need to be guarded against the device being inactive.

Signed-off-by: Cédric Le Goater 
---
 hw/intc/spapr_xive_kvm.c | 28 
 hw/intc/xics_kvm.c   | 25 -
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index 93ea8e71047a..d35814c1992e 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -95,9 +95,15 @@ static void kvmppc_xive_cpu_set_state(XiveTCTX *tctx, Error **errp)
 
 void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp)
 {
+sPAPRXive *xive = SPAPR_MACHINE(qdev_get_machine())->xive;
 uint64_t state[4] = { 0 };
 int ret;
 
+/* The KVM XIVE device is not in use */
+if (xive->fd == -1) {
+return;
+}
+
 ret = kvm_get_one_reg(tctx->cs, KVM_REG_PPC_NVT_STATE, state);
 if (ret != 0) {
 error_setg_errno(errp, errno,
@@ -151,6 +157,11 @@ void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp)
 unsigned long vcpu_id;
 int ret;
 
+/* The KVM XIVE device is not in use */
+if (xive->fd == -1) {
+return;
+}
+
 /* Check if CPU was hot unplugged and replugged. */
 if (kvm_cpu_is_enabled(tctx->cs)) {
 return;
@@ -234,9 +245,13 @@ static void kvmppc_xive_source_get_state(XiveSource *xsrc)
 void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
 {
 XiveSource *xsrc = opaque;
+sPAPRXive *xive = SPAPR_XIVE(xsrc->xive);
 struct kvm_irq_level args;
 int rc;
 
+/* The KVM XIVE device should be in use */
+assert(xive->fd != -1);
+
 args.irq = srcno;
 if (!xive_source_irq_is_lsi(xsrc, srcno)) {
 if (!val) {
@@ -580,6 +595,11 @@ int kvmppc_xive_pre_save(sPAPRXive *xive)
 Error *local_err = NULL;
 CPUState *cs;
 
+/* The KVM XIVE device is not in use */
+if (xive->fd == -1) {
+return 0;
+}
+
 /* Grab the EAT */
 kvmppc_xive_get_eas_state(xive, &local_err);
 if (local_err) {
@@ -612,6 +632,9 @@ int kvmppc_xive_post_load(sPAPRXive *xive, int version_id)
 Error *local_err = NULL;
 CPUState *cs;
 
+/* The KVM XIVE device should be in use */
+assert(xive->fd != -1);
+
 /* Restore the ENDT first. The targetting depends on it. */
 CPU_FOREACH(cs) {
 kvmppc_xive_set_eq_state(xive, cs, &local_err);
@@ -649,6 +672,11 @@ void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp)
 CPUState *cs;
 Error *local_err = NULL;
 
+/* The KVM XIVE device is not in use */
+if (xive->fd == -1) {
+return;
+}
+
 /*
  * When the VM is stopped, the sources are masked and the previous
  * state is saved in anticipation of a migration. We should not
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index da6a00bc88cc..651bbfdf6966 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -68,6 +68,11 @@ static void icp_get_kvm_state(ICPState *icp)
 uint64_t state;
 int ret;
 
+/* The KVM XICS device is not in use */
+if (kernel_xics_fd == -1) {
+return;
+}
+
 /* ICP for this CPU thread is not in use, exiting */
 if (!icp->cs) {
 return;
@@ -104,6 +109,11 @@ static int icp_set_kvm_state(ICPState *icp, int version_id)
 uint64_t state;
 int ret;
 
+/* The KVM XICS device is not in use */
+if (kernel_xics_fd == -1) {
+return 0;
+}
+
 /* ICP for this CPU thread is not in use, exiting */
 if (!icp->cs) {
 return 0;
@@ -140,8 +150,8 @@ static void icp_kvm_connect(ICPState *icp, Error **errp)
 unsigned long vcpu_id;
 int ret;
 
+/* The KVM XICS device is not in use */
 if (kernel_xics_fd == -1) {
-abort();
 return;
 }
 
@@ -220,6 +230,11 @@ static void ics_get_kvm_state(ICSState *ics)
 uint64_t state;
 int i;
 
+/* The KVM XICS device is not in use */
+if (kernel_xics_fd == -1) {
+return;
+}
+
 for (i = 0; i < ics->nr_irqs; i++) {
 ICSIRQState *irq = &ics->irqs[i];
 
@@ -279,6 +294,11 @@ static int ics_set_kvm_state(ICSState *ics, int version_id)
 int i;
 Error *local_err = NULL;
 
+/* The KVM XICS device is not in use */
+if (kernel_xics_fd == -1) {
+return 0;
+}
+
 for (i = 0; i < ics->nr_irqs; i++) {
 ICSIRQState *irq = &ics->irqs[i];
 int ret;
@@ -325,6 +345,9 @@ void ics_kvm_set_irq(void *opaque, int srcno, int val)
 struct kvm_irq_level args;
 int rc;
 
+/* The KVM XICS device should be in use */
+assert(kernel_xics_fd != -1);
+
 args.irq = srcno + ics->offset;
 if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_MSI) {
 if (!val) {
-- 
2.20.1

[Qemu-devel] [PATCH 05/13] spapr/xive: add migration support for KVM

2019-01-07 Thread Cédric Le Goater

When the VM is stopped, the VM state handler stabilizes the XIVE IC
and marks the EQ pages dirty. These are then transferred to the
destination before the transfer of the device vmstates starts.

The sPAPRXive interrupt controller model captures the XIVE internal
tables, EAT and ENDT and the XiveTCTX model does the same for the
thread interrupt context registers.

At restart, the sPAPRXive 'post_load' method restores all the XIVE
states. It is called by the sPAPR machine 'post_load' method, when all
XIVE states have been transferred and loaded.

Finally, the source states are restored in the VM change state handler
when the machine reaches the running state.
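
One helper added by this patch, spapr_xive_end_to_target(), inverts
the sPAPR END indexing (8 priorities per CPU). The sketch below is a
stand-alone illustration of that arithmetic. The toy_* names are
hypothetical, and the forward mapping is an assumption inferred from
the shift and mask used in the inverse, not copied from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Forward mapping, assumed from "8 priorities per CPU". */
static uint32_t toy_target_to_end_idx(uint32_t server, uint8_t prio)
{
    assert(prio < 8);
    return (server << 3) | prio;
}

/* Inverse, mirroring the shift and mask in spapr_xive_end_to_target(). */
static void toy_end_idx_to_target(uint32_t end_idx,
                                  uint32_t *out_server, uint8_t *out_prio)
{
    if (out_server) {
        *out_server = end_idx >> 3;
    }
    if (out_prio) {
        *out_prio = end_idx & 0x7;
    }
}
```

With these definitions, server 5 at priority 3 lands on END index
(5 << 3) | 3 = 43, and decoding 43 gives back server 5 and priority 3.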

Signed-off-by: Cédric Le Goater 
---
 include/hw/ppc/spapr_xive.h |   5 +
 include/hw/ppc/xive.h   |   1 +
 hw/intc/spapr_xive.c|  34 +++
 hw/intc/spapr_xive_kvm.c| 187 +++-
 hw/intc/xive.c  |  17 
 hw/ppc/spapr_irq.c  |   2 +-
 6 files changed, 244 insertions(+), 2 deletions(-)

diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 8815ed5aa372..52804516e909 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -46,6 +46,7 @@ bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
 bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn);
 void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
 bool spapr_xive_priority_is_reserved(uint8_t priority);
+int spapr_xive_post_load(sPAPRXive *xive, int version_id);
 
 void spapr_xive_cpu_to_nvt(PowerPCCPU *cpu,
uint8_t *out_nvt_blk, uint32_t *out_nvt_idx);
@@ -53,6 +54,8 @@ void spapr_xive_cpu_to_end(PowerPCCPU *cpu, uint8_t prio,
uint8_t *out_end_blk, uint32_t *out_end_idx);
 int spapr_xive_target_to_end(uint32_t target, uint8_t prio,
  uint8_t *out_end_blk, uint32_t *out_end_idx);
+int spapr_xive_end_to_target(uint8_t end_blk, uint32_t end_idx,
+ uint32_t *out_server, uint8_t *out_prio);
 
 typedef struct sPAPRMachineState sPAPRMachineState;
 
@@ -68,5 +71,7 @@ void spapr_xive_map_mmio(sPAPRXive *xive);
  */
 void kvmppc_xive_connect(sPAPRXive *xive, Error **errp);
 void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp);
+int kvmppc_xive_pre_save(sPAPRXive *xive);
+int kvmppc_xive_post_load(sPAPRXive *xive, int version_id);
 
 #endif /* PPC_SPAPR_XIVE_H */
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 2e48d75a22e0..8aa314f93ffd 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -443,5 +443,6 @@ void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp);
 void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val);
 void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp);
 void kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx, Error **errp);
+void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp);
 
 #endif /* PPC_XIVE_H */
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 50dd66707968..21f3c1ef0901 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -85,6 +85,19 @@ static int spapr_xive_target_to_nvt(uint32_t target,
  * sPAPR END indexing uses a simple mapping of the CPU vcpu_id, 8
  * priorities per CPU
  */
+int spapr_xive_end_to_target(uint8_t end_blk, uint32_t end_idx,
+ uint32_t *out_server, uint8_t *out_prio)
+{
+if (out_server) {
+*out_server = end_idx >> 3;
+}
+
+if (out_prio) {
+*out_prio = end_idx & 0x7;
+}
+return 0;
+}
+
 void spapr_xive_cpu_to_end(PowerPCCPU *cpu, uint8_t prio,
uint8_t *out_end_blk, uint32_t *out_end_idx)
 {
@@ -438,10 +451,31 @@ static const VMStateDescription vmstate_spapr_xive_eas = {
 },
 };
 
+static int vmstate_spapr_xive_pre_save(void *opaque)
+{
+if (kvmppc_xive_enabled()) {
+return kvmppc_xive_pre_save(SPAPR_XIVE(opaque));
+}
+
+return 0;
+}
+
+/* Called by the sPAPR machine 'post_load' method */
+int spapr_xive_post_load(sPAPRXive *xive, int version_id)
+{
+if (kvmppc_xive_enabled()) {
+return kvmppc_xive_post_load(xive, version_id);
+}
+
+return 0;
+}
+
 static const VMStateDescription vmstate_spapr_xive = {
 .name = TYPE_SPAPR_XIVE,
 .version_id = 1,
 .minimum_version_id = 1,
+.pre_save = vmstate_spapr_xive_pre_save,
+.post_load = NULL, /* handled at the machine level */
 .fields = (VMStateField[]) {
 VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
 VMSTATE_STRUCT_VARRAY_POINTER_UINT32(eat, sPAPRXive, nr_irqs,
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index c7639ffe7758..fe58a9ee32d3 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -60,7 +60,30 @@ static void kvm_cpu_enable(CPUState *cs)
 /*
  * XIVE Thread Interrupt Management context (KVM)
  */
-static void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **e

[Qemu-devel] [PATCH 07/13] ppc/xics: introduce a icp_kvm_connect() routine

2019-01-07 Thread Cédric Le Goater

This routine gathers all the KVM initialization of the XICS KVM
presenter. It will be useful when the initialization of the KVM XICS
device is moved to a global routine.

Signed-off-by: Cédric Le Goater 
---
 hw/intc/xics_kvm.c | 29 -
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index ac94594b1919..2426e5b2f4ed 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -123,11 +123,8 @@ static void icp_kvm_reset(DeviceState *dev)
 icp_set_kvm_state(ICP(dev), 1);
 }
 
-static void icp_kvm_realize(DeviceState *dev, Error **errp)
+static void icp_kvm_connect(ICPState *icp, Error **errp)
 {
-ICPState *icp = ICP(dev);
-ICPStateClass *icpc = ICP_GET_CLASS(icp);
-Error *local_err = NULL;
 CPUState *cs;
 KVMEnabledICP *enabled_icp;
 unsigned long vcpu_id;
@@ -135,11 +132,6 @@ static void icp_kvm_realize(DeviceState *dev, Error **errp)
 
 if (kernel_xics_fd == -1) {
 abort();
-}
-
-icpc->parent_realize(dev, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
 return;
 }
 
@@ -168,6 +160,25 @@ static void icp_kvm_realize(DeviceState *dev, Error **errp)
 QLIST_INSERT_HEAD(&kvm_enabled_icps, enabled_icp, node);
 }
 
+static void icp_kvm_realize(DeviceState *dev, Error **errp)
+{
+ICPStateClass *icpc = ICP_GET_CLASS(dev);
+Error *local_err = NULL;
+
+icpc->parent_realize(dev, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+/* Connect the presenter to the VCPU (required for CPU hotplug) */
+icp_kvm_connect(ICP(dev), &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+
 static void icp_kvm_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-- 
2.20.1

[Qemu-devel] [PATCH 12/13] spapr/xics: ignore the lower 4K in the IRQ number space

2019-01-07 Thread Cédric Le Goater

The IRQ number spaces of the XIVE and XICS interrupt modes are aligned
when using the dual interrupt mode for the machine. This means that
the ICS offset is set to zero in QEMU and that the KVM XICS device
should be informed of this new value. Unfortunately, there is no way
to do so and KVM still maintains the XICS_IRQ_BASE (0x1000) offset.

Ignore the lower 4K which are not used under the XICS interrupt
mode. These IRQ numbers are only claimed by XIVE for the CPU IPIs.
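
As a rough illustration of the effect, the sketch below counts how
many sources would still be synchronized with KVM once the check is
in place. The names toy_count_synced() and TOY_XICS_IRQ_BASE are
hypothetical; only the 0x1000 value and the comparison come from the
patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_XICS_IRQ_BASE 0x1000 /* same value as XICS_IRQ_BASE */

/* Count the sources of an ICS that would be synced with KVM. */
static uint32_t toy_count_synced(uint32_t offset, uint32_t nr_irqs)
{
    uint32_t i, synced = 0;

    assert(offset <= UINT32_MAX - nr_irqs); /* i + offset cannot wrap */

    for (i = 0; i < nr_irqs; i++) {
        if (i + offset < TOY_XICS_IRQ_BASE) {
            continue; /* below the KVM XICS base: skipped */
        }
        synced++;
    }
    return synced;
}
```

With an offset of zero and 0x1400 sources, only the upper 0x400 are
synced; with the old offset of 0x1000 nothing is skipped.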

Signed-off-by: Cédric Le Goater 
---
 hw/intc/xics_kvm.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index 651bbfdf6966..1d21ff217b82 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -238,6 +238,15 @@ static void ics_get_kvm_state(ICSState *ics)
 for (i = 0; i < ics->nr_irqs; i++) {
 ICSIRQState *irq = &ics->irqs[i];
 
+/*
+ * The KVM XICS device considers that the IRQ numbers should
+ * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K
+ * numbers (only claimed by XIVE for the CPU IPIs).
+ */
+if (i + ics->offset < XICS_IRQ_BASE) {
+continue;
+}
+
 kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES,
   i + ics->offset, &state, false, &error_fatal);
 
@@ -303,6 +312,15 @@ static int ics_set_kvm_state(ICSState *ics, int version_id)
 ICSIRQState *irq = &ics->irqs[i];
 int ret;
 
+/*
+ * The KVM XICS device considers that the IRQ numbers should
+ * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K
+ * numbers (only claimed by XIVE for the CPU IPIs).
+ */
+if (i + ics->offset < XICS_IRQ_BASE) {
+continue;
+}
+
 state = irq->server;
 state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK)
 << KVM_XICS_PRIORITY_SHIFT;
-- 
2.20.1

[Qemu-devel] [PATCH 04/13] spapr/xive: introduce a VM state change handler

2019-01-07 Thread Cédric Le Goater

This handler is in charge of stabilizing the flow of event notifications
in the XIVE controller before migrating a guest. This is a requirement
before transferring the guest EQ pages to a destination.

When the VM is stopped, the handler masks the sources (PQ=01) to stop
the flow of events and saves their previous state. The XIVE controller
is then synced through KVM to flush any in-flight event notification
and to stabilize the EQs. At this stage, the EQ pages are marked dirty
to make sure the EQ pages are transferred if a migration sequence is
in progress.

The previous configuration of the sources is restored when the VM
resumes, after a migration or a stop.
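
The mask and restore sequence can be modelled without any XIVE
hardware. In the sketch below, toy_esb_set_pq() is a hypothetical
stand-in for an ESB "set PQ" load, assumed to install the new PQ bits
and return the previous ones; the structure and function names are
illustrative only. On resume, each source is expected to be found
masked (PQ=01), otherwise it is counted as invalid.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_IRQS   4
#define TOY_PQ_MASKED 0x1 /* PQ=01, the masked state used at stop time */

struct toy_source {
    uint8_t hw_pq[TOY_NR_IRQS];    /* state held by the (fake) controller */
    uint8_t saved_pq[TOY_NR_IRQS]; /* local copy kept while the VM is stopped */
};

/* Model of an ESB "set PQ" load: installs new_pq, returns the old value. */
static uint8_t toy_esb_set_pq(struct toy_source *s, int irq, uint8_t new_pq)
{
    uint8_t old;

    assert(irq >= 0 && irq < TOY_NR_IRQS && new_pq <= 0x3);
    old = s->hw_pq[irq];
    s->hw_pq[irq] = new_pq;
    return old;
}

/* VM stopped: mask every source and remember its previous state. */
static void toy_vm_stop(struct toy_source *s)
{
    for (int i = 0; i < TOY_NR_IRQS; i++) {
        s->saved_pq[i] = toy_esb_set_pq(s, i, TOY_PQ_MASKED);
    }
}

/* VM running again: restore the saved state; count unexpected sources. */
static int toy_vm_resume(struct toy_source *s)
{
    int invalid = 0;

    for (int i = 0; i < TOY_NR_IRQS; i++) {
        if (toy_esb_set_pq(s, i, s->saved_pq[i]) != TOY_PQ_MASKED) {
            invalid++;
        }
    }
    return invalid;
}
```

The sync and the dirtying of the EQ pages happen between the two
steps and are not modelled here.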

Signed-off-by: Cédric Le Goater 
---
 include/hw/ppc/spapr_xive.h |   1 +
 hw/intc/spapr_xive_kvm.c| 111 +++-
 2 files changed, 111 insertions(+), 1 deletion(-)

diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 02f2de20111c..8815ed5aa372 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -39,6 +39,7 @@ typedef struct sPAPRXive {
 /* KVM support */
 int   fd;
 void  *tm_mmap;
+VMChangeStateEntry *change;
 } sPAPRXive;
 
 bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index f52bddc92a2a..c7639ffe7758 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -350,13 +350,119 @@ static void kvmppc_xive_get_eas_state(sPAPRXive *xive, Error **errp)
 }
 }
 
+/*
+ * Sync the XIVE controller through KVM to flush any in-flight event
+ * notification and stabilize the EQs.
+ */
+ static void kvmppc_xive_sync_all(sPAPRXive *xive, Error **errp)
+{
+XiveSource *xsrc = &xive->source;
+Error *local_err = NULL;
+int i;
+
+/* Sync the KVM source. This reaches the XIVE HW through OPAL */
+for (i = 0; i < xsrc->nr_irqs; i++) {
+XiveEAS *eas = &xive->eat[i];
+
+if (!xive_eas_is_valid(eas)) {
+continue;
+}
+
+kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_SYNC, i, NULL, true,
+  &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+}
+
+/*
+ * The primary goal of the XIVE VM change handler is to mark the EQ
+ * pages dirty when all XIVE event notifications have stopped.
+ *
+ * Whenever the VM is stopped, the VM change handler masks the sources
+ * (PQ=01) to stop the flow of events and saves the previous state in
+ * anticipation of a migration. The XIVE controller is then synced
+ * through KVM to flush any in-flight event notification and stabilize
+ * the EQs.
+ *
+ * At this stage, we can mark the EQ page dirty and let a migration
+ * sequence transfer the EQ pages to the destination, which is done
+ * just after the stop state.
+ *
+ * The previous configuration of the sources is restored when the VM
+ * runs again.
+ */
+static void kvmppc_xive_change_state_handler(void *opaque, int running,
+ RunState state)
+{
+sPAPRXive *xive = opaque;
+XiveSource *xsrc = &xive->source;
+Error *local_err = NULL;
+int i;
+
+/*
+ * Restore the sources to their initial state. This is called when
+ * the VM resumes after a stop or a migration.
+ */
+if (running) {
+for (i = 0; i < xsrc->nr_irqs; i++) {
+uint8_t pq = xive_source_esb_get(xsrc, i);
+if (xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_00 + (pq << 8)) != 0x1) {
+error_report("XIVE: IRQ %d has an invalid state", i);
+}
+}
+
+return;
+}
+
+/*
+ * Mask the sources, to stop the flow of event notifications, and
+ * save the PQs locally in the XiveSource object. The XiveSource
+ * state will be collected later on by its vmstate handler if a
+ * migration is in progress.
+ */
+for (i = 0; i < xsrc->nr_irqs; i++) {
+uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_01);
+xive_source_esb_set(xsrc, i, pq);
+}
+
+/*
+ * Sync the XIVE controller in KVM, to flush in-flight event
+ * notification that should be enqueued in the EQs.
+ */
+kvmppc_xive_sync_all(xive, &local_err);
+if (local_err) {
+error_report_err(local_err);
+return;
+}
+
+/*
+ * Mark the XIVE EQ pages dirty to collect all updates.
+ */
+kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_CTRL,
+  KVM_DEV_XIVE_SAVE_EQ_PAGES, NULL, true, &local_err);
+if (local_err) {
+error_report_err(local_err);
+}
+}
+
 void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp)
 {
 XiveSource *xsrc = &xive->source;
 CPUState *cs;
 Error *local_err = NULL;
 
-kvmppc_xive_source_get_state(xsrc);
+/*
+ * When the VM is stopped, the sources are masked and the previous
+ * state is saved
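
The stop/resume sequence described in the handler comment above can be sketched with a toy model. This is only an illustration, not QEMU code: the offset values, the `toy_source` structure and the `esb_load()` helper are assumptions made for the example. It assumes a load at a "set PQ" offset returns the previous PQ bits and installs the new ones, which is how the patch uses xive_esb_read().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model only. The names echo XIVE_ESB_SET_PQ_00/01 from the patch,
 * but the values and helpers here are assumptions for illustration.
 */
enum { ESB_SET_PQ_00 = 0xc00, ESB_SET_PQ_01 = 0xd00 };

#define NR_IRQS 4

struct toy_source {
    uint8_t hw_pq[NR_IRQS];    /* PQ bits held by the modelled hardware */
    uint8_t saved_pq[NR_IRQS]; /* PQ bits saved while the VM is stopped */
};

/* A load at a SET_PQ_xx offset returns the old PQ and installs the new one. */
static uint8_t esb_load(struct toy_source *s, int irq, uint32_t offset)
{
    uint8_t old = s->hw_pq[irq];

    s->hw_pq[irq] = (offset >> 8) & 0x3;
    return old;
}

/* VM stopped: mask every source (PQ=01) and remember its previous state. */
static void vm_stopped(struct toy_source *s)
{
    for (int i = 0; i < NR_IRQS; i++) {
        s->saved_pq[i] = esb_load(s, i, ESB_SET_PQ_01);
    }
}

/*
 * VM running again: restore the saved PQ bits. Each source is expected to
 * still be masked (old PQ == 01); the return value counts the exceptions.
 */
static int vm_resumed(struct toy_source *s)
{
    int invalid = 0;

    for (int i = 0; i < NR_IRQS; i++) {
        uint32_t offset = ESB_SET_PQ_00 + ((uint32_t)s->saved_pq[i] << 8);

        if (esb_load(s, i, offset) != 0x1) {
            invalid++;
        }
    }
    return invalid;
}
```

Calling vm_stopped() and then vm_resumed() on a source array should leave the modelled PQ bits as they were before the stop, with every source masked in between.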

[Qemu-devel] [PATCH 06/13] spapr/xive: fix migration of the XiveTCTX under TCG

2019-01-07 Thread Cédric Le Goater

When the thread interrupt management state is retrieved from the KVM
VCPU, word2 is saved under the QEMU XIVE thread context to print out
the OS CAM line under the QEMU monitor.

This breaks the migration of a TCG guest (and with KVM when
kernel_irqchip=off) because the matching algorithm of the presenter
relies on the OS CAM value. Fix with an extra reset of the thread
contexts to restore the expected value.

Signed-off-by: Cédric Le Goater 
---
 hw/ppc/spapr_irq.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 233c97c5ecd9..ba27d9d8e972 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -363,7 +363,31 @@ static void spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
 
 static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
 {
-    return spapr_xive_post_load(spapr->xive, version_id);
+    CPUState *cs;
+    int ret;
+
+    ret = spapr_xive_post_load(spapr->xive, version_id);
+    if (ret) {
+        return ret;
+    }
+
+    /*
+     * When the states are collected from the KVM XIVE device, word2
+     * of the XiveTCTX is set to print out the OS CAM line under the
+     * QEMU monitor.
+     *
+     * This breaks the migration on a TCG guest (or on KVM with
+     * kernel_irqchip=off) because the matching algorithm of the
+     * presenter relies on the OS CAM value. Fix with an extra reset
+     * of the thread contexts to restore the expected value.
+     */
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+
+        /* (TCG) Set the OS CAM line of the thread interrupt context. */
+        spapr_xive_set_tctx_os_cam(cpu->tctx);
+    }
+    return 0;
 }
 
 static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
-- 
2.20.1
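
The effect of the extra reset can be sketched with a toy model. Everything below is hypothetical except the CAM line formula, which is taken from xive_nvt_cam_line() in include/hw/ppc/xive.h as quoted elsewhere in this series. The real word2 also carries a valid bit, which the sketch leaves out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same formula as xive_nvt_cam_line() in the series. */
static uint32_t toy_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
{
    return ((uint32_t)nvt_blk << 19) | nvt_idx;
}

/* Hypothetical, stripped-down thread context. */
struct toy_tctx {
    uint8_t  nvt_blk;
    uint32_t nvt_idx;
    uint32_t os_word2; /* OS CAM line as seen by the presenter */
};

/* The presenter only finds a thread whose OS CAM line matches. */
static bool toy_presenter_match(const struct toy_tctx *t,
                                uint8_t nvt_blk, uint32_t nvt_idx)
{
    return t->os_word2 == toy_cam_line(nvt_blk, nvt_idx);
}

/* Counterpart of the reset done in post_load: recompute the CAM line. */
static void toy_post_load(struct toy_tctx *tctxs, int count)
{
    for (int i = 0; i < count; i++) {
        tctxs[i].os_word2 = toy_cam_line(tctxs[i].nvt_blk, tctxs[i].nvt_idx);
    }
}
```

In this model, a context whose os_word2 arrived with an unexpected value does not match until toy_post_load() has run, which is the situation the commit message describes for TCG guests.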

[Qemu-devel] [PATCH 03/13] spapr/xive: add state synchronization with KVM

2019-01-07 Thread Cédric Le Goater

This extends the KVM XIVE device backend with 'synchronize_state'
methods used to retrieve the state from KVM. The HW state of the
sources, the KVM device and the thread interrupt contexts is
collected for monitor usage and for migration.

These get operations rely on their KVM counterpart in the host kernel
which acts as a proxy for OPAL, the host firmware. The set operations
will be added for migration support later.

Signed-off-by: Cédric Le Goater 
---
 include/hw/ppc/spapr_xive.h |   9 ++
 include/hw/ppc/xive.h   |   1 +
 hw/intc/spapr_xive.c|  24 ++--
 hw/intc/spapr_xive_kvm.c| 223 
 hw/intc/xive.c  |  10 ++
 5 files changed, 260 insertions(+), 7 deletions(-)

diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 24a0be478039..02f2de20111c 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -44,6 +44,14 @@ typedef struct sPAPRXive {
 bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
 bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn);
 void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
+bool spapr_xive_priority_is_reserved(uint8_t priority);
+
+void spapr_xive_cpu_to_nvt(PowerPCCPU *cpu,
+   uint8_t *out_nvt_blk, uint32_t *out_nvt_idx);
+void spapr_xive_cpu_to_end(PowerPCCPU *cpu, uint8_t prio,
+   uint8_t *out_end_blk, uint32_t *out_end_idx);
+int spapr_xive_target_to_end(uint32_t target, uint8_t prio,
+ uint8_t *out_end_blk, uint32_t *out_end_idx);
 
 typedef struct sPAPRMachineState sPAPRMachineState;
 
@@ -58,5 +66,6 @@ void spapr_xive_map_mmio(sPAPRXive *xive);
  * KVM XIVE device helpers
  */
 void kvmppc_xive_connect(sPAPRXive *xive, Error **errp);
+void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp);
 
 #endif /* PPC_SPAPR_XIVE_H */
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 4bbba8d39a65..2e48d75a22e0 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -442,5 +442,6 @@ static inline bool kvmppc_xive_enabled(void)
 void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp);
 void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val);
 void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp);
+void kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx, Error **errp);
 
 #endif /* PPC_XIVE_H */
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index cf6d3a5f12e1..50dd66707968 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -54,8 +54,8 @@ static uint32_t spapr_xive_nvt_to_target(uint8_t nvt_blk, uint32_t nvt_idx)
 return nvt_idx - SPAPR_XIVE_NVT_BASE;
 }
 
-static void spapr_xive_cpu_to_nvt(PowerPCCPU *cpu,
-  uint8_t *out_nvt_blk, uint32_t *out_nvt_idx)
+void spapr_xive_cpu_to_nvt(PowerPCCPU *cpu,
+   uint8_t *out_nvt_blk, uint32_t *out_nvt_idx)
 {
 assert(cpu);
 
@@ -85,8 +85,8 @@ static int spapr_xive_target_to_nvt(uint32_t target,
  * sPAPR END indexing uses a simple mapping of the CPU vcpu_id, 8
  * priorities per CPU
  */
-static void spapr_xive_cpu_to_end(PowerPCCPU *cpu, uint8_t prio,
-  uint8_t *out_end_blk, uint32_t *out_end_idx)
+void spapr_xive_cpu_to_end(PowerPCCPU *cpu, uint8_t prio,
+   uint8_t *out_end_blk, uint32_t *out_end_idx)
 {
 assert(cpu);
 
@@ -99,8 +99,8 @@ static void spapr_xive_cpu_to_end(PowerPCCPU *cpu, uint8_t prio,
 }
 }
 
-static int spapr_xive_target_to_end(uint32_t target, uint8_t prio,
-uint8_t *out_end_blk, uint32_t *out_end_idx)
+int spapr_xive_target_to_end(uint32_t target, uint8_t prio,
+ uint8_t *out_end_blk, uint32_t *out_end_idx)
 {
 PowerPCCPU *cpu = spapr_find_cpu(target);
 
@@ -139,6 +139,16 @@ void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
 XiveSource *xsrc = &xive->source;
 int i;
 
+if (kvmppc_xive_enabled()) {
+Error *local_err = NULL;
+
+kvmppc_xive_synchronize_state(xive, &local_err);
+if (local_err) {
+error_report_err(local_err);
+return;
+}
+}
+
 monitor_printf(mon, "  LSIN PQEISN CPU/PRIO EQ\n");
 
 for (i = 0; i < xive->nr_irqs; i++) {
@@ -529,7 +539,7 @@ bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn)
  * interrupts (DD2.X POWER9). So we only allow the guest to use
  * priorities [0..6].
  */
-static bool spapr_xive_priority_is_reserved(uint8_t priority)
+bool spapr_xive_priority_is_reserved(uint8_t priority)
 {
 switch (priority) {
 case 0 ... 6:
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index f96c66fa419d..f52bddc92a2a 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -60,6 +60,57 @@ static void kvm_cpu_enable(CPUState *cs)
 /*
  * XIVE Thread Interrupt Management co

[Qemu-devel] [PATCH 08/13] spapr/rtas: modify spapr_rtas_register() to remove RTAS handlers

2019-01-07 Thread Cédric Le Goater

Removing RTAS handlers will become necessary when the new pseries
machine supporting multiple interrupt modes is introduced.

Signed-off-by: Cédric Le Goater 
---
 include/hw/ppc/spapr.h | 4 
 hw/ppc/spapr_rtas.c| 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 9e01a5a12e4a..9a6d015b9cf5 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -657,6 +657,10 @@ typedef void (*spapr_rtas_fn)(PowerPCCPU *cpu, sPAPRMachineState *sm,
   uint32_t nargs, target_ulong args,
   uint32_t nret, target_ulong rets);
 void spapr_rtas_register(int token, const char *name, spapr_rtas_fn fn);
+static inline void spapr_rtas_unregister(int token)
+{
+    spapr_rtas_register(token, NULL, NULL);
+}
 target_ulong spapr_rtas_call(PowerPCCPU *cpu, sPAPRMachineState *sm,
  uint32_t token, uint32_t nargs, target_ulong args,
  uint32_t nret, target_ulong rets);
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index d6a0952154ac..e005d5d08151 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -404,7 +404,7 @@ void spapr_rtas_register(int token, const char *name, spapr_rtas_fn fn)
 
     token -= RTAS_TOKEN_BASE;
 
-    assert(!rtas_table[token].name);
+    assert(!name || !rtas_table[token].name);
 
     rtas_table[token].name = name;
     rtas_table[token].fn = fn;
-- 
2.20.1
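
The relaxed assertion can be illustrated with a small standalone table. This is a sketch under assumptions: the `toy_` names, the token range and the table size are invented for the example and are not the QEMU definitions. Registering a handler over a live entry still trips the assertion, while passing a NULL name clears the slot so that it can be registered again later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token range, for illustration only. */
#define TOY_TOKEN_BASE 0x2000
#define TOY_TOKEN_MAX  0x2004

typedef void (*toy_rtas_fn)(void);

struct toy_entry {
    const char *name;
    toy_rtas_fn fn;
};

static struct toy_entry toy_table[TOY_TOKEN_MAX - TOY_TOKEN_BASE];

static void toy_register(int token, const char *name, toy_rtas_fn fn)
{
    assert(token >= TOY_TOKEN_BASE && token < TOY_TOKEN_MAX);
    token -= TOY_TOKEN_BASE;

    /* Overwriting a live entry is a bug; clearing one (name == NULL) is not. */
    assert(!name || !toy_table[token].name);

    toy_table[token].name = name;
    toy_table[token].fn = fn;
}

static void toy_unregister(int token)
{
    toy_register(token, NULL, NULL);
}

/* Placeholder handler used by callers of the sketch. */
static void toy_handler(void)
{
}
```

The design keeps a single entry point for both operations, so unregistering needs no second copy of the bounds check and table bookkeeping.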

[Qemu-devel] [PATCH 01/13] linux-headers: update to 5.0

2019-01-07 Thread Cédric Le Goater

These changes provide the interface with the KVM device implementing
the XIVE native exploitation interrupt mode. They are also used to
retrieve the state of the KVM device for monitor usage and for migration.

Available from:

  https://github.com/legoater/linux/commits/xive-5.0

Signed-off-by: Cédric Le Goater 
---
 linux-headers/asm-powerpc/kvm.h | 46 +
 linux-headers/linux/kvm.h   |  9 +++
 2 files changed, 55 insertions(+)

diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h
index 8c876c166ef2..10fe86c21e8f 100644
--- a/linux-headers/asm-powerpc/kvm.h
+++ b/linux-headers/asm-powerpc/kvm.h
@@ -480,6 +480,8 @@ struct kvm_ppc_cpu_char {
 #define  KVM_REG_PPC_ICP_PPRI_SHIFT16  /* pending irq priority */
 #define  KVM_REG_PPC_ICP_PPRI_MASK 0xff
 
+#define KVM_REG_PPC_NVT_STATE  (KVM_REG_PPC | KVM_REG_SIZE_U256 | 0x8d)
+
 /* Device control API: PPC-specific devices */
 #define KVM_DEV_MPIC_GRP_MISC  1
 #define   KVM_DEV_MPIC_BASE_ADDR   0   /* 64-bit */
@@ -675,4 +677,48 @@ struct kvm_ppc_cpu_char {
 #define  KVM_XICS_PRESENTED(1ULL << 43)
 #define  KVM_XICS_QUEUED   (1ULL << 44)
 
+/* POWER9 XIVE Native Interrupt Controller */
+#define KVM_DEV_XIVE_GRP_CTRL  1
+#define   KVM_DEV_XIVE_GET_ESB_FD  1
+#define   KVM_DEV_XIVE_GET_TIMA_FD 2
+#define   KVM_DEV_XIVE_VC_BASE 3
+#define   KVM_DEV_XIVE_SAVE_EQ_PAGES   4
+#define KVM_DEV_XIVE_GRP_SOURCES   2   /* 64-bit source attributes */
+#define KVM_DEV_XIVE_GRP_SYNC  3   /* 64-bit source attributes */
+#define KVM_DEV_XIVE_GRP_EAS   4   /* 64-bit eas attributes */
+#define KVM_DEV_XIVE_GRP_EQ5   /* 64-bit eq attributes */
+
+/* Layout of 64-bit XIVE source attribute values */
+#define KVM_XIVE_LEVEL_SENSITIVE   (1ULL << 0)
+#define KVM_XIVE_LEVEL_ASSERTED(1ULL << 1)
+
+/* Layout of 64-bit eas attribute values */
+#define KVM_XIVE_EAS_PRIORITY_SHIFT0
+#define KVM_XIVE_EAS_PRIORITY_MASK 0x7
+#define KVM_XIVE_EAS_SERVER_SHIFT  3
+#define KVM_XIVE_EAS_SERVER_MASK   0xfff8ULL
+#define KVM_XIVE_EAS_MASK_SHIFT32
+#define KVM_XIVE_EAS_MASK_MASK 0x1ULL
+#define KVM_XIVE_EAS_EISN_SHIFT33
+#define KVM_XIVE_EAS_EISN_MASK 0xfffeULL
+
+/* Layout of 64-bit eq attribute */
+#define KVM_XIVE_EQ_PRIORITY_SHIFT 0
+#define KVM_XIVE_EQ_PRIORITY_MASK  0x7
+#define KVM_XIVE_EQ_SERVER_SHIFT   3
+#define KVM_XIVE_EQ_SERVER_MASK0xfff8ULL
+
+/* Layout of 64-bit eq attribute values */
+struct kvm_ppc_xive_eq {
+   __u32 flags;
+   __u32 qsize;
+   __u64 qpage;
+   __u32 qtoggle;
+   __u32 qindex;
+};
+
+#define KVM_XIVE_EQ_FLAG_ENABLED   0x0001
+#define KVM_XIVE_EQ_FLAG_ALWAYS_NOTIFY 0x0002
+#define KVM_XIVE_EQ_FLAG_ESCALATE  0x0004
+
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index f11a7eb49cfa..7f476ad5e4e8 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1,3 +1,4 @@
+
 /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 #ifndef __LINUX_KVM_H
 #define __LINUX_KVM_H
@@ -965,6 +966,10 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_COALESCED_PIO 162
 #define KVM_CAP_HYPERV_ENLIGHTENED_VMCS 163
 #define KVM_CAP_EXCEPTION_PAYLOAD 164
+#define KVM_CAP_ARM_VM_IPA_SIZE 165
+#define KVM_CAP_MANUAL_DIRTY_LOG_PROTECT 166
+#define KVM_CAP_HYPERV_CPUID 167
+#define KVM_CAP_PPC_IRQ_XIVE 168
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1188,6 +1193,8 @@ enum kvm_device_type {
 #define KVM_DEV_TYPE_ARM_VGIC_V3   KVM_DEV_TYPE_ARM_VGIC_V3
KVM_DEV_TYPE_ARM_VGIC_ITS,
 #define KVM_DEV_TYPE_ARM_VGIC_ITS  KVM_DEV_TYPE_ARM_VGIC_ITS
+   KVM_DEV_TYPE_XIVE,
+#define KVM_DEV_TYPE_XIVE  KVM_DEV_TYPE_XIVE
KVM_DEV_TYPE_MAX,
 };
 
@@ -1305,6 +1312,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_GET_DEVICE_ATTR  _IOW(KVMIO,  0xe2, struct kvm_device_attr)
 #define KVM_HAS_DEVICE_ATTR  _IOW(KVMIO,  0xe3, struct kvm_device_attr)
 
+#define KVM_DESTROY_DEVICE   _IOWR(KVMIO,  0xf0, struct kvm_create_device)
+
 /*
  * ioctls for vcpu fds
  */
-- 
2.20.1

[Qemu-devel] [PATCH 02/13] spapr/xive: add KVM support

2019-01-07 Thread Cédric Le Goater

This introduces a set of helpers when KVM is in use, which create the
KVM XIVE device, initialize the interrupt sources at a KVM level and
connect the interrupt presenters to the vCPU.

They also handle the initialization of the TIMA and the source ESB
memory regions of the controller. These have a different type under
KVM. They are 'ram device' memory mappings, similarly to VFIO, exposed
to the guest and the associated VMAs on the host are populated
dynamically with the appropriate pages using a fault handler.

Signed-off-by: Cédric Le Goater 
---
 default-configs/ppc64-softmmu.mak |   1 +
 include/hw/ppc/spapr_xive.h   |  10 ++
 include/hw/ppc/xive.h |  22 +++
 target/ppc/kvm_ppc.h  |   6 +
 hw/intc/spapr_xive.c  |  31 ++--
 hw/intc/spapr_xive_kvm.c  | 254 ++
 hw/intc/xive.c|  22 ++-
 hw/ppc/spapr_irq.c|  11 +-
 target/ppc/kvm.c  |   7 +
 hw/intc/Makefile.objs |   1 +
 10 files changed, 349 insertions(+), 16 deletions(-)
 create mode 100644 hw/intc/spapr_xive_kvm.c

diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index 7f34ad0528ed..c1bf5cd951f5 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -18,6 +18,7 @@ CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
 CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
 CONFIG_XIVE=$(CONFIG_PSERIES)
 CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
+CONFIG_XIVE_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
 CONFIG_MEM_DEVICE=y
 CONFIG_DIMM=y
 CONFIG_SPAPR_RNG=y
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 7fdc25057420..24a0be478039 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -35,6 +35,10 @@ typedef struct sPAPRXive {
 /* TIMA mapping address */
 hwaddrtm_base;
 MemoryRegion  tm_mmio;
+
+/* KVM support */
+int   fd;
+void  *tm_mmap;
 } sPAPRXive;
 
 bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
@@ -48,5 +52,11 @@ void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
uint32_t phandle);
 void spapr_xive_set_tctx_os_cam(XiveTCTX *tctx);
 void spapr_xive_mmio_set_enabled(sPAPRXive *xive, bool enable);
+void spapr_xive_map_mmio(sPAPRXive *xive);
+
+/*
+ * KVM XIVE device helpers
+ */
+void kvmppc_xive_connect(sPAPRXive *xive, Error **errp);
 
 #endif /* PPC_SPAPR_XIVE_H */
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index ec23253ba448..4bbba8d39a65 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -140,6 +140,7 @@
 #ifndef PPC_XIVE_H
 #define PPC_XIVE_H
 
+#include "sysemu/kvm.h"
 #include "hw/qdev-core.h"
 #include "hw/sysbus.h"
 #include "hw/ppc/xive_regs.h"
@@ -194,6 +195,9 @@ typedef struct XiveSource {
 uint32_tesb_shift;
 MemoryRegionesb_mmio;
 
+/* KVM support */
+void*esb_mmap;
+
 XiveNotifier*xive;
 } XiveSource;
 
@@ -421,4 +425,22 @@ static inline uint32_t xive_nvt_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
 return (nvt_blk << 19) | nvt_idx;
 }
 
+/*
+ * KVM XIVE device helpers
+ */
+
+/* Keep inlined to discard compile of KVM code sections */
+static inline bool kvmppc_xive_enabled(void)
+{
+if (kvm_enabled()) {
+return machine_kernel_irqchip_allowed(MACHINE(qdev_get_machine()));
+} else {
+return false;
+}
+}
+
+void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp);
+void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val);
+void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp);
+
 #endif /* PPC_XIVE_H */
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index bdfaa4e70a83..d2159660f9f2 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -59,6 +59,7 @@ bool kvmppc_has_cap_fixup_hcalls(void);
 bool kvmppc_has_cap_htm(void);
 bool kvmppc_has_cap_mmu_radix(void);
 bool kvmppc_has_cap_mmu_hash_v3(void);
+bool kvmppc_has_cap_xive(void);
 int kvmppc_get_cap_safe_cache(void);
 int kvmppc_get_cap_safe_bounds_check(void);
 int kvmppc_get_cap_safe_indirect_branch(void);
@@ -307,6 +308,11 @@ static inline bool kvmppc_has_cap_mmu_hash_v3(void)
 return false;
 }
 
+static inline bool kvmppc_has_cap_xive(void)
+{
+return false;
+}
+
 static inline int kvmppc_get_cap_safe_cache(void)
 {
 return 0;
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index d391177ab81f..cf6d3a5f12e1 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -172,7 +172,7 @@ void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
 }
 }
 
-static void spapr_xive_map_mmio(sPAPRXive *xive)
+void spapr_xive_map_mmio(sPAPRXive *xive)
 {
 sysbus_mmio_map(SYS_BUS_DEVICE(xive), 0, xive->vc_base);
 sysbus_mmio_map(SYS_BUS_DEVICE(xive), 1, xive->end_base);
@@ -250,6 +250,9 @@ static void spapr_xive_instance_init(Object *obj)

[Qemu-devel] [PATCH 00/13] spapr: add KVM support to the XIVE interrupt mode

2019-01-07 Thread Cédric Le Goater

Hello,

Following the 'dual' IRQ backend, this series adds KVM support to the
XIVE interrupt mode.

The first patches introduce the XIVE KVM device, state synchronization
and migration support under KVM. The second part of the patchset
modifies the XICS and XIVE interrupt models to add KVM support to the
'dual' IRQ backend.

This is a first round to check that the interfaces with Linux/KVM are
well in place. 

GitHub trees are available here:
 
QEMU sPAPR:

  https://github.com/legoater/qemu/commits/xive-next
  
Linux/KVM:

  https://github.com/legoater/linux/commits/xive-5.0

OPAL:

  https://github.com/legoater/skiboot/commits/xive

Thanks,

C.

Cédric Le Goater (13):
  linux-headers: update to 5.0
  spapr/xive: add KVM support
  spapr/xive: add state synchronization with KVM
  spapr/xive: introduce a VM state change handler
  spapr/xive: add migration support for KVM
  spapr/xive: fix migration of the XiveTCTX under TCG
  ppc/xics: introduce a icp_kvm_connect() routine
  spapr/rtas: modify spapr_rtas_register() to remove RTAS handlers
  sysbus: add a sysbus_mmio_unmap() helper
  spapr: introduce routines to delete the KVM IRQ device
  spapr: check for the activation of the KVM IRQ device
  spapr/xics: ignore the lower 4K in the IRQ number space
  spapr: add KVM support to the 'dual' machine

 default-configs/ppc64-softmmu.mak |   1 +
 include/hw/ppc/spapr.h|   4 +
 include/hw/ppc/spapr_xive.h   |  26 +
 include/hw/ppc/xics.h |   1 +
 include/hw/ppc/xive.h |  24 +
 include/hw/sysbus.h   |   1 +
 linux-headers/asm-powerpc/kvm.h   |  46 ++
 linux-headers/linux/kvm.h |   9 +
 target/ppc/kvm_ppc.h  |   6 +
 hw/core/sysbus.c  |  10 +
 hw/intc/spapr_xive.c  |  83 ++-
 hw/intc/spapr_xive_kvm.c  | 886 ++
 hw/intc/xics_kvm.c| 154 +-
 hw/intc/xive.c|  45 +-
 hw/ppc/spapr_irq.c| 114 +++-
 hw/ppc/spapr_rtas.c   |   2 +-
 target/ppc/kvm.c  |   7 +
 hw/intc/Makefile.objs |   1 +
 18 files changed, 1363 insertions(+), 57 deletions(-)
 create mode 100644 hw/intc/spapr_xive_kvm.c

-- 
2.20.1

Re: [Qemu-devel] [PULL 00/37] target-arm queue

2019-01-07 Thread Peter Maydell

On Mon, 7 Jan 2019 at 16:31, Peter Maydell  wrote:
>
> target-arm queue: the big things here are the new nRF51
> (microbit) devices and Luc's gdbstub multiprocess work.
>
> thanks
> -- PMM
>
> The following changes since commit a29644590f95166c8a13e5797f8e7701134b31d0:
>
>   Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2019-01-05' into staging (2019-01-07 11:55:52 +)
>
> are available in the Git repository at:
>
>   https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20190107
>
> for you to fetch changes up to f831f955d420966471f5f8b316ba50d2523b1ff0:
>
>   Support u-boot noload images for arm as used by, NetBSD/evbarm GENERIC kernel. (2019-01-07 15:46:20 +)
>
> 
> target-arm queue:
>  * Support u-boot 'noload' images for Arm (as used by NetBSD/evbarm GENERIC kernel)
>  * hw/misc/tz-mpc: Fix value of BLK_MAX register
>  * target/arm: Emit barriers for A32/T32 load-acquire/store-release insns
>  * nRF51 SoC: add timer, GPIO, RNG peripherals
>  * hw/arm/allwinner-a10: Add the 'A' SRAM and the SRAM controller
>  * cpus.c: Fix race condition in cpu_stop_current()
>  * hw/arm: versal: Plug memory leaks
>  * Allow M profile boards to run even if -kernel not specified
>  * gdbstub: Add multiprocess extension support for use when the
>board has multiple CPUs of different types (like the Xilinx Zynq boards)
>  * target/arm: Don't decode S bit in SVE brk[ab] merging insns
>  * target/arm: Convert ARM_TBFLAG_* to FIELDs


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.0
for any user-visible changes.

-- PMM

Re: [Qemu-devel] [PATCH v15 23/26] sched: early boot clock

2019-01-07 Thread Pavel Tatashin

On Thu, Jan 3, 2019 at 6:43 PM Dominique Martinet
 wrote:
>
> Pavel Tatashin wrote on Thu, Jan 03, 2019:
> > Could you please send the config file and qemu arguments that were
> > used to reproduce this problem.
>
> Running qemu by hand, nothing fancy e.g. this works:
>
> # qemu-system-x86_64 -m 1G -smp 4 -drive file=/root/kvm-wrapper/disks/f2.img,if=virtio -serial mon:stdio --enable-kvm -cpu Haswell -device virtio-rng-pci -nographic
>
> (used a specific cpu just in case but normally running with cpu host on
> a skylake machine; can probably go older)
>
>
> qemu is fedora 29 blend as is:
> $ qemu-system-x86_64 --version
> QEMU emulator version 3.0.0 (qemu-3.0.0-3.fc29)
> Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
>
>
> compressed .config attached to the mail, this can likely be trimmed down
> some as well but that takes more time for me..
> I didn't rebuild the kernel so not 100% sure (comes from
> /proc/config.gz) but it should work on a 4.20-rc2 kernel as written in
> the first few lines; 857baa87b64 I referred to in another mail was
> merged in 4.19-rc1 so anything past that is probably OK to reproduce...
>
>
> Re-checked today with these exact options (fresh VM start; then suspend
> laptop for a bit, then reboot VM):
> [0.00] Hypervisor detected: KVM
> [0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00
> [ 2477.907447] kvm-clock: cpu 0, msr 153a4001, primary cpu clock
> [ 2477.907448] clocksource: kvm-clock: mask: 0x max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
> [ 2477.907450] tsc: Detected 2592.000 MHz processor

I could not reproduce the problem. Did you suspend to memory between
wake ups? Does this time jump happen every time, even if your laptop
sleeps for a minute?

I have tried with qemu 2.6 and 3.1 on Ubuntu, testing 4.20-rc2.

Pasha

Re: [Qemu-devel] [PATCH v1] dump: Set correct vaddr for ELF dump

2019-01-07 Thread Laszlo Ersek

On 01/07/19 13:14, Marc-André Lureau wrote:
> Hi
> 
> On Tue, Dec 25, 2018 at 5:52 PM Jon Doron  wrote:
>>
>> vaddr needs to be equal to the paddr since the dump file represents the
>> physical memory image.
>>
>> Without setting vaddr correctly, GDB would load all the different memory
>> regions on top of each other at vaddr 0, making GDB show the wrong
>> memory data for a given address.
>>
>> Signed-off-by: Jon Doron 
> 
> This is a non-trivial patch! (qemu-trivial, please ignore).
> 
>> ---
>>  dump.c   | 4 ++--
>>  scripts/dump-guest-memory.py | 1 +
>>  2 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/dump.c b/dump.c
>> index 4ec94c5e25..bf77a119ea 100644
>> --- a/dump.c
>> +++ b/dump.c
>> @@ -192,7 +192,7 @@ static void write_elf64_load(DumpState *s, MemoryMapping *memory_mapping,
>>  phdr.p_paddr = cpu_to_dump64(s, memory_mapping->phys_addr);
>>  phdr.p_filesz = cpu_to_dump64(s, filesz);
>>  phdr.p_memsz = cpu_to_dump64(s, memory_mapping->length);
>> -phdr.p_vaddr = cpu_to_dump64(s, memory_mapping->virt_addr);
>> +phdr.p_vaddr = phdr.p_paddr;
> 
> This is likely breaking paging=true somehow, which sets
> memory_mapping->virt_addr to non-0.
> 
> According to doc "If you want to use gdb to process the core, please
> set @paging to true."
> 
> Although I am not able to (gdb) x/10bx 0xa for example on a core
> produced with paging. Not sure why, anybody could help?
> 
>>  assert(memory_mapping->length >= filesz);
>>
>> @@ -216,7 +216,7 @@ static void write_elf32_load(DumpState *s, MemoryMapping *memory_mapping,
>>  phdr.p_paddr = cpu_to_dump32(s, memory_mapping->phys_addr);
>>  phdr.p_filesz = cpu_to_dump32(s, filesz);
>>  phdr.p_memsz = cpu_to_dump32(s, memory_mapping->length);
>> -phdr.p_vaddr = cpu_to_dump32(s, memory_mapping->virt_addr);
>> +phdr.p_vaddr = phdr.p_paddr;
>>
>>  assert(memory_mapping->length >= filesz);
>>
>> diff --git a/scripts/dump-guest-memory.py b/scripts/dump-guest-memory.py
>> index 198cd0fe40..2c587cbefc 100644
>> --- a/scripts/dump-guest-memory.py
>> +++ b/scripts/dump-guest-memory.py
>> @@ -163,6 +163,7 @@ class ELF(object):
>>  phdr = get_arch_phdr(self.endianness, self.elfclass)
>>  phdr.p_type = p_type
>>  phdr.p_paddr = p_paddr
>> +phdr.p_vaddr = p_paddr
> 
> With your proposed change though, I can dump memory with gdb...
> 
>>  phdr.p_filesz = p_size
>>  phdr.p_memsz = p_size
>>  self.segments.append(phdr)
>> --
>> 2.19.2
>>
>>
> 
> 

I've never used paging-enabled dumps. First, because doing so requires
QEMU to trust guest memory contents (see original commit 783e9b4826b9;
or more recently/generally, the @dump-guest-memory docs in
"qapi/misc.json"). Second, because whenever I had to deal with guest
memory dumps, I always used "crash" (which needs no paging), and the
subject guests were all Linux.

I can't comment on paging-enabled patches for dump, except that they
shouldn't regress the paging-disabled functionality. :) If the patches
satisfy that, I'm fine.

(I *am* surprised that GDB insists on p_vaddr equaling p_paddr; after
all, in the guest, the virtual address is "memory_mapping->virt_addr".
But, I would never claim to understand most of the ELF intricacies,
and/or what GDB requires on top of those.)

Thanks
Laszlo
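
To make the overlap problem from the commit message concrete, here is a minimal sketch using the standard `<elf.h>` types. The helper names are made up for the example, and the dump-endianness conversion done by cpu_to_dump64() in dump.c is left out. It builds PT_LOAD headers either with p_vaddr left at 0 or with p_vaddr set to p_paddr, and checks whether two segments collide in the virtual address space a debugger would use.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Build a PT_LOAD program header for one guest memory block.
 * vaddr_from_paddr selects the behaviour proposed in the patch;
 * otherwise p_vaddr stays 0, as described in the commit message.
 */
static Elf64_Phdr make_load_phdr(uint64_t file_offset, uint64_t phys_addr,
                                 uint64_t length, bool vaddr_from_paddr)
{
    Elf64_Phdr phdr;

    memset(&phdr, 0, sizeof(phdr));
    phdr.p_type = PT_LOAD;
    phdr.p_offset = file_offset;
    phdr.p_paddr = phys_addr;
    phdr.p_vaddr = vaddr_from_paddr ? phys_addr : 0;
    phdr.p_filesz = length;
    phdr.p_memsz = length;
    return phdr;
}

/* True if the two segments overlap in the virtual address space. */
static bool vaddr_overlap(const Elf64_Phdr *a, const Elf64_Phdr *b)
{
    return a->p_vaddr < b->p_vaddr + b->p_memsz &&
           b->p_vaddr < a->p_vaddr + a->p_memsz;
}
```

With p_vaddr at 0 for every block, any two non-empty segments overlap; with p_vaddr equal to p_paddr they overlap only if the physical ranges do.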

Re: [Qemu-devel] [PATCH v6 0/3] block nodes graph visualization

2019-01-07 Thread Max Reitz

On 21.12.18 18:09, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> While developing backup schemes (and, more generally, any complicated
> feature in the QEMU block layer), it would be good to be able to print
> out the graph of block nodes with their permissions.

Thanks, applied patches 1 and 2 to my block branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block

Max




Re: [Qemu-devel] [PATCH v7] qemu-img info lists bitmap directory entries

2019-01-07 Thread Max Reitz

On 13.12.18 11:59, Andrey Shinkevich wrote:
> In the 'Format specific information' section of the 'qemu-img info'
> command output, the supplemental information about existing QCOW2
> bitmaps will be shown, such as a bitmap name, flags and granularity:
> 
> image: /vz/vmprivate/VM1/harddisk.hdd
> file format: qcow2
> virtual size: 64G (68719476736 bytes)
> disk size: 3.0M
> cluster_size: 1048576
> Format specific information:
>     compat: 1.1
>     lazy refcounts: true
>     bitmaps:
>         [0]:
>             flags:
>                 [0]: in-use
>                 [1]: auto
>             name: back-up1
>             unknown flags: 4
>             granularity: 65536
>         [1]:
>             flags:
>                 [0]: in-use
>                 [1]: auto
>             name: back-up2
>             unknown flags: 8
>             granularity: 65536
>     refcount bits: 16
>     corrupt: false
> 
> Signed-off-by: Andrey Shinkevich 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Eric Blake 

The patch itself looks OK, but it breaks a couple of iotests, namely
060, 065, 082, 112, 198, and 206.

In addition, an iotest specifically for this new feature would be nice
as well.

Max




Re: [Qemu-devel] [PATCH v2] qemu-io: Reinitialize optind to 1 (not 0) before parsing inner command.

2019-01-07 Thread Max Reitz

On 07.01.19 18:59, Eric Blake wrote:
> On 1/7/19 11:50 AM, Max Reitz wrote:
> 
> Note I didn't set optreset.  It's not present in glibc and the "hard
> reset" is not necessary in this context.

 But it sure sounds like FreeBSD requires you to set it, doesn't it?
> 
> No.  Quoting https://www.freebsd.org/cgi/man.cgi?getopt(3)
> 
>  The variables opterr and optind are both initialized to 1. The optind
>  variable may be set to another value before a set of calls to getopt() in
>  order to skip over more or less argv entries.
> 
> so resetting it to 1 as a soft reset is no different to setting it to 2
> to skip argv[1].

In theory it is very much different because the text clearly says "in
order to skip", not "in order to re-parse or use a different argv".
Especially the fact that we use different argvs is something that
implementations may not expect.

> It just means that you didn't get the hard reset of
> internal state (there is definitely internal state if argv[1] was merged
> short options - but that state is cleared if you run getopt() until it
> returns -1;

While I agree that this is probably the case in practice, I see no
reason why this should be the case if it isn't guaranteed.  Why should
the state be cleared once you reach the end?

> there may also be internal state if you used extensions, but
> when you don't use extensions, such internal state is irrelevant).
> 
> I think the BSD man page needs updating, and that will probably happen
> if I file my promised POSIX defect.

Sure.  But as it is, it doesn't tell me that resetting optind to 1 is
sufficient to be able to parse a new argv.

>>> At the end of the day, both GNU optind=0 and BSD optreset=1 are
>>> sufficient to force a hard reset of all hidden state.  But if you don't
>>> use POSIX extensions, and always run getopt() until a -1 return, then
>>> setting optind=1 is a portable soft reset, regardless of how the hidden
>>> state is implemented, and regardless of how (or even if) libc offers a
>>> hard reset, even though POSIX itself is currently lacking that mention.
>>> (I should probably file a POSIX defect to get that wording listed in POSIX)
>>
>> Hm, OK?  Is there any guarantee for that behavior for FreeBSD, or is
>> that just how it is?  Because the man page is very clear on it:
>> "optreset must be set to 1".  It doesn't talk about soft or hard resets
>> like the glibc man page does.
>>
>> And if optreset not being available for glibc is the only issue, I'd say
>> adding it as a weak global variable would work without #ifdefs.
> 
> I don't see the point - Richard has already tested that optind = 1
> worked on BSD machines for our purposes, so we don't have to worry about
> the hard reset aspect of optreset=1.

Well, and as far as I remember glibc's memcpy() at one point only copied
in one direction and things broke badly once they reversed it at some
point for some CPUs.

Just because it works now doesn't mean it will work always if the
specification allows for different behavior.

> (But yes, it would also be nice if
> BSD and glibc folks could agree on how to do hard resets, instead of
> having two different incompatible ways)

I don't see why we should have a general code path if there is no
standard way of resetting getopt() other than "This seems to work".
What's so bad about a weak optreset or an
"#ifdef __FreeBSD__; optreset = 1; #endif"?

Sure, if you can get POSIX to define the fact that optind = 1 after
getopt() == -1 will be sufficient to start parsing a new argv, that'd be
great.  But there is no such standard yet (other than "Why would that
not work?").

Max




Re: [Qemu-devel] [virtio-dev] Re: [PATCH 3/3] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover

2019-01-07 Thread Venu Busireddy

On 2018-12-10 12:31:43 -0500, Michael S. Tsirkin wrote:
> On Mon, Dec 10, 2018 at 11:15:48AM -0500, Venu Busireddy wrote:
> > From: Si-Wei Liu 
> > 
> > When a VF is hotplugged into the guest, datapath switching will be
> > performed immediately, which is sub-optimal in terms of timing, and
> > could end up with substantial network downtime. One of ways to shorten
> > this downtime is to switch the datapath only after the VF is seen to get
> > enabled by guest, indicated by the bus master bit in VF's PCI config
> > space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
> > at that time to indicate this condition. Then management stack can kick
> > off datapath switching upon receiving the event.
> > 
> > Signed-off-by: Si-Wei Liu 
> > Signed-off-by: Venu Busireddy 
> 
> As management stacks can lose events, it's necessary
> to also have a query command to check device status.

Thanks for the feedback. Implemented the changes, and posted v2:

https://lists.oasis-open.org/archives/virtio-dev/201901/msg00046.html


> > ---
> >  hw/vfio/pci.c | 57 
> > +
> >  qapi/net.json | 26 ++
> >  2 files changed, 83 insertions(+)
> > 
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index ce1f33c..ea24ca2 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -34,6 +34,7 @@
> >  #include "pci.h"
> >  #include "trace.h"
> >  #include "qapi/error.h"
> > +#include "qapi/qapi-events-net.h"
> >  
> >  #define MSIX_CAP_LENGTH 12
> >  
> > @@ -42,6 +43,7 @@
> >  
> >  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> >  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
> >  
> >  /*
> >   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
> >  {
> >  VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> >  uint32_t val_le = cpu_to_le32(val);
> > +bool may_notify = false;
> > +bool master_was = false;
> >  
> >  trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
> >  
> > @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> >   __func__, vdev->vbasedev.name, addr, val, len);
> >  }
> >  
> > +/* Bus Master Enabling/Disabling */
> > +if (pdev->failover_primary && current_cpu &&
> > +range_covers_byte(addr, len, PCI_COMMAND)) {
> > +master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > +PCI_COMMAND_MASTER);
> > +may_notify = true;
> > +}
> > +
> >  /* MSI/MSI-X Enabling/Disabling */
> >  if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
> >  ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
> > @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> >  /* Write everything to QEMU to keep emulated bits correct */
> >  pci_default_write_config(pdev, addr, val, len);
> >  }
> > +
> > +if (may_notify) {
> > +bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > + PCI_COMMAND_MASTER);
> > +if (master_was != master_now) {
> > +vfio_failover_notify(vdev, master_now);
> > +}
> > +}
> >  }
> >  
> >  /*
> 
> It's very easy to have guest trigger a high load of events by playing
> with the bus master enable bits.  How about instead sending an event
> that just says "something changed" without the current status and have
> management issue a query command to check the status. QEMU then does not
> need to re-send an event until management issues a query command.
> 
> 
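A minimal sketch of the coalescing scheme suggested above, assuming a hypothetical per-device state struct (the names bm_notifier, bm_update and bm_query are illustrative and do not appear in the patch). The counter stands in for emitting a "something changed" event; no further event is emitted until a query has been issued.

```c
#include <stdbool.h>

/* Hypothetical per-device state; names are illustrative, not from the patch. */
struct bm_notifier {
    bool enabled;          /* current bus master state */
    bool event_pending;    /* event sent, management has not queried yet */
    unsigned events_sent;  /* stands in for emitting a "changed" event */
};

/* Called on every guest write that may toggle the bus master bit. */
static void bm_update(struct bm_notifier *n, bool now)
{
    if (n->enabled == now) {
        return;                 /* no state change, nothing to report */
    }
    n->enabled = now;
    if (!n->event_pending) {
        n->event_pending = true;
        n->events_sent++;       /* one event, carrying no status */
    }
}

/* Called when management issues the query command. */
static bool bm_query(struct bm_notifier *n)
{
    n->event_pending = false;   /* re-arm: the next change sends an event */
    return n->enabled;
}
```

With this shape, a guest that flips the bit in a tight loop causes at most one event per management query, and the query always returns the current state.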
> > @@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> >  vdev->req_enabled = false;
> >  }
> >  
> > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
> > +{
> > +PCIDevice *pdev = &vdev->pdev;
> > +const char *n;
> > +gchar *path;
> > +
> > +n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
> > +path = object_get_canonical_path(OBJECT(vdev));
> > +qapi_event_send_failover_primary_changed(!!n, n, path, status);
> > +}
> > +
> >  static void vfio_realize(PCIDevice *pdev, Error **errp)
> >  {
> >  VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > @@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
> >  vfio_put_group(group);
> >  }
> >  
> > +static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
> > +{
> > +PCIDevice *pdev = &vdev->pdev;
> > +
> > +/*
> > + * Guest driver may not get the chance to disable bus mastering
> > + * before the device object gets to be unrealized. In that event,
> > + * send out a "disabled" notification on behalf of guest driver.
> > + */
> > +if (pdev->failover_primary &&
> > +pdev->bus_master_enable_region.enabled) {
> > +vfio_fail

Re: [Qemu-devel] [PATCH for-4.0 v9 15/16] qemu_thread: supplement error handling for touch_all_pages

2019-01-07 Thread Markus Armbruster

Fei Li  writes:

> Supplement the error handling for touch_all_pages: add an Error
> parameter for it to propagate the error to its caller to do the
> handling in case it fails.
>
> Cc: Markus Armbruster 
> Signed-off-by: Fei Li 
> ---
>  util/oslib-posix.c | 25 -
>  1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 251e2f1aea..afc1d99093 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -431,15 +431,17 @@ static inline int get_memset_num_threads(int smp_cpus)
>  }
>  
>  static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
> -int smp_cpus)
> +int smp_cpus, Error **errp)
>  {
>  size_t numpages_per_thread;
>  size_t size_per_thread;
>  char *addr = area;
>  int i = 0;
> +int started_thread = 0;
>  
>  memset_thread_failed = false;
>  memset_num_threads = get_memset_num_threads(smp_cpus);
> +started_thread = memset_num_threads;
>  memset_thread = g_new0(MemsetThread, memset_num_threads);
>  numpages_per_thread = (numpages / memset_num_threads);
>  size_per_thread = (hpagesize * numpages_per_thread);
> @@ -448,14 +450,18 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
>  memset_thread[i].numpages = (i == (memset_num_threads - 1)) ?
>  numpages : numpages_per_thread;
>  memset_thread[i].hpagesize = hpagesize;
> -/* TODO: let the callers handle the error instead of abort() here */
> -qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
> -   do_touch_pages, &memset_thread[i],
> -   QEMU_THREAD_JOINABLE, &error_abort);
> +if (!qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
> +do_touch_pages, &memset_thread[i],
> +QEMU_THREAD_JOINABLE, errp)) {
> +memset_thread_failed = true;
> +started_thread = i;
> +goto out;

break rather than goto, please.

> +}
>  addr += size_per_thread;
>  numpages -= numpages_per_thread;
>  }
> -for (i = 0; i < memset_num_threads; i++) {
> +out:
> +for (i = 0; i < started_thread; i++) {
>  qemu_thread_join(&memset_thread[i].pgthread);
>  }

I don't like how @started_thread is computed.  The name suggests it's
the number of threads started so far.  That's the case when you
initialize it to zero.  But then you immediately set it to
memset_num_threads.  It again becomes the case only when you break the
loop on error, or when you complete it successfully.

There's no need for @started_thread, since the number of threads created
is readily available as @i:

   memset_num_threads = i;
   for (i = 0; i < memset_num_threads; i++) {
   qemu_thread_join(&memset_thread[i].pgthread);
   }
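A self-contained sketch of the same pattern using plain pthreads, assuming a hypothetical worker() in place of do_touch_pages() and a fail_at parameter that simulates a creation failure. It only illustrates breaking out of the loop and using the loop index as the number of threads to join; it is not the QEMU code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Hypothetical worker; stands in for do_touch_pages(). */
static void *worker(void *arg)
{
    int *slot = arg;
    *slot = 1;
    return NULL;
}

/*
 * Start up to nthreads workers, pretending that creation fails at index
 * fail_at (pass -1 for no simulated failure).  Returns the number of
 * threads that were started and joined.
 */
static int start_and_join(int nthreads, int fail_at, bool *failed)
{
    pthread_t tids[MAX_THREADS];
    int slots[MAX_THREADS] = { 0 };
    int started;
    int i;

    *failed = false;
    if (nthreads > MAX_THREADS) {
        nthreads = MAX_THREADS;
    }
    for (i = 0; i < nthreads; i++) {
        if (i == fail_at ||
            pthread_create(&tids[i], NULL, worker, &slots[i]) != 0) {
            *failed = true;
            break;              /* i now equals the number started */
        }
    }
    started = i;
    for (i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
    }
    return started;
}
```

Because the loop index is the count on both the success and the failure path, no separate counter has to be kept in sync.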

Rest of the function:

>  g_free(memset_thread);
   memset_thread = NULL;

   return memset_thread_failed;
   }

If do_touch_pages() sets memset_thread_failed, we report failure without
setting an error.  I believe you should

   if (memset_thread_failed) {
   error_setg(errp, "os_mem_prealloc: Insufficient free host memory "
   "pages available to allocate guest RAM");
   return false;
   }
   return true;

here, and ...

> @@ -471,6 +477,7 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
>  struct sigaction act, oldact;
>  size_t hpagesize = qemu_fd_getpagesize(fd);
>  size_t numpages = DIV_ROUND_UP(memory, hpagesize);
> +Error *local_err = NULL;
>  
>  memset(&act, 0, sizeof(act));
>  act.sa_handler = &sigbus_handler;
> @@ -484,9 +491,9 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
>  }
>  
>  /* touch pages simultaneously */
> -if (touch_all_pages(area, hpagesize, numpages, smp_cpus)) {
> -error_setg(errp, "os_mem_prealloc: Insufficient free host memory "
> -"pages available to allocate guest RAM");
> +if (touch_all_pages(area, hpagesize, numpages, smp_cpus, &local_err)) {
> +error_propagate_prepend(errp, local_err, "os_mem_prealloc: Insufficient"
> +" free host memory pages available to allocate guest RAM: ");
>  }

... not mess with the error message here, i.e.

   touch_all_pages(area, hpagesize, numpages, smp_cpus, errp);

>  
>  ret = sigaction(SIGBUS, &oldact, NULL);

Re: [Qemu-devel] [PATCH v2] qemu-io: Reinitialize optind to 1 (not 0) before parsing inner command.

2019-01-07 Thread Eric Blake

On 1/7/19 11:50 AM, Max Reitz wrote:

>>>> Note I didn't set optreset.  It's not present in glibc and the "hard
>>>> reset" is not necessary in this context.
>>>
>>> But it sure sounds like FreeBSD requires you to set it, doesn't it?

No.  Quoting https://www.freebsd.org/cgi/man.cgi?getopt(3)

     The variables opterr and optind are both initialized to 1.  The optind
     variable may be set to another value before a set of calls to getopt() in
     order to skip over more or less argv entries.

so resetting it to 1 as a soft reset is no different to setting it to 2
to skip argv[1].  It just means that you didn't get the hard reset of
internal state (there is definitely internal state if argv[1] was merged
short options - but that state is cleared if you run getopt() until it
returns -1; there may also be internal state if you used extensions, but
when you don't use extensions, such internal state is irrelevant).

I think the BSD man page needs updating, and that will probably happen
if I file my promised POSIX defect.

>> At the end of the day, both GNU optind=0 and BSD optreset=1 are
>> sufficient to force a hard reset of all hidden state.  But if you don't
>> use POSIX extensions, and always run getopt() until a -1 return, then
>> setting optind=1 is a portable soft reset, regardless of how the hidden
>> state is implemented, and regardless of how (or even if) libc offers a
>> hard reset, even though POSIX itself is currently lacking that mention.
>> (I should probably file a POSIX defect to get that wording listed in POSIX)
> 
> Hm, OK?  Is there any guarantee for that behavior for FreeBSD, or is
> that just how it is?  Because the man page is very clear on it:
> "optreset must be set to 1".  It doesn't talk about soft or hard resets
> like the glibc man page does.
> 
> And if optreset not being available for glibc is the only issue, I'd say
> adding it as a weak global variable would work without #ifdefs.

I don't see the point - Richard has already tested that optind = 1
worked on BSD machines for our purposes, so we don't have to worry about
the hard reset aspect of optreset=1.  (But yes, it would also be nice if
BSD and glibc folks could agree on how to do hard resets, instead of
having two different incompatible ways).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[Qemu-devel] [Bug 1810545] Re: [alpha] Strange exception address reported

2019-01-07 Thread Peter Maydell

The problem seems to be that the PC we report for an OPCDEC is first
selected by gen_invalid()/gen_excp() in target/alpha/translate.c, which
uses pc_next (ie the insn's address plus 4). But that is then handed
through to our custom PALcode
(https://git.qemu.org/?p=qemu-palcode.git;a=blob;f=pal.S;h=1781c4b415700ca3a68af07fdae90ae43e722501;hb=HEAD)
which does
  addq p6, 4, p1  // increment past the faulting insn
resulting in insn + 8.

That is, the palcode and the QEMU code have a disagreement about what
the (private) API between them is. I'm not sure which side is wrong and
should be corrected. I think the linux-user code assumes the same thing
that translate.c is doing, so perhaps the palcode.

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1810545

Title:
  [alpha] Strange exception address reported

Status in QEMU:
  New

Bug description:
  For some reason the SIGILL handler receives a different address under
  qemu than it used to on real hardware. I don't know specifics about
  the hardware used back then – it was some sort of 21264a somewhere
  between 600-800 MHz –, and I cannot say anything about the kernel as
  well, but I know that it delivered the faulting address +4, while
  under qemu it receives +8. I know because CACAO, an early Java JIT
  compiler extracts the address from the SIGILL handler and inspects the
  code at the faulting site, and it has subtracted 4 from the handler
  address since the dawn of time, and this used to produce the desired
  result on the Alpha hardware. It actually ran on two different Alpha
  machines over the years, and both behaved identically.

  The handler looks like this:
  void handler_sigill(int sig, siginfo_t *siginfo, void *_p)
  {
    uintptr_t trap_address = (uintptr_t) (((ucontext_t*) _p)->uc_mcontext.sc_pc) - 4;
  }

  (paraphrasing, the actual code is here:
  https://bitbucket.org/cacaovm/cacao-staging/src/c8d3fbab864c3243f97629fcfa8d84ba71f38157/src/vm/jit/alpha/linux/md-os.cpp?at=default&fileviewer=file-view-default#md-os.cpp-65)

  I don't know much about the qemu source code and cannot say where this
  is coming from at first glance. The gen_invalid function uses pc_next,
  which sounds like the next instruction, not the next-to-next ;). In
  theory it could actually be the kernel's fault, although I consider
  this unlikely.

  This is qemu-system-alpha with apparently the last Debian which
  existed for Alpha (lenny). The kernel is 2.6.26-2-alpha-generic
  (Debian 2.6.26-29). Observed with qemu git 1b3e80082b, but I guess it
  is the same with any version.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1810545/+subscriptions

[Qemu-devel] [PATCH v2 3/5] virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.

2019-01-07 Thread Venu Busireddy

Add a query command to check the status of the FAILOVER_STANDBY_CHANGED
state of the virtio_net devices.

Signed-off-by: Venu Busireddy 
---
 hw/net/virtio-net.c| 16 
 include/hw/virtio/virtio-net.h |  1 +
 include/net/net.h  |  2 ++
 net/net.c  | 59 ++
 qapi/net.json  | 46 
 5 files changed, 124 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7b1bcde..a4e07ac 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -263,9 +263,11 @@ static void virtio_net_failover_notify_event(VirtIONet *n, uint8_t status)
  */
 if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
 (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))) {
+n->standby_enabled = true;
 qapi_event_send_failover_standby_changed(!!ncn, ncn, path, true);
 } else if ((!(status & VIRTIO_CONFIG_S_DRIVER_OK)) &&
 (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+n->standby_enabled = false;
 qapi_event_send_failover_standby_changed(!!ncn, ncn, path, false);
 }
 }
@@ -448,6 +450,19 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
 return info;
 }
 
+static StandbyStatusInfo *virtio_net_query_standby_status(NetClientState *nc)
+{
+StandbyStatusInfo *info;
+VirtIONet *n = qemu_get_nic_opaque(nc);
+
+info = g_malloc0(sizeof(*info));
+info->device = g_strdup(n->netclient_name);
+info->path = g_strdup(object_get_canonical_path(OBJECT(n->qdev)));
+info->enabled = n->standby_enabled;
+
+return info;
+}
+
 static void virtio_net_reset(VirtIODevice *vdev)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
@@ -1923,6 +1938,7 @@ static NetClientInfo net_virtio_info = {
 .receive = virtio_net_receive,
 .link_status_changed = virtio_net_set_link_status,
 .query_rx_filter = virtio_net_query_rxfilter,
+.query_standby_status = virtio_net_query_standby_status,
 };
 
 static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 4d7f3c8..9071e96 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -103,6 +103,7 @@ typedef struct VirtIONet {
 int announce_counter;
 bool needs_vnet_hdr_swap;
 bool mtu_bypass_backend;
+bool standby_enabled;
 } VirtIONet;
 
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
diff --git a/include/net/net.h b/include/net/net.h
index ec13702..61e8513 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -50,6 +50,7 @@ typedef void (NetCleanup) (NetClientState *);
 typedef void (LinkStatusChanged)(NetClientState *);
 typedef void (NetClientDestructor)(NetClientState *);
 typedef RxFilterInfo *(QueryRxFilter)(NetClientState *);
+typedef StandbyStatusInfo *(QueryStandbyStatus)(NetClientState *);
 typedef bool (HasUfo)(NetClientState *);
 typedef bool (HasVnetHdr)(NetClientState *);
 typedef bool (HasVnetHdrLen)(NetClientState *, int);
@@ -71,6 +72,7 @@ typedef struct NetClientInfo {
 NetCleanup *cleanup;
 LinkStatusChanged *link_status_changed;
 QueryRxFilter *query_rx_filter;
+QueryStandbyStatus *query_standby_status;
 NetPoll *poll;
 HasUfo *has_ufo;
 HasVnetHdr *has_vnet_hdr;
diff --git a/net/net.c b/net/net.c
index 1f7d626..a6d8e73 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1320,6 +1320,65 @@ RxFilterInfoList *qmp_query_rx_filter(bool has_name, const char *name,
 return filter_list;
 }
 
+StandbyStatusInfoList *qmp_query_standby_status(bool has_device,
+const char *device,
+Error **errp)
+{
+NetClientState *nc;
+StandbyStatusInfoList *status_list = NULL, *last_entry = NULL;
+
+QTAILQ_FOREACH(nc, &net_clients, next) {
+StandbyStatusInfoList *entry;
+StandbyStatusInfo *info;
+
+if (has_device && strcmp(nc->name, device) != 0) {
+continue;
+}
+
+/* only query standby status information of NIC */
+if (nc->info->type != NET_CLIENT_DRIVER_NIC) {
+if (has_device) {
+error_setg(errp, "net client(%s) isn't a NIC", device);
+return NULL;
+}
+continue;
+}
+
+/* only query information on queue 0 since the info is per nic,
+ * not per queue
+ */
+if (nc->queue_index != 0)
+continue;
+
+if (nc->info->query_standby_status) {
+info = nc->info->query_standby_status(nc);
+entry = g_malloc0(sizeof(*entry));
+entry->value = info;
+
+if (!status_list) {
+status_list = entry;
+} else {
+last_entry->next = entry;
+}
+last_entry = entry;
+
