[Bug 1844635] Re: qemu bug where load linux kernel

2020-05-11 Thread Thomas Huth
** Information type changed from Private Security to Public Security

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1844635

Title:
  qemu bug where load linux kernel

Status in QEMU:
  Fix Released

Bug description:
  I found a QEMU bug that is triggered when QEMU starts and parses the kernel file.

  This vulnerability can be exploited.

  thanks

  /

  
  (gdb) set args -nodefaults -device pc-testdev -device 
isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device 
pci-testdev -machine accel=kvm -m 2048  -smp 2 -cpu host -machine 
kernel_irqchip=split -kernel poc1
  (gdb) r
  Starting program: /usr/bin/qemu-system-x86_64 -nodefaults -device pc-testdev 
-device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device 
pci-testdev -machine accel=kvm -m 2048  -smp 2 -cpu host -machine 
kernel_irqchip=split -kernel ./poc/poc1
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
  [New Thread 0x7fffe9a03700 (LWP 30066)]
  [New Thread 0x7fffe9202700 (LWP 30068)]
  [New Thread 0x7fffe8a01700 (LWP 30069)]

  Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
  __memmove_avx_unaligned_erms () at 
../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:249
  249   ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file 
or directory.
  (gdb) bt
  #0  0x72390b1f in __memmove_avx_unaligned_erms () at 
../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:249
  #1  0x559ebdcf in rom_copy ()
  #2  0x558dd1b3 in load_multiboot ()
  #3  0x558de1c3 in  ()
  #4  0x558e19d1 in pc_memory_init ()
  #5  0x558e4ee3 in  ()
  #6  0x559e8500 in machine_run_board_init ()
  #7  0x55834959 in main ()
  (gdb) c
  Continuing.
  Couldn't get registers: No such process.
  Couldn't get registers: No such process.
  (gdb) [Thread 0x7fffe8a01700 (LWP 30069) exited]
  [Thread 0x7fffe9202700 (LWP 30068) exited]
  [Thread 0x7fffe9a03700 (LWP 30066) exited]

  Program terminated with signal SIGSEGV, Segmentation fault.
  The program no longer exists.

  ***/

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1844635/+subscriptions



Re: [PATCH V2] Add a new PIIX option to control PCI hot unplugging of devices on non-root buses

2020-05-11 Thread Ani Sinha


> On May 12, 2020, at 12:23 AM, Igor Mammedov  wrote:
> 
>> 
>> static void build_append_pci_bus_devices(Aml *parent_scope, PCIBus *bus,
>> - bool pcihp_bridge_en)
>> + bool pcihp_bridge_en,
>> + bool pcihup_bridge_en)
>> {
>> Aml *dev, *notify_method = NULL, *method;
>> QObject *bsel;
>> @@ -479,11 +484,14 @@ static void build_append_pci_bus_devices(Aml 
>> *parent_scope, PCIBus *bus,
>> dev = aml_device("S%.02X", PCI_DEVFN(slot, 0));
>> aml_append(dev, aml_name_decl("_SUN", aml_int(slot)));
>> aml_append(dev, aml_name_decl("_ADR", aml_int(slot << 16)));
>> -method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
>> -aml_append(method,
>> -aml_call2("PCEJ", aml_name("BSEL"), aml_name("_SUN"))
>> -);
>> -aml_append(dev, method);
>> +if (pcihup_bridge_en || pci_bus_is_root(bus)) {
> 
> so you are keeping unplug anyway in case of host bridge, so user will see
> eject icon if device is in root bus?

Yes, the user will see the eject option from system tray for devices plugged 
into the root bus. The idea is that whereas we disallow some devices from 
hot-unplugging, other devices which are plugged into the root bus can be hot 
plugged and unplugged. This leaves some room for flexibility across devices and 
VMs.
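
As a minimal sketch of the condition in the hunk above (the helper name
slot_gets_eject_method is hypothetical and not part of the patch), the
decision whether a slot gets an _EJ0 method reduces to a single predicate:

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the patch: mirrors the condition
 * "pcihup_bridge_en || pci_bus_is_root(bus)" from the hunk above.
 * A slot on the root bus always keeps its eject method; a slot behind
 * a bridge keeps it only when the new option is enabled.
 */
static bool slot_gets_eject_method(bool bus_is_root, bool pcihup_bridge_en)
{
    return pcihup_bridge_en || bus_is_root;
}
```

With the option off, only root-bus slots would advertise ejection, which is
the partial behavior being discussed here.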

> 
> Another thing about this patch is that it only partially disables hotplug,
> I'd rather do it the way hardware does i.e. full hotplug or no hotplug at all.
> (like the other hypervisors have done it, to workaround this Windows 
> 'feature’)

So the main objection against this patch is that with this option enabled, we 
are violating what real HW does and since we want emulated HW to mimic real HW 
behavior as close as possible, we are breaking this assumption. Am I correct?

> 
> which is possible if one puts the device on a PCI bridge without hotplug, i.e.
> 
> -global PIIX4_PM.acpi-pci-hotplug-with-bridge-support=off

right.

> 
> that of course leaves ACPI hotplug on and, as you noticed earlier,
> Windows will offer to eject any device on root bus including directly
> attached bridges. And currently there is no way to disable that.

Right. However, I have tested that even though the PCI bridge shows up as a 
device in the “safely remove HW” option in the system tray, trying to eject a 
PCI bridge with devices attached will result in failure with the error message 
“this device is currently in use”.

> 
> Will following hack work for you?
> possible permutations
> 1) ACPI hotplug everywhere
> -global PIIX4_PM.acpi-pci-hotplug=on -global 
> PIIX4_PM.acpi-pci-hotplug-with-bridge-support=on -device 
> pci-bridge,chassis_nr=1,shpc=doesnt_matter -device 
> e1000,bus=pci.1,addr=01,id=netdev1 
> 
> 2) No hotplug at all
> -global PIIX4_PM.acpi-pci-hotplug=off -global 
> PIIX4_PM.acpi-pci-hotplug-with-bridge-support=on -device 
> pci-bridge,chassis_nr=1,shpc=off -device e1000,bus=pci.1,addr=01,id=netdev1
> 
> -global PIIX4_PM.acpi-pci-hotplug=off -global 
> PIIX4_PM.acpi-pci-hotplug-with-bridge-support=off -device 
> pci-bridge,chassis_nr=1,shpc=doesnt_matter  -device 
> e1000,bus=pci.1,addr=01,id=netdev1

Given that my patch is not acceptable, I’d prefer the following in the order of 
preference:

(a) Have an option to disable hot ejection of PCI-PCI bridge so that Windows 
does not even show this HW in the “safely remove HW” option. If we can do this 
then from OS perspective the GUI options will be same as what is available with 
PCIE/q35 - none of the devices will be hot ejectable if the hot plug option is 
turned off from the PCIE slots where devices are plugged into.
I looked at the code. It seems to manipulate the ACPI tables of the empty slots of 
the root bus where no devices are attached (see the comment "/* add hotplug slots 
for non present devices */"). For cold-plugged bridges, it recurses down to 
scan the slots of the bridge. Is it possible to disable hot plug for the slot 
to which the bridge is attached?

(b) Failing above, having a global option to disable all hot plug, including 
the 32 slots of the root bus would be good. However, this does not give us the 
flexibility we have with PCIE (that is, to hot plug a  device, we can always 
plug it to a slot with hot plug enabled).


Thanks for looking into my requirement more seriously,
ani


> 
> 3) looks like SHPC kicks in, but it still needs some bridge description in
> ACPI that acpi-pci-hotplug-with-bridge-support provides. Probably with this
> you can individually flip hotplug on cold-plugged bridges using the 'shpc'
> property (requires Vista or newer, tested with win10).
> 
> This needs some investigation so we could remove unused AML and IO ports,
> but I'm not really interested in PCI stuff. So if 1+2 works for you, I'll
> post formal patches. If #3 is required feel 

[Bug 1878136] [NEW] Assertion failures in ati_reg_read_offs/ati_reg_write_offs

2020-05-11 Thread Alexander Bulekov
Public bug reported:

Hello,
While fuzzing, I found inputs that trigger assertion failures in
ati_reg_read_offs/ati_reg_write_offs

uint32_t extract32(uint32_t, int, int): Assertion `start >= 0 && length
> 0 && length <= 32 - start' failed

#3 0x76866092 in __GI___assert_fail (assertion=0x56e760c0  
"start >= 0 && length > 0 && length <= 32 - start", file=0x56e76120  
"/home/alxndr/Development/qemu/include/qemu/bitops.h", line=0x12c, 
function=0x56e76180 <__PRETTY_FUNCTION__.extract32> "uint32_t 
extract32(uint32_t, int, int)") at assert.c:101
#4 0x5653d8a7 in ati_mm_read (opaque=, addr=0x1a, 
size=) at 
/home/alxndr/Development/qemu/include/qemu/log-for-trace.h:29
#5 0x5653c825 in ati_mm_read (opaque=, addr=0x4, 
size=) at /home/alxndr/Development/qemu/hw/display/ati.c:289
#6 0x5601446e in memory_region_read_accessor (mr=0x6314dc20, 
addr=, value=, size=, 
shift=, mask=, attrs=...) at 
/home/alxndr/Development/qemu/memory.c:434
#7 0x56001a70 in access_with_adjusted_size (addr=, 
value=, size=, access_size_min=, 
access_size_max=, access_fn=, mr=0x6314dc20, 
attrs=...) at /home/alxndr/Development/qemu/memory.c:544
#8 0x56001a70 in memory_region_dispatch_read1 (mr=0x6314dc20, 
addr=0x4, pval=, size=0x4, attrs=...) at 
/home/alxndr/Development/qemu/memory.c:1396

I can reproduce it with a QEMU 5.0 build using:
cat << EOF | ~/Development/qemu/build/i386-softmmu/qemu-system-i386 -M 
pc-q35-5.0 -device ati-vga -nographic -qtest stdio -monitor none -serial none
outl 0xcf8 0x80001018
outl 0xcfc 0xe200
outl 0xcf8 0x8000101c
outl 0xcf8 0x80001004
outw 0xcfc 0x7
outl 0xcf8 0x8000fa20
write 0xe204 0x1 0x1a
readq 0xe200
EOF

Similarly for ati_reg_write_offs:
cat << EOF | ~/Development/qemu/build/i386-softmmu/qemu-system-i386 -M 
pc-q35-5.0 -device ati-vga -nographic -qtest stdio -monitor none -serial none
outl 0xcf8 0x80001018
outl 0xcfc 0xe200
outl 0xcf8 0x8000101c
outl 0xcf8 0x80001004
outw 0xcfc 0x7
outl 0xcf8 0x8000fa20
write 0xe200 0x8 0x6a006a00
EOF

I also attached the traces to this launchpad report, in case the
formatting is broken:

qemu-system-i386 -M pc-q35-5.0 -device ati-vga -nographic -qtest stdio
-monitor none -serial none < attachment

Please let me know if I can provide any further info.
-Alex

** Affects: qemu
 Importance: Undecided
 Status: New

** Attachment added: "attachment"
   https://bugs.launchpad.net/bugs/1878136/+attachment/5370128/+files/attachment

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1878136

Title:
   Assertion failures in ati_reg_read_offs/ati_reg_write_offs

Status in QEMU:
  New


[Bug 1878136] Re: Assertion failures in ati_reg_read_offs/ati_reg_write_offs

2020-05-11 Thread Alexander Bulekov
** Attachment added: "The qtest commands for triggering the assertion in 
ati_reg_read_offs"
   
https://bugs.launchpad.net/qemu/+bug/1878136/+attachment/5370129/+files/attachment2

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1878136

Title:
   Assertion failures in ati_reg_read_offs/ati_reg_write_offs

Status in QEMU:
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1878136/+subscriptions



Re: [PATCH v16 QEMU 09/16] vfio: Add save state functions to SaveVMHandlers

2020-05-11 Thread Yan Zhao
On Mon, May 11, 2020 at 05:53:37PM +0800, Kirti Wankhede wrote:
> 
> 
> On 5/5/2020 10:07 AM, Alex Williamson wrote:
> > On Tue, 5 May 2020 04:48:14 +0530
> > Kirti Wankhede  wrote:
> > 
> >> On 3/26/2020 3:33 AM, Alex Williamson wrote:
> >>> On Wed, 25 Mar 2020 02:39:07 +0530
> >>> Kirti Wankhede  wrote:
> >>>

<...>

>  +static int vfio_save_iterate(QEMUFile *f, void *opaque)
>  +{
>  +VFIODevice *vbasedev = opaque;
>  +int ret, data_size;
>  +
>  +qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
>  +
>  +data_size = vfio_save_buffer(f, vbasedev);
>  +
>  +if (data_size < 0) {
>  +error_report("%s: vfio_save_buffer failed %s", vbasedev->name,
>  + strerror(errno));
>  +return data_size;
>  +}
>  +
>  +qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>  +
>  +ret = qemu_file_get_error(f);
>  +if (ret) {
>  +return ret;
>  +}
>  +
>  +trace_vfio_save_iterate(vbasedev->name, data_size);
>  +if (data_size == 0) {
>  +/* indicates data finished, goto complete phase */
>  +return 1;
> >>>
> >>> But it's pending_bytes not data_size that indicates we're done.  How do
> >>> we get away with ignoring pending_bytes for the save_live_iterate phase?
> >>>
> >>
> >> This is requirement mentioned above qemu_savevm_state_iterate() which
> >> calls .save_live_iterate.
> >>
> >> /* 
> >>* this function has three return values:
> >>*   negative: there was one error, and we have -errno.
> >>*   0 : We haven't finished, caller have to go again
> >>*   1 : We have finished, we can go to complete phase
> >>*/
> >> int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
> >>
> >> This is to serialize savevm_state.handlers (or in other words devices).
> > 
> > I've lost all context on this question in the interim, but I think this
> > highlights my question.  We use pending_bytes to know how close we are
> > to the end of the stream and data_size to iterate each transaction
> > within that stream.  So how does data_size == 0 indicate we've
> > completed the current phase?  It seems like pending_bytes should
> > indicate that.  Thanks,
> > 
> 
> Fixing this by adding a read of pending_bytes if it's 0 and returning
> accordingly.
>  if (migration->pending_bytes == 0) {
>  ret = vfio_update_pending(vbasedev);
>  if (ret) {
>  return ret;
>  }
> 
>  if (migration->pending_bytes == 0) {
>  /* indicates data finished, goto complete phase */
>  return 1;
>  }
>  }
> 

Just a question: if 1 is only returned when migration->pending_bytes is 0,
does that mean that .save_live_iterate of the vmstates after "vfio-pci"
will never be called until migration->pending_bytes is 0?

as in qemu_savevm_state_iterate(),

qemu_savevm_state_iterate {
...
  QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
...
ret = se->ops->save_live_iterate(f, se->opaque);
...
if (ret <= 0) {
/* Do not proceed to the next vmstate before this one reported
   completion of the current stage. This serializes the migration
   and reduces the probability that a faster changing state is
   synchronized over and over again. */
break;
}
  }
  return ret;
}

In RAM's migration code, pending_bytes (remaining_size) is only updated in
ram_save_pending() when it is below the threshold, which means that in
ram_save_iterate() pending_bytes can be 0, so the other vmstates get a
chance to be called.
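
To make the concern concrete, here is a minimal sketch of the serialization
in the loop above, using stand-in types. All names (struct handler,
iterate_all, pending_based_iterate) are hypothetical and only model the
control flow; this is not the QEMU code.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a SaveVMHandlers entry; all names here are illustrative. */
typedef int (*iterate_fn)(void *opaque);

struct handler {
    iterate_fn iterate;
    void *opaque;
    int calls;          /* how many times this handler was invoked */
};

/*
 * Same control flow as the excerpt above: stop at the first handler that
 * does not report completion (ret <= 0), so later handlers are skipped.
 */
static int iterate_all(struct handler *h, size_t n)
{
    int ret = 1;

    assert(h != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        h[i].calls++;
        ret = h[i].iterate(h[i].opaque);
        if (ret <= 0) {
            break;
        }
    }
    return ret;
}

/* Returns 1 only once its pending counter has drained to zero. */
static int pending_based_iterate(void *opaque)
{
    int *pending = opaque;

    if (*pending > 0) {
        (*pending)--;
    }
    return *pending == 0;
}
```

If the first handler starts with a pending count of 3 and the second with 1,
the second handler is not invoked at all during the first two passes; it
only runs on the third pass, once the first handler returns 1.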

Thanks
Yan




Re: [PATCH v16 QEMU 09/16] vfio: Add save state functions to SaveVMHandlers

2020-05-11 Thread Yan Zhao
On Mon, May 11, 2020 at 06:22:47PM +0800, Kirti Wankhede wrote:
> 
> 
> On 5/9/2020 11:01 AM, Yan Zhao wrote:
> > On Wed, Mar 25, 2020 at 05:09:07AM +0800, Kirti Wankhede wrote:
> >> Added .save_live_pending, .save_live_iterate and 
> >> .save_live_complete_precopy
> >> functions. These functions handle the pre-copy and stop-and-copy phases.
> >>
> >> In _SAVING|_RUNNING device state or pre-copy phase:
> >> - read pending_bytes. If pending_bytes > 0, go through below steps.
> >> - read data_offset - indicates kernel driver to write data to staging
> >>buffer.
> >> - read data_size - amount of data in bytes written by vendor driver in
> >>migration region.
> > I think we should change the sequence of reading data_size and
> > data_offset. see the next comment below.
> > 
> >> - read data_size bytes of data from data_offset in the migration region.
> >> - Write data packet to file stream as below:
> >> {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
> >> VFIO_MIG_FLAG_END_OF_STATE }
> >>
> >> In _SAVING device state or stop-and-copy phase
> >> a. read config space of device and save to migration file stream. This
> >> doesn't need to be from vendor driver. Any other special config state
> >> from driver can be saved as data in following iteration.
> >> b. read pending_bytes. If pending_bytes > 0, go through below steps.
> >> c. read data_offset - indicates kernel driver to write data to staging
> >> buffer.
> >> d. read data_size - amount of data in bytes written by vendor driver in
> >> migration region.
> >> e. read data_size bytes of data from data_offset in the migration region.
> >> f. Write data packet as below:
> >> {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
> >> g. iterate through steps b to f while (pending_bytes > 0)
> >> h. Write {VFIO_MIG_FLAG_END_OF_STATE}
> >>
> >> When the data region is mapped, it's the user's responsibility to read
> >> data_size bytes from data_offset before moving to the next steps.
> >>
> >> Signed-off-by: Kirti Wankhede 
> >> Reviewed-by: Neo Jia 
> >> ---
> >>   hw/vfio/migration.c   | 245 
> >> +-
> >>   hw/vfio/trace-events  |   6 ++
> >>   include/hw/vfio/vfio-common.h |   1 +
> >>   3 files changed, 251 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >> index 033f76526e49..ecbeed5182c2 100644
> >> --- a/hw/vfio/migration.c
> >> +++ b/hw/vfio/migration.c
> >> @@ -138,6 +138,137 @@ static int vfio_migration_set_state(VFIODevice 
> >> *vbasedev, uint32_t mask,
> >>   return 0;
> >>   }
> >>   
> >> +static void *find_data_region(VFIORegion *region,
> >> +  uint64_t data_offset,
> >> +  uint64_t data_size)
> >> +{
> >> +void *ptr = NULL;
> >> +int i;
> >> +
> >> +for (i = 0; i < region->nr_mmaps; i++) {
> >> +if ((data_offset >= region->mmaps[i].offset) &&
> >> +(data_offset < region->mmaps[i].offset + 
> >> region->mmaps[i].size) &&
> >> +(data_size <= region->mmaps[i].size)) {
> >> +ptr = region->mmaps[i].mmap + (data_offset -
> >> +   region->mmaps[i].offset);
> >> +break;
> >> +}
> >> +}
> >> +return ptr;
> >> +}
> >> +
> >> +static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev)
> >> +{
> >> +VFIOMigration *migration = vbasedev->migration;
> >> +VFIORegion *region = &migration->region;
> >> +uint64_t data_offset = 0, data_size = 0;
> >> +int ret;
> >> +
> >> +ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
> >> +region->fd_offset + offsetof(struct 
> >> vfio_device_migration_info,
> >> + data_offset));
> >> +if (ret != sizeof(data_offset)) {
> >> +error_report("%s: Failed to get migration buffer data offset %d",
> >> + vbasedev->name, ret);
> >> +return -EINVAL;
> >> +}
> >> +
> >> +ret = pread(vbasedev->fd, &data_size, sizeof(data_size),
> >> +region->fd_offset + offsetof(struct 
> >> vfio_device_migration_info,
> >> + data_size));
> >> +if (ret != sizeof(data_size)) {
> >> +error_report("%s: Failed to get migration buffer data size %d",
> >> + vbasedev->name, ret);
> >> +return -EINVAL;
> >> +}
> > data_size should be read first, and if it's 0, data_offset will not
> > be read further.
> > 
> > the reasons are below:
> > 1. if there's no data region provided by vendor driver, there's no
> > reason to get a valid data_offset, so reading/writing of data_offset
> > should fail. And this should not be treated as a migration error.
> > 
> > 2. even if pending_bytes is 0, vfio_save_iterate() can still be
> > called, and therefore vfio_save_buffer() is called too.
> > 
> 
> As I mentioned in reply to Alex in:
> 

Re: [RESEND PATCH v3 1/1] ppc/spapr: Add hotremovable flag on DIMM LMBs on drmem_v2

2020-05-11 Thread David Gibson
On Mon, May 11, 2020 at 05:02:02PM -0300, Leonardo Bras wrote:
> From: Leonardo Bras 
> 
> On reboot, all memory that was previously added using object_add and
> device_add is placed in this DIMM area.
> 
> The new SPAPR_LMB_FLAGS_HOTREMOVABLE flag helps Linux to put this memory in
> the correct memory zone, so no unmovable allocations are made there,
> allowing the object to be easily hot-removed by device_del and
> object_del.
> 
> This new flag was accepted in Power Architecture documentation.
> 
> Signed-off-by: Leonardo Bras 
> Reviewed-by: Bharata B Rao 

Applied to ppc-for-5.1, thanks.

> 
> ---
> Changes since v1:
> - Flag name changed from SPAPR_LMB_FLAGS_HOTPLUGGED to
>   SPAPR_LMB_FLAGS_HOTREMOVABLE
> ---
>  hw/ppc/spapr.c | 3 ++-
>  include/hw/ppc/spapr.h | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 9a2bd501aa..fe662e297e 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -446,7 +446,8 @@ static int spapr_dt_dynamic_memory_v2(SpaprMachineState 
> *spapr, void *fdt,
>  g_assert(drc);
>  elem = spapr_get_drconf_cell(size / lmb_size, addr,
>   spapr_drc_index(drc), node,
> - SPAPR_LMB_FLAGS_ASSIGNED);
> + (SPAPR_LMB_FLAGS_ASSIGNED |
> +  SPAPR_LMB_FLAGS_HOTREMOVABLE);
>  QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
>  nr_entries++;
>  cur_addr = addr + size;
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 42d64a0368..93e0d43051 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -880,6 +880,7 @@ int spapr_rtc_import_offset(SpaprRtcState *rtc, int64_t 
> legacy_offset);
>  #define SPAPR_LMB_FLAGS_ASSIGNED 0x0008
>  #define SPAPR_LMB_FLAGS_DRC_INVALID 0x0020
>  #define SPAPR_LMB_FLAGS_RESERVED 0x0080
> +#define SPAPR_LMB_FLAGS_HOTREMOVABLE 0x0100
>  
>  void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
>  
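
A minimal sketch of how the combined flag word from the hunk above could be
built and tested. The flag values are copied from the patch; the two helper
functions (dimm_lmb_flags, lmb_is_hotremovable) are hypothetical and exist
only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as they appear in the hunk above. */
#define SPAPR_LMB_FLAGS_ASSIGNED     0x0008
#define SPAPR_LMB_FLAGS_DRC_INVALID  0x0020
#define SPAPR_LMB_FLAGS_RESERVED     0x0080
#define SPAPR_LMB_FLAGS_HOTREMOVABLE 0x0100

/* Hypothetical helper: flags for an LMB that lives in the DIMM area. */
static uint32_t dimm_lmb_flags(void)
{
    return SPAPR_LMB_FLAGS_ASSIGNED | SPAPR_LMB_FLAGS_HOTREMOVABLE;
}

/* Hypothetical helper: test the new bit in a flag word. */
static bool lmb_is_hotremovable(uint32_t flags)
{
    return (flags & SPAPR_LMB_FLAGS_HOTREMOVABLE) != 0;
}
```

Because the new flag occupies its own bit, it can be OR-ed with
SPAPR_LMB_FLAGS_ASSIGNED without disturbing the existing flags.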

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




[PATCH v3 2/2] char-file: add test for distinct path= and pathin=

2020-05-11 Thread Alexander Bulekov
Signed-off-by: Alexander Bulekov 
---
 tests/test-char.c | 96 +++
 1 file changed, 96 insertions(+)

diff --git a/tests/test-char.c b/tests/test-char.c
index 3afc9b1b8d..6c66fae86a 100644
--- a/tests/test-char.c
+++ b/tests/test-char.c
@@ -1228,6 +1228,101 @@ static void char_file_test_internal(Chardev *ext_chr, 
const char *filepath)
 g_free(out);
 }
 
+static int file_can_read(void *opaque)
+{
+return 4096;
+}
+
+static void file_read(void *opaque, const uint8_t *buf, int size)
+{
+int ret;
+Chardev *chr = *(Chardev **)opaque;
+g_assert_cmpint(size, <=, file_can_read(opaque));
+
+g_assert_cmpint(size, ==, 6);
+g_assert(strncmp((const char *)buf, "hello!", 6) == 0);
+ret = qemu_chr_write_all(chr, (const uint8_t *)"world!", 6);
+g_assert_cmpint(ret, ==, 6);
+quit = true;
+}
+
+static void char_file_separate_input_file(void)
+{
+char *tmp_path = g_dir_make_tmp("qemu-test-char.XXXXXX", NULL);
+char *in;
+char *out;
+QemuOpts *opts;
+Chardev *chr;
+ChardevFile file = {};
+CharBackend be;
+ChardevBackend backend = { .type = CHARDEV_BACKEND_KIND_FILE,
+   .u.file.data = &file };
+char *contents = NULL;
+gsize length;
+int ret;
+time_t in_mtime;
+GStatBuf file_stat;
+
+in = g_build_filename(tmp_path, "in", NULL);
+out = g_build_filename(tmp_path, "out", NULL);
+
+ret = g_file_set_contents(in, "hello!", 6, NULL);
+g_assert(ret == TRUE);
+g_stat(in, &file_stat);
+in_mtime = file_stat.st_mtime;
+/*
+ * Sleep to ensure that if the following actions modify the file, the mtime
+ * will be different
+ */
+sleep(1);
+opts = qemu_opts_create(qemu_find_opts("chardev"), "serial-id",
+1, &error_abort);
+qemu_opt_set(opts, "backend", "file", &error_abort);
+qemu_opt_set(opts, "pathin", in, &error_abort);
+qemu_opt_set(opts, "path", out, &error_abort);
+
+chr = qemu_chr_new_from_opts(opts, NULL, NULL);
+qemu_chr_fe_init(&be, chr, &error_abort);
+
+file.has_in = true;
+file.in = in;
+file.out = out;
+
+
+qemu_chr_fe_set_handlers(&be, file_can_read,
+ file_read,
+ NULL, NULL, &chr, NULL, true);
+
+chr = qemu_chardev_new(NULL, TYPE_CHARDEV_FILE, &backend,
+   NULL, &error_abort);
+g_assert_nonnull(chr);
+
+main_loop(); /* should call file_read, and copy contents of in to out */
+
+qemu_chr_fe_deinit(&be, true);
+
+/* Check that out was written to */
+ret = g_file_get_contents(out, &contents, &length, NULL);
+g_assert(ret == TRUE);
+g_assert_cmpint(length, ==, 6);
+g_assert(strncmp(contents, "world!", 6) == 0);
+g_free(contents);
+
+/* Check that in hasn't been modified */
+ret = g_file_get_contents(in, &contents, &length, NULL);
+g_assert(ret == TRUE);
+g_assert_cmpint(length, ==, 6);
+g_assert(strncmp(contents, "hello!", 6) == 0);
+g_stat(in, &file_stat);
+g_assert(file_stat.st_mtime == in_mtime);
+
+g_free(contents);
+g_rmdir(tmp_path);
+g_free(tmp_path);
+g_free(in);
+g_free(out);
+}
+
 static void char_file_test(void)
 {
 char_file_test_internal(NULL, NULL);
@@ -1398,6 +1493,7 @@ int main(int argc, char **argv)
 g_test_add_func("/char/pipe", char_pipe_test);
 #endif
 g_test_add_func("/char/file", char_file_test);
+g_test_add_func("/char/file/pathin", char_file_separate_input_file);
 #ifndef _WIN32
 g_test_add_func("/char/file-fifo", char_file_fifo_test);
 #endif
-- 
2.26.2




[PATCH v3 0/2] Add pathin option to -chardev file

2020-05-11 Thread Alexander Bulekov
This adds a pathin= option to -chardev file, which allows specifying
distinct input and output paths for the chardev. This functionality was
already available through QMP.

v3:
  * char-test: add a stat()-based check to ensure that the pathin=
file is not modified during the execution of the test.


Alexander Bulekov (2):
  chardev: enable distinct input for -chardev file
  char-file: add test for distinct path= and pathin=

 chardev/char-file.c |  5 +++
 chardev/char.c  |  3 ++
 qemu-options.hx |  7 +++-
 tests/test-char.c   | 96 +
 4 files changed, 109 insertions(+), 2 deletions(-)

-- 
2.26.2




[PATCH v3 1/2] chardev: enable distinct input for -chardev file

2020-05-11 Thread Alexander Bulekov
char-file already supports distinct paths for input/output but it was
only possible to specify a distinct input through QMP. With this change,
we can also specify a distinct input with the -chardev file argument:
qemu -chardev file,id=char1,path=/out/file,pathin=/in/file
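
The intended semantics can be sketched with a stand-in struct. This is a
minimal illustration under the assumption that, when pathin is omitted, path
serves as both input and output (as the documentation hunk in this patch
states); struct file_paths, resolve_file_paths and effective_input are
hypothetical names, not QEMU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the relevant ChardevFile fields; purely illustrative. */
struct file_paths {
    const char *out;
    const char *in;
    bool has_in;
};

/* path is required; pathin may be NULL when the option is omitted. */
static struct file_paths resolve_file_paths(const char *path,
                                            const char *pathin)
{
    struct file_paths fp = { .out = path, .in = NULL, .has_in = false };

    assert(path != NULL);
    if (pathin) {
        fp.has_in = true;
        fp.in = pathin;
    }
    return fp;
}

/* The file the chardev would read from under the assumption above. */
static const char *effective_input(const struct file_paths *fp)
{
    return fp->has_in ? fp->in : fp->out;
}
```

With path=/out/file,pathin=/in/file the input comes from /in/file; with only
path=/out/file the same file is used in both directions.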

Signed-off-by: Alexander Bulekov 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Darren Kenny 
---
 chardev/char-file.c | 5 +
 chardev/char.c  | 3 +++
 qemu-options.hx | 7 +--
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/chardev/char-file.c b/chardev/char-file.c
index 2fd80707e5..031f2aa7d7 100644
--- a/chardev/char-file.c
+++ b/chardev/char-file.c
@@ -100,6 +100,7 @@ static void qemu_chr_parse_file_out(QemuOpts *opts, 
ChardevBackend *backend,
 Error **errp)
 {
 const char *path = qemu_opt_get(opts, "path");
+const char *pathin = qemu_opt_get(opts, "pathin");
 ChardevFile *file;
 
 backend->type = CHARDEV_BACKEND_KIND_FILE;
@@ -110,6 +111,10 @@ static void qemu_chr_parse_file_out(QemuOpts *opts, 
ChardevBackend *backend,
 file = backend->u.file.data = g_new0(ChardevFile, 1);
 qemu_chr_parse_common(opts, qapi_ChardevFile_base(file));
 file->out = g_strdup(path);
+if (pathin) {
+file->has_in = true;
+file->in = g_strdup(pathin);
+}
 
 file->has_append = true;
 file->append = qemu_opt_get_bool(opts, "append", false);
diff --git a/chardev/char.c b/chardev/char.c
index e77564060d..97e03a8e48 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -849,6 +849,9 @@ QemuOptsList qemu_chardev_opts = {
 },{
 .name = "path",
 .type = QEMU_OPT_STRING,
+},{
+.name = "pathin",
+.type = QEMU_OPT_STRING,
 },{
 .name = "host",
 .type = QEMU_OPT_STRING,
diff --git a/qemu-options.hx b/qemu-options.hx
index 292d4e7c0c..488961099b 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2938,7 +2938,7 @@ DEF("chardev", HAS_ARG, QEMU_OPTION_chardev,
 "-chardev 
vc,id=id[[,width=width][,height=height]][[,cols=cols][,rows=rows]]\n"
 " [,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
 "-chardev ringbuf,id=id[,size=size][,logfile=PATH][,logappend=on|off]\n"
-"-chardev 
file,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+"-chardev 
file,id=id,path=path[,pathin=PATH][,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
 "-chardev 
pipe,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
 #ifdef _WIN32
 "-chardev console,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
@@ -3137,13 +3137,16 @@ The available backends are:
 Create a ring buffer with fixed size ``size``. size must be a power
 of two and defaults to ``64K``.
 
-``-chardev file,id=id,path=path``
+``-chardev file,id=id,path=path[,pathin=pathin]``
 Log all traffic received from the guest to a file.
 
 ``path`` specifies the path of the file to be opened. This file will
 be created if it does not already exist, and overwritten if it does.
 ``path`` is required.
 
+``pathin`` specifies a separate file as the input to the chardev. If
+``pathin`` is omitted, ``path`` is used for both input and output
+
 ``-chardev pipe,id=id,path=path``
 Create a two-way connection to the guest. The behaviour differs
 slightly between Windows hosts and other hosts:
-- 
2.26.2
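
The fallback described in the documentation hunk above (when ``pathin`` is omitted, ``path`` serves as both input and output) can be sketched in isolation. This is only an illustration under assumptions, not the QEMU implementation: `FileChardevPaths` and `file_chardev_input_path()` are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the parsed "-chardev file" options. */
typedef struct FileChardevPaths {
    const char *path;    /* required: file that receives guest output */
    const char *pathin;  /* optional: separate input file, may be NULL */
} FileChardevPaths;

/*
 * Return the file the chardev would read from: pathin when it was
 * given, otherwise the same file that is used for output.
 */
const char *file_chardev_input_path(const FileChardevPaths *p)
{
    assert(p != NULL && p->path != NULL);
    return p->pathin ? p->pathin : p->path;
}
```

With only `path` set, the helper returns `path`; with both set, it returns `pathin`.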




Re: [PATCH v2 5/5] vhost: add device started check in migration set log

2020-05-11 Thread Li Feng
Hi, Dima.

If vhost_migration_log() returns < 0, then vhost_log_global_start() will
trigger a crash.
Does your patch handle this abort?
If a disconnect happens in the migration stage, the correct operation
is to stop the migration, right?

    static void vhost_log_global_start(MemoryListener *listener)
    {
        int r;

        r = vhost_migration_log(listener, true);
        if (r < 0) {
            abort();
        }
    }

Thanks,

Feng Li

On Tue, May 12, 2020 at 11:33 AM, Jason Wang wrote:
>
>
> On 2020/5/11 5:25 PM, Dima Stepanov wrote:
> > On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote:
> >> On 2020/4/30 9:36 PM, Dima Stepanov wrote:
> >>> If vhost-user daemon is used as a backend for the vhost device, then we
> >>> should consider a possibility of disconnect at any moment. If such
> >>> disconnect happened in the vhost_migration_log() routine the vhost
> >>> device structure will be clean up.
> >>> At the start of the vhost_migration_log() function there is a check:
> >>>if (!dev->started) {
> >>>dev->log_enabled = enable;
> >>>return 0;
> >>>}
> >>> To be consistent with this check add the same check after calling the
> >>> vhost_dev_set_log() routine. This in general help not to break a
> >>> migration due the assert() message. But it looks like that this code
> >>> should be revised to handle these errors more carefully.
> >>>
> >>> In case of vhost-user device backend the fail paths should consider the
> >>> state of the device. In this case we should skip some function calls
> >>> during rollback on the error paths, so not to get the NULL dereference
> >>> errors.
> >>>
> >>> Signed-off-by: Dima Stepanov 
> >>> ---
> >>>   hw/virtio/vhost.c | 39 +++
> >>>   1 file changed, 35 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >>> index 3ee50c4..d5ab96d 100644
> >>> --- a/hw/virtio/vhost.c
> >>> +++ b/hw/virtio/vhost.c
> >>> @@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev 
> >>> *dev,
> >>>   static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
> >>>   {
> >>>   int r, i, idx;
> >>> +
> >>> +if (!dev->started) {
> >>> +/*
> >>> + * If vhost-user daemon is used as a backend for the
> >>> + * device and the connection is broken, then the vhost_dev
> >>> + * structure will be reset all its values to 0.
> >>> + * Add additional check for the device state.
> >>> + */
> >>> +return -1;
> >>> +}
> >>> +
> >>>   r = vhost_dev_set_features(dev, enable_log);
> >>>   if (r < 0) {
> >>>   goto err_features;
> >>> @@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, 
> >>> bool enable_log)
> >>>   }
> >>>   return 0;
> >>>   err_vq:
> >>> -for (; i >= 0; --i) {
> >>> +/*
> >>> + * Disconnect with the vhost-user daemon can lead to the
> >>> + * vhost_dev_cleanup() call which will clean up vhost_dev
> >>> + * structure.
> >>> + */
> >>> +for (; dev->started && (i >= 0); --i) {
> >>>   idx = dev->vhost_ops->vhost_get_vq_index(
> >>
> >> Why need the check of dev->started here, can started be modified outside
> >> mainloop? If yes, I don't get the check of !dev->started in the beginning 
> >> of
> >> this function.
> >>
> > No, dev->started can't change outside the mainloop. The main problem is
> > only for the vhost_user_blk daemon. Consider the case when we
> > successfully pass the dev->started check at the beginning of the
> > function, but after it we hit the disconnect on the next call on the
> > second or third iteration:
> >   r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log);
> > The unix socket backend device will call the disconnect routine for this
> > device and reset the structure. So the structure will be reset (and
> > dev->started set to false) inside this set_addr() call.
>
>
> I still don't get it. I think the disconnect cannot happen in the
> middle of vhost_dev_set_log(), since both of them run in the
> mainloop. And even if it can, we probably need a synchronization
> mechanism other than a simple check here.
>
>
> >   So
> > we shouldn't call the clean up calls because this virtqueues were clean
> > up in the disconnect call. But we should protect these calls somehow, so
> > it will not hit SIGSEGV and we will be able to pass migration.
> >
> > Just to summarize it:
> > For the vhost-user-blk devices we can hit clean up calls twice in case of
> > vhost disconnect:
> > 1. The first time during the disconnect process. The clean up is called
> > inside it.
> > 2. The second time during roll back clean up.
> > So if it is the case we should skip p2.
> >
> >>> dev, dev->vq_index + i);
> >>>   vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
> >>>dev->log_enabled);
> >>>   }
> >>> -vhost_dev_set_features(dev, 

Re: [PATCH v8 0/7] reference implementation of RSS and hash report

2020-05-11 Thread Jason Wang



On 2020/5/8 8:59 PM, Yuri Benditovich wrote:

Support for VIRTIO_NET_F_RSS and VIRTIO_NET_F_HASH_REPORT
features in QEMU for reference purpose.
Implements Toeplitz hash calculation for incoming
packets according to configuration provided by driver.
Uses calculated hash for decision on receive virtqueue
and/or reports the hash in the virtio header

Changes from v7:
Patch 7.1: removed (RSS and hash report definitions)
Patch 7.2: delete configuration struct with RSS definitions
Patch 7.4: delete duplicated packet structure
Added patch 7 - adapt RSC definitions to updated header

Yuri Benditovich (7):
   virtio-net: implement RSS configuration command
   virtio-net: implement RX RSS processing
   tap: allow extended virtio header with hash info
   virtio-net: reference implementation of hash report
   vmstate.h: provide VMSTATE_VARRAY_UINT16_ALLOC macro
   virtio-net: add migration support for RSS and hash report
   virtio-net: align RSC fields with updated virtio-net header

  hw/net/trace-events|   3 +
  hw/net/virtio-net.c| 387 +
  include/hw/virtio/virtio-net.h |  16 ++
  include/migration/vmstate.h|  10 +
  net/tap.c  |   3 +-
  5 files changed, 379 insertions(+), 40 deletions(-)



Applied.

Thanks





Re: [PATCH 0/6] target/ppc: Various clean-up and fixes for radix64

2020-05-11 Thread David Gibson
On Mon, May 11, 2020 at 06:55:21PM +0200, Greg Kurz wrote:
> On Mon, 11 May 2020 11:44:26 +1000
> David Gibson  wrote:
> 
> > On Thu, May 07, 2020 at 07:26:32PM +0200, Greg Kurz wrote:
> > > First three patches of this series are simple cleanups. The other
> > > ones fix some regressions introduced by Cedric's recent addition
> > > of partition-scoped translation.
> > 
> > 1-5/6 applied to ppc-for-5.1.  I have some comments on 6/6.
> > 
> 
> As said in another mail, since patch 3 breaks build with gcc-9.3.1, I
> intend to send a v2 for the whole series later this week. I suggest
> you simply drop the patches you've applied for now.

Ok, done.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v2] e1000e: Added ICR clearing by corresponding IMS bit.

2020-05-11 Thread Jason Wang



On 2020/5/11 6:08 PM, Andrew Melnichenko wrote:

Yo,

So I think we should implement the 82574l behavior?

Well, as I understand it, it's already implemented. I've added ICR
clearing if ICR & IMS is set (I also need to add an ICR_ASSERTED check, my bad;
I'll prepare a new patch).



Yes, but it behaves more like, e.g., the 82573, not like the 82574l that
we claim to emulate.





At first, I had hacks to clear 'msi_causes_pending' in
'e1000e_core_set_link_status()' before link down. It works, but it's
not a solution.
Also, the bug doesn't reproduce on Windows. I've traced Windows and
Linux: the difference is that the Windows driver clears pending
interrupts by writing to ICR, whereas Linux tries to clear them by reading it.
I had another possible fix for the Linux driver (writing to ICR in the
interrupt routine).
I've asked the Intel guys whether the Linux driver works with a real
device (I don't have one). They said that it works and suggested checking
the 8257x spec. I'll forward the message to you.



Ok.

Thanks




On Sat, May 9, 2020 at 9:02 AM Jason Wang wrote:



On 2020/5/9 2:13 AM, Andrew Melnichenko wrote:
> Yo, I've used the OpenSDM_8257x-18.pdf specification.
> This document was recommended by the Intel guys (they also referred to
> that note).
> I've made a quick fix and it works. Before that I had a fix for the
> Linux e1000e driver.
> Overall, the issue was in pending interrupts that can't be cleared by
> reading ICR in Linux (the Windows driver clears them by writing to ICR).
>
> You can download spec for example from:
>

http://iweb.dl.sourceforge.net/project/e1000/8257x%20Developer%20Manual/Revision%201.8/OpenSDM_8257x-18.pdf


Interesting, this spec doesn't include 82574l which is what e1000e
claims to emulate:

 c->vendor_id = PCI_VENDOR_ID_INTEL;
 c->device_id = E1000_DEV_ID_82574L;

Looking at the 82574l spec (using the link mentioned in e1000e_core.c), it
says (7.4.3):

In MSI-X mode the bits in this register can be configured to auto-clear
when the MSI-X interrupt message is sent, in order to minimize driver
overhead, and when using MSI-X interrupt signaling.
In systems that do not support MSI-X, reading the ICR register clears
it's bits or writing 1b's clears the corresponding bits in this register.

So the auto clear is under the control of EIAC (MSI-X) or happens
unconditionally (non MSI-X).

But what has been implemented in e1000e_mac_icr_read() is something
similar to the behavior of a non-82574l card.

So I think we should implement the 82574l behavior?

Thanks


>
> On Fri, May 8, 2020 at 5:21 AM Jason Wang <jasow...@redhat.com> wrote:
>
>
>     On 2020/5/7 5:26 AM, and...@daynix.com wrote:
>     > From: Andrew Melnychenko <and...@daynix.com>
>     >
>     > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1707441
>     > Added ICR clearing if there is IMS bit - according to the
note by
>     > section 13.3.27 of the 8257X developers manual.
>     >
>     > Signed-off-by: Andrew Melnychenko mailto:and...@daynix.com>
>     >>
>     > ---
>     >   hw/net/e1000e_core.c | 9 +
>     >   hw/net/trace-events  | 1 +
>     >   2 files changed, 10 insertions(+)
>     >
>     > diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
>     > index d5676871fa..302e99ff46 100644
>     > --- a/hw/net/e1000e_core.c
>     > +++ b/hw/net/e1000e_core.c
>     > @@ -2624,6 +2624,15 @@ e1000e_mac_icr_read(E1000ECore
*core, int
>     index)
>     >           e1000e_clear_ims_bits(core, core->mac[IAM]);
>     >       }
>     >
>     > +    /*
>     > +     * PCIe* GbE Controllers Open Source Software Developer's
>     Manual
>     > +     * 13.3.27 Interrupt Cause Read Register
>     > +     */
>
>
>     Hi Andrew:
>
>     Which version of the manual did you use? I try to use the one
>     mentioned
>     in e1000e.c which is
>

http://www.intel.com/content/dam/doc/datasheet/82574l-gbe-controller-datasheet.pdf.
>
>     But I couldn't find chapter 13.3.27.
>
>     Thanks
>
>
>     > +    if (core->mac[ICR] & core->mac[IMS]) {
>     > + trace_e1000e_irq_icr_clear_icr_bit_ims(core->mac[ICR],
>     core->mac[IMS]);
>     > +        core->mac[ICR] = 0;
>     > +    }
>     > +
>     >  trace_e1000e_irq_icr_read_exit(core->mac[ICR]);
>     >       e1000e_update_interrupt_state(core);
>     >       return ret;
>     > diff --git a/hw/net/trace-events b/hw/net/trace-events
>  

Re: [PATCH v2 5/5] vhost: add device started check in migration set log

2020-05-11 Thread Jason Wang



On 2020/5/11 5:25 PM, Dima Stepanov wrote:

On Mon, May 11, 2020 at 11:15:53AM +0800, Jason Wang wrote:

On 2020/4/30 9:36 PM, Dima Stepanov wrote:

If vhost-user daemon is used as a backend for the vhost device, then we
should consider a possibility of disconnect at any moment. If such
disconnect happened in the vhost_migration_log() routine the vhost
device structure will be clean up.
At the start of the vhost_migration_log() function there is a check:
   if (!dev->started) {
   dev->log_enabled = enable;
   return 0;
   }
To be consistent with this check add the same check after calling the
vhost_dev_set_log() routine. This in general help not to break a
migration due the assert() message. But it looks like that this code
should be revised to handle these errors more carefully.

In case of vhost-user device backend the fail paths should consider the
state of the device. In this case we should skip some function calls
during rollback on the error paths, so not to get the NULL dereference
errors.

Signed-off-by: Dima Stepanov 
---
  hw/virtio/vhost.c | 39 +++
  1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 3ee50c4..d5ab96d 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
  {
  int r, i, idx;
+
+if (!dev->started) {
+/*
+ * If vhost-user daemon is used as a backend for the
+ * device and the connection is broken, then the vhost_dev
+ * structure will be reset all its values to 0.
+ * Add additional check for the device state.
+ */
+return -1;
+}
+
  r = vhost_dev_set_features(dev, enable_log);
  if (r < 0) {
  goto err_features;
@@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool 
enable_log)
  }
  return 0;
  err_vq:
-for (; i >= 0; --i) {
+/*
+ * Disconnect with the vhost-user daemon can lead to the
+ * vhost_dev_cleanup() call which will clean up vhost_dev
+ * structure.
+ */
+for (; dev->started && (i >= 0); --i) {
  idx = dev->vhost_ops->vhost_get_vq_index(


Why need the check of dev->started here, can started be modified outside
mainloop? If yes, I don't get the check of !dev->started in the beginning of
this function.


No, dev->started can't change outside the mainloop. The main problem is
only for the vhost_user_blk daemon. Consider the case when we
successfully pass the dev->started check at the beginning of the
function, but after it we hit the disconnect on the next call on the
second or third iteration:
  r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx, enable_log);
The unix socket backend device will call the disconnect routine for this
device and reset the structure. So the structure will be reset (and
dev->started set to false) inside this set_addr() call.



I still don't get it. I think the disconnect cannot happen in the
middle of vhost_dev_set_log(), since both of them run in the
mainloop. And even if it can, we probably need a synchronization
mechanism other than a simple check here.




  So
we shouldn't make the clean up calls, because these virtqueues were already
cleaned up in the disconnect call. But we should protect these calls somehow,
so that we do not hit SIGSEGV and are able to pass migration.

Just to summarize it:
For the vhost-user-blk devices we can hit clean up calls twice in case of
vhost disconnect:
1. The first time during the disconnect process. The clean up is called
inside it.
2. The second time during roll back clean up.
So if it is the case we should skip step 2.
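
The double clean-up scenario summarized above can be sketched with a toy model. Everything here is a hypothetical stand-in (`DevModel`, `model_set_addr()`, `model_set_log()`); it only illustrates the proposed `dev->started` guard on the rollback loop, under the assumption that a disconnect can reset the device in the middle of the call, which is exactly the point disputed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NVQS 3

/* Hypothetical, heavily simplified stand-in for struct vhost_dev. */
typedef struct DevModel {
    bool started;
    int fail_at;              /* queue index whose set_addr call fails */
    bool disconnect_on_fail;  /* failure is caused by a backend disconnect */
    int rollback_calls;       /* rollback iterations actually performed */
} DevModel;

/* Stand-in for the disconnect handler: it resets the device state. */
static void model_disconnect(DevModel *dev)
{
    dev->started = false;
}

/* Stand-in for a per-queue call that may fail part-way through the loop. */
int model_set_addr(DevModel *dev, int i)
{
    if (i == dev->fail_at) {
        if (dev->disconnect_on_fail) {
            model_disconnect(dev);  /* first clean-up happens here */
        }
        return -1;
    }
    return 0;
}

int model_set_log(DevModel *dev)
{
    int i, r = 0;

    assert(dev != NULL);
    if (!dev->started) {
        return -1;
    }
    for (i = 0; i < MODEL_NVQS; i++) {
        r = model_set_addr(dev, i);
        if (r < 0) {
            goto err_vq;
        }
    }
    return 0;

err_vq:
    /* Roll back only while the device was not already cleaned up. */
    for (; dev->started && i >= 0; --i) {
        dev->rollback_calls++;
    }
    return r;
}
```

When the failure comes from a disconnect, the guard skips the rollback entirely; for any other failure the rollback runs as before, from the failing queue index down to 0.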


dev, dev->vq_index + i);
  vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
   dev->log_enabled);
  }
-vhost_dev_set_features(dev, dev->log_enabled);
+if (dev->started) {
+vhost_dev_set_features(dev, dev->log_enabled);
+}
  err_features:
  return r;
  }
@@ -832,7 +850,15 @@ static int vhost_migration_log(MemoryListener *listener, 
int enable)
  } else {
  vhost_dev_log_resize(dev, vhost_get_log_size(dev));
  r = vhost_dev_set_log(dev, true);
-if (r < 0) {
+/*
+ * The dev log resize can fail, because of disconnect
+ * with the vhost-user-blk daemon. Check the device
+ * state before calling the vhost_dev_set_log()
+ * function.
+ * Don't return error if device isn't started to be
+ * consistent with the check above.
+ */
+if (dev->started && r < 0) {
  return r;
  }
  }
@@ -1739,7 +1765,12 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice 
*vdev)
  fail_log:
  vhost_log_put(hdev, false);
  fail_vq:
-while (--i >= 0) {
+/*
+ * Disconnect with the vhost-user daemon can lead 

Re: [PATCH v2 4/5] vhost: check vring address before calling unmap

2020-05-11 Thread Jason Wang



On 2020/5/11 5:11 PM, Dima Stepanov wrote:

On Mon, May 11, 2020 at 11:05:58AM +0800, Jason Wang wrote:

On 2020/4/30 9:36 PM, Dima Stepanov wrote:

Since disconnect can happen at any time during initialization, not all
vring buffers (for instance the used vring) may be initialized successfully.
If the buffer was not initialized then vhost_memory_unmap call will lead
to SIGSEGV. Add checks for the vring address value before calling unmap.
Also add assert() in the vhost_memory_unmap() routine.

Signed-off-by: Dima Stepanov 
---
  hw/virtio/vhost.c | 27 +--
  1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ddbdc53..3ee50c4 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -314,6 +314,8 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void 
*buffer,
 hwaddr len, int is_write,
 hwaddr access_len)
  {
+assert(buffer);
+
  if (!vhost_dev_has_iommu(dev)) {
  cpu_physical_memory_unmap(buffer, len, is_write, access_len);
  }
@@ -1132,12 +1134,25 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
  vhost_vq_index);
  }
-vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
-   1, virtio_queue_get_used_size(vdev, idx));
-vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
-   0, virtio_queue_get_avail_size(vdev, idx));
-vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
-   0, virtio_queue_get_desc_size(vdev, idx));
+/*
+ * Since the vhost-user disconnect can happen during initialization
+ * check if vring was initialized, before making unmap.
+ */
+if (vq->used) {
+vhost_memory_unmap(dev, vq->used,
+   virtio_queue_get_used_size(vdev, idx),
+   1, virtio_queue_get_used_size(vdev, idx));
+}
+if (vq->avail) {
+vhost_memory_unmap(dev, vq->avail,
+   virtio_queue_get_avail_size(vdev, idx),
+   0, virtio_queue_get_avail_size(vdev, idx));
+}
+if (vq->desc) {
+vhost_memory_unmap(dev, vq->desc,
+   virtio_queue_get_desc_size(vdev, idx),
+   0, virtio_queue_get_desc_size(vdev, idx));
+}


Any reason for not checking hdev->started instead? vhost_dev_start() will set it
to true if virtqueues were correctly mapped.

Thanks

Well, I see it a little bit differently:
  - vhost_dev_start() sets hdev->started to true before starting
    virtqueues
  - vhost_virtqueue_start() maps all the memory
If we hit the vhost disconnect at the start of
vhost_virtqueue_start(), for instance for this call:
   r = dev->vhost_ops->vhost_set_vring_base(dev, );
then we will call vhost_user_blk_disconnect:
   vhost_user_blk_disconnect()->
     vhost_user_blk_stop()->
       vhost_dev_stop()->
         vhost_virtqueue_stop()
As a result we will come into this routine with hdev->started still
set to true, but with the used/avail/desc fields still uninitialized and set
to 0.
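
The guard being discussed can be sketched on its own: unmap a ring only if its pointer was ever set. This is a simplified model, not the vhost code; `VqModel`, `model_unmap()` and `model_vq_stop()` are hypothetical names, and the real unmap call takes more arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the three mapped rings of one virtqueue. */
typedef struct VqModel {
    void *desc;
    void *avail;
    void *used;
} VqModel;

/* Stand-in for the unmap routine, which must never see a NULL buffer. */
void model_unmap(void *buffer)
{
    assert(buffer != NULL);
}

/*
 * Unmap only the rings that were mapped before a possible disconnect.
 * Returns how many rings were unmapped.
 */
int model_vq_stop(VqModel *vq)
{
    int unmapped = 0;

    assert(vq != NULL);
    if (vq->used) {
        model_unmap(vq->used);
        unmapped++;
    }
    if (vq->avail) {
        model_unmap(vq->avail);
        unmapped++;
    }
    if (vq->desc) {
        model_unmap(vq->desc);
        unmapped++;
    }
    return unmapped;
}
```

A queue that was interrupted before any mapping simply reports zero unmapped rings instead of tripping the assertion.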



I may be missing something, but consider that both vhost_dev_start() and
vhost_user_blk_disconnect() are serialized in the main loop. Can this
really happen?


Thanks







  }
  static void vhost_eventfd_add(MemoryListener *listener,





[PATCH 1/4] fuzz: add datadir for oss-fuzz compatability

2020-05-11 Thread Alexander Bulekov
This allows us to keep pc-bios in executable_dir/pc-bios, rather than
executable_dir/../pc-bios, which is incompatible with oss-fuzz' file
structure.

Signed-off-by: Alexander Bulekov 
---
 include/sysemu/sysemu.h |  2 ++
 softmmu/vl.c|  2 +-
 tests/qtest/fuzz/fuzz.c | 15 +++
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index ef81302e1a..cc96b66fc9 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -15,6 +15,8 @@ extern const char *qemu_name;
 extern QemuUUID qemu_uuid;
 extern bool qemu_uuid_set;
 
+void qemu_add_data_dir(const char *path);
+
 void qemu_add_exit_notifier(Notifier *notify);
 void qemu_remove_exit_notifier(Notifier *notify);
 
diff --git a/softmmu/vl.c b/softmmu/vl.c
index afd2615fb3..c71485a965 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -1993,7 +1993,7 @@ char *qemu_find_file(int type, const char *name)
 return NULL;
 }
 
-static void qemu_add_data_dir(const char *path)
+void qemu_add_data_dir(const char *path)
 {
 int i;
 
diff --git a/tests/qtest/fuzz/fuzz.c b/tests/qtest/fuzz/fuzz.c
index f5c923852e..33365c3782 100644
--- a/tests/qtest/fuzz/fuzz.c
+++ b/tests/qtest/fuzz/fuzz.c
@@ -137,6 +137,7 @@ int LLVMFuzzerInitialize(int *argc, char ***argv, char 
***envp)
 {
 
 char *target_name;
+char *dir;
 
 /* Initialize qgraph and modules */
 qos_graph_init();
@@ -147,6 +148,20 @@ int LLVMFuzzerInitialize(int *argc, char ***argv, char 
***envp)
 target_name = strstr(**argv, "-target-");
 if (target_name) {/* The binary name specifies the target */
 target_name += strlen("-target-");
+/*
+ * With oss-fuzz, the executable is kept in the root of a directory (we
+ * cannot assume the path). All data (including bios binaries) must be
+ * in the same dir, or a subdir. Thus, we cannot place the pc-bios so
+ * that it would be in exec_dir/../pc-bios.
+ * As a workaround, oss-fuzz allows us to use argv[0] to get the
+ * location of the executable. Using this we add exec_dir/pc-bios to
+ * the datadirs.
+ */
+dir = g_build_filename(g_path_get_dirname(**argv), "pc-bios", NULL);
+if (g_file_test(dir, G_FILE_TEST_IS_DIR)) {
+qemu_add_data_dir(dir);
+}
+g_free(dir);
 } else if (*argc > 1) {  /* The target is specified as an argument */
 target_name = (*argv)[1];
 if (!strstr(target_name, "--fuzz-target=")) {
-- 
2.26.2
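
The path computation in the fuzz.c hunk above (take the directory of argv[0] and append "pc-bios") can be illustrated without GLib. The patch itself uses g_path_get_dirname() and g_build_filename(); the sketch below swaps in POSIX dirname() and snprintf() instead, and `pc_bios_dir_for()` is a hypothetical helper name, not part of the patch.

```c
#include <assert.h>
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "<directory of argv0>/pc-bios" in a newly allocated string.
 * The caller frees the result. Returns NULL if allocation fails.
 */
char *pc_bios_dir_for(const char *argv0)
{
    static const char suffix[] = "/pc-bios";
    size_t n, len;
    char *copy, *dir, *result;

    assert(argv0 != NULL);

    /* POSIX dirname() may modify its argument, so work on a copy. */
    n = strlen(argv0) + 1;
    copy = malloc(n);
    if (!copy) {
        return NULL;
    }
    memcpy(copy, argv0, n);
    dir = dirname(copy);

    /* sizeof(suffix) already accounts for the terminating NUL. */
    len = strlen(dir) + sizeof(suffix);
    result = malloc(len);
    if (result) {
        snprintf(result, len, "%s%s", dir, suffix);
    }
    free(copy);
    return result;
}
```

A caller would check that the returned path is a directory before registering it as a data directory, and free both the intermediate directory string and the result; the intermediate copy is released inside the helper here.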




[PATCH 0/4] fuzz: misc changes for oss-fuzz compatability

2020-05-11 Thread Alexander Bulekov
Hello,
With these patches, the fuzzer passes the oss-fuzz build checks.
There are also some miscellaneous improvements to the fuzzer in general:
 * If building for oss-fuzz, check executable_dir/pc-bios for
   the bios images
 * Fix a typo in the i440fx-qtest-reboot argument which resulted in an
   invalid argument to qemu_main
 * Add an alternate name to resolve libfuzzer's internal fuzzer::TPC
   object at link-time
 * For all fork-based fuzzers, run the main-loop in the parent, to
   prevent the clock from running far-ahead of the previous main-loop.
-Alex

Alexander Bulekov (4):
  fuzz: add datadir for oss-fuzz compatability
  fuzz: fix typo in i440fx-qtest-reboot arguments
  fuzz: add mangled object name to linker script
  fuzz: run the main-loop in fork-server process

 include/sysemu/sysemu.h |  2 ++
 softmmu/vl.c|  2 +-
 tests/qtest/fuzz/fork_fuzz.ld   |  5 +
 tests/qtest/fuzz/fuzz.c | 15 +++
 tests/qtest/fuzz/i440fx_fuzz.c  |  3 ++-
 tests/qtest/fuzz/virtio_net_fuzz.c  |  2 ++
 tests/qtest/fuzz/virtio_scsi_fuzz.c |  2 ++
 7 files changed, 29 insertions(+), 2 deletions(-)

-- 
2.26.2




RE: [EXT] [PATCH v2 2/5] virtio-iommu: Implement RESV_MEM probe request

2020-05-11 Thread Bharat Bhushan
Hi Eric,

> -Original Message-
> From: Auger Eric 
> Sent: Tuesday, May 12, 2020 8:39 AM
> To: Bharat Bhushan ; eric.auger@gmail.com;
> qemu-devel@nongnu.org; qemu-...@nongnu.org; peter.mayd...@linaro.org;
> m...@redhat.com; jean-phili...@linaro.org; pet...@redhat.com;
> arm...@redhat.com; pbonz...@redhat.com
> Subject: Re: [EXT] [PATCH v2 2/5] virtio-iommu: Implement RESV_MEM probe
> request
> 
> Hi Bharat,
> On 5/12/20 5:03 AM, Bharat Bhushan wrote:
> > Hi Eric,
> >
> >> -Original Message-
> >> From: Auger Eric 
> >> Sent: Monday, May 11, 2020 2:19 PM
> >> To: Bharat Bhushan ; eric.auger@gmail.com;
> >> qemu-devel@nongnu.org; qemu-...@nongnu.org; peter.mayd...@linaro.org;
> >> m...@redhat.com; jean-phili...@linaro.org; pet...@redhat.com;
> >> arm...@redhat.com; pbonz...@redhat.com
> >> Subject: Re: [EXT] [PATCH v2 2/5] virtio-iommu: Implement RESV_MEM
> >> probe request
> >>
> >> Hi Bharat,
> >>
> >> On 5/11/20 10:42 AM, Bharat Bhushan wrote:
> >>> Hi Eric,
> >>>
>  -Original Message-
>  From: Auger Eric 
>  Sent: Monday, May 11, 2020 12:26 PM
>  To: Bharat Bhushan ;
>  eric.auger@gmail.com; qemu-devel@nongnu.org;
>  qemu-...@nongnu.org; peter.mayd...@linaro.org; m...@redhat.com;
>  jean-phili...@linaro.org; pet...@redhat.com; arm...@redhat.com;
>  pbonz...@redhat.com
>  Subject: Re: [EXT] [PATCH v2 2/5] virtio-iommu: Implement RESV_MEM
>  probe request
> 
>  Hi Bharat,
>  On 5/11/20 8:38 AM, Bharat Bhushan wrote:
> > Hi Eric,
> >
> >> -Original Message-
> >> From: Eric Auger 
> >> Sent: Friday, May 8, 2020 11:01 PM
> >> To: eric.auger@gmail.com; eric.au...@redhat.com;
> >> qemu-devel@nongnu.org; qemu-...@nongnu.org;
> >> peter.mayd...@linaro.org; m...@redhat.com; jean-
> >> phili...@linaro.org; Bharat Bhushan ;
> >> pet...@redhat.com; arm...@redhat.com; pbonz...@redhat.com
> >> Subject: [EXT] [PATCH v2 2/5] virtio-iommu: Implement RESV_MEM
> >> probe request
> >>
> >> External Email
> >>
> >> -
> >> --
> >> --
> >> - This patch implements the PROBE request. At the moment, only
> >> THE RESV_MEM property is handled. The first goal is to report
> >> iommu wide reserved regions such as the MSI regions set by the
> >> machine code. On
> >> x86 this will be the IOAPIC MSI region,
> >> [0xFEE00000 - 0xFEEFFFFF], on ARM this may be the ITS doorbell.
> >>
> >> In the future we may introduce per device reserved regions.
> >> This will be useful when protecting host assigned devices which
> >> may expose their own reserved regions
> >>
> >> Signed-off-by: Eric Auger 
> >>
> >> ---
> >>
> >> v1 -> v2:
> >> - move the unlock back to the same place
> >> - remove the push label and factorize the code after the out
> >> label
> >> - fix a bunch of cpu_to_leX according to the latest spec revision
> >> - do not remove sizeof(last) from free space
> >> - check the ep exists
> >> ---
> >>  include/hw/virtio/virtio-iommu.h |  2 +
> >>  hw/virtio/virtio-iommu.c | 94 ++--
> >>  hw/virtio/trace-events   |  1 +
> >>  3 files changed, 93 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/include/hw/virtio/virtio-iommu.h
> >> b/include/hw/virtio/virtio-iommu.h
> >> index e653004d7c..49eb105cd8 100644
> >> --- a/include/hw/virtio/virtio-iommu.h
> >> +++ b/include/hw/virtio/virtio-iommu.h
> >> @@ -53,6 +53,8 @@ typedef struct VirtIOIOMMU {
> >>  GHashTable *as_by_busptr;
> >>  IOMMUPciBus *iommu_pcibus_by_bus_num[PCI_BUS_MAX];
> >>  PCIBus *primary_bus;
> >> +ReservedRegion *reserved_regions;
> >> +uint32_t nb_reserved_regions;
> >>  GTree *domains;
> >>  QemuMutex mutex;
> >>  GTree *endpoints;
> >> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> >> index
> >> 22ba8848c2..35d772e021 100644
> >> --- a/hw/virtio/virtio-iommu.c
> >> +++ b/hw/virtio/virtio-iommu.c
> >> @@ -38,6 +38,7 @@
> >>
> >>  /* Max size */
> >>  #define VIOMMU_DEFAULT_QUEUE_SIZE 256
> >> +#define VIOMMU_PROBE_SIZE 512
> >>
> >>  typedef struct VirtIOIOMMUDomain {
> >>  uint32_t id;
> >> @@ -378,6 +379,65 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
> >>  return ret;
> >>  }
> >>
> >> +static ssize_t virtio_iommu_fill_resv_mem_prop(VirtIOIOMMU *s,
> >> +uint32_t
> >> ep,
> >> +   uint8_t *buf,
> >> +size_t
> >> +free) {
> >> +struct virtio_iommu_probe_resv_mem prop = {};
> >> +size_t size = sizeof(prop), length = size - sizeof(prop.head), 
> >> total;
> >> +int i;
> >> +
> >> +total = size 

[PATCH v27 10/10] MAINTAINERS: Add ACPI/HEST/GHES entries

2020-05-11 Thread Dongjiu Geng
I and Xiang are willing to review the APEI-related patches and
volunteer as the reviewers for the HEST/GHES part.

Signed-off-by: Dongjiu Geng 
Signed-off-by: Xiang Zheng 
Reviewed-by: Philippe Mathieu-Daudé 
Acked-by: Michael S. Tsirkin 
---
 MAINTAINERS | 9 +
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1f84e3a..9619b90 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1516,6 +1516,15 @@ F: tests/qtest/bios-tables-test.c
 F: tests/qtest/acpi-utils.[hc]
 F: tests/data/acpi/
 
+ACPI/HEST/GHES
+R: Dongjiu Geng 
+R: Xiang Zheng 
+L: qemu-...@nongnu.org
+S: Maintained
+F: hw/acpi/ghes.c
+F: include/hw/acpi/ghes.h
+F: docs/specs/acpi_hest_ghes.rst
+
 ppc4xx
 M: David Gibson 
 L: qemu-...@nongnu.org
-- 
1.8.3.1




RE: [EXT] [PATCH v2 2/5] virtio-iommu: Implement RESV_MEM probe request

2020-05-11 Thread Bharat Bhushan
Hi Eric,

> -Original Message-
> From: Auger Eric 
> Sent: Monday, May 11, 2020 2:19 PM
> To: Bharat Bhushan ; eric.auger@gmail.com;
> qemu-devel@nongnu.org; qemu-...@nongnu.org; peter.mayd...@linaro.org;
> m...@redhat.com; jean-phili...@linaro.org; pet...@redhat.com;
> arm...@redhat.com; pbonz...@redhat.com
> Subject: Re: [EXT] [PATCH v2 2/5] virtio-iommu: Implement RESV_MEM probe
> request
> 
> Hi Bharat,
> 
> On 5/11/20 10:42 AM, Bharat Bhushan wrote:
> > Hi Eric,
> >
> >> -Original Message-
> >> From: Auger Eric 
> >> Sent: Monday, May 11, 2020 12:26 PM
> >> To: Bharat Bhushan ; eric.auger@gmail.com;
> >> qemu-devel@nongnu.org; qemu-...@nongnu.org; peter.mayd...@linaro.org;
> >> m...@redhat.com; jean-phili...@linaro.org; pet...@redhat.com;
> >> arm...@redhat.com; pbonz...@redhat.com
> >> Subject: Re: [EXT] [PATCH v2 2/5] virtio-iommu: Implement RESV_MEM
> >> probe request
> >>
> >> Hi Bharat,
> >> On 5/11/20 8:38 AM, Bharat Bhushan wrote:
> >>> Hi Eric,
> >>>
>  -Original Message-
>  From: Eric Auger 
>  Sent: Friday, May 8, 2020 11:01 PM
>  To: eric.auger@gmail.com; eric.au...@redhat.com;
>  qemu-devel@nongnu.org; qemu-...@nongnu.org;
>  peter.mayd...@linaro.org; m...@redhat.com; jean-
>  phili...@linaro.org; Bharat Bhushan ;
>  pet...@redhat.com; arm...@redhat.com; pbonz...@redhat.com
>  Subject: [EXT] [PATCH v2 2/5] virtio-iommu: Implement RESV_MEM
>  probe request
> 
>  External Email
> 
>  ---
>  --
>  - This patch implements the PROBE request. At the moment, only THE
>  RESV_MEM property is handled. The first goal is to report iommu
>  wide reserved regions such as the MSI regions set by the machine
>  code. On
>  x86 this will be the IOAPIC MSI region,
>  [0xFEE00000 - 0xFEEFFFFF], on ARM this may be the ITS doorbell.
> 
>  In the future we may introduce per device reserved regions.
>  This will be useful when protecting host assigned devices which may
>  expose their own reserved regions
> 
>  Signed-off-by: Eric Auger 
> 
>  ---
> 
>  v1 -> v2:
>  - move the unlock back to the same place
>  - remove the push label and factorize the code after the out label
>  - fix a bunch of cpu_to_leX according to the latest spec revision
>  - do not remove sizeof(last) from free space
>  - check the ep exists
>  ---
>   include/hw/virtio/virtio-iommu.h |  2 +
>   hw/virtio/virtio-iommu.c | 94 ++--
>   hw/virtio/trace-events   |  1 +
>   3 files changed, 93 insertions(+), 4 deletions(-)
> 
>  diff --git a/include/hw/virtio/virtio-iommu.h
>  b/include/hw/virtio/virtio-iommu.h
>  index e653004d7c..49eb105cd8 100644
>  --- a/include/hw/virtio/virtio-iommu.h
>  +++ b/include/hw/virtio/virtio-iommu.h
>  @@ -53,6 +53,8 @@ typedef struct VirtIOIOMMU {
>   GHashTable *as_by_busptr;
>   IOMMUPciBus *iommu_pcibus_by_bus_num[PCI_BUS_MAX];
>   PCIBus *primary_bus;
>  +ReservedRegion *reserved_regions;
>  +uint32_t nb_reserved_regions;
>   GTree *domains;
>   QemuMutex mutex;
>   GTree *endpoints;
>  diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>  index
>  22ba8848c2..35d772e021 100644
>  --- a/hw/virtio/virtio-iommu.c
>  +++ b/hw/virtio/virtio-iommu.c
>  @@ -38,6 +38,7 @@
> 
>   /* Max size */
>   #define VIOMMU_DEFAULT_QUEUE_SIZE 256
>  +#define VIOMMU_PROBE_SIZE 512
> 
>   typedef struct VirtIOIOMMUDomain {
>   uint32_t id;
>  @@ -378,6 +379,65 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
>   return ret;
>   }
> 
>  +static ssize_t virtio_iommu_fill_resv_mem_prop(VirtIOIOMMU *s, uint32_t
> ep,
>  +   uint8_t *buf,
>  +size_t
>  +free) {
>  +struct virtio_iommu_probe_resv_mem prop = {};
>  +size_t size = sizeof(prop), length = size - sizeof(prop.head), 
>  total;
>  +int i;
>  +
>  +total = size * s->nb_reserved_regions;
>  +
>  +if (total > free) {
>  +return -ENOSPC;
>  +}
>  +
>  +for (i = 0; i < s->nb_reserved_regions; i++) {
>  +prop.head.type = cpu_to_le16(VIRTIO_IOMMU_PROBE_T_RESV_MEM);
>  +prop.head.length = cpu_to_le16(length);
>  +prop.subtype = s->reserved_regions[i].type;
>  +prop.start = cpu_to_le64(s->reserved_regions[i].low);
>  +prop.end = cpu_to_le64(s->reserved_regions[i].high);
>  +
>  +memcpy(buf, &prop, size);
>  +
>  +trace_virtio_iommu_fill_resv_property(ep, prop.subtype,
>  +  prop.start, prop.end);
>  +buf += size;
> 

Re: [PATCH v26 01/10] acpi: nvdimm: change NVDIMM_UUID_LE to a common macro

2020-05-11 Thread gengdongjiu
On 2020/5/12 3:41, Igor Mammedov wrote:
> For future reference, adding RESEND doesn't make sense here. If you change patches then 
> just bump the version.

Igor,
Thanks for the reminder. I have just submitted a new patchset version to 
avoid this confusion.




[PATCH v27 05/10] ACPI: Build Hardware Error Source Table

2020-05-11 Thread Dongjiu Geng
This patch builds the Hardware Error Source Table (HEST) via fw_cfg blobs.
For now it only supports ARMv8 SEA, a Generic Hardware Error Source
version 2 (GHESv2) error source type. The supported types can be
extended later if needed. The CPER section is currently the memory
section, because the kernel mainly wants userspace to handle memory
errors.

This patch follows the ACPI 6.2 specification to build the Hardware Error
Source Table. For more detailed information, please refer to
docs/specs/acpi_hest_ghes.rst.

The build_ghes_hw_error_notification() helper adds the Hardware Error
Notification structure to ACPI tables without using packed C structures.
This avoids endianness issues, because the API needs no explicit conversion.
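
As an illustration of why appending fields byte by byte sidesteps both
packing and endianness concerns, here is a minimal standalone sketch. It
does not use QEMU's GArray or build_append_int_noprefix(); ByteTable and
table_append_le() are hypothetical stand-ins, and only the field widths
are copied from the helper in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-capacity byte table standing in for QEMU's GArray. */
typedef struct {
    uint8_t data[64];
    size_t len;
} ByteTable;

/*
 * Append the low `size` bytes of `value`, least significant byte first.
 * Shifting and masking yields the same byte order on any host, so no
 * cpu_to_le*() style conversion and no packed struct is needed.
 */
static void table_append_le(ByteTable *t, uint64_t value, size_t size)
{
    assert(size <= 8 && t->len + size <= sizeof(t->data));
    for (size_t i = 0; i < size; i++) {
        t->data[t->len++] = (uint8_t)(value & 0xff);
        value >>= 8;
    }
}

/* Same field order and widths as build_ghes_hw_error_notification(). */
static void append_hw_error_notification(ByteTable *t, uint8_t type)
{
    table_append_le(t, type, 1); /* Type */
    table_append_le(t, 28, 1);   /* Length */
    table_append_le(t, 0, 2);    /* Configuration Write Enable */
    table_append_le(t, 0, 4);    /* Poll Interval */
    table_append_le(t, 0, 4);    /* Vector */
    table_append_le(t, 0, 4);    /* Switch To Polling Threshold Value */
    table_append_le(t, 0, 4);    /* Switch To Polling Threshold Window */
    table_append_le(t, 0, 4);    /* Error Threshold Value */
    table_append_le(t, 0, 4);    /* Error Threshold Window */
}
```

The field widths add up to 1 + 1 + 2 + 6 * 4 = 28 bytes, which matches
the Length value written into the structure.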

Signed-off-by: Xiang Zheng 
Signed-off-by: Dongjiu Geng 
Reviewed-by: Igor Mammedov 
Reviewed-by: Michael S. Tsirkin 
---
 hw/acpi/ghes.c   | 126 +++
 hw/arm/virt-acpi-build.c |   2 +
 include/hw/acpi/ghes.h   |  39 +++
 3 files changed, 167 insertions(+)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index e1b3f8f..091fd87 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -23,6 +23,7 @@
 #include "qemu/units.h"
 #include "hw/acpi/ghes.h"
 #include "hw/acpi/aml-build.h"
+#include "qemu/error-report.h"
 
 #define ACPI_GHES_ERRORS_FW_CFG_FILE"etc/hardware_errors"
 #define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
@@ -33,6 +34,42 @@
 /* Now only support ARMv8 SEA notification type error source */
 #define ACPI_GHES_ERROR_SOURCE_COUNT1
 
+/* Generic Hardware Error Source version 2 */
+#define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
+
+/* Address offset in Generic Address Structure(GAS) */
+#define GAS_ADDR_OFFSET 4
+
+/*
+ * Hardware Error Notification
+ * ACPI 4.0: 17.3.2.7 Hardware Error Notification
+ * Composes dummy Hardware Error Notification descriptor of specified type
+ */
+static void build_ghes_hw_error_notification(GArray *table, const uint8_t type)
+{
+/* Type */
+build_append_int_noprefix(table, type, 1);
+/*
+ * Length:
+ * Total length of the structure in bytes
+ */
+build_append_int_noprefix(table, 28, 1);
+/* Configuration Write Enable */
+build_append_int_noprefix(table, 0, 2);
+/* Poll Interval */
+build_append_int_noprefix(table, 0, 4);
+/* Vector */
+build_append_int_noprefix(table, 0, 4);
+/* Switch To Polling Threshold Value */
+build_append_int_noprefix(table, 0, 4);
+/* Switch To Polling Threshold Window */
+build_append_int_noprefix(table, 0, 4);
+/* Error Threshold Value */
+build_append_int_noprefix(table, 0, 4);
+/* Error Threshold Window */
+build_append_int_noprefix(table, 0, 4);
+}
+
 /*
  * Build table for the hardware error fw_cfg blob.
  * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg 
blobs.
@@ -87,3 +124,92 @@ void build_ghes_error_table(GArray *hardware_errors, 
BIOSLinker *linker)
 bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE,
 0, sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
 }
+
+/* Build Generic Hardware Error Source version 2 (GHESv2) */
+static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker 
*linker)
+{
+uint64_t address_offset;
+/*
+ * Type:
+ * Generic Hardware Error Source version 2(GHESv2 - Type 10)
+ */
+build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 
2);
+/* Source Id */
+build_append_int_noprefix(table_data, source_id, 2);
+/* Related Source Id */
+build_append_int_noprefix(table_data, 0xffff, 2);
+/* Flags */
+build_append_int_noprefix(table_data, 0, 1);
+/* Enabled */
+build_append_int_noprefix(table_data, 1, 1);
+
+/* Number of Records To Pre-allocate */
+build_append_int_noprefix(table_data, 1, 4);
+/* Max Sections Per Record */
+build_append_int_noprefix(table_data, 1, 4);
+/* Max Raw Data Length */
+build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
+
+address_offset = table_data->len;
+/* Error Status Address */
+build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
+ 4 /* QWord access */, 0);
+bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
+address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
+ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t));
+
+switch (source_id) {
+case ACPI_HEST_SRC_ID_SEA:
+/*
+ * Notification Structure
+ * Now only enable ARMv8 SEA notification type
+ */
+build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA);
+break;
+default:
+error_report("Not support this error source");
+abort();
+}
+
+/* Error Status Block Length */
+build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
+
+/*
+ * Read Ack Register
+ * ACPI 

Re: [EXT] [PATCH v2 2/5] virtio-iommu: Implement RESV_MEM probe request

2020-05-11 Thread Auger Eric
Hi Bharat,
On 5/12/20 5:03 AM, Bharat Bhushan wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Auger Eric 
>> Sent: Monday, May 11, 2020 2:19 PM
>> To: Bharat Bhushan ; eric.auger@gmail.com;
>> qemu-devel@nongnu.org; qemu-...@nongnu.org; peter.mayd...@linaro.org;
>> m...@redhat.com; jean-phili...@linaro.org; pet...@redhat.com;
>> arm...@redhat.com; pbonz...@redhat.com
>> Subject: Re: [EXT] [PATCH v2 2/5] virtio-iommu: Implement RESV_MEM probe
>> request
>>
>> Hi Bharat,
>>
>> On 5/11/20 10:42 AM, Bharat Bhushan wrote:
>>> Hi Eric,
>>>
>>>> -----Original Message-----
>>>> From: Auger Eric
>>>> Sent: Monday, May 11, 2020 12:26 PM
>>>> To: Bharat Bhushan ; eric.auger@gmail.com;
>>>> qemu-devel@nongnu.org; qemu-...@nongnu.org; peter.mayd...@linaro.org;
>>>> m...@redhat.com; jean-phili...@linaro.org; pet...@redhat.com;
>>>> arm...@redhat.com; pbonz...@redhat.com
>>>> Subject: Re: [EXT] [PATCH v2 2/5] virtio-iommu: Implement RESV_MEM
>>>> probe request
>>>>
>>>> Hi Bharat,
>>>> On 5/11/20 8:38 AM, Bharat Bhushan wrote:
>>>>> Hi Eric,
>>>>>
>> -Original Message-
>> From: Eric Auger 
>> Sent: Friday, May 8, 2020 11:01 PM
>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>> qemu-devel@nongnu.org; qemu-...@nongnu.org;
>> peter.mayd...@linaro.org; m...@redhat.com; jean-
>> phili...@linaro.org; Bharat Bhushan ;
>> pet...@redhat.com; arm...@redhat.com; pbonz...@redhat.com
>> Subject: [EXT] [PATCH v2 2/5] virtio-iommu: Implement RESV_MEM
>> probe request
>>
>> This patch implements the PROBE request. At the moment, only the
>> RESV_MEM property is handled. The first goal is to report iommu
>> wide reserved regions such as the MSI regions set by the machine
>> code. On x86 this will be the IOAPIC MSI region,
>> [0xFEE00000 - 0xFEEFFFFF]; on ARM this may be the ITS doorbell.
>>
>> In the future we may introduce per device reserved regions.
>> This will be useful when protecting host assigned devices which may
>> expose their own reserved regions
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v1 -> v2:
>> - move the unlock back to the same place
>> - remove the push label and factorize the code after the out label
>> - fix a bunch of cpu_to_leX according to the latest spec revision
>> - do not remove sizeof(last) from free space
>> - check the ep exists
>> ---
>>  include/hw/virtio/virtio-iommu.h |  2 +
>>  hw/virtio/virtio-iommu.c | 94 ++--
>>  hw/virtio/trace-events   |  1 +
>>  3 files changed, 93 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/hw/virtio/virtio-iommu.h
>> b/include/hw/virtio/virtio-iommu.h
>> index e653004d7c..49eb105cd8 100644
>> --- a/include/hw/virtio/virtio-iommu.h
>> +++ b/include/hw/virtio/virtio-iommu.h
>> @@ -53,6 +53,8 @@ typedef struct VirtIOIOMMU {
>>  GHashTable *as_by_busptr;
>>  IOMMUPciBus *iommu_pcibus_by_bus_num[PCI_BUS_MAX];
>>  PCIBus *primary_bus;
>> +ReservedRegion *reserved_regions;
>> +uint32_t nb_reserved_regions;
>>  GTree *domains;
>>  QemuMutex mutex;
>>  GTree *endpoints;
>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>> index
>> 22ba8848c2..35d772e021 100644
>> --- a/hw/virtio/virtio-iommu.c
>> +++ b/hw/virtio/virtio-iommu.c
>> @@ -38,6 +38,7 @@
>>
>>  /* Max size */
>>  #define VIOMMU_DEFAULT_QUEUE_SIZE 256
>> +#define VIOMMU_PROBE_SIZE 512
>>
>>  typedef struct VirtIOIOMMUDomain {
>>  uint32_t id;
>> @@ -378,6 +379,65 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
>>  return ret;
>>  }
>>
>> +static ssize_t virtio_iommu_fill_resv_mem_prop(VirtIOIOMMU *s, uint32_t
>> ep,
>> +   uint8_t *buf,
>> +size_t
>> +free) {
>> +struct virtio_iommu_probe_resv_mem prop = {};
>> +size_t size = sizeof(prop), length = size - sizeof(prop.head), 
>> total;
>> +int i;
>> +
>> +total = size * s->nb_reserved_regions;
>> +
>> +if (total > free) {
>> +return -ENOSPC;
>> +}
>> +
>> +for (i = 0; i < s->nb_reserved_regions; i++) {
>> +prop.head.type = cpu_to_le16(VIRTIO_IOMMU_PROBE_T_RESV_MEM);
>> +prop.head.length = cpu_to_le16(length);
>> +prop.subtype = s->reserved_regions[i].type;
>> +prop.start = cpu_to_le64(s->reserved_regions[i].low);
>> +prop.end = cpu_to_le64(s->reserved_regions[i].high);
>> +
>> +memcpy(buf, &prop, size);
>> +
>> +trace_virtio_iommu_fill_resv_property(ep, prop.subtype,
>> +

[PATCH v27 08/10] ACPI: Record Generic Error Status Block(GESB) table

2020-05-11 Thread Dongjiu Geng
kvm_arch_on_sigbus_vcpu() error injection uses source_id as the
index into etc/hardware_errors to find the Error Status Data
Block entry corresponding to the error source. The supported source_id
values should therefore be assigned here and not changed afterwards, to
make sure that the guest will write the error into the expected Error
Status Data Block.

Before QEMU writes a new error to the ACPI table, it checks whether the
previous error has been acknowledged. If it has not, the new
error is ignored and not recorded. For the error section
type, QEMU reports it as a memory section error.
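
The acknowledge check described above can be modelled with a short
standalone sketch. ErrorSlot, record_size() and record_memory_error() are
hypothetical names, and the convention that the acknowledge value is
nonzero while the slot is free is an assumption of this sketch; only the
three sizes are taken from the defines added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sizes copied from the defines in this patch. */
enum {
    SK_GESB_HEADER_SIZE = 20,  /* ACPI_GHES_GESB_SIZE */
    SK_DATA_ENTRY_SIZE = 72,   /* ACPI_GHES_DATA_LENGTH */
    SK_MEM_CPER_SIZE = 80,     /* ACPI_GHES_MEM_CPER_LENGTH */
    SK_MAX_BLOCK_SIZE = 1024   /* one error block, 1 KiB */
};

/* Hypothetical model of one error source slot. */
typedef struct {
    uint64_t read_ack;        /* assumed nonzero once the guest has acked */
    uint64_t recorded_addr;   /* last physical address recorded */
    unsigned recorded_count;  /* number of errors actually recorded */
} ErrorSlot;

/* Bytes used by one status block holding a single memory error entry. */
static unsigned record_size(void)
{
    return SK_GESB_HEADER_SIZE + SK_DATA_ENTRY_SIZE + SK_MEM_CPER_SIZE;
}

/*
 * Record an error only if the previous one was acknowledged.
 * Returns true when recorded, false when the new error is dropped.
 */
static bool record_memory_error(ErrorSlot *slot, uint64_t paddr)
{
    assert(record_size() <= SK_MAX_BLOCK_SIZE);
    if (!slot->read_ack) {
        return false;           /* previous error still pending */
    }
    slot->read_ack = 0;         /* wait for the next acknowledge */
    slot->recorded_addr = paddr;
    slot->recorded_count++;
    return true;
}
```

With these sizes a single record needs 20 + 72 + 80 = 172 bytes, well
within the 1 KiB reserved for each error block.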

Signed-off-by: Dongjiu Geng 
Signed-off-by: Xiang Zheng 
Reviewed-by: Igor Mammedov 
Reviewed-by: Michael S. Tsirkin 
---
 hw/acpi/ghes.c | 219 +
 include/hw/acpi/ghes.h |   1 +
 2 files changed, 220 insertions(+)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index e74af23..b363bc3 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -26,6 +26,7 @@
 #include "qemu/error-report.h"
 #include "hw/acpi/generic_event_device.h"
 #include "hw/nvram/fw_cfg.h"
+#include "qemu/uuid.h"
 
 #define ACPI_GHES_ERRORS_FW_CFG_FILE"etc/hardware_errors"
 #define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
@@ -43,6 +44,36 @@
 #define GAS_ADDR_OFFSET 4
 
 /*
+ * The total size of Generic Error Data Entry
+ * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
+ * Table 18-343 Generic Error Data Entry
+ */
+#define ACPI_GHES_DATA_LENGTH   72
+
+/* The memory section CPER size, UEFI 2.6: N.2.5 Memory Error Section */
+#define ACPI_GHES_MEM_CPER_LENGTH   80
+
+/* Masks for block_status flags */
+#define ACPI_GEBS_UNCORRECTABLE 1
+
+/*
+ * Total size for Generic Error Status Block except Generic Error Data Entries
+ * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
+ * Table 18-380 Generic Error Status Block
+ */
+#define ACPI_GHES_GESB_SIZE 20
+
+/*
+ * Values for error_severity field
+ */
+enum AcpiGenericErrorSeverity {
+ACPI_CPER_SEV_RECOVERABLE = 0,
+ACPI_CPER_SEV_FATAL = 1,
+ACPI_CPER_SEV_CORRECTED = 2,
+ACPI_CPER_SEV_NONE = 3,
+};
+
+/*
  * Hardware Error Notification
  * ACPI 4.0: 17.3.2.7 Hardware Error Notification
  * Composes dummy Hardware Error Notification descriptor of specified type
@@ -73,6 +104,138 @@ static void build_ghes_hw_error_notification(GArray 
*table, const uint8_t type)
 }
 
 /*
+ * Generic Error Data Entry
+ * ACPI 6.1: 18.3.2.7.1 Generic Error Data
+ */
+static void acpi_ghes_generic_error_data(GArray *table,
+const uint8_t *section_type, uint32_t error_severity,
+uint8_t validation_bits, uint8_t flags,
+uint32_t error_data_length, QemuUUID fru_id,
+uint64_t time_stamp)
+{
+const uint8_t fru_text[20] = {0};
+
+/* Section Type */
+g_array_append_vals(table, section_type, 16);
+
+/* Error Severity */
+build_append_int_noprefix(table, error_severity, 4);
+/* Revision */
+build_append_int_noprefix(table, 0x300, 2);
+/* Validation Bits */
+build_append_int_noprefix(table, validation_bits, 1);
+/* Flags */
+build_append_int_noprefix(table, flags, 1);
+/* Error Data Length */
+build_append_int_noprefix(table, error_data_length, 4);
+
+/* FRU Id */
+g_array_append_vals(table, fru_id.data, ARRAY_SIZE(fru_id.data));
+
+/* FRU Text */
+g_array_append_vals(table, fru_text, sizeof(fru_text));
+
+/* Timestamp */
+build_append_int_noprefix(table, time_stamp, 8);
+}
+
+/*
+ * Generic Error Status Block
+ * ACPI 6.1: 18.3.2.7.1 Generic Error Data
+ */
+static void acpi_ghes_generic_error_status(GArray *table, uint32_t 
block_status,
+uint32_t raw_data_offset, uint32_t raw_data_length,
+uint32_t data_length, uint32_t error_severity)
+{
+/* Block Status */
+build_append_int_noprefix(table, block_status, 4);
+/* Raw Data Offset */
+build_append_int_noprefix(table, raw_data_offset, 4);
+/* Raw Data Length */
+build_append_int_noprefix(table, raw_data_length, 4);
+/* Data Length */
+build_append_int_noprefix(table, data_length, 4);
+/* Error Severity */
+build_append_int_noprefix(table, error_severity, 4);
+}
+
+/* UEFI 2.6: N.2.5 Memory Error Section */
+static void acpi_ghes_build_append_mem_cper(GArray *table,
+uint64_t error_physical_addr)
+{
+/*
+ * Memory Error Record
+ */
+
+/* Validation Bits */
+build_append_int_noprefix(table,
+  (1ULL << 14) | /* Type Valid */
+  (1ULL << 1) /* Physical Address Valid */,
+  8);
+/* Error Status */
+build_append_int_noprefix(table, 0, 8);
+/* Physical Address */
+build_append_int_noprefix(table, error_physical_addr, 8);
+/* Skip all the detailed information normally found in such a record */
+

[PATCH v27 06/10] ACPI: Record the Generic Error Status Block address

2020-05-11 Thread Dongjiu Geng
Record the Generic Error Status Block (GESB) address via a fw_cfg file.
When recording an error to CPER, QEMU uses this address to find the
Generic Error Data Entries and write the error.

To avoid migration failure, make the hardware error table address
part of the GED device state instead of a global variable, so that
the address is migrated to the target QEMU.
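
The address lives in a field named ghes_addr_le, which suggests it is kept
in little-endian form as written by firmware. The sketch below shows one
host-independent way to decode such a value and to test whether it was
ever written, similar in spirit to the ghes_needed() check in this patch.
GhesStateSketch, load_le64() and ghes_present() are hypothetical names for
illustration, not QEMU APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device state added by this patch. */
typedef struct {
    uint8_t ghes_addr_le[8]; /* raw little-endian bytes from firmware */
} GhesStateSketch;

/* Decode a little-endian 64-bit value regardless of host byte order. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;

    assert(p != NULL);
    for (size_t i = 0; i < 8; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/* A zero address is treated as "firmware never wrote the pointer". */
static int ghes_present(const GhesStateSketch *s)
{
    return load_le64(s->ghes_addr_le) != 0;
}
```

Keeping the value in the device state rather than in a global means it
travels with the rest of the device during migration, which is the point
of the change described above.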

Acked-by: Xiang Zheng 
Signed-off-by: Dongjiu Geng 
Reviewed-by: Igor Mammedov 
Reviewed-by: Michael S. Tsirkin 
---
 hw/acpi/generic_event_device.c | 19 +++
 hw/acpi/ghes.c | 14 ++
 hw/arm/virt-acpi-build.c   |  8 
 include/hw/acpi/generic_event_device.h |  2 ++
 include/hw/acpi/ghes.h |  6 ++
 5 files changed, 49 insertions(+)

diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index 5d17f78..b1cbdd8 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -247,6 +247,24 @@ static const VMStateDescription vmstate_ged_state = {
 }
 };
 
+static bool ghes_needed(void *opaque)
+{
+AcpiGedState *s = opaque;
+return s->ghes_state.ghes_addr_le;
+}
+
+static const VMStateDescription vmstate_ghes_state = {
+.name = "acpi-ged/ghes",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = ghes_needed,
+.fields  = (VMStateField[]) {
+VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
+   vmstate_ghes_state, AcpiGhesState),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription vmstate_acpi_ged = {
 .name = "acpi-ged",
 .version_id = 1,
@@ -257,6 +275,7 @@ static const VMStateDescription vmstate_acpi_ged = {
 },
 .subsections = (const VMStateDescription * []) {
 &vmstate_memhp_state,
+&vmstate_ghes_state,
 NULL
 }
 };
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 091fd87..e74af23 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -24,6 +24,8 @@
 #include "hw/acpi/ghes.h"
 #include "hw/acpi/aml-build.h"
 #include "qemu/error-report.h"
+#include "hw/acpi/generic_event_device.h"
+#include "hw/nvram/fw_cfg.h"
 
 #define ACPI_GHES_ERRORS_FW_CFG_FILE"etc/hardware_errors"
 #define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
@@ -213,3 +215,15 @@ void acpi_build_hest(GArray *table_data, BIOSLinker 
*linker)
 build_header(linker, table_data, (void *)(table_data->data + hest_start),
 "HEST", table_data->len - hest_start, 1, NULL, NULL);
 }
+
+void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
+  GArray *hardware_error)
+{
+/* Create a read-only fw_cfg file for GHES */
+fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
+hardware_error->len);
+
+/* Create a read-write fw_cfg file for Address */
+fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
+NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
+}
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index ef94e03..1b0a584 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -917,6 +917,7 @@ void virt_acpi_setup(VirtMachineState *vms)
 {
 AcpiBuildTables tables;
 AcpiBuildState *build_state;
+AcpiGedState *acpi_ged_state;
 
 if (!vms->fw_cfg) {
 trace_virt_acpi_setup();
@@ -947,6 +948,13 @@ void virt_acpi_setup(VirtMachineState *vms)
 fw_cfg_add_file(vms->fw_cfg, ACPI_BUILD_TPMLOG_FILE, tables.tcpalog->data,
 acpi_data_len(tables.tcpalog));
 
+if (vms->ras) {
+assert(vms->acpi_dev);
+acpi_ged_state = ACPI_GED(vms->acpi_dev);
+acpi_ghes_add_fw_cfg(&acpi_ged_state->ghes_state,
+ vms->fw_cfg, tables.hardware_errors);
+}
+
 build_state->rsdp_mr = acpi_add_rom_blob(virt_acpi_build_update,
  build_state, tables.rsdp,
  ACPI_BUILD_RSDP_FILE, 0);
diff --git a/include/hw/acpi/generic_event_device.h 
b/include/hw/acpi/generic_event_device.h
index 9eb86ca..83917de 100644
--- a/include/hw/acpi/generic_event_device.h
+++ b/include/hw/acpi/generic_event_device.h
@@ -61,6 +61,7 @@
 
 #include "hw/sysbus.h"
 #include "hw/acpi/memory_hotplug.h"
+#include "hw/acpi/ghes.h"
 
 #define ACPI_POWER_BUTTON_DEVICE "PWRB"
 
@@ -96,6 +97,7 @@ typedef struct AcpiGedState {
 GEDState ged_state;
 uint32_t ged_event_bitmap;
 qemu_irq irq;
+AcpiGhesState ghes_state;
 } AcpiGedState;
 
 void build_ged_aml(Aml *table, const char* name, HotplugHandler *hotplug_dev,
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 18debd8..a3420fc 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -62,6 +62,12 @@ enum {
 ACPI_HEST_SRC_ID_RESERVED,
 };
 
+typedef struct AcpiGhesState {
+uint64_t ghes_addr_le;
+} AcpiGhesState;
+
 void build_ghes_error_table(GArray *hardware_errors, BIOSLinker 

[PATCH v27 09/10] target-arm: kvm64: handle SIGBUS signal from kernel or KVM

2020-05-11 Thread Dongjiu Geng
Add a SIGBUS signal handler. The handler checks the SIGBUS type,
translates the host VA delivered by the host to a guest PA, writes this PA
into the guest APEI GHES memory, and then notifies the guest according to
the SIGBUS type.

When the guest accesses poisoned memory, it generates a Synchronous
External Abort (SEA). The host kernel then gets an APEI notification and
calls memory_failure() to unmap the affected page in stage 2, and finally
returns to the guest.

If the guest continues to access the PG_hwpoison page, it traps to KVM as a
stage 2 fault, and a SIGBUS_MCEERR_AR synchronous signal is delivered to
QEMU. QEMU records this error address into the guest APEI GHES memory and
notifies the guest using a Synchronous External Abort (SEA).

To inject a vSEA, we introduce the kvm_inject_arm_sea() function,
in which we can set up the exception type and the syndrome information.
When switching to the guest, the target vCPU jumps to the synchronous
external abort vector table entry.

ESR_ELx.DFSC is set to synchronous external abort (0x10), and
ESR_ELx.FnV is set to not valid (0x1), which tells the guest that FAR is
not valid and holds an UNKNOWN value. These values are set in the KVM
register structures through the KVM_SET_ONE_REG ioctl.
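
To make the syndrome encoding concrete, the following standalone sketch
composes an ESR value the same way the updated syn_data_abort_no_iss()
in this patch does, with FnV at bit 10. The SK_* constants reflect my
understanding of the ESR_ELx layout (exception class in bits [31:26],
IL at bit 25) and should be treated as assumptions, as should the helper
name sketch_syn_data_abort().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ESR_ELx layout constants, named locally for this sketch. */
#define SK_EC_SHIFT      26
#define SK_EC_DATAABORT  0x24u       /* data abort taken from a lower EL */
#define SK_IL            (1u << 25)  /* 32-bit instruction length */
#define SK_DFSC_SEA      0x10u       /* synchronous external abort */

/* Same bit placement as the patched syn_data_abort_no_iss(). */
static uint32_t sketch_syn_data_abort(uint32_t same_el, uint32_t fnv,
                                      uint32_t ea, uint32_t cm,
                                      uint32_t s1ptw, uint32_t wnr,
                                      uint32_t fsc)
{
    assert(same_el <= 1 && fnv <= 1 && ea <= 1 && cm <= 1);
    assert(s1ptw <= 1 && wnr <= 1 && fsc <= 0x3f);

    return (SK_EC_DATAABORT << SK_EC_SHIFT)
           | (same_el << SK_EC_SHIFT)  /* selects the same-EL class */
           | SK_IL
           | (fnv << 10)               /* FAR not valid */
           | (ea << 9) | (cm << 8) | (s1ptw << 7)
           | (wnr << 6) | fsc;
}
```

Under these assumptions, an SEA with FnV set and taken from the same
exception level encodes as 0x96000410, and from a lower exception level
as 0x92000410.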

Signed-off-by: Dongjiu Geng 
Signed-off-by: Xiang Zheng 
Reviewed-by: Michael S. Tsirkin 
Acked-by: Xiang Zheng 
Reviewed-by: Peter Maydell 
Reviewed-by: Igor Mammedov 
---
 include/sysemu/kvm.h|  3 +-
 target/arm/cpu.h|  4 +++
 target/arm/helper.c |  2 +-
 target/arm/internals.h  |  5 ++--
 target/arm/kvm64.c  | 77 +
 target/arm/tlb_helper.c |  2 +-
 target/i386/cpu.h   |  2 ++
 7 files changed, 89 insertions(+), 6 deletions(-)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 141342d..3b22504 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -379,8 +379,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
 /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
 unsigned long kvm_arch_vcpu_id(CPUState *cpu);
 
-#ifdef TARGET_I386
-#define KVM_HAVE_MCE_INJECTION 1
+#ifdef KVM_HAVE_MCE_INJECTION
 void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
 #endif
 
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 8608da6..89f51c6 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -28,6 +28,10 @@
 /* ARM processors have a weak memory model */
 #define TCG_GUEST_DEFAULT_MO  (0)
 
+#ifdef TARGET_AARCH64
+#define KVM_HAVE_MCE_INJECTION 1
+#endif
+
 #define EXCP_UDEF1   /* undefined instruction */
 #define EXCP_SWI 2   /* software interrupt */
 #define EXCP_PREFETCH_ABORT  3
diff --git a/target/arm/helper.c b/target/arm/helper.c
index a94f650..355b2d5 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -3481,7 +3481,7 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t 
value,
  * Report exception with ESR indicating a fault due to a
  * translation table walk for a cache maintenance instruction.
  */
-syn = syn_data_abort_no_iss(current_el == target_el,
+syn = syn_data_abort_no_iss(current_el == target_el, 0,
 fi.ea, 1, fi.s1ptw, 1, fsc);
 env->exception.vaddress = value;
 env->exception.fsr = fsr;
diff --git a/target/arm/internals.h b/target/arm/internals.h
index e633aff..37c22a9 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int 
ea, int s1ptw, int fsc)
 | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
 }
 
-static inline uint32_t syn_data_abort_no_iss(int same_el,
+static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
  int ea, int cm, int s1ptw,
  int wnr, int fsc)
 {
 return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
| ARM_EL_IL
-   | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
+   | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
+   | (wnr << 6) | fsc;
 }
 
 static inline uint32_t syn_data_abort_with_iss(int same_el,
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index be5b31c..d53f7f2 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -28,6 +28,9 @@
 #include "sysemu/kvm_int.h"
 #include "kvm_arm.h"
 #include "internals.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/ghes.h"
+#include "hw/arm/virt.h"
 
 static bool have_guest_debug;
 
@@ -893,6 +896,30 @@ int kvm_arm_cpreg_level(uint64_t regidx)
 return KVM_PUT_RUNTIME_STATE;
 }
 
+/* Callers must hold the iothread mutex lock */
+static void kvm_inject_arm_sea(CPUState *c)
+{
+ARMCPU *cpu = ARM_CPU(c);
+CPUARMState *env = &cpu->env;
+CPUClass *cc = CPU_GET_CLASS(c);
+uint32_t esr;
+bool same_el;
+
+c->exception_index = EXCP_DATA_ABORT;
+

[PATCH v27 04/10] ACPI: Build related register address fields via hardware error fw_cfg blob

2020-05-11 Thread Dongjiu Geng
This patch builds the error_block_address and read_ack_register fields
in the hardware errors table. The error_block_address points to a Generic
Error Status Block (GESB) via the bios_linker. The maximum size of one GESB
is 1 KiB. For more detailed information, please refer to
docs/specs/acpi_hest_ghes.rst.

For now only one error source is supported; more can be added if
necessary.
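
A rough sketch of how offsets inside the hardware errors blob could be
computed follows. It assumes the layout is an array of 8-byte
error_block_address entries, then an array of 8-byte read_ack_register
entries, then one 1 KiB status block per source. The first two arrays
follow the loops in build_ghes_error_table(); the position of the status
blocks is an assumption based on my reading of the referenced document.
All function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_GESB_SIZE 1024u /* ACPI_GHES_MAX_RAW_DATA_LENGTH, 1 KiB */

/* Offset of error_block_address[i] for a blob with n error sources. */
static size_t error_block_address_offset(size_t n, size_t i)
{
    assert(i < n);
    return i * sizeof(uint64_t);
}

/* Offset of read_ack_register[i]; the array follows the address array. */
static size_t read_ack_register_offset(size_t n, size_t i)
{
    assert(i < n);
    return n * sizeof(uint64_t) + i * sizeof(uint64_t);
}

/* Offset of status block i; blocks are assumed to follow both arrays. */
static size_t gesb_offset(size_t n, size_t i)
{
    assert(i < n);
    return 2 * n * sizeof(uint64_t) + i * SK_MAX_GESB_SIZE;
}

/* Total blob size for n error sources under the assumed layout. */
static size_t hardware_errors_size(size_t n)
{
    return n * (2 * sizeof(uint64_t) + SK_MAX_GESB_SIZE);
}
```

With the single error source supported today, this gives offsets 0, 8
and 16 for the three regions and a total of 1040 bytes.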

Suggested-by: Laszlo Ersek 
Signed-off-by: Xiang Zheng 
Reviewed-by: Jonathan Cameron 
Reviewed-by: Igor Mammedov 
Signed-off-by: Dongjiu Geng 
Reviewed-by: Michael S. Tsirkin 
---
 default-configs/arm-softmmu.mak |  1 +
 hw/acpi/Kconfig |  4 ++
 hw/acpi/Makefile.objs   |  1 +
 hw/acpi/aml-build.c |  2 +
 hw/acpi/ghes.c  | 89 +
 hw/arm/virt-acpi-build.c|  5 +++
 include/hw/acpi/aml-build.h |  1 +
 include/hw/acpi/ghes.h  | 28 +
 8 files changed, 131 insertions(+)
 create mode 100644 hw/acpi/ghes.c
 create mode 100644 include/hw/acpi/ghes.h

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 36a0e89..8fc09a4 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -42,3 +42,4 @@ CONFIG_FSL_IMX7=y
 CONFIG_FSL_IMX6UL=y
 CONFIG_SEMIHOSTING=y
 CONFIG_ALLWINNER_H3=y
+CONFIG_ACPI_APEI=y
diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 54209c6..1932f66 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -28,6 +28,10 @@ config ACPI_HMAT
 bool
 depends on ACPI
 
+config ACPI_APEI
+bool
+depends on ACPI
+
 config ACPI_PCI
 bool
 depends on ACPI && PCI
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index cab9bcd..72886c7 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -8,6 +8,7 @@ common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
 common-obj-$(CONFIG_ACPI_HW_REDUCED) += generic_event_device.o
 common-obj-$(CONFIG_ACPI_HMAT) += hmat.o
+common-obj-$(CONFIG_ACPI_APEI) += ghes.o
 common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
 common-obj-$(call lnot,$(CONFIG_PC)) += acpi-x86-stub.o
 
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 2c3702b..3681ec6 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1578,6 +1578,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
 tables->table_data = g_array_new(false, true /* clear */, 1);
 tables->tcpalog = g_array_new(false, true /* clear */, 1);
 tables->vmgenid = g_array_new(false, true /* clear */, 1);
+tables->hardware_errors = g_array_new(false, true /* clear */, 1);
 tables->linker = bios_linker_loader_init();
 }
 
@@ -1588,6 +1589,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, 
bool mfre)
 g_array_free(tables->table_data, true);
 g_array_free(tables->tcpalog, mfre);
 g_array_free(tables->vmgenid, mfre);
+g_array_free(tables->hardware_errors, mfre);
 }
 
 /*
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
new file mode 100644
index 0000000..e1b3f8f
--- /dev/null
+++ b/hw/acpi/ghes.c
@@ -0,0 +1,89 @@
+/*
+ * Support for generating APEI tables and recording CPER for Guests
+ *
+ * Copyright (c) 2020 HUAWEI TECHNOLOGIES CO., LTD.
+ *
+ * Author: Dongjiu Geng 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "hw/acpi/ghes.h"
+#include "hw/acpi/aml-build.h"
+
+#define ACPI_GHES_ERRORS_FW_CFG_FILE"etc/hardware_errors"
+#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
+
+/* The max size in bytes for one error block */
+#define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
+
+/* Now only support ARMv8 SEA notification type error source */
+#define ACPI_GHES_ERROR_SOURCE_COUNT1
+
+/*
+ * Build table for the hardware error fw_cfg blob.
+ * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg 
blobs.
+ * See docs/specs/acpi_hest_ghes.rst for blobs format.
+ */
+void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
+{
+int i, error_status_block_offset;
+
+/* Build error_block_address */
+for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
+}
+
+/* Build read_ack_register */
+for (i = 0; i < 

[PATCH v27 07/10] KVM: Move hwpoison page related functions into kvm-all.c

2020-05-11 Thread Dongjiu Geng
kvm_hwpoison_page_add() and kvm_unpoison_all() will both
be used by the x86 and ARM platforms, so move them into
"accel/kvm/kvm-all.c" to avoid duplicated code.

For architectures that don't use the poison-list functionality,
the reset handler will harmlessly do nothing, so let's register
the kvm_unpoison_all() function in the generic kvm_init() function.
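
The reason an unused poison list is harmless can be seen in a small
standalone model of the two functions. It uses a plain singly linked list
instead of QEMU's QLIST macros, and PoisonList, poison_list_add() and
poison_list_drain() are hypothetical names. The remap callback stands in
for qemu_ram_remap().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct PoisonPage {
    uint64_t ram_addr;
    struct PoisonPage *next;
} PoisonPage;

typedef struct {
    PoisonPage *head;
} PoisonList;

/* Add an address once; returns 1 if inserted, 0 if already present. */
static int poison_list_add(PoisonList *l, uint64_t ram_addr)
{
    for (PoisonPage *p = l->head; p != NULL; p = p->next) {
        if (p->ram_addr == ram_addr) {
            return 0;
        }
    }
    PoisonPage *page = malloc(sizeof(*page));
    assert(page != NULL);
    page->ram_addr = ram_addr;
    page->next = l->head;
    l->head = page;
    return 1;
}

/*
 * Reset-time drain: call remap (if given) for every page, free it, and
 * return how many pages were processed. An empty list yields 0, which is
 * why registering the handler everywhere costs nothing.
 */
static unsigned poison_list_drain(PoisonList *l, void (*remap)(uint64_t))
{
    unsigned n = 0;
    PoisonPage *p = l->head;

    while (p != NULL) {
        PoisonPage *next = p->next; /* save before freeing the node */
        if (remap != NULL) {
            remap(p->ram_addr);
        }
        free(p);
        n++;
        p = next;
    }
    l->head = NULL;
    return n;
}
```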

Reviewed-by: Peter Maydell 
Signed-off-by: Dongjiu Geng 
Signed-off-by: Xiang Zheng 
Acked-by: Xiang Zheng 
---
 accel/kvm/kvm-all.c  | 36 
 include/sysemu/kvm_int.h | 12 
 target/i386/kvm.c| 36 
 3 files changed, 48 insertions(+), 36 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 439a4ef..36be117 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -44,6 +44,7 @@
 #include "qapi/visitor.h"
 #include "qapi/qapi-types-common.h"
 #include "qapi/qapi-visit-common.h"
+#include "sysemu/reset.h"
 
 #include "hw/boards.h"
 
@@ -883,6 +884,39 @@ int kvm_vm_check_extension(KVMState *s, unsigned int 
extension)
 return ret;
 }
 
+typedef struct HWPoisonPage {
+ram_addr_t ram_addr;
+QLIST_ENTRY(HWPoisonPage) list;
+} HWPoisonPage;
+
+static QLIST_HEAD(, HWPoisonPage) hwpoison_page_list =
+QLIST_HEAD_INITIALIZER(hwpoison_page_list);
+
+static void kvm_unpoison_all(void *param)
+{
+HWPoisonPage *page, *next_page;
+
+QLIST_FOREACH_SAFE(page, &hwpoison_page_list, list, next_page) {
+QLIST_REMOVE(page, list);
+qemu_ram_remap(page->ram_addr, TARGET_PAGE_SIZE);
+g_free(page);
+}
+}
+
+void kvm_hwpoison_page_add(ram_addr_t ram_addr)
+{
+HWPoisonPage *page;
+
+QLIST_FOREACH(page, &hwpoison_page_list, list) {
+if (page->ram_addr == ram_addr) {
+return;
+}
+}
+page = g_new(HWPoisonPage, 1);
+page->ram_addr = ram_addr;
+QLIST_INSERT_HEAD(&hwpoison_page_list, page, list);
+}
+
 static uint32_t adjust_ioeventfd_endianness(uint32_t val, uint32_t size)
 {
 #if defined(HOST_WORDS_BIGENDIAN) != defined(TARGET_WORDS_BIGENDIAN)
@@ -2085,6 +2119,8 @@ static int kvm_init(MachineState *ms)
 s->kernel_irqchip_split = mc->default_kernel_irqchip_split ? 
ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
 }
 
+qemu_register_reset(kvm_unpoison_all, NULL);
+
 if (s->kernel_irqchip_allowed) {
 kvm_irqchip_create(s);
 }
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index ac2d1f8..c660a70 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -42,4 +42,16 @@ void kvm_memory_listener_register(KVMState *s, 
KVMMemoryListener *kml,
   AddressSpace *as, int as_id);
 
 void kvm_set_max_memslot_size(hwaddr max_slot_size);
+
+/**
+ * kvm_hwpoison_page_add:
+ *
+ * Parameters:
+ *  @ram_addr: the address in the RAM for the poisoned page
+ *
+ * Add a poisoned page to the list
+ *
+ * Return: None.
+ */
+void kvm_hwpoison_page_add(ram_addr_t ram_addr);
 #endif
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 4901c6d..34f8387 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -24,7 +24,6 @@
 #include "sysemu/sysemu.h"
 #include "sysemu/hw_accel.h"
 #include "sysemu/kvm_int.h"
-#include "sysemu/reset.h"
 #include "sysemu/runstate.h"
 #include "kvm_i386.h"
 #include "hyperv.h"
@@ -533,40 +532,6 @@ uint64_t kvm_arch_get_supported_msr_feature(KVMState *s, 
uint32_t index)
 }
 }
 
-
-typedef struct HWPoisonPage {
-ram_addr_t ram_addr;
-QLIST_ENTRY(HWPoisonPage) list;
-} HWPoisonPage;
-
-static QLIST_HEAD(, HWPoisonPage) hwpoison_page_list =
-QLIST_HEAD_INITIALIZER(hwpoison_page_list);
-
-static void kvm_unpoison_all(void *param)
-{
-HWPoisonPage *page, *next_page;
-
-QLIST_FOREACH_SAFE(page, &hwpoison_page_list, list, next_page) {
-QLIST_REMOVE(page, list);
-qemu_ram_remap(page->ram_addr, TARGET_PAGE_SIZE);
-g_free(page);
-}
-}
-
-static void kvm_hwpoison_page_add(ram_addr_t ram_addr)
-{
-HWPoisonPage *page;
-
-QLIST_FOREACH(page, &hwpoison_page_list, list) {
-if (page->ram_addr == ram_addr) {
-return;
-}
-}
-page = g_new(HWPoisonPage, 1);
-page->ram_addr = ram_addr;
-QLIST_INSERT_HEAD(&hwpoison_page_list, page, list);
-}
-
 static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
  int *max_banks)
 {
@@ -2180,7 +2145,6 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 fprintf(stderr, "e820_add_entry() table is full\n");
 return ret;
 }
-qemu_register_reset(kvm_unpoison_all, NULL);
 
 shadow_mem = object_property_get_int(OBJECT(s), "kvm-shadow-mem", &error_abort);
 if (shadow_mem != -1) {
-- 
1.8.3.1
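
The list handling moved by this patch can be sketched outside QEMU as follows. This is a minimal stand-alone model, not the patch's code: PoisonPage, poison_page_add() and poison_unpoison_all() are hypothetical names, a plain next pointer replaces the QLIST macros, and the per-page qemu_ram_remap() call is left out.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for HWPoisonPage, without the QLIST macros. */
typedef struct PoisonPage {
    uint64_t ram_addr;
    struct PoisonPage *next;
} PoisonPage;

static PoisonPage *poison_list;

/* Record ram_addr once; a repeated address is ignored. Returns 1 if added. */
static int poison_page_add(uint64_t ram_addr)
{
    PoisonPage *page;

    for (page = poison_list; page; page = page->next) {
        if (page->ram_addr == ram_addr) {
            return 0;
        }
    }
    page = malloc(sizeof(*page));
    if (!page) {
        abort();
    }
    page->ram_addr = ram_addr;
    page->next = poison_list;
    poison_list = page;
    return 1;
}

/*
 * Drain the list the way a reset handler would and return the number of
 * entries released. The handler in the patch also remaps each page.
 */
static unsigned poison_unpoison_all(void)
{
    unsigned released = 0;

    while (poison_list) {
        PoisonPage *page = poison_list;

        poison_list = page->next;
        free(page);
        released++;
    }
    return released;
}
```

Adding the same address twice leaves a single entry, so a reset releases each poisoned page exactly once.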




[PATCH v27 01/10] acpi: nvdimm: change NVDIMM_UUID_LE to a common macro

2020-05-11 Thread Dongjiu Geng
The little-endian UUID is used in many places, so turn
NVDIMM_UUID_LE into a common macro that converts a UUID
to a little-endian byte array.

Reviewed-by: Xiang Zheng 
Signed-off-by: Dongjiu Geng 
---
Change since v25:
1. Address Peter's comments to add a proper doc comment for the
   UUID_LE macro.
---
 hw/acpi/nvdimm.c| 10 +++---
 include/qemu/uuid.h | 27 +++
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index fa7bf8b..9316d12 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -27,6 +27,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/uuid.h"
 #include "hw/acpi/acpi.h"
 #include "hw/acpi/aml-build.h"
 #include "hw/acpi/bios-linker-loader.h"
@@ -34,18 +35,13 @@
 #include "hw/mem/nvdimm.h"
 #include "qemu/nvdimm-utils.h"
 
-#define NVDIMM_UUID_LE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7) \
-   { (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff, ((a) >> 24) & 0xff, \
- (b) & 0xff, ((b) >> 8) & 0xff, (c) & 0xff, ((c) >> 8) & 0xff,  \
- (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) }
-
 /*
  * define Byte Addressable Persistent Memory (PM) Region according to
  * ACPI 6.0: 5.2.25.1 System Physical Address Range Structure.
  */
 static const uint8_t nvdimm_nfit_spa_uuid[] =
-  NVDIMM_UUID_LE(0x66f0d379, 0xb4f3, 0x4074, 0xac, 0x43, 0x0d, 0x33,
- 0x18, 0xb7, 0x8c, 0xdb);
+  UUID_LE(0x66f0d379, 0xb4f3, 0x4074, 0xac, 0x43, 0x0d, 0x33,
+  0x18, 0xb7, 0x8c, 0xdb);
 
 /*
  * NVDIMM Firmware Interface Table
diff --git a/include/qemu/uuid.h b/include/qemu/uuid.h
index 129c45f..9925feb 100644
--- a/include/qemu/uuid.h
+++ b/include/qemu/uuid.h
@@ -34,6 +34,33 @@ typedef struct {
 };
 } QemuUUID;
 
+/**
+ * UUID_LE - converts the fields of a UUID to a little-endian array;
+ * each parameter is a field of the UUID.
+ *
+ * @time_low: The low field of the timestamp
+ * @time_mid: The middle field of the timestamp
+ * @time_hi_and_version: The high field of the timestamp
+ *   multiplexed with the version number
+ * @clock_seq_hi_and_reserved: The high field of the clock
+ * sequence multiplexed with the variant
+ * @clock_seq_low: The low field of the clock sequence
+ * @node0: The spatially unique node0 identifier
+ * @node1: The spatially unique node1 identifier
+ * @node2: The spatially unique node2 identifier
+ * @node3: The spatially unique node3 identifier
+ * @node4: The spatially unique node4 identifier
+ * @node5: The spatially unique node5 identifier
+ */
+#define UUID_LE(time_low, time_mid, time_hi_and_version,\
+  clock_seq_hi_and_reserved, clock_seq_low, node0, node1, node2,\
+  node3, node4, node5)  \
+  { (time_low) & 0xff, ((time_low) >> 8) & 0xff, ((time_low) >> 16) & 0xff, \
+((time_low) >> 24) & 0xff, (time_mid) & 0xff, ((time_mid) >> 8) & 0xff, \
+(time_hi_and_version) & 0xff, ((time_hi_and_version) >> 8) & 0xff,  \
+(clock_seq_hi_and_reserved), (clock_seq_low), (node0), (node1), (node2),\
+(node3), (node4), (node5) }
+
 #define UUID_FMT "%02hhx%02hhx%02hhx%02hhx-" \
  "%02hhx%02hhx-%02hhx%02hhx-" \
  "%02hhx%02hhx-" \
-- 
1.8.3.1
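
To see what the macro produces, the sketch below copies its expansion with shortened parameter names and applies it to the NFIT SPA UUID from hw/acpi/nvdimm.c. The array name spa_uuid is only illustrative.

```c
#include <stdint.h>

/* Same expansion as UUID_LE in the patch; parameter names are shortened. */
#define UUID_LE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)                     \
    { (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff, ((a) >> 24) & 0xff, \
      (b) & 0xff, ((b) >> 8) & 0xff, (c) & 0xff, ((c) >> 8) & 0xff,          \
      (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) }

/* 66f0d379-b4f3-4074-ac43-0d3318b78cdb, stored as a 16-byte array. */
static const uint8_t spa_uuid[16] =
    UUID_LE(0x66f0d379, 0xb4f3, 0x4074, 0xac, 0x43, 0x0d, 0x33,
            0x18, 0xb7, 0x8c, 0xdb);
```

Only the first three fields are byte-swapped: the array starts 79 d3 f0 66, f3 b4, 74 40, and the remaining eight bytes keep the order in which they were passed.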




[PATCH 2/4] fuzz: fix typo in i440fx-qtest-reboot arguments

2020-05-11 Thread Alexander Bulekov
Signed-off-by: Alexander Bulekov 
---
 tests/qtest/fuzz/i440fx_fuzz.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/fuzz/i440fx_fuzz.c b/tests/qtest/fuzz/i440fx_fuzz.c
index ab5f112584..90e75ffaea 100644
--- a/tests/qtest/fuzz/i440fx_fuzz.c
+++ b/tests/qtest/fuzz/i440fx_fuzz.c
@@ -143,7 +143,7 @@ static void i440fx_fuzz_qos_fork(QTestState *s,
 }
 
 static const char *i440fx_qtest_argv = TARGET_NAME " -machine accel=qtest"
-   "-m 0 -display none";
+   " -m 0 -display none";
 static const char *i440fx_argv(FuzzTarget *t)
 {
 return i440fx_qtest_argv;
-- 
2.26.2
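
The typo comes from C's concatenation of adjacent string literals, which inserts no separator. A small sketch of the before and after strings, assuming a placeholder TARGET_NAME of "i386" (in QEMU the build defines it):

```c
#include <string.h>

/* Placeholder only; QEMU's build system defines the real TARGET_NAME. */
#define TARGET_NAME "i386"

/* Before the fix: the two literals join as "...accel=qtest-m 0 ...". */
static const char *argv_before = TARGET_NAME " -machine accel=qtest"
                                 "-m 0 -display none";

/* After the fix: the leading space keeps "-m" a separate argument. */
static const char *argv_after = TARGET_NAME " -machine accel=qtest"
                                " -m 0 -display none";
```

With the placeholder, argv_before is "i386 -machine accel=qtest-m 0 -display none", so "-m" is glued onto the accelerator name.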




[PATCH v27 00/10] Add ARMv8 RAS virtualization support in QEMU

2020-05-11 Thread Dongjiu Geng
On the ARMv8 platform, the CPU error types include synchronous external abort
(SEA) and SError Interrupt (SEI). If an exception happens in the guest, the host
does not know the guest's detailed information, so the guest is expected to do
the recovery. For example, if an exception happens in a guest user-space
application, the host does not know which application encountered the error;
only the guest knows.

For the ARMv8 SEA/SEI, KVM or host kernel delivers SIGBUS to notify userspace.
After user space gets the notification, it will record the CPER into guest GHES
buffer and inject an exception or IRQ to guest.

In the current implementation, if the type of SIGBUS is BUS_MCEERR_AR, we will
treat it as a synchronous exception, and notify guest with ARMv8 SEA
notification type after recording CPER into guest.

A) This patch series is based on QEMU 4.2 and consists of two parts:
1. Generate the APEI/GHES table.
2. Handle the SIGBUS signal, record the CPER at runtime and fill it into guest
   memory, then notify the guest according to the type of SIGBUS.

B) The solution was suggested by James (james.mo...@arm.com); the APEI part of
the solution was suggested by Laszlo (ler...@redhat.com). Some of the
discussion is shown in [1].

C) This patch series has already been tested on an ARM64 platform with the RAS
feature enabled:
1. The APEI part verification result is shown in [2].
2. The verification result for handling SIGBUS with BUS_MCEERR_AR is shown
   in [3].

D) Add the 'ras' option on the command line to enable the guest RAS error
recovery feature. For example:
KVM model: ./qemu-system-aarch64 --enable-kvm -cpu host --bios QEMU_EFI.fd_new \
    -machine virt,gic-version=3,ras,kernel-irqchip=on -smp 4 -nographic \
    -kernel Image \
    -append "rdinit=/init console=ttyAMA0 mem=512M root=/dev/ram0" \
    -initrd guestfs_new.cpio.gz
TCG model: ./qemu-system-aarch64 -cpu cortex-a57 --bios QEMU_EFI.fd_new \
    -machine virt,gic-version=3,ras,kernel-irqchip=on -smp 4 -nographic \
    -kernel Image \
    -append "rdinit=/init console=ttyAMA0 mem=512M root=/dev/ram0" \
    -initrd guestfs_new.cpio.gz
---
Change since v23:
1. fix a warning for uuid

Change since v22:
1. Using 1 * KiB instead of 0x400 to define max size of one error block
2. Make the alignment to 8 bytes in bios_linker_loader_alloc()
3. Change "Copyright (c) 2019" to "Copyright (c) 2020" in file header
4. Fix some code style warnings/errors and add some comments in code
5. Address Jonathan's comments to easily support CCIX error injection
6. Add vmstate_ghes_state .subsections in vmstate_acpi_ged

Change since v21:
1. Make the user-facing 'ras' option description clearer to address
Peter's comments.
2. Update the doc description in "docs/specs/acpi_hest_ghes.rst"
3. Split the HEST/GHES patches into more patches to make review easier
4. Using source_id to index the location to save the CPER.
5. Optimize and simplify the logic to build HEST/GHES table to address 
Igor/Michael/Beata comments.
6. make ghes_addr_le a part of GED device.

Change since v20:
1. Move some implementation details from acpi_ghes.h to acpi_ghes.c
2. Add the reviewers for the ACPI/APEI/GHES part

Change since v19:
1. Fix clang compile error
2. Fix sphinx build error

Change since v18:
1. Fix some code-style and typo/grammar problems.
2. Remove no_ras in the VirtMachineClass struct.
3. Convert documentation to rst format.
4. Simplify the code and add comments for some magic values.
5. Move kvm_inject_arm_sea() function into the patch where it's used.
6. Register the reset handler(kvm_unpoison_all()) in the kvm_init() function.

Change since v17:
1. Improve some commit messages and comments.
2. Fix some code-style problems.
3. Add a *ras* machine option.
4. Move HEST/GHES related structures and macros into "hw/acpi/acpi_ghes.*".
5. Move HWPoison page functions into "include/sysemu/kvm_int.h".
6. Fix some bugs.
7. Improve the design document.

Change since v16:
1. check whether ACPI table is enabled when handling the memory error in the 
SIGBUS handler.

Change since v15:
1. Add a doc-comment in the proper format for 'include/exec/ram_addr.h'
2. Remove write_part_cpustate_to_list() because there is another bug fix patch
   has been merged "arm: Allow system registers for KVM guests to be changed by 
QEMU code"
3. Add some comments for kvm_inject_arm_sea() in 'target/arm/kvm64.c'
4. Compare the arm_current_el() return value to 0,1,2,3, not to PSTATE_MODE_* 
constants.
5. Change the RAS support wasn't introduced before 4.1 QEMU version.
6. Move the no_ras flag  patch to begin in this series

Change since v14:
1. Remove the BUS_MCEERR_AO handling logic because this asynchronous signal was 
masked by main thread
2. Address some Igor Mammedov's comments(ACPI part)
   1) change the comments for the enum AcpiHestNotifyType definition and remove 
ditto in patch 1
   2) change some patch commit messages and separate "APEI GHES table 
generation" patch to more patches.
3. Address some peter's comments(arm64 Synchronous External Abort injection)
   1) change some code notes
   2) 

[PATCH v27 03/10] docs: APEI GHES generation and CPER record description

2020-05-11 Thread Dongjiu Geng
Add APEI/GHES detailed design document

Signed-off-by: Dongjiu Geng 
Signed-off-by: Xiang Zheng 
Reviewed-by: Michael S. Tsirkin 
Reviewed-by: Igor Mammedov 
---
 docs/specs/acpi_hest_ghes.rst | 110 ++
 docs/specs/index.rst  |   1 +
 2 files changed, 111 insertions(+)
 create mode 100644 docs/specs/acpi_hest_ghes.rst

diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
new file mode 100644
index 000..68f1fbe
--- /dev/null
+++ b/docs/specs/acpi_hest_ghes.rst
@@ -0,0 +1,110 @@
+APEI tables generating and CPER record
+======================================
+
+..
+   Copyright (c) 2020 HUAWEI TECHNOLOGIES CO., LTD.
+
+   This work is licensed under the terms of the GNU GPL, version 2 or later.
+   See the COPYING file in the top-level directory.
+
+Design Details
+--------------
+
+::
+
+   [Diagram: the HEST in "etc/acpi/tables" holds entries GHES1..GHESN. Each
+   GHESn has an error_status_address that points to error_block_addressN and
+   a read_ack_register (with read_ack_preserve and read_ack_write) that
+   points to read_ack_registerN, both in "etc/hardware_errors". Each
+   error_block_addressN in turn points to Generic Error Status Block N,
+   which holds a sequence of CPER entries.]
+
+
+(1) QEMU generates the ACPI HEST table. This table goes in the current
+"etc/acpi/tables" fw_cfg blob. Each error source has different
+notification types.
+
+(2) A new fw_cfg blob called "etc/hardware_errors" is introduced. QEMU
+also needs to populate this blob. The "etc/hardware_errors" fw_cfg blob
+contains an address registers table and an Error Status Data Block table.
+
+(3) The address registers table contains N Error Block Address entries
+and N Read Ack 

[PATCH v27 02/10] hw/arm/virt: Introduce a RAS machine option

2020-05-11 Thread Dongjiu Geng
RAS Virtualization feature is not supported now, so
add a RAS machine option and disable it by default.

Reviewed-by: Peter Maydell 
Signed-off-by: Dongjiu Geng 
Signed-off-by: Xiang Zheng 
Reviewed-by: Jonathan Cameron 
Reviewed-by: Igor Mammedov 
---
 hw/arm/virt.c | 23 +++
 include/hw/arm/virt.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 171e690..2d46c3f 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1995,6 +1995,20 @@ static void virt_set_acpi(Object *obj, Visitor *v, const char *name,
 visit_type_OnOffAuto(v, name, &vms->acpi, errp);
 }
 
+static bool virt_get_ras(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return vms->ras;
+}
+
+static void virt_set_ras(Object *obj, bool value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+vms->ras = value;
+}
+
 static char *virt_get_gic_version(Object *obj, Error **errp)
 {
 VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -2327,6 +2341,15 @@ static void virt_instance_init(Object *obj)
 "Valid values are none and smmuv3",
 NULL);
 
+/* Default disallows RAS instantiation */
+vms->ras = false;
+object_property_add_bool(obj, "ras", virt_get_ras,
+ virt_set_ras, NULL);
+object_property_set_description(obj, "ras",
+"Set on/off to enable/disable reporting host memory errors "
+"to a KVM guest using ACPI and guest external abort exceptions",
+NULL);
+
 vms->irqmap = a15irqmap;
 
 virt_flash_create(vms);
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 6d67ace..31878dd 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -132,6 +132,7 @@ typedef struct {
 bool highmem_ecam;
 bool its;
 bool virt;
+bool ras;
 OnOffAuto acpi;
 VirtGICType gic_version;
 VirtIOMMUType iommu;
-- 
1.8.3.1




[PATCH 3/4] fuzz: add mangled object name to linker script

2020-05-11 Thread Alexander Bulekov
Previously, we relied on "FuzzerTracePC*(.bss*)" to place libfuzzer's
fuzzer::TPC object into our contiguous shared-memory region. This does
not work for some libfuzzer builds, so this addition identifies the
region by its mangled name: *(.bss._ZN6fuzzer3TPCE);

Signed-off-by: Alexander Bulekov 
---
 tests/qtest/fuzz/fork_fuzz.ld | 5 +
 1 file changed, 5 insertions(+)

This isn't ideal, but I looked at the libfuzzer builds packaged for
debian, for versions 6, 7, 8, 9, 10 and 11 and this (mangled) object
name appears consistently in the symbol tables.

diff --git a/tests/qtest/fuzz/fork_fuzz.ld b/tests/qtest/fuzz/fork_fuzz.ld
index e086bba873..bfb667ed06 100644
--- a/tests/qtest/fuzz/fork_fuzz.ld
+++ b/tests/qtest/fuzz/fork_fuzz.ld
@@ -28,6 +28,11 @@ SECTIONS
 
   /* Internal Libfuzzer TracePC object which contains the ValueProfileMap */
   FuzzerTracePC*(.bss*);
+  /*
+   * In case the above line fails, explicitly specify the (mangled) name of
+   * the object we care about
+   */
+   *(.bss._ZN6fuzzer3TPCE);
   }
   .data.fuzz_end : ALIGN(4K)
   {
-- 
2.26.2




[PATCH 4/4] fuzz: run the main-loop in fork-server process

2020-05-11 Thread Alexander Bulekov
Without this, the time since the last main-loop keeps increasing, as the
fuzzer runs. The forked children need to handle all the "past-due"
timers, slowing them down, over time. With this change, the
parent/fork-server process runs the main-loop, while waiting on the
child, ensuring that the timer events do not pile up, over time.

Signed-off-by: Alexander Bulekov 
---
 tests/qtest/fuzz/i440fx_fuzz.c  | 1 +
 tests/qtest/fuzz/virtio_net_fuzz.c  | 2 ++
 tests/qtest/fuzz/virtio_scsi_fuzz.c | 2 ++
 3 files changed, 5 insertions(+)

I'm working on another series to abstract away the details of resetting
qemu state between runs from the individual targets. That should relieve
us from needing to add this for each new fuzzing target.

diff --git a/tests/qtest/fuzz/i440fx_fuzz.c b/tests/qtest/fuzz/i440fx_fuzz.c
index 90e75ffaea..8449f81687 100644
--- a/tests/qtest/fuzz/i440fx_fuzz.c
+++ b/tests/qtest/fuzz/i440fx_fuzz.c
@@ -138,6 +138,7 @@ static void i440fx_fuzz_qos_fork(QTestState *s,
 i440fx_fuzz_qos(s, Data, Size);
 _Exit(0);
 } else {
+flush_events(s);
 wait(NULL);
 }
 }
diff --git a/tests/qtest/fuzz/virtio_net_fuzz.c 
b/tests/qtest/fuzz/virtio_net_fuzz.c
index d08a47e278..a33bd73067 100644
--- a/tests/qtest/fuzz/virtio_net_fuzz.c
+++ b/tests/qtest/fuzz/virtio_net_fuzz.c
@@ -122,6 +122,7 @@ static void virtio_net_fork_fuzz(QTestState *s,
 flush_events(s);
 _Exit(0);
 } else {
+flush_events(s);
 wait(NULL);
 }
 }
@@ -134,6 +135,7 @@ static void virtio_net_fork_fuzz_check_used(QTestState *s,
 flush_events(s);
 _Exit(0);
 } else {
+flush_events(s);
 wait(NULL);
 }
 }
diff --git a/tests/qtest/fuzz/virtio_scsi_fuzz.c 
b/tests/qtest/fuzz/virtio_scsi_fuzz.c
index 3b95247f12..51dce491ab 100644
--- a/tests/qtest/fuzz/virtio_scsi_fuzz.c
+++ b/tests/qtest/fuzz/virtio_scsi_fuzz.c
@@ -145,6 +145,7 @@ static void virtio_scsi_fork_fuzz(QTestState *s,
 flush_events(s);
 _Exit(0);
 } else {
+flush_events(s);
 wait(NULL);
 }
 }
@@ -164,6 +165,7 @@ static void virtio_scsi_with_flag_fuzz(QTestState *s,
 }
 _Exit(0);
 } else {
+flush_events(s);
 wait(NULL);
 }
 }
-- 
2.26.2
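
The pattern applied to each target can be sketched in isolation. In the sketch below, service_pending_events() is a hypothetical stand-in for flush_events(s), fuzz_one_input() stands in for the target's fuzz body, and fork_and_run() is an illustrative wrapper. The point is only the ordering in the parent: do the main-loop work first, then wait for the child.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int parent_ticks;

/* Hypothetical stand-in for flush_events(s): counts parent-side runs. */
static void service_pending_events(void)
{
    parent_ticks++;
}

/* Hypothetical stand-in for the per-input fuzz body run in the child. */
static void fuzz_one_input(void)
{
}

/* Fork one child per input; returns the child's exit code or -1 on error. */
static int fork_and_run(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        fuzz_one_input();
        _exit(0);
    }
    /* Parent (fork server): handle due timers now so children inherit none. */
    service_pending_events();
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the counter is bumped only in the parent, it grows by one per input, while each child starts from a parent whose pending events were already handled.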




Questions about record & replay for RISC-V

2020-05-11 Thread LIU Zhiwei

Hi Pavel,

I am developing a profiling tool that depends on QEMU's record & replay
feature for RISC-V.

Here I'd like to ask you some questions.

First, is it possible in theory to record & replay a Linux system with this
feature? I mean keeping the strict instruction sequence of each process and
the kernel, for a very simple image with only a timer and a UART.


Second, is RISC-V support planned?

Thanks very much.

Best regards,
Zhiwei



Re: [PATCH 3/3] plugins: avoid failing plugin when CPU is inited several times

2020-05-11 Thread Emilio G. Cota
On Mon, May 11, 2020 at 18:53:19 +0300, Nikolay Igotti wrote:
> Attached to the mail counter.c when running with attached test.c compiled
> to Linux standalone binary shows failing assert, unless the patch is
> applied.

I didn't get the attachment. Can you paste the code at the end of your
reply?

Thanks,
Emilio



Re: [PATCH for QEMU v2] hw/vfio: Add VMD Passthrough Quirk

2020-05-11 Thread Alex Williamson
On Mon, 11 May 2020 15:01:27 -0400
Jon Derrick  wrote:

> The VMD endpoint provides a real PCIe domain to the guest, including

Please define VMD.  I'm sure this is obvious to many, but I've had to
do some research.  The best TL;DR summary I've found is Keith's
original commit 185a383ada2e adding the controller to Linux.  If there's
something better, please let me know.

> bridges and endpoints. Because the VMD domain is enumerated by the guest
> kernel, the guest kernel will assign Guest Physical Addresses to the
> downstream endpoint BARs and bridge windows.
>
> When the guest kernel performs MMIO to VMD sub-devices, IOMMU will
> translate from the guest address space to the physical address space.
> Because the bridges have been programmed with guest addresses, the
> bridges will reject the transaction containing physical addresses.

I'm lost, what IOMMU is involved in CPU access to MMIO space?  My guess
is that since all MMIO of this domain is mapped behind the host
endpoint BARs 2 & 4 that QEMU simply accesses it via mapping of those
BARs into the VM, so it's the MMU, not the IOMMU performing those GPA
to HPA translations.  But then presumably the bridges within the domain
are scrambled because their apertures are programmed with ranges that
don't map into the VMD endpoint BARs.  Is that remotely correct?  Some
/proc/iomem output and/or lspci listing from the host to see how this
works would be useful.

> VMD device 28C0 natively assists passthrough by providing the Host
> Physical Address in shadow registers accessible to the guest for bridge
> window assignment. The shadow registers are valid if bit 1 is set in VMD
> VMLOCK config register 0x70. Future VMDs will also support this feature.
> Existing VMDs have config register 0x70 reserved, and will return 0 on
> reads.

So these shadow registers are simply exposing the host BAR2 & BAR4
addresses into the guest, so the quirk is dependent on reading those
values from the device before anyone has written to them and the BAR
emulation in the kernel kicks in (not a problem, just an observation).

Does the VMD controller code then use these base addresses to program
the bridges/endpoint within the domain?  What does the same /proc/iomem
or lspci look like inside the guest then?  It seems like we'd see the
VMD endpoint with GPA BARs, but the devices within the domain using
HPAs.  If that's remotely true, and we're not forcing an identity
mapping of this HPA range into the GPA, does the vmd controller driver
impose a TRA function on these MMIO addresses in the guest?

Sorry if I'm way off, I'm piecing things together from scant
information here.  Please Cc me on future vfio related patches.  Thanks,

Alex

 
> In order to support existing VMDs, this quirk emulates the VMLOCK and
> HPA shadow registers for all VMD device ids which don't natively assist
> with passthrough. The Linux VMD driver is updated to allow existing VMD
> devices to query VMLOCK for passthrough support.
> 
> Signed-off-by: Jon Derrick 
> ---
>  hw/vfio/pci-quirks.c | 103 +++
>  hw/vfio/pci.c|   7 +++
>  hw/vfio/pci.h|   2 +
>  hw/vfio/trace-events |   3 ++
>  4 files changed, 115 insertions(+)
> 
> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> index 2d348f8237..4060a6a95d 100644
> --- a/hw/vfio/pci-quirks.c
> +++ b/hw/vfio/pci-quirks.c
> @@ -1709,3 +1709,106 @@ free_exit:
>  
>  return ret;
>  }
> +
> +/*
> + * The VMD endpoint provides a real PCIe domain to the guest and the guest
> + * kernel performs enumeration of the VMD sub-device domain. Guest transactions
> + * to VMD sub-devices go through IOMMU translation from guest addresses to
> + * physical addresses. When MMIO goes to an endpoint after being translated to
> + * physical addresses, the bridge rejects the transaction because the window
> + * has been programmed with guest addresses.
> + *
> + * VMD can use the Host Physical Address in order to correctly program the
> + * bridge windows in its PCIe domain. VMD device 28C0 has HPA shadow registers
> + * located at offset 0x2000 in MEMBAR2 (BAR 4). The shadow registers are valid
> + * if bit 1 is set in the VMD VMLOCK config register 0x70. VMD devices without
> + * this native assistance can have these registers safely emulated as these
> + * registers are reserved.
> + */
> +typedef struct VFIOVMDQuirk {
> +VFIOPCIDevice *vdev;
> +uint64_t membar_phys[2];
> +} VFIOVMDQuirk;
> +
> +static uint64_t vfio_vmd_quirk_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +VFIOVMDQuirk *data = opaque;
> +uint64_t val = 0;
> +
> +memcpy(&val, (void *)data->membar_phys + addr, size);
> +return val;
> +}
> +
> +static const MemoryRegionOps vfio_vmd_quirk = {
> +.read = vfio_vmd_quirk_read,
> +.endianness = DEVICE_LITTLE_ENDIAN,
> +};
> +
> +#define VMD_VMLOCK  0x70
> +#define VMD_SHADOW  0x2000
> +#define VMD_MEMBAR2 4
> +
> +static int 

Re: [PATCH 1/2] xen-9pfs: Fix log messages of reply errors

2020-05-11 Thread Stefano Stabellini
On Sun, 10 May 2020, Christian Schoenebeck wrote:
> If delivery of some 9pfs response fails for some reason, log the
> error message by mentioning the 9P protocol reply type, not by
> client's request type. The latter could be misleading that the
> error occurred already when handling the request input.
> 
> Signed-off-by: Christian Schoenebeck 

Acked-by: Stefano Stabellini 

> ---
>  hw/9pfs/xen-9p-backend.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
> index 18fe5b7c92..f04caabfe5 100644
> --- a/hw/9pfs/xen-9p-backend.c
> +++ b/hw/9pfs/xen-9p-backend.c
> @@ -137,7 +137,8 @@ static ssize_t xen_9pfs_pdu_vmarshal(V9fsPDU *pdu,
>  ret = v9fs_iov_vmarshal(in_sg, num, offset, 0, fmt, ap);
>  if (ret < 0) {
>  xen_pv_printf(&xen_9pfs->xendev, 0,
> -  "Failed to encode VirtFS request type %d\n", pdu->id + 1);
> +  "Failed to encode VirtFS reply type %d\n",
> +  pdu->id + 1);
>  xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
>  xen_9pfs_disconnect(&xen_9pfs->xendev);
>  }
> @@ -201,9 +202,9 @@ static void xen_9pfs_init_in_iov_from_pdu(V9fsPDU *pdu,
>  
>  buf_size = iov_size(ring->sg, num);
>  if (buf_size  < P9_IOHDRSZ) {
> -xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs request type %d"
> -"needs %zu bytes, buffer has %zu, less than minimum\n",
> -pdu->id, *size, buf_size);
> +xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs reply type %d needs "
> +  "%zu bytes, buffer has %zu, less than minimum\n",
> +  pdu->id + 1, *size, buf_size);
>  xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
>  xen_9pfs_disconnect(&xen_9pfs->xendev);
>  }
> -- 
> 2.20.1
> 



Re: [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size

2020-05-11 Thread Stefano Stabellini
On Sun, 10 May 2020, Christian Schoenebeck wrote:
> Commit SHA-1 16724a173049ac29c7b5ade741da93a0f46edff7 introduced
> truncating the response to the currently available transport buffer
> size, which was supposed to fix an 9pfs error on Xen boot where
> transport buffer might still be smaller than required for response.
> 
> Unfortunately this change broke small reads (with less than 12
> bytes).
> 
> To address both concerns, check the actual response type and only
> truncate reply for Rreaddir responses, 

I realize you mean "Rread" (not Rreaddir). Are we sure that truncation
can only happen with Rread? I checked the spec, and it looks like directories
are pretty much like files from the spec's point of view. So it seems to
me that truncation might be possible there too.


> and only if truncated reply would at least return one payload byte to
> client. Use Rreaddir's precise header size (11) for this instead of
> P9_IOHDRSZ.

Ah! That's the underlying error isn't it? That P9_IOHDRSZ is not really
the size of the reply header, it is bigger. Hence the check:

  if (buf_size < P9_IOHDRSZ) {

can be wrong for very small sizes.



> Fixes: 16724a173049ac29c7b5ade741da93a0f46edff7
> Fixes: https://bugs.launchpad.net/bugs/1877688
> Signed-off-by: Christian Schoenebeck 
> ---
>  hw/9pfs/virtio-9p-device.c | 35 +++
>  hw/9pfs/xen-9p-backend.c   | 38 +-
>  2 files changed, 56 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
> index 536447a355..57e4d92ecb 100644
> --- a/hw/9pfs/virtio-9p-device.c
> +++ b/hw/9pfs/virtio-9p-device.c
> @@ -154,15 +154,34 @@ static void virtio_init_in_iov_from_pdu(V9fsPDU *pdu, 
> struct iovec **piov,
>  VirtQueueElement *elem = v->elems[pdu->idx];
>  size_t buf_size = iov_size(elem->in_sg, elem->in_num);
>  
> -if (buf_size < P9_IOHDRSZ) {
> -VirtIODevice *vdev = VIRTIO_DEVICE(v);
> +if (pdu->id + 1 == P9_RREAD) {
> +/* size[4] Rread tag[2] count[4] data[count] */

4+2+4 = 10


> +const size_t hdr_size = 11;

Are you adding 1 to account for "count"?


> +/*
> + * If current transport buffer size is smaller than actually required
> + * for this Rreaddir response, then truncate the response to the
> + * currently available transport buffer size, however only if it 
> would
> + * at least allow to return 1 payload byte to client.
> + */
> +if (buf_size < hdr_size + 1) {

If you have already added 1 before, why do we need to add 1 again here?
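
For reference, one possible reading of the numbers, assuming the usual 9P2000 framing in which every message carries a one-byte type between size[4] and tag[2] (the "Rread" in the comment stands for that byte). Under that assumption the header is 4 + 1 + 2 + 4 = 11 bytes, and the extra "+ 1" in the check is the first payload byte. The names below are hypothetical and only illustrate the arithmetic:

```c
#include <stddef.h>

/* Field widths in bytes, assuming standard 9P2000 message framing. */
enum {
    RREAD_SIZE_LEN  = 4, /* size[4] */
    RREAD_TYPE_LEN  = 1, /* message type byte, written as "Rread" */
    RREAD_TAG_LEN   = 2, /* tag[2] */
    RREAD_COUNT_LEN = 4, /* count[4] */
    RREAD_HDR_LEN   = RREAD_SIZE_LEN + RREAD_TYPE_LEN +
                      RREAD_TAG_LEN + RREAD_COUNT_LEN
};

/*
 * Payload bytes that fit once the reply is truncated to buf_size.
 * Returns 0 when the buffer cannot hold the header plus one payload byte.
 */
static size_t rread_payload_fit(size_t buf_size, size_t wanted)
{
    size_t room;

    if (buf_size < (size_t)RREAD_HDR_LEN + 1) {
        return 0;
    }
    room = buf_size - RREAD_HDR_LEN;
    return wanted < room ? wanted : room;
}
```

With these widths an 11-byte buffer holds a header and no data, while a 12-byte buffer is the smallest one that can return a payload byte.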


> +VirtIODevice *vdev = VIRTIO_DEVICE(v);
>  
> -virtio_error(vdev,
> - "VirtFS reply type %d needs %zu bytes, buffer has %zu, 
> less than minimum",
> - pdu->id + 1, *size, buf_size);
> -}
> -if (buf_size < *size) {
> -*size = buf_size;
> +virtio_error(vdev,
> + "VirtFS reply type %d needs %zu bytes, buffer has "
> + "%zu, less than minimum (%zu)",
> + pdu->id + 1, *size, buf_size, hdr_size + 1);
> +}

I think we want to return here


> +if (buf_size < *size) {
> +*size = buf_size;
> +}
> +} else {
> +if (buf_size < *size) {
> +VirtIODevice *vdev = VIRTIO_DEVICE(v);
> +
> +virtio_error(vdev,
> + "VirtFS reply type %d needs %zu bytes, buffer has 
> %zu",
> + pdu->id + 1, *size, buf_size);
> +}
>  }
>  
>  *piov = elem->in_sg;
> diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
> index f04caabfe5..98f340d24b 100644
> --- a/hw/9pfs/xen-9p-backend.c
> +++ b/hw/9pfs/xen-9p-backend.c
> @@ -201,15 +201,35 @@ static void xen_9pfs_init_in_iov_from_pdu(V9fsPDU *pdu,
>  xen_9pfs_in_sg(ring, ring->sg, &num, pdu->idx, *size);
>  
>  buf_size = iov_size(ring->sg, num);
> -if (buf_size  < P9_IOHDRSZ) {
> -xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs reply type %d needs "
> -  "%zu bytes, buffer has %zu, less than minimum\n",
> -  pdu->id + 1, *size, buf_size);
> -xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
> -xen_9pfs_disconnect(&xen_9pfs->xendev);
> -}
> -if (buf_size  < *size) {
> -*size = buf_size;
> +if (pdu->id + 1 == P9_RREAD) {
> +/* size[4] Rread tag[2] count[4] data[count] */
> +const size_t hdr_size = 11;
> +/*
> + * If current transport buffer size is smaller than actually required
> + * for this Rreaddir response, then truncate the response to the
> + * currently available transport buffer size, however only if it 
> would
> + * at least allow to return 1 payload byte to client.
> + */
> +if (buf_size < hdr_size + 1) {
> +xen_pv_printf(_9pfs->xendev, 0, "Xen 

Re: [Bug 1878054] [NEW] Hang with high CPU usage in sdhci_data_transfer

2020-05-11 Thread Philippe Mathieu-Daudé

+Peter who was the previous maintainer.

On 5/11/20 7:23 PM, Alexander Bulekov wrote:

Public bug reported:

Hello,
While fuzzing, I found an input that causes QEMU to hang with 100% CPU usage.
I have waited several minutes, and QEMU is still unresponsive. Using gdb, it
appears that it is stuck in sdhci_data_transfer:


A quick analysis of the attached file shows that the SDHCI starts a
multi-block DMA transfer (for 0xffea blocks) while the SD card is not
initialized.


The card keeps returning zero data (because it is not in the SENDING state).

The problem seems related to this comment in
sdhci_sdma_transfer_multi_blocks():


/* XXX: Some sd/mmc drivers (for example, u-boot-slp) do not account for
 * possible stop at page boundary if initial address is not page aligned,
 * allow them to work properly */
if ((s->sdmasysad % boundary_chk) == 0) {
page_aligned = true;
}

Setting page_aligned to false avoids the infinite loop.

You found a case where s->blkcnt is never decremented (thus the infinite
loop & unresponsiveness). See:


    if (((boundary_count + begin) < block_size) && page_aligned) {
        s->data_count = boundary_count + begin;
        boundary_count = 0;
    } else {
        s->data_count = block_size;
        boundary_count -= block_size - begin;
        if (s->trnmod & SDHC_TRNS_BLK_CNT_EN) {
            s->blkcnt--;
        }
    }
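
As a rough illustration of the branch quoted above (a minimal sketch, not the
actual sdhci.c code: the function name, its scalar parameters and the
TRNS_BLK_CNT_EN value are stand-ins assumed for this example), the block count
only moves in the else arm, and only when the block-count-enable bit is set:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for SDHC_TRNS_BLK_CNT_EN; the value is assumed for this sketch. */
#define TRNS_BLK_CNT_EN 0x0002

/*
 * Run the quoted if/else once on local copies of the state and report
 * whether blkcnt was decremented. All names here are hypothetical.
 */
bool blkcnt_decremented(uint16_t trnmod, uint32_t begin,
                        uint32_t boundary_count, uint32_t block_size,
                        bool page_aligned)
{
    uint16_t blkcnt = 0xffea;   /* block count mentioned in the report */
    uint32_t data_count;

    assert(begin <= block_size);

    if (((boundary_count + begin) < block_size) && page_aligned) {
        /* Short chunk up to the boundary: blkcnt is left untouched. */
        data_count = boundary_count + begin;
        boundary_count = 0;
    } else {
        /* Full block: blkcnt moves only if counting is enabled. */
        data_count = block_size;
        boundary_count -= block_size - begin;
        if (trnmod & TRNS_BLK_CNT_EN) {
            blkcnt--;
        }
    }
    (void)data_count;
    (void)boundary_count;
    return blkcnt != 0xffea;
}
```

With page_aligned true and fewer bytes left before the boundary than one
block, the first arm is taken and blkcnt stays put, which seems consistent
with the observation that forcing page_aligned to false avoids the hang.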



#0   memory_region_access_valid (mr=, addr=0x10284920, 
size=, is_write=0xff, attrs=...) at 
/home/alxndr/Development/qemu/memory.c:1378
#1   memory_region_dispatch_write (mr=, addr=, 
data=, op=MO_32, attrs=...) at /home/alxndr/Development/qemu/memory.c:1463
#2   flatview_write_continue (fv=, addr=0x10284920, attrs=..., ptr=, len=0xb7, addr1=0x582798e0, l=, mr=0x582798e0 
) at /home/alxndr/Development/qemu/exec.c:3137
#3   flatview_write (fv=0x60645da0, addr=, attrs=..., buf=, len=) at /home/alxndr/Development/qemu/exec.c:3177
#4   address_space_write (as=, addr=, attrs=..., 
buf=0xb04f325, len=0x4) at /home/alxndr/Development/qemu/exec.c:3268
#5   address_space_rw (as=0x572509ac , 
addr=0x582798e0, attrs=..., attrs@entry=..., buf=0xb04f325, len=0x4, 
is_write=0xb8, is_write@entry=0x1) at
/home/alxndr/Development/qemu/exec.c:3278
#6   dma_memory_rw_relaxed (as=0x572509ac , 
addr=0x582798e0, buf=0xb04f325, len=0x4, dir=DMA_DIRECTION_FROM_DEVICE) at 
/home/alxndr/Development/qemu/include/sysemu/dma.h:87
#7   dma_memory_rw (as=0x572509ac , 
addr=0x582798e0, buf=0xb04f325, len=0x4, dir=DMA_DIRECTION_FROM_DEVICE) at 
/home/alxndr/Development/qemu/include/sysemu/dma.h:110
#8   dma_memory_write (as=0x572509ac , 
addr=0x582798e0, buf=0xb04f325, len=0x4) at 
/home/alxndr/Development/qemu/include/sysemu/dma.h:122
#9   sdhci_sdma_transfer_multi_blocks (s=) at 
/home/alxndr/Development/qemu/hw/sd/sdhci.c:618
#10  sdhci_data_transfer (opaque=0x61e21080) at 
/home/alxndr/Development/qemu/hw/sd/sdhci.c:891
#11  sdhci_send_command (s=0x61e21080) at 
/home/alxndr/Development/qemu/hw/sd/sdhci.c:364
#12  sdhci_write (opaque=, offset=0xc, val=, 
size=) at /home/alxndr/Development/qemu/hw/sd/sdhci.c:1158
#13  memory_region_write_accessor (mr=, addr=, value=, size=, shift=, mask=, attrs=...) at
/home/alxndr/Development/qemu/memory.c:483
#14  access_with_adjusted_size (addr=, value=, size=, access_size_min=, access_size_max=, access_fn=, mr=0x61e219f0, attrs=...) at /home/alxndr/Development/qemu/memory.c:544
#15  memory_region_dispatch_write (mr=, addr=, 
data=0x1ffe0ff, op=, attrs=...) at 
/home/alxndr/Development/qemu/memory.c:1476
#16  flatview_write_continue (fv=, addr=0xe106800c, attrs=..., 
ptr=, len=0xff3, addr1=0x582798e0, l=, 
mr=0x61e219f0) at /home/alxndr/Development/qemu/exec.c:3137
#17  flatview_write (fv=0x60645da0, addr=, attrs=..., buf=, len=) at /home/alxndr/Development/qemu/exec.c:3177
#18  address_space_write (as=, addr=, attrs=..., 
attrs@entry=..., buf=0xb04f325, buf@entry=0x6218ad00, len=0x4) at 
/home/alxndr/Development/qemu/exec.c:3268
#19  qtest_process_command (chr=, chr@entry=0x5827c040 
, words=) at /home/alxndr/Development/qemu/qtest.c:567
#20  qtest_process_inbuf (chr=0x5827c040 , inbuf=0x6190f640) 
at /home/alxndr/Development/qemu/qtest.c:710


I am attaching the qtest commands for reproducing it.
I can reproduce it in a qemu 5.0 build using:

qemu-system-i386 -M pc-q35-5.0 -qtest stdio -device sdhci-pci,sd-spec-
version=3 -device sd-card,drive=mydrive -drive if=sd,index=0,file=null-
co://,format=raw,id=mydrive -nographic -nographic -serial none -monitor
none < attachment

Please let me know if I can provide any further info.
-Alex

** Affects: qemu
  Importance: Undecided
  Status: New





Re: [PATCH 5/7] audio: Let HWVoice write() handlers take a const buffer

2020-05-11 Thread Philippe Mathieu-Daudé

On 5/6/20 8:22 AM, Volker Rümelin wrote:



diff --git a/audio/dsoundaudio.c b/audio/dsoundaudio.c
index 4cdf19ab67..bba6bafda4 100644
--- a/audio/dsoundaudio.c
+++ b/audio/dsoundaudio.c
@@ -454,7 +454,7 @@ static void *dsound_get_buffer_out(HWVoiceOut *hw, size_t *size)
  return ret;
  }
  
-static size_t dsound_put_buffer_out(HWVoiceOut *hw, void *buf, size_t len)
+static size_t dsound_put_buffer_out(HWVoiceOut *hw, const void *buf, size_t len)
  {
  DSoundVoiceOut *ds = (DSoundVoiceOut *) hw;
  LPDIRECTSOUNDBUFFER dsb = ds->dsound_buffer;


You forgot to make the buffer const in dsound_put_buffer_in().

I had to cast buf to LPVOID in dsound_get_buffer_in() and 
dsound_put_buffer_in() because otherwise I see:

C:/usr/msys64/home/ruemelin/git/qemu/audio/dsoundaudio.c: In function 
'dsound_put_buffer_out':
C:/usr/msys64/home/ruemelin/git/qemu/audio/dsoundaudio.c:466:38: error: passing 
argument 2 of 'dsound_unlock_out' discards 'const' qualifier from pointer 
target type [-Werror=discarded-qualifiers]
   466 | int err = dsound_unlock_out(dsb, buf, NULL, len, 0);
   |  ^~~
In file included from 
C:/usr/msys64/home/ruemelin/git/qemu/audio/dsoundaudio.c:266:
C:/usr/msys64/home/ruemelin/git/qemu/audio/dsound_template.h:48:12: note: 
expected 'LPVOID' {aka 'void *'} but argument is of type 'const void *'
    48 | LPVOID p1,
   | ~~~^~
C:/usr/msys64/home/ruemelin/git/qemu/audio/dsoundaudio.c: In function 
'dsound_put_buffer_in':
C:/usr/msys64/home/ruemelin/git/qemu/audio/dsoundaudio.c:571:38: error: passing 
argument 2 of 'dsound_unlock_in' discards 'const' qualifier from pointer target 
type [-Werror=discarded-qualifiers]
   571 | int err = dsound_unlock_in(dscb, buf, NULL, len, 0);
   |  ^~~
In file included from 
C:/usr/msys64/home/ruemelin/git/qemu/audio/dsoundaudio.c:268:
C:/usr/msys64/home/ruemelin/git/qemu/audio/dsound_template.h:48:12: note: 
expected 'LPVOID' {aka 'void *'} but argument is of type 'const void *'
    48 | LPVOID p1,
   | ~~~^~


OK thanks for testing. This is unfortunate, because a single backend 
invalidates the whole series.
I don't understand why the DirectSound API requires a writable buffer 
for locking.




With best regards,
Volker





Re: [PATCH v3 3/9] block: Make it easier to learn which BDS support bitmaps

2020-05-11 Thread Eric Blake

On 5/11/20 1:16 PM, Eric Blake wrote:

On 5/11/20 4:21 AM, Max Reitz wrote:



+++ b/include/block/block_int.h
@@ -560,6 +560,7 @@ struct BlockDriver {
                             uint64_t parent_perm, uint64_t parent_shared,
                             uint64_t *nperm, uint64_t *nshared);
+    bool (*bdrv_dirty_bitmap_supported)(BlockDriverState *bs);


All BDSs support bitmaps, but only some support persistent dirty
bitmaps, so I think the name should reflect that.


How about .bdrv_dirty_bitmap_supports_persistent?


Bike-shedding myself, it looks like 
.bdrv_supports_persistent_dirty_bitmap is better (if you go by the 
naming convention 'noun-verb-details', it makes more sense that a 'bdrv' 
supports 'persistent dirty bitmaps', than that a 'bdrv dirty bitmap' 
supports 'persistence', particularly when the parameter is a 
BlockDriverState rather than a BdrvDirtyBitmap).


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 3/5] virtio-iommu: Handle reserved regions in the translation process

2020-05-11 Thread Peter Xu
On Fri, May 08, 2020 at 07:30:55PM +0200, Eric Auger wrote:
> When translating an address we need to check if it belongs to
> a reserved virtual address range. If it does, there are 2 cases:
> 
> - it belongs to a RESERVED region: the guest should neither use
>   this address in a MAP nor instruct the end-point to DMA on
>   them. We report an error
> 
> - It belongs to an MSI region: we bypass the translation.
> 
> Signed-off-by: Eric Auger 

Reviewed-by: Peter Xu 

-- 
Peter Xu




[PATCH] hw: Use QEMU_IS_ALIGNED() on parallel flash block size

2020-05-11 Thread Philippe Mathieu-Daudé
Use the QEMU_IS_ALIGNED() macro to verify the flash block size
is properly aligned. It is quicker to process when reviewing.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/sbsa-ref.c   | 2 +-
 hw/arm/virt.c   | 2 +-
 hw/block/pflash_cfi01.c | 2 +-
 hw/block/pflash_cfi02.c | 2 +-
 hw/i386/pc_sysfw.c  | 2 +-
 hw/riscv/virt.c | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index 8409ba853d..b379e4a76a 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -241,7 +241,7 @@ static void sbsa_flash_map1(PFlashCFI01 *flash,
 {
 DeviceState *dev = DEVICE(flash);
 
-assert(size % SBSA_FLASH_SECTOR_SIZE == 0);
+assert(QEMU_IS_ALIGNED(size, SBSA_FLASH_SECTOR_SIZE));
 assert(size / SBSA_FLASH_SECTOR_SIZE <= UINT32_MAX);
 qdev_prop_set_uint32(dev, "num-blocks", size / SBSA_FLASH_SECTOR_SIZE);
 qdev_init_nofail(dev);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 634db0cfe9..0a99fddb3d 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -978,7 +978,7 @@ static void virt_flash_map1(PFlashCFI01 *flash,
 {
 DeviceState *dev = DEVICE(flash);
 
-assert(size % VIRT_FLASH_SECTOR_SIZE == 0);
+assert(QEMU_IS_ALIGNED(size, VIRT_FLASH_SECTOR_SIZE));
 assert(size / VIRT_FLASH_SECTOR_SIZE <= UINT32_MAX);
 qdev_prop_set_uint32(dev, "num-blocks", size / VIRT_FLASH_SECTOR_SIZE);
 qdev_init_nofail(dev);
diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index f586bac269..11922c0f96 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -964,7 +964,7 @@ PFlashCFI01 *pflash_cfi01_register(hwaddr base,
 if (blk) {
 qdev_prop_set_drive(dev, "drive", blk, &error_abort);
 }
-assert(size % sector_len == 0);
+assert(QEMU_IS_ALIGNED(size, sector_len));
 qdev_prop_set_uint32(dev, "num-blocks", size / sector_len);
 qdev_prop_set_uint64(dev, "sector-length", sector_len);
 qdev_prop_set_uint8(dev, "width", bank_width);
diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
index c6b6f2d082..895f7daee3 100644
--- a/hw/block/pflash_cfi02.c
+++ b/hw/block/pflash_cfi02.c
@@ -1003,7 +1003,7 @@ PFlashCFI02 *pflash_cfi02_register(hwaddr base,
 if (blk) {
 qdev_prop_set_drive(dev, "drive", blk, &error_abort);
 }
-assert(size % sector_len == 0);
+assert(QEMU_IS_ALIGNED(size, sector_len));
 qdev_prop_set_uint32(dev, "num-blocks", size / sector_len);
 qdev_prop_set_uint32(dev, "sector-length", sector_len);
 qdev_prop_set_uint8(dev, "width", width);
diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index f5f3f466b0..fad41f0e73 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -168,7 +168,7 @@ static void pc_system_flash_map(PCMachineState *pcms,
  blk_name(blk), strerror(-size));
 exit(1);
 }
-if (size == 0 || size % FLASH_SECTOR_SIZE != 0) {
+if (size == 0 || !QEMU_IS_ALIGNED(size, FLASH_SECTOR_SIZE)) {
 error_report("system firmware block device %s has invalid size "
  "%" PRId64,
  blk_name(blk), size);
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index daae3ebdbb..71481d59c2 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -112,7 +112,7 @@ static void virt_flash_map1(PFlashCFI01 *flash,
 {
 DeviceState *dev = DEVICE(flash);
 
-assert(size % VIRT_FLASH_SECTOR_SIZE == 0);
+assert(QEMU_IS_ALIGNED(size, VIRT_FLASH_SECTOR_SIZE));
 assert(size / VIRT_FLASH_SECTOR_SIZE <= UINT32_MAX);
 qdev_prop_set_uint32(dev, "num-blocks", size / VIRT_FLASH_SECTOR_SIZE);
 qdev_init_nofail(dev);
-- 
2.21.3
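
For readers unfamiliar with the macro, the conversions in this patch can be
sketched as follows. This is a minimal sketch under stated assumptions:
IS_ALIGNED_SKETCH is a local stand-in assumed to behave like
QEMU_IS_ALIGNED() (true when the first argument is a multiple of the second),
and flash_size_is_valid() is a hypothetical helper modelled on the
pc_sysfw.c hunk above, not a function from the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in, assumed equivalent to QEMU_IS_ALIGNED(n, m). */
#define IS_ALIGNED_SKETCH(n, m) (((n) % (m)) == 0)

/*
 * Hypothetical helper mirroring the pc_sysfw.c check: a firmware image
 * is accepted only if it is non-empty and a whole number of sectors.
 */
bool flash_size_is_valid(int64_t size, int64_t sector_size)
{
    assert(sector_size > 0);
    return size != 0 && IS_ALIGNED_SKETCH(size, sector_size);
}
```

The old "size % sector == 0" form and the macro form accept exactly the same
sizes under that assumption; the macro only states the intent more directly.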




Re: [PATCH v8] audio/jack: add JACK client audiodev

2020-05-11 Thread Eric Blake

On 4/29/20 12:53 AM, Geoffrey McRae wrote:

This commit adds a new audiodev backend to allow QEMU to use JACK as
both an audio sink and source.

Signed-off-by: Geoffrey McRae 
---



+++ b/qapi/audio.json
@@ -152,6 +152,55 @@
  '*out': 'AudiodevPerDirectionOptions',
  '*latency': 'uint32' } }
  
+##

+# @AudiodevJackPerDirectionOptions:
+#
+# Options of the JACK backend that are used for both playback and
+# recording.
+#
+# @server-name: select from among several possible concurrent server instances
+# (default: environment variable $JACK_DEFAULT_SERVER if set, else "default")
+#
+# @client-name: the client name to use. The server will modify this name to
+# create a unique variant, if needed unless @exact_name is true (default: the


Minor typos: @exact_name does not match...


+# guest's name)
+#
+# @connect-ports: if set, a regular expression of JACK client port name(s) to
+# monitor for and automatically connect to
+#
+# @start-server: start a jack server process if one is not lready present


already


+# (default: false)
+#
+# @exact-name: use the exact name requested otherwise JACK automatically
+# generates a unique one, if needed (default: false)


...the actual parameter @exact-name.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: Qemu, VNC and non-US keymaps

2020-05-11 Thread B3r3n

Hello Daniel,


There is no mention here of what VNC client program is being used, which
is quite important, as key handling is a big mess in VNC.
I tested with TightVNC & noVNC through Apache. Both behave the same.
I did not test Ultr@VNC.




The default VNC protocol passes X11 keysyms over the wire.

The remote desktop gets hardware scancodes and turns them into keysyms,
which the VNC client sees. The VNC client passes them to the VNC server
in QEMU, which then has to turn them back into hardware scancodes. This
reverse mapping relies on knowledge of the keyboard mapping, and is what
the "-k fr" argument tells QEMU.

For this to work at all, the keymap used by the remote desktop must
match the keymap used by QEMU, which must match the keymap used by
the guest OS.  Even this is not sufficient though, because the act
of translating hardware scancodes into keysyms is *lossy*. There is
no way to reliably go back to hardware scancodes, which is precisely
what QEMU tries to do - some reverse mappings will be ambiguous.
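
The ambiguity described above can be pictured with a toy table. This is only
a sketch: the scancode and keysym values below are made up for illustration
and do not come from any real keymap, and scancodes_for_keysym() is a
hypothetical helper, not QEMU code.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative entries only; not a real keymap. */
typedef struct {
    unsigned scancode;   /* hypothetical hardware scancode */
    unsigned keysym;     /* hypothetical keysym it produces */
} KeymapEntry;

const KeymapEntry sketch_keymap[] = {
    { 0x10, 0x61 },   /* one key producing keysym 0x61 */
    { 0x1e, 0x71 },   /* one key producing keysym 0x71 */
    { 0x02, 0x26 },   /* a key producing keysym 0x26 */
    { 0x08, 0x26 },   /* a second key producing the same keysym */
};

/* Count how many scancodes could have produced the given keysym. */
size_t scancodes_for_keysym(unsigned keysym)
{
    size_t n = 0;
    size_t count = sizeof(sketch_keymap) / sizeof(sketch_keymap[0]);

    for (size_t i = 0; i < count; i++) {
        if (sketch_keymap[i].keysym == keysym) {
            n++;
        }
    }
    return n;
}
```

Whenever the count is greater than one, a server that receives only the
keysym has to guess which key was pressed; when it is zero, the keysym cannot
be mapped back at all.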

Yes, I saw that topic passing by. Looks messy with all these interferences...


Due to this mess, years ago (over a decade) QEMU introduced a VNC
protocol extension that allows for passing hardware scancodes over
the wire.

I think I also came across something about this on the Internet.
Are you talking about the RFB protocol?


With this extension, the VNC client gets the hardware scancode
from the remote desktop, and passes it straight to the VNC server,
which passes it straight to the guest OS, which then applies the
localized keyboard mapping.   This is good because the localized
keyboard mapping conversion is now only done once, in the guest
OS.

To make use of this protocol extension to VNC, you must *NOT*
pass any "-k" arg to QEMU, and must use a VNC client that has
support for this protocol extension.  The GTK-VNC widget supports
this and is used by virt-viewer, remote-viewer, virt-manager,
GNOME Boxes, Vinagre client applications.  The TigerVNC client
also supports this extension.
So if I read you correctly, if the client "enforces" this protocol
(supposedly RFB), QEMU will automatically use it as well?
Removing the -k option is great for me if it works, since each user will have
their own mapping and these are international :-)



To summarize, my recommendation is to remove the "-k" arg entirely,
and pick a VNC client that supports the scancode extension.
For now I am using TightVNC & noVNC. noVNC is precious since it 
widens the user world, removing any client software constraint.



It is possible there might be a genuine bug in QEMU's 'fr' keymap
that can be fixed to deal with AltGr problems. Personally though I
don't spend time investigating these problems, as the broad reverse
keymapping problem is unfixable. The only sensible option is to take
the route of using the VNC hardware scancode extension. It is notable
that SPICE learnt from VNC's mistake and used hardware scancodes from
the very start.


This was another path I intended to follow: using SPICE and a
"noSPICE" client if VNC was too painful.

If I understand you correctly, using SPICE could also solve the issue?

Many thanks for your inputs...

Brgrds




Regards,
Daniel
--
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|





Re: [PATCH v4 5/6] i386: Hyper-V VMBus ACPI DSDT entry

2020-05-11 Thread Roman Kagan
On Thu, May 07, 2020 at 06:14:25AM +0300, Jon Doron wrote:
> Igor it seems like the IRQ being used is 5 and not 7 & 13 like in the
> current patch.

HyperV using irq 5 doesn't mean QEMU has to as well.  Especially so as no
guest was noticed to use the irqs in ACPI.  I'd rather try and test if
the guest requires any of those at all.

> Seems like it needs to reside in the _CRS like you said.

They already are there.

> Seems like it has all those _STA/_DIS/_PS0 just like the way it's currently
> in the patch (unless I'm missing something).

Right, but, as you can see, they are pretty dumb, so the question is
whether they are necessary or the guests can do without (Linux
apparently can).

Thanks,
Roman.

> Notice _PS3 is not a Method.
> 
> So just to summarize the changes i need to do:
> 1. Change from 2 IRQs to single one (and use 5 as the default)
> 2. IRQs needs to be under _CRS.
> 3. You mentioned you want it in a different location than the ISA bus; where
> would you want it to be?
> 
> Please let me know if there is anything else.
> 
> Thanks,
> -- Jon.
> 
> On 06/05/2020, Maciej S. Szmigiero wrote:
> > On 05.05.2020 17:38, Jon Doron wrote:
> > > On 05/05/2020, Igor Mammedov wrote:
> > > 
> > > I don't know what were the original intentions of the original patch
> > > authors (at this point I simply rebased it, and to be honest I did not 
> > > need this patch to get where I was going to, but it was part of the 
> > > original patchset).
> > > 
> > > But I'm willing to do any changes so we can keep going forward with this.
> > > 
> > > > On Fri, 24 Apr 2020 15:34:43 +0300
> > > > Jon Doron  wrote:
> > > > 
> > > > > Guest OS uses ACPI to discover VMBus presence.  Add a corresponding
> > > > > entry to DSDT in case VMBus has been enabled.
> > > > > 
> > > > > Experimentally Windows guests were found to require this entry to
> > > > > include two IRQ resources. They seem to never be used but they still
> > > > > have to be there.
> > > > > 
> > > > > Make IRQ numbers user-configurable via corresponding properties; use 7
> > > > > and 13 by default.
> > > > well, it seems that at least the Linux guest driver uses one IRQ,
> > > > albeit not from an ACPI descriptor
> > > > 
> > > > perhaps it's what hyperv host puts into _CRS.
> > > > Could you dump ACPI tables and check how hyperv describes vmbus in acpi?
> > > > 
> > > > 
> > > 
> > > I can no longer get to the HyperV computer I had (in the office so 
> > > hopefully if someone else has access to HyperV machine and willing to 
> > > reply here with the dumped ACPI tables that would be great).
> > > 
> > 
> > Here is a VMBus ACPI device description from Hyper-V in Windows Server 2019:
> > 
> > Device (\_SB.VMOD.VMBS)
> > {
> >Name (STA, 0x0F)
> >Name (_ADR, Zero)  // _ADR: Address
> >Name (_DDN, "VMBUS")  // _DDN: DOS Device Name
> >Name (_HID, "VMBus")  // _HID: Hardware ID
> >Name (_UID, Zero)  // _UID: Unique ID
> >Method (_DIS, 0, NotSerialized)  // _DIS: Disable Device
> >{
> > STA &= 0x0D
> >}
> > 
> >Method (_PS0, 0, NotSerialized)  // _PS0: Power State 0
> >{
> > STA |= 0x0F
> >}
> > 
> >Method (_STA, 0, NotSerialized)  // _STA: Status
> >{
> > Return (STA) /* \_SB_.VMOD.VMBS.STA_ */
> >}
> > 
> >Name (_PS3, Zero)  // _PS3: Power State 3
> >Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
> >{
> > IRQ (Edge, ActiveHigh, Exclusive, )
> > {5}
> >})
> > }
> > 
> > It seems to use just IRQ 5.
> > 
> > Maciej
> 



Re: [PATCH] hostmem: don't use mbind() if host-nodes is epmty

2020-05-11 Thread Philippe Mathieu-Daudé

On 5/11/20 9:24 PM, Igor Mammedov wrote:

On Mon, 11 May 2020 18:00:01 +0200
Philippe Mathieu-Daudé  wrote:


Hi Eduardo,

On 5/4/20 5:44 PM, Eduardo Habkost wrote:

On Thu, Apr 30, 2020 at 11:46:06AM -0400, Igor Mammedov wrote:

Since 5.0 QEMU uses hostmem backend for allocating main guest RAM.
The backend however calls mbind() which is typically NOP
in case of default policy/absent host-nodes bitmap.
However, when running in a container with black-listed mbind()
syscall, QEMU fails to start with error
   "cannot bind memory to host NUMA nodes: Operation not permitted"
even when the user hasn't provided host-nodes to pin to explicitly
(which is the case with -m option)

To fix issue, call mbind() only in case when user has provided
host-nodes explicitly (i.e. host_nodes bitmap is not empty).
That should allow to run QEMU in containers with black-listed
mbind() without memory pinning. If QEMU provided memory-pinning
is required user still has to white-list mbind() in container
configuration.

Reported-by: Manuel Hohmann 
Signed-off-by: Igor Mammedov 
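
The idea of the fix can be sketched roughly as below. This is not the patch
itself: the host-nodes bitmap is reduced to a single 64-bit mask, and
fake_mbind_blocked() is a hypothetical stand-in for mbind() inside a
container that forbids the syscall, so the example builds without NUMA
headers.

```c
#include <assert.h>
#include <stdint.h>

/* Counts how often the stand-in syscall wrapper was reached. */
int fake_mbind_calls;

/* Stand-in for mbind() in a container that blocks it: always fails. */
int fake_mbind_blocked(uint64_t host_nodes_mask)
{
    (void)host_nodes_mask;
    fake_mbind_calls++;
    return -1;
}

/* Only attempt the bind when the user named at least one host node. */
int bind_memory_if_requested(uint64_t host_nodes_mask)
{
    if (host_nodes_mask == 0) {
        return 0;               /* nothing to pin: skip the syscall */
    }
    return fake_mbind_blocked(host_nodes_mask);
}
```

With an empty mask the blocked syscall is never reached and startup can
proceed; with an explicit node the failure is still reported, matching the
note above that pinning still requires white-listing mbind().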


Queued on machine-next, thanks!


I've been debugging this issue again today and figured it was not
merged, if possible can you add the "Cc: qemu-sta...@nongnu.org" tag
before sending your pull request?

It's CCed already, so my impression was that it would be picked up once it was
reviewed.


Correct, however some distributions find it easier to grep for the 'Cc:
qemu-sta...@nongnu.org' merged tag before qemu-stable is released.






Thanks,

Phil.








[RESEND PATCH v3 1/1] ppc/spapr: Add hotremovable flag on DIMM LMBs on drmem_v2

2020-05-11 Thread Leonardo Bras
From: Leonardo Bras 

On reboot, all memory that was previously added using object_add and
device_add is placed in this DIMM area.

The new SPAPR_LMB_FLAGS_HOTREMOVABLE flag helps Linux to put this memory in
the correct memory zone, so no unmovable allocations are made there,
allowing the object to be easily hot-removed by device_del and
object_del.

This new flag was accepted in Power Architecture documentation.

Signed-off-by: Leonardo Bras 
Reviewed-by: Bharata B Rao 

---
Changes since v1:
- Flag name changed from SPAPR_LMB_FLAGS_HOTPLUGGED to
SPAPR_LMB_FLAGS_HOTREMOVABLE
---
 hw/ppc/spapr.c | 3 ++-
 include/hw/ppc/spapr.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 9a2bd501aa..fe662e297e 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -446,7 +446,8 @@ static int spapr_dt_dynamic_memory_v2(SpaprMachineState 
*spapr, void *fdt,
 g_assert(drc);
 elem = spapr_get_drconf_cell(size / lmb_size, addr,
  spapr_drc_index(drc), node,
- SPAPR_LMB_FLAGS_ASSIGNED);
+ (SPAPR_LMB_FLAGS_ASSIGNED |
+  SPAPR_LMB_FLAGS_HOTREMOVABLE);
 QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
 nr_entries++;
 cur_addr = addr + size;
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 42d64a0368..93e0d43051 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -880,6 +880,7 @@ int spapr_rtc_import_offset(SpaprRtcState *rtc, int64_t 
legacy_offset);
 #define SPAPR_LMB_FLAGS_ASSIGNED 0x0008
 #define SPAPR_LMB_FLAGS_DRC_INVALID 0x0020
 #define SPAPR_LMB_FLAGS_RESERVED 0x0080
+#define SPAPR_LMB_FLAGS_HOTREMOVABLE 0x0100
 
 void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
 
-- 
2.25.1
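
A small sketch of how the combined flags word behaves, using the flag values
exactly as they appear in the hunk above. The two helper functions are
hypothetical and exist only for this illustration; how the guest actually
interprets the bit is outside the scope of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied as shown in the patch above. */
#define SPAPR_LMB_FLAGS_ASSIGNED     0x0008
#define SPAPR_LMB_FLAGS_HOTREMOVABLE 0x0100

/* Flags word the patch passes for LMBs in the DIMM area. */
uint32_t dimm_lmb_flags(void)
{
    return SPAPR_LMB_FLAGS_ASSIGNED | SPAPR_LMB_FLAGS_HOTREMOVABLE;
}

/* Hypothetical consumer-side test for the new bit. */
bool lmb_is_hotremovable(uint32_t flags)
{
    return (flags & SPAPR_LMB_FLAGS_HOTREMOVABLE) != 0;
}
```

An LMB carrying only SPAPR_LMB_FLAGS_ASSIGNED would not pass the check, so
the new bit is what distinguishes DIMM-backed memory from the rest.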




Re: [PATCH 2/5] io/channel.c,io/channel-socket.c: Add yank feature

2020-05-11 Thread Lukas Straub
On Mon, 11 May 2020 12:51:46 +0100
Daniel P. Berrangé  wrote:

> On Mon, May 11, 2020 at 01:14:41PM +0200, Lukas Straub wrote:
> > Add qio_channel_set_yank function to channel and to channel-socket,
> > which will register a yank function. The yank function calls
> > shutdown() on the socket.
> > 
> > Signed-off-by: Lukas Straub 
> > ---
> >  Makefile.objs   |  1 +
> >  include/io/channel-socket.h |  1 +
> >  include/io/channel.h| 12 
> >  io/channel-socket.c | 29 +
> >  io/channel.c|  9 +
> >  5 files changed, 52 insertions(+)  
> 
> Assuming we want the yank feature (which I'm not entirely convinced
> of), then I don't think any of this addition should exist. The
> QIOChannel class already provides a "qio_channel_shutdown" method
> which can be invoked. The layer above which is using the QIOChannel
> should be calling this existing qio_channel_shutdown method in
> response to any yank request.  The I/O layer shouldn't have any
> direct dependency on the yank feature.

Having the code here simplifies the code in the other places.

Regards,
Lukas Straub

> 
> Regards,
> Daniel
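
For context on why shutdown() is the primitive being discussed, here is a
minimal sketch using plain POSIX sockets rather than the QIOChannel API.
The function name is hypothetical, and the behaviour shown (a read on a
shut-down stream socket returning 0 instead of blocking) is what is expected
on Linux.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a connected pair, shut one end down, then read from it.
 * Returns read()'s result, or -2 if the setup itself failed.
 */
ssize_t read_after_shutdown(void)
{
    int fds[2];
    char byte;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
        return -2;
    }
    /* Nothing was written, so a plain read() on fds[0] would block. */
    if (shutdown(fds[0], SHUT_RDWR) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -2;
    }
    n = read(fds[0], &byte, 1);   /* end of file instead of blocking */
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Whether this call is issued from the I/O layer itself or from the layer
above it via qio_channel_shutdown is exactly the design question raised in
this thread; the sketch takes no position on that.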





Re: [PATCH v26 01/10] acpi: nvdimm: change NVDIMM_UUID_LE to a common macro

2020-05-11 Thread Igor Mammedov
On Mon, 11 May 2020 22:05:28 +0800
gengdongjiu  wrote:

> >> +    (node3), (node4), (node5) }
> >> +
> >>   #define UUID_FMT "%02hhx%02hhx%02hhx%02hhx-" \
> >>    "%02hhx%02hhx-%02hhx%02hhx-" \
> >>    "%02hhx%02hhx-" \
> >> diff --git a/slirp b/slirp
> >> index 2faae0f..55ab21c 16
> >> --- a/slirp
> >> +++ b/slirp
> >> @@ -1 +1 @@
> >> -Subproject commit 2faae0f778f818fadc873308f983289df697eb93
> >> +Subproject commit 55ab21c9a36852915b81f1b41ebaf3b6509dd8ba  
> > 
> > The SLiRP submodule change is certainly unrelated.  
> 
> Thanks Philippe's review and comments. I submitted another patchset "[PATCH 
> RESEND v26 00/10] Add ARMv8 RAS virtualization support in QEMU" to fix it, 
> please review that patchset.

For the future, adding RESEND doesn't make sense here. If you change the
patches, just bump the version.
> 
> > 
> > 
> > .
> >   
> 




Re: [PATCH 0/5] Introduce 'yank' oob qmp command to recover from hanging qemu

2020-05-11 Thread Lukas Straub
On Mon, 11 May 2020 13:17:14 +0100
Daniel P. Berrangé  wrote:

> On Mon, May 11, 2020 at 01:07:18PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berra...@redhat.com) wrote:  
> > > On Mon, May 11, 2020 at 01:14:34PM +0200, Lukas Straub wrote:  
> > > > Hello Everyone,
> > > > In many cases, if qemu has a network connection (qmp, migration, 
> > > > chardev, etc.)
> > > > to some other server and that server dies or hangs, qemu hangs too.  
> > > 
> > > If qemu as a whole hangs due to a stalled network connection, that is a
> > > bug in QEMU that we should be fixing IMHO. QEMU should be doing 
> > > non-blocking
> > > I/O in general, such that if the network connection or remote server 
> > > stalls,
> > > we simply stop sending I/O - we shouldn't ever hang the QEMU process or 
> > > main
> > > loop.
> > > 
> > > There are places in QEMU code which are not well behaved in this respect,
> > > but many are, and others are getting fixed where found to be important.
> > > 
> > > Arguably any place in QEMU code which can result in a hang of QEMU in the
> > > event of a stalled network should be considered a security flaw, because
> > > the network is untrusted in general.  
> > 
> > That's not really true of the 'management network' - people trust that
> > and I don't see a lot of the qemu code getting fixed safely for all of
> > them.  
> 
> It depends on the user / app / deployment scenario. In OpenStack a lot of
> work was done to beef up security between services on the mgmt network,
> with TLS encryption as standard to reduce attack vectors.
> 
> > > > These patches introduce the new 'yank' out-of-band qmp command to 
> > > > recover from
> > > > these kinds of hangs. The different subsystems register callbacks which 
> > > > get
> > > > executed with the yank command. For example the callback can shutdown() 
> > > > a
> > > > socket. This is intended for the colo use-case, but it can be used for 
> > > > other
> > > > things too of course.  
> > > 
> > > IIUC, invoking the "yank" command unconditionally kills every single
> > > network connection in QEMU that has registered with the "yank" subsystem.
> > > IMHO this is way too big of a hammer, even if we accept there are bugs in
> > > QEMU not handling stalled networking well.  
> > 
> > But isn't this hammer conditional - I see that it's a migration
> > capability for the migration socket, and a flag in nbd - so it only
> > yanks things you've told it to.  
> 
> IIUC, you have to set these flags upfront when you launch QEMU, or
> hotplug the device using the feature. When something gets stuck,
> and you issue the "yank" command, then everything that has the flag
> enabled gets torn down. So in practice it looks like the flag will
> get enabled for everything at QEMU startup, and yanking will tear
> down everything.
> 
> > > eg if a chardev hangs QEMU, and we tear down everything, killing the NBD
> > > connection used for the guest disk, we needlessly break I/O.
> > > 
> > > eg doing this in the chardev backend is not desirable, because the bugs
> > > with hanging QEMU are typically caused by the way the frontend device
> > > uses the chardev blocking I/O calls, instead of non-blocking I/O calls.
> > >   
> > 
> > Having a way to get out of any of these problems from a single point is
> > quite nice.  To be useful in COLO you need to know for sure you can get
> > out of any network screwup.
> > 
> > We already use shutdown(2) in migrate_cancel and migrate-pause for
> > basically the same reason; I don't think we've got anything similar for
> > NBD, and we probably should have (I think I asked for it fairly
> > recently).  
> 
> Yes, the migrate_cancel is an example of a more fine grained way to
> recover. I was thinking that we need an equivalent fine control knob
> for NBD too.

One reason why the yank feature is done this way is that the management 
application may not know in what state qemu is and so it doesn't know what to 
yank. Poking in the dark would work too in my case, but it's not that nice.

Regards,
Lukas Straub

> That way if QEMU does get stuck, you can start by tearing down the
> least disruptive channel. eg try tearing down the migration connection
> first (which shouldn't negatively impact the guest), and only if that
> doesn't work then, move on to tear down the NBD connection (which risks
> data loss)
> 
> Regards,
> Daniel





Re: [PATCH v5 14/15] acpi: q35: drop _SB.PCI0.ISA.LPCD opregion.

2020-05-11 Thread Igor Mammedov
On Thu,  7 May 2020 15:16:39 +0200
Gerd Hoffmann  wrote:

> Seems to be unused.
> 
> Signed-off-by: Gerd Hoffmann 

Reviewed-by: Igor Mammedov 

> ---
>  hw/i386/acpi-build.c | 11 ---
>  1 file changed, 11 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index c1e63cce5e8e..1afb47b09ee9 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1417,7 +1417,6 @@ static void build_q35_isa_bridge(Aml *table)
>  {
>  Aml *dev;
>  Aml *scope;
> -Aml *field;
>  
>  scope =  aml_scope("_SB.PCI0");
>  dev = aml_device("ISA");
> @@ -1427,16 +1426,6 @@ static void build_q35_isa_bridge(Aml *table)
>  aml_append(dev, aml_operation_region("PIRQ", AML_PCI_CONFIG,
>   aml_int(0x60), 0x0C));
>  
> -aml_append(dev, aml_operation_region("LPCD", AML_PCI_CONFIG,
> - aml_int(0x80), 0x02));
> -field = aml_field("LPCD", AML_ANY_ACC, AML_NOLOCK, AML_PRESERVE);
> -aml_append(field, aml_named_field("COMA", 3));
> -aml_append(field, aml_reserved_field(1));
> -aml_append(field, aml_named_field("COMB", 3));
> -aml_append(field, aml_reserved_field(1));
> -aml_append(field, aml_named_field("LPTD", 2));
> -aml_append(dev, field);
> -
>  aml_append(scope, dev);
>  aml_append(table, scope);
>  }




Re: [PATCH v5 13/15] acpi: drop build_piix4_pm()

2020-05-11 Thread Igor Mammedov
On Thu,  7 May 2020 15:16:38 +0200
Gerd Hoffmann  wrote:

> The _SB.PCI0.PX13.P13C opregion (holds isa device enable bits)
> is not used any more, remove it from DSDT.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/i386/acpi-build.c | 16 
>  1 file changed, 16 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 765409a90eb6..c1e63cce5e8e 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1441,21 +1441,6 @@ static void build_q35_isa_bridge(Aml *table)
>  aml_append(table, scope);
>  }
>  
> -static void build_piix4_pm(Aml *table)
> -{
> -Aml *dev;
> -Aml *scope;
> -
> -scope =  aml_scope("_SB.PCI0");
> -dev = aml_device("PX13");
> -aml_append(dev, aml_name_decl("_ADR", aml_int(0x00010003)));
I agree about removing P13C, but I'm not sure if it's safe to remove
the whole isa bridge

> -
> -aml_append(dev, aml_operation_region("P13C", AML_PCI_CONFIG,
> - aml_int(0x00), 0xff));
> -aml_append(scope, dev);
> -aml_append(table, scope);
> -}
> -
>  static void build_piix4_isa_bridge(Aml *table)
>  {
>  Aml *dev;
> @@ -1607,7 +1592,6 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>  aml_append(dsdt, sb_scope);
>  
>  build_hpet_aml(dsdt);
> -build_piix4_pm(dsdt);
>  build_piix4_isa_bridge(dsdt);
>  build_isa_devices_aml(dsdt);
>  build_piix4_pci_hotplug(dsdt);




Re: [PATCH RESEND v6 00/36] Initial support for multi-process qemu

2020-05-11 Thread Jag Raman



> On May 11, 2020, at 10:40 AM, Stefan Hajnoczi  wrote:
> 
> Hi,
> Have you decided whether to drop the remote device program in favor of
> using a softmmu make target?
> 
> Is there anything in this series you'd like me to review before you send
> the next revision?

Hi Stefan,

We are planning to drop the separate remote device program in the next
revision. We are planning to use QEMU’s existing event loop instead of
a separate event loop for the remote process, as well as the command
line invocation you suggested in your feedback.

We hope the following core patches look good to you, by and large:
[PATCH RESEND v6 01/36] memory: alloc RAM from file at offset
[PATCH RESEND v6 11/36] multi-process: define mpqemu-link object
[PATCH RESEND v6 12/36] multi-process: add functions to synchronize proxy and remote endpoints
[PATCH RESEND v6 13/36] multi-process: setup PCI host bridge for remote device
[PATCH RESEND v6 14/36] multi-process: setup a machine object for remote device process
[PATCH RESEND v6 15/36] multi-process: setup memory manager for remote device
[PATCH RESEND v6 17/36] multi-process: introduce proxy object
[PATCH RESEND v6 18/36] multi-process: Initialize Proxy Object's communication channel
[PATCH RESEND v6 19/36] multi-process: Connect Proxy Object with device in the remote process
[PATCH RESEND v6 20/36] multi-process: Forward PCI config space acceses to the remote process
[PATCH RESEND v6 21/36] multi-process: PCI BAR read/write handling for proxy & remote endpoints
[PATCH RESEND v6 22/36] multi-process: Synchronize remote memory
[PATCH RESEND v6 23/36] multi-process: create IOHUB object to handle irq
[PATCH RESEND v6 24/36] multi-process: Retrieve PCI info from remote process

Thank you very much!
—
Jag

> 
> Stefan




Re: [PATCH v5 12/15] acpi: drop serial/parallel enable bits from dsdt

2020-05-11 Thread Igor Mammedov
On Thu,  7 May 2020 15:16:37 +0200
Gerd Hoffmann  wrote:

> The _STA methods for COM+LPT used to reference them,
> but that isn't the case any more.
> 
> Signed-off-by: Gerd Hoffmann 

Reviewed-by: Igor Mammedov 


> ---
>  hw/i386/acpi-build.c | 23 ---
>  1 file changed, 23 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 1922868f3401..765409a90eb6 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1437,15 +1437,6 @@ static void build_q35_isa_bridge(Aml *table)
>  aml_append(field, aml_named_field("LPTD", 2));
>  aml_append(dev, field);
>  
> -aml_append(dev, aml_operation_region("LPCE", AML_PCI_CONFIG,
> - aml_int(0x82), 0x02));
> -/* enable bits */
> -field = aml_field("LPCE", AML_ANY_ACC, AML_NOLOCK, AML_PRESERVE);
> -aml_append(field, aml_named_field("CAEN", 1));
> -aml_append(field, aml_named_field("CBEN", 1));
> -aml_append(field, aml_named_field("LPEN", 1));
> -aml_append(dev, field);
> -
>  aml_append(scope, dev);
>  aml_append(table, scope);
>  }
> @@ -1469,7 +1460,6 @@ static void build_piix4_isa_bridge(Aml *table)
>  {
>  Aml *dev;
>  Aml *scope;
> -Aml *field;
>  
>  scope =  aml_scope("_SB.PCI0");
>  dev = aml_device("ISA");
> @@ -1478,19 +1468,6 @@ static void build_piix4_isa_bridge(Aml *table)
>  /* PIIX PCI to ISA irq remapping */
>  aml_append(dev, aml_operation_region("P40C", AML_PCI_CONFIG,
>   aml_int(0x60), 0x04));
> -/* enable bits */
> -field = aml_field("^PX13.P13C", AML_ANY_ACC, AML_NOLOCK, AML_PRESERVE);
> -/* Offset(0x5f),, 7, */
> -aml_append(field, aml_reserved_field(0x2f8));
> -aml_append(field, aml_reserved_field(7));
> -aml_append(field, aml_named_field("LPEN", 1));
> -/* Offset(0x67),, 3, */
> -aml_append(field, aml_reserved_field(0x38));
> -aml_append(field, aml_reserved_field(3));
> -aml_append(field, aml_named_field("CAEN", 1));
> -aml_append(field, aml_reserved_field(3));
> -aml_append(field, aml_named_field("CBEN", 1));
> -aml_append(dev, field);
>  
>  aml_append(scope, dev);
>  aml_append(table, scope);
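For reference, the bit arithmetic behind the removed PIIX4 field list can be checked with a small standalone sketch. This is not QEMU code: the struct and the helper field_bit_offset() are made up for illustration, and only the field names and widths are taken from the quoted hunk. It is meant to show that the reserved gaps place LPEN in byte 0x5f and CAEN/CBEN in byte 0x67, as the removed comments say.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one AML field entry: a width in bits, named or reserved. */
struct field_entry {
    const char *name;   /* NULL for a reserved gap */
    unsigned bits;
};

/* Widths copied from the removed hunk in build_piix4_isa_bridge(). */
static const struct field_entry p13c_fields[] = {
    { NULL,   0x2f8 },  /* skip to byte 0x5f */
    { NULL,   7 },
    { "LPEN", 1 },
    { NULL,   0x38 },   /* skip to byte 0x67 */
    { NULL,   3 },
    { "CAEN", 1 },
    { NULL,   3 },
    { "CBEN", 1 },
};

/* Return the starting bit offset of a named field, or -1 if it is absent. */
static long field_bit_offset(const char *name)
{
    unsigned long off = 0;

    for (size_t i = 0; i < sizeof(p13c_fields) / sizeof(p13c_fields[0]); i++) {
        const struct field_entry *f = &p13c_fields[i];

        if (f->name && strcmp(f->name, name) == 0) {
            return (long)off;
        }
        off += f->bits;
    }
    return -1;
}
```

Dividing the returned offset by 8 gives the config-space byte and the remainder gives the bit within that byte.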




Re: [PATCH v5 03/15] acpi: rtc: use a single crs range

2020-05-11 Thread Igor Mammedov
On Thu,  7 May 2020 15:16:28 +0200
Gerd Hoffmann  wrote:

> Use a single io range for _CRS instead of two,
> following what real hardware does.
> 
> Signed-off-by: Gerd Hoffmann 

Reviewed-by: Igor Mammedov 

> ---
>  hw/rtc/mc146818rtc.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
> index 2104e0aa3b14..ab0cc59973b3 100644
> --- a/hw/rtc/mc146818rtc.c
> +++ b/hw/rtc/mc146818rtc.c
> @@ -1013,12 +1013,14 @@ static void rtc_build_aml(ISADevice *isadev, Aml *scope)
>  Aml *dev;
>  Aml *crs;
>  
> +/*
> + * Reserving 8 io ports here, following what physical hardware
> + * does, even though qemu only responds to the first two ports.
> + */
>  crs = aml_resource_template();
>  aml_append(crs, aml_io(AML_DECODE16, RTC_ISA_BASE, RTC_ISA_BASE,
> -   0x10, 0x02));
> +   0x01, 0x08));
>  aml_append(crs, aml_irq_no_flags(RTC_ISA_IRQ));
> -aml_append(crs, aml_io(AML_DECODE16, RTC_ISA_BASE + 2, RTC_ISA_BASE + 2,
> -   0x02, 0x06));
>  
>  dev = aml_device("RTC");
>  aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0B00")));
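As a quick sanity check of the change above, the old two-range layout and the new single range can be compared port by port. The sketch below is not QEMU code; struct io_range and the helper names are hypothetical, and RTC_ISA_BASE is assumed to be 0x70. Alignment is ignored because both descriptors use min == max, so only the base and length matter here.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTC_ISA_BASE 0x70  /* assumed value of QEMU's RTC_ISA_BASE */

/* Hypothetical stand-in for a fixed aml_io() window (min == max). */
struct io_range {
    uint16_t base;
    uint8_t len;
};

/* Old _CRS: two windows, 0x02 and 0x06 ports long. */
static const struct io_range old_crs[] = {
    { RTC_ISA_BASE,     0x02 },
    { RTC_ISA_BASE + 2, 0x06 },
};

/* New _CRS: one window of 0x08 ports. */
static const struct io_range new_crs = { RTC_ISA_BASE, 0x08 };

static bool range_covers(const struct io_range *r, uint32_t port)
{
    return port >= r->base && port < (uint32_t)r->base + r->len;
}

static bool old_crs_covers(uint32_t port)
{
    return range_covers(&old_crs[0], port) || range_covers(&old_crs[1], port);
}

static bool new_crs_covers(uint32_t port)
{
    return range_covers(&new_crs, port);
}

/* True if both layouts claim exactly the same ports in 16-bit I/O space. */
static bool crs_layouts_equivalent(void)
{
    for (uint32_t port = 0; port <= 0xffff; port++) {
        if (old_crs_covers(port) != new_crs_covers(port)) {
            return false;
        }
    }
    return true;
}
```

Under these assumptions both layouts reserve the same eight ports, so the patch changes how the resource is described rather than which ports are claimed.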




Re: [PATCH] hostmem: don't use mbind() if host-nodes is epmty

2020-05-11 Thread Igor Mammedov
On Mon, 11 May 2020 18:00:01 +0200
Philippe Mathieu-Daudé  wrote:

> Hi Eduardo,
> 
> On 5/4/20 5:44 PM, Eduardo Habkost wrote:
> > On Thu, Apr 30, 2020 at 11:46:06AM -0400, Igor Mammedov wrote:  
> >> Since 5.0 QEMU uses hostmem backend for allocating main guest RAM.
> >> The backend however calls mbind() which is typically NOP
> >> in case of default policy/absent host-nodes bitmap.
> >> However when running in a container with black-listed mbind()
> >> syscall, QEMU fails to start with error
> >>   "cannot bind memory to host NUMA nodes: Operation not permitted"
> >> even when the user hasn't provided host-nodes to pin to explicitly
> >> (which is the case with -m option)
> >>
> >> To fix the issue, call mbind() only in case when the user has provided
> >> host-nodes explicitly (i.e. host_nodes bitmap is not empty).
> >> That should allow running QEMU in containers with black-listed
> >> mbind() without memory pinning. If QEMU-provided memory pinning
> >> is required, the user still has to white-list mbind() in container
> >> configuration.
> >>
> >> Reported-by: Manuel Hohmann 
> >> Signed-off-by: Igor Mammedov   
> > 
> > Queued on machine-next, thanks!  
> 
> I've been debugging this issue again today and figured it was not 
> merged, if possible can you add the "Cc: qemu-sta...@nongnu.org" tag 
> before sending your pull request?
It's CCed already, so my impression was that it would be picked up once it was
reviewed.

> 
> Thanks,
> 
> Phil.
> 
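The idea in the quoted commit message (only call mbind() when the host-nodes bitmap is non-empty) can be sketched as a standalone predicate. This is an illustration, not the code from the patch: host_nodes_empty() and should_call_mbind() are hypothetical names, and the bitmap is modelled as a plain array of unsigned long.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: true if none of the first nbits bits is set. */
static bool host_nodes_empty(const unsigned long *bitmap, size_t nbits)
{
    size_t full = nbits / BITS_PER_LONG;
    size_t rem = nbits % BITS_PER_LONG;

    for (size_t i = 0; i < full; i++) {
        if (bitmap[i]) {
            return false;
        }
    }
    if (rem) {
        /* Only look at the low 'rem' bits of the last partial word. */
        unsigned long mask = (1UL << rem) - 1;

        if (bitmap[full] & mask) {
            return false;
        }
    }
    return true;
}

/*
 * Sketch of the decision described in the commit message: skip mbind()
 * entirely when the user did not ask for any host node, so a container
 * that blocks the syscall never sees it.
 */
static bool should_call_mbind(const unsigned long *host_nodes, size_t nbits)
{
    return !host_nodes_empty(host_nodes, nbits);
}
```

With an all-zero bitmap the predicate is false and no syscall would be issued, which is the case the bug report hits with a plain -m option.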




[Bug 1877716] Re: Win10 guest unusable after a few minutes

2020-05-11 Thread Stefan Hajnoczi
Please try this patch series:
https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg02728.html

https://bugs.launchpad.net/bugs/1877716

Title:
  Win10 guest unusable after a few minutes

Status in QEMU:
  New

Bug description:
  On Arch Linux, the recent qemu package update seems to misbehave on
  some systems. In my case, my Windows 10 guest runs fine for around 5
  minutes and then starts to get really sluggish, even unresponsive. It
  needs to be forced off. I could reproduce this on a minimal VM with no
  passthrough, although my current testing setup involves an NVMe PCIe
  passthrough.

  I bisected it to the following commit, after which the guest rapidly
  becomes sluggish on my setup:
  https://github.com/qemu/qemu/commit/73fd282e7b6dd4e4ea1c3bbb3d302c8db51e4ccf

  I've run the previous commit (
  https://github.com/qemu/qemu/commit/b321051cf48ccc2d3d832af111d688f2282f089b
  ) for the entire night without an issue so far.

  I believe this might be a duplicate of
  https://bugs.launchpad.net/qemu/+bug/1873032 , although I'm not sure.

  Linux cc 5.6.10-arch1-1 #1 SMP PREEMPT Sat, 02 May 2020 19:11:54 + x86_64 
GNU/Linux
  AMD Ryzen 7 2700X Eight-Core Processor




[Bug 1873032] Re: After upgrade qemu to 5.0.0-0.3.rc2.fc33 the virtual machine with Windows 10 after a while starts to work very slowly

2020-05-11 Thread Stefan Hajnoczi
Please try this patch series:
https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg02728.html

https://bugs.launchpad.net/bugs/1873032

Title:
  After upgrade qemu to 5.0.0-0.3.rc2.fc33 the virtual machine with
  Windows 10 after a while starts to work very slowly

Status in QEMU:
  New

Bug description:
  Description of problem:

  After upgrade qemu to 5.0.0-0.3.rc2.fc33 the virtual machine with
  Windows 10 after a while starts to work very slowly

  I created the virtual machine with Windows 10 with the following config:
  - 1 CPU
  - 2GB RAM
  - With network access

  I launch a web browser there with Flash content.
  Usually, the system (Windows 10) does not keep working for more than an
  hour.
  When the system starts to work very slowly, it doesn't respond to the "Reboot"
  and "Shut Down" commands. Only "Force Reset" and "Force Off" work. But when I
  reboot the system with "Force Reset", it usually gets stuck at boot on the
  Windows splash screen. https://imgur.com/yGyacDG

  The last version of qemu which does not contain this issue is
  5.0.0-0.2.rc0.fc33




[PATCH v2 1/2] PCI: vmd: Filter resource type bits from shadow register

2020-05-11 Thread Jon Derrick
Versions of VMD with the Host Physical Address shadow register use this
register to calculate the bus address offset needed to do guest
passthrough of the domain. This register shadows the Host Physical
Address registers including the resource type bits. After calculating
the offset, the extra resource type bits lead to the VMD resources being
over-provisioned at the front and under-provisioned at the back.

Example:
pci 1:80:02.0: reg 0x10: [mem 0xf801fffc-0xf803fffb 64bit]

Expected:
pci 1:80:02.0: reg 0x10: [mem 0xf802-0xf803 64bit]

If other devices are mapped in the over-provisioned front, it could lead
to resource conflict issues with VMD or those devices.

Fixes: a1a30170138c9 ("PCI: vmd: Fix shadow offsets to reflect spec changes")
Signed-off-by: Jon Derrick 
---
 drivers/pci/controller/vmd.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index dac91d60701d..e386d4eac407 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -445,9 +445,11 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
if (!membar2)
return -ENOMEM;
offset[0] = vmd->dev->resource[VMD_MEMBAR1].start -
-   readq(membar2 + MB2_SHADOW_OFFSET);
+   (readq(membar2 + MB2_SHADOW_OFFSET) &
+PCI_BASE_ADDRESS_MEM_MASK);
offset[1] = vmd->dev->resource[VMD_MEMBAR2].start -
-   readq(membar2 + MB2_SHADOW_OFFSET + 8);
+   (readq(membar2 + MB2_SHADOW_OFFSET + 8) &
+PCI_BASE_ADDRESS_MEM_MASK);
pci_iounmap(vmd->dev, membar2);
}
}
-- 
2.18.1
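To illustrate the effect of the mask, here is a minimal standalone sketch of the offset computation with and without filtering the low BAR bits. It is not the driver code: the function names and the sample addresses are made up, and BAR_MEM_MASK is assumed to have the same value as Linux's PCI_BASE_ADDRESS_MEM_MASK (the low four bits cleared).

```c
#include <stdint.h>

/* Assumed equal to PCI_BASE_ADDRESS_MEM_MASK: clears the low four BAR flag bits. */
#define BAR_MEM_MASK (~(uint64_t)0x0f)

/* Offset as computed after the patch: resource type bits are filtered out. */
static uint64_t shadow_offset(uint64_t guest_bar_start, uint64_t shadow_reg)
{
    return guest_bar_start - (shadow_reg & BAR_MEM_MASK);
}

/* Offset as computed before the patch, kept here only for comparison. */
static uint64_t shadow_offset_unmasked(uint64_t guest_bar_start,
                                       uint64_t shadow_reg)
{
    return guest_bar_start - shadow_reg;
}
```

With a hypothetical guest BAR at 0xf8000000 and a shadow register reading 0x90000004 (a host address of 0x90000000 with the 64-bit memory type bit set), the masked offset is 0x68000000 while the unmasked one is 0x67fffffc. That 4-byte skew is consistent with the misaligned range start shown in the example above.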




[PATCH for QEMU v2] hw/vfio: Add VMD Passthrough Quirk

2020-05-11 Thread Jon Derrick
The VMD endpoint provides a real PCIe domain to the guest, including
bridges and endpoints. Because the VMD domain is enumerated by the guest
kernel, the guest kernel will assign Guest Physical Addresses to the
downstream endpoint BARs and bridge windows.

When the guest kernel performs MMIO to VMD sub-devices, IOMMU will
translate from the guest address space to the physical address space.
Because the bridges have been programmed with guest addresses, the
bridges will reject the transaction containing physical addresses.

VMD device 28C0 natively assists passthrough by providing the Host
Physical Address in shadow registers accessible to the guest for bridge
window assignment. The shadow registers are valid if bit 1 is set in VMD
VMLOCK config register 0x70. Future VMDs will also support this feature.
Existing VMDs have config register 0x70 reserved, and will return 0 on
reads.

In order to support existing VMDs, this quirk emulates the VMLOCK and
HPA shadow registers for all VMD device ids which don't natively assist
with passthrough. The Linux VMD driver is updated to allow existing VMD
devices to query VMLOCK for passthrough support.

Signed-off-by: Jon Derrick 
---
 hw/vfio/pci-quirks.c | 103 +++
 hw/vfio/pci.c|   7 +++
 hw/vfio/pci.h|   2 +
 hw/vfio/trace-events |   3 ++
 4 files changed, 115 insertions(+)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index 2d348f8237..4060a6a95d 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1709,3 +1709,106 @@ free_exit:
 
 return ret;
 }
+
+/*
+ * The VMD endpoint provides a real PCIe domain to the guest and the guest
+ * kernel performs enumeration of the VMD sub-device domain. Guest transactions
+ * to VMD sub-devices go through IOMMU translation from guest addresses to
+ * physical addresses. When MMIO goes to an endpoint after being translated to
+ * physical addresses, the bridge rejects the transaction because the window
+ * has been programmed with guest addresses.
+ *
+ * VMD can use the Host Physical Address in order to correctly program the
+ * bridge windows in its PCIe domain. VMD device 28C0 has HPA shadow registers
+ * located at offset 0x2000 in MEMBAR2 (BAR 4). The shadow registers are valid
+ * if bit 1 is set in the VMD VMLOCK config register 0x70. VMD devices without
+ * this native assistance can have these registers safely emulated as these
+ * registers are reserved.
+ */
+typedef struct VFIOVMDQuirk {
+VFIOPCIDevice *vdev;
+uint64_t membar_phys[2];
+} VFIOVMDQuirk;
+
+static uint64_t vfio_vmd_quirk_read(void *opaque, hwaddr addr, unsigned size)
+{
+VFIOVMDQuirk *data = opaque;
+uint64_t val = 0;
+
+memcpy(&val, (void *)data->membar_phys + addr, size);
+return val;
+}
+
+static const MemoryRegionOps vfio_vmd_quirk = {
+.read = vfio_vmd_quirk_read,
+.endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+#define VMD_VMLOCK  0x70
+#define VMD_SHADOW  0x2000
+#define VMD_MEMBAR2 4
+
+static int vfio_vmd_emulate_shadow_registers(VFIOPCIDevice *vdev)
+{
+VFIOQuirk *quirk;
+VFIOVMDQuirk *data;
+PCIDevice *pdev = &vdev->pdev;
+int ret;
+
+data = g_malloc0(sizeof(*data));
+ret = pread(vdev->vbasedev.fd, data->membar_phys, 16,
+vdev->config_offset + PCI_BASE_ADDRESS_2);
+if (ret != 16) {
+error_report("VMD %s cannot read MEMBARs (%d)",
+ vdev->vbasedev.name, ret);
+g_free(data);
+return -EFAULT;
+}
+
+quirk = vfio_quirk_alloc(1);
+quirk->data = data;
+data->vdev = vdev;
+
+/* Emulate Shadow Registers */
+memory_region_init_io(quirk->mem, OBJECT(vdev), &vfio_vmd_quirk, data,
+  "vfio-vmd-quirk", sizeof(data->membar_phys));
+memory_region_add_subregion_overlap(vdev->bars[VMD_MEMBAR2].region.mem,
+VMD_SHADOW, quirk->mem, 1);
+memory_region_set_readonly(quirk->mem, true);
+memory_region_set_enabled(quirk->mem, true);
+
+QLIST_INSERT_HEAD(&vdev->bars[VMD_MEMBAR2].quirks, quirk, next);
+
+trace_vfio_pci_vmd_quirk_shadow_regs(vdev->vbasedev.name,
+ data->membar_phys[0],
+ data->membar_phys[1]);
+
+/* Advertise Shadow Register support */
+pci_byte_test_and_set_mask(pdev->config + VMD_VMLOCK, 0x2);
+pci_set_byte(pdev->wmask + VMD_VMLOCK, 0);
+pci_set_byte(vdev->emulated_config_bits + VMD_VMLOCK, 0x2);
+
+trace_vfio_pci_vmd_quirk_vmlock(vdev->vbasedev.name,
+pci_get_byte(pdev->config + VMD_VMLOCK));
+
+return 0;
+}
+
+int vfio_pci_vmd_init(VFIOPCIDevice *vdev)
+{
+int ret = 0;
+
+switch (vdev->device_id) {
+case 0x28C0: /* Native passthrough support */
+break;
+/* Emulates Native passthrough support */
+case 0x201D:
+case 0x467F:
+case 0x4C3D:
+case 0x9A0B:
+ret = 

[PATCH v2 2/2] PCI: vmd: Use Shadow MEMBAR registers for QEMU/KVM guests

2020-05-11 Thread Jon Derrick
VMD device 28C0 natively assists guest passthrough of the VMD endpoint
through the use of shadow registers that provide Host Physical Addresses
to correctly assign bridge windows. These shadow registers are only
available if VMD config space register 0x70, bit 1 is set.

For existing VMD which don't natively support the shadow register, VMD
config space register 0x70 is reserved and will return 0. Future VMD
will have these registers natively in hardware, but existing VMD can
still use this feature by emulating the config space register and shadow
registers.

QEMU has been modified to emulate this config space register and the
shadow membar registers for VMDs which don't natively support this
feature. This patch updates the supported device list to allow this
feature to be used on these VMDs.

Signed-off-by: Jon Derrick 
---
 drivers/pci/controller/vmd.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index e386d4eac407..ee71d0989875 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -600,6 +600,7 @@ static irqreturn_t vmd_irq(int irq, void *data)
 static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id)
 {
struct vmd_dev *vmd;
+   unsigned long features = id->driver_data;
int i, err;
 
if (resource_size(&dev->resource[VMD_CFGBAR]) < (1 << 20))
@@ -652,7 +653,7 @@ static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id)
 
spin_lock_init(&vmd->cfg_lock);
pci_set_drvdata(dev, vmd);
-   err = vmd_enable_domain(vmd, (unsigned long) id->driver_data);
+   err = vmd_enable_domain(vmd, features);
if (err)
return err;
 
@@ -716,16 +717,20 @@ static int vmd_resume(struct device *dev)
 static SIMPLE_DEV_PM_OPS(vmd_dev_pm_ops, vmd_suspend, vmd_resume);
 
 static const struct pci_device_id vmd_ids[] = {
-   {PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_VMD_201D),},
+   {PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_VMD_201D),
+   .driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW,},
{PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_VMD_28C0),
.driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW |
VMD_FEAT_HAS_BUS_RESTRICTIONS,},
{PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x467f),
-   .driver_data = VMD_FEAT_HAS_BUS_RESTRICTIONS,},
+   .driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW |
+   VMD_FEAT_HAS_BUS_RESTRICTIONS,},
{PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x4c3d),
-   .driver_data = VMD_FEAT_HAS_BUS_RESTRICTIONS,},
+   .driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW |
+   VMD_FEAT_HAS_BUS_RESTRICTIONS,},
{PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_VMD_9A0B),
-   .driver_data = VMD_FEAT_HAS_BUS_RESTRICTIONS,},
+   .driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW |
+   VMD_FEAT_HAS_BUS_RESTRICTIONS,},
{0,}
 };
 MODULE_DEVICE_TABLE(pci, vmd_ids);
-- 
2.18.1




[PATCH v2 0/2] VMD endpoint passthrough support

2020-05-11 Thread Jon Derrick
This set contains 2 patches for Linux and 1 for QEMU. VMD device
8086:28C0 contains information in registers to assist with direct
assignment passthrough. Several other VMD devices don't have this
information, but can easily be emulated to offer this feature.

The existing VMD devices not supporting the feature cannot be changed to
offer the information, but also don't restrict the ability to offer this
information in emulation by the hypervisor. Future VMD devices will
offer the 28C0 mode natively.

The QEMU patch emulates the hardware assistance that the VMD 28C0 device
provides: a config space register claiming passthrough support, and the
shadow membar registers containing the host information for guest
address assignment in the VMD domain. These VMD devices have this config
space register set as reserved and will not conflict with the emulated
bit.

The Linux patch allows guest kernels to use the passthrough information
emulated by the QEMU patch, by matching the config space register
claiming passthrough support.

Changes from v1:
v1 changed the VMD Subsystem ID to QEMU's so that the guest driver could
match against it. This was unnecessary as the VMLOCK register and shadow
membar registers could be safely emulated. Future VMDs will be aligned
on these register bits.

Added the resource bit filtering patch that got lost in the mailserver.

v1: 
https://lore.kernel.org/linux-pci/20200422171444.10992-1-jonathan.derr...@intel.com/

Jon Derrick (2):
  PCI: vmd: Filter resource type bits from shadow register
  PCI: vmd: Use Shadow MEMBAR registers for QEMU/KVM guests

 drivers/pci/controller/vmd.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

-- 
2.18.1
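The split between native and emulated passthrough assistance described in this series can be summarised in a small standalone sketch. It is neither the QEMU quirk nor the Linux driver: the enum, the macro VMLOCK_SHADOW_VALID and all function names are hypothetical, and only the device IDs and the "bit 1 of register 0x70" rule are taken from the patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 1 of the VMLOCK config register (offset 0x70); the macro name is made up. */
#define VMLOCK_SHADOW_VALID 0x2

enum vmd_shadow_mode {
    VMD_SHADOW_NONE,      /* not a VMD device known to this sketch */
    VMD_SHADOW_NATIVE,    /* hardware provides the shadow registers */
    VMD_SHADOW_EMULATED,  /* hypervisor emulates VMLOCK and shadow registers */
};

/* Device IDs as listed in the QEMU quirk and the Linux ID table. */
static enum vmd_shadow_mode vmd_shadow_mode(uint16_t device_id)
{
    switch (device_id) {
    case 0x28C0:
        return VMD_SHADOW_NATIVE;
    case 0x201D:
    case 0x467F:
    case 0x4C3D:
    case 0x9A0B:
        return VMD_SHADOW_EMULATED;
    default:
        return VMD_SHADOW_NONE;
    }
}

/* Guest-side check: the shadow registers are usable only if bit 1 is set. */
static bool vmlock_shadow_valid(uint8_t vmlock)
{
    return (vmlock & VMLOCK_SHADOW_VALID) != 0;
}

/*
 * Hypothetical model of what a guest would read from VMLOCK: emulated
 * devices get bit 1 forced on, everything else sees the hardware value.
 */
static uint8_t guest_visible_vmlock(uint16_t device_id, uint8_t hw_vmlock)
{
    if (vmd_shadow_mode(device_id) == VMD_SHADOW_EMULATED) {
        return hw_vmlock | VMLOCK_SHADOW_VALID;
    }
    return hw_vmlock;
}
```

On bare metal an existing VMD would read 0 from the reserved register and the guest-side check would fail, which matches the fallback behaviour the cover letter describes.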




Re: [PATCH] linux-user: support of semtimedop syscall

2020-05-11 Thread Laurent Vivier
Le 11/05/2020 à 18:39, Matus Kysel a écrit :
> We should add support of semtimedop syscall as new version of
> glibc 2.31 uses semop based on semtimedop (commit: 
> https://gitlab.com/freedesktop-sdk/mirrors/sourceware/glibc/-/commit/765cdd0bffd77960ae852104fc4ea5edcdb8aed3
>  ).
> 
> Signed-off-by: Matus Kysel 
> ---
>  linux-user/syscall.c | 26 +-
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 3a924c0004..cb3978a2a5 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -3879,21 +3879,32 @@ static inline abi_long target_to_host_sembuf(struct sembuf *host_sembuf,
>  return 0;
>  }
>  
> -static inline abi_long do_semop(int semid, abi_long ptr, unsigned nsops)


You should add around this function:

#if defined(TARGET_NR_ipc) || defined(TARGET_NR_semop) || \
    defined(TARGET_NR_semtimedop)

> +static inline abi_long do_semtimedop(int semid,
> + abi_long ptr,
> + unsigned nsops,
> + abi_long timeout)
>  {
>  struct sembuf sops[nsops];
> +struct timespec ts, *pts = NULL;
>  abi_long ret;
>  
> +if (timeout) {
> +pts = &ts;
> +if (target_to_host_timespec(pts, timeout)) {

You should add the same #ifdef around target_to_host_timespec().

> +return -TARGET_EFAULT;
> +}
> +}
> +
>  if (target_to_host_sembuf(sops, ptr, nsops))
>  return -TARGET_EFAULT;
>  
>  ret = -TARGET_ENOSYS;
>  #ifdef __NR_semtimedop
> -ret = get_errno(safe_semtimedop(semid, sops, nsops, NULL));
> +ret = get_errno(safe_semtimedop(semid, sops, nsops, pts));
>  #endif
>  #ifdef __NR_ipc
>  if (ret == -TARGET_ENOSYS) {
> -ret = get_errno(safe_ipc(IPCOP_semtimedop, semid, nsops, 0, sops, 0));
> +ret = get_errno(safe_ipc(IPCOP_semtimedop, semid, nsops, 0, sops, pts));
>  }
>  #endif
>  return ret;
> @@ -4373,7 +4384,8 @@ static abi_long do_ipc(CPUArchState *cpu_env,
>  
>  switch (call) {
>  case IPCOP_semop:
> -ret = do_semop(first, ptr, second);
> +case IPCOP_semtimedop:
> +ret = do_semtimedop(first, ptr, second, third);

Are you sure "third" is NULL in case of IPCOP_semop?

You should explicitly keep

ret = do_semtimedop(first, ptr, second, NULL);

for IPCOP_semop.
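A minimal sketch of the dispatch suggested here, kept separate from the real do_ipc(): the helper semtimedop_timeout_arg() is hypothetical, and the IPCOP_* values are assumed to match the Linux ipc(2) call numbers. The point is only that plain semop must ignore whatever happens to be in "third".

```c
/* Assumed to mirror the ipc(2) call numbers used for these two operations. */
enum {
    IPCOP_semop = 1,
    IPCOP_semtimedop = 4,
};

/*
 * Hypothetical helper: pick the guest timeout address to forward.
 * 0 means "no timeout". For plain semop the fifth ipc() argument is not
 * defined by the caller, so it is never passed through.
 */
static long semtimedop_timeout_arg(int call, long third)
{
    return call == IPCOP_semtimedop ? third : 0;
}
```

In the patch this would correspond to two separate case labels rather than a shared fall-through.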

>  break;
>  
>  case IPCOP_semget:
> @@ -9608,7 +9620,11 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
>  #endif
>  #ifdef TARGET_NR_semop
>  case TARGET_NR_semop:
> -return do_semop(arg1, arg2, arg3);
> +return do_semtimedop(arg1, arg2, arg3, 0);
> +#endif
> +#ifdef TARGET_NR_semtimedop
> +case TARGET_NR_semtimedop:
> +return do_semtimedop(arg1, arg2, arg3, arg4);
>  #endif
>  #ifdef TARGET_NR_semctl
>  case TARGET_NR_semctl:
> 

Thanks,
Laurent



Re: [PATCH] Add a new PIIX option to control PCI hot unplugging of devices on non-root buses

2020-05-11 Thread Igor Mammedov
On Sun, 10 May 2020 17:42:16 +
Ani Sinha  wrote:

> > On Apr 29, 2020, at 9:02 PM, Igor Mammedov  wrote:
> > 
> > On Fri, 24 Apr 2020 14:44:48 -0400
> > Eduardo Habkost  wrote:
> >   
> >> On Fri, Apr 24, 2020 at 03:23:56PM +, Ani Sinha wrote:  
> >>> 
> >>>   
>  On Apr 22, 2020, at 4:15 PM, Ani Sinha  wrote:
>  
>  
>    
> > On Apr 21, 2020, at 8:32 PM, Daniel P. Berrangé  
> > wrote:
> > 
> > On Tue, Apr 21, 2020 at 02:45:04PM +, Ani Sinha wrote:
> >> 
> >>   
> >>> On Apr 20, 2020, at 8:32 PM, Michael S. Tsirkin  
> >>> wrote:
> >>> 
> >>> But I for one would like to focus on keeping PIIX stable
> >>> and focus development on q35.  Not bloating PIIX with lots of new
> >>> features is IMHO a good way to do that.
> >> 
> >> Does this mean this patch is a no-go then? :(
> > 
> > I'd support this patch, as I don't think it can really be described as
> > bloat or destabilizing. It is just adding a simple property to
> > conditionalize existing functionality.  Telling people to switch to Q35
> > is unreasonable as it is not a simple 1-1 conversion from existing use
> > of PIIX. Q35 has much higher complexity in its configuration, has higher
> > memory overhead per VM too, and lacks certain features of PIIX too.
>  
>  Cool. How do we go forward from here?
>    
> >>> 
> >>> We would really appreciate if we can add this extra knob in
> >>> Qemu. Maybe someone else also in the community will find this
> >>> useful. We don’t want to maintain this patch internally forever
> >>> but rather prefer we maintain this as a Qemu community.
> >> 
> >> Michael, I agree with Daniel here and I don't think we should
> >> start refusing PIIX features if they are useful for a portion of
> >> the QEMU community.
> >> 
> >> Would you reconsider and merge this patch?  
> > 
> > I put this patch on my review queue (hopefully next week I'd be able to get 
> > to it)  
> 
> Any progress?
> 

see my reply on v2




Re: [PATCH V2] Add a new PIIX option to control PCI hot unplugging of devices on non-root buses

2020-05-11 Thread Igor Mammedov
On Tue, 28 Apr 2020 10:16:52 +
Ani Sinha  wrote:

> A new option "use_acpi_unplug" is introduced for PIIX which will
> selectively only disable hot unplugging of both hot plugged and
> cold plugged PCI devices on non-root PCI buses. This will prevent
> hot unplugging of devices from Windows based guests from system
> tray but will not prevent devices from being hot plugged into the
> guest.
> 
> It has been tested on Windows guests.
> 
> Signed-off-by: Ani Sinha 
> ---
>  hw/acpi/piix4.c  |  3 +++
>  hw/i386/acpi-build.c | 40 ++--
>  2 files changed, 29 insertions(+), 14 deletions(-)
> 
> diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
> index 964d6f5..59fa707 100644
> --- a/hw/acpi/piix4.c
> +++ b/hw/acpi/piix4.c
> @@ -78,6 +78,7 @@ typedef struct PIIX4PMState {
>  
>  AcpiPciHpState acpi_pci_hotplug;
>  bool use_acpi_pci_hotplug;
> +bool use_acpi_unplug;
>  
>  uint8_t disable_s3;
>  uint8_t disable_s4;
> @@ -633,6 +634,8 @@ static Property piix4_pm_properties[] = {
>  DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_VAL, PIIX4PMState, s4_val, 2),
>  DEFINE_PROP_BOOL("acpi-pci-hotplug-with-bridge-support", PIIX4PMState,
>   use_acpi_pci_hotplug, true),
> +DEFINE_PROP_BOOL("acpi-pci-hotunplug-enable-bridge", PIIX4PMState,
> + use_acpi_unplug, true),
>  DEFINE_PROP_BOOL("memory-hotplug-support", PIIX4PMState,
>   acpi_memory_hotplug.is_enabled, true),
>  DEFINE_PROP_END_OF_LIST(),
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 23c77ee..71b3ac3 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
>  bool s3_disabled;
>  bool s4_disabled;
>  bool pcihp_bridge_en;
> +bool pcihup_bridge_en;
>  uint8_t s4_val;
>  AcpiFadtData fadt;
>  uint16_t cpu_hp_io_base;
> @@ -240,6 +241,9 @@ static void acpi_get_pm_info(MachineState *machine, AcpiPmInfo *pm)
>  pm->pcihp_bridge_en =
>  object_property_get_bool(obj, "acpi-pci-hotplug-with-bridge-support",
>   NULL);
> +pm->pcihup_bridge_en =
> +object_property_get_bool(obj, "acpi-pci-hotunplug-enable-bridge",
> + NULL);
>  }
>  
>  static void acpi_get_misc_info(AcpiMiscInfo *info)
> @@ -451,7 +455,8 @@ static void build_append_pcihp_notify_entry(Aml *method, int slot)
>  }
>  
>  static void build_append_pci_bus_devices(Aml *parent_scope, PCIBus *bus,
> - bool pcihp_bridge_en)
> + bool pcihp_bridge_en,
> + bool pcihup_bridge_en)
>  {
>  Aml *dev, *notify_method = NULL, *method;
>  QObject *bsel;
> @@ -479,11 +484,14 @@ static void build_append_pci_bus_devices(Aml *parent_scope, PCIBus *bus,
>  dev = aml_device("S%.02X", PCI_DEVFN(slot, 0));
>  aml_append(dev, aml_name_decl("_SUN", aml_int(slot)));
>  aml_append(dev, aml_name_decl("_ADR", aml_int(slot << 16)));
> -method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
> -aml_append(method,
> -aml_call2("PCEJ", aml_name("BSEL"), aml_name("_SUN"))
> -);
> -aml_append(dev, method);
> +if (pcihup_bridge_en || pci_bus_is_root(bus)) {

So you are keeping unplug anyway in the host bridge case, so the user will
still see the eject icon if the device is on the root bus?

Another thing about this patch is that it only partially disables hotplug.
I'd rather do it the way hardware does, i.e. full hotplug or no hotplug at all
(like the other hypervisors have done it, to work around this Windows 'feature'),

which is possible if one puts the device on a PCI bridge without hotplug, i.e.

 -global PIIX4_PM.acpi-pci-hotplug-with-bridge-support=off

That of course leaves ACPI hotplug on and, as you noticed earlier,
Windows will offer to eject any device on the root bus, including directly
attached bridges. Currently there is no way to disable that.

Will the following hack work for you?
Possible permutations:
1) ACPI hotplug everywhere
-global PIIX4_PM.acpi-pci-hotplug=on
-global PIIX4_PM.acpi-pci-hotplug-with-bridge-support=on
-device pci-bridge,chassis_nr=1,shpc=doesnt_matter
-device e1000,bus=pci.1,addr=01,id=netdev1

2) No hotplug at all
-global PIIX4_PM.acpi-pci-hotplug=off
-global PIIX4_PM.acpi-pci-hotplug-with-bridge-support=on
-device pci-bridge,chassis_nr=1,shpc=off
-device e1000,bus=pci.1,addr=01,id=netdev1

-global PIIX4_PM.acpi-pci-hotplug=off
-global PIIX4_PM.acpi-pci-hotplug-with-bridge-support=off
-device pci-bridge,chassis_nr=1,shpc=doesnt_matter
-device e1000,bus=pci.1,addr=01,id=netdev1

3) Looks like SHPC kicks in, but it still needs some bridge description in
ACPI that acpi-pci-hotplug-with-bridge-support provides, probably with this
you can

Re: [PATCH v2] linux-user: syscall: ioctls: support DRM_IOCTL_VERSION

2020-05-11 Thread Laurent Vivier
Le 15/03/2020 à 13:20, cheng...@emindsoft.com.cn a écrit :
> From: Chen Gang 
> 
> Other DRM_IOCTL_* commands will be done later.
> 
> Signed-off-by: Chen Gang 
> ---
>  linux-user/ioctls.h|  2 ++
>  linux-user/syscall.c   | 62 ++
>  linux-user/syscall_defs.h  | 15 +
>  linux-user/syscall_types.h | 11 +++
>  4 files changed, 90 insertions(+)
> 
> diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
> index 0defa1d8c1..3ae32cbfb1 100644
> --- a/linux-user/ioctls.h
> +++ b/linux-user/ioctls.h
> @@ -574,6 +574,8 @@
>IOCTL_SPECIAL(SIOCDELRT, IOC_W, do_ioctl_rt,
>  MK_PTR(MK_STRUCT(STRUCT_rtentry)))
>  
> +  IOCTL_SPECIAL(DRM_IOCTL_VERSION, IOC_RW, do_ioctl_drm,
> +MK_PTR(MK_STRUCT(STRUCT_drm_version)))

Add a blank line here.

>  #ifdef TARGET_TIOCSTART
>IOCTL_IGNORE(TIOCSTART)
>IOCTL_IGNORE(TIOCSTOP)
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 8d27d10807..2eb7c91ab4 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -112,6 +112,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

I think you should check in configure that this file is available on the
system.

>  #include "linux_loop.h"
>  #include "uname.h"
>  
> @@ -5196,6 +5197,67 @@ static abi_long do_ioctl_tiocgptpeer(const IOCTLEntry *ie, uint8_t *buf_temp,
>  }
>  #endif
>  
> +static inline abi_long target_to_host_drmversion(struct drm_version *host_ver,
> +abi_long target_addr)
> +{
> +struct target_drm_version *target_ver;
> +
> +if (!lock_user_struct(VERIFY_READ, target_ver, target_addr, 0)) {
> +return -TARGET_EFAULT;
> +}
> +__get_user(host_ver->name_len, &target_ver->name_len);
> +host_ver->name = host_ver->name_len ? g2h(target_ver->name) : NULL;
> +__get_user(host_ver->date_len, &target_ver->date_len);
> +host_ver->date = host_ver->date_len ? g2h(target_ver->date) : NULL;
> +__get_user(host_ver->desc_len, &target_ver->desc_len);
> +host_ver->desc = host_ver->desc_len ? g2h(target_ver->desc) : NULL;
> +unlock_user_struct(target_ver, target_addr, 0);
> +return 0;
> +}
> +
> +static inline abi_long host_to_target_drmversion(abi_ulong target_addr,
> + struct drm_version 
> *host_ver)
> +{
> +struct target_drm_version *target_ver;
> +
> +if (!lock_user_struct(VERIFY_WRITE, target_ver, target_addr, 0)) {
> +return -TARGET_EFAULT;
> +}
> +__put_user(host_ver->version_major, &target_ver->version_major);
> +__put_user(host_ver->version_minor, &target_ver->version_minor);
> +__put_user(host_ver->version_patchlevel, &target_ver->version_patchlevel);
> +__put_user(host_ver->name_len, &target_ver->name_len);
> +__put_user(host_ver->date_len, &target_ver->date_len);
> +__put_user(host_ver->desc_len, &target_ver->desc_len);
> +unlock_user_struct(target_ver, target_addr, 0);
> +return 0;
> +}
> +
> +static abi_long do_ioctl_drm(const IOCTLEntry *ie, uint8_t *buf_temp,
> + int fd, int cmd, abi_long arg)
> +{
> +struct drm_version *ver;
> +abi_long ret;
> +
> +switch (ie->host_cmd) {
> +case DRM_IOCTL_VERSION:
> +ver = (struct drm_version *)buf_temp;
> +memset(ver, 0, sizeof(*ver));
> +ret = target_to_host_drmversion(ver, arg);
> +if (is_error(ret)) {
> +return ret;
> +}
> +ret = get_errno(safe_ioctl(fd, ie->host_cmd, ver));
> +if (is_error(ret)) {
> +return ret;
> +}
> +ret = host_to_target_drmversion(arg, ver);
> +return ret;
> +}
> +return -TARGET_EFAULT;
> +}
> +
> +
>  static IOCTLEntry ioctl_entries[] = {
>  #define IOCTL(cmd, access, ...) \
>  { TARGET_ ## cmd, cmd, #cmd, access, 0, {  __VA_ARGS__ } },
> diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
> index 152ec637cb..3c261cff0e 100644
> --- a/linux-user/syscall_defs.h
> +++ b/linux-user/syscall_defs.h
> @@ -1167,6 +1167,9 @@ struct target_rtc_pll_info {
>  #define TARGET_DM_TARGET_MSG  TARGET_IOWRU(0xfd, 0x0e)
>  #define TARGET_DM_DEV_SET_GEOMETRYTARGET_IOWRU(0xfd, 0x0f)
>  
> +/* drm ioctls */
> +#define TARGET_DRM_IOCTL_VERSION  TARGET_IOWRU('d', 0x00)

Why do you use the TARGET_IOWRU variant?

Can't you use TARGET_IOWR('d', 0x00, struct target_drm_version)?

Thanks,
Laurent
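
For readers following the TARGET_IOWR vs. TARGET_IOWRU question above, here is a minimal sketch of how the argument size ends up inside an ioctl number. It assumes the asm-generic bit layout (nr, type, size, dir); all EX_* names and struct ex_drm_version are made up for illustration and are not the QEMU macros. The all-ones size placeholder for the "U" variant is my assumption about how an "unknown size" marker could look, not a statement about the real TARGET_IOWRU definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed asm-generic layout: nr[7:0], type[15:8], size[29:16], dir[31:30]. */
#define EX_IOC_NRSHIFT    0
#define EX_IOC_TYPESHIFT  8
#define EX_IOC_SIZESHIFT  16
#define EX_IOC_DIRSHIFT   30
#define EX_IOC_SIZEMASK   0x3fffu
#define EX_IOC_WRITE      1u
#define EX_IOC_READ       2u

#define EX_IOC(dir, type, nr, size)              \
    (((uint32_t)(dir)  << EX_IOC_DIRSHIFT)  |    \
     ((uint32_t)(type) << EX_IOC_TYPESHIFT) |    \
     ((uint32_t)(nr)   << EX_IOC_NRSHIFT)   |    \
     ((uint32_t)(size) << EX_IOC_SIZESHIFT))

/* Size is taken from the argument type at compile time. */
#define EX_IOWR(type, nr, arg_type) \
    EX_IOC(EX_IOC_READ | EX_IOC_WRITE, type, nr, sizeof(arg_type))

/* Hypothetical "size unknown" variant: the size field holds a placeholder. */
#define EX_IOWRU(type, nr) \
    EX_IOC(EX_IOC_READ | EX_IOC_WRITE, type, nr, EX_IOC_SIZEMASK)

/* Stand-in for a 64-bit target's drm_version layout (illustrative only). */
struct ex_drm_version {
    int32_t  version_major;
    int32_t  version_minor;
    int32_t  version_patchlevel;
    uint64_t name_len;
    uint64_t name;
    uint64_t date_len;
    uint64_t date;
    uint64_t desc_len;
    uint64_t desc;
};

/* Extract the size field from an encoded command number. */
static inline unsigned ex_ioc_size(uint32_t cmd)
{
    return (cmd >> EX_IOC_SIZESHIFT) & EX_IOC_SIZEMASK;
}
```

With the sized form, the command number already carries sizeof(struct ex_drm_version), so a mismatch between the declared struct and the number shows up immediately.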



[PATCH 2/2] aio-posix: disable fdmon-io_uring when GSource is used

2020-05-11 Thread Stefan Hajnoczi
The glib event loop does not call fdmon_io_uring_wait() so fd handlers
waiting to be submitted build up in the list. There is no benefit in
using io_uring when the glib GSource is being used, so disable it
instead of implementing a more complex fix.

This fixes a memory leak where AioHandlers would build up and increasing
amounts of CPU time were spent iterating them in aio_pending(). The
symptom is that guests become slow when QEMU is built with io_uring
support.

Buglink: https://bugs.launchpad.net/qemu/+bug/1877716
Fixes: 73fd282e7b6dd4e4ea1c3bbb3d302c8db51e4ccf ("aio-posix: add io_uring fd 
monitoring implementation")
Signed-off-by: Stefan Hajnoczi 
---
 include/block/aio.h |  3 +++
 util/aio-posix.c| 12 
 util/aio-win32.c|  4 
 util/async.c|  1 +
 4 files changed, 20 insertions(+)

diff --git a/include/block/aio.h b/include/block/aio.h
index 62ed954344..b2f703fa3f 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -701,6 +701,9 @@ void aio_context_setup(AioContext *ctx);
  */
 void aio_context_destroy(AioContext *ctx);
 
+/* Used internally, do not call outside AioContext code */
+void aio_context_use_g_source(AioContext *ctx);
+
 /**
  * aio_context_set_poll_params:
  * @ctx: the aio context
diff --git a/util/aio-posix.c b/util/aio-posix.c
index 8af334ab19..1b2a3af65b 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -682,6 +682,18 @@ void aio_context_destroy(AioContext *ctx)
 aio_free_deleted_handlers(ctx);
 }
 
+void aio_context_use_g_source(AioContext *ctx)
+{
+/*
+ * Disable io_uring when the glib main loop is used because it doesn't
+ * support mixed glib/aio_poll() usage. It relies on aio_poll() being
+ * called regularly so that changes to the monitored file descriptors are
+ * submitted, otherwise a list of pending fd handlers builds up.
+ */
+fdmon_io_uring_destroy(ctx);
+aio_free_deleted_handlers(ctx);
+}
+
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
  int64_t grow, int64_t shrink, Error **errp)
 {
diff --git a/util/aio-win32.c b/util/aio-win32.c
index 729d533faf..953c56ab48 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -414,6 +414,10 @@ void aio_context_destroy(AioContext *ctx)
 {
 }
 
+void aio_context_use_g_source(AioContext *ctx)
+{
+}
+
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
  int64_t grow, int64_t shrink, Error **errp)
 {
diff --git a/util/async.c b/util/async.c
index 3165a28f2f..1319eee3bc 100644
--- a/util/async.c
+++ b/util/async.c
@@ -362,6 +362,7 @@ static GSourceFuncs aio_source_funcs = {
 
 GSource *aio_get_g_source(AioContext *ctx)
 {
+aio_context_use_g_source(ctx);
 g_source_ref(&ctx->source);
 return &ctx->source;
 }
-- 
2.25.3



[PATCH 0/2] aio-posix: fix fdmon-io_uring memory leak

2020-05-11 Thread Stefan Hajnoczi
This bug was introduced in QEMU 5.0 and causes guests to slow down because
AioHandlers are not freed when the fdmon-io_uring file descriptor monitoring
implementation is used by the main loop thread's glib event loop. This issue
does not apply to IOThread usage of fdmon-io_uring.

In practice few distros build with io_uring support enabled at the moment, so
the number of affected users is likely to be small. The fix is still suitable
for a stable release though.

https://bugs.launchpad.net/qemu/+bug/1877716
https://bugs.launchpad.net/qemu/+bug/1873032

Stefan Hajnoczi (2):
  aio-posix: don't duplicate fd handler deletion in
fdmon_io_uring_destroy()
  aio-posix: disable fdmon-io_uring when GSource is used

 include/block/aio.h   |  3 +++
 util/aio-posix.c  | 13 +
 util/aio-win32.c  |  4 
 util/async.c  |  1 +
 util/fdmon-io_uring.c | 13 ++---
 5 files changed, 31 insertions(+), 3 deletions(-)

-- 
2.25.3



[PATCH 1/2] aio-posix: don't duplicate fd handler deletion in fdmon_io_uring_destroy()

2020-05-11 Thread Stefan Hajnoczi
The io_uring file descriptor monitoring implementation has an internal
list of fd handlers that are pending submission to io_uring.
fdmon_io_uring_destroy() deletes all fd handlers on the list.

Don't delete fd handlers directly in fdmon_io_uring_destroy() for two
reasons:
1. This duplicates the aio-posix.c AioHandler deletion code and could
   become outdated if the struct changes.
2. Only handlers with the FDMON_IO_URING_REMOVE flag set are safe to
   remove. If the flag is not set then something still has a pointer to
   the fd handler. Let aio-posix.c and its user worry about that. In
   practice this isn't an issue because fdmon_io_uring_destroy() is only
   called when shutting down so all users have removed their fd
   handlers, but the next patch will need this!

Signed-off-by: Stefan Hajnoczi 
---
 util/aio-posix.c  |  1 +
 util/fdmon-io_uring.c | 13 ++---
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index c3613d299e..8af334ab19 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -679,6 +679,7 @@ void aio_context_destroy(AioContext *ctx)
 {
 fdmon_io_uring_destroy(ctx);
 fdmon_epoll_disable(ctx);
+aio_free_deleted_handlers(ctx);
 }
 
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index d5a80ed6fb..1d14177df0 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -342,11 +342,18 @@ void fdmon_io_uring_destroy(AioContext *ctx)
 
 io_uring_queue_exit(&ctx->fdmon_io_uring);
 
-/* No need to submit these anymore, just free them. */
+/* Move handlers due to be removed onto the deleted list */
 while ((node = QSLIST_FIRST_RCU(&ctx->submit_list))) {
+unsigned flags = atomic_fetch_and(&node->flags,
+~(FDMON_IO_URING_PENDING |
+  FDMON_IO_URING_ADD |
+  FDMON_IO_URING_REMOVE));
+
+if (flags & FDMON_IO_URING_REMOVE) {
+QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
+}
+
 QSLIST_REMOVE_HEAD_RCU(&ctx->submit_list, node_submitted);
-QLIST_REMOVE(node, node);
-g_free(node);
 }
 
 ctx->fdmon_ops = &fdmon_poll_ops;
-- 
2.25.3
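
The flag handling in the hunk above can be illustrated in isolation. This is a minimal sketch using C11 <stdatomic.h> rather than QEMU's atomic helpers; the EX_* flag names and ex_take_for_deletion() are hypothetical stand-ins. It only shows the idea from the commit message: clear the io_uring bookkeeping bits in one atomic step, and treat the handler as deletable only if the REMOVE bit was set beforehand.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the FDMON_IO_URING_* flag bits. */
enum {
    EX_PENDING = 1 << 0,
    EX_ADD     = 1 << 1,
    EX_REMOVE  = 1 << 2,
    EX_OTHER   = 1 << 3,   /* unrelated bit that must survive the clear */
};

/*
 * Atomically clear the three bookkeeping bits and report whether the
 * handler had been marked for removal. Only in that case is it safe to
 * move it onto a "deleted" list; otherwise someone may still hold a
 * pointer to it.
 */
static bool ex_take_for_deletion(atomic_uint *flags)
{
    unsigned old = atomic_fetch_and(flags,
                                    ~(unsigned)(EX_PENDING | EX_ADD | EX_REMOVE));
    return (old & EX_REMOVE) != 0;
}
```

The fetch-and returns the previous value, so the decision is made on the state as it was at the moment the bits were cleared, with no window in between.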



Re: [PATCH v3 05/17] block/io: support int64_t bytes in bdrv_co_do_pwrite_zeroes()

2020-05-11 Thread Eric Blake

On 5/11/20 12:17 PM, Alberto Garcia wrote:

On Thu 30 Apr 2020 01:10:21 PM CEST, Vladimir Sementsov-Ogievskiy wrote:

 compute 'int tail' via % 'int alignment' - safe


 tail = (offset + bytes) % alignment;

both are int64_t, no chance of overflow here?


Good question - I know several places check that offset+bytes does not 
overflow, but did not specifically audit if this one does.  Adding an 
assert() in this function may be easier than trying to prove all callers 
pass in safe values.
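
A minimal sketch of the kind of check an assert() could encode, written as a standalone helper so it can be tried outside QEMU. The name ex_compute_tail() and its bool-returning shape are assumptions for illustration; inside bdrv_co_do_pwrite_zeroes() this would more likely be a plain assertion on the same condition.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute (offset + bytes) % alignment, refusing inputs where the sum
 * would overflow int64_t. Returns false and leaves *tail untouched for
 * negative inputs, a non-positive alignment, or overflow.
 */
static bool ex_compute_tail(int64_t offset, int64_t bytes,
                            int64_t alignment, int64_t *tail)
{
    if (offset < 0 || bytes < 0 || alignment <= 0) {
        return false;
    }
    /* offset + bytes <= INT64_MAX, rearranged so it cannot overflow. */
    if (bytes > INT64_MAX - offset) {
        return false;
    }
    *tail = (offset + bytes) % alignment;
    return true;
}
```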


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v3 7/9] qcow2: Expose bitmaps' size during measure

2020-05-11 Thread Eric Blake

On 5/11/20 6:50 AM, Max Reitz wrote:

On 08.05.20 20:03, Eric Blake wrote:

It's useful to know how much space can be occupied by qcow2 persistent
bitmaps, even though such metadata is unrelated to the guest-visible
data.  Report this value as an additional field, present when
measuring an existing image and the output format supports bitmaps.
Update iotest 178 and 190 to updated output, as well as new coverage
in 190 demonstrating non-zero values made possible with the
recently-added qemu-img bitmap command.

The addition of a new field demonstrates why we should always
zero-initialize qapi C structs; while the qcow2 driver still fully
populates all fields, the raw and crypto drivers had to be tweaked to
avoid uninitialized data.

See also: https://bugzilla.redhat.com/1779904

Reported-by: Nir Soffer 
Signed-off-by: Eric Blake 
---



+#
+# @bitmaps: Additional size required for bitmap metadata in a source image,


s/in/from/?  Otherwise it sounds like this would be about allocation in
the source, which it clearly can’t be, but, well.



Yes, 'from' sounds nicer, especially since the size requirements being 
measured depend on the destination's cluster size (which may be 
different from the source's cluster size).



+#   if that bitmap metadata can be copied in addition to guest
+#   contents. (since 5.1)


[...]




+/*
+ * Remove data clusters that are not required.  This overestimates the
   * required size because metadata needed for the fully allocated file is
- * still counted.
+ * still counted.  Show bitmaps only if both source and destination
+ * would support them.
   */
  info->required = info->fully_allocated - virtual_size + required;
+info->has_bitmaps = version >= 3 && in_bs &&
+bdrv_dirty_bitmap_supported(in_bs);


Why is it important whether the source format supports persistent dirty
bitmaps?


If the source format does not support bitmaps, there is nothing to copy 
over.  Reporting '0' would work, but adds verbosity.  It also becomes a 
question as to whether 'qemu-img convert --bitmaps' should silently 
ignore such sources, or loudly error out that the option is unsupported 
because the source lacks bitmaps.  I could lean either way.




I’m asking because I’d like there to be some concise reason when and why
the @bitmaps field appears.  “Whenever the target supports bitmaps” is
more concise than “When both source and target support bitmaps”.  Also,
the latter is not really different from “When any bitmap data can be
copied”, but in the latter case we should not show it when there are no
bitmaps in the source (even though the format supports them).

Or from the other perspective: As a user, I would never be annoyed by
the @bitmaps field being present.  I don’t mind a “0”.
OTOH, what information can it convey to me that it’s optional and
sometimes not present?


The impact to the iotests .out files is larger if I do not require that 
the source supports bitmaps (more lines of 'bitmaps: 0' added).  I'm 
fine doing that, if we decide we're okay with the simpler definition of 
'"bitmaps" is present if the destination supports them' (rather than 
this version's implementation of '"bitmaps" is present if both the 
source and destination support them').



I can see these cases:

- That the source format doesn’t support bitmaps?  I want to convert it
to something else anyway, so I don’t really care about what the source
format can or can’t do.

- That the destination doesn’t support bitmaps?  Ah, yes, the fact that
the bitmap field is missing might be a warning sign for this.

- That qemu is too old to copy bitmaps?  Same here.


In fact, that argument is a GOOD reason to output 'bitmaps: 0' in as 
many cases as possible, because it then becomes a side-effect witness of 
whether 'qemu-img convert --bitmaps' is even understood.




- That there are no bitmaps in the source?  OK, but then I disregard the
@bitmaps field anyway, present or not.

So from that standpoint, the best use seems to me to take “The @bitmaps
field isn’t present” as kind of a warning that something in the convert
process won’t support copying bitmaps.  If it’s present, all is well.
So basically there’d be an iff relationship between “measure reports
@bitmaps” and “convert --bitmap can work”.


Yes, I can make that tweak for v4.



But the distinction between “the source format doesn’t support bitmaps”
and “the source image doesn’t have bitmaps” doesn’t seem that important
to me to make it visible in the interface.
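
To make the two candidate rules concrete, here is a minimal sketch of both predicates side by side. The function names and parameters are hypothetical; the first mirrors the v3 hunk quoted above (destination version plus source support), the second is the simpler "destination only" rule being discussed, which is a proposal at this point in the thread and not committed code.

```c
#include <stdbool.h>

/* Rule in this v3 patch: show @bitmaps only if both sides support them. */
static bool ex_show_bitmaps_v3(int dest_qcow2_version,
                               bool have_source, bool source_supports)
{
    return dest_qcow2_version >= 3 && have_source && source_supports;
}

/* Simpler rule under discussion: only the destination format matters. */
static bool ex_show_bitmaps_proposed(int dest_qcow2_version)
{
    return dest_qcow2_version >= 3;
}
```

Under the proposed rule, a source that cannot hold bitmaps would simply report a size of 0 for the field instead of omitting it.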







[Bug 1877716] Re: Win10 guest unusable after a few minutes

2020-05-11 Thread Anatol Pomozov
Thank you Stefan for looking at this issue.

As Alexander and @postfactum mentioned Arch disabled io_uring feature
after this bug has been discovered. Here is an Arch Linux issue that
tracks it https://bugs.archlinux.org/task/66578

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1877716

Title:
  Win10 guest unusable after a few minutes

Status in QEMU:
  New

Bug description:
  On Arch Linux, the recent qemu package update seems to misbehave on
  some systems. In my case, my Windows 10 guest runs fine for around 5
  minutes and then starts to get really sluggish, even unresponsive. It
  needs to be forced off. I could reproduce this on a minimal VM with no
  passthrough, although my current testing setup involves an nvme pcie
  passthrough.

  I bisected it to the following commit which rapidly starts to run sluggishly 
on my setup:
  https://github.com/qemu/qemu/commit/73fd282e7b6dd4e4ea1c3bbb3d302c8db51e4ccf

  I've run the previous commit (
  https://github.com/qemu/qemu/commit/b321051cf48ccc2d3d832af111d688f2282f089b
  ) for the entire night without an issue so far.

  I believe this might be a duplicate of
  https://bugs.launchpad.net/qemu/+bug/1873032 , although I'm not sure.

  Linux cc 5.6.10-arch1-1 #1 SMP PREEMPT Sat, 02 May 2020 19:11:54 + x86_64 
GNU/Linux
  AMD Ryzen 7 2700X Eight-Core Processor




Re: [PATCH v3 6/9] qemu-img: Add bitmap sub-command

2020-05-11 Thread Eric Blake

On 5/11/20 6:10 AM, Max Reitz wrote:

On 08.05.20 20:03, Eric Blake wrote:

Include actions for --add, --remove, --clear, --enable, --disable, and
--merge (note that --clear is a bit of fluff, because the same can be
accomplished by removing a bitmap and then adding a new one in its
place, but it matches what QMP commands exist).  Listing is omitted,
because it does not require a bitmap name and because it was already
possible with 'qemu-img info'.  A single command line can play one or
more bitmap commands in sequence on the same bitmap name (although all
added bitmaps share the same granularity, and all merged bitmaps
come from the same source file).  Merge defaults to other bitmaps in
the primary image, but can also be told to merge bitmaps from a
distinct image.


For the record: Yes, my comment was mostly about my confusion around the
{}.  So just replacing them by () would have pacified me.

But this is more fun, of course.




+++ b/docs/tools/qemu-img.rst
@@ -281,6 +281,29 @@ Command description:


[...]


+  Additional options ``-g`` set a non-default *GRANULARITY* for


sets?


Or maybe:

Additional options include ``-g`` which sets a non-default *GRANULARITY* 
for ``--add``, and ``-b`` and ``-F`` which select an alternative source 
file for all *SOURCE* bitmaps used by ``--merge``.


And in writing this, I just realized - even though you _can_ use --add 
more than once in a command line, the command is still limited to 
operating on a single bitmap name, so unless you write contortions like:


qemu-img bitmap --add --remove --add -g 1024 file.qcow2 bitmapname

there will normally be at most one --add operation for a -g to be used 
with (because otherwise the second --add will fail when attempting to 
create an already-existing bitmap name).




With that fixed (or maybe not, you know that better than me):

Reviewed-by: Max Reitz 








Re: [PATCH v4 5/6] i386: Hyper-V VMBus ACPI DSDT entry

2020-05-11 Thread Roman Kagan
On Tue, May 05, 2020 at 03:06:37PM +0200, Igor Mammedov wrote:
> On Fri, 24 Apr 2020 15:34:43 +0300
> Jon Doron  wrote:
> 
> > Guest OS uses ACPI to discover VMBus presence.  Add a corresponding
> > entry to DSDT in case VMBus has been enabled.
> > 
> > Experimentally Windows guests were found to require this entry to
> > include two IRQ resources. They seem to never be used but they still
> > have to be there.
> > 
> > Make IRQ numbers user-configurable via corresponding properties; use 7
> > and 13 by default.
> well, it seems that at least linux guest driver uses one IRQ,
> albeit not from the ACPI descriptor

I guess you mean synthetic interrupts.  Linux doesn't seem to use
ACPI-discovered IRQs.

> perhaps it's what hyperv host puts into _CRS.
> Could you dump ACPI tables and check how hyperv describes vmbus in acpi?

Exactly, this was how this was conceived in the first place.

> also what if vmbus irq collides with an irq that is already taken,
> it would be better to initialize and consume the irqs it claims to use
> so that in case of a conflict one would get an error.

That was the plan initially.  However, since no guest actually used
those irqs, it appeared not worth the effort.  Dunno what problems can
arise from the conflicts.

> > Signed-off-by: Evgeny Yakovlev 
> > Signed-off-by: Roman Kagan 
> > Signed-off-by: Maciej S. Szmigiero 
> > Signed-off-by: Jon Doron 
> > ---
> >  hw/hyperv/vmbus.c|  7 ++
> >  hw/i386/acpi-build.c | 43 
> >  include/hw/hyperv/vmbus-bridge.h |  3 +++
> >  3 files changed, 53 insertions(+)
> > 
> > diff --git a/hw/hyperv/vmbus.c b/hw/hyperv/vmbus.c
> > index 1f5873ab60..0df7afe0ca 100644
> > --- a/hw/hyperv/vmbus.c
> > +++ b/hw/hyperv/vmbus.c
> > @@ -2641,6 +2641,12 @@ static const VMStateDescription vmstate_vmbus_bridge 
> > = {
> >  },
> >  };
> >  
> > +static Property vmbus_bridge_props[] = {
> > +DEFINE_PROP_UINT8("irq0", VMBusBridge, irq0, 7),
> > +DEFINE_PROP_UINT8("irq1", VMBusBridge, irq1, 13),
> > +DEFINE_PROP_END_OF_LIST()
> > +};
> > +
> >  static void vmbus_bridge_class_init(ObjectClass *klass, void *data)
> >  {
> >  DeviceClass *k = DEVICE_CLASS(klass);
> > @@ -2651,6 +2657,7 @@ static void vmbus_bridge_class_init(ObjectClass 
> > *klass, void *data)
> >  sk->explicit_ofw_unit_address = vmbus_bridge_ofw_unit_address;
> >  set_bit(DEVICE_CATEGORY_BRIDGE, k->categories);
> >  k->vmsd = &vmstate_vmbus_bridge;
> > +device_class_set_props(k, vmbus_bridge_props);
> >  /* override SysBusDevice's default */
> >  k->user_creatable = true;
> >  }
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index 2a7e55bae7..d235074fb8 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -50,6 +50,7 @@
> >  #include "hw/mem/nvdimm.h"
> >  #include "sysemu/numa.h"
> >  #include "sysemu/reset.h"
> > +#include "hw/hyperv/vmbus-bridge.h"
> >  
> >  /* Supported chipsets: */
> >  #include "hw/southbridge/piix.h"
> > @@ -1270,9 +1271,47 @@ static Aml *build_com_device_aml(uint8_t uid)
> >  return dev;
> >  }
> >  
> > +static Aml *build_vmbus_device_aml(VMBusBridge *vmbus_bridge)
> > +{
> > +Aml *dev;
> > +Aml *method;
> > +Aml *crs;
> > +
> > +dev = aml_device("VMBS");
> > +aml_append(dev, aml_name_decl("STA", aml_int(0xF)));
> > +aml_append(dev, aml_name_decl("_HID", aml_string("VMBus")));
> > +aml_append(dev, aml_name_decl("_UID", aml_int(0x0)));
> > +aml_append(dev, aml_name_decl("_DDN", aml_string("VMBUS")));
> > +
> > +method = aml_method("_DIS", 0, AML_NOTSERIALIZED);
> > +aml_append(method, aml_store(aml_and(aml_name("STA"), aml_int(0xD), 
> > NULL),
> > + aml_name("STA")));
> > +aml_append(dev, method);
> > +
> > +method = aml_method("_PS0", 0, AML_NOTSERIALIZED);
> > +aml_append(method, aml_store(aml_or(aml_name("STA"), aml_int(0xF), 
> > NULL),
> > + aml_name("STA")));
> > +aml_append(dev, method);
> > +
> > +method = aml_method("_STA", 0, AML_NOTSERIALIZED);
> > +aml_append(method, aml_return(aml_name("STA")));
> > +aml_append(dev, method);
> 
> do you really need all that _STA/_DIS/_PS0,
> does it work without those methods?

This was just copied from HyperV.  It may make sense to test without.

> > +
> > +aml_append(dev, aml_name_decl("_PS3", aml_int(0x0)));
> should be method

Not our fault :)  Again this was copied.

> > +
> > +crs = aml_resource_template();
> > +aml_append(crs, aml_irq_no_flags(vmbus_bridge->irq0));
> > +/* FIXME: newer HyperV gets by with only one IRQ */
> then why are you adding the second IRQ, does it work with 1 IRQ?

This FIXME was left by me when I noticed that more recent HyperV servers
only stick one IRQ there, but I didn't get around to digging further.

> > +aml_append(crs, aml_irq_no_flags(vmbus_bridge->irq1));
> > +aml_append(dev, aml_name_decl("_CRS", 

Re: [PATCH v3 3/9] block: Make it easier to learn which BDS support bitmaps

2020-05-11 Thread Eric Blake

On 5/11/20 4:21 AM, Max Reitz wrote:

On 08.05.20 20:03, Eric Blake wrote:

Upcoming patches will enhance bitmap support in qemu-img, but in doing
so, it turns out to be nice to suppress output when bitmaps make no
sense (such as on a qcow2 v2 image).  Add a hook to make this easier
to query.

In the future, when we improve the ability to look up bitmaps through
a filter, we will probably also want to teach the block layer to
automatically let filters pass this request on through.

Signed-off-by: Eric Blake 
---
  block/qcow2.h| 1 +
  include/block/block_int.h| 1 +
  include/block/dirty-bitmap.h | 1 +
  block/dirty-bitmap.c | 9 +
  block/qcow2-bitmap.c | 7 +++
  block/qcow2.c| 1 +
  6 files changed, 20 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index f4de0a27d5c3..fb2b2b5a7b4d 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -764,6 +764,7 @@ bool qcow2_co_can_store_new_dirty_bitmap(BlockDriverState 
*bs,
  int qcow2_co_remove_persistent_dirty_bitmap(BlockDriverState *bs,
  const char *name,
  Error **errp);
+bool qcow2_dirty_bitmap_supported(BlockDriverState *bs);

  ssize_t coroutine_fn
  qcow2_co_compress(BlockDriverState *bs, void *dest, size_t dest_size,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index df6d0273d679..cb1082da4c43 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -560,6 +560,7 @@ struct BlockDriver {
   uint64_t parent_perm, uint64_t parent_shared,
   uint64_t *nperm, uint64_t *nshared);

+bool (*bdrv_dirty_bitmap_supported)(BlockDriverState *bs);


All BDSs support bitmaps, but only some support persistent dirty
bitmaps, so I think the name should reflect that.


How about .bdrv_dirty_bitmap_supports_persistent?



Conceptually, this looks reasonable.  This information might indeed be
nice to have, and I’m not sure whether we should extend any existing
interface to return it.

(The interfaces that come to my mind are:
(1) bdrv_can_store_new_dirty_bitmap() below, which we could make accept
a NULL @name to return basically the same information.  But it’s still a
bit different, because I’d expect that function to return whether any
bitmap can be stored then, not whether the node supports bitmaps at all.
  So e.g. if there are already too many bitmaps, it should return false,
even though the node itself does support bitmaps.


[which reminds me - a while ago, we had patches for qcow2 handling with 
64k bitmaps, or whatever insane number it took to overflow data 
structures, and I don't know if those ever got applied...]




(2) bdrv_get_info()/BlockDriverInfo: This information would fit in very
nicely here, but do we have to put it here just because it does?  I
don’t think so.  This patch adds 20 lines of code, that shows that it’s
very simple to add a dedicated method, and it’s certainly a bit easier
to use than to invoke bdrv_get_info() and throw away all the other
information.  Perhaps this patch only shows that BlockDriverInfo doesn’t
make much sense in the first place, and most of its fields should have
been scalar return values from dedicated functions.)


Indeed, you (re-)discovered some of the very reasons why I chose to make 
a new interface.  I could tweak the commit message to mention 
alternatives, if that would help.
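
As a rough illustration of the "dedicated optional hook" shape discussed here, the following sketch shows a driver table with a nullable callback and a wrapper that treats a missing hook as "not supported". All Ex* names are hypothetical stand-ins, not the QEMU BlockDriver or BlockDriverState types, and the version check in the sample hook is only an assumption about what a qcow2-like driver might test.

```c
#include <stdbool.h>
#include <stddef.h>

struct ExNode;

/* Driver table with an optional hook; NULL means "never supported". */
struct ExDriver {
    bool (*supports_persistent_bitmaps)(struct ExNode *node);
};

struct ExNode {
    const struct ExDriver *drv;
    int format_version;
};

/* Generic wrapper: drivers without the hook report false. */
static bool ex_node_supports_persistent_bitmaps(struct ExNode *node)
{
    if (!node || !node->drv || !node->drv->supports_persistent_bitmaps) {
        return false;
    }
    return node->drv->supports_persistent_bitmaps(node);
}

/* Sample hook for a qcow2-like format: only version 3 and later qualify. */
static bool ex_qcow2_like_hook(struct ExNode *node)
{
    return node->format_version >= 3;
}
```

Callers get a single scalar answer without filling in and discarding a larger info struct, which is the ease-of-use argument made above.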


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [RFC PATCH 7/8] riscv: Add RV64M instructions description

2020-05-11 Thread Richard Henderson
On 4/30/20 12:21 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei 
> ---
>  riscv64.risu | 43 +++
>  1 file changed, 43 insertions(+)
> 
> diff --git a/riscv64.risu b/riscv64.risu
> index 98141ab..f006dc8 100644
> --- a/riscv64.risu
> +++ b/riscv64.risu
> @@ -139,3 +139,46 @@ SRLW RISCV 000 rs2:5 rs1:5 101 rd:5 0011011 \
>  
>  SRAW RISCV 010 rs2:5 rs1:5 101 rd:5 0011011 \
>  !constraints { $rd != 2 && $rd != 3 && $rd != 4 && $rs1 != 2 }
> +
> +@RV64M
> +
> +MUL RISCV 001 rs2:5 rs1:5 000 rd:5 0110011 \
> +!constraints { $rd != 2 && $rd != 3 && $rd != 4 && $rs1 != 2 && $rs2 != 2 }
> +

Modulo the use of a helper function,
Reviewed-by: Richard Henderson 


r~



Re: [PATCH 0/5] Introduce 'yank' oob qmp command to recover from hanging qemu

2020-05-11 Thread Lukas Straub
On Mon, 11 May 2020 12:49:47 +0100
Daniel P. Berrangé  wrote:

> On Mon, May 11, 2020 at 01:14:34PM +0200, Lukas Straub wrote:
> > Hello Everyone,
> > In many cases, if qemu has a network connection (qmp, migration, chardev, 
> > etc.)
> > to some other server and that server dies or hangs, qemu hangs too.  
> 
> If qemu as a whole hangs due to a stalled network connection, that is a
> bug in QEMU that we should be fixing IMHO. QEMU should be doing non-blocking
> I/O in general, such that if the network connection or remote server stalls,
> we simply stop sending I/O - we shouldn't ever hang the QEMU process or main
> loop.
> 
> There are places in QEMU code which are not well behaved in this respect,
> but many are, and others are getting fixed where found to be important.
> 
> Arguably any place in QEMU code which can result in a hang of QEMU in the
> event of a stalled network should be considered a security flaw, because
> the network is untrusted in general.

The fact that out-of-band qmp commands exist at all shows that we have to make 
tradeoffs of developer time vs. doing things right. Sure, the migration code 
can be rewritten to use non-blocking i/o and fine-grained locks. But as a 
hobbyist I don't have time to fix this.

> > These patches introduce the new 'yank' out-of-band qmp command to recover 
> > from
> > these kinds of hangs. The different subsystems register callbacks which get
> > executed with the yank command. For example the callback can shutdown() a
> > socket. This is intended for the colo use-case, but it can be used for other
> > things too of course.  
> 
> IIUC, invoking the "yank" command unconditionally kills every single
> network connection in QEMU that has registered with the "yank" subsystem.
> IMHO this is way too big of a hammer, even if we accept there are bugs in
> QEMU not handling stalled networking well.
> 
> eg if a chardev hangs QEMU, and we tear down everything, killing the NBD
> connection used for the guest disk, we needlessly break I/O.

Yeah, these patches are intended to solve the problems with the colo use-case 
where all external connections (migration, chardevs, nbd) are just for 
replication. In other use-cases you'd enable the yank feature only on the 
non-essential connections.
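
For anyone who has not read the patches, the registration idea can be sketched roughly as follows. This is a toy model under stated assumptions: the ex_yank_* names, the fixed-size table, and the lack of locking are all mine, not the series' actual API. In real use a callback would do something like shutdown(fd, SHUT_RDWR) on its socket, and the table would need protection because the out-of-band command runs outside the main loop.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void ExYankFn(void *opaque);

#define EX_YANK_MAX 16

static struct {
    ExYankFn *fn;
    void *opaque;
} ex_yank_table[EX_YANK_MAX];
static size_t ex_yank_count;

/* A subsystem opts in by registering a callback for one connection. */
static bool ex_yank_register(ExYankFn *fn, void *opaque)
{
    if (!fn || ex_yank_count == EX_YANK_MAX) {
        return false;
    }
    ex_yank_table[ex_yank_count].fn = fn;
    ex_yank_table[ex_yank_count].opaque = opaque;
    ex_yank_count++;
    return true;
}

/* The "yank" command: run every registered callback, return how many ran. */
static size_t ex_yank_all(void)
{
    size_t i;

    for (i = 0; i < ex_yank_count; i++) {
        ex_yank_table[i].fn(ex_yank_table[i].opaque);
    }
    return ex_yank_count;
}

/* Demo callback that just counts invocations. */
static void ex_count_cb(void *opaque)
{
    ++*(int *)opaque;
}
```

Only connections that registered are touched, which is how a non-COLO setup could leave an essential NBD connection out.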

> eg doing this in the chardev backend is not desirable, because the bugs
> with hanging QEMU are typically caused by the way the frontend device
> uses the chardev blocking I/O calls, instead of non-blocking I/O calls.
> 
> 
> Regards,
> Daniel





Re: [RFC PATCH 8/8] riscv: Add RV64F instructions description

2020-05-11 Thread Richard Henderson
On 4/30/20 12:21 AM, LIU Zhiwei wrote:
> +FCVT_L_S RISCV 110 00010 rs1:5 rm:3 rd:5 1010011 \
> +!constraints { $rd != 2 && $rd != 3 && $rd != 4 && $rm != 6 && $rm != 5 }
> +
> +FCVT_LU_S RISCV 110 00011 rs1:5 rm:3 rd:5 1010011 \
> +!constraints { $rd != 2 && $rd != 3 && $rd != 4 && $rm != 6 && $rm != 5 }
> +
> +FCVT_S_L RISCV 1101000 00010 rs1:5 rm:3 rd:5 1010011 \
> +!constraints { $rs1 != 2 && $rm != 6 && $rm != 5 }
> +
> +FCVT_S_LU RISCV 1101000 00011 rs1:5 rm:3 rd:5 1010011 \
> +!constraints { $rs1 != 2 && $rm != 6 && $rm != 5 }

Interesting question here: Do we really want to avoid the reserved rounding
modes, or do we want to verify that we raise an invalid operand exception?

I guess I'm fine with it either way.
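
For reference, the constraint `$rm != 6 && $rm != 5` in the quoted patterns corresponds to the two reserved static rounding-mode encodings. A minimal sketch of that check follows; the helper names are hypothetical, and it assumes the standard F-extension layout where rm sits in instruction bits 14:12 and the value 7 selects the dynamic mode from the frm CSR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract the rm field (bits 14:12) from a 32-bit instruction word. */
static unsigned ex_rm_field(uint32_t insn)
{
    return (insn >> 12) & 0x7u;
}

/*
 * Encodings 0..4 are the static rounding modes and 7 means "use frm";
 * 5 and 6 are the reserved values the constraints above exclude.
 */
static bool ex_rm_is_reserved(unsigned rm)
{
    return rm == 5 || rm == 6;
}
```

A test generator that wanted to cover the exception path instead would emit exactly the encodings this predicate flags, and would then need the test harness to tolerate the resulting trap.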


r~



Re: [Bug 1877384] Re: 9pfs file create with mapped-xattr can fail on overlayfs

2020-05-11 Thread Fishface60
I've tested it (eventually, hit
https://github.com/torvalds/linux/commit/467d12f5c7842896d2de3ced74e4147ee29e97c8
while trying to build it),
it doesn't help, since my program wasn't failing from attempting to
use O_NOATIME.

The following patch fixed the -ENOENT on file create for me. I also
applied the fix to symlink. Potentially it could happen to mknod and
other calls that create a new directory entry, which couldn't be
simply fixed by altering the open file, but I've not encountered
issues there.
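
The attached patch is not reproduced in this message, so the following is only a sketch of the approach suggested in the bug description: set the attribute through the file descriptor returned by the create call instead of through a directory fd plus name. The helper name ex_set_virtfs_uid() is hypothetical, and storing the value as four little-endian bytes is an assumption based on the 4-byte value visible in the strace excerpt. Note that, as far as I know, fsetxattr() needs a regular open descriptor and does not work on one opened with O_PATH.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Set user.virtfs.uid on an already-open file. Because the call goes
 * through the fd of the file that was just created, it cannot be
 * redirected to a stale copy of the parent directory.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int ex_set_virtfs_uid(int fd, uint32_t uid)
{
    /* Assumption: value is stored as 4 little-endian bytes. */
    unsigned char le[4] = {
        (unsigned char)(uid & 0xff),
        (unsigned char)((uid >> 8) & 0xff),
        (unsigned char)((uid >> 16) & 0xff),
        (unsigned char)((uid >> 24) & 0xff),
    };

    return fsetxattr(fd, "user.virtfs.uid", le, sizeof(le), 0);
}
```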

On Sat, 9 May 2020 at 15:05, Christian Schoenebeck
<1877...@bugs.launchpad.net> wrote:
>
> Since the report is about overlayfs being involved, could you please try if
> the following patch makes a difference?
>
> https://github.com/gkurz/qemu/commit/f7f5a1b01307af1c7b6c94672f2ce75c36f10565
>
> It's not yet on master, but will be soon.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1877384
>
> Title:
>   9pfs file create with mapped-xattr can fail on overlayfs
>
> Status in QEMU:
>   New
>
> Bug description:
>   QEMU Version: 3.1.0 as packaged in debian buster, but the code appears to 
> do the same in master.
>   qemu command-line: qemu-system-x86_64 -m 1G -nographic -nic 
> "user,model=virtio-net-pci,tftp=$(pwd),net=10.0.2.0/24,host=10.0.2.2" -fsdev 
> local,id=fs,path=$thisdir/..,security_model=mapped-xattr -device 
> virtio-9p-pci,fsdev=fs,mount_tag=fs -drive 
> "file=$rootdisk,if=virtio,format=raw" -kernel "$kernel" -initrd "$initrd" 
> -append "$append"
>
>
>   I'm using CI that runs in a Docker container and runs a qemu VM with code 
> and results shared via virtio 9p.
>   The 9p fsdev is configured with security_model=mapped-xattr
>   When the test code attempts to create a log file in an existing directory, 
> open with O_CREAT fails with -ENOENT.
>
>   The relevant strace excerpt is:
>
>   28791 openat(11, ".", O_RDONLY|O_NOFOLLOW|O_PATH|O_DIRECTORY) = 20
>   28791 openat(20, "src", 
> O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_DIRECTORY) = 21
>   28791 fcntl(21, F_SETFL, O_RDONLY|O_DIRECTORY) = 0
>   28791 close(20) = 0
>   28791 openat(21, "client.log", 
> O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW, 0600) = 20
>   28791 fcntl(20, F_SETFL, O_WRONLY|O_CREAT|O_NONBLOCK|O_NOFOLLOW) = 0
>   28791 lsetxattr("/proc/self/fd/21/client.log", "user.virtfs.uid", "\0\0\0", 
> 4, 0) = -1 ENOENT (No such file or directory)
>
>   My hypothesis for what's going wrong is since the Docker container's
>   overlayfs copies-up on writes, when it opens the file it's created a
>   new version of the `src` directory containing a `client.log`, but this
>   new src directory isn't accessible by file descriptor 20 and the
>   lsetxattr call is instead attempting to set attributes on the path in
>   the old `src` directory.
>
>   Looking at the code, a fix would be to change `hw/9pfs/9p-local.c` and
>   change `local_open2` to instead of calling `local_set_xattrat` to set
>   the xattrs by directory file descriptor and file name, to have a
>   version of `local_set_xattrat` which uses `fsetxattr` to set the virtfs
>   attributes instead of the `fsetxattrat_nofollow` helper.
>
>   This reliably happened for me in CI, but I don't have access to the CI
>   host or the time to strip the test down to make a minimal test case,
>   and had difficulty reproducing the error on other machines.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/qemu/+bug/1877384/+subscriptions


** Patch added: "0001-9pfs-Fix-ENOENT-on-overlayfs.patch"
   
https://bugs.launchpad.net/bugs/1877384/+attachment/5369986/+files/0001-9pfs-Fix-ENOENT-on-overlayfs.patch

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1877384

Title:
  9pfs file create with mapped-xattr can fail on overlayfs

Status in QEMU:
  New


[Bug 1878067] [NEW] Assertion failure in eth_get_gso_type through the e1000e

2020-05-11 Thread Alexander Bulekov
Public bug reported:

Hello,
While fuzzing, I found an input that triggers an assertion failure in
eth_get_gso_type through the e1000e:

#1  0x7685755b in __GI_abort () at abort.c:79
#2  0x77c75dc3 in  () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x77cd0b0a in g_assertion_message_expr () at 
/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#4  0x56875f33 in eth_get_gso_type (l3_proto=, 
l3_hdr=, l4proto=) at 
/home/alxndr/Development/qemu/net/eth.c:76
#5  0x565e09ac in net_tx_pkt_get_gso_type (pkt=0x63114800, 
tso_enable=0x1) at /home/alxndr/Development/qemu/hw/net/net_tx_pkt.c:300
#6  0x565e09ac in net_tx_pkt_build_vheader (pkt=0x63114800, 
tso_enable=, csum_enable=, gso_size=) at /home/alxndr/Development/qemu/hw/net/net_tx_pkt.c:316
#7  0x5660bdb1 in e1000e_setup_tx_offloads (core=0x7fffeeb754e0, 
tx=0x7fffeeb95748) at /home/alxndr/Development/qemu/hw/net/e1000e_core.c:637
#8  0x5660bdb1 in e1000e_tx_pkt_send (core=0x7fffeeb754e0, 
tx=0x7fffeeb95748, queue_index=) at 
/home/alxndr/Development/qemu/hw/net/e1000e_core.c:658
#9  0x5660bdb1 in e1000e_process_tx_desc (core=0x7fffeeb754e0, 
tx=0x7fffeeb95748, dp=, queue_index=) at 
/home/alxndr/Development/qemu/hw/net/e1000e_core.c:743
#10 0x5660bdb1 in e1000e_start_xmit (core=core@entry=0x7fffeeb754e0, 
txr=, txr@entry=0x7fffbe60) at 
/home/alxndr/Development/qemu/hw/net/e1000e_core.c:934
#11 0x56607e2e in e1000e_set_tctl (core=0x7fffeeb754e0, 
index=, val=) at 
/home/alxndr/Development/qemu/hw/net/e1000e_core.c:2431
#12 0x565f90fd in e1000e_core_write (core=, 
addr=, val=, size=) at 
/home/alxndr/Development/qemu/hw/net/e1000e_core.c:3261
#13 0x55ff4337 in memory_region_write_accessor (mr=, 
addr=, value=, size=, 
shift=, mask=, attrs=...) at 
/home/alxndr/Development/qemu/memory.c:483
#14 0x55ff3ce0 in access_with_adjusted_size (addr=, 
value=, size=, access_size_min=, 
access_size_max=, access_fn=, mr=0x7fffeeb75110, 
attrs=...) at /home/alxndr/Development/qemu/memory.c:544
#15 0x55ff3ce0 in memory_region_dispatch_write (mr=, 
addr=, data=0x2b, op=, attrs=...) at 
/home/alxndr/Development/qemu/memory.c:1476

I can reproduce it in qemu 5.0 using:
cat << EOF | ~/Development/qemu/build/i386-softmmu/qemu-system-i386 -M 
pc-q35-5.0 -netdev user,id=qtest-bn0 -device e1000e,netdev=qtest-bn0 -display 
none -nodefaults -nographic -qtest stdio -monitor none -serial none
outl 0xcf8 0x8810
outl 0xcfc 0xe000
outl 0xcf8 0x8814
outl 0xcf8 0x8804
outw 0xcfc 0x7
outl 0xcf8 0x88a2
write 0xe420 0x1fc 
0x3ff9ffdf002467ff272d2f3ff9ffdf00246fff272d2f3ff9ffdf002477ff272d2f3ff9ffdf00247fff272d2f3ff9ffdf002487ff272d2f3ff9ffdf00248fff272d2f3ff9ffdf002497ff272d2f3ff9ffdf00249fff272d2f3ff9ffdf0024a7ff272d2f3ff9ffdf0024afff272d2f3ff9ffdf0024b7ff272d2f3ff9ffdf0024bfff272d2f3ff9ffdf0024c7ff272d2f3ff9ffdf0024cfff272d2f3ff9ffdf0024d7ff272d2f3ff9ffdf0024dfff272d2f3ff9ffdf0024e7ff272d2f3ff9ffdf0024efff272d2f3ff9ffdf0024f7ff272d2f3ff9ffdf0024272d2f3ff9ffdf002407ff272d2f3ff9ffdf00240fff272d2f3ff9ffdf002417ff272d2f3ff9ffdf00241fff272d2f3ff9ffdf002427ff272d2f3ff9ffdf00242fff272d2f3ff9ffdf002437ff272d2f3ff9ffdf00243fff272d2f3ff9ffdf002447ff272d2f3ff9ffdf00244fff272d2f3ff9ffdf002457ff272d2f3ff9ffdf00245fff272d2f3ff9ffdf002467ff272d2f3ff9ffdf00246fff27
write 0xe0b8 0x349 

Re: [RFC PATCH 4/8] riscv: Implement payload load interfaces

2020-05-11 Thread Richard Henderson
On 5/11/20 11:03 AM, Richard Henderson wrote:
>> +if (m->regs[i] != a->regs[i]) {
>> +fprintf(f, "  X%-2d: %016" PRIx64 " vs %016" PRIx64 "\n",
>> +i, m->regs[i], a->regs[i]);
>> +}
> 
> riscv doesn't name its registers with an x.

Duh.  It does.  Nevermind this.


r~



Re: [RFC PATCH 6/8] riscv: Add configure script

2020-05-11 Thread Richard Henderson
On 4/30/20 12:21 AM, LIU Zhiwei wrote:
> +++ b/configure
> @@ -58,6 +58,8 @@ guess_arch() {
>  ARCH="m68k"
>  elif check_define __powerpc64__ ; then
>  ARCH="ppc64"
> +elif check_define __riscv ; then
> +ARCH="riscv64"
>  else
>  echo "This cpu is not supported by risu. Try -h. " >&2
>  exit 1

Why "riscv64" and not "riscv"?

You can't really say more without checking __riscv_xlen.
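
As a rough illustration of the point, here is how the distinction could be
made in C with the compiler's predefined macros. The function name and the
returned strings are made up for this sketch; risu's configure script does
its check in shell, not like this.

```c
#include <assert.h>

/*
 * Hypothetical helper: report the RISC-V flavour the compiler targets.
 * __riscv says "some RISC-V"; only __riscv_xlen tells 32-bit from 64-bit.
 */
const char *guess_riscv_arch(void)
{
#if defined(__riscv) && defined(__riscv_xlen)
# if __riscv_xlen == 64
    return "riscv64";
# elif __riscv_xlen == 32
    return "riscv32";
# else
    return "riscv";
# endif
#elif defined(__riscv)
    return "riscv";
#else
    return "not-riscv";
#endif
}
```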


r~



Re: [PATCH v2 0/7] RFC/WIP: Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread

2020-05-11 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20200511160951.8733-1-mlevi...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20200511160951.8733-1-mlevi...@redhat.com
Subject: [PATCH v2 0/7] RFC/WIP: Fix scsi devices plug/unplug races w.r.t 
virtio-scsi iothread
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
f10acc6 virtio-scsi: use scsi_device_get
4b28e41 scsi: Add scsi_device_get
969c784 virtio-scsi: don't touch scsi devices that are not yet realized or 
about to be un-realized
c95c33f device-core: use atomic_set on .realized property
239a8ce device-core: use RCU for list of childs of a bus
dd0c3a8 Implement drain_call_rcu and use it in hmp_device_del
cc7a085 scsi/scsi_bus: switch search direction in scsi_device_find

=== OUTPUT BEGIN ===
1/7 Checking commit cc7a085c2c59 (scsi/scsi_bus: switch search direction in 
scsi_device_find)
2/7 Checking commit dd0c3a8cfd9b (Implement drain_call_rcu and use it in 
hmp_device_del)
3/7 Checking commit 239a8cee3c60 (device-core: use RCU for list of childs of a 
bus)
4/7 Checking commit c95c33f4a7dc (device-core: use atomic_set on .realized 
property)
5/7 Checking commit 969c784b8d8e (virtio-scsi: don't touch scsi devices that 
are not yet realized or about to be un-realized)
6/7 Checking commit 4b28e41ee772 (scsi: Add scsi_device_get)
WARNING: Block comments use a trailing */ on a separate line
#29: FILE: hw/scsi/scsi-bus.c:1592:
+ * */

ERROR: braces {} are necessary for all arms of this statement
#67: FILE: hw/scsi/scsi-bus.c:1630:
+if (!dev)
[...]

total: 1 errors, 1 warnings, 66 lines checked

Patch 6/7 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

7/7 Checking commit f10acc631cf7 (virtio-scsi: use scsi_device_get)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200511160951.8733-1-mlevi...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v1 0/7] various tcg and linux-user updates

2020-05-11 Thread Laurent Vivier
Le 11/05/2020 à 13:12, Alex Bennée a écrit :
> 
> Alex Bennée  writes:
> 
>> Hi,
>>
>> Cleaning up my queues into more focused trees these are all tweaks to
>> TCG related stuff. The guest_base changes were posted before but
>> were a little radical for 5.0 but I think are worth getting in early
>> as it enables the sanitizer builds for a range of linux-user targets
>> we couldn't run before. Finally there is a little tweak made to the
>> out_asm handling which makes it a bit easier to see which guest
>> instructions are being emulated by which host code.
>>
>> The following need review:
>>
>>  - translate-all: include guest address in out_asm output
>>  - disas: add optional note support to cap_disas
>>  - disas: include an optional note for the start of disassembly
>>  - accel/tcg: don't disable exec_tb trace events
>>  - linux-user: completely re-write init_guest_space
> 
> Gentle ping,
> 
> I would especially like some feed-back on the guest base updates from
> the linux-user maintainers so we can get the sanitizers more widely
> used.
> 
> If you're happy for me to include them in my next PR I'll just take some
> Acked-by's ;-)
> 
I don't have enough time to review the changes, but if you are confident
with your patch you can add:

Acked-by: Laurent Vivier 

Thanks,
Laurent



Re: [RFC PATCH 5/8] riscv: Add standard test case

2020-05-11 Thread Richard Henderson
On 4/30/20 12:21 AM, LIU Zhiwei wrote:
> Signed-off-by: LIU Zhiwei 
> ---
>  test_riscv64.s | 85 ++
>  1 file changed, 85 insertions(+)
>  create mode 100644 test_riscv64.s

Reviewed-by: Richard Henderson 


r~



Re: [RFC PATCH 4/8] riscv: Implement payload load interfaces

2020-05-11 Thread Richard Henderson
On 4/30/20 12:21 AM, LIU Zhiwei wrote:
> +void reginfo_init(struct reginfo *ri, ucontext_t *uc)
> +{
> +int i;
> +union __riscv_mc_fp_state *fp;
> +/* necessary to be able to compare with memcmp later */
> +memset(ri, 0, sizeof(*ri));
> +
> +for (i = 0; i < 32; i++) {
> +ri->regs[i] = uc->uc_mcontext.__gregs[i];
> +}
> +
> +ri->sp = 0xdeadbeefdeadbeef;
> +ri->regs[2] = 0xdeadbeefdeadbeef;
> +ri->regs[3] = 0xdeadbeefdeadbeef;
> +ri->regs[4] = 0xdeadbeefdeadbeef;
> +ri->pc = uc->uc_mcontext.__gregs[0] - image_start_address;
> +ri->faulting_insn = *((uint32_t *) uc->uc_mcontext.__gregs[0]);
> +fp = &uc->uc_mcontext.__fpregs;
> +ri->fcsr = fp->__d.__fcsr;
> +
> +for (i = 0; i < 32; i++) {
> +ri->fregs[i] = fp->__d.__f[i];
> +}
> +}

Perhaps wrap the fp bits here in

#if __riscv_flen == 64
ri->fcsr = fp->__d.__fcsr;
...
#else
# error "Unsupported fp length"
#endif

> +if (m->regs[i] != a->regs[i]) {
> +fprintf(f, "  X%-2d: %016" PRIx64 " vs %016" PRIx64 "\n",
> +i, m->regs[i], a->regs[i]);
> +}

riscv doesn't name its registers with an x.


r~



Re: [PATCH v2 0/7] RFC/WIP: Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread

2020-05-11 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20200511160951.8733-1-mlevi...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  TEST    iotest-qcow2: 111
  TEST    iotest-qcow2: 114
**
ERROR:/tmp/qemu-test/src/qom/object.c:1124:object_unref: assertion failed: 
(obj->ref > 0)
Broken pipe
/tmp/qemu-test/src/tests/qtest/libqtest.c:175: kill_qemu() detected QEMU death 
from signal 6 (Aborted) (core dumped)
ERROR - too few tests run (expected 6, got 5)
make: *** [check-qtest-aarch64] Error 1
make: *** Waiting for unfinished jobs
  TEST    iotest-qcow2: 117
  TEST    iotest-qcow2: 120
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=f4bb7d21c4384634a322c2f5cd38b1bf', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-t97jyowh/src/docker-src.2020-05-11-13.48.16.8561:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=f4bb7d21c4384634a322c2f5cd38b1bf
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-t97jyowh/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    15m0.135s
user    0m8.625s


The full log is available at
http://patchew.org/logs/20200511160951.8733-1-mlevi...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v3 1/3] qemu-sockets: add abstract UNIX domain socket support

2020-05-11 Thread Eric Blake

On 5/10/20 1:14 AM, xiaoqiang zhao wrote:

unix_listen/connect_saddr now support abstract address types

two additional BOOL switches are introduced:
tight: whether to set @addrlen to the minimal string length,
or the maximum sun_path length. default is TRUE
abstract: whether we use abstract address. default is FALSE

cli example:
-monitor unix:/tmp/unix.socket,abstract,tight=off
OR
-chardev socket,path=/tmp/unix.socket,id=unix1,abstract,tight=on

Signed-off-by: xiaoqiang zhao 
---



+++ b/qapi/sockets.json
@@ -73,12 +73,19 @@
  # Captures a socket address in the local ("Unix socket") namespace.
  #
  # @path: filesystem path to use
+# @tight: pass a socket address length that does not include the whole
+# struct sockaddr_un record but (besides other components) only
+# the relevant part of the filename or abstract string.
+# default value is 'true'


Perhaps:

pass a socket address length confined to the minimum length of the 
abstract string, rather than the full sockaddr_un record length (only 
matters for abstract sockets, default true)



+# @abstract: whether this is a abstract address, default is 'false'


Both new fields should have a '(since 5.1)' tag, to make it obvious that 
they did not exist in earlier releases with the rest of the struct.


s/a abstract/an abstract/
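
To make the effect of 'tight' concrete, here is a small sketch of how the two
address lengths differ for an abstract socket on Linux. The helper name and
its signature are invented for illustration and are not the code from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper: fill @un with an abstract-namespace address and
 * return the address length to pass to bind() or connect().
 * tight != 0: length covers only the leading NUL plus the name bytes.
 * tight == 0: length is the full size of struct sockaddr_un.
 */
socklen_t abstract_unix_addr(struct sockaddr_un *un, const char *name,
                             int tight)
{
    size_t len = strlen(name);

    /* One byte of sun_path is taken by the leading NUL. */
    assert(len < sizeof(un->sun_path));

    memset(un, 0, sizeof(*un));
    un->sun_family = AF_UNIX;
    /* sun_path[0] stays '\0', which selects the abstract namespace. */
    memcpy(un->sun_path + 1, name, len);

    if (tight) {
        return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);
    }
    return (socklen_t)sizeof(*un);
}
```

With the full length, the trailing NUL padding becomes part of the abstract
name, so a peer has to use the same setting to reach the same socket.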

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[PATCH v1 0/1] target/microblaze: Fix FPU2 instruction check

2020-05-11 Thread Joe Komlodi
Hi all,

This fixes a backwards if statement that caused Microblaze FPU2 instructions
to not be executed, even if use-fpu=2 in the DTS.
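
For readers skimming the thread, the inverted check boils down to the
following standalone sketch. The mask value is a placeholder chosen for the
example, not the real PVR2_USE_FPU2_MASK, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder value for illustration; not taken from the QEMU headers. */
#define EXAMPLE_USE_FPU2_MASK 0x1u

/* Old behaviour: returns 0 exactly when FPU v2 is configured. */
uint32_t check_fpuv2_old(int use_fpu)
{
    return (use_fpu == 2) ? 0 : EXAMPLE_USE_FPU2_MASK;
}

/* Fixed behaviour: non-zero only when FPU v2 is configured. */
uint32_t check_fpuv2_fixed(int use_fpu)
{
    return (use_fpu == 2) ? EXAMPLE_USE_FPU2_MASK : 0;
}
```

Callers treat a zero result as "FPU2 not available", so the old form turned
the instructions off in exactly the configuration that asked for them.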

Thanks!
Joe

Joe Komlodi (1):
  target/microblaze: Fix FPU2 instruction check

 target/microblaze/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.7.4




[PATCH v1 1/1] target/microblaze: Fix FPU2 instruction check

2020-05-11 Thread Joe Komlodi
The check to see if we can use FPU2 instructions would return 0 if
cfg.use_fpu == 2, rather than returning the PVR2_USE_FPU2_MASK.

This would cause all FPU2 instructions (fsqrt, flt, fint) to not be used.

Signed-off-by: Joe Komlodi 
---
 target/microblaze/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/microblaze/translate.c b/target/microblaze/translate.c
index 4e7f903a..329743b 100644
--- a/target/microblaze/translate.c
+++ b/target/microblaze/translate.c
@@ -1391,7 +1391,7 @@ static int dec_check_fpuv2(DisasContext *dc)
 tcg_gen_movi_i64(cpu_SR[SR_ESR], ESR_EC_FPU);
 t_gen_raise_exception(dc, EXCP_HW_EXCP);
 }
-return (dc->cpu->cfg.use_fpu == 2) ? 0 : PVR2_USE_FPU2_MASK;
+return (dc->cpu->cfg.use_fpu == 2) ? PVR2_USE_FPU2_MASK : 0;
 }
 
 static void dec_fpu(DisasContext *dc)
-- 
2.7.4




Re: [RFC PATCH 3/8] riscv: Define riscv struct reginfo

2020-05-11 Thread Richard Henderson
On 4/30/20 12:21 AM, LIU Zhiwei wrote:
> +struct reginfo {
> +uint64_t fault_address;
> +uint64_t regs[32];
> +uint64_t fregs[32];
> +uint64_t sp;
> +uint64_t pc;
> +uint32_t flags;
> +uint32_t faulting_insn;
> +
> +/* FP */
> +uint32_t fcsr;
> +};

There's no need for a separate sp field, since that's regs[2].
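
Something along these lines would do, as a sketch only: the struct and
accessor names are invented here, and the field list simply mirrors the
quoted one minus sp.

```c
#include <assert.h>
#include <stdint.h>

/* x2 is the stack pointer in the standard RISC-V calling convention. */
#define RISCV_REG_SP 2

/* Sketch of the quoted struct with the redundant sp field dropped. */
struct reginfo_sketch {
    uint64_t fault_address;
    uint64_t regs[32];
    uint64_t fregs[32];
    uint64_t pc;
    uint32_t flags;
    uint32_t faulting_insn;

    /* FP */
    uint32_t fcsr;
};

/* Hypothetical accessor: the stack pointer is just regs[2]. */
uint64_t reginfo_sp(const struct reginfo_sketch *ri)
{
    return ri->regs[RISCV_REG_SP];
}
```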


r~



[Bug 1878057] [NEW] null-ptr dereference in megasas_command_complete

2020-05-11 Thread Alexander Bulekov
Public bug reported:

Hello,
While fuzzing, I found an input that triggers a null-pointer dereference in
megasas_command_complete:

==14959==ERROR: AddressSanitizer: SEGV on unknown address 0x0003 (pc 
0x55b1d11b4df1 bp 0x7ffeb55ca450 sp 0x7ffeb55ca1e0 T0)
==14959==The signal is caused by a WRITE memory access.
==14959==Hint: address points to the zero page.
#0 0x55b1d11b4df1 in megasas_command_complete 
/home/alxndr/Development/qemu/hw/scsi/megasas.c:1877:40
#1 0x55b1d11759ec in scsi_req_complete 
/home/alxndr/Development/qemu/hw/scsi/scsi-bus.c:1430:5
#2 0x55b1d115c98f in scsi_aio_complete 
/home/alxndr/Development/qemu/hw/scsi/scsi-disk.c:216:5
#3 0x55b1d151c638 in blk_aio_complete 
/home/alxndr/Development/qemu/block/block-backend.c:1375:9
#4 0x55b1d151c638 in blk_aio_complete_bh 
/home/alxndr/Development/qemu/block/block-backend.c:1385:5
#5 0x55b1d16f3a5b in aio_bh_call 
/home/alxndr/Development/qemu/util/async.c:136:5
#6 0x55b1d16f3a5b in aio_bh_poll 
/home/alxndr/Development/qemu/util/async.c:164:13
#7 0x55b1d16fe43e in aio_dispatch 
/home/alxndr/Development/qemu/util/aio-posix.c:380:5
#8 0x55b1d16f54fa in aio_ctx_dispatch 
/home/alxndr/Development/qemu/util/async.c:306:5
#9 0x7f47937c89ed in g_main_context_dispatch 
(/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4e9ed)
#10 0x55b1d16fbef4 in glib_pollfds_poll 
/home/alxndr/Development/qemu/util/main-loop.c:219:9
#11 0x55b1d16fbef4 in os_host_main_loop_wait 
/home/alxndr/Development/qemu/util/main-loop.c:242:5
#12 0x55b1d16fbef4 in main_loop_wait 
/home/alxndr/Development/qemu/util/main-loop.c:518:11
#13 0x55b1d0cd16a6 in qemu_main_loop 
/home/alxndr/Development/qemu/softmmu/vl.c:1664:9
#14 0x55b1d1608dca in main /home/alxndr/Development/qemu/softmmu/main.c:49:5
#15 0x7f4792378e0a in __libc_start_main 
/build/glibc-GwnBeO/glibc-2.30/csu/../csu/libc-start.c:308:16
#16 0x55b1d091d7b9 in _start 
(/home/alxndr/Development/qemu/build/i386-softmmu/qemu-system-i386+0x8f47b9)

I can reproduce it in qemu 5.0 using:
cat << EOF | ~/Development/qemu/build/i386-softmmu/qemu-system-i386 -M 
pc-q35-5.0 -no-shutdown -M q35 -device megasas -device scsi-cd,drive=null0 
-blockdev driver=null-co,read-zeroes=on,node-name=null0 -nographic -qtest stdio 
-monitor none -serial none
outl 0xcf8 0x80001814
outl 0xcfc 0xc021
outl 0xcf8 0x80001818
outl 0xcf8 0x80001804
outw 0xcfc 0x7
outl 0xcf8 0x80001810
outl 0xcfc 0xe10c
outl 0xcf8 0x8000f810
write 0x44b20 0x1 0x35
write 0x44b00 0x1 0x03
write 0xc021e10c0040 0x81 
0x014b0400013100014b0400013800014b0400013f00014b0400014600014b0400014d00014b0400015400014b0400015b00014b0400016200014b0400016900014b040001714b0400017700014b0400017e00014b0400018500014b0400018c00014b04
EOF

I also attached the trace to this launchpad report, in case the
formatting is broken:

qemu-system-i386 -qtest stdio -monitor none -serial none -M pc-q35-5.0
-no-shutdown -M q35 -device megasas -device scsi-cd,drive=null0
-blockdev driver=null-co,read-zeroes=on,node-name=null0 -nographic <
attachment

Please let me know if I can provide any further info.
-Alex

** Affects: qemu
 Importance: Undecided
 Status: New

** Attachment added: "attachment"
   https://bugs.launchpad.net/bugs/1878057/+attachment/5369968/+files/attachment

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1878057

Title:
  null-ptr dereference in megasas_command_complete

Status in QEMU:
  New


Re: [RFC PATCH 2/8] riscv: Generate payload scripts

2020-05-11 Thread Richard Henderson
On 4/30/20 12:21 AM, LIU Zhiwei wrote:
> +# sequence of li rd, 0x1234567887654321
> +#
> +#  0:   002471b7    lui     rd,0x247
> +#  4:   8ad1819b    addiw   rd,rd,-1875
> +#  8:   00c19193    slli    rd,rd,0xc
> +#  c:   f1118193    addi    rd,rd,-239 # 0x246f11
> +# 10:   00d19193    slli    rd,rd,0xd
> +# 14:   d9518193    addi    rd,rd,-619
> +# 18:   00e19193    slli    rd,rd,0xe
> +# 1c:   32118193    addi    rd,rd,801

You don't really need to use addiw.  Removing that special case would really
simplify this.
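
As a quick sanity check for this particular constant, the sequence can be
modelled in C with plain addi in place of addiw. This is only an arithmetic
sketch with made-up helper names. It does not cover constants where the
32-bit sign extension done by lui or addiw would change the result.

```c
#include <assert.h>
#include <stdint.h>

/* addi: add a sign-extended 12-bit immediate with 64-bit wraparound. */
static uint64_t addi(uint64_t rs, int32_t imm)
{
    return rs + (uint64_t)(int64_t)imm;
}

/* Model of the quoted li sequence, using addi where it had addiw. */
uint64_t li_sequence_addi_only(void)
{
    uint64_t rd = (uint64_t)0x247 << 12;  /* lui  rd,0x247      */
    rd = addi(rd, -1875);                 /* addi rd,rd,-1875   */
    rd <<= 12;                            /* slli rd,rd,0xc     */
    rd = addi(rd, -239);                  /* addi rd,rd,-239    */
    rd <<= 13;                            /* slli rd,rd,0xd     */
    rd = addi(rd, -619);                  /* addi rd,rd,-619    */
    rd <<= 14;                            /* slli rd,rd,0xe     */
    rd = addi(rd, 801);                   /* addi rd,rd,801     */
    return rd;
}
```

The intermediate value after the first addi is 0x2468ad, which fits in a
positive 32-bit value, so addi and addiw agree at that step.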

> +sub write_memblock_setup()
> +{
> +# Write code which sets up the memory block for loads and stores.
> +# We set r0 to point to a block of 16K length
> +# of random data, aligned to the maximum desired alignment.
> +
> +my $align = $MAXALIGN;
> +my $datalen = 16384 + $align;

risu.h:#define MEMBLOCKLEN 8192

Why are you using 16384?

Also, typo -- you're setting r10 not r0, obviously.

The rest looks fine.


r~



[Bug 1878054] [NEW] Hang with high CPU usage in sdhci_data_transfer

2020-05-11 Thread Alexander Bulekov
Public bug reported:

Hello,
While fuzzing, I found an input that causes QEMU to hang with 100% CPU usage.
I have waited several minutes, and QEMU is still unresponsive. Using gdb, it
appears that it is stuck in sdhci_data_transfer:

#0   memory_region_access_valid (mr=, addr=0x10284920, 
size=, is_write=0xff, attrs=...) at 
/home/alxndr/Development/qemu/memory.c:1378
#1   memory_region_dispatch_write (mr=, addr=, 
data=, op=MO_32, attrs=...) at 
/home/alxndr/Development/qemu/memory.c:1463
#2   flatview_write_continue (fv=, addr=0x10284920, attrs=..., 
ptr=, len=0xb7, addr1=0x582798e0, l=, 
mr=0x582798e0 ) at 
/home/alxndr/Development/qemu/exec.c:3137
#3   flatview_write (fv=0x60645da0, addr=, attrs=..., 
buf=, len=) at 
/home/alxndr/Development/qemu/exec.c:3177
#4   address_space_write (as=, addr=, attrs=..., 
buf=0xb04f325, len=0x4) at /home/alxndr/Development/qemu/exec.c:3268
#5   address_space_rw (as=0x572509ac , 
addr=0x582798e0, attrs=..., attrs@entry=..., buf=0xb04f325, len=0x4, 
is_write=0xb8, is_write@entry=0x1) at
/home/alxndr/Development/qemu/exec.c:3278
#6   dma_memory_rw_relaxed (as=0x572509ac , 
addr=0x582798e0, buf=0xb04f325, len=0x4, dir=DMA_DIRECTION_FROM_DEVICE) 
at /home/alxndr/Development/qemu/include/sysemu/dma.h:87
#7   dma_memory_rw (as=0x572509ac , 
addr=0x582798e0, buf=0xb04f325, len=0x4, dir=DMA_DIRECTION_FROM_DEVICE) 
at /home/alxndr/Development/qemu/include/sysemu/dma.h:110
#8   dma_memory_write (as=0x572509ac , 
addr=0x582798e0, buf=0xb04f325, len=0x4) at 
/home/alxndr/Development/qemu/include/sysemu/dma.h:122
#9   sdhci_sdma_transfer_multi_blocks (s=) at 
/home/alxndr/Development/qemu/hw/sd/sdhci.c:618
#10  sdhci_data_transfer (opaque=0x61e21080) at 
/home/alxndr/Development/qemu/hw/sd/sdhci.c:891
#11  sdhci_send_command (s=0x61e21080) at 
/home/alxndr/Development/qemu/hw/sd/sdhci.c:364
#12  sdhci_write (opaque=, offset=0xc, val=, 
size=) at /home/alxndr/Development/qemu/hw/sd/sdhci.c:1158
#13  memory_region_write_accessor (mr=, addr=, 
value=, size=, shift=, 
mask=, attrs=...) at
/home/alxndr/Development/qemu/memory.c:483
#14  access_with_adjusted_size (addr=, value=, 
size=, access_size_min=, 
access_size_max=, access_fn=, mr=0x61e219f0, 
attrs=...) at /home/alxndr/Development/qemu/memory.c:544
#15  memory_region_dispatch_write (mr=, addr=, 
data=0x1ffe0ff, op=, attrs=...) at 
/home/alxndr/Development/qemu/memory.c:1476
#16  flatview_write_continue (fv=, addr=0xe106800c, attrs=..., 
ptr=, len=0xff3, addr1=0x582798e0, l=, 
mr=0x61e219f0) at /home/alxndr/Development/qemu/exec.c:3137
#17  flatview_write (fv=0x60645da0, addr=, attrs=..., 
buf=, len=) at 
/home/alxndr/Development/qemu/exec.c:3177
#18  address_space_write (as=, addr=, attrs=..., 
attrs@entry=..., buf=0xb04f325, buf@entry=0x6218ad00, len=0x4) at 
/home/alxndr/Development/qemu/exec.c:3268
#19  qtest_process_command (chr=, chr@entry=0x5827c040 
, words=) at /home/alxndr/Development/qemu/qtest.c:567
#20  qtest_process_inbuf (chr=0x5827c040 , inbuf=0x6190f640) 
at /home/alxndr/Development/qemu/qtest.c:710


I am attaching the qtest commands for reproducing it.
I can reproduce it in a qemu 5.0 build using:

qemu-system-i386 -M pc-q35-5.0 -qtest stdio -device sdhci-pci,sd-spec-
version=3 -device sd-card,drive=mydrive -drive if=sd,index=0,file=null-
co://,format=raw,id=mydrive -nographic -nographic -serial none -monitor
none < attachment

Please let me know if I can provide any further info.
-Alex

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1878054

Title:
  Hang with high CPU usage in sdhci_data_transfer

Status in QEMU:
  New


[Bug 1878054] Re: Hang with high CPU usage in sdhci_data_transfer

2020-05-11 Thread Alexander Bulekov
Forgot the attachment..

** Attachment added: "attachment"
   
https://bugs.launchpad.net/qemu/+bug/1878054/+attachment/5369967/+files/attachment

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1878054

Title:
  Hang with high CPU usage in sdhci_data_transfer

Status in QEMU:
  New

Bug description:
  Hello,
  While fuzzing, I found an input that causes QEMU to hang with 100% CPU usage.
  I have waited several minutes, and QEMU is still unresponsive. Using gdb, It
  appears that it is stuck in an sdhci_data_transfer:

  #0   memory_region_access_valid (mr=, addr=0x10284920, 
size=, is_write=0xff, attrs=...) at 
/home/alxndr/Development/qemu/memory.c:1378
  #1   memory_region_dispatch_write (mr=, addr=, 
data=, op=MO_32, attrs=...) at 
/home/alxndr/Development/qemu/memory.c:1463
  #2   flatview_write_continue (fv=, addr=0x10284920, attrs=..., 
ptr=, len=0xb7, addr1=0x582798e0, l=, 
mr=0x582798e0 ) at 
/home/alxndr/Development/qemu/exec.c:3137
  #3   flatview_write (fv=0x60645da0, addr=, attrs=..., 
buf=, len=) at 
/home/alxndr/Development/qemu/exec.c:3177
  #4   address_space_write (as=, addr=, 
attrs=..., buf=0xb04f325, len=0x4) at 
/home/alxndr/Development/qemu/exec.c:3268
  #5   address_space_rw (as=0x572509ac , 
addr=0x582798e0, attrs=..., attrs@entry=..., buf=0xb04f325, len=0x4, 
is_write=0xb8, is_write@entry=0x1) at
  /home/alxndr/Development/qemu/exec.c:3278
  #6   dma_memory_rw_relaxed (as=0x572509ac , 
addr=0x582798e0, buf=0xb04f325, len=0x4, dir=DMA_DIRECTION_FROM_DEVICE) 
at /home/alxndr/Development/qemu/include/sysemu/dma.h:87
  #7   dma_memory_rw (as=0x572509ac , 
addr=0x582798e0, buf=0xb04f325, len=0x4, dir=DMA_DIRECTION_FROM_DEVICE) 
at /home/alxndr/Development/qemu/include/sysemu/dma.h:110
  #8   dma_memory_write (as=0x572509ac , 
addr=0x582798e0, buf=0xb04f325, len=0x4) at 
/home/alxndr/Development/qemu/include/sysemu/dma.h:122
  #9   sdhci_sdma_transfer_multi_blocks (s=) at 
/home/alxndr/Development/qemu/hw/sd/sdhci.c:618
  #10  sdhci_data_transfer (opaque=0x61e21080) at 
/home/alxndr/Development/qemu/hw/sd/sdhci.c:891
  #11  sdhci_send_command (s=0x61e21080) at 
/home/alxndr/Development/qemu/hw/sd/sdhci.c:364
  #12  sdhci_write (opaque=, offset=0xc, val=, 
size=) at /home/alxndr/Development/qemu/hw/sd/sdhci.c:1158
  #13  memory_region_write_accessor (mr=, addr=, 
value=, size=, shift=, 
mask=, attrs=...) at
  /home/alxndr/Development/qemu/memory.c:483
  #14  access_with_adjusted_size (addr=, value=, 
size=, access_size_min=, 
access_size_max=, access_fn=, mr=0x61e219f0, 
attrs=...) at /home/alxndr/Development/qemu/memory.c:544
  #15  memory_region_dispatch_write (mr=, addr=, 
data=0x1ffe0ff, op=, attrs=...) at 
/home/alxndr/Development/qemu/memory.c:1476
  #16  flatview_write_continue (fv=, addr=0xe106800c, attrs=..., 
ptr=, len=0xff3, addr1=0x582798e0, l=, 
mr=0x61e219f0) at /home/alxndr/Development/qemu/exec.c:3137
  #17  flatview_write (fv=0x60645da0, addr=, attrs=..., 
buf=, len=) at 
/home/alxndr/Development/qemu/exec.c:3177
  #18  address_space_write (as=, addr=, 
attrs=..., attrs@entry=..., buf=0xb04f325, buf@entry=0x6218ad00, 
len=0x4) at /home/alxndr/Development/qemu/exec.c:3268
  #19  qtest_process_command (chr=, chr@entry=0x5827c040 
, words=) at /home/alxndr/Development/qemu/qtest.c:567
  #20  qtest_process_inbuf (chr=0x5827c040 , 
inbuf=0x6190f640) at /home/alxndr/Development/qemu/qtest.c:710

  
  I am attaching the qtest commands for reproducing it.
  I can reproduce it in a qemu 5.0 build using:

  qemu-system-i386 -M pc-q35-5.0 -qtest stdio \
    -device sdhci-pci,sd-spec-version=3 \
    -device sd-card,drive=mydrive \
    -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \
    -nographic -nographic -serial none -monitor none < attachment

  Please let me know if I can provide any further info.
  -Alex

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1878054/+subscriptions



Re: [PATCH 3/5] block/nbd.c: Add yank feature

2020-05-11 Thread Dr. David Alan Gilbert
* Lukas Straub (lukasstra...@web.de) wrote:
> On Mon, 11 May 2020 17:19:09 +0100
> "Dr. David Alan Gilbert"  wrote:
> 
> > * Lukas Straub (lukasstra...@web.de) wrote:
> > > Add yank option, pass it to the socket-channel and register a yank
> > > function which sets s->state = NBD_CLIENT_QUIT. This is the same
> > > behaviour as if an error occurred.
> > > 
> > > Signed-off-by: Lukas Straub   
> > 
> > > +static void nbd_yank(void *opaque)
> > > +{
> > > +BlockDriverState *bs = opaque;
> > > +BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
> > > +
> > > > +atomic_set(&s->state, NBD_CLIENT_QUIT);
> > 
> > I think I was expecting a shutdown on the socket here - why doesn't it
> > have one?
> 
> For nbd, we register two yank functions: This one and we enable the yank 
> feature on the qio channel (see function nbd_establish_connection below).

Oh I see; yeh that still surprises me a little; I'd expected one yank
per item.

Dave
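
For readers following the exchange above: the arrangement under discussion is one connection with two separate yank callbacks, one that marks the client state as quit and one that shuts the channel down. The following is only a minimal sketch of that pattern in plain C. It is not QEMU's yank implementation; every name in it (sketch_yank_register, sketch_yank_all, ToyClient, and so on) is hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a yank registry; NOT QEMU's actual code. */
typedef void (YankFn)(void *opaque);

#define SKETCH_MAX_YANK 8

static struct {
    YankFn *fn;
    void *opaque;
} sketch_table[SKETCH_MAX_YANK];

/* Store a callback in the first free slot; returns 0 on success, -1 if full. */
static int sketch_yank_register(YankFn *fn, void *opaque)
{
    for (size_t i = 0; i < SKETCH_MAX_YANK; i++) {
        if (!sketch_table[i].fn) {
            sketch_table[i].fn = fn;
            sketch_table[i].opaque = opaque;
            return 0;
        }
    }
    return -1;
}

/* Remove the entry matching both the callback and its opaque pointer. */
static void sketch_yank_unregister(YankFn *fn, void *opaque)
{
    for (size_t i = 0; i < SKETCH_MAX_YANK; i++) {
        if (sketch_table[i].fn == fn && sketch_table[i].opaque == opaque) {
            sketch_table[i].fn = NULL;
            sketch_table[i].opaque = NULL;
            return;
        }
    }
}

/* Invoke every registered callback, as a "yank" command would. */
static void sketch_yank_all(void)
{
    for (size_t i = 0; i < SKETCH_MAX_YANK; i++) {
        if (sketch_table[i].fn) {
            sketch_table[i].fn(sketch_table[i].opaque);
        }
    }
}

/* Toy client with the two pieces of state the thread talks about. */
enum { TOY_CONNECTED, TOY_QUIT };

typedef struct {
    int state;        /* analogous to the client state set by the yank function */
    int socket_open;  /* analogous to the channel that gets shut down */
} ToyClient;

/* First callback: only flips the state, like the function in the patch. */
static void toy_state_yank(void *opaque)
{
    ToyClient *c = opaque;
    c->state = TOY_QUIT;
}

/* Second callback: stands in for the channel-level shutdown. */
static void toy_channel_yank(void *opaque)
{
    ToyClient *c = opaque;
    c->socket_open = 0;
}

/* Registers both callbacks, yanks, then unregisters; returns 0 if both fired. */
static int toy_demo(void)
{
    ToyClient c = { TOY_CONNECTED, 1 };

    sketch_yank_register(toy_state_yank, &c);
    sketch_yank_register(toy_channel_yank, &c);
    sketch_yank_all();
    sketch_yank_unregister(toy_state_yank, &c);
    sketch_yank_unregister(toy_channel_yank, &c);

    return (c.state == TOY_QUIT && c.socket_open == 0) ? 0 : -1;
}
```

Calling toy_demo() from a main() exercises both callbacks. With "one yank per item", the two callbacks would instead be folded into a single function registered once per connection.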

> Regards,
> Lukas Straub
> 
> > Dave
> > 
> > > +}
> > > +
> > >  static void nbd_client_close(BlockDriverState *bs)
> > >  {
> > >  BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
> > > @@ -1407,14 +1421,17 @@ static void nbd_client_close(BlockDriverState *bs)
> > >  nbd_teardown_connection(bs);
> > >  }
> > >  
> > > -static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
> > > +static QIOChannelSocket *nbd_establish_connection(BlockDriverState *bs,
> > > +  SocketAddress *saddr,
> > >Error **errp)
> > >  {
> > > +BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
> > >  QIOChannelSocket *sioc;
> > >  Error *local_err = NULL;
> > >  
> > >  sioc = qio_channel_socket_new();
> > >  qio_channel_set_name(QIO_CHANNEL(sioc), "nbd-client");
> > > +qio_channel_set_yank(QIO_CHANNEL(sioc), s->yank);
> > >  
> > > >  qio_channel_socket_connect_sync(sioc, saddr, &local_err);
> > >  if (local_err) {
> > > @@ -1438,7 +1455,7 @@ static int nbd_client_connect(BlockDriverState *bs, 
> > > Error **errp)
> > >   * establish TCP connection, return error if it fails
> > >   * TODO: Configurable retry-until-timeout behaviour.
> > >   */
> > > -QIOChannelSocket *sioc = nbd_establish_connection(s->saddr, errp);
> > > +QIOChannelSocket *sioc = nbd_establish_connection(bs, s->saddr, 
> > > errp);
> > >  
> > >  if (!sioc) {
> > >  return -ECONNREFUSED;
> > > @@ -1829,6 +1846,12 @@ static QemuOptsList nbd_runtime_opts = {
> > >  "future requests before a successful reconnect will "
> > >  "immediately fail. Default 0",
> > >  },
> > > +{
> > > +.name = "yank",
> > > +.type = QEMU_OPT_BOOL,
> > > +.help = "Forcibly close the connection and don't attempt to "
> > > +"reconnect when the 'yank' qmp command is executed.",
> > > +},
> > >  { /* end of list */ }
> > >  },
> > >  };
> > > @@ -1888,6 +1911,8 @@ static int nbd_process_options(BlockDriverState 
> > > *bs, QDict *options,
> > >  
> > >  s->reconnect_delay = qemu_opt_get_number(opts, "reconnect-delay", 0);
> > >  
> > > +s->yank = qemu_opt_get_bool(opts, "yank", false);
> > > +
> > >  ret = 0;
> > >  
> > >   error:
> > > @@ -1921,6 +1946,10 @@ static int nbd_open(BlockDriverState *bs, QDict 
> > > *options, int flags,
> > >  /* successfully connected */
> > >  s->state = NBD_CLIENT_CONNECTED;
> > >  
> > > +if (s->yank) {
> > > +yank_register_function(nbd_yank, bs);
> > > +}
> > > +
> > >  s->connection_co = qemu_coroutine_create(nbd_connection_entry, s);
> > >  bdrv_inc_in_flight(bs);
> > >  aio_co_schedule(bdrv_get_aio_context(bs), s->connection_co);
> > > @@ -1972,6 +2001,11 @@ static void nbd_close(BlockDriverState *bs)
> > >  BDRVNBDState *s = bs->opaque;
> > >  
> > >  nbd_client_close(bs);
> > > +
> > > +if (s->yank) {
> > > +yank_unregister_function(nbd_yank, bs);
> > > +}
> > > +
> > >  nbd_clear_bdrvstate(s);
> > >  }
> > >  
> > > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > > index 943df1926a..1c1578160e 100644
> > > --- a/qapi/block-core.json
> > > +++ b/qapi/block-core.json
> > > @@ -3862,6 +3862,8 @@
> > >  #   reconnect. After that time, any delayed requests and 
> > > all
> > >  #   future requests before a successful reconnect will
> > >  #   immediately fail. Default 0 (Since 4.2)
> > > +# @yank: Forcibly close the connection and don't attempt to reconnect 
> > > when
> > > +#the 'yank' qmp command is executed. (Since: 5.1)
> > >  #
> > >  # Since: 2.9
> > >  ##
> > > @@ -3870,7 +3872,8 @@
> > >  '*export': 'str',
> > >  '*tls-creds': 'str',
> > >  '*x-dirty-bitmap': 'str',
> > > -'*reconnect-delay': 'uint32' } }
> > > +
