Re: [PATCH 1/2] tests: fix "make check-qtest" for modular builds

2020-07-12 Thread Thomas Huth
On 10/07/2020 22.36, Gerd Hoffmann wrote:
> Signed-off-by: Gerd Hoffmann 
> ---
>  tests/qtest/Makefile.include | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tests/qtest/Makefile.include b/tests/qtest/Makefile.include
> index 98af2c2d9338..6a0276fd42dd 100644
> --- a/tests/qtest/Makefile.include
> +++ b/tests/qtest/Makefile.include
> @@ -277,6 +277,7 @@ tests/qtest/tco-test$(EXESUF): tests/qtest/tco-test.o 
> $(libqos-pc-obj-y)
>  tests/qtest/virtio-ccw-test$(EXESUF): tests/qtest/virtio-ccw-test.o
>  tests/qtest/display-vga-test$(EXESUF): tests/qtest/display-vga-test.o
>  tests/qtest/qom-test$(EXESUF): tests/qtest/qom-test.o
> +tests/qtest/modules-test$(EXESUF): tests/qtest/modules-test.o
>  tests/qtest/test-hmp$(EXESUF): tests/qtest/test-hmp.o
>  tests/qtest/machine-none-test$(EXESUF): tests/qtest/machine-none-test.o
>  tests/qtest/device-plug-test$(EXESUF): tests/qtest/device-plug-test.o

What was the error that you ran into here? ... a few words in the commit
message would be nice. Actually, I have always wondered why we need a
separate entry for each and every test here ... I'd rather expect this
to be handled by a normal generic make rule instead?

 Thomas




[Bug 1887318] Re: impossible to install in OSX Yosemite 10.10.5

2020-07-12 Thread Thomas Huth
QEMU only supports the two most recent versions of macOS (see
https://www.qemu.org/docs/master/system/build-platforms.html). Support
for older versions has been removed (see
https://git.qemu.org/?p=qemu.git;a=commitdiff;h=483644c25b93236001), so
if you still want to use QEMU on such an old system, you had better use an
older version of QEMU instead.

** Changed in: qemu
   Status: New => Won't Fix

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1887318

Title:
  impossible to install in OSX Yosemite 10.10.5

Status in QEMU:
  Won't Fix

Bug description:
  The Brew method has glib problems; glib is impossible to install.
  The MacPorts method has a very long .log file.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1887318/+subscriptions



[Bug 1840719] Re: win98se floppy fails to boot with isapc machine

2020-07-12 Thread Roman Bolshakov
The commit fixes the issue in the master branch:
https://git.qemu.org/?p=qemu.git;a=commit;h=de15df5ead400b7c3d0cf21c8164a7686dc81933

The fix is going to be released in QEMU 5.1.

** Changed in: qemu
   Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1840719

Title:
  win98se floppy fails to boot with isapc machine

Status in QEMU:
  Fix Committed

Bug description:
  QEMU emulator version 4.1.50 (commit 50d69ee0d)

  floppy image from:
  https://winworldpc.com/download/417d71c2-ae18-c39a-11c3-a4e284a2c3a5

  $ qemu-system-i386 -M isapc -fda Windows\ 98\ Second\ Edition\ Boot.img
  SeaBIOS (version rel-1.12.1-0...)
  Booting from Floppy...
  Boot failed: could not read the boot disk

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1840719/+subscriptions



[Bug 1840719] Re: win98se floppy fails to boot with isapc machine

2020-07-12 Thread Philippe Mathieu-Daudé
** Tags removed: x86
** Tags added: i386 testcase

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1840719

Title:
  win98se floppy fails to boot with isapc machine

Status in QEMU:
  Fix Committed

Bug description:
  QEMU emulator version 4.1.50 (commit 50d69ee0d)

  floppy image from:
  https://winworldpc.com/download/417d71c2-ae18-c39a-11c3-a4e284a2c3a5

  $ qemu-system-i386 -M isapc -fda Windows\ 98\ Second\ Edition\ Boot.img
  SeaBIOS (version rel-1.12.1-0...)
  Booting from Floppy...
  Boot failed: could not read the boot disk

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1840719/+subscriptions



Re: [PATCH v3 4/4] spapr: Forbid nested KVM-HV in pre-power9 compat mode

2020-07-12 Thread David Gibson
On Fri, Jul 03, 2020 at 04:19:24PM +0200, Greg Kurz wrote:
> On Mon, 15 Jun 2020 11:20:31 +0200
> Greg Kurz  wrote:
> 
> > On Sat, 13 Jun 2020 17:18:04 +1000
> > David Gibson  wrote:
> > 
> > > On Thu, Jun 11, 2020 at 03:40:33PM +0200, Greg Kurz wrote:
> > > > Nested KVM-HV only works on POWER9.
> > > > 
> > > > Signed-off-by: Greg Kurz 
> > > > Reviewed-by: Laurent Vivier 
> > > 
> > > Hrm.  I have mixed feelings about this.  It does bring forward an
> > > error that we'd otherwise only discover when we try to load the kvm
> > > module in the guest.
> > > 
> > > On the other hand, it's kind of a layering violation - really it's
> > > KVM's business to report what it can and can't do, rather than having
> > > qemu anticipate it.
> > > 
> > 
> > Agreed and it seems that we can probably get KVM to report that
> > already. I'll have a closer look.
> > 
> 
> Checking the KVM_CAP_PPC_NESTED_HV extension only reports what the host
> supports. It can't reasonably take into account that we're going to
> switch vCPUs in some compat mode later on. KVM could possibly check
> that it has a vCPU in pre-power9 compat mode when we try to enable
> the capability and fail... but it would be a layering violation all
> the same. The KVM that doesn't like pre-power9 CPUs isn't the one in
> the host, it is the one in the guest, and it's not even directly
> related to the CPU type but to the MMU mode currently in use:
> 
> long kvmhv_nested_init(void)
> {
>   long int ptb_order;
>   unsigned long ptcr;
>   long rc;
> 
>   if (!kvmhv_on_pseries())
>   return 0;
> ==>   if (!radix_enabled())
>   return -ENODEV;
> 
> We cannot know for sure either which MMU mode the guest will run in
> when we enable the nested cap during the initial machine reset.
> So it seems we cannot do anything better than denylisting well-known
> broken setups, in which case QEMU seems a better fit than KVM.
> 
> Makes sense?

Yeah, good points.
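To make that conclusion concrete, here is a rough sketch of the kind of QEMU-side
check being discussed, i.e. refusing cap-nested-hv when the machine is limited to a
pre-POWER9 compat mode. The helper names (ppc_type_check_compat(),
CPU_POWERPC_LOGICAL_3_00, the spapr caps "apply" hook) are recalled from the QEMU
tree and are assumptions here; the actual patch may differ:

    /* Sketch only, assuming QEMU's spapr_caps.c conventions. */
    static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
                                        uint8_t val, Error **errp)
    {
        MachineState *machine = MACHINE(spapr);

        if (!val) {
            return;             /* capability disabled, nothing to check */
        }

        /* Nested KVM-HV needs the radix MMU, i.e. a POWER9 (ISA 3.00) or
         * later compat level, so refuse older compat modes up front. */
        if (!ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00,
                                   0, spapr->max_compat_pvr)) {
            error_setg(errp, "Nested KVM-HV only supported on POWER9 and later");
        }
    }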

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v4 05/11] target/ppc: add vmulld instruction

2020-07-12 Thread David Gibson
On Wed, Jul 01, 2020 at 06:43:40PM -0500, Lijun Pan wrote:
> vmulld: Vector Multiply Low Doubleword.
> 
> Signed-off-by: Lijun Pan 

Applied to ppc-for-5.2.

> ---
> v4: add missing changes, and split to 5/11, 6/11, 7/11
> v3: use tcg_gen_gvec_mul()
> v2: fix coding style
> use Power ISA 3.1 flag
> 
>  target/ppc/translate/vmx-impl.inc.c | 1 +
>  target/ppc/translate/vmx-ops.inc.c  | 4 
>  2 files changed, 5 insertions(+)
> 
> diff --git a/target/ppc/translate/vmx-impl.inc.c 
> b/target/ppc/translate/vmx-impl.inc.c
> index 6e79ffa650..8c89738552 100644
> --- a/target/ppc/translate/vmx-impl.inc.c
> +++ b/target/ppc/translate/vmx-impl.inc.c
> @@ -807,6 +807,7 @@ GEN_VXFORM_DUAL(vmulouw, PPC_ALTIVEC, PPC_NONE,
>  GEN_VXFORM(vmulosb, 4, 4);
>  GEN_VXFORM(vmulosh, 4, 5);
>  GEN_VXFORM(vmulosw, 4, 6);
> +GEN_VXFORM_V(vmulld, MO_64, tcg_gen_gvec_mul, 4, 7);
>  GEN_VXFORM(vmuleub, 4, 8);
>  GEN_VXFORM(vmuleuh, 4, 9);
>  GEN_VXFORM(vmuleuw, 4, 10);
> diff --git a/target/ppc/translate/vmx-ops.inc.c 
> b/target/ppc/translate/vmx-ops.inc.c
> index 84e05fb827..b49787ac97 100644
> --- a/target/ppc/translate/vmx-ops.inc.c
> +++ b/target/ppc/translate/vmx-ops.inc.c
> @@ -48,6 +48,9 @@ GEN_HANDLER_E(name, 0x04, opc2, opc3, inval, PPC_NONE, 
> PPC2_ISA300)
>  GEN_HANDLER_E_2(name, 0x04, opc2, opc3, opc4, 0x00000000, PPC_NONE, \
> PPC2_ISA300)
>  
> +#define GEN_VXFORM_310(name, opc2, opc3)\
> +GEN_HANDLER_E(name, 0x04, opc2, opc3, 0x00000000, PPC_NONE, PPC2_ISA310)
> +
>  #define GEN_VXFORM_DUAL(name0, name1, opc2, opc3, type0, type1) \
>  GEN_HANDLER_E(name0##_##name1, 0x4, opc2, opc3, 0x00000000, type0, type1)
>  
> @@ -104,6 +107,7 @@ GEN_VXFORM_DUAL(vmulouw, vmuluwm, 4, 2, PPC_ALTIVEC, 
> PPC_NONE),
>  GEN_VXFORM(vmulosb, 4, 4),
>  GEN_VXFORM(vmulosh, 4, 5),
>  GEN_VXFORM_207(vmulosw, 4, 6),
> +GEN_VXFORM_310(vmulld, 4, 7),
>  GEN_VXFORM(vmuleub, 4, 8),
>  GEN_VXFORM(vmuleuh, 4, 9),
>  GEN_VXFORM_207(vmuleuw, 4, 10),

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v4 06/11] Update PowerPC AT_HWCAP2 definition

2020-07-12 Thread David Gibson
On Wed, Jul 01, 2020 at 06:43:41PM -0500, Lijun Pan wrote:
> Add PPC2_FEATURE2_ARCH_3_10 to the PowerPC AT_HWCAP2 definitions.
> 
> Signed-off-by: Lijun Pan 
> ---
> v4: add missing changes, and split to 5/11, 6/11, 7/11
> v3: use tcg_gen_gvec_mul()
> v2: fix coding style
> use Power ISA 3.1 flag
> 
>  include/elf.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/elf.h b/include/elf.h
> index 8fbfe60e09..1858b95acf 100644
> --- a/include/elf.h
> +++ b/include/elf.h
> @@ -554,6 +554,7 @@ typedef struct {
>  #define PPC_FEATURE2_HTM_NOSC   0x01000000
>  #define PPC_FEATURE2_ARCH_3_00  0x00800000
>  #define PPC_FEATURE2_HAS_IEEE128 0x00400000
> +#define PPC_FEATURE2_ARCH_3_10  0x00200000
>  
>  /* Bits present in AT_HWCAP for Sparc.  */


Um.. in the corresponding #defines in the kernel, 0x00200000 is given
to PPC_FEATURE2_DARN, and several more bits are allocated past that
point.
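For reference, the kernel-side allocation looks roughly like this (values recalled
from arch/powerpc/include/uapi/asm/cputable.h and worth double-checking against the
kernel tree; note the kernel names the ISA 3.1 bit PPC_FEATURE2_ARCH_3_1):

    #define PPC_FEATURE2_HAS_IEEE128     0x00400000
    #define PPC_FEATURE2_DARN            0x00200000  /* darn instruction */
    #define PPC_FEATURE2_SCV             0x00100000  /* scv syscall */
    #define PPC_FEATURE2_HTM_NO_SUSPEND  0x00080000
    #define PPC_FEATURE2_ARCH_3_1        0x00040000  /* ISA 3.1 */

so reusing 0x00200000 for an ISA 3.1 flag in QEMU's include/elf.h would collide
with DARN.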

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v8 03/10] qcow2_format.py: change Qcow2BitmapExt initialization method

2020-07-12 Thread Andrey Shinkevich

On 11.07.2020 19:34, Vladimir Sementsov-Ogievskiy wrote:

03.07.2020 16:13, Andrey Shinkevich wrote:

There are two ways to initialize a class derived from Qcow2Struct:
1. Pass a block of binary data to the constructor.
2. Pass the file descriptor to allow reading the file from the constructor.
Let's change the Qcow2BitmapExt initialization method from 1 to 2 to
support scattered reading in the initialization chain.
The implementation comes with the patch that follows.

Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Andrey Shinkevich 
---
  tests/qemu-iotests/qcow2_format.py | 14 --
  1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/qcow2_format.py 
b/tests/qemu-iotests/qcow2_format.py

index 2f3681b..1435e34 100644
--- a/tests/qemu-iotests/qcow2_format.py
+++ b/tests/qemu-iotests/qcow2_format.py
@@ -63,7 +63,8 @@ class Qcow2StructMeta(type):
    class Qcow2Struct(metaclass=Qcow2StructMeta):
  -    """Qcow2Struct: base class for qcow2 data structures
+    """
+    Qcow2Struct: base class for qcow2 data structures


Unrelated chunk. And why?



To conform to the common style for comments in the file, as it is in
class QcowHeaderExtension::__init__().




    Successors should define fields class variable, which is: list of tuples,
      each of three elements:
@@ -113,6 +114,9 @@ class Qcow2BitmapExt(Qcow2Struct):
  ('u64', '{:#x}', 'bitmap_directory_offset')
  )
  +    def __init__(self, fd):
+    super().__init__(fd=fd)


This does nothing. We inherit the __init__ of the super class; there is no
need to define it just to call the same __init__.



+
    QCOW2_EXT_MAGIC_BITMAPS = 0x23852875
  @@ -173,7 +177,13 @@ class QcowHeaderExtension(Qcow2Struct):
  self.data_str = data_str
    if self.magic == QCOW2_EXT_MAGIC_BITMAPS:
-    self.obj = Qcow2BitmapExt(data=self.data)
+    assert fd is not None
+    position = fd.tell()
+    # Step back to reread data


This definitely shows that we are doing something wrong



For Qcow2BitmapExt, we need both fd and data, and they are mutually exclusive
in the constructor of the class Qcow2Struct. Rereading the bitmap extension
is a solution that avoids changing Qcow2Struct. Any other suggestion?

Andrey





+    padded = (self.length + 7) & ~7
+    fd.seek(-padded, 1)
+    self.obj = Qcow2BitmapExt(fd=fd)
+    fd.seek(position)
  else:
  self.obj = None








[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

2020-07-12 Thread Rafael David Tinoco
Work being done for the Bionic SRU:

BUG: https://bugs.launchpad.net/qemu/+bug/1805256
(fix for the bionic regression demonstrated at LP: #1885419)
PPA: https://launchpad.net/~rafaeldtinoco/+archive/ubuntu/lp1805256-bionic
MERGE: https://tinyurl.com/y8sucs6x

Merge proposal currently going under review, tests and discussions.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  Triaged
Status in kunpeng920 ubuntu-18.04 series:
  Triaged
Status in kunpeng920 ubuntu-18.04-hwe series:
  Triaged
Status in kunpeng920 ubuntu-19.10 series:
  Fix Released
Status in kunpeng920 ubuntu-20.04 series:
  Fix Released
Status in kunpeng920 upstream-kernel series:
  Invalid
Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Bionic:
  In Progress
Status in qemu source package in Eoan:
  Fix Released
Status in qemu source package in Focal:
  Fix Released

Bug description:
  [Impact]

  * QEMU locking primitives might face a race condition in QEMU Async
  I/O bottom halves scheduling. This leads to a deadlock, making either
  QEMU or one of its tools hang indefinitely.

  [Test Case]

  * qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs in Aarch64.

  [Regression Potential]

  * This is a change to a core part of QEMU: the AIO scheduling. It
  works like a "kernel" scheduler: whereas the kernel schedules OS tasks,
  the QEMU AIO code is responsible for scheduling QEMU coroutines and event
  listener callbacks.

  * There was a long discussion upstream about primitives and Aarch64.
  After quite some time Paolo released this patch and it solves the
  issue. Tested platforms were amd64 and aarch64, based on his commit
  log.

  * Christian suggests that this fix stay a little longer in -proposed to
  make sure it won't cause any regressions.

  * dannf suggests we also check for performance regressions; e.g. how
  long it takes to convert a cloud image on high-core systems.

  [Other Info]

   * Original description below:

  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]

  
  """

  All the tasks left are blocked in a system call, so there is no task left to
  call qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
  thread #1 (doing poll() in a pipe with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes in
  the beginning, sometimes at the end).

  

  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in 

[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

2020-07-12 Thread Launchpad Bug Tracker
** Merge proposal linked:
   
https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/qemu/+git/qemu/+merge/387269

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  Triaged
Status in kunpeng920 ubuntu-18.04 series:
  Triaged
Status in kunpeng920 ubuntu-18.04-hwe series:
  Triaged
Status in kunpeng920 ubuntu-19.10 series:
  Fix Released
Status in kunpeng920 ubuntu-20.04 series:
  Fix Released
Status in kunpeng920 upstream-kernel series:
  Invalid
Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Bionic:
  In Progress
Status in qemu source package in Eoan:
  Fix Released
Status in qemu source package in Focal:
  Fix Released

Bug description:
  [Impact]

  * QEMU locking primitives might face a race condition in QEMU Async
  I/O bottom halves scheduling. This leads to a deadlock, making either
  QEMU or one of its tools hang indefinitely.

  [Test Case]

  * qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs in Aarch64.

  [Regression Potential]

  * This is a change to a core part of QEMU: the AIO scheduling. It
  works like a "kernel" scheduler: whereas the kernel schedules OS tasks,
  the QEMU AIO code is responsible for scheduling QEMU coroutines and event
  listener callbacks.

  * There was a long discussion upstream about primitives and Aarch64.
  After quite some time Paolo released this patch and it solves the
  issue. Tested platforms were amd64 and aarch64, based on his commit
  log.

  * Christian suggests that this fix stay a little longer in -proposed to
  make sure it won't cause any regressions.

  * dannf suggests we also check for performance regressions; e.g. how
  long it takes to convert a cloud image on high-core systems.

  [Other Info]

   * Original description below:

  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]

  
  """

  All the tasks left are blocked in a system call, so there is no task left to
  call qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
  thread #1 (doing poll() in a pipe with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes in
  the beginning, sometimes at the end).

  

  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760,
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, 

Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier

2020-07-12 Thread Jason Wang



On 2020/7/10 9:30 PM, Peter Xu wrote:

On Fri, Jul 10, 2020 at 02:34:11PM +0800, Jason Wang wrote:

On 2020/7/9 10:10 PM, Peter Xu wrote:

On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote:

- If we care about performance, it's better to implement the MAP event for
vhost; otherwise there could be a lot of IOTLB misses

I feel like these are two things.

So far what we are talking about is whether vt-d should have knowledge about
what kind of events one iommu notifier is interested in.  I still think we
should keep this as answered in question 1.

The other question is whether we want to switch vhost from UNMAP to MAP/UNMAP
events even without vDMA, so that vhost can establish the mapping even before
IO starts.  IMHO it's doable, but only if the guest runs DPDK workloads.  When
the guest is using dynamic iommu page mappings, I feel like that can be even
slower, because then the worst case is for each IO we'll need to vmexit twice:

 - The first vmexit caused by an invalidation to MAP the page tables, so 
vhost
   will setup the page table before IO starts

 - IO/DMA triggers and completes

 - The second vmexit caused by another invalidation to UNMAP the page tables

So it seems to be worse than when vhost only uses UNMAP like right now.  At
least we only have one vmexit (when UNMAP).  We'll have a vhost translate()
request from kernel to userspace, but IMHO that's cheaper than the vmexit.

Right but then I would still prefer to have another notifier.

Since vtd_page_walk has nothing to do with the device IOTLB (the IOMMU has a
dedicated command for flushing the device IOTLB), the check for
vtd_as_has_map_notifier is used to skip devices that can do demand
paging via ATS or in a device-specific way. If we have two different notifiers,
vhost will be on the device IOTLB notifier, so we don't need it at all?

But we can still have an iommu notifier that only registers for UNMAP even after
we introduce the dev-iotlb notifier?  We don't want to do a page walk for them as well.
TCG should be the only one so far, but I don't know.. maybe there can still be
new ones?
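(To make the "two different notifiers" idea concrete, a minimal sketch in C; the
DEVIOTLB flag below is hypothetical, since at the time of this discussion QEMU's
IOMMUNotifierFlag only defines NONE, UNMAP and MAP.)

    typedef enum {
        IOMMU_NOTIFIER_NONE           = 0,
        IOMMU_NOTIFIER_UNMAP          = 1 << 0,  /* page-table unmaps */
        IOMMU_NOTIFIER_MAP            = 1 << 1,  /* page-table maps (shadowing) */
        IOMMU_NOTIFIER_DEVIOTLB_UNMAP = 1 << 2,  /* device-IOTLB (ATS) invalidations */
    } IOMMUNotifierFlag;

    /* vhost would register only for DEVIOTLB_UNMAP, so vtd_page_walk and
     * replay could skip it, while UNMAP-only users such as TCG keep the
     * existing behaviour. */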


I think you're right. But looking at the code, it looks like the check of
vtd_as_has_map_notifier() is only used in:

1) vtd_iommu_replay()
2) vtd_iotlb_page_invalidate_notify() (PSI)

For the replay, it's expensive anyhow. For PSI, I think it's just about one
or a few mappings, so I'm not sure it will have an obvious performance impact.

And I had two questions:

1) The code doesn't check map for DSI or GI; does this match what the spec
says? (It looks to me like the spec is unclear on this part.)

Both DSI/GI should cover maps too?  E.g. vtd_sync_shadow_page_table() in
vtd_iotlb_domain_invalidate().



I meant the code doesn't check whether there's a MAP notifier :)





2) For the replay(), I don't see other implementations (either spapr or the
generic one) that do unmap (actually they skip unmap explicitly); any
reason for doing this in the Intel IOMMU?

I could be wrong, but I'd guess it's because vt-d implemented the caching mode
by leveraging the same invalidation structure, so it's harder to make all
things right (IOW, we can't clearly distinguish MAP from UNMAP when we receive an
invalidation request, because MAP/UNMAP requests look the same).

I didn't check others, but I believe spapr is doing it differently by using
some hypercalls to deliver IOMMU map/unmap requests, which seems a bit close to
what virtio-iommu is doing.  Anyway, the point is if we have explicit MAP/UNMAP
from the guest, logically the replay indeed does not need to do any unmap
because we don't need to call replay() on an already existing device but only
for e.g. hot plug.



But this looks to conflict with what memory_region_iommu_replay() does; for an
IOMMU that doesn't have a replay method, it skips UNMAP requests:


    for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
    iotlb = imrc->translate(iommu_mr, addr, IOMMU_NONE, n->iommu_idx);
    if (iotlb.perm != IOMMU_NONE) {
    n->notify(n, &iotlb);
    }

I guess this generic code has no knowledge of whether the guest has an
explicit MAP/UNMAP. Or does replay imply that the guest doesn't have
explicit MAP/UNMAP?


(btw, the code shortcuts memory_region_notify_one(); not sure of the reason)



  VT-d does not have that clear interface, so VT-d needs to
maintain its own mapping structures, and also vt-d is using the same replay &
page_walk operations to sync all these structures, which complicates the vt-d
replay a bit.  With that, we assume replay() can be called anytime on a device,
and we won't notify duplicated MAPs to a lower layer like vfio if it is mapped
before.  At the same time, since we'll compare the latest mapping with the one
we cached in the iova tree, UNMAP becomes possible too.



AFAIK vtd_iommu_replay() does a complete UNMAP first:

    /*
 * The replay can be triggered by either a invalidation or a newly
 * created entry. No matter what, we release existing mappings
 * (it means flushing caches 

Re: [RFC 12/65] target/riscv: rvv-0.9: update check functions

2020-07-12 Thread Frank Chang
On Sat, Jul 11, 2020 at 1:51 AM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 7/10/20 3:48 AM, frank.ch...@sifive.com wrote:
> > +#define REQUIRE_RVV do {\
> > +if (s->mstatus_vs == 0) \
> > +return false;   \
> > +} while (0)
>
> You've used this macro already back in patch 7.  I guess it should not have
> been there?  Or this bit belongs there, one or the other.
>
> I think this patch requires a description and justification.  I have no
> idea
> why you are replacing
>

Yes, this change should be moved to patch 7.


>
> > -return (vext_check_isa_ill(s) &&
> > -vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> > -vext_check_reg(s, a->rd, false) &&
> > -vext_check_reg(s, a->rs2, false) &&
> > -vext_check_reg(s, a->rs1, false));
>
> with invisible returns
>
> > +REQUIRE_RVV;
> > +VEXT_CHECK_ISA_ILL(s);
> > +VEXT_CHECK_SSS(s, a->rd, a->rs1, a->rs2, a->vm, true);
> > +return true;
>
>
> r~
>

You're right, I will resend the patches with more description and
justification.

Frank Chang
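(A short note on the "invisible returns" remark above: each of the new-style check
macros contains a return statement, so the early exits are not visible in the body
of the trans_ function. Illustration only; the function and argument-type names
below are made up.)

    static bool trans_example(DisasContext *s, arg_example *a)
    {
        REQUIRE_RVV;              /* may 'return false' from inside the macro */
        VEXT_CHECK_ISA_ILL(s);    /* likewise, per the macros in this patch */
        VEXT_CHECK_SSS(s, a->rd, a->rs1, a->rs2, a->vm, true);
        return true;              /* reached only if every check passed */
    }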


Re: [RFC 13/65] target/riscv: rvv-0.9: configure instructions

2020-07-12 Thread Frank Chang
On Sat, Jul 11, 2020 at 2:07 AM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 7/10/20 3:48 AM, frank.ch...@sifive.com wrote:
> > -static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl *a)
> > +static bool trans_vsetvl(DisasContext *s, arg_vsetvl *a)
>
> Do not mix this change with anything else.


OK~
---
Frank Chang


> > +rd = tcg_const_i32(a->rd);
> > +rs1 = tcg_const_i32(a->rs1);
>
> Any time you put a register number into a tcg const, there's probably a
> better
> way to do things.


> > -/* Using x0 as the rs1 register specifier, encodes an infinite AVL
> */
> > -if (a->rs1 == 0) {
> > -/* As the mask is at least one bit, RV_VLEN_MAX is >= VLMAX */
> > -s1 = tcg_const_tl(RV_VLEN_MAX);
> > -} else {
> > -s1 = tcg_temp_new();
> > -gen_get_gpr(s1, a->rs1);
> > -}
>
> E.g. this code should be kept, and add
>
> if (a->rd == 0 && a->rs1 == 0) {
> s1 = tcg_temp_new();
> tcg_gen_mov_tl(s1, cpu_vl);
> } else ...
>
OK~

>
> > +if ((sew > cpu->cfg.elen)
> > +|| vill
> > +|| vflmul < ((float)sew / cpu->cfg.elen)
> > +|| (ediv != 0)
> > +|| (reserved != 0)) {
> >  /* only set vill bit. */
> >  env->vtype = FIELD_DP64(0, VTYPE, VILL, 1);
> > -env->vl = 0;
> > -env->vstart = 0;
> >  return 0;
> >  }
>
> You do need to check 0.7.1 so long as it's supported.
>
>
> r~
>

I will drop 0.7.1 support in my first patch to prevent confusion.

Frank Chang


Re: [RFC 14/65] target/riscv: rvv-0.9: stride load and store instructions

2020-07-12 Thread Frank Chang
On Sat, Jul 11, 2020 at 2:15 AM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 7/10/20 3:48 AM, frank.ch...@sifive.com wrote:
> >  # *** Vector loads and stores are encoded within LOADFP/STORE-FP ***
> > -vlb_v  ... 100 . 0 . 000 . 111 @r2_nfvm
> > -vlh_v  ... 100 . 0 . 101 . 111 @r2_nfvm
> > -vlw_v  ... 100 . 0 . 110 . 111 @r2_nfvm
>
> Again, something you can't do until 0.7.1 is not supported.
>
> If you don't want to simultaneously support 0.7.1 and 0.9/1.0, then you
> should
> simply remove 0.7.1 in the first patch, so that there's no confusion.
>
> Is the rest of it mostly renaming?  You should definitely expand on what
> you're
> doing within each patch description.  A description of what has changed in
> the
> spec since 0.7.1 will help the reviewer validate that you've gotten all of
> the
> corner cases.
>
> I am going to stop reviewing this patch series now, as I expect that most
> of
> the remaining patches will have similar comments.
>
>
> r~
>

Thanks for the reviews.

I will rearrange my commits as you suggest and add more comments in my
next patchset.

--
Frank Chang


Re: [RFC 00/65] target/riscv: support vector extension v0.9

2020-07-12 Thread Frank Chang
On Sat, Jul 11, 2020 at 5:53 AM Alistair Francis 
wrote:

> On Fri, Jul 10, 2020 at 5:59 AM  wrote:
> >
> > From: Frank Chang 
> >
> > This patchset implements the vector extension v0.9 for RISC-V on QEMU.
> >
> > This patchset is sent as RFC because RVV v0.9 is still in draft state.
> > However, as RVV v1.0 should be ratified soon and there shouldn't be
> > critical changes between RVV v1.0 and RVV v0.9. We would like to have
> > the community to review RVV v0.9 implementations. Once RVV v1.0 is
> > ratified, we will then upgrade to RVV v1.0.
> >
> > You can change the cpu argument: vext_spec to v0.9 (i.e. vext_spec=v0.9)
> > to run with RVV v0.9 instructions.
>
> Hello,
>
> First off thanks for the patches!
>
> QEMU has a policy of accepting draft specs as experimental. We
> currently support the v0.7.1 Vector extension for example, so this
> does not need to be an RFC and can be a full patch series that can be
> merged into master.
>
> I have applied the first few patches (PR should be out next week) and
> they should be in the QEMU 5.1 release. QEMU is currently in a freeze
> so I won't be able to merge this series for 5.1. In saying that please
> feel free to continue to send patches to the list, they can still be
> reviewed.
>
> In general we would need to gracefully handle extension upgrades and
> maintain backwards compatibility. This same principle doesn't apply to
> experimental features though (such as the vector extension), so you are
> free to remove support for v0.7.1. Users who want v0.7.1
> support can always use the QEMU 5.1 release. Just make sure that
> your series does not break bisectability.
>
> Thanks again for the patches!
>
> Alistair
>

Hi Alistair,

Thanks for the review.

Currently I would prefer to drop 0.7.1 support because I don't know if there's
a good way to keep both the 0.7.1 and 0.9 opcodes. I'm afraid it would cause
encoding overlaps when compiling with decodetree.

Does decodetree support any feature for multi-version opcodes?
Or could it use something like C macros to compile only the opcodes for the
vspec the user assigned? If there's any good way to keep both versions, then I
can try to rearrange my code to support both vspecs.

Otherwise, I'll keep on my current approach to drop the support of v0.7.1
as the way
Richard has mentioned:
*If you don't want to simultaneously support 0.7.1 and 0.9/1.0, then you
should*
*simply remove 0.7.1 in the first patch, so that there's no confusion.*

Any suggestion would be appreciated.

Thanks
--
Frank Chang


> >
> > Chih-Min Chao (2):
> >   fpu: fix float16 nan check
> >   fpu: add api to handle alternative sNaN propagation
> >
> > Frank Chang (58):
> >   target/riscv: fix rsub gvec tcg_assert_listed_vecop assertion
> >   target/riscv: correct the gvec IR called in gen_vec_rsub16_i64()
> >   target/riscv: fix return value of do_opivx_widen()
> >   target/riscv: fix vill bit index in vtype register
> >   target/riscv: remove vsll.vi, vsrl.vi, vsra.vi insns from using gvec
> >   target/riscv: rvv-0.9: remove MLEN calculations
> >   target/riscv: rvv-0.9: add fractional LMUL, VTA and VMA
> >   target/riscv: rvv-0.9: update check functions
> >   target/riscv: rvv-0.9: configure instructions
> >   target/riscv: rvv-0.9: stride load and store instructions
> >   target/riscv: rvv-0.9: index load and store instructions
> >   target/riscv: rvv-0.9: fix address index overflow bug of indexed
> > load/store insns
> >   target/riscv: rvv-0.9: fault-only-first unit stride load
> >   target/riscv: rvv-0.9: amo operations
> >   target/riscv: rvv-0.9: load/store whole register instructions
> >   target/riscv: rvv-0.9: update vext_max_elems() for load/store insns
> >   target/riscv: rvv-0.9: take fractional LMUL into vector max elements
> > calculation
> >   target/riscv: rvv-0.9: floating-point square-root instruction
> >   target/riscv: rvv-0.9: floating-point classify instructions
> >   target/riscv: rvv-0.9: mask population count instruction
> >   target/riscv: rvv-0.9: find-first-set mask bit instruction
> >   target/riscv: rvv-0.9: set-X-first mask bit instructions
> >   target/riscv: rvv-0.9: iota instruction
> >   target/riscv: rvv-0.9: element index instruction
> >   target/riscv: rvv-0.9: integer scalar move instructions
> >   target/riscv: rvv-0.9: floating-point scalar move instructions
> >   target/riscv: rvv-0.9: whole register move instructions
> >   target/riscv: rvv-0.9: integer extension instructions
> >   target/riscv: rvv-0.9: single-width averaging add and subtract
> > instructions
> >   target/riscv: rvv-0.9: integer add-with-carry/subtract-with-borrow
> >   target/riscv: rvv-0.9: narrowing integer right shift instructions
> >   target/riscv: rvv-0.9: widening integer multiply-add instructions
> >   target/riscv: rvv-0.9: quad-widening integer multiply-add instructions
> >   target/riscv: rvv-0.9: integer merge and move instructions
> >   target/riscv: rvv-0.9: single-width saturating add and subtract

Re: [PATCH v4 4/7] hw/riscv: Use pre-built bios image of generic platform for virt & sifive_u

2020-07-12 Thread Bin Meng
On Sun, Jul 12, 2020 at 1:34 AM Alistair Francis  wrote:
>
> On Thu, Jul 9, 2020 at 10:07 PM Bin Meng  wrote:
> >
> > From: Bin Meng 
> >
> > Update virt and sifive_u machines to use the opensbi fw_dynamic bios
> > image built for the generic FDT platform.
> >
> > Remove the out-of-date no longer used bios images.
> >
> > Signed-off-by: Bin Meng 
> > Reviewed-by: Anup Patel 
> > Reviewed-by: Alistair Francis 
>
> This patch seems to break 32-bit Linux boots on the sifive_u and virt 
> machines.
>

It looks like only the Linux boot on sifive_u is broken. On our side, we have
been using VxWorks to test 32-bit OpenSBI on sifive_u, so this issue
went unnoticed. I will take a look.

Regards,
Bin



Re: [PATCH v2 01/17] tcg: Introduce target-specific page data for user-only

2020-07-12 Thread Richard Henderson
On 6/25/20 9:20 AM, Peter Maydell wrote:
> On Fri, 5 Jun 2020 at 05:17, Richard Henderson
>> @@ -787,9 +788,11 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong 
>> old_size,
>>  new_addr = -1;
>>  } else {
>>  new_addr = h2g(host_addr);
>> +/* FIXME: Move page flags and target_data for each page.  */
> 
> Is this something we're going to address later in the patchset?

I've removed the comment.

The mremap system call is not as general as I think it should be.  It only
applies to MAP_SHARED vmas and returns EINVAL on MAP_PRIVATE.  Therefore, at
least for the MTE usage of target_data, there cannot be any.


r~



Re: [PATCH v4 7/7] Makefile: Ship the generic platform bios images for RISC-V

2020-07-12 Thread Bin Meng
On Sun, Jul 12, 2020 at 1:28 AM Alistair Francis  wrote:
>
> On Fri, Jul 10, 2020 at 11:36 AM Alistair Francis  
> wrote:
> >
> > On Thu, Jul 9, 2020 at 10:11 PM Bin Meng  wrote:
> > >
> > > From: Bin Meng 
> > >
> > > Update the install blob list to include the generic platform
> > > fw_dynamic bios images.
> > >
> > > Signed-off-by: Bin Meng 
> >
> > You didn't address the comments in v3.
> >
> > Thinking about this more though it looks like we currently don't
> > install anything, so this is an improvement.
> >
> > Reviewed-by: Alistair Francis 
>
> Nope, I was wrong. This should be squashed into patch 4 where you
> remove the installed binaries.

Not entirely correct. The .bin changes should go into patch 4, and the .elf
changes should remain in this patch, I think.

Regards,
Bin



Re: [PATCH v2 2/2] hw/riscv: sifive_u: Provide a reliable way for bootloader to detect whether it is running in QEMU

2020-07-12 Thread Bin Meng
Hi Alistair,

On Sun, Jul 12, 2020 at 12:04 AM Alistair Francis  wrote:
>
> On Thu, Jul 9, 2020 at 5:50 PM Bin Meng  wrote:
> >
> > Hi Palmer,
> >
> > On Fri, Jul 10, 2020 at 8:45 AM Palmer Dabbelt  
> > wrote:
> > >
> > > On Thu, 09 Jul 2020 15:09:18 PDT (-0700), alistai...@gmail.com wrote:
> > > > On Thu, Jul 9, 2020 at 3:07 AM Bin Meng  wrote:
> > > >>
> > > >> From: Bin Meng 
> > > >>
> > > >> The reset vector code is subject to change, e.g. with the recent
> > > >> fw_dynamic type image support, it breaks oreboot again.
> > > >
> > > > This is a recurring problem, I have another patch for Oreboot to fix
> > > > the latest breakage.
> > > >
> > > >>
> > > >> Add a subregion in the MROM, with the size of machine RAM stored,
> > > >> so that we can provide a reliable way for bootloader to detect
> > > >> whether it is running in QEMU.
> > > >
> > > > I don't really like this though. I would prefer that we don't
> > > > encourage guest software to behave differently on QEMU. I don't think
> > > > other upstream boards do this.
> > >
> > > I agree.  If you want an explicitly virtual board, use the virt board.  
> > > Users
> > > of sifive_u are presumably trying to do their best to test against what 
> > > the
> > > hardware does without actually using the hardware.  Otherwise there 
> > > should be
> > > no reason to use the sifive_u board, as it's just sticking a layer of
> > > complexity in the middle of everything.
> >
> > Understood. Then let's drop this patch.
> >
> > >
> > > > Besides Oreboot setting up the clocks are there any other users of this?
> > >
> > > IIRC we have a scheme for handling the clock setup in QEMU where we accept
> > > pretty much any control write and then just return reads that say the 
> > > PLLs have
> > > locked.  I'd be in favor of improving the scheme to improve compatibility 
> > > with
> > > the actual hardware, but adding some way for programs to skip the clocks
> > > because they know they're in QEMU seems like the wrong way to go.
> > >
> >
> > Yep, that's my question to Oreboot too.
> >
> > U-Boot SPL can boot with QEMU and no problem was seen with clock
> > settings in PRCI model in QEMU.
>
> I don't think it's an unsolvable problem. It just takes a little work on
> Oreboot to run on QEMU. I can dig into it a bit and see if I can find
> a better fix on the Oreboot side.
>

Can we remove the QEMU detection logic completely in Oreboot? Except for the
QSPI controller, QEMU should be able to run Oreboot, since it already runs
U-Boot SPL.

Regards,
Bin



Re: Slow down with: 'Make "info qom-tree" show children sorted'

2020-07-12 Thread David Gibson
On Tue, 07 Jul 2020 14:00:06 +0200
Markus Armbruster  wrote:

> Paolo Bonzini  writes:
> 
> > On 07/07/20 07:33, Markus Armbruster wrote:  
> >> Philippe Mathieu-Daudé  writes:
> >>   
> >>> On 7/7/20 6:45 AM, Thomas Huth wrote:  
>  On 27/05/2020 10.47, Markus Armbruster wrote:  
> > "info qom-tree" prints children in unstable order.  This is a pain
> > when diffing output for different versions to find change.  Print it
> > sorted.
> >
> > Signed-off-by: Markus Armbruster 
> > ---
> >  qom/qom-hmp-cmds.c | 24 
> >  1 file changed, 16 insertions(+), 8 deletions(-)  
> 
>   Hi Markus,
> 
>  this patch causes a slowdown of the qtests which becomes quite massive
>  when e.g. using ppc64 and thorough testing. When I'm running
> 
>  QTEST_QEMU_BINARY="ppc64-softmmu/qemu-system-ppc64" time \
>  ./tests/qtest/device-introspect-test -m slow | tail -n 10
> 
>  the test runs for ca. 6m40s here before the patch got applied, and for
>  more than 20 minutes after the patch got applied!
> >> 
> >> That's surprising.  
> >
> > It's a bit surprising indeed, but on the other hand using
> > g_queue_insert_sorted results in a quadratic loop.  
> 
> The surprising part is that n turns out to be large enough for n^2 to
> matter *that* much.

Is this another consequence of the ludicrous number of QOM objects we
create for LMB DRCs (one for every 256MiB of guest RAM)?  Avoiding that
is on my list.  Though avoiding an n^2 behaviour here is probably a good
idea anyway.
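(For reference, a minimal GLib illustration of the two approaches being contrasted,
independent of the actual qom-tree code: inserting into a sorted GQueue walks the
list on every insertion, so building a list of n children costs O(n^2) comparisons,
whereas appending and sorting once costs O(n log n).)

    #include <glib.h>

    static gint cmp_name(gconstpointer a, gconstpointer b, gpointer user_data)
    {
        return g_strcmp0(a, b);
    }

    /* O(n^2): every insertion walks the already-sorted part of the list. */
    static void build_sorted_slow(GQueue *q, char **names, guint n)
    {
        for (guint i = 0; i < n; i++) {
            g_queue_insert_sorted(q, names[i], cmp_name, NULL);
        }
    }

    /* O(n log n): append everything, then sort once. */
    static void build_sorted_fast(GQueue *q, char **names, guint n)
    {
        for (guint i = 0; i < n; i++) {
            g_queue_push_tail(q, names[i]);
        }
        g_queue_sort(q, cmp_name, NULL);
    }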

-- 
David Gibson 
Principal Software Engineer, Virtualization, Red Hat


pgpURDPjG9G7x.pgp
Description: OpenPGP digital signature


Re: [PATCH v2 2/2] hw/riscv: sifive_u: Provide a reliable way for bootloader to detect whether it is running in QEMU

2020-07-12 Thread Bin Meng
Hi Alistair,

On Sun, Jul 12, 2020 at 12:03 AM Alistair Francis  wrote:
>
> On Thu, Jul 9, 2020 at 5:48 PM Bin Meng  wrote:
> >
> > Hi Alistair,
> >
> > On Fri, Jul 10, 2020 at 6:19 AM Alistair Francis  
> > wrote:
> > >
> > > On Thu, Jul 9, 2020 at 3:07 AM Bin Meng  wrote:
> > > >
> > > > From: Bin Meng 
> > > >
> > > > The reset vector code is subject to change, e.g. with the recent
> > > > fw_dynamic type image support, it breaks oreboot again.
> > >
> > > This is a recurring problem, I have another patch for Oreboot to fix
> > > the latest breakage.
> > >
> >
> > Can Oreboot be updated to remove the QEMU detection?
>
> In general I think it should be.
>
> Right now it's not critical to do. I think from a QEMU perspective we
> have finished changing the "ROM" code so after this release we can
> update Oreboot and then it should settle down again.
>
> >
> > > >
> > > > Add a subregion in the MROM, with the size of machine RAM stored,
> > > > so that we can provide a reliable way for bootloader to detect
> > > > whether it is running in QEMU.
> > >
> > > I don't really like this though. I would prefer that we don't
> > > encourage guest software to behave differently on QEMU. I don't think
> > > other upstream boards do this.
> > >
> > > Besides Oreboot setting up the clocks are there any other users of this?
> >
> > I don't really have any specific reason, except for testing U-Boot SPL
> > by relaxing the requirement of hardcoding the memory to 8G "-m 8G" as
> > I indicated in the commit message below:
>
> Yeah, I think that's just something we will have to deal with. If the
> guest expects 8GB and doesn't check the device tree passed to it then
> the user has to create 8GB of memory.
>

Note there are two reasons why "-m 8G" is used to test U-Boot SPL:

1. The U-Boot DDR driver hardcodes the memory size to 8G; I have a fix
locally and will send it upstream to U-Boot soon.
2. U-Boot SPL has to use its own device tree, because we don't want to
update QEMU to include all the DDR register settings in the device
tree, which is very big.

The reason I wanted to add this patch is that I wanted to dynamically
patch the U-Boot SPL DT to use the actual RAM size that QEMU instantiates.
This way we can avoid editing the U-Boot SPL DT to set the actual memory
size.

Regards,
Bin



Re: [PATCH v3 4/4] hw/block/nvme: Align I/O BAR to 4 KiB

2020-07-12 Thread Dmitry Fomichev
On Tue, 2020-06-30 at 13:04 +0200, Philippe Mathieu-Daudé wrote:
> Simplify the NVMe emulated device by aligning the I/O BAR to 4 KiB.
> 
> Reviewed-by: Klaus Jensen 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/block/nvme.h | 2 ++
>  hw/block/nvme.c  | 5 ++---
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index 82c384614a..4e1cea576a 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -22,6 +22,7 @@ typedef struct QEMU_PACKED NvmeBar {
>  uint32_tpmrebs;
>  uint32_tpmrswtp;
>  uint64_tpmrmsc;
> +uint8_t reserved[484];
>  } NvmeBar;
>  
>  enum NvmeCapShift {
> @@ -879,6 +880,7 @@ enum NvmeIdNsDps {
>  
>  static inline void _nvme_check_size(void)
>  {
> +QEMU_BUILD_BUG_ON(sizeof(NvmeBar) != 4096);
>  QEMU_BUILD_BUG_ON(sizeof(NvmeAerResult) != 4);
>  QEMU_BUILD_BUG_ON(sizeof(NvmeCqe) != 16);
>  QEMU_BUILD_BUG_ON(sizeof(NvmeDsmRange) != 16);
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 6628d0a4ba..2aa54bc20e 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -55,7 +55,6 @@
>  #include "nvme.h"
>  
>  #define NVME_MAX_IOQPAIRS 0xffff
> -#define NVME_REG_SIZE 0x1000
>  #define NVME_DB_SIZE  4
>  #define NVME_CMB_BIR 2
>  #define NVME_PMR_BIR 2
> @@ -1322,7 +1321,7 @@ static void nvme_mmio_write(void *opaque, hwaddr addr, 
> uint64_t data,
>  NvmeCtrl *n = (NvmeCtrl *)opaque;
>  if (addr < sizeof(n->bar)) {
>  nvme_write_bar(n, addr, data, size);
> -} else if (addr >= 0x1000) {
> +} else {
>  nvme_process_db(n, addr, data);
>  }
>  }
> @@ -1416,7 +1415,7 @@ static void nvme_init_state(NvmeCtrl *n)
>  {
>  n->num_namespaces = 1;
>  /* add one to max_ioqpairs to account for the admin queue pair */
> -n->reg_size = pow2ceil(NVME_REG_SIZE +
> +n->reg_size = pow2ceil(sizeof(NvmeBar) +
> 2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE);
>  n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
>  n->sq = g_new0(NvmeSQueue *, n->params.max_ioqpairs + 1);
> -- 
> 2.21.3
> 
> 

Reviewed-by: Dmitry Fomichev 



Re: [PATCH v3 3/4] hw/block/nvme: Fix pmrmsc register size

2020-07-12 Thread Dmitry Fomichev
On Tue, 2020-06-30 at 13:04 +0200, Philippe Mathieu-Daudé wrote:
> The Persistent Memory Region Controller Memory Space Control
> register is 64-bit wide. See 'Figure 68: Register Definition'
> of the 'NVM Express Base Specification Revision 1.4'.
> 
> Fixes: 6cf9413229 ("introduce PMR support from NVMe 1.4 spec")
> Reported-by: Klaus Jensen 
> Reviewed-by: Klaus Jensen 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Cc: Andrzej Jakowski 
> ---
>  include/block/nvme.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index 71c5681912..82c384614a 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -21,7 +21,7 @@ typedef struct QEMU_PACKED NvmeBar {
>  uint32_tpmrsts;
>  uint32_tpmrebs;
>  uint32_tpmrswtp;
> -uint32_tpmrmsc;
> +uint64_tpmrmsc;
>  } NvmeBar;
>  
>  enum NvmeCapShift {
> -- 
> 2.21.3
> 
> 

Reviewed-by: Dmitry Fomichev 



Re: [PATCH v3 2/4] hw/block/nvme: Use QEMU_PACKED on hardware/packet structures

2020-07-12 Thread Dmitry Fomichev

On Tue, 2020-06-30 at 13:04 +0200, Philippe Mathieu-Daudé wrote:
> These structures either describe hardware registers, or
> commands ('packets') to send to the hardware. To forbid
> the compiler to optimize and change fields alignment,
> mark the structures as packed.
> 
> Reviewed-by: Klaus Jensen 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/block/nvme.h | 38 +++---
>  1 file changed, 19 insertions(+), 19 deletions(-)
> 
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index 1720ee1d51..71c5681912 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -1,7 +1,7 @@
>  #ifndef BLOCK_NVME_H
>  #define BLOCK_NVME_H
>  
> -typedef struct NvmeBar {
> +typedef struct QEMU_PACKED NvmeBar {
>  uint64_tcap;
>  uint32_tvs;
>  uint32_tintms;
> @@ -377,7 +377,7 @@ enum NvmePmrmscMask {
>  #define NVME_PMRMSC_SET_CBA(pmrmsc, val)   \
>  (pmrmsc |= (uint64_t)(val & PMRMSC_CBA_MASK) << PMRMSC_CBA_SHIFT)
>  
> -typedef struct NvmeCmd {
> +typedef struct QEMU_PACKED NvmeCmd {
>  uint8_t opcode;
>  uint8_t fuse;
>  uint16_tcid;
> @@ -422,7 +422,7 @@ enum NvmeIoCommands {
>  NVME_CMD_DSM= 0x09,
>  };
>  
> -typedef struct NvmeDeleteQ {
> +typedef struct QEMU_PACKED NvmeDeleteQ {
>  uint8_t opcode;
>  uint8_t flags;
>  uint16_tcid;
> @@ -432,7 +432,7 @@ typedef struct NvmeDeleteQ {
>  uint32_trsvd11[5];
>  } NvmeDeleteQ;
>  
> -typedef struct NvmeCreateCq {
> +typedef struct QEMU_PACKED NvmeCreateCq {
>  uint8_t opcode;
>  uint8_t flags;
>  uint16_tcid;
> @@ -449,7 +449,7 @@ typedef struct NvmeCreateCq {
>  #define NVME_CQ_FLAGS_PC(cq_flags)  (cq_flags & 0x1)
>  #define NVME_CQ_FLAGS_IEN(cq_flags) ((cq_flags >> 1) & 0x1)
>  
> -typedef struct NvmeCreateSq {
> +typedef struct QEMU_PACKED NvmeCreateSq {
>  uint8_t opcode;
>  uint8_t flags;
>  uint16_tcid;
> @@ -474,7 +474,7 @@ enum NvmeQueueFlags {
>  NVME_Q_PRIO_LOW = 3,
>  };
>  
> -typedef struct NvmeIdentify {
> +typedef struct QEMU_PACKED NvmeIdentify {
>  uint8_t opcode;
>  uint8_t flags;
>  uint16_tcid;
> @@ -486,7 +486,7 @@ typedef struct NvmeIdentify {
>  uint32_trsvd11[5];
>  } NvmeIdentify;
>  
> -typedef struct NvmeRwCmd {
> +typedef struct QEMU_PACKED NvmeRwCmd {
>  uint8_t opcode;
>  uint8_t flags;
>  uint16_tcid;
> @@ -528,7 +528,7 @@ enum {
>  NVME_RW_PRINFO_PRCHK_REF= 1 << 10,
>  };
>  
> -typedef struct NvmeDsmCmd {
> +typedef struct QEMU_PACKED NvmeDsmCmd {
>  uint8_t opcode;
>  uint8_t flags;
>  uint16_tcid;
> @@ -547,7 +547,7 @@ enum {
>  NVME_DSMGMT_AD  = 1 << 2,
>  };
>  
> -typedef struct NvmeDsmRange {
> +typedef struct QEMU_PACKED NvmeDsmRange {
>  uint32_tcattr;
>  uint32_tnlb;
>  uint64_tslba;
> @@ -569,14 +569,14 @@ enum NvmeAsyncEventRequest {
>  NVME_AER_INFO_SMART_SPARE_THRESH= 2,
>  };
>  
> -typedef struct NvmeAerResult {
> +typedef struct QEMU_PACKED NvmeAerResult {
>  uint8_t event_type;
>  uint8_t event_info;
>  uint8_t log_page;
>  uint8_t resv;
>  } NvmeAerResult;
>  
> -typedef struct NvmeCqe {
> +typedef struct QEMU_PACKED NvmeCqe {
>  uint32_tresult;
>  uint32_trsvd;
>  uint16_tsq_head;
> @@ -634,7 +634,7 @@ enum NvmeStatusCodes {
>  NVME_NO_COMPLETE= 0xffff,
>  };
>  
> -typedef struct NvmeFwSlotInfoLog {
> +typedef struct QEMU_PACKED NvmeFwSlotInfoLog {
>  uint8_t afi;
>  uint8_t reserved1[7];
>  uint8_t frs1[8];
> @@ -647,7 +647,7 @@ typedef struct NvmeFwSlotInfoLog {
>  uint8_t reserved2[448];
>  } NvmeFwSlotInfoLog;
>  
> -typedef struct NvmeErrorLog {
> +typedef struct QEMU_PACKED NvmeErrorLog {
>  uint64_terror_count;
>  uint16_tsqid;
>  uint16_tcid;
> @@ -659,7 +659,7 @@ typedef struct NvmeErrorLog {
>  uint8_t resv[35];
>  } NvmeErrorLog;
>  
> -typedef struct NvmeSmartLog {
> +typedef struct QEMU_PACKED NvmeSmartLog {
>  uint8_t critical_warning;
>  uint8_t temperature[2];
>  uint8_t available_spare;
> @@ -693,7 +693,7 @@ enum LogIdentifier {
>  NVME_LOG_FW_SLOT_INFO   = 0x03,
>  };
>  
> -typedef struct NvmePSD {
> +typedef struct QEMU_PACKED NvmePSD {
>  uint16_tmp;
>  uint16_treserved;
>  uint32_tenlat;
> @@ -713,7 +713,7 @@ enum {
>  NVME_ID_CNS_NS_ACTIVE_LIST = 0x2,
>  };
>  
> -typedef struct NvmeIdCtrl {
> +typedef struct QEMU_PACKED NvmeIdCtrl {
>  uint16_tvid;
>  uint16_tssvid;
>  uint8_t sn[20];
> @@ -807,7 +807,7 @@ enum NvmeFeatureIds {
>  NVME_SOFTWARE_PROGRESS_MARKER   = 0x80
>  };
>  
> -typedef struct NvmeRangeType {
> +typedef struct QEMU_PACKED NvmeRangeType {
>  uint8_t type;
>  uint8_t attributes;
>  uint8_t rsvd2[14];
> @@ 

[Bug 1887318] Re: impossible to install in OSX Yosemite 10.10.5

2020-07-12 Thread JuanPabloCuervo
https://github.com/Homebrew/brew/issues/7667

https://security.stackexchange.com/questions/232445/https-connection-to-
specific-sites-fail-with-curl-on-macos

This is a cat-and-mouse game...
Catch-22...

It's not a Brew problem,
it's not a glib problem,
it's not a git problem,
so we don't care...
It's an Apple problem,
and Apple does not care.

End of story.

** Bug watch added: github.com/Homebrew/brew/issues #7667
   https://github.com/Homebrew/brew/issues/7667

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1887318

Title:
  impossible to install in OSX Yosemite 10.10.5

Status in QEMU:
  New

Bug description:
  The Brew method has glib problems; glib is impossible to install.
  The MacPorts method has a very long .log file.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1887318/+subscriptions



[Bug 1887318] Re: impossible to install in OSX Yosemite 10.10.5

2020-07-12 Thread JuanPabloCuervo
Console log attached.

I installed Xcode 6.3 for Yosemite, as recommended by MacPorts
("better than 6.1").


** Attachment added: "console.txt"
   
https://bugs.launchpad.net/qemu/+bug/1887318/+attachment/5392137/+files/console.txt

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1887318

Title:
  impossible to install in OSX Yosemite 10.10.5

Status in QEMU:
  New

Bug description:
  The Brew method has glib problems; glib is impossible to install.
  The MacPorts method has a very long .log file.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1887318/+subscriptions



[PATCH] target/openrisc: Remove superfluous breaks

2020-07-12 Thread Yi Wang
From: Liao Pingfang 

Remove superfluous breaks, as there is a "return" before them.

Signed-off-by: Liao Pingfang 
---
 target/openrisc/sys_helper.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/target/openrisc/sys_helper.c b/target/openrisc/sys_helper.c
index d9fe6c5..d9691d0 100644
--- a/target/openrisc/sys_helper.c
+++ b/target/openrisc/sys_helper.c
@@ -289,10 +289,8 @@ target_ulong HELPER(mfspr)(CPUOpenRISCState *env, 
target_ulong rd,
 
 case TO_SPR(5, 1):  /* MACLO */
 return (uint32_t)env->mac;
-break;
 case TO_SPR(5, 2):  /* MACHI */
 return env->mac >> 32;
-break;
 
 case TO_SPR(8, 0):  /* PMR */
 return env->pmr;
-- 
2.9.5




[PATCH] hw: Remove superfluous breaks

2020-07-12 Thread Yi Wang
From: Liao Pingfang 

Remove superfluous breaks, as there is a "return" before them.

Signed-off-by: Liao Pingfang 
---
 hw/block/pflash_cfi01.c |  1 -
 hw/display/cirrus_vga.c |  1 -
 hw/display/qxl-logger.c |  2 --
 hw/gpio/max7310.c   |  3 ---
 hw/i386/intel_iommu.c   |  1 -
 hw/input/pxa2xx_keypad.c| 10 --
 hw/intc/armv7m_nvic.c   |  1 -
 hw/net/lan9118.c|  2 --
 hw/usb/ccid-card-emulated.c |  1 -
 9 files changed, 22 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 8ab1d66..f0fcd63 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -213,7 +213,6 @@ static uint32_t pflash_devid_query(PFlashCFI01 *pfl, hwaddr 
offset)
 default:
 trace_pflash_device_info(offset);
 return 0;
-break;
 }
 /* Replicate responses for each device in bank. */
 if (pfl->device_width < pfl->bank_width) {
diff --git a/hw/display/cirrus_vga.c b/hw/display/cirrus_vga.c
index 212d6f5..02d9ed0 100644
--- a/hw/display/cirrus_vga.c
+++ b/hw/display/cirrus_vga.c
@@ -1637,7 +1637,6 @@ static int cirrus_vga_read_cr(CirrusVGAState * s, 
unsigned reg_index)
return s->vga.cr[s->vga.cr_index];
 case 0x26: // Attribute Controller Index Readback (R)
return s->vga.ar_index & 0x3f;
-   break;
 default:
 qemu_log_mask(LOG_GUEST_ERROR,
   "cirrus: inport cr_index 0x%02x\n", reg_index);
diff --git a/hw/display/qxl-logger.c b/hw/display/qxl-logger.c
index 2ec6d8f..c15175b 100644
--- a/hw/display/qxl-logger.c
+++ b/hw/display/qxl-logger.c
@@ -161,7 +161,6 @@ static int qxl_log_cmd_draw(PCIQXLDevice *qxl, QXLDrawable 
*draw, int group_id)
 switch (draw->type) {
 case QXL_DRAW_COPY:
 return qxl_log_cmd_draw_copy(qxl, &draw->u.copy, group_id);
-break;
 }
 return 0;
 }
@@ -180,7 +179,6 @@ static int qxl_log_cmd_draw_compat(PCIQXLDevice *qxl, 
QXLCompatDrawable *draw,
 switch (draw->type) {
 case QXL_DRAW_COPY:
 return qxl_log_cmd_draw_copy(qxl, &draw->u.copy, group_id);
-break;
 }
 return 0;
 }
diff --git a/hw/gpio/max7310.c b/hw/gpio/max7310.c
index bebb403..4f78774 100644
--- a/hw/gpio/max7310.c
+++ b/hw/gpio/max7310.c
@@ -51,11 +51,9 @@ static uint8_t max7310_rx(I2CSlave *i2c)
 switch (s->command) {
 case 0x00: /* Input port */
 return s->level ^ s->polarity;
-break;
 
 case 0x01: /* Output port */
 return s->level & ~s->direction;
-break;
 
 case 0x02: /* Polarity inversion */
 return s->polarity;
@@ -65,7 +63,6 @@ static uint8_t max7310_rx(I2CSlave *i2c)
 
 case 0x04: /* Timeout */
 return s->status;
-break;
 
 case 0xff: /* Reserved */
 return 0xff;
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index c56398e..7b390ca 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3163,7 +3163,6 @@ static int vtd_irte_get(IntelIOMMUState *iommu, uint16_t 
index,
   index, entry->irte.sid_vtype);
 /* Take this as verification failure. */
 return -VTD_FR_IR_SID_ERR;
-break;
 }
 }
 
diff --git a/hw/input/pxa2xx_keypad.c b/hw/input/pxa2xx_keypad.c
index 62aa6f6..7f2f739 100644
--- a/hw/input/pxa2xx_keypad.c
+++ b/hw/input/pxa2xx_keypad.c
@@ -192,10 +192,8 @@ static uint64_t pxa2xx_keypad_read(void *opaque, hwaddr 
offset,
 s->kpc &= ~(KPC_DI);
 qemu_irq_lower(s->irq);
 return tmp;
-break;
 case KPDK:
 return s->kpdk;
-break;
 case KPREC:
 tmp = s->kprec;
 if(tmp & KPREC_OF1)
@@ -207,31 +205,23 @@ static uint64_t pxa2xx_keypad_read(void *opaque, hwaddr 
offset,
 if(tmp & KPREC_UF0)
 s->kprec &= ~(KPREC_UF0);
 return tmp;
-break;
 case KPMK:
 tmp = s->kpmk;
 if(tmp & KPMK_MKP)
 s->kpmk &= ~(KPMK_MKP);
 return tmp;
-break;
 case KPAS:
 return s->kpas;
-break;
 case KPASMKP0:
 return s->kpasmkp[0];
-break;
 case KPASMKP1:
 return s->kpasmkp[1];
-break;
 case KPASMKP2:
 return s->kpasmkp[2];
-break;
 case KPASMKP3:
 return s->kpasmkp[3];
-break;
 case KPKDI:
 return s->kpkdi;
-break;
 default:
 qemu_log_mask(LOG_GUEST_ERROR,
   "%s: Bad read offset 0x%"HWADDR_PRIx"\n",
diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index 3c4b6e6..720ac97 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -1275,7 +1275,6 @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, 
MemTxAttrs attrs)
 case 0xd90: /* MPU_TYPE */
 /* Unified MPU; if the MPU is not present this value is zero */
 return cpu->pmsav7_dregion << 8;
-break;
 case 0xd94: /* MPU_CTRL */
 

[Bug 1887318] [NEW] impossible to install in OSX Yosemite 10.10.5

2020-07-12 Thread JuanPabloCuervo
Public bug reported:

the Brew method has glib problems, glib is impossible to install.
the MacPorts method has a very long .log file.

** Affects: qemu
 Importance: Undecided
 Status: New

** Attachment added: "main.log"
   https://bugs.launchpad.net/bugs/1887318/+attachment/5392136/+files/main.log

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1887318

Title:
  impossible to install in OSX Yosemite 10.10.5

Status in QEMU:
  New

Bug description:
  the Brew method has glib problems, glib is impossible to install.
  the MacPorts method has a very long .log file.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1887318/+subscriptions



[PATCH] target/ppc: Remove superfluous breaks

2020-07-12 Thread Yi Wang
From: Liao Pingfang 

Remove superfluous breaks, as there is a "return" before them.

Signed-off-by: Liao Pingfang 
---
 target/ppc/misc_helper.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c
index 55b68d1..e43a3b4 100644
--- a/target/ppc/misc_helper.c
+++ b/target/ppc/misc_helper.c
@@ -234,25 +234,20 @@ target_ulong helper_clcs(CPUPPCState *env, uint32_t arg)
 case 0x0CUL:
 /* Instruction cache line size */
 return env->icache_line_size;
-break;
 case 0x0DUL:
 /* Data cache line size */
 return env->dcache_line_size;
-break;
 case 0x0EUL:
 /* Minimum cache line size */
 return (env->icache_line_size < env->dcache_line_size) ?
 env->icache_line_size : env->dcache_line_size;
-break;
 case 0x0FUL:
 /* Maximum cache line size */
 return (env->icache_line_size > env->dcache_line_size) ?
 env->icache_line_size : env->dcache_line_size;
-break;
 default:
 /* Undefined */
 return 0;
-break;
 }
 }
 
-- 
2.9.5




[PATCH] target/sh4: Remove superfluous breaks

2020-07-12 Thread Yi Wang
From: Liao Pingfang 

Remove superfluous breaks, as there is a "return" before them.

Signed-off-by: Liao Pingfang 
---
 target/sh4/translate.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 6192d83..60c863d 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -1542,7 +1542,6 @@ static void _decode_opc(DisasContext * ctx)
 tcg_gen_qemu_ld_i32(REG(0), REG(B11_8), ctx->memidx,
 MO_TEUL | MO_UNALN);
 return;
-break;
 case 0x40e9:/* movua.l @Rm+,R0 */
 CHECK_SH4A
 /* Load non-boundary-aligned data */
@@ -1550,7 +1549,6 @@ static void _decode_opc(DisasContext * ctx)
 MO_TEUL | MO_UNALN);
 tcg_gen_addi_i32(REG(B11_8), REG(B11_8), 4);
 return;
-break;
 case 0x0029:   /* movt Rn */
 tcg_gen_mov_i32(REG(B11_8), cpu_sr_t);
return;
@@ -1638,7 +1636,6 @@ static void _decode_opc(DisasContext * ctx)
 CHECK_SH4A
 tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
 return;
-break;
 case 0x4024:   /* rotcl Rn */
{
TCGv tmp = tcg_temp_new();
-- 
2.9.5




[PATCH] scsi: Remove superfluous breaks

2020-07-12 Thread Yi Wang
From: Liao Pingfang 

Remove superfluous breaks, as there is a "return" before them.

Signed-off-by: Liao Pingfang 
---
 scsi/utils.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/scsi/utils.c b/scsi/utils.c
index c50e81f..b37c283 100644
--- a/scsi/utils.c
+++ b/scsi/utils.c
@@ -32,17 +32,13 @@ uint32_t scsi_cdb_xfer(uint8_t *buf)
 switch (buf[0] >> 5) {
 case 0:
 return buf[4];
-break;
 case 1:
 case 2:
 return lduw_be_p(&buf[7]);
-break;
 case 4:
 return ldl_be_p(&buf[10]) & 0xffffffffULL;
-break;
 case 5:
 return ldl_be_p(&buf[6]) & 0xffffffffULL;
-break;
 default:
 return -1;
 }
-- 
2.9.5




[PATCH] block/vmdk: Remove superfluous breaks

2020-07-12 Thread Yi Wang
From: Liao Pingfang 

Remove superfluous breaks, as there is a "return" before them.

Signed-off-by: Liao Pingfang 
---
 block/vmdk.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 28cec50..8f222e3 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1053,14 +1053,11 @@ static int vmdk_open_sparse(BlockDriverState *bs, 
BdrvChild *file, int flags,
 switch (magic) {
 case VMDK3_MAGIC:
 return vmdk_open_vmfs_sparse(bs, file, flags, errp);
-break;
 case VMDK4_MAGIC:
 return vmdk_open_vmdk4(bs, file, flags, options, errp);
-break;
 default:
 error_setg(errp, "Image not in VMDK format");
 return -EINVAL;
-break;
 }
 }
 
-- 
2.9.5




[PATCH] target/arm/kvm: Remove superfluous break

2020-07-12 Thread Yi Wang
From: Liao Pingfang 

Remove superfluous break.

Signed-off-by: Liao Pingfang 
---
 target/arm/kvm64.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 1169237..ef1e960 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -330,7 +330,6 @@ int kvm_arch_remove_hw_breakpoint(target_ulong addr,
 switch (type) {
 case GDB_BREAKPOINT_HW:
 return delete_hw_breakpoint(addr);
-break;
 case GDB_WATCHPOINT_READ:
 case GDB_WATCHPOINT_WRITE:
 case GDB_WATCHPOINT_ACCESS:
-- 
2.9.5




[PATCH] migration/migration.c: Remove superfluous breaks

2020-07-12 Thread Yi Wang
From: Liao Pingfang 

Remove superfluous breaks, as there is a "return" before them.

Signed-off-by: Liao Pingfang 
---
 migration/migration.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 92e44e0..2fd5fbb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -985,7 +985,6 @@ static void fill_source_migration_info(MigrationInfo *info)
 /* no migration has happened ever */
 /* do not overwrite destination migration status */
 return;
-break;
 case MIGRATION_STATUS_SETUP:
 info->has_status = true;
 info->has_total_time = false;
@@ -1104,7 +1103,6 @@ static void fill_destination_migration_info(MigrationInfo 
*info)
 switch (mis->state) {
 case MIGRATION_STATUS_NONE:
 return;
-break;
 case MIGRATION_STATUS_SETUP:
 case MIGRATION_STATUS_CANCELLING:
 case MIGRATION_STATUS_CANCELLED:
-- 
2.9.5




[PATCH] target/cris: Remove superfluous breaks

2020-07-12 Thread Yi Wang
From: Liao Pingfang 

Remove superfluous breaks, as there is a "return" before them.

Signed-off-by: Liao Pingfang 
---
 target/cris/translate.c | 7 +++
 target/cris/translate_v10.inc.c | 2 --
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index aaa46b5..64a478b 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -1178,12 +1178,11 @@ static inline void t_gen_zext(TCGv d, TCGv s, int size)
 static char memsize_char(int size)
 {
 switch (size) {
-case 1: return 'b';  break;
-case 2: return 'w';  break;
-case 4: return 'd';  break;
+case 1: return 'b';
+case 2: return 'w';
+case 4: return 'd';
 default:
 return 'x';
-break;
 }
 }
 #endif
diff --git a/target/cris/translate_v10.inc.c b/target/cris/translate_v10.inc.c
index ae34a0d..7f38fd2 100644
--- a/target/cris/translate_v10.inc.c
+++ b/target/cris/translate_v10.inc.c
@@ -1026,10 +1026,8 @@ static unsigned int dec10_ind(CPUCRISState *env, 
DisasContext *dc)
 switch (dc->opcode) {
 case CRISV10_IND_MOVE_M_R:
 return dec10_ind_move_m_r(env, dc, size);
-break;
 case CRISV10_IND_MOVE_R_M:
 return dec10_ind_move_r_m(dc, size);
-break;
 case CRISV10_IND_CMP:
 LOG_DIS("cmp size=%d op=%d %d\n",  size, dc->src, dc->dst);
 cris_cc_mask(dc, CC_MASK_NZVC);
-- 
2.9.5




[PATCH] vnc: Remove the superfluous break

2020-07-12 Thread Yi Wang
From: Liao Pingfang 

Remove the superfluous break, as there is a "return" before.

Signed-off-by: Liao Pingfang 
---
 ui/vnc-enc-tight.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/ui/vnc-enc-tight.c b/ui/vnc-enc-tight.c
index 1e08518..cebd358 100644
--- a/ui/vnc-enc-tight.c
+++ b/ui/vnc-enc-tight.c
@@ -1125,7 +1125,6 @@ static int send_palette_rect(VncState *vs, int x, int y,
 }
 default:
 return -1; /* No palette for 8bits colors */
-break;
 }
 bytes = w * h;
 vs->tight->tight.offset = bytes;
-- 
2.9.5




[PATCH] virtfs-proxy-helper: Remove the superfluous break

2020-07-12 Thread Yi Wang
From: Liao Pingfang 

Remove the superfluous break, as there is a "return" before it.

Signed-off-by: Liao Pingfang 
---
 fsdev/virtfs-proxy-helper.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fsdev/virtfs-proxy-helper.c b/fsdev/virtfs-proxy-helper.c
index de061a8..e68acc1 100644
--- a/fsdev/virtfs-proxy-helper.c
+++ b/fsdev/virtfs-proxy-helper.c
@@ -825,7 +825,6 @@ static int process_reply(int sock, int type,
 break;
 default:
 return -1;
-break;
 }
 return 0;
 }
-- 
2.9.5




[PATCH] tcg/riscv: Remove superfluous breaks

2020-07-12 Thread Yi Wang
From: Liao Pingfang 

Remove superfluous breaks, as there is a "return" before them.

Signed-off-by: Liao Pingfang 
---
 tcg/riscv/tcg-target.inc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tcg/riscv/tcg-target.inc.c b/tcg/riscv/tcg-target.inc.c
index 2bc0ba7..3c11ab8 100644
--- a/tcg/riscv/tcg-target.inc.c
+++ b/tcg/riscv/tcg-target.inc.c
@@ -502,10 +502,8 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 break;
 case R_RISCV_JAL:
 return reloc_jimm20(code_ptr, (tcg_insn_unit *)value);
-break;
 case R_RISCV_CALL:
 return reloc_call(code_ptr, (tcg_insn_unit *)value);
-break;
 default:
 tcg_abort();
 }
-- 
2.9.5




[RFC PATCH 6/8] fpu/softfloat: define operation for bfloat16

2020-07-12 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 fpu/softfloat.c | 146 +++-
 include/fpu/softfloat.h |  44 
 2 files changed, 189 insertions(+), 1 deletion(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 54fc889446..9a58107be3 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1182,6 +1182,28 @@ float64_sub(float64 a, float64 b, float_status *s)
 return float64_addsub(a, b, s, hard_f64_sub, soft_f64_sub);
 }
 
+/*
+ * Returns the result of adding or subtracting the brain floating-point
+ * values `a' and `b'.
+ */
+bfloat16 QEMU_FLATTEN bfloat16_add(bfloat16 a, bfloat16 b, float_status 
*status)
+{
+FloatParts pa = bfloat16_unpack_canonical(a, status);
+FloatParts pb = bfloat16_unpack_canonical(b, status);
+FloatParts pr = addsub_floats(pa, pb, false, status);
+
+return bfloat16_round_pack_canonical(pr, status);
+}
+
+bfloat16 QEMU_FLATTEN bfloat16_sub(bfloat16 a, bfloat16 b, float_status 
*status)
+{
+FloatParts pa = bfloat16_unpack_canonical(a, status);
+FloatParts pb = bfloat16_unpack_canonical(b, status);
+FloatParts pr = addsub_floats(pa, pb, true, status);
+
+return bfloat16_round_pack_canonical(pr, status);
+}
+
 /*
  * Returns the result of multiplying the floating-point values `a' and
  * `b'. The operation is performed according to the IEC/IEEE Standard
@@ -1284,6 +1306,20 @@ float64_mul(float64 a, float64 b, float_status *s)
 f64_is_zon2, f64_addsubmul_post);
 }
 
+/*
+ * Returns the result of multiplying the brain floating-point
+ * values `a' and `b'.
+ */
+
+bfloat16 QEMU_FLATTEN bfloat16_mul(bfloat16 a, bfloat16 b, float_status 
*status)
+{
+FloatParts pa = bfloat16_unpack_canonical(a, status);
+FloatParts pb = bfloat16_unpack_canonical(b, status);
+FloatParts pr = mul_floats(pa, pb, status);
+
+return bfloat16_round_pack_canonical(pr, status);
+}
+
 /*
  * Returns the result of multiplying the floating-point values `a' and
  * `b' then adding 'c', with no intermediate rounding step after the
@@ -1666,6 +1702,23 @@ float64_muladd(float64 xa, float64 xb, float64 xc, int 
flags, float_status *s)
 return soft_f64_muladd(ua.s, ub.s, uc.s, flags, s);
 }
 
+/*
+ * Returns the result of multiplying the brain floating-point values `a'
+ * and `b' then adding 'c', with no intermediate rounding step after the
+ * multiplication.
+ */
+
+bfloat16 QEMU_FLATTEN bfloat16_muladd(bfloat16 a, bfloat16 b, bfloat16 c,
+  int flags, float_status *status)
+{
+FloatParts pa = bfloat16_unpack_canonical(a, status);
+FloatParts pb = bfloat16_unpack_canonical(b, status);
+FloatParts pc = bfloat16_unpack_canonical(c, status);
+FloatParts pr = muladd_floats(pa, pb, pc, flags, status);
+
+return bfloat16_round_pack_canonical(pr, status);
+}
+
 /*
  * Returns the result of dividing the floating-point value `a' by the
  * corresponding value `b'. The operation is performed according to
@@ -1832,6 +1885,20 @@ float64_div(float64 a, float64 b, float_status *s)
 f64_div_pre, f64_div_post);
 }
 
+/*
+ * Returns the result of dividing the brain floating-point
+ * value `a' by the corresponding value `b'.
+ */
+
+bfloat16 bfloat16_div(bfloat16 a, bfloat16 b, float_status *status)
+{
+FloatParts pa = bfloat16_unpack_canonical(a, status);
+FloatParts pb = bfloat16_unpack_canonical(b, status);
+FloatParts pr = div_floats(pa, pb, status);
+
+return bfloat16_round_pack_canonical(pr, status);
+}
+
 /*
  * Float to Float conversions
  *
@@ -2871,6 +2938,25 @@ MINMAX(64, maxnummag, false, true, true)
 
 #undef MINMAX
 
+#define BF16_MINMAX(name, ismin, isiee, ismag)  \
+bfloat16 bfloat16_ ## name(bfloat16 a, bfloat16 b, float_status *s) \
+{   \
+FloatParts pa = bfloat16_unpack_canonical(a, s);\
+FloatParts pb = bfloat16_unpack_canonical(b, s);\
+FloatParts pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);  \
+\
+return bfloat16_round_pack_canonical(pr, s);\
+}
+
+BF16_MINMAX(min, true, false, false)
+BF16_MINMAX(minnum, true, true, false)
+BF16_MINMAX(minnummag, true, true, true)
+BF16_MINMAX(max, false, false, false)
+BF16_MINMAX(maxnum, false, true, false)
+BF16_MINMAX(maxnummag, false, true, true)
+
+#undef BF16_MINMAX
+
 /* Floating point compare */
 static FloatRelation compare_floats(FloatParts a, FloatParts b, bool is_quiet,
 float_status *s)
@@ -3032,6 +3118,24 @@ FloatRelation float64_compare_quiet(float64 a, float64 
b, float_status *s)
 return f64_compare(a, b, true, s);
 }
 
+static int QEMU_FLATTEN
+soft_bf16_compare(bfloat16 a, bfloat16 b, bool is_quiet, float_status *s)
+{
+FloatParts pa = 

[RFC PATCH 3/8] fpu/softfloat: add FloatFmt for bfloat16

2020-07-12 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 fpu/softfloat.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 79be4f5840..1ef07d9160 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -554,6 +554,10 @@ static const FloatFmt float16_params_ahp = {
 .arm_althp = true
 };
 
+static const FloatFmt bfloat16_params = {
+FLOAT_PARAMS(8, 7)
+};
+
 static const FloatFmt float32_params = {
 FLOAT_PARAMS(8, 23)
 };
-- 
2.23.0




[RFC PATCH 2/8] fpu/softfloat: use similar logic to recognize sNaN and qNaN

2020-07-12 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 fpu/softfloat-specialize.inc.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fpu/softfloat-specialize.inc.c b/fpu/softfloat-specialize.inc.c
index 034d18199c..6b778a7830 100644
--- a/fpu/softfloat-specialize.inc.c
+++ b/fpu/softfloat-specialize.inc.c
@@ -292,7 +292,7 @@ bool float32_is_quiet_nan(float32 a_, float_status *status)
 if (snan_bit_is_one(status)) {
 return (((a >> 22) & 0x1FF) == 0x1FE) && (a & 0x003FFFFF);
 } else {
-return ((uint32_t)(a << 1) >= 0xFF800000);
+return ((a >> 22) & 0x1FF) == 0x1FF;
 }
 #endif
 }
@@ -309,7 +309,7 @@ bool float32_is_signaling_nan(float32 a_, float_status 
*status)
 #else
 uint32_t a = float32_val(a_);
 if (snan_bit_is_one(status)) {
-return ((uint32_t)(a << 1) >= 0xFF800000);
+return ((a >> 22) & 0x1FF) == 0x1FF;
 } else {
 return (((a >> 22) & 0x1FF) == 0x1FE) && (a & 0x003FFFFF);
 }
@@ -647,7 +647,7 @@ bool float64_is_quiet_nan(float64 a_, float_status *status)
 return (((a >> 51) & 0xFFF) == 0xFFE)
 && (a & 0x0007FFFFFFFFFFFFULL);
 } else {
-return ((a << 1) >= 0xFFF0000000000000ULL);
+return ((a >> 51) & 0xFFF) == 0xFFF;
 }
 #endif
 }
@@ -664,7 +664,7 @@ bool float64_is_signaling_nan(float64 a_, float_status 
*status)
 #else
 uint64_t a = float64_val(a_);
 if (snan_bit_is_one(status)) {
-return ((a << 1) >= 0xFFF0000000000000ULL);
+return ((a >> 51) & 0xFFF) == 0xFFF;
 } else {
 return (((a >> 51) & 0xFFF) == 0xFFE)
 && (a & UINT64_C(0x0007FFFFFFFFFFFF));
-- 
2.23.0
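
For context on the bit patterns involved (an illustration, not part of the
patch): in the default encoding, where the signaling bit is not inverted, a
binary32 NaN has an all-ones 8-bit exponent and a non-zero 23-bit fraction,
and it is quiet exactly when the fraction's most significant bit (bit 22)
is also set. The unified check introduced above therefore inspects the nine
bits below the sign bit; the float64 hunks apply the same idea to the
twelve bits below the sign bit.

#include <stdint.h>
#include <stdbool.h>

/* Quiet NaN: all-ones exponent and fraction MSB set (9 bits below the sign). */
static bool f32_is_quiet_nan_sketch(uint32_t a)
{
    return ((a >> 22) & 0x1FF) == 0x1FF;
}

/* Signaling NaN: all-ones exponent, fraction MSB clear, fraction non-zero. */
static bool f32_is_signaling_nan_sketch(uint32_t a)
{
    return (((a >> 22) & 0x1FF) == 0x1FE) && (a & 0x003FFFFF);
}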




[RFC PATCH 0/8] Implement bfloat16 in softfloat

2020-07-12 Thread LIU Zhiwei
As bfloat16 is becoming more and more popular in many archs, implement
bfloat16 interfaces in softfloat, so that archs can add their bfloat16
insns based on the bfloat16 interfaces here.

This patch set is mostly a copy of the float16 code rather than defining
really new interfaces or implementations.

Any thoughts are welcome!
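
As background, an illustration of the format itself (not how the series
converts values): bfloat16 keeps float32's sign bit and full 8-bit exponent
but only the top 7 fraction bits, so a round-toward-zero conversion is just
a truncation of the upper half of the float32 encoding. The patches instead
route everything through the generic FloatParts unpack/round machinery so
that rounding modes, NaN propagation and exception flags stay consistent
with the other formats.

#include <stdint.h>
#include <string.h>

/* Truncating float32 -> bfloat16 conversion, for illustration only. */
static uint16_t float32_to_bfloat16_truncate(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof(bits));  /* 1 sign, 8 exponent, 23 fraction bits */
    return (uint16_t)(bits >> 16);    /* keep sign, exponent, top 7 fraction bits */
}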

LIU Zhiwei (8):
  fpu/softfloat: fix up float16 nan recognition
  fpu/softfloat: use similar logic to recognize sNaN and qNaN
  fpu/softfloat: add FloatFmt for bfloat16
  fpu/softfloat: add pack and unpack interfaces for bfloat16
  fpu/softfloat: define brain floating-point types
  fpu/softfloat: define operation for bfloat16
  fpu/softfloat: define covert operation for bfloat16
  fpu/softfloat: define misc operation for bfloat16

 fpu/softfloat-specialize.inc.c |  50 -
 fpu/softfloat.c| 393 -
 include/fpu/softfloat-types.h  |   8 +
 include/fpu/softfloat.h| 133 +++
 4 files changed, 577 insertions(+), 7 deletions(-)

-- 
2.23.0




[RFC PATCH 5/8] fpu/softfloat: define brain floating-point types

2020-07-12 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 include/fpu/softfloat-types.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index 7680193ebc..8f8fdfeecf 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -112,6 +112,14 @@ typedef struct {
 #define make_float128(high_, low_) ((float128) { .high = high_, .low = low_ })
 #define make_float128_init(high_, low_) { .high = high_, .low = low_ }
 
+/*
+ * Software brain floating-point types
+ */
+typedef uint16_t bfloat16;
+#define bfloat16_val(x) (x)
+#define make_bfloat16(x) (x)
+#define const_bfloat16(x) (x)
+
 /*
  * Software IEC/IEEE floating-point underflow tininess-detection mode.
  */
-- 
2.23.0




[RFC PATCH 7/8] fpu/softfloat: define covert operation for bfloat16

2020-07-12 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 fpu/softfloat.c | 223 
 include/fpu/softfloat.h |  48 +
 2 files changed, 271 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 9a58107be3..b6002d6856 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2014,6 +2014,34 @@ float32 float64_to_float32(float64 a, float_status *s)
 return float32_round_pack_canonical(pr, s);
 }
 
+float32 bfloat16_to_float32(bfloat16 a, float_status *s)
+{
+FloatParts p = bfloat16_unpack_canonical(a, s);
+FloatParts pr = float_to_float(p, &float32_params, s);
+return float32_round_pack_canonical(pr, s);
+}
+
+float64 bfloat16_to_float64(bfloat16 a, float_status *s)
+{
+FloatParts p = bfloat16_unpack_canonical(a, s);
+FloatParts pr = float_to_float(p, &float64_params, s);
+return float64_round_pack_canonical(pr, s);
+}
+
+bfloat16 float32_to_bfloat16(float32 a, float_status *s)
+{
+FloatParts p = float32_unpack_canonical(a, s);
+FloatParts pr = float_to_float(p, &bfloat16_params, s);
+return bfloat16_round_pack_canonical(pr, s);
+}
+
+bfloat16 float64_to_bfloat16(float64 a, float_status *s)
+{
+FloatParts p = float64_unpack_canonical(a, s);
+FloatParts pr = float_to_float(p, &bfloat16_params, s);
+return bfloat16_round_pack_canonical(pr, s);
+}
+
 /*
  * Rounds the floating-point value `a' to an integer, and returns the
  * result as a floating-point value. The operation is performed
@@ -2143,6 +2171,18 @@ float64 float64_round_to_int(float64 a, float_status *s)
 return float64_round_pack_canonical(pr, s);
 }
 
+/*
+ * Rounds the brain floating-point value `a' to an integer, and returns the
+ * result as a brain floating-point value.
+ */
+
+bfloat16 bfloat16_round_to_int(bfloat16 a, float_status *s)
+{
+FloatParts pa = bfloat16_unpack_canonical(a, s);
+FloatParts pr = round_to_int(pa, s->float_rounding_mode, 0, s);
+return bfloat16_round_pack_canonical(pr, s);
+}
+
 /*
  * Returns the result of converting the floating-point value `a' to
  * the two's complement integer format. The conversion is performed
@@ -2353,6 +2393,62 @@ int64_t float64_to_int64_round_to_zero(float64 a, 
float_status *s)
 return float64_to_int64_scalbn(a, float_round_to_zero, 0, s);
 }
 
+/*
+ * Returns the result of converting the floating-point value `a' to
+ * the two's complement integer format.
+ */
+
+int16_t bfloat16_to_int16_scalbn(bfloat16 a, int rmode, int scale,
+ float_status *s)
+{
+return round_to_int_and_pack(bfloat16_unpack_canonical(a, s),
+ rmode, scale, INT16_MIN, INT16_MAX, s);
+}
+
+int32_t bfloat16_to_int32_scalbn(bfloat16 a, int rmode, int scale,
+ float_status *s)
+{
+return round_to_int_and_pack(bfloat16_unpack_canonical(a, s),
+ rmode, scale, INT32_MIN, INT32_MAX, s);
+}
+
+int64_t bfloat16_to_int64_scalbn(bfloat16 a, int rmode, int scale,
+ float_status *s)
+{
+return round_to_int_and_pack(bfloat16_unpack_canonical(a, s),
+ rmode, scale, INT64_MIN, INT64_MAX, s);
+}
+
+int16_t bfloat16_to_int16(bfloat16 a, float_status *s)
+{
+return bfloat16_to_int16_scalbn(a, s->float_rounding_mode, 0, s);
+}
+
+int32_t bfloat16_to_int32(bfloat16 a, float_status *s)
+{
+return bfloat16_to_int32_scalbn(a, s->float_rounding_mode, 0, s);
+}
+
+int64_t bfloat16_to_int64(bfloat16 a, float_status *s)
+{
+return bfloat16_to_int64_scalbn(a, s->float_rounding_mode, 0, s);
+}
+
+int16_t bfloat16_to_int16_round_to_zero(bfloat16 a, float_status *s)
+{
+return bfloat16_to_int16_scalbn(a, float_round_to_zero, 0, s);
+}
+
+int32_t bfloat16_to_int32_round_to_zero(bfloat16 a, float_status *s)
+{
+return bfloat16_to_int32_scalbn(a, float_round_to_zero, 0, s);
+}
+
+int64_t bfloat16_to_int64_round_to_zero(bfloat16 a, float_status *s)
+{
+return bfloat16_to_int64_scalbn(a, float_round_to_zero, 0, s);
+}
+
 /*
  *  Returns the result of converting the floating-point value `a' to
  *  the unsigned integer format. The conversion is performed according
@@ -2566,6 +2662,62 @@ uint64_t float64_to_uint64_round_to_zero(float64 a, 
float_status *s)
 return float64_to_uint64_scalbn(a, float_round_to_zero, 0, s);
 }
 
+/*
+ *  Returns the result of converting the brain floating-point value `a' to
+ *  the unsigned integer format.
+ */
+
+uint16_t bfloat16_to_uint16_scalbn(bfloat16 a, int rmode, int scale,
+   float_status *s)
+{
+return round_to_uint_and_pack(bfloat16_unpack_canonical(a, s),
+  rmode, scale, UINT16_MAX, s);
+}
+
+uint32_t bfloat16_to_uint32_scalbn(bfloat16 a, int rmode, int scale,
+   float_status *s)
+{
+return round_to_uint_and_pack(bfloat16_unpack_canonical(a, s),
+  rmode, scale, UINT32_MAX, s);

[RFC PATCH 1/8] fpu/softfloat: fix up float16 nan recognition

2020-07-12 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 fpu/softfloat-specialize.inc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fpu/softfloat-specialize.inc.c b/fpu/softfloat-specialize.inc.c
index 44f5b661f8..034d18199c 100644
--- a/fpu/softfloat-specialize.inc.c
+++ b/fpu/softfloat-specialize.inc.c
@@ -254,7 +254,7 @@ bool float16_is_quiet_nan(float16 a_, float_status *status)
 if (snan_bit_is_one(status)) {
 return (((a >> 9) & 0x3F) == 0x3E) && (a & 0x1FF);
 } else {
-return ((a & ~0x8000) >= 0x7C80);
+return ((a >> 9) & 0x3F) == 0x3F;
 }
 #endif
 }
@@ -271,7 +271,7 @@ bool float16_is_signaling_nan(float16 a_, float_status 
*status)
 #else
 uint16_t a = float16_val(a_);
 if (snan_bit_is_one(status)) {
-return ((a & ~0x8000) >= 0x7C80);
+return ((a >> 9) & 0x3F) == 0x3F;
 } else {
 return (((a >> 9) & 0x3F) == 0x3E) && (a & 0x1FF);
 }
-- 
2.23.0




[RFC PATCH 4/8] fpu/softfloat: add pack and unpack interfaces for bfloat16

2020-07-12 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 fpu/softfloat.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 1ef07d9160..54fc889446 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -584,6 +584,11 @@ static inline FloatParts float16_unpack_raw(float16 f)
 return unpack_raw(float16_params, f);
 }
 
+static inline FloatParts bfloat16_unpack_raw(bfloat16 f)
+{
+return unpack_raw(bfloat16_params, f);
+}
+
 static inline FloatParts float32_unpack_raw(float32 f)
 {
 return unpack_raw(float32_params, f);
@@ -607,6 +612,11 @@ static inline float16 float16_pack_raw(FloatParts p)
 return make_float16(pack_raw(float16_params, p));
 }
 
+static inline bfloat16 bfloat16_pack_raw(FloatParts p)
+{
+return make_bfloat16(pack_raw(bfloat16_params, p));
+}
+
 static inline float32 float32_pack_raw(FloatParts p)
 {
 return make_float32(pack_raw(float32_params, p));
@@ -824,6 +834,11 @@ static FloatParts float16_unpack_canonical(float16 f, 
float_status *s)
 return float16a_unpack_canonical(f, s, &float16_params);
 }
 
+static FloatParts bfloat16_unpack_canonical(bfloat16 f, float_status *s)
+{
+return sf_canonicalize(bfloat16_unpack_raw(f), &bfloat16_params, s);
+}
+
 static float16 float16a_round_pack_canonical(FloatParts p, float_status *s,
  const FloatFmt *params)
 {
@@ -835,6 +850,11 @@ static float16 float16_round_pack_canonical(FloatParts p, 
float_status *s)
 return float16a_round_pack_canonical(p, s, &float16_params);
 }
 
+static bfloat16 bfloat16_round_pack_canonical(FloatParts p, float_status *s)
+{
+return float16a_round_pack_canonical(p, s, &bfloat16_params);
+}
+
 static FloatParts float32_unpack_canonical(float32 f, float_status *s)
 {
 return sf_canonicalize(float32_unpack_raw(f), &float32_params, s);
-- 
2.23.0




[RFC PATCH 8/8] fpu/softfloat: define misc operation for bfloat16

2020-07-12 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 fpu/softfloat-specialize.inc.c | 38 +++
 include/fpu/softfloat.h| 41 ++
 2 files changed, 79 insertions(+)

diff --git a/fpu/softfloat-specialize.inc.c b/fpu/softfloat-specialize.inc.c
index 6b778a7830..ff17f11f0c 100644
--- a/fpu/softfloat-specialize.inc.c
+++ b/fpu/softfloat-specialize.inc.c
@@ -259,6 +259,25 @@ bool float16_is_quiet_nan(float16 a_, float_status *status)
 #endif
 }
 
+/*
+| Returns 1 if the brain floating point value `a' is a quiet
+| NaN; otherwise returns 0.
+**/
+
+int bfloat16_is_quiet_nan(bfloat16 a_, float_status *status)
+{
+#ifdef NO_SIGNALING_NANS
+return bfloat16_is_any_nan(a_);
+#else
+uint16_t a = bfloat16_val(a_);
+if (snan_bit_is_one(status)) {
+return (((a >> 6) & 0x1FF) == 0x1FE) && (a & 0x3F);
+} else {
+return ((a >> 6) & 0x1FF) == 0x1FF;
+}
+#endif
+}
+
 /*
 | Returns 1 if the half-precision floating-point value `a' is a signaling
 | NaN; otherwise returns 0.
@@ -278,6 +297,25 @@ bool float16_is_signaling_nan(float16 a_, float_status 
*status)
 #endif
 }
 
+/*
+| Returns 1 if the brain floating point value `a' is a signaling
+| NaN; otherwise returns 0.
+**/
+
+int bfloat16_is_signaling_nan(bfloat16 a_, float_status *status)
+{
+#ifdef NO_SIGNALING_NANS
+return 0;
+#else
+uint16_t a = bfloat16_val(a_);
+if (snan_bit_is_one(status)) {
+return ((a >> 6) & 0x1FF) == 0x1FF;
+} else {
+return (((a >> 6) & 0x1FF) == 0x1FE) && (a & 0x3F);
+}
+#endif
+}
+
 /*
 | Returns 1 if the single-precision floating-point value `a' is a quiet
 | NaN; otherwise returns 0.
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 6590850253..d2c3f5fbe0 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -372,6 +372,47 @@ static inline float16 float16_set_sign(float16 a, int sign)
 #define float16_three make_float16(0x4200)
 #define float16_infinity make_float16(0x7c00)
 
+static inline int bfloat16_is_any_nan(bfloat16 a)
+{
+return ((bfloat16_val(a) & ~0x8000) > 0x7F80);
+}
+
+static inline int bfloat16_is_neg(bfloat16 a)
+{
+return bfloat16_val(a) >> 15;
+}
+
+static inline int bfloat16_is_infinity(bfloat16 a)
+{
+return (bfloat16_val(a) & 0x7fff) == 0x7F80;
+}
+
+static inline int bfloat16_is_zero(bfloat16 a)
+{
+return (bfloat16_val(a) & 0x7fff) == 0;
+}
+
+static inline int bfloat16_is_zero_or_denormal(bfloat16 a)
+{
+return (bfloat16_val(a) & 0x7F80) == 0;
+}
+
+static inline bfloat16 bfloat16_abs(bfloat16 a)
+{
+/* Note that abs does *not* handle NaN specially, nor does
+ * it flush denormal inputs to zero.
+ */
+return make_bfloat16(bfloat16_val(a) & 0x7fff);
+}
+
+static inline bfloat16 bfloat16_chs(bfloat16 a)
+{
+/* Note that chs does *not* handle NaN specially, nor does
+ * it flush denormal inputs to zero.
+ */
+return make_bfloat16(bfloat16_val(a) ^ 0x8000);
+}
+
 static inline bfloat16 bfloat16_set_sign(bfloat16 a, int sign)
 {
 return make_bfloat16((bfloat16_val(a) & 0x7fff) | (sign << 15));
-- 
2.23.0
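
To make the constants above concrete, a few example encodings and the
checks they satisfy (illustration only; bfloat16 here is 1 sign bit,
8 exponent bits and 7 fraction bits):

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint16_t plus_inf  = 0x7F80;   /* exponent all ones, fraction zero */
    uint16_t quiet_nan = 0x7FC0;   /* exponent all ones, fraction MSB set */
    uint16_t minus_one = 0xBF80;   /* sign set, biased exponent 0x7F */

    assert((plus_inf & 0x7fff) == 0x7F80);    /* bfloat16_is_infinity */
    assert((quiet_nan & ~0x8000) > 0x7F80);   /* bfloat16_is_any_nan */
    assert((minus_one >> 15) == 1);           /* bfloat16_is_neg */
    return 0;
}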




Re: [PATCH for-5.1 0/3] Move and improve qdev API doc comments

2020-07-12 Thread Richard Henderson
On 7/11/20 7:24 AM, Peter Maydell wrote:
> Peter Maydell (3):
>   qdev: Move doc comments from qdev.c to qdev-core.h
>   qdev: Document qdev_unrealize()
>   qdev: Document GPIO related functions

Reviewed-by: Richard Henderson 

r~



Re: [Bug 1887309] [NEW] Floating-point exception in ide_set_sector

2020-07-12 Thread Alexander Bulekov
On 200712 2025, Alexander Bulekov wrote:
> Public bug reported:
> 
> Hello,
> Here is a reproducer:
> cat << EOF | ./i386-softmmu/qemu-system-i386 -M pc,accel=qtest\
>  -qtest null -nographic -vga qxl -qtest stdio -nodefaults\
>  -drive if=none,id=drive0,file=null-co://,file.read-zeroes=on,format=raw\
>  -drive if=none,id=drive1,file=null-co://,file.read-zeroes=on,format=raw\
>  -device ide-cd,drive=drive0 -device ide-hd,drive=drive1
> outw 0x176 0x3538
> outl 0xcf8 0x8903
> outl 0xcfc 0x184275c
> outb 0x376 0x2f
> outb 0x376 0x0
> outw 0x176 0xa1a4
> outl 0xcf8 0x8920
> outb 0xcfc 0xff
> outb 0xf8 0x9
> EOF
> 
> The stack-trace:
> ==16513==ERROR: UndefinedBehaviorSanitizer: FPE on unknown address 
> 0x556783603fdc (pc 0x556783603fdc bp 0x7fff82143b10 sp 0x7fff82143ab0 T16513)
> #0 0x556783603fdc in ide_set_sector 
> /home/alxndr/Development/qemu/hw/ide/core.c:626:26
> #1 0x556783603fdc in ide_dma_cb 
> /home/alxndr/Development/qemu/hw/ide/core.c:883:9
> #2 0x55678349d74d in dma_complete 
> /home/alxndr/Development/qemu/dma-helpers.c:120:9
> #3 0x55678349d74d in dma_blk_cb 
> /home/alxndr/Development/qemu/dma-helpers.c:138:9
> #4 0x556783962f25 in blk_aio_complete 
> /home/alxndr/Development/qemu/block/block-backend.c:1402:9
> #5 0x556783962f25 in blk_aio_complete_bh 
> /home/alxndr/Development/qemu/block/block-backend.c:1412:5
> #6 0x556783ac030f in aio_bh_call 
> /home/alxndr/Development/qemu/util/async.c:136:5
> #7 0x556783ac030f in aio_bh_poll 
> /home/alxndr/Development/qemu/util/async.c:164:13
> #8 0x556783a9a863 in aio_dispatch 
> /home/alxndr/Development/qemu/util/aio-posix.c:380:5
> #9 0x556783ac1b4c in aio_ctx_dispatch 
> /home/alxndr/Development/qemu/util/async.c:306:5
> #10 0x7f4f1c0fe9ed in g_main_context_dispatch 
> (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4e9ed)
> #11 0x556783acdccb in glib_pollfds_poll 
> /home/alxndr/Development/qemu/util/main-loop.c:219:9
> #12 0x556783acdccb in os_host_main_loop_wait 
> /home/alxndr/Development/qemu/util/main-loop.c:242:5
> #13 0x556783acdccb in main_loop_wait 
> /home/alxndr/Development/qemu/util/main-loop.c:518:11
> #14 0x5567833613e5 in qemu_main_loop 
> /home/alxndr/Development/qemu/softmmu/vl.c:1664:9
> #15 0x556783a07a4d in main 
> /home/alxndr/Development/qemu/softmmu/main.c:49:5
> #16 0x7f4f1ac84e0a in __libc_start_main 
> /build/glibc-GwnBeO/glibc-2.30/csu/../csu/libc-start.c:308:16
> #17 0x5567830a9089 in _start 
> (/home/alxndr/Development/qemu/build/i386-softmmu/qemu-system-i386+0x7d2089)
> 
> With -trace ide*
> 
> 12163@1594585516.671265:ide_reset IDEstate 0x56162a269058
> [R +0.024963] outw 0x176 0x3538
> 12163@1594585516.673676:ide_ioport_write IDE PIO wr @ 0x176 (Device/Head); 
> val 0x38; bus 0x56162a268c00 IDEState 0x56162a268c88
> 12163@1594585516.673683:ide_ioport_write IDE PIO wr @ 0x177 (Command); val 
> 0x35; bus 0x56162a268c00 IDEState 0x56162a269058
> 12163@1594585516.673686:ide_exec_cmd IDE exec cmd: bus 0x56162a268c00; state 
> 0x56162a269058; cmd 0x35
> OK
> [S +0.025002] OK
> [R +0.025012] outl 0xcf8 0x8903
> OK
> [S +0.025018] OK
> [R +0.025026] outl 0xcfc 0x184275c
> OK
> [S +0.025210] OK
> [R +0.025219] outb 0x376 0x2f
> 12163@1594585516.673916:ide_cmd_write IDE PIO wr @ 0x376 (Device Control); 
> val 0x2f; bus 0x56162a268c00
> OK
> [S +0.025229] OK
> [R +0.025234] outb 0x376 0x0
> 12163@1594585516.673928:ide_cmd_write IDE PIO wr @ 0x376 (Device Control); 
> val 0x00; bus 0x56162a268c00
> OK
> [S +0.025240] OK
> [R +0.025246] outw 0x176 0xa1a4
> 12163@1594585516.673940:ide_ioport_write IDE PIO wr @ 0x176 (Device/Head); 
> val 0xa4; bus 0x56162a268c00 IDEState 0x56162a269058
> 12163@1594585516.673943:ide_ioport_write IDE PIO wr @ 0x177 (Command); val 
> 0xa1; bus 0x56162a268c00 IDEState 0x56162a268c88
> 12163@1594585516.673946:ide_exec_cmd IDE exec cmd: bus 0x56162a268c00; state 
> 0x56162a268c88; cmd 0xa1
> OK
> [S +0.025265] OK
> [R +0.025270] outl 0xcf8 0x8920
> OK
> [S +0.025274] OK
> [R +0.025279] outb 0xcfc 0xff
> OK
> [S +0.025443] OK
> [R +0.025451] outb 0xf8 0x9
> 12163@1594585516.674221:ide_dma_cb IDEState 0x56162a268c88; sector_num=0 n=1 
> cmd=DMA READ
> OK
> [S +0.025724] OK
> UndefinedBehaviorSanitizer:DEADLYSIGNAL
> ==12163==ERROR: UndefinedBehaviorSanitizer: FPE on unknown address 
> 0x5616279cffdc (pc 0x5616279cffdc bp 0x7ffcdaabae90 sp 0x7ffcdaabae30 T12163)
> 
> -Alex
> 
> ** Affects: qemu
>  Importance: Undecided
>  Status: New
> 
> -- 
> You received this bug notification because you are a member of qemu-
> devel-ml, which is subscribed to QEMU.
> https://bugs.launchpad.net/bugs/1887309
> 
> Title:
>   Floating-point exception in ide_set_sector
> 
> Status in QEMU:
>   New
> 
> Bug description:
>   Hello,
>   Here is a reproducer:
>   cat << EOF | ./i386-softmmu/qemu-system-i386 -M pc,accel=qtest\
>-qtest null -nographic -vga qxl -qtest stdio -nodefaults\
>-drive 

[Bug 1887309] [NEW] Floating-point exception in ide_set_sector

2020-07-12 Thread Alexander Bulekov
Public bug reported:

Hello,
Here is a reproducer:
cat << EOF | ./i386-softmmu/qemu-system-i386 -M pc,accel=qtest\
 -qtest null -nographic -vga qxl -qtest stdio -nodefaults\
 -drive if=none,id=drive0,file=null-co://,file.read-zeroes=on,format=raw\
 -drive if=none,id=drive1,file=null-co://,file.read-zeroes=on,format=raw\
 -device ide-cd,drive=drive0 -device ide-hd,drive=drive1
outw 0x176 0x3538
outl 0xcf8 0x8903
outl 0xcfc 0x184275c
outb 0x376 0x2f
outb 0x376 0x0
outw 0x176 0xa1a4
outl 0xcf8 0x8920
outb 0xcfc 0xff
outb 0xf8 0x9
EOF

The stack-trace:
==16513==ERROR: UndefinedBehaviorSanitizer: FPE on unknown address 
0x556783603fdc (pc 0x556783603fdc bp 0x7fff82143b10 sp 0x7fff82143ab0 T16513)
#0 0x556783603fdc in ide_set_sector 
/home/alxndr/Development/qemu/hw/ide/core.c:626:26
#1 0x556783603fdc in ide_dma_cb 
/home/alxndr/Development/qemu/hw/ide/core.c:883:9
#2 0x55678349d74d in dma_complete 
/home/alxndr/Development/qemu/dma-helpers.c:120:9
#3 0x55678349d74d in dma_blk_cb 
/home/alxndr/Development/qemu/dma-helpers.c:138:9
#4 0x556783962f25 in blk_aio_complete 
/home/alxndr/Development/qemu/block/block-backend.c:1402:9
#5 0x556783962f25 in blk_aio_complete_bh 
/home/alxndr/Development/qemu/block/block-backend.c:1412:5
#6 0x556783ac030f in aio_bh_call 
/home/alxndr/Development/qemu/util/async.c:136:5
#7 0x556783ac030f in aio_bh_poll 
/home/alxndr/Development/qemu/util/async.c:164:13
#8 0x556783a9a863 in aio_dispatch 
/home/alxndr/Development/qemu/util/aio-posix.c:380:5
#9 0x556783ac1b4c in aio_ctx_dispatch 
/home/alxndr/Development/qemu/util/async.c:306:5
#10 0x7f4f1c0fe9ed in g_main_context_dispatch 
(/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4e9ed)
#11 0x556783acdccb in glib_pollfds_poll 
/home/alxndr/Development/qemu/util/main-loop.c:219:9
#12 0x556783acdccb in os_host_main_loop_wait 
/home/alxndr/Development/qemu/util/main-loop.c:242:5
#13 0x556783acdccb in main_loop_wait 
/home/alxndr/Development/qemu/util/main-loop.c:518:11
#14 0x5567833613e5 in qemu_main_loop 
/home/alxndr/Development/qemu/softmmu/vl.c:1664:9
#15 0x556783a07a4d in main /home/alxndr/Development/qemu/softmmu/main.c:49:5
#16 0x7f4f1ac84e0a in __libc_start_main 
/build/glibc-GwnBeO/glibc-2.30/csu/../csu/libc-start.c:308:16
#17 0x5567830a9089 in _start 
(/home/alxndr/Development/qemu/build/i386-softmmu/qemu-system-i386+0x7d2089)

With -trace ide*

12163@1594585516.671265:ide_reset IDEstate 0x56162a269058
[R +0.024963] outw 0x176 0x3538
12163@1594585516.673676:ide_ioport_write IDE PIO wr @ 0x176 (Device/Head); val 
0x38; bus 0x56162a268c00 IDEState 0x56162a268c88
12163@1594585516.673683:ide_ioport_write IDE PIO wr @ 0x177 (Command); val 
0x35; bus 0x56162a268c00 IDEState 0x56162a269058
12163@1594585516.673686:ide_exec_cmd IDE exec cmd: bus 0x56162a268c00; state 
0x56162a269058; cmd 0x35
OK
[S +0.025002] OK
[R +0.025012] outl 0xcf8 0x8903
OK
[S +0.025018] OK
[R +0.025026] outl 0xcfc 0x184275c
OK
[S +0.025210] OK
[R +0.025219] outb 0x376 0x2f
12163@1594585516.673916:ide_cmd_write IDE PIO wr @ 0x376 (Device Control); val 
0x2f; bus 0x56162a268c00
OK
[S +0.025229] OK
[R +0.025234] outb 0x376 0x0
12163@1594585516.673928:ide_cmd_write IDE PIO wr @ 0x376 (Device Control); val 
0x00; bus 0x56162a268c00
OK
[S +0.025240] OK
[R +0.025246] outw 0x176 0xa1a4
12163@1594585516.673940:ide_ioport_write IDE PIO wr @ 0x176 (Device/Head); val 
0xa4; bus 0x56162a268c00 IDEState 0x56162a269058
12163@1594585516.673943:ide_ioport_write IDE PIO wr @ 0x177 (Command); val 
0xa1; bus 0x56162a268c00 IDEState 0x56162a268c88
12163@1594585516.673946:ide_exec_cmd IDE exec cmd: bus 0x56162a268c00; state 
0x56162a268c88; cmd 0xa1
OK
[S +0.025265] OK
[R +0.025270] outl 0xcf8 0x8920
OK
[S +0.025274] OK
[R +0.025279] outb 0xcfc 0xff
OK
[S +0.025443] OK
[R +0.025451] outb 0xf8 0x9
12163@1594585516.674221:ide_dma_cb IDEState 0x56162a268c88; sector_num=0 n=1 
cmd=DMA READ
OK
[S +0.025724] OK
UndefinedBehaviorSanitizer:DEADLYSIGNAL
==12163==ERROR: UndefinedBehaviorSanitizer: FPE on unknown address 
0x5616279cffdc (pc 0x5616279cffdc bp 0x7ffcdaabae90 sp 0x7ffcdaabae30 T12163)

-Alex
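
For readers tracing the crash: ide_set_sector() converts a linear sector
number back into the drive's addressing mode, and in CHS mode that
conversion divides by heads * sectors. The rough sketch below (an
illustration, not the exact QEMU code) shows how a device whose geometry
registers are still zero, which seems plausible for the drive targeted by
this reproducer, turns that division into the SIGFPE reported above.

#include <stdint.h>

/* CHS translation sketch: divides by heads * sectors, so a zero geometry
 * (heads == 0 or sectors == 0) raises SIGFPE on the first division. */
static void set_sector_chs_sketch(unsigned heads, unsigned sectors,
                                  int64_t sector_num)
{
    unsigned cyl  = sector_num / (heads * sectors);
    unsigned rem  = sector_num % (heads * sectors);
    unsigned head = rem / sectors;
    unsigned sect = (rem % sectors) + 1;

    (void)cyl; (void)head; (void)sect;  /* would be written back to the
                                           cylinder/head/sector registers */
}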

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1887309

Title:
  Floating-point exception in ide_set_sector

Status in QEMU:
  New

Bug description:
  Hello,
  Here is a reproducer:
  cat << EOF | ./i386-softmmu/qemu-system-i386 -M pc,accel=qtest\
   -qtest null -nographic -vga qxl -qtest stdio -nodefaults\
   -drive if=none,id=drive0,file=null-co://,file.read-zeroes=on,format=raw\
   -drive if=none,id=drive1,file=null-co://,file.read-zeroes=on,format=raw\
   -device ide-cd,drive=drive0 -device ide-hd,drive=drive1
  outw 0x176 0x3538
  outl 0xcf8 0x8903
  outl 0xcfc 0x184275c
  outb 0x376 0x2f
  outb 0x376 0x0
  outw 0x176 0xa1a4
  outl 0xcf8 

[Bug 1887306] [NEW] qemu-user deadlocks when forked in a multithreaded process

2020-07-12 Thread Alexey Izbyshev
Public bug reported:

The following program (also attached) deadlocks when run under QEMU user
on Linux.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

#define NUM_THREADS 100
#define NUM_FORKS 10

pthread_barrier_t barrier;

void *t(void *arg) {
for (int i = 0; i < NUM_FORKS; i++) {
pid_t pid = fork();
if (pid < 0)
abort();
if (!pid)
_exit(0);
if (waitpid(pid, NULL, 0) < 0)
abort();
}
//pthread_barrier_wait(&barrier);
return NULL;
}

int main(void) {
pthread_barrier_init(&barrier, NULL, NUM_THREADS);
pthread_t ts[NUM_THREADS];
for (size_t i = 0; i < NUM_THREADS; i++) {
if (pthread_create(&ts[i], NULL, t, NULL))
abort();
}
for (size_t i = 0; i < NUM_THREADS; i++) {
pthread_join(ts[i], NULL);
}
printf("Done: %d\n", getpid());
return 0;
}

To reproduce:
$ gcc test.c -pthread
$ while qemu-x86_64 ./a.out; do :; done

(Be careful, Ctrl-C/SIGINT doesn't kill the deadlocked child).

Larger values of NUM_THREADS/NUM_FORKS lead to more often deadlocks.
With the values above it often deadlocks on the first try on my machine.
When it deadlocks, there is a child qemu process with two threads which
is waited upon by one of the worker threads of the parent.

I tried to avoid the deadlock by serializing fork() with a mutex, but it
didn't help. However, ensuring that no thread exits until all forks are
done (by adding a barrier to t()) does seem to help, at least, the
program above could run for a half an hour until I terminated it.

Tested on QEMU 5.0.0, 4.2.0 and 2.11.1, with x86_64 and AArch64 linux-
user targets.
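
A general note on why this class of program can hang; whether this is the
exact mechanism inside qemu-user here is an assumption, but it is the
classic POSIX hazard: after fork() the child contains only the calling
thread, so any lock that another thread happened to hold at fork time is
copied into the child in its locked state and nothing in the child can
ever release it. The standalone sketch below reproduces that hazard in
isolation; the child blocks forever on a mutex whose owner was not
duplicated into it:

#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *holder(void *arg)
{
    pthread_mutex_lock(&lock);
    sleep(1);                        /* hold the lock across the fork below */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t;

    if (pthread_create(&t, NULL, holder, NULL))
        abort();
    usleep(100 * 1000);              /* let the helper thread take the lock */
    if (fork() == 0) {
        pthread_mutex_lock(&lock);   /* child: lock owner does not exist here */
        _exit(0);
    }
    pthread_join(t, NULL);
    return 0;
}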

** Affects: qemu
 Importance: Undecided
 Status: New

** Attachment added: "test.c"
   https://bugs.launchpad.net/bugs/1887306/+attachment/5392123/+files/test.c

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1887306

Title:
  qemu-user deadlocks when forked in a multithreaded process

Status in QEMU:
  New

Bug description:
  The following program (also attached) deadlocks when run under QEMU
  user on Linux.

  #include <pthread.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/types.h>
  #include <sys/wait.h>

  #define NUM_THREADS 100
  #define NUM_FORKS 10

  pthread_barrier_t barrier;

  void *t(void *arg) {
  for (int i = 0; i < NUM_FORKS; i++) {
  pid_t pid = fork();
  if (pid < 0)
  abort();
  if (!pid)
  _exit(0);
  if (waitpid(pid, NULL, 0) < 0)
  abort();
  }
  //pthread_barrier_wait(&barrier);
  return NULL;
  }

  int main(void) {
  pthread_barrier_init(&barrier, NULL, NUM_THREADS);
  pthread_t ts[NUM_THREADS];
  for (size_t i = 0; i < NUM_THREADS; i++) {
  if (pthread_create(&ts[i], NULL, t, NULL))
  abort();
  }
  for (size_t i = 0; i < NUM_THREADS; i++) {
  pthread_join(ts[i], NULL);
  }
  printf("Done: %d\n", getpid());
  return 0;
  }

  To reproduce:
  $ gcc test.c -pthread
  $ while qemu-x86_64 ./a.out; do :; done

  (Be careful, Ctrl-C/SIGINT doesn't kill the deadlocked child).

  Larger values of NUM_THREADS/NUM_FORKS lead to more often deadlocks.
  With the values above it often deadlocks on the first try on my
  machine. When it deadlocks, there is a child qemu process with two
  threads which is waited upon by one of the worker threads of the
  parent.

  I tried to avoid the deadlock by serializing fork() with a mutex, but
  it didn't help. However, ensuring that no thread exits until all forks
  are done (by adding a barrier to t()) does seem to help, at least, the
  program above could run for a half an hour until I terminated it.

  Tested on QEMU 5.0.0, 4.2.0 and 2.11.1, with x86_64 and AArch64 linux-
  user targets.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1887306/+subscriptions



[Bug 1887303] [NEW] Assertion failure in *bmdma_active_if `bmdma->bus->retry_unit != (uint8_t)-1' failed.

2020-07-12 Thread Alexander Bulekov
Public bug reported:

Hello,
Here is a QTest Reproducer:

cat << EOF | ./i386-softmmu/qemu-system-i386 -M pc,accel=qtest\
 -qtest null -nographic -vga qxl -qtest stdio -nodefaults\
 -drive if=none,id=drive0,file=null-co://,file.read-zeroes=on,format=raw\
 -drive if=none,id=drive1,file=null-co://,file.read-zeroes=on,format=raw\
 -device ide-cd,drive=drive0 -device ide-hd,drive=drive1
outw 0x176 0x3538
outw 0x376 0x6007
outw 0x376 0x6b6b
outw 0x176 0x985c
outl 0xcf8 0x8903
outl 0xcfc 0x2f2931
outl 0xcf8 0x8920
outb 0xcfc 0x6b
outb 0x68 0x7
outw 0x176 0x2530
EOF

Here is the call-stack:

#8 0x7f00e0443091 in __assert_fail 
/build/glibc-GwnBeO/glibc-2.30/assert/assert.c:101:3
#9 0x55e163f5a1af in bmdma_active_if 
/home/alxndr/Development/qemu/include/hw/ide/pci.h:59:5
#10 0x55e163f5a1af in bmdma_prepare_buf 
/home/alxndr/Development/qemu/hw/ide/pci.c:132:19
#11 0x55e163f4f00d in ide_dma_cb 
/home/alxndr/Development/qemu/hw/ide/core.c:898:17
#12 0x55e163de86ad in dma_complete 
/home/alxndr/Development/qemu/dma-helpers.c:120:9
#13 0x55e163de86ad in dma_blk_cb 
/home/alxndr/Development/qemu/dma-helpers.c:138:9
#14 0x55e1642ade85 in blk_aio_complete 
/home/alxndr/Development/qemu/block/block-backend.c:1402:9
#15 0x55e1642ade85 in blk_aio_complete_bh 
/home/alxndr/Development/qemu/block/block-backend.c:1412:5
#16 0x55e16443556f in aio_bh_call 
/home/alxndr/Development/qemu/util/async.c:136:5
#17 0x55e16443556f in aio_bh_poll 
/home/alxndr/Development/qemu/util/async.c:164:13
#18 0x55e16440fac3 in aio_dispatch 
/home/alxndr/Development/qemu/util/aio-posix.c:380:5
#19 0x55e164436dac in aio_ctx_dispatch 
/home/alxndr/Development/qemu/util/async.c:306:5
#20 0x7f00e16e29ed in g_main_context_dispatch 
(/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4e9ed)
#21 0x55e164442f2b in glib_pollfds_poll 
/home/alxndr/Development/qemu/util/main-loop.c:219:9
#22 0x55e164442f2b in os_host_main_loop_wait 
/home/alxndr/Development/qemu/util/main-loop.c:242:5
#23 0x55e164442f2b in main_loop_wait 
/home/alxndr/Development/qemu/util/main-loop.c:518:11
#24 0x55e164376953 in flush_events 
/home/alxndr/Development/qemu/tests/qtest/fuzz/fuzz.c:47:9
#25 0x55e16437b8fa in general_fuzz 
/home/alxndr/Development/qemu/tests/qtest/fuzz/general_fuzz.c:544:17

=

Here is the same assertion failure but triggered through a different
call-stack:

cat << EOF | ./i386-softmmu/qemu-system-i386 -M pc,accel=qtest\
 -qtest null -nographic -vga qxl -qtest stdio -nodefaults\
 -drive if=none,id=drive0,file=null-co://,file.read-zeroes=on,format=raw\
 -drive if=none,id=drive1,file=null-co://,file.read-zeroes=on,format=raw\
 -device ide-cd,drive=drive0 -device ide-hd,drive=drive1
outw 0x171 0x2fe9
outb 0x177 0xa0
outl 0x170 0x928
outl 0x170 0x2b923b31
outl 0x170 0x800a24d7
outl 0xcf8 0x8903
outl 0xcfc 0x842700
outl 0xcf8 0x8920
outb 0xcfc 0x5e
outb 0x58 0x7
outb 0x376 0x5
outw 0x376 0x11
outw 0x176 0x3538
EOF

Call-stack:
#8 0x7f00e0443091 in __assert_fail 
/build/glibc-GwnBeO/glibc-2.30/assert/assert.c:101:3
#9 0x55e163f5a622 in bmdma_active_if 
/home/alxndr/Development/qemu/include/hw/ide/pci.h:59:5
#10 0x55e163f5a622 in bmdma_rw_buf 
/home/alxndr/Development/qemu/hw/ide/pci.c:187:19
#11 0x55e163f57577 in ide_atapi_cmd_read_dma_cb 
/home/alxndr/Development/qemu/hw/ide/atapi.c:375:13
#12 0x55e163f44c55 in ide_buffered_readv_cb 
/home/alxndr/Development/qemu/hw/ide/core.c:650:9
#13 0x55e1642ade85 in blk_aio_complete 
/home/alxndr/Development/qemu/block/block-backend.c:1402:9
#14 0x55e1642ade85 in blk_aio_complete_bh 
/home/alxndr/Development/qemu/block/block-backend.c:1412:5
#15 0x55e16443556f in aio_bh_call 
/home/alxndr/Development/qemu/util/async.c:136:5
#16 0x55e16443556f in aio_bh_poll 
/home/alxndr/Development/qemu/util/async.c:164:13
#17 0x55e16440fac3 in aio_dispatch 
/home/alxndr/Development/qemu/util/aio-posix.c:380:5
#18 0x55e164436dac in aio_ctx_dispatch 
/home/alxndr/Development/qemu/util/async.c:306:5
#19 0x7f00e16e29ed in g_main_context_dispatch 
(/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4e9ed)
#20 0x55e164442f2b in glib_pollfds_poll 
/home/alxndr/Development/qemu/util/main-loop.c:219:9
#21 0x55e164442f2b in os_host_main_loop_wait 
/home/alxndr/Development/qemu/util/main-loop.c:242:5
#22 0x55e164442f2b in main_loop_wait 
/home/alxndr/Development/qemu/util/main-loop.c:518:11
#23 0x55e164376953 in flush_events 
/home/alxndr/Development/qemu/tests/qtest/fuzz/fuzz.c:47:9
#24 0x55e16437b8fa in general_fuzz 
/home/alxndr/Development/qemu/tests/qtest/fuzz/general_fuzz.c:544:17

=

The first reproducer with -trace ide*:
[I 1594579788.601818] OPENED
26995@1594579788.617583:ide_reset IDEstate 0x565534a7b898
26995@1594579788.617684:ide_reset IDEstate 0x565534a7bc68
26995@1594579788.618019:ide_reset IDEstate 0x565534a7c188

Re: [PATCH] tests: improve performance of device-introspect-test

2020-07-12 Thread Thomas Huth
On 10/07/2020 22.03, Markus Armbruster wrote:
[...]
>   With -m slow, we test 2 * #machines * #devices introspections,
>   i.e. from 132 (tricore) to over 10k (ppc 13046, ppc64 23426, arm
>   82708, aarch64 89760).  Median is ~1600, sum is ~260k.
> 
>   Except we actually test just 89k now, because the test *fails* for arm
>   and aarch64 after some 500 introspections: introspecting device
>   msf2-soc with machine ast2600-evb makes QEMU terminate unsuccessfully
>   with "Unsupported NIC model: ftgmac100".  Cause: m2sxxx_soc_initfn()
>   calls qemu_check_nic_model().  Goes back to commit 05b7374a58 "msf2:
>   Add EMAC block to SmartFusion2 SoC", merged some ten weeks ago.  This
>   is exactly the kind of mistake the test is designed to catch.  How
>   come it wasn't?  Possibly due to unlucky combination with the slowdown
>   discussed in the next item (but that was less than four weeks ago).

Well, the explanation is simpler: Nobody ran the
device-introspection-test with the arm target in slow mode! Hardly
anybody runs the tests with SPEED=slow, and in the CI, we currently only
run the test in slow mode for the i386-softmmu, ppc64-softmmu and
mips64-softmmu targets. Simply because testing with arm was too slow
when I wrote that CI script, I didn't want to wait endlessly here.

But now with the speedup patch from Daniel, and maybe with some smarter
checks in the YML file (I now know that there are things like "only:
changes:" keywords so we could e.g. only run that test if something in
hw/arm/* has changed), I think it should be feasible to get this
included in the CI, too.

 Thomas




Re: [PATCH 1/2] tests/acceptance/boot_linux: Truncate SD card image to power of 2

2020-07-12 Thread Niek Linnenbank
On Tue, Jul 7, 2020 at 3:21 PM Philippe Mathieu-Daudé 
wrote:

> In the next commit we won't allow SD card images with invalid
> size (not aligned to a power of 2). Prepare the tests: add the
> pow2ceil() and image_pow2ceil_truncate() methods and truncate
> the images of the tests using SD cards.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  tests/acceptance/boot_linux_console.py | 15 +++
>  1 file changed, 15 insertions(+)
>
> diff --git a/tests/acceptance/boot_linux_console.py
> b/tests/acceptance/boot_linux_console.py
> index 3d02519660..f4d4e3635f 100644
> --- a/tests/acceptance/boot_linux_console.py
> +++ b/tests/acceptance/boot_linux_console.py
> @@ -28,6 +28,18 @@
>  except CmdNotFoundError:
>  P7ZIP_AVAILABLE = False
>
> +# round up to next power of 2
> +def pow2ceil(x):
> +return 1 if x == 0 else 2**(x - 1).bit_length()
> +
> +# truncate file size to next power of 2
> +def image_pow2ceil_truncate(path):
> +size = os.path.getsize(path)
> +size_aligned = pow2ceil(size)
> +if size != size_aligned:
> +with open(path, 'ab+') as fd:
> +fd.truncate(size_aligned)
> +
>  class LinuxKernelTest(Test):
>  KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
>
> @@ -635,6 +647,7 @@ def test_arm_orangepi_sd(self):
>  rootfs_path_xz = self.fetch_asset(rootfs_url,
> asset_hash=rootfs_hash)
>  rootfs_path = os.path.join(self.workdir, 'rootfs.cpio')
>  archive.lzma_uncompress(rootfs_path_xz, rootfs_path)
> +image_pow2ceil_truncate(rootfs_path)
>
>  self.vm.set_console()
>  kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
> @@ -679,6 +692,7 @@ def test_arm_orangepi_bionic(self):
>  image_name = 'Armbian_19.11.3_Orangepipc_bionic_current_5.3.9.img'
>  image_path = os.path.join(self.workdir, image_name)
>  process.run("7z e -o%s %s" % (self.workdir, image_path_7z))
> +image_pow2ceil_truncate(image_path)
>
>  self.vm.set_console()
>  self.vm.add_args('-drive', 'file=' + image_path +
> ',if=sd,format=raw',
> @@ -728,6 +742,7 @@ def test_arm_orangepi_uboot_netbsd9(self):
>  image_hash = '2babb29d36d8360adcb39c09e31060945259917a'
>  image_path_gz = self.fetch_asset(image_url, asset_hash=image_hash)
>  image_path = os.path.join(self.workdir, 'armv7.img')
> +image_pow2ceil_truncate(image_path)
>  image_drive_args = 'if=sd,format=raw,snapshot=on,file=' +
> image_path
>  archive.gzip_uncompress(image_path_gz, image_path)
>
> --
> 2.21.3
>
>
Hi Philippe,

This patch works OK for the Linux part, but the NetBSD part didn't work;
it prints this error:

   (5/5)
tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_uboot_netbsd9:
ERROR: [Errno 2] No such file or directory:
'/var/tmp/avocado_6hoo815w/avocado_job_40aayif8/

5-tests_acceptance_boot_linux_console.py_BootLinuxConsole.test_arm_orangepi_uboot_netbsd9/armv7.img'
(0.18 s)

Basically the truncate should just be moved after the uncompress to fix it.
And the lines that we used before to extend the image size can be removed
now; they were only needed to avoid a conflict with the partition size
inside the image.

So with these small changes, I got it working fine:

diff --git a/tests/acceptance/boot_linux_console.py
b/tests/acceptance/boot_linux_console.py
index f4d4e3635f..69607a5840 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -684,7 +684,7 @@ class BootLinuxConsole(LinuxKernelTest):
 :avocado: tags=machine:orangepi-pc
 """

-# This test download a 196MB compressed image and expand it to
932MB...
+# This test download a 196MB compressed image and expand it to 1G
 image_url = ('https://dl.armbian.com/orangepipc/archive/'
  'Armbian_19.11.3_Orangepipc_bionic_current_5.3.9.7z')
 image_hash = '196a8ffb72b0123d92cea4a070894813d305c71e'
@@ -725,7 +725,7 @@ class BootLinuxConsole(LinuxKernelTest):
 :avocado: tags=arch:arm
 :avocado: tags=machine:orangepi-pc
 """
-# This test download a 304MB compressed image and expand it to
1.3GB...
+# This test download a 304MB compressed image and expand it to
2GB...
 deb_url = ('http://snapshot.debian.org/archive/debian/'
'20200108T145233Z/pool/main/u/u-boot/'
'u-boot-sunxi_2020.01%2Bdfsg-1_armhf.deb')
@@ -742,9 +742,9 @@ class BootLinuxConsole(LinuxKernelTest):
 image_hash = '2babb29d36d8360adcb39c09e31060945259917a'
 image_path_gz = self.fetch_asset(image_url, asset_hash=image_hash)
 image_path = os.path.join(self.workdir, 'armv7.img')
-image_pow2ceil_truncate(image_path)
 image_drive_args = 'if=sd,format=raw,snapshot=on,file=' +
image_path
 archive.gzip_uncompress(image_path_gz, image_path)
+image_pow2ceil_truncate(image_path)

 # dd 

[PATCH] docs/system/arm/orangepi: add instructions for resizing SD image to power of two

2020-07-12 Thread Niek Linnenbank
SD cards need to have a size of a power of two. This commit updates
the Orange Pi machine documentation to include instructions for
resizing downloaded images using the qemu-img command.

Signed-off-by: Niek Linnenbank 
---
 docs/system/arm/orangepi.rst | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/docs/system/arm/orangepi.rst b/docs/system/arm/orangepi.rst
index c41adad488..6f23907fb6 100644
--- a/docs/system/arm/orangepi.rst
+++ b/docs/system/arm/orangepi.rst
@@ -127,6 +127,16 @@ can be downloaded from:
 Alternatively, you can also choose to build you own image with buildroot
 using the orangepi_pc_defconfig. Also see https://buildroot.org for more 
information.
 
+When using an image as an SD card, it must be resized to a power of two. This 
can be
+done with the qemu-img command. It is recommended to only increase the image 
size
+instead of shrinking it to a power of two, to avoid loss of data. For example,
+to prepare a downloaded Armbian image, first extract it and then increase
+its size to one gigabyte as follows:
+
+.. code-block:: bash
+
+  $ qemu-img resize Armbian_19.11.3_Orangepipc_bionic_current_5.3.9.img 1G
+
 You can choose to attach the selected image either as an SD card or as USB 
mass storage.
 For example, to boot using the Orange Pi PC Debian image on SD card, simply 
add the -sd
 argument and provide the proper root= kernel parameter:
@@ -213,12 +223,12 @@ Next, unzip the NetBSD image and write the U-Boot binary 
including SPL using:
   $ dd if=/path/to/u-boot-sunxi-with-spl.bin of=armv7.img bs=1024 seek=8 
conv=notrunc
 
 Finally, before starting the machine the SD image must be extended such
-that the NetBSD kernel will not conclude the NetBSD partition is larger than
-the emulated SD card:
+that the size of the SD image is a power of two and that the NetBSD kernel
+will not conclude the NetBSD partition is larger than the emulated SD card:
 
 .. code-block:: bash
 
-  $ dd if=/dev/zero bs=1M count=64 >> armv7.img
+  $ qemu-img resize armv7.img 2G
 
 Start the machine using the following command:
 
-- 
2.25.1




RE: Seeing a problem in multi cpu runs where memory mapped pcie device register reads are returning incorrect values

2020-07-12 Thread Mark Wood-Patrick


From: Mark Wood-Patrick 
Sent: Wednesday, July 1, 2020 11:26 AM
To: qemu-devel@nongnu.org
Cc: Mark Wood-Patrick 
Subject: Seeing a problem in multi cpu runs where memory mapped pcie device 
register reads are returning incorrect values

Background
I have a test environment which runs QEMU 4.2 with a plugin that runs two 
copies of a PCIE device simulator on a CentOS 7.5 host with an Ubuntu 18.04 
guest. When running with a single QEMU CPU using:

 -cpu kvm64,+lahf_lm -M q35,kernel-irqchip=off -device 
intel-iommu,intremap=on

Our tests run fine. But when running with multiple cpu's:

-cpu kvm64,+lahf_lm -M q35,kernel-irqchip=off -device 
intel-iommu,intremap=on -smp 2,sockets=1,cores=2

The values returned are correct all the way up the call stack and in
KVM_EXIT_MMIO in kvm_cpu_exec (qemu-4.2.0/accel/kvm/kvm-all.c:2365), but the
value returned to the device driver which initiated the read is 0.

Question
Is anyone else running QEMU 4.2 in multi-CPU mode? Is anyone getting incorrect
reads from memory-mapped device registers when running in this mode? I would
appreciate any pointers on how best to debug the flow from KVM_EXIT_MMIO back
to the device driver running on the guest.



[PATCH v4 1/3] scripts/simplebench: compare write request performance

2020-07-12 Thread Andrey Shinkevich
The script 'bench_write_req.py' allows comparing the write request
performance of two qemu-img binaries.
An example with (qemu-img binary 1) and without (qemu-img binary 2) the
applied patch "qcow2: skip writing zero buffers to empty COW areas"
(git commit ID: c8bb23cbdbe32f5) has the following results:

SSD:
-----------------  -----------------
qemu-img binary 1  qemu-img binary 2
    0.34 +- 0.01      10.57 +- 0.96
    0.33 +- 0.01       9.15 +- 0.85
    0.33 +- 0.00       8.72 +- 0.05
    7.43 +- 1.19      14.35 +- 1.00
-----------------  -----------------
HDD:
-----------------  -----------------
qemu-img binary 1  qemu-img binary 2
   32.61 +- 1.17      55.11 +- 1.15
   54.28 +- 8.82      60.11 +- 2.76
   57.93 +- 0.47      58.53 +- 0.51
   11.47 +- 0.94      17.29 +- 4.40
-----------------  -----------------

Suggested-by: Denis V. Lunev 
Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Andrey Shinkevich 
---
 scripts/simplebench/bench_write_req.py | 173 +
 1 file changed, 173 insertions(+)
 create mode 100755 scripts/simplebench/bench_write_req.py

diff --git a/scripts/simplebench/bench_write_req.py 
b/scripts/simplebench/bench_write_req.py
new file mode 100755
index 000..a285ef1
--- /dev/null
+++ b/scripts/simplebench/bench_write_req.py
@@ -0,0 +1,173 @@
+#!/usr/bin/env python3
+#
+# Test to compare performance of write requests for two qemu-img binary files.
+#
+# Copyright (c) 2020 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+
+import sys
+import os
+import subprocess
+import simplebench
+
+
+def bench_func(env, case):
+""" Handle one "cell" of benchmarking table. """
+return bench_write_req(env['qemu_img'], env['image_name'],
+   case['block_size'], case['block_offset'],
+   case['requests'])
+
+
+def qemu_img_pipe(*args):
+'''Run qemu-img and return its output'''
+subp = subprocess.Popen(list(args),
+stdout=subprocess.PIPE,
+stderr=subprocess.STDOUT,
+universal_newlines=True)
+exitcode = subp.wait()
+if exitcode < 0:
+sys.stderr.write('qemu-img received signal %i: %s\n'
+ % (-exitcode, ' '.join(list(args))))
+return subp.communicate()[0]
+
+
+def bench_write_req(qemu_img, image_name, block_size, block_offset, requests):
+"""Benchmark write requests
+
+The function creates a QCOW2 image with the given path/name and fills it
+with random data optionally. Then it runs the 'qemu-img bench' command and
+makes series of write requests on the image clusters. Finally, it returns
+the total time of the write operations on the disk.
+
+qemu_img -- path to qemu_img executable file
+image_name   -- QCOW2 image name to create
+block_size   -- size of a block to write to clusters
+block_offset -- offset of the block in clusters
+requests -- number of write requests per cluster
+
+Returns {'seconds': int} on success and {'error': str} on failure.
+Return value is compatible with simplebench lib.
+"""
+
+if not os.path.isfile(qemu_img):
+print(f'File not found: {qemu_img}')
+sys.exit(1)
+
+image_dir = os.path.dirname(os.path.abspath(image_name))
+if not os.path.isdir(image_dir):
+print(f'Path not found: {image_name}')
+sys.exit(1)
+
+cluster_size = 1024 * 1024
+image_size = 1024 * cluster_size
+seek = 4
+dd_count = int(image_size / cluster_size) - seek
+
+args_create = [qemu_img, 'create', '-f', 'qcow2', '-o',
+   f'cluster_size={cluster_size}',
+   image_name, str(image_size)]
+
+count = requests * int(image_size / cluster_size)
+step = str(cluster_size)
+offset = str(block_offset)
+cnt = str(count)
+size = []
+if block_size:
+size = ['-s', f'{block_size}']
+
+args_bench = [qemu_img, 'bench', '-w', '-n', '-t', 'none', '-c', cnt,
+  '-S', step, '-o', offset, '-f', 'qcow2', image_name]
+if block_size:
+args_bench.extend(size)
+
+try:
+qemu_img_pipe(*args_create)
+except OSError as e:
+os.remove(image_name)
+return {'error': 'qemu_img create failed: ' + str(e)}

[PATCH v4 2/3] scripts/simplebench: allow writing to non-empty image

2020-07-12 Thread Andrey Shinkevich
Add an 'empty_image' parameter to the function bench_write_req() and to
the test cases, which will allow writing to the non-empty clusters of the
image if the 'empty_image' parameter is set to False.

Signed-off-by: Andrey Shinkevich 
---
 scripts/simplebench/bench_write_req.py | 27 +--
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/scripts/simplebench/bench_write_req.py 
b/scripts/simplebench/bench_write_req.py
index a285ef1..f758f90 100755
--- a/scripts/simplebench/bench_write_req.py
+++ b/scripts/simplebench/bench_write_req.py
@@ -29,7 +29,7 @@ def bench_func(env, case):
 """ Handle one "cell" of benchmarking table. """
 return bench_write_req(env['qemu_img'], env['image_name'],
case['block_size'], case['block_offset'],
-   case['requests'])
+   case['requests'], case['empty_image'])
 
 
 def qemu_img_pipe(*args):
@@ -45,7 +45,8 @@ def qemu_img_pipe(*args):
 return subp.communicate()[0]
 
 
-def bench_write_req(qemu_img, image_name, block_size, block_offset, requests):
+def bench_write_req(qemu_img, image_name, block_size, block_offset, requests,
+empty_image):
 """Benchmark write requests
 
 The function creates a QCOW2 image with the given path/name and fills it
@@ -58,6 +59,7 @@ def bench_write_req(qemu_img, image_name, block_size, 
block_offset, requests):
 block_size   -- size of a block to write to clusters
 block_offset -- offset of the block in clusters
 requests -- number of write requests per cluster
+empty_image  -- if not True, fills image with random data
 
 Returns {'seconds': int} on success and {'error': str} on failure.
 Return value is compatible with simplebench lib.
@@ -96,6 +98,15 @@ def bench_write_req(qemu_img, image_name, block_size, 
block_offset, requests):
 
 try:
 qemu_img_pipe(*args_create)
+
+if not empty_image:
+dd = ['dd', 'if=/dev/urandom', f'of={image_name}',
+  f'bs={cluster_size}', f'seek={seek}',
+  f'count={dd_count}']
+devnull = open('/dev/null', 'w')
+subprocess.run(dd, stderr=devnull, stdout=devnull)
+subprocess.run('sync')
+
 except OSError as e:
 os.remove(image_name)
 return {'error': 'qemu_img create failed: ' + str(e)}
@@ -130,25 +141,29 @@ if __name__ == '__main__':
 'id': '',
 'block_size': 0,
 'block_offset': 0,
-'requests': 10
+'requests': 10,
+'empty_image': True
 },
 {
 'id': '',
 'block_size': 4096,
 'block_offset': 0,
-'requests': 10
+'requests': 10,
+'empty_image': True
 },
 {
 'id': '',
 'block_size': 4096,
 'block_offset': 524288,
-'requests': 10
+'requests': 10,
+'empty_image': True
 },
 {
 'id': '',
 'block_size': 524288,
 'block_offset': 4096,
-'requests': 2
+'requests': 2,
+'empty_image': True
 },
 ]
 
-- 
1.8.3.1




[PATCH v4 0/3] scripts/simplebench: add bench_write_req.py test

2020-07-12 Thread Andrey Shinkevich
The script 'bench_write_req.py' allows comparing the write request performance
of two qemu-img binaries. If you have made a change to the QEMU code and want to
check its effect on write request performance, build two qemu-img binaries, with
and without your change, and pass their paths as parameters to the
bench_write_req.py script. You may see other supported parameters in the USAGE
help.
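
As a rough illustration of that workflow (the argument order and the image path
below are assumptions for the sake of the example; the script's own USAGE output
is authoritative):

    # Build qemu-img twice, with and without the change under test,
    # then point the benchmark at both binaries.
    ./scripts/simplebench/bench_write_req.py \
        /path/to/build-with-patch/qemu-img \
        /path/to/build-without-patch/qemu-img \
        /scratch/bench.qcow2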

v4:
  01: 'if/else requests' blocks moved from patch 0001 to 0003.

v3: Based on the Vladimir's review
  01: The test results were amended in the patch description.
  02: The python format string syntax changed to the newer one f''.
  03: The 'empty_disk' test parameter fixed to True.
  04: The function bench_write_req() was supplied with commentary.
  05: The subprocess.call() was replaced with subprocess.run().
  06: The exception handling was improved.
  07: The v2 only patch was split into three in the series.

Andrey Shinkevich (3):
  scripts/simplebench: compare write request performance
  scripts/simplebench: allow writing to non-empty image
  scripts/simplebench: add unaligned data case to bench_write_req

 scripts/simplebench/bench_write_req.py | 206 +
 1 file changed, 206 insertions(+)
 create mode 100755 scripts/simplebench/bench_write_req.py

-- 
1.8.3.1




[PATCH v4 3/3] scripts/simplebench: add unaligned data case to bench_write_req

2020-07-12 Thread Andrey Shinkevich
Add a test case that writes data unaligned to the image clusters.
This case does not involve the COW optimization introduced with the
patch "qcow2: skip writing zero buffers to empty COW areas"
(git commit ID: c8bb23cbdbe32f5).

Signed-off-by: Andrey Shinkevich 
---
 scripts/simplebench/bench_write_req.py | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/scripts/simplebench/bench_write_req.py 
b/scripts/simplebench/bench_write_req.py
index f758f90..9f3a520 100755
--- a/scripts/simplebench/bench_write_req.py
+++ b/scripts/simplebench/bench_write_req.py
@@ -58,7 +58,7 @@ def bench_write_req(qemu_img, image_name, block_size, 
block_offset, requests,
 image_name   -- QCOW2 image name to create
 block_size   -- size of a block to write to clusters
 block_offset -- offset of the block in clusters
-requests -- number of write requests per cluster
+requests -- number of write requests per cluster, customize if zero
 empty_image  -- if not True, fills image with random data
 
 Returns {'seconds': int} on success and {'error': str} on failure.
@@ -83,8 +83,17 @@ def bench_write_req(qemu_img, image_name, block_size, 
block_offset, requests,
f'cluster_size={cluster_size}',
image_name, str(image_size)]
 
-count = requests * int(image_size / cluster_size)
-step = str(cluster_size)
+if requests:
+count = requests * int(image_size / cluster_size)
+step = str(cluster_size)
+else:
+# Create unaligned write requests
+assert block_size
+shift = int(block_size * 1.01)
+count = int((image_size - block_offset) / shift)
+step = str(shift)
+depth = ['-d', '2']
+
 offset = str(block_offset)
 cnt = str(count)
 size = []
@@ -95,6 +104,8 @@ def bench_write_req(qemu_img, image_name, block_size, 
block_offset, requests,
   '-S', step, '-o', offset, '-f', 'qcow2', image_name]
 if block_size:
 args_bench.extend(size)
+if not requests:
+args_bench.extend(depth)
 
 try:
 qemu_img_pipe(*args_create)
@@ -165,6 +176,13 @@ if __name__ == '__main__':
 'requests': 2,
 'empty_image': True
 },
+{
+'id': '',
+'block_size': 104857600,
+'block_offset': 524288,
+'requests': 0,
+'empty_image': False
+},
 ]
 
 # Test-envs are "columns" in benchmark resulting table, 'id is a caption
-- 
1.8.3.1




[PATCH v3 1/3] scripts/simplebench: compare write request performance

2020-07-12 Thread Andrey Shinkevich
The script 'bench_write_req.py' allows comparing the write request
performance of two qemu-img binaries.
An example with (qemu-img binary 1) and without (qemu-img binary 2) the
applied patch "qcow2: skip writing zero buffers to empty COW areas"
(git commit ID: c8bb23cbdbe32f5) has the following results:

SSD:
-----------------  -----------------
qemu-img binary 1  qemu-img binary 2
    0.34 +- 0.01      10.57 +- 0.96
    0.33 +- 0.01       9.15 +- 0.85
    0.33 +- 0.00       8.72 +- 0.05
    7.43 +- 1.19      14.35 +- 1.00
-----------------  -----------------
HDD:
-----------------  -----------------
qemu-img binary 1  qemu-img binary 2
   32.61 +- 1.17      55.11 +- 1.15
   54.28 +- 8.82      60.11 +- 2.76
   57.93 +- 0.47      58.53 +- 0.51
   11.47 +- 0.94      17.29 +- 4.40
-----------------  -----------------

Suggested-by: Denis V. Lunev 
Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Andrey Shinkevich 
---
 scripts/simplebench/bench_write_req.py | 184 +
 1 file changed, 184 insertions(+)
 create mode 100755 scripts/simplebench/bench_write_req.py

diff --git a/scripts/simplebench/bench_write_req.py 
b/scripts/simplebench/bench_write_req.py
new file mode 100755
index 000..c61c8d2
--- /dev/null
+++ b/scripts/simplebench/bench_write_req.py
@@ -0,0 +1,184 @@
+#!/usr/bin/env python3
+#
+# Test to compare performance of write requests for two qemu-img binary files.
+#
+# Copyright (c) 2020 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+
+import sys
+import os
+import subprocess
+import simplebench
+
+
+def bench_func(env, case):
+""" Handle one "cell" of benchmarking table. """
+return bench_write_req(env['qemu_img'], env['image_name'],
+   case['block_size'], case['block_offset'],
+   case['requests'])
+
+
+def qemu_img_pipe(*args):
+'''Run qemu-img and return its output'''
+subp = subprocess.Popen(list(args),
+stdout=subprocess.PIPE,
+stderr=subprocess.STDOUT,
+universal_newlines=True)
+exitcode = subp.wait()
+if exitcode < 0:
+sys.stderr.write('qemu-img received signal %i: %s\n'
+ % (-exitcode, ' '.join(list(args))))
+return subp.communicate()[0]
+
+
+def bench_write_req(qemu_img, image_name, block_size, block_offset, requests):
+"""Benchmark write requests
+
+The function creates a QCOW2 image with the given path/name and fills it
+with random data optionally. Then it runs the 'qemu-img bench' command and
+makes series of write requests on the image clusters. Finally, it returns
+the total time of the write operations on the disk.
+
+qemu_img -- path to qemu_img executable file
+image_name   -- QCOW2 image name to create
+block_size   -- size of a block to write to clusters
+block_offset -- offset of the block in clusters
+requests -- number of write requests per cluster
+
+Returns {'seconds': int} on success and {'error': str} on failure.
+Return value is compatible with simplebench lib.
+"""
+
+if not os.path.isfile(qemu_img):
+print(f'File not found: {qemu_img}')
+sys.exit(1)
+
+image_dir = os.path.dirname(os.path.abspath(image_name))
+if not os.path.isdir(image_dir):
+print(f'Path not found: {image_name}')
+sys.exit(1)
+
+cluster_size = 1024 * 1024
+image_size = 1024 * cluster_size
+seek = 4
+dd_count = int(image_size / cluster_size) - seek
+
+args_create = [qemu_img, 'create', '-f', 'qcow2', '-o',
+   f'cluster_size={cluster_size}',
+   image_name, str(image_size)]
+
+if requests:
+count = requests * int(image_size / cluster_size)
+step = str(cluster_size)
+else:
+# Create unaligned write requests
+assert block_size
+shift = int(block_size * 1.01)
+count = int((image_size - block_offset) / shift)
+step = str(shift)
+depth = ['-d', '2']
+
+offset = str(block_offset)
+cnt = str(count)
+size = []
+if block_size:
+size = ['-s', f'{block_size}']
+
+args_bench = [qemu_img, 'bench', '-w', '-n', '-t', 'none', '-c', cnt,

[PATCH v3 2/3] scripts/simplebench: allow writing to non-empty image

2020-07-12 Thread Andrey Shinkevich
Add an 'empty_image' parameter to the function bench_write_req() and to
the test cases, which will allow writing to the non-empty clusters of the
image if the 'empty_image' parameter is set to False.

Signed-off-by: Andrey Shinkevich 
---
 scripts/simplebench/bench_write_req.py | 27 +--
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/scripts/simplebench/bench_write_req.py 
b/scripts/simplebench/bench_write_req.py
index c61c8d2..ceb0ab6 100755
--- a/scripts/simplebench/bench_write_req.py
+++ b/scripts/simplebench/bench_write_req.py
@@ -29,7 +29,7 @@ def bench_func(env, case):
 """ Handle one "cell" of benchmarking table. """
 return bench_write_req(env['qemu_img'], env['image_name'],
case['block_size'], case['block_offset'],
-   case['requests'])
+   case['requests'], case['empty_image'])
 
 
 def qemu_img_pipe(*args):
@@ -45,7 +45,8 @@ def qemu_img_pipe(*args):
 return subp.communicate()[0]
 
 
-def bench_write_req(qemu_img, image_name, block_size, block_offset, requests):
+def bench_write_req(qemu_img, image_name, block_size, block_offset, requests,
+empty_image):
 """Benchmark write requests
 
 The function creates a QCOW2 image with the given path/name and fills it
@@ -58,6 +59,7 @@ def bench_write_req(qemu_img, image_name, block_size, 
block_offset, requests):
 block_size   -- size of a block to write to clusters
 block_offset -- offset of the block in clusters
 requests -- number of write requests per cluster
+empty_image  -- if not True, fills image with random data
 
 Returns {'seconds': int} on success and {'error': str} on failure.
 Return value is compatible with simplebench lib.
@@ -107,6 +109,15 @@ def bench_write_req(qemu_img, image_name, block_size, 
block_offset, requests):
 
 try:
 qemu_img_pipe(*args_create)
+
+if not empty_image:
+dd = ['dd', 'if=/dev/urandom', f'of={image_name}',
+  f'bs={cluster_size}', f'seek={seek}',
+  f'count={dd_count}']
+devnull = open('/dev/null', 'w')
+subprocess.run(dd, stderr=devnull, stdout=devnull)
+subprocess.run('sync')
+
 except OSError as e:
 os.remove(image_name)
 return {'error': 'qemu_img create failed: ' + str(e)}
@@ -141,25 +152,29 @@ if __name__ == '__main__':
 'id': '',
 'block_size': 0,
 'block_offset': 0,
-'requests': 10
+'requests': 10,
+'empty_image': True
 },
 {
 'id': '',
 'block_size': 4096,
 'block_offset': 0,
-'requests': 10
+'requests': 10,
+'empty_image': True
 },
 {
 'id': '',
 'block_size': 4096,
 'block_offset': 524288,
-'requests': 10
+'requests': 10,
+'empty_image': True
 },
 {
 'id': '',
 'block_size': 524288,
 'block_offset': 4096,
-'requests': 2
+'requests': 2,
+'empty_image': True
 },
 ]
 
-- 
1.8.3.1




[PATCH v3 0/3] scripts/simplebench: add bench_write_req.py test

2020-07-12 Thread Andrey Shinkevich
The script 'bench_write_req.py' allows comparing the write request performance
of two qemu-img binaries. If you have made a change to the QEMU code and want to
check its effect on write request performance, build two qemu-img binaries, with
and without your change, and pass their paths as parameters to the
bench_write_req.py script. You may see other supported parameters in the USAGE
help.

v3: Based on the Vladimir's review
  01: The test results were amended in the patch description.
  02: The python format string syntax changed to the newer one f''.
  03: The 'empty_disk' test parameter fixed to True.
  04: The function bench_write_req() was supplied with commentary.
  05: The subprocess.call() was replaced with subprocess.run().
  06: The exception handling was improved.
  07: The v2 only patch was split into three in the series.

Andrey Shinkevich (3):
  scripts/simplebench: compare write request performance
  scripts/simplebench: allow writing to non-empty image
  scripts/simplebench: add unaligned data case to bench_write_req

 scripts/simplebench/bench_write_req.py | 206 +
 1 file changed, 206 insertions(+)
 create mode 100755 scripts/simplebench/bench_write_req.py

-- 
1.8.3.1




[PATCH v3 3/3] scripts/simplebench: add unaligned data case to bench_write_req

2020-07-12 Thread Andrey Shinkevich
Add a test case that writes data unaligned to the image clusters.
This case does not involve the COW optimization introduced with the
patch "qcow2: skip writing zero buffers to empty COW areas"
(git commit ID: c8bb23cbdbe32f5).

Signed-off-by: Andrey Shinkevich 
---
 scripts/simplebench/bench_write_req.py | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/scripts/simplebench/bench_write_req.py 
b/scripts/simplebench/bench_write_req.py
index ceb0ab6..9f3a520 100755
--- a/scripts/simplebench/bench_write_req.py
+++ b/scripts/simplebench/bench_write_req.py
@@ -58,7 +58,7 @@ def bench_write_req(qemu_img, image_name, block_size, 
block_offset, requests,
 image_name   -- QCOW2 image name to create
 block_size   -- size of a block to write to clusters
 block_offset -- offset of the block in clusters
-requests -- number of write requests per cluster
+requests -- number of write requests per cluster, customize if zero
 empty_image  -- if not True, fills image with random data
 
 Returns {'seconds': int} on success and {'error': str} on failure.
@@ -176,6 +176,13 @@ if __name__ == '__main__':
 'requests': 2,
 'empty_image': True
 },
+{
+'id': '',
+'block_size': 104857600,
+'block_offset': 524288,
+'requests': 0,
+'empty_image': False
+},
 ]
 
 # Test-envs are "columns" in benchmark resulting table, 'id is a caption
-- 
1.8.3.1




Re: [PATCH v2] scripts/simplebench: compare write request performance

2020-07-12 Thread Andrey Shinkevich

On 11.07.2020 16:05, Vladimir Sementsov-Ogievskiy wrote:

26.06.2020 17:31, Andrey Shinkevich wrote:

The script 'bench_write_req.py' allows comparing performances of write
request for two qemu-img binary files.
An example with (qemu-img binary 1) and without (qemu-img binary 2) the
applied patch "qcow2: skip writing zero buffers to empty COW areas"
(git commit ID: c8bb23cbdbe32f5)
The  case does not involve the COW optimization.


Good, this proves that c8bb23cbdbe32f5 makes sense.


Suggested-by: Denis V. Lunev 
Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Andrey Shinkevich 
---
v2:
   01: Three more test cases added to the script:
   
   
   

  scripts/simplebench/bench_write_req.py | 201 
+

  1 file changed, 201 insertions(+)
  create mode 100755 scripts/simplebench/bench_write_req.py

diff --git a/scripts/simplebench/bench_write_req.py 
b/scripts/simplebench/bench_write_req.py

new file mode 100755
index 000..fe92d01
--- /dev/null
+++ b/scripts/simplebench/bench_write_req.py
@@ -0,0 +1,201 @@


Next, I don't understand: are you trying to fill the qcow2 image by dd 
directly? This is strange. Even if you don't break the metadata, you don't 
change it, so all clusters will remain empty.




I have tested it and it works as designed.

This dd command doesn't hurt the metadata and fills the image with 
random data. The actual disk size becomes about 1G after the dd command.
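
(A quick way to double-check that locally, with an illustrative image path, is
to compare the reported disk size against the virtual size after the dd step:)

    qemu-img info /scratch/bench.qcow2   # "disk size" should now be close to 1G
    du -h /scratch/bench.qcow2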


Andrey




Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed

2020-07-12 Thread Paolo Bonzini
On 12/07/20 12:00, Claudio Fontana wrote:
> Note: only the === -blockdev with a backing file === part of test 267 fails. 
> -blockdev with NBD is ok, like all the rest.
> 
> 
> Interesting facts about s390 in particular: its save/load code includes the 
> transfer of "storage keys",
> which include a buffer of 32768 bytes of keydata in the stream.
> 
> The code (hw/s390x/s390-skeys.c),
> is modeled similarly to RAM transfer (like in migration/ram.c), with an EOS 
> (end of stream) marker.
> 
> Contrary to the RAM transfer code though, after qemu_put_be64(f, EOS), the s390
> code does not qemu_fflush(f).

1) Are there unexpected differences in the migration stream?  That is,
you could modify qcow2.c to fopen/fwrite/fclose the bytes as they're
written and read, and see if something does not match.

2) If it matches, are there unexpected differences other than the lack
of icount section when you apply the reproducer patch?

The fflush part makes me put more hope in the first, but both could help
you debug it.

Thanks,

Paolo
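
A minimal sketch of the byte-dumping approach from point 1, assuming it is
spliced by hand into the qcow2 read/write paths being compared; the file name
and helper below are made up for illustration and are not existing QEMU code:

    #include <stdio.h>

    /* Append every buffer passing through the instrumented I/O path to a
     * side file, so the streams produced by two runs can be compared with
     * cmp(1) or a hex diff afterwards. */
    static void debug_dump_bytes(const void *buf, size_t len)
    {
        static FILE *dump;

        if (!dump) {
            dump = fopen("/tmp/qcow2-io-dump.bin", "wb");
        }
        if (dump) {
            fwrite(buf, 1, len, dump);
            fflush(dump);
        }
    }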




QEMU | Pipeline #165708498 has failed for master | d3449830

2020-07-12 Thread GitLab via


Your pipeline has failed.

Project: QEMU ( https://gitlab.com/qemu-project/qemu )
Branch: master ( https://gitlab.com/qemu-project/qemu/-/commits/master )

Commit: d3449830 ( 
https://gitlab.com/qemu-project/qemu/-/commit/d34498309cff7560ac90c422c56e3137e6a64b19
 )
Commit Message: Merge remote-tracking branch 'remotes/philmd-gi...
Commit Author: Peter Maydell ( https://gitlab.com/pm215 )

Pipeline #165708498 ( 
https://gitlab.com/qemu-project/qemu/-/pipelines/165708498 ) triggered by Alex 
Bennée ( https://gitlab.com/stsquad )
had 1 failed build.

Job #634867497 ( https://gitlab.com/qemu-project/qemu/-/jobs/634867497/raw )

Stage: test
Name: build-disabled
Trace: qemu-system-i386: falling back to tcg
Could not access KVM kernel module: No such file or directory
qemu-system-i386: -accel kvm: failed to initialize kvm: No such file or 
directory
qemu-system-i386: falling back to tcg
Could not access KVM kernel module: No such file or directory
qemu-system-i386: -accel kvm: failed to initialize kvm: No such file or 
directory
qemu-system-i386: falling back to tcg
  TESTcheck-qtest-i386: tests/qtest/device-introspect-test
  TESTcheck-qtest-i386: tests/qtest/machine-none-test
  TESTcheck-qtest-i386: tests/qtest/qmp-test
  TESTcheck-qtest-i386: tests/qtest/qmp-cmd-test
  TESTcheck-qtest-i386: tests/qtest/qom-test
  TESTcheck-qtest-i386: tests/qtest/test-hmp
  TESTcheck-qtest-i386: tests/qtest/qos-test
  TESTcheck-qtest-mips64: tests/qtest/endianness-test
  TESTcheck-qtest-mips64: tests/qtest/display-vga-test
  TESTcheck-qtest-mips64: tests/qtest/cdrom-test
  TESTcheck-qtest-mips64: tests/qtest/device-introspect-test
  TESTcheck-qtest-mips64: tests/qtest/machine-none-test
  TESTcheck-qtest-mips64: tests/qtest/qmp-test
  TESTcheck-qtest-mips64: tests/qtest/qmp-cmd-test
  TESTcheck-qtest-mips64: tests/qtest/qom-test
  TESTcheck-qtest-mips64: tests/qtest/test-hmp
  TESTcheck-qtest-mips64: tests/qtest/qos-test
  TESTcheck-qtest-ppc64: tests/qtest/machine-none-test
  TESTcheck-qtest-ppc64: tests/qtest/qmp-test
  TESTcheck-qtest-ppc64: tests/qtest/qmp-cmd-test
  TESTcheck-qtest-ppc64: tests/qtest/qom-test
section_end:1594569381:step_script
ERROR: Job failed: execution took longer than 1h0m0s seconds



-- 
You're receiving this email because of your account on gitlab.com.





RE: [PATCH v2 1/4] target/nios2: add DISAS_NORETURN case for nothing more to generate

2020-07-12 Thread Wu, Wentong
> -Original Message-
> From: Peter Maydell  
> Sent: Sunday, July 12, 2020 2:50 AM
> To: Wu, Wentong 
> Cc: QEMU Developers ; QEMU Trivial 
> ; Chris Wulff ; Marek Vasut 
> 
> Subject: Re: [PATCH v2 1/4] target/nios2: add DISAS_NORETURN case for nothing 
> more to generate
> 
> On Fri, 10 Jul 2020 at 16:46, Wentong Wu  wrote:
> >
> > Add DISAS_NORETURN case for nothing more to generate because at 
> > runtime execution will never return from some helper call. And at the 
> > same time replace DISAS_UPDATE in t_gen_helper_raise_exception and 
> > gen_exception with the newly added DISAS_NORETURN.
> >
> > Signed-off-by: Wentong Wu 
> 
> Hi; I'm going to pick these up and get them into master.
> 
> A couple of notes below for if you plan to submit more patches to QEMU in 
> future: these are really just minor workflow things, but they do help make 
> our lives easier in getting code submissions into the tree.

Thanks Peter, I will follow the process when submitting more patches to the QEMU
project, and I really learned a lot! Thanks

> If people provide you with a Reviewed-by: tag for a patch, and you don't 
> change it when you send out an updated version, it's helpful if you include 
> that tag in the commit message of the revised version you send out. This 
> saves people having to remember whether they'd reviewed something or not, and 
> means that when applying I don't have to go back and look at old versions to 
> see who reviewed what.
>
> Patch series are much easier for our tooling to deal with if you send them 
> out with a cover letter email (a 0/n email which all the other emails are 
> followups to; git format-patch has a '--cover-letter' option which will do 
> the right thing here).
> 
> We document this kind of workflow stuff here:
> https://wiki.qemu.org/Contribute/SubmitAPatch
>
> thanks
> -- PMM


Re: [PULL v3 00/32] AVR port

2020-07-12 Thread Peter Maydell
On Sat, 11 Jul 2020 at 10:07, Philippe Mathieu-Daudé  wrote:
>
> Since v2:
>
>   Removed incorrect cpu_to_le32() call.
>
> Since v1:
>
>   Fixed issue on big-endian host reported by Peter Maydell.
>
> Possible false-positives from checkpatch:
>
>   WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
>
> The following changes since commit f2a1cf9180f63e88bb38ff21c169da97c3f2bad5:
>
>   Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2020-07-07-v2'
>   into staging (2020-07-10 14:41:23 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/philmd/qemu.git tags/avr-port-20200711
>
> for you to fetch changes up to 19b293472f1514b5424ef4d9b092e02bd9b106c2:
>
>   target/avr/disas: Fix store instructions display order (2020-07-11 11:02:05 +0200)
>
> 
> 8bit AVR port from Michael Rolnik.
>
> Michael started to work on the AVR port few years ago [*] and kept
> improving the code over various series.
>
> List of people who help him (in chronological order):
> - Richard Henderson
> - Sarah Harris and Edward Robbins
> - Philippe Mathieu-Daudé and Aleksandar Markovic
> - Pavel Dovgalyuk
> - Thomas Huth
>
> [*] The oldest contribution I could find on the list is from 2016:
> https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg02985.html


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.1
for any user-visible changes.

-- PMM



[PATCH] Allow acpi-tmr size=2

2020-07-12 Thread Simon John
macOS guests no longer boot after commit
5d971f9e672507210e77d020d89e0e89165c8fc9.


acpi-tmr needs 2-byte memory accesses, which now break because that commit
only allows 4-byte accesses.


Fixes: 5d971f9e672507210e7 (memory: Revert "memory: accept mismatching 
sizes in memory_region_access_valid")

Buglink: https://bugs.launchpad.net/qemu/+bug/1886318

Signed-off-by: Simon John 
---
 hw/acpi/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/acpi/core.c b/hw/acpi/core.c
index f6d9ec4f13..05ff29b9d7 100644
--- a/hw/acpi/core.c
+++ b/hw/acpi/core.c
@@ -527,7 +527,7 @@ static void acpi_pm_tmr_write(void *opaque, hwaddr 
addr, uint64_t val,

 static const MemoryRegionOps acpi_pm_tmr_ops = {
 .read = acpi_pm_tmr_read,
 .write = acpi_pm_tmr_write,
-.valid.min_access_size = 4,
+.valid.min_access_size = 1,
 .valid.max_access_size = 4,
 .endianness = DEVICE_LITTLE_ENDIAN,
 };
--
2.27.0




[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

2020-07-12 Thread Rafael David Tinoco
Started working on this again...

** Changed in: qemu (Ubuntu Bionic)
   Status: Triaged => In Progress

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  Triaged
Status in kunpeng920 ubuntu-18.04 series:
  Triaged
Status in kunpeng920 ubuntu-18.04-hwe series:
  Triaged
Status in kunpeng920 ubuntu-19.10 series:
  Fix Released
Status in kunpeng920 ubuntu-20.04 series:
  Fix Released
Status in kunpeng920 upstream-kernel series:
  Invalid
Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Bionic:
  In Progress
Status in qemu source package in Eoan:
  Fix Released
Status in qemu source package in Focal:
  Fix Released

Bug description:
  [Impact]

  * QEMU locking primitives might face a race condition in QEMU Async
  I/O bottom halves scheduling. This leads to a deadlock, making either
  QEMU or one of its tools hang indefinitely.

  [Test Case]

  * qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs in Aarch64.

  [Regression Potential]

  * This is a change to a core part of QEMU: The AIO scheduling. It
  works like a "kernel" scheduler, whereas kernel schedules OS tasks,
  the QEMU AIO code is responsible to schedule QEMU coroutines or event
  listeners callbacks.

  * There was a long discussion upstream about primitives and Aarch64.
  After quite some time Paolo released this patch and it solves the
  issue. Tested platforms were amd64 and aarch64, based on his commit
  log.

  * Christian suggests that this fix stay little longer in -proposed to
  make sure it won't cause any regressions.

  * dannf suggests we also check for performance regressions; e.g. how
  long it takes to convert a cloud image on high-core systems.

  [Other Info]

   * Original Description below:

  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]

  
  """

  All the tasks left are blocked in a system call, so no task left to call
  qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
  thread #1 (doing poll() in a pipe with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes in
  the beginning, sometimes at the end).

  

  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, 
nfds=187650274213760,
  timeout=, timeout@entry=0x0, sigmask=0xc123b950)
  at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, 

[Bug 1809665] Re: Xbox One controller USB passthrough disconnections and stops

2020-07-12 Thread Ticketsolve
> This happened to me as well, but I managed to find a solution: if I
ban the xpad driver through modprobe.d, then the problem disappears.

Thanks, that's very interesting (and useful, although nowadays I use the
BT connection).

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1809665

Title:
  Xbox One controller USB passthrough disconnections and stops

Status in QEMU:
  New

Bug description:
  I can't properly pass through my Xbox One controller to a virtual
  machine; it causes USB disconnections on the host, ultimately
  preventing it from working (at all) on the guest

  I've seen a few other cases reported in other websites, which show the
  same symptoms:

  - https://www.reddit.com/r/VFIO/comments/97dhbw/qemu_w10_xbox_one_controller
  - 
https://unix.stackexchange.com/questions/452751/how-can-i-pass-through-an-xbox-one-controller-to-a-windows-vm-on-ubuntu

  This is a sample:

  libusb: error [udev_hotplug_event] ignoring udev action bind
  qemu-system-x86_64: libusb_release_interface: -4 [NO_DEVICE]
  qemu-system-x86_64: libusb_release_interface: -4 [NO_DEVICE]
  qemu-system-x86_64: libusb_release_interface: -4 [NO_DEVICE]
  libusb: error [_get_usbfs_fd] File doesn't exist, wait 10 ms and try again
  libusb: error [_get_usbfs_fd] libusb couldn't open USB device
  /dev/bus/usb/003/016: No such file or directory

  I think this is quite a long-standing issue, as I've been experiencing it
  through several versions, including the current one (3.1).

  I can reproduce this 100% of the times, on multiple host O/S
  distributions (the current one being based on Ubuntu 18.04 x86-64).

  I compile QEMU directly from source, and execute it via commandline;
  the command is very long, however, the relevant part is standard (I
  think):

  -usb \
  -device usb-tablet \
  -device 
usb-host,vendorid=0x$VGAPT_XBOX_PAD_VEND_ID,productid=0x$VGAPT_XBOX_PAD_PROD_ID 
\

  The guest is Windows 10 64bit.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1809665/+subscriptions



[Bug 1809665] Re: Xbox One controller USB passthrough disconnections and stops

2020-07-12 Thread Kasper Grubbe
This happened to me as well, but I managed to find a solution: if I ban
the xpad driver through modprobe.d, then the problem disappears.

I added the following line:

blacklist xpad

To this file: /etc/modprobe.d/vfio.conf, rebooted, and then I could use
my Xbox One S controller with Qemu, I am not sure if it's a xpad bug or
a hardware bug.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1809665

Title:
  Xbox One controller USB passthrough disconnections and stops

Status in QEMU:
  New

Bug description:
  I can't properly pass through my Xbox One controller to a virtual
  machine; it causes USB disconnections on the host, ultimately
  preventing it from working (at all) on the guest

  I've seen a few other cases reported in other websites, which show the
  same symptoms:

  - https://www.reddit.com/r/VFIO/comments/97dhbw/qemu_w10_xbox_one_controller
  - 
https://unix.stackexchange.com/questions/452751/how-can-i-pass-through-an-xbox-one-controller-to-a-windows-vm-on-ubuntu

  This is a sample:

  libusb: error [udev_hotplug_event] ignoring udev action bind
  qemu-system-x86_64: libusb_release_interface: -4 [NO_DEVICE]
  qemu-system-x86_64: libusb_release_interface: -4 [NO_DEVICE]
  qemu-system-x86_64: libusb_release_interface: -4 [NO_DEVICE]
  libusb: error [_get_usbfs_fd] File doesn't exist, wait 10 ms and try again
  libusb: error [_get_usbfs_fd] libusb couldn't open USB device
  /dev/bus/usb/003/016: No such file or directory

  I think this is quite a long-standing issue, as I've been experiencing it
  through several versions, including the current one (3.1).

  I can reproduce this 100% of the times, on multiple host O/S
  distributions (the current one being based on Ubuntu 18.04 x86-64).

  I compile QEMU directly from source, and execute it via commandline;
  the command is very long, however, the relevant part is standard (I
  think):

  -usb \
  -device usb-tablet \
  -device 
usb-host,vendorid=0x$VGAPT_XBOX_PAD_VEND_ID,productid=0x$VGAPT_XBOX_PAD_PROD_ID 
\

  The guest is Windows 10 64bit.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1809665/+subscriptions



[Bug 1886318] Re: Qemu after v5.0.0 breaks macos guests

2020-07-12 Thread Michael Tokarev
I think we should add the debugging patch by Mark to qemu too; I suspect
there will be more cases like this, since this check was turned off for
a few years.  Maybe not as printf's but as logging, I dunno, but the
info it collects is really a must-have.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1886318

Title:
  Qemu after v5.0.0 breaks macos guests

Status in QEMU:
  New

Bug description:
  The Debian Sid 5.0-6 qemu-kvm package can no longer get further than
  the Clover bootloader whereas 5.0-6 and earlier worked fine.

  So I built qemu master from github and it has the same problem,
  whereas git tag v5.0.0 (or 4.2.1) does not, so something between
  v5.0.0 release and the last few days has caused the problem.

  Here's my qemu script, pretty standard macOS-Simple-KVM setup on a
  Xeon host:

  qemu-system-x86_64 \
  -enable-kvm \
  -m 4G \
  -machine q35,accel=kvm \
  -smp 4,sockets=1,cores=2,threads=2 \
  -cpu 
  
Penryn,vendor=GenuineIntel,kvm=on,+sse3,+sse4.2,+aes,+xsave,+avx,+xsaveopt,+xsavec,+xgetbv1,+avx2,+bmi2,+smep,+bmi1,+fma,+movbe,+invtsc
 
  \
  -device 
  
isa-applesmc,osk="ourhardworkbythesewordsguardedpleasedontsteal(c)AppleComputerInc"
 
  \
  -smbios type=2 \
  -drive if=pflash,format=raw,readonly,file="/tmp/OVMF_CODE.fd" \
  -drive if=pflash,format=raw,file="/tmp/macos_catalina_VARS.fd" \
  -vga qxl \
  -device ich9-ahci,id=sata \
  -drive id=ESP,if=none,format=raw,file=/tmp/ESP.img \
  -device ide-hd,bus=sata.2,drive=ESP \
  -drive id=InstallMedia,format=raw,if=none,file=/tmp/BaseSystem.img \
  -device ide-hd,bus=sata.3,drive=InstallMedia \
  -drive id=SystemDisk,if=none,format=raw,file=/tmp/macos_catalina.img \
  -device ide-hd,bus=sata.4,drive=SystemDisk \
  -usb -device usb-kbd -device usb-mouse

  Perhaps something has changed in Penryn support recently, as that's
  required for macos?

  See also https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=964247

  Also on a related note, kernel 5.6/5.7 (on Debian) hard crashes the
  host when I try GPU passthrough on macos, whereas Ubuntu20/Win10 work
  fine - as does 5.5 kernel.

  See also https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=961676

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1886318/+subscriptions



[Bug 1886318] Re: Qemu after v5.0.0 breaks macos guests

2020-07-12 Thread Simon John
Urgh, that was complicated; I think I got it right!

Need to look for "[PATCH] Allow acpi-tmr size=2" to show up on qemu-devel.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1886318

Title:
  Qemu after v5.0.0 breaks macos guests

Status in QEMU:
  New

Bug description:
  The Debian Sid 5.0-6 qemu-kvm package can no longer get further than
  the Clover bootloader whereas 5.0-6 and earlier worked fine.

  So I built qemu master from github and it has the same problem,
  whereas git tag v5.0.0 (or 4.2.1) does not, so something between
  v5.0.0 release and the last few days has caused the problem.

  Here's my qemu script, pretty standard macOS-Simple-KVM setup on a
  Xeon host:

  qemu-system-x86_64 \
  -enable-kvm \
  -m 4G \
  -machine q35,accel=kvm \
  -smp 4,sockets=1,cores=2,threads=2 \
  -cpu 
  
Penryn,vendor=GenuineIntel,kvm=on,+sse3,+sse4.2,+aes,+xsave,+avx,+xsaveopt,+xsavec,+xgetbv1,+avx2,+bmi2,+smep,+bmi1,+fma,+movbe,+invtsc
 
  \
  -device 
  
isa-applesmc,osk="ourhardworkbythesewordsguardedpleasedontsteal(c)AppleComputerInc"
 
  \
  -smbios type=2 \
  -drive if=pflash,format=raw,readonly,file="/tmp/OVMF_CODE.fd" \
  -drive if=pflash,format=raw,file="/tmp/macos_catalina_VARS.fd" \
  -vga qxl \
  -device ich9-ahci,id=sata \
  -drive id=ESP,if=none,format=raw,file=/tmp/ESP.img \
  -device ide-hd,bus=sata.2,drive=ESP \
  -drive id=InstallMedia,format=raw,if=none,file=/tmp/BaseSystem.img \
  -device ide-hd,bus=sata.3,drive=InstallMedia \
  -drive id=SystemDisk,if=none,format=raw,file=/tmp/macos_catalina.img \
  -device ide-hd,bus=sata.4,drive=SystemDisk \
  -usb -device usb-kbd -device usb-mouse

  Perhaps something has changed in Penryn support recently, as that's
  required for macos?

  See also https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=964247

  Also on a related note, kernel 5.6/5.7 (on Debian) hard crashes the
  host when I try GPU passthrough on macos, whereas Ubuntu20/Win10 work
  fine - as does 5.5 kernel.

  See also https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=961676

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1886318/+subscriptions



Re: [PATCH] block/amend: Check whether the node exists

2020-07-12 Thread Maxim Levitsky
On Fri, 2020-07-10 at 11:50 +0200, Max Reitz wrote:
> We should check whether the user-specified node-name actually refers to
> a node.  The simplest way to do that is to use bdrv_lookup_bs() instead
> of bdrv_find_node() (the former wraps the latter, and produces an error
> message if necessary).
> 
> Reported-by: Coverity (CID 1430268)
> Fixes: ced914d0ab9fb2c900f873f6349a0b8eecd1fdbe
> Signed-off-by: Max Reitz 
> ---
>  block/amend.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/block/amend.c b/block/amend.c
> index f4612dcf08..392df9ef83 100644
> --- a/block/amend.c
> +++ b/block/amend.c
> @@ -69,8 +69,12 @@ void qmp_x_blockdev_amend(const char *job_id,
>  BlockdevAmendJob *s;
>  const char *fmt = BlockdevDriver_str(options->driver);
>  BlockDriver *drv = bdrv_find_format(fmt);
> -BlockDriverState *bs = bdrv_find_node(node_name);
> +BlockDriverState *bs;
>  
> +bs = bdrv_lookup_bs(NULL, node_name, errp);
> +if (!bs) {
> +return;
> +}
>  
>  if (!drv) {
>  error_setg(errp, "Block driver '%s' not found or not supported", 
> fmt);

Yep, this looks like a real bug, sorry about that.

Reviewed-by: Maxim Levitsky 

Best regards,
Maxim Levitsky




[RFC v8 24/25] intel_iommu: process PASID-based Device-TLB invalidation

2020-07-12 Thread Liu Yi L
This patch adds empty handling for PASID-based Device-TLB
invalidation. For now this is enough, as it is not necessary to
propagate it to the host for passthrough devices, and no emulated
device has a device TLB.

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Yi Sun 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Reviewed-by: Peter Xu 
Signed-off-by: Liu Yi L 
---
 hw/i386/intel_iommu.c  | 18 ++
 hw/i386/intel_iommu_internal.h |  1 +
 2 files changed, 19 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index d3c41a6..2bbb4b1 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3213,6 +3213,17 @@ static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
 return true;
 }
 
+static bool vtd_process_device_piotlb_desc(IntelIOMMUState *s,
+   VTDInvDesc *inv_desc)
+{
+/*
+ * no need to handle it for passthru device, for emulated
+ * devices with device tlb, it may be required, but for now,
+ * return is enough
+ */
+return true;
+}
+
 static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
   VTDInvDesc *inv_desc)
 {
@@ -3334,6 +3345,13 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
 }
 break;
 
+case VTD_INV_DESC_DEV_PIOTLB:
+trace_vtd_inv_desc("device-piotlb", inv_desc.hi, inv_desc.lo);
+if (!vtd_process_device_piotlb_desc(s, &inv_desc)) {
+return false;
+}
+break;
+
 case VTD_INV_DESC_DEVICE:
 trace_vtd_inv_desc("device", inv_desc.hi, inv_desc.lo);
  if (!vtd_process_device_iotlb_desc(s, &inv_desc)) {
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 08ff58e..9b4fc67 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -405,6 +405,7 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_WAIT   0x5 /* Invalidation Wait Descriptor */
 #define VTD_INV_DESC_PIOTLB 0x6 /* PASID-IOTLB Invalidate Desc */
 #define VTD_INV_DESC_PC 0x7 /* PASID-cache Invalidate Desc */
+#define VTD_INV_DESC_DEV_PIOTLB 0x8 /* PASID-based-DIOTLB inv_desc*/
 #define VTD_INV_DESC_NONE   0   /* Not an Invalidate Descriptor */
 
 /* Masks for Invalidation Wait Descriptor*/
-- 
2.7.4




[RFC v8 25/25] intel_iommu: modify x-scalable-mode to be string option

2020-07-12 Thread Liu Yi L
Intel VT-d 3.0 introduces scalable mode, which has a bunch of capabilities
related to scalable mode translation, so there are multiple possible
combinations. This vIOMMU implementation simplifies it for the user by
providing typical combinations. The user can configure it with the
"x-scalable-mode" option. The usage is as below:

"-device intel-iommu,x-scalable-mode=["legacy"|"modern"|"off"]"

 - "legacy": gives support for SL page table
 - "modern": gives support for FL page table, pasid, virtual command
 - "off": no scalable mode support
 - if not configured, there is no scalable mode support; if not properly
   configured, an error is thrown
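
For illustration, a command line enabling the modern mode could look like the
following; the machine options around the intel-iommu device are assumptions
for the sake of the example and are not mandated by this patch:

    qemu-system-x86_64 -machine q35,accel=kvm,kernel-irqchip=split \
        -device intel-iommu,intremap=on,x-scalable-mode=modern \
        ...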

Note: this patch is supposed to be merged when the whole vSVA patch series
is merged.

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Yi Sun 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Reviewed-by: Peter Xu 
Signed-off-by: Liu Yi L 
Signed-off-by: Yi Sun 
---
rfcv5 (v2) -> rfcv6:
*) reports want_nested to VFIO;
*) assert iommu_set/unset_iommu_context() if vIOMMU is not scalable modern.
---
 hw/i386/intel_iommu.c  | 39 +++
 hw/i386/intel_iommu_internal.h |  3 +++
 include/hw/i386/intel_iommu.h  |  2 ++
 3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 2bbb4b1..d807484 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4050,7 +4050,7 @@ static Property vtd_properties[] = {
 DEFINE_PROP_UINT8("aw-bits", IntelIOMMUState, aw_bits,
   VTD_HOST_ADDRESS_WIDTH),
 DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
-DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
+DEFINE_PROP_STRING("x-scalable-mode", IntelIOMMUState, scalable_mode_str),
 DEFINE_PROP_BOOL("dma-drain", IntelIOMMUState, dma_drain, true),
 DEFINE_PROP_END_OF_LIST(),
 };
@@ -4420,6 +4420,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, 
PCIBus *bus, int devfn)
 static int vtd_dev_get_iommu_attr(PCIBus *bus, void *opaque, int32_t devfn,
IOMMUAttr attr, void *data)
 {
+IntelIOMMUState *s = opaque;
 int ret = 0;
 
 assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
@@ -4429,8 +4430,7 @@ static int vtd_dev_get_iommu_attr(PCIBus *bus, void 
*opaque, int32_t devfn,
 {
 bool *pdata = data;
 
-/* return false until vSVA is ready */
-*pdata = false;
+*pdata = s->scalable_modern ? true : false;
 break;
 }
 default:
@@ -4526,6 +4526,8 @@ static int vtd_dev_set_iommu_context(PCIBus *bus, void 
*opaque,
 VTDHostIOMMUContext *vtd_dev_icx;
 
 assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
+/* only modern scalable supports set_ioimmu_context */
+assert(s->scalable_modern);
 
 vtd_bus = vtd_find_add_bus(s, bus);
 
@@ -4560,6 +4562,8 @@ static void vtd_dev_unset_iommu_context(PCIBus *bus, void 
*opaque, int devfn)
 VTDHostIOMMUContext *vtd_dev_icx;
 
 assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
+/* only modern scalable supports unset_ioimmu_context */
+assert(s->scalable_modern);
 
 vtd_bus = vtd_find_add_bus(s, bus);
 
@@ -4787,8 +4791,13 @@ static void vtd_init(IntelIOMMUState *s)
 }
 
 /* TODO: read cap/ecap from host to decide which cap to be exposed. */
-if (s->scalable_mode) {
+if (s->scalable_mode && !s->scalable_modern) {
 s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
+} else if (s->scalable_mode && s->scalable_modern) {
+s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID |
+   VTD_ECAP_FLTS | VTD_ECAP_PSS(VTD_PASID_SS) |
+   VTD_ECAP_VCS;
+s->vccap |= VTD_VCCAP_PAS;
 }
 
 if (!s->cap_finalized) {
@@ -4929,6 +4938,28 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error 
**errp)
 return false;
 }
 
+if (s->scalable_mode_str &&
+(strcmp(s->scalable_mode_str, "off") &&
+ strcmp(s->scalable_mode_str, "modern") &&
+ strcmp(s->scalable_mode_str, "legacy"))) {
+error_setg(errp, "Invalid x-scalable-mode config,"
+ "Please use \"modern\", \"legacy\" or \"off\"");
+return false;
+}
+
+if (s->scalable_mode_str &&
+!strcmp(s->scalable_mode_str, "legacy")) {
+s->scalable_mode = true;
+s->scalable_modern = false;
+} else if (s->scalable_mode_str &&
+!strcmp(s->scalable_mode_str, "modern")) {
+s->scalable_mode = true;
+s->scalable_modern = true;
+} else {
+s->scalable_mode = false;
+s->scalable_modern = false;
+}
+
 return true;
 }
 
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 9b4fc67..afb4c6a 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -197,7 +197,9 @@
 #define VTD_ECAP_MHMV   (15ULL << 20)
 #define VTD_ECAP_SRS   

[RFC v8 22/25] intel_iommu: process PASID-based iotlb invalidation

2020-07-12 Thread Liu Yi L
This patch adds the basic PASID-based iotlb (piotlb) invalidation
support. The piotlb is used when walking the Intel VT-d first-level
page table. This patch only adds the basic processing; detailed
handling will be added in the next patch.

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Yi Sun 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Reviewed-by: Peter Xu 
Signed-off-by: Liu Yi L 
---
 hw/i386/intel_iommu.c  | 53 ++
 hw/i386/intel_iommu_internal.h | 13 +++
 2 files changed, 66 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 47af7b1..e6364ee 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3038,6 +3038,55 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s,
 return true;
 }
 
+static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s,
+uint16_t domain_id,
+uint32_t pasid)
+{
+}
+
+static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
+   uint32_t pasid, hwaddr addr, uint8_t am,
+   bool ih)
+{
+}
+
+static bool vtd_process_piotlb_desc(IntelIOMMUState *s,
+VTDInvDesc *inv_desc)
+{
+uint16_t domain_id;
+uint32_t pasid;
+uint8_t am;
+hwaddr addr;
+
+if ((inv_desc->val[0] & VTD_INV_DESC_PIOTLB_RSVD_VAL0) ||
+(inv_desc->val[1] & VTD_INV_DESC_PIOTLB_RSVD_VAL1)) {
+error_report_once("non-zero-field-in-piotlb_inv_desc hi: 0x%" PRIx64
+  " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]);
+return false;
+}
+
+domain_id = VTD_INV_DESC_PIOTLB_DID(inv_desc->val[0]);
+pasid = VTD_INV_DESC_PIOTLB_PASID(inv_desc->val[0]);
+switch (inv_desc->val[0] & VTD_INV_DESC_IOTLB_G) {
+case VTD_INV_DESC_PIOTLB_ALL_IN_PASID:
+vtd_piotlb_pasid_invalidate(s, domain_id, pasid);
+break;
+
+case VTD_INV_DESC_PIOTLB_PSI_IN_PASID:
+am = VTD_INV_DESC_PIOTLB_AM(inv_desc->val[1]);
+addr = (hwaddr) VTD_INV_DESC_PIOTLB_ADDR(inv_desc->val[1]);
+vtd_piotlb_page_invalidate(s, domain_id, pasid, addr, am,
+   VTD_INV_DESC_PIOTLB_IH(inv_desc->val[1]));
+break;
+
+default:
+error_report_once("Invalid granularity in P-IOTLB desc hi: 0x%" PRIx64
+  " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]);
+return false;
+}
+return true;
+}
+
 static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
  VTDInvDesc *inv_desc)
 {
@@ -3152,6 +3201,10 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
 break;
 
 case VTD_INV_DESC_PIOTLB:
+trace_vtd_inv_desc("p-iotlb", inv_desc.val[1], inv_desc.val[0]);
+if (!vtd_process_piotlb_desc(s, &inv_desc)) {
+return false;
+}
 break;
 
 case VTD_INV_DESC_WAIT:
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 9805b84..118d568 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -476,6 +476,19 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_PASIDC_PASID_SI   (1ULL << 4)
 #define VTD_INV_DESC_PASIDC_GLOBAL (3ULL << 4)
 
+#define VTD_INV_DESC_PIOTLB_ALL_IN_PASID  (2ULL << 4)
+#define VTD_INV_DESC_PIOTLB_PSI_IN_PASID  (3ULL << 4)
+
+#define VTD_INV_DESC_PIOTLB_RSVD_VAL0 0xfff0ffc0ULL
+#define VTD_INV_DESC_PIOTLB_RSVD_VAL1 0xf80ULL
+
+#define VTD_INV_DESC_PIOTLB_PASID(val)(((val) >> 32) & 0xfULL)
+#define VTD_INV_DESC_PIOTLB_DID(val)  (((val) >> 16) & \
+ VTD_DOMAIN_ID_MASK)
+#define VTD_INV_DESC_PIOTLB_ADDR(val) ((val) & ~0xfffULL)
+#define VTD_INV_DESC_PIOTLB_AM(val)   ((val) & 0x3fULL)
+#define VTD_INV_DESC_PIOTLB_IH(val)   (((val) >> 6) & 0x1)
+
 /* Information about page-selective IOTLB invalidate */
 struct VTDIOTLBPageInvInfo {
 uint16_t domain_id;
-- 
2.7.4




[RFC v8 23/25] intel_iommu: propagate PASID-based iotlb invalidation to host

2020-07-12 Thread Liu Yi L
This patch propagates PASID-based iotlb invalidation to the host.

Intel VT-d 3.0 supports nested translation at PASID granularity.
Guest SVA support could be implemented by configuring nested
translation on a specific PASID. This is also known as dual-stage
DMA translation.

Under such a configuration, the guest owns the GVA->GPA translation,
which is configured as the first-level page table on the host side for
a specific pasid, and the host owns the GPA->HPA translation. As the
guest owns the first-level translation table, piotlb invalidations
should be propagated to the host, since the host IOMMU will cache
first-level page table related mappings during DMA address
translation.

This patch traps the guest PASID-based iotlb flush and propagates
it to the host.
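
For reference, a minimal sketch of the request that is handed to the
host for a PASID-selective flush is shown below. The struct and macro
names follow the (not yet merged) Linux IOMMU cache-invalidation UAPI
this series depends on, so treat the layout as illustrative rather
than final:

/*
 * Illustrative only: the stage-1 cache flush request a vIOMMU builds
 * for a PASID-selective piotlb invalidation before handing it to
 * host_iommu_ctx_flush_stage1_cache().
 */
static void build_pasid_flush_req(struct iommu_cache_invalidate_info *req,
                                  uint32_t pasid)
{
    memset(req, 0, sizeof(*req));
    req->argsz       = sizeof(*req);
    req->version     = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
    req->cache       = IOMMU_CACHE_INV_TYPE_IOTLB;       /* IOTLB flush */
    req->granularity = IOMMU_INV_GRANU_PASID;            /* per-PASID   */
    req->granu.pasid_info.pasid = pasid;
    req->granu.pasid_info.flags = IOMMU_INV_PASID_FLAGS_PASID;
}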

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Yi Sun 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Signed-off-by: Liu Yi L 
---
rfcv4 (v1) -> rfcv5 (v2):
*) removed the validity check on the vtd_pasid_as instance as rfcv5
   ensures all vtd_pasid_as instances in the hash table are valid.
---
 hw/i386/intel_iommu.c  | 113 +
 hw/i386/intel_iommu_internal.h |   7 +++
 2 files changed, 120 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index e6364ee..d3c41a6 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3038,16 +3038,129 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s,
 return true;
 }
 
+/**
+ * Caller of this function should hold iommu_lock.
+ */
+static void vtd_invalidate_piotlb(IntelIOMMUState *s,
+  VTDBus *vtd_bus,
+  int devfn,
+  struct iommu_cache_invalidate_info *cache)
+{
+VTDHostIOMMUContext *vtd_dev_icx;
+HostIOMMUContext *iommu_ctx;
+
+vtd_dev_icx = vtd_bus->dev_icx[devfn];
+if (!vtd_dev_icx) {
+goto out;
+}
+iommu_ctx = vtd_dev_icx->iommu_ctx;
+if (!iommu_ctx) {
+goto out;
+}
+if (host_iommu_ctx_flush_stage1_cache(iommu_ctx, cache)) {
+error_report("Cache flush failed");
+}
+out:
+return;
+}
+
+/**
+ * This function is a loop function for the s->vtd_pasid_as
+ * list with VTDPIOTLBInvInfo as execution filter. It propagates
+ * the piotlb invalidation to host. Caller of this function
+ * should hold iommu_lock.
+ */
+static void vtd_flush_pasid_iotlb(gpointer key, gpointer value,
+  gpointer user_data)
+{
+VTDPIOTLBInvInfo *piotlb_info = user_data;
+VTDPASIDAddressSpace *vtd_pasid_as = value;
+VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry;
+uint16_t did;
+
+did = vtd_pe_get_domain_id(&pc_entry->pasid_entry);
+
+if ((piotlb_info->domain_id == did) &&
+(piotlb_info->pasid == vtd_pasid_as->pasid)) {
+vtd_invalidate_piotlb(vtd_pasid_as->iommu_state,
+  vtd_pasid_as->vtd_bus,
+  vtd_pasid_as->devfn,
+  piotlb_info->cache_info);
+}
+
+/*
+ * TODO: needs to add QEMU piotlb flush when QEMU piotlb
+ * infrastructure is ready. For now, it is enough for passthru
+ * devices.
+ */
+}
+
 static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s,
 uint16_t domain_id,
 uint32_t pasid)
 {
+VTDPIOTLBInvInfo piotlb_info;
+struct iommu_cache_invalidate_info *cache_info;
+
+cache_info = g_malloc0(sizeof(*cache_info));
+
+cache_info->argsz = sizeof(*cache_info);
+cache_info->version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
+cache_info->cache = IOMMU_CACHE_INV_TYPE_IOTLB;
+cache_info->granularity = IOMMU_INV_GRANU_PASID;
+cache_info->granu.pasid_info.pasid = pasid;
+cache_info->granu.pasid_info.flags = IOMMU_INV_PASID_FLAGS_PASID;
+
+piotlb_info.domain_id = domain_id;
+piotlb_info.pasid = pasid;
+piotlb_info.cache_info = cache_info;
+
+vtd_iommu_lock(s);
+/*
+ * Here we loop over all the vtd_pasid_as instances in s->vtd_pasid_as
+ * to find the affected devices, since piotlb invalidation should
+ * check the pasid cache from an architecture point of view.
+ */
+g_hash_table_foreach(s->vtd_pasid_as,
+ vtd_flush_pasid_iotlb, &piotlb_info);
+vtd_iommu_unlock(s);
+g_free(cache_info);
 }
 
 static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
uint32_t pasid, hwaddr addr, uint8_t am,
bool ih)
 {
+VTDPIOTLBInvInfo piotlb_info;
+struct iommu_cache_invalidate_info *cache_info;
+
+cache_info = g_malloc0(sizeof(*cache_info));
+
+cache_info->argsz = sizeof(*cache_info);
+cache_info->version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
+cache_info->cache = IOMMU_CACHE_INV_TYPE_IOTLB;
+cache_info->granularity = IOMMU_INV_GRANU_ADDR;
+

[RFC v8 20/25] intel_iommu: do not pass down pasid bind for PASID #0

2020-07-12 Thread Liu Yi L
The RID_PASID field was introduced in the VT-d 3.0 spec; it is used
for DMA requests w/o PASID in scalable mode VT-d, which is also
known as IOVA. The VT-d 3.1 spec defines it as follows:

"Implementations not supporting RID_PASID capability
(ECAP_REG.RPS is 0b), use a PASID value of 0 to perform
address translation for requests without PASID."

This patch adds a check against the PASIDs which are going to be
bound to a device. For PASID #0, it is not necessary to pass down a
pasid bind request, since PASID #0 is used as RID_PASID for DMA
requests without pasid. A further reason is that the current Intel
vIOMMU supports gIOVA by shadowing the guest 2nd level page table.
However, in the future, if the guest IOMMU driver uses the 1st level
page table to store IOVA mappings, then guest IOVA support will also
be done via nested translation. When gIOVA is over FLPT, the vIOMMU
should pass down the pasid bind request for PASID #0 to the host, and
the host needs to bind the guest IOVA page table to a proper PASID,
e.g. the PASID value in the RID_PASID field for a PF/VF if
ECAP_REG.RPS is clear, or the default PASID for an ADI
(Assignable Device Interface in the Scalable IOV solution).

IOVA over FLPT support on Intel VT-d:
https://lkml.org/lkml/2019/9/23/297

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Yi Sun 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Reviewed-by: Peter Xu 
Signed-off-by: Liu Yi L 
---
 hw/i386/intel_iommu.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index de2ba0e..47af7b1 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1893,6 +1893,16 @@ static int vtd_bind_guest_pasid(IntelIOMMUState *s, 
VTDBus *vtd_bus,
 HostIOMMUContext *iommu_ctx;
 int ret = -1;
 
+if (pasid < VTD_HPASID_MIN) {
+/*
+ * If pasid < VTD_HPASID_MIN, this pasid is not allocated
+ * from host. No need to pass down the changes on it to host.
+ * TODO: when IOVA over FLPT is ready, this switch should be
+ * refined.
+ */
+return 0;
+}
+
 vtd_dev_icx = vtd_bus->dev_icx[devfn];
 if (!vtd_dev_icx) {
 /* means no need to go further, e.g. for emulated devices */
-- 
2.7.4




[RFC v8 21/25] vfio: add support for flush iommu stage-1 cache

2020-07-12 Thread Liu Yi L
This patch adds the flush_stage1_cache() definition in
HostIOMMUContextClass, and adds the corresponding implementation in
VFIO. This exposes a way for the vIOMMU to flush the stage-1 cache on
the host side, since the guest owns the stage-1 translation structures
in a dual-stage DMA translation configuration.

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Eric Auger 
Cc: Yi Sun 
Cc: David Gibson 
Cc: Alex Williamson 
Acked-by: Peter Xu 
Signed-off-by: Liu Yi L 
---
 hw/iommu/host_iommu_context.c | 19 +++
 hw/vfio/common.c  | 24 
 include/hw/iommu/host_iommu_context.h |  8 
 3 files changed, 51 insertions(+)

diff --git a/hw/iommu/host_iommu_context.c b/hw/iommu/host_iommu_context.c
index 0e7e790..7c8be15 100644
--- a/hw/iommu/host_iommu_context.c
+++ b/hw/iommu/host_iommu_context.c
@@ -113,6 +113,25 @@ int host_iommu_ctx_unbind_stage1_pgtbl(HostIOMMUContext 
*iommu_ctx,
 return hicxc->unbind_stage1_pgtbl(iommu_ctx, unbind);
 }
 
+int host_iommu_ctx_flush_stage1_cache(HostIOMMUContext *iommu_ctx,
+ struct iommu_cache_invalidate_info *cache)
+{
+HostIOMMUContextClass *hicxc;
+
+hicxc = HOST_IOMMU_CONTEXT_GET_CLASS(iommu_ctx);
+
+if (!hicxc) {
+return -EINVAL;
+}
+
+if (!(iommu_ctx->flags & HOST_IOMMU_NESTING) ||
+!hicxc->flush_stage1_cache) {
+return -EINVAL;
+}
+
+return hicxc->flush_stage1_cache(iommu_ctx, cache);
+}
+
 void host_iommu_ctx_init(void *_iommu_ctx, size_t instance_size,
  const char *mrtypename,
  uint64_t flags,
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 8bfc9ce..bfe9917 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1276,6 +1276,29 @@ static int 
vfio_host_iommu_ctx_unbind_stage1_pgtbl(HostIOMMUContext *iommu_ctx,
 return ret;
 }
 
+static int vfio_host_iommu_ctx_flush_stage1_cache(HostIOMMUContext *iommu_ctx,
+struct iommu_cache_invalidate_info *cache)
+{
+VFIOContainer *container = container_of(iommu_ctx,
+VFIOContainer, iommu_ctx);
+struct vfio_iommu_type1_nesting_op *op;
+unsigned long argsz;
+int ret = 0;
+
+argsz = sizeof(*op) + sizeof(*cache);
+op = g_malloc0(argsz);
+op->argsz = argsz;
+op->flags = VFIO_IOMMU_NESTING_OP_CACHE_INVLD;
+memcpy(&op->data, cache, sizeof(*cache));
+
+if (ioctl(container->fd, VFIO_IOMMU_NESTING_OP, op)) {
+ret = -errno;
+error_report("%s: iommu cache flush failed: %m", __func__);
+}
+g_free(op);
+return ret;
+}
+
 /**
  * Get iommu info from host. Caller of this function should free
  * the memory pointed to by the returned pointer stored in @info
@@ -2018,6 +2041,7 @@ static void 
vfio_host_iommu_context_class_init(ObjectClass *klass,
 hicxc->pasid_free = vfio_host_iommu_ctx_pasid_free;
 hicxc->bind_stage1_pgtbl = vfio_host_iommu_ctx_bind_stage1_pgtbl;
 hicxc->unbind_stage1_pgtbl = vfio_host_iommu_ctx_unbind_stage1_pgtbl;
+hicxc->flush_stage1_cache = vfio_host_iommu_ctx_flush_stage1_cache;
 }
 
 static const TypeInfo vfio_host_iommu_context_info = {
diff --git a/include/hw/iommu/host_iommu_context.h 
b/include/hw/iommu/host_iommu_context.h
index 2883ed8..40e860a 100644
--- a/include/hw/iommu/host_iommu_context.h
+++ b/include/hw/iommu/host_iommu_context.h
@@ -64,6 +64,12 @@ typedef struct HostIOMMUContextClass {
 /* Undo a previous bind. @unbind specifies the unbind info. */
 int (*unbind_stage1_pgtbl)(HostIOMMUContext *iommu_ctx,
struct iommu_gpasid_bind_data *unbind);
+/*
+ * Propagate stage-1 cache flush to host IOMMU, cache
+ * info specified in @cache
+ */
+int (*flush_stage1_cache)(HostIOMMUContext *iommu_ctx,
+  struct iommu_cache_invalidate_info *cache);
 } HostIOMMUContextClass;
 
 /*
@@ -85,6 +91,8 @@ int host_iommu_ctx_bind_stage1_pgtbl(HostIOMMUContext 
*iommu_ctx,
  struct iommu_gpasid_bind_data *bind);
 int host_iommu_ctx_unbind_stage1_pgtbl(HostIOMMUContext *iommu_ctx,
  struct iommu_gpasid_bind_data *unbind);
+int host_iommu_ctx_flush_stage1_cache(HostIOMMUContext *iommu_ctx,
+   struct iommu_cache_invalidate_info *cache);
 
 void host_iommu_ctx_init(void *_iommu_ctx, size_t instance_size,
  const char *mrtypename,
-- 
2.7.4




[RFC v8 18/25] intel_iommu: bind/unbind guest page table to host

2020-07-12 Thread Liu Yi L
This patch captures guest PASID table entry modifications and
propagates the changes to the host to set up dual-stage DMA
translation. The guest page table is configured as the 1st level page
table (GVA->GPA), whose translation result further goes through the
host VT-d 2nd level page table (GPA->HPA) under nested translation
mode. This is the key part of vSVA support, and also a key to
supporting IOVA over the 1st level page table for Intel VT-d in a
virtualization environment.

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Yi Sun 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Signed-off-by: Liu Yi L 
---
 hw/i386/intel_iommu.c  | 101 +++--
 hw/i386/intel_iommu_internal.h |  18 
 2 files changed, 114 insertions(+), 5 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index c3e8b20..1b7272c 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -41,6 +41,7 @@
 #include "migration/vmstate.h"
 #include "trace.h"
 #include "qemu/jhash.h"
+#include 
 
 /* context entry operations */
 #define VTD_CE_GET_RID2PASID(ce) \
@@ -700,6 +701,16 @@ static inline uint32_t 
vtd_sm_ce_get_pdt_entry_num(VTDContextEntry *ce)
 return 1U << (VTD_SM_CONTEXT_ENTRY_PDTS(ce->val[0]) + 7);
 }
 
+static inline uint32_t vtd_pe_get_fl_aw(VTDPASIDEntry *pe)
+{
+return 48 + ((pe->val[2] >> 2) & VTD_SM_PASID_ENTRY_FLPM) * 9;
+}
+
+static inline dma_addr_t vtd_pe_get_flpt_base(VTDPASIDEntry *pe)
+{
+return pe->val[2] & VTD_SM_PASID_ENTRY_FLPTPTR;
+}
+
 static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire)
 {
 return pdire->val & 1;
@@ -1861,6 +1872,85 @@ static void 
vtd_context_global_invalidate(IntelIOMMUState *s)
 vtd_iommu_replay_all(s);
 }
 
+/**
+ * Caller should hold iommu_lock.
+ */
+static int vtd_bind_guest_pasid(IntelIOMMUState *s, VTDBus *vtd_bus,
+int devfn, int pasid, VTDPASIDEntry *pe,
+VTDPASIDOp op)
+{
+VTDHostIOMMUContext *vtd_dev_icx;
+HostIOMMUContext *iommu_ctx;
+int ret = -1;
+
+vtd_dev_icx = vtd_bus->dev_icx[devfn];
+if (!vtd_dev_icx) {
+/* means no need to go further, e.g. for emulated devices */
+return 0;
+}
+
+iommu_ctx = vtd_dev_icx->iommu_ctx;
+if (!iommu_ctx) {
+return -EINVAL;
+}
+
+switch (op) {
+case VTD_PASID_BIND:
+{
+struct iommu_gpasid_bind_data *g_bind_data;
+
+g_bind_data = g_malloc0(sizeof(*g_bind_data));
+
+g_bind_data->argsz = sizeof(*g_bind_data);
+g_bind_data->version = IOMMU_GPASID_BIND_VERSION_1;
+g_bind_data->format = IOMMU_PASID_FORMAT_INTEL_VTD;
+g_bind_data->gpgd = vtd_pe_get_flpt_base(pe);
+g_bind_data->addr_width = vtd_pe_get_fl_aw(pe);
+g_bind_data->hpasid = pasid;
+g_bind_data->gpasid = pasid;
+g_bind_data->flags |= IOMMU_SVA_GPASID_VAL;
+g_bind_data->vendor.vtd.flags =
+ (VTD_SM_PASID_ENTRY_SRE_BIT(pe->val[2]) ?
+IOMMU_SVA_VTD_GPASID_SRE : 0)
+   | (VTD_SM_PASID_ENTRY_EAFE_BIT(pe->val[2]) ?
+IOMMU_SVA_VTD_GPASID_EAFE : 0)
+   | (VTD_SM_PASID_ENTRY_PCD_BIT(pe->val[1]) ?
+IOMMU_SVA_VTD_GPASID_PCD : 0)
+   | (VTD_SM_PASID_ENTRY_PWT_BIT(pe->val[1]) ?
+IOMMU_SVA_VTD_GPASID_PWT : 0)
+   | (VTD_SM_PASID_ENTRY_EMTE_BIT(pe->val[1]) ?
+IOMMU_SVA_VTD_GPASID_EMTE : 0)
+   | (VTD_SM_PASID_ENTRY_CD_BIT(pe->val[1]) ?
+IOMMU_SVA_VTD_GPASID_CD : 0);
+g_bind_data->vendor.vtd.pat = VTD_SM_PASID_ENTRY_PAT(pe->val[1]);
+g_bind_data->vendor.vtd.emt = VTD_SM_PASID_ENTRY_EMT(pe->val[1]);
+ret = host_iommu_ctx_bind_stage1_pgtbl(iommu_ctx, g_bind_data);
+g_free(g_bind_data);
+break;
+}
+case VTD_PASID_UNBIND:
+{
+struct iommu_gpasid_bind_data *g_unbind_data;
+
+g_unbind_data = g_malloc0(sizeof(*g_unbind_data));
+
+g_unbind_data->argsz = sizeof(*g_unbind_data);
+g_unbind_data->version = IOMMU_GPASID_BIND_VERSION_1;
+g_unbind_data->format = IOMMU_PASID_FORMAT_INTEL_VTD;
+g_unbind_data->hpasid = pasid;
+ret = host_iommu_ctx_unbind_stage1_pgtbl(iommu_ctx, g_unbind_data);
+g_free(g_unbind_data);
+break;
+}
+default:
+error_report_once("Unknown VTDPASIDOp!!!\n");
+break;
+}
+
+
+return ret;
+}
+
 /* Do a context-cache device-selective invalidation.
  * @func_mask: FM field after shifting
  */
@@ -2489,10 +2579,10 @@ static void vtd_fill_pe_in_cache(IntelIOMMUState *s,
 }
 
 pc_entry->pasid_entry = *pe;
-/*
- * TODO:
- * - 

[RFC v8 19/25] intel_iommu: replay pasid binds after context cache invalidation

2020-07-12 Thread Liu Yi L
This patch replays guest pasid bindings after a context cache
invalidation. This is done to ensure safety. Strictly speaking, the
programmer should issue a pasid cache invalidation with the proper
granularity after issuing a context cache invalidation.

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Yi Sun 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Signed-off-by: Liu Yi L 
---
 hw/i386/intel_iommu.c  | 50 ++
 hw/i386/intel_iommu_internal.h |  1 +
 hw/i386/trace-events   |  1 +
 3 files changed, 52 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 1b7272c..de2ba0e 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -68,6 +68,10 @@ static void vtd_address_space_refresh_all(IntelIOMMUState 
*s);
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
 
 static void vtd_pasid_cache_reset(IntelIOMMUState *s);
+static void vtd_pasid_cache_sync(IntelIOMMUState *s,
+ VTDPASIDCacheInfo *pc_info);
+static void vtd_pasid_cache_devsi(IntelIOMMUState *s,
+  VTDBus *vtd_bus, uint16_t devfn);
 
 static void vtd_panic_require_caching_mode(void)
 {
@@ -1853,7 +1857,10 @@ static void vtd_iommu_replay_all(IntelIOMMUState *s)
 
 static void vtd_context_global_invalidate(IntelIOMMUState *s)
 {
+VTDPASIDCacheInfo pc_info;
+
 trace_vtd_inv_desc_cc_global();
+
 /* Protects context cache */
 vtd_iommu_lock(s);
 s->context_cache_gen++;
@@ -1870,6 +1877,9 @@ static void vtd_context_global_invalidate(IntelIOMMUState 
*s)
  * VT-d emulation codes.
  */
 vtd_iommu_replay_all(s);
+
+pc_info.type = VTD_PASID_CACHE_GLOBAL_INV;
+vtd_pasid_cache_sync(s, &pc_info);
 }
 
 /**
@@ -2008,6 +2018,21 @@ static void 
vtd_context_device_invalidate(IntelIOMMUState *s,
  * happened.
  */
 vtd_sync_shadow_page_table(vtd_as);
+/*
+ * Per spec, a context flush should also be followed by a PASID
+ * cache and iotlb flush. Regarding a device selective
+ * context cache invalidation:
+ * if (emulated_device)
+ *invalidate pasid cache and pasid-based iotlb
+ * else if (assigned_device)
+ *check if the device has been bound to any pasid
+ *invoke pasid_unbind for each bound pasid
+ * Here, we have vtd_pasid_cache_devsi() to invalidate pasid
+ * caches, while for piotlb in QEMU, we don't have it yet, so
+ * no handling. For an assigned device, the host iommu driver
+ * flushes the piotlb when a pasid unbind is passed down to it.
+ */
+ vtd_pasid_cache_devsi(s, vtd_bus, devfn_it);
 }
 }
 }
@@ -2622,6 +2647,12 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer 
value,
 /* Fall through */
 case VTD_PASID_CACHE_GLOBAL_INV:
 break;
+case VTD_PASID_CACHE_DEVSI:
+if (pc_info->vtd_bus != vtd_bus ||
+pc_info->devfn != devfn) {
+return false;
+}
+break;
 default:
 error_report("invalid pc_info->type");
 abort();
@@ -2821,6 +2852,11 @@ static void 
vtd_replay_guest_pasid_bindings(IntelIOMMUState *s,
 case VTD_PASID_CACHE_GLOBAL_INV:
 /* loop all assigned devices */
 break;
+case VTD_PASID_CACHE_DEVSI:
+walk_info.vtd_bus = pc_info->vtd_bus;
+walk_info.devfn = pc_info->devfn;
+vtd_replay_pasid_bind_for_dev(s, start, end, &walk_info);
+return;
 case VTD_PASID_CACHE_FORCE_RESET:
 /* For force reset, no need to go further replay */
 return;
@@ -2906,6 +2942,20 @@ static void vtd_pasid_cache_sync(IntelIOMMUState *s,
 vtd_iommu_unlock(s);
 }
 
+static void vtd_pasid_cache_devsi(IntelIOMMUState *s,
+  VTDBus *vtd_bus, uint16_t devfn)
+{
+VTDPASIDCacheInfo pc_info;
+
+trace_vtd_pasid_cache_devsi(devfn);
+
+pc_info.type = VTD_PASID_CACHE_DEVSI;
+pc_info.vtd_bus = vtd_bus;
+pc_info.devfn = devfn;
+
+vtd_pasid_cache_sync(s, &pc_info);
+}
+
 /**
  * Caller of this function should hold iommu_lock
  */
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 51691d0..9805b84 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -548,6 +548,7 @@ typedef enum VTDPCInvType {
 VTD_PASID_CACHE_FORCE_RESET = 0,
 /* pasid cache invalidation rely on guest PASID entry */
 VTD_PASID_CACHE_GLOBAL_INV,
+VTD_PASID_CACHE_DEVSI,
 VTD_PASID_CACHE_DOMSI,
 VTD_PASID_CACHE_PASIDSI,
 } VTDPCInvType;
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 60d20c1..3853fa8 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -26,6 +26,7 @@ 

[RFC v8 17/25] intel_iommu: sync IOMMU nesting cap info for assigned devices

2020-07-12 Thread Liu Yi L
For assigned devices, an Intel vIOMMU which wants to build DMA
protection based on physical IOMMU nesting paging should check the
IOMMU nesting support on the host side. The host will return IOMMU
nesting cap info to user-space (e.g. VFIO returns IOMMU nesting cap
info for a nesting type IOMMU). The vIOMMU needs to check:
a) IOMMU model
b) 1st-level page table support
c) address width
d) pasid support

This patch syncs the IOMMU nesting cap info when a PCIe device (VFIO
case) sets its HostIOMMUContext to the vIOMMU. If the host IOMMU
nesting support is not compatible, the vIOMMU should return failure
to the PCIe device.

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Yi Sun 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Signed-off-by: Liu Yi L 
---
 hw/i386/intel_iommu.c  | 107 +
 hw/i386/intel_iommu_internal.h |  18 +++
 include/hw/i386/intel_iommu.h  |   4 ++
 3 files changed, 129 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index c93c360..c3e8b20 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4104,6 +4104,84 @@ static int vtd_dev_get_iommu_attr(PCIBus *bus, void 
*opaque, int32_t devfn,
 return ret;
 }
 
+
+static bool vtd_check_nesting_info(IntelIOMMUState *s,
+   struct iommu_nesting_info *info,
+   struct iommu_nesting_info_vtd *vtd)
+{
+return !((s->aw_bits != info->addr_width) ||
+ ((s->host_cap & VTD_CAP_MASK) !=
+  (vtd->cap_reg & VTD_CAP_MASK)) ||
+ ((s->host_ecap & VTD_ECAP_MASK) !=
+  (vtd->ecap_reg & VTD_ECAP_MASK)) ||
+ (VTD_GET_PSS(s->host_ecap) != (info->pasid_bits - 1)));
+}
+
+/* Caller should hold iommu lock. */
+static bool vtd_sync_nesting_info(IntelIOMMUState *s,
+  struct iommu_nesting_info *info)
+{
+struct iommu_nesting_info_vtd *vtd;
+uint64_t cap, ecap;
+
+vtd = (struct iommu_nesting_info_vtd *) &info->data;
+
+if (s->cap_finalized) {
+return vtd_check_nesting_info(s, info, vtd);
+}
+
+if (s->aw_bits > info->addr_width) {
+error_report("User aw-bits: %u > host address width: %u",
+  s->aw_bits, info->addr_width);
+return false;
+}
+
+cap = s->host_cap & vtd->cap_reg & VTD_CAP_MASK;
+s->host_cap &= ~VTD_CAP_MASK;
+s->host_cap |= cap;
+
+ecap = s->host_ecap & vtd->ecap_reg & VTD_ECAP_MASK;
+s->host_ecap &= ~VTD_ECAP_MASK;
+s->host_ecap |= ecap;
+
+if ((VTD_ECAP_PASID & s->host_ecap) && info->pasid_bits &&
+(VTD_GET_PSS(s->host_ecap) > (info->pasid_bits - 1))) {
+s->host_ecap &= ~VTD_ECAP_PSS_MASK;
+s->host_ecap |= VTD_ECAP_PSS(info->pasid_bits - 1);
+}
+return true;
+}
+
+/*
+ * A virtual VT-d which wants nesting needs to check the host IOMMU
+ * nesting cap info behind the assigned devices, so that the vIOMMU
+ * can bind the guest page table to the host.
+ */
+static bool vtd_check_iommu_ctx(IntelIOMMUState *s,
+HostIOMMUContext *iommu_ctx)
+{
+struct iommu_nesting_info *info = iommu_ctx->info;
+uint32_t minsz, size;
+
+if (IOMMU_PASID_FORMAT_INTEL_VTD != info->format) {
+error_report("Format is not compatible for nesting!!!");
+return false;
+}
+
+size = sizeof(struct iommu_nesting_info_vtd);
+minsz = endof(struct iommu_nesting_info, flags);
+if (size > (info->size - minsz)) {
+/*
+ * QEMU may be using a newer linux-headers/iommu.h than the
+ * kernel supports, hence fail it.
+ */
+error_report("IOMMU nesting cap is not compatible!!!");
+return false;
+}
+
+return vtd_sync_nesting_info(s, info);
+}
+
 static int vtd_dev_set_iommu_context(PCIBus *bus, void *opaque,
  int devfn,
  HostIOMMUContext *iommu_ctx)
@@ -4118,6 +4196,11 @@ static int vtd_dev_set_iommu_context(PCIBus *bus, void 
*opaque,
 
 vtd_iommu_lock(s);
 
+if (!vtd_check_iommu_ctx(s, iommu_ctx)) {
+vtd_iommu_unlock(s);
+return -ENOENT;
+}
+
 vtd_dev_icx = vtd_bus->dev_icx[devfn];
 
 assert(!vtd_dev_icx);
@@ -4373,6 +4456,14 @@ static void vtd_init(IntelIOMMUState *s)
 s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
 }
 
+if (!s->cap_finalized) {
+s->host_cap = s->cap;
+s->host_ecap = s->ecap;
+} else {
+s->cap = s->host_cap;
+s->ecap = s->host_ecap;
+}
+
 vtd_reset_caches(s);
 
 /* Define registers with default values and bit semantics */
@@ -4506,6 +4597,12 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error 
**errp)
 return true;
 }
 
+static void vtd_refresh_capability_reg(IntelIOMMUState *s)
+{
+vtd_set_quad(s, DMAR_CAP_REG, s->cap);
+vtd_set_quad(s, DMAR_ECAP_REG, s->ecap);
+}
+
 static int vtd_machine_done_notify_one(Object *child, void 

[RFC v8 13/25] intel_iommu: add virtual command capability support

2020-07-12 Thread Liu Yi L
This patch adds virtual command support to the Intel vIOMMU per the
Intel VT-d 3.1 spec, and adds two virtual commands: allocate pasid
and free pasid.
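
For illustration, the guest-side flow this interface expects is
sketched below. vtd_read_q()/vtd_write_q() and the VCRSP_* decode
macros are hypothetical placeholders (the emulation in this patch only
defines the encode direction), so this is a sketch of the register
protocol rather than actual guest code:

/*
 * Sketch of guest-side PASID allocation through the virtual command
 * registers: write the command, wait for the IP bit to clear, then
 * read either a status code or the allocated PASID from VCRSP.
 */
static int guest_vcmd_alloc_pasid(uint32_t *pasid)
{
    uint64_t rsp;

    vtd_write_q(DMAR_VCMD_REG, VTD_VCMD_ALLOC_PASID);  /* issue command */

    do {
        rsp = vtd_read_q(DMAR_VCRSP_REG);
    } while (rsp & 0x1);                               /* IP: in progress */

    if (VCRSP_STATUS_CODE(rsp)) {                      /* non-zero = error */
        return -1;
    }
    *pasid = VCRSP_RESULT(rsp);                        /* allocated PASID */
    return 0;
}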

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Yi Sun 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Reviewed-by: Peter Xu 
Signed-off-by: Liu Yi L 
Signed-off-by: Yi Sun 
---
 hw/i386/intel_iommu.c  | 154 -
 hw/i386/intel_iommu_internal.h |  37 ++
 hw/i386/trace-events   |   1 +
 include/hw/i386/intel_iommu.h  |  10 ++-
 4 files changed, 200 insertions(+), 2 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 8f7c957..46036d4 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2656,6 +2656,129 @@ static void vtd_handle_iectl_write(IntelIOMMUState *s)
 }
 }
 
+static int vtd_request_pasid_alloc(IntelIOMMUState *s, uint32_t *pasid)
+{
+VTDHostIOMMUContext *vtd_dev_icx;
+int ret = -1;
+
+vtd_iommu_lock(s);
+QLIST_FOREACH(vtd_dev_icx, &s->vtd_dev_icx_list, next) {
+HostIOMMUContext *iommu_ctx = vtd_dev_icx->iommu_ctx;
+
+/*
+ * We'll return the first valid result we got. It's
+ * a bit hackish in that we don't have a good global
+ * interface yet to talk to modules like vfio to deliver
+ * this allocation request, so we're leveraging this
+ * per-device iommu context to do the same thing just
+ * to make sure the allocation happens only once.
+ */
+ret = host_iommu_ctx_pasid_alloc(iommu_ctx, VTD_HPASID_MIN,
+ VTD_HPASID_MAX, pasid);
+if (!ret) {
+break;
+}
+}
+vtd_iommu_unlock(s);
+
+return ret;
+}
+
+static int vtd_request_pasid_free(IntelIOMMUState *s, uint32_t pasid)
+{
+VTDHostIOMMUContext *vtd_dev_icx;
+int ret = -1;
+
+vtd_iommu_lock(s);
+QLIST_FOREACH(vtd_dev_icx, &s->vtd_dev_icx_list, next) {
+HostIOMMUContext *iommu_ctx = vtd_dev_icx->iommu_ctx;
+
+/*
+ * Similar with pasid allocation. We'll free the pasid
+ * on the first successful free operation. It's a bit
+ * hackish in that we don't have a good global interface
+ * yet to talk to modules like vfio to deliver this pasid
+ * free request, so we're leveraging this per-device iommu
+ * context to do the same thing just to make sure the free
+ * happens only once.
+ */
+ret = host_iommu_ctx_pasid_free(iommu_ctx, pasid);
+if (!ret) {
+break;
+}
+}
+vtd_iommu_unlock(s);
+
+return ret;
+}
+
+/*
+ * If IP is not set, set it then return.
+ * If IP is already set, return.
+ */
+static void vtd_vcmd_set_ip(IntelIOMMUState *s)
+{
+s->vcrsp = 1;
+vtd_set_quad_raw(s, DMAR_VCRSP_REG,
+ ((uint64_t) s->vcrsp));
+}
+
+static void vtd_vcmd_clear_ip(IntelIOMMUState *s)
+{
+s->vcrsp &= (~((uint64_t)(0x1)));
+vtd_set_quad_raw(s, DMAR_VCRSP_REG,
+ ((uint64_t) s->vcrsp));
+}
+
+/* Handle write to Virtual Command Register */
+static int vtd_handle_vcmd_write(IntelIOMMUState *s, uint64_t val)
+{
+uint32_t pasid;
+int ret = -1;
+
+trace_vtd_reg_write_vcmd(s->vcrsp, val);
+
+if (!(s->vccap & VTD_VCCAP_PAS) ||
+ (s->vcrsp & 1)) {
+return -1;
+}
+
+/*
+ * Since the vCPU should be blocked when the guest VCMD
+ * write was trapped to here, there should be no other vCPUs
+ * trying to access VCMD if the guest software is well written.
+ * However, we still emulate the IP bit here in case of
+ * bad guest software. Also align with the spec.
+ */
+vtd_vcmd_set_ip(s);
+
+switch (val & VTD_VCMD_CMD_MASK) {
+case VTD_VCMD_ALLOC_PASID:
+ret = vtd_request_pasid_alloc(s, &pasid);
+if (ret) {
+s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_NO_AVAILABLE_PASID);
+} else {
+s->vcrsp |= VTD_VCRSP_RSLT(pasid);
+}
+break;
+
+case VTD_VCMD_FREE_PASID:
+pasid = VTD_VCMD_PASID_VALUE(val);
+ret = vtd_request_pasid_free(s, pasid);
+if (ret < 0) {
+s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_FREE_INVALID_PASID);
+}
+break;
+
+default:
+s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_UNDEFINED_CMD);
+error_report_once("Virtual Command: unsupported command!!!");
+break;
+}
+vtd_vcmd_clear_ip(s);
+return 0;
+}
+
 static uint64_t vtd_mem_read(void *opaque, hwaddr addr, unsigned size)
 {
 IntelIOMMUState *s = opaque;
@@ -2944,6 +3067,23 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
 vtd_set_long(s, addr, val);
 break;
 
+case DMAR_VCMD_REG:
+if (!vtd_handle_vcmd_write(s, val)) {
+if (size == 4) {
+vtd_set_long(s, addr, val);
+} else {
+vtd_set_quad(s, addr, val);
+}
+}
+

[RFC v8 16/25] vfio: add bind stage-1 page table support

2020-07-12 Thread Liu Yi L
This patch adds the bind_stage1_pgtbl() definition in
HostIOMMUContextClass, and also adds the corresponding implementation
in VFIO. This is to expose a way for the vIOMMU to set up dual-stage
DMA translation for passthru devices on hardware.

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Eric Auger 
Cc: Yi Sun 
Cc: David Gibson 
Cc: Alex Williamson 
Signed-off-by: Liu Yi L 
---
 hw/iommu/host_iommu_context.c | 57 +-
 hw/vfio/common.c  | 58 ++-
 include/hw/iommu/host_iommu_context.h | 19 +++-
 3 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/hw/iommu/host_iommu_context.c b/hw/iommu/host_iommu_context.c
index 5fb2223..0e7e790 100644
--- a/hw/iommu/host_iommu_context.c
+++ b/hw/iommu/host_iommu_context.c
@@ -69,23 +69,78 @@ int host_iommu_ctx_pasid_free(HostIOMMUContext *iommu_ctx, 
uint32_t pasid)
 return hicxc->pasid_free(iommu_ctx, pasid);
 }
 
+int host_iommu_ctx_bind_stage1_pgtbl(HostIOMMUContext *iommu_ctx,
+ struct iommu_gpasid_bind_data *bind)
+{
+HostIOMMUContextClass *hicxc;
+
+if (!iommu_ctx) {
+return -EINVAL;
+}
+
+hicxc = HOST_IOMMU_CONTEXT_GET_CLASS(iommu_ctx);
+if (!hicxc) {
+return -EINVAL;
+}
+
+if (!(iommu_ctx->flags & HOST_IOMMU_NESTING) ||
+!hicxc->bind_stage1_pgtbl) {
+return -EINVAL;
+}
+
+return hicxc->bind_stage1_pgtbl(iommu_ctx, bind);
+}
+
+int host_iommu_ctx_unbind_stage1_pgtbl(HostIOMMUContext *iommu_ctx,
+ struct iommu_gpasid_bind_data *unbind)
+{
+HostIOMMUContextClass *hicxc;
+
+if (!iommu_ctx) {
+return -EINVAL;
+}
+
+hicxc = HOST_IOMMU_CONTEXT_GET_CLASS(iommu_ctx);
+if (!hicxc) {
+return -EINVAL;
+}
+
+if (!(iommu_ctx->flags & HOST_IOMMU_NESTING) ||
+!hicxc->unbind_stage1_pgtbl) {
+return -EINVAL;
+}
+
+return hicxc->unbind_stage1_pgtbl(iommu_ctx, unbind);
+}
+
 void host_iommu_ctx_init(void *_iommu_ctx, size_t instance_size,
  const char *mrtypename,
- uint64_t flags)
+ uint64_t flags,
+ struct iommu_nesting_info *info)
 {
 HostIOMMUContext *iommu_ctx;
 
 object_initialize(_iommu_ctx, instance_size, mrtypename);
 iommu_ctx = HOST_IOMMU_CONTEXT(_iommu_ctx);
 iommu_ctx->flags = flags;
+iommu_ctx->info = g_malloc0(info->size);
+memcpy(iommu_ctx->info, info, info->size);
 iommu_ctx->initialized = true;
 }
 
+static void host_iommu_ctx_finalize_fn(Object *obj)
+{
+HostIOMMUContext *iommu_ctx = HOST_IOMMU_CONTEXT(obj);
+
+g_free(iommu_ctx->info);
+}
+
 static const TypeInfo host_iommu_context_info = {
 .parent = TYPE_OBJECT,
 .name   = TYPE_HOST_IOMMU_CONTEXT,
 .class_size = sizeof(HostIOMMUContextClass),
 .instance_size  = sizeof(HostIOMMUContext),
+.instance_finalize  = host_iommu_ctx_finalize_fn,
 .abstract   = true,
 };
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index cdd16a1..8bfc9ce 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1228,6 +1228,54 @@ static int 
vfio_host_iommu_ctx_pasid_free(HostIOMMUContext *iommu_ctx,
 return ret;
 }
 
+static int vfio_host_iommu_ctx_bind_stage1_pgtbl(HostIOMMUContext *iommu_ctx,
+ struct iommu_gpasid_bind_data *bind)
+{
+VFIOContainer *container = container_of(iommu_ctx,
+VFIOContainer, iommu_ctx);
+struct vfio_iommu_type1_nesting_op *op;
+unsigned long argsz;
+int ret = 0;
+
+argsz = sizeof(*op) + sizeof(*bind);
+op = g_malloc0(argsz);
+op->argsz = argsz;
+op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;
+memcpy(&op->data, bind, sizeof(*bind));
+
+if (ioctl(container->fd, VFIO_IOMMU_NESTING_OP, op)) {
+ret = -errno;
+error_report("%s: pasid (%llu) bind failed: %m",
+  __func__, bind->hpasid);
+}
+g_free(op);
+return ret;
+}
+
+static int vfio_host_iommu_ctx_unbind_stage1_pgtbl(HostIOMMUContext *iommu_ctx,
+ struct iommu_gpasid_bind_data *unbind)
+{
+VFIOContainer *container = container_of(iommu_ctx,
+VFIOContainer, iommu_ctx);
+struct vfio_iommu_type1_nesting_op *op;
+unsigned long argsz;
+int ret = 0;
+
+argsz = sizeof(*op) + sizeof(*unbind);
+op = g_malloc0(argsz);
+op->argsz = argsz;
+op->flags = VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL;
+memcpy(&op->data, unbind, sizeof(*unbind));
+
+if (ioctl(container->fd, VFIO_IOMMU_NESTING_OP, op)) {
+ret = -errno;
+error_report("%s: pasid (%llu) unbind failed: %m",
+  __func__, unbind->hpasid);
+}
+g_free(op);
+return ret;
+}
+
 /**
  * Get iommu info 

[RFC v8 11/25] vfio/common: provide PASID alloc/free hooks

2020-07-12 Thread Liu Yi L
This patch defines vfio_host_iommu_context_info and implements the
PASID alloc/free hooks defined in HostIOMMUContextClass.

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Eric Auger 
Cc: Yi Sun 
Cc: David Gibson 
Cc: Alex Williamson 
Signed-off-by: Liu Yi L 
---
 hw/vfio/common.c  | 66 +++
 include/hw/iommu/host_iommu_context.h |  3 ++
 include/hw/vfio/vfio-common.h |  4 +++
 3 files changed, 73 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index b85fbcf..7b92a58 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1184,6 +1184,50 @@ static int vfio_get_iommu_type(VFIOContainer *container,
 return ret;
 }
 
+static int vfio_host_iommu_ctx_pasid_alloc(HostIOMMUContext *iommu_ctx,
+   uint32_t min, uint32_t max,
+   uint32_t *pasid)
+{
+VFIOContainer *container = container_of(iommu_ctx,
+VFIOContainer, iommu_ctx);
+struct vfio_iommu_type1_pasid_request req;
+int ret = 0;
+
+req.argsz = sizeof(req);
+req.flags = VFIO_IOMMU_FLAG_ALLOC_PASID;
+req.range.min = min;
+req.range.max = max;
+
+ret = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
+if (ret < 0) {
+error_report("%s: alloc failed (%m)", __func__);
+return ret;
+}
+*pasid = ret;
+return 0;
+}
+
+static int vfio_host_iommu_ctx_pasid_free(HostIOMMUContext *iommu_ctx,
+  uint32_t pasid)
+{
+VFIOContainer *container = container_of(iommu_ctx,
+VFIOContainer, iommu_ctx);
+struct vfio_iommu_type1_pasid_request req;
+
+int ret = 0;
+
+req.argsz = sizeof(req);
+req.flags = VFIO_IOMMU_FLAG_FREE_PASID;
+req.range.min = pasid;
+req.range.max = pasid + 1;
+
+ret = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req);
+if (ret) {
+error_report("%s: free failed (%m)", __func__);
+}
+return ret;
+}
+
 static int vfio_init_container(VFIOContainer *container, int group_fd,
bool want_nested, Error **errp)
 {
@@ -1797,3 +1841,25 @@ int vfio_eeh_as_op(AddressSpace *as, uint32_t op)
 }
 return vfio_eeh_container_op(container, op);
 }
+
+static void vfio_host_iommu_context_class_init(ObjectClass *klass,
+   void *data)
+{
+HostIOMMUContextClass *hicxc = HOST_IOMMU_CONTEXT_CLASS(klass);
+
+hicxc->pasid_alloc = vfio_host_iommu_ctx_pasid_alloc;
+hicxc->pasid_free = vfio_host_iommu_ctx_pasid_free;
+}
+
+static const TypeInfo vfio_host_iommu_context_info = {
+.parent = TYPE_HOST_IOMMU_CONTEXT,
+.name = TYPE_VFIO_HOST_IOMMU_CONTEXT,
+.class_init = vfio_host_iommu_context_class_init,
+};
+
+static void vfio_register_types(void)
+{
+type_register_static(&vfio_host_iommu_context_info);
+}
+
+type_init(vfio_register_types)
diff --git a/include/hw/iommu/host_iommu_context.h 
b/include/hw/iommu/host_iommu_context.h
index 35c4861..227c433 100644
--- a/include/hw/iommu/host_iommu_context.h
+++ b/include/hw/iommu/host_iommu_context.h
@@ -33,6 +33,9 @@
 #define TYPE_HOST_IOMMU_CONTEXT "qemu:host-iommu-context"
 #define HOST_IOMMU_CONTEXT(obj) \
 OBJECT_CHECK(HostIOMMUContext, (obj), TYPE_HOST_IOMMU_CONTEXT)
+#define HOST_IOMMU_CONTEXT_CLASS(klass) \
+OBJECT_CLASS_CHECK(HostIOMMUContextClass, (klass), \
+ TYPE_HOST_IOMMU_CONTEXT)
 #define HOST_IOMMU_CONTEXT_GET_CLASS(obj) \
 OBJECT_GET_CLASS(HostIOMMUContextClass, (obj), \
  TYPE_HOST_IOMMU_CONTEXT)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index a77d0ed..f8694d6 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -26,12 +26,15 @@
 #include "qemu/notify.h"
 #include "ui/console.h"
 #include "hw/display/ramfb.h"
+#include "hw/iommu/host_iommu_context.h"
 #ifdef CONFIG_LINUX
 #include 
 #endif
 
 #define VFIO_MSG_PREFIX "vfio %s: "
 
+#define TYPE_VFIO_HOST_IOMMU_CONTEXT "qemu:vfio-host-iommu-context"
+
 enum {
 VFIO_DEVICE_TYPE_PCI = 0,
 VFIO_DEVICE_TYPE_PLATFORM = 1,
@@ -71,6 +74,7 @@ typedef struct VFIOContainer {
 MemoryListener listener;
 MemoryListener prereg_listener;
 unsigned iommu_type;
+HostIOMMUContext iommu_ctx;
 Error *error;
 bool initialized;
 unsigned long pgsizes;
-- 
2.7.4




[RFC v8 09/25] hw/pci: introduce pci_device_set/unset_iommu_context()

2020-07-12 Thread Liu Yi L
For nesting IOMMU translation capable platforms, vIOMMUs running on
such systems could be implemented upon physical IOMMU nested paging
(VFIO case). A vIOMMU advertises such an implementation via the
"want_nested" attribute to PCIe devices (e.g. VFIO PCI). Once
"want_nested" is satisfied, the device (VFIO case) should set a
HostIOMMUContext on the vIOMMU, so that the vIOMMU can manage stage-1
translation. DMAs out of such devices are then protected through the
stage-1 page tables owned by the guest together with the stage-2 page
tables owned by the host.

This patch adds pci_device_set/unset_iommu_context() to set/unset the
HostIOMMUContext for a given PCIe device (VFIO case). The caller of
set should fail if the set operation fails.
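
A minimal usage sketch from the device side is given below. Only the
two pci_device_*_iommu_context() helpers are introduced by this patch;
the surrounding function names and the container->iommu_ctx field are
assumptions borrowed from later patches in this series:

/*
 * Sketch: a pass-through device hands its container's HostIOMMUContext
 * to the vIOMMU at realize time and withdraws it on teardown.
 */
static int vfio_pci_wire_iommu_ctx(PCIDevice *pdev, VFIOContainer *container,
                                   Error **errp)
{
    int ret = pci_device_set_iommu_context(pdev, &container->iommu_ctx);

    if (ret) {
        error_setg(errp, "vIOMMU rejected the host IOMMU context");
    }
    return ret;
}

static void vfio_pci_unwire_iommu_ctx(PCIDevice *pdev)
{
    pci_device_unset_iommu_context(pdev);
}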

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Eric Auger 
Cc: Yi Sun 
Cc: David Gibson 
Cc: Michael S. Tsirkin 
Reviewed-by: Peter Xu 
Signed-off-by: Liu Yi L 
---
rfcv5 (v2) -> rfcv6:
*) pci_device_set_iommu_context() returns 0 if callback is not implemented.
---
 hw/pci/pci.c | 28 
 include/hw/pci/pci.h | 10 ++
 2 files changed, 38 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 3c27805..59864c6 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2743,6 +2743,34 @@ int pci_device_get_iommu_attr(PCIDevice *dev, IOMMUAttr 
attr, void *data)
 return -ENOENT;
 }
 
+int pci_device_set_iommu_context(PCIDevice *dev,
+ HostIOMMUContext *iommu_ctx)
+{
+PCIBus *bus;
+uint8_t devfn;
+
+pci_device_get_iommu_bus_devfn(dev, &bus, &devfn);
+if (bus && bus->iommu_ops &&
+bus->iommu_ops->set_iommu_context) {
+return bus->iommu_ops->set_iommu_context(bus,
+  bus->iommu_opaque, devfn, iommu_ctx);
+}
+return 0;
+}
+
+void pci_device_unset_iommu_context(PCIDevice *dev)
+{
+PCIBus *bus;
+uint8_t devfn;
+
+pci_device_get_iommu_bus_devfn(dev, &bus, &devfn);
+if (bus && bus->iommu_ops &&
+bus->iommu_ops->unset_iommu_context) {
+bus->iommu_ops->unset_iommu_context(bus,
+ bus->iommu_opaque, devfn);
+}
+}
+
 void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque)
 {
 bus->iommu_ops = ops;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index f74161b..0647d64 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -9,6 +9,8 @@
 
 #include "hw/pci/pcie.h"
 
+#include "hw/iommu/host_iommu_context.h"
+
 extern bool pci_available;
 
 /* PCI bus */
@@ -495,10 +497,18 @@ struct PCIIOMMUOps {
 void *opaque, int32_t devfn);
 int (*get_iommu_attr)(PCIBus *bus, void *opaque, int32_t devfn,
IOMMUAttr attr, void *data);
+int (*set_iommu_context)(PCIBus *bus, void *opaque,
+ int32_t devfn,
+ HostIOMMUContext *iommu_ctx);
+void (*unset_iommu_context)(PCIBus *bus, void *opaque,
+int32_t devfn);
 };
 
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
 int pci_device_get_iommu_attr(PCIDevice *dev, IOMMUAttr attr, void *data);
+int pci_device_set_iommu_context(PCIDevice *dev,
+ HostIOMMUContext *iommu_ctx);
+void pci_device_unset_iommu_context(PCIDevice *dev);
 void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *iommu_ops, void *opaque);
 
 static inline void
-- 
2.7.4




[RFC v8 15/25] intel_iommu: add PASID cache management infrastructure

2020-07-12 Thread Liu Yi L
This patch adds a PASID cache management infrastructure based on the
newly added structure VTDPASIDAddressSpace, which is used to track
PASID usage and to support future PASID tagged DMA address translation
in the vIOMMU.

struct VTDPASIDAddressSpace {
    VTDBus *vtd_bus;
    uint8_t devfn;
    AddressSpace as;
    uint32_t pasid;
    IntelIOMMUState *iommu_state;
    VTDContextCacheEntry context_cache_entry;
    QLIST_ENTRY(VTDPASIDAddressSpace) next;
    VTDPASIDCacheEntry pasid_cache_entry;
};

Ideally, a VTDPASIDAddressSpace instance is created when a PASID
is bound with a DMA AddressSpace. The Intel VT-d spec requires guest
software to issue a pasid cache invalidation when binding or unbinding
a pasid with an address space under caching-mode. However, as
VTDPASIDAddressSpace instances also act as the pasid cache in this
implementation, their creation also happens during vIOMMU PASID
tagged DMA translation. The creation in this path will not be added
in this patch since there are no PASID-capable emulated devices for
now.

The implementation in this patch manages VTDPASIDAddressSpace
instances per PASID+BDF (lookup and insert will use PASID and
BDF) since the Intel VT-d spec allows a per-BDF PASID Table. When a
guest binds a PASID with an AddressSpace, QEMU will capture the
guest pasid selective pasid cache invalidation, and allocate or
remove a VTDPASIDAddressSpace instance per the invalidation
reasons:

*) a present pasid entry moved to non-present
*) a present pasid entry updated while remaining present
*) a non-present pasid entry moved to present

The vIOMMU emulator can figure out the reason by fetching the latest
guest pasid entry, as sketched below.
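
A rough sketch of that per-(PASID, BDF) decision is shown here; the
helper names are illustrative only, the real logic lives in
vtd_flush_pasid() and the pasid table walk added by this patch:

/*
 * Illustrative decision logic for one (pasid, bdf) pair after a guest
 * pasid cache invalidation; helper names are not the actual ones.
 */
if (pe_present && !vtd_pasid_as) {
    /* non-present -> present: allocate cache entry, bind to host */
    vtd_pasid_as = alloc_vtd_pasid_as(s, vtd_bus, devfn, pasid);
    fill_cache_and_bind(vtd_pasid_as, &pe);
} else if (pe_present && vtd_pasid_as) {
    /* present -> present (modified): refresh cache, rebind to host */
    fill_cache_and_bind(vtd_pasid_as, &pe);
} else if (!pe_present && vtd_pasid_as) {
    /* present -> non-present: drop cache entry, unbind from host */
    destroy_vtd_pasid_as(vtd_pasid_as);
}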

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Yi Sun 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Signed-off-by: Liu Yi L 
---
rfcv4 (v1) -> rfcv5 (v2):
*) merged this patch with the former replay binding patch, making
   PSI/DSI/GSI use the unified function to do cache invalidation
   and pasid binding replay.
*) dropped pasid_cache_gen in both iommu_state and vtd_pasid_as
   as it is not necessary so far; we may want it when we one day
   introduce an emulated SVA-capable device.
---
 hw/i386/intel_iommu.c  | 464 +
 hw/i386/intel_iommu_internal.h |  21 ++
 hw/i386/trace-events   |   1 +
 include/hw/i386/intel_iommu.h  |  24 +++
 4 files changed, 510 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 968a0fc..c93c360 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -40,6 +40,7 @@
 #include "kvm_i386.h"
 #include "migration/vmstate.h"
 #include "trace.h"
+#include "qemu/jhash.h"
 
 /* context entry operations */
 #define VTD_CE_GET_RID2PASID(ce) \
@@ -65,6 +66,8 @@
 static void vtd_address_space_refresh_all(IntelIOMMUState *s);
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
 
+static void vtd_pasid_cache_reset(IntelIOMMUState *s);
+
 static void vtd_panic_require_caching_mode(void)
 {
 error_report("We need to set caching-mode=on for intel-iommu to enable "
@@ -276,6 +279,7 @@ static void vtd_reset_caches(IntelIOMMUState *s)
 vtd_iommu_lock(s);
 vtd_reset_iotlb_locked(s);
 vtd_reset_context_cache_locked(s);
+vtd_pasid_cache_reset(s);
 vtd_iommu_unlock(s);
 }
 
@@ -686,6 +690,16 @@ static inline bool vtd_pe_type_check(X86IOMMUState 
*x86_iommu,
 return true;
 }
 
+static inline uint16_t vtd_pe_get_domain_id(VTDPASIDEntry *pe)
+{
+return VTD_SM_PASID_ENTRY_DID((pe)->val[1]);
+}
+
+static inline uint32_t vtd_sm_ce_get_pdt_entry_num(VTDContextEntry *ce)
+{
+return 1U << (VTD_SM_CONTEXT_ENTRY_PDTS(ce->val[0]) + 7);
+}
+
 static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire)
 {
 return pdire->val & 1;
@@ -2395,9 +2409,443 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, 
VTDInvDesc *inv_desc)
 return true;
 }
 
+static inline void vtd_init_pasid_key(uint32_t pasid,
+ uint16_t sid,
+ struct pasid_key *key)
+{
+key->pasid = pasid;
+key->sid = sid;
+}
+
+static guint vtd_pasid_as_key_hash(gconstpointer v)
+{
+struct pasid_key *key = (struct pasid_key *)v;
+uint32_t a, b, c;
+
+/* Jenkins hash */
+a = b = c = JHASH_INITVAL + sizeof(*key);
+a += key->sid;
+b += extract32(key->pasid, 0, 16);
+c += extract32(key->pasid, 16, 16);
+
+__jhash_mix(a, b, c);
+__jhash_final(a, b, c);
+
+return c;
+}
+
+static gboolean vtd_pasid_as_key_equal(gconstpointer v1, gconstpointer v2)
+{
+const struct pasid_key *k1 = v1;
+const struct pasid_key *k2 = v2;
+
+return (k1->pasid == k2->pasid) && (k1->sid == k2->sid);
+}
+
+static inline int vtd_dev_get_pe_from_pasid(IntelIOMMUState *s,
+uint8_t bus_num,
+uint8_t devfn,
+uint32_t pasid,
+   

[RFC v8 08/25] hw/iommu: introduce HostIOMMUContext

2020-07-12 Thread Liu Yi L
Currently, many platform vendors provide the capability of dual-stage
DMA address translation in hardware. For example, nested translation
on Intel VT-d scalable mode, nested stage translation on ARM SMMUv3,
etc. In dual-stage DMA address translation there are two stages of
address translation, with stage-1 (a.k.a first-level) and stage-2
(a.k.a second-level) translation structures. Stage-1 translation
results are also subject to the stage-2 translation structures. Take
vSVA (Virtual Shared Virtual Addressing) as an example: the guest
IOMMU driver owns the stage-1 translation structures (covering the
GVA->GPA translation), and the host IOMMU driver owns the stage-2
translation structures (covering the GPA->HPA translation). The VMM
is responsible for binding the stage-1 translation structures to the
host, so that hardware can achieve GVA->GPA and then GPA->HPA
translation. For more background on SVA, refer to the links below.
 - https://www.youtube.com/watch?v=Kq_nfGK5MwQ
 - https://events19.lfasiallc.com/wp-content/uploads/2017/11/\
Shared-Virtual-Memory-in-KVM_Yi-Liu.pdf

In QEMU, vIOMMU emulators expose IOMMUs to the VM per their own spec
(e.g. the Intel VT-d spec). Devices are passed through to the guest
via device pass-through components like VFIO. VFIO is a userspace
driver framework which exposes host IOMMU programming capability to
userspace in a secure manner, e.g. IOVA MAP/UNMAP requests.
Information different from map/unmap notifications needs to be passed
between the QEMU vIOMMU device and the host IOMMU driver through the
VFIO/IOMMU layer:
 1) PASID allocation (allow host to intercept in PASID allocation)
 2) bind stage-1 translation structures to host
 3) propagate stage-1 cache invalidation to host
 4) DMA address translation fault (I/O page fault) servicing etc.

The above new interactions in QEMU require an abstraction layer to
facilitate these operations and give vIOMMU emulators an explicit way
to call into VFIO. This patch introduces HostIOMMUContext to serve
that purpose. HostIOMMUContext is an object which allows managing the
stage-1 translation when a vIOMMU is implemented upon physical IOMMU
nested paging (VFIO case). It is an abstract object which needs to be
derived for each vIOMMU implementation based on physical nested
paging. A HostIOMMUContext-derived object will be passed to each VFIO
device protected by a vIOMMU using physical nested paging.

This patch also introduces HostIOMMUContextClass to provide methods
for vIOMMU emulators to propagate dual-stage translation related
requests to the host. As a beginning, PASID allocation/free are
defined to propagate PASID allocation/free requests to the host,
which is required for vendors that manage PASIDs system-wide.
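
As a usage illustration, a vIOMMU holding such an object would request
and release a host PASID roughly as follows. Only the host_iommu_ctx_*
calls are defined by this series; the wrapper functions and the
MIN_PASID/MAX_PASID limits are placeholders:

/*
 * Sketch: how a vIOMMU emulator consumes the abstraction.  The range
 * limits are vIOMMU specific (the Intel vIOMMU uses VTD_HPASID_MIN/MAX
 * in a later patch of this series).
 */
static int viommu_new_host_pasid(HostIOMMUContext *iommu_ctx, uint32_t *pasid)
{
    return host_iommu_ctx_pasid_alloc(iommu_ctx, MIN_PASID, MAX_PASID, pasid);
}

static void viommu_put_host_pasid(HostIOMMUContext *iommu_ctx, uint32_t pasid)
{
    if (host_iommu_ctx_pasid_free(iommu_ctx, pasid)) {
        /* host refused or does not manage this pasid; nothing else to do */
    }
}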

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Eric Auger 
Cc: Yi Sun 
Cc: David Gibson 
Cc: Michael S. Tsirkin 
Signed-off-by: Liu Yi L 
---
 hw/Makefile.objs  |  1 +
 hw/iommu/Makefile.objs|  1 +
 hw/iommu/host_iommu_context.c | 97 +++
 include/hw/iommu/host_iommu_context.h | 75 +++
 4 files changed, 174 insertions(+)
 create mode 100644 hw/iommu/Makefile.objs
 create mode 100644 hw/iommu/host_iommu_context.c
 create mode 100644 include/hw/iommu/host_iommu_context.h

diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 4cbe5e4..d272f3d 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -40,6 +40,7 @@ devices-dirs-$(CONFIG_MEM_DEVICE) += mem/
 devices-dirs-$(CONFIG_NUBUS) += nubus/
 devices-dirs-y += semihosting/
 devices-dirs-y += smbios/
+devices-dirs-y += iommu/
 endif
 
 common-obj-y += $(devices-dirs-y)
diff --git a/hw/iommu/Makefile.objs b/hw/iommu/Makefile.objs
new file mode 100644
index 000..e6eed4e
--- /dev/null
+++ b/hw/iommu/Makefile.objs
@@ -0,0 +1 @@
+obj-y += host_iommu_context.o
diff --git a/hw/iommu/host_iommu_context.c b/hw/iommu/host_iommu_context.c
new file mode 100644
index 000..5fb2223
--- /dev/null
+++ b/hw/iommu/host_iommu_context.c
@@ -0,0 +1,97 @@
+/*
+ * QEMU abstract of Host IOMMU
+ *
+ * Copyright (C) 2020 Intel Corporation.
+ *
+ * Authors: Liu Yi L 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qom/object.h"
+#include "qapi/visitor.h"
+#include "hw/iommu/host_iommu_context.h"
+
+int host_iommu_ctx_pasid_alloc(HostIOMMUContext *iommu_ctx, 

[RFC v8 12/25] vfio: init HostIOMMUContext per-container

2020-07-12 Thread Liu Yi L
In this patch, QEMU first gets the iommu info from the kernel to check
the capabilities supported by a VFIO_IOMMU_TYPE1_NESTING iommu, and
then inits a HostIOMMUContext instance.

For vfio-pci devices, it could use pci_device_set/unset_iommu_context()
to expose the host iommu context to vIOMMU emulators. vIOMMU emulators
could make use of the methods provided by the host iommu context, e.g.
propagate requests to the host iommu.

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Eric Auger 
Cc: Yi Sun 
Cc: David Gibson 
Cc: Alex Williamson 
Signed-off-by: Liu Yi L 
---
 hw/vfio/common.c | 113 +++
 hw/vfio/pci.c|  17 +
 2 files changed, 130 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7b92a58..cdd16a1 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1228,10 +1228,102 @@ static int 
vfio_host_iommu_ctx_pasid_free(HostIOMMUContext *iommu_ctx,
 return ret;
 }
 
+/**
+ * Get iommu info from host. Caller of this function should free
+ * the memory pointed to by the returned pointer stored in @info
+ * after a successful call when finished with its usage.
+ */
+static int vfio_get_iommu_info(VFIOContainer *container,
+ struct vfio_iommu_type1_info **info)
+{
+
+size_t argsz = sizeof(struct vfio_iommu_type1_info);
+
+*info = g_malloc0(argsz);
+
+retry:
+(*info)->argsz = argsz;
+
+if (ioctl(container->fd, VFIO_IOMMU_GET_INFO, *info)) {
+g_free(*info);
+*info = NULL;
+return -errno;
+}
+
+if (((*info)->argsz > argsz)) {
+argsz = (*info)->argsz;
+*info = g_realloc(*info, argsz);
+goto retry;
+}
+
+return 0;
+}
+
+static struct vfio_info_cap_header *
+vfio_get_iommu_info_cap(struct vfio_iommu_type1_info *info, uint16_t id)
+{
+struct vfio_info_cap_header *hdr;
+void *ptr = info;
+
+if (!(info->flags & VFIO_IOMMU_INFO_CAPS)) {
+return NULL;
+}
+
+for (hdr = ptr + info->cap_offset; hdr != ptr; hdr = ptr + hdr->next) {
+if (hdr->id == id) {
+return hdr;
+}
+}
+
+return NULL;
+}
+
+static int vfio_get_nesting_iommu_cap(VFIOContainer *container,
+   struct vfio_iommu_type1_info_cap_nesting **cap_nesting)
+{
+struct vfio_iommu_type1_info *info;
+struct vfio_info_cap_header *hdr;
+struct vfio_iommu_type1_info_cap_nesting *cap;
+struct iommu_nesting_info *nest_info;
+int ret;
+uint32_t minsz, cap_size;
+
+ret = vfio_get_iommu_info(container, &info);
+if (ret) {
+return ret;
+}
+
+hdr = vfio_get_iommu_info_cap(info,
+VFIO_IOMMU_TYPE1_INFO_CAP_NESTING);
+if (!hdr) {
+g_free(info);
+return -EINVAL;
+}
+
+cap = container_of(hdr,
+struct vfio_iommu_type1_info_cap_nesting, header);
+
+nest_info = &cap->info;
+minsz = offsetof(struct iommu_nesting_info, data);
+if (nest_info->size < minsz) {
+g_free(info);
+return -EINVAL;
+}
+
+cap_size = offsetof(struct vfio_iommu_type1_info_cap_nesting, info) +
+   nest_info->size;
+*cap_nesting = g_malloc0(cap_size);
+memcpy(*cap_nesting, cap, cap_size);
+
+g_free(info);
+return 0;
+}
+
 static int vfio_init_container(VFIOContainer *container, int group_fd,
bool want_nested, Error **errp)
 {
 int iommu_type, ret;
+uint64_t flags = 0;
 
 iommu_type = vfio_get_iommu_type(container, want_nested, errp);
 if (iommu_type < 0) {
@@ -1259,6 +1351,27 @@ static int vfio_init_container(VFIOContainer *container, 
int group_fd,
 return -errno;
 }
 
+if (iommu_type == VFIO_TYPE1_NESTING_IOMMU) {
+struct vfio_iommu_type1_info_cap_nesting *nesting = NULL;
+struct iommu_nesting_info *nest_info;
+
+ret = vfio_get_nesting_iommu_cap(container, &nesting);
+if (ret) {
+error_setg_errno(errp, -ret,
+ "Failed to get nesting iommu cap");
+return ret;
+}
+
+nest_info = (struct iommu_nesting_info *) &nesting->info;
+flags |= (nest_info->features & IOMMU_NESTING_FEAT_SYSWIDE_PASID) ?
+ HOST_IOMMU_PASID_REQUEST : 0;
+host_iommu_ctx_init(&container->iommu_ctx,
+sizeof(container->iommu_ctx),
+TYPE_VFIO_HOST_IOMMU_CONTEXT,
+flags);
+g_free(nesting);
+}
+
 container->iommu_type = iommu_type;
 return 0;
 }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 9d8d27f..b7045f0 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2710,6 +2710,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 VFIOPCIDevice *vdev = PCI_VFIO(pdev);
 VFIODevice *vbasedev_iter;
 VFIOGroup *group;
+VFIOContainer *container;
 char *tmp, *subsys, group_path[PATH_MAX], *group_name;
 Error *err = NULL;
 ssize_t len;
@@ -2787,6 +2788,15 @@ static void 

[RFC v8 14/25] intel_iommu: process PASID cache invalidation

2020-07-12 Thread Liu Yi L
This patch adds PASID cache invalidation handling. When the guest
enables PASID usage (e.g. SVA), guest software should issue a proper
PASID cache invalidation when caching-mode is exposed. This patch only
adds the draft handling of pasid cache invalidation. Detailed handling
will be added in subsequent patches.

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Yi Sun 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Reviewed-by: Peter Xu 
Signed-off-by: Liu Yi L 
---
rfcv4 (v1) -> rfcv5 (v2):
*) remove vtd_pasid_cache_gsi(), vtd_pasid_cache_psi()
   and vtd_pasid_cache_dsi()
---
 hw/i386/intel_iommu.c  | 40 +++-
 hw/i386/intel_iommu_internal.h | 12 
 hw/i386/trace-events   |  3 +++
 3 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 46036d4..968a0fc 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2395,6 +2395,37 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, 
VTDInvDesc *inv_desc)
 return true;
 }
 
+static bool vtd_process_pasid_desc(IntelIOMMUState *s,
+   VTDInvDesc *inv_desc)
+{
+if ((inv_desc->val[0] & VTD_INV_DESC_PASIDC_RSVD_VAL0) ||
+(inv_desc->val[1] & VTD_INV_DESC_PASIDC_RSVD_VAL1) ||
+(inv_desc->val[2] & VTD_INV_DESC_PASIDC_RSVD_VAL2) ||
+(inv_desc->val[3] & VTD_INV_DESC_PASIDC_RSVD_VAL3)) {
+error_report_once("non-zero-field-in-pc_inv_desc hi: 0x%" PRIx64
+  " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]);
+return false;
+}
+
+switch (inv_desc->val[0] & VTD_INV_DESC_PASIDC_G) {
+case VTD_INV_DESC_PASIDC_DSI:
+break;
+
+case VTD_INV_DESC_PASIDC_PASID_SI:
+break;
+
+case VTD_INV_DESC_PASIDC_GLOBAL:
+break;
+
+default:
+error_report_once("invalid-inv-granu-in-pc_inv_desc hi: 0x%" PRIx64
+  " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]);
+return false;
+}
+
+return true;
+}
+
 static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
  VTDInvDesc *inv_desc)
 {
@@ -2501,12 +2532,11 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
 }
 break;
 
-/*
- * TODO: the entity of below two cases will be implemented in future 
series.
- * To make guest (which integrates scalable mode support patch set in
- * iommu driver) work, just return true is enough so far.
- */
 case VTD_INV_DESC_PC:
+trace_vtd_inv_desc("pasid-cache", inv_desc.val[1], inv_desc.val[0]);
+if (!vtd_process_pasid_desc(s, &inv_desc)) {
+return false;
+}
 break;
 
 case VTD_INV_DESC_PIOTLB:
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 64ac0a8..22d0bc5 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -445,6 +445,18 @@ typedef union VTDInvDesc VTDInvDesc;
 (0x3800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM | VTD_SL_TM)) : \
 (0x3800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
 
+#define VTD_INV_DESC_PASIDC_G  (3ULL << 4)
+#define VTD_INV_DESC_PASIDC_PASID(val) (((val) >> 32) & 0xfULL)
+#define VTD_INV_DESC_PASIDC_DID(val)   (((val) >> 16) & VTD_DOMAIN_ID_MASK)
+#define VTD_INV_DESC_PASIDC_RSVD_VAL0  0xfff0ffc0ULL
+#define VTD_INV_DESC_PASIDC_RSVD_VAL1  0xULL
+#define VTD_INV_DESC_PASIDC_RSVD_VAL2  0xULL
+#define VTD_INV_DESC_PASIDC_RSVD_VAL3  0xULL
+
+#define VTD_INV_DESC_PASIDC_DSI(0ULL << 4)
+#define VTD_INV_DESC_PASIDC_PASID_SI   (1ULL << 4)
+#define VTD_INV_DESC_PASIDC_GLOBAL (3ULL << 4)
+
 /* Information about page-selective IOTLB invalidate */
 struct VTDIOTLBPageInvInfo {
 uint16_t domain_id;
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 71536a7..f7cd4e5 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -22,6 +22,9 @@ vtd_inv_qi_head(uint16_t head) "read head %d"
 vtd_inv_qi_tail(uint16_t head) "write tail %d"
 vtd_inv_qi_fetch(void) ""
 vtd_context_cache_reset(void) ""
+vtd_pasid_cache_gsi(void) ""
+vtd_pasid_cache_dsi(uint16_t domain) "Domain selective PC invalidation domain 0x%"PRIx16
+vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID selective PC invalidation domain 0x%"PRIx16" pasid 0x%"PRIx32
 vtd_re_not_present(uint8_t bus) "Root entry bus %"PRIu8" not present"
 vtd_ce_not_present(uint8_t bus, uint8_t devfn) "Context entry bus %"PRIu8" 
devfn %"PRIu8" not present"
 vtd_iotlb_page_hit(uint16_t sid, uint64_t addr, uint64_t slpte, uint16_t 
domain) "IOTLB page hit sid 0x%"PRIx16" iova 0x%"PRIx64" slpte 0x%"PRIx64" 
domain 0x%"PRIx16
-- 
2.7.4




[RFC v8 07/25] vfio: check VFIO_TYPE1_NESTING_IOMMU support

2020-07-12 Thread Liu Yi L
VFIO needs to check VFIO_TYPE1_NESTING_IOMMU support with the kernel before
using it any further, e.g. it needs to check for IOMMU UAPI support.

Based on a patch from Eric Auger: https://patchwork.kernel.org/patch/11040499/
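
As a rough illustration of the probing this enables, here is a minimal
user-space sketch (assuming a Linux host with <linux/vfio.h>; this is not
QEMU code and it simplifies error handling) that prefers the nesting IOMMU
type only when the caller asked for it:

  #include <stdio.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/vfio.h>

  static int pick_iommu_type(int container_fd, int want_nested)
  {
      int types[] = { VFIO_TYPE1_NESTING_IOMMU, VFIO_TYPE1v2_IOMMU,
                      VFIO_TYPE1_IOMMU };
      unsigned i;

      for (i = 0; i < sizeof(types) / sizeof(types[0]); i++) {
          if (!want_nested && types[i] == VFIO_TYPE1_NESTING_IOMMU) {
              continue;                  /* nesting only when asked for */
          }
          if (ioctl(container_fd, VFIO_CHECK_EXTENSION, types[i]) > 0) {
              return types[i];           /* first supported type wins */
          }
      }
      return -1;
  }

  int main(void)
  {
      int fd = open("/dev/vfio/vfio", O_RDWR);

      if (fd < 0) {
          perror("open /dev/vfio/vfio");
          return 1;
      }
      printf("selected iommu type: %d\n", pick_iommu_type(fd, 1));
      close(fd);
      return 0;
  }

The QEMU-side selection in the diff below additionally fails the request
explicitly when nesting was wanted but is not available.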

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Eric Auger 
Cc: Yi Sun 
Cc: David Gibson 
Cc: Alex Williamson 
Signed-off-by: Liu Yi L 
Signed-off-by: Eric Auger 
Signed-off-by: Yi Sun 
---
 hw/vfio/common.c | 37 ++---
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 89c6a25..b85fbcf 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1152,30 +1152,44 @@ static void vfio_put_address_space(VFIOAddressSpace 
*space)
 }
 
 /*
- * vfio_get_iommu_type - selects the richest iommu_type (v2 first)
+ * vfio_get_iommu_type - selects the richest iommu_type (NESTING first)
  */
 static int vfio_get_iommu_type(VFIOContainer *container,
+   bool want_nested,
Error **errp)
 {
-int iommu_types[] = { VFIO_TYPE1v2_IOMMU, VFIO_TYPE1_IOMMU,
+int iommu_types[] = { VFIO_TYPE1_NESTING_IOMMU,
+  VFIO_TYPE1v2_IOMMU, VFIO_TYPE1_IOMMU,
   VFIO_SPAPR_TCE_v2_IOMMU, VFIO_SPAPR_TCE_IOMMU };
-int i;
+int i, ret = -EINVAL;
 
 for (i = 0; i < ARRAY_SIZE(iommu_types); i++) {
 if (ioctl(container->fd, VFIO_CHECK_EXTENSION, iommu_types[i])) {
-return iommu_types[i];
+if (iommu_types[i] == VFIO_TYPE1_NESTING_IOMMU) {
+if (!want_nested) {
+continue;
+}
+}
+ret = iommu_types[i];
+break;
 }
 }
-error_setg(errp, "No available IOMMU models");
-return -EINVAL;
+
+if (ret < 0) {
+error_setg(errp, "No available IOMMU models");
+} else if (want_nested && ret != VFIO_TYPE1_NESTING_IOMMU) {
+error_setg(errp, "Nested mode requested but not supported");
+ret = -EINVAL;
+}
+return ret;
 }
 
 static int vfio_init_container(VFIOContainer *container, int group_fd,
-   Error **errp)
+   bool want_nested, Error **errp)
 {
 int iommu_type, ret;
 
-iommu_type = vfio_get_iommu_type(container, errp);
+iommu_type = vfio_get_iommu_type(container, want_nested, errp);
 if (iommu_type < 0) {
 return iommu_type;
 }
@@ -1206,7 +1220,7 @@ static int vfio_init_container(VFIOContainer *container, 
int group_fd,
 }
 
 static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
-  Error **errp)
+  bool want_nested, Error **errp)
 {
 VFIOContainer *container;
 int ret, fd;
@@ -1272,12 +1286,13 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
QLIST_INIT(&container->giommu_list);
QLIST_INIT(&container->hostwin_list);
 
-ret = vfio_init_container(container, group->fd, errp);
+ret = vfio_init_container(container, group->fd, want_nested, errp);
 if (ret) {
 goto free_container_exit;
 }
 
 switch (container->iommu_type) {
+case VFIO_TYPE1_NESTING_IOMMU:
 case VFIO_TYPE1v2_IOMMU:
 case VFIO_TYPE1_IOMMU:
 {
@@ -1498,7 +1513,7 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as,
 group->groupid = groupid;
QLIST_INIT(&group->device_list);
 
-if (vfio_connect_container(group, as, errp)) {
+if (vfio_connect_container(group, as, want_nested, errp)) {
 error_prepend(errp, "failed to setup container for group %d: ",
   groupid);
 goto close_fd_exit;
-- 
2.7.4




[RFC v8 10/25] intel_iommu: add set/unset_iommu_context callback

2020-07-12 Thread Liu Yi L
This patch adds the set/unset_iommu_context() implementation in the Intel
vIOMMU. PCIe devices (the VFIO case) set a HostIOMMUContext on the vIOMMU
as an acknowledgement of the vIOMMU's "want_nested" attribute. The vIOMMU
can then build DMA protection based on the host IOMMU's nested paging.
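
The per-device bookkeeping follows a simple pattern: a fixed-size table
indexed by devfn, where set() asserts the slot is free and unset() releases
it. A standalone sketch of that pattern (illustrative names only, not
QEMU's, and without the locking the real code needs):

  #include <assert.h>
  #include <stdlib.h>

  #define DEVFN_MAX 256

  typedef struct Ctx {
      int id;                           /* stand-in for the real context */
  } Ctx;

  static Ctx *dev_ctx[DEVFN_MAX];       /* per-devfn registration table */

  static void set_ctx(int devfn)
  {
      assert(devfn >= 0 && devfn < DEVFN_MAX);
      assert(!dev_ctx[devfn]);          /* each devfn registers only once */
      dev_ctx[devfn] = calloc(1, sizeof(*dev_ctx[devfn]));
  }

  static void unset_ctx(int devfn)
  {
      assert(devfn >= 0 && devfn < DEVFN_MAX);
      free(dev_ctx[devfn]);
      dev_ctx[devfn] = NULL;
  }

  int main(void)
  {
      set_ctx(0x10);
      unset_ctx(0x10);
      return 0;
  }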

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Yi Sun 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Signed-off-by: Liu Yi L 
---
 hw/i386/intel_iommu.c | 71 ---
 include/hw/i386/intel_iommu.h | 21 ++---
 2 files changed, 83 insertions(+), 9 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 2d6748f..8f7c957 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3359,23 +3359,33 @@ static const MemoryRegionOps vtd_mem_ir_ops = {
 },
 };
 
-VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
+/**
+ * Fetch a VTDBus instance for given PCIBus. If no existing instance,
+ * allocate one.
+ */
+static VTDBus *vtd_find_add_bus(IntelIOMMUState *s, PCIBus *bus)
 {
 uintptr_t key = (uintptr_t)bus;
VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
-VTDAddressSpace *vtd_dev_as;
-char name[128];
 
 if (!vtd_bus) {
 uintptr_t *new_key = g_malloc(sizeof(*new_key));
 *new_key = (uintptr_t)bus;
 /* No corresponding free() */
-vtd_bus = g_malloc0(sizeof(VTDBus) + sizeof(VTDAddressSpace *) * \
-PCI_DEVFN_MAX);
+vtd_bus = g_malloc0(sizeof(VTDBus));
 vtd_bus->bus = bus;
 g_hash_table_insert(s->vtd_as_by_busptr, new_key, vtd_bus);
 }
+return vtd_bus;
+}
 
+VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
+{
+VTDBus *vtd_bus;
+VTDAddressSpace *vtd_dev_as;
+char name[128];
+
+vtd_bus = vtd_find_add_bus(s, bus);
 vtd_dev_as = vtd_bus->dev_as[devfn];
 
 if (!vtd_dev_as) {
@@ -3463,6 +3473,55 @@ static int vtd_dev_get_iommu_attr(PCIBus *bus, void 
*opaque, int32_t devfn,
 return ret;
 }
 
+static int vtd_dev_set_iommu_context(PCIBus *bus, void *opaque,
+ int devfn,
+ HostIOMMUContext *iommu_ctx)
+{
+IntelIOMMUState *s = opaque;
+VTDBus *vtd_bus;
+VTDHostIOMMUContext *vtd_dev_icx;
+
+assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
+
+vtd_bus = vtd_find_add_bus(s, bus);
+
+vtd_iommu_lock(s);
+
+vtd_dev_icx = vtd_bus->dev_icx[devfn];
+
+assert(!vtd_dev_icx);
+
+vtd_bus->dev_icx[devfn] = vtd_dev_icx =
+g_malloc0(sizeof(VTDHostIOMMUContext));
+vtd_dev_icx->vtd_bus = vtd_bus;
+vtd_dev_icx->devfn = (uint8_t)devfn;
+vtd_dev_icx->iommu_state = s;
+vtd_dev_icx->iommu_ctx = iommu_ctx;
+
+vtd_iommu_unlock(s);
+
+return 0;
+}
+
+static void vtd_dev_unset_iommu_context(PCIBus *bus, void *opaque, int devfn)
+{
+IntelIOMMUState *s = opaque;
+VTDBus *vtd_bus;
+VTDHostIOMMUContext *vtd_dev_icx;
+
+assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
+
+vtd_bus = vtd_find_add_bus(s, bus);
+
+vtd_iommu_lock(s);
+
+vtd_dev_icx = vtd_bus->dev_icx[devfn];
+g_free(vtd_dev_icx);
+vtd_bus->dev_icx[devfn] = NULL;
+
+vtd_iommu_unlock(s);
+}
+
 static uint64_t get_naturally_aligned_size(uint64_t start,
uint64_t size, int gaw)
 {
@@ -3759,6 +3818,8 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void 
*opaque, int devfn)
 static PCIIOMMUOps vtd_iommu_ops = {
 .get_address_space = vtd_host_dma_iommu,
 .get_iommu_attr = vtd_dev_get_iommu_attr,
+.set_iommu_context = vtd_dev_set_iommu_context,
+.unset_iommu_context = vtd_dev_unset_iommu_context,
 };
 
 static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 3870052..b5fefb9 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -64,6 +64,7 @@ typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
 typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
 typedef struct VTDPASIDDirEntry VTDPASIDDirEntry;
 typedef struct VTDPASIDEntry VTDPASIDEntry;
+typedef struct VTDHostIOMMUContext VTDHostIOMMUContext;
 
 /* Context-Entry */
 struct VTDContextEntry {
@@ -112,10 +113,20 @@ struct VTDAddressSpace {
 IOVATree *iova_tree;  /* Traces mapped IOVA ranges */
 };
 
+struct VTDHostIOMMUContext {
+VTDBus *vtd_bus;
+uint8_t devfn;
+HostIOMMUContext *iommu_ctx;
+IntelIOMMUState *iommu_state;
+};
+
 struct VTDBus {
-PCIBus* bus;   /* A reference to the bus to provide 
translation for */
+/* A reference to the bus to provide translation for */
+PCIBus *bus;
 /* A table of VTDAddressSpace objects indexed by devfn */
-VTDAddressSpace *dev_as[];
+VTDAddressSpace *dev_as[PCI_DEVFN_MAX];
+/* A table of VTDHostIOMMUContext objects indexed by devfn 

[RFC v8 05/25] intel_iommu: add get_iommu_attr() callback

2020-07-12 Thread Liu Yi L
Return a vIOMMU attribute to the caller, e.g. to a VFIO call via the PCI layer.

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Yi Sun 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Signed-off-by: Liu Yi L 
---
 hw/i386/intel_iommu.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index ca6dcad..2d6748f 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3441,6 +3441,28 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, 
PCIBus *bus, int devfn)
 return vtd_dev_as;
 }
 
+static int vtd_dev_get_iommu_attr(PCIBus *bus, void *opaque, int32_t devfn,
+   IOMMUAttr attr, void *data)
+{
+int ret = 0;
+
+assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
+
+switch (attr) {
+case IOMMU_WANT_NESTING:
+{
+bool *pdata = data;
+
+/* return false until vSVA is ready */
+*pdata = false;
+break;
+}
+default:
+ret = -ENOENT;
+}
+return ret;
+}
+
 static uint64_t get_naturally_aligned_size(uint64_t start,
uint64_t size, int gaw)
 {
@@ -3736,6 +3758,7 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void 
*opaque, int devfn)
 
 static PCIIOMMUOps vtd_iommu_ops = {
 .get_address_space = vtd_host_dma_iommu,
+.get_iommu_attr = vtd_dev_get_iommu_attr,
 };
 
 static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
-- 
2.7.4




[RFC v8 06/25] vfio: pass nesting requirement into vfio_get_group()

2020-07-12 Thread Liu Yi L
This patch passes the nesting requirement into vfio_get_group() to
indicate whether VFIO_TYPE1_NESTING_IOMMU is required.

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Eric Auger 
Cc: Yi Sun 
Cc: David Gibson 
Cc: Alex Williamson 
Signed-off-by: Liu Yi L 
---
 hw/vfio/ap.c  | 2 +-
 hw/vfio/ccw.c | 2 +-
 hw/vfio/common.c  | 3 ++-
 hw/vfio/pci.c | 9 -
 hw/vfio/platform.c| 2 +-
 include/hw/vfio/vfio-common.h | 3 ++-
 6 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index 95564c1..933b118 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -82,7 +82,7 @@ static VFIOGroup *vfio_ap_get_group(VFIOAPDevice *vapdev, 
Error **errp)
 
 g_free(group_path);
 
-return vfio_get_group(groupid, &address_space_memory, errp);
+return vfio_get_group(groupid, &address_space_memory, false, errp);
 }
 
 static void vfio_ap_realize(DeviceState *dev, Error **errp)
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 06e69d7..7c20103 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -620,7 +620,7 @@ static VFIOGroup *vfio_ccw_get_group(S390CCWDevice *cdev, 
Error **errp)
 return NULL;
 }
 
-return vfio_get_group(groupid, &address_space_memory, errp);
+return vfio_get_group(groupid, &address_space_memory, false, errp);
 }
 
 static void vfio_ccw_realize(DeviceState *dev, Error **errp)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 0b3593b..89c6a25 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1453,7 +1453,8 @@ static void vfio_disconnect_container(VFIOGroup *group)
 }
 }
 
-VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
+VFIOGroup *vfio_get_group(int groupid, AddressSpace *as,
+  bool want_nested, Error **errp)
 {
 VFIOGroup *group;
 char path[32];
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6838bcc..9d8d27f 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2717,6 +2717,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 int groupid;
 int i, ret;
 bool is_mdev;
+bool want_nested;
 
 if (!vdev->vbasedev.sysfsdev) {
 if (!(~vdev->host.domain || ~vdev->host.bus ||
@@ -2775,7 +2776,13 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 
 trace_vfio_realize(vdev->vbasedev.name, groupid);
 
-group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev), 
errp);
+if (pci_device_get_iommu_attr(pdev,
+ IOMMU_WANT_NESTING, &want_nested)) {
+want_nested = false;
+}
+
+group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev),
+   want_nested, errp);
 if (!group) {
 goto error;
 }
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index ac2cefc..7ad7702 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -580,7 +580,7 @@ static int vfio_base_device_init(VFIODevice *vbasedev, 
Error **errp)
 
 trace_vfio_platform_base_device_init(vbasedev->name, groupid);
 
-group = vfio_get_group(groupid, &address_space_memory, errp);
+group = vfio_get_group(groupid, &address_space_memory, false, errp);
 if (!group) {
 return -ENOENT;
 }
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index fd56420..a77d0ed 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -174,7 +174,8 @@ void vfio_region_mmaps_set_enabled(VFIORegion *region, bool 
enabled);
 void vfio_region_exit(VFIORegion *region);
 void vfio_region_finalize(VFIORegion *region);
 void vfio_reset_handler(void *opaque);
-VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
+VFIOGroup *vfio_get_group(int groupid, AddressSpace *as,
+  bool want_nested, Error **errp);
 void vfio_put_group(VFIOGroup *group);
 int vfio_get_device(VFIOGroup *group, const char *name,
 VFIODevice *vbasedev, Error **errp);
-- 
2.7.4




[RFC v8 04/25] hw/pci: introduce pci_device_get_iommu_attr()

2020-07-12 Thread Liu Yi L
This patch adds pci_device_get_iommu_attr() to get vIOMMU attributes,
e.g. whether a nesting IOMMU is wanted.
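
The helper reuses the existing lookup logic: walk up the parent buses until
one with registered IOMMU ops is found, then dispatch to its callback. A
standalone sketch of that walk (types and names here are simplified
stand-ins, not QEMU's; the real code also adjusts devfn across bridges):

  #include <stdio.h>
  #include <stddef.h>

  typedef struct Bus {
      struct Bus *parent;
      const void *iommu_ops;            /* non-NULL once an IOMMU registered */
  } Bus;

  static Bus *find_iommu_bus(Bus *bus)
  {
      while (bus && !bus->iommu_ops) {
          bus = bus->parent;            /* climb towards the root bus */
      }
      return bus;
  }

  int main(void)
  {
      Bus root  = { .parent = NULL,  .iommu_ops = (const void *)1 };
      Bus child = { .parent = &root, .iommu_ops = NULL };

      printf("iommu bus found: %s\n", find_iommu_bus(&child) ? "yes" : "no");
      return 0;
  }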

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Eric Auger 
Cc: Yi Sun 
Cc: David Gibson 
Cc: Michael S. Tsirkin 
Signed-off-by: Liu Yi L 
---
 hw/pci/pci.c | 35 ++-
 include/hw/pci/pci.h |  7 +++
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index b2a2077..3c27805 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2659,7 +2659,8 @@ static void pci_device_class_base_init(ObjectClass 
*klass, void *data)
 }
 }
 
-AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
+static void pci_device_get_iommu_bus_devfn(PCIDevice *dev,
+  PCIBus **pbus, uint8_t *pdevfn)
 {
 PCIBus *bus = pci_get_bus(dev);
 PCIBus *iommu_bus = bus;
@@ -2710,14 +2711,38 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice 
*dev)
 
 iommu_bus = parent_bus;
 }
-if (iommu_bus && iommu_bus->iommu_ops &&
- iommu_bus->iommu_ops->get_address_space) {
-return iommu_bus->iommu_ops->get_address_space(bus,
- iommu_bus->iommu_opaque, devfn);
+*pbus = iommu_bus;
+*pdevfn = devfn;
+}
+
+AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
+{
+PCIBus *bus;
+uint8_t devfn;
+
+pci_device_get_iommu_bus_devfn(dev, &bus, &devfn);
+if (bus && bus->iommu_ops &&
+bus->iommu_ops->get_address_space) {
+return bus->iommu_ops->get_address_space(bus,
+bus->iommu_opaque, devfn);
 }
return &address_space_memory;
 }
 
+int pci_device_get_iommu_attr(PCIDevice *dev, IOMMUAttr attr, void *data)
+{
+PCIBus *bus;
+uint8_t devfn;
+
+pci_device_get_iommu_bus_devfn(dev, &bus, &devfn);
+if (bus && bus->iommu_ops &&
+bus->iommu_ops->get_iommu_attr) {
+return bus->iommu_ops->get_iommu_attr(bus, bus->iommu_opaque,
+   devfn, attr, data);
+}
+return -ENOENT;
+}
+
 void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque)
 {
 bus->iommu_ops = ops;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index a43c19b..f74161b 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -485,13 +485,20 @@ void pci_bus_get_w64_range(PCIBus *bus, Range *range);
 
 void pci_device_deassert_intx(PCIDevice *dev);
 
+typedef enum IOMMUAttr {
+IOMMU_WANT_NESTING,
+} IOMMUAttr;
+
 typedef struct PCIIOMMUOps PCIIOMMUOps;
 struct PCIIOMMUOps {
 AddressSpace * (*get_address_space)(PCIBus *bus,
 void *opaque, int32_t devfn);
+int (*get_iommu_attr)(PCIBus *bus, void *opaque, int32_t devfn,
+   IOMMUAttr attr, void *data);
 };
 
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
+int pci_device_get_iommu_attr(PCIDevice *dev, IOMMUAttr attr, void *data);
 void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *iommu_ops, void *opaque);
 
 static inline void
-- 
2.7.4




[RFC v8 03/25] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps

2020-07-12 Thread Liu Yi L
This patch modifies pci_setup_iommu() to set PCIIOMMUOps instead of
setting PCIIOMMUFunc. PCIIOMMUFunc is used to get an address space for
a PCI device in a vendor-specific way. PCIIOMMUOps still offers this
functionality, but it also leaves room to add more IOMMU-related
vendor-specific operations.
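
The shape of the refactor, reduced to a standalone sketch (illustrative
names, not QEMU's): registration stores a whole ops table instead of a
single callback, so later patches can add optional callbacks without
touching every registration site.

  #include <stdio.h>
  #include <stddef.h>

  typedef struct DemoIOMMUOps {
      int (*get_address_space)(void *opaque, int devfn);
      int (*get_iommu_attr)(void *opaque, int devfn, int attr, void *data);
  } DemoIOMMUOps;

  static struct {
      const DemoIOMMUOps *ops;          /* whole ops table, not one callback */
      void *opaque;
  } demo_bus;

  static int demo_get_as(void *opaque, int devfn)
  {
      (void)opaque;
      return devfn;                     /* pretend this is an AS handle */
  }

  static const DemoIOMMUOps demo_ops = {
      .get_address_space = demo_get_as, /* unset callbacks stay NULL */
  };

  static void demo_setup_iommu(const DemoIOMMUOps *ops, void *opaque)
  {
      demo_bus.ops = ops;
      demo_bus.opaque = opaque;
  }

  int main(void)
  {
      demo_setup_iommu(&demo_ops, NULL);
      if (demo_bus.ops && demo_bus.ops->get_address_space) {
          printf("as = %d\n",
                 demo_bus.ops->get_address_space(demo_bus.opaque, 3));
      }
      return 0;
  }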

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Eric Auger 
Cc: Yi Sun 
Cc: David Gibson 
Cc: Michael S. Tsirkin 
Reviewed-by: David Gibson 
Reviewed-by: Peter Xu 
Signed-off-by: Liu Yi L 
---
 hw/alpha/typhoon.c   |  6 +-
 hw/arm/smmu-common.c |  6 +-
 hw/hppa/dino.c   |  6 +-
 hw/i386/amd_iommu.c  |  6 +-
 hw/i386/intel_iommu.c|  6 +-
 hw/pci-host/designware.c |  6 +-
 hw/pci-host/pnv_phb3.c   |  6 +-
 hw/pci-host/pnv_phb4.c   |  6 +-
 hw/pci-host/ppce500.c|  6 +-
 hw/pci-host/prep.c   |  6 +-
 hw/pci-host/sabre.c  |  6 +-
 hw/pci/pci.c | 18 +-
 hw/ppc/ppc440_pcix.c |  6 +-
 hw/ppc/spapr_pci.c   |  6 +-
 hw/s390x/s390-pci-bus.c  |  8 ++--
 hw/virtio/virtio-iommu.c |  6 +-
 include/hw/pci/pci.h |  8 ++--
 include/hw/pci/pci_bus.h |  2 +-
 18 files changed, 96 insertions(+), 24 deletions(-)

diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
index 29d44df..c4ac693 100644
--- a/hw/alpha/typhoon.c
+++ b/hw/alpha/typhoon.c
@@ -740,6 +740,10 @@ static AddressSpace *typhoon_pci_dma_iommu(PCIBus *bus, 
void *opaque, int devfn)
return &s->pchip.iommu_as;
 }
 
+static const PCIIOMMUOps typhoon_iommu_ops = {
+.get_address_space = typhoon_pci_dma_iommu,
+};
+
 static void typhoon_set_irq(void *opaque, int irq, int level)
 {
 TyphoonState *s = opaque;
@@ -897,7 +901,7 @@ PCIBus *typhoon_init(MemoryRegion *ram, ISABus **isa_bus, 
qemu_irq *p_rtc_irq,
  "iommu-typhoon", UINT64_MAX);
address_space_init(&s->pchip.iommu_as, MEMORY_REGION(&s->pchip.iommu),
"pchip0-pci");
-pci_setup_iommu(b, typhoon_pci_dma_iommu, s);
+pci_setup_iommu(b, &typhoon_iommu_ops, s);
 
 /* Pchip0 PCI special/interrupt acknowledge, 0x801.F800., 64MB.  */
 memory_region_init_io(>pchip.reg_iack, OBJECT(s), _pci_iack_ops,
diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index e13a5f4..447146e 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -343,6 +343,10 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void 
*opaque, int devfn)
return &sdev->as;
 }
 
+static const PCIIOMMUOps smmu_ops = {
+.get_address_space = smmu_find_add_as,
+};
+
 IOMMUMemoryRegion *smmu_iommu_mr(SMMUState *s, uint32_t sid)
 {
 uint8_t bus_n, devfn;
@@ -437,7 +441,7 @@ static void smmu_base_realize(DeviceState *dev, Error 
**errp)
 s->smmu_pcibus_by_busptr = g_hash_table_new(NULL, NULL);
 
 if (s->primary_bus) {
-pci_setup_iommu(s->primary_bus, smmu_find_add_as, s);
+pci_setup_iommu(s->primary_bus, &smmu_ops, s);
 } else {
 error_setg(errp, "SMMU is not attached to any PCI bus!");
 }
diff --git a/hw/hppa/dino.c b/hw/hppa/dino.c
index 7f0c622..ca2dea4 100644
--- a/hw/hppa/dino.c
+++ b/hw/hppa/dino.c
@@ -459,6 +459,10 @@ static AddressSpace *dino_pcihost_set_iommu(PCIBus *bus, 
void *opaque,
return &s->bm_as;
 }
 
+static const PCIIOMMUOps dino_iommu_ops = {
+.get_address_space = dino_pcihost_set_iommu,
+};
+
 /*
  * Dino interrupts are connected as shown on Page 78, Table 23
  * (Little-endian bit numbers)
@@ -580,7 +584,7 @@ PCIBus *dino_init(MemoryRegion *addr_space,
memory_region_add_subregion(&s->bm, 0xfff0,
&s->bm_cpu_alias);
 address_space_init(>bm_as, >bm, "pci-bm");
-pci_setup_iommu(b, dino_pcihost_set_iommu, s);
+pci_setup_iommu(b, &dino_iommu_ops, s);
 
 *p_rtc_irq = qemu_allocate_irq(dino_set_timer_irq, s, 0);
 *p_ser_irq = qemu_allocate_irq(dino_set_serial_irq, s, 0);
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 087f601..77f183d 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1452,6 +1452,10 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, 
void *opaque, int devfn)
return &iommu_as[devfn]->as;
 }
 
+static const PCIIOMMUOps amdvi_iommu_ops = {
+.get_address_space = amdvi_host_dma_iommu,
+};
+
 static const MemoryRegionOps mmio_mem_ops = {
 .read = amdvi_mmio_read,
 .write = amdvi_mmio_write,
@@ -1579,7 +1583,7 @@ static void amdvi_realize(DeviceState *dev, Error **errp)
 
sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->mmio);
 sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, AMDVI_BASE_ADDR);
-pci_setup_iommu(bus, amdvi_host_dma_iommu, s);
+pci_setup_iommu(bus, &amdvi_iommu_ops, s);
s->devid = object_property_get_int(OBJECT(&s->pci), "addr", &error_abort);
msi_init(&s->pci.dev, 0, 1, true, false, errp);
 amdvi_init(s);
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 8703a2d..ca6dcad 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3734,6 +3734,10 
