date:20180713

Re: [Qemu-devel] [PATCH v4] linux-user: ppc64: use the correct values for F_*LK64s

2018-07-13 Thread Laurent Vivier

Le 14/07/2018 à 03:07, David Gibson a écrit :
> On Fri, Jul 13, 2018 at 07:34:46AM -0500, Shivaprasad G Bhat wrote:
>> Qemu includes the glibc headers for the host defines and target headers are
>> part of the qemu source themselves. The glibc has the F_GETLK64, F_SETLK64
>> and F_SETLKW64 defined to 12, 13 and 14 for all archs in
>> sysdeps/unix/sysv/linux/bits/fcntl-linux.h. The linux kernel generic
>> definition for F_*LK is 5, 6 & 7 and F_*LK64* is 12,13, and 14 as seen in
>> include/uapi/asm-generic/fcntl.h. On 64bit machine, by default the kernel
>> assumes all F_*LK to 64bit calls and doesnt support use of F_*LK64* as
>> can be seen in include/linux/fcntl.h in linux source.
>>
>> On x86_64 host, the values for F_*LK64* are set to 5, 6 and 7
>> explicitly in /usr/include/x86_64-linux-gnu/bits/fcntl.h by the glibc.
>> Whereas, a PPC64 host doesn't have such a definition in
>> /usr/include/powerpc64le-linux-gnu/bits/fcntl.h by the glibc. So,
>> the sources on PPC64 host sees the default value of F_*LK64*
>> as 12, 13 & 14(fcntl-linux.h).
>>
>> Since the 64bit kernel doesnt support 12, 13 & 14; the glibc fcntl syscall
>> implementation(__libc_fcntl*(), __fcntl64_nocancel) does the F_*LK64* value
>> convertion back to F_*LK* values on PPC64 as seen in
>> sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h with FCNTL_ADJUST_CMD()
>> macro. Whereas on x86_64 host the values for F_*LK64* are set to 5, 6 and 7
>> and no adjustments are needed.
>>
>> Since qemu doesnt use the glibc fcntl, but makes the safe_syscall* on its
>> own, the PPC64 qemu is calling the syscall with 12, 13, and 14(without
>> adjustment) and they all fail. The fcntl calls to F_GETLK/F_SETLK|W all
>> fail by all pplications run on PPC64 host user emulation.
>>
>> The fix here could be to see why on PPC64 the glibc is still keeping
>> F_*LK64* different from F_*LK and why adjusting them to 5, 6 and 7 before
>> the syscall for PPC only. See if we can make the
>> /usr/include/powerpc64le-linux-gnu/bits/fcntl.h to have the values
>> 5, 6 & 7 just like x86_64 and remove the adjustment code in glibc. That
>> way, qemu sources see the kernel supported values in glibc headers.
>>
>> OR
>>
>> On PPC64 host, qemu sources see both F_*LK & F_*LK64* as same and set to
>> 12, 13 and 14 because __USE_FILE_OFFSET64 is defined in qemu
>> sources(also refer sysdeps/unix/sysv/linux/bits/fcntl-linux.h).
>> Do the value adjustment just like it is done by glibc source by using
>> F_GETLK value of 5. That way, we make the syscalls with the actual
>> supported values in Qemu. The patch is taking this approach.
>>
>> Signed-off-by: Shivaprasad G Bhat 
> 
> Reviewed-by: David Gibson 
> 
> I'm not sure if this should go in through my tree or not.

I will take it through my linux-user tree, I have some fixes to send for
the next -rc.

Thanks,
Laurent




signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [Bug 1575607] Re: vm startup failed, qemu returned "kvm run failed Bad address"

2018-07-13 Thread Launchpad Bug Tracker

[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1575607

Title:
   vm startup failed, qemu returned "kvm run failed Bad address"

Status in QEMU:
  Expired

Bug description:
create a virtual machine and start by libvirt. vm startup failed, qemu 
returned "kvm run failed Bad address"
the error log is :

  error: kvm run failed Bad address

  EAX=7ffc EBX=7ffbffd0 ECX=fff0 EDX=002c

  ESI=6f5c EDI=7ffbffd0 EBP=7ffc ESP=6f34

  EIP=000dec7b EFL=00010046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0

  ES =0010   00c09300 DPL=0 DS   [-WA]

  CS =0008   00c09b00 DPL=0 CS32 [-RA]

  SS =0010   00c09300 DPL=0 DS   [-WA]

  DS =0010   00c09300 DPL=0 DS   [-WA]

  FS =0010   00c09300 DPL=0 DS   [-WA]

  GS =0010   00c09300 DPL=0 DS   [-WA]

  LDT=   8200 DPL=0 LDT

  TR =   8b00 DPL=0 TSS32-busy

  GDT= 000f6a80 0037

  IDT= 000f6abe 

  CR0=6011 CR2= CR3= CR4=

  DR0= DR1= DR2=
  DR3=

  DR6=0ff0 DR7=0400

  EFER=

  Code=c3 29 d3 21 cb 39 c3 77 27 3b 5e 0c 72 22 85 ff 75 02 89 df <89>
  5f 08 01 da 89 57 0c 89 47 10 89 5e 10 8b 56 04 89 f8 e8 8c fc ff ff
  89 d8 eb 06 8b 36

we add numa in the vm, the numatune info is:


  



   the version of qemu is 2.5.0.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1575607/+subscriptions

Re: [Qemu-devel] [PATCH] vhost: fix invalid downcast

2018-07-13 Thread Jia He




On 7/14/2018 12:15 AM, Michael S. Tsirkin Wrote:
> On Fri, Jul 13, 2018 at 05:04:05PM +0300, Yury Kotov wrote:
>> virtio_queue_get_desc_addr returns 64-bit hwaddr while int is usually 32-bit.
>> If returned hwaddr is not equal to 0 but least-significant 32 bits are
>> equal to 0 then this code will not actually stop running queue.
>>
>> Signed-off-by: Yury Kotov 
> 
> So IIUC
> 
> Fixes: fb20fbb764aa1 ("vhost: avoid to start/stop virtqueue which is not 
> ready")

> And 
> Cc: qemu-sta...@nongnu.org
> 
>> ---
>>  hw/virtio/vhost.c | 4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index b129cb9..7edeee7 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -1071,10 +1071,8 @@ static void vhost_virtqueue_stop(struct vhost_dev 
>> *dev,
>>  .index = vhost_vq_index,
>>  };
>>  int r;
>> -int a;
>>  
>> -a = virtio_queue_get_desc_addr(vdev, idx);
>> -if (a == 0) {
>> +if (virtio_queue_get_desc_addr(vdev, idx) == 0) {
>>  /* Don't stop the virtqueue which might have not been started */
>>  return;
>>  }
>> -- 
>> 2.7.4
> 
yes, it is a bug introduced by fb20fbb764aa1
Acked-by: Jia He 
-- 
Cheers,
Jia

Re: [Qemu-devel] [PATCH v4] linux-user: ppc64: use the correct values for F_*LK64s

2018-07-13 Thread David Gibson

On Fri, Jul 13, 2018 at 07:34:46AM -0500, Shivaprasad G Bhat wrote:
> Qemu includes the glibc headers for the host defines and target headers are
> part of the qemu source themselves. The glibc has the F_GETLK64, F_SETLK64
> and F_SETLKW64 defined to 12, 13 and 14 for all archs in
> sysdeps/unix/sysv/linux/bits/fcntl-linux.h. The linux kernel generic
> definition for F_*LK is 5, 6 & 7 and F_*LK64* is 12,13, and 14 as seen in
> include/uapi/asm-generic/fcntl.h. On 64bit machine, by default the kernel
> assumes all F_*LK to 64bit calls and doesnt support use of F_*LK64* as
> can be seen in include/linux/fcntl.h in linux source.
> 
> On x86_64 host, the values for F_*LK64* are set to 5, 6 and 7
> explicitly in /usr/include/x86_64-linux-gnu/bits/fcntl.h by the glibc.
> Whereas, a PPC64 host doesn't have such a definition in
> /usr/include/powerpc64le-linux-gnu/bits/fcntl.h by the glibc. So,
> the sources on PPC64 host sees the default value of F_*LK64*
> as 12, 13 & 14(fcntl-linux.h).
> 
> Since the 64bit kernel doesnt support 12, 13 & 14; the glibc fcntl syscall
> implementation(__libc_fcntl*(), __fcntl64_nocancel) does the F_*LK64* value
> convertion back to F_*LK* values on PPC64 as seen in
> sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h with FCNTL_ADJUST_CMD()
> macro. Whereas on x86_64 host the values for F_*LK64* are set to 5, 6 and 7
> and no adjustments are needed.
> 
> Since qemu doesnt use the glibc fcntl, but makes the safe_syscall* on its
> own, the PPC64 qemu is calling the syscall with 12, 13, and 14(without
> adjustment) and they all fail. The fcntl calls to F_GETLK/F_SETLK|W all
> fail by all pplications run on PPC64 host user emulation.
> 
> The fix here could be to see why on PPC64 the glibc is still keeping
> F_*LK64* different from F_*LK and why adjusting them to 5, 6 and 7 before
> the syscall for PPC only. See if we can make the
> /usr/include/powerpc64le-linux-gnu/bits/fcntl.h to have the values
> 5, 6 & 7 just like x86_64 and remove the adjustment code in glibc. That
> way, qemu sources see the kernel supported values in glibc headers.
> 
> OR
> 
> On PPC64 host, qemu sources see both F_*LK & F_*LK64* as same and set to
> 12, 13 and 14 because __USE_FILE_OFFSET64 is defined in qemu
> sources(also refer sysdeps/unix/sysv/linux/bits/fcntl-linux.h).
> Do the value adjustment just like it is done by glibc source by using
> F_GETLK value of 5. That way, we make the syscalls with the actual
> supported values in Qemu. The patch is taking this approach.
> 
> Signed-off-by: Shivaprasad G Bhat 

Reviewed-by: David Gibson 

I'm not sure if this should go in through my tree or not.

> ---
>  v3 - https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02923.html
>  Changes from v3:
> - Fixed the tabs for case statements
> - Addressed the comments on v3 wrt to the variable initialisation
>   and break from default case.
>  v2 - https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02920.html
>  Changes from v2:
> - Fixed the braces, and indentation for comments.
>  v1 - https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02567.html
>  Changes from v1:
> - Changed the overwrite of F*LK64* with 5, 6 and 7 in using #define
>   instead using the adjustment code similar to glibc as suggested.
> - Dropped __linux__ check for the adjustment code as suggested.
> - Moved the adjustment code inside target_to_host_fcntl_cmd to address
>   all possible|future cases.
> 
>  linux-user/syscall.c |  126 
> --
>  1 file changed, 80 insertions(+), 46 deletions(-)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 643b8833de..b5274f657a 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -6475,63 +6475,97 @@ static int do_fork(CPUArchState *env, unsigned int 
> flags, abi_ulong newsp,
>  /* warning : doesn't handle linux specific flags... */
>  static int target_to_host_fcntl_cmd(int cmd)
>  {
> +int ret;
> +
>  switch(cmd) {
> - case TARGET_F_DUPFD:
> - case TARGET_F_GETFD:
> - case TARGET_F_SETFD:
> - case TARGET_F_GETFL:
> - case TARGET_F_SETFL:
> -return cmd;
> -case TARGET_F_GETLK:
> -return F_GETLK64;
> -case TARGET_F_SETLK:
> -return F_SETLK64;
> -case TARGET_F_SETLKW:
> -return F_SETLKW64;
> - case TARGET_F_GETOWN:
> - return F_GETOWN;
> - case TARGET_F_SETOWN:
> - return F_SETOWN;
> - case TARGET_F_GETSIG:
> - return F_GETSIG;
> - case TARGET_F_SETSIG:
> - return F_SETSIG;
> +case TARGET_F_DUPFD:
> +case TARGET_F_GETFD:
> +case TARGET_F_SETFD:
> +case TARGET_F_GETFL:
> +case TARGET_F_SETFL:
> +ret = cmd;
> +break;
> +case TARGET_F_GETLK:
> +ret = F_GETLK64;
> +break;
> +case TARGET_F_SETLK:
> +ret = F_SETLK64;
> +break;
> +case TARGET_F_SETLKW:
> +

Re: [Qemu-devel] [PATCH v2 1/5] i386: Add support for IA32_PRED_CMD and IA32_ARCH_CAPABILITIES MSRs

2018-07-13 Thread Robert Hoo

On Fri, 2018-07-13 at 10:11 -0400, konrad.w...@oracle.com wrote:
> (Apologies if this comes out as HTML, using Thunderbird instead of mutt 
> here)..
> 
> > +uint64_t pred_cmd;
> > +uint64_t arch_capabilities;
> 
> Could this be 'arch_cap' ?
> 
> >   
> >   /* End of state preserved by INIT (dummy marker).  */
> >   struct {} end_init_save;
> > diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> > index 2d174f3..3aab182 100644
> > --- a/target/i386/kvm.c
> > +++ b/target/i386/kvm.c
> > @@ -93,6 +93,8 @@ static bool has_msr_hv_reenlightenment;
> >   static bool has_msr_xss;
> >   static bool has_msr_spec_ctrl;
> >   static bool has_msr_virt_ssbd;
> > +static bool has_msr_pred_cmd;
> > +static bool has_msr_arch_capabilities;
> 
> Ditto here?
> 

Per Paolo and Eduardo's comments, the 2 fields/variables are gone in new
versions.

Re: [Qemu-devel] [PATCH v2 01/16] qom/object: Add a new function object_initialize_child()

2018-07-13 Thread Eduardo Habkost

On Fri, Jul 13, 2018 at 10:27:29AM +0200, Thomas Huth wrote:
> A lot of code is using the object_initialize() function followed by a call
> to object_property_add_child() to add the newly initialized object as a child
> of the current object. Both functions increase the reference counter of the
> new object, but many spots that call these two functions then forget to drop
> one of the superfluous references. So the newly created object is often not
> cleaned up correctly when the parent is destroyed. In the worst case, this
> can cause crashes, e.g. because device objects are not correctly removed from
> their parent_bus.
> 
> Since this is a common pattern between many code spots, let's introdcue a
> new function that takes care of calling all three required initialization
> functions, first object_initialize(), then object_property_add_child() and
> finally object_unref().
> 
> And while we're at object.h, also fix some copy-n-paste errors in the
> comments there ("to store the area" --> "to store the error").
> 
> Signed-off-by: Thomas Huth 

Potential candidates for using the new function, found using the
following Coccinelle patch:

@@
expression child, size, type, parent, errp, propname;
@@
-object_initialize(child, size, type);
-object_property_add_child(
+object_initialize_child(
   parent, propname, 
-  OBJECT(child),
+  child, size, type,
   errp);

Some of them (very few) already call object_unref() and need to
be fixed manually.

Most of the remaining ~50 object_initialize() callers are also
candidates, even if they don't call object_property_add_child()
today.

Signed-off-by: Eduardo Habkost 
---
diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index bb9d33848d..e5923f85af 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -189,8 +189,8 @@ static void aspeed_board_init(MachineState *machine,
 DriveInfo *drive0 = drive_get(IF_MTD, 0, 0);
 
 bmc = g_new0(AspeedBoardState, 1);
-object_initialize(>soc, (sizeof(bmc->soc)), cfg->soc_name);
-object_property_add_child(OBJECT(machine), "soc", OBJECT(>soc),
+object_initialize_child(OBJECT(machine), "soc",
+  >soc, (sizeof(bmc->soc)), cfg->soc_name,
   _abort);
 
 sc = ASPEED_SOC_GET_CLASS(>soc);
diff --git a/hw/arm/aspeed_soc.c b/hw/arm/aspeed_soc.c
index e68911af0f..994262756f 100644
--- a/hw/arm/aspeed_soc.c
+++ b/hw/arm/aspeed_soc.c
@@ -106,11 +106,11 @@ static void aspeed_soc_init(Object *obj)
 AspeedSoCClass *sc = ASPEED_SOC_GET_CLASS(s);
 int i;
 
-object_initialize(>cpu, sizeof(s->cpu), sc->info->cpu_type);
-object_property_add_child(obj, "cpu", OBJECT(>cpu), NULL);
+object_initialize_child(obj, "cpu",  >cpu, sizeof(s->cpu),
+sc->info->cpu_type, NULL);
 
-object_initialize(>scu, sizeof(s->scu), TYPE_ASPEED_SCU);
-object_property_add_child(obj, "scu", OBJECT(>scu), NULL);
+object_initialize_child(obj, "scu",  >scu, sizeof(s->scu),
+TYPE_ASPEED_SCU, NULL);
 qdev_set_parent_bus(DEVICE(>scu), sysbus_get_default());
 qdev_prop_set_uint32(DEVICE(>scu), "silicon-rev",
  sc->info->silicon_rev);
@@ -121,35 +121,34 @@ static void aspeed_soc_init(Object *obj)
 object_property_add_alias(obj, "hw-prot-key", OBJECT(>scu),
   "hw-prot-key", _abort);
 
-object_initialize(>vic, sizeof(s->vic), TYPE_ASPEED_VIC);
-object_property_add_child(obj, "vic", OBJECT(>vic), NULL);
+object_initialize_child(obj, "vic",  >vic, sizeof(s->vic),
+TYPE_ASPEED_VIC, NULL);
 qdev_set_parent_bus(DEVICE(>vic), sysbus_get_default());
 
-object_initialize(>timerctrl, sizeof(s->timerctrl), TYPE_ASPEED_TIMER);
-object_property_add_child(obj, "timerctrl", OBJECT(>timerctrl), NULL);
+object_initialize_child(obj, "timerctrl",  >timerctrl,
+sizeof(s->timerctrl), TYPE_ASPEED_TIMER, NULL);
 object_property_add_const_link(OBJECT(>timerctrl), "scu",
OBJECT(>scu), _abort);
 qdev_set_parent_bus(DEVICE(>timerctrl), sysbus_get_default());
 
-object_initialize(>i2c, sizeof(s->i2c), TYPE_ASPEED_I2C);
-object_property_add_child(obj, "i2c", OBJECT(>i2c), NULL);
+object_initialize_child(obj, "i2c",  >i2c, sizeof(s->i2c),
+TYPE_ASPEED_I2C, NULL);
 qdev_set_parent_bus(DEVICE(>i2c), sysbus_get_default());
 
-object_initialize(>fmc, sizeof(s->fmc), sc->info->fmc_typename);
-object_property_add_child(obj, "fmc", OBJECT(>fmc), NULL);
+object_initialize_child(obj, "fmc",  >fmc, sizeof(s->fmc),
+sc->info->fmc_typename, NULL);
 qdev_set_parent_bus(DEVICE(>fmc), sysbus_get_default());
 object_property_add_alias(obj, "num-cs", OBJECT(>fmc), "num-cs",

Re: [Qemu-devel] [PATCH v2] dump: add kernel_gs_base to QEMU CPU state

2018-07-13 Thread Eduardo Habkost

On Thu, Jul 12, 2018 at 08:29:27PM +0300, Viktor Prutyanov wrote:
> This patch adds field with content of KERNEL_GS_BASE MSR to QEMU note in
> ELF dump.
> 
> On Windows, if all vCPUs are running usermode tasks at the time the dump is
> created, this can be helpful in the discovery of guest system structures
> during conversion ELF dump to MEMORY.DMP dump.
> 
> Signed-off-by: Viktor Prutyanov 
> ---
>  v2: keep version 1 in QEMUCPUState and document the extension procedure
> 
>  target/i386/arch_dump.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/target/i386/arch_dump.c b/target/i386/arch_dump.c
> index 35b55fc..cc8750f 100644
> --- a/target/i386/arch_dump.c
> +++ b/target/i386/arch_dump.c
> @@ -258,6 +258,12 @@ struct QEMUCPUState {
>  QEMUCPUSegment cs, ds, es, fs, gs, ss;
>  QEMUCPUSegment ldt, tr, gdt, idt;
>  uint64_t cr[5];
> +/*
> + * Fields below are optional and are being added at the end without
> + * changing the version. External tools may identify their presence
> + * by checking 'size' field.
> + */
> +uint64_t kernel_gs_base;
>  };
>  
>  typedef struct QEMUCPUState QEMUCPUState;
> @@ -315,6 +321,8 @@ static void qemu_get_cpustate(QEMUCPUState *s, 
> CPUX86State *env)
>  s->cr[2] = env->cr[2];
>  s->cr[3] = env->cr[3];
>  s->cr[4] = env->cr[4];
> +
> +s->kernel_gs_base = env->kernelgsbase;

This breaks i386-softmmu:

/home/travis/build/ehabkost/qemu/target/i386/arch_dump.c: In function 
‘qemu_get_cpustate’:
/home/travis/build/ehabkost/qemu/target/i386/arch_dump.c:325:28: error: 
‘CPUX86State’ has no member named ‘kernelgsbase’
 s->kernel_gs_base = env->kernelgsbase;
^
make[1]: *** [target/i386/arch_dump.o] Error 1
make: *** [subdir-i386-softmmu] Error 2
make: *** Waiting for unfinished jobs

-- 
Eduardo

Re: [Qemu-devel] [PATCH v2 00/16] Fix crashes with introspection of ARM devices

2018-07-13 Thread Eduardo Habkost

On Fri, Jul 13, 2018 at 10:27:28AM +0200, Thomas Huth wrote:
> As discovered recently, you can crash QEMU with a lot of devices
> that do not get the reference counting of child objects right.
> You just have to run 'device-list-properties' and call 'info qtree'
> afterwards.
> This patch series fixes a bunch of these problems in the ARM code.
> I did not fix all problems yet, since it is quite time consuming,
> and this series is big enough already. The remaining problems can
> be fixed in an independent patch series later.
> 
> v2:
>  - Updated the first patch according to the review feedback from v1
>  - Added more patches with additional fixes

OK, I will stop replying to each individual patch and just send
a single one for the series:

Reviewed-by: Eduardo Habkost 

-- 
Eduardo

Re: [Qemu-devel] [PATCH v2 09/16] hw/cpu/a9mpcore: Fix introspection problems with the "a9mpcore_priv" device

2018-07-13 Thread Eduardo Habkost

On Fri, Jul 13, 2018 at 10:27:37AM +0200, Thomas Huth wrote:
> Running QEMU with valgrind indicates a problem here:
> 
> echo "{'execute':'qmp_capabilities'} {'execute':'device-list-properties'," \
>  "'arguments':{'typename':'a9mpcore_priv'}}" \
>  "{'execute': 'human-monitor-command', " \
>  "'arguments': {'command-line': 'info qtree'}}" | \
>  valgrind -q aarch64-softmmu/qemu-system-aarch64 -M none,accel=qtest -qmp 
> stdio
> [...]
> ==30996== Invalid read of size 8
> ==30996==at 0x6185DA: qdev_print (qdev-monitor.c:686)
> ==30996==by 0x6185DA: qbus_print (qdev-monitor.c:719)
> ==30996==by 0x452B38: handle_hmp_command (monitor.c:3446)
> [...]
> 
> Use the new sysbus_init_child_obj() function to make sure that the objects
> are cleaned up correctly when the parent gets destroyed.
> 
> Signed-off-by: Thomas Huth 

Reviewed-by: Eduardo Habkost 

-- 
Eduardo

Re: [Qemu-devel] [PATCH v2 08/16] hw/arm/msf2-soc: Fix introspection problem with the "msf2-soc" device

2018-07-13 Thread Eduardo Habkost

On Fri, Jul 13, 2018 at 10:27:36AM +0200, Thomas Huth wrote:
> Valgrind currently reports a problem when running QEMU like this:
> 
> echo "{'execute':'qmp_capabilities'} {'execute':'device-list-properties'," \
>  "'arguments':{'typename':'msf2-soc'}}" \
>  "{'execute': 'human-monitor-command', " \
>  "'arguments': {'command-line': 'info qtree'}}" | \
>  valgrind -q aarch64-softmmu/qemu-system-aarch64 -M none,accel=qtest -qmp 
> stdio
> [...]
> ==23097== Invalid read of size 8
> ==23097==at 0x6192AA: qdev_print (qdev-monitor.c:686)
> ==23097==by 0x6192AA: qbus_print (qdev-monitor.c:719)
> [...]
> 
> Use the new sysbus_init_child_obj() function to make sure that the child
> objects are cleaned up correctly when the parent gets destroyed.
> 
> Signed-off-by: Thomas Huth 

Reviewed-by: Eduardo Habkost 

-- 
Eduardo

Re: [Qemu-devel] [PATCH v2 07/16] hw/arm/xlnx-zynqmp: Fix crash when introspecting the "xlnx, zynqmp" device

2018-07-13 Thread Eduardo Habkost

On Fri, Jul 13, 2018 at 10:27:35AM +0200, Thomas Huth wrote:
> QEMU currently crashes when e.g. doing something like this:
> 
> echo "{'execute':'qmp_capabilities'} {'execute':'device-list-properties'," \
>  "'arguments':{'typename':'xlnx,zynqmp'}}" \
>  "{'execute': 'human-monitor-command', " \
>  "'arguments': {'command-line': 'info qtree'}}" \
>  |  aarch64-softmmu/qemu-system-aarch64 -M none,accel=qtest -qmp stdio
> 
> Use the new object_initialize_child() and sysbus_init_child_obj()
> functions to get the refernce counting of the child objects right, so
> that they are properly cleaned up when the parent gets destroyed.
> 
> Signed-off-by: Thomas Huth 

Reviewed-by: Eduardo Habkost 

-- 
Eduardo

Re: [Qemu-devel] [PATCH v2 05/16] hw/cpu/a15mpcore: Fix introspection problem with the a15mpcore_priv device

2018-07-13 Thread Eduardo Habkost

On Fri, Jul 13, 2018 at 10:27:33AM +0200, Thomas Huth wrote:
> There is a memory management problem when introspecting the a15mpcore_priv
> device. It can be seen with valgrind when running QEMU like this:
> 
> echo "{'execute':'qmp_capabilities'} {'execute':'device-list-properties'," \
>  "'arguments':{'typename':'a15mpcore_priv'}}"\
>  "{'execute': 'human-monitor-command', " \
>  "'arguments': {'command-line': 'info qtree'}}"  | \
>  valgrind -q aarch64-softmmu/qemu-system-aarch64 -M none,accel=qtest -qmp 
> stdio
> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2},
>  "package": "build-all"}, "capabilities": []}}
> {"return": {}}
> {"return": [{"name": "num-cpu", "type": "uint32"}, {"name": "num-irq",
>  "type": "uint32"}, {"name": "a15mp-priv-container[0]", "type":
>   "child"}]}
> ==24978== Invalid read of size 8
> ==24978==at 0x618EBA: qdev_print (qdev-monitor.c:686)
> ==24978==by 0x618EBA: qbus_print (qdev-monitor.c:719)
> [...]
> 
> Use the new sysbus_init_child_obj() function to make sure that we get
> the reference counting of the child objects right.
> 
> Signed-off-by: Thomas Huth 
> ---
>  hw/cpu/a15mpcore.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/cpu/a15mpcore.c b/hw/cpu/a15mpcore.c
> index bc05152..43c1079 100644
> --- a/hw/cpu/a15mpcore.c
> +++ b/hw/cpu/a15mpcore.c
> @@ -35,15 +35,13 @@ static void a15mp_priv_initfn(Object *obj)
>  {
>  SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
>  A15MPPrivState *s = A15MPCORE_PRIV(obj);
> -DeviceState *gicdev;
>  
>  memory_region_init(>container, obj, "a15mp-priv-container", 0x8000);
>  sysbus_init_mmio(sbd, >container);
>  
> -object_initialize(>gic, sizeof(s->gic), gic_class_name());
> -gicdev = DEVICE(>gic);
> -qdev_set_parent_bus(gicdev, sysbus_get_default());
> -qdev_prop_set_uint32(gicdev, "revision", 2);
> +sysbus_init_child_obj(obj, "gic", >gic, sizeof(s->gic),
> +  gic_class_name());
> +qdev_prop_set_uint32(DEVICE(>gic), "revision", 2);

I assume qdev_set_parent_bus() won't trigger any code that looks
at "revision", so the prop_set/set_parent_bus ordering change
won't matter.

Reviewed-by: Eduardo Habkost 

-- 
Eduardo

Re: [Qemu-devel] [PATCH v2 01/16] qom/object: Add a new function object_initialize_child()

2018-07-13 Thread Eduardo Habkost

On Fri, Jul 13, 2018 at 11:29:17PM +0200, Andreas Färber wrote:
> Am 13.07.2018 um 23:16 schrieb Eduardo Habkost:
> > I wonder if we should deprecate object_initialize() and support
> > only object_initialize_child() later.  Initializing an object
> > contained inside another one without making it a child of the
> > parent object is a recipe for trouble.
> 
> The root container object needs to be initialized, too.

If the object is not embedded in another struct, I would expect
it to be created using object_new() instead of
object_initialize().

-- 
Eduardo

Re: [Qemu-devel] [PATCH v2 04/16] hw/arm/armv7: Fix crash when introspecting the "iotkit" device

2018-07-13 Thread Eduardo Habkost

On Fri, Jul 13, 2018 at 10:27:32AM +0200, Thomas Huth wrote:
> QEMU currently crashes when introspecting the "iotkit" device and
> runnint "info qtree" afterwards, e.g. when running QEMU like this:
> 
> echo "{'execute':'qmp_capabilities'} {'execute':'device-list-properties'," \
>  "'arguments':{'typename':'iotkit'}}" "{'execute': 'human-monitor-command', " 
> \
>  "'arguments': {'command-line': 'info qtree'}}" | \
>  aarch64-softmmu/qemu-system-aarch64 -M none,accel=qtest -qmp stdio
> 
> Use the new functions object_initialize_child() and sysbus_init_child_obj()
> to make sure that all objects get cleaned up correctly when the instances
> are destroyed.
> 
> Signed-off-by: Thomas Huth 

Reviewed-by: Eduardo Habkost 

-- 
Eduardo

Re: [Qemu-devel] [PATCH v2 01/16] qom/object: Add a new function object_initialize_child()

2018-07-13 Thread Andreas Färber

Am 13.07.2018 um 23:16 schrieb Eduardo Habkost:
> I wonder if we should deprecate object_initialize() and support
> only object_initialize_child() later.  Initializing an object
> contained inside another one without making it a child of the
> parent object is a recipe for trouble.

The root container object needs to be initialized, too.

Cheers,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

Re: [Qemu-devel] [PATCH v2 03/16] hw/arm/bcm2836: Fix crash with device_add bcm2837 on unsupported machines

2018-07-13 Thread Eduardo Habkost

On Fri, Jul 13, 2018 at 10:27:31AM +0200, Thomas Huth wrote:
> When trying to "device_add bcm2837" on a machine that is not suitable for
> this device, you can quickly crash QEMU afterwards, e.g. with "info qtree":
> 
> echo "{'execute':'qmp_capabilities'} {'execute':'device_add', " \
>  "'arguments':{'driver':'bcm2837'}} {'execute': 'human-monitor-command', " \
>  "'arguments': {'command-line': 'info qtree'}}" | \
>  aarch64-softmmu/qemu-system-aarch64 -M integratorcp,accel=qtest -S -qmp stdio
> 
> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2},
>  "package": "build-all"}, "capabilities": []}}
> {"return": {}}
> {"error": {"class": "GenericError", "desc": "Device 'bcm2837' can not be
>  hotplugged on this machine"}}
> Segmentation fault (core dumped)
> 
> The qdev_set_parent_bus() from instance_init adds a link to the child devices
> which is not valid anymore after the bcm2837 instance has been destroyed.
> Unfortunately, the child devices do not get destroyed / unlinked correctly
> because both object_initialize() and object_property_add_child() increase
> the reference count of the child objects by one, but only one reference
> is dropped when the parent gets removed. So let's use the new functions
> object_initialize_child() and sysbus_init_child_obj() instead to create
> the objects, which will take care of creating the child objects with the
> correct reference count of one.
> 
> Signed-off-by: Thomas Huth 

Reviewed-by: Eduardo Habkost 

The usage of _abort in code that can be triggered from
device-list-properties still makes me nervous, but that's a
separate issue.


> ---
>  hw/arm/bcm2836.c | 18 ++
>  1 file changed, 6 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
> index 6805a7d..af97b2f 100644
> --- a/hw/arm/bcm2836.c
> +++ b/hw/arm/bcm2836.c
> @@ -51,25 +51,19 @@ static void bcm2836_init(Object *obj)
>  int n;
>  
>  for (n = 0; n < BCM283X_NCPUS; n++) {
> -object_initialize(>cpus[n], sizeof(s->cpus[n]),
> -  info->cpu_type);
> -object_property_add_child(obj, "cpu[*]", OBJECT(>cpus[n]),
> -  _abort);
> +object_initialize_child(obj, "cpu[*]", >cpus[n], 
> sizeof(s->cpus[n]),
> +info->cpu_type, _abort);
>  }
>  
> -object_initialize(>control, sizeof(s->control), TYPE_BCM2836_CONTROL);
> -object_property_add_child(obj, "control", OBJECT(>control), NULL);
> -qdev_set_parent_bus(DEVICE(>control), sysbus_get_default());
> +sysbus_init_child_obj(obj, "control", >control, sizeof(s->control),
> +  TYPE_BCM2836_CONTROL);
>  
> -object_initialize(>peripherals, sizeof(s->peripherals),
> -  TYPE_BCM2835_PERIPHERALS);
> -object_property_add_child(obj, "peripherals", OBJECT(>peripherals),
> -  _abort);
> +sysbus_init_child_obj(obj, "peripherals", >peripherals,
> +  sizeof(s->peripherals), TYPE_BCM2835_PERIPHERALS);
>  object_property_add_alias(obj, "board-rev", OBJECT(>peripherals),
>"board-rev", _abort);
>  object_property_add_alias(obj, "vcram-size", OBJECT(>peripherals),
>"vcram-size", _abort);
> -qdev_set_parent_bus(DEVICE(>peripherals), sysbus_get_default());
>  }
>  
>  static void bcm2836_realize(DeviceState *dev, Error **errp)
> -- 
> 1.8.3.1
> 

-- 
Eduardo

Re: [Qemu-devel] [PATCH v2 02/16] hw/core/sysbus: Add a function for creating and attaching an object

2018-07-13 Thread Eduardo Habkost

On Fri, Jul 13, 2018 at 10:27:30AM +0200, Thomas Huth wrote:
> A lot of functions are initializing an object and attach it immediately
> afterwards to the system bus. Provide a common function for this, which
> also uses object_initialize_child() to make sure that the reference
> counter is correctly initialized to 1 afterwards.
> 
> Signed-off-by: Thomas Huth 
> ---
>  hw/core/sysbus.c| 8 
>  include/hw/sysbus.h | 3 +++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/hw/core/sysbus.c b/hw/core/sysbus.c
> index ecfb0cf..e2436ce 100644
> --- a/hw/core/sysbus.c
> +++ b/hw/core/sysbus.c
> @@ -376,6 +376,14 @@ BusState *sysbus_get_default(void)
>  return main_system_bus;
>  }
>  
> +void sysbus_init_child_obj(Object *parent, const char *childname, void 
> *child,
> +   size_t childsize, const char *childtype)
> +{
> +object_initialize_child(parent, childname, child, childsize, childtype,
> +_abort);
> +qdev_set_parent_bus(DEVICE(child), sysbus_get_default());
> +}
> +
>  static void sysbus_register_types(void)
>  {
>  type_register_static(_bus_info);
> diff --git a/include/hw/sysbus.h b/include/hw/sysbus.h
> index e88bb6d..e405232 100644
> --- a/include/hw/sysbus.h
> +++ b/include/hw/sysbus.h
> @@ -96,6 +96,9 @@ void sysbus_add_io(SysBusDevice *dev, hwaddr addr,
> MemoryRegion *mem);
>  MemoryRegion *sysbus_address_space(SysBusDevice *dev);
>  
> +void sysbus_init_child_obj(Object *parent, const char *childname, void 
> *child,
> +   size_t childsize, const char *childtype);
> +

Documentation about the reference ownership rules would be nice,
but I don't think this should block the bug fixes.

Reviewed-by: Eduardo Habkost 

-- 
Eduardo

Re: [Qemu-devel] [PATCH v2 01/16] qom/object: Add a new function object_initialize_child()

2018-07-13 Thread Eduardo Habkost

On Fri, Jul 13, 2018 at 10:27:29AM +0200, Thomas Huth wrote:
> A lot of code is using the object_initialize() function followed by a call
> to object_property_add_child() to add the newly initialized object as a child
> of the current object. Both functions increase the reference counter of the
> new object, but many spots that call these two functions then forget to drop
> one of the superfluous references. So the newly created object is often not
> cleaned up correctly when the parent is destroyed. In the worst case, this
> can cause crashes, e.g. because device objects are not correctly removed from
> their parent_bus.
> 
> Since this is a common pattern between many code spots, let's introdcue a
> new function that takes care of calling all three required initialization
> functions, first object_initialize(), then object_property_add_child() and
> finally object_unref().
> 
> And while we're at object.h, also fix some copy-n-paste errors in the
> comments there ("to store the area" --> "to store the error").
> 
> Signed-off-by: Thomas Huth 
> ---
>  include/qom/object.h | 23 +--
>  qom/object.c | 15 +++
>  2 files changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/include/qom/object.h b/include/qom/object.h
> index f3d2308..3362db0 100644
> --- a/include/qom/object.h
> +++ b/include/qom/object.h
> @@ -749,6 +749,25 @@ int object_set_propv(Object *obj,
>  void object_initialize(void *obj, size_t size, const char *typename);
>  
>  /**
> + * object_initialize_child:
> + * @parentobj: The parent object to add a property to
> + * @propname: The name of the property
> + * @childobj: A pointer to the memory to be used for the object.
> + * @size: The maximum size available at @obj for the object.
> + * @type: The name of the type of the object to instantiate.
> + * @errp: If an error occurs, a pointer to an area to store the error
> + *
> + * This function will initialize an object. The memory for the object should
> + * have already been allocated. The object will then be added as child 
> property
> + * to a parent with object_property_add_child() function. The returned object
> + * has a reference count of 1 (for the "child<...>" property from the 
> parent),
> + * so the object will get finalized automatically when the parent gets 
> removed.
> + */
> +void object_initialize_child(Object *parentobj, const char *propname,
> + void *childobj, size_t size, const char *type,
> + Error **errp);

I wonder if we should deprecate object_initialize() and support
only object_initialize_child() later.  Initializing an object
contained inside another one without making it a child of the
parent object is a recipe for trouble.

Reviewed-by: Eduardo Habkost 

-- 
Eduardo

Re: [Qemu-devel] [RFC v3 2/2] virtio-pmem: Add virtio pmem driver

2018-07-13 Thread Luiz Capitulino

On Fri, 13 Jul 2018 13:22:31 +0530
Pankaj Gupta  wrote:

> This patch adds virtio-pmem driver for KVM guest.
> 
> Guest reads the persistent memory range information from Qemu over 
> VIRTIO and registers it on nvdimm_bus. It also creates a nd_region 
> object with the persistent memory range information so that existing 
> 'nvdimm/pmem' driver can reserve this into system memory map. This way 
> 'virtio-pmem' driver uses existing functionality of pmem driver to 
> register persistent memory compatible for DAX capable filesystems.
> 
> This also provides function to perform guest flush over VIRTIO from 
> 'pmem' driver when userspace performs flush on DAX memory range.
> 
> Signed-off-by: Pankaj Gupta 
> ---
>  drivers/virtio/Kconfig   |   9 ++
>  drivers/virtio/Makefile  |   1 +
>  drivers/virtio/virtio_pmem.c | 190 
> +++
>  include/linux/virtio_pmem.h  |  44 +
>  include/uapi/linux/virtio_ids.h  |   1 +
>  include/uapi/linux/virtio_pmem.h |  40 +
>  6 files changed, 285 insertions(+)
>  create mode 100644 drivers/virtio/virtio_pmem.c
>  create mode 100644 include/linux/virtio_pmem.h
>  create mode 100644 include/uapi/linux/virtio_pmem.h
> 
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 3589764..a331e23 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -42,6 +42,15 @@ config VIRTIO_PCI_LEGACY
>  
> If unsure, say Y.
>  
> +config VIRTIO_PMEM
> + tristate "Support for virtio pmem driver"
> + depends on VIRTIO
> + help
> + This driver provides support for virtio based flushing interface
> + for persistent memory range.
> +
> + If unsure, say M.
> +
>  config VIRTIO_BALLOON
>   tristate "Virtio balloon driver"
>   depends on VIRTIO
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 3a2b5c5..cbe91c6 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> +obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o
> diff --git a/drivers/virtio/virtio_pmem.c b/drivers/virtio/virtio_pmem.c
> new file mode 100644
> index 000..6200b5e
> --- /dev/null
> +++ b/drivers/virtio/virtio_pmem.c
> @@ -0,0 +1,190 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + */
> +#include 
> +#include 
> +#include 
> +
> +static struct virtio_device_id id_table[] = {
> + { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> + { 0 },
> +};
> +
> + /* The interrupt handler */
> +static void host_ack(struct virtqueue *vq)
> +{
> + unsigned int len;
> + unsigned long flags;
> + struct virtio_pmem_request *req;
> + struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> + spin_lock_irqsave(>pmem_lock, flags);
> + while ((req = virtqueue_get_buf(vq, )) != NULL) {
> + req->done = true;
> + wake_up(>acked);
> + }
> + spin_unlock_irqrestore(>pmem_lock, flags);

Honest question: why do you need to disable interrupts here?

> +}
> + /* Initialize virt queue */
> +static int init_vq(struct virtio_pmem *vpmem)
> +{
> + struct virtqueue *vq;
> +
> + /* single vq */
> + vpmem->req_vq = vq = virtio_find_single_vq(vpmem->vdev,
> + host_ack, "flush_queue");
> + if (IS_ERR(vq))
> + return PTR_ERR(vq);
> + spin_lock_init(>pmem_lock);
> +
> + return 0;
> +};
> +
> + /* The request submission function */
> +static int virtio_pmem_flush(struct device *dev)
> +{
> + int err;
> + unsigned long flags;
> + struct scatterlist *sgs[2], sg, ret;
> + struct virtio_device *vdev = dev_to_virtio(dev->parent->parent);
> + struct virtio_pmem *vpmem = vdev->priv;
> + struct virtio_pmem_request *req = kmalloc(sizeof(*req), GFP_KERNEL);

Not checking kmalloc() return.

> +
> + req->done = false;
> + init_waitqueue_head(>acked);
> + spin_lock_irqsave(>pmem_lock, flags);

Why do you need spin_lock_irqsave()? There are two points consider:

1. Will virtio_pmem_flush() ever be called with interrupts disabled?
   If yes, then it's broken since you should be using GFP_ATOMIC in the
   kmalloc() call and you can't call wait_event()

2. If virtio_pmem_flush() is never called with interrupts disabled, do
   you really need to disable interrupts? If yes, why?

Another point to consider is whether or not virtio_pmem_flush()
can be called from atomic context. nvdimm_flush() itself is called
from a few atomic sites, but I can't tell if virtio_pmem_flush()
will ever be called from those sites. If it can be called atomic
context, then item 1 applies here.

Re: [Qemu-devel] [RFC v3 1/2] libnvdimm: Add flush callback for virtio pmem

2018-07-13 Thread Luiz Capitulino

On Fri, 13 Jul 2018 13:22:30 +0530
Pankaj Gupta  wrote:

> This patch adds functionality to perform flush from guest to host
> over VIRTIO. We are registering a callback based on 'nd_region' type.
> As virtio_pmem driver requires this special flush interface, for rest
> of the region types we are registering existing flush function.
> Also report the error returned by virtio flush interface.

This patch doesn't apply against latest upstream. A few more comments
below.

> 
> Signed-off-by: Pankaj Gupta 
> ---
>  drivers/nvdimm/nd.h  |  1 +
>  drivers/nvdimm/pmem.c|  4 ++--
>  drivers/nvdimm/region_devs.c | 24 ++--
>  include/linux/libnvdimm.h|  5 -
>  4 files changed, 25 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
> index 32e0364..1b62f79 100644
> --- a/drivers/nvdimm/nd.h
> +++ b/drivers/nvdimm/nd.h
> @@ -159,6 +159,7 @@ struct nd_region {
>   struct badblocks bb;
>   struct nd_interleave_set *nd_set;
>   struct nd_percpu_lane __percpu *lane;
> + int (*flush)(struct device *dev);
>   struct nd_mapping mapping[0];
>  };
>  
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 9d71492..29fd2cd 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -180,7 +180,7 @@ static blk_qc_t pmem_make_request(struct request_queue 
> *q, struct bio *bio)
>   struct nd_region *nd_region = to_region(pmem);
>  
>   if (bio->bi_opf & REQ_FLUSH)
> - nvdimm_flush(nd_region);
> + bio->bi_status = nvdimm_flush(nd_region);
>  
>   do_acct = nd_iostat_start(bio, );
>   bio_for_each_segment(bvec, bio, iter) {
> @@ -196,7 +196,7 @@ static blk_qc_t pmem_make_request(struct request_queue 
> *q, struct bio *bio)
>   nd_iostat_end(bio, start);
>  
>   if (bio->bi_opf & REQ_FUA)
> - nvdimm_flush(nd_region);
> + bio->bi_status = nvdimm_flush(nd_region);
>  
>   bio_endio(bio);
>   return BLK_QC_T_NONE;
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index a612be6..124aae7 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -1025,6 +1025,7 @@ static struct nd_region *nd_region_create(struct 
> nvdimm_bus *nvdimm_bus,
>   dev->of_node = ndr_desc->of_node;
>   nd_region->ndr_size = resource_size(ndr_desc->res);
>   nd_region->ndr_start = ndr_desc->res->start;
> + nd_region->flush = ndr_desc->flush;
>   nd_device_register(dev);
>  
>   return nd_region;
> @@ -1065,13 +1066,10 @@ struct nd_region 
> *nvdimm_volatile_region_create(struct nvdimm_bus *nvdimm_bus,
>  }
>  EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
>  
> -/**
> - * nvdimm_flush - flush any posted write queues between the cpu and pmem 
> media
> - * @nd_region: blk or interleaved pmem region
> - */
> -void nvdimm_flush(struct nd_region *nd_region)
> +void pmem_flush(struct device *dev)
>  {
> - struct nd_region_data *ndrd = dev_get_drvdata(_region->dev);
> + struct nd_region_data *ndrd = dev_get_drvdata(dev);
> + struct nd_region *nd_region = to_nd_region(dev);
>   int i, idx;
>  
>   /*
> @@ -1094,6 +1092,20 @@ void nvdimm_flush(struct nd_region *nd_region)
>   writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
>   wmb();
>  }
> +
> +/**
> + * nvdimm_flush - flush any posted write queues between the cpu and pmem 
> media
> + * @nd_region: blk or interleaved pmem region
> + */
> +int nvdimm_flush(struct nd_region *nd_region)
> +{
> + if (nd_region->flush)
> + return(nd_region->flush(_region->dev));
> +
> + pmem_flush(_region->dev);

IMHO, a better way of doing this would be to allow nvdimm_flush() to
be overridden. That is, in nd_region_create() you set nd_region->flush
to the original nvdimm_flush() if ndr_desc->flush is NULL. And then
always call nd_region->flush() where nvdimm_flush() is called today.

> +
> + return 0;
> +}
>  EXPORT_SYMBOL_GPL(nvdimm_flush);
>  
>  /**
> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
> index 097072c..33b617f 100644
> --- a/include/linux/libnvdimm.h
> +++ b/include/linux/libnvdimm.h
> @@ -126,6 +126,7 @@ struct nd_region_desc {
>   int numa_node;
>   unsigned long flags;
>   struct device_node *of_node;
> + int (*flush)(struct device *dev);
>  };
>  
>  struct device;
> @@ -201,7 +202,9 @@ unsigned long nd_blk_memremap_flags(struct nd_blk_region 
> *ndbr);
>  unsigned int nd_region_acquire_lane(struct nd_region *nd_region);
>  void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane);
>  u64 nd_fletcher64(void *addr, size_t len, bool le);
> -void nvdimm_flush(struct nd_region *nd_region);
> +int nvdimm_flush(struct nd_region *nd_region);
> +void pmem_set_flush(struct nd_region *nd_region, void (*flush)
> + (struct device *));

It seems pmem_set_flush() doesn't exist.

>  int

Re: [Qemu-devel] [RFC 1/3] qom: Document reference count ownership rules

2018-07-13 Thread Eduardo Habkost

On Fri, Jul 13, 2018 at 11:07:15AM +0200, Thomas Huth wrote:
> On 12.07.2018 21:45, Eduardo Habkost wrote:
> > The documentation for QOM is not clear about who owns references
> > to objects (i.e. who is responsible for calling object_unref()
> > later).
> > 
> > This is important considering there are a few inconsistencies in
> > the API (e.g. callers of object_new() need to call object_unref()
> > later, but callers of object_new_with_props() must not do it).
> > 
> > Update the documentation so that every mention of object
> > references also mention who exactly owns the reference.
> > 
> > Signed-off-by: Eduardo Habkost 
> > ---
> >  include/qom/object.h | 73 +---
> >  1 file changed, 42 insertions(+), 31 deletions(-)
> > 
> > diff --git a/include/qom/object.h b/include/qom/object.h
> > index f3d2308d56..08a1bbba7d 100644
> > --- a/include/qom/object.h
> > +++ b/include/qom/object.h
> > @@ -376,7 +376,7 @@ typedef void (ObjectUnparent)(Object *obj);
> >   * ObjectFree:
> >   * @obj: the object being freed
> >   *
> > - * Called when an object's last reference is removed.
> > + * Called when an object's last reference is dropped using object_unref().
> >   */
> >  typedef void (ObjectFree)(void *obj);
> >  
> > @@ -601,8 +601,8 @@ struct InterfaceClass
> >   * @typename: The name of the type of the object to instantiate.
> >   *
> >   * This function will initialize a new object using heap allocated memory.
> > - * The returned object has a reference count of 1, and will be freed when
> > - * the last reference is dropped.
> > + * The returned object has a reference count of 1, and the reference will 
> > be
> > + * owned by the caller.
> >   *
> >   * Returns: The newly allocated and instantiated object.
> >   */
> > @@ -617,8 +617,8 @@ Object *object_new(const char *typename);
> >   * @...: list of property names and values
> >   *
> >   * This function will initialize a new object using heap allocated memory.
> > - * The returned object has a reference count of 1, and will be freed when
> > - * the last reference is dropped.
> > + * The returned object has a reference count of 1, and the reference will
> > + * be owned by the caller.
> 
> That's the description of object_new_with_props here already, isn't it?
> So the reference will be owned by the parent object, not by the caller.

Oops, you're right.  Thanks!

> 
> >   * The @id parameter will be used when registering the object as a
> >   * child of @parent in the composition tree.
> > @@ -652,8 +652,8 @@ Object *object_new(const char *typename);
> >   *   
> >   * 
> >   *
> > - * The returned object will have one stable reference maintained
> > - * for as long as it is present in the object hierarchy.
> > + * The returned object will have one reference, owned by the
> > + * parent object (not by the caller).
> 
> ... and then this information here is somewhat redundant. I suggest to
> remove one of the two spots.

Will change this to:

 /**
  * object_new_with_props:
  * @typename:  The name of the type of the object to instantiate.
  * @parent: the parent object
  * @id: The unique ID of the object
  * @errp: pointer to error object
  * @...: list of property names and values
  *
  * This function will initialize a new object using heap allocated memory.
- * The returned object has a reference count of 1, and will be freed when
- * the last reference is dropped.
+ * The returned object will have one reference, owned by the
+ * parent object (not by the caller).
  *
  * The @id parameter will be used when registering the object as a
  * child of @parent in the composition tree.
  *
  * The variadic parameters are a list of pairs of (propname, propvalue)
  * strings. The propname of %NULL indicates the end of the property
  * list. If the object implements the user creatable interface, the
  * object will be marked complete once all the properties have been
  * processed.
  *
  * 
  *   Creating an object with properties
  *   
  *   Error *err = NULL;
  *   Object *obj;
  *
  *   obj = object_new_with_props(TYPE_MEMORY_BACKEND_FILE,
  *   object_get_objects_root(),
  *   "hostmem0",
  *   ,
  *   "share", "yes",
  *   "mem-path", "/dev/shm/somefile",
  *   "prealloc", "yes",
  *   "size", "1048576",
  *   NULL);
  *
  *   if (!obj) {
  * g_printerr("Cannot create memory backend: %s\n",
  *error_get_pretty(err));
  *   }
  *   
  * 
  *
- * The returned object will have one stable reference maintained
- * for as long as it is present in the object hierarchy.
- *
  * Returns: The newly allocated, instantiated & initialized object.
  */
 Object *object_new_with_props(const char *typename,
   Object *parent,

Re: [Qemu-devel] [PATCH v2] dump: add kernel_gs_base to QEMU CPU state

2018-07-13 Thread Eduardo Habkost

On Thu, Jul 12, 2018 at 08:29:27PM +0300, Viktor Prutyanov wrote:
> This patch adds field with content of KERNEL_GS_BASE MSR to QEMU note in
> ELF dump.
> 
> On Windows, if all vCPUs are running usermode tasks at the time the dump is
> created, this can be helpful in the discovery of guest system structures
> during conversion ELF dump to MEMORY.DMP dump.
> 
> Signed-off-by: Viktor Prutyanov 

Reviewed-by: Eduardo Habkost 

Queued for 3.1, thanks.

-- 
Eduardo

Re: [Qemu-devel] [PATCH] iotest: Fix filtering order in 226

2018-07-13 Thread John Snow




On 07/13/2018 03:41 PM, Max Reitz wrote:
> The test directory should be filtered before the image format, otherwise
> the test will fail if the image format is part of the test directory,
> like so:
> 
> [...]
> -can't open: Could not open 'TEST_DIR/t.IMGFMT': Is a directory
> +can't open: Could not open '/tmp/test-IMGFMT/t.IMGFMT': Is a directory
> [...]
> 
> Signed-off-by: Max Reitz 

Too many gotchas in our test suite.

Thanks :(

Reviewed-by: John Snow

[Qemu-devel] [PATCH] iotest: Fix filtering order in 226

2018-07-13 Thread Max Reitz

The test directory should be filtered before the image format, otherwise
the test will fail if the image format is part of the test directory,
like so:

[...]
-can't open: Could not open 'TEST_DIR/t.IMGFMT': Is a directory
+can't open: Could not open '/tmp/test-IMGFMT/t.IMGFMT': Is a directory
[...]

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/226 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/226 b/tests/qemu-iotests/226
index 460aea2fc9..a5a1f6720a 100755
--- a/tests/qemu-iotests/226
+++ b/tests/qemu-iotests/226
@@ -52,10 +52,10 @@ for PROTO in "file" "host_device" "host_cdrom"; do
 echo "=== Testing with driver:$PROTO ==="
 echo
 echo "== Testing RO =="
-$QEMU_IO -c "open -r -o driver=$PROTO,filename=$TEST_IMG" 2>&1 | 
_filter_imgfmt | _filter_testdir
+$QEMU_IO -c "open -r -o driver=$PROTO,filename=$TEST_IMG" 2>&1 | 
_filter_testdir | _filter_imgfmt
 $QEMU_IO -c "open -r -o driver=$PROTO,filename=/dev/null" 2>&1 | 
_filter_imgfmt
 echo "== Testing RW =="
-$QEMU_IO -c "open -o driver=$PROTO,filename=$TEST_IMG" 2>&1 | 
_filter_imgfmt | _filter_testdir
+$QEMU_IO -c "open -o driver=$PROTO,filename=$TEST_IMG" 2>&1 | 
_filter_testdir | _filter_imgfmt
 $QEMU_IO -c "open -o driver=$PROTO,filename=/dev/null" 2>&1 | 
_filter_imgfmt
 done
 
-- 
2.17.1

Re: [Qemu-devel] [PATCH] iotests: Disallow compat=0.10 in 223

2018-07-13 Thread John Snow




On 07/13/2018 03:15 PM, Max Reitz wrote:
> 223 tests persistent dirty bitmaps which are not supported in
> compat=0.10, so that option is unsupported for this test.
> 
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/223 | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tests/qemu-iotests/223 b/tests/qemu-iotests/223
> index b63b7a4f9e..8b1859c2dd 100755
> --- a/tests/qemu-iotests/223
> +++ b/tests/qemu-iotests/223
> @@ -40,6 +40,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
>  _supported_fmt qcow2
>  _supported_proto file # uses NBD as well
>  _supported_os Linux
> +# Persistent dirty bitmaps require compat=1.1
> +_unsupported_imgopts 'compat=0.10'
>  
>  function do_run_qemu()
>  {
> 

Thanks for catching this :)

Tested-by: John Snow 
Reviewed-by: John Snow

[Qemu-devel] [PATCH] iotests: Disallow compat=0.10 in 223

2018-07-13 Thread Max Reitz

223 tests persistent dirty bitmaps which are not supported in
compat=0.10, so that option is unsupported for this test.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/223 | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/qemu-iotests/223 b/tests/qemu-iotests/223
index b63b7a4f9e..8b1859c2dd 100755
--- a/tests/qemu-iotests/223
+++ b/tests/qemu-iotests/223
@@ -40,6 +40,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto file # uses NBD as well
 _supported_os Linux
+# Persistent dirty bitmaps require compat=1.1
+_unsupported_imgopts 'compat=0.10'
 
 function do_run_qemu()
 {
-- 
2.17.1

Re: [Qemu-devel] [PATCH v6 4/4] acpi: build TPM Physical Presence interface

2018-07-13 Thread Marc-André Lureau

Hi

On Wed, Jul 11, 2018 at 6:25 PM, Igor Mammedov  wrote:
> On Wed, 4 Jul 2018 18:00:41 +0200
> Marc-André Lureau  wrote:
>
>> HI
>>
>> On Wed, Jul 4, 2018 at 5:39 PM, Igor Mammedov  wrote:
>> > On Thu, 28 Jun 2018 19:26:57 +0200
>> > Marc-André Lureau  wrote:
>> >
>> >> From: Stefan Berger 
>> >>
>> >> The TPM Physical Presence interface consists of an ACPI part, a shared
>> >> memory part, and code in the firmware. Users can send messages to the
>> >> firmware by writing a code into the shared memory through invoking the
>> >> ACPI code. When a reboot happens, the firmware looks for the code and
>> >> acts on it by sending sequences of commands to the TPM.
>> >>
>> >> This patch adds the ACPI code. It is similar to the one in EDK2 but 
>> >> doesn't
>> >> assume that SMIs are necessary to use. It uses a similar datastructure for
>> >> the shared memory as EDK2 does so that EDK2 and SeaBIOS could both make 
>> >> use
>> >> of it. I extended the shared memory data structure with an array of 256
>> >> bytes, one for each code that could be implemented. The array contains
>> >> flags describing the individual codes. This decouples the ACPI 
>> >> implementation
>> >> from the firmware implementation.
>> >>
>> >> The underlying TCG specification is accessible from the following page.
>> >>
>> >> https://trustedcomputinggroup.org/tcg-physical-presence-interface-specification/
>> >>
>> >> This patch implements version 1.30.
>> >>
>> >> Signed-off-by: Stefan Berger 
>> >> [ Marc-André - ACPI code improvements and windows fixes ]
>> >> Signed-off-by: Marc-André Lureau 
>> >>
>> >> ---
>> >>
>> >> v7: (Marc-André)
>> >>  - use first spec version/section in code comments
>> >>  - use more variables for ASL code building
>> >>  - remove unnecessering ToInteger() calls
>> >>
>> >> v6:
>> >>  - more code documentation (Marc-André)
>> >>  - use some explicit named variables to ease reading (Marc-André)
>> >>  - use fixed size fields/memory regions, remove PPI struct (Marc-André)
>> >>  - only add PPI ACPI methods if PPI is enabled (Marc-André)
>> >>  - document the qemu/firmware ACPI memory region (Stefan)
>> >>
>> >> v5 (Marc-André):
>> >>  - /struct tpm_ppi/struct TPMPPIData
>> >>
>> >> v4 (Marc-André):
>> >>  - replace 'DerefOf (FUNC [N])' with a function, to fix Windows ACPI
>> >> handling.
>> >>  - replace 'return Package (..) {} ' with scoped variables, to fix
>> >>Windows ACPI handling.
>> >>
>> >> v3:
>> >>  - add support for PPI to CRB
>> >>  - split up OperationRegion TPPI into two parts, one containing
>> >>the registers (TPP1) and the other one the flags (TPP2); switched
>> >>the order of the flags versus registers in the code
>> >>  - adapted ACPI code to small changes to the array of flags where
>> >>previous flag 0 was removed and now shifting right wasn't always
>> >>necessary anymore
>> >>
>> >> v2:
>> >>  - get rid of FAIL variable; function 5 was using it and always
>> >>returns 0; the value is related to the ACPI function call not
>> >>a possible failure of the TPM function call.
>> >>  - extend shared memory data structure with per-opcode entries
>> >>holding flags and use those flags to determine what to return
>> >>to caller
>> >>  - implement interface version 1.3
>> >> ---
>> >>  include/hw/acpi/tpm.h |   8 +
>> >>  hw/i386/acpi-build.c  | 403 +-
>> >>  docs/specs/tpm.txt|  79 +
>> >>  3 files changed, 487 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/include/hw/acpi/tpm.h b/include/hw/acpi/tpm.h
>> >> index f79d68a77a..e0bd07862e 100644
>> >> --- a/include/hw/acpi/tpm.h
>> >> +++ b/include/hw/acpi/tpm.h
>> >> @@ -196,4 +196,12 @@ REG32(CRB_DATA_BUFFER, 0x80)
>> >>  #define TPM_PPI_VERSION_NONE0
>> >>  #define TPM_PPI_VERSION_1_301
>> >>
>> >> +/* whether function is blocked by BIOS settings; bits 0, 1, 2 */
>> >> +#define TPM_PPI_FUNC_NOT_IMPLEMENTED (0 << 0)
>> >> +#define TPM_PPI_FUNC_BIOS_ONLY   (1 << 0)
>> >> +#define TPM_PPI_FUNC_BLOCKED (2 << 0)
>> >> +#define TPM_PPI_FUNC_ALLOWED_USR_REQ (3 << 0)
>> >> +#define TPM_PPI_FUNC_ALLOWED_USR_NOT_REQ (4 << 0)
>> >> +#define TPM_PPI_FUNC_MASK(7 << 0)
>> >> +
>> >>  #endif /* HW_ACPI_TPM_H */
>> >> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> >> index ebd64c4818..263677f3e4 100644
>> >> --- a/hw/i386/acpi-build.c
>> >> +++ b/hw/i386/acpi-build.c
>> >> @@ -43,6 +43,7 @@
>> >>  #include "hw/acpi/memory_hotplug.h"
>> >>  #include "sysemu/tpm.h"
>> >>  #include "hw/acpi/tpm.h"
>> >> +#include "hw/tpm/tpm_ppi.h"
>> >>  #include "hw/acpi/vmgenid.h"
>> >>  #include "sysemu/tpm_backend.h"
>> >>  #include "hw/timer/mc146818rtc_regs.h"
>> >> @@ -1789,6 +1790,396 @@ static Aml *build_q35_osc_method(void)
>> >>  return method;
>> >>  }
>> >>
>> >> +static void
>> >> +build_tpm_ppi(TPMIf *tpm, Aml *dev)
>> >> +{
>> >> +Aml *method, *field, *ifctx, *ifctx2, *ifctx3;
>> >> +

[Qemu-devel] [PATCH v2] trace/simple: fix hang in child after fork(2)

2018-07-13 Thread Stefan Hajnoczi

The simple trace backend spawns a write-out thread which is used to
asynchronously flush the in-memory ring buffer to disk.

fork(2) does not clone all threads, only the thread that invoked
fork(2).  As a result there is no write-out thread in the child process!

This causes a hang during shutdown when atexit(3) handler installed by
the simple trace backend waits for the non-existent write-out thread.

This patch uses pthread_atfork(3) to terminate the write-out thread
before fork and restart it in both the parent and child after fork.
This solves a hang in qemu-iotests 147 due to qemu-nbd --fork usage.

Reported-by: Cornelia Huck 
Suggested-by: Paolo Bonzini 
Signed-off-by: Stefan Hajnoczi 
---
 trace/simple.c | 67 +++---
 1 file changed, 58 insertions(+), 9 deletions(-)

diff --git a/trace/simple.c b/trace/simple.c
index 701dec639c..9f6ac9ef24 100644
--- a/trace/simple.c
+++ b/trace/simple.c
@@ -39,9 +39,11 @@
 static GMutex trace_lock;
 static GCond trace_available_cond;
 static GCond trace_empty_cond;
+static GThread *trace_writeout_thread;
 
 static bool trace_available;
 static bool trace_writeout_enabled;
+static bool trace_writeout_running;
 
 enum {
 TRACE_BUF_LEN = 4096 * 64,
@@ -142,15 +144,34 @@ static void flush_trace_file(bool wait)
 g_mutex_unlock(_lock);
 }
 
-static void wait_for_trace_records_available(void)
+/**
+ * Wait to be kicked by flush_trace_file()
+ *
+ * Returns: true if the writeout thread should continue
+ *  false if the writeout thread should terminate
+ */
+static bool wait_for_trace_records_available(void)
 {
+bool running;
+
 g_mutex_lock(_lock);
-while (!(trace_available && trace_writeout_enabled)) {
+for (;;) {
+running = trace_writeout_running;
+if (!running) {
+break;
+}
+
+if (trace_available && trace_writeout_enabled) {
+break;
+}
+
 g_cond_signal(_empty_cond);
 g_cond_wait(_available_cond, _lock);
 }
 trace_available = false;
 g_mutex_unlock(_lock);
+
+return running;
 }
 
 static gpointer writeout_thread(gpointer opaque)
@@ -165,9 +186,7 @@ static gpointer writeout_thread(gpointer opaque)
 size_t unused __attribute__ ((unused));
 uint64_t type = TRACE_RECORD_TYPE_EVENT;
 
-for (;;) {
-wait_for_trace_records_available();
-
+while (wait_for_trace_records_available()) {
 if (g_atomic_int_get(_events)) {
 dropped.rec.event = DROPPED_EVENT_ID,
 dropped.rec.timestamp_ns = get_clock();
@@ -398,18 +417,48 @@ static GThread *trace_thread_create(GThreadFunc fn)
 return thread;
 }
 
+#ifndef _WIN32
+static void stop_writeout_thread(void)
+{
+g_mutex_lock(_lock);
+trace_writeout_running = false;
+g_cond_signal(_available_cond);
+g_mutex_unlock(_lock);
+
+g_thread_join(trace_writeout_thread);
+trace_writeout_thread = NULL;
+}
+
+static void restart_writeout_thread(void)
+{
+trace_writeout_running = true;
+trace_writeout_thread = trace_thread_create(writeout_thread);
+if (!trace_writeout_thread) {
+warn_report("unable to initialize simple trace backend");
+}
+}
+#endif /* !_WIN32 */
+
 bool st_init(void)
 {
-GThread *thread;
-
 trace_pid = getpid();
+trace_writeout_running = true;
 
-thread = trace_thread_create(writeout_thread);
-if (!thread) {
+trace_writeout_thread = trace_thread_create(writeout_thread);
+if (!trace_writeout_thread) {
 warn_report("unable to initialize simple trace backend");
 return false;
 }
 
+#ifndef _WIN32
+/* Terminate writeout thread across fork and restart it in parent and
+ * child afterwards.
+ */
+pthread_atfork(stop_writeout_thread,
+   restart_writeout_thread,
+   restart_writeout_thread);
+#endif
+
 atexit(st_flush_trace_buffer);
 return true;
 }
-- 
2.17.1

[Qemu-devel] [PATCH] trace/simple: fix hang in child after fork(2)

2018-07-13 Thread Stefan Hajnoczi

The simple trace backend spawns a write-out thread which is used to
asynchronously flush the in-memory ring buffer to disk.

fork(2) does not clone all threads, only the thread that invoked
fork(2).  As a result there is no write-out thread in the child process!

This causes a hang during shutdown when atexit(3) handler installed by
the simple trace backend waits for the non-existent write-out thread.

This patch uses pthread_atfork(3) to terminate the write-out thread
before fork and restart it in both the parent and child after fork.
This solves a hang in qemu-iotests 147 due to qemu-nbd --fork usage.

Reported-by: Cornelia Huck 
Suggested-by: Paolo Bonzini 
Signed-off-by: Stefan Hajnoczi 
---
 trace/simple.c | 67 +++---
 1 file changed, 58 insertions(+), 9 deletions(-)

diff --git a/trace/simple.c b/trace/simple.c
index 701dec639c..9f6ac9ef24 100644
--- a/trace/simple.c
+++ b/trace/simple.c
@@ -39,9 +39,11 @@
 static GMutex trace_lock;
 static GCond trace_available_cond;
 static GCond trace_empty_cond;
+static GThread *trace_writeout_thread;
 
 static bool trace_available;
 static bool trace_writeout_enabled;
+static bool trace_writeout_running;
 
 enum {
 TRACE_BUF_LEN = 4096 * 64,
@@ -142,15 +144,34 @@ static void flush_trace_file(bool wait)
 g_mutex_unlock(_lock);
 }
 
-static void wait_for_trace_records_available(void)
+/**
+ * Wait to be kicked by flush_trace_file()
+ *
+ * Returns: true if the writeout thread should continue
+ *  false if the writeout thread should terminate
+ */
+static bool wait_for_trace_records_available(void)
 {
+bool running;
+
 g_mutex_lock(_lock);
-while (!(trace_available && trace_writeout_enabled)) {
+for (;;) {
+running = trace_writeout_running;
+if (!running) {
+break;
+}
+
+if (trace_available && trace_writeout_enabled) {
+break;
+}
+
 g_cond_signal(_empty_cond);
 g_cond_wait(_available_cond, _lock);
 }
 trace_available = false;
 g_mutex_unlock(_lock);
+
+return running;
 }
 
 static gpointer writeout_thread(gpointer opaque)
@@ -165,9 +186,7 @@ static gpointer writeout_thread(gpointer opaque)
 size_t unused __attribute__ ((unused));
 uint64_t type = TRACE_RECORD_TYPE_EVENT;
 
-for (;;) {
-wait_for_trace_records_available();
-
+while (wait_for_trace_records_available()) {
 if (g_atomic_int_get(_events)) {
 dropped.rec.event = DROPPED_EVENT_ID,
 dropped.rec.timestamp_ns = get_clock();
@@ -398,18 +417,48 @@ static GThread *trace_thread_create(GThreadFunc fn)
 return thread;
 }
 
+#ifndef _WIN32
+static void stop_writeout_thread(void)
+{
+g_mutex_lock(_lock);
+trace_writeout_running = false;
+g_cond_signal(_available_cond);
+g_mutex_unlock(_lock);
+
+g_thread_join(trace_writeout_thread);
+trace_writeout_thread = NULL;
+}
+
+static void restart_writeout_thread(void)
+{
+trace_writeout_running = true;
+trace_writeout_thread = trace_thread_create(writeout_thread);
+if (!trace_writeout_thread) {
+warn_report("unable to initialize simple trace backend");
+}
+}
+#endif /* !_WIN32 */
+
 bool st_init(void)
 {
-GThread *thread;
-
 trace_pid = getpid();
+trace_writeout_running = true;
 
-thread = trace_thread_create(writeout_thread);
-if (!thread) {
+trace_writeout_thread = trace_thread_create(writeout_thread);
+if (!trace_writeout_thread) {
 warn_report("unable to initialize simple trace backend");
 return false;
 }
 
+#ifndef _WIN32
+/* Terminate writeout thread across fork and restart it in parent and
+ * child afterwards.
+ */
+pthread_atfork(stop_writeout_thread,
+   restart_writeout_thread,
+   restart_writeout_thread);
+#endif
+
 atexit(st_flush_trace_buffer);
 return true;
 }
-- 
2.17.1

Re: [Qemu-devel] [PATCH] qemu-iotests: Use host_device instead of file in 149

2018-07-13 Thread John Snow




On 07/13/2018 03:10 AM, Kevin Wolf wrote:
> The test case uses block devices with driver=file, which causes the test
> to fail after commit 230ff73904 added a deprecation warning for this.
> Fix the test case to use driver=host_device and update the reference
> output accordingly.
> 
> Signed-off-by: Kevin Wolf 

149 [not run] requires password-less sudo access: sudo: a
password is required

Ah, this one hides from regression scripts quite well...

This test fails both before and after this patch for me, with

+Traceback (most recent call last):
+  File "149", line 525, in 
+test_once(config, qemu_img=False)
+  File "149", line 336, in test_once
+cryptsetup_open(config)
+  File "149", line 200, in cryptsetup_open
+cryptsetup(args, password)
+  File "149", line 124, in cryptsetup
+raise Exception(msg)
+Exception: Device qiotest-145-twofish-128-xts-plain64-sha1 already exists.
+Command failed with code 17: File exists

Am I missing some prerequisite, maybe?

Anyway, this fixes the regression in the warning output from what I can
see up to this failure, but I probably can't give an R-B here until I
set my own setup straight.

> ---
>  tests/qemu-iotests/149 |   2 +-
>  tests/qemu-iotests/149.out | 344 
> ++---
>  2 files changed, 173 insertions(+), 173 deletions(-)
> 
> diff --git a/tests/qemu-iotests/149 b/tests/qemu-iotests/149
> index d3ffa259db..9e0cad76f9 100755
> --- a/tests/qemu-iotests/149
> +++ b/tests/qemu-iotests/149
> @@ -259,7 +259,7 @@ def qemu_io_image_args(config, dev=False):
>  if dev:
>  return [
>  "--image-opts",
> -"driver=file,filename=%s" % config.device_path()]
> +"driver=host_device,filename=%s" % config.device_path()]
>  else:
>  return [
>  "--object",
> diff --git a/tests/qemu-iotests/149.out b/tests/qemu-iotests/149.out
> index 5dea00bfa8..1407ce6dad 100644
> --- a/tests/qemu-iotests/149.out
> +++ b/tests/qemu-iotests/149.out
> @@ -7,13 +7,13 @@ sudo cryptsetup -q -v luksFormat --cipher aes-xts-plain64 
> --key-size 512 --hash
>  sudo cryptsetup -q -v luksOpen TEST_DIR/luks-aes-256-xts-plain64-sha1.img 
> qiotest-145-aes-256-xts-plain64-sha1
>  # Write test pattern 0xa7
>  sudo chown UID:GID /dev/mapper/qiotest-145-aes-256-xts-plain64-sha1
> -qemu-io -c write -P 0xa7 100M 10M --image-opts 
> driver=file,filename=/dev/mapper/qiotest-145-aes-256-xts-plain64-sha1
> +qemu-io -c write -P 0xa7 100M 10M --image-opts 
> driver=host_device,filename=/dev/mapper/qiotest-145-aes-256-xts-plain64-sha1
>  wrote 10485760/10485760 bytes at offset 104857600
>  10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  
>  # Write test pattern 0x13
>  sudo chown UID:GID /dev/mapper/qiotest-145-aes-256-xts-plain64-sha1
> -qemu-io -c write -P 0x13 3145728M 10M --image-opts 
> driver=file,filename=/dev/mapper/qiotest-145-aes-256-xts-plain64-sha1
> +qemu-io -c write -P 0x13 3145728M 10M --image-opts 
> driver=host_device,filename=/dev/mapper/qiotest-145-aes-256-xts-plain64-sha1
>  wrote 10485760/10485760 bytes at offset 3298534883328
>  10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  
> @@ -43,13 +43,13 @@ wrote 10485760/10485760 bytes at offset 3298534883328
>  sudo cryptsetup -q -v luksOpen TEST_DIR/luks-aes-256-xts-plain64-sha1.img 
> qiotest-145-aes-256-xts-plain64-sha1
>  # Read test pattern 0x91
>  sudo chown UID:GID /dev/mapper/qiotest-145-aes-256-xts-plain64-sha1
> -qemu-io -c read -P 0x91 100M 10M --image-opts 
> driver=file,filename=/dev/mapper/qiotest-145-aes-256-xts-plain64-sha1
> +qemu-io -c read -P 0x91 100M 10M --image-opts 
> driver=host_device,filename=/dev/mapper/qiotest-145-aes-256-xts-plain64-sha1
>  read 10485760/10485760 bytes at offset 104857600
>  10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  
>  # Read test pattern 0x5e
>  sudo chown UID:GID /dev/mapper/qiotest-145-aes-256-xts-plain64-sha1
> -qemu-io -c read -P 0x5e 3145728M 10M --image-opts 
> driver=file,filename=/dev/mapper/qiotest-145-aes-256-xts-plain64-sha1
> +qemu-io -c read -P 0x5e 3145728M 10M --image-opts 
> driver=host_device,filename=/dev/mapper/qiotest-145-aes-256-xts-plain64-sha1
>  read 10485760/10485760 bytes at offset 3298534883328
>  10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  
> @@ -67,13 +67,13 @@ Formatting 'TEST_DIR/luks-aes-256-xts-plain64-sha1.img', 
> fmt=luks size=439804651
>  sudo cryptsetup -q -v luksOpen TEST_DIR/luks-aes-256-xts-plain64-sha1.img 
> qiotest-145-aes-256-xts-plain64-sha1
>  # Write test pattern 0xa7
>  sudo chown UID:GID /dev/mapper/qiotest-145-aes-256-xts-plain64-sha1
> -qemu-io -c write -P 0xa7 100M 10M --image-opts 
> driver=file,filename=/dev/mapper/qiotest-145-aes-256-xts-plain64-sha1
> +qemu-io -c write -P 0xa7 100M 10M --image-opts 
> driver=host_device,filename=/dev/mapper/qiotest-145-aes-256-xts-plain64-sha1
>  wrote 10485760/10485760 bytes at offset 104857600
>  10 MiB, X ops; XX:XX:XX.X (XXX

Re: [Qemu-devel] [PATCH 08/12] migration: do not flush_compressed_data at the end of each iteration

2018-07-13 Thread Dr. David Alan Gilbert

* guangrong.x...@gmail.com (guangrong.x...@gmail.com) wrote:
> From: Xiao Guangrong 
> 
> flush_compressed_data() needs to wait all compression threads to
> finish their work, after that all threads are free until the
> migration feed new request to them, reducing its call can improve
> the throughput and use CPU resource more effectively
> 
> We do not need to flush all threads at the end of iteration, the
> data can be kept locally until the memory block is changed
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  migration/ram.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index f9a8646520..0a38c1c61e 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1994,6 +1994,7 @@ static void ram_save_cleanup(void *opaque)
>  }
>  
>  xbzrle_cleanup();
> +flush_compressed_data(*rsp);
>  compress_threads_save_cleanup();
>  ram_state_cleanup(rsp);
>  }

I'm not sure why this change corresponds to the other removal.
We should already have sent all remaining data in ram_save_complete()'s
call to flush_compressed_data - so what is this one for?

> @@ -2690,7 +2691,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>  }
>  i++;
>  }
> -flush_compressed_data(rs);
>  rcu_read_unlock();

Hmm - are we sure there's no other cases that depend on ordering of all
of the compressed data being sent out between threads?
I think the one I'd most worry about is the case where:

  iteration one:
 thread 1: Save compressed page 'n'

  iteration two:
 thread 2: Save compressed page 'n'

What guarantees that the version of page 'n'
from thread 2 reaches the destination first without
this flush?

Dave

>  /*
> -- 
> 2.14.4
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH 07/12] migration: hold the lock only if it is really needed

2018-07-13 Thread Dr. David Alan Gilbert

* Xiao Guangrong (guangrong.x...@gmail.com) wrote:
> 
> 
> On 07/11/2018 04:21 PM, Peter Xu wrote:
> > On Thu, Jun 28, 2018 at 05:33:58PM +0800, Xiao Guangrong wrote:
> > > 
> > > 
> > > On 06/19/2018 03:36 PM, Peter Xu wrote:
> > > > On Mon, Jun 04, 2018 at 05:55:15PM +0800, guangrong.x...@gmail.com 
> > > > wrote:
> > > > > From: Xiao Guangrong 
> > > > > 
> > > > > Try to hold src_page_req_mutex only if the queue is not
> > > > > empty
> > > > 
> > > > Pure question: how much this patch would help?  Basically if you are
> > > > running compression tests then I think it means you are with precopy
> > > > (since postcopy cannot work with compression yet), then here the lock
> > > > has no contention at all.
> > > 
> > > Yes, you are right, however we can observe it is in the top functions
> > > (after revert this patch):
> > > 
> > > Samples: 29K of event 'cycles', Event count (approx.): 22263412260
> > > +   7.99%  kqemu  qemu-system-x86_64   [.] ram_bytes_total
> > > +   6.95%  kqemu  [kernel.kallsyms][k] 
> > > copy_user_enhanced_fast_string
> > > +   6.23%  kqemu  qemu-system-x86_64   [.] qemu_put_qemu_file
> > > +   6.20%  kqemu  qemu-system-x86_64   [.] qemu_event_set
> > > +   5.80%  kqemu  qemu-system-x86_64   [.] __ring_put
> > > +   4.82%  kqemu  qemu-system-x86_64   [.] compress_thread_data_done
> > > +   4.11%  kqemu  qemu-system-x86_64   [.] ring_is_full
> > > +   3.07%  kqemu  qemu-system-x86_64   [.] 
> > > threads_submit_request_prepare
> > > +   2.83%  kqemu  qemu-system-x86_64   [.] ring_mp_get
> > > +   2.71%  kqemu  qemu-system-x86_64   [.] __ring_is_full
> > > +   2.46%  kqemu  qemu-system-x86_64   [.] buffer_zero_sse2
> > > +   2.40%  kqemu  qemu-system-x86_64   [.] add_to_iovec
> > > +   2.21%  kqemu  qemu-system-x86_64   [.] ring_get
> > > +   1.96%  kqemu  [kernel.kallsyms][k] __lock_acquire
> > > +   1.90%  kqemu  libc-2.12.so [.] memcpy
> > > +   1.55%  kqemu  qemu-system-x86_64   [.] ring_len
> > > +   1.12%  kqemu  libpthread-2.12.so   [.] pthread_mutex_unlock
> > > +   1.11%  kqemu  qemu-system-x86_64   [.] ram_find_and_save_block
> > > +   1.07%  kqemu  qemu-system-x86_64   [.] ram_save_host_page
> > > +   1.04%  kqemu  qemu-system-x86_64   [.] qemu_put_buffer
> > > +   0.97%  kqemu  qemu-system-x86_64   [.] 
> > > compress_page_with_multi_thread
> > > +   0.96%  kqemu  qemu-system-x86_64   [.] ram_save_target_page
> > > +   0.93%  kqemu  libpthread-2.12.so   [.] pthread_mutex_lock
> > 
> > (sorry to respond late; I was busy with other stuff for the
> >   release...)
> > 
> 
> You're welcome. :)
> 
> > I am trying to find out anything related to unqueue_page() but I
> > failed.  Did I miss anything obvious there?
> > 
> 
> unqueue_page() was not listed here indeed, i think the function
> itself is light enough (a check then directly return) so it
> did not leave a trace here.
> 
> This perf data was got after reverting this patch, i.e, it's
> based on the lockless multithread model, then unqueue_page() is
> the only place using mutex in the main thread.
> 
> And you can see the overload of mutext was gone after applying
> this patch in the mail i replied to Dave.

I got around to playing with this patch and using 'perf top'
to see what was going on.
What I noticed was that without this patch pthread_mutex_unlock and
qemu_mutex_lock_impl were both noticeable; with the patch they'd
pretty mich vanished.  So I think it's worth it.

I ocouldn't honestly see a difference in total CPU usage or
bandwidth; but the migration code is so spiky in usage that
it's difficult to measure anyway.

So,


Reviewed-by: Dr. David Alan Gilbert 

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[Qemu-devel] [PATCH v2] iotests: remove LUKS support from test 226

2018-07-13 Thread John Snow

This test doesn't actually care about the format anyway, it just
supports "all formats" as a convenience. LUKS however does not use a
simple image filename which confuses this iotest.

We can simply skip the test for formats that use IMGOPTSSYNTAX for
their filenames without missing much coverage.

Signed-off-by: John Snow 
---
V2: Test against IMGOPTSSYNTAX instead of 'luks' directly (Kevin)

 tests/qemu-iotests/226 | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/qemu-iotests/226 b/tests/qemu-iotests/226
index 460aea2fc9..34987d43f9 100755
--- a/tests/qemu-iotests/226
+++ b/tests/qemu-iotests/226
@@ -41,6 +41,9 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 # Generic format, but tests file-protocol specific error handling
 _supported_fmt generic
+if [ "$IMGOPTSSYNTAX" = "true" ]; then
+_unsupported_fmt $IMGFMT
+fi
 _supported_proto file
 _supported_os Linux
 
-- 
2.14.4

Re: [Qemu-devel] [PATCH v2 06/16] hw/display/xlnx_dp: Move problematic code from instance_init to realize

2018-07-13 Thread Paolo Bonzini

On 13/07/2018 17:59, Thomas Huth wrote:
> Your patch looks good at a first quick glance, but it seems not to work as 
> expected: When I now run QEMU like this:
> 
> echo "{'execute':'qmp_capabilities'}" \
>  "{'execute':'device-list-properties'," \
>  "'arguments':{'typename':'xlnx,zynqmp'}}" \
>  "{'execute': 'human-monitor-command', " \
>  "'arguments': {'command-line': 'info qtree'}}" | \
>  aarch64-softmmu/qemu-system-aarch64 -M none,accel=qtest -qmp stdio
> 
> then QEMU ends up in an endless loop and I've got to kill it.

There are two more bugs that my patch makes un-latent, where the
objects are created but not added as children.  Therefore when
you call object_unparent on them, nothing happens.

In particular dpcd and edid give you an infinite loop in bus_unparent,
because device_unparent is not called and does not remove them from
the list of devices on the bus.

The following incremental changes fix everything for me.  Note that
aux_create_slave/qdev_create already do the unref for you.

Thanks,

Paolo

diff --git a/hw/display/xlnx_dp.c b/hw/display/xlnx_dp.c
index 589ef59dfd..6439bd05ef 100644
--- a/hw/display/xlnx_dp.c
+++ b/hw/display/xlnx_dp.c
@@ -1235,8 +1235,11 @@ static void xlnx_dp_init(Object *obj)
  * Initialize DPCD and EDID..
  */
 s->dpcd = DPCD(aux_create_slave(s->aux_bus, "dpcd"));
+object_property_add_child(OBJECT(s), "dpcd", OBJECT(s->dpcd), NULL);
+
 s->edid = I2CDDC(qdev_create(BUS(aux_get_i2c_bus(s->aux_bus)), "i2c-ddc"));
 i2c_set_slave_address(I2C_SLAVE(s->edid), 0x50);
+object_property_add_child(OBJECT(s), "edid", OBJECT(s->edid), NULL);
 
 fifo8_create(>rx_fifo, 16);
 fifo8_create(>tx_fifo, 16);
diff --git a/hw/misc/auxbus.c b/hw/misc/auxbus.c
index 2fe807d42f..0e56d9a8a4 100644
--- a/hw/misc/auxbus.c
+++ b/hw/misc/auxbus.c
@@ -32,6 +32,7 @@
 #include "hw/misc/auxbus.h"
 #include "hw/i2c/i2c.h"
 #include "monitor/monitor.h"
+#include "qapi/error.h"
 
 #ifndef DEBUG_AUX
 #define DEBUG_AUX 0
@@ -63,9 +64,14 @@ static void aux_bus_class_init(ObjectClass *klass, void 
*data)
 AUXBus *aux_init_bus(DeviceState *parent, const char *name)
 {
 AUXBus *bus;
+Object *auxtoi2c;
 
 bus = AUX_BUS(qbus_create(TYPE_AUX_BUS, parent, name));
-bus->bridge = AUXTOI2C(qdev_create(BUS(bus), TYPE_AUXTOI2C));
+auxtoi2c = object_new_with_props(TYPE_AUXTOI2C, OBJECT(bus), "i2c",
+ _abort, NULL);
+qdev_set_parent_bus(DEVICE(auxtoi2c), BUS(bus));
+
+bus->bridge = AUXTOI2C(auxtoi2c);
 
 /* Memory related. */
 bus->aux_io = g_malloc(sizeof(*bus->aux_io));

Re: [Qemu-devel] [PATCH] iotests: remove LUKS support from test 226

2018-07-13 Thread John Snow




On 07/13/2018 03:09 AM, Kevin Wolf wrote:
> Am 13.07.2018 um 00:12 hat John Snow geschrieben:
>> This test doesn't actually care about the format anyway, it just
>> supports "all formats" as a convenience. LUKS however does not use a
>> simple image filename which confuses this iotest.
>>
>> We can simply remove the LUKS "support" and be happier for it.
>>
>> Signed-off-by: John Snow 
>> ---
>>  tests/qemu-iotests/226 | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/tests/qemu-iotests/226 b/tests/qemu-iotests/226
>> index 460aea2fc9..ebfaf62e53 100755
>> --- a/tests/qemu-iotests/226
>> +++ b/tests/qemu-iotests/226
>> @@ -40,7 +40,9 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
>>  . ./common.pattern
>>  
>>  # Generic format, but tests file-protocol specific error handling
>> +# LUKS does not use a simple filename; this test doesn't use the format 
>> anyway.
>>  _supported_fmt generic
>> +_unsupported_fmt luks
>>  _supported_proto file
>>  _supported_os Linux
> 
> Maybe it would be better to test for $IMGOPTSSYNTAX = "true", because
> that's the real problem here?
> 
> Kevin
> 

Oh, OK. I didn't know about that var.

Re: [Qemu-devel] [PATCH 2/6] accel/tcg: Handle get_page_addr_code() returning -1 in hashtable lookups

2018-07-13 Thread Emilio G. Cota

On Tue, Jul 10, 2018 at 17:00:09 +0100, Peter Maydell wrote:
> When we support execution from non-RAM MMIO regions, get_page_addr_code()
> will return -1 to indicate that there is no RAM at the requested address.
> Handle this in the cpu-exec TB hashtable lookup code, treating it as
> "no match found".
> 
> Note that the call to get_page_addr_code() in tb_lookup_cmp() needs
> no changes -- a return of -1 will already correctly result in the
> function returning false.
> 
> Signed-off-by: Peter Maydell 

Reviewed-by: Emilio G. Cota 

E.

Re: [Qemu-devel] [PATCH 4/6] accel/tcg: tb_gen_code(): Create single-insn TB for execution from non-RAM

2018-07-13 Thread Emilio G. Cota

On Tue, Jul 10, 2018 at 17:00:11 +0100, Peter Maydell wrote:
> If get_page_addr_code() returns -1, this indicates that there is no RAM
> page we can read a full TB from. Instead we must create a TB which
> contains a single instruction and which we do not cache, so it is
> executed only once.
> 
> Since this means we can now have TBs which are not in any page list,
> we also need to make tb_phys_invalidate() handle them (by not trying
> to remove them from a nonexistent page list).
> 
> Signed-off-by: Peter Maydell 

Reviewed-by: Emilio G. Cota 

Emilio

Re: [Qemu-devel] [PATCH 10/12] migration: introduce lockless multithreads model

2018-07-13 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> On Mon, Jun 04, 2018 at 05:55:18PM +0800, guangrong.x...@gmail.com wrote:
> > From: Xiao Guangrong 
> > 
> > Current implementation of compression and decompression are very
> > hard to be enabled on productions. We noticed that too many wait-wakes
> > go to kernel space and CPU usages are very low even if the system
> > is really free
> > 
> > The reasons are:
> > 1) there are two many locks used to do synchronous，there
> > 　　is a global lock and each single thread has its own lock,
> > 　　migration thread and work threads need to go to sleep if
> > 　　these locks are busy
> > 
> > 2) migration thread separately submits request to the thread
> >however, only one request can be pended, that means, the
> >thread has to go to sleep after finishing the request
> > 
> > To make it work better, we introduce a new multithread model,
> > the user, currently it is the migration thread, submits request
> > to each thread with round-robin manner, the thread has its own
> > ring whose capacity is 4 and puts the result to a global ring
> > which is lockless for multiple producers, the user fetches result
> > out from the global ring and do remaining operations for the
> > request, e.g, posting the compressed data out for migration on
> > the source QEMU
> > 
> > Performance Result:
> > The test was based on top of the patch:
> >ring: introduce lockless ring buffer
> > that means, previous optimizations are used for both of original case
> > and applying the new multithread model
> > 
> > We tested live migration on two hosts:
> >Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz * 64 + 256G memory
> > to migration a VM between each other, which has 16 vCPUs and 60G
> > memory, during the migration, multiple threads are repeatedly writing
> > the memory in the VM
> > 
> > We used 16 threads on the destination to decompress the data and on the
> > source, we tried 8 threads and 16 threads to compress the data
> > 
> > --- Before our work ---
> > migration can not be finished for both 8 threads and 16 threads. The data
> > is as followings:
> > 
> > Use 8 threads to compress:
> > - on the source:
> > migration thread   compress-threads
> > CPU usage   70%  some use 36%, others are very low ~20%
> > - on the destination:
> > main threaddecompress-threads
> > CPU usage   100% some use ~40%, other are very low ~2%
> > 
> > Migration status (CAN NOT FINISH):
> > info migrate
> > globals:
> > store-global-state: on
> > only-migratable: off
> > send-configuration: on
> > send-section-footer: on
> > capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: 
> > off compress: on events: off postcopy-ram: off x-colo: off release-ram: off 
> > block: off return-path: off pause-before-switchover: off x-multifd: off 
> > dirty-bitmaps: off postcopy-blocktime: off
> > Migration status: active
> > total time: 1019540 milliseconds
> > expected downtime: 2263 milliseconds
> > setup: 218 milliseconds
> > transferred ram: 252419995 kbytes
> > throughput: 2469.45 mbps
> > remaining ram: 15611332 kbytes
> > total ram: 62931784 kbytes
> > duplicate: 915323 pages
> > skipped: 0 pages
> > normal: 59673047 pages
> > normal bytes: 238692188 kbytes
> > dirty sync count: 28
> > page size: 4 kbytes
> > dirty pages rate: 170551 pages
> > compression pages: 121309323 pages
> > compression busy: 60588337
> > compression busy rate: 0.36
> > compression reduced size: 484281967178
> > compression rate: 0.97
> > 
> > Use 16 threads to compress:
> > - on the source:
> > migration thread   compress-threads
> > CPU usage   96%  some use 45%, others are very low ~6%
> > - on the destination:
> > main threaddecompress-threads
> > CPU usage   96% some use 58%, other are very low ~10%
> > 
> > Migration status (CAN NOT FINISH):
> > info migrate
> > globals:
> > store-global-state: on
> > only-migratable: off
> > send-configuration: on
> > send-section-footer: on
> > capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: 
> > off compress: on events: off postcopy-ram: off x-colo: off release-ram: off 
> > block: off return-path: off pause-before-switchover: off x-multifd: off 
> > dirty-bitmaps: off postcopy-blocktime: off
> > Migration status: active
> > total time: 1189221 milliseconds
> > expected downtime: 6824 milliseconds
> > setup: 220 milliseconds
> > transferred ram: 90620052 kbytes
> > throughput: 840.41 mbps
> > remaining ram: 3678760 kbytes
> > total ram: 62931784 kbytes
> > duplicate: 195893 pages
> > skipped: 0 pages
> > normal: 17290715 pages
> > normal bytes: 69162860 kbytes
> > dirty sync count: 33
> > page size: 4 kbytes
> > dirty pages rate: 175039 pages
> > compression pages: 186739419 pages
> > compression busy: 17486568
> > compression busy rate: 0.09
> > compression reduced size: 744546683892
> > compression rate: 0.97
> > 
> > --- After our work ---
> >

Re: [Qemu-devel] [PATCH] vhost: fix invalid downcast

2018-07-13 Thread Michael S. Tsirkin

On Fri, Jul 13, 2018 at 05:04:05PM +0300, Yury Kotov wrote:
> virtio_queue_get_desc_addr returns 64-bit hwaddr while int is usually 32-bit.
> If returned hwaddr is not equal to 0 but least-significant 32 bits are
> equal to 0 then this code will not actually stop running queue.
> 
> Signed-off-by: Yury Kotov 

So IIUC

Fixes: fb20fbb764aa1 ("vhost: avoid to start/stop virtqueue which is not ready")
And 
Cc: qemu-sta...@nongnu.org

> ---
>  hw/virtio/vhost.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index b129cb9..7edeee7 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1071,10 +1071,8 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
>  .index = vhost_vq_index,
>  };
>  int r;
> -int a;
>  
> -a = virtio_queue_get_desc_addr(vdev, idx);
> -if (a == 0) {
> +if (virtio_queue_get_desc_addr(vdev, idx) == 0) {
>  /* Don't stop the virtqueue which might have not been started */
>  return;
>  }
> -- 
> 2.7.4

Re: [Qemu-devel] [PATCH 1/5] nvme: PCI/e configuration from specification

2018-07-13 Thread David Sariel

Our bad. Change-Id tag snuck into those from gerrit
(https://review.gerrithub.io/c/davidsaOpenu/qemu/+/415434).

Took a note to replace this line with "[PATCH v2]" but, I guess, it makes
sense
if additional comments will follow, right?

Thanks for taking a look.

On 12 July 2018 at 14:47, Kevin Wolf  wrote:

> Am 22.06.2018 um 13:22 hat Shimi Gersner geschrieben:
> > PCI/e configuration currently does not meets specifications.
> >
> > Patch includes various configuration changes to support specifications
> > - BAR2 to return zero when read and CMD.IOSE is not set.
> > - Expose NVME configuration through IO space (Optional).
> > - PCI Power Management v1.2.
> > - PCIe Function Level Reset.
> > - Disable QEMUs default use of PCIe Link Status (DLLLA).
> > - PCIe missing AOC compliance flag.
> > - Mask PCIe End-to-End TLP as RO (Unspecified by specification).
> >
> > Change-Id: I9057f56266db16b013fa194730c998d4779cede3
>
> What is the meaning of this Change-Id tag?
>
> > Signed-off-by: Shimi Gersner 
>
> Keith, can you have a look at this series?
>
> Kevin
>



-- 

מדעי המחשב, האוניברסיטה הפתוחה
CS Dept, Open University of Israel

[Qemu-devel] [Bug 1781515] Re: Resolution switch leads to the screen/image being corrupted

2018-07-13 Thread Diego Viola

Switching the resolution with -vga std was working fine before, I'm not
sure on which version it started having this issue, but it should be on
a recent version.

I use the intel i915 drivers on the host OS.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1781515

Title:
  Resolution switch leads to the screen/image being corrupted

Status in QEMU:
  New

Bug description:
  I am currently using QEMU on a Arch Linux host, the guest OS is also
  Arch Linux.

  The QEMU version is currently 2.12.0-2 packaged by Arch Linux, the
  command line I'm using to fire an Arch VM is:

  $ qemu-system-x86_64 -enable-kvm -hda archlinux.qcow2 -m 4G -smp 4

  The problem I'm currently having is, after firing the VM and running
  startx I want to change the resolution to the native resolution, which
  is 1366x768 on my ThinkPad T450, however, after changing the
  resolution the image on the guest gets corrupted and it's impossible
  to see anything.

  At this point I can only turn off the VM. The only workaround I found
  is to start the VM with -vga virtio.

  The problem in this case occurs with -vga std which is the default
  video driver.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1781515/+subscriptions

[Qemu-devel] [Bug 1781515] Re: Resolution switch leads to the screen/image being corrupted

2018-07-13 Thread Francisco de la Peña

Hi Diego,

It seems this is a known limitation[1] because horizontal width is not a
multiple of 8, try 1360x768 as the nearest resolution, which works for
me on guests not supporting QXL drivers.

Regards.

[1] Proposed patch from 2013: https://lists.gnu.org/archive/html/qemu-
devel/2013-03/msg02732.html

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1781515

Title:
  Resolution switch leads to the screen/image being corrupted

Status in QEMU:
  New

Bug description:
  I am currently using QEMU on a Arch Linux host, the guest OS is also
  Arch Linux.

  The QEMU version is currently 2.12.0-2 packaged by Arch Linux, the
  command line I'm using to fire an Arch VM is:

  $ qemu-system-x86_64 -enable-kvm -hda archlinux.qcow2 -m 4G -smp 4

  The problem I'm currently having is, after firing the VM and running
  startx I want to change the resolution to the native resolution, which
  is 1366x768 on my ThinkPad T450, however, after changing the
  resolution the image on the guest gets corrupted and it's impossible
  to see anything.

  At this point I can only turn off the VM. The only workaround I found
  is to start the VM with -vga virtio.

  The problem in this case occurs with -vga std which is the default
  video driver.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1781515/+subscriptions

[Qemu-devel] [Bug 1781515] [NEW] Resolution switch leads to the screen/image being corrupted

2018-07-13 Thread Diego Viola

Public bug reported:

I am currently using QEMU on a Arch Linux host, the guest OS is also
Arch Linux.

The QEMU version is currently 2.12.0-2 packaged by Arch Linux, the
command line I'm using to fire an Arch VM is:

$ qemu-system-x86_64 -enable-kvm -hda archlinux.qcow2 -m 4G -smp 4

The problem I'm currently having is, after firing the VM and running
startx I want to change the resolution to the native resolution, which
is 1366x768 on my ThinkPad T450, however, after changing the resolution
the image on the guest gets corrupted and it's impossible to see
anything.

At this point I can only turn off the VM. The only workaround I found is
to start the VM with -vga virtio.

The problem in this case occurs with -vga std which is the default video
driver.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1781515

Title:
  Resolution switch leads to the screen/image being corrupted

Status in QEMU:
  New

Bug description:
  I am currently using QEMU on a Arch Linux host, the guest OS is also
  Arch Linux.

  The QEMU version is currently 2.12.0-2 packaged by Arch Linux, the
  command line I'm using to fire an Arch VM is:

  $ qemu-system-x86_64 -enable-kvm -hda archlinux.qcow2 -m 4G -smp 4

  The problem I'm currently having is, after firing the VM and running
  startx I want to change the resolution to the native resolution, which
  is 1366x768 on my ThinkPad T450, however, after changing the
  resolution the image on the guest gets corrupted and it's impossible
  to see anything.

  At this point I can only turn off the VM. The only workaround I found
  is to start the VM with -vga virtio.

  The problem in this case occurs with -vga std which is the default
  video driver.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1781515/+subscriptions

[Qemu-devel] QEMU advent calendar

2018-07-13 Thread Richard Jansson

Hi

I checked in on the QEMU advent calendar side of which I have very fond
memories. I'm very impressed and grateful for the work that you've put into
this site. My question to you is the following, what are your plans for the
continuation of the program?

In case there's anything I can do I'll gladly put in some work to help you
out to support you in your noble cause :)

Regards
Richard

[Qemu-devel] [Bug 1781515] Re: Resolution switch leads to the screen/image being corrupted

2018-07-13 Thread Diego Viola

Hi Francisco, thanks for your quick reply.

I've tried `xrandr --output Virtual-1 --mode 1360x768' with -vga std and
I also get a corrupted image.

I'm attaching a screenshot of what the screen corruption looks like
after changing the resolution.

Thanks.

** Attachment added: "qemu.png"
   
https://bugs.launchpad.net/qemu/+bug/1781515/+attachment/5163111/+files/qemu.png

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1781515

Title:
  Resolution switch leads to the screen/image being corrupted

Status in QEMU:
  New

Bug description:
  I am currently using QEMU on a Arch Linux host, the guest OS is also
  Arch Linux.

  The QEMU version is currently 2.12.0-2 packaged by Arch Linux, the
  command line I'm using to fire an Arch VM is:

  $ qemu-system-x86_64 -enable-kvm -hda archlinux.qcow2 -m 4G -smp 4

  The problem I'm currently having is, after firing the VM and running
  startx I want to change the resolution to the native resolution, which
  is 1366x768 on my ThinkPad T450, however, after changing the
  resolution the image on the guest gets corrupted and it's impossible
  to see anything.

  At this point I can only turn off the VM. The only workaround I found
  is to start the VM with -vga virtio.

  The problem in this case occurs with -vga std which is the default
  video driver.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1781515/+subscriptions

Re: [Qemu-devel] [PATCH v6 4/5] s390x/vfio: ap: Introduce VFIO AP device

2018-07-13 Thread Tony Krowiak


On 07/04/2018 01:24 PM, Cornelia Huck wrote:

On Fri, 29 Jun 2018 18:48:02 -0400
Tony Krowiak  wrote:


Introduces a VFIO based AP device. The device is defined via
the QEMU command line by specifying:

 -device vfio-ap,sysfsdev=

There may be only one vfio-ap device configured for a guest.

The mediated matrix device is created by the VFIO AP device
driver by writing a UUID to a sysfs attribute file (see
docs/vfio-ap.txt). The mediated matrix device will be named
after the UUID. Symbolic links to the $uuid are created in
many places, so the path to the mediated matrix device $uuid
can be specified in any of the following ways:

/sys/devices/vfio_ap/matrix/$uuid
/sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/devices/$uuid
/sys/bus/mdev/devices/$uuid
/sys/bus/mdev/drivers/vfio_mdev/$uuid

When the vfio-ap device is realized, it acquires and opens the
VFIO iommu group to which the mediated matrix device is
bound. This causes a VFIO group notification event to be
signaled. The vfio_ap device driver's group notification
handler will get called at which time the device driver
will configure the the AP devices to which the guest will
be granted access.

Signed-off-by: Tony Krowiak 
Signed-off-by: Tony Krowiak 

Nit: You may want to decide one address or the other :)


Unintentional  I'll correct it.




---
  MAINTAINERS   |1 +
  default-configs/s390x-softmmu.mak |1 +
  hw/vfio/Makefile.objs |1 +
  hw/vfio/ap.c  |  183 +
  include/hw/vfio/vfio-common.h |1 +
  5 files changed, 187 insertions(+), 0 deletions(-)
  create mode 100644 hw/vfio/ap.c

No complaints about the code from me.


I like no complaints:)

Re: [Qemu-devel] [PATCH v2 06/16] hw/display/xlnx_dp: Move problematic code from instance_init to realize

2018-07-13 Thread Thomas Huth

On 13.07.2018 13:13, Paolo Bonzini wrote:
> On 13/07/2018 10:27, Thomas Huth wrote:
>> aux_create_slave() calls qdev_init_nofail() which in turn "realizes"
>> the corresponding object. Thus this most not be called from an
>> instance_init function. Move the code to the realize function instead.
>>
>> Signed-off-by: Thomas Huth 
>> ---
>>  hw/display/xlnx_dp.c | 23 +--
>>  1 file changed, 9 insertions(+), 14 deletions(-)
> 
>> +s->aux_bus = aux_init_bus(dev, "aux");
> 
> aux_init_bus can remain in the same place, and likewise the qdev_create
> that assigns to s->edid.
> 
> The only thing that has to move is the qdev_init_nofail and
> aux_bus_map_device, like this:
> 
> - 8< --
> From: Paolo Bonzini 
> Subject: [PATCH] hw/display/xlnx_dp: Move problematic code from instance_init 
> to realize

Your patch looks good at a first quick glance, but it seems not to work as 
expected: When I now run QEMU like this:

echo "{'execute':'qmp_capabilities'}" \
 "{'execute':'device-list-properties'," \
 "'arguments':{'typename':'xlnx,zynqmp'}}" \
 "{'execute': 'human-monitor-command', " \
 "'arguments': {'command-line': 'info qtree'}}" | \
 aarch64-softmmu/qemu-system-aarch64 -M none,accel=qtest -qmp stdio

then QEMU ends up in an endless loop and I've got to kill it.

 Thomas

[Qemu-devel] [RFC v3] arm: Add NRF51 random number generator peripheral

2018-07-13 Thread Steffen Görtz

Add a model of the NRF51 random number generator peripheral.

Signed-off-by: Steffen Görtz 
---
Changes since v3:
  - Replace bitfields
  - Add VMState / reset
  - Add reference to reference manual

Changes since v2: 
  - Add missing 'qapi/error.h' for error_abort

Changes since v1: 
  - Add implementation access size hints to MemoryRegionOps
  - Fail on error if qcrypto_random_bytes fails
  - Add references to Nrf51 datasheets


 hw/misc/Makefile.objs   |   1 +
 hw/misc/nrf51_rng.c | 269 
 include/hw/misc/nrf51_rng.h |  71 ++
 3 files changed, 341 insertions(+)
 create mode 100644 hw/misc/nrf51_rng.c
 create mode 100644 include/hw/misc/nrf51_rng.h

diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 00e834d0f0..fd8cc97249 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -70,3 +70,4 @@ obj-$(CONFIG_AUX) += auxbus.o
 obj-$(CONFIG_ASPEED_SOC) += aspeed_scu.o aspeed_sdmc.o
 obj-y += mmio_interface.o
 obj-$(CONFIG_MSF2) += msf2-sysreg.o
+obj-$(CONFIG_NRF51_SOC) += nrf51_rng.o
diff --git a/hw/misc/nrf51_rng.c b/hw/misc/nrf51_rng.c
new file mode 100644
index 00..cae427a391
--- /dev/null
+++ b/hw/misc/nrf51_rng.c
@@ -0,0 +1,269 @@
+/*
+ * nRF51 Random Number Generator
+ *
+ * Reference Manual: http://infocenter.nordicsemi.com/pdf/nRF51_RM_v3.0.pdf
+ *
+ * Copyright 2018 Steffen Görtz 
+ *
+ * This code is licensed under the GPL version 2 or later.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "hw/misc/nrf51_rng.h"
+#include "crypto/random.h"
+
+#define NRF51_RNG_SIZE 0x1000
+
+#define NRF51_RNG_TASK_START   0x000
+#define NRF51_RNG_TASK_STOP0x004
+#define NRF51_RNG_EVENT_VALRDY 0x100
+#define NRF51_RNG_REG_SHORTS   0x200
+#define NRF51_RNG_REG_SHORTS_VALRDY_STOP 0
+#define NRF51_RNG_REG_INTEN0x300
+#define NRF51_RNG_REG_INTEN_VALRDY 0
+#define NRF51_RNG_REG_INTENSET 0x304
+#define NRF51_RNG_REG_INTENCLR 0x308
+#define NRF51_RNG_REG_CONFIG   0x504
+#define NRF51_RNG_REG_CONFIG_DECEN 0
+#define NRF51_RNG_REG_VALUE0x508
+
+#define NRF51_TRIGGER_TASK 0x01
+#define NRF51_EVENT_CLEAR  0x00
+
+
+static uint64_t rng_read(void *opaque, hwaddr offset, unsigned int size)
+{
+Nrf51RNGState *s = NRF51_RNG(opaque);
+uint64_t r = 0;
+
+switch (offset) {
+case NRF51_RNG_EVENT_VALRDY:
+r = s->event_valrdy;
+break;
+case NRF51_RNG_REG_SHORTS:
+r = s->shortcut_stop_on_valrdy;
+break;
+case NRF51_RNG_REG_INTEN:
+case NRF51_RNG_REG_INTENSET:
+case NRF51_RNG_REG_INTENCLR:
+r = s->interrupt_enabled;
+break;
+case NRF51_RNG_REG_CONFIG:
+r = s->filter_enabled;
+break;
+case NRF51_RNG_REG_VALUE:
+r = s->value;
+break;
+
+default:
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: bad read offset 0x%" HWADDR_PRIx "\n",
+  __func__, offset);
+}
+
+return r;
+}
+
+static int64_t calc_next_timeout(Nrf51RNGState *s)
+{
+int64_t timeout = qemu_clock_get_us(QEMU_CLOCK_VIRTUAL);
+if (s->filter_enabled) {
+timeout += s->period_filtered_us;
+} else {
+timeout += s->period_unfiltered_us;
+}
+
+return timeout;
+}
+
+
+static void rng_update_timer(Nrf51RNGState *s)
+{
+if (s->active) {
+timer_mod(>timer, calc_next_timeout(s));
+} else {
+timer_del(>timer);
+}
+}
+
+
+static void rng_write(void *opaque, hwaddr offset,
+   uint64_t value, unsigned int size)
+{
+Nrf51RNGState *s = NRF51_RNG(opaque);
+
+switch (offset) {
+case NRF51_RNG_TASK_START:
+if (value == NRF51_TRIGGER_TASK) {
+s->active = 1;
+rng_update_timer(s);
+}
+break;
+case NRF51_RNG_TASK_STOP:
+if (value == NRF51_TRIGGER_TASK) {
+s->active = 0;
+rng_update_timer(s);
+}
+break;
+case NRF51_RNG_EVENT_VALRDY:
+if (value == NRF51_EVENT_CLEAR) {
+s->event_valrdy = 0;
+qemu_set_irq(s->eep_valrdy, 0);
+}
+break;
+case NRF51_RNG_REG_SHORTS:
+s->shortcut_stop_on_valrdy =
+(value & BIT_MASK(NRF51_RNG_REG_SHORTS_VALRDY_STOP)) ? 1 : 0;
+break;
+case NRF51_RNG_REG_INTEN:
+s->interrupt_enabled =
+(value & BIT_MASK(NRF51_RNG_REG_INTEN_VALRDY)) ? 1 : 0;
+break;
+case NRF51_RNG_REG_INTENSET:
+if (value & BIT_MASK(NRF51_RNG_REG_INTEN_VALRDY)) {
+s->interrupt_enabled = 1;
+}
+break;
+case NRF51_RNG_REG_INTENCLR:
+if (value & BIT_MASK(NRF51_RNG_REG_INTEN_VALRDY)) {
+s->interrupt_enabled = 0;
+}
+break;
+case NRF51_RNG_REG_CONFIG:
+s->filter_enabled =
+  (value & BIT_MASK(NRF51_RNG_REG_CONFIG_DECEN)) ? 1 : 0;
+

Re: [Qemu-devel] Native Memory Virtualization in qemu-system-aarch64

2018-07-13 Thread Peter Maydell

On 12 July 2018 at 17:48, Kevin Loughlin  wrote:
> I know TrustZone has support for memory virtualization in AArch64, but I'm
> looking to create a different model. Namely, I'd like to fully virtualize
> the memory map for the "virt" board.
>
> As a basic example of what I want, assuming an execution environment that
> runs in a 1GB physical address space (0x0 - 0x3FFF), I'd like to be
> able to switch to a second execution environment with a distinct SW stack
> that runs in the second GB of a board memory (0x4000 - 0x7FFF). The
> key points for my desired memory virtualization are the following...
>
>1. Both of these environments should have distinct virtual address spaces
>2. The OS in each environment should believe it is running on physical
>addresses 0x0 - 0x3FFF in both cases.
>3. Neither environment should have access to the physical memory state
>of the other
>
> I initialize distinct AddressSpace and MemoryRegion structures for each of
> these GB blocks. Because all I want is a simple shift of physical address
> for one environment, I hesitate to mirror the (relatively) complex address
> translation process for TrustZone. Does anyone know if it would be better
> to either (a) provide custom read/write functions for the shifted
> MemoryRegion object, or (b) modify the target/arm code, such as adding a
> shift to get_phys_addr() in target/arm/helper.c?

I'm a bit confused about what you're trying to do. Without TrustZone,
by definition there is only one physical address space (ie all of
memory/devices/etc are addressed by a single 64-bit physaddr).
There's no way to cause the CPU to not have access to it.
With TrustZone, you can think of the system as having two physical
address spaces (so to access something you need to specify both
a 64-bit physaddr and the TZ secure/nonsecure bit), and the CPU
and the system design cooperate to enforce that code running in the
nonsecure world can't get at things in the system it should not have
access to.

The whole point of TZ is to allow you to do this sort of partitioning.
Without it there's no way for the system (RAM or whatever) to know which
environment is running on the CPU.

You could in theory design and implement a non-standard extension to
the architecture to do equivalent things to what TZ is doing I suppose,
but that would be a lot of work and a lot of fragile modifications
to QEMU.

thanks
-- PMM

Re: [Qemu-devel] [Qemu-arm] [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions

2018-07-13 Thread Peter Maydell

On 12 July 2018 at 17:37, Peter Maydell  wrote:
> On 11 July 2018 at 05:21, Philippe Mathieu-Daudé  wrote:
>> I applied and quickly tested your series on a MIPS SoC I'm working on
>> which has a tiny SRAM:
>>
>> (qemu) info mtree
>> address-space: memory
>>   - (prio 0, i/o): system
>> -07ff (prio 0, ram): sram
>> 1000-107f (prio 0, i/o): pflash
>> 1400-14ff (prio 0, ram): dram
>> 1fc0-1fc0 (prio 0, rom): srom
>>
>> The firmware copies the ISR in this SRAM area, sadly it didn't work as
>> expected:
>>
>> qemu-system-mips: Bad ram pointer 0x4a4

> Are you sure this is executing from the SRAM at this point?
> The PC value in the backtrace is 2415932580 == 900034A4. I don't
> know how the MMU is configured at this point, but my guess is that
> that's actually in the pflash area, and you're running into a
> different bug:
> https://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg00674.html
> where when we attempt to execute from a romd device that's in romd
> mode we incorrectly think it's RAM in get_page_addr_code() and crash.

I just sent a follow-on patch which should fix the 'bad ram pointer'
assert. If your guest really is trying to execute from the pflash
here then it will just misbehave in a different way, though,
since pflash status response bytes are probably not valid MIPS
instructions...

thanks
-- PMM

[Qemu-devel] [PATCH] accel/tcg: Check whether TLB entry is RAM consistently with how we set it up

2018-07-13 Thread Peter Maydell

We set up TLB entries in tlb_set_page_with_attrs(), where we have
some logic for determining whether the TLB entry is considered
to be RAM-backed, and thus has a valid addend field. When we
look at the TLB entry in get_page_addr_code(), we use different
logic for determining whether to treat the page as RAM-backed
and use the addend field. This is confusing, and in fact buggy,
because the code in tlb_set_page_with_attrs() correctly decides
that rom_device memory regions not in romd mode are not RAM-backed,
but the code in get_page_addr_code() thinks they are RAM-backed.
This typically results in "Bad ram pointer" assertion if the
guest tries to execute from such a memory region.

Fix this by making get_page_addr_code() just look at the
TLB_MMIO bit in the code_address field of the TLB, which
tlb_set_page_with_attrs() sets if and only if the addend
field is not valid for code execution.

Signed-off-by: Peter Maydell 
---
This patch is based on:
 * [PATCH for-3.0 0/2] accel/tcg: fix get_page_addr_code() victim TLB lookups
 * [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions
Based-on: <20180713141636.18665-1-peter.mayd...@linaro.org>
Based-on: <20180710160013.26559-1-peter.mayd...@linaro.org>

It would be possible to fix the 'bad ram pointer' in a more
minimalist way (by making memory_region_is_unassigned()
check "!memory_region_is_romd(mr)" rather than "!mr->rom_device")
but since for 3.0 the only thing this does is turn that assert
failure into the "can't execute from non-ram" error-exit it
doesn't seem worth fixing for 3.0. After 3.0 the small-MMU-region
exec support makes it more interesting. NB that if your
rom-device is pflash then being able to execute from it when
it's not in romd mode is not likely to actually help you, since
pflash status code returns are seldom valid instructions :-)

Philippe: you might like to try this lot with your MIPS
code that was getting the bad ram addr error?
---
 include/exec/exec-all.h |  2 --
 accel/tcg/cputlb.c  | 30 +-
 exec.c  |  6 --
 3 files changed, 9 insertions(+), 29 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index da73e3bfed2..5f781255826 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -502,8 +502,6 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
hwaddr paddr, hwaddr xlat,
int prot,
target_ulong *address);
-bool memory_region_is_unassigned(MemoryRegion *mr);
-
 #endif
 
 /* vl.c */
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 754795ff253..5e5a2a2616c 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -926,10 +926,6 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env, 
target_ulong addr)
 {
 int mmu_idx, index;
 void *p;
-MemoryRegion *mr;
-MemoryRegionSection *section;
-CPUState *cpu = ENV_GET_CPU(env);
-CPUIOTLBEntry *iotlbentry;
 
 index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
 mmu_idx = cpu_mmu_index(env, true);
@@ -939,29 +935,21 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env, 
target_ulong addr)
 }
 assert(tlb_hit(env->tlb_table[mmu_idx][index].addr_code, addr));
 }
+assert(tlb_hit(env->tlb_table[mmu_idx][index].addr_code, addr));
 
-if (unlikely(env->tlb_table[mmu_idx][index].addr_code & TLB_RECHECK)) {
+if (unlikely(env->tlb_table[mmu_idx][index].addr_code &
+ (TLB_RECHECK | TLB_MMIO))) {
 /*
- * This is a TLB_RECHECK access, where the MMU protection
- * covers a smaller range than a target page. Return -1 to
- * indicate that we cannot simply execute from RAM here;
- * we will perform the necessary repeat of the MMU check
- * when the "execute a single insn" code performs the
- * load of the guest insn.
+ * Return -1 if we can't translate and execute from an entire
+ * page of RAM here, which will cause us to execute by loading
+ * and translating one insn at a time, without caching:
+ *  - TLB_RECHECK: means the MMU protection covers a smaller range
+ *than a target page, so we must redo the MMU check every insn
+ *  - TLB_MMIO: region is not backed by RAM
  */
 return -1;
 }
 
-iotlbentry = >iotlb[mmu_idx][index];
-section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
-mr = section->mr;
-if (memory_region_is_unassigned(mr)) {
-/*
- * Not guest RAM, so there is no ram_addr_t for it. Return -1,
- * and we will execute a single insn from this device.
- */
-return -1;
-}
 p = (void *)((uintptr_t)addr + env->tlb_table[mmu_idx][index].addend);
 return qemu_ram_addr_from_host_nofail(p);
 }
diff --git a/exec.c b/exec.c
index 4f5df07b6a2..e7be0761c28 100644
--- a/exec.c
+++

Re: [Qemu-devel] [PATCH] hw/char/serial: Only retry if qemu_chr_fe_write returns 0

2018-07-13 Thread Paolo Bonzini

On 13/07/2018 16:55, Marc-André Lureau wrote:
> -return;
> +} else {
> +int rc = qemu_chr_fe_write(>chr, >tsr, 1);
> +
> +if ((rc == 0 ||
> + (rc == -1 && (errno == EAGAIN || errno == EINTR))) &&
> +s->tsr_retry < MAX_XMIT_RETRY) {
> +assert(s->watch_tag == 0);
> +s->watch_tag =
> +qemu_chr_fe_add_watch(>chr, G_IO_OUT | G_IO_HUP,
> +  serial_watch_cb, s);
> +if (s->watch_tag > 0) {
> +s->tsr_retry++;
> +return;
> +}

I think EINTR should be handled by chardev, otherwise yes.

Paolo

Re: [Qemu-devel] [PATCH] hw/char/serial: Only retry if qemu_chr_fe_write returns 0

2018-07-13 Thread Marc-André Lureau

Hi

On Tue, Jul 10, 2018 at 3:48 PM, Igor Mammedov  wrote:
> On Tue, 5 Jun 2018 11:18:35 +0200
> Paolo Bonzini  wrote:
>
>> On 05/06/2018 09:54, Sergio Lopez wrote:
>> > Only retry on serial_xmit if qemu_chr_fe_write returns 0, as this is the
>> > only recoverable error.
>> >
>> > Retrying with any other scenario, in addition to being a waste of CPU
>> > cycles, can compromise the Guest stability if by the vCPU issuing the
>> > write and the main loop thread are, by chance or explicit pinning,
>> > running on the same pCPU.
>> >
>> > Previous discussion:
>> >
>> > https://lists.nongnu.org/archive/html/qemu-devel/2018-05/msg06998.html
>> >
>> > Signed-off-by: Sergio Lopez 
>> > ---
>> >  hw/char/serial.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/hw/char/serial.c b/hw/char/serial.c
>> > index 605b0d0..6de6c29 100644
>> > --- a/hw/char/serial.c
>> > +++ b/hw/char/serial.c
>> > @@ -260,7 +260,7 @@ static void serial_xmit(SerialState *s)
>> >  if (s->mcr & UART_MCR_LOOP) {
>> >  /* in loopback mode, say that we just received a char */
>> >  serial_receive1(s, >tsr, 1);
>> > -} else if (qemu_chr_fe_write(>chr, >tsr, 1) != 1 &&
>> > +} else if (qemu_chr_fe_write(>chr, >tsr, 1) == 0 &&
>> > s->tsr_retry < MAX_XMIT_RETRY) {
>> >  assert(s->watch_tag == 0);
>> >  s->watch_tag =
>> >
> Hi Sergio, Paolo,
>
> it looks like commit
> commit 019288bf137183bf3407c9824655b753bfafc99f
> Author: Sergio Lopez 
> Date:   Tue Jun 5 03:54:55 2018 -0400
>
> hw/char/serial: Only retry if qemu_chr_fe_write returns 0
>
> introduced regression wrt 2.12 and broke windows guest remote kernel debug 
> over
> serial, where windbg can't connect to target and target hangs when windbg 
> tries
> to connect to it.
>
> Reverting the commit fixes issue
>

is this a possible solution?

diff --git a/hw/char/serial.c b/hw/char/serial.c
index cd7d747c68..046c4684ff 100644
--- a/hw/char/serial.c
+++ b/hw/char/serial.c
@@ -261,15 +261,20 @@ static void serial_xmit(SerialState *s)
 if (s->mcr & UART_MCR_LOOP) {
 /* in loopback mode, say that we just received a char */
 serial_receive1(s, >tsr, 1);
-} else if (qemu_chr_fe_write(>chr, >tsr, 1) == 0 &&
-   s->tsr_retry < MAX_XMIT_RETRY) {
-assert(s->watch_tag == 0);
-s->watch_tag =
-qemu_chr_fe_add_watch(>chr, G_IO_OUT | G_IO_HUP,
-  serial_watch_cb, s);
-if (s->watch_tag > 0) {
-s->tsr_retry++;
-return;
+} else {
+int rc = qemu_chr_fe_write(>chr, >tsr, 1);
+
+if ((rc == 0 ||
+ (rc == -1 && (errno == EAGAIN || errno == EINTR))) &&
+s->tsr_retry < MAX_XMIT_RETRY) {
+assert(s->watch_tag == 0);
+s->watch_tag =
+qemu_chr_fe_add_watch(>chr, G_IO_OUT | G_IO_HUP,
+  serial_watch_cb, s);
+if (s->watch_tag > 0) {
+s->tsr_retry++;
+return;
+}
 }
 }
 s->tsr_retry = 0;


-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH v2 1/5] i386: Add support for IA32_PRED_CMD and IA32_ARCH_CAPABILITIES MSRs

2018-07-13 Thread Konrad Rzeszutek Wilk

On Fri, Jul 13, 2018 at 04:44:49PM +0200, Paolo Bonzini wrote:
> On 13/07/2018 16:11, konrad.w...@oracle.com wrote:
> > (Apologies if this comes out as HTML, using Thunderbird instead of mutt
> > here)..
> > 
> >> +    uint64_t pred_cmd;
> >> +    uint64_t arch_capabilities;
> > 
> > Could this be 'arch_cap' ?
> > 
> 
> Why?  Intel chose a verbose name, we should not abbrev. it unnecessarily. :)

Oh you are right. Gosh, so many more characters to type :-(
> 
> Paolo

Re: [Qemu-devel] [PATCH v3 14/20] intc/arm_gic: Wire the vCPU interface

2018-07-13 Thread Luc Michel



On 07/12/2018 03:37 PM, Peter Maydell wrote:
> On 29 June 2018 at 14:29, Luc Michel  wrote:
>> Add the read/write functions to handle accesses to the vCPU interface.
>> Those accesses are forwarded to the real CPU interface, with the CPU id
>> being converted to the corresponding vCPU id (vCPU id = CPU id +
>> GIC_NCPU).
>>
>> As for the CPU interface, we create a base region for the vCPU interface
>> that fetches the current vCPU id using the current_cpu global variable, and
>> one mirror region per vCPU which maps to that specific vCPU id. This is
>> required by the GIC architecture specification.
> 
> 
>>  static void arm_gic_realize(DeviceState *dev, Error **errp)
>>  {
>>  /* Device instance realize function for the GIC sysbus device */
>> @@ -1531,7 +1590,7 @@ static void arm_gic_realize(DeviceState *dev, Error 
>> **errp)
>>  }
>>
>>  /* This creates distributor and main CPU interface (s->cpuiomem[0]) */
> 
> Can we also update this comment to indicate what virt-related
> memory regions are being created?
> 
>> -gic_init_irqs_and_mmio(s, gic_set_irq, gic_ops, NULL);
>> +gic_init_irqs_and_mmio(s, gic_set_irq, gic_ops, gic_virt_ops);
>>
>>  /* Extra core-specific regions for the CPU interfaces. This is
>>   * necessary for "franken-GIC" implementations, for example on
>> @@ -1547,6 +1606,16 @@ static void arm_gic_realize(DeviceState *dev, Error 
>> **errp)
>>>backref[i], "gic_cpu", 0x100);
>>  sysbus_init_mmio(sbd, >cpuiomem[i+1]);
>>  }
>> +
>> +if (s->virt_extn) {
> 
> ...and a comment about what these regions are for.
> 
> What requires these per-core regions anyway? There's no way to
> specify them in the device tree bindings for Linux, which AFAIK
> only cares about using the "vcpu i/f for this core" registers.
> I don't think the GIC-400 has these. (It does have per-cpu
> aliases of the GICH_* registers, but this patchset doesn't seem
> to implement those.)
My mistake. I misread the specifications. Those aliases should target
the virtual interfaces, and not the virtual CPU interfaces. I'll change
that.

> 
>> +for (i = 0; i < s->num_cpu; i++) {
>> +memory_region_init_io(>vcpuiomem[i + 1], OBJECT(s),
>> +  _vcpu_ops, >backref[i],
>> +  "gic_vcpu", 0x2000);
>> +sysbus_init_mmio(sbd, >vcpuiomem[i + 1]);
>> +}
>> +}
>> +
>>  }
> 
> thanks
> -- PMM
> 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2 1/5] i386: Add support for IA32_PRED_CMD and IA32_ARCH_CAPABILITIES MSRs

2018-07-13 Thread Paolo Bonzini

On 13/07/2018 16:11, konrad.w...@oracle.com wrote:
> (Apologies if this comes out as HTML, using Thunderbird instead of mutt
> here)..
> 
>> +    uint64_t pred_cmd;
>> +    uint64_t arch_capabilities;
> 
> Could this be 'arch_cap' ?
> 

Why?  Intel chose a verbose name, we should not abbrev. it unnecessarily. :)

Paolo

[Qemu-devel] [PATCH for-3.0 2/2] accel/tcg: Assert that tlb fill gave us a valid TLB entry

2018-07-13 Thread Peter Maydell

In commit 4b1a3e1e34ad97 we added a check for whether the TLB entry
we had following a tlb_fill had the INVALID bit set.  This could
happen in some circumstances because a stale or wrong TLB entry was
pulled out of the victim cache.  However, after commit
68fea038553039e (which prevents stale entries being in the victim
cache) and the previous commit (which ensures we don't incorrectly
hit in the victim cache)) this should never be possible.

Drop the check on TLB_INVALID_MASK from the "is this a TLB_RECHECK?"
condition, and instead assert that the tlb fill procedure has given
us a valid TLB entry (or longjumped out with a guest exception).

Signed-off-by: Peter Maydell 
---
 accel/tcg/cputlb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 2d5fb15d9a3..563fa30117e 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -970,10 +970,10 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env, 
target_ulong addr)
 if (!VICTIM_TLB_HIT(addr_code, addr)) {
 tlb_fill(ENV_GET_CPU(env), addr, 0, MMU_INST_FETCH, mmu_idx, 0);
 }
+assert(tlb_hit(env->tlb_table[mmu_idx][index].addr_code, addr));
 }
 
-if (unlikely((env->tlb_table[mmu_idx][index].addr_code &
-  (TLB_RECHECK | TLB_INVALID_MASK)) == TLB_RECHECK)) {
+if (unlikely(env->tlb_table[mmu_idx][index].addr_code & TLB_RECHECK)) {
 /*
  * This is a TLB_RECHECK access, where the MMU protection
  * covers a smaller range than a target page, and we must
-- 
2.17.1

[Qemu-devel] [PATCH for-3.0 0/2] accel/tcg: fix get_page_addr_code() victim TLB lookups

2018-07-13 Thread Peter Maydell

This patchset fixes a bug in get_page_addr_code()'s
lookup of the address in the victim TLB, which was another
source of "we end up with an invalid TLB entry even after
we've done a TLB fill for it".

The second patch then removes a check that we had that was
working around the existence of the bug patch 1 fixes and
the bug that commit 68fea038553039e fixes, and instead
puts in an assertion that we definitely do have a valid TLB.

Tested with Laurent's m68k image to check we didn't regress that.

thanks
-- PMM

Peter Maydell (2):
  accel/tcg: Use correct test when looking in victim TLB for code
  accel/tcg: Assert that tlb fill gave us a valid TLB entry

 accel/tcg/cputlb.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.17.1

[Qemu-devel] [PATCH for-3.0 1/2] accel/tcg: Use correct test when looking in victim TLB for code

2018-07-13 Thread Peter Maydell

In get_page_addr_code(), we were incorrectly looking in the victim
TLB for an entry which matched the target address for reads, not
for code accesses. This meant that we could hit on a victim TLB
entry that indicated that the address was readable but not
executable, and incorrectly bypass the call to tlb_fill() which
should generate the guest MMU exception. Fix this bug.

Signed-off-by: Peter Maydell 
---
 accel/tcg/cputlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 20c147d6554..2d5fb15d9a3 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -967,7 +967,7 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env, 
target_ulong addr)
 index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
 mmu_idx = cpu_mmu_index(env, true);
 if (unlikely(!tlb_hit(env->tlb_table[mmu_idx][index].addr_code, addr))) {
-if (!VICTIM_TLB_HIT(addr_read, addr)) {
+if (!VICTIM_TLB_HIT(addr_code, addr)) {
 tlb_fill(ENV_GET_CPU(env), addr, 0, MMU_INST_FETCH, mmu_idx, 0);
 }
 }
-- 
2.17.1

Re: [Qemu-devel] [PATCH v2 1/5] i386: Add support for IA32_PRED_CMD and IA32_ARCH_CAPABILITIES MSRs

2018-07-13 Thread konrad . wilk


(Apologies if this comes out as HTML, using Thunderbird instead of mutt here)..


+uint64_t pred_cmd;
+uint64_t arch_capabilities;


Could this be 'arch_cap' ?

  
  /* End of state preserved by INIT (dummy marker).  */

  struct {} end_init_save;
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 2d174f3..3aab182 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -93,6 +93,8 @@ static bool has_msr_hv_reenlightenment;
  static bool has_msr_xss;
  static bool has_msr_spec_ctrl;
  static bool has_msr_virt_ssbd;
+static bool has_msr_pred_cmd;
+static bool has_msr_arch_capabilities;


Ditto here?

[Qemu-devel] [PATCH] vhost: fix invalid downcast

2018-07-13 Thread Yury Kotov

virtio_queue_get_desc_addr returns 64-bit hwaddr while int is usually 32-bit.
If returned hwaddr is not equal to 0 but least-significant 32 bits are
equal to 0 then this code will not actually stop running queue.

Signed-off-by: Yury Kotov 
---
 hw/virtio/vhost.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index b129cb9..7edeee7 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1071,10 +1071,8 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
 .index = vhost_vq_index,
 };
 int r;
-int a;
 
-a = virtio_queue_get_desc_addr(vdev, idx);
-if (a == 0) {
+if (virtio_queue_get_desc_addr(vdev, idx) == 0) {
 /* Don't stop the virtqueue which might have not been started */
 return;
 }
-- 
2.7.4

Re: [Qemu-devel] qemu-nbd vs 'simple' trace backend vs iotest 147

2018-07-13 Thread Stefan Hajnoczi

On Fri, Jul 13, 2018 at 08:40:19AM +0200, Paolo Bonzini wrote:
> On 12/07/2018 18:30, Stefan Hajnoczi wrote:
> > On Wed, Jul 11, 2018 at 03:33:21PM +0200, Cornelia Huck wrote:
> >> The other qemu-nbds (the inet and the unix socket ones from the first
> >> run, the second inet one from the second run) have a single thread with
> >> the same backtrace I posted above.
> > 
> > We just discussed this on IRC, but for the record:
> > 
> > qemu-nbd --fork will fork the process after the simpletrace write-out
> > thread has been spawned.  The child process lacks this thread (due to
> > how fork(2) handles multithreading).  Either qemu-nbd needs to
> > initialize tracing later (but that means we cannot trace early init) or
> > simpletrace needs a way to respawn the write-out thread.
> 
> You can use pthread_atfork for this.

Thanks.  The man page says:

  The intent of  pthread_atfork()  was  to provide  a  mechanism
  whereby  the application (or a library) could ensure that mutexes and
  other process and thread state would be restored to a consistent
  state.  In practice, this task is generally too difficult to be
  practicable.

Challenge accepted!

Stefan


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v3 16/20] intc/arm_gic: Implement gic_update_virt() function

2018-07-13 Thread Luc Michel



On 07/12/2018 03:56 PM, Peter Maydell wrote:
> On 29 June 2018 at 14:29, Luc Michel  wrote:
>> Add the gic_update_virt() function to update the vCPU interface states
>> and raise vIRQ and vFIQ as needed. This commit renames gic_update() to
>> gic_update_internal() and generalizes it to handle both cases, with a
>> `virt' parameter to track whether we are updating the CPU or vCPU
>> interfaces.
>>
>> The main difference between CPU and vCPU is the way we select the best
>> IRQ. This part has been split into the gic_get_best_(v)irq functions.
>> For the virt case, the LRs are iterated to find the best candidate.
>>
>> Signed-off-by: Luc Michel 
>> ---
>>  hw/intc/arm_gic.c | 170 +++---
>>  1 file changed, 130 insertions(+), 40 deletions(-)
> 
> 
>> +
>> +/* Return true if IRQ signaling is enabled:
>> + *   - !virt -> from the distributor to the CPU interfaces,
>> + *  for the given group mask,
>> + *   -  virt -> from the given virtual interface to the CPU virtual 
>> interface.
>> + */
>> +static inline bool gic_irq_signaling_enabled(GICState *s, int cpu, bool 
>> virt,
>> +int group_mask)
>> +{
>> +return (virt && (s->h_hcr[cpu] & R_GICH_HCR_EN_MASK))
>> +|| (!virt && (s->ctlr & group_mask));
>> +}
> 
> For a real CPU interface we test the GICC_CTLR EnableGrp0/1 bits here.
> For a virt CPU interface shouldn't we test the GICV_CTLR bits ?
This test is still done in gic_update_internal() (we test
s->cpu_ctlr[cpu_iface], with cpu_iface being the index of a vcpu when
virt is true). I can move it in this function if you think it's clearer?

> 
> (This doesn't actually cause any wrong behaviour because this check
> is just an efficiency check: "if interrupts are entirely disabled
> then we don't need to do the expensive look-at-all-irqs checks".)
> 
> 
> Otherwise looks OK.
> 
> thanks
> -- PMM
> 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 16/20] intc/arm_gic: Implement gic_update_virt() function

2018-07-13 Thread Peter Maydell

On 13 July 2018 at 14:33, Luc Michel  wrote:
>
>
> On 07/12/2018 03:56 PM, Peter Maydell wrote:
>> On 29 June 2018 at 14:29, Luc Michel  wrote:
>>> Add the gic_update_virt() function to update the vCPU interface states
>>> and raise vIRQ and vFIQ as needed. This commit renames gic_update() to
>>> gic_update_internal() and generalizes it to handle both cases, with a
>>> `virt' parameter to track whether we are updating the CPU or vCPU
>>> interfaces.
>>>
>>> The main difference between CPU and vCPU is the way we select the best
>>> IRQ. This part has been split into the gic_get_best_(v)irq functions.
>>> For the virt case, the LRs are iterated to find the best candidate.
>>>
>>> Signed-off-by: Luc Michel 
>>> ---
>>>  hw/intc/arm_gic.c | 170 +++---
>>>  1 file changed, 130 insertions(+), 40 deletions(-)
>>
>>
>>> +
>>> +/* Return true if IRQ signaling is enabled:
>>> + *   - !virt -> from the distributor to the CPU interfaces,
>>> + *  for the given group mask,
>>> + *   -  virt -> from the given virtual interface to the CPU virtual 
>>> interface.
>>> + */
>>> +static inline bool gic_irq_signaling_enabled(GICState *s, int cpu, bool 
>>> virt,
>>> +int group_mask)
>>> +{
>>> +return (virt && (s->h_hcr[cpu] & R_GICH_HCR_EN_MASK))
>>> +|| (!virt && (s->ctlr & group_mask));
>>> +}
>>
>> For a real CPU interface we test the GICC_CTLR EnableGrp0/1 bits here.
>> For a virt CPU interface shouldn't we test the GICV_CTLR bits ?
> This test is still done in gic_update_internal() (we test
> s->cpu_ctlr[cpu_iface], with cpu_iface being the index of a vcpu when
> virt is true). I can move it in this function if you think it's clearer?

Oh, I see, you put it in the outer condition:

+if (!gic_irq_signaling_enabled(s, cpu, virt,
+   GICD_CTLR_EN_GRP0 | GICD_CTLR_EN_GRP1)
+|| !(s->cpu_ctlr[cpu_iface] &
+ (GICC_CTLR_EN_GRP0 | GICC_CTLR_EN_GRP1))) {

Yes, I think if we're going to abstract the check into a function
we should put it all in the function, not have half of it
in the function and half not.

(The purpose of the function is "identify configurations of the
GIC where it cannot possibly generate either an IRQ or a FIQ
regardless of the individual interrupt or list register state,
so that we can early-exit without doing a complete scan of
all interrupts/list registers".)

thanks
-- PMM

Re: [Qemu-devel] [PATCH v2 for 3.0 03/16] shippable: Build the TriCore docker image

2018-07-13 Thread Alex Bennée



Alex Bennée  writes:

> From: Philippe Mathieu-Daudé 
>
> Signed-off-by: Philippe Mathieu-Daudé 
> Signed-off-by: Alex Bennée 
> ---
>  .shippable.yml | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/.shippable.yml b/.shippable.yml
> index f74a3de3ff..9670b13f8e 100644
> --- a/.shippable.yml
> +++ b/.shippable.yml
> @@ -25,6 +25,8 @@ env:
>TARGET_LIST=mips64el-softmmu,mips64el-linux-user
>  - IMAGE=debian-ppc64el-cross
>TARGET_LIST=ppc64-softmmu,ppc64-linux-user,ppc64abi32-linux-user
> +- IMAGE=debian-tricore-cross
> +  TARGET_LIST=
>  build:
>pre_ci:
>  - make docker-image-${IMAGE} V=1

Serves me right for not waiting until the shippable build finished. This
needs to be dropped as while we can build the image we can't use it to
build anything.

--
Alex Bennée

[Qemu-devel] [PATCH v4 27/29] virtio-gpu: split virtio-gpu, introduce virtio-gpu-base

2018-07-13 Thread Marc-André Lureau

Add a base class that is common to virtio-gpu and vhost-user-gpu
devices.

The VirtIOGPUBase base class provides common functionalities necessary
for both virtio-gpu and vhost-user-gpu:
- common configuration (max-outputs, initial resolution, flags)
- virtio device initialization, including queue setup
- device pre-conditions checks (iommu)
- migration blocker
- virtio device callbacks
- hooking up to qemu display subsystem
- a few common helper functions to reset the device, retrieve display
informations
- a class callback to unblock the rendering (for GL updates)

What is left to the virtio-gpu subdevice to take care of, in short,
are all the virtio queues handling, command processing and migration.

Signed-off-by: Marc-André Lureau 
---
 include/hw/virtio/virtio-gpu.h |  72 +--
 hw/display/virtio-gpu-3d.c |  49 ++---
 hw/display/virtio-gpu-base.c   | 292 +
 hw/display/virtio-gpu-pci.c|   2 +-
 hw/display/virtio-gpu.c| 332 ++---
 hw/display/virtio-vga.c|  15 +-
 hw/display/Makefile.objs   |   2 +-
 7 files changed, 447 insertions(+), 317 deletions(-)
 create mode 100644 hw/display/virtio-gpu-base.c

diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 6b07efc4f7..2edf47e9ab 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -22,6 +22,14 @@
 
 #include "standard-headers/linux/virtio_gpu.h"
 
+#define TYPE_VIRTIO_GPU_BASE "virtio-gpu-base"
+#define VIRTIO_GPU_BASE(obj)\
+OBJECT_CHECK(VirtIOGPUBase, (obj), TYPE_VIRTIO_GPU_BASE)
+#define VIRTIO_GPU_BASE_GET_CLASS(obj)  \
+OBJECT_GET_CLASS(VirtIOGPUBaseClass, obj, TYPE_VIRTIO_GPU_BASE)
+#define VIRTIO_GPU_BASE_CLASS(klass)\
+OBJECT_CLASS_CHECK(VirtIOGPUBaseClass, klass, TYPE_VIRTIO_GPU_BASE)
+
 #define TYPE_VIRTIO_GPU "virtio-gpu-device"
 #define VIRTIO_GPU(obj)\
 OBJECT_CHECK(VirtIOGPU, (obj), TYPE_VIRTIO_GPU)
@@ -58,7 +66,7 @@ struct virtio_gpu_requested_state {
 int x, y;
 };
 
-enum virtio_gpu_conf_flags {
+enum virtio_gpu_base_conf_flags {
 VIRTIO_GPU_FLAG_VIRGL_ENABLED = 1,
 VIRTIO_GPU_FLAG_STATS_ENABLED,
 };
@@ -68,8 +76,7 @@ enum virtio_gpu_conf_flags {
 #define virtio_gpu_stats_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_STATS_ENABLED))
 
-struct virtio_gpu_conf {
-uint64_t max_hostmem;
+struct virtio_gpu_base_conf {
 uint32_t max_outputs;
 uint32_t flags;
 uint32_t xres;
@@ -85,31 +92,53 @@ struct virtio_gpu_ctrl_command {
 QTAILQ_ENTRY(virtio_gpu_ctrl_command) next;
 };
 
-typedef struct VirtIOGPU {
+typedef struct VirtIOGPUBase {
 VirtIODevice parent_obj;
 
-QEMUBH *ctrl_bh;
-QEMUBH *cursor_bh;
+Error *migration_blocker;
+
+struct virtio_gpu_base_conf conf;
+struct virtio_gpu_config virtio_config;
+
+bool use_virgl_renderer;
+int renderer_blocked;
+int enable;
+
+struct virtio_gpu_scanout scanout[VIRTIO_GPU_MAX_SCANOUTS];
+
+int enabled_output_bitmask;
+struct virtio_gpu_requested_state req_state[VIRTIO_GPU_MAX_SCANOUTS];
+} VirtIOGPUBase;
+
+typedef struct VirtIOGPUBaseClass {
+VirtioDeviceClass parent;
+
+void (*gl_unblock)(VirtIOGPUBase *g);
+} VirtIOGPUBaseClass;
+
+#define VIRTIO_GPU_BASE_PROPERTIES(_state, _conf)   \
+DEFINE_PROP_UINT32("max_outputs", _state, _conf.max_outputs, 1),\
+DEFINE_PROP_UINT32("xres", _state, _conf.xres, 1024),   \
+DEFINE_PROP_UINT32("yres", _state, _conf.yres, 768)
+
+typedef struct VirtIOGPU {
+VirtIOGPUBase parent_obj;
+
+uint64_t conf_max_hostmem;
+
 VirtQueue *ctrl_vq;
 VirtQueue *cursor_vq;
 
-int enable;
+QEMUBH *ctrl_bh;
+QEMUBH *cursor_bh;
 
 QTAILQ_HEAD(, virtio_gpu_simple_resource) reslist;
 QTAILQ_HEAD(, virtio_gpu_ctrl_command) cmdq;
 QTAILQ_HEAD(, virtio_gpu_ctrl_command) fenceq;
 
-struct virtio_gpu_scanout scanout[VIRTIO_GPU_MAX_SCANOUTS];
-struct virtio_gpu_requested_state req_state[VIRTIO_GPU_MAX_SCANOUTS];
-
-struct virtio_gpu_conf conf;
 uint64_t hostmem;
-int enabled_output_bitmask;
-struct virtio_gpu_config virtio_config;
 
-bool use_virgl_renderer;
 bool renderer_inited;
-int renderer_blocked;
 QEMUTimer *fence_poll;
 QEMUTimer *print_stats;
 
@@ -120,8 +149,6 @@ typedef struct VirtIOGPU {
 uint32_t req_3d;
 uint32_t bytes_3d;
 } stats;
-
-Error *migration_blocker;
 } VirtIOGPU;
 
 extern const GraphicHwOps virtio_gpu_ops;
@@ -144,6 +171,16 @@ extern const GraphicHwOps virtio_gpu_ops;
 }   \
 } while (0)
 
+/* virtio-gpu-base.c */
+bool virtio_gpu_base_device_realize(DeviceState *qdev,
+int num_capsets,
+

[Qemu-devel] [PATCH v4 28/29] virtio-gpu: split virtio-gpu-pci & virtio-vga

2018-07-13 Thread Marc-André Lureau

Add base classes that are common to vhost-user-gpu-pci and
vhost-user-vga.

Signed-off-by: Marc-André Lureau 
---
 hw/display/virtio-vga.h |  22 +++
 hw/virtio/virtio-pci.h  |  16 ++---
 hw/display/virtio-gpu-pci.c |  39 +---
 hw/display/virtio-vga.c | 122 +++-
 MAINTAINERS |   2 +-
 5 files changed, 126 insertions(+), 75 deletions(-)
 create mode 100644 hw/display/virtio-vga.h

diff --git a/hw/display/virtio-vga.h b/hw/display/virtio-vga.h
new file mode 100644
index 00..212624449c
--- /dev/null
+++ b/hw/display/virtio-vga.h
@@ -0,0 +1,22 @@
+#ifndef VIRTIO_VGA_H_
+#define VIRTIO_VGA_H_
+
+#include "hw/virtio/virtio-pci.h"
+#include "vga_int.h"
+
+/*
+ * virtio-vga-base: This extends VirtioPCIProxy.
+ */
+#define TYPE_VIRTIO_VGA_BASE "virtio-vga-base"
+#define VIRTIO_VGA_BASE(obj)\
+OBJECT_CHECK(VirtIOVGABase, (obj), TYPE_VIRTIO_VGA_BASE)
+
+typedef struct VirtIOVGABase {
+VirtIOPCIProxy parent_obj;
+
+VirtIOGPUBase *vgpu;
+VGACommonState vga;
+MemoryRegion   vga_mrs[3];
+} VirtIOVGABase;
+
+#endif /* VIRTIO_VGA_H_ */
diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index c7e28e1b9c..42240f47b3 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -386,17 +386,19 @@ struct VHostUserInputPCI {
 VHostUserInput vhi;
 };
 
+
 /*
- * virtio-gpu-pci: This extends VirtioPCIProxy.
+ * virtio-gpu-pci-base: This extends VirtioPCIProxy.
  */
-#define TYPE_VIRTIO_GPU_PCI "virtio-gpu-pci"
-#define VIRTIO_GPU_PCI(obj) \
-OBJECT_CHECK(VirtIOGPUPCI, (obj), TYPE_VIRTIO_GPU_PCI)
+#define TYPE_VIRTIO_GPU_PCI_BASE "virtio-gpu-pci-base"
+#define VIRTIO_GPU_PCI_BASE(obj) \
+OBJECT_CHECK(VirtIOGPUPCIBase, (obj), TYPE_VIRTIO_GPU_PCI_BASE)
 
-struct VirtIOGPUPCI {
+typedef struct VirtIOGPUPCIBase {
 VirtIOPCIProxy parent_obj;
-VirtIOGPU vdev;
-};
+
+VirtIOGPUBase *vgpu;
+} VirtIOGPUPCIBase;
 
 #ifdef CONFIG_VHOST_VSOCK
 /*
diff --git a/hw/display/virtio-gpu-pci.c b/hw/display/virtio-gpu-pci.c
index 741badd909..5db6ad890d 100644
--- a/hw/display/virtio-gpu-pci.c
+++ b/hw/display/virtio-gpu-pci.c
@@ -19,16 +19,16 @@
 #include "hw/virtio/virtio-pci.h"
 #include "hw/virtio/virtio-gpu.h"
 
-static Property virtio_gpu_pci_properties[] = {
+static Property virtio_gpu_pci_base_properties[] = {
 DEFINE_VIRTIO_GPU_PCI_PROPERTIES(VirtIOPCIProxy),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-static void virtio_gpu_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
+static void virtio_gpu_pci_base_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 {
-VirtIOGPUPCI *vgpu = VIRTIO_GPU_PCI(vpci_dev);
-VirtIOGPUBase *g = VIRTIO_GPU_BASE(>vdev);
-DeviceState *vdev = DEVICE(>vdev);
+VirtIOGPUPCIBase *vgpu = VIRTIO_GPU_PCI_BASE(vpci_dev);
+VirtIOGPUBase *g = vgpu->vgpu;
+DeviceState *vdev = DEVICE(g);
 int i;
 Error *local_error = NULL;
 
@@ -48,37 +48,56 @@ static void virtio_gpu_pci_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 }
 }
 
-static void virtio_gpu_pci_class_init(ObjectClass *klass, void *data)
+static void virtio_gpu_pci_base_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
 PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
 
 set_bit(DEVICE_CATEGORY_DISPLAY, dc->categories);
-dc->props = virtio_gpu_pci_properties;
+dc->props = virtio_gpu_pci_base_properties;
 dc->hotpluggable = false;
-k->realize = virtio_gpu_pci_realize;
+k->realize = virtio_gpu_pci_base_realize;
 pcidev_k->class_id = PCI_CLASS_DISPLAY_OTHER;
 }
 
+static const TypeInfo virtio_gpu_pci_base_info = {
+.name = TYPE_VIRTIO_GPU_PCI_BASE,
+.parent = TYPE_VIRTIO_PCI,
+.instance_size = sizeof(VirtIOGPUPCIBase),
+.class_init = virtio_gpu_pci_base_class_init,
+.abstract = true
+};
+
+#define TYPE_VIRTIO_GPU_PCI "virtio-gpu-pci"
+#define VIRTIO_GPU_PCI(obj) \
+OBJECT_CHECK(VirtIOGPUPCI, (obj), TYPE_VIRTIO_GPU_PCI)
+
+struct VirtIOGPUPCI {
+VirtIOGPUPCIBase parent_obj;
+VirtIOGPU vdev;
+};
+
 static void virtio_gpu_initfn(Object *obj)
 {
 VirtIOGPUPCI *dev = VIRTIO_GPU_PCI(obj);
 
 virtio_instance_init_common(obj, >vdev, sizeof(dev->vdev),
 TYPE_VIRTIO_GPU);
+VIRTIO_GPU_PCI_BASE(obj)->vgpu = VIRTIO_GPU_BASE(>vdev);
 }
 
 static const TypeInfo virtio_gpu_pci_info = {
 .name = TYPE_VIRTIO_GPU_PCI,
-.parent = TYPE_VIRTIO_PCI,
+.parent = TYPE_VIRTIO_GPU_PCI_BASE,
 .instance_size = sizeof(VirtIOGPUPCI),
 .instance_init = virtio_gpu_initfn,
-.class_init = virtio_gpu_pci_class_init,
 };
 
 static void virtio_gpu_pci_register_types(void)
 {
+type_register_static(_gpu_pci_base_info);
 type_register_static(_gpu_pci_info);
 }
+
 type_init(virtio_gpu_pci_register_types)
diff --git

Re: [Qemu-devel] [PATCH for-3.0 2/2] hw/intc/arm_gic: Fix handling of GICD_ITARGETSR

2018-07-13 Thread Luc Michel



On 07/12/2018 05:41 PM, Peter Maydell wrote:
> The GICD_ITARGETSR implementation still has some 11MPCore behaviour
> that we were incorrectly using in our GICv1 and GICv2 implementations
> for the case where the interrupt number is less than GIC_INTERNAL.
> The desired behaviour here is:
>  * for 11MPCore: RAZ/WI for irqs 0..28; read a number matching the
>CPU doing the read for irqs 29..31
>  * for GICv1 and v2: RAZ/WI if uniprocessor; otherwise read a
>number matching the CPU doing the read for all irqs < 32
> 
> Stop squashing GICD_ITARGETSR to 0 for IRQs 0..28 unless this
> is an 11MPCore GIC.
> 
> Reported-by: Jan Kiszka 
> Signed-off-by: Peter Maydell 
> ---
>  hw/intc/arm_gic.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)

Reviewed-by: Luc Michel 

-- 
Luc




signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH v4 26/29] virtio-gpu: remove useless 'waiting' field

2018-07-13 Thread Marc-André Lureau

Let's check renderer_blocked instead directly.

Signed-off-by: Marc-André Lureau 
---
 include/hw/virtio/virtio-gpu.h | 1 -
 hw/display/virtio-gpu.c| 4 +---
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 763e1291f1..6b07efc4f7 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -81,7 +81,6 @@ struct virtio_gpu_ctrl_command {
 VirtQueue *vq;
 struct virtio_gpu_ctrl_hdr cmd_hdr;
 uint32_t error;
-bool waiting;
 bool finished;
 QTAILQ_ENTRY(virtio_gpu_ctrl_command) next;
 };
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 26a2592c4a..b83982b0a3 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -840,8 +840,7 @@ void virtio_gpu_process_cmdq(VirtIOGPU *g)
 while (!QTAILQ_EMPTY(>cmdq)) {
 cmd = QTAILQ_FIRST(>cmdq);
 
-cmd->waiting = g->renderer_blocked;
-if (cmd->waiting) {
+if (g->renderer_blocked) {
 break;
 }
 
@@ -890,7 +889,6 @@ static void virtio_gpu_handle_ctrl(VirtIODevice *vdev, 
VirtQueue *vq)
 cmd->vq = vq;
 cmd->error = 0;
 cmd->finished = false;
-cmd->waiting = false;
 QTAILQ_INSERT_TAIL(>cmdq, cmd, next);
 cmd = virtqueue_pop(vq, sizeof(struct virtio_gpu_ctrl_command));
 }
-- 
2.18.0.129.ge3331758f1

[Qemu-devel] [PATCH v4 22/29] contrib: add vhost-user-gpu

2018-07-13 Thread Marc-André Lureau

Add a vhost-user gpu backend example, based on virtio-gpu/3d device. It
is to be associated with a vhost-user-backend object, ex:

-object vhost-user-backend,id=vug,cmd="vhost-user-gpu"

TODO:
- add/check multi-head support
- crash & resume handling
- accelerated rendering/display to avoid the many round trips

Signed-off-by: Marc-André Lureau 
---
 contrib/vhost-user-gpu/drm.h |   63 ++
 contrib/vhost-user-gpu/virgl.h   |   25 +
 contrib/vhost-user-gpu/vugpu.h   |  168 
 contrib/vhost-user-gpu/drm.c |  190 +
 contrib/vhost-user-gpu/main.c| 1164 ++
 contrib/vhost-user-gpu/virgl.c   |  579 +
 MAINTAINERS  |2 +
 Makefile |3 +
 Makefile.objs|1 +
 configure|   32 +
 contrib/vhost-user-gpu/Makefile.objs |   10 +
 11 files changed, 2237 insertions(+)
 create mode 100644 contrib/vhost-user-gpu/drm.h
 create mode 100644 contrib/vhost-user-gpu/virgl.h
 create mode 100644 contrib/vhost-user-gpu/vugpu.h
 create mode 100644 contrib/vhost-user-gpu/drm.c
 create mode 100644 contrib/vhost-user-gpu/main.c
 create mode 100644 contrib/vhost-user-gpu/virgl.c
 create mode 100644 contrib/vhost-user-gpu/Makefile.objs

diff --git a/contrib/vhost-user-gpu/drm.h b/contrib/vhost-user-gpu/drm.h
new file mode 100644
index 00..ee95816cc9
--- /dev/null
+++ b/contrib/vhost-user-gpu/drm.h
@@ -0,0 +1,63 @@
+/*
+ * Virtio vhost-user GPU Device
+ *
+ * DRM helpers
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef VHOST_USER_GPU_DRM_H
+#define VHOST_USER_GPU_DRM_H
+
+#include "qemu/osdep.h"
+
+#ifdef CONFIG_LIBDRM_INTEL
+#include 
+#include 
+#endif
+
+struct drm_buffer;
+
+struct drm_device {
+bool inited;
+int fd;
+char *name;
+#ifdef CONFIG_LIBDRM_INTEL
+drm_intel_bufmgr *bufmgr;
+#endif
+
+bool (*alloc_bo)(struct drm_buffer *buf);
+void (*free_bo)(struct drm_buffer *buf);
+bool (*export_bo_to_prime)(struct drm_buffer *buf, int *fd);
+bool (*map_bo)(struct drm_buffer *buf);
+void (*unmap_bo)(struct drm_buffer *buf);
+void (*device_destroy)(struct drm_device *dev);
+};
+
+struct drm_buffer {
+struct drm_device *dev;
+
+#ifdef CONFIG_LIBDRM_INTEL
+drm_intel_bo *intel_bo;
+#endif /* HAVE_LIBDRM_INTEL */
+
+uint32_t gem_handle;
+int dmabuf_fd;
+uint8_t *mmap;
+
+int width;
+int height;
+int bpp;
+unsigned long stride;
+int format;
+};
+
+bool drm_device_init(struct drm_device *dev, int fd);
+void drm_device_destroy(struct drm_device *dev);
+
+bool drm_buffer_create(struct drm_buffer *buffer, struct drm_device *dev,
+   int width, int height);
+bool drm_buffer_get_dmabuf_fd(struct drm_buffer *buffer, int *fd);
+void drm_buffer_destroy(struct drm_buffer *buffer);
+
+#endif
diff --git a/contrib/vhost-user-gpu/virgl.h b/contrib/vhost-user-gpu/virgl.h
new file mode 100644
index 00..f952bc9d4f
--- /dev/null
+++ b/contrib/vhost-user-gpu/virgl.h
@@ -0,0 +1,25 @@
+/*
+ * Virtio vhost-user GPU Device
+ *
+ * Copyright Red Hat, Inc. 2013-2018
+ *
+ * Authors:
+ * Dave Airlie 
+ * Gerd Hoffmann 
+ * Marc-André Lureau 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef VUGPU_VIRGL_H_
+#define VUGPU_VIRGL_H_
+
+#include "vugpu.h"
+
+bool vg_virgl_init(VuGpu *g);
+uint32_t vg_virgl_get_num_capsets(void);
+void vg_virgl_process_cmd(VuGpu *vg, struct virtio_gpu_ctrl_command *cmd);
+void vg_virgl_update_cursor_data(VuGpu *g, uint32_t resource_id,
+ gpointer data);
+
+#endif
diff --git a/contrib/vhost-user-gpu/vugpu.h b/contrib/vhost-user-gpu/vugpu.h
new file mode 100644
index 00..37e2bc9fcd
--- /dev/null
+++ b/contrib/vhost-user-gpu/vugpu.h
@@ -0,0 +1,168 @@
+/*
+ * Virtio vhost-user GPU Device
+ *
+ * Copyright Red Hat, Inc. 2013-2018
+ *
+ * Authors:
+ * Dave Airlie 
+ * Gerd Hoffmann 
+ * Marc-André Lureau 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef VUGPU_H_
+#define VUGPU_H_
+
+#include "qemu/osdep.h"
+
+#include "contrib/libvhost-user/libvhost-user-glib.h"
+#include "standard-headers/linux/virtio_gpu.h"
+
+#include "qemu/queue.h"
+#include "qemu/iov.h"
+#include "qemu/bswap.h"
+#include "drm.h"
+
+typedef enum VhostUserGpuRequest {
+VHOST_USER_GPU_NONE = 0,
+VHOST_USER_GPU_GET_PROTOCOL_FEATURES,
+VHOST_USER_GPU_SET_PROTOCOL_FEATURES,
+VHOST_USER_GPU_GET_DISPLAY_INFO,
+VHOST_USER_GPU_CURSOR_POS,
+VHOST_USER_GPU_CURSOR_POS_HIDE,
+VHOST_USER_GPU_CURSOR_UPDATE,
+VHOST_USER_GPU_SCANOUT,
+VHOST_USER_GPU_UPDATE,
+VHOST_USER_GPU_DMABUF_SCANOUT,
+

[Qemu-devel] [PATCH v4 25/29] virtio-gpu: block both 2d and 3d rendering

2018-07-13 Thread Marc-André Lureau

Now that 2d commands are translated to 3d rendering, qemu must stop
sending 3d updates (from 2d) to Spice as well.

Signed-off-by: Marc-André Lureau 
---
 include/hw/virtio/virtio-gpu.h |  1 -
 hw/display/virtio-gpu-3d.c | 21 -
 hw/display/virtio-gpu.c| 25 ++---
 3 files changed, 22 insertions(+), 25 deletions(-)

diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 4c68bc4559..763e1291f1 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -166,7 +166,6 @@ void virtio_gpu_virgl_process_cmd(VirtIOGPU *g,
   struct virtio_gpu_ctrl_command *cmd);
 void virtio_gpu_virgl_fence_poll(VirtIOGPU *g);
 void virtio_gpu_virgl_reset(VirtIOGPU *g);
-void virtio_gpu_gl_block(void *opaque, bool block);
 int virtio_gpu_virgl_init(VirtIOGPU *g);
 int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g);
 #endif
diff --git a/hw/display/virtio-gpu-3d.c b/hw/display/virtio-gpu-3d.c
index 3558f38fe8..6fee0e8582 100644
--- a/hw/display/virtio-gpu-3d.c
+++ b/hw/display/virtio-gpu-3d.c
@@ -404,11 +404,6 @@ void virtio_gpu_virgl_process_cmd(VirtIOGPU *g,
 {
 VIRTIO_GPU_FILL_CMD(cmd->cmd_hdr);
 
-cmd->waiting = g->renderer_blocked;
-if (cmd->waiting) {
-return;
-}
-
 virgl_renderer_force_ctx_0();
 switch (cmd->cmd_hdr.type) {
 case VIRTIO_GPU_CMD_CTX_CREATE:
@@ -604,22 +599,6 @@ void virtio_gpu_virgl_reset(VirtIOGPU *g)
 }
 }
 
-void virtio_gpu_gl_block(void *opaque, bool block)
-{
-VirtIOGPU *g = opaque;
-
-if (block) {
-g->renderer_blocked++;
-} else {
-g->renderer_blocked--;
-}
-assert(g->renderer_blocked >= 0);
-
-if (g->renderer_blocked == 0) {
-virtio_gpu_process_cmdq(g);
-}
-}
-
 int virtio_gpu_virgl_init(VirtIOGPU *g)
 {
 int ret;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 7cfb25879a..26a2592c4a 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -840,12 +840,15 @@ void virtio_gpu_process_cmdq(VirtIOGPU *g)
 while (!QTAILQ_EMPTY(>cmdq)) {
 cmd = QTAILQ_FIRST(>cmdq);
 
-/* process command */
-VIRGL(g, virtio_gpu_virgl_process_cmd, virtio_gpu_simple_process_cmd,
-  g, cmd);
+cmd->waiting = g->renderer_blocked;
 if (cmd->waiting) {
 break;
 }
+
+/* process command */
+VIRGL(g, virtio_gpu_virgl_process_cmd, virtio_gpu_simple_process_cmd,
+  g, cmd);
+
 QTAILQ_REMOVE(>cmdq, cmd, next);
 if (virtio_gpu_stats_enabled(g->conf)) {
 g->stats.requests++;
@@ -981,6 +984,22 @@ static int virtio_gpu_ui_info(void *opaque, uint32_t idx, 
QemuUIInfo *info)
 return 0;
 }
 
+static void virtio_gpu_gl_block(void *opaque, bool block)
+{
+VirtIOGPU *g = opaque;
+
+if (block) {
+g->renderer_blocked++;
+} else {
+g->renderer_blocked--;
+}
+assert(g->renderer_blocked >= 0);
+
+if (g->renderer_blocked == 0) {
+virtio_gpu_process_cmdq(g);
+}
+}
+
 const GraphicHwOps virtio_gpu_ops = {
 .invalidate = virtio_gpu_invalidate_display,
 .gfx_update = virtio_gpu_update_display,
-- 
2.18.0.129.ge3331758f1

[Qemu-devel] [PATCH v4 29/29] hw/display: add vhost-user-vga & gpu-pci

2018-07-13 Thread Marc-André Lureau

Add new virtio-gpu devices with a "vhost-user" property. Tthe
associated vhost-user backend is used to handle the virtio rings and
provide rendering results thanks to the vhost-user-gpu protocol.

Example usage:
-object vhost-user-backend,id=vug,cmd="./vhost-user-gpu"
-device vhost-user-vga,vhost-user=vug

Signed-off-by: Marc-André Lureau 
---
 hw/virtio/virtio-pci.h  |   1 +
 include/hw/virtio/virtio-gpu.h  |  13 +
 hw/display/vhost-user-gpu-pci.c |  51 
 hw/display/vhost-user-gpu.c | 524 
 hw/display/vhost-user-vga.c |  52 
 vl.c|   1 +
 hw/display/Makefile.objs|   3 +
 7 files changed, 645 insertions(+)
 create mode 100644 hw/display/vhost-user-gpu-pci.c
 create mode 100644 hw/display/vhost-user-gpu.c
 create mode 100644 hw/display/vhost-user-vga.c

diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index 42240f47b3..ec8eeb2a93 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -56,6 +56,7 @@ typedef struct VirtIOInputHIDPCI VirtIOInputHIDPCI;
 typedef struct VirtIOInputHostPCI VirtIOInputHostPCI;
 typedef struct VHostUserInputPCI VHostUserInputPCI;
 typedef struct VirtIOGPUPCI VirtIOGPUPCI;
+typedef struct VhostUserGPUPCI VhostUserGPUPCI;
 typedef struct VHostVSockPCI VHostVSockPCI;
 typedef struct VirtIOCryptoPCI VirtIOCryptoPCI;
 
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 2edf47e9ab..83e9ee2eaa 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -19,6 +19,7 @@
 #include "ui/console.h"
 #include "hw/virtio/virtio.h"
 #include "qemu/log.h"
+#include "sysemu/vhost-user-backend.h"
 
 #include "standard-headers/linux/virtio_gpu.h"
 
@@ -34,6 +35,8 @@
 #define VIRTIO_GPU(obj)\
 OBJECT_CHECK(VirtIOGPU, (obj), TYPE_VIRTIO_GPU)
 
+#define TYPE_VHOST_USER_GPU "vhost-user-gpu"
+
 #define VIRTIO_ID_GPU 16
 
 struct virtio_gpu_simple_resource {
@@ -151,6 +154,16 @@ typedef struct VirtIOGPU {
 } stats;
 } VirtIOGPU;
 
+typedef struct VhostUserGPU {
+VirtIOGPUBase parent_obj;
+
+VhostUserBackend *vhost;
+int vhost_gpu_fd; /* closed by the chardev */
+CharBackend vhost_chr;
+QemuDmaBuf dmabuf[VIRTIO_GPU_MAX_SCANOUTS];
+bool backend_blocked;
+} VhostUserGPU;
+
 extern const GraphicHwOps virtio_gpu_ops;
 
 /* to share between PCI and VGA */
diff --git a/hw/display/vhost-user-gpu-pci.c b/hw/display/vhost-user-gpu-pci.c
new file mode 100644
index 00..2eef4982a2
--- /dev/null
+++ b/hw/display/vhost-user-gpu-pci.c
@@ -0,0 +1,51 @@
+/*
+ * vhost-user GPU PCI device
+ *
+ * Copyright Red Hat, Inc. 2018
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/virtio/virtio-pci.h"
+
+#define TYPE_VHOST_USER_GPU_PCI "vhost-user-gpu-pci"
+#define VHOST_USER_GPU_PCI(obj) \
+OBJECT_CHECK(VhostUserGPUPCI, (obj), TYPE_VHOST_USER_GPU_PCI)
+
+struct VhostUserGPUPCI {
+VirtIOGPUPCIBase parent_obj;
+
+VhostUserGPU vdev;
+};
+
+static void vhost_user_gpu_pci_initfn(Object *obj)
+{
+VhostUserGPUPCI *dev = VHOST_USER_GPU_PCI(obj);
+
+virtio_instance_init_common(obj, >vdev, sizeof(dev->vdev),
+TYPE_VHOST_USER_GPU);
+
+VIRTIO_GPU_PCI_BASE(obj)->vgpu = VIRTIO_GPU_BASE(>vdev);
+
+object_property_add_alias(obj, "vhost-user",
+  OBJECT(>vdev), "vhost-user",
+  _abort);
+}
+
+static const TypeInfo vhost_user_gpu_pci_info = {
+.name = TYPE_VHOST_USER_GPU_PCI,
+.parent = TYPE_VIRTIO_GPU_PCI_BASE,
+.instance_size = sizeof(VhostUserGPUPCI),
+.instance_init = vhost_user_gpu_pci_initfn,
+};
+
+static void vhost_user_gpu_pci_register_types(void)
+{
+type_register_static(_user_gpu_pci_info);
+}
+
+type_init(vhost_user_gpu_pci_register_types)
diff --git a/hw/display/vhost-user-gpu.c b/hw/display/vhost-user-gpu.c
new file mode 100644
index 00..3427c151e7
--- /dev/null
+++ b/hw/display/vhost-user-gpu.c
@@ -0,0 +1,524 @@
+/*
+ * vhost-user GPU Device
+ *
+ * Copyright Red Hat, Inc. 2018
+ *
+ * Authors:
+ * Marc-André Lureau 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/virtio/virtio-gpu.h"
+#include "chardev/char-fe.h"
+#include "qapi/error.h"
+#include "migration/blocker.h"
+
+#define VHOST_USER_GPU(obj)\
+OBJECT_CHECK(VhostUserGPU, (obj), TYPE_VHOST_USER_GPU)
+
+typedef enum VhostUserGpuRequest {
+VHOST_USER_GPU_NONE = 0,
+VHOST_USER_GPU_GET_PROTOCOL_FEATURES,
+VHOST_USER_GPU_SET_PROTOCOL_FEATURES,
+VHOST_USER_GPU_GET_DISPLAY_INFO,
+VHOST_USER_GPU_CURSOR_POS,

[Qemu-devel] [PATCH v4 20/29] util: add qemu_write_pidfile()

2018-07-13 Thread Marc-André Lureau

There are variants of qemu_create_pidfile() in qemu-pr-helper and
qemu-ga. Let's have a common implementation in libqemuutil.

The code is based from pr-helper write_pidfile(), but allows the
caller to deal with error reporting and behaviour.

Signed-off-by: Marc-André Lureau 
---
 include/qemu/osdep.h  |  3 ++-
 os-posix.c| 24 ---
 os-win32.c| 25 
 qga/main.c| 54 ---
 scsi/qemu-pr-helper.c | 40 
 util/oslib-posix.c| 33 ++
 util/oslib-win32.c| 27 ++
 vl.c  |  4 ++--
 8 files changed, 79 insertions(+), 131 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index a91068df0e..47fa570bd4 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -448,7 +448,8 @@ bool qemu_has_ofd_lock(void);
 #define FMT_pid "%d"
 #endif
 
-int qemu_create_pidfile(const char *filename);
+bool qemu_write_pidfile(const char *pidfile, Error **errp);
+
 int qemu_get_thread_id(void);
 
 #ifndef CONFIG_IOVEC
diff --git a/os-posix.c b/os-posix.c
index 9ce6f74513..0e9403b4ff 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -352,30 +352,6 @@ void os_set_line_buffering(void)
 setvbuf(stdout, NULL, _IOLBF, 0);
 }
 
-int qemu_create_pidfile(const char *filename)
-{
-char buffer[128];
-int len;
-int fd;
-
-fd = qemu_open(filename, O_RDWR | O_CREAT, 0600);
-if (fd == -1) {
-return -1;
-}
-if (lockf(fd, F_TLOCK, 0) == -1) {
-close(fd);
-return -1;
-}
-len = snprintf(buffer, sizeof(buffer), FMT_pid "\n", getpid());
-if (write(fd, buffer, len) != len) {
-close(fd);
-return -1;
-}
-
-/* keep pidfile open & locked forever */
-return 0;
-}
-
 bool is_daemonized(void)
 {
 return daemonize;
diff --git a/os-win32.c b/os-win32.c
index 0674f94b57..0e0d7f50f3 100644
--- a/os-win32.c
+++ b/os-win32.c
@@ -97,28 +97,3 @@ int os_parse_cmd_args(int index, const char *optarg)
 {
 return -1;
 }
-
-int qemu_create_pidfile(const char *filename)
-{
-char buffer[128];
-int len;
-HANDLE file;
-OVERLAPPED overlap;
-BOOL ret;
-memset(, 0, sizeof(overlap));
-
-file = CreateFile(filename, GENERIC_WRITE, FILE_SHARE_READ, NULL,
-  OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
-
-if (file == INVALID_HANDLE_VALUE) {
-return -1;
-}
-len = snprintf(buffer, sizeof(buffer), "%d\n", getpid());
-ret = WriteFile(file, (LPCVOID)buffer, (DWORD)len,
-NULL, );
-CloseHandle(file);
-if (ret == 0) {
-return -1;
-}
-return 0;
-}
diff --git a/qga/main.c b/qga/main.c
index 537cc0e162..cc50692098 100644
--- a/qga/main.c
+++ b/qga/main.c
@@ -341,46 +341,6 @@ static FILE *ga_open_logfile(const char *logfile)
 return f;
 }
 
-#ifndef _WIN32
-static bool ga_open_pidfile(const char *pidfile)
-{
-int pidfd;
-char pidstr[32];
-
-pidfd = qemu_open(pidfile, O_CREAT|O_WRONLY, S_IRUSR|S_IWUSR);
-if (pidfd == -1 || lockf(pidfd, F_TLOCK, 0)) {
-g_critical("Cannot lock pid file, %s", strerror(errno));
-if (pidfd != -1) {
-close(pidfd);
-}
-return false;
-}
-
-if (ftruncate(pidfd, 0)) {
-g_critical("Failed to truncate pid file");
-goto fail;
-}
-snprintf(pidstr, sizeof(pidstr), "%d\n", getpid());
-if (write(pidfd, pidstr, strlen(pidstr)) != strlen(pidstr)) {
-g_critical("Failed to write pid file");
-goto fail;
-}
-
-/* keep pidfile open & locked forever */
-return true;
-
-fail:
-unlink(pidfile);
-close(pidfd);
-return false;
-}
-#else /* _WIN32 */
-static bool ga_open_pidfile(const char *pidfile)
-{
-return true;
-}
-#endif
-
 static gint ga_strcmp(gconstpointer str1, gconstpointer str2)
 {
 return strcmp(str1, str2);
@@ -480,8 +440,11 @@ void ga_unset_frozen(GAState *s)
 ga_enable_logging(s);
 g_warning("logging re-enabled due to filesystem unfreeze");
 if (s->deferred_options.pid_filepath) {
-if (!ga_open_pidfile(s->deferred_options.pid_filepath)) {
-g_warning("failed to create/open pid file");
+Error *err = NULL;
+
+if (!qemu_write_pidfile(s->deferred_options.pid_filepath, )) {
+g_warning("%s", error_get_pretty(err));
+error_free(err);
 }
 s->deferred_options.pid_filepath = NULL;
 }
@@ -516,8 +479,11 @@ static void become_daemon(const char *pidfile)
 }
 
 if (pidfile) {
-if (!ga_open_pidfile(pidfile)) {
-g_critical("failed to create pidfile");
+Error *err = NULL;
+
+if (!qemu_write_pidfile(pidfile, )) {
+g_critical("%s", error_get_pretty(err));
+error_free(err);
 exit(EXIT_FAILURE);
 }
 }
diff --git a/scsi/qemu-pr-helper.c

[Qemu-devel] [PATCH v4 19/29] util: promote qemu_egl_rendernode_open() to libqemuutil

2018-07-13 Thread Marc-André Lureau

vhost-user-gpu will share the same code to open a DRM node.

Signed-off-by: Marc-André Lureau 
---
 include/qemu/drm.h |  6 +
 ui/egl-helpers.c   | 51 ++-
 util/drm.c | 66 ++
 MAINTAINERS|  1 +
 util/Makefile.objs |  1 +
 5 files changed, 76 insertions(+), 49 deletions(-)
 create mode 100644 include/qemu/drm.h
 create mode 100644 util/drm.c

diff --git a/include/qemu/drm.h b/include/qemu/drm.h
new file mode 100644
index 00..4c3e622f5c
--- /dev/null
+++ b/include/qemu/drm.h
@@ -0,0 +1,6 @@
+#ifndef QEMU_DRM_H_
+#define QEMU_DRM_H_
+
+int qemu_drm_rendernode_open(const char *rendernode);
+
+#endif
diff --git a/ui/egl-helpers.c b/ui/egl-helpers.c
index 71b6a97bd1..4f475142fc 100644
--- a/ui/egl-helpers.c
+++ b/ui/egl-helpers.c
@@ -15,9 +15,7 @@
  * License along with this library; if not, see .
  */
 #include "qemu/osdep.h"
-#include 
-#include 
-
+#include "qemu/drm.h"
 #include "qemu/error-report.h"
 #include "ui/console.h"
 #include "ui/egl-helpers.h"
@@ -147,57 +145,12 @@ int qemu_egl_rn_fd;
 struct gbm_device *qemu_egl_rn_gbm_dev;
 EGLContext qemu_egl_rn_ctx;
 
-static int qemu_egl_rendernode_open(const char *rendernode)
-{
-DIR *dir;
-struct dirent *e;
-int r, fd;
-char *p;
-
-if (rendernode) {
-return open(rendernode, O_RDWR | O_CLOEXEC | O_NOCTTY | O_NONBLOCK);
-}
-
-dir = opendir("/dev/dri");
-if (!dir) {
-return -1;
-}
-
-fd = -1;
-while ((e = readdir(dir))) {
-if (e->d_type != DT_CHR) {
-continue;
-}
-
-if (strncmp(e->d_name, "renderD", 7)) {
-continue;
-}
-
-p = g_strdup_printf("/dev/dri/%s", e->d_name);
-
-r = open(p, O_RDWR | O_CLOEXEC | O_NOCTTY | O_NONBLOCK);
-if (r < 0) {
-g_free(p);
-continue;
-}
-fd = r;
-g_free(p);
-break;
-}
-
-closedir(dir);
-if (fd < 0) {
-return -1;
-}
-return fd;
-}
-
 int egl_rendernode_init(const char *rendernode, DisplayGLMode mode)
 {
 qemu_egl_rn_fd = -1;
 int rc;
 
-qemu_egl_rn_fd = qemu_egl_rendernode_open(rendernode);
+qemu_egl_rn_fd = qemu_drm_rendernode_open(rendernode);
 if (qemu_egl_rn_fd == -1) {
 error_report("egl: no drm render node available");
 goto err;
diff --git a/util/drm.c b/util/drm.c
new file mode 100644
index 00..a23ff24538
--- /dev/null
+++ b/util/drm.c
@@ -0,0 +1,66 @@
+/*
+ * Copyright (C) 2015-2016 Gerd Hoffmann 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+#include "qemu/osdep.h"
+#include "qemu/drm.h"
+
+#include 
+#include 
+
+int qemu_drm_rendernode_open(const char *rendernode)
+{
+DIR *dir;
+struct dirent *e;
+int r, fd;
+char *p;
+
+if (rendernode) {
+return open(rendernode, O_RDWR | O_CLOEXEC | O_NOCTTY | O_NONBLOCK);
+}
+
+dir = opendir("/dev/dri");
+if (!dir) {
+return -1;
+}
+
+fd = -1;
+while ((e = readdir(dir))) {
+if (e->d_type != DT_CHR) {
+continue;
+}
+
+if (strncmp(e->d_name, "renderD", 7)) {
+continue;
+}
+
+p = g_strdup_printf("/dev/dri/%s", e->d_name);
+
+r = open(p, O_RDWR | O_CLOEXEC | O_NOCTTY | O_NONBLOCK);
+if (r < 0) {
+g_free(p);
+continue;
+}
+fd = r;
+g_free(p);
+break;
+}
+
+closedir(dir);
+if (fd < 0) {
+return -1;
+}
+return fd;
+}
diff --git a/MAINTAINERS b/MAINTAINERS
index 5a83bf56a3..01e2b47c34 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1565,6 +1565,7 @@ S: Odd Fixes
 F: ui/
 F: include/ui/
 F: qapi/ui.json
+F: util/drm.c
 
 Cocoa graphics
 M: Peter Maydell 
diff --git a/util/Makefile.objs b/util/Makefile.objs
index e1c3fed4dc..1810f970ef 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -49,3 +49,4 @@ util-obj-y += stats64.o
 util-obj-y += systemd.o
 util-obj-y += iova-tree.o
 util-obj-$(CONFIG_LINUX) += vfio-helpers.o
+util-obj-$(CONFIG_LINUX) += drm.o
-- 
2.18.0.129.ge3331758f1

[Qemu-devel] [PATCH v4 21/29] util: use fcntl() for qemu_write_pidfile() locking

2018-07-13 Thread Marc-André Lureau

According to Daniel Berrange, fcntl() locks have better portable
semantics than lockf().

Use an exclusive lock on the first byte with fcntl().

Signed-off-by: Marc-André Lureau 
---
 util/oslib-posix.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index da1d4a3201..26b11490b9 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -92,6 +92,11 @@ bool qemu_write_pidfile(const char *pidfile, Error **errp)
 {
 int pidfd;
 char pidstr[32];
+struct flock lock = {
+.l_type = F_WRLCK,
+.l_whence = SEEK_SET,
+.l_len = 1,
+};
 
 pidfd = qemu_open(pidfile, O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
 if (pidfd == -1) {
@@ -99,10 +104,11 @@ bool qemu_write_pidfile(const char *pidfile, Error **errp)
 return false;
 }
 
-if (lockf(pidfd, F_TLOCK, 0)) {
+if (fcntl(pidfd, F_SETLK, )) {
 error_setg_errno(errp, errno, "Cannot lock pid file");
 goto fail;
 }
+
 if (ftruncate(pidfd, 0)) {
 error_setg_errno(errp, errno, "Failed to truncate pid file");
 goto fail;
-- 
2.18.0.129.ge3331758f1

Re: [Qemu-devel] [PATCH for-3.0 1/2] hw/intc/arm_gic: Check interrupt number in gic_deactivate_irq()

2018-07-13 Thread Luc Michel

On 07/12/2018 05:41 PM, Peter Maydell wrote:
> In gic_deactivate_irq() the interrupt number comes from the guest
> (on a write to the GICC_DIR register), so we need to sanity check
> that it isn't out of range before we use it as an array index.
> Handle this in a similar manner to the check we do in
> gic_complete_irq() for the GICC_EOI register.
> 
> The array overrun is not disastrous because the calling code
> uses (value & 0x3ff) to extract the interrupt field, so the
> only out-of-range values possible are 1020..1023, which allow
> overrunning only from irq_state[] into the following
> irq_target[] array which the guest can already manipulate.
> 
> Signed-off-by: Peter Maydell 
> ---
>  hw/intc/arm_gic.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)

Reviewed-by: Luc Michel 

-- 
Luc



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH v4 16/29] vhost-user: add vhost_user_gpu_set_socket()

2018-07-13 Thread Marc-André Lureau

Add a new vhost-user message to give a unix socket to a vhost-user
backend for GPU display updates.

Back when I started that work, I added a new GPU channel because the
vhost-user protocol wasn't bidirectional. Since then, there is a
vhost-user-slave channel for the slave to send requests to the master.
We could extend it with GPU messages. However, the GPU protocol is
quite orthogonal to vhost-user, thus I chose to have a new dedicated
channel.

See vhost-user-gpu.rst for the protocol details.

Signed-off-by: Marc-André Lureau 
---
 contrib/libvhost-user/libvhost-user.h |   1 +
 include/hw/virtio/vhost-backend.h |   1 +
 hw/virtio/vhost-user.c|  11 ++
 MAINTAINERS   |   6 +
 docs/interop/vhost-user-gpu.rst   | 236 ++
 docs/interop/vhost-user.txt   |   9 +
 6 files changed, 264 insertions(+)
 create mode 100644 docs/interop/vhost-user-gpu.rst

diff --git a/contrib/libvhost-user/libvhost-user.h 
b/contrib/libvhost-user/libvhost-user.h
index a4afbc3a46..42e227cce6 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -92,6 +92,7 @@ typedef enum VhostUserRequest {
 VHOST_USER_POSTCOPY_LISTEN  = 29,
 VHOST_USER_POSTCOPY_END = 30,
 VHOST_USER_INPUT_GET_CONFIG = 31,
+VHOST_USER_GPU_SET_SOCKET   = 32,
 VHOST_USER_MAX
 } VhostUserRequest;
 
diff --git a/include/hw/virtio/vhost-backend.h 
b/include/hw/virtio/vhost-backend.h
index 1fca321d8a..daf9180c33 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -163,5 +163,6 @@ int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
 
 int vhost_user_input_get_config(struct vhost_dev *dev,
 struct virtio_input_config **config);
+int vhost_user_gpu_set_socket(struct vhost_dev *dev, int fd);
 
 #endif /* VHOST_BACKEND_H */
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 0c6914fab7..8c1a1742b0 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -90,6 +90,7 @@ typedef enum VhostUserRequest {
 VHOST_USER_POSTCOPY_LISTEN  = 29,
 VHOST_USER_POSTCOPY_END = 30,
 VHOST_USER_INPUT_GET_CONFIG = 31,
+VHOST_USER_GPU_SET_SOCKET   = 32,
 VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -397,6 +398,16 @@ err:
 return -1;
 }
 
+int vhost_user_gpu_set_socket(struct vhost_dev *dev, int fd)
+{
+VhostUserMsg msg = {
+.hdr.request = VHOST_USER_GPU_SET_SOCKET,
+.hdr.flags = VHOST_USER_VERSION,
+};
+
+return vhost_user_write(dev, , , 1);
+}
+
 static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
struct vhost_log *log)
 {
diff --git a/MAINTAINERS b/MAINTAINERS
index 50ace65b3b..5a83bf56a3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1371,6 +1371,12 @@ F: hw/display/virtio-gpu*
 F: hw/display/virtio-vga.c
 F: include/hw/virtio/virtio-gpu.h
 
+vhost-user-gpu
+M: Marc-André Lureau 
+M: Gerd Hoffmann 
+S: Maintained
+F: docs/interop/vhost-user-gpu.rst
+
 Cirrus VGA
 M: Gerd Hoffmann 
 S: Odd Fixes
diff --git a/docs/interop/vhost-user-gpu.rst b/docs/interop/vhost-user-gpu.rst
new file mode 100644
index 00..309d07fd6d
--- /dev/null
+++ b/docs/interop/vhost-user-gpu.rst
@@ -0,0 +1,236 @@
+===
+Vhost-user-gpu Protocol
+===
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+Overview
+
+
+The vhost-user-gpu protocol is aiming at sharing the rendering result
+of a virtio-gpu, done from a vhost-user slave process to a vhost-user
+master process (such as QEMU). It bears a resemblance to a display
+server protocol, if you consider QEMU as the display server and the
+slave as the client, but in a very limited way. Typically, it will
+work by setting a scanout/display configuration, before sending flush
+events for the display updates. It will also update the cursor shape
+and position.
+
+The protocol is sent over a UNIX domain stream socket, since it uses
+socket ancillary data to share opened file descriptors (DMABUF fds or
+shared memory).
+
+Requests are sent by the slave, and the optional replies by the master.
+
+Wire format
+===
+
+Unless specified differently, numbers are in the machine native byte
+order.
+
+A vhost-user-gpu request consists of 2 header fields and a payload:
+
+::
+
+  
+  | u32:request | u32:size | payload |
+  
+
+- request: 32-bit type of the request
+- size: 32-bit size of the payload
+
+A reply consists only of a payload, whose content depends on on the request.
+
+
+Payload types
+-
+
+VhostUserGpuCursorPos
+^
+
+::
+
+   --
+   | u32:scanout-id | u32:x | u32:y |
+   --
+
+- scanout-id: the scanout where the cursor is

[Qemu-devel] [PATCH v4 14/29] contrib: add vhost-user-input

2018-07-13 Thread Marc-André Lureau

Add a vhost-user input backend, based on virtio-input-host device. It
takes an evdev path as argument, and can be associated with a
vhost-user-backend object, ex:

-object vhost-user-backend,id=vuid,cmd="vhost-user-input /dev/input/event0"

Signed-off-by: Marc-André Lureau 
---
 contrib/vhost-user-input/main.c| 379 +
 MAINTAINERS|   1 +
 Makefile   |   3 +
 Makefile.objs  |   1 +
 configure  |   3 +
 contrib/vhost-user-input/Makefile.objs |   1 +
 6 files changed, 388 insertions(+)
 create mode 100644 contrib/vhost-user-input/main.c
 create mode 100644 contrib/vhost-user-input/Makefile.objs

diff --git a/contrib/vhost-user-input/main.c b/contrib/vhost-user-input/main.c
new file mode 100644
index 00..f0a58da2a8
--- /dev/null
+++ b/contrib/vhost-user-input/main.c
@@ -0,0 +1,379 @@
+/*
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.  See the COPYING file in the
+ * top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include 
+#include 
+
+#include "qemu/iov.h"
+#include "qemu/bswap.h"
+#include "contrib/libvhost-user/libvhost-user.h"
+#include "contrib/libvhost-user/libvhost-user-glib.h"
+#include "standard-headers/linux/virtio_input.h"
+#include "qapi/error.h"
+
+typedef struct virtio_input_event virtio_input_event;
+typedef struct virtio_input_config virtio_input_config;
+
+typedef struct VuInput {
+VugDev dev;
+GSource *evsrc;
+int evdevfd;
+GArray *config;
+virtio_input_event *queue;
+uint32_t qindex, qsize;
+} VuInput;
+
+static void vi_input_send(VuInput *vi, struct virtio_input_event *event)
+{
+VuDev *dev = >dev.parent;
+VuVirtq *vq = vu_get_queue(dev, 0);
+VuVirtqElement *elem;
+unsigned have, need;
+int i, len;
+
+/* queue up events ... */
+if (vi->qindex == vi->qsize) {
+vi->qsize++;
+vi->queue = realloc(vi->queue, vi->qsize *
+sizeof(virtio_input_event));
+}
+vi->queue[vi->qindex++] = *event;
+
+/* ... until we see a report sync ... */
+if (event->type != htole16(EV_SYN) ||
+event->code != htole16(SYN_REPORT)) {
+return;
+}
+
+/* ... then check available space ... */
+need = sizeof(virtio_input_event) * vi->qindex;
+vu_queue_get_avail_bytes(dev, vq, , NULL, need, 0);
+if (have < need) {
+vi->qindex = 0;
+g_warning("ENOSPC in vq, dropping events");
+return;
+}
+
+/* ... and finally pass them to the guest */
+for (i = 0; i < vi->qindex; i++) {
+elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement));
+if (!elem) {
+/* should not happen, we've checked for space beforehand */
+g_warning("%s: Huh?  No vq elem available ...\n", __func__);
+return;
+}
+len = iov_from_buf(elem->in_sg, elem->in_num,
+   0, vi->queue + i, sizeof(virtio_input_event));
+vu_queue_push(dev, vq, elem, len);
+g_free(elem);
+}
+vu_queue_notify(>dev.parent, vq);
+vi->qindex = 0;
+}
+
+static void
+vi_evdev_watch(VuDev *dev, int condition, void *data)
+{
+VuInput *vi = data;
+int fd = vi->evdevfd;
+
+g_debug("Got evdev condition %x", condition);
+
+struct virtio_input_event virtio;
+struct input_event evdev;
+int rc;
+
+for (;;) {
+rc = read(fd, , sizeof(evdev));
+if (rc != sizeof(evdev)) {
+break;
+}
+
+g_debug("input %d %d %d", evdev.type, evdev.code, evdev.value);
+
+virtio.type  = htole16(evdev.type);
+virtio.code  = htole16(evdev.code);
+virtio.value = htole32(evdev.value);
+vi_input_send(vi, );
+}
+}
+
+
+static void vi_handle_status(VuInput *vi, virtio_input_event *event)
+{
+struct input_event evdev;
+int rc;
+
+if (gettimeofday(, NULL)) {
+perror("vi_handle_status: gettimeofday");
+return;
+}
+
+evdev.type = le16toh(event->type);
+evdev.code = le16toh(event->code);
+evdev.value = le32toh(event->value);
+
+rc = write(vi->evdevfd, , sizeof(evdev));
+if (rc == -1) {
+perror("vi_host_handle_status: write");
+}
+}
+
+static void vi_handle_sts(VuDev *dev, int qidx)
+{
+VuInput *vi = container_of(dev, VuInput, dev.parent);
+VuVirtq *vq = vu_get_queue(dev, qidx);
+virtio_input_event event;
+VuVirtqElement *elem;
+int len;
+
+g_debug("%s", G_STRFUNC);
+
+for (;;) {
+elem = vu_queue_pop(dev, vq, sizeof(VuVirtqElement));
+if (!elem) {
+break;
+}
+
+memset(, 0, sizeof(event));
+len = iov_to_buf(elem->out_sg, elem->out_num,
+ 0, , sizeof(event));
+vi_handle_status(vi, );
+vu_queue_push(dev, vq, elem, len);
+g_free(elem);
+}
+

[Qemu-devel] [PATCH v4 17/29] vhost-user: add vhost_user_gpu_get_num_capsets()

2018-07-13 Thread Marc-André Lureau

See vhost-user.txt protocol documentation for details.

Signed-off-by: Marc-André Lureau 
---
 contrib/libvhost-user/libvhost-user.h |  1 +
 include/hw/virtio/vhost-backend.h |  1 +
 hw/virtio/vhost-user.c| 15 +++
 docs/interop/vhost-user.txt   |  8 
 4 files changed, 25 insertions(+)

diff --git a/contrib/libvhost-user/libvhost-user.h 
b/contrib/libvhost-user/libvhost-user.h
index 42e227cce6..9f30e05bd1 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -93,6 +93,7 @@ typedef enum VhostUserRequest {
 VHOST_USER_POSTCOPY_END = 30,
 VHOST_USER_INPUT_GET_CONFIG = 31,
 VHOST_USER_GPU_SET_SOCKET   = 32,
+VHOST_USER_GPU_GET_NUM_CAPSETS = 33,
 VHOST_USER_MAX
 } VhostUserRequest;
 
diff --git a/include/hw/virtio/vhost-backend.h 
b/include/hw/virtio/vhost-backend.h
index daf9180c33..730e24c33b 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -164,5 +164,6 @@ int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
 int vhost_user_input_get_config(struct vhost_dev *dev,
 struct virtio_input_config **config);
 int vhost_user_gpu_set_socket(struct vhost_dev *dev, int fd);
+int vhost_user_gpu_get_num_capsets(struct vhost_dev *dev, uint32_t *num);
 
 #endif /* VHOST_BACKEND_H */
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 8c1a1742b0..bfec6528f2 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -91,6 +91,7 @@ typedef enum VhostUserRequest {
 VHOST_USER_POSTCOPY_END = 30,
 VHOST_USER_INPUT_GET_CONFIG = 31,
 VHOST_USER_GPU_SET_SOCKET   = 32,
+VHOST_USER_GPU_GET_NUM_CAPSETS = 33,
 VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -922,6 +923,20 @@ static int vhost_user_get_u64(struct vhost_dev *dev, int 
request, uint64_t *u64)
 return 0;
 }
 
+int vhost_user_gpu_get_num_capsets(struct vhost_dev *dev, uint32_t *num)
+{
+uint64_t u64;
+
+if (vhost_user_get_u64(dev, VHOST_USER_GPU_GET_NUM_CAPSETS, ) < 0) {
+return -1;
+}
+if (u64 > INT32_MAX) {
+return -1;
+}
+*num = u64;
+return 0;
+}
+
 static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features)
 {
 return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features);
diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index c4c063a8c0..ba5e37d714 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -777,6 +777,14 @@ Master message types
   ancillary data. The GPU protocol is used to inform the master of
   rendering state and updates. See vhost-user-gpu.rst for details.
 
+ * VHOST_USER_GPU_GET_NUM_CAPSETS
+  Id: 33
+  Master payload: N/A
+  Slave payload: u64
+
+  Get the virtio-gpu 'num_capsets' config value. For non-virgl
+  rendering, the value can be 0.
+
 Slave message types
 ---
 
-- 
2.18.0.129.ge3331758f1

[Qemu-devel] [PATCH v4 24/29] virtio-gpu: remove unused config_size

2018-07-13 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
---
 include/hw/virtio/virtio-gpu.h | 2 --
 hw/display/virtio-gpu.c| 3 +--
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index c54c903a65..4c68bc4559 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -96,8 +96,6 @@ typedef struct VirtIOGPU {
 
 int enable;
 
-int config_size;
-
 QTAILQ_HEAD(, virtio_gpu_simple_resource) reslist;
 QTAILQ_HEAD(, virtio_gpu_ctrl_command) cmdq;
 QTAILQ_HEAD(, virtio_gpu_ctrl_command) fenceq;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 30027f3384..7cfb25879a 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1189,10 +1189,9 @@ static void virtio_gpu_device_realize(DeviceState *qdev, 
Error **errp)
 }
 }
 
-g->config_size = sizeof(struct virtio_gpu_config);
 g->virtio_config.num_scanouts = cpu_to_le32(g->conf.max_outputs);
 virtio_init(VIRTIO_DEVICE(g), "virtio-gpu", VIRTIO_ID_GPU,
-g->config_size);
+sizeof(struct virtio_gpu_config));
 
 g->req_state[0].width = g->conf.xres;
 g->req_state[0].height = g->conf.yres;
-- 
2.18.0.129.ge3331758f1

[Qemu-devel] [PATCH v4 15/29] Add vhost-user-input-pci

2018-07-13 Thread Marc-André Lureau

Add a new virtio-input device, which connects to a vhost-user
backend. Usage:

-object vhost-user-backend,id=vuid \
-device vhost-user-input-pci,vhost-user=vuid

Signed-off-by: Marc-André Lureau 
---
 hw/virtio/virtio-pci.h   |  10 +++
 include/hw/virtio/virtio-input.h |  14 
 hw/input/vhost-user-input.c  | 110 +++
 hw/virtio/virtio-pci.c   |  20 ++
 MAINTAINERS  |   1 +
 hw/input/Makefile.objs   |   1 +
 6 files changed, 156 insertions(+)
 create mode 100644 hw/input/vhost-user-input.c

diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index 813082b0d7..c7e28e1b9c 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -54,6 +54,7 @@ typedef struct VirtIORngPCI VirtIORngPCI;
 typedef struct VirtIOInputPCI VirtIOInputPCI;
 typedef struct VirtIOInputHIDPCI VirtIOInputHIDPCI;
 typedef struct VirtIOInputHostPCI VirtIOInputHostPCI;
+typedef struct VHostUserInputPCI VHostUserInputPCI;
 typedef struct VirtIOGPUPCI VirtIOGPUPCI;
 typedef struct VHostVSockPCI VHostVSockPCI;
 typedef struct VirtIOCryptoPCI VirtIOCryptoPCI;
@@ -376,6 +377,15 @@ struct VirtIOInputHostPCI {
 
 #endif
 
+#define TYPE_VHOST_USER_INPUT_PCI "vhost-user-input-pci"
+#define VHOST_USER_INPUT_PCI(obj)\
+OBJECT_CHECK(VHostUserInputPCI, (obj), TYPE_VHOST_USER_INPUT_PCI)
+
+struct VHostUserInputPCI {
+VirtIOPCIProxy parent_obj;
+VHostUserInput vhi;
+};
+
 /*
  * virtio-gpu-pci: This extends VirtioPCIProxy.
  */
diff --git a/include/hw/virtio/virtio-input.h b/include/hw/virtio/virtio-input.h
index 054c38836f..4fca03e796 100644
--- a/include/hw/virtio/virtio-input.h
+++ b/include/hw/virtio/virtio-input.h
@@ -2,6 +2,7 @@
 #define QEMU_VIRTIO_INPUT_H
 
 #include "ui/input.h"
+#include "sysemu/vhost-user-backend.h"
 
 /* - */
 /* virtio input protocol */
@@ -42,11 +43,18 @@ typedef struct virtio_input_event virtio_input_event;
 #define VIRTIO_INPUT_HOST_GET_PARENT_CLASS(obj) \
 OBJECT_GET_PARENT_CLASS(obj, TYPE_VIRTIO_INPUT_HOST)
 
+#define TYPE_VHOST_USER_INPUT   "vhost-user-input"
+#define VHOST_USER_INPUT(obj)  \
+OBJECT_CHECK(VHostUserInput, (obj), TYPE_VHOST_USER_INPUT)
+#define VHOST_USER_INPUT_GET_PARENT_CLASS(obj) \
+OBJECT_GET_PARENT_CLASS(obj, TYPE_VHOST_USER_INPUT)
+
 typedef struct VirtIOInput VirtIOInput;
 typedef struct VirtIOInputClass VirtIOInputClass;
 typedef struct VirtIOInputConfig VirtIOInputConfig;
 typedef struct VirtIOInputHID VirtIOInputHID;
 typedef struct VirtIOInputHost VirtIOInputHost;
+typedef struct VHostUserInput VHostUserInput;
 
 struct VirtIOInputConfig {
 virtio_input_config   config;
@@ -98,6 +106,12 @@ struct VirtIOInputHost {
 int   fd;
 };
 
+struct VHostUserInput {
+VirtIOInput   parent_obj;
+
+VhostUserBackend  *vhost;
+};
+
 void virtio_input_send(VirtIOInput *vinput, virtio_input_event *event);
 void virtio_input_init_config(VirtIOInput *vinput,
   virtio_input_config *config);
diff --git a/hw/input/vhost-user-input.c b/hw/input/vhost-user-input.c
new file mode 100644
index 00..ef1b23a8b2
--- /dev/null
+++ b/hw/input/vhost-user-input.c
@@ -0,0 +1,110 @@
+/*
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.  See the COPYING file in the
+ * top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu-common.h"
+
+#include "hw/qdev.h"
+#include "hw/virtio/virtio-input.h"
+
+static void vhost_input_realize(DeviceState *dev, Error **errp)
+{
+VHostUserInput *vhi = VHOST_USER_INPUT(dev);
+VirtIOInput *vinput = VIRTIO_INPUT(dev);
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+virtio_input_config *config;
+int i, ret;
+
+if (!vhi->vhost) {
+error_setg(errp, "'vhost-user' property is required");
+return;
+}
+
+if (vhost_user_backend_dev_init(vhi->vhost, vdev, 2, errp) == -1) {
+return;
+}
+
+ret = vhost_user_input_get_config(>vhost->dev, );
+if (ret < 0) {
+error_setg(errp, "failed to get input config");
+return;
+}
+for (i = 0; i < ret; i++) {
+virtio_input_add_config(vinput, [i]);
+}
+g_free(config);
+}
+
+static void vhost_input_change_active(VirtIOInput *vinput)
+{
+VHostUserInput *vhi = VHOST_USER_INPUT(vinput);
+
+if (vinput->active) {
+vhost_user_backend_start(vhi->vhost);
+} else {
+vhost_user_backend_stop(vhi->vhost);
+}
+}
+
+static const VMStateDescription vmstate_vhost_input = {
+.name = "vhost-user-input",
+.unmigratable = 1,
+};
+
+static void vhost_input_class_init(ObjectClass *klass, void *data)
+{
+VirtIOInputClass *vic =

[Qemu-devel] [PATCH v4 12/29] vhost-user: add vhost_user_input_get_config()

2018-07-13 Thread Marc-André Lureau

Ask vhost user input backend the list of virtio_input_config.

Signed-off-by: Marc-André Lureau 
---
 contrib/libvhost-user/libvhost-user.h |  1 +
 include/hw/virtio/vhost-backend.h |  4 ++
 hw/virtio/vhost-user.c| 59 +++
 docs/interop/vhost-user.txt   |  7 
 4 files changed, 71 insertions(+)

diff --git a/contrib/libvhost-user/libvhost-user.h 
b/contrib/libvhost-user/libvhost-user.h
index 4aa55b4d2d..a4afbc3a46 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -91,6 +91,7 @@ typedef enum VhostUserRequest {
 VHOST_USER_POSTCOPY_ADVISE  = 28,
 VHOST_USER_POSTCOPY_LISTEN  = 29,
 VHOST_USER_POSTCOPY_END = 30,
+VHOST_USER_INPUT_GET_CONFIG = 31,
 VHOST_USER_MAX
 } VhostUserRequest;
 
diff --git a/include/hw/virtio/vhost-backend.h 
b/include/hw/virtio/vhost-backend.h
index 81283ec50f..1fca321d8a 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -12,6 +12,7 @@
 #define VHOST_BACKEND_H
 
 #include "exec/memory.h"
+#include "standard-headers/linux/virtio_input.h"
 
 typedef enum VhostBackendType {
 VHOST_BACKEND_TYPE_NONE = 0,
@@ -160,4 +161,7 @@ int vhost_backend_invalidate_device_iotlb(struct vhost_dev 
*dev,
 int vhost_backend_handle_iotlb_msg(struct vhost_dev *dev,
   struct vhost_iotlb_msg *imsg);
 
+int vhost_user_input_get_config(struct vhost_dev *dev,
+struct virtio_input_config **config);
+
 #endif /* VHOST_BACKEND_H */
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 29f8568a13..0c6914fab7 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -89,6 +89,7 @@ typedef enum VhostUserRequest {
 VHOST_USER_POSTCOPY_ADVISE  = 28,
 VHOST_USER_POSTCOPY_LISTEN  = 29,
 VHOST_USER_POSTCOPY_END = 30,
+VHOST_USER_INPUT_GET_CONFIG = 31,
 VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -338,6 +339,64 @@ static int vhost_user_write(struct vhost_dev *dev, 
VhostUserMsg *msg,
 return 0;
 }
 
+static void *vhost_user_read_size(struct vhost_dev *dev, uint32_t size)
+{
+struct vhost_user *u = dev->opaque;
+CharBackend *chr = u->user->chr;
+int r;
+uint8_t *p = g_malloc(size);
+
+r = qemu_chr_fe_read_all(chr, p, size);
+if (r != size) {
+error_report("Failed to read msg payload."
+ " Read %d instead of %d.", r, size);
+return NULL;
+}
+
+return p;
+}
+
+int vhost_user_input_get_config(struct vhost_dev *dev,
+struct virtio_input_config **config)
+{
+void *p = NULL;
+VhostUserMsg msg = {
+.hdr.request = VHOST_USER_INPUT_GET_CONFIG,
+.hdr.flags = VHOST_USER_VERSION,
+};
+
+if (vhost_user_write(dev, , NULL, 0) < 0) {
+goto err;
+}
+
+if (vhost_user_read_header(dev, ) < 0) {
+goto err;
+}
+
+p = vhost_user_read_size(dev, msg.hdr.size);
+if (!p) {
+goto err;
+}
+
+if (msg.hdr.request != VHOST_USER_INPUT_GET_CONFIG) {
+error_report("Received unexpected msg type. Expected %d received %d",
+ VHOST_USER_INPUT_GET_CONFIG, msg.hdr.request);
+goto err;
+}
+
+if (msg.hdr.size % sizeof(struct virtio_input_config)) {
+error_report("Invalid msg size");
+goto err;
+}
+
+*config = p;
+return msg.hdr.size / sizeof(struct virtio_input_config);
+
+err:
+g_free(p);
+return -1;
+}
+
 static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
struct vhost_log *log)
 {
diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index f59667f498..cba1ddde16 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -761,6 +761,13 @@ Master message types
   was previously sent.
   The value returned is an error indication; 0 is success.
 
+ * VHOST_USER_INPUT_GET_CONFIG
+  Id: 31
+  Master payload: N/A
+  Slave payload: (struct virtio_input_config)*
+
+  Ask vhost user input backend the list of virtio_input_config.
+
 Slave message types
 ---
 
-- 
2.18.0.129.ge3331758f1

[Qemu-devel] [PATCH v4 11/29] vhost-user: split vhost_user_read()

2018-07-13 Thread Marc-André Lureau

Split vhost_user_read(), so only header can be read with
vhost_user_read_header().

Signed-off-by: Marc-André Lureau 
---
 hw/virtio/vhost-user.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 5b4188bc27..29f8568a13 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -210,7 +210,7 @@ static bool ioeventfd_enabled(void)
 return kvm_enabled() && kvm_eventfds_enabled();
 }
 
-static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
+static int vhost_user_read_header(struct vhost_dev *dev, VhostUserMsg *msg)
 {
 struct vhost_user *u = dev->opaque;
 CharBackend *chr = u->user->chr;
@@ -221,7 +221,7 @@ static int vhost_user_read(struct vhost_dev *dev, 
VhostUserMsg *msg)
 if (r != size) {
 error_report("Failed to read msg header. Read %d instead of %d."
  " Original request %d.", r, size, msg->hdr.request);
-goto fail;
+return -1;
 }
 
 /* validate received flags */
@@ -229,7 +229,21 @@ static int vhost_user_read(struct vhost_dev *dev, 
VhostUserMsg *msg)
 error_report("Failed to read msg header."
 " Flags 0x%x instead of 0x%x.", msg->hdr.flags,
 VHOST_USER_REPLY_MASK | VHOST_USER_VERSION);
-goto fail;
+return -1;
+}
+
+return 0;
+}
+
+static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
+{
+struct vhost_user *u = dev->opaque;
+CharBackend *chr = u->user->chr;
+uint8_t *p = (uint8_t *) msg;
+int r, size;
+
+if (vhost_user_read_header(dev, msg) < 0) {
+return -1;
 }
 
 /* validate message size is sane */
@@ -237,7 +251,7 @@ static int vhost_user_read(struct vhost_dev *dev, 
VhostUserMsg *msg)
 error_report("Failed to read msg header."
 " Size %d exceeds the maximum %zu.", msg->hdr.size,
 VHOST_USER_PAYLOAD_SIZE);
-goto fail;
+return -1;
 }
 
 if (msg->hdr.size) {
@@ -247,14 +261,11 @@ static int vhost_user_read(struct vhost_dev *dev, 
VhostUserMsg *msg)
 if (r != size) {
 error_report("Failed to read msg payload."
  " Read %d instead of %d.", r, msg->hdr.size);
-goto fail;
+return -1;
 }
 }
 
 return 0;
-
-fail:
-return -1;
 }
 
 static int process_message_reply(struct vhost_dev *dev,
-- 
2.18.0.129.ge3331758f1

[Qemu-devel] [PATCH v4 23/29] virtio-gpu: remove unused qdev

2018-07-13 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
---
 include/hw/virtio/virtio-gpu.h | 1 -
 hw/display/virtio-gpu.c| 1 -
 2 files changed, 2 deletions(-)

diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 9780f755ef..c54c903a65 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -97,7 +97,6 @@ typedef struct VirtIOGPU {
 int enable;
 
 int config_size;
-DeviceState *qdev;
 
 QTAILQ_HEAD(, virtio_gpu_simple_resource) reslist;
 QTAILQ_HEAD(, virtio_gpu_ctrl_command) cmdq;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 28602c1a35..30027f3384 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1219,7 +1219,6 @@ static void virtio_gpu_device_realize(DeviceState *qdev, 
Error **errp)
 QTAILQ_INIT(>fenceq);
 
 g->enabled_output_bitmask = 1;
-g->qdev = qdev;
 
 for (i = 0; i < g->conf.max_outputs; i++) {
 g->scanout[i].con =
-- 
2.18.0.129.ge3331758f1

[Qemu-devel] [PATCH v4 13/29] libvhost-user: export vug_source_new()

2018-07-13 Thread Marc-André Lureau

Simplify the creation of FD sources for other users.

Signed-off-by: Marc-André Lureau 
---
 contrib/libvhost-user/libvhost-user-glib.h |  3 +++
 contrib/libvhost-user/libvhost-user-glib.c | 15 +++
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user-glib.h 
b/contrib/libvhost-user/libvhost-user-glib.h
index 6b2110b94c..d3200f3afc 100644
--- a/contrib/libvhost-user/libvhost-user-glib.h
+++ b/contrib/libvhost-user/libvhost-user-glib.h
@@ -29,4 +29,7 @@ void vug_init(VugDev *dev, int socket,
   vu_panic_cb panic, const VuDevIface *iface);
 void vug_deinit(VugDev *dev);
 
+GSource *vug_source_new(VugDev *dev, int fd, GIOCondition cond,
+vu_watch_cb vu_cb, gpointer data);
+
 #endif /* LIBVHOST_USER_GLIB_H */
diff --git a/contrib/libvhost-user/libvhost-user-glib.c 
b/contrib/libvhost-user/libvhost-user-glib.c
index 545f089587..0e2c467f28 100644
--- a/contrib/libvhost-user/libvhost-user-glib.c
+++ b/contrib/libvhost-user/libvhost-user-glib.c
@@ -69,8 +69,8 @@ static GSourceFuncs vug_src_funcs = {
 };
 
 static GSource *
-vug_source_new(VuDev *dev, int fd, GIOCondition cond,
-   vu_watch_cb vu_cb, gpointer data)
+_vug_source_new(VuDev *dev, int fd, GIOCondition cond,
+vu_watch_cb vu_cb, gpointer data)
 {
 GSource *gsrc;
 VugSrc *src;
@@ -95,6 +95,13 @@ vug_source_new(VuDev *dev, int fd, GIOCondition cond,
 return gsrc;
 }
 
+GSource *
+vug_source_new(VugDev *dev, int fd, GIOCondition cond,
+   vu_watch_cb vu_cb, gpointer data)
+{
+return _vug_source_new(>parent, fd, cond, vu_cb, data);
+}
+
 static void
 set_watch(VuDev *vu_dev, int fd, int vu_evt, vu_watch_cb cb, void *pvt)
 {
@@ -106,7 +113,7 @@ set_watch(VuDev *vu_dev, int fd, int vu_evt, vu_watch_cb 
cb, void *pvt)
 g_assert(cb);
 
 dev = container_of(vu_dev, VugDev, parent);
-src = vug_source_new(vu_dev, fd, vu_evt, cb, pvt);
+src = _vug_source_new(vu_dev, fd, vu_evt, cb, pvt);
 g_hash_table_replace(dev->fdmap, GINT_TO_POINTER(fd), src);
 }
 
@@ -141,7 +148,7 @@ vug_init(VugDev *dev, int socket,
 dev->fdmap = g_hash_table_new_full(NULL, NULL, NULL,
(GDestroyNotify) g_source_destroy);
 
-dev->src = vug_source_new(>parent, socket, G_IO_IN, vug_watch, NULL);
+dev->src = _vug_source_new(>parent, socket, G_IO_IN, vug_watch, NULL);
 }
 
 void
-- 
2.18.0.129.ge3331758f1

[Qemu-devel] [PATCH v4 07/29] vhost-user: wrap some read/write with retry handling

2018-07-13 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/virtio/vhost-user.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 44795880d6..5b4188bc27 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -975,7 +975,10 @@ static void slave_read(void *opaque)
 iov.iov_base = 
 iov.iov_len = VHOST_USER_HDR_SIZE;
 
-size = recvmsg(u->slave_fd, , 0);
+do {
+size = recvmsg(u->slave_fd, , 0);
+} while (size < 0 && (errno == EINTR || errno == EAGAIN));
+
 if (size != VHOST_USER_HDR_SIZE) {
 error_report("Failed to read from slave.");
 goto err;
@@ -1004,7 +1007,10 @@ static void slave_read(void *opaque)
 }
 
 /* Read payload */
-size = read(u->slave_fd, , hdr.size);
+do {
+size = read(u->slave_fd, , hdr.size);
+} while (size < 0 && (errno == EINTR || errno == EAGAIN));
+
 if (size != hdr.size) {
 error_report("Failed to read payload from slave.");
 goto err;
@@ -1052,7 +1058,10 @@ static void slave_read(void *opaque)
 iovec[1].iov_base = 
 iovec[1].iov_len = hdr.size;
 
-size = writev(u->slave_fd, iovec, ARRAY_SIZE(iovec));
+do {
+size = writev(u->slave_fd, iovec, ARRAY_SIZE(iovec));
+} while (size < 0 && (errno == EINTR || errno == EAGAIN));
+
 if (size != VHOST_USER_HDR_SIZE + hdr.size) {
 error_report("Failed to send msg reply to slave.");
 goto err;
-- 
2.18.0.129.ge3331758f1

[Qemu-devel] [PATCH v4 08/29] Add vhost-user-backend

2018-07-13 Thread Marc-André Lureau

Create a vhost-user-backend object that holds a connection to a
vhost-user backend and can be referenced from virtio devices that
support it. See later patches for input & gpu usage.

A chardev can be specified to communicate with the vhost-user backend,
ex: -chardev socket,id=char0,path=/tmp/foo.sock -object
vhost-user-backend,id=vuid,chardev=char0.

Signed-off-by: Marc-André Lureau 
---
 include/sysemu/vhost-user-backend.h |  60 +++
 backends/vhost-user.c   | 244 
 vl.c|   3 +-
 MAINTAINERS |   2 +
 backends/Makefile.objs  |   3 +-
 qemu-options.hx |  20 +++
 6 files changed, 330 insertions(+), 2 deletions(-)
 create mode 100644 include/sysemu/vhost-user-backend.h
 create mode 100644 backends/vhost-user.c

diff --git a/include/sysemu/vhost-user-backend.h 
b/include/sysemu/vhost-user-backend.h
new file mode 100644
index 00..60f811cae7
--- /dev/null
+++ b/include/sysemu/vhost-user-backend.h
@@ -0,0 +1,60 @@
+/*
+ * QEMU vhost-user backend
+ *
+ * Copyright (C) 2018 Red Hat Inc
+ *
+ * Authors:
+ *  Marc-André Lureau 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef QEMU_VHOST_USER_BACKEND_H
+#define QEMU_VHOST_USER_BACKEND_H
+
+#include "qom/object.h"
+#include "exec/memory.h"
+#include "qemu/option.h"
+#include "qemu/bitmap.h"
+#include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-user.h"
+#include "chardev/char-fe.h"
+#include "io/channel.h"
+
+#define TYPE_VHOST_USER_BACKEND "vhost-user-backend"
+#define VHOST_USER_BACKEND(obj) \
+OBJECT_CHECK(VhostUserBackend, (obj), TYPE_VHOST_USER_BACKEND)
+#define VHOST_USER_BACKEND_GET_CLASS(obj) \
+OBJECT_GET_CLASS(VhostUserBackendClass, (obj), TYPE_VHOST_USER_BACKEND)
+#define VHOST_USER_BACKEND_CLASS(klass) \
+OBJECT_CLASS_CHECK(VhostUserBackendClass, (klass), TYPE_VHOST_USER_BACKEND)
+
+typedef struct VhostUserBackend VhostUserBackend;
+typedef struct VhostUserBackendClass VhostUserBackendClass;
+
+struct VhostUserBackendClass {
+ObjectClass parent_class;
+};
+
+struct VhostUserBackend {
+/* private */
+Object parent;
+
+char *cmd;
+char *chr_name;
+
+CharBackend chr;
+VhostUserState vhost_user;
+struct vhost_dev dev;
+QIOChannel *child;
+VirtIODevice *vdev;
+bool started;
+bool completed;
+};
+
+int vhost_user_backend_dev_init(VhostUserBackend *b, VirtIODevice *vdev,
+unsigned nvqs, Error **errp);
+void vhost_user_backend_start(VhostUserBackend *b);
+void vhost_user_backend_stop(VhostUserBackend *b);
+
+#endif
diff --git a/backends/vhost-user.c b/backends/vhost-user.c
new file mode 100644
index 00..bf39c0751d
--- /dev/null
+++ b/backends/vhost-user.c
@@ -0,0 +1,244 @@
+/*
+ * QEMU vhost-user backend
+ *
+ * Copyright (C) 2018 Red Hat Inc
+ *
+ * Authors:
+ *  Marc-André Lureau 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+
+#include "qemu/osdep.h"
+#include "hw/qdev.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
+#include "qom/object_interfaces.h"
+#include "sysemu/vhost-user-backend.h"
+#include "sysemu/kvm.h"
+#include "io/channel-command.h"
+#include "hw/virtio/virtio-bus.h"
+
+static bool
+ioeventfd_enabled(void)
+{
+return kvm_enabled() && kvm_eventfds_enabled();
+}
+
+int
+vhost_user_backend_dev_init(VhostUserBackend *b, VirtIODevice *vdev,
+unsigned nvqs, Error **errp)
+{
+int ret;
+
+assert(!b->vdev && vdev);
+
+if (!ioeventfd_enabled()) {
+error_setg(errp, "vhost initialization failed: requires kvm");
+return -1;
+}
+
+if (!vhost_user_init(>vhost_user, >chr, errp)) {
+return -1;
+}
+
+b->vdev = vdev;
+b->dev.nvqs = nvqs;
+b->dev.vqs = g_new(struct vhost_virtqueue, nvqs);
+
+ret = vhost_dev_init(>dev, >vhost_user, VHOST_BACKEND_TYPE_USER, 0);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "vhost initialization failed");
+return -1;
+}
+
+return 0;
+}
+
+void
+vhost_user_backend_start(VhostUserBackend *b)
+{
+BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(b->vdev)));
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+int ret, i ;
+
+if (b->started) {
+return;
+}
+
+if (!k->set_guest_notifiers) {
+error_report("binding does not support guest notifiers");
+return;
+}
+
+ret = vhost_dev_enable_notifiers(>dev, b->vdev);
+if (ret < 0) {
+return;
+}
+
+ret = k->set_guest_notifiers(qbus->parent, b->dev.nvqs, true);
+if (ret < 0) {
+error_report("Error binding guest notifier");
+goto err_host_notifiers;
+}
+
+b->dev.acked_features = b->vdev->guest_features;
+

[Qemu-devel] [PATCH v4 18/29] virtio: add virtio-gpu bswap helpers header

2018-07-13 Thread Marc-André Lureau

The helper functions are useful to build the vhost-user-gpu backend.

Signed-off-by: Marc-André Lureau 
---
 include/hw/virtio/virtio-gpu-bswap.h | 61 
 hw/display/virtio-gpu.c  | 43 +---
 2 files changed, 62 insertions(+), 42 deletions(-)
 create mode 100644 include/hw/virtio/virtio-gpu-bswap.h

diff --git a/include/hw/virtio/virtio-gpu-bswap.h 
b/include/hw/virtio/virtio-gpu-bswap.h
new file mode 100644
index 00..38d12160f6
--- /dev/null
+++ b/include/hw/virtio/virtio-gpu-bswap.h
@@ -0,0 +1,61 @@
+/*
+ * Virtio GPU Device
+ *
+ * Copyright Red Hat, Inc. 2013-2014
+ *
+ * Authors:
+ * Dave Airlie 
+ * Gerd Hoffmann 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_VIRTIO_GPU_BSWAP_H
+#define HW_VIRTIO_GPU_BSWAP_H
+
+#include "qemu/bswap.h"
+
+static inline void
+virtio_gpu_ctrl_hdr_bswap(struct virtio_gpu_ctrl_hdr *hdr)
+{
+le32_to_cpus(>type);
+le32_to_cpus(>flags);
+le64_to_cpus(>fence_id);
+le32_to_cpus(>ctx_id);
+le32_to_cpus(>padding);
+}
+
+static inline void
+virtio_gpu_bswap_32(void *ptr, size_t size)
+{
+#ifdef HOST_WORDS_BIGENDIAN
+
+size_t i;
+struct virtio_gpu_ctrl_hdr *hdr = (struct virtio_gpu_ctrl_hdr *) ptr;
+
+virtio_gpu_ctrl_hdr_bswap(hdr);
+
+i = sizeof(struct virtio_gpu_ctrl_hdr);
+while (i < size) {
+le32_to_cpus((uint32_t *)(ptr + i));
+i = i + sizeof(uint32_t);
+}
+
+#endif
+}
+
+static inline void
+virtio_gpu_t2d_bswap(struct virtio_gpu_transfer_to_host_2d *t2d)
+{
+virtio_gpu_ctrl_hdr_bswap(>hdr);
+le32_to_cpus(>r.x);
+le32_to_cpus(>r.y);
+le32_to_cpus(>r.width);
+le32_to_cpus(>r.height);
+le64_to_cpus(>offset);
+le32_to_cpus(>resource_id);
+le32_to_cpus(>padding);
+}
+
+#endif
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index ec366f4c35..28602c1a35 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -19,6 +19,7 @@
 #include "trace.h"
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-gpu.h"
+#include "hw/virtio/virtio-gpu-bswap.h"
 #include "hw/virtio/virtio-bus.h"
 #include "migration/blocker.h"
 #include "qemu/log.h"
@@ -31,48 +32,6 @@ virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id);
 
 static void virtio_gpu_cleanup_mapping(struct virtio_gpu_simple_resource *res);
 
-static void
-virtio_gpu_ctrl_hdr_bswap(struct virtio_gpu_ctrl_hdr *hdr)
-{
-le32_to_cpus(>type);
-le32_to_cpus(>flags);
-le64_to_cpus(>fence_id);
-le32_to_cpus(>ctx_id);
-le32_to_cpus(>padding);
-}
-
-static void virtio_gpu_bswap_32(void *ptr,
-size_t size)
-{
-#ifdef HOST_WORDS_BIGENDIAN
-
-size_t i;
-struct virtio_gpu_ctrl_hdr *hdr = (struct virtio_gpu_ctrl_hdr *) ptr;
-
-virtio_gpu_ctrl_hdr_bswap(hdr);
-
-i = sizeof(struct virtio_gpu_ctrl_hdr);
-while (i < size) {
-le32_to_cpus((uint32_t *)(ptr + i));
-i = i + sizeof(uint32_t);
-}
-
-#endif
-}
-
-static void
-virtio_gpu_t2d_bswap(struct virtio_gpu_transfer_to_host_2d *t2d)
-{
-virtio_gpu_ctrl_hdr_bswap(>hdr);
-le32_to_cpus(>r.x);
-le32_to_cpus(>r.y);
-le32_to_cpus(>r.width);
-le32_to_cpus(>r.height);
-le64_to_cpus(>offset);
-le32_to_cpus(>resource_id);
-le32_to_cpus(>padding);
-}
-
 #ifdef CONFIG_VIRGL
 #include 
 #define VIRGL(_g, _virgl, _simple, ...) \
-- 
2.18.0.129.ge3331758f1

[Qemu-devel] [PATCH v4 10/29] HACK: vhost-user-backend: allow to specify binary to execute

2018-07-13 Thread Marc-André Lureau

An executable with its arguments may be given as 'cmd' property, ex:
-object vhost-user-backend,id=vui,cmd="./vhost-user-input
/dev/input..". The executable is then spawn and, by convention, the
vhost-user socket is passed as fd=3. It may be considered a security
breach to allow creating processes that may execute arbitrary
executables, so this may be restricted to some known executables (via
signature etc) or directory.

To make the patch more acceptable, the command argument would have to
be passed via an array (probably via -object json: syntax), instead of
using g_shell_parse_argv().

Signed-off-by: Marc-André Lureau 
---
 backends/vhost-user.c | 96 ++-
 qemu-options.hx   | 12 --
 2 files changed, 94 insertions(+), 14 deletions(-)

diff --git a/backends/vhost-user.c b/backends/vhost-user.c
index bf39c0751d..32d3ec0e8b 100644
--- a/backends/vhost-user.c
+++ b/backends/vhost-user.c
@@ -136,31 +136,105 @@ vhost_user_backend_stop(VhostUserBackend *b)
 b->started = false;
 }
 
+static void
+pre_exec_cb(void *data)
+{
+int *sv = data;
+int maxfd = sysconf(_SC_OPEN_MAX);
+int fd;
+
+dup2(sv[1], 3);
+for (fd = 4; fd < maxfd; fd++) {
+close(fd);
+}
+}
+
 static void
 vhost_user_backend_complete(UserCreatable *uc, Error **errp)
 {
 VhostUserBackend *b = VHOST_USER_BACKEND(uc);
 Chardev *chr;
+char **argv = NULL;
 
-if (!b->chr_name) {
-error_setg(errp, "You must specificy 'chardev'.");
+if (!!b->chr_name + !!b->cmd != 1) {
+error_setg(errp, "You may specificy only one of 'chardev' or 'cmd'.");
 return;
 }
 
-chr = qemu_chr_find(b->chr_name);
-if (chr == NULL) {
-error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
-  "Chardev '%s' not found", b->chr_name);
-return;
-}
+if (b->chr_name) {
+chr = qemu_chr_find(b->chr_name);
+if (chr == NULL) {
+error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+  "Chardev '%s' not found", b->chr_name);
+return;
+}
 
-if (!qemu_chr_fe_init(>chr, chr, errp)) {
-return;
+if (!qemu_chr_fe_init(>chr, chr, errp)) {
+return;
+}
+} else {
+QIOChannelCommand *child;
+GError *err;
+int sv[2];
+
+if (!g_shell_parse_argv(b->cmd, NULL, , )) {
+error_setg(errp, "Fail to parse command: %s", err->message);
+g_error_free(err);
+return;
+}
+
+if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
+error_setg_errno(errp, errno, "socketpair() failed");
+goto end;
+}
+
+chr = CHARDEV(object_new(TYPE_CHARDEV_SOCKET));
+if (!chr || qemu_chr_add_client(chr, sv[0]) == -1) {
+error_setg(errp, "Failed to make socket chardev");
+object_unref(OBJECT(chr));
+goto end;
+}
+
+if (!qemu_chr_fe_init(>chr, chr, errp)) {
+goto end;
+}
+
+child = qio_channel_command_new_spawn_with_pre_exec(
+(const char * const *)argv, O_RDONLY | O_WRONLY,
+pre_exec_cb, sv, errp);
+if (!child) {
+goto end;
+}
+b->child = QIO_CHANNEL(child);
+
+close(sv[1]);
 }
 
 b->completed = true;
 /* could vhost_dev_init() happen here, so early vhost-user message
  * can be exchanged */
+end:
+g_strfreev(argv);
+}
+
+static char *get_cmd(Object *obj, Error **errp)
+{
+VhostUserBackend *b = VHOST_USER_BACKEND(obj);
+
+return g_strdup(b->cmd);
+}
+
+static void set_cmd(Object *obj, const char *str, Error **errp)
+{
+VhostUserBackend *b = VHOST_USER_BACKEND(obj);
+
+if (b->child) {
+error_setg(errp, "cannot change property value");
+return;
+}
+
+g_free(b->cmd);
+b->cmd = g_strdup(str);
 }
 
 static void set_chardev(Object *obj, const char *value, Error **errp)
@@ -189,6 +263,7 @@ static char *get_chardev(Object *obj, Error **errp)
 
 static void vhost_user_backend_init(Object *obj)
 {
+object_property_add_str(obj, "cmd", get_cmd, set_cmd, NULL);
 object_property_add_str(obj, "chardev", get_chardev, set_chardev, NULL);
 }
 
@@ -197,6 +272,7 @@ static void vhost_user_backend_finalize(Object *obj)
 VhostUserBackend *b = VHOST_USER_BACKEND(obj);
 
 g_free(b->dev.vqs);
+g_free(b->cmd);
 g_free(b->chr_name);
 
 vhost_user_cleanup(>vhost_user);
diff --git a/qemu-options.hx b/qemu-options.hx
index 413b97d5e9..9243a5f8ab 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4292,16 +4292,20 @@ secondary:
 If you want to know the detail of above command line, you can read
 the colo-compare git log.
 
-@item -object vhost-user-backend,id=id=@var{id},chardev=@var{chardevid}
+@item -object 
vhost-user-backend,id=id=@var{id}[,chardev=@var{chardevid},cmd=@var{cmd}]
 
 Create a vhost-user-backend object that holds a

[Qemu-devel] [PATCH v4 06/29] libvhost-user: exit by default on VHOST_USER_NONE

2018-07-13 Thread Marc-André Lureau

Since commit 2566378d6d13bf4d28c7770bdbda5f7682594bbe, libvhost-user
no longer panics on disconnect (rc == 0), and instead silently ignores
an invalid VHOST_USER_NONE message.

Without extra work from the API user, this will simply busy-loop on
HUP events. The obvious thing to do is to exit(0) instead, while
additional or different work can be done by overriding
iface->process_msg().

Signed-off-by: Marc-André Lureau 
---
 contrib/libvhost-user/libvhost-user.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/libvhost-user/libvhost-user.c 
b/contrib/libvhost-user/libvhost-user.c
index a6b46cdc03..41470daf57 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -1285,7 +1285,8 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
 case VHOST_USER_SET_CONFIG:
 return vu_set_config(dev, vmsg);
 case VHOST_USER_NONE:
-break;
+/* if you need processing before exit, override iface->process_msg */
+exit(0);
 case VHOST_USER_POSTCOPY_ADVISE:
 return vu_set_postcopy_advise(dev, vmsg);
 case VHOST_USER_POSTCOPY_LISTEN:
-- 
2.18.0.129.ge3331758f1

[Qemu-devel] [PATCH v4 09/29] qio: add qio_channel_command_new_spawn_with_pre_exec()

2018-07-13 Thread Marc-André Lureau

Add a new function to let caller do some tuning thanks to a callback
before exec().

Signed-off-by: Marc-André Lureau 
---
 include/io/channel-command.h | 18 ++
 io/channel-command.c | 33 ++---
 2 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/include/io/channel-command.h b/include/io/channel-command.h
index 336d47fa5c..96c833daab 100644
--- a/include/io/channel-command.h
+++ b/include/io/channel-command.h
@@ -71,6 +71,24 @@ qio_channel_command_new_pid(int writefd,
 int readfd,
 pid_t pid);
 
+/**
+ * qio_channel_command_new_spawn_with_pre_exec:
+ * @argv: the NULL terminated list of command arguments
+ * @flags: the I/O mode, one of O_RDONLY, O_WRONLY, O_RDWR
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Create a channel for performing I/O with the
+ * command to be spawned with arguments @argv.
+ *
+ * Returns: the command channel object, or NULL on error
+ */
+QIOChannelCommand *
+qio_channel_command_new_spawn_with_pre_exec(const char *const argv[],
+int flags,
+void (*pre_exec_cb)(void *),
+void *data,
+Error **errp);
+
 /**
  * qio_channel_command_new_spawn:
  * @argv: the NULL terminated list of command arguments
diff --git a/io/channel-command.c b/io/channel-command.c
index 3e7eb17eff..05903ff194 100644
--- a/io/channel-command.c
+++ b/io/channel-command.c
@@ -46,9 +46,12 @@ qio_channel_command_new_pid(int writefd,
 
 #ifndef WIN32
 QIOChannelCommand *
-qio_channel_command_new_spawn(const char *const argv[],
-  int flags,
-  Error **errp)
+qio_channel_command_new_spawn_with_pre_exec(const char *const argv[],
+int flags,
+void (*pre_exec_cb)(void *),
+void *data,
+Error **errp)
+
 {
 pid_t pid = -1;
 int stdinfd[2] = { -1, -1 };
@@ -104,6 +107,10 @@ qio_channel_command_new_spawn(const char *const argv[],
 close(devnull);
 }
 
+if (pre_exec_cb) {
+pre_exec_cb(data);
+}
+
 execv(argv[0], (char * const *)argv);
 _exit(1);
 }
@@ -139,12 +146,13 @@ qio_channel_command_new_spawn(const char *const argv[],
 }
 return NULL;
 }
-
 #else /* WIN32 */
 QIOChannelCommand *
-qio_channel_command_new_spawn(const char *const argv[],
-  int flags,
-  Error **errp)
+qio_channel_command_new_spawn_with_pre_exec(const char *const argv[],
+int flags,
+void (*pre_exec_cb)(void *),
+void *data,
+Error **errp)
 {
 error_setg_errno(errp, ENOSYS,
  "Command spawn not supported on this platform");
@@ -152,6 +160,17 @@ qio_channel_command_new_spawn(const char *const argv[],
 }
 #endif /* WIN32 */
 
+
+QIOChannelCommand *
+qio_channel_command_new_spawn(const char *const argv[],
+  int flags,
+  Error **errp)
+{
+return qio_channel_command_new_spawn_with_pre_exec(argv, flags,
+   NULL, NULL, errp);
+}
+
+
 #ifndef WIN32
 static int qio_channel_command_abort(QIOChannelCommand *ioc,
  Error **errp)
-- 
2.18.0.129.ge3331758f1

[Qemu-devel] [PATCH v4 05/29] vhost-user: simplify vhost_user_init/vhost_user_cleanup

2018-07-13 Thread Marc-André Lureau

Take a VhostUserState* that can be pre-allocated, and initialize it
with the associated chardev.

Signed-off-by: Marc-André Lureau 
Reviewed-by: Tiwei Bie 
---
 include/hw/virtio/vhost-user-blk.h  |  2 +-
 include/hw/virtio/vhost-user-scsi.h |  2 +-
 include/hw/virtio/vhost-user.h  |  2 +-
 backends/cryptodev-vhost-user.c | 18 --
 hw/block/vhost-user-blk.c   | 22 --
 hw/scsi/vhost-user-scsi.c   | 20 
 hw/virtio/vhost-stub.c  |  4 ++--
 hw/virtio/vhost-user.c  | 16 
 net/vhost-user.c| 13 -
 9 files changed, 33 insertions(+), 66 deletions(-)

diff --git a/include/hw/virtio/vhost-user-blk.h 
b/include/hw/virtio/vhost-user-blk.h
index d52944aeeb..a8a106eecb 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -36,7 +36,7 @@ typedef struct VHostUserBlk {
 uint32_t queue_size;
 uint32_t config_wce;
 struct vhost_dev dev;
-VhostUserState *vhost_user;
+VhostUserState vhost_user;
 } VHostUserBlk;
 
 #endif
diff --git a/include/hw/virtio/vhost-user-scsi.h 
b/include/hw/virtio/vhost-user-scsi.h
index 3ec34ae867..7c22bdc957 100644
--- a/include/hw/virtio/vhost-user-scsi.h
+++ b/include/hw/virtio/vhost-user-scsi.h
@@ -31,7 +31,7 @@
 typedef struct VHostUserSCSI {
 VHostSCSICommon parent_obj;
 uint64_t host_features;
-VhostUserState *vhost_user;
+VhostUserState vhost_user;
 } VHostUserSCSI;
 
 #endif /* VHOST_USER_SCSI_H */
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index fd660393a0..811e325f42 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -22,7 +22,7 @@ typedef struct VhostUserState {
 VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostUserState;
 
-VhostUserState *vhost_user_init(void);
+bool vhost_user_init(VhostUserState *user, CharBackend *chr, Error **errp);
 void vhost_user_cleanup(VhostUserState *user);
 
 #endif
diff --git a/backends/cryptodev-vhost-user.c b/backends/cryptodev-vhost-user.c
index d52daccfcd..63573e211e 100644
--- a/backends/cryptodev-vhost-user.c
+++ b/backends/cryptodev-vhost-user.c
@@ -47,7 +47,7 @@
 typedef struct CryptoDevBackendVhostUser {
 CryptoDevBackend parent_obj;
 
-VhostUserState *vhost_user;
+VhostUserState vhost_user;
 CharBackend chr;
 char *chr_name;
 bool opened;
@@ -104,7 +104,7 @@ cryptodev_vhost_user_start(int queues,
 continue;
 }
 
-options.opaque = s->vhost_user;
+options.opaque = >vhost_user;
 options.backend_type = VHOST_BACKEND_TYPE_USER;
 options.cc = b->conf.peers.ccs[i];
 s->vhost_crypto[i] = cryptodev_vhost_init();
@@ -187,7 +187,6 @@ static void cryptodev_vhost_user_init(
 size_t i;
 Error *local_err = NULL;
 Chardev *chr;
-VhostUserState *user;
 CryptoDevBackendClient *cc;
 CryptoDevBackendVhostUser *s =
   CRYPTODEV_BACKEND_VHOST_USER(backend);
@@ -218,15 +217,10 @@ static void cryptodev_vhost_user_init(
 }
 }
 
-user = vhost_user_init();
-if (!user) {
-error_setg(errp, "Failed to init vhost_user");
+if (!vhost_user_init(>vhost_user, >chr, errp)) {
 return;
 }
 
-user->chr = >chr;
-s->vhost_user = user;
-
 qemu_chr_fe_set_handlers(>chr, NULL, NULL,
  cryptodev_vhost_user_event, NULL, s, NULL, true);
 
@@ -312,11 +306,7 @@ static void cryptodev_vhost_user_cleanup(
 }
 }
 
-if (s->vhost_user) {
-vhost_user_cleanup(s->vhost_user);
-g_free(s->vhost_user);
-s->vhost_user = NULL;
-}
+vhost_user_cleanup(>vhost_user);
 }
 
 static void cryptodev_vhost_user_set_chardev(Object *obj,
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index d755223643..5a2c6fe5f3 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -224,7 +224,6 @@ static void vhost_user_blk_device_realize(DeviceState *dev, 
Error **errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VHostUserBlk *s = VHOST_USER_BLK(vdev);
-VhostUserState *user;
 int i, ret;
 
 if (!s->chardev.chr) {
@@ -242,15 +241,10 @@ static void vhost_user_blk_device_realize(DeviceState 
*dev, Error **errp)
 return;
 }
 
-user = vhost_user_init();
-if (!user) {
-error_setg(errp, "vhost-user-blk: failed to init vhost_user");
+if (!vhost_user_init(>vhost_user, >chardev, errp)) {
 return;
 }
 
-user->chr = >chardev;
-s->vhost_user = user;
-
 virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
 sizeof(struct virtio_blk_config));
 
@@ -266,7 +260,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, 
Error **errp)
 
 vhost_dev_set_config_notifier(>dev, _ops);
 
-ret = vhost_dev_init(>dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
+ret =

[Qemu-devel] [PATCH v4 00/29] vhost-user for input & GPU

2018-07-13 Thread Marc-André Lureau

Hi,

vhost-user allows to drive a virtio device in a seperate
process. After vhost-user-net, we have seen
vhost-user-{scsi,blk,crypto} added more recently.

This series, initially proposed 2 years ago
(https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01905.html)
contributes with vhost-user-input and vhost-user-gpu.

Additionally, to factor out common code and ease the usage, a
vhost-user-backend is introduced as an intermediary object between the
backend and the qemu device.

You may start a vhost-user-gpu with virgl rendering in a separate
process like this:

$ ./vhost-user-gpu --virgl -s vgpu.sock &
$ qemu...
  -chardev socket,id=chr,path=vgpu.sock
  -object vhost-user-backend,id=vug,chardev=chr
  -device vhost-user-vga,vhost-user=vug

You may also specify the backend command and the arguments as part of
vhost-user-backend qemu arguments. For example, to start a
vhost-user-input backend on input device /dev/input/event19:

-object vhost-user-backend,id=vuid,cmd="vhost-user-input /dev/input/event19"
-device virtio-input-host-pci,vhost-user=vuid

The vhost-user-gpu backend requires virgl from git.

The libvirt support is on-going work:
https://github.com/elmarco/libvirt/commits/vhost-user-gpu

The GPU benchmarks are encouraging, giving up to x5 performance on
Unigine Heaven 4.0.

Feedback welcome,

v4:
 - move qemu_write_pidfile() in util, improve it a bit
 - add --pid and --fd arguments to vhost-user to help with libvirt support
 - various bug fixes for synchronization, and tearing down

v3: deal with most comments from rfcv2 and various improvements
 - "vhost-user-backend: allow to specify binary to execute" as seperate
  patch, not for inclusion since Daniel as concerned about parsing
  shell strings with glib
 - use dmabuf to share 2d rendering result (with intel gem only atm)
 - document the vhost-user-gpu protocol
 - make vhost-user-gpu-pci and vhost-user-vga seperate devices (instead
   of adding vhost-user support to existing devices)
 - allow to specify virgl rendering, and rendernode
 - let's promote out of RFC status :)

RFCv2: (addressing some of Gerd comments digged in the archives)
 - rebased, clean ups, various small fixes, update commit messages
 - teach the vhost-user-backend to take a chardev
 - add vhost-user-input-pci, instead of adding vhost code to 
virtio-input-host-pci

Marc-André Lureau (29):
  chardev: avoid crash if no associated address
  chardev: remove qemu_chr_fe_read_all() counter
  chardev: unref if underlying chardev has no parent
  dmabuf: add y0_top, pass it to spice
  vhost-user: simplify vhost_user_init/vhost_user_cleanup
  libvhost-user: exit by default on VHOST_USER_NONE
  vhost-user: wrap some read/write with retry handling
  Add vhost-user-backend
  qio: add qio_channel_command_new_spawn_with_pre_exec()
  HACK: vhost-user-backend: allow to specify binary to execute
  vhost-user: split vhost_user_read()
  vhost-user: add vhost_user_input_get_config()
  libvhost-user: export vug_source_new()
  contrib: add vhost-user-input
  Add vhost-user-input-pci
  vhost-user: add vhost_user_gpu_set_socket()
  vhost-user: add vhost_user_gpu_get_num_capsets()
  virtio: add virtio-gpu bswap helpers header
  util: promote qemu_egl_rendernode_open() to libqemuutil
  util: add qemu_write_pidfile()
  util: use fcntl() for qemu_write_pidfile() locking
  contrib: add vhost-user-gpu
  virtio-gpu: remove unused qdev
  virtio-gpu: remove unused config_size
  virtio-gpu: block both 2d and 3d rendering
  virtio-gpu: remove useless 'waiting' field
  virtio-gpu: split virtio-gpu, introduce virtio-gpu-base
  virtio-gpu: split virtio-gpu-pci & virtio-vga
  hw/display: add vhost-user-vga & gpu-pci

 contrib/libvhost-user/libvhost-user-glib.h |3 +
 contrib/libvhost-user/libvhost-user.h  |3 +
 contrib/vhost-user-gpu/drm.h   |   63 ++
 contrib/vhost-user-gpu/virgl.h |   25 +
 contrib/vhost-user-gpu/vugpu.h |  168 +++
 hw/display/virtio-vga.h|   22 +
 hw/virtio/virtio-pci.h |   27 +-
 include/hw/virtio/vhost-backend.h  |6 +
 include/hw/virtio/vhost-user-blk.h |2 +-
 include/hw/virtio/vhost-user-scsi.h|2 +-
 include/hw/virtio/vhost-user.h |2 +-
 include/hw/virtio/virtio-gpu-bswap.h   |   61 +
 include/hw/virtio/virtio-gpu.h |   90 +-
 include/hw/virtio/virtio-input.h   |   14 +
 include/io/channel-command.h   |   18 +
 include/qemu/drm.h |6 +
 include/qemu/osdep.h   |3 +-
 include/sysemu/vhost-user-backend.h|   60 +
 include/ui/console.h   |1 +
 backends/cryptodev-vhost-user.c|   18 +-
 backends/vhost-user.c  |  320 ++
 chardev/char-fe.c  |   13 +-
 chardev/char-socket.c  |8 +-
 contrib/libvhost-user/libvhost-user-glib.c |   15 +-

[Qemu-devel] [PATCH v4 04/29] dmabuf: add y0_top, pass it to spice

2018-07-13 Thread Marc-André Lureau

Some scanouts during boot are top-down without it.

y0_top is set from VHOST_USER_GPU_DMABUF_SCANOUT code path in the last
patch of this series.

In current QEMU code base, only vfio/display uses dmabuf API. But the
VFIO query interface doesn't provide or need that detail so far.

Signed-off-by: Marc-André Lureau 
---
 include/ui/console.h | 1 +
 ui/spice-display.c   | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/ui/console.h b/include/ui/console.h
index 981b519dde..fb969caf70 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -186,6 +186,7 @@ struct QemuDmaBuf {
 uint32_t  stride;
 uint32_t  fourcc;
 uint32_t  texture;
+bool  y0_top;
 };
 
 typedef struct DisplayChangeListenerOps {
diff --git a/ui/spice-display.c b/ui/spice-display.c
index fe734821dd..81f08a85bc 100644
--- a/ui/spice-display.c
+++ b/ui/spice-display.c
@@ -1048,7 +1048,8 @@ static void qemu_spice_gl_update(DisplayChangeListener 
*dcl,
 /* note: spice server will close the fd, so hand over a dup */
 spice_qxl_gl_scanout(>qxl, dup(dmabuf->fd),
  dmabuf->width, dmabuf->height,
- dmabuf->stride, dmabuf->fourcc, false);
+ dmabuf->stride, dmabuf->fourcc,
+ dmabuf->y0_top);
 }
 qemu_spice_gl_monitor_config(ssd, 0, 0, dmabuf->width, dmabuf->height);
 ssd->guest_dmabuf_refresh = false;
-- 
2.18.0.129.ge3331758f1

[Qemu-devel] [PATCH v4 02/29] chardev: remove qemu_chr_fe_read_all() counter

2018-07-13 Thread Marc-André Lureau

There is no obvious reason to have a loop counter. This limits from
reading several megabytes large buffers in one go, since socket
read/write usually have a limit.

Signed-off-by: Marc-André Lureau 
Reviewed-by: Paolo Bonzini 
---
 chardev/char-fe.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/chardev/char-fe.c b/chardev/char-fe.c
index b1f228e8b5..f158f158f8 100644
--- a/chardev/char-fe.c
+++ b/chardev/char-fe.c
@@ -56,7 +56,7 @@ int qemu_chr_fe_write_all(CharBackend *be, const uint8_t 
*buf, int len)
 int qemu_chr_fe_read_all(CharBackend *be, uint8_t *buf, int len)
 {
 Chardev *s = be->chr;
-int offset = 0, counter = 10;
+int offset = 0;
 int res;
 
 if (!s || !CHARDEV_GET_CLASS(s)->chr_sync_read) {
@@ -88,10 +88,6 @@ int qemu_chr_fe_read_all(CharBackend *be, uint8_t *buf, int 
len)
 }
 
 offset += res;
-
-if (!counter--) {
-break;
-}
 }
 
 if (qemu_chr_replay(s) && replay_mode == REPLAY_MODE_RECORD) {
-- 
2.18.0.129.ge3331758f1

[Qemu-devel] [PATCH v4 03/29] chardev: unref if underlying chardev has no parent

2018-07-13 Thread Marc-André Lureau

It's possible to write code creating a chardev backend that is not
registered. When it is not user-created, it makes sense to keep it
hidden. Let the associated frontend destroy it also in this case.

Signed-off-by: Marc-André Lureau 
---
 chardev/char-fe.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/chardev/char-fe.c b/chardev/char-fe.c
index f158f158f8..a8931f7afd 100644
--- a/chardev/char-fe.c
+++ b/chardev/char-fe.c
@@ -235,7 +235,12 @@ void qemu_chr_fe_deinit(CharBackend *b, bool del)
 d->backends[b->tag] = NULL;
 }
 if (del) {
-object_unparent(OBJECT(b->chr));
+Object *obj = OBJECT(b->chr);
+if (obj->parent) {
+object_unparent(obj);
+} else {
+object_unref(obj);
+}
 }
 b->chr = NULL;
 }
-- 
2.18.0.129.ge3331758f1

[Qemu-devel] [PATCH v4 01/29] chardev: avoid crash if no associated address

2018-07-13 Thread Marc-André Lureau

A socket chardev may not have associated address (when adding client
fd manually for example). But on disconnect, updating socket filename
expects an address and may lead to this crash:

  Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
  0x55d8c70c in SocketAddress_to_str (prefix=0x56043062 
"disconnected:", addr=0x0, is_listen=false, is_telnet=false) at 
/home/elmarco/src/qq/chardev/char-socket.c:388
  388   switch (addr->type) {
  (gdb) bt
  #0  0x55d8c70c in SocketAddress_to_str (prefix=0x56043062 
"disconnected:", addr=0x0, is_listen=false, is_telnet=false) at 
/home/elmarco/src/qq/chardev/char-socket.c:388
  #1  0x55d8c8aa in update_disconnected_filename (s=0x56b1ed00) at 
/home/elmarco/src/qq/chardev/char-socket.c:419
  #2  0x55d8c959 in tcp_chr_disconnect (chr=0x56b1ed00) at 
/home/elmarco/src/qq/chardev/char-socket.c:438
  #3  0x55d8cba1 in tcp_chr_hup (channel=0x56b75690, cond=G_IO_HUP, 
opaque=0x56b1ed00) at /home/elmarco/src/qq/chardev/char-socket.c:482
  #4  0x55da596e in qio_channel_fd_source_dispatch 
(source=0x56bb68b0, callback=0x55d8cb58 , 
user_data=0x56b1ed00) at /home/elmarco/src/qq/io/channel-watch.c:84

Replace filename with a generic "disconnected:socket" in this case.

Signed-off-by: Marc-André Lureau 
Reviewed-by: Daniel P. Berrangé 
---
 chardev/char-socket.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index efbad6ee7c..fa5bfb3b0e 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -419,8 +419,12 @@ static void update_disconnected_filename(SocketChardev *s)
 Chardev *chr = CHARDEV(s);
 
 g_free(chr->filename);
-chr->filename = SocketAddress_to_str("disconnected:", s->addr,
- s->is_listen, s->is_telnet);
+if (s->addr) {
+chr->filename = SocketAddress_to_str("disconnected:", s->addr,
+ s->is_listen, s->is_telnet);
+} else {
+chr->filename = g_strdup("disconnected:socket");
+}
 }
 
 /* NB may be called even if tcp_chr_connect has not been
-- 
2.18.0.129.ge3331758f1

[Qemu-devel] [PATCH] linux-user: convert remaining fcntl() to safe_fcntl()

2018-07-13 Thread Laurent Vivier

Commit 435da5e709 didn't convert a fcntl() call to safe_fcntl()
for TARGET_NR_fcntl64 case. There is no reason to not use it
in this case.

Fixes: 435da5e709 linux-user: Use safe_syscall wrapper for fcntl
Signed-off-by: Laurent Vivier 
---
 linux-user/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index e4b1b7d7da..a997ba914f 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -11730,7 +11730,7 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 if (ret) {
 break;
 }
-ret = get_errno(fcntl(arg1, cmd, ));
+ret = get_errno(safe_fcntl(arg1, cmd, ));
 if (ret == 0) {
 ret = copyto(arg3, );
 }
-- 
2.17.1

Re: [Qemu-devel] [PULL 4/6] accel/tcg: Don't treat invalid TLB entries as needing recheck

2018-07-13 Thread Peter Maydell

On 13 July 2018 at 13:36, Richard Henderson
 wrote:
> On 07/13/2018 06:05 AM, Peter Maydell wrote:
>>> -if (unlikely(env->tlb_table[mmu_idx][index].addr_code & TLB_RECHECK)) {
>>> +if (unlikely((env->tlb_table[mmu_idx][index].addr_code &
>>> +  (TLB_RECHECK | TLB_INVALID_MASK)) == TLB_RECHECK)) {
>>>  /*
>>>   * This is a TLB_RECHECK access, where the MMU protection
>>>   * covers a smaller range than a target page, and we must
>>
>> Looking again at this code, I think that now we have the code to
>> ensure that there's only ever one entry in the TLB/victim TLB for
>> a given guest address...
>
> Which probably wasn't the case the first time you wrote this, no?
> Fixing that single entry condition was just a few weeks ago.

Yes, exactly.

OTOH with Laurent's m68k test case I'm finding that the assert
is firing, and I'm not entirely sure why yet...

thanks
-- PMM

[Qemu-devel] [PATCH v4] linux-user: ppc64: use the correct values for F_*LK64s

2018-07-13 Thread Shivaprasad G Bhat

Qemu includes the glibc headers for the host defines and target headers are
part of the qemu source themselves. The glibc has the F_GETLK64, F_SETLK64
and F_SETLKW64 defined to 12, 13 and 14 for all archs in
sysdeps/unix/sysv/linux/bits/fcntl-linux.h. The linux kernel generic
definition for F_*LK is 5, 6 & 7 and F_*LK64* is 12,13, and 14 as seen in
include/uapi/asm-generic/fcntl.h. On 64bit machine, by default the kernel
assumes all F_*LK to 64bit calls and doesnt support use of F_*LK64* as
can be seen in include/linux/fcntl.h in linux source.

On x86_64 host, the values for F_*LK64* are set to 5, 6 and 7
explicitly in /usr/include/x86_64-linux-gnu/bits/fcntl.h by the glibc.
Whereas, a PPC64 host doesn't have such a definition in
/usr/include/powerpc64le-linux-gnu/bits/fcntl.h by the glibc. So,
the sources on PPC64 host sees the default value of F_*LK64*
as 12, 13 & 14(fcntl-linux.h).

Since the 64bit kernel doesnt support 12, 13 & 14; the glibc fcntl syscall
implementation(__libc_fcntl*(), __fcntl64_nocancel) does the F_*LK64* value
convertion back to F_*LK* values on PPC64 as seen in
sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h with FCNTL_ADJUST_CMD()
macro. Whereas on x86_64 host the values for F_*LK64* are set to 5, 6 and 7
and no adjustments are needed.

Since qemu doesnt use the glibc fcntl, but makes the safe_syscall* on its
own, the PPC64 qemu is calling the syscall with 12, 13, and 14(without
adjustment) and they all fail. The fcntl calls to F_GETLK/F_SETLK|W all
fail by all pplications run on PPC64 host user emulation.

The fix here could be to see why on PPC64 the glibc is still keeping
F_*LK64* different from F_*LK and why adjusting them to 5, 6 and 7 before
the syscall for PPC only. See if we can make the
/usr/include/powerpc64le-linux-gnu/bits/fcntl.h to have the values
5, 6 & 7 just like x86_64 and remove the adjustment code in glibc. That
way, qemu sources see the kernel supported values in glibc headers.

OR

On PPC64 host, qemu sources see both F_*LK & F_*LK64* as same and set to
12, 13 and 14 because __USE_FILE_OFFSET64 is defined in qemu
sources(also refer sysdeps/unix/sysv/linux/bits/fcntl-linux.h).
Do the value adjustment just like it is done by glibc source by using
F_GETLK value of 5. That way, we make the syscalls with the actual
supported values in Qemu. The patch is taking this approach.

Signed-off-by: Shivaprasad G Bhat 
---
 v3 - https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02923.html
 Changes from v3:
- Fixed the tabs for case statements
- Addressed the comments on v3 wrt to the variable initialisation
  and break from default case.
 v2 - https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02920.html
 Changes from v2:
- Fixed the braces, and indentation for comments.
 v1 - https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02567.html
 Changes from v1:
- Changed the overwrite of F*LK64* with 5, 6 and 7 in using #define
  instead using the adjustment code similar to glibc as suggested.
- Dropped __linux__ check for the adjustment code as suggested.
- Moved the adjustment code inside target_to_host_fcntl_cmd to address
  all possible|future cases.

 linux-user/syscall.c |  126 --
 1 file changed, 80 insertions(+), 46 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 643b8833de..b5274f657a 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6475,63 +6475,97 @@ static int do_fork(CPUArchState *env, unsigned int 
flags, abi_ulong newsp,
 /* warning : doesn't handle linux specific flags... */
 static int target_to_host_fcntl_cmd(int cmd)
 {
+int ret;
+
 switch(cmd) {
-   case TARGET_F_DUPFD:
-   case TARGET_F_GETFD:
-   case TARGET_F_SETFD:
-   case TARGET_F_GETFL:
-   case TARGET_F_SETFL:
-return cmd;
-case TARGET_F_GETLK:
-return F_GETLK64;
-case TARGET_F_SETLK:
-return F_SETLK64;
-case TARGET_F_SETLKW:
-return F_SETLKW64;
-   case TARGET_F_GETOWN:
-   return F_GETOWN;
-   case TARGET_F_SETOWN:
-   return F_SETOWN;
-   case TARGET_F_GETSIG:
-   return F_GETSIG;
-   case TARGET_F_SETSIG:
-   return F_SETSIG;
+case TARGET_F_DUPFD:
+case TARGET_F_GETFD:
+case TARGET_F_SETFD:
+case TARGET_F_GETFL:
+case TARGET_F_SETFL:
+ret = cmd;
+break;
+case TARGET_F_GETLK:
+ret = F_GETLK64;
+break;
+case TARGET_F_SETLK:
+ret = F_SETLK64;
+break;
+case TARGET_F_SETLKW:
+ret = F_SETLKW64;
+break;
+case TARGET_F_GETOWN:
+ret = F_GETOWN;
+break;
+case TARGET_F_SETOWN:
+ret = F_SETOWN;
+break;
+case TARGET_F_GETSIG:
+ret = F_GETSIG;
+break;
+case TARGET_F_SETSIG:
+ret = F_SETSIG;
+break;
 #if TARGET_ABI_BITS == 32
-case

1 2 >

1 - 100 of 179 matches

Mail list logo