date:20180607

[Qemu-devel] [PATCH] hw/arm/smmuv3: Fix translate error handling

2018-06-07 Thread Eric Auger

From: Jia He 

In case the STE's config is "Bypass" we currently don't set the
IOMMUTLBEntry perm flags and the access does not succeed. Also
if the config is 0b0xx (Aborted/Reserved), decode_ste and
smmuv3_decode_config currently returns -EINVAL and we don't enter
the expected code path: we record an event whereas we should not.

This patch fixes those bugs and simplifies the error handling.
decode_ste and smmuv3_decode_config now return 0 if aborted or
bypassed config was found. Only bad config info produces negative
error values. In smmuv3_translate we more clearly differentiate
errors, bypass/smmu disabled, aborted and success cases. Also
trace points are differentiated.

Fixes: 9bde7f0674fe ("hw/arm/smmuv3: Implement translate callback")
Reported-by: jia...@hxt-semitech.com
Signed-off-by: jia...@hxt-semitech.com
Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3-internal.h | 12 +--
 hw/arm/smmuv3.c  | 93 
 hw/arm/trace-events  |  7 ++--
 3 files changed, 77 insertions(+), 35 deletions(-)

diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index a9d714b..bab25d6 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -23,6 +23,14 @@
 
 #include "hw/arm/smmu-common.h"
 
+typedef enum SMMUTranslationStatus {
+SMMU_TRANS_DISABLE,
+SMMU_TRANS_ABORT,
+SMMU_TRANS_BYPASS,
+SMMU_TRANS_ERROR,
+SMMU_TRANS_SUCCESS,
+} SMMUTranslationStatus;
+
 /* MMIO Registers */
 
 REG32(IDR0,0x0)
@@ -315,7 +323,7 @@ enum { /* Command completion notification */
 /* Events */
 
 typedef enum SMMUEventType {
-SMMU_EVT_OK = 0x00,
+SMMU_EVT_NONE   = 0x00,
 SMMU_EVT_F_UUT,
 SMMU_EVT_C_BAD_STREAMID   ,
 SMMU_EVT_F_STE_FETCH  ,
@@ -337,7 +345,7 @@ typedef enum SMMUEventType {
 } SMMUEventType;
 
 static const char *event_stringify[] = {
-[SMMU_EVT_OK]   = "SMMU_EVT_OK",
+[SMMU_EVT_NONE] = "no recorded event",
 [SMMU_EVT_F_UUT]= "SMMU_EVT_F_UUT",
 [SMMU_EVT_C_BAD_STREAMID]   = "SMMU_EVT_C_BAD_STREAMID",
 [SMMU_EVT_F_STE_FETCH]  = "SMMU_EVT_F_STE_FETCH",
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index b3026de..c5e7a7c 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -154,7 +154,7 @@ void smmuv3_record_event(SMMUv3State *s, SMMUEventInfo 
*info)
 EVT_SET_SID(&evt, info->sid);
 
 switch (info->type) {
-case SMMU_EVT_OK:
+case SMMU_EVT_NONE:
 return;
 case SMMU_EVT_F_UUT:
 EVT_SET_SSID(&evt, info->u.f_uut.ssid);
@@ -312,12 +312,11 @@ static int smmu_get_cd(SMMUv3State *s, STE *ste, uint32_t 
ssid,
 return 0;
 }
 
-/* Returns <0 if the caller has no need to continue the translation */
+/* Returns < 0 in case of invalid STE, 0 otherwise */
 static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
   STE *ste, SMMUEventInfo *event)
 {
 uint32_t config;
-int ret = -EINVAL;
 
 if (!STE_VALID(ste)) {
 goto bad_ste;
@@ -326,13 +325,13 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
 config = STE_CONFIG(ste);
 
 if (STE_CFG_ABORT(config)) {
-cfg->aborted = true; /* abort but don't record any event */
-return ret;
+cfg->aborted = true;
+return 0;
 }
 
 if (STE_CFG_BYPASS(config)) {
 cfg->bypassed = true;
-return ret;
+return 0;
 }
 
 if (STE_CFG_S2_ENABLED(config)) {
@@ -509,7 +508,7 @@ bad_cd:
  *   the different configuration decoding steps
  * @event: must be zero'ed by the caller
  *
- * return < 0 if the translation needs to be aborted (@event is filled
+ * return < 0 in case of config decoding error (@event is filled
  * accordingly). Return 0 otherwise.
  */
 static int smmuv3_decode_config(IOMMUMemoryRegion *mr, SMMUTransCfg *cfg,
@@ -518,19 +517,26 @@ static int smmuv3_decode_config(IOMMUMemoryRegion *mr, 
SMMUTransCfg *cfg,
 SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
 uint32_t sid = smmu_get_sid(sdev);
 SMMUv3State *s = sdev->smmu;
-int ret = -EINVAL;
+int ret;
 STE ste;
 CD cd;
 
-if (smmu_find_ste(s, sid, &ste, event)) {
+ret = smmu_find_ste(s, sid, &ste, event);
+if (ret) {
 return ret;
 }
 
-if (decode_ste(s, cfg, &ste, event)) {
+ret = decode_ste(s, cfg, &ste, event);
+if (ret) {
 return ret;
 }
 
-if (smmu_get_cd(s, &ste, 0 /* ssid */, &cd, event)) {
+if (cfg->aborted || cfg->bypassed) {
+return 0;
+}
+
+ret = smmu_get_cd(s, &ste, 0 /* ssid */, &cd, event);
+if (ret) {
 return ret;
 }
 
@@ -543,8 +549,9 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion 
*mr, hwaddr addr,
 SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
 SMMUv3State *s = sdev->smmu;
 uint32_t sid = smmu_get_sid(sdev);
-

Re: [Qemu-devel] [OpenRISC] OpenRISC: SMP support for more than 2 cores

2018-06-07 Thread Davidson Francis


On 07-06-2018 12:56, Richard Henderson wrote:

On 06/07/2018 06:27 AM, Davidson Francis wrote:

Dear all,

Currently Qemu supports only 2 cores when SMP enabled for or1k architecure, so
I would like to know if there is a quick way to increase the number of cores by
changing a few lines of code or to accomplish this, will requires significant
changes in the source code?


Probably not significant changes.
The limit of 2 seems to be the way the interrupts are wired up.
That can probably be extended relatively easily.


r~



Thank you Richard, I will investigate how the interrupts are wired.

Kind regards,
Davidson Francis.

Re: [Qemu-devel] [RFC v2 3/4] monitor: remove "x-oob", turn oob on by default

2018-06-07 Thread Peter Xu

On Thu, Jun 07, 2018 at 01:40:22PM +0200, Markus Armbruster wrote:
> Peter Xu  writes:
> 
> > There was a regression reported by Eric Auger before with OOB:
> >
> >   http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06231.html
> >
> > The fix is 951702f39c ("monitor: bind dispatch bh to iohandler context",
> > 2018-04-10), which is in master already.
> >
> > For the bug, we turned Out-Of-Band feature of monitors off for 2.12
> > release.  Now we turn that on again after the 2.12 release.
> >
> > This patch partly reverts be933ffc23 (monitor: new parameter "x-oob"),
> > meanwhile turn it on again by default for non-MUX QMPs.
> 
> Please add a brief explanation why OOB isn't turned on for MUX.  Pointer
> to existing explanation would be fine.

Sure.

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v2 3/4] hw/arm/sysbus-fdt: Replace error_setg(&error_fatal) by error_report() + exit()

2018-06-07 Thread Markus Armbruster

Philippe Mathieu-Daudé  writes:

> Use error_report() + exit() instead of error_setg(&error_fatal),
> as suggested by the "qapi/error.h" documentation:
>
>Please don't error_setg(&error_fatal, ...), use error_report() and
>exit(), because that's more obvious.
>
> This fixes CID 1352173:
> "Passing null pointer dt_name to qemu_fdt_node_path, which dereferences 
> it."
>
> And this also fixes:
>
> hw/arm/sysbus-fdt.c:322:9: warning: Array access (from variable 
> 'node_path') results in a null pointer dereference
> if (node_path[1]) {
> ^~~~
>
> Fixes: Coverity CID 1352173 (Dereference after null check)

You lost

  Suggested-by: Eric Blake 

Intentional?

> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/arm/sysbus-fdt.c | 42 +++---
>  1 file changed, 23 insertions(+), 19 deletions(-)
>
> diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
> index e4c492ea44..8e2784fa11 100644
> --- a/hw/arm/sysbus-fdt.c
> +++ b/hw/arm/sysbus-fdt.c
> @@ -91,7 +91,7 @@ static void copy_properties_from_host(HostProperty *props, 
> int nb_props,
>  r = qemu_fdt_getprop(host_fdt, node_path,
>   props[i].name,
>   &prop_len,
> - props[i].optional ? &err : &error_fatal);
> + &err);
>  if (r) {
>  qemu_fdt_setprop(guest_fdt, nodename,
>   props[i].name, r, prop_len);
> @@ -102,6 +102,7 @@ static void copy_properties_from_host(HostProperty 
> *props, int nb_props,
>  } else {
>  error_free(err);
>  }
> +assert(props[i].optional); /* mandatory property not found */
>  }
>  }
>  }

This is not the conversion the commit message promised: it replaces
exit(1) by abort().  Why?

[Remainder of the patch is unchanged]

Re: [Qemu-devel] [PATCH v10 5/7] monitor: remove event_clock_type

2018-06-07 Thread Peter Xu

On Fri, Jun 08, 2018 at 07:38:11AM +0200, Markus Armbruster wrote:

[...]

> > +/*
> > + * This should never be called before configure_accelerator() since
> > + * only until then could we know whether qtest was enabled or not.
> 
> Uh, we know it after then, not until then.  What about
> 
>/*
> * Return the clock to use for recording an event's time.
> * Beware: result is invalid before configure_accelerator().

Oh, another Chinglish from me. :(

> 
> > + */
> > +static inline QEMUClockType monitor_get_clock(void)
> 
> Suggest to rename to monitor_get_event_clock(), or just
> get_event_clock().
> 
> Happy to apply touch-ups on commit.

That'll be appreciated.  Both changes work for me.

[...]

> With at least a suitable fix for the comment:
> Reviewed-by: Markus Armbruster 

Thanks!

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v2 2/4] hw/ppc/spapr_drc: Replace error_setg(&error_abort) by abort()

2018-06-07 Thread Markus Armbruster

David Gibson  writes:

> On Fri, Jun 08, 2018 at 12:54:36AM -0300, Philippe Mathieu-Daudé wrote:
>> Hi David,
>> 
>> On 06/08/2018 12:03 AM, David Gibson wrote:
>> > On Thu, Jun 07, 2018 at 11:46:43AM -0300, Philippe Mathieu-Daudé wrote:
>> >> Use abort() instead of error_setg(&error_abort),
>> >> as suggested by the "qapi/error.h" documentation:
>> >>
>> >> Please don't error_setg(&error_fatal, ...), use error_report() and
>> >> exit(), because that's more obvious.
>> >> Likewise, don't error_setg(&error_abort, ...), use assert().
>> >>
>> >> Use abort() instead of the suggested assert() because the assertion is
>> >> already verified by the switch case.
>> > 
>> > I think g_assert_not_reached() would be the right thing here.
>> 
>> I try to follow Eric advice (who recalled Markus).
>> As I understand:
>> http://lists.gnu.org/archive/html/qemu-devel/2017-08/msg03605.html
>> 
>> "glib-Testing [...] should not be used outside of tests/."
>
> Oh, ok, go with that then.

Most g_assert_FOO() are indeed tests-only, but g_assert_not_reached() is
actually acceptable elsewhere, see commit 6e9389563e5, and its review
thread
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05499.html
Message-Id: <20170427165526.19836-1-dgilb...@redhat.com>

That said, I fail to see the value of this kind of lipstick, too.

> Acked-by: David Gibson 
>
>> I can respin if you prefer.

I can replace by g_assert_not_reached() on commit if you prefer.

[Qemu-devel] [PULL 4/5] slirp: correct size computation while concatenating mbuf

2018-06-07 Thread Samuel Thibault

From: Prasad J Pandit 

While reassembling incoming fragmented datagrams, 'm_cat' routine
extends the 'mbuf' buffer, if it has insufficient room. It computes
a wrong buffer size, which leads to overwriting adjacent heap buffer
area. Correct this size computation in m_cat.

Reported-by: ZDI Disclosures 
Signed-off-by: Prasad J Pandit 
Signed-off-by: Samuel Thibault 
---
 slirp/mbuf.c | 11 +--
 slirp/mbuf.h |  8 +++-
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/slirp/mbuf.c b/slirp/mbuf.c
index 5ff24559fd..18cbf759a7 100644
--- a/slirp/mbuf.c
+++ b/slirp/mbuf.c
@@ -138,7 +138,7 @@ m_cat(struct mbuf *m, struct mbuf *n)
 * If there's no room, realloc
 */
if (M_FREEROOM(m) < n->m_len)
-   m_inc(m,m->m_size+MINCSIZE);
+   m_inc(m, m->m_len + n->m_len);
 
memcpy(m->m_data+m->m_len, n->m_data, n->m_len);
m->m_len += n->m_len;
@@ -147,7 +147,7 @@ m_cat(struct mbuf *m, struct mbuf *n)
 }
 
 
-/* make m size bytes large */
+/* make m 'size' bytes large from m_data */
 void
 m_inc(struct mbuf *m, int size)
 {
@@ -158,12 +158,12 @@ m_inc(struct mbuf *m, int size)
 
 if (m->m_flags & M_EXT) {
  datasize = m->m_data - m->m_ext;
-  m->m_ext = g_realloc(m->m_ext, size);
+ m->m_ext = g_realloc(m->m_ext, size + datasize);
  m->m_data = m->m_ext + datasize;
 } else {
  char *dat;
  datasize = m->m_data - m->m_dat;
-  dat = g_malloc(size);
+ dat = g_malloc(size + datasize);
  memcpy(dat, m->m_dat, m->m_size);
 
  m->m_ext = dat;
@@ -171,8 +171,7 @@ m_inc(struct mbuf *m, int size)
  m->m_flags |= M_EXT;
 }
 
-m->m_size = size;
-
+m->m_size = size + datasize;
 }
 
 
diff --git a/slirp/mbuf.h b/slirp/mbuf.h
index 893601ff9d..33b84485d6 100644
--- a/slirp/mbuf.h
+++ b/slirp/mbuf.h
@@ -33,8 +33,6 @@
 #ifndef MBUF_H
 #define MBUF_H
 
-#define MINCSIZE 4096  /* Amount to increase mbuf if too small */
-
 /*
  * Macros for type conversion
  * mtod(m,t) - convert mbuf pointer to data pointer of correct type
@@ -72,11 +70,11 @@ struct mbuf {
struct  mbuf *m_prevpkt;/* Flags aren't used in the output 
queue */
int m_flags;/* Misc flags */
 
-   int m_size; /* Size of data */
+   int m_size; /* Size of mbuf, from m_dat or m_ext */
struct  socket *m_so;
 
-   caddr_t m_data; /* Location of data */
-   int m_len;  /* Amount of data in this mbuf */
+   caddr_t m_data; /* Current location of data */
+   int m_len;  /* Amount of data in this mbuf, from 
m_data */
 
Slirp *slirp;
boolresolution_requested;
-- 
2.17.1

[Qemu-devel] [PULL 1/5] slirp: Fix spurious error report when sending directly

2018-06-07 Thread Samuel Thibault

Move check to where it actually is useful, and reduce scope of 'len'
variable along the way.

Signed-off-by: Samuel Thibault 
Reviewed-by: Philippe Mathieu-Daudé 
---
 slirp/socket.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/slirp/socket.c b/slirp/socket.c
index e2a71c9b04..08fe98907d 100644
--- a/slirp/socket.c
+++ b/slirp/socket.c
@@ -340,7 +340,7 @@ sosendoob(struct socket *so)
struct sbuf *sb = &so->so_rcv;
char buff[2048]; /* XXX Shouldn't be sending more oob data than this */
 
-   int n, len;
+   int n;
 
DEBUG_CALL("sosendoob");
DEBUG_ARG("so = %p", so);
@@ -359,7 +359,7 @@ sosendoob(struct socket *so)
 * send it all
 */
uint32_t urgc = so->so_urgc;
-   len = (sb->sb_data + sb->sb_datalen) - sb->sb_rptr;
+   int len = (sb->sb_data + sb->sb_datalen) - sb->sb_rptr;
if (len > urgc) {
len = urgc;
}
@@ -374,13 +374,13 @@ sosendoob(struct socket *so)
len += n;
}
n = slirp_send(so, buff, len, (MSG_OOB)); /* |MSG_DONTWAIT)); */
-   }
-
 #ifdef DEBUG
-   if (n != len) {
-   DEBUG_ERROR((dfd, "Didn't send all data urgently X\n"));
-   }
+   if (n != len) {
+   DEBUG_ERROR((dfd, "Didn't send all data urgently 
X\n"));
+   }
 #endif
+   }
+
if (n < 0) {
return n;
}
-- 
2.17.1

[Qemu-devel] [PULL 0/5] slirp updates

2018-06-07 Thread Samuel Thibault

The following changes since commit 9be4af13305f24d2dabf94bb53e6b65c76d08bb2:

  Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into 
staging (2018-06-01 14:58:53 +0100)

are available in the Git repository at:

  https://people.debian.org/~sthibault/qemu.git tags/samuel-thibault

for you to fetch changes up to c22098c74a09164797fae6511c5eaf68f32c4dd8:

  slirp: reformat m_inc routine (2018-06-08 09:08:30 +0300)


slirp updates

Prasad J Pandit (2):
  slirp: Fix buffer overflow on packet reassembling

Samuel Thibault (3):
  slirp: Add Samuel Thibault's staging tree for slirp
  slirp: fix domainname version availability


Prasad J Pandit (2):
  slirp: correct size computation while concatenating mbuf
  slirp: reformat m_inc routine

Samuel Thibault (3):
  slirp: Fix spurious error report when sending directly
  slirp: Add Samuel Thibault's staging tree for slirp
  slirp: fix domainname version availability

 MAINTAINERS|  1 +
 qapi/net.json  |  2 +-
 slirp/mbuf.c   | 39 ++-
 slirp/mbuf.h   |  8 +++-
 slirp/socket.c | 14 +++---
 5 files changed, 30 insertions(+), 34 deletions(-)

[Qemu-devel] [PULL 3/5] slirp: fix domainname version availability

2018-06-07 Thread Samuel Thibault

The change missed the 2.12 deadline.

Signed-off-by: Samuel Thibault 
Reviewed-by: Eric Blake 
---
 qapi/net.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qapi/net.json b/qapi/net.json
index 32681a1af7..6b7d93cb59 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -161,7 +161,7 @@
 # to the guest
 #
 # @domainname: guest-visible domain name of the virtual nameserver
-#  (since 2.12)
+#  (since 3.0)
 #
 # @ipv6-prefix: IPv6 network prefix (default is fec0::) (since
 #   2.6). The network prefix is given in the usual
-- 
2.17.1

[Qemu-devel] [PULL 5/5] slirp: reformat m_inc routine

2018-06-07 Thread Samuel Thibault

From: Prasad J Pandit 

Coding style changes to the m_inc routine and minor refactoring.

Reported-by: ZDI Disclosures 
Signed-off-by: Prasad J Pandit 
Signed-off-by: Samuel Thibault 
---
 slirp/mbuf.c | 34 --
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/slirp/mbuf.c b/slirp/mbuf.c
index 18cbf759a7..0c189e1a7b 100644
--- a/slirp/mbuf.c
+++ b/slirp/mbuf.c
@@ -151,27 +151,25 @@ m_cat(struct mbuf *m, struct mbuf *n)
 void
 m_inc(struct mbuf *m, int size)
 {
-   int datasize;
+int datasize;
 
-   /* some compiles throw up on gotos.  This one we can fake. */
-if(m->m_size>size) return;
+/* some compilers throw up on gotos.  This one we can fake. */
+if (m->m_size > size) {
+return;
+}
 
-if (m->m_flags & M_EXT) {
- datasize = m->m_data - m->m_ext;
- m->m_ext = g_realloc(m->m_ext, size + datasize);
- m->m_data = m->m_ext + datasize;
-} else {
- char *dat;
- datasize = m->m_data - m->m_dat;
- dat = g_malloc(size + datasize);
- memcpy(dat, m->m_dat, m->m_size);
-
- m->m_ext = dat;
- m->m_data = m->m_ext + datasize;
- m->m_flags |= M_EXT;
-}
+if (m->m_flags & M_EXT) {
+datasize = m->m_data - m->m_ext;
+m->m_ext = g_realloc(m->m_ext, size + datasize);
+} else {
+datasize = m->m_data - m->m_dat;
+m->m_ext = g_malloc(size + datasize);
+memcpy(m->m_ext, m->m_dat, m->m_size);
+m->m_flags |= M_EXT;
+}
 
-m->m_size = size + datasize;
+m->m_data = m->m_ext + datasize;
+m->m_size = size + datasize;
 }
 
 
-- 
2.17.1

[Qemu-devel] [PULL 2/5] slirp: Add Samuel Thibault's staging tree for slirp

2018-06-07 Thread Samuel Thibault

Signed-off-by: Samuel Thibault 
Acked-by: Thomas Huth 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 41cd3736a9..4c73c16fee 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1675,6 +1675,7 @@ S: Maintained
 F: slirp/
 F: net/slirp.c
 F: include/net/slirp.h
+T: git https://people.debian.org/~sthibault/qemu.git slirp
 T: git git://git.kiszka.org/qemu.git queues/slirp
 
 Stubs
-- 
2.17.1

Re: [Qemu-devel] [PATCH v2 1/4] hw/block/fdc: Replace error_setg(&error_abort) by assert()

2018-06-07 Thread Markus Armbruster

John Snow  writes:

> On 06/07/2018 10:46 AM, Philippe Mathieu-Daudé wrote:
>> Use assert() instead of error_setg(&error_abort),
>> as suggested by the "qapi/error.h" documentation:
>> 
>> Please don't error_setg(&error_fatal, ...), use error_report() and
>> exit(), because that's more obvious.
>> Likewise, don't error_setg(&error_abort, ...), use assert().
>> 
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  hw/block/fdc.c | 9 +
>>  1 file changed, 1 insertion(+), 8 deletions(-)
>> 
>> diff --git a/hw/block/fdc.c b/hw/block/fdc.c
>> index cd29e27d8f..7c1c57f57f 100644
>> --- a/hw/block/fdc.c
>> +++ b/hw/block/fdc.c
>> @@ -396,16 +396,9 @@ static int pick_geometry(FDrive *drv)
>> nb_sectors,
>> FloppyDriveType_str(parse->drive));
>>  }
>> +assert(type_match != -1 && "misconfigured fd_format");
>>  match = type_match;
>>  }
>> -
>> -/* No match of any kind found -- fd_format is misconfigured, abort. */
>> -if (match == -1) {
>> -error_setg(&error_abort, "No candidate geometries present in table "
>> -   " for floppy drive type '%s'",
>> -   FloppyDriveType_str(drv->drive));
>> -}
>> -
>>  parse = &(fd_formats[match]);
>>  
>>   out:
>> 
>
> Truthfully sad to see the lipstick go, because this is my pig.
> Oh well, it doesn't matter. Not really, anyway.
>
> ACK

There's still a bit of lipstick in assert()'s argument.

v1 has the error_report() lipstick.  Would you like me to merge that one
instead?  Your pig, your choice :)

[Qemu-devel] [PATCH 4/6] block-backend: Add blk_aio_copy_range

2018-06-07 Thread Fam Zheng

Signed-off-by: Fam Zheng 
---
 block/block-backend.c  | 36 ++
 include/sysemu/block-backend.h |  4 
 2 files changed, 40 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index e20a204bee..36d928e13d 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1213,6 +1213,14 @@ typedef struct BlkRwCo {
 unsigned long int req;
 void *buf;
 } ioctl;
+
+struct {
+BlockBackend *dst_blk;
+int64_t src_off;
+int64_t dst_off;
+int bytes;
+BdrvRequestFlags flags;
+} copy_range;
 };
 
 } BlkRwCo;
@@ -1505,6 +1513,34 @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t 
offset,
 return blk_aio_prwv(blk, &rwco, blk_aio_write_entry, cb, opaque);
 }
 
+static void blk_aio_copy_range_entry(void *opaque)
+{
+BlkAioEmAIOCB *acb = opaque;
+BlkRwCo *rwco = &acb->rwco;
+
+rwco->ret = blk_co_copy_range(rwco->blk, rwco->copy_range.src_off,
+  rwco->copy_range.dst_blk,
+  rwco->copy_range.dst_off,
+  rwco->copy_range.bytes,
+  rwco->copy_range.flags);
+blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_copy_range(BlockBackend *src, int64_t src_offset,
+   BlockBackend *dst, int64_t dst_offset,
+   uint64_t bytes, BdrvRequestFlags flags,
+   BlockCompletionFunc *cb, void *opaque)
+{
+BlkRwCo rwco = (BlkRwCo) {
+.copy_range.src_off = src_offset,
+.copy_range.dst_blk = dst,
+.copy_range.dst_off = dst_offset,
+.copy_range.bytes = bytes,
+.copy_range.flags = flags,
+};
+return blk_aio_prwv(src, &rwco, blk_aio_copy_range_entry, cb, opaque);
+}
+
 static void blk_aio_flush_entry(void *opaque)
 {
 BlkAioEmAIOCB *acb = opaque;
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8d03d493c2..ea121eac3f 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -147,6 +147,10 @@ BlockAIOCB *blk_aio_flush(BlockBackend *blk,
   BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int bytes,
  BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *blk_aio_copy_range(BlockBackend *src, int64_t src_offset,
+   BlockBackend *dst, int64_t dst_offset,
+   uint64_t bytes, BdrvRequestFlags flags,
+   BlockCompletionFunc *cb, void *opaque);
 void blk_aio_cancel(BlockAIOCB *acb);
 void blk_aio_cancel_async(BlockAIOCB *acb);
 int blk_co_ioctl(BlockBackend *blk, unsigned long int req, void *buf);
-- 
2.17.0

[Qemu-devel] [PATCH 6/6] mirror: Use copy offloading

2018-06-07 Thread Fam Zheng

This makes the mirror job to try offloaded copy. If it fails, error
action will not be taken yet, instead the failed cluster and all the
subsequent ones will fall back to bounce buffer.

Signed-off-by: Fam Zheng 
---
 block/mirror.c | 71 +-
 block/trace-events |  1 +
 2 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index 435268bbbf..a90b64550c 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -71,6 +71,8 @@ typedef struct MirrorBlockJob {
 int target_cluster_size;
 int max_iov;
 bool initial_zeroing_ongoing;
+
+bool use_copy_range;
 } MirrorBlockJob;
 
 typedef struct MirrorOp {
@@ -293,6 +295,69 @@ static uint64_t mirror_do_read(MirrorBlockJob *s, int64_t 
offset,
 return ret;
 }
 
+static void mirror_copy_range_complete(void *opaque, int ret)
+{
+MirrorOp *op = opaque;
+MirrorBlockJob *s = op->s;
+
+aio_context_acquire(blk_get_aio_context(s->common.blk));
+trace_mirror_copy_range_complete(s, ret, op->offset, op->bytes);
+if (!ret) {
+mirror_iteration_done(op, ret);
+} else {
+uint64_t bytes;
+
+s->use_copy_range = false;
+s->in_flight--;
+s->bytes_in_flight -= op->bytes;
+bytes = mirror_do_read(s, op->offset, op->bytes);
+/* No alignment adjusting in mirror_do_read since we've already done
+ * that in mirror_do_copy(). */
+assert(bytes == op->bytes);
+g_free(op);
+}
+aio_context_release(blk_get_aio_context(s->common.blk));
+}
+
+static uint64_t mirror_do_copy(MirrorBlockJob *s, int64_t offset,
+   uint64_t bytes)
+{
+uint64_t ret;
+MirrorOp *op;
+
+if (!s->use_copy_range || offset < BLOCK_PROBE_BUF_SIZE) {
+return mirror_do_read(s, offset, bytes);
+}
+
+assert(bytes);
+assert(bytes < BDRV_REQUEST_MAX_BYTES);
+ret = bytes;
+
+if (s->cow_bitmap) {
+ret += mirror_cow_align(s, &offset, &bytes);
+}
+/* The offset is granularity-aligned because:
+ * 1) Caller passes in aligned values;
+ * 2) mirror_cow_align is used only when target cluster is larger. */
+assert(QEMU_IS_ALIGNED(offset, s->granularity));
+/* The range is sector-aligned, since bdrv_getlength() rounds up. */
+assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
+
+op = g_new0(MirrorOp, 1);
+op->s = s;
+op->offset = offset;
+op->bytes = bytes;
+
+/* Copy the dirty cluster.  */
+s->in_flight++;
+s->bytes_in_flight += bytes;
+trace_mirror_one_iteration(s, offset, bytes);
+
+blk_aio_copy_range(s->common.blk, offset, s->target, offset,
+   bytes, 0, mirror_copy_range_complete, op);
+return ret;
+}
+
 static void mirror_do_zero_or_discard(MirrorBlockJob *s,
   int64_t offset,
   uint64_t bytes,
@@ -429,7 +494,7 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 io_bytes = mirror_clip_bytes(s, offset, io_bytes);
 switch (mirror_method) {
 case MIRROR_METHOD_COPY:
-io_bytes = io_bytes_acct = mirror_do_read(s, offset, io_bytes);
+io_bytes = io_bytes_acct = mirror_do_copy(s, offset, io_bytes);
 break;
 case MIRROR_METHOD_ZERO:
 case MIRROR_METHOD_DISCARD:
@@ -1090,6 +1155,8 @@ static BlockDriver bdrv_mirror_top = {
 .bdrv_co_pdiscard   = bdrv_mirror_top_pdiscard,
 .bdrv_co_flush  = bdrv_mirror_top_flush,
 .bdrv_co_block_status   = bdrv_co_block_status_from_backing,
+.bdrv_co_copy_range_from= bdrv_co_copy_range_from_backing,
+.bdrv_co_copy_range_to  = bdrv_co_copy_range_to_backing,
 .bdrv_refresh_filename  = bdrv_mirror_top_refresh_filename,
 .bdrv_close = bdrv_mirror_top_close,
 .bdrv_child_perm= bdrv_mirror_top_child_perm,
@@ -1177,6 +1244,8 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 s->source = bs;
 s->mirror_top_bs = mirror_top_bs;
 
+s->use_copy_range = true;
+
 /* No resize for the target either; while the mirror is still running, a
  * consistent read isn't necessarily possible. We could possibly allow
  * writes and graph modifications, though it would likely defeat the
diff --git a/block/trace-events b/block/trace-events
index 2d59b53fd3..2958003e33 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -34,6 +34,7 @@ mirror_one_iteration(void *s, int64_t offset, uint64_t bytes) 
"s %p offset %" PR
 mirror_iteration_done(void *s, int64_t offset, uint64_t bytes, int ret) "s %p 
offset %" PRId64 " bytes %" PRIu64 " ret %d"
 mirror_yield(void *s, int64_t cnt, int buf_free_count, int in_flight) "s %p 
dirty count %"PRId64" free buffers %d in_flight %d"
 mirror_yield_in_flight(void *s, int64_t offset, int in_flight) "s %p offset %" 
PRId64 " in_flight

[Qemu-devel] [PATCH 3/6] block-backend: Refactor AIO emulation

2018-06-07 Thread Fam Zheng

BlkRwCo fields are multi-purposed. @offset is sometimes used to pass the
'req' number for blk_ioctl and blk_aio_ioctl; @iobuf is sometimes the
pointer for QEMUIOVector @qiov sometimes the ioctl @buf. This is not as
clean as it can be. As the coming copy range emulation wants to add
more differentiation in parameters, refactor a bit.

Move the per-request fields to a union and create one struct for each
type. While at it also move the bytes parameter from BlkAioEmAIOCB to
BlkRwCo.

Signed-off-by: Fam Zheng 
---
 block/block-backend.c | 211 +++---
 1 file changed, 134 insertions(+), 77 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index d55c328736..e20a204bee 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1192,62 +1192,79 @@ int coroutine_fn blk_co_pwritev(BlockBackend *blk, 
int64_t offset,
 
 typedef struct BlkRwCo {
 BlockBackend *blk;
-int64_t offset;
-void *iobuf;
 int ret;
-BdrvRequestFlags flags;
+
+union {
+struct {
+int64_t offset;
+int bytes;
+QEMUIOVector *qiov;
+BdrvRequestFlags flags;
+} prwv;
+
+struct {
+int64_t offset;
+int bytes;
+void *buf;
+BdrvRequestFlags flags;
+} prw;
+
+struct {
+unsigned long int req;
+void *buf;
+} ioctl;
+};
+
 } BlkRwCo;
 
 static void blk_read_entry(void *opaque)
 {
 BlkRwCo *rwco = opaque;
-QEMUIOVector *qiov = rwco->iobuf;
+QEMUIOVector qiov;
+struct iovec iov;
 
-rwco->ret = blk_co_preadv(rwco->blk, rwco->offset, qiov->size,
-  qiov, rwco->flags);
+iov = (struct iovec) {
+.iov_base = rwco->prw.buf,
+.iov_len = rwco->prw.bytes,
+};
+qemu_iovec_init_external(&qiov, &iov, 1);
+
+rwco->ret = blk_co_preadv(rwco->blk, rwco->prw.offset, rwco->prw.bytes,
+  &qiov, rwco->prw.flags);
 }
 
 static void blk_write_entry(void *opaque)
 {
 BlkRwCo *rwco = opaque;
-QEMUIOVector *qiov = rwco->iobuf;
-
-rwco->ret = blk_co_pwritev(rwco->blk, rwco->offset, qiov->size,
-   qiov, rwco->flags);
-}
-
-static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
-   int64_t bytes, CoroutineEntry co_entry,
-   BdrvRequestFlags flags)
-{
 QEMUIOVector qiov;
 struct iovec iov;
-BlkRwCo rwco;
 
 iov = (struct iovec) {
-.iov_base = buf,
-.iov_len = bytes,
+.iov_base = rwco->prw.buf,
+.iov_len = rwco->prw.bytes,
 };
 qemu_iovec_init_external(&qiov, &iov, 1);
 
-rwco = (BlkRwCo) {
-.blk= blk,
-.offset = offset,
-.iobuf  = &qiov,
-.flags  = flags,
-.ret= NOT_DONE,
-};
+rwco->ret = blk_co_pwritev(rwco->blk, rwco->prw.offset, rwco->prw.bytes,
+   &qiov, rwco->prw.flags);
+}
 
+static int blk_prw(BlockBackend *blk, BlkRwCo *rwco,
+   CoroutineEntry co_entry)
+{
+
+rwco->blk = blk;
+rwco->ret = NOT_DONE;
 if (qemu_in_coroutine()) {
 /* Fast-path if already in coroutine context */
-co_entry(&rwco);
+co_entry(rwco);
 } else {
-Coroutine *co = qemu_coroutine_create(co_entry, &rwco);
+Coroutine *co = qemu_coroutine_create(co_entry, rwco);
 bdrv_coroutine_enter(blk_bs(blk), co);
-BDRV_POLL_WHILE(blk_bs(blk), rwco.ret == NOT_DONE);
+BDRV_POLL_WHILE(blk_bs(blk), rwco->ret == NOT_DONE);
 }
 
-return rwco.ret;
+return rwco->ret;
 }
 
 int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
@@ -1269,8 +1286,12 @@ int blk_pread_unthrottled(BlockBackend *blk, int64_t 
offset, uint8_t *buf,
 int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
   int bytes, BdrvRequestFlags flags)
 {
-return blk_prw(blk, offset, NULL, bytes, blk_write_entry,
-   flags | BDRV_REQ_ZERO_WRITE);
+BlkRwCo rwco = (BlkRwCo) {
+.prwv.offset = offset,
+.prwv.bytes = bytes,
+.prwv.flags = flags | BDRV_REQ_ZERO_WRITE,
+};
+return blk_prw(blk, &rwco, blk_write_entry);
 }
 
 int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
@@ -1316,7 +1337,6 @@ BlockAIOCB *blk_abort_aio_request(BlockBackend *blk,
 typedef struct BlkAioEmAIOCB {
 BlockAIOCB common;
 BlkRwCo rwco;
-int bytes;
 bool has_returned;
 } BlkAioEmAIOCB;
 
@@ -1340,9 +1360,8 @@ static void blk_aio_complete_bh(void *opaque)
 blk_aio_complete(acb);
 }
 
-static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
-void *iobuf, CoroutineEntry co_entry,
-BdrvRequestFlags flags,
+static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, const BlkRwCo *rwco,
+

[Qemu-devel] [PATCH 2/6] block: Check if block drivers can do copy offloading

2018-06-07 Thread Fam Zheng

This avoids the wasteful cluster allocation in qcow2 before actually
trying an unsupported copy range call, for example.

Signed-off-by: Fam Zheng 
---
 block.c   | 12 
 block/file-posix.c|  9 +
 block/io.c|  3 +++
 block/iscsi.c |  8 
 block/qcow2.c | 11 +++
 block/raw-format.c|  6 ++
 include/block/block_int.h |  4 
 7 files changed, 53 insertions(+)

diff --git a/block.c b/block.c
index 501b64c819..28aa8d8a65 100644
--- a/block.c
+++ b/block.c
@@ -5320,3 +5320,15 @@ bool bdrv_can_store_new_dirty_bitmap(BlockDriverState 
*bs, const char *name,
 
 return drv->bdrv_can_store_new_dirty_bitmap(bs, name, granularity, errp);
 }
+
+bool bdrv_can_copy_range(BdrvChild *src, BdrvChild *dst)
+{
+BlockDriverState *bs;
+
+if (!src || !src->bs) {
+return false;
+}
+bs = src->bs;
+return bs && bs->drv && bs->drv->bdrv_can_copy_range &&
+   bs->drv->bdrv_can_copy_range(bs, dst);
+}
diff --git a/block/file-posix.c b/block/file-posix.c
index c6dae38f94..41c491c65b 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2539,6 +2539,13 @@ static int coroutine_fn 
raw_co_copy_range_to(BlockDriverState *bs,
NULL, bytes, QEMU_AIO_COPY_RANGE);
 }
 
+static bool raw_can_copy_range(BlockDriverState *bs,
+   BdrvChild *dst)
+{
+return dst->bs && dst->bs->drv &&
+   dst->bs->drv->bdrv_can_copy_range == raw_can_copy_range;
+}
+
 BlockDriver bdrv_file = {
 .format_name = "file",
 .protocol_name = "file",
@@ -2564,6 +2571,7 @@ BlockDriver bdrv_file = {
 .bdrv_aio_pdiscard = raw_aio_pdiscard,
 .bdrv_co_copy_range_from = raw_co_copy_range_from,
 .bdrv_co_copy_range_to  = raw_co_copy_range_to,
+.bdrv_can_copy_range = raw_can_copy_range,
 .bdrv_refresh_limits = raw_refresh_limits,
 .bdrv_io_plug = raw_aio_plug,
 .bdrv_io_unplug = raw_aio_unplug,
@@ -3044,6 +3052,7 @@ static BlockDriver bdrv_host_device = {
 .bdrv_aio_pdiscard   = hdev_aio_pdiscard,
 .bdrv_co_copy_range_from = raw_co_copy_range_from,
 .bdrv_co_copy_range_to  = raw_co_copy_range_to,
+.bdrv_can_copy_range = raw_can_copy_range,
 .bdrv_refresh_limits = raw_refresh_limits,
 .bdrv_io_plug = raw_aio_plug,
 .bdrv_io_unplug = raw_aio_unplug,
diff --git a/block/io.c b/block/io.c
index b7beaeeb9f..d8039793c2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2913,6 +2913,9 @@ int coroutine_fn bdrv_co_copy_range(BdrvChild *src, 
uint64_t src_offset,
 BlockDriverState *dst_bs = dst->bs;
 int ret;
 
+if (!bdrv_can_copy_range(src, dst)) {
+return -ENOTSUP;
+}
 bdrv_inc_in_flight(src_bs);
 bdrv_inc_in_flight(dst_bs);
 tracked_request_begin(&src_req, src_bs, src_offset,
diff --git a/block/iscsi.c b/block/iscsi.c
index c2fbd8a8aa..6c465ebd46 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -2420,6 +2420,12 @@ out_unlock:
 return r;
 }
 
+static bool iscsi_can_copy_range(BlockDriverState *bs, BdrvChild *dst)
+{
+return dst->bs && dst->bs->drv &&
+   dst->bs->drv->bdrv_can_copy_range == iscsi_can_copy_range;
+}
+
 static QemuOptsList iscsi_create_opts = {
 .name = "iscsi-create-opts",
 .head = QTAILQ_HEAD_INITIALIZER(iscsi_create_opts.head),
@@ -2456,6 +2462,7 @@ static BlockDriver bdrv_iscsi = {
 .bdrv_co_pdiscard  = iscsi_co_pdiscard,
 .bdrv_co_copy_range_from = iscsi_co_copy_range_from,
 .bdrv_co_copy_range_to  = iscsi_co_copy_range_to,
+.bdrv_can_copy_range   = iscsi_can_copy_range,
 .bdrv_co_pwrite_zeroes = iscsi_co_pwrite_zeroes,
 .bdrv_co_readv = iscsi_co_readv,
 .bdrv_co_writev= iscsi_co_writev,
@@ -2493,6 +2500,7 @@ static BlockDriver bdrv_iser = {
 .bdrv_co_pdiscard  = iscsi_co_pdiscard,
 .bdrv_co_copy_range_from = iscsi_co_copy_range_from,
 .bdrv_co_copy_range_to  = iscsi_co_copy_range_to,
+.bdrv_can_copy_range   = iscsi_can_copy_range,
 .bdrv_co_pwrite_zeroes = iscsi_co_pwrite_zeroes,
 .bdrv_co_readv = iscsi_co_readv,
 .bdrv_co_writev= iscsi_co_writev,
diff --git a/block/qcow2.c b/block/qcow2.c
index 549fee9b69..1326410d1c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3440,6 +3440,16 @@ fail:
 return ret;
 }
 
+static bool qcow2_can_copy_range(BlockDriverState *bs, BdrvChild *dst)
+{
+bool r = bdrv_can_copy_range(bs->file, dst);
+
+if (bs->backing) {
+r = r && bdrv_can_copy_range(bs->backing, dst);
+}
+return r;
+}
+
 static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
   PreallocMode prealloc, Error **errp)
 {
@@ -4690,6 +4700,7 @@ BlockDriver bdrv_qcow2 = {
 .bdrv_co_pdiscard   = qcow2_co_pdiscard,
 .bdrv_co_copy_range_from = qcow2_co_copy_range_from,
 .bdrv_co_copy_range_to  = qcow2_co_copy_range_to,
+.bdrv_can_copy_range= qcow2_can_copy_range,

[Qemu-devel] [PATCH 5/6] block: Add backing passthrough implementations for copy_range

2018-06-07 Thread Fam Zheng

Similar to bdrv_co_block_status_from_backing we add the two passthrough
callbacks for copy_range. This will be used by the block driver filters
so that they can support copy offloading.

Signed-off-by: Fam Zheng 
---
 block/io.c| 24 
 include/block/block_int.h | 22 ++
 2 files changed, 46 insertions(+)

diff --git a/block/io.c b/block/io.c
index d8039793c2..d1559c9cd5 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1900,6 +1900,30 @@ int coroutine_fn 
bdrv_co_block_status_from_backing(BlockDriverState *bs,
 return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
 }
 
+int coroutine_fn bdrv_co_copy_range_from_backing(BlockDriverState *bs,
+ BdrvChild *src, uint64_t 
src_offset,
+ BdrvChild *dst, uint64_t 
dst_offset,
+ uint64_t bytes, 
BdrvRequestFlags flags)
+{
+if (!src->bs) {
+return -ENOMEDIUM;
+}
+return bdrv_co_copy_range_from(src->bs->backing, src_offset, dst,
+   dst_offset, bytes, flags);
+}
+
+int coroutine_fn bdrv_co_copy_range_to_backing(BlockDriverState *bs,
+   BdrvChild *src, uint64_t 
src_offset,
+   BdrvChild *dst, uint64_t 
dst_offset,
+   uint64_t bytes, 
BdrvRequestFlags flags)
+{
+if (!dst->bs) {
+return -ENOMEDIUM;
+}
+return bdrv_co_copy_range_to(src, src_offset, dst->bs->backing,
+ dst_offset, bytes, flags);
+}
+
 /*
  * Returns the allocation status of the specified sectors.
  * Drivers not implementing the functionality are assumed to not support
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 2c51cd420f..b488d74c1b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1118,6 +1118,28 @@ int coroutine_fn 
bdrv_co_block_status_from_backing(BlockDriverState *bs,
int64_t *pnum,
int64_t *map,
BlockDriverState **file);
+
+/*
+ * Default implementation for drivers to pass bdrv_co_copy_range_from() to
+ * their backing file.
+ */
+int coroutine_fn bdrv_co_copy_range_from_backing(BlockDriverState *bs,
+ BdrvChild *src, uint64_t 
src_offset,
+ BdrvChild *dst, uint64_t 
dst_offset,
+ uint64_t bytes,
+ BdrvRequestFlags flags);
+
+
+/*
+ * Default implementation for drivers to pass bdrv_co_copy_range_to() to their
+ * backing file.
+ */
+int coroutine_fn bdrv_co_copy_range_to_backing(BlockDriverState *bs,
+   BdrvChild *src, uint64_t 
src_offset,
+   BdrvChild *dst, uint64_t 
dst_offset,
+   uint64_t bytes,
+   BdrvRequestFlags flags);
+
 const char *bdrv_get_parent_name(const BlockDriverState *bs);
 void blk_dev_change_media_cb(BlockBackend *blk, bool load, Error **errp);
 bool blk_dev_has_removable_media(BlockBackend *blk);
-- 
2.17.0

[Qemu-devel] [PATCH 1/6] file-posix: Fix EINTR handling

2018-06-07 Thread Fam Zheng

EINTR should be checked against errno, not ret. While fixing the bug,
collecting the branches with a switch block.

Signed-off-by: Fam Zheng 
---
 block/file-posix.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 513d371bb1..c6dae38f94 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1473,20 +1473,20 @@ static ssize_t handle_aiocb_copy_range(RawPosixAIOData 
*aiocb)
 ssize_t ret = copy_file_range(aiocb->aio_fildes, &in_off,
   aiocb->aio_fd2, &out_off,
   bytes, 0);
-if (ret == -EINTR) {
-continue;
-}
-if (ret < 0) {
-if (errno == ENOSYS) {
+if (ret <= 0) {
+switch (errno) {
+case 0:
+/* No progress (e.g. when beyond EOF), let the caller fall back
+ * to buffer I/O. */
+/* fall through */
+case ENOSYS:
 return -ENOTSUP;
-} else {
+case EINTR:
+continue;
+default:
 return -errno;
 }
 }
-if (!ret) {
-/* No progress (e.g. when beyond EOF), fall back to buffer I/O. */
-return -ENOTSUP;
-}
 bytes -= ret;
 }
 return 0;
-- 
2.17.0

[Qemu-devel] [PATCH 0/6] mirror: Use copy offloading

2018-06-07 Thread Fam Zheng

This is the third part of copy offloading work. The first patches are fixes and
improvements in preparation for enabling mirror job. The last patch does a
similar change to the backup patch: it inserts a blk_aio_copy_range call before
the usual bounce buffer code in mirror_iteration.

Fam Zheng (6):
  file-posix: Fix EINTR handling
  block: Check if block drivers can do copy offloading
  block-backend: Refactor AIO emulation
  block-backend: Add blk_aio_copy_range
  block: Add backing passthrough implementations for copy_range
  mirror: Use copy offloading

 block.c|  12 ++
 block/block-backend.c  | 247 +++--
 block/file-posix.c |  29 ++--
 block/io.c |  27 
 block/iscsi.c  |   8 ++
 block/mirror.c |  71 +-
 block/qcow2.c  |  11 ++
 block/raw-format.c |   6 +
 block/trace-events |   1 +
 include/block/block_int.h  |  26 
 include/sysemu/block-backend.h |   4 +
 11 files changed, 354 insertions(+), 88 deletions(-)

-- 
2.17.0

[Qemu-devel] [Bug 1636217] Re: qemu-kvm 2.7 does not boot kvm VMs with virtio on top of VMware ESX

2018-06-07 Thread Thomas Huth

Marking as fixed, according to comment #13

** Changed in: qemu
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1636217

Title:
  qemu-kvm 2.7 does not boot kvm VMs with virtio on top of VMware ESX

Status in QEMU:
  Fix Released

Bug description:
  After todays Proxmox update all my Linux VMs stopped booting.

  # How to reproduce
  - Have KVM on top of VMware ESX (I use VMware ESX 6)
  - Boot Linux VM with virtio Disk drive.

  
  # Result
  virtio based VMs do not boot anymore:

  root@demotuxdc:/etc/pve/nodes/demotuxdc/qemu-server# grep virtio0 100.conf 
  bootdisk: virtio0
  virtio0: pvestorage:100/vm-100-disk-1.raw,discard=on,size=20G

  (initially with cache=writethrough, but that doesn´t matter)

  What happens instead is:

  - BIOS displays "Booting from harddisk..."
  - kvm process of VM loops at about 140% of Intel(R) Core(TM) i5-6260U CPU @ 
1.80GHz Skylake dual core CPU

  Disk of course has valid bootsector:

  root@demotuxdc:/srv/pvestorage/images/100# file -sk vm-100-disk-1.raw 
  vm-100-disk-1.raw: DOS/MBR boot sector DOS/MBR boot sector DOS executable 
(COM), boot code
  root@demotuxdc:/srv/pvestorage/images/100# head -c 2048 vm-100-disk-1.raw | 
hd | grep GRUB
  0170  be 94 7d e8 2e 00 cd 18  eb fe 47 52 55 42 20 00  |..}...GRUB .|

  
  # Workaround 1
  - Change disk from virtio0 to scsi0
  - Debian boots out of the box after this change
  - SLES 12 needs a rebuilt initrd
  - CentOS 7 too, but it seems that is not even enough and it still fails (even 
in hostonly="no" mode for dracut)

  
  # Workaround 2
  Downgrade pve-qemu-kvm 2.7.0-3 to 2.6.2-2.

  
  # Expected results
  Disk boots just fine via virtio like it did before.

  
  # Downstream bug report
  Downstream suggests an issue with upstream qemu-kvm:

  https://bugzilla.proxmox.com/show_bug.cgi?id=1181

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1636217/+subscriptions

Re: [Qemu-devel] [PATCH 4/6] hmp: Add info commands for preconfig

2018-06-07 Thread Markus Armbruster

"Dr. David Alan Gilbert"  writes:

> * Markus Armbruster (arm...@redhat.com) wrote:
>> "Dr. David Alan Gilbert"  writes:
>> 
>> > * Markus Armbruster (arm...@redhat.com) wrote:
>> >> Peter Xu  writes:
>> >> 
>> >> > On Tue, Jun 05, 2018 at 01:26:34PM +0100, Dr. David Alan Gilbert (git) 
>> >> > wrote:
>> >> >> From: "Dr. David Alan Gilbert" 
>> >> >> 
>> >> >> Allow a bunch of the info commands to be used in preconfig.
>> >> >> Could probably add most of them.
>> >> >
>> >> > I guess some of them may not work yet during preconfig.  E.g.:
>> >> >
>> >> > $ ./x86_64-softmmu/qemu-system-x86_64 -preconfig -monitor stdio
>> >> > QEMU 2.12.50 monitor - type 'help' for more information
>> >> > (qemu) info mtree
>> >> > address-space: memory
>> >> >   - (prio 0, i/o): system
>> >> >
>> >> > address-space: I/O
>> >> >   - (prio 0, i/o): io
>> >> >
>> >> > But it's fine to enable that I guess.
>> >> >
>> >> > (Which "info" command would you want to use during preconfig?)
>> >> >
>> >> >> 
>> >> >> Signed-off-by: Dr. David Alan Gilbert 
>> >> >
>> >> > Reviewed-by: Peter Xu 
>> >> 
>> >> The reason for having -preconfig is us despairing of making -S do the
>> >> right thing.  We'd have to *understand* the tangled mess that is our
>> >> startup, and rearrange it so QMP becomes available early enough for
>> >> configuring NUMA (and other things), yet late enough for everything to
>> >> work.
>> >> 
>> >> -preconfig is a cheap hack to avoid this headache, by bypassing almost
>> >> all of "everything".
>> >> 
>> >> Now you bring back some of "everything".  Dangerous.  You better show it
>> >> actually works.  Until you do:
>> >> 
>> >> NAK
>> >
>> > Well I did test each command in here to make sure it didn't
>> > crash/produce complete junk; but here's the output with the v2 of this
>> > patch that Igor R-b:
>> [...]
>> 
>> For the sake of the argument, let's assume these commands all work in
>> preconfig state.  Are their QMP equivalents all available in preconfig
>> state?
>
> That I don't know; I was happy to fix my list to the ones
> Igor recommended.  If you object to some particular entries I'll
> be happy to change them.

HMP must not provide more functionality than QMP.  Specifically, we may
provide "info FOO" only when we also provide query-FOO.

There are exceptions to this rule.  I don't think they apply here.  I'm
prepared to discuss them, of course.

I wish there was a way to automate "provide command in HMP when its
buddy is available in QMP", but since the buddies are only connected by
code, that seems infeasible.

Without such automation, the two sets of available commands need to be
kept consistent manually.  The larger they are, the more of a bother.

Bother is fine when it provides commensurate value to users.  Options in
increasing order of value provided:

(1) HMP becomes ready only after we exit preconfig state (what I
proposed in Message-ID: <87603cxob9@dusky.pond.sub.org>.

(2) HMP provides help, quit, exit-preconfig.

(3) HMP provides (a subset of) the commands QMP provides.

I figure the maintenance cost of (1) and (2) will be negligible, but (3)
could be a drag.  Are you sure it's worthwhile?

Re: [Qemu-devel] [PATCH] slirp: Add Samuel Thibault's staging tree for slirp

2018-06-07 Thread Thomas Huth

On 06.06.2018 15:14, Samuel Thibault wrote:
> Ping? I'm not sure who I am supposed to get a review from, or if I have
> to have one at all?
> 
> Samuel
> 
> Samuel Thibault, le jeu. 31 mai 2018 21:48:43 +0200, a ecrit:
>> Signed-off-by: Samuel Thibault 
>> ---
>>  MAINTAINERS | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 41cd3736a9..8822ae9b70 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -1675,6 +1675,7 @@ S: Maintained
>>  F: slirp/
>>  F: net/slirp.c
>>  F: include/net/slirp.h
>> +T: git http://people.debian.org/~sthibault/qemu.git slirp
>>  T: git git://git.kiszka.org/qemu.git queues/slirp

FWIW:

Acked-by: Thomas Huth

Re: [Qemu-devel] [PATCH] chardev: Restore CR,LF on stdio

2018-06-07 Thread Thomas Huth

On 07.06.2018 23:08, Philippe Mathieu-Daudé wrote:
> Remove the 'stair-step output' on stdio.
> 
> This partially reverts commit 12fb0ac05, which was correct
> on the mailing list but got corrupted by the maintainer :p
> 
> Introduced-by: 3b876140-c035-dd39-75d0-d54c48128...@redhat.com
> Reported-by: BALATON Zoltan 
> Suggested-by: Thomas Huth 
> Tested-by: Laurent Desnogues 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> See:
> http://lists.nongnu.org/archive/html/qemu-devel/2018-05/msg06202.html (bug)
> http://lists.nongnu.org/archive/html/qemu-devel/2018-06/msg01309.html (report)
> 
> Peter, Can this enters directly as bug-fix?
> 
>  chardev/char-stdio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/chardev/char-stdio.c b/chardev/char-stdio.c
> index d83e60e787..96375f2ab8 100644
> --- a/chardev/char-stdio.c
> +++ b/chardev/char-stdio.c
> @@ -59,7 +59,7 @@ static void qemu_chr_set_echo_stdio(Chardev *chr, bool echo)
>  if (!echo) {
>  tty.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP
>   | INLCR | IGNCR | ICRNL | IXON);
> -tty.c_oflag &= ~OPOST;
> +tty.c_oflag |= OPOST;
>  tty.c_lflag &= ~(ECHO | ECHONL | ICANON | IEXTEN);
>  tty.c_cflag &= ~(CSIZE | PARENB);
>  tty.c_cflag |= CS8;
> 

I think this is the right way to go.

Reviewed-by: Thomas Huth

Re: [Qemu-devel] [RFC v2 09/12] Add vhost-input-pci

2018-06-07 Thread Gerd Hoffmann

On Fri, Jun 08, 2018 at 12:22:38AM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Mon, Jun 4, 2018 at 10:58 AM, Gerd Hoffmann  wrote:
> >> +#define TYPE_VHOST_USER_INPUT_PCI "vhost-user-input-pci"
> >
> > Patch $subject mismatch.
> >
> >> +struct VHostUserInput {
> >> +VirtIOInput   parent_obj;
> >> +
> >> +VhostUserBackend  *vhost;
> >> +};
> >
> > Nothing input specific here ...
> 
> Except VirtIOInput

Oops, missed that.  Scratch the idea then.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH 4/6] hmp: Add info commands for preconfig

2018-06-07 Thread Markus Armbruster

Igor Mammedov  writes:

> On Thu, 07 Jun 2018 14:22:34 +0200
> Markus Armbruster  wrote:
>
>> Peter Xu  writes:
>> 
>> > On Tue, Jun 05, 2018 at 01:26:34PM +0100, Dr. David Alan Gilbert (git) 
>> > wrote:  
>> >> From: "Dr. David Alan Gilbert" 
>> >> 
>> >> Allow a bunch of the info commands to be used in preconfig.
>> >> Could probably add most of them.  
>> >
>> > I guess some of them may not work yet during preconfig.  E.g.:
>> >
>> > $ ./x86_64-softmmu/qemu-system-x86_64 -preconfig -monitor stdio
>> > QEMU 2.12.50 monitor - type 'help' for more information
>> > (qemu) info mtree
>> > address-space: memory
>> >   - (prio 0, i/o): system
>> >
>> > address-space: I/O
>> >   - (prio 0, i/o): io
>> >
>> > But it's fine to enable that I guess.
>> >
>> > (Which "info" command would you want to use during preconfig?)
>> >  
>> >> 
>> >> Signed-off-by: Dr. David Alan Gilbert   
>> >
>> > Reviewed-by: Peter Xu   
>> 
>> The reason for having -preconfig is us despairing of making -S do the
>> right thing.  We'd have to *understand* the tangled mess that is our
>> startup, and rearrange it so QMP becomes available early enough for
>> configuring NUMA (and other things), yet late enough for everything to
>> work.
> We solve concrete NUMA problem at -S time as was demonstrated by thread:
> "RFC v2 0/4] enable numa configuration before machine  is running from 
> HMP/QMP"
> were pros and cons are were summarized. But that won't scale to other things.
>  
>> -preconfig is a cheap hack to avoid this headache, by bypassing almost
>> all of "everything".
> hack should allow to move configuration steps to early stage one by one
> until the point we could obsolete -S (from configuration point)
> without breaking current -S users and clean up start up mess in process.

Sometimes a hack is the only practical way forward.

>> Now you bring back some of "everything".  Dangerous.  You better show it
>> actually works.  Until you do:
> Tested v2, works as expected so far.

See my reply to Dave.

>> 
>> NAK
>>

Re: [Qemu-devel] [qemu PATCH 3/5] hw/i386: Update SSDT table used by "make check"

2018-06-07 Thread Thomas Huth

On 08.06.2018 00:31, Ross Zwisler wrote:
> This commit:
> 
> commit aa78a16d8645 ("hw/i386: Rename 2.13 machine types to 3.0")
> 
> updated the name used to create the q35 machine, which in turn changed the
> SSDT table which is generated when we run "make check":
> 
>   acpi-test: Warning! SSDT mismatch. Actual [asl:/tmp/asl-QZDWJZ.dsl,
>   aml:/tmp/aml-T8JYJZ], Expected [asl:/tmp/asl-DTWVJZ.dsl,
>   aml:tests/acpi-test-data/q35/SSDT.dimmpxm].
> 
> Here's the only difference, aside from the checksum:
> 
>   < Name (MEMA, 0x07FFF000)
>   ---
>   > Name (MEMA, 0x07FFE000)
> 
> Update the binary table that we compare against so it reflects this name
> change.
> 
> Signed-off-by: Ross Zwisler 
> Cc: Peter Maydell 
> Cc: Cornelia Huck 
> Cc: Thomas Huth 
> Cc: Eduardo Habkost 
> Fixes: commit aa78a16d8645 ("hw/i386: Rename 2.13 machine types to 3.0")
> ---
>  tests/acpi-test-data/q35/SSDT.dimmpxm | Bin 685 -> 685 bytes
>  1 file changed, 0 insertions(+), 0 deletions(-)
> 
> diff --git a/tests/acpi-test-data/q35/SSDT.dimmpxm 
> b/tests/acpi-test-data/q35/SSDT.dimmpxm
> index 
> 8ba0e67cb72daa81a65da4906d37a5e0f4af1fd4..2d5b721bcf9c398feb6d005761f898015042e8a4
>  100644
> GIT binary patch
> delta 23
> fcmZ3>x|WqIIM^j*EfWI+qr*n71x(Bz{<8xBPCEwk
> 
> delta 23
> fcmZ3>x|WqIIM^j*EfWI+W57nP1x(Bj{<8xBPMZev
> 

Thanks, that fixes the warning for me.

Tested-by: Thomas Huth

Re: [Qemu-devel] [PATCH v10 5/7] monitor: remove event_clock_type

2018-06-07 Thread Markus Armbruster

Peter Xu  writes:

> Instead, use a dynamic function to detect which clock we'll use.  The
> problem is that the old code will let monitor initialization depend on
> configure_accelerator() (that's where qtest_enabled() start to take
> effect).  After this change, we don't have such a dependency any more.
> We just need to make sure configure_accelerator() is called when we
> start to use it.  Now it's only used in monitor_qapi_event_queue() and
> monitor_qapi_event_handler(), so we're good.
>
> Suggested-by: Markus Armbruster 
> Signed-off-by: Peter Xu 
> ---
>  monitor.c | 25 -
>  1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/monitor.c b/monitor.c
> index fa9c35631f..2bda0dfa12 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -281,8 +281,6 @@ QmpCommandList qmp_commands, qmp_cap_negotiation_commands;
>  
>  Monitor *cur_mon;
>  
> -static QEMUClockType event_clock_type = QEMU_CLOCK_REALTIME;
> -
>  static void monitor_command_cb(void *opaque, const char *cmdline,
> void *readline_opaque);
>  
> @@ -309,6 +307,19 @@ static inline bool monitor_is_hmp_non_interactive(const 
> Monitor *mon)
>  return !monitor_is_qmp(mon) && !monitor_uses_readline(mon);
>  }
>  
> +/*
> + * This should never be called before configure_accelerator() since
> + * only until then could we know whether qtest was enabled or not.

Uh, we know it after then, not until then.  What about

   /*
* Return the clock to use for recording an event's time.
* Beware: result is invalid before configure_accelerator().

> + */
> +static inline QEMUClockType monitor_get_clock(void)

Suggest to rename to monitor_get_event_clock(), or just
get_event_clock().

Happy to apply touch-ups on commit.

> +{
> +/*
> + * This allows us to perform tests on the monitor queues to verify
> + * that the rate limits are enforced.
> + */
> +return qtest_enabled() ? QEMU_CLOCK_VIRTUAL : QEMU_CLOCK_REALTIME;
> +}
> +
>  /**
>   * Is the current monitor, if any, a QMP monitor?
>   */
> @@ -632,7 +643,7 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, 
> Error **errp)
>   * monitor_qapi_event_handler() in evconf->rate ns.  Any
>   * events arriving before then will be delayed until then.
>   */
> -int64_t now = qemu_clock_get_ns(event_clock_type);
> +int64_t now = qemu_clock_get_ns(monitor_get_clock());
>  
>  monitor_qapi_event_emit(event, qdict);
>  
> @@ -640,7 +651,7 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, 
> Error **errp)
>  evstate->event = event;
>  evstate->data = qobject_ref(data);
>  evstate->qdict = NULL;
> -evstate->timer = timer_new_ns(event_clock_type,
> +evstate->timer = timer_new_ns(monitor_get_clock(),
>monitor_qapi_event_handler,
>evstate);
>  g_hash_table_add(monitor_qapi_event_state, evstate);
> @@ -665,7 +676,7 @@ static void monitor_qapi_event_handler(void *opaque)
>  qemu_mutex_lock(&monitor_lock);
>  
>  if (evstate->qdict) {
> -int64_t now = qemu_clock_get_ns(event_clock_type);
> +int64_t now = qemu_clock_get_ns(monitor_get_clock());
>  
>  monitor_qapi_event_emit(evstate->event, evstate->qdict);
>  qobject_unref(evstate->qdict);
> @@ -721,10 +732,6 @@ static gboolean qapi_event_throttle_equal(const void *a, 
> const void *b)
>  
>  static void monitor_qapi_event_init(void)
>  {
> -if (qtest_enabled()) {
> -event_clock_type = QEMU_CLOCK_VIRTUAL;
> -}
> -
>  monitor_qapi_event_state = g_hash_table_new(qapi_event_throttle_hash,
>  qapi_event_throttle_equal);
>  qmp_event_set_func_emit(monitor_qapi_event_queue);

With at least a suitable fix for the comment:
Reviewed-by: Markus Armbruster

Re: [Qemu-devel] [qemu PATCH 4/5] machine: fix some misspelled words

2018-06-07 Thread Thomas Huth

On 08.06.2018 00:31, Ross Zwisler wrote:
> Normally this might not be worth fixing, but several of these are strings
> which are displayed to users.
> 
> Signed-off-by: Ross Zwisler 
> ---
>  hw/core/machine.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 617e5f8d75..a21269fa39 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -609,7 +609,7 @@ static void machine_class_init(ObjectClass *oc, void 
> *data)
>  machine_get_igd_gfx_passthru, machine_set_igd_gfx_passthru,
>  &error_abort);
>  object_class_property_set_description(oc, "igd-passthru",
> -"Set on/off to enable/disable igd passthrou", &error_abort);
> +"Set on/off to enable/disable igd passthru", &error_abort);

Shouldn't that rather be "passthrough" instead?

 Thomas

Re: [Qemu-devel] [PATCH v2 8/8] qapi: query-blockstat: add driver specific file-posix stats

2018-06-07 Thread Markus Armbruster

Eric Blake  writes:

> On 06/07/2018 10:23 AM, Anton Nefedov wrote:
 If we introduce BlockdevDriver as a discriminator as Markus suggests
 above, we need some way to define its value.
 I guess one would be to check blk->bs->drv->format_name but it won't
 always work; often it's even blk->bs == NULL.
>>>
>>> There is no blk->bs, at least not if blk is a BlockBackend *.
>>>
>>> I figure the problem you're trying to describe is query-blockstats
>>> running into BlockBackends that aren't associated with a
>>> BlockDriverState (blk->root is null), and thus aren't associated with a
>>> BlockDriver.  Correct?
>>>
>>
>> Sorry, yes, exactly
>
> Okay, that sounds like the driver stats have to be optional, present
> only when blk->bs is non-NULL.
>
>>
 I guess we could add a default ('unspecified'?) to BlockdevDriver enum?
>>>
>>> This part I understand, but...
>>>
 But I'd rather leave an optional BlockDriverStats union (but make it
 flat). Only the drivers that provide these stats will need to set
 BlockdevDriver field. What do you think?
>>>
>>> I'm not sure I got this part.  Care to sketch the QAPI schema snippet?
>>>
>>
>> You earlier proposed:
>>
>>  >>> You're adding a union of driver-specific stats to a struct of generic
>>  >>> stats.  That's unnecessarily complicated.  Instead, turn the struct of
>>  >>> generic stats into a flat union, like this:
>>  >>>
>>  >>> { 'union': 'BlockStats',
>>  >>> 'base': { ... the generic stats, i.e. the members of BlockStats
>>  >>>   before this patch ...
>>  >>>   'driver': 'BlockdevDriver' }
>>  >>> 'discriminator': 'driver',
>>  >>> 'data': {
>>  >>> 'file': 'BlockDriverStatsFile',
>>  >>> ... } }
>
> That would require 'driver' to always resolve to something, even when
> there is no driver (unless we create a superset enum that adds 'none'
> beyond what 'BlockdevDriver' supports).
>
>>
>> But I meant to leave it as:
>>
>> + { 'union': 'BlockDriverStats':
>> + 'base': { 'driver': 'BlockdevDriver' },
>> + 'discriminator': 'driver',
>> + 'data': {
>> + 'file': 'BlockDriverStatsFile' } }
>>
>>
>>    { 'struct': 'BlockStats',
>>      'data': {'*device': 'str', '*node-name': 'str',
>>   'stats': 'BlockDeviceStats',
>> +    '*driver-stats': 'BlockDriverStats',
>>   '*parent': 'BlockStats',
>>   '*backing': 'BlockStats'} }
>>
>> so those block backends which do not provide driver stats do not need to
>> set BlockdevDriver field.
>
> This makes the most sense to me - we're stuck with a layer of nesting,
> but that's because driver-stats truly is optional (we don't always
> have access to a driver).

Agree.

Now we have to name the thing.  You propose @driver-stats.  Servicable,
but let's review how existing driver-specific things are named.

BlockdevOptions and BlockdevCreateOptions have them inline, thus no
name.

ImageInfo has '*format-specific': 'ImageInfoSpecific'.

If we follow ImageInfo precedence, we get '*driver-specific':
'BlockStatsSpecific'.  driver-specific I like well enough.
BlockStatsSpecific less so.  BlockStatsDriverSpecific feels better, but
perhaps a bit long.

Re: [Qemu-devel] [qemu PATCH 2/5] acpi: "make check" should fail on asl mismatch

2018-06-07 Thread Thomas Huth

On 08.06.2018 01:09, Michael S. Tsirkin wrote:
> On Thu, Jun 07, 2018 at 04:31:08PM -0600, Ross Zwisler wrote:
>> Currently if "make check" detects a mismatch in the ASL generated during
>> testing, we print an error such as:
>>
>>   acpi-test: Warning! SSDT mismatch. Actual [asl:/tmp/asl-QZDWJZ.dsl,
>>   aml:/tmp/aml-T8JYJZ], Expected [asl:/tmp/asl-DTWVJZ.dsl,
>>   aml:tests/acpi-test-data/q35/SSDT.dimmpxm].
>>
>> but the testing still exits with good shell status.  This is wrong, and
>> makes bisecting such a failure difficult.
>>
>> Signed-off-by: Ross Zwisler 
> 
> Failing would also mean that any change must update the expected files
> at the same time.  And that in turn is problematic because expected
> files are binary and can't be merged.
> 
> In other words the way we devel ACPI right now means that bisect will
> periodically produce a diff, it's not an error.

But apparently the current way also allows that real bug go unnoticed
for a while, until somebody accidentially spots the warning in the
output of "make check". Wouldn't it be better to fail at CI time
already? If a merge of the file is required, you can still resolve that
manually (i.e. by rebasing one of the pull requests).

 Thomas

Re: [Qemu-devel] [Qemu-block] Some question about savem/qcow2 incremental snapshot

2018-06-07 Thread He, Junyan

Dear all:

I just switched from graphic/media field to virtualization at the end of the 
last year,
so I am sorry that though I have already try my best but I still feel a little 
dizzy
about your previous discussion about NVDimm via block layer:)
In today's qemu, we use the SaveVMHandlers functions to handle both snapshot 
and migration.
So for nvdimm kind memory, its migration and snapshot use the same way as the 
ram(savevm_ram_handlers). But the difference is the size of nvdimm may be huge, 
and the load
and store speed is slower. According to my usage, when I use 256G nvdimm as 
memory backend,
it may take more than 5 minutes to complete one snapshot saving, and after 
saving the qcow2
image is bigger than 50G. For migration, this may not be a problem because we 
do not need
extra disk space and the guest is not paused when in migration process. But for 
snapshot,
we need to pause the VM and the user experience is bad, and we got concerns 
about that.
I posted this question in Jan this year but failed to get enough reply. Then I 
sent a RFC patch
set in Mar, basic idea is using the dependency snapshot and dirty log trace in 
kernel to
optimize this.

https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg04530.html

I use the simple way to handle this,
1. Separate the nvdimm region from ram when do snapshot.
2. If the first time, we dump all the nvdimm data the same as ram, and enable 
dirty log trace
for nvdimm kind region.
3. If not the first time, we find the previous snapshot point and add reference 
to its clusters
which is used to store nvdimm data. And this time, we just save dirty page 
bitmap and dirty pages.
Because the previous nvdimm data clusters is ref added, we do not need to worry 
about its deleting.

I encounter a lot of problems:
1. Migration and snapshot logic is mixed and need to separate them for nvdimm.
2. Cluster has its alignment. When do snapshot, we just save data to disk 
continuous. Because we
need to add ref to cluster, we really need to consider the alignment. I just 
use a little trick way 
to padding some data to alignment now, and I think it is not a good way.
3. Dirty log trace may have some performance problem.

In theory, this manner can be used to handle all kind of huge memory snapshot, 
we need to find the 
balance between guest performance(Because of dirty log trace) and snapshot 
saving time.

Thanks
Junyan

-Original Message-
From: Stefan Hajnoczi [mailto:stefa...@redhat.com] 
Sent: Thursday, May 31, 2018 6:49 PM
To: Kevin Wolf 
Cc: Max Reitz ; He, Junyan ; Pankaj 
Gupta ; qemu-devel@nongnu.org; qemu block 

Subject: Re: [Qemu-block] [Qemu-devel] Some question about savem/qcow2 
incremental snapshot

On Wed, May 30, 2018 at 06:07:19PM +0200, Kevin Wolf wrote:
> Am 30.05.2018 um 16:44 hat Stefan Hajnoczi geschrieben:
> > On Mon, May 14, 2018 at 02:48:47PM +0100, Stefan Hajnoczi wrote:
> > > On Fri, May 11, 2018 at 07:25:31PM +0200, Kevin Wolf wrote:
> > > > Am 10.05.2018 um 10:26 hat Stefan Hajnoczi geschrieben:
> > > > > On Wed, May 09, 2018 at 07:54:31PM +0200, Max Reitz wrote:
> > > > > > On 2018-05-09 12:16, Stefan Hajnoczi wrote:
> > > > > > > On Tue, May 08, 2018 at 05:03:09PM +0200, Kevin Wolf wrote:
> > > > > > >> Am 08.05.2018 um 16:41 hat Eric Blake geschrieben:
> > > > > > >>> On 12/25/2017 01:33 AM, He Junyan wrote:
> > > > > > >> I think it makes sense to invest some effort into such 
> > > > > > >> interfaces, but be prepared for a long journey.
> > > > > > > 
> > > > > > > I like the suggestion but it needs to be followed up with 
> > > > > > > a concrete design that is feasible and fair for Junyan and others 
> > > > > > > to implement.
> > > > > > > Otherwise the "long journey" is really just a way of 
> > > > > > > rejecting this feature.
> > 
> > The discussion on NVDIMM via the block layer has runs its course.  
> > It would be a big project and I don't think it's fair to ask Junyan 
> > to implement it.
> > 
> > My understanding is this patch series doesn't modify the qcow2 
> > on-disk file format.  Rather, it just uses existing qcow2 mechanisms 
> > and extends live migration to identify the NVDIMM state state region 
> > to share the clusters.
> > 
> > Since this feature does not involve qcow2 format changes and is just 
> > an optimization (dirty blocks still need to be allocated), it can be 
> > removed from QEMU in the future if a better alternative becomes 
> > available.
> > 
> > Junyan: Can you rebase the series and send a new revision?
> > 
> > Kevin and Max: Does this sound alright?
> 
> Do patches exist? I've never seen any, so I thought this was just the 
> early design stage.

Sorry for the confusion, the earlier patch series was here:

  https://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg04530.html

> I suspect that while it wouldn't change the qcow2 on-disk format in a 
> way that the qcow2 spec would have to be change, it does need to 
> change the VMState format that is stored as a blob within the qcow

Re: [Qemu-devel] [PATCH] monitor: postpone monitor_qmp_cleanup_queues

2018-06-07 Thread Peter Xu

On Thu, Jun 07, 2018 at 01:53:01PM +0200, Markus Armbruster wrote:
> Peter Xu  writes:
> 
> > Previously we cleanup the queues when we got CLOSED event.  It was used
> 
> we clean up
> 
> > to make sure we won't leftover replies/events of a old client to a new
> 
> we won't send leftover replies/events of an old client
> 
> > client.  Now this patch postpones that until OPENED.
> 
> What if OPENED never comes?

Then we clean that up until destruction of the monitor.  IMHO it's
fine, but I'm not sure whether that's an overall acceptable approach.

> 
> > In most cases, it does not really matter much since either way will make
> > sure that the new client won't receive unexpected old events/responses.
> > However there can be a very rare race condition with the old way when we
> > use QMP with stdio and pipelines (which is quite common in iotests).
> > The problem is that, we can easily have something like this in scripts:
> >
> >   cat $QMP_COMMANDS | qemu -qmp stdio ... | filter_commands
> >
> > In most cases, a QMP session will be based on a bidirectional
> > channel (read/write to a TCP port, for example), so IN and OUT are
> > sharing some fundamentally same thing.  However that's untrue for pipes
> 
> Suggest "are fundamentally the same thing".
> 
> > above - the IN is the "cat" program, while the "OUT" is directed to the
> > "filter_commands" which is actually another process.
> 
> Regarding 'IN is the "cat" program': a pipe is not a program.  Suggest
> 'IN is connected to "cat", while OUT is connected to "filter_commands",
> which are separate processes.
> 
> > Now when we received the CLOSED event, we cleanup the queues (including
> 
> we clean up
> 
> > the QMP response queue).  That means, if we kill/stop the "cat" process
> > faster than the filter_commands process, the filter_commands process is
> > possible to miss some responses/events that should belong to it.
> 
> I'm not sure I understand the error scenario.  Can you explain it in
> more concrete terms?  Like "cat terminates, QEMU reads EOF, QEMU does
> this, QEMU does that, oops, it shouldn't have done that".

One condition could be this (after "quit" command is executed and QEMU
quits the main loop):

1. [main thread] QEMU queues one SHUTDOWN event into response queue

2. "cat" terminates (to distinguish it from the animal, I quote it).
   Logically it can terminate even earlier, but let's just assume it's
   here.

3. [monitor iothread] QEMU reads EOF from stdin, which connects to the
   "cat" process

4. [monitor iothread] QEMU calls the CLOSED event hook for the monitor,
   which will clean up the response queue of the monitor, then the
   SHUTDOWN event is dropped

5. [main thread] clean up the monitors in monitor_cleanup(), when
   trying to flush pending responses, it sees nothing.  SHUTDOWN is
   lost forever

Note that before the monitor iothread was introduced, step [4]/[5]
could never happen since the main loop was the only place to detect
the EOF event of stdin and run the CLOSED event hooks.  Now things can
happen in parallel in the iothread.

I can add these into commit message.

> 
> > In practise, I encountered a very strange SHUTDOWN event missing when
> 
> practice
> 
> May I suggest use of a spell checker?  ;)

Sorry for that.  TODO added: "Find a spell checker for Emacs".

> 
> > running test with iotest 087.  Without this patch, iotest 078 will have
> 
> 087 or 078?

087.  Err.  Even a spell checker won't help me here!

> 
> > ~10% chance to miss the SHUTDOWN event and fail when with Out-Of-Band
> > enabled:
> >
> > 087 8s ... - output mismatch (see 087.out.bad)

I'll take all the rest comments.  Thanks for reviewing.

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v2 2/4] hw/ppc/spapr_drc: Replace error_setg(&error_abort) by abort()

2018-06-07 Thread David Gibson

On Fri, Jun 08, 2018 at 12:54:36AM -0300, Philippe Mathieu-Daudé wrote:
> Hi David,
> 
> On 06/08/2018 12:03 AM, David Gibson wrote:
> > On Thu, Jun 07, 2018 at 11:46:43AM -0300, Philippe Mathieu-Daudé wrote:
> >> Use abort() instead of error_setg(&error_abort),
> >> as suggested by the "qapi/error.h" documentation:
> >>
> >> Please don't error_setg(&error_fatal, ...), use error_report() and
> >> exit(), because that's more obvious.
> >> Likewise, don't error_setg(&error_abort, ...), use assert().
> >>
> >> Use abort() instead of the suggested assert() because the assertion is
> >> already verified by the switch case.
> > 
> > I think g_assert_not_reached() would be the right thing here.
> 
> I try to follow Eric advice (who recalled Markus).
> As I understand:
> http://lists.gnu.org/archive/html/qemu-devel/2017-08/msg03605.html
> 
> "glib-Testing [...] should not be used outside of tests/."

Oh, ok, go with that then.

Acked-by: David Gibson 

> I can respin if you prefer.



> 
> >>
> >> Signed-off-by: Philippe Mathieu-Daudé 
> >> ---
> >>  hw/ppc/spapr_drc.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> >> index 8a045d6b93..b934b9c9ed 100644
> >> --- a/hw/ppc/spapr_drc.c
> >> +++ b/hw/ppc/spapr_drc.c
> >> @@ -366,7 +366,7 @@ static void prop_get_fdt(Object *obj, Visitor *v, 
> >> const char *name,
> >>  break;
> >>  }
> >>  default:
> >> -error_setg(&error_abort, "device FDT in unexpected state: 
> >> %d", tag);
> >> +abort(); /* device FDT in unexpected state */
> >>  }
> >>  fdt_offset = fdt_offset_next;
> >>  } while (fdt_depth != 0);
> > 
> 




-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v1 7/8] spapr: handle cpu core unplug via hotplug handler chain

2018-06-07 Thread David Gibson

On Thu, Jun 07, 2018 at 06:52:17PM +0200, David Hildenbrand wrote:
> Let's handle it via hotplug_handler_unplug() to make plug/unplug code
> look symmetrical.
> 
> Acked-by: Igor Mammedov 
> Signed-off-by: David Hildenbrand 

Acked-by: David Gibson 

> ---
>  hw/ppc/spapr.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0a8a3455d6..994deea8cf 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3415,7 +3415,15 @@ static void *spapr_populate_hotplug_cpu_dt(CPUState 
> *cs, int *fdt_offset,
>  /* Callback to be called during DRC release. */
>  void spapr_core_release(DeviceState *dev)
>  {
> -MachineState *ms = MACHINE(qdev_get_hotplug_handler(dev));
> +HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(dev);
> +
> +/* Call the unplug handler chain. This can never fail. */
> +hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
> +}
> +
> +static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev)
> +{
> +MachineState *ms = MACHINE(hotplug_dev);
>  sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
>  CPUCore *cc = CPU_CORE(dev);
>  CPUArchId *core_slot = spapr_find_cpu_slot(ms, cc->core_id, NULL);
> @@ -3600,6 +3608,8 @@ static void spapr_machine_device_unplug(HotplugHandler 
> *hotplug_dev,
>  {
>  if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>  spapr_memory_unplug(hotplug_dev, dev);
> +} else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> +spapr_core_unplug(hotplug_dev, dev);
>  }
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v1 5/8] spapr: introduce machine unplug handler

2018-06-07 Thread David Gibson

On Thu, Jun 07, 2018 at 06:52:15PM +0200, David Hildenbrand wrote:
> We'll be handling unplug of e.g. CPUs and PCDIMMs  via the general
> hotplug handler soon, so let's add that handler function.
> 
> Signed-off-by: David Hildenbrand 

Acked-by: David Gibson 

> ---
>  hw/ppc/spapr.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 4447cb197f..bcb72d9fa7 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3586,6 +3586,11 @@ static void spapr_machine_device_plug(HotplugHandler 
> *hotplug_dev,
>  error_propagate(errp, local_err);
>  }
>  
> +static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
> +DeviceState *dev, Error **errp)
> +{
> +}
> +
>  static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
>  DeviceState *dev, Error 
> **errp)
>  {
> @@ -3988,6 +3993,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
> void *data)
>  mc->get_default_cpu_node_id = spapr_get_default_cpu_node_id;
>  mc->possible_cpu_arch_ids = spapr_possible_cpu_arch_ids;
>  hc->unplug_request = spapr_machine_device_unplug_request;
> +hc->unplug = spapr_machine_device_unplug;
>  
>  smc->dr_lmb_enabled = true;
>  mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power8_v2.0");

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v1 2/8] spapr: no need to verify the node

2018-06-07 Thread David Gibson

On Thu, Jun 07, 2018 at 06:52:12PM +0200, David Hildenbrand wrote:
> The node property can always be queried and the value has already been
> verified in pc_dimm_realize().
> 
> Signed-off-by: David Hildenbrand 

Acked-by: David Gibson 

> ---
>  hw/ppc/spapr.c | 9 +
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 2375cbee12..d038f3243e 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3578,14 +3578,7 @@ static void spapr_machine_device_plug(HotplugHandler 
> *hotplug_dev,
>  error_setg(errp, "Memory hotplug not supported for this 
> machine");
>  return;
>  }
> -node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP, 
> errp);
> -if (*errp) {
> -return;
> -}
> -if (node < 0 || node >= MAX_NODES) {
> -error_setg(errp, "Invaild node %d", node);
> -return;
> -}
> +node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP, 
> NULL);
>  
>  spapr_memory_plug(hotplug_dev, dev, node, errp);
>  } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v1 4/8] spapr: local error handling in hotplug handler functions

2018-06-07 Thread David Gibson

On Thu, Jun 07, 2018 at 06:52:14PM +0200, David Hildenbrand wrote:
> Let's introduce and use local error variables in the hotplug handler
> functions.
> 
> Signed-off-by: David Hildenbrand 

Acked-by: David Gibson 

> ---
>  hw/ppc/spapr.c | 29 -
>  1 file changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index a12be24ca9..4447cb197f 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3576,11 +3576,14 @@ out:
>  static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
>DeviceState *dev, Error **errp)
>  {
> +Error *local_err = NULL;
> +
>  if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> -spapr_memory_plug(hotplug_dev, dev, errp);
> +spapr_memory_plug(hotplug_dev, dev, &local_err);
>  } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> -spapr_core_plug(hotplug_dev, dev, errp);
> +spapr_core_plug(hotplug_dev, dev, &local_err);
>  }
> +error_propagate(errp, local_err);
>  }
>  
>  static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
> @@ -3588,10 +3591,11 @@ static void 
> spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
>  {
>  sPAPRMachineState *sms = SPAPR_MACHINE(OBJECT(hotplug_dev));
>  MachineClass *mc = MACHINE_GET_CLASS(sms);
> +Error *local_err = NULL;
>  
>  if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>  if (spapr_ovec_test(sms->ov5_cas, OV5_HP_EVT)) {
> -spapr_memory_unplug_request(hotplug_dev, dev, errp);
> +spapr_memory_unplug_request(hotplug_dev, dev, &local_err);
>  } else {
>  /* NOTE: this means there is a window after guest reset, prior to
>   * CAS negotiation, where unplug requests will fail due to the
> @@ -3599,25 +3603,32 @@ static void 
> spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
>   * the case with PCI unplug, where the events will be queued and
>   * eventually handled by the guest after boot
>   */
> -error_setg(errp, "Memory hot unplug not supported for this 
> guest");
> +error_setg(&local_err,
> +   "Memory hot unplug not supported for this guest");
>  }
>  } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
>  if (!mc->has_hotpluggable_cpus) {
> -error_setg(errp, "CPU hot unplug not supported on this machine");
> -return;
> +error_setg(&local_err,
> +   "CPU hot unplug not supported on this machine");
> +goto out;
>  }
> -spapr_core_unplug_request(hotplug_dev, dev, errp);
> +spapr_core_unplug_request(hotplug_dev, dev, &local_err);
>  }
> +out:
> +error_propagate(errp, local_err);
>  }
>  
>  static void spapr_machine_device_pre_plug(HotplugHandler *hotplug_dev,
>DeviceState *dev, Error **errp)
>  {
> +Error *local_err = NULL;
> +
>  if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> -spapr_memory_pre_plug(hotplug_dev, dev, errp);
> +spapr_memory_pre_plug(hotplug_dev, dev, &local_err);
>  } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> -spapr_core_pre_plug(hotplug_dev, dev, errp);
> +spapr_core_pre_plug(hotplug_dev, dev, &local_err);
>  }
> +error_propagate(errp, local_err);
>  }
>  
>  static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v1 3/8] spapr: move all DIMM checks into spapr_memory_plug

2018-06-07 Thread David Gibson

On Thu, Jun 07, 2018 at 06:52:13PM +0200, David Hildenbrand wrote:
> Let's clean the hotplug handler up by moving everything into
> spapr_memory_plug().
> 
> Signed-off-by: David Hildenbrand 

Acked-by: David Gibson 

> ---
>  hw/ppc/spapr.c | 23 ++-
>  1 file changed, 10 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index d038f3243e..a12be24ca9 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3136,14 +3136,22 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t 
> addr_start, uint64_t size,
>  }
>  
>  static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> -  uint32_t node, Error **errp)
> +  Error **errp)
>  {
>  Error *local_err = NULL;
>  sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
> +sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
>  PCDIMMDevice *dimm = PC_DIMM(dev);
>  PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
>  MemoryRegion *mr;
>  uint64_t align, size, addr;
> +uint32_t node;
> +
> +if (!smc->dr_lmb_enabled) {
> +error_setg(&local_err, "Memory hotplug not supported for this 
> machine");
> +goto out;
> +}
> +node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP, NULL);
>  
>  mr = ddc->get_memory_region(dimm, &local_err);
>  if (local_err) {
> @@ -3568,19 +3576,8 @@ out:
>  static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
>DeviceState *dev, Error **errp)
>  {
> -MachineState *ms = MACHINE(hotplug_dev);
> -sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
> -
>  if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> -int node;
> -
> -if (!smc->dr_lmb_enabled) {
> -error_setg(errp, "Memory hotplug not supported for this 
> machine");
> -return;
> -}
> -node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP, 
> NULL);
> -
> -spapr_memory_plug(hotplug_dev, dev, node, errp);
> +spapr_memory_plug(hotplug_dev, dev, errp);
>  } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
>  spapr_core_plug(hotplug_dev, dev, errp);
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v1 6/8] spapr: handle pc-dimm unplug via hotplug handler chain

2018-06-07 Thread David Gibson

On Thu, Jun 07, 2018 at 06:52:16PM +0200, David Hildenbrand wrote:
> Let's handle it via hotplug_handler_unplug(). E.g. necessary to hotplug/
> unplug memory devices (which a pc-dimm is) later.
> 
> Signed-off-by: David Hildenbrand 

Acked-by: David Gibson 

> ---
>  hw/ppc/spapr.c | 18 +++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index bcb72d9fa7..0a8a3455d6 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3298,7 +3298,8 @@ static sPAPRDIMMState 
> *spapr_recover_pending_dimm_state(sPAPRMachineState *ms,
>  /* Callback to be called during DRC release. */
>  void spapr_lmb_release(DeviceState *dev)
>  {
> -sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
> +HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(dev);
> +sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_ctrl);
>  sPAPRDIMMState *ds = spapr_pending_dimm_unplugs_find(spapr, 
> PC_DIMM(dev));
>  
>  /* This information will get lost if a migration occurs
> @@ -3316,9 +3317,17 @@ void spapr_lmb_release(DeviceState *dev)
>  
>  /*
>   * Now that all the LMBs have been removed by the guest, call the
> - * pc-dimm unplug handler to cleanup up the pc-dimm device.
> + * unplug handler chain. This can never fail.
>   */
> -pc_dimm_memory_unplug(dev, MACHINE(spapr));
> +hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
> +}
> +
> +static void spapr_memory_unplug(HotplugHandler *hotplug_dev, DeviceState 
> *dev)
> +{
> +sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
> +sPAPRDIMMState *ds = spapr_pending_dimm_unplugs_find(spapr, 
> PC_DIMM(dev));
> +
> +pc_dimm_memory_unplug(dev, MACHINE(hotplug_dev));
>  object_unparent(OBJECT(dev));
>  spapr_pending_dimm_unplugs_remove(spapr, ds);
>  }
> @@ -3589,6 +3598,9 @@ static void spapr_machine_device_plug(HotplugHandler 
> *hotplug_dev,
>  static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
>  DeviceState *dev, Error **errp)
>  {
> +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> +spapr_memory_unplug(hotplug_dev, dev);
> +}
>  }
>  
>  static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

[Qemu-devel] [PATCH v10 4/7] monitor: fix comment for monitor_lock

2018-06-07 Thread Peter Xu

Fix typo in d622cb5879c.  Meanwhile move these variables close to each
other.  monitor_qapi_event_state can be declared static, add that.

Reported-by: Markus Armbruster 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Markus Armbruster 
Signed-off-by: Peter Xu 
---
 monitor.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/monitor.c b/monitor.c
index 5c60bf08cc..fa9c35631f 100644
--- a/monitor.c
+++ b/monitor.c
@@ -266,10 +266,11 @@ typedef struct QMPRequest QMPRequest;
 /* QMP checker flags */
 #define QMP_ACCEPT_UNKNOWNS 1
 
-/* Protects mon_list, monitor_event_state.  */
+/* Protects mon_list, monitor_qapi_event_state.  */
 static QemuMutex monitor_lock;
-
+static GHashTable *monitor_qapi_event_state;
 static QTAILQ_HEAD(mon_list, Monitor) mon_list;
+
 static QLIST_HEAD(mon_fdsets, MonFdset) mon_fdsets;
 static int mon_refcount;
 
@@ -571,8 +572,6 @@ static MonitorQAPIEventConf 
monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
 [QAPI_EVENT_VSERPORT_CHANGE]   = { 1000 * SCALE_MS },
 };
 
-GHashTable *monitor_qapi_event_state;
-
 /*
  * Emits the event to every monitor instance, @event is only used for trace
  * Called with monitor_lock held.
-- 
2.17.1

[Qemu-devel] [PATCH v10 7/7] monitor: add lock to protect mon_fdsets

2018-06-07 Thread Peter Xu

Introduce a new global big lock for mon_fdsets.  Take it where needed.

The monitor_fdset_get_fd() handling is a bit tricky: now we need to call
qemu_mutex_unlock() which might pollute errno, so we need to make sure
the correct errno be passed up to the callers.  To make things simpler,
we let monitor_fdset_get_fd() return the -errno directly when error
happens, then in qemu_open() we move it back into errno.

Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Markus Armbruster 
Signed-off-by: Peter Xu 
---
 monitor.c | 52 +--
 stubs/fdset.c |  2 +-
 util/osdep.c  |  3 ++-
 3 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/monitor.c b/monitor.c
index 2bda0dfa12..ede4aed8b6 100644
--- a/monitor.c
+++ b/monitor.c
@@ -271,7 +271,10 @@ static QemuMutex monitor_lock;
 static GHashTable *monitor_qapi_event_state;
 static QTAILQ_HEAD(mon_list, Monitor) mon_list;
 
+/* Protects mon_fdsets */
+static QemuMutex mon_fdsets_lock;
 static QLIST_HEAD(mon_fdsets, MonFdset) mon_fdsets;
+
 static int mon_refcount;
 
 static mon_cmd_t mon_cmds[];
@@ -2319,9 +2322,11 @@ static void monitor_fdsets_cleanup(void)
 MonFdset *mon_fdset;
 MonFdset *mon_fdset_next;
 
+qemu_mutex_lock(&mon_fdsets_lock);
 QLIST_FOREACH_SAFE(mon_fdset, &mon_fdsets, next, mon_fdset_next) {
 monitor_fdset_cleanup(mon_fdset);
 }
+qemu_mutex_unlock(&mon_fdsets_lock);
 }
 
 AddfdInfo *qmp_add_fd(bool has_fdset_id, int64_t fdset_id, bool has_opaque,
@@ -2356,6 +2361,7 @@ void qmp_remove_fd(int64_t fdset_id, bool has_fd, int64_t 
fd, Error **errp)
 MonFdsetFd *mon_fdset_fd;
 char fd_str[60];
 
+qemu_mutex_lock(&mon_fdsets_lock);
 QLIST_FOREACH(mon_fdset, &mon_fdsets, next) {
 if (mon_fdset->id != fdset_id) {
 continue;
@@ -2375,10 +2381,12 @@ void qmp_remove_fd(int64_t fdset_id, bool has_fd, 
int64_t fd, Error **errp)
 goto error;
 }
 monitor_fdset_cleanup(mon_fdset);
+qemu_mutex_unlock(&mon_fdsets_lock);
 return;
 }
 
 error:
+qemu_mutex_unlock(&mon_fdsets_lock);
 if (has_fd) {
 snprintf(fd_str, sizeof(fd_str), "fdset-id:%" PRId64 ", fd:%" PRId64,
  fdset_id, fd);
@@ -2394,6 +2402,7 @@ FdsetInfoList *qmp_query_fdsets(Error **errp)
 MonFdsetFd *mon_fdset_fd;
 FdsetInfoList *fdset_list = NULL;
 
+qemu_mutex_lock(&mon_fdsets_lock);
 QLIST_FOREACH(mon_fdset, &mon_fdsets, next) {
 FdsetInfoList *fdset_info = g_malloc0(sizeof(*fdset_info));
 FdsetFdInfoList *fdsetfd_list = NULL;
@@ -2423,6 +2432,7 @@ FdsetInfoList *qmp_query_fdsets(Error **errp)
 fdset_info->next = fdset_list;
 fdset_list = fdset_info;
 }
+qemu_mutex_unlock(&mon_fdsets_lock);
 
 return fdset_list;
 }
@@ -2435,6 +2445,7 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool 
has_fdset_id, int64_t fdset_id,
 MonFdsetFd *mon_fdset_fd;
 AddfdInfo *fdinfo;
 
+qemu_mutex_lock(&mon_fdsets_lock);
 if (has_fdset_id) {
 QLIST_FOREACH(mon_fdset, &mon_fdsets, next) {
 /* Break if match found or match impossible due to ordering by ID 
*/
@@ -2455,6 +2466,7 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool 
has_fdset_id, int64_t fdset_id,
 if (fdset_id < 0) {
 error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "fdset-id",
"a non-negative value");
+qemu_mutex_unlock(&mon_fdsets_lock);
 return NULL;
 }
 /* Use specified fdset ID */
@@ -2505,16 +2517,21 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool 
has_fdset_id, int64_t fdset_id,
 fdinfo->fdset_id = mon_fdset->id;
 fdinfo->fd = mon_fdset_fd->fd;
 
+qemu_mutex_unlock(&mon_fdsets_lock);
 return fdinfo;
 }
 
 int monitor_fdset_get_fd(int64_t fdset_id, int flags)
 {
-#ifndef _WIN32
+#ifdef _WIN32
+return -ENOENT;
+#else
 MonFdset *mon_fdset;
 MonFdsetFd *mon_fdset_fd;
 int mon_fd_flags;
+int ret;
 
+qemu_mutex_lock(&mon_fdsets_lock);
 QLIST_FOREACH(mon_fdset, &mon_fdsets, next) {
 if (mon_fdset->id != fdset_id) {
 continue;
@@ -2522,20 +2539,24 @@ int monitor_fdset_get_fd(int64_t fdset_id, int flags)
 QLIST_FOREACH(mon_fdset_fd, &mon_fdset->fds, next) {
 mon_fd_flags = fcntl(mon_fdset_fd->fd, F_GETFL);
 if (mon_fd_flags == -1) {
-return -1;
+ret = -errno;
+goto out;
 }
 
 if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) {
-return mon_fdset_fd->fd;
+ret = mon_fdset_fd->fd;
+goto out;
 }
 }
-errno = EACCES;
-return -1;
+ret = -EACCES;
+goto out;
 }
-#endif
+ret = -ENOENT;
 
-errno = ENOENT;
-return -1;
+out:
+qemu_mutex_unlock(&mon_fdsets_lock);
+return ret;
+#endif
 }
 
 int mon

[Qemu-devel] [PATCH v10 6/7] monitor: move init global earlier

2018-06-07 Thread Peter Xu

Before this patch, monitor fd helpers might be called even earlier than
monitor_init_globals().  This can be problematic.

After previous work, now monitor_init_globals() does not depend on
accelerator initialization any more.  Call it earlier (before CLI
parsing; that's where the monitor APIs might be called) to make sure it
is called before any of the monitor APIs.

Suggested-by: Markus Armbruster 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Markus Armbruster 
Signed-off-by: Peter Xu 
---
 vl.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/vl.c b/vl.c
index 06031715ac..5ddea3d235 100644
--- a/vl.c
+++ b/vl.c
@@ -2978,6 +2978,7 @@ int main(int argc, char **argv, char **envp)
 
 runstate_init();
 postcopy_infrastructure_init();
+monitor_init_globals();
 
 if (qcrypto_init(&err) < 0) {
 error_reportf_err(err, "cannot initialize crypto: ");
@@ -4412,12 +4413,6 @@ int main(int argc, char **argv, char **envp)
 default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
 default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
 
-/*
- * Note: qtest_enabled() (which is used in monitor_qapi_event_init())
- * depends on configure_accelerator() above.
- */
-monitor_init_globals();
-
 if (qemu_opts_foreach(qemu_find_opts("mon"),
   mon_init_func, NULL, NULL)) {
 exit(1);
-- 
2.17.1

[Qemu-devel] [PATCH v10 3/7] monitor: more comments on lock-free elements

2018-06-07 Thread Peter Xu

Add some explicit comments for both Readline and cpu_set/cpu_get helpers
that they do not need the mon_lock protection.

Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Markus Armbruster 
Signed-off-by: Peter Xu 
---
 monitor.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/monitor.c b/monitor.c
index 0fba3ccc20..5c60bf08cc 100644
--- a/monitor.c
+++ b/monitor.c
@@ -206,7 +206,15 @@ struct Monitor {
 int suspend_cnt;/* Needs to be accessed atomically */
 bool skip_flush;
 bool use_io_thr;
+
+/*
+ * State used only in the thread "owning" the monitor.
+ * If @use_io_thr, this is mon_global.mon_iothread.
+ * Else, it's the main thread.
+ * These members can be safely accessed without locks.
+ */
 ReadLineState *rs;
+
 MonitorQMP qmp;
 gchar *mon_cpu_path;
 BlockCompletionFunc *password_completion_cb;
@@ -1311,7 +1319,7 @@ void qmp_qmp_capabilities(bool has_enable, 
QMPCapabilityList *enable,
 cur_mon->qmp.commands = &qmp_commands;
 }
 
-/* set the current CPU defined by the user */
+/* Set the current CPU defined by the user. Callers must hold BQL. */
 int monitor_set_cpu(int cpu_index)
 {
 CPUState *cpu;
@@ -1325,6 +1333,7 @@ int monitor_set_cpu(int cpu_index)
 return 0;
 }
 
+/* Callers must hold BQL. */
 static CPUState *mon_get_cpu_sync(bool synchronize)
 {
 CPUState *cpu;
-- 
2.17.1

[Qemu-devel] [PATCH v10 0/7] monitor: let Monitor be thread safe

2018-06-07 Thread Peter Xu

v10:
- collect r-bs
- comment/renice the function monitor_get_clock(), add some commit
  message [Stefan, Markus]

v9:
- two more patches to implement Markus's idea to init monitor earlier
  (which are patch 5 & 6)
- touch up patch 7 to init the fdset lock in monitor_init_globals()

v8:
- some wording changes according to previous comments [Markus]
- return -ENOENT too in stubs/fdset.c:monitor_fdset_get_fd() [Stefan]
- refactor the fdset functions a bit, drop "ret" where proper [Markus]
- one more patch to fix monitor_lock comment [Markus]
- regular rebase and torturing

Stefan reported this problem that in the future we might start to have
more threads operating on the same Monitor object.  This seris try to
add fundamental support for it.

Please review.  Thanks,

Peter Xu (7):
  monitor: rename out_lock to mon_lock
  monitor: protect mon->fds with mon_lock
  monitor: more comments on lock-free elements
  monitor: fix comment for monitor_lock
  monitor: remove event_clock_type
  monitor: move init global earlier
  monitor: add lock to protect mon_fdsets

 monitor.c | 168 +++---
 stubs/fdset.c |   2 +-
 util/osdep.c  |   3 +-
 vl.c  |   7 +--
 4 files changed, 121 insertions(+), 59 deletions(-)

-- 
2.17.1

[Qemu-devel] [PATCH v10 1/7] monitor: rename out_lock to mon_lock

2018-06-07 Thread Peter Xu

The out_lock is protecting a few Monitor fields.  In the future the
monitor code will start to run in multiple threads.  We are going to
turn it into a bigger lock to protect not only the out buffer but also
most of the rest.

Since at it, rearrange the Monitor struct a bit.

Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Markus Armbruster 
Signed-off-by: Peter Xu 
---
 monitor.c | 53 +
 1 file changed, 29 insertions(+), 24 deletions(-)

diff --git a/monitor.c b/monitor.c
index 6d0cec552e..5bc9b2dcd0 100644
--- a/monitor.c
+++ b/monitor.c
@@ -206,15 +206,6 @@ struct Monitor {
 int suspend_cnt;/* Needs to be accessed atomically */
 bool skip_flush;
 bool use_io_thr;
-
-/* We can't access guest memory when holding the lock */
-QemuMutex out_lock;
-QString *outbuf;
-guint out_watch;
-
-/* Read under either BQL or out_lock, written with BQL+out_lock.  */
-int mux_out;
-
 ReadLineState *rs;
 MonitorQMP qmp;
 gchar *mon_cpu_path;
@@ -223,6 +214,20 @@ struct Monitor {
 mon_cmd_t *cmd_table;
 QLIST_HEAD(,mon_fd_t) fds;
 QTAILQ_ENTRY(Monitor) entry;
+
+/*
+ * The per-monitor lock. We can't access guest memory when holding
+ * the lock.
+ */
+QemuMutex mon_lock;
+
+/*
+ * Fields that are protected by the per-monitor lock.
+ */
+QString *outbuf;
+guint out_watch;
+/* Read under either BQL or mon_lock, written with BQL+mon_lock.  */
+int mux_out;
 };
 
 /* Let's add monitor global variables to this struct. */
@@ -365,14 +370,14 @@ static gboolean monitor_unblocked(GIOChannel *chan, 
GIOCondition cond,
 {
 Monitor *mon = opaque;
 
-qemu_mutex_lock(&mon->out_lock);
+qemu_mutex_lock(&mon->mon_lock);
 mon->out_watch = 0;
 monitor_flush_locked(mon);
-qemu_mutex_unlock(&mon->out_lock);
+qemu_mutex_unlock(&mon->mon_lock);
 return FALSE;
 }
 
-/* Called with mon->out_lock held.  */
+/* Called with mon->mon_lock held.  */
 static void monitor_flush_locked(Monitor *mon)
 {
 int rc;
@@ -410,9 +415,9 @@ static void monitor_flush_locked(Monitor *mon)
 
 void monitor_flush(Monitor *mon)
 {
-qemu_mutex_lock(&mon->out_lock);
+qemu_mutex_lock(&mon->mon_lock);
 monitor_flush_locked(mon);
-qemu_mutex_unlock(&mon->out_lock);
+qemu_mutex_unlock(&mon->mon_lock);
 }
 
 /* flush at every end of line */
@@ -420,7 +425,7 @@ static void monitor_puts(Monitor *mon, const char *str)
 {
 char c;
 
-qemu_mutex_lock(&mon->out_lock);
+qemu_mutex_lock(&mon->mon_lock);
 for(;;) {
 c = *str++;
 if (c == '\0')
@@ -433,7 +438,7 @@ static void monitor_puts(Monitor *mon, const char *str)
 monitor_flush_locked(mon);
 }
 }
-qemu_mutex_unlock(&mon->out_lock);
+qemu_mutex_unlock(&mon->mon_lock);
 }
 
 void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
@@ -724,7 +729,7 @@ static void monitor_data_init(Monitor *mon, bool skip_flush,
   bool use_io_thr)
 {
 memset(mon, 0, sizeof(Monitor));
-qemu_mutex_init(&mon->out_lock);
+qemu_mutex_init(&mon->mon_lock);
 qemu_mutex_init(&mon->qmp.qmp_queue_lock);
 mon->outbuf = qstring_new();
 /* Use *mon_cmds by default. */
@@ -744,7 +749,7 @@ static void monitor_data_destroy(Monitor *mon)
 }
 readline_free(mon->rs);
 qobject_unref(mon->outbuf);
-qemu_mutex_destroy(&mon->out_lock);
+qemu_mutex_destroy(&mon->mon_lock);
 qemu_mutex_destroy(&mon->qmp.qmp_queue_lock);
 monitor_qmp_cleanup_req_queue_locked(mon);
 monitor_qmp_cleanup_resp_queue_locked(mon);
@@ -776,13 +781,13 @@ char *qmp_human_monitor_command(const char *command_line, 
bool has_cpu_index,
 handle_hmp_command(&hmp, command_line);
 cur_mon = old_mon;
 
-qemu_mutex_lock(&hmp.out_lock);
+qemu_mutex_lock(&hmp.mon_lock);
 if (qstring_get_length(hmp.outbuf) > 0) {
 output = g_strdup(qstring_get_str(hmp.outbuf));
 } else {
 output = g_strdup("");
 }
-qemu_mutex_unlock(&hmp.out_lock);
+qemu_mutex_unlock(&hmp.mon_lock);
 
 out:
 monitor_data_destroy(&hmp);
@@ -4381,9 +4386,9 @@ static void monitor_event(void *opaque, int event)
 
 switch (event) {
 case CHR_EVENT_MUX_IN:
-qemu_mutex_lock(&mon->out_lock);
+qemu_mutex_lock(&mon->mon_lock);
 mon->mux_out = 0;
-qemu_mutex_unlock(&mon->out_lock);
+qemu_mutex_unlock(&mon->mon_lock);
 if (mon->reset_seen) {
 readline_restart(mon->rs);
 monitor_resume(mon);
@@ -4403,9 +4408,9 @@ static void monitor_event(void *opaque, int event)
 } else {
 atomic_inc(&mon->suspend_cnt);
 }
-qemu_mutex_lock(&mon->out_lock);
+qemu_mutex_lock(&mon->mon_lock);
 mon->mux_out = 1;
-qemu_mutex_unlock(&mon->out_lock);
+qemu_mutex_unlock(&mon->mon_lock);
 break;
 
 case

[Qemu-devel] [PATCH v10 5/7] monitor: remove event_clock_type

2018-06-07 Thread Peter Xu

Instead, use a dynamic function to detect which clock we'll use.  The
problem is that the old code will let monitor initialization depend on
configure_accelerator() (that's where qtest_enabled() start to take
effect).  After this change, we don't have such a dependency any more.
We just need to make sure configure_accelerator() is called when we
start to use it.  Now it's only used in monitor_qapi_event_queue() and
monitor_qapi_event_handler(), so we're good.

Suggested-by: Markus Armbruster 
Signed-off-by: Peter Xu 
---
 monitor.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/monitor.c b/monitor.c
index fa9c35631f..2bda0dfa12 100644
--- a/monitor.c
+++ b/monitor.c
@@ -281,8 +281,6 @@ QmpCommandList qmp_commands, qmp_cap_negotiation_commands;
 
 Monitor *cur_mon;
 
-static QEMUClockType event_clock_type = QEMU_CLOCK_REALTIME;
-
 static void monitor_command_cb(void *opaque, const char *cmdline,
void *readline_opaque);
 
@@ -309,6 +307,19 @@ static inline bool monitor_is_hmp_non_interactive(const 
Monitor *mon)
 return !monitor_is_qmp(mon) && !monitor_uses_readline(mon);
 }
 
+/*
+ * This should never be called before configure_accelerator() since
+ * only until then could we know whether qtest was enabled or not.
+ */
+static inline QEMUClockType monitor_get_clock(void)
+{
+/*
+ * This allows us to perform tests on the monitor queues to verify
+ * that the rate limits are enforced.
+ */
+return qtest_enabled() ? QEMU_CLOCK_VIRTUAL : QEMU_CLOCK_REALTIME;
+}
+
 /**
  * Is the current monitor, if any, a QMP monitor?
  */
@@ -632,7 +643,7 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, 
Error **errp)
  * monitor_qapi_event_handler() in evconf->rate ns.  Any
  * events arriving before then will be delayed until then.
  */
-int64_t now = qemu_clock_get_ns(event_clock_type);
+int64_t now = qemu_clock_get_ns(monitor_get_clock());
 
 monitor_qapi_event_emit(event, qdict);
 
@@ -640,7 +651,7 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, 
Error **errp)
 evstate->event = event;
 evstate->data = qobject_ref(data);
 evstate->qdict = NULL;
-evstate->timer = timer_new_ns(event_clock_type,
+evstate->timer = timer_new_ns(monitor_get_clock(),
   monitor_qapi_event_handler,
   evstate);
 g_hash_table_add(monitor_qapi_event_state, evstate);
@@ -665,7 +676,7 @@ static void monitor_qapi_event_handler(void *opaque)
 qemu_mutex_lock(&monitor_lock);
 
 if (evstate->qdict) {
-int64_t now = qemu_clock_get_ns(event_clock_type);
+int64_t now = qemu_clock_get_ns(monitor_get_clock());
 
 monitor_qapi_event_emit(evstate->event, evstate->qdict);
 qobject_unref(evstate->qdict);
@@ -721,10 +732,6 @@ static gboolean qapi_event_throttle_equal(const void *a, 
const void *b)
 
 static void monitor_qapi_event_init(void)
 {
-if (qtest_enabled()) {
-event_clock_type = QEMU_CLOCK_VIRTUAL;
-}
-
 monitor_qapi_event_state = g_hash_table_new(qapi_event_throttle_hash,
 qapi_event_throttle_equal);
 qmp_event_set_func_emit(monitor_qapi_event_queue);
-- 
2.17.1

Re: [Qemu-devel] [PATCH v2 2/4] hw/ppc/spapr_drc: Replace error_setg(&error_abort) by abort()

2018-06-07 Thread Philippe Mathieu-Daudé

Hi David,

On 06/08/2018 12:03 AM, David Gibson wrote:
> On Thu, Jun 07, 2018 at 11:46:43AM -0300, Philippe Mathieu-Daudé wrote:
>> Use abort() instead of error_setg(&error_abort),
>> as suggested by the "qapi/error.h" documentation:
>>
>> Please don't error_setg(&error_fatal, ...), use error_report() and
>> exit(), because that's more obvious.
>> Likewise, don't error_setg(&error_abort, ...), use assert().
>>
>> Use abort() instead of the suggested assert() because the assertion is
>> already verified by the switch case.
> 
> I think g_assert_not_reached() would be the right thing here.

I try to follow Eric advice (who recalled Markus).
As I understand:
http://lists.gnu.org/archive/html/qemu-devel/2017-08/msg03605.html

"glib-Testing [...] should not be used outside of tests/."

I can respin if you prefer.

>>
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  hw/ppc/spapr_drc.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
>> index 8a045d6b93..b934b9c9ed 100644
>> --- a/hw/ppc/spapr_drc.c
>> +++ b/hw/ppc/spapr_drc.c
>> @@ -366,7 +366,7 @@ static void prop_get_fdt(Object *obj, Visitor *v, const 
>> char *name,
>>  break;
>>  }
>>  default:
>> -error_setg(&error_abort, "device FDT in unexpected state: %d", 
>> tag);
>> +abort(); /* device FDT in unexpected state */
>>  }
>>  fdt_offset = fdt_offset_next;
>>  } while (fdt_depth != 0);
> 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v9 5/7] monitor: remove event_clock_type

2018-06-07 Thread Peter Xu

On Thu, Jun 07, 2018 at 04:32:54PM +0200, Markus Armbruster wrote:
> Peter Xu  writes:
> 
> > Instead, use a dynamic function to detect which clock we'll use.  The
> > problem is that the old code will let monitor initialization depends on
> > qtest_enabled().  After this change, we don't have such a dependency any
> > more.
> >
> > Suggested-by: Markus Armbruster 
> > Signed-off-by: Peter Xu 
> > ---
> >  monitor.c | 21 -
> >  1 file changed, 12 insertions(+), 9 deletions(-)
> >
> > diff --git a/monitor.c b/monitor.c
> > index 2504696d76..bd9ab5597f 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -282,8 +282,6 @@ QmpCommandList qmp_commands, 
> > qmp_cap_negotiation_commands;
> >  
> >  Monitor *cur_mon;
> >  
> > -static QEMUClockType event_clock_type = QEMU_CLOCK_REALTIME;
> > -
> >  static void monitor_command_cb(void *opaque, const char *cmdline,
> > void *readline_opaque);
> >  
> > @@ -310,6 +308,15 @@ static inline bool 
> > monitor_is_hmp_non_interactive(const Monitor *mon)
> >  return !monitor_is_qmp(mon) && !monitor_uses_readline(mon);
> >  }
> >  
> > +static inline QEMUClockType monitor_get_clock(void)
> > +{
> > +if (qtest_enabled()) {
> > +return QEMU_CLOCK_VIRTUAL;
> > +} else {
> > +return QEMU_CLOCK_REALTIME;
> > +}
> 
> Suggest the more laconic
> 
>return qtest_enabled() ? QEMU_CLOCK_VIRTUAL : QEMU_CLOCK_REALTIME;
> 
> A comment explaining why we want QEMU_CLOCK_VIRTUAL would be nice.

Will do.

-- 
Peter Xu

[Qemu-devel] [PATCH v10 2/7] monitor: protect mon->fds with mon_lock

2018-06-07 Thread Peter Xu

mon->fds were protected by BQL.  Now protect it by mon_lock so that it
can even be used in monitor iothread.

Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Markus Armbruster 
Signed-off-by: Peter Xu 
---
 monitor.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/monitor.c b/monitor.c
index 5bc9b2dcd0..0fba3ccc20 100644
--- a/monitor.c
+++ b/monitor.c
@@ -212,7 +212,6 @@ struct Monitor {
 BlockCompletionFunc *password_completion_cb;
 void *password_opaque;
 mon_cmd_t *cmd_table;
-QLIST_HEAD(,mon_fd_t) fds;
 QTAILQ_ENTRY(Monitor) entry;
 
 /*
@@ -224,6 +223,7 @@ struct Monitor {
 /*
  * Fields that are protected by the per-monitor lock.
  */
+QLIST_HEAD(, mon_fd_t) fds;
 QString *outbuf;
 guint out_watch;
 /* Read under either BQL or mon_lock, written with BQL+mon_lock.  */
@@ -2187,7 +2187,7 @@ static void hmp_acl_remove(Monitor *mon, const QDict 
*qdict)
 void qmp_getfd(const char *fdname, Error **errp)
 {
 mon_fd_t *monfd;
-int fd;
+int fd, tmp_fd;
 
 fd = qemu_chr_fe_get_msgfd(&cur_mon->chr);
 if (fd == -1) {
@@ -2202,13 +2202,17 @@ void qmp_getfd(const char *fdname, Error **errp)
 return;
 }
 
+qemu_mutex_lock(&cur_mon->mon_lock);
 QLIST_FOREACH(monfd, &cur_mon->fds, next) {
 if (strcmp(monfd->name, fdname) != 0) {
 continue;
 }
 
-close(monfd->fd);
+tmp_fd = monfd->fd;
 monfd->fd = fd;
+qemu_mutex_unlock(&cur_mon->mon_lock);
+/* Make sure close() is out of critical section */
+close(tmp_fd);
 return;
 }
 
@@ -2217,24 +2221,31 @@ void qmp_getfd(const char *fdname, Error **errp)
 monfd->fd = fd;
 
 QLIST_INSERT_HEAD(&cur_mon->fds, monfd, next);
+qemu_mutex_unlock(&cur_mon->mon_lock);
 }
 
 void qmp_closefd(const char *fdname, Error **errp)
 {
 mon_fd_t *monfd;
+int tmp_fd;
 
+qemu_mutex_lock(&cur_mon->mon_lock);
 QLIST_FOREACH(monfd, &cur_mon->fds, next) {
 if (strcmp(monfd->name, fdname) != 0) {
 continue;
 }
 
 QLIST_REMOVE(monfd, next);
-close(monfd->fd);
+tmp_fd = monfd->fd;
 g_free(monfd->name);
 g_free(monfd);
+qemu_mutex_unlock(&cur_mon->mon_lock);
+/* Make sure close() is out of critical section */
+close(tmp_fd);
 return;
 }
 
+qemu_mutex_unlock(&cur_mon->mon_lock);
 error_setg(errp, QERR_FD_NOT_FOUND, fdname);
 }
 
@@ -2242,6 +2253,7 @@ int monitor_get_fd(Monitor *mon, const char *fdname, 
Error **errp)
 {
 mon_fd_t *monfd;
 
+qemu_mutex_lock(&mon->mon_lock);
 QLIST_FOREACH(monfd, &mon->fds, next) {
 int fd;
 
@@ -2255,10 +2267,12 @@ int monitor_get_fd(Monitor *mon, const char *fdname, 
Error **errp)
 QLIST_REMOVE(monfd, next);
 g_free(monfd->name);
 g_free(monfd);
+qemu_mutex_unlock(&mon->mon_lock);
 
 return fd;
 }
 
+qemu_mutex_unlock(&mon->mon_lock);
 error_setg(errp, "File descriptor named '%s' has not been found", fdname);
 return -1;
 }
-- 
2.17.1

Re: [Qemu-devel] [PATCH 4/4] hw/sd/omap_mmc: Reset SD card on controller reset

2018-06-07 Thread Philippe Mathieu-Daudé

Hi Peter,

On 01/09/2018 11:01 AM, Peter Maydell wrote:
> Since omap_mmc is still using the legacy SD card API, the SD
> card created by sd_init() is not plugged into any bus. This
> means that the controller has to reset it manually.
> 
> Failing to do this mostly didn't affect the guest since the
> guest typically does a programmed SD card reset as part of
> its SD controller driver initialization, but would mean that
> migration fails because it's only in sd_reset() that we
> set up the wpgrps_size field.
> 
> Signed-off-by: Peter Maydell 
> ---
> This one isn't cc-stable because the OMAP boards don't
> support migration at all anyway, being un-QOMified.
> ---
>  hw/sd/omap_mmc.c | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/sd/omap_mmc.c b/hw/sd/omap_mmc.c
> index e934cd3..5b47cad 100644
> --- a/hw/sd/omap_mmc.c
> +++ b/hw/sd/omap_mmc.c
> @@ -305,6 +305,12 @@ void omap_mmc_reset(struct omap_mmc_s *host)
>  host->cdet_enable = 0;
>  qemu_set_irq(host->coverswitch, host->cdet_state);
>  host->clkdiv = 0;
> +
> +/* Since we're still using the legacy SD API the card is not plugged
> + * into any bus, and we must reset it manually. When omap_mmc is
> + * QOMified this must move into the QOM reset function.
> + */
> +device_reset(DEVICE(host->card));
>  }
>  
>  static uint64_t omap_mmc_read(void *opaque, hwaddr offset,
> @@ -587,8 +593,6 @@ struct omap_mmc_s *omap_mmc_init(hwaddr base,
>  s->lines = 1;/* TODO: needs to be settable per-board */
>  s->rev = 1;
>  
> -omap_mmc_reset(s);
> -
>  memory_region_init_io(&s->iomem, NULL, &omap_mmc_ops, s, "omap.mmc", 
> 0x800);
>  memory_region_add_subregion(sysmem, base, &s->iomem);
>  
> @@ -598,6 +602,8 @@ struct omap_mmc_s *omap_mmc_init(hwaddr base,
>  exit(1);
>  }
>  
> +omap_mmc_reset(s);
> +
>  return s;
>  }
>  
> @@ -613,8 +619,6 @@ struct omap_mmc_s *omap2_mmc_init(struct 
> omap_target_agent_s *ta,
>  s->lines = 4;
>  s->rev = 2;
>  
> -omap_mmc_reset(s);
> -
>  memory_region_init_io(&s->iomem, NULL, &omap_mmc_ops, s, "omap.mmc",
>omap_l4_region_size(ta, 0));
>  omap_l4_attach(ta, 0, &s->iomem);
> @@ -628,6 +632,8 @@ struct omap_mmc_s *omap2_mmc_init(struct 
> omap_target_agent_s *ta,
>  s->cdet = qemu_allocate_irq(omap_mmc_cover_cb, s, 0);
>  sd_set_cb(s->card, NULL, s->cdet);
>  
> +omap_mmc_reset(s);
> +
>  return s;
>  }

This patch broke something in the Nokia N810 tablet.

I used your image:

http://people.linaro.org/~peter.maydell/n8x0-images.tgz

ecd219f7abbc17b9d9170206410355bba287831f is the first bad commit
commit ecd219f7abbc17b9d9170206410355bba287831f
Author: Peter Maydell 
Date:   Tue Jan 16 13:28:13 2018 +

hw/sd/omap_mmc: Reset SD card on controller reset

Signed-off-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 

Using: -append "console=ttyS1"

Before:

[1.239471] mmci-omap mmci-omap.0: command timeout (CMD52)
[1.240356] mmci-omap mmci-omap.0: command timeout (CMD52)
[1.253967] mmci-omap mmci-omap.0: command timeout (CMD5)
[1.254364] mmci-omap mmci-omap.0: command timeout (CMD5)
[1.254730] mmci-omap mmci-omap.0: command timeout (CMD5)
[1.255096] mmci-omap mmci-omap.0: command timeout (CMD5)
omap_dma4_write: Read-only register 0x34
omap_dma4_write: Read-only register 0x38
omap_dma4_write: Read-only register 0x3c
omap_dma4_write: Read-only register 0x40
omap_dma4_write: Read-only register 0x38
[1.263275] mmc0: host does not support reading read-only switch.
assuming write-enable.
[1.264038] mmc0: new SDHC card at address 4567
omap_uart_read: Bad register 0x34
omap_uart_write: Bad register 0x34
omap_uart_read: Bad register 0x34
omap_uart_write: Bad register 0x34
[1.327514] Waiting for root device /dev/mmcblk0p1...
[1.329925] mmcblk0: mmc0:4567 QEMU! 1.81 GiB
[1.333831]  mmcblk0:omap_dma4_write: Read-only register 0x38
 p1 p2
[1.425537] mmci-omap mmci-omap.0: command timeout (CMD52)
[1.426727] mmci-omap mmci-omap.0: command timeout (CMD52)
omap_dma4_write: Read-only register 0x38
[1.478668] mmci-omap mmci-omap.0: command timeout (CMD8)
omap_dma4_write: Read-only register 0x38
omap_dma4_write: Read-only register 0x38
[1.484436] mmci-omap mmci-omap.0: command timeout (CMD5)
[1.485015] mmci-omap mmci-omap.0: command timeout (CMD5)
[1.485595] mmci-omap mmci-omap.0: command timeout (CMD5)
[1.486022] mmci-omap mmci-omap.0: command timeout (CMD5)
[1.486480] mmci-omap mmci-omap.0: command timeout (CMD55)
[1.487030] mmci-omap mmci-omap.0: command timeout (CMD55)
omap_dma4_write: Read-only register 0x38
[1.490600] EXT3-fs (mmcblk0p1): recovery required on readonly filesystem
[1.491149] EXT3-fs (mmcblk0p1): write access will be ena

Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-07 Thread Peter Xu

On Fri, Jun 08, 2018 at 05:49:26AM +0300, Michael S. Tsirkin wrote:
> On Fri, Jun 08, 2018 at 10:34:25AM +0800, Peter Xu wrote:
> > On Fri, Jun 08, 2018 at 04:58:21AM +0300, Michael S. Tsirkin wrote:
> > > On Fri, Jun 08, 2018 at 09:37:23AM +0800, Peter Xu wrote:
> > > > > > Asked since I see this in virtio spec (v1.0, though I guess it won't
> > > > > > change) in chapter "2.2.1 Driver Requirements: Feature Bits":
> > > > > > 
> > > > > > "The driver MUST NOT accept a feature which the device did not 
> > > > > > offer"
> > > > > > 
> > > > > > Then I'm curious what would happen if:
> > > > > > 
> > > > > > - a emulator (not QEMU) only offered F_FREE_PAGE_HINT, not F_POISON
> > > > > > - a guest that enabled PAGE_POISON
> > > > > > 
> > > > > > Then how the driver could tell the host that PAGE_POISON is enabled
> > > > > > considering that guest should never set that feature bit if the
> > > > > > emulation code didn't provide it?
> > > 
> > > It wouldn't. It just has to deal with the fact that host can discard
> > > writes to hinted pages. Right now driver deals with it simply by
> > > disabling F_FREE_PAGE_HINT.
> > 
> > Ah I see.  Thanks Michael.
> > 
> > Then it seems to me that it's more important to implement the F_POISON
> > along with where it is declared since otherwise it'll be a real broken
> > (device declares F_POISON, guest assumes it can handle the POISON so
> > guest will enable FREE_PAGE_HINT, however the device can't really
> > handle that).
> 
> It seems to handle it fine, it just ignores the hints.

Ok I misunderstood.  Then that's fine.

The message in the commit message is a bit misleading:

"TODO: - handle the case when page poisoning is in use"

It seems to me that:

"Now we handle the page poisoning by dropping the page hints
 directly.  In the future we might do something better."

Regards,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v2] target/ppc: extend eieio for POWER9

2018-06-07 Thread David Gibson

On Wed, Jun 06, 2018 at 09:33:53AM +0200, Cédric Le Goater wrote:
> POWER9 introduced a new variant of the eieio instruction using bit 6
> as a hint to tell the CPU it is a store-forwarding barrier.
> 
> The usage of this eieio extension was recently added in Linux 4.17
> which activated the "support for a store forwarding barrier at kernel
> entry/exit".
> 
> Unfortunately, it is not possible to insert this new eieio instruction
> without considerable change in ppc_tr_translate_insn(). So instead we
> loosen the QEMU eieio instruction mask and modify the gen_eieio()
> helper to test for bit6. On non-POWER9 CPUs, the bit6 is just ignored
> but a warning is emitted as this is not an instruction software should
> be using.
> 
> Signed-off-by: Cédric Le Goater 

Applied to ppc-for-3.0.

> ---
> 
>  Changes since v1:
> 
>  - removed specific PPC2_MEM_EIEIO2 flag
>  - ignore bit6 on non-POWER9 CPU
> 
>  target/ppc/translate.c | 25 +++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 8ba8f67dc513..5fe1ba655599 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -2967,7 +2967,28 @@ static void gen_stswx(DisasContext *ctx)
>  /* eieio */
>  static void gen_eieio(DisasContext *ctx)
>  {
> -tcg_gen_mb(TCG_MO_LD_ST | TCG_BAR_SC);
> +TCGBar bar = TCG_MO_LD_ST;
> +
> +/*
> + * POWER9 has a eieio instruction variant using bit 6 as a hint to
> + * tell the CPU it is a store-forwarding barrier.
> + */
> +if (ctx->opcode & 0x200) {
> +/*
> + * ISA says that "Reserved fields in instructions are ignored
> + * by the processor". So ignore the bit 6 on non-POWER9 CPU but
> + * as this is not an instruction software should be using,
> + * complain to the user.
> + */
> +if (!(ctx->insns_flags2 & PPC2_ISA300)) {
> +qemu_log_mask(LOG_GUEST_ERROR, "invalid eieio using bit 6 at @"
> +  TARGET_FMT_lx "\n", ctx->base.pc_next - 4);
> +} else {
> +bar = TCG_MO_ST_LD;
> +}
> +}
> +
> +tcg_gen_mb(bar | TCG_BAR_SC);
>  }
>  
>  #if !defined(CONFIG_USER_ONLY)
> @@ -6483,7 +6504,7 @@ GEN_HANDLER(lswi, 0x1F, 0x15, 0x12, 0x0001, 
> PPC_STRING),
>  GEN_HANDLER(lswx, 0x1F, 0x15, 0x10, 0x0001, PPC_STRING),
>  GEN_HANDLER(stswi, 0x1F, 0x15, 0x16, 0x0001, PPC_STRING),
>  GEN_HANDLER(stswx, 0x1F, 0x15, 0x14, 0x0001, PPC_STRING),
> -GEN_HANDLER(eieio, 0x1F, 0x16, 0x1A, 0x03FFF801, PPC_MEM_EIEIO),
> +GEN_HANDLER(eieio, 0x1F, 0x16, 0x1A, 0x01FFF801, PPC_MEM_EIEIO),
>  GEN_HANDLER(isync, 0x13, 0x16, 0x04, 0x03FFF801, PPC_MEM),
>  GEN_HANDLER_E(lbarx, 0x1F, 0x14, 0x01, 0, PPC_NONE, PPC2_ATOMIC_ISA206),
>  GEN_HANDLER_E(lharx, 0x1F, 0x14, 0x03, 0, PPC_NONE, PPC2_ATOMIC_ISA206),

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v2 2/4] hw/ppc/spapr_drc: Replace error_setg(&error_abort) by abort()

2018-06-07 Thread David Gibson

On Thu, Jun 07, 2018 at 11:46:43AM -0300, Philippe Mathieu-Daudé wrote:
> Use abort() instead of error_setg(&error_abort),
> as suggested by the "qapi/error.h" documentation:
> 
> Please don't error_setg(&error_fatal, ...), use error_report() and
> exit(), because that's more obvious.
> Likewise, don't error_setg(&error_abort, ...), use assert().
> 
> Use abort() instead of the suggested assert() because the assertion is
> already verified by the switch case.

I think g_assert_not_reached() would be the right thing here.

> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/ppc/spapr_drc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 8a045d6b93..b934b9c9ed 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -366,7 +366,7 @@ static void prop_get_fdt(Object *obj, Visitor *v, const 
> char *name,
>  break;
>  }
>  default:
> -error_setg(&error_abort, "device FDT in unexpected state: %d", 
> tag);
> +abort(); /* device FDT in unexpected state */
>  }
>  fdt_offset = fdt_offset_next;
>  } while (fdt_depth != 0);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v2 4/4] device_tree: Replace error_setg(&error_fatal) by error_report() + exit()

2018-06-07 Thread David Gibson

On Thu, Jun 07, 2018 at 11:46:45AM -0300, Philippe Mathieu-Daudé wrote:
> Use error_report() + exit() instead of error_setg(&error_fatal),
> as suggested by the "qapi/error.h" documentation:
> 
>Please don't error_setg(&error_fatal, ...), use error_report() and
>exit(), because that's more obvious.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> Reviewed-by: Eric Auger 
> Reviewed-by: Markus Armbruster 

Reviewed-by: David Gibson 

> ---
>  device_tree.c | 23 +--
>  1 file changed, 13 insertions(+), 10 deletions(-)
> 
> diff --git a/device_tree.c b/device_tree.c
> index 52c3358a55..3553819257 100644
> --- a/device_tree.c
> +++ b/device_tree.c
> @@ -140,15 +140,16 @@ static void read_fstree(void *fdt, const char *dirname)
>  const char *parent_node;
>  
>  if (strstr(dirname, root_dir) != dirname) {
> -error_setg(&error_fatal, "%s: %s must be searched within %s",
> -   __func__, dirname, root_dir);
> +error_report("%s: %s must be searched within %s",
> + __func__, dirname, root_dir);
> +exit(1);
>  }
>  parent_node = &dirname[strlen(SYSFS_DT_BASEDIR)];
>  
>  d = opendir(dirname);
>  if (!d) {
> -error_setg(&error_fatal, "%s cannot open %s", __func__, dirname);
> -return;
> +error_report("%s cannot open %s", __func__, dirname);
> +exit(1);
>  }
>  
>  while ((de = readdir(d)) != NULL) {
> @@ -162,7 +163,8 @@ static void read_fstree(void *fdt, const char *dirname)
>  tmpnam = g_strdup_printf("%s/%s", dirname, de->d_name);
>  
>  if (lstat(tmpnam, &st) < 0) {
> -error_setg(&error_fatal, "%s cannot lstat %s", __func__, tmpnam);
> +error_report("%s cannot lstat %s", __func__, tmpnam);
> +exit(1);
>  }
>  
>  if (S_ISREG(st.st_mode)) {
> @@ -170,8 +172,9 @@ static void read_fstree(void *fdt, const char *dirname)
>  gsize len;
>  
>  if (!g_file_get_contents(tmpnam, &val, &len, NULL)) {
> -error_setg(&error_fatal, "%s not able to extract info from 
> %s",
> -   __func__, tmpnam);
> +error_report("%s not able to extract info from %s",
> + __func__, tmpnam);
> +exit(1);
>  }
>  
>  if (strlen(parent_node) > 0) {
> @@ -206,9 +209,9 @@ void *load_device_tree_from_sysfs(void)
>  host_fdt = create_device_tree(&host_fdt_size);
>  read_fstree(host_fdt, SYSFS_DT_BASEDIR);
>  if (fdt_check_header(host_fdt)) {
> -error_setg(&error_fatal,
> -   "%s host device tree extracted into memory is invalid",
> -   __func__);
> +error_report("%s host device tree extracted into memory is invalid",
> + __func__);
> +exit(1);
>  }
>  return host_fdt;
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

[Qemu-devel] [PATCH] cpu hot-del: leak fix by free the relevant members

2018-06-07 Thread liujunjie

THese leaks are found by ASAN with CPU hot-add and hot-del actions,
such as:
==14127==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 4096 byte(s) in 1 object(s) allocated from:
#0 0x7fc321cb6ec0 in posix_memalign 
(/usr/local/gcc-6-4/lib64/libasan.so.3+0xc7ec0)
#1 0xf756b9 in qemu_try_memalign util/oslib-posix.c:110
#2 0xf7575b in qemu_memalign util/oslib-posix.c:126
#3 0x7769cb in kvm_arch_init_vcpu /root/qemu/target/i386/kvm.c:1103
#4 0x4cc4e6 in qemu_kvm_cpu_thread_fn /root/qemu/cpus.c:1201
#5 0x7fc31cb28dc4 in start_thread (/usr/lib64/libpthread.so.0+0x7dc4)

Direct leak of 4096 byte(s) in 1 object(s) allocated from:
#0 0x7fc321cb6560 in calloc (/usr/local/gcc-6-4/lib64/libasan.so.3+0xc7560)
#1 0x7fc31cf944d6 in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x4f4d6)
#2 0x4cc4e6 in qemu_kvm_cpu_thread_fn /root/qemu/cpus.c:1201
#3 0x7fc31cb28dc4 in start_thread (/usr/lib64/libpthread.so.0+0x7dc4)

Direct leak of 184 byte(s) in 1 object(s) allocated from:
#0 0x7fc321cb6560 in calloc (/usr/local/gcc-6-4/lib64/libasan.so.3+0xc7560)
#1 0x7fc31cf944d6 in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x4f4d6)
#2 0x4cda83 in qemu_init_vcpu /root/qemu/cpus.c:1993
#3 0x6aedbd in x86_cpu_realizefn /root/qemu/target/i386/cpu.c:4739
#4 0x8edfc8 in device_set_realized hw/core/qdev.c:826
#5 0xcff60c in property_set_bool qom/object.c:1928
#6 0xd09bd2 in object_property_set_qobject qom/qom-qobject.c:27
#7 0xd048e3 in object_property_set_bool qom/object.c:1188
#8 0x79ad83 in qdev_device_add /root/qemu/qdev-monitor.c:627
#9 0x79b50c in qmp_device_add /root/qemu/qdev-monitor.c:807
#10 0xf4ca5d in do_qmp_dispatch qapi/qmp-dispatch.c:119
#11 0xf4ce3c in qmp_dispatch qapi/qmp-dispatch.c:168
#12 0x4e2a5a in monitor_qmp_dispatch_one /root/qemu/monitor.c:4088
#13 0x4e2baf in monitor_qmp_bh_dispatcher /root/qemu/monitor.c:4146
#14 0xf67ad1 in aio_bh_poll util/async.c:118
#15 0xf724a3 in aio_dispatch util/aio-posix.c:436
#16 0xf67270 in aio_ctx_dispatch util/async.c:261
#17 0x7fc31cf8e999 in g_main_context_dispatch 
(/usr/lib64/libglib-2.0.so.0+0x4)

Direct leak of 64 byte(s) in 2 object(s) allocated from:
#0 0x7fc321cb6560 in calloc (/usr/local/gcc-6-4/lib64/libasan.so.3+0xc7560)
#1 0x7fc31cf944d6 in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x4f4d6)
#2 0x4cda1f in qemu_init_vcpu /root/qemu/cpus.c:1997
#3 0x6aedbd in x86_cpu_realizefn /root/qemu/target/i386/cpu.c:4739
#4 0x8edfc8 in device_set_realized hw/core/qdev.c:826
#5 0xcff60c in property_set_bool qom/object.c:1928
#6 0xd09bd2 in object_property_set_qobject qom/qom-qobject.c:27
#7 0xd048e3 in object_property_set_bool qom/object.c:1188
#8 0x79ad83 in qdev_device_add /root/qemu/qdev-monitor.c:627
#9 0x79b50c in qmp_device_add /root/qemu/qdev-monitor.c:807
#10 0xf4ca5d in do_qmp_dispatch qapi/qmp-dispatch.c:119
#11 0xf4ce3c in qmp_dispatch qapi/qmp-dispatch.c:168
#12 0x4e2a5a in monitor_qmp_dispatch_one /root/qemu/monitor.c:4088
#13 0x4e2baf in monitor_qmp_bh_dispatcher /root/qemu/monitor.c:4146
#14 0xf67ad1 in aio_bh_poll util/async.c:118
#15 0xf724a3 in aio_dispatch util/aio-posix.c:436
#16 0xf67270 in aio_ctx_dispatch util/async.c:261
#17 0x7fc31cf8e999 in g_main_context_dispatch 
(/usr/lib64/libglib-2.0.so.0+0x4)

SUMMARY: AddressSanitizer: 8440 byte(s) leaked in 5 allocation(s).

The relevant members in CPU Structure are freed to fix leak. Meanwhile, 
although the
VMChangeStateEntry added in kvm_arch_init_vcpu does not be reportd by ASAN 
since it still
in vm_change_state_head, it's not longer used after hot-del, so free it, too.

Signed-off-by: liujunjie 
Signed-off-by: linzhecheng 
---
 accel/kvm/kvm-all.c  |  1 +
 cpus.c   |  6 ++
 include/sysemu/kvm.h |  2 ++
 target/arm/kvm.c |  4 
 target/i386/cpu.h|  2 ++
 target/i386/kvm.c| 19 ++-
 6 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index ffee68e..ff5e738 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -305,6 +305,7 @@ int kvm_destroy_vcpu(CPUState *cpu)
 vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
 vcpu->kvm_fd = cpu->kvm_fd;
 QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
+kvm_arch_destroy_vcpu(cpu);
 err:
 return ret;
 }
diff --git a/cpus.c b/cpus.c
index d1f1629..cbe28d6 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1842,6 +1842,12 @@ void cpu_remove_sync(CPUState *cpu)
 qemu_mutex_unlock_iothread();
 qemu_thread_join(cpu->thread);
 qemu_mutex_lock_iothread();
+g_free(cpu->thread);
+cpu->thread = NULL;
+g_free(cpu->halt_cond);
+cpu->halt_cond = NULL;
+g_free(cpu->cpu_ases);
+cpu->cpu_ases = NULL;
 }
 
 /* For temporary buffers for forming a name */
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 0b64b8e..f7db7db 100644
--- a/include/sysemu/kvm

Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-07 Thread Michael S. Tsirkin

On Fri, Jun 08, 2018 at 10:34:25AM +0800, Peter Xu wrote:
> On Fri, Jun 08, 2018 at 04:58:21AM +0300, Michael S. Tsirkin wrote:
> > On Fri, Jun 08, 2018 at 09:37:23AM +0800, Peter Xu wrote:
> > > > > Asked since I see this in virtio spec (v1.0, though I guess it won't
> > > > > change) in chapter "2.2.1 Driver Requirements: Feature Bits":
> > > > > 
> > > > > "The driver MUST NOT accept a feature which the device did not offer"
> > > > > 
> > > > > Then I'm curious what would happen if:
> > > > > 
> > > > > - a emulator (not QEMU) only offered F_FREE_PAGE_HINT, not F_POISON
> > > > > - a guest that enabled PAGE_POISON
> > > > > 
> > > > > Then how the driver could tell the host that PAGE_POISON is enabled
> > > > > considering that guest should never set that feature bit if the
> > > > > emulation code didn't provide it?
> > 
> > It wouldn't. It just has to deal with the fact that host can discard
> > writes to hinted pages. Right now driver deals with it simply by
> > disabling F_FREE_PAGE_HINT.
> 
> Ah I see.  Thanks Michael.
> 
> Then it seems to me that it's more important to implement the F_POISON
> along with where it is declared since otherwise it'll be a real broken
> (device declares F_POISON, guest assumes it can handle the POISON so
> guest will enable FREE_PAGE_HINT, however the device can't really
> handle that).

It seems to handle it fine, it just ignores the hints.

> Or, if the guest driver is capable to drop F_FREE_PAGE_HINT when
> F_POISON is not declared, we can safely split the two features into
> two patches in QEMU too.
> 
> Regards,
> 
> -- 
> Peter Xu

[Qemu-devel] [PATCH V6 RESEND 7/7] migration/ram: ensure write persistence on loading all data to PMEM.

2018-06-07 Thread junyan . he

From: Junyan He 

Because we need to make sure the pmem kind memory data is synced
after migration, we choose to call pmem_persist() when the migration
finish. This will make sure the data of pmem is safe and will not
lose if power is off.

Signed-off-by: Junyan He 
---
 include/qemu/pmem.h | 1 +
 migration/ram.c | 8 
 stubs/pmem.c| 4 
 3 files changed, 13 insertions(+)

diff --git a/include/qemu/pmem.h b/include/qemu/pmem.h
index 00d6680..8f52b08 100644
--- a/include/qemu/pmem.h
+++ b/include/qemu/pmem.h
@@ -17,6 +17,7 @@
 #else  /* !CONFIG_LIBPMEM */
 
 void *pmem_memcpy_persist(void *pmemdest, const void *src, size_t len);
+void pmem_persist(const void *addr, size_t len);
 
 #endif /* CONFIG_LIBPMEM */
 
diff --git a/migration/ram.c b/migration/ram.c
index aa0c6f0..15418c2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -33,6 +33,7 @@
 #include "qemu/bitops.h"
 #include "qemu/bitmap.h"
 #include "qemu/main-loop.h"
+#include "qemu/pmem.h"
 #include "xbzrle.h"
 #include "ram.h"
 #include "migration.h"
@@ -3046,6 +3047,13 @@ static int ram_load_setup(QEMUFile *f, void *opaque)
 static int ram_load_cleanup(void *opaque)
 {
 RAMBlock *rb;
+
+RAMBLOCK_FOREACH(rb) {
+if (ramblock_is_pmem(rb)) {
+pmem_persist(rb->host, rb->used_length);
+}
+}
+
 xbzrle_load_cleanup();
 compress_threads_load_cleanup();
 
diff --git a/stubs/pmem.c b/stubs/pmem.c
index b4ec72d..f794262 100644
--- a/stubs/pmem.c
+++ b/stubs/pmem.c
@@ -17,3 +17,7 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, 
size_t len)
 {
 return memcpy(pmemdest, src, len);
 }
+
+void pmem_persist(const void *addr, size_t len)
+{
+}
-- 
2.7.4

Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-07 Thread Peter Xu

On Fri, Jun 08, 2018 at 04:58:21AM +0300, Michael S. Tsirkin wrote:
> On Fri, Jun 08, 2018 at 09:37:23AM +0800, Peter Xu wrote:
> > > > Asked since I see this in virtio spec (v1.0, though I guess it won't
> > > > change) in chapter "2.2.1 Driver Requirements: Feature Bits":
> > > > 
> > > > "The driver MUST NOT accept a feature which the device did not offer"
> > > > 
> > > > Then I'm curious what would happen if:
> > > > 
> > > > - a emulator (not QEMU) only offered F_FREE_PAGE_HINT, not F_POISON
> > > > - a guest that enabled PAGE_POISON
> > > > 
> > > > Then how the driver could tell the host that PAGE_POISON is enabled
> > > > considering that guest should never set that feature bit if the
> > > > emulation code didn't provide it?
> 
> It wouldn't. It just has to deal with the fact that host can discard
> writes to hinted pages. Right now driver deals with it simply by
> disabling F_FREE_PAGE_HINT.

Ah I see.  Thanks Michael.

Then it seems to me that it's more important to implement the F_POISON
along with where it is declared since otherwise it'll be a real broken
(device declares F_POISON, guest assumes it can handle the POISON so
guest will enable FREE_PAGE_HINT, however the device can't really
handle that).

Or, if the guest driver is capable to drop F_FREE_PAGE_HINT when
F_POISON is not declared, we can safely split the two features into
two patches in QEMU too.

Regards,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-07 Thread Peter Xu

On Thu, Jun 07, 2018 at 07:59:22PM +0800, Wei Wang wrote:
> On 06/07/2018 02:32 PM, Peter Xu wrote:
> > On Thu, Jun 07, 2018 at 01:24:29PM +0800, Wei Wang wrote:
> > > On 06/06/2018 07:02 PM, Peter Xu wrote:
> > > > On Wed, Jun 06, 2018 at 06:04:23PM +0800, Wei Wang wrote:
> > > > > On 06/06/2018 01:42 PM, Peter Xu wrote:
> > > > > > IMHO migration states do not suite here.  IMHO bitmap syncing is too
> > > > > > frequently an operation, especially at the end of a precopy 
> > > > > > migration.
> > > > > > If you really want to introduce some notifiers, I would prefer
> > > > > > something new rather than fiddling around with migration state.  
> > > > > > E.g.,
> > > > > > maybe a new migration event notifiers, then introduce two new events
> > > > > > for both start/end of bitmap syncing.
> > > > > Please see if below aligns to what you meant:
> > > > > 
> > > > > MigrationState {
> > > > > ...
> > > > > + int ram_save_state;
> > > > > 
> > > > > }
> > > > > 
> > > > > typedef enum RamSaveState {
> > > > >   RAM_SAVE_BEGIN = 0,
> > > > >   RAM_SAVE_END = 1,
> > > > >   RAM_SAVE_MAX = 2
> > > > > }
> > > > > 
> > > > > then at the step 1) and 3) you concluded somewhere below, we change 
> > > > > the
> > > > > state and invoke the callback.
> > > > I mean something like this:
> > > > 
> > > > 1693c64c27 ("postcopy: Add notifier chain", 2018-03-20)
> > > > 
> > > > That was a postcopy-only notifier.  Maybe we can generalize it into a
> > > > more common notifier for the migration framework so that we can even
> > > > register with non-postcopy events like bitmap syncing?
> > > Precopy already has its own notifiers: git 99a0db9b
> > > If we want to reuse, that one would be more suitable. I think mixing
> > > non-related events into one notifier list isn't nice.
> > I think that's only for migration state changes?
> > 
> > > > > Btw, the migration_state_notifiers is already there, but seems not 
> > > > > really
> > > > > used (I only tracked spice-core.c called
> > > > > add_migration_state_change_notifier). I thought adding new migration 
> > > > > states
> > > > > can reuse all that we have.
> > > > > What's your real concern about that? (not sure how defining new 
> > > > > events would
> > > > > make a difference)
> > > > Migration state is exposed via control path (QMP).  Adding new states
> > > > mean that the QMP clients will see more.  IMO that's not really
> > > > anything that a QMP client will need to know, instead we can keep it
> > > > internally.  That's a reason from compatibility pov.
> > > > 
> > > > Meanwhile, it's not really a state-thing at all for me.  It looks
> > > > really more like hook or event (start/stop of sync).
> > > Thanks for sharing your concerns in detail, which are quite helpful for 
> > > the
> > > discussion. To reuse 99a0db9b, we can also add sub-states (or say events),
> > > instead of new migration states.
> > > For example, we can still define "enum RamSaveState" as above, which can 
> > > be
> > > an indication for the notifier queued on the 99a0db9b notider_list to 
> > > decide
> > > whether to call start or stop.
> > > Does this solve your concern?
> > Frankly speaking I don't fully understand how you would add that
> > sub-state.  If you are confident with the idea, maybe you can post
> > your new version with the change, then I can read the code.
> 
> Sure. Code is more straightforward for this one. Let's check it in the new
> version.
> 
> > > > > > This is not that obvious to me.  For now I think it's true, since 
> > > > > > when
> > > > > > we call stop() we'll take the mutex, meanwhile the mutex is actually
> > > > > > always held by the iothread (in the big loop in
> > > > > > virtio_balloon_poll_free_page_hints) until either:
> > > > > > 
> > > > > > - it sleeps in qemu_cond_wait() [1], or
> > > > > > - it leaves the big loop [2]
> > > > > > 
> > > > > > Since I don't see anyone who will set dev->block_iothread to true 
> > > > > > for
> > > > > > the balloon device, then [1] cannot happen;
> > > > > there is a case in virtio_balloon_set_status which sets 
> > > > > dev->block_iothread
> > > > > to true.
> > > > > 
> > > > > Did you mean the free_page_lock mutex? it is released at the bottom 
> > > > > of the
> > > > > while() loop in virtio_balloon_poll_free_page_hint. It's actually 
> > > > > released
> > > > > for every hint. That is,
> > > > > 
> > > > > while(1){
> > > > >   take the lock;
> > > > >   process 1 hint from the vq;
> > > > >   release the lock;
> > > > > }
> > > > Ah, so now I understand why you need the lock to be inside the loop,
> > > > since the loop is busy polling actually.  Is it possible to do this in
> > > > an async way?
> > > We need to use polling here because of some back story in the guest side
> > > (due to some locks being held) that makes it a barrier to sending
> > > notifications for each hints.
> > Any link to the "back story" that I can learn about? :) If it's too
> > complicated a problem

Re: [Qemu-devel] [PATCH 0/4] cuda/mos6522 migration fixes

2018-06-07 Thread David Gibson

On Thu, Jun 07, 2018 at 06:17:47PM +0100, Mark Cave-Ayland wrote:
> Whilst performing a random migration test for the Mac machines I noticed
> a regression (patch 1) which prevented the loadvm from completing
> successfully. A big thank you to Peter and David on IRC who pointed me
> in the right direction in order to fix the bug.
> 
> Once that was working I spent a bit more time analysing the migration
> stream and realised that the mos6522 device state wasn't being embedded
> within the CUDA device, but instead being maintained separately which is
> solved by patch 2.
> 
> Patch 3 is something I noticed whilst rearranging the existing code based
> upon my better understanding of QOM/qdev and ensures that the timer frequency
> is always set correctly post-migration for the device and its parent class.
> This leaves no remaining functionality in the mos6522 realize function and so
> allows it to be removed.
> 
> Finally patch 4 was suggested by Peter on IRC whilst helping me investigate
> the original migration issue, and removes the last remaining user of
> VMSTATE_TIMER_PTR_TEST from the codebase.
> 
> Signed-off-by: Mark Cave-Ayland 

Applied to ppc-for-3.0.

> 
> 
> Mark Cave-Ayland (4):
>   mos6522: fix vmstate_mos6522_timer version in vmstate_mos6522
>   cuda: embed mos6522_cuda device directly rather than using QOM object
> link
>   mos6522: move timer frequency initialisation to mos6522_reset
>   mos6522: convert VMSTATE_TIMER_PTR_TEST to VMSTATE_TIMER_PTR
> 
>  hw/misc/macio/cuda.c | 50 
> +++-
>  hw/misc/mos6522.c| 26 ++-
>  include/hw/misc/macio/cuda.h | 27 +++-
>  include/hw/misc/mos6522.h|  4 +++-
>  4 files changed, 42 insertions(+), 65 deletions(-)
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH 2/4] cuda: embed mos6522_cuda device directly rather than using QOM object link

2018-06-07 Thread David Gibson

On Thu, Jun 07, 2018 at 06:17:49PM +0100, Mark Cave-Ayland wrote:
> Examining the migration stream it can be seen that the mos6522 device state is
> being stored separately rather than as part of the CUDA device which is
> incorrect (and likely to cause issues if another mos6522 device is added to
> the machine).
> 
> Resolve this by embedding the mos6522_cuda device directly within the CUDA
> device rather than using a QOM object link to reference the device separately.
> 
> Note that we also bump the version in vmstate_cuda to reflect this change: 
> this
> isn't particularly important for the moment as the Mac machine migration isn't
> 100% reliable due to issues migrating the timebase under TCG.

And apart from that the mac machine types aren't yet versioned, so
we're not really trying to support migration between different qemu
versions anyway.

> 
> Signed-off-by: Mark Cave-Ayland 
> ---
>  hw/misc/macio/cuda.c | 44 
> ++--
>  hw/misc/mos6522.c|  2 +-
>  include/hw/misc/macio/cuda.h | 27 ---
>  include/hw/misc/mos6522.h|  2 ++
>  4 files changed, 33 insertions(+), 42 deletions(-)
> 
> diff --git a/hw/misc/macio/cuda.c b/hw/misc/macio/cuda.c
> index bd9b862034..8aba2e63ec 100644
> --- a/hw/misc/macio/cuda.c
> +++ b/hw/misc/macio/cuda.c
> @@ -65,7 +65,7 @@ static void cuda_receive_packet_from_host(CUDAState *s,
>  static uint64_t cuda_get_counter_value(MOS6522State *s, MOS6522Timer *ti)
>  {
>  MOS6522CUDAState *mcs = container_of(s, MOS6522CUDAState, parent_obj);
> -CUDAState *cs = mcs->cuda;
> +CUDAState *cs = container_of(mcs, CUDAState, mos6522_cuda);
>  
>  /* Reverse of the tb calculation algorithm that Mac OS X uses on bootup 
> */
>  uint64_t tb_diff = muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
> @@ -78,7 +78,7 @@ static uint64_t cuda_get_counter_value(MOS6522State *s, 
> MOS6522Timer *ti)
>  static uint64_t cuda_get_load_time(MOS6522State *s, MOS6522Timer *ti)
>  {
>  MOS6522CUDAState *mcs = container_of(s, MOS6522CUDAState, parent_obj);
> -CUDAState *cs = mcs->cuda;
> +CUDAState *cs = container_of(mcs, CUDAState, mos6522_cuda);
>  
>  uint64_t load_time = muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
>cs->tb_frequency, NANOSECONDS_PER_SECOND);
> @@ -88,7 +88,7 @@ static uint64_t cuda_get_load_time(MOS6522State *s, 
> MOS6522Timer *ti)
>  static void cuda_set_sr_int(void *opaque)
>  {
>  CUDAState *s = opaque;
> -MOS6522CUDAState *mcs = s->mos6522_cuda;
> +MOS6522CUDAState *mcs = &s->mos6522_cuda;
>  MOS6522State *ms = MOS6522(mcs);
>  MOS6522DeviceClass *mdc = MOS6522_DEVICE_GET_CLASS(ms);
>  
> @@ -97,7 +97,7 @@ static void cuda_set_sr_int(void *opaque)
>  
>  static void cuda_delay_set_sr_int(CUDAState *s)
>  {
> -MOS6522CUDAState *mcs = s->mos6522_cuda;
> +MOS6522CUDAState *mcs = &s->mos6522_cuda;
>  MOS6522State *ms = MOS6522(mcs);
>  MOS6522DeviceClass *mdc = MOS6522_DEVICE_GET_CLASS(ms);
>  int64_t expire;
> @@ -117,7 +117,7 @@ static void cuda_delay_set_sr_int(CUDAState *s)
>  /* NOTE: TIP and TREQ are negated */
>  static void cuda_update(CUDAState *s)
>  {
> -MOS6522CUDAState *mcs = s->mos6522_cuda;
> +MOS6522CUDAState *mcs = &s->mos6522_cuda;
>  MOS6522State *ms = MOS6522(mcs);
>  int packet_received, len;
>  
> @@ -462,7 +462,7 @@ static void cuda_receive_packet_from_host(CUDAState *s,
>  static uint64_t mos6522_cuda_read(void *opaque, hwaddr addr, unsigned size)
>  {
>  CUDAState *s = opaque;
> -MOS6522CUDAState *mcs = s->mos6522_cuda;
> +MOS6522CUDAState *mcs = &s->mos6522_cuda;
>  MOS6522State *ms = MOS6522(mcs);
>  
>  addr = (addr >> 9) & 0xf;
> @@ -473,7 +473,7 @@ static void mos6522_cuda_write(void *opaque, hwaddr addr, 
> uint64_t val,
> unsigned size)
>  {
>  CUDAState *s = opaque;
> -MOS6522CUDAState *mcs = s->mos6522_cuda;
> +MOS6522CUDAState *mcs = &s->mos6522_cuda;
>  MOS6522State *ms = MOS6522(mcs);
>  
>  addr = (addr >> 9) & 0xf;
> @@ -492,9 +492,11 @@ static const MemoryRegionOps mos6522_cuda_ops = {
>  
>  static const VMStateDescription vmstate_cuda = {
>  .name = "cuda",
> -.version_id = 4,
> -.minimum_version_id = 4,
> +.version_id = 5,
> +.minimum_version_id = 5,
>  .fields = (VMStateField[]) {
> +VMSTATE_STRUCT(mos6522_cuda.parent_obj, CUDAState, 0, 
> vmstate_mos6522,
> +   MOS6522State),
>  VMSTATE_UINT8(last_b, CUDAState),
>  VMSTATE_UINT8(last_acr, CUDAState),
>  VMSTATE_INT32(data_in_size, CUDAState),
> @@ -530,12 +532,8 @@ static void cuda_realize(DeviceState *dev, Error **errp)
>  DeviceState *d;
>  struct tm tm;
>  
> -d = qdev_create(NULL, TYPE_MOS6522_CUDA);
> -object_property_set_link(OBJECT(d), OBJECT(s), "cuda", errp);
> -qdev_init_nofail(d);
> -s->mos6522_cu

Re: [Qemu-devel] [PATCH 0/3] ppc: trivial Mac fixes

2018-06-07 Thread David Gibson

On Thu, Jun 07, 2018 at 05:59:52PM +0100, Mark Cave-Ayland wrote:
> Here are some trivial Mac fixes taken from various working branches in my
> local git repository. Since they should need minimal review it seems
> worthwhile to send them separately.
> 
> Signed-off-by: Mark Cave-Ayland 

Applied to ppc-for-3.0, thanks.

> 
> 
> Mark Cave-Ayland (3):
>   ppc: remove obsolete pci_pmac_init() definitions from mac.h
>   ppc: remove obsolete macio_init() definition from mac.h
>   ppc: add missing FW_CFG_PPC_NVRAM_FLAT definition
> 
>  hw/ppc/mac.h | 9 -
>  include/hw/ppc/ppc.h | 1 +
>  2 files changed, 1 insertion(+), 9 deletions(-)
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-07 Thread Peter Xu

On Fri, Jun 08, 2018 at 09:37:23AM +0800, Peter Xu wrote:

[...]

> > > Asked since I see this in virtio spec (v1.0, though I guess it won't
> > > change) in chapter "2.2.1 Driver Requirements: Feature Bits":
> > > 
> > > "The driver MUST NOT accept a feature which the device did not offer"
> > > 
> > > Then I'm curious what would happen if:
> > > 
> > > - a emulator (not QEMU) only offered F_FREE_PAGE_HINT, not F_POISON
> > > - a guest that enabled PAGE_POISON
> > > 
> > > Then how the driver could tell the host that PAGE_POISON is enabled
> > > considering that guest should never set that feature bit if the
> > > emulation code didn't provide it?
> > > 
> > 
> > All the emulator implementations need to follow the virtio spec. We will
> > finally have this feature written to the virtio-balloon device section, and
> > state that the F_PAGE_POISON needs to be set on the device when
> > F_FREE_PAGE_HINT is set on the device.
> 
> Okay.  Still I would think a single feature cleaner here since they
> are actually tightly bound together, e.g., normally AFAIU this only
> happens when we introduce FEATURE1, after a while we introduced
> FEATURE2, then we need to have two features there since there are
> emulators that are already running only with FEATURE1.
> 
> AFAICT the thing behind is that your kernel patches are split (one for
> FEATURE1, one for FEATURE2), however when you only have FEATURE1 it's
> actually broken (if without the POISON support, PAGE_HINT feature
> might be broken).  So it would be nicer if the kernel patches are
> squashed so that no commit would broke any guest.  And, if they are
> squashed then IMHO we don't need two feature bits at all. ;)
> 
> But anyway, I understand it now.  Thanks,

This also reminds me that since we're going to declare both features
in this single patch, the final version of the patch should contain
the implementation of poisoned bits rather than a todo, am I right?

Regards,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-07 Thread Michael S. Tsirkin

On Fri, Jun 08, 2018 at 09:37:23AM +0800, Peter Xu wrote:
> > > Asked since I see this in virtio spec (v1.0, though I guess it won't
> > > change) in chapter "2.2.1 Driver Requirements: Feature Bits":
> > > 
> > > "The driver MUST NOT accept a feature which the device did not offer"
> > > 
> > > Then I'm curious what would happen if:
> > > 
> > > - a emulator (not QEMU) only offered F_FREE_PAGE_HINT, not F_POISON
> > > - a guest that enabled PAGE_POISON
> > > 
> > > Then how the driver could tell the host that PAGE_POISON is enabled
> > > considering that guest should never set that feature bit if the
> > > emulation code didn't provide it?

It wouldn't. It just has to deal with the fact that host can discard
writes to hinted pages. Right now driver deals with it simply by
disabling F_FREE_PAGE_HINT.

-- 
MST

Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-07 Thread Peter Xu

On Thu, Jun 07, 2018 at 08:01:42PM +0800, Wei Wang wrote:
> On 06/07/2018 02:58 PM, Peter Xu wrote:
> > On Thu, Jun 07, 2018 at 01:29:22PM +0800, Wei Wang wrote:
> > > > > > [...]
> > > > > > > +static const VMStateDescription 
> > > > > > > vmstate_virtio_balloon_free_page_report = {
> > > > > > > +.name = "virtio-balloon-device/free-page-report",
> > > > > > > +.version_id = 1,
> > > > > > > +.minimum_version_id = 1,
> > > > > > > +.needed = virtio_balloon_free_page_support,
> > > > > > > +.fields = (VMStateField[]) {
> > > > > > > +VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
> > > > > > > +VMSTATE_UINT32(poison_val, VirtIOBalloon),
> > > > > > (could we move all the poison-related lines into another patch or
> > > > > > postpone?  after all we don't support it yet, do we?)
> > > > > > 
> > > > >We don't support migrating poison value, but guest maybe use it, 
> > > > > so we are
> > > > > actually disabling this feature in that case. Probably good to leave 
> > > > > the
> > > > > code together to handle that case.
> > > > Could we just avoid declaring that feature bit in emulation code
> > > > completely?  I mean, we support VIRTIO_BALLOON_F_FREE_PAGE_HINT first
> > > > as the first step (as you mentioned in commit message, the POISON is a
> > > > TODO).  Then when you really want to completely support the POISON
> > > > bit, you can put all that into a separate patch.  Would that work?
> > > > 
> > > Not really. The F_PAGE_POISON isn't a feature configured via QEMU cmd line
> > > like F_FREE_PAGE_HINT. We always set F_PAGE_POISON if F_FREE_PAGE_HINT is
> > > enabled. It is used to detect if the guest is using page poison.
> > Ok I think I kind of understand.  But it seems strange to me to have
> > this as a feature bit.  I thought it suites more to be a config field
> > so that guest could setup.  Like, we can have 1 byte to setup "whether
> > PAGE_POISON is used in the guest", another 1 byte to setup "what is
> > the PAGE_POISON value if it's enabled".
> 
> This is also suggested by Michael, which sounds good to me. Using config is
> doable, but that doesn't show advantages over using feature bits.
> 
> 
> 
> > 
> > Asked since I see this in virtio spec (v1.0, though I guess it won't
> > change) in chapter "2.2.1 Driver Requirements: Feature Bits":
> > 
> > "The driver MUST NOT accept a feature which the device did not offer"
> > 
> > Then I'm curious what would happen if:
> > 
> > - a emulator (not QEMU) only offered F_FREE_PAGE_HINT, not F_POISON
> > - a guest that enabled PAGE_POISON
> > 
> > Then how the driver could tell the host that PAGE_POISON is enabled
> > considering that guest should never set that feature bit if the
> > emulation code didn't provide it?
> > 
> 
> All the emulator implementations need to follow the virtio spec. We will
> finally have this feature written to the virtio-balloon device section, and
> state that the F_PAGE_POISON needs to be set on the device when
> F_FREE_PAGE_HINT is set on the device.

Okay.  Still I would think a single feature cleaner here since they
are actually tightly bound together, e.g., normally AFAIU this only
happens when we introduce FEATURE1, after a while we introduced
FEATURE2, then we need to have two features there since there are
emulators that are already running only with FEATURE1.

AFAICT the thing behind is that your kernel patches are split (one for
FEATURE1, one for FEATURE2), however when you only have FEATURE1 it's
actually broken (if without the POISON support, PAGE_HINT feature
might be broken).  So it would be nicer if the kernel patches are
squashed so that no commit would broke any guest.  And, if they are
squashed then IMHO we don't need two feature bits at all. ;)

But anyway, I understand it now.  Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH] hw/arm/smmuv3: fix smmu emulation when guest smmu is in passthrough mode

2018-06-07 Thread Jia He

Hi Eric

On 6/8/2018 1:06 AM, Auger Eric Wrote:
> Hi Jia,
> 
> On 06/07/2018 03:38 PM, Jia He wrote:
>> There is an exception when I passes iommu.passthrough=1 to guest's
>> kernel boot parameter(host QDF2400 kernel 4.17, guest kernel 4.14).
>> The guest will be hang when booting up.
>>
>> When guest smmu is in passthrough mode, entry.perm will not be assigned
>> to flag in smmuv3_translate. It seems not be correct.
>>
>> After this patch, I have tested in 4 cases and all passed.
>> host smmu on/passthrough + guest smmu on/passthrough
>>
>> Signed-off-by: jia...@hxt-semitech.com
>> ---
>>  hw/arm/smmuv3.c | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
>> index 42dc521..5c46102 100644
>> --- a/hw/arm/smmuv3.c
>> +++ b/hw/arm/smmuv3.c
>> @@ -560,6 +560,12 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion 
>> *mr, hwaddr addr,
>>  }
>>  
>>  ret = smmuv3_decode_config(mr, &cfg, &event);
>> +
>> +if (cfg.bypassed) {
>> +ret = 0;
>> +goto out;
>> +}
> Thank you for spotting this bug. Yes you're correct: on bypass we
> effectively need to set the IOMMUTLBEntry perm flags.
> 
> Reading the code again I think the error handling logic is really
> confusing and if you don't mind, I suggest I submit a global fix. On
> bypass we should rather have a bypass trace event issued instead of the
> translated one. To me the aborted case is not properly handled either as
> we are going to record an event whereas we shouldn't. In case of
> abort/bypass translation structure decoding should rather return 0 I
> think instead of returning errors. Please let me know if it suits you.
> 
Ok with me

Cheers，
Jia

Re: [Qemu-devel] [PATCH] kvm: support -realtime cpu-pm=on|off

2018-06-07 Thread no-reply

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180608000540.276617-1-...@redhat.com
Subject: [Qemu-devel] [PATCH] kvm: support -realtime cpu-pm=on|off

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 t [tag update]patchew/20180607154705.6316-1-da...@redhat.com -> 
patchew/20180607154705.6316-1-da...@redhat.com
 * [new tag]   patchew/20180608000540.276617-1-...@redhat.com -> 
patchew/20180608000540.276617-1-...@redhat.com
Switched to a new branch 'test'
e388d56861 kvm: support -realtime cpu-pm=on|off

=== OUTPUT BEGIN ===
Checking PATCH 1/1: kvm: support -realtime cpu-pm=on|off...
WARNING: line over 80 characters
#68: FILE: target/i386/kvm.c:1045:
+int disable_exits = kvm_check_extension(cs->kvm_state, 
KVM_CAP_X86_DISABLE_EXITS);

WARNING: line over 80 characters
#88: FILE: target/i386/kvm.c:1065:
+error_report("kvm: guest stopping CPU not supported: %s", 
strerror(-ret));

ERROR: do not initialise globals to 0 or NULL
#104: FILE: vl.c:145:
+bool enable_cpu_pm = false;

total: 1 errors, 2 warnings, 86 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[Qemu-devel] [PATCH] kvm: support -realtime cpu-pm=on|off

2018-06-07 Thread Michael S. Tsirkin

With this flag, kvm allows guest to control host CPU power state.  This
increases latency for other processes using same host CPU in an
unpredictable way, but if decreases idle entry/exit times for the
running VCPU.

Follow-up patches will expose this capability to guest
(using mwait leaf).

Based on a patch by Wanpeng Li  .

Signed-off-by: Michael S. Tsirkin 
---
 include/sysemu/sysemu.h |  1 +
 target/i386/kvm.c   | 26 ++
 vl.c|  6 ++
 qemu-options.hx |  9 +++--
 4 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index e893f72f3b..b921c6f3b7 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -128,6 +128,7 @@ extern bool boot_strict;
 extern uint8_t *boot_splash_filedata;
 extern size_t boot_splash_filedata_size;
 extern bool enable_mlock;
+extern bool enable_cpu_pm;
 extern uint8_t qemu_extra_params_fw[2];
 extern QEMUClockType rtc_clock;
 extern const char *mem_path;
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 44f70733e7..e4c3e65e00 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -1041,6 +1041,32 @@ int kvm_arch_init_vcpu(CPUState *cs)
 }
 }
 
+if (enable_cpu_pm) {
+int disable_exits = kvm_check_extension(cs->kvm_state, 
KVM_CAP_X86_DISABLE_EXITS);
+int ret;
+
+/* Work around for kernel header with a typo. TODO: fix header and drop. */
+#if defined(KVM_X86_DISABLE_EXITS_HTL) && !defined(KVM_X86_DISABLE_EXITS_HLT)
+#define KVM_X86_DISABLE_EXITS_HLT KVM_X86_DISABLE_EXITS_HTL
+#endif
+if (disable_exits) {
+disable_exits &= (KVM_X86_DISABLE_EXITS_MWAIT |
+  KVM_X86_DISABLE_EXITS_HLT |
+  KVM_X86_DISABLE_EXITS_PAUSE);
+/* PV unhalt relies on HALT to cause an exit */
+if (env->user_features[FEAT_KVM] & (1U << KVM_FEATURE_PV_UNHALT)) {
+disable_exits &= ~KVM_X86_DISABLE_EXITS_HLT;
+}
+}
+
+ret = kvm_vm_enable_cap(cs->kvm_state, KVM_CAP_X86_DISABLE_EXITS, 0,
+disable_exits);
+if (ret < 0) {
+error_report("kvm: guest stopping CPU not supported: %s", 
strerror(-ret));
+}
+}
+
+
 qemu_add_vm_change_state_handler(cpu_update_state, env);
 
 c = cpuid_find_entry(&cpuid_data.cpuid, 1, 0);
diff --git a/vl.c b/vl.c
index 06031715ac..7bea9c2177 100644
--- a/vl.c
+++ b/vl.c
@@ -142,6 +142,7 @@ ram_addr_t ram_size;
 const char *mem_path = NULL;
 int mem_prealloc = 0; /* force preallocation of physical target memory */
 bool enable_mlock = false;
+bool enable_cpu_pm = false;
 int nb_nics;
 NICInfo nd_table[MAX_NICS];
 int autostart;
@@ -386,6 +387,10 @@ static QemuOptsList qemu_realtime_opts = {
 .name = "mlock",
 .type = QEMU_OPT_BOOL,
 },
+{
+.name = "cpu-pm",
+.type = QEMU_OPT_BOOL,
+},
 { /* end of list */ }
 },
 };
@@ -3904,6 +3909,7 @@ int main(int argc, char **argv, char **envp)
 exit(1);
 }
 enable_mlock = qemu_opt_get_bool(opts, "mlock", true);
+enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", false);
 break;
 case QEMU_OPTION_msg:
 opts = qemu_opts_parse_noisily(qemu_find_opts("msg"), optarg,
diff --git a/qemu-options.hx b/qemu-options.hx
index c0d3951e9f..e6f31071ce 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3325,16 +3325,21 @@ Do not start CPU at startup (you must type 'c' in the 
monitor).
 ETEXI
 
 DEF("realtime", HAS_ARG, QEMU_OPTION_realtime,
-"-realtime [mlock=on|off]\n"
+"-realtime [mlock=on|off][cpu-halt=on|off[\n"
 "run qemu with realtime features\n"
-"mlock=on|off controls mlock support (default: on)\n",
+"mlock=on|off controls mlock support (default: on)\n"
+"cpu-pm=on|off controls cpu power management (default: 
off)\n",
 QEMU_ARCH_ALL)
 STEXI
 @item -realtime mlock=on|off
+@item -realtime cpu-pm=on|off
 @findex -realtime
 Run qemu with realtime features.
 mlocking qemu and guest memory can be enabled via @option{mlock=on}
 (enabled by default).
+guest ability to manage power state of host cpus (increasing latency for other
+processes on the same host cpu, but decreasing latency for guest)
+can be enabled via @option{cpu-pm=on} (disabled by default).
 ETEXI
 
 DEF("gdb", HAS_ARG, QEMU_OPTION_gdb, \
-- 
MST

Re: [Qemu-devel] [qemu PATCH 3/5] hw/i386: Update SSDT table used by "make check"

2018-06-07 Thread Michael S. Tsirkin

On Thu, Jun 07, 2018 at 04:31:09PM -0600, Ross Zwisler wrote:
> This commit:
> 
> commit aa78a16d8645 ("hw/i386: Rename 2.13 machine types to 3.0")
> 
> updated the name used to create the q35 machine, which in turn changed the
> SSDT table which is generated when we run "make check":
> 
>   acpi-test: Warning! SSDT mismatch. Actual [asl:/tmp/asl-QZDWJZ.dsl,
>   aml:/tmp/aml-T8JYJZ], Expected [asl:/tmp/asl-DTWVJZ.dsl,
>   aml:tests/acpi-test-data/q35/SSDT.dimmpxm].
> 
> Here's the only difference, aside from the checksum:
> 
>   < Name (MEMA, 0x07FFF000)
>   ---
>   > Name (MEMA, 0x07FFE000)

Weird. How come the phys address changes just because of machine name?

> 
> Update the binary table that we compare against so it reflects this name
> change.
> 
> Signed-off-by: Ross Zwisler 
> Cc: Peter Maydell 
> Cc: Cornelia Huck 
> Cc: Thomas Huth 
> Cc: Eduardo Habkost 
> Fixes: commit aa78a16d8645 ("hw/i386: Rename 2.13 machine types to 3.0")
> ---
>  tests/acpi-test-data/q35/SSDT.dimmpxm | Bin 685 -> 685 bytes
>  1 file changed, 0 insertions(+), 0 deletions(-)
> 
> diff --git a/tests/acpi-test-data/q35/SSDT.dimmpxm 
> b/tests/acpi-test-data/q35/SSDT.dimmpxm
> index 
> 8ba0e67cb72daa81a65da4906d37a5e0f4af1fd4..2d5b721bcf9c398feb6d005761f898015042e8a4
>  100644
> GIT binary patch
> delta 23
> fcmZ3>x|WqIIM^j*EfWI+qr*n71x(Bz{<8xBPCEwk
> 
> delta 23
> fcmZ3>x|WqIIM^j*EfWI+W57nP1x(Bj{<8xBPMZev
> 
> -- 
> 2.14.4

Re: [Qemu-devel] [qemu PATCH 2/5] acpi: "make check" should fail on asl mismatch

2018-06-07 Thread Michael S. Tsirkin

On Thu, Jun 07, 2018 at 04:31:08PM -0600, Ross Zwisler wrote:
> Currently if "make check" detects a mismatch in the ASL generated during
> testing, we print an error such as:
> 
>   acpi-test: Warning! SSDT mismatch. Actual [asl:/tmp/asl-QZDWJZ.dsl,
>   aml:/tmp/aml-T8JYJZ], Expected [asl:/tmp/asl-DTWVJZ.dsl,
>   aml:tests/acpi-test-data/q35/SSDT.dimmpxm].
> 
> but the testing still exits with good shell status.  This is wrong, and
> makes bisecting such a failure difficult.
> 
> Signed-off-by: Ross Zwisler 

Failing would also mean that any change must update the expected files
at the same time.  And that in turn is problematic because expected
files are binary and can't be merged.

In other words the way we devel ACPI right now means that bisect will
periodically produce a diff, it's not an error.


> ---
>  tests/bios-tables-test.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
> index 256d463cb8..9b5db1ee8f 100644
> --- a/tests/bios-tables-test.c
> +++ b/tests/bios-tables-test.c
> @@ -469,6 +469,7 @@ static void test_acpi_asl(test_data *data)
>  }
>  }
>}
> +  g_test_fail();
>  }
>  g_string_free(asl, true);
>  g_string_free(exp_asl, true);
> -- 
> 2.14.4

[Qemu-devel] [qemu PATCH 5/5] nvdimm: make persistence option symbolic

2018-06-07 Thread Ross Zwisler

Replace the "nvdimm-cap" option which took numeric arguments such as "2"
with a more user friendly "nvdimm-persistence" option which takes symbolic
arguments "cpu" or "mem-ctrl".

Signed-off-by: Ross Zwisler 
Suggested-by: Michael S. Tsirkin 
Suggested-by: Dan Williams 
---
 docs/nvdimm.txt  | 31 ---
 hw/acpi/nvdimm.c |  4 ++--
 hw/i386/pc.c | 35 +--
 include/hw/i386/pc.h |  2 +-
 include/hw/mem/nvdimm.h  |  3 ++-
 tests/bios-tables-test.c |  2 +-
 6 files changed, 35 insertions(+), 42 deletions(-)

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 8b48fb4633..24b443b655 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -154,29 +154,22 @@ guest software that this vNVDIMM device contains a region 
that cannot
 accept persistent writes. In result, for example, the guest Linux
 NVDIMM driver, marks such vNVDIMM device as read-only.
 
-Platform Capabilities
--
+NVDIMM Persistence
+--
 
 ACPI 6.2 Errata A added support for a new Platform Capabilities Structure
 which allows the platform to communicate what features it supports related to
-NVDIMM data durability.  Users can provide a capabilities value to a guest via
-the optional "nvdimm-cap" machine command line option:
+NVDIMM data persistence.  Users can provide a persistence value to a guest via
+the optional "nvdimm-persistence" machine command line option:
 
--machine pc,accel=kvm,nvdimm,nvdimm-cap=2
+-machine pc,accel=kvm,nvdimm,nvdimm-persistence=cpu
 
-This "nvdimm-cap" field is an integer, and is the combined value of the
-various capability bits defined in table 5-137 of the ACPI 6.2 Errata A spec.
+There are currently two valid values for this option:
 
-Here is a quick summary of the three bits that are defined as of that spec:
+"mem-ctrl" - The platform supports flushing dirty data from the memory
+ controller to the NVDIMMs in the event of power loss.
 
-Bit[0] - CPU Cache Flush to NVDIMM Durability on Power Loss Capable.
-Bit[1] - Memory Controller Flush to NVDIMM Durability on Power Loss Capable.
- Note: If bit 0 is set to 1 then this bit shall be set to 1 as well.
-Bit[2] - Byte Addressable Persistent Memory Hardware Mirroring Capable.
-
-So, a "nvdimm-cap" value of 2 would mean that the platform supports Memory
-Controller Flush on Power Loss, a value of 3 would mean that the platform
-supports CPU Cache Flush and Memory Controller Flush on Power Loss, etc.
-
-For a complete list of the flags available and for more detailed descriptions,
-please consult the ACPI spec.
+"cpu"  - The platform supports flushing dirty data from the CPU cache to
+ the NVDIMMs in the event of power loss.  This implies that the
+ platform also supports flushing dirty data through the memory
+ controller on power loss.
diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 87e4280c71..27eeb6609f 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -404,8 +404,8 @@ static GArray 
*nvdimm_build_device_structure(AcpiNVDIMMState *state)
 }
 g_slist_free(device_list);
 
-if (state->capabilities) {
-nvdimm_build_structure_caps(structures, state->capabilities);
+if (state->persistence) {
+nvdimm_build_structure_caps(structures, state->persistence);
 }
 
 return structures;
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index f3befe6721..5bba9dcf5a 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2181,31 +2181,30 @@ static void pc_machine_set_nvdimm(Object *obj, bool 
value, Error **errp)
 pcms->acpi_nvdimm_state.is_enabled = value;
 }
 
-static void pc_machine_get_nvdimm_capabilities(Object *obj, Visitor *v,
-   const char *name, void *opaque,
-   Error **errp)
+static char *pc_machine_get_nvdimm_persistence(Object *obj, Error **errp)
 {
 PCMachineState *pcms = PC_MACHINE(obj);
-uint32_t value = pcms->acpi_nvdimm_state.capabilities;
 
-visit_type_uint32(v, name, &value, errp);
+return g_strdup(pcms->acpi_nvdimm_state.persistence_string);
 }
 
-static void pc_machine_set_nvdimm_capabilities(Object *obj, Visitor *v,
-   const char *name, void *opaque,
+static void pc_machine_set_nvdimm_persistence(Object *obj, const char *value,
Error **errp)
 {
 PCMachineState *pcms = PC_MACHINE(obj);
-Error *error = NULL;
-uint32_t value;
-
-visit_type_uint32(v, name, &value, &error);
-if (error) {
-error_propagate(errp, error);
-return;
+AcpiNVDIMMState *nvdimm_state = &pcms->acpi_nvdimm_state;
+
+if (strcmp(value, "cpu") == 0)
+nvdimm_state->persistence = 3;
+else if (strcmp(value, "mem-ctrl") == 0)
+nvdimm_state->persistence = 2;
+else {
+error_report("-machine nvdimm-persistence=%

[Qemu-devel] [qemu PATCH 1/5] gitignore: ignore generated qapi job files

2018-06-07 Thread Ross Zwisler

With a fully built QEMU I currently see the following with "git status":

  Untracked files:
(use "git add ..." to include in what will be committed)

qapi/qapi-commands-job.c
qapi/qapi-commands-job.h
qapi/qapi-events-job.c
qapi/qapi-events-job.h
qapi/qapi-types-job.c
qapi/qapi-types-job.h
qapi/qapi-visit-job.c
qapi/qapi-visit-job.h

These are all generated files.  Ignore them.

Signed-off-by: Ross Zwisler 
Cc: Kevin Wolf 
Cc: Eric Blake 
Fixes: commit bf42508f24ee ("job: Introduce qapi/job.json")
---
 .gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.gitignore b/.gitignore
index 81e1f2fb0f..2980090f0a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -100,6 +100,7 @@
 /qapi/qapi-visit-ui.[ch]
 /qapi/qapi-visit.[ch]
 /qapi/qapi-doc.texi
+/qapi/qapi-*-job.[ch]
 /qemu-doc.html
 /qemu-doc.info
 /qemu-doc.txt
-- 
2.14.4

Re: [Qemu-devel] [RFC v2 04/12] Add vhost-user-backend

2018-06-07 Thread Marc-André Lureau

Hi

On Mon, Jun 4, 2018 at 11:36 AM, Daniel P. Berrangé  wrote:
> On Fri, Jun 01, 2018 at 06:27:41PM +0200, Marc-André Lureau wrote:
>> Create a vhost-user-backend object that holds a connection to a
>> vhost-user backend and can be referenced from virtio devices that
>> support it. See later patches for input & gpu usage.
>>
>> A chardev can be specified to communicate with the vhost-user backend,
>> ex: -chardev socket,id=char0,path=/tmp/foo.sock -object
>> vhost-user-backend,id=vuid,chardev=char0.
>>
>> Alternatively, an executable with its arguments may be given as 'cmd'
>> property, ex: -object
>> vhost-user-backend,id=vui,cmd="./vhost-user-input /dev/input..". The
>> executable is then spawn and, by convention, the vhost-user socket is
>> passed as fd=3. It may be considered a security breach to allow
>> creating processes that may execute arbitrary executables, so this may
>> be restricted to some known executables (via signature etc) or
>> directory.
>
> Passing a binary and args as a string blob.
>
>> +static int
>> +vhost_user_backend_spawn_cmd(VhostUserBackend *b, int vhostfd, Error **errp)
>> +{
>> +int devnull = open("/dev/null", O_RDWR);
>> +pid_t pid;
>> +
>> +assert(!b->child);
>> +
>> +if (!b->cmd) {
>> +error_setg_errno(errp, errno, "Missing cmd property");
>> +return -1;
>> +}
>> +if (devnull < 0) {
>> +error_setg_errno(errp, errno, "Unable to open /dev/null");
>> +return -1;
>> +}
>> +
>> +pid = qemu_fork(errp);
>> +if (pid < 0) {
>> +close(devnull);
>> +return -1;
>> +}
>> +
>> +if (pid == 0) { /* child */
>> +int fd, maxfd = sysconf(_SC_OPEN_MAX);
>> +
>> +dup2(devnull, STDIN_FILENO);
>> +dup2(devnull, STDOUT_FILENO);
>> +dup2(vhostfd, 3);
>> +
>> +signal(SIGINT, SIG_IGN);
>
> Why ignore SIGINT ?  Surely we want this extra process to be killed
> someone ctrl-c's the parent QEMU.

leftover, removed

>
>> +
>> +for (fd = 4; fd < maxfd; fd++) {
>> +close(fd);
>> +}
>> +
>> +execlp("/bin/sh", "sh", "-c", b->cmd, NULL);
>
> ...which is then interpreted by the shell is a recipe for security
> flaws. There needs to be a way to pass the command + arguments
> to QEMU as an argv[] we can directly exec without involving the
> shell.
>

For now, I use g_shell_parse_argv(). Do you have a better idea?

>> +_exit(1);
>> +}
>> +
>> +b->child = QIO_CHANNEL(qio_channel_command_new_pid(devnull, devnull, 
>> pid));
>
> Overall this method overall duplicates much of the
> qio_channel_command_new_argv(), though you do have a few differences.
>
> I'd prefer if we could make qio_channel_command_new_argv more flexible to
> handle these extra needs though.
>

Ok I added a pre-exec callback for the extra dup2() & close().

Thanks,




-- 
Marc-André Lureau

[Qemu-devel] [qemu PATCH 3/5] hw/i386: Update SSDT table used by "make check"

2018-06-07 Thread Ross Zwisler

This commit:

commit aa78a16d8645 ("hw/i386: Rename 2.13 machine types to 3.0")

updated the name used to create the q35 machine, which in turn changed the
SSDT table which is generated when we run "make check":

  acpi-test: Warning! SSDT mismatch. Actual [asl:/tmp/asl-QZDWJZ.dsl,
  aml:/tmp/aml-T8JYJZ], Expected [asl:/tmp/asl-DTWVJZ.dsl,
  aml:tests/acpi-test-data/q35/SSDT.dimmpxm].

Here's the only difference, aside from the checksum:

  < Name (MEMA, 0x07FFF000)
  ---
  > Name (MEMA, 0x07FFE000)

Update the binary table that we compare against so it reflects this name
change.

Signed-off-by: Ross Zwisler 
Cc: Peter Maydell 
Cc: Cornelia Huck 
Cc: Thomas Huth 
Cc: Eduardo Habkost 
Fixes: commit aa78a16d8645 ("hw/i386: Rename 2.13 machine types to 3.0")
---
 tests/acpi-test-data/q35/SSDT.dimmpxm | Bin 685 -> 685 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)

diff --git a/tests/acpi-test-data/q35/SSDT.dimmpxm 
b/tests/acpi-test-data/q35/SSDT.dimmpxm
index 
8ba0e67cb72daa81a65da4906d37a5e0f4af1fd4..2d5b721bcf9c398feb6d005761f898015042e8a4
 100644
GIT binary patch
delta 23
fcmZ3>x|WqIIM^j*EfWI+qr*n71x(Bz{<8xBPCEwk

delta 23
fcmZ3>x|WqIIM^j*EfWI+W57nP1x(Bj{<8xBPMZev

-- 
2.14.4

[Qemu-devel] [qemu PATCH 2/5] acpi: "make check" should fail on asl mismatch

2018-06-07 Thread Ross Zwisler

Currently if "make check" detects a mismatch in the ASL generated during
testing, we print an error such as:

  acpi-test: Warning! SSDT mismatch. Actual [asl:/tmp/asl-QZDWJZ.dsl,
  aml:/tmp/aml-T8JYJZ], Expected [asl:/tmp/asl-DTWVJZ.dsl,
  aml:tests/acpi-test-data/q35/SSDT.dimmpxm].

but the testing still exits with good shell status.  This is wrong, and
makes bisecting such a failure difficult.

Signed-off-by: Ross Zwisler 
---
 tests/bios-tables-test.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index 256d463cb8..9b5db1ee8f 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -469,6 +469,7 @@ static void test_acpi_asl(test_data *data)
 }
 }
   }
+  g_test_fail();
 }
 g_string_free(asl, true);
 g_string_free(exp_asl, true);
-- 
2.14.4

[Qemu-devel] [qemu PATCH 4/5] machine: fix some misspelled words

2018-06-07 Thread Ross Zwisler

Normally this might not be worth fixing, but several of these are strings
which are displayed to users.

Signed-off-by: Ross Zwisler 
---
 hw/core/machine.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 617e5f8d75..a21269fa39 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -609,7 +609,7 @@ static void machine_class_init(ObjectClass *oc, void *data)
 machine_get_igd_gfx_passthru, machine_set_igd_gfx_passthru,
 &error_abort);
 object_class_property_set_description(oc, "igd-passthru",
-"Set on/off to enable/disable igd passthrou", &error_abort);
+"Set on/off to enable/disable igd passthru", &error_abort);
 
 object_class_property_add_str(oc, "firmware",
 machine_get_firmware, machine_set_firmware,
@@ -633,7 +633,7 @@ static void machine_class_init(ObjectClass *oc, void *data)
 machine_get_memory_encryption, machine_set_memory_encryption,
 &error_abort);
 object_class_property_set_description(oc, "memory-encryption",
-"Set memory encyption object to use", &error_abort);
+"Set memory encryption object to use", &error_abort);
 }
 
 static void machine_class_base_init(ObjectClass *oc, void *data)
@@ -806,7 +806,7 @@ void machine_run_board_init(MachineState *machine)
 for (i = 0; machine_class->valid_cpu_types[i]; i++) {
 if (object_class_dynamic_cast(class,
   machine_class->valid_cpu_types[i])) {
-/* The user specificed CPU is in the valid field, we are
+/* The user specified CPU is in the valid field, we are
  * good to go.
  */
 break;
-- 
2.14.4

Re: [Qemu-devel] [RFC v2 09/12] Add vhost-input-pci

2018-06-07 Thread Marc-André Lureau

Hi

On Mon, Jun 4, 2018 at 10:58 AM, Gerd Hoffmann  wrote:
>> +#define TYPE_VHOST_USER_INPUT_PCI "vhost-user-input-pci"
>
> Patch $subject mismatch.
>
>> +struct VHostUserInput {
>> +VirtIOInput   parent_obj;
>> +
>> +VhostUserBackend  *vhost;
>> +};
>
> Nothing input specific here ...

Except VirtIOInput

>
>> +static void vhost_input_change_active(VirtIOInput *vinput)
>> +{
>> +VHostUserInput *vhi = VHOST_USER_INPUT(vinput);
>> +
>> +if (!vhi->vhost) {
>> +return;
>> +}
>> +
>> +if (vinput->active) {
>> +vhost_user_backend_start(vhi->vhost);
>> +} else {
>> +vhost_user_backend_stop(vhi->vhost);
>> +}
>> +}
>
> ... and here ...

Except it's a VirtIOInputClass callback
>
>> +static const VMStateDescription vmstate_vhost_input = {
>> +.name = "vhost-user-input",
>> +.unmigratable = 1,
>> +};
>
> ... and here ...
>
>> +static void vhost_input_is_busy(const Object *obj, const char *name,
>> +Object *val, Error **errp)
>> +{
>> +VHostUserInput *vhi = VHOST_USER_INPUT(obj);
>> +
>> +if (vhi->vhost) {
>> +error_setg(errp, "can't use already busy vhost-user");
>> +} else {
>> +qdev_prop_allow_set_link_before_realize(obj, name, val, errp);
>> +}
>> +}
>
> ... likewise ...
>
> So, maybe it makes sense to have a abstact base class for vhost-user
> devices?  And possibly move the vhost-backend code to the base class
> then?

The device inherits from virtio-input type. So we could somehow not
expose -object vhost-user and instead have internal vhost-user object
& properties aliased & duplicated on each -device vhost-user*. I would
rather keep the -object solution, since it's somehow cleaner, more
flexible and simpler to document that way. Or do you have a better
idea?

-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH v1 0/2] memory: fix alignment checks/asserts

2018-06-07 Thread Michael S. Tsirkin

On Thu, Jun 07, 2018 at 05:47:03PM +0200, David Hildenbrand wrote:
> We can currently hit two asserts. Let's fix those.
> 
> Patch nr. 1 is a result from:
> "[PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers"
> 
> We treat right now any alignment > 1GB as a violation, as it would
> fragment guest memory heavily. Turn the assert into a check.
> 
> Also, we run into an assert when using alignments that are not a power of
> two.

Reviewed-by: Michael S. Tsirkin 

> David Hildenbrand (2):
>   memory-device: turn alignment assert into check
>   exec: check that alignment is a power of two
> 
>  exec.c | 4 
>  hw/mem/memory-device.c | 8 +++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> -- 
> 2.17.0

Re: [Qemu-devel] storing machine data in qcow images?

2018-06-07 Thread Michael S. Tsirkin

On Wed, Jun 06, 2018 at 07:06:27PM +0200, Max Reitz wrote:
> On 2018-06-06 17:09, Michael S. Tsirkin wrote:
> > On Wed, Jun 06, 2018 at 04:51:39PM +0200, Max Reitz wrote:
> >> On 2018-06-06 16:31, Dr. David Alan Gilbert wrote:
> >>> * Max Reitz (mre...@redhat.com) wrote:
>  On 2018-06-06 14:00, Dr. David Alan Gilbert wrote:
> > * Max Reitz (mre...@redhat.com) wrote:
> >> On 2018-06-06 13:14, Dr. David Alan Gilbert wrote:
> >>> * Max Reitz (mre...@redhat.com) wrote:
>  On 2018-06-05 11:21, Dr. David Alan Gilbert wrote:
> > 
> >>>
> >>> 
> >>>
> >>> The problem with having a separate file is that you either have to 
> >>> copy
> >>> it around with the image 
> >>
> >> Which is just an inconvenience.
> >
> > It's more than that;  if it's a separate file then the tools can't
> > rely on users supplying it, and frankly they won't and they'll still
> > just supply an image.
> 
>  At which point you throw an error and tell them to specify the config 
>  file.
> >>>
> >>> No:
> >>>a) At the moment they get away with it for images since they're all
> >>>   'pc' and the management layers do the right thing.
> >>
> >> So so far nobody has complained?  I don't really see the problem then.
> >>
> >> If deploying a disk and using all the defaults works out for users,
> >> great.  If they want more options, apparently they already know they
> >> have to provide some config.
> > 
> > QEMU's usability is terrible. There are tons of tools out there to try
> > to tame it, but of course they lack the knowledge of the VM internals
> > that QEMU has.
> 
> Er, yeah, OK.  But it was my understanding that we decided that we have
> a management layer on top of qemu to make things simple.

Who's we? I don't think the QEMU community completely gave up on people
using QEMU directly. It will need to be much more user-friendly than it
is right now. But it's possible. Fabrice built an emulator in
javascript, you go to a URL bam it runs a VM.

> Also, this is once more a case of first deciding what we want at all.

Who's we here again? Different people want different things. Enough
people seem to want to store tagged data with a disk image that it might
be worth someone's while to try to add that capability for starters to
qemu-img.

> Dave wants configuration options for the upper management layer which
> are completely opaque to qemu.  That has nothing to do whatsoever with
> the usability of qemu itself.

That's why I keep saying, let's start with implementing a mechanism,
worry about policy later if at all.

> >>>b) They'll give the wrong config file - then you'd need to add a flag
> >>>  to detect that - which means you'd need to add something to the
> >>>  qcow to match it to the config; loop back to teh start!
> >>
> >> I'm not sure how seriously I should take this argument.  Do stupid
> >> things, win stupid prizes.
> >>
> >> If that's the issue, add a UUID to qcow2 files and reference it from the
> >> config file.
> >>
> >>> We should make this EASY for users.
> >>
> >> To me, having a simple config file they can edit manually certainly
> >> seems simpler than having to use specific tools to edit it inside of the
> >> qcow2 file.
> > 
> > I think you are one of the happy users familiar with qemu intricacies
> > and/or using a tool on top that does it for you.
> 
> Yeah, virt-manager and sometimes libvirt directly.  Works nicely.  In
> any case, having to manage more than a single file was never one of my
> worries.  In fact, I never had to manage any file because both tools do
> it for me.
> 
> And again, I don't know what the usability of qemu has to do with what
> Dave is proposing.
> 
> [...]

I think what we are seeing here is many people jumping on the
bandwagon and finding more and more uses for ability to store
meta-data in the qcow2 file.

This just means we should make it flexible enough to possibly
support more uses. It does not mean we need to make it
read mail on day 1.

> >> Because I think (maybe I'm wrong, though) where to store it heavily
> >> depends on what we want to store and how we want to use it.
> > 
> > I don't really see why.
> 
> For instance, supporting full-blown appliances would mean supporting
> multiple images.  Maybe in multiple formats.  Maybe the user wants
> runtime performance and is willing to give up a bit of installation time
> for that (e.g. for unpacking an archive).
> 
> In any case, if we want to be able to configure every kind of VM, tying
> everything to qcow2 seems like a bad idea.  First defining a format and
> then deciding on whether it makes sense to be able to put it into qcow2
> for certain subcases seems much more reasonable.
> 
> And if you make the format decidedly qcow2-independent, the whole
> "putting it into qcow2 is the simplest implementation" argument becomes
> rather weak.

I don't see why. Yes I think it's a separate format that we should just
allow storing in qcow2 fo

Re: [Qemu-devel] [RFC PATCH 00/19] block: Configuration fixes and rbd authentication

2018-06-07 Thread Jeff Cody

On Thu, Jun 07, 2018 at 08:25:40AM +0200, Markus Armbruster wrote:
> This series is RFC because:
> 
> * It clashes with parts of Max's "[PATCH 00/13] block: Try to create
>   well typed json:{} filenames".  I missed that one until too late,
>   sorry.
> 
>   - I stole "[PATCH 06/13] block: Add block-specific QDict header",
> and tacked on fixups I'd like to have.
> 
>   - My qobject_input_visitor_new_flat_confused() addresses the same
> general problem as Max's qdict_stringify_for_keyval(): the block
> layer's confused use of QObject types.  My solution fixes a number
> of bugs in -blockdev, blockdev-add and -drive.  If Max's solution
> adds further value, we need to merge the two somehow.
> 
>   - I changed qdict_flatten() to fix -blockdev and blockdev-add for
> empty objects and arrays.  Max fixed it to not mess up shallow
> clones.  We need to merge the two.
> 
> * Rbd testing is incomplete.  Jeff Cody volunteered to test on his
>   rig.  Results should be in soon.
> 

Here are some results from auth testing of various combinations; I haven't
completed all the combinations in my matrix yet, but what I have completed
looks like what I would expect.

These were all tested with blockdev-add QAPI commands against this patch
series.

I'll be away on PTO tomorrow (Friday), so I'll conclude testing on Monday.

Warning, long lines below, so don't read it on a vt220 (apologies in
advance if you do...):


  Server|   Client-Side (qemu host)
+--
|   
 |
ServerAuth  |   userkey-secret/etc/ceph/keyring 
auth-client-required | Result
+--
cephx, none |   --- ---   ---   --- 
{"return": {}}
cephx, none |   --- ---   valid --- 
{"return": {}}
cephx, none |   --- ---   invalid   --- 
{"error": {"class": "GenericError", "desc": "error connecting: Invalid 
argument"}}
cephx, none |   --- valid ---   --- 
{"return": {}}
cephx, none |   --- invalid   ---   --- 
{"error": {"class": "GenericError", "desc": "error connecting: Invalid 
argument"}}
cephx, none |   --- invalid   valid --- 
{"error": {"class": "GenericError", "desc": "error connecting: Invalid 
argument"}}
cephx, none |   --- valid invalid   --- 
{"return": {}}
cephx, none |   admin   ---   valid --- 

cephx, none |   admin   ---   invalid   --- 

cephx, none |   invalid ---   valid --- 

cephx, none |   invalid ---   invalid   --- 


cephx, none |   --- ---   ---   none
{"return": {}}
cephx, none |   --- ---   valid none
{"return": {}}
cephx, none |   --- ---   invalid   none
{"return": {}}
cephx, none |   --- valid ---   none
{"return": {}}
cephx, none |   --- invalid   ---   none
{"return": {}}
cephx, none |   --- invalid   valid none
{"return": {}}
cephx, none |   --- valid invalid   none
{"return": {}}
cephx, none |   admin   ---   valid none
{"return": {}}
cephx, none |   admin   ---   invalid   none
{"return": {}}
cephx, none |   invalid ---   valid none
{"return": {}}
cephx, none |   invalid ---   invalid   none
{"return": {}}
|
cephx, none |   --- ---   ---   cephx   
{"error": {"class": "GenericError", "desc": "error connecting: No such 
file or directory"}}
cephx, none |   --- ---   valid cephx   
{"return": {}}
cephx, none |   --- ---   invalid   cephx   
{"error": {"class": "GenericError", "desc": "error connecting: Invalid 
argument"}}
cephx, none |   --- valid ---   cephx   
{"return": {}}
cephx, none |   --- invalid   ---

Re: [Qemu-devel] storing machine data in qcow images?

2018-06-07 Thread Michael S. Tsirkin

On Thu, Jun 07, 2018 at 12:54:33PM +0200, Andrea Bolognani wrote:
> On Thu, 2018-06-07 at 11:36 +0100, Daniel P. Berrangé wrote:
> > On Thu, Jun 07, 2018 at 11:32:18AM +0100, Richard W.M. Jones wrote:
> > > Another problem which Laszlo mentioned is the varstore isn't portable
> > > between UEFI implementations, or if the UEFI is compiled with
> > > different options.  You can even imagine shipping multiple
> > > varstores(!) which argues for a tar-like format.
> > 
> > Could we perhaps imagine shipping the actual UEFI bios, rather
> > than only the varstore.  The bios blob runs in guest context,
> > so there shouldn't be able security concerns from hosting
> > vendors with running user provided bios. Mostly its a matter
> > of confidence that the interface between bios & qemu is stable
> > which feels easier than assuming varstore vs different bios is
> > portable.
> 
> That sounds sensible, and further reinforces the idea that we
> need way more than a single string baked into the qcow2 file.

I don't think anyone said we want a single string.
What was proposed is a set of key value pairs with
values being binary blobs.

> -- 
> Andrea Bolognani / Red Hat / Virtualization

Re: [Qemu-devel] storing machine data in qcow images?

2018-06-07 Thread Michael S. Tsirkin

On Thu, Jun 07, 2018 at 11:36:20AM +0100, Daniel P. Berrangé wrote:
> On Thu, Jun 07, 2018 at 11:32:18AM +0100, Richard W.M. Jones wrote:
> > On Thu, Jun 07, 2018 at 12:02:29PM +0200, Andrea Bolognani wrote:
> > > Something that I haven't seen mentioned in the thread - and this
> > > looks like as good a point as any to jump in - is that for q35
> > > guests using EFI as well as aarch64 guests the "one click import"
> > > experience requires not only hints about the machine (and firmware!)
> > > type, but also a copy of the EFI variable store:
> > > 
> > >   $ virt-builder fedora-27 --arch aarch64 --notes
> > >   Fedora® 27 Server (aarch64)
> > > 
> > >   [...]
> > > 
> > >   You will need to use the associated UEFI NVRAM variables file:
> > > http://libguestfs.org/download/builder/fedora-27-aarch64-nvram.xz
> > 
> > This is true, although only sometimes.  If the bootloader[*] has a
> > working fallback path then usually it is able to boot and reset the
> > UEFI varstore back to the correct values.  We have had bugs before
> > where the fallback path was not working, eg:
> > 
> >   https://bugzilla.redhat.com/show_bug.cgi?id=1353689 (yours!)
> >   https://bugzilla.redhat.com/show_bug.cgi?id=1558793
> > 
> > Another problem which Laszlo mentioned is the varstore isn't portable
> > between UEFI implementations, or if the UEFI is compiled with
> > different options.  You can even imagine shipping multiple
> > varstores(!) which argues for a tar-like format.
> 
> Could we perhaps imagine shipping the actual UEFI bios, rather
> than only the varstore.

That's pretty unusual, UEFI is designed to abstract away the
hardware. It normally ships with the hardware.

I don't think it's a good idea to stick firmware itself in the image:
updating guest images is already a problem, at least we can easily fix
firmware bugs by dnf update on the host.

> The bios blob runs in guest context,
> so there shouldn't be able security concerns from hosting
> vendors with running user provided bios.

It seems possible that users that do supply their own firmware
will want to save it with the image. I don't think
we should do it for the standard firmware.


> Mostly its a matter
> of confidence that the interface between bios & qemu is stable
> which feels easier than assuming varstore vs different bios is
> portable. IIRC, shipping actual UEFI BIOS is something that was
> desirable for AMD SEV usage.

For SEV storing the un-encrypted binary, having QEMU read it out and write
it into guest memory isn't any better than shipping it with QEMU.

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH 0/4] cuda/mos6522 migration fixes

2018-06-07 Thread Philippe Mathieu-Daudé

On 06/07/2018 05:17 PM, Peter Maydell wrote:
> On 7 June 2018 at 20:18, Philippe Mathieu-Daudé  wrote:
>> You forgot patch 5 "vmstate: Remove VMSTATE_TIMER_PTR_TEST" :)
> 
> It's a generic macro that might have utility in future,
> and it fits into a slot in the matrix of possible VMSTATE
> macros, so I wouldn't bother. (It's kind of irritating that we
> have an almost but not quite orthogonal set of VMSTATE
> macros.)

OK!

[Qemu-devel] [PATCH] chardev: Restore CR,LF on stdio

2018-06-07 Thread Philippe Mathieu-Daudé

Remove the 'stair-step output' on stdio.

This partially reverts commit 12fb0ac05, which was correct
on the mailing list but got corrupted by the maintainer :p

Introduced-by: 3b876140-c035-dd39-75d0-d54c48128...@redhat.com
Reported-by: BALATON Zoltan 
Suggested-by: Thomas Huth 
Tested-by: Laurent Desnogues 
Signed-off-by: Philippe Mathieu-Daudé 
---
See:
http://lists.nongnu.org/archive/html/qemu-devel/2018-05/msg06202.html (bug)
http://lists.nongnu.org/archive/html/qemu-devel/2018-06/msg01309.html (report)

Peter, Can this enters directly as bug-fix?

 chardev/char-stdio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/chardev/char-stdio.c b/chardev/char-stdio.c
index d83e60e787..96375f2ab8 100644
--- a/chardev/char-stdio.c
+++ b/chardev/char-stdio.c
@@ -59,7 +59,7 @@ static void qemu_chr_set_echo_stdio(Chardev *chr, bool echo)
 if (!echo) {
 tty.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP
  | INLCR | IGNCR | ICRNL | IXON);
-tty.c_oflag &= ~OPOST;
+tty.c_oflag |= OPOST;
 tty.c_lflag &= ~(ECHO | ECHONL | ICANON | IEXTEN);
 tty.c_cflag &= ~(CSIZE | PARENB);
 tty.c_cflag |= CS8;
-- 
2.17.1

[Qemu-devel] [Bug 1775702] [NEW] High host CPU load and slower guest after upgrade guest OS Windows 10 to ver 1803

2018-06-07 Thread Giovanni Panozzo

Public bug reported:

After upgrading Windows 10 guest to version 1803, guests VM runs slower
and there is high host CPU load even when guest is almost idle. Did not
happened with windows 10 up to version 1709.

See my 1st report here:
https://askubuntu.com/questions/1033985/kvm-high-host-cpu-load-after-upgrading-vm-to-windows-10-1803

Another user report is here:
https://lime-technology.com/forums/topic/71479-windows-10-vm-cpu-usage/

Tested on: Ubuntu 16.04 with qemu 2.5.0 and i3-3217U, Arch with qemu
2.12 i5-7200U, Ubuntu 18.04 qemu 2.11.1 AMD FX-4300. All three platform
showing the same slowdown and higher host cpu load with windows 10 1803
VM compared to windows 10 1709 VM.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1775702

Title:
  High host CPU load and slower guest after upgrade guest OS Windows 10
  to ver 1803

Status in QEMU:
  New

Bug description:
  After upgrading Windows 10 guest to version 1803, guests VM runs
  slower and there is high host CPU load even when guest is almost idle.
  Did not happened with windows 10 up to version 1709.

  See my 1st report here:
  
https://askubuntu.com/questions/1033985/kvm-high-host-cpu-load-after-upgrading-vm-to-windows-10-1803

  Another user report is here:
  https://lime-technology.com/forums/topic/71479-windows-10-vm-cpu-usage/

  Tested on: Ubuntu 16.04 with qemu 2.5.0 and i3-3217U, Arch with qemu
  2.12 i5-7200U, Ubuntu 18.04 qemu 2.11.1 AMD FX-4300. All three
  platform showing the same slowdown and higher host cpu load with
  windows 10 1803 VM compared to windows 10 1709 VM.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1775702/+subscriptions

Re: [Qemu-devel] [PATCH] Improve file-backed RAM

2018-06-07 Thread no-reply

Hi,

This series failed build test on s390x host. Please find the details below.

Type: series
Message-id: 20180606205629.66987-1-...@google.com
Subject: [Qemu-devel] [PATCH] Improve file-backed RAM

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e
echo "=== ENV ==="
env
echo "=== PACKAGES ==="
rpm -qa
echo "=== TEST BEGIN ==="
CC=$HOME/bin/cc
INSTALL=$PWD/install
BUILD=$PWD/build
echo -n "Using CC: "
realpath $CC
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --cc=$CC --prefix=$INSTALL
make -j4
# XXX: we need reliable clean up
# make check -j4 V=1
make install
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
999270755b Improve file-backed RAM

=== OUTPUT BEGIN ===
=== ENV ===
LANG=en_US.UTF-8
XDG_SESSION_ID=216587
USER=fam
PWD=/var/tmp/patchew-tester-tmp-7q84_teq/src
HOME=/home/fam
SHELL=/bin/sh
SHLVL=2
PATCHEW=/home/fam/patchew/patchew-cli -s http://patchew.org --nodebug
LOGNAME=fam
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1012/bus
XDG_RUNTIME_DIR=/run/user/1012
PATH=/usr/bin:/bin
_=/usr/bin/env
=== PACKAGES ===
gpg-pubkey-873529b8-54e386ff
glibc-debuginfo-common-2.24-10.fc25.s390x
fedora-release-26-1.noarch
dejavu-sans-mono-fonts-2.35-4.fc26.noarch
xemacs-filesystem-21.5.34-22.20170124hgf412e9f093d4.fc26.noarch
bash-4.4.12-7.fc26.s390x
libSM-1.2.2-5.fc26.s390x
libmpc-1.0.2-6.fc26.s390x
libaio-0.3.110-7.fc26.s390x
libverto-0.2.6-7.fc26.s390x
perl-Scalar-List-Utils-1.48-1.fc26.s390x
iptables-libs-1.6.1-2.fc26.s390x
tcl-8.6.6-2.fc26.s390x
libxshmfence-1.2-4.fc26.s390x
expect-5.45-23.fc26.s390x
perl-Thread-Queue-3.12-1.fc26.noarch
perl-encoding-2.19-6.fc26.s390x
keyutils-1.5.10-1.fc26.s390x
gmp-devel-6.1.2-4.fc26.s390x
enchant-1.6.0-16.fc26.s390x
python-gobject-base-3.24.1-1.fc26.s390x
python3-enchant-1.6.10-1.fc26.noarch
python-lockfile-0.11.0-6.fc26.noarch
python2-pyparsing-2.1.10-3.fc26.noarch
python2-lxml-4.1.1-1.fc26.s390x
librados2-10.2.7-2.fc26.s390x
trousers-lib-0.3.13-7.fc26.s390x
libdatrie-0.2.9-4.fc26.s390x
libsoup-2.58.2-1.fc26.s390x
passwd-0.79-9.fc26.s390x
bind99-libs-9.9.10-3.P3.fc26.s390x
python3-rpm-4.13.0.2-1.fc26.s390x
systemd-233-7.fc26.s390x
virglrenderer-0.6.0-1.20170210git76b3da97b.fc26.s390x
s390utils-ziomon-1.36.1-3.fc26.s390x
s390utils-osasnmpd-1.36.1-3.fc26.s390x
libXrandr-1.5.1-2.fc26.s390x
libglvnd-glx-1.0.0-1.fc26.s390x
texlive-ifxetex-svn19685.0.5-33.fc26.2.noarch
texlive-psnfss-svn33946.9.2a-33.fc26.2.noarch
texlive-dvipdfmx-def-svn40328-33.fc26.2.noarch
texlive-natbib-svn20668.8.31b-33.fc26.2.noarch
texlive-xdvi-bin-svn40750-33.20160520.fc26.2.s390x
texlive-cm-svn32865.0-33.fc26.2.noarch
texlive-beton-svn15878.0-33.fc26.2.noarch
texlive-fpl-svn15878.1.002-33.fc26.2.noarch
texlive-mflogo-svn38628-33.fc26.2.noarch
texlive-texlive-docindex-svn41430-33.fc26.2.noarch
texlive-luaotfload-bin-svn34647.0-33.20160520.fc26.2.noarch
texlive-koma-script-svn41508-33.fc26.2.noarch
texlive-pst-tree-svn24142.1.12-33.fc26.2.noarch
texlive-breqn-svn38099.0.98d-33.fc26.2.noarch
texlive-xetex-svn41438-33.fc26.2.noarch
gstreamer1-plugins-bad-free-1.12.3-1.fc26.s390x
xorg-x11-font-utils-7.5-33.fc26.s390x
ghostscript-fonts-5.50-36.fc26.noarch
libXext-devel-1.3.3-5.fc26.s390x
libusbx-devel-1.0.21-2.fc26.s390x
libglvnd-devel-1.0.0-1.fc26.s390x
emacs-25.3-3.fc26.s390x
alsa-lib-devel-1.1.4.1-1.fc26.s390x
kbd-2.0.4-2.fc26.s390x
dconf-0.26.0-2.fc26.s390x
mc-4.8.19-5.fc26.s390x
doxygen-1.8.13-9.fc26.s390x
dpkg-1.18.24-1.fc26.s390x
libtdb-1.3.13-1.fc26.s390x
python2-pynacl-1.1.1-1.fc26.s390x
perl-Filter-1.58-1.fc26.s390x
python2-pip-9.0.1-11.fc26.noarch
dnf-2.7.5-2.fc26.noarch
bind-license-9.11.2-1.P1.fc26.noarch
libtasn1-4.13-1.fc26.s390x
cpp-7.3.1-2.fc26.s390x
pkgconf-1.3.12-2.fc26.s390x
python2-fedora-0.10.0-1.fc26.noarch
cmake-filesystem-3.10.1-11.fc26.s390x
python3-requests-kerberos-0.12.0-1.fc26.noarch
libmicrohttpd-0.9.59-1.fc26.s390x
GeoIP-GeoLite-data-2018.01-1.fc26.noarch
python2-libs-2.7.14-7.fc26.s390x
libidn2-2.0.4-3.fc26.s390x
p11-kit-devel-0.23.10-1.fc26.s390x
perl-Errno-1.25-396.fc26.s390x
libdrm-2.4.90-2.fc26.s390x
sssd-common-1.16.1-1.fc26.s390x
boost-random-1.63.0-11.fc26.s390x
urw-fonts-2.4-24.fc26.noarch
ccache-3.3.6-1.fc26.s390x
glibc-debuginfo-2.24-10.fc25.s390x
dejavu-fonts-common-2.35-4.fc26.noarch
bind99-license-9.9.10-3.P3.fc26.noarch
ncurses-libs-6.0-8.20170212.fc26.s390x
libpng-1.6.28-2.fc26.s390x
libICE-1.0.9-9.fc26.s390x
perl-Text-ParseWords-3.30-366.fc26.noarch
libtool-ltdl-2.4.6-17.fc26.s390x
libselinux-utils-2.6-7.fc26.s390x
userspace-rcu-0.9.3-2.fc26.s390x
perl-Class-Inspector-1.31-3.fc26.noarch
keyutils-libs-devel-1.5.10-1.fc26.s390x
isl-0.16.1-1.fc26.s390x
libsecret-0.18.5-3.fc26.s390x
compat-openssl10-1.0.2m-1.fc26.s390x
python3-iniparse-0.4-24.fc26.noarch
python3-dateutil-2.6.0-3.fc26.noarch
python3-firewall-0.4.4.5-1.fc26.noarch
python-enum34-1.1.6-1.fc26.n

Re: [Qemu-devel] storing machine data in qcow images?

2018-06-07 Thread Gerd Hoffmann

  Hi,

> > I could be wrong, but I feel like it's significantly less likely
> > that a random QEMU binary won't like a random EFI ROM than it is
> > for a random EFI ROM to not like a random EFI NVRAM.
> 
> True, but it's not that rare to find SeaBIOS+qemu version problems;

Hmm?  Any recent examples?  Since we switched over to have qemu generate
the acpi tables instead of expecting the firmware doing it (qemu 1.5 or
1.6 IIRC) there where no hard lockstep updates.  Only soft dependencies
a'la "if you want use the new qemu feature foo you also need a seabios
supporting the new feature foo".

> so I'll assume the same happens with EFI.

We try to avoid it but sometimes it doesn't work out as we like.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH v1.1 0/9] target/m68k: Convert to TranslatorOps

2018-06-07 Thread Laurent Vivier

Le 12/05/2018 à 07:02, Richard Henderson a écrit :
> [ Ho, hum.  I didn't clear out my scratch directory before sending v1.0. ]
> 
> FYI, I've only tested this with linux-user-test-0.3 and
> our qemu coldfire testing kernel.
> 
> 
> r~
> 
> 
> Richard Henderson (9):
>   target/m68k: Use DISAS_NORETURN for exceptions
>   target/m68k: Replace DISAS_TB_JUMP with DISAS_NORETURN
>   target/m68k: Remove DISAS_JUMP_NEXT as unused
>   target/m68k: Use lookup_and_goto_tb for DISAS_JUMP
>   target/m68k: Rename DISAS_UPDATE and gen_lookup_tb
>   target/m68k: Convert to DisasContextBase
>   target/m68k: Convert to TranslatorOps
>   target/m68k: Improve ending TB at page boundaries
>   target/m68k: Merge disas_m68k_insn into m68k_tr_translate_insn
> 
>  target/m68k/translate.c | 354 
>  1 file changed, 179 insertions(+), 175 deletions(-)
> 

Richard,

do you want I take this through my m68k tree or do you take this in a
TCG pull requests?

Thanks,
Laurent

Re: [Qemu-devel] [PATCH 0/4] cuda/mos6522 migration fixes

2018-06-07 Thread Peter Maydell

On 7 June 2018 at 20:18, Philippe Mathieu-Daudé  wrote:
> You forgot patch 5 "vmstate: Remove VMSTATE_TIMER_PTR_TEST" :)

It's a generic macro that might have utility in future,
and it fits into a slot in the matrix of possible VMSTATE
macros, so I wouldn't bother. (It's kind of irritating that we
have an almost but not quite orthogonal set of VMSTATE
macros.)

thanks
-- PMM

Re: [Qemu-devel] [PATCH 0/4] linux-user: syscall number fixes

2018-06-07 Thread Laurent Vivier

Le 07/06/2018 à 20:56, no-re...@patchew.org a écrit :
...
> === OUTPUT BEGIN ===
> Checking PATCH 1/4: linux-user/alpha: Fix epoll syscalls...
> ERROR: code indent should never use tabs
> #22: FILE: linux-user/alpha/syscall_nr.h:346:
> +#define TARGET_NR_epoll_create^I^I407$
> 
> ERROR: code indent should never use tabs
> #23: FILE: linux-user/alpha/syscall_nr.h:347:
> +#define TARGET_NR_epoll_ctl^I^I408$
> 
> ERROR: code indent should never use tabs
> #24: FILE: linux-user/alpha/syscall_nr.h:348:
> +#define TARGET_NR_epoll_wait^I^I409$
> 
> total: 3 errors, 0 warnings, 27 lines checked
...
> ERROR: code indent should never use tabs
> #18: FILE: linux-user/microblaze/syscall_nr.h:366:
> +#define TARGET_NR_accept4^I^I362 /* new */$
> 

If you agree, I will remove tabs when I'll commit the patches for the
pull-request.

Thanks,
Laurent

Re: [Qemu-devel] [PATCH 4/4] linux-user/sparc64: Add inotify_rm_watch and tee syscalls

2018-06-07 Thread Laurent Vivier

Le 07/06/2018 à 20:48, Richard Henderson a écrit :
> Signed-off-by: Richard Henderson 
> ---
>  linux-user/sparc64/syscall_nr.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/linux-user/sparc64/syscall_nr.h b/linux-user/sparc64/syscall_nr.h
> index 9391645598..0b91b896da 100644
> --- a/linux-user/sparc64/syscall_nr.h
> +++ b/linux-user/sparc64/syscall_nr.h
> @@ -154,7 +154,7 @@
>  #define TARGET_NR_poll   153 /* Common   
>*/
>  #define TARGET_NR_getdents64 154 /* Linux specific   
>*/
>  #define TARGET_NR_fcntl64155 /* Linux sparc32 Specific   
>*/
> -/* #define TARGET_NR_getdirentries   156SunOS Specific   
>*/
> +#define TARGET_NR_inotify_rm_watch   156 /* Linux specific   
>*/
>  #define TARGET_NR_statfs 157 /* Common   
>*/
>  #define TARGET_NR_fstatfs158 /* Common   
>*/
>  #define TARGET_NR_umount 159 /* Common   
>*/
> @@ -278,7 +278,7 @@
>  #define TARGET_NR_mq_notify  277
>  #define TARGET_NR_mq_getsetattr  278
>  #define TARGET_NR_waitid 279
> -/*#define TARGET_NR_sys_setaltroot   280 available (was setaltroot) */
> +#define TARGET_NR_tee   280
>  #define TARGET_NR_add_key281
>  #define TARGET_NR_request_key282
>  #define TARGET_NR_keyctl 283
> 

Reviewed-by: Laurent Vivier

Re: [Qemu-devel] [PATCH 3/4] linux-user/microblaze: Fix typo in accept4 syscall

2018-06-07 Thread Laurent Vivier

Le 07/06/2018 à 20:48, Richard Henderson a écrit :
> Signed-off-by: Richard Henderson 
> ---
>  linux-user/microblaze/syscall_nr.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/linux-user/microblaze/syscall_nr.h 
> b/linux-user/microblaze/syscall_nr.h
> index 0704449bae..761208e9e6 100644
> --- a/linux-user/microblaze/syscall_nr.h
> +++ b/linux-user/microblaze/syscall_nr.h
> @@ -363,7 +363,7 @@
>  #define TARGET_NR_shutdown   359 /* new */
>  #define TARGET_NR_sendmsg360 /* new */
>  #define TARGET_NR_recvmsg361 /* new */
> -#define TARGET_NR_accept04   362 /* new */
> +#define TARGET_NR_accept4362 /* new */
>  #define TARGET_NR_preadv363 /* new */
>  #define TARGET_NR_pwritev   364 /* new */
>  #define TARGET_NR_rt_tgsigqueueinfo 365 /* new */
> 

Reviewed-by: Laurent Vivier

Re: [Qemu-devel] [PATCH 2/4] linux-user/hppa: Fix typo in mknodat syscall

2018-06-07 Thread Laurent Vivier

Le 07/06/2018 à 20:48, Richard Henderson a écrit :
> Signed-off-by: Richard Henderson 
> ---
>  linux-user/hppa/syscall_nr.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/linux-user/hppa/syscall_nr.h b/linux-user/hppa/syscall_nr.h
> index 55bdf71d50..9c1d0a195d 100644
> --- a/linux-user/hppa/syscall_nr.h
> +++ b/linux-user/hppa/syscall_nr.h
> @@ -279,7 +279,7 @@
>  #define TARGET_NR_ppoll 274
>  #define TARGET_NR_openat275
>  #define TARGET_NR_mkdirat   276
> -#define TARGET_NR_mknotat   277
> +#define TARGET_NR_mknodat   277
>  #define TARGET_NR_fchownat  278
>  #define TARGET_NR_futimesat 279
>  #define TARGET_NR_fstatat64 280
> 

Reviewed-by: Laurent Vivier

Re: [Qemu-devel] [PATCH 1/4] linux-user/alpha: Fix epoll syscalls

2018-06-07 Thread Laurent Vivier

Le 07/06/2018 à 20:48, Richard Henderson a écrit :
> These were named incorrectly, going so far as to invade strace.list.
> 
> Signed-off-by: Richard Henderson 
> ---
>  linux-user/alpha/syscall_nr.h | 6 +++---
>  linux-user/strace.list| 9 -
>  2 files changed, 3 insertions(+), 12 deletions(-)

Reviewed-by: Laurent Vivier

Re: [Qemu-devel] [PATCH 0/4] linux-user: syscall number fixes

2018-06-07 Thread no-reply

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180607184844.30126-1-richard.hender...@linaro.org
Subject: [Qemu-devel] [PATCH 0/4] linux-user: syscall number fixes

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]   
patchew/20180607184844.30126-1-richard.hender...@linaro.org -> 
patchew/20180607184844.30126-1-richard.hender...@linaro.org
Switched to a new branch 'test'
6080fbaefd linux-user/sparc64: Add inotify_rm_watch and tee syscalls
a332062a9c linux-user/microblaze: Fix typo in accept4 syscall
04bf90a937 linux-user/hppa: Fix typo in mknodat syscall
63845de478 linux-user/alpha: Fix epoll syscalls

=== OUTPUT BEGIN ===
Checking PATCH 1/4: linux-user/alpha: Fix epoll syscalls...
ERROR: code indent should never use tabs
#22: FILE: linux-user/alpha/syscall_nr.h:346:
+#define TARGET_NR_epoll_create^I^I407$

ERROR: code indent should never use tabs
#23: FILE: linux-user/alpha/syscall_nr.h:347:
+#define TARGET_NR_epoll_ctl^I^I408$

ERROR: code indent should never use tabs
#24: FILE: linux-user/alpha/syscall_nr.h:348:
+#define TARGET_NR_epoll_wait^I^I409$

total: 3 errors, 0 warnings, 27 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/4: linux-user/hppa: Fix typo in mknodat syscall...
Checking PATCH 3/4: linux-user/microblaze: Fix typo in accept4 syscall...
ERROR: code indent should never use tabs
#18: FILE: linux-user/microblaze/syscall_nr.h:366:
+#define TARGET_NR_accept4^I^I362 /* new */$

total: 1 errors, 0 warnings, 8 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 4/4: linux-user/sparc64: Add inotify_rm_watch and tee syscalls...
WARNING: line over 80 characters
#18: FILE: linux-user/sparc64/syscall_nr.h:157:
+#define TARGET_NR_inotify_rm_watch   156 /* Linux specific 
 */

total: 0 errors, 1 warnings, 16 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [Qemu-devel] storing machine data in qcow images?

2018-06-07 Thread Laszlo Ersek

On 06/07/18 12:51, Andrea Bolognani wrote:
> On Thu, 2018-06-07 at 11:32 +0100, Richard W.M. Jones wrote:
>> On Thu, Jun 07, 2018 at 12:02:29PM +0200, Andrea Bolognani wrote:
>>> Something that I haven't seen mentioned in the thread - and this
>>> looks like as good a point as any to jump in - is that for q35
>>> guests using EFI as well as aarch64 guests the "one click import"
>>> experience requires not only hints about the machine (and firmware!)
>>> type, but also a copy of the EFI variable store:
>>>
>>>   $ virt-builder fedora-27 --arch aarch64 --notes
>>>   Fedora® 27 Server (aarch64)
>>>
>>>   [...]
>>>
>>>   You will need to use the associated UEFI NVRAM variables file:
>>> http://libguestfs.org/download/builder/fedora-27-aarch64-nvram.xz
>>
>> This is true, although only sometimes.  If the bootloader[*] has a
>> working fallback path then usually it is able to boot and reset the
>> UEFI varstore back to the correct values.  We have had bugs before
>> where the fallback path was not working, eg:
>>
>>   https://bugzilla.redhat.com/show_bug.cgi?id=1353689 (yours!)
>>   https://bugzilla.redhat.com/show_bug.cgi?id=1558793
>
> [...]
>> [*] I'm not sure exactly which bit of the bootloader does this,
>> whether it's UEFI itself, or the grub-efi in the guest.
>
> IIUC the UEFI spec itself reserves certain file names in the ESP
> for this fallback mechanism; it's then up to the guest operating
> system to actually install something appropriate there.
>
> In Fedora and RHEL, shim is what takes care of it (except when it
> doesn't ;), but in Debian and Ubuntu AFAIK shim is not included
> and the fallback path doesn't work at all, which makes providing
> the NVRAM file a hard requirement to boot such guests.

Quoting the UEFI-2.7 spec:

> 3.4.3 Boot Option Variables Default Boot Behavior
>
> [...] the boot options require a standard default behavior in the
> exceptional case that valid boot options are not present on a
> platform. The default behavior must be invoked any time the BootOrder
> variable does not exist or only points to nonexistent boot options, or
> if no entry in BootOrder can successfully be executed.
>
> If system firmware supports boot option recovery as described in
> Section 3.4, system firmware must include a PlatformRecovery
> variable specifying a short-form File Path Media Device Path (see
> Section 3.1.2) containing the platform default file path for removable
> media (see Table 11). [...]

(Note from Laszlo: think '\EFI\BOOT\BOOTX64.EFI' on the system disk's
EFI System Partition.)

> It is expected that this default boot will load an operating system or
> a maintenance utility.
>
> If this is an operating system setup program it is then responsible
> for setting the requisite environment variables for subsequent boots.
> [...]

More details:
.

Thanks
Laszlo

Re: [Qemu-devel] [PATCH 3/4] linux-user/microblaze: Fix typo in accept4 syscall

2018-06-07 Thread Philippe Mathieu-Daudé

On 06/07/2018 03:48 PM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson 

04 since '09: 8dfbe4e839e.

Reviewed-by: Philippe Mathieu-Daudé 

> ---
>  linux-user/microblaze/syscall_nr.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/linux-user/microblaze/syscall_nr.h 
> b/linux-user/microblaze/syscall_nr.h
> index 0704449bae..761208e9e6 100644
> --- a/linux-user/microblaze/syscall_nr.h
> +++ b/linux-user/microblaze/syscall_nr.h
> @@ -363,7 +363,7 @@
>  #define TARGET_NR_shutdown   359 /* new */
>  #define TARGET_NR_sendmsg360 /* new */
>  #define TARGET_NR_recvmsg361 /* new */
> -#define TARGET_NR_accept04   362 /* new */
> +#define TARGET_NR_accept4362 /* new */
>  #define TARGET_NR_preadv363 /* new */
>  #define TARGET_NR_pwritev   364 /* new */
>  #define TARGET_NR_rt_tgsigqueueinfo 365 /* new */
>

Re: [Qemu-devel] [PATCH 4/4] mos6522: convert VMSTATE_TIMER_PTR_TEST to VMSTATE_TIMER_PTR

2018-06-07 Thread Peter Maydell

On 7 June 2018 at 18:17, Mark Cave-Ayland  wrote:
> The timers are configured in the mos6522 init function and therefore will
> always exist, so the function can never return false.
>
> Peter also pointed out that this is the only remaining user of
> VMSTATE_TIMER_PTR_TEST in the codebase, so we might as well just convert it
> over to VMSTATE_TIMER_PTR and remove mos6522_timer_exist() as it is no
> longer required.
>
> Signed-off-by: Mark Cave-Ayland 
> ---
>  hw/misc/mos6522.c | 9 +
>  1 file changed, 1 insertion(+), 8 deletions(-)
>
> diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
> index d0a0c9e5d9..e9b686ac92 100644
> --- a/hw/misc/mos6522.c
> +++ b/hw/misc/mos6522.c
> @@ -369,13 +369,6 @@ static const MemoryRegionOps mos6522_ops = {
>  },
>  };
>
> -static bool mos6522_timer_exist(void *opaque, int version_id)
> -{
> -MOS6522Timer *s = opaque;
> -
> -return s->timer != NULL;
> -}
> -
>  static const VMStateDescription vmstate_mos6522_timer = {
>  .name = "mos6522_timer",
>  .version_id = 0,
> @@ -385,7 +378,7 @@ static const VMStateDescription vmstate_mos6522_timer = {
>  VMSTATE_UINT16(counter_value, MOS6522Timer),
>  VMSTATE_INT64(load_time, MOS6522Timer),
>  VMSTATE_INT64(next_irq_time, MOS6522Timer),
> -VMSTATE_TIMER_PTR_TEST(timer, MOS6522Timer, mos6522_timer_exist),
> +VMSTATE_TIMER_PTR(timer, MOS6522Timer),
>  VMSTATE_END_OF_LIST()
>  }
Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [Qemu-devel] [PATCH 2/4] linux-user/hppa: Fix typo in mknodat syscall

2018-06-07 Thread Philippe Mathieu-Daudé

On 06/07/2018 03:48 PM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson 

mknodat is not at 8ee78dece0d.

Reviewed-by: Philippe Mathieu-Daudé 

> ---
>  linux-user/hppa/syscall_nr.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/linux-user/hppa/syscall_nr.h b/linux-user/hppa/syscall_nr.h
> index 55bdf71d50..9c1d0a195d 100644
> --- a/linux-user/hppa/syscall_nr.h
> +++ b/linux-user/hppa/syscall_nr.h
> @@ -279,7 +279,7 @@
>  #define TARGET_NR_ppoll 274
>  #define TARGET_NR_openat275
>  #define TARGET_NR_mkdirat   276
> -#define TARGET_NR_mknotat   277
> +#define TARGET_NR_mknodat   277
>  #define TARGET_NR_fchownat  278
>  #define TARGET_NR_futimesat 279
>  #define TARGET_NR_fstatat64 280
>

1 2 3 4 5 >

1 - 100 of 423 matches

Mail list logo