date:20180611

Re: [Qemu-devel] [PATCH] monitor: postpone monitor_qmp_cleanup_queues

2018-06-11 Thread Peter Xu

On Mon, Jun 11, 2018 at 05:45:49PM +0100, Stefan Hajnoczi wrote:
> On Fri, Jun 08, 2018 at 10:18:25AM +0200, Markus Armbruster wrote:
> > Stefan Hajnoczi  writes:
> > > On Fri, Jun 08, 2018 at 12:42:35PM +0800, Peter Xu wrote:
> > >> On Thu, Jun 07, 2018 at 01:53:01PM +0200, Markus Armbruster wrote:
> > >> > Peter Xu  writes:
> > >> > 
> > >> > > Previously we cleanup the queues when we got CLOSED event.  It was 
> > >> > > used
> > >> > 
> > >> > we clean up
> > >> > 
> > >> > > to make sure we won't leftover replies/events of a old client to a 
> > >> > > new
> > >> > 
> > >> > we won't send leftover replies/events of an old client
> > >> > 
> > >> > > client.  Now this patch postpones that until OPENED.
> > >> > 
> > >> > What if OPENED never comes?
> > >> 
> > >> Then we clean that up until destruction of the monitor.  IMHO it's
> > >> fine, but I'm not sure whether that's an overall acceptable approach.
> > >
> > > I think this patch fixes the problem at the wrong level.  Marc-André's
> > > fix seemed like a cleaner solution.
> > 
> > Is it the right solution?
> > 
> > I proposed another one:
> 
> Sorry, I won't be able to participate in this because I'm behind on
> other patches and tasks.  Therefore, feel free to disregard but I'll
> give my 2 cents:
> 
> This seems like a chardev bug.  The solution should probably be in the
> chardev layer (Marc-André's patch or something else), not in the monitor
> (this patch).

Yes, that's why I said I like Marc-Andre's patch. :) I just don't know
a reliable way to achieve what we want there.

The thing is that we don't really monitor the ioc_out for fd-typed
chardevs.  We only do that when we call qemu_chr_fe_add_watch() (e.g.,
in monitor_flush_locked()) when the write buffer is full.  But
normally we can't detect any event from the output side, hence there is
no way to deliver a CLOSED event when the output fd is the last fd that
is closed.

Maybe we can keep that output watch for the whole lifecycle of the
chardev?  I'm not sure yet.
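
To illustrate what a permanent watch on the output side could detect,
here is a minimal standalone sketch.  It uses plain POSIX poll(), not the
QEMU chardev API, and the helper name output_peer_gone() is hypothetical.
It assumes Linux behaviour, where poll() reports POLLERR on the write end
of a pipe once the read end has been closed, even when no events were
requested:

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of QEMU: check whether the peer of an
 * output fd has gone away.  POLLERR and POLLHUP are always reported by
 * poll(), so no events need to be requested.
 * Returns 1 if the peer is gone, 0 if not, -1 if poll() itself failed.
 */
static int output_peer_gone(int out_fd)
{
    struct pollfd pfd = { .fd = out_fd, .events = 0, .revents = 0 };

    assert(out_fd >= 0);
    if (poll(&pfd, 1, 0) < 0) {
        return -1;
    }
    return (pfd.revents & (POLLERR | POLLHUP)) ? 1 : 0;
}
```

Whether something like this can be wired into the chardev layer for the
whole lifecycle of the chardev is exactly the open question.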

> 
> Even if there is a monitor change, it's probably necessary to at least
> clarify the meaning of the CLOSE event to reduce the chance of similar
> bugs in future chardev users.

Makes sense.

I'll wait for a few more days (I wanted to enable OOB asap since
otherwise we'll reach another rc soon again) to see whether
Paolo/Marc-Andre/... have any better idea; otherwise I'll post a patch
to fix our problem first, then we turn OOB on along with the fix.

Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH] loader: implement START_SEG_ADDR_RECORD

2018-06-11 Thread Su Hang

I do have time. The function mentioned in the last comments isn't difficult to
implement, but I wonder how to write the corresponding qtest case for cortex-m3.
Because current QEMU doesn't support the cortex-m3 instructions, I don't
know how to prove the correctness of the hex loader. For example, in my past patch
series, "Hello World!" is printed if the hex file gets loaded successfully.

Julia has told me a method that I haven't tried yet. I will do it as quickly as I
can.

Best,
Su Hang

> -Original Messages-
> From: "Stefan Hajnoczi" 
> Sent Time: 2018-06-11 21:55:07 (Monday)
> To: "Su Hang" 
> Cc: qemu-devel@nongnu.org, jus...@mail.ru, j...@groklearning.com, 
> j...@jms.id.au, qemu...@steffen-goertz.de
> Subject: Re: [PATCH] loader: implement START_SEG_ADDR_RECORD
> 
> On Sun, Jun 10, 2018 at 10:04:32AM +0800, Su Hang wrote:
> > Sure, thanks for reminding me of this.
> > One thing I must point out: in the current code logic, if
> > "START_SEG_ADDR_RECORD" occurs multiple times, only the last one takes
> > effect. I don't know whether GNU binutils would emit that record more
> > than once.
> 
> It does not.  The behavior you described seems fine.
> 
> Do you have time to send a new revision of your patch series addressing
> the comments from last time?
> 
> Thanks,
> Stefan
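
For reference, a standalone sketch of decoding a Start Segment Address
record (Intel HEX record type 03) with "last one wins" behaviour. This is
not the QEMU loader code: the function names are hypothetical, and turning
CS:IP into a linear address as CS * 16 + IP is an assumption based on the
real-mode x86 convention, so the actual patch may store the values
differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define START_SEG_ADDR_RECORD 0x03  /* Intel HEX record type 03 */

/* Decode one hex digit; returns -1 for a non-hex character. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/*
 * Parse one record line (hypothetical helper).
 * Returns 1 and stores CS * 16 + IP in *start for a valid type-03 record,
 * 0 for any other valid record (*start untouched), -1 for malformed input.
 */
static int parse_start_seg_record(const char *line, uint32_t *start)
{
    uint8_t bytes[260];          /* 255 data bytes + 5 bytes of framing */
    uint8_t sum = 0;
    size_t len, n, i;

    assert(line != NULL && start != NULL);
    if (line[0] != ':') return -1;

    len = strlen(line + 1);      /* hex digits after the ':' */
    while (len > 0 && (line[len] == '\n' || line[len] == '\r')) len--;
    if (len < 10 || len % 2 != 0 || len / 2 > sizeof(bytes)) return -1;

    n = len / 2;
    for (i = 0; i < n; i++) {
        int hi = hex_nibble(line[1 + 2 * i]);
        int lo = hex_nibble(line[2 + 2 * i]);
        if (hi < 0 || lo < 0) return -1;
        bytes[i] = (uint8_t)((hi << 4) | lo);
        sum = (uint8_t)(sum + bytes[i]);
    }

    /* Byte count must match; all bytes including the checksum sum to 0. */
    if ((size_t)bytes[0] != n - 5 || sum != 0) return -1;
    if (bytes[3] != START_SEG_ADDR_RECORD) return 0;
    if (bytes[0] != 4) return -1;

    uint32_t cs = ((uint32_t)bytes[4] << 8) | bytes[5];
    uint32_t ip = ((uint32_t)bytes[6] << 8) | bytes[7];
    *start = (cs << 4) + ip;     /* a later record simply overwrites */
    return 1;
}
```

Feeding every line of a file through parse_start_seg_record() with the
same start variable gives the behaviour described above: if the record
appears more than once, the value from the last occurrence is kept.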

Re: [Qemu-devel] [RFC PATCH 1/1] nvdimm: let qemu requiring section alignment of pmem resource.

2018-06-11 Thread Zhang,Yi

On Mon, 2018-06-11 at 19:55 -0700, Dan Williams wrote:
> On Mon, Jun 11, 2018 at 9:26 AM, Stefan Hajnoczi wrote:
> > 
> > On Mon, Jun 11, 2018 at 06:54:25PM +0800, Zhang Yi wrote:
> > > Nvdimm driver use Memory hot-plug APIs to map it's pmem resource,
> > > which at a section granularity.
> > >
> > > When QEMU emulated the vNVDIMM device, decrease the label-storage,
> > > QEMU will put the vNVDIMMs directly next to one another in physical
> > > address space, which means that the boundary between them won't
> > > align to the 128 MB memory section size.
> >
> > I'm having a hard time parsing this.
> >
> > Where does the "128 MB memory section size" come from?  ACPI?
> > A chipset-specific value?
> >
>
> The devm_memremap_pages() implementation uses the memory hotplug core
> to allocate the 'struct page' array/map for persistent memory. Memory
> hotplug can only be performed in terms of sections, 128MB on x86_64.
> There is some limited support for allowing devm_memremap_pages() to
> overlap 'System RAM' within a given section, but it does not currently
> support multiple devm_memremap_pages() calls overlapping within the
> same section. There is currently a kernel bug where we do not handle
> this unsupported configuration gracefully. The fix will cause
> configurations that try to overlap 2 persistent memory
> ranges in the same section to fail.
>
> The proposed fix is trying to make sure that QEMU does not run afoul
> of this constraint.
>
> There is currently no line of sight to reduce the minimum memory
> hotplug alignment size to less than 128M. Also, as other architectures
> outside of x86_64 add devm_memremap_pages() support, the minimum
> section alignment constraint might change and is a property of a guest
> OS. My understanding is that some guest OSes might expect an even
> larger persistent memory minimum alignment.
>
> 
Thanks for the explanation, Dan. I still have a question: why do we
overlap the unaligned area instead of dropping it and aligning to the
next section?

Re: [Qemu-devel] [PATCH v3 5/7] hmp: Add info commands for preconfig

2018-06-11 Thread Gerd Hoffmann

  Hi,

> > Now let's review the three commands:
> > 
> > * Gerd, why does "info usbhost" have no QMP equivalent?

Works only when running qemu directly; in the libvirt sandbox qemu
doesn't have the permissions needed to scan the host usb bus, so that
would be rather pointless ...

cheers,
  Gerd

Re: [Qemu-devel] [PATCH 00/12] migration: improve multithreads for compression and decompression

2018-06-11 Thread Peter Xu

On Tue, Jun 12, 2018 at 11:19:14AM +0800, Xiao Guangrong wrote:
> 
> 
> On 06/11/2018 04:00 PM, Peter Xu wrote:
> > On Mon, Jun 04, 2018 at 05:55:08PM +0800, guangrong.x...@gmail.com wrote:
> > > From: Xiao Guangrong 
> > > 
> > > Background
> > > ----------
> > > The current implementation of compression and decompression is very
> > > hard to enable in production. We noticed that too many wait-wakes
> > > go to kernel space and CPU usage is very low even if the system
> > > is really free.
> > >
> > > The reasons are:
> > > 1) there are too many locks used for synchronization: there
> > >    is a global lock and each single thread has its own lock, and the
> > >    migration thread and work threads need to go to sleep if
> > >    these locks are busy
> > >
> > > 2) the migration thread submits requests to each thread separately;
> > >    however, only one request can be pending, which means the
> > >    thread has to go to sleep after finishing the request
> > >
> > > Our Ideas
> > > ---------
> > > To make it work better, we introduce a new multithread model.
> > > The user (currently the migration thread) submits requests
> > > to each thread in a round-robin manner. Each thread has its own
> > > ring whose capacity is 4 and puts the result into a global ring
> > > which is lockless for multiple producers. The user fetches results
> > > from the global ring and does the remaining operations for the
> > > request, e.g. posting the compressed data out for migration on
> > > the source QEMU.
> > >
> > > The other work in this patchset offers some statistics to see
> > > if compression works as we expected and makes the migration thread
> > > work faster so it can feed more requests to the threads.
> > 
> > Hi, Guangrong,
> > 
> > I'm not sure whether my understanding is correct, but AFAIU the old
> > code has a major defect that it depends too much on the big lock.  The
> > critical section of the small lock seems to be very short always, and
> > also that's per-thread.  However we use the big lock in lots of
> > places: flush compress data, queue every page, or send the notifies in
> > the compression thread.
> > 
> 
> The lock is one issue, however, another issue is that, the thread has
> to go to sleep after finishing one request and the main thread (live
> migration thread) needs to go to kernel space and wake the thread up
> for every single request.
> 
> And we also observed that linearly scanning the threads one by one to
> see which is free is not cache-friendly...

I don't quite understand how this can be fixed from a cache POV, but I'll
read the series first before asking further.

> 
> > I haven't yet read the whole work, this work seems to be quite nice
> > according to your test results.  However, have you thought about
> > first removing the big lock without touching much of the rest of the
> > code, then continuing to improve it?  Or have you ever tried to do so?
> > I don't think you need to do extra work for this, but I would
> > appreciate if you have existing test results to share.
> > 
> 
> If you really want the performance result, i will try it...

Then that'll be enough for me.  Please only provide the performance
numbers if there are more people asking for that.  Otherwise please
feel free to put that aside.

> 
> Actually, the first version we used in production was a
> lockless multi-thread model (only one atomic operation is needed
> for both producer and consumer) in which only one request can be fed to
> the thread. It's comparable to your suggestion (and should be far
> faster than your suggestion).
>
> We observed that the shortcoming of this solution is that too many waits
> and wakeups trap to the kernel, so the CPU is idle and bandwidth is low.

Okay.

> 
> > In other words, would it be nicer to separate the work into two
> > pieces?
> > 
> > - one to refactor the existing locks, to see what we can gain by
> >   simplify the locks to minimum.  AFAIU now the locking used is still
> >   not ideal, my thinking is that _maybe_ we can start by removing the
> >   big lock, and use a semaphore or something to replace the "done"
> >   notification while still keep the small lock?  Even some busy
> >   looping?
> > 
> 
> Note: no lock is used after this patchset...
> 
> > - one to introduce the lockless ring buffer, to demonstrate how the
> >   lockless data structure helps compared to the locking ways
> > 
> > Then we can know which item contributed how much to the performance
> > numbers.  After all the new ring and thread model seems to be a big
> > chunk of work (sorry I haven't read them yet, but I will).
> 
> It is really a huge burden to refactor the old code and later completely
> remove it.
>
> We redesigned the data structures and algorithm completely and abstracted the
> model to clean up the code used for compression and decompression; it's
> not easy to modify the old code part by part... :(

Yeah; my suggestion above is based on the possibility that removing
the big lock won't be

[Qemu-devel] [QEMU-PPC] [PATCH V2 2/2] ppc/spapr_caps: Don't disable cap_cfpc on POWER8 by default

2018-06-11 Thread Suraj Jitindar Singh

In default_caps_with_cpu() we set spapr_cap_cfpc to broken for POWER8
processors and before.

Since we no longer require private l1d cache on POWER8 for this cap to
be set to workaround change this to default to broken for POWER7
processors and before.

Signed-off-by: Suraj Jitindar Singh 
Reviewed-by: David Gibson 

---

V1 -> V2:
- No Change

---
 hw/ppc/spapr_caps.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 531e145114..00e43a9ba7 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -335,14 +335,10 @@ static sPAPRCapabilities 
default_caps_with_cpu(sPAPRMachineState *spapr,
 
 caps = smc->default_caps;
 
-if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_3_00,
-  0, spapr->max_compat_pvr)) {
-caps.caps[SPAPR_CAP_CFPC] = SPAPR_CAP_BROKEN;
-}
-
 if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_2_07,
   0, spapr->max_compat_pvr)) {
 caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF;
+caps.caps[SPAPR_CAP_CFPC] = SPAPR_CAP_BROKEN;
 }
 
 if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_2_06_PLUS,
-- 
2.13.6

[Qemu-devel] [QEMU-PPC] [PATCH V2 1/2] target/ppc: Don't require private l1d cache on POWER8 for cap_ppc_safe_cache

2018-06-11 Thread Suraj Jitindar Singh

For cap_ppc_safe_cache to be set to workaround, we require both a l1d
cache flush instruction and private l1d cache.

On POWER8 don't require private l1d cache. This means a guest on a
POWER8 machine can make use of the cache flush workarounds.

Signed-off-by: Suraj Jitindar Singh 

---

V1 -> V2:
- Use mfpvr() to detect host type

---
 target/ppc/kvm.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 2c0c34e125..7fe9d0126b 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2412,11 +2412,28 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
 return cap_mmu_hash_v3;
 }
 
+static bool kvmppc_power8_host(void)
+{
+bool ret = false;
+#ifdef TARGET_PPC64
+{
+uint32_t base_pvr = CPU_POWERPC_POWER_SERVER_MASK & mfpvr();
+ret = (base_pvr == CPU_POWERPC_POWER8E_BASE) ||
+  (base_pvr == CPU_POWERPC_POWER8NVL_BASE) ||
+  (base_pvr == CPU_POWERPC_POWER8_BASE);
+}
+#endif /* TARGET_PPC64 */
+return ret;
+}
+
 static int parse_cap_ppc_safe_cache(struct kvm_ppc_cpu_char c)
 {
+bool l1d_thread_priv_req = !kvmppc_power8_host();
+
 if (~c.behaviour & c.behaviour_mask & H_CPU_BEHAV_L1D_FLUSH_PR) {
 return 2;
-} else if ((c.character & c.character_mask & H_CPU_CHAR_L1D_THREAD_PRIV) &&
+} else if ((!l1d_thread_priv_req ||
+c.character & c.character_mask & H_CPU_CHAR_L1D_THREAD_PRIV) &&
(c.character & c.character_mask
 & (H_CPU_CHAR_L1D_FLUSH_ORI30 | H_CPU_CHAR_L1D_FLUSH_TRIG2))) {
 return 1;
-- 
2.13.6
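
The decision the patched parse_cap_ppc_safe_cache() makes can be restated
as a small truth function. This is a standalone sketch, not QEMU code: the
enum and function names are made up for illustration, the booleans stand
in for the masked H_CPU_* bit tests, and the final return of 0 is an
assumption since that part of the function is outside the hunk shown
above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names for the 0/1/2 levels returned in the patch. */
enum safe_cache_level {
    SAFE_CACHE_BROKEN     = 0,  /* assumed fall-through value */
    SAFE_CACHE_WORKAROUND = 1,
    SAFE_CACHE_FIXED      = 2,
};

static_assert(SAFE_CACHE_WORKAROUND == 1 && SAFE_CACHE_FIXED == 2,
              "levels must match the return values used in the patch");

/*
 * flush_required:  the host says an L1D flush is needed
 * l1d_thread_priv: the L1D cache is private to a thread
 * has_flush_insn:  one of the L1D flush instructions is available
 * power8_host:     what kvmppc_power8_host() would report
 */
static enum safe_cache_level decide_safe_cache(bool flush_required,
                                               bool l1d_thread_priv,
                                               bool has_flush_insn,
                                               bool power8_host)
{
    /* On POWER8 the private-L1D requirement is dropped. */
    bool l1d_thread_priv_req = !power8_host;

    if (!flush_required) {
        return SAFE_CACHE_FIXED;
    }
    if ((!l1d_thread_priv_req || l1d_thread_priv) && has_flush_insn) {
        return SAFE_CACHE_WORKAROUND;
    }
    return SAFE_CACHE_BROKEN;
}
```

The only case that changes with this patch is a POWER8 host with a flush
instruction but without a thread-private L1D: it moves from level 0 to
level 1.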

Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-11 Thread Samudrala, Sridhar


On 6/11/2018 7:17 PM, Michael S. Tsirkin wrote:

On Tue, Jun 12, 2018 at 09:54:44AM +0800, Jason Wang wrote:


On 2018-06-12 01:26, Michael S. Tsirkin wrote:

On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote:

This feature bit can be used by hypervisor to indicate virtio_net device to
act as a standby for another device with the same MAC address.

I tested this with a small change to the patch to mark the STANDBY feature
'true' by default, as I am using libvirt to start the VMs.
Is there a way to pass the newly added feature bit 'standby' to qemu via the
libvirt XML file?

Signed-off-by: Sridhar Samudrala 

So I do not think we can commit to this interface: we
really need to control visibility of the primary device.

The problem is legacy guest won't use primary device at all if we do this.

And that's by design - I think it's the only way to ensure the
legacy guest isn't confused.


Yes, I think so. But I am not sure if QEMU is the right place to control the
visibility of the primary device. The primary device may not be specified as
an argument to QEMU. It may be plugged in later.
The cloud service provider is providing a feature that enables a low latency
datapath and live migration capability.
A tenant can use this feature only if he is running a VM that has virtio-net
with failover support.

I think QEMU should check if the guest virtio-net supports this feature and
provide a mechanism for an upper layer indicating whether the STANDBY feature
is successfully negotiated or not.
The upper layer can then decide if it should hot plug a VF with the same MAC
and manage the 2 links.
If the VF is successfully hot plugged, the virtio-net link should be disabled.





How about controlling the visibility of the standby device?

Thanks

The standby is always there to guarantee no downtime.


However just for testing purposes, we could add a non-stable
interface "x-standby" with the understanding that as any
x- prefix it's unstable and will be changed down the road,
likely in the next release.



---
   hw/net/virtio-net.c | 2 ++
   include/standard-headers/linux/virtio_net.h | 3 +++
   2 files changed, 5 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 90502fca7c..38b3140670 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
true),
   DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
   DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
+DEFINE_PROP_BIT64("standby", VirtIONet, host_features, 
VIRTIO_NET_F_STANDBY,
+  false),
   DEFINE_PROP_END_OF_LIST(),
   };
diff --git a/include/standard-headers/linux/virtio_net.h 
b/include/standard-headers/linux/virtio_net.h
index e9f255ea3f..01ec09684c 100644
--- a/include/standard-headers/linux/virtio_net.h
+++ b/include/standard-headers/linux/virtio_net.h
@@ -57,6 +57,9 @@
 * Steering */
   #define VIRTIO_NET_F_CTRL_MAC_ADDR 23/* Set MAC address */
+#define VIRTIO_NET_F_STANDBY  62/* Act as standby for another device
+ * with the same MAC.
+ */
   #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */
   #ifndef VIRTIO_NET_NO_LEGACY
--
2.14.3
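
As a rough illustration of the "check whether STANDBY was negotiated"
idea discussed above, here is a standalone sketch. It is not QEMU code:
the helper names are hypothetical, and it only assumes the usual virtio
rule that the negotiated set is the intersection of the features offered
by the device and those acknowledged by the guest driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_NET_F_CTRL_MAC_ADDR 23
#define VIRTIO_NET_F_STANDBY       62  /* bit number proposed in the patch */

/* Bit 62 does not fit in 32 bits, so build the mask as a 64-bit value. */
static uint64_t feature_mask(unsigned bit)
{
    assert(bit < 64);
    return UINT64_C(1) << bit;
}

/* Hypothetical helper: true only if both sides agreed on STANDBY. */
static bool standby_negotiated(uint64_t host_features,
                               uint64_t guest_features)
{
    uint64_t negotiated = host_features & guest_features;

    return (negotiated & feature_mask(VIRTIO_NET_F_STANDBY)) != 0;
}
```

An upper layer could use a result like this to decide whether hot
plugging a VF with the same MAC makes sense; a legacy guest that never
acknowledges the bit would yield false.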

Re: [Qemu-devel] [Qemu-block] [PATCH v2] util/async: avoid NULL pointer dereference

2018-06-11 Thread Jeff Cody

On Tue, Jun 12, 2018 at 07:26:25AM +0800, Jie Wang wrote:
> if laio_init create linux_aio failed and return NULL, NULL pointer
> dereference will occur when laio_attach_aio_context dereference
> linux_aio in aio_get_linux_aio. Let's avoid it and report error.
> 
> Signed-off-by: Jie Wang 

Reviewed-by: Jeff Cody 

> ---
>  block/file-posix.c | 19 +--
>  util/async.c   |  5 -
>  2 files changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 513d371bb1..653017d7a5 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1665,6 +1665,11 @@ static int coroutine_fn raw_co_prw(BlockDriverState 
> *bs, uint64_t offset,
>  #ifdef CONFIG_LINUX_AIO
>  } else if (s->use_linux_aio) {
>  LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
> +if (!aio) {
> +s->use_linux_aio = false;
> +error_report("Failed to get linux aio");
> +return -EIO;
> +}
>  assert(qiov->size == bytes);
>  return laio_co_submit(bs, aio, s->fd, offset, qiov, type);
>  #endif
> @@ -1695,7 +1700,12 @@ static void raw_aio_plug(BlockDriverState *bs)
>  BDRVRawState *s = bs->opaque;
>  if (s->use_linux_aio) {
>  LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
> -laio_io_plug(bs, aio);
> +if (aio) {
> +laio_io_plug(bs, aio);
> +} else {
> +s->use_linux_aio = false;
> +error_report("Failed to get linux aio");
> +}
>  }
>  #endif
>  }
> @@ -1706,7 +1716,12 @@ static void raw_aio_unplug(BlockDriverState *bs)
>  BDRVRawState *s = bs->opaque;
>  if (s->use_linux_aio) {
>  LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
> -laio_io_unplug(bs, aio);
> +if (aio) {
> +laio_io_unplug(bs, aio);
> +} else {
> +s->use_linux_aio = false;
> +error_report("Failed to get linux aio");
> +}
>  }
>  #endif
>  }
> diff --git a/util/async.c b/util/async.c
> index 03f62787f2..08d71340f8 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -327,8 +327,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
>  {
>  if (!ctx->linux_aio) {
>  ctx->linux_aio = laio_init();
> -laio_attach_aio_context(ctx->linux_aio, ctx);
> +if (ctx->linux_aio) {
> +laio_attach_aio_context(ctx->linux_aio, ctx);
> +}
>  }
> +
>  return ctx->linux_aio;
>  }
>  #endif
> -- 
> 2.15.0.windows.1
> 
>

Re: [Qemu-devel] [PATCH 00/12] migration: improve multithreads for compression and decompression

2018-06-11 Thread Xiao Guangrong





On 06/11/2018 04:00 PM, Peter Xu wrote:

On Mon, Jun 04, 2018 at 05:55:08PM +0800, guangrong.x...@gmail.com wrote:

From: Xiao Guangrong 

Background
----------
The current implementation of compression and decompression is very
hard to enable in production. We noticed that too many wait-wakes
go to kernel space and CPU usage is very low even if the system
is really free.

The reasons are:
1) there are too many locks used for synchronization: there
   is a global lock and each single thread has its own lock, and the
   migration thread and work threads need to go to sleep if
   these locks are busy

2) the migration thread submits requests to each thread separately;
   however, only one request can be pending, which means the
   thread has to go to sleep after finishing the request

Our Ideas
---------
To make it work better, we introduce a new multithread model.
The user (currently the migration thread) submits requests
to each thread in a round-robin manner. Each thread has its own
ring whose capacity is 4 and puts the result into a global ring
which is lockless for multiple producers. The user fetches results
from the global ring and does the remaining operations for the
request, e.g. posting the compressed data out for migration on
the source QEMU.

The other work in this patchset offers some statistics to see
if compression works as we expected and makes the migration thread
work faster so it can feed more requests to the threads.


Hi, Guangrong,

I'm not sure whether my understanding is correct, but AFAIU the old
code has a major defect that it depends too much on the big lock.  The
critical section of the small lock seems to be very short always, and
also that's per-thread.  However we use the big lock in lots of
places: flush compress data, queue every page, or send the notifies in
the compression thread.



The lock is one issue, however, another issue is that, the thread has
to go to sleep after finishing one request and the main thread (live
migration thread) needs to go to kernel space and wake the thread up
for every single request.

And we also observed that linearly scanning the threads one by one to
see which is free is not cache-friendly...


I haven't yet read the whole work, this work seems to be quite nice
according to your test results.  However, have you thought about
first removing the big lock without touching much of the rest of the
code, then continuing to improve it?  Or have you ever tried to do so?
I don't think you need to do extra work for this, but I would
appreciate if you have existing test results to share.



If you really want the performance result, i will try it...

Actually, the first version we used in production was a
lockless multi-thread model (only one atomic operation is needed
for both producer and consumer) in which only one request can be fed to
the thread. It's comparable to your suggestion (and should be far
faster than your suggestion).

We observed that the shortcoming of this solution is that too many waits
and wakeups trap to the kernel, so the CPU is idle and bandwidth is low.


In other words, would it be nicer to separate the work into two
pieces?

- one to refactor the existing locks, to see what we can gain by
   simplify the locks to minimum.  AFAIU now the locking used is still
   not ideal, my thinking is that _maybe_ we can start by removing the
   big lock, and use a semaphore or something to replace the "done"
   notification while still keep the small lock?  Even some busy
   looping?



Note: no lock is used after this patchset...


- one to introduce the lockless ring buffer, to demonstrate how the
   lockless data structure helps compared to the locking ways

Then we can know which item contributed how much to the performance
numbers.  After all the new ring and thread model seems to be a big
chunk of work (sorry I haven't read them yet, but I will).


It is really a huge burden to refactor the old code and later completely
remove it.

We redesigned the data structures and algorithm completely and abstracted the
model to clean up the code used for compression and decompression; it's
not easy to modify the old code part by part... :(

But... if you think it is really needed, I will try to figure out a way
to address your suggestion. :)

Thanks!
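
For readers who have not opened the series yet, the per-thread request
ring and the round-robin submission from the cover letter can be sketched
roughly as below. This is only an illustration under assumptions: it is a
single-producer/single-consumer ring written with C11 atomics, all names
are hypothetical, and it does not cover the multi-producer global result
ring or reproduce the actual data structures of the patchset.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 4   /* per-thread capacity named in the cover letter */

static_assert((RING_CAPACITY & (RING_CAPACITY - 1)) == 0,
              "capacity must be a power of two for index wrap-around");

typedef struct {
    void *slots[RING_CAPACITY];
    atomic_uint head;     /* next slot to read, written by the consumer */
    atomic_uint tail;     /* next slot to write, written by the producer */
} SpscRing;

static void ring_init(SpscRing *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side (the migration thread). Returns false if the ring is full. */
static bool ring_put(SpscRing *r, void *req)
{
    assert(req != NULL);  /* NULL is reserved for "ring empty" */
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (tail - head == RING_CAPACITY) {
        return false;
    }
    r->slots[tail % RING_CAPACITY] = req;
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side (one compression thread). Returns NULL if the ring is empty. */
static void *ring_get(SpscRing *r)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head == tail) {
        return NULL;
    }
    void *req = r->slots[head % RING_CAPACITY];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return req;
}

/* Try each ring once, starting at *next; false means every ring is full. */
static bool submit_round_robin(SpscRing *rings, unsigned nr,
                               unsigned *next, void *req)
{
    for (unsigned i = 0; i < nr; i++) {
        unsigned idx = (*next + i) % nr;

        if (ring_put(&rings[idx], req)) {
            *next = (idx + 1) % nr;
            return true;
        }
    }
    return false;
}
```

With a capacity larger than one, a worker can pick up its next request
without sleeping and being woken through the kernel for every single
page, which is the effect the cover letter is after.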

Re: [Qemu-devel] [PATCH 01/12] migration: do not wait if no free thread

2018-06-11 Thread Peter Xu

On Tue, Jun 12, 2018 at 10:42:25AM +0800, Xiao Guangrong wrote:
> 
> 
> On 06/11/2018 03:39 PM, Peter Xu wrote:
> > On Mon, Jun 04, 2018 at 05:55:09PM +0800, guangrong.x...@gmail.com wrote:
> > > From: Xiao Guangrong 
> > > 
> > > Instead of putting the main thread to sleep state to wait for
> > > free compression thread, we can directly post it out as normal
> > > page that reduces the latency and uses CPUs more efficiently
> > 
> > The feature looks good, though I'm not sure whether we should make a
> > capability flag for this feature since otherwise it'll be hard to
> > switch back to the old full-compression way no matter for what
> > reason.  Would that be a problem?
> > 
> 
> We assume this optimization should always be optimistic for all cases,
> particularly, we introduced the statistics of compression, then the user
> should adjust its parameters based on those statistics if anything works
> worse.

Ah, that'll be good.

> 
> Furthermore, we really need to improve this optimization if it hurts
> any case rather than leaving an option to the user. :)

Yeah, even if we make it a parameter/capability we can still turn that
on by default in new versions but keep the old behavior in old
versions. :) The major difference is that we would still _have_ a
way to compress every page. I'm just thinking that if we don't have a
switch for that, then someone who wants to measure e.g. how a new
compression algo could help VM migration won't be able to do that
any more, since the numbers will be meaningless if it is out of their
control which pages get compressed.

Though I don't know how much use it'll bring...  But if that won't be
too hard, it still seems good.  Not a strong opinion.

> 
> > > 
> > > Signed-off-by: Xiao Guangrong 
> > > ---
> > >   migration/ram.c | 34 +++---
> > >   1 file changed, 15 insertions(+), 19 deletions(-)
> > > 
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 5bcbf7a9f9..0caf32ab0a 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -1423,25 +1423,18 @@ static int 
> > > compress_page_with_multi_thread(RAMState *rs, RAMBlock *block,
> > >   thread_count = migrate_compress_threads();
> > >   qemu_mutex_lock(&comp_done_lock);
> > 
> > Can we drop this lock in this case?
> 
> The lock is used to protect comp_param[].done...

IMHO it's okay?

It's used in this way:

  if (done) {
done = false;
  }

So it only switches done from true->false.

And the compression thread is the only one that did the other switch
(false->true).  IMHO this special case will allow no-lock since as
long as "done" is true here then current thread will be the only one
to modify it, then no race at all.

> 
> Well, we could possibly remove it if we redesign the implementation,
> e.g. use atomic access for comp_param.done; however, I believe it still
> cannot work efficiently. Please see more in the later reply to your
> comments on the cover letter.

Will read that after it arrives; though I didn't receive a reply.
Have you missed clicking the "send" button? ;)

Regards,

-- 
Peter Xu

Re: [Qemu-devel] [RFC PATCH 1/1] nvdimm: let qemu requiring section alignment of pmem resource.

2018-06-11 Thread Dan Williams

On Mon, Jun 11, 2018 at 9:26 AM, Stefan Hajnoczi  wrote:
> On Mon, Jun 11, 2018 at 06:54:25PM +0800, Zhang Yi wrote:
>> Nvdimm driver use Memory hot-plug APIs to map it's pmem resource,
>> which at a section granularity.
>>
>> When QEMU emulated the vNVDIMM device, decrease the label-storage,
>> QEMU will put the vNVDIMMs directly next to one another in physical
>> address space, which means that the boundary between them won't
>> align to the 128 MB memory section size.
>
> I'm having a hard time parsing this.
>
> Where does the "128 MB memory section size" come from?  ACPI?
> A chipset-specific value?
>

The devm_memremap_pages() implementation uses the memory hotplug core
to allocate the 'struct page' array/map for persistent memory. Memory
hotplug can only be performed in terms of sections, 128MB on x86_64.
There is some limited support for allowing devm_memremap_pages() to
overlap 'System RAM' within a given section, but it does not currently
support multiple devm_memremap_pages() calls overlapping within the
same section. There is currently a kernel bug where we do not handle
this unsupported configuration gracefully. The fix will cause
configurations that try to overlap 2 persistent memory
ranges in the same section to fail.

The proposed fix is trying to make sure that QEMU does not run afoul
of this constraint.

There is currently no line of sight to reduce the minimum memory
hotplug alignment size to less than 128M. Also, as other architectures
outside of x86_64 add devm_memremap_pages() support, the minimum
section alignment constraint might change and is a property of a guest
OS. My understanding is that some guest OSes might expect an even
larger persistent memory minimum alignment.
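
To make the constraint concrete, here is a small standalone sketch of the
arithmetic. The names are hypothetical, the section size is hard-coded to
the x86_64 value mentioned above, and the 128 KiB label area in the usage
note is just an example figure:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTION_SIZE (UINT64_C(128) << 20)   /* 128 MiB on x86_64 */

/* Round an address up to the next section boundary. */
static uint64_t section_align_up(uint64_t addr)
{
    return (addr + SECTION_SIZE - 1) & ~(SECTION_SIZE - 1);
}

/* True if [a, a + a_len) and [b, b + b_len) touch a common section. */
static bool ranges_share_section(uint64_t a, uint64_t a_len,
                                 uint64_t b, uint64_t b_len)
{
    assert(a_len > 0 && b_len > 0);

    uint64_t a_first = a / SECTION_SIZE;
    uint64_t a_last  = (a + a_len - 1) / SECTION_SIZE;
    uint64_t b_first = b / SECTION_SIZE;
    uint64_t b_last  = (b + b_len - 1) / SECTION_SIZE;

    return a_first <= b_last && b_first <= a_last;
}
```

For example, a 1 GiB vNVDIMM at 4 GiB whose usable pmem is shortened by a
128 KiB label area ends at 0x13FFE0000. A second device placed directly
behind it shares a section with the first one; starting it at
section_align_up(0x13FFE0000) = 0x140000000 avoids the overlap.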

Re: [Qemu-devel] [PATCH 01/12] migration: do not wait if no free thread

2018-06-11 Thread Xiao Guangrong





On 06/11/2018 03:39 PM, Peter Xu wrote:

On Mon, Jun 04, 2018 at 05:55:09PM +0800, guangrong.x...@gmail.com wrote:

From: Xiao Guangrong 

Instead of putting the main thread to sleep state to wait for
free compression thread, we can directly post it out as normal
page that reduces the latency and uses CPUs more efficiently


The feature looks good, though I'm not sure whether we should make a
capability flag for this feature since otherwise it'll be hard to
switch back to the old full-compression way no matter for what
reason.  Would that be a problem?



We assume this optimization should always be optimistic for all cases,
particularly, we introduced the statistics of compression, then the user
should adjust its parameters based on those statistics if anything works
worse.

Furthermore, we really need to improve this optimization if it hurts
any case rather than leaving an option to the user. :)



Signed-off-by: Xiao Guangrong 
---
  migration/ram.c | 34 +++---
  1 file changed, 15 insertions(+), 19 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 5bcbf7a9f9..0caf32ab0a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1423,25 +1423,18 @@ static int compress_page_with_multi_thread(RAMState 
*rs, RAMBlock *block,
  
  thread_count = migrate_compress_threads();

  qemu_mutex_lock(&comp_done_lock);


Can we drop this lock in this case?


The lock is used to protect comp_param[].done...

Well, we could possibly remove it if we redesign the implementation,
e.g. use atomic access for comp_param.done; however, I believe it still
cannot work efficiently. Please see more in the later reply to your
comments on the cover letter.

Re: [Qemu-devel] [PATCH v1 11/11] pc-dimm: assign and verify the "slot" property during pre_plug

2018-06-11 Thread David Gibson

On Mon, Jun 11, 2018 at 02:16:55PM +0200, David Hildenbrand wrote:
> We can assign and verify the slot before realizing and trying to plug.
> reading/writing the slot property should never change, so let's reduce
> error handling a bit by using &error_abort.
> 
> Signed-off-by: David Hildenbrand 

Reviewed-by: David Gibson 

> ---
>  hw/mem/pc-dimm.c | 31 ++-
>  1 file changed, 14 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> index 995ce22d8d..88423f95a3 100644
> --- a/hw/mem/pc-dimm.c
> +++ b/hw/mem/pc-dimm.c
> @@ -30,12 +30,25 @@
>  void pc_dimm_memory_pre_plug(DeviceState *dev, MachineState *machine,
>   Error **errp)
>  {
> +Error *local_err = NULL;
> +int slot;
> +
> +slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP,
> +   &error_abort);
> +slot = pc_dimm_get_free_slot(slot == PC_DIMM_UNASSIGNED_SLOT ? NULL : &slot,
> + machine->ram_slots, &local_err);
> +if (local_err) {
> +goto out;
> +}
> +object_property_set_int(OBJECT(dev), slot, PC_DIMM_SLOT_PROP, &error_abort);
> +trace_mhp_pc_dimm_assigned_slot(slot);
> +out:
> +error_propagate(errp, local_err);
>  }
>  
>  void pc_dimm_memory_plug(DeviceState *dev, MachineState *machine,
>   uint64_t align, Error **errp)
>  {
> -int slot;
>  PCDIMMDevice *dimm = PC_DIMM(dev);
>  PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
>  MemoryRegion *vmstate_mr = ddc->get_vmstate_memory_region(dimm);
> @@ -61,22 +74,6 @@ void pc_dimm_memory_plug(DeviceState *dev, MachineState 
> *machine,
>  }
>  trace_mhp_pc_dimm_assigned_address(addr);
>  
> -slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP, &local_err);
> -if (local_err) {
> -goto out;
> -}
> -
> -slot = pc_dimm_get_free_slot(slot == PC_DIMM_UNASSIGNED_SLOT ? NULL : &slot,
> - machine->ram_slots, &local_err);
> -if (local_err) {
> -goto out;
> -}
> -object_property_set_int(OBJECT(dev), slot, PC_DIMM_SLOT_PROP, &local_err);
> -if (local_err) {
> -goto out;
> -}
> -trace_mhp_pc_dimm_assigned_slot(slot);
> -
>  memory_device_plug_region(machine, mr, addr);
>  vmstate_register_ram(vmstate_mr, dev);
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
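
As an aside on the "assign and verify the slot" step moved into pre_plug above, the following is a simplified, self-contained sketch of that kind of logic. All names (SLOT_UNASSIGNED, pick_slot, pre_plug_assign_slot) are hypothetical, the "used" array stands in for whatever bookkeeping the real code does, and error reporting is reduced to a -1 return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOT_UNASSIGNED (-1)   /* hypothetical marker for "no slot requested" */

/* With a hint, verify that the requested slot is in range and free.
 * Without a hint (NULL), pick the first free slot.
 * Returns the slot number, or -1 if nothing suitable is available. */
static int pick_slot(const int *hint, const bool *used, int max_slots)
{
    assert(used != NULL);
    if (hint) {
        if (*hint < 0 || *hint >= max_slots || used[*hint]) {
            return -1;
        }
        return *hint;
    }
    for (int i = 0; i < max_slots; i++) {
        if (!used[i]) {
            return i;
        }
    }
    return -1;
}

/* Assign the slot up front, before any plugging work starts, so a bad
 * request fails early. Marks the slot as used on success. */
static int pre_plug_assign_slot(int requested, bool *used, int max_slots)
{
    int slot = pick_slot(requested == SLOT_UNASSIGNED ? NULL : &requested,
                         used, max_slots);
    if (slot >= 0) {
        used[slot] = true;
    }
    return slot;
}
```

The point of doing this before plugging is that a failed assignment leaves nothing to undo.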



[Qemu-devel] [PATCH v2] vhost-user: delete net client if necessary

2018-06-11 Thread linzhecheng

qemu_new_net_client() creates new ncs, but if an error happens later the
ncs are left in the global net_clients list and can no longer be used,
so we need to clean them up.

Signed-off-by: linzhecheng 

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 608b837175..a39f9c9974 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -345,6 +345,9 @@ err:
 s->vhost_user = NULL;
 }
 }
+if (nc0) {
+qemu_del_net_client(nc0);
+}
 
 return -1;
 }
-- 
2.12.2.windows.2

Re: [Qemu-devel] [PULL 00/13] Python queue, 2018-06-11

2018-06-11 Thread no-reply

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180611174243.31070-1-ehabk...@redhat.com
Subject: [Qemu-devel] [PULL 00/13] Python queue, 2018-06-11

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
9befc781a3 python: Remove scripts/ordereddict.py
5c22101565 python: Remove scripts/argparse.py
1c07589cb9 configure: Require Python 2.7 or newer
47a73d5f8a python: futurize -f lib2to3.fixes.fix_numliterals
e0512db6fe python: futurize -f lib2to3.fixes.fix_except
ff4f6bf56d python: futurize -f lib2to3.fixes.fix_renames
029e4039c4 python: futurize -f lib2to3.fixes.fix_tuple_params
d560c7b24d python: futurize -f lib2to3.fixes.fix_reduce
ff1b0480fe python: futurize -f lib2to3.fixes.fix_standarderror
89e08e39a0 python: futurize -f lib2to3.fixes.fix_has_key
77e2e30a73 python: futurize -f libfuturize.fixes.fix_next_call
8cf63c3b07 python: futurize -f libfuturize.fixes.fix_absolute_import
e631df3427 python: futurize -f libfuturize.fixes.fix_print_with_import

=== OUTPUT BEGIN ===
Checking PATCH 1/13: python: futurize -f 
libfuturize.fixes.fix_print_with_import...
ERROR: line over 90 characters
#45: FILE: scripts/analyse-9p-simpletrace.py:86:
+print("RERROR (tag =", tag, ", id =", symbol_9p[id], ", err = 
\"", os.strerror(err), "\")")

ERROR: line over 90 characters
#49: FILE: scripts/analyse-9p-simpletrace.py:89:
+print("TVERSION (tag =", tag, ", msize =", msize, ", version 
=", version, ")")

ERROR: line over 90 characters
#53: FILE: scripts/analyse-9p-simpletrace.py:92:
+print("RVERSION (tag =", tag, ", msize =", msize, ", version 
=", version, ")")

ERROR: line over 90 characters
#57: FILE: scripts/analyse-9p-simpletrace.py:95:
+print("TATTACH (tag =", tag, ", fid =", fid, ", afid =", afid, 
", uname =", uname, ", aname =", aname, ")")

ERROR: line over 90 characters
#61: FILE: scripts/analyse-9p-simpletrace.py:98:
+print("RATTACH (tag =", tag, ", qid={type =", type, ", version 
=", version, ", path =", path, "})")

ERROR: line over 90 characters
#69: FILE: scripts/analyse-9p-simpletrace.py:104:
+print("RSTAT (tag =", tag, ", mode =", mode, ", atime =", 
atime, ", mtime =", mtime, ", length =", length, ")")

ERROR: line over 90 characters
#73: FILE: scripts/analyse-9p-simpletrace.py:107:
+print("TGETATTR (tag =", tag, ", fid =", fid, ", request_mask 
=", hex(request_mask), ")")

ERROR: line over 90 characters
#77: FILE: scripts/analyse-9p-simpletrace.py:110:
+print("RGETATTR (tag =", tag, ", result_mask =", 
hex(result_mask), ", mode =", oct(mode), ", uid =", uid, ", gid =", gid, ")")

ERROR: line over 90 characters
#81: FILE: scripts/analyse-9p-simpletrace.py:113:
+print("TWALK (tag =", tag, ", fid =", fid, ", newfid =", 
newfid, ", nwnames =", nwnames, ")")

ERROR: line over 90 characters
#85: FILE: scripts/analyse-9p-simpletrace.py:116:
+print("RWALK (tag =", tag, ", nwnames =", nwnames, ", qids =", 
hex(qids), ")")

WARNING: line over 80 characters
#89: FILE: scripts/analyse-9p-simpletrace.py:119:
+print("TOPEN (tag =", tag, ", fid =", fid, ", mode =", 
oct(mode), ")")

ERROR: line over 90 characters
#93: FILE: scripts/analyse-9p-simpletrace.py:122:
+print("ROPEN (tag =", tag,  ", qid={type =", type, ", version 
=", version, ", path =", path, "}, iounit =", iounit, ")")

ERROR: line over 90 characters
#97: FILE: scripts/analyse-9p-simpletrace.py:125:
+print("TLCREATE (tag =", tag, ", dfid =", dfid, ", flags =", 
oct(flags), ", mode =", oct(mode), ", gid =", gid, ")")

ERROR: line over 90 characters
#101: FILE: scripts/analyse-9p-simpletrace.py:128:
+print("RLCREATE (tag =", tag,  ", qid={type =", type, ", 
version =", version, ", path =", path, "}, iounit =", iounit, ")")

WARNING: line over 80 characters
#105: FILE: scripts/analyse-9p-simpletrace.py:131:
+print("TFSYNC (tag =", tag, ", fid =", fid, ", datasync =", 
datasync, ")")

ERROR: line over 90 characters
#113: FILE: scripts/analyse-9p-simpletrace.py:137:
+print("TREAD (tag =", tag, ", fid =", fid, ", off =", off, ", 
max_count =", max_count, ")")

WARNING: line over 80 characters
#117: FILE: scripts/analyse-9p-simpletrace.py:140:
+

Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-11 Thread Michael S. Tsirkin

On Tue, Jun 12, 2018 at 09:54:44AM +0800, Jason Wang wrote:
> 
> 
> On 2018年06月12日 01:26, Michael S. Tsirkin wrote:
> > On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote:
> > > This feature bit can be used by hypervisor to indicate virtio_net device 
> > > to
> > > act as a standby for another device with the same MAC address.
> > > 
> > > I tested this with a small change to the patch to mark the STANDBY 
> > > feature 'true'
> > > by default as i am using libvirt to start the VMs.
> > > Is there a way to pass the newly added feature bit 'standby' to qemu via 
> > > libvirt
> > > XML file?
> > > 
> > > Signed-off-by: Sridhar Samudrala 
> > So I do not think we can commit to this interface: we
> > really need to control visibility of the primary device.
> 
> The problem is legacy guest won't use primary device at all if we do this.

And that's by design - I think it's the only way to ensure the
legacy guest isn't confused.

> How about control the visibility of standby device?
> 
> Thanks

The standby is always there, to guarantee no downtime.

> > However just for testing purposes, we could add a non-stable
> > interface "x-standby" with the understanding that as any
> > x- prefix it's unstable and will be changed down the road,
> > likely in the next release.
> > 
> > 
> > > ---
> > >   hw/net/virtio-net.c | 2 ++
> > >   include/standard-headers/linux/virtio_net.h | 3 +++
> > >   2 files changed, 5 insertions(+)
> > > 
> > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > index 90502fca7c..38b3140670 100644
> > > --- a/hw/net/virtio-net.c
> > > +++ b/hw/net/virtio-net.c
> > > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
> > >true),
> > >   DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, 
> > > SPEED_UNKNOWN),
> > >   DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> > > +DEFINE_PROP_BIT64("standby", VirtIONet, host_features, 
> > > VIRTIO_NET_F_STANDBY,
> > > +  false),
> > >   DEFINE_PROP_END_OF_LIST(),
> > >   };
> > > diff --git a/include/standard-headers/linux/virtio_net.h 
> > > b/include/standard-headers/linux/virtio_net.h
> > > index e9f255ea3f..01ec09684c 100644
> > > --- a/include/standard-headers/linux/virtio_net.h
> > > +++ b/include/standard-headers/linux/virtio_net.h
> > > @@ -57,6 +57,9 @@
> > >* Steering */
> > >   #define VIRTIO_NET_F_CTRL_MAC_ADDR 23   /* Set MAC address */
> > > +#define VIRTIO_NET_F_STANDBY  62/* Act as standby for another 
> > > device
> > > + * with the same MAC.
> > > + */
> > >   #define VIRTIO_NET_F_SPEED_DUPLEX 63/* Device set linkspeed and 
> > > duplex */
> > >   #ifndef VIRTIO_NET_NO_LEGACY
> > > -- 
> > > 2.14.3

Re: [Qemu-devel] question: a dead loop in qemu when do blockJobAbort and vm suspend coinstantaneously

2018-06-11 Thread l00284672


As my previous reply said, mirror_exit is run in
aio_poll(qemu_get_aio_context(), true).  In
mirror_exit, the bs will be freed by bdrv_unref, which leads to a
NULL pointer access later on in bdrv_set_aio_context.
So we should add a bdrv_ref to keep the bs from being freed in mirror_exit.
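
The pattern being proposed here (pin an object with an extra reference while nested processing may drop the last one) can be sketched in isolation as follows. Node and the helper functions are hypothetical stand-ins, not the block layer's BlockDriverState or its API, and a "freed" flag replaces a real free() so the effect can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted object. */
typedef struct Node {
    int refcnt;
    bool freed;   /* set when the last reference goes away */
} Node;

static void node_ref(Node *n)
{
    assert(!n->freed);
    n->refcnt++;
}

static void node_unref(Node *n)
{
    assert(n->refcnt > 0);
    if (--n->refcnt == 0) {
        n->freed = true;
    }
}

/* Stand-in for nested event-loop work in which a finishing job drops
 * the reference it held. */
static void run_pending_completions(Node *n)
{
    node_unref(n);
}

/* Without a guard: returns whether the node is still usable after the
 * nested work. It is not, if the job held the only reference. */
static bool set_context_unguarded(Node *n)
{
    run_pending_completions(n);
    return !n->freed;
}

/* With a guard: our own reference keeps the node alive until we are done. */
static bool set_context_guarded(Node *n)
{
    bool alive;

    node_ref(n);
    run_pending_completions(n);
    alive = !n->freed;
    node_unref(n);
    return alive;
}
```

Whether the guard is needed in the real code depends on whether another owner already holds a reference at that point, which is the question raised in the quoted reply below.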

On 2018/6/12 9:45, Fam Zheng wrote:

On Mon, 06/11 11:31, l00284672 wrote:

I tried your patch; with my modification below, it solves this problem.

void blk_set_aio_context(BlockBackend *blk, AioContext *new_context)
{
    BlockDriverState *bs = blk_bs(blk);
    ThrottleGroupMember *tgm = &blk->public.throttle_group_member;

    if (bs) {
        if (tgm->throttle_state) {
            bdrv_drained_begin(bs);
            throttle_group_detach_aio_context(tgm);
            throttle_group_attach_aio_context(tgm, new_context);
            bdrv_drained_end(bs);
        }
        bdrv_ref(bs);    /* added: keep bs alive across the context switch */
        bdrv_set_aio_context(bs, new_context);
        bdrv_unref(bs);  /* added: drop the temporary reference */
    }
}

I added bdrv_ref before bdrv_set_aio_context to keep the bs from being
freed in mirror_exit.  Do you agree with my modification?

TBH I don't understand this change. @blk should have a reference to @bs here,
no?  Why is an extra reference making any difference?

Fam


On 2018/6/11 11:01, l00284672 wrote:

Thanks for your reply.

I tried your patch but it didn't work: qemu crashed.  The backtrace of
the crash is below:

(gdb) bt
#0  bdrv_detach_aio_context (bs=bs@entry=0x55a96b79ca30)
#1  0x55a9688249ae in bdrv_set_aio_context
(bs=bs@entry=0x55a96b79ca30,
     new_context=new_context@entry=0x55a96b766920)
#2  0x55a96885f721 in blk_set_aio_context (blk=0x55a96b792820,
new_context=0x55a96b766920)
#3  0x55a9685ab797 in virtio_blk_data_plane_stop (vdev=)
#4  0x55a9687bf705 in virtio_bus_stop_ioeventfd (bus=0x55a96cc42220)
#5  0x55a9685d9d94 in virtio_vmstate_change (opaque=0x55a96cc42290,
running=,
     state=)
#6  0x55a96866e1a2 in vm_state_notify (running=running@entry=0,
state=state@entry=RUN_STATE_PAUSED)
#7  0x55a96857b4c5 in do_vm_stop (state=RUN_STATE_PAUSED)
#8  vm_stop (state=state@entry=RUN_STATE_PAUSED)
#9  0x55a96867d52b in qmp_stop (errp=errp@entry=0x7fff4e54a0d8)
#10 0x55a96867b6ab in qmp_marshal_stop (args=,
ret=, errp=0x7fff4e54a
#11 0x55a9688c2267 in do_qmp_dispatch (errp=0x7fff4e54a118,
request=0x55a96b7b4740)
#12 qmp_dispatch (request=request@entry=0x55a96b7ae490)
#13 0x55a96857dd42 in handle_qmp_command (parser=,
tokens=)
#14 0x55a9688c7534 in json_message_process_token
(lexer=0x55a96b776a68, input=0x55a96b70cae0, type=
     x=36, y=91)
#15 0x55a9688e960b in json_lexer_feed_char
(lexer=lexer@entry=0x55a96b776a68, ch=125 '}',
     flush=flush@entry=false)
#16 0x55a9688e96ce in json_lexer_feed (lexer=0x55a96b776a68,
buffer=, size=, buffer=,
#18 0x55a96857c5fb in monitor_qmp_read (opaque=,
buf=,  size=)
#19 0x55a968667596 in tcp_chr_read (chan=,
cond=, opaque=0x55a96b7748
#20 0x7f8a9447899a in g_main_context_dispatch () from
/lib64/libglib-2.0.so.0
#21 0x55a968828c3c in glib_pollfds_poll ()
#22 os_host_main_loop_wait (timeout=)
#23 main_loop_wait (nonblocking=)
#24 0x55a96854351f in main_loop () at vl.c:2095
#25 main (argc=, argv=, envp=)

(gdb) p *bs
$1 = {total_sectors = 94186141054112, open_flags = 1811887680, read_only
= 169, encrypted = 85, valid_k
   sg = false, probed = false, copy_on_read = 0, flush_queue = {entries =
{sqh_first = 0x0,
   sqh_last = 0x55a96b79ca48}}, active_flush_req = false, flushed_gen
= 68727, drv = 0x0, opaque = 0
   aio_context = 0x55a96b778cd0, aio_notifiers = {lh_first = 0x0},
walking_aio_notifiers = false,
   filename = "/mnt/sdb/lzg/disk_10G.son", '\000' ,
   backing_file = "\000mnt/sdb/lzg/disk_10G.raw", '\000' ,
   backing_format = "\000aw", '\000' ,
full_open_options = 0x0,
   exact_filename = "/mnt/sdb/lzg/disk_10G.son", '\000' , backing = 0x0, file = 0x0,
   before_write_notifiers = {notifiers = {lh_first = 0x0}}, in_flight =
0, serialising_in_flight = 0,
   wakeup = false, wr_highest_offset = 35188224, bl = {request_alignment
= 0, max_pdiscard = 0,
     pdiscard_alignment = 0, max_pwrite_zeroes = 0,
pwrite_zeroes_alignment = 0, opt_transfer = 0, max_t
     min_mem_alignment = 0, opt_mem_alignment = 0, max_iov = 0},
supported_write_flags = 0,
   supported_zero_flags = 4, node_name = "#block349", '\000' , node_list = {
     tqe_next = 0x55a96b7b14f0, tqe_prev = 0x0}, bs_list = {tqe_next =
0x55a96b7ab240, tqe_prev = 0x0},
   monitor_list = {tqe_next = 0x0, tqe_prev = 0x0}, dirty_bitmaps =
{lh_first = 0x0}, refcnt = 0,
   tracked_requests = {lh_first = 0x0}, op_blockers = {{lh_first = 0x0}
}, job = 0x0,
   inherits_from = 0x0, children = {lh_first = 0x0}, parents = {lh_first
= 0x0}, options = 0x0,
   explicit_options = 0x0, detect_zeroes =
BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF, backing_blocker = 0x0,
   write_threshold_offset = 0, write_threshold_notifier = {notify = 0x0,
node = {le_next = 0x0, le_prev
   io_plugged = 0, quiesce_counter =

Re: [Qemu-devel] [PATCH] vhost-user: delete net client if necessary

2018-06-11 Thread Jason Wang





On 2018年06月11日 15:31, linzhecheng wrote:

qemu_new_net_client() creates new ncs, but if an error happens later the
ncs are left in the global net_clients list, so we need to clean them up.

Signed-off-by: linzhecheng 

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 608b837175..1c7ee48b60 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -345,6 +345,7 @@ err:
  s->vhost_user = NULL;
  }
  }
+qemu_del_net_client(nc0);
  
  return -1;

  }


Can nc0 be NULL here?

And please try to use "Fixes:" tag to ease stable maintenance.

Thanks

Re: [Qemu-devel] [PATCH v3] net: Fix a potential segfault

2018-06-11 Thread Jason Wang





On 2018年06月11日 17:48, Thomas Huth wrote:

On 11.06.2018 11:23, Lin Ma wrote:

If the user forgets to provide a backend type for '-netdev' on the qemu
command line, it triggers a segfault.

e.g.

Expected:
$ qemu -netdev id=net0
qemu-system-x86_64: Parameter 'type' is missing

Actual:
$ qemu -netdev id=net0
Segmentation fault (core dumped)

Signed-off-by: Lin Ma 
---
  net/net.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/net.c b/net/net.c
index efb9eaf779..2a3133990c 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1093,7 +1093,9 @@ static int net_client_init(QemuOpts *opts, bool 
is_netdev, Error **errp)
  int ret = -1;
  Visitor *v = opts_visitor_new(opts);
  
-if (is_netdev && is_help_option(qemu_opt_get(opts, "type"))) {

+const char *type = qemu_opt_get(opts, "type");
+
+if (is_netdev && type && is_help_option(type)) {
  show_netdevs();
  exit(0);
  } else {


Reviewed-by: Thomas Huth 


Applied, and queued for -stable.

Thanks
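
For reference, the shape of the fix (check for NULL before handing the option value to a string comparison) can be shown standalone. is_help_value() and wants_netdev_help() below are illustrative names, not the functions in net/net.c, and the "help" / "?" spellings are an assumption about what counts as a help request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True for the values that ask for help. Like strcmp(), this must not
 * be handed a NULL pointer. */
static bool is_help_value(const char *s)
{
    assert(s != NULL);
    return strcmp(s, "?") == 0 || strcmp(s, "help") == 0;
}

/* A missing "type" option (NULL) is simply "not a help request", so the
 * caller can carry on and report that the parameter is missing. */
static bool wants_netdev_help(bool is_netdev, const char *type)
{
    return is_netdev && type != NULL && is_help_value(type);
}
```

Because && short-circuits, is_help_value() is never reached when type is NULL.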

[Qemu-devel] [PATCH v4 08/14] fpu: introduce hardfloat

2018-06-11 Thread Emilio G. Cota

The appended patch paves the way for leveraging the host FPU for a subset
of guest FP operations. For most guest workloads (e.g. FP flags
aren't ever cleared, inexact occurs often and rounding is set to the
default [to nearest]) this will yield sizable performance speedups.

The approach followed here avoids checking the FP exception flags register.
See the added comment for details.

This assumes that QEMU is running on an IEEE754-compliant FPU and
that the rounding is set to the default (to nearest). The
implementation-dependent specifics of the FPU should not matter; things
like tininess detection and snan representation are still dealt with in
soft-fp. However, this approach will break on most hosts if we compile
QEMU with flags such as -ffast-math. We control the flags so this should
be easy to enforce though.

This patch just adds common code. Some operations will be migrated
to hardfloat in subsequent patches to ease bisection.

Note: some architectures (at least PPC, there might be others) clear
the status flags passed to softfloat before most FP operations. This
precludes the use of hardfloat, so to avoid introducing a performance
regression for those targets, we add a flag to disable hardfloat.
In the long run though it would be good to fix the targets so that
at least the inexact flag passed to softfloat is indeed sticky.

Signed-off-by: Emilio G. Cota 
---
 fpu/softfloat.c | 341 
 1 file changed, 341 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 2ab5a88..4d378d7 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -83,6 +83,7 @@ this code that are retained.
  * target-dependent and needs the TARGET_* macros.
  */
 #include "qemu/osdep.h"
+#include <math.h>
 #include "qemu/bitops.h"
 #include "fpu/softfloat.h"
 
@@ -95,6 +96,346 @@ this code that are retained.
 **/
 #include "fpu/softfloat-macros.h"
 
+/*
+ * Hardfloat
+ *
+ * Fast emulation of guest FP instructions is challenging for two reasons.
+ * First, FP instruction semantics are similar but not identical, particularly
+ * when handling NaNs. Second, emulating at reasonable speed the guest FP
+ * exception flags is not trivial: reading the host's flags register with a
+ * feclearexcept & fetestexcept pair is slow [slightly slower than soft-fp],
+ * and trapping on every FP exception is not fast nor pleasant to work with.
+ *
+ * We address these challenges by leveraging the host FPU for a subset of the
+ * operations. To do this we expand on the idea presented in this paper:
+ *
+ * Guo, Yu-Chuan, et al. "Translating the ARM Neon and VFP instructions in a
+ * binary translator." Software: Practice and Experience 46.12 (2016):1591-1615.
+ *
+ * The idea is thus to leverage the host FPU to (1) compute FP operations
+ * and (2) identify whether FP exceptions occurred while avoiding
+ * expensive exception flag register accesses.
+ *
+ * An important optimization shown in the paper is that given that exception
+ * flags are rarely cleared by the guest, we can avoid recomputing some flags.
+ * This is particularly useful for the inexact flag, which is very frequently
+ * raised in floating-point workloads.
+ *
+ * We optimize the code further by deferring to soft-fp whenever FP exception
+ * detection might get hairy. Two examples: (1) when at least one operand is
+ * denormal/inf/NaN; (2) when operands are not guaranteed to lead to a 0 result
+ * and the result is < the minimum normal.
+ */
+#define GEN_TYPE_CONV(name, to_t, from_t)   \
+static inline to_t name(from_t a)   \
+{   \
+to_t r = *(to_t *)&a;   \
+return r;   \
+}
+
+GEN_TYPE_CONV(float32_to_float, float, float32)
+GEN_TYPE_CONV(float64_to_double, double, float64)
+GEN_TYPE_CONV(float_to_float32, float32, float)
+GEN_TYPE_CONV(double_to_float64, float64, double)
+#undef GEN_TYPE_CONV
+
+#define GEN_INPUT_FLUSH__NOCHECK(name, soft_t)  \
+static inline void name(soft_t *a, float_status *s) \
+{   \
+if (unlikely(soft_t ## _is_denormal(*a))) { \
+*a = soft_t ## _set_sign(soft_t ## _zero,   \
+ soft_t ## _is_neg(*a));\
+s->float_exception_flags |= float_flag_input_denormal;  \
+}   \
+}
+
+GEN_INPUT_FLUSH__NOCHECK(float32_input_flush__nocheck, float32)
+GEN_INPUT_FLUSH__NOCHECK(float64_input_flush__nocheck, float64)
+#undef GEN_INPUT_FLUSH__NOCHECK
+
+#define GEN_INPUT_FLUSH1(name, soft_t)  \
+static inline void name(soft_t *a, float_status *s) \
+{   \
+
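
To illustrate the general idea described in the Hardfloat comment above (use the host FPU for the easy cases and defer to soft-fp whenever flag detection might get hairy), here is a rough standalone sketch for single-precision addition. It is not the code from this series: soft_add() is only a placeholder for the soft-float path, soft_calls exists just to make the fallback observable, and no exception flags are modelled.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

static unsigned int soft_calls;   /* counts fallbacks, for illustration only */

/* Placeholder for the soft-float path; a real one would also compute
 * the guest-visible exception flags. */
static float soft_add(float a, float b)
{
    soft_calls++;
    return a + b;
}

static bool is_zero_or_normal(float x)
{
    int c = fpclassify(x);
    return c == FP_ZERO || c == FP_NORMAL;
}

static float hard_add(float a, float b, bool can_use_fpu)
{
    /* Denormal, infinite or NaN operands go to the soft path. */
    if (can_use_fpu && is_zero_or_normal(a) && is_zero_or_normal(b)) {
        float r = a + b;   /* computed by the host FPU */

        if (isinf(r)) {
            /* Overflow: let the soft path deal with flags and rounding. */
            return soft_add(a, b);
        }
        if (r >= -FLT_MIN && r <= FLT_MIN && !(a == 0.0f && b == 0.0f)) {
            /* Result at or below the minimum normal and not trivially
             * zero: underflow or inexact handling is needed. */
            return soft_add(a, b);
        }
        return r;
    }
    return soft_add(a, b);
}
```

The fast path deliberately never reads the host's exception flag register; it only inspects operands and result.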

[Qemu-devel] [PATCH v4 13/14] hardfloat: support float32/64 square root

2018-06-11 Thread Emilio G. Cota

Performance results for fp-bench:

1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
sqrt-single: 43.27 MFlops
sqrt-double: 24.81 MFlops
- after:
sqrt-single: 297.94 MFlops
sqrt-double: 210.46 MFlops

2. ARM Aarch64 A57 @ 2.4GHz
- before:
sqrt-single: 12.41 MFlops
sqrt-double: 6.22 MFlops
- after:
sqrt-single: 55.58 MFlops
sqrt-double: 40.63 MFlops

3. IBM POWER8E @ 2.1 GHz
- before:
sqrt-single: 17.01 MFlops
sqrt-double: 9.61 MFlops
- after:
sqrt-single: 104.17 MFlops
sqrt-double: 133.32 MFlops

Here none of the machines got faster from enabling USE_FP. For
instance, on x86_64 sqrt is 23% slower for single precision,
with it enabled, and 17% slower for double precision.

Signed-off-by: Emilio G. Cota 
---
 fpu/softfloat.c | 73 +++--
 1 file changed, 71 insertions(+), 2 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 63cf60c..f89e872 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2812,14 +2812,16 @@ float16 __attribute__((flatten)) float16_sqrt(float16 
a, float_status *status)
 return float16_round_pack_canonical(pr, status);
 }
 
-float32 __attribute__((flatten)) float32_sqrt(float32 a, float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_float32_sqrt(float32 a, float_status *status)
 {
 FloatParts pa = float32_unpack_canonical(a, status);
 FloatParts pr = sqrt_float(pa, status, &float32_params);
 return float32_round_pack_canonical(pr, status);
 }
 
-float64 __attribute__((flatten)) float64_sqrt(float64 a, float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_float64_sqrt(float64 a, float_status *status)
 {
 FloatParts pa = float64_unpack_canonical(a, status);
 FloatParts pr = sqrt_float(pa, status, &float64_params);
@@ -2899,6 +2901,73 @@ float64 float64_silence_nan(float64 a, float_status 
*status)
 return float64_pack_raw(p);
 }
 
+#define GEN_SQRT_SF(name, soft_t, host_t, host_sqrt_func)   \
+static soft_t name(soft_t a, float_status *s)   \
+{   \
+if (QEMU_NO_HARDFLOAT) {\
+goto soft;  \
+}   \
+soft_t ## _input_flush1(&a, s);   \
+if (likely(soft_t ## _is_zero_or_normal(a) &&   \
+   !soft_t ## _is_neg(a) && \
+   can_use_fpu(s))) {   \
+host_t ha = soft_t ## _to_ ## host_t(a);\
+host_t hr = host_sqrt_func(ha); \
+\
+return host_t ## _to_ ## soft_t(hr);\
+}   \
+soft:   \
+return soft_ ## soft_t ## _sqrt(a, s);  \
+}
+
+#define GEN_SQRT_FP(name, soft_t, host_t, host_sqrt_func)   \
+static soft_t name(soft_t a, float_status *s)   \
+{   \
+host_t ha;  \
+\
+if (QEMU_NO_HARDFLOAT) {\
+goto soft;  \
+}   \
+soft_t ## _input_flush1(&a, s);   \
+ha = soft_t ## _to_ ## host_t(a);   \
+if (likely((fpclassify(ha) == FP_NORMAL ||  \
+fpclassify(ha) == FP_ZERO) &&   \
+   !signbit(ha) &&  \
+   can_use_fpu(s))) {   \
+host_t hr = host_sqrt_func(ha); \
+\
+return host_t ## _to_ ## soft_t(hr);\
+}   \
+soft:   \
+return soft_ ## soft_t ## _sqrt(a, s);  \
+}
+
+GEN_SQRT_SF(f32_sqrt, float32, float, sqrtf)
+GEN_SQRT_SF(f64_sqrt, float64, double, sqrt)
+#undef GEN_SQRT_SF
+
+GEN_SQRT_FP(float_sqrt, float32, float, sqrtf)
+GEN_SQRT_FP(double_sqrt, float64, double, sqrt)
+#undef GEN_SQRT_FP
+
+float32 __attribute__((flatten)) float32_sqrt(float32 a, float_status *s)
+{
+if

[Qemu-devel] [PATCH v4 05/14] tests/fp: add fp-bench, a collection of simple floating point microbenchmarks

2018-06-11 Thread Emilio G. Cota

This will allow us to measure the performance impact of FP emulation
optimizations. Note that we can measure either the impact directly
on the softfloat functions (with "-t soft"), or the impact on an
emulated workload (call with "-t host" and run under qemu user-mode).

Signed-off-by: Emilio G. Cota 
---
 tests/fp/fp-bench.c | 526 
 tests/fp/.gitignore |   1 +
 tests/fp/Makefile   |   4 +-
 3 files changed, 530 insertions(+), 1 deletion(-)
 create mode 100644 tests/fp/fp-bench.c

diff --git a/tests/fp/fp-bench.c b/tests/fp/fp-bench.c
new file mode 100644
index 000..e4c6885
--- /dev/null
+++ b/tests/fp/fp-bench.c
@@ -0,0 +1,526 @@
+/*
+ * fp-bench.c - A collection of simple floating point microbenchmarks.
+ *
+ * Copyright (C) 2018, Emilio G. Cota 
+ *
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ */
+#ifndef HW_POISON_H
+#error Must define HW_POISON_H to work around TARGET_* poisoning
+#endif
+
+#include "qemu/osdep.h"
+#include 
+#include "qemu/timer.h"
+#include "fpu/softfloat.h"
+
+/* amortize the computation of random inputs */
+#define OPS_PER_ITER 5
+
+#define MAX_OPERANDS 3
+
+#define SEED_A 0xdeadfacedeadface
+#define SEED_B 0xbadc0feebadc0fee
+#define SEED_C 0xbeefdeadbeefdead
+
+enum op {
+OP_ADD,
+OP_SUB,
+OP_MUL,
+OP_DIV,
+OP_FMA,
+OP_SQRT,
+OP_CMP,
+OP_MAX_NR,
+};
+
+static const char * const op_names[] = {
+[OP_ADD] = "add",
+[OP_SUB] = "sub",
+[OP_MUL] = "mul",
+[OP_DIV] = "div",
+[OP_FMA] = "fma",
+[OP_SQRT] = "sqrt",
+[OP_CMP] = "cmp",
+[OP_MAX_NR] = NULL,
+};
+
+enum precision {
+PREC_SINGLE,
+PREC_DOUBLE,
+PREC_FLOAT32,
+PREC_FLOAT64,
+PREC_MAX_NR,
+};
+
+enum tester {
+TESTER_SOFT,
+TESTER_HOST,
+TESTER_MAX_NR,
+};
+
+static const char * const tester_names[] = {
+[TESTER_SOFT] = "soft",
+[TESTER_HOST] = "host",
+[TESTER_MAX_NR] = NULL,
+};
+
+union fp {
+float f;
+double d;
+float32 f32;
+float64 f64;
+uint64_t u64;
+};
+
+struct op_state;
+
+typedef float (*float_func_t)(const struct op_state *s);
+typedef double (*double_func_t)(const struct op_state *s);
+
+union fp_func {
+float_func_t float_func;
+double_func_t double_func;
+};
+
+typedef void (*bench_func_t)(void);
+
+struct op_desc {
+const char * const name;
+};
+
+#define DEFAULT_DURATION_SECS 1
+
+static uint64_t random_ops[MAX_OPERANDS] = {
+SEED_A, SEED_B, SEED_C,
+};
+static float_status soft_status;
+static enum precision precision;
+static enum op operation;
+static enum tester tester;
+static uint64_t n_completed_ops;
+static unsigned int duration = DEFAULT_DURATION_SECS;
+static int64_t ns_elapsed;
+/* disable optimizations with volatile */
+static volatile union fp res;
+
+/*
+ * From: https://en.wikipedia.org/wiki/Xorshift
+ * This is faster than rand_r(), and gives us a wider range (RAND_MAX is only
+ * guaranteed to be >= INT_MAX).
+ */
+static uint64_t xorshift64star(uint64_t x)
+{
+x ^= x >> 12; /* a */
+x ^= x << 25; /* b */
+x ^= x >> 27; /* c */
+return x * UINT64_C(2685821657736338717);
+}
+
+static void update_random_ops(int n_ops, enum precision prec)
+{
+int i;
+
+for (i = 0; i < n_ops; i++) {
+uint64_t r = random_ops[i];
+
+if (prec == PREC_SINGLE || prec == PREC_FLOAT32) {
+do {
+r = xorshift64star(r);
+} while (!float32_is_normal(r));
+} else if (prec == PREC_DOUBLE || prec == PREC_FLOAT64) {
+do {
+r = xorshift64star(r);
+} while (!float64_is_normal(r));
+} else {
+g_assert_not_reached();
+}
+random_ops[i] = r;
+}
+}
+
+static void fill_random(union fp *ops, int n_ops, enum precision prec,
+bool no_neg)
+{
+int i;
+
+for (i = 0; i < n_ops; i++) {
+switch (prec) {
+case PREC_SINGLE:
+case PREC_FLOAT32:
+ops[i].f32 = make_float32(random_ops[i]);
+if (no_neg && float32_is_neg(ops[i].f32)) {
+ops[i].f32 = float32_chs(ops[i].f32);
+}
+/* raise the exponent to limit the frequency of denormal results */
+ops[i].f32 |= 0x4000;
+break;
+case PREC_DOUBLE:
+case PREC_FLOAT64:
+ops[i].f64 = make_float64(random_ops[i]);
+if (no_neg && float64_is_neg(ops[i].f64)) {
+ops[i].f64 = float64_chs(ops[i].f64);
+}
+/* raise the exponent to limit the frequency of denormal results */
+ops[i].f64 |= LIT64(0x4000);
+break;
+default:
+g_assert_not_reached();
+}
+}
+}
+
+/*
+ * The main benchmark function. Instead of (ab)using macros, we rely
+ * on the compiler to unfold this at compile-time.
+ */
+static void bench(enum precision
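
A side note on the precision checks in update_random_ops(): each comparison has to be spelled out in full. A condition of the form "prec == PREC_SINGLE || PREC_FLOAT32" is always true, because the right-hand operand is the nonzero constant PREC_FLOAT32 rather than a comparison. The sketch below repeats the enum so that it compiles on its own; prec_is_32bit() and prec_is_64bit() are hypothetical helpers, not part of fp-bench.

```c
#include <assert.h>
#include <stdbool.h>

enum precision {
    PREC_SINGLE,
    PREC_DOUBLE,
    PREC_FLOAT32,
    PREC_FLOAT64,
    PREC_MAX_NR,
};

/* Both operands of || are full comparisons against prec. */
static bool prec_is_32bit(enum precision prec)
{
    return prec == PREC_SINGLE || prec == PREC_FLOAT32;
}

static bool prec_is_64bit(enum precision prec)
{
    return prec == PREC_DOUBLE || prec == PREC_FLOAT64;
}
```

With helpers like these, the double-precision branch is actually reachable for PREC_DOUBLE and PREC_FLOAT64.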

[Qemu-devel] [PATCH v4 02/14] fp-test: add muladd variants

2018-06-11 Thread Emilio G. Cota

These are a few muladd-related operations that the original IBM syntax
does not specify; model files for these are in muladd.fptest.

Signed-off-by: Emilio G. Cota 
---
 tests/fp/fp-test.c | 24 
 tests/fp/muladd.fptest | 51 ++
 2 files changed, 75 insertions(+)
 create mode 100644 tests/fp/muladd.fptest

diff --git a/tests/fp/fp-test.c b/tests/fp/fp-test.c
index 6be9ce7..bf6d0f3 100644
--- a/tests/fp/fp-test.c
+++ b/tests/fp/fp-test.c
@@ -52,6 +52,9 @@ enum op {
 OP_SUB,
 OP_MUL,
 OP_MULADD,
+OP_MULADD_NEG_ADDEND,
+OP_MULADD_NEG_PRODUCT,
+OP_MULADD_NEG_RESULT,
 OP_DIV,
 OP_SQRT,
 OP_MINNUM,
@@ -68,6 +71,9 @@ static const struct op_desc ops[] = {
 [OP_SUB] =   { "-", 2 },
 [OP_MUL] =   { "*", 2 },
 [OP_MULADD] ={ "*+", 3 },
+[OP_MULADD_NEG_ADDEND] =  { "*+nc", 3 },
+[OP_MULADD_NEG_PRODUCT] = { "*+np", 3 },
+[OP_MULADD_NEG_RESULT] =  { "*+nr", 3 },
 [OP_DIV] =   { "/", 2 },
 [OP_SQRT] =  { "V", 1 },
 [OP_MINNUM] ={ " Q i
+b32*+nc =0 -1.7FP127 -Inf +Inf -> Q i
+b32*+nc =0 -1.6C9AE7P113 -Inf +Inf -> Q i
+b32*+nc =0 -1.00P-126 -Inf +Inf -> Q i
+b32*+nc =0 -0.7FP-126 -Inf +Inf -> Q i
+b32*+nc =0 -0.1B977AP-126 -Inf +Inf -> Q i
+b32*+nc =0 -0.01P-126 -Inf +Inf -> Q i
+b32*+nc =0 -1.00P0 -Inf +Inf -> Q i
+b32*+nc =0 -Zero -Inf +Inf -> Q i
+b32*+nc =0 +Zero -Inf +Inf -> Q i
+b32*+nc =0 -Zero -1.00P-126 +1.7FP127 -> -1.7FP127
+b32*+nc =0 +Zero -1.00P-126 +1.7FP127 -> -1.7FP127
+b32*+nc =0 -1.00P-126 -1.7FP127 -1.4B9156P109 -> +1.4B9156P109 x
+b32*+nc =0 -0.7FP-126 -1.7FP127 -1.51BA59P-113 -> +1.7DP1 x
+b32*+nc =0 -0.3D6B57P-126 -1.7FP127 -1.265398P-67 -> +1.75AD5BP0 x
+b32*+nc =0 -0.01P-126 -1.7FP127 -1.677330P-113 -> +1.7FP-22 x
+
+# np == negate product
+b32*+np =0 +Inf -Inf -Inf -> Q i
+b32*+np =0 +1.7FP127 -Inf -Inf -> Q i
+b32*+np =0 +1.6C9AE7P113 -Inf -Inf -> Q i
+b32*+np =0 +1.00P-126 -Inf -Inf -> Q i
+b32*+np =0 +0.7FP-126 -Inf -Inf -> Q i
+b32*+np =0 +0.1B977AP-126 -Inf -Inf -> Q i
+b32*+np =0 +0.01P-126 -Inf -Inf -> Q i
+b32*+np =0 +1.00P0 -Inf -Inf -> Q i
+b32*+np =0 +Zero -Inf -Inf -> Q i
+b32*+np =0 +Zero -Inf -Inf -> Q i
+b32*+np =0 -Zero -1.00P-126 -1.7FP127 -> -1.7FP127
+b32*+np =0 +Zero -1.00P-126 -1.7FP127 -> -1.7FP127
+b32*+np =0 -1.3A6A89P-18 +1.24E7AEP9 -0.7FP-126 -> +1.7029E9P-9 x
+
+# nr == negate result
+b32*+nr =0 -Inf -Inf -Inf -> Q i
+b32*+nr =0 -1.7FP127 -Inf -Inf -> Q i
+b32*+nr =0 -1.6C9AE7P113 -Inf -Inf -> Q i
+b32*+nr =0 -1.00P-126 -Inf -Inf -> Q i
+b32*+nr =0 -0.7FP-126 -Inf -Inf -> Q i
+b32*+nr =0 -0.1B977AP-126 -Inf -Inf -> Q i
+b32*+nr =0 -0.01P-126 -Inf -Inf -> Q i
+b32*+nr =0 -1.00P0 -Inf -Inf -> Q i
+b32*+nr =0 -Zero -Inf -Inf -> Q i
+b32*+nr =0 -Zero -Inf -Inf -> Q i
+b32*+nr =0 +Zero -1.00P-126 -1.7FP127 -> +1.7FP127
+b32*+nr =0 -Zero -1.00P-126 -1.7FP127 -> +1.7FP127
+b32*+nr =0 -1.00P-126 -1.7FP127 -1.4B9156P109 -> +1.4B9156P109 x
+b32*+nr =0 -0.7FP-126 -1.7FP127 -1.51BA59P-113 -> -1.7DP1 x
+b32*+nr =0 -0.3D6B57P-126 -1.7FP127 -1.265398P-67 -> -1.75AD5BP0 x
+b32*+nr =0 -0.01P-126 -1.7FP127 -1.677330P-113 -> -1.7FP-22 x
+b32*+nr =0 +1.72E53AP-33 -1.7FP127 -1.5AA684P-2 -> +1.72E539P95 x
-- 
2.7.4

[Qemu-devel] [PATCH v4 09/14] hardfloat: support float32/64 addition and subtraction

2018-06-11 Thread Emilio G. Cota

Performance results (single and double precision) for fp-bench:

1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
add-single: 135.07 MFlops
add-double: 131.60 MFlops
sub-single: 130.04 MFlops
sub-double: 133.01 MFlops
- after:
add-single: 443.04 MFlops
add-double: 301.95 MFlops
sub-single: 411.36 MFlops
sub-double: 293.15 MFlops

2. ARM Aarch64 A57 @ 2.4GHz
- before:
add-single: 44.79 MFlops
add-double: 49.20 MFlops
sub-single: 44.55 MFlops
sub-double: 49.06 MFlops
- after:
add-single: 93.28 MFlops
add-double: 88.27 MFlops
sub-single: 91.47 MFlops
sub-double: 88.27 MFlops

3. IBM POWER8E @ 2.1 GHz
- before:
add-single: 72.59 MFlops
add-double: 72.27 MFlops
sub-single: 75.33 MFlops
sub-double: 70.54 MFlops
- after:
add-single: 112.95 MFlops
add-double: 201.11 MFlops
sub-single: 116.80 MFlops
sub-double: 188.72 MFlops

Note that the IBM and ARM machines benefit from having
HARDFLOAT_2F{32,64}_USE_FP set to 0. Otherwise their performance
can suffer significantly:
- IBM Power8:
add-single: [1] 54.94 vs [0] 116.37 MFlops
add-double: [1] 58.92 vs [0] 201.44 MFlops
- Aarch64 A57:
add-single: [1] 80.72 vs [0] 93.24 MFlops
add-double: [1] 82.10 vs [0] 88.18 MFlops

On the Intel machine, having 2F64 set to 1 pays off, but it
doesn't for 2F32:
- Intel i7-6700K:
add-single: [1] 285.79 vs [0] 426.70 MFlops
add-double: [1] 302.15 vs [0] 278.82 MFlops

Signed-off-by: Emilio G. Cota 
---
 fpu/softfloat.c | 106 +++-
 1 file changed, 98 insertions(+), 8 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 4d378d7..cdce6b2 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1077,8 +1077,8 @@ float16  __attribute__((flatten)) float16_add(float16 a, 
float16 b,
 return float16_round_pack_canonical(pr, status);
 }
 
-float32 __attribute__((flatten)) float32_add(float32 a, float32 b,
- float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_float32_add(float32 a, float32 b, float_status *status)
 {
 FloatParts pa = float32_unpack_canonical(a, status);
 FloatParts pb = float32_unpack_canonical(b, status);
@@ -1087,8 +1087,8 @@ float32 __attribute__((flatten)) float32_add(float32 a, 
float32 b,
 return float32_round_pack_canonical(pr, status);
 }
 
-float64 __attribute__((flatten)) float64_add(float64 a, float64 b,
- float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_float64_add(float64 a, float64 b, float_status *status)
 {
 FloatParts pa = float64_unpack_canonical(a, status);
 FloatParts pb = float64_unpack_canonical(b, status);
@@ -1107,8 +1107,8 @@ float16 __attribute__((flatten)) float16_sub(float16 a, 
float16 b,
 return float16_round_pack_canonical(pr, status);
 }
 
-float32 __attribute__((flatten)) float32_sub(float32 a, float32 b,
- float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_float32_sub(float32 a, float32 b, float_status *status)
 {
 FloatParts pa = float32_unpack_canonical(a, status);
 FloatParts pb = float32_unpack_canonical(b, status);
@@ -1117,8 +1117,8 @@ float32 __attribute__((flatten)) float32_sub(float32 a, 
float32 b,
 return float32_round_pack_canonical(pr, status);
 }
 
-float64 __attribute__((flatten)) float64_sub(float64 a, float64 b,
- float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_float64_sub(float64 a, float64 b, float_status *status)
 {
 FloatParts pa = float64_unpack_canonical(a, status);
 FloatParts pb = float64_unpack_canonical(b, status);
@@ -1127,6 +1127,96 @@ float64 __attribute__((flatten)) float64_sub(float64 a, 
float64 b,
 return float64_round_pack_canonical(pr, status);
 }
 
+static float float_add(float a, float b)
+{
+return a + b;
+}
+
+static float float_sub(float a, float b)
+{
+return a - b;
+}
+
+static double double_add(double a, double b)
+{
+return a + b;
+}
+
+static double double_sub(double a, double b)
+{
+return a - b;
+}
+
+static bool f32_addsub_post(float32 a, float32 b, const struct float_status *s)
+{
+return !(float32_is_zero(a) && float32_is_zero(b));
+}
+
+static bool
+float_addsub_post(float a, float b, const struct float_status *s)
+{
+return !(fpclassify(a) == FP_ZERO && fpclassify(b) == FP_ZERO);
+}
+
+static bool f64_addsub_post(float64 a, float64 b, const struct float_status *s)
+{
+return !(float64_is_zero(a) && float64_is_zero(b));
+}
+
+static bool
+double_addsub_post(double a, double b, const struct float_status *s)
+{
+return !(fpclassify(a) == FP_ZERO && fpclassify(b) == FP_ZERO);
+}
+
+static float32 float32_addsub(float32 a, float32 b, float_status *s,
+  float_op2_func_t hard, f32_op2_func_t soft)
+{
+if (QEMU_HARDFLOAT_2F32_USE_FP) {
+return float_gen2(a, b, s, hard, soft, float_is_zon2, 
float_addsub_post,
+

[Qemu-devel] [PATCH v4 12/14] hardfloat: support float32/64 fused multiply-add

2018-06-11 Thread Emilio G. Cota

Performance results for fp-bench:

1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
fma-single: 74.73 MFlops
fma-double: 74.54 MFlops
- after:
fma-single: 203.37 MFlops
fma-double: 169.37 MFlops

2. ARM Aarch64 A57 @ 2.4GHz
- before:
fma-single: 23.24 MFlops
fma-double: 23.70 MFlops
- after:
fma-single: 66.14 MFlops
fma-double: 63.10 MFlops

3. IBM POWER8E @ 2.1 GHz
- before:
fma-single: 37.26 MFlops
fma-double: 37.29 MFlops
- after:
fma-single: 48.90 MFlops
fma-double: 59.51 MFlops

Here having 3FP64 set to 1 pays off for x86_64:
[1] 170.15 vs [0] 153.12 MFlops

Signed-off-by: Emilio G. Cota 
---
 fpu/softfloat.c | 169 ++--
 1 file changed, 165 insertions(+), 4 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index fa6c3b6..63cf60c 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1568,8 +1568,9 @@ float16 __attribute__((flatten)) float16_muladd(float16 
a, float16 b, float16 c,
 return float16_round_pack_canonical(pr, status);
 }
 
-float32 __attribute__((flatten)) float32_muladd(float32 a, float32 b, float32 c,
-int flags, float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_float32_muladd(float32 a, float32 b, float32 c, int flags,
+float_status *status)
 {
 FloatParts pa = float32_unpack_canonical(a, status);
 FloatParts pb = float32_unpack_canonical(b, status);
@@ -1579,8 +1580,9 @@ float32 __attribute__((flatten)) float32_muladd(float32 
a, float32 b, float32 c,
 return float32_round_pack_canonical(pr, status);
 }
 
-float64 __attribute__((flatten)) float64_muladd(float64 a, float64 b, float64 c,
-int flags, float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_float64_muladd(float64 a, float64 b, float64 c, int flags,
+float_status *status)
 {
 FloatParts pa = float64_unpack_canonical(a, status);
 FloatParts pb = float64_unpack_canonical(b, status);
@@ -1591,6 +1593,165 @@ float64 __attribute__((flatten)) float64_muladd(float64 
a, float64 b, float64 c,
 }
 
 /*
+ * FMA generator for softfloat-based condition checks.
+ *
+ * When (a || b) == 0, there's no need to check for under/over flow,
+ * since we know the addend is (normal || 0) and the product is 0.
+ */
+#define GEN_FMA_SF(name, soft_t, host_t, host_fma_f, host_abs_f, min_normal) \
+static soft_t   \
+name(soft_t a, soft_t b, soft_t c, int flags, float_status *s)  \
+{   \
+if (QEMU_NO_HARDFLOAT) {\
+goto soft;  \
+}   \
+soft_t ## _input_flush3(&a, &b, &c, s); \
+if (likely(soft_t ## _is_zero_or_normal(a) &&   \
+   soft_t ## _is_zero_or_normal(b) &&   \
+   soft_t ## _is_zero_or_normal(c) &&   \
+   !(flags & float_muladd_halve_result) &&  \
+   can_use_fpu(s))) {   \
+if (soft_t ## _is_zero(a) || soft_t ## _is_zero(b)) {   \
+soft_t p, r;\
+host_t hp, hc, hr;  \
+bool prod_sign; \
+\
+prod_sign = soft_t ## _is_neg(a) ^ soft_t ## _is_neg(b); \
+prod_sign ^= !!(flags & float_muladd_negate_product);   \
+p = soft_t ## _set_sign(soft_t ## _zero, prod_sign);\
+\
+if (flags & float_muladd_negate_c) {\
+c = soft_t ## _chs(c);  \
+}   \
+\
+hp = soft_t ## _to_ ## host_t(p);   \
+hc = soft_t ## _to_ ## host_t(c);   \
+hr = hp + hc;   \
+r = host_t ## _to_ ## soft_t(hr);   \
+return flags & float_muladd_negate_result ? \
+soft_t ## _chs(r) : r;  \
+} else {\
+host_t ha, hb, hc, hr;  \
+soft_t r;   \
+

Re: [Qemu-devel] [PATCH v1 10/11] pc-dimm: introduce and use pc_dimm_memory_pre_plug()

2018-06-11 Thread David Gibson

On Mon, Jun 11, 2018 at 02:16:54PM +0200, David Hildenbrand wrote:
> We'll be factoring out some pc-dimm specific and some memory-device
> checks next.
> 
> Signed-off-by: David Hildenbrand 

Reviewed-by: David Gibson 

ppc parts
Acked-by: David Gibson 

> ---
>  hw/i386/pc.c | 2 ++
>  hw/mem/pc-dimm.c | 5 +
>  hw/ppc/spapr.c   | 1 +
>  include/hw/mem/pc-dimm.h | 2 ++
>  4 files changed, 10 insertions(+)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 017396fe84..dc8e7b033b 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1695,6 +1695,8 @@ static void pc_dimm_pre_plug(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
>  error_setg(errp, "nvdimm is not enabled: missing 'nvdimm' in '-M'");
>  return;
>  }
> +
> +pc_dimm_memory_pre_plug(dev, MACHINE(hotplug_dev), errp);
>  }
>  
>  static void pc_dimm_plug(HotplugHandler *hotplug_dev,
> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> index bc79dd04d8..995ce22d8d 100644
> --- a/hw/mem/pc-dimm.c
> +++ b/hw/mem/pc-dimm.c
> @@ -27,6 +27,11 @@
>  #include "sysemu/numa.h"
>  #include "trace.h"
>  
> +void pc_dimm_memory_pre_plug(DeviceState *dev, MachineState *machine,
> + Error **errp)
> +{
> +}
> +
>  void pc_dimm_memory_plug(DeviceState *dev, MachineState *machine,
>   uint64_t align, Error **errp)
>  {
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 214286fd2f..54eddc0069 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3202,6 +3202,7 @@ static void spapr_memory_pre_plug(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
>  goto out;
>  }
>  
> +pc_dimm_memory_pre_plug(dev, MACHINE(hotplug_dev), errp);
>  out:
>  g_free(mem_dev);
>  }
> diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
> index f0e6867803..7d46a0a0cb 100644
> --- a/include/hw/mem/pc-dimm.h
> +++ b/include/hw/mem/pc-dimm.h
> @@ -80,6 +80,8 @@ typedef struct PCDIMMDeviceClass {
>  
>  int pc_dimm_get_free_slot(const int *hint, int max_slots, Error **errp);
>  
> +void pc_dimm_memory_pre_plug(DeviceState *dev, MachineState *machine,
> + Error **errp);
>  void pc_dimm_memory_plug(DeviceState *dev, MachineState *machine,
>   uint64_t align, Error **errp);
>  void pc_dimm_memory_unplug(DeviceState *dev, MachineState *machine);

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



[Qemu-devel] [PATCH v4 07/14] softfloat: add float{32, 64}_is_zero_or_normal

2018-06-11 Thread Emilio G. Cota

These will gain some users very soon.

Signed-off-by: Emilio G. Cota 
---
 include/fpu/softfloat.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 1fbece5..08f63ae 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -422,6 +422,11 @@ static inline bool float32_is_denormal(float32 a)
 return float32_is_zero_or_denormal(a) && !float32_is_zero(a);
 }
 
+static inline bool float32_is_zero_or_normal(float32 a)
+{
+return float32_is_normal(a) || float32_is_zero(a);
+}
+
 static inline float32 float32_set_sign(float32 a, int sign)
 {
 return make_float32((float32_val(a) & 0x7fffffff) | (sign << 31));
@@ -561,6 +566,11 @@ static inline bool float64_is_denormal(float64 a)
 return float64_is_zero_or_denormal(a) && !float64_is_zero(a);
 }
 
+static inline bool float64_is_zero_or_normal(float64 a)
+{
+return float64_is_normal(a) || float64_is_zero(a);
+}
+
 static inline float64 float64_set_sign(float64 a, int sign)
 {
 return make_float64((float64_val(a) & 0x7fffffffffffffffULL)
-- 
2.7.4

Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-11 Thread Jason Wang





On 2018-06-12 01:26, Michael S. Tsirkin wrote:

On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote:

This feature bit can be used by hypervisor to indicate virtio_net device to
act as a standby for another device with the same MAC address.

I tested this with a small change to the patch to mark the STANDBY feature 'true'
by default as I am using libvirt to start the VMs.
Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
XML file?

Signed-off-by: Sridhar Samudrala 

So I do not think we can commit to this interface: we
really need to control visibility of the primary device.


The problem is that a legacy guest won't use the primary device at all if we do this.

How about controlling the visibility of the standby device instead?

Thanks


However just for testing purposes, we could add a non-stable
interface "x-standby" with the understanding that as any
x- prefix it's unstable and will be changed down the road,
likely in the next release.



---
  hw/net/virtio-net.c | 2 ++
  include/standard-headers/linux/virtio_net.h | 3 +++
  2 files changed, 5 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 90502fca7c..38b3140670 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
   true),
  DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
  DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
+DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
+  false),
  DEFINE_PROP_END_OF_LIST(),
  };
  
diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h

index e9f255ea3f..01ec09684c 100644
--- a/include/standard-headers/linux/virtio_net.h
+++ b/include/standard-headers/linux/virtio_net.h
@@ -57,6 +57,9 @@
 * Steering */
  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
  
+#define VIRTIO_NET_F_STANDBY  62 /* Act as standby for another device
+ * with the same MAC.
+ */
  #define VIRTIO_NET_F_SPEED_DUPLEX 63  /* Device set linkspeed and duplex */
  
  #ifndef VIRTIO_NET_NO_LEGACY

--
2.14.3

[Qemu-devel] [PATCH v4 04/14] target/tricore: use float32_is_denormal

2018-06-11 Thread Emilio G. Cota

Cc: Bastian Koppelmann 
Reviewed-by: Bastian Koppelmann 
Signed-off-by: Emilio G. Cota 
---
 target/tricore/fpu_helper.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/target/tricore/fpu_helper.c b/target/tricore/fpu_helper.c
index df16290..31df462 100644
--- a/target/tricore/fpu_helper.c
+++ b/target/tricore/fpu_helper.c
@@ -44,11 +44,6 @@ static inline uint8_t f_get_excp_flags(CPUTriCoreState *env)
   | float_flag_inexact);
 }
 
-static inline bool f_is_denormal(float32 arg)
-{
-return float32_is_zero_or_denormal(arg) && !float32_is_zero(arg);
-}
-
 static inline float32 f_maddsub_nan_result(float32 arg1, float32 arg2,
float32 arg3, float32 result,
uint32_t muladd_negate_c)
@@ -260,8 +255,8 @@ uint32_t helper_fcmp(CPUTriCoreState *env, uint32_t r1, 
uint32_t r2)
 set_flush_inputs_to_zero(0, &env->fp_status);
 
 result = 1 << (float32_compare_quiet(arg1, arg2, &env->fp_status) + 1);
-result |= f_is_denormal(arg1) << 4;
-result |= f_is_denormal(arg2) << 5;
+result |= float32_is_denormal(arg1) << 4;
+result |= float32_is_denormal(arg2) << 5;
 
 flags = f_get_excp_flags(env);
 if (flags) {
-- 
2.7.4
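
For readers following along, the way helper_fcmp packs its result word can be
sketched on the host as below. This is an illustration only, not the TriCore
helper: it uses host floats and fpclassify() instead of softfloat types,
compare_quiet() and fcmp_result() are made-up names, and the relation codes
(-1, 0, 1, 2 for less, equal, greater, unordered) are assumed to match what
float32_compare_quiet() returns.

```c
#include <float.h>
#include <math.h>
#include <stdint.h>

/* Relation codes; assumed to mirror softfloat's quiet-compare results. */
enum { REL_LESS = -1, REL_EQUAL = 0, REL_GREATER = 1, REL_UNORDERED = 2 };

/* Hypothetical host-side stand-in for float32_compare_quiet(). */
static int compare_quiet(float a, float b)
{
    if (isunordered(a, b)) {
        return REL_UNORDERED;
    }
    if (a < b) {
        return REL_LESS;
    }
    if (a > b) {
        return REL_GREATER;
    }
    return REL_EQUAL;
}

/*
 * Bits 0..3 are a one-hot encoding of the relation (less, equal, greater,
 * unordered); bits 4 and 5 flag a denormal first and second operand.
 */
static uint32_t fcmp_result(float a, float b)
{
    uint32_t result = 1u << (compare_quiet(a, b) + 1);

    result |= (uint32_t)(fpclassify(a) == FP_SUBNORMAL) << 4;
    result |= (uint32_t)(fpclassify(b) == FP_SUBNORMAL) << 5;
    return result;
}
```

With these definitions, comparing a denormal against 2.0f sets both the
"less" bit and bit 4, which is the case the denormal checks in the helper
exist for.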

[Qemu-devel] [PATCH v4 03/14] softfloat: add float{32, 64}_is_{de, }normal

2018-06-11 Thread Emilio G. Cota

This paves the way for upcoming work.

Cc: Bastian Koppelmann 
Reviewed-by: Bastian Koppelmann 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 include/fpu/softfloat.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 69f4dbc..1fbece5 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -412,6 +412,16 @@ static inline int float32_is_zero_or_denormal(float32 a)
 return (float32_val(a) & 0x7f800000) == 0;
 }
 
+static inline bool float32_is_normal(float32 a)
+{
+return ((float32_val(a) + 0x00800000) & 0x7fffffff) >= 0x01000000;
+}
+
+static inline bool float32_is_denormal(float32 a)
+{
+return float32_is_zero_or_denormal(a) && !float32_is_zero(a);
+}
+
 static inline float32 float32_set_sign(float32 a, int sign)
 {
 return make_float32((float32_val(a) & 0x7fffffff) | (sign << 31));
@@ -541,6 +551,16 @@ static inline int float64_is_zero_or_denormal(float64 a)
 return (float64_val(a) & 0x7ff0000000000000LL) == 0;
 }
 
+static inline bool float64_is_normal(float64 a)
+{
+return ((float64_val(a) + (1ULL << 52)) & -1ULL >> 1) >= 1ULL << 53;
+}
+
+static inline bool float64_is_denormal(float64 a)
+{
+return float64_is_zero_or_denormal(a) && !float64_is_zero(a);
+}
+
 static inline float64 float64_set_sign(float64 a, int sign)
 {
 return make_float64((float64_val(a) & 0x7fffffffffffffffULL)
-- 
2.7.4
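
The float32_is_normal() test added above relies on a small bit trick. A
standalone sketch follows, assuming the full-width binary32 constants
0x00800000, 0x7fffffff and 0x01000000; f32_bits() and bits_is_normal() are
illustrative names, not part of softfloat.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a host float as its raw IEEE 754 binary32 bit pattern. */
static uint32_t f32_bits(float f)
{
    uint32_t u;

    memcpy(&u, &f, sizeof(u));
    return u;
}

/*
 * Normal numbers have a biased exponent in [1, 254]. Adding 1 to the
 * exponent field (0x00800000) and masking off the sign maps that range to
 * [2, 255]. Exponent 0 (zero/denormal) becomes 1 and exponent 255
 * (inf/NaN) wraps to 0, so both fall below 0x01000000 and are rejected by
 * a single unsigned comparison.
 */
static bool bits_is_normal(uint32_t v)
{
    return ((v + 0x00800000u) & 0x7fffffffu) >= 0x01000000u;
}
```

The float64 variant in the patch has the same shape, with the exponent
increment at bit 52 and the threshold at bit 53.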

[Qemu-devel] [PATCH v4 14/14] hardfloat: support float32/64 comparison

2018-06-11 Thread Emilio G. Cota

Performance results for fp-bench:

1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
cmp-single: 113.01 MFlops
cmp-double: 115.54 MFlops
- after:
cmp-single: 527.83 MFlops
cmp-double: 457.21 MFlops

2. ARM Aarch64 A57 @ 2.4GHz
- before:
cmp-single: 39.32 MFlops
cmp-double: 39.80 MFlops
- after:
cmp-single: 162.74 MFlops
cmp-double: 167.08 MFlops

3. IBM POWER8E @ 2.1 GHz
- before:
cmp-single: 60.81 MFlops
cmp-double: 62.76 MFlops
- after:
cmp-single: 235.39 MFlops
cmp-double: 283.44 MFlops

Here using float{32,64}_is_any_nan is faster than using isnan
for all machines. On x86_64 the perf difference is just a few
percentage points, but on aarch64 we go from 117/119 to
164/169 MFlops for single/double precision, respectively.

Aggregate performance improvement for the last few patches:
[ all charts in png: https://imgur.com/a/4yV8p ]

1. Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

[chart: qemu-aarch64 NBench score; higher is better.
 Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz.
 x-axis: FOURIER, NEURAL NET, LU DECOMPOSITION, gmean]

[chart: qemu-aarch64 SPEC06fp (test set) speedup over QEMU 4c2c1015905.
 Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz.
 Error bars: 95% confidence interval.
 x-axis: SPEC06fp benchmarks 410 through 482, plus geomean]

2. Host: ARM Aarch64 A57 @ 2.4GHz

[chart: qemu-aarch64 NBench score; higher is better.
 Host: Applied Micro X-Gene, Aarch64 A57 @ 2.4 GHz.
 x-axis: FOURIER, NEURAL NET, LU DECOMPOSITION, gmean]

Note that by not inlining the soft-fp primitives we end up
with a smaller softfloat.o--in particular, see the difference
for the softfloat.o built for fp-bench:

- before this series:
   text    data     bss     dec     hex filename
 103235       0       0  103235   19343 softfloat.o
- after:
   text    data     bss     dec     hex filename
  93369       0       0   93369   16cb9 softfloat.o

Signed-off-by: Emilio G. Cota 
---
 fpu/softfloat.c | 74 ++---
 1 file changed, 60 insertions(+), 14 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index f89e872..1cf74d1 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2671,28 +2671,74 @@ static int compare_floats(FloatParts a, FloatParts b, 
bool is_quiet,
 }
 }
 
-#define COMPARE(sz) \
-int float ## sz ## _compare(float ## sz a, float ## sz b,   \
-

[Qemu-devel] [PATCH v4 00/14] fp-test + hardfloat

2018-06-11 Thread Emilio G. Cota

Sending this respin (little more than a rebase) in case there's
reviewer bandwidth available until the soft-freeze in 3 weeks.

v3: https://lists.nongnu.org/archive/html/qemu-devel/2018-04/msg00455.html

Not much has changed since April's v3:

- Rebase to current master, fixing some conflicts.
- Add R-b tags
- Fix a typo in the long "hardfloat" comment.
- Move  header includes right after #include "qemu/osdep.h"
  Did this after seeing this message from Eric Blake:
  https://lists.nongnu.org/archive/html/qemu-devel/2018-04/msg00831.html

Note: there are some checkpatch errors, but they are false positives.

Perf numbers for fp-bench are in each commit log; numbers for several
benchmarks are in the last patch's commit log.

You can fetch this series from:
  https://github.com/cota/qemu/tree/hardfloat-v4

Thanks,

Emilio

[Qemu-devel] [PATCH v4 11/14] hardfloat: support float32/64 division

2018-06-11 Thread Emilio G. Cota

Performance results for fp-bench:

1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
div-single: 34.84 MFlops
div-double: 34.04 MFlops
- after:
div-single: 275.23 MFlops
div-double: 216.38 MFlops

2. ARM Aarch64 A57 @ 2.4GHz
- before:
div-single: 9.33 MFlops
div-double: 9.30 MFlops
- after:
div-single: 51.55 MFlops
div-double: 15.09 MFlops

3. IBM POWER8E @ 2.1 GHz
- before:
div-single: 25.65 MFlops
div-double: 24.91 MFlops
- after:
div-single: 96.83 MFlops
div-double: 31.01 MFlops

Here setting 2FP64_USE_FP to 1 pays off for x86_64:
[1] 215.97 vs [0] 62.15 MFlops

Signed-off-by: Emilio G. Cota 
---
 fpu/softfloat.c | 88 +++--
 1 file changed, 86 insertions(+), 2 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 4fcabf6..fa6c3b6 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1659,7 +1659,8 @@ float16 float16_div(float16 a, float16 b, float_status 
*status)
 return float16_round_pack_canonical(pr, status);
 }
 
-float32 float32_div(float32 a, float32 b, float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_float32_div(float32 a, float32 b, float_status *status)
 {
 FloatParts pa = float32_unpack_canonical(a, status);
 FloatParts pb = float32_unpack_canonical(b, status);
@@ -1668,7 +1669,8 @@ float32 float32_div(float32 a, float32 b, float_status 
*status)
 return float32_round_pack_canonical(pr, status);
 }
 
-float64 float64_div(float64 a, float64 b, float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_float64_div(float64 a, float64 b, float_status *status)
 {
 FloatParts pa = float64_unpack_canonical(a, status);
 FloatParts pb = float64_unpack_canonical(b, status);
@@ -1677,6 +1679,88 @@ float64 float64_div(float64 a, float64 b, float_status 
*status)
 return float64_round_pack_canonical(pr, status);
 }
 
+static float float_div(float a, float b)
+{
+return a / b;
+}
+
+static double double_div(double a, double b)
+{
+return a / b;
+}
+
+static bool f32_div_pre(float32 a, float32 b, const struct float_status *s)
+{
+return likely(float32_is_zero_or_normal(a) &&
+  float32_is_normal(b) &&
+  can_use_fpu(s));
+}
+
+static bool f64_div_pre(float64 a, float64 b, const struct float_status *s)
+{
+return likely(float64_is_zero_or_normal(a) &&
+  float64_is_normal(b) &&
+  can_use_fpu(s));
+}
+
+static bool float_div_pre(float a, float b, const struct float_status *s)
+{
+return likely((fpclassify(a) == FP_NORMAL || fpclassify(a) == FP_ZERO) &&
+  fpclassify(b) == FP_NORMAL &&
+  can_use_fpu(s));
+}
+
+static bool double_div_pre(double a, double b, const struct float_status *s)
+{
+return likely((fpclassify(a) == FP_NORMAL || fpclassify(a) == FP_ZERO) &&
+  fpclassify(b) == FP_NORMAL &&
+  can_use_fpu(s));
+}
+
+static bool f32_div_post(float32 a, float32 b, const struct float_status *s)
+{
+return !float32_is_zero(a);
+}
+
+static bool f64_div_post(float64 a, float64 b, const struct float_status *s)
+{
+return !float64_is_zero(a);
+}
+
+static bool float_div_post(float a, float b, const struct float_status *s)
+{
+return fpclassify(a) != FP_ZERO;
+}
+
+static bool double_div_post(double a, double b, const struct float_status *s)
+{
+return fpclassify(a) != FP_ZERO;
+}
+
+float32 __attribute__((flatten))
+float32_div(float32 a, float32 b, float_status *s)
+{
+if (QEMU_HARDFLOAT_2F32_USE_FP) {
+return float_gen2(a, b, s, float_div, soft_float32_div, float_div_pre,
+  float_div_post, NULL, NULL);
+} else {
+return f32_gen2(a, b, s, float_div, soft_float32_div, f32_div_pre,
+f32_div_post, NULL, NULL);
+}
+}
+
+float64 __attribute__((flatten))
+float64_div(float64 a, float64 b, float_status *s)
+{
+if (QEMU_HARDFLOAT_2F64_USE_FP) {
+return double_gen2(a, b, s, double_div, soft_float64_div,
+   double_div_pre, double_div_post, NULL, NULL);
+} else {
+return f64_gen2(a, b, s, double_div, soft_float64_div, f64_div_pre,
+f64_div_post, NULL, NULL);
+}
+}
+
 /*
  * Float to Float conversions
  *
-- 
2.7.4
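
To make the role of the *_pre and *_post callbacks concrete, here is a rough
host-only sketch of the dispatch for division. float_gen2() itself is not
shown in this message, so the control flow below is a guess at its shape
rather than the actual implementation. soft_div_stub(), soft_calls and
hard_div_sketch() are made-up names, the can_use_fpu(s) check is left out,
and the "soft" path simply recomputes on the host so the example stays
self-contained.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical counter recording how often the slow path is taken. */
static int soft_calls;

/* Stand-in for soft_float32_div(); a real fallback would use softfloat. */
static float soft_div_stub(float a, float b)
{
    soft_calls++;
    return a / b;
}

/* Mirrors float_div_pre: dividend zero or normal, divisor normal. */
static bool div_pre(float a, float b)
{
    return (fpclassify(a) == FP_NORMAL || fpclassify(a) == FP_ZERO) &&
           fpclassify(b) == FP_NORMAL;
}

/* Mirrors float_div_post: a tiny result is only suspect if a was nonzero. */
static bool div_post(float a)
{
    return fpclassify(a) != FP_ZERO;
}

static float hard_div_sketch(float a, float b)
{
    float r;

    if (!div_pre(a, b)) {
        return soft_div_stub(a, b);     /* inputs need careful handling */
    }
    r = a / b;
    if (isinf(r)) {
        return soft_div_stub(a, b);     /* overflow: flags must be raised */
    }
    if (fabsf(r) <= FLT_MIN && div_post(a)) {
        return soft_div_stub(a, b);     /* possible underflow */
    }
    return r;                           /* host result can be used as-is */
}
```

The point of the post-check is visible with 0.0f / 3.0f: the result is tiny,
but since the dividend is zero it is exact and the host value is kept.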

Re: [Qemu-devel] [PATCH v1 05/11] spapr: move memory hotplug size check into plug code

2018-06-11 Thread David Gibson

On Mon, Jun 11, 2018 at 02:16:49PM +0200, David Hildenbrand wrote:
> This might look like a step backwards, but it is not. get_memory_region()
> should not be called on uninitialized devices. In general, only
> properties should be accessed, but no "derived" state like the memory
> region.
> 
> 1. We need duplicate error checks if memdev is actually already set.
>    realize() performs these checks, no need to duplicate.
> 2. This is bad practice as one can see when looking at the NVDIMM
>    implementation. The call does not return sane data before the device
>    is realized. Although spapr does not use NVDIMM, conceptually it is
>    wrong.
> 
> So let's just move this call to the right place. We can then clean up
> get_memory_region().
> 
> Signed-off-by: David Hildenbrand 

Acked-by: David Gibson 

> ---
>  hw/ppc/spapr.c | 21 ++---
>  1 file changed, 6 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index f5daac..a5f1bbd58a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3153,6 +3153,12 @@ static void spapr_memory_plug(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
>  align = memory_region_get_alignment(mr);
>  size = memory_region_size(mr);
>  
> +if (size % SPAPR_MEMORY_BLOCK_SIZE) {
> +error_setg(&local_err, "Hotplugged memory size must be a multiple of "
> +   "%lld MB", SPAPR_MEMORY_BLOCK_SIZE / M_BYTE);
> +goto out;
> +}
> +
>  pc_dimm_memory_plug(dev, MACHINE(ms), align, &local_err);
>  if (local_err) {
>  goto out;
> @@ -3186,9 +3192,6 @@ static void spapr_memory_pre_plug(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
>  {
>  const sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(hotplug_dev);
>  PCDIMMDevice *dimm = PC_DIMM(dev);
> -PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> -MemoryRegion *mr;
> -uint64_t size;
>  char *mem_dev;
>  
>  if (!smc->dr_lmb_enabled) {
> @@ -3196,18 +3199,6 @@ static void spapr_memory_pre_plug(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
>  return;
>  }
>  
> -mr = ddc->get_memory_region(dimm, errp);
> -if (!mr) {
> -return;
> -}
> -size = memory_region_size(mr);
> -
> -if (size % SPAPR_MEMORY_BLOCK_SIZE) {
> -error_setg(errp, "Hotplugged memory size must be a multiple of "
> -  "%lld MB", SPAPR_MEMORY_BLOCK_SIZE / M_BYTE);
> -return;
> -}
> -
>  mem_dev = object_property_get_str(OBJECT(dimm), PC_DIMM_MEMDEV_PROP, NULL);
>  if (mem_dev && !kvmppc_is_mem_backend_page_size_ok(mem_dev)) {
>  error_setg(errp, "Memory backend has bad page size. "

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



[Qemu-devel] [PATCH v4 10/14] hardfloat: support float32/64 multiplication

2018-06-11 Thread Emilio G. Cota

Performance results for fp-bench:

1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
mul-single: 126.91 MFlops
mul-double: 118.28 MFlops
- after:
mul-single: 258.02 MFlops
mul-double: 197.96 MFlops

2. ARM Aarch64 A57 @ 2.4GHz
- before:
mul-single: 37.42 MFlops
mul-double: 38.77 MFlops
- after:
mul-single: 73.41 MFlops
mul-double: 76.93 MFlops

3. IBM POWER8E @ 2.1 GHz
- before:
mul-single: 58.40 MFlops
mul-double: 59.33 MFlops
- after:
mul-single: 60.25 MFlops
mul-double: 94.79 MFlops

Signed-off-by: Emilio G. Cota 
---
 fpu/softfloat.c | 66 +
 1 file changed, 62 insertions(+), 4 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index cdce6b2..4fcabf6 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1276,8 +1276,8 @@ float16 __attribute__((flatten)) float16_mul(float16 a, 
float16 b,
 return float16_round_pack_canonical(pr, status);
 }
 
-float32 __attribute__((flatten)) float32_mul(float32 a, float32 b,
- float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_float32_mul(float32 a, float32 b, float_status *status)
 {
 FloatParts pa = float32_unpack_canonical(a, status);
 FloatParts pb = float32_unpack_canonical(b, status);
@@ -1286,8 +1286,8 @@ float32 __attribute__((flatten)) float32_mul(float32 a, 
float32 b,
 return float32_round_pack_canonical(pr, status);
 }
 
-float64 __attribute__((flatten)) float64_mul(float64 a, float64 b,
- float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_float64_mul(float64 a, float64 b, float_status *status)
 {
 FloatParts pa = float64_unpack_canonical(a, status);
 FloatParts pb = float64_unpack_canonical(b, status);
@@ -1296,6 +1296,64 @@ float64 __attribute__((flatten)) float64_mul(float64 a, 
float64 b,
 return float64_round_pack_canonical(pr, status);
 }
 
+static float float_mul(float a, float b)
+{
+return a * b;
+}
+
+static double double_mul(double a, double b)
+{
+return a * b;
+}
+
+static bool f32_mul_fast(float32 a, float32 b, const struct float_status *s)
+{
+return float32_is_zero(a) || float32_is_zero(b);
+}
+
+static bool f64_mul_fast(float64 a, float64 b, const struct float_status *s)
+{
+return float64_is_zero(a) || float64_is_zero(b);
+}
+
+static float32 f32_mul_fast_op(float32 a, float32 b, float_status *s)
+{
+bool signbit = float32_is_neg(a) ^ float32_is_neg(b);
+
+return float32_set_sign(float32_zero, signbit);
+}
+
+static float64 f64_mul_fast_op(float64 a, float64 b, float_status *s)
+{
+bool signbit = float64_is_neg(a) ^ float64_is_neg(b);
+
+return float64_set_sign(float64_zero, signbit);
+}
+
+float32 __attribute__((flatten))
+float32_mul(float32 a, float32 b, float_status *s)
+{
+if (QEMU_HARDFLOAT_2F32_USE_FP) {
+return float_gen2(a, b, s, float_mul, soft_float32_mul, float_is_zon2,
+  NULL, f32_mul_fast, f32_mul_fast_op);
+} else {
+return f32_gen2(a, b, s, float_mul, soft_float32_mul, f32_is_zon2, NULL,
+f32_mul_fast, f32_mul_fast_op);
+}
+}
+
+float64 __attribute__((flatten))
+float64_mul(float64 a, float64 b, float_status *s)
+{
+if (QEMU_HARDFLOAT_2F64_USE_FP) {
+return double_gen2(a, b, s, double_mul, soft_float64_mul,
+   double_is_zon2, NULL, f64_mul_fast, f64_mul_fast_op);
+} else {
+return f64_gen2(a, b, s, double_mul, soft_float64_mul, f64_is_zon2,
+NULL, f64_mul_fast, f64_mul_fast_op);
+}
+}
+
 /*
  * Returns the result of multiplying the floating-point values `a' and
  * `b' then adding 'c', with no intermediate rounding step after the
-- 
2.7.4
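
The zero fast path in this patch can be illustrated with host floats. The
sketch below is only an approximation of f32_mul_fast() and
f32_mul_fast_op(): mul_is_fast() and mul_fast_op() are made-up names, and it
assumes the caller has already established (as the is_zon2 pre-check does)
that both operands are zero or normal, so a zero operand really does mean a
zero product.

```c
#include <math.h>
#include <stdbool.h>

/* True when either operand is zero, i.e. the product is an exact zero. */
static bool mul_is_fast(float a, float b)
{
    return a == 0.0f || b == 0.0f;
}

/*
 * Build the result without multiplying: a zero whose sign is the XOR of
 * the operand signs, which is what IEEE 754 multiplication would produce.
 */
static float mul_fast_op(float a, float b)
{
    bool neg = (signbit(a) != 0) ^ (signbit(b) != 0);

    return neg ? -0.0f : 0.0f;
}
```

Skipping the multiply here also means there is nothing to check afterwards,
since an exact zero product cannot overflow or underflow.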

[Qemu-devel] [PATCH v4 06/14] softfloat: rename canonicalize to sf_canonicalize

2018-06-11 Thread Emilio G. Cota

glibc >= 2.25 defines canonicalize in commit eaf5ad0
(Add canonicalize, canonicalizef, canonicalizel., 2016-10-26).

Given that we'll be including <math.h> soon, prepare
for this by prefixing our canonicalize() with sf_ to avoid
clashing with the libc's canonicalize().

Cc: Bastian Koppelmann 
Reported-by: Bastian Koppelmann 
Tested-by: Bastian Koppelmann 
Signed-off-by: Emilio G. Cota 
---
 fpu/softfloat.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 8cd2400..2ab5a88 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -336,8 +336,8 @@ static inline float64 float64_pack_raw(FloatParts p)
 #include "softfloat-specialize.h"
 
 /* Canonicalize EXP and FRAC, setting CLS.  */
-static FloatParts canonicalize(FloatParts part, const FloatFmt *parm,
-   float_status *status)
+static FloatParts sf_canonicalize(FloatParts part, const FloatFmt *parm,
+  float_status *status)
 {
 if (part.exp == parm->exp_max && !parm->arm_althp) {
 if (part.frac == 0) {
@@ -513,7 +513,7 @@ static FloatParts round_canonical(FloatParts p, float_status *s,
 static FloatParts float16a_unpack_canonical(float16 f, float_status *s,
 const FloatFmt *params)
 {
-return canonicalize(float16_unpack_raw(f), params, s);
+return sf_canonicalize(float16_unpack_raw(f), params, s);
 }
 
 static FloatParts float16_unpack_canonical(float16 f, float_status *s)
@@ -534,7 +534,7 @@ static float16 float16_round_pack_canonical(FloatParts p, float_status *s)
 
 static FloatParts float32_unpack_canonical(float32 f, float_status *s)
 {
-return canonicalize(float32_unpack_raw(f), &float32_params, s);
+return sf_canonicalize(float32_unpack_raw(f), &float32_params, s);
 }
 
 static float32 float32_round_pack_canonical(FloatParts p, float_status *s)
@@ -544,7 +544,7 @@ static float32 float32_round_pack_canonical(FloatParts p, float_status *s)
 
 static FloatParts float64_unpack_canonical(float64 f, float_status *s)
 {
-return canonicalize(float64_unpack_raw(f), &float64_params, s);
+return sf_canonicalize(float64_unpack_raw(f), &float64_params, s);
 }
 
 static float64 float64_round_pack_canonical(FloatParts p, float_status *s)
-- 
2.7.4

Re: [Qemu-devel] [PATCH v1 04/11] hostmem: drop error variable from host_memory_backend_get_memory()

2018-06-11 Thread David Gibson

On Mon, Jun 11, 2018 at 02:16:48PM +0200, David Hildenbrand wrote:
> Unused, so let's remove it.
> 
> Signed-off-by: David Hildenbrand 

Reviewed-by: David Gibson 

> ---
>  backends/hostmem.c   | 3 +--
>  hw/mem/nvdimm.c  | 4 ++--
>  hw/mem/pc-dimm.c | 4 ++--
>  hw/misc/ivshmem.c| 3 +--
>  include/sysemu/hostmem.h | 3 +--
>  numa.c   | 3 +--
>  6 files changed, 8 insertions(+), 12 deletions(-)
> 
> diff --git a/backends/hostmem.c b/backends/hostmem.c
> index 3627e61584..4908946cd3 100644
> --- a/backends/hostmem.c
> +++ b/backends/hostmem.c
> @@ -247,8 +247,7 @@ bool host_memory_backend_mr_inited(HostMemoryBackend 
> *backend)
>  return memory_region_size(&backend->mr) != 0;
>  }
>  
> -MemoryRegion *
> -host_memory_backend_get_memory(HostMemoryBackend *backend, Error **errp)
> +MemoryRegion *host_memory_backend_get_memory(HostMemoryBackend *backend)
>  {
>  return host_memory_backend_mr_inited(backend) ? &backend->mr : NULL;
>  }
> diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
> index f974accbdd..df9716231f 100644
> --- a/hw/mem/nvdimm.c
> +++ b/hw/mem/nvdimm.c
> @@ -105,7 +105,7 @@ static MemoryRegion 
> *nvdimm_get_memory_region(PCDIMMDevice *dimm, Error **errp)
>  
>  static void nvdimm_realize(PCDIMMDevice *dimm, Error **errp)
>  {
> -MemoryRegion *mr = host_memory_backend_get_memory(dimm->hostmem, errp);
> +MemoryRegion *mr = host_memory_backend_get_memory(dimm->hostmem);
>  NVDIMMDevice *nvdimm = NVDIMM(dimm);
>  uint64_t align, pmem_size, size = memory_region_size(mr);
>  
> @@ -161,7 +161,7 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, 
> const void *buf,
>  
>  memcpy(nvdimm->label_data + offset, buf, size);
>  
> -mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort);
> +mr = host_memory_backend_get_memory(dimm->hostmem);
>  backend_offset = memory_region_size(mr) - nvdimm->label_size + offset;
>  memory_region_set_dirty(mr, backend_offset, size);
>  }
> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> index 62b34a992e..86fbcf2d0c 100644
> --- a/hw/mem/pc-dimm.c
> +++ b/hw/mem/pc-dimm.c
> @@ -224,12 +224,12 @@ static MemoryRegion 
> *pc_dimm_get_memory_region(PCDIMMDevice *dimm, Error **errp)
>  return NULL;
>  }
>  
> -return host_memory_backend_get_memory(dimm->hostmem, errp);
> +return host_memory_backend_get_memory(dimm->hostmem);
>  }
>  
>  static MemoryRegion *pc_dimm_get_vmstate_memory_region(PCDIMMDevice *dimm)
>  {
> -return host_memory_backend_get_memory(dimm->hostmem, &error_abort);
> +return host_memory_backend_get_memory(dimm->hostmem);
>  }
>  
>  static uint64_t pc_dimm_md_get_addr(const MemoryDeviceState *md)
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 16f03701b7..ee01c5e66b 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -909,8 +909,7 @@ static void ivshmem_common_realize(PCIDevice *dev, Error 
> **errp)
>  if (s->hostmem != NULL) {
>  IVSHMEM_DPRINTF("using hostmem\n");
>  
> -s->ivshmem_bar2 = host_memory_backend_get_memory(s->hostmem,
> - &error_abort);
> +s->ivshmem_bar2 = host_memory_backend_get_memory(s->hostmem);
>  } else {
>  Chardev *chr = qemu_chr_fe_get_driver(&s->server_chr);
>  assert(chr);
> diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
> index 5beb0ef8ab..6e6bd2c1cb 100644
> --- a/include/sysemu/hostmem.h
> +++ b/include/sysemu/hostmem.h
> @@ -62,8 +62,7 @@ struct HostMemoryBackend {
>  };
>  
>  bool host_memory_backend_mr_inited(HostMemoryBackend *backend);
> -MemoryRegion *host_memory_backend_get_memory(HostMemoryBackend *backend,
> - Error **errp);
> +MemoryRegion *host_memory_backend_get_memory(HostMemoryBackend *backend);
>  
>  void host_memory_backend_set_mapped(HostMemoryBackend *backend, bool mapped);
>  bool host_memory_backend_is_mapped(HostMemoryBackend *backend);
> diff --git a/numa.c b/numa.c
> index 33572bfa74..94f758c757 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -523,8 +523,7 @@ void memory_region_allocate_system_memory(MemoryRegion 
> *mr, Object *owner,
>  if (!backend) {
>  continue;
>  }
> -MemoryRegion *seg = host_memory_backend_get_memory(backend,
> -   &error_fatal);
> +MemoryRegion *seg = host_memory_backend_get_memory(backend);
>  
>  if (memory_region_is_mapped(seg)) {
>  char *path = 
> object_get_canonical_path_component(OBJECT(backend));

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v1 08/11] pc-dimm: get_memory_region() will never return a NULL pointer

2018-06-11 Thread David Gibson

On Mon, Jun 11, 2018 at 02:16:52PM +0200, David Hildenbrand wrote:
> This is guaranteed by passing into host_memory_backend_get_memory() a
> value that is not NULL - which is what we always do.
> 
> Signed-off-by: David Hildenbrand 

Reviewed-by: David Gibson 

> ---
>  hw/mem/pc-dimm.c | 15 ++-
>  1 file changed, 2 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> index 7bb6ce509c..9a0da5d441 100644
> --- a/hw/mem/pc-dimm.c
> +++ b/hw/mem/pc-dimm.c
> @@ -157,7 +157,6 @@ static void pc_dimm_get_size(Object *obj, Visitor *v, 
> const char *name,
>   void *opaque, Error **errp)
>  {
>  uint64_t value;
> -MemoryRegion *mr;
>  PCDIMMDevice *dimm = PC_DIMM(obj);
>  PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(obj);
>  
> @@ -167,11 +166,7 @@ static void pc_dimm_get_size(Object *obj, Visitor *v, 
> const char *name,
>  return;
>  }
>  
> -mr = ddc->get_memory_region(dimm);
> -if (!mr) {
> -return;
> -}
> -value = memory_region_size(mr);
> +value = memory_region_size(ddc->get_memory_region(dimm));
>  
>  visit_type_uint64(v, name, &value, errp);
>  }
> @@ -241,14 +236,8 @@ static uint64_t pc_dimm_md_get_region_size(const 
> MemoryDeviceState *md)
>  /* dropping const here is fine as we don't touch the memory region */
>  PCDIMMDevice *dimm = PC_DIMM(md);
>  const PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(md);
> -MemoryRegion *mr;
> -
> -mr = ddc->get_memory_region(dimm);
> -if (!mr) {
> -return 0;
> -}
>  
> -return memory_region_size(mr);
> +return memory_region_size(ddc->get_memory_region(dimm));
>  }
>  
>  static void pc_dimm_md_fill_device_info(const MemoryDeviceState *md,

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



[Qemu-devel] [PATCH v4 01/14] tests: add fp-test, a floating point test suite

2018-06-11 Thread Emilio G. Cota

This will allow us to run correctness tests against our
FP implementation. The test can be run in two modes (called
"testers"): host and soft. With the former we check the results
and FP flags on the host machine against the model.
With the latter we check QEMU's fpu primitives against the
model. Note that in soft mode we are not instantiating any
particular CPU (hence the HW_POISON_H hack to avoid macro poisoning);
for that we need to run the test in host mode under QEMU.

The input files are taken from IBM's FPGen test suite:
https://www.research.ibm.com/haifa/projects/verification/fpgen/

I see no license file in there so I am just downloading them
with wget. We might want to keep a copy on a qemu server though,
in case IBM takes those files down in the future.

The "IBM" syntax of those files (for now the only syntax supported
in fp-test) is documented here:
https://www.research.ibm.com/haifa/projects/verification/fpgen/papers/ieee-test-suite-v2.pdf

Note that the syntax document has some inaccuracies; the appended
parsing code works around some of those.

The exception flag (-e) is important: many of the optimizations
included in the following commits assume that the inexact flag
is set, so "-e x" is necessary in order to test those code paths.

The whitelist flag (-w) points to a file with test cases to be ignored.
I have put some whitelist files online, but we should have them
on a QEMU-related server.

Thus, a typical invocation of fp-test is as follows:

  $ cd qemu/build/tests/fp-test
  $ make -j && \
./fp-test -t soft ibm/*.fptest \
-w whitelist.txt \
-e x

If we want to test after-rounding tininess detection, then we need to
pass "-a -w whitelist-tininess-after.txt" in addition to the above.
(NB. we can pass "-w" as many times as we want.)

The patch immediately after this one fixes a mismatch against the model
in softfloat, but after that is applied the above should finish with a 0
return code, and print something like:
  All tests OK.
  Tests passed: 76572. Not handled: 51237, whitelisted: 2662

The tests pass in "host" mode on x86_64 and aarch64 machines, although
note that on x86_64 you need to pass -w whitelist-tininess-after.txt.

Running in host mode under QEMU reports flag mismatches (e.g. for
x86_64-linux-user), but that isn't too surprising given how little
love the i386 frontend gets. Host mode under aarch64-linux-user
passes OK.

Flush-to-zero and flush-inputs-to-zero modes can be tested with the
-z and -Z flags. Note however that the IBM input files are only
IEEE-compliant, so for now I've tested these modes by diff'ing
the reported errors against the model files. We should look into
generating files for these non-standard modes to make testing
these modes less painful.

Signed-off-by: Emilio G. Cota 
---
 configure  |2 +
 tests/fp/fp-test.c | 1158 
 tests/Makefile.include |3 +
 tests/fp/.gitignore|3 +
 tests/fp/Makefile  |   34 ++
 5 files changed, 1200 insertions(+)
 create mode 100644 tests/fp/fp-test.c
 create mode 100644 tests/fp/.gitignore
 create mode 100644 tests/fp/Makefile

diff --git a/configure b/configure
index 14b1113..49694c2 100755
--- a/configure
+++ b/configure
@@ -7186,12 +7186,14 @@ fi
 
 # build tree in object directory in case the source is not in the current directory
 DIRS="tests tests/tcg tests/tcg/cris tests/tcg/lm32 tests/libqos tests/qapi-schema tests/tcg/xtensa tests/qemu-iotests tests/vm"
+DIRS="$DIRS tests/fp"
 DIRS="$DIRS docs docs/interop fsdev scsi"
 DIRS="$DIRS pc-bios/optionrom pc-bios/spapr-rtas pc-bios/s390-ccw"
 DIRS="$DIRS roms/seabios roms/vgabios"
 FILES="Makefile tests/tcg/Makefile qdict-test-data.txt"
 FILES="$FILES tests/tcg/cris/Makefile tests/tcg/cris/.gdbinit"
 FILES="$FILES tests/tcg/lm32/Makefile tests/tcg/xtensa/Makefile po/Makefile"
+FILES="$FILES tests/fp/Makefile"
 FILES="$FILES pc-bios/optionrom/Makefile pc-bios/keymaps"
 FILES="$FILES pc-bios/spapr-rtas/Makefile"
 FILES="$FILES pc-bios/s390-ccw/Makefile"
diff --git a/tests/fp/fp-test.c b/tests/fp/fp-test.c
new file mode 100644
index 000..6be9ce7
--- /dev/null
+++ b/tests/fp/fp-test.c
@@ -0,0 +1,1158 @@
+/*
+ * fp-test.c - Floating point test suite.
+ *
+ * Copyright (C) 2018, Emilio G. Cota 
+ *
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ */
+#ifndef HW_POISON_H
+#error Must define HW_POISON_H to work around TARGET_* poisoning
+#endif
+
+#include "qemu/osdep.h"
+#include 
+#include 
+#include "fpu/softfloat.h"
+
+enum error {
+ERROR_NONE,
+ERROR_NOT_HANDLED,
+ERROR_WHITELISTED,
+ERROR_COMMENT,
+ERROR_INPUT,
+ERROR_RESULT,
+ERROR_EXCEPTIONS,
+ERROR_MAX,
+};
+
+enum input_fmt {
+INPUT_FMT_IBM,
+};
+
+struct input {
+const char * const name;
+enum error (*test_line)(const char *line);
+};
+
+enum precision {
+PREC_FLOAT,
+PREC_DOUBLE,
+PREC_QUAD,
+

Re: [Qemu-devel] [PATCH v1 07/11] pc-dimm: get_memory_region() can never fail

2018-06-11 Thread David Gibson

On Mon, Jun 11, 2018 at 02:16:51PM +0200, David Hildenbrand wrote:
> We already verify when realizing that the memdev property has been
> set. We have no more accesses to get_memory_region() before the device
> is realized.
> 
> So this function will never fail. Remove the stale check and the
> error variable. Add a comment to the functions stating that they should
> never be called on uninitialized devices.
> 
> Signed-off-by: David Hildenbrand 

Reviewed-by: David Gibson 

and ppc parts

Acked-by: David Gibson 

> ---
>  hw/i386/pc.c |  7 +--
>  hw/mem/nvdimm.c  |  2 +-
>  hw/mem/pc-dimm.c | 21 ++---
>  hw/ppc/spapr.c   | 14 +++---
>  include/hw/mem/pc-dimm.h |  4 +++-
>  5 files changed, 14 insertions(+), 34 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 85c040482e..017396fe84 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1706,15 +1706,10 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
>  PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>  PCDIMMDevice *dimm = PC_DIMM(dev);
>  PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> -MemoryRegion *mr;
> +MemoryRegion *mr = ddc->get_memory_region(dimm);
>  uint64_t align = TARGET_PAGE_SIZE;
>  bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>  
> -mr = ddc->get_memory_region(dimm, &local_err);
> -if (local_err) {
> -goto out;
> -}
> -
>  if (memory_region_get_alignment(mr) && pcmc->enforce_aligned_dimm) {
>  align = memory_region_get_alignment(mr);
>  }
> diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
> index df9716231f..b2dc2bbb50 100644
> --- a/hw/mem/nvdimm.c
> +++ b/hw/mem/nvdimm.c
> @@ -96,7 +96,7 @@ static void nvdimm_init(Object *obj)
>   nvdimm_get_unarmed, nvdimm_set_unarmed, NULL);
>  }
>  
> -static MemoryRegion *nvdimm_get_memory_region(PCDIMMDevice *dimm, Error 
> **errp)
> +static MemoryRegion *nvdimm_get_memory_region(PCDIMMDevice *dimm)
>  {
>  NVDIMMDevice *nvdimm = NVDIMM(dimm);
>  
> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> index 5294734529..7bb6ce509c 100644
> --- a/hw/mem/pc-dimm.c
> +++ b/hw/mem/pc-dimm.c
> @@ -35,14 +35,9 @@ void pc_dimm_memory_plug(DeviceState *dev, MachineState 
> *machine,
>  PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
>  MemoryRegion *vmstate_mr = ddc->get_vmstate_memory_region(dimm);
>  Error *local_err = NULL;
> -MemoryRegion *mr;
> +MemoryRegion *mr = ddc->get_memory_region(dimm);
>  uint64_t addr;
>  
> -mr = ddc->get_memory_region(dimm, &local_err);
> -if (local_err) {
> -goto out;
> -}
> -
>  addr = object_property_get_uint(OBJECT(dimm),
>  PC_DIMM_ADDR_PROP, &local_err);
>  if (local_err) {
> @@ -89,7 +84,7 @@ void pc_dimm_memory_unplug(DeviceState *dev, MachineState 
> *machine)
>  PCDIMMDevice *dimm = PC_DIMM(dev);
>  PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
>  MemoryRegion *vmstate_mr = ddc->get_vmstate_memory_region(dimm);
> -MemoryRegion *mr = ddc->get_memory_region(dimm, &error_abort);
> +MemoryRegion *mr = ddc->get_memory_region(dimm);
>  
>  memory_device_unplug_region(machine, mr);
>  vmstate_unregister_ram(vmstate_mr, dev);
> @@ -172,7 +167,7 @@ static void pc_dimm_get_size(Object *obj, Visitor *v, 
> const char *name,
>  return;
>  }
>  
> -mr = ddc->get_memory_region(dimm, errp);
> +mr = ddc->get_memory_region(dimm);
>  if (!mr) {
>  return;
>  }
> @@ -223,13 +218,9 @@ static void pc_dimm_unrealize(DeviceState *dev, Error 
> **errp)
>  host_memory_backend_set_mapped(dimm->hostmem, false);
>  }
>  
> -static MemoryRegion *pc_dimm_get_memory_region(PCDIMMDevice *dimm, Error 
> **errp)
> +static MemoryRegion *pc_dimm_get_memory_region(PCDIMMDevice *dimm)
>  {
> -if (!dimm->hostmem) {
> -error_setg(errp, "'" PC_DIMM_MEMDEV_PROP "' property must be set");
> -return NULL;
> -}
> -
> +g_assert(dimm->hostmem);
>  return host_memory_backend_get_memory(dimm->hostmem);
>  }
>  
> @@ -252,7 +243,7 @@ static uint64_t pc_dimm_md_get_region_size(const 
> MemoryDeviceState *md)
>  const PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(md);
>  MemoryRegion *mr;
>  
> -mr = ddc->get_memory_region(dimm, &error_abort);
> +mr = ddc->get_memory_region(dimm);
>  if (!mr) {
>  return 0;
>  }
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index a5f1bbd58a..214286fd2f 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3142,14 +3142,10 @@ static void spapr_memory_plug(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
>  sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
>  PCDIMMDevice *dimm = PC_DIMM(dev);
>  PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> -MemoryRegion *mr;
> +MemoryRegion *mr = ddc->get_memory_region(dimm);
>  uint64_t align, size, addr;
>

Re: [Qemu-devel] [PATCH v1 09/11] pc-dimm: remove pc_dimm_get_vmstate_memory_region()

2018-06-11 Thread David Gibson

On Mon, Jun 11, 2018 at 02:16:53PM +0200, David Hildenbrand wrote:
> We can reuse pc_dimm_get_memory_region() now, as both functions are
> (besides the assert which is also correct), equal.
> 
> Signed-off-by: David Hildenbrand 

Reviewed-by: David Gibson 

> ---
>  hw/mem/pc-dimm.c | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> index 9a0da5d441..bc79dd04d8 100644
> --- a/hw/mem/pc-dimm.c
> +++ b/hw/mem/pc-dimm.c
> @@ -219,11 +219,6 @@ static MemoryRegion 
> *pc_dimm_get_memory_region(PCDIMMDevice *dimm)
>  return host_memory_backend_get_memory(dimm->hostmem);
>  }
>  
> -static MemoryRegion *pc_dimm_get_vmstate_memory_region(PCDIMMDevice *dimm)
> -{
> -return host_memory_backend_get_memory(dimm->hostmem);
> -}
> -
>  static uint64_t pc_dimm_md_get_addr(const MemoryDeviceState *md)
>  {
>  const PCDIMMDevice *dimm = PC_DIMM(md);
> @@ -282,7 +277,7 @@ static void pc_dimm_class_init(ObjectClass *oc, void 
> *data)
>  dc->desc = "DIMM memory module";
>  
>  ddc->get_memory_region = pc_dimm_get_memory_region;
> -ddc->get_vmstate_memory_region = pc_dimm_get_vmstate_memory_region;
> +ddc->get_vmstate_memory_region = pc_dimm_get_memory_region;
>  
>  mdc->get_addr = pc_dimm_md_get_addr;
>  /* for a dimm plugged_size == region_size */

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH v1 06/11] pc-dimm: don't allow to access "size" before the device was realized

2018-06-11 Thread David Gibson

On Mon, Jun 11, 2018 at 02:16:50PM +0200, David Hildenbrand wrote:
> "size" should not be queried before the device was realized. Let's make
> that explicit.
> 
> Signed-off-by: David Hildenbrand 

Reviewed-by: David Gibson 

> ---
>  hw/mem/pc-dimm.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> index 86fbcf2d0c..5294734529 100644
> --- a/hw/mem/pc-dimm.c
> +++ b/hw/mem/pc-dimm.c
> @@ -166,6 +166,12 @@ static void pc_dimm_get_size(Object *obj, Visitor *v, 
> const char *name,
>  PCDIMMDevice *dimm = PC_DIMM(obj);
>  PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(obj);
>  
> +if (!DEVICE(obj)->realized) {
> +error_setg(errp, "Property \"%s\" not accessible before realized",
> +   name);
> +return;
> +}
> +
>  mr = ddc->get_memory_region(dimm, errp);
>  if (!mr) {
>  return;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] question: a dead loop in qemu when do blockJobAbort and vm suspend coinstantaneously

2018-06-11 Thread Fam Zheng

On Mon, 06/11 11:31, l00284672 wrote:
> I tried your patch; with my modification below, it solves this problem.
> 
> void blk_set_aio_context(BlockBackend *blk, AioContext *new_context)
> {
>     BlockDriverState *bs = blk_bs(blk);
>     ThrottleGroupMember *tgm = >public.throttle_group_member;
> 
>     if (bs) {
>     if (tgm->throttle_state) {
>     bdrv_drained_begin(bs);
>     throttle_group_detach_aio_context(tgm);
>     throttle_group_attach_aio_context(tgm, new_context);
>     bdrv_drained_end(bs);
>     }
>      bdrv_ref(bs);
>    bdrv_set_aio_context(bs, new_context);
>        bdrv_unref(bs);
>     }
> }
> 
> I added bdrv_ref before bdrv_set_aio_context to avoid bs being freed in
> mirror_exit.  Do you agree with my modification?

TBH I don't understand this change. @blk should have a reference to @bs here,
no?  Why is an extra reference making any difference?

Fam

> 
> On 2018/6/11 11:01, l00284672 wrote:
> > 
> > Thanks for your reply.
> > 
> > I tried your patch but it didn't work: qemu crashed.  The qemu crash
> > bt is below:
> > 
> > (gdb) bt
> > #0  bdrv_detach_aio_context (bs=bs@entry=0x55a96b79ca30)
> > #1  0x55a9688249ae in bdrv_set_aio_context
> > (bs=bs@entry=0x55a96b79ca30,
> >     new_context=new_context@entry=0x55a96b766920)
> > #2  0x55a96885f721 in blk_set_aio_context (blk=0x55a96b792820,
> > new_context=0x55a96b766920)
> > #3  0x55a9685ab797 in virtio_blk_data_plane_stop (vdev= > out>)
> > #4  0x55a9687bf705 in virtio_bus_stop_ioeventfd (bus=0x55a96cc42220)
> > #5  0x55a9685d9d94 in virtio_vmstate_change (opaque=0x55a96cc42290,
> > running=,
> >     state=)
> > #6  0x55a96866e1a2 in vm_state_notify (running=running@entry=0,
> > state=state@entry=RUN_STATE_PAUSED)
> > #7  0x55a96857b4c5 in do_vm_stop (state=RUN_STATE_PAUSED)
> > #8  vm_stop (state=state@entry=RUN_STATE_PAUSED)
> > #9  0x55a96867d52b in qmp_stop (errp=errp@entry=0x7fff4e54a0d8)
> > #10 0x55a96867b6ab in qmp_marshal_stop (args=,
> > ret=, errp=0x7fff4e54a
> > #11 0x55a9688c2267 in do_qmp_dispatch (errp=0x7fff4e54a118,
> > request=0x55a96b7b4740)
> > #12 qmp_dispatch (request=request@entry=0x55a96b7ae490)
> > #13 0x55a96857dd42 in handle_qmp_command (parser=,
> > tokens=)
> > #14 0x55a9688c7534 in json_message_process_token
> > (lexer=0x55a96b776a68, input=0x55a96b70cae0, type=
> >     x=36, y=91)
> > #15 0x55a9688e960b in json_lexer_feed_char
> > (lexer=lexer@entry=0x55a96b776a68, ch=125 '}',
> >     flush=flush@entry=false)
> > #16 0x55a9688e96ce in json_lexer_feed (lexer=0x55a96b776a68,
> > buffer=, size= > #17 0x55a9688c75f9 in json_message_parser_feed (parser= > out>, buffer=,
> > #18 0x55a96857c5fb in monitor_qmp_read (opaque=,
> > buf=,  size=)
> > #19 0x55a968667596 in tcp_chr_read (chan=,
> > cond=, opaque=0x55a96b7748
> > #20 0x7f8a9447899a in g_main_context_dispatch () from
> > /lib64/libglib-2.0.so.0
> > #21 0x55a968828c3c in glib_pollfds_poll ()
> > #22 os_host_main_loop_wait (timeout=)
> > #23 main_loop_wait (nonblocking=)
> > #24 0x55a96854351f in main_loop () at vl.c:2095
> > #25 main (argc=, argv=, envp= > out>)
> > 
> > (gdb) p *bs
> > $1 = {total_sectors = 94186141054112, open_flags = 1811887680, read_only
> > = 169, encrypted = 85, valid_k
> >   sg = false, probed = false, copy_on_read = 0, flush_queue = {entries =
> > {sqh_first = 0x0,
> >   sqh_last = 0x55a96b79ca48}}, active_flush_req = false, flushed_gen
> > = 68727, drv = 0x0, opaque = 0
> >   aio_context = 0x55a96b778cd0, aio_notifiers = {lh_first = 0x0},
> > walking_aio_notifiers = false,
> >   filename = "/mnt/sdb/lzg/disk_10G.son", '\000' ,
> >   backing_file = "\000mnt/sdb/lzg/disk_10G.raw", '\000'  > times>,
> >   backing_format = "\000aw", '\000' ,
> > full_open_options = 0x0,
> >   exact_filename = "/mnt/sdb/lzg/disk_10G.son", '\000'  > times>, backing = 0x0, file = 0x0,
> >   before_write_notifiers = {notifiers = {lh_first = 0x0}}, in_flight =
> > 0, serialising_in_flight = 0,
> >   wakeup = false, wr_highest_offset = 35188224, bl = {request_alignment
> > = 0, max_pdiscard = 0,
> >     pdiscard_alignment = 0, max_pwrite_zeroes = 0,
> > pwrite_zeroes_alignment = 0, opt_transfer = 0, max_t
> >     min_mem_alignment = 0, opt_mem_alignment = 0, max_iov = 0},
> > supported_write_flags = 0,
> >   supported_zero_flags = 4, node_name = "#block349", '\000'  > times>, node_list = {
> >     tqe_next = 0x55a96b7b14f0, tqe_prev = 0x0}, bs_list = {tqe_next =
> > 0x55a96b7ab240, tqe_prev = 0x0},
> >   monitor_list = {tqe_next = 0x0, tqe_prev = 0x0}, dirty_bitmaps =
> > {lh_first = 0x0}, refcnt = 0,
> >   tracked_requests = {lh_first = 0x0}, op_blockers = {{lh_first = 0x0}
> > }, job = 0x0,
> >   inherits_from = 0x0, children = {lh_first = 0x0}, parents = {lh_first
> > = 0x0}, options = 0x0,
> >   explicit_options = 0x0, detect_zeroes =
> > BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF, backing_blocker = 0x0,
> >

[Qemu-devel] [PATCH v2] util/async: avoid NULL pointer dereference

2018-06-11 Thread Jie Wang

If laio_init() fails to create linux_aio and returns NULL, a NULL
pointer dereference occurs when laio_attach_aio_context() dereferences
linux_aio in aio_get_linux_aio(). Let's avoid that and report an error.
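
A minimal sketch of the guarded lazy-initialization pattern described
above. The types and helpers here (DemoAioState, DemoContext, demo_*)
are hypothetical stand-ins, not QEMU's LinuxAioState, AioContext or
laio_* functions; the sketch only shows the shape of the check: attach
only if init succeeded, and let the caller turn a NULL result into an
error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for LinuxAioState and AioContext. */
typedef struct DemoAioState {
    int attached;
} DemoAioState;

typedef struct DemoContext {
    DemoAioState *aio;
    int fail_init;   /* demo knob: force the initializer to fail */
} DemoContext;

/* Plays the role of laio_init(): may return NULL on failure. */
static DemoAioState *demo_aio_init(int fail)
{
    if (fail) {
        return NULL;
    }
    return calloc(1, sizeof(DemoAioState));
}

/* Plays the role of laio_attach_aio_context(): dereferences its argument. */
static void demo_aio_attach(DemoAioState *s)
{
    assert(s != NULL);
    s->attached = 1;
}

/* Lazy getter: attach only when initialization succeeded. */
static DemoAioState *demo_get_aio(DemoContext *ctx)
{
    if (!ctx->aio) {
        ctx->aio = demo_aio_init(ctx->fail_init);
        if (ctx->aio) {
            demo_aio_attach(ctx->aio);
        }
    }
    return ctx->aio;
}

/* Caller side: a NULL result becomes an error code instead of a crash. */
static int demo_submit(DemoContext *ctx)
{
    DemoAioState *aio = demo_get_aio(ctx);

    if (!aio) {
        return -EIO;
    }
    return 0;
}
```

With fail_init set, demo_submit() returns -EIO and never reaches the
attach helper; otherwise the state is created and attached once and
reused on later calls.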

Signed-off-by: Jie Wang 
---
 block/file-posix.c | 19 +--
 util/async.c   |  5 -
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 513d371bb1..653017d7a5 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1665,6 +1665,11 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
 #ifdef CONFIG_LINUX_AIO
 } else if (s->use_linux_aio) {
 LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
+if (!aio) {
+s->use_linux_aio = false;
+error_report("Failed to get linux aio");
+return -EIO;
+}
 assert(qiov->size == bytes);
 return laio_co_submit(bs, aio, s->fd, offset, qiov, type);
 #endif
@@ -1695,7 +1700,12 @@ static void raw_aio_plug(BlockDriverState *bs)
 BDRVRawState *s = bs->opaque;
 if (s->use_linux_aio) {
 LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
-laio_io_plug(bs, aio);
+if (aio) {
+laio_io_plug(bs, aio);
+} else {
+s->use_linux_aio = false;
+error_report("Failed to get linux aio");
+}
 }
 #endif
 }
@@ -1706,7 +1716,12 @@ static void raw_aio_unplug(BlockDriverState *bs)
 BDRVRawState *s = bs->opaque;
 if (s->use_linux_aio) {
 LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
-laio_io_unplug(bs, aio);
+if (aio) {
+laio_io_unplug(bs, aio);
+} else {
+s->use_linux_aio = false;
+error_report("Failed to get linux aio");
+}
 }
 #endif
 }
diff --git a/util/async.c b/util/async.c
index 03f62787f2..08d71340f8 100644
--- a/util/async.c
+++ b/util/async.c
@@ -327,8 +327,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
 {
 if (!ctx->linux_aio) {
 ctx->linux_aio = laio_init();
-laio_attach_aio_context(ctx->linux_aio, ctx);
+if (ctx->linux_aio) {
+laio_attach_aio_context(ctx->linux_aio, ctx);
+}
 }
+
 return ctx->linux_aio;
 }
 #endif
-- 
2.15.0.windows.1

Re: [Qemu-devel] [PATCH] kvm: Delete the slot only when KVM_MEM_READONLY flag is changed

2018-06-11 Thread Shannon Zhao

Ping?

On 2018/5/16 17:18, Shannon Zhao wrote:
> According to KVM commit 75d61fbc, the slot needs to be deleted before
> changing the KVM_MEM_READONLY flag. But QEMU commit 235e8982 only checks
> whether the KVM_MEM_READONLY flag is set, not whether it is changing.
> There is no need to delete the slot if the KVM_MEM_READONLY flag is
> unchanged.
> 
> This fixes an issue seen when migrating a VM during the OVMF startup
> stage while the VM is executing code in ROM. Between deleting and
> re-adding the slot in kvm_set_user_memory_region, there is a chance that
> the guest accesses the ROM and traps to KVM, and KVM then can't find the
> corresponding memslot. KVM (on ARM) injects an abort into the guest due
> to the broken hva, and the guest gets stuck.
> 
> Signed-off-by: Shannon Zhao 
> ---
>  include/sysemu/kvm_int.h | 1 +
>  kvm-all.c| 6 +++---
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
> index 888557a..f838412 100644
> --- a/include/sysemu/kvm_int.h
> +++ b/include/sysemu/kvm_int.h
> @@ -20,6 +20,7 @@ typedef struct KVMSlot
>  void *ram;
>  int slot;
>  int flags;
> +int old_flags;
>  } KVMSlot;
>  
>  typedef struct KVMMemoryListener {
> diff --git a/kvm-all.c b/kvm-all.c
> index 2515a23..de8250e 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -252,7 +252,7 @@ static int kvm_set_user_memory_region(KVMMemoryListener 
> *kml, KVMSlot *slot)
>  mem.userspace_addr = (unsigned long)slot->ram;
>  mem.flags = slot->flags;
>  
> -if (slot->memory_size && mem.flags & KVM_MEM_READONLY) {
> +if (slot->memory_size && (mem.flags ^ slot->old_flags) & 
> KVM_MEM_READONLY) {
>  /* Set the slot size to 0 before setting the slot to the desired
>   * value. This is needed based on KVM commit 75d61fbc. */
>  mem.memory_size = 0;
> @@ -376,11 +376,11 @@ static int kvm_slot_update_flags(KVMMemoryListener 
> *kml, KVMSlot *mem,
>  {
>  int old_flags;
>  
> -old_flags = mem->flags;
> +mem->old_flags = mem->flags;
>  mem->flags = kvm_mem_flags(mr);
>  
>  /* If nothing changed effectively, no need to issue ioctl */
> -if (mem->flags == old_flags) {
> +if (mem->flags == mem->old_flags) {
>  return 0;
>  }
>  
> 

-- 
Shannon
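
For reference, a minimal sketch of the difference between the old and
new conditions in the quoted patch. The flag values and function names
below are made up for illustration and are not KVM's definitions; the
point is only that "flag is set" and "flag differs between old and new"
are different predicates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration; not KVM's definitions. */
#define DEMO_MEM_LOG_DIRTY (1u << 0)
#define DEMO_MEM_READONLY  (1u << 1)

/* Old-style check: true whenever the new flags have READONLY set. */
static bool readonly_is_set(unsigned int new_flags)
{
    return (new_flags & DEMO_MEM_READONLY) != 0;
}

/* New-style check: true only when READONLY differs between old and new. */
static bool readonly_changed(unsigned int old_flags, unsigned int new_flags)
{
    return ((new_flags ^ old_flags) & DEMO_MEM_READONLY) != 0;
}

/* Self-check covering the cases that distinguish the two predicates. */
static void readonly_demo(void)
{
    /* Slot stays read-only while another flag toggles: set, but unchanged. */
    assert(readonly_is_set(DEMO_MEM_READONLY | DEMO_MEM_LOG_DIRTY));
    assert(!readonly_changed(DEMO_MEM_READONLY,
                             DEMO_MEM_READONLY | DEMO_MEM_LOG_DIRTY));

    /* Read-only is turned on or off: a change in both directions. */
    assert(readonly_changed(0, DEMO_MEM_READONLY));
    assert(readonly_changed(DEMO_MEM_READONLY, 0));
}
```

In the first case the old-style predicate would trigger the
delete-and-re-add sequence even though the read-only bit did not move,
which is the window the patch description is concerned about.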

Re: [Qemu-devel] question: a dead loop in qemu when do blockJobAbort and vm suspend coinstantaneously

2018-06-11 Thread l00284672


ping


On 2018/6/11 11:31, l00284672 wrote:


I tried your patch; with my modification below, it solves this problem.

void blk_set_aio_context(BlockBackend *blk, AioContext *new_context)
{
    BlockDriverState *bs = blk_bs(blk);
    ThrottleGroupMember *tgm = >public.throttle_group_member;

    if (bs) {
    if (tgm->throttle_state) {
    bdrv_drained_begin(bs);
    throttle_group_detach_aio_context(tgm);
    throttle_group_attach_aio_context(tgm, new_context);
    bdrv_drained_end(bs);
    }
     bdrv_ref(bs);
   bdrv_set_aio_context(bs, new_context);
       bdrv_unref(bs);
    }
}

I added bdrv_ref before bdrv_set_aio_context to avoid bs being freed in
mirror_exit.  Do you agree with my modification?

On 2018/6/11 11:01, l00284672 wrote:


Thanks for your reply.

I tried your patch but it didn't work: qemu crashed.  The qemu
crash bt is below:


(gdb) bt
#0  bdrv_detach_aio_context (bs=bs@entry=0x55a96b79ca30)
#1  0x55a9688249ae in bdrv_set_aio_context 
(bs=bs@entry=0x55a96b79ca30,

    new_context=new_context@entry=0x55a96b766920)
#2  0x55a96885f721 in blk_set_aio_context (blk=0x55a96b792820, 
new_context=0x55a96b766920)
#3  0x55a9685ab797 in virtio_blk_data_plane_stop (vdev=out>)

#4  0x55a9687bf705 in virtio_bus_stop_ioeventfd (bus=0x55a96cc42220)
#5  0x55a9685d9d94 in virtio_vmstate_change 
(opaque=0x55a96cc42290, running=,

    state=)
#6  0x55a96866e1a2 in vm_state_notify (running=running@entry=0, 
state=state@entry=RUN_STATE_PAUSED)

#7  0x55a96857b4c5 in do_vm_stop (state=RUN_STATE_PAUSED)
#8  vm_stop (state=state@entry=RUN_STATE_PAUSED)
#9  0x55a96867d52b in qmp_stop (errp=errp@entry=0x7fff4e54a0d8)
#10 0x55a96867b6ab in qmp_marshal_stop (args=, 
ret=, errp=0x7fff4e54a
#11 0x55a9688c2267 in do_qmp_dispatch (errp=0x7fff4e54a118, 
request=0x55a96b7b4740)

#12 qmp_dispatch (request=request@entry=0x55a96b7ae490)
#13 0x55a96857dd42 in handle_qmp_command (parser=, 
tokens=)
#14 0x55a9688c7534 in json_message_process_token 
(lexer=0x55a96b776a68, input=0x55a96b70cae0, type=

    x=36, y=91)
#15 0x55a9688e960b in json_lexer_feed_char 
(lexer=lexer@entry=0x55a96b776a68, ch=125 '}',

    flush=flush@entry=false)
#16 0x55a9688e96ce in json_lexer_feed (lexer=0x55a96b776a68, 
buffer=, size=
#17 0x55a9688c75f9 in json_message_parser_feed (parser=<optimized out>, buffer=,
#18 0x55a96857c5fb in monitor_qmp_read (opaque=, 
buf=, size=)
#19 0x55a968667596 in tcp_chr_read (chan=, 
cond=, opaque=0x55a96b7748
#20 0x7f8a9447899a in g_main_context_dispatch () from 
/lib64/libglib-2.0.so.0

#21 0x55a968828c3c in glib_pollfds_poll ()
#22 os_host_main_loop_wait (timeout=)
#23 main_loop_wait (nonblocking=)
#24 0x55a96854351f in main_loop () at vl.c:2095
#25 main (argc=, argv=, envp=<optimized out>)


(gdb) p *bs
$1 = {total_sectors = 94186141054112, open_flags = 1811887680, 
read_only = 169, encrypted = 85, valid_k
  sg = false, probed = false, copy_on_read = 0, flush_queue = 
{entries = {sqh_first = 0x0,
  sqh_last = 0x55a96b79ca48}}, active_flush_req = false, 
flushed_gen = 68727, drv = 0x0, opaque = 0
  aio_context = 0x55a96b778cd0, aio_notifiers = {lh_first = 0x0}, 
walking_aio_notifiers = false,

  filename = "/mnt/sdb/lzg/disk_10G.son", '\000' ,
  backing_file = "\000mnt/sdb/lzg/disk_10G.raw", '\000' times>,
  backing_format = "\000aw", '\000' , 
full_open_options = 0x0,
  exact_filename = "/mnt/sdb/lzg/disk_10G.son", '\000' times>, backing = 0x0, file = 0x0,
  before_write_notifiers = {notifiers = {lh_first = 0x0}}, in_flight 
= 0, serialising_in_flight = 0,
  wakeup = false, wr_highest_offset = 35188224, bl = 
{request_alignment = 0, max_pdiscard = 0,
    pdiscard_alignment = 0, max_pwrite_zeroes = 0, 
pwrite_zeroes_alignment = 0, opt_transfer = 0, max_t
    min_mem_alignment = 0, opt_mem_alignment = 0, max_iov = 0}, 
supported_write_flags = 0,
  supported_zero_flags = 4, node_name = "#block349", '\000' 22 times>, node_list = {
    tqe_next = 0x55a96b7b14f0, tqe_prev = 0x0}, bs_list = {tqe_next = 
0x55a96b7ab240, tqe_prev = 0x0},
  monitor_list = {tqe_next = 0x0, tqe_prev = 0x0}, dirty_bitmaps = 
{lh_first = 0x0}, refcnt = 0,
  tracked_requests = {lh_first = 0x0}, op_blockers = {{lh_first = 
0x0} }, job = 0x0,
  inherits_from = 0x0, children = {lh_first = 0x0}, parents = 
{lh_first = 0x0}, options = 0x0,
  explicit_options = 0x0, detect_zeroes = 
BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF, backing_blocker = 0x0,
  write_threshold_offset = 0, write_threshold_notifier = {notify = 
0x0, node = {le_next = 0x0, le_prev

  io_plugged = 0, quiesce_counter = 0, write_gen = 68727}
(gdb) p *bs->drv
Cannot access memory at address 0x0

From the backtrace we can see that qemu hit a NULL pointer dereference in
bdrv_detach_aio_context().  The code is below:


void bdrv_detach_aio_context(BlockDriverState *bs)
{
    .

    QLIST_FOREACH_SAFE(baf, &bs->aio_notifiers, list, baf_tmp) {
    if (baf->deleted) {

Re: [Qemu-devel] [PATCH v3 00/19] linux-user: Split do_syscall

2018-06-11 Thread no-reply

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180612005145.3375-1-richard.hender...@linaro.org
Subject: [Qemu-devel] [PATCH v3 00/19] linux-user: Split do_syscall

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]   
patchew/20180612005145.3375-1-richard.hender...@linaro.org -> 
patchew/20180612005145.3375-1-richard.hender...@linaro.org
Switched to a new branch 'test'
41150e125a linux-user: Split out some process syscalls
fb226d75f3 linux-user: Split out memory syscalls
a1be98d2de linux-user: Split out ipc syscalls
4ed2df106e linux-user: Split out name_to_handle_at, open_by_handle_at
aa14b5dcfe linux-user: Split out pread64, pwrite64
856ddcb6c5 linux-user: Split out preadv, pwritev, readv, writev
4b6c6ec7c9 linux-user: Split out close, open, openat, read, write
796e07dce5 linux-user: Setup split syscall infrastructure
67e916d5d7 linux-user: Propagate goto fail to return
209fad0642 linux-user: Propagate goto unimplemented to default
df63822df8 linux-user: Propagate goto unimplemented_nowarn to return
6211190a92 linux-user: Propagate goto efault to return
daed129053 linux-user: Relax single exit from "break"
850a3c73a6 linux-user: Split out do_syscall1
1344862476 linux-user: Remove DEBUG
d4c997503b linux-user/sparc64: Add inotify_rm_watch and tee syscalls
57b7c2976f linux-user/microblaze: Fix typo in accept4 syscall
ca160b0d28 linux-user/hppa: Fix typo in mknodat syscall
2fd6a8234f linux-user/alpha: Fix epoll syscalls

=== OUTPUT BEGIN ===
Checking PATCH 1/19: linux-user/alpha: Fix epoll syscalls...
ERROR: code indent should never use tabs
#23: FILE: linux-user/alpha/syscall_nr.h:346:
+#define TARGET_NR_epoll_create^I^I407$

ERROR: code indent should never use tabs
#24: FILE: linux-user/alpha/syscall_nr.h:347:
+#define TARGET_NR_epoll_ctl^I^I408$

ERROR: code indent should never use tabs
#25: FILE: linux-user/alpha/syscall_nr.h:348:
+#define TARGET_NR_epoll_wait^I^I409$

total: 3 errors, 0 warnings, 27 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/19: linux-user/hppa: Fix typo in mknodat syscall...
Checking PATCH 3/19: linux-user/microblaze: Fix typo in accept4 syscall...
ERROR: code indent should never use tabs
#23: FILE: linux-user/microblaze/syscall_nr.h:366:
+#define TARGET_NR_accept4^I^I362 /* new */$

total: 1 errors, 0 warnings, 8 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 4/19: linux-user/sparc64: Add inotify_rm_watch and tee 
syscalls...
WARNING: line over 80 characters
#19: FILE: linux-user/sparc64/syscall_nr.h:157:
+#define TARGET_NR_inotify_rm_watch   156 /* Linux specific 
 */

total: 0 errors, 1 warnings, 16 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 5/19: linux-user: Remove DEBUG...
Checking PATCH 6/19: linux-user: Split out do_syscall1...
Checking PATCH 7/19: linux-user: Relax single exit from "break"...
Checking PATCH 8/19: linux-user: Propagate goto efault to return...
ERROR: do not use assignment in if condition
#257: FILE: linux-user/syscall.c:8465:
+if (!(p = lock_user_string(arg2))) {

ERROR: do not use assignment in if condition
#269: FILE: linux-user/syscall.c:8483:
+if (!(p = lock_user_string(arg1))) {

ERROR: do not use assignment in if condition
#280: FILE: linux-user/syscall.c:8492:
+if (!(p = lock_user_string(arg2))) {

ERROR: do not use assignment in if condition
#328: FILE: linux-user/syscall.c:8631:
+if (!(p = lock_user_string(arg1))) {

ERROR: suspect code indent for conditional statements (11, 15)
#652: FILE: linux-user/syscall.c:9462:
if (!p) {
+   return -TARGET_EFAULT;

ERROR: do not use assignment in if condition
#691: FILE: linux-user/syscall.c:9596:
+if (!(p = lock_user_string(arg1))) {

ERROR: do not use assignment in if condition
#712: FILE: linux-user/syscall.c:9632:
+if (!(p = lock_user_string(arg1))) {

ERROR: do not use assignment in if condition
#767: FILE:
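
For readers unfamiliar with the "do not use assignment in if condition"
error reported above, the following sketch shows the flagged form next to the
equivalent form that assigns first and tests afterwards.  lookup_string() is
a hypothetical helper invented for this example; it is not lock_user_string().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns NULL on failure, like many lookup functions. */
static const char *lookup_string(const char *s)
{
    return (s && *s) ? s : NULL;
}

/* Form flagged by the style check: the assignment is inside the condition. */
static int length_flagged(const char *arg)
{
    const char *p;

    if (!(p = lookup_string(arg))) {
        return -1;
    }
    return (int)strlen(p);
}

/* Equivalent form: assign first, then test the result. */
static int length_preferred(const char *arg)
{
    const char *p = lookup_string(arg);

    if (!p) {
        return -1;
    }
    return (int)strlen(p);
}
```

Both functions behave identically; only the second separates the assignment
from the test.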

[Qemu-devel] [PATCH v3 18/19] linux-user: Split out memory syscalls

2018-06-11 Thread Richard Henderson

This includes mmap, mmap2, munmap, mlock, mlockall, munlock,
munlockall, mprotect, mremap, msync.
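
As a rough illustration of what printing an argument "as a set of flags"
means for the new ARG_MMAPFLAG and ARG_MMAPPROT types, here is a small
standalone sketch.  It is not the add_flags() helper from strace.c; the
DEMO_* values, struct demo_flag and demo_format_flags() are all assumptions
made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag values and names, chosen only for this sketch. */
enum { DEMO_READ = 0x1, DEMO_WRITE = 0x2, DEMO_EXEC = 0x4 };

struct demo_flag {
    unsigned value;
    const char *name;   /* NULL terminates the table */
};

static const struct demo_flag demo_prot_flags[] = {
    { DEMO_READ,  "DEMO_READ" },
    { DEMO_WRITE, "DEMO_WRITE" },
    { DEMO_EXEC,  "DEMO_EXEC" },
    { 0, NULL },
};

/* Append sep and text to buf without overrunning it; returns the new length.
 * Requires size > 0. */
static size_t demo_append(char *buf, size_t size, size_t len,
                          const char *sep, const char *text)
{
    if (len < size) {
        int n = snprintf(buf + len, size - len, "%s%s", sep, text);
        if (n > 0) {
            len += (size_t)n;
        }
    }
    return len < size ? len : size - 1;
}

/* Format flags as "A|B", followed by any unknown bits in hex. */
static void demo_format_flags(char *buf, size_t size,
                              const struct demo_flag *tbl, unsigned flags)
{
    size_t len = 0;
    const char *sep = "";
    char rest[32];

    if (flags == 0) {
        snprintf(buf, size, "0");
        return;
    }
    buf[0] = '\0';
    for (; tbl->name; tbl++) {
        if ((flags & tbl->value) == tbl->value) {
            len = demo_append(buf, size, len, sep, tbl->name);
            flags &= ~tbl->value;
            sep = "|";
        }
    }
    if (flags) {
        /* Bits that no table entry claimed are printed numerically. */
        snprintf(rest, sizeof(rest), "%#x", flags);
        demo_append(buf, size, len, sep, rest);
    }
}
```

Keeping the names in a table means that a new flag type only needs a new
table and one more case in the dispatching switch, which matches the shape
of the strace.c hunk in this patch.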

Signed-off-by: Richard Henderson 
---
 linux-user/syscall.h   |   2 +
 linux-user/strace.c|  55 ++
 linux-user/syscall.c   | 118 
 linux-user/syscall_mem.c   | 190 +
 linux-user/Makefile.objs   |   3 +-
 linux-user/gen_syscall_list.py |  10 ++
 linux-user/strace.list |  33 --
 7 files changed, 212 insertions(+), 199 deletions(-)
 create mode 100644 linux-user/syscall_mem.c

diff --git a/linux-user/syscall.h b/linux-user/syscall.h
index 217267409b..0bbbf6f7b0 100644
--- a/linux-user/syscall.h
+++ b/linux-user/syscall.h
@@ -53,6 +53,8 @@ typedef enum {
 /* These print as sets of flags.  */
 ARG_ATDIRFD,
 ARG_ATFLAG,
+ARG_MMAPFLAG,
+ARG_MMAPPROT,
 ARG_MODEFLAG,
 ARG_OPENFLAG,
 
diff --git a/linux-user/strace.c b/linux-user/strace.c
index 79d2d91636..6ebd261a84 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -801,7 +801,7 @@ UNUSED static struct flags umount2_flags[] = {
 FLAG_END,
 };
 
-UNUSED static struct flags mmap_prot_flags[] = {
+static struct flags const mmap_prot_flags[] = {
 FLAG_GENERIC(PROT_NONE),
 FLAG_GENERIC(PROT_EXEC),
 FLAG_GENERIC(PROT_READ),
@@ -812,7 +812,7 @@ UNUSED static struct flags mmap_prot_flags[] = {
 FLAG_END,
 };
 
-UNUSED static struct flags mmap_flags[] = {
+static struct flags const mmap_flags[] = {
 FLAG_TARGET(MAP_SHARED),
 FLAG_TARGET(MAP_PRIVATE),
 FLAG_TARGET(MAP_ANONYMOUS),
@@ -2364,51 +2364,6 @@ print_utimensat(const struct syscallname *name,
 }
 #endif
 
-#if defined(TARGET_NR_mmap) || defined(TARGET_NR_mmap2)
-static void
-print_mmap(const struct syscallname *name,
-abi_long arg0, abi_long arg1, abi_long arg2,
-abi_long arg3, abi_long arg4, abi_long arg5)
-{
-print_syscall_prologue(name);
-print_pointer(arg0, 0);
-print_raw_param("%d", arg1, 0);
-print_flags(mmap_prot_flags, arg2, 0);
-print_flags(mmap_flags, arg3, 0);
-print_raw_param("%d", arg4, 0);
-print_raw_param("%#x", arg5, 1);
-print_syscall_epilogue(name);
-}
-#define print_mmap2 print_mmap
-#endif
-
-#ifdef TARGET_NR_mprotect
-static void
-print_mprotect(const struct syscallname *name,
-abi_long arg0, abi_long arg1, abi_long arg2,
-abi_long arg3, abi_long arg4, abi_long arg5)
-{
-print_syscall_prologue(name);
-print_pointer(arg0, 0);
-print_raw_param("%d", arg1, 0);
-print_flags(mmap_prot_flags, arg2, 1);
-print_syscall_epilogue(name);
-}
-#endif
-
-#ifdef TARGET_NR_munmap
-static void
-print_munmap(const struct syscallname *name,
-abi_long arg0, abi_long arg1, abi_long arg2,
-abi_long arg3, abi_long arg4, abi_long arg5)
-{
-print_syscall_prologue(name);
-print_pointer(arg0, 0);
-print_raw_param("%d", arg1, 1);
-print_syscall_epilogue(name);
-}
-#endif
-
 #ifdef TARGET_NR_futex
 static void print_futex_op(abi_long tflag, int last)
 {
@@ -2613,6 +2568,12 @@ static void print_syscall_def1(const SyscallDef *def, 
int64_t args[6])
 case ARG_ATFLAG:
 len = add_flags(b, rest, at_file_flags, arg, false);
 break;
+case ARG_MMAPFLAG:
+len = add_flags(b, rest, mmap_flags, arg, false);
+break;
+case ARG_MMAPPROT:
+len = add_flags(b, rest, mmap_prot_flags, arg, false);
+break;
 case ARG_MODEFLAG:
 len = add_flags(b, rest, mode_flags, arg, true);
 break;
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 51d27f2a88..67fbf7f674 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -4899,29 +4899,6 @@ static const StructEntry struct_termios_def = {
 .align = { __alignof__(struct target_termios), __alignof__(struct 
host_termios) },
 };
 
-static bitmask_transtbl mmap_flags_tbl[] = {
-{ TARGET_MAP_SHARED, TARGET_MAP_SHARED, MAP_SHARED, MAP_SHARED },
-{ TARGET_MAP_PRIVATE, TARGET_MAP_PRIVATE, MAP_PRIVATE, MAP_PRIVATE },
-{ TARGET_MAP_FIXED, TARGET_MAP_FIXED, MAP_FIXED, MAP_FIXED },
-{ TARGET_MAP_ANONYMOUS, TARGET_MAP_ANONYMOUS,
-  MAP_ANONYMOUS, MAP_ANONYMOUS },
-{ TARGET_MAP_GROWSDOWN, TARGET_MAP_GROWSDOWN,
-  MAP_GROWSDOWN, MAP_GROWSDOWN },
-{ TARGET_MAP_DENYWRITE, TARGET_MAP_DENYWRITE,
-  MAP_DENYWRITE, MAP_DENYWRITE },
-{ TARGET_MAP_EXECUTABLE, TARGET_MAP_EXECUTABLE,
-  MAP_EXECUTABLE, MAP_EXECUTABLE },
-{ TARGET_MAP_LOCKED, TARGET_MAP_LOCKED, MAP_LOCKED, MAP_LOCKED },
-{ TARGET_MAP_NORESERVE, TARGET_MAP_NORESERVE,
-  MAP_NORESERVE, MAP_NORESERVE },
-{ TARGET_MAP_HUGETLB, TARGET_MAP_HUGETLB, MAP_HUGETLB, MAP_HUGETLB },
-/* MAP_STACK had been ignored by the kernel for quite some time.
-   Recognize it for the target insofar as we do not want to pass
-   it through to the host.  */
-{ TARGET_MAP_STACK, TARGET_MAP_STACK, 0, 0

[Qemu-devel] [PATCH v3 19/19] linux-user: Split out some process syscalls

2018-06-11 Thread Richard Henderson

This includes clone, getgroups, gettid, setfsgid, setfsuid,
setgroups, setsid, setuid, fork, getegid, getegid32, geteuid,
geteuid32, getgid, getgid32, getgroups32, getpgrp, getpid,
getppid, getresgid, getresgid32, getresuid, getresuid32,
getuid, getuid32, getxgid, getxpid, getxuid, setfsgid32,
setfsuid32, setgid32, setgroups32, setregid, setregid32,
setresgid, setresgid32, setresuid, setresuid32, setreuid,
setreuid32, setuid32, vfork.
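
The 16-bit id helpers that this patch moves into syscall.h are easy to check
in isolation.  The sketch below copies the two USE_UID16 conversions from
the diff into a standalone file; only the surrounding scaffolding is new.
It assumes a compiler where converting 65535 to int16_t yields -1, which
holds for GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit id to 16-bit id: values that do not fit map to the overflow id. */
static inline int high2lowuid(int uid)
{
    return uid > 65535 ? 65534 : uid;
}

/* 16-bit id to 32-bit id: keep the special "-1" (no change) value intact. */
static inline int low2highuid(int uid)
{
    return (int16_t)uid == -1 ? -1 : uid;
}
```

The asymmetry is intentional: narrowing clamps to 65534, while widening only
special-cases the 16-bit encoding of -1.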

Signed-off-by: Richard Henderson 
---
 linux-user/syscall.h   | 108 
 linux-user/strace.c|  36 +-
 linux-user/syscall.c   | 799 
 linux-user/syscall_proc.c  | 914 +
 linux-user/Makefile.objs   |   5 +-
 linux-user/gen_syscall_list.py |  43 ++
 linux-user/strace.list | 144 --
 7 files changed, 1072 insertions(+), 977 deletions(-)
 create mode 100644 linux-user/syscall_proc.c

diff --git a/linux-user/syscall.h b/linux-user/syscall.h
index 0bbbf6f7b0..de99ae5a3b 100644
--- a/linux-user/syscall.h
+++ b/linux-user/syscall.h
@@ -53,6 +53,7 @@ typedef enum {
 /* These print as sets of flags.  */
 ARG_ATDIRFD,
 ARG_ATFLAG,
+ARG_CLONEFLAG,
 ARG_MMAPFLAG,
 ARG_MMAPPROT,
 ARG_MODEFLAG,
@@ -221,6 +222,61 @@ static inline uint64_t target_offset64(abi_ulong word0, 
abi_ulong word1)
 #endif
 }
 
+#ifdef USE_UID16
+static inline int high2lowuid(int uid)
+{
+return uid > 65535 ? 65534 : uid;
+}
+
+static inline int high2lowgid(int gid)
+{
+return gid > 65535 ? 65534 : gid;
+}
+
+static inline int low2highuid(int uid)
+{
+return (int16_t)uid == -1 ? -1 : uid;
+}
+
+static inline int low2highgid(int gid)
+{
+return (int16_t)gid == -1 ? -1 : gid;
+}
+static inline int tswapid(int id)
+{
+return tswap16(id);
+}
+
+#define put_user_id(x, gaddr) put_user_u16(x, gaddr)
+#else /* !USE_UID16 */
+static inline int high2lowuid(int uid)
+{
+return uid;
+}
+
+static inline int high2lowgid(int gid)
+{
+return gid;
+}
+
+static inline int low2highuid(int uid)
+{
+return uid;
+}
+
+static inline int low2highgid(int gid)
+{
+return gid;
+}
+
+static inline int tswapid(int id)
+{
+return tswap32(id);
+}
+
+#define put_user_id(x, gaddr) put_user_u32(x, gaddr)
+#endif /* USE_UID16 */
+
 /* Temporary declarations from syscall_foo.c back to main syscall.c.
  * These indicate incomplete conversion.
  */
@@ -277,5 +333,57 @@ static type safe_##name(type1 arg1, type2 arg2, type3 
arg3, type4 arg4, \
 return safe_syscall(__NR_##name, arg1, arg2, arg3, arg4, arg5, arg6); \
 }
 
+/* Declarators for non-interruptable system calls.  */
+
+#undef _syscall0
+#undef _syscall1
+#undef _syscall2
+#undef _syscall3
+#undef _syscall4
+#undef _syscall5
+#undef _syscall6
+
+#define _syscall0(type, name)   \
+static type name(void)  \
+{   \
+return syscall(__NR_##name);\
+}
+
+#define _syscall1(T0, name, T1, A1) \
+static T0 name(T1 A1)   \
+{   \
+return syscall(__NR_##name, A1);\
+}
+
+#define _syscall2(T0, name, T1, A1, T2, A2) \
+static T0 name(T1 A1, T2 A2)\
+{   \
+return syscall(__NR_##name, A1, A2);\
+}
+
+#define _syscall3(T0, name, T1, A1, T2, A2, T3, A3) \
+static T0 name(T1 A1, T2 A2, T3 A3) \
+{   \
+return syscall(__NR_##name, A1, A2, A3);\
+}
+
+#define _syscall4(T0, name, T1, A1, T2, A2, T3, A3, T4, A4) \
+static T0 name(T1 A1, T2 A2, T3 A3, T4 A4)  \
+{   \
+return syscall(__NR_##name, A1, A2, A3, A4);\
+}
+
+#define _syscall5(T0, name, T1, A1, T2, A2, T3, A3, T4, A4, T5, A5) \
+static T0 name(T1 A1, T2 A2, T3 A3, T4 A4, T5 A5)   \
+{   \
+return syscall(__NR_##name, A1, A2, A3, A4, A5);\
+}
+
+#define _syscall6(T0, name, T1, A1, T2, A2, T3, A3, T4, A4, T5, A5, T6, A6) \
+static T0 name(T1 A1, T2 A2, T3 A3, T4 A4, T5 A5, T6 A6)\
+{   \
+return syscall(__NR_##name, A1, A2, A3, A4, A5, A6);\
+}
+
 /* Include declarations of syscall definitions.  */
 #include "syscall_list.h"
diff --git a/linux-user/strace.c b/linux-user/strace.c
index 6ebd261a84..8b0104afce 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -836,7 +836,7 @@ static struct flags const mmap_flags[] = {
 FLAG_END,
 };
 
-UNUSED static struct flags clone_flags[] = {
+static struct flags const clone_flags[] = {
 FLAG_GENERIC(CLONE_VM),

[Qemu-devel] [PATCH v3 13/19] linux-user: Split out close, open, openat, read, write

2018-06-11 Thread Richard Henderson

Signed-off-by: Richard Henderson 
---
 linux-user/syscall.h   |  51 
 linux-user/strace.c|  35 ---
 linux-user/syscall.c   | 402 +--
 linux-user/syscall_file.c  | 423 +
 linux-user/Makefile.objs   |   3 +-
 linux-user/gen_syscall_list.py |   5 +
 linux-user/strace.list |  15 --
 7 files changed, 484 insertions(+), 450 deletions(-)
 create mode 100644 linux-user/syscall_file.c

diff --git a/linux-user/syscall.h b/linux-user/syscall.h
index 7eb078c3e5..e35b0a60f5 100644
--- a/linux-user/syscall.h
+++ b/linux-user/syscall.h
@@ -125,6 +125,57 @@ static inline int is_error(abi_ulong ret)
 return ret >= -4096;
 }
 
+typedef abi_long (*TargetFdDataFunc)(void *, size_t);
+typedef abi_long (*TargetFdAddrFunc)(void *, abi_ulong, socklen_t);
+typedef struct TargetFdTrans {
+TargetFdDataFunc host_to_target_data;
+TargetFdDataFunc target_to_host_data;
+TargetFdAddrFunc target_to_host_addr;
+} TargetFdTrans;
+
+extern TargetFdTrans **target_fd_trans;
+extern unsigned int target_fd_max;
+
+static inline TargetFdDataFunc fd_trans_target_to_host_data(int fd)
+{
+if (fd >= 0 && fd < target_fd_max && target_fd_trans[fd]) {
+return target_fd_trans[fd]->target_to_host_data;
+}
+return NULL;
+}
+
+static inline TargetFdDataFunc fd_trans_host_to_target_data(int fd)
+{
+if (fd >= 0 && fd < target_fd_max && target_fd_trans[fd]) {
+return target_fd_trans[fd]->host_to_target_data;
+}
+return NULL;
+}
+
+static inline TargetFdAddrFunc fd_trans_target_to_host_addr(int fd)
+{
+if (fd >= 0 && fd < target_fd_max && target_fd_trans[fd]) {
+return target_fd_trans[fd]->target_to_host_addr;
+}
+return NULL;
+}
+
+void fd_trans_register(int fd, TargetFdTrans *trans);
+
+static inline void fd_trans_unregister(int fd)
+{
+if (fd >= 0 && fd < target_fd_max) {
+target_fd_trans[fd] = NULL;
+}
+}
+
+/* Temporary declarations from syscall_foo.c back to main syscall.c.
+ * These indicate incomplete conversion.
+ */
+
+int is_proc_myself(const char *filename, const char *entry);
+extern bitmask_transtbl const fcntl_flags_tbl[];
+
 /* Declarators for interruptable system calls.  */
 
 #define safe_syscall0(type, name) \
diff --git a/linux-user/strace.c b/linux-user/strace.c
index 6375feb747..1ae0057365 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -2213,41 +2213,6 @@ print_mq_open(const struct syscallname *name,
 }
 #endif
 
-#ifdef TARGET_NR_open
-static void
-print_open(const struct syscallname *name,
-abi_long arg0, abi_long arg1, abi_long arg2,
-abi_long arg3, abi_long arg4, abi_long arg5)
-{
-int is_creat = (arg1 & TARGET_O_CREAT);
-
-print_syscall_prologue(name);
-print_string(arg0, 0);
-print_open_flags(arg1, (is_creat == 0));
-if (is_creat)
-print_file_mode(arg2, 1);
-print_syscall_epilogue(name);
-}
-#endif
-
-#ifdef TARGET_NR_openat
-static void
-print_openat(const struct syscallname *name,
-abi_long arg0, abi_long arg1, abi_long arg2,
-abi_long arg3, abi_long arg4, abi_long arg5)
-{
-int is_creat = (arg2 & TARGET_O_CREAT);
-
-print_syscall_prologue(name);
-print_at_dirfd(arg0, 0);
-print_string(arg1, 0);
-print_open_flags(arg2, (is_creat == 0));
-if (is_creat)
-print_file_mode(arg3, 1);
-print_syscall_epilogue(name);
-}
-#endif
-
 #ifdef TARGET_NR_mq_unlink
 static void
 print_mq_unlink(const struct syscallname *name,
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index a96bbf9093..c47e73de5f 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -325,42 +325,6 @@ _syscall5(int, kcmp, pid_t, pid1, pid_t, pid2, int, type,
   unsigned long, idx1, unsigned long, idx2)
 #endif
 
-static bitmask_transtbl fcntl_flags_tbl[] = {
-  { TARGET_O_ACCMODE,   TARGET_O_WRONLY,O_ACCMODE,   O_WRONLY,},
-  { TARGET_O_ACCMODE,   TARGET_O_RDWR,  O_ACCMODE,   O_RDWR,  },
-  { TARGET_O_CREAT, TARGET_O_CREAT, O_CREAT, O_CREAT, },
-  { TARGET_O_EXCL,  TARGET_O_EXCL,  O_EXCL,  O_EXCL,  },
-  { TARGET_O_NOCTTY,TARGET_O_NOCTTY,O_NOCTTY,O_NOCTTY,},
-  { TARGET_O_TRUNC, TARGET_O_TRUNC, O_TRUNC, O_TRUNC, },
-  { TARGET_O_APPEND,TARGET_O_APPEND,O_APPEND,O_APPEND,},
-  { TARGET_O_NONBLOCK,  TARGET_O_NONBLOCK,  O_NONBLOCK,  O_NONBLOCK,  },
-  { TARGET_O_SYNC,  TARGET_O_DSYNC, O_SYNC,  O_DSYNC, },
-  { TARGET_O_SYNC,  TARGET_O_SYNC,  O_SYNC,  O_SYNC,  },
-  { TARGET_FASYNC,  TARGET_FASYNC,  FASYNC,  FASYNC,  },
-  { TARGET_O_DIRECTORY, TARGET_O_DIRECTORY, O_DIRECTORY, O_DIRECTORY, },
-  { TARGET_O_NOFOLLOW,  TARGET_O_NOFOLLOW,  O_NOFOLLOW,  O_NOFOLLOW,  },
-#if defined(O_DIRECT)
-  { TARGET_O_DIRECT,TARGET_O_DIRECT,O_DIRECT,O_DIRECT,},
-#endif
-#if defined(O_NOATIME)
-  { TARGET_O_NOATIME,

[Qemu-devel] [PATCH v3 17/19] linux-user: Split out ipc syscalls

2018-06-11 Thread Richard Henderson

Given that these are all intertwined via TARGET_NR_ipc,
we must move them all to the new file all at once.
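
To show why a single multiplexed entry point ties the sub-operations
together, here is a minimal sketch of decoding such a call number.  The
operation numbers 4, 11 and 12 are the Q_SEMTIMEDOP, Q_MSGSND and Q_MSGRCV
values visible in the diff; placing the version in the upper 16 bits follows
the visible part of Q_IPCCALL, while the 0xffff mask for the operation and
all demo_* names are assumptions of this example.

```c
#include <assert.h>
#include <string.h>

/* Operation numbers as they appear in the diff (Q_SEMTIMEDOP etc.). */
enum { DEMO_SEMTIMEDOP = 4, DEMO_MSGSND = 11, DEMO_MSGRCV = 12 };

/* Pack a version and an operation into one call number. */
static unsigned demo_ipc_call(unsigned version, unsigned op)
{
    return (version << 16) | op;
}

/* Split the call number again and pick the sub-operation.
 * Every sub-operation has to be reachable from this one function. */
static const char *demo_ipc_op_name(unsigned call, unsigned *version)
{
    *version = call >> 16;

    switch (call & 0xffff) {
    case DEMO_SEMTIMEDOP:
        return "semtimedop";
    case DEMO_MSGSND:
        return "msgsnd";
    case DEMO_MSGRCV:
        return "msgrcv";
    default:
        return "unknown";
    }
}
```

Because the switch must name every handler, the handlers cannot be moved to
another file one at a time without temporary cross-file declarations.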

Signed-off-by: Richard Henderson 
---
 linux-user/strace.c|   83 ---
 linux-user/syscall.c   |  973 
 linux-user/syscall_ipc.c   | 1095 
 linux-user/Makefile.objs   |4 +-
 linux-user/gen_syscall_list.py |   12 +
 linux-user/strace.list |   42 --
 6 files changed, 1110 insertions(+), 1099 deletions(-)
 create mode 100644 linux-user/syscall_ipc.c

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 4240ea649a..79d2d91636 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -1,8 +1,4 @@
 #include "qemu/osdep.h"
-#include 
-#include 
-#include 
-#include 
 #include 
 #include 
 #include 
@@ -74,54 +70,6 @@ UNUSED static void print_socket_protocol(int domain, int 
type, int protocol);
 /*
  * Utility functions
  */
-static void
-print_ipc_cmd(int cmd)
-{
-#define output_cmd(val) \
-if( cmd == val ) { \
-gemu_log(#val); \
-return; \
-}
-
-cmd &= 0xff;
-
-/* General IPC commands */
-output_cmd( IPC_RMID );
-output_cmd( IPC_SET );
-output_cmd( IPC_STAT );
-output_cmd( IPC_INFO );
-/* msgctl() commands */
-output_cmd( MSG_STAT );
-output_cmd( MSG_INFO );
-/* shmctl() commands */
-output_cmd( SHM_LOCK );
-output_cmd( SHM_UNLOCK );
-output_cmd( SHM_STAT );
-output_cmd( SHM_INFO );
-/* semctl() commands */
-output_cmd( GETPID );
-output_cmd( GETVAL );
-output_cmd( GETALL );
-output_cmd( GETNCNT );
-output_cmd( GETZCNT );
-output_cmd( SETVAL );
-output_cmd( SETALL );
-output_cmd( SEM_STAT );
-output_cmd( SEM_INFO );
-output_cmd( IPC_RMID );
-output_cmd( IPC_RMID );
-output_cmd( IPC_RMID );
-output_cmd( IPC_RMID );
-output_cmd( IPC_RMID );
-output_cmd( IPC_RMID );
-output_cmd( IPC_RMID );
-output_cmd( IPC_RMID );
-output_cmd( IPC_RMID );
-
-/* Some value we don't recognize */
-gemu_log("%d",cmd);
-}
-
 static void
 print_signal(abi_ulong arg, int last)
 {
@@ -620,18 +568,6 @@ print_newselect(const struct syscallname *name,
 }
 #endif
 
-#ifdef TARGET_NR_semctl
-static void
-print_semctl(const struct syscallname *name,
- abi_long arg1, abi_long arg2, abi_long arg3,
- abi_long arg4, abi_long arg5, abi_long arg6)
-{
-gemu_log("%s(" TARGET_ABI_FMT_ld "," TARGET_ABI_FMT_ld ",", name->name, 
arg1, arg2);
-print_ipc_cmd(arg3);
-gemu_log(",0x" TARGET_ABI_FMT_lx ")", arg4);
-}
-#endif
-
 static void
 print_execve(const struct syscallname *name,
  abi_long arg1, abi_long arg2, abi_long arg3,
@@ -664,25 +600,6 @@ print_execve(const struct syscallname *name,
 gemu_log("NULL})");
 }
 
-#ifdef TARGET_NR_ipc
-static void
-print_ipc(const struct syscallname *name,
-  abi_long arg1, abi_long arg2, abi_long arg3,
-  abi_long arg4, abi_long arg5, abi_long arg6)
-{
-switch(arg1) {
-case IPCOP_semctl:
-gemu_log("semctl(" TARGET_ABI_FMT_ld "," TARGET_ABI_FMT_ld ",", arg1, 
arg2);
-print_ipc_cmd(arg3);
-gemu_log(",0x" TARGET_ABI_FMT_lx ")", arg4);
-break;
-default:
-gemu_log("%s(" TARGET_ABI_FMT_ld "," TARGET_ABI_FMT_ld "," 
TARGET_ABI_FMT_ld "," TARGET_ABI_FMT_ld ")",
- name->name, arg1, arg2, arg3, arg4);
-}
-}
-#endif
-
 /*
  * Variants for the return value output function
  */
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8c7dd3049e..51d27f2a88 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -23,8 +23,6 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 #include 
 #include 
 #include 
@@ -41,8 +39,6 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 #include 
 #include 
 #include 
@@ -843,43 +839,6 @@ safe_syscall2(int, nanosleep, const struct timespec *, req,
 safe_syscall4(int, clock_nanosleep, const clockid_t, clock, int, flags,
   const struct timespec *, req, struct timespec *, rem)
 #endif
-#ifdef __NR_msgsnd
-safe_syscall4(int, msgsnd, int, msgid, const void *, msgp, size_t, sz,
-  int, flags)
-safe_syscall5(int, msgrcv, int, msgid, void *, msgp, size_t, sz,
-  long, msgtype, int, flags)
-safe_syscall4(int, semtimedop, int, semid, struct sembuf *, tsops,
-  unsigned, nsops, const struct timespec *, timeout)
-#else
-/* This host kernel architecture uses a single ipc syscall; fake up
- * wrappers for the sub-operations to hide this implementation detail.
- * Annoyingly we can't include linux/ipc.h to get the constant definitions
- * for the call parameter because some structs in there conflict with the
- * sys/ipc.h ones. So we just define them here, and rely on them being
- * the same for all host architectures.
- */
-#define Q_SEMTIMEDOP 4
-#define Q_MSGSND 11
-#define Q_MSGRCV 12
-#define Q_IPCCALL(VERSION, OP) ((VERSION) << 16 |

[Qemu-devel] [PATCH v3 11/19] linux-user: Propagate goto fail to return

2018-06-11 Thread Richard Henderson

Reviewed-by: Laurent Vivier 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 60 
 1 file changed, 22 insertions(+), 38 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 5a17803732..fd8d48b0da 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8865,8 +8865,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 how = SIG_SETMASK;
 break;
 default:
-ret = -TARGET_EINVAL;
-goto fail;
+return -TARGET_EINVAL;
 }
 mask = arg2;
 target_to_host_old_sigset(&set, &mask);
@@ -8893,8 +8892,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 how = SIG_SETMASK;
 break;
 default:
-ret = -TARGET_EINVAL;
-goto fail;
+return -TARGET_EINVAL;
 }
 if (!(p = lock_user(VERIFY_READ, arg2, 
sizeof(target_sigset_t), 1)))
 return -TARGET_EFAULT;
@@ -8937,8 +8935,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 how = SIG_SETMASK;
 break;
 default:
-ret = -TARGET_EINVAL;
-goto fail;
+return -TARGET_EINVAL;
 }
 if (!(p = lock_user(VERIFY_READ, arg2, 
sizeof(target_sigset_t), 1)))
 return -TARGET_EFAULT;
@@ -9227,15 +9224,15 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 
 ret = copy_from_user_fdset_ptr(&rfds, &rfds_ptr, rfd_addr, n);
 if (ret) {
-goto fail;
+return ret;
 }
 ret = copy_from_user_fdset_ptr(&wfds, &wfds_ptr, wfd_addr, n);
 if (ret) {
-goto fail;
+return ret;
 }
 ret = copy_from_user_fdset_ptr(&efds, &efds_ptr, efd_addr, n);
 if (ret) {
-goto fail;
+return ret;
 }
 
 /*
@@ -9268,8 +9265,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 sig.set = &set;
 if (arg_sigsize != sizeof(*target_sigset)) {
 /* Like the kernel, we enforce correct size sigsets */
-ret = -TARGET_EINVAL;
-goto fail;
+return -TARGET_EINVAL;
 }
 target_sigset = lock_user(VERIFY_READ, arg_sigset,
   sizeof(*target_sigset), 1);
@@ -9700,17 +9696,15 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 case TARGET_SYSLOG_ACTION_READ_CLEAR:/* Read/clear msgs */
 case TARGET_SYSLOG_ACTION_READ_ALL:  /* Read last messages */
 {
-ret = -TARGET_EINVAL;
 if (len < 0) {
-goto fail;
+return -TARGET_EINVAL;
 }
 if (len == 0) {
 return 0;
 }
 p = lock_user(VERIFY_WRITE, arg2, arg3, 0);
 if (!p) {
-ret = -TARGET_EFAULT;
-goto fail;
+return -TARGET_EFAULT;
 }
 ret = get_errno(sys_syslog((int)arg1, p, (int)arg3));
 unlock_user(p, arg2, arg3);
@@ -10056,8 +10050,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 
 dirp = g_try_malloc(count);
 if (!dirp) {
-ret = -TARGET_ENOMEM;
-goto fail;
+return -TARGET_ENOMEM;
 }
 
 ret = get_errno(sys_getdents(arg1, dirp, count));
@@ -10456,7 +10449,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 arg2 ? &node : NULL,
NULL));
 if (is_error(ret)) {
-goto fail;
+return ret;
 }
 if (arg1 && put_user_u32(cpu, arg1)) {
 return -TARGET_EFAULT;
@@ -10903,8 +10896,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 grouplist = alloca(gidsetsize * sizeof(gid_t));
 target_grouplist = lock_user(VERIFY_READ, arg2, gidsetsize * 
sizeof(target_id), 1);
 if (!target_grouplist) {
-ret = -TARGET_EFAULT;
-goto fail;
+return -TARGET_EFAULT;
 }
 for (i = 0; i < gidsetsize; i++) {
 grouplist[i] =

[Qemu-devel] [PATCH v3 12/19] linux-user: Setup split syscall infrastructure

2018-06-11 Thread Richard Henderson

Defines a unified structure for implementation and strace.
Supplies a generator script to build the declarations and
the lookup function.
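
A simplified sketch of the idea: one descriptor per syscall, plus a lookup
function of the kind a generator script could emit.  The Demo* types, the
two toy "syscalls" and the -1 error value are assumptions for illustration;
the real SyscallDef in this patch has more fields and different signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef long DemoImplFn(int64_t arg1, int64_t arg2);

/* One descriptor carries both the name (for tracing) and the implementation. */
typedef struct DemoSyscallDef {
    const char *name;
    DemoImplFn *impl;
} DemoSyscallDef;

static long impl_add(int64_t a, int64_t b) { return (long)(a + b); }
static long impl_sub(int64_t a, int64_t b) { return (long)(a - b); }

static const DemoSyscallDef def_add = { .name = "add", .impl = impl_add };
static const DemoSyscallDef def_sub = { .name = "sub", .impl = impl_sub };

/* Hypothetical syscall numbers. */
enum { DEMO_NR_add = 1, DEMO_NR_sub = 2 };

/* The kind of lookup function a generator could produce from a name list. */
static const DemoSyscallDef *demo_syscall_table(int num)
{
    switch (num) {
    case DEMO_NR_add:
        return &def_add;
    case DEMO_NR_sub:
        return &def_sub;
    }
    return NULL;
}

/* Dispatch through the descriptor; unknown numbers fail with -1 here. */
static long demo_do_syscall(int num, int64_t a, int64_t b)
{
    const DemoSyscallDef *def = demo_syscall_table(num);

    if (!def) {
        return -1;
    }
    return def->impl(a, b);
}
```

With this shape, tracing and dispatch read from the same descriptor, so the
two cannot drift apart the way separate tables can.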

Signed-off-by: Richard Henderson 
---
 linux-user/syscall.h   | 178 +++
 linux-user/strace.c| 386 -
 linux-user/syscall.c   | 113 --
 linux-user/Makefile.objs   |  10 +
 linux-user/gen_syscall_list.py |  82 +++
 5 files changed, 595 insertions(+), 174 deletions(-)
 create mode 100644 linux-user/syscall.h
 create mode 100644 linux-user/gen_syscall_list.py

diff --git a/linux-user/syscall.h b/linux-user/syscall.h
new file mode 100644
index 00..7eb078c3e5
--- /dev/null
+++ b/linux-user/syscall.h
@@ -0,0 +1,178 @@
+/*
+ *  Linux syscalls internals
+ *  Copyright (c) 2018 Linaro, Limited.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+typedef struct SyscallDef SyscallDef;
+
+/* This hook extracts max 6 arguments from max 8 input registers.
+ * In the process, register pairs that store 64-bit arguments are merged.
+ * Finally, syscalls are demultiplexed; e.g. the hook for socketcall will
+ * return the SyscallDef for bind, listen, etc.  In the process the hook
+ * may need to read from guest memory, or otherwise validate operands.
+ * On failure, set errno (to a host value) and return NULL;
+ * the (target adjusted) errno will be returned to the guest.
+ */
+typedef const SyscallDef *SyscallArgsFn(const SyscallDef *, int64_t out[6],
+abi_long in[8]);
+
+/* This hook implements the syscall.  */
+typedef abi_long SyscallImplFn(CPUArchState *, int64_t, int64_t, int64_t,
+   int64_t, int64_t, int64_t);
+
+/* This hook prints the arguments to the syscall for strace.  */
+typedef void SyscallPrintFn(const SyscallDef *, int64_t arg[6]);
+
+/* This hook prints the return value from the syscall for strace.  */
+typedef void SyscallPrintRetFn(const SyscallDef *, abi_long);
+
+/* These flags describe the arguments for the generic fallback to
+ * SyscallPrintFn.  ARG_NONE indicates that the argument is not present.
+ */
+typedef enum {
+ARG_NONE = 0,
+
+/* These print as numbers of abi_long.  */
+ARG_DEC,
+ARG_HEX,
+ARG_OCT,
+
+/* These print as sets of flags.  */
+ARG_ATDIRFD,
+ARG_MODEFLAG,
+ARG_OPENFLAG,
+
+/* These are interpreted as pointers.  */
+ARG_PTR,
+ARG_STR,
+ARG_BUF,
+
+/* For a 32-bit host, force printing as a 64-bit operand.  */
+#if TARGET_ABI_BITS == 32
+ARG_DEC64,
+#else
+ARG_DEC64 = ARG_DEC,
+#endif
+} SyscallArgType;
+
+struct SyscallDef {
+const char *name;
+SyscallArgsFn *args;
+SyscallImplFn *impl;
+SyscallPrintFn *print;
+SyscallPrintRetFn *print_ret;
+SyscallArgType arg_type[6];
+};
+
+void print_syscall_def(const SyscallDef *def, int64_t args[6]);
+void print_syscall_def_ret(const SyscallDef *def, abi_long ret);
+void print_syscall_ptr_ret(const SyscallDef *def, abi_long ret);
+
+/* Emit the signature for a SyscallArgsFn.  */
+#define SYSCALL_ARGS(NAME) \
+static const SyscallDef *args_##NAME(const SyscallDef *def, \
+ int64_t out[6], abi_long in[8])
+
+/* Emit the signature for a SyscallImplFn.  */
+#define SYSCALL_IMPL(NAME) \
+static abi_long impl_##NAME(CPUArchState *cpu_env, int64_t arg1, \
+int64_t arg2, int64_t arg3, int64_t arg4, \
+int64_t arg5, int64_t arg6)
+
+/* Emit the definition for a "simple" syscall.  Such does not use
+ * SyscallArgsFn and only uses arg_type for strace.
+ */
+#define SYSCALL_DEF(NAME, ...) \
+const SyscallDef def_##NAME = { \
+.name = #NAME, .impl = impl_##NAME, .arg_type = { __VA_ARGS__ } \
+}
+
+/* Emit the definition for a syscall that also has an args hook,
+ * and uses arg_type for strace.
+ */
+#define SYSCALL_DEF_ARGS(NAME, ...) \
+const SyscallDef def_##NAME = { \
+.name = #NAME, .args = args_##NAME, .impl = impl_##NAME, \
+.arg_type = { __VA_ARGS__ } \
+}
+
+/* Declarations from the main syscall.c for use in syscall_foo.c,
+ * or for the moment, vice versa.
+ */
+
+int host_to_target_errno(int err);
+
+static inline abi_long get_errno(abi_long ret)
+{
+return unlikely(ret == -1) ?

[Qemu-devel] [PATCH v3 10/19] linux-user: Propagate goto unimplemented to default

2018-06-11 Thread Richard Henderson

There is no point in listing a syscall if you want the same effect as
not listing it.  In one less trivial case, the goto was demonstrably
not reachable.
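
To illustrate the equivalence, here is a reduced sketch. The DEMO_*
numbers and function names are hypothetical, not taken from syscall.c;
the only point is that a case whose body is "goto unimplemented" behaves
exactly like a number that falls through to default.

```c
#include <assert.h>

/* Hypothetical syscall numbers and errno value, for the demo only. */
enum { DEMO_NR_break = 17, DEMO_NR_lseek = 19, DEMO_NR_ptrace = 26 };
#define DEMO_ENOSYS 38

/* Before: obsolete syscalls are listed only to jump to the default. */
static long demo_before(int num)
{
    switch (num) {
    case DEMO_NR_lseek:
        return 0;
    case DEMO_NR_break:
        goto unimplemented;
    case DEMO_NR_ptrace:
        goto unimplemented;
    default:
    unimplemented:
        return -DEMO_ENOSYS;
    }
}

/* After: the same numbers simply reach default. */
static long demo_after(int num)
{
    switch (num) {
    case DEMO_NR_lseek:
        return 0;
    default:
        return -DEMO_ENOSYS;
    }
}

/* Check that both versions agree for a range of syscall numbers. */
static void demo_check_equivalence(void)
{
    for (int num = 0; num < 64; num++) {
        assert(demo_before(num) == demo_after(num));
    }
}
```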

Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 144 +--
 1 file changed, 1 insertion(+), 143 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 06205d3d65..5a17803732 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8297,14 +8297,6 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 ret = get_errno(chmod(p, arg2));
 unlock_user(p, arg1, 0);
 return ret;
-#endif
-#ifdef TARGET_NR_break
-case TARGET_NR_break:
-goto unimplemented;
-#endif
-#ifdef TARGET_NR_oldstat
-case TARGET_NR_oldstat:
-goto unimplemented;
 #endif
 case TARGET_NR_lseek:
 return get_errno(lseek(arg1, arg2, arg3));
@@ -8390,16 +8382,10 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 return get_errno(stime(_time));
 }
 #endif
-case TARGET_NR_ptrace:
-goto unimplemented;
 #ifdef TARGET_NR_alarm /* not on alpha */
 case TARGET_NR_alarm:
 return alarm(arg1);
 #endif
-#ifdef TARGET_NR_oldfstat
-case TARGET_NR_oldfstat:
-goto unimplemented;
-#endif
 #ifdef TARGET_NR_pause /* not on alpha */
 case TARGET_NR_pause:
 if (!block_signals()) {
@@ -8470,14 +8456,6 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 }
 return ret;
 #endif
-#ifdef TARGET_NR_stty
-case TARGET_NR_stty:
-goto unimplemented;
-#endif
-#ifdef TARGET_NR_gtty
-case TARGET_NR_gtty:
-goto unimplemented;
-#endif
 #ifdef TARGET_NR_access
 case TARGET_NR_access:
 if (!(p = lock_user_string(arg1))) {
@@ -8499,10 +8477,6 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 #ifdef TARGET_NR_nice /* not on alpha */
 case TARGET_NR_nice:
 return get_errno(nice(arg1));
-#endif
-#ifdef TARGET_NR_ftime
-case TARGET_NR_ftime:
-goto unimplemented;
 #endif
 case TARGET_NR_sync:
 sync();
@@ -8616,14 +8590,6 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 ret = host_to_target_clock_t(ret);
 }
 return ret;
-#ifdef TARGET_NR_prof
-case TARGET_NR_prof:
-goto unimplemented;
-#endif
-#ifdef TARGET_NR_signal
-case TARGET_NR_signal:
-goto unimplemented;
-#endif
 case TARGET_NR_acct:
 if (arg1 == 0) {
 ret = get_errno(acct(NULL));
@@ -8642,31 +8608,15 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 ret = get_errno(umount2(p, arg2));
 unlock_user(p, arg1, 0);
 return ret;
-#endif
-#ifdef TARGET_NR_lock
-case TARGET_NR_lock:
-goto unimplemented;
 #endif
 case TARGET_NR_ioctl:
 return do_ioctl(arg1, arg2, arg3);
 #ifdef TARGET_NR_fcntl
 case TARGET_NR_fcntl:
 return do_fcntl(arg1, arg2, arg3);
-#endif
-#ifdef TARGET_NR_mpx
-case TARGET_NR_mpx:
-goto unimplemented;
 #endif
 case TARGET_NR_setpgid:
 return get_errno(setpgid(arg1, arg2));
-#ifdef TARGET_NR_ulimit
-case TARGET_NR_ulimit:
-goto unimplemented;
-#endif
-#ifdef TARGET_NR_oldolduname
-case TARGET_NR_oldolduname:
-goto unimplemented;
-#endif
 case TARGET_NR_umask:
 return get_errno(umask(arg1));
 case TARGET_NR_chroot:
@@ -8675,10 +8625,6 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 ret = get_errno(chroot(p));
 unlock_user(p, arg1, 0);
 return ret;
-#ifdef TARGET_NR_ustat
-case TARGET_NR_ustat:
-goto unimplemented;
-#endif
 #ifdef TARGET_NR_dup2
 case TARGET_NR_dup2:
 ret = get_errno(dup2(arg1, arg2));
@@ -9386,10 +9332,6 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 }
 return ret;
 #endif
-#ifdef TARGET_NR_oldlstat
-case TARGET_NR_oldlstat:
-goto unimplemented;
-#endif
 #ifdef TARGET_NR_readlink
 case TARGET_NR_readlink:
 {
@@ -9443,10 +9385,6 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 }
 return ret;
 #endif
-#ifdef TARGET_NR_uselib
-case TARGET_NR_uselib:
-goto unimplemented;
-#endif
 #ifdef TARGET_NR_swapon
 case TARGET_NR_swapon:
 if (!(p = lock_user_string(arg1)))
@@ -9468,10 +9406,6 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
ret = get_errno(reboot(arg1, arg2, arg3, NULL));
 }
 return ret;
-#ifdef TARGET_NR_readdir
-case TARGET_NR_readdir:
-goto unimplemented;
-#endif
 #ifdef TARGET_NR_mmap
 case TARGET_NR_mmap:
 #if (defined(TARGET_I386) && defined(TARGET_ABI32)) || \
@@ -9588,10 +9522,6 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 return ret;

[Qemu-devel] [PATCH v3 08/19] linux-user: Propagate goto efault to return

2018-06-11 Thread Richard Henderson

Reviewed-by: Laurent Vivier 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 341 ++-
 1 file changed, 175 insertions(+), 166 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index efe882612b..a6b8268332 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8009,7 +8009,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 return 0;
 } else {
 if (!(p = lock_user(VERIFY_WRITE, arg2, arg3, 0)))
-goto efault;
+return -TARGET_EFAULT;
 ret = get_errno(safe_read(arg1, p, arg3));
 if (ret >= 0 &&
 fd_trans_host_to_target_data(arg1)) {
@@ -8020,7 +8020,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 return ret;
 case TARGET_NR_write:
 if (!(p = lock_user(VERIFY_READ, arg2, arg3, 1)))
-goto efault;
+return -TARGET_EFAULT;
 if (fd_trans_target_to_host_data(arg1)) {
 void *copy = g_malloc(arg3);
 memcpy(copy, p, arg3);
@@ -8038,7 +8038,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 #ifdef TARGET_NR_open
 case TARGET_NR_open:
 if (!(p = lock_user_string(arg1)))
-goto efault;
+return -TARGET_EFAULT;
 ret = get_errno(do_openat(cpu_env, AT_FDCWD, p,
   target_to_host_bitmask(arg2, 
fcntl_flags_tbl),
   arg3));
@@ -8048,7 +8048,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 #endif
 case TARGET_NR_openat:
 if (!(p = lock_user_string(arg2)))
-goto efault;
+return -TARGET_EFAULT;
 ret = get_errno(do_openat(cpu_env, arg1, p,
   target_to_host_bitmask(arg3, 
fcntl_flags_tbl),
   arg4));
@@ -8083,7 +8083,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 ret = get_errno(safe_wait4(arg1, &status, arg3, 0));
 if (!is_error(ret) && arg2 && ret
 && put_user_s32(host_to_target_waitstatus(status), arg2))
-goto efault;
+return -TARGET_EFAULT;
 }
 return ret;
 #endif
@@ -8095,7 +8095,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 ret = get_errno(safe_waitid(arg1, arg2, &info, arg4, NULL));
 if (!is_error(ret) && arg3 && info.si_pid != 0) {
 if (!(p = lock_user(VERIFY_WRITE, arg3, 
sizeof(target_siginfo_t), 0)))
-goto efault;
+return -TARGET_EFAULT;
 host_to_target_siginfo(p, &info);
 unlock_user(p, arg3, sizeof(target_siginfo_t));
 }
@@ -8105,7 +8105,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 #ifdef TARGET_NR_creat /* not on alpha */
 case TARGET_NR_creat:
 if (!(p = lock_user_string(arg1)))
-goto efault;
+return -TARGET_EFAULT;
 ret = get_errno(creat(p, arg2));
 fd_trans_unregister(ret);
 unlock_user(p, arg1, 0);
@@ -8131,7 +8131,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 {
 void * p2 = NULL;
 if (!arg2 || !arg4)
-goto efault;
+return -TARGET_EFAULT;
 p  = lock_user_string(arg2);
 p2 = lock_user_string(arg4);
 if (!p || !p2)
@@ -8146,7 +8146,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 #ifdef TARGET_NR_unlink
 case TARGET_NR_unlink:
 if (!(p = lock_user_string(arg1)))
-goto efault;
+return -TARGET_EFAULT;
 ret = get_errno(unlink(p));
 unlock_user(p, arg1, 0);
 return ret;
@@ -8154,7 +8154,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 #if defined(TARGET_NR_unlinkat)
 case TARGET_NR_unlinkat:
 if (!(p = lock_user_string(arg2)))
-goto efault;
+return -TARGET_EFAULT;
 ret = get_errno(unlinkat(arg1, p, arg3));
 unlock_user(p, arg2, 0);
 return ret;
@@ -8174,7 +8174,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 guest_argp = arg2;
 for (gp = guest_argp; gp; gp += sizeof(abi_ulong)) {
 if (get_user_ual(addr, gp))
-goto efault;
+return -TARGET_EFAULT;
 if (!addr)
 break;
 argc++;
@@ -8183,7 +8183,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 guest_envp = arg3;
 for (gp = guest_envp; gp; gp += sizeof(abi_ulong)) {
 if (get_user_ual(addr, gp))
-goto efault;
+return

[Qemu-devel] [PATCH v3 09/19] linux-user: Propagate goto unimplemented_nowarn to return

2018-06-11 Thread Richard Henderson

Reviewed-by: Laurent Vivier 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index a6b8268332..06205d3d65 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -11807,7 +11807,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
   return 0;
   }
 #else
-  goto unimplemented_nowarn;
+  return -TARGET_ENOSYS;
 #endif
 #endif
 #ifdef TARGET_NR_get_thread_area
@@ -11820,12 +11820,12 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 return ts->tp_value;
 }
 #else
-goto unimplemented_nowarn;
+return -TARGET_ENOSYS;
 #endif
 #endif
 #ifdef TARGET_NR_getdomainname
 case TARGET_NR_getdomainname:
-goto unimplemented_nowarn;
+return -TARGET_ENOSYS;
 #endif
 
 #ifdef TARGET_NR_clock_settime
@@ -11910,7 +11910,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
  * holding a mutex that is shared with another process via
  * shared memory).
  */
-goto unimplemented_nowarn;
+return -TARGET_ENOSYS;
 #endif
 
 #if defined(TARGET_NR_utimensat)
@@ -12564,9 +12564,6 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 default:
 unimplemented:
 gemu_log("qemu: Unsupported syscall: %d\n", num);
-#if defined(TARGET_NR_setxattr) || defined(TARGET_NR_get_thread_area) || defined(TARGET_NR_getdomainname) || defined(TARGET_NR_set_robust_list)
-unimplemented_nowarn:
-#endif
 return -TARGET_ENOSYS;
 }
 fail:
-- 
2.17.1

[Qemu-devel] [PATCH v3 14/19] linux-user: Split out preadv, pwritev, readv, writev

2018-06-11 Thread Richard Henderson

Signed-off-by: Richard Henderson 
---
 linux-user/syscall.h   |   5 ++
 linux-user/syscall.c   |  85 ++-
 linux-user/syscall_file.c  | 103 +
 linux-user/gen_syscall_list.py |   4 ++
 linux-user/strace.list |  12 
 5 files changed, 116 insertions(+), 93 deletions(-)

diff --git a/linux-user/syscall.h b/linux-user/syscall.h
index e35b0a60f5..3fc49d5b65 100644
--- a/linux-user/syscall.h
+++ b/linux-user/syscall.h
@@ -169,6 +169,11 @@ static inline void fd_trans_unregister(int fd)
 }
 }
 
+struct iovec *lock_iovec(int type, abi_ulong target_addr,
+ abi_ulong count, int copy);
+void unlock_iovec(struct iovec *vec, abi_ulong target_addr,
+  abi_ulong count, int copy);
+
 /* Temporary declarations from syscall_foo.c back to main syscall.c.
  * These indicate incomplete conversion.
  */
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index c47e73de5f..381ee9c3a2 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -856,12 +856,6 @@ safe_syscall2(int, rt_sigsuspend, sigset_t *, newset, 
size_t, sigsetsize)
 safe_syscall2(int, kill, pid_t, pid, int, sig)
 safe_syscall2(int, tkill, int, tid, int, sig)
 safe_syscall3(int, tgkill, int, tgid, int, pid, int, sig)
-safe_syscall3(ssize_t, readv, int, fd, const struct iovec *, iov, int, iovcnt)
-safe_syscall3(ssize_t, writev, int, fd, const struct iovec *, iov, int, iovcnt)
-safe_syscall5(ssize_t, preadv, int, fd, const struct iovec *, iov, int, iovcnt,
-  unsigned long, pos_l, unsigned long, pos_h)
-safe_syscall5(ssize_t, pwritev, int, fd, const struct iovec *, iov, int, iovcnt,
-  unsigned long, pos_l, unsigned long, pos_h)
 safe_syscall3(int, connect, int, fd, const struct sockaddr *, addr,
   socklen_t, addrlen)
 safe_syscall6(ssize_t, sendto, int, fd, const void *, buf, size_t, len,
@@ -3317,25 +3311,8 @@ static abi_long do_getsockopt(int sockfd, int level, int 
optname,
 return ret;
 }
 
-/* Convert target low/high pair representing file offset into the host
- * low/high pair. This function doesn't handle offsets bigger than 64 bits
- * as the kernel doesn't handle them either.
- */
-static void target_to_host_low_high(abi_ulong tlow,
-abi_ulong thigh,
-unsigned long *hlow,
-unsigned long *hhigh)
-{
-uint64_t off = tlow |
-((unsigned long long)thigh << TARGET_LONG_BITS / 2) <<
-TARGET_LONG_BITS / 2;
-
-*hlow = off;
-*hhigh = (off >> HOST_LONG_BITS / 2) >> HOST_LONG_BITS / 2;
-}
-
-static struct iovec *lock_iovec(int type, abi_ulong target_addr,
-abi_ulong count, int copy)
+struct iovec *lock_iovec(int type, abi_ulong target_addr,
+ abi_ulong count, int copy)
 {
 struct target_iovec *target_vec;
 struct iovec *vec;
@@ -3422,8 +3399,8 @@ static struct iovec *lock_iovec(int type, abi_ulong 
target_addr,
 return NULL;
 }
 
-static void unlock_iovec(struct iovec *vec, abi_ulong target_addr,
- abi_ulong count, int copy)
+void unlock_iovec(struct iovec *vec, abi_ulong target_addr,
+  abi_ulong count, int copy)
 {
 struct target_iovec *target_vec;
 int i;
@@ -9859,60 +9836,6 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 /* NOTE: the flock constant seems to be the same for every
Linux platform */
 return get_errno(safe_flock(arg1, arg2));
-case TARGET_NR_readv:
-{
-struct iovec *vec = lock_iovec(VERIFY_WRITE, arg2, arg3, 0);
-if (vec != NULL) {
-ret = get_errno(safe_readv(arg1, vec, arg3));
-unlock_iovec(vec, arg2, arg3, 1);
-} else {
-ret = -host_to_target_errno(errno);
-}
-}
-return ret;
-case TARGET_NR_writev:
-{
-struct iovec *vec = lock_iovec(VERIFY_READ, arg2, arg3, 1);
-if (vec != NULL) {
-ret = get_errno(safe_writev(arg1, vec, arg3));
-unlock_iovec(vec, arg2, arg3, 0);
-} else {
-ret = -host_to_target_errno(errno);
-}
-}
-return ret;
-#if defined(TARGET_NR_preadv)
-case TARGET_NR_preadv:
-{
-struct iovec *vec = lock_iovec(VERIFY_WRITE, arg2, arg3, 0);
-if (vec != NULL) {
-unsigned long low, high;
-
-target_to_host_low_high(arg4, arg5, &low, &high);
-ret = get_errno(safe_preadv(arg1, vec, arg3, low, high));
-unlock_iovec(vec, arg2, arg3, 1);
-} else {
-ret = -host_to_target_errno(errno);
-   }
-}
-return ret;
-#endif
-#if defined(TARGET_NR_pwritev)
-case TARGET_NR_pwritev:
-{
-

[Qemu-devel] [PATCH v3 06/19] linux-user: Split out do_syscall1

2018-06-11 Thread Richard Henderson

There was supposed to be a single point of return for do_syscall
so that tracing works properly.  However, there are a few bugs
in that area.  It is significantly simpler to split out an inner
function to enforce this.
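
The pattern can be sketched in isolation as follows. All demo_* names,
syscall numbers and errno values are hypothetical; the trace hooks stand
in for print_syscall()/print_syscall_ret() and the trace_guest_user_*
calls. The inner function may return from anywhere, while the outer one
has exactly one path, so every "enter" event is paired with an "exit".

```c
#include <assert.h>

#define DEMO_EFAULT 14  /* assumed errno values, for the demo only */
#define DEMO_ENOSYS 38

static int demo_trace_depth;   /* enter events not yet matched by exit */
static int demo_trace_events;  /* total number of trace events seen */

static void demo_trace_enter(int num)
{
    (void)num;
    demo_trace_depth++;
    demo_trace_events++;
}

static void demo_trace_exit(int num, long ret)
{
    (void)num;
    (void)ret;
    assert(demo_trace_depth > 0);
    demo_trace_depth--;
    demo_trace_events++;
}

/* Inner helper: free to return early from any case. */
static long demo_syscall1(int num, long arg1)
{
    switch (num) {
    case 0:
        return arg1;
    case 1:
        if (arg1 < 0) {
            return -DEMO_EFAULT;
        }
        return arg1 * 2;
    default:
        return -DEMO_ENOSYS;
    }
}

/* Outer wrapper: the single exit point where tracing happens. */
static long demo_syscall(int num, long arg1)
{
    long ret;

    demo_trace_enter(num);
    ret = demo_syscall1(num, arg1);
    demo_trace_exit(num, ret);
    return ret;
}
```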

Reviewed-by: Laurent Vivier 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 77 +++-
 1 file changed, 48 insertions(+), 29 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index c212149245..ec3bc1cbe5 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7947,13 +7947,15 @@ static int host_to_target_cpu_mask(const unsigned long 
*host_mask,
 return 0;
 }
 
-/* do_syscall() should always have a single exit point at the end so
-   that actions, such as logging of syscall results, can be performed.
-   All errnos that do_syscall() returns must be -TARGET_<errcode>. */
-abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
-abi_long arg2, abi_long arg3, abi_long arg4,
-abi_long arg5, abi_long arg6, abi_long arg7,
-abi_long arg8)
+/* This is an internal helper for do_syscall so that it is easier
+ * to have a single return point, so that actions, such as logging
+ * of syscall results, can be performed.
+ * All errnos that do_syscall() returns must be -TARGET_<errcode>.
+ */
+static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
+abi_long arg2, abi_long arg3, abi_long arg4,
+abi_long arg5, abi_long arg6, abi_long arg7,
+abi_long arg8)
 {
 CPUState *cpu = ENV_GET_CPU(cpu_env);
 abi_long ret;
@@ -7961,25 +7963,6 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 struct statfs stfs;
 void *p;
 
-#if defined(DEBUG_ERESTARTSYS)
-/* Debug-only code for exercising the syscall-restart code paths
- * in the per-architecture cpu main loops: restart every syscall
- * the guest makes once before letting it through.
- */
-{
-static int flag;
-
-flag = !flag;
-if (flag) {
-return -TARGET_ERESTARTSYS;
-}
-}
-#endif
-
-trace_guest_user_syscall(cpu, num, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8);
-if(do_strace)
-print_syscall(num, arg1, arg2, arg3, arg4, arg5, arg6);
-
 switch(num) {
 case TARGET_NR_exit:
 /* In old applications this may be used to implement _exit(2).
@@ -12765,11 +12748,47 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 break;
 }
 fail:
-if(do_strace)
-print_syscall_ret(num, ret);
-trace_guest_user_syscall_ret(cpu, num, ret);
 return ret;
 efault:
 ret = -TARGET_EFAULT;
 goto fail;
 }
+
+abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
+abi_long arg2, abi_long arg3, abi_long arg4,
+abi_long arg5, abi_long arg6, abi_long arg7,
+abi_long arg8)
+{
+CPUState *cpu = ENV_GET_CPU(cpu_env);
+abi_long ret;
+
+#ifdef DEBUG_ERESTARTSYS
+/* Debug-only code for exercising the syscall-restart code paths
+ * in the per-architecture cpu main loops: restart every syscall
+ * the guest makes once before letting it through.
+ */
+{
+static bool flag;
+flag = !flag;
+if (flag) {
+return -TARGET_ERESTARTSYS;
+}
+}
+#endif
+
+trace_guest_user_syscall(cpu, num, arg1, arg2, arg3, arg4,
+ arg5, arg6, arg7, arg8);
+
+if (unlikely(do_strace)) {
+print_syscall(num, arg1, arg2, arg3, arg4, arg5, arg6);
+ret = do_syscall1(cpu_env, num, arg1, arg2, arg3, arg4,
+  arg5, arg6, arg7, arg8);
+print_syscall_ret(num, ret);
+} else {
+ret = do_syscall1(cpu_env, num, arg1, arg2, arg3, arg4,
+  arg5, arg6, arg7, arg8);
+}
+
+trace_guest_user_syscall_ret(cpu, num, ret);
+return ret;
+}
-- 
2.17.1

[Qemu-devel] [PATCH v3 04/19] linux-user/sparc64: Add inotify_rm_watch and tee syscalls

2018-06-11 Thread Richard Henderson

Reviewed-by: Laurent Vivier 
Signed-off-by: Richard Henderson 
---
 linux-user/sparc64/syscall_nr.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/sparc64/syscall_nr.h b/linux-user/sparc64/syscall_nr.h
index 9391645598..0b91b896da 100644
--- a/linux-user/sparc64/syscall_nr.h
+++ b/linux-user/sparc64/syscall_nr.h
@@ -154,7 +154,7 @@
 #define TARGET_NR_poll   153 /* Common */
 #define TARGET_NR_getdents64   154 /* Linux specific */
 #define TARGET_NR_fcntl64 155 /* Linux sparc32 Specific */
-/* #define TARGET_NR_getdirentries   156    SunOS Specific */
+#define TARGET_NR_inotify_rm_watch   156 /* Linux specific */
 #define TARGET_NR_statfs 157 /* Common */
 #define TARGET_NR_fstatfs 158 /* Common */
 #define TARGET_NR_umount 159 /* Common */
@@ -278,7 +278,7 @@
 #define TARGET_NR_mq_notify 277
 #define TARGET_NR_mq_getsetattr 278
 #define TARGET_NR_waitid   279
-/*#define TARGET_NR_sys_setaltroot 280 available (was setaltroot) */
+#define TARGET_NR_tee   280
 #define TARGET_NR_add_key  281
 #define TARGET_NR_request_key  282
 #define TARGET_NR_keyctl   283
-- 
2.17.1

[Qemu-devel] [PATCH v3 16/19] linux-user: Split out name_to_handle_at, open_by_handle_at

2018-06-11 Thread Richard Henderson

Signed-off-by: Richard Henderson 
---
 linux-user/syscall.h   |  1 +
 linux-user/strace.c |  5 +-
 linux-user/syscall.c   | 98 --
 linux-user/syscall_file.c  | 88 ++
 linux-user/gen_syscall_list.py |  2 +
 linux-user/strace.list |  3 --
 6 files changed, 95 insertions(+), 102 deletions(-)

diff --git a/linux-user/syscall.h b/linux-user/syscall.h
index a157e25bba..217267409b 100644
--- a/linux-user/syscall.h
+++ b/linux-user/syscall.h
@@ -52,6 +52,7 @@ typedef enum {
 
 /* These print as sets of flags.  */
 ARG_ATDIRFD,
+ARG_ATFLAG,
 ARG_MODEFLAG,
 ARG_OPENFLAG,
 
diff --git a/linux-user/strace.c b/linux-user/strace.c
index 1ae0057365..4240ea649a 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -780,7 +780,7 @@ UNUSED static struct flags access_flags[] = {
 FLAG_END,
 };
 
-UNUSED static struct flags at_file_flags[] = {
+static struct flags const at_file_flags[] = {
 #ifdef AT_EACCESS
 FLAG_GENERIC(AT_EACCESS),
 #endif
@@ -2693,6 +2693,9 @@ static void print_syscall_def1(const SyscallDef *def, 
int64_t args[6])
 case ARG_ATDIRFD:
 len = add_atdirfd(b, rest, arg);
 break;
+case ARG_ATFLAG:
+len = add_flags(b, rest, at_file_flags, arg, false);
+break;
 case ARG_MODEFLAG:
 len = add_flags(b, rest, mode_flags, arg, true);
 break;
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index cdc9787240..8c7dd3049e 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7151,93 +7151,6 @@ static int do_futex(target_ulong uaddr, int op, int val, 
target_ulong timeout,
 return -TARGET_ENOSYS;
 }
 }
-#if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
-static abi_long do_name_to_handle_at(abi_long dirfd, abi_long pathname,
- abi_long handle, abi_long mount_id,
- abi_long flags)
-{
-struct file_handle *target_fh;
-struct file_handle *fh;
-int mid = 0;
-abi_long ret;
-char *name;
-unsigned int size, total_size;
-
-if (get_user_s32(size, handle)) {
-return -TARGET_EFAULT;
-}
-
-name = lock_user_string(pathname);
-if (!name) {
-return -TARGET_EFAULT;
-}
-
-total_size = sizeof(struct file_handle) + size;
-target_fh = lock_user(VERIFY_WRITE, handle, total_size, 0);
-if (!target_fh) {
-unlock_user(name, pathname, 0);
-return -TARGET_EFAULT;
-}
-
-fh = g_malloc0(total_size);
-fh->handle_bytes = size;
-
-ret = get_errno(name_to_handle_at(dirfd, path(name), fh, &mid, flags));
-unlock_user(name, pathname, 0);
-
-/* man name_to_handle_at(2):
- * Other than the use of the handle_bytes field, the caller should treat
- * the file_handle structure as an opaque data type
- */
-
-memcpy(target_fh, fh, total_size);
-target_fh->handle_bytes = tswap32(fh->handle_bytes);
-target_fh->handle_type = tswap32(fh->handle_type);
-g_free(fh);
-unlock_user(target_fh, handle, total_size);
-
-if (put_user_s32(mid, mount_id)) {
-return -TARGET_EFAULT;
-}
-
-return ret;
-
-}
-#endif
-
-#if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
-static abi_long do_open_by_handle_at(abi_long mount_fd, abi_long handle,
- abi_long flags)
-{
-struct file_handle *target_fh;
-struct file_handle *fh;
-unsigned int size, total_size;
-abi_long ret;
-
-if (get_user_s32(size, handle)) {
-return -TARGET_EFAULT;
-}
-
-total_size = sizeof(struct file_handle) + size;
-target_fh = lock_user(VERIFY_READ, handle, total_size, 1);
-if (!target_fh) {
-return -TARGET_EFAULT;
-}
-
-fh = g_memdup(target_fh, total_size);
-fh->handle_bytes = size;
-fh->handle_type = tswap32(target_fh->handle_type);
-
-ret = get_errno(open_by_handle_at(mount_fd, fh,
-target_to_host_bitmask(flags, fcntl_flags_tbl)));
-
-g_free(fh);
-
-unlock_user(target_fh, handle, total_size);
-
-return ret;
-}
-#endif
 
 #if defined(TARGET_NR_signalfd) || defined(TARGET_NR_signalfd4)
 
@@ -7533,17 +7446,6 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 gdb_exit(cpu_env, arg1);
 _exit(arg1);
 return 0; /* avoid warning */
-#if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
-case TARGET_NR_name_to_handle_at:
-ret = do_name_to_handle_at(arg1, arg2, arg3, arg4, arg5);
-return ret;
-#endif
-#if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
-case TARGET_NR_open_by_handle_at:
-ret = do_open_by_handle_at(arg1, arg2, arg3);
-fd_trans_unregister(ret);
-return ret;
-#endif
 case TARGET_NR_brk:
 return

[Qemu-devel] [PATCH v3 07/19] linux-user: Relax single exit from "break"

2018-06-11 Thread Richard Henderson

Transform outermost "break" to "return ret".  If the immediately
preceding statement was an assignment to ret, return the value
directly.
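
A reduced before/after sketch of the two rules. The demo_* names, syscall
numbers and the lock helpers are hypothetical; the lock/unlock pair stands
in for cleanup such as unlock_user(), which must still run before the
return.

```c
#include <assert.h>

/* Hypothetical syscall numbers and errno value, for the demo only. */
enum { DEMO_NR_twice = 1, DEMO_NR_locked = 2 };
#define DEMO_ENOSYS 38

static int demo_locks;  /* number of locks currently held */

static void demo_lock(void)
{
    demo_locks++;
}

static void demo_unlock(void)
{
    assert(demo_locks > 0);
    demo_locks--;
}

/* Before: every case assigns ret and breaks to a common return. */
static long demo_before(int num, long arg1)
{
    long ret;

    switch (num) {
    case DEMO_NR_twice:
        ret = arg1 * 2;
        break;
    case DEMO_NR_locked:
        demo_lock();
        ret = arg1 + 1;
        demo_unlock();
        break;
    default:
        ret = -DEMO_ENOSYS;
        break;
    }
    return ret;
}

/* After: "break" becomes "return ret"; where the assignment came
 * immediately before the break, the value is returned directly.
 */
static long demo_after(int num, long arg1)
{
    long ret;

    switch (num) {
    case DEMO_NR_twice:
        return arg1 * 2;
    case DEMO_NR_locked:
        demo_lock();
        ret = arg1 + 1;
        demo_unlock();
        return ret;
    default:
        return -DEMO_ENOSYS;
    }
}
```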

Reviewed-by: Laurent Vivier 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 970 +--
 1 file changed, 390 insertions(+), 580 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index ec3bc1cbe5..efe882612b 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7971,8 +7971,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
Do thread termination if we have more then one thread.  */
 
 if (block_signals()) {
-ret = -TARGET_ERESTARTSYS;
-break;
+return -TARGET_ERESTARTSYS;
 }
 
 cpu_list_lock();
@@ -8004,12 +8003,11 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 #endif
 gdb_exit(cpu_env, arg1);
 _exit(arg1);
-ret = 0; /* avoid warning */
-break;
+return 0; /* avoid warning */
 case TARGET_NR_read:
-if (arg3 == 0)
-ret = 0;
-else {
+if (arg3 == 0) {
+return 0;
+} else {
 if (!(p = lock_user(VERIFY_WRITE, arg2, arg3, 0)))
 goto efault;
 ret = get_errno(safe_read(arg1, p, arg3));
@@ -8019,7 +8017,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 }
 unlock_user(p, arg2, ret);
 }
-break;
+return ret;
 case TARGET_NR_write:
 if (!(p = lock_user(VERIFY_READ, arg2, arg3, 1)))
 goto efault;
@@ -8035,7 +8033,8 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 ret = get_errno(safe_write(arg1, p, arg3));
 }
 unlock_user(p, arg2, 0);
-break;
+return ret;
+
 #ifdef TARGET_NR_open
 case TARGET_NR_open:
 if (!(p = lock_user_string(arg1)))
@@ -8045,7 +8044,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
   arg3));
 fd_trans_unregister(ret);
 unlock_user(p, arg1, 0);
-break;
+return ret;
 #endif
 case TARGET_NR_openat:
 if (!(p = lock_user_string(arg2)))
@@ -8055,29 +8054,27 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
   arg4));
 fd_trans_unregister(ret);
 unlock_user(p, arg2, 0);
-break;
+return ret;
 #if defined(TARGET_NR_name_to_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
 case TARGET_NR_name_to_handle_at:
 ret = do_name_to_handle_at(arg1, arg2, arg3, arg4, arg5);
-break;
+return ret;
 #endif
 #if defined(TARGET_NR_open_by_handle_at) && defined(CONFIG_OPEN_BY_HANDLE)
 case TARGET_NR_open_by_handle_at:
 ret = do_open_by_handle_at(arg1, arg2, arg3);
 fd_trans_unregister(ret);
-break;
+return ret;
 #endif
 case TARGET_NR_close:
 fd_trans_unregister(arg1);
-ret = get_errno(close(arg1));
-break;
+return get_errno(close(arg1));
+
 case TARGET_NR_brk:
-ret = do_brk(arg1);
-break;
+return do_brk(arg1);
 #ifdef TARGET_NR_fork
 case TARGET_NR_fork:
-ret = get_errno(do_fork(cpu_env, TARGET_SIGCHLD, 0, 0, 0, 0));
-break;
+return get_errno(do_fork(cpu_env, TARGET_SIGCHLD, 0, 0, 0, 0));
 #endif
 #ifdef TARGET_NR_waitpid
 case TARGET_NR_waitpid:
@@ -8088,7 +8085,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 && put_user_s32(host_to_target_waitstatus(status), arg2))
 goto efault;
 }
-break;
+return ret;
 #endif
 #ifdef TARGET_NR_waitid
 case TARGET_NR_waitid:
@@ -8103,7 +8100,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 unlock_user(p, arg3, sizeof(target_siginfo_t));
 }
 }
-break;
+return ret;
 #endif
 #ifdef TARGET_NR_creat /* not on alpha */
 case TARGET_NR_creat:
@@ -8112,7 +8109,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 ret = get_errno(creat(p, arg2));
 fd_trans_unregister(ret);
 unlock_user(p, arg1, 0);
-break;
+return ret;
 #endif
 #ifdef TARGET_NR_link
 case TARGET_NR_link:
@@ -8127,7 +8124,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 unlock_user(p2, arg2, 0);
 unlock_user(p, arg1, 0);
 }
-break;
+return ret;
 #endif
 #if defined(TARGET_NR_linkat)
 case TARGET_NR_linkat:
@@ -8144,7 +8141,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 unlock_user(p, arg2, 0);
 unlock_user(p2, arg4, 0);
 }
-break;
+return ret;
 #endif
 #ifdef TARGET_NR_unlink

[Qemu-devel] [PATCH v3 03/19] linux-user/microblaze: Fix typo in accept4 syscall

2018-06-11 Thread Richard Henderson

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Laurent Vivier 
Signed-off-by: Richard Henderson 
---
 linux-user/microblaze/syscall_nr.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/microblaze/syscall_nr.h 
b/linux-user/microblaze/syscall_nr.h
index 0704449bae..761208e9e6 100644
--- a/linux-user/microblaze/syscall_nr.h
+++ b/linux-user/microblaze/syscall_nr.h
@@ -363,7 +363,7 @@
 #define TARGET_NR_shutdown 359 /* new */
 #define TARGET_NR_sendmsg  360 /* new */
 #define TARGET_NR_recvmsg  361 /* new */
-#define TARGET_NR_accept04 362 /* new */
+#define TARGET_NR_accept4  362 /* new */
 #define TARGET_NR_preadv 363 /* new */
 #define TARGET_NR_pwritev   364 /* new */
 #define TARGET_NR_rt_tgsigqueueinfo 365 /* new */
-- 
2.17.1

[Qemu-devel] [PATCH v3 05/19] linux-user: Remove DEBUG

2018-06-11 Thread Richard Henderson

This is redundant with both -strace and actual tracing.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Laurent Vivier 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 7b9ac3b408..c212149245 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -163,7 +163,6 @@
  * (The one remaining unallocated bit is 0x1000 which used to be CLONE_PID.)
  */
 
-//#define DEBUG
 /* Define DEBUG_ERESTARTSYS to force every syscall to be restarted
  * once. This exercises the codepaths for restart.
  */
@@ -5777,9 +5776,6 @@ static abi_long do_ioctl(int fd, int cmd, abi_long arg)
 ie++;
 }
 arg_type = ie->arg_type;
-#if defined(DEBUG)
-gemu_log("ioctl: cmd=0x%04lx (%s)\n", (long)cmd, ie->name);
-#endif
 if (ie->do_ioctl) {
 return ie->do_ioctl(ie, buf_temp, fd, cmd, arg);
 } else if (!ie->host_cmd) {
@@ -7980,9 +7976,6 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
 }
 #endif
 
-#ifdef DEBUG
-gemu_log("syscall %d", num);
-#endif
 trace_guest_user_syscall(cpu, num, arg1, arg2, arg3, arg4, arg5, arg6, 
arg7, arg8);
 if(do_strace)
 print_syscall(num, arg1, arg2, arg3, arg4, arg5, arg6);
@@ -12772,9 +12765,6 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 break;
 }
 fail:
-#ifdef DEBUG
-gemu_log(" = " TARGET_ABI_FMT_ld "\n", ret);
-#endif
 if(do_strace)
 print_syscall_ret(num, ret);
 trace_guest_user_syscall_ret(cpu, num, ret);
-- 
2.17.1

[Qemu-devel] [PATCH v3 15/19] linux-user: Split out pread64, pwrite64

2018-06-11 Thread Richard Henderson

Signed-off-by: Richard Henderson 
---
 linux-user/syscall.h   | 44 +
 linux-user/syscall.c   | 70 --
 linux-user/syscall_file.c  | 61 +++--
 linux-user/gen_syscall_list.py |  2 +
 linux-user/strace.list |  6 ---
 5 files changed, 104 insertions(+), 79 deletions(-)

diff --git a/linux-user/syscall.h b/linux-user/syscall.h
index 3fc49d5b65..a157e25bba 100644
--- a/linux-user/syscall.h
+++ b/linux-user/syscall.h
@@ -174,6 +174,50 @@ struct iovec *lock_iovec(int type, abi_ulong target_addr,
 void unlock_iovec(struct iovec *vec, abi_ulong target_addr,
   abi_ulong count, int copy);
 
+/* Returns true if syscall NUM expects 64bit types aligned even
+ * on pairs of registers.
+ */
+static inline bool regpairs_aligned(void *cpu_env, int num)
+{
+#ifdef TARGET_ARM
+return ((CPUARMState *)cpu_env)->eabi;
+#elif defined(TARGET_MIPS) && TARGET_ABI_BITS == 32
+return true;
+#elif defined(TARGET_PPC) && !defined(TARGET_PPC64)
+/* SysV AVI for PPC32 expects 64bit parameters to be passed on
+ * odd/even pairs of registers which translates to the same as
+ * we start with r3 as arg1
+ */
+return true;
+#elif defined(TARGET_SH4)
+/* SH4 doesn't align register pairs, except for p{read,write}64 */
+switch (num) {
+case TARGET_NR_pread64:
+case TARGET_NR_pwrite64:
+return true;
+default:
+return false;
+}
+#elif defined(TARGET_XTENSA)
+return true;
+#else
+return false;
+#endif
+}
+
+static inline uint64_t target_offset64(abi_ulong word0, abi_ulong word1)
+{
+#if TARGET_ABI_BITS == 32
+# ifdef TARGET_WORDS_BIGENDIAN
+return ((uint64_t)word0 << 32) | word1;
+# else
+return ((uint64_t)word1 << 32) | word0;
+# endif
+#else
+return word0;
+#endif
+}
+
 /* Temporary declarations from syscall_foo.c back to main syscall.c.
  * These indicate incomplete conversion.
  */
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 381ee9c3a2..cdc9787240 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -642,38 +642,6 @@ static inline int next_free_host_timer(void)
 }
 #endif
 
-/* ARM EABI and MIPS expect 64bit types aligned even on pairs or registers */
-#ifdef TARGET_ARM
-static inline int regpairs_aligned(void *cpu_env, int num)
-{
-return ((((CPUARMState *)cpu_env)->eabi) == 1) ;
-}
-#elif defined(TARGET_MIPS) && (TARGET_ABI_BITS == 32)
-static inline int regpairs_aligned(void *cpu_env, int num) { return 1; }
-#elif defined(TARGET_PPC) && !defined(TARGET_PPC64)
-/* SysV AVI for PPC32 expects 64bit parameters to be passed on odd/even pairs
- * of registers which translates to the same as ARM/MIPS, because we start with
- * r3 as arg1 */
-static inline int regpairs_aligned(void *cpu_env, int num) { return 1; }
-#elif defined(TARGET_SH4)
-/* SH4 doesn't align register pairs, except for p{read,write}64 */
-static inline int regpairs_aligned(void *cpu_env, int num)
-{
-switch (num) {
-case TARGET_NR_pread64:
-case TARGET_NR_pwrite64:
-return 1;
-
-default:
-return 0;
-}
-}
-#elif defined(TARGET_XTENSA)
-static inline int regpairs_aligned(void *cpu_env, int num) { return 1; }
-#else
-static inline int regpairs_aligned(void *cpu_env, int num) { return 0; }
-#endif
-
 #define ERRNO_TABLE_SIZE 1200
 
 /* target_to_host_errno_table[] is initialized from
@@ -6864,22 +6832,6 @@ void syscall_init(void)
 }
 }
 
-#if TARGET_ABI_BITS == 32
-static inline uint64_t target_offset64(uint32_t word0, uint32_t word1)
-{
-#ifdef TARGET_WORDS_BIGENDIAN
-return ((uint64_t)word0 << 32) | word1;
-#else
-return ((uint64_t)word1 << 32) | word0;
-#endif
-}
-#else /* TARGET_ABI_BITS == 32 */
-static inline uint64_t target_offset64(uint64_t word0, uint64_t word1)
-{
-return word0;
-}
-#endif /* TARGET_ABI_BITS != 32 */
-
 #ifdef TARGET_NR_truncate64
 static inline abi_long target_truncate64(void *cpu_env, const char *arg1,
  abi_long arg2,
@@ -10078,28 +10030,6 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 #else
 #error unreachable
 #endif
-#endif
-#ifdef TARGET_NR_pread64
-case TARGET_NR_pread64:
-if (regpairs_aligned(cpu_env, num)) {
-arg4 = arg5;
-arg5 = arg6;
-}
-if (!(p = lock_user(VERIFY_WRITE, arg2, arg3, 0)))
-return -TARGET_EFAULT;
-ret = get_errno(pread64(arg1, p, arg3, target_offset64(arg4, arg5)));
-unlock_user(p, arg2, ret);
-return ret;
-case TARGET_NR_pwrite64:
-if (regpairs_aligned(cpu_env, num)) {
-arg4 = arg5;
-arg5 = arg6;
-}
-if (!(p = lock_user(VERIFY_READ, arg2, arg3, 1)))
-return -TARGET_EFAULT;
-ret = get_errno(pwrite64(arg1, p, arg3, target_offset64(arg4, arg5)));
-unlock_user(p, arg2, 0);
-return ret;
 #endif
 case
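
For readers who want to see the arithmetic outside of QEMU, here is a minimal
standalone sketch of what regpairs_aligned() and target_offset64() achieve for
pread64/pwrite64 on a 32-bit ABI. It is an illustration only, not code from the
patch: word_t, offset64() and pread_offset() are hypothetical names, and
endianness and pair alignment are passed as runtime flags here, whereas the
patch selects them at build time through the TARGET_* macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one 32-bit guest register value. */
typedef uint32_t word_t;

/* Join two 32-bit register values into one 64-bit file offset.
 * On a big-endian guest the first word holds the high half;
 * on a little-endian guest the second word does.
 */
static uint64_t offset64(word_t word0, word_t word1, bool big_endian)
{
    if (big_endian) {
        return ((uint64_t)word0 << 32) | word1;
    }
    return ((uint64_t)word1 << 32) | word0;
}

/* args[0..2] are fd, buffer and count.  Without pair alignment the offset
 * words follow directly in args[3] and args[4].  With pair alignment,
 * args[3] is padding and the offset words sit in args[4] and args[5],
 * which mirrors the "arg4 = arg5; arg5 = arg6;" shuffle in the diff.
 */
static uint64_t pread_offset(const word_t args[6], bool pairs_aligned,
                             bool big_endian)
{
    int first = pairs_aligned ? 4 : 3;

    assert(first + 1 < 6);
    return offset64(args[first], args[first + 1], big_endian);
}
```

With the same six register values, the aligned and unaligned conventions yield
different offsets, which is why the per-target (and, on SH4, per-syscall)
decision has to be made before the words are combined.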

[Qemu-devel] [PATCH v3 01/19] linux-user/alpha: Fix epoll syscalls

2018-06-11 Thread Richard Henderson

These were named incorrectly, going so far as to invade strace.list.

Reviewed-by: Laurent Vivier 
Signed-off-by: Richard Henderson 
---
 linux-user/alpha/syscall_nr.h | 6 +++---
 linux-user/strace.list| 9 -
 2 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/linux-user/alpha/syscall_nr.h b/linux-user/alpha/syscall_nr.h
index 00e14bb6b3..6532376e3e 100644
--- a/linux-user/alpha/syscall_nr.h
+++ b/linux-user/alpha/syscall_nr.h
@@ -343,9 +343,9 @@
 #define TARGET_NR_io_cancel402
 #define TARGET_NR_exit_group   405
 #define TARGET_NR_lookup_dcookie   406
-#define TARGET_NR_sys_epoll_create 407
-#define TARGET_NR_sys_epoll_ctl408
-#define TARGET_NR_sys_epoll_wait   409
+#define TARGET_NR_epoll_create 407
+#define TARGET_NR_epoll_ctl408
+#define TARGET_NR_epoll_wait   409
 #define TARGET_NR_remap_file_pages 410
 #define TARGET_NR_set_tid_address  411
 #define TARGET_NR_restart_syscall  412
diff --git a/linux-user/strace.list b/linux-user/strace.list
index a91e33f7e5..2bc5ba04d4 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -1467,15 +1467,6 @@
 #ifdef TARGET_NR__sysctl
 { TARGET_NR__sysctl, "_sysctl" , NULL, NULL, NULL },
 #endif
-#ifdef TARGET_NR_sys_epoll_create
-{ TARGET_NR_sys_epoll_create, "sys_epoll_create" , NULL, NULL, NULL },
-#endif
-#ifdef TARGET_NR_sys_epoll_ctl
-{ TARGET_NR_sys_epoll_ctl, "sys_epoll_ctl" , NULL, NULL, NULL },
-#endif
-#ifdef TARGET_NR_sys_epoll_wait
-{ TARGET_NR_sys_epoll_wait, "sys_epoll_wait" , NULL, NULL, NULL },
-#endif
 #ifdef TARGET_NR_sysfs
 { TARGET_NR_sysfs, "sysfs" , NULL, NULL, NULL },
 #endif
-- 
2.17.1

[Qemu-devel] [PATCH v3 02/19] linux-user/hppa: Fix typo in mknodat syscall

2018-06-11 Thread Richard Henderson

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Laurent Vivier 
Signed-off-by: Richard Henderson 
---
 linux-user/hppa/syscall_nr.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/hppa/syscall_nr.h b/linux-user/hppa/syscall_nr.h
index 55bdf71d50..9c1d0a195d 100644
--- a/linux-user/hppa/syscall_nr.h
+++ b/linux-user/hppa/syscall_nr.h
@@ -279,7 +279,7 @@
 #define TARGET_NR_ppoll 274
 #define TARGET_NR_openat275
 #define TARGET_NR_mkdirat   276
-#define TARGET_NR_mknotat   277
+#define TARGET_NR_mknodat   277
 #define TARGET_NR_fchownat  278
 #define TARGET_NR_futimesat 279
 #define TARGET_NR_fstatat64 280
-- 
2.17.1

[Qemu-devel] [PATCH v3 00/19] linux-user: Split do_syscall

2018-06-11 Thread Richard Henderson

Version 3 does not attempt the whole thing in one go,
and does attempt to incorporate strace into the reorg.

Thoughts?


r~


Richard Henderson (19):
  linux-user/alpha: Fix epoll syscalls
  linux-user/hppa: Fix typo in mknodat syscall
  linux-user/microblaze: Fix typo in accept4 syscall
  linux-user/sparc64: Add inotify_rm_watch and tee syscalls
  linux-user: Remove DEBUG
  linux-user: Split out do_syscall1
  linux-user: Relax single exit from "break"
  linux-user: Propagate goto efault to return
  linux-user: Propagate goto unimplemented_nowarn to return
  linux-user: Propagate goto unimplemented to default
  linux-user: Propagate goto fail to return
  linux-user: Setup split syscall infrastructure
  linux-user: Split out close, open, openat, read, write
  linux-user: Split out preadv, pwritev, readv, writev
  linux-user: Split out pread64, pwrite64
  linux-user: Split out name_to_handle_at, open_by_handle_at
  linux-user: Split out ipc syscalls
  linux-user: Split out memory syscalls
  linux-user: Split out some process syscalls

 linux-user/alpha/syscall_nr.h  |6 +-
 linux-user/hppa/syscall_nr.h   |2 +-
 linux-user/microblaze/syscall_nr.h |2 +-
 linux-user/sparc64/syscall_nr.h|4 +-
 linux-user/syscall.h   |  389 +++
 linux-user/strace.c|  600 ++---
 linux-user/syscall.c   | 4015 
 linux-user/syscall_file.c  |  669 +
 linux-user/syscall_ipc.c   | 1095 
 linux-user/syscall_mem.c   |  190 ++
 linux-user/syscall_proc.c  |  914 +++
 linux-user/Makefile.objs   |   17 +-
 linux-user/gen_syscall_list.py |  160 ++
 linux-user/strace.list |  264 --
 14 files changed, 4300 insertions(+), 4027 deletions(-)
 create mode 100644 linux-user/syscall.h
 create mode 100644 linux-user/syscall_file.c
 create mode 100644 linux-user/syscall_ipc.c
 create mode 100644 linux-user/syscall_mem.c
 create mode 100644 linux-user/syscall_proc.c
 create mode 100644 linux-user/gen_syscall_list.py

-- 
2.17.1

Re: [Qemu-devel] [PATCH v1 03/11] pc: factor out pc-dimm checks into pc_dimm_pre_plug()

2018-06-11 Thread David Gibson

On Mon, Jun 11, 2018 at 02:16:47PM +0200, David Hildenbrand wrote:
> We can perform these checks before the device is actually realized.
> 
> Signed-off-by: David Hildenbrand 

Reviewed-by: David Gibson 

> ---
>  hw/i386/pc.c | 44 ++--
>  1 file changed, 26 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index f3befe6721..85c040482e 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1674,6 +1674,29 @@ void ioapic_init_gsi(GSIState *gsi_state, const char 
> *parent_name)
>  }
>  }
>  
> +static void pc_dimm_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> + Error **errp)
> +{
> +const PCMachineState *pcms = PC_MACHINE(hotplug_dev);
> +const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
> +
> +/*
> + * When -no-acpi is used with Q35 machine type, no ACPI is built,
> + * but pcms->acpi_dev is still created. Check !acpi_enabled in
> + * addition to cover this case.
> + */
> +if (!pcms->acpi_dev || !acpi_enabled) {
> +error_setg(errp,
> +   "memory hotplug is not enabled: missing acpi device or 
> acpi disabled");
> +return;
> +}
> +
> +if (is_nvdimm && !pcms->acpi_nvdimm_state.is_enabled) {
> +error_setg(errp, "nvdimm is not enabled: missing 'nvdimm' in '-M'");
> +return;
> +}
> +}
> +
>  static void pc_dimm_plug(HotplugHandler *hotplug_dev,
>   DeviceState *dev, Error **errp)
>  {
> @@ -1696,23 +1719,6 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
>  align = memory_region_get_alignment(mr);
>  }
>  
> -/*
> - * When -no-acpi is used with Q35 machine type, no ACPI is built,
> - * but pcms->acpi_dev is still created. Check !acpi_enabled in
> - * addition to cover this case.
> - */
> -if (!pcms->acpi_dev || !acpi_enabled) {
> -error_setg(&local_err,
> -   "memory hotplug is not enabled: missing acpi device or 
> acpi disabled");
> -goto out;
> -}
> -
> -if (is_nvdimm && !pcms->acpi_nvdimm_state.is_enabled) {
> -error_setg(&local_err,
> -   "nvdimm is not enabled: missing 'nvdimm' in '-M'");
> -goto out;
> -}
> -
>  pc_dimm_memory_plug(dev, MACHINE(pcms), align, &local_err);
>  if (local_err) {
>  goto out;
> @@ -2006,7 +2012,9 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>  static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>DeviceState *dev, Error **errp)
>  {
> -if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
> +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> +pc_dimm_pre_plug(hotplug_dev, dev, errp);
> +} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>  pc_cpu_pre_plug(hotplug_dev, dev, errp);
>  }
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v1 02/11] nvdimm: no need to overwrite get_vmstate_memory_region()

2018-06-11 Thread David Gibson

On Mon, Jun 11, 2018 at 02:16:46PM +0200, David Hildenbrand wrote:
> Our parent class (PC_DIMM) provides exactly the same function.
> 
> Signed-off-by: David Hildenbrand 

Reviewed-by: David Gibson 

> ---
>  hw/mem/nvdimm.c | 6 --
>  1 file changed, 6 deletions(-)
> 
> diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
> index 4087aca25e..f974accbdd 100644
> --- a/hw/mem/nvdimm.c
> +++ b/hw/mem/nvdimm.c
> @@ -166,11 +166,6 @@ static void nvdimm_write_label_data(NVDIMMDevice 
> *nvdimm, const void *buf,
>  memory_region_set_dirty(mr, backend_offset, size);
>  }
>  
> -static MemoryRegion *nvdimm_get_vmstate_memory_region(PCDIMMDevice *dimm)
> -{
> -return host_memory_backend_get_memory(dimm->hostmem, &error_abort);
> -}
> -
>  static void nvdimm_class_init(ObjectClass *oc, void *data)
>  {
>  PCDIMMDeviceClass *ddc = PC_DIMM_CLASS(oc);
> @@ -178,7 +173,6 @@ static void nvdimm_class_init(ObjectClass *oc, void *data)
>  
>  ddc->realize = nvdimm_realize;
>  ddc->get_memory_region = nvdimm_get_memory_region;
> -ddc->get_vmstate_memory_region = nvdimm_get_vmstate_memory_region;
>  
>  nvc->read_label_data = nvdimm_read_label_data;
>  nvc->write_label_data = nvdimm_write_label_data;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH v1 01/11] pc-dimm: remove leftover "struct pc_dimms_capacity"

2018-06-11 Thread David Gibson

On Mon, Jun 11, 2018 at 02:16:45PM +0200, David Hildenbrand wrote:
> Not needed anymore, let's drop it.
> 
> Signed-off-by: David Hildenbrand 

Reviewed-by: David Gibson 

> ---
>  hw/mem/pc-dimm.c | 5 -
>  1 file changed, 5 deletions(-)
> 
> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> index 12da89d562..62b34a992e 100644
> --- a/hw/mem/pc-dimm.c
> +++ b/hw/mem/pc-dimm.c
> @@ -27,11 +27,6 @@
>  #include "sysemu/numa.h"
>  #include "trace.h"
>  
> -typedef struct pc_dimms_capacity {
> - uint64_t size;
> - Error**errp;
> -} pc_dimms_capacity;
> -
>  void pc_dimm_memory_plug(DeviceState *dev, MachineState *machine,
>   uint64_t align, Error **errp)
>  {

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH] xics_kvm: use KVM helpers

2018-06-11 Thread David Gibson

On Mon, Jun 11, 2018 at 06:23:10PM +0200, Cédric Le Goater wrote:
> The KVM helpers hide the low level interface used to communicate to
> the XICS KVM device and provide a good cleanup to the XICS KVM models.
> 
> Signed-off-by: Cédric Le Goater 

Applied to ppc-for-3.0, thanks.

> ---
>  hw/intc/xics_kvm.c | 52 ++--
>  1 file changed, 14 insertions(+), 38 deletions(-)
> 
> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> index 89fb20e2c55c..8bdf6afe82a0 100644
> --- a/hw/intc/xics_kvm.c
> +++ b/hw/intc/xics_kvm.c
> @@ -56,10 +56,6 @@ static QLIST_HEAD(, KVMEnabledICP)
>  static void icp_get_kvm_state(ICPState *icp)
>  {
>  uint64_t state;
> -struct kvm_one_reg reg = {
> -.id = KVM_REG_PPC_ICP_STATE,
> -.addr = (uintptr_t),
> -};
>  int ret;
>  
>  /* ICP for this CPU thread is not in use, exiting */
> @@ -67,7 +63,7 @@ static void icp_get_kvm_state(ICPState *icp)
>  return;
>  }
>  
> -ret = kvm_vcpu_ioctl(icp->cs, KVM_GET_ONE_REG, &reg);
> +ret = kvm_get_one_reg(icp->cs, KVM_REG_PPC_ICP_STATE, &state);
>  if (ret != 0) {
>  error_report("Unable to retrieve KVM interrupt controller state"
>  " for CPU %ld: %s", kvm_arch_vcpu_id(icp->cs), 
> strerror(errno));
> @@ -96,10 +92,6 @@ static void icp_synchronize_state(ICPState *icp)
>  static int icp_set_kvm_state(ICPState *icp, int version_id)
>  {
>  uint64_t state;
> -struct kvm_one_reg reg = {
> -.id = KVM_REG_PPC_ICP_STATE,
> -.addr = (uintptr_t)&state,
> -};
>  int ret;
>  
>  /* ICP for this CPU thread is not in use, exiting */
> @@ -111,7 +103,7 @@ static int icp_set_kvm_state(ICPState *icp, int 
> version_id)
>  | ((uint64_t)icp->mfrr << KVM_REG_PPC_ICP_MFRR_SHIFT)
>  | ((uint64_t)icp->pending_priority << KVM_REG_PPC_ICP_PPRI_SHIFT);
>  
> -ret = kvm_vcpu_ioctl(icp->cs, KVM_SET_ONE_REG, &reg);
> +ret = kvm_set_one_reg(icp->cs, KVM_REG_PPC_ICP_STATE, &state);
>  if (ret != 0) {
>  error_report("Unable to restore KVM interrupt controller state (0x%"
>  PRIx64 ") for CPU %ld: %s", state, kvm_arch_vcpu_id(icp->cs),
> @@ -185,21 +177,15 @@ static const TypeInfo icp_kvm_info = {
>  static void ics_get_kvm_state(ICSState *ics)
>  {
>  uint64_t state;
> -struct kvm_device_attr attr = {
> -.flags = 0,
> -.group = KVM_DEV_XICS_GRP_SOURCES,
> -.addr = (uint64_t)(uintptr_t)&state,
> -};
>  int i;
> +Error *local_err = NULL;
>  
>  for (i = 0; i < ics->nr_irqs; i++) {
>  ICSIRQState *irq = &ics->irqs[i];
> -int ret;
> -
> -attr.attr = i + ics->offset;
>  
> -ret = ioctl(kernel_xics_fd, KVM_GET_DEVICE_ATTR, &attr);
> -if (ret != 0) {
> +kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES,
> +  i + ics->offset, &state, false, &local_err);
> +if (local_err) {
>  error_report("Unable to retrieve KVM interrupt controller state"
>  " for IRQ %d: %s", i + ics->offset, strerror(errno));
>  exit(1);
> @@ -255,19 +241,13 @@ static void ics_synchronize_state(ICSState *ics)
>  static int ics_set_kvm_state(ICSState *ics, int version_id)
>  {
>  uint64_t state;
> -struct kvm_device_attr attr = {
> -.flags = 0,
> -.group = KVM_DEV_XICS_GRP_SOURCES,
> -.addr = (uint64_t)(uintptr_t)&state,
> -};
>  int i;
> +Error *local_err = NULL;
>  
>  for (i = 0; i < ics->nr_irqs; i++) {
>  ICSIRQState *irq = &ics->irqs[i];
>  int ret;
>  
> -attr.attr = i + ics->offset;
> -
>  state = irq->server;
>  state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK)
>  << KVM_XICS_PRIORITY_SHIFT;
> @@ -293,8 +273,9 @@ static int ics_set_kvm_state(ICSState *ics, int 
> version_id)
>  state |= KVM_XICS_QUEUED;
>  }
>  
> -ret = ioctl(kernel_xics_fd, KVM_SET_DEVICE_ATTR, &attr);
> -if (ret != 0) {
> +kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES,
> +  i + ics->offset, &state, true, &local_err);
> +if (local_err) {
>  error_report("Unable to restore KVM interrupt controller state"
>  " for IRQs %d: %s", i + ics->offset, strerror(errno));
>  return ret;
> @@ -391,10 +372,6 @@ static void rtas_dummy(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp)
>  {
>  int rc;
> -struct kvm_create_device xics_create_device = {
> -.type = KVM_DEV_TYPE_XICS,
> -.flags = 0,
> -};
>  
>  if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_IRQ_XICS)) 
> {
>  error_setg(errp,
> @@ -431,20 +408,19 @@ int xics_kvm_init(sPAPRMachineState *spapr, Error 
> **errp)
>  goto fail;
>  }
>  
> -/* Create the kernel ICP */
> -rc = kvm_vm_ioctl(kvm_state,

Re: [Qemu-devel] [PATCH] ppc/pnv: fix LPC HC firmware address space

2018-06-11 Thread David Gibson

On Mon, Jun 11, 2018 at 07:12:10PM +0200, Cédric Le Goater wrote:
> A specific MemoryRegion is required for the LPC HC Firmware address
> space.
> 
> Signed-off-by: Cédric Le Goater 

Applied to ppc-for-3.0, thanks.

> ---
>  include/hw/ppc/pnv_lpc.h | 1 +
>  hw/ppc/pnv_lpc.c | 4 +++-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/ppc/pnv_lpc.h b/include/hw/ppc/pnv_lpc.h
> index 023b4f0fec86..53fdd5bb6450 100644
> --- a/include/hw/ppc/pnv_lpc.h
> +++ b/include/hw/ppc/pnv_lpc.h
> @@ -38,6 +38,7 @@ typedef struct PnvLpcController {
>  /* ISA IO and Memory space */
>  MemoryRegion isa_io;
>  MemoryRegion isa_mem;
> +MemoryRegion isa_fw;
>  
>  /* Windows from OPB to ISA (aliases) */
>  MemoryRegion opb_isa_io;
> diff --git a/hw/ppc/pnv_lpc.c b/hw/ppc/pnv_lpc.c
> index c42b4a8f6c0f..a0dba657ee60 100644
> --- a/hw/ppc/pnv_lpc.c
> +++ b/hw/ppc/pnv_lpc.c
> @@ -79,6 +79,7 @@ enum {
>  
>  #define ISA_IO_SIZE 0x0001
>  #define ISA_MEM_SIZE0x1000
> +#define ISA_FW_SIZE 0x1000
>  #define LPC_IO_OPB_ADDR 0xd001
>  #define LPC_IO_OPB_SIZE 0x0001
>  #define LPC_MEM_OPB_ADDR0xe001
> @@ -437,6 +438,7 @@ static void pnv_lpc_realize(DeviceState *dev, Error 
> **errp)
>   */
>  memory_region_init(&lpc->isa_io, OBJECT(dev), "isa-io", ISA_IO_SIZE);
>  memory_region_init(&lpc->isa_mem, OBJECT(dev), "isa-mem", ISA_MEM_SIZE);
> +memory_region_init(&lpc->isa_fw, OBJECT(dev),  "isa-fw", ISA_FW_SIZE);
>  
>  /* Create windows from the OPB space to the ISA space */
>  memory_region_init_alias(&lpc->opb_isa_io, OBJECT(dev), "lpc-isa-io",
> @@ -448,7 +450,7 @@ static void pnv_lpc_realize(DeviceState *dev, Error 
> **errp)
>  memory_region_add_subregion(&lpc->opb_mr, LPC_MEM_OPB_ADDR,
>  &lpc->opb_isa_mem);
>  memory_region_init_alias(&lpc->opb_isa_fw, OBJECT(dev), "lpc-isa-fw",
> - &lpc->isa_mem, 0, LPC_FW_OPB_SIZE);
> + &lpc->isa_fw, 0, LPC_FW_OPB_SIZE);
>  memory_region_add_subregion(&lpc->opb_mr, LPC_FW_OPB_ADDR,
>  &lpc->opb_isa_fw);
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH 0/1] blockdev: implement n-ary bitmap merge

2018-06-11 Thread John Snow

I mistyped Vladimir's email address; it is corrected here.

On 06/11/2018 06:43 PM, John Snow wrote:
> requires: 20180606182449.1607-1-js...@redhat.com
> 
> See patch for details; this is somewhat of an RFC that I suspect
> will be useful for libvirt in some situations, but maybe it's
> actually overkill.
> 
> John Snow (1):
>   blockdev: n-ary bitmap merge
> 
>  blockdev.c   | 40 ++--
>  qapi/block-core.json | 10 +-
>  2 files changed, 35 insertions(+), 15 deletions(-)
>

[Qemu-devel] [PATCH 0/1] blockdev: implement n-ary bitmap merge

2018-06-11 Thread John Snow

requires: 20180606182449.1607-1-js...@redhat.com

See patch for details; this is somewhat of an RFC that I suspect
will be useful for libvirt in some situations, but maybe it's
actually overkill.

John Snow (1):
  blockdev: n-ary bitmap merge

 blockdev.c   | 40 ++--
 qapi/block-core.json | 10 +-
 2 files changed, 35 insertions(+), 15 deletions(-)

-- 
2.14.3

[Qemu-devel] [PATCH 1/1] blockdev: n-ary bitmap merge

2018-06-11 Thread John Snow

It might be nice to have an all-or-nothing merge command that either
succeeds in merging all bitmaps or fails for all of them. This way,
when assembling bitmaps that represent arbitrary points in time from
component bitmaps, we always know the state of the target bitmap even
in cases of failure.

Signed-off-by: John Snow 
---
 blockdev.c   | 40 ++--
 qapi/block-core.json | 10 +-
 2 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 041f5d594f..4d60d0a03c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3054,34 +3054,54 @@ void qmp_x_block_dirty_bitmap_disable(const char *node, 
const char *name,
 bdrv_disable_dirty_bitmap(bitmap);
 }
 
-void qmp_x_block_dirty_bitmap_merge(const char *node, const char *dst_name,
-const char *src_name, Error **errp)
+void qmp_x_block_dirty_bitmap_merge(const char *node, const char *target,
+strList *bitmaps, Error **errp)
 {
 BlockDriverState *bs;
-BdrvDirtyBitmap *dst, *src;
+BdrvDirtyBitmap *dst, *src, *anon;
+strList *lst;
+Error *local_err = NULL;
 
-dst = block_dirty_bitmap_lookup(node, dst_name, &bs, errp);
+dst = block_dirty_bitmap_lookup(node, target, &bs, errp);
 if (!dst) {
 return;
 }
 
 if (bdrv_dirty_bitmap_frozen(dst)) {
 error_setg(errp, "Bitmap '%s' is frozen and cannot be modified",
-   dst_name);
+   target);
 return;
 } else if (bdrv_dirty_bitmap_readonly(dst)) {
 error_setg(errp, "Bitmap '%s' is readonly and cannot be modified",
-   dst_name);
+   target);
 return;
 }
 
-src = bdrv_find_dirty_bitmap(bs, src_name);
-if (!src) {
-error_setg(errp, "Dirty bitmap '%s' not found", src_name);
+anon = bdrv_create_dirty_bitmap(bs, bdrv_dirty_bitmap_granularity(dst),
+NULL, errp);
+if (!anon) {
 return;
 }
 
-bdrv_merge_dirty_bitmap(dst, src, errp);
+/* Aggregate bitmaps to anonymous temp bitmap */
+for (lst = bitmaps; lst; lst = lst->next) {
+  src = bdrv_find_dirty_bitmap(bs, lst->value);
+  if (!src) {
+  error_setg(errp, "Dirty bitmap '%s' not found", lst->value);
+  goto out;
+  }
+  bdrv_merge_dirty_bitmap(anon, src, &local_err);
+  if (local_err) {
+  error_propagate(errp, local_err);
+  goto out;
+  }
+}
+
+/* Merge into dst; dst is unchanged on failure */
+bdrv_merge_dirty_bitmap(dst, anon, errp);
+
+out:
+bdrv_release_dirty_bitmap(bs, anon);
 }
 
 BlockDirtyBitmapSha256 *qmp_x_debug_block_dirty_bitmap_sha256(const char *node,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index fff23fc82b..83402da7e7 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1749,14 +1749,14 @@
 #
 # @node: name of device/node which the bitmap is tracking
 #
-# @dst_name: name of the destination dirty bitmap
+# @target: name of the destination dirty bitmap
 #
-# @src_name: name of the source dirty bitmap
+# @bitmaps: name(s) of the source dirty bitmap(s)
 #
 # Since: 3.0
 ##
 { 'struct': 'BlockDirtyBitmapMerge',
-  'data': { 'node': 'str', 'dst_name': 'str', 'src_name': 'str' } }
+  'data': { 'node': 'str', 'target': 'str', 'bitmaps': ['str'] } }
 
 ##
 # @block-dirty-bitmap-add:
@@ -1884,8 +1884,8 @@
 # Example:
 #
 # -> { "execute": "x-block-dirty-bitmap-merge",
-#  "arguments": { "node": "drive0", "dst_name": "bitmap0",
-# "src_name": "bitmap1" } }
+#  "arguments": { "node": "drive0", "target": "bitmap0",
+# "bitmaps": ["bitmap1"] } }
 # <- { "return": {} }
 #
 ##
-- 
2.14.3
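
The all-or-nothing idea in the commit message can be sketched outside of the
block layer as follows. This is a toy model under stated assumptions, not the
QEMU implementation: ToyBitmap and toy_merge_all() are hypothetical names, a
bitmap is reduced to a single 64-bit word, and the "valid" flag stands in for
any per-source failure such as a bitmap that cannot be found. In the patch
itself the final merge into the target can also report an error, which this
sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy bitmap: 64 dirty bits plus a flag that models
 * whether this source can be used at all.
 */
typedef struct {
    uint64_t bits;
    bool valid;
} ToyBitmap;

/* Merge every source into dst, or leave dst untouched.
 * All sources are first OR-ed into a temporary; dst is only
 * modified once every source has been accepted.
 */
static bool toy_merge_all(ToyBitmap *dst, const ToyBitmap *srcs, size_t n)
{
    uint64_t tmp = 0;

    assert(dst != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!srcs[i].valid) {
            return false;   /* dst has not been modified yet */
        }
        tmp |= srcs[i].bits;
    }

    dst->bits |= tmp;       /* single commit step */
    return true;
}
```

The design choice is the same as in the patch: the extra temporary costs one
more merge, but a failure halfway through the list never leaves the target in
a partially merged state.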

Re: [Qemu-devel] [PATCH v6 2/2] vl: fix use of --daemonize with --preconfig

2018-06-11 Thread Eduardo Habkost

CCing libvir-list.

On Mon, Jun 11, 2018 at 11:29:24PM +0200, Igor Mammedov wrote:
> On Mon, 11 Jun 2018 16:06:07 -0300
> Eduardo Habkost  wrote:
> 
> > On Mon, Jun 11, 2018 at 03:16:25PM +0200, Igor Mammedov wrote:
> > > On Fri, 8 Jun 2018 10:21:05 -0300
> > > Eduardo Habkost  wrote:
> > > 
> > > > On Thu, Jun 07, 2018 at 02:00:09PM +0200, Igor Mammedov wrote:
> > > > > When using --daemonize, the initial lead process will fork a child and
> > > > > then wait to be notified that setup is complete via a pipe, before it
> > > > > exits.  When using --preconfig there is an extra call to main_loop()
> > > > > before the notification is done from os_setup_post(). Thus the parent
> > > > > process won't exit until the mgmt application connects to the monitor
> > > > > and tells QEMU to leave the RUN_STATE_PRECONFIG. The mgmt application
> > > > > won't connect to the monitor until daemonizing has completed though.
> > > > > 
> > > > > This is a chicken and egg problem, leading to deadlock at startup.
> > > > > 
> > > > > The only viable way to fix this is to call os_setup_post() before
> > > > > the early main_loop() call when --preconfig is used. This has the
> > > > > downside that any errors from this point onwards won't be handled
> > > > > well by the mgmt application, because it will think QEMU has started
> > > > > successfully, so not be expecting an abrupt exit. Moving as much user
> > > > > input validation as possible to before the main_loop() call might 
> > > > > help,
> > > > > but mgmt application should stop assuming that QEMU has started
> > > > > successfully and use other means to collect errors from QEMU (logfile).
> > > > > 
> > > > > Signed-off-by: Daniel P. Berrangé 
> > > > > Signed-off-by: Igor Mammedov 
> > > > > ---
> > > > > v5:
> > > > >   * use Daniel's original patch [1], but adapt it to apply on top of
> > > > > "[PATCH v3 1/2] cli: Don't run early event loop if no  
> > > > > --preconfig was specified"
> > > > > with extra comment and massage commit message a little bit.
> > > > > v6:
> > > > >   * hide os_setup_post_done flag inside of os_setup_post() as it was 
> > > > > in v4
> > > > > 
> > > > > CC: berra...@redhat.com
> > > > > CC: mre...@redhat.com
> > > > > CC: pbonz...@redhat.com
> > > > > CC: ehabk...@redhat.com
> > > > > CC: ldok...@redhat.com
> > > > > CC: ebl...@redhat.com
> > > > > ---
> > > > >  os-posix.c | 6 ++
> > > > >  vl.c   | 6 ++
> > > > >  2 files changed, 12 insertions(+)
> > > > > 
> > > > > diff --git a/os-posix.c b/os-posix.c
> > > > > index 9ce6f74..0246195 100644
> > > > > --- a/os-posix.c
> > > > > +++ b/os-posix.c
> > > > > @@ -309,8 +309,14 @@ void os_daemonize(void)
> > > > >  
> > > > >  void os_setup_post(void)
> > > > >  {
> > > > > +static bool os_setup_post_done;
> > > > >  int fd = 0;
> > > > >  
> > > > > +if (os_setup_post_done) {
> > > > > +return;
> > > > > +}
> > > > > +os_setup_post_done = true;
> > > > > +
> > > > >  if (daemonize) {
> > > > >  if (chdir("/")) {
> > > > >  error_report("not able to chdir to /: %s", 
> > > > > strerror(errno));
> > > > > diff --git a/vl.c b/vl.c
> > > > > index fa44138..457ff2a 100644
> > > > > --- a/vl.c
> > > > > +++ b/vl.c
> > > > > @@ -4578,6 +4578,12 @@ int main(int argc, char **argv, char **envp)
> > > > >  parse_numa_opts(current_machine);
> > > > >  
> > > > >  /* do monitor/qmp handling at preconfig state if requested */
> > > > > +if (!preconfig_exit_requested && is_daemonized()) {
> > > > > +/* signal parent QEMU to exit, libvirt treats it as a sign
> > > > > + * that monitor socket is ready to accept connections
> > > > > + */
> > > > > +os_setup_post();
> > > > > +}  
> > > > 
> > > > I was looking at the daemonize logic, and noticed that we have a
> > > > huge amount of code between this line and the next
> > > > os_setup_post() call that could either:
> > > > 
> > > > * call exit() and/or error_report(); or
> > > logging would work to the extent mentioned in the commit message,
> > > i.e. it would work fine when a log file is used; otherwise
> > > errors will go to /dev/null
> > > 
> > > so it should be more or less fine on this point
> > 
> > My worry is that most users of error_report() involve an exit()
> > call too.
> > 
> > Once we have an active monitor, we must never call exit()
> > directly.  Even qmp_quit() doesn't call exit() directly.
> Is there any reason why exit() can't be called?

QMP clients don't expect the QMP socket to be closed except when
using the 'quit' command.
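
For reference, the --daemonize handshake described in the quoted commit
message boils down to a parent that blocks on a pipe until the forked child
reports that setup is complete. The following is a minimal POSIX sketch of
that pattern only, not QEMU's os_daemonize()/os_setup_post() code;
spawn_and_wait_ready() is a hypothetical helper, and the child here reports
immediately instead of doing any real setup.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child and block until it writes one status byte to a pipe.
 * Returns that byte, or -1 if the pipe/fork failed or the child
 * closed the pipe without reporting.
 */
static int spawn_and_wait_ready(int child_status)
{
    int fds[2];
    pid_t pid;

    if (pipe(fds) != 0) {
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: this write is the "setup is complete" notification. */
        unsigned char b = (unsigned char)child_status;
        ssize_t n;

        close(fds[0]);
        n = write(fds[1], &b, 1);
        _exit(n == 1 ? 0 : 1);
    }

    /* Parent: read() blocks until the child writes or exits. */
    close(fds[1]);
    unsigned char b = 0;
    ssize_t n = read(fds[0], &b, 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);

    return n == 1 ? (int)b : -1;
}
```

If the child only sent its byte after a management client had connected, while
the client was itself waiting for the parent to return, neither side could make
progress. That is the deadlock the patch avoids by notifying the parent before
the early main_loop() call.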

> 
> > > > * be unable to finish machine initialization because of
> > > >   chdir("/"), change_root(), or change_process_uid().
> > > this one is really a no-go.
> > > I see 2 options here,
> > > 
> > >  * move init code that opens files to early stage (before preconfig 
> > > monitor)
> > >or split it to open files early.
> > >(I've spotted several obvious places

Re: [Qemu-devel] [PATCH] hw/i386: Deprecate the machine types pc-0.10 and pc-0.11

2018-06-11 Thread Eduardo Habkost

On Mon, Jun 11, 2018 at 05:41:04AM +0200, Thomas Huth wrote:
> The oldest machine type which is still used in a maintained distribution
> is a pc-0.12 based machine type in RHEL6, so everything that is older
> than pc-0.12 should not be used anymore. Thus let's deprecate pc-0.10
> and pc-0.11 so that we can finally remove them in a future release.
> 
> Signed-off-by: Thomas Huth 
> ---
>  This is based on a patch that I already sent in 2017. But back then, we
>  were still in the process of discussing our deprecation policies (e.g.
>  automatic deprecation for old machine types), and there was no clear consensus
>  whether we should deprecate 0.10 - 0.11, all 0.x or even up to version 1.2.
>  After some iterations and too much discussion, I've forgotten about this
>  patch. Anyway, I think we agreed that at least 0.10 and 0.11 can certainly
>  be removed nowadays, so let's finally get at least those two machine types
>  marked as deprecated! If that works fine and we will finally have removed
>  these two types in v3.2, we can resume the discussion about newer machine
>  types afterwards.

Thanks!


>  
>  hw/i386/pc_piix.c   | 2 ++
>  include/hw/boards.h | 1 +
>  qemu-doc.texi   | 5 +
>  vl.c| 9 +++--
>  4 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index 3d81136..fa61dc3 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -955,6 +955,8 @@ static void pc_i440fx_0_11_machine_options(MachineClass 
> *m)
>  {
>  pc_i440fx_0_12_machine_options(m);
>  m->hw_version = "0.11";
> +m->deprecation_msg = "Old and unsupported machine version, "
> + "use a newer machine type instead.";

Sounds simple enough to me, but see comment about QMP below.


>  SET_MACHINE_COMPAT(m, PC_COMPAT_0_11);
>  }
>  
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index ef7457f..e1bd25f 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -166,6 +166,7 @@ struct MachineClass {
>  char *name;
>  const char *alias;
>  const char *desc;
> +const char *deprecation_msg;

Let's document guidelines for deprecation_msg to avoid the
inconsistencies we saw in previous proposals?


>  
>  void (*init)(MachineState *state);
>  void (*reset)(void);
> diff --git a/qemu-doc.texi b/qemu-doc.texi
> index 2effe66..2a597d6 100644
> --- a/qemu-doc.texi
> +++ b/qemu-doc.texi
> @@ -2965,6 +2965,11 @@ support page sizes < 4096 any longer.
>  
>  @section System emulator machines
>  
> +@subsection pc-0.10 and pc-0.11 (since 3.0)
> +
> +These machine types are very old and likely can not be used for 
> life-migration
   
"live"

I never know if the right way to spell it is "live-migration" or
"live migration".


> +from old QEMU versions anymore. A newer machine type should be used instead.
> +
>  @section Block device options
>  
>  @subsection "backing": "" (since 2.12.0)
> diff --git a/vl.c b/vl.c
> index 0603171..096814c 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2560,8 +2560,9 @@ static gint machine_class_cmp(gconstpointer a, 
> gconstpointer b)
>  if (mc->alias) {
>  printf("%-20s %s (alias of %s)\n", mc->alias, mc->desc, 
> mc->name);
>  }
> -printf("%-20s %s%s\n", mc->name, mc->desc,
> -   mc->is_default ? " (default)" : "");
> +printf("%-20s %s%s%s\n", mc->name, mc->desc,
> +   mc->is_default ? " (default)" : "",
> +   mc->deprecation_msg ? " (deprecated)" : "");
>  }
>  }
>  
> @@ -3952,6 +3953,10 @@ int main(int argc, char **argv, char **envp)
>  }
>  
>  machine_class = select_machine();
> +if (machine_class->deprecation_msg) {
> +error_report("Machine type '%s' is deprecated: %s",
> + machine_class->name, machine_class->deprecation_msg);
> +}

Do you plan to add this info to 'query-machines' QMP command?

If we do that, maybe we should represent the common "this machine
type is too old, but there's a new version" case in a more
machine-friendly way?  Maybe a 'deprecation_reason' enum would be
better than a 'deprecation_msg' field?

(Note that I don't think any discussions about the QMP interface
should block this patch from being merged.  We can deprecate the
machines first, and decide about QMP later.)
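
For illustration only, here is a rough sketch of what a
machine-friendly reason could look like.  Every name below
(DeprecationReason, deprecation_reason_str() and the wire strings)
is hypothetical and made up for this example; nothing like it
exists in QEMU or in the QMP schema today.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum: not part of QEMU, names invented for this sketch. */
typedef enum {
    DEPRECATION_REASON_NONE = 0,     /* machine type is not deprecated */
    DEPRECATION_REASON_OLD_VERSION,  /* a newer version of this machine exists */
    DEPRECATION_REASON_OTHER         /* anything else, see free-form text */
} DeprecationReason;

/*
 * Map a reason to a stable string a management application could
 * match on.  Returns NULL when the machine type is not deprecated.
 */
static const char *deprecation_reason_str(DeprecationReason reason)
{
    switch (reason) {
    case DEPRECATION_REASON_NONE:
        return NULL;
    case DEPRECATION_REASON_OLD_VERSION:
        return "old-version";
    case DEPRECATION_REASON_OTHER:
        return "other";
    }
    assert(!"unknown deprecation reason");
    return "other";
}
```

With something along these lines, a client could react to
"old-version" without parsing English text, and the free-form
message would stay available for humans.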

>  
>  set_memory_options(&ram_slots, &maxram_size, machine_class);
>  
> -- 
> 1.8.3.1
> 

-- 
Eduardo

Re: [Qemu-devel] [PATCH] m25p80: add support for two bytes WRSR for Macronix chips

2018-06-11 Thread Alistair Francis

On Mon, Jun 11, 2018 at 10:18 AM, Cédric Le Goater  wrote:
> On 06/11/2018 07:15 PM, Cédric Le Goater wrote:
>> On Macronix chips, two bytes can be written to the WRSR. The first byte will
>> configure the status register and the second the configuration
>> register. It is important to save the configuration value as it
>> contains the dummy cycle setting when using dual or quad IO mode.
>>
>> Signed-off-by: Cédric Le Goater 

Acked-by: Alistair Francis 

Alistair

>> ---
>>  hw/block/m25p80.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
>> index b49c8e9caa04..29775e055a24 100644
>> --- a/hw/block/m25p80.c
>> +++ b/hw/block/m25p80.c
>> @@ -699,6 +699,7 @@ static void complete_collecting_data(Flash *s)
>>  case MAN_MACRONIX:
>>  s->quad_enable = extract32(s->data[0], 6, 1);
>>  if (s->len > 1) {
>> +s->volatile_cfg = s->data[1];
>>  s->four_bytes_address_mode = extract32(s->data[1], 5, 1);
>>  }
>>  break;
>>
>
> I sent this patch to qemu-ppc instead of qemu-devel ...
>
> Sorry for the noise.
>
> C.
>

Re: [Qemu-devel] [PATCH 2/2] iotests: Add test 221 to catch qemu-img map regression

2018-06-11 Thread Eric Blake


On 06/11/2018 04:39 PM, Eric Blake wrote:

Although qemu-img creates aligned files (by rounding up), it
must also gracefully handle files that are not sector-aligned.
Test that the bug fixed in the previous patch does not recur.

It's a bit annoying that we can see the (implicit) hole past
the end of the file on to the next sector boundary, so if we
ever reach the point where we report a byte-accurate size rather
than our current behavior of always rounding up, this test will
probably need a slight modification.

Signed-off-by: Eric Blake 
---



+
+$QEMU_IO -c 'w 43008 1' "$TEST_IMG" # writing also rounds up


Shoot - missing a filter...


+++ b/tests/qemu-iotests/221.out
@@ -0,0 +1,16 @@
+QA output created by 221
+
+=== Check mapping of unaligned raw image ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=43009
+[{ "start": 0, "length": 43520, "depth": 0, "zero": true, "data": false, 
"offset": OFFSET}]
+[{ "start": 0, "length": 43520, "depth": 0, "zero": true, "data": false, 
"offset": OFFSET}]
+wrote 1/1 bytes at offset 43008
+1 bytes, 1 ops; 0.0001 sec (7.512 KiB/sec and 7692.3077 ops/sec)


...which leaks volatile output.  Squash this in:

diff --git i/tests/qemu-iotests/221 w/tests/qemu-iotests/221
index f2cd3c2210e..41c4e4bdf88 100755
--- i/tests/qemu-iotests/221
+++ w/tests/qemu-iotests/221
@@ -48,7 +48,7 @@ $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map

 truncate --size=43009 "$TEST_IMG" # so we resize it and check again
 $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map

-$QEMU_IO -c 'w 43008 1' "$TEST_IMG" # writing also rounds up
+$QEMU_IO -c 'w 43008 1' "$TEST_IMG" | _filter_qemu_io # writing also rounds up

 $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map

 truncate --size=43009 "$TEST_IMG" # so we resize it and check again
diff --git i/tests/qemu-iotests/221.out w/tests/qemu-iotests/221.out
index fcf61352cc3..a9c0190aadc 100644
--- i/tests/qemu-iotests/221.out
+++ w/tests/qemu-iotests/221.out
@@ -6,7 +6,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=43009
 [{ "start": 0, "length": 43520, "depth": 0, "zero": true, "data": 
false, "offset": OFFSET}]
 [{ "start": 0, "length": 43520, "depth": 0, "zero": true, "data": 
false, "offset": OFFSET}]

 wrote 1/1 bytes at offset 43008
-1 bytes, 1 ops; 0.0001 sec (7.512 KiB/sec and 7692.3077 ops/sec)
+1 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 [{ "start": 0, "length": 40960, "depth": 0, "zero": true, "data": 
false, "offset": OFFSET},
 { "start": 40960, "length": 2049, "depth": 0, "zero": false, "data": 
true, "offset": OFFSET},
 { "start": 43009, "length": 511, "depth": 0, "zero": true, "data": 
false, "offset": OFFSET}]



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH] docs: Grammar and spelling fixes

2018-06-11 Thread Eric Blake


On 06/11/2018 03:25 PM, Ville Skyttä wrote:

Signed-off-by: Ville Skyttä 
---



+++ b/docs/config/mach-virt-graphical.cfg
@@ -185,7 +185,7 @@
  # attached to it.
  #
  # We also create an optical disk, mostly for installation
-# purposes: once the guest OS has been succesfully
+# purposes: once the guest OS has been successfully


Looks like this line was copy-and-pasted a few times :)


+++ b/docs/interop/qcow2.txt
@@ -326,7 +326,7 @@ in the image file.
  It contains pointers to the second level structures which are called refcount
  blocks and are exactly one cluster in size.
  
-Given a offset into the image file, the refcount of its cluster can be obtained

+Given an offset into the image file, the refcount of its cluster can be 
obtained
  as follows:


This line is now long; it might be better if you reflow 'obtained', more 
like...


  
  refcount_block_entries = (cluster_size * 8 / refcount_bits)

@@ -365,7 +365,7 @@ The L1 table has a variable size (stored in the header) and 
may use multiple
  clusters, however it must be contiguous in the image file. L2 tables are
  exactly one cluster in size.
  
-Given a offset into the virtual disk, the offset into the image file can be

+Given an offset into the virtual disk, the offset into the image file can be
  obtained as follows:


...this line.

With or without the reflow,
Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [RFC untested PATCH] i386/cpu: make -cpu host support monitor/mwait

2018-06-11 Thread Eduardo Habkost

On Fri, Jun 08, 2018 at 11:59:19PM +0300, Michael S. Tsirkin wrote:
> When guest CPU PM is enabled, and with -cpu host, expose the host CPU
> MWAIT leaf to guest so guest can make good PM decisions.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
> 
> This builds but is untested. Is this a reasonable way to go about it?

Looks good to me.

Reviewed-by: Eduardo Habkost 

-- 
Eduardo

Re: [Qemu-devel] [RFC PATCH] i386: Remove ospke CPUID flag name

2018-06-11 Thread Richard Henderson

On 06/11/2018 10:37 AM, Eduardo Habkost wrote:
> RFC NOTE: Paolo, Richard, as far as I can see, there's no point
> in enabling OSPKE in user-mode QEMU.  Do you confirm that?
> 
> OSPKE is not a static feature flag: it changes dynamically at
> runtime depending on CR4, and it was never configurable: KVM
> never returned OSPKE on GET_SUPPORTED_CPUID, and TCG enables
> it automatically if CR4_PKE_MASK is set.
> 
> Remove OSPKE from the feature name array so users don't try to
> configure it manually.
> 
> Signed-off-by: Eduardo Habkost 
> ---
>  target/i386/cpu.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson 

r~

Re: [Qemu-devel] [PATCH] i386: Remove osxsave CPUID flag name

2018-06-11 Thread Richard Henderson

On 06/11/2018 10:38 AM, Eduardo Habkost wrote:
> OSXSAVE is not a static feature flag: it changes dynamically at
> runtime depending on CR4, and it was never configurable: KVM
> never returned OSXSAVE on GET_SUPPORTED_CPUID, and it is not
> included in TCG_EXT_FEATURES.
> 
> Remove OSXSAVE from the feature name array so users don't try to
> configure it manually.
> 
> Signed-off-by: Eduardo Habkost 
> ---

Reviewed-by: Richard Henderson 

r~

[Qemu-devel] [PATCH 1/2] qemu-img: Fix assert when mapping unaligned raw file

2018-06-11 Thread Eric Blake

Commit a290f085 exposed a latent bug in qemu-img map introduced
during the conversion of block status to be byte-based.  Earlier in
commit 5e344dd8, the internal interface get_block_status() switched
to take byte-based parameters, but still called a sector-based
block layer function; as such, rounding was added in the lone
caller to obey the contract.  However, commit 237d78f8 changed
get_block_status() to truly be byte-based, at which point rounding
to sector boundaries can result in calling bdrv_block_status() with
'bytes == 0' (a coding error) when the boundary between data and a
hole falls mid-sector (true for the past-EOF implicit hole present
in POSIX files).  Fix things by removing the rounding that is now
no longer necessary.
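
To make the failure concrete, here is a small standalone sketch (not
QEMU code: the macros are local stand-ins written for this example,
and probe_len_old()/probe_len_new() are hypothetical names) that
compares the old and new length computation.  The numbers come from
the accompanying iotest, where data ends at byte 43009 and the image
size is rounded up to 43520.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for this sketch, modelled on the QEMU macros. */
#define BDRV_SECTOR_SIZE 512
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define QEMU_ALIGN_DOWN(n, m) ((n) / (m) * (m))

/* Old computation: cap at 1 GiB, then round down to a sector boundary. */
static int64_t probe_len_old(int64_t length, int64_t offset)
{
    assert(offset < length);
    return QEMU_ALIGN_DOWN(MIN((int64_t)1 << 30, length - offset),
                           BDRV_SECTOR_SIZE);
}

/* New computation: cap at 1 GiB, no rounding. */
static int64_t probe_len_new(int64_t length, int64_t offset)
{
    assert(offset < length);
    return MIN((int64_t)1 << 30, length - offset);
}
```

With length 43520 and offset 43009 only 511 bytes remain, so the old
computation rounds down to 0 (the invalid 'bytes == 0' request),
while the new one asks for the remaining 511 bytes.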

See also https://bugzilla.redhat.com/1589738

Fixes: 237d78f8
Reported-by: Dan Kenigsberg 
Reported-by: Nir Soffer 
Reported-by: Maor Lipchuk 
CC: qemu-sta...@nongnu.org
Signed-off-by: Eric Blake 
---
 qemu-img.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-img.c b/qemu-img.c
index 1dcdd47254a..e1a506f7f67 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2906,7 +2906,7 @@ static int img_map(int argc, char **argv)
 int64_t n;

 /* Probe up to 1 GiB at a time.  */
-n = QEMU_ALIGN_DOWN(MIN(1 << 30, length - offset), BDRV_SECTOR_SIZE);
+n = MIN(1 << 30, length - offset);
 ret = get_block_status(bs, offset, n, &next);

 if (ret < 0) {
-- 
2.14.4

[Qemu-devel] [PATCH 0/2] Fix qemu-img map on unaligned image

2018-06-11 Thread Eric Blake

See https://bugzilla.redhat.com/1589738; thanks to Nir, Maor, and Dan
for figuring out that it was a qemu-img regression and coming up with
a test case; and to Kevin for then bisecting it to point to my
byte-based conversion code being at fault.

Eric Blake (2):
  qemu-img: Fix assert when mapping unaligned raw file
  iotests: Add test 221 to catch qemu-img map regression

 qemu-img.c |  2 +-
 tests/qemu-iotests/221 | 60 ++
 tests/qemu-iotests/221.out | 16 +
 tests/qemu-iotests/group   |  1 +
 4 files changed, 78 insertions(+), 1 deletion(-)
 create mode 100755 tests/qemu-iotests/221
 create mode 100644 tests/qemu-iotests/221.out

-- 
2.14.4

[Qemu-devel] [PATCH 2/2] iotests: Add test 221 to catch qemu-img map regression

2018-06-11 Thread Eric Blake

Although qemu-img creates aligned files (by rounding up), it
must also gracefully handle files that are not sector-aligned.
Test that the bug fixed in the previous patch does not recur.

It's a bit annoying that we can see the (implicit) hole past
the end of the file on to the next sector boundary, so if we
ever reach the point where we report a byte-accurate size rather
than our current behavior of always rounding up, this test will
probably need a slight modification.

Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/221 | 60 ++
 tests/qemu-iotests/221.out | 16 +
 tests/qemu-iotests/group   |  1 +
 3 files changed, 77 insertions(+)
 create mode 100755 tests/qemu-iotests/221
 create mode 100644 tests/qemu-iotests/221.out

diff --git a/tests/qemu-iotests/221 b/tests/qemu-iotests/221
new file mode 100755
index 000..f2cd3c2210e
--- /dev/null
+++ b/tests/qemu-iotests/221
@@ -0,0 +1,60 @@
+#!/bin/bash
+#
+# Test qemu-img vs. unaligned images
+#
+# Copyright (C) 2018 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+status=1 # failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt raw
+_supported_proto file
+_supported_os Linux
+
+echo
+echo "=== Check mapping of unaligned raw image ==="
+echo
+
+_make_test_img 43009 # qemu-img create rounds size up
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+truncate --size=43009 "$TEST_IMG" # so we resize it and check again
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+$QEMU_IO -c 'w 43008 1' "$TEST_IMG" # writing also rounds up
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+truncate --size=43009 "$TEST_IMG" # so we resize it and check again
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# success, all done
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/221.out b/tests/qemu-iotests/221.out
new file mode 100644
index 000..fcf61352cc3
--- /dev/null
+++ b/tests/qemu-iotests/221.out
@@ -0,0 +1,16 @@
+QA output created by 221
+
+=== Check mapping of unaligned raw image ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=43009
+[{ "start": 0, "length": 43520, "depth": 0, "zero": true, "data": false, 
"offset": OFFSET}]
+[{ "start": 0, "length": 43520, "depth": 0, "zero": true, "data": false, 
"offset": OFFSET}]
+wrote 1/1 bytes at offset 43008
+1 bytes, 1 ops; 0.0001 sec (7.512 KiB/sec and 7692.3077 ops/sec)
+[{ "start": 0, "length": 40960, "depth": 0, "zero": true, "data": false, 
"offset": OFFSET},
+{ "start": 40960, "length": 2049, "depth": 0, "zero": false, "data": true, 
"offset": OFFSET},
+{ "start": 43009, "length": 511, "depth": 0, "zero": true, "data": false, 
"offset": OFFSET}]
+[{ "start": 0, "length": 40960, "depth": 0, "zero": true, "data": false, 
"offset": OFFSET},
+{ "start": 40960, "length": 2049, "depth": 0, "zero": false, "data": true, 
"offset": OFFSET},
+{ "start": 43009, "length": 511, "depth": 0, "zero": true, "data": false, 
"offset": OFFSET}]
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 0914c922d7f..937a3d0a4d8 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -218,3 +218,4 @@
 217 rw auto quick
 218 rw auto quick
 219 rw auto
+221 rw auto quick
-- 
2.14.4

Re: [Qemu-devel] [PATCH v3 5/7] hmp: Add info commands for preconfig

2018-06-11 Thread Igor Mammedov

On Mon, 11 Jun 2018 15:40:16 -0300
Eduardo Habkost  wrote:

> On Mon, Jun 11, 2018 at 02:01:52PM +0200, Markus Armbruster wrote:
> > * Eduardo, why does "info numa" have no QMP equivalent?
> 
> Nobody ever asked for one, which seems to qualify as "only for
> human users".
> 
> Should we add an equivalent QMP command even if we don't expect
> anybody to use it?

We can indirectly fetch NUMA info via QMP, using
  query-hotpluggable-cpus
for CPU mapping and
  query-memory-devices
for (NV|PC)-DIMM devices; however, there is no QMP way of getting
the NUMA mapping of initial RAM or the configured NUMA nodes
(not counting querying CLI options).

So perhaps we need an 'info numa' equivalent for QMP which would give
the same amount of information as HMP in one query.

Maybe the libvirt side, as the actual users, knows better if it's really
needed (CCed)

Re: [Qemu-devel] [PATCH v6 2/2] vl: fix use of --daemonize with --preconfig

2018-06-11 Thread Igor Mammedov

On Mon, 11 Jun 2018 16:06:07 -0300
Eduardo Habkost  wrote:

> On Mon, Jun 11, 2018 at 03:16:25PM +0200, Igor Mammedov wrote:
> > On Fri, 8 Jun 2018 10:21:05 -0300
> > Eduardo Habkost  wrote:
> > 
> > > On Thu, Jun 07, 2018 at 02:00:09PM +0200, Igor Mammedov wrote:
> > > > When using --daemonize, the initial lead process will fork a child and
> > > > then wait to be notified that setup is complete via a pipe, before it
> > > > exits.  When using --preconfig there is an extra call to main_loop()
> > > > before the notification is done from os_setup_post(). Thus the parent
> > > > process won't exit until the mgmt application connects to the monitor
> > > > and tells QEMU to leave the RUN_STATE_PRECONFIG. The mgmt application
> > > > won't connect to the monitor until daemonizing has completed though.
> > > > 
> > > > This is a chicken and egg problem, leading to deadlock at startup.
> > > > 
> > > > The only viable way to fix this is to call os_setup_post() before
> > > > the early main_loop() call when --preconfig is used. This has the
> > > > downside that any errors from this point onwards won't be handled
> > > > well by the mgmt application, because it will think QEMU has started
> > > > successfully, so not be expecting an abrupt exit. Moving as much user
> > > > input validation as possible to before the main_loop() call might help,
> > > > but mgmt application should stop assuming that QEMU has started
> > > > successfully and use other means to collect errors from QEMU (logfile).
> > > > 
> > > > Signed-off-by: Daniel P. Berrangé 
> > > > Signed-off-by: Igor Mammedov 
> > > > ---
> > > > v5:
> > > >   * use original Daniel's patch [1], but adapt it to apply on top of
> > > > "[PATCH v3 1/2] cli: Don't run early event loop if no  --preconfig 
> > > > was specified"
> > > > with extra comment and massage commit message a little bit.
> > > > v6:
> > > >   * hide os_setup_post_done flag inside of os_setup_post() as it was in 
> > > > v4
> > > > 
> > > > CC: berra...@redhat.com
> > > > CC: mre...@redhat.com
> > > > CC: pbonz...@redhat.com
> > > > CC: ehabk...@redhat.com
> > > > CC: ldok...@redhat.com
> > > > CC: ebl...@redhat.com
> > > > ---
> > > >  os-posix.c | 6 ++
> > > >  vl.c   | 6 ++
> > > >  2 files changed, 12 insertions(+)
> > > > 
> > > > diff --git a/os-posix.c b/os-posix.c
> > > > index 9ce6f74..0246195 100644
> > > > --- a/os-posix.c
> > > > +++ b/os-posix.c
> > > > @@ -309,8 +309,14 @@ void os_daemonize(void)
> > > >  
> > > >  void os_setup_post(void)
> > > >  {
> > > > +static bool os_setup_post_done;
> > > >  int fd = 0;
> > > >  
> > > > +if (os_setup_post_done) {
> > > > +return;
> > > > +}
> > > > +os_setup_post_done = true;
> > > > +
> > > >  if (daemonize) {
> > > >  if (chdir("/")) {
> > > >  error_report("not able to chdir to /: %s", 
> > > > strerror(errno));
> > > > diff --git a/vl.c b/vl.c
> > > > index fa44138..457ff2a 100644
> > > > --- a/vl.c
> > > > +++ b/vl.c
> > > > @@ -4578,6 +4578,12 @@ int main(int argc, char **argv, char **envp)
> > > >  parse_numa_opts(current_machine);
> > > >  
> > > >  /* do monitor/qmp handling at preconfig state if requested */
> > > > +if (!preconfig_exit_requested && is_daemonized()) {
> > > > +/* signal parent QEMU to exit, libvirt treats it as a sign
> > > > + * that monitor socket is ready to accept connections
> > > > + */
> > > > +os_setup_post();
> > > > +}  
> > > 
> > > I was looking at the daemonize logic, and noticed we have a
> > > huge amount of code between this line and the next
> > > os_setup_post() call that could either:
> > > 
> > > * call exit() and/or error_report(); or
> > logging would work to the extent mentioned in the commit message,
> > i.e. it would work fine when a log file is used; otherwise
> > errors will go to /dev/null
> > 
> > so it should be more or less fine on this point
> 
> My worry is that most users of error_report() involve an exit()
> call too.
> 
> Once we have an active monitor, we must never call exit()
> directly.  Even qmp_quit() doesn't call exit() directly.
Is there any reason why exit() can't be called?

> > > * be unable to finish machine initialization because of
> > >   chdir("/"), change_root(), or change_process_uid().
> > this one is really a no-go.
> > I see 2 options here,
> > 
> >  * move init code that opens files to early stage (before preconfig monitor)
> >or split it to open files early.
> >(I've spotted several obvious places fwcfg/vnc/replay_start/migration)
> >but there might be code somewhere in callbacks that would do it too,
> >so it's rather risky to go this route.
> >(I'd do this anyways one place at the time using sanitizing
> > initialization sequence pretext.)
> 
> We might have QMP commands that take file paths as input, so is
> this really an option?
I'd think that in future we would want to enable object_add in preconfig
to create

Re: [Qemu-devel] [PATCH 0/3] i386: improve output from "-cpu help" argument

2018-06-11 Thread Eduardo Habkost

On Wed, Jun 06, 2018 at 05:55:24PM +0100, Daniel P. Berrangé wrote:
> This short series is a set of improvements to the "-cpu help" output for
> the x86 emulator. The patches describe what's changed in each step, but
> it is best visualized by looking at before/after state.
[...]

Queued on x86-next, thanks!

-- 
Eduardo

Re: [Qemu-devel] [PATCH 3/3] i386: display known CPUID features linewrapped, in alphabetical order

2018-06-11 Thread Eduardo Habkost

On Wed, Jun 06, 2018 at 05:55:27PM +0100, Daniel P. Berrangé wrote:
[...]
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index cb074082b3..8043e41be8 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -3157,17 +3157,21 @@ static void 
> x86_cpu_class_check_missing_features(X86CPUClass *xcc,
>  
>  /* Print all cpuid feature names in featureset
>   */
> -static void listflags(FILE *f, fprintf_function print, const char 
> **featureset)
> +static void listflags(FILE *f, fprintf_function print, GList *features)
>  {
> -int bit;
> -bool first = true;
> -
> -for (bit = 0; bit < 32; bit++) {
> -if (featureset[bit]) {
> -print(f, "%s%s", first ? "" : " ", featureset[bit]);
> -first = false;
> +size_t len = 0;
> +GList *tmp;
> +
> +for (tmp = features; tmp; tmp = tmp->next) {
> +const char *name = tmp->data;
> +if ((len + strlen(name) + 1) >= 75) {
> +print(f, "\n");
> +len = 0;
>  }
> +print(f, "%s%s", len == 0 ? "  " : " ", name);
> +len += strlen(name) + 1;
>  }
> +print(f, "\n");

I'd love to have generic helper functions to format text like
this, but that's not a reason to block this patch from being
included.  I will queue the series on x86-next.  Thanks!
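
As a rough idea of what such a generic helper could look like, here
is a standalone sketch.  wrap_words() and its signature are
hypothetical, it is not an existing QEMU function, and it is not the
algorithm from the patch above: it writes into a caller-supplied
buffer and treats the width as the maximum line length including the
indent.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: join words with single spaces into out,
 * starting every line with indent and beginning a new line before
 * a word would push the line past width columns.  Returns the
 * number of bytes written, not counting the trailing NUL.  Output
 * is cut off at a word boundary if the buffer is too small.
 */
static size_t wrap_words(char *out, size_t outsz,
                         const char *const *words, size_t nwords,
                         const char *indent, size_t width)
{
    size_t used = 0;    /* bytes written so far */
    size_t col = 0;     /* current column on the current line */
    size_t i;

    assert(out && words && indent);
    if (outsz == 0) {
        return 0;
    }
    out[0] = '\0';

    for (i = 0; i < nwords; i++) {
        size_t wlen = strlen(words[i]);
        const char *nl = "";
        const char *pre = " ";
        int n;

        /* Wrap if separator plus word would not fit on this line. */
        if (col != 0 && col + 1 + wlen > width) {
            nl = "\n";
            col = 0;
        }
        if (col == 0) {
            pre = indent;
        }

        n = snprintf(out + used, outsz - used, "%s%s%s", nl, pre, words[i]);
        if (n < 0 || (size_t)n >= outsz - used) {
            out[used] = '\0';   /* drop the partially written word */
            break;
        }
        used += (size_t)n;
        col += strlen(pre) + wlen;
    }
    return used;
}
```

A caller like listflags() could then collect the feature names,
call the helper once, and print the resulting buffer.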

-- 
Eduardo

Re: [Qemu-devel] [PATCH v13 2/5] i386: Introduce auto_topoext bit to manage topoext

2018-06-11 Thread Moger, Babu



> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]
> On Behalf Of Eduardo Habkost
> Sent: Monday, June 11, 2018 4:05 PM
> To: Moger, Babu 
> Cc: m...@redhat.com; marcel.apfelb...@gmail.com; pbonz...@redhat.com;
> r...@twiddle.net; mtosa...@redhat.com; qemu-devel@nongnu.org;
> k...@vger.kernel.org; k...@tripleback.net; ge...@hostfission.com
> Subject: Re: [PATCH v13 2/5] i386: Introduce auto_topoext bit to manage
> topoext
> 
> On Mon, Jun 11, 2018 at 05:46:23PM -0300, Eduardo Habkost wrote:
> [...]
> > On PC_COMPAT_2_12, both would work:
> >   { TYPE_X86_CPU, "auto-topoext", "off" }
> > or
> >   { "EPYC" "-" TYPE_X86_CPU, "auto-topoext", "off" }.
> >
> > I prefer the latter, but both would work.
> 
> Oh, while we're at it: please name the property "x-auto-topoext",
> to indicate it's only for QEMU internal use or debugging, and not
> a supported command-line option.

Ok. Sure.

> 
> --
> Eduardo

Re: [Qemu-devel] [PATCH v13 3/5] i386: Enable TOPOEXT feature on AMD EPYC CPU

2018-06-11 Thread Eduardo Habkost

On Mon, Jun 11, 2018 at 05:50:30PM -0300, Eduardo Habkost wrote:
[...]
> > +/* TOPOEXT feature requires 0x8000001E */
> > +if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_TOPOEXT) {
> > +x86_cpu_adjust_level(cpu, &env->cpuid_min_xlevel, 0x8000001E);
> > +}
> 
> I suggest moving this hunk to a separate patch.  I'm not 100%
> sure yet if this will require compat_props code to disable
> auto-xlevel-increase on older machine-types.

The problem here is that:
  $QEMU -machine pc-i440fx-1.3 -cpu Opteron_G4,+topoext
currently results in xlevel=0x8000001A, since QEMU 1.3.

(The same applies to all machine-types between 1.3 and 2.12)

I was hoping that we could declare topoext as non-migration-safe,
but I believe libvirt will already include "topoext" when using
"host-model" if the host CPU supports TOPOEXT.  Jiri, can you
confirm that?

We can address that with a "x-topoext-auto-xlevel" property, set
to true on all CPU models by default, and disabled by
PC_COMPAT_2_12.

The code would become:

    if (cpu->topoext_auto_xlevel &&
        (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_TOPOEXT)) {
        x86_cpu_adjust_level(cpu, &env->cpuid_min_xlevel, 0x8000001E);
    }

Or, we could simply declare that "-cpu Opteron_G4,+topoext" will
never increase xlevel automatically (on any machine-type), and
change the code above to:

    if (cpu->auto_topoext &&
        (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_TOPOEXT)) {
        x86_cpu_adjust_level(cpu, &env->cpuid_min_xlevel, 0x8000001E);
    }

-- 
Eduardo

Re: [Qemu-devel] [PATCH v2 05/10] qapi: Formalize qcow encryption probing

2018-06-11 Thread Eric Blake


On 06/11/2018 03:51 PM, Max Reitz wrote:

qcow only supports a single encryption (and there is no reason why that
would change in the future), so we can make it the default.

Signed-off-by: Max Reitz 
---
  qapi/block-core.json | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)


Reviewed-by: Eric Blake 



diff --git a/qapi/block-core.json b/qapi/block-core.json
index 295ace42ae..98295ac30e 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2647,8 +2647,9 @@
  # Since: 2.10
  ##
  { 'union': 'BlockdevQcowEncryption',
-  'base': { 'format': 'BlockdevQcowEncryptionFormat' },
+  'base': { '*format': 'BlockdevQcowEncryptionFormat' },
'discriminator': 'format',
+  'default-variant': 'aes',
'data': { 'aes': 'QCryptoBlockOptionsQCow' } }
  
  ##




--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v2 04/10] qapi: Formalize qcow2 encryption probing

2018-06-11 Thread Eric Blake


On 06/11/2018 03:51 PM, Max Reitz wrote:

Currently, you can give no encryption format for a qcow2 file while
still passing a key-secret.  That does not conform to the schema, so
this patch changes the schema to allow it.

Signed-off-by: Max Reitz 
---
  qapi/block-core.json | 44 
  block/qcow2.c|  3 +++
  2 files changed, 43 insertions(+), 4 deletions(-)



+##
+# @ImageInfoSpecificQCow2EncryptionNoInfo:
+#
+# Only used for the qcow2 encryption format "auto" in which the actual
+# encryption format is determined from the image header.  Therefore,
+# this encryption format will never be reported in
+# ImageInfoSpecificQCow2Encryption.
+#
+# Since: 3.0
+##
+{ 'struct': 'ImageInfoSpecificQCow2EncryptionNoInfo',
+  'data': { } }


Do we actually need this type, given Anton's work on making omitted 
branches automatically use an empty struct?


https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg06836.html


+
  ##
  # @ImageInfoSpecificQCow2Encryption:
  #
@@ -53,7 +66,8 @@
'base': 'ImageInfoSpecificQCow2EncryptionBase',
'discriminator': 'format',
'data': { 'aes': 'QCryptoBlockInfoQCow',
-'luks': 'QCryptoBlockInfoLUKS' } }
+'luks': 'QCryptoBlockInfoLUKS',
+'auto': 'ImageInfoSpecificQCow2EncryptionNoInfo' } }


If Anton's patches go in first, you don't even have to change this type;

  
  ##

  # @ImageInfoSpecificQCow2:
@@ -2658,10 +2672,30 @@
  # @BlockdevQcow2EncryptionFormat:
  # @aes: AES-CBC with plain64 initialization venctors
  #
+# @auto:Determine the encryption format from the image
+#   header.  This only allows the use of the
+#   key-secret option.  (Since: 3.0)
+#
  # Since: 2.10
  ##
  { 'enum': 'BlockdevQcow2EncryptionFormat',
-  'data': [ 'aes', 'luks' ] }
+  'data': [ 'aes', 'luks', 'auto' ] }


the changed enum would be sufficient.


+
+##
+# @BlockdevQcow2EncryptionSecret:
+#
+# Allows specifying a key-secret without specifying the exact
+# encryption format, which is determined automatically from the image
+# header.
+#
+# @key-secret:  The ID of a QCryptoSecret object providing the
+#   decryption key.  Mandatory except when probing
+#   image for metadata only.
+#
+# Since: 3.0
+##
+{ 'struct': 'BlockdevQcow2EncryptionSecret',
+  'data': { '*key-secret': 'str' } }
  
  ##

  # @BlockdevQcow2Encryption:
@@ -2669,10 +2703,12 @@
  # Since: 2.10
  ##
  { 'union': 'BlockdevQcow2Encryption',
-  'base': { 'format': 'BlockdevQcow2EncryptionFormat' },
+  'base': { '*format': 'BlockdevQcow2EncryptionFormat' },
'discriminator': 'format',
+  'default-variant': 'auto',
'data': { 'aes': 'QCryptoBlockOptionsQCow',
-'luks': 'QCryptoBlockOptionsLUKS'} }
+'luks': 'QCryptoBlockOptionsLUKS',
+'auto': 'BlockdevQcow2EncryptionSecret' } }


This part is necessary, though, and looks correct.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v13 2/5] i386: Introduce auto_topoext bit to manage topoext

2018-06-11 Thread Eduardo Habkost

On Mon, Jun 11, 2018 at 05:46:23PM -0300, Eduardo Habkost wrote:
[...]
> On PC_COMPAT_2_12, both would work:
>   { TYPE_X86_CPU, "auto-topoext", "off" }
> or
>   { "EPYC" "-" TYPE_X86_CPU, "auto-topoext", "off" }.
> 
> I prefer the latter, but both would work.

Oh, while we're at it: please name the property "x-auto-topoext",
to indicate it's only for QEMU internal use or debugging, and not
a supported command-line option.

-- 
Eduardo

Re: [Qemu-devel] [PATCH v2 04/10] qapi: Formalize qcow2 encryption probing

2018-06-11 Thread Max Reitz

On 2018-06-11 23:02, Eric Blake wrote:
> On 06/11/2018 03:51 PM, Max Reitz wrote:
>> Currently, you can give no encryption format for a qcow2 file while
>> still passing a key-secret.  That does not conform to the schema, so
>> this patch changes the schema to allow it.
>>
>> Signed-off-by: Max Reitz 
>> ---
>>   qapi/block-core.json | 44 
>>   block/qcow2.c    |  3 +++
>>   2 files changed, 43 insertions(+), 4 deletions(-)
> 
>> +##
>> +# @ImageInfoSpecificQCow2EncryptionNoInfo:
>> +#
>> +# Only used for the qcow2 encryption format "auto" in which the actual
>> +# encryption format is determined from the image header.  Therefore,
>> +# this encryption format will never be reported in
>> +# ImageInfoSpecificQCow2Encryption.
>> +#
>> +# Since: 3.0
>> +##
>> +{ 'struct': 'ImageInfoSpecificQCow2EncryptionNoInfo',
>> +  'data': { } }
> 
> Do we actually need this type, given Anton's work on making omitted
> branches automatically use an empty struct?
> 
> https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg06836.html

Looks like no, we don't.  Great! :-)

I think I'll still keep part of the comment and move it down into the
description of ImageInfoSpecificQCow2Encryption so that anyone who's
wondering knows that this value won't appear.

Thanks for reviewing and pointing me at it,

Max




Re: [Qemu-devel] [PATCH v13 4/5] i386: Verify and enable topoext feature if supported

2018-06-11 Thread Moger, Babu




> -----Original Message-----
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]
> On Behalf Of Eduardo Habkost
> Sent: Monday, June 11, 2018 3:52 PM
> To: Moger, Babu 
> Cc: m...@redhat.com; marcel.apfelb...@gmail.com; pbonz...@redhat.com;
> r...@twiddle.net; mtosa...@redhat.com; qemu-devel@nongnu.org;
> k...@vger.kernel.org; k...@tripleback.net; ge...@hostfission.com
> Subject: Re: [PATCH v13 4/5] i386: Verify and enable topoext feature if
> supported
> 
> On Fri, Jun 08, 2018 at 06:56:20PM -0400, Babu Moger wrote:
> > If the CPU model supports topoext feature, enabled the
> > feature automatically if it can be supported.
> >
> > Signed-off-by: Babu Moger 
> > ---
> >  target/i386/cpu.c | 40 
> >  1 file changed, 40 insertions(+)
> >
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index 4dd9a82..88bc73d 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -4763,6 +4763,33 @@ static int x86_cpu_filter_features(X86CPU *cpu)
> >  #define IS_AMD_CPU(env) ((env)->cpuid_vendor1 ==
> CPUID_VENDOR_AMD_1 && \
> >   (env)->cpuid_vendor2 == CPUID_VENDOR_AMD_2 && \
> >   (env)->cpuid_vendor3 == CPUID_VENDOR_AMD_3)
> > +/*
> > + * Check if we can support this topology
> > + * Fail if number of cores are beyond the supported config
> > + * or nr_threads is more than 2
> > + */
> > +static int topology_supports_topoext(int nr_cores, int nr_threads,
> > +Error **errp)
> > +{
> > +if (nr_cores > (MAX_CORES_IN_NODE * MAX_NODES_PER_SOCKET)) {
> > +error_setg(errp, "TOPOEXT unsupported with %d cores per socket",
> > +   nr_cores);
> > +error_append_hint(errp, "TOPOEXT supports only up to %d cores
> per"
> > +  " socket\n",
> > +  (MAX_CORES_IN_NODE * MAX_NODES_PER_SOCKET));
> > +return false;
> > +}
> > +
> > +if (nr_threads > 2) {
> > +error_setg(errp, "TOPOEXT unsupported with %d threads per core",
> > +   nr_threads);
> > +error_append_hint(errp, "TOPOEXT supports only up to 2 threads"
> > +  " per core\n");
> > +return false;
> > +}
> > +return true;
> > +}
> > +
> >  static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> >  {
> >  CPUState *cs = CPU(dev);
> > @@ -4953,6 +4980,19 @@ static void x86_cpu_realizefn(DeviceState *dev,
> Error **errp)
> >
> >  qemu_init_vcpu(cs);
> >
> > +if (cpu->auto_topoext &&
> > +!(env->user_features[FEAT_8000_0001_ECX] &
> CPUID_EXT3_TOPOEXT)) {
> > +if (cs->nr_cores <= (MAX_CORES_IN_NODE *
> MAX_NODES_PER_SOCKET) &&
> > +(cs->nr_threads <= 2)) {
> 
> This duplicates the logic from topology_supports_topoext().  Why
> not just call
>   topology_supports_topoext(cs->nr_cores, cs->nr_threads, NULL)
> here?

Ok. Will do it.
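
A rough sketch of the suggested refactoring, written as standalone C so it can be compiled outside QEMU. Assumptions: the two limits are set to 4 purely for illustration, QEMU's Error ** is replaced by a plain message buffer, and should_auto_enable_topoext() is a hypothetical helper standing in for the block in x86_cpu_realizefn(). The sketch also returns bool, which matches the true/false values the patch already returns.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed limits for illustration; the real values live in target/i386/cpu.c. */
#define MAX_CORES_IN_NODE 4
#define MAX_NODES_PER_SOCKET 4

/*
 * Stand-in for topology_supports_topoext().  A NULL msg plays the role of
 * a NULL errp: the caller only wants a yes/no answer.
 */
static bool topology_supports_topoext(int nr_cores, int nr_threads,
                                      char *msg, size_t msg_len)
{
    if (nr_cores > MAX_CORES_IN_NODE * MAX_NODES_PER_SOCKET) {
        if (msg) {
            snprintf(msg, msg_len,
                     "TOPOEXT unsupported with %d cores per socket", nr_cores);
        }
        return false;
    }
    if (nr_threads > 2) {
        if (msg) {
            snprintf(msg, msg_len,
                     "TOPOEXT unsupported with %d threads per core", nr_threads);
        }
        return false;
    }
    return true;
}

/* Hypothetical helper: the auto-enable decision reuses the single check. */
static bool should_auto_enable_topoext(bool auto_topoext, bool user_set_topoext,
                                       int nr_cores, int nr_threads)
{
    return auto_topoext && !user_set_topoext &&
           topology_supports_topoext(nr_cores, nr_threads, NULL, 0);
}
```

Keeping the limits in one function means the silent auto-enable path and the error-reporting path cannot drift apart.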

> 
> > +env->features[FEAT_8000_0001_ECX] |= CPUID_EXT3_TOPOEXT;
> > +}
> > +}
> > +
> > +if ((env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_TOPOEXT) &&
> > +!topology_supports_topoext(cs->nr_cores, cs->nr_threads, errp)) {
> > +return;
> > +}
> > +
> >  /* Only Intel CPUs support hyperthreading. Even though QEMU fixes this
> >   * issue by adjusting CPUID_0000_0001_EBX and CPUID_8000_0008_ECX
> >   * based on inputs (sockets,cores,threads), it is still better to gives
> > --
> > 1.8.3.1
> >
> 
> --
> Eduardo

Re: [Qemu-devel] [PATCH v13 1/5] i386: Add support for CPUID_8000_001E for AMD

2018-06-11 Thread Moger, Babu



> -----Original Message-----
> From: Eduardo Habkost [mailto:ehabk...@redhat.com]
> Sent: Monday, June 11, 2018 3:55 PM
> To: Moger, Babu 
> Cc: m...@redhat.com; marcel.apfelb...@gmail.com; pbonz...@redhat.com;
> r...@twiddle.net; mtosa...@redhat.com; qemu-devel@nongnu.org;
> k...@vger.kernel.org; k...@tripleback.net; ge...@hostfission.com
> Subject: Re: [PATCH v13 1/5] i386: Add support for CPUID_8000_001E for
> AMD
> 
> On Fri, Jun 08, 2018 at 06:56:17PM -0400, Babu Moger wrote:
> > Add support for cpuid leaf CPUID_8000_001E. Build the config that closely
> > match the underlying hardware. Please refer to the Processor
> Programming
> > Reference (PPR) for AMD Family 17h Model for more details.
> >
> > Signed-off-by: Babu Moger 
> 
> Queued on x86-next, thanks.
> 

Thanks

> --
> Eduardo

Re: [Qemu-devel] [PATCH v13 2/5] i386: Introduce auto_topoext bit to manage topoext

2018-06-11 Thread Moger, Babu




> -----Original Message-----
> From: Eduardo Habkost [mailto:ehabk...@redhat.com]
> Sent: Monday, June 11, 2018 3:46 PM
> To: Moger, Babu 
> Cc: m...@redhat.com; marcel.apfelb...@gmail.com; pbonz...@redhat.com;
> r...@twiddle.net; mtosa...@redhat.com; qemu-devel@nongnu.org;
> k...@vger.kernel.org; k...@tripleback.net; ge...@hostfission.com
> Subject: Re: [PATCH v13 2/5] i386: Introduce auto_topoext bit to manage
> topoext
> 
> On Mon, Jun 11, 2018 at 08:25:32PM +0000, Moger, Babu wrote:
> > Hi Eduardo,
> > Planning  to make couple of changes after testing and review.  Let me know
> if I missed something.
> > Will wait for your feedback before the next revision.
> >
> > > -----Original Message-----
> > > From: kvm-ow...@vger.kernel.org [mailto:kvm-
> ow...@vger.kernel.org]
> > > On Behalf Of Babu Moger
> > > Sent: Friday, June 8, 2018 5:56 PM
> > > To: m...@redhat.com; marcel.apfelb...@gmail.com;
> pbonz...@redhat.com;
> > > r...@twiddle.net; ehabk...@redhat.com; mtosa...@redhat.com
> > > Cc: qemu-devel@nongnu.org; k...@vger.kernel.org; Moger, Babu
> > > ; k...@tripleback.net; ge...@hostfission.com
> > > Subject: [PATCH v13 2/5] i386: Introduce auto_topoext bit to manage
> > > topoext
> > >
> > > Introduce the auto_topoext bit to to control topoext feature.
> > >
> > > Also add new field auto_topoext(in X86CPUDefinition). This will
> > > be used to enable topoext on newer CPU models where topoext can
> > > be supported.
> > >
> > > Signed-off-by: Babu Moger 
> > > ---
> > >  include/hw/i386/pc.h |  4 
> > >  target/i386/cpu.c| 12 
> > >  target/i386/cpu.h|  5 +
> > >  3 files changed, 21 insertions(+)
> > >
> > > diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> > > index 04d1f8c..cc30ec3 100644
> > > --- a/include/hw/i386/pc.h
> > > +++ b/include/hw/i386/pc.h
> > > @@ -303,6 +303,10 @@ bool e820_get_entry(int, uint32_t, uint64_t *,
> > > uint64_t *);
> > >  .driver   = TYPE_X86_CPU,\
> > >  .property = "legacy-cache",\
> > >  .value= "on",\
> > > +},{\
> > > +.driver   = TYPE_X86_CPU,\
> > > +.property = "auto-topoext",\
> > > +.value= "off",\
> > >  },
> >
> > We don't need this.  We are already setting this false in
> x86_cpu_properties.
> > But we need to set it "on" for EPYC on PC_COMPAT_3_0 whenever we
> define it.
> > That will be a separate patch. Correct?
> 
> PC_COMPAT_3_0 will exist only on QEMU 3.1.  If you want to define
> something as the default behavior in pc-3.0, just define it as
> the default in the device code, which is exactly what you are
> already doing in x86_cpu_load_def() below.
> 
> In other words, just set auto_topoext=true on the CPU model table
> for EPYC, and it should be enough.
> 
> On PC_COMPAT_2_12, both would work:
>   { TYPE_X86_CPU, "auto-topoext", "off" }
> or
>   { "EPYC" "-" TYPE_X86_CPU, "auto-topoext", "off" }.
> 
> I prefer the latter, but both would work.

Ok. Sure
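
For reference, the second form relies on C joining adjacent string literals at compile time. A minimal standalone sketch, assuming TYPE_X86_CPU expands to "x86_64-cpu" (as on an x86_64 build) and using a simplified, hypothetical compat table with exact-match lookup; QEMU's real compat handling also matches subclasses of the named driver, which this sketch does not model.

```c
#include <stddef.h>
#include <string.h>

/* Assumed value for an x86_64 build; QEMU defines the real macro. */
#define TYPE_X86_CPU "x86_64-cpu"

/* Simplified stand-in for a compat property entry. */
struct compat_prop {
    const char *driver;
    const char *property;
    const char *value;
};

static const struct compat_prop compat_2_12[] = {
    /* Adjacent string literals are joined into "EPYC-x86_64-cpu". */
    { "EPYC" "-" TYPE_X86_CPU, "auto-topoext", "off" },
};

/* Return the override for an exact driver/property match, or NULL if none. */
static const char *compat_lookup(const char *driver, const char *property)
{
    for (size_t i = 0; i < sizeof(compat_2_12) / sizeof(compat_2_12[0]); i++) {
        if (strcmp(compat_2_12[i].driver, driver) == 0 &&
            strcmp(compat_2_12[i].property, property) == 0) {
            return compat_2_12[i].value;
        }
    }
    return NULL;
}
```

Scoping the entry to the EPYC type keeps the override away from CPU models that never set auto_topoext in the first place.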

> 
> 
> >
> > >
> > >  #define PC_COMPAT_2_11 \
> > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > > index 86fb1a4..d3411ed 100644
> > > --- a/target/i386/cpu.c
> > > +++ b/target/i386/cpu.c
> > > @@ -1283,6 +1283,8 @@ struct X86CPUDefinition {
> > >  FeatureWordArray features;
> > >  const char *model_id;
> > >  CPUCaches *cache_info;
> > > +/* Set it if topoext can be enabled in CPU models */
> > > +int auto_topoext;
> > >  };
> > >
> > >  static CPUCaches epyc_cache_info = {
> > > @@ -3517,6 +3519,9 @@ static void x86_cpu_load_def(X86CPU *cpu,
> > > X86CPUDefinition *def, Error **errp)
> > >  /* legacy-cache defaults to 'off' if CPU model provides cache info */
> > >  cpu->legacy_cache = !def->cache_info;
> > >
> > > +/* Set auto_topoext if both machine property and CPU model
> supports it
> > > */
> > > +cpu->auto_topoext =  cpu->auto_topoext & def->auto_topoext;
> >
> > This could be more appropriate like this.
> >
> >  cpu->auto_topoext =  cpu->auto_topoext && def->auto_topoext;
> 
> cpu->auto_topoext is supposed to be already false, here.  Why not
> just:
> cpu->auto_topoext = def->auto_topoext;
> ?

Ok. Will do it.
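
As a side note on '&' versus '&&': the two differ whenever a flag is stored as an int holding something other than 0 or 1, which is possible here because X86CPUDefinition declares auto_topoext as int. A small standalone sketch with hypothetical function names:

```c
#include <stdbool.h>

/* Bitwise AND on int flags: compares bit patterns, so 2 & 1 is 0. */
static int combine_bitwise(int machine_flag, int model_flag)
{
    return machine_flag & model_flag;
}

/* Logical AND: any nonzero value counts as true, so 2 && 1 is 1. */
static int combine_logical(int machine_flag, int model_flag)
{
    return machine_flag && model_flag;
}

/* With bool parameters the values are normalized to 0/1 first. */
static bool combine_bool(bool machine_flag, bool model_flag)
{
    return machine_flag & model_flag;
}
```

The plain assignment suggested above avoids the question entirely, since the property default is already false at that point.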

> 
> >
> > > +
> > >  /* Special cases not set in the X86CPUDefinition structs: */
> > >  /* TODO: in-kernel irqchip for hvf */
> > >  if (kvm_enabled()) {
> > > @@ -5382,6 +5387,13 @@ static Property x86_cpu_properties[] = {
> > >  DEFINE_PROP_BOOL("legacy-cache", X86CPU, legacy_cache, true),
> > >
> > >  /*
> > > + * auto-topoext property will be used to enable topoext feature.
> > > + * This will be disabled on all the older CPU models. Will be
> > > + * enabled on newer CPU modeles which can support topology
> extention.
> > > + */
> > > + DEFINE_PROP_BOOL("auto-topoext", X86CPU, auto_topoext, false),
> > > +
> > > +/*
> > >   * From "Requirements for Implementing the Microsoft
> > >   * Hypervisor Interface":
> > >   *
