[Qemu-devel] [PATCH v4 01/14] compiler.h: add QEMU_ALIGNED() to enforce struct alignment

2016-04-29 Thread Emilio G. Cota
Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 include/qemu/compiler.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index 8f1cc7b..b64f899 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -41,6 +41,8 @@
 # define QEMU_PACKED __attribute__((packed))
 #endif
 
+#define QEMU_ALIGNED(X) __attribute__((aligned(X)))
+
 #ifndef glue
 #define xglue(x, y) x ## y
 #define glue(x, y) xglue(x, y)
-- 
2.5.0
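
A minimal sketch of what the new macro does, assuming GCC or Clang. The struct name, its contents and the 64-byte figure are hypothetical and only illustrate the effect; they are not taken from this series.

```c
#include <stddef.h>
#include <stdint.h>

/* Same definition as the macro added by this patch. */
#define QEMU_ALIGNED(X) __attribute__((aligned(X)))

/* Hypothetical struct: 12 bytes of payload, aligned (and padded) to 64. */
struct example_bucket {
    uint32_t hashes[3];
} QEMU_ALIGNED(64);

/* Adjacent array elements therefore start on distinct 64-byte boundaries. */
static struct example_bucket example_buckets[2];

/* Distance in bytes between two consecutive array elements. */
static size_t example_bucket_stride(void)
{
    return (size_t)((char *)&example_buckets[1] -
                    (char *)&example_buckets[0]);
}
```

With the attribute applied to the type, both sizeof and the array stride become 64, so two elements never share a 64-byte line.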




[Qemu-devel] [PATCH v4 02/14] seqlock: remove optional mutex

2016-04-29 Thread Emilio G. Cota
This option is unused; besides, it bloats the struct when not needed.
Let's just let writers define their own locks elsewhere.

Reviewed-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 cpus.c |  2 +-
 include/qemu/seqlock.h | 10 +-
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/cpus.c b/cpus.c
index cbeb1f6..dd86da5 100644
--- a/cpus.c
+++ b/cpus.c
@@ -619,7 +619,7 @@ int cpu_throttle_get_percentage(void)
 
 void cpu_ticks_init(void)
 {
-seqlock_init(&timers_state.vm_clock_seqlock, NULL);
+seqlock_init(&timers_state.vm_clock_seqlock);
 vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
 throttle_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
cpu_throttle_timer_tick, NULL);
diff --git a/include/qemu/seqlock.h b/include/qemu/seqlock.h
index 70b01fd..e673482 100644
--- a/include/qemu/seqlock.h
+++ b/include/qemu/seqlock.h
@@ -19,22 +19,17 @@
 typedef struct QemuSeqLock QemuSeqLock;
 
 struct QemuSeqLock {
-QemuMutex *mutex;
 unsigned sequence;
 };
 
-static inline void seqlock_init(QemuSeqLock *sl, QemuMutex *mutex)
+static inline void seqlock_init(QemuSeqLock *sl)
 {
-sl->mutex = mutex;
 sl->sequence = 0;
 }
 
 /* Lock out other writers and update the count.  */
 static inline void seqlock_write_lock(QemuSeqLock *sl)
 {
-if (sl->mutex) {
-qemu_mutex_lock(sl->mutex);
-}
 ++sl->sequence;
 
 /* Write sequence before updating other fields.  */
@@ -47,9 +42,6 @@ static inline void seqlock_write_unlock(QemuSeqLock *sl)
 smp_wmb();
 
 ++sl->sequence;
-if (sl->mutex) {
-qemu_mutex_unlock(sl->mutex);
-}
 }
 
 static inline unsigned seqlock_read_begin(QemuSeqLock *sl)
-- 
2.5.0
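
A rough sketch of "writers define their own locks elsewhere", written with C11 atomics and pthreads rather than QEMU's primitives. All names (SketchSeqLock, SketchClock and the helpers) are hypothetical; the point is only that the mutex sits next to the protected data while the seqlock keeps nothing but its counter.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Sequence counter only, like QemuSeqLock after this patch. */
typedef struct {
    atomic_uint sequence;
} SketchSeqLock;

/* Hypothetical protected state: the writer lock is owned by the user. */
typedef struct {
    pthread_mutex_t writer_lock;  /* serializes writers */
    SketchSeqLock seqlock;        /* lets readers detect concurrent writes */
    _Atomic int64_t ticks;
} SketchClock;

static void sketch_clock_init(SketchClock *c)
{
    pthread_mutex_init(&c->writer_lock, NULL);
    atomic_init(&c->seqlock.sequence, 0);
    atomic_init(&c->ticks, 0);
}

static void sketch_clock_set(SketchClock *c, int64_t value)
{
    pthread_mutex_lock(&c->writer_lock);
    /* Odd sequence: a write is in progress. */
    atomic_fetch_add_explicit(&c->seqlock.sequence, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&c->ticks, value, memory_order_relaxed);
    /* Even sequence again: the write is complete. */
    atomic_fetch_add_explicit(&c->seqlock.sequence, 1, memory_order_release);
    pthread_mutex_unlock(&c->writer_lock);
}

static int64_t sketch_clock_get(SketchClock *c)
{
    unsigned start;
    int64_t value;

    /* Readers take no lock; they retry if a writer interfered. */
    do {
        start = atomic_load_explicit(&c->seqlock.sequence,
                                     memory_order_acquire);
        value = atomic_load_explicit(&c->ticks, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
    } while ((start & 1) ||
             start != atomic_load_explicit(&c->seqlock.sequence,
                                           memory_order_relaxed));
    return value;
}
```

Users that already serialize writers by other means can simply leave the mutex out, which is the struct-size saving the commit message refers to.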




[Qemu-devel] [PATCH v4 13/14] tb hash: track translated blocks with qht

2016-04-29 Thread Emilio G. Cota
Having a fixed-size hash table for keeping track of all translation blocks
is suboptimal: some workloads are just too big or too small to get maximum
performance from the hash table. The MRU promotion policy helps improve
performance when the hash table is a little undersized, but it cannot
make up for severely undersized hash tables.

Furthermore, frequent MRU promotions result in writes that are a scalability
bottleneck. For scalability, lookups should only perform reads, not writes.
This is not a big deal for now, but it will become one once MTTCG matures.

The appended patch fixes these issues by using qht as the implementation of
the TB hash table. This solution is superior to other alternatives considered,
namely:

- master: implementation in QEMU before this patchset
- xxhash: before this patch, i.e. fixed buckets + xxhash hashing + MRU.
- xxhash-rcu: fixed buckets + xxhash + RCU list + MRU.
  MRU is implemented here by adding an intermediate struct
  that contains the u32 hash and a pointer to the TB; this
  allows us, on an MRU promotion, to copy said struct (that is not
  at the head), and put this new copy at the head. After a grace
  period, the original non-head struct can be eliminated, and
  after another grace period, freed.
- qht-fixed-nomru: fixed buckets + xxhash + qht without auto-resize +
   no MRU for lookups; MRU for inserts.
The appended solution is the following:
- qht-dyn-nomru: dynamic number of buckets + xxhash + qht w/ auto-resize +
 no MRU for lookups; MRU for inserts.

The plots below compare the considered solutions. The Y axis shows the
boot time (in seconds) of a debian jessie image with arm-softmmu; the X axis
sweeps the number of buckets (or initial number of buckets for qht-autoresize).
The plots in PNG format (and with errorbars) can be seen here:
  http://imgur.com/a/Awgnq

Each test runs 5 times, and the entire QEMU process is pinned to a
single core for repeatability of results.

Host: Intel Xeon E5-2690

  [Plot: boot time in seconds (Y axis) vs. log2 number of buckets (X axis,
  14 to 24). Series: master (A), xxhash (B), xxhash-rcu (C),
  qht-fixed-nomru (D), qht-dyn-mru (E), qht-dyn-nomru (F).]

 Host: Intel i7-4790K

  [Plot: same axes and series as for the Xeon host.]

[Qemu-devel] [PATCH v4 11/14] qht: QEMU's fast, resizable and scalable Hash Table

2016-04-29 Thread Emilio G. Cota
This is a hash table with optional auto-resizing and MRU promotion for
reads and writes. Its implementation goal is to stay fast while
scaling for read-mostly workloads.

A hash table with these features will be necessary for the scalability
of the ongoing MTTCG work; before those changes arrive we can already
benefit from the single-threaded speedup that qht also provides.
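
To make the bucket idea concrete, here is a single-threaded sketch of a bucket that stores full 32-bit hashes next to the pointers, as the header comment of util/qht.c in this patch describes. The names, the entry count of 4 and the 64-byte alignment are assumptions for illustration; the per-bucket seqlock and spinlock are left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BUCKET_ENTRIES 4  /* assumed; fits a 64-byte line on 64-bit */

struct sketch_bucket {
    uint32_t hashes[SKETCH_BUCKET_ENTRIES];
    void *pointers[SKETCH_BUCKET_ENTRIES];
    struct sketch_bucket *next;  /* chain of non-head buckets */
} __attribute__((aligned(64)));

typedef bool (*sketch_lookup_func_t)(const void *obj, const void *userp);

/*
 * Compare the stored hash first and call the (possibly expensive)
 * comparison function only on a hash match. NULL pointers are skipped,
 * since NULL cannot be inserted as a value.
 */
static void *sketch_bucket_lookup(const struct sketch_bucket *head,
                                  sketch_lookup_func_t func,
                                  const void *userp, uint32_t hash)
{
    const struct sketch_bucket *b;

    for (b = head; b != NULL; b = b->next) {
        for (int i = 0; i < SKETCH_BUCKET_ENTRIES; i++) {
            void *p = b->pointers[i];

            if (p != NULL && b->hashes[i] == hash && func(p, userp)) {
                return p;
            }
        }
    }
    return NULL;
}

/* Example comparison callback for int-valued objects. */
static bool sketch_int_equal(const void *obj, const void *userp)
{
    return *(const int *)obj == *(const int *)userp;
}
```

A failed lookup in a short chain touches only the head bucket, which is the "fail fast" property listed among the advantages.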

Signed-off-by: Emilio G. Cota 
---
 include/qemu/qht.h |  67 +
 util/Makefile.objs |   2 +-
 util/qht.c | 722 +
 3 files changed, 790 insertions(+), 1 deletion(-)
 create mode 100644 include/qemu/qht.h
 create mode 100644 util/qht.c

diff --git a/include/qemu/qht.h b/include/qemu/qht.h
new file mode 100644
index 000..8beea75
--- /dev/null
+++ b/include/qemu/qht.h
@@ -0,0 +1,67 @@
+/*
+ * Copyright (C) 2016, Emilio G. Cota 
+ *
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ */
+#ifndef QEMU_QHT_H
+#define QEMU_QHT_H
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/seqlock.h"
+#include "qemu/qdist.h"
+#include "qemu/rcu.h"
+
+struct qht {
+struct qht_map *map;
+unsigned int mode;
+};
+
+struct qht_stats {
+size_t head_buckets;
+size_t used_head_buckets;
+size_t entries;
+struct qdist chain;
+struct qdist occupancy;
+};
+
+typedef bool (*qht_lookup_func_t)(const void *obj, const void *userp);
+typedef void (*qht_iter_func_t)(struct qht *ht, void *p, uint32_t h, void *up);
+
+#define QHT_MODE_MRU_LOOKUP  0x1 /* move looked-up items to head */
+#define QHT_MODE_MRU_INSERT  0x2 /* insert new elements at the head */
+#define QHT_MODE_AUTO_RESIZE 0x4 /* auto-resize when heavily loaded */
+
+void qht_init(struct qht *ht, size_t n_elems, unsigned int mode);
+
+/* call only when there are no readers left */
+void qht_destroy(struct qht *ht);
+
+/* call with an external lock held */
+void qht_reset(struct qht *ht);
+
+/* call with an external lock held */
+void qht_reset_size(struct qht *ht, size_t n_elems);
+
+/* call with an external lock held */
+void qht_insert(struct qht *ht, void *p, uint32_t hash);
+
+/* call with an external lock held */
+bool qht_remove(struct qht *ht, const void *p, uint32_t hash);
+
+/* call with an external lock held */
+void qht_iter(struct qht *ht, qht_iter_func_t func, void *userp);
+
+/* call with an external lock held */
+void qht_grow(struct qht *ht);
+
+void *qht_lookup(struct qht *ht, qht_lookup_func_t func, const void *userp,
+ uint32_t hash);
+
+/* pass @stats to qht_statistics_destroy() when done */
+void qht_statistics_init(struct qht *ht, struct qht_stats *stats);
+
+void qht_statistics_destroy(struct qht_stats *stats);
+
+#endif /* QEMU_QHT_H */
diff --git a/util/Makefile.objs b/util/Makefile.objs
index 5985a2e..7952f6d 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -1,4 +1,4 @@
-util-obj-y = osdep.o cutils.o unicode.o qemu-timer-common.o qdist.o
+util-obj-y = osdep.o cutils.o unicode.o qemu-timer-common.o qdist.o qht.o
 util-obj-$(CONFIG_POSIX) += compatfd.o
 util-obj-$(CONFIG_POSIX) += event_notifier-posix.o
 util-obj-$(CONFIG_POSIX) += mmap-alloc.o
diff --git a/util/qht.c b/util/qht.c
new file mode 100644
index 000..ef64510
--- /dev/null
+++ b/util/qht.c
@@ -0,0 +1,722 @@
+/*
+ * qht.c - QEMU Hash Table, designed to scale for read-mostly workloads.
+ *
+ * Copyright (C) 2016, Emilio G. Cota 
+ *
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ *
+ * Assumptions:
+ * - Writers and iterators must take an external lock.
+ * - NULL cannot be inserted as a pointer value.
+ *
+ * Features:
+ * - Optional auto-resizing: the hash table resizes up if the load surpasses
+ *   a certain threshold. Resizing is done concurrently with readers.
+ * - Optional bucket MRU promotion policy.
+ *
+ * The key structure is the bucket, which is cacheline-sized. Buckets
+ * contain a few hash values and pointers; the u32 hash values are stored in
+ * full so that resizing is fast. Having this structure instead of directly
+ * chaining items has three advantages:
+ * - Failed lookups fail fast, and touch a minimum number of cache lines.
+ * - Resizing the hash table with concurrent lookups is easy.
+ * - We can have a few Most-Recently-Used (MRU) hash-pointer pairs in the same
+ *   head bucket. This helps scalability, since MRU promotions (i.e. writes to
+ *   the bucket) become less common.
+ *
+ * For concurrent lookups we use a per-bucket seqlock; per-bucket spinlocks
+ * allow readers (lookups) to upgrade to writers and thus implement an MRU
+ * promotion policy; these MRU-induced writes do not touch the cache lines of
+ * other head buckets.
+ *
+ * There are two types of buckets:
+ * 1. "head" buckets are the ones allocated in the array of buckets in qht_map.
+ * 2. all "non-head" buckets (i.e. all others) are members of a 

[Qemu-devel] [PATCH v4 00/14] tb hash improvements

2016-04-29 Thread Emilio G. Cota
Changes from v3:

- added reviewed-by tags from v3. I dropped the review tags from the
  'qht' and 'info jit' patches because they have changed quite a bit
  from v3.
- qdist: new module to print intuitive histograms, see 'info jit' below.
- qht:
  + bug fix: remove unnecessary requirement of hashes being !0; the
    only requirement is that pointers must be !NULL.
    * qht-test: hash the integers we insert with their own integer values
      instead of using tb_hash_func5. This gives us better control
      of the hash values we're testing, and anyway the values we
      test are all unique, so this doesn't matter.
  + bug fix: was not setting map->n_items to 0 in qht_reset().
  + Do not leave NULL holes after removals. Instead, swap this hole
    with the last valid item in the chain. Performance-wise this
    makes no difference when resize is on; however, without resize
    the gain is measurable.
    * A consequence of this is a slight change in MRU promotion: the
      last item in the head bucket is simply swapped with orig[pos],
      without bringing orig to head->next.
    * Added bucket corruption checks, enabled with #define QHT_DEBUG.
  + Do not set QHT_MODE_MRU_INSERT for the TB hash. With long chains it
    causes quite a performance decrease; with short chains, such as what
    we have with xxhash + auto-resize, it has no measurable performance
    impact.
  + 'info jit' stats:
    * Report the number of empty buckets
    * Do not count empty buckets when reporting avg bucket chain length;
      by doing this we get an idea of how many buckets the average lookup
      is going through.
    * Report the avg bucket chain length + a histogram for its distribution.
    * Report avg bucket chain occupancy (in %) + its distribution's histogram.
  + qht-test: add a few more test cases
  + header guard: s/define QHT_H/define QEMU_QHT_H/
  + consistently use uint32_t for keeping the result of tb_hash_func()
  + avoid false leak reports from valgrind after calling call_rcu(map)
    by placing the 'struct rcu_head' field at the top of struct qht_map.
- define QEMU_ALIGNED(X) instead of QEMU_ALIGNED(B)
- add copyright + license to include/processor.h
- add atomic_test_and_set to include/atomic.h, using __atomic_test_and_set
  when available.
- spinlock:
  + use newly-added atomic_test_and_set instead of atomic_xchg
  + remove TATAS for spin_try_lock
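
The "no NULL holes" item in the qht list can be illustrated with a simplified sketch. It uses one flat array instead of chained buckets and ignores hashes and locking; sketch_chain and sketch_chain_remove are hypothetical names, not code from the series.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_CHAIN_SLOTS 8

struct sketch_chain {
    void *pointers[SKETCH_CHAIN_SLOTS];  /* valid entries first, then NULLs */
};

/*
 * Remove @p by moving the last valid entry into its slot, so that the
 * valid entries stay contiguous and a scan can stop at the first NULL.
 */
static bool sketch_chain_remove(struct sketch_chain *c, const void *p)
{
    size_t pos = SKETCH_CHAIN_SLOTS;
    size_t last = SKETCH_CHAIN_SLOTS;

    for (size_t i = 0; i < SKETCH_CHAIN_SLOTS && c->pointers[i]; i++) {
        if (c->pointers[i] == p) {
            pos = i;
        }
        last = i;
    }
    if (pos == SKETCH_CHAIN_SLOTS) {
        return false;  /* not found */
    }
    c->pointers[pos] = c->pointers[last];
    c->pointers[last] = NULL;
    return true;
}
```

Because removal never leaves a gap, later lookups and inserts do not have to walk past empty slots in the middle of a chain.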

Thanks,

Emilio




Re: [Qemu-devel] [RFC v3] translate-all: protect code_gen_buffer with RCU

2016-04-29 Thread Emilio G. Cota
On Tue, Apr 26, 2016 at 07:32:39 +0100, Alex Bennée wrote:
> Emilio G. Cota  writes:
> > With two code_gen "halves", if two tb_flush calls are done in the same
> > RCU read critical section, we're screwed. I added a cpu_exit at the end
> > of tb_flush to try to mitigate this, but I haven't audited all the callers
> > (for instance, what does the gdbstub do?).
> 
> I'm not sure we are going to get much from this approach. The tb_flush
> is a fairly rare occurrence; it's not like it's on the critical performance
> path (although of course pathological cases are possible).

This is what I thought from the beginning, but wanted to give this
alternative a go anyway to see if it was feasible.

On my end I won't do any more work on this approach. Will go back
to locks, despite Paolo's (justified) dislike for them =)

> > If we end up having a mechanism to "stop all  CPUs to do something", as
> > I think we'll end up needing for correct LL/SC emulation, we'll probably
> > be better off using that mechanism for tb_flush as well -- plus, we'll avoid
> > wasting memory.
> 
> I'm fairly certain there will need to be a "stop everything" mode for
> some things - I'm less certain of the best way of doing it. Did you get
> a chance to look at my version of the async_safe_work mechanism?

Not yet, but will get to it very soon.

Cheers,

Emilio



[Qemu-devel] [PATCH v4 07/14] exec: add tb_hash_func5, derived from xxhash

2016-04-29 Thread Emilio G. Cota
This will be used by upcoming changes for hashing the tb hash.

Add this into a separate file to include the copyright notice from
xxhash.

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 include/exec/tb-hash-xx.h | 94 +++
 1 file changed, 94 insertions(+)
 create mode 100644 include/exec/tb-hash-xx.h

diff --git a/include/exec/tb-hash-xx.h b/include/exec/tb-hash-xx.h
new file mode 100644
index 000..67f4e6f
--- /dev/null
+++ b/include/exec/tb-hash-xx.h
@@ -0,0 +1,94 @@
+/*
+ * xxHash - Fast Hash algorithm
+ * Copyright (C) 2012-2016, Yann Collet
+ *
+ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ * + Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * + Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following disclaimer
+ * in the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * You can contact the author at :
+ * - xxHash source repository : https://github.com/Cyan4973/xxHash
+ */
+#ifndef EXEC_TB_HASH_XX
+#define EXEC_TB_HASH_XX
+
+#include <qemu/bitops.h>
+
+#define PRIME32_1   2654435761U
+#define PRIME32_2   2246822519U
+#define PRIME32_3   3266489917U
+#define PRIME32_4    668265263U
+#define PRIME32_5    374761393U
+
+#define TB_HASH_XX_SEED 1
+
+/*
+ * xxhash32, customized for input variables that are not guaranteed to be
+ * contiguous in memory.
+ */
+static inline
+uint32_t tb_hash_func5(uint64_t a0, uint64_t b0, uint32_t e)
+{
+uint32_t v1 = TB_HASH_XX_SEED + PRIME32_1 + PRIME32_2;
+uint32_t v2 = TB_HASH_XX_SEED + PRIME32_2;
+uint32_t v3 = TB_HASH_XX_SEED + 0;
+uint32_t v4 = TB_HASH_XX_SEED - PRIME32_1;
+uint32_t a = a0 >> 31 >> 1;
+uint32_t b = a0;
+uint32_t c = b0 >> 31 >> 1;
+uint32_t d = b0;
+uint32_t h32;
+
+v1 += a * PRIME32_2;
+v1 = rol32(v1, 13);
+v1 *= PRIME32_1;
+
+v2 += b * PRIME32_2;
+v2 = rol32(v2, 13);
+v2 *= PRIME32_1;
+
+v3 += c * PRIME32_2;
+v3 = rol32(v3, 13);
+v3 *= PRIME32_1;
+
+v4 += d * PRIME32_2;
+v4 = rol32(v4, 13);
+v4 *= PRIME32_1;
+
+h32 = rol32(v1, 1) + rol32(v2, 7) + rol32(v3, 12) + rol32(v4, 18);
+h32 += 20;
+
+h32 += e * PRIME32_3;
+h32  = rol32(h32, 17) * PRIME32_4;
+
+h32 ^= h32 >> 15;
+h32 *= PRIME32_2;
+h32 ^= h32 >> 13;
+h32 *= PRIME32_3;
+h32 ^= h32 >> 16;
+
+return h32;
+}
+
+#endif /* EXEC_TB_HASH_XX */
-- 
2.5.0
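
The helper operations that tb_hash_func5 relies on can be tried in isolation. The sketch below uses hypothetical sketch_* names: a 32-bit rotate standing in for rol32(), the split of a 64-bit input into two 32-bit words, and the masking a caller would apply to index a power-of-two table. The two-step shift (>> 31 >> 1) is equivalent to >> 32 for a 64-bit operand; presumably it is written that way so the expression stays well defined if the operand type were only 32 bits wide.

```c
#include <stdint.h>

/* Rotate @word left by @shift bits (0..31). */
static inline uint32_t sketch_rol32(uint32_t word, unsigned int shift)
{
    return (word << shift) | (word >> ((32 - shift) & 31));
}

/* Upper 32 bits of a 64-bit value, using the same two-step shift. */
static inline uint32_t sketch_hi32(uint64_t x)
{
    return (uint32_t)(x >> 31 >> 1);
}

/* Lower 32 bits of a 64-bit value (plain truncation). */
static inline uint32_t sketch_lo32(uint64_t x)
{
    return (uint32_t)x;
}

/* Reduce a full 32-bit hash to an index; @n_buckets is a power of two. */
static inline uint32_t sketch_bucket_index(uint32_t hash, uint32_t n_buckets)
{
    return hash & (n_buckets - 1);
}
```

For example, sketch_hi32(0x1122334455667788) yields 0x11223344 and sketch_lo32() of the same value yields 0x55667788, which correspond to the a and b words in the function above.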




[Qemu-devel] [PATCH v4 09/14] qdist: add module to represent frequency distributions of data

2016-04-29 Thread Emilio G. Cota
Sometimes it is useful to have a quick histogram to represent a certain
distribution -- for example, when investigating a performance regression
in a hash table due to inadequate hashing.

The appended patch allows us to easily represent a distribution using Unicode
characters. Further, the data structure keeping track of the distribution
is so simple that obtaining its values for off-line processing is trivial.

Example, taking the last 10 commits to QEMU:

 Characters in commit title  Count
-----------------------------------
                         39      1
                         48      1
                         53      1
                         54      2
                         57      1
                         61      1
                         67      1
                         78      1
                         80      1

qdist_init(&dist);
qdist_inc(&dist, 39);
[...]
qdist_inc(&dist, 80);

char *str = qdist_pr(&dist, 9, QDIST_PR_LABELS);
// -> [39.0,43.6)▂▂ █▂ ▂ ▄[75.4,80.0]
g_free(str);

char *str = qdist_pr(&dist, 4, QDIST_PR_LABELS);
// -> [39.0,49.2)▁█▁▁[69.8,80.0]
g_free(str);
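
The bin labels in the example are consistent with equal-width binning between the smallest and largest sample: (80 - 39) / 9 is about 4.56, which gives the first label [39.0,43.6). The sketch below bins the ten samples that way. It is an illustration under that assumption, not the code in util/qdist.c, and sketch_bin is a hypothetical name.

```c
#include <stddef.h>

/*
 * Count samples per equal-width bin over [xmin, xmax]; the last bin is
 * closed so that xmax itself is counted. Requires n_samples > 0 and
 * n_bins > 0.
 */
static void sketch_bin(const double *xs, size_t n_samples,
                       unsigned long *counts, size_t n_bins)
{
    double xmin = xs[0];
    double xmax = xs[0];
    double step;

    for (size_t i = 1; i < n_samples; i++) {
        if (xs[i] < xmin) {
            xmin = xs[i];
        }
        if (xs[i] > xmax) {
            xmax = xs[i];
        }
    }
    for (size_t b = 0; b < n_bins; b++) {
        counts[b] = 0;
    }
    step = (xmax - xmin) / n_bins;
    for (size_t i = 0; i < n_samples; i++) {
        size_t b = step > 0 ? (size_t)((xs[i] - xmin) / step) : 0;

        if (b >= n_bins) {
            b = n_bins - 1;  /* xmax lands in the last bin */
        }
        counts[b]++;
    }
}
```

With 9 bins the ten samples fall into counts of 1, 1, 0, 4, 1, 0, 1, 0, 2, which matches the relative bar heights of the 9-group histogram in the example.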

Signed-off-by: Emilio G. Cota 
---
 include/qemu/qdist.h |  62 +
 util/Makefile.objs   |   2 +-
 util/qdist.c | 386 +++
 3 files changed, 449 insertions(+), 1 deletion(-)
 create mode 100644 include/qemu/qdist.h
 create mode 100644 util/qdist.c

diff --git a/include/qemu/qdist.h b/include/qemu/qdist.h
new file mode 100644
index 000..6d8b701
--- /dev/null
+++ b/include/qemu/qdist.h
@@ -0,0 +1,62 @@
+/*
+ * Copyright (C) 2016, Emilio G. Cota 
+ *
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ */
+#ifndef QEMU_QDIST_H
+#define QEMU_QDIST_H
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/bitops.h"
+
+/*
+ * Samples with the same 'x value' end up in the same qdist_entry,
+ * e.g. inc(0.1) and inc(0.1) end up as {x=0.1, count=2}.
+ *
+ * Binning happens only at print time, so that we retain the flexibility to
+ * choose the binning. This might not be ideal for workloads that do not care
+ * much about precision and insert many samples all with different x values;
+ * in that case, pre-binning (e.g. entering both 0.115 and 0.097 as 0.1)
+ * should be considered.
+ */
+struct qdist_entry {
+double x;
+unsigned long count;
+};
+
+struct qdist {
+struct qdist_entry *entries;
+size_t n;
+};
+
+#define QDIST_PR_BORDER BIT(0)
+#define QDIST_PR_LABELS BIT(1)
+/* the remaining options only work if PR_LABELS is set */
+#define QDIST_PR_NODECIMAL  BIT(2)
+#define QDIST_PR_PERCENT    BIT(3)
+#define QDIST_PR_100X   BIT(4)
+#define QDIST_PR_NOBINRANGE BIT(5)
+
+void qdist_init(struct qdist *dist);
+void qdist_destroy(struct qdist *dist);
+
+void qdist_add(struct qdist *dist, double x, long count);
+void qdist_inc(struct qdist *dist, double x);
+double qdist_xmin(const struct qdist *dist);
+double qdist_xmax(const struct qdist *dist);
+double qdist_avg(const struct qdist *dist);
+unsigned long qdist_sample_count(const struct qdist *dist);
+size_t qdist_unique_entries(const struct qdist *dist);
+
+/* callers must free the returned string with g_free() */
+char *qdist_pr_plain(const struct qdist *dist, size_t n_groups);
+
+/* callers must free the returned string with g_free() */
+char *qdist_pr(const struct qdist *dist, size_t n_groups, uint32_t opt);
+
+/* Only qdist code and test code should ever call this function */
+void qdist_bin__internal(struct qdist *to, const struct qdist *from, size_t n);
+
+#endif /* QEMU_QDIST_H */
diff --git a/util/Makefile.objs b/util/Makefile.objs
index a8a777e..5985a2e 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -1,4 +1,4 @@
-util-obj-y = osdep.o cutils.o unicode.o qemu-timer-common.o
+util-obj-y = osdep.o cutils.o unicode.o qemu-timer-common.o qdist.o
 util-obj-$(CONFIG_POSIX) += compatfd.o
 util-obj-$(CONFIG_POSIX) += event_notifier-posix.o
 util-obj-$(CONFIG_POSIX) += mmap-alloc.o
diff --git a/util/qdist.c b/util/qdist.c
new file mode 100644
index 000..3343640
--- /dev/null
+++ b/util/qdist.c
@@ -0,0 +1,386 @@
+/*
+ * qdist.c - QEMU helpers for handling frequency distributions of data.
+ *
+ * Copyright (C) 2016, Emilio G. Cota 
+ *
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ */
+#include "qemu/qdist.h"
+
+#include <math.h>
+#ifndef NAN
+#define NAN (0.0 / 0.0)
+#endif
+
+void qdist_init(struct qdist *dist)
+{
+dist->entries = NULL;
+dist->n = 0;
+}
+
+void qdist_destroy(struct qdist *dist)
+{
+g_free(dist->entries);
+}
+
+static inline int qdist_cmp_double(double a, double b)
+{
+if (a > b) {
+return 1;
+} else if (a < b) {
+return -1;
+}
+return 0;
+}
+
+static int qdist_cmp(const void *ap, const void *bp)
+{
+const struct qdist_entry *a = ap;
+const struct qdist_entry *b = bp;
+
+return 

[Qemu-devel] [PATCH v4 12/14] qht: add test program

2016-04-29 Thread Emilio G. Cota
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 tests/.gitignore |   1 +
 tests/Makefile   |   5 +-
 tests/test-qht.c | 177 +++
 3 files changed, 182 insertions(+), 1 deletion(-)
 create mode 100644 tests/test-qht.c

diff --git a/tests/.gitignore b/tests/.gitignore
index 384d2fd..c6e680d 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -49,6 +49,7 @@ test-qdev-global-props
 test-qemu-opts
 test-qdist
 test-qga
+test-qht
 test-qmp-commands
 test-qmp-commands.h
 test-qmp-event
diff --git a/tests/Makefile b/tests/Makefile
index d770bf8..0ab10af 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -70,6 +70,8 @@ check-unit-y += tests/test-rcu-list$(EXESUF)
 gcov-files-test-rcu-list-y = util/rcu.c
 check-unit-y += tests/test-qdist$(EXESUF)
 gcov-files-test-qdist-y = util/qdist.c
+check-unit-y += tests/test-qht$(EXESUF)
+gcov-files-test-qht-y = util/qht.c
 check-unit-y += tests/test-bitops$(EXESUF)
 check-unit-$(CONFIG_HAS_GLIB_SUBPROCESS_TESTS) += tests/test-qdev-global-props$(EXESUF)
 check-unit-y += tests/check-qom-interface$(EXESUF)
@@ -392,7 +394,7 @@ test-obj-y = tests/check-qint.o tests/check-qstring.o tests/check-qdict.o \
tests/test-x86-cpuid.o tests/test-mul64.o tests/test-int128.o \
tests/test-opts-visitor.o tests/test-qmp-event.o \
tests/rcutorture.o tests/test-rcu-list.o \
-   tests/test-qdist.o
+   tests/test-qdist.o tests/test-qht.o
 
 $(test-obj-y): QEMU_INCLUDES += -Itests
 QEMU_CFLAGS += -I$(SRC_PATH)/tests
@@ -431,6 +433,7 @@ tests/test-int128$(EXESUF): tests/test-int128.o
 tests/rcutorture$(EXESUF): tests/rcutorture.o $(test-util-obj-y)
 tests/test-rcu-list$(EXESUF): tests/test-rcu-list.o $(test-util-obj-y)
 tests/test-qdist$(EXESUF): tests/test-qdist.o $(test-util-obj-y)
+tests/test-qht$(EXESUF): tests/test-qht.o $(test-util-obj-y)
 
 tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
hw/core/qdev.o hw/core/qdev-properties.o hw/core/hotplug.o\
diff --git a/tests/test-qht.c b/tests/test-qht.c
new file mode 100644
index 000..c8866e8
--- /dev/null
+++ b/tests/test-qht.c
@@ -0,0 +1,177 @@
+#include "qemu/osdep.h"
+#include <glib.h>
+#include "qemu/qht.h"
+
+#define N 5000
+
+static struct qht ht;
+static int32_t arr[N * 2];
+
+static bool is_equal(const void *obj, const void *userp)
+{
+const int32_t *a = obj;
+const int32_t *b = userp;
+
+return *a == *b;
+}
+
+static void insert(int a, int b)
+{
+int i;
+
+for (i = a; i < b; i++) {
+uint32_t hash;
+
+arr[i] = i;
+hash = i;
+
+qht_insert(&ht, &arr[i], hash);
+}
+}
+
+static void rm(int init, int end)
+{
+int i;
+
+for (i = init; i < end; i++) {
+uint32_t hash;
+
+hash = arr[i];
+g_assert_true(qht_remove(&ht, &arr[i], hash));
+}
+}
+
+static void check(int a, int b, bool expected)
+{
+struct qht_stats stats;
+int i;
+
+for (i = a; i < b; i++) {
+void *p;
+uint32_t hash;
+int32_t val;
+
+val = i;
+hash = i;
+p = qht_lookup(&ht, is_equal, &val, hash);
+g_assert_true(!!p == expected);
+}
+qht_statistics_init(&ht, &stats);
+if (stats.used_head_buckets) {
+g_assert_cmpfloat(qdist_avg(&stats.chain), >=, 1.0);
+}
+g_assert_cmpuint(stats.head_buckets, >, 0);
+qht_statistics_destroy(&stats);
+}
+
+static void count_func(struct qht *ht, void *p, uint32_t hash, void *userp)
+{
+unsigned int *curr = userp;
+
+(*curr)++;
+}
+
+static void check_n(size_t expected)
+{
+struct qht_stats stats;
+
+qht_statistics_init(&ht, &stats);
+g_assert_cmpuint(stats.entries, ==, expected);
+qht_statistics_destroy(&stats);
+}
+
+static void iter_check(unsigned int count)
+{
+unsigned int curr = 0;
+
+qht_iter(&ht, count_func, &curr);
+g_assert_cmpuint(curr, ==, count);
+}
+
+static void qht_do_test(unsigned int mode, size_t init_entries)
+{
+qht_init(&ht, 0, mode);
+
+insert(0, N);
+check(0, N, true);
+check_n(N);
+check(-N, -1, false);
+iter_check(N);
+
+rm(101, 102);
+check_n(N - 1);
+insert(N, N * 2);
+check_n(N + N - 1);
+rm(N, N * 2);
+check_n(N - 1);
+insert(101, 102);
+check_n(N);
+
+rm(10, 200);
+check_n(N - 190);
+insert(150, 200);
+check_n(N - 190 + 50);
+insert(10, 150);
+check_n(N);
+
+rm(1, 2);
+check_n(N - 1);
+qht_reset_size(&ht, 0);
+check_n(0);
+check(0, N, false);
+
+qht_destroy(&ht);
+}
+
+static void qht_test(unsigned int mode)
+{
+qht_do_test(mode, 0);
+qht_do_test(mode, 1);
+qht_do_test(mode, 2);
+qht_do_test(mode, 8);
+qht_do_test(mode, 16);
+qht_do_test(mode, 8192);
+qht_do_test(mode, 16384);
+}
+
+static void test_default(void)
+{
+qht_test(0);
+}
+
+static void test_resize(void)
+{
+qht_test(QHT_MODE_AUTO_RESIZE);
+}
+
+static void test_mru_lookup(void)
+{
+qht_test(QHT_MODE_MRU_LOOKUP);
+}
+

[Qemu-devel] [PATCH v4 06/14] qemu-thread: add simple test-and-set spinlock

2016-04-29 Thread Emilio G. Cota
From: Guillaume Delbergue 

Signed-off-by: Guillaume Delbergue 
[Rewritten. - Paolo]
Signed-off-by: Paolo Bonzini 
[Emilio's additions: use atomic_test_and_set instead of atomic_xchg;
 call cpu_relax() while spinning; optimize for uncontended locks by
 acquiring the lock with TAS instead of TATAS.]
Signed-off-by: Emilio G. Cota 
---
 include/qemu/thread.h | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index bdae6df..39ff1ac 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -1,6 +1,9 @@
 #ifndef __QEMU_THREAD_H
 #define __QEMU_THREAD_H 1
 
+#include <errno.h>
+#include "qemu/processor.h"
+#include "qemu/atomic.h"
 
 typedef struct QemuMutex QemuMutex;
 typedef struct QemuCond QemuCond;
@@ -60,4 +63,35 @@ struct Notifier;
 void qemu_thread_atexit_add(struct Notifier *notifier);
 void qemu_thread_atexit_remove(struct Notifier *notifier);
 
+typedef struct QemuSpin {
+int value;
+} QemuSpin;
+
+static inline void qemu_spin_init(QemuSpin *spin)
+{
+spin->value = 0;
+}
+
+static inline void qemu_spin_lock(QemuSpin *spin)
+{
+while (atomic_test_and_set(&spin->value)) {
+while (atomic_read(&spin->value)) {
+cpu_relax();
+}
+}
+}
+
+static inline int qemu_spin_trylock(QemuSpin *spin)
+{
+if (atomic_test_and_set(&spin->value)) {
+return -EBUSY;
+}
+return 0;
+}
+
+static inline void qemu_spin_unlock(QemuSpin *spin)
+{
+atomic_mb_set(&spin->value, 0);
+}
+
 #endif
-- 
2.5.0
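
For readers without the QEMU tree at hand, the same locking pattern can be sketched with C11 atomics. This is an approximation under stated assumptions: atomic_exchange_explicit() stands in for atomic_test_and_set(), the cpu_relax() call is reduced to a comment, and the SketchSpin names are hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>

typedef struct SketchSpin {
    atomic_int value;  /* 0 = unlocked, 1 = locked */
} SketchSpin;

static inline void sketch_spin_init(SketchSpin *spin)
{
    atomic_init(&spin->value, 0);
}

static inline void sketch_spin_lock(SketchSpin *spin)
{
    /* First attempt is a plain test-and-set: cheap when uncontended. */
    while (atomic_exchange_explicit(&spin->value, 1, memory_order_acquire)) {
        /* On failure, spin on reads only until the lock looks free. */
        while (atomic_load_explicit(&spin->value, memory_order_relaxed)) {
            /* a cpu_relax()-style pause hint would go here */
        }
    }
}

/* Returns 0 on success, -EBUSY if the lock is already held. */
static inline int sketch_spin_trylock(SketchSpin *spin)
{
    if (atomic_exchange_explicit(&spin->value, 1, memory_order_acquire)) {
        return -EBUSY;
    }
    return 0;
}

static inline void sketch_spin_unlock(SketchSpin *spin)
{
    atomic_store_explicit(&spin->value, 0, memory_order_release);
}
```

The inner read-only loop keeps waiters from issuing atomic writes to a line that the lock holder also needs, while the outer test-and-set keeps the uncontended path to a single atomic operation.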




[Qemu-devel] [PATCH v4 08/14] tb hash: hash phys_pc, pc, and flags with xxhash

2016-04-29 Thread Emilio G. Cota
For some workloads such as arm bootup, tb_phys_hash is performance-critical.
This is due to the high frequency of accesses to the hash table, caused
by (frequent) TLB flushes that wipe out the cpu-private tb_jmp_cache's.
More info:
  https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg05098.html

To dig further into this I modified an arm image booting debian jessie to
immediately shut down after boot. Analysis revealed that quite a bit of time
is unnecessarily spent in tb_phys_hash: the cause is poor hashing that
results in very uneven loading of chains in the hash table's buckets;
the longest observed chain had ~550 elements.

The appended patch addresses this with two changes:

1) Use xxhash as the hash table's hash function. xxhash is a fast,
   high-quality hashing function.

2) Feed the hashing function with not just tb_phys, but also pc and flags.

This improves performance over using just tb_phys for hashing, since that
resulted in some hash buckets having many TBs while others got very few;
with these changes, the longest observed chain on a single hash bucket is
brought down from ~550 to ~40.

Tests show that the other element checked for in tb_find_physical,
cs_base, is always a match when tb_phys+pc+flags are a match,
so hashing cs_base is wasteful. It could be that this is an ARM-only
thing, though.

BTW, after this change the hash table should not be called "tb_phys_hash"
anymore; this is addressed later in this series.

This change gives consistent bootup time improvements. I tested two
host machines:
- Intel Xeon E5-2690: 11.6% less time
- Intel i7-4790K: 19.2% less time

Increasing the number of hash buckets yields further improvements. However,
using a larger, fixed number of buckets can degrade performance for other
workloads that do not translate as many blocks (600K+ for debian-jessie arm
bootup). This is dealt with later in this series.
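
To see why the previous hash function loads buckets unevenly, consider a standalone copy of it. The 2^15 table size below is an assumption made for this sketch, and sketch_old_hash is a hypothetical name mirroring the removed tb_phys_hash_func().

```c
#include <stdint.h>

#define SKETCH_HASH_BITS 15  /* assumed table size of 2^15 buckets */
#define SKETCH_HASH_SIZE (1u << SKETCH_HASH_BITS)

/*
 * Old scheme: drop the two low bits of phys_pc and keep the next 15.
 * Every other bit of phys_pc, and all of pc and flags, is ignored.
 */
static inline unsigned int sketch_old_hash(uint64_t phys_pc)
{
    return (unsigned int)((phys_pc >> 2) & (SKETCH_HASH_SIZE - 1));
}
```

Under that assumption, 0x1000 and 0x1000 + (1 << 17) land in the same bucket, as do any two TBs that share a phys_pc but differ in pc or flags. Mixing all three inputs through xxhash removes that structure.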

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 cpu-exec.c |  4 ++--
 include/exec/tb-hash.h |  8 ++--
 translate-all.c| 11 ++-
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index debc65c..395b302 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -224,7 +224,7 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
 {
 CPUArchState *env = (CPUArchState *)cpu->env_ptr;
 TranslationBlock *tb, **ptb1;
-unsigned int h;
+uint32_t h;
 tb_page_addr_t phys_pc, phys_page1;
 target_ulong virt_page2;
 
@@ -233,7 +233,7 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
 /* find translated block using physical mappings */
 phys_pc = get_page_addr_code(env, pc);
 phys_page1 = phys_pc & TARGET_PAGE_MASK;
-h = tb_phys_hash_func(phys_pc);
+h = tb_hash_func(phys_pc, pc, flags);
 ptb1 = &tcg_ctx.tb_ctx.tb_phys_hash[h];
 for(;;) {
 tb = *ptb1;
diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 0f4e8a0..4b9635a 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -20,6 +20,9 @@
 #ifndef EXEC_TB_HASH
 #define EXEC_TB_HASH
 
+#include "exec/exec-all.h"
+#include "exec/tb-hash-xx.h"
+
 /* Only the bottom TB_JMP_PAGE_BITS of the jump cache hash bits vary for
addresses on the same page.  The top bits are the same.  This allows
TLB invalidation to quickly clear a subset of the hash table.  */
@@ -43,9 +46,10 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
| (tmp & TB_JMP_ADDR_MASK));
 }
 
-static inline unsigned int tb_phys_hash_func(tb_page_addr_t pc)
+static inline
+uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, int flags)
 {
-return (pc >> 2) & (CODE_GEN_PHYS_HASH_SIZE - 1);
+return tb_hash_func5(phys_pc, pc, flags) & (CODE_GEN_PHYS_HASH_SIZE - 1);
 }
 
 #endif
diff --git a/translate-all.c b/translate-all.c
index 05c0b50..0463efc 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -965,13 +965,14 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 {
 CPUState *cpu;
 PageDesc *p;
-unsigned int h, n1;
+unsigned int n1;
+uint32_t h;
 tb_page_addr_t phys_pc;
 TranslationBlock *tb1, *tb2;
 
 /* remove the TB from the hash list */
 phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
-h = tb_phys_hash_func(phys_pc);
+h = tb_hash_func(phys_pc, tb->pc, tb->flags);
 tb_hash_remove(&tcg_ctx.tb_ctx.tb_phys_hash[h], tb);
 
 /* remove the TB from the page list */
@@ -1470,11 +1471,11 @@ static inline void tb_alloc_page(TranslationBlock *tb,
 static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
  tb_page_addr_t phys_page2)
 {
-unsigned int h;
+uint32_t h;
 TranslationBlock **ptb;
 
-/* add in the physical hash table */
-h = tb_phys_hash_func(phys_pc);
+/* add in the hash table */
+h = 

[Qemu-devel] [PATCH v4 14/14] translate-all: add tb hash bucket info to 'info jit' dump

2016-04-29 Thread Emilio G. Cota
Examples:

- Good hashing, i.e. tb_hash_func5(phys_pc, pc, flags):
TB count            715135/2684354
[...]
TB hash buckets     388775/524288 (74.15% head buckets used)
TB hash occupancy   33.04% avg chain occ. Histogram: [0,10)%|▆ █  ▅▁▃▁▁|[90,100]%
TB hash avg chain   1.017 buckets. Histogram: 1|█▁▁|3

- Not-so-good hashing, i.e. tb_hash_func5(phys_pc, pc, 0):
TB count            712636/2684354
[...]
TB hash buckets     344924/524288 (65.79% head buckets used)
TB hash occupancy   31.64% avg chain occ. Histogram: [0,10)%|█ ▆  ▅▁▃▁▂|[90,100]%
TB hash avg chain   1.047 buckets. Histogram: 1|█▁▁▁|4

- Bad hashing, i.e. tb_hash_func5(phys_pc, 0, 0):
TB count            702818/2684354
[...]
TB hash buckets     112741/524288 (21.50% head buckets used)
TB hash occupancy   10.15% avg chain occ. Histogram: [0,10)%|█ ▁  ▁|[90,100]%
TB hash avg chain   2.107 buckets. Histogram: [1.0,10.2)|█▁|[83.8,93.0]

- Good hashing, but no auto-resize:
TB count            715634/2684354
TB hash buckets     8192/8192 (100.00% head buckets used)
TB hash occupancy   98.30% avg chain occ. Histogram: [95.3,95.8)%|▁▁▃▄▃▄▁▇▁█|[99.5,100.0]%
TB hash avg chain   22.070 buckets. Histogram: [15.0,16.7)|▁▂▅▄█▅|[30.3,32.0]
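
For readers unfamiliar with these figures, the "head buckets used" percentage can be derived from per-bucket entry counts. The following is only a sketch under stated assumptions: chain_len[] is a hypothetical array with the number of entries hanging off each head bucket, and the toy_* names are invented here. It is not the qht_statistics code, and its average is entries per used bucket, which is not the same unit as the "avg chain ... buckets" figure printed above.

```c
#include <assert.h>
#include <stddef.h>

struct toy_stats {
    size_t head_buckets;       /* total number of head buckets */
    size_t used_head_buckets;  /* head buckets holding at least one entry */
    size_t entries;            /* total entries across all buckets */
};

/* Walk the per-bucket counts once and accumulate the totals. */
static struct toy_stats toy_stats_compute(const size_t *chain_len, size_t n)
{
    struct toy_stats st = { .head_buckets = n };

    for (size_t i = 0; i < n; i++) {
        if (chain_len[i]) {
            st.used_head_buckets++;
            st.entries += chain_len[i];
        }
    }
    return st;
}

/* Percentage of head buckets that are in use. */
static double toy_used_percent(const struct toy_stats *st)
{
    assert(st->head_buckets > 0);
    return (double)st->used_head_buckets / st->head_buckets * 100;
}

/* Average number of entries per used head bucket (0 for an empty table). */
static double toy_avg_entries_per_used_bucket(const struct toy_stats *st)
{
    return st->used_head_buckets
        ? (double)st->entries / st->used_head_buckets
        : 0.0;
}
```

A good hash pushes the used percentage up and keeps the per-bucket average close to 1; a bad hash does the opposite, as in the third example above.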

Suggested-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 translate-all.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/translate-all.c b/translate-all.c
index 0bf76d7..775ea79 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1665,6 +1665,10 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
 int i, target_code_size, max_target_code_size;
 int direct_jmp_count, direct_jmp2_count, cross_page;
 TranslationBlock *tb;
+struct qht_stats hst;
+uint32_t hgram_opts;
+size_t hgram_bins;
+char *hgram;
 
 target_code_size = 0;
 max_target_code_size = 0;
@@ -1715,6 +1719,38 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
 direct_jmp2_count,
 tcg_ctx.tb_ctx.nb_tbs ? (direct_jmp2_count * 100) /
 tcg_ctx.tb_ctx.nb_tbs : 0);
+
+qht_statistics_init(&tcg_ctx.tb_ctx.htable, &hst);
+
+cpu_fprintf(f, "TB hash buckets %zu/%zu (%0.2f%% head buckets used)\n",
+hst.used_head_buckets, hst.head_buckets,
+(double)hst.used_head_buckets / hst.head_buckets * 100);
+
+hgram_opts =  QDIST_PR_BORDER | QDIST_PR_LABELS;
+hgram_opts |= QDIST_PR_100X   | QDIST_PR_PERCENT;
+if (qdist_xmax() - qdist_xmin() == 1) {
+hgram_opts |= QDIST_PR_NODECIMAL;
+}
+hgram = qdist_pr(, 10, hgram_opts);
+cpu_fprintf(f, "TB hash occupancy   %0.2f%% avg chain occ. Histogram: %s\n",
+qdist_avg() * 100, hgram);
+g_free(hgram);
+
+hgram_opts = QDIST_PR_BORDER | QDIST_PR_LABELS;
+hgram_bins = qdist_xmax() - qdist_xmin();
+if (hgram_bins > 10) {
+hgram_bins = 10;
+} else {
+hgram_bins = 0;
+hgram_opts |= QDIST_PR_NODECIMAL | QDIST_PR_NOBINRANGE;
+}
+hgram = qdist_pr(, hgram_bins, hgram_opts);
+cpu_fprintf(f, "TB hash avg chain   %0.3f buckets. Histogram: %s\n",
+qdist_avg(), hgram);
+g_free(hgram);
+
+qht_statistics_destroy(&hst);
+
 cpu_fprintf(f, "\nStatistics:\n");
 cpu_fprintf(f, "TB flush count  %d\n", tcg_ctx.tb_ctx.tb_flush_count);
 cpu_fprintf(f, "TB invalidate count %d\n",
-- 
2.5.0




[Qemu-devel] [PATCH v4 04/14] include/processor.h: define cpu_relax()

2016-04-29 Thread Emilio G. Cota
Taken from the Linux kernel.
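
For context, cpu_relax() is meant to sit inside busy-wait loops. A minimal sketch of such a loop follows, assuming C11 atomics and a GNU-compatible compiler; toy_cpu_relax() and toy_spin_until_set() are hypothetical names, and only the x86 variant plus a plain compiler barrier fallback are reproduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#if defined(__i386__) || defined(__x86_64__)
/* "rep; nop" encodes the PAUSE instruction. */
#define toy_cpu_relax() __asm__ __volatile__("rep; nop" ::: "memory")
#else
/* Fallback: a compiler barrier only. */
#define toy_cpu_relax() __asm__ __volatile__("" ::: "memory")
#endif

/*
 * Spin until *flag becomes true or max_spins iterations have elapsed.
 * Returns the number of iterations spent waiting.
 */
static unsigned toy_spin_until_set(atomic_bool *flag, unsigned max_spins)
{
    unsigned spins = 0;

    assert(flag != NULL);
    while (!atomic_load_explicit(flag, memory_order_acquire) &&
           spins < max_spins) {
        toy_cpu_relax();
        spins++;
    }
    return spins;
}
```

The bound on the number of spins is only there to keep the sketch testable in a single thread; a real waiter would typically spin until another thread sets the flag.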

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 include/qemu/processor.h | 34 ++
 1 file changed, 34 insertions(+)
 create mode 100644 include/qemu/processor.h

diff --git a/include/qemu/processor.h b/include/qemu/processor.h
new file mode 100644
index 000..4e6a71f
--- /dev/null
+++ b/include/qemu/processor.h
@@ -0,0 +1,34 @@
+/*
+ * Copyright (C) 2016, Emilio G. Cota 
+ *
+ * License: GNU GPL, version 2.
+ *   See the COPYING file in the top-level directory.
+ */
+#ifndef QEMU_PROCESSOR_H
+#define QEMU_PROCESSOR_H
+
+#include "qemu/atomic.h"
+
+#if defined(__i386__) || defined(__x86_64__)
+#define cpu_relax() asm volatile("rep; nop" ::: "memory")
+#endif
+
+#ifdef __ia64__
+#define cpu_relax() asm volatile("hint @pause" ::: "memory")
+#endif
+
+#ifdef __aarch64__
+#define cpu_relax() asm volatile("yield" ::: "memory")
+#endif
+
+#if defined(__powerpc64__)
+/* set Hardware Multi-Threading (HMT) priority to low; then back to medium */
+#define cpu_relax() asm volatile("or 1, 1, 1;"
+ "or 2, 2, 2;" ::: "memory")
+#endif
+
+#ifndef cpu_relax
+#define cpu_relax() barrier()
+#endif
+
+#endif /* QEMU_PROCESSOR_H */
-- 
2.5.0




[Qemu-devel] [PATCH v4 05/14] atomics: add atomic_test_and_set

2016-04-29 Thread Emilio G. Cota
This new helper expands to __atomic_test_and_set where available;
otherwise it expands to atomic_xchg.
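
A minimal sketch of how a test-and-set primitive is typically used, assuming the GCC/Clang __atomic builtins are available. The toy_* names are hypothetical and this is not code from the series: the lock is acquired exactly when the previous value was false.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool locked;
} toy_spinlock;

/* Atomically set *p to true and return its previous value. */
static inline bool toy_test_and_set(bool *p)
{
    return __atomic_test_and_set(p, __ATOMIC_SEQ_CST);
}

/* Returns true if the lock was free and is now held by the caller. */
static inline bool toy_trylock(toy_spinlock *l)
{
    return !toy_test_and_set(&l->locked);
}

/* Only the current owner may call this. */
static inline void toy_unlock(toy_spinlock *l)
{
    assert(l->locked);
    __atomic_clear(&l->locked, __ATOMIC_RELEASE);
}
```

A blocking lock would wrap toy_trylock() in a loop that calls a relax hint between attempts.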

Signed-off-by: Emilio G. Cota 
---
 include/qemu/atomic.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index 5bc4d6c..6132dad 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -134,6 +134,7 @@
 })
 
 /* Provide shorter names for GCC atomic builtins, return old value */
+#define atomic_test_and_set(ptr) __atomic_test_and_set(ptr, __ATOMIC_SEQ_CST)
 #define atomic_fetch_inc(ptr)  __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST)
 #define atomic_fetch_dec(ptr)  __atomic_fetch_sub(ptr, 1, __ATOMIC_SEQ_CST)
 #define atomic_fetch_add(ptr, n) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST)
@@ -328,6 +329,7 @@
 #endif
 
 /* Provide shorter names for GCC atomic builtins.  */
+#define atomic_test_and_set(ptr) atomic_xchg(ptr, true)
 #define atomic_fetch_inc(ptr)  __sync_fetch_and_add(ptr, 1)
 #define atomic_fetch_dec(ptr)  __sync_fetch_and_add(ptr, -1)
 #define atomic_fetch_add   __sync_fetch_and_add
-- 
2.5.0




[Qemu-devel] [PATCH v4 03/14] seqlock: rename write_lock/unlock to write_begin/end

2016-04-29 Thread Emilio G. Cota
It is a more appropriate name, now that the mutex embedded
in the seqlock is gone.
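
To show what begin/end mean once no mutex is involved, here is a single-writer sketch using C11 atomics. It is an assumption-laden illustration, not the implementation in include/qemu/seqlock.h: the toy_clock type and its fields are invented, and writers are assumed to be serialized by some external lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    atomic_uint sequence;      /* odd while a write is in progress */
    _Atomic int64_t ticks;
    _Atomic int64_t offset;
} toy_clock;

/* Writer side: callers must already exclude other writers. */
static void toy_clock_write(toy_clock *c, int64_t ticks, int64_t offset)
{
    unsigned s = atomic_load_explicit(&c->sequence, memory_order_relaxed);

    assert((s & 1) == 0);      /* no write may already be in flight */
    /* "write_begin": make the sequence odd before touching the data */
    atomic_store_explicit(&c->sequence, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);

    atomic_store_explicit(&c->ticks, ticks, memory_order_relaxed);
    atomic_store_explicit(&c->offset, offset, memory_order_relaxed);

    /* "write_end": make the sequence even again, publishing the data */
    atomic_store_explicit(&c->sequence, s + 2, memory_order_release);
}

/* Reader side: retry if a write overlapped with the reads. */
static int64_t toy_clock_read_sum(toy_clock *c)
{
    unsigned start, end;
    int64_t t, o;

    do {
        start = atomic_load_explicit(&c->sequence, memory_order_acquire);
        t = atomic_load_explicit(&c->ticks, memory_order_relaxed);
        o = atomic_load_explicit(&c->offset, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        end = atomic_load_explicit(&c->sequence, memory_order_relaxed);
    } while ((start & 1) || start != end);

    return t + o;
}
```

Nothing in the write path blocks or excludes another writer, which is why "begin/end" describes it better than "lock/unlock".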

Reviewed-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 cpus.c | 28 ++--
 include/qemu/seqlock.h |  4 ++--
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/cpus.c b/cpus.c
index dd86da5..735c9b2 100644
--- a/cpus.c
+++ b/cpus.c
@@ -247,13 +247,13 @@ int64_t cpu_get_clock(void)
 void cpu_enable_ticks(void)
 {
 /* Here, the really thing protected by seqlock is cpu_clock_offset. */
-seqlock_write_lock(&timers_state.vm_clock_seqlock);
+seqlock_write_begin(&timers_state.vm_clock_seqlock);
 if (!timers_state.cpu_ticks_enabled) {
 timers_state.cpu_ticks_offset -= cpu_get_host_ticks();
 timers_state.cpu_clock_offset -= get_clock();
 timers_state.cpu_ticks_enabled = 1;
 }
-seqlock_write_unlock(&timers_state.vm_clock_seqlock);
+seqlock_write_end(&timers_state.vm_clock_seqlock);
 }
 
 /* disable cpu_get_ticks() : the clock is stopped. You must not call
@@ -263,13 +263,13 @@ void cpu_enable_ticks(void)
 void cpu_disable_ticks(void)
 {
 /* Here, the really thing protected by seqlock is cpu_clock_offset. */
-seqlock_write_lock(&timers_state.vm_clock_seqlock);
+seqlock_write_begin(&timers_state.vm_clock_seqlock);
 if (timers_state.cpu_ticks_enabled) {
 timers_state.cpu_ticks_offset += cpu_get_host_ticks();
 timers_state.cpu_clock_offset = cpu_get_clock_locked();
 timers_state.cpu_ticks_enabled = 0;
 }
-seqlock_write_unlock(&timers_state.vm_clock_seqlock);
+seqlock_write_end(&timers_state.vm_clock_seqlock);
 }
 
 /* Correlation between real and virtual time is always going to be
@@ -292,7 +292,7 @@ static void icount_adjust(void)
 return;
 }
 
-seqlock_write_lock(&timers_state.vm_clock_seqlock);
+seqlock_write_begin(&timers_state.vm_clock_seqlock);
 cur_time = cpu_get_clock_locked();
 cur_icount = cpu_get_icount_locked();
 
@@ -313,7 +313,7 @@ static void icount_adjust(void)
 last_delta = delta;
 timers_state.qemu_icount_bias = cur_icount
   - (timers_state.qemu_icount << 
icount_time_shift);
-seqlock_write_unlock(&timers_state.vm_clock_seqlock);
+seqlock_write_end(&timers_state.vm_clock_seqlock);
 }
 
 static void icount_adjust_rt(void *opaque)
@@ -353,7 +353,7 @@ static void icount_warp_rt(void)
 return;
 }
 
-seqlock_write_lock(&timers_state.vm_clock_seqlock);
+seqlock_write_begin(&timers_state.vm_clock_seqlock);
 if (runstate_is_running()) {
 int64_t clock = REPLAY_CLOCK(REPLAY_CLOCK_VIRTUAL_RT,
  cpu_get_clock_locked());
@@ -372,7 +372,7 @@ static void icount_warp_rt(void)
 timers_state.qemu_icount_bias += warp_delta;
 }
 vm_clock_warp_start = -1;
-seqlock_write_unlock(&timers_state.vm_clock_seqlock);
+seqlock_write_end(&timers_state.vm_clock_seqlock);
 
 if (qemu_clock_expired(QEMU_CLOCK_VIRTUAL)) {
 qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
@@ -397,9 +397,9 @@ void qtest_clock_warp(int64_t dest)
 int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
 int64_t warp = qemu_soonest_timeout(dest - clock, deadline);
 
-seqlock_write_lock(&timers_state.vm_clock_seqlock);
+seqlock_write_begin(&timers_state.vm_clock_seqlock);
 timers_state.qemu_icount_bias += warp;
-seqlock_write_unlock(&timers_state.vm_clock_seqlock);
+seqlock_write_end(&timers_state.vm_clock_seqlock);
 
 qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
 timerlist_run_timers(aio_context->tlg.tl[QEMU_CLOCK_VIRTUAL]);
@@ -466,9 +466,9 @@ void qemu_start_warp_timer(void)
  * It is useful when we want a deterministic execution time,
  * isolated from host latencies.
  */
-seqlock_write_lock(&timers_state.vm_clock_seqlock);
+seqlock_write_begin(&timers_state.vm_clock_seqlock);
 timers_state.qemu_icount_bias += deadline;
-seqlock_write_unlock(&timers_state.vm_clock_seqlock);
+seqlock_write_end(&timers_state.vm_clock_seqlock);
 qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
 } else {
 /*
@@ -479,11 +479,11 @@ void qemu_start_warp_timer(void)
  * you will not be sending network packets continuously instead of
  * every 100ms.
  */
-seqlock_write_lock(&timers_state.vm_clock_seqlock);
+seqlock_write_begin(&timers_state.vm_clock_seqlock);
 if (vm_clock_warp_start == -1 || vm_clock_warp_start > clock) {
 vm_clock_warp_start = clock;
 }
-seqlock_write_unlock(&timers_state.vm_clock_seqlock);
+seqlock_write_end(&timers_state.vm_clock_seqlock);
 timer_mod_anticipate(icount_warp_timer, clock + deadline);
 }
 } else if (deadline == 0) {
diff --git a/include/qemu/seqlock.h b/include/qemu/seqlock.h
index e673482..4dfc055 

[Qemu-devel] [PATCH v4 10/14] qdist: add test program

2016-04-29 Thread Emilio G. Cota
Signed-off-by: Emilio G. Cota 
---
 tests/.gitignore   |   1 +
 tests/Makefile |   6 +-
 tests/test-qdist.c | 363 +
 3 files changed, 369 insertions(+), 1 deletion(-)
 create mode 100644 tests/test-qdist.c

diff --git a/tests/.gitignore b/tests/.gitignore
index 9eed229..384d2fd 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -47,6 +47,7 @@ test-qapi-types.[ch]
 test-qapi-visit.[ch]
 test-qdev-global-props
 test-qemu-opts
+test-qdist
 test-qga
 test-qmp-commands
 test-qmp-commands.h
diff --git a/tests/Makefile b/tests/Makefile
index 9194f18..d770bf8 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -68,6 +68,8 @@ check-unit-y += tests/rcutorture$(EXESUF)
 gcov-files-rcutorture-y = util/rcu.c
 check-unit-y += tests/test-rcu-list$(EXESUF)
 gcov-files-test-rcu-list-y = util/rcu.c
+check-unit-y += tests/test-qdist$(EXESUF)
+gcov-files-test-qdist-y = util/qdist.c
 check-unit-y += tests/test-bitops$(EXESUF)
 check-unit-$(CONFIG_HAS_GLIB_SUBPROCESS_TESTS) += tests/test-qdev-global-props$(EXESUF)
 check-unit-y += tests/check-qom-interface$(EXESUF)
@@ -389,7 +391,8 @@ test-obj-y = tests/check-qint.o tests/check-qstring.o tests/check-qdict.o \
tests/test-qmp-commands.o tests/test-visitor-serialization.o \
tests/test-x86-cpuid.o tests/test-mul64.o tests/test-int128.o \
tests/test-opts-visitor.o tests/test-qmp-event.o \
-   tests/rcutorture.o tests/test-rcu-list.o
+   tests/rcutorture.o tests/test-rcu-list.o \
+   tests/test-qdist.o
 
 $(test-obj-y): QEMU_INCLUDES += -Itests
 QEMU_CFLAGS += -I$(SRC_PATH)/tests
@@ -427,6 +430,7 @@ tests/test-cutils$(EXESUF): tests/test-cutils.o util/cutils.o
 tests/test-int128$(EXESUF): tests/test-int128.o
 tests/rcutorture$(EXESUF): tests/rcutorture.o $(test-util-obj-y)
 tests/test-rcu-list$(EXESUF): tests/test-rcu-list.o $(test-util-obj-y)
+tests/test-qdist$(EXESUF): tests/test-qdist.o $(test-util-obj-y)
 
 tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
hw/core/qdev.o hw/core/qdev-properties.o hw/core/hotplug.o\
diff --git a/tests/test-qdist.c b/tests/test-qdist.c
new file mode 100644
index 000..ecdd3f4
--- /dev/null
+++ b/tests/test-qdist.c
@@ -0,0 +1,363 @@
+#include "qemu/osdep.h"
+#include 
+#include "qemu/qdist.h"
+
+#include 
+
+struct entry_desc {
+double x;
+unsigned long count;
+
+/* 0 prints a space, 1-8 prints from qdist_blocks[] */
+int fill_code;
+};
+
+/* See: https://en.wikipedia.org/wiki/Block_Elements */
+static const gunichar qdist_blocks[] = {
+0x2581,
+0x2582,
+0x2583,
+0x2584,
+0x2585,
+0x2586,
+0x2587,
+0x2588
+};
+
+#define QDIST_NR_BLOCK_CODES ARRAY_SIZE(qdist_blocks)
+
+static char *pr_hist(const struct entry_desc *darr, size_t n)
+{
+GString *s = g_string_new("");
+size_t i;
+
+for (i = 0; i < n; i++) {
+int fill = darr[i].fill_code;
+
+if (fill) {
+assert(fill <= QDIST_NR_BLOCK_CODES);
+g_string_append_unichar(s, qdist_blocks[fill - 1]);
+} else {
+g_string_append_c(s, ' ');
+}
+}
+return g_string_free(s, FALSE);
+}
+
+static void
+histogram_check(const struct qdist *dist, const struct entry_desc *darr,
+size_t n, size_t n_bins)
+{
+char *pr = qdist_pr_plain(dist, n_bins);
+char *str = pr_hist(darr, n);
+
+g_assert_cmpstr(pr, ==, str);
+g_free(pr);
+g_free(str);
+}
+
+static void histogram_check_single_full(const struct qdist *dist, size_t n_bins)
+{
+struct entry_desc desc = { .fill_code = 8 };
+
+histogram_check(dist, &desc, 1, n_bins);
+}
+
+static void
+entries_check(const struct qdist *dist, const struct entry_desc *darr, size_t n)
+{
+size_t i;
+
+for (i = 0; i < n; i++) {
+struct qdist_entry *e = &dist->entries[i];
+
+g_assert_cmpuint(e->count, ==, darr[i].count);
+}
+}
+
+static void
+entries_insert(struct qdist *dist, const struct entry_desc *darr, size_t n)
+{
+size_t i;
+
+for (i = 0; i < n; i++) {
+qdist_add(dist, darr[i].x, darr[i].count);
+}
+}
+
+static void do_test_bin(const struct entry_desc *a, size_t n_a,
+const struct entry_desc *b, size_t n_b)
+{
+struct qdist qda;
+struct qdist qdb;
+
+qdist_init();
+
+entries_insert(, a, n_a);
+qdist_inc(, a[0].x);
+qdist_add(, a[0].x, -1);
+
+g_assert_cmpuint(qdist_unique_entries(), ==, n_a);
+g_assert_cmpfloat(qdist_xmin(), ==, a[0].x);
+g_assert_cmpfloat(qdist_xmax(), ==, a[n_a - 1].x);
+histogram_check(, a, n_a, 0);
+histogram_check(, a, n_a, n_a);
+
+qdist_bin__internal(, , n_b);
+g_assert_cmpuint(qdb.n, ==, n_b);
+entries_check(, b, n_b);
+g_assert_cmpuint(qdist_sample_count(), ==, qdist_sample_count());
+/*
+ * No histogram_check() for $qdb, since we'd rebin it and that is a bug.
+ * Instead, regenerate it from $qda.
+ */
+

[Qemu-devel] [PATCH v2] spapr: Don't set the TM ibm,pa-features bit in PR KVM mode

2016-04-29 Thread Anton Blanchard
We don't support transactional memory in PR KVM, so don't tell
the OS that we do.

Signed-off-by: Anton Blanchard 
---

v2: Fix build with CONFIG_KVM disabled, noticed by Alex.

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b69995e..dc3e3c9 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -696,6 +696,14 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset,
 } else /* env->mmu_model == POWERPC_MMU_2_07 */ {
 pa_features = pa_features_207;
 pa_size = sizeof(pa_features_207);
+
+#ifdef CONFIG_KVM
+/* Don't enable TM in PR KVM mode */
+if (kvm_enabled() &&
+kvm_vm_check_extension(cs->kvm_state, KVM_CAP_PPC_GET_PVINFO)) {
+pa_features[24] &= ~0x80;
+}
+#endif
 }
 if (env->ci_large_pages) {
 pa_features[3] |= 0x20;



Re: [Qemu-devel] [PATCH v4 13/14] block: Switch blk_write_zeroes() to byte interface

2016-04-29 Thread Eric Blake
On 04/29/2016 02:08 PM, Eric Blake wrote:
> Sector-based blk_write() should die; convert the one-off
> variant blk_write_zeroes().
> 
> Signed-off-by: Eric Blake 
> Acked-by: Denis V. Lunev 
> ---

> +++ b/include/sysemu/block-backend.h
> @@ -96,8 +96,8 @@ int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
>int count);
>  int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
>int nb_sectors);
> -int blk_write_zeroes(BlockBackend *blk, int64_t sector_num,
> - int nb_sectors, BdrvRequestFlags flags);
> +int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
> + int count, BdrvRequestFlags flags);

Maintainer may want to fix indentation if I don't need to respin.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-devel] [V9 3/4] hw/core: Add AMD IOMMU to machine properties

2016-04-29 Thread David Kiarie
Add an enum (subject to review) to the machine properties. It is
used to override the emulated IOMMU from Intel to AMD.
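
As a rough illustration of the string-to-enum mapping this property needs, here is a standalone sketch. The toy_* names are hypothetical; the accepted strings "intel" and "amd" are taken from the x-iommu-type help text in this patch, and the sketch reports failure through its return value rather than through QEMU's Error API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    TOY_IOMMU_INTEL,
    TOY_IOMMU_AMD,
} toy_iommu_type;

/*
 * Map a property string to an IOMMU type.
 * Returns false, leaving *out untouched, for NULL or unknown strings.
 */
static bool toy_parse_iommu_type(const char *value, toy_iommu_type *out)
{
    assert(out != NULL);
    if (value && strcmp(value, "intel") == 0) {
        *out = TOY_IOMMU_INTEL;
        return true;
    }
    if (value && strcmp(value, "amd") == 0) {
        *out = TOY_IOMMU_AMD;
        return true;
    }
    return false;
}
```

Validating first and assigning only on success keeps the previous setting intact when the user passes an invalid value.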

Signed-off-by: David Kiarie 
---
 hw/core/machine.c | 33 ++---
 include/hw/boards.h   |  1 +
 include/hw/i386/intel_iommu.h |  1 +
 qemu-options.hx   |  7 +--
 util/qemu-config.c|  8 ++--
 5 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 6dbbc85..ff830f0 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -15,6 +15,8 @@
 #include "qapi/error.h"
 #include "qapi-visit.h"
 #include "qapi/visitor.h"
+#include "hw/i386/amd_iommu.h"
+#include "hw/i386/intel_iommu.h"
 #include "hw/sysbus.h"
 #include "sysemu/sysemu.h"
 #include "qemu/error-report.h"
@@ -300,6 +302,27 @@ static void machine_set_iommu(Object *obj, bool value, Error **errp)
 ms->iommu = value;
 }
 
+static void machine_set_iommu_override(Object *obj, const char *value,
+   Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+Error *err = NULL;
+
+ms->iommu_type = TYPE_INTEL;
+/* ensure a valid iommu type */
+if (g_strcmp0(value, AMD_IOMMU_STR) == 0) {
+} else if(g_strcmp0(value, INTEL_IOMMU_STR) == 0) {
+} else {
+error_setg(errp, "Invalid IOMMU type %s", value);
+error_propagate(errp, err);
+return;
+}
+
+if ((g_strcmp0(value, AMD_IOMMU_STR) == 0)) {
+ms->iommu_type = TYPE_AMD;
+}
+}
+
 static void machine_set_suppress_vmdesc(Object *obj, bool value, Error **errp)
 {
 MachineState *ms = MACHINE(obj);
@@ -473,10 +496,14 @@ static void machine_initfn(Object *obj)
 "Firmware image",
 NULL);
 object_property_add_bool(obj, "iommu",
- machine_get_iommu,
- machine_set_iommu, NULL);
+ machine_get_iommu, machine_set_iommu, NULL);
 object_property_set_description(obj, "iommu",
-"Set on/off to enable/disable Intel IOMMU (VT-d)",
+"Set on to enable IOMMU emulation",
+NULL);
+object_property_add_str(obj, "x-iommu-type",
+NULL, machine_set_iommu_override, NULL);
+object_property_set_description(obj, "x-iommu-type",
+"Set on to override emulated IOMMU to AMD IOMMU",
 NULL);
 object_property_add_bool(obj, "suppress-vmdesc",
  machine_get_suppress_vmdesc,
diff --git a/include/hw/boards.h b/include/hw/boards.h
index dbe6745..5b7eeda 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -158,6 +158,7 @@ struct MachineState {
 bool igd_gfx_passthru;
 char *firmware;
 bool iommu;
+IommuType iommu_type;
 bool suppress_vmdesc;
 bool enforce_config_section;
 
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index b024ffa..7e511e1 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -24,6 +24,7 @@
 #include "hw/qdev.h"
 #include "sysemu/dma.h"
 
+#define INTEL_IOMMU_STR "intel"
 #define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
 #define INTEL_IOMMU_DEVICE(obj) \
  OBJECT_CHECK(IntelIOMMUState, (obj), TYPE_INTEL_IOMMU_DEVICE)
diff --git a/qemu-options.hx b/qemu-options.hx
index 6106520..81217d3 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -38,7 +38,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
 "kvm_shadow_mem=size of KVM shadow MMU\n"
 "dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
 "mem-merge=on|off controls memory merge support (default: on)\n"
-"iommu=on|off controls emulated Intel IOMMU (VT-d) support (default=off)\n"
+"iommu=on|off controls emulated IOMMU support (default: off)\n"
+"x-iommu-type=amd|intel overrides emulated IOMMU to AMD IOMMU (default: intel)\n"
 "igd-passthru=on|off controls IGD GFX passthrough support (default=off)\n"
 "aes-key-wrap=on|off controls support for AES key wrapping (default=on)\n"
 "dea-key-wrap=on|off controls support for DEA key wrapping (default=on)\n"
@@ -74,7 +75,9 @@ Enables or disables memory merge support. This feature, when supported by
 the host, de-duplicates identical memory pages among VMs instances
 (enabled by default).
 @item iommu=on|off
-Enables or disables emulated Intel IOMMU (VT-d) support. The default is off.
+Enables or disables IOMMU emulation. The default is off.
+@item x-iommu-type=amd|intel
+Selects the emulated IOMMU type. By default, the Intel IOMMU is emulated.
 @item aes-key-wrap=on|off
 

[Qemu-devel] [V9 2/4] hw/i386: ACPI table for AMD IOMMU

2016-04-29 Thread David Kiarie
Add the IVRS table for the AMD IOMMU. Generate IVRS or DMAR
depending on the emulated IOMMU.

Signed-off-by: David Kiarie 
---
 hw/acpi/aml-build.c |  2 +-
 hw/acpi/core.c  | 13 ---
 hw/i386/acpi-build.c| 93 +++--
 include/hw/acpi/acpi-defs.h | 14 +++
 include/hw/acpi/acpi.h  | 16 
 include/hw/acpi/aml-build.h |  1 +
 include/hw/boards.h |  6 +++
 7 files changed, 120 insertions(+), 25 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index ab89ca6..da11bf8 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -227,7 +227,7 @@ static void build_extop_package(GArray *package, uint8_t op)
 build_prepend_byte(package, 0x5B); /* ExtOpPrefix */
 }
 
-static void build_append_int_noprefix(GArray *table, uint64_t value, int size)
+void build_append_int_noprefix(GArray *table, uint64_t value, int size)
 {
 int i;
 
diff --git a/hw/acpi/core.c b/hw/acpi/core.c
index 7925a1a..ee0e687 100644
--- a/hw/acpi/core.c
+++ b/hw/acpi/core.c
@@ -29,19 +29,6 @@
 #include "qapi-visit.h"
 #include "qapi-event.h"
 
-struct acpi_table_header {
-uint16_t _length; /* our length, not actual part of the hdr */
-  /* allows easier parsing for fw_cfg clients */
-char sig[4];  /* ACPI signature (4 ASCII characters) */
-uint32_t length;  /* Length of table, in bytes, including header */
-uint8_t revision; /* ACPI Specification minor version # */
-uint8_t checksum; /* To make sum of entire table == 0 */
-char oem_id[6];   /* OEM identification */
-char oem_table_id[8]; /* OEM table identification */
-uint32_t oem_revision;/* OEM revision number */
-char asl_compiler_id[4];  /* ASL compiler vendor ID */
-uint32_t asl_compiler_revision; /* ASL compiler revision number */
-} QEMU_PACKED;
 
 #define ACPI_TABLE_HDR_SIZE sizeof(struct acpi_table_header)
 #define ACPI_TABLE_PFX_SIZE sizeof(uint16_t)  /* size of the extra prefix */
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 6477003..74ae994 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -52,6 +52,7 @@
 #include "hw/pci/pci_bus.h"
 #include "hw/pci-host/q35.h"
 #include "hw/i386/intel_iommu.h"
+#include "hw/i386/amd_iommu.h"
 #include "hw/timer/hpet.h"
 
 #include "hw/acpi/aml-build.h"
@@ -59,6 +60,8 @@
 #include "qapi/qmp/qint.h"
 #include "qom/qom-qobject.h"
 
+#include "hw/boards.h"
+
 /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
  * -M pc-i440fx-2.0.  Even if the actual amount of AML generated grows
  * a little bit, there should be plenty of free space since the DSDT
@@ -2598,6 +2601,77 @@ build_dmar_q35(GArray *table_data, GArray *linker)
  "DMAR", table_data->len - dmar_start, 1, NULL, NULL);
 }
 
+static void
+build_amd_iommu(GArray *table_data, GArray *linker)
+{
+int iommu_start = table_data->len;
+bool iommu_ambig;
+
+/* IVRS definition  - table header has an extra 2-byte field */
+acpi_data_push(table_data, (sizeof(acpi_table_header) - 2));
+/* common virtualization information */
+build_append_int_noprefix(table_data, AMD_IOMMU_HOST_ADDRESS_WIDTH << 8, 4);
+/* reserved */
+build_append_int_noprefix(table_data, 0, 8);
+
+AMDIOMMUState *s = (AMDIOMMUState *)object_resolve_path_type("",
+TYPE_AMD_IOMMU_DEVICE, &iommu_ambig);
+
+/* IVDB definition - type 10h */
+if (!iommu_ambig) {
+/* IVHD definition - type 10h */
+build_append_int_noprefix(table_data, 0x10, 1);
+/* virtualization flags */
+build_append_int_noprefix(table_data, (IVHD_HT_TUNEN |
+ IVHD_PPRSUP | IVHD_IOTLBSUP | IVHD_PREFSUP), 1);
+/* ivhd length */
+build_append_int_noprefix(table_data, 0x20, 2);
+/* iommu device id */
+build_append_int_noprefix(table_data, PCI_DEVICE_ID_RD890_IOMMU, 2);
+/* offset of capability registers */
+build_append_int_noprefix(table_data, s->capab_offset, 2);
+/* mmio base register */
+build_append_int_noprefix(table_data, s->mmio.addr, 8);
+/* pci segment */
+build_append_int_noprefix(table_data, 0, 2);
+/* interrupt numbers */
+build_append_int_noprefix(table_data, 0, 2);
+/* feature reporting */
+build_append_int_noprefix(table_data, (IVHD_EFR_GTSUP |
+IVHD_EFR_HATS | IVHD_EFR_GATS), 4);
+/* Add device flags here
+ *   These are 4-byte device entries currently reporting the range of
+ *   devices 00h - h; all devices
+ *   Device setting affecting all devices should be made here
+ *
+ *   Refer to
+ *   (http://developer.amd.com/wordpress/media/2012/10/488821.pdf)
+ *   Table 95
+ */
+/* start of device range, 4-byte entries */
+  

[Qemu-devel] [V9 1/4] hw/i386: Introduce AMD IOMMU

2016-04-29 Thread David Kiarie
Add AMD IOMMU emulation to QEMU, in addition to the Intel IOMMU.
The IOMMU does basic translation and error checking, and has a
minimal IOTLB implementation.

Signed-off-by: David Kiarie 
---
 hw/i386/Makefile.objs |1 +
 hw/i386/amd_iommu.c   | 1426 +
 hw/i386/amd_iommu.h   |  398 ++
 include/hw/pci/pci.h  |2 +
 4 files changed, 1827 insertions(+)
 create mode 100644 hw/i386/amd_iommu.c
 create mode 100644 hw/i386/amd_iommu.h

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index b52d5b8..2f1a265 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -3,6 +3,7 @@ obj-y += multiboot.o
 obj-y += pc.o pc_piix.o pc_q35.o
 obj-y += pc_sysfw.o
 obj-y += intel_iommu.o
+obj-y += amd_iommu.o
 obj-$(CONFIG_XEN) += ../xenpv/ xen/
 
 obj-y += kvmvapic.o
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
new file mode 100644
index 000..eea4fac
--- /dev/null
+++ b/hw/i386/amd_iommu.c
@@ -0,0 +1,1426 @@
+/*
+ * QEMU emulation of AMD IOMMU (AMD-Vi)
+ *
+ * Copyright (C) 2011 Eduard - Gabriel Munteanu
+ * Copyright (C) 2015 David Kiarie, 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ *
+ * Cache implementation inspired by hw/i386/intel_iommu.c
+ *
+ */
+#include "qemu/osdep.h"
+#include "hw/i386/amd_iommu.h"
+
+//#define DEBUG_AMD_IOMMU
+#ifdef DEBUG_AMD_IOMMU
+enum {
+DEBUG_GENERAL, DEBUG_CAPAB, DEBUG_MMIO, DEBUG_ELOG,
+DEBUG_CACHE, DEBUG_COMMAND, DEBUG_MMU, DEBUG_CUSTOM
+};
+
+#define IOMMU_DBGBIT(x)   (1 << DEBUG_##x)
+static int iommu_dbgflags = IOMMU_DBGBIT(CUSTOM) | IOMMU_DBGBIT(MMIO);
+
+#define IOMMU_DPRINTF(what, fmt, ...) do { \
+if (iommu_dbgflags & IOMMU_DBGBIT(what)) { \
+fprintf(stderr, "(amd-iommu)%s: " fmt "\n", __func__, \
+## __VA_ARGS__); } \
+} while (0)
+#else
+#define IOMMU_DPRINTF(what, fmt, ...) do {} while (0)
+#endif
+
+#define ENCODE_EVENT(devid, info, addr, rshift) do { \
+*(uint16_t *)[0] = devid; \
+*(uint8_t *)[3]  = info;  \
+*(uint64_t *)[4] = rshift ? cpu_to_le64(addr) :\
+   cpu_to_le64(addr) >> rshift; \
+} while (0)
+
+typedef struct AMDIOMMUAddressSpace {
+uint8_t bus_num;/* bus number   */
+uint8_t devfn;  /* device function  */
+AMDIOMMUState *iommu_state; /* IOMMU - one per machine  */
+MemoryRegion iommu; /* Device's iommu region*/
+AddressSpace as;/* device's corresponding address space */
+} AMDIOMMUAddressSpace;
+
+/* IOMMU cache entry */
+typedef struct IOMMUIOTLBEntry {
+uint64_t gfn;
+uint16_t domid;
+uint64_t devid;
+uint64_t perms;
+uint64_t translated_addr;
+} IOMMUIOTLBEntry;
+
+/* configure MMIO registers at startup/reset */
+static void amd_iommu_set_quad(AMDIOMMUState *s, hwaddr addr, uint64_t val,
+   uint64_t romask, uint64_t w1cmask)
+{
+stq_le_p(&s->mmior[addr], val);
+stq_le_p(&s->romask[addr], romask);
+stq_le_p(&s->w1cmask[addr], w1cmask);
+}
+
+static uint16_t amd_iommu_readw(AMDIOMMUState *s, hwaddr addr)
+{
+return lduw_le_p(&s->mmior[addr]);
+}
+
+static uint32_t amd_iommu_readl(AMDIOMMUState *s, hwaddr addr)
+{
+return ldl_le_p(&s->mmior[addr]);
+}
+
+static uint64_t amd_iommu_readq(AMDIOMMUState *s, hwaddr addr)
+{
+return ldq_le_p(&s->mmior[addr]);
+}
+
+/* internal write */
+static void amd_iommu_writeq_raw(AMDIOMMUState *s, uint64_t val, hwaddr addr)
+{
+stq_le_p(&s->mmior[addr], val);
+}
+
+/* external write */
+static void amd_iommu_writew(AMDIOMMUState *s, hwaddr addr, uint16_t val)
+{
+uint16_t romask = lduw_le_p(&s->romask[addr]);
+uint16_t w1cmask = lduw_le_p(&s->w1cmask[addr]);
+uint16_t oldval = lduw_le_p(&s->mmior[addr]);
+stw_le_p(&s->mmior[addr], (val & ~(val & w1cmask)) | (romask & oldval));
+}
+
+static void amd_iommu_writel(AMDIOMMUState *s, hwaddr addr, uint32_t val)
+{
+uint32_t romask = ldl_le_p(&s->romask[addr]);
+uint32_t w1cmask = ldl_le_p(&s->w1cmask[addr]);
+uint32_t oldval = ldl_le_p(&s->mmior[addr]);
+stl_le_p(&s->mmior[addr], (val & ~(val & w1cmask)) | (romask & oldval));
+}
+
+static void amd_iommu_writeq(AMDIOMMUState *s, hwaddr addr, uint64_t val)
+{
+uint64_t romask = ldq_le_p(&s->romask[addr]);
+

[Qemu-devel] [V9 0/4] AMD IOMMU

2016-04-29 Thread David Kiarie
This series adds AMD IOMMU support to QEMU. This is the 9th version.

In this series I have (hopefully) addressed all the comments made on the
previous version.
I have also tested PCI passthrough successfully with the 'ac97' device;
more devices remain to be tested.


David Kiarie (4):
  hw/i386: Introduce AMD IOMMU
  hw/i386: ACPI table for AMD IOMMU
  hw/core: Add AMD IOMMU to machine properties
  hw/pci-host: Emulate AMD IOMMU

 hw/acpi/aml-build.c   |2 +-
 hw/acpi/core.c|   13 -
 hw/core/machine.c |   33 +-
 hw/i386/Makefile.objs |1 +
 hw/i386/acpi-build.c  |   93 ++-
 hw/i386/amd_iommu.c   | 1426 +
 hw/i386/amd_iommu.h   |  398 
 hw/pci-host/q35.c |   25 +-
 include/hw/acpi/acpi-defs.h   |   14 +
 include/hw/acpi/acpi.h|   16 +
 include/hw/acpi/aml-build.h   |1 +
 include/hw/boards.h   |7 +
 include/hw/i386/intel_iommu.h |1 +
 include/hw/pci/pci.h  |2 +
 qemu-options.hx   |7 +-
 util/qemu-config.c|8 +-
 16 files changed, 2012 insertions(+), 35 deletions(-)
 create mode 100644 hw/i386/amd_iommu.c
 create mode 100644 hw/i386/amd_iommu.h

-- 
2.1.4




[Qemu-devel] [PATCH v4 01/14] block: Allow BDRV_REQ_FUA through blk_pwrite()

2016-04-29 Thread Eric Blake
We have several block drivers that understand BDRV_REQ_FUA,
and emulate it in the block layer for the rest by a full flush.
But without a way to actually request BDRV_REQ_FUA during a
pass-through blk_pwrite(), FUA-aware block drivers like NBD are
forced to repeat the emulation logic of a full flush regardless
of whether the backend they are writing to could do it more
efficiently.

This patch just wires up a flags argument; followup patches
will actually make use of it in the NBD driver and in qemu-io.
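
The difference between passing FUA through and emulating it can be sketched in isolation. The following is only a toy model under stated assumptions: `ToyBackend`, `toy_pwrite()` and `TOY_REQ_FUA` are hypothetical names invented for illustration and are not QEMU code. It shows why a caller that can hand the flag down lets a FUA-capable backend avoid a full flush.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for BDRV_REQ_FUA; the real value is not shown here. */
#define TOY_REQ_FUA 0x1u

/* Toy backend: a small in-memory disk plus counters for observing behavior. */
typedef struct ToyBackend {
    uint8_t data[4096];
    bool native_fua;   /* backend understands FUA itself */
    int fua_writes;    /* writes completed with native FUA */
    int full_flushes;  /* whole-device flushes used as emulation */
} ToyBackend;

/* Byte-based write: returns the byte count on success, -1 on a bad range. */
static int toy_pwrite(ToyBackend *be, int64_t offset, const void *buf,
                      int count, unsigned flags)
{
    if (offset < 0 || count < 0 ||
        offset > (int64_t)sizeof(be->data) - count) {
        return -1;
    }
    memcpy(be->data + offset, buf, (size_t)count);
    if (flags & TOY_REQ_FUA) {
        if (be->native_fua) {
            be->fua_writes++;    /* only this write is forced to stable media */
        } else {
            be->full_flushes++;  /* emulation: flush everything */
        }
    }
    return count;
}
```

With a flags argument on the write path, the decision between the two branches is made once, in the layer that knows what the backend supports, instead of being repeated by every caller.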

Signed-off-by: Eric Blake 
Acked-by: Denis V. Lunev 
---
 include/sysemu/block-backend.h |  3 ++-
 block/block-backend.c  |  6 --
 block/crypto.c |  2 +-
 block/parallels.c  |  2 +-
 block/qcow.c   |  8 
 block/qcow2.c  |  4 ++--
 block/qed.c|  6 +++---
 block/sheepdog.c   |  2 +-
 block/vdi.c|  4 ++--
 block/vhdx.c   |  5 +++--
 block/vmdk.c   | 10 +-
 block/vpc.c| 10 +-
 hw/nvram/spapr_nvram.c |  4 ++--
 nbd/server.c   |  2 +-
 qemu-io-cmds.c |  2 +-
 15 files changed, 37 insertions(+), 33 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index c62b6fe..6991b26 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -102,7 +102,8 @@ BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t 
sector_num,
  int nb_sectors, BdrvRequestFlags flags,
  BlockCompletionFunc *cb, void *opaque);
 int blk_pread(BlockBackend *blk, int64_t offset, void *buf, int count);
-int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count);
+int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count,
+   BdrvRequestFlags flags);
 int64_t blk_getlength(BlockBackend *blk);
 void blk_get_geometry(BlockBackend *blk, uint64_t *nb_sectors_ptr);
 int64_t blk_nb_sectors(BlockBackend *blk);
diff --git a/block/block-backend.c b/block/block-backend.c
index a7623e8..96c1d7c 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -953,9 +953,11 @@ int blk_pread(BlockBackend *blk, int64_t offset, void 
*buf, int count)
 return count;
 }

-int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count)
+int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count,
+   BdrvRequestFlags flags)
 {
-int ret = blk_prw(blk, offset, (void*) buf, count, blk_write_entry, 0);
+int ret = blk_prw(blk, offset, (void *) buf, count, blk_write_entry,
+  flags);
 if (ret < 0) {
 return ret;
 }
diff --git a/block/crypto.c b/block/crypto.c
index 1903e84..32ba17c 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -91,7 +91,7 @@ static ssize_t block_crypto_write_func(QCryptoBlock *block,
 struct BlockCryptoCreateData *data = opaque;
 ssize_t ret;

-ret = blk_pwrite(data->blk, offset, buf, buflen);
+ret = blk_pwrite(data->blk, offset, buf, buflen, 0);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Could not write encryption header");
 return ret;
diff --git a/block/parallels.c b/block/parallels.c
index 324ed43..2d8bc87 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -512,7 +512,7 @@ static int parallels_create(const char *filename, QemuOpts 
*opts, Error **errp)
 memset(tmp, 0, sizeof(tmp));
 memcpy(tmp, &header, sizeof(header));

-ret = blk_pwrite(file, 0, tmp, BDRV_SECTOR_SIZE);
+ret = blk_pwrite(file, 0, tmp, BDRV_SECTOR_SIZE, 0);
 if (ret < 0) {
 goto exit;
 }
diff --git a/block/qcow.c b/block/qcow.c
index 60ddb12..d6dc1b0 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -853,14 +853,14 @@ static int qcow_create(const char *filename, QemuOpts 
*opts, Error **errp)
 }

 /* write all the data */
-ret = blk_pwrite(qcow_blk, 0, &header, sizeof(header));
+ret = blk_pwrite(qcow_blk, 0, &header, sizeof(header), 0);
 if (ret != sizeof(header)) {
 goto exit;
 }

 if (backing_file) {
 ret = blk_pwrite(qcow_blk, sizeof(header),
-backing_file, backing_filename_len);
+ backing_file, backing_filename_len, 0);
 if (ret != backing_filename_len) {
 goto exit;
 }
@@ -869,8 +869,8 @@ static int qcow_create(const char *filename, QemuOpts 
*opts, Error **errp)
 tmp = g_malloc0(BDRV_SECTOR_SIZE);
 for (i = 0; i < ((sizeof(uint64_t)*l1_size + BDRV_SECTOR_SIZE - 1)/
 BDRV_SECTOR_SIZE); i++) {
-ret = blk_pwrite(qcow_blk, header_size +
-BDRV_SECTOR_SIZE*i, tmp, BDRV_SECTOR_SIZE);
+ret = blk_pwrite(qcow_blk, header_size + BDRV_SECTOR_SIZE * i,
+ tmp, BDRV_SECTOR_SIZE, 0);
 if (ret != BDRV_SECTOR_SIZE) 

[Qemu-devel] [PATCH v4 14/14] block: Kill blk_write(), blk_read()

2016-04-29 Thread Eric Blake
Now that there are no remaining clients, we can drop these
functions, to ensure that all future users get the byte-based
interfaces.  Sadly, there are still remaining sector-based
interfaces, such as blk_aio_writev; those will have to wait
for another day.

Signed-off-by: Eric Blake 
---
 include/sysemu/block-backend.h |  4 
 block/block-backend.c  | 25 -
 2 files changed, 29 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 1246699..bf04086 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -90,12 +90,8 @@ void blk_attach_dev_nofail(BlockBackend *blk, void *dev);
 void blk_detach_dev(BlockBackend *blk, void *dev);
 void *blk_get_attached_dev(BlockBackend *blk);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
-int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
- int nb_sectors);
 int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
   int count);
-int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
-  int nb_sectors);
 int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
  int count, BdrvRequestFlags flags);
 BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t sector_num,
diff --git a/block/block-backend.c b/block/block-backend.c
index 71133b2..e6ded54 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -772,24 +772,6 @@ static int blk_prw(BlockBackend *blk, int64_t offset, 
uint8_t *buf,
 return rwco.ret;
 }

-static int blk_rw(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
-  int nb_sectors, CoroutineEntry co_entry,
-  BdrvRequestFlags flags)
-{
-if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
-return -EINVAL;
-}
-
-return blk_prw(blk, sector_num << BDRV_SECTOR_BITS, buf,
-   nb_sectors << BDRV_SECTOR_BITS, co_entry, flags);
-}
-
-int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
- int nb_sectors)
-{
-return blk_rw(blk, sector_num, buf, nb_sectors, blk_read_entry, 0);
-}
-
 int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
   int count)
 {
@@ -807,13 +789,6 @@ int blk_pread_unthrottled(BlockBackend *blk, int64_t 
offset, uint8_t *buf,
 return ret;
 }

-int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
-  int nb_sectors)
-{
-return blk_rw(blk, sector_num, (uint8_t*) buf, nb_sectors,
-  blk_write_entry, 0);
-}
-
 int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
   int count, BdrvRequestFlags flags)
 {
-- 
2.5.5




[Qemu-devel] [PATCH v4 00/14] block: kill sector-based blk_write/read

2016-04-29 Thread Eric Blake
2.7 material, depends on Kevin's block-next:
git://repo.or.cz/qemu/kevin.git block-next

Previously posted as part of a larger NBD series [1] (at v3, which is
why this is v4), but these patches are independent enough to make for easier
review on their own, and are mostly orthogonal to Kevin's recent work
to also kill sector interfaces from the driver.

[1] https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg03526.html

Also available as a tag at this location:
git fetch git://repo.or.cz/qemu/ericb.git nbd-block-v4

Changes since then:
add R-b/Acks received so far
rebase to Kevin's block-next branch
patch 8: use new defines for legibility [jsnow]

001/14:[----] [--] 'block: Allow BDRV_REQ_FUA through blk_pwrite()'
002/14:[----] [--] 'fdc: Switch to byte-based block access'
003/14:[----] [--] 'nand: Switch to byte-based block access'
004/14:[----] [--] 'onenand: Switch to byte-based block access'
005/14:[----] [--] 'pflash: Switch to byte-based block access'
006/14:[----] [--] 'sd: Switch to byte-based block access'
007/14:[----] [--] 'm25p80: Switch to byte-based block access'
008/14:[0019] [FC] 'atapi: Switch to byte-based block access'
009/14:[----] [--] 'nbd: Switch to byte-based block access'
010/14:[----] [--] 'qemu-img: Switch to byte-based block access'
011/14:[----] [--] 'qemu-io: Switch to byte-based block access'
012/14:[----] [-C] 'block: Switch blk_read_unthrottled() to byte interface'
013/14:[----] [--] 'block: Switch blk_write_zeroes() to byte interface'
014/14:[----] [--] 'block: Kill blk_write(), blk_read()'

Eric Blake (14):
  block: Allow BDRV_REQ_FUA through blk_pwrite()
  fdc: Switch to byte-based block access
  nand: Switch to byte-based block access
  onenand: Switch to byte-based block access
  pflash: Switch to byte-based block access
  sd: Switch to byte-based block access
  m25p80: Switch to byte-based block access
  atapi: Switch to byte-based block access
  nbd: Switch to byte-based block access
  qemu-img: Switch to byte-based block access
  qemu-io: Switch to byte-based block access
  block: Switch blk_read_unthrottled() to byte interface
  block: Switch blk_write_zeroes() to byte interface
  block: Kill blk_write(), blk_read()

 include/sysemu/block-backend.h | 15 
 block/block-backend.c  | 47 +++---
 block/crypto.c |  2 +-
 block/parallels.c  |  5 +--
 block/qcow.c   |  8 ++---
 block/qcow2.c  |  4 +--
 block/qed.c|  6 ++--
 block/sheepdog.c   |  2 +-
 block/vdi.c|  4 +--
 block/vhdx.c   |  5 +--
 block/vmdk.c   | 10 +++---
 block/vpc.c| 10 +++---
 hw/block/fdc.c | 25 +-
 hw/block/hd-geometry.c |  2 +-
 hw/block/m25p80.c  |  3 +-
 hw/block/nand.c| 36 +---
 hw/block/onenand.c | 36 
 hw/block/pflash_cfi01.c| 12 +++
 hw/block/pflash_cfi02.c| 12 +++
 hw/ide/atapi.c | 19 ++-
 hw/nvram/spapr_nvram.c |  4 +--
 hw/sd/sd.c | 46 ++---
 nbd/server.c   |  2 +-
 qemu-img.c | 31 +++--
 qemu-io-cmds.c | 77 ++
 qemu-nbd.c | 11 +++---
 26 files changed, 185 insertions(+), 249 deletions(-)

-- 
2.5.5




[Qemu-devel] [PATCH v4 13/14] block: Switch blk_write_zeroes() to byte interface

2016-04-29 Thread Eric Blake
Sector-based blk_write() should die; convert the one-off
variant blk_write_zeroes().

Signed-off-by: Eric Blake 
Acked-by: Denis V. Lunev 
---
 include/sysemu/block-backend.h | 4 ++--
 block/block-backend.c  | 8 
 block/parallels.c  | 3 ++-
 qemu-img.c | 3 ++-
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 662a106..1246699 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -96,8 +96,8 @@ int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, 
uint8_t *buf,
   int count);
 int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
   int nb_sectors);
-int blk_write_zeroes(BlockBackend *blk, int64_t sector_num,
- int nb_sectors, BdrvRequestFlags flags);
+int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
+ int count, BdrvRequestFlags flags);
 BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t sector_num,
  int nb_sectors, BdrvRequestFlags flags,
  BlockCompletionFunc *cb, void *opaque);
diff --git a/block/block-backend.c b/block/block-backend.c
index e5a8a07..71133b2 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -814,11 +814,11 @@ int blk_write(BlockBackend *blk, int64_t sector_num, 
const uint8_t *buf,
   blk_write_entry, 0);
 }

-int blk_write_zeroes(BlockBackend *blk, int64_t sector_num,
- int nb_sectors, BdrvRequestFlags flags)
+int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
+  int count, BdrvRequestFlags flags)
 {
-return blk_rw(blk, sector_num, NULL, nb_sectors, blk_write_entry,
-  flags | BDRV_REQ_ZERO_WRITE);
+return blk_prw(blk, offset, NULL, count, blk_write_entry,
+   flags | BDRV_REQ_ZERO_WRITE);
 }

 static void error_callback_bh(void *opaque)
diff --git a/block/parallels.c b/block/parallels.c
index 2d8bc87..95bfc32 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -516,7 +516,8 @@ static int parallels_create(const char *filename, QemuOpts 
*opts, Error **errp)
 if (ret < 0) {
 goto exit;
 }
-ret = blk_write_zeroes(file, 1, bat_sectors - 1, 0);
+ret = blk_pwrite_zeroes(file, BDRV_SECTOR_SIZE,
+(bat_sectors - 1) << BDRV_SECTOR_BITS, 0);
 if (ret < 0) {
 goto exit;
 }
diff --git a/qemu-img.c b/qemu-img.c
index c19f9d4..41df87d 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1595,7 +1595,8 @@ static int convert_write(ImgConvertState *s, int64_t 
sector_num, int nb_sectors,
 if (s->has_zero_init) {
 break;
 }
-ret = blk_write_zeroes(s->target, sector_num, n, 0);
+ret = blk_pwrite_zeroes(s->target, sector_num << BDRV_SECTOR_BITS,
+n << BDRV_SECTOR_BITS, 0);
 if (ret < 0) {
 return ret;
 }
-- 
2.5.5




[Qemu-devel] [PATCH v4 12/14] block: Switch blk_read_unthrottled() to byte interface

2016-04-29 Thread Eric Blake
Sector-based blk_read() should die; convert the one-off
variant blk_read_unthrottled().

Signed-off-by: Eric Blake 
---
 include/sysemu/block-backend.h | 4 ++--
 block/block-backend.c  | 8 
 hw/block/hd-geometry.c | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 6991b26..662a106 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -92,8 +92,8 @@ void *blk_get_attached_dev(BlockBackend *blk);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
 int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
  int nb_sectors);
-int blk_read_unthrottled(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
- int nb_sectors);
+int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
+  int count);
 int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
   int nb_sectors);
 int blk_write_zeroes(BlockBackend *blk, int64_t sector_num,
diff --git a/block/block-backend.c b/block/block-backend.c
index 96c1d7c..e5a8a07 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -790,19 +790,19 @@ int blk_read(BlockBackend *blk, int64_t sector_num, 
uint8_t *buf,
 return blk_rw(blk, sector_num, buf, nb_sectors, blk_read_entry, 0);
 }

-int blk_read_unthrottled(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
- int nb_sectors)
+int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
+  int count)
 {
 BlockDriverState *bs = blk_bs(blk);
 int ret;

-ret = blk_check_request(blk, sector_num, nb_sectors);
+ret = blk_check_byte_request(blk, offset, count);
 if (ret < 0) {
 return ret;
 }

 bdrv_no_throttling_begin(bs);
-ret = blk_read(blk, sector_num, buf, nb_sectors);
+ret = blk_pread(blk, offset, buf, count);
 bdrv_no_throttling_end(bs);
 return ret;
 }
diff --git a/hw/block/hd-geometry.c b/hw/block/hd-geometry.c
index 6d02192..d388f13 100644
--- a/hw/block/hd-geometry.c
+++ b/hw/block/hd-geometry.c
@@ -66,7 +66,7 @@ static int guess_disk_lchs(BlockBackend *blk,
  * but also in async I/O mode. So the I/O throttling function has to
  * be disabled temporarily here, not permanently.
  */
-if (blk_read_unthrottled(blk, 0, buf, 1) < 0) {
+if (blk_pread_unthrottled(blk, 0, buf, BDRV_SECTOR_SIZE) < 0) {
 return -1;
 }
 /* test msdos magic */
-- 
2.5.5




[Qemu-devel] [PATCH v4 04/14] onenand: Switch to byte-based block access

2016-04-29 Thread Eric Blake
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().
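
The conversion pattern used throughout this series, scaling a sector index or count to bytes with a shift, can be sketched on its own. This is a minimal illustration, not QEMU code: `TOY_SECTOR_BITS` and the two helpers are hypothetical names, and the value 9 is assumed from the `* 512` and `<< 9` expressions that the hunks replace. Where the operand is a plain `int`, C evaluates the shift in `int` before any assignment widens it, so casting first is the safer pattern for large sector numbers.

```c
#include <stdint.h>

/* Local stand-ins for BDRV_SECTOR_BITS / BDRV_SECTOR_SIZE (512-byte sectors). */
#define TOY_SECTOR_BITS 9
#define TOY_SECTOR_SIZE (1 << TOY_SECTOR_BITS)

/*
 * Convert a non-negative sector index to a byte offset.
 * The cast happens before the shift, so the arithmetic is done in 64 bits.
 */
static int64_t toy_sector_to_offset(int sec)
{
    return (int64_t)sec << TOY_SECTOR_BITS;
}

/* Convert a non-negative sector count to a byte count, also in 64 bits. */
static int64_t toy_sectors_to_bytes(int nb_sectors)
{
    return (int64_t)nb_sectors << TOY_SECTOR_BITS;
}
```

For example, sector 0x500000 maps to byte offset 2684354560, which no longer fits in a signed 32-bit integer; that is the case the early cast protects against.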

Signed-off-by: Eric Blake 

---
Not compile tested - I'm not sure what else I'd need in my environment
to actually test this one.  I have:
Fedora 23, dnf builddep qemu
./configure --enable-kvm --enable-system --disable-user 
--target-list=x86_64-softmmu,ppc64-softmmu --enable-debug
---
 hw/block/onenand.c | 36 ++--
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/hw/block/onenand.c b/hw/block/onenand.c
index 883f4b1..3d19b0c 100644
--- a/hw/block/onenand.c
+++ b/hw/block/onenand.c
@@ -224,7 +224,8 @@ static void onenand_reset(OneNANDState *s, int cold)
 /* Lock the whole flash */
 memset(s->blockwp, ONEN_LOCK_LOCKED, s->blocks);

-if (s->blk_cur && blk_read(s->blk_cur, 0, s->boot[0], 8) < 0) {
+if (s->blk_cur && blk_pread(s->blk_cur, 0, s->boot[0],
+8 << BDRV_SECTOR_BITS) < 0) {
 hw_error("%s: Loading the BootRAM failed.\n", __func__);
 }
 }
@@ -241,7 +242,8 @@ static inline int onenand_load_main(OneNANDState *s, int 
sec, int secn,
 void *dest)
 {
 if (s->blk_cur) {
-return blk_read(s->blk_cur, sec, dest, secn) < 0;
+return blk_pread(s->blk_cur, sec << BDRV_SECTOR_BITS, dest,
+ secn << BDRV_SECTOR_BITS) < 0;
 } else if (sec + secn > s->secs_cur) {
 return 1;
 }
@@ -257,19 +259,20 @@ static inline int onenand_prog_main(OneNANDState *s, int 
sec, int secn,
 int result = 0;

 if (secn > 0) {
-uint32_t size = (uint32_t)secn * 512;
+uint32_t size = (uint32_t)secn << BDRV_SECTOR_BITS;
+int64_t offset = sec << BDRV_SECTOR_BITS;
 const uint8_t *sp = (const uint8_t *)src;
 uint8_t *dp = 0;
 if (s->blk_cur) {
 dp = g_malloc(size);
-if (!dp || blk_read(s->blk_cur, sec, dp, secn) < 0) {
+if (!dp || blk_pread(s->blk_cur, offset, dp, size) < 0) {
 result = 1;
 }
 } else {
 if (sec + secn > s->secs_cur) {
 result = 1;
 } else {
-dp = (uint8_t *)s->current + (sec << 9);
+dp = (uint8_t *)s->current + offset;
 }
 }
 if (!result) {
@@ -278,7 +281,7 @@ static inline int onenand_prog_main(OneNANDState *s, int 
sec, int secn,
 dp[i] &= sp[i];
 }
 if (s->blk_cur) {
-result = blk_write(s->blk_cur, sec, dp, secn) < 0;
+result = blk_pwrite(s->blk_cur, offset, dp, size, 0) < 0;
 }
 }
 if (dp && s->blk_cur) {
@@ -295,7 +298,8 @@ static inline int onenand_load_spare(OneNANDState *s, int 
sec, int secn,
 uint8_t buf[512];

 if (s->blk_cur) {
-if (blk_read(s->blk_cur, s->secs_cur + (sec >> 5), buf, 1) < 0) {
+int32_t offset = (s->secs_cur + (sec >> 5)) << BDRV_SECTOR_BITS;
+if (blk_pread(s->blk_cur, offset, buf, BDRV_SECTOR_SIZE) < 0) {
 return 1;
 }
 memcpy(dest, buf + ((sec & 31) << 4), secn << 4);
@@ -304,7 +308,7 @@ static inline int onenand_load_spare(OneNANDState *s, int 
sec, int secn,
 } else {
 memcpy(dest, s->current + (s->secs_cur << 9) + (sec << 4), secn << 4);
 }
- 
+
 return 0;
 }

@@ -315,10 +319,11 @@ static inline int onenand_prog_spare(OneNANDState *s, int 
sec, int secn,
 if (secn > 0) {
 const uint8_t *sp = (const uint8_t *)src;
 uint8_t *dp = 0, *dpp = 0;
+uint64_t offset = (s->secs_cur + (sec >> 5)) << BDRV_SECTOR_BITS;
 if (s->blk_cur) {
 dp = g_malloc(512);
 if (!dp
-|| blk_read(s->blk_cur, s->secs_cur + (sec >> 5), dp, 1) < 0) {
+|| blk_pread(s->blk_cur, offset, dp, BDRV_SECTOR_SIZE) < 0) {
 result = 1;
 } else {
 dpp = dp + ((sec & 31) << 4);
@@ -336,8 +341,8 @@ static inline int onenand_prog_spare(OneNANDState *s, int 
sec, int secn,
 dpp[i] &= sp[i];
 }
 if (s->blk_cur) {
-result = blk_write(s->blk_cur, s->secs_cur + (sec >> 5),
-   dp, 1) < 0;
+result = blk_pwrite(s->blk_cur, offset, dp,
+BDRV_SECTOR_SIZE, 0) < 0;
 }
 }
 g_free(dp);
@@ -355,14 +360,17 @@ static inline int onenand_erase(OneNANDState *s, int sec, 
int num)
 for (; num > 0; num--, sec++) {
 if (s->blk_cur) {
 int erasesec = s->secs_cur + (sec >> 5);
-if (blk_write(s->blk_cur, sec, blankbuf, 1) < 0) {
+if (blk_pwrite(s->blk_cur, sec << BDRV_SECTOR_BITS, blankbuf,
+   BDRV_SECTOR_SIZE, 0) < 0) {
 goto fail;
 }
-  

[Qemu-devel] [PATCH v4 10/14] qemu-img: Switch to byte-based block access

2016-04-29 Thread Eric Blake
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Signed-off-by: Eric Blake 
---
 qemu-img.c | 28 +++-
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 46f2a6d..c19f9d4 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1088,7 +1088,8 @@ static int check_empty_sectors(BlockBackend *blk, int64_t 
sect_num,
uint8_t *buffer, bool quiet)
 {
 int pnum, ret = 0;
-ret = blk_read(blk, sect_num, buffer, sect_count);
+ret = blk_pread(blk, sect_num << BDRV_SECTOR_BITS, buffer,
+sect_count << BDRV_SECTOR_BITS);
 if (ret < 0) {
 error_report("Error while reading offset %" PRId64 " of %s: %s",
  sectors_to_bytes(sect_num), filename, strerror(-ret));
@@ -1301,7 +1302,8 @@ static int img_compare(int argc, char **argv)
 nb_sectors = MIN(pnum1, pnum2);
 } else if (allocated1 == allocated2) {
 if (allocated1) {
-ret = blk_read(blk1, sector_num, buf1, nb_sectors);
+ret = blk_pread(blk1, sector_num << BDRV_SECTOR_BITS, buf1,
+nb_sectors << BDRV_SECTOR_BITS);
 if (ret < 0) {
 error_report("Error while reading offset %" PRId64 " of 
%s:"
  " %s", sectors_to_bytes(sector_num), 
filename1,
@@ -1309,7 +1311,8 @@ static int img_compare(int argc, char **argv)
 ret = 4;
 goto out;
 }
-ret = blk_read(blk2, sector_num, buf2, nb_sectors);
+ret = blk_pread(blk2, sector_num << BDRV_SECTOR_BITS, buf2,
+nb_sectors << BDRV_SECTOR_BITS);
 if (ret < 0) {
 error_report("Error while reading offset %" PRId64
  " of %s: %s", sectors_to_bytes(sector_num),
@@ -1522,7 +1525,9 @@ static int convert_read(ImgConvertState *s, int64_t 
sector_num, int nb_sectors,
 bs_sectors = s->src_sectors[s->src_cur];

 n = MIN(nb_sectors, bs_sectors - (sector_num - s->src_cur_offset));
-ret = blk_read(blk, sector_num - s->src_cur_offset, buf, n);
+ret = blk_pread(blk,
+(sector_num - s->src_cur_offset) << BDRV_SECTOR_BITS,
+buf, n << BDRV_SECTOR_BITS);
 if (ret < 0) {
 return ret;
 }
@@ -1577,7 +1582,8 @@ static int convert_write(ImgConvertState *s, int64_t 
sector_num, int nb_sectors,
 if (!s->min_sparse ||
 is_allocated_sectors_min(buf, n, &n, s->min_sparse))
 {
-ret = blk_write(s->target, sector_num, buf, n);
+ret = blk_pwrite(s->target, sector_num << BDRV_SECTOR_BITS,
+ buf, n << BDRV_SECTOR_BITS, 0);
 if (ret < 0) {
 return ret;
 }
@@ -3023,7 +3029,8 @@ static int img_rebase(int argc, char **argv)
 n = old_backing_num_sectors - sector;
 }

-ret = blk_read(blk_old_backing, sector, buf_old, n);
+ret = blk_pread(blk_old_backing, sector << BDRV_SECTOR_BITS,
+buf_old, n << BDRV_SECTOR_BITS);
 if (ret < 0) {
 error_report("error while reading from old backing file");
 goto out;
@@ -3037,7 +3044,8 @@ static int img_rebase(int argc, char **argv)
 n = new_backing_num_sectors - sector;
 }

-ret = blk_read(blk_new_backing, sector, buf_new, n);
+ret = blk_pread(blk_new_backing, sector << BDRV_SECTOR_BITS,
+buf_new, n << BDRV_SECTOR_BITS);
 if (ret < 0) {
 error_report("error while reading from new backing file");
 goto out;
@@ -3053,8 +3061,10 @@ static int img_rebase(int argc, char **argv)
 if (compare_sectors(buf_old + written * 512,
 buf_new + written * 512, n - written, &pnum))
 {
-ret = blk_write(blk, sector + written,
-buf_old + written * 512, pnum);
+ret = blk_pwrite(blk,
+ (sector + written) << BDRV_SECTOR_BITS,
+ buf_old + written * 512,
+ pnum << BDRV_SECTOR_BITS, 0);
 if (ret < 0) {
 error_report("Error while writing to COW image: %s",
 strerror(-ret));
-- 
2.5.5




[Qemu-devel] [PATCH v4 11/14] qemu-io: Switch to byte-based block access

2016-04-29 Thread Eric Blake
blk_write() and blk_read() are now very simple wrappers around
blk_pwrite() and blk_pread().  There's no reason to require
the user to pass in aligned numbers.  Keep 'read -p' and
'write -p' so that I don't have to hunt down and update all
users of qemu-io, but make the default 'read' and 'write' now
do the same behavior that used to require -p.

Signed-off-by: Eric Blake 
---
 qemu-io-cmds.c | 75 +-
 1 file changed, 16 insertions(+), 59 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index e26e543..4184fb8 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -419,40 +419,6 @@ fail:
 return buf;
 }

-static int do_read(BlockBackend *blk, char *buf, int64_t offset, int64_t count,
-   int64_t *total)
-{
-int ret;
-
-if (count >> 9 > INT_MAX) {
-return -ERANGE;
-}
-
-ret = blk_read(blk, offset >> 9, (uint8_t *)buf, count >> 9);
-if (ret < 0) {
-return ret;
-}
-*total = count;
-return 1;
-}
-
-static int do_write(BlockBackend *blk, char *buf, int64_t offset, int64_t 
count,
-int64_t *total)
-{
-int ret;
-
-if (count >> 9 > INT_MAX) {
-return -ERANGE;
-}
-
-ret = blk_write(blk, offset >> 9, (uint8_t *)buf, count >> 9);
-if (ret < 0) {
-return ret;
-}
-*total = count;
-return 1;
-}
-
 static int do_pread(BlockBackend *blk, char *buf, int64_t offset,
 int64_t count, int64_t *total)
 {
@@ -671,7 +637,7 @@ static void read_help(void)
 " -b, -- read from the VM state rather than the virtual disk\n"
 " -C, -- report statistics in a machine parsable format\n"
 " -l, -- length for pattern verification (only with -P)\n"
-" -p, -- use blk_pread to read the file\n"
+" -p, -- ignored for back-compat\n"
 " -P, -- use a pattern to verify read data\n"
 " -q, -- quiet mode, do not show I/O statistics\n"
 " -s, -- start offset for pattern verification (only with -P)\n"
@@ -687,7 +653,7 @@ static const cmdinfo_t read_cmd = {
 .cfunc  = read_f,
 .argmin = 2,
 .argmax = -1,
-.args   = "[-abCpqv] [-P pattern [-s off] [-l len]] off len",
+.args   = "[-abCqv] [-P pattern [-s off] [-l len]] off len",
 .oneline= "reads a number of bytes at a specified offset",
 .help   = read_help,
 };
@@ -695,7 +661,7 @@ static const cmdinfo_t read_cmd = {
 static int read_f(BlockBackend *blk, int argc, char **argv)
 {
 struct timeval t1, t2;
-int Cflag = 0, pflag = 0, qflag = 0, vflag = 0;
+int Cflag = 0, qflag = 0, vflag = 0;
 int Pflag = 0, sflag = 0, lflag = 0, bflag = 0;
 int c, cnt;
 char *buf;
@@ -723,7 +689,7 @@ static int read_f(BlockBackend *blk, int argc, char **argv)
 }
 break;
 case 'p':
-pflag = 1;
+/* Ignored for back-compat */
 break;
 case 'P':
 Pflag = 1;
@@ -755,11 +721,6 @@ static int read_f(BlockBackend *blk, int argc, char **argv)
 return qemuio_command_usage(&read_cmd);
 }

-if (bflag && pflag) {
-printf("-b and -p cannot be specified at the same time\n");
-return 0;
-}
-
 offset = cvtnum(argv[optind]);
 if (offset < 0) {
 print_cvtnum_err(offset, argv[optind]);
@@ -790,7 +751,7 @@ static int read_f(BlockBackend *blk, int argc, char **argv)
 return 0;
 }

-if (!pflag) {
+if (bflag) {
 if (offset & 0x1ff) {
 printf("offset %" PRId64 " is not sector aligned\n",
offset);
@@ -806,12 +767,10 @@ static int read_f(BlockBackend *blk, int argc, char 
**argv)
 buf = qemu_io_alloc(blk, count, 0xab);

 gettimeofday(&t1, NULL);
-if (pflag) {
-cnt = do_pread(blk, buf, offset, count, &total);
-} else if (bflag) {
+if (bflag) {
 cnt = do_load_vmstate(blk, buf, offset, count, &total);
 } else {
-cnt = do_read(blk, buf, offset, count, &total);
+cnt = do_pread(blk, buf, offset, count, &total);
 }
 gettimeofday(&t2, NULL);

@@ -991,7 +950,7 @@ static void write_help(void)
 " filled with a set pattern (0xcdcdcdcd).\n"
 " -b, -- write to the VM state rather than the virtual disk\n"
 " -c, -- write compressed data with blk_write_compressed\n"
-" -p, -- use blk_pwrite to write the file\n"
+" -p, -- ignored for back-compat\n"
 " -P, -- use different pattern to fill file\n"
 " -C, -- report statistics in a machine parsable format\n"
 " -q, -- quiet mode, do not show I/O statistics\n"
@@ -1007,7 +966,7 @@ static const cmdinfo_t write_cmd = {
 .cfunc  = write_f,
 .argmin = 2,
 .argmax = -1,
-.args   = "[-bcCpqz] [-P pattern ] off len",
+.args   = "[-bcCqz] [-P pattern ] off len",
 .oneline= "writes a number of bytes at a specified offset",
 .help   = write_help,
 };
@@ -1015,7 +974,7 @@ static const cmdinfo_t write_cmd = {
 static int write_f(BlockBackend *blk, 

[Qemu-devel] [PATCH v4 05/14] pflash: Switch to byte-based block access

2016-04-29 Thread Eric Blake
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().
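
The pflash_update() hunks below keep writing whole sectors, but compute the widened range in bytes rather than in sector units. A minimal sketch of that widening follows; the `TOY_` macros and `toy_widen_to_sectors()` are local stand-ins written for illustration (assumed to behave like the QEMU_ALIGN_DOWN/QEMU_ALIGN_UP macros the patch uses), not the QEMU definitions themselves.

```c
#include <stdint.h>

#define TOY_SECTOR_SIZE 512

/* Round n down / up to a multiple of m (m > 0, n >= 0). */
#define TOY_ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define TOY_ALIGN_UP(n, m) TOY_ALIGN_DOWN((n) + (m) - 1, (m))

/* A byte range described by its start offset and length. */
typedef struct ToyRange {
    int64_t start;
    int64_t len;
} ToyRange;

/*
 * Widen [offset, offset + size) outward so that both ends fall on
 * sector boundaries, as needed when the backing store is updated in
 * whole sectors.
 */
static ToyRange toy_widen_to_sectors(int64_t offset, int64_t size)
{
    int64_t end = TOY_ALIGN_UP(offset + size, TOY_SECTOR_SIZE);
    ToyRange r;

    r.start = TOY_ALIGN_DOWN(offset, TOY_SECTOR_SIZE);
    r.len = end - r.start;
    return r;
}
```

A 4-byte update at offset 510 straddles a sector boundary, so it widens to the 1024-byte range starting at 0; an update that is already aligned is returned unchanged.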

Signed-off-by: Eric Blake 
---
 hw/block/pflash_cfi01.c | 12 ++--
 hw/block/pflash_cfi02.c | 12 ++--
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 106a775..3a1f85d 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -413,11 +413,11 @@ static void pflash_update(pflash_t *pfl, int offset,
 int offset_end;
 if (pfl->blk) {
 offset_end = offset + size;
-/* round to sectors */
-offset = offset >> 9;
-offset_end = (offset_end + 511) >> 9;
-blk_write(pfl->blk, offset, pfl->storage + (offset << 9),
-  offset_end - offset);
+/* widen to sector boundaries */
+offset = QEMU_ALIGN_DOWN(offset, BDRV_SECTOR_SIZE);
+offset_end = QEMU_ALIGN_UP(offset_end, BDRV_SECTOR_SIZE);
+blk_pwrite(pfl->blk, offset, pfl->storage + offset,
+   offset_end - offset, 0);
 }
 }

@@ -739,7 +739,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
**errp)

 if (pfl->blk) {
 /* read the initial flash content */
-ret = blk_read(pfl->blk, 0, pfl->storage, total_len >> 9);
+ret = blk_pread(pfl->blk, 0, pfl->storage, total_len);

 if (ret < 0) {
 vmstate_unregister_ram(&pfl->mem, DEVICE(pfl));
diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
index b13172c..5f10610 100644
--- a/hw/block/pflash_cfi02.c
+++ b/hw/block/pflash_cfi02.c
@@ -253,11 +253,11 @@ static void pflash_update(pflash_t *pfl, int offset,
 int offset_end;
 if (pfl->blk) {
 offset_end = offset + size;
-/* round to sectors */
-offset = offset >> 9;
-offset_end = (offset_end + 511) >> 9;
-blk_write(pfl->blk, offset, pfl->storage + (offset << 9),
-  offset_end - offset);
+/* widen to sector boundaries */
+offset = QEMU_ALIGN_DOWN(offset, BDRV_SECTOR_SIZE);
+offset_end = QEMU_ALIGN_UP(offset_end, BDRV_SECTOR_SIZE);
+blk_pwrite(pfl->blk, offset, pfl->storage + offset,
+   offset_end - offset, 0);
 }
 }

@@ -622,7 +622,7 @@ static void pflash_cfi02_realize(DeviceState *dev, Error 
**errp)
 pfl->chip_len = chip_len;
 if (pfl->blk) {
 /* read the initial flash content */
-ret = blk_read(pfl->blk, 0, pfl->storage, chip_len >> 9);
+ret = blk_pread(pfl->blk, 0, pfl->storage, chip_len);
 if (ret < 0) {
 vmstate_unregister_ram(&pfl->orig_mem, DEVICE(pfl));
 error_setg(errp, "failed to read the initial flash content");
-- 
2.5.5




[Qemu-devel] [PATCH v4 08/14] atapi: Switch to byte-based block access

2016-04-29 Thread Eric Blake
Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.

Add new defines ATAPI_SECTOR_BITS and ATAPI_SECTOR_SIZE to
use anywhere we were previously scaling BDRV_SECTOR_* by 4,
for better legibility.
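
As a quick check of the arithmetic behind the new defines: with 512-byte block-layer sectors, adding 2 to the shift multiplies the size by 4, giving the 2048-byte CD sector. The sketch below uses hypothetical `TOY_` names and assumes BDRV_SECTOR_BITS is 9; it is an illustration, not the code from the patch.

```c
#include <stdint.h>

/* Assumed value of BDRV_SECTOR_BITS (512-byte sectors). */
#define TOY_BDRV_SECTOR_BITS 9
#define TOY_BDRV_SECTOR_SIZE (1 << TOY_BDRV_SECTOR_BITS)

/* Same construction as the ATAPI_SECTOR_* defines in the hunk below. */
#define TOY_ATAPI_SECTOR_BITS (2 + TOY_BDRV_SECTOR_BITS)
#define TOY_ATAPI_SECTOR_SIZE (1 << TOY_ATAPI_SECTOR_BITS)

/* Byte offset of a CD logical block address (lba must be non-negative). */
static int64_t toy_atapi_lba_to_offset(int lba)
{
    return (int64_t)lba << TOY_ATAPI_SECTOR_BITS;
}

/* The old sector-based form: lba << 2 sectors, each 512 bytes long. */
static int64_t toy_atapi_lba_to_offset_old(int lba)
{
    return ((int64_t)lba << 2) << TOY_BDRV_SECTOR_BITS;
}
```

Both helpers return the same offset for any non-negative lba, which is why the conversion from `blk_read(..., lba << 2, ..., 4)` to the byte-based call does not change which data is read.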

Signed-off-by: Eric Blake 

---
v4: add new defines for use in more places [jsnow]
---
 hw/ide/atapi.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 2bb606c..95056d9 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -28,6 +28,9 @@
 #include "hw/scsi/scsi.h"
 #include "sysemu/block-backend.h"

+#define ATAPI_SECTOR_BITS (2 + BDRV_SECTOR_BITS)
+#define ATAPI_SECTOR_SIZE (1 << ATAPI_SECTOR_BITS)
+
 static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret);

 static void padstr8(uint8_t *buf, int buf_size, const char *src)
@@ -111,7 +114,7 @@ cd_read_sector_sync(IDEState *s)
 {
 int ret;
 block_acct_start(blk_get_stats(s->blk), &s->acct,
- 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
+ ATAPI_SECTOR_SIZE, BLOCK_ACCT_READ);

 #ifdef DEBUG_IDE_ATAPI
 printf("cd_read_sector_sync: lba=%d\n", s->lba);
@@ -119,12 +122,12 @@ cd_read_sector_sync(IDEState *s)

 switch (s->cd_sector_size) {
 case 2048:
-ret = blk_read(s->blk, (int64_t)s->lba << 2,
-   s->io_buffer, 4);
+ret = blk_pread(s->blk, (int64_t)s->lba << ATAPI_SECTOR_BITS,
+s->io_buffer, ATAPI_SECTOR_SIZE);
 break;
 case 2352:
-ret = blk_read(s->blk, (int64_t)s->lba << 2,
-   s->io_buffer + 16, 4);
+ret = blk_pread(s->blk, (int64_t)s->lba << ATAPI_SECTOR_BITS,
+s->io_buffer + 16, ATAPI_SECTOR_SIZE);
 if (ret >= 0) {
 cd_data_to_raw(s->io_buffer, s->lba);
 }
@@ -182,7 +185,7 @@ static int cd_read_sector(IDEState *s)
 s->iov.iov_base = (s->cd_sector_size == 2352) ?
   s->io_buffer + 16 : s->io_buffer;

-s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
+s->iov.iov_len = ATAPI_SECTOR_SIZE;
 qemu_iovec_init_external(&s->qiov, &s->iov, 1);

 #ifdef DEBUG_IDE_ATAPI
@@ -190,7 +193,7 @@ static int cd_read_sector(IDEState *s)
 #endif

 block_acct_start(blk_get_stats(s->blk), &s->acct,
- 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
+ ATAPI_SECTOR_SIZE, BLOCK_ACCT_READ);

 ide_buffered_readv(s, (int64_t)s->lba << 2, &s->qiov, 4,
cd_read_sector_cb, s);
@@ -435,7 +438,7 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
 #endif

 s->bus->dma->iov.iov_base = (void *)(s->io_buffer + data_offset);
-s->bus->dma->iov.iov_len = n * 4 * 512;
+s->bus->dma->iov.iov_len = n * ATAPI_SECTOR_SIZE;
 qemu_iovec_init_external(&s->bus->dma->qiov, &s->bus->dma->iov, 1);

 s->bus->dma->aiocb = ide_buffered_readv(s, (int64_t)s->lba << 2,
-- 
2.5.5
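
A note on the arithmetic in the ATAPI patch above: a CD-ROM data sector is 2048 bytes, i.e. four 512-byte block-layer sectors, which is where the old "<< 2" and "4 *" factors came from. The following is only a minimal standalone sketch, assuming BDRV_SECTOR_BITS is 9 as in the block layer; the two helper functions are hypothetical names used for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match QEMU's block layer: 512-byte sectors. */
#define BDRV_SECTOR_BITS   9
#define BDRV_SECTOR_SIZE   (1 << BDRV_SECTOR_BITS)

/* As introduced by the patch: 2048-byte ATAPI sectors. */
#define ATAPI_SECTOR_BITS  (2 + BDRV_SECTOR_BITS)
#define ATAPI_SECTOR_SIZE  (1 << ATAPI_SECTOR_BITS)

/* Hypothetical helper: byte offset suitable for a byte-based read. */
static int64_t atapi_lba_to_offset(int lba)
{
    assert(lba >= 0);
    /* Widen before shifting so large LBAs do not overflow int. */
    return (int64_t)lba << ATAPI_SECTOR_BITS;
}

/* Hypothetical helper: the old-style 512-byte sector number (lba << 2). */
static int64_t atapi_lba_to_bdrv_sector(int lba)
{
    assert(lba >= 0);
    return (int64_t)lba << (ATAPI_SECTOR_BITS - BDRV_SECTOR_BITS);
}
```

Both helpers describe the same position on the medium; the byte offset is simply the 512-byte sector number multiplied by BDRV_SECTOR_SIZE.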




[Qemu-devel] [PATCH v4 07/14] m25p80: Switch to byte-based block access

2016-04-29 Thread Eric Blake
Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.

Signed-off-by: Eric Blake 

---
Not compile tested - I'm not sure what else I'd need in my environment
to actually test this one.  I have:
Fedora 23, dnf builddep qemu
./configure --enable-kvm --enable-system --disable-user --target-list=x86_64-softmmu,ppc64-softmmu --enable-debug
---
 hw/block/m25p80.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index 906b712..01c51a2 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -907,8 +907,7 @@ static int m25p80_init(SSISlave *ss)
 s->storage = blk_blockalign(s->blk, s->size);

 /* FIXME: Move to late init */
-if (blk_read(s->blk, 0, s->storage,
- DIV_ROUND_UP(s->size, BDRV_SECTOR_SIZE))) {
+if (blk_pread(s->blk, 0, s->storage, s->size) < 0) {
 fprintf(stderr, "Failed to initialize SPI flash!\n");
 return 1;
 }
-- 
2.5.5




[Qemu-devel] [PATCH v4 09/14] nbd: Switch to byte-based block access

2016-04-29 Thread Eric Blake
Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.

Signed-off-by: Eric Blake 
---
 qemu-nbd.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index c55b40f..c07ceef 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -159,12 +159,13 @@ static int find_partition(BlockBackend *blk, int partition,
   off_t *offset, off_t *size)
 {
 struct partition_record mbr[4];
-uint8_t data[512];
+uint8_t data[BDRV_SECTOR_SIZE];
 int i;
 int ext_partnum = 4;
 int ret;

-if ((ret = blk_read(blk, 0, data, 1)) < 0) {
+ret = blk_pread(blk, 0, data, sizeof(data));
+if (ret < 0) {
 error_report("error while reading: %s", strerror(-ret));
 exit(EXIT_FAILURE);
 }
@@ -182,10 +183,12 @@ static int find_partition(BlockBackend *blk, int partition,

 if (mbr[i].system == 0xF || mbr[i].system == 0x5) {
 struct partition_record ext[4];
-uint8_t data1[512];
+uint8_t data1[BDRV_SECTOR_SIZE];
 int j;

-if ((ret = blk_read(blk, mbr[i].start_sector_abs, data1, 1)) < 0) {
+ret = blk_pread(blk, mbr[i].start_sector_abs << BDRV_SECTOR_BITS,
+data1, sizeof(data1));
+if (ret < 0) {
 error_report("error while reading: %s", strerror(-ret));
 exit(EXIT_FAILURE);
 }
-- 
2.5.5
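
One thing worth double-checking when turning a sector number into a byte offset, as the nbd patch above does for the extended partition record: if the sector field is a 32-bit type, the shift is evaluated in 32-bit arithmetic unless the operand is widened first. The sketch below is standalone and only illustrates that C rule; the helper names are hypothetical, and it assumes (without having checked) that start_sector_abs is a 32-bit field.

```c
#include <assert.h>
#include <stdint.h>

#define BDRV_SECTOR_BITS 9

/* Hypothetical helper: widen to 64 bits, then shift. */
static int64_t sector_to_offset(uint32_t sector)
{
    return (int64_t)sector << BDRV_SECTOR_BITS;
}

/* For contrast: shifting in 32-bit arithmetic drops the high bits. */
static uint32_t sector_to_offset_truncated(uint32_t sector)
{
    return sector << BDRV_SECTOR_BITS;
}
```

With 512-byte sectors the two versions start to differ once the start sector reaches 0x800000, i.e. at a byte offset of 4 GiB.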




[Qemu-devel] [PATCH v4 02/14] fdc: Switch to byte-based block access

2016-04-29 Thread Eric Blake
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Signed-off-by: Eric Blake 
---
 hw/block/fdc.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 3722275..f73af7d 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -223,6 +223,13 @@ static int fd_sector(FDrive *drv)
   NUM_SIDES(drv));
 }

+/* Returns current position, in bytes, for given drive */
+static int fd_offset(FDrive *drv)
+{
+g_assert(fd_sector(drv) < INT_MAX >> BDRV_SECTOR_BITS);
+return fd_sector(drv) << BDRV_SECTOR_BITS;
+}
+
 /* Seek to a new position:
  * returns 0 if already on right track
  * returns 1 if track changed
@@ -1629,8 +1636,8 @@ static int fdctrl_transfer_handler (void *opaque, int nchan,
 if (fdctrl->data_dir != FD_DIR_WRITE ||
 len < FD_SECTOR_LEN || rel_pos != 0) {
 /* READ & SCAN commands and realign to a sector for WRITE */
-if (blk_read(cur_drv->blk, fd_sector(cur_drv),
- fdctrl->fifo, 1) < 0) {
+if (blk_pread(cur_drv->blk, fd_offset(cur_drv),
+  fdctrl->fifo, BDRV_SECTOR_SIZE) < 0) {
 FLOPPY_DPRINTF("Floppy: error getting sector %d\n",
fd_sector(cur_drv));
 /* Sure, image size is too small... */
@@ -1657,8 +1664,8 @@ static int fdctrl_transfer_handler (void *opaque, int nchan,

 k->read_memory(fdctrl->dma, nchan, fdctrl->fifo + rel_pos,
fdctrl->data_pos, len);
-if (blk_write(cur_drv->blk, fd_sector(cur_drv),
-  fdctrl->fifo, 1) < 0) {
+if (blk_pwrite(cur_drv->blk, fd_offset(cur_drv),
+   fdctrl->fifo, BDRV_SECTOR_SIZE, 0) < 0) {
 FLOPPY_DPRINTF("error writing sector %d\n",
fd_sector(cur_drv));
 fdctrl_stop_transfer(fdctrl, FD_SR0_ABNTERM | FD_SR0_SEEK, 0x00, 0x00);
@@ -1741,7 +1748,8 @@ static uint32_t fdctrl_read_data(FDCtrl *fdctrl)
fd_sector(cur_drv));
 return 0;
 }
-if (blk_read(cur_drv->blk, fd_sector(cur_drv), fdctrl->fifo, 1)
+if (blk_pread(cur_drv->blk, fd_offset(cur_drv), fdctrl->fifo,
+  BDRV_SECTOR_SIZE)
 < 0) {
 FLOPPY_DPRINTF("error getting sector %d\n",
fd_sector(cur_drv));
@@ -1820,7 +1828,8 @@ static void fdctrl_format_sector(FDCtrl *fdctrl)
 }
 memset(fdctrl->fifo, 0, FD_SECTOR_LEN);
 if (cur_drv->blk == NULL ||
-blk_write(cur_drv->blk, fd_sector(cur_drv), fdctrl->fifo, 1) < 0) {
+blk_pwrite(cur_drv->blk, fd_offset(cur_drv), fdctrl->fifo,
+   BDRV_SECTOR_SIZE, 0) < 0) {
 FLOPPY_DPRINTF("error formatting sector %d\n", fd_sector(cur_drv));
 fdctrl_stop_transfer(fdctrl, FD_SR0_ABNTERM | FD_SR0_SEEK, 0x00, 0x00);
 } else {
@@ -2243,8 +2252,8 @@ static void fdctrl_write_data(FDCtrl *fdctrl, uint32_t value)
 if (pos == FD_SECTOR_LEN - 1 ||
 fdctrl->data_pos == fdctrl->data_len) {
 cur_drv = get_cur_drv(fdctrl);
-if (blk_write(cur_drv->blk, fd_sector(cur_drv), fdctrl->fifo, 1)
-< 0) {
+if (blk_pwrite(cur_drv->blk, fd_offset(cur_drv), fdctrl->fifo,
+   BDRV_SECTOR_SIZE, 0) < 0) {
 FLOPPY_DPRINTF("error writing sector %d\n",
fd_sector(cur_drv));
 break;
-- 
2.5.5
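
The fd_offset() helper in the fdc patch above returns an int, so it asserts that the sector number is small enough for the shifted result to fit. Note that ">>" binds tighter than "<", so the bound in the g_assert() reads as sector < (INT_MAX >> BDRV_SECTOR_BITS). A standalone sketch of the same guard follows, using plain assert() instead of g_assert() and hypothetical function names:

```c
#include <assert.h>
#include <limits.h>

#define BDRV_SECTOR_BITS 9

/* Hypothetical predicate mirroring the bound checked in fd_offset(). */
static int sector_fits_int_offset(int sector)
{
    return sector >= 0 && sector < (INT_MAX >> BDRV_SECTOR_BITS);
}

/* Hypothetical stand-in for fd_offset(): sector number to byte offset. */
static int sector_to_int_offset(int sector)
{
    assert(sector_fits_int_offset(sector));
    return sector << BDRV_SECTOR_BITS;
}
```

For a 1.44 MB floppy (2880 sectors) the bound is nowhere near being hit; the assertion mainly documents the assumption.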




[Qemu-devel] [PATCH v4 03/14] nand: Switch to byte-based block access

2016-04-29 Thread Eric Blake
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

This file is doing some complex computations to map various
flash page sizes (256, 512, and 2048) atop generic uses of
512-byte sector operations.  Perhaps someone will want to tidy
up the file for fewer gymnastics in managing addresses and
offsets, and less wasteful visits of 256-byte pages, but it
was out of scope for this series, where I just went with the
mechanical conversion.

Signed-off-by: Eric Blake 
---
 hw/block/nand.c | 36 +++-
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/hw/block/nand.c b/hw/block/nand.c
index 29c6596..2703ff4 100644
--- a/hw/block/nand.c
+++ b/hw/block/nand.c
@@ -663,7 +663,8 @@ static void glue(nand_blk_write_, PAGE_SIZE)(NANDFlashState *s)
 sector = SECTOR(s->addr);
 off = (s->addr & PAGE_MASK) + s->offset;
 soff = SECTOR_OFFSET(s->addr);
-if (blk_read(s->blk, sector, iobuf, PAGE_SECTORS) < 0) {
+if (blk_pread(s->blk, sector << BDRV_SECTOR_BITS, iobuf,
+  PAGE_SECTORS << BDRV_SECTOR_BITS) < 0) {
 printf("%s: read error in sector %" PRIu64 "\n", __func__, sector);
 return;
 }
@@ -675,21 +676,24 @@ static void glue(nand_blk_write_, PAGE_SIZE)(NANDFlashState *s)
 MIN(OOB_SIZE, off + s->iolen - PAGE_SIZE));
 }

-if (blk_write(s->blk, sector, iobuf, PAGE_SECTORS) < 0) {
+if (blk_pwrite(s->blk, sector << BDRV_SECTOR_BITS, iobuf,
+   PAGE_SECTORS << BDRV_SECTOR_BITS, 0) < 0) {
 printf("%s: write error in sector %" PRIu64 "\n", __func__, sector);
 }
 } else {
 off = PAGE_START(s->addr) + (s->addr & PAGE_MASK) + s->offset;
 sector = off >> 9;
 soff = off & 0x1ff;
-if (blk_read(s->blk, sector, iobuf, PAGE_SECTORS + 2) < 0) {
+if (blk_pread(s->blk, sector << BDRV_SECTOR_BITS, iobuf,
+ (PAGE_SECTORS + 2) << BDRV_SECTOR_BITS) < 0) {
 printf("%s: read error in sector %" PRIu64 "\n", __func__, sector);
 return;
 }

 mem_and(iobuf + soff, s->io, s->iolen);

-if (blk_write(s->blk, sector, iobuf, PAGE_SECTORS + 2) < 0) {
+if (blk_pwrite(s->blk, sector << BDRV_SECTOR_BITS, iobuf,
+   (PAGE_SECTORS + 2) << BDRV_SECTOR_BITS, 0) < 0) {
 printf("%s: write error in sector %" PRIu64 "\n", __func__, sector);
 }
 }
@@ -716,17 +720,20 @@ static void glue(nand_blk_erase_, PAGE_SIZE)(NANDFlashState *s)
 i = SECTOR(addr);
 page = SECTOR(addr + (1 << (ADDR_SHIFT + s->erase_shift)));
 for (; i < page; i ++)
-if (blk_write(s->blk, i, iobuf, 1) < 0) {
+if (blk_pwrite(s->blk, i << BDRV_SECTOR_BITS, iobuf,
+   BDRV_SECTOR_SIZE, 0) < 0) {
 printf("%s: write error in sector %" PRIu64 "\n", __func__, i);
 }
 } else {
 addr = PAGE_START(addr);
 page = addr >> 9;
-if (blk_read(s->blk, page, iobuf, 1) < 0) {
+if (blk_pread(s->blk, page << BDRV_SECTOR_BITS, iobuf,
+  BDRV_SECTOR_SIZE) < 0) {
 printf("%s: read error in sector %" PRIu64 "\n", __func__, page);
 }
 memset(iobuf + (addr & 0x1ff), 0xff, (~addr & 0x1ff) + 1);
-if (blk_write(s->blk, page, iobuf, 1) < 0) {
+if (blk_pwrite(s->blk, page << BDRV_SECTOR_BITS, iobuf,
+   BDRV_SECTOR_SIZE, 0) < 0) {
 printf("%s: write error in sector %" PRIu64 "\n", __func__, page);
 }

@@ -734,18 +741,20 @@ static void glue(nand_blk_erase_, PAGE_SIZE)(NANDFlashState *s)
 i = (addr & ~0x1ff) + 0x200;
 for (addr += ((PAGE_SIZE + OOB_SIZE) << s->erase_shift) - 0x200;
 i < addr; i += 0x200) {
-if (blk_write(s->blk, i >> 9, iobuf, 1) < 0) {
+if (blk_pwrite(s->blk, i, iobuf, BDRV_SECTOR_SIZE, 0) < 0) {
 printf("%s: write error in sector %" PRIu64 "\n",
__func__, i >> 9);
 }
 }

 page = i >> 9;
-if (blk_read(s->blk, page, iobuf, 1) < 0) {
+if (blk_pread(s->blk, page << BDRV_SECTOR_BITS, iobuf,
+  BDRV_SECTOR_SIZE) < 0) {
 printf("%s: read error in sector %" PRIu64 "\n", __func__, page);
 }
 memset(iobuf, 0xff, ((addr - 1) & 0x1ff) + 1);
-if (blk_write(s->blk, page, iobuf, 1) < 0) {
+if (blk_pwrite(s->blk, page << BDRV_SECTOR_BITS, iobuf,
+   BDRV_SECTOR_SIZE, 0) < 0) {
 printf("%s: write error in sector %" PRIu64 "\n", __func__, page);
 }
 }
@@ -760,7 +769,8 @@ static void glue(nand_blk_load_, PAGE_SIZE)(NANDFlashState *s,

 if (s->blk) {
 if (s->mem_oob) {
-  

[Qemu-devel] [PATCH v4 06/14] sd: Switch to byte-based block access

2016-04-29 Thread Eric Blake
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Greatly simplifies the code, now that we let the block layer
take care of alignment and read-modify-write on our behalf :)

Signed-off-by: Eric Blake 
---
 hw/sd/sd.c | 46 +++---
 1 file changed, 3 insertions(+), 43 deletions(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index b66e5d2..3c2f2f1 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -1577,57 +1577,17 @@ send_response:

 static void sd_blk_read(SDState *sd, uint64_t addr, uint32_t len)
 {
-uint64_t end = addr + len;
-
 DPRINTF("sd_blk_read: addr = 0x%08llx, len = %d\n",
 (unsigned long long) addr, len);
-if (!sd->blk || blk_read(sd->blk, addr >> 9, sd->buf, 1) < 0) {
+if (!sd->blk || blk_pread(sd->blk, addr, sd->data, len) < 0) {
 fprintf(stderr, "sd_blk_read: read error on host side\n");
-return;
 }
-
-if (end > (addr & ~511) + 512) {
-memcpy(sd->data, sd->buf + (addr & 511), 512 - (addr & 511));
-
-if (blk_read(sd->blk, end >> 9, sd->buf, 1) < 0) {
-fprintf(stderr, "sd_blk_read: read error on host side\n");
-return;
-}
-memcpy(sd->data + 512 - (addr & 511), sd->buf, end & 511);
-} else
-memcpy(sd->data, sd->buf + (addr & 511), len);
 }

 static void sd_blk_write(SDState *sd, uint64_t addr, uint32_t len)
 {
-uint64_t end = addr + len;
-
-if ((addr & 511) || len < 512)
-if (!sd->blk || blk_read(sd->blk, addr >> 9, sd->buf, 1) < 0) {
-fprintf(stderr, "sd_blk_write: read error on host side\n");
-return;
-}
-
-if (end > (addr & ~511) + 512) {
-memcpy(sd->buf + (addr & 511), sd->data, 512 - (addr & 511));
-if (blk_write(sd->blk, addr >> 9, sd->buf, 1) < 0) {
-fprintf(stderr, "sd_blk_write: write error on host side\n");
-return;
-}
-
-if (blk_read(sd->blk, end >> 9, sd->buf, 1) < 0) {
-fprintf(stderr, "sd_blk_write: read error on host side\n");
-return;
-}
-memcpy(sd->buf, sd->data + 512 - (addr & 511), end & 511);
-if (blk_write(sd->blk, end >> 9, sd->buf, 1) < 0) {
-fprintf(stderr, "sd_blk_write: write error on host side\n");
-}
-} else {
-memcpy(sd->buf + (addr & 511), sd->data, len);
-if (!sd->blk || blk_write(sd->blk, addr >> 9, sd->buf, 1) < 0) {
-fprintf(stderr, "sd_blk_write: write error on host side\n");
-}
+if (!sd->blk || blk_pwrite(sd->blk, addr, sd->data, len, 0) < 0) {
+fprintf(stderr, "sd_blk_write: write error on host side\n");
 }
 }

-- 
2.5.5
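
To illustrate what the sd patch above delegates to the block layer, here is a toy model of byte-granular access built on top of sector-granular primitives: unaligned reads go through a bounce buffer, and unaligned writes become read-modify-write of each touched sector. This is a sketch only and not QEMU's actual implementation; the in-memory "disk" and every function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_BITS  9
#define SECTOR_SIZE  (1u << SECTOR_BITS)
#define DISK_SECTORS 8

/* Toy backing store standing in for a block device (hypothetical). */
static uint8_t disk[DISK_SECTORS * SECTOR_SIZE];

/* Sector-granular primitives: whole sectors only. */
static int sector_read(uint64_t sector, uint8_t *buf)
{
    if (sector >= DISK_SECTORS) {
        return -1;
    }
    memcpy(buf, disk + sector * SECTOR_SIZE, SECTOR_SIZE);
    return 0;
}

static int sector_write(uint64_t sector, const uint8_t *buf)
{
    if (sector >= DISK_SECTORS) {
        return -1;
    }
    memcpy(disk + sector * SECTOR_SIZE, buf, SECTOR_SIZE);
    return 0;
}

/* Byte-granular read: bounce each touched sector, copy out the wanted part. */
static int byte_pread(uint64_t offset, uint8_t *dst, size_t len)
{
    uint8_t bounce[SECTOR_SIZE];

    assert(dst != NULL);
    while (len > 0) {
        size_t skip = offset & (SECTOR_SIZE - 1);
        size_t chunk = SECTOR_SIZE - skip;

        if (chunk > len) {
            chunk = len;
        }
        if (sector_read(offset >> SECTOR_BITS, bounce) < 0) {
            return -1;
        }
        memcpy(dst, bounce + skip, chunk);
        dst += chunk;
        offset += chunk;
        len -= chunk;
    }
    return 0;
}

/* Byte-granular write: read-modify-write of every touched sector.
 * On failure, sectors already written stay written. */
static int byte_pwrite(uint64_t offset, const uint8_t *src, size_t len)
{
    uint8_t bounce[SECTOR_SIZE];

    assert(src != NULL);
    while (len > 0) {
        uint64_t sector = offset >> SECTOR_BITS;
        size_t skip = offset & (SECTOR_SIZE - 1);
        size_t chunk = SECTOR_SIZE - skip;

        if (chunk > len) {
            chunk = len;
        }
        if (sector_read(sector, bounce) < 0) {
            return -1;
        }
        memcpy(bounce + skip, src, chunk);
        if (sector_write(sector, bounce) < 0) {
            return -1;
        }
        src += chunk;
        offset += chunk;
        len -= chunk;
    }
    return 0;
}
```

A caller such as sd_blk_write() then only passes (addr, len) and no longer needs its own split-across-two-sectors logic, which is what the removed lines in the patch used to do by hand.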




Re: [Qemu-devel] [PATCH v5 00/18] IOMMU: Enable interrupt remapping for Intel IOMMU

2016-04-29 Thread Radim Krčmář
2016-04-28 17:18+0800, Peter Xu:
> On Thu, Apr 28, 2016 at 09:19:28AM +0200, Jan Kiszka wrote:
>> Instead of fiddling with irq routes for the IOAPIC - where we don't need
>> it -, I would suggest to do the following: Send IOAPIC events via
>> kvm_irqchip_send_msi to the kernel. Only irqfd users (vhost, vfio)
>> should use the pattern you are now applying to the IOAPIC: establish
>> static routes as an irqfd is set up, and that route should be translated
>> by the iommu first, register an IEC notifier to update any affected
>> route when the iommu translation changes.
> 
> Yes, maybe that's the right thing to do. Or say, when split irqchip,
> IOAPIC can avoid using GSI routes any more. If with that, I should
> also remove lots of things, like: IEC notifiers for IOAPIC, and all
> things related to msi route sync-up in IOAPIC codes with KVM (so I
> suppose we will save 24 gsi route entries for KVM, which sounds
> good).

Sadly, we can't get rid of those GSI routes.  KVM uses them to decide
whether it should forward EOI to userspace.  And QEMU also has to remap
them, because KVM uses dest_id for that decision.



Re: [Qemu-devel] [PATCH v5 18/18] ioapic: clear remote irr bit for edge-triggered interrupts

2016-04-29 Thread Radim Krčmář
2016-04-28 15:05+0800, Peter Xu:
> This is to better emulate IOAPIC version 0x1X hardware. The Linux kernel
> leverages this "feature" to do an explicit EOI, since the EOI register had
> not yet been introduced at that time. This also fixes the issue that
> level-triggered interrupts failed to work when IR is enabled (tested with
> Linux kernel version 4.5).
> 
> Signed-off-by: Peter Xu 
> ---

Reviewed-by: Radim Krčmář 



Re: [Qemu-devel] [PATCH v5 17/18] ioapic: keep RO bits for IOAPIC entry

2016-04-29 Thread Radim Krčmář
2016-04-28 15:05+0800, Peter Xu:
> Currently IOAPIC RO bits can be written. To be better aligned with
> hardware, we should make them read-only.
> 
> Signed-off-by: Peter Xu 
> ---

Reviewed-by: Radim Krčmář 



Re: [Qemu-devel] [PATCH v6 6/6] cpu-exec: Move TB chaining into tb_find_fast()

2016-04-29 Thread Sergey Fedorov
On 29/04/16 19:32, Richard Henderson wrote:
> On 04/29/2016 06:58 AM, Sergey Fedorov wrote:
>> On 29/04/16 16:54, Alex Bennée wrote:
>>> Sergey Fedorov  writes:
>>>> diff --git a/cpu-exec.c b/cpu-exec.c
>>>> index f49a436e1a5a..5f23c0660d6e 100644
>>>> --- a/cpu-exec.c
>>>> +++ b/cpu-exec.c
>>>> @@ -320,7 +320,9 @@ found:
>>>>  return tb;
>>>>  }
>>>>
>>>> -static inline TranslationBlock *tb_find_fast(CPUState *cpu)
>>>> +static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>>> + TranslationBlock **last_tb,
>>>> + int tb_exit)
>>>>  {
>>>>  CPUArchState *env = (CPUArchState *)cpu->env_ptr;
>>>>  TranslationBlock *tb;
>>>> @@ -331,11 +333,24 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu)
>>>> always be the same before a given translated block
>>>> is executed. */
>>>>  cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>>>> +tb_lock();
>>>>  tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
>>>>  if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>>>>   tb->flags != flags)) {
>>>>  tb = tb_find_slow(cpu, pc, cs_base, flags);
>>>>  }
>>>> +if (cpu->tb_flushed) {
>>>> +/* Ensure that no TB jump will be modified as the
>>>> + * translation buffer has been flushed.
>>>> + */
>>>> +*last_tb = NULL;
>>>> +cpu->tb_flushed = false;
>>>> +}
>>>> +/* See if we can patch the calling TB. */
>>>> +if (*last_tb && qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
>>> This should be !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)
>> Probably it's my rebase conflict resolution mistake. Nice catch, thanks!
> Fixed while applying all to tcg-next.

Thanks!

Kind regards,
Sergey




Re: [Qemu-devel] emulation details of qemu

2016-04-29 Thread Alex Bennée

tutu sky  writes:

> Thank you in advance Alex.
> you said: "Using the QEMU's gdbstub to debug a guest is different from 
> debugging QEMU by running it under gdb."
> if i want to see the hardware's internal which is emulated by QEMU,

If you mean the hardware's internal state, then yes: GDB doesn't export much
other than the system registers. However, you can extend the gdbstub to make
more machine state visible to gdb. See the gdb-xml directory for
examples of defining additional registers for GDB and the various
target-${foo}/gdbstub.c files which pass those register values back and
forth.


>i must make QEMU run in step mode and run QEMU under GDB, no matter which 
>guest is running; but if i want to debug a guest, QEMU makes it easy for me by 
>offering "gdbstub" and i may need to compile the kernel from source. Do i 
>understand you right?
>
> 
> From: Alex Bennée 
> Sent: Friday, April 29, 2016 3:08 PM
> To: tutu sky
> Cc: Stefan Hajnoczi; qemu-devel@nongnu.org
> Subject: Re: [Qemu-devel] emulation details of qemu
>
> tutu sky  writes:
>
>> Magic answer, Thanks a lot Alex.
>> you mean GDB will be enabled for just QEMU's itself internals? It does not 
>> make importance or any difference for guest running on it?
>> if i want describe my opinion in another way, i think you said that
>> when enabling GDB for QEMU, it is usable and is just important to be
>> usable for QEMU internals, as a user wants to develop it or a person
>> may want to know how he can watch a processor internals. Yeah?
>
> I'm not sure I follow. Using the QEMU's gdbstub to debug a guest is
> different from debugging QEMU by running it under gdb.
>
>> Can GDB  be activated for multicore architectures? in order to see every 
>> core's internals separately?
>> I ask these questions because QEMU documentation is not clear enough
>> and sometimes hard to understand. for example for attaching GDB to
>> QEMU, i am unable to find a good and general guide. it seems it just
>> depend on how much you know about GDB and how to work with. am i
>> right?
>
> Generally to use the stub you start the guest with -s -S, e.g:
>
> qemu-system-arm -machine virt,accel=tcg -cpu cortex-a15 -display none \
>   -serial stdio -kernel ./arm/locking-test.flat -smp 4 -s -S
>
> And then invoke gdb with something like:
>
> gdb-multiarch ./arm/locking-test.elf -ex "target remote localhost:1234"
>
> So in this example I'm using the .elf file with gdb as that has the
> debugging information for the .flat file I started QEMU with. -ex just
> saves the hassle of typing in the "target remote localhost:1234" to
> connect to the gdb stub when you start up. Once in gdb you can do all
> the usual things:
>
> (gdb) info threads
>   Id   Target Id Frame
>   * 1Thread 1 (CPU#0 [running]) 0x4000 in ?? ()
> 2Thread 2 (CPU#1 [halted ]) 0x in ?? ()
> 3Thread 3 (CPU#2 [halted ]) 0x in ?? ()
> 4Thread 4 (CPU#3 [halted ]) 0x in ?? ()
> (gdb) x/4i $pc
>   => 0x4000:  mov r0, #0
>  0x4004:  ldr r1, [pc, #4]; 0x4010
>  0x4008:  ldr r2, [pc, #4]; 0x4014
>  0x400c:  ldr pc, [pc, #4]; 0x4018
> (gdb) p/x $r0
> $1 = 0x0
> (gdb) p/x $r1
> $2 = 0x0
> (gdb) i
>  0x4004 in ?? ()
>   => 0x4004:  ldr r1, [pc, #4]; 0x4010
>  0x4008:  ldr r2, [pc, #4]; 0x4014
>  0x400c:  ldr pc, [pc, #4]; 0x4018
> (gdb) i
>  0x4008 in ?? ()
>   => 0x4008:  ldr r2, [pc, #4]; 0x4014
> (gdb) p/x $r1
> $3 = 0x
>
>>
>> Thanks and regards.
>>
>> 
>> From: Alex Bennée 
>> Sent: Friday, April 29, 2016 12:22 PM
>> To: tutu sky
>> Cc: Stefan Hajnoczi; qemu-devel@nongnu.org
>> Subject: Re: [Qemu-devel] emulation details of qemu
>>
>> tutu sky  writes:
>>
>>> Yeah, thank you Alex.
>>> If I use a linux on top of the qemu, for entering debug mode, do i
>>> need to compile kernel from source or it is not dependent on debugging
>>> qemu itself?
>>
>> I'm not sure I follow. As far as QEMU is concerned it provides a stub
>> for GDB to talk to and doesn't need to know anything else about the
>> guest it is running. The GDB itself will want symbols one way or another
>> so you would either compile your kernel from source or pass the debug
>> symbol enabled vmlinux to GDB using symbol-file.
>>
>>> and then is it possible to define a heterogeneous multicore platform
>>> in qemu?
>>
>> The current upstream QEMU doesn't support heterogeneous setups although
>> some preliminary work has been posted to allow multiple front-ends to be
>> compiled together.
>>
>> There are certainly out-of-tree solutions although as I understand it
>> (I've not worked with them myself) they use multiple QEMU runtimes
>> linked together with some 

Re: [Qemu-devel] [PATCH v3 03/18] qapi: Factor out JSON string escaping

2016-04-29 Thread Eric Blake
On 04/29/2016 06:09 AM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> Pull out a new qstring_append_json_string() helper, so that all
>> JSON output producers can use the same output escaping rules.
>>
>> While it appears that vmstate's use of the simpler qjson.c
>> formatter is not currently encountering any string that needs
>> escapes to be valid JSON, it is better to be safe than sorry.
>>
>> Signed-off-by: Eric Blake 
>> Reviewed-by: Fam Zheng 

>> -qstring_append(str, "\"");
>> +qstring_append_json_string(str, qstring_get_str(val));
>>  break;

> I think this belongs to qobject-json.c, because it's very much about
> JSON (it encapsulates knowledge on JSON string escaping), and a mere
> user of QString (it knows nothing about QString's implementation).
> 
> Precedence: qobject_from_json() & friends are there, not in qobject.c.

Fair enough. Does the name qstring_append_json_string() still work, or
do I need to adjust the name to something with 'qobject' in there, to
make it easier to know which header and .c file to use for the function?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support

2016-04-29 Thread Yuanhan Liu
On Fri, Apr 29, 2016 at 12:40:09PM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Thu, Apr 28, 2016 at 7:23 AM, Yuanhan Liu
>  wrote:
> > On Fri, Apr 01, 2016 at 01:16:21PM +0200, marcandre.lur...@redhat.com wrote:
> >> From: Marc-André Lureau 
> >> +Slave message types
> >> +---
> >> +
> >> + * VHOST_USER_SLAVE_SHUTDOWN:
> >> +  Id: 1
> >> +  Master payload: N/A
> >> +  Slave payload: u64
> >> +
> >> +  Request the master to shutdown the slave. A 0 reply is for
> >> +  success, in which case the slave may close all connections
> >> +  immediately and quit. A non-zero reply cancels the request.
> >> +
> >> +  Before a reply comes, the master may make other requests in
> >> +  order to flush or sync state.
> >
> > Hi all,
> >
> > I threw this proposal as well as DPDK's implementation to our customer
> > (OVS, Openstack and some other teams) who made such request before. I'm
> > sorry to say that none of them really liked that we can't handle crash.
> > Making reconnect work from a vhost-user backend crash is exactly something
> > they are after.
> 
> Handling crashes is not quite the same as what I propose here.

Yes, I know. However, handling crashes is exactly what our customers
want. I just wanted to let you know that; I'm not asking you to
do that :)

> I see
> it as a different goal. But I doubt about usefulness and reliability
> of a backend that crashes.

Agreed with you on that. However, I guess you have to admit that crashes
just happen. Kernel sometimes crashes, too. So, it would be great if
we could make whole stuff work again after an unexpected crash (say,
from OVS), without restarting all guests.

> In many case, vhost-user was designed after
> kernel vhost, and qemu code has the same expectation about the kernel
> or the vhost-user backend: many calls are sync and will simply
> assert() on unexpected results.

I guess we could at least try to diminish it, if we can't avoid it completely.

> > And to handle the crash, I was thinking of the proposal from Michael.
> > That is to do reset from the guest OS. This would fix this issue
> > ultimately. However, old kernel will not benefit from this, as well
> > as other guest other than Linux, making it not that useful for current
> > usage.
> 
> Yes, I hope Michael can help with that, I am not very familiar with
> the kernel code.
> 
> > Thinking of that the VHOST_USER_SLAVE_SHUTDOWN just gives QEMU a chance
> > to get the vring base (last used idx) from the backend, Huawei suggests
> 
> Right, but after this message, the backend should have flushed all
> pending ring packets and stop processing them. So it's also a clean
> sync point.
> 
> > that we could still make it in a consistent state after the crash, if
> > we get the vring base from vring->used->idx.  That worked as expected
> 
> You can have a backend that would have already processed packets and
> not updated used idx. You could also have out-of-order packets in
> flights (ex: desc 1-2-3 avail, 1-3 used, 2 pending..). I can't see a
> clean way to restore this, but to reset the queues and start over,
> with either packet loss or packet duplication.

Given that it (a crash or restart) happens so rarely, I don't think
it matters. Moreover, doesn't that happen in the real world too :)

> If the backend
> guarantees to process packets in order, it may simplify things, but it
> would be a special case.

Well, it's more like a backend thing: it's the backend to try to
set a saner vring base as stated in above proposal. Therefore, I
will not say it's a special case.

> 
> > from my test. The only tricky thing might be how to detect a crash,
> > and we could do a simple compare of the vring base from QEMU with
> > the vring->used->idx at the initiation stage. If mismatch found, get
> > it from vring->used->idx instead.
> 
> I don't follow, would the backend restore its last vring->used->idx
> after a crash?

Yes, restore from the SET_VRING_BASE from QEMU. But it's a stale value,
normally 0 if no start/stop happens before. Therefore, we can't use
it after the crash, instead, we could try to detect the mismatch and
try to fix it at SET_VRING_ADDR request.

> 
> > Comments/thoughts? Is it a solid enough solution to you?  If so, we
> > could make things much simpler, and what's most important, we could
> > be able to handle crash.
> 
> That's not so easy, many of the vhost_ops->vhost*() are followed by
> assert(r>0), which will be hard to change to handle failure. But, I
> would worry first about a backend that crashes that it may corrupt the
> VM memory too...

Not quite sure I follow this. But, backend just touches the virtio
related memory, so it will do no harm to the VM?

--yliu



Re: [Qemu-devel] [PATCH RFC 0/8] basic vfio-ccw infrastructure

2016-04-29 Thread Alex Williamson
On Fri, 29 Apr 2016 14:11:47 +0200
Dong Jia Shi  wrote:

> vfio: ccw: basic vfio-ccw infrastructure
> 
> 
> Introduction
> 
> 
> Here we describe the vfio support for Channel I/O devices (aka. CCW
> devices) for Linux/s390. Motivation for vfio-ccw is to passthrough CCW
> devices to a virtual machine, while vfio is the means.
> 
> Unlike other hardware architectures, s390 defines a unified
> I/O access method, the so-called Channel I/O. It has its own
> access patterns:
> - Channel programs run asynchronously on a separate (co)processor.
> - The channel subsystem will access any memory designated by the caller
>   in the channel program directly, i.e. there is no iommu involved.
> Thus when we introduce vfio support for these devices, we realize it
> with a no-iommu vfio implementation.
> 
> This document does not intend to explain the s390 hardware architecture
> in every detail. More information/reference could be found here:
> - A good start to know Channel I/O in general:
>   https://en.wikipedia.org/wiki/Channel_I/O
> - s390 architecture:
>   s390 Principles of Operation manual (IBM Form. No. SA22-7832)
> - The existing Qemu code which implements a simple emulated channel
>   subsystem could also be a good reference. It makes it easier to
>   follow the flow.
>   qemu/hw/s390x/css.c
> 
> Motivation of vfio-ccw
> --
> 
> Currently, a guest virtualized via qemu/kvm on s390 only sees
> paravirtualized virtio devices via the "Virtio Over Channel I/O
> (virtio-ccw)" transport. This makes virtio devices discoverable via
> standard operating system algorithms for handling channel devices.
> 
> However this is not enough. On s390 for the majority of devices, which
> use the standard Channel I/O based mechanism, we also need to provide
> the functionality of passing through them to a Qemu virtual machine.
> This includes devices that don't have a virtio counterpart (e.g. tape
> drives) or that have specific characteristics which guests want to
> exploit.
> 
> For passing a device to a guest, we want to use the same interface as
> everybody else, namely vfio. Thus, we would like to introduce vfio
> support for channel devices. And we would like to name this new vfio
> device "vfio-ccw".
> 
> Access patterns of CCW devices
> --
> 
> The s390 architecture implements a so-called channel subsystem that
> provides a unified view of the devices physically attached to the
> system. Though the s390 hardware platform knows about a huge variety of
> different peripheral attachments like disk devices (aka. DASDs), tapes,
> communication controllers, etc., they can all be accessed by a well
> defined access method, and they present I/O completion in a unified
> way: I/O interruptions.
> 
> All I/O requires the use of channel command words (CCWs). A CCW is an
> instruction to a specialized I/O channel processor. A channel program
> is a sequence of CCWs which are executed by the I/O channel subsystem.
> To issue a CCW program to the channel subsystem, it is required to
> build an operation request block (ORB), which can be used to point out
> the format of the CCW and other control information to the system. The
> operating system signals the I/O channel subsystem to begin executing
> the channel program with a SSCH (start sub-channel) instruction. The
> central processor is then free to proceed with non-I/O instructions
> until interrupted. The I/O completion result is received by the
> interrupt handler in the form of interrupt response block (IRB).
> 
> Back to vfio-ccw, in short:
> - ORBs and CCW programs are built in user space (with virtual
>   addresses).
> - ORBs and CCW programs are passed to the kernel.
> - kernel translates virtual addresses to real addresses and starts the
>   IO with issuing a privileged Channel I/O instruction (e.g SSCH).
> - CCW programs run asynchronously on a separate processor.
> - I/O completion will be signaled to the host with I/O interruptions.
>   And it will be copied as IRB to user space.
> 
> 
> vfio-ccw patches overview
> -
> 
> It follows that we need vfio-ccw with a vfio no-iommu mode. For now,
> our patches are based on the current no-iommu implementation. It's a
> good start to launch the code review for vfio-ccw. Note that the
> implementation is far from complete yet; but we'd like to get feedback
> for the general architecture.
> 
> The current no-iommu implementation would consider vfio-ccw as
> unsupported and would taint the kernel. This should not be the case for
> vfio-ccw. But whether the end result will be using the existing
> no-iommu code or a new module would be an implementation detail.
> 
> * CCW translation APIs
> - Description:
>   These introduce a group of APIs (starting with 'ccwchain_') to do CCW
>   translation. The CCWs passed in by a user space program are organized
>   in a buffer, with 

[Qemu-devel] [PATCH v2] configure: Check if struct fsxattr is available from linux header

2016-04-29 Thread Jan Vesely
Fixes build failure with --enable-xfsctl and
new linux headers (>=4.5) and older xfsprogs(<4.5):
In file included from /usr/include/xfs/xfs.h:38:0,
 from /var/tmp/portage/app-emulation/qemu-2.5.0-r1/work/qemu-2.5.0/block/raw-posix.c:97:
/usr/include/xfs/xfs_fs.h:42:8: error: redefinition of ‘struct fsxattr’
 struct fsxattr {
^
In file included from /var/tmp/portage/app-emulation/qemu-2.5.0-r1/work/qemu-2.5.0/block/raw-posix.c:60:0:
/usr/include/linux/fs.h:155:8: note: originally defined here
 struct fsxattr {

v2: Add explanatory comment

CC: qemu-triv...@nongnu.org
CC: Markus Armbruster 
CC: Peter Maydell 
CC: Stefan Weil 
Signed-off-by: Jan Vesely 
---
 configure | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/configure b/configure
index ab54f3c..2c3585c 100755
--- a/configure
+++ b/configure
@@ -4493,6 +4493,21 @@ if test "$fortify_source" != "no"; then
   fi
 fi
 
+
+# check if struct fsxattr is available
+
+have_fsxattr=no
+cat > $TMPC << EOF
+#include <linux/fs.h>
+struct fsxattr foo;
+int main(void) {
+  return 0;
+}
+EOF
+if compile_prog "" "" ; then
+have_fsxattr=yes
+fi
+
 ##
 # End of CC checks
 # After here, no more $cc or $ld runs
@@ -5160,6 +5175,12 @@ fi
 if test "$have_ifaddrs_h" = "yes" ; then
 echo "HAVE_IFADDRS_H=y" >> $config_host_mak
 fi
+
+# xfs headers will try to redefine structs from linux headers
+# if this macro is not set
+if test "$have_fsxattr" = "yes" ; then
+echo "HAVE_FSXATTR=y" >> $config_host_mak
+fi
 if test "$vte" = "yes" ; then
   echo "CONFIG_VTE=y" >> $config_host_mak
   echo "VTE_CFLAGS=$vte_cflags" >> $config_host_mak
-- 
2.7.4
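
The effect of the macro can be illustrated with a small, self-contained C sketch. The two struct definitions below are abbreviated stand-ins for what linux/fs.h and xfs/xfs_fs.h provide (the real struct has more fields), and fsxattr_size is a hypothetical helper added only so the result can be checked. The assumption, taken from the comment in the patch, is that the xfs header skips its own definition when HAVE_FSXATTR is defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel header: it already provides the struct. */
struct fsxattr {
    uint32_t fsx_xflags;
    uint32_t fsx_extsize;
};

/* What the configure check arranges to be defined in that case. */
#define HAVE_FSXATTR 1

/*
 * Stand-in for the xfs header: it defines the struct only when nobody
 * else has. Without the macro this would be a redefinition error.
 */
#ifndef HAVE_FSXATTR
struct fsxattr {
    uint32_t fsx_xflags;
    uint32_t fsx_extsize;
};
#endif

/* Hypothetical helper: proves exactly one definition is visible. */
size_t fsxattr_size(void)
{
    return sizeof(struct fsxattr);
}
```

Removing the #define line reproduces the "redefinition of 'struct fsxattr'" error quoted in the commit message.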




Re: [Qemu-devel] [PATCH] configure: Check if struct fsxattr is available from linux header

2016-04-29 Thread Jan Vesely
On Fri, 2016-04-29 at 15:49 +0100, Peter Maydell wrote:
> On 29 April 2016 at 15:31, Stefan Weil  wrote:
> > 
> > Is it a bug of the system headers? Or simply a design which
> > requires users to be careful when including certain header files?
> > 
> > Both /usr/include/xfs/xfs_fs.h and /usr/include/linux/fs.h define
> > the same struct fsxattr, and both definitions are identical.
> That sounds like a header bug to me...
> 
> http://oss.sgi.com/archives/xfs/2016-02/msg00324.html
> 
> suggests that (a) the xfsprogs folks are updating their
> header to deal with what the kernel header is doing and that
> (b) they think the distros ought to be updating both of them
> in sync in some way...

Yes, all the more so because xfsprogs/xfslib itself will fail to compile
against linux-headers-4.5 for the very same reason. However, it looks like
distros are not keen on keeping them in sync.

The patch is a workaround.

Jan

> 
> > 
> > Of course a good comment would be helpful here, e. g.
> > 
> > # Avoid redefinition of struct fsxattr in xfs/xfs_fs.h.
> > # It is already defined in linux/fs.h.
> Yes, this is really all I want: a note that some versions of
> the kernel headers and the xfs headers clash, so we suppress
> the xfs version if the kernel header is providing the struct.
> 
> thanks
> -- PMM




Re: [Qemu-devel] [PATCH v5 00/10] tcg: Direct block chaining clean-up

2016-04-29 Thread Richard Henderson
On 04/28/2016 02:33 PM, Sergey Fedorov wrote:
> From: Sergey Fedorov 
> 
> This series combines a set of patches which is meant to improve overall code
> structure and readability of the direct block chaining mechanism. The other
> point is to make a step towards thread safety of TB chaining.
> 
> This series is based on commit: 1d02fa9e045b ("translate-all: Adjust 256mb
> testing for mips64") from git://github.com/rth7680/qemu.git tcg-next and is
> available at git://github.com/sergefdrv/qemu.git tb-chaining-cleanup-v5
> 
> Summary of changes:
>  Changes in v5:
>   * Fixed rebase conflicts
>   * Don't check for in_superpage() in target-alpha/translate.c for
> user-mode [PATCH v5 10/10]
>  Changes in v4:
>   * Removed assert from tb_add_jump() [PATCH v4 02/10]
>   * Added comment on TB stuff synchronization [PATCH v4 04/10]
>   * Documented tcg_gen_goto_tb() and moved its usage notes there
> [PATCH v4 09/10] and [PATCH v4 10/10]
>   * Cc'ed usermode maintainers in commit message [PATCH v4 10/10]
>  Changes in v3:
>   * New patch to clean up safety checks [PATCH v3 09/10]
>   * New patch to eliminate unneeded checks in user-mode [PATCH v3 10/10]
>  Changes in v2:
>   * Eliminated duplicate dereference of 'ptb' in tb_jmp_remove()
> [PATCH v2 2/8]
>   * Tweaked a comment [PATCH v2 4/8]
>   * Complete rewrite [PATCH v2 5/8]
>   * Tweaked a comment; eliminated duplicate dereference of 'ptb' in
> tb_jmp_unlink() [PATCH v2 8/8]
> 
> Sergey Fedorov (10):
>   tcg: Clean up direct block chaining data fields
>   tcg: Use uintptr_t type for jmp_list_{next|first} fields of TB
>   tcg: Rearrange tb_link_page() to avoid forward declaration
>   tcg: Init TB's direct jumps before making it visible
>   tcg: Clarify thread safety check in tb_add_jump()
>   tcg: Rename tb_jmp_remove() to tb_remove_from_jmp_list()
>   tcg: Extract removing of jumps to TB from tb_phys_invalidate()
>   tcg: Clean up tb_jmp_unlink()
>   tcg: Clean up direct block chaining safety checks
>   tcg: Allow goto_tb to any target PC in user mode

Applied to tcg-next.


r~



Re: [Qemu-devel] [PATCH v6 6/6] cpu-exec: Move TB chaining into tb_find_fast()

2016-04-29 Thread Richard Henderson
On 04/29/2016 06:58 AM, Sergey Fedorov wrote:
> On 29/04/16 16:54, Alex Bennée wrote:
>> Sergey Fedorov  writes:
>>> diff --git a/cpu-exec.c b/cpu-exec.c
>>> index f49a436e1a5a..5f23c0660d6e 100644
>>> --- a/cpu-exec.c
>>> +++ b/cpu-exec.c
>>> @@ -320,7 +320,9 @@ found:
>>>  return tb;
>>>  }
>>>
>>> -static inline TranslationBlock *tb_find_fast(CPUState *cpu)
>>> +static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>> + TranslationBlock **last_tb,
>>> + int tb_exit)
>>>  {
>>>  CPUArchState *env = (CPUArchState *)cpu->env_ptr;
>>>  TranslationBlock *tb;
>>> @@ -331,11 +333,24 @@ static inline TranslationBlock *tb_find_fast(CPUState 
>>> *cpu)
>>> always be the same before a given translated block
>>> is executed. */
>>>  cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>>> +tb_lock();
>>>  tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
>>>  if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>>>   tb->flags != flags)) {
>>>  tb = tb_find_slow(cpu, pc, cs_base, flags);
>>>  }
>>> +if (cpu->tb_flushed) {
>>> +/* Ensure that no TB jump will be modified as the
>>> + * translation buffer has been flushed.
>>> + */
>>> +*last_tb = NULL;
>>> +cpu->tb_flushed = false;
>>> +}
>>> +/* See if we can patch the calling TB. */
>>> +if (*last_tb && qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
>> This should be !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)
> 
> Probably my mistake when resolving a rebase conflict. Nice catch, thanks!

Fixed while applying all to tcg-next.


r~
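
The condition discussed in this thread reduces to a small predicate. The sketch below is only an illustration under assumed names (may_chain and its boolean parameters are hypothetical, not QEMU functions): chaining is attempted when a previous TB is known, the translation buffer has not been flushed since, and the "no chaining" log flag is not set.

```c
#include <stdbool.h>

/*
 * Decide whether the previously executed TB may be patched to jump
 * directly to the next one.
 *
 *   have_last_tb       a previous TB is known (*last_tb != NULL)
 *   tb_flushed         the translation buffer was flushed meanwhile
 *   nochain_requested  the "no chaining" debug log flag is set
 */
bool may_chain(bool have_last_tb, bool tb_flushed, bool nochain_requested)
{
    if (tb_flushed) {
        /* A flush invalidates the remembered TB, so nothing to patch. */
        have_last_tb = false;
    }
    /* Note the negation: the flag disables chaining, it does not enable it. */
    return have_last_tb && !nochain_requested;
}
```

With the negation missing, as in the posted hunk, chaining would happen only when the user had asked for it to be turned off.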




Re: [Qemu-devel] [PATCH RFC 2/9] vfio: No-IOMMU mode support

2016-04-29 Thread Alex Williamson
On Fri, 29 Apr 2016 14:13:16 +0200
Xiao Feng Ren  wrote:

> Add qemu support for the newly introduced VFIO No-IOMMU driver.
> 
> We need to add special handling for:
> - Group character device is /dev/vfio/noiommu-$GROUP.
> - No-IOMMU does not rely on a memory listener.
> - No IOMMU will be set for its group, so no need to call
>   vfio_kvm_device_add_group.
> 
> Signed-off-by: Xiao Feng Ren 
> ---
>  hw/vfio/common.c  | 66 
> ++-
>  include/hw/vfio/vfio-common.h |  2 ++
>  2 files changed, 55 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index f27db36..656c303 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -789,6 +789,33 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as)
>  container = g_malloc0(sizeof(*container));
>  container->space = space;
>  container->fd = fd;
> +container->noiommu = group->noiommu;
> +
> +if (container->noiommu) {
> +ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
> +if (ret) {
> +error_report("vfio: failed to set group container: %m");
> +ret = -errno;
> +goto free_container_exit;
> +}
> +
> +ret = ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_NOIOMMU_IOMMU);
> +if (!ret) {
> +error_report("vfio: No available IOMMU models");
> +ret = -EINVAL;
> +goto free_container_exit;
> +}
> +
> +ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_NOIOMMU_IOMMU);
> +if (ret) {
> +error_report("vfio: failed to set iommu for container: %m");
> +ret = -errno;
> +goto free_container_exit;
> +}
> +
> +goto listener_register;
> +}
> +
>  if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) ||
>  ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
>  bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU);
> @@ -878,14 +905,16 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as)
>  goto free_container_exit;
>  }
>  
> -container->listener = vfio_memory_listener;
> -
> -memory_listener_register(&container->listener, container->space->as);
> -
> -if (container->error) {
> -ret = container->error;
> -error_report("vfio: memory listener initialization failed for 
> container");
> -goto listener_release_exit;
> +listener_register:
> +if (!container->noiommu) {
> +container->listener = vfio_memory_listener;
> +memory_listener_register(&container->listener, container->space->as);
> +if (container->error) {
> +ret = container->error;
> +error_report("vfio: memory listener initialization failed for "
> + "container");
> +goto listener_release_exit;
> +}
>  }
>  
>  container->initialized = true;
> @@ -898,7 +927,9 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as)
>  
>  return 0;
>  listener_release_exit:
> -vfio_listener_release(container);
> +if (!container->noiommu) {
> +vfio_listener_release(container);
> +}
>  
>  free_container_exit:
>  g_free(container);
> @@ -928,7 +959,9 @@ static void vfio_disconnect_container(VFIOGroup *group)
>  VFIOAddressSpace *space = container->space;
>  VFIOGuestIOMMU *giommu, *tmp;
>  
> -vfio_listener_release(container);
> +if (!container->noiommu) {
> +vfio_listener_release(container);
> +}
>  QLIST_REMOVE(container, next);
>  
>  QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, 
> tmp) {
> @@ -969,8 +1002,13 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
>  snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
>  group->fd = qemu_open(path, O_RDWR);
>  if (group->fd < 0) {
> -error_report("vfio: error opening %s: %m", path);
> -goto free_group_exit;
> +snprintf(path, sizeof(path), "/dev/vfio/noiommu-%d", groupid);
> +group->fd = qemu_open(path, O_RDWR);
> +if (group->fd < 0) {
> +error_report("vfio: error opening %s: %m", path);
> +goto free_group_exit;
> +}
> +group->noiommu = 1;

No, this just can't happen.  There is absolutely no way that falling
back to a noiommu interface is the correct thing to do in most
situations. It cannot be automatic or I will have vfio-pci users lined
up trying to do PCI device assignment with this code.
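
One way to read this objection is that the no-IOMMU node should be opened only on explicit request, never as a silent fallback. The following C sketch shows that idea under stated assumptions: sketch_group_path and its noiommu_opt_in parameter are hypothetical names, not part of QEMU or of this patch series, and only the two path formats come from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build the VFIO group device path. The no-IOMMU node is chosen only
 * when the caller explicitly opted in; a failure to open the regular
 * node is then reported as an error instead of being retried here.
 * Returns the value of snprintf().
 */
int sketch_group_path(char *buf, size_t len, int groupid, bool noiommu_opt_in)
{
    if (noiommu_opt_in) {
        return snprintf(buf, len, "/dev/vfio/noiommu-%d", groupid);
    }
    return snprintf(buf, len, "/dev/vfio/%d", groupid);
}
```

How such an opt-in would be exposed (a device property, a separate device type, or something else) is left open here.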

>  }
>  
>  if (ioctl(group->fd, VFIO_GROUP_GET_STATUS, &status)) {
> @@ -999,7 +1037,9 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
>  
>  QLIST_INSERT_HEAD(&vfio_group_list, group, next);
>  
> -vfio_kvm_device_add_group(group);
> +if (!group->noiommu) {
> +vfio_kvm_device_add_group(group);
> +}

Why?  Notifying KVM of 

Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer

2016-04-29 Thread Eric Blake
On 04/04/2016 07:43 AM, Alberto Garcia wrote:
> This patch makes the 'device' parameter of the 'block-stream' command
> accept a node name as well as a device name.
> 
> In addition to that, operation blockers will be checked in all
> intermediate nodes between the top and the base node.
> 
> Since qmp_block_stream() now uses the error from bdrv_lookup_bs() and
> no longer returns DeviceNotFound, iotest 030 is updated to expect
> GenericError instead.
> 
> Signed-off-by: Alberto Garcia 
> ---
> +++ b/qapi/block-core.json
> @@ -1405,6 +1405,10 @@

Context: block-stream

>  # with query-block-jobs.  The operation can be stopped before it has 
> completed
>  # using the block-job-cancel command.
>  #
> +# The node that receives the data is called the top image, can be located
> +# in any part of the whole chain and can be specified using its device
> +# or node name.
> +#

"any part of the whole chain" feels fishy - it has to be ABOVE the base
file.  That is, in a chain A <- B <- C <- D <- E, if I want to set B as
the base, then the top has to be C, D, or E (but not A).
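
The constraint can be written down as a short check. This is a sketch only: the chain is modelled as an array of names ordered from the deepest backing file to the active layer, and stream_top_is_valid is a hypothetical helper, not a QEMU function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * chain[0] is the deepest backing file and chain[n-1] the active layer,
 * e.g. {"A", "B", "C", "D", "E"} for A <- B <- C <- D <- E.
 * Returns the position of name in the chain, or -1 if it is absent.
 */
static int chain_index(const char *const *chain, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(chain[i], name) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/* The top image must sit strictly above the base in the same chain. */
bool stream_top_is_valid(const char *const *chain, size_t n,
                         const char *base, const char *top)
{
    int b = chain_index(chain, n, base);
    int t = chain_index(chain, n, top);

    return b >= 0 && t >= 0 && t > b;
}
```

For the chain above with base B, the check accepts C, D and E and rejects A and B.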

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v9 05/11] block: allow block jobs in any arbitrary node

2016-04-29 Thread Eric Blake
On 04/04/2016 07:43 AM, Alberto Garcia wrote:
> Currently, block jobs can only be owned by root nodes. This patch
> allows block jobs to be in any arbitrary node, by making the following
> changes:
> 
> - Block jobs can now be identified by the node name of their
>   BlockDriverState in addition to the device name. Since both device
>   and node names live in the same namespace there's no ambiguity.
> 
> - The "device" parameter used by all commands that operate on block
>   jobs can also be a node name now.
> 
> - The node name is used as a fallback to fill in the BlockJobInfo
>   structure and all BLOCK_JOB_* events if there is no device name for
>   that job.

Having more than one way to name a job might be okay for convenience,
but only if the canonical name is unambiguous.  I agree with Kevin's
concern that using the device and/or node name as the canonical name of
the job is worrisome, because it locks us into having only one job with
that name at a time.


> +++ b/docs/qmp-events.txt
> @@ -92,7 +92,7 @@ Data:
>  
>  - "type": Job type (json-string; "stream" for image streaming
>   "commit" for block commit)
> -- "device":   Device name (json-string)
> +- "device":   Device name, or node name if not present (json-string)

On the surface this sounds okay (you are promising to return a canonical
name, insofar as job canonical names are currently the node/device name
until the time that we allow job ids with multiple jobs per device) -
even if it means that you return a device name when the user started a
job based on a node name.

But I'm also worried about jobs where the node name tied to a device
changes over time (creating a snapshot, pivoting to a mirror, doing an
active commit with a pivot to the backing file - all of these are cases
where the device name stays the same, but the top node name associated
with the device differs over time).  If the device name is the canonical
one, then a job started on node "A" but reported as device "D", needs to
STILL be known as job "D" even if by the end of the job node "A" is no
longer associated with device "D" (because "D" is now serving the
top-level node "B").


> @@ -1513,7 +1513,7 @@
>  # the operation is actually paused.  Cancelling a paused job automatically
>  # resumes it.
>  #
> -# @device: the device name
> +# @device: the device or node name of the owner of the block job.

We aren't consistent on whether to use a trailing '.'.  I don't care
enough to standardize on a style, so no need to respin for just that.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v5 7/9] target-mips: Add nan2008 flavor of <CEIL|CVT|FLOOR|ROUND|TRUNC>.<L|W>.<S|D>

2016-04-29 Thread Leon Alrae
On 18/04/16 17:03, Aleksandar Markovic wrote:
> @@ -3049,6 +3050,330 @@ uint32_t helper_float_floorw_s(CPUMIPSState *env, 
> uint32_t fst0)
>  return wt2;
>  }
>  
> +uint64_t helper_float_cvt_2008_l_d(CPUMIPSState *env, uint64_t fdt0)
> +{
> +uint64_t dt2;
> +
> +dt2 = float64_to_int64(fdt0, &env->active_fpu.fp_status);
> +if (get_float_exception_flags(&env->active_fpu.fp_status)
> +& (float_flag_invalid) {

unnecessary parentheses

> @@ -8919,7 +8920,11 @@ static void gen_farith (DisasContext *ctx, enum 
> fopcode op1,
>  TCGv_i64 fp64 = tcg_temp_new_i64();
>  
>  gen_load_fpr32(ctx, fp32, fs);
> -gen_helper_float_roundl_s(fp64, cpu_env, fp32);
> +if ((ctx->insn_flags & ISA_MIPS32R6) && (ctx->nan2008)) {

Why test the architecture version here? This will select the wrong
helper for P5600, which is R5 and IEEE 754-2008 compliant.

Leon

> +gen_helper_float_round_2008_l_s(fp64, cpu_env, fp32);
> +} else {
> +gen_helper_float_round_l_s(fp64, cpu_env, fp32);
> +}
>  tcg_temp_free_i32(fp32);
>  gen_store_fpr64(ctx, fp64, fd);
>  tcg_temp_free_i64(fp64);
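
The difference between the posted condition and one keyed on nan2008 alone can be shown with a small C sketch. The flag values SKETCH_ISA_R5 and SKETCH_ISA_R6, the enum, and both function names are made up for illustration and do not match QEMU's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative ISA flag bits; not QEMU's real values. */
#define SKETCH_ISA_R5 (1u << 0)
#define SKETCH_ISA_R6 (1u << 1)

enum round_helper { HELPER_LEGACY, HELPER_NAN2008 };

/* Selection as posted: requires both an R6 CPU and the nan2008 mode. */
enum round_helper select_as_posted(uint32_t insn_flags, bool nan2008)
{
    if ((insn_flags & SKETCH_ISA_R6) && nan2008) {
        return HELPER_NAN2008;
    }
    return HELPER_LEGACY;
}

/* Selection keyed on the FPU mode only, as the review suggests. */
enum round_helper select_on_nan2008(bool nan2008)
{
    return nan2008 ? HELPER_NAN2008 : HELPER_LEGACY;
}
```

For a P5600-like CPU (R5 with nan2008 set) the first function picks the legacy helper, while the second picks the 2008 one.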



Re: [Qemu-devel] emulation details of qemu

2016-04-29 Thread tutu sky
Thank you in advance, Alex.
You said: "Using the QEMU's gdbstub to debug a guest is different from
debugging QEMU by running it under gdb."
If I want to see the internals of the hardware emulated by QEMU, I must run
QEMU itself under GDB and single-step it, no matter which guest is running;
but if I want to debug a guest, QEMU makes that easy by offering the
"gdbstub", and I may need to compile the kernel from source. Do I understand
you correctly?


From: Alex Bennée 
Sent: Friday, April 29, 2016 3:08 PM
To: tutu sky
Cc: Stefan Hajnoczi; qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] emulation details of qemu

tutu sky  writes:

> Magic answer, Thanks a lot Alex.
> you mean GDB will be enabled for just QEMU's itself internals? It does not 
> make importance or any difference for guest running on it?
> if i want describe my opinion in another way, i think you said that
> when enabling GDB for QEMU, it is usable and is just important to be
> usable for QEMU internals, as a user wants to develop it or a person
> may want to know how he can watch a processor internals. Yeah?

I'm not sure I follow. Using the QEMU's gdbstub to debug a guest is
different from debugging QEMU by running it under gdb.

> Can GDB  be activated for multicore architectures? in order to see every 
> core's internals separately?
> I ask these questions because QEMU documentation is not clear enough
> and sometimes hard to understand. for example for attaching GDB to
> QEMU, i am unable to find a good and general guide. it seems it just
> depend on how much you know about GDB and how to work with. am i
> right?

Generally to use the stub you start the guest with -s -S, e.g:

qemu-system-arm -machine virt,accel=tcg -cpu cortex-a15 -display none \
  -serial stdio -kernel ./arm/locking-test.flat -smp 4 -s -S

And then invoke gdb with something like:

gdb-multiarch ./arm/locking-test.elf -ex "target remote localhost:1234"

So in this example I'm using the .elf file with gdb as that has the
debugging information for the .flat file I started QEMU with. -ex just
saves the hassle of typing in the "target remote localhost:1234" to
connect to the gdb stub when you start up. Once in gdb you can do all
the usual things:

(gdb) info threads
  Id   Target Id Frame
  * 1Thread 1 (CPU#0 [running]) 0x4000 in ?? ()
2Thread 2 (CPU#1 [halted ]) 0x in ?? ()
3Thread 3 (CPU#2 [halted ]) 0x in ?? ()
4Thread 4 (CPU#3 [halted ]) 0x in ?? ()
(gdb) x/4i $pc
  => 0x4000:  mov r0, #0
 0x4004:  ldr r1, [pc, #4]; 0x4010
 0x4008:  ldr r2, [pc, #4]; 0x4014
 0x400c:  ldr pc, [pc, #4]; 0x4018
(gdb) p/x $r0
$1 = 0x0
(gdb) p/x $r1
$2 = 0x0
(gdb) i
 0x4004 in ?? ()
  => 0x4004:  ldr r1, [pc, #4]; 0x4010
 0x4008:  ldr r2, [pc, #4]; 0x4014
 0x400c:  ldr pc, [pc, #4]; 0x4018
(gdb) i
 0x4008 in ?? ()
  => 0x4008:  ldr r2, [pc, #4];
 0x4014
(gdb) p/x $r1
$3 = 0x

>
> Thanks and regards.
>
> 
> From: Alex Bennée 
> Sent: Friday, April 29, 2016 12:22 PM
> To: tutu sky
> Cc: Stefan Hajnoczi; qemu-devel@nongnu.org
> Subject: Re: [Qemu-devel] emulation details of qemu
>
> tutu sky  writes:
>
>> Yeah, thank you Alex.
>> If I use a linux on top of the qemu, for entering debug mode, do i
>> need to compile kernel from source or it is not dependent on debugging
>> qemu itself?
>
> I'm not sure I follow. As far as QEMU is concerned it provides a stub
> for GDB to talk to and doesn't need to know anything else about the
> guest it is running. The GDB itself will want symbols one way or another
> so you would either compile your kernel from source or pass the debug
> symbol enabled vmlinux to GDB using symbol-file.
>
>> and then is it possible to define a heterogeneous multicore platform
>> in qemu?
>
> The current upstream QEMU doesn't support heterogeneous setups although
> some preliminary work has been posted to allow multiple front-ends to be
> compiled together.
>
> There are certainly out-of-tree solutions although as I understand it
> (I've not worked with them myself) they use multiple QEMU runtimes
> linked together with some sort of shared memory bus/IPC layer.
>
>>
>> Thanks and regards.
>>
>> 
>> From: Alex Bennée 
>> Sent: Thursday, April 28, 2016 6:45 PM
>> To: tutu sky
>> Cc: Stefan Hajnoczi; qemu-devel@nongnu.org
>> Subject: Re: [Qemu-devel] emulation details of qemu
>>
>> tutu sky  writes:
>>
>>> Thanks a lot Stefan,
>>> But if i want to change the content of a register during run time in
>>> debug mode, what should i do? is it possible at first?
>>
>> Using the gdbstub sure you 

[Qemu-devel] [PATCH v2 4/5] test: Postcopy

2016-04-29 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

This is a postcopy test (x86 only) that actually runs the guest
and checks the memory contents.

The test runs from an x86 boot block with the hex embedded in the test;
the source for this is:

...

.code16
.org 0x7c00
.file   "fill.s"
.text
.globl  start
.type   start, @function
start: # at 0x7c00 ?
cli
lgdt gdtdesc
mov $1,%eax
mov %eax,%cr0  # Protected mode enable
data32 ljmp $8,$0x7c20

.org 0x7c20
.code32
# A20 enable - not sure I actually need this
inb $0x92,%al
or  $2,%al
outb %al, $0x92

# set up DS for the whole of RAM (needed on KVM)
mov $16,%eax
mov %eax,%ds

mov $65,%ax
mov $0x3f8,%dx
outb %al,%dx

# bl keeps a counter so we limit the output speed
mov $0, %bl
mainloop:
# Start from 1MB
mov $(1024*1024),%eax
innerloop:
incb (%eax)
add $4096,%eax
cmp $(100*1024*1024),%eax
jl innerloop

inc %bl
jnz mainloop

mov $66,%ax
mov $0x3f8,%dx
outb %al,%dx

jmp mainloop

# GDT magic from old (GPLv2)  Grub startup.S
.p2align 2   /* force 4-byte alignment */
gdt:
.word   0, 0
.byte   0, 0, 0, 0

/* -- code segment --
 * base = 0x00000000, limit = 0xFFFFF (4 KiB Granularity), present
 * type = 32bit code execute/read, DPL = 0
 */
.word   0xFFFF, 0
.byte   0, 0x9A, 0xCF, 0

/* -- data segment --
 * base = 0x00000000, limit 0xFFFFF (4 KiB Granularity), present
 * type = 32 bit data read/write, DPL = 0
 */
.word   0xFFFF, 0
.byte   0, 0x92, 0xCF, 0

gdtdesc:
.word   0x27    /* limit */
.long   gdt     /* addr */

/* I'm a bootable disk */
.org 0x7dfe
.byte 0x55
.byte 0xAA

...

and that can be assembled by the following magic:
as --32 -march=i486 fill.s -o fill.o
objcopy -O binary fill.o fill.boot
dd if=fill.boot of=bootsect bs=256 count=2 skip=124
xxd -i bootsect
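
The numbers in the dd invocation follow from the listing: skipping 124 blocks of 256 bytes lands at 0x7c00, where the code is assembled, and two blocks give one 512-byte sector ending in the 0x55 0xAA signature placed at 0x7dfe. A minimal C sketch of that arithmetic, with hypothetical helper names, is:

```c
#include <stdbool.h>
#include <stddef.h>

/* Byte offset at which dd starts copying: skip blocks of bs bytes. */
size_t dd_skip_offset(size_t bs, size_t skip)
{
    return bs * skip;
}

/* Number of bytes dd copies: count blocks of bs bytes. */
size_t dd_length(size_t bs, size_t count)
{
    return bs * count;
}

/*
 * A 512-byte sector is treated as bootable by a PC BIOS when its last
 * two bytes (offsets 510 and 511) are 0x55 and 0xAA.
 */
bool has_boot_signature(const unsigned char *sector)
{
    return sector[510] == 0x55 && sector[511] == 0xAA;
}
```

Offset 510 within the sector corresponds to 0x7dfe - 0x7c00 in the assembled image.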

Signed-off-by: Dr. David Alan Gilbert 
---
 tests/Makefile|   2 +
 tests/postcopy-test.c | 455 ++
 2 files changed, 457 insertions(+)
 create mode 100644 tests/postcopy-test.c

diff --git a/tests/Makefile b/tests/Makefile
index 9194f18..f356f4a 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -224,6 +224,7 @@ endif
 check-qtest-i386-y += tests/test-netfilter$(EXESUF)
 check-qtest-i386-y += tests/test-filter-mirror$(EXESUF)
 check-qtest-i386-y += tests/test-filter-redirector$(EXESUF)
+check-qtest-i386-y += tests/postcopy-test$(EXESUF)
 check-qtest-x86_64-y = $(check-qtest-i386-y)
 gcov-files-i386-y += i386-softmmu/hw/timer/mc146818rtc.c
 gcov-files-x86_64-y = $(subst 
i386-softmmu/,x86_64-softmmu/,$(gcov-files-i386-y))
@@ -579,6 +580,7 @@ tests/usb-hcd-uhci-test$(EXESUF): tests/usb-hcd-uhci-test.o 
$(libqos-usb-obj-y)
 tests/usb-hcd-ehci-test$(EXESUF): tests/usb-hcd-ehci-test.o $(libqos-usb-obj-y)
 tests/usb-hcd-xhci-test$(EXESUF): tests/usb-hcd-xhci-test.o $(libqos-usb-obj-y)
 tests/pc-cpu-test$(EXESUF): tests/pc-cpu-test.o
+tests/postcopy-test$(EXESUF): tests/postcopy-test.o
 tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o qemu-char.o 
qemu-timer.o $(qtest-obj-y) $(test-io-obj-y)
 tests/qemu-iotests/socket_scm_helper$(EXESUF): 
tests/qemu-iotests/socket_scm_helper.o
 tests/test-qemu-opts$(EXESUF): tests/test-qemu-opts.o $(test-util-obj-y)
diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c
new file mode 100644
index 000..3712a50
--- /dev/null
+++ b/tests/postcopy-test.c
@@ -0,0 +1,455 @@
+/*
+ * QTest testcase for postcopy
+ *
+ * Copyright (c) 2016 Red Hat, Inc. and/or its affiliates
+ *   based on the vhost-user-test.c that is:
+ *  Copyright (c) 2014 Virtual Open Systems Sarl.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include 
+
+#include "libqtest.h"
+#include "qemu/option.h"
+#include "qemu/range.h"
+#include "sysemu/char.h"
+#include "sysemu/sysemu.h"
+
+#include 
+#include 
+#include 
+
+#if defined(__linux__)
+#include 
+#endif
+
+#if defined(__linux__) && defined(__NR_userfaultfd) && defined(CONFIG_EVENTFD)
+#include 
+#include 
+#include 
+
+const unsigned start_address = 1024 * 1024;
+const unsigned end_address = 100 * 1024 * 1024;
+bool got_stop;
+
+static bool ufd_version_check(void)
+{
+struct uffdio_api api_struct;
+uint64_t ioctl_mask;
+
+int ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+
+if (ufd == -1) {
+g_test_message("Skipping test: userfaultfd not available");
+return false;
+}
+
+api_struct.api = 

Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer

2016-04-29 Thread Kevin Wolf
Am 28.04.2016 um 14:20 hat Alberto Garcia geschrieben:
> On Wed 27 Apr 2016 03:34:19 PM CEST, Max Reitz  wrote:
> >> +/* Look for the top-level node that contains 'bs' in its chain */
> >> +active = NULL;
> >> +do {
> >> +active = bdrv_next(active);
> >> +} while (active && !bdrv_chain_contains(active, bs));
> >
> > Alternatively, you could iterate up directly from @bs. Just look for
> > the BdrvChild in bs->parents with .role == _backing.
> 
> Yes, but the BdrvChild in bs->parents does not contain any pointer to
> the actual parent node, so unless we want to change that structure I
> wouldn't go for this solution.

Yes, and that's intentional. If we do things right, there shouldn't be
any need to go upwards in the tree. Our current op blockers are not done
right as they are only effective when set on the top layer. Some
temporary ugliness to deal with this is okay; breaking the assumption
that children don't know their parents would require much better
justification.

> > More interesting question: What happens if you have e.g. a qcow2 file
> > as a quorum child, and then want to stream inside of the qcow2 backing
> > chain?
> >
> > So maybe you should first walk up along _backing and then along
> > _file/_format. I think a generic bdrv_get_root_bs()
> > wouldn't hurt.
> 
> You're right. I'm not sure if that would have other consequences that we
> should consider, so maybe we can add that later, with its own set of
> tests.
> 
> Also, this is all necessary because of the problem with bdrv_reopen().
> If that is ever fixed there's no need to block the active layer at all.

This patch errors out if we can't find the active layer. Sounds safe and
appropriate for an initial version. The real solution isn't to improve
the magic to find the root node, but to remove the need to find it (by
getting the new op blockers).

Kevin



Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer

2016-04-29 Thread Kevin Wolf
Am 04.04.2016 um 15:43 hat Alberto Garcia geschrieben:
> This patch makes the 'device' parameter of the 'block-stream' command
> accept a node name as well as a device name.
> 
> In addition to that, operation blockers will be checked in all
> intermediate nodes between the top and the base node.
> 
> Since qmp_block_stream() now uses the error from bdrv_lookup_bs() and
> no longer returns DeviceNotFound, iotest 030 is updated to expect
> GenericError instead.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  blockdev.c | 31 +++
>  qapi/block-core.json   | 10 +++---
>  tests/qemu-iotests/030 |  2 +-
>  3 files changed, 31 insertions(+), 12 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 2e7712e..bfdc0e3 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -2989,6 +2989,7 @@ void qmp_block_stream(const char *device,
>  BlockBackend *blk;
>  BlockDriverState *bs;
>  BlockDriverState *base_bs = NULL;
> +BlockDriverState *active;
>  AioContext *aio_context;
>  Error *local_err = NULL;
>  const char *base_name = NULL;
> @@ -2997,21 +2998,19 @@ void qmp_block_stream(const char *device,
>  on_error = BLOCKDEV_ON_ERROR_REPORT;
>  }
>  
> -blk = blk_by_name(device);
> -if (!blk) {
> -error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
> -  "Device '%s' not found", device);
> +bs = bdrv_lookup_bs(device, device, errp);
> +if (!bs) {
>  return;
>  }
>  
> -aio_context = blk_get_aio_context(blk);
> +aio_context = bdrv_get_aio_context(bs);
>  aio_context_acquire(aio_context);
>  
> -if (!blk_is_available(blk)) {
> +blk = blk_by_name(device);
> +if (blk && !blk_is_available(blk)) {
>  error_setg(errp, "Device '%s' has no medium", device);
>  goto out;
>  }
> -bs = blk_bs(blk);
>  
>  if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_STREAM, errp)) {
>  goto out;
> @@ -3027,6 +3026,22 @@ void qmp_block_stream(const char *device,
>  base_name = base;
>  }
>  
> +/* Look for the top-level node that contains 'bs' in its chain */
> +active = NULL;
> +do {
> +active = bdrv_next(active);
> +} while (active && !bdrv_chain_contains(active, bs));
> +
> +if (active == NULL) {
> +error_setg(errp, "Cannot find top level node for '%s'", device);
> +goto out;
> +}

Hm... On the one hand, I really like that you don't expect the user to
provide the active layer in QMP. This allows us to remove this wart once
we have the new op blockers.

On the other hand, this code assumes that there is only a single
top-level node. This isn't necessarily true any more these days. Do we
need to set blockers on _all_ root nodes that have the node in their
backing chain?
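
The question can be made concrete with a toy model of the graph. In the sketch below, struct node, chain_contains and roots_containing are simplified, hypothetical stand-ins (QEMU's BlockDriverState and bdrv_chain_contains() are far richer); the point is only that more than one root can reach the same node through its backing chain.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy block node: a name and a single backing link. */
struct node {
    const char *name;
    struct node *backing;
};

/* True if target is top itself or is reachable via backing links. */
bool chain_contains(const struct node *top, const struct node *target)
{
    for (; top != NULL; top = top->backing) {
        if (top == target) {
            return true;
        }
    }
    return false;
}

/*
 * Collect every root whose backing chain contains target.
 * out must have room for n entries; the number of matches is returned.
 */
size_t roots_containing(struct node *const *roots, size_t n,
                        const struct node *target, struct node **out)
{
    size_t found = 0;

    for (size_t i = 0; i < n; i++) {
        if (chain_contains(roots[i], target)) {
            out[found++] = roots[i];
        }
    }
    return found;
}
```

A loop that stops at the first match, like the one in the patch, would set blockers on only one of those roots.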

Kevin



Re: [Qemu-devel] [PATCH v9 06/11] block: Support streaming to an intermediate layer

2016-04-29 Thread Kevin Wolf
Am 04.04.2016 um 15:43 hat Alberto Garcia geschrieben:
> This makes sure that the image we are steaming into is open in

s/steaming/streaming/

> read-write mode during the operation.
> 
> The block job is created on the destination image, but operation
> blockers are also set on the active layer. We do this in order to
> prevent other block jobs from running in parallel in the same chain.
> See here for details on why that is currently not supported:
> 
> [Qemu-block] RFC: Status of the intermediate block streaming work
> https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html
> 
> Finally, this also unblocks the stream operation in backing files.
> 
> Signed-off-by: Alberto Garcia 

Would be great to have the new op blockers already, because having to
know the active layer before streaming can be started is certainly not
nice.

Kevin



Re: [Qemu-devel] emulation details of qemu

2016-04-29 Thread Alex Bennée

tutu sky  writes:

> Magic answer, Thanks a lot Alex.
> you mean GDB will be enabled for just QEMU's itself internals? It does not 
> make importance or any difference for guest running on it?
> if i want describe my opinion in another way, i think you said that
> when enabling GDB for QEMU, it is usable and is just important to be
> usable for QEMU internals, as a user wants to develop it or a person
> may want to know how he can watch a processor internals. Yeah?

I'm not sure I follow. Using QEMU's gdbstub to debug a guest is
different from debugging QEMU itself by running it under gdb.

> Can GDB  be activated for multicore architectures? in order to see every 
> core's internals separately?
> I ask these questions because QEMU documentation is not clear enough
> and sometimes hard to understand. for example for attaching GDB to
> QEMU, i am unable to find a good and general guide. it seems it just
> depend on how much you know about GDB and how to work with. am i
> right?

Generally, to use the stub you start the guest with -s -S, e.g.:

qemu-system-arm -machine virt,accel=tcg -cpu cortex-a15 -display none \
  -serial stdio -kernel ./arm/locking-test.flat -smp 4 -s -S

And then invoke gdb with something like:

gdb-multiarch ./arm/locking-test.elf -ex "target remote localhost:1234"

So in this example I'm using the .elf file with gdb as that has the
debugging information for the .flat file I started QEMU with. -ex just
saves the hassle of typing in the "target remote localhost:1234" to
connect to the gdb stub when you start up. Once in gdb you can do all
the usual things:

(gdb) info threads
  Id   Target Id                  Frame
* 1    Thread 1 (CPU#0 [running]) 0x4000 in ?? ()
  2    Thread 2 (CPU#1 [halted ]) 0x in ?? ()
  3    Thread 3 (CPU#2 [halted ]) 0x in ?? ()
  4    Thread 4 (CPU#3 [halted ]) 0x in ?? ()
(gdb) x/4i $pc
  => 0x4000:  mov r0, #0
 0x4004:  ldr r1, [pc, #4]; 0x4010
 0x4008:  ldr r2, [pc, #4]; 0x4014
 0x400c:  ldr pc, [pc, #4]; 0x4018
(gdb) p/x $r0
$1 = 0x0
(gdb) p/x $r1
$2 = 0x0
(gdb) i
 0x4004 in ?? ()
  => 0x4004:  ldr r1, [pc, #4]; 0x4010
 0x4008:  ldr r2, [pc, #4]; 0x4014
 0x400c:  ldr pc, [pc, #4]; 0x4018
(gdb) i
 0x4008 in ?? ()
  => 0x4008:  ldr r2, [pc, #4]; 0x4014
(gdb) p/x $r1
$3 = 0x

>
> Thanks and regards.
>
> 
> From: Alex Bennée 
> Sent: Friday, April 29, 2016 12:22 PM
> To: tutu sky
> Cc: Stefan Hajnoczi; qemu-devel@nongnu.org
> Subject: Re: [Qemu-devel] emulation details of qemu
>
> tutu sky  writes:
>
>> Yeah, thank you Alex.
>> If I use a linux on top of the qemu, for entering debug mode, do i
>> need to compile kernel from source or it is not dependent on debugging
>> qemu itself?
>
> I'm not sure I follow. As far as QEMU is concerned it provides a stub
> for GDB to talk to and doesn't need to know anything else about the
> guest it is running. The GDB itself will want symbols one way or another
> so you would either compile your kernel from source or pass the debug
> symbol enabled vmlinux to GDB using symbol-file.
>
>> and then is it possible to define a heterogeneous multicore platform
>> in qemu?
>
> The current upstream QEMU doesn't support heterogeneous setups although
> some preliminary work has been posted to allow multiple front-ends to be
> compiled together.
>
> There are certainly out-of-tree solutions although as I understand it
> (I've not worked with them myself) they use multiple QEMU runtimes
> linked together with some sort of shared memory bus/IPC layer.
>
>>
>> Thanks and regards.
>>
>> 
>> From: Alex Bennée 
>> Sent: Thursday, April 28, 2016 6:45 PM
>> To: tutu sky
>> Cc: Stefan Hajnoczi; qemu-devel@nongnu.org
>> Subject: Re: [Qemu-devel] emulation details of qemu
>>
>> tutu sky  writes:
>>
>>> Thanks a lot Stefan,
>>> But if i want to change the content of a register during run time in
>>> debug mode, what should i do? is it possible at first?
>>
>> Using the gdbstub sure you can change the register values when the
>> machine is halted.
>>
>>>
>>> Regards.
>>> 
>>> From: Stefan Hajnoczi 
>>> Sent: Tuesday, April 26, 2016 9:31 AM
>>> To: tutu sky
>>> Cc: qemu-devel@nongnu.org
>>> Subject: Re: [Qemu-devel] emulation details of qemu
>>>
>>> On Sat, Apr 23, 2016 at 06:36:39AM +, tutu sky wrote:
>>>> I want to know that is it possible to access registers or
>>>> micro-architectural part of a core/cpu in qemu during run time?
>>>
>>> Yes.  How and to what extent depends on whether you are using TCG, KVM,
>>> or TCI.  QEMU also has gdbstub support so you can single-step execution
>>> and access CPU 

Re: [Qemu-devel] [PATCH v5 5/9] target-mips: Activate IEEE 754-2008 signaling NaN bit meaning

2016-04-29 Thread Leon Alrae
On 18/04/16 17:03, Aleksandar Markovic wrote:
> From: Aleksandar Markovic 
> 
> Functions mips_cpu_reset() and msa_reset() are updated so that flag
> snan_bit_is_one is properly set for any Mips FPU/MSA configuration.
> For main FPUs, CPUs with FCR31's FCR31_NAN2008 bit set will invoke
> set_snan_bit_is_one(0). For MSA, as it is IEEE 754-2008 compliant
> from its inception, set_snan_bit_is_one(0) will always be invoked.
> 
> By applying this patch, a number of incorrect behaviors for CPU
> configurations that require IEEE 754-2008 compliance will be fixed.
> Those are behaviors that (up to the moment of applying this patch)
> did not get the desired functionality from SoftFloat library with
> respect to distinguishing between quiet and signaling NaN, getting
> default NaN values (both quiet and signaling), establishing if a
> floating point number is NaN or not, etc.
> 
> Just two examples:
> 
> * . will now correctly detect and propagate NaNs.
> * CLASS. will now correctly detect NaN flavors, both their
>   CPU FPU and MSA version.
> 
> Signed-off-by: Thomas Schwinge 
> Signed-off-by: Maciej W. Rozycki 
> Signed-off-by: Aleksandar Markovic 
> ---
>  target-mips/translate.c  | 6 +-
>  target-mips/translate_init.c | 3 ++-
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index e934884..2cdd2bd 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -20129,7 +20129,11 @@ void cpu_state_reset(CPUMIPSState *env)
>  env->CP0_PageGrain = env->cpu_model->CP0_PageGrain;
>  env->active_fpu.fcr0 = env->cpu_model->CP1_fcr0;
>  env->active_fpu.fcr31 = env->cpu_model->CP1_fcr31;
> -set_snan_bit_is_one(1, &env->active_fpu.fp_status);
> +if ((env->active_fpu.fcr31 >> FCR31_NAN2008) & 1) {
> +set_snan_bit_is_one(0, &env->active_fpu.fp_status);
> +} else {
> +set_snan_bit_is_one(1, &env->active_fpu.fp_status);
> +}
>  env->msair = env->cpu_model->MSAIR;
>  env->insn_flags = env->cpu_model->insn_flags;
>  
> diff --git a/target-mips/translate_init.c b/target-mips/translate_init.c
> index 1094baa..bae6183 100644
> --- a/target-mips/translate_init.c
> +++ b/target-mips/translate_init.c
> @@ -904,5 +904,6 @@ static void msa_reset(CPUMIPSState *env)
>  /* clear float_status nan mode */
>  set_default_nan_mode(0, &env->active_tc.msa_fp_status);
>  
> -set_snan_bit_is_one(1, &env->active_tc.msa_fp_status);
> +/* set proper signaling bit meaning ("1" means "quiet") */
> +set_snan_bit_is_one(0, &env->active_tc.msa_fp_status);
>  }

To support r3, specifically writable {NAN,ABS}2008 bits, we will need to
restore snan_bit_is_one in more places than just reset (for example
after migration), which suggests that the code in this patch deserves to
be placed in a separate function, just like it was done originally.
Also, having the fcr31_rw_bitmask would nicely clean up the fcr31
handling in helper_ctc1.

If you plan to do that later then that's OK as far as I'm concerned, but
if those changes (which were already posted and not that big) were
included here from the beginning then we would avoid having to rework
above code.
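
As a rough illustration of the separate function suggested above, here is a minimal self-contained sketch. The `Toy*` types and `toy_*` names are placeholders rather than the real CPUMIPSState or SoftFloat API, and the bit position used for FCR31_NAN2008 is an assumption; only the shape of the helper is the point.

```c
#include <assert.h>
#include <stdint.h>

#define FCR31_NAN2008 18   /* assumed bit position of the NAN2008 flag */

/* Toy stand-ins for float_status and the FPU context. */
typedef struct {
    int snan_bit_is_one;
} ToyFloatStatus;

typedef struct {
    uint32_t fcr31;
    ToyFloatStatus fp_status;
} ToyFpu;

static void toy_set_snan_bit_is_one(int val, ToyFloatStatus *status)
{
    status->snan_bit_is_one = val;
}

/* Derive the NaN convention from FCR31 in one place, so that reset, a
 * write to FCR31 and a post-migration hook can all call the same code.
 * With NAN2008 set, a 1 in the signaling bit means "quiet", hence 0. */
static void toy_restore_snan_bit_mode(ToyFpu *fpu)
{
    assert(fpu);
    int nan2008 = (fpu->fcr31 >> FCR31_NAN2008) & 1;
    toy_set_snan_bit_is_one(!nan2008, &fpu->fp_status);
}
```

Every path that changes fcr31 would then end with a call to the helper instead of open-coding the if/else.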

Thanks,
Leon



Re: [Qemu-devel] [PATCH v2 1/5] Postcopy: Avoid 0 length discards

2016-04-29 Thread Denis V. Lunev

On 04/29/2016 05:47 PM, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" 

The discard code in migration/ram.c would send request for
zero length discards in the case where no discards were needed.
It doesn't appear to have had any bad effect.

Signed-off-by: Dr. David Alan Gilbert 
---
  migration/ram.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 3f05738..e96c2af 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1557,7 +1557,9 @@ static int postcopy_send_discard_bm_ram(MigrationState *ms,
  } else {
  discard_length = zero - one;
  }
-postcopy_discard_send_range(ms, pds, one, discard_length);
+if (discard_length) {
+postcopy_discard_send_range(ms, pds, one, discard_length);
+}
  current = one + discard_length;
  } else {
  current = one;

Reviewed-by: Denis V. Lunev 



Re: [Qemu-devel] [PATCH v2 3/5] Postcopy: Add stats on page requests

2016-04-29 Thread Denis V. Lunev

On 04/29/2016 05:47 PM, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" 

On the source, add a count of page requests received from the
destination.

Signed-off-by: Dr. David Alan Gilbert 
---
  hmp.c | 4 
  include/migration/migration.h | 2 ++
  migration/migration.c | 2 ++
  migration/ram.c   | 1 +
  qapi-schema.json  | 6 +-
  5 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/hmp.c b/hmp.c
index d510236..cd5fae3 100644
--- a/hmp.c
+++ b/hmp.c
@@ -209,6 +209,10 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
  monitor_printf(mon, "dirty pages rate: %" PRIu64 " pages\n",
 info->ram->dirty_pages_rate);
  }
+if (info->ram->postcopy_requests) {
+monitor_printf(mon, "postcopy request count: %" PRIu64 "\n",
+   info->ram->postcopy_requests);
+}
  }
  
  if (info->has_disk) {
diff --git a/include/migration/migration.h b/include/migration/migration.h
index ac2c12c..78fa59b 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -157,6 +157,8 @@ struct MigrationState
  int64_t xbzrle_cache_size;
  int64_t setup_time;
  int64_t dirty_sync_count;
+/* Count of requests incoming from destination */
+int64_t postcopy_requests;
  
  /* Flag set once the migration has been asked to enter postcopy */
  bool start_postcopy;
diff --git a/migration/migration.c b/migration/migration.c
index bfb326d..9d41618 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -573,6 +573,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
  info->ram->normal_bytes = norm_mig_bytes_transferred();
  info->ram->mbps = s->mbps;
  info->ram->dirty_sync_count = s->dirty_sync_count;
+info->ram->postcopy_requests = s->postcopy_requests;
  
  if (s->state != MIGRATION_STATUS_COMPLETED) {
  info->ram->remaining = ram_bytes_remaining();
@@ -933,6 +934,7 @@ MigrationState *migrate_init(const MigrationParams *params)
  s->dirty_sync_count = 0;
  s->start_postcopy = false;
  s->postcopy_after_devices = false;
+s->postcopy_requests = 0;
  s->migration_thread_running = false;
  s->last_req_rb = NULL;
  
diff --git a/migration/ram.c b/migration/ram.c
index e96c2af..eeb1902 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1169,6 +1169,7 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
  {
  RAMBlock *ramblock;
  
+ms->postcopy_requests++;
  rcu_read_lock();
  if (!rbname) {
  /* Reuse last RAMBlock */
diff --git a/qapi-schema.json b/qapi-schema.json
index 54634c4..8bc581e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -382,13 +382,17 @@
  #
  # @dirty-sync-count: number of times that dirty ram was synchronized (since 2.1)
  #
+# @postcopy-requests: The number of page requests received from the destination
+#(since 2.7)
+#
  # Since: 0.14.0
  ##
  { 'struct': 'MigrationStats',
'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' ,
 'duplicate': 'int', 'skipped': 'int', 'normal': 'int',
 'normal-bytes': 'int', 'dirty-pages-rate' : 'int',
-   'mbps' : 'number', 'dirty-sync-count' : 'int' } }
+   'mbps' : 'number', 'dirty-sync-count' : 'int',
+   'postcopy-requests' : 'int' } }
  
  ##
  # @XBZRLECacheStats

Reviewed-by: Denis V. Lunev 



Re: [Qemu-devel] [PATCH v2 2/5] Migration: Split out ram part of qmp_query_migrate

2016-04-29 Thread Denis V. Lunev

On 04/29/2016 05:47 PM, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" 

The RAM section of qmp_query_migrate is reasonably complex
and repeated 3 times.  Split it out into a helper.

Signed-off-by: Dr. David Alan Gilbert 
---
  migration/migration.c | 57 ---
  1 file changed, 22 insertions(+), 35 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 991313a..bfb326d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -561,6 +561,25 @@ static void get_xbzrle_cache_stats(MigrationInfo *info)
  }
  }
  
+static void populate_ram_info(MigrationInfo *info, MigrationState *s)
+{
+info->has_ram = true;
+info->ram = g_malloc0(sizeof(*info->ram));
+info->ram->transferred = ram_bytes_transferred();
+info->ram->total = ram_bytes_total();
+info->ram->duplicate = dup_mig_pages_transferred();
+info->ram->skipped = skipped_mig_pages_transferred();
+info->ram->normal = norm_mig_pages_transferred();
+info->ram->normal_bytes = norm_mig_bytes_transferred();
+info->ram->mbps = s->mbps;
+info->ram->dirty_sync_count = s->dirty_sync_count;
+
+if (s->state != MIGRATION_STATUS_COMPLETED) {
+info->ram->remaining = ram_bytes_remaining();
+info->ram->dirty_pages_rate = s->dirty_pages_rate;
+}
+}
+
  MigrationInfo *qmp_query_migrate(Error **errp)
  {
  MigrationInfo *info = g_malloc0(sizeof(*info));
@@ -585,18 +604,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
  info->has_setup_time = true;
  info->setup_time = s->setup_time;
  
-info->has_ram = true;
-info->ram = g_malloc0(sizeof(*info->ram));
-info->ram->transferred = ram_bytes_transferred();
-info->ram->remaining = ram_bytes_remaining();
-info->ram->total = ram_bytes_total();
-info->ram->duplicate = dup_mig_pages_transferred();
-info->ram->skipped = skipped_mig_pages_transferred();
-info->ram->normal = norm_mig_pages_transferred();
-info->ram->normal_bytes = norm_mig_bytes_transferred();
-info->ram->dirty_pages_rate = s->dirty_pages_rate;
-info->ram->mbps = s->mbps;
-info->ram->dirty_sync_count = s->dirty_sync_count;
+populate_ram_info(info, s);
  
  if (blk_mig_active()) {
  info->has_disk = true;
@@ -624,18 +632,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
  info->has_setup_time = true;
  info->setup_time = s->setup_time;
  
-info->has_ram = true;
-info->ram = g_malloc0(sizeof(*info->ram));
-info->ram->transferred = ram_bytes_transferred();
-info->ram->remaining = ram_bytes_remaining();
-info->ram->total = ram_bytes_total();
-info->ram->duplicate = dup_mig_pages_transferred();
-info->ram->skipped = skipped_mig_pages_transferred();
-info->ram->normal = norm_mig_pages_transferred();
-info->ram->normal_bytes = norm_mig_bytes_transferred();
-info->ram->dirty_pages_rate = s->dirty_pages_rate;
-info->ram->mbps = s->mbps;
-info->ram->dirty_sync_count = s->dirty_sync_count;
+populate_ram_info(info, s);
  
  if (blk_mig_active()) {
  info->has_disk = true;
@@ -658,17 +655,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
  info->has_setup_time = true;
  info->setup_time = s->setup_time;
  
-info->has_ram = true;
-info->ram = g_malloc0(sizeof(*info->ram));
-info->ram->transferred = ram_bytes_transferred();
-info->ram->remaining = 0;
-info->ram->total = ram_bytes_total();
-info->ram->duplicate = dup_mig_pages_transferred();
-info->ram->skipped = skipped_mig_pages_transferred();
-info->ram->normal = norm_mig_pages_transferred();
-info->ram->normal_bytes = norm_mig_bytes_transferred();
-info->ram->mbps = s->mbps;
-info->ram->dirty_sync_count = s->dirty_sync_count;
+populate_ram_info(info, s);
  break;
  case MIGRATION_STATUS_FAILED:
  info->has_status = true;

Reviewed-by: Denis V. Lunev 



Re: [Qemu-devel] [PATCH v9 05/11] block: allow block jobs in any arbitrary node

2016-04-29 Thread Kevin Wolf
Am 04.04.2016 um 15:43 hat Alberto Garcia geschrieben:
> Currently, block jobs can only be owned by root nodes. This patch
> allows block jobs to be in any arbitrary node, by making the following
> changes:
> 
> - Block jobs can now be identified by the node name of their
>   BlockDriverState in addition to the device name. Since both device
>   and node names live in the same namespace there's no ambiguity.

Careful, we know this is a part of our API that we have already messed
up and we don't want to make things worse while adding new things before
we've cleaned it up.

Let's keep in mind where we are planning to go with block jobs: They
should become background jobs, not tied to any block device. The close
connection to a single BDS already doesn't make a lot of sense today
because most block jobs involve at least two BDSes.

In the final state, we will have a job ID that uniquely identifies the
job, and each command that starts a job will take an ID from the client.
For compatibility, we'll use the block device name as the job ID when
using old commands that don't get an explicit ID yet.

In the existing qemu version, you can't start two block jobs on the same
device, and in future versions, you're supposed to specify an ID each
time. This is why the default can always be supposed to work without
conflicts. If in newer versions, the client mixes both ways (which it
shouldn't do), starting a new block job may fail because the device name
is already in use as an ID for another job.

Now we can probably make the same argument for node names, so we can
extend the interface and still keep it compatible.

Where we need to be careful is that with device names and node names, we
have potentially two names describing the same BDS and therefore the
same job. This must not happen, because we won't be able to represent
that in the generic background job API. Any job has just a single ID
there.
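
The single-ID rule described above can be sketched roughly as follows. This is a toy registry, not the planned background job API; `ToyJobRegistry`, `toy_job_start()` and `toy_job_find()` are made-up names. An omitted ID falls back to the device name (the compatibility case), and starting a job fails if that ID is already taken.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_JOBS 8
#define MAX_ID   32

typedef struct {
    char id[MAX_ID];
    int in_use;
} ToyJob;

typedef struct {
    ToyJob jobs[MAX_JOBS];
} ToyJobRegistry;

/* Look a job up by its one and only ID. */
static ToyJob *toy_job_find(ToyJobRegistry *r, const char *id)
{
    for (size_t i = 0; i < MAX_JOBS; i++) {
        if (r->jobs[i].in_use && strcmp(r->jobs[i].id, id) == 0) {
            return &r->jobs[i];
        }
    }
    return NULL;
}

/* Start a job. 'id' may be NULL for old-style commands, in which case the
 * device name becomes the ID. Returns NULL if the ID is already in use,
 * is too long, or the registry is full. */
static ToyJob *toy_job_start(ToyJobRegistry *r, const char *id,
                             const char *device)
{
    assert(r && device);
    const char *effective = id ? id : device;

    if (strlen(effective) >= MAX_ID || toy_job_find(r, effective)) {
        return NULL;
    }
    for (size_t i = 0; i < MAX_JOBS; i++) {
        if (!r->jobs[i].in_use) {
            strcpy(r->jobs[i].id, effective);
            r->jobs[i].in_use = 1;
            return &r->jobs[i];
        }
    }
    return NULL;
}
```

Note that a job started with an explicit ID on "drive1" cannot be found under the name "drive1"; there is exactly one name per job.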

> - The "device" parameter used by all commands that operate on block
>   jobs can also be a node name now.

So I think the least we need to do is change this one to resolve the
block job by its ID (job->id) rather than device or node names.

> - The node name is used as a fallback to fill in the BlockJobInfo
>   structure and all BLOCK_JOB_* events if there is no device name for
>   that job.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  blockdev.c   | 18 ++
>  blockjob.c   |  5 +++--
>  docs/qmp-events.txt  |  8 
>  qapi/block-core.json | 20 ++--
>  4 files changed, 27 insertions(+), 24 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index edbcc19..d1f5dfb 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3685,8 +3685,10 @@ void qmp_blockdev_mirror(const char *device, const char *target,
>  aio_context_release(aio_context);
>  }
>  
> -/* Get the block job for a given device name and acquire its AioContext */
> -static BlockJob *find_block_job(const char *device, AioContext **aio_context,
> +/* Get the block job for a given device or node name
> + * and acquire its AioContext */
> +static BlockJob *find_block_job(const char *device_or_node,
> +AioContext **aio_context,
>  Error **errp)
>  {
>  BlockBackend *blk;
> @@ -3694,18 +3696,18 @@ static BlockJob *find_block_job(const char *device, AioContext **aio_context,
>  
>  *aio_context = NULL;
>  
> -blk = blk_by_name(device);
> -if (!blk) {
> +bs = bdrv_lookup_bs(device_or_node, device_or_node, errp);

Specifically, this one is bad. It allows two different ways to specify
the same job.

The other thing about this patch is that I'm not sure how badly it
conflicts with my series to convert block jobs to BlockBackend. It seems
that you switch everything from blk to bs here, and I'll have to switch
back to blk (once job->blk exists).

Kevin



Re: [Qemu-devel] [PATCH v2 3/5] Postcopy: Add stats on page requests

2016-04-29 Thread Eric Blake
On 04/29/2016 08:47 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> On the source, add a count of page requests received from the
> destination.
> 
> Signed-off-by: Dr. David Alan Gilbert 
> ---

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



Re: [Qemu-devel] [PATCH v2 2/5] Migration: Split out ram part of qmp_query_migrate

2016-04-29 Thread Eric Blake
On 04/29/2016 08:47 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> The RAM section of qmp_query_migrate is reasonably complex
> and repeated 3 times.  Split it out into a helper.
> 
> Signed-off-by: Dr. David Alan Gilbert 
> ---
>  migration/migration.c | 57 ---
>  1 file changed, 22 insertions(+), 35 deletions(-)
> 

Reviewed-by: Eric Blake 

and thanks for the split.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



[Qemu-devel] [PATCH v2 3/5] Postcopy: Add stats on page requests

2016-04-29 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

On the source, add a count of page requests received from the
destination.

Signed-off-by: Dr. David Alan Gilbert 
---
 hmp.c | 4 
 include/migration/migration.h | 2 ++
 migration/migration.c | 2 ++
 migration/ram.c   | 1 +
 qapi-schema.json  | 6 +-
 5 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/hmp.c b/hmp.c
index d510236..cd5fae3 100644
--- a/hmp.c
+++ b/hmp.c
@@ -209,6 +209,10 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "dirty pages rate: %" PRIu64 " pages\n",
info->ram->dirty_pages_rate);
 }
+if (info->ram->postcopy_requests) {
+monitor_printf(mon, "postcopy request count: %" PRIu64 "\n",
+   info->ram->postcopy_requests);
+}
 }
 
 if (info->has_disk) {
diff --git a/include/migration/migration.h b/include/migration/migration.h
index ac2c12c..78fa59b 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -157,6 +157,8 @@ struct MigrationState
 int64_t xbzrle_cache_size;
 int64_t setup_time;
 int64_t dirty_sync_count;
+/* Count of requests incoming from destination */
+int64_t postcopy_requests;
 
 /* Flag set once the migration has been asked to enter postcopy */
 bool start_postcopy;
diff --git a/migration/migration.c b/migration/migration.c
index bfb326d..9d41618 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -573,6 +573,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 info->ram->normal_bytes = norm_mig_bytes_transferred();
 info->ram->mbps = s->mbps;
 info->ram->dirty_sync_count = s->dirty_sync_count;
+info->ram->postcopy_requests = s->postcopy_requests;
 
 if (s->state != MIGRATION_STATUS_COMPLETED) {
 info->ram->remaining = ram_bytes_remaining();
@@ -933,6 +934,7 @@ MigrationState *migrate_init(const MigrationParams *params)
 s->dirty_sync_count = 0;
 s->start_postcopy = false;
 s->postcopy_after_devices = false;
+s->postcopy_requests = 0;
 s->migration_thread_running = false;
 s->last_req_rb = NULL;
 
diff --git a/migration/ram.c b/migration/ram.c
index e96c2af..eeb1902 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1169,6 +1169,7 @@ int ram_save_queue_pages(MigrationState *ms, const char *rbname,
 {
 RAMBlock *ramblock;
 
+ms->postcopy_requests++;
 rcu_read_lock();
 if (!rbname) {
 /* Reuse last RAMBlock */
diff --git a/qapi-schema.json b/qapi-schema.json
index 54634c4..8bc581e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -382,13 +382,17 @@
 #
 # @dirty-sync-count: number of times that dirty ram was synchronized (since 2.1)
 #
+# @postcopy-requests: The number of page requests received from the destination
+#(since 2.7)
+#
 # Since: 0.14.0
 ##
 { 'struct': 'MigrationStats',
   'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' ,
'duplicate': 'int', 'skipped': 'int', 'normal': 'int',
'normal-bytes': 'int', 'dirty-pages-rate' : 'int',
-   'mbps' : 'number', 'dirty-sync-count' : 'int' } }
+   'mbps' : 'number', 'dirty-sync-count' : 'int',
+   'postcopy-requests' : 'int' } }
 
 ##
 # @XBZRLECacheStats
-- 
2.5.5




[Qemu-devel] [PATCH v2 0/5] postcopy (& 1 test) patch for 2.7

2016-04-29 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

Hi,
  This is a small set of postcopy changes, the largest of which
is an x86 test for postcopy.

Andrea's libqtest change came about from running my test under very heavy
load.

The test includes a self contained migration workload that rapidly changes
RAM in a predictable fashion allowing us to end up in postcopy mode and
also to be able to check the contents of RAM.
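
As an illustration of what "predictable" can mean here, the following is a toy sketch and not the code from tests/postcopy-test.c: the writer bumps the first byte of each page in a fixed sweep order, so at any instant the per-page bytes are either all equal or show exactly one step down by one at the point the sweep has reached. All names and the page size are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096

/* Writer: increment the first byte of 'n_touch' consecutive pages,
 * continuing from *cursor and wrapping around at the end of RAM. */
static void toy_workload_step(uint8_t *ram, size_t n_pages,
                              size_t *cursor, size_t n_touch)
{
    assert(ram && cursor && n_pages > 0);
    for (size_t i = 0; i < n_touch; i++) {
        ram[*cursor * TOY_PAGE_SIZE]++;        /* uint8_t wraps at 256 */
        *cursor = (*cursor + 1) % n_pages;
    }
}

/* Checker: accept RAM if all first bytes are equal, or if there is a
 * single edge where the value drops by one (modulo 256). */
static int toy_ram_is_consistent(const uint8_t *ram, size_t n_pages)
{
    uint8_t last = ram[0];
    int seen_edge = 0;

    for (size_t p = 1; p < n_pages; p++) {
        uint8_t b = ram[p * TOY_PAGE_SIZE];
        if (b == last) {
            continue;
        }
        if (!seen_edge && (uint8_t)(b + 1) == last) {
            seen_edge = 1;     /* the sweep is currently at this page */
            last = b;
        } else {
            return 0;          /* torn or stale page */
        }
    }
    return 1;
}
```

A destination that ended up with a stale or corrupted page would show a second edge or a jump of more than one, which the checker rejects.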

  Note this sometimes fails on Linux kernels 4.5 (and current 4.6) which
  have a KVM+THP bug. Use this fix:
 https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg04028.html

v2:
  Split 'Add stats...' into two (Eric's comment)
  Test:
  Survive qmp events landing when we're expecting a response from the
  command (qmp/libqtest doesn't help in that)
  Fix a race where we'd start postcopy early

Dave

Andrea Arcangeli (1):
  tests: fix libqtest socket timeouts

Dr. David Alan Gilbert (4):
  Postcopy: Avoid 0 length discards
  Migration: Split out ram part of qmp_query_migrate
  Postcopy: Add stats on page requests
  test: Postcopy

 hmp.c |   4 +
 include/migration/migration.h |   2 +
 migration/migration.c |  59 +++---
 migration/ram.c   |   5 +-
 qapi-schema.json  |   6 +-
 tests/Makefile|   2 +
 tests/libqtest.c  |   2 +-
 tests/postcopy-test.c | 455 ++
 8 files changed, 497 insertions(+), 38 deletions(-)
 create mode 100644 tests/postcopy-test.c

-- 
2.5.5




[Qemu-devel] [PATCH v2 5/5] tests: fix libqtest socket timeouts

2016-04-29 Thread Dr. David Alan Gilbert (git)
From: Andrea Arcangeli 

I kept getting timeouts and unix socket accept failures under high
load, the patch fixes it.

Signed-off-by: Andrea Arcangeli 
---
 tests/libqtest.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/libqtest.c b/tests/libqtest.c
index b12a9e4..57ce292 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -27,7 +27,7 @@
 #include "qapi/qmp/qjson.h"
 
 #define MAX_IRQ 256
-#define SOCKET_TIMEOUT 5
+#define SOCKET_TIMEOUT 50
 
 QTestState *global_qtest;
 
-- 
2.5.5




[Qemu-devel] [PATCH v2 2/5] Migration: Split out ram part of qmp_query_migrate

2016-04-29 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

The RAM section of qmp_query_migrate is reasonably complex
and repeated 3 times.  Split it out into a helper.

Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 57 ---
 1 file changed, 22 insertions(+), 35 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 991313a..bfb326d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -561,6 +561,25 @@ static void get_xbzrle_cache_stats(MigrationInfo *info)
 }
 }
 
+static void populate_ram_info(MigrationInfo *info, MigrationState *s)
+{
+info->has_ram = true;
+info->ram = g_malloc0(sizeof(*info->ram));
+info->ram->transferred = ram_bytes_transferred();
+info->ram->total = ram_bytes_total();
+info->ram->duplicate = dup_mig_pages_transferred();
+info->ram->skipped = skipped_mig_pages_transferred();
+info->ram->normal = norm_mig_pages_transferred();
+info->ram->normal_bytes = norm_mig_bytes_transferred();
+info->ram->mbps = s->mbps;
+info->ram->dirty_sync_count = s->dirty_sync_count;
+
+if (s->state != MIGRATION_STATUS_COMPLETED) {
+info->ram->remaining = ram_bytes_remaining();
+info->ram->dirty_pages_rate = s->dirty_pages_rate;
+}
+}
+
 MigrationInfo *qmp_query_migrate(Error **errp)
 {
 MigrationInfo *info = g_malloc0(sizeof(*info));
@@ -585,18 +604,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 info->has_setup_time = true;
 info->setup_time = s->setup_time;
 
-info->has_ram = true;
-info->ram = g_malloc0(sizeof(*info->ram));
-info->ram->transferred = ram_bytes_transferred();
-info->ram->remaining = ram_bytes_remaining();
-info->ram->total = ram_bytes_total();
-info->ram->duplicate = dup_mig_pages_transferred();
-info->ram->skipped = skipped_mig_pages_transferred();
-info->ram->normal = norm_mig_pages_transferred();
-info->ram->normal_bytes = norm_mig_bytes_transferred();
-info->ram->dirty_pages_rate = s->dirty_pages_rate;
-info->ram->mbps = s->mbps;
-info->ram->dirty_sync_count = s->dirty_sync_count;
+populate_ram_info(info, s);
 
 if (blk_mig_active()) {
 info->has_disk = true;
@@ -624,18 +632,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 info->has_setup_time = true;
 info->setup_time = s->setup_time;
 
-info->has_ram = true;
-info->ram = g_malloc0(sizeof(*info->ram));
-info->ram->transferred = ram_bytes_transferred();
-info->ram->remaining = ram_bytes_remaining();
-info->ram->total = ram_bytes_total();
-info->ram->duplicate = dup_mig_pages_transferred();
-info->ram->skipped = skipped_mig_pages_transferred();
-info->ram->normal = norm_mig_pages_transferred();
-info->ram->normal_bytes = norm_mig_bytes_transferred();
-info->ram->dirty_pages_rate = s->dirty_pages_rate;
-info->ram->mbps = s->mbps;
-info->ram->dirty_sync_count = s->dirty_sync_count;
+populate_ram_info(info, s);
 
 if (blk_mig_active()) {
 info->has_disk = true;
@@ -658,17 +655,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 info->has_setup_time = true;
 info->setup_time = s->setup_time;
 
-info->has_ram = true;
-info->ram = g_malloc0(sizeof(*info->ram));
-info->ram->transferred = ram_bytes_transferred();
-info->ram->remaining = 0;
-info->ram->total = ram_bytes_total();
-info->ram->duplicate = dup_mig_pages_transferred();
-info->ram->skipped = skipped_mig_pages_transferred();
-info->ram->normal = norm_mig_pages_transferred();
-info->ram->normal_bytes = norm_mig_bytes_transferred();
-info->ram->mbps = s->mbps;
-info->ram->dirty_sync_count = s->dirty_sync_count;
+populate_ram_info(info, s);
 break;
 case MIGRATION_STATUS_FAILED:
 info->has_status = true;
-- 
2.5.5




[Qemu-devel] [PATCH v2 1/5] Postcopy: Avoid 0 length discards

2016-04-29 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

The discard code in migration/ram.c would send request for
zero length discards in the case where no discards were needed.
It doesn't appear to have had any bad effect.

Signed-off-by: Dr. David Alan Gilbert 
---
 migration/ram.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 3f05738..e96c2af 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1557,7 +1557,9 @@ static int postcopy_send_discard_bm_ram(MigrationState *ms,
 } else {
 discard_length = zero - one;
 }
-postcopy_discard_send_range(ms, pds, one, discard_length);
+if (discard_length) {
+postcopy_discard_send_range(ms, pds, one, discard_length);
+}
 current = one + discard_length;
 } else {
 current = one;
-- 
2.5.5




Re: [Qemu-devel] [PATCH] configure: Check if struct fsxattr is available from linux header

2016-04-29 Thread Peter Maydell
On 29 April 2016 at 15:31, Stefan Weil  wrote:
> Is it a bug of the system headers? Or simply a design which
> requires users to be careful when including certain header files?
>
> Both /usr/include/xfs/xfs_fs.h and /usr/include/linux/fs.h define
> the same struct fsxattr, and both definitions are identical.

That sounds like a header bug to me...

http://oss.sgi.com/archives/xfs/2016-02/msg00324.html

suggests that (a) the xfsprogs folks are updating their
header to deal with what the kernel header is doing and that
(b) they think the distros ought to be updating both of them
in sync in some way...

> Of course a good comment would be helpful here, e. g.
>
> # Avoid redefinition of struct fsxattr in xfs/xfs_fs.h.
> # It is already defined in linux/fs.h.

Yes, this is really all I want: a note that some versions of
the kernel headers and the xfs headers clash, so we suppress
the xfs version if the kernel header is providing the struct.
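
For readers unfamiliar with the mechanism, here is a self-contained sketch of how such a suppression can work. Both "headers" are inlined stand-ins with a simplified struct layout, and the guard macro name HAVE_FSXATTR is an assumption about what the configure probe would define.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* --- stand-in for the kernel header that already provides the struct --- */
struct fsxattr {
    uint32_t fsx_xflags;
    uint32_t fsx_extsize;
    uint32_t fsx_nextents;
    uint32_t fsx_projid;
    unsigned char fsx_pad[12];
};

/* What a successful configure probe would put on the command line or in
 * the generated config header (assumed name). */
#define HAVE_FSXATTR 1

/* --- stand-in for the xfs header --- */
#ifndef HAVE_FSXATTR
/* Without the guard this second definition would be a redefinition error. */
struct fsxattr {
    uint32_t fsx_xflags;
    uint32_t fsx_extsize;
    uint32_t fsx_nextents;
    uint32_t fsx_projid;
    unsigned char fsx_pad[12];
};
#endif

/* Any user of the struct compiles the same way regardless of which header
 * supplied the definition. */
static void toy_fsxattr_clear(struct fsxattr *fsx)
{
    assert(fsx);
    memset(fsx, 0, sizeof(*fsx));
}
```

Removing the #define while keeping both definitions makes the compiler reject the file, which is the clash the comment in configure should describe.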

thanks
-- PMM



Re: [Qemu-devel] [PATCH v9 04/11] block: use the block job list in bdrv_close()

2016-04-29 Thread Kevin Wolf
Am 04.04.2016 um 15:43 hat Alberto Garcia geschrieben:
> bdrv_close_all() cancels all block jobs by iterating over all
> BlockDriverStates. This patch simplifies the code by iterating
> directly over the block jobs using block_job_next().
> 
> Signed-off-by: Alberto Garcia 

This is essentially the same as I'm doing here:
http://repo.or.cz/qemu/kevin.git/commitdiff/6b545b21e3dfe2e3927cfb6bbdcc1b233c67630c

I think I like having a separate block_job_cancel_sync_all() function
like I did instead of inlining it in bdrv_close_all(), though that's a
matter of taste.

My patch also moves the call, so having to move only a single line in my
patch (assuming it gets rebased on top of this one) seems nicer.

Kevin

>  block.c | 25 ++---
>  1 file changed, 6 insertions(+), 19 deletions(-)
> 
> diff --git a/block.c b/block.c
> index d36eb75..48638c9 100644
> --- a/block.c
> +++ b/block.c
> @@ -2182,8 +2182,7 @@ static void bdrv_close(BlockDriverState *bs)
>  
>  void bdrv_close_all(void)
>  {
> -BlockDriverState *bs;
> -AioContext *aio_context;
> +BlockJob *job;
>  
>  /* Drop references from requests still in flight, such as canceled block
>   * jobs whose AIO context has not been polled yet */
> @@ -2193,23 +2192,11 @@ void bdrv_close_all(void)
>  blockdev_close_all_bdrv_states();
>  
>  /* Cancel all block jobs */
> -while (!QTAILQ_EMPTY(_bdrv_states)) {
> -QTAILQ_FOREACH(bs, _bdrv_states, bs_list) {
> -aio_context = bdrv_get_aio_context(bs);
> -
> -aio_context_acquire(aio_context);
> -if (bs->job) {
> -block_job_cancel_sync(bs->job);
> -aio_context_release(aio_context);
> -break;
> -}
> -aio_context_release(aio_context);
> -}
> -
> -/* All the remaining BlockDriverStates are referenced directly or
> - * indirectly from block jobs, so there needs to be at least one BDS
> - * directly used by a block job */
> -assert(bs);
> +while ((job = block_job_next(NULL))) {
> +AioContext *aio_context = bdrv_get_aio_context(job->bs);
> +aio_context_acquire(aio_context);
> +block_job_cancel_sync(job);
> +aio_context_release(aio_context);
>  }
>  }
>  
> -- 
> 2.8.0.rc3
> 
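
For readers following along, the new loop shape can be sketched outside QEMU. This is a toy model under stated assumptions: Job, job_list, job_next(), job_create() and job_cancel_sync() are hypothetical stand-ins, and the sketch assumes (as the series describes for the real list) that a job leaves the list once it is cancelled. AioContext locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy job record; all names here are hypothetical, not QEMU's. */
typedef struct Job {
    int id;
    struct Job *next;
} Job;

static Job *job_list;   /* head of the global job list */

/* Return the first job when passed NULL, else the job after @job. */
static Job *job_next(Job *job)
{
    return job ? job->next : job_list;
}

/* Creating a job links it into the list. */
static void job_create(int id)
{
    Job *job = malloc(sizeof(*job));

    assert(job);
    job->id = id;
    job->next = job_list;
    job_list = job;
}

/* Cancelling a job unlinks it from the list and frees it. */
static void job_cancel_sync(Job *job)
{
    Job **p = &job_list;

    while (*p && *p != job) {
        p = &(*p)->next;
    }
    assert(*p == job);
    *p = job->next;
    free(job);
}

/* Keep asking for the list head until no job is left. */
static int job_cancel_all(void)
{
    Job *job;
    int cancelled = 0;

    while ((job = job_next(NULL))) {
        job_cancel_sync(job);
        cancelled++;
    }
    return cancelled;
}
```

Because each cancel removes the head, asking for job_next(NULL) again always yields a fresh element, so the loop needs no saved iterator and no restart logic like the old QTAILQ walk had.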



Re: [Qemu-devel] [PATCH v9 03/11] block: use the block job list in qmp_query_block_jobs()

2016-04-29 Thread Kevin Wolf
Am 04.04.2016 um 15:43 hat Alberto Garcia geschrieben:
> qmp_query_block_jobs() uses bdrv_next() to look for block jobs, but
> this function can only find those in top-level BlockDriverStates.
> 
> This patch uses block_job_next() instead.
> 
> Signed-off-by: Alberto Garcia 

Reviewed-by: Kevin Wolf 

However, I'd like to give you a heads-up that this will technically
conflict with my series that removes BlockDriverState.blk because that
changes the bdrv_next() signature.

Nothing dramatic, but I guess it would make sense to decide where in the
queue of patches this series should go. My suggestion would be on top of
"blockdev: (Nearly) free clean-up work".

Kevin



Re: [Qemu-devel] [PATCH] configure: Check if struct fsxattr is available from linux header

2016-04-29 Thread Stefan Weil
Am 29.04.2016 um 16:00 schrieb Peter Maydell:
> On 29 April 2016 at 14:56, Stefan Weil  wrote:
>> Am 29.04.2016 um 15:54 schrieb Peter Maydell:
>>
>>> This means we'll build with a HAVE_FSXATTR define set, but
>>> nothing in the tree tries to use that as far as I can tell:
>>> "git grep HAVE_FSXATTR" returns no matches. What am I missing?
>> It's used by the system headers:
>>
>> /usr/include/xfs/xfs_fs.h:#ifndef HAVE_FSXATTR
> ...so this is a bug in the system headers that we're working
> around? It would probably be useful to say so in a comment
> in configure, otherwise it's liable to get ripped out in
> future when somebody notices it's not used any more.
> (For instance HAVE_IFADDRS_H isn't used and looks like dead
> code in configure.)
>
> thanks
> -- PMM

Is it a bug of the system headers? Or simply a design which
requires users to be careful when including certain header files?

Both /usr/include/xfs/xfs_fs.h and /usr/include/linux/fs.h define
the same struct fsxattr, and both definitions are identical.

Updating to xfslibs-dev 4.3.0 did not help for Debian. This means
that even with a consistent installation of Debian Testing
QEMU fails to build as soon as CONFIG_XFS is defined.

Of course a good comment would be helpful here, e. g.

# Avoid redefinition of struct fsxattr in xfs/xfs_fs.h.
# It is already defined in linux/fs.h.

Stefan




Re: [Qemu-devel] [PATCH v9 02/11] block: use the block job list in bdrv_drain_all()

2016-04-29 Thread Kevin Wolf
Am 04.04.2016 um 15:43 hat Alberto Garcia geschrieben:
> bdrv_drain_all() pauses all block jobs by using bdrv_next() to iterate
> over all top-level BlockDriverStates. Therefore the code is unable to
> find block jobs in other nodes.
> 
> This patch uses block_job_next() to iterate over all block jobs.
> 
> Signed-off-by: Alberto Garcia 

This conflicts with Paolo's patches in block-next. Please rebase. (Apart
from that, the change looks sane.)

Kevin



Re: [Qemu-devel] [PATCH v9 01/11] block: keep a list of block jobs

2016-04-29 Thread Kevin Wolf
Am 04.04.2016 um 15:43 hat Alberto Garcia geschrieben:
> The current way to obtain the list of existing block jobs is to
> iterate over all root nodes and check which ones own a job.
> 
> Since we want to be able to support block jobs in other nodes as well,
> this patch keeps a list of jobs that is updated every time one is
> created or destroyed.
> 
> Signed-off-by: Alberto Garcia 

Reviewed-by: Kevin Wolf 

Actually, I have almost literally the same change (except that I didn't
need a block_job_next()) in my development branch. I guess I should
rebase on top of this one. :-)

Kevin



Re: [Qemu-devel] [PATCH v5 5/9] target-mips: Activate IEEE 754-2008 signaling NaN bit meaning

2016-04-29 Thread Maciej W. Rozycki
On Mon, 25 Apr 2016, Aleksandar Markovic wrote:

> No, nothing is lost. The plan is to add this functionality at a later time.

 OK then, as you prefer.  Although I find the order somewhat odd as r5+ is 
a special case of r3.

  Maciej



Re: [Qemu-devel] [PATCH v16 00/24] qapi visitor cleanups (post-introspection cleanups subset E)

2016-04-29 Thread Eric Blake
On 04/29/2016 07:09 AM, Markus Armbruster wrote:

>>> Looks ready.  Thanks for the quick respin!
>>
>> As usual, I'll be glad double-check your qapi-next branch once you've
>> made the tweaks you suggested while applying the series.
> 
> Applied to qapi-next, thanks!

Looks good; I'll start basing my other patches on this branch until 2.7
opens.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH v5 4/9] target-mips: Amend processor definitions in relation to FCR31

2016-04-29 Thread Leon Alrae
On 18/04/16 17:03, Aleksandar Markovic wrote:
> From: Aleksandar Markovic 
> 
> Amend definitions of some Mips processors related to FCR31
> (float status control register). Most significantly, FCR31 of
> processors mips32r6-generic, mips64r6-generic, and P5600 will
> be set so that its FCR31_ABS2008 and FCR31_NAN2008 bits are set
> to 1.

Not long before this series was posted I applied a change which sets
these bits for these processors (even though there's no actual support):
https://lists.nongnu.org/archive/html/qemu-devel/2016-02/msg05593.html

Judging by the description, I'm guessing this part was dropped when you
rebased the series. Now this patch does nothing apart from setting fcr31
to 0, which isn't actually necessary, so I think this patch can be
dropped.

Thanks,
Leon

> 
> Signed-off-by: Aleksandar Markovic 
> ---
>  target-mips/translate_init.c | 15 +--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/target-mips/translate_init.c b/target-mips/translate_init.c
> index e81a831..1094baa 100644
> --- a/target-mips/translate_init.c
> +++ b/target-mips/translate_init.c
> @@ -273,6 +273,7 @@ static const mips_def_t mips_defs[] =
>  .CP0_Status_rw_bitmask = 0x3678FF1F,
>  .CP1_fcr0 = (1 << FCR0_F64) | (1 << FCR0_L) | (1 << FCR0_W) |
>  (1 << FCR0_D) | (1 << FCR0_S) | (0x93 << FCR0_PRID),
> +.CP1_fcr31 = 0,
>  .SEGBITS = 32,
>  .PABITS = 32,
>  .insn_flags = CPU_MIPS32R2 | ASE_MIPS16,
> @@ -303,6 +304,7 @@ static const mips_def_t mips_defs[] =
>  (0xff << CP0TCSt_TASID),
>  .CP1_fcr0 = (1 << FCR0_F64) | (1 << FCR0_L) | (1 << FCR0_W) |
>  (1 << FCR0_D) | (1 << FCR0_S) | (0x95 << FCR0_PRID),
> +.CP1_fcr31 = 0,
>  .CP0_SRSCtl = (0xf << CP0SRSCtl_HSS),
>  .CP0_SRSConf0_rw_bitmask = 0x3fff,
>  .CP0_SRSConf0 = (1U << CP0SRSC0_M) | (0x3fe << CP0SRSC0_SRS3) |
> @@ -343,6 +345,7 @@ static const mips_def_t mips_defs[] =
>  .CP0_Status_rw_bitmask = 0x3778FF1F,
>  .CP1_fcr0 = (1 << FCR0_F64) | (1 << FCR0_L) | (1 << FCR0_W) |
>  (1 << FCR0_D) | (1 << FCR0_S) | (0x93 << FCR0_PRID),
> +.CP1_fcr31 = 0,
>  .SEGBITS = 32,
>  .PABITS = 32,
>  .insn_flags = CPU_MIPS32R2 | ASE_MIPS16 | ASE_DSP | ASE_DSPR2,
> @@ -434,7 +437,7 @@ static const mips_def_t mips_defs[] =
>  },
>  {
>  /* A generic CPU supporting MIPS32 Release 6 ISA.
> -   FIXME: Support IEEE 754-2008 FP.
> +   FIXME: Complete support for IEEE 754-2008 FP.
>Eventually this should be replaced by a real CPU model. */
>  .name = "mips32r6-generic",
>  .CP0_PRid = 0x0001,
> @@ -485,6 +488,7 @@ static const mips_def_t mips_defs[] =
>  .CP0_Status_rw_bitmask = 0x3678,
>  /* The R4000 has a full 64bit FPU but doesn't use the fcr0 bits. */
>  .CP1_fcr0 = (0x5 << FCR0_PRID) | (0x0 << FCR0_REV),
> +.CP1_fcr31 = 0,
>  .SEGBITS = 40,
>  .PABITS = 36,
>  .insn_flags = CPU_MIPS3,
> @@ -503,6 +507,7 @@ static const mips_def_t mips_defs[] =
>  .CP0_Status_rw_bitmask = 0x3678,
>  /* The VR5432 has a full 64bit FPU but doesn't use the fcr0 bits. */
>  .CP1_fcr0 = (0x54 << FCR0_PRID) | (0x0 << FCR0_REV),
> +.CP1_fcr31 = 0,
>  .SEGBITS = 40,
>  .PABITS = 32,
>  .insn_flags = CPU_VR54XX,
> @@ -548,6 +553,7 @@ static const mips_def_t mips_defs[] =
>  /* The 5Kf has F64 / L / W but doesn't use the fcr0 bits. */
>  .CP1_fcr0 = (1 << FCR0_D) | (1 << FCR0_S) |
>  (0x81 << FCR0_PRID) | (0x0 << FCR0_REV),
> +.CP1_fcr31 = 0,
>  .SEGBITS = 42,
>  .PABITS = 36,
>  .insn_flags = CPU_MIPS64,
> @@ -575,6 +581,7 @@ static const mips_def_t mips_defs[] =
>  .CP1_fcr0 = (1 << FCR0_3D) | (1 << FCR0_PS) |
>  (1 << FCR0_D) | (1 << FCR0_S) |
>  (0x82 << FCR0_PRID) | (0x0 << FCR0_REV),
> +.CP1_fcr31 = 0,
>  .SEGBITS = 40,
>  .PABITS = 36,
>  .insn_flags = CPU_MIPS64 | ASE_MIPS3D,
> @@ -601,6 +608,7 @@ static const mips_def_t mips_defs[] =
>  .CP1_fcr0 = (1 << FCR0_F64) | (1 << FCR0_3D) | (1 << FCR0_PS) |
>  (1 << FCR0_L) | (1 << FCR0_W) | (1 << FCR0_D) |
>  (1 << FCR0_S) | (0x00 << FCR0_PRID) | (0x0 << FCR0_REV),
> +.CP1_fcr31 = 0,
>  .SEGBITS = 42,
>  .PABITS = 36,
>  .insn_flags = CPU_MIPS64R2 | ASE_MIPS3D,
> @@ -653,7 +661,7 @@ static const mips_def_t mips_defs[] =
>  },
>  {
>  /* A generic CPU supporting MIPS64 Release 6 ISA.
> -   FIXME: Support IEEE 754-2008 FP.
> +   FIXME: Complete support for IEEE 754-2008 FP.
>

Re: [Qemu-devel] [PATCH] configure: Check if struct fsxattr is available from linux header

2016-04-29 Thread Peter Maydell
On 29 April 2016 at 14:56, Stefan Weil  wrote:
> Am 29.04.2016 um 15:54 schrieb Peter Maydell:
>
>> This means we'll build with a HAVE_FSXATTR define set, but
>> nothing in the tree tries to use that as far as I can tell:
>> "git grep HAVE_FSXATTR" returns no matches. What am I missing?

> It's used by the system headers:
>
> /usr/include/xfs/xfs_fs.h:#ifndef HAVE_FSXATTR

...so this is a bug in the system headers that we're working
around? It would probably be useful to say so in a comment
in configure, otherwise it's liable to get ripped out in
future when somebody notices it's not used any more.
(For instance HAVE_IFADDRS_H isn't used and looks like dead
code in configure.)

thanks
-- PMM



Re: [Qemu-devel] [PATCH v5 2/9] softfloat: For Mips only, correct default NaN values

2016-04-29 Thread Leon Alrae
On 18/04/16 17:03, Aleksandar Markovic wrote:
> From: Aleksandar Markovic 
> 
> Only for Mips platform, and only for cases when snan_bit_is_one is 0,
> correct default NaN values (in their 16-, 32-, and 64-bit flavors).
> 
> For more info, see [1], page 84, Table 6.3 "Value Supplied When a New
> Quiet NaN Is Created", and [2], page 52, Table 3.7 "Default NaN
> Encodings".
> 
> [1] "MIPS® Architecture For Programmers Volume II-A:
> The MIPS64® Instruction Set Reference Manual",
> Imagination Technologies LTD, Revision 6.04, November 13, 2015
> 
> [2] "MIPS Architecture for Programmers Volume IV-j:
> The MIPS32® SIMD Architecture Module",
> Imagination Technologies LTD, Revision 1.12, February 3, 2016
> 
> Signed-off-by: Aleksandar Markovic 
> ---
>  fpu/softfloat-specialize.h | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h
> index e03a529..093218f 100644
> --- a/fpu/softfloat-specialize.h
> +++ b/fpu/softfloat-specialize.h
> @@ -97,7 +97,11 @@ float16 float16_default_nan(float_status *status)
>  if (status->snan_bit_is_one) {
>  return const_float16(0x7DFF);
>  } else {
> +#if defined(TARGET_MIPS)
> +return const_float16(0x7E00);
> +#else
>  return const_float16(0xFE00);
> +#endif
>  }
>  #endif
>  }
> @@ -116,7 +120,11 @@ float32 float32_default_nan(float_status *status)
>  if (status->snan_bit_is_one) {
>  return const_float32(0x7FBF);
>  } else {
> +#if defined(TARGET_MIPS)
> +return const_float32(0x7FC0);
> +#else
>  return const_float32(0xFFC0);
> +#endif
>  }
>  #endif
>  }
> @@ -135,7 +143,11 @@ float64 float64_default_nan(float_status *status)
>  if (status->snan_bit_is_one) {
>  return const_float64(LIT64(0x7FF7));
>  } else {
> +#if defined(TARGET_MIPS)
> +return const_float64(LIT64(0x7FF8));
> +#else
>  return const_float64(LIT64(0xFFF8));
> +#endif
>  }
>  #endif
>  }

Reviewed-by: Leon Alrae 
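
As a small illustration of what the MIPS-specific branch changes, the sketch below compares the two 32-bit default NaN patterns. The full-width constants are an assumption based on the standard binary32 layout (the values in the quoted diff appear shortened), and the helper names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Full-width binary32 patterns, assumed from the IEEE 754-2008 encoding;
 * the macro names are hypothetical. */
#define F32_DEFAULT_NAN_MIPS   UINT32_C(0x7FC00000)   /* sign bit clear */
#define F32_DEFAULT_NAN_OTHER  UINT32_C(0xFFC00000)   /* sign bit set   */

/* NaN: exponent all ones and a non-zero fraction. */
static bool f32_is_nan(uint32_t a)
{
    return (a & UINT32_C(0x7FFFFFFF)) > UINT32_C(0x7F800000);
}

/* Quiet NaN in the 2008 sense: the top fraction bit (bit 22) is set. */
static bool f32_is_quiet_nan_2008(uint32_t a)
{
    return f32_is_nan(a) && (a & (UINT32_C(1) << 22)) != 0;
}

static bool f32_sign(uint32_t a)
{
    return (a >> 31) != 0;
}
```

Both patterns are quiet NaNs under the 2008 interpretation; they differ only in the sign bit, which is the bit the TARGET_MIPS branch clears.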



Re: [Qemu-devel] [PATCH v6 6/6] cpu-exec: Move TB chaining into tb_find_fast()

2016-04-29 Thread Sergey Fedorov
On 29/04/16 16:54, Alex Bennée wrote:
> Sergey Fedorov  writes:
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index f49a436e1a5a..5f23c0660d6e 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -320,7 +320,9 @@ found:
>>  return tb;
>>  }
>>
>> -static inline TranslationBlock *tb_find_fast(CPUState *cpu)
>> +static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>> + TranslationBlock **last_tb,
>> + int tb_exit)
>>  {
>>  CPUArchState *env = (CPUArchState *)cpu->env_ptr;
>>  TranslationBlock *tb;
>> @@ -331,11 +333,24 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu)
>> always be the same before a given translated block
>> is executed. */
>>  cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>> +tb_lock();
>>  tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
>>  if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>>   tb->flags != flags)) {
>>  tb = tb_find_slow(cpu, pc, cs_base, flags);
>>  }
>> +if (cpu->tb_flushed) {
>> +/* Ensure that no TB jump will be modified as the
>> + * translation buffer has been flushed.
>> + */
>> +*last_tb = NULL;
>> +cpu->tb_flushed = false;
>> +}
>> +/* See if we can patch the calling TB. */
>> +if (*last_tb && qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
> This should be !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)

It's probably my mistake from resolving a rebase conflict. Nice catch, thanks!

Kind regards,
Sergey

>
>> +tb_add_jump(*last_tb, tb_exit, tb);
>> +}
>> +tb_unlock();
>>  return tb;
>>  }
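
The inverted condition is easy to see in isolation. The sketch below is a standalone model, not the QEMU code: TOY_LOG_TB_NOCHAIN, toy_loglevel, toy_loglevel_mask() and can_chain() are hypothetical names, and the flag's bit value is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the "do not chain TBs" log flag. */
#define TOY_LOG_TB_NOCHAIN (1u << 0)

static unsigned toy_loglevel;

/* True if any bit of @mask is enabled in the current log level. */
static bool toy_loglevel_mask(unsigned mask)
{
    return (toy_loglevel & mask) != 0;
}

/* Chain only if there is a previous TB and chaining is not disabled.
 * Dropping the '!' would chain exactly when the user asked not to. */
static bool can_chain(const void *last_tb)
{
    return last_tb != NULL && !toy_loglevel_mask(TOY_LOG_TB_NOCHAIN);
}
```

With the negation missing, the default configuration (flag clear) would never chain, which fits the mistake being functionally invisible apart from lost performance.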



[Qemu-devel] [PATCH RFC 5/8] vfio: ccw: realize VFIO_DEVICE_CCW_HOT_RESET ioctl

2016-04-29 Thread Dong Jia Shi
Introduce VFIO_DEVICE_CCW_HOT_RESET ioctl for vfio-ccw to make it
possible to hot-reset the device.

We try to achieve a hot reset by first offlining the device and then
onlining it again: this should clear all state at the subchannel.

Signed-off-by: Dong Jia Shi 
Reviewed-by: Pierre Morel 
---
 drivers/vfio/ccw/vfio_ccw.c | 50 -
 include/uapi/linux/vfio.h   |  8 
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/ccw/vfio_ccw.c b/drivers/vfio/ccw/vfio_ccw.c
index 7331aed..9700448 100644
--- a/drivers/vfio/ccw/vfio_ccw.c
+++ b/drivers/vfio/ccw/vfio_ccw.c
@@ -22,10 +22,12 @@
  * struct vfio_ccw_device
  * @cdev: ccw device
  * @going_away: if an offline procedure was already ongoing
+ * @hot_reset: if hot-reset is ongoing
  */
 struct vfio_ccw_device {
struct ccw_device   *cdev;
    bool                going_away;
+   bool                hot_reset;
 };
 
 enum vfio_ccw_device_type {
@@ -58,6 +60,7 @@ static void vfio_ccw_release(void *device_data)
 static long vfio_ccw_ioctl(void *device_data, unsigned int cmd,
   unsigned long arg)
 {
+   struct vfio_ccw_device *vcdev = device_data;
unsigned long minsz;
 
if (cmd == VFIO_DEVICE_GET_INFO) {
@@ -76,6 +79,34 @@ static long vfio_ccw_ioctl(void *device_data, unsigned int cmd,
info.num_irqs = 0;
 
return copy_to_user((void __user *)arg, , minsz);
+
+   } else if (cmd == VFIO_DEVICE_CCW_HOT_RESET) {
+   unsigned long flags;
+   int ret;
+
+   spin_lock_irqsave(get_ccwdev_lock(vcdev->cdev), flags);
+   if (!vcdev->cdev->online) {
+   spin_unlock_irqrestore(get_ccwdev_lock(vcdev->cdev),
+  flags);
+   return -EINVAL;
+   }
+
+   if (vcdev->hot_reset) {
+   spin_unlock_irqrestore(get_ccwdev_lock(vcdev->cdev),
+  flags);
+   return -EBUSY;
+   }
+   vcdev->hot_reset = true;
+   spin_unlock_irqrestore(get_ccwdev_lock(vcdev->cdev), flags);
+
+   ret = ccw_device_set_offline(vcdev->cdev);
+   if (!ret)
+   ret = ccw_device_set_online(vcdev->cdev);
+
+   spin_lock_irqsave(get_ccwdev_lock(vcdev->cdev), flags);
+   vcdev->hot_reset = false;
+   spin_unlock_irqrestore(get_ccwdev_lock(vcdev->cdev), flags);
+   return ret;
}
 
return -ENOTTY;
@@ -108,7 +139,7 @@ static int vfio_ccw_set_offline(struct ccw_device *cdev)
 
vdev = vfio_device_data(device);
vfio_device_put(device);
-   if (!vdev || vdev->going_away)
+   if (!vdev || vdev->hot_reset || vdev->going_away)
return 0;
 
vdev->going_away = true;
@@ -128,9 +159,26 @@ void vfio_ccw_remove(struct ccw_device *cdev)
 
 static int vfio_ccw_set_online(struct ccw_device *cdev)
 {
+   struct vfio_device *device = vfio_device_get_from_dev(&cdev->dev);
struct vfio_ccw_device *vdev;
int ret;
 
+   if (!device)
+   goto create_device;
+
+   vdev = vfio_device_data(device);
+   vfio_device_put(device);
+   if (!vdev)
+   goto create_device;
+
+   /*
+* During hot reset, we just want to disable/enable the
+* subchannel and need not setup anything again.
+*/
+   if (vdev->hot_reset)
+   return 0;
+
+create_device:
vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
if (!vdev)
return -ENOMEM;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index aaedfcd..889a316 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -687,6 +687,14 @@ struct vfio_iommu_spapr_tce_remove {
 };
 #define VFIO_IOMMU_SPAPR_TCE_REMOVE  _IO(VFIO_TYPE, VFIO_BASE + 20)
 
+/**
+ * VFIO_DEVICE_CCW_HOT_RESET - _IO(VFIO_TYPE, VFIO_BASE + 21)
+ *
+ * Hot reset the channel I/O device. All state of the subchannel will be
+ * cleared.
+ */
+#define VFIO_DEVICE_CCW_HOT_RESET  _IO(VFIO_TYPE, VFIO_BASE + 21)
+
 /* * */
 
 #endif /* _UAPIVFIO_H */
-- 
2.6.6
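
The return-code logic of the new ioctl can be modelled in user space roughly as follows. This is a sketch under loud assumptions: struct toy_device and the toy_* functions are hypothetical, and a C11 atomic flag is swapped in for the ccw device spinlock the patch uses, so the online check here is not covered by the same lock as in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the device state. */
struct toy_device {
    bool online;
    atomic_bool hot_reset;   /* true while a reset is in flight */
};

static int toy_set_offline(struct toy_device *d)
{
    d->online = false;
    return 0;
}

static int toy_set_online(struct toy_device *d)
{
    d->online = true;
    return 0;
}

/* Offline then online the device, refusing concurrent resets. */
static int toy_hot_reset(struct toy_device *d)
{
    int ret;

    if (!d->online) {
        return -EINVAL;              /* nothing to reset */
    }
    if (atomic_exchange(&d->hot_reset, true)) {
        return -EBUSY;               /* another reset is running */
    }

    ret = toy_set_offline(d);
    if (!ret) {
        ret = toy_set_online(d);
    }

    atomic_store(&d->hot_reset, false);
    return ret;
}
```

The flag serves the same purpose as hot_reset in the patch: it marks the window in which the offline and online callbacks should treat the transition as a reset and not as a real removal.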




Re: [Qemu-devel] [PATCH] configure: Check if struct fsxattr is available from linux header

2016-04-29 Thread Stefan Weil
Am 29.04.2016 um 15:54 schrieb Peter Maydell:

> This means we'll build with a HAVE_FSXATTR define set, but
> nothing in the tree tries to use that as far as I can tell:
> "git grep HAVE_FSXATTR" returns no matches. What am I missing?
> 
> thanks
> -- PMM
> 

It's used by the system headers:

/usr/include/xfs/xfs_fs.h:#ifndef HAVE_FSXATTR

Stefan



Re: [Qemu-devel] [PATCH v3 04/18] qapi: Factor out JSON number formatting

2016-04-29 Thread Eric Blake
On 04/29/2016 07:22 AM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> Pull out a new qstring_append_json_number() helper, so that all
>> JSON output producers can use a consistent style for printing
>> floating point without duplicating code (since we are doing more
>> data massaging than a simple printf format can handle).
>>
>> Address one FIXME by adding an Error parameter and warning the
>> caller if the requested number cannot be represented in JSON;
>> but add another FIXME in its place because we have no way to
>> report the problem higher up the stack.
>>
>> Signed-off-by: Eric Blake 
>> Reviewed-by: Fam Zheng 
>>

>>  /**
>> + * qstring_append_json_number(): Append a JSON number to a QString.
>> + * Set @errp if the number is not representable in JSON, but append the
>> + * output anyway (callers can then choose to ignore the warning).
>> + */
>> +void qstring_append_json_number(QString *qstring, double number, Error **errp)
>> +{
>> +char buffer[1024];
>> +int len;
>> +
>> +/* JSON does not allow Inf or NaN; append it but set errp */
>> +if (!isfinite(number)) {
>> +error_setg(errp, "Non-finite number %f is not valid JSON", number);
>> +}
> 
> Separate patch, please.

Okay.

> 
> "Append it but set errp" feels odd.  Normally, returning with an error
> set means the function failed to do its job.

This one's weird because by the end of the series, it will be used by
the new JSON visitor (which wants the error message because that is not
valid JSON, and doesn't care if the QString is slightly longer); as well
as the existing QMP output visitor (where existing behavior ignores that
it is not valid JSON, and we don't really have a convenient way to pass
errors back up the stack).  Is it worth trying to plumb in better error
reporting to the QMP output visitor, and/or add assertions that values
are finite, and/or document that QMP has an extension beyond JSON in
that it accepts and also might produce Inf/NaN?

> 
>> +
>> +/* FIXME: snprintf() is locale dependent; but JSON requires
>> + * numbers to be formatted as if in the C locale. Dependence
>> + * on C locale is a pervasive issue in QEMU. */
>> +/* FIXME: This risks printing Inf or NaN, which are not valid
>> + * JSON values. */
> 
> Your !isfinite() conditional addresses this, doesn't it?

Yep. Looks like I messed up the rebase (I realized I had to re-move
updated code, but didn't scrub the comments after the move).


> 
> I think this belongs into qobject-json.c, like the previous patch's
> qstring_append_json_string().

Sounds reasonable.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
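
To make the "append it but report it" behaviour concrete, here is a minimal standalone sketch. It is not the helper from the patch: format_json_number() is a hypothetical function, a plain buffer replaces QString, a bool replaces the Error parameter, and the "%.17g" format is an assumption chosen for this example.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format @number into @buf and report whether the result is valid JSON.
 * The text is written either way, so a caller may ignore the verdict.
 * Note that snprintf() follows the current locale, as the FIXME in the
 * patch points out; this sketch assumes the default "C" locale. */
static bool format_json_number(char *buf, size_t size, double number)
{
    int len = snprintf(buf, size, "%.17g", number);

    assert(len >= 0 && (size_t)len < size);
    return isfinite(number);   /* JSON has no Inf or NaN */
}
```

A strict producer would treat a false return as an error, while a lenient one (like the existing QMP output path described above) could simply use the buffer.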




[Qemu-devel] [PATCH RFC 6/8] vfio: ccw: introduce page array interfaces

2016-04-29 Thread Dong Jia Shi
CCW translation frequently needs to pin and unpin sets of memory pages.
Currently there is no efficient way to do this, so introduce a
page_array data structure and helper functions to handle the pin/unpin
operations.

Signed-off-by: Dong Jia Shi 
---
 drivers/vfio/ccw/Makefile   |   2 +-
 drivers/vfio/ccw/ccwchain.c | 128 
 2 files changed, 129 insertions(+), 1 deletion(-)
 create mode 100644 drivers/vfio/ccw/ccwchain.c

diff --git a/drivers/vfio/ccw/Makefile b/drivers/vfio/ccw/Makefile
index ea14ca9..ac62330 100644
--- a/drivers/vfio/ccw/Makefile
+++ b/drivers/vfio/ccw/Makefile
@@ -1,2 +1,2 @@
-vfio-ccw-y := vfio_ccw.o
+vfio-ccw-y := vfio_ccw.o ccwchain.o
 obj-$(CONFIG_VFIO_CCW) += vfio-ccw.o
diff --git a/drivers/vfio/ccw/ccwchain.c b/drivers/vfio/ccw/ccwchain.c
new file mode 100644
index 000..03b4e82
--- /dev/null
+++ b/drivers/vfio/ccw/ccwchain.c
@@ -0,0 +1,128 @@
+/*
+ * ccwchain interfaces
+ *
+ * Copyright IBM Corp. 2016
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2 only)
+ * as published by the Free Software Foundation.
+ *
+ * Author(s): Dong Jia Shi 
+ *Xiao Feng Ren 
+ */
+
+#include 
+#include 
+
+struct page_array {
+   u64 hva;
+   int nr;
+   struct page **items;
+};
+
+struct page_arrays {
+   struct page_array   *parray;
+   int nr;
+};
+
+/*
+ * Helpers to operate page_array.
+ */
+/*
+ * page_array_pin() - pin user pages in memory
+ * @p: page_array on which to perform the operation
+ *
+ * Attempt to pin user pages in memory.
+ *
+ * Usage of page_array:
+ * @p->hva  starting user address. Assigned by caller.
+ * @p->nr   number of pages from @p->hva to pin. Assigned by caller.
+ *  number of pages pinned. Assigned by callee.
+ * @p->items  array that receives pointers to the pages pinned. Allocated by
+ *  caller.
+ *
+ * Returns:
+ *   Number of pages pinned on success. If @p->nr is 0 or negative, returns 0.
+ *   If no pages were pinned, returns -errno.
+ */
+static int page_array_pin(struct page_array *p)
+{
+   int i, nr;
+
+   nr = get_user_pages_fast(p->hva, p->nr, 1, p->items);
+   if (nr <= 0) {
+   p->nr = 0;
+   return nr;
+   } else if (nr != p->nr) {
+   for (i = 0; i < nr; i++)
+   put_page(p->items[i]);
+   p->nr = 0;
+   return -ENOMEM;
+   }
+
+   return nr;
+}
+
+/* Unpin the items before releasing the memory. */
+static void page_array_items_unpin_free(struct page_array *p)
+{
+   int i;
+
+   for (i = 0; i < p->nr; i++)
+   put_page(p->items[i]);
+
+   p->nr = 0;
+   kfree(p->items);
+}
+
+/* Alloc memory for items, then pin pages with them. */
+static int page_array_items_alloc_pin(u64 hva,
+ unsigned int len,
+ struct page_array *p)
+{
+   int ret;
+
+   if (!len || p->nr)
+   return -EINVAL;
+
+   p->hva = hva;
+
+   p->nr = ((hva & ~PAGE_MASK) + len + (PAGE_SIZE - 1)) >> PAGE_SHIFT;
+   if (!p->nr)
+   return -EINVAL;
+
+   p->items = kcalloc(p->nr, sizeof(*p->items), GFP_KERNEL);
+   if (!p->items)
+   return -ENOMEM;
+
+   ret = page_array_pin(p);
+   if (ret <= 0)
+   kfree(p->items);
+
+   return ret;
+}
+
+static int page_arrays_init(struct page_arrays *ps, int nr)
+{
+   ps->parray = kcalloc(nr, sizeof(*ps->parray), GFP_KERNEL);
+   if (!ps->parray) {
+   ps->nr = 0;
+   return -ENOMEM;
+   }
+
+   ps->nr = nr;
+   return 0;
+}
+
+static void page_arrays_unpin_free(struct page_arrays *ps)
+{
+   int i;
+
+   for (i = 0; i < ps->nr; i++)
+   page_array_items_unpin_free(ps->parray + i);
+
+   kfree(ps->parray);
+
+   ps->parray = NULL;
+   ps->nr = 0;
+}
-- 
2.6.6
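
The page-count expression in page_array_items_alloc_pin() is worth a worked example. The sketch below reproduces the arithmetic in user space; the TOY_* macros and pages_spanned() are hypothetical names, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB pages; the kernel provides the real constants. */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (UINT64_C(1) << TOY_PAGE_SHIFT)
#define TOY_PAGE_MASK  (~(TOY_PAGE_SIZE - 1))

/* Number of pages touched by @len bytes starting at address @hva:
 * offset within the first page, plus the length, rounded up to a
 * whole number of pages. */
static uint64_t pages_spanned(uint64_t hva, unsigned int len)
{
    return ((hva & ~TOY_PAGE_MASK) + len + (TOY_PAGE_SIZE - 1))
           >> TOY_PAGE_SHIFT;
}
```

For instance, two bytes starting at the last byte of a page (hva 0x1FFF) straddle a boundary and need two pages, while a page-aligned 4096-byte buffer needs only one.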




Re: [Qemu-devel] [PATCH v6 6/6] cpu-exec: Move TB chaining into tb_find_fast()

2016-04-29 Thread Alex Bennée

Sergey Fedorov  writes:

> From: Sergey Fedorov 
>
> Move tb_add_jump() call and surrounding code from cpu_exec() into
> tb_find_fast(). That simplifies cpu_exec() a little by hiding the direct
> chaining optimization details into tb_find_fast(). It also allows to
> move tb_lock()/tb_unlock() pair into tb_find_fast(), putting it closer
> to tb_find_slow() which also manipulates the lock.
>
> Suggested-by: Alex Bennée 
> Signed-off-by: Sergey Fedorov 
> Signed-off-by: Sergey Fedorov 
> ---
>
> Changes in v6:
>  * Fixed rebase conflicts
>
>  cpu-exec.c | 35 +++
>  1 file changed, 19 insertions(+), 16 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index f49a436e1a5a..5f23c0660d6e 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -320,7 +320,9 @@ found:
>  return tb;
>  }
>
> -static inline TranslationBlock *tb_find_fast(CPUState *cpu)
> +static inline TranslationBlock *tb_find_fast(CPUState *cpu,
> + TranslationBlock **last_tb,
> + int tb_exit)
>  {
>  CPUArchState *env = (CPUArchState *)cpu->env_ptr;
>  TranslationBlock *tb;
> @@ -331,11 +333,24 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu)
> always be the same before a given translated block
> is executed. */
>  cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> +tb_lock();
>  tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
>  if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>   tb->flags != flags)) {
>  tb = tb_find_slow(cpu, pc, cs_base, flags);
>  }
> +if (cpu->tb_flushed) {
> +/* Ensure that no TB jump will be modified as the
> + * translation buffer has been flushed.
> + */
> +*last_tb = NULL;
> +cpu->tb_flushed = false;
> +}
> +/* See if we can patch the calling TB. */
> +if (*last_tb && qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {

This should be !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)

> +tb_add_jump(*last_tb, tb_exit, tb);
> +}
> +tb_unlock();
>  return tb;
>  }
>
> @@ -441,7 +456,8 @@ int cpu_exec(CPUState *cpu)
>  } else if (replay_has_exception()
> && cpu->icount_decr.u16.low + cpu->icount_extra == 0) 
> {
>  /* try to cause an exception pending in the log */
> -cpu_exec_nocache(cpu, 1, tb_find_fast(cpu), true);
> +last_tb = NULL; /* Avoid chaining TBs */
> +cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, &last_tb, 0), true);
>  ret = -1;
>  break;
>  #endif
> @@ -511,20 +527,7 @@ int cpu_exec(CPUState *cpu)
>  cpu->exception_index = EXCP_INTERRUPT;
>  cpu_loop_exit(cpu);
>  }
> -tb_lock();
> -tb = tb_find_fast(cpu);
> -if (cpu->tb_flushed) {
> -/* Ensure that no TB jump will be modified as the
> - * translation buffer has been flushed.
> - */
> -last_tb = NULL;
> -cpu->tb_flushed = false;
> -}
> -/* See if we can patch the calling TB. */
> -if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
> -tb_add_jump(last_tb, tb_exit, tb);
> -}
> -tb_unlock();
> +tb = tb_find_fast(cpu, &last_tb, tb_exit);
>  if (likely(!cpu->exit_request)) {
>  uintptr_t ret;
>  trace_exec_tb(tb, tb->pc);


--
Alex Bennée



Re: [Qemu-devel] [PATCH] configure: Check if struct fsxattr is available from linux header

2016-04-29 Thread Peter Maydell
On 29 April 2016 at 14:07, Jan Vesely  wrote:
> Fixes build failure with --enable-xfsctl and
> new linux headers (>=4.5) and older xfsprogs(<4.5):
> In file included from /usr/include/xfs/xfs.h:38:0,
>  from 
> /var/tmp/portage/app-emulation/qemu-2.5.0-r1/work/qemu-2.5.0/block/raw-posix.c:97:
> /usr/include/xfs/xfs_fs.h:42:8: error: redefinition of ‘struct fsxattr’
>  struct fsxattr {
> ^
> In file included from 
> /var/tmp/portage/app-emulation/qemu-2.5.0-r1/work/qemu-2.5.0/block/raw-posix.c:60:0:
> /usr/include/linux/fs.h:155:8: note: originally defined here
>  struct fsxattr {
>
> CC: qemu-triv...@nongnu.org
> CC: Markus Armbruster 
> CC: Peter Maydell 
> CC: Stefan Weil 
> Signed-off-by: Jan Vesely 
> ---
> One can argue that the failure only happens for invalid linux-headers,
> xfsprogs combinations, feel free to reject the patch in that case.
>
> This patch relies on functionality introduced in
> 559607ea173 io: add QIOChannelSocket class
>
>  configure | 18 ++
>  1 file changed, 18 insertions(+)

Hi; thanks for this patch. I'm a bit confused by it:

> +if test "$have_fsxattr" = "yes" ; then
> +echo "HAVE_FSXATTR=y" >> $config_host_mak
> +fi

This means we'll build with a HAVE_FSXATTR define set, but
nothing in the tree tries to use that as far as I can tell:
"git grep HAVE_FSXATTR" returns no matches. What am I missing?

thanks
-- PMM



Re: [Qemu-devel] [PATCH] configure: Check if struct fsxattr is available from linux header

2016-04-29 Thread Stefan Weil
Am 29.04.2016 um 15:07 schrieb Jan Vesely:
> Fixes build failure with --enable-xfsctl and
> new linux headers (>=4.5) and older xfsprogs(<4.5):
> In file included from /usr/include/xfs/xfs.h:38:0,
>  from 
> /var/tmp/portage/app-emulation/qemu-2.5.0-r1/work/qemu-2.5.0/block/raw-posix.c:97:
> /usr/include/xfs/xfs_fs.h:42:8: error: redefinition of ‘struct fsxattr’
>  struct fsxattr {
> ^
> In file included from 
> /var/tmp/portage/app-emulation/qemu-2.5.0-r1/work/qemu-2.5.0/block/raw-posix.c:60:0:
> /usr/include/linux/fs.h:155:8: note: originally defined here
>  struct fsxattr {
> 
> CC: qemu-triv...@nongnu.org
> CC: Markus Armbruster 
> CC: Peter Maydell 
> CC: Stefan Weil 
> Signed-off-by: Jan Vesely 
> ---

I had this problem with Debian's xfslibs-dev 3.2.1,
linux-libc-dev 4.5.1-1 and either clang or gcc.

This patch fixes it.

Tested-by: Stefan Weil 




Re: [Qemu-devel] [PATCH v5 1/9] softfloat: Implement run-time-configurable meaning of signaling NaN bit

2016-04-29 Thread Leon Alrae
On 18/04/16 17:03, Aleksandar Markovic wrote:
> -#if SNAN_BIT_IS_ONE
> -return ((uint32_t)(a << 1) >= 0xff80);
> -#else
> -return ( ( ( a>>22 ) & 0x1FF ) == 0x1FE ) && ( a & 0x003F );
> -#endif
> +if (status->snan_bit_is_one) {
> +return ((uint32_t)(a << 1) >= 0xFF80);
> +} else {
> +return (((a >> 22) & 0x1FF) == 0x1FE) && (a & 0x003F);

Thanks for fixing the style of lines you modified, ...

> -z.sign = float32_val(a)>>31;
> +z.sign = float32_val(a) >> 31;
>  z.low = 0;
> -z.high = ( (uint64_t) float32_val(a) )<<41;
> +z.high = ((uint64_t)float32_val(a)) << 41;

... here however I think we usually don't correct the style if the line
wouldn't be touched otherwise. But obviously this is up to FPU Maintainers.

> @@ -2940,7 +2952,8 @@ void helper_msa_fclass_df(CPUMIPSState *env, uint32_t 
> df,
>  c = update_msacsr(env, CLEAR_FS_UNDERFLOW, 0);  \
>  \
>  if (get_enabled_exceptions(env, c)) {   \
> -DEST = ((FLOAT_SNAN ## BITS >> 6) << 6) | c;\
> +DEST = ((FLOAT_SNAN ## BITS(&env->active_tc.msa_fp_status)  \

You can use the existing local pointer "status". Similarly in other MSA
macros.

> +   >> 6) << 6) | c; \


> @@ -4670,7 +4670,7 @@ static void disas_sparc_insn(DisasContext * dc, 
> unsigned int insn)
>  TCGv r_const;
>  
>  gen_address_mask(dc, cpu_addr);
> -tcg_gen_qemu_ld8u(cpu_val, cpu_addr, dc->mem_idx);
> +tcg_gen_qemu_ld8s(cpu_val, cpu_addr, dc->mem_idx);

This change appeared here by mistake, didn't it?

Thanks,
Leon



Re: [Qemu-devel] [PATCH for-2.6] acpi: fix bios linker loadder COMMAND_ALLOCATE on bigendian host

2016-04-29 Thread Igor Mammedov
On Fri, 29 Apr 2016 15:16:07 +0200
Laurent Vivier  wrote:

> On 29/04/2016 14:44, Igor Mammedov wrote:
> > 'make check' fails with:
> > 
> > ERROR:tests/bios-tables-test.c:493:load_expected_aml:
> >assertion failed: (g_file_test(aml_file, G_FILE_TEST_EXISTS))
> > 
> > since commit:
> > caf50c7166a6ed96c462ab5db4b495e1234e4cc6
> > tests: pc: acpi: drop not needed 'expected SSDT' blobs
> > 
> > Assert happens because qemu-system-x86_64 generates
> > SSDT table and test looks for a corresponding expected
> > table to compare with.
> > 
> > However there is no expected SSDT blob anymore, since
> > QEMU shouldn't generate one. As it happens, BIOS is not
> > able to read ACPI tables from QEMU and falls back to
> > embedded legacy ACPI codepath, which generates SSDT.
> > That happens due to wrongly sized endianness conversion
> > which makes
> >  uint8_t BiosLinkerLoaderEntry.alloc.zone
> > end up with 0 due to truncation of 32 bit integer
> > which on host is 1 or 2.
> > 
> > Fix it by dropping invalid cpu_to_le32() as uint8_t
> > doesn't require any conversion.
> > 
> > RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1330174
> > 
> > Signed-off-by: Igor Mammedov   
> 
> Tested-by: Laurent Vivier 
> 
> Fixes the problem.
> 
> We still get some warnings, but they were already present in previous
> releases.
Those warnings are due to broken endianness handling in iasl:
it looks at the table header size and says it's too big
because it doesn't take into account that all integers in ACPI
are little-endian.
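
To illustrate the truncation described in the commit message, here is a
minimal sketch, not the actual QEMU code. It assumes a big-endian host,
where cpu_to_le32() amounts to a byte swap; swap32(), zone_buggy() and
zone_fixed() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for cpu_to_le32() as it behaves on a
 * big-endian host: a plain 32-bit byte swap.
 */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/*
 * Buggy pattern: convert as a 32-bit value, then store the result in
 * an 8-bit field. The swap moves the value into the top byte, and the
 * store keeps only the low byte, so small values become 0.
 */
static uint8_t zone_buggy(uint8_t zone)
{
    return (uint8_t)swap32(zone);
}

/* Fixed pattern: a single byte has no byte order, so store it as is. */
static uint8_t zone_fixed(uint8_t zone)
{
    return zone;
}
```

With these helpers, zone values 1 and 2 both end up as 0 through the
buggy path, while the fixed path preserves them.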


> 
> Laurent
> 




Re: [Qemu-devel] [PATCH v3 06/18] qapi: Add qstring_append_format()

2016-04-29 Thread Markus Armbruster
Eric Blake  writes:

> Back in commit 764c1ca (Nov 2009), we added qstring_append_int().
> However, it did not see any use until commit 190c882 (Jan 2015).
> Furthermore, it has a rather limited use case - to print anything
> else, callers still have to format into a temporary buffer, unless
> we want to introduce an explosion of new qstring_append_* methods
> for each useful type to print.
>
> A much better approach is to add a wrapper that merges printf
> behavior onto qstring_append, via the new qstring_append_format()
> (and its vararg counterpart).  In fact, with that in place, we
> no longer need qstring_append_int().
>
> Other immediate uses for the new function include simplifying
> two existing clients of qstring_append() on a just-formatted
> buffer, and the fact that we can take advantage of printf width
> manipulations for more efficient indentation.
>
> Signed-off-by: Eric Blake 
> Reviewed-by: Fam Zheng 
>
> ---
> v3: rebase to master
> v2: also simplify qstring_append_json_string(), add assertion that
> format is well-formed
> ---
>  include/qapi/qmp/qstring.h |  7 +--
>  qjson.c|  2 +-
>  qobject/qobject-json.c | 25 +
>  qobject/qstring.c  | 37 +
>  4 files changed, 36 insertions(+), 35 deletions(-)
>
> diff --git a/include/qapi/qmp/qstring.h b/include/qapi/qmp/qstring.h
> index f00e3df..10afa5d 100644
> --- a/include/qapi/qmp/qstring.h
> +++ b/include/qapi/qmp/qstring.h
> @@ -1,7 +1,7 @@
>  /*
>   * QString Module
>   *
> - * Copyright (C) 2009, 2015 Red Hat Inc.
> + * Copyright (C) 2009-2016 Red Hat Inc.
>   *
>   * Authors:
>   *  Luiz Capitulino 
> @@ -28,9 +28,12 @@ QString *qstring_from_str(const char *str);
>  QString *qstring_from_substr(const char *str, int start, int end);
>  size_t qstring_get_length(const QString *qstring);
>  const char *qstring_get_str(const QString *qstring);
> -void qstring_append_int(QString *qstring, int64_t value);
>  void qstring_append(QString *qstring, const char *str);
>  void qstring_append_chr(QString *qstring, int c);
> +void qstring_append_format(QString *qstring, const char *fmt, ...)
> +GCC_FMT_ATTR(2, 3);
> +void qstring_append_vformat(QString *qstring, const char *fmt, va_list ap)
> +GCC_FMT_ATTR(2, 0);

Let's call these qstring_append_printf() and qstring_append_vprintf(),
to match GLib's g_string_append_printf() and g_string_append_vprintf().
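
For readers following along, the idea of a printf-style append can be
sketched roughly as below. This is only an illustration under stated
assumptions, not the code from the patch: StrBuf and the strbuf_*
functions are hypothetical names, and the buffer is a bare
malloc-managed string rather than QEMU's QString.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal string buffer; not QEMU's QString. */
typedef struct {
    char *data;   /* NUL-terminated contents */
    size_t len;   /* length without the terminating NUL */
} StrBuf;

static void strbuf_init(StrBuf *s)
{
    s->data = calloc(1, 1);   /* start as the empty string */
    assert(s->data);
    s->len = 0;
}

/* va_list variant: measure first, then format in place at the end. */
static void strbuf_append_vprintf(StrBuf *s, const char *fmt, va_list ap)
{
    va_list ap2;
    int need;

    va_copy(ap2, ap);
    need = vsnprintf(NULL, 0, fmt, ap2);   /* size of formatted output */
    va_end(ap2);
    assert(need >= 0);                     /* format must be well-formed */

    s->data = realloc(s->data, s->len + (size_t)need + 1);
    assert(s->data);
    vsnprintf(s->data + s->len, (size_t)need + 1, fmt, ap);
    s->len += (size_t)need;
}

/* Varargs front end, mirroring the two-function split in the patch. */
static void strbuf_append_printf(StrBuf *s, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    strbuf_append_vprintf(s, fmt, ap);
    va_end(ap);
}
```

A call such as strbuf_append_printf(&s, "\n%*s", 4 * indent, "") then
emits a newline followed by 4 * indent spaces in one step, which is the
width trick the commit message refers to.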

>  void qstring_append_json_string(QString *qstring, const char *raw);
>  void qstring_append_json_number(QString *qstring, double number, Error 
> **errp);
>  QString *qobject_to_qstring(const QObject *obj);
> diff --git a/qjson.c b/qjson.c
> index b9a9a36..d172b1f 100644
> --- a/qjson.c
> +++ b/qjson.c
> @@ -70,7 +70,7 @@ void json_end_array(QJSON *json)
>  void json_prop_int(QJSON *json, const char *name, int64_t val)
>  {
>  json_emit_element(json, name);
> -qstring_append_int(json->str, val);
> +qstring_append_format(json->str, "%" PRId64, val);
>  }
>
>  void json_prop_str(QJSON *json, const char *name, const char *str)
> diff --git a/qobject/qobject-json.c b/qobject/qobject-json.c
> index 97bccb7..963de07 100644
> --- a/qobject/qobject-json.c
> +++ b/qobject/qobject-json.c
> @@ -80,16 +80,13 @@ static void to_json(const QObject *obj, QString *str, int 
> pretty, int indent);
>  static void to_json_dict_iter(const char *key, QObject *obj, void *opaque)
>  {
>  ToJsonIterState *s = opaque;
> -int j;
>
>  if (s->count) {
>  qstring_append(s->str, s->pretty ? "," : ", ");
>  }
>
>  if (s->pretty) {
> -qstring_append(s->str, "\n");
> -for (j = 0 ; j < s->indent ; j++)
> -qstring_append(s->str, "");
> +qstring_append_format(s->str, "\n%*s", 4 * s->indent, "");
>  }
>
>  qstring_append_json_string(s->str, key);
> @@ -102,16 +99,13 @@ static void to_json_dict_iter(const char *key, QObject 
> *obj, void *opaque)
>  static void to_json_list_iter(QObject *obj, void *opaque)
>  {
>  ToJsonIterState *s = opaque;
> -int j;
>
>  if (s->count) {
>  qstring_append(s->str, s->pretty ? "," : ", ");
>  }
>
>  if (s->pretty) {
> -qstring_append(s->str, "\n");
> -for (j = 0 ; j < s->indent ; j++)
> -qstring_append(s->str, "");
> +qstring_append_format(s->str, "\n%*s", 4 * s->indent, "");
>  }
>
>  to_json(obj, s->str, s->pretty, s->indent);
> @@ -126,10 +120,7 @@ static void to_json(const QObject *obj, QString *str, 
> int pretty, int indent)
>  break;
>  case QTYPE_QINT: {
>  QInt *val = qobject_to_qint(obj);
> -char buffer[1024];
> -
> -snprintf(buffer, sizeof(buffer), "%" PRId64, qint_get_int(val));
> -qstring_append(str, buffer);
> +qstring_append_format(str, "%" PRId64, qint_get_int(val));
>  break;
>  }
>   

[Qemu-devel] [PATCH RFC 3/8] vfio: ccw: basic implementation for vfio_ccw driver

2016-04-29 Thread Dong Jia Shi
Add a basic vfio_ccw driver, which depends on the VFIO No-IOMMU
support.

Add a new config option:
  Device Drivers
  --> VFIO Non-Privileged userspace driver framework
--> VFIO No-IOMMU support
  --> VFIO support for ccw devices

Signed-off-by: Dong Jia Shi 
Reviewed-by: Pierre Morel 
---
 arch/s390/include/asm/irq.h |   1 +
 arch/s390/kernel/irq.c  |   1 +
 drivers/vfio/Kconfig|   1 +
 drivers/vfio/Makefile   |   1 +
 drivers/vfio/ccw/Kconfig|   7 ++
 drivers/vfio/ccw/Makefile   |   2 +
 drivers/vfio/ccw/vfio_ccw.c | 160 
 7 files changed, 173 insertions(+)
 create mode 100644 drivers/vfio/ccw/Kconfig
 create mode 100644 drivers/vfio/ccw/Makefile
 create mode 100644 drivers/vfio/ccw/vfio_ccw.c

diff --git a/arch/s390/include/asm/irq.h b/arch/s390/include/asm/irq.h
index f97b055..5ec272a 100644
--- a/arch/s390/include/asm/irq.h
+++ b/arch/s390/include/asm/irq.h
@@ -66,6 +66,7 @@ enum interruption_class {
IRQIO_VAI,
NMI_NMI,
CPU_RST,
+   IRQIO_VFC,
NR_ARCH_IRQS
 };
 
diff --git a/arch/s390/kernel/irq.c b/arch/s390/kernel/irq.c
index c373a1d..706002a 100644
--- a/arch/s390/kernel/irq.c
+++ b/arch/s390/kernel/irq.c
@@ -88,6 +88,7 @@ static const struct irq_class irqclass_sub_desc[] = {
{.irq = IRQIO_VAI,  .name = "VAI", .desc = "[I/O] Virtual I/O Devices 
AI"},
{.irq = NMI_NMI,.name = "NMI", .desc = "[NMI] Machine Check"},
{.irq = CPU_RST,.name = "RST", .desc = "[CPU] CPU Restart"},
+   {.irq = IRQIO_VFC,  .name = "VFC", .desc = "[I/O] VFIO CCW Devices"},
 };
 
 void __init init_IRQ(void)
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index da6e2ce..f1d414c 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -46,6 +46,7 @@ menuconfig VFIO_NOIOMMU
 
  If you don't know what to do here, say N.
 
+source "drivers/vfio/ccw/Kconfig"
 source "drivers/vfio/pci/Kconfig"
 source "drivers/vfio/platform/Kconfig"
 source "virt/lib/Kconfig"
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 7b8a31f..2b39593 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
 obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
 obj-$(CONFIG_VFIO_PCI) += pci/
 obj-$(CONFIG_VFIO_PLATFORM) += platform/
+obj-$(CONFIG_VFIO_CCW) += ccw/
diff --git a/drivers/vfio/ccw/Kconfig b/drivers/vfio/ccw/Kconfig
new file mode 100644
index 000..6281152
--- /dev/null
+++ b/drivers/vfio/ccw/Kconfig
@@ -0,0 +1,7 @@
+config VFIO_CCW
+   tristate "VFIO support for CCW devices"
+   depends on VFIO_NOIOMMU && CCW
+   help
+ VFIO support for CCW bus driver. Note that this is just
+ the base driver; you'll also need a userspace program
+ to provide a device configuration and channel programs.
diff --git a/drivers/vfio/ccw/Makefile b/drivers/vfio/ccw/Makefile
new file mode 100644
index 000..ea14ca9
--- /dev/null
+++ b/drivers/vfio/ccw/Makefile
@@ -0,0 +1,2 @@
+vfio-ccw-y := vfio_ccw.o
+obj-$(CONFIG_VFIO_CCW) += vfio-ccw.o
diff --git a/drivers/vfio/ccw/vfio_ccw.c b/drivers/vfio/ccw/vfio_ccw.c
new file mode 100644
index 000..8b0acae
--- /dev/null
+++ b/drivers/vfio/ccw/vfio_ccw.c
@@ -0,0 +1,160 @@
+/*
+ * vfio based ccw device driver
+ *
+ * Copyright IBM Corp. 2016
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2 only)
+ * as published by the Free Software Foundation.
+ *
+ * Author(s): Dong Jia Shi 
+ *Xiao Feng Ren 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * struct vfio_ccw_device
+ * @cdev: ccw device
+ * @going_away: if an offline procedure was already ongoing
+ */
+struct vfio_ccw_device {
+   struct ccw_device   *cdev;
+   boolgoing_away;
+};
+
+enum vfio_ccw_device_type {
+   vfio_dasd_eckd,
+};
+
+struct ccw_device_id vfio_ccw_ids[] = {
+   { CCW_DEVICE_DEVTYPE(0x3990, 0, 0x3390, 0),
+ .driver_info = vfio_dasd_eckd},
+   { /* End of list. */ },
+};
+MODULE_DEVICE_TABLE(ccw, vfio_ccw_ids);
+
+/*
+ * vfio callbacks
+ */
+static int vfio_ccw_open(void *device_data)
+{
+   if (!try_module_get(THIS_MODULE))
+   return -ENODEV;
+
+   return 0;
+}
+
+static void vfio_ccw_release(void *device_data)
+{
+   module_put(THIS_MODULE);
+}
+
+static long vfio_ccw_ioctl(void *device_data, unsigned int cmd,
+  unsigned long arg)
+{
+   return -ENOTTY;
+}
+
+static const struct vfio_device_ops vfio_ccw_ops = {
+   .name   = "vfio_ccw",
+   .open   = vfio_ccw_open,
+   .release= vfio_ccw_release,
+   .ioctl  = vfio_ccw_ioctl,
+};
+
+static int 

[Qemu-devel] [PATCH RFC 8/8] vfio: ccw: realize VFIO_DEVICE_CCW_CMD_REQUEST ioctl

2016-04-29 Thread Dong Jia Shi
Introduce VFIO_DEVICE_CCW_CMD_REQUEST ioctl for vfio-ccw
to handle the translated ccw commands.

We implement the basic ccw command handling infrastructure
here:
1. Issue the translated ccw commands to the device.
2. Once we get the execution result, update the guest SCSW
   with it.

Signed-off-by: Dong Jia Shi 
Reviewed-by: Pierre Morel 
---
 drivers/vfio/ccw/vfio_ccw.c | 190 +++-
 include/uapi/linux/vfio.h   |  23 ++
 2 files changed, 212 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/ccw/vfio_ccw.c b/drivers/vfio/ccw/vfio_ccw.c
index 9700448..3979544 100644
--- a/drivers/vfio/ccw/vfio_ccw.c
+++ b/drivers/vfio/ccw/vfio_ccw.c
@@ -17,15 +17,30 @@
 #include 
 #include 
 #include 
+#include 
+#include "ccwchain.h"
 
 /**
  * struct vfio_ccw_device
  * @cdev: ccw device
+ * @curr_intparm: record current interrupt parameter,
+ *used for wait interrupt.
+ * @wait_q: wait for interrupt
+ * @ccwchain_cmd: address map for current ccwchain.
+ * @irb: irb info received from interrupt
+ * @orb: orb for the currently processed ssch request
+ * @scsw: scsw info
  * @going_away: if an offline procedure was already ongoing
  * @hot_reset: if hot-reset is ongoing
  */
 struct vfio_ccw_device {
struct ccw_device   *cdev;
+   u32 curr_intparm;
+   wait_queue_head_t   wait_q;
+   struct ccwchain_cmd ccwchain_cmd;
+   struct irb  irb;
+   union orb   orb;
+   union scsw  scsw;
boolgoing_away;
boolhot_reset;
 };
@@ -42,6 +57,118 @@ struct ccw_device_id vfio_ccw_ids[] = {
 MODULE_DEVICE_TABLE(ccw, vfio_ccw_ids);
 
 /*
+ * LATER:
+ * This is good for Linux guests; but we may need an interface to
+ * deal with further bits in the orb.
+ */
+static unsigned long flags_from_orb(union orb *orb)
+{
+   unsigned long flags = 0;
+
+   flags |= orb->cmd.pfch ? 0 : DOIO_DENY_PREFETCH;
+   flags |= orb->cmd.spnd ? DOIO_ALLOW_SUSPEND : 0;
+   flags |= orb->cmd.ssic ? (DOIO_SUPPRESS_INTER | DOIO_ALLOW_SUSPEND) : 0;
+
+   return flags;
+}
+
+/* Check if the current intparm has been set. */
+static int doing_io(struct vfio_ccw_device *vcdev, u32 intparm)
+{
+   unsigned long flags;
+   int ret;
+
+   spin_lock_irqsave(get_ccwdev_lock(vcdev->cdev), flags);
+   ret = (vcdev->curr_intparm == intparm);
+   spin_unlock_irqrestore(get_ccwdev_lock(vcdev->cdev), flags);
+   return ret;
+}
+
+int vfio_ccw_io_helper(struct vfio_ccw_device *vcdev)
+{
+   struct ccwchain_cmd *ccwchain_cmd;
+   struct ccw1 *cpa;
+   u32 intparm;
+   unsigned long io_flags, lock_flags;
+   int ret;
+
+   ccwchain_cmd = &vcdev->ccwchain_cmd;
+   cpa = ccwchain_get_cpa(ccwchain_cmd);
+   intparm = (u32)(u64)ccwchain_cmd->k_ccwchain;
+   io_flags = flags_from_orb(&vcdev->orb);
+
+   spin_lock_irqsave(get_ccwdev_lock(vcdev->cdev), lock_flags);
+   ret = ccw_device_start(vcdev->cdev, cpa, intparm,
+  vcdev->orb.cmd.lpm, io_flags);
+   if (!ret)
+   vcdev->curr_intparm = 0;
+   spin_unlock_irqrestore(get_ccwdev_lock(vcdev->cdev), lock_flags);
+
+   if (!ret)
+   wait_event(vcdev->wait_q,
+  doing_io(vcdev, intparm));
+
+   ccwchain_update_scsw(ccwchain_cmd, &(vcdev->irb.scsw));
+
+   return ret;
+}
+
+/* Deal with the ccw command request from the userspace. */
+int vfio_ccw_cmd_request(struct vfio_ccw_device *vcdev,
+struct vfio_ccw_cmd *ccw_cmd)
+{
+   union orb *orb = &vcdev->orb;
+   union scsw *scsw = &vcdev->scsw;
+   struct irb *irb = &vcdev->irb;
+   int ret;
+
+   memcpy(orb, ccw_cmd->orb_area, sizeof(*orb));
+   memcpy(scsw, ccw_cmd->scsw_area, sizeof(*scsw));
+   vcdev->ccwchain_cmd.u_ccwchain = (void *)ccw_cmd->ccwchain_buf;
+   vcdev->ccwchain_cmd.k_ccwchain = NULL;
+   vcdev->ccwchain_cmd.nr = ccw_cmd->ccwchain_nr;
+
+   if (scsw->cmd.fctl & SCSW_FCTL_START_FUNC) {
+   /*
+* XXX:
+* Only support prefetch enable mode now.
+* Only support 64bit addressing idal.
+*/
+   if (!orb->cmd.pfch || !orb->cmd.c64)
+   return -EOPNOTSUPP;
+
+   ret = ccwchain_alloc(&vcdev->ccwchain_cmd);
+   if (ret)
+   return ret;
+
+   ret = ccwchain_prefetch(&vcdev->ccwchain_cmd);
+   if (ret) {
+   ccwchain_free(&vcdev->ccwchain_cmd);
+   return ret;
+   }
+
+   /* Start channel program and wait for I/O interrupt. */
+   ret = vfio_ccw_io_helper(vcdev);
+   if (!ret) {
+   /* Get irb info and copy it to irb_area. */
+   

[Qemu-devel] [PATCH RFC 0/8] basic vfio-ccw infrastructure

2016-04-29 Thread Dong Jia Shi
vfio: ccw: basic vfio-ccw infrastructure


Introduction


Here we describe the vfio support for Channel I/O devices (aka. CCW
devices) for Linux/s390. The motivation for vfio-ccw is to pass CCW
devices through to a virtual machine, with vfio as the means.

Unlike other hardware architectures, s390 defines a unified
I/O access method, the so-called Channel I/O. It has its own
access patterns:
- Channel programs run asynchronously on a separate (co)processor.
- The channel subsystem will access any memory designated by the caller
  in the channel program directly, i.e. there is no iommu involved.
Thus when we introduce vfio support for these devices, we realize it
with a no-iommu vfio implementation.

This document does not intend to explain the s390 hardware architecture
in every detail. More information and references can be found here:
- A good start to know Channel I/O in general:
  https://en.wikipedia.org/wiki/Channel_I/O
- s390 architecture:
  s390 Principles of Operation manual (IBM Form. No. SA22-7832)
- The existing Qemu code which implements a simple emulated channel
  subsystem could also be a good reference. It makes it easier to
  follow the flow.
  qemu/hw/s390x/css.c

Motivation of vfio-ccw
----------------------

Currently, a guest virtualized via qemu/kvm on s390 only sees
paravirtualized virtio devices via the "Virtio Over Channel I/O
(virtio-ccw)" transport. This makes virtio devices discoverable via
standard operating system algorithms for handling channel devices.

However, this is not enough. On s390, the majority of devices use the
standard Channel I/O based mechanism, and we also need to provide
a way to pass them through to a Qemu virtual machine.
This includes devices that don't have a virtio counterpart (e.g. tape
drives) or that have specific characteristics which guests want to
exploit.

For passing a device to a guest, we want to use the same interface as
everybody else, namely vfio. Thus, we would like to introduce vfio
support for channel devices. And we would like to name this new vfio
device "vfio-ccw".

Access patterns of CCW devices
------------------------------

The s390 architecture implements a so-called channel subsystem, which
provides a unified view of the devices physically attached to the
system. Although the s390 hardware platform knows about a huge variety of
different peripheral attachments like disk devices (aka. DASDs), tapes,
communication controllers, etc., they can all be accessed by a well
defined access method and they all present I/O completion in a unified
way: I/O interruptions.

All I/O requires the use of channel command words (CCWs). A CCW is an
instruction to a specialized I/O channel processor. A channel program
is a sequence of CCWs which are executed by the I/O channel subsystem.
To issue a CCW program to the channel subsystem, it is required to
build an operation request block (ORB), which can be used to point out
the format of the CCW and other control information to the system. The
operating system signals the I/O channel subsystem to begin executing
the channel program with a SSCH (start sub-channel) instruction. The
central processor is then free to proceed with non-I/O instructions
until interrupted. The I/O completion result is received by the
interrupt handler in the form of an interrupt response block (IRB).

Back to vfio-ccw, in short:
- ORBs and CCW programs are built in user space (with virtual
  addresses).
- ORBs and CCW programs are passed to the kernel.
- The kernel translates virtual addresses to real addresses and starts
  the I/O by issuing a privileged Channel I/O instruction (e.g. SSCH).
- CCW programs run asynchronously on a separate processor.
- I/O completion is signaled to the host with I/O interruptions,
  and the result is copied as an IRB to user space.


vfio-ccw patches overview
-------------------------

It follows that we need vfio-ccw with a vfio no-iommu mode. For now,
our patches are based on the current no-iommu implementation. It's a
good start to launch the code review for vfio-ccw. Note that the
implementation is far from complete yet; but we'd like to get feedback
for the general architecture.

The current no-iommu implementation considers vfio-ccw
unsupported and taints the kernel. This should not be the case for
vfio-ccw. But whether the end result will be using the existing
no-iommu code or a new module would be an implementation detail.

* CCW translation APIs
- Description:
  These introduce a group of APIs (starting with 'ccwchain_') to do CCW
  translation. The CCWs passed in by a user space program are organized
  in a buffer, with their user virtual memory addresses. These APIs will
  copy the CCWs into the kernel space, and assemble a runnable kernel
  CCW program by updating the user virtual addresses with their
  corresponding physical addresses.
- Patches:
  vfio: ccw: introduce page array interfaces
  vfio: ccw: 

[Qemu-devel] [PATCH RFC 1/8] iommu: s390: enable iommu api for s390 ccw devices

2016-04-29 Thread Dong Jia Shi
This enables IOMMU API if CONFIG_CCW is configured.

Signed-off-by: Dong Jia Shi 
Reviewed-by: Pierre Morel 
---
 drivers/iommu/Kconfig | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index dd1dc39..63bbc3d 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -331,11 +331,11 @@ config ARM_SMMU_V3
  the ARM SMMUv3 architecture.
 
 config S390_IOMMU
-   def_bool y if S390 && PCI
-   depends on S390 && PCI
+   def_bool y
+   depends on S390 && (PCI || CCW)
select IOMMU_API
help
- Support for the IOMMU API for s390 PCI devices.
+ Support for the IOMMU API for s390 PCI and CCW devices.
 
 config MTK_IOMMU
bool "MTK IOMMU Support"
-- 
2.6.6




[Qemu-devel] [PATCH RFC 2/8] s390: move orb.h from drivers/s390/ to arch/s390/

2016-04-29 Thread Dong Jia Shi
Let's make orb-related definitions available outside
of the common I/O layer for future use (e.g. for
passing channel devices to a guest).

Signed-off-by: Dong Jia Shi 
Reviewed-by: Pierre Morel 
---
 {drivers/s390/cio => arch/s390/include/asm}/orb.h | 0
 drivers/s390/cio/eadm_sch.c   | 2 +-
 drivers/s390/cio/eadm_sch.h   | 2 +-
 drivers/s390/cio/io_sch.h | 2 +-
 drivers/s390/cio/ioasm.c  | 2 +-
 drivers/s390/cio/ioasm.h  | 2 +-
 drivers/s390/cio/trace.h  | 2 +-
 7 files changed, 6 insertions(+), 6 deletions(-)
 rename {drivers/s390/cio => arch/s390/include/asm}/orb.h (100%)

diff --git a/drivers/s390/cio/orb.h b/arch/s390/include/asm/orb.h
similarity index 100%
rename from drivers/s390/cio/orb.h
rename to arch/s390/include/asm/orb.h
diff --git a/drivers/s390/cio/eadm_sch.c b/drivers/s390/cio/eadm_sch.c
index b3f44bc..8082a03 100644
--- a/drivers/s390/cio/eadm_sch.c
+++ b/drivers/s390/cio/eadm_sch.c
@@ -21,12 +21,12 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "eadm_sch.h"
 #include "ioasm.h"
 #include "cio.h"
 #include "css.h"
-#include "orb.h"
 
 MODULE_DESCRIPTION("driver for s390 eadm subchannels");
 MODULE_LICENSE("GPL");
diff --git a/drivers/s390/cio/eadm_sch.h b/drivers/s390/cio/eadm_sch.h
index 9664e46..2184920 100644
--- a/drivers/s390/cio/eadm_sch.h
+++ b/drivers/s390/cio/eadm_sch.h
@@ -5,7 +5,7 @@
 #include 
 #include 
 #include 
-#include "orb.h"
+#include 
 
 struct eadm_private {
union orb orb;
diff --git a/drivers/s390/cio/io_sch.h b/drivers/s390/cio/io_sch.h
index 8975060..b768523 100644
--- a/drivers/s390/cio/io_sch.h
+++ b/drivers/s390/cio/io_sch.h
@@ -5,8 +5,8 @@
 #include 
 #include 
 #include 
+#include 
 #include "css.h"
-#include "orb.h"
 
 struct io_subchannel_private {
union orb orb;  /* operation request block */
diff --git a/drivers/s390/cio/ioasm.c b/drivers/s390/cio/ioasm.c
index 9898481..7fd413d 100644
--- a/drivers/s390/cio/ioasm.c
+++ b/drivers/s390/cio/ioasm.c
@@ -7,9 +7,9 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ioasm.h"
-#include "orb.h"
 #include "cio.h"
 
 int stsch(struct subchannel_id schid, struct schib *addr)
diff --git a/drivers/s390/cio/ioasm.h b/drivers/s390/cio/ioasm.h
index b31ee6b..b2ca4a3 100644
--- a/drivers/s390/cio/ioasm.h
+++ b/drivers/s390/cio/ioasm.h
@@ -4,7 +4,7 @@
 #include 
 #include 
 #include 
-#include "orb.h"
+#include 
 #include "cio.h"
 #include "trace.h"
 
diff --git a/drivers/s390/cio/trace.h b/drivers/s390/cio/trace.h
index 5b807a0..ba58f7c 100644
--- a/drivers/s390/cio/trace.h
+++ b/drivers/s390/cio/trace.h
@@ -7,10 +7,10 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "cio.h"
-#include "orb.h"
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM s390
-- 
2.6.6




[Qemu-devel] [PATCH RFC 4/8] vfio: ccw: realize VFIO_DEVICE_GET_INFO ioctl

2016-04-29 Thread Dong Jia Shi
Introduce device information about vfio-ccw: VFIO_DEVICE_FLAGS_CCW.
Realize VFIO_DEVICE_GET_INFO ioctl for vfio-ccw.

Signed-off-by: Dong Jia Shi 
Reviewed-by: Pierre Morel 
---
 drivers/vfio/ccw/vfio_ccw.c | 20 
 include/uapi/linux/vfio.h   |  1 +
 2 files changed, 21 insertions(+)

diff --git a/drivers/vfio/ccw/vfio_ccw.c b/drivers/vfio/ccw/vfio_ccw.c
index 8b0acae..7331aed 100644
--- a/drivers/vfio/ccw/vfio_ccw.c
+++ b/drivers/vfio/ccw/vfio_ccw.c
@@ -58,6 +58,26 @@ static void vfio_ccw_release(void *device_data)
 static long vfio_ccw_ioctl(void *device_data, unsigned int cmd,
   unsigned long arg)
 {
+   unsigned long minsz;
+
+   if (cmd == VFIO_DEVICE_GET_INFO) {
+   struct vfio_device_info info;
+
+   minsz = offsetofend(struct vfio_device_info, num_irqs);
+
+   if (copy_from_user(&info, (void __user *)arg, minsz))
+   return -EFAULT;
+
+   if (info.argsz < minsz)
+   return -EINVAL;
+
+   info.flags = VFIO_DEVICE_FLAGS_CCW;
+   info.num_regions = 0;
+   info.num_irqs = 0;
+
+   return copy_to_user((void __user *)arg, &info, minsz);
+   }
+
return -ENOTTY;
 }
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 255a211..aaedfcd 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -198,6 +198,7 @@ struct vfio_device_info {
 #define VFIO_DEVICE_FLAGS_PCI  (1 << 1)/* vfio-pci device */
 #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)/* vfio-platform device */
 #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)   /* vfio-amba device */
+#define VFIO_DEVICE_FLAGS_CCW  (1 << 4)/* vfio-ccw device */
__u32   num_regions;/* Max region index + 1 */
__u32   num_irqs;   /* Max IRQ index + 1 */
 };
-- 
2.6.6




[Qemu-devel] [PATCH RFC 7/8] vfio: ccw: introduce ccw chain interfaces

2016-04-29 Thread Dong Jia Shi
Introduce ccwchain structure and helper functions that can be used to
handle special ccw programs issued from user-space.

The following limitations apply:
1. Supports only prefetch enabled mode.
2. Supports direct ccw chaining by translating them to idal ccws.
3. Supports idal(c64) ccw chaining.

These interfaces are designed to support translation only for special
ccw programs, which are generated and formatted by a user-space
program. Thus this will make it possible for VFIO to leverage the
interfaces to realize channel I/O device drivers in user-space.

User-space programs should prepare the ccws according to the rules
below:
1. Allocate a 4K memory buffer in user-space to store all of the ccw
   program information.
2. Lower 2k of the buffer are used to store a maximum of 256 ccws.
3. Upper 2k of the buffer are used to store a maximum of 256
   corresponding cda data sets, each having a length of 8 bytes.
4. All of the ccws should be placed one after another.
5. For direct and idal ccw:
   - Find a free cda data entry, and find its offset to the address
 of the cda buffer.
   - Store the offset as the CDA value in the ccw.
   - Store the user virtual address of the data (idaw) as the data of
 the cda entry.
6. For tic ccw:
   - Find the target ccw, and find its offset to the address of the
 ccw buffer.
   - Store the offset as the CDA value in the ccw.
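
The buffer rules above can be sketched from the user-space side roughly
as follows. This is an illustration only, under a few assumptions: the
sketch_* names are hypothetical, the 8-byte ccw layout mirrors a
format-1 CCW, the cda entry with the same index as the ccw is used as
the "free" entry, and 0x08 is taken as the TIC command code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CCW_MAX  256
#define SKETCH_CMD_TIC  0x08   /* assumed transfer-in-channel code */

/* Hypothetical mirror of an 8-byte format-1 CCW. */
struct sketch_ccw {
    uint8_t  cmd_code;
    uint8_t  flags;
    uint16_t count;
    uint32_t cda;      /* here: an offset, not a real address */
};

/* One 4K buffer: lower 2K holds the ccws, upper 2K the cda entries. */
struct sketch_ccw_buf {
    struct sketch_ccw ccw[SKETCH_CCW_MAX];
    uint64_t cda[SKETCH_CCW_MAX];
};

_Static_assert(sizeof(struct sketch_ccw) == 8, "ccw must be 8 bytes");
_Static_assert(sizeof(struct sketch_ccw_buf) == 4096, "buffer must be 4K");

/*
 * Rule 5: the ccw's CDA holds the offset of its cda entry relative to
 * the start of the cda area; the entry holds the user virtual address.
 */
static int sketch_add_data_ccw(struct sketch_ccw_buf *buf, unsigned int idx,
                               uint8_t cmd, const void *data, uint16_t len)
{
    if (idx >= SKETCH_CCW_MAX)
        return -1;

    buf->ccw[idx].cmd_code = cmd;
    buf->ccw[idx].flags = 0;
    buf->ccw[idx].count = len;
    buf->ccw[idx].cda = (uint32_t)(idx * sizeof(uint64_t));
    buf->cda[idx] = (uint64_t)(uintptr_t)data;
    return 0;
}

/*
 * Rule 6: a tic ccw's CDA holds the offset of the target ccw relative
 * to the start of the ccw area.
 */
static int sketch_add_tic_ccw(struct sketch_ccw_buf *buf, unsigned int idx,
                              unsigned int target)
{
    if (idx >= SKETCH_CCW_MAX || target >= SKETCH_CCW_MAX)
        return -1;

    buf->ccw[idx].cmd_code = SKETCH_CMD_TIC;
    buf->ccw[idx].flags = 0;
    buf->ccw[idx].count = 0;
    buf->ccw[idx].cda = (uint32_t)(target * sizeof(struct sketch_ccw));
    return 0;
}
```

Reusing the ccw index for the cda entry keeps the sketch short; a real
program is free to pick any unused cda entry.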

Signed-off-by: Dong Jia Shi 
Reviewed-by: Pierre Morel 
---
 drivers/vfio/ccw/ccwchain.c | 441 
 drivers/vfio/ccw/ccwchain.h |  49 +
 2 files changed, 490 insertions(+)
 create mode 100644 drivers/vfio/ccw/ccwchain.h

diff --git a/drivers/vfio/ccw/ccwchain.c b/drivers/vfio/ccw/ccwchain.c
index 03b4e82..964b6479 100644
--- a/drivers/vfio/ccw/ccwchain.c
+++ b/drivers/vfio/ccw/ccwchain.c
@@ -11,8 +11,19 @@
  *Xiao Feng Ren 
  */
 
+#include 
+#include 
+#include 
 #include 
 #include 
+#include "ccwchain.h"
+
+/*
+ * Max length for ccw chain.
+ * XXX: Limit to 256, need to check more?
+ */
+#define CCWCHAIN_LEN_MAX   256
+#define CDA_ITEM_SIZE  3 /* sizeof(u64) == (1 << 3) */
 
 struct page_array {
u64 hva;
@@ -25,6 +36,20 @@ struct page_arrays {
int nr;
 };
 
+struct ccwchain_buf {
+   struct ccw1 ccw[CCWCHAIN_LEN_MAX];
+   u64 cda[CCWCHAIN_LEN_MAX];
+};
+
+struct ccwchain {
+   struct ccwchain_buf buf;
+
+   /* Valid ccw number in chain */
+   int nr;
+   /* Pinned PAGEs for the original data. */
+   struct page_arrays  *pss;
+};
+
 /*
  * Helpers to operate page_array.
  */
@@ -126,3 +151,419 @@ static void page_arrays_unpin_free(struct page_arrays *ps)
ps->parray = NULL;
ps->nr = 0;
 }
+
+/*
+ * Helpers to operate ccwchain.
+ */
+/* Return the number of idal words needed for an address/length pair. */
+static inline unsigned int ccwchain_idal_nr_words(u64 addr, unsigned int 
length)
+{
+   /*
+* User virtual address and its corresponding kernel physical address
+* are aligned by pages. Thus their offsets to the page boundary will be
+* the same.
+* Although idal_nr_words expects a virtual address as its first param,
+* it is the offset that matters. It's fine to use either hva or hpa as
+* the input, since they have the same offset inside a page.
+*/
+   return idal_nr_words((void *)(addr), length);
+}
+
+/* Create the list of idal words for a page_arrays. */
+static inline void ccwchain_idal_create_words(unsigned long *idaws,
+ struct page_arrays *ps)
+{
+   int i, j, k;
+
+   /*
+* Idal words (except the first one) rely on the memory being 4k
+* aligned. If a user virtual address is 4K aligned, then its
+* corresponding kernel physical address will also be 4K aligned. Thus
+* there will be no problem here to simply use the hpa to create an
+* idaw.
+*/
+   k = 0;
+   for (i = 0; i < ps->nr; i++)
+   for (j = 0; j < ps->parray[i].nr; j++) {
+   idaws[k] = page_to_phys(ps->parray[i].items[j]);
+   if (k == 0)
+   idaws[k] += ps->parray[i].hva & ~PAGE_MASK;
+   k++;
+   }
+}
+
+#define ccw_is_test(_ccw) (((_ccw)->cmd_code & 0x0F) == 0)
+
+#define ccw_is_noop(_ccw) ((_ccw)->cmd_code == CCW_CMD_NOOP)
+
+#define ccw_is_tic(_ccw) ((_ccw)->cmd_code == CCW_CMD_TIC)
+
+#define ccw_is_idal(_ccw) ((_ccw)->flags & CCW_FLAG_IDA)
+
+/* Free resource for a ccw that allocated memory for its cda. */
+static void ccw_chain_cda_free(struct ccwchain *chain, int idx)
+{
+   struct ccw1 *ccw = chain->buf.ccw + idx;
+
+   if (!ccw->count)
+   return;
