Re: [Qemu-devel] [libvirt patch] qemu: adds support for virtfs 9p argument 'vii'

2019-06-02 Thread Greg Kurz
On Wed, 22 May 2019 18:03:13 +0200
Christian Schoenebeck  wrote:

> > On Monday, 20 May 2019 16:05:09 CEST Greg Kurz wrote:
> > Hi Christian,  
> 
> Hi Greg,
> 
> > On the other hand, I'm afraid that having a functional solution won't
> > motivate people to come up with a new spec... Anyway, I agree that the
> > data corruption/loss issues must be prevented, ie, the 9p server should
> > at least return an error to the client rather than returning a colliding
> > QID.   
> 
> Ok, I will extend Antonios' patch to log that error on the host. I thought
> about limiting that error message to once per session (to avoid flooding the
> logs), but it is probably not worth it, so unless you veto it I will simply
> log that error on every file access.
> 

Please use error_report_once().
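For readers unfamiliar with it, QEMU's error_report_once() emits a message only the first time a given call site is reached. A minimal standalone sketch of that once-per-call-site behaviour (an illustration, not QEMU's actual implementation):

```c
#include <assert.h>
#include <stdio.h>

/* Toy sketch of "report once" semantics, similar in spirit to QEMU's
 * error_report_once(): a static guard makes the message fire only on
 * the first invocation, so logging on every file access cannot flood
 * the logs.  This is an illustration, not QEMU's implementation. */
static int report_inode_collision(void)
{
    static int printed;

    if (!printed) {
        printed = 1;
        fprintf(stderr, "9p: inode number collision detected\n");
        return 1;   /* message was emitted */
    }
    return 0;       /* suppressed on every later call */
}
```

The return value makes the suppression observable: only the first call reports.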

> > A simple way to avoid that is to enforce a single device, ie. patch
> > 2 in Antonios's series. Of course this may break some setups where
> > multiple devices are involved, but it is pure luck if they aren't already
> > broken with the current code base.   
> 
> Yes, the worst thing you can have is this collision being silently ignored,
> like it is right now, because you end up getting all sorts of different
> misbehaviours on guests, and those are just symptoms that take a while to
> debug before realising that the actual root cause is an inode collision.
> So enforcing a single device is still better than undefined behaviour.
> 
> > > Background: The concept of FS "data sets" combines the benefits of
> > > classical partitions (e.g. logical file space separation, independent fs
> > > configurations like compression on/off/algorithm, data deduplication
> > > on/off, snapshot isolation, snapshots on/off) without the disadvantages
> > > of classical real partitions (physical space is dynamically shared, no
> > > space wasted on fixed boundaries; physical device pool management is
> > > transparent for all data sets, configuration options can be inherited
> > > from parent data sets).  
> > 
> > Ok. I must admit my ignorance around ZFS and "data sets"... so IIUC, even
> > with a single underlying physical device you might end up with lstat()
> > returning different st_dev values on the associated files, correct ?
> > 
> > I take your word for the likeliness of the issue to popup in such setups. 
> > :)  
> 
> Yes, that is correct, you _always_ get a different stat::st_dev value for
> each ZFS data set. Furthermore, each ZFS data set has its own inode number
> sequence generator starting from one. So if you create two new ZFS data
> sets and then create one file on each data set, both files will have inode
> number 1.
> 
> That probably makes it clear why you hit this ID collision bug very easily 
> when using the combination ZFS & 9p.
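To make the collision concrete, here is a toy model (hypothetical types and values, purely illustrative): two data sets get distinct st_dev values but overlapping st_ino sequences, so a QID path derived from the inode number alone collides.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the collision: each ZFS data set has its own st_dev and
 * its own inode sequence starting at 1.  A 9p QID path derived from
 * st_ino alone (as the current code does) cannot tell the files apart. */
typedef struct {
    uint64_t st_dev;   /* unique per data set */
    uint64_t st_ino;   /* restarts at 1 in every data set */
} ToyStat;

static uint64_t qid_path_naive(const ToyStat *st)
{
    return st->st_ino;   /* ignores st_dev -> collision-prone */
}

/* Two distinct files collide iff their naive QID paths are equal */
static int qid_collides(const ToyStat *a, const ToyStat *b)
{
    int same_file = a->st_dev == b->st_dev && a->st_ino == b->st_ino;

    return !same_file && qid_path_naive(a) == qid_path_naive(b);
}
```

The first file on each of two data sets is enough to trigger the collision.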
> 
> > > There is also a big difference between giving the user the _optional_
> > > possibility to define e.g. one path (not device) on the guest said to be
> > > sensitive regarding high inode numbers, and something completely
> > > different: telling the user that he _must_ configure every single device
> > > from the host that is ever supposed to show up with 9p on the guest, and
> > > forcing the user to update that configuration whenever a device is added
> > > or removed on the host. The "vii" configuration feature does not require
> > > any knowledge of how many and which kinds of devices are actually ever
> > > used on the host (nor on any higher-level host in case of nested
> > > virtualization), nor does the "vii" config ever require any changes when
> > > the host device setup changes. So 9p's core transparency feature would
> > > not be touched at all.
> > 
> > I guess this all boils down to I finding some time to read/understand more
> > :)  
> 
> Yes, that helps sometimes. :)
> 
> > As usual, a series with smaller and simpler patches will be easier to
> > review, and more likely to be merged.  
> 
> Of course.
> 
> In the next patch series I will completely drop a) the entire QID
> persistence feature code and b) the disputed "vii" feature. But I will
> still include the variable inode suffix length patch as the last patch in
> that new series.
> 
> That should keep the amount of changes manageably small.
> 
> > > Let me make a different suggestion: how about putting these fixes into a
> > > separate C unit for now and making the default behaviour (if you really
> > > want) not to use any of that code at all. The user would then just get an
> > > error message in the qemu log files by default if he tries to export
> > > several devices with one 9p device, suggesting either to map them as
> > > separate 9p devices (like you suggested) or to enable support for the
> > > automatic inode remapping code (from that separate C unit) instead, by
> > > adding one convenient config option if he/she really wants.
> > It seems that we may be reaching some consensus here :)
> > 
> > I like the approach, provided this is con

[Qemu-devel] [PATCH v4 10/11] kvm: Support KVM_CLEAR_DIRTY_LOG

2019-06-02 Thread Peter Xu
Firstly detect the interface using KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
and mark it.  If we fail to enable the new feature we'll fall back to
the old sync.

Provide the log_clear() hook for the memory listeners for both address
spaces of KVM (normal system memory, and SMM) and deliver the clear
message to the kernel.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 accel/kvm/kvm-all.c| 182 +
 accel/kvm/trace-events |   1 +
 2 files changed, 183 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index e3006006ea..5511550d21 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -91,6 +91,7 @@ struct KVMState
 int many_ioeventfds;
 int intx_set_mask;
 bool sync_mmu;
+bool manual_dirty_log_protect;
 /* The man page (and posix) say ioctl numbers are signed int, but
  * they're not.  Linux, glibc and *BSD all treat ioctl numbers as
  * unsigned, and treating them as signed here can break things */
@@ -555,6 +556,159 @@ out:
 return ret;
 }
 
+/* Alignment requirement for KVM_CLEAR_DIRTY_LOG - 64 pages */
+#define KVM_CLEAR_LOG_SHIFT  6
+#define KVM_CLEAR_LOG_ALIGN  (qemu_real_host_page_size << KVM_CLEAR_LOG_SHIFT)
+#define KVM_CLEAR_LOG_MASK   (-KVM_CLEAR_LOG_ALIGN)
+
+/**
+ * kvm_physical_log_clear - Clear the kernel's dirty bitmap for range
+ *
+ * NOTE: this will be a no-op if we haven't enabled manual dirty log
+ * protection in the host kernel because in that case this operation
+ * will be done within log_sync().
+ *
+ * @kml: the kvm memory listener
+ * @section: the memory range to clear dirty bitmap
+ */
+static int kvm_physical_log_clear(KVMMemoryListener *kml,
+  MemoryRegionSection *section)
+{
+KVMState *s = kvm_state;
+struct kvm_clear_dirty_log d;
+uint64_t start, end, bmap_start, start_delta, bmap_npages, size;
+unsigned long *bmap_clear = NULL, psize = qemu_real_host_page_size;
+KVMSlot *mem = NULL;
+int ret, i;
+
+if (!s->manual_dirty_log_protect) {
+/* No need to do explicit clear */
+return 0;
+}
+
+start = section->offset_within_address_space;
+size = int128_get64(section->size);
+
+if (!size) {
+/* Nothing more we can do... */
+return 0;
+}
+
+kvm_slots_lock(kml);
+
+/* Find any possible slot that covers the section */
+for (i = 0; i < s->nr_slots; i++) {
+mem = &kml->slots[i];
+if (mem->start_addr <= start &&
+start + size <= mem->start_addr + mem->memory_size) {
+break;
+}
+}
+
+/*
+ * We should always find one memslot until this point, otherwise
+ * there could be something wrong from the upper layer
+ */
+assert(mem && i != s->nr_slots);
+
+/*
+ * We need to extend either the start or the size or both to
+ * satisfy the KVM interface requirement.  Firstly, do the start
+ * page alignment on 64 host pages
+ */
+bmap_start = (start - mem->start_addr) & KVM_CLEAR_LOG_MASK;
+start_delta = start - mem->start_addr - bmap_start;
+bmap_start /= psize;
+
+/*
+ * The kernel interface has restriction on the size too, that either:
+ *
+ * (1) the size is 64 host pages aligned (just like the start), or
+ * (2) the size fills up until the end of the KVM memslot.
+ */
+bmap_npages = DIV_ROUND_UP(size + start_delta, KVM_CLEAR_LOG_ALIGN)
+<< KVM_CLEAR_LOG_SHIFT;
+end = mem->memory_size / psize;
+if (bmap_npages > end - bmap_start) {
+bmap_npages = end - bmap_start;
+}
+start_delta /= psize;
+
+/*
+ * Prepare the bitmap to clear dirty bits.  Here we must guarantee
+ * that we won't clear any unknown dirty bits otherwise we might
+ * accidentally clear some set bits which are not yet synced from
+ * the kernel into QEMU's bitmap, then we'll lose track of the
+ * guest modifications upon those pages (which can directly lead
+ * to guest data loss or panic after migration).
+ *
+ * Layout of the KVMSlot.dirty_bmap:
+ *
+ *                   |<-------- bmap_npages -----------..>|
+ *                                                     [1]
+ *                     start_delta         size
+ *  |----------------|-------------|------------------|------------|
+ *  ^                ^             ^                               ^
+ *  |                |             |                               |
+ * start          bmap_start     (start)                         end
+ * of memslot                                                  of memslot
+ *
+ * [1] bmap_npages can be aligned to either 64 pages or the end of slot
+ */
+
+assert(bmap_start % BITS_PER_LONG == 0);
+/* We should never do log_clear before log_sync */
+assert(mem->dirty_bmap);
+if (start_delta) {
+/* Slow path - we need to manipulate a temp bitmap 

[Qemu-devel] [PATCH v4 05/11] memory: Pass mr into snapshot_and_clear_dirty

2019-06-02 Thread Peter Xu
Also, change its second parameter to be the relative offset
within the memory region.  This will be used in follow-up patches.

Signed-off-by: Peter Xu 
---
 exec.c  | 3 ++-
 include/exec/ram_addr.h | 2 +-
 memory.c| 3 +--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/exec.c b/exec.c
index 4e734770c2..815e4f48b9 100644
--- a/exec.c
+++ b/exec.c
@@ -1387,9 +1387,10 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
 }
 
 DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty
- (ram_addr_t start, ram_addr_t length, unsigned client)
+(MemoryRegion *mr, hwaddr offset, hwaddr length, unsigned client)
 {
 DirtyMemoryBlocks *blocks;
+ram_addr_t start = memory_region_get_ram_addr(mr) + offset;
 unsigned long align = 1UL << (TARGET_PAGE_BITS + BITS_PER_LEVEL);
 ram_addr_t first = QEMU_ALIGN_DOWN(start, align);
 ram_addr_t last  = QEMU_ALIGN_UP(start + length, align);
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 79e70a96ee..a4456f3615 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -403,7 +403,7 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
   unsigned client);
 
 DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty
-(ram_addr_t start, ram_addr_t length, unsigned client);
+(MemoryRegion *mr, hwaddr offset, hwaddr length, unsigned client);
 
 bool cpu_physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap,
 ram_addr_t start,
diff --git a/memory.c b/memory.c
index cff0ea8f40..84bba7b65c 100644
--- a/memory.c
+++ b/memory.c
@@ -2071,8 +2071,7 @@ DirtyBitmapSnapshot *memory_region_snapshot_and_clear_dirty(MemoryRegion *mr,
 {
 assert(mr->ram_block);
 memory_region_sync_dirty_bitmap(mr);
-return cpu_physical_memory_snapshot_and_clear_dirty(
-memory_region_get_ram_addr(mr) + addr, size, client);
+return cpu_physical_memory_snapshot_and_clear_dirty(mr, addr, size, client);
 }
 
bool memory_region_snapshot_get_dirty(MemoryRegion *mr, DirtyBitmapSnapshot *snap,
-- 
2.17.1




[Qemu-devel] [PATCH v4 08/11] kvm: Persistent per kvmslot dirty bitmap

2019-06-02 Thread Peter Xu
When synchronizing the dirty bitmap from kernel KVM we do it in a
per-kvmslot fashion and we allocate the userspace bitmap for each
ioctl.  This patch instead makes the bitmap cache persistent so we
don't need to g_malloc0() every time.

More importantly, the cached per-kvmslot dirty bitmap will be further
used when we add support for KVM_CLEAR_DIRTY_LOG, where the cached
bitmap will be used to guarantee that we won't clear any unknown
dirty bits - otherwise that could be a severe data loss issue for
the migration code.
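The allocate-on-first-sync pattern can be sketched with toy types (the real code sizes the bitmap from the slot's memory_size, as the hunk in this patch shows):

```c
#include <assert.h>
#include <stdlib.h>

/* Toy sketch of the persistent per-slot dirty bitmap: allocated lazily
 * on the first sync and then reused, instead of g_malloc0()/g_free()
 * around every KVM_GET_DIRTY_LOG ioctl. */
#define TOY_PAGE_BITS 12

typedef struct {
    unsigned long long memory_size;   /* slot size in bytes */
    unsigned long *dirty_bmap;        /* NULL until first sync */
} ToySlot;

static unsigned long *toy_slot_bitmap(ToySlot *slot)
{
    if (!slot->dirty_bmap) {
        /* one bit per page, rounded up to 64-bit words, zeroed once */
        size_t pages = slot->memory_size >> TOY_PAGE_BITS;
        size_t words = (pages + 63) / 64;

        slot->dirty_bmap = calloc(words, sizeof(unsigned long));
    }
    return slot->dirty_bmap;   /* same buffer on every later sync */
}
```

Repeated calls return the same buffer, which is exactly what lets the cache remember previously-seen dirty bits.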

Signed-off-by: Peter Xu 
---
 accel/kvm/kvm-all.c  | 10 +++---
 include/sysemu/kvm_int.h |  2 ++
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index b686531586..9fb0694772 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -511,17 +511,19 @@ static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
  */
 size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS),
  /*HOST_LONG_BITS*/ 64) / 8;
-d.dirty_bitmap = g_malloc0(size);
+if (!mem->dirty_bmap) {
+/* Allocate on the first log_sync, once and for all */
+mem->dirty_bmap = g_malloc0(size);
+}
 
+d.dirty_bitmap = mem->dirty_bmap;
 d.slot = mem->slot | (kml->as_id << 16);
 if (kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d) == -1) {
 DPRINTF("ioctl failed %d\n", errno);
-g_free(d.dirty_bitmap);
 return -1;
 }
 
 kvm_get_dirty_pages_log_range(section, d.dirty_bitmap);
-g_free(d.dirty_bitmap);
 }
 
 return 0;
@@ -796,6 +798,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
 }
 
 /* unregister the slot */
+g_free(mem->dirty_bmap);
+mem->dirty_bmap = NULL;
 mem->memory_size = 0;
 mem->flags = 0;
 err = kvm_set_user_memory_region(kml, mem, false);
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index f838412491..687a2ee423 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -21,6 +21,8 @@ typedef struct KVMSlot
 int slot;
 int flags;
 int old_flags;
+/* Dirty bitmap cache for the slot */
+unsigned long *dirty_bmap;
 } KVMSlot;
 
 typedef struct KVMMemoryListener {
-- 
2.17.1




[Qemu-devel] [PATCH v4 04/11] bitmap: Add bitmap_copy_with_{src|dst}_offset()

2019-06-02 Thread Peter Xu
These helpers copy the source bitmap to destination bitmap with a
shift either on the src or dst bitmap.

Meanwhile, we have never had bitmap tests, but we should.

This patch also introduces the initial test cases for utils/bitmap.c
but it only tests the newly introduced functions.

Signed-off-by: Peter Xu 
---
 include/qemu/bitmap.h  |  9 +
 tests/Makefile.include |  2 +
 tests/test-bitmap.c| 72 +++
 util/bitmap.c  | 85 ++
 4 files changed, 168 insertions(+)
 create mode 100644 tests/test-bitmap.c

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 5c313346b9..82a1d2f41f 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -41,6 +41,10 @@
  * bitmap_find_next_zero_area(buf, len, pos, n, mask)  Find bit free area
  * bitmap_to_le(dst, src, nbits)  Convert bitmap to little endian
  * bitmap_from_le(dst, src, nbits)Convert bitmap from little endian
+ * bitmap_copy_with_src_offset(dst, src, offset, nbits)
+ **dst = *src (with an offset into src)
+ * bitmap_copy_with_dst_offset(dst, src, offset, nbits)
+ **dst = *src (with an offset into dst)
  */
 
 /*
@@ -271,4 +275,9 @@ void bitmap_to_le(unsigned long *dst, const unsigned long *src,
 void bitmap_from_le(unsigned long *dst, const unsigned long *src,
 long nbits);
 
+void bitmap_copy_with_src_offset(unsigned long *dst, const unsigned long *src,
+ unsigned long offset, unsigned long nbits);
+void bitmap_copy_with_dst_offset(unsigned long *dst, const unsigned long *src,
+ unsigned long shift, unsigned long nbits);
+
 #endif /* BITMAP_H */
diff --git a/tests/Makefile.include b/tests/Makefile.include
index 1865f6b322..5e2d7dddff 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -64,6 +64,7 @@ check-unit-y += tests/test-opts-visitor$(EXESUF)
 check-unit-$(CONFIG_BLOCK) += tests/test-coroutine$(EXESUF)
 check-unit-y += tests/test-visitor-serialization$(EXESUF)
 check-unit-y += tests/test-iov$(EXESUF)
+check-unit-y += tests/test-bitmap$(EXESUF)
 check-unit-$(CONFIG_BLOCK) += tests/test-aio$(EXESUF)
 check-unit-$(CONFIG_BLOCK) += tests/test-aio-multithread$(EXESUF)
 check-unit-$(CONFIG_BLOCK) += tests/test-throttle$(EXESUF)
tests/test-image-locking$(EXESUF): tests/test-image-locking.o $(test-block-obj-y)
 tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(test-block-obj-y)
 tests/test-iov$(EXESUF): tests/test-iov.o $(test-util-obj-y)
tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o $(test-util-obj-y) $(test-crypto-obj-y)
+tests/test-bitmap$(EXESUF): tests/test-bitmap.o $(test-util-obj-y)
 tests/test-x86-cpuid$(EXESUF): tests/test-x86-cpuid.o
tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o migration/xbzrle.o migration/page_cache.o $(test-util-obj-y)
tests/test-cutils$(EXESUF): tests/test-cutils.o util/cutils.o $(test-util-obj-y)
diff --git a/tests/test-bitmap.c b/tests/test-bitmap.c
new file mode 100644
index 00..43f7ba26c5
--- /dev/null
+++ b/tests/test-bitmap.c
@@ -0,0 +1,72 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Bitmap.c unit-tests.
+ *
+ * Copyright (C) 2019, Red Hat, Inc.
+ *
+ * Author: Peter Xu 
+ */
+
+#include 
+#include "qemu/osdep.h"
+#include "qemu/bitmap.h"
+
+#define BMAP_SIZE  1024
+
+static void check_bitmap_copy_with_offset(void)
+{
+unsigned long *bmap1, *bmap2, *bmap3, total;
+
+bmap1 = bitmap_new(BMAP_SIZE);
+bmap2 = bitmap_new(BMAP_SIZE);
+bmap3 = bitmap_new(BMAP_SIZE);
+
+bmap1[0] = random();
+bmap1[1] = random();
+bmap1[2] = random();
+bmap1[3] = random();
+total = BITS_PER_LONG * 4;
+
+/* Shift 115 bits into bmap2 */
+bitmap_copy_with_dst_offset(bmap2, bmap1, 115, total);
+/* Shift another 85 bits into bmap3 */
+bitmap_copy_with_dst_offset(bmap3, bmap2, 85, total + 115);
+/* Shift back 200 bits */
+bitmap_copy_with_src_offset(bmap2, bmap3, 200, total);
+
+g_assert_cmpmem(bmap1, total / sizeof(unsigned long),
+bmap2, total / sizeof(unsigned long));
+
+bitmap_clear(bmap1, 0, BMAP_SIZE);
+/* Set bits in bmap1 are 100-245 */
+bitmap_set(bmap1, 100, 145);
+
+/* Set bits in bmap2 are 60-205 */
+bitmap_copy_with_src_offset(bmap2, bmap1, 40, 250);
+g_assert_cmpint(find_first_bit(bmap2, 60), ==, 60);
+g_assert_cmpint(find_next_zero_bit(bmap2, 205, 60), ==, 205);
+g_assert(test_bit(205, bmap2) == 0);
+
+/* Set bits in bmap3 are 135-280 */
+bitmap_copy_with_dst_offset(bmap3, bmap1, 35, 250);
+g_assert_cmpint(find_first_bit(bmap3, 135), ==, 135);
+g_assert_cmpint(find_next_zero_bit(bmap3, 280, 135), ==, 280);
+g_assert(test_bit(280, bmap3) == 0);
+
+g_free(bmap1);
+g_free(bmap2);
+g_free(bmap3);
+}
+
+int main(int argc, char **argv)
+{
+g_test_init(
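The semantics being tested above can be sketched bit-by-bit (the real util/bitmap.c helpers work a long at a time for speed; this toy only illustrates what the copy does):

```c
#include <assert.h>
#include <limits.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static int toy_get_bit(const unsigned long *map, unsigned long i)
{
    return (map[i / TOY_BITS_PER_LONG] >> (i % TOY_BITS_PER_LONG)) & 1;
}

static void toy_assign_bit(unsigned long *map, unsigned long i, int v)
{
    unsigned long mask = 1UL << (i % TOY_BITS_PER_LONG);

    if (v) {
        map[i / TOY_BITS_PER_LONG] |= mask;
    } else {
        map[i / TOY_BITS_PER_LONG] &= ~mask;
    }
}

/* Bit-by-bit sketch of bitmap_copy_with_dst_offset(): copy nbits from
 * the start of src into dst, beginning at dst bit `shift`. */
static void toy_copy_with_dst_offset(unsigned long *dst,
                                     const unsigned long *src,
                                     unsigned long shift, unsigned long nbits)
{
    unsigned long i;

    for (i = 0; i < nbits; i++) {
        toy_assign_bit(dst, shift + i, toy_get_bit(src, i));
    }
}
```

Shifting the pattern 0b1011 by four places moves set bits 0, 1, 3 to positions 4, 5, 7.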

[Qemu-devel] [PATCH v4 07/11] kvm: Update comments for sync_dirty_bitmap

2019-06-02 Thread Peter Xu
The comment is obviously obsolete.  Update it.

Signed-off-by: Peter Xu 
---
 accel/kvm/kvm-all.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 524c4ddfbd..b686531586 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -473,13 +473,13 @@ static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
 #define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))
 
 /**
- * kvm_physical_sync_dirty_bitmap - Grab dirty bitmap from kernel space
- * This function updates qemu's dirty bitmap using
- * memory_region_set_dirty().  This means all bits are set
- * to dirty.
+ * kvm_physical_sync_dirty_bitmap - Sync dirty bitmap from kernel space
  *
- * @start_add: start of logged region.
- * @end_addr: end of logged region.
+ * This function will first try to fetch dirty bitmap from the kernel,
+ * and then updates qemu's dirty bitmap.
+ *
+ * @kml: the KVM memory listener object
+ * @section: the memory section to sync the dirty bitmap with
  */
 static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
   MemoryRegionSection *section)
-- 
2.17.1




[Qemu-devel] [PATCH v4 03/11] memory: Don't set migration bitmap when without migration

2019-06-02 Thread Peter Xu
Similar to 9460dee4b2 ("memory: do not touch code dirty bitmap unless
TCG is enabled", 2015-06-05) but for the migration bitmap - we can
skip the MIGRATION bitmap update if migration is not enabled.
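The client-mask logic from this patch can be sketched on its own (the DIRTY_MEMORY_* indices below mirror QEMU's, but the helper is illustrative):

```c
#include <assert.h>

/* Sketch of skipping the MIGRATION dirty client when global dirty
 * logging (i.e. migration) is not enabled.  Constants mirror QEMU's
 * DIRTY_MEMORY_* indices. */
#define DIRTY_MEMORY_VGA        0
#define DIRTY_MEMORY_CODE       1
#define DIRTY_MEMORY_MIGRATION  2
#define DIRTY_CLIENTS_ALL       ((1 << 3) - 1)

static int toy_effective_clients(int clients, int global_dirty_log)
{
    if (!global_dirty_log) {
        /* don't touch the migration bitmap when migration is off */
        clients &= ~(1 << DIRTY_MEMORY_MIGRATION);
    }
    return clients;
}
```

With logging off, only the VGA and CODE clients remain in the mask.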

Reviewed-by: Paolo Bonzini 
Reviewed-by: Juan Quintela 
Signed-off-by: Peter Xu 
---
 include/exec/memory.h   |  2 ++
 include/exec/ram_addr.h | 12 +++-
 memory.c|  2 +-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index e6140e8a04..f29300c54d 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -46,6 +46,8 @@
 OBJECT_GET_CLASS(IOMMUMemoryRegionClass, (obj), \
  TYPE_IOMMU_MEMORY_REGION)
 
+extern bool global_dirty_log;
+
 typedef struct MemoryRegionOps MemoryRegionOps;
 typedef struct MemoryRegionMmio MemoryRegionMmio;
 
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 6fc49e5db5..79e70a96ee 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -348,8 +348,13 @@ static inline void cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
 if (bitmap[k]) {
 unsigned long temp = leul_to_cpu(bitmap[k]);
 
-atomic_or(&blocks[DIRTY_MEMORY_MIGRATION][idx][offset], temp);
 atomic_or(&blocks[DIRTY_MEMORY_VGA][idx][offset], temp);
+
+if (global_dirty_log) {
+atomic_or(&blocks[DIRTY_MEMORY_MIGRATION][idx][offset],
+  temp);
+}
+
 if (tcg_enabled()) {
 atomic_or(&blocks[DIRTY_MEMORY_CODE][idx][offset], temp);
 }
@@ -366,6 +371,11 @@ static inline void cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
 xen_hvm_modified_memory(start, pages << TARGET_PAGE_BITS);
 } else {
uint8_t clients = tcg_enabled() ? DIRTY_CLIENTS_ALL : DIRTY_CLIENTS_NOCODE;
+
+if (!global_dirty_log) {
+clients &= ~(1 << DIRTY_MEMORY_MIGRATION);
+}
+
 /*
  * bitmap-traveling is faster than memory-traveling (for addr...)
  * especially when most of the memory is not dirty.
diff --git a/memory.c b/memory.c
index 0920c105aa..cff0ea8f40 100644
--- a/memory.c
+++ b/memory.c
@@ -38,7 +38,7 @@
 static unsigned memory_region_transaction_depth;
 static bool memory_region_update_pending;
 static bool ioeventfd_update_pending;
-static bool global_dirty_log = false;
+bool global_dirty_log;
 
 static QTAILQ_HEAD(, MemoryListener) memory_listeners
 = QTAILQ_HEAD_INITIALIZER(memory_listeners);
-- 
2.17.1




[Qemu-devel] [PATCH v4 06/11] memory: Introduce memory listener hook log_clear()

2019-06-02 Thread Peter Xu
Introduce a new memory region listener hook log_clear() to allow the
listeners to hook onto the points where the dirty bitmap is cleared by
the bitmap users.

Previously log_sync() contained two operations:

  - dirty bitmap collection, and,
  - dirty bitmap clear on the remote side.

Let's take KVM as example - log_sync() for KVM will first copy the
kernel dirty bitmap to userspace, and at the same time we'll clear the
dirty bitmap there along with re-protecting all the guest pages again.

We add this new log_clear() interface only to split the old log_sync()
into two separate procedures:

  - use log_sync() to do the collection only, and,
  - use log_clear() to clear the remote dirty bitmap.

With the new interface, the memory listener users will still be able
to decide how to implement the log synchronization procedure, e.g.,
they can still provide only the log_sync() method and keep both
procedures within log_sync() (that's how the old KVM worked before
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was introduced).  However, with this
new interface the memory listener users will have a chance to
postpone the log clear operation explicitly if the module supports it.
That can really benefit users like KVM, at least for host kernels that
support KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2.

There are three places that can clear dirty bits in any one of the
dirty bitmaps in the ram_list.dirty_memory[3] array:

cpu_physical_memory_snapshot_and_clear_dirty
cpu_physical_memory_test_and_clear_dirty
cpu_physical_memory_sync_dirty_bitmap

Currently we hook directly into each of the functions to notify about
the log_clear().
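The split can be illustrated with a toy listener (all names here are invented for the sketch): a listener that provides log_clear() gets an explicit, postponable clear, while one that leaves it NULL keeps the old combined behaviour inside log_sync().

```c
#include <assert.h>
#include <stddef.h>

/* Toy memory listener with the split hooks */
typedef struct ToyListener ToyListener;
struct ToyListener {
    void (*log_sync)(ToyListener *l);    /* collect dirty bits */
    void (*log_clear)(ToyListener *l);   /* clear remote bitmap (optional) */
    int synced;
    int cleared;
};

static void toy_sync(ToyListener *l)  { l->synced++; }
static void toy_clear(ToyListener *l) { l->cleared++; }

/* Collection no longer implies a remote clear... */
static void toy_collect(ToyListener *l)
{
    l->log_sync(l);
}

/* ...the clear is a separate, explicit step that is a no-op for
 * listeners that still do everything inside log_sync(). */
static void toy_do_clear(ToyListener *l)
{
    if (l->log_clear) {
        l->log_clear(l);
    }
}
```

The optional hook keeps backward compatibility: old-style listeners simply never see the new call.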

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 exec.c  | 12 ++
 include/exec/memory.h   | 17 ++
 include/exec/ram_addr.h |  3 +++
 memory.c| 51 +
 4 files changed, 83 insertions(+)

diff --git a/exec.c b/exec.c
index 815e4f48b9..9e13b4f200 100644
--- a/exec.c
+++ b/exec.c
@@ -1355,6 +1355,8 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
 DirtyMemoryBlocks *blocks;
 unsigned long end, page;
 bool dirty = false;
+RAMBlock *ramblock;
+uint64_t mr_offset, mr_size;
 
 if (length == 0) {
 return false;
@@ -1366,6 +1368,10 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
 rcu_read_lock();
 
 blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
+ramblock = qemu_get_ram_block(start);
+/* Range sanity check on the ramblock */
+assert(start >= ramblock->offset &&
+   start + length <= ramblock->offset + ramblock->used_length);
 
 while (page < end) {
 unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
@@ -1377,6 +1383,10 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
 page += num;
 }
 
+mr_offset = (ram_addr_t)(page << TARGET_PAGE_BITS) - ramblock->offset;
+mr_size = (end - page) << TARGET_PAGE_BITS;
+memory_region_clear_dirty_bitmap(ramblock->mr, mr_offset, mr_size);
+
 rcu_read_unlock();
 
 if (dirty && tcg_enabled()) {
@@ -1432,6 +1442,8 @@ DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty
 tlb_reset_dirty_range_all(start, length);
 }
 
+memory_region_clear_dirty_bitmap(mr, offset, length);
+
 return snap;
 }
 
diff --git a/include/exec/memory.h b/include/exec/memory.h
index f29300c54d..d752b2a758 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -416,6 +416,7 @@ struct MemoryListener {
 void (*log_stop)(MemoryListener *listener, MemoryRegionSection *section,
  int old, int new);
 void (*log_sync)(MemoryListener *listener, MemoryRegionSection *section);
+void (*log_clear)(MemoryListener *listener, MemoryRegionSection *section);
 void (*log_global_start)(MemoryListener *listener);
 void (*log_global_stop)(MemoryListener *listener);
 void (*eventfd_add)(MemoryListener *listener, MemoryRegionSection *section,
@@ -1269,6 +1270,22 @@ void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client);
 void memory_region_set_dirty(MemoryRegion *mr, hwaddr addr,
  hwaddr size);
 
+/**
+ * memory_region_clear_dirty_bitmap - clear dirty bitmap for memory range
+ *
+ * This function is called when the caller wants to clear the remote
+ * dirty bitmap of a memory range within the memory region.  This can
+ * be used by e.g. KVM to manually clear dirty log when
+ * KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is declared support by the host
+ * kernel.
+ *
+ * @mr: the memory region to clear the dirty log upon
+ * @start:  start address offset within the memory region
+ * @len:length of the memory region to clear dirty bitmap
+ */
+void memory_region_clear_dirty_bitmap(MemoryRegion *mr, hwaddr start,
+  hwaddr len);
+
 /**
  * memory_region_snapshot_and_clear_dir

[Qemu-devel] [PATCH v4 01/11] migration: No need to take rcu during sync_dirty_bitmap

2019-06-02 Thread Peter Xu
cpu_physical_memory_sync_dirty_bitmap() has one RAMBlock* as
parameter, which means that it must be with RCU read lock held
already.  Taking it again inside seems redundant.  Removing it.
Instead comment on the functions about the RCU read lock.

Reviewed-by: Paolo Bonzini 
Reviewed-by: Juan Quintela 
Signed-off-by: Peter Xu 
---
 include/exec/ram_addr.h | 5 +
 migration/ram.c | 1 +
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 139ad79390..6fc49e5db5 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -408,6 +408,7 @@ static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start,
 }
 
 
+/* Called with RCU critical section */
 static inline
 uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
ram_addr_t start,
@@ -431,8 +432,6 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
 DIRTY_MEMORY_BLOCK_SIZE);
 unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS);
 
-rcu_read_lock();
-
 src = atomic_rcu_read(
 &ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION])->blocks;
 
@@ -452,8 +451,6 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
 idx++;
 }
 }
-
-rcu_read_unlock();
 } else {
 ram_addr_t offset = rb->offset;
 
diff --git a/migration/ram.c b/migration/ram.c
index 4c60869226..dc916042fb 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1678,6 +1678,7 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
 return ret;
 }
 
+/* Called with RCU critical section */
 static void migration_bitmap_sync_range(RAMState *rs, RAMBlock *rb,
 ram_addr_t length)
 {
-- 
2.17.1




[Qemu-devel] [PATCH v4 02/11] memory: Remove memory_region_get_dirty()

2019-06-02 Thread Peter Xu
It's never used anywhere.

Reviewed-by: Paolo Bonzini 
Reviewed-by: Juan Quintela 
Signed-off-by: Peter Xu 
---
 include/exec/memory.h | 17 -
 memory.c  |  8 
 2 files changed, 25 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 9144a47f57..e6140e8a04 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1254,23 +1254,6 @@ void memory_region_ram_resize(MemoryRegion *mr, ram_addr_t newsize,
  */
 void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client);
 
-/**
- * memory_region_get_dirty: Check whether a range of bytes is dirty
- *  for a specified client.
- *
- * Checks whether a range of bytes has been written to since the last
- * call to memory_region_reset_dirty() with the same @client.  Dirty logging
- * must be enabled.
- *
- * @mr: the memory region being queried.
- * @addr: the address (relative to the start of the region) being queried.
- * @size: the size of the range being queried.
- * @client: the user of the logging information; %DIRTY_MEMORY_MIGRATION or
- *  %DIRTY_MEMORY_VGA.
- */
-bool memory_region_get_dirty(MemoryRegion *mr, hwaddr addr,
- hwaddr size, unsigned client);
-
 /**
  * memory_region_set_dirty: Mark a range of bytes as dirty in a memory region.
  *
diff --git a/memory.c b/memory.c
index 3071c4bdad..0920c105aa 100644
--- a/memory.c
+++ b/memory.c
@@ -2027,14 +2027,6 @@ void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
 memory_region_transaction_commit();
 }
 
-bool memory_region_get_dirty(MemoryRegion *mr, hwaddr addr,
- hwaddr size, unsigned client)
-{
-assert(mr->ram_block);
-return cpu_physical_memory_get_dirty(memory_region_get_ram_addr(mr) + addr,
- size, client);
-}
-
 void memory_region_set_dirty(MemoryRegion *mr, hwaddr addr,
  hwaddr size)
 {
-- 
2.17.1




[Qemu-devel] [PATCH v4 00/11] kvm/migration: support KVM_CLEAR_DIRTY_LOG

2019-06-02 Thread Peter Xu
v4:
- add r-bs for Dave & Juan
- dropped patch 1 since queued
- fixup misc places in bitmap patch [Dave]
- indent and naming fixes in "pass mr into snapshot_and_clear_dirty"
  [Dave]
- allocate kvmslot dirty_bmap on first usage [Dave]
- comment fixup in clear-log split patch [Dave]

v3:
- drop header update because another same patch already merged in
  master by cohuck
- drop qmp/hmp patches [Paolo]
- comment fixes [Paolo]
- fix misuse of kvm cap names in either strings or commit messages [Paolo]

v2:
- rebase, add r-bs from Paolo
- added a few patches
  - linux-headers: Update to Linux 5.2-rc1
this is needed because we've got a new cap in kvm
  - checkpatch: Allow SPDX-License-Identifier
picked up the standalone patch into the series in case it got lost
  - hmp: Expose manual_dirty_log_protect via "info kvm"
qmp: Expose manual_dirty_log_protect via "query-kvm"
add interface to detect the new kvm capability
- switched default chunk size to 128M

Performance update is here:

  https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03621.html

Summary
=======

This series allows QEMU to start using the new KVM_CLEAR_DIRTY_LOG
interface. For more on KVM_CLEAR_DIRTY_LOG itself, please refer to:

  
https://github.com/torvalds/linux/blob/master/Documentation/virtual/kvm/api.txt#L3810

The QEMU work (which is this series) is pushed too, please find the
tree here:

  https://github.com/xzpeter/qemu/tree/kvm-clear-dirty-log

Meanwhile, for anyone who really wants to try this out, please also
upgrade the host kernel to Linux 5.2-rc1.

Design
======

I started with a naive design that always passed all 1s to KVM for a
memory range to clear every dirty bit within that range, but then I
encountered guest oopses - simply because we can't clear any dirty bit
from QEMU unless we are _sure_ that the bit is dirty in the kernel.
Otherwise we might accidentally clear a bit that we don't even know of
(e.g., the bit was clear in migration's dirty bitmap in QEMU) while that
page was just being written, so QEMU would never remember to migrate
that new page again.

The new design is focused on a dirty bitmap cache within the QEMU kvm
layer (which is per kvm memory slot).  With that we know what was dirty
in the kernel previously (note: the kernel bitmap keeps growing all the
time, so the cache will only be a subset of the real-time kernel bitmap,
but that's good enough for us), and we can be sure not to accidentally
clear unknown dirty pages.

With this method, we can also avoid a race when multiple users (e.g.,
DIRTY_MEMORY_VGA and DIRTY_MEMORY_MIGRATION) want to clear the bit
multiple times.  Without the kvm memory slot's cached dirty bitmap we
wouldn't know which bits have already been cleared, and if we sent the
CLEAR operation on the same bit twice (or more) we could again
accidentally clear something we shouldn't.

Summary: we really need to be careful about which bits to clear,
otherwise we can face anything after the migration completes.  I hope
this series has taken all of this into account.

Besides the new KVM cache layer and the new ioctl support, this series
introduces memory_region_clear_dirty_bitmap() in the memory API
layer to allow clearing dirty bits of a specific memory range within
the memory region.

Please have a look, thanks.

Peter Xu (11):
  migration: No need to take rcu during sync_dirty_bitmap
  memory: Remove memory_region_get_dirty()
  memory: Don't set migration bitmap when without migration
  bitmap: Add bitmap_copy_with_{src|dst}_offset()
  memory: Pass mr into snapshot_and_clear_dirty
  memory: Introduce memory listener hook log_clear()
  kvm: Update comments for sync_dirty_bitmap
  kvm: Persistent per kvmslot dirty bitmap
  kvm: Introduce slots lock for memory listener
  kvm: Support KVM_CLEAR_DIRTY_LOG
  migration: Split log_clear() into smaller chunks

 accel/kvm/kvm-all.c  | 260 ---
 accel/kvm/trace-events   |   1 +
 exec.c   |  15 ++-
 include/exec/memory.h|  36 +++---
 include/exec/ram_addr.h  |  92 +-
 include/qemu/bitmap.h|   9 ++
 include/sysemu/kvm_int.h |   4 +
 memory.c |  64 --
 migration/migration.c|   4 +
 migration/migration.h|  27 
 migration/ram.c  |  45 +++
 migration/trace-events   |   1 +
 tests/Makefile.include   |   2 +
 tests/test-bitmap.c  |  72 +++
 util/bitmap.c|  85 +
 15 files changed, 663 insertions(+), 54 deletions(-)
 create mode 100644 tests/test-bitmap.c

-- 
2.17.1




Re: [Qemu-devel] [PATCH v2 2/6] tests/qapi-schema: Test for good feature lists in structs

2019-06-02 Thread Markus Armbruster
Kevin Wolf  writes:

> Am 24.05.2019 um 15:29 hat Markus Armbruster geschrieben:
>> Let's add
>> 
>>{ 'command': 'test-features',
>>  'data': { 'fs0': 'FeatureStruct0',
>>'fs1': 'FeatureStruct1',
>>'fs2': 'FeatureStruct2',
>>'fs3': 'FeatureStruct3',
>>'cfs1': 'CondFeatureStruct1',
>>'cfs2': 'CondFeatureStruct2',
>>'cfs3': 'CondFeatureStruct3' } }
>> 
>> because without it, the feature test cases won't generate introspection
>> code.
>
> Of course, like everything else you requested, I'll just do this to get
> the series off my table, but I'm still curious: Where would
> introspection code ever be generated for the test cases? I saw neither
> test code that generates the source files nor reference output that it
> would be compared against.

Asking me to explain why I want something done when you can't see it
yourself is much, much better than blindly implementing it.

Makefile.include feeds the two positive tests qapi-schema-test.json and
doc-good.json to qapi-gen.py.

The .o for the former's .c gets linked into a bunch of tests via the Make
variable $(test-qapi-obj-y).  One of them is test-qobject-input-visitor.
Its test case "/visitor/input/qapi-introspect" checks the generated
QObject conforms to the schema.

qapi-schema.json gets tested end-to-end instead: qmp-cmd-tests tests
query-qmp-schema.

Both tests only check schema conformance; they don't compare to expected
output.  Perhaps they should.  But I can still diff the generated
qmp-introspect.c manually, which I routinely do when messing with the
generator.

Makes sense?



Re: [Qemu-devel] [PATCH] migratioin/ram: leave RAMBlock->bmap blank on allocating

2019-06-02 Thread Peter Xu
On Mon, Jun 03, 2019 at 02:10:34PM +0800, Wei Yang wrote:
> On Mon, Jun 03, 2019 at 02:05:47PM +0800, Wei Yang wrote:
> >On Mon, Jun 03, 2019 at 01:40:13PM +0800, Peter Xu wrote:
> >>
> >>Ah I see, thanks for the pointer.  Then I would agree it's fine.
> >>
> >>I'm not an expert of TCG - I'm curious on why all those three dirty
> >>bitmaps need to be set at the very beginning.  IIUC at least the VGA
> >>bitmap should not require that (so IMHO we should be fine to have all
> >>zeros with VGA bitmap for ramblocks, and we only set them when the
> >>guest touches them).  Migration bitmap should be special somehow but I
> >>don't know much on TCG/TLB part I'd confess so I can't say.  In other
> >>words, if migration is the only one that requires this "all-1"
> >>initialization then IMHO we may consider to remove the other part
> >>rather than here in migration because that's what we'd better to be
> >>sure with.
> >
> >I am not sure about the background here, so I didn't make a change at this
> >place.
> >
> >>
> >>And even if you want to remove this, I still have two suggestions:
> >>
> >>(1) proper comment here above bmap on the above fact that although
> >>bmap is not set here but it's actually set somewhere else because
> >>we'll sooner or later copy all 1s from the ramblock bitmap
> >>
> >>(2) imho you can move "migration_dirty_pages = 0" into
> >>ram_list_init_bitmaps() too to let them be together
> >>
> 
> I took a look into this one.
> 
> ram_list_init_bitmaps() sets up the bitmap for each RAMBlock, while
> ram_state_init() sets up RAMState. Since migration_dirty_pages belongs to
> RAMState, it may be more proper to leave it at the original place.
> 
> Do you feel good about this?

Yes it's ok to me.  Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH] ioapic: kvm: Skip route updates for masked pins

2019-06-02 Thread Jan Kiszka

On 02.06.19 14:10, Peter Xu wrote:

On Sun, Jun 02, 2019 at 01:42:13PM +0200, Jan Kiszka wrote:

From: Jan Kiszka 

Masked entries will not generate interrupt messages, thus do not need to
be routed by KVM. This is a cosmetic cleanup, just avoiding warnings of
the kind

qemu-system-x86_64: vtd_irte_get: detected non-present IRTE (index=0, high=0xff00, low=0x100)

if the masked entry happens to reference a non-present IRTE.

Signed-off-by: Jan Kiszka 


Reviewed-by: Peter Xu 

Thanks, Jan.

The "non-cosmetic" part of clearing of those entries (e.g. including
when the entries were not setup correctly rather than masked) was
never really implemented and that task has been on my todo list for
quite a while but with a very low priority (low enough to sink...).  I
hope I didn't overlook its importance since AFAICT general OSs should
hardly trigger those paths, and so far I don't see it as a very big issue.


I triggered the message during the handover phase from Linux to Jailhouse. That 
involves reprogramming both IOAPIC and IRTEs - likely unusual sequences, I just 
didn't find invalid ones.


Thanks,
Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH] migratioin/ram: leave RAMBlock->bmap blank on allocating

2019-06-02 Thread Wei Yang
On Mon, Jun 03, 2019 at 02:05:47PM +0800, Wei Yang wrote:
>On Mon, Jun 03, 2019 at 01:40:13PM +0800, Peter Xu wrote:
>>
>>Ah I see, thanks for the pointer.  Then I would agree it's fine.
>>
>>I'm not an expert of TCG - I'm curious on why all those three dirty
>>bitmaps need to be set at the very beginning.  IIUC at least the VGA
>>bitmap should not require that (so IMHO we should be fine to have all
>>zeros with VGA bitmap for ramblocks, and we only set them when the
>>guest touches them).  Migration bitmap should be special somehow but I
>>don't know much on TCG/TLB part I'd confess so I can't say.  In other
>>words, if migration is the only one that requires this "all-1"
>>initialization then IMHO we may consider to remove the other part
>>rather than here in migration because that's what we'd better to be
>>sure with.
>
>I am not sure about the background here, so I didn't make a change at this
>place.
>
>>
>>And even if you want to remove this, I still have two suggestions:
>>
>>(1) proper comment here above bmap on the above fact that although
>>bmap is not set here but it's actually set somewhere else because
>>we'll sooner or later copy all 1s from the ramblock bitmap
>>
>>(2) imho you can move "migration_dirty_pages = 0" into
>>ram_list_init_bitmaps() too to let them be together
>>

I took a look into this one.

ram_list_init_bitmaps() sets up the bitmap for each RAMBlock, while
ram_state_init() sets up RAMState. Since migration_dirty_pages belongs to
RAMState, it may be more proper to leave it at the original place.

Do you feel good about this?

>
>I will address these two comments and send v2.
>
>Thanks.
>
>>-- 
>>Peter Xu
>
>-- 
>Wei Yang
>Help you, Help me

-- 
Wei Yang
Help you, Help me



Re: [Qemu-devel] [RFC PATCH 0/9] hw/acpi: make build_madt arch agnostic

2019-06-02 Thread Wei Yang
Igor,

Do you have some spare time to take a look the general idea?

On Mon, May 13, 2019 at 02:19:04PM +0800, Wei Yang wrote:
>Now MADT is highly dependent on architecture and machine type, which leaves
>duplicated code in different architectures. The series here tries to
>generalize it.
>
>MADT contains one main table and several sub tables. These sub tables are
>highly related to architecture. Here we introduce one method to make it
>architecture agnostic.
>
>  * each architecture defines its sub-table implementation functions in madt_sub
>  * struct madt_input is introduced to collect sub-table information and pass
>it to build_madt
>
>By doing so, each architecture could prepare its own sub-table implementation
>and madt_input, keeping build_madt architecture agnostic.
>
>Wei Yang (9):
>  hw/acpi: expand pc_madt_cpu_entry in place
>  hw/acpi: implement madt_sub[ACPI_APIC_PROCESSOR]
>  hw/acpi: implement madt_sub[ACPI_APIC_LOCAL_X2APIC]
>  hw/acpi: implement madt_sub[ACPI_APIC_IO]
>  hw/acpi: implement madt_sub[ACPI_APIC_XRUPT_OVERRIDE]
>  hw/acpi: implement madt_sub[ACPI_APIC_LOCAL_X2APIC_NMI]
>  hw/acpi: implement madt_sub[ACPI_APIC_LOCAL_NMI]
>  hw/acpi: factor build_madt with madt_input
>  hw/acpi: implement madt_main to manipulate main madt table
>
> hw/acpi/cpu.c|  14 +-
> hw/acpi/piix4.c  |   3 +-
> hw/i386/acpi-build.c | 265 +--
> hw/isa/lpc_ich9.c|   3 +-
> include/hw/acpi/acpi_dev_interface.h |  12 +-
> include/hw/i386/pc.h |   2 +
> 6 files changed, 194 insertions(+), 105 deletions(-)
>
>-- 
>2.19.1

-- 
Wei Yang
Help you, Help me



Re: [Qemu-devel] [PATCH v3 01/12] checkpatch: Allow SPDX-License-Identifier

2019-06-02 Thread Peter Xu
On Fri, May 31, 2019 at 02:56:21PM +0200, Juan Quintela wrote:
> Peter Xu  wrote:
> > According to: https://spdx.org/ids-how, let's still allow QEMU to use
> > the SPDX license identifier:
> >
> > // SPDX-License-Identifier: ***
> >
> > Signed-off-by: Peter Xu 
> 
> Reviewed-by: Juan Quintela 
> 
> Although this patch doesn't belong to the series O:-)

Right. :)  And Paolo should have queued the patch.

To make life easier, I plan to simply drop this patch in the next spin
and change the only user of "// SPDX-License-Identifier" in the series
to simply use "/* ... */", since I just noticed vast amounts of code in
QEMU already use that... then we don't have to depend on this patch.

-- 
Peter Xu



Re: [Qemu-devel] [Qemu-ppc] [PATCH v3] docs: provide documentation on the POWER9 XIVE interrupt controller

2019-06-02 Thread Cédric Le Goater
On 31/05/2019 09:03, Alexey Kardashevskiy wrote:
> 
> 
> On 21/05/2019 18:24, Cédric Le Goater wrote:
>> This documents the overall XIVE architecture and the XIVE support for
>> sPAPR guest machines (pseries).
>>
>> It also provides documentation on the 'info pic' command.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>
>>  Changes since v2:
>>
>>  - fixed typos.
>>
>>  Changes since v1:
>>
>>  - reorganized into different files and directories. I don't think the
>>'info pic' documentation belongs to 'interop' nor 'devel' and so
>>the ppc-spapr-xive.rst file seemed like the best place for it.
>>
>>  docs/index.rst|   1 +
>>  docs/specs/index.rst  |  13 +++
>>  docs/specs/ppc-spapr-xive.rst | 174 +
>>  docs/specs/ppc-xive.rst   | 199 ++
>>  MAINTAINERS   |   1 +
>>  5 files changed, 388 insertions(+)
>>  create mode 100644 docs/specs/index.rst
>>  create mode 100644 docs/specs/ppc-spapr-xive.rst
>>  create mode 100644 docs/specs/ppc-xive.rst
>>
>> diff --git a/docs/index.rst b/docs/index.rst
>> index 3690955dd1f5..baa5791c174b 100644
>> --- a/docs/index.rst
>> +++ b/docs/index.rst
>> @@ -12,4 +12,5 @@ Welcome to QEMU's documentation!
>>  
>> interop/index
>> devel/index
>> +   specs/index
>>  
>> diff --git a/docs/specs/index.rst b/docs/specs/index.rst
>> new file mode 100644
>> index ..2e927519c2e7
>> --- /dev/null
>> +++ b/docs/specs/index.rst
>> @@ -0,0 +1,13 @@
>> +.. This is the top level page for the 'specs' manual
>> +
>> +
>> +QEMU full-system emulation guest hardware specifications
>> +========================================================
>> +
>> +
>> +Contents:
>> +
>> +.. toctree::
>> +   :maxdepth: 2
>> +
>> +   xive
>> diff --git a/docs/specs/ppc-spapr-xive.rst b/docs/specs/ppc-spapr-xive.rst
>> new file mode 100644
>> index ..539ce7ca4e90
>> --- /dev/null
>> +++ b/docs/specs/ppc-spapr-xive.rst
>> @@ -0,0 +1,174 @@
>> +XIVE for sPAPR (pseries machines)
>> +=================================
>> +
>> +The POWER9 processor comes with a new interrupt controller
>> +architecture, called XIVE as "eXternal Interrupt Virtualization
>> +Engine". It supports a larger number of interrupt sources and offers
>> +virtualization features which enables the HW to deliver interrupts
>> +directly to virtual processors without hypervisor assistance.
>> +
>> +A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9
>> +processors can run under two interrupt modes:
>> +
>> +- *Legacy Compatibility Mode*
>> +
>> +  the hypervisor provides identical interfaces and similar
>> +  functionality to PAPR+ Version 2.7.  This is the default mode
>> +
>> +  It is also referred as *XICS* in QEMU.
>> +
>> +- *XIVE native exploitation mode*
>> +
>> +  the hypervisor provides new interfaces to manage the XIVE control
>> +  structures, and provides direct control for interrupt management
>> +  through MMIO pages.
>> +
>> +Which interrupt modes can be used by the machine is negotiated with
>> +the guest O/S during the Client Architecture Support negotiation
>> +sequence. The two modes are mutually exclusive.
>> +
>> +Both interrupt modes share the same IRQ number space. See below for the
>> +layout.
>> +
>> +CAS Negotiation
>> +---------------
>> +
>> +QEMU advertises the supported interrupt modes in the device tree
>> +property "ibm,arch-vec-5-platform-support" in byte 23 and the OS
>> +Selection for XIVE is indicated in the "ibm,architecture-vec-5"
>> +property byte 23.
>> +
>> +The interrupt modes supported by the machine depend on the CPU type
>> +(POWER9 is required for XIVE) but also on the machine property
>> +``ic-mode`` which can be set on the command line. It can take the
>> +following values: ``xics``, ``xive``, ``dual`` and currently ``xics``
>> +is the default but it may change in the future.
>> +
>> +The chosen interrupt mode is activated after a reconfiguration done
>> +in a machine reset.
>> +
>> +XIVE Device tree properties
>> +---------------------------
>> +
>> +The properties for the PAPR interrupt controller node when the *XIVE
>> +native exploitation mode* is selected should contain:
>> +
>> +- ``device_type``
>> +
>> +  value should be "power-ivpe".
>> +
>> +- ``compatible``
>> +
>> +  value should be "ibm,power-ivpe".
>> +
>> +- ``reg``
>> +
>> +  contains the base address and size of the thread interrupt
>> +  management areas (TIMA), for the User level and for the Guest OS
>> +  level. Only the Guest OS level is taken into account today.
>> +
>> +- ``ibm,xive-eq-sizes``
>> +
>> +  the size of the event queues. One cell per size supported, contains
>> +  log2 of size, in ascending order.
>> +
>> +- ``ibm,xive-lisn-ranges``
>> +
>> +  the IRQ interrupt number ranges assigned to the guest for the IPIs.
>> +
>> +The root node also exports :
>> +
>> +- ``ibm,plat-res-int-priorities``
>> +
>> +  contains a list of priorities that the hypervisor has reserve

Re: [Qemu-devel] [PATCH] migratioin/ram: leave RAMBlock->bmap blank on allocating

2019-06-02 Thread Wei Yang
On Mon, Jun 03, 2019 at 01:40:13PM +0800, Peter Xu wrote:
>
>Ah I see, thanks for the pointer.  Then I would agree it's fine.
>
>I'm not an expert of TCG - I'm curious on why all those three dirty
>bitmaps need to be set at the very beginning.  IIUC at least the VGA
>bitmap should not require that (so IMHO we should be fine to have all
>zeros with VGA bitmap for ramblocks, and we only set them when the
>guest touches them).  Migration bitmap should be special somehow but I
>don't know much on TCG/TLB part I'd confess so I can't say.  In other
>words, if migration is the only one that requires this "all-1"
>initialization then IMHO we may consider to remove the other part
>rather than here in migration because that's what we'd better to be
>sure with.

I am not sure about the background here, so I didn't make a change at this
place.

>
>And even if you want to remove this, I still have two suggestions:
>
>(1) proper comment here above bmap on the above fact that although
>bmap is not set here but it's actually set somewhere else because
>we'll sooner or later copy all 1s from the ramblock bitmap
>
>(2) imho you can move "migration_dirty_pages = 0" into
>ram_list_init_bitmaps() too to let them be together
>

I will address these two comments and send v2.

Thanks.

>-- 
>Peter Xu

-- 
Wei Yang
Help you, Help me



Re: [Qemu-devel] [PATCH] migratioin/ram: leave RAMBlock->bmap blank on allocating

2019-06-02 Thread Peter Xu
On Mon, Jun 03, 2019 at 11:36:00AM +0800, Wei Yang wrote:
> On Mon, Jun 03, 2019 at 10:35:27AM +0800, Peter Xu wrote:
> >On Mon, Jun 03, 2019 at 09:33:05AM +0800, Wei Yang wrote:
> >> On Sat, Jun 01, 2019 at 11:34:41AM +0800, Peter Xu wrote:
> >> >On Fri, May 31, 2019 at 05:43:37PM +0100, Dr. David Alan Gilbert wrote:
> >> >> * Wei Yang (richardw.y...@linux.intel.com) wrote:
> >> >> > During migration, we would sync bitmap from ram_list.dirty_memory to
> >> >> > RAMBlock.bmap in cpu_physical_memory_sync_dirty_bitmap().
> >> >> > 
> >> >> > Since we set RAMBlock.bmap and ram_list.dirty_memory both to all 1, 
> >> >> > this
> >> >> > means at the first round this sync is meaningless and is a duplicated
> >> >> > work.
> >> >> > 
> >> >> > Leaving RAMBlock->bmap blank on allocating would have a side effect on
> >> >> > migration_dirty_pages, since it is calculated from the result of
> >> >> > cpu_physical_memory_sync_dirty_bitmap(). To keep it right, we need to
> >> >> > set migration_dirty_pages to 0 in ram_state_init().
> >> >> > 
> >> >> > Signed-off-by: Wei Yang 
> >> >> 
> >> >> I've looked at this for a while, and I think it's OK, so
> >> >> 
> >> >> Reviewed-by: Dr. David Alan Gilbert 
> >> >> 
> >> >> Peter, Juan: Can you just see if there's any reason this would be bad,
> >> >> but I think it's actually more sensible than what we have.
> >> >
> >> >I really suspect it will work in all cases...  Wei, have you done any
> >> >test (or better, thorough tests) with this change?  My reasoning of
> >> >why we should need the bitmap all set is here:
> >> >
> >> 
> >> I have done some migration cases, like migrate a linux guest through tcp.
> >
> >When did you start the migration?  Have you tried to migrate during
> >some workload?
> >
> 
> I tried kernel build in guest.
> 
> >> 
> >> Other cases suggested to do?
> >
> >Could you also help answer the question I raised below in the link?
> >
> >Thanks,
> >
> >> >https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg07361.html
> >
> 
> I took a look into this link, hope my understanding is correct.
> 
> Your concern is that this thread/patch is based on one prerequisite -- dirty all the
> bitmap at start.
> 
> My answer is we already did it in ram_block_add() for each RAMBlock. And then
> the bitmap is synced by migration_bitmap_sync_precopy() from
> ram_list.dirty_memory to RAMBlock.bmap.

Ah I see, thanks for the pointer.  Then I would agree it's fine.

I'm not an expert of TCG - I'm curious on why all those three dirty
bitmaps need to be set at the very beginning.  IIUC at least the VGA
bitmap should not require that (so IMHO we should be fine to have all
zeros with VGA bitmap for ramblocks, and we only set them when the
guest touches them).  Migration bitmap should be special somehow but I
don't know much on TCG/TLB part I'd confess so I can't say.  In other
words, if migration is the only one that requires this "all-1"
initialization then IMHO we may consider to remove the other part
rather than here in migration because that's what we'd better to be
sure with.

And even if you want to remove this, I still have two suggestions:

(1) proper comment here above bmap on the above fact that although
bmap is not set here but it's actually set somewhere else because
we'll sooner or later copy all 1s from the ramblock bitmap

(2) imho you can move "migration_dirty_pages = 0" into
ram_list_init_bitmaps() too to let them be together

-- 
Peter Xu



Re: [Qemu-devel] [PATCH] migratioin/ram: leave RAMBlock->bmap blank on allocating

2019-06-02 Thread Wei Yang
On Mon, Jun 03, 2019 at 10:35:27AM +0800, Peter Xu wrote:
>On Mon, Jun 03, 2019 at 09:33:05AM +0800, Wei Yang wrote:
>> On Sat, Jun 01, 2019 at 11:34:41AM +0800, Peter Xu wrote:
>> >On Fri, May 31, 2019 at 05:43:37PM +0100, Dr. David Alan Gilbert wrote:
>> >> * Wei Yang (richardw.y...@linux.intel.com) wrote:
>> >> > During migration, we would sync bitmap from ram_list.dirty_memory to
>> >> > RAMBlock.bmap in cpu_physical_memory_sync_dirty_bitmap().
>> >> > 
>> >> > Since we set RAMBlock.bmap and ram_list.dirty_memory both to all 1, this
>> >> > means at the first round this sync is meaningless and is a duplicated
>> >> > work.
>> >> > 
>> >> > Leaving RAMBlock->bmap blank on allocating would have a side effect on
>> >> > migration_dirty_pages, since it is calculated from the result of
>> >> > cpu_physical_memory_sync_dirty_bitmap(). To keep it right, we need to
>> >> > set migration_dirty_pages to 0 in ram_state_init().
>> >> > 
>> >> > Signed-off-by: Wei Yang 
>> >> 
>> >> I've looked at this for a while, and I think it's OK, so
>> >> 
>> >> Reviewed-by: Dr. David Alan Gilbert 
>> >> 
> >> Peter, Juan: Can you just see if there's any reason this would be bad,
>> >> but I think it's actually more sensible than what we have.
>> >
>> >I really suspect it will work in all cases...  Wei, have you done any
>> >test (or better, thorough tests) with this change?  My reasoning of
>> >why we should need the bitmap all set is here:
>> >
>> 
>> I have done some migration cases, like migrate a linux guest through tcp.
>
>When did you start the migration?  Have you tried to migrate during
>some workload?
>

I tried kernel build in guest.

>> 
>> Other cases suggested to do?
>
>Could you also help answer the question I raised below in the link?
>
>Thanks,
>
>> >https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg07361.html
>

I took a look into this link, hope my understanding is correct.

Your concern is that this thread/patch is based on one prerequisite -- dirty all the
bitmap at start.

My answer is we already did it in ram_block_add() for each RAMBlock. And then
the bitmap is synced by migration_bitmap_sync_precopy() from
ram_list.dirty_memory to RAMBlock.bmap.


>-- 
>Peter Xu

-- 
Wei Yang
Help you, Help me



Re: [Qemu-devel] [PATCH] migratioin/ram: leave RAMBlock->bmap blank on allocating

2019-06-02 Thread Peter Xu
On Mon, Jun 03, 2019 at 09:33:05AM +0800, Wei Yang wrote:
> On Sat, Jun 01, 2019 at 11:34:41AM +0800, Peter Xu wrote:
> >On Fri, May 31, 2019 at 05:43:37PM +0100, Dr. David Alan Gilbert wrote:
> >> * Wei Yang (richardw.y...@linux.intel.com) wrote:
> >> > During migration, we would sync bitmap from ram_list.dirty_memory to
> >> > RAMBlock.bmap in cpu_physical_memory_sync_dirty_bitmap().
> >> > 
> >> > Since we set RAMBlock.bmap and ram_list.dirty_memory both to all 1, this
> >> > means at the first round this sync is meaningless and is a duplicated
> >> > work.
> >> > 
> >> > Leaving RAMBlock->bmap blank on allocating would have a side effect on
> >> > migration_dirty_pages, since it is calculated from the result of
> >> > cpu_physical_memory_sync_dirty_bitmap(). To keep it right, we need to
> >> > set migration_dirty_pages to 0 in ram_state_init().
> >> > 
> >> > Signed-off-by: Wei Yang 
> >> 
> >> I've looked at this for a while, and I think it's OK, so
> >> 
> >> Reviewed-by: Dr. David Alan Gilbert 
> >> 
> >> Peter, Juan: Can you just see if there's any reason this would be bad,
> >> but I think it's actually more sensible than what we have.
> >
> >I really suspect it will work in all cases...  Wei, have you done any
> >test (or better, thorough tests) with this change?  My reasoning of
> >why we should need the bitmap all set is here:
> >
> 
> I have done some migration cases, like migrate a linux guest through tcp.

When did you start the migration?  Have you tried to migrate during
some workload?

> 
> Other cases suggested to do?

Could you also help answer the question I raised below in the link?

Thanks,

> >https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg07361.html

-- 
Peter Xu



Re: [Qemu-devel] [PATCH 4/6] COLO-compare: Add colo-compare remote notify support

2019-06-02 Thread Li Zhijian

how about doing the switch inside colo_compare_inconsistency_notify()? Like:

colo_compare_inconsistency_notify(CompareState *s)
{
    if (s->remote_notify) {
        remote_notify(s);
    } else {
        local_notify(s);
    }
}

Thanks
Zhijian

On 6/2/19 11:42 AM, Zhang Chen wrote:

From: Zhang Chen 

This patch makes colo-compare able to send a message to the remote COLO
frame (Xen) when a checkpoint occurs.

Signed-off-by: Zhang Chen 
---
  net/colo-compare.c | 51 +-
  1 file changed, 46 insertions(+), 5 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 16285f4a96..19075c7a66 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -251,6 +251,17 @@ static void colo_release_primary_pkt(CompareState *s, Packet *pkt)
  packet_destroy(pkt, NULL);
  }
  
+static void notify_remote_frame(CompareState *s)

+{
+char msg[] = "DO_CHECKPOINT";
+int ret = 0;
+
+ret = compare_chr_send(s, (uint8_t *)msg, strlen(msg), 0, true);
+if (ret < 0) {
+error_report("Notify Xen COLO-frame failed");
+}
+}
+
  /*
   * The IP packets sent by primary and secondary
   * will be compared in here
@@ -435,7 +446,11 @@ sec:
  qemu_hexdump((char *)spkt->data, stderr,
   "colo-compare spkt", spkt->size);
  
-colo_compare_inconsistency_notify();

+if (s->notify_dev) {
+notify_remote_frame(s);
+} else {
+colo_compare_inconsistency_notify();
+}
  }
  }
  
@@ -577,7 +592,7 @@ void colo_compare_unregister_notifier(Notifier *notify)

  }
  
  static int colo_old_packet_check_one_conn(Connection *conn,

-   void *user_data)
+  CompareState *s)
  {
  GList *result = NULL;
  int64_t check_time = REGULAR_PACKET_CHECK_MS;
@@ -588,7 +603,11 @@ static int colo_old_packet_check_one_conn(Connection *conn,
  
  if (result) {

  /* Do checkpoint will flush old packet */
-colo_compare_inconsistency_notify();
+if (s->notify_dev) {
+notify_remote_frame(s);
+} else {
+colo_compare_inconsistency_notify();
+}
  return 0;
  }
  
@@ -608,7 +627,7 @@ static void colo_old_packet_check(void *opaque)

   * If we find one old packet, stop finding job and notify
   * COLO frame do checkpoint.
   */
-g_queue_find_custom(&s->conn_list, NULL,
+g_queue_find_custom(&s->conn_list, s,
  (GCompareFunc)colo_old_packet_check_one_conn);
  }
  
@@ -637,7 +656,12 @@ static void colo_compare_packet(CompareState *s, Connection *conn,

   */
  trace_colo_compare_main("packet different");
  g_queue_push_head(&conn->primary_list, pkt);
-colo_compare_inconsistency_notify();
+
+if (s->notify_dev) {
+notify_remote_frame(s);
+} else {
+colo_compare_inconsistency_notify();
+}
  break;
  }
  }
@@ -989,7 +1013,24 @@ static void compare_sec_rs_finalize(SocketReadState *sec_rs)
  
  static void compare_notify_rs_finalize(SocketReadState *notify_rs)

  {
+CompareState *s = container_of(notify_rs, CompareState, notify_rs);
+
  /* Get Xen colo-frame's notify and handle the message */
+char *data = g_memdup(notify_rs->buf, notify_rs->packet_len);
+char msg[] = "COLO_COMPARE_GET_XEN_INIT";
+int ret;
+
+if (!strcmp(data, "COLO_USERSPACE_PROXY_INIT")) {
+ret = compare_chr_send(s, (uint8_t *)msg, strlen(msg), 0, true);
+if (ret < 0) {
+error_report("Notify Xen COLO-frame INIT failed");
+}
+}
+
+if (!strcmp(data, "COLO_CHECKPOINT")) {
+/* colo-compare do checkpoint, flush pri packet and remove sec packet */
+g_queue_foreach(&s->conn_list, colo_flush_packets, s);
+}
  }
  
  /*





Re: [Qemu-devel] [PATCH] migratioin/ram: leave RAMBlock->bmap blank on allocating

2019-06-02 Thread Wei Yang
On Sat, Jun 01, 2019 at 11:34:41AM +0800, Peter Xu wrote:
>On Fri, May 31, 2019 at 05:43:37PM +0100, Dr. David Alan Gilbert wrote:
>> * Wei Yang (richardw.y...@linux.intel.com) wrote:
>> > During migration, we would sync bitmap from ram_list.dirty_memory to
>> > RAMBlock.bmap in cpu_physical_memory_sync_dirty_bitmap().
>> > 
>> > Since we set RAMBlock.bmap and ram_list.dirty_memory both to all 1, this
>> > means at the first round this sync is meaningless and is a duplicated
>> > work.
>> > 
>> > Leaving RAMBlock->bmap blank on allocating would have a side effect on
>> > migration_dirty_pages, since it is calculated from the result of
>> > cpu_physical_memory_sync_dirty_bitmap(). To keep it right, we need to
>> > set migration_dirty_pages to 0 in ram_state_init().
>> > 
>> > Signed-off-by: Wei Yang 
>> 
>> I've looked at this for a while, and I think it's OK, so
>> 
>> Reviewed-by: Dr. David Alan Gilbert 
>> 
>> Peter, Juan: Can you just see if there's any reason this would be bad,
>> but I think it's actually more sensible than what we have.
>
>I really suspect it will work in all cases...  Wei, have you done any
>test (or better, thorough tests) with this change?  My reasoning of
>why we should need the bitmap all set is here:
>

I have done some migration cases, like migrate a linux guest through tcp.

Other cases suggested to do?

>https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg07361.html
>
>Regards,
>
>-- 
>Peter Xu

-- 
Wei Yang
Help you, Help me



Re: [Qemu-devel] [PATCH v2] target/ppc: Fix lxvw4x, lxvh8x and lxvb16x

2019-06-02 Thread David Gibson
On Sun, Jun 02, 2019 at 01:13:44PM +0100, Mark Cave-Ayland wrote:
> On 28/05/2019 02:09, David Gibson wrote:
> 
> > On Fri, May 24, 2019 at 07:53:45AM +0100, Mark Cave-Ayland wrote:
> >> From: Anton Blanchard 
> >>
> >> During the conversion these instructions were incorrectly treated as
> >> stores. We need to use set_cpu_vsr* and not get_cpu_vsr*.
> >>
> >> Fixes: 8b3b2d75c7c0 ("introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}() 
> >> helpers for VSR register access")
> >> Signed-off-by: Anton Blanchard 
> >> Reviewed-by: Mark Cave-Ayland 
> >> Tested-by: Greg Kurz 
> >> Reviewed-by: Greg Kurz 
> > 
> > Applied, thanks.
> 
> I'm in the process of preparing a VSX fixes branch to send over to
> qemu-stable@ so that Anton's patches make the next 4.0 stable release,
> however I can't find this patch in your ppc-for-4.1 branch? Did it get
> missed somehow?

Oops.  I think I must have botched a rebase and removed it
accidentally.  I've re-applied it.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v2 0/3] vhost-scsi: Support migration

2019-06-02 Thread Michael S. Tsirkin
On Mon, Jun 03, 2019 at 02:40:04AM +0300, Liran Alon wrote:
> Any news on when this patch-series is expected to be merged to upstream QEMU?
> It was accepted 2 months ago.
> 
> Thanks,
> -Liran 
> 
> > On 25 Apr 2019, at 20:53, Michael S. Tsirkin  wrote:
> > 
> > On Thu, Apr 25, 2019 at 09:38:19AM +0100, Stefan Hajnoczi wrote:
> >> On Wed, Apr 24, 2019 at 07:38:57PM +0300, Liran Alon wrote:
> >>> 
> >>> 
>  On 18 Apr 2019, at 12:41, Stefan Hajnoczi  wrote:
>  
>  On Tue, Apr 16, 2019 at 03:59:09PM +0300, Liran Alon wrote:
> > Hi,
> > 
> > This patch series aims to add support for migrating a VM with a vhost-scsi
> > device.
> > 
> > The 1st patch fixes a bug of mistakenly not stopping the vhost-scsi backend
> > when a VM is stopped (as happens on migration pre-copy completion).
> > 
> > The 2nd patch adds the ability to save/load vhost-scsi device state in
> > VMState.
> > 
> > The 3rd and final patch adds a flag to vhost-scsi which allows an admin
> > to specify that its setup supports vhost-scsi migration. For more detailed
> > information on why this is valid, see the commit message of the specific patch.
> > 
> > Regards,
> > -Liran
>  
>  Looks fine for vhost_scsi.ko.  I have not checked how this interacts
>  with vhost-user-scsi.
>  
>  Reviewed-by: Stefan Hajnoczi 
> >>> 
> >>> Gentle Ping.
> >> 
> >> This should go either through Michael's virtio/vhost tree or Paolo's
> >> SCSI tree.
> >> 
> >> Stefan
> > 
> > OK I'll queue it.

Sorry, I dropped it by mistake after queueing and was not cc'd,
so I forgot to reapply. Queued now.

-- 
MST



Re: [Qemu-devel] [PATCH] ioapic: kvm: Skip route updates for masked pins

2019-06-02 Thread Michael S. Tsirkin
On Sun, Jun 02, 2019 at 01:42:13PM +0200, Jan Kiszka wrote:
> From: Jan Kiszka 
> 
> Masked entries will not generate interrupt messages, thus do not need to
> be routed by KVM. This is a cosmetic cleanup, just avoiding warnings of
> the kind
> 
> qemu-system-x86_64: vtd_irte_get: detected non-present IRTE (index=0, 
> high=0xff00, low=0x100)
> 
> if the masked entry happens to reference a non-present IRTE.
> 
> Signed-off-by: Jan Kiszka 

Reviewed-by: Michael S. Tsirkin 

> ---
>  hw/intc/ioapic.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> index 7074489fdf..2fb288a22d 100644
> --- a/hw/intc/ioapic.c
> +++ b/hw/intc/ioapic.c
> @@ -197,9 +197,11 @@ static void ioapic_update_kvm_routes(IOAPICCommonState 
> *s)
>  MSIMessage msg;
>  struct ioapic_entry_info info;
>  ioapic_entry_parse(s->ioredtbl[i], &info);
> -msg.address = info.addr;
> -msg.data = info.data;
> -kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
> +if (!info.masked) {
> +msg.address = info.addr;
> +msg.data = info.data;
> +kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
> +}
>  }
>  kvm_irqchip_commit_routes(kvm_state);
>  }
> -- 
> 2.16.4



Re: [Qemu-devel] [PATCH v2 0/3] vhost-scsi: Support migration

2019-06-02 Thread Liran Alon
Any news on when this patch-series is expected to be merged to upstream QEMU?
It was accepted 2 months ago.

Thanks,
-Liran 

> On 25 Apr 2019, at 20:53, Michael S. Tsirkin  wrote:
> 
> On Thu, Apr 25, 2019 at 09:38:19AM +0100, Stefan Hajnoczi wrote:
>> On Wed, Apr 24, 2019 at 07:38:57PM +0300, Liran Alon wrote:
>>> 
>>> 
 On 18 Apr 2019, at 12:41, Stefan Hajnoczi  wrote:
 
 On Tue, Apr 16, 2019 at 03:59:09PM +0300, Liran Alon wrote:
> Hi,
> 
> This patch series aims to add support for migrating a VM with a vhost-scsi
> device.
> 
> The 1st patch fixes a bug of mistakenly not stopping the vhost-scsi backend
> when a VM is stopped (as happens on migration pre-copy completion).
> 
> The 2nd patch adds the ability to save/load vhost-scsi device state in
> VMState.
> 
> The 3rd and final patch adds a flag to vhost-scsi which allows an admin to
> specify that its setup supports vhost-scsi migration. For more detailed
> information on why this is valid, see the commit message of the specific patch.
> 
> Regards,
> -Liran
 
 Looks fine for vhost_scsi.ko.  I have not checked how this interacts
 with vhost-user-scsi.
 
 Reviewed-by: Stefan Hajnoczi 
>>> 
>>> Gentle Ping.
>> 
>> This should go either through Michael's virtio/vhost tree or Paolo's
>> SCSI tree.
>> 
>> Stefan
> 
> OK I'll queue it.




Re: [Qemu-devel] "accel/tcg: demacro cputlb" break qemu-system-x86_64

2019-06-02 Thread Andrew Randrianasulu
> Could you run:

>  make check-tcg

> And report which tests (if any) fail?

Unfortunately, test was SKIPped:

make check-tcg
make[1]: Entering directory `/dev/shm/qemu/slirp'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/dev/shm/qemu/slirp'
CHK version_gen.h
  BUILD   TCG tests for x86_64-softmmu
  BUILD   x86_64 guest-tests SKIPPED
  RUN TCG tests for x86_64-softmmu
  RUN tests for x86_64 SKIPPED



[Qemu-devel] [PATCH] block/linux-aio: Drop unused BlockAIOCB submission method

2019-06-02 Thread Julia Suvorova via Qemu-devel
Callback-based laio_submit() and laio_cancel() were left in place after
rewriting the Linux AIO backend to coroutines, in the hope that they would
be used by other code that could bypass coroutines. They can be safely
removed because they have not been used since that time.

Signed-off-by: Julia Suvorova 
---
 block/linux-aio.c   | 72 ++---
 include/block/raw-aio.h |  3 --
 2 files changed, 10 insertions(+), 65 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index d4b61fb251..27100c2fd1 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -30,7 +30,6 @@
 #define MAX_EVENTS 128
 
 struct qemu_laiocb {
-BlockAIOCB common;
 Coroutine *co;
 LinuxAioState *ctx;
 struct iocb iocb;
@@ -72,7 +71,7 @@ static inline ssize_t io_event_ret(struct io_event *ev)
 }
 
 /*
- * Completes an AIO request (calls the callback and frees the ACB).
+ * Completes an AIO request.
  */
 static void qemu_laio_process_completion(struct qemu_laiocb *laiocb)
 {
@@ -94,18 +93,15 @@ static void qemu_laio_process_completion(struct qemu_laiocb 
*laiocb)
 }
 
 laiocb->ret = ret;
-if (laiocb->co) {
-/* If the coroutine is already entered it must be in ioq_submit() and
- * will notice laio->ret has been filled in when it eventually runs
- * later.  Coroutines cannot be entered recursively so avoid doing
- * that!
- */
-if (!qemu_coroutine_entered(laiocb->co)) {
-aio_co_wake(laiocb->co);
-}
-} else {
-laiocb->common.cb(laiocb->common.opaque, ret);
-qemu_aio_unref(laiocb);
+
+/*
+ * If the coroutine is already entered it must be in ioq_submit() and
+ * will notice laio->ret has been filled in when it eventually runs
+ * later.  Coroutines cannot be entered recursively so avoid doing
+ * that!
+ */
+if (!qemu_coroutine_entered(laiocb->co)) {
+aio_co_wake(laiocb->co);
 }
 }
 
@@ -273,30 +269,6 @@ static bool qemu_laio_poll_cb(void *opaque)
 return true;
 }
 
-static void laio_cancel(BlockAIOCB *blockacb)
-{
-struct qemu_laiocb *laiocb = (struct qemu_laiocb *)blockacb;
-struct io_event event;
-int ret;
-
-if (laiocb->ret != -EINPROGRESS) {
-return;
-}
-ret = io_cancel(laiocb->ctx->ctx, &laiocb->iocb, &event);
-laiocb->ret = -ECANCELED;
-if (ret != 0) {
-/* iocb is not cancelled, cb will be called by the event loop later */
-return;
-}
-
-laiocb->common.cb(laiocb->common.opaque, laiocb->ret);
-}
-
-static const AIOCBInfo laio_aiocb_info = {
-.aiocb_size = sizeof(struct qemu_laiocb),
-.cancel_async   = laio_cancel,
-};
-
 static void ioq_init(LaioQueue *io_q)
 {
 QSIMPLEQ_INIT(&io_q->pending);
@@ -431,30 +403,6 @@ int coroutine_fn laio_co_submit(BlockDriverState *bs, 
LinuxAioState *s, int fd,
 return laiocb.ret;
 }
 
-BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
-int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
-BlockCompletionFunc *cb, void *opaque, int type)
-{
-struct qemu_laiocb *laiocb;
-off_t offset = sector_num * BDRV_SECTOR_SIZE;
-int ret;
-
-laiocb = qemu_aio_get(&laio_aiocb_info, bs, cb, opaque);
-laiocb->nbytes = nb_sectors * BDRV_SECTOR_SIZE;
-laiocb->ctx = s;
-laiocb->ret = -EINPROGRESS;
-laiocb->is_read = (type == QEMU_AIO_READ);
-laiocb->qiov = qiov;
-
-ret = laio_do_submit(fd, laiocb, offset, type);
-if (ret < 0) {
-qemu_aio_unref(laiocb);
-return NULL;
-}
-
-return &laiocb->common;
-}
-
 void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context)
 {
 aio_set_event_notifier(old_context, &s->e, false, NULL, NULL);
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index ba223dd1f1..0cb7cc74a2 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -50,9 +50,6 @@ LinuxAioState *laio_init(Error **errp);
 void laio_cleanup(LinuxAioState *s);
 int coroutine_fn laio_co_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
 uint64_t offset, QEMUIOVector *qiov, int type);
-BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
-int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
-BlockCompletionFunc *cb, void *opaque, int type);
 void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context);
 void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context);
 void laio_io_plug(BlockDriverState *bs, LinuxAioState *s);
-- 
2.17.1




Re: [Qemu-devel] "accel/tcg: demacro cputlb" break qemu-system-x86_64

2019-06-02 Thread Alex Bennée


Andrew Randrianasulu  writes:

> Hello!
>
> I was compiling the latest qemu git, and was surprised to find that 
> qemu-system-x86_64
> (compiled for 32-bit x86 machine) can't boot any 64-bit kernel anymore.
>
> 32-bit kernels and kvm were fine.
> So, I run git bisect
>
> ./configure --target-list=x86_64-softmmu --disable-werror
>
> make -j 5
>
> x86_64-softmmu/qemu-system-x86_64 -kernel /boot/bzImage-4.12.0-x64
> -accel tcg

Could you run:

  make check-tcg

And report which tests (if any) fail?


>
> git bisect log
> git bisect start
> # bad: [60905286cb5150de854e08279bca7dfc4b549e91] Merge remote-tracking 
> branch 'remotes/dgibson/tags/ppc-for-4.1-20190529' into staging
> git bisect bad 60905286cb5150de854e08279bca7dfc4b549e91
> # good: [32a1a94dd324d33578dca1dc96d7896a0244d768] Update version for v3.1.0 
> release
> git bisect good 32a1a94dd324d33578dca1dc96d7896a0244d768
> # good: [32a1a94dd324d33578dca1dc96d7896a0244d768] Update version for v3.1.0 
> release
> git bisect good 32a1a94dd324d33578dca1dc96d7896a0244d768
> # good: [9403bccfe3e271f04e12c8c64d306e0cff589009] Merge remote-tracking 
> branch 'remotes/pmaydell/tags/pull-target-arm-20190228-1' into staging
> git bisect good 9403bccfe3e271f04e12c8c64d306e0cff589009
> # good: [9403bccfe3e271f04e12c8c64d306e0cff589009] Merge remote-tracking 
> branch 'remotes/pmaydell/tags/pull-target-arm-20190228-1' into staging
> git bisect good 9403bccfe3e271f04e12c8c64d306e0cff589009
> # good: [a39286dd61e455014694cb6aa44cfa9e5c86d101] nbd: Tolerate some server 
> non-compliance in NBD_CMD_BLOCK_STATUS
> git bisect good a39286dd61e455014694cb6aa44cfa9e5c86d101
> # bad: [bab1671f0fa928fd678a22f934739f06fd5fd035] tcg: Manually expand 
> INDEX_op_dup_vec
> git bisect bad bab1671f0fa928fd678a22f934739f06fd5fd035
> # bad: [bab1671f0fa928fd678a22f934739f06fd5fd035] tcg: Manually expand 
> INDEX_op_dup_vec
> git bisect bad bab1671f0fa928fd678a22f934739f06fd5fd035
> # good: [956fe143b4f254356496a0a1c479fa632376dfec] target/arm: Implement 
> VLLDM for v7M CPUs with an FPU
> git bisect good 956fe143b4f254356496a0a1c479fa632376dfec
> # good: [df06df4f412a67341de0fbb075e74c4dde3c8972] Merge remote-tracking 
> branch 'remotes/ericb/tags/pull-nbd-2019-05-07' into staging
> git bisect good df06df4f412a67341de0fbb075e74c4dde3c8972
> # good: [e5a0a6784a63a15d5b1221326fe5c258be6b5561] vvfat: Replace 
> bdrv_{read,write}() with bdrv_{pread,pwrite}()
> git bisect good e5a0a6784a63a15d5b1221326fe5c258be6b5561
> # bad: [01807c8b0e9f5da6981c2e62a3c1d8f661fb178e] Merge remote-tracking 
> branch 'remotes/armbru/tags/pull-misc-2019-05-13' into staging
> git bisect bad 01807c8b0e9f5da6981c2e62a3c1d8f661fb178e
> # bad: [04d6556c5c91d6b00c70df7b85e1715a7c7870df] Merge remote-tracking 
> branch 'remotes/stsquad/tags/pull-demacro-softmmu-100519-1' into staging
> git bisect bad 04d6556c5c91d6b00c70df7b85e1715a7c7870df
> # good: [c9ba36ff2f56a95dec0ee47f4dab0b22a0a01f86] Merge remote-tracking 
> branch 'remotes/kevin/tags/for-upstream' into staging
> git bisect good c9ba36ff2f56a95dec0ee47f4dab0b22a0a01f86
> # bad: [fc1bc777910dc14a3db4e2ad66f3e536effc297d] cputlb: Drop attribute 
> flatten
> git bisect bad fc1bc777910dc14a3db4e2ad66f3e536effc297d
> # bad: [f1be36969de2fb9b6b64397db1098f115210fcd9] cputlb: Move TLB_RECHECK 
> handling into load/store_helper
> git bisect bad f1be36969de2fb9b6b64397db1098f115210fcd9
> # bad: [eed5664238ea5317689cf32426d9318686b2b75c] accel/tcg: demacro cputlb
> git bisect bad eed5664238ea5317689cf32426d9318686b2b75c
> # first bad commit: [eed5664238ea5317689cf32426d9318686b2b75c] accel/tcg: 
> demacro cputlb
>
> Not sure how many people test qemu-system-x86_64 on 32-bit x86 hosts
>
>  gcc --version
> gcc (GCC) 5.5.0
>
> commit log says
>
> -
> accel/tcg: demacro cputlb
>
> Instead of expanding a series of macros to generate the load/store
> helpers we move stuff into common functions and rely on the compiler
> to eliminate the dead code for each variant.
> --
>
> Maybe gcc 5.5.0 was not really good in this case ...


--
Alex Bennée



[Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG

2019-06-02 Thread Alex Bennée
** Tags added: testcase

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1830872

Title:
  AARCH64 to ARMv7 mistranslation in TCG

Status in QEMU:
  New

Bug description:
  The following guest code:

  
https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S

  implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI
  Development Kit II) library function. (CopyMem() basically has memmove()
  semantics, to provide a standard C analog here.) The relevant functions
  are InternalMemCopyMem() and __memcpy().

  When TCG translates this aarch64 code to x86_64, everything works
  fine.

  When TCG translates this aarch64 code to ARMv7, the destination area of
  the translated CopyMem() function becomes corrupted -- it differs from
  the intended source contents. Namely, in every 4096 byte block, the
  8-byte word at offset 4032 (0xFC0) is zeroed out in the destination,
  instead of receiving the intended source value.

  I'm attaching two hexdumps of the same destination area:

  - "good.txt" is a hexdump of the destination area when CopyMem() was
translated to x86_64,

  - "bad.txt" is a hexdump of the destination area when CopyMem() was
translated to ARMv7.

  In order to assist with the analysis of this issue, I disassembled the
  aarch64 binary with "objdump". Please find the listing in
  "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at
  hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180.

  And, I ran the guest on the ARMv7 host with "-d
  in_asm,op,op_opt,op_ind,out_asm". Please find the log in
  "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached.

  The TBs that correspond to (parts of) the InternalMemCopyMem() and
  __memcpy() functions are scattered over the TCG log file, but the offset
  between the "nice" disassembly from "DxeCore.objdump", and the in-RAM
  TBs in the TCG log, can be determined from the fact that there is a
  single prfm instruction in the entire binary. The instruction's offset
  is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy()
  function --, and its RAM address is 0x472d2180 in the TCG log. Thus the
  difference (= the load address of DxeCore.efi) is 0x472a7000.

  QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch
  'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21).

  The reproducer command line is (on an ARMv7 host):

qemu-system-aarch64 \
  -display none \
  -machine virt,accel=tcg \
  -nodefaults \
  -nographic \
  -drive 
if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \
  -drive 
if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \
  -cpu cortex-a57 \
  -chardev stdio,signal=off,mux=on,id=char0 \
  -mon chardev=char0,mode=readline \
  -serial chardev:char0

  The apparent symptom is an assertion failure *in the guest*, such as

  > ASSERT [DxeCore]
  > 
/home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090):
  > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength

  but that is only a (distant) consequence of the CopyMem()
  mistranslation, and resultant destination area corruption.

  Originally reported in the following two mailing list messages:
  - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com
  - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions



Re: [Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG

2019-06-02 Thread Alex Bennée


Laszlo Ersek (Red Hat)  writes:

> Possibly related:
> [Qemu-devel] "accel/tcg: demacro cputlb" break qemu-system-x86_64
> https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg07362.html
>
> (qemu-system-x86_64 fails to boot 64-bit kernel under TCG accel when
> QEMU is built for i686)
>
> Note to self: try to reproduce the present issue with QEMU built at
> eed5664238ea^ -- this LP has originally been filed about the tree at
> a4f667b67149, and that commit contains eed5664238ea. So checking at
> eed5664238ea^ might reveal a difference.

Oops. Looks like tests/tcg/multiarch/system/memory.c didn't cover enough
cases.

--
Alex Bennée



Re: [Qemu-devel] [PATCH 1/2] target/mips: Improve performance for MSA binary operations

2019-06-02 Thread Alex Bennée


Mateja Marjanovic  writes:

> From: Mateja Marjanovic 
>
> Eliminate loops for better performance.

Have you done any measurements of the loop unrolling below? This is
something the compiler may well achieve on its own, so we could let it
make the choice.

>
> Signed-off-by: Mateja Marjanovic 
> ---
>  target/mips/msa_helper.c | 43 ++-
>  1 file changed, 30 insertions(+), 13 deletions(-)
>
> diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
> index 4c7ec05..1152fda 100644
> --- a/target/mips/msa_helper.c
> +++ b/target/mips/msa_helper.c
> @@ -804,28 +804,45 @@ void helper_msa_ ## func ## _df(CPUMIPSState *env, 
> uint32_t df, \
>  wr_t *pwd = &(env->active_fpu.fpr[wd].wr);  \
>  wr_t *pws = &(env->active_fpu.fpr[ws].wr);  \
>  wr_t *pwt = &(env->active_fpu.fpr[wt].wr);
> \

If we can ensure alignment for the various vector registers then the
compiler always has the option of using host vectors (certainly for int
and logic operations).

> -uint32_t i; \
>  \

>  switch (df) {   \
>  case DF_BYTE:   \
> -for (i = 0; i < DF_ELEMENTS(DF_BYTE); i++) {\
> -pwd->b[i] = msa_ ## func ## _df(df, pws->b[i], pwt->b[i]);  \
> -}   \
> +pwd->b[0]  = msa_ ## func ## _df(df, pws->b[0], pwt->b[0]); \
> +pwd->b[1]  = msa_ ## func ## _df(df, pws->b[1], pwt->b[1]); \
> +pwd->b[2]  = msa_ ## func ## _df(df, pws->b[2], pwt->b[2]); \
> +pwd->b[3]  = msa_ ## func ## _df(df, pws->b[3], pwt->b[3]); \
> +pwd->b[4]  = msa_ ## func ## _df(df, pws->b[4], pwt->b[4]); \
> +pwd->b[5]  = msa_ ## func ## _df(df, pws->b[5], pwt->b[5]); \
> +pwd->b[6]  = msa_ ## func ## _df(df, pws->b[6], pwt->b[6]); \
> +pwd->b[7]  = msa_ ## func ## _df(df, pws->b[7], pwt->b[7]); \
> +pwd->b[8]  = msa_ ## func ## _df(df, pws->b[8], pwt->b[8]); \
> +pwd->b[9]  = msa_ ## func ## _df(df, pws->b[9], pwt->b[9]); \
> +pwd->b[10] = msa_ ## func ## _df(df, pws->b[10], pwt->b[10]);   \
> +pwd->b[11] = msa_ ## func ## _df(df, pws->b[11], pwt->b[11]);   \
> +pwd->b[12] = msa_ ## func ## _df(df, pws->b[12], pwt->b[12]);   \
> +pwd->b[13] = msa_ ## func ## _df(df, pws->b[13], pwt->b[13]);   \
> +pwd->b[14] = msa_ ## func ## _df(df, pws->b[14], pwt->b[14]);   \
> +pwd->b[15] = msa_ ## func ## _df(df, pws->b[15], pwt->b[15]);   \
>  break;  \
>  case DF_HALF:   \
> -for (i = 0; i < DF_ELEMENTS(DF_HALF); i++) {\
> -pwd->h[i] = msa_ ## func ## _df(df, pws->h[i], pwt->h[i]);  \
> -}   \
> +pwd->h[0] = msa_ ## func ## _df(df, pws->h[0], pwt->h[0]);  \
> +pwd->h[1] = msa_ ## func ## _df(df, pws->h[1], pwt->h[1]);  \
> +pwd->h[2] = msa_ ## func ## _df(df, pws->h[2], pwt->h[2]);  \
> +pwd->h[3] = msa_ ## func ## _df(df, pws->h[3], pwt->h[3]);  \
> +pwd->h[4] = msa_ ## func ## _df(df, pws->h[4], pwt->h[4]);  \
> +pwd->h[5] = msa_ ## func ## _df(df, pws->h[5], pwt->h[5]);  \
> +pwd->h[6] = msa_ ## func ## _df(df, pws->h[6], pwt->h[6]);  \
> +pwd->h[7] = msa_ ## func ## _df(df, pws->h[7], pwt->h[7]);  \
>  break;  \
>  case DF_WORD:   \
> -for (i = 0; i < DF_ELEMENTS(DF_WORD); i++) {\
> -pwd->w[i] = msa_ ## func ## _df(df, pws->w[i], pwt->w[i]);  \
> -}   \
> +pwd->w[0] = msa_ ## func ## _df(df, pws->w[0], pwt->w[0]);  \
> +pwd->w[1] = msa_ ## func ## _df(df, pws->w[1], pwt->w[1]);  \
> +pwd->w[2] = msa_ ## func ## _df(df, pws->w[2], pwt->w[2]);  \
> +pwd->w[3] = msa_ ## func ## _df(df, pws->w[3], pwt->w[3]);  \
>  break;  \
>  case DF_DOUBLE: \
> -for (i = 0; i < DF_ELEMENTS(DF_DOUBLE); i++) {  \
> -pwd->d[i] = msa_ ## func ## _df(df, pws->d[i], pwt->d[i]);  \
> -}   \
> +pwd->d[0] = msa_ ## func ## _df(df, pws->d[0], pwt->d[0]);  \
>

Re: [Qemu-devel] [PATCH v2] target/ppc: Fix lxvw4x, lxvh8x and lxvb16x

2019-06-02 Thread Mark Cave-Ayland
On 28/05/2019 02:09, David Gibson wrote:

> On Fri, May 24, 2019 at 07:53:45AM +0100, Mark Cave-Ayland wrote:
>> From: Anton Blanchard 
>>
>> During the conversion these instructions were incorrectly treated as
>> stores. We need to use set_cpu_vsr* and not get_cpu_vsr*.
>>
>> Fixes: 8b3b2d75c7c0 ("introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}() 
>> helpers for VSR register access")
>> Signed-off-by: Anton Blanchard 
>> Reviewed-by: Mark Cave-Ayland 
>> Tested-by: Greg Kurz 
>> Reviewed-by: Greg Kurz 
> 
> Applied, thanks.

I'm in the process of preparing a VSX fixes branch to send over to
qemu-stable@ so that Anton's patches make the next 4.0 stable release,
however I can't find this patch in your ppc-for-4.1 branch? Did it get
missed somehow?


ATB,

Mark.

>> ---
>>  target/ppc/translate/vsx-impl.inc.c | 13 +++--
>>  1 file changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/target/ppc/translate/vsx-impl.inc.c 
>> b/target/ppc/translate/vsx-impl.inc.c
>> index 199d22da97..cdb44b8b70 100644
>> --- a/target/ppc/translate/vsx-impl.inc.c
>> +++ b/target/ppc/translate/vsx-impl.inc.c
>> @@ -102,8 +102,7 @@ static void gen_lxvw4x(DisasContext *ctx)
>>  }
>>  xth = tcg_temp_new_i64();
>>  xtl = tcg_temp_new_i64();
>> -get_cpu_vsrh(xth, xT(ctx->opcode));
>> -get_cpu_vsrl(xtl, xT(ctx->opcode));
>> +
>>  gen_set_access_type(ctx, ACCESS_INT);
>>  EA = tcg_temp_new();
>>  
>> @@ -126,6 +125,8 @@ static void gen_lxvw4x(DisasContext *ctx)
>>  tcg_gen_addi_tl(EA, EA, 8);
>>  tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
>>  }
>> +set_cpu_vsrh(xT(ctx->opcode), xth);
>> +set_cpu_vsrl(xT(ctx->opcode), xtl);
>>  tcg_temp_free(EA);
>>  tcg_temp_free_i64(xth);
>>  tcg_temp_free_i64(xtl);
>> @@ -185,8 +186,6 @@ static void gen_lxvh8x(DisasContext *ctx)
>>  }
>>  xth = tcg_temp_new_i64();
>>  xtl = tcg_temp_new_i64();
>> -get_cpu_vsrh(xth, xT(ctx->opcode));
>> -get_cpu_vsrl(xtl, xT(ctx->opcode));
>>  gen_set_access_type(ctx, ACCESS_INT);
>>  
>>  EA = tcg_temp_new();
>> @@ -197,6 +196,8 @@ static void gen_lxvh8x(DisasContext *ctx)
>>  if (ctx->le_mode) {
>>  gen_bswap16x8(xth, xtl, xth, xtl);
>>  }
>> +set_cpu_vsrh(xT(ctx->opcode), xth);
>> +set_cpu_vsrl(xT(ctx->opcode), xtl);
>>  tcg_temp_free(EA);
>>  tcg_temp_free_i64(xth);
>>  tcg_temp_free_i64(xtl);
>> @@ -214,14 +215,14 @@ static void gen_lxvb16x(DisasContext *ctx)
>>  }
>>  xth = tcg_temp_new_i64();
>>  xtl = tcg_temp_new_i64();
>> -get_cpu_vsrh(xth, xT(ctx->opcode));
>> -get_cpu_vsrl(xtl, xT(ctx->opcode));
>>  gen_set_access_type(ctx, ACCESS_INT);
>>  EA = tcg_temp_new();
>>  gen_addr_reg_index(ctx, EA);
>>  tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEQ);
>>  tcg_gen_addi_tl(EA, EA, 8);
>>  tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
>> +set_cpu_vsrh(xT(ctx->opcode), xth);
>> +set_cpu_vsrl(xT(ctx->opcode), xtl);
>>  tcg_temp_free(EA);
>>  tcg_temp_free_i64(xth);
>>  tcg_temp_free_i64(xtl);
> 




Re: [Qemu-devel] [PATCH] ioapic: kvm: Skip route updates for masked pins

2019-06-02 Thread Peter Xu
On Sun, Jun 02, 2019 at 01:42:13PM +0200, Jan Kiszka wrote:
> From: Jan Kiszka 
> 
> Masked entries will not generate interrupt messages, thus do not need to
> be routed by KVM. This is a cosmetic cleanup, just avoiding warnings of
> the kind
> 
> qemu-system-x86_64: vtd_irte_get: detected non-present IRTE (index=0, 
> high=0xff00, low=0x100)
> 
> if the masked entry happens to reference a non-present IRTE.
> 
> Signed-off-by: Jan Kiszka 

Reviewed-by: Peter Xu 

Thanks, Jan.

The "non-cosmetic" part of clearing of those entries (e.g. including
when the entries were not setup correctly rather than masked) was
never really implemented and that task has been on my todo list for
quite a while but with a very low priority (low enough to sink...).  I
hope I didn't overlook its importance since AFAICT general OSs should
hardly trigger those paths, and so far I don't see it as a very big issue.

Regards,

-- 
Peter Xu



[Qemu-devel] [PATCH v2 12/15] target/ppc: introduce GEN_VSX_HELPER_R2_AB macro to fpu_helper.c

2019-06-02 Thread Mark Cave-Ayland
Rather than perform the VSR register decoding within the helper itself,
introduce a new GEN_VSX_HELPER_R2_AB macro which performs the decode based
upon rA and rB at translation time.

Signed-off-by: Mark Cave-Ayland 
Reviewed-by: Richard Henderson 
---
 target/ppc/fpu_helper.c | 10 --
 target/ppc/helper.h |  6 +++---
 target/ppc/translate/vsx-impl.inc.c | 24 +---
 3 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index ba52ef597e..350505e420 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2452,10 +2452,9 @@ void helper_xscmpexpdp(CPUPPCState *env, uint32_t opcode,
 do_float_check_status(env, GETPC());
 }
 
-void helper_xscmpexpqp(CPUPPCState *env, uint32_t opcode)
+void helper_xscmpexpqp(CPUPPCState *env, uint32_t opcode,
+   ppc_vsr_t *xa, ppc_vsr_t *xb)
 {
-ppc_vsr_t *xa = &env->vsr[rA(opcode) + 32];
-ppc_vsr_t *xb = &env->vsr[rB(opcode) + 32];
 int64_t exp_a, exp_b;
 uint32_t cc;
 
@@ -2531,10 +2530,9 @@ VSX_SCALAR_CMP(xscmpodp, 1)
 VSX_SCALAR_CMP(xscmpudp, 0)
 
 #define VSX_SCALAR_CMPQ(op, ordered)\
-void helper_##op(CPUPPCState *env, uint32_t opcode) \
+void helper_##op(CPUPPCState *env, uint32_t opcode, \
+ ppc_vsr_t *xa, ppc_vsr_t *xb)  \
 {   \
-ppc_vsr_t *xa = &env->vsr[rA(opcode) + 32]; \
-ppc_vsr_t *xb = &env->vsr[rB(opcode) + 32]; \
 uint32_t cc = 0;\
 bool vxsnan_flag = false, vxvc_flag = false;\
 \
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 2e0646f5eb..a5e12a3933 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -390,11 +390,11 @@ DEF_HELPER_4(xscmpgtdp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xscmpgedp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xscmpnedp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xscmpexpdp, void, env, i32, vsr, vsr)
-DEF_HELPER_2(xscmpexpqp, void, env, i32)
+DEF_HELPER_4(xscmpexpqp, void, env, i32, vsr, vsr)
 DEF_HELPER_4(xscmpodp, void, env, i32, vsr, vsr)
 DEF_HELPER_4(xscmpudp, void, env, i32, vsr, vsr)
-DEF_HELPER_2(xscmpoqp, void, env, i32)
-DEF_HELPER_2(xscmpuqp, void, env, i32)
+DEF_HELPER_4(xscmpoqp, void, env, i32, vsr, vsr)
+DEF_HELPER_4(xscmpuqp, void, env, i32, vsr, vsr)
 DEF_HELPER_4(xsmaxdp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xsmindp, void, env, vsr, vsr, vsr)
 DEF_HELPER_5(xsmaxcdp, void, env, i32, vsr, vsr, vsr)
diff --git a/target/ppc/translate/vsx-impl.inc.c 
b/target/ppc/translate/vsx-impl.inc.c
index 0dd78546d7..e05756b8c1 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -1132,6 +1132,24 @@ static void gen_##name(DisasContext *ctx)
 \
 tcg_temp_free_ptr(xb);\
 }
 
+#define GEN_VSX_HELPER_R2_AB(name, op1, op2, inval, type) \
+static void gen_##name(DisasContext *ctx) \
+{ \
+TCGv_i32 opc; \
+TCGv_ptr xa, xb;  \
+if (unlikely(!ctx->vsx_enabled)) {\
+gen_exception(ctx, POWERPC_EXCP_VSXU);\
+return;   \
+} \
+opc = tcg_const_i32(ctx->opcode); \
+xa = gen_vsr_ptr(rA(ctx->opcode) + 32);   \
+xb = gen_vsr_ptr(rB(ctx->opcode) + 32);   \
+gen_helper_##name(cpu_env, opc, xa, xb);  \
+tcg_temp_free_i32(opc);   \
+tcg_temp_free_ptr(xa);\
+tcg_temp_free_ptr(xb);\
+}
+
 #define GEN_VSX_HELPER_XT_XB_ENV(name, op1, op2, inval, type) \
 static void gen_##name(DisasContext *ctx) \
 { \
@@ -1175,11 +1193,11 @@ GEN_VSX_HELPER_X3(xscmpgtdp, 0x0C, 0x01, 0, PPC2_ISA300)
 GEN_VSX_HELPER_X3(xscmpgedp, 0x0C, 0x02, 0, PPC2_ISA300)
 GEN_VSX_HELPER_X3(xscmpnedp, 0x0C, 0x03, 0, PPC2_ISA300)
 GEN_VSX_HELPER_X2_AB(xscmpexpdp, 0x0C, 0x07, 0, PPC2_ISA300)
-GEN_VSX_HELPER_2(xscmpexpqp, 0x04, 

[Qemu-devel] [PATCH v2 15/15] target/ppc: improve VSX_FMADD with new GEN_VSX_HELPER_VSX_MADD macro

2019-06-02 Thread Mark Cave-Ayland
Introduce a new GEN_VSX_HELPER_VSX_MADD macro for the generator function which
enables the source and destination registers to be decoded at translation time.

This enables the determination of a-form or m-form to be made at translation
time, so that a single helper function can now be used for both variants.

Signed-off-by: Mark Cave-Ayland 
---
 target/ppc/fpu_helper.c | 68 ++-
 target/ppc/helper.h | 48 --
 target/ppc/translate/vsx-impl.inc.c | 81 +
 target/ppc/translate/vsx-ops.inc.c  | 70 +---
 4 files changed, 122 insertions(+), 145 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 350505e420..f13855d324 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2280,24 +2280,15 @@ VSX_TSQRT(xvtsqrtsp, 4, float32, VsrW(i), -126, 23)
  *   fld   - vsr_t field (VsrD(*) or VsrW(*))
  *   maddflgs - flags for the float*muladd routine that control the
  *   various forms (madd, msub, nmadd, nmsub)
- *   afrm  - A form (1=A, 0=M)
  *   sfprf - set FPRF
  */
-#define VSX_MADD(op, nels, tp, fld, maddflgs, afrm, sfprf, r2sp)  \
+#define VSX_MADD(op, nels, tp, fld, maddflgs, sfprf, r2sp)\
 void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, \
- ppc_vsr_t *xa, ppc_vsr_t *xb)\
+ ppc_vsr_t *xa, ppc_vsr_t *b, ppc_vsr_t *c)   \
 { \
-ppc_vsr_t t = *xt, *b, *c;\
+ppc_vsr_t t = *xt;\
 int i;\
   \
-if (afrm) { /* AxB + T */ \
-b = xb;   \
-c = xt;   \
-} else { /* AxT + B */\
-b = xt;   \
-c = xb;   \
-} \
-  \
 helper_reset_fpstatus(env);   \
   \
 for (i = 0; i < nels; i++) {  \
@@ -2336,41 +2327,24 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,   
  \
 do_float_check_status(env, GETPC());  \
 }
 
-VSX_MADD(xsmaddadp, 1, float64, VsrD(0), MADD_FLGS, 1, 1, 0)
-VSX_MADD(xsmaddmdp, 1, float64, VsrD(0), MADD_FLGS, 0, 1, 0)
-VSX_MADD(xsmsubadp, 1, float64, VsrD(0), MSUB_FLGS, 1, 1, 0)
-VSX_MADD(xsmsubmdp, 1, float64, VsrD(0), MSUB_FLGS, 0, 1, 0)
-VSX_MADD(xsnmaddadp, 1, float64, VsrD(0), NMADD_FLGS, 1, 1, 0)
-VSX_MADD(xsnmaddmdp, 1, float64, VsrD(0), NMADD_FLGS, 0, 1, 0)
-VSX_MADD(xsnmsubadp, 1, float64, VsrD(0), NMSUB_FLGS, 1, 1, 0)
-VSX_MADD(xsnmsubmdp, 1, float64, VsrD(0), NMSUB_FLGS, 0, 1, 0)
-
-VSX_MADD(xsmaddasp, 1, float64, VsrD(0), MADD_FLGS, 1, 1, 1)
-VSX_MADD(xsmaddmsp, 1, float64, VsrD(0), MADD_FLGS, 0, 1, 1)
-VSX_MADD(xsmsubasp, 1, float64, VsrD(0), MSUB_FLGS, 1, 1, 1)
-VSX_MADD(xsmsubmsp, 1, float64, VsrD(0), MSUB_FLGS, 0, 1, 1)
-VSX_MADD(xsnmaddasp, 1, float64, VsrD(0), NMADD_FLGS, 1, 1, 1)
-VSX_MADD(xsnmaddmsp, 1, float64, VsrD(0), NMADD_FLGS, 0, 1, 1)
-VSX_MADD(xsnmsubasp, 1, float64, VsrD(0), NMSUB_FLGS, 1, 1, 1)
-VSX_MADD(xsnmsubmsp, 1, float64, VsrD(0), NMSUB_FLGS, 0, 1, 1)
-
-VSX_MADD(xvmaddadp, 2, float64, VsrD(i), MADD_FLGS, 1, 0, 0)
-VSX_MADD(xvmaddmdp, 2, float64, VsrD(i), MADD_FLGS, 0, 0, 0)
-VSX_MADD(xvmsubadp, 2, float64, VsrD(i), MSUB_FLGS, 1, 0, 0)
-VSX_MADD(xvmsubmdp, 2, float64, VsrD(i), MSUB_FLGS, 0, 0, 0)
-VSX_MADD(xvnmaddadp, 2, float64, VsrD(i), NMADD_FLGS, 1, 0, 0)
-VSX_MADD(xvnmaddmdp, 2, float64, VsrD(i), NMADD_FLGS, 0, 0, 0)
-VSX_MADD(xvnmsubadp, 2, float64, VsrD(i), NMSUB_FLGS, 1, 0, 0)
-VSX_MADD(xvnmsubmdp, 2, float64, VsrD(i), NMSUB_FLGS, 0, 0, 0)
-
-VSX_MADD(xvmaddasp, 4, float32, VsrW(i), MADD_FLGS, 1, 0, 0)
-VSX_MADD(xvmaddmsp, 4, float32, VsrW(i), MADD_FLGS, 0, 0, 0)
-VSX_MADD(xvmsubasp, 4, float32, VsrW(i), MSUB_FLGS, 1, 0, 0)
-VSX_MADD(xvmsubmsp, 4, float32, VsrW(i), MSUB_FLGS, 0, 0, 0)
-VSX_MADD(xvnmaddasp, 4, float32, VsrW(i), NMADD_FLGS, 1, 0, 0)
-VSX_MADD(xvnmaddmsp, 4, float32, VsrW(i), NMADD_FLGS, 0, 0, 0)
-VSX_MADD(xvnmsubasp, 4, float32, VsrW(i), NMSUB_FLGS, 

[Qemu-devel] [PATCH v2 13/15] target/ppc: decode target register in VSX_VECTOR_LOAD_STORE_LENGTH at translation time

2019-06-02 Thread Mark Cave-Ayland
Signed-off-by: Mark Cave-Ayland 
Reviewed-by: Richard Henderson 
---
 target/ppc/helper.h |  8 +++
 target/ppc/mem_helper.c |  6 ++---
 target/ppc/translate/vsx-impl.inc.c | 47 +++--
 3 files changed, 30 insertions(+), 31 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index a5e12a3933..7ed9e2 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -279,10 +279,10 @@ DEF_HELPER_3(stvebx, void, env, avr, tl)
 DEF_HELPER_3(stvehx, void, env, avr, tl)
 DEF_HELPER_3(stvewx, void, env, avr, tl)
 #if defined(TARGET_PPC64)
-DEF_HELPER_4(lxvl, void, env, tl, tl, tl)
-DEF_HELPER_4(lxvll, void, env, tl, tl, tl)
-DEF_HELPER_4(stxvl, void, env, tl, tl, tl)
-DEF_HELPER_4(stxvll, void, env, tl, tl, tl)
+DEF_HELPER_4(lxvl, void, env, tl, vsr, tl)
+DEF_HELPER_4(lxvll, void, env, tl, vsr, tl)
+DEF_HELPER_4(stxvl, void, env, tl, vsr, tl)
+DEF_HELPER_4(stxvll, void, env, tl, vsr, tl)
 #endif
 DEF_HELPER_4(vsumsws, void, env, avr, avr, avr)
 DEF_HELPER_4(vsum2sws, void, env, avr, avr, avr)
diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index 17a3c931a9..c533f88dc1 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -415,9 +415,8 @@ STVE(stvewx, cpu_stl_data_ra, bswap32, u32)
 
 #define VSX_LXVL(name, lj)  \
 void helper_##name(CPUPPCState *env, target_ulong addr, \
-   target_ulong xt_num, target_ulong rb)\
+   ppc_vsr_t *xt, target_ulong rb)  \
 {   \
-ppc_vsr_t *xt = &env->vsr[xt_num];  \
 ppc_vsr_t t;\
 uint64_t nb = GET_NB(rb);   \
 int i;  \
@@ -446,9 +445,8 @@ VSX_LXVL(lxvll, 1)
 
 #define VSX_STXVL(name, lj)   \
 void helper_##name(CPUPPCState *env, target_ulong addr,   \
-   target_ulong xt_num, target_ulong rb)  \
+   ppc_vsr_t *xt, target_ulong rb)\
 { \
-ppc_vsr_t *xt = &env->vsr[xt_num];\
 ppc_vsr_t t = *xt;\
 target_ulong nb = GET_NB(rb); \
 int i;\
diff --git a/target/ppc/translate/vsx-impl.inc.c 
b/target/ppc/translate/vsx-impl.inc.c
index e05756b8c1..931c7c33ac 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -343,29 +343,30 @@ VSX_VECTOR_STORE(stxv, st_i64, 0)
 VSX_VECTOR_STORE(stxvx, st_i64, 1)
 
 #ifdef TARGET_PPC64
-#define VSX_VECTOR_LOAD_STORE_LENGTH(name)  \
-static void gen_##name(DisasContext *ctx)   \
-{   \
-TCGv EA, xt;\
-\
-if (xT(ctx->opcode) < 32) { \
-if (unlikely(!ctx->vsx_enabled)) {  \
-gen_exception(ctx, POWERPC_EXCP_VSXU);  \
-return; \
-}   \
-} else {\
-if (unlikely(!ctx->altivec_enabled)) {  \
-gen_exception(ctx, POWERPC_EXCP_VPU);   \
-return; \
-}   \
-}   \
-EA = tcg_temp_new();\
-xt = tcg_const_tl(xT(ctx->opcode)); \
-gen_set_access_type(ctx, ACCESS_INT);   \
-gen_addr_register(ctx, EA); \
-gen_helper_##name(cpu_env, EA, xt, cpu_gpr[rB(ctx->opcode)]); \
-tcg_temp_free(EA);  \
-tcg_temp_free(xt);  \
+#define VSX_VECTOR_LOAD_STORE_LENGTH(name) \
+static void gen_##name(DisasContext *ctx)  \
+{  \
+TCGv EA;   \
+TCGv_ptr xt;   \
+   \
+if (xT(ctx->opcode) < 32) {\
+

[Qemu-devel] [PATCH v2 14/15] target/ppc: decode target register in VSX_EXTRACT_INSERT at translation time

2019-06-02 Thread Mark Cave-Ayland
Signed-off-by: Mark Cave-Ayland 
Reviewed-by: Richard Henderson 
---
 target/ppc/helper.h |  4 ++--
 target/ppc/int_helper.c | 12 
 target/ppc/translate/vsx-impl.inc.c | 10 +-
 3 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 7ed9e2..3d5150a524 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -534,8 +534,8 @@ DEF_HELPER_3(xvrspip, void, env, vsr, vsr)
 DEF_HELPER_3(xvrspiz, void, env, vsr, vsr)
 DEF_HELPER_4(xxperm, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xxpermr, void, env, vsr, vsr, vsr)
-DEF_HELPER_4(xxextractuw, void, env, tl, tl, i32)
-DEF_HELPER_4(xxinsertw, void, env, tl, tl, i32)
+DEF_HELPER_4(xxextractuw, void, env, vsr, vsr, i32)
+DEF_HELPER_4(xxinsertw, void, env, vsr, vsr, i32)
 DEF_HELPER_3(xvxsigsp, void, env, vsr, vsr)
 
 DEF_HELPER_2(efscfsi, i32, env, i32)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 3b8939edcc..5c07ef3e4d 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1899,11 +1899,9 @@ VEXTRACT(uw, u32)
 VEXTRACT(d, u64)
 #undef VEXTRACT
 
-void helper_xxextractuw(CPUPPCState *env, target_ulong xtn,
-target_ulong xbn, uint32_t index)
+void helper_xxextractuw(CPUPPCState *env, ppc_vsr_t *xt,
+ppc_vsr_t *xb, uint32_t index)
 {
-ppc_vsr_t *xt = &env->vsr[xtn];
-ppc_vsr_t *xb = &env->vsr[xbn];
 ppc_vsr_t t = { };
 size_t es = sizeof(uint32_t);
 uint32_t ext_index;
@@ -1917,11 +1915,9 @@ void helper_xxextractuw(CPUPPCState *env, target_ulong 
xtn,
 *xt = t;
 }
 
-void helper_xxinsertw(CPUPPCState *env, target_ulong xtn,
-  target_ulong xbn, uint32_t index)
+void helper_xxinsertw(CPUPPCState *env, ppc_vsr_t *xt,
+  ppc_vsr_t *xb, uint32_t index)
 {
-ppc_vsr_t *xt = &env->vsr[xtn];
-ppc_vsr_t *xb = &env->vsr[xbn];
 ppc_vsr_t t = *xt;
 size_t es = sizeof(uint32_t);
 int ins_index, i = 0;
diff --git a/target/ppc/translate/vsx-impl.inc.c 
b/target/ppc/translate/vsx-impl.inc.c
index 931c7c33ac..b3bb1746ee 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -1651,7 +1651,7 @@ static void gen_xxsldwi(DisasContext *ctx)
 #define VSX_EXTRACT_INSERT(name)\
 static void gen_##name(DisasContext *ctx)   \
 {   \
-TCGv xt, xb;\
+TCGv_ptr xt, xb;\
 TCGv_i32 t0;\
 TCGv_i64 t1;\
 uint8_t uimm = UIMM4(ctx->opcode);  \
@@ -1660,8 +1660,8 @@ static void gen_##name(DisasContext *ctx) 
  \
 gen_exception(ctx, POWERPC_EXCP_VSXU);  \
 return; \
 }   \
-xt = tcg_const_tl(xT(ctx->opcode)); \
-xb = tcg_const_tl(xB(ctx->opcode)); \
+xt = gen_vsr_ptr(xT(ctx->opcode));  \
+xb = gen_vsr_ptr(xB(ctx->opcode));  \
 t0 = tcg_temp_new_i32();\
 t1 = tcg_temp_new_i64();\
 /*  \
@@ -1676,8 +1676,8 @@ static void gen_##name(DisasContext *ctx) 
  \
 }   \
 tcg_gen_movi_i32(t0, uimm); \
 gen_helper_##name(cpu_env, xt, xb, t0); \
-tcg_temp_free(xb);  \
-tcg_temp_free(xt);  \
+tcg_temp_free_ptr(xb);  \
+tcg_temp_free_ptr(xt);  \
 tcg_temp_free_i32(t0);  \
 tcg_temp_free_i64(t1);  \
 }
-- 
2.11.0




[Qemu-devel] [PATCH] ioapic: kvm: Skip route updates for masked pins

2019-06-02 Thread Jan Kiszka
From: Jan Kiszka 

Masked entries will not generate interrupt messages, thus do not need to
be routed by KVM. This is a cosmetic cleanup, just avoiding warnings of
the kind

qemu-system-x86_64: vtd_irte_get: detected non-present IRTE (index=0, 
high=0xff00, low=0x100)

if the masked entry happens to reference a non-present IRTE.

Signed-off-by: Jan Kiszka 
---
 hw/intc/ioapic.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 7074489fdf..2fb288a22d 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -197,9 +197,11 @@ static void ioapic_update_kvm_routes(IOAPICCommonState *s)
 MSIMessage msg;
 struct ioapic_entry_info info;
 ioapic_entry_parse(s->ioredtbl[i], &info);
-msg.address = info.addr;
-msg.data = info.data;
-kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
+if (!info.masked) {
+msg.address = info.addr;
+msg.data = info.data;
+kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
+}
 }
 kvm_irqchip_commit_routes(kvm_state);
 }
-- 
2.16.4



[Qemu-devel] [PATCH v2 11/15] target/ppc: introduce GEN_VSX_HELPER_R2 macro to fpu_helper.c

2019-06-02 Thread Mark Cave-Ayland
Rather than perform the VSR register decoding within the helper itself,
introduce a new GEN_VSX_HELPER_R2 macro which performs the decode based
upon rD and rB at translation time.

Signed-off-by: Mark Cave-Ayland 
---
 target/ppc/fpu_helper.c | 30 -
 target/ppc/helper.h | 20 +--
 target/ppc/translate/vsx-impl.inc.c | 38 +++--
 3 files changed, 50 insertions(+), 38 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 90d3566ec8..ba52ef597e 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2808,10 +2808,9 @@ VSX_CVT_FP_TO_FP(xvcvspdp, 2, float32, float64, VsrW(2 * 
i), VsrD(i), 0)
  *   sfprf - set FPRF
  */
 #define VSX_CVT_FP_TO_FP_VECTOR(op, nels, stp, ttp, sfld, tfld, sfprf)\
-void helper_##op(CPUPPCState *env, uint32_t opcode)   \
+void helper_##op(CPUPPCState *env, uint32_t opcode,   \
+ ppc_vsr_t *xt, ppc_vsr_t *xb)\
 {   \
-ppc_vsr_t *xt = &env->vsr[rD(opcode) + 32]; \
-ppc_vsr_t *xb = &env->vsr[rB(opcode) + 32]; \
 ppc_vsr_t t = *xt;  \
 int i;  \
 \
@@ -2975,10 +2974,9 @@ VSX_CVT_FP_TO_INT(xvcvspuxws, 4, float32, uint32, 
VsrW(i), VsrW(i), 0U)
  *   rnan  - resulting NaN
  */
 #define VSX_CVT_FP_TO_INT_VECTOR(op, stp, ttp, sfld, tfld, rnan) \
-void helper_##op(CPUPPCState *env, uint32_t opcode)  \
+void helper_##op(CPUPPCState *env, uint32_t opcode,  \
+ ppc_vsr_t *xt, ppc_vsr_t *xb)   \
 {\
-ppc_vsr_t *xt = &env->vsr[rD(opcode) + 32];  \
-ppc_vsr_t *xb = &env->vsr[rB(opcode) + 32];  \
 ppc_vsr_t t = { };   \
  \
 t.tfld = stp##_to_##ttp##_round_to_zero(xb->sfld, &env->fp_status);  \
@@ -3052,10 +3050,9 @@ VSX_CVT_INT_TO_FP(xvcvuxwsp, 4, uint32, float32, 
VsrW(i), VsrW(i), 0, 0)
  *   tfld  - target vsr_t field
  */
 #define VSX_CVT_INT_TO_FP_VECTOR(op, stp, ttp, sfld, tfld)  \
-void helper_##op(CPUPPCState *env, uint32_t opcode) \
+void helper_##op(CPUPPCState *env, uint32_t opcode, \
+ ppc_vsr_t *xt, ppc_vsr_t *xb)  \
 {   \
-ppc_vsr_t *xt = &env->vsr[rD(opcode) + 32]; \
-ppc_vsr_t *xb = &env->vsr[rB(opcode) + 32]; \
 ppc_vsr_t t = *xt;  \
 \
 t.tfld = stp##_to_##ttp(xb->sfld, &env->fp_status); \
@@ -3278,10 +3275,9 @@ void helper_xststdcsp(CPUPPCState *env, uint32_t opcode, 
ppc_vsr_t *xb)
 env->crf[BF(opcode)] = cc;
 }
 
-void helper_xsrqpi(CPUPPCState *env, uint32_t opcode)
+void helper_xsrqpi(CPUPPCState *env, uint32_t opcode,
+   ppc_vsr_t *xt, ppc_vsr_t *xb)
 {
-ppc_vsr_t *xt = &env->vsr[rD(opcode) + 32];
-ppc_vsr_t *xb = &env->vsr[rB(opcode) + 32];
 ppc_vsr_t t = { };
 uint8_t r = Rrm(opcode);
 uint8_t ex = Rc(opcode);
@@ -3336,10 +3332,9 @@ void helper_xsrqpi(CPUPPCState *env, uint32_t opcode)
 do_float_check_status(env, GETPC());
 }
 
-void helper_xsrqpxp(CPUPPCState *env, uint32_t opcode)
+void helper_xsrqpxp(CPUPPCState *env, uint32_t opcode,
+ppc_vsr_t *xt, ppc_vsr_t *xb)
 {
-ppc_vsr_t *xt = &env->vsr[rD(opcode) + 32];
-ppc_vsr_t *xb = &env->vsr[rB(opcode) + 32];
 ppc_vsr_t t = { };
 uint8_t r = Rrm(opcode);
 uint8_t rmc = RMC(opcode);
@@ -3391,10 +3386,9 @@ void helper_xsrqpxp(CPUPPCState *env, uint32_t opcode)
 do_float_check_status(env, GETPC());
 }
 
-void helper_xssqrtqp(CPUPPCState *env, uint32_t opcode)
+void helper_xssqrtqp(CPUPPCState *env, uint32_t opcode,
+ ppc_vsr_t *xt, ppc_vsr_t *xb)
 {
-ppc_vsr_t *xt = &env->vsr[rD(opcode) + 32];
-ppc_vsr_t *xb = &env->vsr[rB(opcode) + 32];
 ppc_vsr_t t = { };
 float_status tstat;
 
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 9134da9cbb..2e0646f5eb 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -402,16 +402,16 @@ DEF_HELPER_5(xsmincdp, void, env, i32, vsr, vsr, vsr)
 DEF_HELPER_5(xsmaxjdp, void, env, i32, vsr,

[Qemu-devel] [PATCH v2 10/15] target/ppc: introduce GEN_VSX_HELPER_R3 macro to fpu_helper.c

2019-06-02 Thread Mark Cave-Ayland
Rather than perform the VSR register decoding within the helper itself,
introduce a new GEN_VSX_HELPER_R3 macro which performs the decode based
upon rD, rA and rB at translation time.

Signed-off-by: Mark Cave-Ayland 
---
 target/ppc/fpu_helper.c | 36 
 target/ppc/helper.h | 16 
 target/ppc/translate/vsx-impl.inc.c | 36 
 3 files changed, 48 insertions(+), 40 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 953d57d34e..90d3566ec8 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -1842,11 +1842,9 @@ VSX_ADD_SUB(xssubsp, sub, 1, float64, VsrD(0), 1, 1)
 VSX_ADD_SUB(xvsubdp, sub, 2, float64, VsrD(i), 0, 0)
 VSX_ADD_SUB(xvsubsp, sub, 4, float32, VsrW(i), 0, 0)
 
-void helper_xsaddqp(CPUPPCState *env, uint32_t opcode)
+void helper_xsaddqp(CPUPPCState *env, uint32_t opcode,
+ppc_vsr_t *xt, ppc_vsr_t *xa, ppc_vsr_t *xb)
 {
-ppc_vsr_t *xt = &env->vsr[rD(opcode) + 32];
-ppc_vsr_t *xa = &env->vsr[rA(opcode) + 32];
-ppc_vsr_t *xb = &env->vsr[rB(opcode) + 32];
 ppc_vsr_t t = *xt;
 float_status tstat;
 
@@ -1920,11 +1918,9 @@ VSX_MUL(xsmulsp, 1, float64, VsrD(0), 1, 1)
 VSX_MUL(xvmuldp, 2, float64, VsrD(i), 0, 0)
 VSX_MUL(xvmulsp, 4, float32, VsrW(i), 0, 0)
 
-void helper_xsmulqp(CPUPPCState *env, uint32_t opcode)
+void helper_xsmulqp(CPUPPCState *env, uint32_t opcode,
+ppc_vsr_t *xt, ppc_vsr_t *xa, ppc_vsr_t *xb)
 {
-ppc_vsr_t *xt = &env->vsr[rD(opcode) + 32];
-ppc_vsr_t *xa = &env->vsr[rA(opcode) + 32];
-ppc_vsr_t *xb = &env->vsr[rB(opcode) + 32];
 ppc_vsr_t t = *xt;
 float_status tstat;
 
@@ -1999,11 +1995,9 @@ VSX_DIV(xsdivsp, 1, float64, VsrD(0), 1, 1)
 VSX_DIV(xvdivdp, 2, float64, VsrD(i), 0, 0)
 VSX_DIV(xvdivsp, 4, float32, VsrW(i), 0, 0)
 
-void helper_xsdivqp(CPUPPCState *env, uint32_t opcode)
+void helper_xsdivqp(CPUPPCState *env, uint32_t opcode,
+ppc_vsr_t *xt, ppc_vsr_t *xa, ppc_vsr_t *xb)
 {
-ppc_vsr_t *xt = &env->vsr[rD(opcode) + 32];
-ppc_vsr_t *xa = &env->vsr[rA(opcode) + 32];
-ppc_vsr_t *xb = &env->vsr[rB(opcode) + 32];
 ppc_vsr_t t = *xt;
 float_status tstat;
 
@@ -2620,11 +2614,9 @@ VSX_MAX_MIN(xvmindp, minnum, 2, float64, VsrD(i))
 VSX_MAX_MIN(xvminsp, minnum, 4, float32, VsrW(i))
 
 #define VSX_MAX_MINC(name, max)   \
-void helper_##name(CPUPPCState *env, uint32_t opcode) \
+void helper_##name(CPUPPCState *env, uint32_t opcode, \
+   ppc_vsr_t *xt, ppc_vsr_t *xa, ppc_vsr_t *xb)   \
 { \
-ppc_vsr_t *xt = &env->vsr[rD(opcode) + 32];   \
-ppc_vsr_t *xa = &env->vsr[rA(opcode) + 32];   \
-ppc_vsr_t *xb = &env->vsr[rB(opcode) + 32];   \
 ppc_vsr_t t = *xt;\
 bool vxsnan_flag = false, vex_flag = false;   \
   \
@@ -2657,11 +2649,9 @@ VSX_MAX_MINC(xsmaxcdp, 1);
 VSX_MAX_MINC(xsmincdp, 0);
 
 #define VSX_MAX_MINJ(name, max)   \
-void helper_##name(CPUPPCState *env, uint32_t opcode) \
+void helper_##name(CPUPPCState *env, uint32_t opcode, \
+   ppc_vsr_t *xt, ppc_vsr_t *xa, ppc_vsr_t *xb)   \
 { \
-ppc_vsr_t *xt = &env->vsr[rD(opcode) + 32];   \
-ppc_vsr_t *xa = &env->vsr[rA(opcode) + 32];   \
-ppc_vsr_t *xb = &env->vsr[rB(opcode) + 32];   \
 ppc_vsr_t t = *xt;\
 bool vxsnan_flag = false, vex_flag = false;   \
   \
@@ -3436,11 +3426,9 @@ void helper_xssqrtqp(CPUPPCState *env, uint32_t opcode)
 do_float_check_status(env, GETPC());
 }
 
-void helper_xssubqp(CPUPPCState *env, uint32_t opcode)
+void helper_xssubqp(CPUPPCState *env, uint32_t opcode,
+ppc_vsr_t *xt, ppc_vsr_t *xa, ppc_vsr_t *xb)
 {
-ppc_vsr_t *xt = &env->vsr[rD(opcode) + 32];
-ppc_vsr_t *xa = &env->vsr[rA(opcode) + 32];
-ppc_vsr_t *xb = &env->vsr[rB(opcode) + 32];
 ppc_vsr_t t = *xt;
 float_status tstat;
 
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index a8886c56ad..9134da9cbb 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -366,12 +366,12 @@ DEF_HELPER_4(bcdtrunc, i32, avr, avr, avr, i32)
 

[Qemu-devel] [PATCH v2 03/15] target/ppc: remove getVSR()/putVSR() from int_helper.c

2019-06-02 Thread Mark Cave-Ayland
Since commit 8a14d31b00 "target/ppc: switch fpr/vsrl registers so all VSX
registers are in host endian order", functions getVSR() and putVSR(), which were
used to convert the VSR registers into host endian order, are no longer required.

Now that there are no more users of getVSR()/putVSR(), these functions can
be completely removed.

Signed-off-by: Mark Cave-Ayland 
---
 target/ppc/int_helper.c | 22 ++
 target/ppc/internal.h   | 12 
 2 files changed, 10 insertions(+), 24 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 8ce89f2ad9..3b8939edcc 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1902,38 +1902,36 @@ VEXTRACT(d, u64)
 void helper_xxextractuw(CPUPPCState *env, target_ulong xtn,
 target_ulong xbn, uint32_t index)
 {
-ppc_vsr_t xt, xb;
+ppc_vsr_t *xt = &env->vsr[xtn];
+ppc_vsr_t *xb = &env->vsr[xbn];
+ppc_vsr_t t = { };
 size_t es = sizeof(uint32_t);
 uint32_t ext_index;
 int i;
 
-getVSR(xbn, &xb, env);
-memset(&xt, 0, sizeof(xt));
-
 ext_index = index;
 for (i = 0; i < es; i++, ext_index++) {
-xt.VsrB(8 - es + i) = xb.VsrB(ext_index % 16);
+t.VsrB(8 - es + i) = xb->VsrB(ext_index % 16);
 }
 
-putVSR(xtn, &xt, env);
+*xt = t;
 }
 
 void helper_xxinsertw(CPUPPCState *env, target_ulong xtn,
   target_ulong xbn, uint32_t index)
 {
-ppc_vsr_t xt, xb;
+ppc_vsr_t *xt = &env->vsr[xtn];
+ppc_vsr_t *xb = &env->vsr[xbn];
+ppc_vsr_t t = *xt;
 size_t es = sizeof(uint32_t);
 int ins_index, i = 0;
 
-getVSR(xbn, &xb, env);
-getVSR(xtn, &xt, env);
-
 ins_index = index;
 for (i = 0; i < es && ins_index < 16; i++, ins_index++) {
-xt.VsrB(ins_index) = xb.VsrB(8 - es + i);
+t.VsrB(ins_index) = xb->VsrB(8 - es + i);
 }
 
-putVSR(xtn, &xt, env);
+*xt = t;
 }
 
 #define VEXT_SIGNED(name, element, cast)\
diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index fb6f64ed1e..d3d327e548 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -204,18 +204,6 @@ EXTRACT_HELPER(IMM8, 11, 8);
 EXTRACT_HELPER(DCMX, 16, 7);
 EXTRACT_HELPER_SPLIT_3(DCMX_XV, 5, 16, 0, 1, 2, 5, 1, 6, 6);
 
-static inline void getVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
-{
-vsr->VsrD(0) = env->vsr[n].VsrD(0);
-vsr->VsrD(1) = env->vsr[n].VsrD(1);
-}
-
-static inline void putVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
-{
-env->vsr[n].VsrD(0) = vsr->VsrD(0);
-env->vsr[n].VsrD(1) = vsr->VsrD(1);
-}
-
 void helper_compute_fprf_float16(CPUPPCState *env, float16 arg);
 void helper_compute_fprf_float32(CPUPPCState *env, float32 arg);
 void helper_compute_fprf_float128(CPUPPCState *env, float128 arg);
-- 
2.11.0




[Qemu-devel] [PATCH v2 00/15] target/ppc: remove getVSR()/putVSR() and further tidy-up

2019-06-02 Thread Mark Cave-Ayland
With the conversion of PPC VSX registers to host endian during the 4.0
development cycle, the VSX helpers getVSR() and putVSR(), which were used to
convert between big endian and host endian (and are currently just no-ops),
can now be removed. This eliminates an extra copy for each VSX source
register at runtime.
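The copy that gets eliminated can be sketched in plain C (the struct, register file, and function names below are invented stand-ins for QEMU's `ppc_vsr_t` and `env->vsr[]`, not the real code). Note the new style still builds the result in a working copy `t`, preserving the old copy-out semantics when the destination aliases a source:

```c
#include <assert.h>
#include <string.h>

typedef struct { unsigned long long d[2]; } vsr_t;  /* stand-in for ppc_vsr_t */
static vsr_t regs[64];                              /* stand-in register file */

/* Old style: copy each source out of the register file and the result back,
 * as getVSR()/putVSR() did. */
static void add_copying(int rt, int ra, int rb)
{
    vsr_t xt, xa, xb;
    memcpy(&xa, &regs[ra], sizeof(xa));   /* getVSR() equivalent */
    memcpy(&xb, &regs[rb], sizeof(xb));   /* getVSR() equivalent */
    xt.d[0] = xa.d[0] + xb.d[0];
    xt.d[1] = xa.d[1] + xb.d[1];
    memcpy(&regs[rt], &xt, sizeof(xt));   /* putVSR() equivalent */
}

/* New style: operate through pointers straight into the register file.
 * The working copy t is written to *xt only at the end, so rt == ra/rb
 * behaves exactly as before. */
static void add_pointers(vsr_t *xt, const vsr_t *xa, const vsr_t *xb)
{
    vsr_t t;
    t.d[0] = xa->d[0] + xb->d[0];
    t.d[1] = xa->d[1] + xb->d[1];
    *xt = t;
}
```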

Patches 1-3 do the elimination work on a per-file basis and switch VSX register
accesses to be via pointers rather than via copies managed using
getVSR()/putVSR().

After this, patches 4-14 change the VSX registers to be passed to helpers via
pointers rather than by register number, so that the decode of the vector
register pointers occurs at translation time instead of at runtime. This
matches how VMX instructions are currently decoded.

Finally, patch 15 performs some related tidy-up around VSX_FMADD, which decodes
the A or M form at translation time, allowing a single helper function to be
used for both implementations.

Greg: I've added you as CC since you managed to find a bug in my last series.
This one is much more mechanical, but if you can confirm it doesn't introduce
any regressions in your test images, that would be great.

Signed-off-by: Mark Cave-Ayland 

v2:
- Rebase onto master
- Use working copy of VSX destination registers in patches 1-3 to keep current
  semantics where src == dest and exception handling
- Add patches 4 and 6 to split out helper functions still requiring an opcode
  parameter
- Remove opcode parameter from GEN_VSX_HELPER_X3 and GEN_VSX_HELPER_X2 as it
  isn't required for the common case
- Drop VSX_TEST_DC improvement patch since it is no longer applicable with the
  removal of opcode from the above macros
- Rework VSX_MADD improvement patch to use a single helper for both a and m
  forms as suggested by Richard


Mark Cave-Ayland (15):
  target/ppc: remove getVSR()/putVSR() from fpu_helper.c
  target/ppc: remove getVSR()/putVSR() from mem_helper.c
  target/ppc: remove getVSR()/putVSR() from int_helper.c
  target/ppc: introduce separate VSX_CMP macro for xvcmp* instructions
  target/ppc: introduce GEN_VSX_HELPER_X3 macro to fpu_helper.c
  target/ppc: introduce separate generator and helper for xscvqpdp
  target/ppc: introduce GEN_VSX_HELPER_X2 macro to fpu_helper.c
  target/ppc: introduce GEN_VSX_HELPER_X2_AB macro to fpu_helper.c
  target/ppc: introduce GEN_VSX_HELPER_X1 macro to fpu_helper.c
  target/ppc: introduce GEN_VSX_HELPER_R3 macro to fpu_helper.c
  target/ppc: introduce GEN_VSX_HELPER_R2 macro to fpu_helper.c
  target/ppc: introduce GEN_VSX_HELPER_R2_AB macro to fpu_helper.c
  target/ppc: decode target register in VSX_VECTOR_LOAD_STORE_LENGTH at
translation time
  target/ppc: decode target register in VSX_EXTRACT_INSERT at
translation time
  target/ppc: improve VSX_FMADD with new GEN_VSX_HELPER_VSX_MADD macro

 target/ppc/fpu_helper.c | 841 
 target/ppc/helper.h | 320 +++---
 target/ppc/int_helper.c |  26 +-
 target/ppc/internal.h   |  12 -
 target/ppc/mem_helper.c |  27 +-
 target/ppc/translate/vsx-impl.inc.c | 567 
 target/ppc/translate/vsx-ops.inc.c  |  70 +--
 7 files changed, 954 insertions(+), 909 deletions(-)

-- 
2.11.0




[Qemu-devel] [PATCH v2 05/15] target/ppc: introduce GEN_VSX_HELPER_X3 macro to fpu_helper.c

2019-06-02 Thread Mark Cave-Ayland
Rather than perform the VSR register decoding within the helper itself,
introduce a new GEN_VSX_HELPER_X3 macro which performs the decode based
upon xT, xA and xB at translation time.

With the previous changes to the VSX_CMP generator and helper macros the
opcode parameter is no longer required in the common case and can be
removed.

Signed-off-by: Mark Cave-Ayland 
---
 target/ppc/fpu_helper.c |  42 ---
 target/ppc/helper.h | 120 +++
 target/ppc/translate/vsx-impl.inc.c | 137 
 3 files changed, 151 insertions(+), 148 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 4b9b695333..386db30681 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -1801,11 +1801,9 @@ uint32_t helper_efdcmpeq(CPUPPCState *env, uint64_t op1, 
uint64_t op2)
  *   sfprf - set FPRF
  */
 #define VSX_ADD_SUB(name, op, nels, tp, fld, sfprf, r2sp)\
-void helper_##name(CPUPPCState *env, uint32_t opcode)\
+void helper_##name(CPUPPCState *env, ppc_vsr_t *xt,  \
+   ppc_vsr_t *xa, ppc_vsr_t *xb) \
 {\
-ppc_vsr_t *xt = &env->vsr[xT(opcode)];   \
-ppc_vsr_t *xa = &env->vsr[xA(opcode)];   \
-ppc_vsr_t *xb = &env->vsr[xB(opcode)];   \
 ppc_vsr_t t = *xt;   \
 int i;   \
  \
@@ -1884,11 +1882,9 @@ void helper_xsaddqp(CPUPPCState *env, uint32_t opcode)
  *   sfprf - set FPRF
  */
 #define VSX_MUL(op, nels, tp, fld, sfprf, r2sp)  \
-void helper_##op(CPUPPCState *env, uint32_t opcode)  \
+void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,\
+ ppc_vsr_t *xa, ppc_vsr_t *xb)   \
 {\
-ppc_vsr_t *xt = &env->vsr[xT(opcode)];   \
-ppc_vsr_t *xa = &env->vsr[xA(opcode)];   \
-ppc_vsr_t *xb = &env->vsr[xB(opcode)];   \
 ppc_vsr_t t = *xt;   \
 int i;   \
  \
@@ -1962,11 +1958,9 @@ void helper_xsmulqp(CPUPPCState *env, uint32_t opcode)
  *   sfprf - set FPRF
  */
 #define VSX_DIV(op, nels, tp, fld, sfprf, r2sp)   \
-void helper_##op(CPUPPCState *env, uint32_t opcode)   \
+void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, \
+ ppc_vsr_t *xa, ppc_vsr_t *xb)\
 { \
-ppc_vsr_t *xt = &env->vsr[xT(opcode)];\
-ppc_vsr_t *xa = &env->vsr[xA(opcode)];\
-ppc_vsr_t *xb = &env->vsr[xB(opcode)];\
 ppc_vsr_t t = *xt;\
 int i;\
   \
@@ -2304,11 +2298,9 @@ VSX_TSQRT(xvtsqrtsp, 4, float32, VsrW(i), -126, 23)
  *   sfprf - set FPRF
  */
 #define VSX_MADD(op, nels, tp, fld, maddflgs, afrm, sfprf, r2sp)  \
-void helper_##op(CPUPPCState *env, uint32_t opcode)   \
+void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, \
+ ppc_vsr_t *xa, ppc_vsr_t *xb)\
 { \
-ppc_vsr_t *xt = &env->vsr[xT(opcode)];\
-ppc_vsr_t *xa = &env->vsr[xA(opcode)];\
-ppc_vsr_t *xb = &env->vsr[xB(opcode)];\
 ppc_vsr_t t = *xt, *b, *c;\
 int i;\
   \
@@ -2402,11 +2394,9 @@ VSX_MADD(xvnmsubmsp, 4, float32, VsrW(i), NMSUB_FLGS, 0, 
0, 0)
  *   svxvc - set VXVC bit
  */
 #define VSX_SCALAR_CMP_DP(op, cmp, exp, svxvc)\
-void helper_##op(CPU

[Qemu-devel] [PATCH v2 01/15] target/ppc: remove getVSR()/putVSR() from fpu_helper.c

2019-06-02 Thread Mark Cave-Ayland
Since commit 8a14d31b00 "target/ppc: switch fpr/vsrl registers so all VSX
registers are in host endian order", functions getVSR() and putVSR(), which were
used to convert the VSR registers into host endian order, are no longer required.

Signed-off-by: Mark Cave-Ayland 
---
 target/ppc/fpu_helper.c | 762 +++-
 1 file changed, 366 insertions(+), 396 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 0b7308f539..5edf913a89 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -1803,35 +1803,35 @@ uint32_t helper_efdcmpeq(CPUPPCState *env, uint64_t 
op1, uint64_t op2)
 #define VSX_ADD_SUB(name, op, nels, tp, fld, sfprf, r2sp)\
 void helper_##name(CPUPPCState *env, uint32_t opcode)\
 {\
-ppc_vsr_t xt, xa, xb;\
+ppc_vsr_t *xt = &env->vsr[xT(opcode)];   \
+ppc_vsr_t *xa = &env->vsr[xA(opcode)];   \
+ppc_vsr_t *xb = &env->vsr[xB(opcode)];   \
+ppc_vsr_t t = *xt;   \
 int i;   \
  \
-getVSR(xA(opcode), &xa, env);\
-getVSR(xB(opcode), &xb, env);\
-getVSR(xT(opcode), &xt, env);\
 helper_reset_fpstatus(env);  \
  \
 for (i = 0; i < nels; i++) { \
 float_status tstat = env->fp_status; \
 set_float_exception_flags(0, &tstat);\
-xt.fld = tp##_##op(xa.fld, xb.fld, &tstat);  \
+t.fld = tp##_##op(xa->fld, xb->fld, &tstat); \
 env->fp_status.float_exception_flags |= tstat.float_exception_flags; \
  \
 if (unlikely(tstat.float_exception_flags & float_flag_invalid)) {\
 float_invalid_op_addsub(env, sfprf, GETPC(), \
-tp##_classify(xa.fld) |  \
-tp##_classify(xb.fld));  \
+tp##_classify(xa->fld) | \
+tp##_classify(xb->fld)); \
 }\
  \
 if (r2sp) {  \
-xt.fld = helper_frsp(env, xt.fld);   \
+t.fld = helper_frsp(env, t.fld); \
 }\
  \
 if (sfprf) { \
-helper_compute_fprf_float64(env, xt.fld);\
+helper_compute_fprf_float64(env, t.fld); \
 }\
 }\
-putVSR(xT(opcode), &xt, env);\
+*xt = t; \
 do_float_check_status(env, GETPC()); \
 }
 
@@ -1846,12 +1846,12 @@ VSX_ADD_SUB(xvsubsp, sub, 4, float32, VsrW(i), 0, 0)
 
 void helper_xsaddqp(CPUPPCState *env, uint32_t opcode)
 {
-ppc_vsr_t xt, xa, xb;
+ppc_vsr_t *xt = &env->vsr[rD(opcode) + 32];
+ppc_vsr_t *xa = &env->vsr[rA(opcode) + 32];
+ppc_vsr_t *xb = &env->vsr[rB(opcode) + 32];
+ppc_vsr_t t = *xt;
 float_status tstat;
 
-getVSR(rA(opcode) + 32, &xa, env);
-getVSR(rB(opcode) + 32, &xb, env);
-getVSR(rD(opcode) + 32, &xt, env);
 helper_reset_fpstatus(env);
 
 tstat = env->fp_status;
@@ -1860,18 +1860,18 @@ void helper_xsaddqp(CPUPPCState *env, uint32_t opcode)
 }
 
 set_float_exception_flags(0, &tstat);
-xt.f128 = float128_add(xa.f128, xb.f128, &tstat);
+t.f128 = float128_add(xa->f128, xb->f128, &tstat);
 env->fp_status.float_exception_flags |= tstat.float_exception_flags;
 
 if (unlikely(tstat.float_exc

[Qemu-devel] [PATCH v2 04/15] target/ppc: introduce separate VSX_CMP macro for xvcmp* instructions

2019-06-02 Thread Mark Cave-Ayland
Rather than perform the VSR register decoding within the helper itself,
introduce a new VSX_CMP macro which performs the decode based upon xT, xA
and xB at translation time.

Subsequent commits will make the same changes for other instructions; however,
the xvcmp* instructions are different in that they return a set of flags to be
optionally written back to the crf[6] register. Move this logic from the
helper function to the generator function, along with the float_status update.
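The split can be illustrated with a minimal C sketch (element types, function names, and the record-bit parameter below are invented for the example, not QEMU's): the helper computes and *returns* the CR6 value, and the generator-side code decides whether to store it, mirroring how the patch moves the `(opcode >> (31 - 21)) & 1` check out of the helper:

```c
#include <assert.h>

/* Helper side: do the element-wise compare and return the CR6 flags
 * (0x8 = all true, 0x2 = all false) instead of writing them itself. */
static unsigned cmp_all_eq(const int *a, const int *b, int n, int *out)
{
    int all_true = 1, all_false = 1;
    for (int i = 0; i < n; i++) {
        out[i] = (a[i] == b[i]) ? -1 : 0;   /* all-ones / all-zeros mask */
        if (out[i]) {
            all_false = 0;
        } else {
            all_true = 0;
        }
    }
    return (all_true ? 0x8u : 0) | (all_false ? 0x2u : 0);
}

/* Generator side: write CR6 only when the record bit is set. */
static void gen_cmp(int rc_bit, const int *a, const int *b, int n,
                    int *out, unsigned *crf6)
{
    unsigned flags = cmp_all_eq(a, b, n, out);
    if (rc_bit) {
        *crf6 = flags;
    }
}
```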

Signed-off-by: Mark Cave-Ayland 
---
 target/ppc/fpu_helper.c | 15 +---
 target/ppc/helper.h | 20 +--
 target/ppc/translate/vsx-impl.inc.c | 49 +++--
 3 files changed, 59 insertions(+), 25 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 5edf913a89..4b9b695333 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2746,12 +2746,11 @@ VSX_MAX_MINJ(xsminjdp, 0);
  *   exp   - expected result of comparison
  */
 #define VSX_CMP(op, nels, tp, fld, cmp, svxvc, exp)   \
-void helper_##op(CPUPPCState *env, uint32_t opcode)   \
+uint32_t helper_##op(CPUPPCState *env, ppc_vsr_t *xt, \
+ ppc_vsr_t *xa, ppc_vsr_t *xb)\
 { \
-ppc_vsr_t *xt = &env->vsr[xT(opcode)];\
-ppc_vsr_t *xa = &env->vsr[xA(opcode)];\
-ppc_vsr_t *xb = &env->vsr[xB(opcode)];\
 ppc_vsr_t t = *xt;\
+uint32_t crf6 = 0;\
 int i;\
 int all_true = 1; \
 int all_false = 1;\
@@ -2780,11 +2779,9 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)  
 \
 } \
   \
 *xt = t;  \
-if ((opcode >> (31 - 21)) & 1) {  \
-env->crf[6] = (all_true ? 0x8 : 0) | (all_false ? 0x2 : 0);   \
-} \
-do_float_check_status(env, GETPC());  \
- }
+crf6 = (all_true ? 0x8 : 0) | (all_false ? 0x2 : 0);  \
+return crf6;  \
+}
 
 VSX_CMP(xvcmpeqdp, 2, float64, VsrD(i), eq, 0, 1)
 VSX_CMP(xvcmpgedp, 2, float64, VsrD(i), le, 1, 1)
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 02b67a333e..8666415169 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -108,6 +108,10 @@ DEF_HELPER_FLAGS_1(ftsqrt, TCG_CALL_NO_RWG_SE, i32, i64)
 #define dh_ctype_avr ppc_avr_t *
 #define dh_is_signed_avr dh_is_signed_ptr
 
+#define dh_alias_vsr ptr
+#define dh_ctype_vsr ppc_vsr_t *
+#define dh_is_signed_vsr dh_is_signed_ptr
+
 DEF_HELPER_3(vavgub, void, avr, avr, avr)
 DEF_HELPER_3(vavguh, void, avr, avr, avr)
 DEF_HELPER_3(vavguw, void, avr, avr, avr)
@@ -468,10 +472,10 @@ DEF_HELPER_2(xvnmsubadp, void, env, i32)
 DEF_HELPER_2(xvnmsubmdp, void, env, i32)
 DEF_HELPER_2(xvmaxdp, void, env, i32)
 DEF_HELPER_2(xvmindp, void, env, i32)
-DEF_HELPER_2(xvcmpeqdp, void, env, i32)
-DEF_HELPER_2(xvcmpgedp, void, env, i32)
-DEF_HELPER_2(xvcmpgtdp, void, env, i32)
-DEF_HELPER_2(xvcmpnedp, void, env, i32)
+DEF_HELPER_FLAGS_4(xvcmpeqdp, TCG_CALL_NO_RWG, i32, env, vsr, vsr, vsr)
+DEF_HELPER_FLAGS_4(xvcmpgedp, TCG_CALL_NO_RWG, i32, env, vsr, vsr, vsr)
+DEF_HELPER_FLAGS_4(xvcmpgtdp, TCG_CALL_NO_RWG, i32, env, vsr, vsr, vsr)
+DEF_HELPER_FLAGS_4(xvcmpnedp, TCG_CALL_NO_RWG, i32, env, vsr, vsr, vsr)
 DEF_HELPER_2(xvcvdpsp, void, env, i32)
 DEF_HELPER_2(xvcvdpsxds, void, env, i32)
 DEF_HELPER_2(xvcvdpsxws, void, env, i32)
@@ -506,10 +510,10 @@ DEF_HELPER_2(xvnmsubasp, void, env, i32)
 DEF_HELPER_2(xvnmsubmsp, void, env, i32)
 DEF_HELPER_2(xvmaxsp, void, env, i32)
 DEF_HELPER_2(xvminsp, void, env, i32)
-DEF_HELPER_2(xvcmpeqsp, void, env, i32)
-DEF_HELPER_2(xvcmpgesp, void, env, i32)
-DEF_HELPER_2(xvcmpgtsp, void, env, i32)
-DEF_HELPER_2(xvcmpnesp, void, env, i32)
+DEF_HELPER_FLAGS_4(xvcmpeqsp, TCG_CALL_NO_RWG, i32, env, vsr, vsr, vsr)
+DEF_HELPER_FLAGS_4(xvcmpgesp, TCG_CALL_NO_RWG, i32, env, vsr, vsr, vsr)
+DEF_HELPER_FLAGS_4(xvcmpgtsp, TCG_CALL_NO_RWG, i32, env, vsr, vsr, vsr)
+DEF_HELPER_FLAGS_4(xvcmpnesp, TCG_CALL_NO_RWG, i32, env, vsr, vsr, vsr)
 DEF_HELPER_2(xvcvspdp, void, env, i32)
 DEF_HELPER_2(xvcvsphp, void, env, i32)
 DEF_HELPER_2(xvcvhpsp, void, env, i32)
diff --git a

[Qemu-devel] [PATCH v2 09/15] target/ppc: introduce GEN_VSX_HELPER_X1 macro to fpu_helper.c

2019-06-02 Thread Mark Cave-Ayland
Rather than perform the VSR register decoding within the helper itself,
introduce a new GEN_VSX_HELPER_X1 macro which performs the decode based
upon xB at translation time.

Signed-off-by: Mark Cave-Ayland 
---
 target/ppc/fpu_helper.c |  6 ++
 target/ppc/helper.h |  8 
 target/ppc/translate/vsx-impl.inc.c | 24 
 3 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 628621d1b2..953d57d34e 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2236,9 +2236,8 @@ VSX_TDIV(xvtdivsp, 4, float32, VsrW(i), -126, 127, 23)
  *   nbits - number of fraction bits
  */
 #define VSX_TSQRT(op, nels, tp, fld, emin, nbits)   \
-void helper_##op(CPUPPCState *env, uint32_t opcode) \
+void helper_##op(CPUPPCState *env, uint32_t opcode, ppc_vsr_t *xb)  \
 {   \
-ppc_vsr_t *xb = &env->vsr[xB(opcode)];  \
 int i;  \
 int fe_flag = 0;\
 int fg_flag = 0;\
@@ -3258,9 +3257,8 @@ VSX_TEST_DC(xvtstdcsp, 4, xB(opcode), float32, VsrW(i), VsrW(i), UINT32_MAX, 0)
 VSX_TEST_DC(xststdcdp, 1, xB(opcode), float64, VsrD(0), VsrD(0), 0, 1)
 VSX_TEST_DC(xststdcqp, 1, (rB(opcode) + 32), float128, f128, VsrD(0), 0, 1)
 
-void helper_xststdcsp(CPUPPCState *env, uint32_t opcode)
+void helper_xststdcsp(CPUPPCState *env, uint32_t opcode, ppc_vsr_t *xb)
 {
-ppc_vsr_t *xb = &env->vsr[xB(opcode)];
 uint32_t dcmx, sign, exp;
 uint32_t cc, match = 0, not_sp = 0;
 
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 0ab1ef2aee..a8886c56ad 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -376,7 +376,7 @@ DEF_HELPER_3(xsredp, void, env, vsr, vsr)
 DEF_HELPER_3(xssqrtdp, void, env, vsr, vsr)
 DEF_HELPER_3(xsrsqrtedp, void, env, vsr, vsr)
 DEF_HELPER_4(xstdivdp, void, env, i32, vsr, vsr)
-DEF_HELPER_2(xstsqrtdp, void, env, i32)
+DEF_HELPER_3(xstsqrtdp, void, env, i32, vsr)
 DEF_HELPER_4(xsmaddadp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xsmaddmdp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xsmsubadp, void, env, vsr, vsr, vsr)
@@ -423,7 +423,7 @@ DEF_HELPER_3(xscvuxdsp, void, env, vsr, vsr)
 DEF_HELPER_3(xscvsxdsp, void, env, vsr, vsr)
 DEF_HELPER_2(xscvudqp, void, env, i32)
 DEF_HELPER_3(xscvuxddp, void, env, vsr, vsr)
-DEF_HELPER_2(xststdcsp, void, env, i32)
+DEF_HELPER_3(xststdcsp, void, env, i32, vsr)
 DEF_HELPER_2(xststdcdp, void, env, i32)
 DEF_HELPER_2(xststdcqp, void, env, i32)
 DEF_HELPER_3(xsrdpi, void, env, vsr, vsr)
@@ -461,7 +461,7 @@ DEF_HELPER_3(xvredp, void, env, vsr, vsr)
 DEF_HELPER_3(xvsqrtdp, void, env, vsr, vsr)
 DEF_HELPER_3(xvrsqrtedp, void, env, vsr, vsr)
 DEF_HELPER_4(xvtdivdp, void, env, i32, vsr, vsr)
-DEF_HELPER_2(xvtsqrtdp, void, env, i32)
+DEF_HELPER_3(xvtsqrtdp, void, env, i32, vsr)
 DEF_HELPER_4(xvmaddadp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xvmaddmdp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xvmsubadp, void, env, vsr, vsr, vsr)
@@ -499,7 +499,7 @@ DEF_HELPER_3(xvresp, void, env, vsr, vsr)
 DEF_HELPER_3(xvsqrtsp, void, env, vsr, vsr)
 DEF_HELPER_3(xvrsqrtesp, void, env, vsr, vsr)
 DEF_HELPER_4(xvtdivsp, void, env, i32, vsr, vsr)
-DEF_HELPER_2(xvtsqrtsp, void, env, i32)
+DEF_HELPER_3(xvtsqrtsp, void, env, i32, vsr)
 DEF_HELPER_4(xvmaddasp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xvmaddmsp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xvmsubasp, void, env, vsr, vsr, vsr)
diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index d8e9b80d4a..e4831700bf 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -1078,6 +1078,22 @@ static void gen_##name(DisasContext *ctx) \
 tcg_temp_free_ptr(xb);\
 }
 
+#define GEN_VSX_HELPER_X1(name, op1, op2, inval, type)\
+static void gen_##name(DisasContext *ctx) \
+{ \
+TCGv_i32 opc; \
+TCGv_ptr xb;  \
+if (unlikely(!ctx->vsx_enabled)) {\
+gen_exception(ctx, POWERPC_EXCP_VSXU);\
+return;   \
+} \
+opc = tcg_const_i32(ctx->opcode); \
+xb = gen_vsr_ptr(xB(ctx->opcode));\
+gen_h

[Qemu-devel] [PATCH v2 07/15] target/ppc: introduce GEN_VSX_HELPER_X2 macro to fpu_helper.c

2019-06-02 Thread Mark Cave-Ayland
Rather than perform the VSR register decoding within the helper itself,
introduce a new GEN_VSX_HELPER_X2 macro which performs the decode based
upon xT and xB at translation time.

With the previous change to the xscvqpdp generator and helper functions the
opcode parameter is no longer required in the common case and can be
removed.

Signed-off-by: Mark Cave-Ayland 
---
 target/ppc/fpu_helper.c |  36 +++---
 target/ppc/helper.h | 120 
 target/ppc/translate/vsx-impl.inc.c | 135 
 3 files changed, 144 insertions(+), 147 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index a6556781e1..c7f5b49e03 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2040,10 +2040,8 @@ void helper_xsdivqp(CPUPPCState *env, uint32_t opcode)
  *   sfprf - set FPRF
  */
 #define VSX_RE(op, nels, tp, fld, sfprf, r2sp)\
-void helper_##op(CPUPPCState *env, uint32_t opcode)   \
+void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)  \
 { \
-ppc_vsr_t *xt = &env->vsr[xT(opcode)];\
-ppc_vsr_t *xb = &env->vsr[xB(opcode)];\
 ppc_vsr_t t = *xt;\
 int i;\
   \
@@ -2082,10 +2080,8 @@ VSX_RE(xvresp, 4, float32, VsrW(i), 0, 0)
  *   sfprf - set FPRF
  */
 #define VSX_SQRT(op, nels, tp, fld, sfprf, r2sp) \
-void helper_##op(CPUPPCState *env, uint32_t opcode)  \
+void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb) \
 {\
-ppc_vsr_t *xt = &env->vsr[xT(opcode)];   \
-ppc_vsr_t *xb = &env->vsr[xB(opcode)];   \
 ppc_vsr_t t = *xt;   \
 int i;   \
  \
@@ -2132,10 +2128,8 @@ VSX_SQRT(xvsqrtsp, 4, float32, VsrW(i), 0, 0)
  *   sfprf - set FPRF
  */
 #define VSX_RSQRTE(op, nels, tp, fld, sfprf, r2sp)   \
-void helper_##op(CPUPPCState *env, uint32_t opcode)  \
+void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb) \
 {\
-ppc_vsr_t *xt = &env->vsr[xT(opcode)];   \
-ppc_vsr_t *xb = &env->vsr[xB(opcode)];   \
 ppc_vsr_t t = *xt;   \
 int i;   \
  \
@@ -2791,10 +2785,8 @@ VSX_CMP(xvcmpnesp, 4, float32, VsrW(i), eq, 0, 0)
  *   sfprf - set FPRF
  */
 #define VSX_CVT_FP_TO_FP(op, nels, stp, ttp, sfld, tfld, sfprf)\
-void helper_##op(CPUPPCState *env, uint32_t opcode)\
+void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)   \
 {  \
-ppc_vsr_t *xt = &env->vsr[xT(opcode)]; \
-ppc_vsr_t *xb = &env->vsr[xB(opcode)]; \
 ppc_vsr_t t = *xt; \
 int i; \
\
@@ -2867,10 +2859,8 @@ VSX_CVT_FP_TO_FP_VECTOR(xscvdpqp, 1, float64, float128, VsrD(0), f128, 1)
  *   sfprf - set FPRF
  */
 #define VSX_CVT_FP_TO_FP_HP(op, nels, stp, ttp, sfld, tfld, sfprf) \
-void helper_##op(CPUPPCState *env, uint32_t opcode)\
+void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)   \
 {  \
-ppc_vsr_t *xt = &env->vsr[xT(opcode)]; \
-ppc_vsr_t *xb = &env->vsr[xB(opcode)]; \
 ppc_vsr_t t = { }; \
 int i; \
\
@@ -2949,11 +2939,9 @@ uint64_t helper_xscvspdpn(CPUPPCState *env, uint64_t xb)
  *   rnan  - resulting NaN
  */
 #define VSX_CVT_FP_TO_INT(op, nels, stp, ttp, sfld, tfld, rnan)  \
-void helper_##op(CPUPPCState *env, uint32_t opcode) 

[Qemu-devel] [PATCH v2 08/15] target/ppc: introduce GEN_VSX_HELPER_X2_AB macro to fpu_helper.c

2019-06-02 Thread Mark Cave-Ayland
Rather than perform the VSR register decoding within the helper itself,
introduce a new GEN_VSX_HELPER_X2_AB macro which performs the decode based
upon xA and xB at translation time.

Signed-off-by: Mark Cave-Ayland 
---
 target/ppc/fpu_helper.c | 15 ++-
 target/ppc/helper.h | 12 ++--
 target/ppc/translate/vsx-impl.inc.c | 30 --
 3 files changed, 36 insertions(+), 21 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index c7f5b49e03..628621d1b2 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2179,10 +2179,9 @@ VSX_RSQRTE(xvrsqrtesp, 4, float32, VsrW(i), 0, 0)
  *   nbits - number of fraction bits
  */
 #define VSX_TDIV(op, nels, tp, fld, emin, emax, nbits)  \
-void helper_##op(CPUPPCState *env, uint32_t opcode) \
+void helper_##op(CPUPPCState *env, uint32_t opcode, \
+ ppc_vsr_t *xa, ppc_vsr_t *xb)  \
 {   \
-ppc_vsr_t *xa = &env->vsr[xA(opcode)];  \
-ppc_vsr_t *xb = &env->vsr[xB(opcode)];  \
 int i;  \
 int fe_flag = 0;\
 int fg_flag = 0;\
@@ -2431,10 +2430,9 @@ VSX_SCALAR_CMP_DP(xscmpgedp, le, 1, 1)
 VSX_SCALAR_CMP_DP(xscmpgtdp, lt, 1, 1)
 VSX_SCALAR_CMP_DP(xscmpnedp, eq, 0, 0)
 
-void helper_xscmpexpdp(CPUPPCState *env, uint32_t opcode)
+void helper_xscmpexpdp(CPUPPCState *env, uint32_t opcode,
+   ppc_vsr_t *xa, ppc_vsr_t *xb)
 {
-ppc_vsr_t *xa = &env->vsr[xA(opcode)];
-ppc_vsr_t *xb = &env->vsr[xB(opcode)];
 int64_t exp_a, exp_b;
 uint32_t cc;
 
@@ -2492,10 +2490,9 @@ void helper_xscmpexpqp(CPUPPCState *env, uint32_t opcode)
 }
 
 #define VSX_SCALAR_CMP(op, ordered)  \
-void helper_##op(CPUPPCState *env, uint32_t opcode)  \
+void helper_##op(CPUPPCState *env, uint32_t opcode,  \
+ ppc_vsr_t *xa, ppc_vsr_t *xb)   \
 {\
-ppc_vsr_t *xa = &env->vsr[xA(opcode)];   \
-ppc_vsr_t *xb = &env->vsr[xB(opcode)];   \
 uint32_t cc = 0; \
 bool vxsnan_flag = false, vxvc_flag = false; \
  \
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index f56476ec41..0ab1ef2aee 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -375,7 +375,7 @@ DEF_HELPER_2(xsdivqp, void, env, i32)
 DEF_HELPER_3(xsredp, void, env, vsr, vsr)
 DEF_HELPER_3(xssqrtdp, void, env, vsr, vsr)
 DEF_HELPER_3(xsrsqrtedp, void, env, vsr, vsr)
-DEF_HELPER_2(xstdivdp, void, env, i32)
+DEF_HELPER_4(xstdivdp, void, env, i32, vsr, vsr)
 DEF_HELPER_2(xstsqrtdp, void, env, i32)
 DEF_HELPER_4(xsmaddadp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xsmaddmdp, void, env, vsr, vsr, vsr)
@@ -389,10 +389,10 @@ DEF_HELPER_4(xscmpeqdp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xscmpgtdp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xscmpgedp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xscmpnedp, void, env, vsr, vsr, vsr)
-DEF_HELPER_2(xscmpexpdp, void, env, i32)
+DEF_HELPER_4(xscmpexpdp, void, env, i32, vsr, vsr)
 DEF_HELPER_2(xscmpexpqp, void, env, i32)
-DEF_HELPER_2(xscmpodp, void, env, i32)
-DEF_HELPER_2(xscmpudp, void, env, i32)
+DEF_HELPER_4(xscmpodp, void, env, i32, vsr, vsr)
+DEF_HELPER_4(xscmpudp, void, env, i32, vsr, vsr)
 DEF_HELPER_2(xscmpoqp, void, env, i32)
 DEF_HELPER_2(xscmpuqp, void, env, i32)
 DEF_HELPER_4(xsmaxdp, void, env, vsr, vsr, vsr)
@@ -460,7 +460,7 @@ DEF_HELPER_4(xvdivdp, void, env, vsr, vsr, vsr)
 DEF_HELPER_3(xvredp, void, env, vsr, vsr)
 DEF_HELPER_3(xvsqrtdp, void, env, vsr, vsr)
 DEF_HELPER_3(xvrsqrtedp, void, env, vsr, vsr)
-DEF_HELPER_2(xvtdivdp, void, env, i32)
+DEF_HELPER_4(xvtdivdp, void, env, i32, vsr, vsr)
 DEF_HELPER_2(xvtsqrtdp, void, env, i32)
 DEF_HELPER_4(xvmaddadp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xvmaddmdp, void, env, vsr, vsr, vsr)
@@ -498,7 +498,7 @@ DEF_HELPER_4(xvdivsp, void, env, vsr, vsr, vsr)
 DEF_HELPER_3(xvresp, void, env, vsr, vsr)
 DEF_HELPER_3(xvsqrtsp, void, env, vsr, vsr)
 DEF_HELPER_3(xvrsqrtesp, void, env, vsr, vsr)
-DEF_HELPER_2(xvtdivsp, void, env, i32)
+DEF_HELPER_4(xvtdivsp, void, env, i32, vsr, vsr)
 DEF_HELPER_2(xvtsqrtsp, void, env, i32)
 DEF_HELPER_4(xvmaddasp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xvmaddmsp, void, env, vsr, vsr, vsr)
diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index 1fb2bf706

[Qemu-devel] [PATCH v2 06/15] target/ppc: introduce separate generator and helper for xscvqpdp

2019-06-02 Thread Mark Cave-Ayland
Rather than perform the VSR register decoding within the helper itself,
introduce a new generator and helper function which perform the decode based
upon xT and xB at translation time.

The xscvqpdp helper is the only 2 parameter xT/xB implementation that requires
the opcode to be passed as an additional parameter, so handling this separately
allows us to optimise the conversion in the next commit.

Signed-off-by: Mark Cave-Ayland 
---
 target/ppc/fpu_helper.c |  5 ++---
 target/ppc/helper.h |  2 +-
 target/ppc/translate/vsx-impl.inc.c | 18 +-
 3 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 386db30681..a6556781e1 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2899,10 +2899,9 @@ VSX_CVT_FP_TO_FP_HP(xvcvhpsp, 4, float16, float32, 
VsrH(2 * i + 1), VsrW(i), 0)
  * xscvqpdp isn't using VSX_CVT_FP_TO_FP() because xscvqpdpo will be
  * added to this later.
  */
-void helper_xscvqpdp(CPUPPCState *env, uint32_t opcode)
+void helper_xscvqpdp(CPUPPCState *env, uint32_t opcode,
+ ppc_vsr_t *xt, ppc_vsr_t *xb)
 {
-ppc_vsr_t *xt = &env->vsr[xT(opcode)];
-ppc_vsr_t *xb = &env->vsr[xB(opcode)];
 ppc_vsr_t t = { };
 float_status tstat;
 
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index f6a97cedc6..5d15166988 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -405,7 +405,7 @@ DEF_HELPER_2(xscvdphp, void, env, i32)
 DEF_HELPER_2(xscvdpqp, void, env, i32)
 DEF_HELPER_2(xscvdpsp, void, env, i32)
 DEF_HELPER_2(xscvdpspn, i64, env, i64)
-DEF_HELPER_2(xscvqpdp, void, env, i32)
+DEF_HELPER_4(xscvqpdp, void, env, i32, vsr, vsr)
 DEF_HELPER_2(xscvqpsdz, void, env, i32)
 DEF_HELPER_2(xscvqpswz, void, env, i32)
 DEF_HELPER_2(xscvqpudz, void, env, i32)
diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index 0befbb508f..eac8c8937b 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -998,6 +998,23 @@ VSX_CMP(xvcmpgesp, 0x0C, 0x0A, 0, PPC2_VSX)
 VSX_CMP(xvcmpgtsp, 0x0C, 0x09, 0, PPC2_VSX)
 VSX_CMP(xvcmpnesp, 0x0C, 0x0B, 0, PPC2_VSX)
 
+static void gen_xscvqpdp(DisasContext *ctx)
+{
+TCGv_i32 opc;
+TCGv_ptr xt, xb;
+if (unlikely(!ctx->vsx_enabled)) {
+gen_exception(ctx, POWERPC_EXCP_VSXU);
+return;
+}
+opc = tcg_const_i32(ctx->opcode);
+xt = gen_vsr_ptr(xT(ctx->opcode));
+xb = gen_vsr_ptr(xB(ctx->opcode));
+gen_helper_xscvqpdp(cpu_env, opc, xt, xb);
+tcg_temp_free_i32(opc);
+tcg_temp_free_ptr(xt);
+tcg_temp_free_ptr(xb);
+}
+
 #define GEN_VSX_HELPER_2(name, op1, op2, inval, type) \
 static void gen_##name(DisasContext *ctx) \
 { \
@@ -1086,7 +1103,6 @@ GEN_VSX_HELPER_2(xscvdphp, 0x16, 0x15, 0x11, PPC2_ISA300)
 GEN_VSX_HELPER_2(xscvdpsp, 0x12, 0x10, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xscvdpqp, 0x04, 0x1A, 0x16, PPC2_ISA300)
 GEN_VSX_HELPER_XT_XB_ENV(xscvdpspn, 0x16, 0x10, 0, PPC2_VSX207)
-GEN_VSX_HELPER_2(xscvqpdp, 0x04, 0x1A, 0x14, PPC2_ISA300)
 GEN_VSX_HELPER_2(xscvqpsdz, 0x04, 0x1A, 0x19, PPC2_ISA300)
 GEN_VSX_HELPER_2(xscvqpswz, 0x04, 0x1A, 0x09, PPC2_ISA300)
 GEN_VSX_HELPER_2(xscvqpudz, 0x04, 0x1A, 0x11, PPC2_ISA300)
-- 
2.11.0




[Qemu-devel] [PATCH v2 02/15] target/ppc: remove getVSR()/putVSR() from mem_helper.c

2019-06-02 Thread Mark Cave-Ayland
Since commit 8a14d31b00 "target/ppc: switch fpr/vsrl registers so all VSX
registers are in host endian order" functions getVSR() and putVSR() which used
to convert the VSR registers into host endian order are no longer required.

Signed-off-by: Mark Cave-Ayland 
---
 target/ppc/mem_helper.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index 5b0f9ee50d..17a3c931a9 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -417,26 +417,27 @@ STVE(stvewx, cpu_stl_data_ra, bswap32, u32)
 void helper_##name(CPUPPCState *env, target_ulong addr, \
target_ulong xt_num, target_ulong rb)\
 {   \
-int i;  \
-ppc_vsr_t xt;   \
+ppc_vsr_t *xt = &env->vsr[xt_num];  \
+ppc_vsr_t t;\
 uint64_t nb = GET_NB(rb);   \
+int i;  \
 \
-xt.s128 = int128_zero();\
+t.s128 = int128_zero(); \
 if (nb) {   \
 nb = (nb >= 16) ? 16 : nb;  \
 if (msr_le && !lj) {\
 for (i = 16; i > 16 - nb; i--) {\
-xt.VsrB(i - 1) = cpu_ldub_data_ra(env, addr, GETPC());  \
+t.VsrB(i - 1) = cpu_ldub_data_ra(env, addr, GETPC());   \
 addr = addr_add(env, addr, 1);  \
 }   \
 } else {\
 for (i = 0; i < nb; i++) {  \
-xt.VsrB(i) = cpu_ldub_data_ra(env, addr, GETPC());  \
+t.VsrB(i) = cpu_ldub_data_ra(env, addr, GETPC());   \
 addr = addr_add(env, addr, 1);  \
 }   \
 }   \
 }   \
-putVSR(xt_num, &xt, env);   \
+*xt = t;\
 }
 
 VSX_LXVL(lxvl, 0)
@@ -447,26 +448,28 @@ VSX_LXVL(lxvll, 1)
 void helper_##name(CPUPPCState *env, target_ulong addr,   \
target_ulong xt_num, target_ulong rb)  \
 { \
-int i;\
-ppc_vsr_t xt; \
+ppc_vsr_t *xt = &env->vsr[xt_num];\
+ppc_vsr_t t = *xt;\
 target_ulong nb = GET_NB(rb); \
+int i;\
   \
 if (!nb) {\
 return;   \
 } \
-getVSR(xt_num, &xt, env); \
+  \
 nb = (nb >= 16) ? 16 : nb;\
 if (msr_le && !lj) {  \
 for (i = 16; i > 16 - nb; i--) {  \
-cpu_stb_data_ra(env, addr, xt.VsrB(i - 1), GETPC());  \
+cpu_stb_data_ra(env, addr, t.VsrB(i - 1), GETPC());   \
 addr = addr_add(env, addr, 1);\
 } \
 } else {  \
 for (i = 0; i < nb; i++) {\
-cpu_stb_data_ra(env, addr, xt.VsrB(i), GETPC());  \
+cpu_stb_data_ra(env, addr, t.VsrB(i), GETPC());  \
 addr = addr_add(env, addr, 1);\
 } \
 } \
+*xt = t;  \
 }
 
 VSX_STXV

[Qemu-devel] [Bug 1830872] Re: AARCH64 to ARMv7 mistranslation in TCG

2019-06-02 Thread Laszlo Ersek (Red Hat)
Possibly related:
[Qemu-devel] "accel/tcg: demacro cputlb" break qemu-system-x86_64
https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg07362.html

(qemu-system-x86_64 fails to boot 64-bit kernel under TCG accel when
QEMU is built for i686)

Note to self: try to reproduce the present issue with QEMU built at
eed5664238ea^ -- this LP has originally been filed about the tree at
a4f667b67149, and that commit contains eed5664238ea. So checking at
eed5664238ea^ might reveal a difference.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1830872

Title:
  AARCH64 to ARMv7 mistranslation in TCG

Status in QEMU:
  New

Bug description:
  The following guest code:

  
https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S

  implements, in hand-optimized aarch64 assembly, the CopyMem() edk2 (EFI
  Development Kit II) library function. (CopyMem() basically has memmove()
  semantics, to provide a standard C analog here.) The relevant functions
  are InternalMemCopyMem() and __memcpy().

  When TCG translates this aarch64 code to x86_64, everything works
  fine.

  When TCG translates this aarch64 code to ARMv7, the destination area of
  the translated CopyMem() function becomes corrupted -- it differs from
  the intended source contents. Namely, in every 4096 byte block, the
  8-byte word at offset 4032 (0xFC0) is zeroed out in the destination,
  instead of receiving the intended source value.

  I'm attaching two hexdumps of the same destination area:

  - "good.txt" is a hexdump of the destination area when CopyMem() was
translated to x86_64,

  - "bad.txt" is a hexdump of the destination area when CopyMem() was
translated to ARMv7.

  In order to assist with the analysis of this issue, I disassembled the
  aarch64 binary with "objdump". Please find the listing in
  "DxeCore.objdump", attached. The InternalMemCopyMem() function starts at
  hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180.

  And, I ran the guest on the ARMv7 host with "-d
  in_asm,op,op_opt,op_ind,out_asm". Please find the log in
  "tcg.in_asm.op.op_opt.op_ind.out_asm.log", attached.

  The TBs that correspond to (parts of) the InternalMemCopyMem() and
  __memcpy() functions are scattered over the TCG log file, but the offset
  between the "nice" disassembly from "DxeCore.objdump", and the in-RAM
  TBs in the TCG log, can be determined from the fact that there is a
  single prfm instruction in the entire binary. The instruction's offset
  is 0x2b180 in "DxeCore.objdump" -- at the beginning of the __memcpy()
  function --, and its RAM address is 0x472d2180 in the TCG log. Thus the
  difference (= the load address of DxeCore.efi) is 0x472a7000.

  QEMU was built at commit a4f667b67149 ("Merge remote-tracking branch
  'remotes/cohuck/tags/s390x-20190521-3' into staging", 2019-05-21).

  The reproducer command line is (on an ARMv7 host):

qemu-system-aarch64 \
  -display none \
  -machine virt,accel=tcg \
  -nodefaults \
  -nographic \
  -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-aarch64-code.fd,readonly \
  -drive if=pflash,format=raw,file=$prefix/share/qemu/edk2-arm-vars.fd,snapshot=on \
  -cpu cortex-a57 \
  -chardev stdio,signal=off,mux=on,id=char0 \
  -mon chardev=char0,mode=readline \
  -serial chardev:char0

  The apparent symptom is an assertion failure *in the guest*, such as

  > ASSERT [DxeCore]
  > /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090):
  > Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength

  but that is only a (distant) consequence of the CopyMem()
  mistranslation, and resultant destination area corruption.

  Originally reported in the following two mailing list messages:
  - http://mid.mail-archive.com/9d2e260c-c491-03d2-9b8b-b57b72083f77@redhat.com
  - http://mid.mail-archive.com/f1cec8c0-1a9b-f5bb-f951-ea0ba9d276ee@redhat.com

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1830872/+subscriptions



Re: [Qemu-devel] [PATCH 0/5] Few fixes for userspace NVME driver

2019-06-02 Thread Maxim Levitsky
On Mon, 2019-04-15 at 16:57 +0300, Maxim Levitsky wrote:
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: qemu-devel@nongnu.org
> 
> 
> Hi!
> These are few assorted fixes and features for the userspace
> nvme driver.
> 
> Tested that on my laptop with my Samsung X5 thunderbolt drive, which
> happens to have 4K sectors, support for discard and write zeros.
> 
> Also bunch of fixes sitting in my queue from the period when I developed
> the nvme-mdev driver.
> 
> Best regards,
>   Maxim Levitsky
> 
> Maxim Levitsky (5):
>   block/nvme: don't flip CQ phase bits
>   block/nvme: fix doorbell stride
>   block/nvme: support larger than 512 bytes sector devices
>   block/nvme: add support for write zeros
>   block/nvme: add support for discard
> 
>  block/nvme.c | 194 +--
>  block/trace-events   |   3 +
>  include/block/nvme.h |  17 +++-
>  3 files changed, 204 insertions(+), 10 deletions(-)
> 

Ping.

Best regards,
Maxim Levitsky





[Qemu-devel] [Bug 1831362] [NEW] European keyboard PC-105 deadkey

2019-06-02 Thread Roland Christmann
Public bug reported:

With a freshly compiled version of qemu 4.0.50 on Windows 10 (host)

I am using 3 different Belgian keyboards and I have the same behaviour
- 2 USB keyboards (Logitech and HP) and
- the keyboard of my laptop (HP)

3 characters on the same key cannot be used (the key seems to be dead):
< (less than),
> (greater than) used with the combination of LShift or RShift
\ (backslash) used with the combination of AltGr

Using grub command mode from an archlinux installation (5.1.4)
The keyboard seems to be a mix of azerty and qwerty keyboards:
all letters are correctly mapped, but all numbers and special
characters are not

Using sendkey in monitor
"sendkey <" results in : \
"sendkey shift-<" results in : |
"sendkey ctrl-alt-<" results in : nothing

REM: VirtualBox can handle this key and with the showkey command
 from the archlinux kbd package, it shows :
 keycode 86 press
 keycode 86 release

** Affects: qemu
 Importance: Undecided
 Status: New


** Tags: keyboard windows

** Tags added: windows

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1831362

Title:
  European keyboard PC-105 deadkey

Status in QEMU:
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1831362/+subscriptions



Re: [Qemu-devel] [PATCH 1/2] target/mips: Improve performance for MSA binary operations

2019-06-02 Thread Aleksandar Markovic
On Jun 1, 2019 4:16 PM, "Aleksandar Markovic" wrote:
>
> > From: Mateja Marjanovic 
> > Sent: Monday, March 4, 2019 5:51 PM
> > To: qemu-devel@nongnu.org
> > Cc: aurel...@aurel32.net; Aleksandar Markovic; Aleksandar Rikalo
> > Subject: [PATCH 1/2] target/mips: Improve performance for MSA binary
operations
> >
> > From: Mateja Marjanovic 
> >
> > Eliminate loops for better performance.
> >
> > Signed-off-by: Mateja Marjanovic 
> > ---
> >  target/mips/msa_helper.c | 43
++-
> >  1 file changed, 30 insertions(+), 13 deletions(-)
> >
>
> The commit message should be a little bit more informative - for example,
> it could list the affected instructions. Please consider other groups of
> MSA instructions that are implemented via helpers that use similar "for"
> loops. Otherwise:
>
> Reviewed-by: Aleksandar Markovic 
>

Mateja, you don't need to do anything regarding this patch, I am going to
fix the issues while applying.

Thanks, Aleksandar

> > diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
> > index 4c7ec05..1152fda 100644
> > --- a/target/mips/msa_helper.c
> > +++ b/target/mips/msa_helper.c
> > @@ -804,28 +804,45 @@ void helper_msa_ ## func ## _df(CPUMIPSState *env, uint32_t df, \
> >  wr_t *pwd = &(env->active_fpu.fpr[wd].wr); \
> >  wr_t *pws = &(env->active_fpu.fpr[ws].wr); \
> >  wr_t *pwt = &(env->active_fpu.fpr[wt].wr); \
> > -uint32_t i; \
> > \
> >  switch (df) { \
> >  case DF_BYTE: \
> > -for (i = 0; i < DF_ELEMENTS(DF_BYTE); i++) { \
> > -pwd->b[i] = msa_ ## func ## _df(df, pws->b[i], pwt->b[i]); \
> > -} \
> > +pwd->b[0]  = msa_ ## func ## _df(df, pws->b[0], pwt->b[0]); \
> > +pwd->b[1]  = msa_ ## func ## _df(df, pws->b[1], pwt->b[1]); \
> > +pwd->b[2]  = msa_ ## func ## _df(df, pws->b[2], pwt->b[2]); \
> > +pwd->b[3]  = msa_ ## func ## _df(df, pws->b[3], pwt->b[3]); \
> > +pwd->b[4]  = msa_ ## func ## _df(df, pws->b[4], pwt->b[4]); \
> > +pwd->b[5]  = msa_ ## func ## _df(df, pws->b[5], pwt->b[5]); \
> > +pwd->b[6]  = msa_ ## func ## _df(df, pws->b[6], pwt->b[6]); \
> > +pwd->b[7]  = msa_ ## func ## _df(df, pws->b[7], pwt->b[7]); \
> > +pwd->b[8]  = msa_ ## func ## _df(df, pws->b[8], pwt->b[8]); \
> > +pwd->b[9]  = msa_ ## func ## _df(df, pws->b[9], pwt->b[9]); \
> > +pwd->b[10] = msa_ ## func ## _df(df, pws->b[10], pwt->b[10]); \
> > +pwd->b[11] = msa_ ## func ## _df(df, pws->b[11], pwt->b[11]); \
> > +pwd->b[12] = msa_ ## func ## _df(df, pws->b[12], pwt->b[12]); \
> > +pwd->b[13] = msa_ ## func ## _df(df, pws->b[13], pwt->b[13]); \
> > +pwd->b[14] = msa_ ## func ## _df(df, pws->b[14], pwt->b[14]); \
> > +pwd->b[15] = msa_ ## func ## _df(df, pws->b[15], pwt->b[15]); \
> >  break; \
> >  case DF_HALF: \
> > -for (i = 0; i < DF_ELEMENTS(DF_HALF); i++) { \
> > -pwd->h[i] = msa_ ## func ## _df(df, pws->h[i], pwt->h[i]); \
> > -} \
> > +pwd->h[0] = msa_ ## func ## _df(df, pws->h[0], pwt->h[0]); \
> > +pwd->h[1] = msa_ ## func ## _df(df, pws->h[1], pwt->h[1]); \
> > +pwd->h[2] = msa_ ## func ## _df(df, pws->h[2], pwt->h[2]); \
> > +pwd->h[3] = msa_ ## func ## _df(df, pws->h[3], pwt->h[3]); \
> > +pwd->h[4] = msa_ ## func ## _df(df, pws->h[4], pwt->h[4]); \
> > +pwd->h[5] = msa_ ## func ## _df(df, pws->h[5], pwt->h[5]); \
> > +pwd->h[6] = msa_ ## func ## _df(df, pws->h[6], pwt->h[6]); \
> > +pwd->h[7] = msa_ ## func ## _df(df, pws->h[7], pwt->h[7]); \
> >  break; \
> >  case DF_WORD: \
> > -for (i = 0; i < DF_ELEMENTS(DF_WORD); i++) { \
> > -pwd->w[i] = msa_ ## func ## _df(df, pws->w[i], pwt->w[i]); \
> > -} \
> > +pwd->w[0] = msa_ ## func ## _df(df, pws->w[0], pwt->w[0]); \
> > +pwd->w[1] = msa_ ## func ## _df(df, pws->w[1], pwt->w[1]); \
> > +pwd->w[2] = msa_ ## func ## _df(df, pws->w[2], pwt->w[2]); \
> > +pwd->w[3] = msa_ ## func ## _df(df, pws->w[3], pwt->w[3]); \
> >  break; \
> >  case DF_DOUBLE: \
> > -for (i = 0; i < DF_ELEMENTS(DF_DOUBLE); i++) { \
> > -pwd->d[i] = msa_ ## func ## _df(df, pws->d[i], pwt->d[i]); \
> > -} \
> > +pwd->d[0] = msa_ ## func ## _df(df, pws->d[0], pwt->d[0]); \
> > +pwd->d[1] = msa_ ## func ## _df(df, pws->d[1], pwt->d[1]); \
> >  break; \
> >  default: \
> >  assert(0); \
> > --
> > 2.7.4
> >
> >


Re: [Qemu-devel] Headers without multiple inclusion guards

2019-06-02 Thread Dmitry Fleytman
> 
> hw/net/e1000e_core.h
> hw/net/e1000x_common.h
> hw/net/vmxnet3_defs.h


Unintentional.






Re: [Qemu-devel] [PATCH v2 01/10] hw/scsi/vmw_pvscsi: Use qbus_reset_all() directly

2019-06-02 Thread Dmitry Fleytman
Reviewed-by: Dmitry Fleytman 

> On 28 May 2019, at 19:40, Philippe Mathieu-Daudé  wrote:
> 
> Since the BusState is accesible from the SCSIBus object,
> it is pointless to use qbus_reset_all_fn.
> Use qbus_reset_all() directly.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> v2: Use BUS() macro (Peter Maydell)
> 
> One step toward removing qbus_reset_all_fn()
> ---
> hw/scsi/vmw_pvscsi.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/scsi/vmw_pvscsi.c b/hw/scsi/vmw_pvscsi.c
> index 584b4be07e..c39e33fa35 100644
> --- a/hw/scsi/vmw_pvscsi.c
> +++ b/hw/scsi/vmw_pvscsi.c
> @@ -440,7 +440,7 @@ static void
> pvscsi_reset_adapter(PVSCSIState *s)
> {
> s->resetting++;
> -qbus_reset_all_fn(&s->bus);
> +qbus_reset_all(BUS(&s->bus));
> s->resetting--;
> pvscsi_process_completion_queue(s);
> assert(QTAILQ_EMPTY(&s->pending_queue));
> @@ -848,7 +848,7 @@ pvscsi_on_cmd_reset_bus(PVSCSIState *s)
> trace_pvscsi_on_cmd_arrived("PVSCSI_CMD_RESET_BUS");
> 
> s->resetting++;
> -qbus_reset_all_fn(&s->bus);
> +qbus_reset_all(BUS(&s->bus));
> s->resetting--;
> return PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
> }
> -- 
> 2.20.1
>