Re: [Qemu-devel] [PATCH v4 03/20] file-posix: Switch to .bdrv_co_block_status()

2017-11-30 Thread Vladimir Sementsov-Ogievskiy

12.10.2017 21:58, Eric Blake wrote:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the file protocol driver accordingly.

In want_zero mode, we continue to report fine-grained hole
information (the caller wants as much mapping detail as possible);
but when not in that mode, the caller prefers larger *pnum and
merely cares about what offsets are allocated at this layer, rather
than where the holes live.  Since holes still read as zeroes at
this layer (rather than deferring to a backing layer), we can take
the shortcut of skipping lseek(), and merely state that all bytes
are allocated.

Signed-off-by: Eric Blake 


Reviewed-by: Vladimir Sementsov-Ogievskiy 



--
Best regards,
Vladimir




Re: [Qemu-devel] About virtio device hotplug in Q35! 【外域邮件.谨慎查阅】

2017-11-30 Thread Bob Chen
Hi,

After 3 months of work and investigation, and tedious mail discussions with
Nvidia, I think some progress have been made, in terms of the
GPUDirect(p2p) in virtual environment.

The only remaining issue then, is the low bidirectional bandwidth between
two sibling GPUs under the same PCIe switch.

We expanded the tests to run on even more GPU cards, so the results seemed
to be explicit now.


P40 is OK, and its hardware topology on host is:
 \-[:00]-+-00.0  Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3
v4/Xeon D DMI2
 +-01.0-[03]00.0  LSI Logic / Symbios Logic MegaRAID SAS-3
3008 [Fury]
 +-02.0-[04]00.0  NVIDIA Corporation GP102GL [Tesla P40]
 +-03.0-[02]00.0  NVIDIA Corporation GP102GL [Tesla P40]


M60, not OK, low bandwidth:
 \-[:00]-+-00.0  Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3
v4/Xeon D DMI2
 +-01.0-[06]00.0  LSI Logic / Symbios Logic MegaRAID SAS-3
3008 [Fury]
 +-02.0-[07-0a]00.0-[08-0a]--+-08.0-[09]00.0  NVIDIA
Corporation GM204GL [Tesla M60]
 |   \-10.0-[0a]00.0  NVIDIA
Corporation GM204GL [Tesla M60]


V100, not OK, low bandwidth:
\-[:00]-+-00.0  Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon
D DMI2
 +-01.0-[01]--+-00.0  Mellanox Technologies MT27710 Family
[ConnectX-4 Lx]
 |\-00.1  Mellanox Technologies MT27710 Family
[ConnectX-4 Lx]
 +-02.0-[02-05]00.0-[03-05]--+-08.0-[04]00.0  NVIDIA
Corporation GV100 [Tesla V100 PCIe]
 |   \-10.0-[05]00.0  NVIDIA
Corporation GV100 [Tesla V100 PCIe]



So what might be the actual effect of the PLX switch hardware for GPU data
flow? Although it is not visible in guest OS.
Nvidia tech-support guys are not familiar with virtualization. They asked
us to consult the community first.


Re: [Qemu-devel] device_del fail

2017-11-30 Thread Lying
hello, i try again on arch Linux , after i add gpu, display is not 
light,following message from "info pci":
  Bus  0, device   5, function 0:
VGA controller: PCI device 10de:1401
  IRQ 11.
  BAR0: 32 bit memory at 0x [0x00fe].
  BAR1: 64 bit prefetchable memory at 0x [0x0ffe].
  BAR3: 64 bit prefetchable memory at 0x [0x01fe].
  BAR5: I/O at 0x [0x007e].
  BAR6: 32 bit memory at 0x [0x0001fffe].
  id "gpu1"
  Bus  0, device   6, function 0:
Audio controller: PCI device 10de:0fba
  IRQ 10.
  BAR0: 32 bit memory at 0x3e82 [0x3e823fff].
  id "gpu2"


Then i connect that guest with ssh:
Last login: Thu Nov 30 22:29:03 2017(it's normal)




At 2017-11-30 10:32:26, "Fam Zheng"  wrote:
>Let's keep qemu-devel@nongnu.org on the list so other people can chime in.
>
>On Thu, 11/30 10:05, Lying wrote:
> 
>> At 2017-11-29 14:29:17, "Fam Zheng"  wrote:
>> >On Wed, 11/29 13:43, Lying wrote:
>> >> Hello everybody, I encounter a error on my vm guest
>> >> I remove my device on my guest is failed, following is detail message.
>> >> I didn't add my audio device before running my guest, So i add audio with 
>> >> "device_add" after running
>> >> Then i can use it, so i remove it with "device_del", but when i do it 
>> >> again, i
>> >> can't use my audio device and cannot remove it.
>> >
>> >
>> >I haven't tried unplugging an audio device myself but in general PCI hot 
>> >unplug
>> >needs guest cooperation so some infomation about guest is also helpful.  
>> >Which
>> >guest OS is this?
>> >
>> >What are the exact commands that you used and what are the messages from
>> >host/guest when you device_del?
>> >
>> 
>> >Fam
>> 
>> 
>> hello, thanks for your reply, and i'm sorry that i not notice at yesterday .
>> following is my information:
>> guest : window-7
>> host : Ubuntu-16.04
>> qemu : 2.11.0-rc0
>> libvirt : 3.9
>>seabios : 1.10.74 (from qemu)
>> exact commands:
>> add:"device_add 
>> driver=vfio-pci,host=02:00.0,x-vga=on,multifunction=yes,id=gpu1"
>> remove:"device_del gpu1"
>> and not messages when i running device_del. But i can see it still exist 
>> while running commands "info pci"
>
>Probably the guest didn't respond to the hot-unplug event. In your original
>question you said it is an audio device but your command suggests it is a GPU
>device. I'm not sure Windows driver can handle this. Have you tried different
>guests, such as Linux?
>
>Cc Alex who may know more.
>
>Fam


Re: [Qemu-devel] [PATCH v4 04/20] gluster: Switch to .bdrv_co_block_status()

2017-11-30 Thread Vladimir Sementsov-Ogievskiy

12.10.2017 21:59, Eric Blake wrote:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the gluster driver accordingly.

In want_zero mode, we continue to report fine-grained hole
information (the caller wants as much mapping detail as possible);
but when not in that mode, the caller prefers larger *pnum and
merely cares about what offsets are allocated at this layer, rather
than where the holes live.  Since holes still read as zeroes at
this layer (rather than deferring to a backing layer), we can take
the shortcut of skipping find_allocation(), and merely state that
all bytes are allocated.

Signed-off-by: Eric Blake 


Reviewed-by: Vladimir Sementsov-Ogievskiy 



--
Best regards,
Vladimir




Re: [Qemu-devel] [PATCH v4 08/20] null: Switch to .bdrv_co_block_status()

2017-11-30 Thread Vladimir Sementsov-Ogievskiy

12.10.2017 21:59, Eric Blake wrote:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the null driver accordingly.

Signed-off-by: Eric Blake 

---
v4: rebase to interface tweak
v3: no change
v2: rebase to mapping parameter
---
  block/null.c | 23 ---
  1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/block/null.c b/block/null.c
index dd9c13f9ba..20c7eb0ee2 100644
--- a/block/null.c
+++ b/block/null.c
@@ -223,22 +223,23 @@ static int null_reopen_prepare(BDRVReopenState 
*reopen_state,
  return 0;
  }

-static int64_t coroutine_fn null_co_get_block_status(BlockDriverState *bs,
- int64_t sector_num,
- int nb_sectors, int *pnum,
- BlockDriverState **file)
+static int coroutine_fn null_co_block_status(BlockDriverState *bs,
+ bool want_zero, int64_t offset,
+ int64_t bytes, int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
  {
  BDRVNullState *s = bs->opaque;
-off_t start = sector_num * BDRV_SECTOR_SIZE;
+int64_t ret = BDRV_BLOCK_OFFSET_VALID;


int ret =

with that:
Reviewed-by: Vladimir Sementsov-Ogievskiy 



-*pnum = nb_sectors;
+*pnum = bytes;
+*map = offset;
  *file = bs;

  if (s->read_zeroes) {
-return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO;
-} else {
-return BDRV_BLOCK_OFFSET_VALID | start;
+ret |= BDRV_BLOCK_ZERO;
  }
+return ret;
  }

  static void null_refresh_filename(BlockDriverState *bs, QDict *opts)
@@ -270,7 +271,7 @@ static BlockDriver bdrv_null_co = {
  .bdrv_co_flush_to_disk  = null_co_flush,
  .bdrv_reopen_prepare= null_reopen_prepare,

-.bdrv_co_get_block_status   = null_co_get_block_status,
+.bdrv_co_block_status   = null_co_block_status,

  .bdrv_refresh_filename  = null_refresh_filename,
  };
@@ -290,7 +291,7 @@ static BlockDriver bdrv_null_aio = {
  .bdrv_aio_flush = null_aio_flush,
  .bdrv_reopen_prepare= null_reopen_prepare,

-.bdrv_co_get_block_status   = null_co_get_block_status,
+.bdrv_co_block_status   = null_co_block_status,

  .bdrv_refresh_filename  = null_refresh_filename,
  };



--
Best regards,
Vladimir




[Qemu-devel] [PATCH for-2.12 1/7] tests/boot-serial-test: Make sure that we check the timeout regularly

2017-11-30 Thread Thomas Huth
If the guest continuesly writes characters to the UART, we never leave
the inner while loop and thus never check whether we've reached the
timeout value. So if we fail to find the expected string in the UART
output, the test just hangs and never finishs. Use a counter to regularly
break out of the while loop to check the timeout.

Signed-off-by: Thomas Huth 
---
 tests/boot-serial-test.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c
index c935d69..fa4183d 100644
--- a/tests/boot-serial-test.c
+++ b/tests/boot-serial-test.c
@@ -43,12 +43,13 @@ static testdef_t tests[] = {
 static void check_guest_output(const testdef_t *test, int fd)
 {
 bool output_ok = false;
-int i, nbr, pos = 0;
+int i, nbr, pos = 0, ccnt;
 char ch;
 
 /* Poll serial output... Wait at most 60 seconds */
 for (i = 0; i < 6000; ++i) {
-while ((nbr = read(fd, &ch, 1)) == 1) {
+ccnt = 0;
+while ((nbr = read(fd, &ch, 1)) == 1 && ccnt++ < 512) {
 if (ch == test->expect[pos]) {
 pos += 1;
 if (test->expect[pos] == '\0') {
-- 
1.8.3.1




[Qemu-devel] [PATCH for-2.12 6/7] tests/boot-serial-test: Add a test for the moxiesim machine

2017-11-30 Thread Thomas Huth
Now that moxiesim supports the -bios parameter, we can check this machine
in the boot-serial tester, too, by supplying a mini bios that only writes
'T' characters to the UART.

Signed-off-by: Thomas Huth 
---
 tests/Makefile.include   | 2 ++
 tests/boot-serial-test.c | 8 
 2 files changed, 10 insertions(+)

diff --git a/tests/Makefile.include b/tests/Makefile.include
index cd908da..00c1e11 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -307,6 +307,8 @@ check-qtest-mips64-y = tests/endianness-test$(EXESUF)
 
 check-qtest-mips64el-y = tests/endianness-test$(EXESUF)
 
+check-qtest-moxie-y = tests/boot-serial-test$(EXESUF)
+
 check-qtest-ppc-y = tests/endianness-test$(EXESUF)
 check-qtest-ppc-y += tests/boot-order-test$(EXESUF)
 check-qtest-ppc-y += tests/prom-env-test$(EXESUF)
diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c
index a39273a..1deddb8 100644
--- a/tests/boot-serial-test.c
+++ b/tests/boot-serial-test.c
@@ -40,6 +40,13 @@ static const uint8_t kernel_plml605[] = {
 0xfc, 0xff, 0x00, 0xb8  /* bri   -4  loop */
 };
 
+static const uint8_t bios_moxiesim[] = {
+0x20, 0x10, 0x00, 0x00, 0x03, 0xf8, /* ldi.s r1,0x3f8 */
+0x1b, 0x20, 0x00, 0x00, 0x00, 0x54, /* ldi.b r2,'T' */
+0x1e, 0x12, /* st.b  r1,r2 */
+0x1a, 0x00, 0x00, 0x00, 0x10, 0x00  /* jmpa  0x1000 */
+};
+
 typedef struct testdef {
 const char *arch;   /* Target architecture */
 const char *machine;/* Name of the machine */
@@ -70,6 +77,7 @@ static testdef_t tests[] = {
   sizeof(kernel_pls3adsp1800), kernel_pls3adsp1800 },
 { "microblazeel", "petalogix-ml605", "", "TT",
   sizeof(kernel_plml605), kernel_plml605 },
+{ "moxie", "moxiesim", "", "TT", sizeof(bios_moxiesim), 0, bios_moxiesim },
 
 { NULL }
 };
-- 
1.8.3.1




[Qemu-devel] [PATCH for-2.12 5/7] hw/moxie/moxiesim: Add support for loading a BIOS on moxiesim

2017-11-30 Thread Thomas Huth
The moxiesim machine already defines a memory region for a firmware,
but does not provide the possibility to load an image via "-bios" yet.
This will be needed for the boot-serial tester, so let's add support
for "-bios" here now.

Signed-off-by: Thomas Huth 
---
 hw/moxie/moxiesim.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/moxie/moxiesim.c b/hw/moxie/moxiesim.c
index 3ba5848..00f56df 100644
--- a/hw/moxie/moxiesim.c
+++ b/hw/moxie/moxiesim.c
@@ -25,6 +25,7 @@
  * THE SOFTWARE.
  */
 #include "qemu/osdep.h"
+#include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "qemu-common.h"
 #include "cpu.h"
@@ -41,6 +42,8 @@
 #include "elf.h"
 
 #define PHYS_MEM_BASE 0x8000
+#define FIRMWARE_BASE 0x1000
+#define FIRMWARE_SIZE (128 * 0x1000)
 
 typedef struct {
 uint64_t ram_size;
@@ -123,8 +126,8 @@ static void moxiesim_init(MachineState *machine)
 memory_region_init_ram(ram, NULL, "moxiesim.ram", ram_size, &error_fatal);
 memory_region_add_subregion(address_space_mem, ram_base, ram);
 
-memory_region_init_ram(rom, NULL, "moxie.rom", 128 * 0x1000, &error_fatal);
-memory_region_add_subregion(get_system_memory(), 0x1000, rom);
+memory_region_init_ram(rom, NULL, "moxie.rom", FIRMWARE_SIZE, 
&error_fatal);
+memory_region_add_subregion(get_system_memory(), FIRMWARE_BASE, rom);
 
 if (kernel_filename) {
 loader_params.ram_size = ram_size;
@@ -133,6 +136,11 @@ static void moxiesim_init(MachineState *machine)
 loader_params.initrd_filename = initrd_filename;
 load_kernel(cpu, &loader_params);
 }
+if (bios_name) {
+if (load_image_targphys(bios_name, FIRMWARE_BASE, FIRMWARE_SIZE) < 0) {
+error_report("Failed to load firmware '%s'", bios_name);
+}
+}
 
 /* A single 16450 sits at offset 0x3f8.  */
 if (serial_hds[0]) {
-- 
1.8.3.1




[Qemu-devel] [PATCH for-2.12 3/7] tests/boot-serial-test: Add support for the mcf5208evb board

2017-11-30 Thread Thomas Huth
We can output a character quite easily here with some few lines of
assembly that we provide as a mini-kernel for this board.

Signed-off-by: Thomas Huth 
---
 tests/boot-serial-test.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c
index d997269..dd3828c 100644
--- a/tests/boot-serial-test.c
+++ b/tests/boot-serial-test.c
@@ -16,6 +16,14 @@
 #include "qemu/osdep.h"
 #include "libqtest.h"
 
+static const uint8_t kernel_mcf5208[] = {
+0x41, 0xf9, 0xfc, 0x06, 0x00, 0x00, /* lea 0xfc06,%a0 */
+0x10, 0x3c, 0x00, 0x54, /* move.b #'T',%d0 */
+0x11, 0x7c, 0x00, 0x04, 0x00, 0x08, /* move.b #4,8(%a0) Enable TX 
*/
+0x11, 0x40, 0x00, 0x0c, /* move.b %d0,12(%a0)   Print 'T' 
*/
+0x60, 0xfa  /* bra.s  loop */
+};
+
 typedef struct testdef {
 const char *arch;   /* Target architecture */
 const char *machine;/* Name of the machine */
@@ -41,6 +49,8 @@ static testdef_t tests[] = {
 { "x86_64", "q35", "-device sga", "SGABIOS" },
 { "s390x", "s390-ccw-virtio",
   "-nodefaults -device sclpconsole,chardev=serial0", "virtio device" },
+{ "m68k", "mcf5208evb", "", "TT", sizeof(kernel_mcf5208), kernel_mcf5208 },
+
 { NULL }
 };
 
-- 
1.8.3.1




[Qemu-devel] [PATCH for-2.12 0/7] Test more machines and TCG CPUs automatically

2017-11-30 Thread Thomas Huth
During the 2.11 development cycle, a bunch of ARM boards were broken
completely, and nobody noticed it for a couple of weeks. Also a lot
of TCG CPUs do not get any test coverage at all yet (especially since
the old tests in the tests/tcg/ are still pretty much dead today).
So we really should improve our testing in this area.

This patch series now tries to improve the test coverage at least a
little bit: The second patch adds the possibility to the boot-serial
test to also load home-brew mini-kernels and firmware blobs, and the
following patches then add some tests for various boards / CPUs with
some few lines of self-made assembly.

Thomas Huth (7):
  tests/boot-serial-test: Make sure that we check the timeout regularly
  tests/boot-serial-test: Add code to allow to specify our own kernel or
bios
  tests/boot-serial-test: Add support for the mcf5208evb board
  tests/boot-serial-test: Add tests for microblaze boards
  hw/moxie/moxiesim: Add support for loading a BIOS on moxiesim
  tests/boot-serial-test: Add a test for the moxiesim machine
  tests/boot-serial-test: Add support for the raspi2 machine

 hw/moxie/moxiesim.c  |  12 +-
 tests/Makefile.include   |   7 
 tests/boot-serial-test.c | 106 ---
 3 files changed, 109 insertions(+), 16 deletions(-)

-- 
1.8.3.1




[Qemu-devel] [PATCH for-2.12 7/7] tests/boot-serial-test: Add support for the raspi2 machine

2017-11-30 Thread Thomas Huth
The raspi2 machine supports loading firmware images, so we can easily
load a small test sequence as raw binary blob here to test the UART.

Signed-off-by: Thomas Huth 
---
 tests/Makefile.include   | 1 +
 tests/boot-serial-test.c | 9 +
 2 files changed, 10 insertions(+)

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 00c1e11..09025dc 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -361,6 +361,7 @@ check-qtest-arm-y += tests/virtio-blk-test$(EXESUF)
 gcov-files-arm-y += arm-softmmu/hw/block/virtio-blk.c
 check-qtest-arm-y += tests/test-arm-mptimer$(EXESUF)
 gcov-files-arm-y += hw/timer/arm_mptimer.c
+check-qtest-arm-y += tests/boot-serial-test$(EXESUF)
 
 check-qtest-aarch64-y = tests/numa-test$(EXESUF)
 
diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c
index 1deddb8..663b78b 100644
--- a/tests/boot-serial-test.c
+++ b/tests/boot-serial-test.c
@@ -47,6 +47,14 @@ static const uint8_t bios_moxiesim[] = {
 0x1a, 0x00, 0x00, 0x00, 0x10, 0x00  /* jmpa  0x1000 */
 };
 
+static const uint8_t bios_raspi2[] = {
+0x08, 0x30, 0x9f, 0xe5, /* ldr   r3,[pc,#8]Get base */
+0x54, 0x20, 0xa0, 0xe3, /* mov r2,#'T' */
+0x00, 0x20, 0xc3, 0xe5, /* strbr2,[r3] */
+0xfb, 0xff, 0xff, 0xea, /* b   loop */
+0x00, 0x10, 0x20, 0x3f, /* 0x3f201000 = UART0 base addr */
+};
+
 typedef struct testdef {
 const char *arch;   /* Target architecture */
 const char *machine;/* Name of the machine */
@@ -78,6 +86,7 @@ static testdef_t tests[] = {
 { "microblazeel", "petalogix-ml605", "", "TT",
   sizeof(kernel_plml605), kernel_plml605 },
 { "moxie", "moxiesim", "", "TT", sizeof(bios_moxiesim), 0, bios_moxiesim },
+{ "arm", "raspi2", "", "TT", sizeof(bios_raspi2), 0, bios_raspi2 },
 
 { NULL }
 };
-- 
1.8.3.1




[Qemu-devel] [PATCH for-2.12 2/7] tests/boot-serial-test: Add code to allow to specify our own kernel or bios

2017-11-30 Thread Thomas Huth
QEMU only ships with some few firmware images, i.e. we can currently run
the boot-serial test only on a very limited set of machines. But writing
some characters to the default UART of a machine can often be done with
some few lines of assembly, so we add the possibility to the boot-serial
tester to use its own mini-kernels or mini-firmwares. We write such images
then into a file that we can load with the "-kernel" or "-bios" parameter
when we launch QEMU.

Signed-off-by: Thomas Huth 
---
 tests/Makefile.include   |  2 ++
 tests/boot-serial-test.c | 54 +---
 2 files changed, 44 insertions(+), 12 deletions(-)

diff --git a/tests/Makefile.include b/tests/Makefile.include
index c002352..41ca44c 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -297,6 +297,8 @@ gcov-files-x86_64-y = $(subst 
i386-softmmu/,x86_64-softmmu/,$(gcov-files-i386-y)
 
 check-qtest-alpha-y = tests/boot-serial-test$(EXESUF)
 
+check-qtest-m68k-y = tests/boot-serial-test$(EXESUF)
+
 check-qtest-mips-y = tests/endianness-test$(EXESUF)
 
 check-qtest-mips64-y = tests/endianness-test$(EXESUF)
diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c
index fa4183d..d997269 100644
--- a/tests/boot-serial-test.c
+++ b/tests/boot-serial-test.c
@@ -7,9 +7,10 @@
  * or later. See the COPYING file in the top-level directory.
  *
  * This test is used to check that the serial output of the firmware
- * (that we provide for some machines) contains an expected string.
- * Thus we check that the firmware still boots at least to a certain
- * point and so we know that the machine is not completely broken.
+ * (that we provide for some machines) or some small mini-kernels that
+ * we provide here contains an expected string. Thus we check that the
+ * firmware/kernel still boots at least to a certain point and so we
+ * know that the machine is not completely broken.
  */
 
 #include "qemu/osdep.h"
@@ -20,6 +21,9 @@ typedef struct testdef {
 const char *machine;/* Name of the machine */
 const char *extra;  /* Additional parameters */
 const char *expect; /* Expected string in the serial output */
+size_t codesize;/* Size of the kernel or bios data */
+const uint8_t *kernel;  /* Set in case we use our own mini kernel */
+const uint8_t *bios;/* Set in case we use our own mini bios */
 } testdef_t;
 
 static testdef_t tests[] = {
@@ -72,26 +76,52 @@ done:
 static void test_machine(const void *data)
 {
 const testdef_t *test = data;
-char tmpname[] = "/tmp/qtest-boot-serial-XX";
-int fd;
+char serialtmp[] = "/tmp/qtest-boot-serial-sXX";
+char codetmp[] = "/tmp/qtest-boot-serial-cXX";
+const char *codeparam = "";
+const uint8_t *code = NULL;
+int ser_fd;
 
-fd = mkstemp(tmpname);
-g_assert(fd != -1);
+ser_fd = mkstemp(serialtmp);
+g_assert(ser_fd != -1);
+
+if (test->kernel) {
+code = test->kernel;
+codeparam = "-kernel";
+} else if (test->bios) {
+code = test->bios;
+codeparam = "-bios";
+}
+
+if (code) {
+ssize_t wlen;
+int code_fd;
+
+code_fd = mkstemp(codetmp);
+g_assert(code_fd != -1);
+wlen = write(code_fd, code, test->codesize);
+g_assert(wlen == test->codesize);
+close(code_fd);
+}
 
 /*
  * Make sure that this test uses tcg if available: It is used as a
  * fast-enough smoketest for that.
  */
-global_qtest = qtest_startf("-M %s,accel=tcg:kvm "
+global_qtest = qtest_startf("%s %s -M %s,accel=tcg:kvm "
 "-chardev file,id=serial0,path=%s "
 "-no-shutdown -serial chardev:serial0 %s",
-test->machine, tmpname, test->extra);
-unlink(tmpname);
+codeparam, code ? codetmp : "",
+test->machine, serialtmp, test->extra);
+unlink(serialtmp);
+if (code) {
+unlink(codetmp);
+}
 
-check_guest_output(test, fd);
+check_guest_output(test, ser_fd);
 qtest_quit(global_qtest);
 
-close(fd);
+close(ser_fd);
 }
 
 int main(int argc, char *argv[])
-- 
1.8.3.1




[Qemu-devel] [PATCH for-2.12 4/7] tests/boot-serial-test: Add tests for microblaze boards

2017-11-30 Thread Thomas Huth
This adds two simple TCG + UART tests for the microblaze boards,
one in big endian mode, and one in little endian mode.

Signed-off-by: Thomas Huth 
---
 tests/Makefile.include   |  2 ++
 tests/boot-serial-test.c | 20 
 2 files changed, 22 insertions(+)

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 41ca44c..cd908da 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -299,6 +299,8 @@ check-qtest-alpha-y = tests/boot-serial-test$(EXESUF)
 
 check-qtest-m68k-y = tests/boot-serial-test$(EXESUF)
 
+check-qtest-microblaze-y = tests/boot-serial-test$(EXESUF)
+
 check-qtest-mips-y = tests/endianness-test$(EXESUF)
 
 check-qtest-mips64-y = tests/endianness-test$(EXESUF)
diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c
index dd3828c..a39273a 100644
--- a/tests/boot-serial-test.c
+++ b/tests/boot-serial-test.c
@@ -24,6 +24,22 @@ static const uint8_t kernel_mcf5208[] = {
 0x60, 0xfa  /* bra.s  loop */
 };
 
+static const uint8_t kernel_pls3adsp1800[] = {
+0xb0, 0x00, 0x84, 0x00, /* imm   0x8400 */
+0x30, 0x60, 0x00, 0x04, /* addik r3,r0,4 */
+0x30, 0x80, 0x00, 0x54, /* addik r4,r0,'T' */
+0xf0, 0x83, 0x00, 0x00, /* sbi   r4,r3,0 */
+0xb8, 0x00, 0xff, 0xfc  /* bri   -4  loop */
+};
+
+static const uint8_t kernel_plml605[] = {
+0xe0, 0x83, 0x00, 0xb0, /* imm   0x83e0 */
+0x00, 0x10, 0x60, 0x30, /* addik r3,r0,0x1000 */
+0x54, 0x00, 0x80, 0x30, /* addik r4,r0,'T' */
+0x00, 0x00, 0x83, 0xf0, /* sbi   r4,r3,0 */
+0xfc, 0xff, 0x00, 0xb8  /* bri   -4  loop */
+};
+
 typedef struct testdef {
 const char *arch;   /* Target architecture */
 const char *machine;/* Name of the machine */
@@ -50,6 +66,10 @@ static testdef_t tests[] = {
 { "s390x", "s390-ccw-virtio",
   "-nodefaults -device sclpconsole,chardev=serial0", "virtio device" },
 { "m68k", "mcf5208evb", "", "TT", sizeof(kernel_mcf5208), kernel_mcf5208 },
+{ "microblaze", "petalogix-s3adsp1800", "", "TT",
+  sizeof(kernel_pls3adsp1800), kernel_pls3adsp1800 },
+{ "microblazeel", "petalogix-ml605", "", "TT",
+  sizeof(kernel_plml605), kernel_plml605 },
 
 { NULL }
 };
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH v2 for-2.12 01/16] s390x/tcg: introduce and use s390_program_interrupt()

2017-11-30 Thread Thomas Huth
On 29.11.2017 21:26, David Hildenbrand wrote:
> Allows to easily convert more callers of program_interrupt() and to
> easily introduce new exceptions without forgetting about the cpu state
> reset.
> 
> Use s390_program_interrupt() in places where we already had the same
> pattern. We will later get rid of program_interrupt().
> 
> RA != 0 checks are already done behind the scenes.
> 
> Reviewed-by: Richard Henderson 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/cpu.h   |  2 ++
>  target/s390x/crypto_helper.c |  7 ++-
>  target/s390x/excp_helper.c   |  5 +
>  target/s390x/interrupt.c | 13 +
>  target/s390x/mem_helper.c| 35 +++
>  target/s390x/misc_helper.c   |  3 +--
>  6 files changed, 30 insertions(+), 35 deletions(-)

Reviewed-by: Thomas Huth 



Re: [Qemu-devel] [PATCH v4 09/20] parallels: Switch to .bdrv_co_block_status()

2017-11-30 Thread Vladimir Sementsov-Ogievskiy

12.10.2017 21:59, Eric Blake wrote:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the parallels driver accordingly.  Note that
the internal function block_status() is still sector-based, because
it is still in use by other sector-based functions; but that's okay
because request_alignment is 512 as a result of those functions.
For now, no optimizations are added based on the mapping hint.

Signed-off-by: Eric Blake 

---
v4: rebase to interface tweak, R-b dropped
v3: no change
v2: rebase to mapping parameter; it is ignored, so R-b kept
---
  block/parallels.c | 22 +++---
  1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 2b6c6e5709..4a086f29e9 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -278,23 +278,31 @@ static coroutine_fn int 
parallels_co_flush_to_os(BlockDriverState *bs)
  }


-static int64_t coroutine_fn parallels_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn parallels_co_block_status(BlockDriverState *bs,
+  bool want_zero,
+  int64_t offset,
+  int64_t bytes,
+  int64_t *pnum,
+  int64_t *map,
+  BlockDriverState **file)
  {
  BDRVParallelsState *s = bs->opaque;
-int64_t offset;
+int count;

+assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
  qemu_co_mutex_lock(&s->lock);
-offset = block_status(s, sector_num, nb_sectors, pnum);
+offset = block_status(s, offset >> BDRV_SECTOR_BITS,
+  bytes >> BDRV_SECTOR_BITS, &count);
  qemu_co_mutex_unlock(&s->lock);

  if (offset < 0) {


pnum=count*BDRV_SECTOR_SIZE and map=0 setting missed here.


  return 0;
  }

+*pnum = count * BDRV_SECTOR_SIZE;
+*map = offset;
  *file = bs->file->bs;
-return (offset << BDRV_SECTOR_BITS) |
-BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
  }

  static coroutine_fn int parallels_co_writev(BlockDriverState *bs,
@@ -781,7 +789,7 @@ static BlockDriver bdrv_parallels = {
  .bdrv_open= parallels_open,
  .bdrv_close   = parallels_close,
  .bdrv_child_perm  = bdrv_format_default_perms,
-.bdrv_co_get_block_status = parallels_co_get_block_status,
+.bdrv_co_block_status = parallels_co_block_status,
  .bdrv_has_zero_init   = bdrv_has_zero_init_1,
  .bdrv_co_flush_to_os  = parallels_co_flush_to_os,
  .bdrv_co_readv  = parallels_co_readv,



--
Best regards,
Vladimir




Re: [Qemu-devel] [PATCH v2 for-2.12 02/16] s390x/tcg: get rid of runtime_exception()

2017-11-30 Thread Thomas Huth
On 29.11.2017 21:26, David Hildenbrand wrote:
> Let's use s390_program_interrupt() instead.
> 
> Reviewed-by: Richard Henderson 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/fpu_helper.c  |  2 +-
>  target/s390x/int_helper.c  | 14 +++---
>  target/s390x/internal.h|  2 --
>  target/s390x/misc_helper.c | 16 
>  4 files changed, 8 insertions(+), 26 deletions(-)

Is it a disadvantage that runtime_exception() was declared as
QEMU_NORETURN, and s390_program_interrupt() is not declared as
QEMU_NORETURN?

At a first glance, I guess it's ok, and in that case:

Reviewed-by: Thomas Huth 



Re: [Qemu-devel] [PATCH v1 2/2] intel-iommu: Extend address width to 48 bits

2017-11-30 Thread Peter Xu
On Thu, Nov 30, 2017 at 01:22:38PM +0800, Liu, Yi L wrote:
> On Tue, Nov 14, 2017 at 06:13:50PM -0500, prasad.singamse...@oracle.com wrote:
> > From: Prasad Singamsetty 
> > 
> > The current implementation of Intel IOMMU code only supports 39 bits
> > iova address width. This patch provides a new parameter (x-aw-bits)
> > for intel-iommu to extend its address width to 48 bits but keeping the
> > default the same (39 bits). The reason for not changing the default
> > is to avoid potential compatibility problems with live migration of
> > intel-iommu enabled QEMU guest. The only valid values for 'x-aw-bits'
> > parameter are 39 and 48.
> > 
> > After enabling larger address width (48), we should be able to map
> > larger iova addresses in the guest. For example, a QEMU guest that
> > is configured with large memory ( >=1TB ). To check whether 48 bits
> 
> I didn't quite get your point here. Address width limits the iova range,
> but it doesn't limit the guest memory range. e.g. you can use 39 bit iova
> address to access a guest physical address larger than (2^39 - 1) as long
> as the guest 2nd level page table is well programmed. Only one exception,
> if you requires a continuous iova range(e.g. 2^39), it would be an issue.
> Not sure if this is the major motivation of your patch? However, I'm not
> against extend the address width to be 48 bits. Just need to make it clear
> here.

One thing I can think of is the identity mapping. Say, when iommu=pt
is set in guest, meanwhile when PT capability is not supported from
hardware (here I mean, the emulated hardware, of course), guest kernel
will create one identity mapping to emulate the PT mode.

Current linux kernel's identity mapping should be a static 48 bits
mapping (it must cover the whole memory range of guest), so if we
provide a 39 bits vIOMMU to the guest, AFAIU we'll fail at device
attaching to that identity domain of every single device that is
protected by that 39 bits vIOMMU (kernel will find that domain gaw is
bigger than vIOMMU supported gaw of that device).

I do see no good fix for that, except boost the supported gaw to
bigger ones.

Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH v3 1/2] virtio: check VirtQueue Vring object is set

2017-11-30 Thread P J P
+-- On Wed, 29 Nov 2017, Cornelia Huck wrote --+
| I think the basic problem is still that you conflate two things:
| - vring.num, which cannot be flipped between 0 and !0 by the guest
| - vring.{desc,avail,used}, which can
| 
| IOW, if vring.num == 0, the guest cannot manipulate the queue; if
| vring.desc == 0, the guest can. 
| 
| Many functions have a likely/unlikely check, setup routines excepted.

Have sent a revised patch v4. Thank you.
--
Prasad J Pandit / Red Hat Product Security Team
47AF CE69 3A90 54AA 9045 1053 DD13 3D32 FE5B 041F



Re: [Qemu-devel] [PATCH v4 10/20] qcow: Switch to .bdrv_co_block_status()

2017-11-30 Thread Vladimir Sementsov-Ogievskiy

12.10.2017 21:59, Eric Blake wrote:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the qcow driver accordingly.  There is no
intent to optimize based on the want_zero flag for this format.

Signed-off-by: Eric Blake 


Reviewed-by: Vladimir Sementsov-Ogievskiy 



--
Best regards,
Vladimir




Re: [Qemu-devel] [PATCH] i386: turn off l3-cache property by default

2017-11-30 Thread Longpeng (Mike)


On 2017/11/29 21:35, Roman Kagan wrote:

> On Wed, Nov 29, 2017 at 07:58:19PM +0800, Longpeng (Mike) wrote:
>> On 2017/11/29 18:41, Eduardo Habkost wrote:
>>> On Wed, Nov 29, 2017 at 01:20:38PM +0800, Longpeng (Mike) wrote:
 On 2017/11/29 5:13, Eduardo Habkost wrote:
> On Tue, Nov 28, 2017 at 11:20:27PM +0300, Denis V. Lunev wrote:
>> On 11/28/2017 10:58 PM, Eduardo Habkost wrote:
>>> On Fri, Nov 24, 2017 at 04:26:50PM +0300, Denis Plotnikov wrote:
 Commit 14c985cffa "target-i386: present virtual L3 cache info for 
 vcpus"
 introduced and set by default exposing l3 to the guest.

 The motivation behind it was that in the Linux scheduler, when waking 
 up
 a task on a sibling CPU, the task was put onto the target CPU's 
 runqueue
 directly, without sending a reschedule IPI.  Reduction in the IPI count
 led to performance gain.

 However, this isn't the whole story.  Once the task is on the target
 CPU's runqueue, it may have to preempt the current task on that CPU, be
 it the idle task putting the CPU to sleep or just another running task.
 For that a reschedule IPI will have to be issued, too.  Only when that
 other CPU is running a normal task for too little time, the fairness
 constraints will prevent the preemption and thus the IPI.


 Agree. :)

 Our testing VM is Suse11 guest with idle=poll at that time and now I 
 realize
 that Suse11 has a BUG in its scheduler.

 For REHL 7.3 or upstream kernel, in ttwu_queue_remote(), a RES IPI is 
 issued if
 rq->idle is not polling:
 '''
 static void ttwu_queue_remote(struct task_struct *p, int cpu)
 {
struct rq *rq = cpu_rq(cpu);

if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list)) {
if (!set_nr_if_polling(rq->idle))
smp_send_reschedule(cpu);
else
trace_sched_wake_idle_without_ipi(cpu);
}
 }
 '''

 But for Suse11, it does not check, it send a RES IPI unconditionally.
>>>
>>> So, does that mean no Linux guest benefits from the l3-cache=on
>>> default except SuSE 11 guests?
>>>
>>
>> Not only that, there is another scenario:
>>
>> static void ttwu_queue(...)
>> {
>>  if (...two cpus NOT sharing L3-cache) {
>>  ...
>>  ttwu_queue_remote(p, cpu, wake_flags);
>>  return;
>>  }
>>  ...
>>  ttwu_do_activate(rq, p, wake_flags, &rf); <--*Here*
>>  ...
>> }
>>
>> In ttwu_do_activate(), there are also some opportunities with low 
>> probability to
>> do not send RES IPI even if the target cpu isn't in IDLE polling state.
> 
> Well it isn't so low actually, what you need is to keep the cpus busy
> switching tasks.  In that case it's not uncommon that the task being
> woken up on a remote cpu has accumulated more vruntime than the task
> already running on that cpu; in that case the new task won't preempt the
> current task and the IPI won't be issued.  E.g. on a RHEL 7.4 guest we
> saw:
> 

I get it, thanks.

 This boils down to the improvement being only achievable in workloads
 with many actively switching tasks.  We had no access to the
 (proprietary?) SAP HANA benchmark the commit referred to, but the
 pattern is also reproduced with "perf bench sched messaging -g 1"
 on 1 socket, 8 cores vCPU topology, we see indeed:

 l3-cache   #res IPI /s #time / 1 loops
 off560K1.8 sec
 on 40K 0.9 sec
> 
> The workload where it bites is mostly idle guest, with chains of
> dependent wakeups, i.e. with little parallelism:
> 
 Now there's a downside: with L3 cache the Linux scheduler is more eager
 to wake up tasks on sibling CPUs, resulting in unnecessary cross-vCPU
 interactions and therefore exessive halts and IPIs.  E.g. "perf bench
 sched pipe -i 10" gives

 l3-cache   #res IPI /s #HLT /s #time /10 loops
 off200 (no K)  230 0.2 sec
 on 400K330K0.5 sec


 I guess this issue could be resolved by disable the SD_WAKE_AFFINE.
> 
> Actually, it's SD_WAKE_AFFINE that's being effectively defeated by this
> l3-cache, because the scheduler thinks that the cpus that share the
> last-level cache are close enough that a dependent task can be woken up
> on a sibling cpu.
> 

In this case (sched pipe), without L3-cache, a dependent task is woken up on the
original cpu mostly, if these two tasks ran on the same cpu then the dependent
task is woken up without a RES IPI. The related codes are:
'''
void resched_curr(struct rq *rq)
{
if (cpu == smp_processor_id()) {
set_tsk_need_resched(curr

Re: [Qemu-devel] [PATCH v4 1/2] virtio: check VirtQueue Vring object is set

2017-11-30 Thread Cornelia Huck
On Wed, 29 Nov 2017 23:14:27 +0530
P J P  wrote:

> From: Prasad J Pandit 
> 
> A guest could attempt to use an uninitialised VirtQueue object
> or unset Vring.align leading to a arithmetic exception. Add check
> to avoid it.
> 
> Reported-by: Zhangboxian 
> Signed-off-by: Prasad J Pandit 
> ---
>  hw/virtio/virtio.c | 14 +++---
>  1 file changed, 11 insertions(+), 3 deletions(-)

Reviewed-by: Cornelia Huck 



Re: [Qemu-devel] Block layer complexity: what to do to keep it under control?

2017-11-30 Thread Fam Zheng
On Wed, 11/29 12:00, Stefan Hajnoczi wrote:
> On Wed, Nov 29, 2017 at 11:55:02AM +0800, Fam Zheng wrote:
> > As we move forwards with new features in the block layer, the chances of 
> > tricky
> > bugs happening have been increasing alongside - block jobs, coroutines,
> > throttling, AioContext, op blockers and image locking combined together 
> > make a
> > large and complex picture that is hard to fully understand and work with. 
> > Some
> > bugs we've encountered are quite challenging already.  Examples are:
> > 
> > - segfault in parallel blockjobs (iotest 30)
> >   https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01144.html
> > 
> > - Intermittent hang of iotest 194 (bdrv_drain_all after non-shared storage
> >   migration)
> >   https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01626.html
> > 
> > - Drainage in bdrv_replace_child_noperm()
> >   https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg00868.html
> > 
> > - Regression from 2.8: stuck in bdrv_drain()
> >   https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg02193.html
> > 
> > So in principle, what should we do to make the block layer easy to 
> > understand,
> > develop with and debug?
> 
> The assumptions that the code relies on are unclear so it's easy to
> introduce new bugs.

Is that one thing we could do better in documenting?

> 
> We are at a point where code review isn't finding certain bugs because
> no single person knows all the assumptions.  Previously the problem was
> contained because maintainers spotted problems before patches were
> merged.
> 
> This is not primarily a documentation problem though.  We cannot
> document our way out of this because no single person (patch author or
> code reviewer) can know or check everything anymore due to the scale.
> 
> I think it's a (lack of) design problem because we have many incomplete
> abstractions like block jobs, IOThreads, block graph, image locking,
> etc.  They do not cover all possibly states and interactions today.
> Extending them leads to complex bugs.
> 
> A little progress has been made with defining higher-level APIs for
> block drivers and block jobs.  This way they either don't deal with
> low-level details of the concurrency and event loop models (e.g.
> bdrv_coroutine_enter()) or there is an interface that prompts them to
> integrate properly like bdrv_attach/detach_aio_context().

Sounds good.

> 
> Event loops and coroutines are good but they should not be used directly
> by block drivers and block jobs.  We need safe, high-level APIs that
> implement commonly-used operations.
> 
> > - Documentation
> > 
> >   There is no central developer doc about block layer, especially how all 
> > pieces
> >   fit together. Having one will make it a lot easier for new contributors to
> >   understand better. Of course, we're facing the old problem: the code is
> >   moving, maintaining an updated document needs effort.
> > 
> >   Idea: add ./doc/deve/block.txt?
> 
> IOThreads and AioContexts are addressed here:
> docs/devel/multiple-iothreads.txt
> 
> The game has become significantly more complex than what the document
> describes.  It's lacking aio_co_wake() and aio_co_schedule() for
> example.
> 
> > - Simplified code, or more orthogonal/modularized architecture.
> > 
> >   Each aspect of block layer is complex enough so isolating them as much as
> >   possible is a reasonable approach to control the complexity. Block jobs 
> > and
> >   throttling becoming block filters is a good example, we should identify 
> > more.
> > 
> >   Idea: rethink event loops. Create coroutines ubiquitously (for example for
> >   each fd handler, BH and timer), so that many nested aio_poll() can be 
> > removed.
> > 
> >   Crazy idea: move the whole block layer to a vhost process, and implement
> >   existing features differently, especially in terms of multi-threading 
> > (hint:
> >   rust?).
> 
> A reimplementation will not solve the problem because:
> 
> 1. If it still has the same feature set and requirements then the level
>of complexity will be comparable.
> 
> 2. We can reduce accidental (inessential) complexity by continuing the
>various efforts around the block graph, block jobs, multi-queue block
>layer with an eye towards higher level APIs.

Starting over is certainly not the motivation to do qemu-vhost, but it would be
an opportunity to use different async/concurrency paradigms if that is going to
happen. I think in current block layer, event loop + coroutine is a good
combination, but having nested aio_poll()'s made it worse, then mixing IOThreads
in makes it a lot more complicated.

Fam



Re: [Qemu-devel] [RFC PATCH 1/1] s390x/css: unresrict cssids

2017-11-30 Thread Cornelia Huck
On Wed, 29 Nov 2017 19:51:23 +0100
Christian Borntraeger  wrote:

> On 11/28/2017 03:45 PM, Cornelia Huck wrote:
> > On Tue, 28 Nov 2017 15:17:49 +0100
> > Christian Borntraeger  wrote:
> >   
> >> On 11/28/2017 03:01 PM, Cornelia Huck wrote:  
> >>> On Tue, 28 Nov 2017 14:25:08 +0100
> >>> Christian Borntraeger  wrote:  
> >   
>  What I want now is to enable vfio-ccw for libvirt and Linux guests and 
>  for that
>  we need to lift the restriction of having vfio not in FE.
> >>>
> >>> This, however, worries me. I don't really consider vfio-ccw ready for
> >>> prime time yet. We still need to figure out path handling at the very
> >>> least. And I'm still not sure that our non-handling of halt/clear won't
> >>> cause major issues with e.g. a storage server error recovery.
> >>>
> >>> Can we flag vfio-ccw as experimental or such in libvirt?
> >>
> >> We should have flagged vfio-ccw experimental in QEMU then.
> >> e.g. by using x-vfio-ccw or whatever.I dont think we can do that
> >> after the facts.  
> > 
> > Yes, we should have done that... can't fix that now, unfortunately.
> >   
> >>
> >> I am not that deep into vfio-ccw, but I was under the impression that
> >> the missing features should not affect the vfio interface that is
> >> created by libvirt. After all this should just be a "-device 
> >> vfio-ccw,."
> >> statement and some setup steps beforehand (unbind + setup of vfio-ccw)  
> > 
> > The halt/clear stuff is highly unlikely to influence the configuration
> > interface. I'm still not too clear about path handling, though. Will we
> > need different modes (hypervisor managed vs. full passthrough? probably
> > only passthrough)? What about pgid handling - do we need some kind of
> > pgid manager? (Keep in mind that you get to set the pgid once. This has
> > implications for e.g. reserve/release.)
> > 
> > Also, what about devices other than ECKD DASD? Has there been any
> > testing? Tapes should be interesting; the other interesting ccw devices
> > are QDIO-based and I'm not sure we want to spend anything on supporting
> > those.  
> 
> DASD is probably the most important thing today, QDIO might never happen
> (or very late).

One thing I'd find interesting about tapes is very long running channel
programs (like rewind) and how they interact with halt/clear. But yes,
I would think that DASD is the most important one in practice.

> See my proposal below.
> 
> > 
> > The interface is probably fine, but I'd like to get an idea about the
> > path stuff (is the attachment spec that contains the pgid stuff
> > publicly available, btw.?)
> >   
> >>
> >> If your concern is the serviceability I think it would be valid for a RHEL
> >> KVM to disable vfio-ccw in the kernel. Maybe we could even provide a config
> >> for QEMU?  
> > 
> > It's fine just to turn off vfio-ccw in the kernel.
> >   
> >> PS: I learned from the PCI and CCW experience that I do not want
> >> to upstream kernel/qemu code unless I have a working libvirt code that
> >> shows that the thing will work.  
> > 
> > Yes, I understand the wish to verify the interface, and I think it's a
> > good idea. What I'm worried about is that this might be the precursor
> > to a push to support it, and I don't think vfio-ccw is ready for that
> > yet.
> >   
> FWIW, libvirt should not care if a device works in all cases or not because 
> libvirt versions should work with all kind of QEMU/kernel combinations. 
> Fencing in libvirt that the kernel part is not up to the task is making
> me feel the same way as you when you see  the css-unrestricted property
> at a device ;-)

I'm more worried about the political angle than the technical. If we
don't get pressure to support this too soon, I don't care that much
about experimental or not.

> Having said that, I think that having vfio-ccw support has a value (and it
> actually works for an unnamed test infrastructure). In addition to that I 
> am much more likely to actually test this continuously if I have libvirt 
> support.

Good to hear that it works for a non-Linux guest. Any plans to test
something like storage server failover?

[I'd really love to do something about the path handling stuff, but the
combination of scant documentation and scantier time works against me.]

> 
> 
> So what about the following: 
> 
> 1. We will implement the libvirt support with either a or b:
> a: if we find a solution to our "where to put the property" dispute use that 
> to decide
> if we can add vfio-ccw
> b if not: just provide a patch that lifts the restriction without any 
> property.
> in libvirt blindy assume  that free assignment will work. old QEMUs will 
> complain at
> startup and  libvirt will print the QEMU error. This is similar to other 
> situations 
> where libvirt cannot fully bprobe  if the QEMU will work or not. (not having 
> vfio-ccw
> support in the kernel will certainly allow libvirt to reject this upfront)

I hope we end up with a solution that works for everyone...

> 

Re: [Qemu-devel] [PATCH v4 11/20] qcow2: Switch to .bdrv_co_block_status()

2017-11-30 Thread Vladimir Sementsov-Ogievskiy

12.10.2017 21:59, Eric Blake wrote:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the qcow2 driver accordingly.

For now, we are ignoring the 'want_zero' hint.  However, it should
be relatively straightforward to honor the hint as a way to return
larger *pnum values when we have consecutive clusters with the same
data/zero status but which differ only in having non-consecutive
mappings.


useful thing (for example, to get several compressed clusters in one chunk,
but do not include following zero clusters). but should not the hint be 
called want_mapping for it?

or may be we will move to "int hint_flags", which will include status flags
we a interested in?



Signed-off-by: Eric Blake 


Reviewed-by: Vladimir Sementsov-Ogievskiy 



---
v4: update to interface tweak
v3: no change
v2: rebase to mapping flag
---
  block/qcow2.c | 24 +---
  1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 92cb9f9bfa..a84e50a6e6 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1625,32 +1625,34 @@ static void qcow2_join_options(QDict *options, QDict 
*old_options)
  }
  }

-static int64_t coroutine_fn qcow2_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
+  bool want_zero,
+  int64_t offset, int64_t count,


'count' is used instead of 'bytes' because of qcow2_get_cluster_offset 
which wants int*..



+  int64_t *pnum, int64_t *map,
+  BlockDriverState **file)
  {
  BDRVQcow2State *s = bs->opaque;
  uint64_t cluster_offset;
  int index_in_cluster, ret;
  unsigned int bytes;
-int64_t status = 0;
+int status = 0;

-bytes = MIN(INT_MAX, nb_sectors * BDRV_SECTOR_SIZE);
+bytes = MIN(INT_MAX, count);
  qemu_co_mutex_lock(&s->lock);
-ret = qcow2_get_cluster_offset(bs, sector_num << BDRV_SECTOR_BITS, &bytes,
-   &cluster_offset);
+ret = qcow2_get_cluster_offset(bs, offset, &bytes, &cluster_offset);
  qemu_co_mutex_unlock(&s->lock);
  if (ret < 0) {
  return ret;
  }

-*pnum = bytes >> BDRV_SECTOR_BITS;
+*pnum = bytes;

  if (cluster_offset != 0 && ret != QCOW2_CLUSTER_COMPRESSED &&
  !s->crypto) {
-index_in_cluster = sector_num & (s->cluster_sectors - 1);
-cluster_offset |= (index_in_cluster << BDRV_SECTOR_BITS);
+index_in_cluster = offset & (s->cluster_size - 1);
+*map = cluster_offset | index_in_cluster;
  *file = bs->file->bs;
-status |= BDRV_BLOCK_OFFSET_VALID | cluster_offset;
+status |= BDRV_BLOCK_OFFSET_VALID;
  }
  if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) {
  status |= BDRV_BLOCK_ZERO;
@@ -4333,7 +4335,7 @@ BlockDriver bdrv_qcow2 = {
  .bdrv_child_perm  = bdrv_format_default_perms,
  .bdrv_create= qcow2_create,
  .bdrv_has_zero_init = bdrv_has_zero_init_1,
-.bdrv_co_get_block_status = qcow2_co_get_block_status,
+.bdrv_co_block_status = qcow2_co_block_status,

  .bdrv_co_preadv = qcow2_co_preadv,
  .bdrv_co_pwritev= qcow2_co_pwritev,



--
Best regards,
Vladimir




Re: [Qemu-devel] [PATCH v2 for-2.12 04/16] s390x/ioinst: pass the retaddr to all IO instructions

2017-11-30 Thread Thomas Huth
On 29.11.2017 21:26, David Hildenbrand wrote:
> TCG needs the retaddr when injecting an interrupt. Let's just pass it
> along and use RA_IGNORED for KVM. The value will be completely ignored for
> KVM.
> 
> Convert program_interrupt() to s390_program_interrupt() directly, making
> use of the passed address.
> 
> Reviewed-by: Richard Henderson 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/cpu.h |  1 +
>  target/s390x/internal.h| 29 +++-
>  target/s390x/ioinst.c  | 67 
> +++---
>  target/s390x/kvm.c | 27 ++-
>  target/s390x/misc_helper.c | 20 +++---
>  5 files changed, 75 insertions(+), 69 deletions(-)

Reviewed-by: Thomas Huth 



Re: [Qemu-devel] [PATCH v2 for-2.12 05/16] s390x/pci: pass the retaddr to all PCI instructions

2017-11-30 Thread Thomas Huth
On 29.11.2017 21:26, David Hildenbrand wrote:
> Once we wire up TCG, we will need the retaddr to correctly inject
> program interrupts. As we want to get rid of the function
> program_interrupt(), convert PCI code too.
> 
> For KVM, we can simply use RA_IGNORED.
> 
> Convert program_interrupt() to s390_program_interrupt() directly, making
> use of the passed address.
> 
> Reviewed-by: Richard Henderson 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/s390x/s390-pci-inst.c | 83 
> +---
>  hw/s390x/s390-pci-inst.h | 16 ++
>  target/s390x/kvm.c   | 14 
>  3 files changed, 59 insertions(+), 54 deletions(-)

Reviewed-by: Thomas Huth 



Re: [Qemu-devel] [PATCH v1 2/2] intel-iommu: Extend address width to 48 bits

2017-11-30 Thread Liu, Yi L
On Thu, Nov 30, 2017 at 05:11:55PM +0800, Peter Xu wrote:
> On Thu, Nov 30, 2017 at 01:22:38PM +0800, Liu, Yi L wrote:
> > On Tue, Nov 14, 2017 at 06:13:50PM -0500, prasad.singamse...@oracle.com 
> > wrote:
> > > From: Prasad Singamsetty 
> > > 
> > > The current implementation of Intel IOMMU code only supports 39 bits
> > > iova address width. This patch provides a new parameter (x-aw-bits)
> > > for intel-iommu to extend its address width to 48 bits but keeping the
> > > default the same (39 bits). The reason for not changing the default
> > > is to avoid potential compatibility problems with live migration of
> > > intel-iommu enabled QEMU guest. The only valid values for 'x-aw-bits'
> > > parameter are 39 and 48.
> > > 
> > > After enabling larger address width (48), we should be able to map
> > > larger iova addresses in the guest. For example, a QEMU guest that
> > > is configured with large memory ( >=1TB ). To check whether 48 bits
> > 
> > I didn't quite get your point here. Address width limits the iova range,
> > but it doesn't limit the guest memory range. e.g. you can use 39 bit iova
> > address to access a guest physical address larger than (2^39 - 1) as long
> > as the guest 2nd level page table is well programmed. Only one exception,
> > if you requires a continuous iova range(e.g. 2^39), it would be an issue.
> > Not sure if this is the major motivation of your patch? However, I'm not
> > against extend the address width to be 48 bits. Just need to make it clear
> > here.
> 
> One thing I can think of is the identity mapping. Say, when iommu=pt
> is set in guest, meanwhile when PT capability is not supported from
> hardware (here I mean, the emulated hardware, of course), guest kernel
> will create one identity mapping to emulate the PT mode.

True.
 
> Current linux kernel's identity mapping should be a static 48 bits
> mapping (it must cover the whole memory range of guest), so if we

I suppose guest memory range depends on the AW reported by CPUID? Not sure
if it is constantly 48 bits.

> provide a 39 bits vIOMMU to the guest, AFAIU we'll fail at device
> attaching to that identity domain of every single device that is
> protected by that 39 bits vIOMMU (kernel will find that domain gaw is
> bigger than vIOMMU supported gaw of that device).

Yeah, this is a good argue. As it is 1:1 mapping, the translated address
is limited all the same.

> I do see no good fix for that, except boost the supported gaw to
> bigger ones.

How about defaultly expose cap.PT bit? In that case, there will no 1:1
mapping in guest side. Translation is skipped. So the IOMMU AW won't
limit the addressing.

Regards,
Yi L

> 
> Thanks,
> 
> -- 
> Peter Xu
> 



Re: [Qemu-devel] [PATCH 17/25] spapr: add a sPAPRXive object to the machine

2017-11-30 Thread David Gibson
On Thu, Nov 23, 2017 at 02:29:47PM +0100, Cédric Le Goater wrote:
> The XIVE object is designed to be always available, so it is created
> unconditionally on newer machines.

There doesn't actually seem to be anything dependent on machine
version here.

> Depending on the configuration and
> the guest capabilities, the CAS negotiation process will decide which
> interrupt model to use, legacy or XIVE.
> 
> The XIVE model makes use of the full range of the IRQ number space
> because the IRQ numbers for the CPU IPIs are allocated in the range
> below XICS_IRQ_BASE, which is unused by XICS.

Ok.  And I take it 4096 is enough space for the XIVE IPIs for the
forseeable future?

> 
> Signed-off-by: Cédric Le Goater 
> ---
>  hw/ppc/spapr.c | 34 ++
>  include/hw/ppc/spapr.h |  2 ++
>  2 files changed, 36 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 5d3325ca3c88..0e0107c8272c 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -56,6 +56,7 @@
>  #include "hw/ppc/spapr_vio.h"
>  #include "hw/pci-host/spapr.h"
>  #include "hw/ppc/xics.h"
> +#include "hw/ppc/spapr_xive.h"
>  #include "hw/pci/msi.h"
>  
>  #include "hw/pci/pci.h"
> @@ -204,6 +205,29 @@ static void xics_system_init(MachineState *machine, int 
> nr_irqs, Error **errp)
>  }
>  }
>  
> +static sPAPRXive *spapr_xive_create(sPAPRMachineState *spapr, int nr_irqs,
> +Error **errp)
> +{
> +Error *local_err = NULL;
> +Object *obj;
> +
> +obj = object_new(TYPE_SPAPR_XIVE);
> +object_property_add_child(OBJECT(spapr), "xive", obj, &error_abort);
> +object_property_set_int(obj, nr_irqs, "nr-irqs",  &local_err);
> +if (local_err) {
> +goto error;
> +}
> +object_property_set_bool(obj, true, "realized", &local_err);
> +if (local_err) {
> +goto error;
> +}
> +
> +return SPAPR_XIVE(obj);
> +error:
> +error_propagate(errp, local_err);
> +return NULL;
> +}
> +
>  static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
>int smt_threads)
>  {
> @@ -2360,6 +2384,16 @@ static void ppc_spapr_init(MachineState *machine)
>  /* Set up Interrupt Controller before we create the VCPUs */
>  xics_system_init(machine, XICS_IRQS_SPAPR, &error_fatal);
>  
> +/* We don't have KVM support yet, so check for irqchip=on */
> +if (kvm_enabled() && machine_kernel_irqchip_required(machine)) {
> +error_report("kernel_irqchip requested. no XIVE support");

I think you want an actual exit(1) here, no?  error_report() will
print an error but keep going.

> +} else {
> +/* XIVE uses the full range of IRQ numbers. The CPU IPIs will
> + * use the range below XICS_IRQ_BASE, which is unused by XICS. */
> +spapr->xive = spapr_xive_create(spapr, XICS_IRQ_BASE + 
> XICS_IRQS_SPAPR,
> +&error_fatal);

XICS_IRQ_BASE == 4096, and XICS_IRQS_SPAPR (which we should rename at
some point) == 1024.

So we have a total irq space of 5k, which is a bit odd.  I'd be ok
with rounding it out to 8k for newer machines if that's useful.
Sparse allocations in there might make life easier for getting
consistent irq numbers without an "allocator" per se (because we can
use different regions for VIO, PCI intx, MSI, etc. etc.).

> +}
> +
>  /* Set up containers for ibm,client-architecture-support negotiated 
> options
>   */
>  spapr->ov5 = spapr_ovec_new();
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 9a3885593c86..90e2b0f6c678 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -14,6 +14,7 @@ struct sPAPRNVRAM;
>  typedef struct sPAPREventLogEntry sPAPREventLogEntry;
>  typedef struct sPAPREventSource sPAPREventSource;
>  typedef struct sPAPRPendingHPT sPAPRPendingHPT;
> +typedef struct sPAPRXive sPAPRXive;
>  
>  #define HPTE64_V_HPTE_DIRTY 0x0040ULL
>  #define SPAPR_ENTRY_POINT   0x100
> @@ -127,6 +128,7 @@ struct sPAPRMachineState {
>  MemoryHotplugState hotplug_memory;
>  
>  const char *icp_type;
> +sPAPRXive  *xive;
>  };
>  
>  #define H_SUCCESS 0

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 05/17] timer: generalize Dallas/Maxim RTC i2c devices

2017-11-30 Thread David Gibson
On Sun, Nov 26, 2017 at 03:59:03PM -0600, Michael Davidsaver wrote:
> Support for: ds1307, ds1337, ds1338, ds1339,
> ds1340, ds1375, ds1388, and ds3231.
> 
> Tested with ds1338 and ds1375.
> 
> Signed-off-by: Michael Davidsaver 

I certainly like the idea of consolidating this code, but reviewing to
see that the new code really is a generalization of the old is
something I won't have time for for a while.

Also, hw/timer is not within my purview so it'll probably need to go
another path to merge.

> ---
>  default-configs/arm-softmmu.mak |   2 +-
>  hw/timer/Makefile.objs  |   2 +-
>  hw/timer/ds-rtc-i2c.c   | 461 
> 
>  hw/timer/ds1338.c   | 239 -
>  4 files changed, 463 insertions(+), 241 deletions(-)
>  create mode 100644 hw/timer/ds-rtc-i2c.c
>  delete mode 100644 hw/timer/ds1338.c
> 
> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> index d37edc4312..b857823681 100644
> --- a/default-configs/arm-softmmu.mak
> +++ b/default-configs/arm-softmmu.mak
> @@ -31,7 +31,7 @@ CONFIG_SMC91C111=y
>  CONFIG_ALLWINNER_EMAC=y
>  CONFIG_IMX_FEC=y
>  CONFIG_FTGMAC100=y
> -CONFIG_DS1338=y
> +CONFIG_DSRTCI2C=y
>  CONFIG_PFLASH_CFI01=y
>  CONFIG_PFLASH_CFI02=y
>  CONFIG_MICRODRIVE=y
> diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
> index 8c19eac3b6..290015ebec 100644
> --- a/hw/timer/Makefile.objs
> +++ b/hw/timer/Makefile.objs
> @@ -3,7 +3,7 @@ common-obj-$(CONFIG_ARM_MPTIMER) += arm_mptimer.o
>  common-obj-$(CONFIG_ARM_V7M) += armv7m_systick.o
>  common-obj-$(CONFIG_A9_GTIMER) += a9gtimer.o
>  common-obj-$(CONFIG_CADENCE) += cadence_ttc.o
> -common-obj-$(CONFIG_DS1338) += ds1338.o
> +common-obj-$(CONFIG_DSRTCI2C) += ds-rtc-i2c.o
>  common-obj-$(CONFIG_HPET) += hpet.o
>  common-obj-$(CONFIG_I8254) += i8254_common.o i8254.o
>  common-obj-$(CONFIG_M48T59) += m48t59.o
> diff --git a/hw/timer/ds-rtc-i2c.c b/hw/timer/ds-rtc-i2c.c
> new file mode 100644
> index 00..ad2f8f2a68
> --- /dev/null
> +++ b/hw/timer/ds-rtc-i2c.c
> @@ -0,0 +1,461 @@
> +/* Emulation of various Dallas/Maxim RTCs accessed via I2C bus
> + *
> + * Copyright (c) 2017 Michael Davidsaver
> + * Copyright (c) 2009 CodeSourcery
> + *
> + * Authors: Michael Davidsaver
> + *  Paul Brook
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the LICENSE file in the top-level directory.
> + *
> + * Models real time read/set and NVRAM.
> + * Does not model alarms, or control/status registers.
> + *
> + * Generalized register map is:
> + *   [Current time]
> + *   [Alarm settings] (optional)
> + *   [Control/Status] (optional)
> + *   [Non-volatile memory] (optional)
> + *
> + * The current time registers are almost always the same,
> + * with the exception being that some have a CENTURY bit
> + * in the month register.
> + */
> +#include "qemu/osdep.h"
> +#include "qemu/log.h"
> +#include "qemu/timer.h"
> +#include "qemu/bcd.h"
> +#include "hw/hw.h"
> +#include "hw/registerfields.h"
> +#include "hw/i2c/i2c.h"
> +#include "sysemu/qtest.h"
> +#include "qemu/error-report.h"
> +
> +/* #define DEBUG_DSRTC */
> +
> +#ifdef DEBUG_DSRTC
> +#define DPRINTK(FMT, ...) info_report(TYPE_DSRTC " : " FMT, ## __VA_ARGS__)
> +#else
> +#define DPRINTK(FMT, ...) do {} while (0)
> +#endif
> +
> +#define LOG(MSK, FMT, ...) qemu_log_mask(MSK, TYPE_DSRTC " : " FMT "\n", \
> +## __VA_ARGS__)
> +
> +#define DSRTC_REGSIZE (0x40)
> +
> +/* values stored in BCD */
> +/* 00-59 */
> +#define R_SEC   (0x0)
> +/* 00-59 */
> +#define R_MIN   (0x1)
> +#define R_HOUR  (0x2)
> +/* 1-7 */
> +#define R_WDAY  (0x3)
> +/* 0-31 */
> +#define R_DATE  (0x4)
> +#define R_MONTH (0x5)
> +/* 0-99 */
> +#define R_YEAR  (0x6)
> +
> +/* use 12 hour mode when set */
> +FIELD(HOUR, SET12, 6, 1)
> +/* 00-23 */
> +FIELD(HOUR, HOUR24, 0, 6)
> +FIELD(HOUR, AMPM, 5, 1)
> +/* 1-12 (not 0-11!) */
> +FIELD(HOUR, HOUR12, 0, 5)
> +
> +/* 1-12 */
> +FIELD(MONTH, MONTH, 0, 5)
> +FIELD(MONTH, CENTURY, 7, 1)
> +
> +typedef struct DSRTCInfo {
> +/* if bit 7 of the Month register is set after Y2K */
> +bool has_century;
> +/* address of first non-volatile memory cell.
> + * nv_start >= reg_end means no NV memory.
> + */
> +uint8_t nv_start;
> +/* total size of register range.  When address counter rolls over. */
> +uint8_t reg_size;
> +} DSRTCInfo;
> +
> +typedef struct DSRTCState {
> +I2CSlave parent_obj;
> +
> +const DSRTCInfo *info;
> +
> +qemu_irq alarm_irq;
> +
> +/* register address counter */
> +uint8_t addr;
> +/* when writing, whether the address has been sent */
> +bool addrd;
> +
> +int64_t time_offset;
> +int8_t wday_offset;
> +
> +uint8_t regs[DSRTC_REGSIZE];
> +} DSRTCState;
> +
> +typedef struct DSRTCClass {
> +I2CSlaveClass parent_class;
> +
> +const DSRTCInfo *info;
> +} DSRTCClass;
> +
> +#define TYPE_DSRTC "ds-rtc-i2c"
> +#defi

Re: [Qemu-devel] unsubscribe qemu-devel

2017-11-30 Thread Daniel P. Berrange
On Thu, Nov 30, 2017 at 06:11:26AM +, Huang, Haibin wrote:
> unsubscribe qemu-devel

FYI, this is the wrong email address to use for unsubscribing. You instead
need

   qemu-devel-requ...@nongnu.org

Alternatively, there is a web UI that lets you do it:

  https://lists.nongnu.org/mailman/listinfo/qemu-devel

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-devel] [PATCH v4 01/32] migration: better error handling with QEMUFile

2017-11-30 Thread Dr. David Alan Gilbert
* Peter Xu (pet...@redhat.com) wrote:
> If the postcopy down due to some reason, we can always see this on dst:
> 
>   qemu-system-x86_64: RP: Received invalid message 0x length 0x
> 
> However in most cases that's not the real issue. The problem is that
> qemu_get_be16() has no way to show whether the returned data is valid or
> not, and we are _always_ assuming it is valid. That's possibly not wise.
> 
> The best approach to solve this would be: refactoring QEMUFile interface
> to allow the APIs to return error if there is. However it needs quite a
> bit of work and testing. For now, let's explicitly check the validity
> first before using the data in all places for qemu_get_*().
> 
> This patch tries to fix most of the cases I can see. Only if we are with
> this, can we make sure we are processing the valid data, and also can we
> make sure we can capture the channel down events correctly.
> 
> Signed-off-by: Peter Xu 
> ---
>  migration/migration.c |  5 +
>  migration/ram.c   | 26 ++
>  migration/savevm.c| 40 ++--
>  3 files changed, 65 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index c0206023d7..eae34d0524 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1708,6 +1708,11 @@ static void *source_return_path_thread(void *opaque)
>  header_type = qemu_get_be16(rp);
>  header_len = qemu_get_be16(rp);
>  
> +if (qemu_file_get_error(rp)) {
> +mark_source_rp_bad(ms);
> +goto out;
> +}
> +
>  if (header_type >= MIG_RP_MSG_MAX ||
>  header_type == MIG_RP_MSG_INVALID) {
>  error_report("RP: Received invalid message 0x%04x length 0x%04x",
> diff --git a/migration/ram.c b/migration/ram.c
> index 8620aa400a..960c726ff2 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2687,7 +2687,7 @@ static int ram_load_postcopy(QEMUFile *f)
>  void *last_host = NULL;
>  bool all_zero = false;
>  
> -while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
> +while (!(flags & RAM_SAVE_FLAG_EOS)) {

I still think you need to keep the !ret && - see below;
anyway, there's no harm in keeping it!

>  ram_addr_t addr;
>  void *host = NULL;
>  void *page_buffer = NULL;
> @@ -2696,6 +2696,16 @@ static int ram_load_postcopy(QEMUFile *f)
>  uint8_t ch;
>  
>  addr = qemu_get_be64(f);
> +
> +/*
> + * If qemu file error, we should stop here, and then "addr"
> + * may be invalid
> + */
> +ret = qemu_file_get_error(f);
> +if (ret) {
> +break;
> +}
> +
>  flags = addr & ~TARGET_PAGE_MASK;
>  addr &= TARGET_PAGE_MASK;
>  
> @@ -2776,6 +2786,13 @@ static int ram_load_postcopy(QEMUFile *f)
>  error_report("Unknown combination of migration flags: %#x"
>   " (postcopy mode)", flags);
>  ret = -EINVAL;
> +break;

This 'break' breaks from the switch, but doesn't break the loop and
because you remove dthe !ret && from the top, the loop keeps going when
it shouldn't.

> +}
> +
> +/* Detect for any possible file errors */
> +if (qemu_file_get_error(f)) {
> +ret = qemu_file_get_error(f);
> +break;
>  }

This is all simpler if you just leave the !ret && at the top, and then
make this:
  if (!ret) {
  ret = qemu_file_get_error(f);
  }

>  
>  if (place_needed) {

Make that

  if (!ret && place_needed) {

> @@ -2789,9 +2806,10 @@ static int ram_load_postcopy(QEMUFile *f)
>  ret = postcopy_place_page(mis, place_dest,
>place_source, block);
>  }
> -}
> -if (!ret) {
> -ret = qemu_file_get_error(f);
> +
> +if (ret) {
> +break;
> +}

And with the !ret check at the top this goes again.

>  }
>  }
>  
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 4a88228614..1da0255cd7 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1765,6 +1765,11 @@ static int loadvm_process_command(QEMUFile *f)
>  cmd = qemu_get_be16(f);
>  len = qemu_get_be16(f);
>  
> +/* Check validity before continue processing of cmds */
> +if (qemu_file_get_error(f)) {
> +return qemu_file_get_error(f);
> +}
> +
>  trace_loadvm_process_command(cmd, len);
>  if (cmd >= MIG_CMD_MAX || cmd == MIG_CMD_INVALID) {
>  error_report("MIG_CMD 0x%x unknown (len 0x%x)", cmd, len);
> @@ -1830,6 +1835,7 @@ static int loadvm_process_command(QEMUFile *f)
>   */
>  static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
>  {
> +int ret;
>  uint8_t read_mark;
>  uint32_t read_section_id;
>  
> @@ -1840,6 +1846,13 @@ static bool check_section_footer(QEMUFile *f, 
> Save

Re: [Qemu-devel] [for-2.12 3/7] pci: Fold pci_bus.h into pci.h

2017-11-30 Thread Marcel Apfelbaum

On 30/11/2017 6:02, David Gibson wrote:

On Wed, Nov 29, 2017 at 12:38:00PM +0200, Marcel Apfelbaum wrote:

On 29/11/2017 10:46, David Gibson wrote:

include/hw/pci/pci_bus.h is now very small and can only safely be included
after hw/pci/pci.h.  So, just fold it into pci.h.



I don't get the benefit from merging the header files.
I would go the other way around and find stuff specific
to pci_bus and add it there, like the pci_bus_new*
you touched in the prev patch.


Hrm.  Except the point of the earlier patch was that those are
actually spoecific to root buses, so would really belong in say
pci-host.h, rather than pci-bus.h.

A log of PCI stuff deals with interaction between the device and bus
though, so it just seemed like more trouble than it was worth to go
disentangling them properly.


Maybe if *all* code files requiring pci.h would automatically
need pci_bus.h would make sense, but I didn't check that.


Yeah, I don't think every user of pci.h needs pci_bus.h, although a
fair few do as you can see from the diff.  Well, I guess it's up to
Michael.  I'll be tolerably content either way - I'd say this is the
least important patch of the series.



Agreed,

Thanks,
Marcel



[Qemu-devel] [Bug 1701974] Re: pwrite does not work right under qemu-sh4

2017-11-30 Thread John Paul Adrian Glaubitz
This might be related to this fix:

>
https://git.qemu.org/?p=qemu.git;a=commit;h=8bf8e9df4a7d82c7a47cc961c9cdee1615595de0

FWIW, if you're interested in sh4, please join #debian-ports on OFTC and
subscribe to the debian-superh mailing list. We're doing lots of sh4
development and testing QEMU in Debian.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1701974

Title:
  pwrite does not work right under qemu-sh4

Status in QEMU:
  New

Bug description:
  The pwrite system call has no effect when writing to a non-zero file
  position, in a program running under qemu-sh4 (version 2.9.0).

  How to reproduce:
  - Compile the program:
sh4-linux-gnu-gcc-5 -O -Wall -static -o test-pwrite test-pwrite.c
  - Set environment variable for using qemu-sh4 (actually not needed, since the 
program is statically linked here).
  - ~/inst-qemu/2.9.0/bin/qemu-sh4 test-pwrite

  Expected output:
  buf = 01W3456789

  Actual output:
  buf = 0123456789
  test-pwrite.c:56: assertion 'strcmp ("01W3456789",buf) == 0' failed
  qemu: uncaught target signal 6 (Aborted) - core dumped

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1701974/+subscriptions



[Qemu-devel] [Bug 1735384] [NEW] OpenJDK JVM segfaults on qemu-sh4 (regression)

2017-11-30 Thread John Paul Adrian Glaubitz
Public bug reported:

Some of the recent changes introduced a regression which makes the
OpenJDK JVM crash on qemu-sh4:

(sid-sh4-sbuild)root@nofan:/# java -version
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault
(sid-sh4-sbuild)root@nofan:/#

An older version works fine:

(sid-sh4-sbuild)root@nofan:/# java -version
openjdk version "9.0.1"
OpenJDK Runtime Environment (build 9.0.1+11-Debian-1)
OpenJDK Zero VM (build 9.0.1+11-Debian-1, interpreted mode)
(sid-sh4-sbuild)root@nofan:/#

Haven't had time for bisecting this yet.

Adrian

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1735384

Title:
  OpenJDK JVM segfaults on qemu-sh4 (regression)

Status in QEMU:
  New

Bug description:
  Some of the recent changes introduced a regression which makes the
  OpenJDK JVM crash on qemu-sh4:

  (sid-sh4-sbuild)root@nofan:/# java -version
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault
  (sid-sh4-sbuild)root@nofan:/#

  An older version works fine:

  (sid-sh4-sbuild)root@nofan:/# java -version
  openjdk version "9.0.1"
  OpenJDK Runtime Environment (build 9.0.1+11-Debian-1)
  OpenJDK Zero VM (build 9.0.1+11-Debian-1, interpreted mode)
  (sid-sh4-sbuild)root@nofan:/#

  Haven't had time for bisecting this yet.

  Adrian

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1735384/+subscriptions



Re: [Qemu-devel] [PATCH v4 12/20] qed: Switch to .bdrv_co_block_status()

2017-11-30 Thread Vladimir Sementsov-Ogievskiy

12.10.2017 21:59, Eric Blake wrote:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the qed driver accordingly, taking the opportunity
to inline qed_is_allocated_cb() into its lone caller (the callback
used to be important, until we switched qed to coroutines).  There is
no intent to optimize based on the want_zero flag for this format.

Signed-off-by: Eric Blake 

---
v4: rebase to interface change, inline pointless callback
v3: no change
v2: rebase to mapping flag, fix mask in qed_is_allocated_cb
---
  block/qed.c | 84 +
  1 file changed, 28 insertions(+), 56 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index 28e2ec89e8..cbab5b6f83 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -688,74 +688,46 @@ finish:
  return ret;
  }

-typedef struct {
-BlockDriverState *bs;
-Coroutine *co;
-uint64_t pos;
-int64_t status;
-int *pnum;
-BlockDriverState **file;
-} QEDIsAllocatedCB;
-
-/* Called with table_lock held.  */
-static void qed_is_allocated_cb(void *opaque, int ret, uint64_t offset, size_t 
len)
-{
-QEDIsAllocatedCB *cb = opaque;
-BDRVQEDState *s = cb->bs->opaque;
-*cb->pnum = len / BDRV_SECTOR_SIZE;
-switch (ret) {
-case QED_CLUSTER_FOUND:
-offset |= qed_offset_into_cluster(s, cb->pos);
-cb->status = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
-*cb->file = cb->bs->file->bs;
-break;
-case QED_CLUSTER_ZERO:
-cb->status = BDRV_BLOCK_ZERO;
-break;
-case QED_CLUSTER_L2:
-case QED_CLUSTER_L1:
-cb->status = 0;
-break;
-default:
-assert(ret < 0);
-cb->status = ret;
-break;
-}
-
-if (cb->co) {
-aio_co_wake(cb->co);
-}
-}
-
-static int64_t coroutine_fn bdrv_qed_co_get_block_status(BlockDriverState *bs,
- int64_t sector_num,
- int nb_sectors, int *pnum,
+static int coroutine_fn bdrv_qed_co_block_status(BlockDriverState *bs,
+ bool want_zero,
+ int64_t pos, int64_t bytes,
+ int64_t *pnum, int64_t *map,
   BlockDriverState **file)
  {
  BDRVQEDState *s = bs->opaque;
-size_t len = (size_t)nb_sectors * BDRV_SECTOR_SIZE;
-QEDIsAllocatedCB cb = {
-.bs = bs,
-.pos = (uint64_t)sector_num * BDRV_SECTOR_SIZE,
-.status = BDRV_BLOCK_OFFSET_MASK,
-.pnum = pnum,
-.file = file,
-};
+size_t len;


size_t len = bytes;

with that:
Reviewed-by: Vladimir Sementsov-Ogievskiy 


+int status;
  QEDRequest request = { .l2_table = NULL };
  uint64_t offset;
  int ret;

  qemu_co_mutex_lock(&s->table_lock);
-ret = qed_find_cluster(s, &request, cb.pos, &len, &offset);
-qed_is_allocated_cb(&cb, ret, offset, len);
+ret = qed_find_cluster(s, &request, pos, &len, &offset);


len is in-out parameter, you can't use it uninitialized.



-/* The callback was invoked immediately */
-assert(cb.status != BDRV_BLOCK_OFFSET_MASK);
+*pnum = len;
+switch (ret) {
+case QED_CLUSTER_FOUND:
+*map = offset | qed_offset_into_cluster(s, pos);
+status = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+*file = bs->file->bs;
+break;
+case QED_CLUSTER_ZERO:
+status = BDRV_BLOCK_ZERO;
+break;
+case QED_CLUSTER_L2:
+case QED_CLUSTER_L1:
+status = 0;
+break;
+default:
+assert(ret < 0);
+status = ret;
+break;
+}

  qed_unref_l2_cache_entry(request.l2_table);
  qemu_co_mutex_unlock(&s->table_lock);

-return cb.status;
+return status;
  }

  static BDRVQEDState *acb_to_s(QEDAIOCB *acb)
@@ -1595,7 +1567,7 @@ static BlockDriver bdrv_qed = {
  .bdrv_child_perm  = bdrv_format_default_perms,
  .bdrv_create  = bdrv_qed_create,
  .bdrv_has_zero_init   = bdrv_has_zero_init_1,
-.bdrv_co_get_block_status = bdrv_qed_co_get_block_status,
+.bdrv_co_block_status = bdrv_qed_co_block_status,
  .bdrv_co_readv= bdrv_qed_co_readv,
  .bdrv_co_writev   = bdrv_qed_co_writev,
  .bdrv_co_pwrite_zeroes= bdrv_qed_co_pwrite_zeroes,



--
Best regards,
Vladimir




[Qemu-devel] [Bug 1701973] Re: pread does not work right under qemu-sh4

2017-11-30 Thread John Paul Adrian Glaubitz
This might be related to this fix:

>
https://git.qemu.org/?p=qemu.git;a=commit;h=8bf8e9df4a7d82c7a47cc961c9cdee1615595de0

FWIW, if you're interested in sh4, please join #debian-ports on OFTC and
subscribe to the debian-superh mailing list. We're doing lots of sh4
development and testing QEMU in Debian.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1701973

Title:
  pread does not work right under qemu-sh4

Status in QEMU:
  New

Bug description:
  The pread system call returns a wrong value in some case, in a program
  running under qemu-sh4 (version 2.9.0).

  How to reproduce:
  - Compile the program:
sh4-linux-gnu-gcc-5 -O -Wall -static -o test-pread test-pread.c
  - Set environment variable for using qemu-sh4 (actually not needed, since the 
program is statically linked here).
  - ~/inst-qemu/2.9.0/bin/qemu-sh4 test-pread

  Expected output:
  ret=1 errno=0

  Actual output:
  ret=0 errno=2
  test-pread.c:44: assertion 'ret == 1' failed
  qemu: uncaught target signal 6 (Aborted) - core dumped

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1701973/+subscriptions



Re: [Qemu-devel] [PATCH RFC 0/9] block: Rewrite block drain begin/end

2017-11-30 Thread Kevin Wolf
Am 30.11.2017 um 03:03 hat Fam Zheng geschrieben:
> On Wed, 11/29 18:25, Kevin Wolf wrote:
> > Am 29.11.2017 um 15:49 hat Fam Zheng geschrieben:
> > > While we look at the fixes for 2.11, I briefly prototyped this series
> > > to see if it makes sense as a simplification of the drain API for
> > > 2.12.
> > > 
> > > The idea is to let AioContext manage quiesce callbacks, then the block
> > > layer only needs to do the in-flight request waiting. This lets us get
> > > rid of the callback recursion (both up and down).
> > 
> > So essentially you don't drain individual nodes any more, but whole
> > AioContexts. I have a feeeling that this would be a step in the wrong
> > direction.
> > 
> > Not only would it completely bypass the path I/O requests take and
> > potentially drain a lot more than is actually necessary, but it also
> > requires that all nodes that are connected in a tree are in the same
> > AioContext.
> 
> Yeah, good point. Initially I wanted to introduce a BlockGraph object
> which manages the per-graph draining, (i.e. where to register the
> drain callbacks), but I felt lazy and used AioContext.
> 
> Will that make it better?  BlockGraph would be a proper abstraction
> and will not limit the API to one AioContext per tree.

There is only a single graph, so this would mean going back to global
bdrv_drain_all() exclusively.

What you really mean is probably connected components in the graph, but
do we really want to manage merging and splitting object representing
connected components when a node is added or removed from the graph?
Especially when that graph change occurs in a drain callback?

You can also still easily introduce bugs where graph changes during a
drain end up in nodes not being drained, possibly drained twice, you
still access the next pointer of a deleted node or you accidentally
switch to draining a different component.

It's probably possible to get this right, but essentially you're just
switching from iterating a tree to iterating a list. You get roughly the
same set of problems that you have to consider as today, and getting it
right should be about the same difficulty.

> > And finally, I don't really think that the recursion is even a problem.
> > The problem is with graph changes made by callbacks that drain allows to
> > run. With your changes, it might be a bit easier to avoid that
> > bdrv_drain() itself gets into trouble due to graph changes, but this
> > doesn't solve the problem for any (possibly indirect) callers of
> > bdrv_drain().
> 
> The recursion is the one big place that can be easily broken by graph changes,
> fixing this doesn't make the situation any worse. We could still fix the
> indirect callers by taking references or by introducing "ubiquitous 
> coroutines".

But hiding a bug in 80% of the cases where it shows isn't enough.

I think the only real solution is to forbid graph changes until after
any critical operation has completed. I haven't tried it out in
practice, but I suppose we could use a CoMutex around them and take it
in bdrv_drained_begin/end() and all other places that can get into
trouble with graph changes.

Kevin



Re: [Qemu-devel] [PATCH v18 05/10] xbitmap: add more operations

2017-11-30 Thread Tetsuo Handa
Wei Wang wrote:
>  /**
> + * xb_clear_bit - clear a range of bits in the xbitmap

Name mismatch.

> + * @start: the start of the bit range, inclusive
> + * @end: the end of the bit range, inclusive
> + *
> + * This function is used to clear a bit in the xbitmap. If all the bits of 
> the
> + * bitmap are 0, the bitmap will be freed.
> + */
> +void xb_clear_bit_range(struct xb *xb, unsigned long start, unsigned long 
> end)
> +{
> + struct radix_tree_root *root = &xb->xbrt;
> + struct radix_tree_node *node;
> + void **slot;
> + struct ida_bitmap *bitmap;
> + unsigned int nbits;
> +
> + for (; start < end; start = (start | (IDA_BITMAP_BITS - 1)) + 1) {
> + unsigned long index = start / IDA_BITMAP_BITS;
> + unsigned long bit = start % IDA_BITMAP_BITS;
> +
> + bitmap = __radix_tree_lookup(root, index, &node, &slot);
> + if (radix_tree_exception(bitmap)) {
> + unsigned long ebit = bit + 2;
> + unsigned long tmp = (unsigned long)bitmap;
> +
> + nbits = min(end - start + 1, BITS_PER_LONG - ebit);

"nbits = min(end - start + 1," seems to expect that start == end is legal
for clearing only 1 bit. But this function is no-op if start == end.
Please clarify what "inclusive" intended.

> +
> + if (ebit >= BITS_PER_LONG)
> + continue;

(I don't understand how radix tree works, but generally this patchset looks 
fuzzy
to me about boundary cases. Thus, I want to confirm that this is not an 
overlook.)
Why is making "ebit >= BITS_PER_LONG" (e.g. start == 62) case a no-op correct?
Aren't there bits which should have been cleared in this case?

> + bitmap_clear(&tmp, ebit, nbits);
> + if (tmp == RADIX_TREE_EXCEPTIONAL_ENTRY)
> + __radix_tree_delete(root, node, slot);
> + else
> + rcu_assign_pointer(*slot, (void *)tmp);
> + } else if (bitmap) {
> + nbits = min(end - start + 1, IDA_BITMAP_BITS - bit);
> +
> + if (nbits != IDA_BITMAP_BITS)
> + bitmap_clear(bitmap->bitmap, bit, nbits);
> +
> + if (nbits == IDA_BITMAP_BITS ||
> + bitmap_empty(bitmap->bitmap, IDA_BITMAP_BITS)) {
> + kfree(bitmap);
> + __radix_tree_delete(root, node, slot);
> + }
> + }
> + }
> +}



> +static inline __always_inline void bitmap_clear(unsigned long *map,
> + unsigned int start,
> + unsigned int nbits)
> +{
> + if (__builtin_constant_p(nbits) && nbits == 1)
> + __clear_bit(start, map);
> + else if (__builtin_constant_p(start & 7) && IS_ALIGNED(start, 8) &&
> +  __builtin_constant_p(nbits & 7) && IS_ALIGNED(nbits, 8))

It looks strange to apply __builtin_constant_p test to variables after "& 7".

> + memset((char *)map + start / 8, 0, nbits / 8);
> + else
> + __bitmap_clear(map, start, nbits);
> +}
> +



Re: [Qemu-devel] [PATCH v18 07/10] virtio-balloon: VIRTIO_BALLOON_F_SG

2017-11-30 Thread Tetsuo Handa
Wei Wang wrote:
> +static inline int xb_set_page(struct virtio_balloon *vb,
> +struct page *page,
> +unsigned long *pfn_min,
> +unsigned long *pfn_max)
> +{
> + unsigned long pfn = page_to_pfn(page);
> + int ret;
> +
> + *pfn_min = min(pfn, *pfn_min);
> + *pfn_max = max(pfn, *pfn_max);
> +
> + do {
> + ret = xb_preload_and_set_bit(&vb->page_xb, pfn,
> +  GFP_NOWAIT | __GFP_NOWARN);

It is a bit of pity that __GFP_NOWARN here is applied to only xb_preload().
Memory allocation by xb_set_bit() will after all emit warnings. Maybe

  xb_init(&vb->page_xb);
  vb->page_xb.gfp_mask |= __GFP_NOWARN;

is tolerable? Or, unconditionally apply __GFP_NOWARN at xb_init()?

  static inline void xb_init(struct xb *xb)
  {
  INIT_RADIX_TREE(&xb->xbrt, IDR_RT_MARKER | GFP_NOWAIT);
  }

> + } while (unlikely(ret == -EAGAIN));
> +
> + return ret;
> +}
> +



> @@ -172,11 +283,18 @@ static unsigned fill_balloon(struct virtio_balloon *vb, 
> size_t num)
>   vb->num_pfns = 0;
>  
>   while ((page = balloon_page_pop(&pages))) {
> + if (use_sg) {
> + if (xb_set_page(vb, page, &pfn_min, &pfn_max) < 0) {
> + __free_page(page);
> + break;

You cannot "break;" without consuming all pages in "pages".

> + }
> + } else {
> + set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> + }
> +
>   balloon_page_enqueue(&vb->vb_dev_info, page);
>  
>   vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE;
> -
> - set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
>   vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
>   if (!virtio_has_feature(vb->vdev,
>   VIRTIO_BALLOON_F_DEFLATE_ON_OOM))



> @@ -212,9 +334,12 @@ static unsigned leak_balloon(struct virtio_balloon *vb, 
> size_t num)
>   struct page *page;
>   struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
>   LIST_HEAD(pages);
> + bool use_sg = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_SG);

You can pass use_sg as an argument to leak_balloon(). Then, you won't
need to define leak_balloon_sg_oom(). Since xbitmap allocation does not
use __GFP_DIRECT_RECLAIM, it is safe to reuse leak_balloon() for OOM path.
Just be sure to pass use_sg == false because memory allocation for
use_sg == true likely fails when called from OOM path. (But trying
use_sg == true for OOM path and then fallback to use_sg == false is not bad?)

> + unsigned long pfn_max = 0, pfn_min = ULONG_MAX;
>  
> - /* We can only do one array worth at a time. */
> - num = min(num, ARRAY_SIZE(vb->pfns));
> + /* Traditionally, we can only do one array worth at a time. */
> + if (!use_sg)
> + num = min(num, ARRAY_SIZE(vb->pfns));
>  
>   mutex_lock(&vb->balloon_lock);
>   /* We can't release more pages than taken */



> diff --git a/include/uapi/linux/virtio_balloon.h 
> b/include/uapi/linux/virtio_balloon.h
> index 343d7dd..37780a7 100644
> --- a/include/uapi/linux/virtio_balloon.h
> +++ b/include/uapi/linux/virtio_balloon.h
> @@ -34,6 +34,7 @@
>  #define VIRTIO_BALLOON_F_MUST_TELL_HOST  0 /* Tell before reclaiming 
> pages */
>  #define VIRTIO_BALLOON_F_STATS_VQ1 /* Memory Stats virtqueue */
>  #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM  2 /* Deflate balloon on OOM */
> +#define VIRTIO_BALLOON_F_SG  3 /* Use sg instead of PFN lists */

Want more explicit comment that PFN lists will be used on OOM and therefore
the host side must be prepared for both sg and PFN lists even if negotiated?



Re: [Qemu-devel] [PATCH v4 05/32] migration: implement "postcopy-pause" src logic

2017-11-30 Thread Dr. David Alan Gilbert
* Peter Xu (pet...@redhat.com) wrote:
> Now when network down for postcopy, the source side will not fail the
> migration. Instead we convert the status into this new paused state, and
> we will try to wait for a rescue in the future.
> 
> If a recovery is detected, migration_thread() will reset its local
> variables to prepare for that.
> 
> Reviewed-by: Dr. David Alan Gilbert 

That's still OK; you might want to consider reusing the 'pause_sem' that I
added to MigrationStatus for the other pause case.

Dave

> Signed-off-by: Peter Xu 
> ---
>  migration/migration.c  | 98 
> +++---
>  migration/migration.h  |  3 ++
>  migration/trace-events |  1 +
>  3 files changed, 98 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index dd270f8bc5..46e7ca36a4 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -,6 +,8 @@ static void migrate_fd_cleanup(void *opaque)
>  }
>  notifier_list_notify(&migration_state_notifiers, s);
>  block_cleanup_parameters(s);
> +
> +qemu_sem_destroy(&s->postcopy_pause_sem);
>  }
>  
>  void migrate_set_error(MigrationState *s, const Error *error)
> @@ -1267,6 +1269,7 @@ MigrationState *migrate_init(void)
>  s->migration_thread_running = false;
>  error_free(s->error);
>  s->error = NULL;
> +qemu_sem_init(&s->postcopy_pause_sem, 0);
>  
>  migrate_set_state(&s->state, MIGRATION_STATUS_NONE, 
> MIGRATION_STATUS_SETUP);
>  
> @@ -2159,6 +2162,80 @@ bool migrate_colo_enabled(void)
>  return s->enabled_capabilities[MIGRATION_CAPABILITY_X_COLO];
>  }
>  
> +typedef enum MigThrError {
> +/* No error detected */
> +MIG_THR_ERR_NONE = 0,
> +/* Detected error, but resumed successfully */
> +MIG_THR_ERR_RECOVERED = 1,
> +/* Detected fatal error, need to exit */
> +MIG_THR_ERR_FATAL = 2,
> +} MigThrError;
> +
> +/*
> + * We don't return until we are in a safe state to continue current
> + * postcopy migration.  Returns MIG_THR_ERR_RECOVERED if recovered, or
> + * MIG_THR_ERR_FATAL if unrecovery failure happened.
> + */
> +static MigThrError postcopy_pause(MigrationState *s)
> +{
> +assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> +  MIGRATION_STATUS_POSTCOPY_PAUSED);
> +
> +/* Current channel is possibly broken. Release it. */
> +assert(s->to_dst_file);
> +qemu_file_shutdown(s->to_dst_file);
> +qemu_fclose(s->to_dst_file);
> +s->to_dst_file = NULL;
> +
> +error_report("Detected IO failure for postcopy. "
> + "Migration paused.");
> +
> +/*
> + * We wait until things fixed up. Then someone will setup the
> + * status back for us.
> + */
> +while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> +qemu_sem_wait(&s->postcopy_pause_sem);
> +}
> +
> +trace_postcopy_pause_continued();
> +
> +return MIG_THR_ERR_RECOVERED;
> +}
> +
> +static MigThrError migration_detect_error(MigrationState *s)
> +{
> +int ret;
> +
> +/* Try to detect any file errors */
> +ret = qemu_file_get_error(s->to_dst_file);
> +
> +if (!ret) {
> +/* Everything is fine */
> +return MIG_THR_ERR_NONE;
> +}
> +
> +if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
> +/*
> + * For postcopy, we allow the network to be down for a
> + * while. After that, it can be continued by a
> + * recovery phase.
> + */
> +return postcopy_pause(s);
> +} else {
> +/*
> + * For precopy (or postcopy with error outside IO), we fail
> + * with no time.
> + */
> +migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
> +trace_migration_thread_file_err();
> +
> +/* Time to stop the migration, now. */
> +return MIG_THR_ERR_FATAL;
> +}
> +}
> +
>  /*
>   * Master migration thread on the source VM.
>   * It drives the migration and pumps the data down the outgoing channel.
> @@ -2183,6 +2260,7 @@ static void *migration_thread(void *opaque)
>  /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
>  enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
>  bool enable_colo = migrate_colo_enabled();
> +MigThrError thr_error;
>  
>  rcu_register_thread();
>  
> @@ -2255,12 +2333,24 @@ static void *migration_thread(void *opaque)
>  }
>  }
>  
> -if (qemu_file_get_error(s->to_dst_file)) {
> -migrate_set_state(&s->state, current_active_state,
> -  MIGRATION_STATUS_FAILED);
> -trace_migration_thread_file_err();
> +/*
> + * Try to detect any kind of failures, and see whether we
> + * should stop the migration now.
> + */
> +thr_error = migration_detect_error(s);
> +

Re: [Qemu-devel] [PATCH v2 for-2.12 07/16] s390x: handle exceptions during s390_cpu_virt_mem_rw() correctly (TCG)

2017-11-30 Thread Thomas Huth
On 29.11.2017 21:26, David Hildenbrand wrote:
> s390_cpu_virt_mem_rw() must always return, so callers can react on
> an exception (e.g. see ioinst_handle_stcrw()).
> 
> However, for TCG we always have to exit the cpu loop (and restore the
> cpu state before that) if we injected a program interrupt. So let's
> introduce and use s390_cpu_virt_mem_handle_exc() in code that is not
> purely KVM.
> 
> Directly pass the retaddr we already have available in these functions.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/s390x/s390-pci-inst.c  |  7 +++
>  target/s390x/cpu.h|  1 +
>  target/s390x/ioinst.c | 20 +---
>  target/s390x/mmu_helper.c | 14 ++
>  4 files changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
> index 8123705dfd..6f41083244 100644
> --- a/hw/s390x/s390-pci-inst.c
> +++ b/hw/s390x/s390-pci-inst.c
> @@ -163,6 +163,7 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t 
> ra)
>  }
>  
>  if (s390_cpu_virt_mem_read(cpu, env->regs[r2], r2, buffer, 
> sizeof(*reqh))) {
> +s390_cpu_virt_mem_handle_exc(cpu, ra);
>  return 0;
>  }
>  reqh = (ClpReqHdr *)buffer;
> @@ -174,6 +175,7 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t 
> ra)
>  
>  if (s390_cpu_virt_mem_read(cpu, env->regs[r2], r2, buffer,
> req_len + sizeof(*resh))) {
> +s390_cpu_virt_mem_handle_exc(cpu, ra);
>  return 0;
>  }
>  resh = (ClpRspHdr *)(buffer + req_len);
> @@ -189,6 +191,7 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t 
> ra)
>  
>  if (s390_cpu_virt_mem_read(cpu, env->regs[r2], r2, buffer,
> req_len + res_len)) {
> +s390_cpu_virt_mem_handle_exc(cpu, ra);
>  return 0;
>  }
[...]

Having to change all these spots is kind of ugly ... and I guess it will also
be quite error-prone in the future (if someone is developing new code under
KVM only and then forgets to add these handlers for TCG).

Can't you do that simply always at the end of s390_cpu_virt_mem_rw() instead,
when the function has been called in non-checking mode (i.e. hostbuf != NULL)?
Then we only need the extra-dance in places where we call the _check function?

> @@ -212,9 +214,12 @@ void ioinst_handle_stcrw(S390CPU *cpu, uint32_t ipb, 
> uintptr_t ra)
>  
>  if (s390_cpu_virt_mem_write(cpu, addr, ar, &crw, sizeof(crw)) == 0) {
>  setcc(cpu, cc);
> -} else if (cc == 0) {
> -/* Write failed: requeue CRW since STCRW is a suppressing 
> instruction */
> -css_undo_stcrw(&crw);
> +} else {
> +if (cc == 0) {
> +/* Write failed: requeue CRW since STCRW is suppressing */
> +css_undo_stcrw(&crw);
> +}
> +s390_cpu_virt_mem_handle_exc(cpu, ra);
>  }
>  }

That hunk would then need to do _check first and in case there is a failure,
call css_undo_stcrw() before the s390_cpu_virt_mem_handle_exc() function, I 
think.

Alternatively, the s390_cpu_virt_mem_rw() function could get an additional 
parameter
that points to a callback function which is used for "clean-up" right before the
cpu_loop_exit ... and in this case the callback function would contain the
css_undo_stcrw().

What do you think?

 Thomas



Re: [Qemu-devel] [PATCH v4 13/20] raw: Switch to .bdrv_co_block_status()

2017-11-30 Thread Vladimir Sementsov-Ogievskiy

12.10.2017 21:59, Eric Blake wrote:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the raw driver accordingly.

Signed-off-by: Eric Blake 


Reviewed-by: Vladimir Sementsov-Ogievskiy 



--
Best regards,
Vladimir




Re: [Qemu-devel] [PATCH v4 14/20] sheepdog: Switch to .bdrv_co_block_status()

2017-11-30 Thread Vladimir Sementsov-Ogievskiy

12.10.2017 21:59, Eric Blake wrote:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the sheepdog driver accordingly.

Signed-off-by: Eric Blake 



Reviewed-by: Vladimir Sementsov-Ogievskiy 



--
Best regards,
Vladimir




Re: [Qemu-devel] [PATCH v4 15/20] vdi: Avoid bitrot of debugging code

2017-11-30 Thread Vladimir Sementsov-Ogievskiy

12.10.2017 21:59, Eric Blake wrote:

Rework the debug define so that we always get -Wformat checking,
even when debugging is disabled.

Signed-off-by: Eric Blake 
Reviewed-by: Stefan Weil 
Reviewed-by: Philippe Mathieu-Daudé 


Reviewed-by: Vladimir Sementsov-Ogievskiy 



--
Best regards,
Vladimir



Re: [Qemu-devel] [PATCH v4 06/32] migration: allow dst vm pause on postcopy

2017-11-30 Thread Dr. David Alan Gilbert
* Peter Xu (pet...@redhat.com) wrote:
> When there is IO error on the incoming channel (e.g., network down),
> instead of bailing out immediately, we allow the dst vm to switch to the
> new POSTCOPY_PAUSE state. Currently it is still simple - it waits the
> new semaphore, until someone poke it for another attempt.
> 
> Signed-off-by: Peter Xu 

As noted last time we need one of the later patches (8/32) to stop a
race; but OK

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  migration/migration.c  |  1 +
>  migration/migration.h  |  3 +++
>  migration/savevm.c | 60 
> --
>  migration/trace-events |  2 ++
>  4 files changed, 64 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 46e7ca36a4..b166e19785 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -150,6 +150,7 @@ MigrationIncomingState 
> *migration_incoming_get_current(void)
>  memset(&mis_current, 0, sizeof(MigrationIncomingState));
>  qemu_mutex_init(&mis_current.rp_mutex);
>  qemu_event_init(&mis_current.main_thread_load_event, false);
> +qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
>  once = true;
>  }
>  return &mis_current;
> diff --git a/migration/migration.h b/migration/migration.h
> index 36aaa13f50..55894ecb79 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -61,6 +61,9 @@ struct MigrationIncomingState {
>  /* The coroutine we should enter (back) after failover */
>  Coroutine *migration_incoming_co;
>  QemuSemaphore colo_incoming_sem;
> +
> +/* notify PAUSED postcopy incoming migrations to try to continue */
> +QemuSemaphore postcopy_pause_sem_dst;
>  };
>  
>  MigrationIncomingState *migration_incoming_get_current(void);
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 1da0255cd7..93e308ebf0 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1529,8 +1529,8 @@ static int 
> loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
>   */
>  static void *postcopy_ram_listen_thread(void *opaque)
>  {
> -QEMUFile *f = opaque;
>  MigrationIncomingState *mis = migration_incoming_get_current();
> +QEMUFile *f = mis->from_src_file;
>  int load_res;
>  
>  migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> @@ -1544,6 +1544,14 @@ static void *postcopy_ram_listen_thread(void *opaque)
>   */
>  qemu_file_set_blocking(f, true);
>  load_res = qemu_loadvm_state_main(f, mis);
> +
> +/*
> + * This is tricky, but, mis->from_src_file can change after it
> + * returns, when postcopy recovery happened. In the future, we may
> + * want a wrapper for the QEMUFile handle.
> + */
> +f = mis->from_src_file;
> +
>  /* And non-blocking again so we don't block in any cleanup */
>  qemu_file_set_blocking(f, false);
>  
> @@ -1626,7 +1634,7 @@ static int 
> loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>  /* Start up the listening thread and wait for it to signal ready */
>  qemu_sem_init(&mis->listen_thread_sem, 0);
>  qemu_thread_create(&mis->listen_thread, "postcopy/listen",
> -   postcopy_ram_listen_thread, mis->from_src_file,
> +   postcopy_ram_listen_thread, NULL,
> QEMU_THREAD_DETACHED);
>  qemu_sem_wait(&mis->listen_thread_sem);
>  qemu_sem_destroy(&mis->listen_thread_sem);
> @@ -2011,11 +2019,44 @@ void qemu_loadvm_state_cleanup(void)
>  }
>  }
>  
> +/* Return true if we should continue the migration, or false. */
> +static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> +{
> +trace_postcopy_pause_incoming();
> +
> +migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> +  MIGRATION_STATUS_POSTCOPY_PAUSED);
> +
> +assert(mis->from_src_file);
> +qemu_file_shutdown(mis->from_src_file);
> +qemu_fclose(mis->from_src_file);
> +mis->from_src_file = NULL;
> +
> +assert(mis->to_src_file);
> +qemu_file_shutdown(mis->to_src_file);
> +qemu_mutex_lock(&mis->rp_mutex);
> +qemu_fclose(mis->to_src_file);
> +mis->to_src_file = NULL;
> +qemu_mutex_unlock(&mis->rp_mutex);
> +
> +error_report("Detected IO failure for postcopy. "
> + "Migration paused.");
> +
> +while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> +qemu_sem_wait(&mis->postcopy_pause_sem_dst);
> +}
> +
> +trace_postcopy_pause_incoming_continued();
> +
> +return true;
> +}
> +
>  static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
>  {
>  uint8_t section_type;
>  int ret = 0;
>  
> +retry:
>  while (true) {
>  section_type = qemu_get_byte(f);
>  
> @@ -2060,6 +2101,21 @@ static int qemu_loadvm_state_main(QEMUFile *f, 
> MigrationIncomingState *mis)
>  out:
>  if (ret < 0) {
>  qemu_file_set_error(f, 

Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates

2017-11-30 Thread Igor Mammedov
On Wed, 29 Nov 2017 18:50:19 +
"Dr. David Alan Gilbert (git)"  wrote:

> From: "Dr. David Alan Gilbert" 
> 
> Hi,
>   This is an experimental set that reworks the way the vhost
> code handles changes in physical address space layout that
> came from a discussion with Igor.
Thanks for looking into it.

 
> Instead of updating and trying to merge sections of address
> space on each add/remove callback, we wait until the commit phase
> and go through and rebuild a list by walking the Flatview of
> memory and end up producing an ordered list.
> We compare the list to the old list to trigger updates.
> 
> Note, only very lightly tested so far, I'm just trying to see if it's
> the right shape.
> 
> Igor, is this what you were intending?

I was thinking about a little less intrusive approach

where vhost_region_add/del are modified to maintain
sorted by GPA array of mem_sections, vhost_dev::mem is dropped
altogether and vhost_memory_region array is build/used/freed
on every vhost_commit().
Maintaining sorted array should roughly cost us O(2 log n) if
binary search is used.

However I like your idea with iterator even more as it have
potential to make it even faster O(n) if we get rid of
quadratic and relatively complex vhost_update_compare_list().

Pls, see comments on individual patches.

> Dave
> 
> Dr. David Alan Gilbert (7):
>   memory: address_space_iterate
>   vhost: Move log_dirty check
>   vhost: New memory update functions
>   vhost: update_mem_cb implementation
>   vhost: Compare new and old memory lists
>   vhost: Copy updated region data into device state
>   vhost: Remove vhost_set_memory and children
> 
>  hw/virtio/trace-events |   8 +
>  hw/virtio/vhost.c  | 424 
> ++---
>  include/exec/memory.h  |  23 +++
>  memory.c   |  22 +++
>  4 files changed, 241 insertions(+), 236 deletions(-)
> 




Re: [Qemu-devel] [PATCH v4 16/20] vdi: Switch to .bdrv_co_block_status()

2017-11-30 Thread Vladimir Sementsov-Ogievskiy

12.10.2017 21:59, Eric Blake wrote:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vdi driver accordingly.  Note that the
TODO is already covered (the block layer guarantees bounds of its
requests), and that we can remove the now-unused s->block_sectors.

Signed-off-by: Eric Blake 

---
v4: rebase to interface tweak
v3: no change
v2: rebase to mapping flag
---
  block/vdi.c | 33 +
  1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 6f83221ddc..b30886823d 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -171,8 +171,6 @@ typedef struct {
  uint32_t *bmap;
  /* Size of block (bytes). */
  uint32_t block_size;
-/* Size of block (sectors). */
-uint32_t block_sectors;
  /* First sector of block map. */
  uint32_t bmap_sector;
  /* VDI header (converted to host endianness). */
@@ -462,7 +460,6 @@ static int vdi_open(BlockDriverState *bs, QDict *options, 
int flags,
  bs->total_sectors = header.disk_size / SECTOR_SIZE;

  s->block_size = header.block_size;
-s->block_sectors = header.block_size / SECTOR_SIZE;
  s->bmap_sector = header.offset_bmap / SECTOR_SIZE;
  s->header = header;

@@ -508,33 +505,29 @@ static int vdi_reopen_prepare(BDRVReopenState *state,
  return 0;
  }

-static int64_t coroutine_fn vdi_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn vdi_co_block_status(BlockDriverState *bs,
+bool want_zero,
+int64_t offset, int64_t bytes,
+int64_t *pnum, int64_t *map,
+BlockDriverState **file)
  {
-/* TODO: Check for too large sector_num (in bdrv_is_allocated or here). */
  BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
-size_t bmap_index = sector_num / s->block_sectors;
-size_t sector_in_block = sector_num % s->block_sectors;
-int n_sectors = s->block_sectors - sector_in_block;
+size_t bmap_index = offset / s->block_size;
+size_t index_in_block = offset % s->block_size;
  uint32_t bmap_entry = le32_to_cpu(s->bmap[bmap_index]);
-uint64_t offset;
  int result;

-logout("%p, %" PRId64 ", %d, %p\n", bs, sector_num, nb_sectors, pnum);
-if (n_sectors > nb_sectors) {
-n_sectors = nb_sectors;
-}
-*pnum = n_sectors;
+logout("%p, %" PRId64 ", %" PRId64 ", %p\n", bs, offset, bytes, pnum);


may be not bad to update message, as you changed numbers, like

logout("%p, offset %" PRId64 ", bytes %" PRId64 ", %p\n", bs, offset, bytes, 
pnum);

but for me it is ok as is too.


+*pnum = MIN(s->block_size, bytes);


looks like it should be MIN(s->block_size - index_in_block, bytes);

with that:
Reviewed-by: Vladimir Sementsov-Ogievskiy 


  result = VDI_IS_ALLOCATED(bmap_entry);
  if (!result) {
  return 0;
  }

-offset = s->header.offset_data +
-  (uint64_t)bmap_entry * s->block_size +
-  sector_in_block * SECTOR_SIZE;
+*map = s->header.offset_data + (uint64_t)bmap_entry * s->block_size +
+index_in_block;
  *file = bs->file->bs;
-return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
  }

  static int coroutine_fn
@@ -902,7 +895,7 @@ static BlockDriver bdrv_vdi = {
  .bdrv_child_perm  = bdrv_format_default_perms,
  .bdrv_create = vdi_create,
  .bdrv_has_zero_init = bdrv_has_zero_init_1,
-.bdrv_co_get_block_status = vdi_co_get_block_status,
+.bdrv_co_block_status = vdi_co_block_status,
  .bdrv_make_empty = vdi_make_empty,

  .bdrv_co_preadv = vdi_co_preadv,



--
Best regards,
Vladimir




Re: [Qemu-devel] [RFC 4/7] vhost: update_mem_cb implementation

2017-11-30 Thread Igor Mammedov
On Wed, 29 Nov 2017 18:50:23 +
"Dr. David Alan Gilbert (git)"  wrote:

> From: "Dr. David Alan Gilbert" 
> 
> Add the meat of update_mem_cb;  this is called for each region,
> to add a region to our temporary list.
> Our temporary list is in order we look to see if this
> region can be merged with the current head of list.
> 
> Signed-off-by: Dr. David Alan Gilbert 
> ---
>  hw/virtio/trace-events |  2 ++
>  hw/virtio/vhost.c  | 55 
> +-
>  2 files changed, 56 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index 4a493bcd46..92fadec192 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -2,6 +2,8 @@
>  
>  # hw/virtio/vhost.c
>  vhost_section(const char *name, int r) "%s:%d"
> +vhost_update_mem_cb(const char *name, uint64_t gpa, uint64_t size, uint64_t 
> host) "%s: 0x%"PRIx64"+0x%"PRIx64" @ 0x%"PRIx64
> +vhost_update_mem_cb_abut(const char *name, uint64_t new_size) "%s: 0x%"PRIx64
>  
>  # hw/virtio/virtio.c
>  virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned 
> out_num) "elem %p size %zd in_num %u out_num %u"
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index c959a59fb3..7e3c6ae032 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -638,11 +638,64 @@ struct vhost_update_mem_tmp {
>  /* Called for each MRS from vhost_update_mem */
>  static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
>  {
> +struct vhost_update_mem_tmp *vtmp = opaque;
> +struct vhost_memory_region *cur_vmr;
> +bool need_add = true;
> +uint64_t mrs_size;
> +uint64_t mrs_gpa;
> +uintptr_t mrs_host;
> +
>  if (!vhost_section(mrs)) {
>  return 0;
>  }
> +mrs_size = int128_get64(mrs->size);
> +mrs_gpa  = mrs->offset_within_address_space;
> +mrs_host = (uintptr_t)memory_region_get_ram_ptr(mrs->mr) +
> + mrs->offset_within_region;
> +
> +trace_vhost_update_mem_cb(mrs->mr->name, mrs_gpa, mrs_size,
> +  (uint64_t)mrs_host);
> +
> +if (vtmp->nregions) {
What forces you to maintain helper vhost_memory_region array
instead of MemoryRegionSection array?

> +/* Since we already have at least one region, lets see if
> + * this extends it; since we're scanning in order, we only
> + * have to look at the last one, and the FlatView that calls
> + * us shouldn't have overlaps.
> + */
> +struct vhost_memory_region *prev_vmr = vtmp->regions +
> +   (vtmp->nregions - 1);
> +uint64_t prev_gpa_start = prev_vmr->guest_phys_addr;
> +uint64_t prev_gpa_end   = range_get_last(prev_gpa_start,
> + prev_vmr->memory_size);
> +uint64_t prev_host_start = prev_vmr->userspace_addr;
> +uint64_t prev_host_end   = range_get_last(prev_host_start,
> +  prev_vmr->memory_size);
> +
> +if (prev_gpa_end + 1 == mrs_gpa &&
> +prev_host_end + 1 == mrs_host &&
> +(!vtmp->dev->vhost_ops->vhost_backend_can_merge ||
> +vtmp->dev->vhost_ops->vhost_backend_can_merge(vtmp->dev,
> +mrs_host, mrs_size,
> +prev_host_start, prev_vmr->memory_size))) {
> +/* The two regions abut */
> +need_add = false;
> +mrs_size = mrs_size + prev_vmr->memory_size;
> +prev_vmr->memory_size = mrs_size;
> +trace_vhost_update_mem_cb_abut(mrs->mr->name, mrs_size);
> +}
> +}
> +
> +if (need_add) {
> +vtmp->nregions++;
> +vtmp->regions = g_realloc_n(vtmp->regions, vtmp->nregions,
> +sizeof(vtmp->regions[0]));
> +cur_vmr = &vtmp->regions[vtmp->nregions - 1];
> +cur_vmr->guest_phys_addr = mrs_gpa;
> +cur_vmr->memory_size = mrs_size;
> +cur_vmr->userspace_addr  = mrs_host;
> +cur_vmr->flags_padding = 0;
> +}
>  
> -/* TODO */
>  return 0;
>  }
>  




Re: [Qemu-devel] [PATCH v2 for-2.12 08/16] s390x/tcg: don't exit the cpu loop in s390_cpu_virt_mem_rw()

2017-11-30 Thread Thomas Huth
On 29.11.2017 21:26, David Hildenbrand wrote:
> s390_cpu_virt_mem_rw() must always return, so callers can react on
> an exception (e.g. see ioinst_handle_stcrw()).
> 
> Therefore, using program_interrupt() is wrong. Fix that up.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/mmu_helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/s390x/mmu_helper.c b/target/s390x/mmu_helper.c
> index dbe2f511f8..2c7f3d7d95 100644
> --- a/target/s390x/mmu_helper.c
> +++ b/target/s390x/mmu_helper.c
> @@ -459,7 +459,7 @@ static int translate_pages(S390CPU *cpu, vaddr addr, int 
> nr_pages,
>  }
>  if (!address_space_access_valid(&address_space_memory, pages[i],
>  TARGET_PAGE_SIZE, is_write)) {
> -program_interrupt(env, PGM_ADDRESSING, ILEN_AUTO);
> +trigger_pgm_exception(env, PGM_ADDRESSING, ILEN_AUTO);
>  return -EFAULT;
>  }
>  addr += TARGET_PAGE_SIZE;

Is that still right when running with KVM? I think the exception will
then silently be ignored instead?

 Thomas



Re: [Qemu-devel] [PATCH v4 17/20] vmdk: Switch to .bdrv_co_block_status()

2017-11-30 Thread Vladimir Sementsov-Ogievskiy

12.10.2017 21:59, Eric Blake wrote:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vmdk driver accordingly.

Signed-off-by: Eric Blake 

---
v4: rebase to interface tweak
v3: no change
v2: rebase to mapping flag
---
  block/vmdk.c | 28 +---
  1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index c665bcc977..624b5c296a 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1303,25 +1303,27 @@ static inline uint64_t 
vmdk_find_index_in_cluster(VmdkExtent *extent,
  return offset / BDRV_SECTOR_SIZE;
  }

-static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn vmdk_co_block_status(BlockDriverState *bs,
+ bool want_zero,
+ int64_t offset, int64_t bytes,
+ int64_t *pnum, int64_t *map,
+ BlockDriverState **file)
  {
  BDRVVmdkState *s = bs->opaque;
  int64_t index_in_cluster, n, ret;
-uint64_t offset;
+uint64_t cluster_offset;
  VmdkExtent *extent;

-extent = find_extent(s, sector_num, NULL);
+extent = find_extent(s, offset >> BDRV_SECTOR_BITS, NULL);
  if (!extent) {
  return 0;
  }
  qemu_co_mutex_lock(&s->lock);
-ret = get_cluster_offset(bs, extent, NULL,
- sector_num * 512, false, &offset,
+ret = get_cluster_offset(bs, extent, NULL, offset, false, &cluster_offset,
   0, 0);
  qemu_co_mutex_unlock(&s->lock);

-index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
+index_in_cluster = vmdk_find_offset_in_cluster(extent, offset);


vmdk_find_index_in_cluster becomes unused. with it dropped:
Reviewed-by: Vladimir Sementsov-Ogievskiy 


  switch (ret) {
  case VMDK_ERROR:
  ret = -EIO;
@@ -1336,18 +1338,14 @@ static int64_t coroutine_fn 
vmdk_co_get_block_status(BlockDriverState *bs,
  ret = BDRV_BLOCK_DATA;
  if (!extent->compressed) {
  ret |= BDRV_BLOCK_OFFSET_VALID;
-ret |= (offset + (index_in_cluster << BDRV_SECTOR_BITS))
-& BDRV_BLOCK_OFFSET_MASK;
+*map = cluster_offset + index_in_cluster;
  }
  *file = extent->file->bs;
  break;
  }

-n = extent->cluster_sectors - index_in_cluster;
-if (n > nb_sectors) {
-n = nb_sectors;
-}
-*pnum = n;
+n = extent->cluster_sectors * BDRV_SECTOR_SIZE - index_in_cluster;
+*pnum = MIN(n, bytes);
  return ret;
  }

@@ -2393,7 +2391,7 @@ static BlockDriver bdrv_vmdk = {
  .bdrv_close   = vmdk_close,
  .bdrv_create  = vmdk_create,
  .bdrv_co_flush_to_disk= vmdk_co_flush,
-.bdrv_co_get_block_status = vmdk_co_get_block_status,
+.bdrv_co_block_status = vmdk_co_block_status,
  .bdrv_get_allocated_file_size = vmdk_get_allocated_file_size,
  .bdrv_has_zero_init   = vmdk_has_zero_init,
  .bdrv_get_specific_info   = vmdk_get_specific_info,



--
Best regards,
Vladimir




Re: [Qemu-devel] [PATCH v2 for-2.12 10/16] s390x/tcg: use s390_program_interrupt() in SCLP Service Call

2017-11-30 Thread Thomas Huth
On 29.11.2017 21:26, David Hildenbrand wrote:
> Now we can drop potential_page_fault(). While at it, move the
> unlock further up, looks cleaner.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/misc_helper.c | 5 ++---
>  target/s390x/translate.c   | 1 -
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/target/s390x/misc_helper.c b/target/s390x/misc_helper.c
> index 556340756c..02654617b3 100644
> --- a/target/s390x/misc_helper.c
> +++ b/target/s390x/misc_helper.c
> @@ -62,11 +62,10 @@ uint32_t HELPER(servc)(CPUS390XState *env, uint64_t r1, 
> uint64_t r2)
>  {
>  qemu_mutex_lock_iothread();
>  int r = sclp_service_call(env, r1, r2);
> +qemu_mutex_unlock_iothread();
>  if (r < 0) {
> -program_interrupt(env, -r, 4);
> -r = 0;
> +s390_program_interrupt(env, -r, 4, GETPC());
>  }
> -qemu_mutex_unlock_iothread();
>  return r;
>  }
>  
> diff --git a/target/s390x/translate.c b/target/s390x/translate.c
> index d0859c4bc7..76b222b0ce 100644
> --- a/target/s390x/translate.c
> +++ b/target/s390x/translate.c
> @@ -3704,7 +3704,6 @@ static ExitStatus op_sqxb(DisasContext *s, DisasOps *o)
>  static ExitStatus op_servc(DisasContext *s, DisasOps *o)
>  {
>  check_privileged(s);
> -potential_page_fault(s);
>  gen_helper_servc(cc_op, cpu_env, o->in2, o->in1);
>  set_cc_static(s);
>  return NO_EXIT;

Reviewed-by: Thomas Huth 



Re: [Qemu-devel] [PATCH v2 for-2.12 11/16] s390x/tcg: use s390_program_interrupt() in DIAG

2017-11-30 Thread Thomas Huth
On 29.11.2017 21:26, David Hildenbrand wrote:
> Now we can drop the two save statements in the translate function.
> 
> Reviewed-by: Richard Henderson 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/misc_helper.c | 2 +-
>  target/s390x/translate.c   | 3 ---
>  2 files changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/target/s390x/misc_helper.c b/target/s390x/misc_helper.c
> index 02654617b3..ee6179ef89 100644
> --- a/target/s390x/misc_helper.c
> +++ b/target/s390x/misc_helper.c
> @@ -101,7 +101,7 @@ void HELPER(diag)(CPUS390XState *env, uint32_t r1, 
> uint32_t r3, uint32_t num)
>  }
>  
>  if (r) {
> -program_interrupt(env, PGM_SPECIFICATION, ILEN_AUTO);
> +s390_program_interrupt(env, PGM_SPECIFICATION, ILEN_AUTO, GETPC());
>  }
>  }
>  
> diff --git a/target/s390x/translate.c b/target/s390x/translate.c
> index 76b222b0ce..cf8ffa217e 100644
> --- a/target/s390x/translate.c
> +++ b/target/s390x/translate.c
> @@ -2124,9 +2124,6 @@ static ExitStatus op_diag(DisasContext *s, DisasOps *o)
>  TCGv_i32 func_code = tcg_const_i32(get_field(s->fields, i2));
>  
>  check_privileged(s);
> -update_psw_addr(s);
> -gen_op_calc_cc(s);
> -
>  gen_helper_diag(cpu_env, r1, r3, func_code);
>  
>  tcg_temp_free_i32(func_code);
> 

Maybe merge it with patch 06/16 ?

 Thomas



Re: [Qemu-devel] [qemu-s390x] [PATCH v2 for-2.12 13/16] s390x/tcg: use s390_program_interrupt() in SACF

2017-11-30 Thread Thomas Huth
On 29.11.2017 21:26, David Hildenbrand wrote:
> Convert this user, too.
> 
> Reviewed-by: Richard Henderson 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/cc_helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/s390x/cc_helper.c b/target/s390x/cc_helper.c
> index f008897e84..5d91e458a8 100644
> --- a/target/s390x/cc_helper.c
> +++ b/target/s390x/cc_helper.c
> @@ -564,7 +564,7 @@ void HELPER(sacf)(CPUS390XState *env, uint64_t a1)
>  break;
>  default:
>  HELPER_LOG("unknown sacf mode: %" PRIx64 "\n", a1);
> -program_interrupt(env, PGM_SPECIFICATION, 2);
> +s390_program_interrupt(env, PGM_SPECIFICATION, 2, GETPC());
>  break;
>  }
>  }
> 

Reviewed-by: Thomas Huth 



Re: [Qemu-devel] [qemu-s390x] [PATCH v2 for-2.12 14/16] s390x/tcg: use s390_program_interrupt() in STSI

2017-11-30 Thread Thomas Huth
On 29.11.2017 21:26, David Hildenbrand wrote:
> STSI needs some more love, but let's do one step at a time.
> We can now drop potential_page_fault().
> 
> Reviewed-by: Richard Henderson 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/misc_helper.c | 2 +-
>  target/s390x/translate.c   | 1 -
>  2 files changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/target/s390x/misc_helper.c b/target/s390x/misc_helper.c
> index a911bff706..6d766ce1e7 100644
> --- a/target/s390x/misc_helper.c
> +++ b/target/s390x/misc_helper.c
> @@ -184,7 +184,7 @@ uint32_t HELPER(stsi)(CPUS390XState *env, uint64_t a0,
>  if ((r0 & STSI_LEVEL_MASK) <= STSI_LEVEL_3 &&
>  ((r0 & STSI_R0_RESERVED_MASK) || (r1 & STSI_R1_RESERVED_MASK))) {
>  /* valid function code, invalid reserved bits */
> -program_interrupt(env, PGM_SPECIFICATION, 4);
> +s390_program_interrupt(env, PGM_SPECIFICATION, 4, GETPC());
>  }
>  
>  sel1 = r0 & STSI_R0_SEL1_MASK;
> diff --git a/target/s390x/translate.c b/target/s390x/translate.c
> index f26fa64a78..1ce1390699 100644
> --- a/target/s390x/translate.c
> +++ b/target/s390x/translate.c
> @@ -3988,7 +3988,6 @@ static ExitStatus op_stpt(DisasContext *s, DisasOps *o)
>  static ExitStatus op_stsi(DisasContext *s, DisasOps *o)
>  {
>  check_privileged(s);
> -potential_page_fault(s);
>  gen_helper_stsi(cc_op, cpu_env, o->in2, regs[0], regs[1]);
>  set_cc_static(s);
>  return NO_EXIT;

Reviewed-by: Thomas Huth 



Re: [Qemu-devel] [PATCH v2 for-2.12 08/16] s390x/tcg: don't exit the cpu loop in s390_cpu_virt_mem_rw()

2017-11-30 Thread David Hildenbrand
On 30.11.2017 12:39, Thomas Huth wrote:
> On 29.11.2017 21:26, David Hildenbrand wrote:
>> s390_cpu_virt_mem_rw() must always return, so callers can react on
>> an exception (e.g. see ioinst_handle_stcrw()).
>>
>> Therefore, using program_interrupt() is wrong. Fix that up.
>>
>> Signed-off-by: David Hildenbrand 
>> ---
>>  target/s390x/mmu_helper.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/target/s390x/mmu_helper.c b/target/s390x/mmu_helper.c
>> index dbe2f511f8..2c7f3d7d95 100644
>> --- a/target/s390x/mmu_helper.c
>> +++ b/target/s390x/mmu_helper.c
>> @@ -459,7 +459,7 @@ static int translate_pages(S390CPU *cpu, vaddr addr, int 
>> nr_pages,
>>  }
>>  if (!address_space_access_valid(&address_space_memory, pages[i],
>>  TARGET_PAGE_SIZE, is_write)) {
>> -program_interrupt(env, PGM_ADDRESSING, ILEN_AUTO);
>> +trigger_pgm_exception(env, PGM_ADDRESSING, ILEN_AUTO);
>>  return -EFAULT;
>>  }
>>  addr += TARGET_PAGE_SIZE;
> 
> Is that still right when running with KVM? I think the exception will
> then silently be ignored instead?

Good point (for older KVM). And ugly.

if (kvm_enabled()) {
kvm_s390_program_interrupt...
} else {
trigger_pgm_exception(env, PGM_ADDRESSING, ILEN_AUTO);
}

> 
>  Thomas
> 


-- 

Thanks,

David / dhildenb



Re: [Qemu-devel] [PATCH v2 for-2.12 11/16] s390x/tcg: use s390_program_interrupt() in DIAG

2017-11-30 Thread David Hildenbrand
On 30.11.2017 12:53, Thomas Huth wrote:
> On 29.11.2017 21:26, David Hildenbrand wrote:
>> Now we can drop the two save statements in the translate function.
>>
>> Reviewed-by: Richard Henderson 
>> Signed-off-by: David Hildenbrand 
>> ---
>>  target/s390x/misc_helper.c | 2 +-
>>  target/s390x/translate.c   | 3 ---
>>  2 files changed, 1 insertion(+), 4 deletions(-)
>>
>> diff --git a/target/s390x/misc_helper.c b/target/s390x/misc_helper.c
>> index 02654617b3..ee6179ef89 100644
>> --- a/target/s390x/misc_helper.c
>> +++ b/target/s390x/misc_helper.c
>> @@ -101,7 +101,7 @@ void HELPER(diag)(CPUS390XState *env, uint32_t r1, 
>> uint32_t r3, uint32_t num)
>>  }
>>  
>>  if (r) {
>> -program_interrupt(env, PGM_SPECIFICATION, ILEN_AUTO);
>> +s390_program_interrupt(env, PGM_SPECIFICATION, ILEN_AUTO, GETPC());
>>  }
>>  }
>>  
>> diff --git a/target/s390x/translate.c b/target/s390x/translate.c
>> index 76b222b0ce..cf8ffa217e 100644
>> --- a/target/s390x/translate.c
>> +++ b/target/s390x/translate.c
>> @@ -2124,9 +2124,6 @@ static ExitStatus op_diag(DisasContext *s, DisasOps *o)
>>  TCGv_i32 func_code = tcg_const_i32(get_field(s->fields, i2));
>>  
>>  check_privileged(s);
>> -update_psw_addr(s);
>> -gen_op_calc_cc(s);
>> -
>>  gen_helper_diag(cpu_env, r1, r3, func_code);
>>  
>>  tcg_temp_free_i32(func_code);
>>
> 
> Maybe merge it with patch 06/16 ?
>

No, because this is tcg specific.


>  Thomas
> 


-- 

Thanks,

David / dhildenb



Re: [Qemu-devel] [qemu-s390x] [PATCH v2 for-2.12 15/16] s390x/tcg: drop program_interrupt()

2017-11-30 Thread Thomas Huth
On 29.11.2017 21:27, David Hildenbrand wrote:
> All users are gone, we can finally drop it and make sure that all new
> program interrupt injections are reminded of the retaddr - as they have to
> use s390_program_interrupt() now.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/cpu.h   |  1 -
>  target/s390x/interrupt.c | 22 +-
>  2 files changed, 5 insertions(+), 18 deletions(-)

Reviewed-by: Thomas Huth 



Re: [Qemu-devel] [PATCH v1 2/2] intel-iommu: Extend address width to 48 bits

2017-11-30 Thread Peter Xu
On Thu, Nov 30, 2017 at 05:54:35PM +0800, Liu, Yi L wrote:
> On Thu, Nov 30, 2017 at 05:11:55PM +0800, Peter Xu wrote:
> > On Thu, Nov 30, 2017 at 01:22:38PM +0800, Liu, Yi L wrote:
> > > On Tue, Nov 14, 2017 at 06:13:50PM -0500, prasad.singamse...@oracle.com 
> > > wrote:
> > > > From: Prasad Singamsetty 
> > > > 
> > > > The current implementation of Intel IOMMU code only supports 39 bits
> > > > iova address width. This patch provides a new parameter (x-aw-bits)
> > > > for intel-iommu to extend its address width to 48 bits but keeping the
> > > > default the same (39 bits). The reason for not changing the default
> > > > is to avoid potential compatibility problems with live migration of
> > > > intel-iommu enabled QEMU guest. The only valid values for 'x-aw-bits'
> > > > parameter are 39 and 48.
> > > > 
> > > > After enabling larger address width (48), we should be able to map
> > > > larger iova addresses in the guest. For example, a QEMU guest that
> > > > is configured with large memory ( >=1TB ). To check whether 48 bits
> > > 
> > > I didn't quite get your point here. Address width limits the iova range,
> > > but it doesn't limit the guest memory range. e.g. you can use 39 bit iova
> > > address to access a guest physical address larger than (2^39 - 1) as long
> > > as the guest 2nd level page table is well programmed. Only one exception,
> > > if you requires a continuous iova range(e.g. 2^39), it would be an issue.
> > > Not sure if this is the major motivation of your patch? However, I'm not
> > > against extend the address width to be 48 bits. Just need to make it clear
> > > here.
> > 
> > One thing I can think of is the identity mapping. Say, when iommu=pt
> > is set in guest, meanwhile when PT capability is not supported from
> > hardware (here I mean, the emulated hardware, of course), guest kernel
> > will create one identity mapping to emulate the PT mode.
> 
> True.
>  
> > Current linux kernel's identity mapping should be a static 48 bits
> > mapping (it must cover the whole memory range of guest), so if we
> 
> I suppose guest memory range depends on the AW reported by CPUID? Not sure
> if it is constantly 48 bits.

Please refer to si_domain_init() and DEFAULT_DOMAIN_ADDRESS_WIDTH.

> 
> > provide a 39 bits vIOMMU to the guest, AFAIU we'll fail at device
> > attaching to that identity domain of every single device that is
> > protected by that 39 bits vIOMMU (kernel will find that domain gaw is
> > bigger than vIOMMU supported gaw of that device).
> 
> Yeah, this is a good argue. As it is 1:1 mapping, the translated address
> is limited all the same.
> 
> > I do see no good fix for that, except boost the supported gaw to
> > bigger ones.
> 
> How about defaultly expose cap.PT bit? In that case, there will no 1:1
> mapping in guest side. Translation is skipped. So the IOMMU AW won't
> limit the addressing.

PT is defaultly on already from the first day it's there. :)

-- 
Peter Xu



Re: [Qemu-devel] [PATCH 2/2] ide: fix crash in IDE cdrom read

2017-11-30 Thread Denis V. Lunev
On 11/29/2017 02:50 AM, John Snow wrote:
>
> On 11/28/2017 07:10 AM, Denis V. Lunev wrote:
>> There is the following crash reported from the field in QEMU 2.9:
>> bdrv_inc_in_flight (bs=bs@entry=0x0)
>> blk_aio_prwv
>> blk_aio_preadv
>> ide_buffered_readv
>> cd_read_sector
> Is ide_atapi_cmd_reply_end missing from the call stack here for some
> reason? ide_data_readw /should/ have end_transfer_func set to that if it
> was processing an ATAPI command; otherwise it shouldn't be able to get
> here under normal circumstances...
this is my fault. I have lost this line while removing unnecessary into
from callstack. Here if the full one.

(gdb) bt
#0  bdrv_inc_in_flight (bs=bs@entry=0x0) at block/io.c:561
#1  0x55b224160d37 in blk_aio_prwv (blk=0x55b2265405a0,
    offset=offset@entry=285122560, bytes=2048,
qiov=qiov@entry=0x55b22a31de20,
    co_entry=co_entry@entry=0x55b224160400 ,
    flags=flags@entry=0, cb=cb@entry=0x55b224048e00
,
    opaque=opaque@entry=0x55b22a31de00) at block/block-backend.c:1151
#2  0x55b224160db5 in blk_aio_preadv (blk=,
    offset=offset@entry=285122560, qiov=qiov@entry=0x55b22a31de20,
    flags=flags@entry=0, cb=cb@entry=0x55b224048e00
,
    opaque=opaque@entry=0x55b22a31de00) at block/block-backend.c:1256
#3  0x55b22404bd8d in ide_buffered_readv (s=s@entry=0x55b22a712a68,
    sector_num=556880, iov=iov@entry=0x55b22a712d60,
    nb_sectors=nb_sectors@entry=4,
    cb=cb@entry=0x55b22404f2a0 ,
    opaque=opaque@entry=0x55b22a712a68) at hw/ide/core.c:637
#4  0x55b22404e2c1 in cd_read_sector (s=0x55b22a712a68)
    at hw/ide/atapi.c:198
#5  ide_atapi_cmd_reply_end (s=0x55b22a712a68) at hw/ide/atapi.c:272
#6  0x55b224049704 in ide_data_readw (opaque=,
    addr=) at hw/ide/core.c:2263
#7  0x55b223edd7f0 in portio_read (opaque=0x55b22a836000, addr=0,
size=2)
    at /usr/src/debug/qemu-2.9.0/ioport.c:180
#8  0x55b223ee8e3b in memory_region_read_accessor (mr=0x55b22a836000,
    addr=0, value=0x7f843b5be7c0, size=2, shift=0, mask=65535, attrs=...)
    at /usr/src/debug/qemu-2.9.0/memory.c:435
#9  0x55b223ee6369 in access_with_adjusted_size (addr=addr@entry=0,
    value=value@entry=0x7f843b5be7c0, size=size@entry=2,
    access_size_min=, access_size_max=,
    access=access@entry=0x55b223ee8e10 ,
    mr=mr@entry=0x55b22a836000, attrs=attrs@entry=...)
    at /usr/src/debug/qemu-2.9.0/memory.c:592
#10 0x55b223ee9d36 in memory_region_dispatch_read1 (attrs=..., size=2,
    pval=0x7f843b5be7c0, addr=0, mr=0x55b22a836000)
    at /usr/src/debug/qemu-2.9.0/memory.c:1238
#11 memory_region_dispatch_read (mr=mr@entry=0x55b22a836000,
    addr=addr@entry=0, pval=pval@entry=0x7f843b5be7c0, size=size@entry=2,
    attrs=attrs@entry=...) at /usr/src/debug/qemu-2.9.0/memory.c:1269
#12 0x55b223e9bdb2 in address_space_read_continue (
    as=as@entry=0x55b2247db8c0 , addr=addr@entry=496,
---Type  to continue, or q  to quit---
    attrs=..., attrs@entry=...,
    buf=buf@entry=0x7f844dc103fe ,
    len=len@entry=2, addr1=0, l=2, mr=0x55b22a836000)
    at /usr/src/debug/qemu-2.9.0/exec.c:2844
#13 0x55b223e9be67 in address_space_read_full (
    as=0x55b2247db8c0 , addr=addr@entry=496, attrs=...,
    buf=buf@entry=0x7f844dc103fe ,
    len=2, len@entry=602521191) at /usr/src/debug/qemu-2.9.0/exec.c:2895
#14 0x55b223e9bfce in address_space_read (len=602521191,
    buf=0x7f844dc103fe , attrs=...,
    addr=496, as=)
    at /usr/src/debug/qemu-2.9.0/include/exec/memory.h:1718
#15 address_space_rw (as=, addr=addr@entry=496, attrs=...,
    attrs@entry=...,
    buf=buf@entry=0x7f844dc103fe ,
    len=len@entry=2, is_write=is_write@entry=false)
    at /usr/src/debug/qemu-2.9.0/exec.c:2909
#16 0x55b223ee5271 in kvm_handle_io (count=512, size=2,
    direction=, data=, attrs=..., port=496)
    at /usr/src/debug/qemu-2.9.0/kvm-all.c:1828
#17 kvm_cpu_exec (cpu=cpu@entry=0x55b229032000)
    at /usr/src/debug/qemu-2.9.0/kvm-all.c:2058
#18 0x55b223ed1c22 in qemu_kvm_cpu_thread_fn (arg=0x55b229032000)
    at /usr/src/debug/qemu-2.9.0/cpus.c:1119
#19 0x7f8443993e25 in start_thread (arg=0x7f843b5bf700)
    at pthread_create.c:308
#20 0x7f84436c134d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113



>> ide_data_readw
> How do we get here? ide_is_pio_out ought to be false here; do you know
> what command was being processed? Do you have a reproducer?
>
> Knowing both the ATA and ATAPI commands under consideration here would
> be helpful.
>

if you prefer, I can upload core/binary to some storage.
Here is the state.

(gdb) p *s
$3 = {bus = 0x55b22a7129f0, unit = 0 '\000', drive_kind = IDE_CD,
  cylinders = 0, heads = 0, sectors = 0, chs_trans = 0, nb_sectors =
11588488,
  mult_sectors = 16, identify_set = 1,
  identify_data = "\300\205", '\000' , "MQ 1", ' '
, "\003\000\000\002\004\000.2+5    EQUMD DVR-MO", ' '
,
"\000\000\001\000\000\003\000\000\000\000\000\000\a", '\000' ,
"\a\000\a\000\003\000\264\000\264\000,\001\264\000\000\000\000\000

Re: [Qemu-devel] [qemu-s390x] [PATCH v2 for-2.12 16/16] s390x/tcg: drop potential_page_fault()

2017-11-30 Thread Thomas Huth
On 29.11.2017 21:27, David Hildenbrand wrote:
> Only one user left, get rid of it so we don't get any new users.
> 
> Reviewed-by: Richard Henderson 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/translate.c | 9 ++---
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/target/s390x/translate.c b/target/s390x/translate.c
> index 1ce1390699..26cf993405 100644
> --- a/target/s390x/translate.c
> +++ b/target/s390x/translate.c
> @@ -240,12 +240,6 @@ static void update_cc_op(DisasContext *s)
>  }
>  }
>  
> -static void potential_page_fault(DisasContext *s)
> -{
> -update_psw_addr(s);
> -update_cc_op(s);
> -}
> -
>  static inline uint64_t ld_code2(CPUS390XState *env, uint64_t pc)
>  {
>  return (uint64_t)cpu_lduw_code(env, pc);
> @@ -2939,7 +2933,8 @@ static ExitStatus op_lpd(DisasContext *s, DisasOps *o)
>  
>  /* In a parallel context, stop the world and single step.  */
>  if (tb_cflags(s->tb) & CF_PARALLEL) {
> -potential_page_fault(s);
> +update_psw_addr(s);
> +update_cc_op(s);
>  gen_exception(EXCP_ATOMIC);
>  return EXIT_NORETURN;
>  }
> 

Reviewed-by: Thomas Huth 



Re: [Qemu-devel] [PATCH v2 for-2.12 07/16] s390x: handle exceptions during s390_cpu_virt_mem_rw() correctly (TCG)

2017-11-30 Thread David Hildenbrand
On 30.11.2017 12:01, Thomas Huth wrote:
> On 29.11.2017 21:26, David Hildenbrand wrote:
>> s390_cpu_virt_mem_rw() must always return, so callers can react on
>> an exception (e.g. see ioinst_handle_stcrw()).
>>
>> However, for TCG we always have to exit the cpu loop (and restore the
>> cpu state before that) if we injected a program interrupt. So let's
>> introduce and use s390_cpu_virt_mem_handle_exc() in code that is not
>> purely KVM.
>>
>> Directly pass the retaddr we already have available in these functions.
>>
>> Signed-off-by: David Hildenbrand 
>> ---
>>  hw/s390x/s390-pci-inst.c  |  7 +++
>>  target/s390x/cpu.h|  1 +
>>  target/s390x/ioinst.c | 20 +---
>>  target/s390x/mmu_helper.c | 14 ++
>>  4 files changed, 39 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
>> index 8123705dfd..6f41083244 100644
>> --- a/hw/s390x/s390-pci-inst.c
>> +++ b/hw/s390x/s390-pci-inst.c
>> @@ -163,6 +163,7 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t 
>> ra)
>>  }
>>  
>>  if (s390_cpu_virt_mem_read(cpu, env->regs[r2], r2, buffer, 
>> sizeof(*reqh))) {
>> +s390_cpu_virt_mem_handle_exc(cpu, ra);
>>  return 0;
>>  }
>>  reqh = (ClpReqHdr *)buffer;
>> @@ -174,6 +175,7 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t 
>> ra)
>>  
>>  if (s390_cpu_virt_mem_read(cpu, env->regs[r2], r2, buffer,
>> req_len + sizeof(*resh))) {
>> +s390_cpu_virt_mem_handle_exc(cpu, ra);
>>  return 0;
>>  }
>>  resh = (ClpRspHdr *)(buffer + req_len);
>> @@ -189,6 +191,7 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t 
>> ra)
>>  
>>  if (s390_cpu_virt_mem_read(cpu, env->regs[r2], r2, buffer,
>> req_len + res_len)) {
>> +s390_cpu_virt_mem_handle_exc(cpu, ra);
>>  return 0;
>>  }
> [...]
> 
> Having to change all these spots is kind of ugly ... and I guess it will also
> be quite error-prone in the future (if someone is developing new code under
> KVM only and then forgets to add these handlers for TCG).

The good thing is, it won't break KVM but only TCG. And it has been
broken in TCG for years now without anybody noticing it.

> 
> Can't you do that simply always at the end of s390_cpu_virt_mem_rw() instead,
> when the function has been called in non-checking mode (i.e. hostbuf != NULL)?
> Then we only need the extra-dance in places where we call the _check function?

Oh god no, no such strange calling conventions from the same function.

> 
>> @@ -212,9 +214,12 @@ void ioinst_handle_stcrw(S390CPU *cpu, uint32_t ipb, 
>> uintptr_t ra)
>>  
>>  if (s390_cpu_virt_mem_write(cpu, addr, ar, &crw, sizeof(crw)) == 0) {
>>  setcc(cpu, cc);
>> -} else if (cc == 0) {
>> -/* Write failed: requeue CRW since STCRW is a suppressing 
>> instruction */
>> -css_undo_stcrw(&crw);
>> +} else {
>> +if (cc == 0) {
>> +/* Write failed: requeue CRW since STCRW is suppressing */
>> +css_undo_stcrw(&crw);
>> +}
>> +s390_cpu_virt_mem_handle_exc(cpu, ra);
>>  }
>>  }
> 
> That hunk would then need to do _check first and in case there is a failure,
> call css_undo_stcrw() before the s390_cpu_virt_mem_handle_exc() function, I 
> think.

I don't really like that. And there would be a possibility for a race.
So a no from my side.

> 
> Alternatively, the s390_cpu_virt_mem_rw() function could get an additional 
> parameter
> that points to a callback function which is used for "clean-up" right before 
> the
> cpu_loop_exit ... and in this case the callback function would contain the
> css_undo_stcrw().
> 
> What do you think?

I also thought about this, but adding two parameters to every call is
even uglier. Not talking about having to construct special structs to
pass through the data. Please not that the TPI handler I am about to
introduce will also require to revert stuff in case there was an exception.

Another option I thought about was checking in the caller if EXC_PGM has
been set. But missing to add such a check is even easier.

I'd prefer to keep it as is, but if there are other opinions/ideas,
please speak up.

Thanks for having a look!

> 
>  Thomas
> 


-- 

Thanks,

David / dhildenb



Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates

2017-11-30 Thread Dr. David Alan Gilbert
* Igor Mammedov (imamm...@redhat.com) wrote:
> On Wed, 29 Nov 2017 18:50:19 +
> "Dr. David Alan Gilbert (git)"  wrote:
> 
> > From: "Dr. David Alan Gilbert" 
> > 
> > Hi,
> >   This is an experimental set that reworks the way the vhost
> > code handles changes in physical address space layout that
> > came from a discussion with Igor.
> Thanks for looking into it.
> 
>  
> > Instead of updating and trying to merge sections of address
> > space on each add/remove callback, we wait until the commit phase
> > and go through and rebuild a list by walking the Flatview of
> > memory and end up producing an ordered list.
> > We compare the list to the old list to trigger updates.
> > 
> > Note, only very lightly tested so far, I'm just trying to see if it's
> > the right shape.
> > 
> > Igor, is this what you were intending?
> 
> I was thinking about a little less intrusive approach
> 
> where vhost_region_add/del are modified to maintain
> sorted by GPA array of mem_sections, vhost_dev::mem is dropped
> altogether and vhost_memory_region array is build/used/freed
> on every vhost_commit().
> Maintaining sorted array should roughly cost us O(2 log n) if
> binary search is used.
> 
> However I like your idea with iterator even more as it have
> potential to make it even faster O(n) if we get rid of
> quadratic and relatively complex vhost_update_compare_list().

Note vhost_update_compare_list is complex, but it is O(n) - it's
got nested loops, but the inner loop moves forward and oldi never
gets reset back to zero.

> Pls, see comments on individual patches.

Thanks; I have fixed a couple of bugs since I posted, so I'm
more interested in structure/shape.  Any good ideas how to test
it are welcome.

Dave

> 
> > Dave
> > 
> > Dr. David Alan Gilbert (7):
> >   memory: address_space_iterate
> >   vhost: Move log_dirty check
> >   vhost: New memory update functions
> >   vhost: update_mem_cb implementation
> >   vhost: Compare new and old memory lists
> >   vhost: Copy updated region data into device state
> >   vhost: Remove vhost_set_memory and children
> > 
> >  hw/virtio/trace-events |   8 +
> >  hw/virtio/vhost.c  | 424 
> > ++---
> >  include/exec/memory.h  |  23 +++
> >  memory.c   |  22 +++
> >  4 files changed, 241 insertions(+), 236 deletions(-)
> > 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [RFC PATCH 1/1] s390x/css: unresrict cssids

2017-11-30 Thread Christian Borntraeger


On 11/30/2017 10:50 AM, Cornelia Huck wrote:
> On Wed, 29 Nov 2017 19:51:23 +0100
> Christian Borntraeger  wrote:
> 
>> On 11/28/2017 03:45 PM, Cornelia Huck wrote:
>>> On Tue, 28 Nov 2017 15:17:49 +0100
>>> Christian Borntraeger  wrote:
>>>   
 On 11/28/2017 03:01 PM, Cornelia Huck wrote:  
> On Tue, 28 Nov 2017 14:25:08 +0100
> Christian Borntraeger  wrote:  
>>>   
>> What I want now is to enable vfio-ccw for libvirt and Linux guests and 
>> for that
>> we need to lift the restriction of having vfio not in FE.
>
> This, however, worries me. I don't really consider vfio-ccw ready for
> prime time yet. We still need to figure out path handling at the very
> least. And I'm still not sure that our non-handling of halt/clear won't
> cause major issues with e.g. a storage server error recovery.
>
> Can we flag vfio-ccw as experimental or such in libvirt?

 We should have flagged vfio-ccw experimental in QEMU then.
 e.g. by using x-vfio-ccw or whatever.I dont think we can do that
 after the facts.  
>>>
>>> Yes, we should have done that... can't fix that now, unfortunately.
>>>   

 I am not that deep into vfio-ccw, but I was under the impression that
 the missing features should not affect the vfio interface that is
 created by libvirt. After all this should just be a "-device 
 vfio-ccw,."
 statement and some setup steps beforehand (unbind + setup of vfio-ccw)  
>>>
>>> The halt/clear stuff is highly unlikely to influence the configuration
>>> interface. I'm still not too clear about path handling, though. Will we
>>> need different modes (hypervisor managed vs. full passthrough? probably
>>> only passthrough)? What about pgid handling - do we need some kind of
>>> pgid manager? (Keep in mind that you get to set the pgid once. This has
>>> implications for e.g. reserve/release.)
>>>
>>> Also, what about devices other than ECKD DASD? Has there been any
>>> testing? Tapes should be interesting; the other interesting ccw devices
>>> are QDIO-based and I'm not sure we want to spend anything on supporting
>>> those.  
>>
>> DASD is probably the most important thing today, QDIO might never happen
>> (or very late).
> 
> One thing I'd find interesting about tapes is very long running channel
> programs (like rewind) and how they interact with halt/clear. But yes,
> I would think that DASD is the most important one in practice.
> 
>> See my proposal below.
>>
>>>
>>> The interface is probably fine, but I'd like to get an idea about the
>>> path stuff (is the attachment spec that contains the pgid stuff
>>> publicly available, btw.?)
>>>   

 If your concern is the serviceability I think it would be valid for a RHEL
 KVM to disable vfio-ccw in the kernel. Maybe we could even provide a config
 for QEMU?  
>>>
>>> It's fine just to turn off vfio-ccw in the kernel.
>>>   
 PS: I learned from the PCI and CCW experience that I do not want
 to upstream kernel/qemu code unless I have a working libvirt code that
 shows that the thing will work.  
>>>
>>> Yes, I understand the wish to verify the interface, and I think it's a
>>> good idea. What I'm worried about is that this might be the precursor
>>> to a push to support it, and I don't think vfio-ccw is ready for that
>>> yet.
>>>   
>> FWIW, libvirt should not care if a device works in all cases or not because 
>> libvirt versions should work with all kind of QEMU/kernel combinations. 
>> Fencing in libvirt that the kernel part is not up to the task is making
>> me feel the same way as you when you see  the css-unrestricted property
>> at a device ;-)
> 
> I'm more worried about the political angle than the technical. If we
> don't get pressure to support this too soon, I don't care that much
> about experimental or not.
> 
>> Having said that, I think that having vfio-ccw support has a value (and it
>> actually works for an unnamed test infrastructure). In addition to that I 
>> am much more likely to actually test this continuously if I have libvirt 
>> support.
> 
> Good to hear that it works for a non-Linux guest. Any plans to test
> something like storage server failover?
> 
> [I'd really love to do something about the path handling stuff, but the
> combination of scant documentation and scantier time works against me.]
> 
>>
>>
>> So what about the following: 
>>
>> 1. We will implement the libvirt support with either a or b:
>> a: if we find a solution to our "where to put the property" dispute use that 
>> to decide
>> if we can add vfio-ccw
>> b if not: just provide a patch that lifts the restriction without any 
>> property.
>> in libvirt blindy assume  that free assignment will work. old QEMUs will 
>> complain at
>> startup and  libvirt will print the QEMU error. This is similar to other 
>> situations 
>> where libvirt cannot fully bprobe  if the QEMU will work or not. (not having 
>> vfio-ccw
>> support in the kernel will certa

Re: [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API

2017-11-30 Thread Vladimir Sementsov-Ogievskiy

18.11.2017 00:35, John Snow wrote:


On 11/17/2017 03:22 AM, Vladimir Sementsov-Ogievskiy wrote:

17.11.2017 06:10, John Snow wrote:

On 11/16/2017 03:17 AM, Vladimir Sementsov-Ogievskiy wrote:

16.11.2017 00:20, John Snow wrote:

On 11/13/2017 11:20 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi all.

There are three qmp commands, needed to implement external backup API.

Using these three commands, client may do all needed bitmap
management by
hand:

on backup start we need to do a transaction:
   {disable old bitmap, create new bitmap}

on backup success:
   drop old bitmap

on backup fail:
   enable old bitmap
   merge new bitmap to old bitmap
   drop new bitmap


Can you give me an example of how you expect these commands to be used,
and why they're required?

I'm a little weary about how error-prone these commands might be and the
potential for incorrect usage seems... high. Why do we require them?

It is needed for incremental backup. It looks like bad idea to export
abdicate/reclaim functionality, it is simpler
and clearer to allow user to merge/enable/disable bitmaps by hand.

usage is like this:

1. we have dirty bitmap bitmap0 for incremental backup.

2. prepare image fleecing (create temporary image with backing=our_disk)
3. in qmp transaction:
     - disable bitmap0
     - create bitmap1
     - start image fleecing (backup sync=none our_disk -> temp_disk)

This could probably just be its own command, though:

block-job-fleece node=foobar bitmap=bitmap0 etc=etera etc=etera

Could handle forking the bitmap. I'm not sure what the arguments would
look like, but we could name the NBD export here, too. (Assuming the
server is already started and we just need to create the share.)

Then, we can basically do what mirror does:

(1) Cancel
(2) Complete

Cancel would instruct QEMU to keep the bitmap changes (i.e. roll back),
and Complete would instruct QEMU to discard the changes.

This way we don't need to expose commands like split or merge that will
almost always be dangerous to use over QMP.

In fact, a fleecing job would be really convenient even without a
bitmap, because it'd still be nice to have a convenience command for it.
Using an existing infrastructure and understood paradigm is just a bonus.

1. If I understand correctly, Kevin and Max said in their report in
Prague about new block-job approach,
   using filter nodes, so I'm not sure that this is a good Idea to
introduce now new old-style block-job, where we can
   do without it.


We could do without it, but it might be a lot better to have everything
wrapped up in a command that's easy to digest instead of releasing 10
smaller commands that have to be executed in a very specific way in
order to work correctly.

I'm thinking about the complexity of error checking here with all the
smaller commands, versus error checking on a larger workflow we
understand and can quality test better.

I'm not sure that filter nodes becoming the new normal for block jobs
precludes our ability to use the job-management API as a handle for
managing the lifetime of a long-running task like fleecing, but I'll
check with Max and Kevin about this.


2. there is the following scenario: customers needs a possibility to
create a backup of data changed since some
point in time. So, maintaining several static and one (or several) activ
bitmaps with a possiblity of merge some of them
and create a backup using this merged bitmap may be convenient.


I think the ability to copy bitmaps and issue differential backups would
be sufficient in all cases I could think of...


so, instead of keeping several static bitmaps with ability to merge them,
you propose to keep several active bitmaps and copy them to make a backup?

so, instead of new qmp command for merge, add new qmp command for copy?

in case of static bitmaps, we can implement saving/loading them to the 
image to free RAM space,

so it is better.

or what do you propose for  [2] ?




--
Best regards,
Vladimir




Re: [Qemu-devel] [PATCH v2 for-2.12 08/16] s390x/tcg: don't exit the cpu loop in s390_cpu_virt_mem_rw()

2017-11-30 Thread Cornelia Huck
On Thu, 30 Nov 2017 12:57:00 +0100
David Hildenbrand  wrote:

> On 30.11.2017 12:39, Thomas Huth wrote:
> > On 29.11.2017 21:26, David Hildenbrand wrote:  
> >> s390_cpu_virt_mem_rw() must always return, so callers can react on
> >> an exception (e.g. see ioinst_handle_stcrw()).
> >>
> >> Therefore, using program_interrupt() is wrong. Fix that up.
> >>
> >> Signed-off-by: David Hildenbrand 
> >> ---
> >>  target/s390x/mmu_helper.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/target/s390x/mmu_helper.c b/target/s390x/mmu_helper.c
> >> index dbe2f511f8..2c7f3d7d95 100644
> >> --- a/target/s390x/mmu_helper.c
> >> +++ b/target/s390x/mmu_helper.c
> >> @@ -459,7 +459,7 @@ static int translate_pages(S390CPU *cpu, vaddr addr, 
> >> int nr_pages,
> >>  }
> >>  if (!address_space_access_valid(&address_space_memory, pages[i],
> >>  TARGET_PAGE_SIZE, is_write)) {
> >> -program_interrupt(env, PGM_ADDRESSING, ILEN_AUTO);
> >> +trigger_pgm_exception(env, PGM_ADDRESSING, ILEN_AUTO);
> >>  return -EFAULT;
> >>  }
> >>  addr += TARGET_PAGE_SIZE;  
> > 
> > Is that still right when running with KVM? I think the exception will
> > then silently be ignored instead?  
> 
> Good point (for older KVM). And ugly.
> 
> if (kvm_enabled()) {
>   kvm_s390_program_interrupt...
> } else {
>   trigger_pgm_exception(env, PGM_ADDRESSING, ILEN_AUTO);
> }

Maybe add a comment that this is only for old kernels that do not
support the mem op interface?



Re: [Qemu-devel] [PATCH v4 08/32] migration: allow send_rq to fail

2017-11-30 Thread Dr. David Alan Gilbert
* Peter Xu (pet...@redhat.com) wrote:
> We will not allow failures to happen when sending data from destination
> to source via the return path. However it is possible that there can be
> errors along the way.  This patch allows the migrate_send_rp_message()
> to return error when it happens, and further extended it to
> migrate_send_rp_req_pages().
> 
> Reviewed-by: Dr. David Alan Gilbert 
> Signed-off-by: Peter Xu 
> ---
>  migration/migration.c | 38 ++
>  migration/migration.h |  2 +-
>  2 files changed, 31 insertions(+), 9 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 8d93b891e3..db896233f6 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -199,17 +199,35 @@ static void deferred_incoming_migration(Error **errp)
>   * Send a message on the return channel back to the source
>   * of the migration.
>   */
> -static void migrate_send_rp_message(MigrationIncomingState *mis,
> -enum mig_rp_message_type message_type,
> -uint16_t len, void *data)
> +static int migrate_send_rp_message(MigrationIncomingState *mis,
> +   enum mig_rp_message_type message_type,
> +   uint16_t len, void *data)
>  {
> +int ret = 0;
> +
>  trace_migrate_send_rp_message((int)message_type, len);
>  qemu_mutex_lock(&mis->rp_mutex);
> +
> +/*
> + * It's possible that the file handle got lost due to network
> + * failures.
> + */
> +if (!mis->to_src_file) {
> +ret = -EIO;
> +goto error;
> +}
> +
>  qemu_put_be16(mis->to_src_file, (unsigned int)message_type);
>  qemu_put_be16(mis->to_src_file, len);
>  qemu_put_buffer(mis->to_src_file, data, len);
>  qemu_fflush(mis->to_src_file);
> +
> +/* It's possible that qemu file got error during sending */
> +ret = qemu_file_get_error(mis->to_src_file);
> +
> +error:
>  qemu_mutex_unlock(&mis->rp_mutex);
> +return ret;
>  }
>  
>  /* Request a range of pages from the source VM at the given
> @@ -219,26 +237,30 @@ static void 
> migrate_send_rp_message(MigrationIncomingState *mis,
>   *   Start: Address offset within the RB
>   *   Len: Length in bytes required - must be a multiple of pagesize
>   */
> -void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char 
> *rbname,
> -   ram_addr_t start, size_t len)
> +int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char 
> *rbname,
> +  ram_addr_t start, size_t len)
>  {
>  uint8_t bufc[12 + 1 + 255]; /* start (8), len (4), rbname up to 256 */
>  size_t msglen = 12; /* start + len */
> +int rbname_len;
> +enum mig_rp_message_type msg_type;
>  
>  *(uint64_t *)bufc = cpu_to_be64((uint64_t)start);
>  *(uint32_t *)(bufc + 8) = cpu_to_be32((uint32_t)len);
>  
>  if (rbname) {
> -int rbname_len = strlen(rbname);
> +rbname_len = strlen(rbname);

I don't think that move of the declaration of rbname_len is necessary;
it's only msglen that you need to keep for longer.

Dave

>  assert(rbname_len < 256);
>  
>  bufc[msglen++] = rbname_len;
>  memcpy(bufc + msglen, rbname, rbname_len);
>  msglen += rbname_len;
> -migrate_send_rp_message(mis, MIG_RP_MSG_REQ_PAGES_ID, msglen, bufc);
> +msg_type = MIG_RP_MSG_REQ_PAGES_ID;
>  } else {
> -migrate_send_rp_message(mis, MIG_RP_MSG_REQ_PAGES, msglen, bufc);
> +msg_type = MIG_RP_MSG_REQ_PAGES;
>  }
> +
> +return migrate_send_rp_message(mis, msg_type, msglen, bufc);
>  }
>  
>  void qemu_start_incoming_migration(const char *uri, Error **errp)
> diff --git a/migration/migration.h b/migration/migration.h
> index ebb049f692..b63cdfbfdb 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -216,7 +216,7 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
>uint32_t value);
>  void migrate_send_rp_pong(MigrationIncomingState *mis,
>uint32_t value);
> -void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* 
> rbname,
> +int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* 
> rbname,
>ram_addr_t start, size_t len);
>  
>  #endif
> -- 
> 2.13.6
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH for-2.12 3/7] tests/boot-serial-test: Add support for the mcf5208evb board

2017-11-30 Thread Peter Maydell
On 30 November 2017 at 08:53, Thomas Huth  wrote:
> We can output a character quite easily here with some few lines of
> assembly that we provide as a mini-kernel for this board.
>
> Signed-off-by: Thomas Huth 
> ---
>  tests/boot-serial-test.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c
> index d997269..dd3828c 100644
> --- a/tests/boot-serial-test.c
> +++ b/tests/boot-serial-test.c
> @@ -16,6 +16,14 @@
>  #include "qemu/osdep.h"
>  #include "libqtest.h"
>
> +static const uint8_t kernel_mcf5208[] = {
> +0x41, 0xf9, 0xfc, 0x06, 0x00, 0x00, /* lea 0xfc06,%a0 */
> +0x10, 0x3c, 0x00, 0x54, /* move.b #'T',%d0 */
> +0x11, 0x7c, 0x00, 0x04, 0x00, 0x08, /* move.b #4,8(%a0) Enable 
> TX */
> +0x11, 0x40, 0x00, 0x0c, /* move.b %d0,12(%a0)   Print 
> 'T' */
> +0x60, 0xfa  /* bra.s  loop */
> +};

This approach doesn't seem to be scalable to me -- are we
really going to have 50 or more fragments of hand-coded hex in
this file to cover the various board models?

I'd much rather see us have a framework for being able
to build test blobs from source using a cross compiler
setup (and docker or similar so anybody can rebuild
the test blobs). That will be much easier to maintain
and easier to extend to having tests that test other
parts of the board or other aspects of TCG emulation.

thanks
-- PMM



Re: [Qemu-devel] [PATCH v4 18/20] vpc: Switch to .bdrv_co_block_status()

2017-11-30 Thread Vladimir Sementsov-Ogievskiy

12.10.2017 21:59, Eric Blake wrote:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vpc driver accordingly.  Drop the now-unused
get_sector_offset().

Signed-off-by: Eric Blake 

---
v4: rebase to interface tweak
v3: rebase to master
v2: drop get_sector_offset() [Kevin], rebase to mapping flag
---
  block/vpc.c | 42 ++
  1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index 1576d7b595..4100ce1ed3 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -705,52 +705,54 @@ fail:
  return ret;
  }

-static int64_t coroutine_fn vpc_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn vpc_co_block_status(BlockDriverState *bs,
+bool want_zero,
+int64_t offset, int64_t bytes,
+int64_t *pnum, int64_t *map,
+BlockDriverState **file)
  {
  BDRVVPCState *s = bs->opaque;
  VHDFooter *footer = (VHDFooter*) s->footer_buf;
-int64_t start, offset;
+int64_t start, image_offset;
  bool allocated;
-int64_t ret;
+int ret;
  int n;

  if (be32_to_cpu(footer->type) == VHD_FIXED) {
-*pnum = nb_sectors;
+*pnum = bytes;
+*map = offset;
  *file = bs->file->bs;
-return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
-   (sector_num << BDRV_SECTOR_BITS);
+return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
  }

  qemu_co_mutex_lock(&s->lock);

-offset = get_image_offset(bs, sector_num << BDRV_SECTOR_BITS, false, NULL);
-start = offset;
-allocated = (offset != -1);
+image_offset = get_image_offset(bs, offset, false, NULL);
+start = image_offset & BDRV_BLOCK_OFFSET_MASK;


why do you do so? round down to sector boundary? does it mean,
that *map will corresponds not to offset, but to some previous byte?


+allocated = (image_offset != -1);
  *pnum = 0;
  ret = 0;

  do {
  /* All sectors in a block are contiguous (without using the bitmap) */
-n = ROUND_UP(sector_num + 1, s->block_size / BDRV_SECTOR_SIZE)
-  - sector_num;
-n = MIN(n, nb_sectors);
+n = ROUND_UP(offset + 1, s->block_size) - offset;
+n = MIN(n, bytes);

  *pnum += n;
-sector_num += n;
-nb_sectors -= n;
+offset += n;
+bytes -= n;
  /* *pnum can't be greater than one block for allocated
   * sectors since there is always a bitmap in between. */
  if (allocated) {
  *file = bs->file->bs;
-ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
+*map = start;
+ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
  break;
  }
-if (nb_sectors == 0) {
+if (bytes == 0) {
  break;
  }
-offset = get_image_offset(bs, sector_num << BDRV_SECTOR_BITS, false,
-  NULL);
+image_offset = get_image_offset(bs, offset, false, NULL);
  } while (offset == -1);


image_offset here?



  qemu_co_mutex_unlock(&s->lock);
@@ -1097,7 +1099,7 @@ static BlockDriver bdrv_vpc = {

  .bdrv_co_preadv = vpc_co_preadv,
  .bdrv_co_pwritev= vpc_co_pwritev,
-.bdrv_co_get_block_status   = vpc_co_get_block_status,
+.bdrv_co_block_status   = vpc_co_block_status,

  .bdrv_get_info  = vpc_get_info,




--
Best regards,
Vladimir




[Qemu-devel] [Bug 1735384] Re: OpenJDK JVM segfaults on qemu-sh4 (regression)

2017-11-30 Thread Peter Maydell
This sounds like it may be the bug fixed by this patchset:
https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg05067.html

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1735384

Title:
  OpenJDK JVM segfaults on qemu-sh4 (regression)

Status in QEMU:
  New

Bug description:
  Some of the recent changes introduced a regression which makes the
  OpenJDK JVM crash on qemu-sh4:

  (sid-sh4-sbuild)root@nofan:/# java -version
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault
  (sid-sh4-sbuild)root@nofan:/#

  An older version works fine:

  (sid-sh4-sbuild)root@nofan:/# java -version
  openjdk version "9.0.1"
  OpenJDK Runtime Environment (build 9.0.1+11-Debian-1)
  OpenJDK Zero VM (build 9.0.1+11-Debian-1, interpreted mode)
  (sid-sh4-sbuild)root@nofan:/#

  Haven't had time for bisecting this yet.

  Adrian

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1735384/+subscriptions



Re: [Qemu-devel] [PATCH v4 19/20] vvfat: Switch to .bdrv_co_block_status()

2017-11-30 Thread Vladimir Sementsov-Ogievskiy

12.10.2017 21:59, Eric Blake wrote:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vvfat driver accordingly.  Note that we
can rely on the block driver having already clamped limits to our
block size, and simplify accordingly.

Signed-off-by: Eric Blake 


Reviewed-by: Vladimir Sementsov-Ogievskiy 



---
v4: rebase to interface tweak
v3: no change
v2: rebase to earlier changes, simplify
---
  block/vvfat.c | 16 +++-
  1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index a0f2335894..9142117fc6 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -3082,15 +3082,13 @@ vvfat_co_pwritev(BlockDriverState *bs, uint64_t offset, 
uint64_t bytes,
  return ret;
  }

-static int64_t coroutine_fn vvfat_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *n, BlockDriverState **file)
+static int coroutine_fn vvfat_co_block_status(BlockDriverState *bs,
+  bool want_zero, int64_t offset,
+  int64_t bytes, int64_t *n,


may be rename to *pnum ?


+  int64_t *map,
+  BlockDriverState **file)
  {
-*n = bs->total_sectors - sector_num;
-if (*n > nb_sectors) {
-*n = nb_sectors;
-} else if (*n < 0) {
-return 0;
-}
+*n = bytes;
  return BDRV_BLOCK_DATA;
  }

@@ -3251,7 +3249,7 @@ static BlockDriver bdrv_vvfat = {

  .bdrv_co_preadv = vvfat_co_preadv,
  .bdrv_co_pwritev= vvfat_co_pwritev,
-.bdrv_co_get_block_status = vvfat_co_get_block_status,
+.bdrv_co_block_status   = vvfat_co_block_status,
  };

  static void bdrv_vvfat_init(void)



--
Best regards,
Vladimir




Re: [Qemu-devel] [PATCH for-2.11 1/1] blockjob: Make block_job_pause_all() keep a reference to the jobs

2017-11-30 Thread Kevin Wolf
Am 29.11.2017 um 18:56 hat Alberto Garcia geschrieben:
> Starting from commit 40840e419be31e6a32e6ea24511c74b389d5e0e4 we are
> pausing all block jobs during bdrv_reopen_multiple() to prevent any of
> them from finishing and removing nodes from the graph while they are
> being reopened.
> 
> It turns out that pausing a block job doesn't necessarily prevent it
> from finishing: a paused block job can still run its exit function
> from the main loop and call block_job_completed(). The mirror block
> job in particular always goes to the main loop while it is paused (by
> virtue of the bdrv_drained_begin() call in mirror_run()).
> 
> Destroying a paused block job during bdrv_reopen_multiple() has two
> consequences:
> 
>1) The references to the nodes involved in the job are released,
>   possibly destroying some of them. If those nodes were in the
>   reopen queue this would trigger the problem originally described
>   in commit 40840e419be, crashing QEMU.

This specific problem could be avoided by making the BDS reference in
the reopen queue strong, i.e. bdrv_ref() in bdrv_reopen_queue_child()
and bdrv_unref() only at the end of bdrv_reopen_multiple().

>2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
>   not be doing all necessary bdrv_parent_drained_end() calls.

If I understand correctly, you don't have a reproducer here. I'm not
convinced that this one actually exists because the functions that do
the graph modifications (specficically bdrv_replace_child_noperm),
automatically drain and undrain nodes as necessary.

> I can reproduce problem 1) easily with iotest 030 by increasing
> STREAM_BUFFER_SIZE from 512KB to 8MB in block/stream.c, or by tweaking
> the iotest like in this example:
> 
>https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html
> 
> This patch keeps an additional reference to all block jobs between
> block_job_pause_all() and block_job_resume_all(), guaranteeing that
> they are kept alive.

This might be okay as a stopgap solution if this is a real problem in
practice because we don't have a better solution right now. However,
I haven't seen any sign of there being an -rc4, so we're probably too
late for 2.11 anyway.

It's certainly not a full solution because keeping a reference to a
block job does not prevent it from completing, but only from being
freed. Most block jobs do graph modifications, including dropping the
references to nodes, already when they complete, not only when they are
freed.

Now, while the above might have sounded like we have an easy solution in
bdrv_reopen(), I see another problem that looks quite tough:

3) bdrv_reopen_queue() is a recursive operation that involves all
   children that inherited options. If the graph changes between the
   bdrv_reopen_queue() call and the end of bdrv_reopen_multiple(), the
   calculated options (and today possibly permissions) don't actually
   match the graph any more.

   The requirement we really have is that between bdrv_reopen_queue()
   and bdrv_reopen_multiple() no graph changes are made. Calling
   bdrv_drain_all_begin() in bdrv_reopen_multiple() is too late.

Maybe that actually gives us another relatively simple solution: We need
to push up the drain into the callers, and possibly just assert that the
nodes are drained in the reopen functions.

Kevin



Re: [Qemu-devel] [PATCH v4 20/20] block: Drop unused .bdrv_co_get_block_status()

2017-11-30 Thread Vladimir Sementsov-Ogievskiy

12.10.2017 21:59, Eric Blake wrote:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Now that all drivers have been updated to provide the
byte-based .bdrv_co_block_status(), we can delete the sector-based
interface.

Signed-off-by: Eric Blake 



Reviewed-by: Vladimir Sementsov-Ogievskiy 



--
Best regards,
Vladimir




Re: [Qemu-devel] [RFC PATCH v2 1/1] s390x/css: unrestrict cssids

2017-11-30 Thread Halil Pasic


On 11/29/2017 06:29 PM, Cornelia Huck wrote:
[..]
 With this change, however, our schema for generating a css bus ids, if
 none is specified does not make much sense. Currently we start at cssid
 0x0 for non-virtual devices and use the default css (without
 s390-squash-mcss exclusively) for virtual devices.  That means for
 non-virtual device the device would auto-magically end up non-visible for
 guests (which can't use the other channel subsystems).

 Thus let us also change the css bus id auto assignment algorithm,
 so that we first fill the default css, and then proceed to the
 next one (modulo MAX_CSSID).  
>>>
>>> Let's reword the previous two paragraphs:
>>>
>>> "At the same time, change our schema for generating css bus ids to put
>>> both virtual and non-virtual devices into the default css (spilling
>>> over into other css images, if needed) so that devices without a
>>> specified id don't end up hidden to guests not supporting multiple
>>> channel subsystems."
>>>   
>>
>> What I don't like about your explanation is, that "so that devices without
>> a specified id don't end up hidden to guests not supporting multiple channel 
>> subsystems" is not necessarily true: if we spill the devices are going
>> to end up hidden.
> 
> Let's make this "don't always end up hidden".

I would prefer to go with this.


At the same time, change our schema for generating css bus ids to put
both virtual and non-virtual devices into the default css (spilling
over into other css images, if needed). The intention is to deprecate
s390-squash-mcss. Whit this change devices without a specified devno won't
end up hidden to guests not supporting multiple channel subsystems, unless
this can not be avoided (default css full).

> 
>>

 The adverse effect of getting rid of the restriction on migration should
 not be too severe.  Vfio-ccw devices are not live-migratable yet, and for
 virtual devices using the extra freedom would only make sense with the
 aforementioned guest support in place.

 The auto-generated bus ids are affected by both changes. We hope to not


[..]
>>
>> I tired to be quite elaborate, because at some point of the v1
>> discussion, you said if we are planting landmines you want them explained
>> in the commit message. I'm not sure how this document file is supposed
>> to be called, and look like. I think this stuff is relevant for
>> the decision why is this patch a good one, and what are the trade-offs
>> we make. Referencing to a document would be also OK with me.
> 
> I don't think there will be landmines left, no? Or do I misread?
>

The patch basically just got worse with changing the schema. If there
were landmines they are still there. My original point was that it's
bearable in practice, so that's why I had this short and vague ain't
too bad formulation.
 
>>
>> Regarding the deprecation patch. It's already on the list as RFC. You
>> have even commented on it. I intend to make a v2 once we know what are
>> we going to do here.
> 
> This needs to be a patch series with a cover letter. Discussing in
> multiple places is draining.
> 

Can do. No problem for me.

>>

 Signed-off-by: Halil Pasic 

 ---
 Hi all!

 Moving the property to the machine turned out very ugly (IMHO).  Libvirt
 detects machine properties via query-command-line-options.  This is
 however decoupled from the existence of the actual property: one needs to
 touch util/qemu-config.c (see patch) so the property shows up.
 Furthermore this stuff is global. I've also noticed that the infamous
 s390-squash-mcss does not show up.  
>>>
>>> s390x-softmmu/qemu-system-s390x -machine s390-ccw-virtio,help
>>>
>>> shows it, as does qom-list on /machine.  
>>
>> For qom-list you need an instance. Libvirt probes such stuff using
>> (btw probing is done with machine type none, and qom-list /machine
>> should not list squash if we are running machine type none).
>>
>>>
>>> Output of  --help is very haphazard anyway and you should
>>> rely on the interfaces above.
>>>   
>>
>> I disagree. AFAIK management software should probe using the
>> query-command-line-options QMP command. Or am I missing something?
>>
>> I don't speak about the output of  --help here.
> 
> It's the same interface. It both over- and underreports. Querying the
> actual object is the only way to get this reliable. If that is not
> possible today, it needs to be implemented.
>

Sorry I did not look into how  --help is implemented. For
what we are trying to accomplish here it's irrelevant. Wonder why did you
have to bring this up and make the discussion even more complicated.
 
>>

 On the other hand to get the property displayed by -machine
 s390-ccw-virtio,help one needs a setter on the property. So I have
 created a fake setter which produces an error each time called.  
>>>
>>> Yes, this is fugly. A css object would be a better place, but way t

Re: [Qemu-devel] [PATCH for-2.12 3/7] tests/boot-serial-test: Add support for the mcf5208evb board

2017-11-30 Thread Thomas Huth
On 30.11.2017 13:14, Peter Maydell wrote:
> On 30 November 2017 at 08:53, Thomas Huth  wrote:
>> We can output a character quite easily here with some few lines of
>> assembly that we provide as a mini-kernel for this board.
>>
>> Signed-off-by: Thomas Huth 
>> ---
>>  tests/boot-serial-test.c | 10 ++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c
>> index d997269..dd3828c 100644
>> --- a/tests/boot-serial-test.c
>> +++ b/tests/boot-serial-test.c
>> @@ -16,6 +16,14 @@
>>  #include "qemu/osdep.h"
>>  #include "libqtest.h"
>>
>> +static const uint8_t kernel_mcf5208[] = {
>> +0x41, 0xf9, 0xfc, 0x06, 0x00, 0x00, /* lea 0xfc06,%a0 */
>> +0x10, 0x3c, 0x00, 0x54, /* move.b #'T',%d0 */
>> +0x11, 0x7c, 0x00, 0x04, 0x00, 0x08, /* move.b #4,8(%a0) Enable 
>> TX */
>> +0x11, 0x40, 0x00, 0x0c, /* move.b %d0,12(%a0)   Print 
>> 'T' */
>> +0x60, 0xfa  /* bra.s  loop */
>> +};
> 
> This approach doesn't seem to be scalable to me -- are we
> really going to have 50 or more fragments of hand-coded hex in
> this file to cover the various board models?

No, since this only works for certain few boards with these criteria:

1) There must be a way to load such small blobs either with -kernel or
-bios. Many boards only support loading kernels in the ELF format, and
we certainly don't want to encode the hex-dump of such a kernel here.
Or some boards / CPUs also support -bios, but the entry point must be at
the end of a large image, so that also does not make sense to include
them here.

2) The UART must be usable with some few lines of assembly. That's also
not always the case.

So no, this is really not meant to scale to all boards, it's just meant
to get at least some few more test coverage, especially while we're not
having other test mechanisms in place yet. (e.g. how long are we talking
about reviving tests/tcg/ again already? ... some attempts have been
made to make it compilable again, but non of the patch series has been
included yet so far)

> I'd much rather see us have a framework for being able
> to build test blobs from source using a cross compiler
> setup (and docker or similar so anybody can rebuild
> the test blobs). That will be much easier to maintain
> and easier to extend to having tests that test other
> parts of the board or other aspects of TCG emulation.

I agree that this idea is way more flexible and could cover many more
boards, but there are also things that need to be solved first:
Not everybody has docker / cross-compilers installed or can use them, so
we likely need a set of pre-build binary images somewhere. But we
certainly don't want to include megabytes of blobs in the git repository
... so this would need to go into an external repository instead. Then
you likely can not include this in "make check" so easily anymore
(unless you force everybody to check out the external repository with
the git-submodule.sh script - but I really dislike this idea) ...

So yes, we should have better test coverage for almost all machines if
possible, but I don't see that this is happening soon or could be really
tightly integrated into "make check". I.e. the boot-serial tester could
fill at least parts of this gap, I think.

 Thomas



Re: [Qemu-devel] [PATCH for-2.12 3/7] tests/boot-serial-test: Add support for the mcf5208evb board

2017-11-30 Thread Paolo Bonzini
On 30/11/2017 13:14, Peter Maydell wrote:
> On 30 November 2017 at 08:53, Thomas Huth  wrote:
>> +static const uint8_t kernel_mcf5208[] = {
>> +0x41, 0xf9, 0xfc, 0x06, 0x00, 0x00, /* lea 0xfc06,%a0 */
>> +0x10, 0x3c, 0x00, 0x54, /* move.b #'T',%d0 */
>> +0x11, 0x7c, 0x00, 0x04, 0x00, 0x08, /* move.b #4,8(%a0) Enable 
>> TX */
>> +0x11, 0x40, 0x00, 0x0c, /* move.b %d0,12(%a0)   Print 
>> 'T' */
>> +0x60, 0xfa  /* bra.s  loop */
>> +};
> 
> This approach doesn't seem to be scalable to me -- are we
> really going to have 50 or more fragments of hand-coded hex in
> this file to cover the various board models?
> 
> I'd much rather see us have a framework for being able
> to build test blobs from source using a cross compiler
> setup (and docker or similar so anybody can rebuild
> the test blobs). That will be much easier to maintain
> and easier to extend to having tests that test other
> parts of the board or other aspects of TCG emulation.

It seems a bit overkill, as these snippets are ~16 bytes long.

However, it would be useful to have a basic patching mechanism so that
board descriptions could include a common hand-coded const array and
place an address at a given offset.  So you'd have

struct HexFirmware {
int patch_offset;
short   patch_size;
boolpatch_bigendian;
uint8_t data[32];
}

and microblaze boards could have:

struct HexFirmware kernel_microblaze = {
.patch_offset= 0,
.patch_size  = 2,
.patch_bigendian = false,
.data = {
0xaa, 0xaa, 0x00, 0xb0, /* imm   0x */
0x00, 0x10, 0x60, 0x30, /* addik r3,r0,0x1000 */
0x54, 0x00, 0x80, 0x30, /* addik r4,r0,'T' */
0x00, 0x00, 0x83, 0xf0, /* sbi   r4,r3,0 */
0xfc, 0xff, 0x00, 0xb8, /* bri   -4  loop */
}
};

...

{ "microblaze", "petalogix-s3adsp1800", "", "TT",
  kernel_microblaze, 0x8400 },
{ "microblazeel", "petalogix-ml605", "", "TT",
  kernel_microblaze, 0x83a0 },

Likewise, you could have just two copies of the code for all ARM boards
that have a pl011 (or any other UART with a simple byte-long transmit
register), one 32-bit and one 64-bit.

Thanks,

Paolo



Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates

2017-11-30 Thread Igor Mammedov
On Thu, 30 Nov 2017 12:08:06 +
"Dr. David Alan Gilbert"  wrote:

> * Igor Mammedov (imamm...@redhat.com) wrote:
> > On Wed, 29 Nov 2017 18:50:19 +
> > "Dr. David Alan Gilbert (git)"  wrote:
> >   
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > Hi,
> > >   This is an experimental set that reworks the way the vhost
> > > code handles changes in physical address space layout that
> > > came from a discussion with Igor.  
> > Thanks for looking into it.
> > 
> >
> > > Instead of updating and trying to merge sections of address
> > > space on each add/remove callback, we wait until the commit phase
> > > and go through and rebuild a list by walking the Flatview of
> > > memory and end up producing an ordered list.
> > > We compare the list to the old list to trigger updates.
> > > 
> > > Note, only very lightly tested so far, I'm just trying to see if it's
> > > the right shape.
> > > 
> > > Igor, is this what you were intending?  
> > 
> > I was thinking about a little less intrusive approach
> > 
> > where vhost_region_add/del are modified to maintain
> > sorted by GPA array of mem_sections, vhost_dev::mem is dropped
> > altogether and vhost_memory_region array is build/used/freed
> > on every vhost_commit().
> > Maintaining sorted array should roughly cost us O(2 log n) if
> > binary search is used.
> > 
> > However I like your idea with iterator even more as it have
> > potential to make it even faster O(n) if we get rid of
> > quadratic and relatively complex vhost_update_compare_list().  
> 
> Note vhost_update_compare_list is complex,
> but it is O(n) - it's
> got nested loops, but the inner loop moves forward and oldi never
> gets reset back to zero.
While skimming through patches I've overlooked it.

Anyways,
why memcmp(old_arr, new_arr) is not sufficient
to detect a change in memory map?

> 
> > Pls, see comments on individual patches.  
> 
> Thanks; I have fixed a couple of bugs since I posted, so I'm
> more interested in structure/shape.  Any good ideas how to test
> it are welcome.
> 
> Dave
> 
> >   
> > > Dave
> > > 
> > > Dr. David Alan Gilbert (7):
> > >   memory: address_space_iterate
> > >   vhost: Move log_dirty check
> > >   vhost: New memory update functions
> > >   vhost: update_mem_cb implementation
> > >   vhost: Compare new and old memory lists
> > >   vhost: Copy updated region data into device state
> > >   vhost: Remove vhost_set_memory and children
> > > 
> > >  hw/virtio/trace-events |   8 +
> > >  hw/virtio/vhost.c  | 424 
> > > ++---
> > >  include/exec/memory.h  |  23 +++
> > >  memory.c   |  22 +++
> > >  4 files changed, 241 insertions(+), 236 deletions(-)
> > >   
> >   
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [Qemu-devel] [Bug 1735384] Re: OpenJDK JVM segfaults on qemu-sh4 (regression)

2017-11-30 Thread John Paul Adrian Glaubitz
On 11/30/2017 01:19 PM, Peter Maydell wrote:
> This sounds like it may be the bug fixed by this patchset:
> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg05067.html

Unfortunately not. I will upload a prepared chroot for testing later
and link it in this bug report.

Adrian

-- 
  .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
   `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1735384

Title:
  OpenJDK JVM segfaults on qemu-sh4 (regression)

Status in QEMU:
  New

Bug description:
  Some of the recent changes introduced a regression which makes the
  OpenJDK JVM crash on qemu-sh4:

  (sid-sh4-sbuild)root@nofan:/# java -version
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault
  (sid-sh4-sbuild)root@nofan:/#

  An older version works fine:

  (sid-sh4-sbuild)root@nofan:/# java -version
  openjdk version "9.0.1"
  OpenJDK Runtime Environment (build 9.0.1+11-Debian-1)
  OpenJDK Zero VM (build 9.0.1+11-Debian-1, interpreted mode)
  (sid-sh4-sbuild)root@nofan:/#

  Haven't had time for bisecting this yet.

  Adrian

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1735384/+subscriptions



Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates

2017-11-30 Thread Dr. David Alan Gilbert
* Igor Mammedov (imamm...@redhat.com) wrote:
> On Thu, 30 Nov 2017 12:08:06 +
> "Dr. David Alan Gilbert"  wrote:
> 
> > * Igor Mammedov (imamm...@redhat.com) wrote:
> > > On Wed, 29 Nov 2017 18:50:19 +
> > > "Dr. David Alan Gilbert (git)"  wrote:
> > >   
> > > > From: "Dr. David Alan Gilbert" 
> > > > 
> > > > Hi,
> > > >   This is an experimental set that reworks the way the vhost
> > > > code handles changes in physical address space layout that
> > > > came from a discussion with Igor.  
> > > Thanks for looking into it.
> > > 
> > >
> > > > Instead of updating and trying to merge sections of address
> > > > space on each add/remove callback, we wait until the commit phase
> > > > and go through and rebuild a list by walking the Flatview of
> > > > memory and end up producing an ordered list.
> > > > We compare the list to the old list to trigger updates.
> > > > 
> > > > Note, only very lightly tested so far, I'm just trying to see if it's
> > > > the right shape.
> > > > 
> > > > Igor, is this what you were intending?  
> > > 
> > > I was thinking about a little less intrusive approach
> > > 
> > > where vhost_region_add/del are modified to maintain
> > > sorted by GPA array of mem_sections, vhost_dev::mem is dropped
> > > altogether and vhost_memory_region array is build/used/freed
> > > on every vhost_commit().
> > > Maintaining sorted array should roughly cost us O(2 log n) if
> > > binary search is used.
> > > 
> > > However I like your idea with iterator even more as it have
> > > potential to make it even faster O(n) if we get rid of
> > > quadratic and relatively complex vhost_update_compare_list().  
> > 
> > Note vhost_update_compare_list is complex,
> > but it is O(n) - it's
> > got nested loops, but the inner loop moves forward and oldi never
> > gets reset back to zero.
> While skimming through patches I've overlooked it.
> 
> Anyways,
> why memcmp(old_arr, new_arr) is not sufficient
> to detect a change in memory map?

It tells you that you've got a change, but doesn't give
the start/end of the range that's changed, and those
are used by vhost_commit to limit the work of
vhost_verify_ring_mappings.

Dave

> > 
> > > Pls, see comments on individual patches.  
> > 
> > Thanks; I have fixed a couple of bugs since I posted, so I'm
> > more interested in structure/shape.  Any good ideas how to test
> > it are welcome.
> > 
> > Dave
> > 
> > >   
> > > > Dave
> > > > 
> > > > Dr. David Alan Gilbert (7):
> > > >   memory: address_space_iterate
> > > >   vhost: Move log_dirty check
> > > >   vhost: New memory update functions
> > > >   vhost: update_mem_cb implementation
> > > >   vhost: Compare new and old memory lists
> > > >   vhost: Copy updated region data into device state
> > > >   vhost: Remove vhost_set_memory and children
> > > > 
> > > >  hw/virtio/trace-events |   8 +
> > > >  hw/virtio/vhost.c  | 424 
> > > > ++---
> > > >  include/exec/memory.h  |  23 +++
> > > >  memory.c   |  22 +++
> > > >  4 files changed, 241 insertions(+), 236 deletions(-)
> > > >   
> > >   
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH for-2.12 3/7] tests/boot-serial-test: Add support for the mcf5208evb board

2017-11-30 Thread Thomas Huth
On 30.11.2017 13:40, Paolo Bonzini wrote:
> On 30/11/2017 13:14, Peter Maydell wrote:
>> On 30 November 2017 at 08:53, Thomas Huth  wrote:
>>> +static const uint8_t kernel_mcf5208[] = {
>>> +0x41, 0xf9, 0xfc, 0x06, 0x00, 0x00, /* lea 0xfc06,%a0 */
>>> +0x10, 0x3c, 0x00, 0x54, /* move.b #'T',%d0 */
>>> +0x11, 0x7c, 0x00, 0x04, 0x00, 0x08, /* move.b #4,8(%a0) Enable 
>>> TX */
>>> +0x11, 0x40, 0x00, 0x0c, /* move.b %d0,12(%a0)   Print 
>>> 'T' */
>>> +0x60, 0xfa  /* bra.s  loop */
>>> +};
>>
>> This approach doesn't seem to be scalable to me -- are we
>> really going to have 50 or more fragments of hand-coded hex in
>> this file to cover the various board models?
>>
>> I'd much rather see us have a framework for being able
>> to build test blobs from source using a cross compiler
>> setup (and docker or similar so anybody can rebuild
>> the test blobs). That will be much easier to maintain
>> and easier to extend to having tests that test other
>> parts of the board or other aspects of TCG emulation.
> 
> It seems a bit overkill, as these snippets are ~16 bytes long.
> 
> However, it would be useful to have a basic patching mechanism so that
> board descriptions could include a common hand-coded const array and
> place an address at a given offset.  So you'd have
> 
> struct HexFirmware {
> int patch_offset;
> short   patch_size;
> boolpatch_bigendian;
> uint8_t data[32];
> }
> 
> and microblaze boards could have:
> 
> struct HexFirmware kernel_microblaze = {
> .patch_offset= 0,
> .patch_size  = 2,
> .patch_bigendian = false,
> .data = {
> 0xaa, 0xaa, 0x00, 0xb0, /* imm   0x */
> 0x00, 0x10, 0x60, 0x30, /* addik r3,r0,0x1000 */
> 0x54, 0x00, 0x80, 0x30, /* addik r4,r0,'T' */
> 0x00, 0x00, 0x83, 0xf0, /* sbi   r4,r3,0 */
> 0xfc, 0xff, 0x00, 0xb8, /* bri   -4  loop */
> }
> };
> 
> ...
> 
> { "microblaze", "petalogix-s3adsp1800", "", "TT",
>   kernel_microblaze, 0x8400 },
> { "microblazeel", "petalogix-ml605", "", "TT",
>   kernel_microblaze, 0x83a0 },

The two micrablaze data arrays are completely different, since one is
big endian, the other is little, so I'd need to byte-swap the whole
array on the fly, too. Not sure whether it makes sense to add such
complex code just to safe 16 bytes of blob data...?

> Likewise, you could have just two copies of the code for all ARM boards
> that have a pl011 (or any other UART with a simple byte-long transmit
> register), one 32-bit and one 64-bit.

OK, having a patching mechanism in place likely makes sense as soon as
we want to include multiple ARM boards here. I can do that, but I'd
rather like to do it as a second step. It was already quite hard work to
come up with all these assembler snippets (from CPUs where I don't have
a clue of) and to determine which machines could be tested this way at
all, so I'd first like to wait and see whether this patch series is
acceptable at all or not, since Peter has objections.

 Thomas



Re: [Qemu-devel] [PATCH for-2.12 3/7] tests/boot-serial-test: Add support for the mcf5208evb board

2017-11-30 Thread Peter Maydell
On 30 November 2017 at 12:40, Paolo Bonzini  wrote:
> On 30/11/2017 13:14, Peter Maydell wrote:
>> On 30 November 2017 at 08:53, Thomas Huth  wrote:
>>> +static const uint8_t kernel_mcf5208[] = {
>>> +0x41, 0xf9, 0xfc, 0x06, 0x00, 0x00, /* lea 0xfc06,%a0 */
>>> +0x10, 0x3c, 0x00, 0x54, /* move.b #'T',%d0 */
>>> +0x11, 0x7c, 0x00, 0x04, 0x00, 0x08, /* move.b #4,8(%a0) Enable 
>>> TX */
>>> +0x11, 0x40, 0x00, 0x0c, /* move.b %d0,12(%a0)   Print 
>>> 'T' */
>>> +0x60, 0xfa  /* bra.s  loop */
>>> +};
>>
>> This approach doesn't seem to be scalable to me -- are we
>> really going to have 50 or more fragments of hand-coded hex in
>> this file to cover the various board models?
>>
>> I'd much rather see us have a framework for being able
>> to build test blobs from source using a cross compiler
>> setup (and docker or similar so anybody can rebuild
>> the test blobs). That will be much easier to maintain
>> and easier to extend to having tests that test other
>> parts of the board or other aspects of TCG emulation.
>
> It seems a bit overkill, as these snippets are ~16 bytes long.

They're 16 bytes long because that's about the limit of
what you can do with this approach. The consequence is that
they barely test anything at all. A more sensible framework
would allow:
 * better testing of TCG instructions more generally
 * writing your test cases in C
 * more interesting board dependent tests
 * "integration test" setups like 'boot entire kernel'
 * etc

thanks
-- PMM



[Qemu-devel] [PATCH v4 0/7] s390x/pci: Improve zPCI to cover more cases

2017-11-30 Thread Pierre Morel
This patch fixes the following BUG:
Even a guest is able to detect virtio_pci device, the init function
the Linux virtio_pci driver will hang because zPCI does not support
the subregions used by virtio_pci.

It follows that right now the PCI support is very limited
(e.g. pass through of a host vfio device)
To enable features like virtio-pci several modifications needs to be
done.

As already stated above, Virtio-PCI uses subregions, which may eventually
be discontinuous inside bars instead of a single flat region often used
by real devices.
The address offset being formerly calculated from the BAR base address
must be adapted to the subregions instead of to the single region.

This patch provides the new calculation for the three kind of BAR
access, zPCI STORE, zPCI LOAD and zPCI STORE BLOCK done by zPCI.

We use the opportunity to
 - enhance the fault detection for zPCI STORE and LOAD,
 - enhance the fault detection and to provide the maximum STORE BLOCK
   block size, maxstbl, for zPCI STORE BLOCK
 - factor out part of the code used to calculate the offset and
   access the BARs,
 - factor out the code for endianess conversion.


Pierre Morel (7):
  s390x/pci: factor out endianess conversion
  s390x/pci: rework PCI STORE
  s390x/pci: rework PCI LOAD
  s390x/pci: rework PCI STORE BLOCK
  s390x/pci: move the memory region read from pcilg
  s390x/pci: move the memory region write from pcistg
  s390x/pci: search for subregion inside the BARs

 hw/s390x/s390-pci-bus.h  |   1 +
 hw/s390x/s390-pci-inst.c | 251 ---
 hw/s390x/s390-pci-inst.h |   6 +-
 3 files changed, 158 insertions(+), 100 deletions(-)

-- 
2.7.4

Changelog:

from v3
- added explanation why we always need to swap bytes
- #define for BAR definitions (Conny)
- renamed "other" to "subregion" (Thomas)
- renamed label "addressing_error" to "specification_error" (Thomas)
- added some comments to explain why testing states is not needed (Conny)

from v2
- rewording of comments, coding style.

from v1
- suppress fallthrough to PCI_ROM_SLOT to handle it in the default case.
- reword most of the patch commit messages
- add comments to the endianness conversion
- reword somme comments inside the patches

  




[Qemu-devel] [PATCH v4 1/7] s390x/pci: factor out endianess conversion

2017-11-30 Thread Pierre Morel
There are two places where the same endianness conversion
is done.
Let's factor this out into a static function.

Note that the conversion must always be done for data in a register:

The S390 BE guest converted date to le before issuing the instruction.

After interception in a BE host:
ZPCI VFIO using pwrite must make the conversion back for the BE kernel.
Kernel will do BE to le translation when loading the register for the
real instruction.

After interception in a le host:
TCG stores a BE register in le, swapping bytes.
But since the data in the register was already le it is now BE
ZPCI VFIO must convert it to le before writing to the PCI memory.

In both cases ZPCI VFIO must swap the bytes from the register.

Signed-off-by: Pierre Morel 
Reviewed-by: Yi Min Zhao 
---
 hw/s390x/s390-pci-inst.c | 59 +++-
 1 file changed, 33 insertions(+), 26 deletions(-)

diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 8e088f3..3e1f1a0 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -314,6 +314,36 @@ out:
 return 0;
 }
 
+/**
+ * Swap data contained in s390x big endian registers to little endian
+ * PCI bars.
+ *
+ * @ptr: a pointer to a uint64_t data field
+ * @len: the length of the valid data, must be 1,2,4 or 8
+ */
+static int zpci_endian_swap(uint64_t *ptr, uint8_t len)
+{
+uint64_t data = *ptr;
+
+switch (len) {
+case 1:
+break;
+case 2:
+data = bswap16(data);
+break;
+case 4:
+data = bswap32(data);
+break;
+case 8:
+data = bswap64(data);
+break;
+default:
+return -EINVAL;
+}
+*ptr = data;
+return 0;
+}
+
 int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
 {
 CPUS390XState *env = &cpu->env;
@@ -385,19 +415,7 @@ int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r2)
 data =  pci_host_config_read_common(
pbdev->pdev, offset, pci_config_size(pbdev->pdev), len);
 
-switch (len) {
-case 1:
-break;
-case 2:
-data = bswap16(data);
-break;
-case 4:
-data = bswap32(data);
-break;
-case 8:
-data = bswap64(data);
-break;
-default:
+if (zpci_endian_swap(&data, len)) {
 program_interrupt(env, PGM_OPERAND, 4);
 return 0;
 }
@@ -500,19 +518,8 @@ int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r2)
 program_interrupt(env, PGM_OPERAND, 4);
 return 0;
 }
-switch (len) {
-case 1:
-break;
-case 2:
-data = bswap16(data);
-break;
-case 4:
-data = bswap32(data);
-break;
-case 8:
-data = bswap64(data);
-break;
-default:
+
+if (zpci_endian_swap(&data, len)) {
 program_interrupt(env, PGM_OPERAND, 4);
 return 0;
 }
-- 
2.7.4




[Qemu-devel] [PATCH v4 6/7] s390x/pci: move the memory region write from pcistg

2017-11-30 Thread Pierre Morel
Let's move the memory region write from pcistg into a dedicated
function.
This allows us to prepare a later patch searching for subregions
inside of the memory region.

Signed-off-by: Pierre Morel 
Reviewed-by: Yi Min Zhao 
Reviewed-by: Thomas Huth 
---
 hw/s390x/s390-pci-inst.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index c702e8d..1277d62 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -454,12 +454,27 @@ static int trap_msix(S390PCIBusDevice *pbdev, uint64_t 
offset, uint8_t pcias)
 }
 }
 
+static MemTxResult zpci_write_bar(S390PCIBusDevice *pbdev, uint8_t pcias,
+  uint64_t offset, uint64_t data, uint8_t len)
+{
+MemoryRegion *mr;
+
+if (trap_msix(pbdev, offset, pcias)) {
+offset = offset - pbdev->msix.table_offset;
+mr = &pbdev->pdev->msix_table_mmio;
+} else {
+mr = pbdev->pdev->io_regions[pcias].memory;
+}
+
+return memory_region_dispatch_write(mr, offset, data, len,
+MEMTXATTRS_UNSPECIFIED);
+}
+
 int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
 {
 CPUS390XState *env = &cpu->env;
 uint64_t offset, data;
 S390PCIBusDevice *pbdev;
-MemoryRegion *mr;
 MemTxResult result;
 uint8_t len;
 uint32_t fh;
@@ -519,15 +534,7 @@ int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r2)
 return 0;
 }
 
-if (trap_msix(pbdev, offset, pcias)) {
-offset = offset - pbdev->msix.table_offset;
-mr = &pbdev->pdev->msix_table_mmio;
-} else {
-mr = pbdev->pdev->io_regions[pcias].memory;
-}
-
-result = memory_region_dispatch_write(mr, offset, data, len,
- MEMTXATTRS_UNSPECIFIED);
+result = zpci_write_bar(pbdev, pcias, offset, data, len);
 if (result != MEMTX_OK) {
 program_interrupt(env, PGM_OPERAND, 4);
 return 0;
-- 
2.7.4




[Qemu-devel] [PATCH v4 3/7] s390x/pci: rework PCI LOAD

2017-11-30 Thread Pierre Morel
Enhance the fault detection, correction of the fault reporting.

Signed-off-by: Pierre Morel 
Reviewed-by: Yi Min Zhao 
---
 hw/s390x/s390-pci-inst.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 134484f..bab347e 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -373,6 +373,11 @@ int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r2)
 len = env->regs[r2] & 0xf;
 offset = env->regs[r2 + 1];
 
+if (!(fh & FH_MASK_ENABLE)) {
+setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
+return 0;
+}
+
 pbdev = s390_pci_find_dev_by_fh(s390_get_phb(), fh);
 if (!pbdev) {
 DPRINTF("pcilg no pci dev\n");
@@ -381,12 +386,7 @@ int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r2)
 }
 
 switch (pbdev->state) {
-case ZPCI_FS_RESERVED:
-case ZPCI_FS_STANDBY:
-case ZPCI_FS_DISABLED:
 case ZPCI_FS_PERMANENT_ERROR:
-setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
-return 0;
 case ZPCI_FS_ERROR:
 setcc(cpu, ZPCI_PCI_LS_ERR);
 s390_set_status_code(env, r2, ZPCI_PCI_ST_BLOCKED);
@@ -395,8 +395,9 @@ int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
 break;
 }
 
-if (pcias < 6) {
-if ((8 - (offset & 0x7)) < len) {
+switch (pcias) {
+case ZPCI_IO_BAR_MIN ... ZPCI_IO_BAR_MAX:
+if (!len || (len > (8 - (offset & 0x7 {
 program_interrupt(env, PGM_OPERAND, 4);
 return 0;
 }
@@ -407,8 +408,9 @@ int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
 program_interrupt(env, PGM_OPERAND, 4);
 return 0;
 }
-} else if (pcias == 15) {
-if ((4 - (offset & 0x3)) < len) {
+break;
+case ZPCI_CONFIG_BAR:
+if (!len || (len > (4 - (offset & 0x3))) || len == 3) {
 program_interrupt(env, PGM_OPERAND, 4);
 return 0;
 }
@@ -419,8 +421,9 @@ int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
 program_interrupt(env, PGM_OPERAND, 4);
 return 0;
 }
-} else {
-DPRINTF("invalid space\n");
+break;
+default:
+DPRINTF("pcilg invalid space\n");
 setcc(cpu, ZPCI_PCI_LS_ERR);
 s390_set_status_code(env, r2, ZPCI_PCI_ST_INVAL_AS);
 return 0;
-- 
2.7.4




[Qemu-devel] [PATCH v4 4/7] s390x/pci: rework PCI STORE BLOCK

2017-11-30 Thread Pierre Morel
Enhance the fault detection.

Fixup the precedence to check the destination path existance
before checking for the source accessibility.

Add the maxstbl entry to both the Query PCI Function Group
response and the PCIBusDevice structure.

Initialize the maxstbl to 128 per default until we get
the actual data from the hardware.

Signed-off-by: Pierre Morel 
Reviewed-by: Yi Min Zhao 
---
 hw/s390x/s390-pci-bus.h  |  1 +
 hw/s390x/s390-pci-inst.c | 63 ++--
 hw/s390x/s390-pci-inst.h |  2 +-
 3 files changed, 41 insertions(+), 25 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
index 560bd82..2993f0d 100644
--- a/hw/s390x/s390-pci-bus.h
+++ b/hw/s390x/s390-pci-bus.h
@@ -284,6 +284,7 @@ struct S390PCIBusDevice {
 uint64_t fmb_addr;
 uint8_t isc;
 uint16_t noi;
+uint16_t maxstbl;
 uint8_t sum;
 S390MsixInfo msix;
 AdapterRoutes routes;
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index bab347e..bab7353 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -294,6 +294,7 @@ int clp_service_call(S390CPU *cpu, uint8_t r2)
 stq_p(&resgrp->msia, ZPCI_MSI_ADDR);
 stw_p(&resgrp->mui, 0);
 stw_p(&resgrp->i, 128);
+stw_p(&resgrp->maxstbl, 128);
 resgrp->version = 0;
 
 stw_p(&resgrp->hdr.rsp, CLP_RC_OK);
@@ -648,6 +649,7 @@ int pcistb_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r3, uint64_t gaddr,
 S390PCIBusDevice *pbdev;
 MemoryRegion *mr;
 MemTxResult result;
+uint64_t offset;
 int i;
 uint32_t fh;
 uint8_t pcias;
@@ -662,22 +664,10 @@ int pcistb_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r3, uint64_t gaddr,
 fh = env->regs[r1] >> 32;
 pcias = (env->regs[r1] >> 16) & 0xf;
 len = env->regs[r1] & 0xff;
+offset = env->regs[r3];
 
-if (pcias > 5) {
-DPRINTF("pcistb invalid space\n");
-setcc(cpu, ZPCI_PCI_LS_ERR);
-s390_set_status_code(env, r1, ZPCI_PCI_ST_INVAL_AS);
-return 0;
-}
-
-switch (len) {
-case 16:
-case 32:
-case 64:
-case 128:
-break;
-default:
-program_interrupt(env, PGM_SPECIFICATION, 6);
+if (!(fh & FH_MASK_ENABLE)) {
+setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
 return 0;
 }
 
@@ -689,12 +679,7 @@ int pcistb_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r3, uint64_t gaddr,
 }
 
 switch (pbdev->state) {
-case ZPCI_FS_RESERVED:
-case ZPCI_FS_STANDBY:
-case ZPCI_FS_DISABLED:
 case ZPCI_FS_PERMANENT_ERROR:
-setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
-return 0;
 case ZPCI_FS_ERROR:
 setcc(cpu, ZPCI_PCI_LS_ERR);
 s390_set_status_code(env, r1, ZPCI_PCI_ST_BLOCKED);
@@ -703,8 +688,34 @@ int pcistb_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r3, uint64_t gaddr,
 break;
 }
 
+if (pcias > ZPCI_IO_BAR_MAX) {
+DPRINTF("pcistb invalid space\n");
+setcc(cpu, ZPCI_PCI_LS_ERR);
+s390_set_status_code(env, r1, ZPCI_PCI_ST_INVAL_AS);
+return 0;
+}
+
+/* Verify the address, offset and length */
+/* offset must be a multiple of 8 */
+if (offset % 8) {
+goto specification_error;
+}
+/* Length must be greater than 8, a multiple of 8 */
+/* and not greater than maxstbl */
+if ((len <= 8) || (len % 8) || (len > pbdev->maxstbl)) {
+goto specification_error;
+}
+/* Do not cross a 4K-byte boundary */
+if (((offset & 0xfff) + len) > 0x1000) {
+goto specification_error;
+}
+/* Guest address must be double word aligned */
+if (gaddr & 0x07UL) {
+goto specification_error;
+}
+
 mr = pbdev->pdev->io_regions[pcias].memory;
-if (!memory_region_access_valid(mr, env->regs[r3], len, true)) {
+if (!memory_region_access_valid(mr, offset, len, true)) {
 program_interrupt(env, PGM_OPERAND, 6);
 return 0;
 }
@@ -714,9 +725,9 @@ int pcistb_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r3, uint64_t gaddr,
 }
 
 for (i = 0; i < len / 8; i++) {
-result = memory_region_dispatch_write(mr, env->regs[r3] + i * 8,
- ldq_p(buffer + i * 8), 8,
- MEMTXATTRS_UNSPECIFIED);
+result = memory_region_dispatch_write(mr, offset + i * 8,
+  ldq_p(buffer + i * 8), 8,
+  MEMTXATTRS_UNSPECIFIED);
 if (result != MEMTX_OK) {
 program_interrupt(env, PGM_OPERAND, 6);
 return 0;
@@ -725,6 +736,10 @@ int pcistb_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r3, uint64_t gaddr,
 
 setcc(cpu, ZPCI_PCI_LS_OK);
 return 0;
+
+specification_error:
+program_interrupt(env, PGM_SPECIFICATION, 6);
+return 0;
 }
 
 static int reg_irqs(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib)
diff --git 

Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates

2017-11-30 Thread Igor Mammedov
On Thu, 30 Nov 2017 12:47:20 +
"Dr. David Alan Gilbert"  wrote:

> * Igor Mammedov (imamm...@redhat.com) wrote:
> > On Thu, 30 Nov 2017 12:08:06 +
> > "Dr. David Alan Gilbert"  wrote:
> >   
> > > * Igor Mammedov (imamm...@redhat.com) wrote:  
> > > > On Wed, 29 Nov 2017 18:50:19 +
> > > > "Dr. David Alan Gilbert (git)"  wrote:
> > > > 
> > > > > From: "Dr. David Alan Gilbert" 
> > > > > 
> > > > > Hi,
> > > > >   This is an experimental set that reworks the way the vhost
> > > > > code handles changes in physical address space layout that
> > > > > came from a discussion with Igor.
> > > > Thanks for looking into it.
> > > > 
> > > >  
> > > > > Instead of updating and trying to merge sections of address
> > > > > space on each add/remove callback, we wait until the commit phase
> > > > > and go through and rebuild a list by walking the Flatview of
> > > > > memory and end up producing an ordered list.
> > > > > We compare the list to the old list to trigger updates.
> > > > > 
> > > > > Note, only very lightly tested so far, I'm just trying to see if it's
> > > > > the right shape.
> > > > > 
> > > > > Igor, is this what you were intending?
> > > > 
> > > > I was thinking about a little less intrusive approach
> > > > 
> > > > where vhost_region_add/del are modified to maintain
> > > > sorted by GPA array of mem_sections, vhost_dev::mem is dropped
> > > > altogether and vhost_memory_region array is build/used/freed
> > > > on every vhost_commit().
> > > > Maintaining sorted array should roughly cost us O(2 log n) if
> > > > binary search is used.
> > > > 
> > > > However I like your idea with iterator even more as it have
> > > > potential to make it even faster O(n) if we get rid of
> > > > quadratic and relatively complex vhost_update_compare_list().
> > > 
> > > Note vhost_update_compare_list is complex,
> > > but it is O(n) - it's
> > > got nested loops, but the inner loop moves forward and oldi never
> > > gets reset back to zero.  
> > While skimming through patches I've overlooked it.
> > 
> > Anyways,
> > why memcmp(old_arr, new_arr) is not sufficient
> > to detect a change in memory map?  
> 
> It tells you that you've got a change, but doesn't give
> the start/end of the range that's changed, and those
> are used by vhost_commit to limit the work of
> vhost_verify_ring_mappings.
Isn't memmap list a sorted and
 dev->mem_changed_[start|end]_addr are the lowest|highest
addresses of whole map?

If it's, so wouldn't getting values directly from 
the 1st/last entries of array be sufficient?



> 
> Dave
> 
> > >   
> > > > Pls, see comments on individual patches.
> > > 
> > > Thanks; I have fixed a couple of bugs since I posted, so I'm
> > > more interested in structure/shape.  Any good ideas how to test
> > > it are welcome.
> > > 
> > > Dave
> > >   
> > > > 
> > > > > Dave
> > > > > 
> > > > > Dr. David Alan Gilbert (7):
> > > > >   memory: address_space_iterate
> > > > >   vhost: Move log_dirty check
> > > > >   vhost: New memory update functions
> > > > >   vhost: update_mem_cb implementation
> > > > >   vhost: Compare new and old memory lists
> > > > >   vhost: Copy updated region data into device state
> > > > >   vhost: Remove vhost_set_memory and children
> > > > > 
> > > > >  hw/virtio/trace-events |   8 +
> > > > >  hw/virtio/vhost.c  | 424 
> > > > > ++---
> > > > >  include/exec/memory.h  |  23 +++
> > > > >  memory.c   |  22 +++
> > > > >  4 files changed, 241 insertions(+), 236 deletions(-)
> > > > > 
> > > > 
> > > --
> > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK  
> >   
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




[Qemu-devel] [PATCH v4 2/7] s390x/pci: rework PCI STORE

2017-11-30 Thread Pierre Morel
Enhance the fault detection, correction of the fault reporting.

Signed-off-by: Pierre Morel 
Reviewed-by: Yi Min Zhao 
---
 hw/s390x/s390-pci-inst.c | 42 +-
 hw/s390x/s390-pci-inst.h |  4 
 2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 3e1f1a0..134484f 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -470,6 +470,12 @@ int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r2)
 pcias = (env->regs[r2] >> 16) & 0xf;
 len = env->regs[r2] & 0xf;
 offset = env->regs[r2 + 1];
+data = env->regs[r1];
+
+if (!(fh & FH_MASK_ENABLE)) {
+setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
+return 0;
+}
 
 pbdev = s390_pci_find_dev_by_fh(s390_get_phb(), fh);
 if (!pbdev) {
@@ -479,12 +485,10 @@ int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r2)
 }
 
 switch (pbdev->state) {
-case ZPCI_FS_RESERVED:
-case ZPCI_FS_STANDBY:
-case ZPCI_FS_DISABLED:
+/* ZPCI_FS_RESERVED, ZPCI_FS_STANDBY and ZPCI_FS_DISABLED
+ * are already covered by the FH_MASK_ENABLE check above
+ */
 case ZPCI_FS_PERMANENT_ERROR:
-setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
-return 0;
 case ZPCI_FS_ERROR:
 setcc(cpu, ZPCI_PCI_LS_ERR);
 s390_set_status_code(env, r2, ZPCI_PCI_ST_BLOCKED);
@@ -493,9 +497,13 @@ int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r2)
 break;
 }
 
-data = env->regs[r1];
-if (pcias < 6) {
-if ((8 - (offset & 0x7)) < len) {
+switch (pcias) {
+/* A ZPCI PCI card may use any BAR from BAR 0 to BAR 5 */
+case ZPCI_IO_BAR_MIN ... ZPCI_IO_BAR_MAX:
+/* Check length:
+ * A length of 0 is invalid and length should not cross a double word
+ */
+if (!len || (len > (8 - (offset & 0x7 {
 program_interrupt(env, PGM_OPERAND, 4);
 return 0;
 }
@@ -513,21 +521,21 @@ int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r2)
 program_interrupt(env, PGM_OPERAND, 4);
 return 0;
 }
-} else if (pcias == 15) {
-if ((4 - (offset & 0x3)) < len) {
-program_interrupt(env, PGM_OPERAND, 4);
-return 0;
-}
-
-if (zpci_endian_swap(&data, len)) {
+break;
+case ZPCI_CONFIG_BAR:
+/* ZPCI uses the pseudo BAR number 15 as configuration space */
+/* possible access lengths are 1,2,4 and must not cross a word */
+if (!len || (len > (4 - (offset & 0x3))) || len == 3) {
 program_interrupt(env, PGM_OPERAND, 4);
 return 0;
 }
-
+/* len = 1,2,4 so we do not need to test */
+zpci_endian_swap(&data, len);
 pci_host_config_write_common(pbdev->pdev, offset,
  pci_config_size(pbdev->pdev),
  data, len);
-} else {
+break;
+default:
 DPRINTF("pcistg invalid space\n");
 setcc(cpu, ZPCI_PCI_LS_ERR);
 s390_set_status_code(env, r2, ZPCI_PCI_ST_INVAL_AS);
diff --git a/hw/s390x/s390-pci-inst.h b/hw/s390x/s390-pci-inst.h
index 94a959f..4be58fe 100644
--- a/hw/s390x/s390-pci-inst.h
+++ b/hw/s390x/s390-pci-inst.h
@@ -302,4 +302,8 @@ int pcistb_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r3, uint64_t gaddr,
 int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar);
 int stpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar);
 
+#define ZPCI_IO_BAR_MIN 0
+#define ZPCI_IO_BAR_MAX 5
+#define ZPCI_CONFIG_BAR 15
+
 #endif
-- 
2.7.4




[Qemu-devel] [PATCH v4 5/7] s390x/pci: move the memory region read from pcilg

2017-11-30 Thread Pierre Morel
Let's move the memory region read from pcilg into a dedicated function.
This allows us to prepare a later patch.

Signed-off-by: Pierre Morel 
Reviewed-by: Yi Min Zhao 
Reviewed-by: Thomas Huth 
---
 hw/s390x/s390-pci-inst.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index bab7353..c702e8d 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -345,13 +345,22 @@ static int zpci_endian_swap(uint64_t *ptr, uint8_t len)
 return 0;
 }
 
+static MemTxResult zpci_read_bar(S390PCIBusDevice *pbdev, uint8_t pcias,
+ uint64_t offset, uint64_t *data, uint8_t len)
+{
+MemoryRegion *mr;
+
+mr = pbdev->pdev->io_regions[pcias].memory;
+return memory_region_dispatch_read(mr, offset, data, len,
+   MEMTXATTRS_UNSPECIFIED);
+}
+
 int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
 {
 CPUS390XState *env = &cpu->env;
 S390PCIBusDevice *pbdev;
 uint64_t offset;
 uint64_t data;
-MemoryRegion *mr;
 MemTxResult result;
 uint8_t len;
 uint32_t fh;
@@ -402,9 +411,7 @@ int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
 program_interrupt(env, PGM_OPERAND, 4);
 return 0;
 }
-mr = pbdev->pdev->io_regions[pcias].memory;
-result = memory_region_dispatch_read(mr, offset, &data, len,
- MEMTXATTRS_UNSPECIFIED);
+result = zpci_read_bar(pbdev, pcias, offset, &data, len);
 if (result != MEMTX_OK) {
 program_interrupt(env, PGM_OPERAND, 4);
 return 0;
-- 
2.7.4




[Qemu-devel] [PATCH v4 7/7] s390x/pci: search for subregion inside the BARs

2017-11-30 Thread Pierre Morel
When dispatching memory access to PCI BAR region, we must
look for possible subregions, used by the PCI device to map
different memory areas inside the same PCI BAR.

Since the data offset we received is calculated starting at the
region start address we need to adjust the offset for the subregion.

The data offset inside the subregion is calculated by substracting
the subregion's starting address from the data offset in the region.

The access to the MSIX region is now handled in a generic way,
we do not need the specific trap_msix() function anymore.

Signed-off-by: Pierre Morel 
Reviewed-by: Yi Min Zhao 
---
 hw/s390x/s390-pci-inst.c | 44 +---
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 1277d62..ca110e0 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -345,12 +345,31 @@ static int zpci_endian_swap(uint64_t *ptr, uint8_t len)
 return 0;
 }
 
+static MemoryRegion *s390_get_subregion(MemoryRegion *mr, uint64_t offset,
+uint8_t len)
+{
+MemoryRegion *subregion;
+uint64_t subregion_size;
+
+QTAILQ_FOREACH(subregion, &mr->subregions, subregions_link) {
+subregion_size = int128_get64(subregion->size);
+if ((offset >= subregion->addr) &&
+(offset + len) <= (subregion->addr + subregion_size)) {
+mr = subregion;
+break;
+}
+}
+return mr;
+}
+
 static MemTxResult zpci_read_bar(S390PCIBusDevice *pbdev, uint8_t pcias,
  uint64_t offset, uint64_t *data, uint8_t len)
 {
 MemoryRegion *mr;
 
 mr = pbdev->pdev->io_regions[pcias].memory;
+mr = s390_get_subregion(mr, offset, len);
+offset -= mr->addr;
 return memory_region_dispatch_read(mr, offset, data, len,
MEMTXATTRS_UNSPECIFIED);
 }
@@ -442,30 +461,14 @@ int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r2)
 return 0;
 }
 
-static int trap_msix(S390PCIBusDevice *pbdev, uint64_t offset, uint8_t pcias)
-{
-if (pbdev->msix.available && pbdev->msix.table_bar == pcias &&
-offset >= pbdev->msix.table_offset &&
-offset < (pbdev->msix.table_offset +
-  pbdev->msix.entries * PCI_MSIX_ENTRY_SIZE)) {
-return 1;
-} else {
-return 0;
-}
-}
-
 static MemTxResult zpci_write_bar(S390PCIBusDevice *pbdev, uint8_t pcias,
   uint64_t offset, uint64_t data, uint8_t len)
 {
 MemoryRegion *mr;
 
-if (trap_msix(pbdev, offset, pcias)) {
-offset = offset - pbdev->msix.table_offset;
-mr = &pbdev->pdev->msix_table_mmio;
-} else {
-mr = pbdev->pdev->io_regions[pcias].memory;
-}
-
+mr = pbdev->pdev->io_regions[pcias].memory;
+mr = s390_get_subregion(mr, offset, len);
+offset -= mr->addr;
 return memory_region_dispatch_write(mr, offset, data, len,
 MEMTXATTRS_UNSPECIFIED);
 }
@@ -729,6 +732,9 @@ int pcistb_service_call(S390CPU *cpu, uint8_t r1, uint8_t 
r3, uint64_t gaddr,
 }
 
 mr = pbdev->pdev->io_regions[pcias].memory;
+mr = s390_get_subregion(mr, offset, len);
+offset -= mr->addr;
+
 if (!memory_region_access_valid(mr, offset, len, true)) {
 program_interrupt(env, PGM_OPERAND, 6);
 return 0;
-- 
2.7.4




Re: [Qemu-devel] [PATCH for-2.12 3/7] tests/boot-serial-test: Add support for the mcf5208evb board

2017-11-30 Thread Paolo Bonzini
On 30/11/2017 13:51, Peter Maydell wrote:
> On 30 November 2017 at 12:40, Paolo Bonzini  wrote:
>> On 30/11/2017 13:14, Peter Maydell wrote:
>>> On 30 November 2017 at 08:53, Thomas Huth  wrote:
 +static const uint8_t kernel_mcf5208[] = {
 +0x41, 0xf9, 0xfc, 0x06, 0x00, 0x00, /* lea 0xfc06,%a0 */
 +0x10, 0x3c, 0x00, 0x54, /* move.b #'T',%d0 */
 +0x11, 0x7c, 0x00, 0x04, 0x00, 0x08, /* move.b #4,8(%a0) 
 Enable TX */
 +0x11, 0x40, 0x00, 0x0c, /* move.b %d0,12(%a0)   Print 
 'T' */
 +0x60, 0xfa  /* bra.s  loop */
 +};
>>>
>>> This approach doesn't seem to be scalable to me -- are we
>>> really going to have 50 or more fragments of hand-coded hex in
>>> this file to cover the various board models?
>>>
>>> I'd much rather see us have a framework for being able
>>> to build test blobs from source using a cross compiler
>>> setup (and docker or similar so anybody can rebuild
>>> the test blobs). That will be much easier to maintain
>>> and easier to extend to having tests that test other
>>> parts of the board or other aspects of TCG emulation.
>>
>> It seems a bit overkill, as these snippets are ~16 bytes long.
> 
> They're 16 bytes long because that's about the limit of
> what you can do with this approach. The consequence is that
> they barely test anything at all.

Certainly they are an awful test for boards, but they are a great smoke
test for TCG changes that require modifications in all target/
subdirectories.

The infrastructure you want for integration tests is already provided by
kvm-unit-tests, which satisfies at least bullets 2-4 from your list
below (the first is unclear to me).

Thanks,

Paolo

> A more sensible framework
> would allow:
>  * better testing of TCG instructions more generally
>  * writing your test cases in C
>  * more interesting board dependent tests
>  * "integration test" setups like 'boot entire kernel'
>  * etc
> 
> thanks
> -- PMM
> 




Re: [Qemu-devel] [PATCH for-2.12 3/7] tests/boot-serial-test: Add support for the mcf5208evb board

2017-11-30 Thread Paolo Bonzini
On 30/11/2017 13:51, Thomas Huth wrote:
> The two micrablaze data arrays are completely different, since one is
> big endian, the other is little, so I'd need to byte-swap the whole
> array on the fly, too. Not sure whether it makes sense to add such
> complex code just to safe 16 bytes of blob data...?

*facepalm* :)

>> Likewise, you could have just two copies of the code for all ARM boards
>> that have a pl011 (or any other UART with a simple byte-long transmit
>> register), one 32-bit and one 64-bit.
> 
> OK, having a patching mechanism in place likely makes sense as soon as
> we want to include multiple ARM boards here. I can do that, but I'd
> rather like to do it as a second step. It was already quite hard work to
> come up with all these assembler snippets (from CPUs where I don't have
> a clue of) and to determine which machines could be tested this way at
> all, so I'd first like to wait and see whether this patch series is
> acceptable at all or not, since Peter has objections.

Given your remark on endianness then no, having one snippet per binary
is enough.

Paolo



Re: [Qemu-devel] [PATCH v4 00/20] add byte-based block_status driver callbacks

2017-11-30 Thread Vladimir Sementsov-Ogievskiy

Most of conversions looks fine, but it is not simple to prove the
correctness, because we start to use internal driver logic on offsets
and lengths, not aligned to sectors. And we can't imagine the
consequences (at least, I can't and my r-b doesn't give the guarantee)
of such change. It is like take some function and expand its scope of
parameters.

May be I'm too paranoiac. May be it would be good to align requests
in bdrv_co_block_status to sectors at least for drivers which do not
provide request_alignment. May be each patch should be reviewed by
person, knowing that particular driver.



12.10.2017 21:58, Eric Blake wrote:

There are patches floating around to add NBD_CMD_BLOCK_STATUS,
but NBD wants to report status on byte granularity (even if the
reporting will probably be naturally aligned to sectors or even
much higher levels).  I've therefore started the task of
converting our block status code to report at a byte granularity
rather than sectors.

Now that 2.11 is open, I'm rebasing/reposting the remaining patches.

The overall conversion currently looks like:
part 1: bdrv_is_allocated (merged, commit 51b0a488)
part 2: dirty-bitmap (merged, commit ca759622)
part 3: bdrv_get_block_status (v6 is posted [1], partially reviewed)
part 4: .bdrv_co_block_status (this series, v3 was here [2])

Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-callback-v4

Based-on: <20171012034720.11947-1-ebl...@redhat.com>
([PATCH v6 00/24] make bdrv_get_block_status byte-based)

Since v3:
- rebase to series 3 tweak in get_status interface
- further simplify qed code
- better documentation of optimization flag (s/mapping/want_zero/)

001/20:[0033] [FC] 'block: Add .bdrv_co_block_status() callback'
002/20:[0077] [FC] 'block: Switch passthrough drivers to 
.bdrv_co_block_status()'
003/20:[0020] [FC] 'file-posix: Switch to .bdrv_co_block_status()'
004/20:[0019] [FC] 'gluster: Switch to .bdrv_co_block_status()'
005/20:[] [--] 'iscsi: Switch cluster_sectors to byte-based'
006/20:[] [--] 'iscsi: Switch iscsi_allocmap_update() to byte-based'
007/20:[0022] [FC] 'iscsi: Switch to .bdrv_co_block_status()'
008/20:[0013] [FC] 'null: Switch to .bdrv_co_block_status()'
009/20:[0017] [FC] 'parallels: Switch to .bdrv_co_block_status()'
010/20:[0014] [FC] 'qcow: Switch to .bdrv_co_block_status()'
011/20:[0019] [FC] 'qcow2: Switch to .bdrv_co_block_status()'
012/20:[0086] [FC] 'qed: Switch to .bdrv_co_block_status()'
013/20:[0015] [FC] 'raw: Switch to .bdrv_co_block_status()'
014/20:[0011] [FC] 'sheepdog: Switch to .bdrv_co_block_status()'
015/20:[] [--] 'vdi: Avoid bitrot of debugging code'
016/20:[0017] [FC] 'vdi: Switch to .bdrv_co_block_status()'
017/20:[0013] [FC] 'vmdk: Switch to .bdrv_co_block_status()'
018/20:[0019] [FC] 'vpc: Switch to .bdrv_co_block_status()'
019/20:[0010] [FC] 'vvfat: Switch to .bdrv_co_block_status()'
020/20:[0019] [FC] 'block: Drop unused .bdrv_co_get_block_status()'

Eric Blake (20):
   block: Add .bdrv_co_block_status() callback
   block: Switch passthrough drivers to .bdrv_co_block_status()
   file-posix: Switch to .bdrv_co_block_status()
   gluster: Switch to .bdrv_co_block_status()
   iscsi: Switch cluster_sectors to byte-based
   iscsi: Switch iscsi_allocmap_update() to byte-based
   iscsi: Switch to .bdrv_co_block_status()
   null: Switch to .bdrv_co_block_status()
   parallels: Switch to .bdrv_co_block_status()
   qcow: Switch to .bdrv_co_block_status()
   qcow2: Switch to .bdrv_co_block_status()
   qed: Switch to .bdrv_co_block_status()
   raw: Switch to .bdrv_co_block_status()
   sheepdog: Switch to .bdrv_co_block_status()
   vdi: Avoid bitrot of debugging code
   vdi: Switch to .bdrv_co_block_status()
   vmdk: Switch to .bdrv_co_block_status()
   vpc: Switch to .bdrv_co_block_status()
   vvfat: Switch to .bdrv_co_block_status()
   block: Drop unused .bdrv_co_get_block_status()

  include/block/block.h |   9 ++-
  include/block/block_int.h |  43 +++--
  block/io.c|  80 +++-
  block/blkdebug.c  |  20 +++---
  block/commit.c|   2 +-
  block/file-posix.c|  59 ++
  block/gluster.c   |  67 +++-
  block/iscsi.c | 154 +-
  block/mirror.c|   2 +-
  block/null.c  |  23 +++
  block/parallels.c |  22 ---
  block/qcow.c  |  27 
  block/qcow2.c |  24 
  block/qed.c   |  84 +
  block/raw-format.c|  16 ++---
  block/sheepdog.c  |  26 
  block/throttle.c  |   2 +-
  block/vdi.c   |  45 +++---
  block/vmdk.c  |  28 -
  block/vpc.c   |  42 +++--
  block/vvfat.c |  16 +++--
  21 files changed, 404 insertions(+), 387 deletions(-)




--
Best regards,
Vladimir




Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates

2017-11-30 Thread Dr. David Alan Gilbert
* Igor Mammedov (imamm...@redhat.com) wrote:
> On Thu, 30 Nov 2017 12:47:20 +
> "Dr. David Alan Gilbert"  wrote:
> 
> > * Igor Mammedov (imamm...@redhat.com) wrote:
> > > On Thu, 30 Nov 2017 12:08:06 +
> > > "Dr. David Alan Gilbert"  wrote:
> > >   
> > > > * Igor Mammedov (imamm...@redhat.com) wrote:  
> > > > > On Wed, 29 Nov 2017 18:50:19 +
> > > > > "Dr. David Alan Gilbert (git)"  wrote:
> > > > > 
> > > > > > From: "Dr. David Alan Gilbert" 
> > > > > > 
> > > > > > Hi,
> > > > > >   This is an experimental set that reworks the way the vhost
> > > > > > code handles changes in physical address space layout that
> > > > > > came from a discussion with Igor.
> > > > > Thanks for looking into it.
> > > > > 
> > > > >  
> > > > > > Instead of updating and trying to merge sections of address
> > > > > > space on each add/remove callback, we wait until the commit phase
> > > > > > and go through and rebuild a list by walking the Flatview of
> > > > > > memory and end up producing an ordered list.
> > > > > > We compare the list to the old list to trigger updates.
> > > > > > 
> > > > > > Note, only very lightly tested so far, I'm just trying to see if 
> > > > > > it's
> > > > > > the right shape.
> > > > > > 
> > > > > > Igor, is this what you were intending?
> > > > > 
> > > > > I was thinking about a little less intrusive approach
> > > > > 
> > > > > where vhost_region_add/del are modified to maintain
> > > > > sorted by GPA array of mem_sections, vhost_dev::mem is dropped
> > > > > altogether and vhost_memory_region array is build/used/freed
> > > > > on every vhost_commit().
> > > > > Maintaining sorted array should roughly cost us O(2 log n) if
> > > > > binary search is used.
> > > > > 
> > > > > However I like your idea with iterator even more as it have
> > > > > potential to make it even faster O(n) if we get rid of
> > > > > quadratic and relatively complex vhost_update_compare_list().
> > > > 
> > > > Note vhost_update_compare_list is complex,
> > > > but it is O(n) - it's
> > > > got nested loops, but the inner loop moves forward and oldi never
> > > > gets reset back to zero.  
> > > While skimming through patches I've overlooked it.
> > > 
> > > Anyways,
> > > why memcmp(old_arr, new_arr) is not sufficient
> > > to detect a change in memory map?  
> > 
> > It tells you that you've got a change, but doesn't give
> > the start/end of the range that's changed, and those
> > are used by vhost_commit to limit the work of
> > vhost_verify_ring_mappings.
> Isn't memmap list a sorted and
>  dev->mem_changed_[start|end]_addr are the lowest|highest
> addresses of whole map?
> 
> If it's, so wouldn't getting values directly from 
> the 1st/last entries of array be sufficient?

THat wasn't my understanding from the existing code;
my understanding was that changed_start_addr is set by
vhost_region_add->vhost_set_memory when a new region is added
(or removed) and is set to the limit of the section added.
But perhaps I'm misunderstanding.

(The logic in vhost_verify_ring_mappings doesn't make sense
to me either though; if vhost_verify_ring_part_mapping returns 0
on success, why is it doing   if (!r) { break; }  surely it
should be  if (r) { break; })

Dave

> 
> 
> > 
> > Dave
> > 
> > > >   
> > > > > Pls, see comments on individual patches.
> > > > 
> > > > Thanks; I have fixed a couple of bugs since I posted, so I'm
> > > > more interested in structure/shape.  Any good ideas how to test
> > > > it are welcome.
> > > > 
> > > > Dave
> > > >   
> > > > > 
> > > > > > Dave
> > > > > > 
> > > > > > Dr. David Alan Gilbert (7):
> > > > > >   memory: address_space_iterate
> > > > > >   vhost: Move log_dirty check
> > > > > >   vhost: New memory update functions
> > > > > >   vhost: update_mem_cb implementation
> > > > > >   vhost: Compare new and old memory lists
> > > > > >   vhost: Copy updated region data into device state
> > > > > >   vhost: Remove vhost_set_memory and children
> > > > > > 
> > > > > >  hw/virtio/trace-events |   8 +
> > > > > >  hw/virtio/vhost.c  | 424 
> > > > > > ++---
> > > > > >  include/exec/memory.h  |  23 +++
> > > > > >  memory.c   |  22 +++
> > > > > >  4 files changed, 241 insertions(+), 236 deletions(-)
> > > > > > 
> > > > > 
> > > > --
> > > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK  
> > >   
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH 06/17] iotests: Drop format-specific in _filter_img_info

2017-11-30 Thread Max Reitz
On 2017-11-30 04:16, Fam Zheng wrote:
> On Thu, 11/23 03:08, Max Reitz wrote:
>> _filter_img_info should remove format-specific information, too.  We
>> already have such a filter in _img_info, and it is very useful for
>> query-block-named-block-nodes (etc.), too.
>>
>> However, in 198 we need that information (but we still want the rest of
>> the filter), so make that filtering optional.  Note that "the rest of
>> the filter" includes filtering of the test directory, so we can drop the
>> _filter_testdir from 198 at the same time.
>>
>> Signed-off-by: Max Reitz 
>> ---
>>  tests/qemu-iotests/198   |  6 --
>>  tests/qemu-iotests/common.filter | 29 -
>>  2 files changed, 32 insertions(+), 3 deletions(-)
>>
>> diff --git a/tests/qemu-iotests/198 b/tests/qemu-iotests/198
>> index 34ef666381..a84a058396 100755
>> --- a/tests/qemu-iotests/198
>> +++ b/tests/qemu-iotests/198
>> @@ -91,11 +91,13 @@ $QEMU_IO --object $SECRET0 --object $SECRET1 -c "read -P 
>> 0x9 0 $size" --image-op
>>  
>>  echo
>>  echo "== checking image base =="
>> -$QEMU_IMG info --image-opts $IMGSPECBASE | _filter_img_info | 
>> _filter_testdir | sed -e "/^disk size:/ D"
>> +$QEMU_IMG info --image-opts $IMGSPECBASE | _filter_img_info 
>> --format-specific \
>> +| sed -e "/^disk size:/ D"
>>  
>>  echo
>>  echo "== checking image layer =="
>> -$QEMU_IMG info --image-opts $IMGSPECLAYER | _filter_img_info | 
>> _filter_testdir | sed -e "/^disk size:/ D"
>> +$QEMU_IMG info --image-opts $IMGSPECLAYER | _filter_img_info 
>> --format-specific \
>> +| sed -e "/^disk size:/ D"
>>  
>>  
>>  # success, all done
>> diff --git a/tests/qemu-iotests/common.filter 
>> b/tests/qemu-iotests/common.filter
>> index d9237799e9..0c0e53fae7 100644
>> --- a/tests/qemu-iotests/common.filter
>> +++ b/tests/qemu-iotests/common.filter
>> @@ -139,6 +139,15 @@ _filter_img_create()
>>  
>>  _filter_img_info()
>>  {
>> +if [[ "$1" == "--format-specific" ]]; then
>> +local format_specific=1
>> +shift
>> +else
>> +local format_specific=0
>> +fi
>> +
>> +discard=0
>> +regex_json_spec_start='^ *"format-specific": \{'
>>  sed -e "s#$IMGPROTO:$TEST_DIR#TEST_DIR#g" \
>>  -e "s#$TEST_DIR#TEST_DIR#g" \
>>  -e "s#$IMGFMT#IMGFMT#g" \
>> @@ -159,7 +168,25 @@ _filter_img_info()
>>  -e "/block_state_zero: \\(on\\|off\\)/d" \
>>  -e "/log_size: [0-9]\\+/d" \
>>  -e "s/iters: [0-9]\\+/iters: 1024/" \
>> --e "s/uuid: [-a-f0-9]\\+/uuid: 
>> ----/"
>> +-e "s/uuid: [-a-f0-9]\\+/uuid: 
>> ----/" | \
>> +while IFS='' read -r line; do
>> +if [[ $format_specific == 1 ]]; then
>> +discard=0
>> +elif [[ $line == "Format specific information:" ]]; then
>> +discard=1
>> +elif [[ $line =~ $regex_json_spec_start ]]; then
>> +discard=2
>> +regex_json_spec_end="^${line%%[^ ]*}\\},? *$"
>> +fi
>> +if [[ $discard == 0 ]]; then
>> +echo "$line"
>> +elif [[ $discard == 1 && ! $line ]]; then
> 
> s/\$line/"\$line"/ ?
> 
>> +echo
>> +discard=0
>> +elif [[ $discard == 2 && $line =~ $regex_json_spec_end ]]; then
> 
> Ditto.

It's in double brackets, so it should be fine.

(I just copied it from _img_info in common.rc, and that was reviewed by
Eric, so I assume it's fine. :-))

Max

>> +discard=0
>> +fi
>> +done
>>  }
>>  
>>  # filter out offsets and file names from qemu-img map; good for both
>> -- 
>> 2.13.6
>>




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 08/17] iotests: Skip 103 for refcount_bits=1

2017-11-30 Thread Max Reitz
On 2017-11-30 04:18, Fam Zheng wrote:
> On Thu, 11/23 03:08, Max Reitz wrote:
>> Signed-off-by: Max Reitz 
>> ---
>>  tests/qemu-iotests/103 | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/tests/qemu-iotests/103 b/tests/qemu-iotests/103
>> index ecbd8ebd71..d0cfab8844 100755
>> --- a/tests/qemu-iotests/103
>> +++ b/tests/qemu-iotests/103
>> @@ -40,6 +40,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
>>  _supported_fmt qcow2
>>  _supported_proto file nfs
>>  _supported_os Linux
>> +# Internal snapshots are (currently) impossible with refcount_bits=1
>> +_unsupported_imgopts 'refcount_bits=1[^0-9]'
> 
> What is the "[^0-9]" part for?

It's so you can specify refcount_bits=16, but not
refcount_bits=1,compat=0.10 or just refcount_bits=1.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [RFC PATCH v2 1/1] s390x/css: unrestrict cssids

2017-11-30 Thread Cornelia Huck
On Thu, 30 Nov 2017 13:32:12 +0100
Halil Pasic  wrote:

(...)

Before I spend way too much time on this:

Is the proposed machine-property interface usable from a libvirt POV?
IOW, can we go with this now and fix the ugliness later (probably via a
generic overhaul of the interface)?



[Qemu-devel] [PATCH 0/3] macro do/while (0) cleanup

2017-11-30 Thread Eric Blake
Noticed this by chance in the tests/ directory, so I broadened
it to a grep of the entire code base.  I suspect many of the
bad macros were the victims of copy-and-paste from some other
bad location (particularly given how many bit-rotten debug
print macros were involved).

https://wiki.qemu.org/BiteSizedTasks#Bitrot_prevention is still
left for someone else, for another day.

Eric Blake (3):
  net: Drop unusual use of do { } while (0);
  mips: Tweak location of ';' in macros
  maint: Fix macros with broken 'do/while(0);' usage

 tests/acpi-utils.h |  8 
 ui/sdl_zoom_template.h |  8 
 audio/paaudio.c|  4 ++--
 hw/adc/stm32f2xx_adc.c |  2 +-
 hw/block/m25p80.c  |  2 +-
 hw/char/cadence_uart.c |  2 +-
 hw/char/stm32f2xx_usart.c  |  2 +-
 hw/display/cg3.c   |  2 +-
 hw/display/dpcd.c  |  2 +-
 hw/display/xlnx_dp.c   |  2 +-
 hw/dma/pl330.c |  2 +-
 hw/dma/xlnx-zynq-devcfg.c  |  2 +-
 hw/dma/xlnx_dpdma.c|  2 +-
 hw/i2c/i2c-ddc.c   |  2 +-
 hw/misc/auxbus.c   |  2 +-
 hw/misc/macio/mac_dbdma.c  |  4 ++--
 hw/misc/mmio_interface.c   |  2 +-
 hw/misc/stm32f2xx_syscfg.c |  2 +-
 hw/misc/zynq_slcr.c|  2 +-
 hw/net/cadence_gem.c   |  2 +-
 hw/net/pcnet.c | 20 ++--
 hw/ssi/mss-spi.c   |  2 +-
 hw/ssi/stm32f2xx_spi.c |  2 +-
 hw/ssi/xilinx_spi.c|  2 +-
 hw/ssi/xilinx_spips.c  |  2 +-
 hw/timer/a9gtimer.c|  2 +-
 hw/timer/cadence_ttc.c |  2 +-
 hw/timer/mss-timer.c   |  2 +-
 hw/timer/stm32f2xx_timer.c |  2 +-
 hw/tpm/tpm_passthrough.c   |  2 +-
 hw/tpm/tpm_tis.c   |  2 +-
 migration/rdma.c   |  2 +-
 target/arm/translate-a64.c |  2 +-
 target/mips/msa_helper.c   | 34 ++
 target/s390x/kvm.c |  2 +-
 tests/tcg/test-mmap.c  |  2 +-
 36 files changed, 70 insertions(+), 68 deletions(-)

-- 
2.14.3




[Qemu-devel] [PATCH 2/3] mips: Tweak location of ';' in macros

2017-11-30 Thread Eric Blake
It is more typical to provide the ';' by the caller of a macro
than to embed it in the macro itself; this is because syntax
highlight engines can get confused if a macro is called without
a semicolon before the closing '}'.

Signed-off-by: Eric Blake 
---
 target/mips/msa_helper.c | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
index f167a42655..8fb7a369ca 100644
--- a/target/mips/msa_helper.c
+++ b/target/mips/msa_helper.c
@@ -682,13 +682,13 @@ static inline int64_t msa_mod_u_df(uint32_t df, int64_t 
arg1, int64_t arg2)
 do {\
 e = SIGNED_EVEN(a, df); \
 o = SIGNED_ODD(a, df);  \
-} while (0);
+} while (0)

 #define UNSIGNED_EXTRACT(e, o, a, df)   \
 do {\
 e = UNSIGNED_EVEN(a, df);   \
 o = UNSIGNED_ODD(a, df);\
-} while (0);
+} while (0)

 static inline int64_t msa_dotp_s_df(uint32_t df, int64_t arg1, int64_t arg2)
 {
@@ -1120,9 +1120,11 @@ void helper_msa_splat_df(CPUMIPSState *env, uint32_t df, 
uint32_t wd,
 #define MSA_LOOP_COND_D MSA_LOOP_COND(DF_DOUBLE)

 #define MSA_LOOP(DF) \
+do { \
 for (i = 0; i < (MSA_LOOP_COND_ ## DF) ; i++) { \
-MSA_DO_ ## DF \
-}
+MSA_DO_ ## DF; \
+} \
+} while (0)

 #define MSA_FN_DF(FUNC) \
 void helper_msa_##FUNC(CPUMIPSState *env, uint32_t df, uint32_t wd, \
@@ -1135,17 +1137,17 @@ void helper_msa_##FUNC(CPUMIPSState *env, uint32_t df, 
uint32_t wd, \
 uint32_t i; \
 switch (df) {   \
 case DF_BYTE:   \
-MSA_LOOP_B  \
+MSA_LOOP_B; \
 break;  \
 case DF_HALF:   \
-MSA_LOOP_H  \
+MSA_LOOP_H; \
 break;  \
 case DF_WORD:   \
-MSA_LOOP_W  \
+MSA_LOOP_W; \
 break;  \
 case DF_DOUBLE: \
-MSA_LOOP_D  \
-   break;   \
+MSA_LOOP_D; \
+break;  \
 default:\
 assert(0);  \
 }   \
@@ -1168,7 +1170,7 @@ void helper_msa_##FUNC(CPUMIPSState *env, uint32_t df, 
uint32_t wd, \
 do {\
 R##DF(pwx, i) = pwt->DF[2*i];   \
 L##DF(pwx, i) = pws->DF[2*i];   \
-} while (0);
+} while (0)
 MSA_FN_DF(pckev_df)
 #undef MSA_DO

@@ -1176,7 +1178,7 @@ MSA_FN_DF(pckev_df)
 do {\
 R##DF(pwx, i) = pwt->DF[2*i+1]; \
 L##DF(pwx, i) = pws->DF[2*i+1]; \
-} while (0);
+} while (0)
 MSA_FN_DF(pckod_df)
 #undef MSA_DO

@@ -1184,7 +1186,7 @@ MSA_FN_DF(pckod_df)
 do {\
 pwx->DF[2*i]   = L##DF(pwt, i); \
 pwx->DF[2*i+1] = L##DF(pws, i); \
-} while (0);
+} while (0)
 MSA_FN_DF(ilvl_df)
 #undef MSA_DO

@@ -1192,7 +1194,7 @@ MSA_FN_DF(ilvl_df)
 do {\
 pwx->DF[2*i]   = R##DF(pwt, i); \
 pwx->DF[2*i+1] = R##DF(pws, i); \
-} while (0);
+} while (0)
 MSA_FN_DF(ilvr_df)
 #undef MSA_DO

@@ -1200,7 +1202,7 @@ MSA_FN_DF(ilvr_df)
 do {\
 pwx->DF[2*i]   = pwt->DF[2*i];  \
 pwx->DF[2*i+1] = pws->DF[2*i];  \
-} while (0);
+} while (0)
 MSA_FN_DF(ilvev_df)
 #undef MSA_DO

@@ -1208,7 +1210,7 @@ MSA_FN_DF(ilvev_df)
 do {\
 pwx->DF[2*i]   = pwt->DF[2*i+1];\
 pwx->DF[2*i+1] = pws->DF[2*i+1];\
-} while (0);
+} while (0)
 MSA_FN_DF(ilvod_df)
 #undef MSA_DO
 #undef MSA_LOOP_COND
@@ -1222,7 +1224,7 @@ MSA_FN_DF(ilvod_df)
 uint32_t k = (pwd->DF[i] & 0x3f) % (2 * n); \
 pwx->DF[i] =\
 (pwd->DF[i] & 0xc0

[Qemu-devel] [PATCH 3/3] maint: Fix macros with broken 'do/while(0); ' usage

2017-11-30 Thread Eric Blake
The point of writing a macro embedded in a 'do { ... } while (0)'
loop is so that the macro can be used as a drop-in statement with
the caller supplying the trailing ';'.  Although our coding style
frowns on brace-less 'if':
  if (cond)
statement;
  else
something else;
the use of do/while (0) in a macro is absolutely essential for the
purpose of avoiding a syntax error on the 'else' - but it only works
if there is no trailing ';' in the macro (as the ';' in the code
calling the macro would then be a second statement and cause the
'else' to not pair to the 'if').

Many of the places touched in this code are examples of the ugly
bit-rotting debug print statements; cleaning those up is left as
a bite-sized task for another day.

Found mechanically via: $ git grep -B1 'while (0);' | grep -A1 

Signed-off-by: Eric Blake 
---
 tests/acpi-utils.h | 8 
 ui/sdl_zoom_template.h | 8 
 audio/paaudio.c| 4 ++--
 hw/adc/stm32f2xx_adc.c | 2 +-
 hw/block/m25p80.c  | 2 +-
 hw/char/cadence_uart.c | 2 +-
 hw/char/stm32f2xx_usart.c  | 2 +-
 hw/display/cg3.c   | 2 +-
 hw/display/dpcd.c  | 2 +-
 hw/display/xlnx_dp.c   | 2 +-
 hw/dma/pl330.c | 2 +-
 hw/dma/xlnx-zynq-devcfg.c  | 2 +-
 hw/dma/xlnx_dpdma.c| 2 +-
 hw/i2c/i2c-ddc.c   | 2 +-
 hw/misc/auxbus.c   | 2 +-
 hw/misc/macio/mac_dbdma.c  | 4 ++--
 hw/misc/mmio_interface.c   | 2 +-
 hw/misc/stm32f2xx_syscfg.c | 2 +-
 hw/misc/zynq_slcr.c| 2 +-
 hw/net/cadence_gem.c   | 2 +-
 hw/ssi/mss-spi.c   | 2 +-
 hw/ssi/stm32f2xx_spi.c | 2 +-
 hw/ssi/xilinx_spi.c| 2 +-
 hw/ssi/xilinx_spips.c  | 2 +-
 hw/timer/a9gtimer.c| 2 +-
 hw/timer/cadence_ttc.c | 2 +-
 hw/timer/mss-timer.c   | 2 +-
 hw/timer/stm32f2xx_timer.c | 2 +-
 hw/tpm/tpm_passthrough.c   | 2 +-
 hw/tpm/tpm_tis.c   | 2 +-
 migration/rdma.c   | 2 +-
 target/arm/translate-a64.c | 2 +-
 target/s390x/kvm.c | 2 +-
 tests/tcg/test-mmap.c  | 2 +-
 34 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/tests/acpi-utils.h b/tests/acpi-utils.h
index d5ca5b6238..ac52abd0dd 100644
--- a/tests/acpi-utils.h
+++ b/tests/acpi-utils.h
@@ -32,7 +32,7 @@ typedef struct {
 do {\
 memread(addr, &field, sizeof(field));   \
 addr += sizeof(field);  \
-} while (0);
+} while (0)

 #define ACPI_READ_ARRAY_PTR(arr, length, addr)  \
 do {\
@@ -40,7 +40,7 @@ typedef struct {
 for (idx = 0; idx < length; ++idx) {\
 ACPI_READ_FIELD(arr[idx], addr);\
 }   \
-} while (0);
+} while (0)

 #define ACPI_READ_ARRAY(arr, addr)   \
 ACPI_READ_ARRAY_PTR(arr, sizeof(arr) / sizeof(arr[0]), addr)
@@ -56,7 +56,7 @@ typedef struct {
 ACPI_READ_FIELD((table)->oem_revision, addr);\
 ACPI_READ_ARRAY((table)->asl_compiler_id, addr); \
 ACPI_READ_FIELD((table)->asl_compiler_revision, addr);   \
-} while (0);
+} while (0)

 #define ACPI_ASSERT_CMP(actual, expected) do { \
 char ACPI_ASSERT_CMP_str[5] = {}; \
@@ -77,7 +77,7 @@ typedef struct {
 ACPI_READ_FIELD((field).bit_offset, addr);   \
 ACPI_READ_FIELD((field).access_width, addr); \
 ACPI_READ_FIELD((field).address, addr);  \
-} while (0);
+} while (0)


 uint8_t acpi_calc_checksum(const uint8_t *data, int len);
diff --git a/ui/sdl_zoom_template.h b/ui/sdl_zoom_template.h
index 3bb508b51e..6a424adfb4 100644
--- a/ui/sdl_zoom_template.h
+++ b/ui/sdl_zoom_template.h
@@ -34,22 +34,22 @@
 #define setRed(r, pcolor) do { \
 *pcolor = ((*pcolor) & (~(dpf->Rmask))) + \
   (((r) & (dpf->Rmask >> dpf->Rshift)) << dpf->Rshift); \
-} while (0);
+} while (0)

 #define setGreen(g, pcolor) do { \
 *pcolor = ((*pcolor) & (~(dpf->Gmask))) + \
   (((g) & (dpf->Gmask >> dpf->Gshift)) << dpf->Gshift); \
-} while (0);
+} while (0)

 #define setBlue(b, pcolor) do { \
 *pcolor = ((*pcolor) & (~(dpf->Bmask))) + \
   (((b) & (dpf->Bmask >> dpf->Bshift)) << dpf->Bshift); \
-} while (0);
+} while (0)

 #define setAlpha(a, pcolor) do { \
 *pcolor = ((*pcolor) & (~(dpf->Amask))) + \
   (((a) & (dpf->Amask >> dpf->Ashift)) << dpf->Ashift); \
-} while (0);
+} while (0)

 static void glue(sdl_zoom_rgb, BPP)(SDL_Surface *src, SDL_Surface *dst, int 
smooth,
SDL_Rect *dst_rect)
diff --git a/audio/paaudio.c b/audio/paaudio.c
index 65beb6f010..2a35e6f82c 100644
--- a/audio/paaudio.c
+++ b/audio/paaudio.c
@@ -89,7 +89,7 @@ static inline int PA_STREAM_IS_GOOD(pa_stream_state_t x)
 }   \
 goto label; \
 }  

[Qemu-devel] [PATCH 1/3] net: Drop unusual use of do { } while (0);

2017-11-30 Thread Eric Blake
For a couple of macros in pcnet.c, we have to provide a new scope
to avoid compiler warnings about declarations in the middle of a
switch statement that aren't in a sub-scope.  But use of
'do { ... } while (0);' merely to provide that new scope is arcane
overkill, compared to just using '{ ... }'.

Signed-off-by: Eric Blake 
---
 hw/net/pcnet.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/hw/net/pcnet.c b/hw/net/pcnet.c
index 654455355f..641bf2fd88 100644
--- a/hw/net/pcnet.c
+++ b/hw/net/pcnet.c
@@ -455,32 +455,32 @@ static inline void pcnet_rmd_store(PCNetState *s, struct 
pcnet_RMD *rmd,
 #define CHECK_RMD(ADDR,RES) do {\
 switch (BCR_SWSTYLE(s)) {   \
 case 0x00:  \
-do {\
+{   \
 uint16_t rda[4];\
 s->phys_mem_read(s->dma_opaque, (ADDR), \
 (void *)&rda[0], sizeof(rda), 0); \
 (RES) |= (rda[2] & 0xf000)!=0xf000; \
 (RES) |= (rda[3] & 0xf000)!=0x; \
-} while (0);\
+}   \
 break;  \
 case 0x01:  \
 case 0x02:  \
-do {\
+{   \
 uint32_t rda[4];\
 s->phys_mem_read(s->dma_opaque, (ADDR), \
 (void *)&rda[0], sizeof(rda), 0); \
 (RES) |= (rda[1] & 0xf000L)!=0xf000L; \
 (RES) |= (rda[2] & 0xf000L)!=0xL; \
-} while (0);\
+}   \
 break;  \
 case 0x03:  \
-do {\
+{   \
 uint32_t rda[4];\
 s->phys_mem_read(s->dma_opaque, (ADDR), \
 (void *)&rda[0], sizeof(rda), 0); \
 (RES) |= (rda[0] & 0xf000L)!=0xL; \
 (RES) |= (rda[1] & 0xf000L)!=0xf000L; \
-} while (0);\
+}   \
 break;  \
 }   \
 } while (0)
@@ -488,22 +488,22 @@ static inline void pcnet_rmd_store(PCNetState *s, struct 
pcnet_RMD *rmd,
 #define CHECK_TMD(ADDR,RES) do {\
 switch (BCR_SWSTYLE(s)) {   \
 case 0x00:  \
-do {\
+{   \
 uint16_t xda[4];\
 s->phys_mem_read(s->dma_opaque, (ADDR), \
 (void *)&xda[0], sizeof(xda), 0); \
 (RES) |= (xda[2] & 0xf000)!=0xf000; \
-} while (0);\
+}   \
 break;  \
 case 0x01:  \
 case 0x02:  \
 case 0x03:  \
-do {\
+{   \
 uint32_t xda[4];\
 s->phys_mem_read(s->dma_opaque, (ADDR), \
 (void *)&xda[0], sizeof(xda), 0); \
 (RES) |= (xda[1] & 0xf000L)!=0xf000L; \
-} while (0);\
+}   \
 break;  \
 }   \
 } while (0)
-- 
2.14.3




  1   2   >