Re: [Qemu-devel] [PATCH v4 03/20] file-posix: Switch to .bdrv_co_block_status()
12.10.2017 21:58, Eric Blake wrote: We are gradually moving away from sector-based interfaces, towards byte-based. Update the file protocol driver accordingly. In want_zero mode, we continue to report fine-grained hole information (the caller wants as much mapping detail as possible); but when not in that mode, the caller prefers larger *pnum and merely cares about what offsets are allocated at this layer, rather than where the holes live. Since holes still read as zeroes at this layer (rather than deferring to a backing layer), we can take the shortcut of skipping lseek(), and merely state that all bytes are allocated. Signed-off-by: Eric Blake Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [Qemu-devel] About virtio device hotplug in Q35! 【外域邮件.谨慎查阅】
Hi, After 3 months of work and investigation, and tedious mail discussions with Nvidia, I think some progress have been made, in terms of the GPUDirect(p2p) in virtual environment. The only remaining issue then, is the low bidirectional bandwidth between two sibling GPUs under the same PCIe switch. We expanded the tests to run on even more GPU cards, so the results seemed to be explicit now. P40 is OK, and its hardware topology on host is: \-[:00]-+-00.0 Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2 +-01.0-[03]00.0 LSI Logic / Symbios Logic MegaRAID SAS-3 3008 [Fury] +-02.0-[04]00.0 NVIDIA Corporation GP102GL [Tesla P40] +-03.0-[02]00.0 NVIDIA Corporation GP102GL [Tesla P40] M60, not OK, low bandwidth: \-[:00]-+-00.0 Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2 +-01.0-[06]00.0 LSI Logic / Symbios Logic MegaRAID SAS-3 3008 [Fury] +-02.0-[07-0a]00.0-[08-0a]--+-08.0-[09]00.0 NVIDIA Corporation GM204GL [Tesla M60] | \-10.0-[0a]00.0 NVIDIA Corporation GM204GL [Tesla M60] V100, not OK, low bandwidth: \-[:00]-+-00.0 Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2 +-01.0-[01]--+-00.0 Mellanox Technologies MT27710 Family [ConnectX-4 Lx] |\-00.1 Mellanox Technologies MT27710 Family [ConnectX-4 Lx] +-02.0-[02-05]00.0-[03-05]--+-08.0-[04]00.0 NVIDIA Corporation GV100 [Tesla V100 PCIe] | \-10.0-[05]00.0 NVIDIA Corporation GV100 [Tesla V100 PCIe] So what might be the actual effect of the PLX switch hardware for GPU data flow? Although it is not visible in guest OS. Nvidia tech-support guys are not familiar with virtualization. They asked us to consult the community first.
Re: [Qemu-devel] device_del fail
hello, i try again on arch Linux , after i add gpu, display is not light,following message from "info pci": Bus 0, device 5, function 0: VGA controller: PCI device 10de:1401 IRQ 11. BAR0: 32 bit memory at 0x [0x00fe]. BAR1: 64 bit prefetchable memory at 0x [0x0ffe]. BAR3: 64 bit prefetchable memory at 0x [0x01fe]. BAR5: I/O at 0x [0x007e]. BAR6: 32 bit memory at 0x [0x0001fffe]. id "gpu1" Bus 0, device 6, function 0: Audio controller: PCI device 10de:0fba IRQ 10. BAR0: 32 bit memory at 0x3e82 [0x3e823fff]. id "gpu2" Then i connect that guest with ssh: Last login: Thu Nov 30 22:29:03 2017(it's normal) At 2017-11-30 10:32:26, "Fam Zheng" wrote: >Let's keep qemu-devel@nongnu.org on the list so other people can chime in. > >On Thu, 11/30 10:05, Lying wrote: > >> At 2017-11-29 14:29:17, "Fam Zheng" wrote: >> >On Wed, 11/29 13:43, Lying wrote: >> >> Hello everybody, I encounter a error on my vm guest >> >> I remove my device on my guest is failed, following is detail message. >> >> I didn't add my audio device before running my guest, So i add audio with >> >> "device_add" after running >> >> Then i can use it, so i remove it with "device_del", but when i do it >> >> again, i >> >> can't use my audio device and cannot remove it. >> > >> > >> >I haven't tried unplugging an audio device myself but in general PCI hot >> >unplug >> >needs guest cooperation so some infomation about guest is also helpful. >> >Which >> >guest OS is this? >> > >> >What are the exact commands that you used and what are the messages from >> >host/guest when you device_del? >> > >> >> >Fam >> >> >> hello, thanks for your reply, and i'm sorry that i not notice at yesterday . >> following is my information: >> guest : window-7 >> host : Ubuntu-16.04 >> qemu : 2.11.0-rc0 >> libvirt : 3.9 >>seabios : 1.10.74 (from qemu) >> exact commands: >> add:"device_add >> driver=vfio-pci,host=02:00.0,x-vga=on,multifunction=yes,id=gpu1" >> remove:"device_del gpu1" >> and not messages when i running device_del. But i can see it still exist >> while running commands "info pci" > >Probably the guest didn't respond to the hot-unplug event. In your original >question you said it is an audio device but your command suggests it is a GPU >device. I'm not sure Windows driver can handle this. Have you tried different >guests, such as Linux? > >Cc Alex who may know more. > >Fam
Re: [Qemu-devel] [PATCH v4 04/20] gluster: Switch to .bdrv_co_block_status()
12.10.2017 21:59, Eric Blake wrote: We are gradually moving away from sector-based interfaces, towards byte-based. Update the gluster driver accordingly. In want_zero mode, we continue to report fine-grained hole information (the caller wants as much mapping detail as possible); but when not in that mode, the caller prefers larger *pnum and merely cares about what offsets are allocated at this layer, rather than where the holes live. Since holes still read as zeroes at this layer (rather than deferring to a backing layer), we can take the shortcut of skipping find_allocation(), and merely state that all bytes are allocated. Signed-off-by: Eric Blake Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH v4 08/20] null: Switch to .bdrv_co_block_status()
12.10.2017 21:59, Eric Blake wrote: We are gradually moving away from sector-based interfaces, towards byte-based. Update the null driver accordingly. Signed-off-by: Eric Blake --- v4: rebase to interface tweak v3: no change v2: rebase to mapping parameter --- block/null.c | 23 --- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/block/null.c b/block/null.c index dd9c13f9ba..20c7eb0ee2 100644 --- a/block/null.c +++ b/block/null.c @@ -223,22 +223,23 @@ static int null_reopen_prepare(BDRVReopenState *reopen_state, return 0; } -static int64_t coroutine_fn null_co_get_block_status(BlockDriverState *bs, - int64_t sector_num, - int nb_sectors, int *pnum, - BlockDriverState **file) +static int coroutine_fn null_co_block_status(BlockDriverState *bs, + bool want_zero, int64_t offset, + int64_t bytes, int64_t *pnum, + int64_t *map, + BlockDriverState **file) { BDRVNullState *s = bs->opaque; -off_t start = sector_num * BDRV_SECTOR_SIZE; +int64_t ret = BDRV_BLOCK_OFFSET_VALID; int ret = with that: Reviewed-by: Vladimir Sementsov-Ogievskiy -*pnum = nb_sectors; +*pnum = bytes; +*map = offset; *file = bs; if (s->read_zeroes) { -return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO; -} else { -return BDRV_BLOCK_OFFSET_VALID | start; +ret |= BDRV_BLOCK_ZERO; } +return ret; } static void null_refresh_filename(BlockDriverState *bs, QDict *opts) @@ -270,7 +271,7 @@ static BlockDriver bdrv_null_co = { .bdrv_co_flush_to_disk = null_co_flush, .bdrv_reopen_prepare= null_reopen_prepare, -.bdrv_co_get_block_status = null_co_get_block_status, +.bdrv_co_block_status = null_co_block_status, .bdrv_refresh_filename = null_refresh_filename, }; @@ -290,7 +291,7 @@ static BlockDriver bdrv_null_aio = { .bdrv_aio_flush = null_aio_flush, .bdrv_reopen_prepare= null_reopen_prepare, -.bdrv_co_get_block_status = null_co_get_block_status, +.bdrv_co_block_status = null_co_block_status, .bdrv_refresh_filename = null_refresh_filename, }; -- Best regards, Vladimir
[Qemu-devel] [PATCH for-2.12 1/7] tests/boot-serial-test: Make sure that we check the timeout regularly
If the guest continuesly writes characters to the UART, we never leave the inner while loop and thus never check whether we've reached the timeout value. So if we fail to find the expected string in the UART output, the test just hangs and never finishs. Use a counter to regularly break out of the while loop to check the timeout. Signed-off-by: Thomas Huth --- tests/boot-serial-test.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c index c935d69..fa4183d 100644 --- a/tests/boot-serial-test.c +++ b/tests/boot-serial-test.c @@ -43,12 +43,13 @@ static testdef_t tests[] = { static void check_guest_output(const testdef_t *test, int fd) { bool output_ok = false; -int i, nbr, pos = 0; +int i, nbr, pos = 0, ccnt; char ch; /* Poll serial output... Wait at most 60 seconds */ for (i = 0; i < 6000; ++i) { -while ((nbr = read(fd, &ch, 1)) == 1) { +ccnt = 0; +while ((nbr = read(fd, &ch, 1)) == 1 && ccnt++ < 512) { if (ch == test->expect[pos]) { pos += 1; if (test->expect[pos] == '\0') { -- 1.8.3.1
[Qemu-devel] [PATCH for-2.12 6/7] tests/boot-serial-test: Add a test for the moxiesim machine
Now that moxiesim supports the -bios parameter, we can check this machine in the boot-serial tester, too, by supplying a mini bios that only writes 'T' characters to the UART. Signed-off-by: Thomas Huth --- tests/Makefile.include | 2 ++ tests/boot-serial-test.c | 8 2 files changed, 10 insertions(+) diff --git a/tests/Makefile.include b/tests/Makefile.include index cd908da..00c1e11 100644 --- a/tests/Makefile.include +++ b/tests/Makefile.include @@ -307,6 +307,8 @@ check-qtest-mips64-y = tests/endianness-test$(EXESUF) check-qtest-mips64el-y = tests/endianness-test$(EXESUF) +check-qtest-moxie-y = tests/boot-serial-test$(EXESUF) + check-qtest-ppc-y = tests/endianness-test$(EXESUF) check-qtest-ppc-y += tests/boot-order-test$(EXESUF) check-qtest-ppc-y += tests/prom-env-test$(EXESUF) diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c index a39273a..1deddb8 100644 --- a/tests/boot-serial-test.c +++ b/tests/boot-serial-test.c @@ -40,6 +40,13 @@ static const uint8_t kernel_plml605[] = { 0xfc, 0xff, 0x00, 0xb8 /* bri -4 loop */ }; +static const uint8_t bios_moxiesim[] = { +0x20, 0x10, 0x00, 0x00, 0x03, 0xf8, /* ldi.s r1,0x3f8 */ +0x1b, 0x20, 0x00, 0x00, 0x00, 0x54, /* ldi.b r2,'T' */ +0x1e, 0x12, /* st.b r1,r2 */ +0x1a, 0x00, 0x00, 0x00, 0x10, 0x00 /* jmpa 0x1000 */ +}; + typedef struct testdef { const char *arch; /* Target architecture */ const char *machine;/* Name of the machine */ @@ -70,6 +77,7 @@ static testdef_t tests[] = { sizeof(kernel_pls3adsp1800), kernel_pls3adsp1800 }, { "microblazeel", "petalogix-ml605", "", "TT", sizeof(kernel_plml605), kernel_plml605 }, +{ "moxie", "moxiesim", "", "TT", sizeof(bios_moxiesim), 0, bios_moxiesim }, { NULL } }; -- 1.8.3.1
[Qemu-devel] [PATCH for-2.12 5/7] hw/moxie/moxiesim: Add support for loading a BIOS on moxiesim
The moxiesim machine already defines a memory region for a firmware, but does not provide the possibility to load an image via "-bios" yet. This will be needed for the boot-serial tester, so let's add support for "-bios" here now. Signed-off-by: Thomas Huth --- hw/moxie/moxiesim.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/hw/moxie/moxiesim.c b/hw/moxie/moxiesim.c index 3ba5848..00f56df 100644 --- a/hw/moxie/moxiesim.c +++ b/hw/moxie/moxiesim.c @@ -25,6 +25,7 @@ * THE SOFTWARE. */ #include "qemu/osdep.h" +#include "qemu/error-report.h" #include "qapi/error.h" #include "qemu-common.h" #include "cpu.h" @@ -41,6 +42,8 @@ #include "elf.h" #define PHYS_MEM_BASE 0x8000 +#define FIRMWARE_BASE 0x1000 +#define FIRMWARE_SIZE (128 * 0x1000) typedef struct { uint64_t ram_size; @@ -123,8 +126,8 @@ static void moxiesim_init(MachineState *machine) memory_region_init_ram(ram, NULL, "moxiesim.ram", ram_size, &error_fatal); memory_region_add_subregion(address_space_mem, ram_base, ram); -memory_region_init_ram(rom, NULL, "moxie.rom", 128 * 0x1000, &error_fatal); -memory_region_add_subregion(get_system_memory(), 0x1000, rom); +memory_region_init_ram(rom, NULL, "moxie.rom", FIRMWARE_SIZE, &error_fatal); +memory_region_add_subregion(get_system_memory(), FIRMWARE_BASE, rom); if (kernel_filename) { loader_params.ram_size = ram_size; @@ -133,6 +136,11 @@ static void moxiesim_init(MachineState *machine) loader_params.initrd_filename = initrd_filename; load_kernel(cpu, &loader_params); } +if (bios_name) { +if (load_image_targphys(bios_name, FIRMWARE_BASE, FIRMWARE_SIZE) < 0) { +error_report("Failed to load firmware '%s'", bios_name); +} +} /* A single 16450 sits at offset 0x3f8. */ if (serial_hds[0]) { -- 1.8.3.1
[Qemu-devel] [PATCH for-2.12 3/7] tests/boot-serial-test: Add support for the mcf5208evb board
We can output a character quite easily here with some few lines of assembly that we provide as a mini-kernel for this board. Signed-off-by: Thomas Huth --- tests/boot-serial-test.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c index d997269..dd3828c 100644 --- a/tests/boot-serial-test.c +++ b/tests/boot-serial-test.c @@ -16,6 +16,14 @@ #include "qemu/osdep.h" #include "libqtest.h" +static const uint8_t kernel_mcf5208[] = { +0x41, 0xf9, 0xfc, 0x06, 0x00, 0x00, /* lea 0xfc06,%a0 */ +0x10, 0x3c, 0x00, 0x54, /* move.b #'T',%d0 */ +0x11, 0x7c, 0x00, 0x04, 0x00, 0x08, /* move.b #4,8(%a0) Enable TX */ +0x11, 0x40, 0x00, 0x0c, /* move.b %d0,12(%a0) Print 'T' */ +0x60, 0xfa /* bra.s loop */ +}; + typedef struct testdef { const char *arch; /* Target architecture */ const char *machine;/* Name of the machine */ @@ -41,6 +49,8 @@ static testdef_t tests[] = { { "x86_64", "q35", "-device sga", "SGABIOS" }, { "s390x", "s390-ccw-virtio", "-nodefaults -device sclpconsole,chardev=serial0", "virtio device" }, +{ "m68k", "mcf5208evb", "", "TT", sizeof(kernel_mcf5208), kernel_mcf5208 }, + { NULL } }; -- 1.8.3.1
[Qemu-devel] [PATCH for-2.12 0/7] Test more machines and TCG CPUs automatically
During the 2.11 development cycle, a bunch of ARM boards were broken completely, and nobody noticed it for a couple of weeks. Also a lot of TCG CPUs do not get any test coverage at all yet (especially since the old tests in the tests/tcg/ are still pretty much dead today). So we really should improve our testing in this area. This patch series now tries to improve the test coverage at least a little bit: The second patch adds the possibility to the boot-serial test to also load home-brew mini-kernels and firmware blobs, and the following patches then add some tests for various boards / CPUs with some few lines of self-made assembly. Thomas Huth (7): tests/boot-serial-test: Make sure that we check the timeout regularly tests/boot-serial-test: Add code to allow to specify our own kernel or bios tests/boot-serial-test: Add support for the mcf5208evb board tests/boot-serial-test: Add tests for microblaze boards hw/moxie/moxiesim: Add support for loading a BIOS on moxiesim tests/boot-serial-test: Add a test for the moxiesim machine tests/boot-serial-test: Add support for the raspi2 machine hw/moxie/moxiesim.c | 12 +- tests/Makefile.include | 7 tests/boot-serial-test.c | 106 --- 3 files changed, 109 insertions(+), 16 deletions(-) -- 1.8.3.1
[Qemu-devel] [PATCH for-2.12 7/7] tests/boot-serial-test: Add support for the raspi2 machine
The raspi2 machine supports loading firmware images, so we can easily load a small test sequence as raw binary blob here to test the UART. Signed-off-by: Thomas Huth --- tests/Makefile.include | 1 + tests/boot-serial-test.c | 9 + 2 files changed, 10 insertions(+) diff --git a/tests/Makefile.include b/tests/Makefile.include index 00c1e11..09025dc 100644 --- a/tests/Makefile.include +++ b/tests/Makefile.include @@ -361,6 +361,7 @@ check-qtest-arm-y += tests/virtio-blk-test$(EXESUF) gcov-files-arm-y += arm-softmmu/hw/block/virtio-blk.c check-qtest-arm-y += tests/test-arm-mptimer$(EXESUF) gcov-files-arm-y += hw/timer/arm_mptimer.c +check-qtest-arm-y += tests/boot-serial-test$(EXESUF) check-qtest-aarch64-y = tests/numa-test$(EXESUF) diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c index 1deddb8..663b78b 100644 --- a/tests/boot-serial-test.c +++ b/tests/boot-serial-test.c @@ -47,6 +47,14 @@ static const uint8_t bios_moxiesim[] = { 0x1a, 0x00, 0x00, 0x00, 0x10, 0x00 /* jmpa 0x1000 */ }; +static const uint8_t bios_raspi2[] = { +0x08, 0x30, 0x9f, 0xe5, /* ldr r3,[pc,#8]Get base */ +0x54, 0x20, 0xa0, 0xe3, /* mov r2,#'T' */ +0x00, 0x20, 0xc3, 0xe5, /* strbr2,[r3] */ +0xfb, 0xff, 0xff, 0xea, /* b loop */ +0x00, 0x10, 0x20, 0x3f, /* 0x3f201000 = UART0 base addr */ +}; + typedef struct testdef { const char *arch; /* Target architecture */ const char *machine;/* Name of the machine */ @@ -78,6 +86,7 @@ static testdef_t tests[] = { { "microblazeel", "petalogix-ml605", "", "TT", sizeof(kernel_plml605), kernel_plml605 }, { "moxie", "moxiesim", "", "TT", sizeof(bios_moxiesim), 0, bios_moxiesim }, +{ "arm", "raspi2", "", "TT", sizeof(bios_raspi2), 0, bios_raspi2 }, { NULL } }; -- 1.8.3.1
[Qemu-devel] [PATCH for-2.12 2/7] tests/boot-serial-test: Add code to allow to specify our own kernel or bios
QEMU only ships with some few firmware images, i.e. we can currently run the boot-serial test only on a very limited set of machines. But writing some characters to the default UART of a machine can often be done with some few lines of assembly, so we add the possibility to the boot-serial tester to use its own mini-kernels or mini-firmwares. We write such images then into a file that we can load with the "-kernel" or "-bios" parameter when we launch QEMU. Signed-off-by: Thomas Huth --- tests/Makefile.include | 2 ++ tests/boot-serial-test.c | 54 +--- 2 files changed, 44 insertions(+), 12 deletions(-) diff --git a/tests/Makefile.include b/tests/Makefile.include index c002352..41ca44c 100644 --- a/tests/Makefile.include +++ b/tests/Makefile.include @@ -297,6 +297,8 @@ gcov-files-x86_64-y = $(subst i386-softmmu/,x86_64-softmmu/,$(gcov-files-i386-y) check-qtest-alpha-y = tests/boot-serial-test$(EXESUF) +check-qtest-m68k-y = tests/boot-serial-test$(EXESUF) + check-qtest-mips-y = tests/endianness-test$(EXESUF) check-qtest-mips64-y = tests/endianness-test$(EXESUF) diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c index fa4183d..d997269 100644 --- a/tests/boot-serial-test.c +++ b/tests/boot-serial-test.c @@ -7,9 +7,10 @@ * or later. See the COPYING file in the top-level directory. * * This test is used to check that the serial output of the firmware - * (that we provide for some machines) contains an expected string. - * Thus we check that the firmware still boots at least to a certain - * point and so we know that the machine is not completely broken. + * (that we provide for some machines) or some small mini-kernels that + * we provide here contains an expected string. Thus we check that the + * firmware/kernel still boots at least to a certain point and so we + * know that the machine is not completely broken. */ #include "qemu/osdep.h" @@ -20,6 +21,9 @@ typedef struct testdef { const char *machine;/* Name of the machine */ const char *extra; /* Additional parameters */ const char *expect; /* Expected string in the serial output */ +size_t codesize;/* Size of the kernel or bios data */ +const uint8_t *kernel; /* Set in case we use our own mini kernel */ +const uint8_t *bios;/* Set in case we use our own mini bios */ } testdef_t; static testdef_t tests[] = { @@ -72,26 +76,52 @@ done: static void test_machine(const void *data) { const testdef_t *test = data; -char tmpname[] = "/tmp/qtest-boot-serial-XX"; -int fd; +char serialtmp[] = "/tmp/qtest-boot-serial-sXX"; +char codetmp[] = "/tmp/qtest-boot-serial-cXX"; +const char *codeparam = ""; +const uint8_t *code = NULL; +int ser_fd; -fd = mkstemp(tmpname); -g_assert(fd != -1); +ser_fd = mkstemp(serialtmp); +g_assert(ser_fd != -1); + +if (test->kernel) { +code = test->kernel; +codeparam = "-kernel"; +} else if (test->bios) { +code = test->bios; +codeparam = "-bios"; +} + +if (code) { +ssize_t wlen; +int code_fd; + +code_fd = mkstemp(codetmp); +g_assert(code_fd != -1); +wlen = write(code_fd, code, test->codesize); +g_assert(wlen == test->codesize); +close(code_fd); +} /* * Make sure that this test uses tcg if available: It is used as a * fast-enough smoketest for that. */ -global_qtest = qtest_startf("-M %s,accel=tcg:kvm " +global_qtest = qtest_startf("%s %s -M %s,accel=tcg:kvm " "-chardev file,id=serial0,path=%s " "-no-shutdown -serial chardev:serial0 %s", -test->machine, tmpname, test->extra); -unlink(tmpname); +codeparam, code ? codetmp : "", +test->machine, serialtmp, test->extra); +unlink(serialtmp); +if (code) { +unlink(codetmp); +} -check_guest_output(test, fd); +check_guest_output(test, ser_fd); qtest_quit(global_qtest); -close(fd); +close(ser_fd); } int main(int argc, char *argv[]) -- 1.8.3.1
[Qemu-devel] [PATCH for-2.12 4/7] tests/boot-serial-test: Add tests for microblaze boards
This adds two simple TCG + UART tests for the microblaze boards, one in big endian mode, and one in little endian mode. Signed-off-by: Thomas Huth --- tests/Makefile.include | 2 ++ tests/boot-serial-test.c | 20 2 files changed, 22 insertions(+) diff --git a/tests/Makefile.include b/tests/Makefile.include index 41ca44c..cd908da 100644 --- a/tests/Makefile.include +++ b/tests/Makefile.include @@ -299,6 +299,8 @@ check-qtest-alpha-y = tests/boot-serial-test$(EXESUF) check-qtest-m68k-y = tests/boot-serial-test$(EXESUF) +check-qtest-microblaze-y = tests/boot-serial-test$(EXESUF) + check-qtest-mips-y = tests/endianness-test$(EXESUF) check-qtest-mips64-y = tests/endianness-test$(EXESUF) diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c index dd3828c..a39273a 100644 --- a/tests/boot-serial-test.c +++ b/tests/boot-serial-test.c @@ -24,6 +24,22 @@ static const uint8_t kernel_mcf5208[] = { 0x60, 0xfa /* bra.s loop */ }; +static const uint8_t kernel_pls3adsp1800[] = { +0xb0, 0x00, 0x84, 0x00, /* imm 0x8400 */ +0x30, 0x60, 0x00, 0x04, /* addik r3,r0,4 */ +0x30, 0x80, 0x00, 0x54, /* addik r4,r0,'T' */ +0xf0, 0x83, 0x00, 0x00, /* sbi r4,r3,0 */ +0xb8, 0x00, 0xff, 0xfc /* bri -4 loop */ +}; + +static const uint8_t kernel_plml605[] = { +0xe0, 0x83, 0x00, 0xb0, /* imm 0x83e0 */ +0x00, 0x10, 0x60, 0x30, /* addik r3,r0,0x1000 */ +0x54, 0x00, 0x80, 0x30, /* addik r4,r0,'T' */ +0x00, 0x00, 0x83, 0xf0, /* sbi r4,r3,0 */ +0xfc, 0xff, 0x00, 0xb8 /* bri -4 loop */ +}; + typedef struct testdef { const char *arch; /* Target architecture */ const char *machine;/* Name of the machine */ @@ -50,6 +66,10 @@ static testdef_t tests[] = { { "s390x", "s390-ccw-virtio", "-nodefaults -device sclpconsole,chardev=serial0", "virtio device" }, { "m68k", "mcf5208evb", "", "TT", sizeof(kernel_mcf5208), kernel_mcf5208 }, +{ "microblaze", "petalogix-s3adsp1800", "", "TT", + sizeof(kernel_pls3adsp1800), kernel_pls3adsp1800 }, +{ "microblazeel", "petalogix-ml605", "", "TT", + sizeof(kernel_plml605), kernel_plml605 }, { NULL } }; -- 1.8.3.1
Re: [Qemu-devel] [PATCH v2 for-2.12 01/16] s390x/tcg: introduce and use s390_program_interrupt()
On 29.11.2017 21:26, David Hildenbrand wrote: > Allows to easily convert more callers of program_interrupt() and to > easily introduce new exceptions without forgetting about the cpu state > reset. > > Use s390_program_interrupt() in places where we already had the same > pattern. We will later get rid of program_interrupt(). > > RA != 0 checks are already done behind the scenes. > > Reviewed-by: Richard Henderson > Signed-off-by: David Hildenbrand > --- > target/s390x/cpu.h | 2 ++ > target/s390x/crypto_helper.c | 7 ++- > target/s390x/excp_helper.c | 5 + > target/s390x/interrupt.c | 13 + > target/s390x/mem_helper.c| 35 +++ > target/s390x/misc_helper.c | 3 +-- > 6 files changed, 30 insertions(+), 35 deletions(-) Reviewed-by: Thomas Huth
Re: [Qemu-devel] [PATCH v4 09/20] parallels: Switch to .bdrv_co_block_status()
12.10.2017 21:59, Eric Blake wrote: We are gradually moving away from sector-based interfaces, towards byte-based. Update the parallels driver accordingly. Note that the internal function block_status() is still sector-based, because it is still in use by other sector-based functions; but that's okay because request_alignment is 512 as a result of those functions. For now, no optimizations are added based on the mapping hint. Signed-off-by: Eric Blake --- v4: rebase to interface tweak, R-b dropped v3: no change v2: rebase to mapping parameter; it is ignored, so R-b kept --- block/parallels.c | 22 +++--- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/block/parallels.c b/block/parallels.c index 2b6c6e5709..4a086f29e9 100644 --- a/block/parallels.c +++ b/block/parallels.c @@ -278,23 +278,31 @@ static coroutine_fn int parallels_co_flush_to_os(BlockDriverState *bs) } -static int64_t coroutine_fn parallels_co_get_block_status(BlockDriverState *bs, -int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file) +static int coroutine_fn parallels_co_block_status(BlockDriverState *bs, + bool want_zero, + int64_t offset, + int64_t bytes, + int64_t *pnum, + int64_t *map, + BlockDriverState **file) { BDRVParallelsState *s = bs->opaque; -int64_t offset; +int count; +assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE)); qemu_co_mutex_lock(&s->lock); -offset = block_status(s, sector_num, nb_sectors, pnum); +offset = block_status(s, offset >> BDRV_SECTOR_BITS, + bytes >> BDRV_SECTOR_BITS, &count); qemu_co_mutex_unlock(&s->lock); if (offset < 0) { pnum=count*BDRV_SECTOR_SIZE and map=0 setting missed here. return 0; } +*pnum = count * BDRV_SECTOR_SIZE; +*map = offset; *file = bs->file->bs; -return (offset << BDRV_SECTOR_BITS) | -BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID; +return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID; } static coroutine_fn int parallels_co_writev(BlockDriverState *bs, @@ -781,7 +789,7 @@ static BlockDriver bdrv_parallels = { .bdrv_open= parallels_open, .bdrv_close = parallels_close, .bdrv_child_perm = bdrv_format_default_perms, -.bdrv_co_get_block_status = parallels_co_get_block_status, +.bdrv_co_block_status = parallels_co_block_status, .bdrv_has_zero_init = bdrv_has_zero_init_1, .bdrv_co_flush_to_os = parallels_co_flush_to_os, .bdrv_co_readv = parallels_co_readv, -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH v2 for-2.12 02/16] s390x/tcg: get rid of runtime_exception()
On 29.11.2017 21:26, David Hildenbrand wrote: > Let's use s390_program_interrupt() instead. > > Reviewed-by: Richard Henderson > Signed-off-by: David Hildenbrand > --- > target/s390x/fpu_helper.c | 2 +- > target/s390x/int_helper.c | 14 +++--- > target/s390x/internal.h| 2 -- > target/s390x/misc_helper.c | 16 > 4 files changed, 8 insertions(+), 26 deletions(-) Is it a disadvantage that runtime_exception() was declared as QEMU_NORETURN, and s390_program_interrupt() is not declared as QEMU_NORETURN? At a first glance, I guess it's ok, and in that case: Reviewed-by: Thomas Huth
Re: [Qemu-devel] [PATCH v1 2/2] intel-iommu: Extend address width to 48 bits
On Thu, Nov 30, 2017 at 01:22:38PM +0800, Liu, Yi L wrote: > On Tue, Nov 14, 2017 at 06:13:50PM -0500, prasad.singamse...@oracle.com wrote: > > From: Prasad Singamsetty > > > > The current implementation of Intel IOMMU code only supports 39 bits > > iova address width. This patch provides a new parameter (x-aw-bits) > > for intel-iommu to extend its address width to 48 bits but keeping the > > default the same (39 bits). The reason for not changing the default > > is to avoid potential compatibility problems with live migration of > > intel-iommu enabled QEMU guest. The only valid values for 'x-aw-bits' > > parameter are 39 and 48. > > > > After enabling larger address width (48), we should be able to map > > larger iova addresses in the guest. For example, a QEMU guest that > > is configured with large memory ( >=1TB ). To check whether 48 bits > > I didn't quite get your point here. Address width limits the iova range, > but it doesn't limit the guest memory range. e.g. you can use 39 bit iova > address to access a guest physical address larger than (2^39 - 1) as long > as the guest 2nd level page table is well programmed. Only one exception, > if you requires a continuous iova range(e.g. 2^39), it would be an issue. > Not sure if this is the major motivation of your patch? However, I'm not > against extend the address width to be 48 bits. Just need to make it clear > here. One thing I can think of is the identity mapping. Say, when iommu=pt is set in guest, meanwhile when PT capability is not supported from hardware (here I mean, the emulated hardware, of course), guest kernel will create one identity mapping to emulate the PT mode. Current linux kernel's identity mapping should be a static 48 bits mapping (it must cover the whole memory range of guest), so if we provide a 39 bits vIOMMU to the guest, AFAIU we'll fail at device attaching to that identity domain of every single device that is protected by that 39 bits vIOMMU (kernel will find that domain gaw is bigger than vIOMMU supported gaw of that device). I do see no good fix for that, except boost the supported gaw to bigger ones. Thanks, -- Peter Xu
Re: [Qemu-devel] [PATCH v3 1/2] virtio: check VirtQueue Vring object is set
+-- On Wed, 29 Nov 2017, Cornelia Huck wrote --+ | I think the basic problem is still that you conflate two things: | - vring.num, which cannot be flipped between 0 and !0 by the guest | - vring.{desc,avail,used}, which can | | IOW, if vring.num == 0, the guest cannot manipulate the queue; if | vring.desc == 0, the guest can. | | Many functions have a likely/unlikely check, setup routines excepted. Have sent a revised patch v4. Thank you. -- Prasad J Pandit / Red Hat Product Security Team 47AF CE69 3A90 54AA 9045 1053 DD13 3D32 FE5B 041F
Re: [Qemu-devel] [PATCH v4 10/20] qcow: Switch to .bdrv_co_block_status()
12.10.2017 21:59, Eric Blake wrote: We are gradually moving away from sector-based interfaces, towards byte-based. Update the qcow driver accordingly. There is no intent to optimize based on the want_zero flag for this format. Signed-off-by: Eric Blake Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH] i386: turn off l3-cache property by default
On 2017/11/29 21:35, Roman Kagan wrote: > On Wed, Nov 29, 2017 at 07:58:19PM +0800, Longpeng (Mike) wrote: >> On 2017/11/29 18:41, Eduardo Habkost wrote: >>> On Wed, Nov 29, 2017 at 01:20:38PM +0800, Longpeng (Mike) wrote: On 2017/11/29 5:13, Eduardo Habkost wrote: > On Tue, Nov 28, 2017 at 11:20:27PM +0300, Denis V. Lunev wrote: >> On 11/28/2017 10:58 PM, Eduardo Habkost wrote: >>> On Fri, Nov 24, 2017 at 04:26:50PM +0300, Denis Plotnikov wrote: Commit 14c985cffa "target-i386: present virtual L3 cache info for vcpus" introduced and set by default exposing l3 to the guest. The motivation behind it was that in the Linux scheduler, when waking up a task on a sibling CPU, the task was put onto the target CPU's runqueue directly, without sending a reschedule IPI. Reduction in the IPI count led to performance gain. However, this isn't the whole story. Once the task is on the target CPU's runqueue, it may have to preempt the current task on that CPU, be it the idle task putting the CPU to sleep or just another running task. For that a reschedule IPI will have to be issued, too. Only when that other CPU is running a normal task for too little time, the fairness constraints will prevent the preemption and thus the IPI. Agree. :) Our testing VM is Suse11 guest with idle=poll at that time and now I realize that Suse11 has a BUG in its scheduler. For REHL 7.3 or upstream kernel, in ttwu_queue_remote(), a RES IPI is issued if rq->idle is not polling: ''' static void ttwu_queue_remote(struct task_struct *p, int cpu) { struct rq *rq = cpu_rq(cpu); if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list)) { if (!set_nr_if_polling(rq->idle)) smp_send_reschedule(cpu); else trace_sched_wake_idle_without_ipi(cpu); } } ''' But for Suse11, it does not check, it send a RES IPI unconditionally. >>> >>> So, does that mean no Linux guest benefits from the l3-cache=on >>> default except SuSE 11 guests? >>> >> >> Not only that, there is another scenario: >> >> static void ttwu_queue(...) >> { >> if (...two cpus NOT sharing L3-cache) { >> ... >> ttwu_queue_remote(p, cpu, wake_flags); >> return; >> } >> ... >> ttwu_do_activate(rq, p, wake_flags, &rf); <--*Here* >> ... >> } >> >> In ttwu_do_activate(), there are also some opportunities with low >> probability to >> do not send RES IPI even if the target cpu isn't in IDLE polling state. > > Well it isn't so low actually, what you need is to keep the cpus busy > switching tasks. In that case it's not uncommon that the task being > woken up on a remote cpu has accumulated more vruntime than the task > already running on that cpu; in that case the new task won't preempt the > current task and the IPI won't be issued. E.g. on a RHEL 7.4 guest we > saw: > I get it, thanks. This boils down to the improvement being only achievable in workloads with many actively switching tasks. We had no access to the (proprietary?) SAP HANA benchmark the commit referred to, but the pattern is also reproduced with "perf bench sched messaging -g 1" on 1 socket, 8 cores vCPU topology, we see indeed: l3-cache #res IPI /s #time / 1 loops off560K1.8 sec on 40K 0.9 sec > > The workload where it bites is mostly idle guest, with chains of > dependent wakeups, i.e. with little parallelism: > Now there's a downside: with L3 cache the Linux scheduler is more eager to wake up tasks on sibling CPUs, resulting in unnecessary cross-vCPU interactions and therefore exessive halts and IPIs. E.g. "perf bench sched pipe -i 10" gives l3-cache #res IPI /s #HLT /s #time /10 loops off200 (no K) 230 0.2 sec on 400K330K0.5 sec I guess this issue could be resolved by disable the SD_WAKE_AFFINE. > > Actually, it's SD_WAKE_AFFINE that's being effectively defeated by this > l3-cache, because the scheduler thinks that the cpus that share the > last-level cache are close enough that a dependent task can be woken up > on a sibling cpu. > In this case (sched pipe), without L3-cache, a dependent task is woken up on the original cpu mostly, if these two tasks ran on the same cpu then the dependent task is woken up without a RES IPI. The related codes are: ''' void resched_curr(struct rq *rq) { if (cpu == smp_processor_id()) { set_tsk_need_resched(curr
Re: [Qemu-devel] [PATCH v4 1/2] virtio: check VirtQueue Vring object is set
On Wed, 29 Nov 2017 23:14:27 +0530 P J P wrote: > From: Prasad J Pandit > > A guest could attempt to use an uninitialised VirtQueue object > or unset Vring.align leading to a arithmetic exception. Add check > to avoid it. > > Reported-by: Zhangboxian > Signed-off-by: Prasad J Pandit > --- > hw/virtio/virtio.c | 14 +++--- > 1 file changed, 11 insertions(+), 3 deletions(-) Reviewed-by: Cornelia Huck
Re: [Qemu-devel] Block layer complexity: what to do to keep it under control?
On Wed, 11/29 12:00, Stefan Hajnoczi wrote: > On Wed, Nov 29, 2017 at 11:55:02AM +0800, Fam Zheng wrote: > > As we move forwards with new features in the block layer, the chances of > > tricky > > bugs happening have been increasing alongside - block jobs, coroutines, > > throttling, AioContext, op blockers and image locking combined together > > make a > > large and complex picture that is hard to fully understand and work with. > > Some > > bugs we've encountered are quite challenging already. Examples are: > > > > - segfault in parallel blockjobs (iotest 30) > > https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01144.html > > > > - Intermittent hang of iotest 194 (bdrv_drain_all after non-shared storage > > migration) > > https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01626.html > > > > - Drainage in bdrv_replace_child_noperm() > > https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg00868.html > > > > - Regression from 2.8: stuck in bdrv_drain() > > https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg02193.html > > > > So in principle, what should we do to make the block layer easy to > > understand, > > develop with and debug? > > The assumptions that the code relies on are unclear so it's easy to > introduce new bugs. Is that one thing we could do better in documenting? > > We are at a point where code review isn't finding certain bugs because > no single person knows all the assumptions. Previously the problem was > contained because maintainers spotted problems before patches were > merged. > > This is not primarily a documentation problem though. We cannot > document our way out of this because no single person (patch author or > code reviewer) can know or check everything anymore due to the scale. > > I think it's a (lack of) design problem because we have many incomplete > abstractions like block jobs, IOThreads, block graph, image locking, > etc. They do not cover all possibly states and interactions today. > Extending them leads to complex bugs. > > A little progress has been made with defining higher-level APIs for > block drivers and block jobs. This way they either don't deal with > low-level details of the concurrency and event loop models (e.g. > bdrv_coroutine_enter()) or there is an interface that prompts them to > integrate properly like bdrv_attach/detach_aio_context(). Sounds good. > > Event loops and coroutines are good but they should not be used directly > by block drivers and block jobs. We need safe, high-level APIs that > implement commonly-used operations. > > > - Documentation > > > > There is no central developer doc about block layer, especially how all > > pieces > > fit together. Having one will make it a lot easier for new contributors to > > understand better. Of course, we're facing the old problem: the code is > > moving, maintaining an updated document needs effort. > > > > Idea: add ./doc/deve/block.txt? > > IOThreads and AioContexts are addressed here: > docs/devel/multiple-iothreads.txt > > The game has become significantly more complex than what the document > describes. It's lacking aio_co_wake() and aio_co_schedule() for > example. > > > - Simplified code, or more orthogonal/modularized architecture. > > > > Each aspect of block layer is complex enough so isolating them as much as > > possible is a reasonable approach to control the complexity. Block jobs > > and > > throttling becoming block filters is a good example, we should identify > > more. > > > > Idea: rethink event loops. Create coroutines ubiquitously (for example for > > each fd handler, BH and timer), so that many nested aio_poll() can be > > removed. > > > > Crazy idea: move the whole block layer to a vhost process, and implement > > existing features differently, especially in terms of multi-threading > > (hint: > > rust?). > > A reimplementation will not solve the problem because: > > 1. If it still has the same feature set and requirements then the level >of complexity will be comparable. > > 2. We can reduce accidental (inessential) complexity by continuing the >various efforts around the block graph, block jobs, multi-queue block >layer with an eye towards higher level APIs. Starting over is certainly not the motivation to do qemu-vhost, but it would be an opportunity to use different async/concurrency paradigms if that is going to happen. I think in current block layer, event loop + coroutine is a good combination, but having nested aio_poll()'s made it worse, then mixing IOThreads in makes it a lot more complicated. Fam
Re: [Qemu-devel] [RFC PATCH 1/1] s390x/css: unresrict cssids
On Wed, 29 Nov 2017 19:51:23 +0100 Christian Borntraeger wrote: > On 11/28/2017 03:45 PM, Cornelia Huck wrote: > > On Tue, 28 Nov 2017 15:17:49 +0100 > > Christian Borntraeger wrote: > > > >> On 11/28/2017 03:01 PM, Cornelia Huck wrote: > >>> On Tue, 28 Nov 2017 14:25:08 +0100 > >>> Christian Borntraeger wrote: > > > What I want now is to enable vfio-ccw for libvirt and Linux guests and > for that > we need to lift the restriction of having vfio not in FE. > >>> > >>> This, however, worries me. I don't really consider vfio-ccw ready for > >>> prime time yet. We still need to figure out path handling at the very > >>> least. And I'm still not sure that our non-handling of halt/clear won't > >>> cause major issues with e.g. a storage server error recovery. > >>> > >>> Can we flag vfio-ccw as experimental or such in libvirt? > >> > >> We should have flagged vfio-ccw experimental in QEMU then. > >> e.g. by using x-vfio-ccw or whatever.I dont think we can do that > >> after the facts. > > > > Yes, we should have done that... can't fix that now, unfortunately. > > > >> > >> I am not that deep into vfio-ccw, but I was under the impression that > >> the missing features should not affect the vfio interface that is > >> created by libvirt. After all this should just be a "-device > >> vfio-ccw,." > >> statement and some setup steps beforehand (unbind + setup of vfio-ccw) > > > > The halt/clear stuff is highly unlikely to influence the configuration > > interface. I'm still not too clear about path handling, though. Will we > > need different modes (hypervisor managed vs. full passthrough? probably > > only passthrough)? What about pgid handling - do we need some kind of > > pgid manager? (Keep in mind that you get to set the pgid once. This has > > implications for e.g. reserve/release.) > > > > Also, what about devices other than ECKD DASD? Has there been any > > testing? Tapes should be interesting; the other interesting ccw devices > > are QDIO-based and I'm not sure we want to spend anything on supporting > > those. > > DASD is probably the most important thing today, QDIO might never happen > (or very late). One thing I'd find interesting about tapes is very long running channel programs (like rewind) and how they interact with halt/clear. But yes, I would think that DASD is the most important one in practice. > See my proposal below. > > > > > The interface is probably fine, but I'd like to get an idea about the > > path stuff (is the attachment spec that contains the pgid stuff > > publicly available, btw.?) > > > >> > >> If your concern is the serviceability I think it would be valid for a RHEL > >> KVM to disable vfio-ccw in the kernel. Maybe we could even provide a config > >> for QEMU? > > > > It's fine just to turn off vfio-ccw in the kernel. > > > >> PS: I learned from the PCI and CCW experience that I do not want > >> to upstream kernel/qemu code unless I have a working libvirt code that > >> shows that the thing will work. > > > > Yes, I understand the wish to verify the interface, and I think it's a > > good idea. What I'm worried about is that this might be the precursor > > to a push to support it, and I don't think vfio-ccw is ready for that > > yet. > > > FWIW, libvirt should not care if a device works in all cases or not because > libvirt versions should work with all kind of QEMU/kernel combinations. > Fencing in libvirt that the kernel part is not up to the task is making > me feel the same way as you when you see the css-unrestricted property > at a device ;-) I'm more worried about the political angle than the technical. If we don't get pressure to support this too soon, I don't care that much about experimental or not. > Having said that, I think that having vfio-ccw support has a value (and it > actually works for an unnamed test infrastructure). In addition to that I > am much more likely to actually test this continuously if I have libvirt > support. Good to hear that it works for a non-Linux guest. Any plans to test something like storage server failover? [I'd really love to do something about the path handling stuff, but the combination of scant documentation and scantier time works against me.] > > > So what about the following: > > 1. We will implement the libvirt support with either a or b: > a: if we find a solution to our "where to put the property" dispute use that > to decide > if we can add vfio-ccw > b if not: just provide a patch that lifts the restriction without any > property. > in libvirt blindy assume that free assignment will work. old QEMUs will > complain at > startup and libvirt will print the QEMU error. This is similar to other > situations > where libvirt cannot fully bprobe if the QEMU will work or not. (not having > vfio-ccw > support in the kernel will certainly allow libvirt to reject this upfront) I hope we end up with a solution that works for everyone... >
Re: [Qemu-devel] [PATCH v4 11/20] qcow2: Switch to .bdrv_co_block_status()
12.10.2017 21:59, Eric Blake wrote: We are gradually moving away from sector-based interfaces, towards byte-based. Update the qcow2 driver accordingly. For now, we are ignoring the 'want_zero' hint. However, it should be relatively straightforward to honor the hint as a way to return larger *pnum values when we have consecutive clusters with the same data/zero status but which differ only in having non-consecutive mappings. useful thing (for example, to get several compressed clusters in one chunk, but do not include following zero clusters). but should not the hint be called want_mapping for it? or may be we will move to "int hint_flags", which will include status flags we a interested in? Signed-off-by: Eric Blake Reviewed-by: Vladimir Sementsov-Ogievskiy --- v4: update to interface tweak v3: no change v2: rebase to mapping flag --- block/qcow2.c | 24 +--- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 92cb9f9bfa..a84e50a6e6 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1625,32 +1625,34 @@ static void qcow2_join_options(QDict *options, QDict *old_options) } } -static int64_t coroutine_fn qcow2_co_get_block_status(BlockDriverState *bs, -int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file) +static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs, + bool want_zero, + int64_t offset, int64_t count, 'count' is used instead of 'bytes' because of qcow2_get_cluster_offset which wants int*.. + int64_t *pnum, int64_t *map, + BlockDriverState **file) { BDRVQcow2State *s = bs->opaque; uint64_t cluster_offset; int index_in_cluster, ret; unsigned int bytes; -int64_t status = 0; +int status = 0; -bytes = MIN(INT_MAX, nb_sectors * BDRV_SECTOR_SIZE); +bytes = MIN(INT_MAX, count); qemu_co_mutex_lock(&s->lock); -ret = qcow2_get_cluster_offset(bs, sector_num << BDRV_SECTOR_BITS, &bytes, - &cluster_offset); +ret = qcow2_get_cluster_offset(bs, offset, &bytes, &cluster_offset); qemu_co_mutex_unlock(&s->lock); if (ret < 0) { return ret; } -*pnum = bytes >> BDRV_SECTOR_BITS; +*pnum = bytes; if (cluster_offset != 0 && ret != QCOW2_CLUSTER_COMPRESSED && !s->crypto) { -index_in_cluster = sector_num & (s->cluster_sectors - 1); -cluster_offset |= (index_in_cluster << BDRV_SECTOR_BITS); +index_in_cluster = offset & (s->cluster_size - 1); +*map = cluster_offset | index_in_cluster; *file = bs->file->bs; -status |= BDRV_BLOCK_OFFSET_VALID | cluster_offset; +status |= BDRV_BLOCK_OFFSET_VALID; } if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) { status |= BDRV_BLOCK_ZERO; @@ -4333,7 +4335,7 @@ BlockDriver bdrv_qcow2 = { .bdrv_child_perm = bdrv_format_default_perms, .bdrv_create= qcow2_create, .bdrv_has_zero_init = bdrv_has_zero_init_1, -.bdrv_co_get_block_status = qcow2_co_get_block_status, +.bdrv_co_block_status = qcow2_co_block_status, .bdrv_co_preadv = qcow2_co_preadv, .bdrv_co_pwritev= qcow2_co_pwritev, -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH v2 for-2.12 04/16] s390x/ioinst: pass the retaddr to all IO instructions
On 29.11.2017 21:26, David Hildenbrand wrote: > TCG needs the retaddr when injecting an interrupt. Let's just pass it > along and use RA_IGNORED for KVM. The value will be completely ignored for > KVM. > > Convert program_interrupt() to s390_program_interrupt() directly, making > use of the passed address. > > Reviewed-by: Richard Henderson > Signed-off-by: David Hildenbrand > --- > target/s390x/cpu.h | 1 + > target/s390x/internal.h| 29 +++- > target/s390x/ioinst.c | 67 > +++--- > target/s390x/kvm.c | 27 ++- > target/s390x/misc_helper.c | 20 +++--- > 5 files changed, 75 insertions(+), 69 deletions(-) Reviewed-by: Thomas Huth
Re: [Qemu-devel] [PATCH v2 for-2.12 05/16] s390x/pci: pass the retaddr to all PCI instructions
On 29.11.2017 21:26, David Hildenbrand wrote: > Once we wire up TCG, we will need the retaddr to correctly inject > program interrupts. As we want to get rid of the function > program_interrupt(), convert PCI code too. > > For KVM, we can simply use RA_IGNORED. > > Convert program_interrupt() to s390_program_interrupt() directly, making > use of the passed address. > > Reviewed-by: Richard Henderson > Signed-off-by: David Hildenbrand > --- > hw/s390x/s390-pci-inst.c | 83 > +--- > hw/s390x/s390-pci-inst.h | 16 ++ > target/s390x/kvm.c | 14 > 3 files changed, 59 insertions(+), 54 deletions(-) Reviewed-by: Thomas Huth
Re: [Qemu-devel] [PATCH v1 2/2] intel-iommu: Extend address width to 48 bits
On Thu, Nov 30, 2017 at 05:11:55PM +0800, Peter Xu wrote: > On Thu, Nov 30, 2017 at 01:22:38PM +0800, Liu, Yi L wrote: > > On Tue, Nov 14, 2017 at 06:13:50PM -0500, prasad.singamse...@oracle.com > > wrote: > > > From: Prasad Singamsetty > > > > > > The current implementation of Intel IOMMU code only supports 39 bits > > > iova address width. This patch provides a new parameter (x-aw-bits) > > > for intel-iommu to extend its address width to 48 bits but keeping the > > > default the same (39 bits). The reason for not changing the default > > > is to avoid potential compatibility problems with live migration of > > > intel-iommu enabled QEMU guest. The only valid values for 'x-aw-bits' > > > parameter are 39 and 48. > > > > > > After enabling larger address width (48), we should be able to map > > > larger iova addresses in the guest. For example, a QEMU guest that > > > is configured with large memory ( >=1TB ). To check whether 48 bits > > > > I didn't quite get your point here. Address width limits the iova range, > > but it doesn't limit the guest memory range. e.g. you can use 39 bit iova > > address to access a guest physical address larger than (2^39 - 1) as long > > as the guest 2nd level page table is well programmed. Only one exception, > > if you requires a continuous iova range(e.g. 2^39), it would be an issue. > > Not sure if this is the major motivation of your patch? However, I'm not > > against extend the address width to be 48 bits. Just need to make it clear > > here. > > One thing I can think of is the identity mapping. Say, when iommu=pt > is set in guest, meanwhile when PT capability is not supported from > hardware (here I mean, the emulated hardware, of course), guest kernel > will create one identity mapping to emulate the PT mode. True. > Current linux kernel's identity mapping should be a static 48 bits > mapping (it must cover the whole memory range of guest), so if we I suppose guest memory range depends on the AW reported by CPUID? Not sure if it is constantly 48 bits. > provide a 39 bits vIOMMU to the guest, AFAIU we'll fail at device > attaching to that identity domain of every single device that is > protected by that 39 bits vIOMMU (kernel will find that domain gaw is > bigger than vIOMMU supported gaw of that device). Yeah, this is a good argue. As it is 1:1 mapping, the translated address is limited all the same. > I do see no good fix for that, except boost the supported gaw to > bigger ones. How about defaultly expose cap.PT bit? In that case, there will no 1:1 mapping in guest side. Translation is skipped. So the IOMMU AW won't limit the addressing. Regards, Yi L > > Thanks, > > -- > Peter Xu >
Re: [Qemu-devel] [PATCH 17/25] spapr: add a sPAPRXive object to the machine
On Thu, Nov 23, 2017 at 02:29:47PM +0100, Cédric Le Goater wrote: > The XIVE object is designed to be always available, so it is created > unconditionally on newer machines. There doesn't actually seem to be anything dependent on machine version here. > Depending on the configuration and > the guest capabilities, the CAS negotiation process will decide which > interrupt model to use, legacy or XIVE. > > The XIVE model makes use of the full range of the IRQ number space > because the IRQ numbers for the CPU IPIs are allocated in the range > below XICS_IRQ_BASE, which is unused by XICS. Ok. And I take it 4096 is enough space for the XIVE IPIs for the forseeable future? > > Signed-off-by: Cédric Le Goater > --- > hw/ppc/spapr.c | 34 ++ > include/hw/ppc/spapr.h | 2 ++ > 2 files changed, 36 insertions(+) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 5d3325ca3c88..0e0107c8272c 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -56,6 +56,7 @@ > #include "hw/ppc/spapr_vio.h" > #include "hw/pci-host/spapr.h" > #include "hw/ppc/xics.h" > +#include "hw/ppc/spapr_xive.h" > #include "hw/pci/msi.h" > > #include "hw/pci/pci.h" > @@ -204,6 +205,29 @@ static void xics_system_init(MachineState *machine, int > nr_irqs, Error **errp) > } > } > > +static sPAPRXive *spapr_xive_create(sPAPRMachineState *spapr, int nr_irqs, > +Error **errp) > +{ > +Error *local_err = NULL; > +Object *obj; > + > +obj = object_new(TYPE_SPAPR_XIVE); > +object_property_add_child(OBJECT(spapr), "xive", obj, &error_abort); > +object_property_set_int(obj, nr_irqs, "nr-irqs", &local_err); > +if (local_err) { > +goto error; > +} > +object_property_set_bool(obj, true, "realized", &local_err); > +if (local_err) { > +goto error; > +} > + > +return SPAPR_XIVE(obj); > +error: > +error_propagate(errp, local_err); > +return NULL; > +} > + > static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu, >int smt_threads) > { > @@ -2360,6 +2384,16 @@ static void ppc_spapr_init(MachineState *machine) > /* Set up Interrupt Controller before we create the VCPUs */ > xics_system_init(machine, XICS_IRQS_SPAPR, &error_fatal); > > +/* We don't have KVM support yet, so check for irqchip=on */ > +if (kvm_enabled() && machine_kernel_irqchip_required(machine)) { > +error_report("kernel_irqchip requested. no XIVE support"); I think you want an actual exit(1) here, no? error_report() will print an error but keep going. > +} else { > +/* XIVE uses the full range of IRQ numbers. The CPU IPIs will > + * use the range below XICS_IRQ_BASE, which is unused by XICS. */ > +spapr->xive = spapr_xive_create(spapr, XICS_IRQ_BASE + > XICS_IRQS_SPAPR, > +&error_fatal); XICS_IRQ_BASE == 4096, and XICS_IRQS_SPAPR (which we should rename at some point) == 1024. So we have a total irq space of 5k, which is a bit odd. I'd be ok with rounding it out to 8k for newer machines if that's useful. Sparse allocations in there might make life easier for getting consistent irq numbers without an "allocator" per se (because we can use different regions for VIO, PCI intx, MSI, etc. etc.). > +} > + > /* Set up containers for ibm,client-architecture-support negotiated > options > */ > spapr->ov5 = spapr_ovec_new(); > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h > index 9a3885593c86..90e2b0f6c678 100644 > --- a/include/hw/ppc/spapr.h > +++ b/include/hw/ppc/spapr.h > @@ -14,6 +14,7 @@ struct sPAPRNVRAM; > typedef struct sPAPREventLogEntry sPAPREventLogEntry; > typedef struct sPAPREventSource sPAPREventSource; > typedef struct sPAPRPendingHPT sPAPRPendingHPT; > +typedef struct sPAPRXive sPAPRXive; > > #define HPTE64_V_HPTE_DIRTY 0x0040ULL > #define SPAPR_ENTRY_POINT 0x100 > @@ -127,6 +128,7 @@ struct sPAPRMachineState { > MemoryHotplugState hotplug_memory; > > const char *icp_type; > +sPAPRXive *xive; > }; > > #define H_SUCCESS 0 -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH 05/17] timer: generalize Dallas/Maxim RTC i2c devices
On Sun, Nov 26, 2017 at 03:59:03PM -0600, Michael Davidsaver wrote: > Support for: ds1307, ds1337, ds1338, ds1339, > ds1340, ds1375, ds1388, and ds3231. > > Tested with ds1338 and ds1375. > > Signed-off-by: Michael Davidsaver I certainly like the idea of consolidating this code, but reviewing to see that the new code really is a generalization of the old is something I won't have time for for a while. Also, hw/timer is not within my purview so it'll probably need to go another path to merge. > --- > default-configs/arm-softmmu.mak | 2 +- > hw/timer/Makefile.objs | 2 +- > hw/timer/ds-rtc-i2c.c | 461 > > hw/timer/ds1338.c | 239 - > 4 files changed, 463 insertions(+), 241 deletions(-) > create mode 100644 hw/timer/ds-rtc-i2c.c > delete mode 100644 hw/timer/ds1338.c > > diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak > index d37edc4312..b857823681 100644 > --- a/default-configs/arm-softmmu.mak > +++ b/default-configs/arm-softmmu.mak > @@ -31,7 +31,7 @@ CONFIG_SMC91C111=y > CONFIG_ALLWINNER_EMAC=y > CONFIG_IMX_FEC=y > CONFIG_FTGMAC100=y > -CONFIG_DS1338=y > +CONFIG_DSRTCI2C=y > CONFIG_PFLASH_CFI01=y > CONFIG_PFLASH_CFI02=y > CONFIG_MICRODRIVE=y > diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs > index 8c19eac3b6..290015ebec 100644 > --- a/hw/timer/Makefile.objs > +++ b/hw/timer/Makefile.objs > @@ -3,7 +3,7 @@ common-obj-$(CONFIG_ARM_MPTIMER) += arm_mptimer.o > common-obj-$(CONFIG_ARM_V7M) += armv7m_systick.o > common-obj-$(CONFIG_A9_GTIMER) += a9gtimer.o > common-obj-$(CONFIG_CADENCE) += cadence_ttc.o > -common-obj-$(CONFIG_DS1338) += ds1338.o > +common-obj-$(CONFIG_DSRTCI2C) += ds-rtc-i2c.o > common-obj-$(CONFIG_HPET) += hpet.o > common-obj-$(CONFIG_I8254) += i8254_common.o i8254.o > common-obj-$(CONFIG_M48T59) += m48t59.o > diff --git a/hw/timer/ds-rtc-i2c.c b/hw/timer/ds-rtc-i2c.c > new file mode 100644 > index 00..ad2f8f2a68 > --- /dev/null > +++ b/hw/timer/ds-rtc-i2c.c > @@ -0,0 +1,461 @@ > +/* Emulation of various Dallas/Maxim RTCs accessed via I2C bus > + * > + * Copyright (c) 2017 Michael Davidsaver > + * Copyright (c) 2009 CodeSourcery > + * > + * Authors: Michael Davidsaver > + * Paul Brook > + * > + * This work is licensed under the terms of the GNU GPL, version 2. See > + * the LICENSE file in the top-level directory. > + * > + * Models real time read/set and NVRAM. > + * Does not model alarms, or control/status registers. > + * > + * Generalized register map is: > + * [Current time] > + * [Alarm settings] (optional) > + * [Control/Status] (optional) > + * [Non-volatile memory] (optional) > + * > + * The current time registers are almost always the same, > + * with the exception being that some have a CENTURY bit > + * in the month register. > + */ > +#include "qemu/osdep.h" > +#include "qemu/log.h" > +#include "qemu/timer.h" > +#include "qemu/bcd.h" > +#include "hw/hw.h" > +#include "hw/registerfields.h" > +#include "hw/i2c/i2c.h" > +#include "sysemu/qtest.h" > +#include "qemu/error-report.h" > + > +/* #define DEBUG_DSRTC */ > + > +#ifdef DEBUG_DSRTC > +#define DPRINTK(FMT, ...) info_report(TYPE_DSRTC " : " FMT, ## __VA_ARGS__) > +#else > +#define DPRINTK(FMT, ...) do {} while (0) > +#endif > + > +#define LOG(MSK, FMT, ...) qemu_log_mask(MSK, TYPE_DSRTC " : " FMT "\n", \ > +## __VA_ARGS__) > + > +#define DSRTC_REGSIZE (0x40) > + > +/* values stored in BCD */ > +/* 00-59 */ > +#define R_SEC (0x0) > +/* 00-59 */ > +#define R_MIN (0x1) > +#define R_HOUR (0x2) > +/* 1-7 */ > +#define R_WDAY (0x3) > +/* 0-31 */ > +#define R_DATE (0x4) > +#define R_MONTH (0x5) > +/* 0-99 */ > +#define R_YEAR (0x6) > + > +/* use 12 hour mode when set */ > +FIELD(HOUR, SET12, 6, 1) > +/* 00-23 */ > +FIELD(HOUR, HOUR24, 0, 6) > +FIELD(HOUR, AMPM, 5, 1) > +/* 1-12 (not 0-11!) */ > +FIELD(HOUR, HOUR12, 0, 5) > + > +/* 1-12 */ > +FIELD(MONTH, MONTH, 0, 5) > +FIELD(MONTH, CENTURY, 7, 1) > + > +typedef struct DSRTCInfo { > +/* if bit 7 of the Month register is set after Y2K */ > +bool has_century; > +/* address of first non-volatile memory cell. > + * nv_start >= reg_end means no NV memory. > + */ > +uint8_t nv_start; > +/* total size of register range. When address counter rolls over. */ > +uint8_t reg_size; > +} DSRTCInfo; > + > +typedef struct DSRTCState { > +I2CSlave parent_obj; > + > +const DSRTCInfo *info; > + > +qemu_irq alarm_irq; > + > +/* register address counter */ > +uint8_t addr; > +/* when writing, whether the address has been sent */ > +bool addrd; > + > +int64_t time_offset; > +int8_t wday_offset; > + > +uint8_t regs[DSRTC_REGSIZE]; > +} DSRTCState; > + > +typedef struct DSRTCClass { > +I2CSlaveClass parent_class; > + > +const DSRTCInfo *info; > +} DSRTCClass; > + > +#define TYPE_DSRTC "ds-rtc-i2c" > +#defi
Re: [Qemu-devel] unsubscribe qemu-devel
On Thu, Nov 30, 2017 at 06:11:26AM +, Huang, Haibin wrote: > unsubscribe qemu-devel FYI, this is the wrong email address to use for unsubscribing. You instead need qemu-devel-requ...@nongnu.org Alternatively, there is a web UI that lets you do it: https://lists.nongnu.org/mailman/listinfo/qemu-devel Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [Qemu-devel] [PATCH v4 01/32] migration: better error handling with QEMUFile
* Peter Xu (pet...@redhat.com) wrote: > If the postcopy down due to some reason, we can always see this on dst: > > qemu-system-x86_64: RP: Received invalid message 0x length 0x > > However in most cases that's not the real issue. The problem is that > qemu_get_be16() has no way to show whether the returned data is valid or > not, and we are _always_ assuming it is valid. That's possibly not wise. > > The best approach to solve this would be: refactoring QEMUFile interface > to allow the APIs to return error if there is. However it needs quite a > bit of work and testing. For now, let's explicitly check the validity > first before using the data in all places for qemu_get_*(). > > This patch tries to fix most of the cases I can see. Only if we are with > this, can we make sure we are processing the valid data, and also can we > make sure we can capture the channel down events correctly. > > Signed-off-by: Peter Xu > --- > migration/migration.c | 5 + > migration/ram.c | 26 ++ > migration/savevm.c| 40 ++-- > 3 files changed, 65 insertions(+), 6 deletions(-) > > diff --git a/migration/migration.c b/migration/migration.c > index c0206023d7..eae34d0524 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -1708,6 +1708,11 @@ static void *source_return_path_thread(void *opaque) > header_type = qemu_get_be16(rp); > header_len = qemu_get_be16(rp); > > +if (qemu_file_get_error(rp)) { > +mark_source_rp_bad(ms); > +goto out; > +} > + > if (header_type >= MIG_RP_MSG_MAX || > header_type == MIG_RP_MSG_INVALID) { > error_report("RP: Received invalid message 0x%04x length 0x%04x", > diff --git a/migration/ram.c b/migration/ram.c > index 8620aa400a..960c726ff2 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -2687,7 +2687,7 @@ static int ram_load_postcopy(QEMUFile *f) > void *last_host = NULL; > bool all_zero = false; > > -while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) { > +while (!(flags & RAM_SAVE_FLAG_EOS)) { I still think you need to keep the !ret && - see below; anyway, there's no harm in keeping it! > ram_addr_t addr; > void *host = NULL; > void *page_buffer = NULL; > @@ -2696,6 +2696,16 @@ static int ram_load_postcopy(QEMUFile *f) > uint8_t ch; > > addr = qemu_get_be64(f); > + > +/* > + * If qemu file error, we should stop here, and then "addr" > + * may be invalid > + */ > +ret = qemu_file_get_error(f); > +if (ret) { > +break; > +} > + > flags = addr & ~TARGET_PAGE_MASK; > addr &= TARGET_PAGE_MASK; > > @@ -2776,6 +2786,13 @@ static int ram_load_postcopy(QEMUFile *f) > error_report("Unknown combination of migration flags: %#x" > " (postcopy mode)", flags); > ret = -EINVAL; > +break; This 'break' breaks from the switch, but doesn't break the loop and because you remove dthe !ret && from the top, the loop keeps going when it shouldn't. > +} > + > +/* Detect for any possible file errors */ > +if (qemu_file_get_error(f)) { > +ret = qemu_file_get_error(f); > +break; > } This is all simpler if you just leave the !ret && at the top, and then make this: if (!ret) { ret = qemu_file_get_error(f); } > > if (place_needed) { Make that if (!ret && place_needed) { > @@ -2789,9 +2806,10 @@ static int ram_load_postcopy(QEMUFile *f) > ret = postcopy_place_page(mis, place_dest, >place_source, block); > } > -} > -if (!ret) { > -ret = qemu_file_get_error(f); > + > +if (ret) { > +break; > +} And with the !ret check at the top this goes again. > } > } > > diff --git a/migration/savevm.c b/migration/savevm.c > index 4a88228614..1da0255cd7 100644 > --- a/migration/savevm.c > +++ b/migration/savevm.c > @@ -1765,6 +1765,11 @@ static int loadvm_process_command(QEMUFile *f) > cmd = qemu_get_be16(f); > len = qemu_get_be16(f); > > +/* Check validity before continue processing of cmds */ > +if (qemu_file_get_error(f)) { > +return qemu_file_get_error(f); > +} > + > trace_loadvm_process_command(cmd, len); > if (cmd >= MIG_CMD_MAX || cmd == MIG_CMD_INVALID) { > error_report("MIG_CMD 0x%x unknown (len 0x%x)", cmd, len); > @@ -1830,6 +1835,7 @@ static int loadvm_process_command(QEMUFile *f) > */ > static bool check_section_footer(QEMUFile *f, SaveStateEntry *se) > { > +int ret; > uint8_t read_mark; > uint32_t read_section_id; > > @@ -1840,6 +1846,13 @@ static bool check_section_footer(QEMUFile *f, > Save
Re: [Qemu-devel] [for-2.12 3/7] pci: Fold pci_bus.h into pci.h
On 30/11/2017 6:02, David Gibson wrote: On Wed, Nov 29, 2017 at 12:38:00PM +0200, Marcel Apfelbaum wrote: On 29/11/2017 10:46, David Gibson wrote: include/hw/pci/pci_bus.h is now very small and can only safely be included after hw/pci/pci.h. So, just fold it into pci.h. I don't get the benefit from merging the header files. I would go the other way around and find stuff specific to pci_bus and add it there, like the pci_bus_new* you touched in the prev patch. Hrm. Except the point of the earlier patch was that those are actually spoecific to root buses, so would really belong in say pci-host.h, rather than pci-bus.h. A log of PCI stuff deals with interaction between the device and bus though, so it just seemed like more trouble than it was worth to go disentangling them properly. Maybe if *all* code files requiring pci.h would automatically need pci_bus.h would make sense, but I didn't check that. Yeah, I don't think every user of pci.h needs pci_bus.h, although a fair few do as you can see from the diff. Well, I guess it's up to Michael. I'll be tolerably content either way - I'd say this is the least important patch of the series. Agreed, Thanks, Marcel
[Qemu-devel] [Bug 1701974] Re: pwrite does not work right under qemu-sh4
This might be related to this fix: > https://git.qemu.org/?p=qemu.git;a=commit;h=8bf8e9df4a7d82c7a47cc961c9cdee1615595de0 FWIW, if you're interested in sh4, please join #debian-ports on OFTC and subscribe to the debian-superh mailing list. We're doing lots of sh4 development and testing QEMU in Debian. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1701974 Title: pwrite does not work right under qemu-sh4 Status in QEMU: New Bug description: The pwrite system call has no effect when writing to a non-zero file position, in a program running under qemu-sh4 (version 2.9.0). How to reproduce: - Compile the program: sh4-linux-gnu-gcc-5 -O -Wall -static -o test-pwrite test-pwrite.c - Set environment variable for using qemu-sh4 (actually not needed, since the program is statically linked here). - ~/inst-qemu/2.9.0/bin/qemu-sh4 test-pwrite Expected output: buf = 01W3456789 Actual output: buf = 0123456789 test-pwrite.c:56: assertion 'strcmp ("01W3456789",buf) == 0' failed qemu: uncaught target signal 6 (Aborted) - core dumped To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1701974/+subscriptions
[Qemu-devel] [Bug 1735384] [NEW] OpenJDK JVM segfaults on qemu-sh4 (regression)
Public bug reported: Some of the recent changes introduced a regression which makes the OpenJDK JVM crash on qemu-sh4: (sid-sh4-sbuild)root@nofan:/# java -version qemu: uncaught target signal 11 (Segmentation fault) - core dumped Segmentation fault (sid-sh4-sbuild)root@nofan:/# An older version works fine: (sid-sh4-sbuild)root@nofan:/# java -version openjdk version "9.0.1" OpenJDK Runtime Environment (build 9.0.1+11-Debian-1) OpenJDK Zero VM (build 9.0.1+11-Debian-1, interpreted mode) (sid-sh4-sbuild)root@nofan:/# Haven't had time for bisecting this yet. Adrian ** Affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1735384 Title: OpenJDK JVM segfaults on qemu-sh4 (regression) Status in QEMU: New Bug description: Some of the recent changes introduced a regression which makes the OpenJDK JVM crash on qemu-sh4: (sid-sh4-sbuild)root@nofan:/# java -version qemu: uncaught target signal 11 (Segmentation fault) - core dumped Segmentation fault (sid-sh4-sbuild)root@nofan:/# An older version works fine: (sid-sh4-sbuild)root@nofan:/# java -version openjdk version "9.0.1" OpenJDK Runtime Environment (build 9.0.1+11-Debian-1) OpenJDK Zero VM (build 9.0.1+11-Debian-1, interpreted mode) (sid-sh4-sbuild)root@nofan:/# Haven't had time for bisecting this yet. Adrian To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1735384/+subscriptions
Re: [Qemu-devel] [PATCH v4 12/20] qed: Switch to .bdrv_co_block_status()
12.10.2017 21:59, Eric Blake wrote: We are gradually moving away from sector-based interfaces, towards byte-based. Update the qed driver accordingly, taking the opportunity to inline qed_is_allocated_cb() into its lone caller (the callback used to be important, until we switched qed to coroutines). There is no intent to optimize based on the want_zero flag for this format. Signed-off-by: Eric Blake --- v4: rebase to interface change, inline pointless callback v3: no change v2: rebase to mapping flag, fix mask in qed_is_allocated_cb --- block/qed.c | 84 + 1 file changed, 28 insertions(+), 56 deletions(-) diff --git a/block/qed.c b/block/qed.c index 28e2ec89e8..cbab5b6f83 100644 --- a/block/qed.c +++ b/block/qed.c @@ -688,74 +688,46 @@ finish: return ret; } -typedef struct { -BlockDriverState *bs; -Coroutine *co; -uint64_t pos; -int64_t status; -int *pnum; -BlockDriverState **file; -} QEDIsAllocatedCB; - -/* Called with table_lock held. */ -static void qed_is_allocated_cb(void *opaque, int ret, uint64_t offset, size_t len) -{ -QEDIsAllocatedCB *cb = opaque; -BDRVQEDState *s = cb->bs->opaque; -*cb->pnum = len / BDRV_SECTOR_SIZE; -switch (ret) { -case QED_CLUSTER_FOUND: -offset |= qed_offset_into_cluster(s, cb->pos); -cb->status = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset; -*cb->file = cb->bs->file->bs; -break; -case QED_CLUSTER_ZERO: -cb->status = BDRV_BLOCK_ZERO; -break; -case QED_CLUSTER_L2: -case QED_CLUSTER_L1: -cb->status = 0; -break; -default: -assert(ret < 0); -cb->status = ret; -break; -} - -if (cb->co) { -aio_co_wake(cb->co); -} -} - -static int64_t coroutine_fn bdrv_qed_co_get_block_status(BlockDriverState *bs, - int64_t sector_num, - int nb_sectors, int *pnum, +static int coroutine_fn bdrv_qed_co_block_status(BlockDriverState *bs, + bool want_zero, + int64_t pos, int64_t bytes, + int64_t *pnum, int64_t *map, BlockDriverState **file) { BDRVQEDState *s = bs->opaque; -size_t len = (size_t)nb_sectors * BDRV_SECTOR_SIZE; -QEDIsAllocatedCB cb = { -.bs = bs, -.pos = (uint64_t)sector_num * BDRV_SECTOR_SIZE, -.status = BDRV_BLOCK_OFFSET_MASK, -.pnum = pnum, -.file = file, -}; +size_t len; size_t len = bytes; with that: Reviewed-by: Vladimir Sementsov-Ogievskiy +int status; QEDRequest request = { .l2_table = NULL }; uint64_t offset; int ret; qemu_co_mutex_lock(&s->table_lock); -ret = qed_find_cluster(s, &request, cb.pos, &len, &offset); -qed_is_allocated_cb(&cb, ret, offset, len); +ret = qed_find_cluster(s, &request, pos, &len, &offset); len is in-out parameter, you can't use it uninitialized. -/* The callback was invoked immediately */ -assert(cb.status != BDRV_BLOCK_OFFSET_MASK); +*pnum = len; +switch (ret) { +case QED_CLUSTER_FOUND: +*map = offset | qed_offset_into_cluster(s, pos); +status = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID; +*file = bs->file->bs; +break; +case QED_CLUSTER_ZERO: +status = BDRV_BLOCK_ZERO; +break; +case QED_CLUSTER_L2: +case QED_CLUSTER_L1: +status = 0; +break; +default: +assert(ret < 0); +status = ret; +break; +} qed_unref_l2_cache_entry(request.l2_table); qemu_co_mutex_unlock(&s->table_lock); -return cb.status; +return status; } static BDRVQEDState *acb_to_s(QEDAIOCB *acb) @@ -1595,7 +1567,7 @@ static BlockDriver bdrv_qed = { .bdrv_child_perm = bdrv_format_default_perms, .bdrv_create = bdrv_qed_create, .bdrv_has_zero_init = bdrv_has_zero_init_1, -.bdrv_co_get_block_status = bdrv_qed_co_get_block_status, +.bdrv_co_block_status = bdrv_qed_co_block_status, .bdrv_co_readv= bdrv_qed_co_readv, .bdrv_co_writev = bdrv_qed_co_writev, .bdrv_co_pwrite_zeroes= bdrv_qed_co_pwrite_zeroes, -- Best regards, Vladimir
[Qemu-devel] [Bug 1701973] Re: pread does not work right under qemu-sh4
This might be related to this fix: > https://git.qemu.org/?p=qemu.git;a=commit;h=8bf8e9df4a7d82c7a47cc961c9cdee1615595de0 FWIW, if you're interested in sh4, please join #debian-ports on OFTC and subscribe to the debian-superh mailing list. We're doing lots of sh4 development and testing QEMU in Debian. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1701973 Title: pread does not work right under qemu-sh4 Status in QEMU: New Bug description: The pread system call returns a wrong value in some case, in a program running under qemu-sh4 (version 2.9.0). How to reproduce: - Compile the program: sh4-linux-gnu-gcc-5 -O -Wall -static -o test-pread test-pread.c - Set environment variable for using qemu-sh4 (actually not needed, since the program is statically linked here). - ~/inst-qemu/2.9.0/bin/qemu-sh4 test-pread Expected output: ret=1 errno=0 Actual output: ret=0 errno=2 test-pread.c:44: assertion 'ret == 1' failed qemu: uncaught target signal 6 (Aborted) - core dumped To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1701973/+subscriptions
Re: [Qemu-devel] [PATCH RFC 0/9] block: Rewrite block drain begin/end
Am 30.11.2017 um 03:03 hat Fam Zheng geschrieben: > On Wed, 11/29 18:25, Kevin Wolf wrote: > > Am 29.11.2017 um 15:49 hat Fam Zheng geschrieben: > > > While we look at the fixes for 2.11, I briefly prototyped this series > > > to see if it makes sense as a simplification of the drain API for > > > 2.12. > > > > > > The idea is to let AioContext manage quiesce callbacks, then the block > > > layer only needs to do the in-flight request waiting. This lets us get > > > rid of the callback recursion (both up and down). > > > > So essentially you don't drain individual nodes any more, but whole > > AioContexts. I have a feeeling that this would be a step in the wrong > > direction. > > > > Not only would it completely bypass the path I/O requests take and > > potentially drain a lot more than is actually necessary, but it also > > requires that all nodes that are connected in a tree are in the same > > AioContext. > > Yeah, good point. Initially I wanted to introduce a BlockGraph object > which manages the per-graph draining, (i.e. where to register the > drain callbacks), but I felt lazy and used AioContext. > > Will that make it better? BlockGraph would be a proper abstraction > and will not limit the API to one AioContext per tree. There is only a single graph, so this would mean going back to global bdrv_drain_all() exclusively. What you really mean is probably connected components in the graph, but do we really want to manage merging and splitting object representing connected components when a node is added or removed from the graph? Especially when that graph change occurs in a drain callback? You can also still easily introduce bugs where graph changes during a drain end up in nodes not being drained, possibly drained twice, you still access the next pointer of a deleted node or you accidentally switch to draining a different component. It's probably possible to get this right, but essentially you're just switching from iterating a tree to iterating a list. You get roughly the same set of problems that you have to consider as today, and getting it right should be about the same difficulty. > > And finally, I don't really think that the recursion is even a problem. > > The problem is with graph changes made by callbacks that drain allows to > > run. With your changes, it might be a bit easier to avoid that > > bdrv_drain() itself gets into trouble due to graph changes, but this > > doesn't solve the problem for any (possibly indirect) callers of > > bdrv_drain(). > > The recursion is the one big place that can be easily broken by graph changes, > fixing this doesn't make the situation any worse. We could still fix the > indirect callers by taking references or by introducing "ubiquitous > coroutines". But hiding a bug in 80% of the cases where it shows isn't enough. I think the only real solution is to forbid graph changes until after any critical operation has completed. I haven't tried it out in practice, but I suppose we could use a CoMutex around them and take it in bdrv_drained_begin/end() and all other places that can get into trouble with graph changes. Kevin
Re: [Qemu-devel] [PATCH v18 05/10] xbitmap: add more operations
Wei Wang wrote: > /** > + * xb_clear_bit - clear a range of bits in the xbitmap Name mismatch. > + * @start: the start of the bit range, inclusive > + * @end: the end of the bit range, inclusive > + * > + * This function is used to clear a bit in the xbitmap. If all the bits of > the > + * bitmap are 0, the bitmap will be freed. > + */ > +void xb_clear_bit_range(struct xb *xb, unsigned long start, unsigned long > end) > +{ > + struct radix_tree_root *root = &xb->xbrt; > + struct radix_tree_node *node; > + void **slot; > + struct ida_bitmap *bitmap; > + unsigned int nbits; > + > + for (; start < end; start = (start | (IDA_BITMAP_BITS - 1)) + 1) { > + unsigned long index = start / IDA_BITMAP_BITS; > + unsigned long bit = start % IDA_BITMAP_BITS; > + > + bitmap = __radix_tree_lookup(root, index, &node, &slot); > + if (radix_tree_exception(bitmap)) { > + unsigned long ebit = bit + 2; > + unsigned long tmp = (unsigned long)bitmap; > + > + nbits = min(end - start + 1, BITS_PER_LONG - ebit); "nbits = min(end - start + 1," seems to expect that start == end is legal for clearing only 1 bit. But this function is no-op if start == end. Please clarify what "inclusive" intended. > + > + if (ebit >= BITS_PER_LONG) > + continue; (I don't understand how radix tree works, but generally this patchset looks fuzzy to me about boundary cases. Thus, I want to confirm that this is not an overlook.) Why is making "ebit >= BITS_PER_LONG" (e.g. start == 62) case a no-op correct? Aren't there bits which should have been cleared in this case? > + bitmap_clear(&tmp, ebit, nbits); > + if (tmp == RADIX_TREE_EXCEPTIONAL_ENTRY) > + __radix_tree_delete(root, node, slot); > + else > + rcu_assign_pointer(*slot, (void *)tmp); > + } else if (bitmap) { > + nbits = min(end - start + 1, IDA_BITMAP_BITS - bit); > + > + if (nbits != IDA_BITMAP_BITS) > + bitmap_clear(bitmap->bitmap, bit, nbits); > + > + if (nbits == IDA_BITMAP_BITS || > + bitmap_empty(bitmap->bitmap, IDA_BITMAP_BITS)) { > + kfree(bitmap); > + __radix_tree_delete(root, node, slot); > + } > + } > + } > +} > +static inline __always_inline void bitmap_clear(unsigned long *map, > + unsigned int start, > + unsigned int nbits) > +{ > + if (__builtin_constant_p(nbits) && nbits == 1) > + __clear_bit(start, map); > + else if (__builtin_constant_p(start & 7) && IS_ALIGNED(start, 8) && > + __builtin_constant_p(nbits & 7) && IS_ALIGNED(nbits, 8)) It looks strange to apply __builtin_constant_p test to variables after "& 7". > + memset((char *)map + start / 8, 0, nbits / 8); > + else > + __bitmap_clear(map, start, nbits); > +} > +
Re: [Qemu-devel] [PATCH v18 07/10] virtio-balloon: VIRTIO_BALLOON_F_SG
Wei Wang wrote: > +static inline int xb_set_page(struct virtio_balloon *vb, > +struct page *page, > +unsigned long *pfn_min, > +unsigned long *pfn_max) > +{ > + unsigned long pfn = page_to_pfn(page); > + int ret; > + > + *pfn_min = min(pfn, *pfn_min); > + *pfn_max = max(pfn, *pfn_max); > + > + do { > + ret = xb_preload_and_set_bit(&vb->page_xb, pfn, > + GFP_NOWAIT | __GFP_NOWARN); It is a bit of pity that __GFP_NOWARN here is applied to only xb_preload(). Memory allocation by xb_set_bit() will after all emit warnings. Maybe xb_init(&vb->page_xb); vb->page_xb.gfp_mask |= __GFP_NOWARN; is tolerable? Or, unconditionally apply __GFP_NOWARN at xb_init()? static inline void xb_init(struct xb *xb) { INIT_RADIX_TREE(&xb->xbrt, IDR_RT_MARKER | GFP_NOWAIT); } > + } while (unlikely(ret == -EAGAIN)); > + > + return ret; > +} > + > @@ -172,11 +283,18 @@ static unsigned fill_balloon(struct virtio_balloon *vb, > size_t num) > vb->num_pfns = 0; > > while ((page = balloon_page_pop(&pages))) { > + if (use_sg) { > + if (xb_set_page(vb, page, &pfn_min, &pfn_max) < 0) { > + __free_page(page); > + break; You cannot "break;" without consuming all pages in "pages". > + } > + } else { > + set_page_pfns(vb, vb->pfns + vb->num_pfns, page); > + } > + > balloon_page_enqueue(&vb->vb_dev_info, page); > > vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE; > - > - set_page_pfns(vb, vb->pfns + vb->num_pfns, page); > vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE; > if (!virtio_has_feature(vb->vdev, > VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) > @@ -212,9 +334,12 @@ static unsigned leak_balloon(struct virtio_balloon *vb, > size_t num) > struct page *page; > struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info; > LIST_HEAD(pages); > + bool use_sg = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_SG); You can pass use_sg as an argument to leak_balloon(). Then, you won't need to define leak_balloon_sg_oom(). Since xbitmap allocation does not use __GFP_DIRECT_RECLAIM, it is safe to reuse leak_balloon() for OOM path. Just be sure to pass use_sg == false because memory allocation for use_sg == true likely fails when called from OOM path. (But trying use_sg == true for OOM path and then fallback to use_sg == false is not bad?) > + unsigned long pfn_max = 0, pfn_min = ULONG_MAX; > > - /* We can only do one array worth at a time. */ > - num = min(num, ARRAY_SIZE(vb->pfns)); > + /* Traditionally, we can only do one array worth at a time. */ > + if (!use_sg) > + num = min(num, ARRAY_SIZE(vb->pfns)); > > mutex_lock(&vb->balloon_lock); > /* We can't release more pages than taken */ > diff --git a/include/uapi/linux/virtio_balloon.h > b/include/uapi/linux/virtio_balloon.h > index 343d7dd..37780a7 100644 > --- a/include/uapi/linux/virtio_balloon.h > +++ b/include/uapi/linux/virtio_balloon.h > @@ -34,6 +34,7 @@ > #define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming > pages */ > #define VIRTIO_BALLOON_F_STATS_VQ1 /* Memory Stats virtqueue */ > #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */ > +#define VIRTIO_BALLOON_F_SG 3 /* Use sg instead of PFN lists */ Want more explicit comment that PFN lists will be used on OOM and therefore the host side must be prepared for both sg and PFN lists even if negotiated?
Re: [Qemu-devel] [PATCH v4 05/32] migration: implement "postcopy-pause" src logic
* Peter Xu (pet...@redhat.com) wrote: > Now when network down for postcopy, the source side will not fail the > migration. Instead we convert the status into this new paused state, and > we will try to wait for a rescue in the future. > > If a recovery is detected, migration_thread() will reset its local > variables to prepare for that. > > Reviewed-by: Dr. David Alan Gilbert That's still OK; you might want to consider reusing the 'pause_sem' that I added to MigrationStatus for the other pause case. Dave > Signed-off-by: Peter Xu > --- > migration/migration.c | 98 > +++--- > migration/migration.h | 3 ++ > migration/trace-events | 1 + > 3 files changed, 98 insertions(+), 4 deletions(-) > > diff --git a/migration/migration.c b/migration/migration.c > index dd270f8bc5..46e7ca36a4 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -,6 +,8 @@ static void migrate_fd_cleanup(void *opaque) > } > notifier_list_notify(&migration_state_notifiers, s); > block_cleanup_parameters(s); > + > +qemu_sem_destroy(&s->postcopy_pause_sem); > } > > void migrate_set_error(MigrationState *s, const Error *error) > @@ -1267,6 +1269,7 @@ MigrationState *migrate_init(void) > s->migration_thread_running = false; > error_free(s->error); > s->error = NULL; > +qemu_sem_init(&s->postcopy_pause_sem, 0); > > migrate_set_state(&s->state, MIGRATION_STATUS_NONE, > MIGRATION_STATUS_SETUP); > > @@ -2159,6 +2162,80 @@ bool migrate_colo_enabled(void) > return s->enabled_capabilities[MIGRATION_CAPABILITY_X_COLO]; > } > > +typedef enum MigThrError { > +/* No error detected */ > +MIG_THR_ERR_NONE = 0, > +/* Detected error, but resumed successfully */ > +MIG_THR_ERR_RECOVERED = 1, > +/* Detected fatal error, need to exit */ > +MIG_THR_ERR_FATAL = 2, > +} MigThrError; > + > +/* > + * We don't return until we are in a safe state to continue current > + * postcopy migration. Returns MIG_THR_ERR_RECOVERED if recovered, or > + * MIG_THR_ERR_FATAL if unrecovery failure happened. > + */ > +static MigThrError postcopy_pause(MigrationState *s) > +{ > +assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE); > +migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE, > + MIGRATION_STATUS_POSTCOPY_PAUSED); > + > +/* Current channel is possibly broken. Release it. */ > +assert(s->to_dst_file); > +qemu_file_shutdown(s->to_dst_file); > +qemu_fclose(s->to_dst_file); > +s->to_dst_file = NULL; > + > +error_report("Detected IO failure for postcopy. " > + "Migration paused."); > + > +/* > + * We wait until things fixed up. Then someone will setup the > + * status back for us. > + */ > +while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) { > +qemu_sem_wait(&s->postcopy_pause_sem); > +} > + > +trace_postcopy_pause_continued(); > + > +return MIG_THR_ERR_RECOVERED; > +} > + > +static MigThrError migration_detect_error(MigrationState *s) > +{ > +int ret; > + > +/* Try to detect any file errors */ > +ret = qemu_file_get_error(s->to_dst_file); > + > +if (!ret) { > +/* Everything is fine */ > +return MIG_THR_ERR_NONE; > +} > + > +if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) { > +/* > + * For postcopy, we allow the network to be down for a > + * while. After that, it can be continued by a > + * recovery phase. > + */ > +return postcopy_pause(s); > +} else { > +/* > + * For precopy (or postcopy with error outside IO), we fail > + * with no time. > + */ > +migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED); > +trace_migration_thread_file_err(); > + > +/* Time to stop the migration, now. */ > +return MIG_THR_ERR_FATAL; > +} > +} > + > /* > * Master migration thread on the source VM. > * It drives the migration and pumps the data down the outgoing channel. > @@ -2183,6 +2260,7 @@ static void *migration_thread(void *opaque) > /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */ > enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE; > bool enable_colo = migrate_colo_enabled(); > +MigThrError thr_error; > > rcu_register_thread(); > > @@ -2255,12 +2333,24 @@ static void *migration_thread(void *opaque) > } > } > > -if (qemu_file_get_error(s->to_dst_file)) { > -migrate_set_state(&s->state, current_active_state, > - MIGRATION_STATUS_FAILED); > -trace_migration_thread_file_err(); > +/* > + * Try to detect any kind of failures, and see whether we > + * should stop the migration now. > + */ > +thr_error = migration_detect_error(s); > +
Re: [Qemu-devel] [PATCH v2 for-2.12 07/16] s390x: handle exceptions during s390_cpu_virt_mem_rw() correctly (TCG)
On 29.11.2017 21:26, David Hildenbrand wrote: > s390_cpu_virt_mem_rw() must always return, so callers can react on > an exception (e.g. see ioinst_handle_stcrw()). > > However, for TCG we always have to exit the cpu loop (and restore the > cpu state before that) if we injected a program interrupt. So let's > introduce and use s390_cpu_virt_mem_handle_exc() in code that is not > purely KVM. > > Directly pass the retaddr we already have available in these functions. > > Signed-off-by: David Hildenbrand > --- > hw/s390x/s390-pci-inst.c | 7 +++ > target/s390x/cpu.h| 1 + > target/s390x/ioinst.c | 20 +--- > target/s390x/mmu_helper.c | 14 ++ > 4 files changed, 39 insertions(+), 3 deletions(-) > > diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c > index 8123705dfd..6f41083244 100644 > --- a/hw/s390x/s390-pci-inst.c > +++ b/hw/s390x/s390-pci-inst.c > @@ -163,6 +163,7 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t > ra) > } > > if (s390_cpu_virt_mem_read(cpu, env->regs[r2], r2, buffer, > sizeof(*reqh))) { > +s390_cpu_virt_mem_handle_exc(cpu, ra); > return 0; > } > reqh = (ClpReqHdr *)buffer; > @@ -174,6 +175,7 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t > ra) > > if (s390_cpu_virt_mem_read(cpu, env->regs[r2], r2, buffer, > req_len + sizeof(*resh))) { > +s390_cpu_virt_mem_handle_exc(cpu, ra); > return 0; > } > resh = (ClpRspHdr *)(buffer + req_len); > @@ -189,6 +191,7 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t > ra) > > if (s390_cpu_virt_mem_read(cpu, env->regs[r2], r2, buffer, > req_len + res_len)) { > +s390_cpu_virt_mem_handle_exc(cpu, ra); > return 0; > } [...] Having to change all these spots is kind of ugly ... and I guess it will also be quite error-prone in the future (if someone is developing new code under KVM only and then forgets to add these handlers for TCG). Can't you do that simply always at the end of s390_cpu_virt_mem_rw() instead, when the function has been called in non-checking mode (i.e. hostbuf != NULL)? Then we only need the extra-dance in places where we call the _check function? > @@ -212,9 +214,12 @@ void ioinst_handle_stcrw(S390CPU *cpu, uint32_t ipb, > uintptr_t ra) > > if (s390_cpu_virt_mem_write(cpu, addr, ar, &crw, sizeof(crw)) == 0) { > setcc(cpu, cc); > -} else if (cc == 0) { > -/* Write failed: requeue CRW since STCRW is a suppressing > instruction */ > -css_undo_stcrw(&crw); > +} else { > +if (cc == 0) { > +/* Write failed: requeue CRW since STCRW is suppressing */ > +css_undo_stcrw(&crw); > +} > +s390_cpu_virt_mem_handle_exc(cpu, ra); > } > } That hunk would then need to do _check first and in case there is a failure, call css_undo_stcrw() before the s390_cpu_virt_mem_handle_exc() function, I think. Alternatively, the s390_cpu_virt_mem_rw() function could get an additional parameter that points to a callback function which is used for "clean-up" right before the cpu_loop_exit ... and in this case the callback function would contain the css_undo_stcrw(). What do you think? Thomas
Re: [Qemu-devel] [PATCH v4 13/20] raw: Switch to .bdrv_co_block_status()
12.10.2017 21:59, Eric Blake wrote: We are gradually moving away from sector-based interfaces, towards byte-based. Update the raw driver accordingly. Signed-off-by: Eric Blake Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH v4 14/20] sheepdog: Switch to .bdrv_co_block_status()
12.10.2017 21:59, Eric Blake wrote: We are gradually moving away from sector-based interfaces, towards byte-based. Update the sheepdog driver accordingly. Signed-off-by: Eric Blake Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH v4 15/20] vdi: Avoid bitrot of debugging code
12.10.2017 21:59, Eric Blake wrote: Rework the debug define so that we always get -Wformat checking, even when debugging is disabled. Signed-off-by: Eric Blake Reviewed-by: Stefan Weil Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH v4 06/32] migration: allow dst vm pause on postcopy
* Peter Xu (pet...@redhat.com) wrote: > When there is IO error on the incoming channel (e.g., network down), > instead of bailing out immediately, we allow the dst vm to switch to the > new POSTCOPY_PAUSE state. Currently it is still simple - it waits the > new semaphore, until someone poke it for another attempt. > > Signed-off-by: Peter Xu As noted last time we need one of the later patches (8/32) to stop a race; but OK Reviewed-by: Dr. David Alan Gilbert > --- > migration/migration.c | 1 + > migration/migration.h | 3 +++ > migration/savevm.c | 60 > -- > migration/trace-events | 2 ++ > 4 files changed, 64 insertions(+), 2 deletions(-) > > diff --git a/migration/migration.c b/migration/migration.c > index 46e7ca36a4..b166e19785 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -150,6 +150,7 @@ MigrationIncomingState > *migration_incoming_get_current(void) > memset(&mis_current, 0, sizeof(MigrationIncomingState)); > qemu_mutex_init(&mis_current.rp_mutex); > qemu_event_init(&mis_current.main_thread_load_event, false); > +qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0); > once = true; > } > return &mis_current; > diff --git a/migration/migration.h b/migration/migration.h > index 36aaa13f50..55894ecb79 100644 > --- a/migration/migration.h > +++ b/migration/migration.h > @@ -61,6 +61,9 @@ struct MigrationIncomingState { > /* The coroutine we should enter (back) after failover */ > Coroutine *migration_incoming_co; > QemuSemaphore colo_incoming_sem; > + > +/* notify PAUSED postcopy incoming migrations to try to continue */ > +QemuSemaphore postcopy_pause_sem_dst; > }; > > MigrationIncomingState *migration_incoming_get_current(void); > diff --git a/migration/savevm.c b/migration/savevm.c > index 1da0255cd7..93e308ebf0 100644 > --- a/migration/savevm.c > +++ b/migration/savevm.c > @@ -1529,8 +1529,8 @@ static int > loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis, > */ > static void *postcopy_ram_listen_thread(void *opaque) > { > -QEMUFile *f = opaque; > MigrationIncomingState *mis = migration_incoming_get_current(); > +QEMUFile *f = mis->from_src_file; > int load_res; > > migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE, > @@ -1544,6 +1544,14 @@ static void *postcopy_ram_listen_thread(void *opaque) > */ > qemu_file_set_blocking(f, true); > load_res = qemu_loadvm_state_main(f, mis); > + > +/* > + * This is tricky, but, mis->from_src_file can change after it > + * returns, when postcopy recovery happened. In the future, we may > + * want a wrapper for the QEMUFile handle. > + */ > +f = mis->from_src_file; > + > /* And non-blocking again so we don't block in any cleanup */ > qemu_file_set_blocking(f, false); > > @@ -1626,7 +1634,7 @@ static int > loadvm_postcopy_handle_listen(MigrationIncomingState *mis) > /* Start up the listening thread and wait for it to signal ready */ > qemu_sem_init(&mis->listen_thread_sem, 0); > qemu_thread_create(&mis->listen_thread, "postcopy/listen", > - postcopy_ram_listen_thread, mis->from_src_file, > + postcopy_ram_listen_thread, NULL, > QEMU_THREAD_DETACHED); > qemu_sem_wait(&mis->listen_thread_sem); > qemu_sem_destroy(&mis->listen_thread_sem); > @@ -2011,11 +2019,44 @@ void qemu_loadvm_state_cleanup(void) > } > } > > +/* Return true if we should continue the migration, or false. */ > +static bool postcopy_pause_incoming(MigrationIncomingState *mis) > +{ > +trace_postcopy_pause_incoming(); > + > +migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE, > + MIGRATION_STATUS_POSTCOPY_PAUSED); > + > +assert(mis->from_src_file); > +qemu_file_shutdown(mis->from_src_file); > +qemu_fclose(mis->from_src_file); > +mis->from_src_file = NULL; > + > +assert(mis->to_src_file); > +qemu_file_shutdown(mis->to_src_file); > +qemu_mutex_lock(&mis->rp_mutex); > +qemu_fclose(mis->to_src_file); > +mis->to_src_file = NULL; > +qemu_mutex_unlock(&mis->rp_mutex); > + > +error_report("Detected IO failure for postcopy. " > + "Migration paused."); > + > +while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) { > +qemu_sem_wait(&mis->postcopy_pause_sem_dst); > +} > + > +trace_postcopy_pause_incoming_continued(); > + > +return true; > +} > + > static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis) > { > uint8_t section_type; > int ret = 0; > > +retry: > while (true) { > section_type = qemu_get_byte(f); > > @@ -2060,6 +2101,21 @@ static int qemu_loadvm_state_main(QEMUFile *f, > MigrationIncomingState *mis) > out: > if (ret < 0) { > qemu_file_set_error(f,
Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
On Wed, 29 Nov 2017 18:50:19 + "Dr. David Alan Gilbert (git)" wrote: > From: "Dr. David Alan Gilbert" > > Hi, > This is an experimental set that reworks the way the vhost > code handles changes in physical address space layout that > came from a discussion with Igor. Thanks for looking into it. > Instead of updating and trying to merge sections of address > space on each add/remove callback, we wait until the commit phase > and go through and rebuild a list by walking the Flatview of > memory and end up producing an ordered list. > We compare the list to the old list to trigger updates. > > Note, only very lightly tested so far, I'm just trying to see if it's > the right shape. > > Igor, is this what you were intending? I was thinking about a little less intrusive approach where vhost_region_add/del are modified to maintain sorted by GPA array of mem_sections, vhost_dev::mem is dropped altogether and vhost_memory_region array is build/used/freed on every vhost_commit(). Maintaining sorted array should roughly cost us O(2 log n) if binary search is used. However I like your idea with iterator even more as it have potential to make it even faster O(n) if we get rid of quadratic and relatively complex vhost_update_compare_list(). Pls, see comments on individual patches. > Dave > > Dr. David Alan Gilbert (7): > memory: address_space_iterate > vhost: Move log_dirty check > vhost: New memory update functions > vhost: update_mem_cb implementation > vhost: Compare new and old memory lists > vhost: Copy updated region data into device state > vhost: Remove vhost_set_memory and children > > hw/virtio/trace-events | 8 + > hw/virtio/vhost.c | 424 > ++--- > include/exec/memory.h | 23 +++ > memory.c | 22 +++ > 4 files changed, 241 insertions(+), 236 deletions(-) >
Re: [Qemu-devel] [PATCH v4 16/20] vdi: Switch to .bdrv_co_block_status()
12.10.2017 21:59, Eric Blake wrote: We are gradually moving away from sector-based interfaces, towards byte-based. Update the vdi driver accordingly. Note that the TODO is already covered (the block layer guarantees bounds of its requests), and that we can remove the now-unused s->block_sectors. Signed-off-by: Eric Blake --- v4: rebase to interface tweak v3: no change v2: rebase to mapping flag --- block/vdi.c | 33 + 1 file changed, 13 insertions(+), 20 deletions(-) diff --git a/block/vdi.c b/block/vdi.c index 6f83221ddc..b30886823d 100644 --- a/block/vdi.c +++ b/block/vdi.c @@ -171,8 +171,6 @@ typedef struct { uint32_t *bmap; /* Size of block (bytes). */ uint32_t block_size; -/* Size of block (sectors). */ -uint32_t block_sectors; /* First sector of block map. */ uint32_t bmap_sector; /* VDI header (converted to host endianness). */ @@ -462,7 +460,6 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags, bs->total_sectors = header.disk_size / SECTOR_SIZE; s->block_size = header.block_size; -s->block_sectors = header.block_size / SECTOR_SIZE; s->bmap_sector = header.offset_bmap / SECTOR_SIZE; s->header = header; @@ -508,33 +505,29 @@ static int vdi_reopen_prepare(BDRVReopenState *state, return 0; } -static int64_t coroutine_fn vdi_co_get_block_status(BlockDriverState *bs, -int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file) +static int coroutine_fn vdi_co_block_status(BlockDriverState *bs, +bool want_zero, +int64_t offset, int64_t bytes, +int64_t *pnum, int64_t *map, +BlockDriverState **file) { -/* TODO: Check for too large sector_num (in bdrv_is_allocated or here). */ BDRVVdiState *s = (BDRVVdiState *)bs->opaque; -size_t bmap_index = sector_num / s->block_sectors; -size_t sector_in_block = sector_num % s->block_sectors; -int n_sectors = s->block_sectors - sector_in_block; +size_t bmap_index = offset / s->block_size; +size_t index_in_block = offset % s->block_size; uint32_t bmap_entry = le32_to_cpu(s->bmap[bmap_index]); -uint64_t offset; int result; -logout("%p, %" PRId64 ", %d, %p\n", bs, sector_num, nb_sectors, pnum); -if (n_sectors > nb_sectors) { -n_sectors = nb_sectors; -} -*pnum = n_sectors; +logout("%p, %" PRId64 ", %" PRId64 ", %p\n", bs, offset, bytes, pnum); may be not bad to update message, as you changed numbers, like logout("%p, offset %" PRId64 ", bytes %" PRId64 ", %p\n", bs, offset, bytes, pnum); but for me it is ok as is too. +*pnum = MIN(s->block_size, bytes); looks like it should be MIN(s->block_size - index_in_block, bytes); with that: Reviewed-by: Vladimir Sementsov-Ogievskiy result = VDI_IS_ALLOCATED(bmap_entry); if (!result) { return 0; } -offset = s->header.offset_data + - (uint64_t)bmap_entry * s->block_size + - sector_in_block * SECTOR_SIZE; +*map = s->header.offset_data + (uint64_t)bmap_entry * s->block_size + +index_in_block; *file = bs->file->bs; -return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset; +return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID; } static int coroutine_fn @@ -902,7 +895,7 @@ static BlockDriver bdrv_vdi = { .bdrv_child_perm = bdrv_format_default_perms, .bdrv_create = vdi_create, .bdrv_has_zero_init = bdrv_has_zero_init_1, -.bdrv_co_get_block_status = vdi_co_get_block_status, +.bdrv_co_block_status = vdi_co_block_status, .bdrv_make_empty = vdi_make_empty, .bdrv_co_preadv = vdi_co_preadv, -- Best regards, Vladimir
Re: [Qemu-devel] [RFC 4/7] vhost: update_mem_cb implementation
On Wed, 29 Nov 2017 18:50:23 + "Dr. David Alan Gilbert (git)" wrote: > From: "Dr. David Alan Gilbert" > > Add the meat of update_mem_cb; this is called for each region, > to add a region to our temporary list. > Our temporary list is in order we look to see if this > region can be merged with the current head of list. > > Signed-off-by: Dr. David Alan Gilbert > --- > hw/virtio/trace-events | 2 ++ > hw/virtio/vhost.c | 55 > +- > 2 files changed, 56 insertions(+), 1 deletion(-) > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events > index 4a493bcd46..92fadec192 100644 > --- a/hw/virtio/trace-events > +++ b/hw/virtio/trace-events > @@ -2,6 +2,8 @@ > > # hw/virtio/vhost.c > vhost_section(const char *name, int r) "%s:%d" > +vhost_update_mem_cb(const char *name, uint64_t gpa, uint64_t size, uint64_t > host) "%s: 0x%"PRIx64"+0x%"PRIx64" @ 0x%"PRIx64 > +vhost_update_mem_cb_abut(const char *name, uint64_t new_size) "%s: 0x%"PRIx64 > > # hw/virtio/virtio.c > virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned > out_num) "elem %p size %zd in_num %u out_num %u" > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > index c959a59fb3..7e3c6ae032 100644 > --- a/hw/virtio/vhost.c > +++ b/hw/virtio/vhost.c > @@ -638,11 +638,64 @@ struct vhost_update_mem_tmp { > /* Called for each MRS from vhost_update_mem */ > static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque) > { > +struct vhost_update_mem_tmp *vtmp = opaque; > +struct vhost_memory_region *cur_vmr; > +bool need_add = true; > +uint64_t mrs_size; > +uint64_t mrs_gpa; > +uintptr_t mrs_host; > + > if (!vhost_section(mrs)) { > return 0; > } > +mrs_size = int128_get64(mrs->size); > +mrs_gpa = mrs->offset_within_address_space; > +mrs_host = (uintptr_t)memory_region_get_ram_ptr(mrs->mr) + > + mrs->offset_within_region; > + > +trace_vhost_update_mem_cb(mrs->mr->name, mrs_gpa, mrs_size, > + (uint64_t)mrs_host); > + > +if (vtmp->nregions) { What forces you to maintain helper vhost_memory_region array instead of MemoryRegionSection array? > +/* Since we already have at least one region, lets see if > + * this extends it; since we're scanning in order, we only > + * have to look at the last one, and the FlatView that calls > + * us shouldn't have overlaps. > + */ > +struct vhost_memory_region *prev_vmr = vtmp->regions + > + (vtmp->nregions - 1); > +uint64_t prev_gpa_start = prev_vmr->guest_phys_addr; > +uint64_t prev_gpa_end = range_get_last(prev_gpa_start, > + prev_vmr->memory_size); > +uint64_t prev_host_start = prev_vmr->userspace_addr; > +uint64_t prev_host_end = range_get_last(prev_host_start, > + prev_vmr->memory_size); > + > +if (prev_gpa_end + 1 == mrs_gpa && > +prev_host_end + 1 == mrs_host && > +(!vtmp->dev->vhost_ops->vhost_backend_can_merge || > +vtmp->dev->vhost_ops->vhost_backend_can_merge(vtmp->dev, > +mrs_host, mrs_size, > +prev_host_start, prev_vmr->memory_size))) { > +/* The two regions abut */ > +need_add = false; > +mrs_size = mrs_size + prev_vmr->memory_size; > +prev_vmr->memory_size = mrs_size; > +trace_vhost_update_mem_cb_abut(mrs->mr->name, mrs_size); > +} > +} > + > +if (need_add) { > +vtmp->nregions++; > +vtmp->regions = g_realloc_n(vtmp->regions, vtmp->nregions, > +sizeof(vtmp->regions[0])); > +cur_vmr = &vtmp->regions[vtmp->nregions - 1]; > +cur_vmr->guest_phys_addr = mrs_gpa; > +cur_vmr->memory_size = mrs_size; > +cur_vmr->userspace_addr = mrs_host; > +cur_vmr->flags_padding = 0; > +} > > -/* TODO */ > return 0; > } >
Re: [Qemu-devel] [PATCH v2 for-2.12 08/16] s390x/tcg: don't exit the cpu loop in s390_cpu_virt_mem_rw()
On 29.11.2017 21:26, David Hildenbrand wrote: > s390_cpu_virt_mem_rw() must always return, so callers can react on > an exception (e.g. see ioinst_handle_stcrw()). > > Therefore, using program_interrupt() is wrong. Fix that up. > > Signed-off-by: David Hildenbrand > --- > target/s390x/mmu_helper.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/target/s390x/mmu_helper.c b/target/s390x/mmu_helper.c > index dbe2f511f8..2c7f3d7d95 100644 > --- a/target/s390x/mmu_helper.c > +++ b/target/s390x/mmu_helper.c > @@ -459,7 +459,7 @@ static int translate_pages(S390CPU *cpu, vaddr addr, int > nr_pages, > } > if (!address_space_access_valid(&address_space_memory, pages[i], > TARGET_PAGE_SIZE, is_write)) { > -program_interrupt(env, PGM_ADDRESSING, ILEN_AUTO); > +trigger_pgm_exception(env, PGM_ADDRESSING, ILEN_AUTO); > return -EFAULT; > } > addr += TARGET_PAGE_SIZE; Is that still right when running with KVM? I think the exception will then silently be ignored instead? Thomas
Re: [Qemu-devel] [PATCH v4 17/20] vmdk: Switch to .bdrv_co_block_status()
12.10.2017 21:59, Eric Blake wrote: We are gradually moving away from sector-based interfaces, towards byte-based. Update the vmdk driver accordingly. Signed-off-by: Eric Blake --- v4: rebase to interface tweak v3: no change v2: rebase to mapping flag --- block/vmdk.c | 28 +--- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/block/vmdk.c b/block/vmdk.c index c665bcc977..624b5c296a 100644 --- a/block/vmdk.c +++ b/block/vmdk.c @@ -1303,25 +1303,27 @@ static inline uint64_t vmdk_find_index_in_cluster(VmdkExtent *extent, return offset / BDRV_SECTOR_SIZE; } -static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs, -int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file) +static int coroutine_fn vmdk_co_block_status(BlockDriverState *bs, + bool want_zero, + int64_t offset, int64_t bytes, + int64_t *pnum, int64_t *map, + BlockDriverState **file) { BDRVVmdkState *s = bs->opaque; int64_t index_in_cluster, n, ret; -uint64_t offset; +uint64_t cluster_offset; VmdkExtent *extent; -extent = find_extent(s, sector_num, NULL); +extent = find_extent(s, offset >> BDRV_SECTOR_BITS, NULL); if (!extent) { return 0; } qemu_co_mutex_lock(&s->lock); -ret = get_cluster_offset(bs, extent, NULL, - sector_num * 512, false, &offset, +ret = get_cluster_offset(bs, extent, NULL, offset, false, &cluster_offset, 0, 0); qemu_co_mutex_unlock(&s->lock); -index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num); +index_in_cluster = vmdk_find_offset_in_cluster(extent, offset); vmdk_find_index_in_cluster becomes unused. with it dropped: Reviewed-by: Vladimir Sementsov-Ogievskiy switch (ret) { case VMDK_ERROR: ret = -EIO; @@ -1336,18 +1338,14 @@ static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs, ret = BDRV_BLOCK_DATA; if (!extent->compressed) { ret |= BDRV_BLOCK_OFFSET_VALID; -ret |= (offset + (index_in_cluster << BDRV_SECTOR_BITS)) -& BDRV_BLOCK_OFFSET_MASK; +*map = cluster_offset + index_in_cluster; } *file = extent->file->bs; break; } -n = extent->cluster_sectors - index_in_cluster; -if (n > nb_sectors) { -n = nb_sectors; -} -*pnum = n; +n = extent->cluster_sectors * BDRV_SECTOR_SIZE - index_in_cluster; +*pnum = MIN(n, bytes); return ret; } @@ -2393,7 +2391,7 @@ static BlockDriver bdrv_vmdk = { .bdrv_close = vmdk_close, .bdrv_create = vmdk_create, .bdrv_co_flush_to_disk= vmdk_co_flush, -.bdrv_co_get_block_status = vmdk_co_get_block_status, +.bdrv_co_block_status = vmdk_co_block_status, .bdrv_get_allocated_file_size = vmdk_get_allocated_file_size, .bdrv_has_zero_init = vmdk_has_zero_init, .bdrv_get_specific_info = vmdk_get_specific_info, -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH v2 for-2.12 10/16] s390x/tcg: use s390_program_interrupt() in SCLP Service Call
On 29.11.2017 21:26, David Hildenbrand wrote: > Now we can drop potential_page_fault(). While at it, move the > unlock further up, looks cleaner. > > Signed-off-by: David Hildenbrand > --- > target/s390x/misc_helper.c | 5 ++--- > target/s390x/translate.c | 1 - > 2 files changed, 2 insertions(+), 4 deletions(-) > > diff --git a/target/s390x/misc_helper.c b/target/s390x/misc_helper.c > index 556340756c..02654617b3 100644 > --- a/target/s390x/misc_helper.c > +++ b/target/s390x/misc_helper.c > @@ -62,11 +62,10 @@ uint32_t HELPER(servc)(CPUS390XState *env, uint64_t r1, > uint64_t r2) > { > qemu_mutex_lock_iothread(); > int r = sclp_service_call(env, r1, r2); > +qemu_mutex_unlock_iothread(); > if (r < 0) { > -program_interrupt(env, -r, 4); > -r = 0; > +s390_program_interrupt(env, -r, 4, GETPC()); > } > -qemu_mutex_unlock_iothread(); > return r; > } > > diff --git a/target/s390x/translate.c b/target/s390x/translate.c > index d0859c4bc7..76b222b0ce 100644 > --- a/target/s390x/translate.c > +++ b/target/s390x/translate.c > @@ -3704,7 +3704,6 @@ static ExitStatus op_sqxb(DisasContext *s, DisasOps *o) > static ExitStatus op_servc(DisasContext *s, DisasOps *o) > { > check_privileged(s); > -potential_page_fault(s); > gen_helper_servc(cc_op, cpu_env, o->in2, o->in1); > set_cc_static(s); > return NO_EXIT; Reviewed-by: Thomas Huth
Re: [Qemu-devel] [PATCH v2 for-2.12 11/16] s390x/tcg: use s390_program_interrupt() in DIAG
On 29.11.2017 21:26, David Hildenbrand wrote: > Now we can drop the two save statements in the translate function. > > Reviewed-by: Richard Henderson > Signed-off-by: David Hildenbrand > --- > target/s390x/misc_helper.c | 2 +- > target/s390x/translate.c | 3 --- > 2 files changed, 1 insertion(+), 4 deletions(-) > > diff --git a/target/s390x/misc_helper.c b/target/s390x/misc_helper.c > index 02654617b3..ee6179ef89 100644 > --- a/target/s390x/misc_helper.c > +++ b/target/s390x/misc_helper.c > @@ -101,7 +101,7 @@ void HELPER(diag)(CPUS390XState *env, uint32_t r1, > uint32_t r3, uint32_t num) > } > > if (r) { > -program_interrupt(env, PGM_SPECIFICATION, ILEN_AUTO); > +s390_program_interrupt(env, PGM_SPECIFICATION, ILEN_AUTO, GETPC()); > } > } > > diff --git a/target/s390x/translate.c b/target/s390x/translate.c > index 76b222b0ce..cf8ffa217e 100644 > --- a/target/s390x/translate.c > +++ b/target/s390x/translate.c > @@ -2124,9 +2124,6 @@ static ExitStatus op_diag(DisasContext *s, DisasOps *o) > TCGv_i32 func_code = tcg_const_i32(get_field(s->fields, i2)); > > check_privileged(s); > -update_psw_addr(s); > -gen_op_calc_cc(s); > - > gen_helper_diag(cpu_env, r1, r3, func_code); > > tcg_temp_free_i32(func_code); > Maybe merge it with patch 06/16 ? Thomas
Re: [Qemu-devel] [qemu-s390x] [PATCH v2 for-2.12 13/16] s390x/tcg: use s390_program_interrupt() in SACF
On 29.11.2017 21:26, David Hildenbrand wrote: > Convert this user, too. > > Reviewed-by: Richard Henderson > Signed-off-by: David Hildenbrand > --- > target/s390x/cc_helper.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/target/s390x/cc_helper.c b/target/s390x/cc_helper.c > index f008897e84..5d91e458a8 100644 > --- a/target/s390x/cc_helper.c > +++ b/target/s390x/cc_helper.c > @@ -564,7 +564,7 @@ void HELPER(sacf)(CPUS390XState *env, uint64_t a1) > break; > default: > HELPER_LOG("unknown sacf mode: %" PRIx64 "\n", a1); > -program_interrupt(env, PGM_SPECIFICATION, 2); > +s390_program_interrupt(env, PGM_SPECIFICATION, 2, GETPC()); > break; > } > } > Reviewed-by: Thomas Huth
Re: [Qemu-devel] [qemu-s390x] [PATCH v2 for-2.12 14/16] s390x/tcg: use s390_program_interrupt() in STSI
On 29.11.2017 21:26, David Hildenbrand wrote: > STSI needs some more love, but let's do one step at a time. > We can now drop potential_page_fault(). > > Reviewed-by: Richard Henderson > Signed-off-by: David Hildenbrand > --- > target/s390x/misc_helper.c | 2 +- > target/s390x/translate.c | 1 - > 2 files changed, 1 insertion(+), 2 deletions(-) > > diff --git a/target/s390x/misc_helper.c b/target/s390x/misc_helper.c > index a911bff706..6d766ce1e7 100644 > --- a/target/s390x/misc_helper.c > +++ b/target/s390x/misc_helper.c > @@ -184,7 +184,7 @@ uint32_t HELPER(stsi)(CPUS390XState *env, uint64_t a0, > if ((r0 & STSI_LEVEL_MASK) <= STSI_LEVEL_3 && > ((r0 & STSI_R0_RESERVED_MASK) || (r1 & STSI_R1_RESERVED_MASK))) { > /* valid function code, invalid reserved bits */ > -program_interrupt(env, PGM_SPECIFICATION, 4); > +s390_program_interrupt(env, PGM_SPECIFICATION, 4, GETPC()); > } > > sel1 = r0 & STSI_R0_SEL1_MASK; > diff --git a/target/s390x/translate.c b/target/s390x/translate.c > index f26fa64a78..1ce1390699 100644 > --- a/target/s390x/translate.c > +++ b/target/s390x/translate.c > @@ -3988,7 +3988,6 @@ static ExitStatus op_stpt(DisasContext *s, DisasOps *o) > static ExitStatus op_stsi(DisasContext *s, DisasOps *o) > { > check_privileged(s); > -potential_page_fault(s); > gen_helper_stsi(cc_op, cpu_env, o->in2, regs[0], regs[1]); > set_cc_static(s); > return NO_EXIT; Reviewed-by: Thomas Huth
Re: [Qemu-devel] [PATCH v2 for-2.12 08/16] s390x/tcg: don't exit the cpu loop in s390_cpu_virt_mem_rw()
On 30.11.2017 12:39, Thomas Huth wrote: > On 29.11.2017 21:26, David Hildenbrand wrote: >> s390_cpu_virt_mem_rw() must always return, so callers can react on >> an exception (e.g. see ioinst_handle_stcrw()). >> >> Therefore, using program_interrupt() is wrong. Fix that up. >> >> Signed-off-by: David Hildenbrand >> --- >> target/s390x/mmu_helper.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/target/s390x/mmu_helper.c b/target/s390x/mmu_helper.c >> index dbe2f511f8..2c7f3d7d95 100644 >> --- a/target/s390x/mmu_helper.c >> +++ b/target/s390x/mmu_helper.c >> @@ -459,7 +459,7 @@ static int translate_pages(S390CPU *cpu, vaddr addr, int >> nr_pages, >> } >> if (!address_space_access_valid(&address_space_memory, pages[i], >> TARGET_PAGE_SIZE, is_write)) { >> -program_interrupt(env, PGM_ADDRESSING, ILEN_AUTO); >> +trigger_pgm_exception(env, PGM_ADDRESSING, ILEN_AUTO); >> return -EFAULT; >> } >> addr += TARGET_PAGE_SIZE; > > Is that still right when running with KVM? I think the exception will > then silently be ignored instead? Good point (for older KVM). And ugly. if (kvm_enabled()) { kvm_s390_program_interrupt... } else { trigger_pgm_exception(env, PGM_ADDRESSING, ILEN_AUTO); } > > Thomas > -- Thanks, David / dhildenb
Re: [Qemu-devel] [PATCH v2 for-2.12 11/16] s390x/tcg: use s390_program_interrupt() in DIAG
On 30.11.2017 12:53, Thomas Huth wrote: > On 29.11.2017 21:26, David Hildenbrand wrote: >> Now we can drop the two save statements in the translate function. >> >> Reviewed-by: Richard Henderson >> Signed-off-by: David Hildenbrand >> --- >> target/s390x/misc_helper.c | 2 +- >> target/s390x/translate.c | 3 --- >> 2 files changed, 1 insertion(+), 4 deletions(-) >> >> diff --git a/target/s390x/misc_helper.c b/target/s390x/misc_helper.c >> index 02654617b3..ee6179ef89 100644 >> --- a/target/s390x/misc_helper.c >> +++ b/target/s390x/misc_helper.c >> @@ -101,7 +101,7 @@ void HELPER(diag)(CPUS390XState *env, uint32_t r1, >> uint32_t r3, uint32_t num) >> } >> >> if (r) { >> -program_interrupt(env, PGM_SPECIFICATION, ILEN_AUTO); >> +s390_program_interrupt(env, PGM_SPECIFICATION, ILEN_AUTO, GETPC()); >> } >> } >> >> diff --git a/target/s390x/translate.c b/target/s390x/translate.c >> index 76b222b0ce..cf8ffa217e 100644 >> --- a/target/s390x/translate.c >> +++ b/target/s390x/translate.c >> @@ -2124,9 +2124,6 @@ static ExitStatus op_diag(DisasContext *s, DisasOps *o) >> TCGv_i32 func_code = tcg_const_i32(get_field(s->fields, i2)); >> >> check_privileged(s); >> -update_psw_addr(s); >> -gen_op_calc_cc(s); >> - >> gen_helper_diag(cpu_env, r1, r3, func_code); >> >> tcg_temp_free_i32(func_code); >> > > Maybe merge it with patch 06/16 ? > No, because this is tcg specific. > Thomas > -- Thanks, David / dhildenb
Re: [Qemu-devel] [qemu-s390x] [PATCH v2 for-2.12 15/16] s390x/tcg: drop program_interrupt()
On 29.11.2017 21:27, David Hildenbrand wrote: > All users are gone, we can finally drop it and make sure that all new > program interrupt injections are reminded of the retaddr - as they have to > use s390_program_interrupt() now. > > Signed-off-by: David Hildenbrand > --- > target/s390x/cpu.h | 1 - > target/s390x/interrupt.c | 22 +- > 2 files changed, 5 insertions(+), 18 deletions(-) Reviewed-by: Thomas Huth
Re: [Qemu-devel] [PATCH v1 2/2] intel-iommu: Extend address width to 48 bits
On Thu, Nov 30, 2017 at 05:54:35PM +0800, Liu, Yi L wrote: > On Thu, Nov 30, 2017 at 05:11:55PM +0800, Peter Xu wrote: > > On Thu, Nov 30, 2017 at 01:22:38PM +0800, Liu, Yi L wrote: > > > On Tue, Nov 14, 2017 at 06:13:50PM -0500, prasad.singamse...@oracle.com > > > wrote: > > > > From: Prasad Singamsetty > > > > > > > > The current implementation of Intel IOMMU code only supports 39 bits > > > > iova address width. This patch provides a new parameter (x-aw-bits) > > > > for intel-iommu to extend its address width to 48 bits but keeping the > > > > default the same (39 bits). The reason for not changing the default > > > > is to avoid potential compatibility problems with live migration of > > > > intel-iommu enabled QEMU guest. The only valid values for 'x-aw-bits' > > > > parameter are 39 and 48. > > > > > > > > After enabling larger address width (48), we should be able to map > > > > larger iova addresses in the guest. For example, a QEMU guest that > > > > is configured with large memory ( >=1TB ). To check whether 48 bits > > > > > > I didn't quite get your point here. Address width limits the iova range, > > > but it doesn't limit the guest memory range. e.g. you can use 39 bit iova > > > address to access a guest physical address larger than (2^39 - 1) as long > > > as the guest 2nd level page table is well programmed. Only one exception, > > > if you requires a continuous iova range(e.g. 2^39), it would be an issue. > > > Not sure if this is the major motivation of your patch? However, I'm not > > > against extend the address width to be 48 bits. Just need to make it clear > > > here. > > > > One thing I can think of is the identity mapping. Say, when iommu=pt > > is set in guest, meanwhile when PT capability is not supported from > > hardware (here I mean, the emulated hardware, of course), guest kernel > > will create one identity mapping to emulate the PT mode. > > True. > > > Current linux kernel's identity mapping should be a static 48 bits > > mapping (it must cover the whole memory range of guest), so if we > > I suppose guest memory range depends on the AW reported by CPUID? Not sure > if it is constantly 48 bits. Please refer to si_domain_init() and DEFAULT_DOMAIN_ADDRESS_WIDTH. > > > provide a 39 bits vIOMMU to the guest, AFAIU we'll fail at device > > attaching to that identity domain of every single device that is > > protected by that 39 bits vIOMMU (kernel will find that domain gaw is > > bigger than vIOMMU supported gaw of that device). > > Yeah, this is a good argue. As it is 1:1 mapping, the translated address > is limited all the same. > > > I do see no good fix for that, except boost the supported gaw to > > bigger ones. > > How about defaultly expose cap.PT bit? In that case, there will no 1:1 > mapping in guest side. Translation is skipped. So the IOMMU AW won't > limit the addressing. PT is defaultly on already from the first day it's there. :) -- Peter Xu
Re: [Qemu-devel] [PATCH 2/2] ide: fix crash in IDE cdrom read
On 11/29/2017 02:50 AM, John Snow wrote: > > On 11/28/2017 07:10 AM, Denis V. Lunev wrote: >> There is the following crash reported from the field in QEMU 2.9: >> bdrv_inc_in_flight (bs=bs@entry=0x0) >> blk_aio_prwv >> blk_aio_preadv >> ide_buffered_readv >> cd_read_sector > Is ide_atapi_cmd_reply_end missing from the call stack here for some > reason? ide_data_readw /should/ have end_transfer_func set to that if it > was processing an ATAPI command; otherwise it shouldn't be able to get > here under normal circumstances... this is my fault. I have lost this line while removing unnecessary into from callstack. Here if the full one. (gdb) bt #0 bdrv_inc_in_flight (bs=bs@entry=0x0) at block/io.c:561 #1 0x55b224160d37 in blk_aio_prwv (blk=0x55b2265405a0, offset=offset@entry=285122560, bytes=2048, qiov=qiov@entry=0x55b22a31de20, co_entry=co_entry@entry=0x55b224160400 , flags=flags@entry=0, cb=cb@entry=0x55b224048e00 , opaque=opaque@entry=0x55b22a31de00) at block/block-backend.c:1151 #2 0x55b224160db5 in blk_aio_preadv (blk=, offset=offset@entry=285122560, qiov=qiov@entry=0x55b22a31de20, flags=flags@entry=0, cb=cb@entry=0x55b224048e00 , opaque=opaque@entry=0x55b22a31de00) at block/block-backend.c:1256 #3 0x55b22404bd8d in ide_buffered_readv (s=s@entry=0x55b22a712a68, sector_num=556880, iov=iov@entry=0x55b22a712d60, nb_sectors=nb_sectors@entry=4, cb=cb@entry=0x55b22404f2a0 , opaque=opaque@entry=0x55b22a712a68) at hw/ide/core.c:637 #4 0x55b22404e2c1 in cd_read_sector (s=0x55b22a712a68) at hw/ide/atapi.c:198 #5 ide_atapi_cmd_reply_end (s=0x55b22a712a68) at hw/ide/atapi.c:272 #6 0x55b224049704 in ide_data_readw (opaque=, addr=) at hw/ide/core.c:2263 #7 0x55b223edd7f0 in portio_read (opaque=0x55b22a836000, addr=0, size=2) at /usr/src/debug/qemu-2.9.0/ioport.c:180 #8 0x55b223ee8e3b in memory_region_read_accessor (mr=0x55b22a836000, addr=0, value=0x7f843b5be7c0, size=2, shift=0, mask=65535, attrs=...) at /usr/src/debug/qemu-2.9.0/memory.c:435 #9 0x55b223ee6369 in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7f843b5be7c0, size=size@entry=2, access_size_min=, access_size_max=, access=access@entry=0x55b223ee8e10 , mr=mr@entry=0x55b22a836000, attrs=attrs@entry=...) at /usr/src/debug/qemu-2.9.0/memory.c:592 #10 0x55b223ee9d36 in memory_region_dispatch_read1 (attrs=..., size=2, pval=0x7f843b5be7c0, addr=0, mr=0x55b22a836000) at /usr/src/debug/qemu-2.9.0/memory.c:1238 #11 memory_region_dispatch_read (mr=mr@entry=0x55b22a836000, addr=addr@entry=0, pval=pval@entry=0x7f843b5be7c0, size=size@entry=2, attrs=attrs@entry=...) at /usr/src/debug/qemu-2.9.0/memory.c:1269 #12 0x55b223e9bdb2 in address_space_read_continue ( as=as@entry=0x55b2247db8c0 , addr=addr@entry=496, ---Type to continue, or q to quit--- attrs=..., attrs@entry=..., buf=buf@entry=0x7f844dc103fe , len=len@entry=2, addr1=0, l=2, mr=0x55b22a836000) at /usr/src/debug/qemu-2.9.0/exec.c:2844 #13 0x55b223e9be67 in address_space_read_full ( as=0x55b2247db8c0 , addr=addr@entry=496, attrs=..., buf=buf@entry=0x7f844dc103fe , len=2, len@entry=602521191) at /usr/src/debug/qemu-2.9.0/exec.c:2895 #14 0x55b223e9bfce in address_space_read (len=602521191, buf=0x7f844dc103fe , attrs=..., addr=496, as=) at /usr/src/debug/qemu-2.9.0/include/exec/memory.h:1718 #15 address_space_rw (as=, addr=addr@entry=496, attrs=..., attrs@entry=..., buf=buf@entry=0x7f844dc103fe , len=len@entry=2, is_write=is_write@entry=false) at /usr/src/debug/qemu-2.9.0/exec.c:2909 #16 0x55b223ee5271 in kvm_handle_io (count=512, size=2, direction=, data=, attrs=..., port=496) at /usr/src/debug/qemu-2.9.0/kvm-all.c:1828 #17 kvm_cpu_exec (cpu=cpu@entry=0x55b229032000) at /usr/src/debug/qemu-2.9.0/kvm-all.c:2058 #18 0x55b223ed1c22 in qemu_kvm_cpu_thread_fn (arg=0x55b229032000) at /usr/src/debug/qemu-2.9.0/cpus.c:1119 #19 0x7f8443993e25 in start_thread (arg=0x7f843b5bf700) at pthread_create.c:308 #20 0x7f84436c134d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113 >> ide_data_readw > How do we get here? ide_is_pio_out ought to be false here; do you know > what command was being processed? Do you have a reproducer? > > Knowing both the ATA and ATAPI commands under consideration here would > be helpful. > if you prefer, I can upload core/binary to some storage. Here is the state. (gdb) p *s $3 = {bus = 0x55b22a7129f0, unit = 0 '\000', drive_kind = IDE_CD, cylinders = 0, heads = 0, sectors = 0, chs_trans = 0, nb_sectors = 11588488, mult_sectors = 16, identify_set = 1, identify_data = "\300\205", '\000' , "MQ 1", ' ' , "\003\000\000\002\004\000.2+5 EQUMD DVR-MO", ' ' , "\000\000\001\000\000\003\000\000\000\000\000\000\a", '\000' , "\a\000\a\000\003\000\264\000\264\000,\001\264\000\000\000\000\000
Re: [Qemu-devel] [qemu-s390x] [PATCH v2 for-2.12 16/16] s390x/tcg: drop potential_page_fault()
On 29.11.2017 21:27, David Hildenbrand wrote: > Only one user left, get rid of it so we don't get any new users. > > Reviewed-by: Richard Henderson > Signed-off-by: David Hildenbrand > --- > target/s390x/translate.c | 9 ++--- > 1 file changed, 2 insertions(+), 7 deletions(-) > > diff --git a/target/s390x/translate.c b/target/s390x/translate.c > index 1ce1390699..26cf993405 100644 > --- a/target/s390x/translate.c > +++ b/target/s390x/translate.c > @@ -240,12 +240,6 @@ static void update_cc_op(DisasContext *s) > } > } > > -static void potential_page_fault(DisasContext *s) > -{ > -update_psw_addr(s); > -update_cc_op(s); > -} > - > static inline uint64_t ld_code2(CPUS390XState *env, uint64_t pc) > { > return (uint64_t)cpu_lduw_code(env, pc); > @@ -2939,7 +2933,8 @@ static ExitStatus op_lpd(DisasContext *s, DisasOps *o) > > /* In a parallel context, stop the world and single step. */ > if (tb_cflags(s->tb) & CF_PARALLEL) { > -potential_page_fault(s); > +update_psw_addr(s); > +update_cc_op(s); > gen_exception(EXCP_ATOMIC); > return EXIT_NORETURN; > } > Reviewed-by: Thomas Huth
Re: [Qemu-devel] [PATCH v2 for-2.12 07/16] s390x: handle exceptions during s390_cpu_virt_mem_rw() correctly (TCG)
On 30.11.2017 12:01, Thomas Huth wrote: > On 29.11.2017 21:26, David Hildenbrand wrote: >> s390_cpu_virt_mem_rw() must always return, so callers can react on >> an exception (e.g. see ioinst_handle_stcrw()). >> >> However, for TCG we always have to exit the cpu loop (and restore the >> cpu state before that) if we injected a program interrupt. So let's >> introduce and use s390_cpu_virt_mem_handle_exc() in code that is not >> purely KVM. >> >> Directly pass the retaddr we already have available in these functions. >> >> Signed-off-by: David Hildenbrand >> --- >> hw/s390x/s390-pci-inst.c | 7 +++ >> target/s390x/cpu.h| 1 + >> target/s390x/ioinst.c | 20 +--- >> target/s390x/mmu_helper.c | 14 ++ >> 4 files changed, 39 insertions(+), 3 deletions(-) >> >> diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c >> index 8123705dfd..6f41083244 100644 >> --- a/hw/s390x/s390-pci-inst.c >> +++ b/hw/s390x/s390-pci-inst.c >> @@ -163,6 +163,7 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t >> ra) >> } >> >> if (s390_cpu_virt_mem_read(cpu, env->regs[r2], r2, buffer, >> sizeof(*reqh))) { >> +s390_cpu_virt_mem_handle_exc(cpu, ra); >> return 0; >> } >> reqh = (ClpReqHdr *)buffer; >> @@ -174,6 +175,7 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t >> ra) >> >> if (s390_cpu_virt_mem_read(cpu, env->regs[r2], r2, buffer, >> req_len + sizeof(*resh))) { >> +s390_cpu_virt_mem_handle_exc(cpu, ra); >> return 0; >> } >> resh = (ClpRspHdr *)(buffer + req_len); >> @@ -189,6 +191,7 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t >> ra) >> >> if (s390_cpu_virt_mem_read(cpu, env->regs[r2], r2, buffer, >> req_len + res_len)) { >> +s390_cpu_virt_mem_handle_exc(cpu, ra); >> return 0; >> } > [...] > > Having to change all these spots is kind of ugly ... and I guess it will also > be quite error-prone in the future (if someone is developing new code under > KVM only and then forgets to add these handlers for TCG). The good thing is, it won't break KVM but only TCG. And it has been broken in TCG for years now without anybody noticing it. > > Can't you do that simply always at the end of s390_cpu_virt_mem_rw() instead, > when the function has been called in non-checking mode (i.e. hostbuf != NULL)? > Then we only need the extra-dance in places where we call the _check function? Oh god no, no such strange calling conventions from the same function. > >> @@ -212,9 +214,12 @@ void ioinst_handle_stcrw(S390CPU *cpu, uint32_t ipb, >> uintptr_t ra) >> >> if (s390_cpu_virt_mem_write(cpu, addr, ar, &crw, sizeof(crw)) == 0) { >> setcc(cpu, cc); >> -} else if (cc == 0) { >> -/* Write failed: requeue CRW since STCRW is a suppressing >> instruction */ >> -css_undo_stcrw(&crw); >> +} else { >> +if (cc == 0) { >> +/* Write failed: requeue CRW since STCRW is suppressing */ >> +css_undo_stcrw(&crw); >> +} >> +s390_cpu_virt_mem_handle_exc(cpu, ra); >> } >> } > > That hunk would then need to do _check first and in case there is a failure, > call css_undo_stcrw() before the s390_cpu_virt_mem_handle_exc() function, I > think. I don't really like that. And there would be a possibility for a race. So a no from my side. > > Alternatively, the s390_cpu_virt_mem_rw() function could get an additional > parameter > that points to a callback function which is used for "clean-up" right before > the > cpu_loop_exit ... and in this case the callback function would contain the > css_undo_stcrw(). > > What do you think? I also thought about this, but adding two parameters to every call is even uglier. Not talking about having to construct special structs to pass through the data. Please not that the TPI handler I am about to introduce will also require to revert stuff in case there was an exception. Another option I thought about was checking in the caller if EXC_PGM has been set. But missing to add such a check is even easier. I'd prefer to keep it as is, but if there are other opinions/ideas, please speak up. Thanks for having a look! > > Thomas > -- Thanks, David / dhildenb
Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
* Igor Mammedov (imamm...@redhat.com) wrote: > On Wed, 29 Nov 2017 18:50:19 + > "Dr. David Alan Gilbert (git)" wrote: > > > From: "Dr. David Alan Gilbert" > > > > Hi, > > This is an experimental set that reworks the way the vhost > > code handles changes in physical address space layout that > > came from a discussion with Igor. > Thanks for looking into it. > > > > Instead of updating and trying to merge sections of address > > space on each add/remove callback, we wait until the commit phase > > and go through and rebuild a list by walking the Flatview of > > memory and end up producing an ordered list. > > We compare the list to the old list to trigger updates. > > > > Note, only very lightly tested so far, I'm just trying to see if it's > > the right shape. > > > > Igor, is this what you were intending? > > I was thinking about a little less intrusive approach > > where vhost_region_add/del are modified to maintain > sorted by GPA array of mem_sections, vhost_dev::mem is dropped > altogether and vhost_memory_region array is build/used/freed > on every vhost_commit(). > Maintaining sorted array should roughly cost us O(2 log n) if > binary search is used. > > However I like your idea with iterator even more as it have > potential to make it even faster O(n) if we get rid of > quadratic and relatively complex vhost_update_compare_list(). Note vhost_update_compare_list is complex, but it is O(n) - it's got nested loops, but the inner loop moves forward and oldi never gets reset back to zero. > Pls, see comments on individual patches. Thanks; I have fixed a couple of bugs since I posted, so I'm more interested in structure/shape. Any good ideas how to test it are welcome. Dave > > > Dave > > > > Dr. David Alan Gilbert (7): > > memory: address_space_iterate > > vhost: Move log_dirty check > > vhost: New memory update functions > > vhost: update_mem_cb implementation > > vhost: Compare new and old memory lists > > vhost: Copy updated region data into device state > > vhost: Remove vhost_set_memory and children > > > > hw/virtio/trace-events | 8 + > > hw/virtio/vhost.c | 424 > > ++--- > > include/exec/memory.h | 23 +++ > > memory.c | 22 +++ > > 4 files changed, 241 insertions(+), 236 deletions(-) > > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [RFC PATCH 1/1] s390x/css: unresrict cssids
On 11/30/2017 10:50 AM, Cornelia Huck wrote: > On Wed, 29 Nov 2017 19:51:23 +0100 > Christian Borntraeger wrote: > >> On 11/28/2017 03:45 PM, Cornelia Huck wrote: >>> On Tue, 28 Nov 2017 15:17:49 +0100 >>> Christian Borntraeger wrote: >>> On 11/28/2017 03:01 PM, Cornelia Huck wrote: > On Tue, 28 Nov 2017 14:25:08 +0100 > Christian Borntraeger wrote: >>> >> What I want now is to enable vfio-ccw for libvirt and Linux guests and >> for that >> we need to lift the restriction of having vfio not in FE. > > This, however, worries me. I don't really consider vfio-ccw ready for > prime time yet. We still need to figure out path handling at the very > least. And I'm still not sure that our non-handling of halt/clear won't > cause major issues with e.g. a storage server error recovery. > > Can we flag vfio-ccw as experimental or such in libvirt? We should have flagged vfio-ccw experimental in QEMU then. e.g. by using x-vfio-ccw or whatever.I dont think we can do that after the facts. >>> >>> Yes, we should have done that... can't fix that now, unfortunately. >>> I am not that deep into vfio-ccw, but I was under the impression that the missing features should not affect the vfio interface that is created by libvirt. After all this should just be a "-device vfio-ccw,." statement and some setup steps beforehand (unbind + setup of vfio-ccw) >>> >>> The halt/clear stuff is highly unlikely to influence the configuration >>> interface. I'm still not too clear about path handling, though. Will we >>> need different modes (hypervisor managed vs. full passthrough? probably >>> only passthrough)? What about pgid handling - do we need some kind of >>> pgid manager? (Keep in mind that you get to set the pgid once. This has >>> implications for e.g. reserve/release.) >>> >>> Also, what about devices other than ECKD DASD? Has there been any >>> testing? Tapes should be interesting; the other interesting ccw devices >>> are QDIO-based and I'm not sure we want to spend anything on supporting >>> those. >> >> DASD is probably the most important thing today, QDIO might never happen >> (or very late). > > One thing I'd find interesting about tapes is very long running channel > programs (like rewind) and how they interact with halt/clear. But yes, > I would think that DASD is the most important one in practice. > >> See my proposal below. >> >>> >>> The interface is probably fine, but I'd like to get an idea about the >>> path stuff (is the attachment spec that contains the pgid stuff >>> publicly available, btw.?) >>> If your concern is the serviceability I think it would be valid for a RHEL KVM to disable vfio-ccw in the kernel. Maybe we could even provide a config for QEMU? >>> >>> It's fine just to turn off vfio-ccw in the kernel. >>> PS: I learned from the PCI and CCW experience that I do not want to upstream kernel/qemu code unless I have a working libvirt code that shows that the thing will work. >>> >>> Yes, I understand the wish to verify the interface, and I think it's a >>> good idea. What I'm worried about is that this might be the precursor >>> to a push to support it, and I don't think vfio-ccw is ready for that >>> yet. >>> >> FWIW, libvirt should not care if a device works in all cases or not because >> libvirt versions should work with all kind of QEMU/kernel combinations. >> Fencing in libvirt that the kernel part is not up to the task is making >> me feel the same way as you when you see the css-unrestricted property >> at a device ;-) > > I'm more worried about the political angle than the technical. If we > don't get pressure to support this too soon, I don't care that much > about experimental or not. > >> Having said that, I think that having vfio-ccw support has a value (and it >> actually works for an unnamed test infrastructure). In addition to that I >> am much more likely to actually test this continuously if I have libvirt >> support. > > Good to hear that it works for a non-Linux guest. Any plans to test > something like storage server failover? > > [I'd really love to do something about the path handling stuff, but the > combination of scant documentation and scantier time works against me.] > >> >> >> So what about the following: >> >> 1. We will implement the libvirt support with either a or b: >> a: if we find a solution to our "where to put the property" dispute use that >> to decide >> if we can add vfio-ccw >> b if not: just provide a patch that lifts the restriction without any >> property. >> in libvirt blindy assume that free assignment will work. old QEMUs will >> complain at >> startup and libvirt will print the QEMU error. This is similar to other >> situations >> where libvirt cannot fully bprobe if the QEMU will work or not. (not having >> vfio-ccw >> support in the kernel will certa
Re: [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API
18.11.2017 00:35, John Snow wrote: On 11/17/2017 03:22 AM, Vladimir Sementsov-Ogievskiy wrote: 17.11.2017 06:10, John Snow wrote: On 11/16/2017 03:17 AM, Vladimir Sementsov-Ogievskiy wrote: 16.11.2017 00:20, John Snow wrote: On 11/13/2017 11:20 AM, Vladimir Sementsov-Ogievskiy wrote: Hi all. There are three qmp commands, needed to implement external backup API. Using these three commands, client may do all needed bitmap management by hand: on backup start we need to do a transaction: {disable old bitmap, create new bitmap} on backup success: drop old bitmap on backup fail: enable old bitmap merge new bitmap to old bitmap drop new bitmap Can you give me an example of how you expect these commands to be used, and why they're required? I'm a little weary about how error-prone these commands might be and the potential for incorrect usage seems... high. Why do we require them? It is needed for incremental backup. It looks like bad idea to export abdicate/reclaim functionality, it is simpler and clearer to allow user to merge/enable/disable bitmaps by hand. usage is like this: 1. we have dirty bitmap bitmap0 for incremental backup. 2. prepare image fleecing (create temporary image with backing=our_disk) 3. in qmp transaction: - disable bitmap0 - create bitmap1 - start image fleecing (backup sync=none our_disk -> temp_disk) This could probably just be its own command, though: block-job-fleece node=foobar bitmap=bitmap0 etc=etera etc=etera Could handle forking the bitmap. I'm not sure what the arguments would look like, but we could name the NBD export here, too. (Assuming the server is already started and we just need to create the share.) Then, we can basically do what mirror does: (1) Cancel (2) Complete Cancel would instruct QEMU to keep the bitmap changes (i.e. roll back), and Complete would instruct QEMU to discard the changes. This way we don't need to expose commands like split or merge that will almost always be dangerous to use over QMP. In fact, a fleecing job would be really convenient even without a bitmap, because it'd still be nice to have a convenience command for it. Using an existing infrastructure and understood paradigm is just a bonus. 1. If I understand correctly, Kevin and Max said in their report in Prague about new block-job approach, using filter nodes, so I'm not sure that this is a good Idea to introduce now new old-style block-job, where we can do without it. We could do without it, but it might be a lot better to have everything wrapped up in a command that's easy to digest instead of releasing 10 smaller commands that have to be executed in a very specific way in order to work correctly. I'm thinking about the complexity of error checking here with all the smaller commands, versus error checking on a larger workflow we understand and can quality test better. I'm not sure that filter nodes becoming the new normal for block jobs precludes our ability to use the job-management API as a handle for managing the lifetime of a long-running task like fleecing, but I'll check with Max and Kevin about this. 2. there is the following scenario: customers needs a possibility to create a backup of data changed since some point in time. So, maintaining several static and one (or several) activ bitmaps with a possiblity of merge some of them and create a backup using this merged bitmap may be convenient. I think the ability to copy bitmaps and issue differential backups would be sufficient in all cases I could think of... so, instead of keeping several static bitmaps with ability to merge them, you propose to keep several active bitmaps and copy them to make a backup? so, instead of new qmp command for merge, add new qmp command for copy? in case of static bitmaps, we can implement saving/loading them to the image to free RAM space, so it is better. or what do you propose for [2] ? -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH v2 for-2.12 08/16] s390x/tcg: don't exit the cpu loop in s390_cpu_virt_mem_rw()
On Thu, 30 Nov 2017 12:57:00 +0100 David Hildenbrand wrote: > On 30.11.2017 12:39, Thomas Huth wrote: > > On 29.11.2017 21:26, David Hildenbrand wrote: > >> s390_cpu_virt_mem_rw() must always return, so callers can react on > >> an exception (e.g. see ioinst_handle_stcrw()). > >> > >> Therefore, using program_interrupt() is wrong. Fix that up. > >> > >> Signed-off-by: David Hildenbrand > >> --- > >> target/s390x/mmu_helper.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/target/s390x/mmu_helper.c b/target/s390x/mmu_helper.c > >> index dbe2f511f8..2c7f3d7d95 100644 > >> --- a/target/s390x/mmu_helper.c > >> +++ b/target/s390x/mmu_helper.c > >> @@ -459,7 +459,7 @@ static int translate_pages(S390CPU *cpu, vaddr addr, > >> int nr_pages, > >> } > >> if (!address_space_access_valid(&address_space_memory, pages[i], > >> TARGET_PAGE_SIZE, is_write)) { > >> -program_interrupt(env, PGM_ADDRESSING, ILEN_AUTO); > >> +trigger_pgm_exception(env, PGM_ADDRESSING, ILEN_AUTO); > >> return -EFAULT; > >> } > >> addr += TARGET_PAGE_SIZE; > > > > Is that still right when running with KVM? I think the exception will > > then silently be ignored instead? > > Good point (for older KVM). And ugly. > > if (kvm_enabled()) { > kvm_s390_program_interrupt... > } else { > trigger_pgm_exception(env, PGM_ADDRESSING, ILEN_AUTO); > } Maybe add a comment that this is only for old kernels that do not support the mem op interface?
Re: [Qemu-devel] [PATCH v4 08/32] migration: allow send_rq to fail
* Peter Xu (pet...@redhat.com) wrote: > We will not allow failures to happen when sending data from destination > to source via the return path. However it is possible that there can be > errors along the way. This patch allows the migrate_send_rp_message() > to return error when it happens, and further extended it to > migrate_send_rp_req_pages(). > > Reviewed-by: Dr. David Alan Gilbert > Signed-off-by: Peter Xu > --- > migration/migration.c | 38 ++ > migration/migration.h | 2 +- > 2 files changed, 31 insertions(+), 9 deletions(-) > > diff --git a/migration/migration.c b/migration/migration.c > index 8d93b891e3..db896233f6 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -199,17 +199,35 @@ static void deferred_incoming_migration(Error **errp) > * Send a message on the return channel back to the source > * of the migration. > */ > -static void migrate_send_rp_message(MigrationIncomingState *mis, > -enum mig_rp_message_type message_type, > -uint16_t len, void *data) > +static int migrate_send_rp_message(MigrationIncomingState *mis, > + enum mig_rp_message_type message_type, > + uint16_t len, void *data) > { > +int ret = 0; > + > trace_migrate_send_rp_message((int)message_type, len); > qemu_mutex_lock(&mis->rp_mutex); > + > +/* > + * It's possible that the file handle got lost due to network > + * failures. > + */ > +if (!mis->to_src_file) { > +ret = -EIO; > +goto error; > +} > + > qemu_put_be16(mis->to_src_file, (unsigned int)message_type); > qemu_put_be16(mis->to_src_file, len); > qemu_put_buffer(mis->to_src_file, data, len); > qemu_fflush(mis->to_src_file); > + > +/* It's possible that qemu file got error during sending */ > +ret = qemu_file_get_error(mis->to_src_file); > + > +error: > qemu_mutex_unlock(&mis->rp_mutex); > +return ret; > } > > /* Request a range of pages from the source VM at the given > @@ -219,26 +237,30 @@ static void > migrate_send_rp_message(MigrationIncomingState *mis, > * Start: Address offset within the RB > * Len: Length in bytes required - must be a multiple of pagesize > */ > -void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char > *rbname, > - ram_addr_t start, size_t len) > +int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char > *rbname, > + ram_addr_t start, size_t len) > { > uint8_t bufc[12 + 1 + 255]; /* start (8), len (4), rbname up to 256 */ > size_t msglen = 12; /* start + len */ > +int rbname_len; > +enum mig_rp_message_type msg_type; > > *(uint64_t *)bufc = cpu_to_be64((uint64_t)start); > *(uint32_t *)(bufc + 8) = cpu_to_be32((uint32_t)len); > > if (rbname) { > -int rbname_len = strlen(rbname); > +rbname_len = strlen(rbname); I don't think that move of the declaration of rbname_len is necessary; it's only msglen that you need to keep for longer. Dave > assert(rbname_len < 256); > > bufc[msglen++] = rbname_len; > memcpy(bufc + msglen, rbname, rbname_len); > msglen += rbname_len; > -migrate_send_rp_message(mis, MIG_RP_MSG_REQ_PAGES_ID, msglen, bufc); > +msg_type = MIG_RP_MSG_REQ_PAGES_ID; > } else { > -migrate_send_rp_message(mis, MIG_RP_MSG_REQ_PAGES, msglen, bufc); > +msg_type = MIG_RP_MSG_REQ_PAGES; > } > + > +return migrate_send_rp_message(mis, msg_type, msglen, bufc); > } > > void qemu_start_incoming_migration(const char *uri, Error **errp) > diff --git a/migration/migration.h b/migration/migration.h > index ebb049f692..b63cdfbfdb 100644 > --- a/migration/migration.h > +++ b/migration/migration.h > @@ -216,7 +216,7 @@ void migrate_send_rp_shut(MigrationIncomingState *mis, >uint32_t value); > void migrate_send_rp_pong(MigrationIncomingState *mis, >uint32_t value); > -void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* > rbname, > +int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* > rbname, >ram_addr_t start, size_t len); > > #endif > -- > 2.13.6 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH for-2.12 3/7] tests/boot-serial-test: Add support for the mcf5208evb board
On 30 November 2017 at 08:53, Thomas Huth wrote: > We can output a character quite easily here with some few lines of > assembly that we provide as a mini-kernel for this board. > > Signed-off-by: Thomas Huth > --- > tests/boot-serial-test.c | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c > index d997269..dd3828c 100644 > --- a/tests/boot-serial-test.c > +++ b/tests/boot-serial-test.c > @@ -16,6 +16,14 @@ > #include "qemu/osdep.h" > #include "libqtest.h" > > +static const uint8_t kernel_mcf5208[] = { > +0x41, 0xf9, 0xfc, 0x06, 0x00, 0x00, /* lea 0xfc06,%a0 */ > +0x10, 0x3c, 0x00, 0x54, /* move.b #'T',%d0 */ > +0x11, 0x7c, 0x00, 0x04, 0x00, 0x08, /* move.b #4,8(%a0) Enable > TX */ > +0x11, 0x40, 0x00, 0x0c, /* move.b %d0,12(%a0) Print > 'T' */ > +0x60, 0xfa /* bra.s loop */ > +}; This approach doesn't seem to be scalable to me -- are we really going to have 50 or more fragments of hand-coded hex in this file to cover the various board models? I'd much rather see us have a framework for being able to build test blobs from source using a cross compiler setup (and docker or similar so anybody can rebuild the test blobs). That will be much easier to maintain and easier to extend to having tests that test other parts of the board or other aspects of TCG emulation. thanks -- PMM
Re: [Qemu-devel] [PATCH v4 18/20] vpc: Switch to .bdrv_co_block_status()
12.10.2017 21:59, Eric Blake wrote: We are gradually moving away from sector-based interfaces, towards byte-based. Update the vpc driver accordingly. Drop the now-unused get_sector_offset(). Signed-off-by: Eric Blake --- v4: rebase to interface tweak v3: rebase to master v2: drop get_sector_offset() [Kevin], rebase to mapping flag --- block/vpc.c | 42 ++ 1 file changed, 22 insertions(+), 20 deletions(-) diff --git a/block/vpc.c b/block/vpc.c index 1576d7b595..4100ce1ed3 100644 --- a/block/vpc.c +++ b/block/vpc.c @@ -705,52 +705,54 @@ fail: return ret; } -static int64_t coroutine_fn vpc_co_get_block_status(BlockDriverState *bs, -int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file) +static int coroutine_fn vpc_co_block_status(BlockDriverState *bs, +bool want_zero, +int64_t offset, int64_t bytes, +int64_t *pnum, int64_t *map, +BlockDriverState **file) { BDRVVPCState *s = bs->opaque; VHDFooter *footer = (VHDFooter*) s->footer_buf; -int64_t start, offset; +int64_t start, image_offset; bool allocated; -int64_t ret; +int ret; int n; if (be32_to_cpu(footer->type) == VHD_FIXED) { -*pnum = nb_sectors; +*pnum = bytes; +*map = offset; *file = bs->file->bs; -return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID | - (sector_num << BDRV_SECTOR_BITS); +return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID; } qemu_co_mutex_lock(&s->lock); -offset = get_image_offset(bs, sector_num << BDRV_SECTOR_BITS, false, NULL); -start = offset; -allocated = (offset != -1); +image_offset = get_image_offset(bs, offset, false, NULL); +start = image_offset & BDRV_BLOCK_OFFSET_MASK; why do you do so? round down to sector boundary? does it mean, that *map will corresponds not to offset, but to some previous byte? +allocated = (image_offset != -1); *pnum = 0; ret = 0; do { /* All sectors in a block are contiguous (without using the bitmap) */ -n = ROUND_UP(sector_num + 1, s->block_size / BDRV_SECTOR_SIZE) - - sector_num; -n = MIN(n, nb_sectors); +n = ROUND_UP(offset + 1, s->block_size) - offset; +n = MIN(n, bytes); *pnum += n; -sector_num += n; -nb_sectors -= n; +offset += n; +bytes -= n; /* *pnum can't be greater than one block for allocated * sectors since there is always a bitmap in between. */ if (allocated) { *file = bs->file->bs; -ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start; +*map = start; +ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID; break; } -if (nb_sectors == 0) { +if (bytes == 0) { break; } -offset = get_image_offset(bs, sector_num << BDRV_SECTOR_BITS, false, - NULL); +image_offset = get_image_offset(bs, offset, false, NULL); } while (offset == -1); image_offset here? qemu_co_mutex_unlock(&s->lock); @@ -1097,7 +1099,7 @@ static BlockDriver bdrv_vpc = { .bdrv_co_preadv = vpc_co_preadv, .bdrv_co_pwritev= vpc_co_pwritev, -.bdrv_co_get_block_status = vpc_co_get_block_status, +.bdrv_co_block_status = vpc_co_block_status, .bdrv_get_info = vpc_get_info, -- Best regards, Vladimir
[Qemu-devel] [Bug 1735384] Re: OpenJDK JVM segfaults on qemu-sh4 (regression)
This sounds like it may be the bug fixed by this patchset: https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg05067.html -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1735384 Title: OpenJDK JVM segfaults on qemu-sh4 (regression) Status in QEMU: New Bug description: Some of the recent changes introduced a regression which makes the OpenJDK JVM crash on qemu-sh4: (sid-sh4-sbuild)root@nofan:/# java -version qemu: uncaught target signal 11 (Segmentation fault) - core dumped Segmentation fault (sid-sh4-sbuild)root@nofan:/# An older version works fine: (sid-sh4-sbuild)root@nofan:/# java -version openjdk version "9.0.1" OpenJDK Runtime Environment (build 9.0.1+11-Debian-1) OpenJDK Zero VM (build 9.0.1+11-Debian-1, interpreted mode) (sid-sh4-sbuild)root@nofan:/# Haven't had time for bisecting this yet. Adrian To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1735384/+subscriptions
Re: [Qemu-devel] [PATCH v4 19/20] vvfat: Switch to .bdrv_co_block_status()
12.10.2017 21:59, Eric Blake wrote: We are gradually moving away from sector-based interfaces, towards byte-based. Update the vvfat driver accordingly. Note that we can rely on the block driver having already clamped limits to our block size, and simplify accordingly. Signed-off-by: Eric Blake Reviewed-by: Vladimir Sementsov-Ogievskiy --- v4: rebase to interface tweak v3: no change v2: rebase to earlier changes, simplify --- block/vvfat.c | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/block/vvfat.c b/block/vvfat.c index a0f2335894..9142117fc6 100644 --- a/block/vvfat.c +++ b/block/vvfat.c @@ -3082,15 +3082,13 @@ vvfat_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes, return ret; } -static int64_t coroutine_fn vvfat_co_get_block_status(BlockDriverState *bs, -int64_t sector_num, int nb_sectors, int *n, BlockDriverState **file) +static int coroutine_fn vvfat_co_block_status(BlockDriverState *bs, + bool want_zero, int64_t offset, + int64_t bytes, int64_t *n, may be rename to *pnum ? + int64_t *map, + BlockDriverState **file) { -*n = bs->total_sectors - sector_num; -if (*n > nb_sectors) { -*n = nb_sectors; -} else if (*n < 0) { -return 0; -} +*n = bytes; return BDRV_BLOCK_DATA; } @@ -3251,7 +3249,7 @@ static BlockDriver bdrv_vvfat = { .bdrv_co_preadv = vvfat_co_preadv, .bdrv_co_pwritev= vvfat_co_pwritev, -.bdrv_co_get_block_status = vvfat_co_get_block_status, +.bdrv_co_block_status = vvfat_co_block_status, }; static void bdrv_vvfat_init(void) -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH for-2.11 1/1] blockjob: Make block_job_pause_all() keep a reference to the jobs
Am 29.11.2017 um 18:56 hat Alberto Garcia geschrieben: > Starting from commit 40840e419be31e6a32e6ea24511c74b389d5e0e4 we are > pausing all block jobs during bdrv_reopen_multiple() to prevent any of > them from finishing and removing nodes from the graph while they are > being reopened. > > It turns out that pausing a block job doesn't necessarily prevent it > from finishing: a paused block job can still run its exit function > from the main loop and call block_job_completed(). The mirror block > job in particular always goes to the main loop while it is paused (by > virtue of the bdrv_drained_begin() call in mirror_run()). > > Destroying a paused block job during bdrv_reopen_multiple() has two > consequences: > >1) The references to the nodes involved in the job are released, > possibly destroying some of them. If those nodes were in the > reopen queue this would trigger the problem originally described > in commit 40840e419be, crashing QEMU. This specific problem could be avoided by making the BDS reference in the reopen queue strong, i.e. bdrv_ref() in bdrv_reopen_queue_child() and bdrv_unref() only at the end of bdrv_reopen_multiple(). >2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would > not be doing all necessary bdrv_parent_drained_end() calls. If I understand correctly, you don't have a reproducer here. I'm not convinced that this one actually exists because the functions that do the graph modifications (specficically bdrv_replace_child_noperm), automatically drain and undrain nodes as necessary. > I can reproduce problem 1) easily with iotest 030 by increasing > STREAM_BUFFER_SIZE from 512KB to 8MB in block/stream.c, or by tweaking > the iotest like in this example: > >https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html > > This patch keeps an additional reference to all block jobs between > block_job_pause_all() and block_job_resume_all(), guaranteeing that > they are kept alive. This might be okay as a stopgap solution if this is a real problem in practice because we don't have a better solution right now. However, I haven't seen any sign of there being an -rc4, so we're probably too late for 2.11 anyway. It's certainly not a full solution because keeping a reference to a block job does not prevent it from completing, but only from being freed. Most block jobs do graph modifications, including dropping the references to nodes, already when they complete, not only when they are freed. Now, while the above might have sounded like we have an easy solution in bdrv_reopen(), I see another problem that looks quite tough: 3) bdrv_reopen_queue() is a recursive operation that involves all children that inherited options. If the graph changes between the bdrv_reopen_queue() call and the end of bdrv_reopen_multiple(), the calculated options (and today possibly permissions) don't actually match the graph any more. The requirement we really have is that between bdrv_reopen_queue() and bdrv_reopen_multiple() no graph changes are made. Calling bdrv_drain_all_begin() in bdrv_reopen_multiple() is too late. Maybe that actually gives us another relatively simple solution: We need to push up the drain into the callers, and possibly just assert that the nodes are drained in the reopen functions. Kevin
Re: [Qemu-devel] [PATCH v4 20/20] block: Drop unused .bdrv_co_get_block_status()
12.10.2017 21:59, Eric Blake wrote: We are gradually moving away from sector-based interfaces, towards byte-based. Now that all drivers have been updated to provide the byte-based .bdrv_co_block_status(), we can delete the sector-based interface. Signed-off-by: Eric Blake Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [Qemu-devel] [RFC PATCH v2 1/1] s390x/css: unrestrict cssids
On 11/29/2017 06:29 PM, Cornelia Huck wrote: [..] With this change, however, our schema for generating a css bus ids, if none is specified does not make much sense. Currently we start at cssid 0x0 for non-virtual devices and use the default css (without s390-squash-mcss exclusively) for virtual devices. That means for non-virtual device the device would auto-magically end up non-visible for guests (which can't use the other channel subsystems). Thus let us also change the css bus id auto assignment algorithm, so that we first fill the default css, and then proceed to the next one (modulo MAX_CSSID). >>> >>> Let's reword the previous two paragraphs: >>> >>> "At the same time, change our schema for generating css bus ids to put >>> both virtual and non-virtual devices into the default css (spilling >>> over into other css images, if needed) so that devices without a >>> specified id don't end up hidden to guests not supporting multiple >>> channel subsystems." >>> >> >> What I don't like about your explanation is, that "so that devices without >> a specified id don't end up hidden to guests not supporting multiple channel >> subsystems" is not necessarily true: if we spill the devices are going >> to end up hidden. > > Let's make this "don't always end up hidden". I would prefer to go with this. At the same time, change our schema for generating css bus ids to put both virtual and non-virtual devices into the default css (spilling over into other css images, if needed). The intention is to deprecate s390-squash-mcss. Whit this change devices without a specified devno won't end up hidden to guests not supporting multiple channel subsystems, unless this can not be avoided (default css full). > >> The adverse effect of getting rid of the restriction on migration should not be too severe. Vfio-ccw devices are not live-migratable yet, and for virtual devices using the extra freedom would only make sense with the aforementioned guest support in place. The auto-generated bus ids are affected by both changes. We hope to not [..] >> >> I tired to be quite elaborate, because at some point of the v1 >> discussion, you said if we are planting landmines you want them explained >> in the commit message. I'm not sure how this document file is supposed >> to be called, and look like. I think this stuff is relevant for >> the decision why is this patch a good one, and what are the trade-offs >> we make. Referencing to a document would be also OK with me. > > I don't think there will be landmines left, no? Or do I misread? > The patch basically just got worse with changing the schema. If there were landmines they are still there. My original point was that it's bearable in practice, so that's why I had this short and vague ain't too bad formulation. >> >> Regarding the deprecation patch. It's already on the list as RFC. You >> have even commented on it. I intend to make a v2 once we know what are >> we going to do here. > > This needs to be a patch series with a cover letter. Discussing in > multiple places is draining. > Can do. No problem for me. >> Signed-off-by: Halil Pasic --- Hi all! Moving the property to the machine turned out very ugly (IMHO). Libvirt detects machine properties via query-command-line-options. This is however decoupled from the existence of the actual property: one needs to touch util/qemu-config.c (see patch) so the property shows up. Furthermore this stuff is global. I've also noticed that the infamous s390-squash-mcss does not show up. >>> >>> s390x-softmmu/qemu-system-s390x -machine s390-ccw-virtio,help >>> >>> shows it, as does qom-list on /machine. >> >> For qom-list you need an instance. Libvirt probes such stuff using >> (btw probing is done with machine type none, and qom-list /machine >> should not list squash if we are running machine type none). >> >>> >>> Output of --help is very haphazard anyway and you should >>> rely on the interfaces above. >>> >> >> I disagree. AFAIK management software should probe using the >> query-command-line-options QMP command. Or am I missing something? >> >> I don't speak about the output of --help here. > > It's the same interface. It both over- and underreports. Querying the > actual object is the only way to get this reliable. If that is not > possible today, it needs to be implemented. > Sorry I did not look into how --help is implemented. For what we are trying to accomplish here it's irrelevant. Wonder why did you have to bring this up and make the discussion even more complicated. >> On the other hand to get the property displayed by -machine s390-ccw-virtio,help one needs a setter on the property. So I have created a fake setter which produces an error each time called. >>> >>> Yes, this is fugly. A css object would be a better place, but way t
Re: [Qemu-devel] [PATCH for-2.12 3/7] tests/boot-serial-test: Add support for the mcf5208evb board
On 30.11.2017 13:14, Peter Maydell wrote: > On 30 November 2017 at 08:53, Thomas Huth wrote: >> We can output a character quite easily here with some few lines of >> assembly that we provide as a mini-kernel for this board. >> >> Signed-off-by: Thomas Huth >> --- >> tests/boot-serial-test.c | 10 ++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c >> index d997269..dd3828c 100644 >> --- a/tests/boot-serial-test.c >> +++ b/tests/boot-serial-test.c >> @@ -16,6 +16,14 @@ >> #include "qemu/osdep.h" >> #include "libqtest.h" >> >> +static const uint8_t kernel_mcf5208[] = { >> +0x41, 0xf9, 0xfc, 0x06, 0x00, 0x00, /* lea 0xfc06,%a0 */ >> +0x10, 0x3c, 0x00, 0x54, /* move.b #'T',%d0 */ >> +0x11, 0x7c, 0x00, 0x04, 0x00, 0x08, /* move.b #4,8(%a0) Enable >> TX */ >> +0x11, 0x40, 0x00, 0x0c, /* move.b %d0,12(%a0) Print >> 'T' */ >> +0x60, 0xfa /* bra.s loop */ >> +}; > > This approach doesn't seem to be scalable to me -- are we > really going to have 50 or more fragments of hand-coded hex in > this file to cover the various board models? No, since this only works for certain few boards with these criteria: 1) There must be a way to load such small blobs either with -kernel or -bios. Many boards only support loading kernels in the ELF format, and we certainly don't want to encode the hex-dump of such a kernel here. Or some boards / CPUs also support -bios, but the entry point must be at the end of a large image, so that also does not make sense to include them here. 2) The UART must be usable with some few lines of assembly. That's also not always the case. So no, this is really not meant to scale to all boards, it's just meant to get at least some few more test coverage, especially while we're not having other test mechanisms in place yet. (e.g. how long are we talking about reviving tests/tcg/ again already? ... some attempts have been made to make it compilable again, but non of the patch series has been included yet so far) > I'd much rather see us have a framework for being able > to build test blobs from source using a cross compiler > setup (and docker or similar so anybody can rebuild > the test blobs). That will be much easier to maintain > and easier to extend to having tests that test other > parts of the board or other aspects of TCG emulation. I agree that this idea is way more flexible and could cover many more boards, but there are also things that need to be solved first: Not everybody has docker / cross-compilers installed or can use them, so we likely need a set of pre-build binary images somewhere. But we certainly don't want to include megabytes of blobs in the git repository ... so this would need to go into an external repository instead. Then you likely can not include this in "make check" so easily anymore (unless you force everybody to check out the external repository with the git-submodule.sh script - but I really dislike this idea) ... So yes, we should have better test coverage for almost all machines if possible, but I don't see that this is happening soon or could be really tightly integrated into "make check". I.e. the boot-serial tester could fill at least parts of this gap, I think. Thomas
Re: [Qemu-devel] [PATCH for-2.12 3/7] tests/boot-serial-test: Add support for the mcf5208evb board
On 30/11/2017 13:14, Peter Maydell wrote: > On 30 November 2017 at 08:53, Thomas Huth wrote: >> +static const uint8_t kernel_mcf5208[] = { >> +0x41, 0xf9, 0xfc, 0x06, 0x00, 0x00, /* lea 0xfc06,%a0 */ >> +0x10, 0x3c, 0x00, 0x54, /* move.b #'T',%d0 */ >> +0x11, 0x7c, 0x00, 0x04, 0x00, 0x08, /* move.b #4,8(%a0) Enable >> TX */ >> +0x11, 0x40, 0x00, 0x0c, /* move.b %d0,12(%a0) Print >> 'T' */ >> +0x60, 0xfa /* bra.s loop */ >> +}; > > This approach doesn't seem to be scalable to me -- are we > really going to have 50 or more fragments of hand-coded hex in > this file to cover the various board models? > > I'd much rather see us have a framework for being able > to build test blobs from source using a cross compiler > setup (and docker or similar so anybody can rebuild > the test blobs). That will be much easier to maintain > and easier to extend to having tests that test other > parts of the board or other aspects of TCG emulation. It seems a bit overkill, as these snippets are ~16 bytes long. However, it would be useful to have a basic patching mechanism so that board descriptions could include a common hand-coded const array and place an address at a given offset. So you'd have struct HexFirmware { int patch_offset; short patch_size; boolpatch_bigendian; uint8_t data[32]; } and microblaze boards could have: struct HexFirmware kernel_microblaze = { .patch_offset= 0, .patch_size = 2, .patch_bigendian = false, .data = { 0xaa, 0xaa, 0x00, 0xb0, /* imm 0x */ 0x00, 0x10, 0x60, 0x30, /* addik r3,r0,0x1000 */ 0x54, 0x00, 0x80, 0x30, /* addik r4,r0,'T' */ 0x00, 0x00, 0x83, 0xf0, /* sbi r4,r3,0 */ 0xfc, 0xff, 0x00, 0xb8, /* bri -4 loop */ } }; ... { "microblaze", "petalogix-s3adsp1800", "", "TT", kernel_microblaze, 0x8400 }, { "microblazeel", "petalogix-ml605", "", "TT", kernel_microblaze, 0x83a0 }, Likewise, you could have just two copies of the code for all ARM boards that have a pl011 (or any other UART with a simple byte-long transmit register), one 32-bit and one 64-bit. Thanks, Paolo
Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
On Thu, 30 Nov 2017 12:08:06 + "Dr. David Alan Gilbert" wrote: > * Igor Mammedov (imamm...@redhat.com) wrote: > > On Wed, 29 Nov 2017 18:50:19 + > > "Dr. David Alan Gilbert (git)" wrote: > > > > > From: "Dr. David Alan Gilbert" > > > > > > Hi, > > > This is an experimental set that reworks the way the vhost > > > code handles changes in physical address space layout that > > > came from a discussion with Igor. > > Thanks for looking into it. > > > > > > > Instead of updating and trying to merge sections of address > > > space on each add/remove callback, we wait until the commit phase > > > and go through and rebuild a list by walking the Flatview of > > > memory and end up producing an ordered list. > > > We compare the list to the old list to trigger updates. > > > > > > Note, only very lightly tested so far, I'm just trying to see if it's > > > the right shape. > > > > > > Igor, is this what you were intending? > > > > I was thinking about a little less intrusive approach > > > > where vhost_region_add/del are modified to maintain > > sorted by GPA array of mem_sections, vhost_dev::mem is dropped > > altogether and vhost_memory_region array is build/used/freed > > on every vhost_commit(). > > Maintaining sorted array should roughly cost us O(2 log n) if > > binary search is used. > > > > However I like your idea with iterator even more as it have > > potential to make it even faster O(n) if we get rid of > > quadratic and relatively complex vhost_update_compare_list(). > > Note vhost_update_compare_list is complex, > but it is O(n) - it's > got nested loops, but the inner loop moves forward and oldi never > gets reset back to zero. While skimming through patches I've overlooked it. Anyways, why memcmp(old_arr, new_arr) is not sufficient to detect a change in memory map? > > > Pls, see comments on individual patches. > > Thanks; I have fixed a couple of bugs since I posted, so I'm > more interested in structure/shape. Any good ideas how to test > it are welcome. > > Dave > > > > > > Dave > > > > > > Dr. David Alan Gilbert (7): > > > memory: address_space_iterate > > > vhost: Move log_dirty check > > > vhost: New memory update functions > > > vhost: update_mem_cb implementation > > > vhost: Compare new and old memory lists > > > vhost: Copy updated region data into device state > > > vhost: Remove vhost_set_memory and children > > > > > > hw/virtio/trace-events | 8 + > > > hw/virtio/vhost.c | 424 > > > ++--- > > > include/exec/memory.h | 23 +++ > > > memory.c | 22 +++ > > > 4 files changed, 241 insertions(+), 236 deletions(-) > > > > > > -- > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [Bug 1735384] Re: OpenJDK JVM segfaults on qemu-sh4 (regression)
On 11/30/2017 01:19 PM, Peter Maydell wrote: > This sounds like it may be the bug fixed by this patchset: > https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg05067.html Unfortunately not. I will upload a prepared chroot for testing later and link it in this bug report. Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaub...@debian.org `. `' Freie Universitaet Berlin - glaub...@physik.fu-berlin.de `-GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1735384 Title: OpenJDK JVM segfaults on qemu-sh4 (regression) Status in QEMU: New Bug description: Some of the recent changes introduced a regression which makes the OpenJDK JVM crash on qemu-sh4: (sid-sh4-sbuild)root@nofan:/# java -version qemu: uncaught target signal 11 (Segmentation fault) - core dumped Segmentation fault (sid-sh4-sbuild)root@nofan:/# An older version works fine: (sid-sh4-sbuild)root@nofan:/# java -version openjdk version "9.0.1" OpenJDK Runtime Environment (build 9.0.1+11-Debian-1) OpenJDK Zero VM (build 9.0.1+11-Debian-1, interpreted mode) (sid-sh4-sbuild)root@nofan:/# Haven't had time for bisecting this yet. Adrian To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1735384/+subscriptions
Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
* Igor Mammedov (imamm...@redhat.com) wrote: > On Thu, 30 Nov 2017 12:08:06 + > "Dr. David Alan Gilbert" wrote: > > > * Igor Mammedov (imamm...@redhat.com) wrote: > > > On Wed, 29 Nov 2017 18:50:19 + > > > "Dr. David Alan Gilbert (git)" wrote: > > > > > > > From: "Dr. David Alan Gilbert" > > > > > > > > Hi, > > > > This is an experimental set that reworks the way the vhost > > > > code handles changes in physical address space layout that > > > > came from a discussion with Igor. > > > Thanks for looking into it. > > > > > > > > > > Instead of updating and trying to merge sections of address > > > > space on each add/remove callback, we wait until the commit phase > > > > and go through and rebuild a list by walking the Flatview of > > > > memory and end up producing an ordered list. > > > > We compare the list to the old list to trigger updates. > > > > > > > > Note, only very lightly tested so far, I'm just trying to see if it's > > > > the right shape. > > > > > > > > Igor, is this what you were intending? > > > > > > I was thinking about a little less intrusive approach > > > > > > where vhost_region_add/del are modified to maintain > > > sorted by GPA array of mem_sections, vhost_dev::mem is dropped > > > altogether and vhost_memory_region array is build/used/freed > > > on every vhost_commit(). > > > Maintaining sorted array should roughly cost us O(2 log n) if > > > binary search is used. > > > > > > However I like your idea with iterator even more as it have > > > potential to make it even faster O(n) if we get rid of > > > quadratic and relatively complex vhost_update_compare_list(). > > > > Note vhost_update_compare_list is complex, > > but it is O(n) - it's > > got nested loops, but the inner loop moves forward and oldi never > > gets reset back to zero. > While skimming through patches I've overlooked it. > > Anyways, > why memcmp(old_arr, new_arr) is not sufficient > to detect a change in memory map? It tells you that you've got a change, but doesn't give the start/end of the range that's changed, and those are used by vhost_commit to limit the work of vhost_verify_ring_mappings. Dave > > > > > Pls, see comments on individual patches. > > > > Thanks; I have fixed a couple of bugs since I posted, so I'm > > more interested in structure/shape. Any good ideas how to test > > it are welcome. > > > > Dave > > > > > > > > > Dave > > > > > > > > Dr. David Alan Gilbert (7): > > > > memory: address_space_iterate > > > > vhost: Move log_dirty check > > > > vhost: New memory update functions > > > > vhost: update_mem_cb implementation > > > > vhost: Compare new and old memory lists > > > > vhost: Copy updated region data into device state > > > > vhost: Remove vhost_set_memory and children > > > > > > > > hw/virtio/trace-events | 8 + > > > > hw/virtio/vhost.c | 424 > > > > ++--- > > > > include/exec/memory.h | 23 +++ > > > > memory.c | 22 +++ > > > > 4 files changed, 241 insertions(+), 236 deletions(-) > > > > > > > > > -- > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH for-2.12 3/7] tests/boot-serial-test: Add support for the mcf5208evb board
On 30.11.2017 13:40, Paolo Bonzini wrote: > On 30/11/2017 13:14, Peter Maydell wrote: >> On 30 November 2017 at 08:53, Thomas Huth wrote: >>> +static const uint8_t kernel_mcf5208[] = { >>> +0x41, 0xf9, 0xfc, 0x06, 0x00, 0x00, /* lea 0xfc06,%a0 */ >>> +0x10, 0x3c, 0x00, 0x54, /* move.b #'T',%d0 */ >>> +0x11, 0x7c, 0x00, 0x04, 0x00, 0x08, /* move.b #4,8(%a0) Enable >>> TX */ >>> +0x11, 0x40, 0x00, 0x0c, /* move.b %d0,12(%a0) Print >>> 'T' */ >>> +0x60, 0xfa /* bra.s loop */ >>> +}; >> >> This approach doesn't seem to be scalable to me -- are we >> really going to have 50 or more fragments of hand-coded hex in >> this file to cover the various board models? >> >> I'd much rather see us have a framework for being able >> to build test blobs from source using a cross compiler >> setup (and docker or similar so anybody can rebuild >> the test blobs). That will be much easier to maintain >> and easier to extend to having tests that test other >> parts of the board or other aspects of TCG emulation. > > It seems a bit overkill, as these snippets are ~16 bytes long. > > However, it would be useful to have a basic patching mechanism so that > board descriptions could include a common hand-coded const array and > place an address at a given offset. So you'd have > > struct HexFirmware { > int patch_offset; > short patch_size; > boolpatch_bigendian; > uint8_t data[32]; > } > > and microblaze boards could have: > > struct HexFirmware kernel_microblaze = { > .patch_offset= 0, > .patch_size = 2, > .patch_bigendian = false, > .data = { > 0xaa, 0xaa, 0x00, 0xb0, /* imm 0x */ > 0x00, 0x10, 0x60, 0x30, /* addik r3,r0,0x1000 */ > 0x54, 0x00, 0x80, 0x30, /* addik r4,r0,'T' */ > 0x00, 0x00, 0x83, 0xf0, /* sbi r4,r3,0 */ > 0xfc, 0xff, 0x00, 0xb8, /* bri -4 loop */ > } > }; > > ... > > { "microblaze", "petalogix-s3adsp1800", "", "TT", > kernel_microblaze, 0x8400 }, > { "microblazeel", "petalogix-ml605", "", "TT", > kernel_microblaze, 0x83a0 }, The two micrablaze data arrays are completely different, since one is big endian, the other is little, so I'd need to byte-swap the whole array on the fly, too. Not sure whether it makes sense to add such complex code just to safe 16 bytes of blob data...? > Likewise, you could have just two copies of the code for all ARM boards > that have a pl011 (or any other UART with a simple byte-long transmit > register), one 32-bit and one 64-bit. OK, having a patching mechanism in place likely makes sense as soon as we want to include multiple ARM boards here. I can do that, but I'd rather like to do it as a second step. It was already quite hard work to come up with all these assembler snippets (from CPUs where I don't have a clue of) and to determine which machines could be tested this way at all, so I'd first like to wait and see whether this patch series is acceptable at all or not, since Peter has objections. Thomas
Re: [Qemu-devel] [PATCH for-2.12 3/7] tests/boot-serial-test: Add support for the mcf5208evb board
On 30 November 2017 at 12:40, Paolo Bonzini wrote: > On 30/11/2017 13:14, Peter Maydell wrote: >> On 30 November 2017 at 08:53, Thomas Huth wrote: >>> +static const uint8_t kernel_mcf5208[] = { >>> +0x41, 0xf9, 0xfc, 0x06, 0x00, 0x00, /* lea 0xfc06,%a0 */ >>> +0x10, 0x3c, 0x00, 0x54, /* move.b #'T',%d0 */ >>> +0x11, 0x7c, 0x00, 0x04, 0x00, 0x08, /* move.b #4,8(%a0) Enable >>> TX */ >>> +0x11, 0x40, 0x00, 0x0c, /* move.b %d0,12(%a0) Print >>> 'T' */ >>> +0x60, 0xfa /* bra.s loop */ >>> +}; >> >> This approach doesn't seem to be scalable to me -- are we >> really going to have 50 or more fragments of hand-coded hex in >> this file to cover the various board models? >> >> I'd much rather see us have a framework for being able >> to build test blobs from source using a cross compiler >> setup (and docker or similar so anybody can rebuild >> the test blobs). That will be much easier to maintain >> and easier to extend to having tests that test other >> parts of the board or other aspects of TCG emulation. > > It seems a bit overkill, as these snippets are ~16 bytes long. They're 16 bytes long because that's about the limit of what you can do with this approach. The consequence is that they barely test anything at all. A more sensible framework would allow: * better testing of TCG instructions more generally * writing your test cases in C * more interesting board dependent tests * "integration test" setups like 'boot entire kernel' * etc thanks -- PMM
[Qemu-devel] [PATCH v4 0/7] s390x/pci: Improve zPCI to cover more cases
This patch fixes the following BUG: Even a guest is able to detect virtio_pci device, the init function the Linux virtio_pci driver will hang because zPCI does not support the subregions used by virtio_pci. It follows that right now the PCI support is very limited (e.g. pass through of a host vfio device) To enable features like virtio-pci several modifications needs to be done. As already stated above, Virtio-PCI uses subregions, which may eventually be discontinuous inside bars instead of a single flat region often used by real devices. The address offset being formerly calculated from the BAR base address must be adapted to the subregions instead of to the single region. This patch provides the new calculation for the three kind of BAR access, zPCI STORE, zPCI LOAD and zPCI STORE BLOCK done by zPCI. We use the opportunity to - enhance the fault detection for zPCI STORE and LOAD, - enhance the fault detection and to provide the maximum STORE BLOCK block size, maxstbl, for zPCI STORE BLOCK - factor out part of the code used to calculate the offset and access the BARs, - factor out the code for endianess conversion. Pierre Morel (7): s390x/pci: factor out endianess conversion s390x/pci: rework PCI STORE s390x/pci: rework PCI LOAD s390x/pci: rework PCI STORE BLOCK s390x/pci: move the memory region read from pcilg s390x/pci: move the memory region write from pcistg s390x/pci: search for subregion inside the BARs hw/s390x/s390-pci-bus.h | 1 + hw/s390x/s390-pci-inst.c | 251 --- hw/s390x/s390-pci-inst.h | 6 +- 3 files changed, 158 insertions(+), 100 deletions(-) -- 2.7.4 Changelog: from v3 - added explanation why we always need to swap bytes - #define for BAR definitions (Conny) - renamed "other" to "subregion" (Thomas) - renamed label "addressing_error" to "specification_error" (Thomas) - added some comments to explain why testing states is not needed (Conny) from v2 - rewording of comments, coding style. from v1 - suppress fallthrough to PCI_ROM_SLOT to handle it in the default case. - reword most of the patch commit messages - add comments to the endianness conversion - reword somme comments inside the patches
[Qemu-devel] [PATCH v4 1/7] s390x/pci: factor out endianess conversion
There are two places where the same endianness conversion is done. Let's factor this out into a static function. Note that the conversion must always be done for data in a register: The S390 BE guest converted date to le before issuing the instruction. After interception in a BE host: ZPCI VFIO using pwrite must make the conversion back for the BE kernel. Kernel will do BE to le translation when loading the register for the real instruction. After interception in a le host: TCG stores a BE register in le, swapping bytes. But since the data in the register was already le it is now BE ZPCI VFIO must convert it to le before writing to the PCI memory. In both cases ZPCI VFIO must swap the bytes from the register. Signed-off-by: Pierre Morel Reviewed-by: Yi Min Zhao --- hw/s390x/s390-pci-inst.c | 59 +++- 1 file changed, 33 insertions(+), 26 deletions(-) diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c index 8e088f3..3e1f1a0 100644 --- a/hw/s390x/s390-pci-inst.c +++ b/hw/s390x/s390-pci-inst.c @@ -314,6 +314,36 @@ out: return 0; } +/** + * Swap data contained in s390x big endian registers to little endian + * PCI bars. + * + * @ptr: a pointer to a uint64_t data field + * @len: the length of the valid data, must be 1,2,4 or 8 + */ +static int zpci_endian_swap(uint64_t *ptr, uint8_t len) +{ +uint64_t data = *ptr; + +switch (len) { +case 1: +break; +case 2: +data = bswap16(data); +break; +case 4: +data = bswap32(data); +break; +case 8: +data = bswap64(data); +break; +default: +return -EINVAL; +} +*ptr = data; +return 0; +} + int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2) { CPUS390XState *env = &cpu->env; @@ -385,19 +415,7 @@ int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2) data = pci_host_config_read_common( pbdev->pdev, offset, pci_config_size(pbdev->pdev), len); -switch (len) { -case 1: -break; -case 2: -data = bswap16(data); -break; -case 4: -data = bswap32(data); -break; -case 8: -data = bswap64(data); -break; -default: +if (zpci_endian_swap(&data, len)) { program_interrupt(env, PGM_OPERAND, 4); return 0; } @@ -500,19 +518,8 @@ int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2) program_interrupt(env, PGM_OPERAND, 4); return 0; } -switch (len) { -case 1: -break; -case 2: -data = bswap16(data); -break; -case 4: -data = bswap32(data); -break; -case 8: -data = bswap64(data); -break; -default: + +if (zpci_endian_swap(&data, len)) { program_interrupt(env, PGM_OPERAND, 4); return 0; } -- 2.7.4
[Qemu-devel] [PATCH v4 6/7] s390x/pci: move the memory region write from pcistg
Let's move the memory region write from pcistg into a dedicated function. This allows us to prepare a later patch searching for subregions inside of the memory region. Signed-off-by: Pierre Morel Reviewed-by: Yi Min Zhao Reviewed-by: Thomas Huth --- hw/s390x/s390-pci-inst.c | 27 +-- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c index c702e8d..1277d62 100644 --- a/hw/s390x/s390-pci-inst.c +++ b/hw/s390x/s390-pci-inst.c @@ -454,12 +454,27 @@ static int trap_msix(S390PCIBusDevice *pbdev, uint64_t offset, uint8_t pcias) } } +static MemTxResult zpci_write_bar(S390PCIBusDevice *pbdev, uint8_t pcias, + uint64_t offset, uint64_t data, uint8_t len) +{ +MemoryRegion *mr; + +if (trap_msix(pbdev, offset, pcias)) { +offset = offset - pbdev->msix.table_offset; +mr = &pbdev->pdev->msix_table_mmio; +} else { +mr = pbdev->pdev->io_regions[pcias].memory; +} + +return memory_region_dispatch_write(mr, offset, data, len, +MEMTXATTRS_UNSPECIFIED); +} + int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2) { CPUS390XState *env = &cpu->env; uint64_t offset, data; S390PCIBusDevice *pbdev; -MemoryRegion *mr; MemTxResult result; uint8_t len; uint32_t fh; @@ -519,15 +534,7 @@ int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2) return 0; } -if (trap_msix(pbdev, offset, pcias)) { -offset = offset - pbdev->msix.table_offset; -mr = &pbdev->pdev->msix_table_mmio; -} else { -mr = pbdev->pdev->io_regions[pcias].memory; -} - -result = memory_region_dispatch_write(mr, offset, data, len, - MEMTXATTRS_UNSPECIFIED); +result = zpci_write_bar(pbdev, pcias, offset, data, len); if (result != MEMTX_OK) { program_interrupt(env, PGM_OPERAND, 4); return 0; -- 2.7.4
[Qemu-devel] [PATCH v4 3/7] s390x/pci: rework PCI LOAD
Enhance the fault detection, correction of the fault reporting. Signed-off-by: Pierre Morel Reviewed-by: Yi Min Zhao --- hw/s390x/s390-pci-inst.c | 25 ++--- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c index 134484f..bab347e 100644 --- a/hw/s390x/s390-pci-inst.c +++ b/hw/s390x/s390-pci-inst.c @@ -373,6 +373,11 @@ int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2) len = env->regs[r2] & 0xf; offset = env->regs[r2 + 1]; +if (!(fh & FH_MASK_ENABLE)) { +setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE); +return 0; +} + pbdev = s390_pci_find_dev_by_fh(s390_get_phb(), fh); if (!pbdev) { DPRINTF("pcilg no pci dev\n"); @@ -381,12 +386,7 @@ int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2) } switch (pbdev->state) { -case ZPCI_FS_RESERVED: -case ZPCI_FS_STANDBY: -case ZPCI_FS_DISABLED: case ZPCI_FS_PERMANENT_ERROR: -setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE); -return 0; case ZPCI_FS_ERROR: setcc(cpu, ZPCI_PCI_LS_ERR); s390_set_status_code(env, r2, ZPCI_PCI_ST_BLOCKED); @@ -395,8 +395,9 @@ int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2) break; } -if (pcias < 6) { -if ((8 - (offset & 0x7)) < len) { +switch (pcias) { +case ZPCI_IO_BAR_MIN ... ZPCI_IO_BAR_MAX: +if (!len || (len > (8 - (offset & 0x7 { program_interrupt(env, PGM_OPERAND, 4); return 0; } @@ -407,8 +408,9 @@ int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2) program_interrupt(env, PGM_OPERAND, 4); return 0; } -} else if (pcias == 15) { -if ((4 - (offset & 0x3)) < len) { +break; +case ZPCI_CONFIG_BAR: +if (!len || (len > (4 - (offset & 0x3))) || len == 3) { program_interrupt(env, PGM_OPERAND, 4); return 0; } @@ -419,8 +421,9 @@ int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2) program_interrupt(env, PGM_OPERAND, 4); return 0; } -} else { -DPRINTF("invalid space\n"); +break; +default: +DPRINTF("pcilg invalid space\n"); setcc(cpu, ZPCI_PCI_LS_ERR); s390_set_status_code(env, r2, ZPCI_PCI_ST_INVAL_AS); return 0; -- 2.7.4
[Qemu-devel] [PATCH v4 4/7] s390x/pci: rework PCI STORE BLOCK
Enhance the fault detection. Fixup the precedence to check the destination path existance before checking for the source accessibility. Add the maxstbl entry to both the Query PCI Function Group response and the PCIBusDevice structure. Initialize the maxstbl to 128 per default until we get the actual data from the hardware. Signed-off-by: Pierre Morel Reviewed-by: Yi Min Zhao --- hw/s390x/s390-pci-bus.h | 1 + hw/s390x/s390-pci-inst.c | 63 ++-- hw/s390x/s390-pci-inst.h | 2 +- 3 files changed, 41 insertions(+), 25 deletions(-) diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h index 560bd82..2993f0d 100644 --- a/hw/s390x/s390-pci-bus.h +++ b/hw/s390x/s390-pci-bus.h @@ -284,6 +284,7 @@ struct S390PCIBusDevice { uint64_t fmb_addr; uint8_t isc; uint16_t noi; +uint16_t maxstbl; uint8_t sum; S390MsixInfo msix; AdapterRoutes routes; diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c index bab347e..bab7353 100644 --- a/hw/s390x/s390-pci-inst.c +++ b/hw/s390x/s390-pci-inst.c @@ -294,6 +294,7 @@ int clp_service_call(S390CPU *cpu, uint8_t r2) stq_p(&resgrp->msia, ZPCI_MSI_ADDR); stw_p(&resgrp->mui, 0); stw_p(&resgrp->i, 128); +stw_p(&resgrp->maxstbl, 128); resgrp->version = 0; stw_p(&resgrp->hdr.rsp, CLP_RC_OK); @@ -648,6 +649,7 @@ int pcistb_service_call(S390CPU *cpu, uint8_t r1, uint8_t r3, uint64_t gaddr, S390PCIBusDevice *pbdev; MemoryRegion *mr; MemTxResult result; +uint64_t offset; int i; uint32_t fh; uint8_t pcias; @@ -662,22 +664,10 @@ int pcistb_service_call(S390CPU *cpu, uint8_t r1, uint8_t r3, uint64_t gaddr, fh = env->regs[r1] >> 32; pcias = (env->regs[r1] >> 16) & 0xf; len = env->regs[r1] & 0xff; +offset = env->regs[r3]; -if (pcias > 5) { -DPRINTF("pcistb invalid space\n"); -setcc(cpu, ZPCI_PCI_LS_ERR); -s390_set_status_code(env, r1, ZPCI_PCI_ST_INVAL_AS); -return 0; -} - -switch (len) { -case 16: -case 32: -case 64: -case 128: -break; -default: -program_interrupt(env, PGM_SPECIFICATION, 6); +if (!(fh & FH_MASK_ENABLE)) { +setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE); return 0; } @@ -689,12 +679,7 @@ int pcistb_service_call(S390CPU *cpu, uint8_t r1, uint8_t r3, uint64_t gaddr, } switch (pbdev->state) { -case ZPCI_FS_RESERVED: -case ZPCI_FS_STANDBY: -case ZPCI_FS_DISABLED: case ZPCI_FS_PERMANENT_ERROR: -setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE); -return 0; case ZPCI_FS_ERROR: setcc(cpu, ZPCI_PCI_LS_ERR); s390_set_status_code(env, r1, ZPCI_PCI_ST_BLOCKED); @@ -703,8 +688,34 @@ int pcistb_service_call(S390CPU *cpu, uint8_t r1, uint8_t r3, uint64_t gaddr, break; } +if (pcias > ZPCI_IO_BAR_MAX) { +DPRINTF("pcistb invalid space\n"); +setcc(cpu, ZPCI_PCI_LS_ERR); +s390_set_status_code(env, r1, ZPCI_PCI_ST_INVAL_AS); +return 0; +} + +/* Verify the address, offset and length */ +/* offset must be a multiple of 8 */ +if (offset % 8) { +goto specification_error; +} +/* Length must be greater than 8, a multiple of 8 */ +/* and not greater than maxstbl */ +if ((len <= 8) || (len % 8) || (len > pbdev->maxstbl)) { +goto specification_error; +} +/* Do not cross a 4K-byte boundary */ +if (((offset & 0xfff) + len) > 0x1000) { +goto specification_error; +} +/* Guest address must be double word aligned */ +if (gaddr & 0x07UL) { +goto specification_error; +} + mr = pbdev->pdev->io_regions[pcias].memory; -if (!memory_region_access_valid(mr, env->regs[r3], len, true)) { +if (!memory_region_access_valid(mr, offset, len, true)) { program_interrupt(env, PGM_OPERAND, 6); return 0; } @@ -714,9 +725,9 @@ int pcistb_service_call(S390CPU *cpu, uint8_t r1, uint8_t r3, uint64_t gaddr, } for (i = 0; i < len / 8; i++) { -result = memory_region_dispatch_write(mr, env->regs[r3] + i * 8, - ldq_p(buffer + i * 8), 8, - MEMTXATTRS_UNSPECIFIED); +result = memory_region_dispatch_write(mr, offset + i * 8, + ldq_p(buffer + i * 8), 8, + MEMTXATTRS_UNSPECIFIED); if (result != MEMTX_OK) { program_interrupt(env, PGM_OPERAND, 6); return 0; @@ -725,6 +736,10 @@ int pcistb_service_call(S390CPU *cpu, uint8_t r1, uint8_t r3, uint64_t gaddr, setcc(cpu, ZPCI_PCI_LS_OK); return 0; + +specification_error: +program_interrupt(env, PGM_SPECIFICATION, 6); +return 0; } static int reg_irqs(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib) diff --git
Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
On Thu, 30 Nov 2017 12:47:20 + "Dr. David Alan Gilbert" wrote: > * Igor Mammedov (imamm...@redhat.com) wrote: > > On Thu, 30 Nov 2017 12:08:06 + > > "Dr. David Alan Gilbert" wrote: > > > > > * Igor Mammedov (imamm...@redhat.com) wrote: > > > > On Wed, 29 Nov 2017 18:50:19 + > > > > "Dr. David Alan Gilbert (git)" wrote: > > > > > > > > > From: "Dr. David Alan Gilbert" > > > > > > > > > > Hi, > > > > > This is an experimental set that reworks the way the vhost > > > > > code handles changes in physical address space layout that > > > > > came from a discussion with Igor. > > > > Thanks for looking into it. > > > > > > > > > > > > > Instead of updating and trying to merge sections of address > > > > > space on each add/remove callback, we wait until the commit phase > > > > > and go through and rebuild a list by walking the Flatview of > > > > > memory and end up producing an ordered list. > > > > > We compare the list to the old list to trigger updates. > > > > > > > > > > Note, only very lightly tested so far, I'm just trying to see if it's > > > > > the right shape. > > > > > > > > > > Igor, is this what you were intending? > > > > > > > > I was thinking about a little less intrusive approach > > > > > > > > where vhost_region_add/del are modified to maintain > > > > sorted by GPA array of mem_sections, vhost_dev::mem is dropped > > > > altogether and vhost_memory_region array is build/used/freed > > > > on every vhost_commit(). > > > > Maintaining sorted array should roughly cost us O(2 log n) if > > > > binary search is used. > > > > > > > > However I like your idea with iterator even more as it have > > > > potential to make it even faster O(n) if we get rid of > > > > quadratic and relatively complex vhost_update_compare_list(). > > > > > > Note vhost_update_compare_list is complex, > > > but it is O(n) - it's > > > got nested loops, but the inner loop moves forward and oldi never > > > gets reset back to zero. > > While skimming through patches I've overlooked it. > > > > Anyways, > > why memcmp(old_arr, new_arr) is not sufficient > > to detect a change in memory map? > > It tells you that you've got a change, but doesn't give > the start/end of the range that's changed, and those > are used by vhost_commit to limit the work of > vhost_verify_ring_mappings. Isn't memmap list a sorted and dev->mem_changed_[start|end]_addr are the lowest|highest addresses of whole map? If it's, so wouldn't getting values directly from the 1st/last entries of array be sufficient? > > Dave > > > > > > > > Pls, see comments on individual patches. > > > > > > Thanks; I have fixed a couple of bugs since I posted, so I'm > > > more interested in structure/shape. Any good ideas how to test > > > it are welcome. > > > > > > Dave > > > > > > > > > > > > Dave > > > > > > > > > > Dr. David Alan Gilbert (7): > > > > > memory: address_space_iterate > > > > > vhost: Move log_dirty check > > > > > vhost: New memory update functions > > > > > vhost: update_mem_cb implementation > > > > > vhost: Compare new and old memory lists > > > > > vhost: Copy updated region data into device state > > > > > vhost: Remove vhost_set_memory and children > > > > > > > > > > hw/virtio/trace-events | 8 + > > > > > hw/virtio/vhost.c | 424 > > > > > ++--- > > > > > include/exec/memory.h | 23 +++ > > > > > memory.c | 22 +++ > > > > > 4 files changed, 241 insertions(+), 236 deletions(-) > > > > > > > > > > > > -- > > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK > > > -- > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[Qemu-devel] [PATCH v4 2/7] s390x/pci: rework PCI STORE
Enhance the fault detection, correction of the fault reporting. Signed-off-by: Pierre Morel Reviewed-by: Yi Min Zhao --- hw/s390x/s390-pci-inst.c | 42 +- hw/s390x/s390-pci-inst.h | 4 2 files changed, 29 insertions(+), 17 deletions(-) diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c index 3e1f1a0..134484f 100644 --- a/hw/s390x/s390-pci-inst.c +++ b/hw/s390x/s390-pci-inst.c @@ -470,6 +470,12 @@ int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2) pcias = (env->regs[r2] >> 16) & 0xf; len = env->regs[r2] & 0xf; offset = env->regs[r2 + 1]; +data = env->regs[r1]; + +if (!(fh & FH_MASK_ENABLE)) { +setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE); +return 0; +} pbdev = s390_pci_find_dev_by_fh(s390_get_phb(), fh); if (!pbdev) { @@ -479,12 +485,10 @@ int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2) } switch (pbdev->state) { -case ZPCI_FS_RESERVED: -case ZPCI_FS_STANDBY: -case ZPCI_FS_DISABLED: +/* ZPCI_FS_RESERVED, ZPCI_FS_STANDBY and ZPCI_FS_DISABLED + * are already covered by the FH_MASK_ENABLE check above + */ case ZPCI_FS_PERMANENT_ERROR: -setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE); -return 0; case ZPCI_FS_ERROR: setcc(cpu, ZPCI_PCI_LS_ERR); s390_set_status_code(env, r2, ZPCI_PCI_ST_BLOCKED); @@ -493,9 +497,13 @@ int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2) break; } -data = env->regs[r1]; -if (pcias < 6) { -if ((8 - (offset & 0x7)) < len) { +switch (pcias) { +/* A ZPCI PCI card may use any BAR from BAR 0 to BAR 5 */ +case ZPCI_IO_BAR_MIN ... ZPCI_IO_BAR_MAX: +/* Check length: + * A length of 0 is invalid and length should not cross a double word + */ +if (!len || (len > (8 - (offset & 0x7 { program_interrupt(env, PGM_OPERAND, 4); return 0; } @@ -513,21 +521,21 @@ int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2) program_interrupt(env, PGM_OPERAND, 4); return 0; } -} else if (pcias == 15) { -if ((4 - (offset & 0x3)) < len) { -program_interrupt(env, PGM_OPERAND, 4); -return 0; -} - -if (zpci_endian_swap(&data, len)) { +break; +case ZPCI_CONFIG_BAR: +/* ZPCI uses the pseudo BAR number 15 as configuration space */ +/* possible access lengths are 1,2,4 and must not cross a word */ +if (!len || (len > (4 - (offset & 0x3))) || len == 3) { program_interrupt(env, PGM_OPERAND, 4); return 0; } - +/* len = 1,2,4 so we do not need to test */ +zpci_endian_swap(&data, len); pci_host_config_write_common(pbdev->pdev, offset, pci_config_size(pbdev->pdev), data, len); -} else { +break; +default: DPRINTF("pcistg invalid space\n"); setcc(cpu, ZPCI_PCI_LS_ERR); s390_set_status_code(env, r2, ZPCI_PCI_ST_INVAL_AS); diff --git a/hw/s390x/s390-pci-inst.h b/hw/s390x/s390-pci-inst.h index 94a959f..4be58fe 100644 --- a/hw/s390x/s390-pci-inst.h +++ b/hw/s390x/s390-pci-inst.h @@ -302,4 +302,8 @@ int pcistb_service_call(S390CPU *cpu, uint8_t r1, uint8_t r3, uint64_t gaddr, int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar); int stpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar); +#define ZPCI_IO_BAR_MIN 0 +#define ZPCI_IO_BAR_MAX 5 +#define ZPCI_CONFIG_BAR 15 + #endif -- 2.7.4
[Qemu-devel] [PATCH v4 5/7] s390x/pci: move the memory region read from pcilg
Let's move the memory region read from pcilg into a dedicated function. This allows us to prepare a later patch. Signed-off-by: Pierre Morel Reviewed-by: Yi Min Zhao Reviewed-by: Thomas Huth --- hw/s390x/s390-pci-inst.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c index bab7353..c702e8d 100644 --- a/hw/s390x/s390-pci-inst.c +++ b/hw/s390x/s390-pci-inst.c @@ -345,13 +345,22 @@ static int zpci_endian_swap(uint64_t *ptr, uint8_t len) return 0; } +static MemTxResult zpci_read_bar(S390PCIBusDevice *pbdev, uint8_t pcias, + uint64_t offset, uint64_t *data, uint8_t len) +{ +MemoryRegion *mr; + +mr = pbdev->pdev->io_regions[pcias].memory; +return memory_region_dispatch_read(mr, offset, data, len, + MEMTXATTRS_UNSPECIFIED); +} + int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2) { CPUS390XState *env = &cpu->env; S390PCIBusDevice *pbdev; uint64_t offset; uint64_t data; -MemoryRegion *mr; MemTxResult result; uint8_t len; uint32_t fh; @@ -402,9 +411,7 @@ int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2) program_interrupt(env, PGM_OPERAND, 4); return 0; } -mr = pbdev->pdev->io_regions[pcias].memory; -result = memory_region_dispatch_read(mr, offset, &data, len, - MEMTXATTRS_UNSPECIFIED); +result = zpci_read_bar(pbdev, pcias, offset, &data, len); if (result != MEMTX_OK) { program_interrupt(env, PGM_OPERAND, 4); return 0; -- 2.7.4
[Qemu-devel] [PATCH v4 7/7] s390x/pci: search for subregion inside the BARs
When dispatching memory access to PCI BAR region, we must look for possible subregions, used by the PCI device to map different memory areas inside the same PCI BAR. Since the data offset we received is calculated starting at the region start address we need to adjust the offset for the subregion. The data offset inside the subregion is calculated by substracting the subregion's starting address from the data offset in the region. The access to the MSIX region is now handled in a generic way, we do not need the specific trap_msix() function anymore. Signed-off-by: Pierre Morel Reviewed-by: Yi Min Zhao --- hw/s390x/s390-pci-inst.c | 44 +--- 1 file changed, 25 insertions(+), 19 deletions(-) diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c index 1277d62..ca110e0 100644 --- a/hw/s390x/s390-pci-inst.c +++ b/hw/s390x/s390-pci-inst.c @@ -345,12 +345,31 @@ static int zpci_endian_swap(uint64_t *ptr, uint8_t len) return 0; } +static MemoryRegion *s390_get_subregion(MemoryRegion *mr, uint64_t offset, +uint8_t len) +{ +MemoryRegion *subregion; +uint64_t subregion_size; + +QTAILQ_FOREACH(subregion, &mr->subregions, subregions_link) { +subregion_size = int128_get64(subregion->size); +if ((offset >= subregion->addr) && +(offset + len) <= (subregion->addr + subregion_size)) { +mr = subregion; +break; +} +} +return mr; +} + static MemTxResult zpci_read_bar(S390PCIBusDevice *pbdev, uint8_t pcias, uint64_t offset, uint64_t *data, uint8_t len) { MemoryRegion *mr; mr = pbdev->pdev->io_regions[pcias].memory; +mr = s390_get_subregion(mr, offset, len); +offset -= mr->addr; return memory_region_dispatch_read(mr, offset, data, len, MEMTXATTRS_UNSPECIFIED); } @@ -442,30 +461,14 @@ int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2) return 0; } -static int trap_msix(S390PCIBusDevice *pbdev, uint64_t offset, uint8_t pcias) -{ -if (pbdev->msix.available && pbdev->msix.table_bar == pcias && -offset >= pbdev->msix.table_offset && -offset < (pbdev->msix.table_offset + - pbdev->msix.entries * PCI_MSIX_ENTRY_SIZE)) { -return 1; -} else { -return 0; -} -} - static MemTxResult zpci_write_bar(S390PCIBusDevice *pbdev, uint8_t pcias, uint64_t offset, uint64_t data, uint8_t len) { MemoryRegion *mr; -if (trap_msix(pbdev, offset, pcias)) { -offset = offset - pbdev->msix.table_offset; -mr = &pbdev->pdev->msix_table_mmio; -} else { -mr = pbdev->pdev->io_regions[pcias].memory; -} - +mr = pbdev->pdev->io_regions[pcias].memory; +mr = s390_get_subregion(mr, offset, len); +offset -= mr->addr; return memory_region_dispatch_write(mr, offset, data, len, MEMTXATTRS_UNSPECIFIED); } @@ -729,6 +732,9 @@ int pcistb_service_call(S390CPU *cpu, uint8_t r1, uint8_t r3, uint64_t gaddr, } mr = pbdev->pdev->io_regions[pcias].memory; +mr = s390_get_subregion(mr, offset, len); +offset -= mr->addr; + if (!memory_region_access_valid(mr, offset, len, true)) { program_interrupt(env, PGM_OPERAND, 6); return 0; -- 2.7.4
Re: [Qemu-devel] [PATCH for-2.12 3/7] tests/boot-serial-test: Add support for the mcf5208evb board
On 30/11/2017 13:51, Peter Maydell wrote: > On 30 November 2017 at 12:40, Paolo Bonzini wrote: >> On 30/11/2017 13:14, Peter Maydell wrote: >>> On 30 November 2017 at 08:53, Thomas Huth wrote: +static const uint8_t kernel_mcf5208[] = { +0x41, 0xf9, 0xfc, 0x06, 0x00, 0x00, /* lea 0xfc06,%a0 */ +0x10, 0x3c, 0x00, 0x54, /* move.b #'T',%d0 */ +0x11, 0x7c, 0x00, 0x04, 0x00, 0x08, /* move.b #4,8(%a0) Enable TX */ +0x11, 0x40, 0x00, 0x0c, /* move.b %d0,12(%a0) Print 'T' */ +0x60, 0xfa /* bra.s loop */ +}; >>> >>> This approach doesn't seem to be scalable to me -- are we >>> really going to have 50 or more fragments of hand-coded hex in >>> this file to cover the various board models? >>> >>> I'd much rather see us have a framework for being able >>> to build test blobs from source using a cross compiler >>> setup (and docker or similar so anybody can rebuild >>> the test blobs). That will be much easier to maintain >>> and easier to extend to having tests that test other >>> parts of the board or other aspects of TCG emulation. >> >> It seems a bit overkill, as these snippets are ~16 bytes long. > > They're 16 bytes long because that's about the limit of > what you can do with this approach. The consequence is that > they barely test anything at all. Certainly they are an awful test for boards, but they are a great smoke test for TCG changes that require modifications in all target/ subdirectories. The infrastructure you want for integration tests is already provided by kvm-unit-tests, which satisfies at least bullets 2-4 from your list below (the first is unclear to me). Thanks, Paolo > A more sensible framework > would allow: > * better testing of TCG instructions more generally > * writing your test cases in C > * more interesting board dependent tests > * "integration test" setups like 'boot entire kernel' > * etc > > thanks > -- PMM >
Re: [Qemu-devel] [PATCH for-2.12 3/7] tests/boot-serial-test: Add support for the mcf5208evb board
On 30/11/2017 13:51, Thomas Huth wrote: > The two micrablaze data arrays are completely different, since one is > big endian, the other is little, so I'd need to byte-swap the whole > array on the fly, too. Not sure whether it makes sense to add such > complex code just to safe 16 bytes of blob data...? *facepalm* :) >> Likewise, you could have just two copies of the code for all ARM boards >> that have a pl011 (or any other UART with a simple byte-long transmit >> register), one 32-bit and one 64-bit. > > OK, having a patching mechanism in place likely makes sense as soon as > we want to include multiple ARM boards here. I can do that, but I'd > rather like to do it as a second step. It was already quite hard work to > come up with all these assembler snippets (from CPUs where I don't have > a clue of) and to determine which machines could be tested this way at > all, so I'd first like to wait and see whether this patch series is > acceptable at all or not, since Peter has objections. Given your remark on endianness then no, having one snippet per binary is enough. Paolo
Re: [Qemu-devel] [PATCH v4 00/20] add byte-based block_status driver callbacks
Most of conversions looks fine, but it is not simple to prove the correctness, because we start to use internal driver logic on offsets and lengths, not aligned to sectors. And we can't imagine the consequences (at least, I can't and my r-b doesn't give the guarantee) of such change. It is like take some function and expand its scope of parameters. May be I'm too paranoiac. May be it would be good to align requests in bdrv_co_block_status to sectors at least for drivers which do not provide request_alignment. May be each patch should be reviewed by person, knowing that particular driver. 12.10.2017 21:58, Eric Blake wrote: There are patches floating around to add NBD_CMD_BLOCK_STATUS, but NBD wants to report status on byte granularity (even if the reporting will probably be naturally aligned to sectors or even much higher levels). I've therefore started the task of converting our block status code to report at a byte granularity rather than sectors. Now that 2.11 is open, I'm rebasing/reposting the remaining patches. The overall conversion currently looks like: part 1: bdrv_is_allocated (merged, commit 51b0a488) part 2: dirty-bitmap (merged, commit ca759622) part 3: bdrv_get_block_status (v6 is posted [1], partially reviewed) part 4: .bdrv_co_block_status (this series, v3 was here [2]) Available as a tag at: git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-callback-v4 Based-on: <20171012034720.11947-1-ebl...@redhat.com> ([PATCH v6 00/24] make bdrv_get_block_status byte-based) Since v3: - rebase to series 3 tweak in get_status interface - further simplify qed code - better documentation of optimization flag (s/mapping/want_zero/) 001/20:[0033] [FC] 'block: Add .bdrv_co_block_status() callback' 002/20:[0077] [FC] 'block: Switch passthrough drivers to .bdrv_co_block_status()' 003/20:[0020] [FC] 'file-posix: Switch to .bdrv_co_block_status()' 004/20:[0019] [FC] 'gluster: Switch to .bdrv_co_block_status()' 005/20:[] [--] 'iscsi: Switch cluster_sectors to byte-based' 006/20:[] [--] 'iscsi: Switch iscsi_allocmap_update() to byte-based' 007/20:[0022] [FC] 'iscsi: Switch to .bdrv_co_block_status()' 008/20:[0013] [FC] 'null: Switch to .bdrv_co_block_status()' 009/20:[0017] [FC] 'parallels: Switch to .bdrv_co_block_status()' 010/20:[0014] [FC] 'qcow: Switch to .bdrv_co_block_status()' 011/20:[0019] [FC] 'qcow2: Switch to .bdrv_co_block_status()' 012/20:[0086] [FC] 'qed: Switch to .bdrv_co_block_status()' 013/20:[0015] [FC] 'raw: Switch to .bdrv_co_block_status()' 014/20:[0011] [FC] 'sheepdog: Switch to .bdrv_co_block_status()' 015/20:[] [--] 'vdi: Avoid bitrot of debugging code' 016/20:[0017] [FC] 'vdi: Switch to .bdrv_co_block_status()' 017/20:[0013] [FC] 'vmdk: Switch to .bdrv_co_block_status()' 018/20:[0019] [FC] 'vpc: Switch to .bdrv_co_block_status()' 019/20:[0010] [FC] 'vvfat: Switch to .bdrv_co_block_status()' 020/20:[0019] [FC] 'block: Drop unused .bdrv_co_get_block_status()' Eric Blake (20): block: Add .bdrv_co_block_status() callback block: Switch passthrough drivers to .bdrv_co_block_status() file-posix: Switch to .bdrv_co_block_status() gluster: Switch to .bdrv_co_block_status() iscsi: Switch cluster_sectors to byte-based iscsi: Switch iscsi_allocmap_update() to byte-based iscsi: Switch to .bdrv_co_block_status() null: Switch to .bdrv_co_block_status() parallels: Switch to .bdrv_co_block_status() qcow: Switch to .bdrv_co_block_status() qcow2: Switch to .bdrv_co_block_status() qed: Switch to .bdrv_co_block_status() raw: Switch to .bdrv_co_block_status() sheepdog: Switch to .bdrv_co_block_status() vdi: Avoid bitrot of debugging code vdi: Switch to .bdrv_co_block_status() vmdk: Switch to .bdrv_co_block_status() vpc: Switch to .bdrv_co_block_status() vvfat: Switch to .bdrv_co_block_status() block: Drop unused .bdrv_co_get_block_status() include/block/block.h | 9 ++- include/block/block_int.h | 43 +++-- block/io.c| 80 +++- block/blkdebug.c | 20 +++--- block/commit.c| 2 +- block/file-posix.c| 59 ++ block/gluster.c | 67 +++- block/iscsi.c | 154 +- block/mirror.c| 2 +- block/null.c | 23 +++ block/parallels.c | 22 --- block/qcow.c | 27 block/qcow2.c | 24 block/qed.c | 84 + block/raw-format.c| 16 ++--- block/sheepdog.c | 26 block/throttle.c | 2 +- block/vdi.c | 45 +++--- block/vmdk.c | 28 - block/vpc.c | 42 +++-- block/vvfat.c | 16 +++-- 21 files changed, 404 insertions(+), 387 deletions(-) -- Best regards, Vladimir
Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
* Igor Mammedov (imamm...@redhat.com) wrote: > On Thu, 30 Nov 2017 12:47:20 + > "Dr. David Alan Gilbert" wrote: > > > * Igor Mammedov (imamm...@redhat.com) wrote: > > > On Thu, 30 Nov 2017 12:08:06 + > > > "Dr. David Alan Gilbert" wrote: > > > > > > > * Igor Mammedov (imamm...@redhat.com) wrote: > > > > > On Wed, 29 Nov 2017 18:50:19 + > > > > > "Dr. David Alan Gilbert (git)" wrote: > > > > > > > > > > > From: "Dr. David Alan Gilbert" > > > > > > > > > > > > Hi, > > > > > > This is an experimental set that reworks the way the vhost > > > > > > code handles changes in physical address space layout that > > > > > > came from a discussion with Igor. > > > > > Thanks for looking into it. > > > > > > > > > > > > > > > > Instead of updating and trying to merge sections of address > > > > > > space on each add/remove callback, we wait until the commit phase > > > > > > and go through and rebuild a list by walking the Flatview of > > > > > > memory and end up producing an ordered list. > > > > > > We compare the list to the old list to trigger updates. > > > > > > > > > > > > Note, only very lightly tested so far, I'm just trying to see if > > > > > > it's > > > > > > the right shape. > > > > > > > > > > > > Igor, is this what you were intending? > > > > > > > > > > I was thinking about a little less intrusive approach > > > > > > > > > > where vhost_region_add/del are modified to maintain > > > > > sorted by GPA array of mem_sections, vhost_dev::mem is dropped > > > > > altogether and vhost_memory_region array is build/used/freed > > > > > on every vhost_commit(). > > > > > Maintaining sorted array should roughly cost us O(2 log n) if > > > > > binary search is used. > > > > > > > > > > However I like your idea with iterator even more as it have > > > > > potential to make it even faster O(n) if we get rid of > > > > > quadratic and relatively complex vhost_update_compare_list(). > > > > > > > > Note vhost_update_compare_list is complex, > > > > but it is O(n) - it's > > > > got nested loops, but the inner loop moves forward and oldi never > > > > gets reset back to zero. > > > While skimming through patches I've overlooked it. > > > > > > Anyways, > > > why memcmp(old_arr, new_arr) is not sufficient > > > to detect a change in memory map? > > > > It tells you that you've got a change, but doesn't give > > the start/end of the range that's changed, and those > > are used by vhost_commit to limit the work of > > vhost_verify_ring_mappings. > Isn't memmap list a sorted and > dev->mem_changed_[start|end]_addr are the lowest|highest > addresses of whole map? > > If it's, so wouldn't getting values directly from > the 1st/last entries of array be sufficient? THat wasn't my understanding from the existing code; my understanding was that changed_start_addr is set by vhost_region_add->vhost_set_memory when a new region is added (or removed) and is set to the limit of the section added. But perhaps I'm misunderstanding. (The logic in vhost_verify_ring_mappings doesn't make sense to me either though; if vhost_verify_ring_part_mapping returns 0 on success, why is it doing if (!r) { break; } surely it should be if (r) { break; }) Dave > > > > > > Dave > > > > > > > > > > > Pls, see comments on individual patches. > > > > > > > > Thanks; I have fixed a couple of bugs since I posted, so I'm > > > > more interested in structure/shape. Any good ideas how to test > > > > it are welcome. > > > > > > > > Dave > > > > > > > > > > > > > > > Dave > > > > > > > > > > > > Dr. David Alan Gilbert (7): > > > > > > memory: address_space_iterate > > > > > > vhost: Move log_dirty check > > > > > > vhost: New memory update functions > > > > > > vhost: update_mem_cb implementation > > > > > > vhost: Compare new and old memory lists > > > > > > vhost: Copy updated region data into device state > > > > > > vhost: Remove vhost_set_memory and children > > > > > > > > > > > > hw/virtio/trace-events | 8 + > > > > > > hw/virtio/vhost.c | 424 > > > > > > ++--- > > > > > > include/exec/memory.h | 23 +++ > > > > > > memory.c | 22 +++ > > > > > > 4 files changed, 241 insertions(+), 236 deletions(-) > > > > > > > > > > > > > > > -- > > > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK > > > > > -- > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 06/17] iotests: Drop format-specific in _filter_img_info
On 2017-11-30 04:16, Fam Zheng wrote: > On Thu, 11/23 03:08, Max Reitz wrote: >> _filter_img_info should remove format-specific information, too. We >> already have such a filter in _img_info, and it is very useful for >> query-block-named-block-nodes (etc.), too. >> >> However, in 198 we need that information (but we still want the rest of >> the filter), so make that filtering optional. Note that "the rest of >> the filter" includes filtering of the test directory, so we can drop the >> _filter_testdir from 198 at the same time. >> >> Signed-off-by: Max Reitz >> --- >> tests/qemu-iotests/198 | 6 -- >> tests/qemu-iotests/common.filter | 29 - >> 2 files changed, 32 insertions(+), 3 deletions(-) >> >> diff --git a/tests/qemu-iotests/198 b/tests/qemu-iotests/198 >> index 34ef666381..a84a058396 100755 >> --- a/tests/qemu-iotests/198 >> +++ b/tests/qemu-iotests/198 >> @@ -91,11 +91,13 @@ $QEMU_IO --object $SECRET0 --object $SECRET1 -c "read -P >> 0x9 0 $size" --image-op >> >> echo >> echo "== checking image base ==" >> -$QEMU_IMG info --image-opts $IMGSPECBASE | _filter_img_info | >> _filter_testdir | sed -e "/^disk size:/ D" >> +$QEMU_IMG info --image-opts $IMGSPECBASE | _filter_img_info >> --format-specific \ >> +| sed -e "/^disk size:/ D" >> >> echo >> echo "== checking image layer ==" >> -$QEMU_IMG info --image-opts $IMGSPECLAYER | _filter_img_info | >> _filter_testdir | sed -e "/^disk size:/ D" >> +$QEMU_IMG info --image-opts $IMGSPECLAYER | _filter_img_info >> --format-specific \ >> +| sed -e "/^disk size:/ D" >> >> >> # success, all done >> diff --git a/tests/qemu-iotests/common.filter >> b/tests/qemu-iotests/common.filter >> index d9237799e9..0c0e53fae7 100644 >> --- a/tests/qemu-iotests/common.filter >> +++ b/tests/qemu-iotests/common.filter >> @@ -139,6 +139,15 @@ _filter_img_create() >> >> _filter_img_info() >> { >> +if [[ "$1" == "--format-specific" ]]; then >> +local format_specific=1 >> +shift >> +else >> +local format_specific=0 >> +fi >> + >> +discard=0 >> +regex_json_spec_start='^ *"format-specific": \{' >> sed -e "s#$IMGPROTO:$TEST_DIR#TEST_DIR#g" \ >> -e "s#$TEST_DIR#TEST_DIR#g" \ >> -e "s#$IMGFMT#IMGFMT#g" \ >> @@ -159,7 +168,25 @@ _filter_img_info() >> -e "/block_state_zero: \\(on\\|off\\)/d" \ >> -e "/log_size: [0-9]\\+/d" \ >> -e "s/iters: [0-9]\\+/iters: 1024/" \ >> --e "s/uuid: [-a-f0-9]\\+/uuid: >> ----/" >> +-e "s/uuid: [-a-f0-9]\\+/uuid: >> ----/" | \ >> +while IFS='' read -r line; do >> +if [[ $format_specific == 1 ]]; then >> +discard=0 >> +elif [[ $line == "Format specific information:" ]]; then >> +discard=1 >> +elif [[ $line =~ $regex_json_spec_start ]]; then >> +discard=2 >> +regex_json_spec_end="^${line%%[^ ]*}\\},? *$" >> +fi >> +if [[ $discard == 0 ]]; then >> +echo "$line" >> +elif [[ $discard == 1 && ! $line ]]; then > > s/\$line/"\$line"/ ? > >> +echo >> +discard=0 >> +elif [[ $discard == 2 && $line =~ $regex_json_spec_end ]]; then > > Ditto. It's in double brackets, so it should be fine. (I just copied it from _img_info in common.rc, and that was reviewed by Eric, so I assume it's fine. :-)) Max >> +discard=0 >> +fi >> +done >> } >> >> # filter out offsets and file names from qemu-img map; good for both >> -- >> 2.13.6 >> signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 08/17] iotests: Skip 103 for refcount_bits=1
On 2017-11-30 04:18, Fam Zheng wrote: > On Thu, 11/23 03:08, Max Reitz wrote: >> Signed-off-by: Max Reitz >> --- >> tests/qemu-iotests/103 | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/tests/qemu-iotests/103 b/tests/qemu-iotests/103 >> index ecbd8ebd71..d0cfab8844 100755 >> --- a/tests/qemu-iotests/103 >> +++ b/tests/qemu-iotests/103 >> @@ -40,6 +40,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15 >> _supported_fmt qcow2 >> _supported_proto file nfs >> _supported_os Linux >> +# Internal snapshots are (currently) impossible with refcount_bits=1 >> +_unsupported_imgopts 'refcount_bits=1[^0-9]' > > What is the "[^0-9]" part for? It's so you can specify refcount_bits=16, but not refcount_bits=1,compat=0.10 or just refcount_bits=1. Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [RFC PATCH v2 1/1] s390x/css: unrestrict cssids
On Thu, 30 Nov 2017 13:32:12 +0100 Halil Pasic wrote: (...) Before I spend way too much time on this: Is the proposed machine-property interface usable from a libvirt POV? IOW, can we go with this now and fix the ugliness later (probably via a generic overhaul of the interface)?
[Qemu-devel] [PATCH 0/3] macro do/while (0) cleanup
Noticed this by chance in the tests/ directory, so I broadened it to a grep of the entire code base. I suspect many of the bad macros were the victims of copy-and-paste from some other bad location (particularly given how many bit-rotten debug print macros were involved). https://wiki.qemu.org/BiteSizedTasks#Bitrot_prevention is still left for someone else, for another day. Eric Blake (3): net: Drop unusual use of do { } while (0); mips: Tweak location of ';' in macros maint: Fix macros with broken 'do/while(0);' usage tests/acpi-utils.h | 8 ui/sdl_zoom_template.h | 8 audio/paaudio.c| 4 ++-- hw/adc/stm32f2xx_adc.c | 2 +- hw/block/m25p80.c | 2 +- hw/char/cadence_uart.c | 2 +- hw/char/stm32f2xx_usart.c | 2 +- hw/display/cg3.c | 2 +- hw/display/dpcd.c | 2 +- hw/display/xlnx_dp.c | 2 +- hw/dma/pl330.c | 2 +- hw/dma/xlnx-zynq-devcfg.c | 2 +- hw/dma/xlnx_dpdma.c| 2 +- hw/i2c/i2c-ddc.c | 2 +- hw/misc/auxbus.c | 2 +- hw/misc/macio/mac_dbdma.c | 4 ++-- hw/misc/mmio_interface.c | 2 +- hw/misc/stm32f2xx_syscfg.c | 2 +- hw/misc/zynq_slcr.c| 2 +- hw/net/cadence_gem.c | 2 +- hw/net/pcnet.c | 20 ++-- hw/ssi/mss-spi.c | 2 +- hw/ssi/stm32f2xx_spi.c | 2 +- hw/ssi/xilinx_spi.c| 2 +- hw/ssi/xilinx_spips.c | 2 +- hw/timer/a9gtimer.c| 2 +- hw/timer/cadence_ttc.c | 2 +- hw/timer/mss-timer.c | 2 +- hw/timer/stm32f2xx_timer.c | 2 +- hw/tpm/tpm_passthrough.c | 2 +- hw/tpm/tpm_tis.c | 2 +- migration/rdma.c | 2 +- target/arm/translate-a64.c | 2 +- target/mips/msa_helper.c | 34 ++ target/s390x/kvm.c | 2 +- tests/tcg/test-mmap.c | 2 +- 36 files changed, 70 insertions(+), 68 deletions(-) -- 2.14.3
[Qemu-devel] [PATCH 2/3] mips: Tweak location of ';' in macros
It is more typical to provide the ';' by the caller of a macro than to embed it in the macro itself; this is because syntax highlight engines can get confused if a macro is called without a semicolon before the closing '}'. Signed-off-by: Eric Blake --- target/mips/msa_helper.c | 34 ++ 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c index f167a42655..8fb7a369ca 100644 --- a/target/mips/msa_helper.c +++ b/target/mips/msa_helper.c @@ -682,13 +682,13 @@ static inline int64_t msa_mod_u_df(uint32_t df, int64_t arg1, int64_t arg2) do {\ e = SIGNED_EVEN(a, df); \ o = SIGNED_ODD(a, df); \ -} while (0); +} while (0) #define UNSIGNED_EXTRACT(e, o, a, df) \ do {\ e = UNSIGNED_EVEN(a, df); \ o = UNSIGNED_ODD(a, df);\ -} while (0); +} while (0) static inline int64_t msa_dotp_s_df(uint32_t df, int64_t arg1, int64_t arg2) { @@ -1120,9 +1120,11 @@ void helper_msa_splat_df(CPUMIPSState *env, uint32_t df, uint32_t wd, #define MSA_LOOP_COND_D MSA_LOOP_COND(DF_DOUBLE) #define MSA_LOOP(DF) \ +do { \ for (i = 0; i < (MSA_LOOP_COND_ ## DF) ; i++) { \ -MSA_DO_ ## DF \ -} +MSA_DO_ ## DF; \ +} \ +} while (0) #define MSA_FN_DF(FUNC) \ void helper_msa_##FUNC(CPUMIPSState *env, uint32_t df, uint32_t wd, \ @@ -1135,17 +1137,17 @@ void helper_msa_##FUNC(CPUMIPSState *env, uint32_t df, uint32_t wd, \ uint32_t i; \ switch (df) { \ case DF_BYTE: \ -MSA_LOOP_B \ +MSA_LOOP_B; \ break; \ case DF_HALF: \ -MSA_LOOP_H \ +MSA_LOOP_H; \ break; \ case DF_WORD: \ -MSA_LOOP_W \ +MSA_LOOP_W; \ break; \ case DF_DOUBLE: \ -MSA_LOOP_D \ - break; \ +MSA_LOOP_D; \ +break; \ default:\ assert(0); \ } \ @@ -1168,7 +1170,7 @@ void helper_msa_##FUNC(CPUMIPSState *env, uint32_t df, uint32_t wd, \ do {\ R##DF(pwx, i) = pwt->DF[2*i]; \ L##DF(pwx, i) = pws->DF[2*i]; \ -} while (0); +} while (0) MSA_FN_DF(pckev_df) #undef MSA_DO @@ -1176,7 +1178,7 @@ MSA_FN_DF(pckev_df) do {\ R##DF(pwx, i) = pwt->DF[2*i+1]; \ L##DF(pwx, i) = pws->DF[2*i+1]; \ -} while (0); +} while (0) MSA_FN_DF(pckod_df) #undef MSA_DO @@ -1184,7 +1186,7 @@ MSA_FN_DF(pckod_df) do {\ pwx->DF[2*i] = L##DF(pwt, i); \ pwx->DF[2*i+1] = L##DF(pws, i); \ -} while (0); +} while (0) MSA_FN_DF(ilvl_df) #undef MSA_DO @@ -1192,7 +1194,7 @@ MSA_FN_DF(ilvl_df) do {\ pwx->DF[2*i] = R##DF(pwt, i); \ pwx->DF[2*i+1] = R##DF(pws, i); \ -} while (0); +} while (0) MSA_FN_DF(ilvr_df) #undef MSA_DO @@ -1200,7 +1202,7 @@ MSA_FN_DF(ilvr_df) do {\ pwx->DF[2*i] = pwt->DF[2*i]; \ pwx->DF[2*i+1] = pws->DF[2*i]; \ -} while (0); +} while (0) MSA_FN_DF(ilvev_df) #undef MSA_DO @@ -1208,7 +1210,7 @@ MSA_FN_DF(ilvev_df) do {\ pwx->DF[2*i] = pwt->DF[2*i+1];\ pwx->DF[2*i+1] = pws->DF[2*i+1];\ -} while (0); +} while (0) MSA_FN_DF(ilvod_df) #undef MSA_DO #undef MSA_LOOP_COND @@ -1222,7 +1224,7 @@ MSA_FN_DF(ilvod_df) uint32_t k = (pwd->DF[i] & 0x3f) % (2 * n); \ pwx->DF[i] =\ (pwd->DF[i] & 0xc0
[Qemu-devel] [PATCH 3/3] maint: Fix macros with broken 'do/while(0); ' usage
The point of writing a macro embedded in a 'do { ... } while (0)' loop is so that the macro can be used as a drop-in statement with the caller supplying the trailing ';'. Although our coding style frowns on brace-less 'if': if (cond) statement; else something else; the use of do/while (0) in a macro is absolutely essential for the purpose of avoiding a syntax error on the 'else' - but it only works if there is no trailing ';' in the macro (as the ';' in the code calling the macro would then be a second statement and cause the 'else' to not pair to the 'if'). Many of the places touched in this code are examples of the ugly bit-rotting debug print statements; cleaning those up is left as a bite-sized task for another day. Found mechanically via: $ git grep -B1 'while (0);' | grep -A1 Signed-off-by: Eric Blake --- tests/acpi-utils.h | 8 ui/sdl_zoom_template.h | 8 audio/paaudio.c| 4 ++-- hw/adc/stm32f2xx_adc.c | 2 +- hw/block/m25p80.c | 2 +- hw/char/cadence_uart.c | 2 +- hw/char/stm32f2xx_usart.c | 2 +- hw/display/cg3.c | 2 +- hw/display/dpcd.c | 2 +- hw/display/xlnx_dp.c | 2 +- hw/dma/pl330.c | 2 +- hw/dma/xlnx-zynq-devcfg.c | 2 +- hw/dma/xlnx_dpdma.c| 2 +- hw/i2c/i2c-ddc.c | 2 +- hw/misc/auxbus.c | 2 +- hw/misc/macio/mac_dbdma.c | 4 ++-- hw/misc/mmio_interface.c | 2 +- hw/misc/stm32f2xx_syscfg.c | 2 +- hw/misc/zynq_slcr.c| 2 +- hw/net/cadence_gem.c | 2 +- hw/ssi/mss-spi.c | 2 +- hw/ssi/stm32f2xx_spi.c | 2 +- hw/ssi/xilinx_spi.c| 2 +- hw/ssi/xilinx_spips.c | 2 +- hw/timer/a9gtimer.c| 2 +- hw/timer/cadence_ttc.c | 2 +- hw/timer/mss-timer.c | 2 +- hw/timer/stm32f2xx_timer.c | 2 +- hw/tpm/tpm_passthrough.c | 2 +- hw/tpm/tpm_tis.c | 2 +- migration/rdma.c | 2 +- target/arm/translate-a64.c | 2 +- target/s390x/kvm.c | 2 +- tests/tcg/test-mmap.c | 2 +- 34 files changed, 42 insertions(+), 42 deletions(-) diff --git a/tests/acpi-utils.h b/tests/acpi-utils.h index d5ca5b6238..ac52abd0dd 100644 --- a/tests/acpi-utils.h +++ b/tests/acpi-utils.h @@ -32,7 +32,7 @@ typedef struct { do {\ memread(addr, &field, sizeof(field)); \ addr += sizeof(field); \ -} while (0); +} while (0) #define ACPI_READ_ARRAY_PTR(arr, length, addr) \ do {\ @@ -40,7 +40,7 @@ typedef struct { for (idx = 0; idx < length; ++idx) {\ ACPI_READ_FIELD(arr[idx], addr);\ } \ -} while (0); +} while (0) #define ACPI_READ_ARRAY(arr, addr) \ ACPI_READ_ARRAY_PTR(arr, sizeof(arr) / sizeof(arr[0]), addr) @@ -56,7 +56,7 @@ typedef struct { ACPI_READ_FIELD((table)->oem_revision, addr);\ ACPI_READ_ARRAY((table)->asl_compiler_id, addr); \ ACPI_READ_FIELD((table)->asl_compiler_revision, addr); \ -} while (0); +} while (0) #define ACPI_ASSERT_CMP(actual, expected) do { \ char ACPI_ASSERT_CMP_str[5] = {}; \ @@ -77,7 +77,7 @@ typedef struct { ACPI_READ_FIELD((field).bit_offset, addr); \ ACPI_READ_FIELD((field).access_width, addr); \ ACPI_READ_FIELD((field).address, addr); \ -} while (0); +} while (0) uint8_t acpi_calc_checksum(const uint8_t *data, int len); diff --git a/ui/sdl_zoom_template.h b/ui/sdl_zoom_template.h index 3bb508b51e..6a424adfb4 100644 --- a/ui/sdl_zoom_template.h +++ b/ui/sdl_zoom_template.h @@ -34,22 +34,22 @@ #define setRed(r, pcolor) do { \ *pcolor = ((*pcolor) & (~(dpf->Rmask))) + \ (((r) & (dpf->Rmask >> dpf->Rshift)) << dpf->Rshift); \ -} while (0); +} while (0) #define setGreen(g, pcolor) do { \ *pcolor = ((*pcolor) & (~(dpf->Gmask))) + \ (((g) & (dpf->Gmask >> dpf->Gshift)) << dpf->Gshift); \ -} while (0); +} while (0) #define setBlue(b, pcolor) do { \ *pcolor = ((*pcolor) & (~(dpf->Bmask))) + \ (((b) & (dpf->Bmask >> dpf->Bshift)) << dpf->Bshift); \ -} while (0); +} while (0) #define setAlpha(a, pcolor) do { \ *pcolor = ((*pcolor) & (~(dpf->Amask))) + \ (((a) & (dpf->Amask >> dpf->Ashift)) << dpf->Ashift); \ -} while (0); +} while (0) static void glue(sdl_zoom_rgb, BPP)(SDL_Surface *src, SDL_Surface *dst, int smooth, SDL_Rect *dst_rect) diff --git a/audio/paaudio.c b/audio/paaudio.c index 65beb6f010..2a35e6f82c 100644 --- a/audio/paaudio.c +++ b/audio/paaudio.c @@ -89,7 +89,7 @@ static inline int PA_STREAM_IS_GOOD(pa_stream_state_t x) } \ goto label; \ }
[Qemu-devel] [PATCH 1/3] net: Drop unusual use of do { } while (0);
For a couple of macros in pcnet.c, we have to provide a new scope to avoid compiler warnings about declarations in the middle of a switch statement that aren't in a sub-scope. But use of 'do { ... } while (0);' merely to provide that new scope is arcane overkill, compared to just using '{ ... }'. Signed-off-by: Eric Blake --- hw/net/pcnet.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/hw/net/pcnet.c b/hw/net/pcnet.c index 654455355f..641bf2fd88 100644 --- a/hw/net/pcnet.c +++ b/hw/net/pcnet.c @@ -455,32 +455,32 @@ static inline void pcnet_rmd_store(PCNetState *s, struct pcnet_RMD *rmd, #define CHECK_RMD(ADDR,RES) do {\ switch (BCR_SWSTYLE(s)) { \ case 0x00: \ -do {\ +{ \ uint16_t rda[4];\ s->phys_mem_read(s->dma_opaque, (ADDR), \ (void *)&rda[0], sizeof(rda), 0); \ (RES) |= (rda[2] & 0xf000)!=0xf000; \ (RES) |= (rda[3] & 0xf000)!=0x; \ -} while (0);\ +} \ break; \ case 0x01: \ case 0x02: \ -do {\ +{ \ uint32_t rda[4];\ s->phys_mem_read(s->dma_opaque, (ADDR), \ (void *)&rda[0], sizeof(rda), 0); \ (RES) |= (rda[1] & 0xf000L)!=0xf000L; \ (RES) |= (rda[2] & 0xf000L)!=0xL; \ -} while (0);\ +} \ break; \ case 0x03: \ -do {\ +{ \ uint32_t rda[4];\ s->phys_mem_read(s->dma_opaque, (ADDR), \ (void *)&rda[0], sizeof(rda), 0); \ (RES) |= (rda[0] & 0xf000L)!=0xL; \ (RES) |= (rda[1] & 0xf000L)!=0xf000L; \ -} while (0);\ +} \ break; \ } \ } while (0) @@ -488,22 +488,22 @@ static inline void pcnet_rmd_store(PCNetState *s, struct pcnet_RMD *rmd, #define CHECK_TMD(ADDR,RES) do {\ switch (BCR_SWSTYLE(s)) { \ case 0x00: \ -do {\ +{ \ uint16_t xda[4];\ s->phys_mem_read(s->dma_opaque, (ADDR), \ (void *)&xda[0], sizeof(xda), 0); \ (RES) |= (xda[2] & 0xf000)!=0xf000; \ -} while (0);\ +} \ break; \ case 0x01: \ case 0x02: \ case 0x03: \ -do {\ +{ \ uint32_t xda[4];\ s->phys_mem_read(s->dma_opaque, (ADDR), \ (void *)&xda[0], sizeof(xda), 0); \ (RES) |= (xda[1] & 0xf000L)!=0xf000L; \ -} while (0);\ +} \ break; \ } \ } while (0) -- 2.14.3