Re: [RFC v2 05/13] vhost: Route guest->host notification through shadow virtqueue
在 2021/3/18 下午8:04, Eugenio Perez Martin 写道: On Thu, Mar 18, 2021 at 11:48 AM Eugenio Perez Martin wrote: On Thu, Mar 18, 2021 at 10:29 AM Jason Wang wrote: 在 2021/3/18 下午5:18, Eugenio Perez Martin 写道: On Thu, Mar 18, 2021 at 4:11 AM Jason Wang wrote: 在 2021/3/18 上午12:47, Eugenio Perez Martin 写道: On Wed, Mar 17, 2021 at 3:05 AM Jason Wang wrote: 在 2021/3/16 下午6:31, Eugenio Perez Martin 写道: On Tue, Mar 16, 2021 at 8:18 AM Jason Wang wrote: 在 2021/3/16 上午3:48, Eugenio Pérez 写道: Shadow virtqueue notifications forwarding is disabled when vhost_dev stops, so code flow follows usual cleanup. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.h | 7 ++ include/hw/virtio/vhost.h | 4 + hw/virtio/vhost-shadow-virtqueue.c | 113 ++- hw/virtio/vhost.c | 143 - 4 files changed, 265 insertions(+), 2 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h index 6cc18d6acb..c891c6510d 100644 --- a/hw/virtio/vhost-shadow-virtqueue.h +++ b/hw/virtio/vhost-shadow-virtqueue.h @@ -17,6 +17,13 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue; +bool vhost_shadow_vq_start(struct vhost_dev *dev, + unsigned idx, + VhostShadowVirtqueue *svq); +void vhost_shadow_vq_stop(struct vhost_dev *dev, + unsigned idx, + VhostShadowVirtqueue *svq); + VhostShadowVirtqueue *vhost_shadow_vq_new(struct vhost_dev *dev, int idx); void vhost_shadow_vq_free(VhostShadowVirtqueue *vq); diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h index ac963bf23d..7ffdf9aea0 100644 --- a/include/hw/virtio/vhost.h +++ b/include/hw/virtio/vhost.h @@ -55,6 +55,8 @@ struct vhost_iommu { QLIST_ENTRY(vhost_iommu) iommu_next; }; +typedef struct VhostShadowVirtqueue VhostShadowVirtqueue; + typedef struct VhostDevConfigOps { /* Vhost device config space changed callback */ @@ -83,7 +85,9 @@ struct vhost_dev { uint64_t backend_cap; bool started; bool log_enabled; +bool shadow_vqs_enabled; uint64_t log_size; +VhostShadowVirtqueue **shadow_vqs; Any reason that you don't embed the shadow virtqueue into vhost_virtqueue structure? Not really, it could be relatively big and I would prefer SVQ members/methods to remain hidden from any other part that includes vhost.h. But it could be changed, for sure. (Note that there's a masked_notifier in struct vhost_virtqueue). They are used differently: in SVQ the masked notifier is a pointer, and if it's NULL the SVQ code knows that device is not masked. The vhost_virtqueue is the real owner. Yes, but it's an example for embedding auxciliary data structures in the vhost_virtqueue. It could be replaced by a boolean in SVQ or something like that, I experimented with a tri-state too (UNMASKED, MASKED, MASKED_NOTIFIED) and let vhost.c code to manage all the transitions. But I find clearer the pointer use, since it's the more natural for the vhost_virtqueue_mask, vhost_virtqueue_pending existing functions. This masking/unmasking is the part I dislike the most from this series, so I'm very open to alternatives. See below. I think we don't even need to care about that. Error *migration_blocker; const VhostOps *vhost_ops; void *opaque; diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index 4512e5b058..3e43399e9c 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -8,9 +8,12 @@ */ #include "hw/virtio/vhost-shadow-virtqueue.h" +#include "hw/virtio/vhost.h" + +#include "standard-headers/linux/vhost_types.h" #include "qemu/error-report.h" -#include "qemu/event_notifier.h" +#include "qemu/main-loop.h" /* Shadow virtqueue to relay notifications */ typedef struct VhostShadowVirtqueue { @@ -18,14 +21,121 @@ typedef struct VhostShadowVirtqueue { EventNotifier kick_notifier; /* Shadow call notifier, sent to vhost */ EventNotifier call_notifier; + +/* + * Borrowed virtqueue's guest to host notifier. + * To borrow it in this event notifier allows to register on the event + * loop and access the associated shadow virtqueue easily. If we use the + * VirtQueue, we don't have an easy way to retrieve it. So this is something that worries me. It looks like a layer violation that makes the codes harder to work correctly. I don't follow you here. The vhost code already depends on virtqueue in the same sense: virtio_queue_get_host_notifier is called on vhost_virtqueue_start. So if this behavior ever changes it is unlikely for vhost to keep working without changes. vhost_virtqueue has a kick/call int where I think it should be stored actually, b
Re: [PATCH 1/4] m68k: add the virtio devices aliases
On 18/03/2021 23.39, Laurent Vivier wrote: Similarly to 5f629d943cb0 ("s390x: fix s390 virtio aliases"), define the virtio aliases. This allows to start machines with virtio devices without knowledge of the implementation type. For instance, we can use "-device virtio-scsi" on m68k, s390x or PC, and the device will be "virtio-scsi-device", "virtio-scsi-ccw" or "virtio-scsi-pci". This already exists for s390x and -ccw interfaces, adds them for m68k and MMIO (-device) interfaces. Signed-off-by: Laurent Vivier --- softmmu/qdev-monitor.c | 46 +++--- 1 file changed, 30 insertions(+), 16 deletions(-) With the typo mentioned by Philippe fixed: Reviewed-by: Thomas Huth
[PULL v4 0/6] QOM and fdc patches patches for 2021-03-16
The following changes since commit 8a40754bca14df63c6d2ffe473b68a270dc50679: Merge remote-tracking branch 'remotes/nvme/tags/nvme-next-pull-request' into staging (2021-03-18 19:55:37 +) are available in the Git repository at: git://repo.or.cz/qemu/armbru.git tags/pull-qom-fdc-2021-03-16-v4 for you to fetch changes up to 0c727a621a646504ccec2b08e55fd48030448466: memory: Drop "qemu:" prefix from QOM memory region type names (2021-03-19 06:55:49 +0100) QOM and fdc patches patches for 2021-03-16 Markus Armbruster (6): docs/system/deprecated: Fix note on fdc drive properties fdc: Drop deprecated floppy configuration fdc: Inline fdctrl_connect_drives() into fdctrl_realize_common() blockdev: Drop deprecated bogus -drive interface type hw: Replace anti-social QOM type names memory: Drop "qemu:" prefix from QOM memory region type names docs/system/deprecated.rst | 33 -- docs/system/removed-features.rst | 57 +++ include/exec/memory.h| 4 +- include/hw/arm/armv7m.h | 2 +- include/hw/arm/fsl-imx25.h | 2 +- include/hw/arm/fsl-imx31.h | 2 +- include/hw/arm/fsl-imx6.h| 2 +- include/hw/arm/fsl-imx6ul.h | 2 +- include/hw/arm/fsl-imx7.h| 2 +- include/hw/arm/xlnx-zynqmp.h | 2 +- include/hw/cris/etraxfs.h| 2 +- include/hw/i386/ich9.h | 2 +- include/hw/misc/grlib_ahb_apb_pnp.h | 4 +- include/hw/misc/zynq-xadc.h | 2 +- include/hw/register.h| 2 +- include/hw/sparc/grlib.h | 6 +- include/sysemu/blockdev.h| 1 - blockdev.c | 37 +- hw/arm/xilinx_zynq.c | 2 +- hw/audio/cs4231.c| 2 +- hw/block/fdc.c | 77 +--- hw/char/etraxfs_ser.c| 2 +- hw/cris/axis_dev88.c | 6 +- hw/display/tcx.c | 2 +- hw/intc/etraxfs_pic.c| 2 +- hw/microblaze/xlnx-zynqmp-pmu.c | 2 +- hw/misc/zynq_slcr.c | 2 +- hw/sparc/sun4m.c | 12 +- hw/timer/etraxfs_timer.c | 2 +- softmmu/vl.c | 10 +- tests/qemu-iotests/172 | 31 +- tests/qemu-iotests/172.out | 562 +-- tests/vmstate-static-checker-data/dump1.json | 4 +- tests/vmstate-static-checker-data/dump2.json | 4 +- 34 files changed, 125 insertions(+), 761 deletions(-) -- 2.26.3
Re: [PULL 5/5] m68k: add Virtual M68k Machine
On 18/03/2021 18.28, Max Reitz wrote: [...] From that it follows that I don’t see much use in testing specific devices either. Say there’s a platform that provides both virtio-pci and virtio-mmio, the default (say virtio-pci) is fine for the iotests. I see little value in testing virtio-mmio as well. (Perhaps I’m short-sighted, though.) That's a fair point. But still, if someone compiled QEMU only with a target that only provided virtio-mmio, the iotests should not fail when running "make check". To avoid that we continue playing whack-a-mole here in the future, maybe it would be better to restrict the iotests to the "main" targets only, e.g. modify check-block.sh so that the tests only run with x86, aarch64, s390x and ppc64 ? Thomas
Re: [PATCH 1/2] floppy: add a regression test for CVE-2020-25741
Alexander Bulekov writes: > dd if=/dev/zero of=/tmp/fda.img bs=1024 count=1440 > cat << EOF | ./qemu-system-i386 -nographic -m 512M -nodefaults \ > -accel qtest -fda /tmp/fda.img -qtest stdio > outw 0x3f4 0x0500 > outb 0x3f5 0x00 > outb 0x3f5 0x00 > outw 0x3f4 0x00 > outb 0x3f5 0x00 > outw 0x3f1 0x0400 > outw 0x3f4 0x0 > outw 0x3f4 0x00 > outb 0x3f5 0x0 > outb 0x3f5 0x01 > outw 0x3f1 0x0500 > outb 0x3f5 0x00 > EOF > > Signed-off-by: Alexander Bulekov I guess this is a reproducer. Please also describe actual and expected result. Same for PATCH 2.
Re: [PATCH] gitlab-ci: Restrict jobs using Docker to runners having 'docker' tag
On 19/03/2021 01.43, Philippe Mathieu-Daudé wrote: When a job is based on a Docker image [1], or is using a Docker service, it requires a runner with Docker installed. Gitlab shared runners provide the 'docker' tag when they have it installed. Are Gitlab shared runners are limited resources, we'd like to s/Are/As/ add more runners to QEMU repositories hosted on Gitlab. If a runner doesn't provide Docker, our jobs requiring it will fail. Use the standard 'docker' tag to mark the jobs requiring Docker on the runner. [1] https://docs.gitlab.com/ee/ci/yaml/#image [2] https://docs.gitlab.com/ee/ci/yaml/#services Signed-off-by: Philippe Mathieu-Daudé [...] diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml index f65cb11c4d3..d4511cf7dea 100644 --- a/.gitlab-ci.yml +++ b/.gitlab-ci.yml @@ -14,6 +14,8 @@ include: - local: '/.gitlab-ci.d/crossbuilds.yml' .native_build_job_template: &native_build_job_definition + tags: + - docker stage: build image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:latest before_script: @@ -38,6 +40,8 @@ include: fi .native_test_job_template: &native_test_job_definition + tags: + - docker stage: test image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:latest script: If you add it to the templates ... won't this disable most of the jobs on the dedicated runners that don't have docker? Wouldn't it be better to add the tag only to the jobs that run "make check-tcg" ? Thomas
Re: Serious doubts about Gitlab CI
On 18/03/2021 11.28, Philippe Mathieu-Daudé wrote: On 3/18/21 10:50 AM, Philippe Mathieu-Daudé wrote: On 3/18/21 10:33 AM, Daniel P. Berrangé wrote: [...] It feels like what you hit here is fallout from your account accidentally getting blocked, rather than something which is hitting every contributor to QEMU. Did they restore projects as private perhaps ? Yes my repository was restored as private and I had to switch it to public. I'll try to blew everything (after backing it up) and recreate it as public from start, and see if I get the unlimited minutes back. You were right, I forked the project again as public and can run CI pipelines. I note this is different that my previous account, I am restricted at 15 jobs at a time, so this is slower. That's not related to your new account. It's a new behaviour that gitlab introduced for everybody using the shared runners recently. At least it's happening for me, too. I sometimes even only get 5 jobs running at the same time, but sometimes also more, I think ... it likely depends on the day and the amount of free runners. Anyway, I think we should try to somehow decrease the amount of shared-runner jobs in our CI again. It's grown too big. Some long-running jobs should maybe be moved to a dedicated runner instead... Cleber, where are we with the custom runners? Any news here? Thomas
[PATCH 2/2] floppy: add a regression test for CVE-2021-20196
dd if=/dev/zero of=/tmp/fda.img bs=1024 count=1440 cat << EOF | ./qemu-system-i386 -nographic -m 512M -nodefaults \ -accel qtest -fda /tmp/fda.img -qtest stdio outw 0x3f4 0x0500 outb 0x3f5 0x00 outb 0x3f5 0x00 outw 0x3f4 0x00 outb 0x3f5 0x00 outw 0x3f1 0x0400 outw 0x3f4 0x0 outw 0x3f4 0x00 outb 0x3f5 0x0 outb 0x3f5 0x01 outw 0x3f1 0x0500 outb 0x3f5 0x00 EOF Signed-off-by: Alexander Bulekov --- Since this looks very similar to CVE-2021-20196 (I believe Li pointed out that issue in this thread), I'm also posting the reproducer for that here. tests/qtest/fuzz-test.c | 57 + 1 file changed, 57 insertions(+) diff --git a/tests/qtest/fuzz-test.c b/tests/qtest/fuzz-test.c index 62ececc66f..8e4ccdaca8 100644 --- a/tests/qtest/fuzz-test.c +++ b/tests/qtest/fuzz-test.c @@ -76,6 +76,61 @@ static void test_fdc_cve_2020_25741(void) qtest_quit(s); } + +/* + * ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x0344 + * The signal is caused by a WRITE memory access. + * #0 0x56494543 in blk_inc_in_flight /block/block-backend.c:1356:5 + * #1 0x56494543 in blk_prw /block/block-backend.c:1328:5 + * #2 0x56494d03 in blk_pread /block/block-backend.c:1491:15 + * #3 0x55ec8986 in fdctrl_read_data /hw/block/fdc.c:1910:17 + * #4 0x55ec8986 in fdctrl_read /hw/block/fdc.c:936:18 + * #5 0x563f26b7 in portio_read /softmmu/ioport.c:185:25 + * #6 0x5636908a in memory_region_read_accessor /softmmu/memory.c:442:11 + * #7 0x5635ec25 in access_with_adjusted_size /softmmu/memory.c:552:18 + * #8 0x5635ec25 in memory_region_dispatch_read1 /softmmu/memory.c:1420:16 + * #9 0x5635ec25 in memory_region_dispatch_read /softmmu/memory.c:1448:9 + * #10 0x56248aa7 in flatview_read_continue /softmmu/physmem.c:2810:23 + * #11 0x563f18f0 in address_space_read /include/exec/memory.h:2494:26 + * #12 0x563f18f0 in cpu_inw /softmmu/ioport.c:99:5 + * #13 0x5637619c in qtest_process_command /softmmu/qtest.c:502:21 + * #14 0x5637535d in qtest_process_inbuf /softmmu/qtest.c:797:9 + * #15 0x56405f9c in tcp_chr_read /chardev/char-socket.c:557:13 + * #16 0x76f8ac3e in g_main_context_dispatch + * #17 0x567479f1 in glib_pollfds_poll /util/main-loop.c:231:9 + * #18 0x567479f1 in os_host_main_loop_wait /util/main-loop.c:254:5 + * #19 0x567479f1 in main_loop_wait /util/main-loop.c:530:11 + * #20 0x562d9ee4 in qemu_main_loop /softmmu/runstate.c:725:9 + * #21 0x55d5b615 in main /softmmu/main.c:50:5 +*/ +static void test_fdc_cve_2021_20196(void) +{ +QTestState *s; +int fd; +char tmpdisk[] = "/tmp/fda.XX"; +fd = mkstemp(tmpdisk); +assert(fd >= 0); +ftruncate(fd, 1440 * 1024); +close(fd); + +s = qtest_initf("-nographic -m 512M -nodefaults " +"-drive file=%s,format=raw,if=floppy", tmpdisk); +qtest_outw(s, 0x3f2, 0x04); +qtest_outw(s, 0x3f4, 0x0200); +qtest_outw(s, 0x3f4, 0x00); +qtest_outw(s, 0x3f4, 0x00); +qtest_outw(s, 0x3f4, 0x00); +qtest_outw(s, 0x3f4, 0x00); +qtest_outw(s, 0x3f4, 0x00); +qtest_outw(s, 0x3f4, 0x00); +qtest_outw(s, 0x3f4, 0x00); +qtest_outw(s, 0x3f4, 0x00); +qtest_outw(s, 0x3f2, 0x01); +qtest_inw(s, 0x3f4); +qtest_quit(s); +unlink(tmpdisk); +} + int main(int argc, char **argv) { const char *arch = qtest_get_arch(); @@ -87,6 +142,8 @@ int main(int argc, char **argv) test_lp1878642_pci_bus_get_irq_level_assert); qtest_add_func("fuzz/test_fdc_cve_2020_25741", test_fdc_cve_2020_25741); +qtest_add_func("fuzz/test_fdc_cve_2021_20196", + test_fdc_cve_2021_20196); } return g_test_run(); -- 2.27.0
[PATCH 1/2] floppy: add a regression test for CVE-2020-25741
dd if=/dev/zero of=/tmp/fda.img bs=1024 count=1440 cat << EOF | ./qemu-system-i386 -nographic -m 512M -nodefaults \ -accel qtest -fda /tmp/fda.img -qtest stdio outw 0x3f4 0x0500 outb 0x3f5 0x00 outb 0x3f5 0x00 outw 0x3f4 0x00 outb 0x3f5 0x00 outw 0x3f1 0x0400 outw 0x3f4 0x0 outw 0x3f4 0x00 outb 0x3f5 0x0 outb 0x3f5 0x01 outw 0x3f1 0x0500 outb 0x3f5 0x00 EOF Signed-off-by: Alexander Bulekov --- Might be useful for reproducing/regression testing tests/qtest/fuzz-test.c | 54 + 1 file changed, 54 insertions(+) diff --git a/tests/qtest/fuzz-test.c b/tests/qtest/fuzz-test.c index 00149abec7..62ececc66f 100644 --- a/tests/qtest/fuzz-test.c +++ b/tests/qtest/fuzz-test.c @@ -24,6 +24,58 @@ static void test_lp1878642_pci_bus_get_irq_level_assert(void) qtest_quit(s); } +/* + * ERROR: AddressSanitizer: SEGV on unknown address 0x0344 + * The signal is caused by a WRITE memory access. + * #0 0x59d7f519 in blk_inc_in_flight /block/block-backend.c:1356:5 + * #1 0x59d7f519 in blk_prw /block/block-backend.c:1328:5 + * #2 0x59d81673 in blk_pwrite /block/block-backend.c:1501:15 + * #3 0x58fc763f in fdctrl_write_data /hw/block/fdc.c:2414:17 + * #4 0x58fc763f in fdctrl_write /hw/block/fdc.c:967:9 + * #5 0x594a622c in memory_region_write_accessor /softmmu/memory.c:491:5 + * #6 0x594a5c93 in access_with_adjusted_size /softmmu/memory.c:552:18 + * #7 0x594a54f0 in memory_region_dispatch_write /softmmu/memory.c + * #8 0x5964a686 in flatview_write_continue /softmmu/physmem.c:2776:23 + * #9 0x5963fde8 in flatview_write /softmmu/physmem.c:2816:14 + * #10 0x5963fde8 in address_space_write /softmmu/physmem.c:2908:18 + * #11 0x5968f8a1 in cpu_outb /softmmu/ioport.c:60:5 + * #12 0x597d5619 in qtest_process_command /softmmu/qtest.c:479:13 + * #13 0x597d34bf in qtest_process_inbuf /softmmu/qtest.c:797:9 + * #14 0x59b4539d in fd_chr_read /chardev/char-fd.c:68:9 + * #15 0x77b7dc3e in g_main_context_dispatch + * #16 0x5a4facdc in glib_pollfds_poll /util/main-loop.c:231:9 + * #17 0x5a4facdc in os_host_main_loop_wait /util/main-loop.c:254:5 + * #18 0x5a4facdc in main_loop_wait /util/main-loop.c:530:11 + * #19 0x59a6dec9 in qemu_main_loop /softmmu/runstate.c:725:9 + * #20 0x581a3085 in main /softmmu/main.c:50:5 + */ +static void test_fdc_cve_2020_25741(void) +{ +QTestState *s; +int fd; +char tmpdisk[] = "/tmp/fda.XX"; +fd = mkstemp(tmpdisk); +assert(fd >= 0); +ftruncate(fd, 1440 * 1024); +close(fd); + +s = qtest_initf("-nographic -m 512M -nodefaults " +"-drive file=%s,format=raw,if=floppy", tmpdisk); +qtest_outw(s, 0x3f4, 0x0500); +qtest_outb(s, 0x3f5, 0x00); +qtest_outb(s, 0x3f5, 0x00); +qtest_outw(s, 0x3f4, 0x00); +qtest_outb(s, 0x3f5, 0x00); +qtest_outw(s, 0x3f1, 0x0400); +qtest_outw(s, 0x3f4, 0x0); +qtest_outw(s, 0x3f4, 0x00); +qtest_outb(s, 0x3f5, 0x0); +qtest_outb(s, 0x3f5, 0x01); +qtest_outw(s, 0x3f1, 0x0500); +qtest_outb(s, 0x3f5, 0x00); +qtest_quit(s); +} + int main(int argc, char **argv) { const char *arch = qtest_get_arch(); @@ -33,6 +85,8 @@ int main(int argc, char **argv) if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) { qtest_add_func("fuzz/test_lp1878642_pci_bus_get_irq_level_assert", test_lp1878642_pci_bus_get_irq_level_assert); +qtest_add_func("fuzz/test_fdc_cve_2020_25741", + test_fdc_cve_2020_25741); } return g_test_run(); -- 2.27.0
[PATCH V4 6/7] net/colo-compare: Add passthrough list to CompareState
Add passthrough list for each CompareState. Signed-off-by: Zhang Chen --- net/colo-compare.c | 29 + net/colo-compare.h | 11 +++ 2 files changed, 40 insertions(+) diff --git a/net/colo-compare.c b/net/colo-compare.c index a803f8b888..40af8cd501 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -141,6 +141,7 @@ static int packet_enqueue(CompareState *s, int mode, Connection **con) ConnectionKey key; Packet *pkt = NULL; Connection *conn; +PassthroughEntry *bypass, *next; int ret; if (mode == PRIMARY_IN) { @@ -160,6 +161,32 @@ static int packet_enqueue(CompareState *s, int mode, Connection **con) } fill_connection_key(pkt, &key); +/* Check COLO passthrough connenction */ +qemu_mutex_lock(&s->passthroughlist_mutex); +if (!QLIST_EMPTY(&s->passthroughlist)) { +QLIST_FOREACH_SAFE(bypass, &s->passthroughlist, node, next) { +if (((key.ip_proto == IPPROTO_TCP) && (bypass->l4_protocol == 0)) || +((key.ip_proto == IPPROTO_UDP) && (bypass->l4_protocol == 1))) { +if (bypass->src_port == 0 || bypass->src_port == key.dst_port) { +if (bypass->src_ip.s_addr == 0 || +bypass->src_ip.s_addr == key.src.s_addr) { +if (bypass->dst_port == 0 || +bypass->dst_port == key.src_port) { +if (bypass->dst_ip.s_addr == 0 || +bypass->dst_ip.s_addr == key.dst.s_addr) { +packet_destroy(pkt, NULL); +pkt = NULL; +qemu_mutex_unlock(&s->passthroughlist_mutex); +return -1; +} +} +} +} +} +} +} +qemu_mutex_unlock(&s->passthroughlist_mutex); + conn = connection_get(s->connection_track_table, &key, &s->conn_list); @@ -1224,6 +1251,7 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp) } g_queue_init(&s->conn_list); +QLIST_INIT(&s->passthroughlist); s->connection_track_table = g_hash_table_new_full(connection_key_hash, connection_key_equal, @@ -1236,6 +1264,7 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp) if (!colo_compare_active) { qemu_mutex_init(&event_mtx); qemu_cond_init(&event_complete_cond); +qemu_mutex_init(&s->passthroughlist_mutex); colo_compare_active = true; } QTAILQ_INSERT_TAIL(&net_compares, s, next); diff --git a/net/colo-compare.h b/net/colo-compare.h index 2a9dcac0a7..2259abcd63 100644 --- a/net/colo-compare.h +++ b/net/colo-compare.h @@ -54,6 +54,15 @@ typedef struct SendEntry { uint8_t *buf; } SendEntry; +typedef struct PassthroughEntry { +int l4_protocol; +int src_port; +int dst_port; +struct in_addr src_ip; +struct in_addr dst_ip; +QLIST_ENTRY(PassthroughEntry) node; +} PassthroughEntry; + /* * + CompareState ++ * | | @@ -110,6 +119,8 @@ struct CompareState { QEMUBH *event_bh; enum colo_event event; +QLIST_HEAD(, PassthroughEntry) passthroughlist; +QemuMutex passthroughlist_mutex; QTAILQ_ENTRY(CompareState) next; }; -- 2.25.1
[PATCH V4 3/7] qapi/net: Add new QMP command for COLO passthrough
Since the real user scenario does not need COLO to monitor all traffic. Add colo-passthrough-add and colo-passthrough-del to maintain a COLO network passthrough list. Signed-off-by: Zhang Chen --- net/net.c | 10 ++ qapi/net.json | 40 2 files changed, 50 insertions(+) diff --git a/net/net.c b/net/net.c index 725a4e1450..7c7cefe0e0 100644 --- a/net/net.c +++ b/net/net.c @@ -1199,6 +1199,16 @@ void qmp_netdev_del(const char *id, Error **errp) } } +void qmp_colo_passthrough_add(L4_Connection *conn, Error **errp) +{ +/* Setup passthrough connection */ +} + +void qmp_colo_passthrough_del(L4_Connection *conn, Error **errp) +{ +/* Delete passthrough connection */ +} + static void netfilter_print_info(Monitor *mon, NetFilterState *nf) { char *str; diff --git a/qapi/net.json b/qapi/net.json index cd4a8ed95e..ec7d3b1128 100644 --- a/qapi/net.json +++ b/qapi/net.json @@ -851,3 +851,43 @@ 'data': { 'protocol': 'IP_PROTOCOL', '*id': 'str', '*src_ip': 'str', '*dst_ip': 'str', '*src_port': 'int', '*dst_port': 'int' } } +## +# @colo-passthrough-add: +# +# Add passthrough entry according to customer's needs in COLO-compare. +# +# Returns: Nothing on success +# +# Since: 6.1 +# +# Example: +# +# -> { "execute": "colo-passthrough-add", +# "arguments": { "protocol": "tcp", "id": "object0", "src_ip": "192.168.1.1", +# "dst_ip": "192.168.1.2", "src_port": 1234, "dst_port": 4321 } } +# <- { "return": {} } +# +## +{ 'command': 'colo-passthrough-add', 'boxed': true, + 'data': 'L4_Connection' } + +## +# @colo-passthrough-del: +# +# Delete passthrough entry according to customer's needs in COLO-compare. +# +# Returns: Nothing on success +# +# Since: 6.1 +# +# Example: +# +# -> { "execute": "colo-passthrough-del", +# "arguments": { "protocol": "tcp", "id": "object0", "src_ip": "192.168.1.1", +# "dst_ip": "192.168.1.2", "src_port": 1234, "dst_port": 4321 } } +# <- { "return": {} } +# +## +{ 'command': 'colo-passthrough-del', 'boxed': true, + 'data': 'L4_Connection' } + -- 2.25.1
[PATCH V4 5/7] net/colo-compare: Move data structure and define to .h file.
Make other modules can reuse COLO code. Signed-off-by: Zhang Chen --- net/colo-compare.c | 106 - net/colo-compare.h | 106 + 2 files changed, 106 insertions(+), 106 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index 84db4978ac..a803f8b888 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -17,44 +17,24 @@ #include "qemu/error-report.h" #include "trace.h" #include "qapi/error.h" -#include "net/net.h" #include "net/eth.h" #include "qom/object_interfaces.h" #include "qemu/iov.h" #include "qom/object.h" #include "net/queue.h" -#include "chardev/char-fe.h" #include "qemu/sockets.h" -#include "colo.h" -#include "sysemu/iothread.h" #include "net/colo-compare.h" -#include "migration/colo.h" -#include "migration/migration.h" #include "util.h" #include "block/aio-wait.h" #include "qemu/coroutine.h" -#define TYPE_COLO_COMPARE "colo-compare" -typedef struct CompareState CompareState; -DECLARE_INSTANCE_CHECKER(CompareState, COLO_COMPARE, - TYPE_COLO_COMPARE) - static QTAILQ_HEAD(, CompareState) net_compares = QTAILQ_HEAD_INITIALIZER(net_compares); static NotifierList colo_compare_notifiers = NOTIFIER_LIST_INITIALIZER(colo_compare_notifiers); -#define COMPARE_READ_LEN_MAX NET_BUFSIZE -#define MAX_QUEUE_SIZE 1024 - -#define COLO_COMPARE_FREE_PRIMARY 0x01 -#define COLO_COMPARE_FREE_SECONDARY 0x02 - -#define REGULAR_PACKET_CHECK_MS 1000 -#define DEFAULT_TIME_OUT_MS 3000 - /* #define DEBUG_COLO_PACKETS */ static QemuMutex colo_compare_mutex; @@ -64,92 +44,6 @@ static QemuCond event_complete_cond; static int event_unhandled_count; static uint32_t max_queue_size; -/* - * + CompareState ++ - * | | - * +---+ +---+ +---+ - * | conn list + - > conn + --- > conn + -- > .. - * +---+ +---+ +---+ - * | | | | | | - * +---+ +---v+ +---v++---v+ +---v+ - *|primary | |secondary|primary | |secondary - *|packet | |packet +|packet | |packet + - *++ ++++ ++ - *| | | | - *+---v+ +---v++---v+ +---v+ - *|primary | |secondary|primary | |secondary - *|packet | |packet +|packet | |packet + - *++ ++++ ++ - *| | | | - *+---v+ +---v++---v+ +---v+ - *|primary | |secondary|primary | |secondary - *|packet | |packet +|packet | |packet + - *++ ++++ ++ - */ - -typedef struct SendCo { -Coroutine *co; -struct CompareState *s; -CharBackend *chr; -GQueue send_list; -bool notify_remote_frame; -bool done; -int ret; -} SendCo; - -typedef struct SendEntry { -uint32_t size; -uint32_t vnet_hdr_len; -uint8_t *buf; -} SendEntry; - -struct CompareState { -Object parent; - -char *pri_indev; -char *sec_indev; -char *outdev; -char *notify_dev; -CharBackend chr_pri_in; -CharBackend chr_sec_in; -CharBackend chr_out; -CharBackend chr_notify_dev; -SocketReadState pri_rs; -SocketReadState sec_rs; -SocketReadState notify_rs; -SendCo out_sendco; -SendCo notify_sendco; -bool vnet_hdr; -uint64_t compare_timeout; -uint32_t expired_scan_cycle; - -/* - * Record the connection that through the NIC - * Element type: Connection - */ -GQueue conn_list; -/* Record the connection without repetition */ -GHashTable *connection_track_table; - -IOThread *iothread; -GMainContext *worker_context; -QEMUTimer *packet_check_timer; - -QEMUBH *event_bh; -enum colo_event event; - -QTAILQ_ENTRY(CompareState) next; -}; - -typedef struct CompareClass { -ObjectClass parent_class; -} CompareClass; - -enum { -PRIMARY_IN = 0, -SECONDARY_IN, -}; - static const char *colo_mode[] = { [PRIMARY_IN] = "primary", [SECONDARY_IN] = "secondary", diff --git a/net/colo-compare.h b/net/colo-compare.h index 22ddd512e2..2a9dcac0a7 100644 --- a/net/colo-compare.h +++ b/net/colo-compare.h @@ -17,6 +17,112 @@ #ifndef QEMU_COLO_COMPARE_H #define QEMU_COLO_COMPARE_H +#include "net/net.h" +#include "chardev/char-fe.h" +#include "migration/colo.h" +#include "migration/migration.h" +#include "sysemu/iothread.h" +#include "colo.h" + +#define TYPE_COLO_COMPARE "colo-compare" +typedef struct CompareState CompareState; +DEC
[PATCH V4 1/7] qapi/net.json: Add IP_PROTOCOL definition
Add IP_PROTOCOL as enum include TCP,UDP, ICMP... for other QMP commands. Signed-off-by: Zhang Chen --- qapi/net.json | 31 +++ 1 file changed, 31 insertions(+) diff --git a/qapi/net.json b/qapi/net.json index 87361ebd9a..498ea7aa72 100644 --- a/qapi/net.json +++ b/qapi/net.json @@ -794,3 +794,34 @@ # ## { 'command': 'query-netdev', 'returns': ['NetdevInfo'] } + +## +# @IP_PROTOCOL: +# +# Transport layer protocol. +# +# Just for IPv4. +# +# @tcp: Transmission Control Protocol. +# +# @udp: User Datagram Protocol. +# +# @dccp: Datagram Congestion Control Protocol. +# +# @sctp: Stream Control Transmission Protocol. +# +# @udplite: Lightweight User Datagram Protocol. +# +# @icmp: Internet Control Message Protocol. +# +# @igmp: Internet Group Management Protocol. +# +# @ipv6: IPv6 Encapsulation. +# +# TODO: Need to add more transport layer protocol. +# +# Since: 6.1 +## +{ 'enum': 'IP_PROTOCOL', 'data': [ 'tcp', 'udp', 'dccp', 'sctp', 'udplite', +'icmp', 'igmp', 'ipv6' ] } + -- 2.25.1
[PATCH V4 4/7] hmp-commands: Add new HMP command for COLO passthrough
Add hmp_colo_passthrough_add and hmp_colo_passthrough_del make user can maintain COLO network passthrough list in human monitor. Signed-off-by: Zhang Chen --- hmp-commands.hx | 26 ++ include/monitor/hmp.h | 2 ++ monitor/hmp-cmds.c| 34 ++ 3 files changed, 62 insertions(+) diff --git a/hmp-commands.hx b/hmp-commands.hx index d4001f9c5d..b67a5a04cb 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1335,6 +1335,32 @@ SRST Remove host network device. ERST +{ +.name = "colo_passthrough_add", +.args_type = "protocol:s,id:s?,src_ip:s?,dst_ip:s?,src_port:i?,dst_port:i?", +.params = "protocol [id] [src_ip] [dst_ip] [src_port] [dst_port]", +.help = "Add network stream to colo passthrough list", +.cmd= hmp_colo_passthrough_add, +}, + +SRST +``colo_passthrough_add`` + Add network stream to colo passthrough list. +ERST + +{ +.name = "colo_passthrough_del", +.args_type = "protocol:s,id:s?,src_ip:s?,dst_ip:s?,src_port:i?,dst_port:i?", +.params = "protocol [id] [src_ip] [dst_ip] [src_port] [dst_port]", +.help = "Delete network stream from colo passthrough list", +.cmd= hmp_colo_passthrough_del, +}, + +SRST +``colo_passthrough_del`` + Delete network stream from colo passthrough list. +ERST + { .name = "object_add", .args_type = "object:O", diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h index ed2913fd18..3c4943b09f 100644 --- a/include/monitor/hmp.h +++ b/include/monitor/hmp.h @@ -81,6 +81,8 @@ void hmp_device_del(Monitor *mon, const QDict *qdict); void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict); void hmp_netdev_add(Monitor *mon, const QDict *qdict); void hmp_netdev_del(Monitor *mon, const QDict *qdict); +void hmp_colo_passthrough_add(Monitor *mon, const QDict *qdict); +void hmp_colo_passthrough_del(Monitor *mon, const QDict *qdict); void hmp_getfd(Monitor *mon, const QDict *qdict); void hmp_closefd(Monitor *mon, const QDict *qdict); void hmp_sendkey(Monitor *mon, const QDict *qdict); diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c index 3c88a4faef..b57e3430ab 100644 --- a/monitor/hmp-cmds.c +++ b/monitor/hmp-cmds.c @@ -1668,6 +1668,40 @@ void hmp_netdev_del(Monitor *mon, const QDict *qdict) hmp_handle_error(mon, err); } +void hmp_colo_passthrough_add(Monitor *mon, const QDict *qdict) +{ +const char *prot = qdict_get_str(qdict, "protocol"); +L4_Connection *l4_conn = g_new0(L4_Connection, 1); +Error *err = NULL; + +l4_conn->id = g_strdup(qdict_get_try_str(qdict, "id")); +l4_conn->protocol = qapi_enum_parse(&IP_PROTOCOL_lookup, prot, -1, &err); +l4_conn->src_ip = g_strdup(qdict_get_try_str(qdict, "src_ip")); +l4_conn->dst_ip = g_strdup(qdict_get_try_str(qdict, "dst_ip")); +l4_conn->src_port = qdict_get_try_int(qdict, "src_port", 0); +l4_conn->dst_port = qdict_get_try_int(qdict, "dst_port", 0); + +qmp_colo_passthrough_add(l4_conn, &err); +hmp_handle_error(mon, err); +} + +void hmp_colo_passthrough_del(Monitor *mon, const QDict *qdict) +{ +const char *prot = qdict_get_str(qdict, "protocol"); +L4_Connection *l4_conn = g_new0(L4_Connection, 1); +Error *err = NULL; + +l4_conn->id = g_strdup(qdict_get_try_str(qdict, "id")); +l4_conn->protocol = qapi_enum_parse(&IP_PROTOCOL_lookup, prot, -1, &err); +l4_conn->src_ip = g_strdup(qdict_get_try_str(qdict, "src_ip")); +l4_conn->dst_ip = g_strdup(qdict_get_try_str(qdict, "dst_ip")); +l4_conn->src_port = qdict_get_try_int(qdict, "src_port", 0); +l4_conn->dst_port = qdict_get_try_int(qdict, "dst_port", 0); + +qmp_colo_passthrough_del(l4_conn, &err); +hmp_handle_error(mon, err); +} + void hmp_object_add(Monitor *mon, const QDict *qdict) { Error *err = NULL; -- 2.25.1
[PATCH V4 7/7] net/net.c: Add handler for COLO passthrough connection
Use connection protocol,src port,dst port,src ip,dst ip as the key to bypass certain network traffic in COLO compare. Signed-off-by: Zhang Chen --- net/net.c | 153 ++ 1 file changed, 153 insertions(+) diff --git a/net/net.c b/net/net.c index 7c7cefe0e0..4a9ab29cca 100644 --- a/net/net.c +++ b/net/net.c @@ -56,6 +56,8 @@ #include "net/filter.h" #include "qapi/string-output-visitor.h" #include "qapi/hmp-output-visitor.h" +#include "net/colo-compare.h" +#include "qom/object_interfaces.h" /* Net bridge is currently not supported for W32. */ #if !defined(_WIN32) @@ -1199,14 +1201,165 @@ void qmp_netdev_del(const char *id, Error **errp) } } +static CompareState *colo_passthrough_check(L4_Connection *conn, Error **errp) +{ +Object *container; +Object *obj; +CompareState *s; + +if (!conn->id) { +error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "id", + "Need input colo-compare object id"); +return NULL; +} + +container = object_get_objects_root(); +obj = object_resolve_path_component(container, conn->id); +if (!obj) { +error_setg(errp, "colo-compare '%s' not found", conn->id); +return NULL; +} + +s = COLO_COMPARE(obj); + +if (conn->protocol == -1) { +error_setg(errp, "COLO pass through get wrong protocol"); +return NULL; +} + +if ((conn->src_ip && !qemu_isdigit(conn->src_ip[0])) || +(conn->dst_ip && !qemu_isdigit(conn->dst_ip[0]))) { +error_setg(errp, "COLO pass through get wrong IP"); +return NULL; +} + +if (conn->src_port > 65536 || conn->src_port < 0 || +conn->dst_port > 65536 || conn->dst_port < 0) { +error_setg(errp, "COLO pass through get wrong port"); +return NULL; +} + +return s; +} + +static void compare_passthrough_add(CompareState *s, +L4_Connection *conn, +Error **errp) +{ +PassthroughEntry *bypass = NULL, *next = NULL, *origin = NULL; + +bypass = g_new0(PassthroughEntry, 1); + +bypass->l4_protocol = conn->protocol; +bypass->src_port = conn->src_port; +bypass->dst_port = conn->dst_port; + +if (!inet_aton(conn->src_ip, &bypass->src_ip)) { +bypass->src_ip.s_addr = 0; +} + +if (!inet_aton(conn->dst_ip, &bypass->dst_ip)) { +bypass->dst_ip.s_addr = 0; +} + +qemu_mutex_lock(&s->passthroughlist_mutex); +if (!QLIST_EMPTY(&s->passthroughlist)) { +QLIST_FOREACH_SAFE(origin, &s->passthroughlist, node, next) { +if ((bypass->l4_protocol == origin->l4_protocol) && +(bypass->src_port == origin->src_port) && +(bypass->src_ip.s_addr == origin->src_ip.s_addr) && +(bypass->dst_ip.s_addr == origin->dst_ip.s_addr)) { +error_setg(errp, "The pass through connection already exists"); +g_free(bypass); +qemu_mutex_unlock(&s->passthroughlist_mutex); +return; +} +} +} + +QLIST_INSERT_HEAD(&s->passthroughlist, bypass, node); +qemu_mutex_unlock(&s->passthroughlist_mutex); +} + +static void compare_passthrough_del(CompareState *s, +L4_Connection *conn, +Error **errp) +{ +PassthroughEntry *bypass = NULL, *next = NULL, *origin = NULL; + +bypass = g_new0(PassthroughEntry, 1); + +bypass->l4_protocol = conn->protocol; +bypass->src_port = conn->src_port; +bypass->dst_port = conn->dst_port; + +if (!inet_aton(conn->src_ip, &bypass->src_ip)) { +bypass->src_ip.s_addr = 0; +} + +if (!inet_aton(conn->dst_ip, &bypass->dst_ip)) { +bypass->dst_ip.s_addr = 0; +} + +qemu_mutex_lock(&s->passthroughlist_mutex); +if (!QLIST_EMPTY(&s->passthroughlist)) { +QLIST_FOREACH_SAFE(origin, &s->passthroughlist, node, next) { +if ((bypass->l4_protocol == origin->l4_protocol) && +(bypass->src_port == origin->src_port) && +(bypass->src_ip.s_addr == origin->src_ip.s_addr) && +(bypass->dst_ip.s_addr == origin->dst_ip.s_addr)) { +QLIST_REMOVE(origin, node); +g_free(origin); +g_free(bypass); +qemu_mutex_unlock(&s->passthroughlist_mutex); +return; +} +} +error_setg(errp, "The pass through list can't find the connection"); +} else { +error_setg(errp, "The pass through connection list is empty"); +} + +g_free(bypass); +qemu_mutex_unlock(&s->passthroughlist_mutex); +} + void qmp_colo_passthrough_add(L4_Connection *conn, Error **errp) { /* Setup passthrough connection */ +CompareState *s; +Error *err = NULL; + +s = colo_passthrough_check(conn, &err); +if (err) { +error_propagate(er
[PATCH V4 2/7] qapi/net.json: Add L4_Connection definition
Add L4_Connection struct for other QMP commands. Except protocol field is necessary, other fields are optional. Signed-off-by: Zhang Chen --- qapi/net.json | 26 ++ 1 file changed, 26 insertions(+) diff --git a/qapi/net.json b/qapi/net.json index 498ea7aa72..cd4a8ed95e 100644 --- a/qapi/net.json +++ b/qapi/net.json @@ -825,3 +825,29 @@ { 'enum': 'IP_PROTOCOL', 'data': [ 'tcp', 'udp', 'dccp', 'sctp', 'udplite', 'icmp', 'igmp', 'ipv6' ] } +## +# @L4_Connection: +# +# Layer 4 network connection. +# +# Just for IPv4. +# +# @protocol: Transport layer protocol like TCP/UDP... +# +# @id: For specific module with Qemu object ID, If there is no such part, +# it means global rules. +# +# @src_ip: Source IP. +# +# @dst_ip: Destination IP. +# +# @src_port: Source port. +# +# @dst_port: Destination port. +# +# Since: 6.1 +## +{ 'struct': 'L4_Connection', + 'data': { 'protocol': 'IP_PROTOCOL', '*id': 'str', '*src_ip': 'str', '*dst_ip': 'str', +'*src_port': 'int', '*dst_port': 'int' } } + -- 2.25.1
[PATCH V4 0/7] Bypass specific network traffic in COLO
Due to some real user scenarios don't need to monitor all traffic. And qemu net-filter also need function to more detailed flow control. This series give user ability to bypass kinds of COLO network stream. For example, windows guest user want to enable windows remote desktop to touch guest(UDP/TCP 3389), This case use UDP and TCP mixed, and the tcp part payload always different caused by real desktop display data(for guest time/ mouse display). Another case is some real user application will actively transmit information include guest time part, primary guest send data with time 10:01.000, At the same time secondary guest send data with time 10:01.001, it will always trigger COLO checkpoint(live migrate) to drop guest performance. V4: - Fix QAPI code conflict for V6.0 merged patches. - Note this feature for V6.1. V3: - Add COLO passthrough list lock. - Add usage demo and more comments. V2: - Add the n-tuple support. - Add some qapi definitions. - Support multi colo-compare objects. - Support setup each rules for each objects individually. - Clean up COLO compare definition to .h file. - Rebase HMP command for stable tree. - Add redundant rules check. Zhang Chen (7): qapi/net.json: Add IP_PROTOCOL definition qapi/net.json: Add L4_Connection definition qapi/net: Add new QMP command for COLO passthrough hmp-commands: Add new HMP command for COLO passthrough net/colo-compare: Move data structure and define to .h file. net/colo-compare: Add passthrough list to CompareState net/net.c: Add handler for COLO passthrough connection hmp-commands.hx | 26 +++ include/monitor/hmp.h | 2 + monitor/hmp-cmds.c| 34 + net/colo-compare.c| 135 -- net/colo-compare.h| 117 ++ net/net.c | 163 ++ qapi/net.json | 97 + 7 files changed, 468 insertions(+), 106 deletions(-) -- 2.25.1
[PATCH v3 10/10] Fixed calculation error of pkt->header_size in fill_pkt_tcp_info()
From: "Rao, Lei" The data pointer has skipped vnet_hdr_len in the function of parse_packet_early().So, we can not subtract vnet_hdr_len again when calculating pkt->header_size in fill_pkt_tcp_info(). Otherwise, it will cause network packet comparsion errors and greatly increase the frequency of checkpoints. Signed-off-by: Lei Rao Signed-off-by: Zhang Chen Reviewed-by: Li Zhijian --- net/colo-compare.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index 26464a2..32b7775 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -211,7 +211,7 @@ static void fill_pkt_tcp_info(void *data, uint32_t *max_ack) pkt->tcp_ack = ntohl(tcphd->th_ack); *max_ack = *max_ack > pkt->tcp_ack ? *max_ack : pkt->tcp_ack; pkt->header_size = pkt->transport_header - (uint8_t *)pkt->data - + (tcphd->th_off << 2) - pkt->vnet_hdr_len; + + (tcphd->th_off << 2); pkt->payload_size = pkt->size - pkt->header_size; pkt->seq_end = pkt->tcp_seq + pkt->payload_size; pkt->flags = tcphd->th_flags; -- 1.8.3.1
[PATCH v3 05/10] Add a function named packet_new_nocopy for COLO.
From: "Rao, Lei" Use the packet_new_nocopy instead of packet_new in the filter-rewriter module. There will be one less memory copy in the processing of each network packet. Signed-off-by: Lei Rao --- net/colo.c| 23 +++ net/colo.h| 1 + net/filter-rewriter.c | 3 +-- 3 files changed, 25 insertions(+), 2 deletions(-) diff --git a/net/colo.c b/net/colo.c index ef00609..58106a8 100644 --- a/net/colo.c +++ b/net/colo.c @@ -174,6 +174,29 @@ Packet *packet_new(const void *data, int size, int vnet_hdr_len) return pkt; } +/* + * packet_new_nocopy will not copy data, so the caller can't release + * the data. And it will be released in packet_destroy. + */ +Packet *packet_new_nocopy(void *data, int size, int vnet_hdr_len) +{ +Packet *pkt = g_slice_new(Packet); + +pkt->data = data; +pkt->size = size; +pkt->creation_ms = qemu_clock_get_ms(QEMU_CLOCK_HOST); +pkt->vnet_hdr_len = vnet_hdr_len; +pkt->tcp_seq = 0; +pkt->tcp_ack = 0; +pkt->seq_end = 0; +pkt->header_size = 0; +pkt->payload_size = 0; +pkt->offset = 0; +pkt->flags = 0; + +return pkt; +} + void packet_destroy(void *opaque, void *user_data) { Packet *pkt = opaque; diff --git a/net/colo.h b/net/colo.h index 573ab91..d91cd24 100644 --- a/net/colo.h +++ b/net/colo.h @@ -101,6 +101,7 @@ bool connection_has_tracked(GHashTable *connection_track_table, ConnectionKey *key); void connection_hashtable_reset(GHashTable *connection_track_table); Packet *packet_new(const void *data, int size, int vnet_hdr_len); +Packet *packet_new_nocopy(void *data, int size, int vnet_hdr_len); void packet_destroy(void *opaque, void *user_data); void packet_destroy_partial(void *opaque, void *user_data); diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index 10fe393..cb3a96c 100644 --- a/net/filter-rewriter.c +++ b/net/filter-rewriter.c @@ -270,8 +270,7 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf, vnet_hdr_len = nf->netdev->vnet_hdr_len; } -pkt = packet_new(buf, size, vnet_hdr_len); -g_free(buf); +pkt = packet_new_nocopy(buf, size, vnet_hdr_len); /* * if we get tcp packet -- 1.8.3.1
[PATCH v3 09/10] Add the function of colo_bitmap_clear_diry.
From: "Rao, Lei" When we use continuous dirty memory copy for flushing ram cache on secondary VM, we can also clean up the bitmap of contiguous dirty page memory. This also can reduce the VM stop time during checkpoint. Signed-off-by: Lei Rao --- migration/ram.c | 29 + 1 file changed, 25 insertions(+), 4 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 2a8ee96..9b23df6 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -856,6 +856,30 @@ unsigned long colo_bitmap_find_dirty(RAMState *rs, RAMBlock *rb, return first; } +/** + * colo_bitmap_clear_dirty:when we flush ram cache to ram, we will use + * continuous memory copy, so we can also clean up the bitmap of contiguous + * dirty memory. + */ +static inline bool colo_bitmap_clear_dirty(RAMState *rs, + RAMBlock *rb, + unsigned long start, + unsigned long num) +{ +bool ret; +unsigned long i = 0; + +qemu_mutex_lock(&rs->bitmap_mutex); +for (i = 0; i < num; i++) { +ret = test_and_clear_bit(start + i, rb->bmap); +if (ret) { +rs->migration_dirty_pages--; +} +} +qemu_mutex_unlock(&rs->bitmap_mutex); +return ret; +} + static inline bool migration_bitmap_clear_dirty(RAMState *rs, RAMBlock *rb, unsigned long page) @@ -3703,7 +3727,6 @@ void colo_flush_ram_cache(void) void *src_host; unsigned long offset = 0; unsigned long num = 0; -unsigned long i = 0; memory_global_dirty_log_sync(); WITH_RCU_READ_LOCK_GUARD() { @@ -3725,9 +3748,7 @@ void colo_flush_ram_cache(void) num = 0; block = QLIST_NEXT_RCU(block, next); } else { -for (i = 0; i < num; i++) { -migration_bitmap_clear_dirty(ram_state, block, offset + i); -} +colo_bitmap_clear_dirty(ram_state, block, offset, num); dst_host = block->host + (((ram_addr_t)offset) << TARGET_PAGE_BITS); src_host = block->colo_cache -- 1.8.3.1
[PATCH v3 04/10] Remove migrate_set_block_enabled in checkpoint
From: "Rao, Lei" We can detect disk migration in migrate_prepare, if disk migration is enabled in COLO mode, we can directly report an error.and there is no need to disable block migration at every checkpoint. Signed-off-by: Lei Rao Signed-off-by: Zhang Chen Reviewed-by: Li Zhijian --- migration/colo.c | 6 -- migration/migration.c | 4 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/migration/colo.c b/migration/colo.c index de27662..1aaf316 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -435,12 +435,6 @@ static int colo_do_checkpoint_transaction(MigrationState *s, if (failover_get_state() != FAILOVER_STATUS_NONE) { goto out; } - -/* Disable block migration */ -migrate_set_block_enabled(false, &local_err); -if (local_err) { -goto out; -} qemu_mutex_lock_iothread(); #ifdef CONFIG_REPLICATION diff --git a/migration/migration.c b/migration/migration.c index a5ddf43..785a331 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2221,6 +2221,10 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc, } if (blk || blk_inc) { +if (migrate_colo_enabled()) { +error_setg(errp, "No disk migration is required in COLO mode"); +return false; +} if (migrate_use_block() || migrate_use_block_incremental()) { error_setg(errp, "Command options are incompatible with " "current migration capabilities"); -- 1.8.3.1
[PATCH v3 08/10] Reduce the PVM stop time during Checkpoint
From: "Rao, Lei" When flushing memory from ram cache to ram during every checkpoint on secondary VM, we can copy continuous chunks of memory instead of 4096 bytes per time to reduce the time of VM stop during checkpoint. Signed-off-by: Lei Rao --- migration/ram.c | 45 ++--- 1 file changed, 42 insertions(+), 3 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index e795a8d..2a8ee96 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -823,6 +823,39 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb, return next; } +/* + * colo_bitmap_find_diry:find contiguous dirty pages from start + * + * Returns the page offset within memory region of the start of the contiguout + * dirty page + * + * @rs: current RAM state + * @rb: RAMBlock where to search for dirty pages + * @start: page where we start the search + * @num: the number of contiguous dirty pages + */ +static inline +unsigned long colo_bitmap_find_dirty(RAMState *rs, RAMBlock *rb, + unsigned long start, unsigned long *num) +{ +unsigned long size = rb->used_length >> TARGET_PAGE_BITS; +unsigned long *bitmap = rb->bmap; +unsigned long first, next; + +if (ramblock_is_ignored(rb)) { +return size; +} + +first = find_next_bit(bitmap, size, start); +if (first >= size) { +return first; +} +next = find_next_zero_bit(bitmap, size, first + 1); +assert(next >= first); +*num = next - first; +return first; +} + static inline bool migration_bitmap_clear_dirty(RAMState *rs, RAMBlock *rb, unsigned long page) @@ -3669,6 +3702,8 @@ void colo_flush_ram_cache(void) void *dst_host; void *src_host; unsigned long offset = 0; +unsigned long num = 0; +unsigned long i = 0; memory_global_dirty_log_sync(); WITH_RCU_READ_LOCK_GUARD() { @@ -3682,19 +3717,23 @@ void colo_flush_ram_cache(void) block = QLIST_FIRST_RCU(&ram_list.blocks); while (block) { -offset = migration_bitmap_find_dirty(ram_state, block, offset); +offset = colo_bitmap_find_dirty(ram_state, block, offset, &num); if (((ram_addr_t)offset) << TARGET_PAGE_BITS >= block->used_length) { offset = 0; +num = 0; block = QLIST_NEXT_RCU(block, next); } else { -migration_bitmap_clear_dirty(ram_state, block, offset); +for (i = 0; i < num; i++) { +migration_bitmap_clear_dirty(ram_state, block, offset + i); +} dst_host = block->host + (((ram_addr_t)offset) << TARGET_PAGE_BITS); src_host = block->colo_cache + (((ram_addr_t)offset) << TARGET_PAGE_BITS); -memcpy(dst_host, src_host, TARGET_PAGE_SIZE); +memcpy(dst_host, src_host, TARGET_PAGE_SIZE * num); +offset += num; } } } -- 1.8.3.1
[PATCH v3 06/10] Add the function of colo_compare_cleanup
From: "Rao, Lei" This patch fixes the following: #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x7f6ae4559859 in __GI_abort () at abort.c:79 #2 0x559aaa386720 in error_exit (err=16, msg=0x559aaa5973d0 <__func__.16227> "qemu_mutex_destroy") at util/qemu-thread-posix.c:36 #3 0x559aaa3868c5 in qemu_mutex_destroy (mutex=0x559aabffe828) at util/qemu-thread-posix.c:69 #4 0x559aaa2f93a8 in char_finalize (obj=0x559aabffe800) at chardev/char.c:285 #5 0x559aaa23318a in object_deinit (obj=0x559aabffe800, type=0x559aabfd7d20) at qom/object.c:606 #6 0x559aaa2331b8 in object_deinit (obj=0x559aabffe800, type=0x559aabfd9060) at qom/object.c:610 #7 0x559aaa233200 in object_finalize (data=0x559aabffe800) at qom/object.c:620 #8 0x559aaa234202 in object_unref (obj=0x559aabffe800) at qom/object.c:1074 #9 0x559aaa2356b6 in object_finalize_child_property (obj=0x559aac0dac10, name=0x559aac778760 "compare0-0", opaque=0x559aabffe800) at qom/object.c:1584 #10 0x559aaa232f70 in object_property_del_all (obj=0x559aac0dac10) at qom/object.c:557 #11 0x559aaa2331ed in object_finalize (data=0x559aac0dac10) at qom/object.c:619 #12 0x559aaa234202 in object_unref (obj=0x559aac0dac10) at qom/object.c:1074 #13 0x559aaa2356b6 in object_finalize_child_property (obj=0x559aac0c75c0, name=0x559aac0dadc0 "chardevs", opaque=0x559aac0dac10) at qom/object.c:1584 #14 0x559aaa233071 in object_property_del_child (obj=0x559aac0c75c0, child=0x559aac0dac10, errp=0x0) at qom/object.c:580 #15 0x559aaa233155 in object_unparent (obj=0x559aac0dac10) at qom/object.c:599 #16 0x559aaa2fb721 in qemu_chr_cleanup () at chardev/char.c:1159 #17 0x559aa9f9b110 in main (argc=54, argv=0x7ffeb62fa998, envp=0x7ffeb62fab50) at vl.c:4539 When chardev is cleaned up, chr_write_lock needs to be destroyed. But the colo-compare module is not cleaned up normally before it when the guest poweroff. It is holding chr_write_lock at this time. This will cause qemu crash.So we add the function of colo_compare_cleanup() before qemu_chr_cleanup() to fix the bug. Signed-off-by: Lei Rao --- net/colo-compare.c | 10 ++ net/colo-compare.h | 1 + net/net.c | 4 3 files changed, 15 insertions(+) diff --git a/net/colo-compare.c b/net/colo-compare.c index 9e18baa..26464a2 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -1401,6 +1401,16 @@ static void colo_compare_init(Object *obj) compare_set_vnet_hdr); } +void colo_compare_cleanup(void) +{ +CompareState *tmp = NULL; +CompareState *n = NULL; + +QTAILQ_FOREACH_SAFE(tmp, &net_compares, next, n) { +object_unparent(OBJECT(tmp)); +} +} + static void colo_compare_finalize(Object *obj) { CompareState *s = COLO_COMPARE(obj); diff --git a/net/colo-compare.h b/net/colo-compare.h index 22ddd51..b055270 100644 --- a/net/colo-compare.h +++ b/net/colo-compare.h @@ -20,5 +20,6 @@ void colo_notify_compares_event(void *opaque, int event, Error **errp); void colo_compare_register_notifier(Notifier *notify); void colo_compare_unregister_notifier(Notifier *notify); +void colo_compare_cleanup(void); #endif /* QEMU_COLO_COMPARE_H */ diff --git a/net/net.c b/net/net.c index fb7b7dc..3a94a27 100644 --- a/net/net.c +++ b/net/net.c @@ -53,6 +53,7 @@ #include "sysemu/qtest.h" #include "sysemu/runstate.h" #include "sysemu/sysemu.h" +#include "net/colo-compare.h" #include "net/filter.h" #include "qapi/string-output-visitor.h" @@ -1375,6 +1376,9 @@ void net_cleanup(void) { NetClientState *nc; +/*cleanup colo compare module for COLO*/ +colo_compare_cleanup(); + /* We may del multiple entries during qemu_del_net_client(), * so QTAILQ_FOREACH_SAFE() is also not safe here. */ -- 1.8.3.1
[PATCH v3 03/10] Optimize the function of filter_send
From: "Rao, Lei" The iov_size has been calculated in filter_send(). we can directly return the size.In this way, this is no need to repeat calculations in filter_redirector_receive_iov(); Signed-off-by: Lei Rao Reviewed-by: Li Zhijian --- net/filter-mirror.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/filter-mirror.c b/net/filter-mirror.c index f8e6500..f20240c 100644 --- a/net/filter-mirror.c +++ b/net/filter-mirror.c @@ -88,7 +88,7 @@ static int filter_send(MirrorState *s, goto err; } -return 0; +return size; err: return ret < 0 ? ret : -EIO; @@ -159,7 +159,7 @@ static ssize_t filter_mirror_receive_iov(NetFilterState *nf, int ret; ret = filter_send(s, iov, iovcnt); -if (ret) { +if (ret < 0) { error_report("filter mirror send failed(%s)", strerror(-ret)); } @@ -182,10 +182,10 @@ static ssize_t filter_redirector_receive_iov(NetFilterState *nf, if (qemu_chr_fe_backend_connected(&s->chr_out)) { ret = filter_send(s, iov, iovcnt); -if (ret) { +if (ret < 0) { error_report("filter redirector send failed(%s)", strerror(-ret)); } -return iov_size(iov, iovcnt); +return ret; } else { return 0; } -- 1.8.3.1
[PATCH v3 07/10] Reset the auto-converge counter at every checkpoint.
From: "Rao, Lei" if we don't reset the auto-converge counter, it will continue to run with COLO running, and eventually the system will hang due to the CPU throttle reaching DEFAULT_MIGRATE_MAX_CPU_THROTTLE. Signed-off-by: Lei Rao --- migration/colo.c | 4 migration/ram.c | 10 ++ migration/ram.h | 1 + 3 files changed, 15 insertions(+) diff --git a/migration/colo.c b/migration/colo.c index 1aaf316..723ffb8 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -459,6 +459,10 @@ static int colo_do_checkpoint_transaction(MigrationState *s, if (ret < 0) { goto out; } + +if (migrate_auto_converge()) { +mig_throttle_counter_reset(); +} /* * Only save VM's live state, which not including device state. * TODO: We may need a timeout mechanism to prevent COLO process diff --git a/migration/ram.c b/migration/ram.c index 72143da..e795a8d 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -652,6 +652,16 @@ static void mig_throttle_guest_down(uint64_t bytes_dirty_period, } } +void mig_throttle_counter_reset(void) +{ +RAMState *rs = ram_state; + +rs->time_last_bitmap_sync = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); +rs->num_dirty_pages_period = 0; +rs->bytes_xfer_prev = ram_counters.transferred; +cpu_throttle_stop(); +} + /** * xbzrle_cache_zero_page: insert a zero page in the XBZRLE cache * diff --git a/migration/ram.h b/migration/ram.h index 6378bb3..3f78175 100644 --- a/migration/ram.h +++ b/migration/ram.h @@ -50,6 +50,7 @@ bool ramblock_is_ignored(RAMBlock *block); int xbzrle_cache_resize(uint64_t new_size, Error **errp); uint64_t ram_bytes_remaining(void); uint64_t ram_bytes_total(void); +void mig_throttle_counter_reset(void); uint64_t ram_pagesize_summary(void); int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len); -- 1.8.3.1
[PATCH v3 01/10] Remove some duplicate trace code.
From: "Rao, Lei" There is the same trace code in the colo_compare_packet_payload. Signed-off-by: Lei Rao Reviewed-by: Li Zhijian --- net/colo-compare.c | 13 - 1 file changed, 13 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index 84db497..9e18baa 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -590,19 +590,6 @@ static int colo_packet_compare_other(Packet *spkt, Packet *ppkt) uint16_t offset = ppkt->vnet_hdr_len; trace_colo_compare_main("compare other"); -if (trace_event_get_state_backends(TRACE_COLO_COMPARE_IP_INFO)) { -char pri_ip_src[20], pri_ip_dst[20], sec_ip_src[20], sec_ip_dst[20]; - -strcpy(pri_ip_src, inet_ntoa(ppkt->ip->ip_src)); -strcpy(pri_ip_dst, inet_ntoa(ppkt->ip->ip_dst)); -strcpy(sec_ip_src, inet_ntoa(spkt->ip->ip_src)); -strcpy(sec_ip_dst, inet_ntoa(spkt->ip->ip_dst)); - -trace_colo_compare_ip_info(ppkt->size, pri_ip_src, - pri_ip_dst, spkt->size, - sec_ip_src, sec_ip_dst); -} - if (ppkt->size != spkt->size) { trace_colo_compare_main("Other: payload size of packets are different"); return -1; -- 1.8.3.1
[PATCH v3 02/10] Fix the qemu crash when guest shutdown during checkpoint
From: "Rao, Lei" This patch fixes the following: qemu-system-x86_64: invalid runstate transition: 'colo' ->'shutdown' Aborted (core dumped) Signed-off-by: Lei Rao Reviewed-by: Li Zhijian --- softmmu/runstate.c | 1 + 1 file changed, 1 insertion(+) diff --git a/softmmu/runstate.c b/softmmu/runstate.c index 2874417..884f8fa 100644 --- a/softmmu/runstate.c +++ b/softmmu/runstate.c @@ -126,6 +126,7 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH }, { RUN_STATE_COLO, RUN_STATE_RUNNING }, +{ RUN_STATE_COLO, RUN_STATE_SHUTDOWN}, { RUN_STATE_RUNNING, RUN_STATE_DEBUG }, { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR }, -- 1.8.3.1
[PATCH v3 00/10] Fixed some bugs and optimized some codes for COLO
From: Rao, Lei Changes since v2: --Add a function named packet_new_nocopy. --Continue to optimize the function of colo_flush_ram_cache. Changes since v1: --Reset the state of the auto-converge counters at every checkpoint instead of directly disabling. --Treat the filter_send function returning zero as a normal case. The series of patches include: Fixed some bugs of qemu crash. Optimized some code to reduce the time of checkpoint. Remove some unnecessary code to improve COLO. Rao, Lei (10): Remove some duplicate trace code. Fix the qemu crash when guest shutdown during checkpoint Optimize the function of filter_send Remove migrate_set_block_enabled in checkpoint Add a function named packet_new_nocopy for COLO. Add the function of colo_compare_cleanup Reset the auto-converge counter at every checkpoint. Reduce the PVM stop time during Checkpoint Add the function of colo_bitmap_clear_diry. Fixed calculation error of pkt->header_size in fill_pkt_tcp_info() migration/colo.c | 10 +++ migration/migration.c | 4 +++ migration/ram.c | 76 +-- migration/ram.h | 1 + net/colo-compare.c| 25 - net/colo-compare.h| 1 + net/colo.c| 23 net/colo.h| 1 + net/filter-mirror.c | 8 +++--- net/filter-rewriter.c | 3 +- net/net.c | 4 +++ softmmu/runstate.c| 1 + 12 files changed, 128 insertions(+), 29 deletions(-) -- 1.8.3.1
Re: [RFC PATCH] i386: Add ratelimit for bus locks acquired in guest
On 3/19/2021 10:59 AM, Chenyi Qiang wrote: Hi Marcelo, Thank you for your comment. On 3/19/2021 1:32 AM, Marcelo Tosatti wrote: On Wed, Mar 17, 2021 at 04:47:09PM +0800, Chenyi Qiang wrote: Virtual Machines can exploit bus locks to degrade the performance of system. To address this kind of performance DOS attack, bus lock VM exit is introduced in KVM and it will report the bus locks detected in guest, which can help userspace to enforce throttling policies. The availability of bus lock VM exit can be detected through the KVM_CAP_X86_BUS_LOCK_EXIT. The returned bitmap contains the potential policies supported by KVM. The field KVM_BUS_LOCK_DETECTION_EXIT in bitmap is the only supported strategy at present. It indicates that KVM will exit to userspace to handle the bus locks. This patch adds a ratelimit on the bus locks acquired in guest as a mitigation policy. Introduce a new field "bld" to record the limited speed of bus locks in target VM. The user can specify it through the "bus-lock-detection" as a machine property. In current implementation, the default value of the speed is 0 per second, which means no restriction on the bus locks. Ratelimit enforced in data transmission uses a time slice of 100ms to get smooth output during regular operations in block jobs. As for ratelimit on bus lock detection, simply set the ratelimit interval to 1s and restrict the quota of bus lock occurrence to the value of "bld". A potential alternative is to introduce the time slice as a property which can help the user achieve more precise control. The detail of Bus lock VM exit can be found in spec: https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Signed-off-by: Chenyi Qiang --- hw/i386/x86.c | 6 ++ include/hw/i386/x86.h | 7 +++ target/i386/kvm/kvm.c | 44 +++ 3 files changed, 57 insertions(+) diff --git a/hw/i386/x86.c b/hw/i386/x86.c index 7865660e2c..a70a259e97 100644 --- a/hw/i386/x86.c +++ b/hw/i386/x86.c @@ -1209,6 +1209,12 @@ static void x86_machine_initfn(Object *obj) x86ms->acpi = ON_OFF_AUTO_AUTO; x86ms->smp_dies = 1; x86ms->pci_irq_mask = ACPI_BUILD_PCI_IRQS; + x86ms->bld = 0; + + object_property_add_uint64_ptr(obj, "bus-lock-detection", + &x86ms->bld, OBJ_PROP_FLAG_READWRITE); + object_property_set_description(obj, "bus-lock-detection", + "Bus lock detection ratelimit"); } static void x86_machine_class_init(ObjectClass *oc, void *data) diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h index 56080bd1fb..1f0ffbcfb9 100644 --- a/include/hw/i386/x86.h +++ b/include/hw/i386/x86.h @@ -72,6 +72,13 @@ struct X86MachineState { * will be translated to MSI messages in the address space. */ AddressSpace *ioapic_as; + + /* + * ratelimit enforced on detected bus locks, the default value + * is 0 per second + */ + uint64_t bld; + RateLimit bld_limit; }; #define X86_MACHINE_SMM "smm" diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index c8d61daf68..724862137d 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -130,6 +130,8 @@ static bool has_msr_mcg_ext_ctl; static struct kvm_cpuid2 *cpuid_cache; static struct kvm_msr_list *kvm_feature_msrs; +#define SLICE_TIME 10ULL /* ns */ + int kvm_has_pit_state2(void) { return has_pit_state2; @@ -2267,6 +2269,27 @@ int kvm_arch_init(MachineState *ms, KVMState *s) } } + if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) { + X86MachineState *x86ms = X86_MACHINE(ms); + + if (x86ms->bld > 0) { + ret = kvm_check_extension(s, KVM_CAP_X86_BUS_LOCK_EXIT); + if (!(ret & KVM_BUS_LOCK_DETECTION_EXIT)) { + error_report("kvm: bus lock detection unsupported"); + return -ENOTSUP; + } + ret = kvm_vm_enable_cap(s, KVM_CAP_X86_BUS_LOCK_EXIT, 0, + KVM_BUS_LOCK_DETECTION_EXIT); + if (ret < 0) { + error_report("kvm: Failed to enable bus lock detection cap: %s", + strerror(-ret)); + return ret; + } + + ratelimit_set_speed(&x86ms->bld_limit, x86ms->bld, SLICE_TIME); + } + } + return 0; } @@ -4221,6 +4244,18 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run) } } +static void kvm_rate_limit_on_bus_lock(void) +{ + MachineState *ms = MACHINE(qdev_get_machine()); + X86MachineState *x86ms = X86_MACHINE(ms); + + uint64_t delay_ns = ratelimit_calculate_delay(&x86ms->bld_limit, 1); + + if (delay_ns) { + g_usleep(delay_ns / SCALE_US); + } +} Hi, Can't see a use-case where the throttling is very useful: this will slowdown the application to a halt (if its bus-lock i
Re: [RFC PATCH] i386: Add ratelimit for bus locks acquired in guest
On 3/19/2021 9:23 AM, Xiaoyao Li wrote: On 3/17/2021 4:47 PM, Chenyi Qiang wrote: [...] MemTxAttrs kvm_arch_post_run(CPUState *cpu, struct kvm_run *run) { X86CPU *x86_cpu = X86_CPU(cpu); @@ -4236,6 +4271,11 @@ MemTxAttrs kvm_arch_post_run(CPUState *cpu, struct kvm_run *run) } else { env->eflags &= ~IF_MASK; } + if (run->flags & KVM_RUN_X86_BUS_LOCK) { + kvm_cpu_synchronize_state(cpu); + warn_report("bus lock detected at rip: 0x%lx", env->eip); Chenyi, Let's drop the eip here since QEMU has no idea whether it points to the next instruction or the exact instruction acquires bus lock. Fair enough. + kvm_rate_limit_on_bus_lock(); + } /* We need to protect the apic state against concurrent accesses from * different threads in case the userspace irqchip is used. */ @@ -4594,6 +4634,10 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run) ioapic_eoi_broadcast(run->eoi.vector); ret = 0; break; + case KVM_EXIT_X86_BUS_LOCK: + /* already handled in kvm_arch_post_run */ + ret = 0; + break; default: fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason); ret = -1;
Re: [RFC PATCH] i386: Add ratelimit for bus locks acquired in guest
Hi Marcelo, Thank you for your comment. On 3/19/2021 1:32 AM, Marcelo Tosatti wrote: On Wed, Mar 17, 2021 at 04:47:09PM +0800, Chenyi Qiang wrote: Virtual Machines can exploit bus locks to degrade the performance of system. To address this kind of performance DOS attack, bus lock VM exit is introduced in KVM and it will report the bus locks detected in guest, which can help userspace to enforce throttling policies. The availability of bus lock VM exit can be detected through the KVM_CAP_X86_BUS_LOCK_EXIT. The returned bitmap contains the potential policies supported by KVM. The field KVM_BUS_LOCK_DETECTION_EXIT in bitmap is the only supported strategy at present. It indicates that KVM will exit to userspace to handle the bus locks. This patch adds a ratelimit on the bus locks acquired in guest as a mitigation policy. Introduce a new field "bld" to record the limited speed of bus locks in target VM. The user can specify it through the "bus-lock-detection" as a machine property. In current implementation, the default value of the speed is 0 per second, which means no restriction on the bus locks. Ratelimit enforced in data transmission uses a time slice of 100ms to get smooth output during regular operations in block jobs. As for ratelimit on bus lock detection, simply set the ratelimit interval to 1s and restrict the quota of bus lock occurrence to the value of "bld". A potential alternative is to introduce the time slice as a property which can help the user achieve more precise control. The detail of Bus lock VM exit can be found in spec: https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Signed-off-by: Chenyi Qiang --- hw/i386/x86.c | 6 ++ include/hw/i386/x86.h | 7 +++ target/i386/kvm/kvm.c | 44 +++ 3 files changed, 57 insertions(+) diff --git a/hw/i386/x86.c b/hw/i386/x86.c index 7865660e2c..a70a259e97 100644 --- a/hw/i386/x86.c +++ b/hw/i386/x86.c @@ -1209,6 +1209,12 @@ static void x86_machine_initfn(Object *obj) x86ms->acpi = ON_OFF_AUTO_AUTO; x86ms->smp_dies = 1; x86ms->pci_irq_mask = ACPI_BUILD_PCI_IRQS; +x86ms->bld = 0; + +object_property_add_uint64_ptr(obj, "bus-lock-detection", + &x86ms->bld, OBJ_PROP_FLAG_READWRITE); +object_property_set_description(obj, "bus-lock-detection", +"Bus lock detection ratelimit"); } static void x86_machine_class_init(ObjectClass *oc, void *data) diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h index 56080bd1fb..1f0ffbcfb9 100644 --- a/include/hw/i386/x86.h +++ b/include/hw/i386/x86.h @@ -72,6 +72,13 @@ struct X86MachineState { * will be translated to MSI messages in the address space. */ AddressSpace *ioapic_as; + +/* + * ratelimit enforced on detected bus locks, the default value + * is 0 per second + */ +uint64_t bld; +RateLimit bld_limit; }; #define X86_MACHINE_SMM "smm" diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index c8d61daf68..724862137d 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -130,6 +130,8 @@ static bool has_msr_mcg_ext_ctl; static struct kvm_cpuid2 *cpuid_cache; static struct kvm_msr_list *kvm_feature_msrs; +#define SLICE_TIME 10ULL /* ns */ + int kvm_has_pit_state2(void) { return has_pit_state2; @@ -2267,6 +2269,27 @@ int kvm_arch_init(MachineState *ms, KVMState *s) } } +if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) { +X86MachineState *x86ms = X86_MACHINE(ms); + +if (x86ms->bld > 0) { +ret = kvm_check_extension(s, KVM_CAP_X86_BUS_LOCK_EXIT); +if (!(ret & KVM_BUS_LOCK_DETECTION_EXIT)) { +error_report("kvm: bus lock detection unsupported"); +return -ENOTSUP; +} +ret = kvm_vm_enable_cap(s, KVM_CAP_X86_BUS_LOCK_EXIT, 0, +KVM_BUS_LOCK_DETECTION_EXIT); +if (ret < 0) { +error_report("kvm: Failed to enable bus lock detection cap: %s", + strerror(-ret)); +return ret; +} + +ratelimit_set_speed(&x86ms->bld_limit, x86ms->bld, SLICE_TIME); +} +} + return 0; } @@ -4221,6 +4244,18 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run) } } +static void kvm_rate_limit_on_bus_lock(void) +{ +MachineState *ms = MACHINE(qdev_get_machine()); +X86MachineState *x86ms = X86_MACHINE(ms); + +uint64_t delay_ns = ratelimit_calculate_delay(&x86ms->bld_limit, 1); + +if (delay_ns) { +g_usleep(delay_ns / SCALE_US); +} +} Hi, Can't see a use-case where the throttling is very useful: this will slowdown the application to a halt (if its bus-lock instruction is being called frequ
[PATCH v1] MAINTAINERS: Fix tests/migration maintainers
From: Hyman Huang(黄勇) Signed-off-by: Hyman Huang(黄勇) --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index 25fc49d1dc..20e2387c66 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2525,6 +2525,7 @@ M: Cleber Rosa S: Odd Fixes F: scripts/*.py F: tests/*.py +F: tests/migration/ Benchmark util M: Vladimir Sementsov-Ogievskiy -- 2.24.3
[PATCH] i386/cpu: Expose AVX_VNNI instruction to guset
Expose AVX (VEX-encoded) versions of the Vector Neural Network Instructions to guest. The bit definition: CPUID.(EAX=7,ECX=1):EAX[bit 4] AVX_VNNI The following instructions are available when this feature is present in the guest. 1. VPDPBUS: Multiply and Add Unsigned and Signed Bytes 2. VPDPBUSDS: Multiply and Add Unsigned and Signed Bytes with Saturation 3. VPDPWSSD: Multiply and Add Signed Word Integers 4. VPDPWSSDS: Multiply and Add Signed Integers with Saturation The release document ref below link: https://software.intel.com/content/www/us/en/develop/download/\ intel-architecture-instruction-set-extensions-programming-reference.html Signed-off-by: Yang Zhong --- target/i386/cpu.c | 4 ++-- target/i386/cpu.h | 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index ae9fd9f31d..f748989860 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -996,7 +996,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = { .type = CPUID_FEATURE_WORD, .feat_names = { NULL, NULL, NULL, NULL, -NULL, "avx512-bf16", NULL, NULL, +"avx-vnni", "avx512-bf16", NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, @@ -3273,7 +3273,7 @@ static X86CPUDefinition builtin_x86_defs[] = { MSR_ARCH_CAP_SKIP_L1DFL_VMENTRY | MSR_ARCH_CAP_MDS_NO | MSR_ARCH_CAP_PSCHANGE_MC_NO | MSR_ARCH_CAP_TAA_NO, .features[FEAT_7_1_EAX] = -CPUID_7_1_EAX_AVX512_BF16, +CPUID_7_1_EAX_AVX_VNNI | CPUID_7_1_EAX_AVX512_BF16, /* * Missing: XSAVES (not supported by some Linux versions, * including v4.1 to v4.12). diff --git a/target/i386/cpu.h b/target/i386/cpu.h index b4b136cd0d..efda0d6178 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -805,6 +805,8 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS]; /* Speculative Store Bypass Disable */ #define CPUID_7_0_EDX_SPEC_CTRL_SSBD(1U << 31) +/* AVX VNNI Instruction */ +#define CPUID_7_1_EAX_AVX_VNNI (1U << 4) /* AVX512 BFloat16 Instruction */ #define CPUID_7_1_EAX_AVX512_BF16 (1U << 5) -- 2.29.2.334.gfaefdd61ec
Re: [PATCH] net/slirp: Fix incorrect permissions on samba >= 2.0.5
On 2/23/21 3:41 AM, Niklas Hambüchen wrote: > This broke `-net user,smb=/path/to/folder`: Hey, just a short ping on whether anyone would have a moment to review this `qemu-trivial` patch; it would be very nice to have SMB support to work out of the box again. Thanks!
Re: [RFC PATCH] i386: Add ratelimit for bus locks acquired in guest
On 3/17/2021 4:47 PM, Chenyi Qiang wrote: [...] MemTxAttrs kvm_arch_post_run(CPUState *cpu, struct kvm_run *run) { X86CPU *x86_cpu = X86_CPU(cpu); @@ -4236,6 +4271,11 @@ MemTxAttrs kvm_arch_post_run(CPUState *cpu, struct kvm_run *run) } else { env->eflags &= ~IF_MASK; } +if (run->flags & KVM_RUN_X86_BUS_LOCK) { +kvm_cpu_synchronize_state(cpu); +warn_report("bus lock detected at rip: 0x%lx", env->eip); Chenyi, Let's drop the eip here since QEMU has no idea whether it points to the next instruction or the exact instruction acquires bus lock. +kvm_rate_limit_on_bus_lock(); +} /* We need to protect the apic state against concurrent accesses from * different threads in case the userspace irqchip is used. */ @@ -4594,6 +4634,10 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run) ioapic_eoi_broadcast(run->eoi.vector); ret = 0; break; +case KVM_EXIT_X86_BUS_LOCK: +/* already handled in kvm_arch_post_run */ +ret = 0; +break; default: fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason); ret = -1;
[PATCH] gitlab-ci: Restrict jobs using Docker to runners having 'docker' tag
When a job is based on a Docker image [1], or is using a Docker service, it requires a runner with Docker installed. Gitlab shared runners provide the 'docker' tag when they have it installed. Are Gitlab shared runners are limited resources, we'd like to add more runners to QEMU repositories hosted on Gitlab. If a runner doesn't provide Docker, our jobs requiring it will fail. Use the standard 'docker' tag to mark the jobs requiring Docker on the runner. [1] https://docs.gitlab.com/ee/ci/yaml/#image [2] https://docs.gitlab.com/ee/ci/yaml/#services Signed-off-by: Philippe Mathieu-Daudé --- If someone is interested in testing or filling the documentation gap, what I ran is: $ sudo usermod -aG docker,kvm gitlab-runner $ sudo gitlab-runner --log-format text --log-level debug \ register \ --non-interactive \ --url https://gitlab.com --registration-token MYTOKEN --description myrunner \ --tag-list 'docker,linux,x86_64,kvm' --run-untagged --limit 2 \ --executor docker --docker-image docker:dind --docker-cpus 4 \ --docker-volumes /var/run/docker.sock:/var/run/docker.sock \ --docker-dns 8.8.8.8 --docker-volumes is for docker:dind else it was not working This comes from this 3 year old thread: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/1986 We can not use the 'docker:dind' tag for a runner having docker:dind and /var/run/docker.sock volume because this is not a tag used by the shared runners, so we can't use them anymore. --- .gitlab-ci.d/containers.yml | 2 ++ .gitlab-ci.d/crossbuilds.yml | 4 .gitlab-ci.d/edk2.yml| 4 .gitlab-ci.d/opensbi.yml | 4 .gitlab-ci.yml | 4 5 files changed, 18 insertions(+) diff --git a/.gitlab-ci.d/containers.yml b/.gitlab-ci.d/containers.yml index 33e4046e233..8e2a6a99889 100644 --- a/.gitlab-ci.d/containers.yml +++ b/.gitlab-ci.d/containers.yml @@ -1,4 +1,6 @@ .container_job_template: &container_job_definition + tags: + - docker image: docker:stable stage: containers services: diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml index d5098c986b8..e59fbfdc73f 100644 --- a/.gitlab-ci.d/crossbuilds.yml +++ b/.gitlab-ci.d/crossbuilds.yml @@ -1,4 +1,6 @@ .cross_system_build_job: + tags: + - docker stage: build image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:latest timeout: 80m @@ -18,6 +20,8 @@ # KVM), and set extra options (such disabling other accelerators) via the # $ACCEL_CONFIGURE_OPTS variable. .cross_accel_build_job: + tags: + - docker stage: build image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:latest timeout: 30m diff --git a/.gitlab-ci.d/edk2.yml b/.gitlab-ci.d/edk2.yml index ba7280605c4..afbd8e4d915 100644 --- a/.gitlab-ci.d/edk2.yml +++ b/.gitlab-ci.d/edk2.yml @@ -5,6 +5,8 @@ docker-edk2: - .gitlab-ci.d/edk2.yml - .gitlab-ci.d/edk2/Dockerfile when: always + tags: + - docker image: docker:19.03.1 services: - docker:19.03.1-dind @@ -24,6 +26,8 @@ docker-edk2: - docker push $IMAGE_TAG build-edk2: + tags: + - docker stage: build needs: ['docker-edk2'] rules: # Only run this job when ... diff --git a/.gitlab-ci.d/opensbi.yml b/.gitlab-ci.d/opensbi.yml index f66cd1d9089..a4a93222c2d 100644 --- a/.gitlab-ci.d/opensbi.yml +++ b/.gitlab-ci.d/opensbi.yml @@ -5,6 +5,8 @@ docker-opensbi: - .gitlab-ci.d/opensbi.yml - .gitlab-ci.d/opensbi/Dockerfile when: always + tags: + - docker image: docker:19.03.1 services: - docker:19.03.1-dind @@ -24,6 +26,8 @@ docker-opensbi: - docker push $IMAGE_TAG build-opensbi: + tags: + - docker stage: build needs: ['docker-opensbi'] rules: # Only run this job when ... diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml index f65cb11c4d3..d4511cf7dea 100644 --- a/.gitlab-ci.yml +++ b/.gitlab-ci.yml @@ -14,6 +14,8 @@ include: - local: '/.gitlab-ci.d/crossbuilds.yml' .native_build_job_template: &native_build_job_definition + tags: + - docker stage: build image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:latest before_script: @@ -38,6 +40,8 @@ include: fi .native_test_job_template: &native_test_job_definition + tags: + - docker stage: test image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:latest script: -- 2.26.2
[PATCH v4 2/2] hw/riscv: allow ramfb on virt
Allow ramfb on virt. This lets `-device ramfb' work. Signed-off-by: Asherah Connor Reviewed-by: Bin Meng Reviewed-by: Alistair Francis --- (no changes since v2) Changes in v2: * Add ramfb as allowed on riscv virt machine class. hw/riscv/virt.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c index e96ec4cbbc..c0dc69ff33 100644 --- a/hw/riscv/virt.c +++ b/hw/riscv/virt.c @@ -42,6 +42,7 @@ #include "sysemu/sysemu.h" #include "hw/pci/pci.h" #include "hw/pci-host/gpex.h" +#include "hw/display/ramfb.h" static const MemMapEntry virt_memmap[] = { [VIRT_DEBUG] = {0x0, 0x100 }, @@ -781,6 +782,8 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) mc->cpu_index_to_instance_props = riscv_numa_cpu_index_to_props; mc->get_default_cpu_node_id = riscv_numa_get_default_cpu_node_id; mc->numa_mem_supported = true; + +machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE); } static const TypeInfo virt_machine_typeinfo = { -- 2.20.1
[PATCH v4 1/2] hw/riscv: Add fw_cfg support to virt
Provides fw_cfg for the virt machine on riscv. This enables using e.g. ramfb later. Signed-off-by: Asherah Connor Reviewed-by: Bin Meng Reviewed-by: Alistair Francis --- Changes in v4: * Adapt for changes made in c65d7080d8 "hw/riscv: migrate fdt field to generic MachineState". Changes in v3: * Document why fw_cfg is done when it is. * Move VIRT_FW_CFG before VIRT_FLASH. Changes in v2: * Add DMA support (needed for writes). hw/riscv/Kconfig| 1 + hw/riscv/virt.c | 30 ++ include/hw/riscv/virt.h | 2 ++ 3 files changed, 33 insertions(+) diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig index d139074b02..1de18cdcf1 100644 --- a/hw/riscv/Kconfig +++ b/hw/riscv/Kconfig @@ -33,6 +33,7 @@ config RISCV_VIRT select SIFIVE_PLIC select SIFIVE_TEST select VIRTIO_MMIO +select FW_CFG_DMA config SIFIVE_E bool diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c index 0b39101a5e..e96ec4cbbc 100644 --- a/hw/riscv/virt.c +++ b/hw/riscv/virt.c @@ -53,6 +53,7 @@ static const MemMapEntry virt_memmap[] = { [VIRT_PLIC] ={ 0xc00, VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2) }, [VIRT_UART0] = { 0x1000, 0x100 }, [VIRT_VIRTIO] = { 0x10001000,0x1000 }, +[VIRT_FW_CFG] = { 0x1010, 0x18 }, [VIRT_FLASH] = { 0x2000, 0x400 }, [VIRT_PCIE_ECAM] = { 0x3000,0x1000 }, [VIRT_PCIE_MMIO] = { 0x4000,0x4000 }, @@ -507,6 +508,28 @@ static inline DeviceState *gpex_pcie_init(MemoryRegion *sys_mem, return dev; } +static FWCfgState *create_fw_cfg(const MachineState *mc) +{ +hwaddr base = virt_memmap[VIRT_FW_CFG].base; +hwaddr size = virt_memmap[VIRT_FW_CFG].size; +FWCfgState *fw_cfg; +char *nodename; + +fw_cfg = fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, + &address_space_memory); +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)mc->smp.cpus); + +nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base); +qemu_fdt_add_subnode(mc->fdt, nodename); +qemu_fdt_setprop_string(mc->fdt, nodename, +"compatible", "qemu,fw-cfg-mmio"); +qemu_fdt_setprop_sized_cells(mc->fdt, nodename, "reg", + 2, base, 2, size); +qemu_fdt_setprop(mc->fdt, nodename, "dma-coherent", NULL, 0); +g_free(nodename); +return fw_cfg; +} + static void virt_machine_init(MachineState *machine) { const MemMapEntry *memmap = virt_memmap; @@ -688,6 +711,13 @@ static void virt_machine_init(MachineState *machine) start_addr = virt_memmap[VIRT_FLASH].base; } +/* + * Init fw_cfg. Must be done before riscv_load_fdt, otherwise the device + * tree cannot be altered and we get FDT_ERR_NOSPACE. + */ +s->fw_cfg = create_fw_cfg(machine); +rom_set_fw(s->fw_cfg); + /* Compute the fdt load address in dram */ fdt_load_addr = riscv_load_fdt(memmap[VIRT_DRAM].base, machine->ram_size, machine->fdt); diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h index 632da52018..349fee1f89 100644 --- a/include/hw/riscv/virt.h +++ b/include/hw/riscv/virt.h @@ -40,6 +40,7 @@ struct RISCVVirtState { RISCVHartArrayState soc[VIRT_SOCKETS_MAX]; DeviceState *plic[VIRT_SOCKETS_MAX]; PFlashCFI01 *flash[2]; +FWCfgState *fw_cfg; int fdt_size; }; @@ -53,6 +54,7 @@ enum { VIRT_PLIC, VIRT_UART0, VIRT_VIRTIO, +VIRT_FW_CFG, VIRT_FLASH, VIRT_DRAM, VIRT_PCIE_MMIO, -- 2.20.1
[PATCH v4 0/2] hw/riscv: Add fw_cfg support, allow ramfb
This is version 4 of the series to bring fw_cfg and ramfb support to riscv's virt machine, adapted for the latest master. It is still tested as working against a modified U-Boot with ramfb support. Changes in v4: * Adapt for changes made in c65d7080d8 "hw/riscv: migrate fdt field to generic MachineState". Changes in v3: * Document why fw_cfg is done when it is. * Move VIRT_FW_CFG before VIRT_FLASH. Changes in v2: * Add DMA support (needed for writes). * Add ramfb as allowed on riscv virt machine class. Asherah Connor (2): hw/riscv: Add fw_cfg support to virt hw/riscv: allow ramfb on virt hw/riscv/Kconfig| 1 + hw/riscv/virt.c | 33 + include/hw/riscv/virt.h | 2 ++ 3 files changed, 36 insertions(+) -- 2.20.1
Re: CXL 2.0 memory device design
On 21-03-17 14:40:58, Ben Widawsky wrote: > Phil, Igor, Markus > > TL;DR: What to do about multiple capacities in a single device, and what to do > about interleave? > > I've hacked together a basic CXL 2.0 implementation which exposes a CXL "Type > 3" > memory device (CXL 2.0 Chapter 2.3). For what we were trying to do this was > sufficient. There are two main capabilities that CXL spec exposes which I've > not > implemented that I'd like to start working toward and am realizing that I > what I > have so far might not be able to carry forward to that next milestone. > > Capability 1. A CXL memory device may have both a volatile, and a persistent > capacity. https://bwidawsk.net/HDM_decoders.svg (lower right > side). The current work only supports a single persistent > capacity. > Capability 2. CXL topologies can be interleaved. Basic example: > https://bwidawsk.net/HDM_decoders.svg (lower left side) > > Memory regions are configured via a CXL spec defined HDM decoder. The HDM > decoder which is minimally implemented supports all the functionality > mentioned > above (base, size, interleave, type, etc.). A CXL component may have up to 10 > HDMs. > > What I have today: https://bwidawsk.net/QEMU_objects.svg > There's a single memory backend device for each host bridge. That backend is > passed to any CXL component that is part of the hierarchy underneath that > hostbridge. In the case of a Type 3 device memory capacity a subregion is > created for that capacity from within the main backend. The device itself > implements the TYPE_MEMORY_DEVICE interface. This allows me to utilize the > existing memory region code to determine the next free address, and warn on > overlaps. It hopefully will help when I'm ready to support hotplug. > > Where I've gotten stuck: A Memory Device expects only to have one region of > memory. Trying to add a second breaks pretty much everything. > > I'm hoping to start the discussion about what the right way to emulate this in > QEMU. Ideally something upstreamable would be great. I think adding a > secondary > (or more) capacity to a memory class device is doable, but probably not the > right approach. > > For context, I've posted v3 previously. Here's a link to v4 which has some > minor > changes as well as moving back to using subregions instead of aliases: > https://gitlab.com/bwidawsk/qemu/-/tree/cxl-2.0v4 > > Thanks. > Ben > Hello. I spent some time thinking a bit about this. Right now a have a CXL type 3 memory device which implements TYPE_MEMORY_DEVICE interface. Perhaps the easiest solution would be to have that same device which doesn't implement TYPE_MEMORY_DEVICE, but does object_initialize_child_with_props() a TYPE_PC_DIMM and a TYPE_NVDIMM kind of thing. In the current design, those would be subclassed (or simply rewritten) to not have their own memory backend, and carve out from the main memory backend as I describe above. Thanks. I'm looking forward to hearing some feedback or other suggestions. Ben
RE: [RFC 0/1] Use dmabufs for display updates instead of pixman
Hi Gerd, Thank you for taking the time to explain how support for blob resources needs to be added. We are going to get started soon and here are the tasks we are planning to do in order of priority: 1) Add support for VIRTIO_GPU_BLOB_MEM_GUEST + VIRTIO_GPU_BLOB_FLAG_USE_SHAREABLE 2) Upgrade Qemu GTK UI from 3.22 to 4.x 3) Add explicit sync support to GTK4 and Qemu UI 4) Add support for VIRTGPU_BLOB_MEM_HOST3D We'll start sending patches as we go along. Thanks, Vivek > > [Kasireddy, Vivek] Sure, we'll take a look at your work and use that > > as a starting point. Roughly, how much of your work can be reused? > > There are some small udmabuf support patches which can probably be reused > pretty much > as-is. Everything else needs larger changes I suspect, but it's been a while > I looked at this > ... > > > Also, given my limited understanding of how discrete GPUs work, I was > > wondering how many copies would there need to be with blob > > resources/dmabufs and whether a zero-copy goal would be feasible or not? > > Good question. > > Right now there are two copies (gtk ui): > > (1) guest ram -> DisplaySurface -> gtk widget (gl=off), or > (2) guest ram -> DisplaySurface -> texture (gl=on). > > You should be able to reduce this to one copy for gl=on ... > > (3) guest ram -> texture > > ... by taking DisplaySurface out of the picture, without any changes to the > guest/host > interface. Drawback is that it requires adding an opengl dependency to > virtio-gpu even > with virgl=off, because the virtio-gpu device will have to handle the copy to > the texture > then, in response to guest TRANSFER commands. > > When adding blob resource support: > > Easiest is probably supporting VIRTIO_GPU_BLOB_MEM_GUEST (largely identical to > non-blob resources) with VIRTIO_GPU_BLOB_FLAG_USE_SHAREABLE > (allows the host to create a shared mapping). Then you can go create a > udmabuf for the > resource on the host side. For the non-gl code path you can mmap() the > udmabuf (which > gives you a linear mapping for the scattered guest pages) and create a > DisplaySurface > backed by guest ram pages (removing the guest ram -> DisplaySurface copy). > For the gl > code path you can create a texture backed by the udmabuf and go render on the > host > without copying at all. > > Using VIRTIO_GPU_BLOB_MEM_GUEST + > VIRTIO_GPU_BLOB_FLAG_USE_SHAREABLE for resources needs guest changes too, > either in mesa (when using virgl) or the kernel driver's dumb buffer handling > (when not > using virgl). > > Alternatively (listed more for completeness): > > You can create a blob resource with VIRTGPU_BLOB_MEM_HOST3D (requires virgl, > see also virgl_drm_winsys_resource_create_blob in mesa). It will be > allocated by the > host, then mapped into the guest using a virtual pci memory bar. Guest > userspace (aka > mesa driver) can mmap() these resources and has direct, zero-copy access to > the host > resource. > > Going to dma-buf export that, import into i915, then let the gpu render > implies we are > doing p2p dma from a physical (pci-assigned) device to the memory bar of a > virtual pci > device. > > Doing that should be possible, but frankly I would be surprised if that > actually works out- > of-the-box. Dunno how many dragons are lurking here. > Could become an interesting challenge to make that fly. > > > > Beside the code duplication this is also a maintainance issue. This > > > adds one more configuration to virtio-gpu. Right now you can build > > > virtio-gpu with virgl (depends on opengl), or you can build without > > > virgl (doesn't use opengl then). I don't think it is a good idea to add > > > a third mode, > without virgl support but using opengl for blob dma-bufs. > > [Kasireddy, Vivek] We'll have to re-visit this part but for our > > use-case with virtio-gpu, we are disabling virglrenderer in Qemu and > > virgl DRI driver in the Guest. However, we still need to use > > Opengl/EGL to convert the dmabuf (guest fb) to texture and render as part > > of the > UI/GTK updates. > > Well, VIRTGPU_BLOB_MEM_HOST3D blob resources are created using virgl renderer > commands (VIRGL_CCMD_PIPE_RESOURCE_CREATE). So supporting that without > virglrenderer is not an option. > > VIRTIO_GPU_BLOB_MEM_GUEST might be possible without too much effort. > > > > > On a different note, any particular reason why Qemu UI EGL > > > > implementation is limited to Xorg and not extended to > > > > Wayland/Weston for which there is GTK glarea? > > > > > > Well, ideally I'd love to just use glarea. Which happens on wayland. > > > > > > The problem with Xorg is that the gtk x11 backend uses glx not egl > > > to create an opengl context for glarea. At least that used to be > > > the case in the past, maybe that has changed with newer versions. > > > qemu needs egl contexts though, otherwise dma-bufs don't work. So > > > we are stuck with our own egl widget implementation for now. Probably we > > > will be
Re: [PATCH v3 1/2] hw/riscv: Add fw_cfg support to virt
Hi Alistair, On 21/03/18 05:03:p, Alistair Francis wrote: > I'm guessing the failure is because of "hw/riscv: migrate fdt field to > generic MachineState" which moved the fdt element to the MachineState. > > The fix should just be to change s->fdt to mc->fdt. Yes, looks like it. I'll fix it up and retest locally before sending out the new version. > Sorry that the patch stopped compiling while in the RISC-V tree. On the contrary, thanks for bringing it to my attention. Best, Asherah
Re: [PULL v3 0/6] QOM and fdc patches patches for 2021-03-16
On Thu, 18 Mar 2021 at 12:27, Markus Armbruster wrote: > > The following changes since commit 571d413b5da6bc6f1c2aaca8484717642255ddb0: > > Merge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-20210316' > into staging (2021-03-17 21:02:37 +) > > are available in the Git repository at: > > git://repo.or.cz/qemu/armbru.git tags/pull-qom-fdc-2021-03-16-v3 > > for you to fetch changes up to 3e659bd899d321fb4a56527cb8e5d3662a889c11: > > memory: Drop "qemu:" prefix from QOM memory region type names (2021-03-18 > 09:54:27 +0100) > > > QOM and fdc patches patches for 2021-03-16 > CONFLICT (content): Merge conflict in docs/system/removed-features.rst Can you fix up and resend, please? thanks -- PMM
Re: [PULL v2 00/11] emulated nvme updates and fixes
On Thu, 18 Mar 2021 at 11:58, Klaus Jensen wrote: > > From: Klaus Jensen > > Hi Peter, > > The following changes since commit b12498fc575f2ad30f09fe78badc7fef526e2d76: > > Merge remote-tracking branch > 'remotes/vivier/tags/q800-for-6.0-pull-request' into staging (2021-03-18 > 10:05:37 +) > > are available in the Git repository at: > > git://git.infradead.org/qemu-nvme.git tags/nvme-next-pull-request > > for you to fetch changes up to dc04d25e2f3f7e26f7f97b860992076b5f04afdb: > > hw/block/nvme: add support for the format nvm command (2021-03-18 12:41:43 > +0100) > > > emulated nvme updates and fixes > > * fixes for Coverity CID 1450756, 1450757 and 1450758 (me) > * fix for a bug in zone management receive (me) > * metadata and end-to-end data protection support (me & Gollu Appalanaidu) > * verify support (Gollu Appalanaidu) > * multiple lba formats and format nvm support (Minwoo Im) > > and a couple of misc refactorings from me. > > v2: > - remove an unintended submodule update. Argh. Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/6.0 for any user-visible changes. -- PMM
Re: [PATCH 1/4] m68k: add the virtio devices aliases
On 3/18/21 11:39 PM, Laurent Vivier wrote: > Similarly to 5f629d943cb0 ("s390x: fix s390 virtio aliases"), > define the virtio aliases. > > This allows to start machines with virtio devices without > knowledge of the implementation type. > > For instance, we can use "-device virtio-scsi" on > m68k, s390x or PC, and the device will be +"respectively" > "virtio-scsi-device", "virtio-scsi-ccw" or "virtio-scsi-pci". > > This already exists for s390x and -ccw interfaces, adds them "add"? > for m68k and MMIO (-device) interfaces. > > Signed-off-by: Laurent Vivier > --- > softmmu/qdev-monitor.c | 46 +++--- > 1 file changed, 30 insertions(+), 16 deletions(-) > > diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c > index 8dc656becca9..262d38b8c01e 100644 > --- a/softmmu/qdev-monitor.c > +++ b/softmmu/qdev-monitor.c > @@ -42,6 +42,8 @@ > #include "hw/qdev-properties.h" > #include "hw/clock.h" > > +#define QEMU_ARCH_NO_PCI (QEMU_ARCH_S390X | QEMU_ARCH_M68K) > + > /* > * Aliases were a bad idea from the start. Let's keep them > * from spreading further. > @@ -60,34 +62,46 @@ static const QDevAlias qdev_alias_table[] = { > { "ES1370", "es1370" }, /* -soundhw name */ > { "ich9-ahci", "ahci" }, > { "lsi53c895a", "lsi" }, > +{ "virtio-9p-device", "virtio-9p", QEMU_ARCH_M68K }, > { "virtio-9p-ccw", "virtio-9p", QEMU_ARCH_S390X }, > -{ "virtio-9p-pci", "virtio-9p", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, > +{ "virtio-9p-pci", "virtio-9p", QEMU_ARCH_ALL & ~QEMU_ARCH_NO_PCI }, TIL QEMU_ARCH_NO_PCI :) > +{ "virtio-balloon-device", "virtio-balloon", QEMU_ARCH_M68K }, > { "virtio-balloon-ccw", "virtio-balloon", QEMU_ARCH_S390X }, > -{ "virtio-balloon-pci", "virtio-balloon", > -QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, > +{ "virtio-balloon-pci", "virtio-balloon", QEMU_ARCH_ALL & > + ~QEMU_ARCH_NO_PCI }, > +{ "virtio-blk-device", "virtio-blk", QEMU_ARCH_M68K }, > { "virtio-blk-ccw", "virtio-blk", QEMU_ARCH_S390X }, > -{ "virtio-blk-pci", "virtio-blk", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, > +{ "virtio-blk-pci", "virtio-blk", QEMU_ARCH_ALL & ~QEMU_ARCH_NO_PCI }, > { "virtio-gpu-ccw", "virtio-gpu", QEMU_ARCH_S390X }, > -{ "virtio-gpu-pci", "virtio-gpu", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, > +{ "virtio-gpu-device", "virtio-gpu", QEMU_ARCH_M68K }, > +{ "virtio-gpu-pci", "virtio-gpu", QEMU_ARCH_ALL & ~QEMU_ARCH_NO_PCI }, > +{ "virtio-input-host-device", "virtio-input-host", QEMU_ARCH_M68K }, > { "virtio-input-host-ccw", "virtio-input-host", QEMU_ARCH_S390X }, > -{ "virtio-input-host-pci", "virtio-input-host", > -QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, > -{ "virtio-iommu-pci", "virtio-iommu", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, > +{ "virtio-input-host-pci", "virtio-input-host", QEMU_ARCH_ALL & > + ~QEMU_ARCH_NO_PCI }, > +{ "virtio-iommu-pci", "virtio-iommu", QEMU_ARCH_ALL & ~QEMU_ARCH_NO_PCI > }, > +{ "virtio-keyboard-device", "virtio-keyboard", QEMU_ARCH_M68K }, > { "virtio-keyboard-ccw", "virtio-keyboard", QEMU_ARCH_S390X }, > -{ "virtio-keyboard-pci", "virtio-keyboard", > -QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, > +{ "virtio-keyboard-pci", "virtio-keyboard", QEMU_ARCH_ALL & > + ~QEMU_ARCH_NO_PCI }, > +{ "virtio-mouse-device", "virtio-mouse", QEMU_ARCH_M68K }, > { "virtio-mouse-ccw", "virtio-mouse", QEMU_ARCH_S390X }, > -{ "virtio-mouse-pci", "virtio-mouse", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, > +{ "virtio-mouse-pci", "virtio-mouse", QEMU_ARCH_ALL & ~QEMU_ARCH_NO_PCI > }, > +{ "virtio-net-device", "virtio-net", QEMU_ARCH_M68K }, > { "virtio-net-ccw", "virtio-net", QEMU_ARCH_S390X }, > -{ "virtio-net-pci", "virtio-net", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, > +{ "virtio-net-pci", "virtio-net", QEMU_ARCH_ALL & ~QEMU_ARCH_NO_PCI }, > +{ "virtio-rng-device", "virtio-rng", QEMU_ARCH_M68K }, > { "virtio-rng-ccw", "virtio-rng", QEMU_ARCH_S390X }, > -{ "virtio-rng-pci", "virtio-rng", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, > +{ "virtio-rng-pci", "virtio-rng", QEMU_ARCH_ALL & ~QEMU_ARCH_NO_PCI }, > +{ "virtio-scsi-device", "virtio-scsi", QEMU_ARCH_M68K }, > { "virtio-scsi-ccw", "virtio-scsi", QEMU_ARCH_S390X }, > -{ "virtio-scsi-pci", "virtio-scsi", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, > +{ "virtio-scsi-pci", "virtio-scsi", QEMU_ARCH_ALL & ~QEMU_ARCH_NO_PCI }, > +{ "virtio-serial-device", "virtio-serial", QEMU_ARCH_M68K }, > { "virtio-serial-ccw", "virtio-serial", QEMU_ARCH_S390X }, > -{ "virtio-serial-pci", "virtio-serial", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X > }, > +{ "virtio-serial-pci", "virtio-serial", QEMU_ARCH_ALL & > ~QEMU_ARCH_NO_PCI}, > +{ "virtio-tablet-device", "virtio-tablet", QEMU_ARCH_M68K }, > { "
Re: [PATCH for 6.0 v2] hw/intc/i8259: Refactor pic_read_irq() to avoid uninitialized variable
On 3/18/21 10:09 AM, Philippe Mathieu-Daudé wrote: +int irq2; + if (irq == 2) { irq2 = pic_get_irq(slave_pic); Move the declaration in here: int irq2 = ... r~
Re: [PATCH 3/4] iotests: test m68k with the virt machine
On 3/18/21 11:39 PM, Laurent Vivier wrote: > This allows to cover the virtio tests with a 32bit big-endian > virtio-mmio machine. > > Signed-off-by: Laurent Vivier > --- > tests/qemu-iotests/testenv.py | 1 + > 1 file changed, 1 insertion(+) Reviewed-by: Philippe Mathieu-Daudé
[PATCH 3/4] iotests: test m68k with the virt machine
This allows to cover the virtio tests with a 32bit big-endian virtio-mmio machine. Signed-off-by: Laurent Vivier --- tests/qemu-iotests/testenv.py | 1 + 1 file changed, 1 insertion(+) diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py index 1fbec854c1f7..6d27712617a3 100644 --- a/tests/qemu-iotests/testenv.py +++ b/tests/qemu-iotests/testenv.py @@ -208,6 +208,7 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: str, ('arm', 'virt'), ('aarch64', 'virt'), ('avr', 'mega2560'), +('m68k', 'virt'), ('rx', 'gdbsim-r5f562n8'), ('tricore', 'tricore_testboard') ) -- 2.30.2
[PATCH 1/4] m68k: add the virtio devices aliases
Similarly to 5f629d943cb0 ("s390x: fix s390 virtio aliases"), define the virtio aliases. This allows to start machines with virtio devices without knowledge of the implementation type. For instance, we can use "-device virtio-scsi" on m68k, s390x or PC, and the device will be "virtio-scsi-device", "virtio-scsi-ccw" or "virtio-scsi-pci". This already exists for s390x and -ccw interfaces, adds them for m68k and MMIO (-device) interfaces. Signed-off-by: Laurent Vivier --- softmmu/qdev-monitor.c | 46 +++--- 1 file changed, 30 insertions(+), 16 deletions(-) diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c index 8dc656becca9..262d38b8c01e 100644 --- a/softmmu/qdev-monitor.c +++ b/softmmu/qdev-monitor.c @@ -42,6 +42,8 @@ #include "hw/qdev-properties.h" #include "hw/clock.h" +#define QEMU_ARCH_NO_PCI (QEMU_ARCH_S390X | QEMU_ARCH_M68K) + /* * Aliases were a bad idea from the start. Let's keep them * from spreading further. @@ -60,34 +62,46 @@ static const QDevAlias qdev_alias_table[] = { { "ES1370", "es1370" }, /* -soundhw name */ { "ich9-ahci", "ahci" }, { "lsi53c895a", "lsi" }, +{ "virtio-9p-device", "virtio-9p", QEMU_ARCH_M68K }, { "virtio-9p-ccw", "virtio-9p", QEMU_ARCH_S390X }, -{ "virtio-9p-pci", "virtio-9p", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, +{ "virtio-9p-pci", "virtio-9p", QEMU_ARCH_ALL & ~QEMU_ARCH_NO_PCI }, +{ "virtio-balloon-device", "virtio-balloon", QEMU_ARCH_M68K }, { "virtio-balloon-ccw", "virtio-balloon", QEMU_ARCH_S390X }, -{ "virtio-balloon-pci", "virtio-balloon", -QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, +{ "virtio-balloon-pci", "virtio-balloon", QEMU_ARCH_ALL & + ~QEMU_ARCH_NO_PCI }, +{ "virtio-blk-device", "virtio-blk", QEMU_ARCH_M68K }, { "virtio-blk-ccw", "virtio-blk", QEMU_ARCH_S390X }, -{ "virtio-blk-pci", "virtio-blk", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, +{ "virtio-blk-pci", "virtio-blk", QEMU_ARCH_ALL & ~QEMU_ARCH_NO_PCI }, { "virtio-gpu-ccw", "virtio-gpu", QEMU_ARCH_S390X }, -{ "virtio-gpu-pci", "virtio-gpu", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, +{ "virtio-gpu-device", "virtio-gpu", QEMU_ARCH_M68K }, +{ "virtio-gpu-pci", "virtio-gpu", QEMU_ARCH_ALL & ~QEMU_ARCH_NO_PCI }, +{ "virtio-input-host-device", "virtio-input-host", QEMU_ARCH_M68K }, { "virtio-input-host-ccw", "virtio-input-host", QEMU_ARCH_S390X }, -{ "virtio-input-host-pci", "virtio-input-host", -QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, -{ "virtio-iommu-pci", "virtio-iommu", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, +{ "virtio-input-host-pci", "virtio-input-host", QEMU_ARCH_ALL & + ~QEMU_ARCH_NO_PCI }, +{ "virtio-iommu-pci", "virtio-iommu", QEMU_ARCH_ALL & ~QEMU_ARCH_NO_PCI }, +{ "virtio-keyboard-device", "virtio-keyboard", QEMU_ARCH_M68K }, { "virtio-keyboard-ccw", "virtio-keyboard", QEMU_ARCH_S390X }, -{ "virtio-keyboard-pci", "virtio-keyboard", -QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, +{ "virtio-keyboard-pci", "virtio-keyboard", QEMU_ARCH_ALL & + ~QEMU_ARCH_NO_PCI }, +{ "virtio-mouse-device", "virtio-mouse", QEMU_ARCH_M68K }, { "virtio-mouse-ccw", "virtio-mouse", QEMU_ARCH_S390X }, -{ "virtio-mouse-pci", "virtio-mouse", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, +{ "virtio-mouse-pci", "virtio-mouse", QEMU_ARCH_ALL & ~QEMU_ARCH_NO_PCI }, +{ "virtio-net-device", "virtio-net", QEMU_ARCH_M68K }, { "virtio-net-ccw", "virtio-net", QEMU_ARCH_S390X }, -{ "virtio-net-pci", "virtio-net", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, +{ "virtio-net-pci", "virtio-net", QEMU_ARCH_ALL & ~QEMU_ARCH_NO_PCI }, +{ "virtio-rng-device", "virtio-rng", QEMU_ARCH_M68K }, { "virtio-rng-ccw", "virtio-rng", QEMU_ARCH_S390X }, -{ "virtio-rng-pci", "virtio-rng", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, +{ "virtio-rng-pci", "virtio-rng", QEMU_ARCH_ALL & ~QEMU_ARCH_NO_PCI }, +{ "virtio-scsi-device", "virtio-scsi", QEMU_ARCH_M68K }, { "virtio-scsi-ccw", "virtio-scsi", QEMU_ARCH_S390X }, -{ "virtio-scsi-pci", "virtio-scsi", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, +{ "virtio-scsi-pci", "virtio-scsi", QEMU_ARCH_ALL & ~QEMU_ARCH_NO_PCI }, +{ "virtio-serial-device", "virtio-serial", QEMU_ARCH_M68K }, { "virtio-serial-ccw", "virtio-serial", QEMU_ARCH_S390X }, -{ "virtio-serial-pci", "virtio-serial", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, +{ "virtio-serial-pci", "virtio-serial", QEMU_ARCH_ALL & ~QEMU_ARCH_NO_PCI}, +{ "virtio-tablet-device", "virtio-tablet", QEMU_ARCH_M68K }, { "virtio-tablet-ccw", "virtio-tablet", QEMU_ARCH_S390X }, -{ "virtio-tablet-pci", "virtio-tablet", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X }, +{ "virtio-tablet-pci", "virtio-tablet", QEMU_ARCH_ALL & ~QEMU_ARCH_NO_PCI }, { } }; -- 2.30.2
[PATCH 4/4] iotests: iothreads need ioeventfd
And ioeventfd are only available with virtio-scsi-pci, so don't use the alias and add a rule to require virtio-scsi-pci for the tests that use iothreads. Signed-off-by: Laurent Vivier --- tests/qemu-iotests/127| 4 ++-- tests/qemu-iotests/256| 2 ++ tests/qemu-iotests/iotests.py | 5 + 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/tests/qemu-iotests/127 b/tests/qemu-iotests/127 index 98e8e82a8210..a3693533685a 100755 --- a/tests/qemu-iotests/127 +++ b/tests/qemu-iotests/127 @@ -44,7 +44,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15 _supported_fmt qcow2 _supported_proto file fuse -_require_devices virtio-scsi scsi-hd +_require_devices virtio-scsi-pci scsi-hd IMG_SIZE=64K @@ -62,7 +62,7 @@ $QEMU_IO -c 'write 0 42' "$TEST_IMG.overlay0" | _filter_qemu_io _launch_qemu \ -object iothread,id=iothr \ -blockdev node-name=source,driver=$IMGFMT,file.driver=file,file.filename="$TEST_IMG.overlay0" \ --device virtio-scsi,id=scsi-bus,iothread=iothr \ +-device virtio-scsi-pci,id=scsi-bus,iothread=iothr \ -device scsi-hd,bus=scsi-bus.0,drive=source _send_qemu_cmd $QEMU_HANDLE \ diff --git a/tests/qemu-iotests/256 b/tests/qemu-iotests/256 index 8d82a1dd865f..eb3af0dea80c 100755 --- a/tests/qemu-iotests/256 +++ b/tests/qemu-iotests/256 @@ -24,6 +24,8 @@ import os import iotests from iotests import log +iotests._verify_virtio_scsi_pci() + iotests.script_initialize(supported_fmts=['qcow2']) size = 64 * 1024 * 1024 diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index 1e9e6a066e90..3404ed534bb5 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -1146,6 +1146,11 @@ def _verify_virtio_blk() -> None: if 'virtio-blk' not in out: notrun('Missing virtio-blk in QEMU binary') +def _verify_virtio_scsi_pci() -> None: +out = qemu_pipe('-M', 'none', '-device', 'help') +if 'virtio-scsi-pci' not in out: +notrun('Missing virtio-scsi-pci in QEMU binary') + def supports_quorum(): return 'quorum' in qemu_img_pipe('--help') -- 2.30.2
[PATCH 2/4] iotests: Revert "iotests: use -ccw on s390x for 040, 139, and 182"
Commit f1d5516ab583 introduces a test in some iotests to check if the machine is a s390-ssw-virtio and to select virtio-*-ccw rather than virtio-*-pci. We don't need that because QEMU already provides aliases to use the correct virtio interface according to the machine type. This patch removes all virtio-*-pci and virtio-*-ccw to use virtio-* instead. Signed-off-by: Laurent Vivier cc: Cornelia Huck --- blockdev.c| 6 +- tests/qemu-iotests/040| 2 +- tests/qemu-iotests/051| 12 +--- tests/qemu-iotests/068| 4 +--- tests/qemu-iotests/093| 3 +-- tests/qemu-iotests/139| 9 ++--- tests/qemu-iotests/182| 13 ++--- tests/qemu-iotests/238| 4 +--- tests/qemu-iotests/240| 10 +- tests/qemu-iotests/257| 4 ++-- tests/qemu-iotests/307| 4 +--- tests/qemu-iotests/iotests.py | 5 - 12 files changed, 18 insertions(+), 58 deletions(-) diff --git a/blockdev.c b/blockdev.c index 5cc7c7effe9f..64da5350e3ad 100644 --- a/blockdev.c +++ b/blockdev.c @@ -969,11 +969,7 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type, QemuOpts *devopts; devopts = qemu_opts_create(qemu_find_opts("device"), NULL, 0, &error_abort); -if (arch_type == QEMU_ARCH_S390X) { -qemu_opt_set(devopts, "driver", "virtio-blk-ccw", &error_abort); -} else { -qemu_opt_set(devopts, "driver", "virtio-blk-pci", &error_abort); -} +qemu_opt_set(devopts, "driver", "virtio-blk", &error_abort); qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"), &error_abort); } diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040 index 336ff7c4f2ab..ba7cb34ce8cf 100755 --- a/tests/qemu-iotests/040 +++ b/tests/qemu-iotests/040 @@ -89,7 +89,7 @@ class TestSingleDrive(ImageCommitTestCase): qemu_io('-f', 'raw', '-c', 'write -P 0xab 0 524288', backing_img) qemu_io('-f', iotests.imgfmt, '-c', 'write -P 0xef 524288 524288', mid_img) self.vm = iotests.VM().add_drive(test_img, "node-name=top,backing.node-name=mid,backing.backing.node-name=base", interface="none") -self.vm.add_device(iotests.get_virtio_scsi_device()) +self.vm.add_device('virtio-scsi') self.vm.add_device("scsi-hd,id=scsi0,drive=drive0") self.vm.launch() self.has_quit = False diff --git a/tests/qemu-iotests/051 b/tests/qemu-iotests/051 index 7cbd1415ce7b..00382cc55e25 100755 --- a/tests/qemu-iotests/051 +++ b/tests/qemu-iotests/051 @@ -119,17 +119,7 @@ echo echo === Device without drive === echo -case "$QEMU_DEFAULT_MACHINE" in - s390-ccw-virtio) - virtio_scsi=virtio-scsi-ccw - ;; - *) - virtio_scsi=virtio-scsi-pci - ;; -esac - -run_qemu -device $virtio_scsi -device scsi-hd | -sed -e "s/$virtio_scsi/VIRTIO_SCSI/" +run_qemu -device virtio-scsi -device scsi-hd echo echo === Overriding backing file === diff --git a/tests/qemu-iotests/068 b/tests/qemu-iotests/068 index 03e03508a6e2..54e49c8ffab1 100755 --- a/tests/qemu-iotests/068 +++ b/tests/qemu-iotests/068 @@ -49,11 +49,9 @@ IMG_SIZE=128K case "$QEMU_DEFAULT_MACHINE" in s390-ccw-virtio) platform_parm="-no-shutdown" - hba=virtio-scsi-ccw ;; *) platform_parm="" - hba=virtio-scsi-pci ;; esac @@ -61,7 +59,7 @@ _qemu() { $QEMU $platform_parm -nographic -monitor stdio -serial none \ -drive if=none,id=drive0,file="$TEST_IMG",format="$IMGFMT" \ - -device $hba,id=hba0 \ + -device virtio-scsi,id=hba0 \ -device scsi-hd,drive=drive0 \ "$@" |\ _filter_qemu | _filter_hmp diff --git a/tests/qemu-iotests/093 b/tests/qemu-iotests/093 index 7745cb04b611..93274dc8cbde 100755 --- a/tests/qemu-iotests/093 +++ b/tests/qemu-iotests/093 @@ -371,8 +371,7 @@ class ThrottleTestGroupNames(iotests.QMPTestCase): class ThrottleTestRemovableMedia(iotests.QMPTestCase): def setUp(self): self.vm = iotests.VM() -self.vm.add_device("{},id=virtio-scsi".format( -iotests.get_virtio_scsi_device())) +self.vm.add_device("{},id=virtio-scsi".format('virtio-scsi')) self.vm.launch() def tearDown(self): diff --git a/tests/qemu-iotests/139 b/tests/qemu-iotests/139 index e79b3c21fdce..178b1ee230ca 100755 --- a/tests/qemu-iotests/139 +++ b/tests/qemu-iotests/139 @@ -26,18 +26,13 @@ import time base_img = os.path.join(iotests.test_dir, 'base.img') new_img = os.path.join(iotests.test_dir, 'new.img') -if iotests.qemu_default_machine == 's390-ccw-virtio': -default_virtio_blk = 'virtio-blk-ccw' -else: -default_virtio_blk = 'virtio-blk-pci' class TestBlockdevDel(iotests.QMPTestCase): def setUp(self): iotests.qemu_img('create', '-f', iotests.imgfmt, base_img, '1M') self.vm = iotests.VM()
[PATCH 0/4] iotests: fix failures with non-PCI machines
Tests are executed using virtio-*-pci even on a non PCI machine. The problem can be easily fixed using the virtio aliases (virtio-*), to run virtio-*-ccw on s390x and virtio-*-device on m68k. A first attempt was tried with virtio-*-ccw by detecting the machine type, this series removes it to use the aliases that are a cleaner approach. Laurent Vivier (4): m68k: add the virtio devices aliases iotests: Revert "iotests: use -ccw on s390x for 040, 139, and 182" iotests: test m68k with the virt machine iotests: iothreads need ioeventfd blockdev.c| 6 + softmmu/qdev-monitor.c| 46 +++ tests/qemu-iotests/040| 2 +- tests/qemu-iotests/051| 12 + tests/qemu-iotests/068| 4 +-- tests/qemu-iotests/093| 3 +-- tests/qemu-iotests/127| 4 +-- tests/qemu-iotests/139| 9 ++- tests/qemu-iotests/182| 13 ++ tests/qemu-iotests/238| 4 +-- tests/qemu-iotests/240| 10 tests/qemu-iotests/256| 2 ++ tests/qemu-iotests/257| 4 +-- tests/qemu-iotests/307| 4 +-- tests/qemu-iotests/iotests.py | 10 tests/qemu-iotests/testenv.py | 1 + 16 files changed, 58 insertions(+), 76 deletions(-) -- 2.30.2
Re: [PATCH for-6.0 v2 4/5] hw/core/loader: Add new function rom_ptr_for_as()
On 3/18/21 3:28 PM, Peter Maydell wrote: On Thu, 18 Mar 2021 at 21:14, Richard Henderson wrote: On 3/18/21 1:02 PM, Peter Maydell wrote: + * Note that we do not check @as against the 'as' member in the + * 'struct Rom' returned by rom_ptr(). The Rom::as is the + * AddressSpace which the rom blob should be written to... ... Should you really have this special case? Nowhere is this going to verify that @addr is in @as. It's the "happens almost all the time" case. Nothing verifies that @addr is in @as anyway -- see the "Note that" part of the comment above. The comment explains why we don't examine Rom::as. But we do check @addr vs @as via @as -> @fv -> flatview_translate. All we do is look for "every other address in the AS that maps to the same MR as the address we started with". Are you asking about the !cbdata.mr exit ? Well, I suppose. I guess I didn't read the cb properly. Reviewed-by: Richard Henderson r~
Re: [PATCH v3 1/2] hw/riscv: Add fw_cfg support to virt
On Thu, Mar 18, 2021 at 5:25 PM Alistair Francis wrote: > > On Sun, Feb 28, 2021 at 6:17 AM Asherah Connor wrote: > > > > Provides fw_cfg for the virt machine on riscv. This enables > > using e.g. ramfb later. > > > > Signed-off-by: Asherah Connor > > This patch doesn't compile, I see this error: > > ../hw/riscv/virt.c: In function ‘create_fw_cfg’: > ../hw/riscv/virt.c:523:27: error: ‘RISCVVirtState’ has no member named ‘fdt’ > 523 | qemu_fdt_add_subnode(s->fdt, nodename); > | ^~ > > Can you please fix the compilation failure and send a new version? I'm guessing the failure is because of "hw/riscv: migrate fdt field to generic MachineState" which moved the fdt element to the MachineState. The fix should just be to change s->fdt to mc->fdt. Sorry that the patch stopped compiling while in the RISC-V tree. Alistair > > Alistair > > > --- > > > > Changes in v3: > > * Document why fw_cfg is done when it is. > > * Move VIRT_FW_CFG before VIRT_FLASH. > > > > Changes in v2: > > * Add DMA support (needed for writes). > > > > hw/riscv/Kconfig| 1 + > > hw/riscv/virt.c | 30 ++ > > include/hw/riscv/virt.h | 2 ++ > > 3 files changed, 33 insertions(+) > > > > diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig > > index facb0cbacc..afaa5e58bb 100644 > > --- a/hw/riscv/Kconfig > > +++ b/hw/riscv/Kconfig > > @@ -33,6 +33,7 @@ config RISCV_VIRT > > select SIFIVE_PLIC > > select SIFIVE_TEST > > select VIRTIO_MMIO > > +select FW_CFG_DMA > > > > config SIFIVE_E > > bool > > diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c > > index 2299b3a6be..82eff42c37 100644 > > --- a/hw/riscv/virt.c > > +++ b/hw/riscv/virt.c > > @@ -56,6 +56,7 @@ static const struct MemmapEntry { > > [VIRT_PLIC] ={ 0xc00, VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2) }, > > [VIRT_UART0] = { 0x1000, 0x100 }, > > [VIRT_VIRTIO] = { 0x10001000,0x1000 }, > > +[VIRT_FW_CFG] = { 0x1010, 0x18 }, > > [VIRT_FLASH] = { 0x2000, 0x400 }, > > [VIRT_PCIE_ECAM] = { 0x3000,0x1000 }, > > [VIRT_PCIE_MMIO] = { 0x4000,0x4000 }, > > @@ -488,6 +489,28 @@ static inline DeviceState *gpex_pcie_init(MemoryRegion > > *sys_mem, > > return dev; > > } > > > > +static FWCfgState *create_fw_cfg(const RISCVVirtState *s) > > +{ > > +hwaddr base = virt_memmap[VIRT_FW_CFG].base; > > +hwaddr size = virt_memmap[VIRT_FW_CFG].size; > > +FWCfgState *fw_cfg; > > +char *nodename; > > + > > +fw_cfg = fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, > > + &address_space_memory); > > +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)MACHINE(s)->smp.cpus); > > + > > +nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base); > > +qemu_fdt_add_subnode(s->fdt, nodename); > > +qemu_fdt_setprop_string(s->fdt, nodename, > > +"compatible", "qemu,fw-cfg-mmio"); > > +qemu_fdt_setprop_sized_cells(s->fdt, nodename, "reg", > > + 2, base, 2, size); > > +qemu_fdt_setprop(s->fdt, nodename, "dma-coherent", NULL, 0); > > +g_free(nodename); > > +return fw_cfg; > > +} > > + > > static void virt_machine_init(MachineState *machine) > > { > > const struct MemmapEntry *memmap = virt_memmap; > > @@ -652,6 +675,13 @@ static void virt_machine_init(MachineState *machine) > > start_addr = virt_memmap[VIRT_FLASH].base; > > } > > > > +/* > > + * Init fw_cfg. Must be done before riscv_load_fdt, otherwise the > > device > > + * tree cannot be altered and we get FDT_ERR_NOSPACE. > > + */ > > +s->fw_cfg = create_fw_cfg(s); > > +rom_set_fw(s->fw_cfg); > > + > > /* Compute the fdt load address in dram */ > > fdt_load_addr = riscv_load_fdt(memmap[VIRT_DRAM].base, > > machine->ram_size, s->fdt); > > diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h > > index 84b7a3848f..b0ded3fc55 100644 > > --- a/include/hw/riscv/virt.h > > +++ b/include/hw/riscv/virt.h > > @@ -40,6 +40,7 @@ struct RISCVVirtState { > > RISCVHartArrayState soc[VIRT_SOCKETS_MAX]; > > DeviceState *plic[VIRT_SOCKETS_MAX]; > > PFlashCFI01 *flash[2]; > > +FWCfgState *fw_cfg; > > > > void *fdt; > > int fdt_size; > > @@ -54,6 +55,7 @@ enum { > > VIRT_PLIC, > > VIRT_UART0, > > VIRT_VIRTIO, > > +VIRT_FW_CFG, > > VIRT_FLASH, > > VIRT_DRAM, > > VIRT_PCIE_MMIO, > > -- > > 2.20.1 > > > >
Re: [PATCH for-6.0 v2 4/5] hw/core/loader: Add new function rom_ptr_for_as()
On Thu, 18 Mar 2021 at 21:14, Richard Henderson wrote: > > On 3/18/21 1:02 PM, Peter Maydell wrote: > >>> + * Note that we do not check @as against the 'as' member in the > >>> + * 'struct Rom' returned by rom_ptr(). The Rom::as is the > >>> + * AddressSpace which the rom blob should be written to... > ... > >> Should you really have this special case? Nowhere is this going to verify > >> that > >> @addr is in @as. > > > > It's the "happens almost all the time" case. Nothing verifies > > that @addr is in @as anyway -- see the "Note that" part of the > > comment above. > > The comment explains why we don't examine Rom::as. > But we do check @addr vs @as via @as -> @fv -> flatview_translate. All we do is look for "every other address in the AS that maps to the same MR as the address we started with". Are you asking about the !cbdata.mr exit ? That's for the implausible corner case that the caller asked us about an address with no RAM. In that case the early "check rom_ptr() directly against @address" gets us the "did the user load an image at that address", and we can skip the "check for other aliases, because by there aren't going to be any aliases to no MR at all. thanks -- PMM
Re: [PATCH v3 1/2] hw/riscv: Add fw_cfg support to virt
On Sun, Feb 28, 2021 at 6:17 AM Asherah Connor wrote: > > Provides fw_cfg for the virt machine on riscv. This enables > using e.g. ramfb later. > > Signed-off-by: Asherah Connor This patch doesn't compile, I see this error: ../hw/riscv/virt.c: In function ‘create_fw_cfg’: ../hw/riscv/virt.c:523:27: error: ‘RISCVVirtState’ has no member named ‘fdt’ 523 | qemu_fdt_add_subnode(s->fdt, nodename); | ^~ Can you please fix the compilation failure and send a new version? Alistair > --- > > Changes in v3: > * Document why fw_cfg is done when it is. > * Move VIRT_FW_CFG before VIRT_FLASH. > > Changes in v2: > * Add DMA support (needed for writes). > > hw/riscv/Kconfig| 1 + > hw/riscv/virt.c | 30 ++ > include/hw/riscv/virt.h | 2 ++ > 3 files changed, 33 insertions(+) > > diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig > index facb0cbacc..afaa5e58bb 100644 > --- a/hw/riscv/Kconfig > +++ b/hw/riscv/Kconfig > @@ -33,6 +33,7 @@ config RISCV_VIRT > select SIFIVE_PLIC > select SIFIVE_TEST > select VIRTIO_MMIO > +select FW_CFG_DMA > > config SIFIVE_E > bool > diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c > index 2299b3a6be..82eff42c37 100644 > --- a/hw/riscv/virt.c > +++ b/hw/riscv/virt.c > @@ -56,6 +56,7 @@ static const struct MemmapEntry { > [VIRT_PLIC] ={ 0xc00, VIRT_PLIC_SIZE(VIRT_CPUS_MAX * 2) }, > [VIRT_UART0] = { 0x1000, 0x100 }, > [VIRT_VIRTIO] = { 0x10001000,0x1000 }, > +[VIRT_FW_CFG] = { 0x1010, 0x18 }, > [VIRT_FLASH] = { 0x2000, 0x400 }, > [VIRT_PCIE_ECAM] = { 0x3000,0x1000 }, > [VIRT_PCIE_MMIO] = { 0x4000,0x4000 }, > @@ -488,6 +489,28 @@ static inline DeviceState *gpex_pcie_init(MemoryRegion > *sys_mem, > return dev; > } > > +static FWCfgState *create_fw_cfg(const RISCVVirtState *s) > +{ > +hwaddr base = virt_memmap[VIRT_FW_CFG].base; > +hwaddr size = virt_memmap[VIRT_FW_CFG].size; > +FWCfgState *fw_cfg; > +char *nodename; > + > +fw_cfg = fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, > + &address_space_memory); > +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)MACHINE(s)->smp.cpus); > + > +nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base); > +qemu_fdt_add_subnode(s->fdt, nodename); > +qemu_fdt_setprop_string(s->fdt, nodename, > +"compatible", "qemu,fw-cfg-mmio"); > +qemu_fdt_setprop_sized_cells(s->fdt, nodename, "reg", > + 2, base, 2, size); > +qemu_fdt_setprop(s->fdt, nodename, "dma-coherent", NULL, 0); > +g_free(nodename); > +return fw_cfg; > +} > + > static void virt_machine_init(MachineState *machine) > { > const struct MemmapEntry *memmap = virt_memmap; > @@ -652,6 +675,13 @@ static void virt_machine_init(MachineState *machine) > start_addr = virt_memmap[VIRT_FLASH].base; > } > > +/* > + * Init fw_cfg. Must be done before riscv_load_fdt, otherwise the device > + * tree cannot be altered and we get FDT_ERR_NOSPACE. > + */ > +s->fw_cfg = create_fw_cfg(s); > +rom_set_fw(s->fw_cfg); > + > /* Compute the fdt load address in dram */ > fdt_load_addr = riscv_load_fdt(memmap[VIRT_DRAM].base, > machine->ram_size, s->fdt); > diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h > index 84b7a3848f..b0ded3fc55 100644 > --- a/include/hw/riscv/virt.h > +++ b/include/hw/riscv/virt.h > @@ -40,6 +40,7 @@ struct RISCVVirtState { > RISCVHartArrayState soc[VIRT_SOCKETS_MAX]; > DeviceState *plic[VIRT_SOCKETS_MAX]; > PFlashCFI01 *flash[2]; > +FWCfgState *fw_cfg; > > void *fdt; > int fdt_size; > @@ -54,6 +55,7 @@ enum { > VIRT_PLIC, > VIRT_UART0, > VIRT_VIRTIO, > +VIRT_FW_CFG, > VIRT_FLASH, > VIRT_DRAM, > VIRT_PCIE_MMIO, > -- > 2.20.1 > >
Re: [PATCH for-6.0 v2 4/5] hw/core/loader: Add new function rom_ptr_for_as()
On 3/18/21 1:02 PM, Peter Maydell wrote: + * Note that we do not check @as against the 'as' member in the + * 'struct Rom' returned by rom_ptr(). The Rom::as is the + * AddressSpace which the rom blob should be written to... ... Should you really have this special case? Nowhere is this going to verify that @addr is in @as. It's the "happens almost all the time" case. Nothing verifies that @addr is in @as anyway -- see the "Note that" part of the comment above. The comment explains why we don't examine Rom::as. But we do check @addr vs @as via @as -> @fv -> flatview_translate. r~
Re: [PATCH for-6.0 v2 3/5] memory: Add offset_in_region to flatview_cb arguments
On 3/18/21 6:48 PM, Peter Maydell wrote: > The function flatview_for_each_range() calls a callback for each > range in a FlatView. Currently the callback gets the start and > length of the range and the MemoryRegion involved, but not the offset > within the MemoryRegion. Add this to the callback's arguments; we're > going to want it for a new use in the next commit. > > Signed-off-by: Peter Maydell > --- > include/exec/memory.h | 2 ++ > softmmu/memory.c| 4 +++- > tests/qtest/fuzz/generic_fuzz.c | 5 - > 3 files changed, 9 insertions(+), 2 deletions(-) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH for-6.0 v2 2/5] memory: Document flatview_for_each_range()
On 3/18/21 6:48 PM, Peter Maydell wrote: > Add a documentation comment describing flatview_for_each_range(). > > Signed-off-by: Peter Maydell > --- > include/exec/memory.h | 26 -- > 1 file changed, 24 insertions(+), 2 deletions(-) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH] MAINTAINERS: Fix tests/migration maintainers
On 3/18/21 11:40 AM, huang...@chinatelecom.cn wrote: > From: Hyman > > Signed-off-by: Hyman It looks unusual to have a single name in your authorship and S-o-b line. Generally, this line should represent (a version of) your legal name, as you are making a legal claim: https://wiki.qemu.org/Contribute/SubmitAPatch points to http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297 for what it represents. Of course, if you DO regularly sign a single name as your legal name in other contexts, don't let me stop you from doing so here as well. And if you want to use UTF-8 to spell your name natively, or even have a combination of your native name and a Latinized form, that is acceptable as well (commit 903a41d341 is an example of that approach). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [PATCH for-6.0 v2 1/5] memory: Make flatview_cb return bool, not int
On 3/18/21 6:48 PM, Peter Maydell wrote: > The return value of the flatview_cb callback passed to the > flatview_for_each_range() function is zero if the iteration through > the ranges should continue, or non-zero to break out of it. Use a > bool for this rather than int. > > Signed-off-by: Peter Maydell > --- > include/exec/memory.h | 6 +++--- > tests/qtest/fuzz/generic_fuzz.c | 8 > 2 files changed, 7 insertions(+), 7 deletions(-) Reviewed-by: Philippe Mathieu-Daudé
Re: Serious doubts about Gitlab CI
On 3/18/21 8:52 PM, John Snow wrote: > On 3/18/21 3:46 PM, Stefan Hajnoczi wrote: >> On Wed, Mar 17, 2021 at 09:29:32PM +0100, Philippe Mathieu-Daudé wrote: >>> Now I'm having serious doubts about Gitlab usefulness for the QEMU >>> community... >> >> The QEMU Project has 50,000 minutes of GitLab CI quota. Let's enable >> GitLab Merge Requests so that anyone can submit a merge request and get >> CI coverage. >> > > How does this workflow work? > > I push to my branch, I submit a MR, CI runs? > > I suppose there must be a way for me to disable a CI run on my branch if > I intend to trigger it via a MR, to avoid eating minutes twice. I use that alias in ~/.gitconfig: [alias] skip-ci-push = push --push-option=ci.skip Then: $ git skip-ci-push [-f] myrepo mybranch
Re: [RFC v5 1/6] qmp: add QMP command x-debug-query-virtio
On 3/18/21 11:29 AM, Jonah Palmer wrote: > From: Laurent Vivier > > This new command lists all the instances of VirtIODevice with > their path and virtio type > > Signed-off-by: Laurent Vivier > Reviewed-by: Eric Blake > Signed-off-by: Jonah Palmer > --- We've missed soft freeze for 6.0, and this feels like a new feature; therefore... > +++ b/hw/virtio/virtio.c > > +VirtioInfoList *qmp_x_debug_query_virtio(Error **errp) > +{ > +VirtioInfoList *list = NULL; > +VirtioInfoList *node; > +VirtIODevice *vdev; > + > +QTAILQ_FOREACH(vdev, &virtio_list, next) { > +DeviceState *dev = DEVICE(vdev); > +node = g_new0(VirtioInfoList, 1); > +node->value = g_new(VirtioInfo, 1); > +node->value->path = g_strdup(dev->canonical_path); > +node->value->type = qapi_enum_parse(&VirtioType_lookup, vdev->name, > +VIRTIO_TYPE_UNKNOWN, NULL); > +node->next = list; > +list = node; This should be updated to use QAPI_LIST_PREPEND rather than open coding. > +} > + > +return list; > +} > + > static const TypeInfo virtio_device_info = { > .name = TYPE_VIRTIO_DEVICE, > .parent = TYPE_DEVICE, > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h > index b7ece7a..2470e09 100644 > --- a/include/hw/virtio/virtio.h > +++ b/qapi/virtio.json > @@ -0,0 +1,68 @@ > +## > +# = Virtio devices > +## > + > +## > +# @VirtioType: > +# > +# An enumeration of Virtio device types. > +# > +# Since: 6.0 ...this now needs to reference 6.1. > +## > +{ 'enum': 'VirtioType', > + 'data': [ 'unknown', 'virtio-9p', 'virtio-blk', 'virtio-serial', > +'virtio-gpu', 'virtio-input', 'virtio-net', 'virtio-scsi', > +'vhost-user-fs', 'vhost-vsock', 'virtio-balloon', > 'virtio-crypto', > +'virtio-iommu', 'virtio-pmem', 'virtio-rng' ] > +} > + > +## > +# @VirtioInfo: > +# > +# Information about a given VirtIODevice > +# > +# @path: VirtIO device canonical path. > +# > +# @type: VirtIO device type. > +# > +# Since: 6.0 and throughout the series (I'll quit pointing it out) > +## > +# @x-debug-query-virtio: > +# > +# Return the list of all VirtIO devices > +# > +# Returns: list of @VirtioInfo > +# > +# Since: 6.0 > +# > +# Example: > +# > +# -> { "execute": "x-debug-query-virtio" } That said, adding an 'x-' experimental feature is NOT locking us down, so if some maintainer still wants to include this in -rc1 on the grounds that it will help debugging other things, rather than pushing it out to 6.1 as a new feature, then keeping things as 6.0 is tolerable in that border-line case. (If it misses -rc1, then I will become more adamant that it does not belong in 6.0) -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [PATCH] target/riscv: Make VSTIP and VSEIP read-only in hip
On Thu, Mar 11, 2021 at 4:49 AM Georg Kotheimer wrote: > > Signed-off-by: Georg Kotheimer Thanks! Applied to riscv-to-apply.next Alistair > --- > target/riscv/csr.c | 7 --- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/target/riscv/csr.c b/target/riscv/csr.c > index fd2e6363f3..00a3ab72af 100644 > --- a/target/riscv/csr.c > +++ b/target/riscv/csr.c > @@ -420,7 +420,8 @@ static const target_ulong sstatus_v1_10_mask = > SSTATUS_SIE | SSTATUS_SPIE | > SSTATUS_UIE | SSTATUS_UPIE | SSTATUS_SPP | SSTATUS_FS | SSTATUS_XS | > SSTATUS_SUM | SSTATUS_MXR | SSTATUS_SD; > static const target_ulong sip_writable_mask = SIP_SSIP | MIP_USIP | MIP_UEIP; > -static const target_ulong hip_writable_mask = MIP_VSSIP | MIP_VSTIP | > MIP_VSEIP; > +static const target_ulong hip_writable_mask = MIP_VSSIP; > +static const target_ulong hvip_writable_mask = MIP_VSSIP | MIP_VSTIP | > MIP_VSEIP; > static const target_ulong vsip_writable_mask = MIP_VSSIP; > > static const char valid_vm_1_10_32[16] = { > @@ -962,9 +963,9 @@ static int rmw_hvip(CPURISCVState *env, int csrno, > target_ulong *ret_value, > target_ulong new_value, target_ulong write_mask) > { > int ret = rmw_mip(env, 0, ret_value, new_value, > - write_mask & hip_writable_mask); > + write_mask & hvip_writable_mask); > > -*ret_value &= hip_writable_mask; > +*ret_value &= hvip_writable_mask; > > return ret; > } > -- > 2.30.1 > >
Re: [PATCH] target/riscv: Adjust privilege level for HLV(X)/HSV instructions
On Thu, Mar 11, 2021 at 5:32 AM Georg Kotheimer wrote: > > According to the specification the "field SPVP of hstatus controls the > privilege level of the access" for the hypervisor virtual-machine load > and store instructions HLV, HLVX and HSV. > > Signed-off-by: Georg Kotheimer Thanks! Applied to riscv-to-apply.next Alistair > --- > target/riscv/cpu_helper.c | 25 ++--- > 1 file changed, 14 insertions(+), 11 deletions(-) > > diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c > index 2f43939fb6..d0577b1e08 100644 > --- a/target/riscv/cpu_helper.c > +++ b/target/riscv/cpu_helper.c > @@ -325,7 +325,11 @@ static int get_physical_address(CPURISCVState *env, > hwaddr *physical, > use_background = true; > } > > -if (mode == PRV_M && access_type != MMU_INST_FETCH) { > +/* MPRV does not affect the virtual-machine load/store > + instructions, HLV, HLVX, and HSV. */ > +if (riscv_cpu_two_stage_lookup(mmu_idx)) { > +mode = get_field(env->hstatus, HSTATUS_SPVP); > +} else if (mode == PRV_M && access_type != MMU_INST_FETCH) { > if (get_field(env->mstatus, MSTATUS_MPRV)) { > mode = get_field(env->mstatus, MSTATUS_MPP); > } > @@ -695,19 +699,18 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, > int size, > qemu_log_mask(CPU_LOG_MMU, "%s ad %" VADDR_PRIx " rw %d mmu_idx %d\n", >__func__, address, access_type, mmu_idx); > > -if (mode == PRV_M && access_type != MMU_INST_FETCH) { > -if (get_field(env->mstatus, MSTATUS_MPRV)) { > -mode = get_field(env->mstatus, MSTATUS_MPP); > +/* MPRV does not affect the virtual-machine load/store > + instructions, HLV, HLVX, and HSV. */ > +if (riscv_cpu_two_stage_lookup(mmu_idx)) { > +mode = get_field(env->hstatus, HSTATUS_SPVP); > +} else if (mode == PRV_M && access_type != MMU_INST_FETCH && > + get_field(env->mstatus, MSTATUS_MPRV)) { > +mode = get_field(env->mstatus, MSTATUS_MPP); > +if (riscv_has_ext(env, RVH) && get_field(env->mstatus, MSTATUS_MPV)) > { > +two_stage_lookup = true; > } > } > > -if (riscv_has_ext(env, RVH) && env->priv == PRV_M && > -access_type != MMU_INST_FETCH && > -get_field(env->mstatus, MSTATUS_MPRV) && > -get_field(env->mstatus, MSTATUS_MPV)) { > -two_stage_lookup = true; > -} > - > if (riscv_cpu_virt_enabled(env) || > ((riscv_cpu_two_stage_lookup(mmu_idx) || two_stage_lookup) && > access_type != MMU_INST_FETCH)) { > -- > 2.30.1 > >
Re: Serious doubts about Gitlab CI
On 18/03/21 20:46, Stefan Hajnoczi wrote: The QEMU Project has 50,000 minutes of GitLab CI quota. Let's enable GitLab Merge Requests so that anyone can submit a merge request and get CI coverage. Each merge request consumes about 2500. That won't last long. Paolo
Re: of AVR target page size
On Thu, 18 Mar 2021 at 20:05, Dr. David Alan Gilbert wrote: > > * Peter Maydell (peter.mayd...@linaro.org) wrote: > > On Thu, 18 Mar 2021 at 10:45, Dr. David Alan Gilbert > > wrote: > > > > > > * Peter Maydell (peter.mayd...@linaro.org) wrote: > > > > Also, what does the > > > > /* 0x80 is reserved in migration.h start with 0x100 next */ > > > > comment refer to? migration.h has no instances of "RAM_SAVE" > > > > or 0x80 or 1 << 7... > > > > > > It looks like it got moved to qemu-file.h a few years ago > > > as RAM_SAVE_FLAG_HOOK. > > > > Could we put the define of RAM_SAVE_FLAG_HOOK in migration.h > > with all the other RAM_SAVE_FLAG defines ? It looks like the two > > places that use it already include migration.h... > > Do you mean move the rest of the RAM_SAVE_FLAG_* from migration/ram.c > into migration.h? > > We could do - although they're never used by anything else. Oh, I'd missed that the other RAM_SAVE_FLAG_* are in a C file. Yes, I think they would be better in a .h file, so the whole set of definitions can be together. migration/ram.h seems like a plausible place. thanks -- PMM
Re: [PATCH 3/3] i386: Make sure kvm_arch_set_tsc_khz() succeeds on migration when 'hv-reenlightenment' was exposed
* Vitaly Kuznetsov (vkuzn...@redhat.com) wrote: > KVM doesn't fully support Hyper-V reenlightenment notifications on > migration. In particular, it doesn't support emulating TSC frequency > of the source host by trapping all TSC accesses so unless TSC scaling > is supported on the destination host and KVM_SET_TSC_KHZ succeeds, it > is unsafe to proceed with migration. > > Normally, we only require KVM_SET_TSC_KHZ to succeed when 'user_tsc_khz' > was set and just 'try' KVM_SET_TSC_KHZ without otherwise. > > Introduce a new vmstate section (which is added when the guest has > reenlightenment feature enabled) and add env.tsc_khz to it. We already > have env.tsc_khz packed in 'cpu/tsc_khz' but we don't want to be dependent > on the section order. > > Signed-off-by: Vitaly Kuznetsov > --- > target/i386/kvm/hyperv.h | 1 + > target/i386/kvm/kvm.c| 11 +++ > target/i386/machine.c| 37 + > 3 files changed, 49 insertions(+) > > diff --git a/target/i386/kvm/hyperv.h b/target/i386/kvm/hyperv.h > index 67543296c3a4..c65e5c85c4d3 100644 > --- a/target/i386/kvm/hyperv.h > +++ b/target/i386/kvm/hyperv.h > @@ -20,6 +20,7 @@ > > #ifdef CONFIG_KVM > int kvm_hv_handle_exit(X86CPU *cpu, struct kvm_hyperv_exit *exit); > +int kvm_hv_tsc_frequency_loaded(X86CPU *cpu); > #endif > > int hyperv_x86_synic_add(X86CPU *cpu); > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c > index 7fe9f527103c..f6c4093778e9 100644 > --- a/target/i386/kvm/kvm.c > +++ b/target/i386/kvm/kvm.c > @@ -1460,6 +1460,17 @@ static int hyperv_init_vcpu(X86CPU *cpu) > return 0; > } > > +int kvm_hv_tsc_frequency_loaded(X86CPU *cpu) > +{ > +CPUState *cs = CPU(cpu); > + > +/* > + * KVM doens't fully support re-enlightenment notifications so we need to ^^ tpyo Dave > + * make sure TSC frequency doesn't change upon migration. > + */ > +return kvm_arch_set_tsc_khz(cs); > +} > + > static Error *invtsc_mig_blocker; > > #define KVM_MAX_CPUID_ENTRIES 100 > diff --git a/target/i386/machine.c b/target/i386/machine.c > index 715620c58809..369a8f1e7a7a 100644 > --- a/target/i386/machine.c > +++ b/target/i386/machine.c > @@ -896,6 +896,42 @@ static const VMStateDescription > vmstate_msr_hyperv_reenlightenment = { > VMSTATE_END_OF_LIST() > } > }; > + > +static bool hyperv_tsc_frequency_needed(void *opaque) > +{ > +X86CPU *cpu = opaque; > +CPUX86State *env = &cpu->env; > + > +return env->tsc_khz != 0 && (env->msr_hv_reenlightenment_control || > + env->msr_hv_tsc_emulation_control); > +} > + > +static int hyperv_tsc_frequency_post_load(void *opaque, int version_id) > +{ > +X86CPU *cpu = opaque; > +int r; > + > +r = kvm_hv_tsc_frequency_loaded(cpu); > +if (r) { > +error_report("Failed to set the desired TSC frequency and " > + "reenlightenment was exposed"); > +return -EINVAL; > +} > + > +return 0; > +} > + > +static const VMStateDescription vmstate_msr_hyperv_tsc_frequency = { > +.name = "cpu/msr_hyperv_tsc_frequency", > +.version_id = 1, > +.minimum_version_id = 1, > +.needed = hyperv_tsc_frequency_needed, > +.post_load = hyperv_tsc_frequency_post_load, > +.fields = (VMStateField[]) { > +VMSTATE_INT64(env.tsc_khz, X86CPU), > +VMSTATE_END_OF_LIST() > +} > +}; > #endif > > static bool avx512_needed(void *opaque) > @@ -1495,6 +1531,7 @@ VMStateDescription vmstate_x86_cpu = { > &vmstate_msr_hyperv_synic, > &vmstate_msr_hyperv_stimer, > &vmstate_msr_hyperv_reenlightenment, > +&vmstate_msr_hyperv_tsc_frequency, > #endif > &vmstate_avx512, > &vmstate_xss, > -- > 2.30.2 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[PATCH 0/1] iotests: fix 051.out expected output after error
Oops, sorry about the churn. I can see why this would have caused a failure but I'm surprised I can't reproduce this when I run the test locally. Christian, would you be willing to test this patch out as a quick sanity check too? Connor Kuehl (1): iotests: fix 051.out expected output after error text touchups tests/qemu-iotests/051.out | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- 2.30.2
[PATCH 1/1] iotests: fix 051.out expected output after error text touchups
A patch was recently applied that touched up some error messages that pertained to key names like 'node-name'. The trouble is it only updated tests/qemu-iotests/051.pc.out and not tests/qemu-iotests/051.out as well. Do that now. Fixes: 785ec4b1b9 ("block: Clarify error messages pertaining to 'node-name'") Signed-off-by: Connor Kuehl --- tests/qemu-iotests/051.out | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tests/qemu-iotests/051.out b/tests/qemu-iotests/051.out index de4771bcb3..db8c14b903 100644 --- a/tests/qemu-iotests/051.out +++ b/tests/qemu-iotests/051.out @@ -61,13 +61,13 @@ QEMU X.Y.Z monitor - type 'help' for more information (qemu) quit Testing: -drive file=TEST_DIR/t.qcow2,node-name=123foo -QEMU_PROG: -drive file=TEST_DIR/t.qcow2,node-name=123foo: Invalid node name +QEMU_PROG: -drive file=TEST_DIR/t.qcow2,node-name=123foo: Invalid node-name: '123foo' Testing: -drive file=TEST_DIR/t.qcow2,node-name=_foo -QEMU_PROG: -drive file=TEST_DIR/t.qcow2,node-name=_foo: Invalid node name +QEMU_PROG: -drive file=TEST_DIR/t.qcow2,node-name=_foo: Invalid node-name: '_foo' Testing: -drive file=TEST_DIR/t.qcow2,node-name=foo#12 -QEMU_PROG: -drive file=TEST_DIR/t.qcow2,node-name=foo#12: Invalid node name +QEMU_PROG: -drive file=TEST_DIR/t.qcow2,node-name=foo#12: Invalid node-name: 'foo#12' === Device without drive === -- 2.30.2
Re: of AVR target page size
* Peter Maydell (peter.mayd...@linaro.org) wrote: > On Thu, 18 Mar 2021 at 10:45, Dr. David Alan Gilbert > wrote: > > > > * Peter Maydell (peter.mayd...@linaro.org) wrote: > > > Also, what does the > > > /* 0x80 is reserved in migration.h start with 0x100 next */ > > > comment refer to? migration.h has no instances of "RAM_SAVE" > > > or 0x80 or 1 << 7... > > > > It looks like it got moved to qemu-file.h a few years ago > > as RAM_SAVE_FLAG_HOOK. > > Could we put the define of RAM_SAVE_FLAG_HOOK in migration.h > with all the other RAM_SAVE_FLAG defines ? It looks like the two > places that use it already include migration.h... Do you mean move the rest of the RAM_SAVE_FLAG_* from migration/ram.c into migration.h? We could do - although they're never used by anything else. Dave > thanks > -- PMM > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: of AVR target page size
* Michael Rolnik (mrol...@gmail.com) wrote: > how do I test my fix? Is there a procedure? As long as your TARGET_PAGE_SIZE is now 512 or bigger you should be OK as long as your AVR stuff still works. If you want you can try and do a live migrate between two copies of qemu, but that does assume you'vre wired up the vmstate info for all your devices. Dave > Thanks, > Michael Rolnik > > On Thu, Mar 18, 2021 at 12:45 PM Dr. David Alan Gilbert > wrote: > > > * Peter Maydell (peter.mayd...@linaro.org) wrote: > > > On Thu, 18 Mar 2021 at 10:25, Dr. David Alan Gilbert > > > wrote: > > > > Oh yes, just: > > > > > > > > diff --git a/migration/ram.c b/migration/ram.c > > > > index 52537f14ac..a7269955b5 100644 > > > > --- a/migration/ram.c > > > > +++ b/migration/ram.c > > > > @@ -81,6 +81,8 @@ > > > > /* 0x80 is reserved in migration.h start with 0x100 next */ > > > > #define RAM_SAVE_FLAG_COMPRESS_PAGE0x100 > > > > > > > > +#define RAM_SAVE_FLAG__MAX RAM_SAVE_FLAG_COMPRESS_PAGE > > > > + > > > > static inline bool is_zero_range(uint8_t *p, uint64_t size) > > > > { > > > > return buffer_is_zero(p, size); > > > > @@ -4090,5 +4092,6 @@ static SaveVMHandlers savevm_ram_handlers = { > > > > void ram_mig_init(void) > > > > { > > > > qemu_mutex_init(&XBZRLE.lock); > > > > +QEMU_BUILD_BUG_ON(RAM_SAVE_FLAG__MAX >= (1 << > > TARGET_PAGE_BITS_MIN)); > > > > register_savevm_live("ram", 0, 4, &savevm_ram_handlers, > > &ram_state); > > > > } > > > > > > > > > > > > works; lets keep that in mind somewhere after Michael fixes AVR. > > > > > > You don't have a great deal of headroom even after getting AVR > > > to change, by the way -- TARGET_PAGE_BITS_MIN for Arm is 10. > > > So you might want to think about ways to eg reclaim usage of > > > that "obsolete, not used" RAM_SAVE_FLAG_FULL bit. > > > > Yep, I've been warning anyone who adds one for ages > > > > > Also, what does the > > > /* 0x80 is reserved in migration.h start with 0x100 next */ > > > comment refer to? migration.h has no instances of "RAM_SAVE" > > > or 0x80 or 1 << 7... > > > > It looks like it got moved to qemu-file.h a few years ago > > as RAM_SAVE_FLAG_HOOK. > > > > Dave > > > > > thanks > > > -- PMM > > > > > -- > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK > > > > > > -- > Best Regards, > Michael Rolnik -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [PULL v4 00/42] Block layer patches and object-add QAPIfication
On Thu, 18 Mar 2021 at 09:48, Kevin Wolf wrote: > > The following changes since commit 571d413b5da6bc6f1c2aaca8484717642255ddb0: > > Merge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-20210316' > into staging (2021-03-17 21:02:37 +) > > are available in the Git repository at: > > git://repo.or.cz/qemu/kevin.git tags/for-upstream > > for you to fetch changes up to ef11a2d8972147994492c36cd3d26677e831e4d7: > > vl: allow passing JSON to -object (2021-03-18 10:45:01 +0100) > > > Block layer patches and object-add QAPIfication > > - QAPIfy object-add and --object > - stream: Fail gracefully if permission is denied > - storage-daemon: Fix crash on quit when job is still running > - curl: Fix use after free > - char: Deprecate backend aliases, fix QMP query-chardev-backends > - Fix image creation option defaults that exist in both the format and > the protocol layer (e.g. 'cluster_size' in qcow2 and rbd; the qcow2 > default was incorrectly applied to the rbd layer) > Auto-merging docs/system/removed-features.rst CONFLICT (content): Merge conflict in docs/system/removed-features.rst Auto-merging docs/system/deprecated.rst CONFLICT (content): Merge conflict in docs/system/deprecated.rst I'm afraid your pullreq was racing with another one deprecating features :-( thanks -- PMM
Re: [PULL v2 00/13] misc patches removing deprecated features
On Thu, 18 Mar 2021 at 09:30, Daniel P. Berrangé wrote: > > The following changes since commit 571d413b5da6bc6f1c2aaca8484717642255ddb0: > > Merge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-20210316' > into staging (2021-03-17 21:02:37 +) > > are available in the Git repository at: > > https://gitlab.com/berrange/qemu tags/dep-many-pull-request > > for you to fetch changes up to 8d17adf34f501ded65a106572740760f0a75577c: > > block: remove support for using "file" driver with block/char devices > (2021-03-18 09:22:55 +) > > > Remove many old deprecated features > > The following features have been deprecated for well over the 2 > release cycle we promise > > ``-drive file=json:{...{'driver':'file'}}`` (since 3.0) > ``-vnc acl`` (since 4.0.0) > ``-mon ...,control=readline,pretty=on|off`` (since 4.1) > ``migrate_set_downtime`` and ``migrate_set_speed`` (since 2.8.0) > ``query-named-block-nodes`` result ``encryption_key_missing`` (since 2.10.0) > ``query-block`` result ``inserted.encryption_key_missing`` (since 2.10.0) > ``migrate-set-cache-size`` and ``query-migrate-cache-size`` (since 2.11.0) > ``query-named-block-nodes`` and ``query-block`` result > dirty-bitmaps[i].status (since 4.0) > ``query-cpus`` (since 2.12.0) > ``query-cpus-fast`` ``arch`` output member (since 3.0.0) > ``query-events`` (since 4.0) > chardev client socket with ``wait`` option (since 4.0) > ``acl_show``, ``acl_reset``, ``acl_policy``, ``acl_add``, ``acl_remove`` > (since 4.0.0) > ``ide-drive`` (since 4.2) > ``scsi-disk`` (since 4.2) > > Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/6.0 for any user-visible changes. -- PMM
Re: Serious doubts about Gitlab CI
On 3/18/21 3:46 PM, Stefan Hajnoczi wrote: On Wed, Mar 17, 2021 at 09:29:32PM +0100, Philippe Mathieu-Daudé wrote: Now I'm having serious doubts about Gitlab usefulness for the QEMU community... The QEMU Project has 50,000 minutes of GitLab CI quota. Let's enable GitLab Merge Requests so that anyone can submit a merge request and get CI coverage. How does this workflow work? I push to my branch, I submit a MR, CI runs? I suppose there must be a way for me to disable a CI run on my branch if I intend to trigger it via a MR, to avoid eating minutes twice. --js I think we need to expect free tiers to be insufficient for full CI coverage. In the longer term we probably need to rely on dedicated runners, although I'm not sure how much of the issue is QEMU's huge CI pipeline versus the performance limitations of shared runners. Stefan
Re: How to create vhdx differencing disk using qemu-img
On 3/17/21 10:37 PM, qi zhou wrote: When I create vhdx differencing disk using qemu-img, It says qemu-img: xxx.vhd Backing file not supported for file format 'vhdx' The command I used is qemu-img create -f vhdx -b test.vhdx test-snapshot.vhdx Here is my questions 1. Is vhdx format [full] supported by qemu ? We support raw and qcow2 fully, read-write. Everything else is "read-only", though write support might work, we don't encourage its use in production environments. 2. If not, is there any easy way to implement differencing disk of vhdx in qemu-img ? the -b flag is generally for qcow2 files; I am not very familiar with VHDX but it appears as though we don't support it here. I imagine it's https://www.altaro.com/hyper-v/hyper-v-differencing-disks-explained/ ? I don't think we support those... ah, yeah, in block/vhdx.c: typedef enum VHDXImageType { VHDX_TYPE_DYNAMIC = 0, VHDX_TYPE_FIXED, VHDX_TYPE_DIFFERENCING, /* Currently unsupported */ } VHDXImageType; 3. Is there any other tools support vhdx on linux ? Not that I'm aware of, but I can't say I've looked before.
Re: Serious doubts about Gitlab CI
On Wed, Mar 17, 2021 at 09:29:32PM +0100, Philippe Mathieu-Daudé wrote: > Now I'm having serious doubts about Gitlab usefulness for the QEMU > community... The QEMU Project has 50,000 minutes of GitLab CI quota. Let's enable GitLab Merge Requests so that anyone can submit a merge request and get CI coverage. I think we need to expect free tiers to be insufficient for full CI coverage. In the longer term we probably need to rely on dedicated runners, although I'm not sure how much of the issue is QEMU's huge CI pipeline versus the performance limitations of shared runners. Stefan signature.asc Description: PGP signature
Re: iotest 051 failure on s390
On 3/18/21 12:32 PM, Christian Borntraeger wrote: On s390 with latest master I do get 051 fail [17:30:00] [17:30:05] 5.4s output mismatch (see 051.out.bad) --- /home/cborntra/REPOS/qemu/tests/qemu-iotests/051.out +++ 051.out.bad @@ -61,13 +61,13 @@ (qemu) quit Testing: -drive file=TEST_DIR/t.qcow2,node-name=123foo -QEMU_PROG: -drive file=TEST_DIR/t.qcow2,node-name=123foo: Invalid node name +QEMU_PROG: -drive file=TEST_DIR/t.qcow2,node-name=123foo: Invalid node-name: '123foo' Testing: -drive file=TEST_DIR/t.qcow2,node-name=_foo -QEMU_PROG: -drive file=TEST_DIR/t.qcow2,node-name=_foo: Invalid node name +QEMU_PROG: -drive file=TEST_DIR/t.qcow2,node-name=_foo: Invalid node-name: '_foo' Testing: -drive file=TEST_DIR/t.qcow2,node-name=foo#12 -QEMU_PROG: -drive file=TEST_DIR/t.qcow2,node-name=foo#12: Invalid node name +QEMU_PROG: -drive file=TEST_DIR/t.qcow2,node-name=foo#12: Invalid node-name: 'foo#12' Looks like we forgot to update 051.out in addition to 051.pc.out in 785ec4b1. Easy to fix. --js
Re: Microsoft and Intel NVDIMM ACPI _DSM interfaces status?
On Thu, Mar 18, 2021 at 02:00:29AM +, Dexuan Cui wrote: > > From: Laszlo Ersek > > Sent: Wednesday, March 17, 2021 3:45 PM > > > The specs for the Intel interface are available here: > > > ... > > > This is the interface that QEMU emulates. It has been reported that > > > Windows 2016 Server and 2019 Server guests do not recognize QEMU's > > > emulated NVDIMM devices using the Microsoft driver. > > I'm not sure why this happens -- sorry, I have no Windows knowledge. > > > > Should QEMU emulate both of them to make running Windows guests easy? > > I'm not sure about the background here, but since it looks like QEMU is > already > able to emulate the Intel NVDIMM, I suppose it should be quick to add the > emulation of the Hyper-V NVDIMM. I think they're pretty similar, and the > _DSM interface supported by Hyper-V NVDIMM is simple. > > > In my (uneducated) opinion: yes. Microsoft standarized their Region > > Format Interface, with their _DSM UUID and all; and right now, that spec > > seems to be the *only* officially approved format in the RFIC registry. > > So it's plausible to me that, unlike the Linux kernel, Microsoft's > > driver doesn't support the -- technically unapproved, nonstandard -- > > Intel Region Format Interface. > > > > Dexuan, please correct me if I'm wrong. > > > > Thanks, > > Laszlo > > Hi Laszlo, I'm not 100% sure, but I guess your may be correct. > > BTW, earlier in 2019, we made the below patches (which are in the mainline): > > 2019-02-28libnvdimm/btt: Fix LBA masking during 'free list' population > 2019-02-12acpi/nfit: Require opt-in for read-only label configurations > 2019-02-02libnvdimm/dimm: Add a no-BLK quirk based on NVDIMM family > 2019-01-29nfit: Add Hyper-V NVDIMM DSM command set to white list > 2019-01-29nfit: acpi_nfit_ctl(): Check out_obj->type in the right place > > The patches improve the interoperability between Windows VM and > Linux VM, e.g. the same Hyper-V NVDIMM device can work this way: > the Windows VM creates an NTFS partition based on the device, and > creates a text file in the partition, and later we shut down the Windows VM > and assign the device to Linux VM, and Linux VM is able to read the text file. > > Before the patches, IIRC, Linux VM could only use the Hyper-V NVIDMM > device in label-less mode. > > Let me share some old 2019 notes about Hyper-V NVDIMM, in case the > info may be helpful to you: > > " > In Linux VM, IMO the label-less mode is preferred for Hyper-V NVDIMM, > because Hyper-V does not support _LSW (I'm not sure about the latest > status), so Dan made the patch "acpi/nfit: Require opt-in for read-only > label configurations" to not use the Hyper-V label info, by default. > In label-less mode, when creating a namespace, Linux can set it to > one of the 4 namespace modes: fsdax, devdax, sector, and raw (these > namespace modes are Linux-specific and can not be recognized > by Windows.). > > With the "nfit.force_labels" bootup-time kernel parameter, Linux can > be forced to be in label mode, and then if Hyper-V initializes the 4KB > BTT Info Block(s) with the standard EFI_BTT_ABSTRACTION_GUID > (which is defined in UEFI 2.7 spec), we're supposed to support the > "interoperability" between Windows VM and Linux VM. > > Note: label-less mode is incompatible with label mode. A namespace > created in one mode can't be recognized when Linux runs in the other > mode. In label mode, so far, only 2 namespace modes (raw and sector) > can be supported by the Hyper-V NVDIMM, because Hyper-V doesn't > support _LSW, yet. If Hyper-V sets the EFI_BTT_ABSTRACTION_GUID, > the namespace is "BTT-capable" and can be in sector namespace > mode, otherwise it's in raw namespace mode. > > After a Windows VM initializes a BTT-capable namespace in a Hyper-V > NVDIMM device by creating a NTFS file system in the namespace, we > can re-assign the Hyper-V NVDIMM device to Linux VM, and in label > mode Linux VM is supposed to be able to read and write the files in > the NTFS file system. > " Thank you, Laszlo and Dexuan! I wonder if there are existing Windows drivers available that work with QEMU's NVDIMM device. Otherwise it may make sense to implement the Hyper-V interface. Stefan Stefan signature.asc Description: PGP signature
Re: Microsoft and Intel NVDIMM ACPI _DSM interfaces status?
On Wed, Mar 17, 2021 at 04:52:03PM -0700, Dan Williams wrote: > On Wed, Mar 17, 2021 at 4:49 AM Stefan Hajnoczi wrote: > > > > Hi, > > Microsoft and Intel developed two different ACPI NVDIMM _DSM interfaces. > > > > The specs for the Intel interface are available here: > > https://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf > > > > This is the interface that QEMU emulates. It has been reported that > > Windows 2016 Server and 2019 Server guests do not recognize QEMU's > > emulated NVDIMM devices using the Microsoft driver. > > > > I'd like to understand the path forward that will allow both Linux and > > Windows guests to successfully use QEMU's emulated NVDIMM device > > (https://gitlab.com/qemu-project/qemu/-/blob/master/hw/acpi/nvdimm.c). > > > > Are/have these two interfaces being/been unified? > > No, they service 2 different classes of NVDIMMs. The Microsoft > specification was defined for NVDIMM-N devices that are the > traditional battery/super-capacitor backed DDR with sometimes an equal > amount of flash to squirrel away data to non-volatile media on power > loss. The Intel one is for a persistent media class of device where > there is no need to communicate health attributes like "energy source > died" or "restore from flash" failed. > > > Should QEMU emulate both of them to make running Windows guests easy? > > Depends on how tolerant Windows is to different format-interface-code > implementations and what QEMU should return on each of the functions > it decides to implement. > > Note that QEMU only implements a small subset of the Intel commands, > i.e. just enough to provision namespaces in the NVDIMM label area. > "NVDIMM label area" is a concept that is newer than the NVDIMM-N > definition which is why you don't see labels mentioned in the > Microsoft specification. Since then ACPI has developed common label > area access methods, see "6.5.10 NVDIMM Label Methods" in the ACPI 6.4 > specification. > > Note that you'll also see "9.20.8 NVDIMM Device Methods" in that spec > for some health management commands that overlap what the Microsoft > and Intel specs communicate. Linux does not support them since any > platform that might support them will also support the Intel > specification so there's no driving need for Linux to migrate. I do > not know the status of that command support in Windows, or the HPE > command support in Windows for that matter. > > If you need to support guests that expect the Hyper-V command > definition, and will fail to attach NVDIMM support in the absence of > that definition then QEMU needs UUID_NFIT_DIMM_N_HYPERV support, note > that is a different command set than even UUID_NFIT_DIMM_N_MSFT > (include/acpi/acuuid.h). That would also require changes to virtual > ACPI NFIT to advertise the correlated format interface code. There may > be more... you would need someone from Hyper-V land to enumerate all > that is expected. > Thanks! Stefan signature.asc Description: PGP signature
Re: [PATCH for-6.0 v2 4/5] hw/core/loader: Add new function rom_ptr_for_as()
On Thu, 18 Mar 2021 at 18:44, Richard Henderson wrote: > > On 3/18/21 11:48 AM, Peter Maydell wrote: > > + > > +void *rom_ptr_for_as(AddressSpace *as, hwaddr addr, size_t size) > > +{ > > +/* > > + * Find any ROM data for the given guest address range. If there > > + * is a ROM blob then return a pointer to the host memory > > + * corresponding to 'addr'; otherwise return NULL. > > + * > > + * We look not only for ROM blobs that were loaded directly to > > + * addr, but also for ROM blobs that were loaded to aliases of > > + * that memory at other addresses within the AddressSpace. > > + * > > + * Note that we do not check @as against the 'as' member in the > > + * 'struct Rom' returned by rom_ptr(). The Rom::as is the > > + * AddressSpace which the rom blob should be written to, whereas > > + * our @as argument is the AddressSpace which we are (effectively) > > + * reading from, and the same underlying RAM will often be visible > > + * in multiple AddressSpaces. (A common example is a ROM blob > > + * written to the 'system' address space but then read back via a > > + * CPU's cpu->as pointer.) This does mean we might potentially > > + * return a false-positive match if a ROM blob was loaded into an > > + * AS which is entirely separate and distinct from the one we're > > + * querying, but this issue exists also for rom_ptr() and hasn't > > + * caused any problems in practice. > > + */ > > +FlatView *fv; > > +void *rom; > > +hwaddr len_unused; > > +FindRomCBData cbdata = {}; > > + > > +/* Easy case: there's data at the actual address */ > > +rom = rom_ptr(addr, size); > > +if (rom) { > > +return rom; > > +} > > Should you really have this special case? Nowhere is this going to verify > that > @addr is in @as. It's the "happens almost all the time" case. Nothing verifies that @addr is in @as anyway -- see the "Note that" part of the comment above. thanks -- PMM
Re: [PULL 00/38] tcg patch queue for 6.0
On Wed, 17 Mar 2021 at 15:34, Richard Henderson wrote: > > The following changes since commit 5d1428d6c43942cfb40a909e4c30a5cbb81bda8f: > > Merge remote-tracking branch > 'remotes/dgilbert-gitlab/tags/pull-virtiofs-20210315' into staging > (2021-03-17 09:07:28 +) > > are available in the Git repository at: > > https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20210317 > > for you to fetch changes up to 5e8892db93f3fb6a7221f2d47f3c952a7e489737: > > tcg: Fix prototypes for tcg_out_vec_op and tcg_out_op (2021-03-17 09:04:45 > -0600) > > > TCI argument extraction helpers and disassembler > TCG build fix for gcc 11 Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/6.0 for any user-visible changes. -- PMM
[PATCH 1/2] plugins: Update qemu-plugins.symbols to match qemu-plugins.h
Reorder the function symbols that consistence with qemu-plugins.h Signed-off-by: Yonggang Luo --- plugins/qemu-plugins.symbols | 25 - 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/plugins/qemu-plugins.symbols b/plugins/qemu-plugins.symbols index 4bdb381f48..a0ac1df62a 100644 --- a/plugins/qemu-plugins.symbols +++ b/plugins/qemu-plugins.symbols @@ -5,35 +5,34 @@ qemu_plugin_register_vcpu_exit_cb; qemu_plugin_register_vcpu_idle_cb; qemu_plugin_register_vcpu_resume_cb; - qemu_plugin_register_vcpu_insn_exec_cb; - qemu_plugin_register_vcpu_insn_exec_inline; - qemu_plugin_register_vcpu_mem_cb; - qemu_plugin_register_vcpu_mem_haddr_cb; - qemu_plugin_register_vcpu_mem_inline; - qemu_plugin_ram_addr_from_host; qemu_plugin_register_vcpu_tb_trans_cb; qemu_plugin_register_vcpu_tb_exec_cb; qemu_plugin_register_vcpu_tb_exec_inline; - qemu_plugin_register_flush_cb; - qemu_plugin_register_vcpu_syscall_cb; - qemu_plugin_register_vcpu_syscall_ret_cb; - qemu_plugin_register_atexit_cb; + qemu_plugin_register_vcpu_insn_exec_cb; + qemu_plugin_register_vcpu_insn_exec_inline; qemu_plugin_tb_n_insns; - qemu_plugin_tb_get_insn; qemu_plugin_tb_vaddr; + qemu_plugin_tb_get_insn; qemu_plugin_insn_data; qemu_plugin_insn_size; qemu_plugin_insn_vaddr; qemu_plugin_insn_haddr; - qemu_plugin_insn_disas; qemu_plugin_mem_size_shift; qemu_plugin_mem_is_sign_extended; qemu_plugin_mem_is_big_endian; qemu_plugin_mem_is_store; qemu_plugin_get_hwaddr; qemu_plugin_hwaddr_is_io; - qemu_plugin_hwaddr_to_raddr; + qemu_plugin_hwaddr_phys_addr; + qemu_plugin_hwaddr_device_name; + qemu_plugin_register_vcpu_mem_cb; + qemu_plugin_register_vcpu_mem_inline; + qemu_plugin_register_vcpu_syscall_cb; + qemu_plugin_register_vcpu_syscall_ret_cb; + qemu_plugin_insn_disas; qemu_plugin_vcpu_for_each; + qemu_plugin_register_flush_cb; + qemu_plugin_register_atexit_cb; qemu_plugin_n_vcpus; qemu_plugin_n_max_vcpus; qemu_plugin_outs; -- 2.29.2.windows.3
[PATCH 2/2] plugins: Move all typedef and type declaration to the front of the qemu-plugin.h
Signed-off-by: Yonggang Luo --- include/qemu/qemu-plugin.h | 187 ++--- 1 file changed, 92 insertions(+), 95 deletions(-) diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h index 97cdfd7761..2cb17f3051 100644 --- a/include/qemu/qemu-plugin.h +++ b/include/qemu/qemu-plugin.h @@ -81,27 +81,6 @@ typedef struct qemu_info_t { }; } qemu_info_t; -/** - * qemu_plugin_install() - Install a plugin - * @id: this plugin's opaque ID - * @info: a block describing some details about the guest - * @argc: number of arguments - * @argv: array of arguments (@argc elements) - * - * All plugins must export this symbol which is called when the plugin - * is first loaded. Calling qemu_plugin_uninstall() from this function - * is a bug. - * - * Note: @info is only live during the call. Copy any information we - * want to keep. @argv remains valid throughout the lifetime of the - * loaded plugin. - * - * Return: 0 on successful loading, !0 for an error. - */ -QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id, - const qemu_info_t *info, - int argc, char **argv); - /** * typedef qemu_plugin_simple_cb_t - simple callback * @id: the unique qemu_plugin_id_t @@ -135,6 +114,98 @@ typedef void (*qemu_plugin_vcpu_simple_cb_t)(qemu_plugin_id_t id, typedef void (*qemu_plugin_vcpu_udata_cb_t)(unsigned int vcpu_index, void *userdata); +/** struct qemu_plugin_tb - Opaque handle for a translation block */ +struct qemu_plugin_tb; +/** struct qemu_plugin_insn - Opaque handle for a translated instruction */ +struct qemu_plugin_insn; + +/** + * enum qemu_plugin_cb_flags - type of callback + * + * @QEMU_PLUGIN_CB_NO_REGS: callback does not access the CPU's regs + * @QEMU_PLUGIN_CB_R_REGS: callback reads the CPU's regs + * @QEMU_PLUGIN_CB_RW_REGS: callback reads and writes the CPU's regs + * + * Note: currently unused, plugins cannot read or change system + * register state. + */ +enum qemu_plugin_cb_flags { +QEMU_PLUGIN_CB_NO_REGS, +QEMU_PLUGIN_CB_R_REGS, +QEMU_PLUGIN_CB_RW_REGS, +}; + +enum qemu_plugin_mem_rw { +QEMU_PLUGIN_MEM_R = 1, +QEMU_PLUGIN_MEM_W, +QEMU_PLUGIN_MEM_RW, +}; + +/** + * typedef qemu_plugin_vcpu_tb_trans_cb_t - translation callback + * @id: unique plugin id + * @tb: opaque handle used for querying and instrumenting a block. + */ +typedef void (*qemu_plugin_vcpu_tb_trans_cb_t)(qemu_plugin_id_t id, + struct qemu_plugin_tb *tb); + +/** + * enum qemu_plugin_op - describes an inline op + * + * @QEMU_PLUGIN_INLINE_ADD_U64: add an immediate value uint64_t + * + * Note: currently only a single inline op is supported. + */ + +enum qemu_plugin_op { +QEMU_PLUGIN_INLINE_ADD_U64, +}; + +/** + * typedef qemu_plugin_meminfo_t - opaque memory transaction handle + * + * This can be further queried using the qemu_plugin_mem_* query + * functions. + */ +typedef uint32_t qemu_plugin_meminfo_t; +/** struct qemu_plugin_hwaddr - opaque hw address handle */ +struct qemu_plugin_hwaddr; + +typedef void +(*qemu_plugin_vcpu_mem_cb_t)(unsigned int vcpu_index, + qemu_plugin_meminfo_t info, uint64_t vaddr, + void *userdata); + +typedef void +(*qemu_plugin_vcpu_syscall_cb_t)(qemu_plugin_id_t id, unsigned int vcpu_index, + int64_t num, uint64_t a1, uint64_t a2, + uint64_t a3, uint64_t a4, uint64_t a5, + uint64_t a6, uint64_t a7, uint64_t a8); +typedef void +(*qemu_plugin_vcpu_syscall_ret_cb_t)(qemu_plugin_id_t id, unsigned int vcpu_idx, + int64_t num, int64_t ret); + +/** + * qemu_plugin_install() - Install a plugin + * @id: this plugin's opaque ID + * @info: a block describing some details about the guest + * @argc: number of arguments + * @argv: array of arguments (@argc elements) + * + * All plugins must export this symbol which is called when the plugin + * is first loaded. Calling qemu_plugin_uninstall() from this function + * is a bug. + * + * Note: @info is only live during the call. Copy any information we + * want to keep. @argv remains valid throughout the lifetime of the + * loaded plugin. + * + * Return: 0 on successful loading, !0 for an error. + */ +QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id, + const qemu_info_t *info, + int argc, char **argv); + /** * qemu_plugin_uninstall() - Uninstall a plugin * @id: this plugin's opaque ID @@ -205,41 +276,6 @@ void qemu_plugin_register_vcpu_idle_cb(qemu_plugin_id_t id, void qemu_plugin_register_vcpu_resume_cb(qemu_plugin_id_t id, qemu_plugin_vcpu_simple_cb_t cb); -/** struct qemu_plugin_tb - Opa
[PATCH 0/2] Fix qemu-plugins.symbols and improve qemu-plugins.h
Yonggang Luo (2): plugins: Update qemu-plugins.symbols to match qemu-plugins.h plugins: Move all typedef and type declaration to the front of the qemu-plugin.h include/qemu/qemu-plugin.h | 187 +-- plugins/qemu-plugins.symbols | 25 +++-- 2 files changed, 104 insertions(+), 108 deletions(-) -- 2.29.2.windows.3
Re: [PATCH for-6.0 v2 5/5] target/arm: Make M-profile VTOR loads on reset handle memory aliasing
On 3/18/21 11:48 AM, Peter Maydell wrote: For Arm M-profile CPUs, on reset the CPU must load its initial PC and SP from a vector table in guest memory. Because we can't guarantee reset ordering, we have to handle the possibility that the ROM blob loader's reset function has not yet run when the CPU resets, in which case the data in an ELF file specified by the user won't be in guest memory to be read yet. We work around the reset ordering problem by checking whether the ROM blob loader has any data for the address where the vector table is, using rom_ptr(). Unfortunately this does not handle the possibility of memory aliasing. For many M-profile boards, memory can be accessed via multiple possible physical addresses; if the board has the vector table at address X but the user's ELF file loads data via a different address Y which is an alias to the same underlying guest RAM then rom_ptr() will not find it. Use the new rom_ptr_for_as() function, which deals with memory aliasing when locating a relevant ROM blob. Signed-off-by: Peter Maydell --- target/arm/cpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Richard Henderson r~
Re: [PATCH for-6.0 v2 4/5] hw/core/loader: Add new function rom_ptr_for_as()
On 3/18/21 11:48 AM, Peter Maydell wrote: For accesses to rom blob data before or during reset, we have a function rom_ptr() which looks for a rom blob that would be loaded to the specified address, and returns a pointer into the rom blob data corresponding to that address. This allows board or CPU code to say "what is the data that is going to be loaded to this address?". However, this function does not take account of memory region aliases. If for instance a machine model has RAM at address 0x_ which is aliased to also appear at 0x1000_, a rom_ptr() query for address 0x_ will only return a match if the guest image provided by the user was loaded at 0x_ and not if it was loaded at 0x1000_, even though they are the same RAM and a run-time guest CPU read of 0x_ will read the data loaded to 0x1000_. Provide a new function rom_ptr_for_as() which takes an AddressSpace argument, so that it can check whether the MemoryRegion corresponding to the address is also mapped anywhere else in the AddressSpace and look for rom blobs that loaded to that alias. Signed-off-by: Peter Maydell --- include/hw/loader.h | 31 +++ hw/core/loader.c| 75 + 2 files changed, 106 insertions(+) diff --git a/include/hw/loader.h b/include/hw/loader.h index a9eeea39521..cbfc1848737 100644 --- a/include/hw/loader.h +++ b/include/hw/loader.h @@ -290,6 +290,37 @@ void rom_transaction_end(bool commit); int rom_copy(uint8_t *dest, hwaddr addr, size_t size); void *rom_ptr(hwaddr addr, size_t size); +/** + * rom_ptr_for_as: Return a pointer to ROM blob data for the address + * @as: AddressSpace to look for the ROM blob in + * @addr: Address within @as + * @size: size of data required in bytes + * + * Returns: pointer into the data which backs the matching ROM blob, + * or NULL if no blob covers the address range. + * + * This function looks for a ROM blob which covers the specified range + * of bytes of length @size starting at @addr within the address space + * @as. This is useful for code which runs as part of board + * initialization or CPU reset which wants to read data that is part + * of a user-supplied guest image or other guest memory contents, but + * which runs before the ROM loader's reset function has copied the + * blobs into guest memory. + * + * rom_ptr_for_as() will look not just for blobs loaded directly to + * the specified address, but also for blobs which were loaded to an + * alias of the region at a different location in the AddressSpace. + * In other words, if a machine model has RAM at address 0x_ + * which is aliased to also appear at 0x1000_, rom_ptr_for_as() + * will return the correct data whether the guest image was linked and + * loaded at 0x_ or 0x1000_. Contrast rom_ptr(), which + * will only return data if the image load address is an exact match + * with the queried address. + * + * New code should prefer to use rom_ptr_for_as() instead of + * rom_ptr(). + */ +void *rom_ptr_for_as(AddressSpace *as, hwaddr addr, size_t size); void hmp_info_roms(Monitor *mon, const QDict *qdict); #define rom_add_file_fixed(_f, _a, _i) \ diff --git a/hw/core/loader.c b/hw/core/loader.c index 9feca32de98..d3e5f3b423f 100644 --- a/hw/core/loader.c +++ b/hw/core/loader.c @@ -1383,6 +1383,81 @@ void *rom_ptr(hwaddr addr, size_t size) return rom->data + (addr - rom->addr); } +typedef struct FindRomCBData { +size_t size; /* Amount of data we want from ROM, in bytes */ +MemoryRegion *mr; /* MR at the unaliased guest addr */ +hwaddr xlat; /* Offset of addr within mr */ +void *rom; /* Output: rom data pointer, if found */ +} FindRomCBData; + +static bool find_rom_cb(Int128 start, Int128 len, const MemoryRegion *mr, +hwaddr offset_in_region, void *opaque) +{ +FindRomCBData *cbdata = opaque; +hwaddr alias_addr; + +if (mr != cbdata->mr) { +return false; +} + +alias_addr = int128_get64(start) + cbdata->xlat - offset_in_region; +cbdata->rom = rom_ptr(alias_addr, cbdata->size); +if (!cbdata->rom) { +return false; +} +/* Found a match, stop iterating */ +return true; +} + +void *rom_ptr_for_as(AddressSpace *as, hwaddr addr, size_t size) +{ +/* + * Find any ROM data for the given guest address range. If there + * is a ROM blob then return a pointer to the host memory + * corresponding to 'addr'; otherwise return NULL. + * + * We look not only for ROM blobs that were loaded directly to + * addr, but also for ROM blobs that were loaded to aliases of + * that memory at other addresses within the AddressSpace. + * + * Note that we do not check @as against the 'as' member in the + * 'struct Rom' returned by rom_ptr(). The Rom::as is the + * AddressSpace which the rom blob should be written to, whereas + * our @as argument is the
Re: [PATCH for-6.0 v2 2/5] memory: Document flatview_for_each_range()
On 3/18/21 11:48 AM, Peter Maydell wrote: Add a documentation comment describing flatview_for_each_range(). Signed-off-by: Peter Maydell --- include/exec/memory.h | 26 -- 1 file changed, 24 insertions(+), 2 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH for-6.0 v2 3/5] memory: Add offset_in_region to flatview_cb arguments
On 3/18/21 11:48 AM, Peter Maydell wrote: The function flatview_for_each_range() calls a callback for each range in a FlatView. Currently the callback gets the start and length of the range and the MemoryRegion involved, but not the offset within the MemoryRegion. Add this to the callback's arguments; we're going to want it for a new use in the next commit. Signed-off-by: Peter Maydell --- include/exec/memory.h | 2 ++ softmmu/memory.c| 4 +++- tests/qtest/fuzz/generic_fuzz.c | 5 - 3 files changed, 9 insertions(+), 2 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH for-6.0 v2 1/5] memory: Make flatview_cb return bool, not int
On 3/18/21 11:48 AM, Peter Maydell wrote: The return value of the flatview_cb callback passed to the flatview_for_each_range() function is zero if the iteration through the ranges should continue, or non-zero to break out of it. Use a bool for this rather than int. Signed-off-by: Peter Maydell --- include/exec/memory.h | 6 +++--- tests/qtest/fuzz/generic_fuzz.c | 8 2 files changed, 7 insertions(+), 7 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH v8 16/35] Hexagon (target/hexagon/conv_emu.[ch]) utility functions
On 3/18/21 12:03 PM, Taylor Simpson wrote: Here's an example from float_convs from single: f32(-0x1.31f75000p-40:0xab98fba8) Softfloat:to uint64: 0 (INEXACT ) Hexagon:to uint64: 0 (INVALID) Ahh, so an ieee conformance issue in hexagon -- failure to defer the sign check til after rounding. So, just looking at the float_convs tests the Hexagon version of f32->uint64 would be if (float32_is_neg(RsV) && !float32_is_any_nan(RsV)) { float_raise(float_flag_invalid, &env->fp_status); RddV = 0; } else { RddV = float32_to_uint64_round_to_zero(RsV, &env->fp_status); } Looks good. Just add a comment too. r~
Re: [PATCH v2 0/6] esp: fix asserts/segfaults discovered by fuzzer
On 18/03/21 00:02, Mark Cave-Ayland wrote: Recently there have been a number of issues raised on Launchpad as a result of fuzzing the am53c974 (ESP) device. I spent some time over the past couple of days checking to see if anything had improved since my last patchset: from what I can tell the issues are still present, but the cmdfifo related failures now assert rather than corrupting memory. This patchset applied to master passes my local tests using the qtest fuzz test cases added by Alexander for the following Launchpad bugs: https://bugs.launchpad.net/qemu/+bug/1919035 https://bugs.launchpad.net/qemu/+bug/1919036 https://bugs.launchpad.net/qemu/+bug/1910723 https://bugs.launchpad.net/qemu/+bug/1909247 I'm posting this now just before soft freeze since I see that some of the issues have recently been allocated CVEs and so it could be argued that even though they have existed for some time, it is worth fixing them for 6.0. Signed-off-by: Mark Cave-Ayland v2: - Add Alexander's R-B tag for patch 2 and Phil's R-B for patch 3 - Add patch 4 for additional testcase provided in Alexander's patch 1 comment - Move current_req NULL checks forward in DMA functions (fixes ASAN bug reported at https://bugs.launchpad.net/qemu/+bug/1909247/comments/6) in patch 3 - Add qtest for am53c974 containing a basic set of regression tests using the automatic test cases generated by the fuzzer as requested by Paolo Mark Cave-Ayland (6): esp: don't underflow cmdfifo if no message out/command data is present esp: don't overflow cmdfifo if TC is larger than the cmdfifo size esp: ensure cmdfifo is not empty and current_dev is non-NULL esp: don't underflow fifo when writing to the device esp: always check current_req is not NULL before use in DMA callbacks tests/qtest: add tests for am53c974 device hw/scsi/esp.c | 73 + tests/qtest/am53c974-test.c | 122 tests/qtest/meson.build | 1 + 3 files changed, 171 insertions(+), 25 deletions(-) create mode 100644 tests/qtest/am53c974-test.c Queued, thanks. Paolo
Re: [PATCH 2/3] migration: Inhibit virtio-balloon for the duration of background snapshot
On 18.03.21 18:46, Andrey Gruzdev wrote: The same thing as for incoming postcopy - we cannot deal with concurrent RAM discards when using background snapshot feature in outgoing migration. Signed-off-by: Andrey Gruzdev --- hw/virtio/virtio-balloon.c | 8 ++-- include/migration/misc.h | 2 ++ migration/migration.c | 8 3 files changed, 16 insertions(+), 2 deletions(-) diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index e770955176..d120bf8f43 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -66,8 +66,12 @@ static bool virtio_balloon_pbp_matches(PartiallyBalloonedPage *pbp, static bool virtio_balloon_inhibited(void) { -/* Postcopy cannot deal with concurrent discards, so it's special. */ -return ram_block_discard_is_disabled() || migration_in_incoming_postcopy(); +/* + * Postcopy cannot deal with concurrent discards, + * so it's special, as well as background snapshots. + */ +return ram_block_discard_is_disabled() || migration_in_incoming_postcopy() || +migration_in_bg_snapshot(); } static void balloon_inflate_page(VirtIOBalloon *balloon, diff --git a/include/migration/misc.h b/include/migration/misc.h index bccc1b6b44..738675ef52 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -70,6 +70,8 @@ bool migration_in_postcopy_after_devices(MigrationState *); void migration_global_dump(Monitor *mon); /* True if incomming migration entered POSTCOPY_INCOMING_DISCARD */ bool migration_in_incoming_postcopy(void); +/* True if background snapshot is active */ +bool migration_in_bg_snapshot(void); /* migration/block-dirty-bitmap.c */ void dirty_bitmap_mig_init(void); diff --git a/migration/migration.c b/migration/migration.c index 496cf6e17b..656d6249a6 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1976,6 +1976,14 @@ bool migration_in_incoming_postcopy(void) return ps >= POSTCOPY_INCOMING_DISCARD && ps < POSTCOPY_INCOMING_END; } +bool migration_in_bg_snapshot(void) +{ +MigrationState *s = migrate_get_current(); + +return migrate_background_snapshot() && +migration_is_setup_or_active(s->state); +} + bool migration_is_idle(void) { MigrationState *s = current_migration; Reviewed-by: David Hildenbrand -- Thanks, David / dhildenb
Re: KVM_MEM_READONLY slot flag not working properly
On 18/03/21 18:40, Lorenzo Susini wrote: Well I'm sorry but I didn't know IDT was marked as read only by Linux. If it is read only, how can you register any new interrupt handler? I guess it's a way of securing stuff against malicious attacks. I was taking for granted that the IDT was written when registering a new irq handler, given that when an interrupt is raised, the new specified handler has to be called and its address should be retrieved in some way, that is by storing it in the IDT. There's a list of handlers for each IDT entry. This is because the IDT entrypoint has to do more stuff before and after calling the function (and also it has to return with IRET instead of RET). So the IDT entry does not point directly to the function that you register. (Also some interrupts may be shared by multiple devices, in which case you can have more than one handler). I'm sorry, I'm a student and I'm trying to understand things, Thank you, Lorenzo No problem. :) Paolo
RE: [PATCH v8 16/35] Hexagon (target/hexagon/conv_emu.[ch]) utility functions
> -Original Message- > From: Richard Henderson > Sent: Thursday, March 18, 2021 10:36 AM > To: Taylor Simpson ; qemu-devel@nongnu.org > Cc: phi...@redhat.com; alex.ben...@linaro.org; laur...@vivier.eu; > a...@rev.ng; Brian Cain > Subject: Re: [PATCH v8 16/35] Hexagon (target/hexagon/conv_emu.[ch]) > utility functions > > On 3/18/21 8:11 AM, Taylor Simpson wrote: > > Actually, softfloat raises inexact instead of invalid. Is there a way to > override? > > Not true: > > switch (p.cls) { > case float_class_snan: > case float_class_qnan: > s->float_exception_flags = orig_flags | float_flag_invalid; > return max; Sorry, I was mistaken. The difference is in the negative case. } else if (float32_is_neg(RsV)) { float_raise(float_flag_invalid, &env->fp_status); RddV = 0; For many cases, this is what softfloat does. The difference comes when value is both negative and near zero (exponent is negative). Softfloat checks the exponent before checking the sign. round_to_int does this first if (a.exp < 0) { bool one; /* all fractional */ s->float_exception_flags |= float_flag_inexact; Then it sets a.cls = float_class_zero; So round_to_uint_and_pack does this case float_class_zero: return 0; Here's an example from float_convs from single: f32(-0x1.31f75000p-40:0xab98fba8) Softfloat:to uint64: 0 (INEXACT ) Hexagon:to uint64: 0 (INVALID) So, just looking at the float_convs tests the Hexagon version of f32->uint64 would be if (float32_is_neg(RsV) && !float32_is_any_nan(RsV)) { float_raise(float_flag_invalid, &env->fp_status); RddV = 0; } else { RddV = float32_to_uint64_round_to_zero(RsV, &env->fp_status); } Thanks, Taylor
Re: [PATCH] intc/i8259: avoid (false positive) gcc warning
On Thu, 18 Mar 2021, Philippe Mathieu-Daudé wrote: On 3/18/21 5:11 PM, Christian Borntraeger wrote: On 18.03.21 17:03, Paolo Bonzini wrote: On 18/03/21 16:47, Christian Borntraeger wrote: some copiler versions are smart enough to detect a potentially uninitialized variable, but are not smart enough to detect that this cannot happen due to the code flow: ../hw/intc/i8259.c: In function ‘pic_read_irq’: ../hw/intc/i8259.c:203:13: error: ‘irq2’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 203 | irq = irq2 + 8; | ^~ Let us initialize irq2 to -1 to avoid this warning as the most simple solution. What about: This also works, but if you want to go down that path then it would be good if you could do this patch as I do not have the testing infrastructure to do proper x86 changes. diff --git a/hw/intc/i8259.c b/hw/intc/i8259.c index 344fd04db1..bf28c179de 100644 --- a/hw/intc/i8259.c +++ b/hw/intc/i8259.c @@ -189,20 +189,18 @@ int pic_read_irq(DeviceState *d) irq2 = 7; } intno = slave_pic->irq_base + irq2; + irq = irq2 + 8; + pic_intack(s, 2); } else { intno = s->irq_base + irq; + pic_intack(s, irq); } - pic_intack(s, irq); } else { /* spurious IRQ on host controller */ irq = 7; intno = s->irq_base + irq; } - if (irq == 2) { - irq = irq2 + 8; - } - This looks like the patch I just sent :) Except this is simpler and more straightforward. I like this better FWIW. Regards, BALATON Zoltan
[PATCH for-6.0 v2 2/5] memory: Document flatview_for_each_range()
Add a documentation comment describing flatview_for_each_range(). Signed-off-by: Peter Maydell --- include/exec/memory.h | 26 -- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/include/exec/memory.h b/include/exec/memory.h index 22c10b8496a..71a1841943e 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -776,11 +776,33 @@ static inline FlatView *address_space_to_flatview(AddressSpace *as) return qatomic_rcu_read(&as->current_map); } +/** + * typedef flatview_cb: callback for flatview_for_each_range() + * + * @start: start address of the range within the FlatView + * @len: length of the range in bytes + * @mr: MemoryRegion covering this range + * @opaque: data pointer passed to flatview_for_each_range() + * + * Returns: true to stop the iteration, false to keep going. + */ typedef bool (*flatview_cb)(Int128 start, Int128 len, -const MemoryRegion*, void*); +const MemoryRegion *mr, +void *opaque); -void flatview_for_each_range(FlatView *fv, flatview_cb cb , void *opaque); +/** + * flatview_for_each_range: Iterate through a FlatView + * @fv: the FlatView to iterate through + * @cb: function to call for each range + * @opaque: opaque data pointer to pass to @cb + * + * A FlatView is made up of a list of non-overlapping ranges, each of + * which is a slice of a MemoryRegion. This function iterates through + * each range in @fv, calling @cb. The callback function can terminate + * iteration early by returning 'true'. + */ +void flatview_for_each_range(FlatView *fv, flatview_cb cb, void *opaque); /** * struct MemoryRegionSection: describes a fragment of a #MemoryRegion -- 2.20.1
[PATCH for-6.0 v2 5/5] target/arm: Make M-profile VTOR loads on reset handle memory aliasing
For Arm M-profile CPUs, on reset the CPU must load its initial PC and SP from a vector table in guest memory. Because we can't guarantee reset ordering, we have to handle the possibility that the ROM blob loader's reset function has not yet run when the CPU resets, in which case the data in an ELF file specified by the user won't be in guest memory to be read yet. We work around the reset ordering problem by checking whether the ROM blob loader has any data for the address where the vector table is, using rom_ptr(). Unfortunately this does not handle the possibility of memory aliasing. For many M-profile boards, memory can be accessed via multiple possible physical addresses; if the board has the vector table at address X but the user's ELF file loads data via a different address Y which is an alias to the same underlying guest RAM then rom_ptr() will not find it. Use the new rom_ptr_for_as() function, which deals with memory aliasing when locating a relevant ROM blob. Signed-off-by: Peter Maydell --- target/arm/cpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/arm/cpu.c b/target/arm/cpu.c index ae04884408c..0dd623e5909 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -331,7 +331,7 @@ static void arm_cpu_reset(DeviceState *dev) /* Load the initial SP and PC from offset 0 and 4 in the vector table */ vecbase = env->v7m.vecbase[env->v7m.secure]; -rom = rom_ptr(vecbase, 8); +rom = rom_ptr_for_as(s->as, vecbase, 8); if (rom) { /* Address zero is covered by ROM which hasn't yet been * copied into physical memory. -- 2.20.1
Re: [PATCH 3/3] i386: Make sure kvm_arch_set_tsc_khz() succeeds on migration when 'hv-reenlightenment' was exposed
On Thu, Mar 18, 2021 at 05:38:00PM +0100, Vitaly Kuznetsov wrote: > Paolo Bonzini writes: > > > On 18/03/21 17:02, Vitaly Kuznetsov wrote: > >> KVM doesn't fully support Hyper-V reenlightenment notifications on > >> migration. In particular, it doesn't support emulating TSC frequency > >> of the source host by trapping all TSC accesses so unless TSC scaling > >> is supported on the destination host and KVM_SET_TSC_KHZ succeeds, it > >> is unsafe to proceed with migration. > >> > >> Normally, we only require KVM_SET_TSC_KHZ to succeed when 'user_tsc_khz' > >> was set and just 'try' KVM_SET_TSC_KHZ without otherwise. > >> > >> Introduce a new vmstate section (which is added when the guest has > >> reenlightenment feature enabled) and add env.tsc_khz to it. We already > >> have env.tsc_khz packed in 'cpu/tsc_khz' but we don't want to be dependent > >> on the section order. > >> > >> Signed-off-by: Vitaly Kuznetsov > > > > Could we instead fail to load the reenlightenment section if > > user_tsc_khz was not set? This seems to be user (well, management) > > error really, since reenlightenment has to be enabled manually (or with > > hv-passthrough which blocks migration too). Seems to match the strategy of the patchset... > Yes, we certainly could do that but what's the added value of > user_tsc_khz which upper layer will have to set explicitly (probably to > the tsc frequency of the source host anyway)? Yes. I think what happened was "evolution": 1) Added support to set tsc frequency (with hardware multiplier) in KVM, so add -tsc-khz VAL (kHz) option to KVM. 2) Scaling is enabled only if -tsc-khz VAL is supplied. 3) libvirt switches to using -tsc-khz HVAL, where HVAL it retrieves from KVM_GET_TSC_KHZ of newly created KVM_CREATE_VM instance. It could have been done inside qemu instead. > In case we just want to avoid calling KVM_SET_TSC_KHZ twice, we can probably > achieve that by > adding a CPU flag or something. Avoid calling KVM_SET_TSC_KHZ twice ? Don't see why you would avoid that.
[PATCH for-6.0 v2 0/5] arm: Make M-profile VTOR loads on reset handle memory aliasin
For Arm M-profile CPUs, on reset the CPU must load its initial PC and SP from a vector table in guest memory. Because we can't guarantee reset ordering, we have to handle the possibility that the ROM blob loader's reset function has not yet run when the CPU resets, in which case the data in an ELF file specified by the user won't be in guest memory to be read yet. We work around the reset ordering problem by checking whether the ROM blob loader has any data for the address where the vector table is, using rom_ptr(). Unfortunately this does not handle the possibility of memory aliasing. For many M-profile boards, memory can be accessed via multiple possible physical addresses; if the board has the vector table at address X but the user's ELF file loads data via a different address Y which is an alias to the same underlying guest RAM then rom_ptr() will not find it. This series handles the possibility of aliasing by iterating through the whole FlatView of the CPU's address space checking for other mappings of the MemoryRegion corresponding to the location of the vector table. If we find any aliases we use rom_ptr() to see if the ROM blob loader has any data there. Changes from v1: * do a little bit more cleanup on flatview_for_each_range(): - switch return type to bool - document it * put the "rom_ptr() but handle aliases" functionality into a generally-available function rom_ptr_for_as() We discussed the idea of just making rom_ptr() itself handle the aliasing, but that would require looking at all the callsites to identify a good address space to use; it's also a bit more invasive to other platforms than I would like at this point in the release cycle. So I opted for "provide a new function" as a safer and simpler compromise. In many cases callers of rom_ptr() probably should be changed to use rom_ptr_for_as() at some point, though. I realised that although if we can get reset ordering sorted out we can remove this use of rom_ptr()/rom_ptr_from_as() from the Arm CPU reset function, we will still have the same "need to read the blob data directly" problem for board init functions which are the bulk of the callers of rom_ptr(). I suppose in theory we could rewrite those to postpone their accessing of the data until reset, but that sounds like it could get complicated. Anyway, that means that rom_ptr_for_as() might have a fairly long life. thanks -- PMM Peter Maydell (5): memory: Make flatview_cb return bool, not int memory: Document flatview_for_each_range() memory: Add offset_in_region to flatview_cb arguments hw/core/loader: Add new function rom_ptr_for_as() target/arm: Make M-profile VTOR loads on reset handle memory aliasing include/exec/memory.h | 32 -- include/hw/loader.h | 31 ++ hw/core/loader.c| 75 + softmmu/memory.c| 4 +- target/arm/cpu.c| 2 +- tests/qtest/fuzz/generic_fuzz.c | 11 +++-- 6 files changed, 145 insertions(+), 10 deletions(-) -- 2.20.1