date:20240308

[PATCH v5 6/7] migration/multifd: Enable multifd zero page checking by default.

2024-03-08 Thread Hao Xiang

From: Hao Xiang 

1. Set default "zero-page-detection" option to "multifd". Now
zero page checking can be done in the multifd threads and this
becomes the default configuration.
2. Handle migration QEMU9.0 -> QEMU8.2 compatibility. We provide
backward compatibility where zero page checking is done from the
migration main thread.

Signed-off-by: Hao Xiang 
---
 hw/core/machine.c   | 4 +++-
 migration/options.c | 2 +-
 qapi/migration.json | 6 +++---
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 9ac5d5389a..0e9d646b61 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -32,7 +32,9 @@
 #include "hw/virtio/virtio-net.h"
 #include "audio/audio.h"
 
-GlobalProperty hw_compat_8_2[] = {};
+GlobalProperty hw_compat_8_2[] = {
+{ "migration", "zero-page-detection", "legacy"},
+};
 const size_t hw_compat_8_2_len = G_N_ELEMENTS(hw_compat_8_2);
 
 GlobalProperty hw_compat_8_1[] = {
diff --git a/migration/options.c b/migration/options.c
index 8c849620dd..d61d31be24 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -181,7 +181,7 @@ Property migration_properties[] = {
   MIG_MODE_NORMAL),
 DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
parameters.zero_page_detection,
-   ZERO_PAGE_DETECTION_LEGACY),
+   ZERO_PAGE_DETECTION_MULTIFD),
 
 /* Migration capabilities */
 DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
diff --git a/qapi/migration.json b/qapi/migration.json
index 2684e4e9ac..aa1b39bce1 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -909,7 +909,7 @@
 #(Since 8.2)
 #
 # @zero-page-detection: Whether and how to detect zero pages.
-# See description in @ZeroPageDetection.  Default is 'legacy'.
+# See description in @ZeroPageDetection.  Default is 'multifd'.
 # (since 9.0)
 #
 # Features:
@@ -1106,7 +1106,7 @@
 #(Since 8.2)
 #
 # @zero-page-detection: Whether and how to detect zero pages.
-# See description in @ZeroPageDetection.  Default is 'legacy'.
+# See description in @ZeroPageDetection.  Default is 'multifd'.
 # (since 9.0)
 #
 # Features:
@@ -1339,7 +1339,7 @@
 #(Since 8.2)
 #
 # @zero-page-detection: Whether and how to detect zero pages.
-# See description in @ZeroPageDetection.  Default is 'legacy'.
+# See description in @ZeroPageDetection.  Default is 'multifd'.
 # (since 9.0)
 #
 # Features:
-- 
2.30.2

[PATCH v5 0/7] Introduce multifd zero page checking.

2024-03-08 Thread Hao Xiang

From: Hao Xiang 

v5 update:
* Move QEMU9.0 -> QEMU8.2 migration backward compatibility handling into
the patch where "multifd" zero page checking becomes the default option.
* Renamed a few functions according to feedback.
* Fix bug in multifd_send_zero_page_detect.
* Rebase on the new mapped-ram feature.
* Pulled in 2 commits from Fabiano.

v4 update:
* Fix documentation for interface ZeroPageDetection.
* Fix implementation in multifd_send_zero_page_check.
* Rebase on top of c0c6a0e3528b88aaad0b9d333e295707a195587b.

v3 update:
* Change "zero" to "zero-pages" and use type size for "zero-bytes".
* Fixed ZeroPageDetection interface description.
* Move zero page unit tests to its own path.
* Removed some asserts.
* Added backward compatibility support for migration 9.0 -> 8.2.
* Removed fields "zero" and "normal" page address arrays from v2. Now
multifd_zero_page_check_send sorts normal/zero pages in the "offset" array.

v2 update:
* Implement zero-page-detection switch with enumeration "legacy",
"none" and "multifd".
* Move normal/zero pages from MultiFDSendParams to MultiFDPages_t.
* Add zeros and zero_bytes accounting.

This patchset is based on Juan Quintela's old series here
https://lore.kernel.org/all/20220802063907.18882-1-quint...@redhat.com/

In the multifd live migration model, there is a single migration main
thread scanning the page map, queuing the pages to multiple multifd
sender threads. The migration main thread runs zero page checking on
every page before queuing the page to the sender threads. Zero page
checking is a CPU-intensive task, so having a single thread do all of
it doesn't scale well. This change introduces a new function
to run the zero page checking on the multifd sender threads. This
patchset also lays the groundwork for future changes that offload the
zero page checking task to accelerator hardware.
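
As a rough illustration of the per-thread work described above, the
following standalone sketch partitions an array of page offsets so that
non-zero pages come first and zero pages last. It is not the code from
this series: page_is_zero() is a hypothetical stand-in for QEMU's
buffer_is_zero(), and partition_zero_pages() and its parameters are
names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for buffer_is_zero(): a plain byte scan. */
static bool page_is_zero(const uint8_t *page, size_t page_size)
{
    for (size_t k = 0; k < page_size; k++) {
        if (page[k] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Partition offsets[] in place: offsets of non-zero ("normal") pages
 * end up on the left, offsets of zero pages on the right.
 * Returns the number of normal pages.
 */
static size_t partition_zero_pages(const uint8_t *host, uint64_t *offsets,
                                   size_t num, size_t page_size)
{
    size_t i = 0;      /* next entry to classify */
    size_t j = num;    /* one past the last unclassified entry */

    assert(num == 0 || (host != NULL && offsets != NULL));

    while (i < j) {
        if (!page_is_zero(host + offsets[i], page_size)) {
            i++;       /* normal page: already in the left part */
            continue;
        }

        /* Zero page: swap it into the tail and shrink the window. */
        j--;
        uint64_t tmp = offsets[i];
        offsets[i] = offsets[j];
        offsets[j] = tmp;
    }

    return i;
}
```

Each sender thread would run such a pass only over the pages queued to
it, which is why the cost spreads across the multifd channels.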

Testing was done on two Intel 4th generation Xeon servers.

Architecture:        x86_64
CPU(s):              192
Thread(s) per core:  2
Core(s) per socket:  48
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               143
Model name:          Intel(R) Xeon(R) Platinum 8457C
Stepping:            8
CPU MHz:             2538.624
CPU max MHz:         3800.
CPU min MHz:         800.

Multifd live migration was performed with the following setup:
1. VM has 100GB memory. All pages in the VM are zero pages.
2. Use tcp socket for live migration.
3. Use 4 multifd channels and zero page checking on migration main thread.
4. Use 1/2/4 multifd channels and zero page checking on multifd sender
threads.
5. Record migration total time from sender QEMU console's "info migrate"
command.

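+--------------------+----------------+
| zero-page-checking | total-time(ms) |
+--------------------+----------------+
| main-thread        | 9629           |
+--------------------+----------------+
| multifd-1-threads  | 6182           |
+--------------------+----------------+
| multifd-2-threads  | 4643           |
+--------------------+----------------+
| multifd-4-threads  | 4143           |
+--------------------+----------------+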

Apply this patchset on top of commit
cbccded4a2b5d685a426a437e25f67d3a375b292

Fabiano Rosas (2):
  migration/multifd: Allow zero pages in file migration
  migration/multifd: Allow clearing of the file_bmap from multifd

Hao Xiang (5):
  migration/multifd: Add new migration option zero-page-detection.
  migration/multifd: Implement zero page transmission on the multifd
thread.
  migration/multifd: Implement ram_save_target_page_multifd to handle
multifd version of MigrationOps::ram_save_target_page.
  migration/multifd: Enable multifd zero page checking by default.
  migration/multifd: Add new migration test cases for legacy zero page
checking.

 hw/core/machine.c   |  4 +-
 hw/core/qdev-properties-system.c| 10 
 include/hw/qdev-properties-system.h |  4 ++
 migration/file.c|  2 +-
 migration/meson.build   |  1 +
 migration/migration-hmp-cmds.c  |  9 +++
 migration/multifd-zero-page.c   | 87 +++
 migration/multifd-zlib.c| 21 +--
 migration/multifd-zstd.c| 20 +--
 migration/multifd.c | 92 -
 migration/multifd.h | 23 +++-
 migration/options.c | 21 +++
 migration/options.h |  1 +
 migration/ram.c | 47 +++
 migration/ram.h |  3 +-
 migration/trace-events  |  8 +--
 qapi/migration.json | 38 +++-
 tests/qtest/migration-test.c| 52 
 18 files changed, 395 insertions(+), 48 deletions(-)
 create mode 100644 migration/multifd-zero-page.c

-- 
2.30.2

[PATCH v5 2/7] migration/multifd: Allow clearing of the file_bmap from multifd

2024-03-08 Thread Hao Xiang

From: Fabiano Rosas 

We currently only need to clear the mapped-ram file bitmap from the
migration thread during save_zero_page.

We're about to add support for zero page detection on the multifd
thread, so allow ramblock_set_file_bmap_atomic() to also clear the
bits.
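
For readers unfamiliar with the idea, here is a minimal standalone
sketch of a bitmap helper that either sets or clears a bit atomically
depending on a flag. It uses C11 <stdatomic.h> rather than QEMU's
set_bit_atomic()/clear_bit_atomic(), and the names
bitmap_update_atomic() and bitmap_test() are invented for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set (set == true) or clear (set == false) bit nr. */
static void bitmap_update_atomic(_Atomic unsigned long *bmap, size_t nr,
                                 bool set)
{
    unsigned long mask = 1UL << (nr % BITS_PER_WORD);
    _Atomic unsigned long *word = &bmap[nr / BITS_PER_WORD];

    assert(bmap != NULL);

    if (set) {
        atomic_fetch_or(word, mask);
    } else {
        atomic_fetch_and(word, ~mask);
    }
}

/* Read back a single bit. */
static bool bitmap_test(_Atomic unsigned long *bmap, size_t nr)
{
    unsigned long word = atomic_load(&bmap[nr / BITS_PER_WORD]);

    return (word >> (nr % BITS_PER_WORD)) & 1UL;
}
```

A single helper with a flag keeps both paths on the same atomic
primitive family, so concurrent multifd threads touching neighbouring
bits in one word do not lose each other's updates.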

Signed-off-by: Fabiano Rosas 
---
 migration/multifd.c | 2 +-
 migration/ram.c | 8 ++--
 migration/ram.h | 3 ++-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index d4a44da559..6b8a78e4ca 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -115,7 +115,7 @@ static void multifd_set_file_bitmap(MultiFDSendParams *p)
 assert(pages->block);
 
 for (int i = 0; i < p->pages->num; i++) {
-ramblock_set_file_bmap_atomic(pages->block, pages->offset[i]);
+ramblock_set_file_bmap_atomic(pages->block, pages->offset[i], true);
 }
 }
 
diff --git a/migration/ram.c b/migration/ram.c
index 003c28e133..f4abc47bbf 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3150,9 +3150,13 @@ static void ram_save_file_bmap(QEMUFile *f)
 }
 }
 
-void ramblock_set_file_bmap_atomic(RAMBlock *block, ram_addr_t offset)
+void ramblock_set_file_bmap_atomic(RAMBlock *block, ram_addr_t offset, bool 
set)
 {
-set_bit_atomic(offset >> TARGET_PAGE_BITS, block->file_bmap);
+if (set) {
+set_bit_atomic(offset >> TARGET_PAGE_BITS, block->file_bmap);
+} else {
+clear_bit_atomic(offset >> TARGET_PAGE_BITS, block->file_bmap);
+}
 }
 
 /**
diff --git a/migration/ram.h b/migration/ram.h
index b9ac0da587..08feecaf51 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -75,7 +75,8 @@ bool ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, 
Error **errp);
 bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
 void postcopy_preempt_shutdown_file(MigrationState *s);
 void *postcopy_preempt_thread(void *opaque);
-void ramblock_set_file_bmap_atomic(RAMBlock *block, ram_addr_t offset);
+void ramblock_set_file_bmap_atomic(RAMBlock *block, ram_addr_t offset,
+   bool set);
 
 /* ram cache */
 int colo_init_ram_cache(void);
-- 
2.30.2

[PATCH v5 7/7] migration/multifd: Add new migration test cases for legacy zero page checking.

2024-03-08 Thread Hao Xiang

From: Hao Xiang 

Now that zero page checking is done on the multifd sender threads by
default, we still provide an option for backward compatibility. This
change adds a qtest migration test case to set the zero-page-detection
option to "legacy" and run multifd migration with zero page checking on the
migration main thread.

Signed-off-by: Hao Xiang 
Reviewed-by: Peter Xu 
Message-Id: <20240301022829.3390548-6-hao.xi...@bytedance.com>
---
 tests/qtest/migration-test.c | 52 
 1 file changed, 52 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 4023d808f9..71895abb7f 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2771,6 +2771,24 @@ test_migrate_precopy_tcp_multifd_start(QTestState *from,
 return test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
 }
 
+static void *
+test_migrate_precopy_tcp_multifd_start_zero_page_legacy(QTestState *from,
+QTestState *to)
+{
+test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
+migrate_set_parameter_str(from, "zero-page-detection", "legacy");
+return NULL;
+}
+
+static void *
+test_migration_precopy_tcp_multifd_start_no_zero_page(QTestState *from,
+  QTestState *to)
+{
+test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
+migrate_set_parameter_str(from, "zero-page-detection", "none");
+return NULL;
+}
+
 static void *
 test_migrate_precopy_tcp_multifd_zlib_start(QTestState *from,
 QTestState *to)
@@ -2812,6 +2830,36 @@ static void test_multifd_tcp_none(void)
 test_precopy_common();
 }
 
+static void test_multifd_tcp_zero_page_legacy(void)
+{
+MigrateCommon args = {
+.listen_uri = "defer",
+.start_hook = test_migrate_precopy_tcp_multifd_start_zero_page_legacy,
+/*
+ * Multifd is more complicated than most of the features, it
+ * directly takes guest page buffers when sending, make sure
+ * everything will work alright even if guest page is changing.
+ */
+.live = true,
+};
+test_precopy_common();
+}
+
+static void test_multifd_tcp_no_zero_page(void)
+{
+MigrateCommon args = {
+.listen_uri = "defer",
+.start_hook = test_migration_precopy_tcp_multifd_start_no_zero_page,
+/*
+ * Multifd is more complicated than most of the features, it
+ * directly takes guest page buffers when sending, make sure
+ * everything will work alright even if guest page is changing.
+ */
+.live = true,
+};
+test_precopy_common();
+}
+
 static void test_multifd_tcp_zlib(void)
 {
 MigrateCommon args = {
@@ -3729,6 +3777,10 @@ int main(int argc, char **argv)
 }
 migration_test_add("/migration/multifd/tcp/plain/none",
test_multifd_tcp_none);
+migration_test_add("/migration/multifd/tcp/plain/zero-page/legacy",
+   test_multifd_tcp_zero_page_legacy);
+migration_test_add("/migration/multifd/tcp/plain/zero-page/none",
+   test_multifd_tcp_no_zero_page);
 migration_test_add("/migration/multifd/tcp/plain/cancel",
test_multifd_tcp_cancel);
 migration_test_add("/migration/multifd/tcp/plain/zlib",
-- 
2.30.2

[PATCH v5 4/7] migration/multifd: Implement zero page transmission on the multifd thread.

2024-03-08 Thread Hao Xiang

From: Hao Xiang 

1. Add zero_pages field in MultiFDPacket_t.
2. Implement zero page detection and handling on the multifd
threads for non-compression, zlib and zstd compression backends.
3. Add a new value 'multifd' in ZeroPageDetection enumeration.
4. Add zero page counters and update multifd send/receive tracing
format to track the newly added counters.

Signed-off-by: Hao Xiang 
Acked-by: Markus Armbruster 
---
 hw/core/qdev-properties-system.c |  2 +-
 migration/meson.build|  1 +
 migration/multifd-zero-page.c| 87 ++
 migration/multifd-zlib.c | 21 ++--
 migration/multifd-zstd.c | 20 +--
 migration/multifd.c  | 90 +++-
 migration/multifd.h  | 23 +++-
 migration/ram.c  |  1 -
 migration/trace-events   |  8 +--
 qapi/migration.json  |  7 ++-
 10 files changed, 228 insertions(+), 32 deletions(-)
 create mode 100644 migration/multifd-zero-page.c

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 228e685f52..6e6f68ae1b 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -682,7 +682,7 @@ const PropertyInfo qdev_prop_mig_mode = {
 const PropertyInfo qdev_prop_zero_page_detection = {
 .name = "ZeroPageDetection",
 .description = "zero_page_detection values, "
-   "none,legacy",
+   "none,legacy,multifd",
 .enum_table = &ZeroPageDetection_lookup,
 .get = qdev_propinfo_get_enum,
 .set = qdev_propinfo_set_enum,
diff --git a/migration/meson.build b/migration/meson.build
index 92b1cc4297..1eeb915ff6 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -22,6 +22,7 @@ system_ss.add(files(
   'migration.c',
   'multifd.c',
   'multifd-zlib.c',
+  'multifd-zero-page.c',
   'ram-compress.c',
   'options.c',
   'postcopy-ram.c',
diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
new file mode 100644
index 0000000000..1ba38be636
--- /dev/null
+++ b/migration/multifd-zero-page.c
@@ -0,0 +1,87 @@
+/*
+ * Multifd zero page detection implementation.
+ *
+ * Copyright (c) 2024 Bytedance Inc
+ *
+ * Authors:
+ *  Hao Xiang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+#include "exec/ramblock.h"
+#include "migration.h"
+#include "multifd.h"
+#include "options.h"
+#include "ram.h"
+
+static bool multifd_zero_page_enabled(void)
+{
+return migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD;
+}
+
+static void swap_page_offset(ram_addr_t *pages_offset, int a, int b)
+{
+ram_addr_t temp;
+
+if (a == b) {
+return;
+}
+
+temp = pages_offset[a];
+pages_offset[a] = pages_offset[b];
+pages_offset[b] = temp;
+}
+
+/**
+ * multifd_send_zero_page_detect: Perform zero page detection on all pages.
+ *
+ * Sorts normal pages before zero pages in p->pages->offset and updates
+ * p->pages->normal_num.
+ *
+ * @param p A pointer to the send params.
+ */
+void multifd_send_zero_page_detect(MultiFDSendParams *p)
+{
+MultiFDPages_t *pages = p->pages;
+RAMBlock *rb = pages->block;
+int i = 0;
+int j = pages->num - 1;
+
+if (!multifd_zero_page_enabled()) {
+pages->normal_num = pages->num;
+return;
+}
+
+/*
+ * Sort the page offset array by moving all normal pages to
+ * the left and all zero pages to the right of the array.
+ */
+while (i <= j) {
+uint64_t offset = pages->offset[i];
+
+if (!buffer_is_zero(rb->host + offset, p->page_size)) {
+i++;
+continue;
+}
+
+swap_page_offset(pages->offset, i, j);
+ram_release_page(rb->idstr, offset);
+j--;
+}
+
+pages->normal_num = i;
+}
+
+void multifd_recv_zero_page_process(MultiFDRecvParams *p)
+{
+for (int i = 0; i < p->zero_num; i++) {
+void *page = p->host + p->zero[i];
+if (!buffer_is_zero(page, p->page_size)) {
+memset(page, 0, p->page_size);
+}
+}
+}
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 6120faad65..83c0374380 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -123,13 +123,15 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error 
**errp)
 int ret;
 uint32_t i;
 
-multifd_send_prepare_header(p);
+if (!multifd_send_prepare_common(p)) {
+goto out;
+}
 
-for (i = 0; i < pages->num; i++) {
+for (i = 0; i < pages->normal_num; i++) {
 uint32_t available = z->zbuff_len - out_size;
 int flush = Z_NO_FLUSH;
 
-if (i == pages->num - 1) {
+if (i == pages->normal_num - 1) {
 flush = Z_SYNC_FLUSH;
 }
 
@@ -172,10 +174,10 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error 
**errp)

[PATCH v5 5/7] migration/multifd: Implement ram_save_target_page_multifd to handle multifd version of MigrationOps::ram_save_target_page.

2024-03-08 Thread Hao Xiang

From: Hao Xiang 

1. Add a dedicated handler for MigrationOps::ram_save_target_page in
multifd live migration.
2. Refactor ram_save_target_page_legacy so that the legacy and multifd
handlers don't have internal functions calling into each other.

Signed-off-by: Hao Xiang 
Reviewed-by: Fabiano Rosas 
Message-Id: <20240226195654.934709-4-hao.xi...@bytedance.com>
---
 migration/ram.c | 42 +-
 1 file changed, 29 insertions(+), 13 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index d1f97cf862..887e20bf5b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1140,10 +1140,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus 
*pss,
 QEMUFile *file = pss->pss_channel;
 int len = 0;
 
-if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_NONE) {
-return 0;
-}
-
 if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
 return 0;
 }
@@ -2079,7 +2075,6 @@ static bool save_compress_page(RAMState *rs, 
PageSearchStatus *pss,
  */
 static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
 {
-RAMBlock *block = pss->block;
 ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
 int res;
 
@@ -2095,17 +2090,33 @@ static int ram_save_target_page_legacy(RAMState *rs, 
PageSearchStatus *pss)
 return 1;
 }
 
+return ram_save_page(rs, pss);
+}
+
+/**
+ * ram_save_target_page_multifd: send one target page to multifd workers
+ *
+ * Returns 1 if the page was queued, -1 otherwise.
+ *
+ * @rs: current RAM state
+ * @pss: data about the page we want to send
+ */
+static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
+{
+RAMBlock *block = pss->block;
+ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
+
 /*
- * Do not use multifd in postcopy as one whole host page should be
- * placed.  Meanwhile postcopy requires atomic update of pages, so even
- * if host page size == guest page size the dest guest during run may
- * still see partially copied pages which is data corruption.
+ * While using multifd live migration, we still need to handle zero
+ * page checking on the migration main thread.
  */
-if (migrate_multifd() && !migration_in_postcopy()) {
-return ram_save_multifd_page(block, offset);
+if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
+if (save_zero_page(rs, pss, offset)) {
+return 1;
+}
 }
 
-return ram_save_page(rs, pss);
+return ram_save_multifd_page(block, offset);
 }
 
 /* Should be called before sending a host page */
@@ -3113,7 +3124,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 }
 
 migration_ops = g_malloc0(sizeof(MigrationOps));
-migration_ops->ram_save_target_page = ram_save_target_page_legacy;
+
+if (migrate_multifd()) {
+migration_ops->ram_save_target_page = ram_save_target_page_multifd;
+} else {
+migration_ops->ram_save_target_page = ram_save_target_page_legacy;
+}
 
 bql_unlock();
 ret = multifd_send_sync_main();
-- 
2.30.2

[PATCH v5 1/7] migration/multifd: Allow zero pages in file migration

2024-03-08 Thread Hao Xiang

From: Fabiano Rosas 

Currently, it's an error to have no data pages in the multifd file
migration because zero page detection is done in the migration thread
and zero pages don't reach multifd. This is enforced with the
pages->num assert.

We're about to add zero page detection on the multifd thread. Fix the
file_write_ramblock_iov() to stop considering p->iovs_num=0 an error.
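
To show why the initial value of ret matters, here is a simplified,
hypothetical write loop (not the real file_write_ramblock_iov() body).
When niov is 0 the loop never runs, so the function returns whatever
ret was initialised to. The write_fn callback type and the
count_bytes() helper are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical writer callback: returns bytes written, or -1 on error. */
typedef ssize_t (*write_fn)(const void *buf, size_t len, void *opaque);

/* Example callback that only accumulates the number of bytes "written". */
static ssize_t count_bytes(const void *buf, size_t len, void *opaque)
{
    (void)buf;
    *(size_t *)opaque += len;
    return (ssize_t)len;
}

/* Write every iov in turn; returns 0 on success, negative on error. */
static ssize_t write_all_iovs(const struct iovec *iov, int niov,
                              write_fn do_write, void *opaque)
{
    /* Starting from 0 means "nothing to write" is not an error. */
    ssize_t ret = 0;

    assert(niov >= 0);

    for (int i = 0; i < niov; i++) {
        ret = do_write(iov[i].iov_base, iov[i].iov_len, opaque);
        if (ret < 0) {
            break;
        }
    }

    return ret < 0 ? ret : 0;
}
```

With ret initialised to -1 instead, a packet that contains only zero
pages (and hence no iovs) would be reported as a failure.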

Signed-off-by: Fabiano Rosas 
---
 migration/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/file.c b/migration/file.c
index 164b079966..5075f9526f 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -159,7 +159,7 @@ void file_start_incoming_migration(FileMigrationArgs 
*file_args, Error **errp)
 int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
 int niov, RAMBlock *block, Error **errp)
 {
-ssize_t ret = -1;
+ssize_t ret = 0;
 int i, slice_idx, slice_num;
 uintptr_t base, next, offset;
 size_t len;
-- 
2.30.2

Re: [PATCH v4 0/8] qtest: migration: Add tests for introducing 'channels' argument in migrate QAPIs

2024-03-08 Thread Het Gala

The GitLab pipeline before applying the patchset:
https://gitlab.com/galahet/Qemu/-/pipelines/1207185095

The GitLab pipeline after applying the patchset:
https://gitlab.com/galahet/Qemu/-/pipelines/1207183673


On 09/03/24 2:29 am, Het Gala wrote:

Recent migrate QAPI changes allow the 'channels' argument to be used
directly, which avoids redundant URI string parsing.

To ensure backward compatibility, both 'uri' and 'channels' are kept as
optional parameters in migration QMP commands. However, they are mutually
exclusive, and exactly one of them is required for a successful migration
connection. This patchset adds qtests to validate the mutually exclusive
behaviour of the 'uri' and 'channels' arguments.

Additionally, none of the existing migration qtests use 'channels' as the
primary method for validating migration QAPIs. This patchset also adds a
test that uses only the 'channels' argument as the initial entry point for
migration QAPIs.

Patch Summary:
--------------
Patch 1-2:
----------
Introduce the 'to' object inside migrate_qmp() and move the calls to
migrate_get_socket_address() inside migrate_qmp(). Also, replace connect_uri
with args->connect_uri everywhere.

Patch 3-6:
----------
Add the channels argument to migrate_qmp and migrate_qmp_fail so that either
migration QAPI argument can be passed independently. migrate_qmp requires the
port value to be changed from 0 to the port value coming from
migrate_get_socket_address. Add migrate_set_ports to address this change of
port value.

Patch 7-8:
----------
Add 2 negative tests to validate the mutually exclusive behaviour of migration
QAPIs. Add a positive multifd_tcp_plain qtest with only channels as the
initial entry point for migration QAPIs.

v3->v4 Changelog:

1. Introduced migrate_get_connect_uri and migrate_get_connect_qdict, which
both use migrate_get_socket_address to get the destination URI as a socket
address, which is then converted into a qdict with SocketAddress_to_qdict.
2. Misc code changes.

v2->v3 Changelog:
-----------------
1. 'channels' introduction is not required now for migrate_qmp_incoming
2. Refactor the code into 7 different patches
3. Remove custom function for converting string to MigrationChannelList
4. Move calls for migrate_get_socket_address inside migrate_qmp so that
migrate_set_ports can replace the QAPI's port with correct value.

Het Gala (8):
   Add 'to' object into migrate_qmp()
   Replace connect_uri and move migrate_get_socket_address inside
 migrate_qmp
   Replace migrate_get_connect_uri inplace of migrate_get_socket_address
   Add channels parameter in migrate_qmp_fail
   Add migrate_set_ports into migrate_qmp to update migration port value
   Add channels parameter in migrate_qmp
   Add multifd_tcp_plain test using list of channels instead of uri
   Add negative tests to validate migration QAPIs

  tests/qtest/migration-helpers.c | 158 +++-
  tests/qtest/migration-helpers.h |  10 +-
  tests/qtest/migration-test.c| 177 ++--
  3 files changed, 258 insertions(+), 87 deletions(-)

Re: [External] Re: [PATCH v4 6/7] migration/multifd: Add zero pages and zero bytes counter to migration status interface.

2024-03-08 Thread hao . xiang

> 
> On Thu, Feb 29, 2024 at 11:40 PM Markus Armbruster  wrote:
> 
> > 
> > Hao Xiang  writes:
> > 
> >  This change extends the MigrationStatus interface to track zero pages
> > 
> >  and zero bytes counter.
> > 
> >  Signed-off-by: Hao Xiang 
> > 
> >  [...]
> > 
> >  diff --git a/qapi/migration.json b/qapi/migration.json
> > 
> >  index ca9561fbf1..03b850bab7 100644
> > 
> >  --- a/qapi/migration.json
> > 
> >  +++ b/qapi/migration.json
> > 
> >  @@ -63,6 +63,10 @@
> > 
> >  # between 0 and @dirty-sync-count * @multifd-channels. (since
> > 
> >  # 7.1)
> > 
> >  #
> > 
> >  +# @zero-pages: number of zero pages (since 9.0)
> > 
> >  +#
> > 
> >  +# @zero-bytes: number of zero bytes sent (since 9.0)
> > 
> >  +#
> > 
> >  Discussion of v3 has led me to believe:
> > 
> >  1. A page is either migrated as a normal page or as a zero page.
> > 
> >  2. The following equations hold:
> > 
> >  @normal-bytes = @normal * @page-size
> > 
> >  @zero-bytes = @zero-pages * @page-size
> > 
> >  3. @zero-pages is the same as @duplicate, with a better name. We intend
> > 
> >  to drop @duplicate eventually.
> > 
> >  If this is correct, I'd like you to
> > 
> >  A. Name it @zero for consistency with @normal. Disregard my advice to
> > 
> >  name it @zero-pages; two consistent bad names are better than one bad
> > 
> >  name, one good name, and inconsistency.
> > 
> >  B. Add @zero and @zero-bytes next to @normal and @normal-bytes.
> > 
> >  C. Deprecate @duplicate (item 3). Separate patch, please.
> > 
> >  D. Consider documenting more clearly what normal and zero pages are
> > 
> >  (item 1), and how @FOO, @FOO-pages and @page-size are related (item
> > 
> >  2). Could be done in a followup patch.

I will move this out of the current patchset and put it into a separate
patchset. I don't fully understand the exact process of deprecating an
interface, so I will probably need your help over a few more versions.
I read in an earlier conversation that the soft freeze for QEMU 9.0 is
3/12, so hopefully the rest of this patchset can catch it.

> > 
> >  # Features:
> > 
> >  #
> > 
> >  # @deprecated: Member @skipped is always zero since 1.5.3
> > 
> >  @@ -81,7 +85,8 @@
> > 
> >  'multifd-bytes': 'uint64', 'pages-per-second': 'uint64',
> > 
> >  'precopy-bytes': 'uint64', 'downtime-bytes': 'uint64',
> > 
> >  'postcopy-bytes': 'uint64',
> > 
> >  - 'dirty-sync-missed-zero-copy': 'uint64' } }
> > 
> >  + 'dirty-sync-missed-zero-copy': 'uint64',
> > 
> >  + 'zero-pages': 'int', 'zero-bytes': 'size' } }
> > 
> >  [...]
> >
>

Re: [PATCH v2 07/20] smbios: avoid mangling user provided tables

2024-03-08 Thread Ani Sinha




> On 08-Mar-2024, at 22:49, Igor Mammedov  wrote:
> 
> On Thu, 7 Mar 2024 09:33:17 +0530
> Ani Sinha  wrote:
> 
>>> On 06-Mar-2024, at 12:11, Ani Sinha  wrote:
>>> 
>>> 
>>> 
>>> On Tue, 5 Mar 2024, Igor Mammedov wrote:
>>> 
>>>> currently smbios_entry_add() preserves internally '-smbios type='
>>>> options but tables provided with '-smbios file=' are stored directly
>>>> into blob that eventually will be exposed to VM. And then later
>>>> QEMU adds default/'-smbios type' entries on top into the same blob.
>>>>
>>>> It makes impossible to generate tables more than once, hence
>>>> 'immutable' guard was used.
>>>> Make it possible to regenerate final blob by storing user provided
>>>> blobs into a dedicated area (usr_blobs) and then copy it when
>>>> composing final blob. Which also makes handling of -smbios
>>>> options consistent.
>>>>
>>>> As side effect of this and previous commits there is no need to
>>>> generate legacy smbios_entries at the time options are parsed.
>>>> Instead compose smbios_entries on demand from  usr_blobs like
>>>> it is done for non-legacy SMBIOS tables.
>>>>
>>>> Signed-off-by: Igor Mammedov
>>>> Tested-by: Fiona Ebner
>>> 
>>> Reviewed-by: Ani Sinha 
>>> 
>>>> ---
>>>> hw/smbios/smbios.c | 179 +++--
>>>> 1 file changed, 92 insertions(+), 87 deletions(-)
>>>>
>>>> diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
>>>> index c46fc93357..aa2cc5bdbd 100644
>>>> --- a/hw/smbios/smbios.c
>>>> +++ b/hw/smbios/smbios.c
>>>> @@ -57,6 +57,14 @@ static size_t smbios_entries_len;
>>>> static bool smbios_uuid_encoded = true;
>>>> /* end: legacy structures & constants for <= 2.0 machines */
>>>>
>>>> +/*
>>>> + * SMBIOS tables provided by user with '-smbios file=' option
>>>> + */
>>>> +uint8_t *usr_blobs;
>>>> +size_t usr_blobs_len;
>>>> +static GArray *usr_blobs_sizes;
>>>> +static unsigned usr_table_max;
>>>> +static unsigned usr_table_cnt;
>>>>
>>>> uint8_t *smbios_tables;
>>>> size_t smbios_tables_len;
>>>> @@ -67,7 +75,6 @@ static SmbiosEntryPointType smbios_ep_type =
>>>> SMBIOS_ENTRY_POINT_TYPE_32;
>>>> static SmbiosEntryPoint ep;
>>>>
>>>> static int smbios_type4_count = 0;
>>>> -static bool smbios_immutable;
>>>> static bool smbios_have_defaults;
>>>> static uint32_t smbios_cpuid_version, smbios_cpuid_features;
>>>>
>>>> @@ -569,9 +576,8 @@ static void smbios_build_type_1_fields(void)
>>>>
>>>> uint8_t *smbios_get_table_legacy(uint32_t expected_t4_count, size_t
>>>> *length)
>>>> {
>>>> -/* drop unwanted version of command-line file blob(s) */
>>>> -g_free(smbios_tables);
>>>> -smbios_tables = NULL;
>>>> +int i;
>>>> +size_t usr_offset;
>>>>
>>>> /* also complain if fields were given for types > 1 */
>>>> if (find_next_bit(have_fields_bitmap,
>>>> @@ -581,12 +587,33 @@ uint8_t *smbios_get_table_legacy(uint32_t
>>>> expected_t4_count, size_t *length)
>>>> exit(1);
>>>> }
>>>>
>>>> -if (!smbios_immutable) {
>>>> -smbios_build_type_0_fields();
>>>> -smbios_build_type_1_fields();
>>>> -smbios_validate_table(expected_t4_count);
>>>> -smbios_immutable = true;
>>>> +g_free(smbios_entries);
>>>> +smbios_entries_len = sizeof(uint16_t);
>>>> +smbios_entries = g_malloc0(smbios_entries_len);
>>>> +
>>>> +for (i = 0, usr_offset = 0; usr_blobs_sizes && i <
>>>> usr_blobs_sizes->len;
>>>> + i++)
>>>> +{
>>>> +struct smbios_table *table;
>>>> +struct smbios_structure_header *header;
>>>> +size_t size = g_array_index(usr_blobs_sizes, size_t, i);
>>>> +
>>>> +header = (struct smbios_structure_header *)(usr_blobs +
>>>> usr_offset);
>>>> +smbios_entries = g_realloc(smbios_entries, smbios_entries_len +
>>>> +   size + sizeof(*table));
>>>> +table = (struct smbios_table *)(smbios_entries +
>>>> smbios_entries_len);
>>>> +table->header.type = SMBIOS_TABLE_ENTRY;
>>>> +table->header.length = cpu_to_le16(sizeof(*table) + size);
>>>> +memcpy(table->data, header, size);
>>>> +smbios_entries_len += sizeof(*table) + size;
>>>> +(*(uint16_t *)smbios_entries) =
>>>> +cpu_to_le16(le16_to_cpu(*(uint16_t *)smbios_entries) + 1);
>>> 
>>> I know this comes from existing code but can you please explain why we add
>>> 1 to it? This is confusing and a comment here would be nice.
>>> 
>>>> +usr_offset += size;
>>> 
>>> It would be better if we could add a comment here describing a bit what
>>> this is all about.
>>> 
>>> user blobs are an array of smbios_structure_header entries whereas legacy
>>> tables are an array of smbios_table structures where
>>> smbios_table->data represents a single user provided table blob in
>>> smbios_structure_header.  
>> 
>> Igor, are you going to send a v3 for this with the comments added?
> 
> I can add comments as a patch on top of series,
> though I'd rather

Re: Unmapping KVM Guest Memory from Host Kernel

2024-03-08 Thread Matthew Wilcox

On Fri, Mar 08, 2024 at 03:50:05PM +, Gowans, James wrote:
> Currently when using anonymous memory for KVM guest RAM, the memory all
> remains mapped into the kernel direct map. We are looking at options to
> get KVM guest memory out of the kernel’s direct map as a principled
> approach to mitigating speculative execution issues in the host kernel.
> Our goal is to more completely address the class of issues whose leak
> origin is categorized as "Mapped memory" [1].

One of the things that is holding Linux back is the inability to do I/O
to memory which is not part of memmap.  _So Much_ of our infrastructure
is based on having a struct page available to stick into an sglist, bio,
skb_frag, or whatever.  The solution to this is to move to a (phys_addr,
length) tuple instead of (page, offset, len) tuple.  I call this "phyr"
and I've written about it before.  I'm not working on this as I have
quite enough to do with the folio work, but I hope somebody works on it
before I get time to.
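
To make the difference between the two descriptors concrete, here is a minimal user-space sketch. It is not kernel code: `struct page_ref`, `struct phyr`, `phyr_from_page` and the 4 KiB page size are assumptions made up for illustration, and the real "phyr" design may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Today's style: the range is named relative to a page (hypothetical). */
struct page_ref {
    uint64_t pfn;    /* stands in for a struct page pointer */
    uint32_t offset; /* byte offset into that page */
    uint32_t len;    /* length in bytes */
};

/* Proposed style: a bare physical range, no struct page required. */
struct phyr {
    uint64_t addr;   /* physical address of the first byte */
    size_t len;      /* length in bytes */
};

/* Convert a page-relative description into a physical range. */
static struct phyr phyr_from_page(struct page_ref p)
{
    struct phyr r;

    assert(p.offset < (1u << SKETCH_PAGE_SHIFT));
    r.addr = (p.pfn << SKETCH_PAGE_SHIFT) + p.offset;
    r.len = p.len;
    return r;
}
```

The conversion only goes one way for free: memory without a memmap entry has a physical address but no page to point at, which is why the second form can describe it and the first cannot.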

Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents

2024-03-08 Thread fan

On Thu, Mar 07, 2024 at 12:45:55PM +, Jonathan Cameron wrote:
> ...
> 
> > > > +list = records;
> > > > +extents = g_new0(CXLDCExtentRaw, num_extents);
> > > > +while (list) {
> > > > +CXLDCExtent *ent;
> > > > +bool skip_extent = false;
> > > > +
> > > > +offset = list->value->offset;
> > > > +len = list->value->len;
> > > > +
> > > > +extents[i].start_dpa = offset + dcd->dc.regions[rid].base;
> > > > +extents[i].len = len;
> > > > +memset(extents[i].tag, 0, 0x10);
> > > > +extents[i].shared_seq = 0;
> > > > +
> > > > +if (type == DC_EVENT_RELEASE_CAPACITY ||
> > > > +type == DC_EVENT_FORCED_RELEASE_CAPACITY) {
> > > > +/*
> > > > + *  if the extent is still pending to be added to the 
> > > > host,  
> > > 
> > > Odd spacing.
> > >   
> > > > + * remove it from the pending extent list, so later when 
> > > > the add
> > > > + * response for the extent arrives, the device can reject 
> > > > the
> > > > + * extent as it is not in the pending list.
> > > > + */
> > > > +ent = cxl_dc_extent_exists(&dcd->dc.extents_pending_to_add,
> > > > +&extents[i]);
> > > > +if (ent) {
> > > > +QTAILQ_REMOVE(&dcd->dc.extents_pending_to_add, ent, node);
> > > > +g_free(ent);
> > > > +skip_extent = true;
> > > > +} else if (!cxl_dc_extent_exists(&dcd->dc.extents, &extents[i])) {
> > > > +/* If the exact extent is not in the accepted list, 
> > > > skip */
> > > > +skip_extent = true;
> > > > +}  
> > > I think we need to reject case of some extents skipped and others not.
> > > That's not supported yet so we need to complain if we get it at least. 
> > > Maybe we need
> > > to do two passes so we know this has happened early (or perhaps this is a 
> > > later
> > > patch in which case a todo here would help).  
> > 
> > Skip here does not mean the extent is invalid, it just means the extent
> > is still pending to add, so remove them from pending list would be
> > enough to reject the extent, no need to release further. That is based
> > on your feedback on v4.
> 
> Ah. I'd misunderstood.

Hi Jonathan,

I think we should not allow releasing extents that are still pending to
be added.
If we allow it, there is a case that will not work.
Consider the following sequence (in time order):
1. Send a request to add extent A to the host; (A --> pending list)
2. Send a request to release A from the host; (Delete A from the pending
list, hoping the following add response for A will fail as there is no
matching extent in the pending list).
3. The host sends a response to the device for the add request; however,
for some reason, it does not accept any of it, so the updated list is
empty, which the spec allows. Based on the spec, we need to drop the
extent at the head of the event log. Now we have a problem. Since extent A
has already been dropped from the list, either we cannot drop anything
because the list is empty, which is not the worst case, or, if there are
more extents in the list, we may drop the one following A, which belongs
to another request. If this happens, all the following extents will be
acked incorrectly, as the order has been shifted.
 
Does the above reasoning make sense to you?
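
The time order above can be played through with a small stand-alone simulation. This is only a sketch: `struct pending_list` and the three helpers are hypothetical stand-ins, not the QEMU CXL code, and it assumes the device drops the head of the pending list when the host accepts nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 8

/* Hypothetical stand-in for the device's pending-to-add list (FIFO). */
struct pending_list {
    char ids[MAX_PENDING];
    size_t count;
};

/* Step 1: an add request queues the extent at the tail. */
static void pending_push(struct pending_list *l, char id)
{
    assert(l->count < MAX_PENDING);
    l->ids[l->count++] = id;
}

/* Step 2: releasing a still-pending extent deletes it. Returns 1 if found. */
static int pending_remove(struct pending_list *l, char id)
{
    for (size_t i = 0; i < l->count; i++) {
        if (l->ids[i] == id) {
            memmove(&l->ids[i], &l->ids[i + 1], l->count - i - 1);
            l->count--;
            return 1;
        }
    }
    return 0;
}

/*
 * Step 3: the host accepts nothing, so the device drops the head.
 * Returns the dropped id, or 0 if the list was already empty.
 */
static char pending_drop_head(struct pending_list *l)
{
    char id;

    if (l->count == 0) {
        return 0;
    }
    id = l->ids[0];
    memmove(&l->ids[0], &l->ids[1], l->count - 1);
    l->count--;
    return id;
}
```

With A and B queued, removing A and then handling the host's empty response for A drops B, the extent of an unrelated request. With only A queued, the same sequence finds an empty list.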

Fan

> 
> > 
> > The loop here is only to collect the extents to send to the event log. 
> > But as you said, we need one pass before updating pending list.
> > Actually if we do not allow the above case where extents to release is
> > still in the pending to add list, we can just return here with error, no
> > extra dry run needed. 
> > 
> > What do you think?
> 
> I think we need a way to back out extents from the pending to add list
> so we can create the race where they are offered to the OS and it takes
> forever to accept and by the time it does we've removed them.
> 
> > 
> > >   
> > > > +
> > > > +
> > > > +/* No duplicate or overlapped extents are allowed */
> > > > +if (test_any_bits_set(blk_bitmap, offset / block_size,
> > > > +  len / block_size)) {
> > > > +error_setg(errp, "duplicate or overlapped extents are 
> > > > detected");
> > > > +return;
> > > > +}
> > > > +bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> > > > +
> > > > +list = list->next;
> > > > +if (!skip_extent) {
> > > > +i++;  
> > > Problem is if we skip one in the middle the records will be wrong below.  
> > 
> > Why? Only extents that passed the check will be stored in the variable
> > extents and processed further, and only then is i updated. 
> > For skipped ones, since i is not updated, they will be
> > overwritten by following valid ones.
> Ah. I'd missed the fact you store into the extent without a check on validity
> but only move the index on if they were valid. Then rely on not passing a 
> trailing
>

Re: [PATCH] disas/riscv: Further correction to LUI disassembly

2024-03-08 Thread Richard Bagley

post-nack, one further comment:

One could argue that this change also aligns QEMU with supporting tools (as
Andrew observed), and that it makes sense to merge this change into QEMU
until those tools are updated to support signed decimal immediates.

As it is, both the GNU assembler and the LLVM integrated assembler (or
llvm-mc) throw an error with examples such as
auipc s0, -17

On the other hand, I have only seen this problem with the output of the
COLLECT plug-in, not (as yet) with QEMU execution proper.
If the problem is confined to COLLECT, perhaps the argument for aligning
with other tools is not as strong.

In the meantime, I have adjusted my change locally to include AUIPC, and
written a substantive and, I hope, clear commit description.
If you would like me to resubmit a patch with this updated change, please
let me know.
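
For reference, the two ways of printing the upper immediate can be compared in a stand-alone sketch. The helper names are made up for illustration and this is not the code in disas/riscv.c. It assumes `imm` holds the decoded value with the 20-bit field in bits 31..12, and it avoids right-shifting a negative signed value, whose result is implementation-defined in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Unsigned view of the 20-bit U-type field: range 0..1048575. */
static uint32_t upper_imm_field(int32_t imm)
{
    return ((uint32_t)imm >> 12) & 0xfffffu;
}

/* Signed view of the same field: range -524288..524287. */
static int32_t upper_imm_sext(int32_t imm)
{
    uint32_t field = upper_imm_field(imm);

    /* Standard sign extension from bit 19. */
    return (int32_t)(field ^ 0x80000u) - 0x80000;
}

/* Format the operand the way the masked variant of the patch would. */
static int format_upper_imm(char *buf, size_t buflen, int32_t imm)
{
    return snprintf(buf, buflen, "%u", (unsigned int)upper_imm_field(imm));
}
```

For a decoded value of -4096 the unsigned view prints 1048575, which the assemblers mentioned above accept, while the signed view is -1, which they currently reject.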



On Thu, Mar 7, 2024 at 4:08 PM Richard Bagley 
wrote:

> NACK
>
> We have established that the change is a workaround for a bug in
> the assembler.
> I withdraw the merge request.
>
> Thank you for this careful review.
>
> On Fri, Aug 11, 2023 at 4:55 AM Andrew Jones 
> wrote:
>
>> On Fri, Aug 11, 2023 at 10:25:52AM +0200, Andrew Jones wrote:
>> > On Thu, Aug 10, 2023 at 06:27:50PM +0200, Andrew Jones wrote:
>> > > On Thu, Aug 10, 2023 at 09:12:42AM -0700, Palmer Dabbelt wrote:
>> > > > On Thu, 10 Aug 2023 08:31:46 PDT (-0700), ajo...@ventanamicro.com
>> wrote:
>> > > > > On Mon, Jul 31, 2023 at 11:33:20AM -0700, Richard Bagley wrote:
>> > > > > > The recent commit 36df75a0a9 corrected one aspect of LUI
>> disassembly
>> > > > > > by recovering the immediate argument from the result of LUI
>> with a
>> > > > > > shift right by 12. However, the shift right will left-fill with
>> the
>> > > > > > sign. By applying a mask we recover an unsigned representation
>> of the
>> > > > > > 20-bit field (which includes a sign bit).
>> > > > > >
>> > > > > > Example:
>> > > > > > 0xfffff000 >> 12 = 0xffffffff
>> > > > > > 0xfffff000 >> 12 & 0xfffff = 0x000fffff
>> > > > > >
>> > > > > > Fixes: 36df75a0a9 ("riscv/disas: Fix disas output of upper
>> immediates")
>> > > > > > Signed-off-by: Richard Bagley 
>> > > > > > ---
>> > > > > >  disas/riscv.c | 9 ++---
>> > > > > >  1 file changed, 6 insertions(+), 3 deletions(-)
>> > > > > >
>> > > > > > diff --git a/disas/riscv.c b/disas/riscv.c
>> > > > > > index 4023e3fc65..690eb4a1ac 100644
>> > > > > > --- a/disas/riscv.c
>> > > > > > +++ b/disas/riscv.c
>> > > > > > @@ -4723,9 +4723,12 @@ static void format_inst(char *buf,
>> size_t buflen, size_t tab, rv_decode *dec)
>> > > > > >  break;
>> > > > > >  case 'U':
>> > > > > >  fmt++;
>> > > > > > -snprintf(tmp, sizeof(tmp), "%d", dec->imm >> 12);
>> > > > > > -append(buf, tmp, buflen);
>> > > > > > -if (*fmt == 'o') {
>> > > > > > +if (*fmt == 'i') {
>> > > > > > +snprintf(tmp, sizeof(tmp), "%d", dec->imm >>
>> 12 & 0xfffff);
>> > > > >
>> > > > > Why are we correcting LUI's output, but still outputting
>> sign-extended
>> > > > > values for AUIPC?
>> > > > >
>> > > > > We can't assemble 'auipc a1, 0x' or 'auipc a1, -1'
>> without getting
>> > > > >
>> > > > >  Error: lui expression not in range 0..1048575
>> > > > >
>> > > > > (and additionally for 0x)
>> > > > >
>> > > > >  Error: value of 0000 too large for field of 4 bytes
>> at 
>> > > > >
>> > > > > either.
>> > > > >
>> > > > > (I see that the assembler's error messages state 'lui', but I was
>> trying
>> > > > > 'auipc'.)
>> > > > >
>> > > > > I'm using as from gnu binutils 2.40.0.20230214.
>> > > > >
>> > > > > (And, FWIW, I agree with Richard Henderson that these
>> instructions should
>> > > > > accept negative values.)
>> > > >
>> > > > I'm kind of lost here, are you saying binutils rejects this
>> syntax?  If
>> > > > that's the case it's probably just an oversight, can you file a bug
>> in
>> > > > binutils land so folks can see?
>> > >
>> > > Will do.
>> > >
>> >
>> > https://sourceware.org/bugzilla/show_bug.cgi?id=30746
>> >
>>
>> But, to try to bring this thread back to the patch under review. While the
>> binutils BZ may address our preferred way of providing immediates to the
>> assembler, this patch is trying to make QEMU's output consistent with
>> objdump. Since objdump always outputs long immediate values as hex, then
>> it doesn't need to care about negative signs. QEMU seems to prefer
>> decimal, though, and so does llvm-objdump, which outputs values for these
>> instructions in the range 0..1048575. So, I guess this patch is making
>> QEMU consistent with llvm-objdump.
>>
>> Back to making suggestions for this patch...
>>
>> 1. The commit message should probably say something along the lines of
>>what I just wrote in the preceding paragraph to better explain the
>>motivation.
>>
>> 2. Unless I'm missing something, then this patch should also address
>>AUIPC.
>>

Re: [PATCH v2 2/2] xen: fix stubdom PCI addr

2024-03-08 Thread Jason Andryuk

On Tue, Mar 5, 2024 at 2:13 PM Marek Marczykowski-Górecki
 wrote:
>
> From: Frédéric Pierret (fepitre) 

Needs to be changed to Marek.

> When running in a stubdomain, the config space access via sysfs needs to
> use BDF as seen inside stubdomain (connected via xen-pcifront), which is
> different from the real BDF. For other purposes (hypercall parameters
> etc), the real BDF needs to be used.
> Get the in-stubdomain BDF by looking up relevant PV PCI xenstore
> entries.
>
> Signed-off-by: Marek Marczykowski-Górecki 
> ---
> Changes in v2:
> - use xs_node_scanf
> - use %d instead of %u to read values written as %d
> - add a comment from another iteration of this patch by Jason Andryuk
> ---
>  hw/xen/xen-host-pci-device.c | 69 +++-
>  hw/xen/xen-host-pci-device.h |  6 
>  2 files changed, 74 insertions(+), 1 deletion(-)
>
> diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
> index 8c6e9a1716..8ea2a5a4af 100644
> --- a/hw/xen/xen-host-pci-device.c
> +++ b/hw/xen/xen-host-pci-device.c
> @@ -9,6 +9,8 @@
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
>  #include "qemu/cutils.h"
> +#include "hw/xen/xen-legacy-backend.h"
> +#include "hw/xen/xen-bus-helper.h"
>  #include "xen-host-pci-device.h"
>
>  #define XEN_HOST_PCI_MAX_EXT_CAP \
> @@ -33,13 +35,67 @@
>  #define IORESOURCE_PREFETCH 0x1000  /* No side effects */
>  #define IORESOURCE_MEM_64   0x0010
>
> +/*
> + * Non-passthrough (dom0) accesses are local PCI devices and use the given 
> BDF
> + * Passthrough (stubdom) accesses are through PV frontend PCI device.  Those
> + * either have a BDF identical to the backend's BDF 
> (xen-backend.passthrough=1)
> + * or a local virtual BDF (xen-backend.passthrough=0)
> + *
> + * We are always given the backend's BDF and need to lookup the appropriate
> + * local BDF for sysfs access.
> + */
> +static void xen_host_pci_fill_local_addr(XenHostPCIDevice *d, Error **errp)
> +{
> +unsigned int num_devs, len, i;
> +unsigned int domain, bus, dev, func;
> +char *be_path = NULL;
> +char path[80];

path is now only used for dev/vdev-%d, so 80 could be reduced.

> +
> +be_path = qemu_xen_xs_read(xenstore, 0, "device/pci/0/backend", &len);
> +if (!be_path)

error_setg() here?

> +goto out;
> +
> +if (xs_node_scanf(xenstore, 0, be_path, "num_devs", NULL, "%d", &num_devs) != 1) {
> +error_setg(errp, "Failed to read or parse %s/num_devs\n", be_path);
> +goto out;
> +}
> +
> +for (i = 0; i < num_devs; i++) {
> +snprintf(path, sizeof(path), "dev-%d", i);
> +if (xs_node_scanf(xenstore, 0, be_path, path, NULL,
> +  "%x:%x:%x.%x", &domain, &bus, &dev, &func) != 4) {
> +error_setg(errp, "Failed to read or parse %s/%s\n", be_path, 
> path);
> +goto out;
> +}
> +if (domain != d->domain ||
> +bus != d->bus ||
> +dev != d->dev ||
> +func!= d->func)
> +continue;
> +snprintf(path, sizeof(path), "vdev-%d", i);
> +if (xs_node_scanf(xenstore, 0, be_path, path, NULL,
> +  "%x:%x:%x.%x", &domain, &bus, &dev, &func) != 4) {
> +error_setg(errp, "Failed to read or parse %s/%s\n", be_path, 
> path);
> +goto out;
> +}
> +d->local_domain = domain;
> +d->local_bus = bus;
> +d->local_dev = dev;
> +d->local_func = func;
> +goto out;
> +}

error_setg here in case we exited the loop without finding a match?
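
To illustrate the point, here is a stand-alone sketch of the lookup with a distinct "no match" result. It does not use xenstore: `struct pci_entry`, `parse_bdf` and `lookup_local_bdf` are hypothetical names, and the entries are passed in as plain strings in the same "%x:%x:%x.%x" format the patch parses.

```c
#include <assert.h>
#include <stdio.h>

struct bdf {
    unsigned int domain, bus, dev, func;
};

/* Hypothetical stand-in for one dev-N / vdev-N pair under the backend. */
struct pci_entry {
    const char *dev;  /* real (backend) BDF */
    const char *vdev; /* BDF as seen inside the stubdomain */
};

/* Returns 1 if all four fields were parsed. */
static int parse_bdf(const char *s, struct bdf *out)
{
    return sscanf(s, "%x:%x:%x.%x",
                  &out->domain, &out->bus, &out->dev, &out->func) == 4;
}

/* Returns 0 on success, -1 on a parse error, -2 if no entry matches. */
static int lookup_local_bdf(const struct pci_entry *entries, unsigned int n,
                            const struct bdf *real, struct bdf *local)
{
    for (unsigned int i = 0; i < n; i++) {
        struct bdf cur;

        if (!parse_bdf(entries[i].dev, &cur)) {
            return -1;
        }
        if (cur.domain != real->domain || cur.bus != real->bus ||
            cur.dev != real->dev || cur.func != real->func) {
            continue;
        }
        if (!parse_bdf(entries[i].vdev, local)) {
            return -1;
        }
        return 0;
    }
    /* Fell out of the loop: the caller can report this as an error. */
    return -2;
}
```

Returning a separate code for the fall-through case is what would let the caller set an error instead of silently leaving the local BDF unset.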

Thanks,
Jason

> +
> +out:
> +free(be_path);
> +}
> +

Re: [PATCH v2 1/2] hw/xen: detect when running inside stubdomain

2024-03-08 Thread Jason Andryuk

On Tue, Mar 5, 2024 at 2:13 PM Marek Marczykowski-Górecki
 wrote:
>
> Introduce global xen_is_stubdomain variable when qemu is running inside
> a stubdomain instead of dom0. This will be relevant for subsequent
> patches, as a few things like accessing PCI config space need to be done
> differently.
>
> Signed-off-by: Marek Marczykowski-Górecki 

Reviewed-by: Jason Andryuk

[PATCH v3 5/5] tests/tcg: Add multiarch test for Xfer:siginfo:read stub

2024-03-08 Thread Gustavo Romero

Add multiarch test for testing if Xfer:siginfo:read query is properly
handled by gdbstub.

Signed-off-by: Gustavo Romero 
Reviewed-by: Richard Henderson 
---
 tests/tcg/multiarch/Makefile.target   | 10 ++-
 .../gdbstub/test-qxfer-siginfo-read.py| 26 +++
 tests/tcg/multiarch/segfault.c| 14 ++
 3 files changed, 49 insertions(+), 1 deletion(-)
 create mode 100644 tests/tcg/multiarch/gdbstub/test-qxfer-siginfo-read.py
 create mode 100644 tests/tcg/multiarch/segfault.c

diff --git a/tests/tcg/multiarch/Makefile.target 
b/tests/tcg/multiarch/Makefile.target
index f11f3b084d..5ab4ba89b2 100644
--- a/tests/tcg/multiarch/Makefile.target
+++ b/tests/tcg/multiarch/Makefile.target
@@ -71,6 +71,13 @@ run-gdbstub-qxfer-auxv-read: sha1
--bin $< --test 
$(MULTIARCH_SRC)/gdbstub/test-qxfer-auxv-read.py, \
basic gdbstub qXfer:auxv:read support)
 
+run-gdbstub-qxfer-siginfo-read: segfault
+   $(call run-test, $@, $(GDB_SCRIPT) \
+   --gdb $(GDB) \
+   --qemu $(QEMU) --qargs "$(QEMU_OPTS)" \
+   --bin "$< -s" --test 
$(MULTIARCH_SRC)/gdbstub/test-qxfer-siginfo-read.py, \
+   basic gdbstub qXfer:siginfo:read support)
+
 run-gdbstub-proc-mappings: sha1
$(call run-test, $@, $(GDB_SCRIPT) \
--gdb $(GDB) \
@@ -113,7 +120,8 @@ endif
 EXTRA_RUNS += run-gdbstub-sha1 run-gdbstub-qxfer-auxv-read \
  run-gdbstub-proc-mappings run-gdbstub-thread-breakpoint \
  run-gdbstub-registers run-gdbstub-prot-none \
- run-gdbstub-catch-syscalls
+ run-gdbstub-catch-syscalls \
+ run-gdbstub-qxfer-siginfo-read
 
 # ARM Compatible Semi Hosting Tests
 #
diff --git a/tests/tcg/multiarch/gdbstub/test-qxfer-siginfo-read.py 
b/tests/tcg/multiarch/gdbstub/test-qxfer-siginfo-read.py
new file mode 100644
index 00..862596b07a
--- /dev/null
+++ b/tests/tcg/multiarch/gdbstub/test-qxfer-siginfo-read.py
@@ -0,0 +1,26 @@
+from __future__ import print_function
+#
+# Test gdbstub Xfer:siginfo:read stub.
+#
+# The test runs a binary that causes a SIGSEGV and then looks for additional
+# info about the signal through printing GDB's '$_siginfo' special variable,
+# which sends a Xfer:siginfo:read query to the gdbstub.
+#
+# The binary causes a SIGSEGV at dereferencing a pointer with value 0xdeadbeef,
+# so the test looks for and checks if this address is correctly reported by the
+# gdbstub.
+#
+# This is launched via tests/guest-debug/run-test.py
+#
+
+import gdb
+from test_gdbstub import main, report
+
+def run_test():
+    "Run through the test"
+
+    gdb.execute("continue", False, True)
+    resp = gdb.execute("print/x $_siginfo", False, True)
+    report("si_addr = 0xdeadbeef" in resp, "Found fault address.")
+
+main(run_test)
diff --git a/tests/tcg/multiarch/segfault.c b/tests/tcg/multiarch/segfault.c
new file mode 100644
index 00..e6c8ff31ca
--- /dev/null
+++ b/tests/tcg/multiarch/segfault.c
@@ -0,0 +1,14 @@
+#include <stdio.h>
+#include <string.h>
+
+/* Cause a segfault for testing purposes. */
+
+int main(int argc, char *argv[])
+{
+int *ptr = (void *)0xdeadbeef;
+
+if (argc == 2 && strcmp(argv[1], "-s") == 0) {
+/* Cause segfault. */
+printf("%d\n", *ptr);
+}
+}
-- 
2.34.1

[PATCH v3 1/5] gdbstub: Rename back gdb_handlesig

2024-03-08 Thread Gustavo Romero

Rename gdb_handlesig_reason back to gdb_handlesig. There is no need to
add a wrapper for gdb_handlesig and rename it when a new parameter is
added.

Signed-off-by: Gustavo Romero 
Reviewed-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 gdbstub/user.c |  8 
 include/gdbstub/user.h | 15 ++-
 linux-user/main.c  |  2 +-
 linux-user/signal.c|  2 +-
 4 files changed, 8 insertions(+), 19 deletions(-)

diff --git a/gdbstub/user.c b/gdbstub/user.c
index 14918d1a21..a157e67f95 100644
--- a/gdbstub/user.c
+++ b/gdbstub/user.c
@@ -131,7 +131,7 @@ void gdb_qemu_exit(int code)
 exit(code);
 }
 
-int gdb_handlesig_reason(CPUState *cpu, int sig, const char *reason)
+int gdb_handlesig(CPUState *cpu, int sig, const char *reason)
 {
 char buf[256];
 int n;
@@ -510,7 +510,7 @@ void gdb_breakpoint_remove_all(CPUState *cs)
 void gdb_syscall_handling(const char *syscall_packet)
 {
 gdb_put_packet(syscall_packet);
-gdb_handlesig(gdbserver_state.c_cpu, 0);
+gdb_handlesig(gdbserver_state.c_cpu, 0, NULL);
 }
 
 static bool should_catch_syscall(int num)
@@ -528,7 +528,7 @@ void gdb_syscall_entry(CPUState *cs, int num)
 {
 if (should_catch_syscall(num)) {
 g_autofree char *reason = g_strdup_printf("syscall_entry:%x;", num);
-gdb_handlesig_reason(cs, gdb_target_sigtrap(), reason);
+gdb_handlesig(cs, gdb_target_sigtrap(), reason);
 }
 }
 
@@ -536,7 +536,7 @@ void gdb_syscall_return(CPUState *cs, int num)
 {
 if (should_catch_syscall(num)) {
 g_autofree char *reason = g_strdup_printf("syscall_return:%x;", num);
-gdb_handlesig_reason(cs, gdb_target_sigtrap(), reason);
+gdb_handlesig(cs, gdb_target_sigtrap(), reason);
 }
 }
 
diff --git a/include/gdbstub/user.h b/include/gdbstub/user.h
index 68b6534130..6647af2123 100644
--- a/include/gdbstub/user.h
+++ b/include/gdbstub/user.h
@@ -10,7 +10,7 @@
 #define GDBSTUB_USER_H
 
 /**
- * gdb_handlesig_reason() - yield control to gdb
+ * gdb_handlesig() - yield control to gdb
  * @cpu: CPU
  * @sig: if non-zero, the signal number which caused us to stop
  * @reason: stop reason for stop reply packet or NULL
@@ -25,18 +25,7 @@
  * or 0 if no signal should be delivered, ie the signal that caused
  * us to stop should be ignored.
  */
-int gdb_handlesig_reason(CPUState *, int, const char *);
-
-/**
- * gdb_handlesig() - yield control to gdb
- * @cpu CPU
- * @sig: if non-zero, the signal number which caused us to stop
- * @see gdb_handlesig_reason()
- */
-static inline int gdb_handlesig(CPUState *cpu, int sig)
-{
-return gdb_handlesig_reason(cpu, sig, NULL);
-}
+int gdb_handlesig(CPUState *, int, const char *);
 
 /**
  * gdb_signalled() - inform remote gdb of sig exit
diff --git a/linux-user/main.c b/linux-user/main.c
index 551acf1661..049fd85a2a 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -1014,7 +1014,7 @@ int main(int argc, char **argv, char **envp)
 gdbstub);
 exit(EXIT_FAILURE);
 }
-gdb_handlesig(cpu, 0);
+gdb_handlesig(cpu, 0, NULL);
 }
 
 #ifdef CONFIG_SEMIHOSTING
diff --git a/linux-user/signal.c b/linux-user/signal.c
index d3e62ab030..a57c45de35 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -1180,7 +1180,7 @@ static void handle_pending_signal(CPUArchState *cpu_env, 
int sig,
 /* dequeue signal */
 k->pending = 0;
 
-sig = gdb_handlesig(cpu, sig);
+sig = gdb_handlesig(cpu, sig, NULL);
 if (!sig) {
 sa = NULL;
 handler = TARGET_SIG_IGN;
-- 
2.34.1

[PATCH v3 3/5] gdbstub: Save target's siginfo

2024-03-08 Thread Gustavo Romero

Save target's siginfo into gdbserver_state so it can be used later, for
example, in any stub that requires the target's si_signo and si_code.

This change affects only linux-user mode.

Signed-off-by: Gustavo Romero 
Suggested-by: Richard Henderson 
---
 bsd-user/main.c|  2 +-
 bsd-user/signal.c  |  5 -
 gdbstub/user.c | 23 +++
 include/gdbstub/user.h |  6 +-
 linux-user/main.c  |  2 +-
 linux-user/signal.c|  5 -
 6 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/bsd-user/main.c b/bsd-user/main.c
index 512d4ab69f..04b18eee27 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -601,7 +601,7 @@ int main(int argc, char **argv)
 
 if (gdbstub) {
 gdbserver_start(gdbstub);
-gdb_handlesig(cpu, 0);
+gdb_handlesig(cpu, 0, NULL, NULL, 0);
 }
 cpu_loop(env);
 /* never exits */
diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index f4352e4530..ad738def70 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -27,6 +27,9 @@
 #include "hw/core/tcg-cpu-ops.h"
 #include "host-signal.h"
 
+/* target_siginfo_t must fit in gdbstub's siginfo save area. */
+QEMU_BUILD_BUG_ON(sizeof(target_siginfo_t) > MAX_SIGINFO_LENGTH);
+
 static struct target_sigaction sigact_table[TARGET_NSIG];
 static void host_signal_handler(int host_sig, siginfo_t *info, void *puc);
 static void target_to_host_sigset_internal(sigset_t *d,
@@ -890,7 +893,7 @@ static void handle_pending_signal(CPUArchState *env, int 
sig,
 
 k->pending = 0;
 
-sig = gdb_handlesig(cpu, sig);
+sig = gdb_handlesig(cpu, sig, NULL, &k->info, sizeof(k->info));
 if (!sig) {
 sa = NULL;
 handler = TARGET_SIG_IGN;
diff --git a/gdbstub/user.c b/gdbstub/user.c
index a157e67f95..df040c6ffa 100644
--- a/gdbstub/user.c
+++ b/gdbstub/user.c
@@ -36,6 +36,8 @@ typedef struct {
  */
 bool catch_all_syscalls;
 GDBSyscallsMask catch_syscalls_mask;
+uint8_t siginfo[MAX_SIGINFO_LENGTH];
+unsigned long siginfo_len;
 } GDBUserState;
 
 static GDBUserState gdbserver_user_state;
@@ -131,7 +133,8 @@ void gdb_qemu_exit(int code)
 exit(code);
 }
 
-int gdb_handlesig(CPUState *cpu, int sig, const char *reason)
+int gdb_handlesig(CPUState *cpu, int sig, const char *reason, void *siginfo,
+  int siginfo_len)
 {
 char buf[256];
 int n;
@@ -140,6 +143,18 @@ int gdb_handlesig(CPUState *cpu, int sig, const char 
*reason)
 return sig;
 }
 
+if (siginfo) {
+/*
+ * Save target-specific siginfo.
+ *
+ * siginfo size, i.e. siginfo_len, is asserted at compile-time to fit 
in
+ * gdbserver_user_state.siginfo, usually in the source file calling
+ * gdb_handlesig. See, for instance, {linux,bsd}-user/signal.c.
+ */
+memcpy(gdbserver_user_state.siginfo, siginfo, siginfo_len);
+gdbserver_user_state.siginfo_len = siginfo_len;
+}
+
 /* disable single step if it was enabled */
 cpu_single_step(cpu, 0);
 tb_flush(cpu);
@@ -510,7 +525,7 @@ void gdb_breakpoint_remove_all(CPUState *cs)
 void gdb_syscall_handling(const char *syscall_packet)
 {
 gdb_put_packet(syscall_packet);
-gdb_handlesig(gdbserver_state.c_cpu, 0, NULL);
+gdb_handlesig(gdbserver_state.c_cpu, 0, NULL, NULL, 0);
 }
 
 static bool should_catch_syscall(int num)
@@ -528,7 +543,7 @@ void gdb_syscall_entry(CPUState *cs, int num)
 {
 if (should_catch_syscall(num)) {
 g_autofree char *reason = g_strdup_printf("syscall_entry:%x;", num);
-gdb_handlesig(cs, gdb_target_sigtrap(), reason);
+gdb_handlesig(cs, gdb_target_sigtrap(), reason, NULL, 0);
 }
 }
 
@@ -536,7 +551,7 @@ void gdb_syscall_return(CPUState *cs, int num)
 {
 if (should_catch_syscall(num)) {
 g_autofree char *reason = g_strdup_printf("syscall_return:%x;", num);
-gdb_handlesig(cs, gdb_target_sigtrap(), reason);
+gdb_handlesig(cs, gdb_target_sigtrap(), reason, NULL, 0);
 }
 }
 
diff --git a/include/gdbstub/user.h b/include/gdbstub/user.h
index 6647af2123..0ec9a7e596 100644
--- a/include/gdbstub/user.h
+++ b/include/gdbstub/user.h
@@ -9,11 +9,15 @@
 #ifndef GDBSTUB_USER_H
 #define GDBSTUB_USER_H
 
+#define MAX_SIGINFO_LENGTH 128
+
 /**
  * gdb_handlesig() - yield control to gdb
  * @cpu: CPU
  * @sig: if non-zero, the signal number which caused us to stop
  * @reason: stop reason for stop reply packet or NULL
+ * @siginfo: target-specific siginfo struct
+ * @siginfo_len: target-specific siginfo struct length
  *
  * This function yields control to gdb, when a user-mode-only target
  * needs to stop execution. If @sig is non-zero, then we will send a
@@ -25,7 +29,7 @@
  * or 0 if no signal should be delivered, ie the signal that caused
  * us to stop should be ignored.
  */
-int gdb_handlesig(CPUState *, int, const char *);
+int gdb_handlesig(CPUState *, int, const char *, void *, int);
 
 /**
  * gdb_signalled() - inform remote gdb of
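
The save step in this patch can be sketched on its own as follows. This is an illustration under stated assumptions, not the QEMU code: `struct sketch_siginfo`, `struct sketch_state` and `save_siginfo` are made-up names, the siginfo layout is invented, and `_Static_assert` stands in for QEMU_BUILD_BUG_ON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SIGINFO_LENGTH 128

/* Hypothetical target siginfo layout, for illustration only. */
struct sketch_siginfo {
    int32_t si_signo;
    int32_t si_errno;
    int32_t si_code;
    uint64_t si_addr;
};

/* Compile-time guarantee that the struct fits in the save area. */
_Static_assert(sizeof(struct sketch_siginfo) <= MAX_SIGINFO_LENGTH,
               "siginfo must fit in the save area");

/* Hypothetical stand-in for the gdbstub user state. */
struct sketch_state {
    uint8_t siginfo[MAX_SIGINFO_LENGTH];
    size_t siginfo_len;
};

/* Stash the siginfo; a NULL pointer leaves the previous one untouched. */
static void save_siginfo(struct sketch_state *s, const void *siginfo,
                         size_t len)
{
    if (!siginfo) {
        return;
    }
    assert(len <= sizeof(s->siginfo));
    memcpy(s->siginfo, siginfo, len);
    s->siginfo_len = len;
}
```

The runtime assert is an extra belt-and-braces check added in this sketch; the patch itself relies on the compile-time check alone.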

[PATCH v3 4/5] gdbstub: Add Xfer:siginfo:read stub

2024-03-08 Thread Gustavo Romero

Add stub to handle Xfer:siginfo:read packet query that requests the
machine's siginfo data.

This is used when a GDB user executes 'print $_siginfo' and when the
machine stops due to a signal, for instance, on SIGSEGV. The information
in siginfo allows GDB to determine further details on the signal, like
the fault address/insn when the SIGSEGV is caught.

Signed-off-by: Gustavo Romero 
---
 gdbstub/gdbstub.c   |  8 
 gdbstub/internals.h |  1 +
 gdbstub/user.c  | 23 +++
 3 files changed, 32 insertions(+)

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 2909bc8c69..ab38cea46b 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -1651,6 +1651,8 @@ static void handle_query_supported(GArray *params, void 
*user_ctx)
 g_string_append(gdbserver_state.str_buf, ";qXfer:auxv:read+");
 }
 g_string_append(gdbserver_state.str_buf, ";QCatchSyscalls+");
+
+g_string_append(gdbserver_state.str_buf, ";qXfer:siginfo:read+");
 #endif
 g_string_append(gdbserver_state.str_buf, ";qXfer:exec-file:read+");
 #endif
@@ -1799,6 +1801,12 @@ static const GdbCmdParseEntry gdb_gen_query_table[] = {
 .cmd_startswith = 1,
 .schema = "l,l0"
 },
+{
+.handler = gdb_handle_query_xfer_siginfo,
+.cmd = "Xfer:siginfo:read::",
+.cmd_startswith = 1,
+.schema = "l,l0"
+ },
 #endif
 {
 .handler = gdb_handle_query_xfer_exec_file,
diff --git a/gdbstub/internals.h b/gdbstub/internals.h
index 56b7c13b75..fcfe7c2d26 100644
--- a/gdbstub/internals.h
+++ b/gdbstub/internals.h
@@ -190,6 +190,7 @@ typedef union GdbCmdVariant {
 void gdb_handle_query_rcmd(GArray *params, void *user_ctx); /* softmmu */
 void gdb_handle_query_offsets(GArray *params, void *user_ctx); /* user */
 void gdb_handle_query_xfer_auxv(GArray *params, void *user_ctx); /*user */
+void gdb_handle_query_xfer_siginfo(GArray *params, void *user_ctx); /*user */
 void gdb_handle_v_file_open(GArray *params, void *user_ctx); /* user */
 void gdb_handle_v_file_close(GArray *params, void *user_ctx); /* user */
 void gdb_handle_v_file_pread(GArray *params, void *user_ctx); /* user */
diff --git a/gdbstub/user.c b/gdbstub/user.c
index df040c6ffa..5e175b5d62 100644
--- a/gdbstub/user.c
+++ b/gdbstub/user.c
@@ -616,3 +616,26 @@ void gdb_handle_set_catch_syscalls(GArray *params, void 
*user_ctx)
 err:
 gdb_put_packet("E00");
 }
+
+void gdb_handle_query_xfer_siginfo(GArray *params, void *user_ctx)
+{
+unsigned long offset, len;
+uint8_t *siginfo_offset;
+
+offset = get_param(params, 0)->val_ul;
+len = get_param(params, 1)->val_ul;
+
+if (offset + len > gdbserver_user_state.siginfo_len) {
+/* Invalid offset and/or requested length. */
+gdb_put_packet("E01");
+return;
+}
+
+siginfo_offset = (uint8_t *)gdbserver_user_state.siginfo + offset;
+
+/* Reply */
+g_string_assign(gdbserver_state.str_buf, "l");
+gdb_memtox(gdbserver_state.str_buf, (const char *)siginfo_offset, len);
+gdb_put_packet_binary(gdbserver_state.str_buf->str,
+  gdbserver_state.str_buf->len, true);
+}
-- 
2.34.1
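
The offset/length validation in the handler can be sketched in isolation as below. `read_window` is a hypothetical helper, not part of gdbstub, and a -1 return corresponds to the handler replying "E01". Unlike the patch, which compares `offset + len` against the saved length, this sketch writes the check so that the sum cannot wrap around; whether that matters in practice for this query is an assumption not discussed in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy len bytes starting at offset from buf (buf_len bytes long) into out.
 * Returns the number of bytes copied, or -1 if the window is out of range.
 */
static long read_window(const uint8_t *buf, size_t buf_len,
                        size_t offset, size_t len, uint8_t *out)
{
    /* Equivalent to offset + len > buf_len, without the overflow. */
    if (offset > buf_len || len > buf_len - offset) {
        return -1;
    }
    memcpy(out, buf + offset, len);
    return (long)len;
}
```

A request for 4 bytes at offset 2 of an 8-byte buffer succeeds, while 4 bytes at offset 6 is rejected, as is any offset large enough to wrap the addition.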

[PATCH v3 0/5] Add stub to handle Xfer:siginfo:read query

2024-03-08 Thread Gustavo Romero

Xfer:siginfo:read query is received, usually, when GDB catches a signal
and needs additional info about it, like the si_code and the si_addr, so
GDB can show the user interesting info about the signal. This query can
also be received when a user explicitly asks for more information via
printing GDB's special purpose variable '$_siginfo'.

This series adds the stub to handle Xfer:siginfo:read queries.

To achieve this, it is first necessary to stash the target-specific
siginfo in the gdbstub server state struct when handling a signal, so it
requires modifying the gdb_handlesig function to accept the target's
siginfo struct and its length.

Later, when replying to a Xfer:siginfo:read query (i.e., after a
signal is caught), the query handler utilizes the stashed siginfo to
form the packet for replying to the query.

Finally, a test is added to check if the stub correctly responds to the
query when a simple binary causes a SIGSEGV. Since the si_addr must be
available in the case of a SIGSEGV, the value of si_addr is checked
against the expected faulting address, corresponding to the dereferenced
pointer value in the binary.

v1:
https://lists.gnu.org/archive/html/qemu-devel/2024-03/msg00423.html

v2:
https://lists.gnu.org/archive/html/qemu-devel/2024-03/msg01858.html


Cheers,
Gustavo

Gustavo Romero (5):
  gdbstub: Rename back gdb_handlesig
  linux-user: Move tswap_siginfo out of target code
  gdbstub: Save target's siginfo
  gdbstub: Add Xfer:siginfo:read stub
  tests/tcg: Add multiarch test for Xfer:siginfo:read stub

 bsd-user/main.c   |  2 +-
 bsd-user/signal.c |  5 +-
 gdbstub/gdbstub.c |  8 
 gdbstub/internals.h   |  1 +
 gdbstub/user.c| 46 +--
 include/gdbstub/user.h| 19 +++-
 linux-user/aarch64/signal.c   |  2 +-
 linux-user/alpha/signal.c |  2 +-
 linux-user/arm/signal.c   |  2 +-
 linux-user/hexagon/signal.c   |  2 +-
 linux-user/hppa/signal.c  |  2 +-
 linux-user/i386/signal.c  |  6 +--
 linux-user/loongarch64/signal.c   |  2 +-
 linux-user/m68k/signal.c  |  4 +-
 linux-user/main.c |  2 +-
 linux-user/microblaze/signal.c|  2 +-
 linux-user/mips/signal.c  |  4 +-
 linux-user/nios2/signal.c |  2 +-
 linux-user/openrisc/signal.c  |  2 +-
 linux-user/ppc/signal.c   |  4 +-
 linux-user/riscv/signal.c |  2 +-
 linux-user/s390x/signal.c |  2 +-
 linux-user/sh4/signal.c   |  2 +-
 linux-user/signal-common.h|  2 -
 linux-user/signal.c   | 15 --
 linux-user/sparc/signal.c |  2 +-
 linux-user/xtensa/signal.c|  2 +-
 tests/tcg/multiarch/Makefile.target   | 10 +++-
 .../gdbstub/test-qxfer-siginfo-read.py| 26 +++
 tests/tcg/multiarch/segfault.c| 14 ++
 30 files changed, 147 insertions(+), 49 deletions(-)
 create mode 100644 tests/tcg/multiarch/gdbstub/test-qxfer-siginfo-read.py
 create mode 100644 tests/tcg/multiarch/segfault.c

-- 
2.34.1

[PATCH v3 2/5] linux-user: Move tswap_siginfo out of target code

2024-03-08 Thread Gustavo Romero

Move tswap_siginfo from target code to handle_pending_signal. This will
allow some cleanups and having the siginfo ready to be used in gdbstub.

Signed-off-by: Gustavo Romero 
Suggested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
---
 linux-user/aarch64/signal.c |  2 +-
 linux-user/alpha/signal.c   |  2 +-
 linux-user/arm/signal.c |  2 +-
 linux-user/hexagon/signal.c |  2 +-
 linux-user/hppa/signal.c|  2 +-
 linux-user/i386/signal.c|  6 +++---
 linux-user/loongarch64/signal.c |  2 +-
 linux-user/m68k/signal.c|  4 ++--
 linux-user/microblaze/signal.c  |  2 +-
 linux-user/mips/signal.c|  4 ++--
 linux-user/nios2/signal.c   |  2 +-
 linux-user/openrisc/signal.c|  2 +-
 linux-user/ppc/signal.c |  4 ++--
 linux-user/riscv/signal.c   |  2 +-
 linux-user/s390x/signal.c   |  2 +-
 linux-user/sh4/signal.c |  2 +-
 linux-user/signal-common.h  |  2 --
 linux-user/signal.c | 10 --
 linux-user/sparc/signal.c   |  2 +-
 linux-user/xtensa/signal.c  |  2 +-
 20 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
index a1e22d526d..bc7a13800d 100644
--- a/linux-user/aarch64/signal.c
+++ b/linux-user/aarch64/signal.c
@@ -670,7 +670,7 @@ static void target_setup_frame(int usig, struct target_sigaction *ka,
 aarch64_set_svcr(env, 0, R_SVCR_SM_MASK | R_SVCR_ZA_MASK);
 
 if (info) {
-tswap_siginfo(&frame->info, info);
+frame->info = *info;
 env->xregs[1] = frame_addr + offsetof(struct target_rt_sigframe, info);
 env->xregs[2] = frame_addr + offsetof(struct target_rt_sigframe, uc);
 }
diff --git a/linux-user/alpha/signal.c b/linux-user/alpha/signal.c
index 4ec42994d4..896c2c148a 100644
--- a/linux-user/alpha/signal.c
+++ b/linux-user/alpha/signal.c
@@ -173,7 +173,7 @@ void setup_rt_frame(int sig, struct target_sigaction *ka,
 goto give_sigsegv;
 }
 
-tswap_siginfo(&frame->info, info);
+frame->info = *info;
 
 __put_user(0, &frame->uc.tuc_flags);
 __put_user(0, &frame->uc.tuc_link);
diff --git a/linux-user/arm/signal.c b/linux-user/arm/signal.c
index f77f692c63..420fc04cfa 100644
--- a/linux-user/arm/signal.c
+++ b/linux-user/arm/signal.c
@@ -357,7 +357,7 @@ void setup_rt_frame(int usig, struct target_sigaction *ka,
 
 info_addr = frame_addr + offsetof(struct rt_sigframe, info);
 uc_addr = frame_addr + offsetof(struct rt_sigframe, sig.uc);
-tswap_siginfo(&frame->info, info);
+frame->info = *info;
 
 setup_sigframe(&frame->sig.uc, set, env);
 
diff --git a/linux-user/hexagon/signal.c b/linux-user/hexagon/signal.c
index 60fa7e1bce..492b51f155 100644
--- a/linux-user/hexagon/signal.c
+++ b/linux-user/hexagon/signal.c
@@ -162,7 +162,7 @@ void setup_rt_frame(int sig, struct target_sigaction *ka,
 }
 
 setup_ucontext(&frame->uc, env, set);
-tswap_siginfo(&frame->info, info);
+frame->info = *info;
 /*
  * The on-stack signal trampoline is no longer executed;
  * however, the libgcc signal frame unwinding code checks
diff --git a/linux-user/hppa/signal.c b/linux-user/hppa/signal.c
index d08a97dae6..8960175da3 100644
--- a/linux-user/hppa/signal.c
+++ b/linux-user/hppa/signal.c
@@ -127,7 +127,7 @@ void setup_rt_frame(int sig, struct target_sigaction *ka,
 goto give_sigsegv;
 }
 
-tswap_siginfo(&frame->info, info);
+frame->info = *info;
 frame->uc.tuc_flags = 0;
 frame->uc.tuc_link = 0;
 
diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c
index bc5d45302e..cfe70fc5cf 100644
--- a/linux-user/i386/signal.c
+++ b/linux-user/i386/signal.c
@@ -430,7 +430,7 @@ void setup_frame(int sig, struct target_sigaction *ka,
 setup_sigcontext(&frame->sc, &frame->fpstate, env, set->sig[0],
 frame_addr + offsetof(struct sigframe, fpstate));
 
-for(i = 1; i < TARGET_NSIG_WORDS; i++) {
+for (i = 1; i < TARGET_NSIG_WORDS; i++) {
 __put_user(set->sig[i], &frame->extramask[i - 1]);
 }
 
@@ -490,7 +490,7 @@ void setup_rt_frame(int sig, struct target_sigaction *ka,
 __put_user(addr, &frame->puc);
 #endif
 if (ka->sa_flags & TARGET_SA_SIGINFO) {
-tswap_siginfo(&frame->info, info);
+frame->info = *info;
 }
 
 /* Create the ucontext.  */
@@ -504,7 +504,7 @@ void setup_rt_frame(int sig, struct target_sigaction *ka,
 setup_sigcontext(&frame->uc.tuc_mcontext, &frame->fpstate, env,
 set->sig[0], frame_addr + offsetof(struct rt_sigframe, fpstate));
 
-for(i = 0; i < TARGET_NSIG_WORDS; i++) {
+for (i = 0; i < TARGET_NSIG_WORDS; i++) {
 __put_user(set->sig[i], &frame->uc.tuc_sigmask.sig[i]);
 }
 
diff --git a/linux-user/loongarch64/signal.c b/linux-user/loongarch64/signal.c
index 39ea82c814..1a322f9697 100644
--- a/linux-user/loongarch64/signal.c
+++ b/linux-user/loongarch64/signal.c
@@ -376,7 +376,7 @@ void setup_rt_frame(int sig, struct target_sigaction *ka,
 extctx.end.haddr = (void *)frame + (extctx.end.gaddr - frame_addr);
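
The refactoring in the patch above can be illustrated with a reduced,
self-contained sketch. Everything prefixed with toy_ is a hypothetical
stand-in and not QEMU's real definition; the sketch only assumes that the
byte swap happens once in common code, so that the per-target frame setup
reduces to a plain struct copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for target_siginfo_t. */
typedef struct {
    uint32_t signo;
    uint32_t code;
} toy_siginfo;

/* Hypothetical stand-in for a target rt signal frame. */
typedef struct {
    toy_siginfo info;
} toy_rt_sigframe;

/* Unconditional 32-bit byte swap, modelling a cross-endian guest. */
static uint32_t toy_bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Swap every field in place; assumed to run exactly once per signal. */
static void toy_tswap_siginfo(toy_siginfo *info)
{
    info->signo = toy_bswap32(info->signo);
    info->code = toy_bswap32(info->code);
}

/* Per-target code: no swapping left, only a struct copy. */
static void toy_setup_rt_frame(toy_rt_sigframe *frame, const toy_siginfo *info)
{
    assert(frame != NULL && info != NULL);
    frame->info = *info;
}

/* Common code: swap once, then hand the result to the target code. */
static void toy_handle_pending_signal(toy_rt_sigframe *frame, toy_siginfo *info)
{
    toy_tswap_siginfo(info);
    toy_setup_rt_frame(frame, info);
}
```

With this split the already-swapped siginfo is also available to other
consumers in common code, which is the motivation given in the commit
message.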

Re: Unmapping KVM Guest Memory from Host Kernel

2024-03-08 Thread Manwaring, Derek

On 2024-03-08 10:36-0700, David Matlack wrote:
> On Fri, Mar 8, 2024 at 8:25 AM Brendan Jackman  wrote:
> > On Fri, 8 Mar 2024 at 16:50, Gowans, James  wrote:
> > > Our goal is to more completely address the class of issues whose leak
> > > origin is categorized as "Mapped memory" [1].
> >
> > Did you forget a link below? I'm interested in hearing about that
> > categorisation.

The paper from Hertogh, et al. is 
https://download.vusec.net/papers/quarantine_raid23.pdf
specifically Table 1.

> > It's perhaps a bigger hammer than you are looking for, but the
> > solution we're working on at Google is "Address Space Isolation" (ASI)
> > - the latest posting about that is [2].
>
> I think what James is looking for (and what we are also interested
> in), is _eliminating_ the ability to access guest memory from the
> direct map entirely.

Actually, just preventing speculation of guest memory through the
direct map is sufficient for our current focus.

Brendan,
I will look into the general ASI approach, thank you. Did you consider
memfd_secret or a guest_memfd-based approach for Userspace-ASI? Based on
Sean's earlier reply to James it sounds like the vision of guest_memfd
aligns with ASI's goals.

Derek

Re: [External] Re: [PATCH v4 2/7] migration/multifd: Implement zero page transmission on the multifd thread.

2024-03-08 Thread hao . xiang

> 
> On Sun, Mar 3, 2024 at 11:16 PM Peter Xu  wrote:
> 
> > 
> > On Fri, Mar 01, 2024 at 02:28:24AM +, Hao Xiang wrote:
> > 
> >  -GlobalProperty hw_compat_8_2[] = {};
> > 
> >  +GlobalProperty hw_compat_8_2[] = {
> > 
> >  + { "migration", "zero-page-detection", "legacy"},
> > 
> >  +};
> > 
> >  I hope we can make it for 9.0, then this (and many rest places) can be kept
> > 
> >  as-is. Let's see.. soft-freeze is March 12th.
> > 
> >  One thing to mention is I just sent a pull which has mapped-ram feature
> > 
> >  merged. You may need a rebase onto that, and hopefully mapped-ram can also
> > 
> >  use your feature too within the same patch when you repost.
> > 
> >  https://lore.kernel.org/all/20240229153017.2221-1-faro...@suse.de/
> > 
> >  That rebase may or may not need much caution, I apologize for that:
> > 
> >  mapped-ram as a feature was discussed 1+ years, so it was a plan to merge
> > 
> >  it (actually still partly of it) into QEMU 9.0.

Let's see if we can catch that.

> > 
> >  [...]
> > 
> >  +static bool multifd_zero_page(void)
> > 
> >  multifd_zero_page_enabled()?

Changed.

> > 
> >  +{
> > 
> >  + return migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD;
> > 
> >  +}
> > 
> >  +
> > 
> >  +static void swap_page_offset(ram_addr_t *pages_offset, int a, int b)
> > 
> >  +{
> > 
> >  + ram_addr_t temp;
> > 
> >  +
> > 
> >  + if (a == b) {
> > 
> >  + return;
> > 
> >  + }
> > 
> >  +
> > 
> >  + temp = pages_offset[a];
> > 
> >  + pages_offset[a] = pages_offset[b];
> > 
> >  + pages_offset[b] = temp;
> > 
> >  +}
> > 
> >  +
> > 
> >  +/**
> > 
> >  + * multifd_send_zero_page_check: Perform zero page detection on all pages.
> > 
> >  + *
> > 
> >  + * Sorts normal pages before zero pages in p->pages->offset and updates
> > 
> >  + * p->pages->normal_num.
> > 
> >  + *
> > 
> >  + * @param p A pointer to the send params.
> > 
> >  Nit: the majority of doc style in QEMU (it seems to me) is:
> > 
> >  @p: pointer to @MultiFDSendParams.
> > 
> >  + */
> > 
> >  +void multifd_send_zero_page_check(MultiFDSendParams *p)
> > 
> >  multifd_send_zero_page_detect()?
> > 
> >  This patch used "check" on both sides, but neither of them is a pure check
> > 
> >  to me. For the other side, maybe multifd_recv_zero_page_process()? As
> > 
> >  that one applies the zero pages.


Renamed.

> > 
> >  +{
> > 
> >  + MultiFDPages_t *pages = p->pages;
> > 
> >  + RAMBlock *rb = pages->block;
> > 
> >  + int i = 0;
> > 
> >  + int j = pages->num - 1;
> > 
> >  +
> > 
> >  + /*
> > 
> >  + * QEMU older than 9.0 don't understand zero page
> > 
> >  + * on multifd channel. This switch is required to
> > 
> >  + * maintain backward compatibility.
> > 
> >  + */
> > 
> >  IMHO we can drop this comment; it is not accurate as the user can disable
> > 
> >  it explicitly through the parameter, then it may not always about 
> > compatibility.

Dropped.

> > 
> >  + if (multifd_zero_page()) {
> > 
> >  Shouldn't this be "!multifd_zero_page_enabled()"?

Thanks for catching this! My bad. Fixed.

> > 
> >  + pages->normal_num = pages->num;
> > 
> >  + return;
> > 
> >  + }
> > 
> >  The rest looks all sane.
> > 
> >  Thanks,
> > 
> >  --
> > 
> >  Peter Xu
> >
>
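
The doc comment quoted above says the detection pass sorts normal pages
before zero pages in the offset array and records the number of normal
pages. A minimal sketch of such a two-index partition follows. The toy_
names, the flat "ram" buffer and the byte-wise zero test are assumptions
made for illustration; this is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-wise zero test; a stand-in for an optimized buffer_is_zero(). */
static bool toy_page_is_zero(const uint8_t *page, size_t page_size)
{
    for (size_t k = 0; k < page_size; k++) {
        if (page[k] != 0) {
            return false;
        }
    }
    return true;
}

static void toy_swap_offset(uint64_t *offsets, size_t a, size_t b)
{
    uint64_t tmp = offsets[a];
    offsets[a] = offsets[b];
    offsets[b] = tmp;
}

/*
 * Reorder offsets[] so that offsets of normal pages come first and offsets
 * of zero pages come last. Returns the number of normal pages.
 */
static size_t toy_partition_zero_pages(const uint8_t *ram, size_t page_size,
                                       uint64_t *offsets, size_t num)
{
    size_t i = 0;    /* next unclassified entry */
    size_t j = num;  /* one past the last unclassified entry */

    assert(ram != NULL && (offsets != NULL || num == 0));
    while (i < j) {
        if (!toy_page_is_zero(ram + offsets[i], page_size)) {
            i++;                               /* normal page stays in front */
        } else {
            toy_swap_offset(offsets, i, --j);  /* zero page moves to the tail */
        }
    }
    return i;
}
```

Each page is tested once, and the order inside the two groups is not
preserved, which is acceptable when the receiver only needs to know which
offsets are normal and which are zero.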

Re: [PATCH v4 3/7] migration/multifd: Implement ram_save_target_page_multifd to handle multifd version of MigrationOps::ram_save_target_page.

2024-03-08 Thread hao . xiang

> 
> On Sun, Mar 3, 2024 at 11:46 PM Peter Xu  wrote:
> 
> > 
> > On Fri, Mar 01, 2024 at 02:28:25AM +, Hao Xiang wrote:
> > 
> >  1. Add a dedicated handler for MigrationOps::ram_save_target_page in
> > 
> >  multifd live migration.
> > 
> >  2. Refactor ram_save_target_page_legacy so that the legacy and multifd
> > 
> >  handlers don't have internal functions calling into each other.
> > 
> >  Signed-off-by: Hao Xiang 
> > 
> >  Reviewed-by: Fabiano Rosas 
> > 
> >  Message-Id: <20240226195654.934709-4-hao.xi...@bytedance.com>
> > 
> >  ---
> > 
> >  migration/ram.c | 43 ++-
> > 
> >  1 file changed, 30 insertions(+), 13 deletions(-)
> > 
> >  diff --git a/migration/ram.c b/migration/ram.c
> > 
> >  index e1fa229acf..f9d6ea65cc 100644
> > 
> >  --- a/migration/ram.c
> > 
> >  +++ b/migration/ram.c
> > 
> >  @@ -1122,10 +1122,6 @@ static int save_zero_page(RAMState *rs, 
> > PageSearchStatus *pss,
> > 
> >  QEMUFile *file = pss->pss_channel;
> > 
> >  int len = 0;
> > 
> >  - if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_NONE) {
> > 
> >  - return 0;
> > 
> >  - }
> > 
> >  We need to keep this to disable zero-page-detect on !multifd?

So if multifd is enabled, the new parameter takes effect. If multifd is not 
enabled, zero page checking will always be done in the main thread, which is 
exactly the current behavior. I thought legacy migration was a deprecated 
feature, so I am trying not to add new stuff to it.
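
The behavior described in this reply can be written down as a small
decision function. This is only a sketch of the rule as stated here for
this revision of the series (later revisions may differ); the toy_ enum
and function names are hypothetical and do not exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three zero-page-detection settings. */
typedef enum {
    TOY_ZPD_NONE,
    TOY_ZPD_LEGACY,
    TOY_ZPD_MULTIFD
} toy_zero_page_detection;

/* Where the zero page check runs, if anywhere. */
typedef enum {
    TOY_CHECK_NOWHERE,
    TOY_CHECK_MAIN_THREAD,
    TOY_CHECK_MULTIFD_THREAD
} toy_check_site;

static toy_check_site toy_zero_page_check_site(bool multifd_enabled,
                                               toy_zero_page_detection mode)
{
    if (!multifd_enabled) {
        /* Without multifd the parameter is ignored: main thread, as before. */
        return TOY_CHECK_MAIN_THREAD;
    }

    switch (mode) {
    case TOY_ZPD_NONE:
        return TOY_CHECK_NOWHERE;
    case TOY_ZPD_LEGACY:
        return TOY_CHECK_MAIN_THREAD;
    case TOY_ZPD_MULTIFD:
        return TOY_CHECK_MULTIFD_THREAD;
    }

    assert(!"invalid zero-page-detection mode");
    return TOY_CHECK_NOWHERE;
}
```

Note that under this rule "none" has no effect when multifd is off, which
is the case the review question above is asking about.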

> > 
> >  -
> > 
> >  if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
> > 
> >  return 0;
> > 
> >  }
> > 
> >  @@ -2045,7 +2041,6 @@ static bool save_compress_page(RAMState *rs, 
> > PageSearchStatus *pss,
> > 
> >  */
> > 
> >  static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
> > 
> >  {
> > 
> >  - RAMBlock *block = pss->block;
> > 
> >  ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> > 
> >  int res;
> > 
> >  @@ -2061,17 +2056,34 @@ static int ram_save_target_page_legacy(RAMState 
> > *rs, PageSearchStatus *pss)
> > 
> >  return 1;
> > 
> >  }
> > 
> >  + return ram_save_page(rs, pss);
> > 
> >  +}
> > 
> >  +
> > 
> >  +/**
> > 
> >  + * ram_save_target_page_multifd: send one target page to multifd workers
> > 
> >  + *
> > 
> >  + * Returns 1 if the page was queued, -1 otherwise.
> > 
> >  + *
> > 
> >  + * @rs: current RAM state
> > 
> >  + * @pss: data about the page we want to send
> > 
> >  + */
> > 
> >  +static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus 
> > *pss)
> > 
> >  +{
> > 
> >  + RAMBlock *block = pss->block;
> > 
> >  + ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> > 
> >  +
> > 
> >  /*
> > 
> >  - * Do not use multifd in postcopy as one whole host page should be
> > 
> >  - * placed. Meanwhile postcopy requires atomic update of pages, so even
> > 
> >  - * if host page size == guest page size the dest guest during run may
> > 
> >  - * still see partially copied pages which is data corruption.
> > 
> >  + * Backward compatibility support. While using multifd live
> > 
> >  We can also avoid mentioning "compatibility support" here - it's a
> > 
> >  parameter, user can legally set it to anything.

Will drop that.

> > 
> >  + * migration, we still need to handle zero page checking on the
> > 
> >  + * migration main thread.
> > 
> >  */
> > 
> >  - if (migrate_multifd() && !migration_in_postcopy()) {
> > 
> >  - return ram_save_multifd_page(block, offset);
> > 
> >  + if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
> > 
> >  + if (save_zero_page(rs, pss, offset)) {
> > 
> >  + return 1;
> > 
> >  + }
> > 
> >  }
> > 
> >  - return ram_save_page(rs, pss);
> > 
> >  + return ram_save_multifd_page(block, offset);
> > 
> >  }
> > 
> >  /* Should be called before sending a host page */
> > 
> >  @@ -2983,7 +2995,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> > 
> >  }
> > 
> >  migration_ops = g_malloc0(sizeof(MigrationOps));
> > 
> >  - migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> > 
> >  +
> > 
> >  + if (migrate_multifd()) {
> > 
> >  + migration_ops->ram_save_target_page = ram_save_target_page_multifd;
> > 
> >  + } else {
> > 
> >  + migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> > 
> >  + }
> > 
> >  bql_unlock();
> > 
> >  ret = multifd_send_sync_main();
> > 
> >  --
> > 
> >  2.30.2
> > 
> >  --
> > 
> >  Peter Xu
> >
>

Re: [PATCH v2 4/5] gdbstub: Add Xfer:siginfo:read stub

2024-03-08 Thread Richard Henderson


On 3/8/24 08:30, Gustavo Romero wrote:

Hi Richard!

On 3/7/24 6:13 PM, Richard Henderson wrote:

On 3/7/24 08:26, Gustavo Romero wrote:

+void gdb_handle_query_xfer_siginfo(GArray *params, void *user_ctx)
+{
+    unsigned long offset, len;
+    uint8_t *siginfo_offset;
+
+    offset = get_param(params, 0)->val_ul;
+    len = get_param(params, 1)->val_ul;
+
+    if (offset + len > sizeof(target_siginfo_t)) {


If you save the siginfo_len from gdb_handlesig, you can place this in user.c

Shouldn't all user-only stubs be placed in user-target.c? Like
gdb_handle_query_xfer_auxv and gdb_handle_query_xfer_exec_file, and since
what controls the inclusion in the build of user-target.c is CONFIG_USER_ONLY?


user.c is also built for CONFIG_USER_ONLY, except that it is compiled only once and has 
no target-specific code in it.



Is it really correct to reject (offset == 0) + (len == large), rather than 
truncate len?


I think this is correct. GDB mentions briefly that an invalid offset
should be treated as an error. Thus, I think that a valid offset but
a non-existing/invalid (large) length should be treated the same,
because in the end data at invalid offsets is being requested anyway.


Ok.


r~
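
The two policies discussed in this exchange (reject an out-of-range request,
or truncate the length) can be compared with a short sketch. The toy_
functions are hypothetical and are not part of gdbstub. The range check is
also written without computing offset + len, because that sum can wrap for
very large values; this is a choice of the sketch and not something raised
in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Policy adopted in the thread: the request is valid only when
 * [offset, offset + len) lies entirely inside the object.
 */
static bool toy_xfer_range_ok(size_t offset, size_t len, size_t object_size)
{
    if (offset > object_size) {
        return false;
    }
    /* object_size - offset cannot underflow here. */
    return len <= object_size - offset;
}

/*
 * Alternative raised in review: keep a valid offset and clamp len to the
 * bytes that are actually available.
 */
static size_t toy_xfer_clamp_len(size_t offset, size_t len, size_t object_size)
{
    size_t avail;

    assert(offset <= object_size);
    avail = object_size - offset;
    return len < avail ? len : avail;
}
```

For a 128-byte object, a read at offset 0 with length 4096 is an error
under the first policy and a 128-byte read under the second.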

Re: [PATCH v2 3/5] gdbstub: Save target's siginfo

2024-03-08 Thread Richard Henderson


On 3/8/24 09:25, Alex Bennée wrote:

   make vm-build-[open|net|free]bsd

see make vm-help for details.


That won't build freebsd user.
Something I've mentioned to you before...


r~

Re: [PATCH v9 00/21] Introduce smp.modules for x86 in QEMU

2024-03-08 Thread Zhao Liu

On Fri, Mar 08, 2024 at 05:36:38PM +0100, Philippe Mathieu-Daudé wrote:
> On 27/2/24 11:32, Zhao Liu wrote:
> 
> > ---
> > Zhao Liu (20):
> >hw/core/machine: Introduce the module as a CPU topology level
> >hw/core/machine: Support modules in -smp
> >hw/core: Introduce module-id as the topology subindex
> >hw/core: Support module-id in numa configuration
> 
> Patches 1-4 queued, thanks!

Thanks Philippe!

Re: [PATCH v2 00/13] Cleanup on SMP and its test

2024-03-08 Thread Zhao Liu

Hi Philippe,

> 
> Can you share your base commit please?
> 
> Applying: hw/core/machine-smp: Remove deprecated "parameter=0" SMP
> configurations
> Applying: hw/core/machine-smp: Deprecate unsupported "parameter=1" SMP
> configurations
> error: patch failed: docs/about/deprecated.rst:47
> error: docs/about/deprecated.rst: patch does not apply
> Patch failed at 0002 hw/core/machine-smp: Deprecate unsupported
> "parameter=1" SMP configurations
>

The base commit is e1007b6bab5cf ("Merge tag 'pull-request-2024-03-01'
of https://gitlab.com/thuth/qemu into staging").

But I think this conflict is because of the first 4 patches of module
series you picked. Let me rebase this series on that module series and
refresh a v3.

Thanks,
Zhao

Re: Unmapping KVM Guest Memory from Host Kernel

2024-03-08 Thread Sean Christopherson

On Fri, Mar 08, 2024, James Gowans wrote:
> However, memfd_secret doesn’t work out the box for KVM guest memory; the
> main reason seems to be that the GUP path is intentionally disabled for
> memfd_secret, so if we use a memfd_secret backed VMA for a memslot then
> KVM is not able to fault the memory in. If it’s been pre-faulted in by
> userspace then it seems to work.

Huh, that _shouldn't_ work.  The folio_is_secretmem() in gup_pte_range() is
supposed to prevent the "fast gup" path from getting secretmem pages.

Is this on an upstream kernel?  If so, and if you have bandwidth, can you figure
out why that isn't working?  At the very least, I suspect the memfd_secret
maintainers would be very interested to know that it's possible to fast gup
secretmem.

> There are a few other issues around when KVM accesses the guest memory.
> For example the KVM PV clock code goes directly to the PFN via the
> pfncache, and that also breaks if the PFN is not in the direct map, so
> we’d need to change that sort of thing, perhaps going via userspace
> addresses.
> 
> If we remove the memfd_secret check from the GUP path, and disable KVM’s
> pvclock from userspace via KVM_CPUID_FEATURES, we are able to boot a
> simple Linux initrd using a Firecracker VMM modified to use
> memfd_secret.
> 
> We are also aware of ongoing work on guest_memfd. The current
> implementation unmaps guest memory from VMM address space, but leaves it
> in the kernel’s direct map. We’re not looking at unmapping from VMM
> userspace yet; we still need guest RAM there for PV drivers like virtio
> to continue to work. So KVM’s gmem doesn’t seem like the right solution?

We (and by "we", I really mean the pKVM folks) are also working on allowing
userspace to mmap() guest_memfd[*].  pKVM aside, the long term vision I have for
guest_memfd is to be able to use it for non-CoCo VMs, precisely for the security
and robustness benefits it can bring.

What I am hoping to do with guest_memfd is get userspace to only map memory it
needs, e.g. for emulated/synthetic devices, on-demand.  I.e. to get to a state
where guest memory is mapped only when it needs to be.  More below.

> With this in mind, what’s the best way to solve getting guest RAM out of
> the direct map? Is memfd_secret integration with KVM the way to go, or
> should we build a solution on top of guest_memfd, for example via some
> flag that causes it to leave memory in the host userspace’s page tables,
> but removes it from the direct map? 

100% enhance guest_memfd.  If you're willing to wait long enough, pKVM might even
do all the work for you. :-)

The killer feature of guest_memfd is that it allows the guest mappings to be a
superset of the host userspace mappings.  Most obviously, it allows mapping memory
into the guest without first mapping the memory into the userspace page
tables.  More subtly, it also makes it easier (in theory) to do things like map
the memory with 1GiB hugepages for the guest, but selectively map at 4KiB granularity
in the host.  Or map memory as RWX in the guest, but RO in the host (I don't have
a concrete use case for this, just pointing out it'll be trivial to do once
guest_memfd supports mmap()).

Every attempt to allow mapping VMA-based memory into a guest without it being
accessible by host userspace has failed; it's literally why we ended up
implementing guest_memfd.  We could teach KVM to do the same with memfd_secret,
but we'd just end up re-implementing guest_memfd.

memfd_secret obviously gets you a PoC much faster, but in the long term I'm quite
sure you'll be fighting memfd_secret all the way.  E.g. it's not dumpable, it
deliberately allocates at 4KiB granularity (though I suspect the bug you found
means that it can be inadvertently mapped with 2MiB hugepages), it has no line
of sight to taking userspace out of the equation, etc.

With guest_memfd on the other hand, everyone contributing to and maintaining it
has goals that are *very* closely aligned with what you want to do.

[*] https://lore.kernel.org/all/20240222161047.402609-1-ta...@google.com

Re: Unmapping KVM Guest Memory from Host Kernel

2024-03-08 Thread David Matlack

On Fri, Mar 8, 2024 at 8:25 AM Brendan Jackman  wrote:
>
> Hi James
>
> On Fri, 8 Mar 2024 at 16:50, Gowans, James  wrote:
> > Our goal is to more completely address the class of issues whose leak
> > origin is categorized as "Mapped memory" [1].
>
> Did you forget a link below? I'm interested in hearing about that
> categorisation.
>
> > ... what’s the best way to solve getting guest RAM out of
> > the direct map?
>
> It's perhaps a bigger hammer than you are looking for, but the
> solution we're working on at Google is "Address Space Isolation" (ASI)
> - the latest posting about that is [2].
>
> The sense in which it's a bigger hammer is that it doesn't only
> support removing guest memory from the direct map, but rather
> arbitrary data from arbitrary kernel mappings.

I'm not sure if ASI provides a solution to the problem James is trying
to solve. ASI creates a separate "restricted" address space where, yes,
guest memory can be left unmapped. But any access to guest memory is
still allowed. An access will trigger a page fault, the kernel will
switch to the "full" kernel address space (flushing hardware buffers
along the way to prevent speculation), and then proceed. I.e. ASI
does not prevent accessing guest memory through the
direct map, it just prevents speculation of guest memory through the
direct map.

I think what James is looking for (and what we are also interested
in), is _eliminating_ the ability to access guest memory from the
direct map entirely. And in general, eliminate the ability to access
guest memory in as many ways as possible.

For that goal, I have been thinking about guest_memfd as a
solution. Yes guest_memfd today is backed by pages of memory that are
mapped in the direct map. But what we can do is add the ability to
back guest_memfd by pages of memory that aren't in the direct map. I
haven't thought it fully through yet but something like... Hide the
majority of RAM from Linux (I believe there are kernel parameters to
do this) and hand it off to guest_memfd to allocate from as a source
of guest memory. Then the only way to access guest memory is to mmap()
a guest_memfd (e.g. for PV userspace devices).

Re: Unmapping KVM Guest Memory from Host Kernel

2024-03-08 Thread Manwaring, Derek

On 2024-03-08 at 10:46-0700, David Woodhouse wrote:
> On Fri, 2024-03-08 at 09:35 -0800, David Matlack wrote:
> > I think what James is looking for (and what we are also interested
> > in), is _eliminating_ the ability to access guest memory from the
> > direct map entirely. And in general, eliminate the ability to access
> > guest memory in as many ways as possible.
>
> Well, pKVM does that...

Yes we've been looking at pKVM and it accomplishes a lot of what we're trying
to do. Our initial inclination is that we want to stick with VHE for the lower
overhead. We also want flexibility across server parts, so we would need to
get pKVM working on Intel & AMD if we went this route.

Certainly there are advantages of pKVM on the perf side like the in-place
memory sharing rather than copying as well as on the security side by simply
reducing the TCB. I'd be interested to hear others' thoughts on pKVM vs
memfd_secret or general ASI.

Derek

Re: Unmapping KVM Guest Memory from Host Kernel

2024-03-08 Thread Sean Christopherson

On Fri, Mar 08, 2024, David Woodhouse wrote:
> On Fri, 2024-03-08 at 09:35 -0800, David Matlack wrote:
> > I think what James is looking for (and what we are also interested
> > in), is _eliminating_ the ability to access guest memory from the
> > direct map entirely. And in general, eliminate the ability to access
> > guest memory in as many ways as possible.
> 
> Well, pKVM does that... 

Out-of-tree :-)

I'm not just being snarky; when pKVM lands this functionality upstream, I fully
expect zapping direct map entries to be generic guest_memfd functionality that
would be opt-in, either by the in-kernel technology, e.g. pKVM, or by userspace,
or by some combination of the two, e.g. I can see making it optional to nuke the
direct map when using guest_memfd for TDX guests so that rogue accesses from the
host generate synchronous #PFs instead of latent #MCs.

[PATCH v8 05/10] target/riscv: use vext_set_tail_elems_1s() in vcrypto insns

2024-03-08 Thread Daniel Henrique Barboza

Vcrypto insns should also use the same helper the regular vector insns
use to update the tail elements.

Move vext_set_tail_elems_1s() to vector_internals.c and make it public.
Use it in vcrypto_helper.c to set tail elements instead of
vext_set_elems_1s(). Helpers must set env->vstart = 0 after setting the
tail.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/vcrypto_helper.c   | 63 -
 target/riscv/vector_helper.c| 25 -
 target/riscv/vector_internals.c | 28 +++
 target/riscv/vector_internals.h |  4 +++
 4 files changed, 55 insertions(+), 65 deletions(-)

diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index e2d719b13b..66d449c274 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -218,9 +218,7 @@ static inline void xor_round_key(AESState *round_state, AESState *round_key)
 void HELPER(NAME)(void *vd, void *vs2, CPURISCVState *env,\
   uint32_t desc)  \
 { \
-uint32_t vl = env->vl;\
 uint32_t total_elems = vext_get_total_elems(env, desc, 4);\
-uint32_t vta = vext_vta(desc);\
   \
 for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {\
 AESState round_key;   \
@@ -233,18 +231,16 @@ static inline void xor_round_key(AESState *round_state, AESState *round_key)
 *((uint64_t *)vd + H8(i * 2 + 0)) = round_state.d[0]; \
 *((uint64_t *)vd + H8(i * 2 + 1)) = round_state.d[1]; \
 } \
-env->vstart = 0;  \
 /* set tail elements to 1s */ \
-vext_set_elems_1s(vd, vta, vl * 4, total_elems * 4);  \
+vext_set_tail_elems_1s(env, vd, desc, 4, total_elems);\
+env->vstart = 0;  \
 }
 
 #define GEN_ZVKNED_HELPER_VS(NAME, ...)   \
 void HELPER(NAME)(void *vd, void *vs2, CPURISCVState *env,\
   uint32_t desc)  \
 { \
-uint32_t vl = env->vl;\
 uint32_t total_elems = vext_get_total_elems(env, desc, 4);\
-uint32_t vta = vext_vta(desc);\
   \
 for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {\
 AESState round_key;   \
@@ -257,9 +253,9 @@ static inline void xor_round_key(AESState *round_state, AESState *round_key)
 *((uint64_t *)vd + H8(i * 2 + 0)) = round_state.d[0]; \
 *((uint64_t *)vd + H8(i * 2 + 1)) = round_state.d[1]; \
 } \
-env->vstart = 0;  \
 /* set tail elements to 1s */ \
-vext_set_elems_1s(vd, vta, vl * 4, total_elems * 4);  \
+vext_set_tail_elems_1s(env, vd, desc, 4, total_elems);\
+env->vstart = 0;  \
 }
 
 GEN_ZVKNED_HELPER_VV(vaesef_vv, aesenc_SB_SR_AK(&round_state,
@@ -301,9 +297,7 @@ void HELPER(vaeskf1_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm,
 {
 uint32_t *vd = vd_vptr;
 uint32_t *vs2 = vs2_vptr;
-uint32_t vl = env->vl;
 uint32_t total_elems = vext_get_total_elems(env, desc, 4);
-uint32_t vta = vext_vta(desc);
 
 uimm &= 0b1111;
 if (uimm > 10 || uimm == 0) {
@@ -337,9 +331,9 @@ void HELPER(vaeskf1_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm,
 vd[i * 4 + H4(2)] = rk[6];
 vd[i * 4 + H4(3)] = rk[7];
 }
-env->vstart = 0;
 /* set tail elements to 1s */
-vext_set_elems_1s(vd, vta, vl * 4, total_elems * 4);
+vext_set_tail_elems_1s(env, vd, desc, 4, total_elems);
+env->vstart = 0;
 }
 
 void HELPER(vaeskf2_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm,
@@ -347,9 +341,7 @@ void HELPER(vaeskf2_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm,
 {
 uint32_t *vd = vd_vptr;
 uint32_t *vs2 = vs2_vptr;
-uint32_t vl = env->vl;
 uint32_t total_elems = vext_get_total_elems(env, desc, 4);
-uint32_t vta = vext_vta(desc);
 
 uimm &= 0b1111;
 if (uimm > 14 || uimm <

[PATCH v8 00/10] riscv: set vstart_eq_zero on mark_vs_dirty

2024-03-08 Thread Daniel Henrique Barboza

Hi,

In this new version we addressed the points raised by LIU Zhiwei in the v7
version. Some patches had to go, additional patches were added:

- patches 1 and 2 from v7: already queued and sent in the last PR.

- patch 6 from v7: moved to patch 1

- patches 2, 3, 4 and 5 (all new): rework how we update tail elements

  Version 7 has a problem with tail elements being updated regardless of
  vstart >= vl because there's no guard for it. To fix that, and be able
  to remove the vstart >= vl brconds from the translation code (most
  of them - more on that later), we changed vext_set_tail_elems_1s() to
  be a no-op if vstart >= vl. After that we went through all the code
  that was setting tail elems with vext_set_elems_1s() and converted it
  to use vext_set_tail_elems_1s(). We'll not update tail elements if
  vstart >= vl even without a brcond guard in the translation code.

- patch 6 (new): fix scalar move insns. They weren't setting vstart = 0.

- patch 7 (patch 3 from v7): do not remove brconds from scalar move
  insns

  trans_vmv_s_x() and trans_vfmv_s_f() do not have a helper that will
  handle vstart >= vl for them, so they need their brcond.

- patches 4 and 5 from v7: dropped. We're not removing all brconds, so
  we can't get rid of cpu_vstart and cpu_vl.
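
The guard described for patches 2-5 can be sketched as follows. This is a
simplified model, not the function from the series: toy_set_tail_elems_1s()
and its flat parameter list are hypothetical, and the real helper takes the
CPU state and the descriptor instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill the tail elements [vl, total_elems) of vd with all-ones bytes.
 * No-op when the tail-agnostic policy is off or when vstart >= vl, so the
 * caller does not need a separate vstart >= vl branch.
 */
static void toy_set_tail_elems_1s(uint8_t *vd, uint32_t vstart, uint32_t vl,
                                  uint32_t esz, uint32_t total_elems, bool vta)
{
    assert(vd != NULL && vl <= total_elems);

    if (!vta || vstart >= vl) {
        return;
    }
    memset(vd + (size_t)vl * esz, 0xff, (size_t)(total_elems - vl) * esz);
}
```

A helper would call this after its element loop and only then reset
vstart to 0, since the guard reads vstart.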

Series based on alistair/riscv-to-apply.next. 

Patches missing review: 2, 3, 4, 5, 6.

Daniel Henrique Barboza (9):
  target/riscv/vector_helper.c: set vstart = 0 in GEN_VEXT_VSLIDEUP_VX()
  target/riscv: handle vstart >= vl in vext_set_tail_elems_1s()
  target/riscv/vector_helper.c: do vstart=0 after updating tail
  target/riscv/vector_helper.c: update tail with
vext_set_tail_elems_1s()
  target/riscv: use vext_set_tail_elems_1s() in vcrypto insns
  trans_rvv.c.inc: set vstart = 0 in int scalar move insns
  target/riscv: remove 'over' brconds from vector trans
  trans_rvv.c.inc: remove redundant mark_vs_dirty() calls
  target/riscv/vector_helper.c: optimize loops in ldst helpers

Ivan Klokov (1):
  target/riscv: Clear vstart_qe_zero flag

 target/riscv/insn_trans/trans_rvbf16.c.inc |  18 +-
 target/riscv/insn_trans/trans_rvv.c.inc| 205 ++---
 target/riscv/insn_trans/trans_rvvk.c.inc   |  30 +--
 target/riscv/translate.c   |   6 +
 target/riscv/vcrypto_helper.c  |  63 +++
 target/riscv/vector_helper.c   | 168 ++---
 target/riscv/vector_internals.c|  28 +++
 target/riscv/vector_internals.h|   4 +
 8 files changed, 186 insertions(+), 336 deletions(-)

-- 
2.43.2

[PATCH v8 10/10] target/riscv/vector_helper.c: optimize loops in ldst helpers

2024-03-08 Thread Daniel Henrique Barboza

Change the for loops in ldst helpers to do a single increment in the
counter, and assign it env->vstart, to avoid re-reading from vstart
every time.

Suggested-by: Richard Henderson 
Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Alistair Francis 
Reviewed-by: Richard Henderson 
---
 target/riscv/vector_helper.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 1941c0e5f3..2e50341806 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -190,7 +190,7 @@ vext_ldst_stride(void *vd, void *v0, target_ulong base,
 uint32_t esz = 1 << log2_esz;
 uint32_t vma = vext_vma(desc);
 
-for (i = env->vstart; i < env->vl; i++, env->vstart++) {
+for (i = env->vstart; i < env->vl; env->vstart = ++i) {
 k = 0;
 while (k < nf) {
 if (!vm && !vext_elem_mask(v0, i)) {
@@ -255,7 +255,7 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
 uint32_t esz = 1 << log2_esz;
 
 /* load bytes from guest memory */
-for (i = env->vstart; i < evl; i++, env->vstart++) {
+for (i = env->vstart; i < evl; env->vstart = ++i) {
 k = 0;
 while (k < nf) {
 target_ulong addr = base + ((i * nf + k) << log2_esz);
@@ -368,7 +368,7 @@ vext_ldst_index(void *vd, void *v0, target_ulong base,
 uint32_t vma = vext_vma(desc);
 
 /* load bytes from guest memory */
-for (i = env->vstart; i < env->vl; i++, env->vstart++) {
+for (i = env->vstart; i < env->vl; env->vstart = ++i) {
 k = 0;
 while (k < nf) {
 if (!vm && !vext_elem_mask(v0, i)) {
-- 
2.43.2
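
The two loop forms in the patch above keep the counter and vstart in step
in the same way; the new form just writes vstart from the local counter
instead of incrementing it in memory. A small model, under the assumption
that a faulting access leaves the loop with vstart pointing at the faulting
element (modelled here as an early return; the toy_ names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t vstart;
    uint32_t vl;
} toy_env;

/* Hypothetical element access: fails for exactly one element index. */
static bool toy_access_ok(uint32_t i, uint32_t fault_at)
{
    return i != fault_at;
}

/* Old form: counter and env->vstart are incremented separately. */
static uint32_t toy_loop_old(toy_env *env, uint32_t fault_at)
{
    uint32_t i, done = 0;

    assert(env != NULL);
    for (i = env->vstart; i < env->vl; i++, env->vstart++) {
        if (!toy_access_ok(i, fault_at)) {
            return done;
        }
        done++;
    }
    return done;
}

/* New form: one increment, then the result is stored to env->vstart. */
static uint32_t toy_loop_new(toy_env *env, uint32_t fault_at)
{
    uint32_t i, done = 0;

    assert(env != NULL);
    for (i = env->vstart; i < env->vl; env->vstart = ++i) {
        if (!toy_access_ok(i, fault_at)) {
            return done;
        }
        done++;
    }
    return done;
}
```

Both variants process the same elements and leave vstart at the same
value, whether the loop completes or stops early.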

[PATCH v8 07/10] target/riscv: remove 'over' brconds from vector trans

2024-03-08 Thread Daniel Henrique Barboza

Most of the vector translations have the following pattern at the start:

TCGLabel *over = gen_new_label();
tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);

And then right at the end:

 gen_set_label(over);
 return true;

This means that if vstart >= vl we'll not set vstart = 0 at the end of
the insns - this is done inside the helper that is being skipped.  The
reason why this pattern hasn't been a bigger problem is that the
condition vstart >= vl is very rare.

Checking all the helpers in vector_helper.c we see all of them with a
pattern like this:

for (i = env->vstart; i < vl; i++) {
(...)
}
env->vstart = 0;

Thus they can handle vstart >= vl case gracefully, with the benefit of
setting env->vstart = 0 during the process.

Remove all 'over' conditionals and let the helper set env->vstart = 0
every time.

Note that not all insns use helpers, and for those cases the 'brcond'
jump is the only way to filter vstart >= vl. This is the case of
trans_vmv_s_x() and trans_vfmv_s_f(). We won't remove the 'brcond'
conditionals from them.

While we're at it, remove the (vl == 0) brconds from trans_rvbf16.c.inc
too since they're unneeded.
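
To make the argument concrete, here is a minimal standalone sketch (not QEMU code: `ToyEnv` and `toy_helper` are hypothetical names) of a helper with the loop pattern described above. When vstart >= vl the loop body never runs, yet vstart is still reset:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the vstart/vl fields of CPURISCVState. */
typedef struct {
    uint32_t vstart;
    uint32_t vl;
} ToyEnv;

/* Adds 1 to each body element; returns how many elements were touched. */
static uint32_t toy_helper(ToyEnv *env, int32_t *vd)
{
    uint32_t i, touched = 0;

    /* If vstart >= vl the condition fails immediately: no element is written. */
    for (i = env->vstart; i < env->vl; i++) {
        vd[i] += 1;
        touched++;
    }
    env->vstart = 0;    /* done unconditionally, even when the loop is skipped */
    return touched;
}
```

Calling it with `vstart = 5` and `vl = 4` touches nothing and leaves `vstart == 0`, which is the behaviour the 'over' branch could not provide because it skipped the helper entirely.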

Suggested-by: Richard Henderson 
Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
Reviewed-by: Alistair Francis 
---
 target/riscv/insn_trans/trans_rvbf16.c.inc |  12 ---
 target/riscv/insn_trans/trans_rvv.c.inc| 108 -
 target/riscv/insn_trans/trans_rvvk.c.inc   |  18 
 3 files changed, 138 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvbf16.c.inc 
b/target/riscv/insn_trans/trans_rvbf16.c.inc
index 8ee99df3f3..a842e76a6b 100644
--- a/target/riscv/insn_trans/trans_rvbf16.c.inc
+++ b/target/riscv/insn_trans/trans_rvbf16.c.inc
@@ -71,11 +71,8 @@ static bool trans_vfncvtbf16_f_f_w(DisasContext *ctx, 
arg_vfncvtbf16_f_f_w *a)
 
 if (opfv_narrow_check(ctx, a) && (ctx->sew == MO_16)) {
 uint32_t data = 0;
-TCGLabel *over = gen_new_label();
 
 gen_set_rm_chkfrm(ctx, RISCV_FRM_DYN);
-tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
-tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
 
 data = FIELD_DP32(data, VDATA, VM, a->vm);
 data = FIELD_DP32(data, VDATA, LMUL, ctx->lmul);
@@ -87,7 +84,6 @@ static bool trans_vfncvtbf16_f_f_w(DisasContext *ctx, 
arg_vfncvtbf16_f_f_w *a)
ctx->cfg_ptr->vlenb, data,
gen_helper_vfncvtbf16_f_f_w);
 mark_vs_dirty(ctx);
-gen_set_label(over);
 return true;
 }
 return false;
@@ -100,11 +96,8 @@ static bool trans_vfwcvtbf16_f_f_v(DisasContext *ctx, 
arg_vfwcvtbf16_f_f_v *a)
 
 if (opfv_widen_check(ctx, a) && (ctx->sew == MO_16)) {
 uint32_t data = 0;
-TCGLabel *over = gen_new_label();
 
 gen_set_rm_chkfrm(ctx, RISCV_FRM_DYN);
-tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
-tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
 
 data = FIELD_DP32(data, VDATA, VM, a->vm);
 data = FIELD_DP32(data, VDATA, LMUL, ctx->lmul);
@@ -116,7 +109,6 @@ static bool trans_vfwcvtbf16_f_f_v(DisasContext *ctx, 
arg_vfwcvtbf16_f_f_v *a)
ctx->cfg_ptr->vlenb, data,
gen_helper_vfwcvtbf16_f_f_v);
 mark_vs_dirty(ctx);
-gen_set_label(over);
 return true;
 }
 return false;
@@ -130,11 +122,8 @@ static bool trans_vfwmaccbf16_vv(DisasContext *ctx, 
arg_vfwmaccbf16_vv *a)
 if (require_rvv(ctx) && vext_check_isa_ill(ctx) && (ctx->sew == MO_16) &&
 vext_check_dss(ctx, a->rd, a->rs1, a->rs2, a->vm)) {
 uint32_t data = 0;
-TCGLabel *over = gen_new_label();
 
 gen_set_rm_chkfrm(ctx, RISCV_FRM_DYN);
-tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
-tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
 
 data = FIELD_DP32(data, VDATA, VM, a->vm);
 data = FIELD_DP32(data, VDATA, LMUL, ctx->lmul);
@@ -147,7 +136,6 @@ static bool trans_vfwmaccbf16_vv(DisasContext *ctx, 
arg_vfwmaccbf16_vv *a)
ctx->cfg_ptr->vlenb, data,
gen_helper_vfwmaccbf16_vv);
 mark_vs_dirty(ctx);
-gen_set_label(over);
 return true;
 }
 return false;
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index f3caabc101..3ec18412fe 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -616,9 +616,6 @@ static bool ldst_us_trans(uint32_t vd, uint32_t rs1, 
uint32_t data,
 TCGv base;
 TCGv_i32 desc;
 
-TCGLabel *over = gen_new_label();
-tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
-
 dest = tcg_temp_new_ptr();
 mask = tcg_temp_new_ptr();
 base = get_gpr(s, rs1, EXT_NONE);
@@ -660,7 +657,6 @@ static bool

[PATCH v8 08/10] trans_rvv.c.inc: remove redundant mark_vs_dirty() calls

2024-03-08 Thread Daniel Henrique Barboza

trans_vmv_v_i, trans_vfmv_v_f and the trans_##NAME macro from
GEN_VMV_WHOLE_TRANS() are calling mark_vs_dirty() in both branches of
their 'if' conditionals.

Call it just once in the end like other functions are doing.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
Reviewed-by: Alistair Francis 
Reviewed-by: Philippe Mathieu-Daudé 
---
 target/riscv/insn_trans/trans_rvv.c.inc | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index 3ec18412fe..fb9795c9f7 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -2065,7 +2065,6 @@ static bool trans_vmv_v_i(DisasContext *s, arg_vmv_v_i *a)
 if (s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) {
 tcg_gen_gvec_dup_imm(s->sew, vreg_ofs(s, a->rd),
  MAXSZ(s), MAXSZ(s), simm);
-mark_vs_dirty(s);
 } else {
 TCGv_i32 desc;
 TCGv_i64 s1;
@@ -2083,9 +2082,8 @@ static bool trans_vmv_v_i(DisasContext *s, arg_vmv_v_i *a)
   s->cfg_ptr->vlenb, data));
 tcg_gen_addi_ptr(dest, tcg_env, vreg_ofs(s, a->rd));
 fns[s->sew](dest, s1, tcg_env, desc);
-
-mark_vs_dirty(s);
 }
+mark_vs_dirty(s);
 return true;
 }
 return false;
@@ -2612,7 +2610,6 @@ static bool trans_vfmv_v_f(DisasContext *s, arg_vfmv_v_f 
*a)
 
 tcg_gen_gvec_dup_i64(s->sew, vreg_ofs(s, a->rd),
  MAXSZ(s), MAXSZ(s), t1);
-mark_vs_dirty(s);
 } else {
 TCGv_ptr dest;
 TCGv_i32 desc;
@@ -2635,9 +2632,8 @@ static bool trans_vfmv_v_f(DisasContext *s, arg_vfmv_v_f 
*a)
 tcg_gen_addi_ptr(dest, tcg_env, vreg_ofs(s, a->rd));
 
 fns[s->sew - 1](dest, t1, tcg_env, desc);
-
-mark_vs_dirty(s);
 }
+mark_vs_dirty(s);
 return true;
 }
 return false;
@@ -3567,12 +3563,11 @@ static bool trans_##NAME(DisasContext *s, arg_##NAME * 
a)   \
 if (s->vstart_eq_zero) {\
 tcg_gen_gvec_mov(s->sew, vreg_ofs(s, a->rd),\
  vreg_ofs(s, a->rs2), maxsz, maxsz);\
-mark_vs_dirty(s);   \
 } else {\
 tcg_gen_gvec_2_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs2), \
tcg_env, maxsz, maxsz, 0, gen_helper_vmvr_v); \
-mark_vs_dirty(s);   \
 }   \
+mark_vs_dirty(s);   \
 return true;\
 }   \
 return false;   \
-- 
2.43.2

[PATCH v8 01/10] target/riscv/vector_helper.c: set vstart = 0 in GEN_VEXT_VSLIDEUP_VX()

2024-03-08 Thread Daniel Henrique Barboza

The helper isn't setting env->vstart = 0 after its execution, as is
expected of every vector instruction that completes successfully.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
Reviewed-by: Alistair Francis 
---
 target/riscv/vector_helper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index fe56c007d5..ca79571ae2 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4781,6 +4781,7 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, 
void *vs2, \
 } \
 *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset));  \
 } \
+env->vstart = 0;  \
 /* set tail elements to 1s */ \
 vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz);  \
 }
-- 
2.43.2

[PATCH v8 09/10] target/riscv: Clear vstart_eq_zero flag

2024-03-08 Thread Daniel Henrique Barboza

From: Ivan Klokov 

The vstart_eq_zero flag is set at the beginning of the translation
phase from the env->vstart variable. During the execution phase all
functions will set env->vstart = 0 after a successful execution,
but the vstart_eq_zero flag remains the same as at the start of the
block. This will wrongly cause SIGILLs in translations that require
env->vstart = 0 and might be reading vstart_eq_zero = false.

This patch adds a new finalize_rvv_inst() helper that is called at the
end of each vector instruction that will both update vstart_eq_zero and
do a mark_vs_dirty().
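
The stale-flag problem can be modelled with a small standalone sketch. This is not the QEMU implementation: `ToyCtx` and the `toy_*` functions are hypothetical, and the sketch assumes that "update vstart_eq_zero" means setting it to true, since every completed vector instruction leaves vstart at 0:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant DisasContext fields. */
typedef struct {
    bool vstart_eq_zero;    /* sampled from env->vstart when the block starts */
    bool vs_dirty;          /* stands in for the effect of mark_vs_dirty() */
} ToyCtx;

/* Assumed shape of the finalizer: mark VS dirty and refresh the flag. */
static void toy_finalize_rvv_inst(ToyCtx *ctx)
{
    ctx->vs_dirty = true;
    ctx->vstart_eq_zero = true;
}

/* A vector insn with no vstart requirement. */
static bool toy_trans_any(ToyCtx *ctx)
{
    toy_finalize_rvv_inst(ctx);
    return true;
}

/* A vector insn that is illegal unless vstart == 0 at translation time. */
static bool toy_trans_needs_vstart_zero(ToyCtx *ctx)
{
    if (!ctx->vstart_eq_zero) {
        return false;       /* would be reported as an illegal instruction */
    }
    toy_finalize_rvv_inst(ctx);
    return true;
}
```

In this model a block that starts with vstart != 0 rejects the second kind of instruction at first, but accepts it once any earlier vector instruction in the same block has been finalized.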

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1976
Signed-off-by: Ivan Klokov 
Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
Reviewed-by: Alistair Francis 
---
 target/riscv/insn_trans/trans_rvbf16.c.inc |  6 +-
 target/riscv/insn_trans/trans_rvv.c.inc| 83 --
 target/riscv/insn_trans/trans_rvvk.c.inc   | 12 ++--
 target/riscv/translate.c   |  6 ++
 4 files changed, 59 insertions(+), 48 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvbf16.c.inc 
b/target/riscv/insn_trans/trans_rvbf16.c.inc
index a842e76a6b..0a9cd1ec31 100644
--- a/target/riscv/insn_trans/trans_rvbf16.c.inc
+++ b/target/riscv/insn_trans/trans_rvbf16.c.inc
@@ -83,7 +83,7 @@ static bool trans_vfncvtbf16_f_f_w(DisasContext *ctx, 
arg_vfncvtbf16_f_f_w *a)
ctx->cfg_ptr->vlenb,
ctx->cfg_ptr->vlenb, data,
gen_helper_vfncvtbf16_f_f_w);
-mark_vs_dirty(ctx);
+finalize_rvv_inst(ctx);
 return true;
 }
 return false;
@@ -108,7 +108,7 @@ static bool trans_vfwcvtbf16_f_f_v(DisasContext *ctx, 
arg_vfwcvtbf16_f_f_v *a)
ctx->cfg_ptr->vlenb,
ctx->cfg_ptr->vlenb, data,
gen_helper_vfwcvtbf16_f_f_v);
-mark_vs_dirty(ctx);
+finalize_rvv_inst(ctx);
 return true;
 }
 return false;
@@ -135,7 +135,7 @@ static bool trans_vfwmaccbf16_vv(DisasContext *ctx, 
arg_vfwmaccbf16_vv *a)
ctx->cfg_ptr->vlenb,
ctx->cfg_ptr->vlenb, data,
gen_helper_vfwmaccbf16_vv);
-mark_vs_dirty(ctx);
+finalize_rvv_inst(ctx);
 return true;
 }
 return false;
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index fb9795c9f7..36941ceba2 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -167,7 +167,7 @@ static bool do_vsetvl(DisasContext *s, int rd, int rs1, 
TCGv s2)
 
 gen_helper_vsetvl(dst, tcg_env, s1, s2);
 gen_set_gpr(s, rd, dst);
-mark_vs_dirty(s);
+finalize_rvv_inst(s);
 
 gen_update_pc(s, s->cur_insn_len);
 lookup_and_goto_ptr(s);
@@ -187,7 +187,7 @@ static bool do_vsetivli(DisasContext *s, int rd, TCGv s1, 
TCGv s2)
 
 gen_helper_vsetvl(dst, tcg_env, s1, s2);
 gen_set_gpr(s, rd, dst);
-mark_vs_dirty(s);
+finalize_rvv_inst(s);
 gen_update_pc(s, s->cur_insn_len);
 lookup_and_goto_ptr(s);
 s->base.is_jmp = DISAS_NORETURN;
@@ -657,6 +657,7 @@ static bool ldst_us_trans(uint32_t vd, uint32_t rs1, 
uint32_t data,
 tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
 }
 
+finalize_rvv_inst(s);
 return true;
 }
 
@@ -812,6 +813,7 @@ static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, 
uint32_t rs2,
 
 fn(dest, mask, base, stride, tcg_env, desc);
 
+finalize_rvv_inst(s);
 return true;
 }
 
@@ -913,6 +915,7 @@ static bool ldst_index_trans(uint32_t vd, uint32_t rs1, 
uint32_t vs2,
 
 fn(dest, mask, base, index, tcg_env, desc);
 
+finalize_rvv_inst(s);
 return true;
 }
 
@@ -1043,7 +1046,7 @@ static bool ldff_trans(uint32_t vd, uint32_t rs1, 
uint32_t data,
 
 fn(dest, mask, base, tcg_env, desc);
 
-mark_vs_dirty(s);
+finalize_rvv_inst(s);
 return true;
 }
 
@@ -1100,6 +1103,7 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, 
uint32_t nf,
 
 fn(dest, base, tcg_env, desc);
 
+finalize_rvv_inst(s);
 return true;
 }
 
@@ -1189,7 +1193,7 @@ do_opivv_gvec(DisasContext *s, arg_rmrr *a, GVecGen3Fn 
*gvec_fn,
tcg_env, s->cfg_ptr->vlenb,
s->cfg_ptr->vlenb, data, fn);
 }
-mark_vs_dirty(s);
+finalize_rvv_inst(s);
 return true;
 }
 
@@ -1240,7 +1244,7 @@ static bool opivx_trans(uint32_t vd, uint32_t rs1, 
uint32_t vs2, uint32_t vm,
 
 fn(dest, mask, src1, src2, tcg_env, desc);
 
-mark_vs_dirty(s);
+finalize_rvv_inst(s);
 return true;
 }
 
@@ -1265,7 +1269,7 @@ do_opivx_gvec(DisasContext *s, arg_rmrr *a, GVecGen2sFn 
*gvec_fn,
 gvec_fn(s->sew, vreg_ofs(s, a->rd), vreg_ofs(s, a->rs2),
 src1, MAXSZ(s), MAXSZ(s));
 
-mark_vs_dirty(s);
+finalize_rvv_inst(s);
 return

[PATCH v8 04/10] target/riscv/vector_helper.c: update tail with vext_set_tail_elems_1s()

2024-03-08 Thread Daniel Henrique Barboza

Change all code that updates tail elems to use vext_set_tail_elems_1s()
instead of vext_set_elems_1s().

Setting 'env->vstart=0' needs to be the very last thing a helper does
because env->vstart is being checked by vext_set_tail_elems_1s().

A side effect of this change is that a lot of 'vta' local variables became
unused. The reason is that 'vta' was being fetched to be used with
vext_set_elems_1s() but vext_set_tail_elems_1s() doesn't use it - 'vta' is
retrieved inside the helper using 'desc'.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/vector_helper.c | 130 ++-
 1 file changed, 52 insertions(+), 78 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 544234c2d8..2f715fea5e 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -878,7 +878,6 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, 
  \
 uint32_t esz = sizeof(ETYPE); \
 uint32_t total_elems =\
 vext_get_total_elems(env, desc, esz); \
-uint32_t vta = vext_vta(desc);\
 uint32_t i;   \
   \
 for (i = env->vstart; i < vl; i++) {  \
@@ -888,9 +887,9 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, 
  \
   \
 *((ETYPE *)vd + H(i)) = DO_OP(s2, s1, carry); \
 } \
-env->vstart = 0;  \
 /* set tail elements to 1s */ \
-vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz);  \
+vext_set_tail_elems_1s(env, vd, desc, esz, total_elems);  \
+env->vstart = 0;  \
 }
 
 GEN_VEXT_VADC_VVM(vadc_vvm_b, uint8_t,  H1, DO_VADC)
@@ -910,7 +909,6 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void 
*vs2,\
 uint32_t vl = env->vl;   \
 uint32_t esz = sizeof(ETYPE);\
 uint32_t total_elems = vext_get_total_elems(env, desc, esz); \
-uint32_t vta = vext_vta(desc);   \
 uint32_t i;  \
  \
 for (i = env->vstart; i < vl; i++) { \
@@ -919,9 +917,9 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void 
*vs2,\
  \
 *((ETYPE *)vd + H(i)) = DO_OP(s2, (ETYPE)(target_long)s1, carry);\
 }\
-env->vstart = 0; \
 /* set tail elements to 1s */\
-vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \
+vext_set_tail_elems_1s(env, vd, desc, esz, total_elems); \
+env->vstart = 0; \
 }
 
 GEN_VEXT_VADC_VXM(vadc_vxm_b, uint8_t,  H1, DO_VADC)
@@ -1078,7 +1076,6 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1,  
\
 uint32_t vl = env->vl;\
 uint32_t esz = sizeof(TS1);   \
 uint32_t total_elems = vext_get_total_elems(env, desc, esz);  \
-uint32_t vta = vext_vta(desc);\
 uint32_t vma = vext_vma(desc);\
 uint32_t i;   \
   \
@@ -1092,9 +1089,9 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1,  
\
 TS2 s2 = *((TS2 *)vs2 + HS2(i));  \
 *((TS1 *)vd + HS1(i)) = OP(s2, s1 & MASK);\
 } \
-env->vstart = 0;  \
 /* set tail elements to 1s */ \
-vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz);  \
+vext_set_tail_elems_1s(env, vd, desc, esz, total_elems);  \
+env->vstart = 0;  \
 }
 
 GEN_VEXT_SHIFT_VV(vsll_vv_b, uint8_t,  uint8_t, H1, H1, DO_SLL, 0x7)
@@ -1125,7 +1122,6 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1,
  \
 uint32_t esz = sizeof(TD);

[PATCH v8 03/10] target/riscv/vector_helper.c: do vstart=0 after updating tail

2024-03-08 Thread Daniel Henrique Barboza

vext_vv_rm_1() and vext_vx_rm_1() are setting vstart = 0 before their
respective callers (vext_vv_rm_2() and vext_vx_rm_2()) update the tail
elements.

This is benign now, but we'll convert the tail updates to use
vext_set_tail_elems_1s(), and this function is sensitive to vstart
changes. Do vstart = 0 after vext_set_elems_1s() now to make the
conversion easier.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/vector_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index db1d3f77ce..544234c2d8 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -1927,7 +1927,6 @@ vext_vv_rm_1(void *vd, void *v0, void *vs1, void *vs2,
 }
 fn(vd, vs1, vs2, i, env, vxrm);
 }
-env->vstart = 0;
 }
 
 static inline void
@@ -1962,6 +1961,7 @@ vext_vv_rm_2(void *vd, void *v0, void *vs1, void *vs2,
 }
 /* set tail elements to 1s */
 vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz);
+env->vstart = 0;
 }
 
 /* generate helpers for fixed point instructions with OPIVV format */
@@ -2052,7 +2052,6 @@ vext_vx_rm_1(void *vd, void *v0, target_long s1, void 
*vs2,
 }
 fn(vd, s1, vs2, i, env, vxrm);
 }
-env->vstart = 0;
 }
 
 static inline void
@@ -2087,6 +2086,7 @@ vext_vx_rm_2(void *vd, void *v0, target_long s1, void 
*vs2,
 }
 /* set tail elements to 1s */
 vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz);
+env->vstart = 0;
 }
 
 /* generate helpers for fixed point instructions with OPIVX format */
-- 
2.43.2

[PATCH v8 06/10] trans_rvv.c.inc: set vstart = 0 in int scalar move insns

2024-03-08 Thread Daniel Henrique Barboza

trans_vmv_x_s, trans_vmv_s_x, trans_vfmv_f_s and trans_vfmv_s_f aren't
setting vstart = 0 after execution. This is usually done by a helper in
vector_helper.c but these functions don't use helpers.

We'll set vstart after any potential 'over' brconds, and that will
mandate a mark_vs_dirty() too.

Fixes: dedc53cbc9 ("target/riscv: rvv-1.0: integer scalar move instructions")
Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/insn_trans/trans_rvv.c.inc | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index e42728990e..f3caabc101 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -3356,6 +3356,13 @@ static void vec_element_storei(DisasContext *s, int vreg,
 store_element(val, tcg_env, endian_ofs(s, vreg, idx), s->sew);
 }
 
+static void vec_set_vstart_zero(void)
+{
+TCGv_i32 t_zero = tcg_constant_i32(0);
+
+tcg_gen_st_i32(t_zero, tcg_env, offsetof(CPURISCVState, vstart));
+}
+
 /* vmv.x.s rd, vs2 # x[rd] = vs2[0] */
 static bool trans_vmv_x_s(DisasContext *s, arg_vmv_x_s *a)
 {
@@ -3373,6 +3380,8 @@ static bool trans_vmv_x_s(DisasContext *s, arg_vmv_x_s *a)
 vec_element_loadi(s, t1, a->rs2, 0, true);
 tcg_gen_trunc_i64_tl(dest, t1);
 gen_set_gpr(s, a->rd, dest);
+vec_set_vstart_zero();
+mark_vs_dirty(s);
 return true;
 }
 return false;
@@ -3399,8 +3408,9 @@ static bool trans_vmv_s_x(DisasContext *s, arg_vmv_s_x *a)
 s1 = get_gpr(s, a->rs1, EXT_NONE);
 tcg_gen_ext_tl_i64(t1, s1);
 vec_element_storei(s, a->rd, 0, t1);
-mark_vs_dirty(s);
 gen_set_label(over);
+vec_set_vstart_zero();
+mark_vs_dirty(s);
 return true;
 }
 return false;
@@ -3427,6 +3437,8 @@ static bool trans_vfmv_f_s(DisasContext *s, arg_vfmv_f_s 
*a)
 }
 
 mark_fs_dirty(s);
+vec_set_vstart_zero();
+mark_vs_dirty(s);
 return true;
 }
 return false;
@@ -3452,8 +3464,9 @@ static bool trans_vfmv_s_f(DisasContext *s, arg_vfmv_s_f 
*a)
 do_nanbox(s, t1, cpu_fpr[a->rs1]);
 
 vec_element_storei(s, a->rd, 0, t1);
-mark_vs_dirty(s);
 gen_set_label(over);
+vec_set_vstart_zero();
+mark_vs_dirty(s);
 return true;
 }
 return false;
-- 
2.43.2

[PATCH v8 02/10] target/riscv: handle vstart >= vl in vext_set_tail_elems_1s()

2024-03-08 Thread Daniel Henrique Barboza

We're going to make changes that will require each helper to be
responsible for the 'vstart' management, i.e. we will relax the
'vstart < vl' assumption that helpers have today.

To do that we'll need to deal with how we're updating tail elements
first. We can't update them if vstart >= vl.

We already have the vext_set_tail_elems_1s() helper to update tail
elements.  Change it to accept an 'env' pointer, where we can read both
vstart and vl, and make it a no-op if vstart >= vl. Note that the
callers will need to set env->vstart = 0 after the helper from now on.

We'll enforce the use of this helper to update tail elements on all
instructions, making everyone able to skip the tail update if vstart
isn't adequate.

Let's also simplify the API a little by removing the 'nf' argument since
it can be derived from 'desc'.
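
The intended behaviour can be sketched outside QEMU as follows. This is a simplified model under stated assumptions: `ToyEnv` and `toy_set_tail_elems_1s` are hypothetical names, and `vta` and `nf` are passed explicitly here instead of being decoded from 'desc':

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the vstart/vl fields of CPURISCVState. */
typedef struct {
    uint32_t vstart;
    uint32_t vl;
} ToyEnv;

/*
 * Fill the tail of each of the 'nf' register segments with all-ones bytes.
 * Does nothing when tail-agnostic is off or when vstart >= vl, because in
 * that case no destination element may be updated at all.
 */
static void toy_set_tail_elems_1s(const ToyEnv *env, uint8_t *vd,
                                  uint32_t vta, uint32_t nf,
                                  uint32_t esz, uint32_t max_elems)
{
    uint32_t k;

    assert(env->vl <= max_elems);

    if (vta == 0 || env->vstart >= env->vl) {
        return;
    }

    for (k = 0; k < nf; k++) {
        uint32_t begin = (k * max_elems + env->vl) * esz;
        uint32_t end = (k * max_elems + max_elems) * esz;

        memset(vd + begin, 0xff, end - begin);
    }
}
```

Because the check reads `env->vstart`, a caller that zeroed vstart first would defeat it, which is why the reset has to come after this call.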

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/vector_helper.c | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index ca79571ae2..db1d3f77ce 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -174,19 +174,27 @@ GEN_VEXT_ST_ELEM(ste_h, int16_t, H2, stw)
 GEN_VEXT_ST_ELEM(ste_w, int32_t, H4, stl)
 GEN_VEXT_ST_ELEM(ste_d, int64_t, H8, stq)
 
-static void vext_set_tail_elems_1s(target_ulong vl, void *vd,
-   uint32_t desc, uint32_t nf,
-   uint32_t esz, uint32_t max_elems)
+static void vext_set_tail_elems_1s(CPURISCVState *env, void *vd,
+   uint32_t desc, uint32_t esz,
+   uint32_t max_elems)
 {
 uint32_t vta = vext_vta(desc);
+uint32_t nf = vext_nf(desc);
 int k;
 
-if (vta == 0) {
+/*
+ * Section 5.4 of the RVV spec mentions:
+ * "When vstart ≥ vl, there are no body elements, and no
+ *  elements are updated in any destination vector register
+ *  group, including that no tail elements are updated
+ *  with agnostic values."
+ */
+if (vta == 0 || env->vstart >= env->vl) {
 return;
 }
 
 for (k = 0; k < nf; ++k) {
-vext_set_elems_1s(vd, vta, (k * max_elems + vl) * esz,
+vext_set_elems_1s(vd, vta, (k * max_elems + env->vl) * esz,
   (k * max_elems + max_elems) * esz);
 }
 }
@@ -222,9 +230,8 @@ vext_ldst_stride(void *vd, void *v0, target_ulong base,
 k++;
 }
 }
+vext_set_tail_elems_1s(env, vd, desc, esz, max_elems);
 env->vstart = 0;
-
-vext_set_tail_elems_1s(env->vl, vd, desc, nf, esz, max_elems);
 }
 
 #define GEN_VEXT_LD_STRIDE(NAME, ETYPE, LOAD_FN)\
@@ -281,9 +288,8 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState 
*env, uint32_t desc,
 k++;
 }
 }
+vext_set_tail_elems_1s(env, vd, desc, esz, max_elems);
 env->vstart = 0;
-
-vext_set_tail_elems_1s(evl, vd, desc, nf, esz, max_elems);
 }
 
 /*
@@ -402,9 +408,8 @@ vext_ldst_index(void *vd, void *v0, target_ulong base,
 k++;
 }
 }
+vext_set_tail_elems_1s(env, vd, desc, esz, max_elems);
 env->vstart = 0;
-
-vext_set_tail_elems_1s(env->vl, vd, desc, nf, esz, max_elems);
 }
 
 #define GEN_VEXT_LD_INDEX(NAME, ETYPE, INDEX_FN, LOAD_FN)  \
@@ -532,9 +537,8 @@ ProbeSuccess:
 k++;
 }
 }
+vext_set_tail_elems_1s(env, vd, desc, esz, max_elems);
 env->vstart = 0;
-
-vext_set_tail_elems_1s(env->vl, vd, desc, nf, esz, max_elems);
 }
 
 #define GEN_VEXT_LDFF(NAME, ETYPE, LOAD_FN)   \
-- 
2.43.2

Re: [External] Re: [PATCH v3 11/20] util/dsa: Implement DSA task asynchronous submission and wait for completion.

2024-03-08 Thread hao . xiang

> 
> On Fri, Mar 8, 2024 at 2:11 AM Jonathan Cameron
> 
>  wrote:
> 
> > 
> > On Thu, 4 Jan 2024 00:44:43 +
> > 
> >  Hao Xiang  wrote:
> > 
> >  * Add a DSA task completion callback.
> > 
> >  * DSA completion thread will call the task's completion callback
> > 
> >  on every task/batch task completion.
> > 
> >  * DSA submission path to wait for completion.
> > 
> >  * Implement CPU fallback if DSA is not able to complete the task.
> > 
> >  Signed-off-by: Hao Xiang 
> > 
> >  Signed-off-by: Bryan Zhang 
> > 
> >  Hi,
> > 
> >  One naming comment inline. You had me confused on how you were handling 
> > async
> > 
> >  processing at where this is used. Answer is that I think you aren't!
> > 
> >  +/**
> > 
> >  + * @brief Performs buffer zero comparison on a DSA batch task 
> > asynchronously.
> > 
> >  The hardware may be doing it asynchronously but unless that
> > 
> >  buffer_zero_dsa_wait() call doesn't do what it's name suggests, this 
> > function
> > 
> >  is wrapping the async hardware related stuff to make it synchronous.
> > 
> >  So name it buffer_is_zero_dsa_batch_sync()!
> > 
> >  Jonathan


Thanks for reviewing this. The first completion model I tried was to use a busy 
loop to poll for completion on the submission thread but it turned out to have 
too much unnecessary overhead. Think about 10 threads all submitting tasks and 
we end up having 10 busy loops. I moved the completion work to a dedicated 
thread and named it async! However, the async model doesn't fit well with the 
current live migration thread model so eventually I added a wait on the 
submission thread. It was intended to be async but I agree that it is not 
currently. I will rename it in the next revision.

> > 
> >  + *
> > 
> >  + * @param batch_task A pointer to the batch task.
> > 
> >  + * @param buf An array of memory buffers.
> > 
> >  + * @param count The number of buffers in the array.
> > 
> >  + * @param len The buffer length.
> > 
> >  + *
> > 
> >  + * @return Zero if successful, otherwise non-zero.
> > 
> >  + */
> > 
> >  +int
> > 
> >  +buffer_is_zero_dsa_batch_async(struct dsa_batch_task *batch_task,
> > 
> >  + const void **buf, size_t count, size_t len)
> > 
> >  +{
> > 
> >  + if (count <= 0 || count > batch_task->batch_size) {
> > 
> >  + return -1;
> > 
> >  + }
> > 
> >  +
> > 
> >  + assert(batch_task != NULL);
> > 
> >  + assert(len != 0);
> > 
> >  + assert(buf != NULL);
> > 
> >  +
> > 
> >  + if (count == 1) {
> > 
> >  + /* DSA doesn't take batch operation with only 1 task. */
> > 
> >  + buffer_zero_dsa_async(batch_task, buf[0], len);
> > 
> >  + } else {
> > 
> >  + buffer_zero_dsa_batch_async(batch_task, buf, count, len);
> > 
> >  + }
> > 
> >  +
> > 
> >  + buffer_zero_dsa_wait(batch_task);
> > 
> >  + buffer_zero_cpu_fallback(batch_task);
> > 
> >  +
> > 
> >  + return 0;
> > 
> >  +}
> > 
> >  +
> > 
> >  #endif
> >
>

[PATCH v4 8/8] Add negative tests to validate migration QAPIs

2024-03-08 Thread Het Gala

Migration QAPI arguments - uri and channels are mutually exclusive, and
exactly one of them must be provided.
Add negative validation tests, one with both arguments present and
one with none present.
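
The rule being tested amounts to "exactly one of the two arguments is set". A minimal standalone sketch of that check (`toy_migrate_args_valid` is a hypothetical name, not the QEMU function that performs the validation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true only when exactly one of 'uri' and 'channels' is provided.
 * Both set and none set are the two negative cases.
 */
static bool toy_migrate_args_valid(const char *uri, const char *channels)
{
    return (uri != NULL) != (channels != NULL);
}
```

The two new tests correspond to the inputs for which this predicate returns false.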

Signed-off-by: Het Gala 
Suggested-by: Fabiano Rosas 
Reviewed-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 54 
 1 file changed, 54 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 6ba3cfd1e4..385f696a3d 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2612,6 +2612,56 @@ static void test_validate_uuid_dst_not_set(void)
 do_test_validate_uuid(, false);
 }
 
+static void do_test_validate_uri_channel(MigrateCommon *args)
+{
+QTestState *from, *to;
+g_autofree char *connect_uri = NULL;
+
+if (test_migrate_start(&from, &to, args->listen_uri, &args->start)) {
+return;
+}
+
+/* Wait for the first serial output from the source */
+wait_for_serial("src_serial");
+
+/*
+ * 'uri' and 'channels' validation is checked even before the migration
+ * starts.
+ */
+migrate_qmp_fail(from, args->connect_uri, args->connect_channels, "{}");
+test_migrate_end(from, to, false);
+}
+
+static void test_validate_uri_channels_both_set(void)
+{
+MigrateCommon args = {
+.start = {
+.hide_stderr = true,
+},
+.listen_uri = "defer",
+.connect_uri = "tcp:127.0.0.1:0",
+.connect_channels = "[ { 'channel-type': 'main',"
+"'addr': { 'transport': 'socket',"
+"  'type': 'inet',"
+"  'host': '127.0.0.1',"
+"  'port': '0' } } ]",
+};
+
+do_test_validate_uri_channel(&args);
+}
+
+static void test_validate_uri_channels_none_set(void)
+{
+MigrateCommon args = {
+.start = {
+.hide_stderr = true,
+},
+.listen_uri = "defer",
+};
+
+do_test_validate_uri_channel(&args);
+}
+
 /*
  * The way auto_converge works, we need to do too many passes to
  * run this test.  Auto_converge logic is only run once every
@@ -3678,6 +3728,10 @@ int main(int argc, char **argv)
test_validate_uuid_src_not_set);
 migration_test_add("/migration/validate_uuid_dst_not_set",
test_validate_uuid_dst_not_set);
+migration_test_add("/migration/validate_uri/channels/both_set",
+   test_validate_uri_channels_both_set);
+migration_test_add("/migration/validate_uri/channels/none_set",
+   test_validate_uri_channels_none_set);
 /*
  * See explanation why this test is slow on function definition
  */
-- 
2.22.3

[PATCH v4 6/8] Add channels parameter in migrate_qmp

2024-03-08 Thread Het Gala

Alter migrate_qmp() to allow use of channels parameter, but only
fill the uri with correct port number if there are no channels.
Here we don't want to allow the wrong cases of having both or
none (ex: migrate_qmp_fail).

Signed-off-by: Het Gala 
Suggested-by: Fabiano Rosas 
---
 tests/qtest/migration-helpers.c | 22 +-
 tests/qtest/migration-helpers.h |  4 ++--
 tests/qtest/migration-test.c| 28 ++--
 3 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 7c17d78d6b..bf9fd61035 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -135,10 +135,6 @@ static void migrate_set_ports(QTestState *to, QList 
*channel_list)
 QListEntry *entry;
 g_autofree const char *addr_port = NULL;
 
-if (channel_list == NULL) {
-return;
-}
-
 addr = migrate_get_connect_qdict(to, "socket-address");
 
 QLIST_FOREACH_ENTRY(channel_list, entry) {
@@ -208,11 +204,10 @@ void migrate_qmp_fail(QTestState *who, const char *uri,
  * qobject_from_jsonf_nofail()) with "uri": @uri spliced in.
  */
 void migrate_qmp(QTestState *who, QTestState *to, const char *uri,
- const char *fmt, ...)
+ const char *channels, const char *fmt, ...)
 {
 va_list ap;
 QDict *args;
-QList *channel_list = NULL;
 g_autofree char *connect_uri = NULL;
 
 va_start(ap, fmt);
@@ -220,11 +215,20 @@ void migrate_qmp(QTestState *who, QTestState *to, const 
char *uri,
 va_end(ap);
 
 g_assert(!qdict_haskey(args, "uri"));
-if (!uri) {
+if (uri) {
+qdict_put_str(args, "uri", uri);
+} else if (!channels) {
 connect_uri = migrate_get_connect_uri(to, "socket-address");
+qdict_put_str(args, "uri", connect_uri);
+}
+
+g_assert(!qdict_haskey(args, "channels"));
+if (channels) {
+QObject *channels_obj = qobject_from_json(channels, &error_abort);
+QList *channel_list = qobject_to(QList, channels_obj);
+migrate_set_ports(to, channel_list);
+qdict_put_obj(args, "channels", channels_obj);
 }
-migrate_set_ports(to, channel_list);
-qdict_put_str(args, "uri", uri ? uri : connect_uri);
 
 qtest_qmp_assert_success(who,
  "{ 'execute': 'migrate', 'arguments': %p}", args);
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 4e664148a5..1339835698 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -25,9 +25,9 @@ typedef struct QTestMigrationState {
 bool migrate_watch_for_events(QTestState *who, const char *name,
   QDict *event, void *opaque);
 
-G_GNUC_PRINTF(4, 5)
+G_GNUC_PRINTF(5, 6)
 void migrate_qmp(QTestState *who, QTestState *to, const char *uri,
- const char *fmt, ...);
+ const char *channels, const char *fmt, ...);
 
 G_GNUC_PRINTF(3, 4)
 void migrate_incoming_qmp(QTestState *who, const char *uri,
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 61aa53c3f7..b1e5660dbf 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1305,7 +1305,7 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 wait_for_serial("src_serial");
 wait_for_suspend(from, _state);
 
-migrate_qmp(from, to, NULL, "{}");
+migrate_qmp(from, to, NULL, NULL, "{}");
 
 migrate_wait_for_dirty_mem(from, to);
 
@@ -1455,7 +1455,7 @@ static void postcopy_recover_fail(QTestState *from, 
QTestState *to)
 g_assert_cmpint(ret, ==, 1);
 
 migrate_recover(to, "fd:fd-mig");
-migrate_qmp(from, to, "fd:fd-mig", "{'resume': true}");
+migrate_qmp(from, to, "fd:fd-mig", NULL, "{'resume': true}");
 
 /*
  * Make sure both QEMU instances will go into RECOVER stage, then test
@@ -1543,7 +1543,7 @@ static void test_postcopy_recovery_common(MigrateCommon 
*args)
  * Try to rebuild the migration channel using the resume flag and
  * the newly created channel
  */
-migrate_qmp(from, to, uri, "{'resume': true}");
+migrate_qmp(from, to, uri, NULL, "{'resume': true}");
 
 /* Restore the postcopy bandwidth to unlimited */
 migrate_set_parameter_int(from, "max-postcopy-bandwidth", 0);
@@ -1624,7 +1624,7 @@ static void test_baddest(void)
 if (test_migrate_start(&from, &to, "tcp:127.0.0.1:0", &args)) {
 return;
 }
-migrate_qmp(from, to, "tcp:127.0.0.1:0", "{}");
+migrate_qmp(from, to, "tcp:127.0.0.1:0", NULL, "{}");
 wait_for_migration_fail(from, false);
 test_migrate_end(from, to, false);
 }
@@ -1663,7 +1663,7 @@ static void test_analyze_script(void)
 uri = g_strdup_printf("exec:cat > %s", file);
 
 migrate_ensure_converge(from);
-migrate_qmp(from, to, uri, "{}");
+migrate_qmp(from, to, uri, NULL, "{}");
 wait_for_migration_complete(from);
 
 pid = fork();
@@ -1725,7 +1725,7 @@

[PATCH v4 5/8] Add migrate_set_ports into migrate_qmp to update migration port value

2024-03-08 Thread Het Gala

migrate_get_connect_qdict() gets a QDict with the destination QEMU's
socket address parameters. migrate_set_ports() walks the list of channels,
reads the port from each channel's address QDict, and fills in the correct
value if the test set it to 0.

Signed-off-by: Het Gala 
Suggested-by: Fabiano Rosas 
---
 tests/qtest/migration-helpers.c | 73 +
 1 file changed, 73 insertions(+)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 91c8a817d2..7c17d78d6b 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -17,6 +17,8 @@
 #include "qapi/qapi-visit-sockets.h"
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/error.h"
+#include "qapi/qmp/qlist.h"
+#include "include/qemu/cutils.h"
 
 #include "migration-helpers.h"
 
@@ -49,6 +51,37 @@ static char *SocketAddress_to_str(SocketAddress *addr)
 }
 }
 
+static QDict *SocketAddress_to_qdict(SocketAddress *addr)
+{
+QDict *dict = qdict_new();
+
+switch (addr->type) {
+case SOCKET_ADDRESS_TYPE_INET:
+qdict_put_str(dict, "type", "inet");
+qdict_put_str(dict, "host", addr->u.inet.host);
+qdict_put_str(dict, "port", addr->u.inet.port);
+break;
+case SOCKET_ADDRESS_TYPE_UNIX:
+qdict_put_str(dict, "type", "unix");
+qdict_put_str(dict, "path", addr->u.q_unix.path);
+break;
+case SOCKET_ADDRESS_TYPE_FD:
+qdict_put_str(dict, "type", "fd");
+qdict_put_str(dict, "str", addr->u.fd.str);
+break;
+case SOCKET_ADDRESS_TYPE_VSOCK:
+qdict_put_str(dict, "type", "vsock");
+qdict_put_str(dict, "cid", addr->u.vsock.cid);
+qdict_put_str(dict, "port", addr->u.vsock.port);
+break;
+default:
+g_assert_not_reached();
+break;
+}
+
+return dict;
+}
+
 static SocketAddress *
 migrate_get_socket_address(QTestState *who, const char *parameter)
 {
@@ -83,6 +116,44 @@ migrate_get_connect_uri(QTestState *who, const char 
*parameter)
 return connect_uri;
 }
 
+static QDict *
+migrate_get_connect_qdict(QTestState *who, const char *parameter)
+{
+SocketAddress *addrs;
+QDict *connect_qdict;
+
+addrs = migrate_get_socket_address(who, parameter);
+connect_qdict = SocketAddress_to_qdict(addrs);
+
+qapi_free_SocketAddress(addrs);
+return connect_qdict;
+}
+
+static void migrate_set_ports(QTestState *to, QList *channel_list)
+{
+QDict *addr;
+QListEntry *entry;
+g_autofree const char *addr_port = NULL;
+
+if (channel_list == NULL) {
+return;
+}
+
+addr = migrate_get_connect_qdict(to, "socket-address");
+
+QLIST_FOREACH_ENTRY(channel_list, entry) {
+QDict *channel = qobject_to(QDict, qlist_entry_obj(entry));
+QDict *addrdict = qdict_get_qdict(channel, "addr");
+
+if (qdict_haskey(addrdict, "port") &&
+qdict_haskey(addr, "port") &&
+(strcmp(qdict_get_str(addrdict, "port"), "0") == 0)) {
+addr_port = qdict_get_str(addr, "port");
+qdict_put_str(addrdict, "port", addr_port);
+}
+}
+}
+
 bool migrate_watch_for_events(QTestState *who, const char *name,
   QDict *event, void *opaque)
 {
@@ -141,6 +212,7 @@ void migrate_qmp(QTestState *who, QTestState *to, const 
char *uri,
 {
 va_list ap;
 QDict *args;
+QList *channel_list = NULL;
 g_autofree char *connect_uri = NULL;
 
 va_start(ap, fmt);
@@ -151,6 +223,7 @@ void migrate_qmp(QTestState *who, QTestState *to, const 
char *uri,
 if (!uri) {
 connect_uri = migrate_get_connect_uri(to, "socket-address");
 }
+migrate_set_ports(to, channel_list);
 qdict_put_str(args, "uri", uri ? uri : connect_uri);
 
 qtest_qmp_assert_success(who,
-- 
2.22.3
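
The port fix-up described in this patch can be sketched without any QEMU types. This is a minimal illustration only: `struct channel_addr` and `set_ports()` are hypothetical stand-ins for the QDict-based code, and the real helper reads the port from the destination's "socket-address" reply.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one channel's "addr" dictionary. */
struct channel_addr {
    int has_port;    /* non-zero if the address carries a "port" key */
    char port[16];   /* port as a string, e.g. "0" or "44123" */
};

/*
 * Replace every placeholder port "0" in @channels with @real_port,
 * mirroring the check done by migrate_set_ports() in the patch.
 * Returns the number of channels that were updated.
 */
static int set_ports(struct channel_addr *channels, size_t n,
                     const char *real_port)
{
    int updated = 0;

    assert(real_port != NULL);
    for (size_t i = 0; i < n; i++) {
        if (channels[i].has_port && strcmp(channels[i].port, "0") == 0) {
            snprintf(channels[i].port, sizeof(channels[i].port), "%s",
                     real_port);
            updated++;
        }
    }
    return updated;
}
```

Addresses without a port key (unix or fd, for example) and addresses with an explicit non-zero port are left untouched.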

[PATCH v4 4/8] Add channels parameter in migrate_qmp_fail

2024-03-08 Thread Het Gala

Alter migrate_qmp_fail() to accept uri and channels independently.
For channels, convert the JSON string to a QObject.
There is no need to call migrate_get_socket_address() here because
the command fails before the migration starts anyway.

Signed-off-by: Het Gala 
Suggested-by: Fabiano Rosas 
---
 tests/qtest/migration-helpers.c | 13 +++--
 tests/qtest/migration-helpers.h |  5 +++--
 tests/qtest/migration-test.c|  4 ++--
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 3c3fe9d8aa..91c8a817d2 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -102,7 +102,8 @@ bool migrate_watch_for_events(QTestState *who, const char 
*name,
 return false;
 }
 
-void migrate_qmp_fail(QTestState *who, const char *uri, const char *fmt, ...)
+void migrate_qmp_fail(QTestState *who, const char *uri,
+  const char *channels, const char *fmt, ...)
 {
 va_list ap;
 QDict *args, *err;
@@ -112,7 +113,15 @@ void migrate_qmp_fail(QTestState *who, const char *uri, 
const char *fmt, ...)
 va_end(ap);
 
 g_assert(!qdict_haskey(args, "uri"));
-qdict_put_str(args, "uri", uri);
+if (uri) {
+qdict_put_str(args, "uri", uri);
+}
+
+g_assert(!qdict_haskey(args, "channels"));
+if (channels) {
+QObject *channels_obj = qobject_from_json(channels, &error_abort);
+qdict_put_obj(args, "channels", channels_obj);
+}
 
 err = qtest_qmp_assert_failure_ref(
 who, "{ 'execute': 'migrate', 'arguments': %p}", args);
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index e16a34c796..4e664148a5 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -33,8 +33,9 @@ G_GNUC_PRINTF(3, 4)
 void migrate_incoming_qmp(QTestState *who, const char *uri,
   const char *fmt, ...);
 
-G_GNUC_PRINTF(3, 4)
-void migrate_qmp_fail(QTestState *who, const char *uri, const char *fmt, ...);
+G_GNUC_PRINTF(4, 5)
+void migrate_qmp_fail(QTestState *who, const char *uri,
+  const char *channels, const char *fmt, ...);
 
 void migrate_set_capability(QTestState *who, const char *capability,
 bool value);
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 01255e7e7e..61aa53c3f7 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1721,7 +1721,7 @@ static void test_precopy_common(MigrateCommon *args)
 }
 
 if (args->result == MIG_TEST_QMP_ERROR) {
-migrate_qmp_fail(from, args->connect_uri, "{}");
+migrate_qmp_fail(from, args->connect_uri, NULL, "{}");
 goto finish;
 }
 
@@ -1816,7 +1816,7 @@ static void test_file_common(MigrateCommon *args, bool 
stop_src)
 }
 
 if (args->result == MIG_TEST_QMP_ERROR) {
-migrate_qmp_fail(from, args->connect_uri, "{}");
+migrate_qmp_fail(from, args->connect_uri, NULL, "{}");
 goto finish;
 }
 
-- 
2.22.3

[PATCH v4 7/8] Add multifd_tcp_plain test using list of channels instead of uri

2024-03-08 Thread Het Gala

Add a positive test to check multifd live migration, this time
using a list of channels (restricted to one) as the starting point
instead of a simple URI string.

Signed-off-by: Het Gala 
Suggested-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b1e5660dbf..6ba3cfd1e4 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -659,6 +659,13 @@ typedef struct {
  */
 const char *connect_uri;
 
+/*
+ * Optional: JSON-formatted list of src QEMU URIs. If a port is
+ * defined as '0' in any QDict key a value of '0' will be
+ * automatically converted to the correct destination port.
+ */
+const char *connect_channels;
+
 /* Optional: callback to run at start to set migration parameters */
 TestMigrateStartHook start_hook;
 /* Optional: callback to run at finish to cleanup */
@@ -2744,7 +2751,7 @@ test_migrate_precopy_tcp_multifd_zstd_start(QTestState 
*from,
 }
 #endif /* CONFIG_ZSTD */
 
-static void test_multifd_tcp_none(void)
+static void test_multifd_tcp_uri_none(void)
 {
 MigrateCommon args = {
 .listen_uri = "defer",
@@ -2759,6 +2766,21 @@ static void test_multifd_tcp_none(void)
 test_precopy_common(&args);
 }
 
+static void test_multifd_tcp_channels_none(void)
+{
+MigrateCommon args = {
+.listen_uri = "defer",
+.start_hook = test_migrate_precopy_tcp_multifd_start,
+.live = true,
+.connect_channels = "[ { 'channel-type': 'main',"
+"'addr': { 'transport': 'socket',"
+"  'type': 'inet',"
+"  'host': '127.0.0.1',"
+"  'port': '0' } } ]",
+};
+test_precopy_common(&args);
+}
+
 static void test_multifd_tcp_zlib(void)
 {
 MigrateCommon args = {
@@ -3668,8 +3690,10 @@ int main(int argc, char **argv)
test_migrate_dirty_limit);
 }
 }
-migration_test_add("/migration/multifd/tcp/plain/none",
-   test_multifd_tcp_none);
+migration_test_add("/migration/multifd/tcp/uri/plain/none",
+   test_multifd_tcp_uri_none);
+migration_test_add("/migration/multifd/tcp/channels/plain/none",
+   test_multifd_tcp_channels_none);
 migration_test_add("/migration/multifd/tcp/plain/cancel",
test_multifd_tcp_cancel);
 migration_test_add("/migration/multifd/tcp/plain/zlib",
-- 
2.22.3

[PATCH v4 2/8] Replace connect_uri and move migrate_get_socket_address inside migrate_qmp

2024-03-08 Thread Het Gala

Move the calls to migrate_get_socket_address() into migrate_qmp().
Get rid of connect_uri and replace it with args->connect_uri only
because 'to' object will help to generate connect_uri with the
correct port number.

Signed-off-by: Het Gala 
Suggested-by: Fabiano Rosas 
Reviewed-by: Fabiano Rosas 
---
 tests/qtest/migration-helpers.c | 55 ++-
 tests/qtest/migration-test.c| 79 +
 2 files changed, 64 insertions(+), 70 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index b6206a04fb..9af3c7d4d5 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -13,6 +13,10 @@
 #include "qemu/osdep.h"
 #include "qemu/ctype.h"
 #include "qapi/qmp/qjson.h"
+#include "qemu/sockets.h"
+#include "qapi/qapi-visit-sockets.h"
+#include "qapi/qobject-input-visitor.h"
+#include "qapi/error.h"
 
 #include "migration-helpers.h"
 
@@ -24,6 +28,51 @@
  */
 #define MIGRATION_STATUS_WAIT_TIMEOUT 120
 
+static char *SocketAddress_to_str(SocketAddress *addr)
+{
+switch (addr->type) {
+case SOCKET_ADDRESS_TYPE_INET:
+return g_strdup_printf("tcp:%s:%s",
+   addr->u.inet.host,
+   addr->u.inet.port);
+case SOCKET_ADDRESS_TYPE_UNIX:
+return g_strdup_printf("unix:%s",
+   addr->u.q_unix.path);
+case SOCKET_ADDRESS_TYPE_FD:
+return g_strdup_printf("fd:%s", addr->u.fd.str);
+case SOCKET_ADDRESS_TYPE_VSOCK:
+return g_strdup_printf("tcp:%s:%s",
+   addr->u.vsock.cid,
+   addr->u.vsock.port);
+default:
+return g_strdup("unknown address type");
+}
+}
+
+static char *
+migrate_get_socket_address(QTestState *who, const char *parameter)
+{
+QDict *rsp;
+char *result;
+SocketAddressList *addrs;
+Visitor *iv = NULL;
+QObject *object;
+
+rsp = migrate_query(who);
+object = qdict_get(rsp, parameter);
+
+iv = qobject_input_visitor_new(object);
+visit_type_SocketAddressList(iv, NULL, &addrs, &error_abort);
+visit_free(iv);
+
+/* we are only using a single address */
+result = SocketAddress_to_str(addrs->value);
+
+qapi_free_SocketAddressList(addrs);
+qobject_unref(rsp);
+return result;
+}
+
 bool migrate_watch_for_events(QTestState *who, const char *name,
   QDict *event, void *opaque)
 {
@@ -73,13 +122,17 @@ void migrate_qmp(QTestState *who, QTestState *to, const 
char *uri,
 {
 va_list ap;
 QDict *args;
+g_autofree char *connect_uri = NULL;
 
 va_start(ap, fmt);
 args = qdict_from_vjsonf_nofail(fmt, ap);
 va_end(ap);
 
 g_assert(!qdict_haskey(args, "uri"));
-qdict_put_str(args, "uri", uri);
+if (!uri) {
+connect_uri = migrate_get_socket_address(to, "socket-address");
+}
+qdict_put_str(args, "uri", uri ? uri : connect_uri);
 
 qtest_qmp_assert_success(who,
  "{ 'execute': 'migrate', 'arguments': %p}", args);
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index d9b4e28c12..01255e7e7e 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -369,50 +369,6 @@ static void cleanup(const char *filename)
 unlink(path);
 }
 
-static char *SocketAddress_to_str(SocketAddress *addr)
-{
-switch (addr->type) {
-case SOCKET_ADDRESS_TYPE_INET:
-return g_strdup_printf("tcp:%s:%s",
-   addr->u.inet.host,
-   addr->u.inet.port);
-case SOCKET_ADDRESS_TYPE_UNIX:
-return g_strdup_printf("unix:%s",
-   addr->u.q_unix.path);
-case SOCKET_ADDRESS_TYPE_FD:
-return g_strdup_printf("fd:%s", addr->u.fd.str);
-case SOCKET_ADDRESS_TYPE_VSOCK:
-return g_strdup_printf("tcp:%s:%s",
-   addr->u.vsock.cid,
-   addr->u.vsock.port);
-default:
-return g_strdup("unknown address type");
-}
-}
-
-static char *migrate_get_socket_address(QTestState *who, const char *parameter)
-{
-QDict *rsp;
-char *result;
-SocketAddressList *addrs;
-Visitor *iv = NULL;
-QObject *object;
-
-rsp = migrate_query(who);
-object = qdict_get(rsp, parameter);
-
-iv = qobject_input_visitor_new(object);
-visit_type_SocketAddressList(iv, NULL, &addrs, &error_abort);
-visit_free(iv);
-
-/* we are only using a single address */
-result = SocketAddress_to_str(addrs->value);
-
-qapi_free_SocketAddressList(addrs);
-qobject_unref(rsp);
-return result;
-}
-
 static long long migrate_get_parameter_int(QTestState *who,
const char *parameter)
 {
@@ -1349,8 +1305,7 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 wait_for_serial("src_serial");
 wait_for_suspend(from, _state);

[PATCH v4 3/8] Replace migrate_get_connect_uri inplace of migrate_get_socket_address

2024-03-08 Thread Het Gala

migrate_get_socket_address implicitly converts SocketAddress into str.
Move migrate_get_socket_address inside migrate_get_connect_uri which
should return the uri string instead.

Signed-off-by: Het Gala 
Suggested-by: Fabiano Rosas 
---
 tests/qtest/migration-helpers.c | 26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 9af3c7d4d5..3c3fe9d8aa 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -49,12 +49,12 @@ static char *SocketAddress_to_str(SocketAddress *addr)
 }
 }
 
-static char *
+static SocketAddress *
 migrate_get_socket_address(QTestState *who, const char *parameter)
 {
 QDict *rsp;
-char *result;
 SocketAddressList *addrs;
+SocketAddress *addr;
 Visitor *iv = NULL;
 QObject *object;
 
@@ -63,14 +63,24 @@ migrate_get_socket_address(QTestState *who, const char 
*parameter)
 
 iv = qobject_input_visitor_new(object);
 visit_type_SocketAddressList(iv, NULL, &addrs, &error_abort);
+addr = addrs->value;
 visit_free(iv);
 
-/* we are only using a single address */
-result = SocketAddress_to_str(addrs->value);
-
-qapi_free_SocketAddressList(addrs);
 qobject_unref(rsp);
-return result;
+return addr;
+}
+
+static char *
+migrate_get_connect_uri(QTestState *who, const char *parameter)
+{
+SocketAddress *addrs;
+char *connect_uri;
+
+addrs = migrate_get_socket_address(who, parameter);
+connect_uri = SocketAddress_to_str(addrs);
+
+qapi_free_SocketAddress(addrs);
+return connect_uri;
 }
 
 bool migrate_watch_for_events(QTestState *who, const char *name,
@@ -130,7 +140,7 @@ void migrate_qmp(QTestState *who, QTestState *to, const 
char *uri,
 
 g_assert(!qdict_haskey(args, "uri"));
 if (!uri) {
-connect_uri = migrate_get_socket_address(to, "socket-address");
+connect_uri = migrate_get_connect_uri(to, "socket-address");
 }
 qdict_put_str(args, "uri", uri ? uri : connect_uri);
 
-- 
2.22.3
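
The address-to-URI conversion that this patch moves behind migrate_get_connect_uri() can be sketched in plain C. This is a simplified illustration, not the QEMU code: `struct sock_addr`, `enum addr_type` and `addr_to_uri()` are hypothetical names, only three address types are covered, and the result goes into a caller-supplied buffer instead of a heap string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced version of a tagged socket address. */
enum addr_type { ADDR_INET, ADDR_UNIX, ADDR_FD };

struct sock_addr {
    enum addr_type type;
    const char *host;   /* used by ADDR_INET */
    const char *port;   /* used by ADDR_INET */
    const char *path;   /* used by ADDR_UNIX */
    const char *fd;     /* used by ADDR_FD */
};

/*
 * Format @a as a migration URI into @buf.
 * Returns 0 on success, -1 if the type is unknown or @buf is too small.
 */
static int addr_to_uri(const struct sock_addr *a, char *buf, size_t len)
{
    int n;

    switch (a->type) {
    case ADDR_INET:
        n = snprintf(buf, len, "tcp:%s:%s", a->host, a->port);
        break;
    case ADDR_UNIX:
        n = snprintf(buf, len, "unix:%s", a->path);
        break;
    case ADDR_FD:
        n = snprintf(buf, len, "fd:%s", a->fd);
        break;
    default:
        return -1;
    }
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Keeping the lookup (which address the destination listens on) separate from the formatting (how that address is written as a URI) is what lets a later patch reuse the same lookup to build a QDict instead.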

[PATCH v4 1/8] Add 'to' object into migrate_qmp()

2024-03-08 Thread Het Gala

Add the 'to' object into migrate_qmp(), so we can use
migrate_get_socket_address() inside migrate_qmp() to get
the port value. This is not applied to other migrate_qmp*
because they don't need the port.

Signed-off-by: Het Gala 
Suggested-by: Fabiano Rosas 
Reviewed-by: Fabiano Rosas 
---
 tests/qtest/migration-helpers.c |  3 ++-
 tests/qtest/migration-helpers.h |  5 +++--
 tests/qtest/migration-test.c| 28 ++--
 3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index e451dbdbed..b6206a04fb 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -68,7 +68,8 @@ void migrate_qmp_fail(QTestState *who, const char *uri, const 
char *fmt, ...)
  * Arguments are built from @fmt... (formatted like
  * qobject_from_jsonf_nofail()) with "uri": @uri spliced in.
  */
-void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...)
+void migrate_qmp(QTestState *who, QTestState *to, const char *uri,
+ const char *fmt, ...)
 {
 va_list ap;
 QDict *args;
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 3bf7ded1b9..e16a34c796 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -25,8 +25,9 @@ typedef struct QTestMigrationState {
 bool migrate_watch_for_events(QTestState *who, const char *name,
   QDict *event, void *opaque);
 
-G_GNUC_PRINTF(3, 4)
-void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...);
+G_GNUC_PRINTF(4, 5)
+void migrate_qmp(QTestState *who, QTestState *to, const char *uri,
+ const char *fmt, ...);
 
 G_GNUC_PRINTF(3, 4)
 void migrate_incoming_qmp(QTestState *who, const char *uri,
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 4023d808f9..d9b4e28c12 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1350,7 +1350,7 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 wait_for_suspend(from, _state);
 
 g_autofree char *uri = migrate_get_socket_address(to, "socket-address");
-migrate_qmp(from, uri, "{}");
+migrate_qmp(from, to, uri, "{}");
 
 migrate_wait_for_dirty_mem(from, to);
 
@@ -1500,7 +1500,7 @@ static void postcopy_recover_fail(QTestState *from, 
QTestState *to)
 g_assert_cmpint(ret, ==, 1);
 
 migrate_recover(to, "fd:fd-mig");
-migrate_qmp(from, "fd:fd-mig", "{'resume': true}");
+migrate_qmp(from, to, "fd:fd-mig", "{'resume': true}");
 
 /*
  * Make sure both QEMU instances will go into RECOVER stage, then test
@@ -1588,7 +1588,7 @@ static void test_postcopy_recovery_common(MigrateCommon 
*args)
  * Try to rebuild the migration channel using the resume flag and
  * the newly created channel
  */
-migrate_qmp(from, uri, "{'resume': true}");
+migrate_qmp(from, to, uri, "{'resume': true}");
 
 /* Restore the postcopy bandwidth to unlimited */
 migrate_set_parameter_int(from, "max-postcopy-bandwidth", 0);
@@ -1669,7 +1669,7 @@ static void test_baddest(void)
 if (test_migrate_start(&from, &to, "tcp:127.0.0.1:0", &args)) {
 return;
 }
-migrate_qmp(from, "tcp:127.0.0.1:0", "{}");
+migrate_qmp(from, to, "tcp:127.0.0.1:0", "{}");
 wait_for_migration_fail(from, false);
 test_migrate_end(from, to, false);
 }
@@ -1708,7 +1708,7 @@ static void test_analyze_script(void)
 uri = g_strdup_printf("exec:cat > %s", file);
 
 migrate_ensure_converge(from);
-migrate_qmp(from, uri, "{}");
+migrate_qmp(from, to, uri, "{}");
 wait_for_migration_complete(from);
 
 pid = fork();
@@ -1777,7 +1777,7 @@ static void test_precopy_common(MigrateCommon *args)
 goto finish;
 }
 
-migrate_qmp(from, connect_uri, "{}");
+migrate_qmp(from, to, connect_uri, "{}");
 
 if (args->result != MIG_TEST_SUCCEED) {
 bool allow_active = args->result == MIG_TEST_FAIL;
@@ -1873,7 +1873,7 @@ static void test_file_common(MigrateCommon *args, bool 
stop_src)
 goto finish;
 }
 
-migrate_qmp(from, connect_uri, "{}");
+migrate_qmp(from, to, connect_uri, "{}");
 wait_for_migration_complete(from);
 
 /*
@@ -2029,7 +2029,7 @@ static void test_ignore_shared(void)
 /* Wait for the first serial output from the source */
 wait_for_serial("src_serial");
 
-migrate_qmp(from, uri, "{}");
+migrate_qmp(from, to, uri, "{}");
 
 migrate_wait_for_dirty_mem(from, to);
 
@@ -2605,7 +2605,7 @@ static void do_test_validate_uuid(MigrateStart *args, 
bool should_fail)
 /* Wait for the first serial output from the source */
 wait_for_serial("src_serial");
 
-migrate_qmp(from, uri, "{}");
+migrate_qmp(from, to, uri, "{}");
 
 if (should_fail) {
 qtest_set_expected_status(to, EXIT_FAILURE);
@@ -2708,7 +2708,7 @@ static void test_migrate_auto_converge(void)
 /* Wait for the first serial

[PATCH v4 0/8] qtest: migration: Add tests for introducing 'channels' argument in migrate QAPIs

2024-03-08 Thread Het Gala

With the recent migrate QAPI changes, the 'channels' argument can be used
directly, which avoids redundant URI string parsing.

To ensure backward compatibility, both 'uri' and 'channels' are kept as
optional parameters in the migration QMP commands. However, they are mutually
exclusive, and exactly one of them is required for a successful migration
connection. This patchset adds qtests to validate the mutually exclusive
behaviour of the 'uri' and 'channels' arguments.

Additionally, none of the existing migration qtests uses 'channels' as the
primary method for validating migration QAPIs. This patchset also adds a test
that uses only the 'channels' argument as the initial entry point for
migration QAPIs.

Patch Summary:
-
Patch 1-2:
-
Introduce the 'to' object inside migrate_qmp() and move the calls to
migrate_get_socket_address() inside migrate_qmp(). Also, replace connect_uri
with args->connect_uri everywhere.

Patch 3-6:
-
Add a channels argument to migrate_qmp and migrate_qmp_fail so that either
migration QAPI argument can be passed independently. migrate_qmp needs the
port value changed from 0 to the port reported by migrate_get_socket_address.
Add migrate_set_ports to perform this port update.

Patch 7-8:
-
Add 2 negative tests to validate the mutually exclusive behaviour of the
migration QAPI arguments. Add a positive multifd_tcp_plain qtest with only
channels as the initial entry point for migration QAPIs.

v3->v4 Changelog:

1. Introduced migrate_get_connect_uri and migrate_get_connect_qdict. Both
   use migrate_get_socket_address to get the destination address from
   socket-address; migrate_get_connect_qdict then uses
   SocketAddress_to_qdict to convert it into a QDict.
2. Misc code changes.

v2->v3 Changelog:
-
1. 'channels' introduction is not required now for migrate_qmp_incoming
2. Refactor the code into 7 different patches
3. Remove custom function for converting string to MigrationChannelList
4. Move calls for migrate_get_socket_address inside migrate_qmp so that
   migrate_set_ports can replace the QAPI's port with the correct value.

Het Gala (8):
  Add 'to' object into migrate_qmp()
  Replace connect_uri and move migrate_get_socket_address inside
migrate_qmp
  Replace migrate_get_connect_uri inplace of migrate_get_socket_address
  Add channels parameter in migrate_qmp_fail
  Add migrate_set_ports into migrate_qmp to update migration port value
  Add channels parameter in migrate_qmp
  Add multifd_tcp_plain test using list of channels instead of uri
  Add negative tests to validate migration QAPIs

 tests/qtest/migration-helpers.c | 158 +++-
 tests/qtest/migration-helpers.h |  10 +-
 tests/qtest/migration-test.c| 177 ++--
 3 files changed, 258 insertions(+), 87 deletions(-)

-- 
2.22.3
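
The negative tests in this series exercise the cases where 'uri' and 'channels' are passed together or where neither is passed. A minimal sketch of such a check, assuming that exactly one of the two arguments must be present; the enum and `check_migrate_args()` are hypothetical names for illustration and are not QEMU functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes, not part of QEMU. */
enum migrate_args_check {
    MIGRATE_ARGS_OK,
    MIGRATE_ARGS_BOTH,     /* 'uri' and 'channels' given together */
    MIGRATE_ARGS_NEITHER,  /* neither argument given */
};

/*
 * Validate the two optional arguments of a migrate command.
 * A NULL pointer means the argument was not supplied.
 */
static enum migrate_args_check
check_migrate_args(const char *uri, const char *channels)
{
    bool has_uri = uri != NULL;
    bool has_channels = channels != NULL;

    if (has_uri && has_channels) {
        return MIGRATE_ARGS_BOTH;
    }
    if (!has_uri && !has_channels) {
        return MIGRATE_ARGS_NEITHER;
    }
    return MIGRATE_ARGS_OK;
}
```

Note that this models the QMP command level. In the migrate_qmp() test helper a NULL uri has a different meaning: the helper derives the URI from the destination's socket address.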

Re: [PATCH] hw: gpio: introduce pcf8574 driver

2024-03-08 Thread Philippe Mathieu-Daudé


Hi Dmitriy,

On 1/3/24 08:36, Dmitriy Sharikhin wrote:

NXP PCF8574 and compatible ICs are simple I2C GPIO expanders.
PCF8574 incorporates quasi-bidirectional IO, and simple
communication protocol, when IO read is I2C byte read, and
IO write is I2C byte write. User can think of it as
open-drain port, when line high state is input and line low
state is output.
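
The quasi-bidirectional behaviour described above can be sketched as a small standalone model. This is an illustration only: `struct pcf8574_model` and the `model_*` functions are hypothetical names, and the reset values follow the 0xFF defaults used in the patch below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the chip's two 8-bit states. */
struct pcf8574_model {
    uint8_t input;   /* external line state, 1 = line is high */
    uint8_t output;  /* last byte written: 1 = pull-up, 0 = drive low */
};

/* After reset all lines are pulled up and nothing drives them low. */
static void model_reset(struct pcf8574_model *m)
{
    m->input = 0xFF;
    m->output = 0xFF;
}

/* I2C byte read: a line is low if either side pulls it low. */
static uint8_t model_read(const struct pcf8574_model *m)
{
    return m->input & m->output;
}

/*
 * I2C byte write: latch the new output byte and return a bitmask
 * of the lines whose effective level changed.
 */
static uint8_t model_write(struct pcf8574_model *m, uint8_t data)
{
    uint8_t prev = model_read(m);

    m->output = data;
    return prev ^ model_read(m);
}
```

The AND in `model_read()` is the open-drain view of the port: writing 1 releases a line so the external circuit decides its level, writing 0 forces it low.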

This patch allows instantiating a virtual I2C device called
"pcf8574" in machine init code via the generic mechanism.

Signed-off-by: Dmitrii Sharikhin 
---
  hw/gpio/Kconfig   |   4 ++
  hw/gpio/meson.build   |   1 +
  hw/gpio/pcf8574.c | 139 ++
  include/hw/gpio/pcf8574.h |  15 
  4 files changed, 159 insertions(+)
  create mode 100644 hw/gpio/pcf8574.c
  create mode 100644 include/hw/gpio/pcf8574.h

diff --git a/hw/gpio/Kconfig b/hw/gpio/Kconfig
index d2cf3accc8..bb731ff4ce 100644
--- a/hw/gpio/Kconfig
+++ b/hw/gpio/Kconfig
@@ -16,3 +16,7 @@ config GPIO_PWR
  
  config SIFIVE_GPIO

  bool
+
+config PCF8574
+bool
+depends on I2C
diff --git a/hw/gpio/meson.build b/hw/gpio/meson.build
index 8a8d03d885..c0d9a3c757 100644
--- a/hw/gpio/meson.build
+++ b/hw/gpio/meson.build
@@ -15,3 +15,4 @@ system_ss.add(when: 'CONFIG_RASPI', if_true: files(
  ))
  system_ss.add(when: 'CONFIG_ASPEED_SOC', if_true: files('aspeed_gpio.c'))
  system_ss.add(when: 'CONFIG_SIFIVE_GPIO', if_true: files('sifive_gpio.c'))
+system_ss.add(when: 'CONFIG_PCF8574', if_true: files('pcf8574.c'))
diff --git a/hw/gpio/pcf8574.c b/hw/gpio/pcf8574.c
new file mode 100644
index 00..a6c6bd36fa
--- /dev/null
+++ b/hw/gpio/pcf8574.c
@@ -0,0 +1,139 @@
+/*
+ * NXP PCF8574 8-port I2C GPIO expansion chip.
+ *
+ * Copyright (c) 2024 KNS Group (YADRO).
+ * Written by Dmitrii Sharikhin 
+ *
+ * This file is licensed under GNU GPL.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/i2c/i2c.h"
+#include "hw/gpio/pcf8574.h"
+#include "hw/irq.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qom/object.h"
+
+/**
+ * PCF8574 and compatible chips incorporate quasi-bidirectional
+ * IO. Electrically it means that device sustain pull-up to line
+ * unless IO port is configured as output _and_ driven low.
+ *
+ * IO access is implemented as simple I2C single-byte read
+ * or write operation. So, to configure line to input user write 1
+ * to corresponding bit. To configure line to output and drive it low
+ * user write 0 to corresponding bit.
+ *
+ * In essence, user can think of quasi-bidirectional IO as
+ * open-drain line, except presence of builtin rising edge acceleration
+ * embedded in PCF8574 IC
+ **/


#define PORTS_COUNT 8


+
+OBJECT_DECLARE_SIMPLE_TYPE(PCF8574State, PCF8574)
+
+struct PCF8574State {
+I2CSlave parent_obj;
+uint8_t  input;  /* external electrical line state */
+uint8_t  output; /* Pull-up (1) or drive low (0) on bit */
+qemu_irq handler[8];


s/8/PORTS_COUNT/


+qemu_irq *gpio_in;


There is also a gpio_out, why not implement it?


+};
+
+static void pcf8574_reset(DeviceState *dev)
+{
+PCF8574State *s = PCF8574(dev);
+s->input  = 0xFF;
+s->output = 0xFF;


Alternatively MAKE_64BIT_MASK(0, PORTS_COUNT);


+}
+
+static inline uint8_t pcf8574_line_state(PCF8574State *s)
+{
+// we driving line low or external circuit does that


Comment as /* ... */, see
https://www.qemu.org/docs/master/devel/style.html#comment-style


+return s->input & s->output;
+}
+
+static uint8_t pcf8574_rx(I2CSlave *i2c)
+{
+return pcf8574_line_state(PCF8574(i2c));
+}
+
+static int pcf8574_tx(I2CSlave *i2c, uint8_t data)
+{
+PCF8574State *s = PCF8574(i2c);
+uint8_t prev;
+uint8_t diff;
+uint8_t actual;
+int line = 0;
+
+prev = pcf8574_line_state(s);
+s->output = data;
+actual = pcf8574_line_state(s);
+
+for (diff = (actual ^ prev); diff; diff &= ~(1 << line))


No newline before the opening brace.


+{
+line = ctz32(diff);
+if (s->handler[line])


Missing brace, see
https://www.qemu.org/docs/master/devel/style.html#block-structure

Please run scripts/checkpatch.pl, see
https://www.qemu.org/docs/master/devel/submitting-a-patch.html#use-the-qemu-coding-style


+qemu_set_irq(s->handler[line], (actual >> line) & 1);
+}
+
+return 0;
+}
+
+static const VMStateDescription vmstate_pcf8574 = {
+.name = "pcf8574",
+.version_id = 0,
+.minimum_version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_UINT8(input,  PCF8574State),
+VMSTATE_UINT8(output, PCF8574State),
+VMSTATE_I2C_SLAVE(parent_obj, PCF8574State),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static void pcf8574_gpio_set(void *opaque, int line, int level)
+{
+PCF8574State *s = (PCF8574State *) opaque;
+assert(line >= 0 && line < ARRAY_SIZE(s->handler));
+
+if (level)
+s->input |=  (1 << line);
+else
+s->input &= ~(1 << line);
+}
+
+static void

Re: [PATCH v6] crypto: Introduce SM4 symmetric cipher algorithm

2024-03-08 Thread Thomas Huth


On 07/12/2023 16.47, Hyman Huang wrote:

Introduce the SM4 cipher algorithms (OSCCA GB/T 32907-2016).

SM4 (GBT.32907-2016) is a cryptographic standard issued by the
Organization of State Commercial Administration of China (OSCCA)
as an authorized cryptographic algorithm for use within China.

Detect the SM4 cipher algorithms and enable the feature silently
if it is available.

Signed-off-by: Hyman Huang 
Reviewed-by: Philippe Mathieu-Daudé 
---


FYI, starting with this commit, tests/unit/test-crypto-cipher is now failing
on s390x hosts (i.e. big endian machines). There may be an endianness issue
somewhere in here.


 Thomas
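
The cause of the s390x failure is not identified in this message, so the following is a generic illustration and not a diagnosis. SM4 operates on 32-bit words taken from the byte stream in big-endian order, and code that reinterprets a byte buffer as host-order integers behaves differently on little-endian and big-endian hosts. A host-independent way to load and store such words, with hypothetical helper names, looks like this:

```c
#include <assert.h>
#include <stdint.h>

/* Load a 32-bit big-endian word from @p, independent of host byte order. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}

/* Store @v to @p as a 32-bit big-endian word, independent of host order. */
static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```

Because the helpers only use shifts on integer values, they give the same result on x86 and s390x, which is why test vectors written as byte arrays stay valid on both.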




diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index fb01ec38bb..f0813d69b4 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -95,12 +95,23 @@ qcrypto_block_luks_cipher_size_map_twofish[] = {
  { 0, 0 },
  };
  
+#ifdef CONFIG_CRYPTO_SM4

+static const QCryptoBlockLUKSCipherSizeMap
+qcrypto_block_luks_cipher_size_map_sm4[] = {
+{ 16, QCRYPTO_CIPHER_ALG_SM4},
+{ 0, 0 },
+};
+#endif
+
  static const QCryptoBlockLUKSCipherNameMap
  qcrypto_block_luks_cipher_name_map[] = {
  { "aes", qcrypto_block_luks_cipher_size_map_aes },
  { "cast5", qcrypto_block_luks_cipher_size_map_cast5 },
  { "serpent", qcrypto_block_luks_cipher_size_map_serpent },
  { "twofish", qcrypto_block_luks_cipher_size_map_twofish },
+#ifdef CONFIG_CRYPTO_SM4
+{ "sm4", qcrypto_block_luks_cipher_size_map_sm4},
+#endif
  };
  
  QEMU_BUILD_BUG_ON(sizeof(struct QCryptoBlockLUKSKeySlot) != 48);

diff --git a/crypto/cipher-gcrypt.c.inc b/crypto/cipher-gcrypt.c.inc
index a6a0117717..1377cbaf14 100644
--- a/crypto/cipher-gcrypt.c.inc
+++ b/crypto/cipher-gcrypt.c.inc
@@ -35,6 +35,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
  case QCRYPTO_CIPHER_ALG_SERPENT_256:
  case QCRYPTO_CIPHER_ALG_TWOFISH_128:
  case QCRYPTO_CIPHER_ALG_TWOFISH_256:
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+#endif
  break;
  default:
  return false;
@@ -219,6 +222,11 @@ static QCryptoCipher 
*qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
  case QCRYPTO_CIPHER_ALG_TWOFISH_256:
  gcryalg = GCRY_CIPHER_TWOFISH;
  break;
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+gcryalg = GCRY_CIPHER_SM4;
+break;
+#endif
  default:
  error_setg(errp, "Unsupported cipher algorithm %s",
 QCryptoCipherAlgorithm_str(alg));
diff --git a/crypto/cipher-nettle.c.inc b/crypto/cipher-nettle.c.inc
index 24cc61f87b..42b39e18a2 100644
--- a/crypto/cipher-nettle.c.inc
+++ b/crypto/cipher-nettle.c.inc
@@ -33,6 +33,9 @@
  #ifndef CONFIG_QEMU_PRIVATE_XTS
  #include <nettle/xts.h>
  #endif
+#ifdef CONFIG_CRYPTO_SM4
+#include <nettle/sm4.h>
+#endif
  
  static inline bool qcrypto_length_check(size_t len, size_t blocksize,

  Error **errp)
@@ -426,6 +429,30 @@ DEFINE_ECB_CBC_CTR_XTS(qcrypto_nettle_twofish,
 QCryptoNettleTwofish, TWOFISH_BLOCK_SIZE,
 twofish_encrypt_native, twofish_decrypt_native)
  
+#ifdef CONFIG_CRYPTO_SM4

+typedef struct QCryptoNettleSm4 {
+QCryptoCipher base;
+struct sm4_ctx key[2];
+} QCryptoNettleSm4;
+
+static void sm4_encrypt_native(void *ctx, size_t length,
+   uint8_t *dst, const uint8_t *src)
+{
+struct sm4_ctx *keys = ctx;
+sm4_crypt(&keys[0], length, dst, src);
+}
+
+static void sm4_decrypt_native(void *ctx, size_t length,
+   uint8_t *dst, const uint8_t *src)
+{
+struct sm4_ctx *keys = ctx;
+sm4_crypt(&keys[1], length, dst, src);
+}
+
+DEFINE_ECB(qcrypto_nettle_sm4,
+   QCryptoNettleSm4, SM4_BLOCK_SIZE,
+   sm4_encrypt_native, sm4_decrypt_native)
+#endif
  
  bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,

   QCryptoCipherMode mode)
@@ -443,6 +470,9 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg,
  case QCRYPTO_CIPHER_ALG_TWOFISH_128:
  case QCRYPTO_CIPHER_ALG_TWOFISH_192:
  case QCRYPTO_CIPHER_ALG_TWOFISH_256:
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+#endif
  break;
  default:
  return false;
@@ -701,6 +731,25 @@ static QCryptoCipher *qcrypto_cipher_ctx_new(QCryptoCipherAlgorithm alg,
  
  return &ctx->base;
  }
+#ifdef CONFIG_CRYPTO_SM4
+case QCRYPTO_CIPHER_ALG_SM4:
+{
+QCryptoNettleSm4 *ctx = g_new0(QCryptoNettleSm4, 1);
+
+switch (mode) {
+case QCRYPTO_CIPHER_MODE_ECB:
+ctx->base.driver = &qcrypto_nettle_sm4_driver_ecb;
+break;
+default:
+goto bad_cipher_mode;
+}
+
+sm4_set_encrypt_key(&ctx->key[0], key);
+sm4_set_decrypt_key(&ctx->key[1], key);
+
+return &ctx->base;
+}
+#endif
  
  default:
  error_setg(errp,

Re: [PATCH v2 3/5] gdbstub: Save target's siginfo

2024-03-08 Thread Gustavo Romero


Hi Richard,

On 3/7/24 6:09 PM, Richard Henderson wrote:

On 3/7/24 08:26, Gustavo Romero wrote:

Save target's siginfo into gdbserver_state so it can be used later, for
example, in any stub that requires the target's si_signo and si_code.

This change affects only linux-user mode.

Signed-off-by: Gustavo Romero 
Suggested-by: Richard Henderson 
---
  gdbstub/internals.h    |  3 +++
  gdbstub/user-target.c  |  3 ++-
  gdbstub/user.c | 14 ++
  include/gdbstub/user.h |  6 +-
  linux-user/main.c  |  2 +-
  linux-user/signal.c    |  5 -
  6 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/gdbstub/internals.h b/gdbstub/internals.h
index 56b7c13b75..a7cc69dab3 100644
--- a/gdbstub/internals.h
+++ b/gdbstub/internals.h
@@ -58,6 +58,9 @@ typedef struct GDBState {
  int line_csum; /* checksum at the end of the packet */
  GByteArray *last_packet;
  int signal;
+#ifdef CONFIG_USER_ONLY
+    uint8_t siginfo[MAX_SIGINFO_LENGTH];
+#endif


If we put this in GDBUserState in user.c -- no need for ifdefs then.


Thanks, I've moved it to user.c.



--- a/gdbstub/user-target.c
+++ b/gdbstub/user-target.c
@@ -10,11 +10,12 @@
  #include "qemu/osdep.h"
  #include "exec/gdbstub.h"
  #include "qemu.h"
-#include "internals.h"
  #ifdef CONFIG_LINUX
  #include "linux-user/loader.h"
  #include "linux-user/qemu.h"
+#include "gdbstub/user.h"
  #endif
+#include "internals.h"
  /*
   * Map target signal numbers to GDB protocol signal numbers and vice


Why are any changes required here?
Perhaps this is improper patch split from one of the others?


This was intentional. Because I declared siginfo[MAX_SIGINFO_LENGTH] in
GDBState struct, which is in internals.h and MAX_SIGINFO_LENGTH is defined
in gdbstub/user.h I had to move internals.h after user.h was included so
MAX_SIGINFO_LENGTH could be found.

I'm reverting it.



@@ -140,6 +141,11 @@ int gdb_handlesig(CPUState *cpu, int sig, const char *reason)
  return sig;
  }
+    if (siginfo) {
+    /* Save target-specific siginfo. */
+    memcpy(gdbserver_state.siginfo, siginfo, siginfo_len);
+    }


A comment here about asserting the size at compile-time elsewhere would be 
welcome for future code browsers.


Done.



Need to record siginfo_len for later use -- you don't want to expose all 128 
bytes if the actual structure is smaller.


In the stub, the full size is only used to check whether the requested
offset+len is valid. So the only size that matters for reading data from
siginfo and assembling the reply is the length in the query, not siginfo_len.
But I agree it's better to record it, even more so now that I've moved the
stub from user-target.c to user.c.



@@ -510,7 +516,7 @@ void gdb_breakpoint_remove_all(CPUState *cs)
  void gdb_syscall_handling(const char *syscall_packet)
  {
  gdb_put_packet(syscall_packet);
-    gdb_handlesig(gdbserver_state.c_cpu, 0, NULL);
+    gdb_handlesig(gdbserver_state.c_cpu, 0, NULL, NULL, 0);
  }
  static bool should_catch_syscall(int num)
@@ -528,7 +534,7 @@ void gdb_syscall_entry(CPUState *cs, int num)
  {
  if (should_catch_syscall(num)) {
  g_autofree char *reason = g_strdup_printf("syscall_entry:%x;", num);
-    gdb_handlesig(cs, gdb_target_sigtrap(), reason);
+    gdb_handlesig(cs, gdb_target_sigtrap(), reason, NULL, 0);
  }
  }
@@ -536,7 +542,7 @@ void gdb_syscall_return(CPUState *cs, int num)
  {
  if (should_catch_syscall(num)) {
  g_autofree char *reason = g_strdup_printf("syscall_return:%x;", num);
-    gdb_handlesig(cs, gdb_target_sigtrap(), reason);
+    gdb_handlesig(cs, gdb_target_sigtrap(), reason, NULL, 0);
  }


All of this makes me wonder if we should provide a different interface for 
syscalls, even if it uses the same code paths internally.


Should I address it in this series? I'm not sure what that interface would
look like.



Do we want to zero the gdbserver siginfo to indicate that the contents are no 
longer valid?  I know it's not a real signal delivered to the process, but 
might we need to construct a simple siginfo struct to match the sigtrap?


In gdb_handlesig we always copy the full size of siginfo, as passed in
siginfo_len, to the gdbserver_user_state siginfo, and siginfo_len is now also
recorded there for later use. Doesn't that copy guarantee that the gdbserver
siginfo never holds stale data?


Cheers,
Gustavo
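
To make the points discussed above concrete, here is a minimal sketch of
saving a siginfo blob together with its length, checking the buffer size at
compile time, and validating offset+len on reads. It is not the code from the
series: every name prefixed with sketch_/Sketch is hypothetical, the struct
layout is a stand-in for the target's siginfo, and treating a NULL siginfo as
"nothing valid saved" is just one possible answer to the stale-data question.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed buffer size; the thread mentions 128 bytes. */
#define SKETCH_MAX_SIGINFO_LENGTH 128

/* Hypothetical stand-in for the target's siginfo structure. */
typedef struct {
    int32_t si_signo;
    int32_t si_errno;
    int32_t si_code;
    uint8_t pad[20];
} SketchSiginfo;

/* Compile-time check that the buffer can always hold the structure. */
_Static_assert(sizeof(SketchSiginfo) <= SKETCH_MAX_SIGINFO_LENGTH,
               "siginfo buffer too small");

/* Hypothetical user-mode state: the blob plus its recorded length. */
static struct {
    uint8_t siginfo[SKETCH_MAX_SIGINFO_LENGTH];
    size_t siginfo_len;
} sketch_state;

/* Save the blob; a NULL or oversized input marks the state as empty. */
static void sketch_save_siginfo(const void *info, size_t len)
{
    if (info == NULL || len > sizeof(sketch_state.siginfo)) {
        sketch_state.siginfo_len = 0;
        return;
    }
    memcpy(sketch_state.siginfo, info, len);
    sketch_state.siginfo_len = len;
}

/* Copy len bytes at offset into out; refuse reads past the saved length. */
static bool sketch_read_siginfo(size_t offset, size_t len, uint8_t *out)
{
    if (offset > sketch_state.siginfo_len ||
        len > sketch_state.siginfo_len - offset) {
        return false;
    }
    memcpy(out, sketch_state.siginfo + offset, len);
    return true;
}
```

Because the reader checks against the recorded length rather than the buffer
size, a query can never return bytes beyond what the last signal actually
wrote, which is the concern raised about exposing all 128 bytes.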

Re: [PATCH] hw/arm: Deprecate various old Arm machine types

2024-03-08 Thread Philippe Mathieu-Daudé


On 8/3/24 18:16, Peter Maydell wrote:

QEMU includes some models of old Arm machine types which are
a bit problematic for us because:
  * they're written in a very old way that uses numerous APIs that we
would like to get away from (eg they don't use qdev, they use
qemu_system_reset_request(), they use vmstate_register(), etc)
  * they've been that way for a decade plus and nobody particularly has
stepped up to try to modernise the code (beyond some occasional
work here and there)
  * we often don't have test cases for them, which means that if we
do try to do the necessary refactoring work on them we have no
idea if they even still work at all afterwards

All these machine types are also of hardware that has largely passed
away into history and where I would not be surprised to find that
e.g. the Linux kernel support was never tested on real hardware
any more.


Thanks for writing that down.


After some consultation with the Linux kernel developers, we
are going to deprecate:

All PXA2xx machines:

akita     Sharp SL-C1000 (Akita) PDA (PXA270)
borzoi    Sharp SL-C3100 (Borzoi) PDA (PXA270)
connex    Gumstix Connex (PXA255)
mainstone Mainstone II (PXA27x)
spitz     Sharp SL-C3000 (Spitz) PDA (PXA270)
terrier   Sharp SL-C3200 (Terrier) PDA (PXA270)
tosa      Sharp SL-6000 (Tosa) PDA (PXA255)
verdex    Gumstix Verdex Pro XL6P COMs (PXA270)
z2        Zipit Z2 (PXA27x)

All OMAP2 machines:

n800      Nokia N800 tablet aka. RX-34 (OMAP2420)
n810      Nokia N810 tablet aka. RX-44 (OMAP2420)

One of the OMAP1 machines:

cheetah   Palm Tungsten|E aka. Cheetah PDA (OMAP310)

Rationale:
  * for QEMU dropping individual machines is much less beneficial
than if we can drop support for an entire SoC
  * the OMAP2 QEMU code in particular is large, old and unmaintained,
and none of the OMAP2 kernel maintainers said they were using
QEMU in any of their testing/development
  * although there is a setup that is booting test kernels on some
of the PXA2xx machines, nobody seemed to be using them as part
of their active kernel development and my impression from the
email thread is that PXA is the closest of all these SoC families
to being dropped from the kernel soon
  * nobody said they were using cheetah, so it's entirely
untested and quite probably broken
  * on the other hand the OMAP1 sx1 model does seem to be being
used as part of kernel development, and there was interest
in keeping collie around

In particular, the mainstone, tosa and z2 machine types have
already been dropped from Linux.

Mark all these machine types as depprecated.


Typo "deprecated".



Signed-off-by: Peter Maydell 
---
  docs/about/deprecated.rst | 15 +++
  hw/arm/gumstix.c  |  2 ++
  hw/arm/mainstone.c|  1 +
  hw/arm/nseries.c  |  2 ++
  hw/arm/palm.c |  1 +
  hw/arm/spitz.c|  1 +
  hw/arm/tosa.c |  1 +
  hw/arm/z2.c   |  1 +
  8 files changed, 24 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH] hw/arm: Deprecate various old Arm machine types

2024-03-08 Thread Thomas Huth


On 08/03/2024 18.16, Peter Maydell wrote:

QEMU includes some models of old Arm machine types which are
a bit problematic for us because:
  * they're written in a very old way that uses numerous APIs that we
would like to get away from (eg they don't use qdev, they use
qemu_system_reset_request(), they use vmstate_register(), etc)
  * they've been that way for a decade plus and nobody particularly has
stepped up to try to modernise the code (beyond some occasional
work here and there)
  * we often don't have test cases for them, which means that if we
do try to do the necessary refactoring work on them we have no
idea if they even still work at all afterwards

All these machine types are also of hardware that has largely passed
away into history and where I would not be surprised to find that
e.g. the Linux kernel support was never tested on real hardware
any more.

After some consultation with the Linux kernel developers, we
are going to deprecate:

All PXA2xx machines:

akita     Sharp SL-C1000 (Akita) PDA (PXA270)
borzoi    Sharp SL-C3100 (Borzoi) PDA (PXA270)
connex    Gumstix Connex (PXA255)
mainstone Mainstone II (PXA27x)
spitz     Sharp SL-C3000 (Spitz) PDA (PXA270)
terrier   Sharp SL-C3200 (Terrier) PDA (PXA270)
tosa      Sharp SL-6000 (Tosa) PDA (PXA255)
verdex    Gumstix Verdex Pro XL6P COMs (PXA270)
z2        Zipit Z2 (PXA27x)

All OMAP2 machines:

n800      Nokia N800 tablet aka. RX-34 (OMAP2420)
n810      Nokia N810 tablet aka. RX-44 (OMAP2420)

One of the OMAP1 machines:

cheetah   Palm Tungsten|E aka. Cheetah PDA (OMAP310)

Rationale:
  * for QEMU dropping individual machines is much less beneficial
than if we can drop support for an entire SoC
  * the OMAP2 QEMU code in particular is large, old and unmaintained,
and none of the OMAP2 kernel maintainers said they were using
QEMU in any of their testing/development
  * although there is a setup that is booting test kernels on some
of the PXA2xx machines, nobody seemed to be using them as part
of their active kernel development and my impression from the
email thread is that PXA is the closest of all these SoC families
to being dropped from the kernel soon
  * nobody said they were using cheetah, so it's entirely
untested and quite probably broken
  * on the other hand the OMAP1 sx1 model does seem to be being
used as part of kernel development, and there was interest
in keeping collie around

In particular, the mainstone, tosa and z2 machine types have
already been dropped from Linux.

Mark all these machine types as depprecated.

Signed-off-by: Peter Maydell 
---
  docs/about/deprecated.rst | 15 +++
  hw/arm/gumstix.c  |  2 ++
  hw/arm/mainstone.c|  1 +
  hw/arm/nseries.c  |  2 ++
  hw/arm/palm.c |  1 +
  hw/arm/spitz.c|  1 +
  hw/arm/tosa.c |  1 +
  hw/arm/z2.c   |  1 +
  8 files changed, 24 insertions(+)


Reviewed-by: Thomas Huth

Re: [PATCH] pci: Add option to disable device level INTx masking

2024-03-08 Thread Michael S. Tsirkin

On Fri, Mar 08, 2024 at 01:02:01PM -0700, Alex Williamson wrote:
> On Fri, 8 Mar 2024 14:37:06 -0500
> "Michael S. Tsirkin"  wrote:
> 
> > On Fri, Mar 08, 2024 at 10:24:14AM -0700, Alex Williamson wrote:
> > > On Fri, 8 Mar 2024 11:57:38 -0500
> > > "Michael S. Tsirkin"  wrote:
> > >   
> > > > On Thu, Mar 07, 2024 at 11:46:42AM -0700, Alex Williamson wrote:  
> > > > > The PCI 2.3 spec added definitions of the INTx disable and status 
> > > > > bits,
> > > > > in the command and status registers respectively.  The command 
> > > > > register
> > > > > bit, commonly known as DisINTx in lspci, controls whether the device
> > > > > can assert the INTx signal.
> > > > > 
> > > > > Operating systems will often write to this bit to test whether a 
> > > > > device
> > > > > supports this style of legacy interrupt masking.  When using device
> > > > > assignment, such as with vfio-pci, the result of this test dictates
> > > > > whether the device can use a shared or exclusive interrupt (ie. 
> > > > > generic
> > > > > INTx masking at the device via DisINTx or IRQ controller level INTx
> > > > > masking).
> > > > > 
> > > > > Add an experimental option to the base set of properties for PCI
> > > > > devices which allows the DisINTx bit to be excluded from wmask, making
> > > > > it read-only to the guest for testing purposes related to INTx 
> > > > > masking.
> > > > > 
> > > > 
> > > > Could you clarify the use a bit more? It's unstable - do you
> > > > expect to experiment with it and then make it permanent down
> > > > the road?  
> > > 
> > > No, my aspirations end at providing an experimental option.
> > > Technically all devices should support and honor this bit, so I don't
> > > think we should provide a supported method of providing broken behavior,
> > > but there do exist physical devices where this feature is broken or
> > > unsupported.  Rather than implementing emulation of one of these broken
> > > devices, with bug for bug compatibility, it's much easier to be able to
> > > trigger broken DisINTx behavior on an arbitrary device, in an
> > > unsupported fashion.  Thanks,
> > > 
> > > Alex  
> > 
> > Well, we tend not to merge patches for playing with random
> > bits in config space just so people can experiment with
> > whether this breaks guests, but given this is coming from
> > a long term contributor and a maintainer, it's a different
> > matter. So ok, to make another maintainer's life easier
> > I'm prepared to take this. I'd like to figure out though -
> > does your need extend to experimenting with all devices
> > or just with vfio ones? If the latter maybe keep it there
> > where you understand what the actual need is... If the former
> > as I said I'll merge it.
> 
> I'm actually looking at using it with non-vfio devices, for example I
> have a dummy nvme driver that can configure either INTx, MSI, or MSI-X
> interrupts.  The driver just stuffs nop commands into the admin queue to
> trigger an interrupt.  This tests DMA mapping and interrupt paths.  I
> intend to port this to a userspace vfio-pci driver that I can run in a
> guest on an emulated nvme device, thereby enabling targeted testing
> without any host hardware or device dependencies.  If I were to expose
> two emulated nvme devices to the guest, one with DisINTx disabled, then
> all variations could be tested.
> 
> For full disclosure, the vfio-pci kernel driver does have a nointxmask
> module option, so while I think it would be useful and provides a
> little more flexibility that devices in QEMU can be specified with this
> behavior, there are means to do it otherwise. The QEMU vfio-pci driver
> certainly has experimental options that don't necessarily have a path
> to become supported, I hadn't realized your intention/preference to
> make it a staging ground for to-be-supported options for PCIDevice.
> 
> If you have concerns about cluttering options or maintaining dead-end
> experimental options, let's hold off on this until there's a case that
> can't be met with the kernel module option.  Thanks,
> 
> Alex

That's the concern. But you decide. One maintainer's time is not more
important than another's. If it helps you - just merge it.

Acked-by: Michael S. Tsirkin 


> > > > > Signed-off-by: Alex Williamson 
> > > > > ---
> > > > >  hw/pci/pci.c | 14 ++
> > > > >  include/hw/pci/pci.h |  2 ++
> > > > >  2 files changed, 12 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > > > index 6496d027ca61..8c78326ad67f 100644
> > > > > --- a/hw/pci/pci.c
> > > > > +++ b/hw/pci/pci.c
> > > > > @@ -85,6 +85,8 @@ static Property pci_props[] = {
> > > > >  QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
> > > > >  DEFINE_PROP_BIT("x-pcie-ari-nextfn-1", PCIDevice, cap_present,
> > > > >  QEMU_PCIE_ARI_NEXTFN_1_BITNR, false),
> > > > > +DEFINE_PROP_BIT("x-pci-disintx", PCIDevice, cap_present,
> > > > > +

Re: [PATCH] pci: Add option to disable device level INTx masking

2024-03-08 Thread Alex Williamson

On Fri, 8 Mar 2024 14:37:06 -0500
"Michael S. Tsirkin"  wrote:

> On Fri, Mar 08, 2024 at 10:24:14AM -0700, Alex Williamson wrote:
> > On Fri, 8 Mar 2024 11:57:38 -0500
> > "Michael S. Tsirkin"  wrote:
> >   
> > > On Thu, Mar 07, 2024 at 11:46:42AM -0700, Alex Williamson wrote:  
> > > > The PCI 2.3 spec added definitions of the INTx disable and status bits,
> > > > in the command and status registers respectively.  The command register
> > > > bit, commonly known as DisINTx in lspci, controls whether the device
> > > > can assert the INTx signal.
> > > > 
> > > > Operating systems will often write to this bit to test whether a device
> > > > supports this style of legacy interrupt masking.  When using device
> > > > assignment, such as with vfio-pci, the result of this test dictates
> > > > whether the device can use a shared or exclusive interrupt (ie. generic
> > > > INTx masking at the device via DisINTx or IRQ controller level INTx
> > > > masking).
> > > > 
> > > > Add an experimental option to the base set of properties for PCI
> > > > devices which allows the DisINTx bit to be excluded from wmask, making
> > > > it read-only to the guest for testing purposes related to INTx masking.
> > > > 
> > > 
> > > Could you clarify the use a bit more? It's unstable - do you
> > > expect to experiment with it and then make it permanent down
> > > the road?  
> > 
> > No, my aspirations end at providing an experimental option.
> > Technically all devices should support and honor this bit, so I don't
> > think we should provide a supported method of providing broken behavior,
> > but there do exist physical devices where this feature is broken or
> > unsupported.  Rather than implementing emulation of one of these broken
> > devices, with bug for bug compatibility, it's much easier to be able to
> > trigger broken DisINTx behavior on an arbitrary device, in an
> > unsupported fashion.  Thanks,
> > 
> > Alex  
> 
> Well, we tend not to merge patches for playing with random
> bits in config space just so people can experiment with
> whether this breaks guests, but given this is coming from
> a long term contributor and a maintainer, it's a different
> matter. So ok, to make another maintainer's life easier
> I'm prepared to take this. I'd like to figure out though -
> does your need extend to experimenting with all devices
> or just with vfio ones? If the latter maybe keep it there
> where you understand what the actual need is... If the former
> as I said I'll merge it.

I'm actually looking at using it with non-vfio devices, for example I
have a dummy nvme driver that can configure either INTx, MSI, or MSI-X
interrupts.  The driver just stuffs nop commands into the admin queue to
trigger an interrupt.  This tests DMA mapping and interrupt paths.  I
intend to port this to a userspace vfio-pci driver that I can run in a
guest on an emulated nvme device, thereby enabling targeted testing
without any host hardware or device dependencies.  If I were to expose
two emulated nvme devices to the guest, one with DisINTx disabled, then
all variations could be tested.

For full disclosure, the vfio-pci kernel driver does have a nointxmask
module option, so while I think it would be useful and provides a
little more flexibility that devices in QEMU can be specified with this
behavior, there are means to do it otherwise. The QEMU vfio-pci driver
certainly has experimental options that don't necessarily have a path
to become supported, I hadn't realized your intention/preference to
make it a staging ground for to-be-supported options for PCIDevice.

If you have concerns about cluttering options or maintaining dead-end
experimental options, let's hold off on this until there's a case that
can't be met with the kernel module option.  Thanks,

Alex

> > > > Signed-off-by: Alex Williamson 
> > > > ---
> > > >  hw/pci/pci.c | 14 ++
> > > >  include/hw/pci/pci.h |  2 ++
> > > >  2 files changed, 12 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > > index 6496d027ca61..8c78326ad67f 100644
> > > > --- a/hw/pci/pci.c
> > > > +++ b/hw/pci/pci.c
> > > > @@ -85,6 +85,8 @@ static Property pci_props[] = {
> > > >  QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
> > > >  DEFINE_PROP_BIT("x-pcie-ari-nextfn-1", PCIDevice, cap_present,
> > > >  QEMU_PCIE_ARI_NEXTFN_1_BITNR, false),
> > > > +DEFINE_PROP_BIT("x-pci-disintx", PCIDevice, cap_present,
> > > > +QEMU_PCI_DISINTX_BITNR, true),
> > > >  DEFINE_PROP_END_OF_LIST()
> > > >  };
> > > >  
> > > > @@ -861,13 +863,17 @@ static void pci_init_cmask(PCIDevice *dev)
> > > >  static void pci_init_wmask(PCIDevice *dev)
> > > >  {
> > > >  int config_size = pci_config_size(dev);
> > > > +uint16_t cmd_wmask = PCI_COMMAND_IO | PCI_COMMAND_MEMORY |
> > > > + PCI_COMMAND_MASTER | PCI_COMMAND_SERR;
> > > >  
> > > >

Re: [PATCH v1 3/5] hw/ppc: SPI controller model - sequencer and shifter

2024-03-08 Thread Stefan Berger





On 2/7/24 11:08, Chalapathi V wrote:

In this commit SPI shift engine and sequencer logic is implemented.
Shift engine performs serialization and de-serialization according to the
control by the sequencer and according to the setup defined in the
configuration registers. Sequencer implements the main control logic and
FSM to handle data transmit and data receive control of the shift engine.

Signed-off-by: Chalapathi V 
---
  include/hw/ppc/pnv_spi_controller.h |   58 ++
  hw/ppc/pnv_spi_controller.c | 1274 ++-
  2 files changed, 1331 insertions(+), 1 deletion(-)

diff --git a/include/hw/ppc/pnv_spi_controller.h 
b/include/hw/ppc/pnv_spi_controller.h
index 8afaabdd1b..8160c35f5c 100644
--- a/include/hw/ppc/pnv_spi_controller.h
+++ b/include/hw/ppc/pnv_spi_controller.h
@@ -8,6 +8,14 @@
   * This model Supports a connection to a single SPI responder.
   * Introduced for P10 to provide access to SPI seeproms, TPM, flash device
   * and an ADC controller.
+ *
+ * All SPI function control is mapped into the SPI register space to enable
+ * full control by firmware.
+ *
+ * SPI Controller has sequencer and shift engine. The SPI shift engine
+ * performs serialization and de-serialization according to the control by
+ * the sequencer and according to the setup defined in the configuration
+ * registers and the SPI sequencer implements the main control logic.
   */
  
  #ifndef PPC_PNV_SPI_CONTROLLER_H

@@ -20,6 +28,7 @@
  #define SPI_CONTROLLER_REG_SIZE 8
  
  typedef struct SpiBus SpiBus;

+typedef struct xfer_buffer xfer_buffer;
  
  typedef struct PnvSpiController {

  DeviceState parent;
@@ -28,6 +37,39 @@ typedef struct PnvSpiController {
  MemoryRegionxscom_spic_regs;
  /* SPI controller object number */
  uint32_tspic_num;
+uint8_t responder_select;
+/* To verify if shift_n1 happens prior to shift_n2 */
+boolshift_n1_done;
+/*
+ * Internal flags for the first and last indicators for the SPI
+ * interface methods
+ */
+uint8_t first;
+uint8_t last;


Do these two correspond to the first and last parameters here?

xfer_buffer *seeprom_spi_request(PnvSpiResponder *resp, int first, int last,
                                 int bits, xfer_buffer *payload);


If so, I think the data types in the prototype should be uint8_t as well,
and bits should probably be an unsigned int or uint8_t.



+/* Loop counter for branch operation opcode Ex/Fx */
+uint8_t loop_counter_1;
+uint8_t loop_counter_2;
+/* N1/N2_bits specifies the size of the N1/N2 segment of a frame in bits.*/
+uint8_t N1_bits;
+uint8_t N2_bits;
+/* Number of bytes in a payload for the N1/N2 frame segment.*/
+uint8_t N1_bytes;
+uint8_t N2_bytes;
+/* Number of N1/N2 bytes marked for transmit */
+uint8_t N1_tx;
+uint8_t N2_tx;
+/* Number of N1/N2 bytes marked for receive */
+uint8_t N1_rx;
+uint8_t N2_rx;
+/*
+ * Setting this attribute to true will cause the engine to reverse the
+ * bit order of each byte it appends to a payload before sending the
+ * payload to a device. There may be cases where an end device expects
+ * a reversed order, like in the case of the Nuvoton TPM device. The
+ * order of bytes in the payload is not reversed, only the order of the
+ * 8 bits in each payload byte.
+ */
+boolreverse_bits;
  
  /* SPI Controller registers */

  uint64_terror_reg;
@@ -40,4 +82,20 @@ typedef struct PnvSpiController {
  uint8_t sequencer_operation_reg[SPI_CONTROLLER_REG_SIZE];
  uint64_tstatus_reg;
  } PnvSpiController;
+
+void log_all_N_counts(PnvSpiController *spi_controller);
+void spi_response(PnvSpiController *spi_controller, int bits,
+xfer_buffer *rsp_payload);
+void operation_sequencer(PnvSpiController *spi_controller);
+bool operation_shiftn1(PnvSpiController *spi_controller, uint8_t opcode,
+   xfer_buffer **payload, bool send_n1_alone);
+bool operation_shiftn2(PnvSpiController *spi_controller, uint8_t opcode,
+   xfer_buffer **payload);
+bool does_rdr_match(PnvSpiController *spi_controller);
+uint8_t get_from_offset(PnvSpiController *spi_controller, uint8_t offset);
+void shift_byte_in(PnvSpiController *spi_controller, uint8_t byte);
+void calculate_N1(PnvSpiController *spi_controller, uint8_t opcode);
+void calculate_N2(PnvSpiController *spi_controller, uint8_t opcode);
+void do_reset(PnvSpiController *spi_controller);
+uint8_t reverse_bits8(uint8_t x);
  #endif /* PPC_PNV_SPI_CONTROLLER_H */
diff --git a/hw/ppc/pnv_spi_controller.c b/hw/ppc/pnv_spi_controller.c
index 0f2bc25e82..ef48af5d03 100644
--- a/hw/ppc/pnv_spi_controller.c
+++ b/hw/ppc/pnv_spi_controller.c
@@ -9,7 +9,6 @@
  #include "qemu/osdep.h"
  #include "qemu/log.h"
  #include
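
The reverse_bits comment in the header above describes reversing the 8 bits
inside each payload byte while leaving the byte order alone. A minimal sketch
of that operation follows. The patch only declares reverse_bits8(); the body
is not shown here, so the sketch_-prefixed functions below are hypothetical
and use a common swap-halves technique, not necessarily what the patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order of one byte by swapping nibbles, pairs, then bits. */
static uint8_t sketch_reverse_bits8(uint8_t x)
{
    x = (uint8_t)(((x & 0xF0) >> 4) | ((x & 0x0F) << 4));
    x = (uint8_t)(((x & 0xCC) >> 2) | ((x & 0x33) << 2));
    x = (uint8_t)(((x & 0xAA) >> 1) | ((x & 0x55) << 1));
    return x;
}

/* Reverse the bits of every byte in place; byte order is unchanged. */
static void sketch_reverse_payload_bits(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = sketch_reverse_bits8(buf[i]);
    }
}
```

For example, a payload of 0x01 0x12 would become 0x80 0x48: each byte is
mirrored, but the first byte still goes out first.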

Re: [PATCH] pci: Add option to disable device level INTx masking

2024-03-08 Thread Michael S. Tsirkin

On Fri, Mar 08, 2024 at 10:24:14AM -0700, Alex Williamson wrote:
> On Fri, 8 Mar 2024 11:57:38 -0500
> "Michael S. Tsirkin"  wrote:
> 
> > On Thu, Mar 07, 2024 at 11:46:42AM -0700, Alex Williamson wrote:
> > > The PCI 2.3 spec added definitions of the INTx disable and status bits,
> > > in the command and status registers respectively.  The command register
> > > bit, commonly known as DisINTx in lspci, controls whether the device
> > > can assert the INTx signal.
> > > 
> > > Operating systems will often write to this bit to test whether a device
> > > supports this style of legacy interrupt masking.  When using device
> > > assignment, such as with vfio-pci, the result of this test dictates
> > > whether the device can use a shared or exclusive interrupt (ie. generic
> > > INTx masking at the device via DisINTx or IRQ controller level INTx
> > > masking).
> > > 
> > > Add an experimental option to the base set of properties for PCI
> > > devices which allows the DisINTx bit to be excluded from wmask, making
> > > it read-only to the guest for testing purposes related to INTx masking.
> > >   
> > 
> > Could you clarify the use a bit more? It's unstable - do you
> > expect to experiment with it and then make it permanent down
> > the road?
> 
> No, my aspirations end at providing an experimental option.
> Technically all devices should support and honor this bit, so I don't
> think we should provide a supported method of providing broken behavior,
> but there do exist physical devices where this feature is broken or
> unsupported.  Rather than implementing emulation of one of these broken
> devices, with bug for bug compatibility, it's much easier to be able to
> trigger broken DisINTx behavior on an arbitrary device, in an
> unsupported fashion.  Thanks,
> 
> Alex

Well, we tend not to merge patches for playing with random
bits in config space just so people can experiment with
whether this breaks guests, but given this is coming from
a long term contributor and a maintainer, it's a different
matter. So ok, to make another maintainer's life easier
I'm prepared to take this. I'd like to figure out though -
does your need extend to experimenting with all devices
or just with vfio ones? If the latter maybe keep it there
where you understand what the actual need is... If the former
as I said I'll merge it.


> > > Signed-off-by: Alex Williamson 
> > > ---
> > >  hw/pci/pci.c | 14 ++
> > >  include/hw/pci/pci.h |  2 ++
> > >  2 files changed, 12 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > index 6496d027ca61..8c78326ad67f 100644
> > > --- a/hw/pci/pci.c
> > > +++ b/hw/pci/pci.c
> > > @@ -85,6 +85,8 @@ static Property pci_props[] = {
> > >  QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
> > >  DEFINE_PROP_BIT("x-pcie-ari-nextfn-1", PCIDevice, cap_present,
> > >  QEMU_PCIE_ARI_NEXTFN_1_BITNR, false),
> > > +DEFINE_PROP_BIT("x-pci-disintx", PCIDevice, cap_present,
> > > +QEMU_PCI_DISINTX_BITNR, true),
> > >  DEFINE_PROP_END_OF_LIST()
> > >  };
> > >  
> > > @@ -861,13 +863,17 @@ static void pci_init_cmask(PCIDevice *dev)
> > >  static void pci_init_wmask(PCIDevice *dev)
> > >  {
> > >  int config_size = pci_config_size(dev);
> > > +uint16_t cmd_wmask = PCI_COMMAND_IO | PCI_COMMAND_MEMORY |
> > > + PCI_COMMAND_MASTER | PCI_COMMAND_SERR;
> > >  
> > >  dev->wmask[PCI_CACHE_LINE_SIZE] = 0xff;
> > >  dev->wmask[PCI_INTERRUPT_LINE] = 0xff;
> > > -pci_set_word(dev->wmask + PCI_COMMAND,
> > > - PCI_COMMAND_IO | PCI_COMMAND_MEMORY | 
> > > PCI_COMMAND_MASTER |
> > > - PCI_COMMAND_INTX_DISABLE);
> > > -pci_word_test_and_set_mask(dev->wmask + PCI_COMMAND, 
> > > PCI_COMMAND_SERR);
> > > +
> > > +if (dev->cap_present & QEMU_PCI_DISINTX) {
> > > +cmd_wmask |= PCI_COMMAND_INTX_DISABLE;
> > > +}
> > > +
> > > +pci_set_word(dev->wmask + PCI_COMMAND, cmd_wmask);
> > >  
> > >  memset(dev->wmask + PCI_CONFIG_HEADER_SIZE, 0xff,
> > > config_size - PCI_CONFIG_HEADER_SIZE);
> > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > index eaa3fc99d884..45f0fac435cc 100644
> > > --- a/include/hw/pci/pci.h
> > > +++ b/include/hw/pci/pci.h
> > > @@ -212,6 +212,8 @@ enum {
> > >  QEMU_PCIE_ERR_UNC_MASK = (1 << QEMU_PCIE_ERR_UNC_MASK_BITNR),
> > >  #define QEMU_PCIE_ARI_NEXTFN_1_BITNR 12
> > >  QEMU_PCIE_ARI_NEXTFN_1 = (1 << QEMU_PCIE_ARI_NEXTFN_1_BITNR),
> > > +#define QEMU_PCI_DISINTX_BITNR 13
> > > +QEMU_PCI_DISINTX = (1 << QEMU_PCI_DISINTX_BITNR),
> > >  };
> > >  
> > >  typedef struct PCIINTxRoute {
> > > -- 
> > > 2.44.0  
> >
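
For readers unfamiliar with wmask, the effect of the patch quoted above can
be sketched as follows: a config-space write only changes bits that are set
in the write mask, so leaving the INTx disable bit out of the mask makes it
read-only to the guest. This is a standalone illustration, not QEMU code. The
sketch_-prefixed helpers are hypothetical; the bit values are the command
register bits from the PCI specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* PCI command register bits (values per the PCI specification). */
#define SKETCH_CMD_IO            0x0001
#define SKETCH_CMD_MEMORY        0x0002
#define SKETCH_CMD_MASTER        0x0004
#define SKETCH_CMD_SERR          0x0100
#define SKETCH_CMD_INTX_DISABLE  0x0400

/* Build the command-register write mask, optionally including DisINTx. */
static uint16_t sketch_cmd_wmask(bool disintx_writable)
{
    uint16_t wmask = SKETCH_CMD_IO | SKETCH_CMD_MEMORY |
                     SKETCH_CMD_MASTER | SKETCH_CMD_SERR;

    if (disintx_writable) {
        wmask |= SKETCH_CMD_INTX_DISABLE;
    }
    return wmask;
}

/* Apply a guest write: bits outside wmask keep their old value. */
static uint16_t sketch_cmd_write(uint16_t old, uint16_t val, uint16_t wmask)
{
    return (uint16_t)((old & ~wmask) | (val & wmask));
}
```

With the mask built for a device where the option is turned off, a guest
that writes DisINTx and reads the register back sees the bit still clear,
which is how an OS would conclude that device-level INTx masking is
unsupported.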

Re: [PATCH v2 3/5] gdbstub: Save target's siginfo

2024-03-08 Thread Alex Bennée

Gustavo Romero  writes:

> Hi Alex,
>
> On 3/7/24 7:33 PM, Alex Bennée wrote:
>> Richard Henderson  writes:
>> 
>>> On 3/7/24 08:26, Gustavo Romero wrote:
>>>> Save target's siginfo into gdbserver_state so it can be used later, for
>>>> example, in any stub that requires the target's si_signo and si_code.
>>>> This change affects only linux-user mode.
>>>> Signed-off-by: Gustavo Romero 
>>>> Suggested-by: Richard Henderson 
>>>> ---
>>>>   gdbstub/internals.h    |  3 +++
>>>>   gdbstub/user-target.c  |  3 ++-
>>>>   gdbstub/user.c         | 14 ++
>>>>   include/gdbstub/user.h |  6 +-
>>>>   linux-user/main.c      |  2 +-
>>>>   linux-user/signal.c    |  5 -
>>>>   6 files changed, 25 insertions(+), 8 deletions(-)
>>>> diff --git a/gdbstub/internals.h b/gdbstub/internals.h
>>>> index 56b7c13b75..a7cc69dab3 100644
>>>> --- a/gdbstub/internals.h
>>>> +++ b/gdbstub/internals.h
>>>> @@ -58,6 +58,9 @@ typedef struct GDBState {
>>>>       int line_csum; /* checksum at the end of the packet */
>>>>       GByteArray *last_packet;
>>>>       int signal;
>>>> +#ifdef CONFIG_USER_ONLY
>>>> +    uint8_t siginfo[MAX_SIGINFO_LENGTH];
>>>> +#endif
>>>
>>> If we put this in GDBUserState in user.c -- no need for ifdefs then.
>> Although it does break on FreeBSD's user target:
>>FAILED: libqemu-arm-bsd-user.fa.p/gdbstub_user-target.c.o
>>cc -m64 -mcx16 -Ilibqemu-arm-bsd-user.fa.p -I. -I.. -Itarget/arm 
>> -I../target/arm -I../common-user/host/x86_64 -I../bsd-user/include 
>> -Ibsd-user/freebsd -I../bsd-user/freebsd -I../bsd-user/host/x86_64 
>> -Ibsd-user -I../bsd-user -I../bsd-user/arm -Iqapi -Itrace -Iui -Iui/shader 
>> -I/usr/local/include/capstone -I/usr/local/include/glib-2.0 
>> -I/usr/local/lib/glib-2.0/include -I/usr/local/include 
>> -fdiagnostics-color=auto -Wall -Winvalid-pch -Werror -std=gnu11 -O2 -g 
>> -fstack-protector-strong -Wempty-body -Wendif-labels -Wexpansion-to-defined 
>> -Wformat-security -Wformat-y2k -Wignored-qualifiers -Winit-self 
>> -Wmissing-format-attribute -Wmissing-prototypes -Wnested-externs 
>> -Wold-style-definition -Wredundant-decls -Wstrict-prototypes -Wtype-limits 
>> -Wundef -Wvla -Wwrite-strings -Wno-gnu-variable-sized-type-not-at-end 
>> -Wno-initializer-overrides -Wno-missing-include-dirs -Wno-psabi 
>> -Wno-shift-negative-value -Wno-string-plus-int 
>> -Wno-tautological-type-limit-compare -Wno-typedef-redefinition 
>> -Wthread-safety -iquote . -iquote /tmp/cirrus-ci-build -iquote 
>> /tmp/cirrus-ci-build/include -iquote 
>> /tmp/cirrus-ci-build/host/include/x86_64 -iquote 
>> /tmp/cirrus-ci-build/host/include/generic -iquote 
>> /tmp/cirrus-ci-build/tcg/i386 -pthread -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 
>> -D_LARGEFILE_SOURCE -fno-strict-aliasing -fno-common -fwrapv 
>> -ftrivial-auto-var-init=zero -fPIE -DNEED_CPU_H 
>> '-DCONFIG_TARGET="arm-bsd-user-config-target.h"' 
>> '-DCONFIG_DEVICES="arm-bsd-user-config-devices.h"' -MD -MQ 
>> libqemu-arm-bsd-user.fa.p/gdbstub_user-target.c.o -MF 
>> libqemu-arm-bsd-user.fa.p/gdbstub_user-target.c.o.d -o 
>> libqemu-arm-bsd-user.fa.p/gdbstub_user-target.c.o -c ../gdbstub/user-target.c
>>In file included from ../gdbstub/user-target.c:18:
>>../gdbstub/internals.h:62:21: error: use of undeclared identifier 
>> 'MAX_SIGINFO_LENGTH'
>>   62 | uint8_t siginfo[MAX_SIGINFO_LENGTH];
>>  | ^
>>1 error generated.
>>[2084/6731] Compiling C object libqemu-arm
>> See: https://gitlab.com/stsquad/qemu/-/jobs/6345829419
>
> argh, I've tested all targets for linux-user, but missed bsd-user. I've tried
> once to build it but that requires a BSD-like host, which I don't have at the
> moment, then I forgot about it... Let me set up one and review the change in
> the light of the comments from you and Richard.

  make vm-build-[open|net|free]bsd

see make vm-help for details.

>
> Thanks!
>
>
> Cheers,
> Gustavo

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH v1 2/8] virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

2024-03-08 Thread Michael S. Tsirkin

On Fri, Mar 08, 2024 at 12:45:13PM -0500, Jonah Palmer wrote:
> 
> 
> On 3/8/24 12:36 PM, Eugenio Perez Martin wrote:
> > On Fri, Mar 8, 2024 at 6:01 PM Michael S. Tsirkin  wrote:
> > > 
> > > On Mon, Mar 04, 2024 at 02:46:06PM -0500, Jonah Palmer wrote:
> > > > Prevent ioeventfd from being enabled/disabled when a virtio-pci
> > > > device has negotiated the VIRTIO_F_NOTIFICATION_DATA transport
> > > > feature.
> > > >
> > > > Due to ioeventfd not being able to carry the extra data associated with
> > > > this feature, the ioeventfd should be left in a disabled state for
> > > > emulated virtio-pci devices using this feature.
> > > >
> > > > Reviewed-by: Eugenio Pérez 
> > > > Signed-off-by: Jonah Palmer 
> > > 
> > > I thought hard about this. I propose that for now,
> > > instead of disabling ioeventfd silently we error out unless
> > > user disabled it for us.
> > > WDYT?
> > > 
> > 
> > Yes, error is a better plan than silently disabling it. In the
> > (unlikely?) case we are able to make notification data work with
> > eventfd in the future, it makes the change more evident.
> > 
> 
> Will do in v2. I assume we'll also make this the case for virtio-mmio and
> virtio-ccw?

Guess so. Pls note freeze is imminent.
> > > 
> > > > ---
> > > >  hw/virtio/virtio-pci.c | 6 ++++--
> > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > > > index d12edc567f..287b8f7720 100644
> > > > --- a/hw/virtio/virtio-pci.c
> > > > +++ b/hw/virtio/virtio-pci.c
> > > > @@ -417,13 +417,15 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
> > > >  }
> > > >  break;
> > > >  case VIRTIO_PCI_STATUS:
> > > > -if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
> > > > +if (!(val & VIRTIO_CONFIG_S_DRIVER_OK) &&
> > > > +!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
> > > >  virtio_pci_stop_ioeventfd(proxy);
> > > >  }
> > > >
> > > >  virtio_set_status(vdev, val & 0xFF);
> > > >
> > > > -if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
> > > > +if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
> > > > +!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
> > > >  virtio_pci_start_ioeventfd(proxy);
> > > >  }
> > > >
> > > > --
> > > > 2.39.3
> > > 
> >
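
The proposal discussed above (fail loudly instead of silently disabling ioeventfd) could be sketched roughly as follows. This is a standalone illustration, not QEMU code: check_ioeventfd_config(), its ioeventfd_requested argument and the error string are hypothetical names made up for this sketch. Only the feature bit number (38) is taken from the virtio specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Feature bit number as defined by the virtio specification. */
#define VIRTIO_F_NOTIFICATION_DATA 38

/*
 * Hypothetical helper: returns NULL when the configuration is acceptable,
 * or a static error string when NOTIFICATION_DATA was negotiated while
 * the user left ioeventfd enabled.
 */
const char *check_ioeventfd_config(uint64_t guest_features,
                                   bool ioeventfd_requested)
{
    bool notif_data =
        (guest_features & (1ULL << VIRTIO_F_NOTIFICATION_DATA)) != 0;

    if (notif_data && ioeventfd_requested) {
        return "ioeventfd cannot carry notification data; disable ioeventfd";
    }
    return NULL;
}
```

A caller would presumably turn the returned string into a device error rather than continuing with ioeventfd quietly turned off; where exactly such a check belongs in virtio-pci is left open here.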

[RFC PATCH INCOMPLETE] cxl: Multi-headed Single Logical Device (MHSLD)

2024-03-08 Thread Gregory Price

Implement the scaffolding for an MHSLD with a simple multi-headed
command set. This device inherits the cxl-type3 device and will
provide the hooks needed to serialize multi-system devices.

This device requires linux, as it uses shared memory (shmem) to
implement the shared state.  This also limits the emulation of
multiple guests to a single host.

To instantiate:

-device cxl-mhsld,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,sn=6,mhd-head=0,mhd-shmid=15

The MH-SLD shared memory region must be initialized via 'init_mhsld'
tool located in the cxl/mhsld directory.

usage: init_mhsld <heads> <shmid>
heads : number of heads on the device
shmid : shmid produced by ipcmk

Example:
shmid=$(ipcmk -M 131072 | awk '{print $NF}')
./init_mhsld 4 "$shmid"


What we should discuss:
* What state needs to be in MHSLDSharedState
  - for DCD
  - for other CXL extensions? (RAS?)

* What hooks we need to add to cxl-type3 for MHDs
  - Example: cvc->mhd_access_valid was added for Niagara
  - Additional calls will need to be added for things like DCD

* Shared state design and serialization
  - Right now we use the same shared memory solution. This makes
    serialization a little easier, since we can use a mutex in shmem.
  - This limits emulation of MHSLD devices to QEMU guests located
    on the same host, but this seems fine for functional testing.

* Shared state initialization
  - Right now I use the external tool, but if we want to limit
    the ability to test things like restarting hosts, we could
    instead add an explicit `initialize_state=true` option to the
    device and ditch the extra program.
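
For the "mutex in shmem" point above, a minimal sketch of what a process-shared lock inside the shared state might look like. The struct layout and the names shared_state_sketch and shared_state_init() are hypothetical and are not part of this patch; the pthread calls are standard POSIX.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a lock followed by the head/LD counters. */
struct shared_state_sketch {
    pthread_mutex_t lock;
    uint8_t nr_heads;
    uint8_t nr_lds;
};

/* Initialise the state in place; returns 0 or a pthread error number. */
int shared_state_init(struct shared_state_sketch *st, uint8_t heads)
{
    pthread_mutexattr_t attr;
    int ret;

    memset(st, 0, sizeof(*st));
    ret = pthread_mutexattr_init(&attr);
    if (ret) {
        return ret;
    }
    /* Allow every process that maps the segment to contend on the lock. */
    ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (!ret) {
        ret = pthread_mutex_init(&st->lock, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    if (ret) {
        return ret;
    }
    st->nr_heads = heads;
    st->nr_lds = heads;
    return 0;
}
```

In real use the struct would live in the segment returned by shmat() rather than in ordinary memory. A plain process-shared mutex stays locked if its holder dies, so testing host restarts would probably need a robust mutex or some other recovery scheme.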

Signed-off-by: Gregory Price 
---
 hw/cxl/Kconfig|   1 +
 hw/cxl/meson.build|   1 +
 hw/cxl/mhsld/.gitignore   |   1 +
 hw/cxl/mhsld/Kconfig  |   4 +
 hw/cxl/mhsld/init_mhsld.c |  76 
 hw/cxl/mhsld/meson.build  |   3 +
 hw/cxl/mhsld/mhsld.c  | 177 ++
 hw/cxl/mhsld/mhsld.h  |  64 ++
 8 files changed, 327 insertions(+)
 create mode 100644 hw/cxl/mhsld/.gitignore
 create mode 100644 hw/cxl/mhsld/Kconfig
 create mode 100644 hw/cxl/mhsld/init_mhsld.c
 create mode 100644 hw/cxl/mhsld/meson.build
 create mode 100644 hw/cxl/mhsld/mhsld.c
 create mode 100644 hw/cxl/mhsld/mhsld.h

diff --git a/hw/cxl/Kconfig b/hw/cxl/Kconfig
index e603839a62..919e59b598 100644
--- a/hw/cxl/Kconfig
+++ b/hw/cxl/Kconfig
@@ -1,3 +1,4 @@
+source mhsld/Kconfig
 source vendor/Kconfig
 
 config CXL
diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
index e8c8c1355a..394750dd19 100644
--- a/hw/cxl/meson.build
+++ b/hw/cxl/meson.build
@@ -16,4 +16,5 @@ system_ss.add(when: 'CONFIG_I2C_MCTP_CXL', if_true: files('i2c_mctp_cxl.c'))
 
 system_ss.add(when: 'CONFIG_ALL', if_true: files('cxl-host-stubs.c'))
 
+subdir('mhsld')
 subdir('vendor')
diff --git a/hw/cxl/mhsld/.gitignore b/hw/cxl/mhsld/.gitignore
new file mode 100644
index 00..f9f6a37cc0
--- /dev/null
+++ b/hw/cxl/mhsld/.gitignore
@@ -0,0 +1 @@
+init_mhsld
diff --git a/hw/cxl/mhsld/Kconfig b/hw/cxl/mhsld/Kconfig
new file mode 100644
index 00..dc2be15140
--- /dev/null
+++ b/hw/cxl/mhsld/Kconfig
@@ -0,0 +1,4 @@
+config CXL_MHSLD
+bool
+depends on CXL_MEM_DEVICE
+default y
diff --git a/hw/cxl/mhsld/init_mhsld.c b/hw/cxl/mhsld/init_mhsld.c
new file mode 100644
index 00..d9e0cd54e0
--- /dev/null
+++ b/hw/cxl/mhsld/init_mhsld.c
@@ -0,0 +1,76 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (c) 2024 MemVerge Inc.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct mhsld_state {
+uint8_t nr_heads;
+uint8_t nr_lds;
+uint8_t ldmap[65536];
+};
+
+int main(int argc, char *argv[])
+{
+int shmid = 0;
+uint32_t sections = 0;
+uint32_t section_size = 0;
+uint32_t heads = 0;
+struct mhsld_state *mhsld_state = NULL;
+size_t state_size;
+uint8_t i;
+
+if (argc != 3) {
+printf("usage: init_mhsld <heads> <shmid>\n"
+"\theads : number of heads on the device\n"
+"\tshmid : /tmp/mytoken.tmp\n\n"
+"It is recommended your shared memory region is at least 128kb\n");
+return -1;
+}
+
+/* must have at least 1 head */
+heads = (uint32_t)atoi(argv[1]);
+if (heads == 0 || heads > 32) {
+printf("bad heads argument (1-32)\n");
+return -1;
+}
+
+shmid = (uint32_t)atoi(argv[2]);
+if (shmid == 0) {
+printf("bad shmid argument\n");
+return -1;
+}
+
+mhsld_state = shmat(shmid, NULL, 0);
+if (mhsld_state == (void *)-1) {
+printf("Unable to attach to shared memory\n");
+return -1;
+}
+
+/* Initialize the mhsld_state */
+state_size = sizeof(struct mhsld_state) + (sizeof(uint32_t) * sections);
+memset(mhsld_state, 0, state_size);
+mhsld_state->nr_heads = heads;
+mhsld_state->nr_lds = heads;
+
+

Re: [PATCH v2 4/5] gdbstub: Add Xfer:siginfo:read stub

2024-03-08 Thread Gustavo Romero


Hi Richard!

On 3/7/24 6:13 PM, Richard Henderson wrote:
> On 3/7/24 08:26, Gustavo Romero wrote:
>> +void gdb_handle_query_xfer_siginfo(GArray *params, void *user_ctx)
>> +{
>> +    unsigned long offset, len;
>> +    uint8_t *siginfo_offset;
>> +
>> +    offset = get_param(params, 0)->val_ul;
>> +    len = get_param(params, 1)->val_ul;
>> +
>> +    if (offset + len > sizeof(target_siginfo_t)) {
>
> If you save the siginfo_len from gdb_handlesig, you can place this in user.c

Shouldn't all user-only stubs be placed in user-target.c, like
gdb_handle_query_xfer_auxv and gdb_handle_query_xfer_exec_file, since
CONFIG_USER_ONLY is what controls whether user-target.c is built?

> Is it really correct to reject (offset == 0) + (len == large), rather than
> truncate len?

I think this is correct. GDB mentions briefly that an invalid offset
should be treated as an error. Thus, I think that a valid offset with
a non-existing/invalid (large) length should be treated the same,
because in the end data at invalid offsets is being requested anyway.

>> +    /* Reply */
>> +    g_string_assign(gdbserver_state.str_buf, "l");
>> +    gdb_memtox(gdbserver_state.str_buf, (const char *)siginfo_offset, len);
>
> It seems easy enough to reply with the exact length remaining...

I think the correct behavior is to reply with an error when GDB asks
for data we don't have, rather than returning something else to satisfy
GDB. If offset+len is inside target_siginfo_t, then that's ok; otherwise
that's an error.
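
To make the two positions in this exchange concrete, here is a small standalone sketch of both policies. xfer_range_valid() and xfer_range_clamp() are hypothetical helpers, not functions from the patch. The strict check is also written so that it cannot wrap around, which a plain `offset + len > size` comparison on unsigned values can do when len is very large.

```c
#include <stdbool.h>
#include <stddef.h>

/* Strict policy: the whole range [offset, offset + len) must be in bounds. */
bool xfer_range_valid(size_t offset, size_t len, size_t size)
{
    /* Avoids computing offset + len, so the test cannot overflow. */
    return offset <= size && len <= size - offset;
}

/* Lenient policy: a bad offset is an error, an over-long len is truncated. */
bool xfer_range_clamp(size_t offset, size_t *len, size_t size)
{
    if (offset > size) {
        return false;
    }
    if (*len > size - offset) {
        *len = size - offset;
    }
    return true;
}
```

With a 128-byte siginfo, a request for offset 0 and length 4096 is rejected by the first helper and shortened to 128 bytes by the second.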


Cheers,
Gustavo

[PULL 6/9] oslib-posix: fix memory leak in touch_all_pages

2024-03-08 Thread Paolo Bonzini

touch_all_pages() can return early, before creating threads.  In this case,
however, it leaks the MemsetContext that it has allocated at the
beginning of the function.

Reported by Coverity as CID 1534922.

Fixes: 04accf43df8 ("oslib-posix: initialize backend memory objects in parallel", 2024-02-06)
Reviewed-by: Mark Kanda 
Signed-off-by: Paolo Bonzini 
---
 util/oslib-posix.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 3c379f96c26..e76441695bd 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -467,11 +467,13 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
  * preallocating synchronously.
  */
 if (context->num_threads == 1 && !async) {
+ret = 0;
 if (qemu_madvise(area, hpagesize * numpages,
  QEMU_MADV_POPULATE_WRITE)) {
-return -errno;
+ret = -errno;
 }
-return 0;
+g_free(context);
+return ret;
 }
 touch_fn = do_madv_populate_write_pages;
 } else {
-- 
2.43.2
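
The shape of the fix above (record the result, free the context, then return) can be shown in isolation. The sketch below is not the QEMU function: ctx_sketch stands in for MemsetContext, and the fail_with argument is a hypothetical knob that simulates a failing madvise call.

```c
#include <errno.h>
#include <stdlib.h>

/* Stand-in for MemsetContext; only the field used here is modelled. */
struct ctx_sketch {
    int num_threads;
};

/* fail_with: 0 for success, or an errno value to simulate a failure. */
int touch_pages_sketch(int fail_with)
{
    struct ctx_sketch *context = calloc(1, sizeof(*context));
    int ret = 0;

    if (!context) {
        return -ENOMEM;
    }
    context->num_threads = 1;

    if (context->num_threads == 1) {
        if (fail_with) {
            ret = -fail_with;   /* remember the error, do not return yet */
        }
        free(context);          /* released on both outcomes */
        return ret;
    }

    free(context);
    return 0;
}
```

Both the success and the failure path of the early exit now pass through the same free() call, which is what the Coverity report asked for.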

[PULL v2 0/9] Misc fixes and coverity CI for 2024-03-08

2024-03-08 Thread Paolo Bonzini

The following changes since commit 8f6330a807f2642dc2a3cdf33347aa28a4c00a87:

  Merge tag 'pull-maintainer-updates-060324-1' of 
https://gitlab.com/stsquad/qemu into staging (2024-03-06 16:56:20 +)

are available in the Git repository at:

  https://gitlab.com/bonzini/qemu.git tags/for-upstream

for you to fetch changes up to 83aa1baa069c8f77aa9f7d9adfdeb11d90bdf78d:

  gitlab-ci: add manual job to run Coverity (2024-03-08 19:08:23 +0100)

Supersedes: <20240308145554.599614-1-pbonz...@redhat.com>

* target/i386: use TSTEQ/TSTNE
* move Coverity builds to Gitlab CI
* fix two memory leaks
* bug fixes


Akihiko Odaki (1):
  meson: Remove --warn-common ldflag

Dmitrii Gavrilov (1):
  system/qdev-monitor: move drain_call_rcu call under if (!dev) in 
qmp_device_add()

Paolo Bonzini (5):
  hw/intc/apic: fix memory leak
  oslib-posix: fix memory leak in touch_all_pages
  mips: do not list individual devices from configs/
  run-coverity-scan: add --check-upload-only option
  gitlab-ci: add manual job to run Coverity

Sven Schnelle (2):
  hw/scsi/lsi53c895a: add timer to scripts processing
  hw/scsi/lsi53c895a: stop script on phase mismatch

 configs/devices/mips-softmmu/common.mak  | 28 ++---
 configs/devices/mips64el-softmmu/default.mak |  3 --
 meson.build  |  5 ---
 hw/intc/apic.c   |  6 +--
 hw/scsi/lsi53c895a.c | 59 +---
 system/qdev-monitor.c| 23 ++-
 util/oslib-posix.c   |  6 ++-
 .gitlab-ci.d/base.yml|  4 ++
 .gitlab-ci.d/buildtest.yml   | 39 +-
 .gitlab-ci.d/opensbi.yml |  4 ++
 hw/display/Kconfig   |  2 +-
 hw/mips/Kconfig  | 20 +-
 hw/scsi/trace-events |  2 +
 scripts/coverity-scan/run-coverity-scan  | 59 
 14 files changed, 176 insertions(+), 84 deletions(-)
-- 
2.43.2

[PULL 4/9] hw/scsi/lsi53c895a: stop script on phase mismatch

2024-03-08 Thread Paolo Bonzini

From: Sven Schnelle 

NetBSD isn't happy with QEMU's lsi53c895a emulation:

cd0(esiop0:0:2:0): command with tag id 0 reset
esiop0: autoconfiguration error: phase mismatch without command
esiop0: autoconfiguration error: unhandled scsi interrupt, sist=0x80 sstat1=0x0 DSA=0x23a64b1 DSP=0x50

This is because lsi_bad_phase() triggers a phase mismatch, which
stops SCRIPT processing. However, after returning to
lsi_command_complete(), SCRIPT is restarted with lsi_resume_script().
Fix this by adding a return value to lsi_bad_phase(), and only resume
script processing when lsi_bad_phase() didn't trigger a host interrupt.

Signed-off-by: Sven Schnelle 
Tested-by: Helge Deller 
Message-ID: <20240302214453.2071388-1-sv...@stackframe.org>
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/lsi53c895a.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index 4ff94703816..59b88aff3fb 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -573,8 +573,9 @@ static inline void lsi_set_phase(LSIState *s, int phase)
 s->sstat1 = (s->sstat1 & ~PHASE_MASK) | phase;
 }
 
-static void lsi_bad_phase(LSIState *s, int out, int new_phase)
+static int lsi_bad_phase(LSIState *s, int out, int new_phase)
 {
+int ret = 0;
 /* Trigger a phase mismatch.  */
 if (s->ccntl0 & LSI_CCNTL0_ENPMJ) {
 if ((s->ccntl0 & LSI_CCNTL0_PMJCTL)) {
@@ -587,8 +588,10 @@ static void lsi_bad_phase(LSIState *s, int out, int new_phase)
 trace_lsi_bad_phase_interrupt();
 lsi_script_scsi_interrupt(s, LSI_SIST0_MA, 0);
 lsi_stop_script(s);
+ret = 1;
 }
 lsi_set_phase(s, new_phase);
+return ret;
 }
 
 
@@ -792,7 +795,7 @@ static int lsi_queue_req(LSIState *s, SCSIRequest *req, uint32_t len)
 static void lsi_command_complete(SCSIRequest *req, size_t resid)
 {
 LSIState *s = LSI53C895A(req->bus->qbus.parent);
-int out;
+int out, stop = 0;
 
 out = (s->sstat1 & PHASE_MASK) == PHASE_DO;
 trace_lsi_command_complete(req->status);
@@ -800,7 +803,10 @@ static void lsi_command_complete(SCSIRequest *req, size_t resid)
 s->command_complete = 2;
 if (s->waiting && s->dbc != 0) {
 /* Raise phase mismatch for short transfers.  */
-lsi_bad_phase(s, out, PHASE_ST);
+stop = lsi_bad_phase(s, out, PHASE_ST);
+if (stop) {
+s->waiting = 0;
+}
 } else {
 lsi_set_phase(s, PHASE_ST);
 }
@@ -810,7 +816,9 @@ static void lsi_command_complete(SCSIRequest *req, size_t resid)
 lsi_request_free(s, s->current);
 scsi_req_unref(req);
 }
-lsi_resume_script(s);
+if (!stop) {
+lsi_resume_script(s);
+}
 }
 
  /* Callback to indicate that the SCSI layer has completed a transfer.  */
-- 
2.43.2
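
The control flow introduced by this patch can be modelled in a few lines. The sketch below is a simplified stand-in, not the device model: ctrl_sketch, bad_phase_sketch() and command_complete_sketch() are hypothetical names, and the jump_enabled flag loosely represents the case where the mismatch is handled by a jump inside the script instead of a host interrupt.

```c
#include <stdbool.h>

/* Minimal stand-in for the controller state touched by the patch. */
struct ctrl_sketch {
    bool script_running;
    bool irq_raised;
    int waiting;
};

/* Returns 1 if a host interrupt was raised and the script was stopped. */
int bad_phase_sketch(struct ctrl_sketch *s, bool jump_enabled)
{
    if (jump_enabled) {
        return 0;               /* handled inside the script, no interrupt */
    }
    s->irq_raised = true;
    s->script_running = false;
    return 1;
}

void command_complete_sketch(struct ctrl_sketch *s, bool short_transfer,
                             bool jump_enabled)
{
    int stop = 0;

    if (short_transfer) {
        stop = bad_phase_sketch(s, jump_enabled);
        if (stop) {
            s->waiting = 0;
        }
    }
    /* Only resume when the mismatch did not hand control to the host. */
    if (!stop) {
        s->script_running = true;
    }
}
```

Before the change, the equivalent of the final assignment ran unconditionally, so a script that had just been stopped for the host was immediately restarted.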

Re: [PULL 00/12] Misc fixes, i386 TSTEQ/TSTNE, coverity CI for 2024-03-08

2024-03-08 Thread Paolo Bonzini

On Fri, Mar 8, 2024 at 6:32 PM Peter Maydell  wrote:
> Looks like this hits a TCG assertion on aarch64 host:
> https://gitlab.com/qemu-project/qemu/-/jobs/6353434430

Ok, I dropped the TSTEQ/TSTNE patches.

Paolo

[PULL 5/9] hw/intc/apic: fix memory leak

2024-03-08 Thread Paolo Bonzini

deliver_bitmask is allocated on the heap in apic_deliver(), but there
are many paths in the function that return before the corresponding
g_free() is reached.  Fix this by switching to g_autofree and, while at
it, also switch to g_new.  Do the same in apic_deliver_irq() as well
for consistency.

Fixes: b5ee0468e9d ("apic: add support for x2APIC mode", 2024-02-14)
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Bui Quang Minh 
Signed-off-by: Paolo Bonzini 
---
 hw/intc/apic.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index 1d887d66b86..4186c57b34c 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -291,14 +291,13 @@ static void apic_deliver_irq(uint32_t dest, uint8_t dest_mode,
  uint8_t delivery_mode, uint8_t vector_num,
  uint8_t trigger_mode)
 {
-uint32_t *deliver_bitmask = g_malloc(max_apic_words * sizeof(uint32_t));
+g_autofree uint32_t *deliver_bitmask = g_new(uint32_t, max_apic_words);
 
 trace_apic_deliver_irq(dest, dest_mode, delivery_mode, vector_num,
trigger_mode);
 
 apic_get_delivery_bitmask(deliver_bitmask, dest, dest_mode);
 apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
-g_free(deliver_bitmask);
 }
 
 bool is_x2apic_mode(DeviceState *dev)
@@ -662,7 +661,7 @@ static void apic_deliver(DeviceState *dev, uint32_t dest, uint8_t dest_mode,
 APICCommonState *s = APIC(dev);
 APICCommonState *apic_iter;
 uint32_t deliver_bitmask_size = max_apic_words * sizeof(uint32_t);
-uint32_t *deliver_bitmask = g_malloc(deliver_bitmask_size);
+g_autofree uint32_t *deliver_bitmask = g_new(uint32_t, max_apic_words);
 uint32_t current_apic_id;
 
 if (is_x2apic_mode(dev)) {
@@ -708,7 +707,6 @@ static void apic_deliver(DeviceState *dev, uint32_t dest, uint8_t dest_mode,
 }
 
 apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
-g_free(deliver_bitmask);
 }
 
 static bool apic_check_pic(APICCommonState *s)
-- 
2.43.2
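
For readers without GLib at hand, the effect of g_autofree on early returns can be approximated with the GCC/Clang cleanup attribute, which is the compiler feature GLib builds on. The sketch below is a stand-in only: AUTOFREE_U32, free_u32p(), deliver_sketch() and the frees_seen counter are hypothetical names for this illustration and are not GLib or QEMU definitions.

```c
#include <stdint.h>
#include <stdlib.h>

int frees_seen;   /* counts cleanup calls, for demonstration only */

/* Cleanup handler: receives a pointer to the annotated variable. */
void free_u32p(uint32_t **p)
{
    free(*p);
    frees_seen++;
}

/* GCC/Clang extension; a stand-in for g_autofree, not GLib's macro. */
#define AUTOFREE_U32 __attribute__((cleanup(free_u32p)))

int deliver_sketch(int early_return, size_t words)
{
    AUTOFREE_U32 uint32_t *bitmask = calloc(words, sizeof(uint32_t));

    if (!bitmask) {
        return -1;
    }
    if (early_return) {
        return 1;       /* bitmask is still released on this path */
    }
    bitmask[0] = 1;
    return 0;
}
```

Every return statement releases the buffer, so new early exits added later cannot reintroduce the leak.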

[PULL 9/9] gitlab-ci: add manual job to run Coverity

2024-03-08 Thread Paolo Bonzini

Add a job that can be run, either manually or on a schedule, to upload
a build to Coverity Scan.  The job runs the run-coverity-scan script
in separate phases (check, download tools, upload) in order to avoid
both wasting time (everything is skipped if you are above the upload
quota) and filling the log with the progress of downloading the tools.

The job is intended to run on a scheduled pipeline run, and scheduled
runs will not get any other job.  It requires two variables to be in
GitLab CI, COVERITY_TOKEN and COVERITY_EMAIL.  Those are already set up
in qemu-project's configuration as protected and masked variables.

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Paolo Bonzini 
---
 .gitlab-ci.d/base.yml      |  4 ++++
 .gitlab-ci.d/buildtest.yml | 37 +++++++++++++++++++++++++++++++++++++
 .gitlab-ci.d/opensbi.yml   |  4 ++++
 3 files changed, 45 insertions(+)

diff --git a/.gitlab-ci.d/base.yml b/.gitlab-ci.d/base.yml
index ef173a34e63..2dd8a9b57cb 100644
--- a/.gitlab-ci.d/base.yml
+++ b/.gitlab-ci.d/base.yml
@@ -41,6 +41,10 @@ variables:
 - if: '$CI_PROJECT_NAMESPACE == $QEMU_CI_UPSTREAM && $CI_COMMIT_TAG'
   when: never
 
+# Scheduled runs on mainline don't get pipelines except for the special Coverity job
+- if: '$CI_PROJECT_NAMESPACE == $QEMU_CI_UPSTREAM && $CI_PIPELINE_SOURCE == "schedule"'
+  when: never
+
 # Cirrus jobs can't run unless the creds / target repo are set
 - if: '$QEMU_JOB_CIRRUS && ($CIRRUS_GITHUB_REPO == null || $CIRRUS_API_TOKEN == null)'
   when: never
diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 901265af95d..c7d92fc3018 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -729,3 +729,40 @@ pages:
   - public
   variables:
 QEMU_JOB_PUBLISH: 1
+
+coverity:
+  image: $CI_REGISTRY_IMAGE/qemu/fedora:$QEMU_CI_CONTAINER_TAG
+  stage: build
+  allow_failure: true
+  timeout: 3h
+  needs:
+- job: amd64-fedora-container
+  optional: true
+  before_script:
+- dnf install -y curl wget
+  script:
+# would be nice to cancel the job if over quota (https://gitlab.com/gitlab-org/gitlab/-/issues/256089)
+# for example:
+#   curl --request POST --header "PRIVATE-TOKEN: $CI_JOB_TOKEN" "${CI_SERVER_URL}/api/v4/projects/${CI_PROJECT_ID}/jobs/${CI_JOB_ID}/cancel
+- 'scripts/coverity-scan/run-coverity-scan --check-upload-only || { exitcode=$?; if test $exitcode = 1; then
+exit 0;
+  else
+exit $exitcode;
+  fi; };
+  scripts/coverity-scan/run-coverity-scan --update-tools-only > update-tools.log 2>&1 || { cat update-tools.log; exit 1; };
+  scripts/coverity-scan/run-coverity-scan --no-update-tools'
+  rules:
+- if: '$COVERITY_TOKEN == null'
+  when: never
+- if: '$COVERITY_EMAIL == null'
+  when: never
+# Never included on upstream pipelines, except for schedules
+- if: '$CI_PROJECT_NAMESPACE == $QEMU_CI_UPSTREAM && $CI_PIPELINE_SOURCE == "schedule"'
+  when: on_success
+- if: '$CI_PROJECT_NAMESPACE == $QEMU_CI_UPSTREAM'
+  when: never
+# Forks don't get any pipeline unless QEMU_CI=1 or QEMU_CI=2 is set
+- if: '$QEMU_CI != "1" && $QEMU_CI != "2"'
+  when: never
+# Always manual on forks even if $QEMU_CI == "2"
+- when: manual
diff --git a/.gitlab-ci.d/opensbi.yml b/.gitlab-ci.d/opensbi.yml
index fd293e6c317..42f137d624e 100644
--- a/.gitlab-ci.d/opensbi.yml
+++ b/.gitlab-ci.d/opensbi.yml
@@ -24,6 +24,10 @@
 - if: '$QEMU_CI == "1" && $CI_PROJECT_NAMESPACE != "qemu-project" && $CI_COMMIT_MESSAGE =~ /opensbi/i'
   when: manual
 
+# Scheduled runs on mainline don't get pipelines except for the special Coverity job
+- if: '$CI_PROJECT_NAMESPACE == $QEMU_CI_UPSTREAM && $CI_PIPELINE_SOURCE == "schedule"'
+  when: never
+
 # Run if any files affecting the build output are touched
 - changes:
 - .gitlab-ci.d/opensbi.yml
-- 
2.43.2

[PULL 7/9] mips: do not list individual devices from configs/

2024-03-08 Thread Paolo Bonzini

Add new "select" and "imply" directives if needed.  The resulting
config-devices.mak files are the same as before.
Builds without default devices will become much smaller
than before, and qtests fail (as expected, though suboptimal)
for mips64-softmmu because most tests do not use -nodefaults,
so remove it from build-without-defaults.

Signed-off-by: Paolo Bonzini 
---
 configs/devices/mips-softmmu/common.mak  | 28 +++-
 configs/devices/mips64el-softmmu/default.mak |  3 ---
 .gitlab-ci.d/buildtest.yml   |  2 +-
 hw/display/Kconfig   |  2 +-
 hw/mips/Kconfig  | 20 +-
 5 files changed, 25 insertions(+), 30 deletions(-)

diff --git a/configs/devices/mips-softmmu/common.mak b/configs/devices/mips-softmmu/common.mak
index 1a853841b27..416a5d353e8 100644
--- a/configs/devices/mips-softmmu/common.mak
+++ b/configs/devices/mips-softmmu/common.mak
@@ -1,28 +1,8 @@
 # Common mips*-softmmu CONFIG defines
 
-CONFIG_ISA_BUS=y
-CONFIG_PCI=y
-CONFIG_PCI_DEVICES=y
-CONFIG_VGA_ISA=y
-CONFIG_VGA_MMIO=y
-CONFIG_VGA_CIRRUS=y
-CONFIG_VMWARE_VGA=y
-CONFIG_SERIAL=y
-CONFIG_SERIAL_ISA=y
-CONFIG_PARALLEL=y
-CONFIG_I8254=y
-CONFIG_PCSPK=y
-CONFIG_PCKBD=y
-CONFIG_FDC=y
-CONFIG_I8257=y
-CONFIG_IDE_ISA=y
-CONFIG_PFLASH_CFI01=y
-CONFIG_I8259=y
-CONFIG_MC146818RTC=y
-CONFIG_MIPS_CPS=y
-CONFIG_MIPS_ITU=y
+# Uncomment the following lines to disable these optional devices:
+# CONFIG_PCI_DEVICES=n
+# CONFIG_TEST_DEVICES=n
+
 CONFIG_MALTA=y
-CONFIG_PCNET_PCI=y
 CONFIG_MIPSSIM=y
-CONFIG_SMBUS_EEPROM=y
-CONFIG_TEST_DEVICES=y
diff --git a/configs/devices/mips64el-softmmu/default.mak b/configs/devices/mips64el-softmmu/default.mak
index d5188f7ea58..88a37cf27f1 100644
--- a/configs/devices/mips64el-softmmu/default.mak
+++ b/configs/devices/mips64el-softmmu/default.mak
@@ -3,8 +3,5 @@
 include ../mips-softmmu/common.mak
 CONFIG_FULOONG=y
 CONFIG_LOONGSON3V=y
-CONFIG_ATI_VGA=y
-CONFIG_RTL8139_PCI=y
 CONFIG_JAZZ=y
-CONFIG_VT82C686=y
 CONFIG_MIPS_BOSTON=y
diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index a1c030337b1..901265af95d 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -659,7 +659,7 @@ build-without-defaults:
   --disable-pie
   --disable-qom-cast-debug
   --disable-strip
-TARGETS: avr-softmmu mips64-softmmu s390x-softmmu sh4-softmmu
+TARGETS: avr-softmmu s390x-softmmu sh4-softmmu
   sparc64-softmmu hexagon-linux-user i386-linux-user s390x-linux-user
 MAKE_CHECK_ARGS: check
 
diff --git a/hw/display/Kconfig b/hw/display/Kconfig
index 07acb37dc66..234c7de027c 100644
--- a/hw/display/Kconfig
+++ b/hw/display/Kconfig
@@ -55,7 +55,7 @@ config VGA_MMIO
 
 config VMWARE_VGA
 bool
-default y if PCI_DEVICES && PC_PCI
+default y if PCI_DEVICES && (PC_PCI || MIPS)
 depends on PCI
 select VGA
 
diff --git a/hw/mips/Kconfig b/hw/mips/Kconfig
index e57db4f6412..5c83ef49cf6 100644
--- a/hw/mips/Kconfig
+++ b/hw/mips/Kconfig
@@ -1,8 +1,15 @@
 config MALTA
 bool
+imply PCNET_PCI
+imply PCI_DEVICES
+imply TEST_DEVICES
 select FDC37M81X
 select GT64120
+select MIPS_CPS
 select PIIX
+select PFLASH_CFI01
+select SERIAL
+select SMBUS_EEPROM
 
 config MIPSSIM
 bool
@@ -31,17 +38,26 @@ config JAZZ
 
 config FULOONG
 bool
+imply PCI_DEVICES
+imply TEST_DEVICES
+imply ATI_VGA
+imply RTL8139_PCI
 select PCI_BONITO
+select SMBUS_EEPROM
 select VT82C686
 
 config LOONGSON3V
 bool
+imply PCI_DEVICES
+imply TEST_DEVICES
+imply VIRTIO_PCI
+imply VIRTIO_NET
 imply VIRTIO_VGA
 imply QXL if SPICE
+imply USB_OHCI_PCI
 select SERIAL
 select GOLDFISH_RTC
 select LOONGSON_LIOINTC
-select PCI_DEVICES
 select PCI_EXPRESS_GENERIC_BRIDGE
 select MSI_NONBROKEN
 select FW_CFG_MIPS
@@ -53,6 +69,8 @@ config MIPS_CPS
 
 config MIPS_BOSTON
 bool
+imply PCI_DEVICES
+imply TEST_DEVICES
 select FITLOADER
 select MIPS_CPS
 select PCI_EXPRESS_XILINX
-- 
2.43.2

[PULL 8/9] run-coverity-scan: add --check-upload-only option

2024-03-08 Thread Paolo Bonzini

Add an option to check if upload is permitted without actually
attempting a build.  This can be useful to add a third outcome
beyond success and failure---namely, a CI job can self-cancel
if the uploading quota has been reached.

There is a small change here in that a failure to do the upload
check changes the exit code from 1 to 99.  99 was chosen because
it is what Autotools and Meson use to represent a problem in the
setup (as opposed to a failure in the test).

Reviewed-by: Peter Maydell 
Signed-off-by: Paolo Bonzini 
---
 scripts/coverity-scan/run-coverity-scan | 59 ++---
 1 file changed, 42 insertions(+), 17 deletions(-)

diff --git a/scripts/coverity-scan/run-coverity-scan b/scripts/coverity-scan/run-coverity-scan
index d56c9b66776..43cf770f5e3 100755
--- a/scripts/coverity-scan/run-coverity-scan
+++ b/scripts/coverity-scan/run-coverity-scan
@@ -28,6 +28,7 @@
 # project settings, if you have maintainer access there.
 
 # Command line options:
+#   --check-upload-only : return success if upload is possible
 #   --dry-run : run the tools, but don't actually do the upload
 #   --docker : create and work inside a container
 #   --docker-engine : specify the container engine to use (docker/podman/auto);
@@ -57,18 +58,18 @@
 # putting it in a file and using --tokenfile. Everything else has
 # a reasonable default if this is run from a git tree.
 
-check_upload_permissions() {
-# Check whether we can do an upload to the server; will exit the script
-# with status 1 if the check failed (usually a bad token);
-# will exit the script with status 0 if the check indicated that we
-# can't upload yet (ie we are at quota)
-# Assumes that COVERITY_TOKEN, PROJNAME and DRYRUN have been initialized.
+upload_permitted() {
+# Check whether we can do an upload to the server; will exit *the script*
+# with status 99 if the check failed (usually a bad token);
+# will return from the function with status 1 if the check indicated
+# that we can't upload yet (ie we are at quota)
+# Assumes that COVERITY_TOKEN and PROJNAME have been initialized.
 
 echo "Checking upload permissions..."
 
 if ! up_perm="$(wget https://scan.coverity.com/api/upload_permitted --post-data "token=$COVERITY_TOKEN&project=$PROJNAME" -q -O -)"; then
 echo "Coverity Scan API access denied: bad token?"
-exit 1
+exit 99
 fi
 
 # Really up_perm is a JSON response with either
@@ -76,25 +77,40 @@ check_upload_permissions() {
 # We do some hacky string parsing instead of properly parsing it.
 case "$up_perm" in
 *upload_permitted*true*)
-echo "Coverity Scan: upload permitted"
+return 0
 ;;
 *next_upload_permitted_at*)
-if [ "$DRYRUN" = yes ]; then
-echo "Coverity Scan: upload quota reached, continuing dry run"
-else
-echo "Coverity Scan: upload quota reached; stopping here"
-# Exit success as this isn't a build error.
-exit 0
-fi
+return 1
 ;;
 *)
 echo "Coverity Scan upload check: unexpected result $up_perm"
-exit 1
+exit 99
 ;;
 esac
 }
 
 
+check_upload_permissions() {
+# Check whether we can do an upload to the server; will exit the script
+# with status 99 if the check failed (usually a bad token);
+# will exit the script with status 0 if the check indicated that we
+# can't upload yet (ie we are at quota)
+# Assumes that COVERITY_TOKEN, PROJNAME and DRYRUN have been initialized.
+
+if upload_permitted; then
+echo "Coverity Scan: upload permitted"
+else
+if [ "$DRYRUN" = yes ]; then
+echo "Coverity Scan: upload quota reached, continuing dry run"
+else
+echo "Coverity Scan: upload quota reached; stopping here"
+# Exit success as this isn't a build error.
+exit 0
+fi
+fi
+}
+
+
 build_docker_image() {
 # build docker container including the coverity-scan tools
 echo "Building docker container..."
@@ -152,9 +168,14 @@ update_coverity_tools () {
 DRYRUN=no
 UPDATE=yes
 DOCKER=no
+PROJNAME=QEMU
 
 while [ "$#" -ge 1 ]; do
 case "$1" in
+--check-upload-only)
+shift
+DRYRUN=check
+;;
 --dry-run)
 shift
 DRYRUN=yes
@@ -251,6 +272,11 @@ if [ -z "$COVERITY_TOKEN" ]; then
 exit 1
 fi
 
+if [ "$DRYRUN" = check ]; then
+upload_permitted
+exit $?
+fi
+
 if [ -z "$COVERITY_BUILD_CMD" ]; then
 NPROC=$(nproc)
 COVERITY_BUILD_CMD="make -j$NPROC"
@@ -266,7 +292,6 @@ if [ -z "$SRCDIR" ]; then
 SRCDIR="$PWD"
 fi
 
-PROJNAME=QEMU
 TARBALL=cov-int.tar.xz
 
 if [ "$UPDATE" = only ]; then
-- 
2.43.2
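
The three outcomes described in this patch (upload permitted, quota reached, check failed) follow from the order of the shell glob patterns in upload_permitted(). The sketch below mirrors that matching in C with strstr(). The function name classify_upload_response() and the enum names are hypothetical, and the response strings used to exercise it are made up for illustration rather than captured from the Coverity API.

```c
#include <string.h>

/* Exit/return codes as described in the commit message. */
enum {
    UPLOAD_OK = 0,
    UPLOAD_AT_QUOTA = 1,
    UPLOAD_CHECK_BROKEN = 99
};

int classify_upload_response(const char *resp)
{
    const char *p = strstr(resp, "upload_permitted");

    /* Same meaning as the shell pattern *upload_permitted*true* */
    if (p && strstr(p + strlen("upload_permitted"), "true")) {
        return UPLOAD_OK;
    }
    /* Same meaning as the shell pattern *next_upload_permitted_at* */
    if (strstr(resp, "next_upload_permitted_at")) {
        return UPLOAD_AT_QUOTA;
    }
    return UPLOAD_CHECK_BROKEN;
}
```

The order of the tests matters: "next_upload_permitted_at" itself contains "upload_permitted", so the quota case is only reached when no "true" follows that substring.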

[PULL 3/9] meson: Remove --warn-common ldflag

2024-03-08 Thread Paolo Bonzini

From: Akihiko Odaki 

--warn-common ldflag causes warnings for multiple definitions of
___asan_globals_registered when enabling AddressSanitizer with clang.
The warning is somewhat obsolete so just remove it.

The common block is used to allow duplicate definitions of uninitialized
global variables. In the past, GCC and clang used to place such
variables in a common block by default, which prevented programmers from
noticing accidental duplicate definitions. Commit 49237acdb725 ("Enable
ld flag --warn-common") added --warn-common ldflag so that ld warns in
such a case.

Today, neither GCC nor clang uses common blocks by default[1][2], so
any remaining use of common blocks should be intentional. Remove
--warn-common ldflag to suppress warnings for intentional use of
common blocks.

[1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85678
[2]: https://reviews.llvm.org/D75056

Signed-off-by: Akihiko Odaki 
Message-ID: <20240304-common-v1-1-1a2005d1f...@daynix.com>
Signed-off-by: Paolo Bonzini 
---
 meson.build | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/meson.build b/meson.build
index c59ca496f2d..f9dbe7634e5 100644
--- a/meson.build
+++ b/meson.build
@@ -476,11 +476,6 @@ if host_os == 'windows'
   qemu_ldflags += cc.get_supported_link_arguments('-Wl,--dynamicbase', 
'-Wl,--high-entropy-va')
 endif
 
-# Exclude --warn-common with TSan to suppress warnings from the TSan libraries.
-if host_os != 'sunos' and not get_option('tsan')
-  qemu_ldflags += cc.get_supported_link_arguments('-Wl,--warn-common')
-endif
-
 if get_option('fuzzing')
   # Specify a filter to only instrument code that is directly related to
   # virtual-devices.
-- 
2.43.2

[PULL 1/9] hw/scsi/lsi53c895a: add timer to scripts processing

2024-03-08 Thread Paolo Bonzini

From: Sven Schnelle 

HP-UX 10.20 seems to make the lsi53c895a spin on a memory location
under certain circumstances. As the SCSI controller and CPU are not
running at the same time, this loop will never finish. After some
time, the check loop is interrupted with an unexpected device disconnect.
This works, but is slow because the kernel resets the SCSI controller.
Instead of signaling UDC, start a timer and exit the loop. Until the
timer fires, the CPU can process instructions which might change the
memory location.

The limit of instructions is also reduced because scripts running on
the SCSI processor are usually very short. This keeps the time until
the loop is exited short.
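
The idea can be sketched outside of QEMU as follows. This is a simplified
model under stated assumptions, not the device code: every sketch_* name is
hypothetical, a boolean stands in for the QEMUTimer, and the "script" is
reduced to polling one flag.

```c
#include <stdbool.h>

#define SKETCH_MAX_INSN 100   /* hypothetical per-run instruction budget */

enum sketch_wait { SKETCH_NOWAIT, SKETCH_WAIT_SCRIPTS };

struct sketch_dev {
    enum sketch_wait waiting;
    bool timer_armed;          /* stands in for a real timer object */
    const volatile int *flag;  /* memory location the script polls */
    long insns_executed;
};

/*
 * Run the polling script until the flag changes or the budget runs out.
 * Returns true if the script finished, false if it yielded to the timer
 * so that the CPU gets a chance to change the flag.
 */
bool sketch_execute_script(struct sketch_dev *d)
{
    int insn_processed = 0;

    /* Re-entered while waiting: cancel the pending timer first. */
    if (d->waiting == SKETCH_WAIT_SCRIPTS) {
        d->timer_armed = false;
        d->waiting = SKETCH_NOWAIT;
    }

    while (*d->flag == 0) {
        if (++insn_processed > SKETCH_MAX_INSN) {
            /* Budget exhausted: arm the timer and leave the loop. */
            d->waiting = SKETCH_WAIT_SCRIPTS;
            d->timer_armed = true;
            return false;
        }
        d->insns_executed++;
    }
    return true;
}
```

A timer callback would simply call sketch_execute_script() again, which is
what scripts_timer_cb() does with lsi_execute_script() in the patch below.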

Suggested-by: Peter Maydell 
Signed-off-by: Sven Schnelle 
Message-ID: <20240229204407.1699260-1-sv...@stackframe.org>
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/lsi53c895a.c | 43 +--
 hw/scsi/trace-events |  2 ++
 2 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index d607a5f9fb1..4ff94703816 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -188,7 +188,7 @@ static const char *names[] = {
 #define LSI_TAG_VALID (1 << 16)
 
 /* Maximum instructions to process. */
-#define LSI_MAX_INSN    10000
+#define LSI_MAX_INSN    100
 
 typedef struct lsi_request {
 SCSIRequest *req;
@@ -205,6 +205,7 @@ enum {
 LSI_WAIT_RESELECT, /* Wait Reselect instruction has been issued */
 LSI_DMA_SCRIPTS, /* processing DMA from lsi_execute_script */
 LSI_DMA_IN_PROGRESS, /* DMA operation is in progress */
+LSI_WAIT_SCRIPTS, /* SCRIPTS stopped because of instruction count limit */
 };
 
 enum {
@@ -224,6 +225,7 @@ struct LSIState {
 MemoryRegion ram_io;
 MemoryRegion io_io;
 AddressSpace pci_io_as;
+QEMUTimer *scripts_timer;
 
 int carry; /* ??? Should this be an a visible register somewhere?  */
 int status;
@@ -415,6 +417,7 @@ static void lsi_soft_reset(LSIState *s)
 s->sbr = 0;
 assert(QTAILQ_EMPTY(&s->queue));
 assert(!s->current);
+timer_del(s->scripts_timer);
 }
 
 static int lsi_dma_40bit(LSIState *s)
@@ -1127,6 +1130,12 @@ static void lsi_wait_reselect(LSIState *s)
 }
 }
 
+static void lsi_scripts_timer_start(LSIState *s)
+{
+trace_lsi_scripts_timer_start();
+timer_mod(s->scripts_timer, qemu_clock_get_us(QEMU_CLOCK_VIRTUAL) + 500);
+}
+
 static void lsi_execute_script(LSIState *s)
 {
 PCIDevice *pci_dev = PCI_DEVICE(s);
@@ -1136,6 +1145,11 @@ static void lsi_execute_script(LSIState *s)
 int insn_processed = 0;
 static int reentrancy_level;
 
+if (s->waiting == LSI_WAIT_SCRIPTS) {
+timer_del(s->scripts_timer);
+s->waiting = LSI_NOWAIT;
+}
+
 reentrancy_level++;
 
 s->istat1 |= LSI_ISTAT1_SRUN;
@@ -1143,8 +1157,8 @@ again:
 /*
  * Some windows drivers make the device spin waiting for a memory location
  * to change. If we have executed more than LSI_MAX_INSN instructions then
- * assume this is the case and force an unexpected device disconnect. This
- * is apparently sufficient to beat the drivers into submission.
+ * assume this is the case and start a timer. Until the timer fires, the
+ * host CPU has a chance to run and change the memory location.
  *
  * Another issue (CVE-2023-0330) can occur if the script is programmed to
  * trigger itself again and again. Avoid this problem by stopping after
@@ -1152,13 +1166,8 @@ again:
  * which should be enough for all valid use cases).
  */
 if (++insn_processed > LSI_MAX_INSN || reentrancy_level > 8) {
-if (!(s->sien0 & LSI_SIST0_UDC)) {
-qemu_log_mask(LOG_GUEST_ERROR,
-  "lsi_scsi: inf. loop with UDC masked");
-}
-lsi_script_scsi_interrupt(s, LSI_SIST0_UDC, 0);
-lsi_disconnect(s);
-trace_lsi_execute_script_stop();
+s->waiting = LSI_WAIT_SCRIPTS;
+lsi_scripts_timer_start(s);
 reentrancy_level--;
 return;
 }
@@ -2197,6 +2206,9 @@ static int lsi_post_load(void *opaque, int version_id)
 return -EINVAL;
 }
 
+if (s->waiting == LSI_WAIT_SCRIPTS) {
+lsi_scripts_timer_start(s);
+}
 return 0;
 }
 
@@ -2294,6 +2306,15 @@ static const struct SCSIBusInfo lsi_scsi_info = {
 .cancel = lsi_request_cancelled
 };
 
+static void scripts_timer_cb(void *opaque)
+{
+LSIState *s = opaque;
+
+trace_lsi_scripts_timer_triggered();
+s->waiting = LSI_NOWAIT;
+lsi_execute_script(s);
+}
+
 static void lsi_scsi_realize(PCIDevice *dev, Error **errp)
 {
 LSIState *s = LSI53C895A(dev);
@@ -2313,6 +2334,7 @@ static void lsi_scsi_realize(PCIDevice *dev, Error **errp)
   "lsi-ram", 0x2000);
 memory_region_init_io(&s->io_io, OBJECT(s), &lsi_io_ops, s,
   "lsi-io", 256);
+s->scripts_timer = timer_new_us(QEMU_CLOCK_VIRTUAL, scripts_timer_cb, s);

[PULL 2/9] system/qdev-monitor: move drain_call_rcu call under if (!dev) in qmp_device_add()

2024-03-08 Thread Paolo Bonzini

From: Dmitrii Gavrilov 

The original goal of adding drain_call_rcu to qmp_device_add was to cover
the failure case of qdev_device_add. It seems the call to drain_call_rcu was
misplaced in 7bed89958bfbf40df, which led to waiting for pending RCU callbacks
on the happy path too. That in turn degraded the overall performance of
qmp_device_add.

In this patch the call to drain_call_rcu is moved under the handling of
qdev_device_add failure.
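
The resulting control flow can be sketched as below. This is a stand-alone
model rather than the QEMU code: sketch_drain() and sketch_create() are
hypothetical stand-ins for drain_call_rcu() and qdev_device_add(), and the
counters exist only to make the behaviour observable.

```c
#include <stddef.h>

struct sketch_stats {
    int drains;   /* how often the expensive drain ran */
    int adds;     /* how many devices were added successfully */
};

/* Stand-in for drain_call_rcu(): only records that it was called. */
static void sketch_drain(struct sketch_stats *st)
{
    st->drains++;
}

/* Stand-in for qdev_device_add(): fails (returns NULL) on an empty name. */
static const char *sketch_create(const char *name)
{
    return (name && name[0]) ? name : NULL;
}

/* Returns 0 on success, -1 on failure; drains only on the failure path. */
int sketch_device_add(struct sketch_stats *st, const char *name)
{
    const char *dev = sketch_create(name);

    if (!dev) {
        sketch_drain(st);
        return -1;
    }
    st->adds++;
    return 0;
}
```

With this shape a successful add never pays for the drain, which is the
performance point made above.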

Signed-off-by: Dmitrii Gavrilov 
Message-ID: <20231103105602.90475-1-ds-g...@yandex-team.ru>
Fixes: 7bed89958bf ("device_core: use drain_call_rcu in in qmp_device_add", 2020-10-12)
Cc: qemu-sta...@nongnu.org
Signed-off-by: Paolo Bonzini 
---
 system/qdev-monitor.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index a13db763e5d..874d65191ce 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -858,19 +858,18 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
 return;
 }
 dev = qdev_device_add(opts, errp);
-
-/*
- * Drain all pending RCU callbacks. This is done because
- * some bus related operations can delay a device removal
- * (in this case this can happen if device is added and then
- * removed due to a configuration error)
- * to a RCU callback, but user might expect that this interface
- * will finish its job completely once qmp command returns result
- * to the user
- */
-drain_call_rcu();
-
 if (!dev) {
+/*
+ * Drain all pending RCU callbacks. This is done because
+ * some bus related operations can delay a device removal
+ * (in this case this can happen if device is added and then
+ * removed due to a configuration error)
+ * to a RCU callback, but user might expect that this interface
+ * will finish its job completely once qmp command returns result
+ * to the user
+ */
+drain_call_rcu();
+
 qemu_opts_del(opts);
 return;
 }
-- 
2.43.2

reminder: softfreeze coming up

2024-03-08 Thread Peter Maydell

Hi; just a reminder that QEMU softfreeze is on the 12th March, next
Tuesday. That means that all feature change work should be in a pullreq
on the mailing list by that date.

Also, if you have outstanding bugs that you know ought to be fixed
for the 9.0 release, now is a good time to make sure that they're
in the issue tracker and tagged with the 9.0 milestone.

thanks
-- PMM

Re: [PATCH] hw/arm: Deprecate various old Arm machine types

2024-03-08 Thread Richard Henderson


On 3/8/24 07:16, Peter Maydell wrote:

Mark all these machine types as deprecated.

Signed-off-by: Peter Maydell
---
  docs/about/deprecated.rst | 15 +++
  hw/arm/gumstix.c  |  2 ++
  hw/arm/mainstone.c|  1 +
  hw/arm/nseries.c  |  2 ++
  hw/arm/palm.c |  1 +
  hw/arm/spitz.c|  1 +
  hw/arm/tosa.c |  1 +
  hw/arm/z2.c   |  1 +
  8 files changed, 24 insertions(+)


Reviewed-by: Richard Henderson 

r~

Re: [PATCH v1 2/8] virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

2024-03-08 Thread Jonah Palmer





On 3/8/24 12:36 PM, Eugenio Perez Martin wrote:

On Fri, Mar 8, 2024 at 6:01 PM Michael S. Tsirkin  wrote:


On Mon, Mar 04, 2024 at 02:46:06PM -0500, Jonah Palmer wrote:
> Prevent ioeventfd from being enabled/disabled when a virtio-pci
> device has negotiated the VIRTIO_F_NOTIFICATION_DATA transport
> feature.
>
> Due to ioeventfd not being able to carry the extra data associated with
> this feature, the ioeventfd should be left in a disabled state for
> emulated virtio-pci devices using this feature.
>
> Reviewed-by: Eugenio Pérez 
> Signed-off-by: Jonah Palmer 

I thought hard about this. I propose that for now,
instead of disabling ioeventfd silently we error out unless
user disabled it for us.
WDYT?



Yes, error is a better plan than silently disabling it. In the
(unlikely?) case we are able to make notification data work with
eventfd in the future, it makes the change more evident.



Will do in v2. I assume we'll also make this the case for virtio-mmio 
and virtio-ccw?
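
To make the proposal concrete, here is a rough sketch of what "error out
unless the user disabled it" could look like as a check. It is not taken from
any posted patch: sketch_check_ioeventfd() and its message are hypothetical,
and the only fact borrowed from the virtio specification is that
VIRTIO_F_NOTIFICATION_DATA is feature bit 38.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_F_NOTIFICATION_DATA 38   /* feature bit number */

/*
 * Returns 0 if the configuration is acceptable. Returns -1 and sets *msg
 * when NOTIFICATION_DATA is offered while ioeventfd is still enabled,
 * instead of silently turning ioeventfd off.
 */
int sketch_check_ioeventfd(uint64_t features, bool ioeventfd_enabled,
                           const char **msg)
{
    bool notif_data = (features >> SKETCH_F_NOTIFICATION_DATA) & 1;

    if (notif_data && ioeventfd_enabled) {
        *msg = "notification_data requires ioeventfd=off";
        return -1;
    }
    *msg = NULL;
    return 0;
}
```

The same predicate would apply unchanged to other transports, since it only
looks at the feature bits and the ioeventfd setting.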




> ---
>  hw/virtio/virtio-pci.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index d12edc567f..287b8f7720 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -417,13 +417,15 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>  }
>  break;
>  case VIRTIO_PCI_STATUS:
> -if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
> +if (!(val & VIRTIO_CONFIG_S_DRIVER_OK) &&
> +!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
>  virtio_pci_stop_ioeventfd(proxy);
>  }
>
>  virtio_set_status(vdev, val & 0xFF);
>
> -if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
> +if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
> +!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
>  virtio_pci_start_ioeventfd(proxy);
>  }
>
> --
> 2.39.3

Re: Unmapping KVM Guest Memory from Host Kernel

2024-03-08 Thread David Woodhouse

On Fri, 2024-03-08 at 09:35 -0800, David Matlack wrote:
> I think what James is looking for (and what we are also interested
> in), is _eliminating_ the ability to access guest memory from the
> direct map entirely. And in general, eliminate the ability to access
> guest memory in as many ways as possible.

Well, pKVM does that... 



Re: [PATCH v1 2/8] virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

2024-03-08 Thread Eugenio Perez Martin

On Fri, Mar 8, 2024 at 6:01 PM Michael S. Tsirkin  wrote:
>
> On Mon, Mar 04, 2024 at 02:46:06PM -0500, Jonah Palmer wrote:
> > Prevent ioeventfd from being enabled/disabled when a virtio-pci
> > device has negotiated the VIRTIO_F_NOTIFICATION_DATA transport
> > feature.
> >
> > Due to ioeventfd not being able to carry the extra data associated with
> > this feature, the ioeventfd should be left in a disabled state for
> > emulated virtio-pci devices using this feature.
> >
> > Reviewed-by: Eugenio Pérez 
> > Signed-off-by: Jonah Palmer 
>
> I thought hard about this. I propose that for now,
> instead of disabling ioeventfd silently we error out unless
> user disabled it for us.
> WDYT?
>

Yes, error is a better plan than silently disabling it. In the
(unlikely?) case we are able to make notification data work with
eventfd in the future, it makes the change more evident.

>
> > ---
> >  hw/virtio/virtio-pci.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > index d12edc567f..287b8f7720 100644
> > --- a/hw/virtio/virtio-pci.c
> > +++ b/hw/virtio/virtio-pci.c
> > @@ -417,13 +417,15 @@ static void virtio_ioport_write(void *opaque, 
> > uint32_t addr, uint32_t val)
> >  }
> >  break;
> >  case VIRTIO_PCI_STATUS:
> > -if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
> > +if (!(val & VIRTIO_CONFIG_S_DRIVER_OK) &&
> > +!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
> >  virtio_pci_stop_ioeventfd(proxy);
> >  }
> >
> >  virtio_set_status(vdev, val & 0xFF);
> >
> > -if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
> > +if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
> > +!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
> >  virtio_pci_start_ioeventfd(proxy);
> >  }
> >
> > --
> > 2.39.3
>

Re: [PATCH v3 01/29] bulk: Access existing variables initialized to &S->F when available

2024-03-08 Thread Anthony PERARD

On Mon, Jan 29, 2024 at 05:44:43PM +0100, Philippe Mathieu-Daudé wrote:
> When a variable is initialized to &struct->field, use it
> in place. Rationale: while this makes the code more concise,
> this also helps static analyzers.
> 
> Mechanical change using the following Coccinelle spatch script:
> 
>  @@
>  type S, F;
>  identifier s, m, v;
>  @@
>   S *s;
>   ...
>   F *v = &s->m;
>   <+...
>  -&s->m
>  +v
>   ...+>
> 
> Inspired-by: Zhao Liu 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
> index 36e6f93c37..10ddf6bc91 100644
> --- a/hw/xen/xen_pt.c
> +++ b/hw/xen/xen_pt.c
> @@ -710,7 +710,7 @@ static void xen_pt_destroy(PCIDevice *d) {
>  uint8_t intx;
>  int rc;
>  
> -if (machine_irq && !xen_host_pci_device_closed(&s->real_device)) {
> +if (machine_irq && !xen_host_pci_device_closed(host_dev)) {
>  intx = xen_pt_pci_intx(s);
>  rc = xc_domain_unbind_pt_irq(xen_xc, xen_domid, machine_irq,
>   PT_IRQ_TYPE_PCI,
> @@ -759,8 +759,8 @@ static void xen_pt_destroy(PCIDevice *d) {
>  memory_listener_unregister(&s->io_listener);
>  s->listener_set = false;
>  }
> -if (!xen_host_pci_device_closed(&s->real_device)) {
> -xen_host_pci_device_put(&s->real_device);
> +if (!xen_host_pci_device_closed(host_dev)) {
> +xen_host_pci_device_put(host_dev);

For the Xen part:
Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD

Re: [PULL 00/34] riscv-to-apply queue

2024-03-08 Thread Peter Maydell

On Fri, 8 Mar 2024 at 11:13, Alistair Francis  wrote:
>
> The following changes since commit 8f6330a807f2642dc2a3cdf33347aa28a4c00a87:
>
>   Merge tag 'pull-maintainer-updates-060324-1' of 
> https://gitlab.com/stsquad/qemu into staging (2024-03-06 16:56:20 +0000)
>
> are available in the Git repository at:
>
>   https://github.com/alistair23/qemu.git tags/pull-riscv-to-apply-20240308-1
>
> for you to fetch changes up to 301876597112218c1e465ecc2b2fef6b27d5c27b:
>
>   target/riscv: fix ACPI MCFG table (2024-03-08 21:00:37 +1000)
>
> 
> RISC-V PR for 9.0
>
> * Update $ra with current $pc in trans_cm_jalt
> * Enable SPCR for SCPI virt machine
> * Allow large kernels to boot by moving the initrd further away in RAM
> * Sync hwprobe keys with kernel
> * Named features riscv,isa, 'svade' rework
> * FIX xATP_MODE validation
> * Add missing include guard in pmu.h
> * Add SRAT and SLIT ACPI tables
> * libqos fixes and add a riscv machine
> * Add Ztso extension
> * Use 'zfa' instead of 'Zfa'
> * Update KVM exts to Linux 6.8
> * move ratified/frozen exts to non-experimental
> * Ensure mcountinhibit, mcounteren, scounteren, hcounteren are 32-bit
> * mark_vs_dirty() before loads and stores
> * Remove 'is_store' bool from load/store fns
> * Fix shift count overflow
> * Fix setipnum_le write emulation for APLIC MSI-mode
> * Fix in_clrip[x] read emulation
> * Fix privilege mode of G-stage translation for debugging
> * Fix ACPI MCFG table for virt machine
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM

Re: [PULL 00/12] Misc fixes, i386 TSTEQ/TSTNE, coverity CI for 2024-03-08

2024-03-08 Thread Peter Maydell

On Fri, 8 Mar 2024 at 14:58, Paolo Bonzini  wrote:
>
> The following changes since commit 8f6330a807f2642dc2a3cdf33347aa28a4c00a87:
>
>   Merge tag 'pull-maintainer-updates-060324-1' of 
> https://gitlab.com/stsquad/qemu into staging (2024-03-06 16:56:20 +0000)
>
> are available in the Git repository at:
>
>   https://gitlab.com/bonzini/qemu.git tags/for-upstream
>
> for you to fetch changes up to 47791be8cc6efa0fb9c145a2b92da0417f4137b8:
>
>   gitlab-ci: add manual job to run Coverity (2024-03-08 15:52:26 +0100)
>
> 
> * target/i386: use TSTEQ/TSTNE
> * move Coverity builds to Gitlab CI
> * fix two memory leaks
> * bug fixes
>
> 
> Akihiko Odaki (1):
>   meson: Remove --warn-common ldflag
>
> Dmitrii Gavrilov (1):
>   system/qdev-monitor: move drain_call_rcu call under if (!dev) in 
> qmp_device_add()
>
> Paolo Bonzini (8):
>   hw/intc/apic: fix memory leak
>   oslib-posix: fix memory leak in touch_all_pages
>   mips: do not list individual devices from configs/
>   target/i386: use TSTEQ/TSTNE to test low bits
>   target/i386: use TSTEQ/TSTNE to check flags
>   target/i386: remove mask from CCPrepare
>   run-coverity-scan: add --check-upload-only option
>   gitlab-ci: add manual job to run Coverity
>
> Sven Schnelle (2):
>   hw/scsi/lsi53c895a: add timer to scripts processing
>   hw/scsi/lsi53c895a: stop script on phase mismatch

Looks like this hits a TCG assertion on aarch64 host:
https://gitlab.com/qemu-project/qemu/-/jobs/6353434430

Several of the qtest tests fail with:

66/853 qemu:qtest+qtest-x86_64 / qtest-x86_64/vmgenid-test ERROR 0.44s
killed by signal 6 SIGABRT
>>> G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
>>>  QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
>>> PYTHON=/home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/pyvenv/bin/python3
>>>  MALLOC_PERTURB_=139 QTEST_QEMU_IMG=./qemu-img 
>>> QTEST_QEMU_BINARY=./qemu-system-x86_64 
>>> /home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/tests/qtest/vmgenid-test
>>>  --tap -k
― ✀ ―
stderr:
**
ERROR:/home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/tcg/aarch64/tcg-target.c.inc:1511:tcg_out_brcond:
code should not be reached
Broken pipe
../tests/qtest/libqtest.c:204: kill_qemu() detected QEMU death from
signal 6 (Aborted) (core dumped)
(test program exited with status code -6)

thanks
-- PMM

Re: [PATCH] pci: Add option to disable device level INTx masking

2024-03-08 Thread Alex Williamson

On Fri, 8 Mar 2024 11:57:38 -0500
"Michael S. Tsirkin"  wrote:

> On Thu, Mar 07, 2024 at 11:46:42AM -0700, Alex Williamson wrote:
> > The PCI 2.3 spec added definitions of the INTx disable and status bits,
> > in the command and status registers respectively.  The command register
> > bit, commonly known as DisINTx in lspci, controls whether the device
> > can assert the INTx signal.
> > 
> > Operating systems will often write to this bit to test whether a device
> > supports this style of legacy interrupt masking.  When using device
> > assignment, such as with vfio-pci, the result of this test dictates
> > whether the device can use a shared or exclusive interrupt (ie. generic
> > INTx masking at the device via DisINTx or IRQ controller level INTx
> > masking).
> > 
> > Add an experimental option to the base set of properties for PCI
> > devices which allows the DisINTx bit to be excluded from wmask, making
> > it read-only to the guest for testing purposes related to INTx masking.
> >   
> 
> Could you clarify the use a bit more? It's unstable - do you
> expect to experiment with it and then make it permanent down
> the road?

No, my aspirations end at providing an experimental option.
Technically all devices should support and honor this bit, so I don't
think we should provide a supported method of providing broken behavior,
but there do exist physical devices where this feature is broken or
unsupported.  Rather than implementing emulation of one of these broken
devices, with bug for bug compatibility, it's much easier to be able to
trigger broken DisINTx behavior on an arbitrary device, in an
unsupported fashion.  Thanks,

Alex
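
For anyone who wants to see the mechanism in isolation, here is a small
sketch of how a write mask makes DisINTx read-only. It models only the 16-bit
command register. The sketch_* names are hypothetical; the bit values are the
standard PCI command register bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_COMMAND_IO            0x0001
#define SKETCH_COMMAND_MEMORY        0x0002
#define SKETCH_COMMAND_MASTER        0x0004
#define SKETCH_COMMAND_SERR          0x0100
#define SKETCH_COMMAND_INTX_DISABLE  0x0400

/* Build the command register write mask, optionally including DisINTx. */
uint16_t sketch_cmd_wmask(bool disintx_writable)
{
    uint16_t wmask = SKETCH_COMMAND_IO | SKETCH_COMMAND_MEMORY |
                     SKETCH_COMMAND_MASTER | SKETCH_COMMAND_SERR;

    if (disintx_writable) {
        wmask |= SKETCH_COMMAND_INTX_DISABLE;
    }
    return wmask;
}

/* Apply a guest write: only bits set in wmask take the new value. */
uint16_t sketch_config_write(uint16_t old, uint16_t val, uint16_t wmask)
{
    return (uint16_t)((old & ~wmask) | (val & wmask));
}
```

A guest that writes DisINTx and reads the register back sees the bit stick
only when it is part of the mask, which is exactly the probe that operating
systems use to decide between device-level and IRQ-controller-level masking.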

> > Signed-off-by: Alex Williamson 
> > ---
> >  hw/pci/pci.c | 14 ++
> >  include/hw/pci/pci.h |  2 ++
> >  2 files changed, 12 insertions(+), 4 deletions(-)
> > 
> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > index 6496d027ca61..8c78326ad67f 100644
> > --- a/hw/pci/pci.c
> > +++ b/hw/pci/pci.c
> > @@ -85,6 +85,8 @@ static Property pci_props[] = {
> >  QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
> >  DEFINE_PROP_BIT("x-pcie-ari-nextfn-1", PCIDevice, cap_present,
> >  QEMU_PCIE_ARI_NEXTFN_1_BITNR, false),
> > +DEFINE_PROP_BIT("x-pci-disintx", PCIDevice, cap_present,
> > +QEMU_PCI_DISINTX_BITNR, true),
> >  DEFINE_PROP_END_OF_LIST()
> >  };
> >  
> > @@ -861,13 +863,17 @@ static void pci_init_cmask(PCIDevice *dev)
> >  static void pci_init_wmask(PCIDevice *dev)
> >  {
> >  int config_size = pci_config_size(dev);
> > +uint16_t cmd_wmask = PCI_COMMAND_IO | PCI_COMMAND_MEMORY |
> > + PCI_COMMAND_MASTER | PCI_COMMAND_SERR;
> >  
> >  dev->wmask[PCI_CACHE_LINE_SIZE] = 0xff;
> >  dev->wmask[PCI_INTERRUPT_LINE] = 0xff;
> > -pci_set_word(dev->wmask + PCI_COMMAND,
> > - PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER |
> > - PCI_COMMAND_INTX_DISABLE);
> > -pci_word_test_and_set_mask(dev->wmask + PCI_COMMAND, PCI_COMMAND_SERR);
> > +
> > +if (dev->cap_present & QEMU_PCI_DISINTX) {
> > +cmd_wmask |= PCI_COMMAND_INTX_DISABLE;
> > +}
> > +
> > +pci_set_word(dev->wmask + PCI_COMMAND, cmd_wmask);
> >  
> >  memset(dev->wmask + PCI_CONFIG_HEADER_SIZE, 0xff,
> > config_size - PCI_CONFIG_HEADER_SIZE);
> > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > index eaa3fc99d884..45f0fac435cc 100644
> > --- a/include/hw/pci/pci.h
> > +++ b/include/hw/pci/pci.h
> > @@ -212,6 +212,8 @@ enum {
> >  QEMU_PCIE_ERR_UNC_MASK = (1 << QEMU_PCIE_ERR_UNC_MASK_BITNR),
> >  #define QEMU_PCIE_ARI_NEXTFN_1_BITNR 12
> >  QEMU_PCIE_ARI_NEXTFN_1 = (1 << QEMU_PCIE_ARI_NEXTFN_1_BITNR),
> > +#define QEMU_PCI_DISINTX_BITNR 13
> > +QEMU_PCI_DISINTX = (1 << QEMU_PCI_DISINTX_BITNR),
> >  };
> >  
> >  typedef struct PCIINTxRoute {
> > -- 
> > 2.44.0  
>

Re: [PATCH v2 07/20] smbios: avoid mangling user provided tables

2024-03-08 Thread Igor Mammedov

On Thu, 7 Mar 2024 09:33:17 +0530
Ani Sinha  wrote:

> > On 06-Mar-2024, at 12:11, Ani Sinha  wrote:
> > 
> > 
> > 
> > On Tue, 5 Mar 2024, Igor Mammedov wrote:
> >   
> >> currently smbios_entry_add() preserves internally '-smbios type='
> >> options but tables provided with '-smbios file=' are stored directly
> >> into blob that eventually will be exposed to VM. And then later
> >> QEMU adds default/'-smbios type' entries on top into the same blob.
> >> 
> >> It makes impossible to generate tables more than once, hence
> >> 'immutable' guard was used.
> >> Make it possible to regenerate final blob by storing user provided
> >> blobs into a dedicated area (usr_blobs) and then copy it when
> >> composing final blob. Which also makes handling of -smbios
> >> options consistent.
> >> 
> >> As side effect of this and previous commits there is no need to
> >> generate legacy smbios_entries at the time options are parsed.
> >> Instead compose smbios_entries on demand from  usr_blobs like
> >> it is done for non-legacy SMBIOS tables.
> >> 
> >> Signed-off-by: Igor Mammedov 
> >> Tested-by: Fiona Ebner   
> > 
> > Reviewed-by: Ani Sinha 
> >   
> >> ---
> >> hw/smbios/smbios.c | 179 +++--
> >> 1 file changed, 92 insertions(+), 87 deletions(-)
> >> 
> >> diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
> >> index c46fc93357..aa2cc5bdbd 100644
> >> --- a/hw/smbios/smbios.c
> >> +++ b/hw/smbios/smbios.c
> >> @@ -57,6 +57,14 @@ static size_t smbios_entries_len;
> >> static bool smbios_uuid_encoded = true;
> >> /* end: legacy structures & constants for <= 2.0 machines */
> >> 
> >> +/*
> >> + * SMBIOS tables provided by user with '-smbios file=' option
> >> + */
> >> +uint8_t *usr_blobs;
> >> +size_t usr_blobs_len;
> >> +static GArray *usr_blobs_sizes;
> >> +static unsigned usr_table_max;
> >> +static unsigned usr_table_cnt;
> >> 
> >> uint8_t *smbios_tables;
> >> size_t smbios_tables_len;
> >> @@ -67,7 +75,6 @@ static SmbiosEntryPointType smbios_ep_type = 
> >> SMBIOS_ENTRY_POINT_TYPE_32;
> >> static SmbiosEntryPoint ep;
> >> 
> >> static int smbios_type4_count = 0;
> >> -static bool smbios_immutable;
> >> static bool smbios_have_defaults;
> >> static uint32_t smbios_cpuid_version, smbios_cpuid_features;
> >> 
> >> @@ -569,9 +576,8 @@ static void smbios_build_type_1_fields(void)
> >> 
> >> uint8_t *smbios_get_table_legacy(uint32_t expected_t4_count, size_t 
> >> *length)
> >> {
> >> -/* drop unwanted version of command-line file blob(s) */
> >> -g_free(smbios_tables);
> >> -smbios_tables = NULL;
> >> +int i;
> >> +size_t usr_offset;
> >> 
> >> /* also complain if fields were given for types > 1 */
> >> if (find_next_bit(have_fields_bitmap,
> >> @@ -581,12 +587,33 @@ uint8_t *smbios_get_table_legacy(uint32_t 
> >> expected_t4_count, size_t *length)
> >> exit(1);
> >> }
> >> 
> >> -if (!smbios_immutable) {
> >> -smbios_build_type_0_fields();
> >> -smbios_build_type_1_fields();
> >> -smbios_validate_table(expected_t4_count);
> >> -smbios_immutable = true;
> >> +g_free(smbios_entries);
> >> +smbios_entries_len = sizeof(uint16_t);
> >> +smbios_entries = g_malloc0(smbios_entries_len);
> >> +
> >> +for (i = 0, usr_offset = 0; usr_blobs_sizes && i < 
> >> usr_blobs_sizes->len;
> >> + i++)
> >> +{
> >> +struct smbios_table *table;
> >> +struct smbios_structure_header *header;
> >> +size_t size = g_array_index(usr_blobs_sizes, size_t, i);
> >> +
> >> +header = (struct smbios_structure_header *)(usr_blobs + 
> >> usr_offset);
> >> +smbios_entries = g_realloc(smbios_entries, smbios_entries_len +
> >> +   size + sizeof(*table));
> >> +table = (struct smbios_table *)(smbios_entries + 
> >> smbios_entries_len);
> >> +table->header.type = SMBIOS_TABLE_ENTRY;
> >> +table->header.length = cpu_to_le16(sizeof(*table) + size);
> >> +memcpy(table->data, header, size);
> >> +smbios_entries_len += sizeof(*table) + size;
> >> +(*(uint16_t *)smbios_entries) =
> >> +cpu_to_le16(le16_to_cpu(*(uint16_t *)smbios_entries) + 1);  
> > 
> > I know this comes from existing code but can you please explain why we add
> > 1 to it? This is confusing and a comment here would be nice.
> >   
> >> +usr_offset += size;  
> > 
> > It would be better if we could add a comment here describing a bit what
> > this is all about.
> > 
> > user blobs are an array of smbios_structure_header entries whereas legacy
> > tables are an array of smbios_table structures where
> > smbios_table->data represents the a single user provided table blob in
> > smbios_structure_header.  
> 
> Igor, are you going to send a v3 for this with the comments added?

I can add comments as a patch on top of series,
though I'd rather prefer to deprecate all this legacy code
(along with ISA

[PATCH] hw/arm: Deprecate various old Arm machine types

2024-03-08 Thread Peter Maydell

QEMU includes some models of old Arm machine types which are
a bit problematic for us because:
 * they're written in a very old way that uses numerous APIs that we
   would like to get away from (eg they don't use qdev, they use
   qemu_system_reset_request(), they use vmstate_register(), etc)
 * they've been that way for a decade plus and nobody particularly has
   stepped up to try to modernise the code (beyond some occasional
   work here and there)
 * we often don't have test cases for them, which means that if we
   do try to do the necessary refactoring work on them we have no
   idea if they even still work at all afterwards

All these machine types are also of hardware that has largely passed
away into history and where I would not be surprised to find that
e.g. the Linux kernel support was never tested on real hardware
any more.

After some consultation with the Linux kernel developers, we
are going to deprecate:

All PXA2xx machines:

akita                Sharp SL-C1000 (Akita) PDA (PXA270)
borzoi               Sharp SL-C3100 (Borzoi) PDA (PXA270)
connex               Gumstix Connex (PXA255)
mainstone            Mainstone II (PXA27x)
spitz                Sharp SL-C3000 (Spitz) PDA (PXA270)
terrier              Sharp SL-C3200 (Terrier) PDA (PXA270)
tosa                 Sharp SL-6000 (Tosa) PDA (PXA255)
verdex               Gumstix Verdex Pro XL6P COMs (PXA270)
z2                   Zipit Z2 (PXA27x)

All OMAP2 machines:

n800                 Nokia N800 tablet aka. RX-34 (OMAP2420)
n810                 Nokia N810 tablet aka. RX-44 (OMAP2420)

One of the OMAP1 machines:

cheetah              Palm Tungsten|E aka. Cheetah PDA (OMAP310)

Rationale:
 * for QEMU dropping individual machines is much less beneficial
   than if we can drop support for an entire SoC
 * the OMAP2 QEMU code in particular is large, old and unmaintained,
   and none of the OMAP2 kernel maintainers said they were using
   QEMU in any of their testing/development
 * although there is a setup that is booting test kernels on some
   of the PXA2xx machines, nobody seemed to be using them as part
   of their active kernel development and my impression from the
   email thread is that PXA is the closest of all these SoC families
   to being dropped from the kernel soon
 * nobody said they were using cheetah, so it's entirely
   untested and quite probably broken
 * on the other hand the OMAP1 sx1 model does seem to be being
   used as part of kernel development, and there was interest
   in keeping collie around

In particular, the mainstone, tosa and z2 machine types have
already been dropped from Linux.

Mark all these machine types as deprecated.

Signed-off-by: Peter Maydell 
---
 docs/about/deprecated.rst | 15 +++
 hw/arm/gumstix.c  |  2 ++
 hw/arm/mainstone.c|  1 +
 hw/arm/nseries.c  |  2 ++
 hw/arm/palm.c |  1 +
 hw/arm/spitz.c|  1 +
 hw/arm/tosa.c |  1 +
 hw/arm/z2.c   |  1 +
 8 files changed, 24 insertions(+)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 8565644da6d..7345e9f536a 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -247,6 +247,21 @@ to correct issues, mostly regarding migration compatibility. These are
 no longer maintained and removing them will make the code easier to
 read and maintain. Use versions 2.12 and above as a replacement.
 
+Arm machines ``akita``, ``borzoi``, ``cheetah``, ``connex``, ``mainstone``, ``n800``, ``n810``, ``spitz``, ``terrier``, ``tosa``, ``verdex``, ``z2`` (since 9.0)
+
+
+QEMU includes models of some machine types where the QEMU code that
+emulates their SoCs is very old and unmaintained. This code is now
+blocking our ability to move forward with various changes across
+the codebase, and over many years nobody has been interested in
+trying to modernise it. We don't expect any of these machines to have
+a large number of users, because they're all modelling hardware that
+has now passed away into history. We are therefore dropping support
+for all machine types using the PXA2xx and OMAP2 SoCs. We are also
+dropping the ``cheetah`` OMAP1 board, because we don't have any
+test images for it and don't know of anybody who does; the ``sx1``
+and ``sx1-v1`` OMAP1 machines remain supported for now.
+
 Backend options
 ---
 
diff --git a/hw/arm/gumstix.c b/hw/arm/gumstix.c
index d5de5409e17..91462691531 100644
--- a/hw/arm/gumstix.c
+++ b/hw/arm/gumstix.c
@@ -106,6 +106,7 @@ static void connex_class_init(ObjectClass *oc, void *data)
 mc->desc = "Gumstix Connex (PXA255)";
 mc->init = connex_init;
 mc->ignore_memory_transaction_failures = true;
+mc->deprecation_reason = "machine is old and unmaintained";
 }
 
 static const TypeInfo connex_type = {
@@ -121,6 +122,7 @@

Re: [PATCH] i386: load kernel on xen using DMA

2024-03-08 Thread Anthony PERARD

On Fri, Jun 18, 2021 at 09:54:14AM +0100, Alex Bennée wrote:
> 
> Marek Marczykowski-Górecki  writes:
> 
> > Kernel on Xen is loaded via fw_cfg. Previously it used non-DMA version,
> > which loaded the kernel (and initramfs) byte by byte. Change this
> > to DMA, to load in bigger chunks.
> > This change alone reduces load time of a (big) kernel+initramfs from
> > ~10s down to below 1s.
> >
> > This change was suggested initially here:
> > https://lore.kernel.org/xen-devel/20180216204031.5...@gmail.com/
> > Apparently this alone is already enough to get massive speedup.
> >
> > Signed-off-by: Marek Marczykowski-Górecki 
> > ---
> >  hw/i386/pc.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index 8a84b25a03..14e43d4da4 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -839,7 +839,8 @@ void xen_load_linux(PCMachineState *pcms)
> >  
> >  assert(MACHINE(pcms)->kernel_filename != NULL);
> >  
> > -fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
> > +fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4,
> > +&address_space_memory);
> >  fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
> >  rom_set_fw(fw_cfg);
> 
> Gentle ping. The fix looks perfectly sane to me but I don't have any x86
> Xen HW to test this one. Are the x86 maintainers happy to take this on?

Yes. It looks like it works well with both SeaBIOS and OVMF, so the
patch is good.

> FWIW:
> 
> Reviewed-by: Alex Bennée 

Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD

[PULL 1/3] hv-balloon: avoid alloca() usage

2024-03-08 Thread Maciej S. Szmigiero

From: "Maciej S. Szmigiero" 

alloca() is frowned upon, replace it with g_malloc0() + g_autofree.
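
As an illustration of the pattern this commit message describes, the following is a minimal sketch rather than the QEMU code. It assumes a GCC or Clang compiler and uses the `cleanup` attribute directly as a stand-in for GLib's g_autofree; `AUTOFREE`, `auto_free`, `struct request`, `build_request` and `frees_seen` are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Counts how many times the cleanup handler ran, so the effect is visible. */
static int frees_seen;

/* Cleanup handler: receives a pointer to the annotated variable. */
static void auto_free(void *p)
{
    void **slot = p;
    free(*slot);
    frees_seen++;
}

/* Hypothetical stand-in for g_autofree (GCC/Clang extension). */
#define AUTOFREE __attribute__((cleanup(auto_free)))

/* Toy message with a variable-length tail, loosely modelled on the patch. */
struct request {
    uint16_t type;
    uint16_t size;
    uint64_t range[];
};

/* Builds a zeroed one-entry request on the heap and returns its size. */
static size_t build_request(void)
{
    AUTOFREE struct request *req = NULL;
    size_t req_size = sizeof(*req) + sizeof(req->range[0]);

    req = calloc(1, req_size);      /* zero-filled, so no separate memset */
    assert(req != NULL);            /* this sketch simply aborts on failure */
    req->type = 1;
    req->size = (uint16_t)req_size;

    return req->size;               /* req is freed when it goes out of scope */
}
```

The buffer lives on the heap instead of the stack, yet no explicit free is needed on any return path, which is what makes the replacement a small change.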

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: David Hildenbrand 
Signed-off-by: Maciej S. Szmigiero 
---
 hw/hyperv/hv-balloon.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/hyperv/hv-balloon.c b/hw/hyperv/hv-balloon.c
index ade283335a68..35333dab2434 100644
--- a/hw/hyperv/hv-balloon.c
+++ b/hw/hyperv/hv-balloon.c
@@ -366,7 +366,7 @@ static void hv_balloon_unballoon_posting(HvBalloon 
*balloon, StateDesc *stdesc)
 PageRangeTree dtree;
 uint64_t *dctr;
 bool our_range;
-struct dm_unballoon_request *ur;
+g_autofree struct dm_unballoon_request *ur = NULL;
 size_t ur_size = sizeof(*ur) + sizeof(ur->range_array[0]);
 PageRange range;
 bool bret;
@@ -388,8 +388,7 @@ static void hv_balloon_unballoon_posting(HvBalloon 
*balloon, StateDesc *stdesc)
 assert(dtree.t);
 assert(dctr);
 
-ur = alloca(ur_size);
-memset(ur, 0, ur_size);
+ur = g_malloc0(ur_size);
 ur->hdr.type = DM_UNBALLOON_REQUEST;
 ur->hdr.size = ur_size;
 ur->hdr.trans_id = balloon->trans_id;
@@ -531,7 +530,7 @@ static void hv_balloon_hot_add_posting(HvBalloon *balloon, 
StateDesc *stdesc)
 PageRange *hot_add_range = &balloon->hot_add_range;
 uint64_t *current_count = &balloon->ha_current_count;
 VMBusChannel *chan = hv_balloon_get_channel(balloon);
-struct dm_hot_add *ha;
+g_autofree struct dm_hot_add *ha = NULL;
 size_t ha_size = sizeof(*ha) + sizeof(ha->range);
 union dm_mem_page_range *ha_region;
 uint64_t align, chunk_max_size;
@@ -560,9 +559,8 @@ static void hv_balloon_hot_add_posting(HvBalloon *balloon, 
StateDesc *stdesc)
  */
 *current_count = MIN(hot_add_range->count, chunk_max_size);
 
-ha = alloca(ha_size);
+ha = g_malloc0(ha_size);
 ha_region = &(&ha->range)[1];
-memset(ha, 0, ha_size);
 ha->hdr.type = DM_MEM_HOT_ADD_REQUEST;
 ha->hdr.size = ha_size;
 ha->hdr.trans_id = balloon->trans_id;

Re: [PATCH] pci: Add option to disable device level INTx masking

2024-03-08 Thread Cédric Le Goater


On 3/7/24 19:46, Alex Williamson wrote:

The PCI 2.3 spec added definitions of the INTx disable and status bits,
in the command and status registers respectively.  The command register
bit, commonly known as DisINTx in lspci, controls whether the device
can assert the INTx signal.

Operating systems will often write to this bit to test whether a device
supports this style of legacy interrupt masking.  When using device
assignment, such as with vfio-pci, the result of this test dictates
whether the device can use a shared or exclusive interrupt (ie. generic
INTx masking at the device via DisINTx or IRQ controller level INTx
masking).

Add an experimental option to the base set of properties for PCI
devices which allows the DisINTx bit to be excluded from wmask, making
it read-only to the guest for testing purposes related to INTx masking.
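
To make the write-mask mechanism concrete, here is a simplified, self-contained model. It is a sketch under stated assumptions, not QEMU's implementation: the command register bit values follow the PCI specification, while `struct toy_dev` and the `toy_*` functions are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PCI command register bits (values per the PCI specification). */
#define CMD_IO           0x0001
#define CMD_MEMORY       0x0002
#define CMD_MASTER       0x0004
#define CMD_SERR         0x0100
#define CMD_INTX_DISABLE 0x0400

/* Hypothetical toy device: one register plus its guest-writable mask. */
struct toy_dev {
    uint16_t command;
    uint16_t wmask;
};

/* Build the write mask; DisINTx is writable only when requested. */
static void toy_init_wmask(struct toy_dev *d, bool disintx_writable)
{
    assert(d != NULL);
    d->wmask = CMD_IO | CMD_MEMORY | CMD_MASTER | CMD_SERR;
    if (disintx_writable) {
        d->wmask |= CMD_INTX_DISABLE;
    }
}

/* Guest write: only bits set in wmask can change. */
static void toy_write_command(struct toy_dev *d, uint16_t val)
{
    assert(d != NULL);
    d->command = (uint16_t)((d->command & ~d->wmask) | (val & d->wmask));
}

/* Guest-style probe: set DisINTx, read it back, then restore the register. */
static bool toy_probe_disintx(struct toy_dev *d)
{
    uint16_t orig = d->command;
    bool supported;

    toy_write_command(d, orig | CMD_INTX_DISABLE);
    supported = (d->command & CMD_INTX_DISABLE) != 0;
    toy_write_command(d, orig);
    return supported;
}
```

With the bit excluded from the mask, the probe reads back zero, so a guest performing this kind of test would conclude that device-level INTx masking is unavailable.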

Signed-off-by: Alex Williamson 



LGTM,

Reviewed-by: Cédric Le Goater 

Thanks,

C.




---
  hw/pci/pci.c | 14 ++
  include/hw/pci/pci.h |  2 ++
  2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 6496d027ca61..8c78326ad67f 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -85,6 +85,8 @@ static Property pci_props[] = {
  QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
  DEFINE_PROP_BIT("x-pcie-ari-nextfn-1", PCIDevice, cap_present,
  QEMU_PCIE_ARI_NEXTFN_1_BITNR, false),
+DEFINE_PROP_BIT("x-pci-disintx", PCIDevice, cap_present,
+QEMU_PCI_DISINTX_BITNR, true),
  DEFINE_PROP_END_OF_LIST()
  };
  
@@ -861,13 +863,17 @@ static void pci_init_cmask(PCIDevice *dev)

  static void pci_init_wmask(PCIDevice *dev)
  {
  int config_size = pci_config_size(dev);
+uint16_t cmd_wmask = PCI_COMMAND_IO | PCI_COMMAND_MEMORY |
+ PCI_COMMAND_MASTER | PCI_COMMAND_SERR;
  
  dev->wmask[PCI_CACHE_LINE_SIZE] = 0xff;

  dev->wmask[PCI_INTERRUPT_LINE] = 0xff;
-pci_set_word(dev->wmask + PCI_COMMAND,
- PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER |
- PCI_COMMAND_INTX_DISABLE);
-pci_word_test_and_set_mask(dev->wmask + PCI_COMMAND, PCI_COMMAND_SERR);
+
+if (dev->cap_present & QEMU_PCI_DISINTX) {
+cmd_wmask |= PCI_COMMAND_INTX_DISABLE;
+}
+
+pci_set_word(dev->wmask + PCI_COMMAND, cmd_wmask);
  
  memset(dev->wmask + PCI_CONFIG_HEADER_SIZE, 0xff,

 config_size - PCI_CONFIG_HEADER_SIZE);
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index eaa3fc99d884..45f0fac435cc 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -212,6 +212,8 @@ enum {
  QEMU_PCIE_ERR_UNC_MASK = (1 << QEMU_PCIE_ERR_UNC_MASK_BITNR),
  #define QEMU_PCIE_ARI_NEXTFN_1_BITNR 12
  QEMU_PCIE_ARI_NEXTFN_1 = (1 << QEMU_PCIE_ARI_NEXTFN_1_BITNR),
+#define QEMU_PCI_DISINTX_BITNR 13
+QEMU_PCI_DISINTX = (1 << QEMU_PCI_DISINTX_BITNR),
  };
  
  typedef struct PCIINTxRoute {

[PULL 3/3] vmbus: Print a warning when enabled without the recommended set of features

2024-03-08 Thread Maciej S. Szmigiero

From: "Maciej S. Szmigiero" 

Some Windows versions crash at boot or fail to enable the VMBus device if
they don't see the expected set of Hyper-V features (enlightenments).

Since this provides a poor user experience, let's warn the user if the VMBus
device is enabled without the recommended set of Hyper-V features.

The recommended set is the minimum set of Hyper-V features required to make
the VMBus device work properly in Windows Server versions 2016, 2019 and
2022.

Acked-by: Paolo Bonzini 
Signed-off-by: Maciej S. Szmigiero 
---
 hw/hyperv/hyperv.c| 12 
 hw/hyperv/vmbus.c |  6 ++
 include/hw/hyperv/hyperv.h|  4 
 target/i386/kvm/hyperv-stub.c |  4 
 target/i386/kvm/hyperv.c  |  5 +
 target/i386/kvm/hyperv.h  |  2 ++
 target/i386/kvm/kvm.c |  7 +++
 7 files changed, 40 insertions(+)

diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
index 6c4a18dd0e2a..3ea54ba818b2 100644
--- a/hw/hyperv/hyperv.c
+++ b/hw/hyperv/hyperv.c
@@ -951,3 +951,15 @@ uint64_t hyperv_syndbg_query_options(void)
 
 return msg.u.query_options.options;
 }
+
+static bool vmbus_recommended_features_enabled;
+
+bool hyperv_are_vmbus_recommended_features_enabled(void)
+{
+return vmbus_recommended_features_enabled;
+}
+
+void hyperv_set_vmbus_recommended_features_enabled(void)
+{
+vmbus_recommended_features_enabled = true;
+}
diff --git a/hw/hyperv/vmbus.c b/hw/hyperv/vmbus.c
index 380239af2c7b..f33afeeea27d 100644
--- a/hw/hyperv/vmbus.c
+++ b/hw/hyperv/vmbus.c
@@ -2631,6 +2631,12 @@ static void vmbus_bridge_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
+if (!hyperv_are_vmbus_recommended_features_enabled()) {
+warn_report("VMBus enabled without the recommended set of Hyper-V features: "
+"hv-stimer, hv-vapic and hv-runtime. "
+"Some Windows versions might not boot or enable the VMBus device");
+}
+
 bridge->bus = VMBUS(qbus_new(TYPE_VMBUS, dev, "vmbus"));
 }
 
diff --git a/include/hw/hyperv/hyperv.h b/include/hw/hyperv/hyperv.h
index 015c3524b1c2..d717b4e13d40 100644
--- a/include/hw/hyperv/hyperv.h
+++ b/include/hw/hyperv/hyperv.h
@@ -139,4 +139,8 @@ typedef struct HvSynDbgMsg {
 } HvSynDbgMsg;
 typedef uint16_t (*HvSynDbgHandler)(void *context, HvSynDbgMsg *msg);
 void hyperv_set_syndbg_handler(HvSynDbgHandler handler, void *context);
+
+bool hyperv_are_vmbus_recommended_features_enabled(void);
+void hyperv_set_vmbus_recommended_features_enabled(void);
+
 #endif
diff --git a/target/i386/kvm/hyperv-stub.c b/target/i386/kvm/hyperv-stub.c
index 778ed782e6fc..3263dcf05d31 100644
--- a/target/i386/kvm/hyperv-stub.c
+++ b/target/i386/kvm/hyperv-stub.c
@@ -52,3 +52,7 @@ void hyperv_x86_synic_reset(X86CPU *cpu)
 void hyperv_x86_synic_update(X86CPU *cpu)
 {
 }
+
+void hyperv_x86_set_vmbus_recommended_features_enabled(void)
+{
+}
diff --git a/target/i386/kvm/hyperv.c b/target/i386/kvm/hyperv.c
index 6825c89af374..f2a3fe650a18 100644
--- a/target/i386/kvm/hyperv.c
+++ b/target/i386/kvm/hyperv.c
@@ -149,3 +149,8 @@ int kvm_hv_handle_exit(X86CPU *cpu, struct kvm_hyperv_exit 
*exit)
 return -1;
 }
 }
+
+void hyperv_x86_set_vmbus_recommended_features_enabled(void)
+{
+hyperv_set_vmbus_recommended_features_enabled();
+}
diff --git a/target/i386/kvm/hyperv.h b/target/i386/kvm/hyperv.h
index 67543296c3a4..e3982c8f4dd1 100644
--- a/target/i386/kvm/hyperv.h
+++ b/target/i386/kvm/hyperv.h
@@ -26,4 +26,6 @@ int hyperv_x86_synic_add(X86CPU *cpu);
 void hyperv_x86_synic_reset(X86CPU *cpu);
 void hyperv_x86_synic_update(X86CPU *cpu);
 
+void hyperv_x86_set_vmbus_recommended_features_enabled(void);
+
 #endif
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 42970ab046fa..e68cbe929302 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1650,6 +1650,13 @@ static int hyperv_init_vcpu(X86CPU *cpu)
 }
 }
 
+/* Skip SynIC and VP_INDEX since they are hard deps already */
+if (hyperv_feat_enabled(cpu, HYPERV_FEAT_STIMER) &&
+hyperv_feat_enabled(cpu, HYPERV_FEAT_VAPIC) &&
+hyperv_feat_enabled(cpu, HYPERV_FEAT_RUNTIME)) {
+hyperv_x86_set_vmbus_recommended_features_enabled();
+}
+
 return 0;
 }

[PULL 0/3] Hyper-V Dynamic Memory and VMBus misc small patches

2024-03-08 Thread Maciej S. Szmigiero

From: "Maciej S. Szmigiero" 

The following changes since commit 8f6330a807f2642dc2a3cdf33347aa28a4c00a87:

  Merge tag 'pull-maintainer-updates-060324-1' of 
https://gitlab.com/stsquad/qemu into staging (2024-03-06 16:56:20 +)

are available in the Git repository at:

  https://github.com/maciejsszmigiero/qemu.git tags/pull-hv-balloon-20240308

for you to fetch changes up to 6093637b4d32875f98cd59696ffc5f26884aa0b4:

  vmbus: Print a warning when enabled without the recommended set of features 
(2024-03-08 14:18:56 +0100)


Hyper-V Dynamic Memory and VMBus misc small patches

This pull request contains two small patches to hv-balloon:
the first one replaces alloca() usage with g_malloc0() + g_autofree,
and the second one adds an additional declaration of a protocol message
struct with an optional field explicitly defined, to avoid a Coverity
warning.

Also included is a VMBus patch to print a warning when it is enabled
without the recommended set of Hyper-V features (enlightenments) since
some Windows versions crash at boot in this case.


Maciej S. Szmigiero (3):
  hv-balloon: avoid alloca() usage
  hv-balloon: define dm_hot_add_with_region to avoid Coverity warning
  vmbus: Print a warning when enabled without the recommended set of 
features

 hw/hyperv/hv-balloon.c   | 18 --
 hw/hyperv/hyperv.c   | 12 
 hw/hyperv/vmbus.c|  6 ++
 include/hw/hyperv/dynmem-proto.h |  9 -
 include/hw/hyperv/hyperv.h   |  4 
 target/i386/kvm/hyperv-stub.c|  4 
 target/i386/kvm/hyperv.c |  5 +
 target/i386/kvm/hyperv.h |  2 ++
 target/i386/kvm/kvm.c|  7 +++
 9 files changed, 56 insertions(+), 11 deletions(-)

[PULL 2/3] hv-balloon: define dm_hot_add_with_region to avoid Coverity warning

2024-03-08 Thread Maciej S. Szmigiero

From: "Maciej S. Szmigiero" 

Since the presence of a hot add memory region is optional in the hot add
request message, it wasn't part of this message's declaration
(struct dm_hot_add).

Instead, the code allocated such an enlarged message by simply adding the
necessary size for this extra field to the size of the basic hot add message
struct.

However, Coverity considers accessing this extra member to be
an out-of-bounds access, even though the memory is actually there.

Fix this by adding an extended variant of this message that explicitly has
an additional union dm_mem_page_range at its end.
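
The equivalence between the old manual sizing and the new struct can be checked at compile time. The sketch below uses simplified stand-in types (`toy_header`, `toy_page_range` and the `toy_hot_add*` structs are hypothetical and their field layouts are assumptions, not the real protocol definitions) and relies on the GCC/Clang `packed` attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the protocol types; layouts are assumptions. */
struct toy_header {
    uint16_t type;
    uint16_t size;
    uint32_t trans_id;
} __attribute__((packed));

union toy_page_range {
    uint64_t page_range;
} __attribute__((packed));

/* Base message: header plus the mandatory range. */
struct toy_hot_add {
    struct toy_header hdr;
    union toy_page_range range;
} __attribute__((packed));

/* Extended message: the optional region becomes a named member. */
struct toy_hot_add_with_region {
    struct toy_header hdr;
    union toy_page_range range;
    union toy_page_range region;
} __attribute__((packed));

/* Offset the old code reached by indexing one element past 'range'. */
static size_t toy_old_region_offset(void)
{
    return offsetof(struct toy_hot_add, range) + sizeof(union toy_page_range);
}

/* The extended struct has exactly the size the old code computed by hand. */
static_assert(sizeof(struct toy_hot_add_with_region) ==
              sizeof(struct toy_hot_add) + sizeof(union toy_page_range),
              "extended message must match base size plus one range");
```

Because the sizes and offsets agree, the bytes sent are unchanged; only the way the code names the trailing field differs, which is what lets a static analyser see the access as in bounds.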

CID: #1523903
Signed-off-by: Maciej S. Szmigiero 
---
 hw/hyperv/hv-balloon.c   | 10 +-
 include/hw/hyperv/dynmem-proto.h |  9 -
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/hw/hyperv/hv-balloon.c b/hw/hyperv/hv-balloon.c
index 35333dab2434..3a9ef0769103 100644
--- a/hw/hyperv/hv-balloon.c
+++ b/hw/hyperv/hv-balloon.c
@@ -513,8 +513,8 @@ ret_idle:
 static void hv_balloon_hot_add_rb_wait(HvBalloon *balloon, StateDesc *stdesc)
 {
 VMBusChannel *chan = hv_balloon_get_channel(balloon);
-struct dm_hot_add *ha;
-size_t ha_size = sizeof(*ha) + sizeof(ha->range);
+struct dm_hot_add_with_region *ha;
+size_t ha_size = sizeof(*ha);
 
 assert(balloon->state == S_HOT_ADD_RB_WAIT);
 
@@ -530,8 +530,8 @@ static void hv_balloon_hot_add_posting(HvBalloon *balloon, 
StateDesc *stdesc)
 PageRange *hot_add_range = &balloon->hot_add_range;
 uint64_t *current_count = &balloon->ha_current_count;
 VMBusChannel *chan = hv_balloon_get_channel(balloon);
-g_autofree struct dm_hot_add *ha = NULL;
-size_t ha_size = sizeof(*ha) + sizeof(ha->range);
+g_autofree struct dm_hot_add_with_region *ha = NULL;
+size_t ha_size = sizeof(*ha);
 union dm_mem_page_range *ha_region;
 uint64_t align, chunk_max_size;
 ssize_t ret;
@@ -560,7 +560,7 @@ static void hv_balloon_hot_add_posting(HvBalloon *balloon, 
StateDesc *stdesc)
 *current_count = MIN(hot_add_range->count, chunk_max_size);
 
 ha = g_malloc0(ha_size);
-ha_region = &(&ha->range)[1];
+ha_region = &ha->region;
 ha->hdr.type = DM_MEM_HOT_ADD_REQUEST;
 ha->hdr.size = ha_size;
 ha->hdr.trans_id = balloon->trans_id;
diff --git a/include/hw/hyperv/dynmem-proto.h b/include/hw/hyperv/dynmem-proto.h
index a657786a94b1..68b8b606f268 100644
--- a/include/hw/hyperv/dynmem-proto.h
+++ b/include/hw/hyperv/dynmem-proto.h
@@ -328,7 +328,8 @@ struct dm_unballoon_response {
 /*
  * Hot add request message. Message sent from the host to the guest.
  *
- * mem_range: Memory range to hot add.
+ * range: Memory range to hot add.
+ * region: Explicit hot add memory region for guest to use. Optional.
  *
  */
 
@@ -337,6 +338,12 @@ struct dm_hot_add {
 union dm_mem_page_range range;
 } QEMU_PACKED;
 
+struct dm_hot_add_with_region {
+struct dm_header hdr;
+union dm_mem_page_range range;
+union dm_mem_page_range region;
+} QEMU_PACKED;
+
 /*
  * Hot add response message.
  * This message is sent by the guest to report the status of a hot add request.

Re: [RFC PATCH 4/5] hw/i386/q35: Wire virtual SMI# lines to ICH9 chipset

2024-03-08 Thread Philippe Mathieu-Daudé


On 8/3/24 17:06, Thomas Huth wrote:

On 08/03/2024 09.08, Philippe Mathieu-Daudé wrote:

On 7/3/24 20:43, Thomas Huth wrote:

On 28/02/2024 17.43, Zhao Liu wrote:

Hi Philippe,


+/*
+ * Real ICH9 contains a single SMI output line and doesn't broadcast CPUs.
+ * Virtualized ICH9 allows broadcasting upon negotiation with guest, see
+ * commit 5ce45c7a2b.
+ */
+enum {
+    ICH9_VIRT_SMI_BROADCAST,
+    ICH9_VIRT_SMI_CURRENT,
+#define ICH9_VIRT_SMI_COUNT 2
+};
+


Just a quick look here. Shouldn't ICH9_VIRT_SMI_COUNT be defined outside of
enum {}?


Or even better, do it without a #define:

enum {
 ICH9_VIRT_SMI_BROADCAST,
 ICH9_VIRT_SMI_CURRENT,
 ICH9_VIRT_SMI_COUNT


This form isn't recommended as it confuses static analyzers,
considering ICH9_VIRT_SMI_COUNT as part of the enum.


Never heard of that before. We're using it all over the place, e.g.:

typedef enum {
     THROTTLE_BPS_TOTAL,
     THROTTLE_BPS_READ,
     THROTTLE_BPS_WRITE,
     THROTTLE_OPS_TOTAL,
     THROTTLE_OPS_READ,
     THROTTLE_OPS_WRITE,
     BUCKETS_COUNT,
} BucketType;

... and even in our generated QAPI code, e.g.:

typedef enum QCryptoHashAlgorithm {
     QCRYPTO_HASH_ALG_MD5,
     QCRYPTO_HASH_ALG_SHA1,
     QCRYPTO_HASH_ALG_SHA224,
     QCRYPTO_HASH_ALG_SHA256,
     QCRYPTO_HASH_ALG_SHA384,
     QCRYPTO_HASH_ALG_SHA512,
     QCRYPTO_HASH_ALG_RIPEMD160,
     QCRYPTO_HASH_ALG__MAX,
} QCryptoHashAlgorithm;


We tried to remove it:

https://lore.kernel.org/qemu-devel/20230315112811.22355-4-phi...@linaro.org/

But there is a problem with generated empty enums...
https://lore.kernel.org/qemu-devel/87sfdx9w58@pond.sub.org/
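
For readers comparing the two forms discussed in this thread, here is a small side-by-side sketch. The `SMI_A_*` and `SMI_B_*` names are hypothetical, and the `-Wswitch` behaviour shown is only one concrete consequence of making the count an enumerator; it may or may not be the static-analysis concern meant above.

```c
#include <assert.h>
#include <stddef.h>

/* Form 1: the count is itself an enumerator and tracks additions for free. */
enum smi_line_a {
    SMI_A_BROADCAST,
    SMI_A_CURRENT,
    SMI_A_COUNT
};

/* Form 2: the count is a macro and must be kept in sync by hand. */
enum smi_line_b {
    SMI_B_BROADCAST,
    SMI_B_CURRENT,
};
#define SMI_B_COUNT 2

/* With form 1, a default-less switch must mention SMI_A_COUNT,
 * otherwise GCC and Clang warn under -Wswitch. */
static const char *smi_a_name(enum smi_line_a line)
{
    switch (line) {
    case SMI_A_BROADCAST:
        return "broadcast";
    case SMI_A_CURRENT:
        return "current";
    case SMI_A_COUNT:
        break;
    }
    return "invalid";
}

/* With form 2, the switch lists only the real values. */
static const char *smi_b_name(enum smi_line_b line)
{
    switch (line) {
    case SMI_B_BROADCAST:
        return "broadcast";
    case SMI_B_CURRENT:
        return "current";
    }
    return "invalid";
}

static_assert(SMI_A_COUNT == SMI_B_COUNT, "both forms describe two lines");
```

The trade-off is between a count that can never go stale and an enum type whose value set contains only meaningful members.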

Re: [PATCH v1 2/8] virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

2024-03-08 Thread Michael S. Tsirkin

On Mon, Mar 04, 2024 at 02:46:06PM -0500, Jonah Palmer wrote:
> Prevent ioeventfd from being enabled/disabled when a virtio-pci
> device has negotiated the VIRTIO_F_NOTIFICATION_DATA transport
> feature.
> 
> Due to ioeventfd not being able to carry the extra data associated with
> this feature, the ioeventfd should be left in a disabled state for
> emulated virtio-pci devices using this feature.
> 
> Reviewed-by: Eugenio Pérez 
> Signed-off-by: Jonah Palmer 

I thought hard about this. I propose that for now,
instead of disabling ioeventfd silently, we error out unless
the user has disabled it for us.
WDYT?
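
A rough sketch of what such a check could look like, purely as an illustration of the proposal: `struct toy_cfg` and `toy_check_cfg` are hypothetical names that do not exist in QEMU, and the error text is invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical configuration model; not QEMU's data structures. */
struct toy_cfg {
    bool notification_data;   /* device offers the notification data feature */
    bool ioeventfd;           /* ioeventfd setting as chosen by the user */
};

/* Returns NULL when the combination is acceptable, else an error message. */
static const char *toy_check_cfg(const struct toy_cfg *cfg)
{
    assert(cfg != NULL);
    if (cfg->notification_data && cfg->ioeventfd) {
        /* Refuse the combination instead of silently turning ioeventfd off. */
        return "notification data requires ioeventfd to be disabled";
    }
    return NULL;
}
```

Failing early like this keeps the user's explicit setting authoritative, at the cost of requiring an extra option on the command line.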


> ---
>  hw/virtio/virtio-pci.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index d12edc567f..287b8f7720 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -417,13 +417,15 @@ static void virtio_ioport_write(void *opaque, uint32_t 
> addr, uint32_t val)
>  }
>  break;
>  case VIRTIO_PCI_STATUS:
> -if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
> +if (!(val & VIRTIO_CONFIG_S_DRIVER_OK) &&
> +!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
>  virtio_pci_stop_ioeventfd(proxy);
>  }
>  
>  virtio_set_status(vdev, val & 0xFF);
>  
> -if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
> +if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
> +!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
>  virtio_pci_start_ioeventfd(proxy);
>  }
>  
> -- 
> 2.39.3

Re: [PATCH v2 3/5] gdbstub: Save target's siginfo

2024-03-08 Thread Gustavo Romero


Hi Alex,

On 3/7/24 7:33 PM, Alex Bennée wrote:

Richard Henderson  writes:


On 3/7/24 08:26, Gustavo Romero wrote:

Save target's siginfo into gdbserver_state so it can be used later, for
example, in any stub that requires the target's si_signo and si_code.
This change affects only linux-user mode.
Signed-off-by: Gustavo Romero 
Suggested-by: Richard Henderson 
---
   gdbstub/internals.h|  3 +++
   gdbstub/user-target.c  |  3 ++-
   gdbstub/user.c | 14 ++
   include/gdbstub/user.h |  6 +-
   linux-user/main.c  |  2 +-
   linux-user/signal.c|  5 -
   6 files changed, 25 insertions(+), 8 deletions(-)
diff --git a/gdbstub/internals.h b/gdbstub/internals.h
index 56b7c13b75..a7cc69dab3 100644
--- a/gdbstub/internals.h
+++ b/gdbstub/internals.h
@@ -58,6 +58,9 @@ typedef struct GDBState {
   int line_csum; /* checksum at the end of the packet */
   GByteArray *last_packet;
   int signal;
+#ifdef CONFIG_USER_ONLY
+uint8_t siginfo[MAX_SIGINFO_LENGTH];
+#endif


If we put this in GDBUserState in user.c, there is no need for ifdefs.


Although it does break on FreeBSD's user target:

   FAILED: libqemu-arm-bsd-user.fa.p/gdbstub_user-target.c.o
   cc -m64 -mcx16 -Ilibqemu-arm-bsd-user.fa.p -I. -I.. -Itarget/arm -I../target/arm 
-I../common-user/host/x86_64 -I../bsd-user/include -Ibsd-user/freebsd -I../bsd-user/freebsd 
-I../bsd-user/host/x86_64 -Ibsd-user -I../bsd-user -I../bsd-user/arm -Iqapi -Itrace -Iui 
-Iui/shader -I/usr/local/include/capstone -I/usr/local/include/glib-2.0 
-I/usr/local/lib/glib-2.0/include -I/usr/local/include -fdiagnostics-color=auto -Wall -Winvalid-pch 
-Werror -std=gnu11 -O2 -g -fstack-protector-strong -Wempty-body -Wendif-labels 
-Wexpansion-to-defined -Wformat-security -Wformat-y2k -Wignored-qualifiers -Winit-self 
-Wmissing-format-attribute -Wmissing-prototypes -Wnested-externs -Wold-style-definition 
-Wredundant-decls -Wstrict-prototypes -Wtype-limits -Wundef -Wvla -Wwrite-strings 
-Wno-gnu-variable-sized-type-not-at-end -Wno-initializer-overrides -Wno-missing-include-dirs 
-Wno-psabi -Wno-shift-negative-value -Wno-string-plus-int -Wno-tautological-type-limit-compare 
-Wno-typedef-redefinition -Wthread-safety -iquote . -iquote /tmp/cirrus-ci-build -iquote 
/tmp/cirrus-ci-build/include -iquote /tmp/cirrus-ci-build/host/include/x86_64 -iquote 
/tmp/cirrus-ci-build/host/include/generic -iquote /tmp/cirrus-ci-build/tcg/i386 -pthread 
-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing -fno-common -fwrapv 
-ftrivial-auto-var-init=zero -fPIE -DNEED_CPU_H 
'-DCONFIG_TARGET="arm-bsd-user-config-target.h"' 
'-DCONFIG_DEVICES="arm-bsd-user-config-devices.h"' -MD -MQ 
libqemu-arm-bsd-user.fa.p/gdbstub_user-target.c.o -MF 
libqemu-arm-bsd-user.fa.p/gdbstub_user-target.c.o.d -o 
libqemu-arm-bsd-user.fa.p/gdbstub_user-target.c.o -c ../gdbstub/user-target.c
   In file included from ../gdbstub/user-target.c:18:
   ../gdbstub/internals.h:62:21: error: use of undeclared identifier 
'MAX_SIGINFO_LENGTH'
  62 | uint8_t siginfo[MAX_SIGINFO_LENGTH];
 | ^
   1 error generated.
   [2084/6731] Compiling C object libqemu-arm

See: https://gitlab.com/stsquad/qemu/-/jobs/6345829419


Argh, I tested all targets for linux-user but missed bsd-user. I tried
once to build it, but that requires a BSD-like host, which I don't have at the
moment, and then I forgot about it... Let me set one up and review the change in
light of the comments from you and Richard.

Thanks!


Cheers,
Gustavo

Re: [PATCH] pci: Add option to disable device level INTx masking

2024-03-08 Thread Michael S. Tsirkin

On Thu, Mar 07, 2024 at 11:46:42AM -0700, Alex Williamson wrote:
> The PCI 2.3 spec added definitions of the INTx disable and status bits,
> in the command and status registers respectively.  The command register
> bit, commonly known as DisINTx in lspci, controls whether the device
> can assert the INTx signal.
> 
> Operating systems will often write to this bit to test whether a device
> supports this style of legacy interrupt masking.  When using device
> assignment, such as with vfio-pci, the result of this test dictates
> whether the device can use a shared or exclusive interrupt (ie. generic
> INTx masking at the device via DisINTx or IRQ controller level INTx
> masking).
> 
> Add an experimental option to the base set of properties for PCI
> devices which allows the DisINTx bit to be excluded from wmask, making
> it read-only to the guest for testing purposes related to INTx masking.
> 

Could you clarify the use a bit more? It's unstable - do you
expect to experiment with it and then make it permanent down
the road?

> Signed-off-by: Alex Williamson 
> ---
>  hw/pci/pci.c | 14 ++
>  include/hw/pci/pci.h |  2 ++
>  2 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 6496d027ca61..8c78326ad67f 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -85,6 +85,8 @@ static Property pci_props[] = {
>  QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
>  DEFINE_PROP_BIT("x-pcie-ari-nextfn-1", PCIDevice, cap_present,
>  QEMU_PCIE_ARI_NEXTFN_1_BITNR, false),
> +DEFINE_PROP_BIT("x-pci-disintx", PCIDevice, cap_present,
> +QEMU_PCI_DISINTX_BITNR, true),
>  DEFINE_PROP_END_OF_LIST()
>  };
>  
> @@ -861,13 +863,17 @@ static void pci_init_cmask(PCIDevice *dev)
>  static void pci_init_wmask(PCIDevice *dev)
>  {
>  int config_size = pci_config_size(dev);
> +uint16_t cmd_wmask = PCI_COMMAND_IO | PCI_COMMAND_MEMORY |
> + PCI_COMMAND_MASTER | PCI_COMMAND_SERR;
>  
>  dev->wmask[PCI_CACHE_LINE_SIZE] = 0xff;
>  dev->wmask[PCI_INTERRUPT_LINE] = 0xff;
> -pci_set_word(dev->wmask + PCI_COMMAND,
> - PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER |
> - PCI_COMMAND_INTX_DISABLE);
> -pci_word_test_and_set_mask(dev->wmask + PCI_COMMAND, PCI_COMMAND_SERR);
> +
> +if (dev->cap_present & QEMU_PCI_DISINTX) {
> +cmd_wmask |= PCI_COMMAND_INTX_DISABLE;
> +}
> +
> +pci_set_word(dev->wmask + PCI_COMMAND, cmd_wmask);
>  
>  memset(dev->wmask + PCI_CONFIG_HEADER_SIZE, 0xff,
> config_size - PCI_CONFIG_HEADER_SIZE);
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index eaa3fc99d884..45f0fac435cc 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -212,6 +212,8 @@ enum {
>  QEMU_PCIE_ERR_UNC_MASK = (1 << QEMU_PCIE_ERR_UNC_MASK_BITNR),
>  #define QEMU_PCIE_ARI_NEXTFN_1_BITNR 12
>  QEMU_PCIE_ARI_NEXTFN_1 = (1 << QEMU_PCIE_ARI_NEXTFN_1_BITNR),
> +#define QEMU_PCI_DISINTX_BITNR 13
> +QEMU_PCI_DISINTX = (1 << QEMU_PCI_DISINTX_BITNR),
>  };
>  
>  typedef struct PCIINTxRoute {
> -- 
> 2.44.0

Re: [PATCH V4 00/14] allow cpr-reboot for vfio

2024-03-08 Thread Cédric Le Goater


On 2/22/24 18:33, Steven Sistare wrote:

Peter (and David if interested): these patches still need RB:
   migration: notifier error checking
   migration: stop vm for cpr
   migration: update cpr-reboot description
   migration: options incompatible with cpr

Alex, these patches still need RB:
   vfio: register container for cpr
   vfio: allow cpr-reboot migration if suspended


Applied to vfio-next.

Thanks,

C.

Unmapping KVM Guest Memory from Host Kernel

2024-03-08 Thread Gowans, James

Hello KVM, MM and memfd_secret folks,

Currently when using anonymous memory for KVM guest RAM, the memory all
remains mapped into the kernel direct map. We are looking at options to
get KVM guest memory out of the kernel’s direct map as a principled
approach to mitigating speculative execution issues in the host kernel.
Our goal is to more completely address the class of issues whose leak
origin is categorized as "Mapped memory" [1].

We currently have downstream-only solutions to this, but we want to move
to purely upstream code.

So far we have been looking at using memfd_secret, which seems to be
designed exactly for use cases where it is undesirable to have some
memory range accessible through the kernel’s direct map.

However, memfd_secret doesn’t work out of the box for KVM guest memory; the
main reason seems to be that the GUP path is intentionally disabled for
memfd_secret, so if we use a memfd_secret backed VMA for a memslot then
KVM is not able to fault the memory in. If it’s been pre-faulted in by
userspace then it seems to work.

There are a few other issues around when KVM accesses the guest memory.
For example the KVM PV clock code goes directly to the PFN via the
pfncache, and that also breaks if the PFN is not in the direct map, so
we’d need to change that sort of thing, perhaps going via userspace
addresses.

If we remove the memfd_secret check from the GUP path, and disable KVM’s
pvclock from userspace via KVM_CPUID_FEATURES, we are able to boot a
simple Linux initrd using a Firecracker VMM modified to use
memfd_secret.

We are also aware of ongoing work on guest_memfd. The current
implementation unmaps guest memory from VMM address space, but leaves it
in the kernel’s direct map. We’re not looking at unmapping from VMM
userspace yet; we still need guest RAM there for PV drivers like virtio
to continue to work. So KVM’s gmem doesn’t seem like the right solution?

With this in mind, what’s the best way to solve getting guest RAM out of
the direct map? Is memfd_secret integration with KVM the way to go, or
should we build a solution on top of guest_memfd, for example via some
flag that causes it to leave memory in the host userspace’s page tables,
but removes it from the direct map? 

We are keen to help contribute to getting this working; we’re just
looking for guidance from maintainers on what the correct way to solve
this is.

Cheers,
James + colleagues Derek and Patrick
