Bug in master branch: IbmPrep40pMachine.test_factory_firmware_and_netbsd broken

2023-03-01 Thread Thomas Huth



 Hi all,

Seems like we've got another bug that sneaked in during the CI minutes 
blackout: the avocado test 
IbmPrep40pMachine.test_factory_firmware_and_netbsd is now reliably broken, see:


 https://gitlab.com/qemu-project/qemu/-/jobs/3858833617#L300
 https://gitlab.com/thuth/qemu/-/jobs/3858727901#L300
 https://gitlab.com/thuth/qemu/-/jobs/3857804517#L300

Is anybody already looking into this?

 Thomas




Re: [PATCH] [PATCH] disas/riscv Fix ctzw disassemble

2023-03-01 Thread Ivan Klokov

Hello, Palmer!

Thanks for your review.

I'm sorry, I sent a v2 patch but forgot to add the appropriate tag.

Please see 
https://lists.nongnu.org/archive/html/qemu-devel/2023-02/msg05278.html


It was also reviewed by Daniel Henrique Barboza and weiwei

On 02.03.2023 3:32, Palmer Dabbelt wrote:
On Fri, 17 Feb 2023 07:45:14 PST (-0800), dbarb...@ventanamicro.com wrote:



On 2/17/23 12:14, Ivan Klokov wrote:

Due to typo in opcode list, ctzw is disassembled as clzw instruction.



The code was added by 02c1b569a15b4b06a so I believe a "Fixes:" tag 
is in order:

Fixes: 02c1b569a15b ("disas/riscv: Add Zb[abcs] instructions")


Signed-off-by: Ivan Klokov 
---
  disas/riscv.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/disas/riscv.c b/disas/riscv.c
index ddda687c13..d0639cd047 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -1644,7 +1644,7 @@ const rv_opcode_data opcode_data[] = {
  { "minu", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
  { "max", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
  { "maxu", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
-    { "clzw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
+    { "ctzw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
  { "clzw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },



Does the order matter here? This patch is putting ctzw before clzw, 
but 20 lines or so before we have "clz" after "ctz".


IIUC the ordering does matter: the values in rv_op_* need to match the 
index of opcode_data[].  decode_inst_opcode() fills out rv_op_*, and 
then the various decode bits use it to index opcode_data[] (with 
format_inst() being the most relevant as it looks at the name field).
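
As a rough illustration of why the position in the table matters (a
standalone sketch, not the code in disas/riscv.c: op_t, the OP_* values,
op_names[] and op_name() are made-up stand-ins for rv_op_* and
opcode_data[]), a name table indexed by an opcode enum can tie each entry
to its enum value with designated initializers:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical opcode enum; stands in for the rv_op_* values. */
typedef enum {
    OP_ILLEGAL,
    OP_CLZW,
    OP_CTZW,
    OP_CPOPW,
    OP_COUNT
} op_t;

/* Hypothetical name table; stands in for opcode_data[]. */
static const char *const op_names[] = {
    [OP_ILLEGAL] = "illegal",
    [OP_CLZW]    = "clzw",
    [OP_CTZW]    = "ctzw",
    [OP_CPOPW]   = "cpopw",
};

/* Catch a table whose length does not match the enum. */
static_assert(sizeof(op_names) / sizeof(op_names[0]) == OP_COUNT,
              "op_names[] must have one entry per op_t value");

/* The decoder yields an op_t; the formatter only indexes the table. */
static const char *op_name(op_t op)
{
    return op < OP_COUNT ? op_names[op] : "illegal";
}
```

With positional entries, as in the hunks quoted in this thread, swapping
two lines changes which name an index maps to without any compiler
diagnostic, which is why the order of the two entries is not cosmetic.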


So unless I'm missing something, the correct patch should look like

   diff --git a/disas/riscv.c b/disas/riscv.c
   index ddda687c13..544558 100644
   --- a/disas/riscv.c
   +++ b/disas/riscv.c
   @@ -1645,7 +1645,7 @@ const rv_opcode_data opcode_data[] = {
    { "max", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
    { "maxu", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
    { "clzw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
   -    { "clzw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
   +    { "ctzw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
    { "cpopw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
    { "slli.uw", rv_codec_i_sh5, rv_fmt_rd_rs1_imm, NULL, 0, 0, 0 },
    { "add.uw", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },

The threading seems to have gotten a little screwed up with the v2, so 
sorry if I missed something, but I didn't see one with the ordering 
changed.  I stuck what I think is a correct patch over at
, 


LMK if that's OK (or just send a v3).

If the order doesn't matter I think it would be nice to put ctzw 
after clzw.




Thanks,


Daniel


  { "cpopw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
  { "slli.uw", rv_codec_i_sh5, rv_fmt_rd_rs1_imm, NULL, 0, 0, 0 },






[PATCH] tests/data/acpi/virt: drop (most) duplicate files.

2023-03-01 Thread Michael S. Tsirkin
When virt ACPI files were added, lots of duplicates were created because
we forgot that there's a no-prefix fallback: e.g. if
tests/data/acpi/virt/APIC.memhp is not there then test will use
tests/data/acpi/virt/APIC.

Drop these.
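
The fallback described above can be sketched roughly as follows. This is
only an illustration, not the test's actual code: file_exists() and
expected_table_path() are hypothetical helpers, and the variant parameter
stands for a suffix such as memhp.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if the file can be opened for reading. */
static int file_exists(const char *path)
{
    FILE *f = fopen(path, "rb");

    if (!f) {
        return 0;
    }
    fclose(f);
    return 1;
}

/*
 * Hypothetical helper: write the path of the expected-table file to buf.
 * Prefer "<dir>/<table>.<variant>"; if that file does not exist, fall
 * back to the no-suffix "<dir>/<table>".
 * Returns 0 on success, -1 if neither file exists or buf is too small.
 */
static int expected_table_path(const char *dir, const char *table,
                               const char *variant,
                               char *buf, size_t size)
{
    int n;

    assert(buf != NULL && size > 0);

    if (variant && strlen(variant) > 0) {
        n = snprintf(buf, size, "%s/%s.%s", dir, table, variant);
        if (n > 0 && (size_t)n < size && file_exists(buf)) {
            return 0;
        }
    }
    n = snprintf(buf, size, "%s/%s", dir, table);
    if (n > 0 && (size_t)n < size && file_exists(buf)) {
        return 0;
    }
    return -1;
}
```

Under this scheme a variant file that is byte-identical to the no-suffix
file adds nothing: deleting it makes the lookup land on the same content.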

These were found with
$ find tests/data/acpi/ -type f -exec sha256sum '{}' ';'|sort -d|uniq -w 64 --all-repeated=separate
(trick: -d does a dictionary sort so a no-suffix file ends up first).

Note: there are still a bunch of issues with duplicates left even after this.

First, pc and q35 are often identical.
Second, sometimes files are identical but not identical to the default
fallback, e.g.
tests/data/acpi/pc/SLIT.cphp and tests/data/acpi/pc/SLIT.memhp
or
tests/data/acpi/q35/HMAT.acpihmat-noinitiator and tests/data/acpi/virt/HMAT.acpihmatvirt

Finding a way to deduplicate these is still a TODO item - softlinks
maybe?

We also need to make rebuild-expected-aml.sh smarter about not creating
these duplicates in the first place.

And maybe we should use softlinks instead of relying on a fallback
to make it explicit which version each test expects?

Signed-off-by: Michael S. Tsirkin 
---
 tests/data/acpi/virt/APIC.memhp   | Bin 172 -> 0 bytes
 tests/data/acpi/virt/APIC.numamem | Bin 172 -> 0 bytes
 tests/data/acpi/virt/DSDT.numamem | Bin 5196 -> 0 bytes
 tests/data/acpi/virt/FACP.memhp   | Bin 276 -> 0 bytes
 tests/data/acpi/virt/FACP.numamem | Bin 276 -> 0 bytes
 tests/data/acpi/virt/GTDT.memhp   | Bin 96 -> 0 bytes
 tests/data/acpi/virt/GTDT.numamem | Bin 96 -> 0 bytes
 tests/data/acpi/virt/IORT.memhp   | Bin 128 -> 0 bytes
 tests/data/acpi/virt/IORT.numamem | Bin 128 -> 0 bytes
 tests/data/acpi/virt/IORT.pxb | Bin 128 -> 0 bytes
 tests/data/acpi/virt/MCFG.memhp   | Bin 60 -> 0 bytes
 tests/data/acpi/virt/MCFG.numamem | Bin 60 -> 0 bytes
 tests/data/acpi/virt/SPCR.memhp   | Bin 80 -> 0 bytes
 tests/data/acpi/virt/SPCR.numamem | Bin 80 -> 0 bytes
 14 files changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 tests/data/acpi/virt/APIC.memhp
 delete mode 100644 tests/data/acpi/virt/APIC.numamem
 delete mode 100644 tests/data/acpi/virt/DSDT.numamem
 delete mode 100644 tests/data/acpi/virt/FACP.memhp
 delete mode 100644 tests/data/acpi/virt/FACP.numamem
 delete mode 100644 tests/data/acpi/virt/GTDT.memhp
 delete mode 100644 tests/data/acpi/virt/GTDT.numamem
 delete mode 100644 tests/data/acpi/virt/IORT.memhp
 delete mode 100644 tests/data/acpi/virt/IORT.numamem
 delete mode 100644 tests/data/acpi/virt/IORT.pxb
 delete mode 100644 tests/data/acpi/virt/MCFG.memhp
 delete mode 100644 tests/data/acpi/virt/MCFG.numamem
 delete mode 100644 tests/data/acpi/virt/SPCR.memhp
 delete mode 100644 tests/data/acpi/virt/SPCR.numamem

diff --git a/tests/data/acpi/virt/APIC.memhp b/tests/data/acpi/virt/APIC.memhp
deleted file mode 100644
index 
179d274770a23209b949c90a929525e22368568b..
GIT binary patch
literal 0
HcmV?d1

literal 172
zcmZ<^@N{0oz`(%b?<65v<@85#X!<1dKp25F13p0FMNW#lQh$F##Fe0Wcl|15CX*
gLI}uWgsNwO(#>V09of9T)-_08#k}0RR91

diff --git a/tests/data/acpi/virt/APIC.numamem 
b/tests/data/acpi/virt/APIC.numamem
deleted file mode 100644
index 
179d274770a23209b949c90a929525e22368568b..
GIT binary patch
literal 0
HcmV?d1

literal 172
zcmZ<^@N{0oz`(%b?<65v<@85#X!<1dKp25F13p0FMNW#lQh$F##Fe0Wcl|15CX*
gLI}uWgsNwO(#>V09of9T)-_08#k}0RR91

diff --git a/tests/data/acpi/virt/DSDT.numamem 
b/tests/data/acpi/virt/DSDT.numamem
deleted file mode 100644
index 
c47503990715d389914fdf9c8bccb510761741ac..
GIT binary patch
literal 0
HcmV?d1

literal 5196
zcmZvg%WoT16o>EFlh__VVmr>uc{qhq@vO#n^Jr;H?6H%$#EJ2w4N@w(5&}`OsYHcT
zDx{D_3)#^~Yza~%{tYBn?AWnj&4zz~9p>D*Gs*8LXQYhh%-r+M{l>@f@oo97-K~;R

Re: [PATCH 6/6] monitor: convert monitor_cleanup() to AIO_WAIT_WHILE_UNLOCKED()

2023-03-01 Thread Markus Armbruster
Stefan Hajnoczi  writes:

> monitor_cleanup() is called from the main loop thread. Calling

Correct.

> AIO_WAIT_WHILE(qemu_get_aio_context(), ...) from the main loop thread is
> equivalent to AIO_WAIT_WHILE_UNLOCKED(NULL, ...) because neither unlocks
> the AioContext and the latter's assertion that we're in the main loop
> succeeds.
>
> Signed-off-by: Stefan Hajnoczi 
> ---
>  monitor/monitor.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/monitor/monitor.c b/monitor/monitor.c
> index 8dc96f6af9..602535696c 100644
> --- a/monitor/monitor.c
> +++ b/monitor/monitor.c
> @@ -666,7 +666,7 @@ void monitor_cleanup(void)
>   * We need to poll both qemu_aio_context and iohandler_ctx to make
>   * sure that the dispatcher coroutine keeps making progress and
>   * eventually terminates.  qemu_aio_context is automatically
> - * polled by calling AIO_WAIT_WHILE on it, but we must poll
> + * polled by calling AIO_WAIT_WHILE_UNLOCKED on it, but we must poll
>   * iohandler_ctx manually.
>   *
>   * Letting the iothread continue while shutting down the dispatcher
> @@ -679,7 +679,7 @@ void monitor_cleanup(void)
>  aio_co_wake(qmp_dispatcher_co);
>  }
>  
> -AIO_WAIT_WHILE(qemu_get_aio_context(),
> +AIO_WAIT_WHILE_UNLOCKED(NULL,
> (aio_poll(iohandler_get_aio_context(), false),
>  qatomic_mb_read(&qmp_dispatcher_co_busy)));

Acked-by: Markus Armbruster 

For an R-by, I need to understand this in more detail.  See my reply to
the previous patch.




Re: [PATCH 5/6] hmp: convert handle_hmp_command() to AIO_WAIT_WHILE_UNLOCKED()

2023-03-01 Thread Markus Armbruster
Stefan Hajnoczi  writes:

> The HMP monitor runs in the main loop thread. Calling

Correct.

> AIO_WAIT_WHILE(qemu_get_aio_context(), ...) from the main loop thread is
> equivalent to AIO_WAIT_WHILE_UNLOCKED(NULL, ...) because neither unlocks
> the AioContext and the latter's assertion that we're in the main loop
> succeeds.
>
> Signed-off-by: Stefan Hajnoczi 
> ---
>  monitor/hmp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/monitor/hmp.c b/monitor/hmp.c
> index 2aa85d3982..5ecbdac802 100644
> --- a/monitor/hmp.c
> +++ b/monitor/hmp.c
> @@ -1167,7 +1167,7 @@ void handle_hmp_command(MonitorHMP *mon, const char *cmdline)
>  Coroutine *co = qemu_coroutine_create(handle_hmp_command_co, &data);
>  monitor_set_cur(co, &mon->common);
>  aio_co_enter(qemu_get_aio_context(), co);
> -AIO_WAIT_WHILE(qemu_get_aio_context(), !data.done);
> +AIO_WAIT_WHILE_UNLOCKED(NULL, !data.done);
>  }
>  
>  qobject_unref(qdict);

Acked-by: Markus Armbruster 

For an R-by, I need to understand this in more detail.  I'm not familiar
with the innards of AIO_WAIT_WHILE() & friends, so I need to go real
slow.

We change

ctx from qemu_get_aio_context() to NULL
unlock from true to false

in

    bool waited_ = false;                                          \
    AioWait *wait_ = &global_aio_wait;                             \
    AioContext *ctx_ = (ctx);                                      \
    /* Increment wait_->num_waiters before evaluating cond. */     \
    qatomic_inc(&wait_->num_waiters);                              \
    /* Paired with smp_mb in aio_wait_kick(). */                   \
    smp_mb();                                                      \
    if (ctx_ && in_aio_context_home_thread(ctx_)) {                \
        while ((cond)) {                                           \
            aio_poll(ctx_, true);                                  \
            waited_ = true;                                        \
        }                                                          \
    } else {                                                       \
        assert(qemu_get_current_aio_context() ==                   \
               qemu_get_aio_context());                            \
        while ((cond)) {                                           \
            if (unlock && ctx_) {                                  \
                aio_context_release(ctx_);                         \
            }                                                      \
            aio_poll(qemu_get_aio_context(), true);                \
            if (unlock && ctx_) {                                  \
                aio_context_acquire(ctx_);                         \
            }                                                      \
            waited_ = true;                                        \
        }                                                          \
    }                                                              \
    qatomic_dec(&wait_->num_waiters);                              \
    waited_; })

qemu_get_aio_context() is non-null here, correct?

What's the value of in_aio_context_home_thread(qemu_get_aio_context())?
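
One way to reason about the two changes listed above is to model a single
loop iteration of the quoted macro. This is only a toy sketch: ToyCtx,
PollPlan and plan_poll() are made-up names, and in_home_thread is an
input precisely because its real value is the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for AioContext; only pointer identity matters here. */
typedef struct {
    int id;
} ToyCtx;

typedef struct {
    const ToyCtx *polled;   /* context that would be handed to aio_poll() */
    bool released;          /* would ctx be released around the poll? */
} PollPlan;

/*
 * Mirror the branch structure of the quoted macro for one iteration.
 * main_ctx models qemu_get_aio_context(); in_home_thread models the
 * result of in_aio_context_home_thread(ctx).
 */
static PollPlan plan_poll(const ToyCtx *ctx, bool unlock,
                          const ToyCtx *main_ctx, bool in_home_thread)
{
    PollPlan p = { NULL, false };

    assert(main_ctx != NULL);

    if (ctx && in_home_thread) {
        p.polled = ctx;
    } else {
        p.polled = main_ctx;
        p.released = unlock && ctx;
    }
    return p;
}
```

In this model, if in_home_thread is true, both (main_ctx, unlock=true) and
(NULL, unlock=false) poll main_ctx without releasing anything. If it were
false, the old form would release and re-acquire the context around the
poll and the new one would not, so the equivalence hinges on that answer.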




[PULL 5/5] target/ppc: Restrict 'qapi-commands-machine.h' to system emulation

2023-03-01 Thread Markus Armbruster
From: Philippe Mathieu-Daudé 

Since commit a0e61807a3 ("qapi: Remove QMP events and commands from
user-mode builds") we don't generate the "qapi-commands-machine.h"
header in a user-emulation-only build.

Move the QMP functions from cpu_init.c (which is always compiled)
to monitor.c (which is only compiled when system-emulation
is selected).  Rename monitor.c to ppc-qmp-cmds.c.

Note ppc_cpu_class_by_name() is used by both file units, so we
expose its prototype in "cpu-qom.h".

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Reviewed-by: Cédric Le Goater 
Message-Id: <20230223155540.30370-5-phi...@linaro.org>
Signed-off-by: Markus Armbruster 
---
 target/ppc/cpu-qom.h |  2 +
 target/ppc/cpu_init.c| 48 +--
 target/ppc/{monitor.c => ppc-qmp-cmds.c} | 50 +++-
 target/ppc/meson.build   |  2 +-
 4 files changed, 53 insertions(+), 49 deletions(-)
 rename target/ppc/{monitor.c => ppc-qmp-cmds.c} (78%)

diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
index 0fbd8b7246..9666f54f65 100644
--- a/target/ppc/cpu-qom.h
+++ b/target/ppc/cpu-qom.h
@@ -31,6 +31,8 @@
 
 OBJECT_DECLARE_CPU_TYPE(PowerPCCPU, PowerPCCPUClass, POWERPC_CPU)
 
+ObjectClass *ppc_cpu_class_by_name(const char *name);
+
 typedef struct CPUArchState CPUPPCState;
 typedef struct ppc_tb_t ppc_tb_t;
 typedef struct ppc_dcr_t ppc_dcr_t;
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index abee71d407..d62ffe8a6f 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -40,7 +40,6 @@
 #include "qemu/cutils.h"
 #include "disas/capstone.h"
 #include "fpu/softfloat.h"
-#include "qapi/qapi-commands-machine-target.h"
 
 #include "helper_regs.h"
 #include "internal.h"
@@ -6841,7 +6840,7 @@ static const char *ppc_cpu_lookup_alias(const char *alias)
 return NULL;
 }
 
-static ObjectClass *ppc_cpu_class_by_name(const char *name)
+ObjectClass *ppc_cpu_class_by_name(const char *name)
 {
 char *cpu_model, *typename;
 ObjectClass *oc;
@@ -6981,51 +6980,6 @@ void ppc_cpu_list(void)
 #endif
 }
 
-static void ppc_cpu_defs_entry(gpointer data, gpointer user_data)
-{
-ObjectClass *oc = data;
-CpuDefinitionInfoList **first = user_data;
-const char *typename;
-CpuDefinitionInfo *info;
-
-typename = object_class_get_name(oc);
-info = g_malloc0(sizeof(*info));
-info->name = g_strndup(typename,
-   strlen(typename) - strlen(POWERPC_CPU_TYPE_SUFFIX));
-
-QAPI_LIST_PREPEND(*first, info);
-}
-
-CpuDefinitionInfoList *qmp_query_cpu_definitions(Error **errp)
-{
-CpuDefinitionInfoList *cpu_list = NULL;
-GSList *list;
-int i;
-
-list = object_class_get_list(TYPE_POWERPC_CPU, false);
-g_slist_foreach(list, ppc_cpu_defs_entry, &cpu_list);
-g_slist_free(list);
-
-for (i = 0; ppc_cpu_aliases[i].alias != NULL; i++) {
-PowerPCCPUAlias *alias = &ppc_cpu_aliases[i];
-ObjectClass *oc;
-CpuDefinitionInfo *info;
-
-oc = ppc_cpu_class_by_name(alias->model);
-if (oc == NULL) {
-continue;
-}
-
-info = g_malloc0(sizeof(*info));
-info->name = g_strdup(alias->alias);
-info->q_typename = g_strdup(object_class_get_name(oc));
-
-QAPI_LIST_PREPEND(cpu_list, info);
-}
-
-return cpu_list;
-}
-
 static void ppc_cpu_set_pc(CPUState *cs, vaddr value)
 {
 PowerPCCPU *cpu = POWERPC_CPU(cs);
diff --git a/target/ppc/monitor.c b/target/ppc/ppc-qmp-cmds.c
similarity index 78%
rename from target/ppc/monitor.c
rename to target/ppc/ppc-qmp-cmds.c
index 8250b1304e..36e5b5eff8 100644
--- a/target/ppc/monitor.c
+++ b/target/ppc/ppc-qmp-cmds.c
@@ -1,5 +1,5 @@
 /*
- * QEMU monitor
+ * QEMU PPC (monitor definitions)
  *
  * Copyright (c) 2003-2004 Fabrice Bellard
  *
@@ -28,6 +28,9 @@
 #include "qemu/ctype.h"
 #include "monitor/hmp-target.h"
 #include "monitor/hmp.h"
+#include "qapi/qapi-commands-machine-target.h"
+#include "cpu-models.h"
+#include "cpu-qom.h"
 
 static target_long monitor_get_ccr(Monitor *mon, const struct MonitorDef *md,
int val)
@@ -172,3 +175,48 @@ int target_get_monitor_def(CPUState *cs, const char *name, uint64_t *pval)
 
 return -EINVAL;
 }
+
+static void ppc_cpu_defs_entry(gpointer data, gpointer user_data)
+{
+ObjectClass *oc = data;
+CpuDefinitionInfoList **first = user_data;
+const char *typename;
+CpuDefinitionInfo *info;
+
+typename = object_class_get_name(oc);
+info = g_malloc0(sizeof(*info));
+info->name = g_strndup(typename,
+   strlen(typename) - strlen(POWERPC_CPU_TYPE_SUFFIX));
+
+QAPI_LIST_PREPEND(*first, info);
+}
+
+CpuDefinitionInfoList *qmp_query_cpu_definitions(Error **errp)
+{
+CpuDefinitionInfoList *cpu_list = NULL;
+GSList *list;
+int i;
+
+list = object_class_get_list(TYPE_POWERPC_CPU, false);
+g_slist_foreach(list, 

[PULL 0/5] Monitor patches for 2023-03-02

2023-03-01 Thread Markus Armbruster
The following changes since commit 627634031092e1514f363fd8659a579398de0f0e:

  Merge tag 'buildsys-qom-qdev-ui-20230227' of https://github.com/philmd/qemu into staging (2023-02-28 15:09:18 +)

are available in the Git repository at:

  https://repo.or.cz/qemu/armbru.git tags/pull-monitor-2023-03-02

for you to fetch changes up to 0f3fea217164e3925db91d46f21fc9fa11708e66:

  target/ppc: Restrict 'qapi-commands-machine.h' to system emulation (2023-03-02 07:51:33 +0100)


Monitor patches for 2023-03-02


Dongli Zhang (1):
  readline: fix hmp completion issue

Philippe Mathieu-Daudé (4):
  target/arm: Restrict 'qapi-commands-machine.h' to system emulation
  target/i386: Restrict 'qapi-commands-machine.h' to system emulation
  target/loongarch: Restrict 'qapi-commands-machine.h' to system emulation
  target/ppc: Restrict 'qapi-commands-machine.h' to system emulation

 target/ppc/cpu-qom.h |  2 +
 monitor/hmp.c|  8 +---
 target/arm/{monitor.c => arm-qmp-cmds.c} | 28 
 target/arm/helper.c  | 29 -
 target/i386/cpu.c| 74 +---
 target/loongarch/cpu.c   | 27 
 target/loongarch/loongarch-qmp-cmds.c| 37 
 target/ppc/cpu_init.c| 48 +
 target/ppc/{monitor.c => ppc-qmp-cmds.c} | 50 -
 target/arm/meson.build   |  2 +-
 target/loongarch/meson.build |  1 +
 target/ppc/meson.build   |  2 +-
 12 files changed, 161 insertions(+), 147 deletions(-)
 rename target/arm/{monitor.c => arm-qmp-cmds.c} (90%)
 create mode 100644 target/loongarch/loongarch-qmp-cmds.c
 rename target/ppc/{monitor.c => ppc-qmp-cmds.c} (78%)

-- 
2.39.0




[PULL 3/5] target/i386: Restrict 'qapi-commands-machine.h' to system emulation

2023-03-01 Thread Markus Armbruster
From: Philippe Mathieu-Daudé 

Since commit a0e61807a3 ("qapi: Remove QMP events and commands from
user-mode builds") we don't generate the "qapi-commands-machine.h"
header in a user-emulation-only build.

Guard qmp_query_cpu_definitions() within CONFIG_USER_ONLY; move
x86_cpu_class_check_missing_features() closer since it is only used
by this QMP command handler.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Message-Id: <20230223155540.30370-3-phi...@linaro.org>
Signed-off-by: Markus Armbruster 
---
 target/i386/cpu.c | 74 +--
 1 file changed, 39 insertions(+), 35 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 4bad3d41d3..4d508624e1 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -31,11 +31,11 @@
 #include "qapi/error.h"
 #include "qapi/qapi-visit-machine.h"
 #include "qapi/qmp/qerror.h"
-#include "qapi/qapi-commands-machine-target.h"
 #include "standard-headers/asm-x86/kvm_para.h"
 #include "hw/qdev-properties.h"
 #include "hw/i386/topology.h"
 #ifndef CONFIG_USER_ONLY
+#include "qapi/qapi-commands-machine-target.h"
 #include "exec/address-spaces.h"
 #include "hw/boards.h"
 #include "hw/i386/sgx-epc.h"
@@ -4843,40 +4843,6 @@ static void x86_cpu_get_unavailable_features(Object *obj, Visitor *v,
 visit_type_strList(v, "unavailable-features", &result, errp);
 }
 
-/* Check for missing features that may prevent the CPU class from
- * running using the current machine and accelerator.
- */
-static void x86_cpu_class_check_missing_features(X86CPUClass *xcc,
- strList **list)
-{
-strList **tail = list;
-X86CPU *xc;
-Error *err = NULL;
-
-if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
-QAPI_LIST_APPEND(tail, g_strdup("kvm"));
-return;
-}
-
-xc = X86_CPU(object_new_with_class(OBJECT_CLASS(xcc)));
-
-x86_cpu_expand_features(xc, &err);
-if (err) {
-/* Errors at x86_cpu_expand_features should never happen,
- * but in case it does, just report the model as not
- * runnable at all using the "type" property.
- */
-QAPI_LIST_APPEND(tail, g_strdup("type"));
-error_free(err);
-}
-
-x86_cpu_filter_features(xc, false);
-
-x86_cpu_list_feature_names(xc->filtered_features, tail);
-
-object_unref(OBJECT(xc));
-}
-
 /* Print all cpuid feature names in featureset
  */
 static void listflags(GList *features)
@@ -5005,6 +4971,42 @@ void x86_cpu_list(void)
 g_list_free(names);
 }
 
+#ifndef CONFIG_USER_ONLY
+
+/* Check for missing features that may prevent the CPU class from
+ * running using the current machine and accelerator.
+ */
+static void x86_cpu_class_check_missing_features(X86CPUClass *xcc,
+ strList **list)
+{
+strList **tail = list;
+X86CPU *xc;
+Error *err = NULL;
+
+if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
+QAPI_LIST_APPEND(tail, g_strdup("kvm"));
+return;
+}
+
+xc = X86_CPU(object_new_with_class(OBJECT_CLASS(xcc)));
+
+x86_cpu_expand_features(xc, &err);
+if (err) {
+/* Errors at x86_cpu_expand_features should never happen,
+ * but in case it does, just report the model as not
+ * runnable at all using the "type" property.
+ */
+QAPI_LIST_APPEND(tail, g_strdup("type"));
+error_free(err);
+}
+
+x86_cpu_filter_features(xc, false);
+
+x86_cpu_list_feature_names(xc->filtered_features, tail);
+
+object_unref(OBJECT(xc));
+}
+
 static void x86_cpu_definition_entry(gpointer data, gpointer user_data)
 {
 ObjectClass *oc = data;
@@ -5045,6 +5047,8 @@ CpuDefinitionInfoList *qmp_query_cpu_definitions(Error **errp)
 return cpu_list;
 }
 
+#endif /* !CONFIG_USER_ONLY */
+
 uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 bool migratable_only)
 {
-- 
2.39.0




[PULL 2/5] target/arm: Restrict 'qapi-commands-machine.h' to system emulation

2023-03-01 Thread Markus Armbruster
From: Philippe Mathieu-Daudé 

Since commit a0e61807a3 ("qapi: Remove QMP events and commands from
user-mode builds") we don't generate the "qapi-commands-machine.h"
header in a user-emulation-only build.

Move the QMP functions from helper.c (which is always compiled)
to monitor.c (which is only compiled when system-emulation
is selected).  Rename monitor.c to arm-qmp-cmds.c.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Message-Id: <20230223155540.30370-2-phi...@linaro.org>
Signed-off-by: Markus Armbruster 
[Straightforward conflict with commit 9def656e7a2 resolved]
---
 target/arm/{monitor.c => arm-qmp-cmds.c} | 28 +++
 target/arm/helper.c  | 29 
 target/arm/meson.build   |  2 +-
 3 files changed, 29 insertions(+), 30 deletions(-)
 rename target/arm/{monitor.c => arm-qmp-cmds.c} (90%)

diff --git a/target/arm/monitor.c b/target/arm/arm-qmp-cmds.c
similarity index 90%
rename from target/arm/monitor.c
rename to target/arm/arm-qmp-cmds.c
index ecdd5ee817..c8fa524002 100644
--- a/target/arm/monitor.c
+++ b/target/arm/arm-qmp-cmds.c
@@ -227,3 +227,31 @@ CpuModelExpansionInfo *qmp_query_cpu_model_expansion(CpuModelExpansionType type,
 
 return expansion_info;
 }
+
+static void arm_cpu_add_definition(gpointer data, gpointer user_data)
+{
+ObjectClass *oc = data;
+CpuDefinitionInfoList **cpu_list = user_data;
+CpuDefinitionInfo *info;
+const char *typename;
+
+typename = object_class_get_name(oc);
+info = g_malloc0(sizeof(*info));
+info->name = g_strndup(typename,
+   strlen(typename) - strlen("-" TYPE_ARM_CPU));
+info->q_typename = g_strdup(typename);
+
+QAPI_LIST_PREPEND(*cpu_list, info);
+}
+
+CpuDefinitionInfoList *qmp_query_cpu_definitions(Error **errp)
+{
+CpuDefinitionInfoList *cpu_list = NULL;
+GSList *list;
+
+list = object_class_get_list(TYPE_ARM_CPU, false);
+g_slist_foreach(list, arm_cpu_add_definition, &cpu_list);
+g_slist_free(list);
+
+return cpu_list;
+}
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 14af7ba095..82c546f11a 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -23,7 +23,6 @@
 #include "sysemu/cpu-timers.h"
 #include "sysemu/kvm.h"
 #include "sysemu/tcg.h"
-#include "qapi/qapi-commands-machine-target.h"
 #include "qapi/error.h"
 #include "qemu/guest-random.h"
 #ifdef CONFIG_TCG
@@ -9188,34 +9187,6 @@ void arm_cpu_list(void)
 g_slist_free(list);
 }
 
-static void arm_cpu_add_definition(gpointer data, gpointer user_data)
-{
-ObjectClass *oc = data;
-CpuDefinitionInfoList **cpu_list = user_data;
-CpuDefinitionInfo *info;
-const char *typename;
-
-typename = object_class_get_name(oc);
-info = g_malloc0(sizeof(*info));
-info->name = g_strndup(typename,
-   strlen(typename) - strlen("-" TYPE_ARM_CPU));
-info->q_typename = g_strdup(typename);
-
-QAPI_LIST_PREPEND(*cpu_list, info);
-}
-
-CpuDefinitionInfoList *qmp_query_cpu_definitions(Error **errp)
-{
-CpuDefinitionInfoList *cpu_list = NULL;
-GSList *list;
-
-list = object_class_get_list(TYPE_ARM_CPU, false);
-g_slist_foreach(list, arm_cpu_add_definition, &cpu_list);
-g_slist_free(list);
-
-return cpu_list;
-}
-
 /*
  * Private utility function for define_one_arm_cp_reg_with_opaque():
  * add a single reginfo struct to the hash table.
diff --git a/target/arm/meson.build b/target/arm/meson.build
index a5191b57e1..6226098ad5 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -20,8 +20,8 @@ arm_softmmu_ss = ss.source_set()
 arm_softmmu_ss.add(files(
   'arch_dump.c',
   'arm-powerctl.c',
+  'arm-qmp-cmds.c',
   'machine.c',
-  'monitor.c',
   'ptw.c',
 ))
 
-- 
2.39.0




[PULL 1/5] readline: fix hmp completion issue

2023-03-01 Thread Markus Armbruster
From: Dongli Zhang 

The auto completion does not work in some cases.

Case 1.

1. (qemu) info reg
2. Press 'Tab'.
3. It does not auto complete.

Case 2.

1. (qemu) block_resize flo
2. Press 'Tab'.
3. It does not auto complete 'floppy0'.

Since readline_add_completion_of() may add any completion when
strlen(pfx) is zero, we remove the check for (name[0] == '\0') because
strlen() always returns zero in that case.
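
A plain prefix comparison already behaves this way. The sketch below is
only an illustration: completes() is a made-up helper, and whether
readline_add_completion_of() is written exactly like this is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical prefix test: true when candidate starts with pfx.
 * An empty pfx matches every candidate, because strncmp() with a
 * length of zero compares nothing and returns 0.
 */
static bool completes(const char *pfx, const char *candidate)
{
    assert(pfx != NULL && candidate != NULL);
    return strncmp(pfx, candidate, strlen(pfx)) == 0;
}
```

With a test like this inside the helper, a caller that only invokes it
for an empty prefix would never offer 'floppy0' for 'flo', which matches
the two cases listed above.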

Fixes: 52f50b1e9f8f ("readline: Extract readline_add_completion_of() from monitor")
Cc: Joe Jin 
Signed-off-by: Dongli Zhang 
Message-Id: <20230207045241.8843-1-dongli.zh...@oracle.com>
Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Markus Armbruster 
Tested-by: Thomas Huth 
Signed-off-by: Markus Armbruster 
---
 monitor/hmp.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/monitor/hmp.c b/monitor/hmp.c
index 2aa85d3982..fee410362f 100644
--- a/monitor/hmp.c
+++ b/monitor/hmp.c
@@ -1189,9 +1189,7 @@ static void cmd_completion(MonitorHMP *mon, const char *name, const char *list)
 }
 memcpy(cmd, pstart, len);
 cmd[len] = '\0';
-if (name[0] == '\0') {
-readline_add_completion_of(mon->rs, name, cmd);
-}
+readline_add_completion_of(mon->rs, name, cmd);
 if (*p == '\0') {
 break;
 }
@@ -1335,9 +1333,7 @@ static void monitor_find_completion_by_table(MonitorHMP *mon,
 /* block device name completion */
 readline_set_completion_index(mon->rs, strlen(str));
 while ((blk = blk_next(blk)) != NULL) {
-if (str[0] == '\0') {
-readline_add_completion_of(mon->rs, str, blk_name(blk));
-}
+readline_add_completion_of(mon->rs, str, blk_name(blk));
 }
 break;
 case 's':
-- 
2.39.0




[PULL 4/5] target/loongarch: Restrict 'qapi-commands-machine.h' to system emulation

2023-03-01 Thread Markus Armbruster
From: Philippe Mathieu-Daudé 

Since commit a0e61807a3 ("qapi: Remove QMP events and commands from
user-mode builds") we don't generate the "qapi-commands-machine.h"
header in a user-emulation-only build.

Extract the QMP functions from cpu.c (which is always compiled)
to the new 'loongarch-qmp-cmds.c' unit (which is only compiled
when system emulation is selected).

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20230223155540.30370-4-phi...@linaro.org>
Signed-off-by: Markus Armbruster 
---
 target/loongarch/cpu.c| 27 ---
 target/loongarch/loongarch-qmp-cmds.c | 37 +++
 target/loongarch/meson.build  |  1 +
 3 files changed, 38 insertions(+), 27 deletions(-)
 create mode 100644 target/loongarch/loongarch-qmp-cmds.c

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 290ab4d526..4e845ba29b 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -12,7 +12,6 @@
 #include "qemu/module.h"
 #include "sysemu/qtest.h"
 #include "exec/exec-all.h"
-#include "qapi/qapi-commands-machine-target.h"
 #include "cpu.h"
 #include "internals.h"
 #include "fpu/softfloat-helpers.h"
@@ -748,29 +747,3 @@ static const TypeInfo loongarch_cpu_type_infos[] = {
 };
 
 DEFINE_TYPES(loongarch_cpu_type_infos)
-
-static void loongarch_cpu_add_definition(gpointer data, gpointer user_data)
-{
-ObjectClass *oc = data;
-CpuDefinitionInfoList **cpu_list = user_data;
-CpuDefinitionInfo *info = g_new0(CpuDefinitionInfo, 1);
-const char *typename = object_class_get_name(oc);
-
-info->name = g_strndup(typename,
-   strlen(typename) - strlen("-" TYPE_LOONGARCH_CPU));
-info->q_typename = g_strdup(typename);
-
-QAPI_LIST_PREPEND(*cpu_list, info);
-}
-
-CpuDefinitionInfoList *qmp_query_cpu_definitions(Error **errp)
-{
-CpuDefinitionInfoList *cpu_list = NULL;
-GSList *list;
-
-list = object_class_get_list(TYPE_LOONGARCH_CPU, false);
-g_slist_foreach(list, loongarch_cpu_add_definition, &cpu_list);
-g_slist_free(list);
-
-return cpu_list;
-}
diff --git a/target/loongarch/loongarch-qmp-cmds.c b/target/loongarch/loongarch-qmp-cmds.c
new file mode 100644
index 0000000000..6c25957881
--- /dev/null
+++ b/target/loongarch/loongarch-qmp-cmds.c
@@ -0,0 +1,37 @@
+/*
+ * QEMU LoongArch CPU (monitor definitions)
+ *
+ * SPDX-FileCopyrightText: 2021 Loongson Technology Corporation Limited
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/qapi-commands-machine-target.h"
+#include "cpu.h"
+
+static void loongarch_cpu_add_definition(gpointer data, gpointer user_data)
+{
+ObjectClass *oc = data;
+CpuDefinitionInfoList **cpu_list = user_data;
+CpuDefinitionInfo *info = g_new0(CpuDefinitionInfo, 1);
+const char *typename = object_class_get_name(oc);
+
+info->name = g_strndup(typename,
+   strlen(typename) - strlen("-" TYPE_LOONGARCH_CPU));
+info->q_typename = g_strdup(typename);
+
+QAPI_LIST_PREPEND(*cpu_list, info);
+}
+
+CpuDefinitionInfoList *qmp_query_cpu_definitions(Error **errp)
+{
+CpuDefinitionInfoList *cpu_list = NULL;
+GSList *list;
+
+list = object_class_get_list(TYPE_LOONGARCH_CPU, false);
+g_slist_foreach(list, loongarch_cpu_add_definition, &cpu_list);
+g_slist_free(list);
+
+return cpu_list;
+}
diff --git a/target/loongarch/meson.build b/target/loongarch/meson.build
index 690633969f..9293a8ab78 100644
--- a/target/loongarch/meson.build
+++ b/target/loongarch/meson.build
@@ -16,6 +16,7 @@ loongarch_tcg_ss.add(zlib)
 
 loongarch_softmmu_ss = ss.source_set()
 loongarch_softmmu_ss.add(files(
+  'loongarch-qmp-cmds.c',
   'machine.c',
   'tlb_helper.c',
   'constant_timer.c',
-- 
2.39.0




Re: [PATCH] virtio: fix reachable assertion due to stale value of cached region size

2023-03-01 Thread Jason Wang
On Thu, Feb 16, 2023 at 6:23 AM Carlos López  wrote:
>
> In virtqueue_{split,packed}_get_avail_bytes() descriptors are read
> in a loop via MemoryRegionCache regions and calls to
> vring_{split,packed}_desc_read() - these take a region cache and the
> index of the descriptor to be read.
>
> For direct descriptors we use a cache provided by the caller, whose
> size matches that of the virtqueue vring. We limit the number of
> descriptors we can read by the size of that vring:
>
> max = vq->vring.num;
> ...
> MemoryRegionCache *desc_cache = &caches->desc;
>
> For indirect descriptors, we initialize a new cache and limit the
> number of descriptors by the size of the intermediate descriptor:
>
> len = address_space_cache_init(&indirect_desc_cache,
>vdev->dma_as,
>desc.addr, desc.len, false);

So desc.addr and desc.len are under the control of the driver. A
malicious driver can choose to do a trick there. Should we sanitize
them here?

Thanks

> desc_cache = &indirect_desc_cache;
> ...
> max = desc.len / sizeof(VRingDesc);
>
> However, the first initialization of `max` is done outside the loop
> where we process guest descriptors, while the second one is done
> inside. This means that a sequence of an indirect descriptor followed
> by a direct one will leave a stale value in `max`. If the second
> descriptor's `next` field is smaller than the stale value, but
> greater than the size of the virtqueue ring (and thus the cached
> region), a failed assertion will be triggered in
> address_space_read_cached() down the call chain.
>
> Fix this by initializing `max` inside the loop in both functions.
>
> Fixes: 9796d0ac8fb0 ("virtio: use address_space_map/unmap to access descriptors")
> Signed-off-by: Carlos López 
> ---
>  hw/virtio/virtio.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index f35178f5fc..db70c4976e 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1071,6 +1071,7 @@ static void virtqueue_split_get_avail_bytes(VirtQueue *vq,
>  VirtIODevice *vdev = vq->vdev;
>  unsigned int max, idx;
>  unsigned int total_bufs, in_total, out_total;
> +MemoryRegionCache *desc_cache;
>  MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
>  int64_t len = 0;
>  int rc;
> @@ -1078,15 +1079,13 @@ static void virtqueue_split_get_avail_bytes(VirtQueue *vq,
>  idx = vq->last_avail_idx;
>  total_bufs = in_total = out_total = 0;
>
> -max = vq->vring.num;
> -
>  while ((rc = virtqueue_num_heads(vq, idx)) > 0) {
> -MemoryRegionCache *desc_cache = >desc;
> -unsigned int num_bufs;
> +unsigned int num_bufs = total_bufs;
>  VRingDesc desc;
>  unsigned int i;
>
> -num_bufs = total_bufs;
> +desc_cache = >desc;
> +max = vq->vring.num;
>
>  if (!virtqueue_get_head(vq, idx++, )) {
>  goto err;
> @@ -1218,14 +1217,14 @@ static void virtqueue_packed_get_avail_bytes(VirtQueue *vq,
>  wrap_counter = vq->last_avail_wrap_counter;
>  total_bufs = in_total = out_total = 0;
>
> -max = vq->vring.num;
> -
>  for (;;) {
>  unsigned int num_bufs = total_bufs;
>  unsigned int i = idx;
>  int rc;
>
>  desc_cache = >desc;
> +max = vq->vring.num;
> +
>  vring_packed_desc_read(vdev, , desc_cache, idx, true);
>  if (!is_desc_avail(desc.flags, wrap_counter)) {
>  break;
> --
> 2.35.3
>
>
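
For readers following the stale-`max` argument in the commit message, here is a
minimal sketch of the loop structure, assuming a 16-byte VRingDesc and a toy
ring of 4 entries. It is a simplified model in Python, not the QEMU code; the
names bounds_per_chain, RING_NUM and VRING_DESC_SIZE are made up for
illustration.

```python
# Simplified model of the descriptor-walk bound, not QEMU code.
VRING_DESC_SIZE = 16   # assumed sizeof(VRingDesc)
RING_NUM = 4           # stands in for vq->vring.num


def bounds_per_chain(chains, reset_each_iteration):
    """Return the `max` bound in effect while each chain is walked.

    Each entry of `chains` is either None (a direct descriptor chain) or
    the byte length of an indirect descriptor table.
    """
    max_desc = RING_NUM            # pre-patch: set once, before the loop
    seen = []
    for indirect_len in chains:
        if reset_each_iteration:   # post-patch: reset for every head
            max_desc = RING_NUM
        if indirect_len is not None:
            max_desc = indirect_len // VRING_DESC_SIZE
        seen.append(max_desc)
    return seen


# An indirect table with 64 entries, followed by a direct chain.
chains = [64 * VRING_DESC_SIZE, None]
print(bounds_per_chain(chains, reset_each_iteration=False))
print(bounds_per_chain(chains, reset_each_iteration=True))
```

In the first call the direct chain is still bounded by 64, the value left
over from the indirect table; in the second it is bounded by the ring size
again, which is what the patch is after.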




Re: [PATCH v3 07/10] qapi: implement conditional command arguments

2023-03-01 Thread Markus Armbruster
Marc-André Lureau  writes:

> Hi
>
> On Wed, Mar 1, 2023 at 5:16 PM Markus Armbruster  wrote:
>> What about 3. have an additional command conditional on CONFIG_WIN32?
>> Existing getfd stays the same: always fails when QEMU runs on a Windows
>> host.  The new command exists only when QEMU runs on a Windows host.

We could additionally deprecate getfd for Windows.

> This is what was suggested initially:
> https://patchew.org/QEMU/20230103110814.3726795-1-marcandre.lur...@redhat.com/20230103110814.3726795-9-marcandre.lur...@redhat.com/
>
> I also like it better: as a specific command for Windows sockets, there are
> fewer ways to use it wrongly.

Daniel, what do you think?




Re: [PATCH v3 7/7] hw/cxl/events: Add injection of Memory Module Events

2023-03-01 Thread Ira Weiny
Jonathan Cameron wrote:
> These events include a copy of the device health information at the
> time of the event. Actually using the emulated device health would
> require a lot of controls to manipulate that state.  Given the aim
> of this injection code is to just test the flows when events occur,
> inject the contents of the device health state as well.
> 
> Future work may add more sophisticated device health emulation
> including direct generation of these records when events occur
> (such as a temperature threshold being crossed).  That does not
> reduce the usefulness of this more basic generation of the events.

Seems very reasonable to me.

One spelling issue below.  With that fixed:

Reviewed-by: Ira Weiny 

> 
> Signed-off-by: Jonathan Cameron 
> ---
>  hw/mem/cxl_type3.c  | 61 +
>  hw/mem/cxl_type3_stubs.c| 12 
>  include/hw/cxl/cxl_events.h | 19 
>  qapi/cxl.json   | 35 +
>  4 files changed, 127 insertions(+)
> 

[...]

> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 32f340d972..8b3d30cd71 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
> @@ -90,6 +90,41 @@
>  '*column': 'uint16', '*correction-mask': [ 'uint64' ]
> }}
>  
> +##
> +# @cxl-inject-memory-module-event:
> +#
> +# Inject an event record for a Memory Module Event (CXL r3.0 8.2.9.2.1.3)
> +# This event includes a copy of the Device Health info at the time of
> +# the event.
> +#
> +# @path: CXL type 3 device canonical QOM path
> +# @log: Event Log to add the event to
> +# @flags: header flags
> +# @type: Device Event Type (see spec for permitted values)
> +# @health-status: Overall health summary bitmap (see spec for permitted bits)
> +# @media-status: Overall media health summary (see spec for permitted values)
> +# @additional-status: Complex field (see spec for meaning)
> +# @life-used: Percentage (0-100) of factory expected life span
> +# @temperature: Device temperature in degrees Celsius
> +# @dirty-shutdown-count: Counter incremented whenever device is unable
> +#to determine if data loss may have occured.

  ^
  occurred

Ira



duplicate acpi files

2023-03-01 Thread Michael S. Tsirkin
Hi guys,
I got annoyed that whenever I run ./tests/data/acpi/rebuild-expected-aml.sh
then my tree gets polluted with duplicate acpi files.
If I forget to blow them away then later they become stale
and bios table test fails.

So I wrote a script to find these, hoping to teach rebuild-expected-aml.sh
not to generate them:

find tests/data/acpi/ -type f -exec sha256sum '{}' ';'|sort -d|uniq -w 64 --all-repeated=separate
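
The same duplicate search can be sketched in Python, which may be easier to
extend later (for example to delete or report duplicates). This is only an
illustration: find_duplicates is a hypothetical helper, not something that
exists in the tree, and it reads each file fully into memory, which should
be fine for small AML blobs.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root):
    """Group files under `root` by SHA-256 digest; return groups of size > 1."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            by_digest[digest].append(path)
    # Sort paths within each group and the groups themselves for stable output.
    return sorted(sorted(paths) for paths in by_digest.values() if len(paths) > 1)


if __name__ == "__main__":
    # Prints one blank-line-separated group per set of identical files.
    for group in find_duplicates("tests/data/acpi"):
        print("\n".join(group), end="\n\n")
```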

Turns out we have lots of duplicates already!
These generally increase churn and make review and maintenance of aml
harder - I will remove the trivial ones, but there are slightly harder issues:
- unifying pc and q35 - I guess we can teach bios table test to look
  for expected files one directory up and put pc and q35 in a shared
  directory.
- teaching rebuild-expected-aml.sh to remove duplicates.
- we really should first generate in some temp directory,
  then have a separate script to move files over, this way we can also
  do useful things like tell user what changed - or even pre-generate
  a good git commit message.

I have been using the following script but it expects files to
already be in git, not ideal:

SCM=`pwd`
temp=$(mktemp -d)
status=$?
[ -n "${temp}" ] || exit $status
cd ${temp}
rm -fr old new
git clone ${SCM} old
git clone ${SCM} new
cd ${temp}/old
git checkout ${1}
./tests/data/acpi/disassemle-aml.sh -o ${temp}/old/asl
cd ${temp}/new
git checkout ${2}
./tests/data/acpi/disassemle-aml.sh -o ${temp}/new/asl
cd ${temp}
# skip irrelevant header fields
# prefix diff output so it can be safely included in git log
diff -ru -N -IDisassembly -IChecksum -I'* Length   ' old/asl new/asl | sed -e 's/^---\|+++\|@@\|diff/:&/'
rm -fr ${temp}
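
The prefixing step at the end of the script could also be written as the
sketch below. As far as I can tell, the `^` in the sed expression only
anchors the first alternative; this version anchors all four markers at
the start of the line, which is an assumption about the intent and not a
transcription of the sed command. quote_diff is a made-up name.

```python
import re

# Lines beginning with one of these markers confuse tools that parse
# patches out of commit messages, so they get a ':' prefix.
_DIFF_MARKER = re.compile(r"^(---|\+\+\+|@@|diff)")


def quote_diff(text):
    """Prefix every line that starts with a diff marker with ':'."""
    return "\n".join(_DIFF_MARKER.sub(r":\1", line) for line in text.splitlines())


sample = "--- old/asl/DSDT\n+++ new/asl/DSDT\n@@ -1 +1 @@\n-x\n+y"
print(quote_diff(sample))
```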


Would one of you like to try improving on these issues?

-- 
MST




Re: [PATCH v3 6/7] hw/cxl/events: Add injection of DRAM events

2023-03-01 Thread Ira Weiny
Jonathan Cameron wrote:
> Defined in CXL r3.0 8.2.9.2.1.2 DRAM Event Record, this event
> provides information related to DRAM devices.
> 
> Example injection command in QMP:
> 
> { "execute": "cxl-inject-dram-event",
> "arguments": {
> "path": "/machine/peripheral/cxl-mem0",
> "log": "informational",
> "flags": 1,
> "physaddr": 1000,
> "descriptor": 3,
> "type": 3,
> "transaction-type": 192,
> "channel": 3,
> "rank": 17,
> "nibble-mask": 37421234,
> "bank-group": 7,
> "bank": 11,
> "row": 2,
> "column": 77,
> "correction-mask": [33, 44, 55,66]
> }}
> 
> Signed-off-by: Jonathan Cameron 
> ---
>  hw/mem/cxl_type3.c  | 115 
>  hw/mem/cxl_type3_stubs.c|  13 
>  include/hw/cxl/cxl_events.h |  23 
>  qapi/cxl.json   |  35 +++
>  4 files changed, 186 insertions(+)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 5d55943df2..cff5341b7b 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -1167,6 +1167,11 @@ static const QemuUUID gen_media_uuid = {
>   0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
>  };
>  
> +static const QemuUUID dram_uuid = {
> +.data = UUID(0x601dcbb3, 0x9c06, 0x4eab, 0xb8, 0xaf,
> + 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
> +};
> +
>  #define CXL_GMER_VALID_CHANNEL  BIT(0)
>  #define CXL_GMER_VALID_RANK BIT(1)
>  #define CXL_GMER_VALID_DEVICE   BIT(2)
> @@ -1262,6 +1267,116 @@ void qmp_cxl_inject_gen_media_event(const char *path, 
> CxlEventLog log,
>  }
>  }
>  
> +#define CXL_DRAM_VALID_CHANNEL  BIT(0)
> +#define CXL_DRAM_VALID_RANK BIT(1)
> +#define CXL_DRAM_VALID_NIBBLE_MASK  BIT(2)
> +#define CXL_DRAM_VALID_BANK_GROUP   BIT(3)
> +#define CXL_DRAM_VALID_BANK BIT(4)
> +#define CXL_DRAM_VALID_ROW  BIT(5)
> +#define CXL_DRAM_VALID_COLUMN   BIT(6)
> +#define CXL_DRAM_VALID_CORRECTION_MASK  BIT(7)
> +
> +void qmp_cxl_inject_dram_event(const char *path, CxlEventLog log, uint8_t 
> flags,
> +   uint64_t physaddr, uint8_t descriptor,
> +   uint8_t type, uint8_t transaction_type,
> +   bool has_channel, uint8_t channel,
> +   bool has_rank, uint8_t rank,
> +   bool has_nibble_mask, uint32_t nibble_mask,
> +   bool has_bank_group, uint8_t bank_group,
> +   bool has_bank, uint8_t bank,
> +   bool has_row, uint32_t row,
> +   bool has_column, uint16_t column,
> +   bool has_correction_mask, uint64List 
> *correction_mask,
> +   Error **errp)
> +{
> +Object *obj = object_resolve_path(path, NULL);
> +CXLEventDram dram;
> +CXLEventRecordHdr *hdr = 
> +CXLDeviceState *cxlds;
> +CXLType3Dev *ct3d;
> +uint16_t valid_flags = 0;
> +uint8_t enc_log;
> +int rc;
> +
> +if (!obj) {
> +error_setg(errp, "Unable to resolve path");
> +return;
> +}
> +if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
> +error_setg(errp, "Path does not point to a CXL type 3 device");
> +return;
> +}
> +ct3d = CXL_TYPE3(obj);
> +cxlds = >cxl_dstate;
> +
> +rc = ct3d_qmp_cxl_event_log_enc(log);
> +if (rc < 0) {
> +error_setg(errp, "Unhandled error log type");
> +return;
> +}
> +enc_log = rc;
> +
> +memset(, 0, sizeof(dram));
> +cxl_assign_event_header(hdr, _uuid, flags, sizeof(dram));
> +dram.phys_addr = physaddr;

I know I did not do this either, but now that the devices can be volatile
memory, should we try to set the Volatile bit based on the address
provided?

Or should we just allow the bits to be set by the user for testing?  I
think this is what I originally intended, but given the new functionality it
may be best to make this more 'real'.

Either way:

Reviewed-by: Ira Weiny 

> +dram.descriptor = descriptor;
> +dram.type = type;
> +dram.transaction_type = transaction_type;
> +
> +if (has_channel) {
> +dram.channel = channel;
> +valid_flags |= CXL_DRAM_VALID_CHANNEL;
> +}
> +
> +if (has_rank) {
> +dram.rank = rank;
> +valid_flags |= CXL_DRAM_VALID_RANK;
> +}
> +
> +if (has_nibble_mask) {
> +st24_le_p(dram.nibble_mask, nibble_mask);
> +valid_flags |= CXL_DRAM_VALID_NIBBLE_MASK;
> +}
> +
> +if (has_bank_group) {
> +dram.bank_group = bank_group;
> +valid_flags |= 
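
The has_*/valid_flags pattern in the quoted function can be summarised with
a small sketch: every optional QMP argument that is present sets one bit.
The bit positions are taken from the CXL_DRAM_VALID_* defines in the patch;
the dictionary and the dram_valid_flags helper are illustrative only and do
not exist in QEMU.

```python
# Bit positions copied from the CXL_DRAM_VALID_* defines in the patch.
VALID_BITS = {
    "channel": 1 << 0,
    "rank": 1 << 1,
    "nibble-mask": 1 << 2,
    "bank-group": 1 << 3,
    "bank": 1 << 4,
    "row": 1 << 5,
    "column": 1 << 6,
    "correction-mask": 1 << 7,
}


def dram_valid_flags(arguments):
    """OR together the validity bit of every optional argument present."""
    flags = 0
    for name, bit in VALID_BITS.items():
        if name in arguments:
            flags |= bit
    return flags


# Only two of the optional fields supplied: channel and rank.
print(hex(dram_valid_flags({"physaddr": 1000, "channel": 3, "rank": 17})))
```

With all eight optional arguments supplied, as in the example QMP command
from the commit message, every bit is set.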

RE: [PATCH v5 05/14] Hexagon (target/hexagon) Analyze packet before generating TCG

2023-03-01 Thread Taylor Simpson


> -Original Message-
> From: Anton Johansson 
> Sent: Thursday, February 23, 2023 10:02 AM
> To: Taylor Simpson ; qemu-devel@nongnu.org
> Cc: richard.hender...@linaro.org; phi...@linaro.org; a...@rev.ng; Brian Cain
> ; Matheus Bernardino (QUIC)
> 
> Subject: Re: [PATCH v5 05/14] Hexagon (target/hexagon) Analyze packet
> before generating TCG
> 
> On 1/31/23 23:56, Taylor Simpson wrote:
> > diff --git a/target/hexagon/gen_analyze_funcs.py
> > b/target/hexagon/gen_analyze_funcs.py
> > new file mode 100755
> > index 00..c45696bec8
> > --- /dev/null
> > +++ b/target/hexagon/gen_analyze_funcs.py
> > @@ -0,0 +1,237 @@
> > +#!/usr/bin/env python3
> > +
> > +##
> > +##  Copyright(c) 2022-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
> > +##
> > +##  This program is free software; you can redistribute it and/or modify
> > +##  it under the terms of the GNU General Public License as published by
> > +##  the Free Software Foundation; either version 2 of the License, or
> > +##  (at your option) any later version.
> > +##
> > +##  This program is distributed in the hope that it will be useful,
> > +##  but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +##  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +##  GNU General Public License for more details.
> > +##
> > +##  You should have received a copy of the GNU General Public License
> > +##  along with this program; if not, see .
> > +##
> > +
> > +import sys
> > +import re
> > +import string
> > +import hex_common
> > +
> > +##
> > +## Helpers for gen_analyze_func
> > +##
> > +def is_predicated(tag):
> > +return 'A_CONDEXEC' in hex_common.attribdict[tag]
> > +
> > +def analyze_opn_old(f, tag, regtype, regid, regno):
> > +regN = "%s%sN" % (regtype, regid)
> > +predicated = "true" if is_predicated(tag) else "false"
> > +if (regtype == "R"):
> > +if (regid in {"ss", "tt"}):
> > +f.write("//const int %s = insn->regno[%d];\n" % \
> > +(regN, regno))
> > +elif (regid in {"dd", "ee", "xx", "yy"}):
> > +f.write("const int %s = insn->regno[%d];\n" % (regN, 
> > regno))
> > +f.write("ctx_log_reg_write_pair(ctx, %s, %s);\n" % \
> > +(regN, predicated))
> > +elif (regid in {"s", "t", "u", "v"}):
> > +f.write("//const int %s = insn->regno[%d];\n" % \
> > +(regN, regno))
> > +elif (regid in {"d", "e", "x", "y"}):
> > +f.write("const int %s = insn->regno[%d];\n" % (regN, 
> > regno))
> > +f.write("ctx_log_reg_write(ctx, %s, %s);\n" % \
> > +(regN, predicated))
> > +else:
> > +print("Bad register parse: ", regtype, regid)
> > +elif (regtype == "P"):
> > +if (regid in {"s", "t", "u", "v"}):
> > +f.write("//const int %s = insn->regno[%d];\n" % \
> > +(regN, regno))
> > +elif (regid in {"d", "e", "x"}):
> > +f.write("const int %s = insn->regno[%d];\n" % (regN, 
> > regno))
> > +f.write("ctx_log_pred_write(ctx, %s);\n" % (regN))
> > +else:
> > +print("Bad register parse: ", regtype, regid)
> > +elif (regtype == "C"):
> > +if (regid == "ss"):
> > +f.write("//const int %s = insn->regno[%d] + 
> > HEX_REG_SA0;\n" % \
> > +(regN, regno))
> > +elif (regid == "dd"):
> > +f.write("const int %s = insn->regno[%d] + HEX_REG_SA0;\n" 
> > % \
> > +(regN, regno))
> > +f.write("ctx_log_reg_write_pair(ctx, %s, %s);\n" % \
> > +(regN, predicated))
> > +elif (regid == "s"):
> > +f.write("//const int %s = insn->regno[%d] + 
> > HEX_REG_SA0;\n" % \
> > +(regN, regno))
> > +elif (regid == "d"):
> > +f.write("const int %s = insn->regno[%d] + HEX_REG_SA0;\n" 
> > % \
> > +(regN, regno))
> > +f.write("ctx_log_reg_write(ctx, %s, %s);\n" % \
> > +(regN, predicated))
> > +else:
> > +print("Bad register parse: ", regtype, regid)
> > +elif (regtype == "M"):
> > +if (regid == "u"):
> > +f.write("//const int %s = insn->regno[%d];\n"% \
> > +(regN, regno))
> > +else:
> > +print("Bad register parse: ", regtype, regid)
> > +elif (regtype == "V"):
> > +if (regid in {"dd", "xx"}):
> > +f.write("//const int %s = insn->regno[%d];\n" %\
> > +(regN, regno))
> > +elif (regid in {"uu", "vv"}):
> > +f.write("//const int %s = insn->regno[%d];\n" % \
> > +(regN, regno))
> > +elif (regid in {"s", "u", "v", "w"}):
> > +f.write("//const int %s = insn->regno[%d];\n" % \
> > +

RE: [PATCH v5 13/14] Hexagon (target/hexagon) Reduce manipulation of slot_cancelled

2023-03-01 Thread Taylor Simpson


> -Original Message-
> From: Anton Johansson 
> Sent: Friday, February 24, 2023 7:24 AM
> To: Taylor Simpson ; qemu-devel@nongnu.org
> Cc: richard.hender...@linaro.org; phi...@linaro.org; a...@rev.ng; Brian Cain
> ; Matheus Bernardino (QUIC)
> 
> Subject: Re: [PATCH v5 13/14] Hexagon (target/hexagon) Reduce
> manipulation of slot_cancelled
> 
> On 1/31/23 23:56, Taylor Simpson wrote:
> >   /* Called during packet commit when there are two scalar stores */
> > -void HELPER(probe_pkt_scalar_store_s0)(CPUHexagonState *env, int
> > mmu_idx)
> > +void HELPER(probe_pkt_scalar_store_s0)(CPUHexagonState *env, int
> > +args)
> >   {
> > -probe_store(env, 0, mmu_idx);
> > +int mmu_idx = args & 0x3;
> > +bool is_predicated = (args >> 2) & 1;
> > +probe_store(env, 0, mmu_idx, is_predicated);
> >   }
> Can we use bitmasks for the fields of args?

OK, but better to use "hw/registerfields.h".

Thanks,
Taylor
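
To make the layout of `args` discussed above concrete, here is a small
sketch of the packing the quoted helper decodes: mmu_idx in bits [1:0] and
is_predicated in bit 2. The constant and function names are invented for
this example; in C the same thing would be expressed with named fields,
which is what the registerfields.h suggestion amounts to.

```python
MMU_IDX_MASK = 0x3          # bits [1:0], as in `args & 0x3`
IS_PREDICATED_SHIFT = 2     # bit 2, as in `(args >> 2) & 1`


def pack_args(mmu_idx, is_predicated):
    """Combine both fields into the single integer passed to the helper."""
    assert 0 <= mmu_idx <= MMU_IDX_MASK
    return (int(bool(is_predicated)) << IS_PREDICATED_SHIFT) | mmu_idx


def unpack_args(args):
    """Inverse of pack_args(); mirrors the decoding in the quoted helper."""
    return args & MMU_IDX_MASK, bool((args >> IS_PREDICATED_SHIFT) & 1)


print(unpack_args(pack_args(1, True)))
```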



Re: [PATCH v2 03/18] target/riscv: Use g_assert() for the predicate() NULL check

2023-03-01 Thread LIU Zhiwei



On 2023/2/28 18:40, Bin Meng wrote:

At present riscv_csrrw_check() checks the CSR predicate() against
NULL and throws RISCV_EXCP_ILLEGAL_INST if it is NULL. But this is
a pure software check, and has nothing to do with the emulation of
the hardware behavior, thus it is inappropriate to return illegal
instruction exception when software forgets to install the hook.

Change to use g_assert() instead.

Signed-off-by: Bin Meng 
---

Changes in v2:
- new patch: Use assert() for the predicate() NULL check

  target/riscv/csr.c | 6 +-
  1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 4cc2c6370f..cfd7ffc5c2 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3786,11 +3786,6 @@ static inline RISCVException 
riscv_csrrw_check(CPURISCVState *env,
  return RISCV_EXCP_ILLEGAL_INST;
  }
  
-/* check predicate */

-if (!csr_ops[csrno].predicate) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-
  /* read / write check */
  if (write_mask && read_only) {
  return RISCV_EXCP_ILLEGAL_INST;
@@ -3803,6 +3798,7 @@ static inline RISCVException 
riscv_csrrw_check(CPURISCVState *env,
   * illegal instruction exception should be triggered instead of virtual
   * instruction exception. Hence this comes after the read / write check.
   */
+g_assert(csr_ops[csrno].predicate != NULL);


Reviewed-by: LIU Zhiwei 

Zhiwei


  RISCVException ret = csr_ops[csrno].predicate(env, csrno);
  if (ret != RISCV_EXCP_NONE) {
  return ret;




Re: [PATCH v2 02/18] target/riscv: Add some comments to clarify the priority policy of riscv_csrrw_check()

2023-03-01 Thread LIU Zhiwei



On 2023/2/28 18:40, Bin Meng wrote:

The priority policy of riscv_csrrw_check() was once adjusted in
commit eacaf4401956 ("target/riscv: Fix priority of csr related check in riscv_csrrw_check")
whose commit message says the CSR existence check should come before
the access control check, but the code changes did not agree with
the commit message: the predicate() check actually came after
the read / write check.

In fact this was intentional. Add some comments there so that people
won't bother trying to change it without a solid reason.

Signed-off-by: Bin Meng 
---

Changes in v2:
- Keep the original priority policy, instead add some comments for clarification

  target/riscv/csr.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 75a540bfcb..4cc2c6370f 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3776,11 +3776,12 @@ static inline RISCVException 
riscv_csrrw_check(CPURISCVState *env,
  int read_only = get_field(csrno, 0xC00) == 3;
  int csr_min_priv = csr_ops[csrno].min_priv_ver;
  
-/* ensure the CSR extension is enabled. */

+/* ensure the CSR extension is enabled */
  if (!cpu->cfg.ext_icsr) {
  return RISCV_EXCP_ILLEGAL_INST;
  }
  
+/* privileged spec version check */

  if (env->priv_ver < csr_min_priv) {
  return RISCV_EXCP_ILLEGAL_INST;
  }
@@ -3790,10 +3791,18 @@ static inline RISCVException 
riscv_csrrw_check(CPURISCVState *env,
  return RISCV_EXCP_ILLEGAL_INST;
  }
  
+/* read / write check */

  if (write_mask && read_only) {
  return RISCV_EXCP_ILLEGAL_INST;
  }
  
+/*

+ * The predicate() not only does existence check but also does some
+ * access control check which triggers for example virtual instruction
+ * exception in some cases. When writing read-only CSRs in those cases
+ * illegal instruction exception should be triggered instead of virtual
+ * instruction exception. Hence this comes after the read / write check.
+ */


Reviewed-by: LIU Zhiwei 

Zhiwei


  RISCVException ret = csr_ops[csrno].predicate(env, csrno);
  if (ret != RISCV_EXCP_NONE) {
  return ret;
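
The ordering that the new comment documents can be illustrated with a toy
model of the last two checks. This is a sketch under heavy simplification:
exceptions are plain strings, the predicate is any callable, and
csrrw_check is a made-up stand-in for the tail of riscv_csrrw_check().

```python
ILLEGAL_INST = "illegal instruction"
VIRT_INST_FAULT = "virtual instruction fault"


def csrrw_check(is_write, read_only, predicate):
    """Toy model of the final two checks and their relative order."""
    # The read / write check comes first ...
    if is_write and read_only:
        return ILLEGAL_INST
    # ... so the predicate's access-control result is only seen afterwards.
    return predicate()


def faulting_predicate():
    # Stands in for a predicate() that would report a virtual
    # instruction exception for this access.
    return VIRT_INST_FAULT


# Writing a read-only CSR reports an illegal instruction even though the
# predicate alone would have reported a virtual instruction fault.
print(csrrw_check(True, True, faulting_predicate))
print(csrrw_check(False, True, faulting_predicate))
```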




Re: [PATCH v2 18/18] target/riscv: Group all predicate() routines together

2023-03-01 Thread LIU Zhiwei



On 2023/2/28 21:45, Bin Meng wrote:

From: Bin Meng 

Move sstc()/sstc32() to where all predicate() routines live, and
smstateen_acc_ok() to near {read,write}_xenvcfg().

Signed-off-by: Bin Meng 
Reviewed-by: Weiwei Li 
---

Changes in v2:
- move smstateen_acc_ok() to near {read,write}_xenvcfg()

  target/riscv/csr.c | 177 ++---
  1 file changed, 87 insertions(+), 90 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 785f6f4d45..3a7e0217e2 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -40,42 +40,6 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops)
  csr_ops[csrno & (CSR_TABLE_SIZE - 1)] = *ops;
  }
  
-/* Predicates */


Don't remove this comment. Otherwise,

Reviewed-by: LIU Zhiwei 

Zhiwei


-#if !defined(CONFIG_USER_ONLY)
-static RISCVException smstateen_acc_ok(CPURISCVState *env, int index,
-   uint64_t bit)
-{
-bool virt = riscv_cpu_virt_enabled(env);
-RISCVCPU *cpu = env_archcpu(env);
-
-if (env->priv == PRV_M || !cpu->cfg.ext_smstateen) {
-return RISCV_EXCP_NONE;
-}
-
-if (!(env->mstateen[index] & bit)) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-
-if (virt) {
-if (!(env->hstateen[index] & bit)) {
-return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
-}
-
-if (env->priv == PRV_U && !(env->sstateen[index] & bit)) {
-return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
-}
-}
-
-if (env->priv == PRV_U && riscv_has_ext(env, RVS)) {
-if (!(env->sstateen[index] & bit)) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-}
-
-return RISCV_EXCP_NONE;
-}
-#endif
-
  static RISCVException fs(CPURISCVState *env, int csrno)
  {
  #if !defined(CONFIG_USER_ONLY)
@@ -399,6 +363,60 @@ static RISCVException sstateen(CPURISCVState *env, int 
csrno)
  return RISCV_EXCP_NONE;
  }
  
+static RISCVException sstc(CPURISCVState *env, int csrno)

+{
+RISCVCPU *cpu = env_archcpu(env);
+bool hmode_check = false;
+
+if (!cpu->cfg.ext_sstc || !env->rdtime_fn) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+if ((csrno == CSR_VSTIMECMP) || (csrno == CSR_VSTIMECMPH)) {
+hmode_check = true;
+}
+
+RISCVException ret = hmode_check ? hmode(env, csrno) : smode(env, csrno);
+if (ret != RISCV_EXCP_NONE) {
+return ret;
+}
+
+if (env->debugger) {
+return RISCV_EXCP_NONE;
+}
+
+if (env->priv == PRV_M) {
+return RISCV_EXCP_NONE;
+}
+
+/*
+ * No need of separate function for rv32 as menvcfg stores both menvcfg
+ * menvcfgh for RV32.
+ */
+if (!(get_field(env->mcounteren, COUNTEREN_TM) &&
+  get_field(env->menvcfg, MENVCFG_STCE))) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+if (riscv_cpu_virt_enabled(env)) {
+if (!(get_field(env->hcounteren, COUNTEREN_TM) &&
+  get_field(env->henvcfg, HENVCFG_STCE))) {
+return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
+}
+}
+
+return RISCV_EXCP_NONE;
+}
+
+static RISCVException sstc_32(CPURISCVState *env, int csrno)
+{
+if (riscv_cpu_mxl(env) != MXL_RV32) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+return sstc(env, csrno);
+}
+
  /* Checks if PointerMasking registers could be accessed */
  static RISCVException pointer_masking(CPURISCVState *env, int csrno)
  {
@@ -943,60 +961,6 @@ static RISCVException read_timeh(CPURISCVState *env, int 
csrno,
  return RISCV_EXCP_NONE;
  }
  
-static RISCVException sstc(CPURISCVState *env, int csrno)

-{
-RISCVCPU *cpu = env_archcpu(env);
-bool hmode_check = false;
-
-if (!cpu->cfg.ext_sstc || !env->rdtime_fn) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-
-if ((csrno == CSR_VSTIMECMP) || (csrno == CSR_VSTIMECMPH)) {
-hmode_check = true;
-}
-
-RISCVException ret = hmode_check ? hmode(env, csrno) : smode(env, csrno);
-if (ret != RISCV_EXCP_NONE) {
-return ret;
-}
-
-if (env->debugger) {
-return RISCV_EXCP_NONE;
-}
-
-if (env->priv == PRV_M) {
-return RISCV_EXCP_NONE;
-}
-
-/*
- * No need of separate function for rv32 as menvcfg stores both menvcfg
- * menvcfgh for RV32.
- */
-if (!(get_field(env->mcounteren, COUNTEREN_TM) &&
-  get_field(env->menvcfg, MENVCFG_STCE))) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-
-if (riscv_cpu_virt_enabled(env)) {
-if (!(get_field(env->hcounteren, COUNTEREN_TM) &&
-  get_field(env->henvcfg, HENVCFG_STCE))) {
-return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
-}
-}
-
-return RISCV_EXCP_NONE;
-}
-
-static RISCVException sstc_32(CPURISCVState *env, int csrno)
-{
-if (riscv_cpu_mxl(env) != MXL_RV32) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-
-return sstc(env, csrno);
-}
-
  static RISCVException read_vstimecmp(CPURISCVState *env, int csrno,
 

Re: [PATCH v2 17/18] target/riscv: Drop priv level check in mseccfg predicate()

2023-03-01 Thread LIU Zhiwei



On 2023/2/28 21:45, Bin Meng wrote:

From: Bin Meng 

riscv_csrrw_check() already does the generic privilege level check
hence there is no need to do the specific M-mode access check in
the mseccfg predicate().

With this change debugger can access the mseccfg CSR anytime.

Signed-off-by: Bin Meng 
Reviewed-by: Weiwei Li 
---

(no changes since v1)

  target/riscv/csr.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 020c3f524f..785f6f4d45 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -451,7 +451,7 @@ static RISCVException pmp(CPURISCVState *env, int csrno)
  
  static RISCVException epmp(CPURISCVState *env, int csrno)

  {
-if (env->priv == PRV_M && riscv_cpu_cfg(env)->epmp) {
+if (riscv_cpu_cfg(env)->epmp) {


Reviewed-by: LIU Zhiwei 

Zhiwei


  return RISCV_EXCP_NONE;
  }
  




Re: [PATCH v2 16/18] target/riscv: Allow debugger to access sstc CSRs

2023-03-01 Thread LIU Zhiwei



On 2023/2/28 21:45, Bin Meng wrote:

From: Bin Meng 

At present with a debugger attached sstc CSRs can only be accessed
when CPU is in M-mode, or configured correctly.

Fix it by adjusting their predicate() routine logic so that the
static config check comes before the run-time check, as well as
adding a debugger check.

Signed-off-by: Bin Meng 
Reviewed-by: Weiwei Li 
---

(no changes since v1)

  target/riscv/csr.c | 19 ++-
  1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index a0e70f5ba0..020c3f524f 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -952,6 +952,19 @@ static RISCVException sstc(CPURISCVState *env, int csrno)
  return RISCV_EXCP_ILLEGAL_INST;
  }
  
+if ((csrno == CSR_VSTIMECMP) || (csrno == CSR_VSTIMECMPH)) {

+hmode_check = true;
+}
+
+RISCVException ret = hmode_check ? hmode(env, csrno) : smode(env, csrno);
+if (ret != RISCV_EXCP_NONE) {
+return ret;
+}
+
+if (env->debugger) {
+return RISCV_EXCP_NONE;
+}
+
  if (env->priv == PRV_M) {
  return RISCV_EXCP_NONE;
  }
@@ -972,11 +985,7 @@ static RISCVException sstc(CPURISCVState *env, int csrno)
  }
  }
  
-if ((csrno == CSR_VSTIMECMP) || (csrno == CSR_VSTIMECMPH)) {

-hmode_check = true;
-}
-
-return hmode_check ? hmode(env, csrno) : smode(env, csrno);
+return RISCV_EXCP_NONE;


Reviewed-by: LIU Zhiwei 

Zhiwei


  }
  
  static RISCVException sstc_32(CPURISCVState *env, int csrno)
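
The reordering described in the commit message (static config check first,
then the mode check, then the debugger bypass, then the run-time checks)
can be sketched as follows. This is a deliberately reduced model: the Env
fields are invented for the example, hmode()/smode() are collapsed into a
single boolean, and the rdtime_fn and virtualization checks are left out.

```python
from dataclasses import dataclass

ILLEGAL_INST = "illegal instruction"
NONE = "none"


@dataclass
class Env:
    ext_sstc: bool = True        # static configuration
    mode_ok: bool = True         # result of hmode()/smode(), reduced to a bool
    debugger: bool = False       # access comes from the debugger
    priv_is_m: bool = False      # CPU currently in M-mode
    stce_enabled: bool = False   # stands in for the mcounteren/menvcfg bits


def sstc_pred(env):
    """Toy model of the check order after the patch."""
    if not env.ext_sstc:         # 1. static config check
        return ILLEGAL_INST
    if not env.mode_ok:          # 2. hmode()/smode() check
        return ILLEGAL_INST
    if env.debugger:             # 3. debugger skips the run-time checks
        return NONE
    if env.priv_is_m:            # 4. run-time checks
        return NONE
    return NONE if env.stce_enabled else ILLEGAL_INST


print(sstc_pred(Env(debugger=True)))
print(sstc_pred(Env(debugger=False)))
```

Note that the debugger bypass sits after the static checks, so a CSR that
is not configured at all stays inaccessible to the debugger as well.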




Re: [PATCH v2 15/18] target/riscv: Allow debugger to access {h,s}stateen CSRs

2023-03-01 Thread LIU Zhiwei



On 2023/2/28 21:45, Bin Meng wrote:

From: Bin Meng 

At present {h,s}stateen CSRs are not reported in the CSR XML
hence gdb cannot access them.

Fix it by adjusting their predicate() routine logic so that the
static config check comes before the run-time check, as well as
adding a debugger check.

Signed-off-by: Bin Meng 
Reviewed-by: Weiwei Li 
---

(no changes since v1)

  target/riscv/csr.c | 22 --
  1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 15b23b9b5a..a0e70f5ba0 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -337,13 +337,22 @@ static RISCVException hstateen_pred(CPURISCVState *env, 
int csrno, int base)
  return RISCV_EXCP_ILLEGAL_INST;
  }
  
+RISCVException ret = hmode(env, csrno);

+if (ret != RISCV_EXCP_NONE) {
+return ret;
+}
+
+if (env->debugger) {
+return RISCV_EXCP_NONE;
+}
+
  if (env->priv < PRV_M) {
  if (!(env->mstateen[csrno - base] & SMSTATEEN_STATEEN)) {
  return RISCV_EXCP_ILLEGAL_INST;
  }
  }
  
-return hmode(env, csrno);

+return RISCV_EXCP_NONE;
  }
  
  static RISCVException hstateen(CPURISCVState *env, int csrno)

@@ -366,6 +375,15 @@ static RISCVException sstateen(CPURISCVState *env, int 
csrno)
  return RISCV_EXCP_ILLEGAL_INST;
  }
  
+RISCVException ret = smode(env, csrno);

+if (ret != RISCV_EXCP_NONE) {
+return ret;
+}
+
+if (env->debugger) {
+return RISCV_EXCP_NONE;
+}
+
  if (env->priv < PRV_M) {
  if (!(env->mstateen[index] & SMSTATEEN_STATEEN)) {
  return RISCV_EXCP_ILLEGAL_INST;
@@ -378,7 +396,7 @@ static RISCVException sstateen(CPURISCVState *env, int 
csrno)
  }
  }
  
-return smode(env, csrno);

+return RISCV_EXCP_NONE;


Reviewed-by: LIU Zhiwei 

Zhiwei


  }
  
  /* Checks if PointerMasking registers could be accessed */




Re: [PATCH v2 05/18] target/riscv: gdbstub: Do not generate CSR XML if Zicsr is disabled

2023-03-01 Thread LIU Zhiwei



On 2023/3/2 8:30, Bin Meng wrote:

On Thu, Mar 2, 2023 at 7:43 AM Palmer Dabbelt  wrote:

On Wed, 01 Mar 2023 01:55:34 PST (-0800), Bin Meng wrote:

On Wed, Mar 1, 2023 at 5:52 PM LIU Zhiwei  wrote:


On 2023/2/28 18:40, Bin Meng wrote:

There is no need to generate the CSR XML if the Zicsr extension
is not enabled.

Should we generate the FPU XML or Vector XML when Zicsr is not enabled?

Good point. I think we should disable that too.

Seems reasonable.  Did you want to do that as part of a v3, or just as a
follow-on fix?


I looked at this further.

The FPU / Vector XML is guarded by the " env->misa_ext" check. If
Zicsr is disabled while F or V extension is off, QEMU will error out
in riscv_cpu_realize() earlier before the gdbstub init.


Makes sense.

Zhiwei



So the current patch should be fine.

Regards,
Bin




Re: [PATCH 0/2] target/riscv: some vector_helper.c cleanups

2023-03-01 Thread Palmer Dabbelt

On Sun, 26 Feb 2023 09:05:12 PST (-0800), dbarb...@ventanamicro.com wrote:

Based-on: 20230222185205.355361-2-dbarb...@ventanamicro.com
("[PATCH v7 01/10] target/riscv: introduce riscv_cpu_cfg()")

Hi,

This is a re-send of patch 1, which is already reviewed, with a
follow-up that uses riscv_cpu_cfg() in the remainder of the file. This
was suggested by Weiwei Li in the "[PATCH 0/4] RISCVCPUConfig related
cleanups" review. Patch 1 makes the work of patch 2 easier since it
eliminated some uses of env_archcpu() we want to avoid.

Both patches depend on patch "[PATCH v7 01/10] target/riscv: introduce
riscv_cpu_cfg()" that can be found here:

https://patchew.org/QEMU/20230222185205.355361-1-dbarb...@ventanamicro.com/20230222185205.355361-2-dbarb...@ventanamicro.com/


Daniel Henrique Barboza (2):
  target/riscv/vector_helper.c: create vext_set_tail_elems_1s()
  target/riscv/vector_helper.c: avoid env_archcpu() when reading
RISCVCPUConfig

 target/riscv/vector_helper.c | 104 +--
 1 file changed, 39 insertions(+), 65 deletions(-)


Thanks, these are queued up.  If we're already broken on 
non-power-of-two then that ROUND_UP() suggestion might be worth looking 
at, as I doubt we'd want to support them even if the ISA allows for it.




Re: [PATCH 0/2] Fix the OpenSBI CI job and bump to v1.2

2023-03-01 Thread Palmer Dabbelt

On Fri, 24 Feb 2023 13:25:41 PST (-0800), Palmer Dabbelt wrote:

The OpenSBI version bump found a CI failure, which appears to actually
have been related to the Docker version as opposed to the Ubuntu
version -- at least assuming my local CI run
 is accurate
(thanks to Thomas for pointing out how to get those set up).

I've left off the Ubuntu version upgrade because it's triggering some
key-related issues in apt.  That's probably worth doing, but I
figured it'd be better to send these along now to try and get things
unblocked.  The EDK2 Docker setup looks like it would have the same
issue but I'll also keep that independent.


This is queued.



Re: [PATCH 0/4] RISCVCPUConfig related cleanups

2023-03-01 Thread Bin Meng
Hi Palmer,

On Thu, Mar 2, 2023 at 10:08 AM Palmer Dabbelt  wrote:
>
> On Fri, 24 Feb 2023 09:45:16 PST (-0800), dbarb...@ventanamicro.com wrote:
> > Hi,
> >
> > These cleanups were suggested by LIU Zhiwei during the review of the
> > RISCV_FEATURE_* cleanups, currently on version 7 [1].
> >
> > These are dependent on the patch "[PATCH v7 01/10] target/riscv: introduce
> > riscv_cpu_cfg()" from [1] because we use the riscv_cpu_cfg() API.
> >
> >
> > [1] https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg06467.html
> >
> > Daniel Henrique Barboza (4):
> >   target/riscv/csr.c: use env_archcpu() in ctr()
> >   target/riscv/csr.c: simplify mctr()
> >   target/riscv/csr.c: use riscv_cpu_cfg() to avoid env_cpu() pointers
> >   target/riscv/csr.c: avoid env_archcpu() usages when reading
> > RISCVCPUConfig
> >
> >  target/riscv/csr.c | 90 +-
> >  1 file changed, 24 insertions(+), 66 deletions(-)
>
> I just based these on that patch, which landed as d4ea711704
> ("target/riscv: introduce riscv_cpu_cfg()").  That resulted in a handful
> of merge conflicts, but everything looked pretty mechanical.  So it's
> queued up.
>

As Weiwei pointed out in
https://lore.kernel.org/qemu-devel/e40e75ff-37e0-94d3-e9e2-c159b0e2d...@iscas.ac.cn/,
patch#1 should be dropped.

But I see it ended up in your tree @
https://github.com/palmer-dabbelt/qemu/commit/3c7d54f945f1b5b474ea35c0815a1618927c9384,
while my changes are already in tree @
https://github.com/palmer-dabbelt/qemu/commit/94e297071bc0a5965cc32c497a886f2cf9d32710.

Not sure why git doesn't figure that out ...

Regards,
Bin



Re: [PATCH v3 4/8] hw/isa/vt82c686: Implement PCI IRQ routing

2023-03-01 Thread BALATON Zoltan

On Wed, 1 Mar 2023, Mark Cave-Ayland wrote:

On 27/02/2023 16:52, Bernhard Beschow wrote:
On Mon, Feb 27, 2023 at 1:57 PM BALATON Zoltan wrote:

in. So if
 >> fuloon2e needs to do that then it should. I'll check that as I was 
focusing

 >
 > fuloong2e

I've checked fuloong2e and it still works as before. PCI bus is handled by
bonito on that board so your patch would actually break it. The VIA chip
is a PCIDevice. You're not supposed to replace the interrupts of the bus
it's connected to from this model as that should be done by the pci-host
or the board. Therefore modeling the chip's PIRQ/PINT pins as gpios which
is the QDev concept for that is right and your usage of pci_set_irq here
is wrong.


Works for me:
(08/84) 
tests/avocado/boot_linux_console.py:BootLinuxConsole.test_mips64el_fuloong2e: 
PASS(2.77 s)


The bonito code is interesting in that the IRQ is swizzled in 
pci_bonito_map_irq() to the internal IRQ, and then pci_bonito_set_irq() sets 
the output (CPU?) IRQ accordingly. This means that the routing is currently 
fixed based upon the slot number, rather than using the VIA PCI IRQ routing. 
This bit will need some thought as to how this interacts with pci_bus_irqs() 
in your proposed patch, feel free to suggest a suitable approach.


I believe the fuloong2e may be connected similarly to the pegasos2. The 
Marvell Discovery II mv64361 was based on a MIPS counterpart, so the 
concepts may be similar; only the CPU architecture differs.


This doc https://wiki.qemu.org/images/0/09/Bonito64-spec.pdf says the 
bonito north bridge has some GPin and GPIO pins which are connected to the 
interrupt controller (see section 5.15). Probably you can infer which pins 
PCI IRQs should come in from the map_irq function in the bonito model. I'd 
expect GPIO0-3 based on description in the table in section 6.1


On the other hand, the board's firmware suggests PCI interrupt lines are 
also connected to the PIRQ pins of the 686B:


https://github.com/loongson-community/pmon/blob/master/sys/dev/pci/vt82c686_devbd2e.c

(if this is the right file to look at, as there are different versions, but 
dev board 2e is said to include fuloong2e in the main README). In the 686B, 
PCI interrupts are then mapped to IRQs 9, 10, 11 and 13 with the PnP IRQ 
routing registers.


This could then be modeled similarly to how I did it in this series for 
pegasos2: One could add gpio inputs in bonito to model the pins where the 
PCI interrupt lines are connected then connect these together in the board 
code just like they are wired on the real board.


Although this board does not have any PCI slots so these are only for the 
on board PCI devices: https://www.linux-mips.org/wiki/Fuloong_2E but a 
similar dev board may have 4 PCI slots.


Regards,
BALATON Zoltan

Re: [PATCH 1/2] target/riscv/vector_helper.c: create vext_set_tail_elems_1s()

2023-03-01 Thread Palmer Dabbelt

On Sun, 26 Feb 2023 10:23:01 PST (-0800), phi...@linaro.org wrote:

On 26/2/23 18:05, Daniel Henrique Barboza wrote:

Commit 752614cab8e6 ("target/riscv: rvv: Add tail agnostic for vector
load / store instructions") added code to set the tail elements to 1 in
the end of vext_ldst_stride(), vext_ldst_us(), vext_ldst_index() and
vext_ldff(). Aside from a env->vl versus an evl value being used in the
first loop, the code is being repeated 4 times.

Create a helper to avoid code repetition in all those functions.
Arguments that are used in the callers (nf, esz and max_elems) are
passed as arguments. All other values are being derived inside the
helper.

Reviewed-by: Weiwei Li 
Reviewed-by: Frank Chang 
Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/vector_helper.c | 86 +---
  1 file changed, 30 insertions(+), 56 deletions(-)




+static void vext_set_tail_elems_1s(CPURISCVState *env, target_ulong vl,
+   void *vd, uint32_t desc, uint32_t nf,
+   uint32_t esz, uint32_t max_elems)
+{
+uint32_t total_elems = vext_get_total_elems(env, desc, esz);
+uint32_t vlenb = env_archcpu(env)->cfg.vlen >> 3;
+uint32_t vta = vext_vta(desc);
+uint32_t registers_used;
+int k;
+
+for (k = 0; k < nf; ++k) {
+vext_set_elems_1s(vd, vta, (k * max_elems + vl) * esz,
+  (k * max_elems + max_elems) * esz);
+}
+
+if (nf * max_elems % total_elems != 0) {
+registers_used = ((nf * max_elems) * esz + (vlenb - 1)) / vlenb;
+vext_set_elems_1s(vd, vta, (nf * max_elems) * esz,
+  registers_used * vlenb);
+}


   for (unsigned k = 0; k < nf; ++k) {
   vext_set_elems_1s(vd, vta, (k * max_elems + vl) * esz,
 (k * max_elems + max_elems) * esz);
   }

   if (nf * max_elems % total_elems != 0) {
   uint32_t cnt = (nf * max_elems) * esz;
   vext_set_elems_1s(vd, vta, cnt, QEMU_ALIGN_UP(cnt, vlenb));
   }

I suspect ROUND_UP() could be used if vlenb is a power of 2.


As far as I can tell there's nothing in the ISA that requires vlenb be a 
power of two, it's just defined as 

   The XLEN-bit-wide read-only CSR vlenb holds the value VLEN/8, i.e., 
   the vector register length in bytes.


I'm pretty surprised to see that's the case and I'd doubt anything 
actually works with non-power-of-two vlenb.  It's possible I'm just 
missing something in the ISA so I opened a bug at

.
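
To make the ROUND_UP() question above concrete, here is a minimal sketch of the two rounding styles being compared. The helper names align_up_generic() and align_up_pow2() are hypothetical stand-ins, not the QEMU macros themselves; the only point illustrated is that a mask-based round-up is valid solely for power-of-two alignments, while the division-based form works for any non-zero alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Division-based round-up: valid for any non-zero alignment. */
static uint32_t align_up_generic(uint32_t n, uint32_t align)
{
    assert(align != 0);
    return (n + align - 1) / align * align;
}

/* Mask-based round-up: only valid when align is a power of two. */
static uint32_t align_up_pow2(uint32_t n, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}
```

With vlenb = 16 and 40 bytes in use, both helpers round up to 48. With a hypothetical non-power-of-two vlenb such as 24, only the division-based form can be used.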



Re: [PATCH 0/4] RISCVCPUConfig related cleanups

2023-03-01 Thread Palmer Dabbelt

On Fri, 24 Feb 2023 09:45:16 PST (-0800), dbarb...@ventanamicro.com wrote:

Hi,

These cleanups were suggested by LIU Zhiwei during the review of the
RISCV_FEATURE_* cleanups, currently on version 7 [1].

These are dependent on the patch "[PATCH v7 01/10] target/riscv: introduce
riscv_cpu_cfg()" from [1] because we use the riscv_cpu_cfg() API.


[1] https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg06467.html

Daniel Henrique Barboza (4):
  target/riscv/csr.c: use env_archcpu() in ctr()
  target/riscv/csr.c: simplify mctr()
  target/riscv/csr.c: use riscv_cpu_cfg() to avoid env_cpu() pointers
  target/riscv/csr.c: avoid env_archcpu() usages when reading
RISCVCPUConfig

 target/riscv/csr.c | 90 +-
 1 file changed, 24 insertions(+), 66 deletions(-)


I just based these on that patch, which landed as d4ea711704 
("target/riscv: introduce riscv_cpu_cfg()").  That resulted in a handful 
of merge conflicts, but everything looked pretty mechanical.  So it's 
queued up.


Thanks!



Re: [RFC] CXL: TCG/KVM instruction alignment issue discussion default

2023-03-01 Thread Ajay Joshi
On Tue, Feb 28, 2023 at 4:54 PM Jonathan Cameron
 wrote:
>
> On Mon, 27 Feb 2023 11:06:47 +
> Jørgen Hansen  wrote:
>
> > On 2/18/23 11:22, Gregory Price wrote:
> > > Breaking this off into a separate thread for archival sake.
> > >
> > > There's a bug with handling execution of instructions held in CXL
> > > memory - specifically when an instruction crosses a page boundary.
> > >
> > > The result of this is that type-3 devices cannot use KVM at all at the
> > > moment, and require the attached patch to run in TCG-only mode.
> > >
> > >
> > > CXL memory devices are presently emulated as MMIO, and MMIO has no
> > > coherency guarantees, so TCG doesn't cache the results of translating
> > > an instruction, meaning execution is incredibly slow (orders of
> > > magnitude slower than KVM).
> > >
> > >
> > > Request for comments:
> > >
> > >
> > > First there's the stability issue:
> > >
> > > 0) TCG cannot handle instructions across a page boundary spanning ram and
> > > MMIO. See attached patch for hotfix.  This basically solves the page
> > > boundary issue by reverting the entire block to MMIO-mode if the
> > > problem is detected.
> > >
> > > 1) KVM needs to be investigated.  It's likely the same/similar issue,
> > > but it's not confirmed.
> >
> > I ran into an issue with KVM as well. However, it wasn't a page boundary
> > spanning issue, since I could hit it when using pure CXL backed memory
> > for a given application. It turned out that (at least) certain AVX
> > instructions didn't handle execution from MMIO when using qemu. This
> > generated an illegal instruction exception for the application. At that
> > point, I switched to tcg, so I didn't investigate if running a non-AVX
> > system would work with KVM.
>
> Short term I'm wondering if we should attempt to error out on KVM
> unless some override parameter is used alongside the main cxl=on
This seems like a good idea. It avoids discovering the problem much later,
during execution.
>
> >
> > > Second there's the performance issue:
> > >
> > > 0) Do we actually care about performance? How likely are users to
> > > attempt to run software out of CXL memory?
> > >
> > > 1) If we do care, is there a potential for converting CXL away from the
> > > MMIO design?  The issue is coherency for shared memory. Emulating
> > > coherency is a) hard, and b) a ton of work for little gain.
> > >
> > > Presently marking CXL memory as MMIO basically enforces coherency by
> > > preventing caching, though it's unclear how this is enforced
> > > by KVM (or if it is, i have to imagine it is).
> >
> > Having the option of doing device specific processing of accesses to a
> > CXL type 3 device (that the MMIO based access allows) is useful for
> > experimentation with device functionality, so I would be sad to see that
> > option go away. Emulating cache line access to a type 3 device would be
> > interesting, and could potentially be implemented in a way that would
> > allow caching of device memory in a shadow page in RAM, but that is a
> > rather large project.
>
> Absolutely agree.  Can sketch a solution that is entirely in QEMU and
> works with KVM on a white board, but it doesn't feel like a small job
> to actually implement it and I'm sure there are nasty corners
> (persistency is going to be tricky)
>
> If anyone sees this as a 'fun challenge' and wants to take it on though
> that would be great!
>
> Jonathan
>
> >
> > > It might be nice to solve this for non-shared memory regions, but
> > > testing functionality >>> performance at this point so it might not be
> > > worth the investment.
> >
> > Thanks,
> > Jorgen
>



Re: [PATCH 2/6] target/riscv: Fix the relationship of PBMTE/STCE fields between menvcfg and henvcfg

2023-03-01 Thread Palmer Dabbelt

On Fri, 24 Feb 2023 04:36:43 PST (-0800), liwei...@iscas.ac.cn wrote:


On 2023/2/24 20:19, Andrew Jones wrote:

On Fri, Feb 24, 2023 at 12:08:48PM +0800, Weiwei Li wrote:

henvcfg.PBMTE/STCE are read-only zero if menvcfg.PBMTE/STCE are zero.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
  target/riscv/csr.c | 13 +
  1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index feae23cab0..02cb2c2bb7 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1956,7 +1956,11 @@ static RISCVException read_henvcfg(CPURISCVState *env, 
int csrno,
  return ret;
  }

-*val = env->henvcfg;
+/*
+ * henvcfg.pbmte is read_only 0 when menvcfg.pbmte = 0
+ * henvcfg.stce is read_only 0 when menvcfg.stce = 0
+ */
+*val = env->henvcfg & (~(HENVCFG_PBMTE | HENVCFG_STCE) | env->menvcfg);
  return RISCV_EXCP_NONE;
  }

@@ -1972,7 +1976,7 @@ static RISCVException write_henvcfg(CPURISCVState *env, 
int csrno,
  }

  if (riscv_cpu_mxl(env) == MXL_RV64) {
-mask |= HENVCFG_PBMTE | HENVCFG_STCE;
+mask |= env->menvcfg & (HENVCFG_PBMTE | HENVCFG_STCE);

nit:

   While HENVCFG_PBMTE == MENVCFG_PBMTE, I'd prefer we use
   MENVCFG_* with menvcfg and HENVCFG_* with henvcfg.


Yeah, I agree. However, I think this mask is ultimately used for henvcfg.
We just use menvcfg to mask out the bits that are zero in menvcfg, so I
didn't change HENVCFG_* here.


I guess it's kind of bikeshedding because the bits are the same, but 
what's in the patch seems cleaner to me: we're writing the H state 
masked by the M state, so we should use the H definitions (even if it 
doesn't matter).




Regards,

Weiwei Li




  }

  env->henvcfg = (env->henvcfg & ~mask) | (val & mask);
@@ -1990,14 +1994,15 @@ static RISCVException read_henvcfgh(CPURISCVState *env, 
int csrno,
  return ret;
  }

-*val = env->henvcfg >> 32;
+*val = (env->henvcfg & (~(HENVCFG_PBMTE | HENVCFG_STCE) |
+env->menvcfg)) >> 32;
  return RISCV_EXCP_NONE;
  }

  static RISCVException write_henvcfgh(CPURISCVState *env, int csrno,
target_ulong val)
  {
-uint64_t mask = HENVCFG_PBMTE | HENVCFG_STCE;
+uint64_t mask = env->menvcfg & (HENVCFG_PBMTE | HENVCFG_STCE);
  uint64_t valh = (uint64_t)val << 32;
  RISCVException ret;

--
2.25.1



Thanks,
drew
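
The masking discussed in this thread can be sketched in isolation as follows. This is a standalone model, not the QEMU code: the function names are hypothetical, and the bit positions (PBMTE at bit 62, STCE at bit 63, identical in menvcfg and henvcfg) are stated here as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, shared by menvcfg and henvcfg. */
#define ENVCFG_PBMTE (1ULL << 62)
#define ENVCFG_STCE  (1ULL << 63)

/*
 * Read side: PBMTE/STCE read as zero in henvcfg unless the same bit is
 * set in menvcfg. All other bits pass through unchanged.
 */
static uint64_t henvcfg_read(uint64_t henvcfg, uint64_t menvcfg)
{
    return henvcfg & (~(ENVCFG_PBMTE | ENVCFG_STCE) | menvcfg);
}

/*
 * Write side: PBMTE/STCE are writable only when enabled in menvcfg.
 * base_mask holds the bits that are writable regardless of menvcfg.
 */
static uint64_t henvcfg_write(uint64_t henvcfg, uint64_t menvcfg,
                              uint64_t val, uint64_t base_mask)
{
    uint64_t mask = base_mask | (menvcfg & (ENVCFG_PBMTE | ENVCFG_STCE));

    return (henvcfg & ~mask) | (val & mask);
}
```

Because the H and M definitions name the same bit positions, either set of constants gives the same result, which is why the choice between them is a naming question rather than a functional one.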




Re: [PATCH 0/6] target/riscv: Add support for Svadu extension

2023-03-01 Thread Palmer Dabbelt

On Thu, 23 Feb 2023 20:08:46 PST (-0800), liwei...@iscas.ac.cn wrote:

This patchset adds support for the Svadu extension. It also fixes some 
relationships between *envcfg fields and the Svpbmt/Sstc extensions.

Specification for Svadu extension can be found in:

https://github.com/riscv/riscv-svadu

The port is available here:
https://github.com/plctlab/plct-qemu/tree/plct-svadu-upstream

Weiwei Li (6):
  target/riscv: Fix the relationship between menvcfg.PBMTE/STCE and
Svpbmt/Sstc extensions
  target/riscv: Fix the relationship of PBMTE/STCE fields between
menvcfg and henvcfg
  target/riscv: Add csr support for svadu
  target/riscv: Add *envcfg.PBMTE related check in address translation
  target/riscv: Add *envcfg.HADE related check in address translation
  target/riscv: Export Svadu property

 target/riscv/cpu.c|  8 
 target/riscv/cpu.h|  1 +
 target/riscv/cpu_bits.h   |  4 
 target/riscv/cpu_helper.c | 16 ++--
 target/riscv/csr.c| 26 --
 5 files changed, 47 insertions(+), 8 deletions(-)


Thanks, this is queued up on riscv-to-apply.next .



Re: [PATCH v2 4/6] hw/cxl: QMP based poison injection support

2023-03-01 Thread Michael S. Tsirkin
On Mon, Feb 27, 2023 at 05:03:09PM +, Jonathan Cameron wrote:
> Inject poison using qmp command cxl-inject-poison to add an entry to the
> poison list.
> 
> For now, the poison is not returned on CXL.mem reads, but only via the
> mailbox command Get Poison List.
> 
> See CXL rev 3.0, sec 8.2.9.8.4.1 Get Poison list (Opcode 4300h)
> 
> Kernel patches to use this interface here:
> https://lore.kernel.org/linux-cxl/cover.1665606782.git.alison.schofi...@intel.com/
> 
> To inject poison using qmp (telnet to the qmp port)
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-inject-poison",
> "arguments": {
>  "path": "/machine/peripheral/cxl-pmem0",
>  "start": 2048,
>  "length": 256
> }
> }
> 
> Adjusted to select a device on your machine.
> 
> Note that the poison list supported is kept short enough to avoid the
> complexity of state machine that is needed to handle the MORE flag.
> 
> Signed-off-by: Jonathan Cameron 

You need to CC QAPI maintainers.
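
For readers following the Get Poison List handler in the patch quoted below, here is a minimal sketch of its two input checks: the 64-byte alignment test on the query start and the "no overlap" test applied to each list entry. The struct and function names are hypothetical stand-ins for the patch's CXLPoison handling, and ranges are treated as half-open intervals [start, start + length).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one poison list entry. */
struct poison_range {
    uint64_t start;
    uint64_t length;
};

/* The query start address must be 64-byte aligned. */
static bool query_is_aligned(uint64_t query_start)
{
    return (query_start & 0x3f) == 0;
}

/*
 * Two half-open ranges overlap unless one of them ends at or before
 * the point where the other begins.
 */
static bool poison_overlaps(const struct poison_range *ent,
                            uint64_t query_start, uint64_t query_length)
{
    return !(ent->start >= query_start + query_length ||
             ent->start + ent->length <= query_start);
}
```

Using the values from the QMP example above (start 2048, length 256), a query covering 0..2047 does not report the entry, while a query starting at 2240 with length 128 does.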

> ---
> v2:
> Improve QMP documentation.
> Fix up some endian issues
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 90 +
>  hw/mem/cxl_type3.c  | 56 +++
>  hw/mem/cxl_type3_stubs.c|  6 +++
>  include/hw/cxl/cxl_device.h | 20 +
>  qapi/cxl.json   | 18 
>  5 files changed, 190 insertions(+)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 702e16ca20..792d3ee5aa 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -62,6 +62,8 @@ enum {
>  #define GET_PARTITION_INFO 0x0
>  #define GET_LSA   0x2
>  #define SET_LSA   0x3
> +MEDIA_AND_POISON = 0x43,
> +#define GET_POISON_LIST0x0
>  };
>  
>  /* 8.2.8.4.5.1 Command Return Codes */
> @@ -295,6 +297,10 @@ static CXLRetCode cmd_identify_memory_device(struct 
> cxl_cmd *cmd,
>  stq_le_p(&id->persistent_capacity, cxl_dstate->pmem_size / 
> CXL_CAPACITY_MULTIPLIER);
>  stq_le_p(&id->volatile_capacity, cxl_dstate->vmem_size / 
> CXL_CAPACITY_MULTIPLIER);
>  stl_le_p(&id->lsa_size, cvc->get_lsa_size(ct3d));
> +/* 256 poison records */
> +st24_le_p(id->poison_list_max_mer, 256);
> +/* No limit - so limited by main poison record limit */
> +stw_le_p(&id->inject_poison_limit, 0);
>  
>  *len = sizeof(*id);
>  return CXL_MBOX_SUCCESS;
> @@ -384,6 +390,88 @@ static CXLRetCode cmd_ccls_set_lsa(struct cxl_cmd *cmd,
>  return CXL_MBOX_SUCCESS;
>  }
>  
> +/*
> + * This is very inefficient, but good enough for now!
> + * Also the payload will always fit, so no need to handle the MORE flag and
> + * make this stateful. We may want to allow longer poison lists to aid
> + * testing that kernel functionality.
> + */
> +static CXLRetCode cmd_media_get_poison_list(struct cxl_cmd *cmd,
> +CXLDeviceState *cxl_dstate,
> +uint16_t *len)
> +{
> +struct get_poison_list_pl {
> +uint64_t pa;
> +uint64_t length;
> +} QEMU_PACKED;
> +
> +struct get_poison_list_out_pl {
> +uint8_t flags;
> +uint8_t rsvd1;
> +uint64_t overflow_timestamp;
> +uint16_t count;
> +uint8_t rsvd2[0x14];
> +struct {
> +uint64_t addr;
> +uint32_t length;
> +uint32_t resv;
> +} QEMU_PACKED records[];
> +} QEMU_PACKED;
> +
> +struct get_poison_list_pl *in = (void *)cmd->payload;
> +struct get_poison_list_out_pl *out = (void *)cmd->payload;
> +CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
> +uint16_t record_count = 0, i = 0;
> +uint64_t query_start, query_length;
> +CXLPoisonList *poison_list = &ct3d->poison_list;
> +CXLPoison *ent;
> +uint16_t out_pl_len;
> +
> +query_start = ldq_le_p(&in->pa);
> +/* 64 byte alignment required */
> +if (query_start & 0x3f) {
> +return CXL_MBOX_INVALID_INPUT;
> +}
> +query_length = ldq_le_p(&in->length) * 64;
> +
> +QLIST_FOREACH(ent, poison_list, node) {
> +/* Check for no overlap */
> +if (ent->start >= query_start + query_length ||
> +ent->start + ent->length <= query_start) {
> +continue;
> +}
> +record_count++;
> +}
> +out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
> +assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
> +
> +memset(out, 0, out_pl_len);
> +QLIST_FOREACH(ent, poison_list, node) {
> +uint64_t start, stop;
> +
> +/* Check for no overlap */
> +if (ent->start >= query_start + query_length ||
> +ent->start + ent->length <= query_start) {
> +continue;
> +}
> +
> +/* Deal with overlap */
> +start = MAX(ent->start & 0xffc0, query_start);
> +stop = MIN((ent->start & 0xffc0) + ent->length,
> +   

Re: [PATCH v2 1/2] hw/riscv: Skip re-generating DT nodes for a given DTB

2023-03-01 Thread Palmer Dabbelt

On Mon, 27 Feb 2023 23:45:21 PST (-0800), bm...@tinylab.org wrote:

Launch qemu-system-riscv64 with a given dtb for 'sifive_u' and 'virt'
machines, QEMU complains:

  qemu_fdt_add_subnode: Failed to create subnode /soc: FDT_ERR_EXISTS

The whole DT generation logic should be skipped when a given DTB is
present.

Fixes: b1f19f238cae ("hw/riscv: write bootargs 'chosen' FDT after 
riscv_load_kernel()")
Signed-off-by: Bin Meng 
Reviewed-by: Daniel Henrique Barboza 
---

(no changes since v1)

 hw/riscv/sifive_u.c | 1 +
 hw/riscv/virt.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index ad3bb35b34..76db5ed3dd 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -118,6 +118,7 @@ static void create_fdt(SiFiveUState *s, const MemMapEntry 
*memmap,
 error_report("load_device_tree() failed");
 exit(1);
 }
+return;
 } else {
 fdt = ms->fdt = create_device_tree(_size);
 if (!fdt) {
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 86c4adc0c9..0c7b4a1e46 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -1014,6 +1014,7 @@ static void create_fdt(RISCVVirtState *s, const 
MemMapEntry *memmap)
 error_report("load_device_tree() failed");
 exit(1);
 }
+return;
 } else {
 ms->fdt = create_device_tree(>fdt_size);
 if (!ms->fdt) {


Thanks, these two are queued up.



Re: [PATCH] target/riscv: Add support for Zicond extension

2023-03-01 Thread Palmer Dabbelt

On Tue, 21 Feb 2023 01:10:09 PST (-0800), liwei...@iscas.ac.cn wrote:

The spec can be found in https://github.com/riscv/riscv-zicond.
Two instructions are added:
 - czero.eqz: Moves zero to a register rd, if the condition rs2 is
   equal to zero, otherwise moves rs1 to rd.
 - czero.nez: Moves zero to a register rd, if the condition rs2 is
   nonzero, otherwise moves rs1 to rd.
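
As a plain C model of the two instructions described above (not the TCG translation itself), the semantics can be sketched as follows. The select_u64() helper is an added illustration, not part of the patch, showing how the pair composes into a branchless conditional select.

```c
#include <assert.h>
#include <stdint.h>

/* czero.eqz: result is zero if rs2 == 0, otherwise rs1. */
static uint64_t czero_eqz(uint64_t rs1, uint64_t rs2)
{
    return rs2 == 0 ? 0 : rs1;
}

/* czero.nez: result is zero if rs2 != 0, otherwise rs1. */
static uint64_t czero_nez(uint64_t rs1, uint64_t rs2)
{
    return rs2 != 0 ? 0 : rs1;
}

/*
 * Illustrative composition: cond ? a : b without a branch.
 * Exactly one of the two operands of the OR is non-zeroed.
 */
static uint64_t select_u64(uint64_t cond, uint64_t a, uint64_t b)
{
    return czero_eqz(a, cond) | czero_nez(b, cond);
}
```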

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/cpu.c   |  2 +
 target/riscv/cpu.h   |  1 +
 target/riscv/insn32.decode   |  4 ++
 target/riscv/insn_trans/trans_rvzicond.c.inc | 49 
 target/riscv/translate.c |  1 +
 5 files changed, 57 insertions(+)
 create mode 100644 target/riscv/insn_trans/trans_rvzicond.c.inc

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 0dd2f0c753..80b92930ae 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -74,6 +74,7 @@ struct isa_ext_data {
 static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(h, false, PRIV_VERSION_1_12_0, ext_h),
 ISA_EXT_DATA_ENTRY(v, false, PRIV_VERSION_1_12_0, ext_v),
+ISA_EXT_DATA_ENTRY(zicond, true, PRIV_VERSION_1_12_0, ext_zicond),
 ISA_EXT_DATA_ENTRY(zicsr, true, PRIV_VERSION_1_10_0, ext_icsr),
 ISA_EXT_DATA_ENTRY(zifencei, true, PRIV_VERSION_1_10_0, ext_ifencei),
 ISA_EXT_DATA_ENTRY(zihintpause, true, PRIV_VERSION_1_10_0, 
ext_zihintpause),
@@ -1143,6 +1144,7 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, 
false),

 /* These are experimental so mark with 'x-' */
+DEFINE_PROP_BOOL("x-zicond", RISCVCPU, cfg.ext_zicond, false),
 DEFINE_PROP_BOOL("x-j", RISCVCPU, cfg.ext_j, false),
 /* ePMP 0.9.3 */
 DEFINE_PROP_BOOL("x-epmp", RISCVCPU, cfg.epmp, false),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 7128438d8e..81b7c92e7a 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -447,6 +447,7 @@ struct RISCVCPUConfig {
 bool ext_zkt;
 bool ext_ifencei;
 bool ext_icsr;
+bool ext_zicond;
 bool ext_zihintpause;
 bool ext_smstateen;
 bool ext_sstc;
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b7e7613ea2..fb537e922e 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -890,3 +890,7 @@ sm3p1   00 01000 01001 . 001 . 0010011 @r2
 # *** RV32 Zksed Standard Extension ***
 sm4ed   .. 11000 . . 000 . 0110011 @k_aes
 sm4ks   .. 11010 . . 000 . 0110011 @k_aes
+
+# *** RV32 Zicond Standard Extension ***
+czero_eqz   111  . . 101 . 0110011 @r
+czero_nez   111  . . 111 . 0110011 @r
diff --git a/target/riscv/insn_trans/trans_rvzicond.c.inc 
b/target/riscv/insn_trans/trans_rvzicond.c.inc
new file mode 100644
index 00..645260164e
--- /dev/null
+++ b/target/riscv/insn_trans/trans_rvzicond.c.inc
@@ -0,0 +1,49 @@
+/*
+ * RISC-V translation routines for the Zicond Standard Extension.
+ *
+ * Copyright (c) 2020-2023 PLCT Lab
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#define REQUIRE_ZICOND(ctx) do {  \
+if (!ctx->cfg_ptr->ext_zicond) {  \
+return false; \
+} \
+} while (0)
+
+static bool trans_czero_eqz(DisasContext *ctx, arg_czero_eqz *a)
+{
+REQUIRE_ZICOND(ctx);
+
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);
+TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);
+
+tcg_gen_movcond_tl(TCG_COND_EQ, dest, src2, ctx->zero, ctx->zero, src1);
+gen_set_gpr(ctx, a->rd, dest);
+return true;
+}
+
+static bool trans_czero_nez(DisasContext *ctx, arg_czero_nez *a)
+{
+REQUIRE_ZICOND(ctx);
+
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);
+TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);
+
+tcg_gen_movcond_tl(TCG_COND_NE, dest, src2, ctx->zero, ctx->zero, src1);
+gen_set_gpr(ctx, a->rd, dest);
+return true;
+}
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 772f9d7973..6e65c6afca 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -1103,6 +1103,7 @@ static uint32_t opcode_at(DisasContextBase *dcbase, 
target_ulong pc)
 #include 

Re: [PATCH v2 0/6] hw/cxl: Poison get, inject, clear

2023-03-01 Thread Alison Schofield
On Mon, Feb 27, 2023 at 05:03:05PM +, Jonathan Cameron wrote:

Hi Jonathan,
Can you share your repo with this support?  How about your qemu cmdline?
I'm more of a 'try it out' type of a reviewer for qemu changes.
Thanks,
Alison

> v2: Thanks to Ira for review and also to Philippe as some of the
> changes follow through from comments on precursor series.
> 
> - Fixed a bunch of endian issues. Note that QEMU CXL support currently only
>   supports platforms that happen to be little endian so these are more
>   theoretical than bugs that can be triggered.
> - Improve handling over mailbox inject poison that overlaps with
>   qmp injected (which can be bigger).
> - Tighter checks on alignment.
> - Add 'Since' entries to qapi docs.
> - Drop the CXLRetCode move out of this series as it isn't needed for this.
>   Will appear in next series I post instead (Ira's event series)
> - Drag down the st24_le_p() patch from Ira's Event series so we can use
>   it in this series.
> 
> Note Alison has stated the kernel series will be post 6.3 material
> so this one isn't quite as urgent as the patches it is based on.
> However I think this series is in a good state (plus I have lots more queued
> behind it) hence promoting it from RFC.
> 
> Changes since RFC v2: Thanks to Markus for review.
>  - Improve documentation for QMP interface
>  - Add better description of baseline series
>  - Include precursor refactors around ret_code / CXLRetCode as this is now
>the first series in suggested merge order to rely on those.
>  - Include Ira's cxl_device_get_timestamp() function as it was better than
>the equivalent in the RFC.
> 
> Based on following series (in order)
> 1. [PATCH v4 00/10] hw/cxl: CXL emulation cleanups and minor fixes for 
> upstream
> 2. [PATCH v6 0/8] hw/cxl: RAS error emulation and injection
> 3. [PATCH v2 0/2] hw/cxl: Passthrough HDM decoder emulation
> 4. [PATCH v4 0/2] hw/mem: CXL Type-3 Volatile Memory Support
> 
> Based on: Message-Id: 20230206172816.8201-1-jonathan.came...@huawei.com
> Based-on: Message-id: 20230227112751.6101-1-jonathan.came...@huawei.com
> Based-on: Message-id: 20230227153128.8164-1-jonathan.came...@huawei.com
> Based-on: Message-id: 20230227163157.6621-1-jonathan.came...@huawei.com
> 
> The series supports:
> 1) Injection of variable length poison regions via QMP (to fake real
>memory corruption and ensure we deal with odd overflow corner cases
>such as clearing the middle of a large region making the list overflow
>as we go from one long entry to two smaller entries).
> 2) Read of poison list via the CXL mailbox.
> 3) Injection via the poison injection mailbox command (limited to 64 byte
>entries)
> 4) Clearing of poison injected via either method.
> 
> The implementation is meant to be a valid combination of impdef choices
> based on what the spec allowed. There are a number of places where it could
> be made more sophisticated that we might consider in future:
> * Fusing adjacent poison entries if the types match.
> * Separate injection list and main poison list, to test out limits on
>   injected poison list being smaller than the main list.
> * Poison list overflow event (needs event log support in general)
> * Connecting up to the poison list error record generation (rather complex
>   and not needed for current kernel handling testing).
> 
> As the kernel code is currently fairly simple, it is likely that the above
> does not yet matter but who knows what will turn up in future!
> 
> Kernel patches:
>  [PATCH v7 0/6] CXL Poison List Retrieval & Tracing
>  cover.1676685180.git.alison.schofi...@intel.com
>  [PATCH v2 0/6] cxl: CXL Inject & Clear Poison
>  cover.1674101475.git.alison.schofi...@intel.com
> 
> 
> Ira Weiny (2):
>   hw/cxl: Introduce cxl_device_get_timestamp() utility function
>   bswap: Add the ability to store to an unaligned 24 bit field
> 
> Jonathan Cameron (4):
>   hw/cxl: rename mailbox return code type from ret_code to CXLRetCode
>   hw/cxl: QMP based poison injection support
>   hw/cxl: Add poison injection via the mailbox.
>   hw/cxl: Add clear poison mailbox command support.
> 
>  hw/cxl/cxl-device-utils.c   |  15 ++
>  hw/cxl/cxl-mailbox-utils.c  | 285 ++--
>  hw/mem/cxl_type3.c  |  92 
>  hw/mem/cxl_type3_stubs.c|   6 +
>  include/hw/cxl/cxl_device.h |  23 +++
>  include/qemu/bswap.h|  23 +++
>  qapi/cxl.json   |  18 +++
>  7 files changed, 420 insertions(+), 42 deletions(-)
> 
> -- 
> 2.37.2
> 



Re: [PATCH] target/riscv: Fix checking of whether instruciton at 'pc_next' spans pages

2023-03-01 Thread Palmer Dabbelt

On Sun, 19 Feb 2023 23:27:32 PST (-0800), songsha...@eswincomputing.com wrote:

When a compressed instruction ends exactly on a page boundary, this bug
makes translation fall back to the main loop and generate a redundant
translation block containing a single instruction, which slows down the
running system.

Signed-off-by: Shaobo Song 
---
 target/riscv/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 772f9d7..8ffa211 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -1261,7 +1261,7 @@ static void riscv_tr_translate_insn(DisasContextBase 
*dcbase, CPUState *cpu)
 uint16_t next_insn = cpu_lduw_code(env, ctx->base.pc_next);
 int len = insn_len(next_insn);

-if (!is_same_page(&ctx->base, ctx->base.pc_next + len)) {
+if (!is_same_page(&ctx->base, ctx->base.pc_next + len - 1)) {
 ctx->base.is_jmp = DISAS_TOO_MANY;
 }
 }


Thanks, this is queued in riscv-to-apply.next .
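
The off-by-one in this fix can be reproduced with a small standalone model. This is a sketch under stated assumptions: a 4 KiB page size, and hypothetical helper names (same_page(), next_insn_spans_old(), next_insn_spans_fixed()) standing in for the translator's check. The old form tests the address one past the end of the next instruction; the fixed form tests the address of its last byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages. */
#define PAGE_MASK_4K (~(uint64_t)0xfff)

/* True if addr lies on the same page as page_start. */
static bool same_page(uint64_t page_start, uint64_t addr)
{
    return ((addr ^ page_start) & PAGE_MASK_4K) == 0;
}

/* Old check: looks at the first byte after the next instruction. */
static bool next_insn_spans_old(uint64_t page_start, uint64_t pc_next, int len)
{
    return !same_page(page_start, pc_next + len);
}

/* Fixed check: looks at the last byte of the next instruction. */
static bool next_insn_spans_fixed(uint64_t page_start, uint64_t pc_next, int len)
{
    return !same_page(page_start, pc_next + len - 1);
}
```

For a 2-byte instruction at 0x1ffe on the page starting at 0x1000, the old check reports a page crossing even though the instruction ends at 0x1fff; the fixed check does not. A 4-byte instruction at the same address is still reported as crossing by the fixed check.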



Re: [PATCH] RISC-V: XTheadMemPair: Remove register restrictions for store-pair

2023-03-01 Thread Palmer Dabbelt

On Mon, 20 Feb 2023 01:56:12 PST (-0800), christoph.muell...@vrull.eu wrote:

From: Christoph Müllner 

The XTheadMemPair does not define any restrictions for store-pair
instructions (th.sdd or th.swd). However, the current code enforces
the restrictions that are required for load-pair instructions.
Let's fix this by removing this code.

Signed-off-by: Christoph Müllner 
---
 target/riscv/insn_trans/trans_xthead.c.inc | 4 
 1 file changed, 4 deletions(-)

diff --git a/target/riscv/insn_trans/trans_xthead.c.inc 
b/target/riscv/insn_trans/trans_xthead.c.inc
index be87c34f56..cf1731b08d 100644
--- a/target/riscv/insn_trans/trans_xthead.c.inc
+++ b/target/riscv/insn_trans/trans_xthead.c.inc
@@ -980,10 +980,6 @@ static bool trans_th_lwud(DisasContext *ctx, arg_th_pair 
*a)
 static bool gen_storepair_tl(DisasContext *ctx, arg_th_pair *a, MemOp memop,
  int shamt)
 {
-if (a->rs == a->rd1 || a->rs == a->rd2 || a->rd1 == a->rd2) {
-return false;
-}
-
 TCGv data1 = get_gpr(ctx, a->rd1, EXT_NONE);
 TCGv data2 = get_gpr(ctx, a->rd2, EXT_NONE);
 TCGv addr1 = tcg_temp_new();


Thanks, this is queued in riscv-to-apply.next .
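
The distinction this patch draws can be sketched as two register-validity predicates. The function names are hypothetical; the load-pair rule mirrors the condition that the patch removes from the store-pair path, and the store-pair rule reflects the commit message's statement that no restriction is defined.

```c
#include <assert.h>
#include <stdbool.h>

/* Load-pair: the base register and both destinations must be distinct. */
static bool loadpair_regs_ok(int rs, int rd1, int rd2)
{
    return !(rs == rd1 || rs == rd2 || rd1 == rd2);
}

/*
 * Store-pair: no register restrictions, so every combination is
 * accepted. The registers are only read, never written.
 */
static bool storepair_regs_ok(int rs, int rd1, int rd2)
{
    (void)rs;
    (void)rd1;
    (void)rd2;
    return true;
}
```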



Re: Re: [PATCH v5 07/12] hmp: add cryptodev info command

2023-03-01 Thread zhenwei pi




On 3/1/23 18:47, Dr. David Alan Gilbert wrote:

* zhenwei pi (pizhen...@bytedance.com) wrote:

Example of this command:
  # virsh qemu-monitor-command vm --hmp info cryptodev
cryptodev1: service=[akcipher|mac|hash|cipher]
 queue 0: type=builtin
cryptodev0: service=[akcipher]
 queue 0: type=lkcf

Signed-off-by: zhenwei pi 


Yes, I think that's fine from HMP; you might want to use some of the
qapi list macros;


Acked-by: Dr. David Alan Gilbert 



Sorry, I missed this in the v6 series. I would prefer to make this minor
change in a follow-up patch, or in the next version (if there is any problem
with the v6 version).


Thanks!


---
  backends/cryptodev-hmp-cmds.c | 54 +++
  backends/meson.build  |  1 +
  hmp-commands-info.hx  | 14 +
  include/monitor/hmp.h |  1 +
  4 files changed, 70 insertions(+)
  create mode 100644 backends/cryptodev-hmp-cmds.c

diff --git a/backends/cryptodev-hmp-cmds.c b/backends/cryptodev-hmp-cmds.c
new file mode 100644
index 00..4f7220bb13
--- /dev/null
+++ b/backends/cryptodev-hmp-cmds.c
@@ -0,0 +1,54 @@
+/*
+ * HMP commands related to cryptodev
+ *
+ * Copyright (c) 2023 Bytedance.Inc
+ *
+ * Authors:
+ *zhenwei pi
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.
+ */
+
+#include "qemu/osdep.h"
+#include "monitor/hmp.h"
+#include "monitor/monitor.h"
+#include "qapi/qapi-commands-cryptodev.h"
+#include "qapi/qmp/qdict.h"
+
+
+void hmp_info_cryptodev(Monitor *mon, const QDict *qdict)
+{
+QCryptodevInfoList *il;
+QCryptodevBackendServiceTypeList *sl;
+QCryptodevBackendClientList *cl;
+
+for (il = qmp_query_cryptodev(NULL); il; il = il->next) {
+g_autofree char *services = NULL;
+QCryptodevInfo *info = il->value;
+char *tmp_services;
+
+/* build a string like 'service=[akcipher|mac|hash|cipher]' */
+for (sl = info->service; sl; sl = sl->next) {
+const char *service = QCryptodevBackendServiceType_str(sl->value);
+
+if (!services) {
+services = g_strdup(service);
+} else {
+tmp_services = g_strjoin("|", services, service, NULL);
+g_free(services);
+services = tmp_services;
+}
+}
+monitor_printf(mon, "%s: service=[%s]\n", info->id, services);
+
+for (cl = info->client; cl; cl = cl->next) {
+QCryptodevBackendClient *client = cl->value;
+monitor_printf(mon, "queue %" PRIu32 ": type=%s\n",
+   client->queue,
+   QCryptodevBackendType_str(client->type));
+}
+}
+
+qapi_free_QCryptodevInfoList(il);
+}
diff --git a/backends/meson.build b/backends/meson.build
index 954e658b25..b369e0a9d0 100644
--- a/backends/meson.build
+++ b/backends/meson.build
@@ -1,5 +1,6 @@
  softmmu_ss.add([files(
'cryptodev-builtin.c',
+  'cryptodev-hmp-cmds.c',
'cryptodev.c',
'hostmem-ram.c',
'hostmem.c',
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 754b1e8408..47d63d26db 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -993,3 +993,17 @@ SRST
``info virtio-queue-element`` *path* *queue* [*index*]
  Display element of a given virtio queue
  ERST
+
+{
+.name   = "cryptodev",
+.args_type  = "",
+.params = "",
+.help   = "show the crypto devices",
+.cmd= hmp_info_cryptodev,
+.flags  = "p",
+},
+
+SRST
+  ``info cryptodev``
+Show the crypto devices.
+ERST
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 2220f14fc9..e6cf0b7aa7 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -178,5 +178,6 @@ void hmp_ioport_read(Monitor *mon, const QDict *qdict);
  void hmp_ioport_write(Monitor *mon, const QDict *qdict);
  void hmp_boot_set(Monitor *mon, const QDict *qdict);
  void hmp_info_mtree(Monitor *mon, const QDict *qdict);
+void hmp_info_cryptodev(Monitor *mon, const QDict *qdict);
  
  #endif

--
2.34.1



--
zhenwei pi
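
For reference, the 'akcipher|mac|hash|cipher' string that the patch builds
with g_strjoin() can be sketched without GLib. This is only an illustration
of the joining step; join_services() and its parameters are hypothetical and
not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Joins n names with '|' into out, which has room for cap bytes.
 * Returns 0 on success, -1 if the result (plus terminator) does not fit.
 */
static int join_services(const char *const *names, size_t n,
                         char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(names[i]);
        size_t sep = (i > 0) ? 1 : 0;

        /* Need room for separator, name and the final terminator. */
        if (used + sep + len + 1 > cap) {
            return -1;
        }
        if (sep) {
            out[used++] = '|';
        }
        memcpy(out + used, names[i], len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

A fixed-size buffer avoids the repeated allocate-and-free of the loop in the
patch, at the cost of having to pick a capacity up front.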



Re: [PATCH v2 05/18] target/riscv: gdbstub: Do not generate CSR XML if Zicsr is disabled

2023-03-01 Thread Palmer Dabbelt

On Wed, 01 Mar 2023 16:30:52 PST (-0800), Bin Meng wrote:

On Thu, Mar 2, 2023 at 7:43 AM Palmer Dabbelt  wrote:


On Wed, 01 Mar 2023 01:55:34 PST (-0800), Bin Meng wrote:
> On Wed, Mar 1, 2023 at 5:52 PM LIU Zhiwei  
wrote:
>>
>>
>> On 2023/2/28 18:40, Bin Meng wrote:
>> > There is no need to generate the CSR XML if the Zicsr extension
>> > is not enabled.
>>
>> Should we generate the FPU XML or Vector XML when Zicsr is not enabled?
>
> Good point. I think we should disable that too.

Seems reasonable.  Did you want to do that as part of a v3, or just as a
follow-on fix?



I looked at this further.

The FPU / Vector XML is guarded by the " env->misa_ext" check. If
Zicsr is disabled while F or V extension is off, QEMU will error out
in riscv_cpu_realize() earlier before the gdbstub init.

So current patch should be fine.


There's a merge conflict that git auto-resolved as

diff --cc target/riscv/csr.c
index a1ecf62305,3a7e0217e2..a2cf3536f0
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@@ -90,10 -53,10 +53,9 @@@ static RISCVException fs(CPURISCVState 
 
 static RISCVException vs(CPURISCVState *env, int csrno)

 {
- CPUState *cs = env_cpu(env);
- RISCVCPU *cpu = RISCV_CPU(cs);
+ RISCVCPU *cpu = env_archcpu(env);
 
-if (env->misa_ext & RVV ||

-cpu->cfg.ext_zve32f || cpu->cfg.ext_zve64f) {
+if (cpu->cfg.ext_zve32f) {
 #if !defined(CONFIG_USER_ONLY)
 if (!env->debugger && !riscv_cpu_vector_enabled(env)) {
 return RISCV_EXCP_ILLEGAL_INST;

which looks correct to me.  It's passing my tests and queued up, but LMK if
something looks wrong.

Thanks!



Re: [PATCH 14/33] tests: acpi: update expected blobs

2023-03-01 Thread Michael S. Tsirkin
On Fri, Feb 24, 2023 at 04:37:53PM +0100, Igor Mammedov wrote:
> only following context change:
>  -  Local1 = Zero
> If ((Arg0 != ToUUID ("e5c937d0-3553-4d7a-9117-ea4d19c3434d") /* Device 
> Labeling Interface */))
> {
> Return (Local0)
>  ...
> Return (Local0)
> }
> 
>  +  Local1 = Zero
> Local2 = AIDX (DerefOf (Arg4 [Zero]), DerefOf (Arg4 [One]
> 
> Signed-off-by: Igor Mammedov 
> ---
>  tests/qtest/bios-tables-test-allowed-diff.h   |  35 --
>  tests/data/acpi/pc/DSDT   | Bin 6360 -> 6360 bytes
>  tests/data/acpi/pc/DSDT.acpierst  | Bin 6283 -> 6283 bytes
>  tests/data/acpi/pc/DSDT.acpihmat  | Bin 7685 -> 7685 bytes
>  tests/data/acpi/pc/DSDT.bridge| Bin 12487 -> 12487 bytes
>  tests/data/acpi/pc/DSDT.cphp  | Bin 6824 -> 6824 bytes
>  tests/data/acpi/pc/DSDT.dimmpxm   | Bin 8014 -> 8014 bytes
>  tests/data/acpi/pc/DSDT.hpbridge  | Bin 6323 -> 6323 bytes
>  tests/data/acpi/pc/DSDT.ipmikcs   | Bin 6432 -> 6432 bytes
>  tests/data/acpi/pc/DSDT.memhp | Bin 7719 -> 7719 bytes
>  tests/data/acpi/pc/DSDT.nohpet| Bin 6218 -> 6218 bytes
>  tests/data/acpi/pc/DSDT.numamem   | Bin 6366 -> 6366 bytes
>  tests/data/acpi/pc/DSDT.roothp| Bin 9745 -> 9745 bytes
>  tests/data/acpi/q35/DSDT  | Bin 8252 -> 8252 bytes
>  tests/data/acpi/q35/DSDT.acpierst | Bin 8269 -> 8269 bytes
>  tests/data/acpi/q35/DSDT.acpihmat | Bin 9577 -> 9577 bytes
>  tests/data/acpi/q35/DSDT.acpihmat-noinitiator | Bin 8531 -> 8531 bytes
>  tests/data/acpi/q35/DSDT.applesmc | Bin 8298 -> 8298 bytes
>  tests/data/acpi/q35/DSDT.bridge   | Bin 11481 -> 11481 bytes
>  tests/data/acpi/q35/DSDT.core-count2  | Bin 32392 -> 32392 bytes
>  tests/data/acpi/q35/DSDT.cphp | Bin 8716 -> 8716 bytes
>  tests/data/acpi/q35/DSDT.cxl  | Bin 9578 -> 9578 bytes
>  tests/data/acpi/q35/DSDT.dimmpxm  | Bin 9906 -> 9906 bytes
>  tests/data/acpi/q35/DSDT.ipmibt   | Bin 8327 -> 8327 bytes
>  tests/data/acpi/q35/DSDT.ipmismbus| Bin 8340 -> 8340 bytes
>  tests/data/acpi/q35/DSDT.ivrs | Bin 8269 -> 8269 bytes
>  tests/data/acpi/q35/DSDT.memhp| Bin 9611 -> 9611 bytes
>  tests/data/acpi/q35/DSDT.mmio64   | Bin 9382 -> 9382 bytes
>  tests/data/acpi/q35/DSDT.multi-bridge | Bin 12545 -> 12545 bytes
>  tests/data/acpi/q35/DSDT.nohpet   | Bin 8110 -> 8110 bytes
>  tests/data/acpi/q35/DSDT.numamem  | Bin 8258 -> 8258 bytes
>  tests/data/acpi/q35/DSDT.pvpanic-isa  | Bin 8353 -> 8353 bytes
>  tests/data/acpi/q35/DSDT.tis.tpm12| Bin 8858 -> 8858 bytes
>  tests/data/acpi/q35/DSDT.tis.tpm2 | Bin 8884 -> 8884 bytes
>  tests/data/acpi/q35/DSDT.viot | Bin 9361 -> 9377 bytes

this one is unfortunately malformed, and tests fail if I apply it.


>  tests/data/acpi/q35/DSDT.xapic| Bin 35615 -> 35615 bytes
>  36 files changed, 35 deletions(-)
> 
> diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
> b/tests/qtest/bios-tables-test-allowed-diff.h
> index 7e7745db39..dfb8523c8b 100644
> --- a/tests/qtest/bios-tables-test-allowed-diff.h
> +++ b/tests/qtest/bios-tables-test-allowed-diff.h
> @@ -1,36 +1 @@
>  /* List of comma-separated changed AML files to ignore */
> -"tests/data/acpi/pc/DSDT",
> -"tests/data/acpi/pc/DSDT.acpierst",
> -"tests/data/acpi/pc/DSDT.acpihmat",
> -"tests/data/acpi/pc/DSDT.bridge",
> -"tests/data/acpi/pc/DSDT.cphp",
> -"tests/data/acpi/pc/DSDT.dimmpxm",
> -"tests/data/acpi/pc/DSDT.hpbridge",
> -"tests/data/acpi/pc/DSDT.ipmikcs",
> -"tests/data/acpi/pc/DSDT.memhp",
> -"tests/data/acpi/pc/DSDT.nohpet",
> -"tests/data/acpi/pc/DSDT.numamem",
> -"tests/data/acpi/pc/DSDT.roothp",
> -"tests/data/acpi/q35/DSDT",
> -"tests/data/acpi/q35/DSDT.acpierst",
> -"tests/data/acpi/q35/DSDT.acpihmat",
> -"tests/data/acpi/q35/DSDT.acpihmat-noinitiator",
> -"tests/data/acpi/q35/DSDT.applesmc",
> -"tests/data/acpi/q35/DSDT.bridge",
> -"tests/data/acpi/q35/DSDT.core-count2",
> -"tests/data/acpi/q35/DSDT.cphp",
> -"tests/data/acpi/q35/DSDT.cxl",
> -"tests/data/acpi/q35/DSDT.dimmpxm",
> -"tests/data/acpi/q35/DSDT.ipmibt",
> -"tests/data/acpi/q35/DSDT.ipmismbus",
> -"tests/data/acpi/q35/DSDT.ivrs",
> -"tests/data/acpi/q35/DSDT.memhp",
> -"tests/data/acpi/q35/DSDT.mmio64",
> -"tests/data/acpi/q35/DSDT.multi-bridge",
> -"tests/data/acpi/q35/DSDT.nohpet",
> -"tests/data/acpi/q35/DSDT.numamem",
> -"tests/data/acpi/q35/DSDT.pvpanic-isa",
> -"tests/data/acpi/q35/DSDT.tis.tpm12",
> -"tests/data/acpi/q35/DSDT.tis.tpm2",
> -"tests/data/acpi/q35/DSDT.viot",
> -"tests/data/acpi/q35/DSDT.xapic",
> diff --git a/tests/data/acpi/pc/DSDT b/tests/data/acpi/pc/DSDT
> index 
> 

Re: [PATCH] [PATCH] disas/riscv Fix ctzw disassemble

2023-03-01 Thread Palmer Dabbelt

On Fri, 17 Feb 2023 07:45:14 PST (-0800), dbarb...@ventanamicro.com wrote:



On 2/17/23 12:14, Ivan Klokov wrote:

Due to typo in opcode list, ctzw is disassembled as clzw instruction.



The code was added by 02c1b569a15b4b06a so I believe a "Fixes:" tag is in
order:

Fixes: 02c1b569a15b ("disas/riscv: Add Zb[abcs] instructions")


Signed-off-by: Ivan Klokov 
---
  disas/riscv.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/disas/riscv.c b/disas/riscv.c
index ddda687c13..d0639cd047 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -1644,7 +1644,7 @@ const rv_opcode_data opcode_data[] = {
  { "minu", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
  { "max", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
  { "maxu", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
-{ "clzw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
+{ "ctzw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
  { "clzw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },



Does the order matter here? This patch is putting ctzw before clzw, but 20 lines
or so before we have "clz" after "ctz".


IIUC the ordering does matter: the values in rv_op_* need to match the 
index of opcode_data[].  decode_inst_opcode() fills out rv_op_*, and 
then the various decode bits (with format_inst() being the most relevant 
as it looks at the name field).
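
The index dependency described above can be sketched as follows. The enum
values and table here are hypothetical and much shorter than the real ones;
designated initializers are used only to make the index explicit, whereas the
table in the patch is positional.

```c
#include <stddef.h>

/* Hypothetical opcode numbers; the real enum is rv_op_* in disas/riscv.c. */
enum sketch_op {
    SK_OP_MAXU,
    SK_OP_CLZW,
    SK_OP_CTZW,
    SK_OP_CPOPW,
    SK_OP_COUNT
};

struct sketch_opcode_data {
    const char *name;
};

/* Entry i must describe opcode number i, because lookups index by opcode. */
static const struct sketch_opcode_data sketch_table[SK_OP_COUNT] = {
    [SK_OP_MAXU]  = { "maxu" },
    [SK_OP_CLZW]  = { "clzw" },
    [SK_OP_CTZW]  = { "ctzw" },
    [SK_OP_CPOPW] = { "cpopw" },
};

/* Returns the mnemonic for op, or NULL if op is out of range. */
static const char *sketch_op_name(enum sketch_op op)
{
    return ((unsigned)op < SK_OP_COUNT) ? sketch_table[op].name : NULL;
}
```

In a positional table, putting "ctzw" in the slot that the decoder reaches
for clzw would make every clzw print as ctzw, which is why the position of
the changed line matters and not just its content.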


So unless I'm missing something, the correct patch should look like

   diff --git a/disas/riscv.c b/disas/riscv.c
   index ddda687c13..544558 100644
   --- a/disas/riscv.c
   +++ b/disas/riscv.c
   @@ -1645,7 +1645,7 @@ const rv_opcode_data opcode_data[] = {
{ "max", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
{ "maxu", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
{ "clzw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
   -{ "clzw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
   +{ "ctzw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
{ "cpopw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
{ "slli.uw", rv_codec_i_sh5, rv_fmt_rd_rs1_imm, NULL, 0, 0, 0 },
{ "add.uw", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },

The threading seems to have gotten a little screwed up with the v2 so sorry if
I missed something, but I didn't see one with the ordering changed.  I stuck
what I think is a correct patch over at
,
LMK if that's OK (or just send a v3).


If the order doesn't matter I think it would be nice to put ctzw after clzw.



Thanks,


Daniel


  { "cpopw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
  { "slli.uw", rv_codec_i_sh5, rv_fmt_rd_rs1_imm, NULL, 0, 0, 0 },




Re: [PATCH v2 05/18] target/riscv: gdbstub: Do not generate CSR XML if Zicsr is disabled

2023-03-01 Thread Bin Meng
On Thu, Mar 2, 2023 at 7:43 AM Palmer Dabbelt  wrote:
>
> On Wed, 01 Mar 2023 01:55:34 PST (-0800), Bin Meng wrote:
> > On Wed, Mar 1, 2023 at 5:52 PM LIU Zhiwei  
> > wrote:
> >>
> >>
> >> On 2023/2/28 18:40, Bin Meng wrote:
> >> > There is no need to generate the CSR XML if the Zicsr extension
> >> > is not enabled.
> >>
> >> Should we generate the FPU XML or Vector XML when Zicsr is not enabled?
> >
> > Good point. I think we should disable that too.
>
> Seems reasonable.  Did you want to do that as part of a v3, or just as a
> follow-on fix?
>

I looked at this further.

The FPU / Vector XML is guarded by the " env->misa_ext" check. If
Zicsr is disabled while F or V extension is off, QEMU will error out
in riscv_cpu_realize() earlier before the gdbstub init.

So current patch should be fine.

Regards,
Bin



Re: [PATCH 14/33] tests: acpi: update expected blobs

2023-03-01 Thread Michael S. Tsirkin
On Fri, Feb 24, 2023 at 04:37:53PM +0100, Igor Mammedov wrote:
> only following context change:
>  -  Local1 = Zero
> If ((Arg0 != ToUUID ("e5c937d0-3553-4d7a-9117-ea4d19c3434d") /* Device 
> Labeling Interface */))
> {
> Return (Local0)
>  ...
> Return (Local0)
> }
> 
>  +  Local1 = Zero
> Local2 = AIDX (DerefOf (Arg4 [Zero]), DerefOf (Arg4 [One]
> 
> Signed-off-by: Igor Mammedov 

Nope:

diff -ru -N -IDisassembly -IChecksum '-I* Length   ' 
old/asl/tests/data/acpi/q35/DSDT.viot.dsl 
new/asl/tests/data/acpi/q35/DSDT.viot.dsl
:--- old/asl/tests/data/acpi/q35/DSDT.viot.dsl  2023-03-01 19:22:57.636454958 
-0500
:+++ new/asl/tests/data/acpi/q35/DSDT.viot.dsl  2023-03-01 19:22:58.451460462 
-0500
:@@ -148,7 +148,6 @@
 {
  0x00 // .
 }
-Local1 = Zero
 If ((Arg0 != ToUUID ("e5c937d0-3553-4d7a-9117-ea4d19c3434d") 
/* Device Labeling Interface */))
 {
 Return (Local0)
:@@ -159,12 +158,14 @@
 Return (Local0)
 }
 
+Local1 = Zero
 Local2 = AIDX (DerefOf (Arg4 [Zero]), DerefOf (Arg4 [One]
 ))
 If (!((Local2 == Zero) | (Local2 == 0x)))
 {
 Local1 |= One
 Local1 |= (One << 0x07)
+Local1 |= (One << 0x05)
 }
 
 Local0 [Zero] = Local1


and the funny thing is, the second change is in the expected
file but not in the code so this patch causes the test to fail.
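
To make the mismatch concrete, the two variants of Local1 from the listing
above can be written out in C. The function names are hypothetical, and the
meaning of the individual bits is not stated in this thread, so the sketch
only shows the arithmetic.

```c
#include <stdint.h>

/* Local1 as computed by the lines that are context in the diff above. */
static uint32_t local1_without_bit5(void)
{
    uint32_t local1 = 0;        /* Local1 = Zero */

    local1 |= 1u;               /* Local1 |= One */
    local1 |= (1u << 0x07);     /* Local1 |= (One << 0x07) */
    return local1;
}

/* Local1 including the extra line that only the new expected blob has. */
static uint32_t local1_with_bit5(void)
{
    return local1_without_bit5() | (1u << 0x05);   /* One << 0x05 */
}
```

The first variant yields 0x81 and the second 0xA1, so a table generated
without the extra OR cannot match an expected blob that contains it.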


> ---
>  tests/qtest/bios-tables-test-allowed-diff.h   |  35 --
>  tests/data/acpi/pc/DSDT   | Bin 6360 -> 6360 bytes
>  tests/data/acpi/pc/DSDT.acpierst  | Bin 6283 -> 6283 bytes
>  tests/data/acpi/pc/DSDT.acpihmat  | Bin 7685 -> 7685 bytes
>  tests/data/acpi/pc/DSDT.bridge| Bin 12487 -> 12487 bytes
>  tests/data/acpi/pc/DSDT.cphp  | Bin 6824 -> 6824 bytes
>  tests/data/acpi/pc/DSDT.dimmpxm   | Bin 8014 -> 8014 bytes
>  tests/data/acpi/pc/DSDT.hpbridge  | Bin 6323 -> 6323 bytes
>  tests/data/acpi/pc/DSDT.ipmikcs   | Bin 6432 -> 6432 bytes
>  tests/data/acpi/pc/DSDT.memhp | Bin 7719 -> 7719 bytes
>  tests/data/acpi/pc/DSDT.nohpet| Bin 6218 -> 6218 bytes
>  tests/data/acpi/pc/DSDT.numamem   | Bin 6366 -> 6366 bytes
>  tests/data/acpi/pc/DSDT.roothp| Bin 9745 -> 9745 bytes
>  tests/data/acpi/q35/DSDT  | Bin 8252 -> 8252 bytes
>  tests/data/acpi/q35/DSDT.acpierst | Bin 8269 -> 8269 bytes
>  tests/data/acpi/q35/DSDT.acpihmat | Bin 9577 -> 9577 bytes
>  tests/data/acpi/q35/DSDT.acpihmat-noinitiator | Bin 8531 -> 8531 bytes
>  tests/data/acpi/q35/DSDT.applesmc | Bin 8298 -> 8298 bytes
>  tests/data/acpi/q35/DSDT.bridge   | Bin 11481 -> 11481 bytes
>  tests/data/acpi/q35/DSDT.core-count2  | Bin 32392 -> 32392 bytes
>  tests/data/acpi/q35/DSDT.cphp | Bin 8716 -> 8716 bytes
>  tests/data/acpi/q35/DSDT.cxl  | Bin 9578 -> 9578 bytes
>  tests/data/acpi/q35/DSDT.dimmpxm  | Bin 9906 -> 9906 bytes
>  tests/data/acpi/q35/DSDT.ipmibt   | Bin 8327 -> 8327 bytes
>  tests/data/acpi/q35/DSDT.ipmismbus| Bin 8340 -> 8340 bytes
>  tests/data/acpi/q35/DSDT.ivrs | Bin 8269 -> 8269 bytes
>  tests/data/acpi/q35/DSDT.memhp| Bin 9611 -> 9611 bytes
>  tests/data/acpi/q35/DSDT.mmio64   | Bin 9382 -> 9382 bytes
>  tests/data/acpi/q35/DSDT.multi-bridge | Bin 12545 -> 12545 bytes
>  tests/data/acpi/q35/DSDT.nohpet   | Bin 8110 -> 8110 bytes
>  tests/data/acpi/q35/DSDT.numamem  | Bin 8258 -> 8258 bytes
>  tests/data/acpi/q35/DSDT.pvpanic-isa  | Bin 8353 -> 8353 bytes
>  tests/data/acpi/q35/DSDT.tis.tpm12| Bin 8858 -> 8858 bytes
>  tests/data/acpi/q35/DSDT.tis.tpm2 | Bin 8884 -> 8884 bytes
>  tests/data/acpi/q35/DSDT.viot | Bin 9361 -> 9377 bytes
>  tests/data/acpi/q35/DSDT.xapic| Bin 35615 -> 35615 bytes
>  36 files changed, 35 deletions(-)
> 
> diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
> b/tests/qtest/bios-tables-test-allowed-diff.h
> index 7e7745db39..dfb8523c8b 100644
> --- a/tests/qtest/bios-tables-test-allowed-diff.h
> +++ b/tests/qtest/bios-tables-test-allowed-diff.h
> @@ -1,36 +1 @@
>  /* List of comma-separated changed AML files to ignore */
> -"tests/data/acpi/pc/DSDT",
> -"tests/data/acpi/pc/DSDT.acpierst",
> -"tests/data/acpi/pc/DSDT.acpihmat",
> -"tests/data/acpi/pc/DSDT.bridge",
> -"tests/data/acpi/pc/DSDT.cphp",
> -"tests/data/acpi/pc/DSDT.dimmpxm",
> 

Re: [PATCH v2 10/20] vfio/common: Record DMA mapped IOVA ranges

2023-03-01 Thread Joao Martins
On 02/03/2023 00:07, Joao Martins wrote:
> On 28/02/2023 20:36, Alex Williamson wrote:

[...]

>> Can we make the same argument that the overhead is negligible if a VM
>> makes use of 10s of GB of virtio-mem with 2MB block size?
>>
>> But then on a 4KB host we're limited to 256 tracking entries, so
>> wasting all that time and space on a runtime IOVATree is even more
>> dubious.
>>
>> In fact, it doesn't really matter that vfio_listener_region_add and
>> this potentially new listener come to the same result, as long as the
>> new listener is a superset of the existing listener. 
> 
> I am trying to put this in a way that's not too ugly to reuse the most between
> vfio_listener_region_add() and the vfio_migration_mapping_add().
> 
> For you to have an idea, here's so far how it looks thus far:
> 
> https://github.com/jpemartins/qemu/commits/vfio-dirty-tracking
> 
> Particularly this one:
> 
> https://github.com/jpemartins/qemu/commit/3b11fa0e4faa0f9c0f42689a7367284a25d1b585
> 
> vfio_get_section_iova_range() is where most of these checks are that are sort 
> of
> a subset of the ones in vfio_listener_region_add().
> 
>> So I think we can
>> simplify out a lot of the places we'd see duplication and bugs.  I'm
>> not even really sure why we wouldn't simplify things further and only
>> record a single range covering the low and high memory marks for a
>> non-vIOMMU VMs, or potentially an approximation removing gaps of 1GB or
>> more, for example.  Thanks,
> 
> Yes, for Qemu, to have one single artificial range with a computed min IOVA 
> and
> max IOVA is the simplest to get it implemented. It would avoid us maintaining 
> an
> IOVATree as you would only track min/max pair (maybe max_below).
> 
> My concern with a reduced single range is 1) big holes in address space 
> leading
> to asking more than you need[*] and then 2) device dirty tracking limits e.g.
> hardware may have upper limits, so you may prematurely exercise those. So 
> giving
> more choice to the vfio drivers to decide how to cope with the mapped address
> space description looks to have a bit more longevity.
> 
> Anyway the temptation with having a single range is that this can all go away 
> if
> the vfio_listener_region_add() tracks just min/max IOVA pair.
> 
> Below scissors mark it's how this patch is looking like in the commit above
> while being a full list of mappings. It's also stored here:
> 
> https://github.com/jpemartins/qemu/commits/vfio-dirty-tracking
> 
> I'll respond here with a patch on what it looks like with the range watermark
> approach.
> 

... Which is here:

https://github.com/jpemartins/qemu/commits/vfio-dirty-tracking-range

And below the scissors mark at the end is this patch as it appears in that
series. It is smaller; most of the churn is the new checks. I need to adjust
the commit messages, depending on which way the group decides to go, so take
those with a grain of salt.
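
As a rough illustration of the range watermark idea discussed above (a single
min/max IOVA pair instead of a tree of mappings), here is a minimal sketch.
The struct and helpers are hypothetical and are not taken from either branch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker holding one covering range; max_iova is inclusive. */
struct iova_watermark {
    bool     valid;
    uint64_t min_iova;
    uint64_t max_iova;
};

/* Widen the covering range so that it includes [iova, iova + size). */
static void watermark_add(struct iova_watermark *w, uint64_t iova,
                          uint64_t size)
{
    uint64_t end;

    if (size == 0) {
        return;
    }
    end = iova + size - 1;
    if (!w->valid) {
        w->min_iova = iova;
        w->max_iova = end;
        w->valid = true;
        return;
    }
    if (iova < w->min_iova) {
        w->min_iova = iova;
    }
    if (end > w->max_iova) {
        w->max_iova = end;
    }
}

/* Number of bytes the covering range spans, holes included. */
static uint64_t watermark_span(const struct iova_watermark *w)
{
    return w->valid ? w->max_iova - w->min_iova + 1 : 0;
}
```

With 2 GiB mapped at 0 and another 2 GiB mapped at 4 GiB, the covering range
spans 6 GiB, which shows the concern about holes: the device would be asked
to track more address space than is actually mapped.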

> 
> [0] AMD 1T boundary is what comes to mind, which on Qemu relocates memory 
> above
> 4G into after 1T.

>8-

From: Joao Martins 
Date: Wed, 22 Feb 2023 19:49:05 +0200
Subject: [PATCH wip 7/12] vfio/common: Record DMA mapped IOVA ranges

According to the device DMA logging uAPI, IOVA ranges to be logged by
the device must be provided all at once upon DMA logging start.

As preparation for the following patches which will add device dirty
page tracking, keep a record of all DMA mapped IOVA ranges so later they
can be used for DMA logging start.

Note that when vIOMMU is enabled DMA mapped IOVA ranges are not tracked.
This is due to the dynamic nature of vIOMMU DMA mapping/unmapping.

Signed-off-by: Joao Martins 
Signed-off-by: Avihai Horon 
---
 hw/vfio/common.c  | 110 --
 hw/vfio/trace-events  |   1 +
 include/hw/vfio/vfio-common.h |   5 ++
 3 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 655e8dbb74d4..ff4a2aa0e14b 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -44,6 +44,7 @@
 #include "migration/blocker.h"
 #include "migration/qemu-file.h"
 #include "sysemu/tpm.h"
+#include "qemu/iova-tree.h"

 VFIOGroupList vfio_group_list =
 QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -426,6 +427,11 @@ void vfio_unblock_multiple_devices_migration(void)
 multiple_devices_migration_blocker = NULL;
 }

+static bool vfio_have_giommu(VFIOContainer *container)
+{
+return !QLIST_EMPTY(&container->giommu_list);
+}
+
 static void vfio_set_migration_error(int err)
 {
 MigrationState *ms = migrate_get_current();
@@ -610,6 +616,7 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr 
iova,
 .iova = iova,
 .size = size,
 };
+int ret;

 if (!readonly) {
 map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
@@ -626,8 +633,10 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr 
iova,
 return 0;
 }

+ret = -errno;
 error_report("VFIO_MAP_DMA failed: %s", strerror(errno));
-return -errno;
+

Re: [PATCH v2 10/20] vfio/common: Record DMA mapped IOVA ranges

2023-03-01 Thread Joao Martins
On 28/02/2023 20:36, Alex Williamson wrote:
> On Tue, 28 Feb 2023 12:11:06 +
> Joao Martins  wrote:
>> On 23/02/2023 21:50, Alex Williamson wrote:
>>> On Thu, 23 Feb 2023 21:19:12 +
>>> Joao Martins  wrote:  
>>>> On 23/02/2023 21:05, Alex Williamson wrote:
>>>>> On Thu, 23 Feb 2023 10:37:10 +
>>>>> Joao Martins  wrote:
>>>>>> On 22/02/2023 22:10, Alex Williamson wrote:
>>>>>>> On Wed, 22 Feb 2023 19:49:05 +0200
>>>>>>> Avihai Horon  wrote:
>>>>>>>> From: Joao Martins
>>>>>>>> @@ -612,6 +665,16 @@ static int vfio_dma_map(VFIOContainer *container,
>>>>>>>> hwaddr iova,
>>>>>>>>  .iova = iova,
>>>>>>>>  .size = size,
>>>>>>>>  };
>>>>>>>> +int ret;
>>>>>>>> +
>>>>>>>> +ret = vfio_record_mapping(container, iova, size, readonly);
>>>>>>>> +if (ret) {
>>>>>>>> +error_report("vfio: Failed to record mapping, iova: 0x%"
>>>>>>>> HWADDR_PRIx
>>>>>>>> + ", size: 0x" RAM_ADDR_FMT ", ret: %d (%s)",
>>>>>>>> + iova, size, ret, strerror(-ret));
>>>>>>>> +
>>>>>>>> +return ret;
>>>>>>>> +}
>>>>>>>
>>>>>>> Is there no way to replay the mappings when a migration is started?
>>>>>>> This seems like a horrible latency and bloat trade-off for the
>>>>>>> possibility that the VM might migrate and the device might support
>>>>>>> these features.  Our performance with vIOMMU is already terrible, I
>>>>>>> can't help but believe this makes it worse.  Thanks,
>>>>>>>
>>>>>>
>>>>>> It is a nop if the vIOMMU is being used (entries in
>>>>>> container->giommu_list) as
>>>>>> that uses a max-iova based IOVA range. So this is really for iommu
>>>>>> identity
>>>>>> mapping and no-VIOMMU.
>>>>>
>>>>> Ok, yes, there are no mappings recorded for any containers that have a
>>>>> non-empty giommu_list.
>>>>>
>>>>>> We could replay them if they were tracked/stored anywhere.
>>>>>
>>>>> Rather than piggybacking on vfio_memory_listener, why not simply
>>>>> register a new MemoryListener when migration is started?  That will
>>>>> replay all the existing ranges and allow tracking to happen separate
>>>>> from mapping, and only when needed.
>>>>
>>>> The problem with that is that *starting* dirty tracking needs to have all
>>>> the
>>>> range, we aren't supposed to start each range separately. So on a memory
>>>> listener callback you don't have introspection when you are dealing with
>>>> the
>>>> last range, do we?
>>>
>>> As soon as memory_listener_register() returns, all your callbacks to
>>> build the IOVATree have been called and you can act on the result the
>>> same as if you were relying on the vfio mapping MemoryListener.  I'm
>>> not seeing the problem.  Thanks,
>>>   
>>
>> While doing these changes, the nice thing of the current patch is that 
>> whatever
>> changes apply to vfio_listener_region_add() will be reflected in the mappings
>> tree that stores what we will dirty track. If we move the mappings 
>> calculation
>> necessary for dirty tracking only when we start, we will have to duplicate 
>> the
>> same checks, and open for bugs where we ask things to be dirty track-ed that
>> haven't been DMA mapped. These two aren't necessarily tied, but felt like I
>> should raise the potentially duplication of the checks (and the same thing
>> applies for handling virtio-mem and what not).
>>
>> I understand that if we were going to store *a lot* of mappings that this 
>> would
>> add up in space requirements. But for no-vIOMMU (or iommu=pt) case this is 
>> only
>> about 12 ranges or so; it is much simpler to piggyback on the existing listener.
>> Would you still want to move this to its own dedicated memory listener?
> 
> Code duplication and bugs are good points, but while typically we're
> only seeing a few handfuls of ranges, doesn't virtio-mem in particular
> allow that we could be seeing quite a lot more?
> 
Ugh yes, it could be.

> We used to be limited to a fairly small number of KVM memory slots,
> which effectively bounded non-vIOMMU DMA mappings, but that value is
> now 2^15, so we need to anticipate that we could see many more than a
> dozen mappings.
> 

Even with 32k memory slots available today, we still see only a handful in
practice. The hv-balloon and virtio-mem approaches, though, are the ones that
may stress such a limit, IIUC, prior to starting migration.

> Can we make the same argument that the overhead is negligible if a VM
> makes use of 10s of GB of virtio-mem with 2MB block size?
> 
> But then on a 4KB host we're limited to 256 tracking entries, so
> wasting all that time and space on a runtime IOVATree is even more
> dubious.
>
> In fact, it doesn't really matter that vfio_listener_region_add and
> this potentially new listener come to the same result, as long as the
> new listener is a superset of the existing listener. 

I am trying to put this in a way that's not too ugly to reuse the most between
vfio_listener_region_add() and the vfio_migration_mapping_add().

For you to have an 

Re: [RFC 22/52] riscv: Replace MachineState.smp access with topology helpers

2023-03-01 Thread Palmer Dabbelt

On Tue, 14 Feb 2023 18:57:35 PST (-0800), zhao1@linux.intel.com wrote:

On Tue, Feb 14, 2023 at 10:17:45AM +0800, Mi, Dapeng1 wrote:

Date: Tue, 14 Feb 2023 10:17:45 +0800
From: "Mi, Dapeng1" 
Subject: RE: [RFC 22/52] riscv: Replace MachineState.smp access with
 topology helpers

> From: Zhao Liu 
> Sent: Monday, February 13, 2023 5:50 PM
> To: Eduardo Habkost ; Marcel Apfelbaum
> ; Philippe Mathieu-Daud? ;
> Yanan Wang ; Michael S . Tsirkin
> ; Richard Henderson ; Paolo
> Bonzini ; Eric Blake ; Markus
> Armbruster 
> Cc: qemu-devel@nongnu.org; Wang, Zhenyu Z ; Mi,
> Dapeng1 ; Ding, Zhuocheng
> ; Robert Hoo ;
> Christopherson,, Sean ; Like Xu
> ; Liu, Zhao1 ; Meng, Bin
> ; Palmer Dabbelt ; Alistair
> Francis ; Vijai Kumar K 
> Subject: [RFC 22/52] riscv: Replace MachineState.smp access with topology
> helpers
>
> From: Zhao Liu 
>
> When MachineState.topo is introduced, the topology related structures
> become complicated. So we wrapped the access to topology fields of
> MachineState.topo into some helpers, and we are using these helpers
> to replace the use of MachineState.smp.
>
> In the codes of riscv, it's straightforward to replace topology access
> with wrapped generic interfaces.
>
> Cc: Bin Meng 
> Cc: Palmer Dabbelt 
> Cc: Alistair Francis 
> CC: Vijai Kumar K 
> Signed-off-by: Zhao Liu 
> ---
>  hw/riscv/microchip_pfsoc.c | 11 ++-
>  hw/riscv/numa.c| 21 +++--
>  hw/riscv/opentitan.c   |  8 
>  hw/riscv/shakti_c.c|  2 +-
>  hw/riscv/sifive_e.c| 10 ++
>  hw/riscv/sifive_u.c| 28 ++--
>  hw/riscv/virt.c| 24 +---
>  7 files changed, 55 insertions(+), 49 deletions(-)
>
> diff --git a/hw/riscv/microchip_pfsoc.c b/hw/riscv/microchip_pfsoc.c
> index 2b91e49561f1..30295cce17e7 100644
> --- a/hw/riscv/microchip_pfsoc.c
> +++ b/hw/riscv/microchip_pfsoc.c
> @@ -164,7 +164,8 @@ static void microchip_pfsoc_soc_instance_init(Object
> *obj)
>
>  object_initialize_child(OBJECT(&s->u_cluster), "u-cpus", &s->u_cpus,
>  TYPE_RISCV_HART_ARRAY);
> -qdev_prop_set_uint32(DEVICE(&s->u_cpus), "num-harts", ms->smp.cpus - 1);
> +qdev_prop_set_uint32(DEVICE(&s->u_cpus), "num-harts",
> + machine_topo_get_cpus(ms) - 1);
>  qdev_prop_set_uint32(DEVICE(&s->u_cpus), "hartid-base", 1);
>  qdev_prop_set_string(DEVICE(&s->u_cpus), "cpu-type",
>   TYPE_RISCV_CPU_SIFIVE_U54);
> @@ -249,10 +250,10 @@ static void microchip_pfsoc_soc_realize(DeviceState
> *dev, Error **errp)
>
>  /* CLINT */
>  riscv_aclint_swi_create(memmap[MICROCHIP_PFSOC_CLINT].base,
> -0, ms->smp.cpus, false);
> +0, machine_topo_get_cpus(ms), false);
>  riscv_aclint_mtimer_create(
>  memmap[MICROCHIP_PFSOC_CLINT].base + RISCV_ACLINT_SWI_SIZE,
> -RISCV_ACLINT_DEFAULT_MTIMER_SIZE, 0, ms->smp.cpus,
> +RISCV_ACLINT_DEFAULT_MTIMER_SIZE, 0, machine_topo_get_cpus(ms),
>  RISCV_ACLINT_DEFAULT_MTIMECMP, RISCV_ACLINT_DEFAULT_MTIME,
>  CLINT_TIMEBASE_FREQ, false);
>
> @@ -276,11 +277,11 @@ static void microchip_pfsoc_soc_realize(DeviceState
> *dev, Error **errp)
>  l2lim_mem);
>
>  /* create PLIC hart topology configuration string */
> -plic_hart_config = riscv_plic_hart_config_string(ms->smp.cpus);
> +plic_hart_config =
> riscv_plic_hart_config_string(machine_topo_get_cpus(ms));
>
>  /* PLIC */
>  s->plic = sifive_plic_create(memmap[MICROCHIP_PFSOC_PLIC].base,
> -plic_hart_config, ms->smp.cpus, 0,
> +plic_hart_config, machine_topo_get_cpus(ms), 0,
>  MICROCHIP_PFSOC_PLIC_NUM_SOURCES,
>  MICROCHIP_PFSOC_PLIC_NUM_PRIORITIES,
>  MICROCHIP_PFSOC_PLIC_PRIORITY_BASE,
> diff --git a/hw/riscv/numa.c b/hw/riscv/numa.c
> index 472010256183..1fabdc42e767 100644
> --- a/hw/riscv/numa.c
> +++ b/hw/riscv/numa.c
> @@ -37,13 +37,14 @@ int riscv_socket_count(const MachineState *ms)
>
>  int riscv_socket_first_hartid(const MachineState *ms, int socket_id)
>  {
> -int i, first_hartid = ms->smp.cpus;
> +int i, first_hartid, cpus = machine_topo_get_cpus(ms);
>
> +first_hartid = cpus;
>  if (!numa_enabled(ms)) {
>  return (!socket_id) ? 0 : -1;
>  }
>
> -for (i = 0; i < ms->smp.cpus; i++) {
> +for (i = 0; i < cpus; i++) {
>  if (ms->possible_cpus->cpus[i].props.node_id != socket_id) {
>  continue;
>  }
> @@ -52,18 +53,18 @@ int riscv_socket_first_hartid(const MachineState *ms,
> int socket_id)
>  }
>  }
>
> -return (first_hartid < ms->smp.cpus) ? first_hartid : -1;
> +return (first_hartid < cpus) ? first_hartid : -1;
>  }
>
>  int riscv_socket_last_hartid(const MachineState *ms, int socket_id)
>  {
> -int i, last_hartid = -1;
> +int i, last_hartid = -1, cpus = machine_topo_get_cpus(ms);
>
>  if (!numa_enabled(ms)) {
> -

Re: [PATCH v7 00/10] make write_misa a no-op and FEATURE_* cleanups

2023-03-01 Thread Palmer Dabbelt

On Wed, 22 Feb 2023 10:51:55 PST (-0800), dbarb...@ventanamicro.com wrote:

Hi,

In this version we gave up removing all the write_misa() body and,
instead, we went back to something closer to what we were doing in v2.
write_misa() is now gated behind an experimental x-misa-w cfg option,
defaulted to false.

The idea is that x-misa-w allows us to keep experimenting and testing the
code. Marking it as experimental will (hopefully) make users wary of the
fact that this feature is unstable. The expectation is that the flag will
be removed once write_misa() is ready to always write MISA.

Changes from v6:
- patches without reviews/acks: patch 3
- patch 2: taken from version 3, acks and r-bs preserved
- patch 3:
  - rename 'misa-w' to 'x-misa-w' to be clearer about our intents with
the cfg option
- v6 link: https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg05047.html

Daniel Henrique Barboza (10):
  target/riscv: introduce riscv_cpu_cfg()
  target/riscv: do not mask unsupported QEMU extensions in write_misa()
  target/riscv: allow MISA writes as experimental
  target/riscv: remove RISCV_FEATURE_DEBUG
  target/riscv/cpu.c: error out if EPMP is enabled without PMP
  target/riscv: remove RISCV_FEATURE_EPMP
  target/riscv: remove RISCV_FEATURE_PMP
  hw/riscv/virt.c: do not use RISCV_FEATURE_MMU in
create_fdt_socket_cpus()
  target/riscv: remove RISCV_FEATURE_MMU
  target/riscv/cpu: remove CPUArchState::features and friends

 hw/riscv/virt.c   |  7 ---
 target/riscv/cpu.c| 25 ++---
 target/riscv/cpu.h| 29 ++---
 target/riscv/cpu_helper.c |  6 +++---
 target/riscv/csr.c| 15 ++-
 target/riscv/machine.c| 11 ---
 target/riscv/monitor.c|  2 +-
 target/riscv/op_helper.c  |  2 +-
 target/riscv/pmp.c|  8 
 9 files changed, 39 insertions(+), 66 deletions(-)


I just queued this up, using the text from the v1 as that's more of a 
description of the patch set.  I think that text is still sufficiently 
accurate, but let me know if I missed anything.  Here's what I ended up 
with


   Merge patch series "make write_misa a no-op and FEATURE_* cleanups"
   
   Daniel Henrique Barboza  says:
   
   The RISCV_FEATURE_* enum and the CPUArchState::features attribute were
   introduced 4+ years ago, as a way to retrieve the enabled hart features
   that aren't represented via MISA CSR bits. Time passed on, and
   RISCVCPUConfig was introduced. With it, we now have a centralized way of
   reading all hart features that are enabled/disabled by the user and the
   board. All recent features are reading their corresponding cpu->cfg.X
   flag.

   All but the 5 features in the RISCV_FEATURE_* enum. These features are
   still operating in the same way: set it during riscv_cpu_realize() using
   their cpu->cfg value, read it using riscv_feature() when needed. There
   is nothing special about them in comparison with all the other features
   and extensions to justify this special handling.

   This series then is doing two things: first we're actually allowing
   users to write the MISA CSR if they so choose. Then we deprecate each
   RISCV_FEATURE_* usage until, in patch 11, we remove everything related to
   it. All 5 existing RISCV_FEATURE_* features will be handled as everyone
   else.

Thanks!



Re: [PATCH v2 00/14] target/riscv: Some updates to float point related extensions

2023-03-01 Thread Palmer Dabbelt

On Tue, 14 Feb 2023 18:05:25 PST (-0800), liwei...@iscas.ac.cn wrote:

Specification for Zv* extensions can be found in:

https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc

The port is available here:
https://github.com/plctlab/plct-qemu/tree/plct-zvfh-upstream-v2

v2:

* improve the error message for vector related check suggested by Daniel 
Henrique Barboza in patch 5
* add similar simplification for check in csr.c/cpu_helper.c in patch 8
* fix typos in commit messages


Weiwei Li (14):
  target/riscv: Fix the relationship between Zfhmin and Zfh
  target/riscv: Fix the relationship between Zhinxmin and Zhinx
  target/riscv: Simplify the check for Zfhmin and Zhinxmin
  target/riscv: Add cfg properties for Zv* extensions
  target/riscv: Fix relationship between V, Zve*, F and  D
  target/riscv: Add propertie check for Zvfh{min} extensions
  target/riscv: Indent fixes in cpu.c
  target/riscv: Simplify check for Zve32f and Zve64f
  target/riscv: Replace check for F/D to Zve32f/Zve64d in
trans_rvv.c.inc
  target/riscv: Remove rebundunt check for zve32f and zve64f
  target/riscv: Add support for Zvfh/zvfhmin extensions
  target/riscv: Fix check for vector load/store instructions when EEW=64
  target/riscv: Simplify check for EEW = 64 in trans_rvv.c.inc
  target/riscv: Expose properties for Zv* extensions

 target/riscv/cpu.c|  99 
 target/riscv/cpu.h|   3 +
 target/riscv/cpu_helper.c |   2 +-
 target/riscv/csr.c|   3 +-
 target/riscv/insn_trans/trans_rvv.c.inc   | 184 +++---
 target/riscv/insn_trans/trans_rvzfh.c.inc |  25 ++-
 6 files changed, 146 insertions(+), 170 deletions(-)


Thanks, I queued this up.  There were a few more spelling errors in the 
commit messages, I just cleaned those up while merging.




Re: [PATCH v2 05/18] target/riscv: gdbstub: Do not generate CSR XML if Zicsr is disabled

2023-03-01 Thread Palmer Dabbelt

On Wed, 01 Mar 2023 01:55:34 PST (-0800), Bin Meng wrote:

On Wed, Mar 1, 2023 at 5:52 PM LIU Zhiwei  wrote:



On 2023/2/28 18:40, Bin Meng wrote:
> There is no need to generate the CSR XML if the Zicsr extension
> is not enabled.

Should we generate the FPU XML or Vector XML when Zicsr is not enabled?


Good point. I think we should disable that too.


Seems reasonable.  Did you want to do that as part of a v3, or just as a 
follow-on fix?



Zhiwei



Regards,
Bin




Re: [PATCH v2 3/6] bswap: Add the ability to store to an unaligned 24 bit field

2023-03-01 Thread Fan Ni
On Mon, Feb 27, 2023 at 05:03:08PM +, Jonathan Cameron wrote:
> From: Ira Weiny 
> 
> CXL has 24 bit unaligned fields which need to be stored to.  CXL is
> specified as little endian.
> 
> Define st24_le_p() and the supporting functions to store such a field
> from a 32 bit host native value.
> 
> The use of b, w, l, q as the size specifier is limiting.  So "24" was
> used for the size part of the function name.
> 
> Signed-off-by: Ira Weiny 
> Signed-off-by: Jonathan Cameron 
> 

Reviewed-by: Fan Ni 

> ---
> v7:
>   - Pulled this patch out of the CXL events series as Ira pointed
> out it can be used to simplify this series.
> ---
>  include/qemu/bswap.h | 23 +++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
> index 15a78c0db5..ee71cbeaaa 100644
> --- a/include/qemu/bswap.h
> +++ b/include/qemu/bswap.h
> @@ -8,11 +8,23 @@
>  #undef  bswap64
>  #define bswap64(_x) __builtin_bswap64(_x)
>  
> +static inline uint32_t bswap24(uint32_t x)
> +{
> +return (((x & 0x000000ffU) << 16) |
> +((x & 0x0000ff00U) <<  0) |
> +((x & 0x00ff0000U) >> 16));
> +}
> +
>  static inline void bswap16s(uint16_t *s)
>  {
>  *s = __builtin_bswap16(*s);
>  }
>  
> +static inline void bswap24s(uint32_t *s)
> +{
> +*s = bswap24(*s);
> +}
> +
>  static inline void bswap32s(uint32_t *s)
>  {
>  *s = __builtin_bswap32(*s);
> @@ -176,6 +188,7 @@ CPU_CONVERT(le, 64, uint64_t)
>   * size is:
>   *   b: 8 bits
>   *   w: 16 bits
> + *   24: 24 bits
>   *   l: 32 bits
>   *   q: 64 bits
>   *
> @@ -248,6 +261,11 @@ static inline void stw_he_p(void *ptr, uint16_t v)
>  __builtin_memcpy(ptr, &v, sizeof(v));
>  }
>  
> +static inline void st24_he_p(void *ptr, uint32_t v)
> +{
> +__builtin_memcpy(ptr, &v, 3);
> +}
> +
>  static inline int ldl_he_p(const void *ptr)
>  {
>  int32_t r;
> @@ -297,6 +315,11 @@ static inline void stw_le_p(void *ptr, uint16_t v)
>  stw_he_p(ptr, le_bswap(v, 16));
>  }
>  
> +static inline void st24_le_p(void *ptr, uint32_t v)
> +{
> +st24_he_p(ptr, le_bswap(v, 24));
> +}
> +
>  static inline void stl_le_p(void *ptr, uint32_t v)
>  {
>  stl_he_p(ptr, le_bswap(v, 32));
> -- 
> 2.37.2
> 
> 
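
For readers following the thread, here is a minimal standalone sketch of what a 24-bit little-endian store has to do. It is not the code from the patch: store24_le() and swap24() are hypothetical names chosen for illustration, and the byte-by-byte store is one possible approach that does not depend on host endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: store the low 24 bits of v at ptr in
 * little-endian byte order, regardless of host endianness.
 */
void store24_le(void *ptr, uint32_t v)
{
    uint8_t bytes[3];

    assert(ptr != NULL);
    bytes[0] = (uint8_t)(v & 0xff);          /* least significant byte first */
    bytes[1] = (uint8_t)((v >> 8) & 0xff);
    bytes[2] = (uint8_t)((v >> 16) & 0xff);
    memcpy(ptr, bytes, sizeof(bytes));       /* ptr may be unaligned */
}

/*
 * Hypothetical helper: swap the two outer bytes of a 24-bit value and
 * keep the middle byte in place. Bits 24..31 of the input are dropped.
 */
uint32_t swap24(uint32_t x)
{
    return ((x & 0x000000ffU) << 16) |
           ((x & 0x0000ff00U) <<  0) |
           ((x & 0x00ff0000U) >> 16);
}
```

Only three bytes are written, so a neighbouring field that follows the 24-bit field in a packed structure is left untouched.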


Re: [PATCH v2 2/6] hw/cxl: Introduce cxl_device_get_timestamp() utility function

2023-03-01 Thread Fan Ni
On Mon, Feb 27, 2023 at 05:03:07PM +, Jonathan Cameron wrote:
> From: Ira Weiny 
> 
> There are new users of this functionality coming shortly so factor
> it out from the GET_TIMESTAMP mailbox command handling.
> 
> Signed-off-by: Ira Weiny 
> Signed-off-by: Jonathan Cameron 

Reviewed-by: Fan Ni 

> ---
>  hw/cxl/cxl-device-utils.c   | 15 +++
>  hw/cxl/cxl-mailbox-utils.c  | 11 +--
>  include/hw/cxl/cxl_device.h |  2 ++
>  3 files changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
> index 4c5e88aaf5..86e1cea8ce 100644
> --- a/hw/cxl/cxl-device-utils.c
> +++ b/hw/cxl/cxl-device-utils.c
> @@ -269,3 +269,18 @@ void cxl_device_register_init_common(CXLDeviceState 
> *cxl_dstate)
>  
>  cxl_initialize_mailbox(cxl_dstate);
>  }
> +
> +uint64_t cxl_device_get_timestamp(CXLDeviceState *cxl_dstate)
> +{
> +uint64_t time, delta;
> +uint64_t final_time = 0;
> +
> +if (cxl_dstate->timestamp.set) {
> +/* Find the delta from the last time the host set the time. */
> +time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> +delta = time - cxl_dstate->timestamp.last_set;
> +final_time = cxl_dstate->timestamp.host_set + delta;
> +}
> +
> +return final_time;
> +}
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 7b2aef0d67..702e16ca20 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -163,17 +163,8 @@ static CXLRetCode cmd_timestamp_get(struct cxl_cmd *cmd,
>  CXLDeviceState *cxl_dstate,
>  uint16_t *len)
>  {
> -uint64_t time, delta;
> -uint64_t final_time = 0;
> -
> -if (cxl_dstate->timestamp.set) {
> -/* First find the delta from the last time the host set the time. */
> -time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> -delta = time - cxl_dstate->timestamp.last_set;
> -final_time = cxl_dstate->timestamp.host_set + delta;
> -}
> +uint64_t final_time = cxl_device_get_timestamp(cxl_dstate);
>  
> -/* Then adjust the actual time */
>  stq_le_p(cmd->payload, final_time);
>  *len = 8;
>  
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index edb9791bab..02befda0f6 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -287,4 +287,6 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr 
> host_addr, uint64_t *data,
>  MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
>  unsigned size, MemTxAttrs attrs);
>  
> +uint64_t cxl_device_get_timestamp(CXLDeviceState *cxlds);
> +
>  #endif
> -- 
> 2.37.2
> 
> 
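
The timestamp arithmetic that the patch factors out can be sketched on its own as follows. This is an illustration only: TimestampState is a hypothetical stand-in for the timestamp fields of CXLDeviceState, and now_ns is passed in where the real code calls qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for the timestamp fields of CXLDeviceState. */
typedef struct {
    bool set;           /* has the host set the time at least once? */
    uint64_t last_set;  /* virtual clock value (ns) when the host set it */
    uint64_t host_set;  /* the value the host wrote (ns) */
} TimestampState;

/*
 * Return the device time: the host-provided value plus the virtual time
 * elapsed since it was set, or 0 if the host never set it.
 */
uint64_t device_timestamp(const TimestampState *ts, uint64_t now_ns)
{
    assert(ts != NULL);
    if (!ts->set) {
        return 0;
    }
    return ts->host_set + (now_ns - ts->last_set);
}
```

Taking the clock value as a parameter keeps the sketch free of QEMU dependencies and makes the delta calculation easy to check in isolation.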


Re: [PATCH v2 1/6] hw/cxl: rename mailbox return code type from ret_code to CXLRetCode

2023-03-01 Thread Fan Ni
On Mon, Feb 27, 2023 at 05:03:06PM +, Jonathan Cameron wrote:
> Given the increasing usage of this mailbox return code type, now
> is a good time to switch to QEMU style naming.
> 
> Reviewed-by: Ira Weiny 
> Signed-off-by: Jonathan Cameron 
> 

Reviewed-by: Fan Ni 

> --
> v7: (thanks to Ira Weiny for review)
> - Rename in place as the move to the header isn't needed for this series
>   That move patch will now be the start of the CXL events series that
>   will follow this one.
> ---
>  hw/cxl/cxl-mailbox-utils.c | 64 +++---
>  1 file changed, 32 insertions(+), 32 deletions(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index ed663cc04a..7b2aef0d67 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -23,7 +23,7 @@
>   * FOO= 0x7f,
>   *  #define BAR 0
>   *  2. Implement the handler
> - *static ret_code cmd_foo_bar(struct cxl_cmd *cmd,
> + *static CXLRetCode cmd_foo_bar(struct cxl_cmd *cmd,
>   *  CXLDeviceState *cxl_dstate, uint16_t 
> *len)
>   *  3. Add the command to the cxl_cmd_set[][]
>   *[FOO][BAR] = { "FOO_BAR", cmd_foo_bar, x, y },
> @@ -90,10 +90,10 @@ typedef enum {
>  CXL_MBOX_UNSUPPORTED_MAILBOX = 0x15,
>  CXL_MBOX_INVALID_PAYLOAD_LENGTH = 0x16,
>  CXL_MBOX_MAX = 0x17
> -} ret_code;
> +} CXLRetCode;
>  
>  struct cxl_cmd;
> -typedef ret_code (*opcode_handler)(struct cxl_cmd *cmd,
> +typedef CXLRetCode (*opcode_handler)(struct cxl_cmd *cmd,
> CXLDeviceState *cxl_dstate, uint16_t 
> *len);
>  struct cxl_cmd {
>  const char *name;
> @@ -105,16 +105,16 @@ struct cxl_cmd {
>  
>  #define DEFINE_MAILBOX_HANDLER_ZEROED(name, size) \
>  uint16_t __zero##name = size; \
> -static ret_code cmd_##name(struct cxl_cmd *cmd,   \
> -   CXLDeviceState *cxl_dstate, uint16_t *len) \
> +static CXLRetCode cmd_##name(struct cxl_cmd *cmd,   \
> + CXLDeviceState *cxl_dstate, uint16_t *len) \
>  { \
>  *len = __zero##name;  \
>  memset(cmd->payload, 0, *len);\
>  return CXL_MBOX_SUCCESS;  \
>  }
>  #define DEFINE_MAILBOX_HANDLER_NOP(name)  \
> -static ret_code cmd_##name(struct cxl_cmd *cmd,   \
> -   CXLDeviceState *cxl_dstate, uint16_t *len) \
> +static CXLRetCode cmd_##name(struct cxl_cmd *cmd,   \
> + CXLDeviceState *cxl_dstate, uint16_t *len) \
>  { \
>  return CXL_MBOX_SUCCESS;  \
>  }
> @@ -125,9 +125,9 @@ 
> DEFINE_MAILBOX_HANDLER_ZEROED(events_get_interrupt_policy, 4);
>  DEFINE_MAILBOX_HANDLER_NOP(events_set_interrupt_policy);
>  
>  /* 8.2.9.2.1 */
> -static ret_code cmd_firmware_update_get_info(struct cxl_cmd *cmd,
> - CXLDeviceState *cxl_dstate,
> - uint16_t *len)
> +static CXLRetCode cmd_firmware_update_get_info(struct cxl_cmd *cmd,
> +   CXLDeviceState *cxl_dstate,
> +   uint16_t *len)
>  {
>  struct {
>  uint8_t slots_supported;
> @@ -159,9 +159,9 @@ static ret_code cmd_firmware_update_get_info(struct 
> cxl_cmd *cmd,
>  }
>  
>  /* 8.2.9.3.1 */
> -static ret_code cmd_timestamp_get(struct cxl_cmd *cmd,
> -  CXLDeviceState *cxl_dstate,
> -  uint16_t *len)
> +static CXLRetCode cmd_timestamp_get(struct cxl_cmd *cmd,
> +CXLDeviceState *cxl_dstate,
> +uint16_t *len)
>  {
>  uint64_t time, delta;
>  uint64_t final_time = 0;
> @@ -181,7 +181,7 @@ static ret_code cmd_timestamp_get(struct cxl_cmd *cmd,
>  }
>  
>  /* 8.2.9.3.2 */
> -static ret_code cmd_timestamp_set(struct cxl_cmd *cmd,
> +static CXLRetCode cmd_timestamp_set(struct cxl_cmd *cmd,
>CXLDeviceState *cxl_dstate,
>uint16_t *len)
>  {
> @@ -201,9 +201,9 @@ static const QemuUUID cel_uuid = {
>  };
>  
>  /* 8.2.9.4.1 */
> -static ret_code cmd_logs_get_supported(struct cxl_cmd *cmd,
> -   CXLDeviceState *cxl_dstate,
> -   uint16_t *len)
> +static CXLRetCode cmd_logs_get_supported(struct cxl_cmd *cmd,
> +   

Re: [PATCH v5 5/7] hw/isa/vt82c686: Work around missing level sensitive irq in i8259 model

2023-03-01 Thread BALATON Zoltan

On Wed, 1 Mar 2023, David Woodhouse wrote:

On Wed, 2023-03-01 at 19:01 +0100, BALATON Zoltan wrote:



It isn't a *correct* fix without a little bit more typing, but does
this make it work?

diff --git a/hw/intc/i8259.c b/hw/intc/i8259.c
index 17910f3bcb..36ebcff025 100644
--- a/hw/intc/i8259.c
+++ b/hw/intc/i8259.c
@@ -246,6 +246,7 @@ static void pic_ioport_write(void *opaque, hwaddr addr64,
 if (val & 0x08) {
 qemu_log_mask(LOG_UNIMP,
   "i8259: level sensitive irq not supported\n");
+    s->elcr = 0xff;


This works too. I guess the log can then be removed too. Could you submit
a proper patch, or do you want me to do that, so we can drop the workaround
for it? Thanks for looking into it.



Happy for you to do the rest of the typing ... :)


I don't mind the typing but this is quite a bit more involved than I 
expected. You've lost me at the vmstate stuff which I don't quite know how 
to change or test. If these were stored as bits in an ISW1 register as 
described by the docs I've looked at now then no change in migration would 
be needed but this isn't how it seems to be in QEMU so I give up on that 
in this case. Could you please do the typing then?


Thank you,
BALATON Zoltan


So, *ideally* I think you need to introduce a new field in the
PICCommonState which records the status of the LTIM bit. And fix up the
vmstate_pic_common in hw/intc/i8259_common.c to save and restore that
(with versioning for upgrade/downgrade).

Then you find those places which currently check the bit for the
specific pin in s->elcr, and make them something like:

--- a/hw/intc/i8259.c
+++ b/hw/intc/i8259.c
@@ -133,7 +133,7 @@ static void pic_set_irq(void *opaque, int irq, int level)
}
#endif

-if (s->elcr & mask) {
+if (s->ltim || (s->elcr & mask)) {
/* level triggered */
if (level) {
s->irr |= mask;

It *might* be that you should make the LTIM behaviour optional, so that
only certain incarnations of the i8259 actually get it at all and it
*wouldn't* take effect if a guest tried to set it, which is what the
PIIX3 datasheet implies. But I suspect we can get away without that.
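
A rough standalone sketch of the suggestion above, for illustration only. PicSketch is a hypothetical, heavily reduced stand-in for PICCommonState, and the ltim field is the proposed addition, not something that exists in the current model. The sketch assumes the usual 8259 encoding where bit 4 of a command write marks ICW1 and bit 3 of ICW1 is LTIM; vmstate handling is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced PIC state; 'ltim' is the proposed new field. */
typedef struct {
    uint8_t elcr;  /* per-pin edge/level control register */
    bool ltim;     /* ICW1 LTIM bit: treat all pins as level triggered */
} PicSketch;

/* A pin is level triggered if LTIM is set or its ELCR bit is set. */
bool pin_is_level_triggered(const PicSketch *s, int irq)
{
    uint8_t mask;

    assert(irq >= 0 && irq < 8);
    mask = (uint8_t)(1u << irq);
    return s->ltim || (s->elcr & mask) != 0;
}

/* Command port write: record LTIM when the value is an ICW1. */
void write_command(PicSketch *s, uint8_t val)
{
    if (val & 0x10) {                   /* ICW1 */
        s->ltim = (val & 0x08) != 0;    /* LTIM */
    }
}
```

Keeping LTIM in its own field, instead of overwriting elcr with 0xff, means a later ICW1 with LTIM clear restores the per-pin ELCR settings.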



Re: [PATCH 3/5] qmp: Added the helper stamp check.

2023-03-01 Thread Toke Høiland-Jørgensen
Daniel P. Berrangé  writes:

> On Wed, Mar 01, 2023 at 03:53:47PM +0100, Toke Høiland-Jørgensen wrote:
>> Daniel P. Berrangé  writes:
>> 
>> > On Tue, Feb 28, 2023 at 11:21:56PM +0100, Toke Høiland-Jørgensen wrote:
>> >> Daniel P. Berrangé  writes:
>> >> 
>> >> > On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote:
>> >> >> Daniel P. Berrangé  writes:
>> >> >> 
>> >> >> Just to interject a note on this here: the skeleton code is mostly a
>> >> >> convenience feature used to embed BPF programs into the calling binary.
>> >> >> It is perfectly possible to just have the BPF object file itself reside
>> >> >> directly in the file system and just use the regular libbpf APIs to 
>> >> >> load
>> >> >> it. Some things get a bit more cumbersome (mostly setting values of
>> >> >> global variables, if the BPF program uses those).
>> >> >> 
>> >> >> So the JSON example above could just be a regular compiled-from-clang
>> >> >> BPF object file, and the management program can load that, inspect its
>> >> >> contents using the libbpf APIs and pass the file descriptors on to 
>> >> >> Qemu.
>> >> >> It's even possible to embed version information into this so that Qemu
>> >> >> can check if it understands the format and bail out if it doesn't - 
>> >> >> just
>> >> >> stick a version field in the configuration map as the first entry :)
>> >> >
>> >> > If all you have is the BPF object file is it possible to interrogate
>> >> > it to get a list of all the maps, and get FDs associated for them ?
>> >> > I had a look at the libbpf API and wasn't sure about that, it seemed
>> >> > like you had to know the required maps upfront ?  If it is possible
> >> >> > to auto-discover everything you need, solely from the BPF object file
>> >> > as input, then just dealing with that in isolation would feel simpler.
>> >> 
>> >> It is. You load the object file, and bpf_object__for_each_map() lets you
>> >> discover which maps it contains, with the different bpf_map__*() APIs
>> >> telling you the properties of that map (and you can modify them too
>> >> before loading the object if needed).
>> >> 
>> >> The only thing that's not in the object file is any initial data you
>> >> want to put into the map(s). But except for read-only maps that can be
>> >> added by userspace after loading the maps, so you could just let Qemu do
>> >> that...
>> >> 
> >> >> > It occurs to me that exposing the BPF program as data rather than
> >> >> > via binary will make it more practical to integrate this into KubeVirt's
> >> >> > architecture. In their deployment setup both QEMU and libvirt are
> >> >> > running unprivileged inside a container. For any advanced networking
>> >> > a completely separate component creates the TAP device and passes it
>> >> > into the container running QEMU. I don't think that the separate
>> >> > precisely matched helper binary would be something they can use, but
>> >> > it might be possible to expose a data file providing the BPF program
>> >> > blob and describing its maps.
>> >> 
>> >> Well, "a data file providing the BPF program blob and describing its
>> >> maps" is basically what a BPF .o file is. It just happens to be encoded
>> >> in ELF format :)
>> >> 
>> >> You can embed it into some other data structure and have libbpf load it
>> >> from a blob in memory as well as from the filesystem, though; that is
>> >> basically what the skeleton file does (notice the big character string
>> >> at the end, that's just the original .o file contents).
>> >
>> > Ok, in that case I'm really wondering why any of this helper program
>> > stuff was proposed. I recall the rationale was that it was impossible
>> > for an external program to load the BPF object on behalf of QEMU,
> >> > because it would not know how to do that without QEMU-specific
>> > knowledge.
>> 
> >> I'm not sure either. Were there some bits that initially needed to be set
>> before the program was loaded (read-only maps or something)? Also,
>> upstream does encourage the use of skeletons for embedding into
>> applications, so it's not an unreasonable thing to start with if you
>> don't have the kind of deployment constraints that Qemu does in this
>> case.
>> 
>> > It looks like we can simply expose the BPF object blob to mgmt apps
>> > directly and get rid of this helper program entirely.
>> 
>> I believe so, yes. You'd still need to be sure that the BPF object file
>> itself comes from a trusted place, but hopefully it should be enough to
>> load it from a known filesystem path? (Sorry if this is a stupid
>> question, I only have a fuzzy idea of how all the pieces fit together
>> here).
>
> It could be from a well known location on the filesystem, but might
> be better to make it possible to query it from QMP, which is mostly
> safe *provided* you've not yet started guest CPUs running. It could
> be queried at startup and then cached for future use.

Right, I don't have a strong opinion about the exact mechanism, just
wanted to convey a general "loading an 

Re: [PATCH v2 03/20] vfio/migration: Add VFIO migration pre-copy support

2023-03-01 Thread Alex Williamson
On Wed, 1 Mar 2023 17:12:51 -0400
Jason Gunthorpe  wrote:

> On Wed, Mar 01, 2023 at 12:55:59PM -0700, Alex Williamson wrote:
> 
> > So it seems like what we need here is both a preface buffer size and a
> > target device latency.  The QEMU pre-copy algorithm should factor both
> > the remaining data size and the device latency into deciding when to
> > transition to stop-copy, thereby allowing the device to feed actually
> > relevant data into the algorithm rather than dictate its behavior.  
> 
> I don't know that we can realistically estimate startup latency,
> especially having the sender estimate latency on the receiver.

Knowing that the target device is compatible with the source is a point
towards making an educated guess.

> I feel like trying to overlap the device start up with the STOP phase
> is an unnecessary optimization? How do you see it benefiting?

If we can't guarantee that there's some time difference between sending
initial bytes immediately at the end of pre-copy vs immediately at the
beginning of stop-copy, does that mean any handling of initial bytes is
an unnecessary optimization?

I'm imagining that completing initial bytes triggers some
initialization sequence in the target host driver which runs in
parallel to the remaining data stream, so in practice, even if sent at
the beginning of stop-copy, the target device gets a head start.

> I've been thinking of this from the perspective that we should always
> ensure device startup is completed, it is time that has to be paid,
> why pay it during STOP?

Creating a policy for QEMU to send initial bytes in a given phase
doesn't ensure startup is complete.  There's no guaranteed time
difference between sending that data and the beginning of stop-copy.

QEMU is trying to achieve a downtime goal, where it estimates network
bandwidth to get a data size threshold, and then polls devices for
remaining data.  That downtime goal might exceed the startup latency of
the target device anyway, where it's then the operator's choice to pay
that time in stop-copy, or stalled on the target.

But if we actually want to ensure startup of the target is complete,
then drivers should be able to return both data size and estimated time
for the target device to initialize.  That time estimate should be
updated by the driver based on if/when initial_bytes is drained.  The
decision whether to continue iterating pre-copy would then be based on
both the maximum remaining device startup time and the calculated time
based on remaining data size.

I think this provides a better guarantee than anything based simply on
transferring a given chunk of data in a specific phase of the process.
Thoughts?  Thanks,

Alex
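
As a rough illustration of the decision rule proposed above (and only one possible reading of it), the sketch below keeps iterating pre-copy while either the estimated transfer time of the remaining data or the largest remaining device startup time exceeds the downtime limit. DeviceEstimate and should_continue_precopy() are hypothetical names; neither field exists under these names in the VFIO uAPI or in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device report, modelling the proposal in this message. */
typedef struct {
    uint64_t remaining_bytes;  /* data still to be transferred */
    uint64_t startup_ns;       /* estimated remaining init time on the target */
} DeviceEstimate;

/*
 * Return true if pre-copy should keep iterating, false if switching to
 * stop-copy is expected to fit within the downtime limit.
 */
bool should_continue_precopy(const DeviceEstimate *devs, size_t n,
                             uint64_t bytes_per_sec,
                             uint64_t downtime_limit_ns)
{
    uint64_t total_bytes = 0;
    uint64_t max_startup_ns = 0;
    double transfer_ns;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        total_bytes += devs[i].remaining_bytes;
        if (devs[i].startup_ns > max_startup_ns) {
            max_startup_ns = devs[i].startup_ns;
        }
    }
    if (bytes_per_sec == 0) {
        return true;  /* no bandwidth estimate yet: keep iterating */
    }
    /* Time needed to send the remaining data at the estimated bandwidth. */
    transfer_ns = (double)total_bytes / (double)bytes_per_sec * 1e9;
    return transfer_ns > (double)downtime_limit_ns ||
           max_startup_ns > downtime_limit_ns;
}
```

In this reading the driver would lower startup_ns as initial_bytes is drained, so the second condition falls away on its own once the target has had its head start.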




RE: [PATCH] tcg: `reachable_code_pass()` remove empty else-branch

2023-03-01 Thread Taylor Simpson



> -Original Message-
> From: Anton Johansson 
> Sent: Wednesday, March 1, 2023 7:22 AM
> To: qemu-devel@nongnu.org
> Cc: a...@rev.ng; richard.hender...@linaro.org; Taylor Simpson
> 
> Subject: [PATCH] tcg: `reachable_code_pass()` remove empty else-branch
> 
> This patch extends reachable_code_pass() to also deal with empty else-
> branches of the form
> 
>   br $L0
>   set_label $L1
>   set_label $L0
> 
> converting them to
> 
>   set_label $L1
> 
> when $L0 is only referenced by the br op.  This type of empty-else branch will
> be emitted by idef-parser in the Hexagon frontend once CANCEL statements
> have been ignored.
> 
> Signed-off-by: Anton Johansson 
> ---
>  tcg/tcg.c | 41 ++---
>  1 file changed, 30 insertions(+), 11 deletions(-)
> 
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index a4a3da6804..531bc74231 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -2664,21 +2664,40 @@ static void reachable_code_pass(TCGContext *s)
>  dead = false;
>  remove = false;
> 
> -/*
> - * Optimization can fold conditional branches to 
> unconditional.
> - * If we find a label with one reference which is preceded by
> - * an unconditional branch to it, remove both.  This needed 
> to
> - * wait until the dead code in between them was removed.
> - */
> -if (label->refs == 1) {
> -TCGOp *op_prev = QTAILQ_PREV(op, link);

Can't we just insert a while loop here to move op_prev back across labels?

while (op_prev->opc == INDEX_op_set_label) {
    op_prev = QTAILQ_PREV(op_prev, link);
}

> -if (op_prev->opc == INDEX_op_br &&
> -label == arg_label(op_prev->args[0])) {

Also, here is the patch that exposes the need for this optimization
https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg07236.html

Thanks,
Taylor
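
To make the transformation under discussion concrete, here is a toy version on a flat array of ops. It is a sketch, not TCG code: Op, Opc and remove_empty_else() are invented for illustration, and the backward walk across labels follows the idea in the review comment above rather than the posted patch.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OP_BR, OP_SET_LABEL, OP_OTHER } Opc;

/* Hypothetical toy op; real TCG ops live in a linked list. */
typedef struct {
    Opc opc;
    int label;    /* label id for OP_BR and OP_SET_LABEL, unused otherwise */
    int removed;  /* nonzero once the pass has deleted this op */
} Op;

/*
 * For each set_label whose label has exactly one reference, walk back
 * across any directly preceding labels. If an unconditional branch to
 * that same label is found, the branch only skips labels, so both the
 * branch and the label are removed. refs[l] counts branches to label l.
 * Returns the number of ops removed.
 */
int remove_empty_else(Op *ops, size_t n, int *refs)
{
    int removed = 0;

    assert(ops != NULL && refs != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t j = i;

        if (ops[i].removed || ops[i].opc != OP_SET_LABEL ||
            refs[ops[i].label] != 1) {
            continue;
        }
        /* Skip backwards over labels and ops that were already removed. */
        while (j > 0 && (ops[j - 1].removed ||
                         ops[j - 1].opc == OP_SET_LABEL)) {
            j--;
        }
        if (j > 0 && ops[j - 1].opc == OP_BR &&
            ops[j - 1].label == ops[i].label) {
            ops[j - 1].removed = 1;
            ops[i].removed = 1;
            refs[ops[i].label] = 0;
            removed += 2;
        }
    }
    return removed;
}
```

Applied to "br $L0; set_label $L1; set_label $L0" with a single reference to $L0, only "set_label $L1" survives, which matches the result described in the commit message.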




Re: [PATCH v7 00/23] Consolidate PIIX south bridges

2023-03-01 Thread Michael S. Tsirkin
On Thu, Feb 23, 2023 at 05:25:23PM +, Bernhard Beschow wrote:
> Ping
> 
> Can we queue the piix3 part already? Now that the series doesn't introduce a 
> PIC proxy any more the piix3 part is essentially QOM cleanup.
> 
> Note that I cautiously dropped some Reviewed-by tags in the piix3 part as 
> well.
> 
> Best regards,
> Bernhard

This conflicts with ICH9 cleanup - I guess once that is merged you will
rebase right?

-- 
MST




Re: [PATCH][RESEND v3 1/3] hapvdimm: add a virtual DIMM device for memory hot-add protocols

2023-03-01 Thread Maciej S. Szmigiero

On 1.03.2023 18:24, David Hildenbrand wrote:
(...)

With virtio-mem one can simply have per-node virtio-mem devices.

2) I'm not sure what's the overhead of having, let's say, 1 TiB backing
memory device mostly marked madvise(MADV_DONTNEED).
Like, how much memory + swap this setup would actually consume - that's
something I would need to measure.


There are some WIP items to improve that (QEMU metadata (e.g., bitmaps), KVM 
metadata (e.g., per-memslot), Linux metadata (e.g., page tables)).
Memory overcommit handling also has to be tackled.

So it would be a "shared" problem with virtio-mem and will be sorted out 
eventually :)



Yes, but this might take a bit of time, especially if kernel-side changes
are involved - that's why I will check how this setup works in practice
in its current shape.

(...)

Reboot? Logically unplug all memory and as the guest boots up, re-add the 
memory after the guest booted up.

The only thing we can't do is the following: when going below 4G, we cannot 
resize boot memory.


But I recall that that's *exactly* how the HV version I played with ~2 years ago worked: always 
start up with some initial memory ("startup memory"). After the VM is up for some 
seconds, we either add more memory (requested > startup) or request the VM to inflate memory 
(requested < startup).


Hyper-V actually "cleans up" the guest memory map on reboot - if the
guest was effectively resized up then on reboot the guest boot memory is
resized up to match that last size.
Similarly, if the guest was ballooned out - that amount of memory is
removed from the boot memory on reboot.


Yes, it cleans up, but as I said last time I checked there was this concept of 
startup vs. minimum vs. maximum, at least for dynamic memory:

https://www.fastvue.co/tmgreporter/blog/understanding-hyper-v-dynamic-memory-dynamic-ram/

Startup RAM would be whatever you specify for "-m xG". If you go below min, you 
remove memory via deflation once the guest is up.



That article was from 2014, so I guess it pertained to Windows 2012 R2.

The memory settings page in more recent Hyper-V versions looks like
the screenshot at [1].

It no longer calls that main memory amount value "Startup RAM", now it's
just "RAM".

Despite what one might think the "Enable Dynamic Memory" checkbox does
*not* control the Dynamic Memory protocol availability or usage - the
protocol is always available/exported to the guest.

What the "Enable Dynamic Memory" checkbox controls is some host-side
heuristics that automatically resize the guest within chosen bounds
based on some metrics.

Even if the "Enable Dynamic Memory" checkbox is *not* enabled the guest
can still be online-resized via Dynamic Memory protocol by simply
changing the value in the "RAM" field and clicking "Apply".

At least that's how it works on Windows 2019 with a Linux guest.



So it's not exactly doing a hot-add after the guest boots.


I recall BUG reports in Linux, that we got hv-balloon hot-add requests ~1 
minute after Linux booted up, because of the above reason of startup memory [in 
these BUG reports, memory onlining was disabled and the VM would run out of 
memory because we hotplugged too much memory]. That's why I remember that this 
approach once was done.

Maybe there are multiple implementations nowadays. At least in QEMU you could 
choose whatever makes most sense for QEMU.



Right, it seems that the Hyper-V behavior evolved with time, too.


This approach (of resizing the boot memory) also avoids problems if the
guest loses hot-add / ballooning capability after a reboot - for example,
rebooting into a Linux guest from Windows with hv-balloon.


TBH, I wouldn't be too concerned about that scenario ("hotplugged memory to a guest, guest reboots 
into a weird OS, weird OS isn't able to use hotplugged memory"). For virtio-mem, the important part was 
that you always "know" how much memory the VM is aware of. If you always start with 
"Startup memory" and hot-add later (only if you detected guest support after a bootup), you can 
handle that scenario.


I'm not *that* concerned with cross-guest-type scenario either,
but if it can be made more smooth then I wouldn't mind.

Thanks,
Maciej

[1]: 
https://www.tenforums.com/performance-maintenance/38478-windows-10-hyper-v-dynamic-memory.html#post544905





Re: [PATCH] virtio: fix reachable assertion due to stale value of cached region size

2023-03-01 Thread Michael S. Tsirkin
On Wed, Feb 15, 2023 at 11:14:46PM +0100, Carlos López wrote:
> In virtqueue_{split,packed}_get_avail_bytes() descriptors are read
> in a loop via MemoryRegionCache regions and calls to
> vring_{split,packed}_desc_read() - these take a region cache and the
> index of the descriptor to be read.
> 
> For direct descriptors we use a cache provided by the caller, whose
> size matches that of the virtqueue vring. We limit the number of
> descriptors we can read by the size of that vring:
> 
> max = vq->vring.num;
> ...
> MemoryRegionCache *desc_cache = &caches->desc;
> 
> For indirect descriptors, we initialize a new cache and limit the
> number of descriptors by the size of the intermediate descriptor:
> 
> len = address_space_cache_init(&indirect_desc_cache,
>                                vdev->dma_as,
>                                desc.addr, desc.len, false);
> desc_cache = &indirect_desc_cache;
> ...
> max = desc.len / sizeof(VRingDesc);
> 
> However, the first initialization of `max` is done outside the loop
> where we process guest descriptors, while the second one is done
> inside. This means that a sequence of an indirect descriptor followed
> by a direct one will leave a stale value in `max`. If the second
> descriptor's `next` field is smaller than the stale value, but
> greater than the size of the virtqueue ring (and thus the cached
> region), a failed assertion will be triggered in
> address_space_read_cached() down the call chain.
> 
> Fix this by initializing `max` inside the loop in both functions.
> 
> Fixes: 9796d0ac8fb0 ("virtio: use address_space_map/unmap to access 
> descriptors")
> Signed-off-by: Carlos López 
> ---
>  hw/virtio/virtio.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index f35178f5fc..db70c4976e 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1071,6 +1071,7 @@ static void virtqueue_split_get_avail_bytes(VirtQueue 
> *vq,
>  VirtIODevice *vdev = vq->vdev;
>  unsigned int max, idx;
>  unsigned int total_bufs, in_total, out_total;
> +MemoryRegionCache *desc_cache;

why are you moving desc_cache here?

>  MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
>  int64_t len = 0;
>  int rc;
> @@ -1078,15 +1079,13 @@ static void virtqueue_split_get_avail_bytes(VirtQueue 
> *vq,
>  idx = vq->last_avail_idx;
>  total_bufs = in_total = out_total = 0;
>  
> -max = vq->vring.num;
> -
>  while ((rc = virtqueue_num_heads(vq, idx)) > 0) {
> -MemoryRegionCache *desc_cache = &caches->desc;
> -unsigned int num_bufs;
> +unsigned int num_bufs = total_bufs;
>  VRingDesc desc;
>  unsigned int i;
>  
> -num_bufs = total_bufs;

nice cleanup but not a bugfix. Keep cleanups separate from fixes pls.

> +desc_cache = &caches->desc;

init as part of declaration seems cleaner.

> +max = vq->vring.num;
>  

can we move declaration of max here within the loop?
will make sure the problem does not recur.

>  if (!virtqueue_get_head(vq, idx++, &i)) {
>  goto err;
> @@ -1218,14 +1217,14 @@ static void 
> virtqueue_packed_get_avail_bytes(VirtQueue *vq,
>  wrap_counter = vq->last_avail_wrap_counter;
>  total_bufs = in_total = out_total = 0;
>  
> -max = vq->vring.num;
> -
>  for (;;) {
>  unsigned int num_bufs = total_bufs;
>  unsigned int i = idx;
>  int rc;
>  
>  desc_cache = &caches->desc;
> +max = vq->vring.num;
> +


same question can we move declaration into the loop?

>  vring_packed_desc_read(vdev, &desc, desc_cache, idx, true);
>  if (!is_desc_avail(desc.flags, wrap_counter)) {
>  break;
> -- 
> 2.35.3
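
The stale-`max` problem described in the commit message can be sketched in isolation. This is a simplified model and not QEMU code: `struct head`, `bound_at_last_head()` and the sizes used are hypothetical names made up for illustration, and the indirect table length is given directly as an entry count instead of `desc.len / sizeof(VRingDesc)`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of one descriptor chain head. */
struct head {
    bool indirect;        /* head points at an indirect descriptor table */
    unsigned table_len;   /* number of entries in that indirect table */
};

/*
 * Returns the value of `max` in effect while the last head is processed.
 * With reset_in_loop == false the bound is set once before the loop, as in
 * the code being fixed; with true it is re-initialised for every head, as
 * the patch proposes.
 */
unsigned bound_at_last_head(const struct head *heads, size_t n,
                            unsigned ring_num, bool reset_in_loop)
{
    unsigned max = ring_num;          /* initialised outside the loop */
    unsigned seen = max;

    for (size_t i = 0; i < n; i++) {
        if (reset_in_loop) {
            max = ring_num;           /* the fix: start from the ring size */
        }
        if (heads[i].indirect) {
            max = heads[i].table_len; /* indirect table has its own bound */
        }
        seen = max;
    }
    return seen;
}
```

With an indirect head of 1024 entries followed by a direct head on a 256-entry ring, the unfixed variant still validates the direct descriptor against 1024, which is the condition that lets an out-of-range `next` reach the cached region.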




Re: [PATCH 2/5] hw/isa/vt82c686: Implement PCI IRQ routing

2023-03-01 Thread Bernhard Beschow



On 1 March 2023 14:20:54 UTC, Mark Cave-Ayland wrote:
>On 23/02/2023 20:20, Bernhard Beschow wrote:
>
>> The real VIA south bridges implement a PCI IRQ router which is configured
>> by the BIOS or the OS. In order to respect these configurations, QEMU
>> needs to implement it as well.
>> 
>> Note: The implementation was taken from piix4_set_irq() in hw/isa/piix4.
>> 
>> Signed-off-by: Bernhard Beschow 
>> ---
>>   hw/isa/vt82c686.c | 44 
>>   1 file changed, 44 insertions(+)
>> 
>> diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
>> index 3f9bd0c04d..f24e387d63 100644
>> --- a/hw/isa/vt82c686.c
>> +++ b/hw/isa/vt82c686.c
>> @@ -604,6 +604,48 @@ static void via_isa_request_i8259_irq(void *opaque, int 
>> irq, int level)
>>   qemu_set_irq(s->cpu_intr, level);
>>   }
>>   +static int via_isa_get_pci_irq(const ViaISAState *s, int irq_num)
>> +{
>> +switch (irq_num) {
>> +case 0:
>> +return s->dev.config[0x55] >> 4;
>> +
>> +case 1:
>> +return s->dev.config[0x56] & 0xf;
>> +
>> +case 2:
>> +return s->dev.config[0x56] >> 4;
>> +
>> +case 3:
>> +return s->dev.config[0x57] >> 4;
>> +}
>> +
>> +return 0;
>> +}
>> +
>> +static void via_isa_set_pci_irq(void *opaque, int irq_num, int level)
>> +{
>> +ViaISAState *s = opaque;
>> +PCIBus *bus = pci_get_bus(&s->dev);
>> +int pic_irq;
>> +
>> +/* now we change the pic irq level according to the via irq mappings */
>> +/* XXX: optimize */
>> +pic_irq = via_isa_get_pci_irq(s, irq_num);
>> +if (pic_irq < ISA_NUM_IRQS) {
>> +int i, pic_level;
>> +
>> +/* The pic level is the logical OR of all the PCI irqs mapped to 
>> it. */
>> +pic_level = 0;
>> +for (i = 0; i < PCI_NUM_PINS; i++) {
>> +if (pic_irq == via_isa_get_pci_irq(s, i)) {
>> +pic_level |= pci_bus_get_irq_level(bus, i);
>> +}
>> +}
>> +qemu_set_irq(s->isa_irqs[pic_irq], pic_level);
>> +}
>> +}
>> +
>>   static void via_isa_realize(PCIDevice *d, Error **errp)
>>   {
>>   ViaISAState *s = VIA_ISA(d);
>> @@ -676,6 +718,8 @@ static void via_isa_realize(PCIDevice *d, Error **errp)
>>   if (!qdev_realize(DEVICE(&s->mc97), BUS(pci_bus), errp)) {
>>   return;
>>   }
>> +
>> +pci_bus_irqs(pci_bus, via_isa_set_pci_irq, s, PCI_NUM_PINS);
>>   }
>> /* TYPE_VT82C686B_ISA */
>
>This looks right, however generally a PCI device shouldn't really be setting 
>PCI bus IRQs: this is normally done by the PCI host bridge. Is it just the 
>case that the x86 world is different here for legacy reasons?

Well, looking at the pegasos2 schematics it seems to me that at least the intA 
+ intB lines are connected to both chips (and that they are even shared between 
the AGP slot and the PCI bus). On the Marvell north bridge they seem to be 
connected to GPIO pins and on the VIA chip to the PCI IRQ router.

Note that GPIO pins can usually be deactivated (tristated) and that the four 
VIA PCI IRQs can be deactivated by assigning "interrupt 0". Such a design would 
allow the lines to be hardwired to both "interrupt controllers", allowing 
system software to use either controller, or possibly even mixed (e.g. intA to 
be treated by the north bridge and intB by VIA).

Such designs are currently not easily implementable in QEMU since only one IRQ 
handler can be assigned to a PCI bus. As a workaround, one could assign a 
custom IRQ handler which implements special handling.

Getting back to your question, I think you are right that assigning the IRQ 
handler in the VIA model may break e.g. the Fuloong2e machine where the IRQ 
handler is set in the north bridge. Since the VIA chip is instantiated later it 
now effectively replaces the handler.

It would be really neat if QEMU allowed for assigning two or more IRQ handlers 
to a PCI bus...

Do you think that two interrupt controllers connected to the IRQ lines like 
that sound reasonable?

Best regards,
Bernhard

>
>
>ATB,
>
>Mark.
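
For readers following the routing logic in the quoted patch, here is a standalone sketch of it. It assumes the nibble layout shown in the patch (config registers 0x55 to 0x57) and models the PCI pin levels as a plain array; `route_for_pin()` and `isa_irq_level()` are hypothetical names and not QEMU functions.

```c
#include <stdint.h>

#define NUM_PINS 4   /* INTA..INTD */

/* ISA IRQ that a PCI pin is routed to, per the nibble layout in the patch. */
int route_for_pin(const uint8_t *config, int pin)
{
    switch (pin) {
    case 0: return config[0x55] >> 4;
    case 1: return config[0x56] & 0xf;
    case 2: return config[0x56] >> 4;
    case 3: return config[0x57] >> 4;
    }
    return 0;
}

/* Level of one ISA IRQ: logical OR of all PCI pins routed to it. */
int isa_irq_level(const uint8_t *config, const int pin_level[NUM_PINS],
                  int isa_irq)
{
    int level = 0;

    for (int pin = 0; pin < NUM_PINS; pin++) {
        if (route_for_pin(config, pin) == isa_irq) {
            level |= pin_level[pin];
        }
    }
    return level;
}
```

Because several pins may share one ISA IRQ, the level has to be recomputed from all pins on every change, which is what the loop in the patch does.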



[PATCH 1/1] hw/riscv/virt.c: add cbom-block-size fdt property

2023-03-01 Thread Daniel Henrique Barboza
From: Anup Patel 

The cbom-block-size fdt property is used to inform the OS about
the blocksize in bytes for the Zicbom cache operations.

Linux documents it in Documentation/devicetree/bindings/riscv/cpus.yaml
as:

  riscv,cbom-block-size:
$ref: /schemas/types.yaml#/definitions/uint32
description:
  The blocksize in bytes for the Zicbom cache operations.

Signed-off-by: Anup Patel 
---
 hw/riscv/virt.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 49acb57da4..31b55cc62f 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -244,6 +244,12 @@ static void create_fdt_socket_cpus(RISCVVirtState *s, int 
socket,
 name = riscv_isa_string(cpu_ptr);
 qemu_fdt_setprop_string(ms->fdt, cpu_name, "riscv,isa", name);
 g_free(name);
+
+if (cpu_ptr->cfg.ext_icbom) {
+qemu_fdt_setprop_cell(ms->fdt, cpu_name, "riscv,cbom-block-size",
+  cpu_ptr->cfg.cbom_blocksize);
+}
+
 qemu_fdt_setprop_string(ms->fdt, cpu_name, "compatible", "riscv");
 qemu_fdt_setprop_string(ms->fdt, cpu_name, "status", "okay");
 qemu_fdt_setprop_cell(ms->fdt, cpu_name, "reg",
-- 
2.39.2




[PATCH 0/1] hw/riscv/virt.c: add cbom-block-size fdt property

2023-03-01 Thread Daniel Henrique Barboza
Hi,

I'm sending this almost last minute patch as part of the work done in:

[PATCH v8 0/4] riscv: Add support for Zicbo[m,z,p] instructions

It turns out that the Linux kernel expects a 'cbom-block-size' FDT prop
to be able to determine the cbom-block-size. Without this patch the
kernel will misbehave with the Zicbom extension in the virt machine.

Note that a similar patch would be needed for other RISC-V machines that
want to support Linux with Zicbom.


Anup Patel (1):
  hw/riscv/virt.c: add cbom-block-size fdt property

 hw/riscv/virt.c | 6 ++
 1 file changed, 6 insertions(+)

-- 
2.39.2




Re: [PATCH 0/5] hw/timer/i8254: Un-inline and simplify IRQs

2023-03-01 Thread Michael S. Tsirkin
On Wed, Feb 15, 2023 at 06:43:48PM +0100, Philippe Mathieu-Daudé wrote:
> i8254_pit_init() uses an odd pattern of "use this IRQ output
> line if non-NULL, otherwise use the ISA IRQ #number as output".
> 
> Rework as simply "Use this IRQ output".


Acked-by: Michael S. Tsirkin 


Given it also affects KVM I will let Paolo merge this.

> Un-inline/rename/document functions.
> 
> Based-on: <20230215161641.32663-1-phi...@linaro.org>
>   "hw/ide: Untangle ISA/PCI abuses of ide_init_ioport" v2
> https://lore.kernel.org/qemu-devel/20230215161641.32663-1-phi...@linaro.org/
> 
> Philippe Mathieu-Daudé (5):
>   hw/timer/hpet: Include missing 'hw/qdev-properties.h' header
>   hw/timer/i8254: Factor i8254_pit_create() out and document
>   hw/i386/pc: Un-inline i8254_pit_init()
>   hw/timer/i8254: Really inline i8254_pit_init()
>   hw/i386/kvm: Factor i8254_pit_create_try_kvm() out
> 
>  hw/i386/kvm/i8254.c| 18 ++
>  hw/i386/microvm.c  |  6 +
>  hw/i386/pc.c   | 15 +---
>  hw/isa/i82378.c|  2 +-
>  hw/isa/piix4.c |  4 ++--
>  hw/isa/vt82c686.c  |  2 +-
>  hw/mips/jazz.c |  2 +-
>  hw/timer/hpet.c|  1 +
>  hw/timer/i8254.c   | 16 +
>  include/hw/timer/i8254.h   | 48 +-
>  target/i386/kvm/kvm-stub.c |  6 +
>  11 files changed, 69 insertions(+), 51 deletions(-)
> 
> -- 
> 2.38.1




Re: [PATCH v5 5/7] hw/isa/vt82c686: Work around missing level sensitive irq in i8259 model

2023-03-01 Thread David Woodhouse
On Wed, 2023-03-01 at 19:01 +0100, BALATON Zoltan wrote:
> 
> > It isn't a *correct* fix without a little bit more typing, but does
> > this make it work?
> > 
> > diff --git a/hw/intc/i8259.c b/hw/intc/i8259.c
> > index 17910f3bcb..36ebcff025 100644
> > --- a/hw/intc/i8259.c
> > +++ b/hw/intc/i8259.c
> > @@ -246,6 +246,7 @@ static void pic_ioport_write(void *opaque, hwaddr 
> > addr64,
> >  if (val & 0x08) {
> >  qemu_log_mask(LOG_UNIMP,
> >    "i8259: level sensitive irq not 
> > supported\n");
> > +    s->elcr = 0xff;
> 
> This works too. I guess the log can be then removed too. Could you submit 
> a proper patch or you want me to do that so we can drop the workaround for 
> it? Thanks for looking into it.


Happy for you to do the rest of the typing ... :)

So, *ideally* I think you need to introduce a new field in the
PICCommonState which records the status of the LTIM bit. And fix up the
vmstate_pic_common in hw/intc/i8259_common.c to save and restore that
(with versioning for upgrade/downgrade).

Then you find those places which currently check the bit for the
specific pin in s->elcr, and make them something like:

--- a/hw/intc/i8259.c
+++ b/hw/intc/i8259.c
@@ -133,7 +133,7 @@ static void pic_set_irq(void *opaque, int irq, int level)
 }
 #endif
 
-if (s->elcr & mask) {
+if (s->ltim || (s->elcr & mask)) {
 /* level triggered */
 if (level) {
 s->irr |= mask;

It *might* be that you should make the LTIM behaviour optional, so that
only certain incarnations of the i8259 actually get it at all and it
*wouldn't* take effect if a guest tried to set it, which is what the
PIIX3 datasheet implies. But I suspect we can get away without that.
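
A rough standalone sketch of the idea above, assuming a reduced state structure: `struct pic_model`, its `ltim` field and both helpers are hypothetical and only mirror the shape of the suggested change. ICW1 is recognised by bit 4, and LTIM is bit 3 of that word, matching the `val & 0x08` check in the earlier quoted diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced PIC state; `ltim` is the proposed new field. */
struct pic_model {
    uint8_t elcr;   /* per-pin level-triggered bits */
    bool ltim;      /* ICW1 LTIM: every pin is level-triggered */
};

/* A pin is level-triggered if LTIM is set or its ELCR bit is set. */
bool pin_is_level_triggered(const struct pic_model *s, int irq)
{
    uint8_t mask = (uint8_t)(1u << irq);

    return s->ltim || (s->elcr & mask);
}

/* Record LTIM when an ICW1 (bit 4 set) is written. */
void write_icw1(struct pic_model *s, uint8_t val)
{
    if (val & 0x10) {
        s->ltim = (val & 0x08) != 0;
    }
}
```

Keeping LTIM in its own field leaves `elcr` untouched, so a later ICW1 with LTIM clear restores the per-pin behaviour, unlike the quick `s->elcr = 0xff` workaround.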





Re: [PATCH 00/12] Q35 PCI host fixes and QOM cleanup

2023-03-01 Thread Michael S. Tsirkin
On Tue, Feb 14, 2023 at 02:14:29PM +0100, Bernhard Beschow wrote:
> This series mostly cleans up QOM-related initialization code. It also performs
> some modernization and fixing.
> 
> The first patch originates from "PC and ICH9 clanups" series [1] which has 
> been
> dropped in v3 in favor of another series [2]. Review comments in [2] suggest 
> it
> needs more work, so bring the patch back here.
> 
> Patch 2 fixes a clangd warning and patch 3 modernizes usage of the memory API.

what's the warning? commit log says nothing about it.

> Patches 4-9 clean up initialization code.
> 
> The last four patches also clean up initialization code with the last patch
> doing the actual cleanup.
> 
> Based-on: <20230213162004.2797-1-shen...@gmail.com>
>  "[PATCH v4 0/9] PC cleanups"
> 
> Testing done:
> * `make check`
> * `make check-avocado`
> * `qemu-system-x86_64 -M q35 -m 2G -cdrom \
>  manjaro-kde-21.3.2-220704-linux515.iso`
> 
> [1] 
> https://lore.kernel.org/qemu-devel/20230131115326.12454-1-shen...@gmail.com/
> [2] 
> https://lore.kernel.org/qemu-devel/20230203180914.49112-1-phi...@linaro.org/
> 
> Bernhard Beschow (12):
>   hw/i386/pc_q35: Resolve redundant q35_host variable
>   hw/pci-host/q35: Fix contradicting .endianness assignment
>   hw/pci-host/q35: Use memory_region_set_address() also for
> tseg_blackhole
>   hw/pci-host/q35: Initialize PCMachineState::bus in board code
>   hw/pci-host/q35: Initialize "bypass-iommu" property from board code
>   hw/pci-host/q35: Initialize properties just once
>   hw/pci-host/q35: Initialize PCI hole boundaries just once
>   hw/pci-host/q35: Turn PCI hole properties into class properties
>   hw/pci-host/q35: Rename local variable to more idiomatic "phb"
>   hw/pci-host/q35: Propagate to errp rather than doing error_fatal
>   hw/pci-host/q35: Merge mch_realize() into q35_host_realize()
>   hw/pci-host/q35: Move MemoryRegion pointers to host device
> 
>  include/hw/pci-host/q35.h |  17 +-
>  hw/i386/pc_q35.c  |  33 ++--
>  hw/pci-host/q35.c | 325 ++
>  3 files changed, 178 insertions(+), 197 deletions(-)
> 
> -- 
> 2.39.1
> 




Re: [PATCH 00/12] Q35 PCI host fixes and QOM cleanup

2023-03-01 Thread Michael S. Tsirkin
On Tue, Feb 21, 2023 at 03:39:28PM +, Bernhard Beschow wrote:
> 
> 
> On 14 February 2023 13:14:29 UTC, Bernhard Beschow wrote:
> >This series mostly cleans up QOM-related initialization code. It also performs
> >some modernization and fixing.
> >
> >The first patch originates from "PC and ICH9 clanups" series [1] which has been
> >dropped in v3 in favor of another series [2]. Review comments in [2] suggest it
> >needs more work, so bring the patch back here.
> >
> >Patch 2 fixes a clangd warning and patch 3 modernizes usage of the memory API.
> >
> >Patches 4-9 clean up initialization code.
> >
> >The last four patches also clean up initialization code with the last patch
> >doing the actual cleanup.
> >
> 
> Ping


sent some comments. Philippe was reviewing related patches maybe
he wants to poke at these too.

> >
> >Based-on: <20230213162004.2797-1-shen...@gmail.com>
> > "[PATCH v4 0/9] PC cleanups"
> >
> >Testing done:
> >* `make check`
> >* `make check-avocado`
> >* `qemu-system-x86_64 -M q35 -m 2G -cdrom \
> > manjaro-kde-21.3.2-220704-linux515.iso`
> >
> >[1] https://lore.kernel.org/qemu-devel/20230131115326.12454-1-shen...@gmail.com/
> >[2] https://lore.kernel.org/qemu-devel/20230203180914.49112-1-phi...@linaro.org/
> >
> >Bernhard Beschow (12):
> >  hw/i386/pc_q35: Resolve redundant q35_host variable
> >  hw/pci-host/q35: Fix contradicting .endianness assignment
> >  hw/pci-host/q35: Use memory_region_set_address() also for
> >tseg_blackhole
> >  hw/pci-host/q35: Initialize PCMachineState::bus in board code
> >  hw/pci-host/q35: Initialize "bypass-iommu" property from board code
> >  hw/pci-host/q35: Initialize properties just once
> >  hw/pci-host/q35: Initialize PCI hole boundaries just once
> >  hw/pci-host/q35: Turn PCI hole properties into class properties
> >  hw/pci-host/q35: Rename local variable to more idiomatic "phb"
> >  hw/pci-host/q35: Propagate to errp rather than doing error_fatal
> >  hw/pci-host/q35: Merge mch_realize() into q35_host_realize()
> >  hw/pci-host/q35: Move MemoryRegion pointers to host device
> >
> > include/hw/pci-host/q35.h |  17 +-
> > hw/i386/pc_q35.c  |  33 ++--
> > hw/pci-host/q35.c | 325 ++
> > 3 files changed, 178 insertions(+), 197 deletions(-)
> >
> >-- 
> >2.39.1




Re: [PATCH 06/12] hw/pci-host/q35: Initialize properties just once

2023-03-01 Thread Michael S. Tsirkin
On Tue, Feb 14, 2023 at 02:14:35PM +0100, Bernhard Beschow wrote:
> Although not used there, the attributes for Q35's "pci-hole64-size" and
> "short_root_bus" properties currently reside in its child device. This
> causes the default values to be overwritten during the child's
> object_initialize() phase.

pls add explanation why this is a problem.

> Avoid this by moving both attributes into the
> host device.
> 
> Signed-off-by: Bernhard Beschow 
> ---
>  include/hw/pci-host/q35.h |  5 +++--
>  hw/pci-host/q35.c | 20 +---
>  2 files changed, 8 insertions(+), 17 deletions(-)
> 
> diff --git a/include/hw/pci-host/q35.h b/include/hw/pci-host/q35.h
> index fcbe57b42d..93e41ffbee 100644
> --- a/include/hw/pci-host/q35.h
> +++ b/include/hw/pci-host/q35.h
> @@ -54,8 +54,6 @@ struct MCHPCIState {
>  Range pci_hole;
>  uint64_t below_4g_mem_size;
>  uint64_t above_4g_mem_size;
> -uint64_t pci_hole64_size;
> -uint32_t short_root_bus;
>  uint16_t ext_tseg_mbytes;
>  };
>  
> @@ -64,7 +62,10 @@ struct Q35PCIHost {
>  PCIExpressHost parent_obj;
>  /*< public >*/
>  
> +uint64_t pci_hole64_size;
> +uint32_t short_root_bus;
>  bool pci_hole64_fix;
> +
>  MCHPCIState mch;
>  };
>  
> diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
> index 0e198f97a7..03aa08dae5 100644
> --- a/hw/pci-host/q35.c
> +++ b/hw/pci-host/q35.c
> @@ -76,7 +76,7 @@ static const char *q35_host_root_bus_path(PCIHostState 
> *host_bridge,
>  Q35PCIHost *s = Q35_HOST_DEVICE(host_bridge);
>  
>   /* For backwards compat with old device paths */
> -if (s->mch.short_root_bus) {
> +if (s->short_root_bus) {
>  return "";
>  }
>  return ":00";
> @@ -161,27 +161,19 @@ static void q35_host_get_pci_hole64_end(Object *obj, 
> Visitor *v,
>  
>  pci_bus_get_w64_range(h->bus, &w64);
>  value = range_is_empty(&w64) ? 0 : range_upb(&w64) + 1;
> -hole64_end = ROUND_UP(hole64_start + s->mch.pci_hole64_size, 1ULL << 30);
> +hole64_end = ROUND_UP(hole64_start + s->pci_hole64_size, 1ULL << 30);
>  if (s->pci_hole64_fix && value < hole64_end) {
>  value = hole64_end;
>  }
>  visit_type_uint64(v, name, &value, errp);
>  }
>  
> -/*
> - * NOTE: setting defaults for the mch.* fields in this table
> - * doesn't work, because mch is a separate QOM object that is
> - * zeroed by the object_initialize(&s->mch, ...) call inside
> - * q35_host_initfn().  The default values for those
> - * properties need to be initialized manually by
> - * q35_host_initfn() after the object_initialize() call.
> - */
>  static Property q35_host_props[] = {
>  DEFINE_PROP_UINT64(PCIE_HOST_MCFG_BASE, Q35PCIHost, parent_obj.base_addr,
>  MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT),
>  DEFINE_PROP_SIZE(PCI_HOST_PROP_PCI_HOLE64_SIZE, Q35PCIHost,
> - mch.pci_hole64_size, Q35_PCI_HOST_HOLE64_SIZE_DEFAULT),
> -DEFINE_PROP_UINT32("short_root_bus", Q35PCIHost, mch.short_root_bus, 0),
> + pci_hole64_size, Q35_PCI_HOST_HOLE64_SIZE_DEFAULT),
> +DEFINE_PROP_UINT32("short_root_bus", Q35PCIHost, short_root_bus, 0),
>  DEFINE_PROP_SIZE(PCI_HOST_BELOW_4G_MEM_SIZE, Q35PCIHost,
>   mch.below_4g_mem_size, 0),
>  DEFINE_PROP_SIZE(PCI_HOST_ABOVE_4G_MEM_SIZE, Q35PCIHost,
> @@ -218,9 +210,7 @@ static void q35_host_initfn(Object *obj)
>  object_initialize_child(OBJECT(s), "mch", &s->mch, TYPE_MCH_PCI_DEVICE);
>  qdev_prop_set_int32(DEVICE(&s->mch), "addr", PCI_DEVFN(0, 0));
>  qdev_prop_set_bit(DEVICE(&s->mch), "multifunction", false);
> -/* mch's object_initialize resets the default value, set it again */
> -qdev_prop_set_uint64(DEVICE(s), PCI_HOST_PROP_PCI_HOLE64_SIZE,
> - Q35_PCI_HOST_HOLE64_SIZE_DEFAULT);
> +
>  object_property_add(obj, PCI_HOST_PROP_PCI_HOLE_START, "uint32",
>  q35_host_get_pci_hole_start,
>  NULL, NULL, NULL);
> -- 
> 2.39.1




Re: [PATCH 05/12] hw/pci-host/q35: Initialize "bypass-iommu" property from board code

2023-03-01 Thread Michael S. Tsirkin
On Tue, Feb 14, 2023 at 02:14:34PM +0100, Bernhard Beschow wrote:
> The Q35 PCI host already has a "bypass-iommu" property. However, the
> host initializes this property itself by accessing global machine state,
> thereby assuming it to be a PC machine. Avoid this by having board code
> set this property.
> 
> Signed-off-by: Bernhard Beschow 
> ---
>  hw/i386/pc_q35.c  | 2 ++
>  hw/pci-host/q35.c | 3 +--
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index c2dc87acee..b3c55012d4 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -231,6 +231,8 @@ static void pc_q35_init(MachineState *machine)
>  x86ms->below_4g_mem_size, NULL);
>  object_property_set_int(phb, PCI_HOST_ABOVE_4G_MEM_SIZE,
>  x86ms->above_4g_mem_size, NULL);
> +object_property_set_bool(phb, "bypass-iommu",
> + pcms->default_bus_bypass_iommu, NULL);

Can we use a macro to avoid duplicating the property name?

>  sysbus_realize_and_unref(SYS_BUS_DEVICE(phb), &error_fatal);
>  
>  /* pci */
> diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
> index 26e9e28e0e..0e198f97a7 100644
> --- a/hw/pci-host/q35.c
> +++ b/hw/pci-host/q35.c
> @@ -66,8 +66,7 @@ static void q35_host_realize(DeviceState *dev, Error **errp)
>  s->mch.pci_address_space,
>  s->mch.address_space_io,
>  0, TYPE_PCIE_BUS);
> -pci->bypass_iommu =
> -PC_MACHINE(qdev_get_machine())->default_bus_bypass_iommu;
> +
>  qdev_realize(DEVICE(&s->mch), BUS(pci->bus), &error_fatal);
>  }
>  
> -- 
> 2.39.1




Re: [PATCH 03/12] hw/pci-host/q35: Use memory_region_set_address() also for tseg_blackhole

2023-03-01 Thread Michael S. Tsirkin
On Tue, Feb 14, 2023 at 02:14:32PM +0100, Bernhard Beschow wrote:
> Deleting from and adding to the parent memory region seems to be the old
> way of changing a memory region's address which is superseded by
> memory_region_set_address(). Moreover, memory_region_set_address() is
> already used for tseg_window which is tseg_blackhole's counterpart in
> SMM space.
> 
> Amends: bafc90bdc594 'q35: implement TSEG'

I don't really see what purpose does this tag serve but
if you want it use the standard format pls.


> Signed-off-by: Bernhard Beschow 
> ---
>  hw/pci-host/q35.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
> index 3124cad60f..0384ce4350 100644
> --- a/hw/pci-host/q35.c
> +++ b/hw/pci-host/q35.c
> @@ -404,12 +404,11 @@ static void mch_update_smram(MCHPCIState *mch)
>  } else {
>  tseg_size = 0;
>  }
> -memory_region_del_subregion(mch->system_memory, &mch->tseg_blackhole);
> +
>  memory_region_set_enabled(&mch->tseg_blackhole, tseg_size);
>  memory_region_set_size(&mch->tseg_blackhole, tseg_size);
> -memory_region_add_subregion_overlap(mch->system_memory,
> -mch->below_4g_mem_size - tseg_size,
> -&mch->tseg_blackhole, 1);
> +memory_region_set_address(&mch->tseg_blackhole,
> +  mch->below_4g_mem_size - tseg_size);
>  
>  memory_region_set_enabled(&mch->tseg_window, tseg_size);
>  memory_region_set_size(&mch->tseg_window, tseg_size);
> -- 
> 2.39.1




Re: [PATCH 02/12] hw/pci-host/q35: Fix contradicting .endianness assignment

2023-03-01 Thread Michael S. Tsirkin
On Tue, Feb 14, 2023 at 02:14:31PM +0100, Bernhard Beschow wrote:
> Settle on little endian which is consistent with using
> pci_host_conf_le_ops.
> 
> Fixes: bafc90bdc594 'q35: implement TSEG'
> Signed-off-by: Bernhard Beschow 

I think it's native because native is a bit cheaper and
it's just 0x anyway.
Why change? A comment would be a good idea though.

> ---
>  hw/pci-host/q35.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
> index 83f2a98c71..3124cad60f 100644
> --- a/hw/pci-host/q35.c
> +++ b/hw/pci-host/q35.c
> @@ -289,7 +289,6 @@ static void blackhole_write(void *opaque, hwaddr addr, 
> uint64_t val,
>  static const MemoryRegionOps blackhole_ops = {
>  .read = blackhole_read,
>  .write = blackhole_write,
> -.endianness = DEVICE_NATIVE_ENDIAN,
>  .valid.min_access_size = 1,
>  .valid.max_access_size = 4,
>  .impl.min_access_size = 4,
> -- 
> 2.39.1




Re: [PATCH v7 0/4] riscv: Add support for Zicbo[m,z,p] instructions

2023-03-01 Thread Daniel Henrique Barboza

Hi Palmer,

On 3/1/23 18:35, Palmer Dabbelt wrote:

On Thu, 23 Feb 2023 15:44:23 PST (-0800), dbarb...@ventanamicro.com wrote:

Hi,

This new version has changes based on feedback on both v5 and v6.

Patch 1 was revamped. We're modifying probe_access_flags() to accept a
'size' parameter to allow for RISC-V usage with PMP. Changes in the existing
callers are trivial and no behavior change is done (well, at least it's not
intended). And we avoid adding another  probe_* API that only RISC-V
will care about.

Changes from v6:
- patch 1:
  - no longer adding a new probe_access_flags_range() API
  - add a 'size' param to probe_access_flags()
- patch 2:
  - check for RISCV_EXCP_ILLEGAL_INST first in check_zicbo_envcfg()
  - add a probe for MMU_DATA_STORE after check_zicbo_envcfg()
  - write zeros even if the address isn't mapped to RAM
- patch 3:
  - simplify the verifications in check_zicbom_access() by using probe_write()
- v6 link: https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg05379.html

Christoph Muellner (3):
  target/riscv: implement Zicboz extension
  target/riscv: implement Zicbom extension
  target/riscv: add Zicbop cbo.prefetch{i,r,m} placeholder

Daniel Henrique Barboza (1):
  tcg: add 'size' param to probe_access_flags()

 accel/stubs/tcg-stub.c  |   2 +-
 accel/tcg/cputlb.c  |  17 ++-
 accel/tcg/user-exec.c   |   5 +-
 include/exec/exec-all.h |   3 +-
 semihosting/uaccess.c   |   2 +-
 target/arm/ptw.c    |   2 +-
 target/arm/sve_helper.c |   2 +-
 target/riscv/cpu.c  |   7 ++
 target/riscv/cpu.h  |   4 +
 target/riscv/helper.h   |   5 +
 target/riscv/insn32.decode  |  16 ++-
 target/riscv/insn_trans/trans_rvzicbo.c.inc |  57 +
 target/riscv/op_helper.c    | 132 
 target/riscv/translate.c    |   1 +
 target/s390x/tcg/mem_helper.c   |   6 +-
 15 files changed, 247 insertions(+), 14 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvzicbo.c.inc


Acked-by: Palmer Dabbelt 



Thanks! I guess you would want to pick the new version instead that is already
fully acked:

[PATCH v8 0/4] riscv: Add support for Zicbo[m,z,p] instructions


Richard already picked and sent patch 1 in his tcg patch queue here:


[PULL v2 00/62] tcg patch queue

So I guess you can either include patch 1 in your tree and exclude it when
master is updated, or you can wait for patch 1 to land on master before
picking the remaining 3 patches.


Thanks,


Daniel




in case Richard wants to take these along with the TCG patch, otherwise I'm 
happy to take these through the RISC-V tree when that lands (or do some sort of 
shared tag, as we're getting kind of close).




Re: [PATCH 02/12] hw/pci-host/q35: Fix contradicting .endianness assignment

2023-03-01 Thread Michael S. Tsirkin
On Tue, Feb 14, 2023 at 02:14:31PM +0100, Bernhard Beschow wrote:
> Settle on little endian which is consistent with using
> pci_host_conf_le_ops.
> 
> Fixes: bafc90bdc594 'q35: implement TSEG'

incorrect formatting for the fixes tag


> Signed-off-by: Bernhard Beschow 
> ---
>  hw/pci-host/q35.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
> index 83f2a98c71..3124cad60f 100644
> --- a/hw/pci-host/q35.c
> +++ b/hw/pci-host/q35.c
> @@ -289,7 +289,6 @@ static void blackhole_write(void *opaque, hwaddr addr, 
> uint64_t val,
>  static const MemoryRegionOps blackhole_ops = {
>  .read = blackhole_read,
>  .write = blackhole_write,
> -.endianness = DEVICE_NATIVE_ENDIAN,
>  .valid.min_access_size = 1,
>  .valid.max_access_size = 4,
>  .impl.min_access_size = 4,
> -- 
> 2.39.1




seeking advice for configuring usb_desc in ccid / dev-smartcard-reader.c

2023-03-01 Thread Ripke, Klaus
Hello


in hw/usb/dev-smartcard-reader.c: we need a slightly differing version
of the "Athena Smart Card Reader" as of qemu_ccid_descriptor with two
bytes changed to fixed "extended" values, 14 for max slot and 4 in
feature 2.
This data is shared by all ccid devices through a chain down to
usb_desc (which is klass->usb_desc for all ccid as of now).

Should we best follow the practice of dev-audio and dev-hid by using
another static config, selected by some device property?

Or better dynamically create and modify copies of all structures in
realize?

Or some other way?


many thanks for your help, kind regards
Klaus


Re: [PATCH] vhost: accept VIRTIO_F_ORDER_PLATFORM as a valid SVQ feature

2023-03-01 Thread Michael S. Tsirkin
On Tue, Feb 14, 2023 at 09:36:01AM +0100, Eugenio Perez Martin wrote:
> On Tue, Feb 14, 2023 at 8:51 AM Michael S. Tsirkin  wrote:
> >
> > On Tue, Feb 14, 2023 at 08:02:08AM +0100, Eugenio Perez Martin wrote:
> > > On Tue, Feb 14, 2023 at 7:31 AM Jason Wang  wrote:
> > > >
> > > > On Tue, Feb 14, 2023 at 3:19 AM Eugenio Pérez  
> > > > wrote:
> > > > >
> > > > > VIRTIO_F_ORDER_PLATFORM indicates that memory accesses by the driver 
> > > > > and
> > > > > the device are ordered in a way described by the platform.  Since vDPA
> > > > > devices may be backed by hardware devices, let's allow
> > > > > VIRTIO_F_ORDER_PLATFORM.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez 
> > > > > ---
> > > > >  hw/virtio/vhost-shadow-virtqueue.c | 1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > >
> > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> > > > > b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > index 4307296358..6bb1998f12 100644
> > > > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > @@ -34,6 +34,7 @@ bool vhost_svq_valid_features(uint64_t features, 
> > > > > Error **errp)
> > > > >  switch (b) {
> > > > >  case VIRTIO_F_ANY_LAYOUT:
> > > > >  case VIRTIO_RING_F_EVENT_IDX:
> > > > > +case VIRTIO_F_ORDER_PLATFORM:
> > > >
> > > > Do we need to add this bit to vdpa_feature_bits[] as well?
> > > >
> > >
> > > If we want to pass it to the guest, yes we should. I'll send another
> > > patch for it.
> > >
> > > But I think that should be done on top / in parallel actually.
> > >
> > > Open question: Should all vdpa hardware devices offer it? Or this is
> > > only needed on specific archs?
> > >
> > > Thanks!
> >
> > I don't get what this is doing at all frankly. vdpa has to
> > pass VIRTIO_F_ORDER_PLATFORM to guest at all times - unless
> > - it's a x86 host where it kind of works anyway
> > - it's vduse which frankly is so slow we can do VIRTIO_F_ORDER_PLATFORM 
> > anyway.
> 
> That was my understanding, adding vdpasim to the list of exceptions
> (please correct me if I'm wrong).
> 
> > In short VIRTIO_F_ORDER_PLATFORM has nothing to do with the device
> > and everything with qemu itself.
> >
> 
> I have little experience outside of x86 so I may be wrong here. My
> understanding is that this feature allows the guest to optimize
> barriers around memory ops:
> * If VIRTIO_F_ORDER_PLATFORM is not negotiated, the driver can use
> softer memory barriers that protect ordering between different
> processors.
> * If VIRTIO_F_ORDER_PLATFORM is negotiated, stronger ordering is
> needed that also protects transport (PCI) accesses
> 
> This is backed up by comments in the standard:
> This implies that the driver needs to use memory barriers suitable for
> devices described by the platform; e.g. for the PCI transport in the
> case of hardware PCI devices.
> 
> And in virtio drivers:
> For virtio_pci on SMP, we don't need to order with respect to MMIO
> accesses through relaxed memory I/O windows, so virt_mb() et al are
> sufficient.
> For using virtio to talk to real devices (eg. other heterogeneous
> CPUs) we do need real barriers.
> 
> So the sentence "VIRTIO_F_ORDER_PLATFORM has nothing to do with the
> device and everything with qemu itself." is actually the reverse, and
> has everything to do with devices?

Point is this is not device's decision.


> > Yea we can allow VIRTIO_F_ORDER_PLATFORM from kernel but given
> > we never did at this point it will need a protocol feature bit.
> > I don't think it's worth it ..
> >
> 
> With "from kernel" do you mean in vhost-kernel or in virtio ring
> driver? The virtio ring driver already supports them.

vhost-kernel

> I'm ok with leaving this for the future but that means hw devices in
> non-x86 platforms may not work correctly, isn't it?
> 
> Thanks!

You need to pass this to guest. My point is that there is no reason to
get it from the kernel driver. QEMU can figure out whether the flag is
needed itself.

-- 
MST
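
For readers following the barrier discussion above, here is a minimal
sketch of the driver-side rule as described in the quoted text. It is not
the Linux virtio_ring code: barrier_kind, has_feature() and pick_barrier()
are hypothetical names, and only the feature bit number (36) comes from
the virtio specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number from the virtio specification. */
#define VIRTIO_F_ORDER_PLATFORM 36

/* Hypothetical labels for the two barrier strengths discussed above. */
enum barrier_kind {
    BARRIER_SMP,      /* orders accesses between CPUs only */
    BARRIER_PLATFORM, /* also orders accesses as seen by real devices */
};

static bool has_feature(uint64_t features, unsigned int bit)
{
    assert(bit < 64);
    return (features >> bit) & 1;
}

/* Pick the barrier strength from the negotiated feature set. */
enum barrier_kind pick_barrier(uint64_t negotiated_features)
{
    return has_feature(negotiated_features, VIRTIO_F_ORDER_PLATFORM)
        ? BARRIER_PLATFORM : BARRIER_SMP;
}
```

The sketch only covers what the guest does once the bit is negotiated; who
decides to offer the bit (device, kernel or QEMU) is the open question in
this thread and is not modelled here.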




Re: [PATCH v7 0/4] riscv: Add support for Zicbo[m,z,p] instructions

2023-03-01 Thread Palmer Dabbelt

On Thu, 23 Feb 2023 15:44:23 PST (-0800), dbarb...@ventanamicro.com wrote:

Hi,

This new version has changes based on feedback on both v5 and v6.

Patch 1 was revamped. We're modifying probe_access_flags() to accept a
'size' parameter to allow for RISC-V usage with PMP. Changes in the existing
callers are trivial and no behavior change is done (well, at least it's not
intended). And we avoid adding another  probe_* API that only RISC-V
will care about.

Changes from v6:
- patch 1:
  - no longer adding a new probe_access_flags_range() API
  - add a 'size' param to probe_access_flags()
- patch 2:
  - check for RISCV_EXCP_ILLEGAL_INST first in check_zicbo_envcfg()
  - add a probe for MMU_DATA_STORE after check_zicbo_envcfg()
  - write zeros even if the address isn't mapped to RAM
- patch 3:
  - simplify the verifications in check_zicbom_access() by using probe_write()
- v6 link: https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg05379.html

Christoph Muellner (3):
  target/riscv: implement Zicboz extension
  target/riscv: implement Zicbom extension
  target/riscv: add Zicbop cbo.prefetch{i,r,m} placeholder

Daniel Henrique Barboza (1):
  tcg: add 'size' param to probe_access_flags()

 accel/stubs/tcg-stub.c  |   2 +-
 accel/tcg/cputlb.c  |  17 ++-
 accel/tcg/user-exec.c   |   5 +-
 include/exec/exec-all.h |   3 +-
 semihosting/uaccess.c   |   2 +-
 target/arm/ptw.c|   2 +-
 target/arm/sve_helper.c |   2 +-
 target/riscv/cpu.c  |   7 ++
 target/riscv/cpu.h  |   4 +
 target/riscv/helper.h   |   5 +
 target/riscv/insn32.decode  |  16 ++-
 target/riscv/insn_trans/trans_rvzicbo.c.inc |  57 +
 target/riscv/op_helper.c| 132 
 target/riscv/translate.c|   1 +
 target/s390x/tcg/mem_helper.c   |   6 +-
 15 files changed, 247 insertions(+), 14 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvzicbo.c.inc


Acked-by: Palmer Dabbelt 

in case Richard wants to take these along with the TCG patch, otherwise 
I'm happy to take these through the RISC-V tree when that lands (or do 
some sort of shared tag, as we're getting kind of close).




Re: [PATCH v7 1/4] tcg: add 'size' param to probe_access_flags()

2023-03-01 Thread Palmer Dabbelt

On Thu, 23 Feb 2023 16:10:59 PST (-0800), Richard Henderson wrote:

On 2/23/23 13:44, Daniel Henrique Barboza wrote:

probe_access_flags() as it is today uses probe_access_full(), which in
turn uses probe_access_internal() with size = 0. probe_access_internal()
then uses the size to call the tlb_fill() callback for the given CPU.
This size param ('fault_size' as probe_access_internal() calls it) is
ignored by most existing .tlb_fill callback implementations, e.g.
arm_cpu_tlb_fill(), ppc_cpu_tlb_fill(), x86_cpu_tlb_fill() and
mips_cpu_tlb_fill() to name a few.

But RISC-V riscv_cpu_tlb_fill() actually uses it. The 'size' parameter
is used to check for PMP (Physical Memory Protection) access. This is
necessary because PMP does not make any guarantees about all the bytes
of the same page having the same permissions, i.e. the same page can
have different PMP properties, so we're forced to make sub-page range
checks. To allow RISC-V emulation to do a probe_access_flags() that
covers PMP, we need to either add a 'size' param to the existing
probe_access_flags() or create a new interface (e.g.
probe_access_range_flags).

There are quite a few probe_* APIs already, so let's add a 'size' param
to probe_access_flags() and re-use this API. This is done by open coding
what probe_access_full() does inside probe_access_flags() and passing the
'size' param to probe_access_internal(). Existing probe_access_flags()
callers use size = 0 to not change their current API usage. 'size' is
asserted to enforce single page access like probe_access() already does.

No behavioral changes intended.

Signed-off-by: Daniel Henrique Barboza
---
  accel/stubs/tcg-stub.c|  2 +-
  accel/tcg/cputlb.c| 17 ++---
  accel/tcg/user-exec.c |  5 +++--
  include/exec/exec-all.h   |  3 ++-
  semihosting/uaccess.c |  2 +-
  target/arm/ptw.c  |  2 +-
  target/arm/sve_helper.c   |  2 +-
  target/s390x/tcg/mem_helper.c |  6 +++---
  8 files changed, 26 insertions(+), 13 deletions(-)


Queueing to tcg-next.


Unless I'm missing something, that's not in Peter's tree yet?  I Ack'd 
the cover, it's fine with me if you want to take these via the TCG tree.  
Doing some sort of shared tag for the first one works for me too, I've 
also got some other stuff in the RISC-V queue.


Thanks!
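
To illustrate why the commit message above wants a 'size' in the probe,
here is a small self-contained sketch of a sub-page permission check. It
is not QEMU's PMP code: struct pmp_region and pmp_range_writable() are
hypothetical, the regions are simplified to inclusive byte ranges, and
treating size == 0 as a one-byte probe is an assumption made for the
sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical, simplified PMP entry: a byte range plus a write permission. */
struct pmp_region {
    uint64_t start; /* first byte covered */
    uint64_t end;   /* last byte covered (inclusive) */
    bool writable;
};

/*
 * Return true if every byte of [addr, addr + size) lies inside one
 * writable region. size == 0 is treated as a one-byte probe (assumption).
 */
bool pmp_range_writable(const struct pmp_region *r, size_t n,
                        uint64_t addr, uint64_t size)
{
    uint64_t last = addr + (size ? size - 1 : 0);

    assert(r != NULL || n == 0);

    /* Single-page accesses only; the sketch rejects instead of asserting. */
    if (addr / PAGE_SIZE != last / PAGE_SIZE) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        if (r[i].writable && addr >= r[i].start && last <= r[i].end) {
            return true;
        }
    }
    return false;
}
```

With a page split into a writable first 64 bytes and a read-only rest, a
probe with size 0 at the start of the page succeeds, a 64-byte probe also
succeeds, and a 128-byte probe fails. A size-less probe cannot tell the
last two cases apart, which is the gap the new parameter closes.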



Re: [PATCH 12/12] hw: Move ich9.h to southbridge/

2023-03-01 Thread Michael S. Tsirkin
On Mon, Feb 27, 2023 at 01:22:37PM +0100, Philippe Mathieu-Daudé wrote:
> On 13/2/23 18:30, Bernhard Beschow wrote:
> > ICH9 is a south bridge which doesn't necessarily depend on x86, so move
> > it into the southbridge folder, analogous to PIIX.
> 
> However it is still tied to it due to:
> 
> hw/isa/lpc_ich9.c:315:cpu_interrupt(first_cpu, CPU_INTERRUPT_SMI);
> hw/isa/lpc_ich9.c:462:cpu_interrupt(cs, CPU_INTERRUPT_SMI);
> hw/isa/lpc_ich9.c:465:cpu_interrupt(current_cpu,
> CPU_INTERRUPT_SMI);
> target/i386/cpu.h:1145:#define CPU_INTERRUPT_SMI CPU_INTERRUPT_TGT_EXT_2

I guess at least the commit log should be changed then.


> > Signed-off-by: Bernhard Beschow 
> > ---
> >   MAINTAINERS | 1 +
> >   include/hw/{i386 => southbridge}/ich9.h | 6 +++---
> >   hw/acpi/ich9.c  | 2 +-
> >   hw/acpi/ich9_tco.c  | 2 +-
> >   hw/i2c/smbus_ich9.c | 2 +-
> >   hw/i386/acpi-build.c| 2 +-
> >   hw/i386/pc_q35.c| 2 +-
> >   hw/isa/lpc_ich9.c   | 2 +-
> >   hw/pci-bridge/i82801b11.c   | 2 +-
> >   tests/qtest/tco-test.c  | 2 +-
> >   10 files changed, 12 insertions(+), 11 deletions(-)
> >   rename include/hw/{i386 => southbridge}/ich9.h (99%)




Re: [PATCH v5 3/7] hw/isa/vt82c686: Implement PCI IRQ routing

2023-03-01 Thread BALATON Zoltan

On Wed, 1 Mar 2023, Bernhard Beschow wrote:

On 1 March 2023 11:15:02 UTC, BALATON Zoltan wrote:

On Wed, 1 Mar 2023, Bernhard Beschow wrote:

On 1 March 2023 00:17:09 UTC, BALATON Zoltan wrote:

The real VIA south bridges implement a PCI IRQ router which is configured
by the BIOS or the OS. In order to respect these configurations, QEMU
needs to implement it as well. The real chip may allow routing IRQs from
internal functions independently of PCI interrupts but since guests
usually configure it to a single shared interrupt we don't model that
here for simplicity.

Note: The implementation was taken from piix4_set_irq() in hw/isa/piix4.

Suggested-by: Bernhard Beschow 
Signed-off-by: BALATON Zoltan 
---
hw/isa/vt82c686.c | 38 +-
1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index 01e0148967..018a119964 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -604,6 +604,42 @@ static void via_isa_request_i8259_irq(void *opaque, int 
irq, int level)
qemu_set_irq(s->cpu_intr, level);
}

+static int via_isa_get_pci_irq(const ViaISAState *s, int irq_num)
+{
+switch (irq_num) {
+case 0:
+return s->dev.config[0x55] >> 4;
+case 1:
+return s->dev.config[0x56] & 0xf;
+case 2:
+return s->dev.config[0x56] >> 4;
+case 3:
+return s->dev.config[0x57] >> 4;
+}
+return 0;
+}
+
+static void via_isa_set_pci_irq(void *opaque, int irq_num, int level)
+{
+ViaISAState *s = opaque;
+PCIBus *bus = pci_get_bus(&s->dev);
+int i, pic_level, pic_irq = via_isa_get_pci_irq(s, irq_num);
+
+if (unlikely(pic_irq == 0 || pic_irq == 2 || pic_irq > 14)) {


Where does the "pic_irq > 14" come from? It's not mentioned in the datasheet.


Check at 0x3c register of USB and AC97 functions. For the others it may be 
valid but unlikely to be used hence we just disallow it. (In my version which 
also mapped IDE here I've checked for each source but there's no way to do that 
in this version.)


I'm not sure what you mean. The 0x3c regs aren't related to the PCI IRQ routing 
regs.

Moreover, as I wrote in my other mail, I wonder what the datasheet tries to 
tell us here at all. The information there partly contradicts itself.

Can you please clarify?


Here is the entire register description that you've partly quoted before:

Offset 3C - Interrupt Line (00h) .. RW
  7-4  Reserved                always reads 0
  3-0  USB Interrupt Routing   default = 16h
       0000 Disabled (default)
       0001 IRQ1
       0010 Reserved
       0011 IRQ3
       0100 IRQ4
       0101 IRQ5
       0110 IRQ6
       0111 IRQ7
       1000 IRQ8
       1001 IRQ9
       1010 IRQ10
       1011 IRQ11
       1100 IRQ12
       1101 IRQ13
       1110 IRQ14
       1111 Disabled

Apart from the obvious typo stating default 16h, the list above clearly 
says that the default is really 0, that both 0 and 15 mean Disabled (so if 
this is a copy-paste error and the default should be 15, that would still 
mean it's disabled by default), and that it could be routed to any other 
ISA IRQ, but you really should not route it to 2 as that would mess up the 
cascade IRQ. 
That's how I read that.
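
That reading can be written down as a small decoder. This is only a sketch
of the table as interpreted here, not code from the series:
via_decode_irq_field() is a hypothetical name, and it assumes the routing
field is the low nibble with 0000 and 1111 meaning disabled and 0010
reserved.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode a 4-bit IRQ routing field. Returns the ISA IRQ number (1, 3..14),
 * or -1 when the field means "disabled" (0000, 1111) or is the reserved
 * value 0010 (IRQ2 is the cascade input).
 */
int via_decode_irq_field(uint8_t reg)
{
    unsigned int field = reg & 0xf; /* bits 7-4 are reserved, read as 0 */

    if (field == 0 || field == 2 || field == 15) {
        return -1;
    }
    assert(field >= 1 && field <= 14);
    return (int)field;
}
```

The rejected values are the same three cases as the
"pic_irq == 0 || pic_irq == 2 || pic_irq > 14" test in the patch.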


And yes I was trying to tell you that this is not related to the PCI IRQ 
routing regs which only set the IRQ for the PIRQ pins and this one sets 
the IRQ for the function it belongs to (USB, AC97, etc.) independently of 
that. Your patch which is now in the series does not implement this but 
uses pci interrupts instead and still works because guests don't seem to 
actually route IRQs to different interrupts just put everything on IRQ9 so 
your patch still works. As this makes QEMU model simpler we can do that 
and later if we ever need to model this for a guest that actually wants to 
use this feature of the chip you'll have my v1 series in the list archives 
where I've tried to implement the above. For me we can end it here.


Regards,
BALATON Zoltan

Re: [PATCH v2 03/20] vfio/migration: Add VFIO migration pre-copy support

2023-03-01 Thread Jason Gunthorpe
On Wed, Mar 01, 2023 at 12:55:59PM -0700, Alex Williamson wrote:

> So it seems like what we need here is both a preface buffer size and a
> target device latency.  The QEMU pre-copy algorithm should factor both
> the remaining data size and the device latency into deciding when to
> transition to stop-copy, thereby allowing the device to feed actually
> relevant data into the algorithm rather than dictate its behavior.

I don't know that we can realistically estimate startup latency,
especially having the sender estimate latency on the receiver..

I feel like trying to overlap the device start up with the STOP phase
is an unnecessary optimization? What benefit do you see in it?

I've been thinking of this from the perspective that we should always
ensure device startup is completed, it is time that has to be paid,
why pay it during STOP?

Jason



Re: [PATCH v5 00/18] pci hotplug tracking

2023-03-01 Thread Michael S. Tsirkin
On Thu, Feb 16, 2023 at 09:03:38PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> v5: - don't deprecate IDs and return to ID & QOM scheme
> - split complicated HOTPLUG_STATE patch into several ones


picked up 1-12

new events and commands need more review, in particular by qapi
maintainers.

> 
> 
> The main patches are the last four ones:
> 
> - introduce HOTPLUG_STATE event, that informs when the hotplug controller
> changes its state, especially indicator leds
> 
> - query-hotplug command, that provides same information as event on
> demand
> 
> - DEVICE_ON event - a kind of counterpart for DEVICE_DELETED, signals
> when device is finally accepted by guest, power indicator is on and so
> on.
> 
> That's all for smarter handling of SHPC and PCIe-native hotplug.
> 
> If you want to test new events, don't forget
>   -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off
> flag, to disable ACPI hotplug default.
> 
> Vladimir Sementsov-Ogievskiy (18):
>   pci/shpc: set attention led to OFF on reset
>   pci/shpc: change shpc_get_status() return type to uint8_t
>   pci/shpc: shpc_slot_command(): handle PWRONLY -> ENABLED transition
>   pci/shpc: more generic handle hot-unplug in shpc_slot_command()
>   pci/shpc: pass PCIDevice pointer to shpc_slot_command()
>   pci/shpc: refactor shpc_device_plug_common()
>   pcie: pcie_cap_slot_write_config(): use correct macro
>   pcie_regs: drop duplicated indicator value macros
>   pcie: drop unused PCIExpressIndicator
>   pcie: pcie_cap_slot_enable_power() use correct helper
>   pcie: introduce pcie_sltctl_powered_off() helper
>   pcie: set power indicator to off on reset by default
>   pci: introduce pci_find_the_only_child()
>   qapi/qdev.json: unite DEVICE_* event data into single structure
>   qapi: add HOTPLUG_STATE infrastructure
>   shpc: implement HOTPLUG_STATE event and query-hotplug
>   pcie: implement HOTPLUG_STATE event and query-hotplug
>   qapi: introduce DEVICE_ON event
> 
>  qapi/qdev.json  | 224 ++--
>  include/hw/hotplug.h|  12 ++
>  include/hw/pci/pci.h|   1 +
>  include/hw/pci/pci_bridge.h |   2 +
>  include/hw/pci/pcie.h   |  10 +-
>  include/hw/pci/pcie_regs.h  |  14 --
>  include/hw/pci/shpc.h   |   2 +
>  include/monitor/qdev.h  |   7 +
>  hw/core/hotplug.c   |  13 ++
>  hw/pci-bridge/pci_bridge_dev.c  |  14 ++
>  hw/pci-bridge/pcie_pci_bridge.c |   1 +
>  hw/pci/pci.c|  33 +
>  hw/pci/pcie.c   | 122 +++--
>  hw/pci/pcie_port.c  |   1 +
>  hw/pci/shpc.c   | 214 ++
>  softmmu/qdev-monitor.c  |  67 ++
>  16 files changed, 639 insertions(+), 98 deletions(-)
> 
> -- 
> 2.34.1




Re: [PATCH v3 4/8] hw/isa/vt82c686: Implement PCI IRQ routing

2023-03-01 Thread BALATON Zoltan

On Wed, 1 Mar 2023, Mark Cave-Ayland wrote:

On 27/02/2023 16:52, Bernhard Beschow wrote:
On Mon, Feb 27, 2023 at 1:57 PM BALATON Zoltan > wrote:


On Mon, 27 Feb 2023, BALATON Zoltan wrote:
 > On Mon, 27 Feb 2023, BALATON Zoltan wrote:
 >> On Mon, 27 Feb 2023, Bernhard Beschow wrote:
 >>> On 26 February 2023 23:33:20 UTC, BALATON Zoltan
 >>> <bala...@eik.bme.hu> wrote:
  On Sun, 26 Feb 2023, Bernhard Beschow wrote:
 > On 25 February 2023 18:11:49 UTC, BALATON Zoltan
 > <bala...@eik.bme.hu> wrote:
 >> From: Bernhard Beschow >

 >>
 >> The real VIA south bridges implement a PCI IRQ router which is
 >> configured
 >> by the BIOS or the OS. In order to respect these 
configurations, QEMU

 >> needs to implement it as well.
 >>
 >> Note: The implementation was taken from piix4_set_irq() in
 >> hw/isa/piix4.
 >>
 >> Signed-off-by: Bernhard Beschow mailto:shen...@gmail.com>>
 >> [balaton: declare gpio inputs instead of changing pci bus irqs 
so it

 >> can
 >> be connected in board code; remove some empty lines]
 >> Signed-off-by: BALATON Zoltan mailto:bala...@eik.bme.hu>>
 >> Tested-by: Rene Engel >

 >> ---
 >> hw/isa/vt82c686.c | 39 +++
 >> 1 file changed, 39 insertions(+)
 >>
 >> diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
 >> index 3f9bd0c04d..4025f9bcdc 100644
 >> --- a/hw/isa/vt82c686.c
 >> +++ b/hw/isa/vt82c686.c
 >> @@ -604,6 +604,44 @@ static void via_isa_request_i8259_irq(void
 >> *opaque, int irq, int level)
 >>     qemu_set_irq(s->cpu_intr, level);
 >> }
 >>
 >> +static int via_isa_get_pci_irq(const ViaISAState *s, int 
irq_num)

 >> +{
 >> +    switch (irq_num) {
 >> +    case 0:
 >> +        return s->dev.config[0x55] >> 4;
 >> +    case 1:
 >> +        return s->dev.config[0x56] & 0xf;
 >> +    case 2:
 >> +        return s->dev.config[0x56] >> 4;
 >> +    case 3:
 >> +        return s->dev.config[0x57] >> 4;
 >> +    }
 >> +    return 0;
 >> +}
 >> +
 >> +static void via_isa_set_pci_irq(void *opaque, int irq_num, int 
level)

 >> +{
 >> +    ViaISAState *s = opaque;
 >> +    PCIBus *bus = pci_get_bus(&s->dev);
 >> +    int pic_irq;
 >> +
 >> +    /* now we change the pic irq level according to the via 
irq

 >> mappings */
 >> +    /* XXX: optimize */
 >> +    pic_irq = via_isa_get_pci_irq(s, irq_num);
 >> +    if (pic_irq < ISA_NUM_IRQS) {
 >> +        int i, pic_level;
 >> +
 >> +        /* The pic level is the logical OR of all the PCI irqs 
mapped

 >> to it. */
 >> +        pic_level = 0;
 >> +        for (i = 0; i < PCI_NUM_PINS; i++) {
 >> +            if (pic_irq == via_isa_get_pci_irq(s, i)) {
 >> +                pic_level |= pci_bus_get_irq_level(bus, i);
 >> +            }
 >> +        }
 >> +        qemu_set_irq(s->isa_irqs[pic_irq], pic_level);
 >> +    }
 >> +}
 >> +
 >> static void via_isa_realize(PCIDevice *d, Error **errp)
 >> {
 >>     ViaISAState *s = VIA_ISA(d);
 >> @@ -614,6 +652,7 @@ static void via_isa_realize(PCIDevice *d, 
Error

 >> **errp)
 >>     int i;
 >>
 >>     qdev_init_gpio_out(dev, &s->cpu_intr, 1);
 >> +    qdev_init_gpio_in_named(dev, via_isa_set_pci_irq, "pirq",
 >> PCI_NUM_PINS);
 >
 > This line is a Pegasos2 specific addition for fixing its IRQ 
handling.
 > Since this code must also work with the Fuloong2e board we 
should aim
 > for a minimal changeset here which renders this line out of 
scope.

 >
 > Let's keep the two series separate since now I need to watch two 
series

 > for comments. Please use Based-on: tag next time instead.
 
  Well, it's not. It's part of the QDev model for VT8231 that 
allows it to
  be connected by boards so I think this belongs here otherwise 
this won't
  even compile because the function you've added would be unused 
and bail
  on -Werror. Let's not make this more difficult than it is. I'm OK 
with
  reasonable changes but what's your goal now? You can't get rid of 
this
  line as it's how QDev can model it. Either I have to call into 
this model

  or have to export these pins as gpios.
 >>>
 >>> Exporting the pins is a separate aspect on top of implementing PCI 
IRQ
 >>> routing. To make 

Re: [PATCH v5 00/18] pci hotplug tracking

2023-03-01 Thread Michael S. Tsirkin
On Thu, Feb 16, 2023 at 09:03:38PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> v5: - don't deprecate IDs and return to ID & QOM scheme
> - split complicated HOTPLUG_STATE patch into several ones


One small point: when you change patchset subject, that is ok,
but pls reply to old patchset with an email explaining that.

> 
> 
> The main patches are the last four ones:
> 
> - introduce HOTPLUG_STATE event, that informs when the hotplug controller
> changes its state, especially indicator leds
> 
> - query-hotplug command, that provides same information as event on
> demand
> 
> - DEVICE_ON event - a kind of counterpart for DEVICE_DELETED, signals
> when device is finally accepted by guest, power indicator is on and so
> on.
> 
> That's all for smarter handling of SHPC and PCIe-native hotplug.
> 
> If you want to test new events, don't forget
>   -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off
> flag, to disable ACPI hotplug default.
> 
> Vladimir Sementsov-Ogievskiy (18):
>   pci/shpc: set attention led to OFF on reset
>   pci/shpc: change shpc_get_status() return type to uint8_t
>   pci/shpc: shpc_slot_command(): handle PWRONLY -> ENABLED transition
>   pci/shpc: more generic handle hot-unplug in shpc_slot_command()
>   pci/shpc: pass PCIDevice pointer to shpc_slot_command()
>   pci/shpc: refactor shpc_device_plug_common()
>   pcie: pcie_cap_slot_write_config(): use correct macro
>   pcie_regs: drop duplicated indicator value macros
>   pcie: drop unused PCIExpressIndicator
>   pcie: pcie_cap_slot_enable_power() use correct helper
>   pcie: introduce pcie_sltctl_powered_off() helper
>   pcie: set power indicator to off on reset by default
>   pci: introduce pci_find_the_only_child()
>   qapi/qdev.json: unite DEVICE_* event data into single structure
>   qapi: add HOTPLUG_STATE infrastructure
>   shpc: implement HOTPLUG_STATE event and query-hotplug
>   pcie: implement HOTPLUG_STATE event and query-hotplug
>   qapi: introduce DEVICE_ON event
> 
>  qapi/qdev.json  | 224 ++--
>  include/hw/hotplug.h|  12 ++
>  include/hw/pci/pci.h|   1 +
>  include/hw/pci/pci_bridge.h |   2 +
>  include/hw/pci/pcie.h   |  10 +-
>  include/hw/pci/pcie_regs.h  |  14 --
>  include/hw/pci/shpc.h   |   2 +
>  include/monitor/qdev.h  |   7 +
>  hw/core/hotplug.c   |  13 ++
>  hw/pci-bridge/pci_bridge_dev.c  |  14 ++
>  hw/pci-bridge/pcie_pci_bridge.c |   1 +
>  hw/pci/pci.c|  33 +
>  hw/pci/pcie.c   | 122 +++--
>  hw/pci/pcie_port.c  |   1 +
>  hw/pci/shpc.c   | 214 ++
>  softmmu/qdev-monitor.c  |  67 ++
>  16 files changed, 639 insertions(+), 98 deletions(-)
> 
> -- 
> 2.34.1




Re: [PATCH v2 07/10] hw/ide/piix: Require an ISABus only for user-created instances

2023-03-01 Thread Bernhard Beschow



On 1 March 2023 16:42:16 UTC, Mark Cave-Ayland wrote:
>On 23/02/2023 20:46, Bernhard Beschow wrote:
>> 
>> 
>> On 7 February 2023 20:52:02 UTC, Mark Cave-Ayland wrote:
>>> On 06/02/2023 23:40, Bernhard Beschow wrote:
>>> 
 On 5 February 2023 22:32:03 UTC, Mark Cave-Ayland wrote:
> On 05/02/2023 22:21, BALATON Zoltan wrote:
> 
>> On Sun, 5 Feb 2023, Mark Cave-Ayland wrote:
>>> On 26/01/2023 21:17, Bernhard Beschow wrote:
 Internal instances now defer interrupt wiring to the caller which
 decouples them from the ISABus. User-created devices still fish out the
 ISABus from the QOM tree and the interrupt wiring remains in PIIX IDE.
 The latter mechanism is considered a workaround and intended to be
 removed once a deprecation period for user-created PIIX IDE devices is
 over.
 
 Signed-off-by: Bernhard Beschow 
 ---
     include/hw/ide/pci.h |  1 +
     hw/ide/piix.c    | 64 
 ++--
     hw/isa/piix.c    |  5 
     3 files changed, 56 insertions(+), 14 deletions(-)
 
 diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
 index 24c0b7a2dd..ee2c8781b7 100644
 --- a/include/hw/ide/pci.h
 +++ b/include/hw/ide/pci.h
 @@ -54,6 +54,7 @@ struct PCIIDEState {
     MemoryRegion bmdma_bar;
     MemoryRegion cmd_bar[2];
     MemoryRegion data_bar[2];
 +    bool user_created;
     };
       static inline IDEState *bmdma_active_if(BMDMAState *bmdma)
 diff --git a/hw/ide/piix.c b/hw/ide/piix.c
 index 5980045db0..f0d95761ac 100644
 --- a/hw/ide/piix.c
 +++ b/hw/ide/piix.c
 @@ -108,6 +108,13 @@ static void bmdma_setup_bar(PCIIDEState *d)
     }
     }
     +static void piix_ide_set_irq(void *opaque, int n, int level)
 +{
 +    PCIIDEState *d = opaque;
 +
 +    qemu_set_irq(d->isa_irqs[n], level);
 +}
 +
     static void piix_ide_reset(DeviceState *dev)
     {
     PCIIDEState *d = PCI_IDE(dev);
 @@ -138,11 +145,18 @@ static void pci_piix_init_ports(PCIIDEState *d, 
 ISABus *isa_bus)
     };
     int i;
     +    if (isa_bus) {
 +    d->isa_irqs[0] = isa_bus->irqs[port_info[0].isairq];
 +    d->isa_irqs[1] = isa_bus->irqs[port_info[1].isairq];
 +    } else {
 +    qdev_init_gpio_out(DEVICE(d), d->isa_irqs, 2);
 +    }
 +
     for (i = 0; i < 2; i++) {
         ide_bus_init(&d->bus[i], sizeof(d->bus[i]), DEVICE(d), i, 
 2);
         ide_init_ioport(&d->bus[i], NULL, port_info[i].iobase,
         port_info[i].iobase2);
 -    ide_init2(&d->bus[i], isa_bus->irqs[port_info[i].isairq]);
 +    ide_init2(&d->bus[i], qdev_get_gpio_in(DEVICE(d), i));
       bmdma_init(&d->bus[i], &d->bmdma[i], d);
     d->bmdma[i].bus = &d->bus[i];
 @@ -154,8 +168,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, 
 Error **errp)
     {
     PCIIDEState *d = PCI_IDE(dev);
     uint8_t *pci_conf = dev->config;
 -    ISABus *isa_bus;
 -    bool ambiguous;
 +    ISABus *isa_bus = NULL;
       pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode
     @@ -164,22 +177,36 @@ static void pci_piix_ide_realize(PCIDevice 
 *dev, Error **errp)
       vmstate_register(VMSTATE_IF(dev), 0, &vmstate_ide_pci, d);
     -    isa_bus = ISA_BUS(object_resolve_path_type("", TYPE_ISA_BUS, 
 &ambiguous));
 -    if (ambiguous) {
 -    error_setg(errp,
 -   "More than one ISA bus found while %s supports 
 only one",
 -   object_get_typename(OBJECT(dev)));
 -    return;
 -    }
 -    if (!isa_bus) {
 -    error_setg(errp, "No ISA bus found while %s requires one",
 -   object_get_typename(OBJECT(dev)));
 -    return;
 +    if (d->user_created) {
 +    bool ambiguous;
 +
 +    isa_bus = ISA_BUS(object_resolve_path_type("", TYPE_ISA_BUS,
 +   &ambiguous));
 +
 +    if (ambiguous) {
 +    error_setg(errp,
 +   "More than one ISA bus found while %s supports 
 only one",
 +   object_get_typename(OBJECT(dev)));
 +    return;
 +    }
 +
 +    if (!isa_bus) {
 +    error_setg(errp, "No ISA bus found while %s requires one",

Re: [PATCH v5 13/18] pci: introduce pci_find_the_only_child()

2023-03-01 Thread Michael S. Tsirkin
On Thu, Feb 16, 2023 at 09:03:51PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> To be used in further patch to identify the device hot-plugged into
> pcie-root-port.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Anton Kuchin 

Wait a second does this work for multifunction devices correctly?

> ---
>  include/hw/pci/pci.h |  1 +
>  hw/pci/pci.c | 33 +
>  2 files changed, 34 insertions(+)
> 
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index d5a40cd058..b6c9c44527 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -341,6 +341,7 @@ void pci_for_each_device_under_bus_reverse(PCIBus *bus,
>  void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
>pci_bus_fn end, void *parent_state);
>  PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
> +PCIDevice *pci_find_the_only_child(PCIBus *bus, int bus_num, Error **errp);
>  
>  /* Use this wrapper when specific scan order is not required. */
>  static inline
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 208c16f450..34fd1fb5b8 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -1771,6 +1771,39 @@ void pci_for_each_device(PCIBus *bus, int bus_num,
>  }
>  }
>  
> +typedef struct TheOnlyChild {
> +PCIDevice *dev;
> +int count;
> +} TheOnlyChild;
> +
> +static void the_only_child_fn(PCIBus *bus, PCIDevice *dev, void *opaque)
> +{
> +TheOnlyChild *s = opaque;
> +
> +s->dev = dev;
> +s->count++;
> +}
> +
> +PCIDevice *pci_find_the_only_child(PCIBus *bus, int bus_num, Error **errp)
> +{
> +TheOnlyChild res = {0};
> +
> +pci_for_each_device(bus, bus_num, the_only_child_fn, &res);
> +
> +if (!res.dev) {
> +assert(res.count == 0);
> +error_setg(errp, "No child devices found");
> +return NULL;
> +}
> +
> +if (res.count > 1) {
> +error_setg(errp, "Several child devices found");
> +return NULL;
> +}
> +
> +return res.dev;
> +}
> +
>  const pci_class_desc *get_class_desc(int class)
>  {
>  const pci_class_desc *desc;
> -- 
> 2.34.1
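
To make the multifunction question concrete, here is a hedged sketch. It
assumes the iteration callback runs once per PCI function, so a device
with three functions would bump the counter three times. count_slots() is
a hypothetical helper, not something in the patch; it counts distinct
slots instead, using the standard devfn layout (slot in bits 7-3,
function in bits 2-0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI devfn encoding: slot in bits 7-3, function in bits 2-0. */
#define SLOT_OF(devfn) (((devfn) >> 3) & 0x1f)

/* Count distinct slots, so one multifunction device counts once. */
size_t count_slots(const uint8_t *devfns, size_t n)
{
    uint32_t seen = 0; /* one bit per slot, 32 slots per bus */
    size_t slots = 0;

    assert(devfns != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        uint32_t bit = UINT32_C(1) << SLOT_OF(devfns[i]);

        if (!(seen & bit)) {
            seen |= bit;
            slots++;
        }
    }
    return slots;
}
```

For devfn values 0x00, 0x01 and 0x02 (one device, three functions) a
per-function count gives 3 while count_slots() gives 1. Whether the
helper in the patch should behave that way is for the author to decide.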




Re: [PATCH v5 3/7] hw/isa/vt82c686: Implement PCI IRQ routing

2023-03-01 Thread Bernhard Beschow



On 1 March 2023 11:15:02 UTC, BALATON Zoltan wrote:
>On Wed, 1 Mar 2023, Bernhard Beschow wrote:
>> On 1 March 2023 00:17:09 UTC, BALATON Zoltan wrote:
>>> The real VIA south bridges implement a PCI IRQ router which is configured
>>> by the BIOS or the OS. In order to respect these configurations, QEMU
>>> needs to implement it as well. The real chip may allow routing IRQs from
>>> internal functions independently of PCI interrupts but since guests
>>> usually configure it to a single shared interrupt we don't model that
>>> here for simplicity.
>>> 
>>> Note: The implementation was taken from piix4_set_irq() in hw/isa/piix4.
>>> 
>>> Suggested-by: Bernhard Beschow 
>>> Signed-off-by: BALATON Zoltan 
>>> ---
>>> hw/isa/vt82c686.c | 38 +-
>>> 1 file changed, 37 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
>>> index 01e0148967..018a119964 100644
>>> --- a/hw/isa/vt82c686.c
>>> +++ b/hw/isa/vt82c686.c
>>> @@ -604,6 +604,42 @@ static void via_isa_request_i8259_irq(void *opaque, 
>>> int irq, int level)
>>> qemu_set_irq(s->cpu_intr, level);
>>> }
>>> 
>>> +static int via_isa_get_pci_irq(const ViaISAState *s, int irq_num)
>>> +{
>>> +switch (irq_num) {
>>> +case 0:
>>> +return s->dev.config[0x55] >> 4;
>>> +case 1:
>>> +return s->dev.config[0x56] & 0xf;
>>> +case 2:
>>> +return s->dev.config[0x56] >> 4;
>>> +case 3:
>>> +return s->dev.config[0x57] >> 4;
>>> +}
>>> +return 0;
>>> +}
>>> +
>>> +static void via_isa_set_pci_irq(void *opaque, int irq_num, int level)
>>> +{
>>> +ViaISAState *s = opaque;
>>> +PCIBus *bus = pci_get_bus(&s->dev);
>>> +int i, pic_level, pic_irq = via_isa_get_pci_irq(s, irq_num);
>>> +
>>> +if (unlikely(pic_irq == 0 || pic_irq == 2 || pic_irq > 14)) {
>> 
>> Where does the "pic_irq > 14" come from? It's not mentioned in the datasheet.
>
>Check at 0x3c register of USB and AC97 functions. For the others it may be 
>valid but unlikely to be used hence we just disallow it. (In my version which 
>also mapped IDE here I've checked for each source but there's no way to do 
>that in this version.)

I'm not sure what you mean. The 0x3c regs aren't related to the PCI IRQ routing 
regs.

Moreover, as I wrote in my other mail, I wonder what the datasheet tries to 
tell us here at all. The information there partly contradicts itself.

Can you please clarify?

Thanks,
Bernhard

>
>Regards,
>BALATON Zoltan
>
>>> +return;
>>> +}
>>> +
>>> +/* The pic level is the logical OR of all the PCI irqs mapped to it. */
>>> +pic_level = 0;
>>> +for (i = 0; i < PCI_NUM_PINS; i++) {
>>> +if (pic_irq == via_isa_get_pci_irq(s, i)) {
>>> +pic_level |= pci_bus_get_irq_level(bus, i);
>>> +}
>>> +}
>>> +/* Now we change the pic irq level according to the via irq mappings. 
>>> */
>>> +qemu_set_irq(s->isa_irqs_in[pic_irq], pic_level);
>>> +}
>>> +
>>> static void via_isa_realize(PCIDevice *d, Error **errp)
>>> {
>>> ViaISAState *s = VIA_ISA(d);
>>> @@ -615,9 +651,9 @@ static void via_isa_realize(PCIDevice *d, Error **errp)
>>> 
>>> qdev_init_gpio_out(dev, &s->cpu_intr, 1);
>>> isa_irq = qemu_allocate_irqs(via_isa_request_i8259_irq, s, 1);
>>> +qdev_init_gpio_in_named(dev, via_isa_set_pci_irq, "pirq", 
>>> PCI_NUM_PINS);
>>> isa_bus = isa_bus_new(dev, pci_address_space(d), 
>>> pci_address_space_io(d),
>>>   errp);
>>> -
>>> if (!isa_bus) {
>>> return;
>>> }
>> 
>>



Re: [PATCH v5 14/18] qapi/qdev.json: unite DEVICE_* event data into single structure

2023-03-01 Thread Michael S. Tsirkin
On Thu, Feb 16, 2023 at 09:03:52PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> DEVICE_DELETED and DEVICE_UNPLUG_GUEST_ERROR have the same data, let's
> refactor them into one structure. That also helps to add new events
> consistently.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Needs ack from QAPI maintainers.

> ---
>  qapi/qdev.json | 39 +++
>  1 file changed, 27 insertions(+), 12 deletions(-)
> 
> diff --git a/qapi/qdev.json b/qapi/qdev.json
> index 2708fb4e99..135cd81586 100644
> --- a/qapi/qdev.json
> +++ b/qapi/qdev.json
> @@ -114,16 +114,37 @@
>  { 'command': 'device_del', 'data': {'id': 'str'} }
>  
>  ##
> -# @DEVICE_DELETED:
> +# @DeviceAndPath:
>  #
> -# Emitted whenever the device removal completion is acknowledged by the 
> guest.
> -# At this point, it's safe to reuse the specified device ID. Device removal 
> can
> -# be initiated by the guest or by HMP/QMP commands.
> +# In events we designate devices by both their ID (if the device has one)
> +# and QOM path.
> +#
> +# Why we need ID? User specify ID in device_add command and in command line
> +# and expects same identifier in the event data.
> +#
> +# Why we need QOM path? Some devices don't have ID and we still want to emit
> +# events for them.
> +#
> +# So, we have a bit of redundancy, as QOM path for device that has ID is
> +# always /machine/peripheral/ID. But that's hard to change keeping both
> +# simple interface for most users and universality for the generic case.
>  #
>  # @device: the device's ID if it has one
>  #
>  # @path: the device's QOM path
>  #
> +# Since: 8.0
> +##
> +{ 'struct': 'DeviceAndPath',
> +  'data': { '*device': 'str', 'path': 'str' } }
> +
> +##
> +# @DEVICE_DELETED:
> +#
> +# Emitted whenever the device removal completion is acknowledged by the 
> guest.
> +# At this point, it's safe to reuse the specified device ID. Device removal 
> can
> +# be initiated by the guest or by HMP/QMP commands.
> +#
>  # Since: 1.5
>  #
>  # Example:
> @@ -134,18 +155,13 @@
>  #  "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
>  #
>  ##
> -{ 'event': 'DEVICE_DELETED',
> -  'data': { '*device': 'str', 'path': 'str' } }
> +{ 'event': 'DEVICE_DELETED', 'data': 'DeviceAndPath' }
>  
>  ##
>  # @DEVICE_UNPLUG_GUEST_ERROR:
>  #
>  # Emitted when a device hot unplug fails due to a guest reported error.
>  #
> -# @device: the device's ID if it has one
> -#
> -# @path: the device's QOM path
> -#
>  # Since: 6.2
>  #
>  # Example:
> @@ -156,5 +172,4 @@
>  #  "timestamp": { "seconds": 1615570772, "microseconds": 202844 } }
>  #
>  ##
> -{ 'event': 'DEVICE_UNPLUG_GUEST_ERROR',
> -  'data': { '*device': 'str', 'path': 'str' } }
> +{ 'event': 'DEVICE_UNPLUG_GUEST_ERROR', 'data': 'DeviceAndPath' }
> -- 
> 2.34.1




Re: [PATCH v5 18/18] qapi: introduce DEVICE_ON event

2023-03-01 Thread Michael S. Tsirkin
On Thu, Feb 16, 2023 at 09:03:56PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> We have the DEVICE_DELETED event, which signals that the device_del command has
> actually completed. But we don't have a counterpart for device_add.
> Still, it's sensible for SHPC and PCIe-native hotplug, as there are times
> when the device is in some intermediate state. Let's add an event that says
> that the device is finally powered on, the power indicator is on and
> everything is OK for the next manipulation of that device.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

I don't much mind, though a bit more motivation would be nice.
How is this going to be used? When does management care?

Meanwhile, for the schema - can this one get ACKs from QAPI maintainers please?


> ---
>  qapi/qdev.json | 10 ++
>  hw/pci/pcie.c  | 14 ++
>  hw/pci/shpc.c  | 12 
>  3 files changed, 36 insertions(+)
> 
> diff --git a/qapi/qdev.json b/qapi/qdev.json
> index 6f2d8d6647..116a8a7de8 100644
> --- a/qapi/qdev.json
> +++ b/qapi/qdev.json
> @@ -348,3 +348,13 @@
>  { 'command': 'query-hotplug',
>'data': { 'id': 'str' },
>'returns': 'HotplugInfo' }
> +
> +##
> +# @DEVICE_ON:
> +#
> +# Emitted whenever the device insertion completion is acknowledged by the 
> guest.
> +# For now only emitted for SHPC and PCIe-native hotplug.
> +#
> +# Since: 8.0
> +##
> +{ 'event': 'DEVICE_ON', 'data': 'DeviceAndPath' }
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 636f962a23..4297e4e8dc 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -22,6 +22,7 @@
>  
>  #include "monitor/qdev.h"
>  #include "qapi/error.h"
> +#include "qapi/qapi-events-qdev.h"
>  #include "hw/pci/pci_bridge.h"
>  #include "hw/pci/pcie.h"
>  #include "hw/pci/msix.h"
> @@ -47,6 +48,13 @@ static bool pcie_sltctl_powered_off(uint16_t sltctl)
>  && (sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_OFF;
>  }
>  
> +static bool pcie_sltctl_powered_on(uint16_t sltctl)
> +{
> +return (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON &&
> +(sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_ON &&
> +(sltctl & PCI_EXP_SLTCTL_AIC) == PCI_EXP_SLTCTL_ATTN_IND_OFF;
> +}
> +
>  static LedActivity pcie_led_state_to_qapi(uint16_t value)
>  {
>  switch (value) {
> @@ -816,6 +824,12 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
>  qdev_hotplug_state_event(DEVICE(dev), NULL, child_dev, 
> _state);
>  }
>  
> +if ((sltsta & PCI_EXP_SLTSTA_PDS) && pcie_sltctl_powered_on(val) &&
> +!pcie_sltctl_powered_on(old_slt_ctl) && child_dev)
> +{
> +qapi_event_send_device_on(child_dev->id, child_dev->canonical_path);
> +}
> +
>  /*
>   * If the slot is populated, power indicator is off and power
>   * controller is off, it is safe to detach the devices.
> diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
> index 6a4f93949d..380b2b83b3 100644
> --- a/hw/pci/shpc.c
> +++ b/hw/pci/shpc.c
> @@ -299,6 +299,12 @@ static bool shpc_slot_is_off(uint8_t state, uint8_t 
> power, uint8_t attn)
>  return state == SHPC_STATE_DISABLED && power == SHPC_LED_OFF;
>  }
>  
> +static bool shpc_slot_is_on(uint8_t state, uint8_t power, uint8_t attn)
> +{
> +return state == SHPC_STATE_ENABLED && power == SHPC_LED_ON &&
> +attn == SHPC_LED_OFF;
> +}
> +
>  static void shpc_slot_command(PCIDevice *d, uint8_t target,
>uint8_t state, uint8_t power, uint8_t attn)
>  {
> @@ -366,6 +372,12 @@ static void shpc_slot_command(PCIDevice *d, uint8_t 
> target,
>  SHPC_SLOT_EVENT_MRL |
>  SHPC_SLOT_EVENT_PRESENCE;
>  }
> +
> +if (!shpc_slot_is_on(old_state, old_power, old_attn) &&
> +shpc_slot_is_on(state, power, attn) && child_dev)
> +{
> +qapi_event_send_device_on(child_dev->id, child_dev->canonical_path);
> +}
>  }
>  
>  static void shpc_command(PCIDevice *d)
> -- 
> 2.34.1




Re: [PATCH 0/5] Pegasos2 fixes and audio output support

2023-03-01 Thread BALATON Zoltan

On Wed, 1 Mar 2023, Bernhard Beschow wrote:

On 1 March 2023 19:24:20 UTC, BALATON Zoltan wrote:

On Wed, 1 Mar 2023, Mark Cave-Ayland wrote:

On 23/02/2023 09:13, Bernhard Beschow wrote:

On 22 February 2023 23:00:02 UTC, BALATON Zoltan wrote:

On Wed, 22 Feb 2023, Bernhard Beschow wrote:

On 22 February 2023 21:12:01 UTC, BALATON Zoltan wrote:

On Wed, 22 Feb 2023, Bernhard Beschow wrote:

On 22 February 2023 19:25:16 UTC, BALATON Zoltan wrote:

On Wed, 22 Feb 2023, Bernhard Beschow wrote:

On Wed, Feb 22, 2023 at 4:38 PM Bernhard Beschow  wrote:

I've had a closer look at your series and I think it can be simplified:
Patch 2 can be implemented quite straightforwardly, as I proposed in a
private mail: https://github.com/shentok/qemu/commit/via-priq-routing.
Then, to make patch 3 "hw/ppc/pegasos2: Fix PCI interrupt routing"
work, one can expose the PCI interrupts with a single line like you do
in patch 2. With this, patch 1 "hw/isa/vt82c686: Implement interrupt
routing in via_isa_set_irq" isn't needed any longer and can be omitted.

In via-ac97, rather than using via_isa_set_irq(), pci_set_irq() can be
used instead. pci_set_irq() internally takes care of all the ISA interrupt
level tracking patch 1 attempted to address.



Here is a proof of concept branch to demonstrate that the simplification
actually works: https://github.com/shentok/qemu/commits/pegasos2 (Tested
with MorphOS with and without pegasos2.rom).


Does this only work because both the via-ac97 and the PCI interrupts are mapped 
to the same ISA IRQ and you've only tested sound? The guest could configure 
each device to use a different IRQ, also mapping them so they share one ISA 
interrupt. What happens if multiple devices are mapped to IRQ 9 (which is the 
case on pegasos2 where PCI cards, ac97 and USB all share this IRQ) and more 
than one such device wants to raise an interrupt at the same time? If you ack 
the ac97 interrupt but a PCI network card or the USB part still wants to get 
the CPU's attention, the ISA IRQ should remain raised until all devices are 
serviced.


pci_bus_get_irq_level(), used in via_isa_set_pci_irq(), should handle
exactly that case very well.


I don't see a way to track the status of all devices in a single qemu_irq which 
can only be up or down so we need something to store the state of each source.


pci_set_irq() causes pci_bus_change_irq_level() to be called.
pci_bus_change_irq_level() tracks the sum of all irq levels of all
devices attached to a particular pin in irq_count. Have a look at
pci_bus_change_irq_level() and you will understand better.
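
To illustrate the counting described above, here is a minimal, self-contained sketch. It is not QEMU's code: SharedIrqBus, bus_change_irq_level() and bus_get_irq_level() are hypothetical names, and NUM_PINS is an assumption. The model only relies on what is said above, namely that a per-pin counter (irq_count) sums the levels of all devices attached to that pin:

```c
#include <assert.h>

#define NUM_PINS 4 /* assumption: four interrupt pins, like PCI INTA..INTD */

/* Hypothetical model of a bus that keeps one counter per interrupt pin. */
typedef struct {
    int irq_count[NUM_PINS];
} SharedIrqBus;

/* Apply one device's level change: +1 when it asserts, -1 when it deasserts. */
void bus_change_irq_level(SharedIrqBus *bus, int pin, int change)
{
    assert(pin >= 0 && pin < NUM_PINS);
    bus->irq_count[pin] += change;
    assert(bus->irq_count[pin] >= 0);
}

/* The shared line stays raised while at least one device still asserts it. */
int bus_get_irq_level(const SharedIrqBus *bus, int pin)
{
    assert(pin >= 0 && pin < NUM_PINS);
    return bus->irq_count[pin] != 0;
}
```

If two devices assert the same pin and only one of them is acknowledged, the counter drops from 2 to 1 and the line stays high, so no separate per-source state has to be kept by the caller.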


I'm aware of that, we're using that in sam460ex which connects all PCI 
interrupt lines to a single IRQ and Peter explored and explained it in a 
comment there when that was discovered. First we had a patch with or-irq but 
due to this behaviour that's not needed for PCI interrupts. But the VT8231 could 
change what ISA IRQ you route the sub functions to.


That depends on the sub function if you can do that. And if so, then it depends 
on whether the function is still in PCI mode (see below).


It happens that on pegasos2 by default all of those are routed to IRQ9 except 
IDE


All *PCI* interrupts are routed to IRQ9 while IDE legacy interrupts are routed 
to the compatible ISA IRQs. Note that the IDE function must only trigger the 
ISA IRQs if it is in legacy mode while it must only trigger the PCI IRQ in 
non-legacy mode. See https://www.bswd.com/pciide.pdf for more details on this 
particular topic.


The docs say so, but judging by what guests that work on real hardware do, it 
does not work that way. Look up the previous discussion on this list from around 
the time Mark changed via-ide, about 4-5 years ago. That series was a result of 
his review of my proposed changes and resulted in an alternative approach. 
On pegasos2 (and probably also on fuloong2e, based on some later findings; see 
the patches for that, I can try to find them later if you can't) via-ide 
*always* uses IRQ 14/15, and native mode only switches the register addresses 
from legacy io ports to PCI io space, so you can set them with the BAR regs, but 
the IRQs don't change despite what the docs say. There are some hacks in the 
Linux kernel and other guests to account for this, but the comments giving the 
reason are wrong in Linux: they say IDE is always in legacy mode, but in fact it 
has what I called a half-native mode, where io addresses are set with BARs but 
IRQs are still the legacy ISA ones. You can find some references in the previous 
discussion. Searching for via-ide half-native mode will probably find it.



but what if a guest changes ac97 to use a different interrupt? Then it's not a 
PCI interrupt any more so you can't use pci_set_irq in via-ac97.


How would it do that? AFAICS there is no dedicated register to configure which IRQ 
to use. This means that it can only trigger an interrupt via its PCI intx pin 
which is subject to the PCI -> ISA IRQ 

Re: [PATCH v3 4/8] hw/isa/vt82c686: Implement PCI IRQ routing

2023-03-01 Thread Mark Cave-Ayland

On 27/02/2023 16:52, Bernhard Beschow wrote:

On Mon, Feb 27, 2023 at 1:57 PM BALATON Zoltan wrote:


On Mon, 27 Feb 2023, BALATON Zoltan wrote:
 > On Mon, 27 Feb 2023, BALATON Zoltan wrote:
 >> On Mon, 27 Feb 2023, Bernhard Beschow wrote:
 >>> On 26 February 2023 23:33:20 UTC, BALATON Zoltan
 >>> <bala...@eik.bme.hu> wrote:
  On Sun, 26 Feb 2023, Bernhard Beschow wrote:
 > On 25 February 2023 18:11:49 UTC, BALATON Zoltan
 > <bala...@eik.bme.hu> wrote:
 >> From: Bernhard Beschow <shen...@gmail.com>
 >>
 >> The real VIA south bridges implement a PCI IRQ router which is
 >> configured
 >> by the BIOS or the OS. In order to respect these configurations, 
QEMU
 >> needs to implement it as well.
 >>
 >> Note: The implementation was taken from piix4_set_irq() in
 >> hw/isa/piix4.
 >>
 >> Signed-off-by: Bernhard Beschow <shen...@gmail.com>
 >> [balaton: declare gpio inputs instead of changing pci bus irqs so it
 >> can
 >> be connected in board code; remove some empty lines]
 >> Signed-off-by: BALATON Zoltan <bala...@eik.bme.hu>
 >> Tested-by: Rene Engel <reneenge...@emailn.de>
 >> ---
 >> hw/isa/vt82c686.c | 39 +++
 >> 1 file changed, 39 insertions(+)
 >>
 >> diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
 >> index 3f9bd0c04d..4025f9bcdc 100644
 >> --- a/hw/isa/vt82c686.c
 >> +++ b/hw/isa/vt82c686.c
 >> @@ -604,6 +604,44 @@ static void via_isa_request_i8259_irq(void
 >> *opaque, int irq, int level)
 >>     qemu_set_irq(s->cpu_intr, level);
 >> }
 >>
 >> +static int via_isa_get_pci_irq(const ViaISAState *s, int irq_num)
 >> +{
 >> +    switch (irq_num) {
 >> +    case 0:
 >> +        return s->dev.config[0x55] >> 4;
 >> +    case 1:
 >> +        return s->dev.config[0x56] & 0xf;
 >> +    case 2:
 >> +        return s->dev.config[0x56] >> 4;
 >> +    case 3:
 >> +        return s->dev.config[0x57] >> 4;
 >> +    }
 >> +    return 0;
 >> +}
 >> +
 >> +static void via_isa_set_pci_irq(void *opaque, int irq_num, int 
level)
 >> +{
 >> +    ViaISAState *s = opaque;
 >> +    PCIBus *bus = pci_get_bus(&s->dev);
 >> +    int pic_irq;
 >> +
 >> +    /* now we change the pic irq level according to the via irq
 >> mappings */
 >> +    /* XXX: optimize */
 >> +    pic_irq = via_isa_get_pci_irq(s, irq_num);
 >> +    if (pic_irq < ISA_NUM_IRQS) {
 >> +        int i, pic_level;
 >> +
 >> +        /* The pic level is the logical OR of all the PCI irqs 
mapped
 >> to it. */
 >> +        pic_level = 0;
 >> +        for (i = 0; i < PCI_NUM_PINS; i++) {
 >> +            if (pic_irq == via_isa_get_pci_irq(s, i)) {
 >> +                pic_level |= pci_bus_get_irq_level(bus, i);
 >> +            }
 >> +        }
 >> +        qemu_set_irq(s->isa_irqs[pic_irq], pic_level);
 >> +    }
 >> +}
 >> +
 >> static void via_isa_realize(PCIDevice *d, Error **errp)
 >> {
 >>     ViaISAState *s = VIA_ISA(d);
 >> @@ -614,6 +652,7 @@ static void via_isa_realize(PCIDevice *d, Error
 >> **errp)
 >>     int i;
 >>
 >>     qdev_init_gpio_out(dev, &s->cpu_intr, 1);
 >> +    qdev_init_gpio_in_named(dev, via_isa_set_pci_irq, "pirq",
 >> PCI_NUM_PINS);
 >
 > This line is a Pegasos2 specific addition for fixing its IRQ 
handling.
 > Since this code must also work with the Fuloong2e board we should aim
 > for a minimal changeset here which renders this line out of scope.
 >
 > Let's keep the two series separate since now I need to watch two 
series
 > for comments. Please use Based-on: tag next time instead.
 
  Well, it's not. It's part of the QDev model for VT8231 that allows it 
to
  be connected by boards so I think this belongs here otherwise this 
won't
  even compile because the function you've added would be unused and 
bail
  on -Werror. Let's not make this more difficult than it is. I'm OK with
  reasonable changes but what's your goal now? You can't get rid of this
  line as it's how QDev can model it. Either I have to call into this 
model
  or have to export these pins as gpios.
 >>>
 >>> Exporting the pins is a separate aspect on top of implementing PCI IRQ
 >>> routing. To make this clear and obvious this should be a dedicated 
patch.
 >>> In 

[PATCH 5/6] hmp: convert handle_hmp_command() to AIO_WAIT_WHILE_UNLOCKED()

2023-03-01 Thread Stefan Hajnoczi
The HMP monitor runs in the main loop thread. Calling
AIO_WAIT_WHILE(qemu_get_aio_context(), ...) from the main loop thread is
equivalent to AIO_WAIT_WHILE_UNLOCKED(NULL, ...) because neither unlocks
the AioContext and the latter's assertion that we're in the main loop
succeeds.

Signed-off-by: Stefan Hajnoczi 
---
 monitor/hmp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/monitor/hmp.c b/monitor/hmp.c
index 2aa85d3982..5ecbdac802 100644
--- a/monitor/hmp.c
+++ b/monitor/hmp.c
@@ -1167,7 +1167,7 @@ void handle_hmp_command(MonitorHMP *mon, const char 
*cmdline)
 Coroutine *co = qemu_coroutine_create(handle_hmp_command_co, &data);
 monitor_set_cur(co, &mon->common);
 aio_co_enter(qemu_get_aio_context(), co);
-AIO_WAIT_WHILE(qemu_get_aio_context(), !data.done);
+AIO_WAIT_WHILE_UNLOCKED(NULL, !data.done);
 }
 
 qobject_unref(qdict);
-- 
2.39.2




Re: [PATCH v5 5/7] hw/isa/vt82c686: Work around missing level sensitive irq in i8259 model

2023-03-01 Thread Bernhard Beschow



On 1 March 2023 14:05:31 UTC, David Woodhouse wrote:
>On Wed, 2023-03-01 at 14:18 +0100, BALATON Zoltan wrote:
>> > Are you sure the PIC ELCR is actually set for the lines you're having
>> > trouble with? Is that something the Pegasos SmartFirmware would have
>> > done, and MorphOS is expecting to inherit but isn't actually setting up
>> > for itself?
>> 
>> No, it works with other guests like Linux and AmigaOS that use PIC as set 
>> up by the firmware but MorphOS tries to use it in level sensitive mode and 
>> likely has an IRQ handler which expects this to work. This is where I've 
>> debugged it and came to this workaround:
>> 
>> https://lists.nongnu.org/archive/html/qemu-ppc/2023-02/msg00403.html
>> 
>> When booting MorphOS with -d unimp I see these logs:
>> 
>> i8259: level sensitive irq not supported
>> i8259: level sensitive irq not supported
>> 
>> which is I guess when it tries to set it for both PICs. (If you want to 
>> try this, the MorphOS iso is downloadable and instructions on how to boot it are 
>> here: http://zero.eik.bme.hu/~balaton/qemu/amiga/#morphos)
>
>
>Wow. Even in the PIIX3 datasheet from 1996, that 'Edge/Level
>Bank Select (LTIM)' bit was documented as 'This bit is disabled. Its
>function is replaced by the Edge/Level Triggered Control (ELCR)
>Registers.'
>
>We've been able to set the edge/level on a per-pin basis ever since the
>ELCR was introduced with the IBM PS/2, I think.
>
>It isn't a *correct* fix without a little bit more typing, but does
>this make it work?
>
>diff --git a/hw/intc/i8259.c b/hw/intc/i8259.c
>index 17910f3bcb..36ebcff025 100644
>--- a/hw/intc/i8259.c
>+++ b/hw/intc/i8259.c
>@@ -246,6 +246,7 @@ static void pic_ioport_write(void *opaque, hwaddr addr64,
> if (val & 0x08) {
> qemu_log_mask(LOG_UNIMP,
>   "i8259: level sensitive irq not supported\n");
>+s->elcr = 0xff;

Thanks so much, David! You're a genius...

Best regards,

Bernhard

> }
> } else if (val & 0x08) {
> if (val & 0x04) {
>
>
>



[PATCH 3/6] block: convert bdrv_graph_wrlock() to AIO_WAIT_WHILE_UNLOCKED()

2023-03-01 Thread Stefan Hajnoczi
The following conversion is safe and does not change behavior:

 GLOBAL_STATE_CODE();
 ...
  -  AIO_WAIT_WHILE(qemu_get_aio_context(), ...);
  +  AIO_WAIT_WHILE_UNLOCKED(NULL, ...);

Since we're in GLOBAL_STATE_CODE(), qemu_get_aio_context() is our home
thread's AioContext. Thus AIO_WAIT_WHILE() does not unlock the
AioContext:

  if (ctx_ && in_aio_context_home_thread(ctx_)) {\
  while ((cond)) {   \
  aio_poll(ctx_, true);  \
  waited_ = true;\
  }  \

And that means AIO_WAIT_WHILE_UNLOCKED(NULL, ...) can be substituted.
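
The argument can be checked against a small stand-alone model. This is only a sketch under stated assumptions, not the real macro: ModelCtx, model_poll() and model_wait_while() are hypothetical names, the 'unlock' flag stands for the difference between the locked and the _UNLOCKED variant, and polling is simulated by completing one request per call:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an AioContext; not QEMU's real structure. */
typedef struct {
    bool is_home_thread; /* does the caller run in this context's home thread? */
    int unlock_calls;    /* how many times the wait loop dropped the lock */
} ModelCtx;

/* Simulated poll: each call completes one outstanding request. */
static void model_poll(int *in_flight)
{
    if (*in_flight > 0) {
        (*in_flight)--;
    }
}

/*
 * Simplified model of the wait loop. 'unlock' is true for the locked
 * variant and false for the _UNLOCKED variant. Returns true if at least
 * one poll iteration ran.
 */
bool model_wait_while(ModelCtx *ctx, bool unlock, int *in_flight)
{
    bool waited = false;

    assert(in_flight != NULL);
    if (ctx && ctx->is_home_thread) {
        /* Home thread: poll directly, the lock is never released. */
        while (*in_flight > 0) {
            model_poll(in_flight);
            waited = true;
        }
    } else {
        /* Foreign or NULL context: release only if asked and ctx exists. */
        while (*in_flight > 0) {
            if (unlock && ctx) {
                ctx->unlock_calls++;
            }
            model_poll(in_flight);
            waited = true;
        }
    }
    return waited;
}
```

In this model, calling with a home-thread context and unlock set, or with a NULL context and unlock cleared, both drain the requests without ever touching the lock, which is the equivalence the conversion relies on.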

Signed-off-by: Stefan Hajnoczi 
---
 block/graph-lock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/graph-lock.c b/block/graph-lock.c
index 454c31e691..639526608f 100644
--- a/block/graph-lock.c
+++ b/block/graph-lock.c
@@ -127,7 +127,7 @@ void bdrv_graph_wrlock(void)
  * reader lock.
  */
 qatomic_set(&has_writer, 0);
-AIO_WAIT_WHILE(qemu_get_aio_context(), reader_count() >= 1);
+AIO_WAIT_WHILE_UNLOCKED(NULL, reader_count() >= 1);
 qatomic_set(&has_writer, 1);
 
 /*
-- 
2.39.2




[PATCH 4/6] block: convert bdrv_drain_all_begin() to AIO_WAIT_WHILE_UNLOCKED()

2023-03-01 Thread Stefan Hajnoczi
Since the AioContext argument was already NULL, AIO_WAIT_WHILE() was
never going to unlock the AioContext. Therefore it is possible to
replace AIO_WAIT_WHILE() with AIO_WAIT_WHILE_UNLOCKED().

Signed-off-by: Stefan Hajnoczi 
---
 block/io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 8974d46941..db438c7657 100644
--- a/block/io.c
+++ b/block/io.c
@@ -520,7 +520,7 @@ void bdrv_drain_all_begin(void)
 bdrv_drain_all_begin_nopoll();
 
 /* Now poll the in-flight requests */
-AIO_WAIT_WHILE(NULL, bdrv_drain_all_poll());
+AIO_WAIT_WHILE_UNLOCKED(NULL, bdrv_drain_all_poll());
 
 while ((bs = bdrv_next_all_states(bs))) {
 bdrv_drain_assert_idle(bs);
-- 
2.39.2




[PATCH 6/6] monitor: convert monitor_cleanup() to AIO_WAIT_WHILE_UNLOCKED()

2023-03-01 Thread Stefan Hajnoczi
monitor_cleanup() is called from the main loop thread. Calling
AIO_WAIT_WHILE(qemu_get_aio_context(), ...) from the main loop thread is
equivalent to AIO_WAIT_WHILE_UNLOCKED(NULL, ...) because neither unlocks
the AioContext and the latter's assertion that we're in the main loop
succeeds.

Signed-off-by: Stefan Hajnoczi 
---
 monitor/monitor.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/monitor/monitor.c b/monitor/monitor.c
index 8dc96f6af9..602535696c 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -666,7 +666,7 @@ void monitor_cleanup(void)
  * We need to poll both qemu_aio_context and iohandler_ctx to make
  * sure that the dispatcher coroutine keeps making progress and
  * eventually terminates.  qemu_aio_context is automatically
- * polled by calling AIO_WAIT_WHILE on it, but we must poll
+ * polled by calling AIO_WAIT_WHILE_UNLOCKED on it, but we must poll
  * iohandler_ctx manually.
  *
  * Letting the iothread continue while shutting down the dispatcher
@@ -679,7 +679,7 @@ void monitor_cleanup(void)
 aio_co_wake(qmp_dispatcher_co);
 }
 
-AIO_WAIT_WHILE(qemu_get_aio_context(),
+AIO_WAIT_WHILE_UNLOCKED(NULL,
(aio_poll(iohandler_get_aio_context(), false),
 qatomic_mb_read(&qmp_dispatcher_co_busy)));
 
-- 
2.39.2




[PATCH 1/6] block: don't acquire AioContext lock in bdrv_drain_all()

2023-03-01 Thread Stefan Hajnoczi
There is no need for the AioContext lock in bdrv_drain_all() because
nothing in AIO_WAIT_WHILE() needs the lock and the condition is atomic.

Note that the NULL AioContext argument to AIO_WAIT_WHILE() is odd. In
the future it can be removed. There is an assertion in
AIO_WAIT_WHILE() that checks that we're in the main loop AioContext and
we would lose that check by dropping the argument. However, that was a
precursor to the GLOBAL_STATE_CODE()/IO_CODE() macros and is now a
duplicate check. So I think we won't lose much by dropping it, but let's
do a few more AIO_WAIT_WHILE_UNLOCKED() conversions of this sort to
confirm this is the case.

Signed-off-by: Stefan Hajnoczi 
---
 block/block-backend.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 278b04ce69..d2b6b3652d 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1835,14 +1835,8 @@ void blk_drain_all(void)
 bdrv_drain_all_begin();
 
 while ((blk = blk_all_next(blk)) != NULL) {
-AioContext *ctx = blk_get_aio_context(blk);
-
-aio_context_acquire(ctx);
-
 /* We may have -ENOMEDIUM completions in flight */
-AIO_WAIT_WHILE(ctx, qatomic_mb_read(&blk->in_flight) > 0);
-
-aio_context_release(ctx);
+AIO_WAIT_WHILE_UNLOCKED(NULL, qatomic_mb_read(&blk->in_flight) > 0);
 }
 
 bdrv_drain_all_end();
-- 
2.39.2




[PATCH 2/6] block: convert blk_exp_close_all_type() to AIO_WAIT_WHILE_UNLOCKED()

2023-03-01 Thread Stefan Hajnoczi
There is no change in behavior. Switch to AIO_WAIT_WHILE_UNLOCKED()
instead of AIO_WAIT_WHILE() to document that this code has already been
audited and converted. The AioContext argument is already NULL so
aio_context_release() is never called anyway.

Signed-off-by: Stefan Hajnoczi 
---
 block/export/export.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/export/export.c b/block/export/export.c
index 28a91c9c42..e3fee60611 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -306,7 +306,7 @@ void blk_exp_close_all_type(BlockExportType type)
 blk_exp_request_shutdown(exp);
 }
 
-AIO_WAIT_WHILE(NULL, blk_exp_has_type(type));
+AIO_WAIT_WHILE_UNLOCKED(NULL, blk_exp_has_type(type));
 }
 
 void blk_exp_close_all(void)
-- 
2.39.2




[PATCH 0/6] block: switch to AIO_WAIT_WHILE_UNLOCKED() where possible

2023-03-01 Thread Stefan Hajnoczi
AIO_WAIT_WHILE_UNLOCKED() is the future replacement for AIO_WAIT_WHILE(). Most
callers haven't been converted yet because they rely on the AioContext lock. I
looked through the code and found the easy cases that can be converted today.

Stefan Hajnoczi (6):
  block: don't acquire AioContext lock in bdrv_drain_all()
  block: convert blk_exp_close_all_type() to AIO_WAIT_WHILE_UNLOCKED()
  block: convert bdrv_graph_wrlock() to AIO_WAIT_WHILE_UNLOCKED()
  block: convert bdrv_drain_all_begin() to AIO_WAIT_WHILE_UNLOCKED()
  hmp: convert handle_hmp_command() to AIO_WAIT_WHILE_UNLOCKED()
  monitor: convert monitor_cleanup() to AIO_WAIT_WHILE_UNLOCKED()

 block/block-backend.c | 8 +---
 block/export/export.c | 2 +-
 block/graph-lock.c| 2 +-
 block/io.c| 2 +-
 monitor/hmp.c | 2 +-
 monitor/monitor.c | 4 ++--
 6 files changed, 7 insertions(+), 13 deletions(-)

-- 
2.39.2



